WC(1) General Commands Manual WC(1)
NAME
wc - word count
SYNOPSIS
wc [ -lwrbc ] [ file ... ]
DESCRIPTION
Wc counts lines, words, runes, syntactically-invalid UTF codes and bytes in the named files, or in the standard input if no file is named.
A word is a maximal string of characters delimited by spaces, tabs or newlines. The count of runes includes invalid codes.
If the optional argument is present, just the specified counts (lines, words, runes, broken UTF codes or bytes) are selected by the letters
l, w, r, b, or c. Otherwise, lines, words and bytes (-lwc) are reported.
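The counting rules above can be sketched in Python. This is only an illustration, not the code in /sys/src/cmd/wc.c: the function name wc, the error-handler name wc_count_bad and the dictionary keys are invented for the example, and Python's UTF-8 decoder (which accepts 4-byte sequences and may group invalid bytes differently) stands in for Plan 9's notion of a broken UTF code.

```python
import codecs
import re


def wc(data: bytes) -> dict:
    """Return line, word, rune, broken-code and byte counts for data."""
    bad = 0

    def count_bad(err):
        # Called once per undecodable byte run: count it, substitute a
        # single rune so it is included in the rune total, and resume
        # decoding just past the offending bytes.
        nonlocal bad
        bad += 1
        return ("\ufffd", err.end)

    # Hypothetical handler name; re-registering replaces any earlier one.
    codecs.register_error("wc_count_bad", count_bad)
    text = data.decode("utf-8", errors="wc_count_bad")

    # A word is a maximal run of bytes other than space, tab and newline.
    words = re.findall(rb"[^ \t\n]+", data)

    return {
        "lines": data.count(b"\n"),
        "words": len(words),
        "runes": len(text),  # includes the invalid codes
        "bad": bad,
        "bytes": len(data),
    }


if __name__ == "__main__":
    sample = "héllo wörld\n".encode("utf-8") + b"\xff\n"
    print(wc(sample))
```

Selecting only some of the returned keys corresponds to passing a subset of the letters l, w, r, b and c.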
SOURCE
/sys/src/cmd/wc.c
BUGS
The Unicode Standard has many blank characters scattered through it, but wc looks for only ASCII space, tab and newline.
Wc should have options to count suboptimal UTF codes and bytes that cannot occur in any UTF code.
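The first bug can be illustrated with a short Python sketch. The helper ascii_words is a hypothetical stand-in for the word rule of wc, and Python's str.split is used only as a convenient example of a splitter that does recognize Unicode blanks.

```python
import re


def ascii_words(text: str) -> int:
    """Count words delimited only by ASCII space, tab and newline."""
    return len(re.findall(r"[^ \t\n]+", text))


# U+2003 (EM SPACE) and U+3000 (IDEOGRAPHIC SPACE) are Unicode blanks
# that the ASCII-only rule does not treat as delimiters.
sample = "alpha\u2003beta\u3000gamma delta"

if __name__ == "__main__":
    print(ascii_words(sample))   # ASCII-only delimiters
    print(len(sample.split()))   # Unicode-aware whitespace splitting
```

Under the ASCII-only rule the first three words run together, so the two counts differ.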
WC(1)
UTF(6) Games Manual UTF(6)
NAME
UTF, Unicode, ASCII, rune - character set and format
DESCRIPTION
The Plan 9 character set and representation are based on the Unicode Standard and on the ISO multibyte UTF-8 encoding (Universal Character
Set Transformation Format, 8 bits wide). The Unicode Standard represents its characters in 16 bits; UTF-8 represents such values in an
8-bit byte stream. Throughout this manual, UTF-8 is shortened to UTF.
In Plan 9, a rune is a 16-bit quantity representing a Unicode character. Internally, programs may store characters as runes. However, any
external manifestation of textual information, in files or at the interface between programs, uses a machine-independent, byte-stream
encoding called UTF.
UTF is designed so that the characters of the 7-bit ASCII set (values hexadecimal 00 to 7F) appear only as themselves in the encoding. Runes
with values above 7F appear as sequences of two or more bytes with values only from 80 to FF.
The UTF encoding of the Unicode Standard is backward compatible with ASCII: programs presented only with ASCII work on Plan 9 even if not
written to deal with UTF, as do programs that deal with uninterpreted byte streams. However, programs that perform semantic processing on
ASCII graphic characters must convert from UTF to runes in order to work properly with non-ASCII input. See rune(2).
Letting numbers be binary, a rune x is converted to a multibyte UTF sequence as follows:
01. x in [00000000.0bbbbbbb] -> 0bbbbbbb
10. x in [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
11. x in [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
Conversion 01 provides a one-byte sequence that spans the ASCII character set in a compatible way. Conversions 10 and 11 represent
higher-valued characters as sequences of two or three bytes with the high bit set. Plan 9 does not support the 4, 5, and 6 byte sequences
proposed by X-Open. When there are multiple ways to encode a value, for example rune 0, the shortest encoding is used.
In the inverse mapping, any sequence except those described above is incorrect and is converted to rune hexadecimal 0080.
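The three forward conversions can be sketched in Python as follows. The function name rune_to_utf is invented for this example and is not the interface described in rune(2). The sketch assumes a 16-bit rune, as stated above, and always picks the shortest encoding.

```python
def rune_to_utf(x: int) -> bytes:
    """Encode a 16-bit rune as a one-, two- or three-byte UTF sequence."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("rune must fit in 16 bits")
    if x <= 0x7F:
        # Conversion 01: 0bbbbbbb
        return bytes([x])
    if x <= 0x7FF:
        # Conversion 10: 110bbbbb, 10bbbbbb
        return bytes([0xC0 | (x >> 6), 0x80 | (x & 0x3F)])
    # Conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb
    return bytes([
        0xE0 | (x >> 12),
        0x80 | ((x >> 6) & 0x3F),
        0x80 | (x & 0x3F),
    ])


if __name__ == "__main__":
    for rune in (0x41, 0xE9, 0x20AC):
        print(f"{rune:04X} -> {rune_to_utf(rune).hex(' ')}")
```

Every byte of a multibyte result has its high bit set, so none of them can be mistaken for an ASCII character.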
FILES
/lib/unicode
table of characters and descriptions, suitable for look(1).
SEE ALSO
ascii(1), tcs(1), rune(2), keyboard(6), The Unicode Standard.
UTF(6)