02-15-2017
Count unique words
Dear all,
I would like to know how to list and count the unique words across thousands of text files.
Please help me out.
Thanks in advance.
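One possible approach, offered as a minimal sketch rather than a definitive answer: feed every file through a single pipeline that splits the text into one word per line, then let sort and uniq do the counting. The function name unique_words, the .txt suffix, the "run of ASCII letters" definition of a word, and the case folding are all assumptions made for illustration; adjust them to the real data.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct word found under a directory,
# prefixed by how often it occurs, most frequent first.
# Assumptions: files are named *.txt, a word is a run of ASCII letters,
# and upper/lower case are treated as the same word.
unique_words() {
    dir=${1:-.}
    # find ... -exec cat {} + batches the file names, so thousands of files
    # do not overflow the command-line length limit.
    find "$dir" -type f -name '*.txt' -exec cat {} + |
        tr -cs 'A-Za-z' '\n' |   # replace each run of non-letters with one newline
        tr 'A-Z' 'a-z' |         # fold to lower case
        sed '/^$/d' |            # drop the empty line a leading separator can leave
        sort | uniq -c |         # count identical adjacent words
        sort -k1,1nr -k2         # highest count first, then alphabetical
}

# Demo on two throwaway files
demo=$(mktemp -d)
printf 'the cat and the hat\n' > "$demo/a.txt"
printf 'The dog and the cat\n' > "$demo/b.txt"
unique_words "$demo"
printf 'distinct words: %s\n' "$(unique_words "$demo" | wc -l | tr -d ' ')"
rm -rf "$demo"
```

The last line of the demo counts the distinct words by counting the lines that uniq -c produced. Note that if a file lacks a final newline, cat joins its last word to the first word of the next file.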
wc(1) General Commands Manual wc(1)
NAME
wc - count words, lines, and bytes or characters in a file
SYNOPSIS
wc [-c|-m] [-lw] [file]...
DESCRIPTION
The wc command counts lines, words, and bytes or characters in the named files, or in the standard input if no file names are specified. It
also keeps a total count for all named files.
A word is a string of characters delimited by spaces, tabs, or newlines.
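The word definition above can be checked with a short sketch. The helper name count_words is hypothetical; it only wraps wc -w and strips the padding that some wc implementations put in front of the number.

```shell
#!/bin/sh
# Hypothetical helper: number of words in a string, as wc -w counts them.
count_words() {
    printf '%s' "$1" | wc -w | tr -d ' '
}

# Space, tab, and newline all act as delimiters.
count_words "$(printf 'one two\tthree\nfour')"   # prints 4
# Punctuation does not: a hyphenated term is a single word to wc.
count_words 'unique-words'                       # prints 1
```

This is why a pipeline that needs "words" in the everyday sense usually splits on non-letters with tr before counting, instead of relying on wc -w alone.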
Options
wc recognizes the following options:
-c   Report the number of bytes in each input file.
-l   Report the number of newline characters in each input file.
-m   Report the number of characters in each input file.
-w   Report the number of words in each input file.
The -c and -m options are mutually exclusive. Otherwise, the -l, -w, and -c or -m options can be used in any combination to specify that a subset of lines,
words, and bytes or characters are to be reported.
When any option is specified, wc reports only the information requested. If no option is specified, the default output is -lwc.
When a file is specified on the command line, its name is printed along with the counts.
Standard Output
By default, the standard output contains an entry for each input file in the form:
newlines words bytes file
If the -m option is specified, the number of characters replaces the bytes field in this format.
If any option is specified, the fields for the unspecified options are omitted.
If no file operand is specified, neither the file name nor the preceding blank character is written.
If more than one file operand is specified, an additional line is written at the end of the output, of the same format as the other lines,
except that the word total (in the POSIX locale) is written instead of a file name and the total of each column is written as appropriate.
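As an illustration of the extra summary line, the sketch below runs wc on two throwaway files and then extracts the word total. The helper name total_words and the file names are hypothetical; column padding differs between implementations, so the helper reads the fields with awk instead of relying on fixed positions.

```shell
#!/bin/sh
# Hypothetical helper: word count from the last line of wc output.
# With several files that line is the summary line; with a single file
# it is simply that file's own line.
total_words() {
    wc "$@" | awk '{ w = $2 } END { print w }'
}

demo=$(mktemp -d)
printf 'one two\n'          > "$demo/a.txt"   # 1 line,  2 words,  8 bytes
printf 'three\nfour five\n' > "$demo/b.txt"   # 2 lines, 3 words, 16 bytes

# Prints one line per file, then a summary line with 3, 5, and 24.
( cd "$demo" && LC_ALL=C wc a.txt b.txt )

total_words "$demo/a.txt" "$demo/b.txt"       # prints 5
rm -rf "$demo"
```

Setting LC_ALL=C pins the label of the summary line, since the man page only promises the word used in the POSIX locale.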
Under UNIX Standard environment, a word is a string of characters delimited by spaces, tabs, newline, carriage-return, vertical tab, or
form-feed.
RETURN VALUE
wc exits with one of the following values:
0    Successful completion.
>0   An error occurred.
EXTERNAL INFLUENCES
For information about the UNIX Standard environment, see standards(5).
Environment Variables
LC_CTYPE determines the range of graphics and space characters, and the interpretation of text as single- and/or multibyte characters.
LC_MESSAGES determines the language in which messages are displayed.
If LC_CTYPE or LC_MESSAGES is not specified in the environment or is null, they default to the value of LANG.
If LANG is not specified or is null, it defaults to "C" (see lang(5)).
If any internationalization variable contains an invalid setting, they all default to "C". See environ(5).
International Code Set Support
Single- and multibyte character code sets are supported.
WARNINGS
The wc command counts the number of newlines to determine the line count. If a text file has a final line that is not terminated with a
newline character, the count will be off by one.
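The off-by-one behavior is easy to reproduce. The sketch below also shows one possible workaround, assuming awk is available: awk treats trailing text without a newline as a record, so its record count includes the unterminated line. The helper name count_records is hypothetical.

```shell
#!/bin/sh
# wc -l counts newline characters, not lines of text.
printf 'alpha\nbeta\n' | wc -l    # 2: both lines end with a newline
printf 'alpha\nbeta'   | wc -l    # 1: the unterminated last line is missed

# Hypothetical helper: count lines, including an unterminated final line.
# Reads the named files, or standard input if none are given.
count_records() {
    awk 'END { print NR }' "$@"
}

printf 'alpha\nbeta' | count_records   # 2
```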
EXAMPLES
Print the number of words and characters in file:
wc -wm file
The following is printed when the above command is executed:
words chars file
where words is the number of words and chars is the number of characters in file.
SEE ALSO
standards(5).
STANDARDS CONFORMANCE
wc(1)