Count unique words


 
# 8  
02-27-2017
Code:
tr -cs A-Za-z\' '\n' | tr A-Z a-z | sort | uniq -c | sort -k1,1nr -k2 | sed ${1:-100}q

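You might try this one. It is not my creation, but it worked for my purpose of finding the one hundred most frequent words in a file. It reads the text from standard input, so redirect your file into it. To print a different number of words, pass that number as the first argument when you run it as a script, or change the default value of 100.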
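The thread title asks to count unique words, while the pipeline above lists the most frequent ones. As a minimal sketch (the function name `count_distinct` is hypothetical), the same tokenizing front end can instead report how many distinct words the input contains. It swaps the counting stages for `sort -u` and `wc -l`:

```shell
# count_distinct (hypothetical helper): print how many distinct words appear
# on standard input. Letters and apostrophes form words, every other
# character separates them, and case is ignored.
count_distinct() {
    tr -cs "A-Za-z'" '\n' |   # one word per line
    tr 'A-Z' 'a-z' |          # fold to lower case
    grep -v '^$' |            # drop the empty line left by leading separators
    sort -u |                 # keep one copy of each word
    wc -l |                   # count the remaining lines
    tr -d ' '                 # some wc versions (e.g. BSD/macOS) pad the count with spaces
}
```

For example, `count_distinct < yourfile` prints a single number. The `grep -v '^$'` stage matters when the text starts with punctuation or whitespace, because `tr` turns that leading run into an empty first line that would otherwise be counted as a word.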
# 9  
02-27-2017
It is not clear to me what you want.

Do you want to read the files and count the total number of unique words in the text inside them?
Or do you want word frequencies like this:
Code:
1033 cow
999 the
998 family

If those files are large, consider doing something else while this code runs.
Code:
awk '{
       $0 = tolower($0)
       for (i = 1; i <= NF; i++) arr[$i]++
     }
     END { for (w in arr) print arr[w], w }
    ' *National* *International* *Health* > wordcount.txt
# This gives the 500 most common words (sorted by count, most common last).
# Work with the output file to get what you want.
sort -k1n wordcount.txt | tail -500

It seems to me that some of the replies answer a different question, so I am not sure whether this is what you want.
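One limitation of the awk script above is that punctuation stays attached to words, so "cow," and "cow" are counted separately. A minimal sketch of a variant (the helper name `word_freq` is hypothetical) turns every character that is not an ASCII letter, digit or apostrophe into a space before counting. That definition of a word is an assumption. The variant also sorts so the most frequent words come first:

```shell
# word_freq N [FILE...] (hypothetical helper): print the N most frequent
# words as "count word", most frequent first. Reads standard input when no
# FILE is given. Assumption: a word is a run of ASCII letters, digits or
# apostrophes; all other characters separate words.
word_freq() {
    n=$1
    shift
    awk -v sep="[^a-z0-9']+" '
        {
            $0 = tolower($0)          # fold case so "Cow" and "cow" match
            gsub(sep, " ")            # punctuation runs become spaces; fields are re-split
            for (i = 1; i <= NF; i++) count[$i]++
        }
        END { for (w in count) print count[w], w }
    ' "$@" | sort -k1,1nr -k2,2 | head -n "$n"   # by count descending, then by word
}
```

With the globs from the original script this would be `word_freq 500 *National* *International* *Health*`. Note that a file whose name matches more than one of those patterns is passed twice, and its words are then counted twice.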
# 10  
02-27-2017
This is a shell example that lists the files in my $HOME whose names contain 'Sc', counts the lines of that listing that contain 'Scope', and then counts the lines containing 'Scope' inside each individual file.
OSX 10.12.3, default terminal calling 'sh'...
Code:
Last login: Mon Feb 27 20:01:48 on ttys000
AMIGA:barrywalker~> cd Desktop/Code/Shell
AMIGA:barrywalker~/Desktop/Code/Shell> cat search.sh
#!/bin/sh
# search.sh $1
ls "$HOME"/*Sc* > /tmp/listing
echo "Do the 'grep -c \"$1\" /tmp/listing' file for $1..."
grep -c "$1" /tmp/listing
echo "Now do the same for each individual file."
while IFS= read -r line
do
	echo "Inside file $line."
	grep -c "$1" "$line"
done < /tmp/listing
AMIGA:barrywalker~/Desktop/Code/Shell> 
AMIGA:barrywalker~/Desktop/Code/Shell> ./search.sh Scope
Do the 'grep -c "Scope" /tmp/listing' file for Scope...
3
Now do the same for each individual file.
Inside file /Users/barrywalker/AudioScope.Manual.
62
Inside file /Users/barrywalker/AudioScope.config.
0
Inside file /Users/barrywalker/AudioScope.sh.
103
AMIGA:barrywalker~/Desktop/Code/Shell> _
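The search.sh script above parses `ls` output through a temporary file. A minimal sketch of the same per-file count (the helper name `count_in_files` is hypothetical) loops over a glob directly, so no temporary file is needed and unusual filenames are handled safely. As in the original, `grep -c` counts matching lines, not individual occurrences, so a line containing 'Scope' twice still counts once:

```shell
# count_in_files PATTERN FILE... (hypothetical helper): for each regular
# file, print "file: N", where N is the number of lines matching PATTERN.
count_in_files() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue     # skip directories and an unmatched glob
        # "--" lets PATTERN start with "-"; grep -c prints 0 when nothing matches
        printf '%s: %s\n' "$f" "$(grep -c -- "$pattern" "$f")"
    done
}
```

Called as `count_in_files Scope "$HOME"/*Sc*`, it produces one line per file, similar to the per-file part of the session above.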



10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Regex to identify unique words in a dictionary database

Hello, I have a dictionary which I am building for the Open Source Community. The data structure is as under HEADWORD=PARTOFSPEECH=ENGLISH MEANING as shown in the example below अ=m=Prefix signifying negation. अँहँ=ind=Interjection expressing disapprobation. अं=int=An interjection... (2 Replies)
Discussion started by: gimley

2. Shell Programming and Scripting

Count occurrence of column one unique value having unique second column value

Hello Team, I need your help on the following: My input file a.txt is as below: 3330690|373846|108471 3330690|373846|108471 0640829|459725|100001 0640829|459725|100001 3330690|373847|108471 Here row 1 and row 2 of column 1 are identical but corresponding column 2 value are... (4 Replies)
Discussion started by: angshuman

3. Shell Programming and Scripting

How to count the number of two words associated with the two words occurring in the file?

Hi, I need to count the number of errors associated with the two words occurring in the file. It's about counting the occurrences of the word "error" where the word "index.js" appears. As such the command should look like. Please kindly help. I was trying: grep "error" log.txt | wc -l (1 Reply)
Discussion started by: jmarx

4. Shell Programming and Scripting

awk to count using each unique value

I'm looking for an awk script that will take the unique values in column 5, then print and count the unique values in column 6. CA001011500 11111 11111 -9999 201301 AAA CA001012040 11111 11111 -9999 201301 AAA CA001012573 11111 11111 -9999 201301 BBB CA001012710 11111 11111 -9999 201301... (4 Replies)
Discussion started by: ncwxpanther

5. Shell Programming and Scripting

Unique words in each line

In each row there could be repetition of a word. I want to delete all repetitions and keep unique occurrences. Example: a+b+c ab+c ab+c abbb+c ab+bbc a+bbbc aaa aaa aaa Output: a+b+c ab+c abbb+c ab+bbc a+bbbc aaa (6 Replies)
Discussion started by: Viernes

6. Shell Programming and Scripting

display unique words.

I have a file with duplicate words; how can I eliminate them? ant,bat bat,cat cat a.txt | grep -bat | awk '{print $1}' I expect the output to be ant,bat,cat. How can I display the output as ant,bat,cat on a single line, with no duplicates? (2 Replies)
Discussion started by: shikshavarma

7. Homework & Coursework Questions

unique words in files of folder and its subfolders

Hello, I tried to count all unique words of all files in one folder and its subfolders. Can anybody tell me why this doesn't work: ls| find -d | cat | tr "\ " "\n"| uniq -u | wc -l ??? Cat writes only the names of those files, but not the words that should be in them. Thanks for any advice. ... (9 Replies)
Discussion started by: Dworza

8. Shell Programming and Scripting

Shell script to find out words, replace them and count words

Hello, I'd like your help with a bash script which: 1. finds inside the html file (it is attached with my post) the code number of the Latest Stable Kernel, 2. finds the link which leads to the download location of the Latest Stable Kernel version, (the right link should lead to the file... (3 Replies)
Discussion started by: alex83

9. Shell Programming and Scripting

Finding the number of unique words in a file

Find the number of unique words in a file using the sort command. (7 Replies)
Discussion started by: abhikamune

10. Shell Programming and Scripting

how to read all the unique words in a text file

How can I read all the unique words in a file? I used - cat comment_file.txt | /usr/xpg6/bin/tr -sc 'A-Za-z' '/012' and cat comment_file.txt | /usr/xpg6/bin/tr -sdc 'A-Za-z' '/012' but they didn't work... (5 Replies)
Discussion started by: aditya.ece1985