counts the number of distinct words


 
# 8  
Old 08-17-2008
Quote:
Originally Posted by Annihilannic
I don't think you understood the question vidyadhar85.
Can you explain, please?
# 9  
Old 08-17-2008
He wants to count the number of "distinct" (i.e. unique) words... e.g. if the words were "Blah yak blah blah yak rhubarb" the answer would be 3 ("blah yak rhubarb").
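
The example above can be checked directly. This is only a minimal sketch: it assumes that words differing only in case count as one word (as "Blah" and "blah" do in the example), and the function name `count_distinct` is made up for illustration.

```shell
# count_distinct (hypothetical name): print the number of distinct words
# read from standard input, treating upper and lower case as the same word.
count_distinct() {
    tr '[:upper:]' '[:lower:]' |   # fold case so "Blah" and "blah" match
    tr -s '[:space:]' '\n' |       # put one word on each line
    grep -v '^$' |                 # drop an empty line left by leading whitespace
    sort -u |                      # keep one copy of each word
    wc -l                          # count the remaining lines
}

printf 'Blah yak blah blah yak rhubarb\n' | count_distinct
```

For the sample sentence this reports 3. Some `wc` implementations pad the number with leading spaces.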
# 10  
Old 08-18-2008
Code:
tr -s '[:space:]' '\n' < infile | sort -u | wc -l

The way the question is posed it looks like homework. But I cannot be sure. Don't post homework questions, please.
# 11  
Old 08-18-2008
With that limitation (grep, tr, sort, wc), it definitely looks like homework.

If you could use xargs:
Code:
xargs -a FILENAME -n 1|sort -u|wc -l
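
Note that `-a` is a GNU xargs option. A rough sketch of the same idea that reads the file from standard input instead, which should also work with other xargs implementations; the helper name `distinct_via_xargs` and the sample text are made up for illustration:

```shell
# distinct_via_xargs FILE (hypothetical helper): count the distinct
# whitespace-separated words in FILE without relying on GNU's -a option.
distinct_via_xargs() {
    # With no command given, xargs runs echo; -n 1 passes one word per call,
    # so each word ends up on its own line.
    xargs -n 1 < "$1" | sort -u | wc -l
}

sample=$(mktemp)
printf 'blah yak blah\nyak   rhubarb\n' > "$sample"
distinct_via_xargs "$sample"
rm -f "$sample"
```

One caveat: xargs gives quotes and backslashes a special meaning, so text containing an apostrophe (for example "don't") can make it stop with an unmatched-quote error. The `tr` pipeline does not have that problem.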

# 12  
Old 08-18-2008
My friends, this is not homework; it's just a challenge between my friends.
Anyway, I tried to solve it:

grep -c | sort -u test.txt | tr -d "\t \v \f [unct:] [:upper:] "

but actually I don't know how to order these commands, so can anybody correct it for me?

Thanks to everyone who replied to me.
# 13  
Old 08-18-2008
What's the grep for? It doesn't do anything without at least a regular expression to search for, and usually also a file name.

Jim's solution already solved your problem; did you really not try the solutions posted here?

Code:
# Change each run of whitespace into a single newline
# (-s squeezes repeats, so multiple spaces don't produce empty "words")
tr -s '\t\v\f ' '\n' <test.txt |
# sort, deleting any repeated occurrences of the same word
sort -u |
# count how many we have
wc -l

This is very much a staple of introductory Unix textbooks; I'd recommend that you finish the first chapter before you accept any more challenges.

Last edited by era; 08-18-2008 at 02:29 PM.. Reason: Note that Jim already posted the solution
# 14  
Old 08-18-2008
Something like this:
Code:
# Strip punctuation and fold case for each word, then count occurrences
for word in $(cat "$filename")
do
    echo "$word" | tr -d '[:punct:]' | tr '[:upper:]' '[:lower:]'
done | sort | uniq -c

Just like Annihilannic suggested.
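
The loop above starts two `tr` processes for every word, and the unquoted `$(cat ...)` is subject to filename expansion. A possible loop-free sketch of the same normalisation follows; the function name `word_frequencies` and the sample text are made up for illustration. It prints one line per distinct word with its count, so piping the result through `wc -l` gives the number of distinct words.

```shell
# word_frequencies FILE (hypothetical helper): print each distinct word in
# FILE with the number of times it occurs, ignoring case and punctuation.
word_frequencies() {
    tr -d '[:punct:]' < "$1" |     # delete punctuation characters
    tr '[:upper:]' '[:lower:]' |   # fold case
    tr -s '[:space:]' '\n' |       # one word per line
    grep -v '^$' |                 # drop any empty line
    sort | uniq -c                 # group identical words and count them
}

sample=$(mktemp)
printf 'Blah, yak blah.\nBLAH yak rhubarb!\n' > "$sample"
word_frequencies "$sample"            # one "count word" line per distinct word
word_frequencies "$sample" | wc -l    # number of distinct words
rm -f "$sample"
```

Deleting punctuation also joins contractions ("don't" becomes "dont"), which may or may not be what the challenge intends.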