counts the number of distinct words
# 1  
Old 08-17-2008
I'm looking to write a sample shell script that counts the number of distinct words in a text file given as an argument.
Remark: whitespace characters are spaces, tabs, form feeds, and newlines.

It should use ONLY these commands: tr, sort, grep, wc.

Thanks.
# 2  
Old 08-17-2008
It sounds like you already know which commands you need to use. Where are you stuck? Have you read the man pages for those commands?
# 3  
Old 08-17-2008
Actually, yes, but I don't know how to use them to get the distinct words.

Please, could you just guide me on how to do it? Thank you.
# 4  
Old 08-17-2008
It's difficult to "guide" you without just telling you how to do it... and that way you won't learn anything!

Try using tr to strip out all punctuation (see the -d option), then using tr again to convert all whitespace to newlines and all upper-case characters to lower-case, so that each word ends up on its own line. Then you can sort the output using the unique option (see the man page) so that you end up with only distinct words, and then count the number of lines produced using wc.
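The steps above can be sketched roughly as follows. This is only one possible reading of them, not a definitive answer: the function name count_distinct_words is made up for illustration, and it assumes that punctuation should be dropped, that "The" and "the" count as the same word, and that a word is any run of non-whitespace characters. grep is used here only to discard an empty line that can appear when the file starts with whitespace.

```shell
#!/bin/sh
# Sketch: count the distinct words in the file named by the first argument.
# count_distinct_words is a hypothetical name chosen for this example.
count_distinct_words() {
    # 1. delete punctuation
    # 2. fold upper-case to lower-case
    # 3. turn every run of whitespace into a single newline (one word per line)
    # 4. drop any empty line left over from leading whitespace
    # 5. keep one copy of each word, then count the lines
    tr -d '[:punct:]' < "$1" |
        tr '[:upper:]' '[:lower:]' |
        tr -s '[:space:]' '\n' |
        grep -v '^$' |
        sort -u |
        wc -l
}

# Run only when a file name was actually given.
if [ "$#" -ge 1 ]; then
    count_distinct_words "$1"
fi
```

Saved as, say, distinct.sh, it would be run as "sh distinct.sh myfile.txt". Note that some wc implementations pad the number with leading spaces.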
# 5  
Old 08-17-2008
perl:

Code:
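use strict;
use warnings;

# Take the file name from the first command-line argument.
my $file = shift or die "usage: $0 FILE\n";
open(my $fh, '<', $file) or die "cannot open $file: $!\n";

my %hash;
while (my $line = <$fh>) {
	# split(' ', ...) splits on runs of whitespace and ignores leading whitespace.
	$hash{$_}++ for split(' ', $line);
}
close($fh);

# Print each distinct word with the number of times it occurs.
for my $key (keys %hash) {
	print $key, "--->", $hash{$key}, "\n";
}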

# 6  
Old 08-17-2008
Just pass those arguments to grep -c:

scriptname abc efg
grep -c "$1" filename    (this gives the number of lines in the file that contain abc)

Do the same for the remaining arguments.
# 7  
Old 08-17-2008
I don't think you understood the question, vidyadhar85.