Count and print all repeating words in a line


 
# 1  
Old 10-13-2010

Gurus,

I have a file containing lines like this:
Quote:
GO:0005874 GO:0005634 GO:0007067 GO:0000778 GO:0005876 GO:0005874
GO:0005938 GO:0000776 GO:0007067 GO:0000092 GO:0007067
GO:0043332 GO:0008017 GO:0005737 GO:0008017 GO:0051301 GO:0005737
GO:0005874 GO:0007067 GO:0000555 GO:0005123
The number of words in each line varies. If a word repeats within a line, I need that word printed, followed by the total number of repeated words in that line.

So the output would be:

Quote:
GO:0005874 1
GO:0007067 1
GO:0008017 GO:0005737 2
Any help would be highly appreciated.

Thanks & Regards
# 2  
Old 10-13-2010
Try this:
Code:
awk '{for(i=1;i<=NF;i++)A[$i]++;for(i in A)if(A[i]>1){printf "%s%s", i, FS; n++}}n{print n;n=0}{delete A}' infile

Shorter:
Code:
awk '{for(i=1;i<=NF;i++){A[$i]++;if(A[$i]==2){printf "%s%s", $i, FS; n++}}}n{print n;n=0}{delete A}' infile
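
For anyone who wants to follow the shorter one-liner step by step, here is the same logic spread over several lines and wrapped in a shell function. This is only a sketch: the name `print_repeats` is made up for illustration, and `split("", seen)` is swapped in for `delete A` on the assumption that some older awk implementations may not accept a bare `delete` of a whole array.

```shell
# print_repeats: hypothetical helper name, not from the thread.
# Reads lines on stdin. For each line that contains a repeated word,
# prints each repeated word once, then the number of distinct repeated words.
print_repeats() {
  awk '
  {
    n = 0
    split("", seen)                  # portable way to empty the array
    for (i = 1; i <= NF; i++) {
      if (++seen[$i] == 2) {         # second occurrence: report the word once
        printf "%s%s", $i, FS
        n++
      }
    }
    if (n) print n                   # lines without repeats print nothing
  }'
}

printf '%s\n' \
  'GO:0005874 GO:0005634 GO:0007067 GO:0000778 GO:0005876 GO:0005874' \
  'GO:0005874 GO:0007067 GO:0000555 GO:0005123' | print_repeats
# prints: GO:0005874 1
```

Because the word is printed at its second occurrence, the words come out in the order in which they repeat, whereas a `for (i in A)` loop visits the array in an unspecified order.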


Last edited by Scrutinizer; 10-13-2010 at 04:29 AM..
# 3  
Old 10-13-2010
Code:
awk '{for(i=1;i<=NF;i++) {if (a[$i]) {k++;printf "%s%s", $i, FS} else {a[$i]++}} if (k) print k; k=0; for (j in a) delete a[j]}' infile
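
One thing to be aware of: the loop above treats every occurrence after the first as a repeat, so a word that appears three times in a line is printed twice and adds 2 to the count, while the commands in post #2 print it once. A small sketch of that counting rule, assuming a made-up function name `count_extra` and clearing the array at the start of every line:

```shell
# count_extra: hypothetical helper name, not from the thread.
# Prints a word once for every occurrence after its first one on the line,
# then the total number of such extra occurrences.
count_extra() {
  awk '{
    k = 0
    split("", a)                     # start each line with an empty array
    for (i = 1; i <= NF; i++) {
      if ($i in a) { k++; printf "%s%s", $i, FS }   # every occurrence after the first
      else a[$i] = 1
    }
    if (k) print k
  }'
}

echo 'x y x x' | count_extra
# prints: x x 2
```

Which of the two counting rules is wanted depends on what "total number of repeats" means in the question; the sample data has no word that occurs three times, so both rules give the same result there.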

# 4  
Old 10-13-2010
My code is similar to Scrutinizer's:

Code:
awk '{delete a;s=0}{for (i=1;i<=NF;i++) if (++a[$i]==2) {printf "%s%s", $i, FS;s++}}s{print s}' infile

# 5  
Old 10-13-2010
Using perl:
Code:
perl -ne '%h=();$i=0;$r="";$h{$_}++ for split;for (keys %h){if($h{$_}>1){$i++;$r.=$_." "}};print $r.$i."\n" if $i' infile

# 6  
Old 10-13-2010
Quote:
Originally Posted by AshwaniSharma09
My need is, if a word repeats in a line get it printed. Also total number of repeats.
Code:
awk '{delete _;for(i=0;++i<=NF;){_[$i]++}}{for(i in _){if(_[i]-1){print i,_[i]}}}' file
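
Note that this prints one repeated word per output line, followed by how many times it occurred, which is a different layout from the output requested in the first post. A quick check on the second sample line, as a sketch only (the function name `word_counts` is hypothetical, and `split("", c)` stands in for `delete _`):

```shell
# word_counts: hypothetical helper name, not from the thread.
# For every input line, prints each word that occurs more than once
# together with its number of occurrences, one word per output line.
word_counts() {
  awk '{
    split("", c)                               # empty the array for this line
    for (i = 1; i <= NF; i++) c[$i]++          # count every word
    for (w in c) if (c[w] > 1) print w, c[w]   # report words seen more than once
  }'
}

echo 'GO:0005938 GO:0000776 GO:0007067 GO:0000092 GO:0007067' | word_counts
# prints: GO:0007067 2
```

When a line has several repeated words, the order in which they are printed is not guaranteed, because `for (w in c)` walks the array in an unspecified order.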
