Print number of occurrences of many different strings


 
# 1  
Old 08-20-2010

People, I need your help writing a script that will

1. take a number of lines as input, something like this:

((RUBROBACTER_1_PE1288
(((SALINISPORA_1_PE1863
SALINISPORA_1_PE1828)100
((NOCARDIOIDES_2_PE2419
PROPIONIBACTERIUM_1_PE1395)96
((((((((CORYNEBACTERIUM_1_PE1119
CORYNEBACTERIUM_2_PE839)100
CORYNEBACTERIUM_1_PE1134)91
CORYNEBACTERIUM_3_PE1518)71
CORYNEBACTERIUM_1_PE1349)100
(((RHODOCOCCUS_4_PE951

2. count the occurrences of each name across all lines, based on a provided names_list, something like this (the names in the list are unique):

PROPIONIBACTERIUM
RHODOCOCCUS
NOCARDIOIDES
RUBROBACTER
SALINISPORA
CORYNEBACTERIUM
AZORHIZOBIUM
AZOTOBACTER

3. output the same names_list with each name's number of occurrences across all lines printed next to it, i.e. in this example:

PROPIONIBACTERIUM 1
RHODOCOCCUS 1
NOCARDIOIDES 1
RUBROBACTER 1
SALINISPORA 2
CORYNEBACTERIUM 5
AZORHIZOBIUM 0
AZOTOBACTER 0

I guess one should use awk, but my scripting knowledge is very basic.

Thanks a lot in advance.
# 2  
Old 08-21-2010
Code:
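awk '
# first file (name_list): register each name with a count of 0
NR==FNR{a[$1]=0;next}
# second file (input.txt): count the lines that contain each name
{for (i in a) {s=match($0,i);if (s>0) a[i]++}}
# print every name with its count; "for (i in a)" visits names in no fixed order
END {for (i in a) print i,a[i]}
' name_list input.txt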

SALINISPORA 2
PROPIONIBACTERIUM 1
RHODOCOCCUS 1
AZORHIZOBIUM 0
NOCARDIOIDES 1
RUBROBACTER 1
CORYNEBACTERIUM 5
AZOTOBACTER 0
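
A note on the output order: awk's `for (i in a)` walks the array in an unspecified order, which is why the result above does not follow the names list. The following is only a sketch of one way to keep the list order, not a replacement for the answer above. The function name `count_names`, the arrays `order` and `count`, and the temporary directory are made up for illustration; the sample data is copied from the question. It also assumes that names should be matched as literal text, so it uses `index()` instead of `match()`.

```shell
# Sample data from the question, written to a temporary directory (illustrative paths).
tmp=$(mktemp -d)
cat > "$tmp/name_list" <<'EOF'
PROPIONIBACTERIUM
RHODOCOCCUS
NOCARDIOIDES
RUBROBACTER
SALINISPORA
CORYNEBACTERIUM
AZORHIZOBIUM
AZOTOBACTER
EOF
cat > "$tmp/input.txt" <<'EOF'
((RUBROBACTER_1_PE1288
(((SALINISPORA_1_PE1863
SALINISPORA_1_PE1828)100
((NOCARDIOIDES_2_PE2419
PROPIONIBACTERIUM_1_PE1395)96
((((((((CORYNEBACTERIUM_1_PE1119
CORYNEBACTERIUM_2_PE839)100
CORYNEBACTERIUM_1_PE1134)91
CORYNEBACTERIUM_3_PE1518)71
CORYNEBACTERIUM_1_PE1349)100
(((RHODOCOCCUS_4_PE951
EOF

# count_names NAME_LIST INPUT
# Prints each name from NAME_LIST, in list order, followed by the number of
# INPUT lines that contain it as a literal substring.
count_names() {
  awk '
    # first file: remember the names in the order they appear, skip blank lines
    NR == FNR { if ($1 != "") { order[++n] = $1; count[$1] = 0 } next }
    # second file: index() does a plain substring test, no regex involved
    { for (k = 1; k <= n; k++) if (index($0, order[k])) count[order[k]]++ }
    # walk the saved order instead of "for (i in count)"
    END { for (k = 1; k <= n; k++) print order[k], count[order[k]] }
  ' "$1" "$2"
}

count_names "$tmp/name_list" "$tmp/input.txt"
```

With the sample data this prints the eight names in the same order as the names list, ending with `AZORHIZOBIUM 0` and `AZOTOBACTER 0`.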

# 3  
Old 08-21-2010
rdcwayx, thank you very much, I'll test this on large data.
# 4  
Old 08-21-2010
Code:
for i in $(cat namelist); do echo "$i" $(sed -n "/$i/p" infile | sed '$=' | sed -n 'N;$s/\(.*\)\n\(.*\)/\1/;p' | sed -n '$p') ;done
PROPIONIBACTERIUM 1
RHODOCOCCUS 1
NOCARDIOIDES 1
RUBROBACTER 1
SALINISPORA 2
CORYNEBACTERIUM 5
AZORHIZOBIUM
AZOTOBACTER

To print 0 for names that have no matches:

Code:
for i in $(cat namelist)
  do
    # run the counting pipeline once; it prints nothing when no line matches
    n=$(sed -n "/$i/p" infile | sed '$=' | sed -n 'N;$s/\(.*\)\n\(.*\)/\1/;p' | sed -n '$p')
    if [[ $n == "" ]] ; then
         echo "$i 0" ; else echo "$i $n"
    fi
  done
PROPIONIBACTERIUM 1
RHODOCOCCUS 1
NOCARDIOIDES 1
RUBROBACTER 1
SALINISPORA 2
CORYNEBACTERIUM 5
AZORHIZOBIUM 0
AZOTOBACTER 0
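
For comparison, the same per-name count can probably be had more simply with `grep -c`, which prints 0 by itself when nothing matches. This is a sketch under a few assumptions, not the method used above: the function name `count_with_grep` and the temporary files are invented for illustration, the data is a cut-down version of the sample from the question, and `-F` is added on the assumption that names are plain strings rather than patterns.

```shell
# Cut-down sample data in a temporary directory (illustrative paths).
tmp=$(mktemp -d)
printf '%s\n' SALINISPORA CORYNEBACTERIUM AZOTOBACTER > "$tmp/namelist"
cat > "$tmp/infile" <<'EOF'
(((SALINISPORA_1_PE1863
SALINISPORA_1_PE1828)100
((((((((CORYNEBACTERIUM_1_PE1119
(((RHODOCOCCUS_4_PE951
EOF

# count_with_grep NAMELIST INFILE
# For every non-empty line of NAMELIST, print the name and the number of
# INFILE lines containing it.
count_with_grep() {
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # -c counts matching lines (0 if none), -F treats the name as a fixed string
    printf '%s %s\n' "$name" "$(grep -c -F -e "$name" "$2")"
  done < "$1"
}

count_with_grep "$tmp/namelist" "$tmp/infile"
```

Like the loops above, this counts matching lines, so a name that appears twice on one line is still counted once. It also reads the input file once per name, which may be slow for long name lists.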

 