Count number of occurrences using awk


 
# 1  
Old 04-14-2013
Count number of occurrences using awk

Hi Guys,

I have two files like the ones below:

Code:
file1
xx
yy


file2
b
yy
b2
xx
c1
yy
xx
yy

Now I want a way to count how many times each line of file1 occurs in file2, so the output would look something like this:

Code:
xx-2
yy-3

I know this is possible using awk or grep -c, but I am not able to get the desired output.

Any suggestion would be really appreciated.
# 2  
Old 04-14-2013
Something which works on your sample data:
Code:
awk 'FNR==NR{c[$1];next}$1 in c{++c[$1]}END{for(i in c) print i"-"c[i]}' file1 file2
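For anyone reading along, here is a commented sketch of the same idea, spread over several lines. It is not the original poster's command: the only change is that each key from file1 is registered with an explicit count of 0, so a key that never shows up in file2 prints as "key-0" instead of "key-" with nothing after the dash. The extra key zz, the temporary directory and the final sort (awk's for-in order is unspecified) are my own additions for illustration.

```shell
#!/bin/sh
# Sketch only: the scratch directory and the extra key "zz" are assumptions
# added to show what happens for a key with no matches.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Sample data from the first post, plus "zz", which never appears in file2.
printf '%s\n' xx yy zz > file1
printf '%s\n' b yy b2 xx c1 yy xx yy > file2

counts=$(awk '
    FNR == NR { c[$1] = 0; next }   # first file: register each key with count 0
    $1 in c   { ++c[$1] }           # second file: count only registered keys
    END       { for (i in c) print i "-" c[i] }
' file1 file2 | sort)               # sort, because for-in order is unspecified

echo "$counts"
```

FNR==NR is true only while awk is reading the first file named on the command line, which is what separates the "load keys" phase from the "count" phase.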

# 3  
Old 04-14-2013
Quote:
Originally Posted by elixir_sinari
Something which works on your sample data:
Code:
awk 'FNR==NR{c[$1];next}$1 in c{++c[$1]}END{for(i in c) print i"-"c[i]}' file1 file2

I forgot to mention that file2 is quite big (gigabytes in size). Would it be a good idea to split it into smaller pieces first and run the command on all of them in parallel to get results faster?
# 4  
Old 04-14-2013
The size of file2 doesn't matter much, since awk reads it one line at a time and only the keys from file1 are held in memory. How big is file1?
# 5  
Old 04-14-2013
file2 will have 120 million records
file1 will have 11 million records
# 6  
Old 04-14-2013
Why don't you try the command/script and then worry about the size?
# 7  
Old 04-14-2013
As you can see from the file sizes, if I run the script without splitting them it might take hours to complete, I think. I want the result within 15-30 minutes at most.
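If splitting does turn out to be necessary, the approach asked about above could be sketched roughly as follows. This is an untested-at-scale illustration, not a recommendation from the thread: the chunk size of 3 lines, the "chunk." prefix and the ".cnt" suffix are all hypothetical choices made so the sample data produces more than one piece. For real data the chunk size would be in the millions of lines. Note also that every background awk process loads its own copy of the file1 keys, so memory use grows with the number of pieces running at once, and disk I/O may limit the speed-up.

```shell
#!/bin/sh
# Sketch only: chunk size, "chunk." prefix and ".cnt" suffix are assumptions.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Sample data from the first post.
printf '%s\n' xx yy > file1
printf '%s\n' b yy b2 xx c1 yy xx yy > file2

# Split file2 into pieces of 3 lines each (chunk.aa, chunk.ab, ...).
split -l 3 file2 chunk.

# Count each piece in the background, one awk process per piece.
# Each process writes "key count" pairs to its own .cnt file.
for f in chunk.*; do
    awk 'FNR == NR { c[$1] = 0; next }
         $1 in c   { ++c[$1] }
         END       { for (i in c) print i, c[i] }' file1 "$f" > "$f.cnt" &
done
wait    # block until every background awk has finished

# Add up the partial counts per key and print them in the requested format.
result=$(cat chunk.*.cnt |
    awk '{ t[$1] += $2 } END { for (k in t) print k "-" t[k] }' | sort)

echo "$result"
```

It is still worth timing the plain single-process command first, as suggested earlier in the thread, before adding this extra machinery.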