awk - Counting number of similar lines


 
# 1  
Old 05-14-2008

Hi All

I have the input file OMAK_11.

OMAK 000002EXCLUDE 1341
OMAK 000002EXCLUDE 1341
OMAK 000002EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000005EXCLUDE 1341
OMAK 000005EXCLUDE 1341
OMAK 000005EXCLUDE 1341

I want the output as

OMAK EXCLUDE 000002 3 1341
OMAK EXCLUDE 000003 3 1341
OMAK EXCLUDE 000004 4 1341
OMAK EXCLUDE 000005 3 1341



I have this program, which works well except for the last group, where I do not get any output. I think it has something to do with the END block of awk.

awk '{
  curr = substr($0,1,11)
  if (curr != prev && prev != "") {
    a = sprintf("%s %-50s %6s %-6s %s",substr(prev_0,1,5),substr(prev_0,12,29),substr(prev_0,6,6),count,substr(prev_0,41,4))
    print a
    count = 0
  }
  count++
  prev = curr
  prev_0 = $0
} END {
  a = sprintf("%s %-50s %6s %-6s %s",substr($0,1,5),substr($0,12,29),substr($0,6,6),count,substr($0,41,4))
  print a
}' OMAK_11


Can anyone tell me how to fix this?

Regards
Dhana
# 2  
Old 05-14-2008
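Incidentally, instead of a=sprintf(...); print a you can just use printf(...), with a trailing \n added to the format string.

Maybe $0 is undefined by the time you reach the END clause (some older awk implementations do not keep the last record there). If you change print a to print $0 in that last section, does it print the last line of input?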
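
If that is the cause, one possible fix is to use the saved copy of the last record (prev_0) in END instead of $0. A minimal sketch, assuming the fixed-width layout implied by the original substr() calls; the sample file name and its contents are made up for illustration:

```shell
# Hypothetical fixed-width sample: name (cols 1-5), number (6-11),
# status (12-40), code (41-44).
printf '%-5s%-6s%-29s%-4s\n' \
  OMAK 000002 EXCLUDE 1341 \
  OMAK 000002 EXCLUDE 1341 \
  OMAK 000003 EXCLUDE 1341 > OMAK_11.sample

out=$(awk '{
  curr = substr($0, 1, 11)
  if (curr != prev && prev != "") {
    # key changed: report the group that just ended
    printf("%s %-50s %6s %-6s %s\n", substr(prev_0,1,5), substr(prev_0,12,29), substr(prev_0,6,6), count, substr(prev_0,41,4))
    count = 0
  }
  count++
  prev = curr
  prev_0 = $0
} END {
  # last group: use the saved record, not $0
  if (NR > 0)
    printf("%s %-50s %6s %-6s %s\n", substr(prev_0,1,5), substr(prev_0,12,29), substr(prev_0,6,6), count, substr(prev_0,41,4))
}' OMAK_11.sample)
printf '%s\n' "$out"
```

The NR > 0 guard only avoids printing an empty report for an empty file.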
# 3  
Old 05-14-2008
By the way, you could also use uniq -c and rearrange the order of the output columns using awk.
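
A rough sketch of that idea, assuming the records are space-separated as displayed in the first post, that the number is the first six characters of the second field, and that identical lines are already adjacent (otherwise sort first). The sample file name is made up:

```shell
# Hypothetical sample taken from the data shown in the first post.
printf '%s\n' \
  'OMAK 000002EXCLUDE 1341' \
  'OMAK 000002EXCLUDE 1341' \
  'OMAK 000002EXCLUDE 1341' \
  'OMAK 000004EXCLUDE 1341' \
  'OMAK 000004EXCLUDE 1341' \
  'OMAK 000004EXCLUDE 1341' \
  'OMAK 000004EXCLUDE 1341' > omak.sample

# uniq -c prefixes each distinct line with its count, so the count is $1
# and the original fields shift to $2, $3 and $4.
out=$(uniq -c omak.sample |
  awk '{ print $2, substr($3, 7), substr($3, 1, 6), $1, $4 }')
printf '%s\n' "$out"
```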
# 4  
Old 05-14-2008
Code:
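awk '{                        # use nawk on Solaris
  c[$0]++                     # count occurrences of each distinct line
  split($2, m, /[A-Z]+/)      # m[1] = leading digits of $2, e.g. 000002
  split($2, n, /[0-9]+/)      # n[2] = trailing letters of $2, e.g. EXCLUDE
  a[$1" "n[2]" "m[1]]=c[$0]" "$3   # key: name, status, number; value: count and code
} END {for(i in a) print i, a[i]}' file   # for-in order is unspecified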
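
To see what the two split() calls produce, here is a small illustration (not part of the posted solution). It feeds two sample strings through the same calls: a mixed digits-and-letters field and a purely numeric one.

```shell
out=$(printf '%s\n' '000002EXCLUDE' '1341' | awk '{
  split($0, m, /[A-Z]+/)   # pieces between runs of capital letters
  split($0, n, /[0-9]+/)   # pieces between runs of digits
  print $0 ": m[1]=[" m[1] "] n[2]=[" n[2] "]"
}')
printf '%s\n' "$out"
```

For a purely numeric field such as 1341, m[1] is the whole field and n[2] is empty, so both arrays only help when the field really mixes digits and letters.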

# 5  
Old 05-15-2008
awk - counting number of similar lines

Hi
Thanks for the information provided.
I read the source code that you provided. For example, I have the data below.

SIZEC000002EXCLUDE 1341
SIZEC000002EXCLUDE 1341
SIZEC000002EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1341

I have two questions.
a] What is the purpose of these statements if the input is the data above?

split($2, m, /[A-Z]+/)
split($2, n, /[0-9]+/)

With this data, $2 does not contain any letters. Or is it necessary to have both m and n?


b] If I have the data below

SIZEC000004EXCLUDE 1380
SIZEC000004EXCLUDE 1382
SIZEC000005EXCLUDE 1340
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1342

I want to group the data like this:

SIZEC000004EXCLUDE 1380 1382
SIZEC000005EXCLUDE 1340 1341 1342

Does awk have any standard function to do this?

Regards
Dhana
# 6  
Old 05-15-2008
Use an array indexed by $1, and append $2 to it as you process each line.
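
A minimal sketch of that suggestion, using the five sample lines from the question. The file name and the extra order[] array (used only to print groups in first-seen order) are additions for illustration:

```shell
printf '%s\n' \
  'SIZEC000004EXCLUDE 1380' \
  'SIZEC000004EXCLUDE 1382' \
  'SIZEC000005EXCLUDE 1340' \
  'SIZEC000005EXCLUDE 1341' \
  'SIZEC000005EXCLUDE 1342' > sizec.sample

out=$(awk '{
  if ($1 in vals) {
    vals[$1] = vals[$1] " " $2       # append to the existing group
  } else {
    vals[$1] = $2                    # first value for this key
    order[++n] = $1                  # remember first-seen order of keys
  }
}
END {
  for (k = 1; k <= n; k++) print order[k], vals[order[k]]
}' sizec.sample)
printf '%s\n' "$out"
```

A plain for (key in vals) loop would also work, but its iteration order is unspecified, so the output would need a sort.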
# 7  
Old 05-15-2008
Awk - Grouping Lines

Hi All

I have the input file as

INFOR00028114 GRAINS BAKERY 4000
INFOR00028114 GRAINS BAKERY 4000
INFOR00028114 GRAINS BAKERY 4000
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2671

I want the output like
BRAND 14 GRAINS BAKERY 000281 3 4000
BRAND 3-RIVERS - IC 000918 1 2671
BRAND 3-RIVERS - IC 000918 3 2672
BRAND 5 STAR 001972 2 3618



The layout is:
position 1-5 for NAME1
position 6-11 for NAME2
position 12-41 for NAME3
position 42-46 for NAME4

I framed the logic below, but I am getting this output:
BRAND 14 GRAINS BAKERY 000281 3 4000
BRAND 3-RIVERS - IC 000918 1 2671
BRAND 5 STAR 001972 2 3618
which is not what I expected.

awk '{
  c[$0]++
  a = substr($0,1,5)
  b = substr($0,12,30)
  ff = substr($0,6,6)
  d = substr($0,42,4)
  j[a" "b" "ff] = c[$0]" "d
} END {for (i in j) print i, j[i]}' tes | sort

I am not sure what needs to be changed.
Can anyone help me?

Regards
Dhana
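
One likely cause: the key j[a" "b" "ff] leaves out the trailing code d, so the 2672 and 2671 records share a single key and whichever is read last overwrites the other. A sketch of one way around it is to count on the whole record and slice it only in END. The sample file name and the fixed-width padding are hypothetical, and the last column is taken as four characters, as in the posted substr($0,42,4):

```shell
# Hypothetical fixed-width sample following the posted layout:
# cols 1-5 NAME1, 6-11 NAME2, 12-41 NAME3, 42-45 NAME4.
printf '%-5s%-6s%-30s%-4s\n' \
  BRAND 000281 '14 GRAINS BAKERY' 4000 \
  BRAND 000281 '14 GRAINS BAKERY' 4000 \
  BRAND 000281 '14 GRAINS BAKERY' 4000 \
  BRAND 000918 '3-RIVERS - IC' 2672 \
  BRAND 000918 '3-RIVERS - IC' 2672 \
  BRAND 000918 '3-RIVERS - IC' 2672 \
  BRAND 000918 '3-RIVERS - IC' 2671 > brands.sample

out=$(awk '
{ c[$0]++ }                          # count each distinct record, code included
END {
  for (rec in c) {
    name = substr(rec, 12, 30)
    sub(/ +$/, "", name)             # drop the fixed-width padding
    print substr(rec, 1, 5), name, substr(rec, 6, 6), c[rec], substr(rec, 42, 4)
  }
}' brands.sample | LC_ALL=C sort)
printf '%s\n' "$out"
```

With this sample, the 2671 and 2672 records are reported as separate lines with counts 1 and 3.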