#1
awk - Counting number of similar lines
Hi All,

I have the input file OMAK_11:

OMAK 000002EXCLUDE 1341
OMAK 000002EXCLUDE 1341
OMAK 000002EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000003EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000004EXCLUDE 1341
OMAK 000005EXCLUDE 1341
OMAK 000005EXCLUDE 1341
OMAK 000005EXCLUDE 1341

I want the output as:

OMAK EXCLUDE 000002 3 1341
OMAK EXCLUDE 000003 3 1341
OMAK EXCLUDE 000004 4 1341
OMAK EXCLUDE 000005 3 1341

I have this program, which is doing quite well except for the last line, where I could not get any output. It has something to do with the END block of awk.

Code:
awk '{
  curr=substr($0,1,11)
  if ( curr != prev && prev != "") {
    a=sprintf("%s %-50s %6s %-6s %s",substr(prev_0,1,5),substr(prev_0,12,29),substr(prev_0,6,6),count,substr(prev_0,41,4))
    print a
    count=0
  }
  count++
  prev=curr
  prev_0=$0
}
END {
  a=sprintf("%s %-50s %6s %-6s %s",substr($0,1,5),substr($0,12,29),substr($0,6,6),count,substr($0,41,4))
  print a
}' OMAK_11

Can anyone tell me how to fix this?

Regards,
Dhana
#2
Incidentally, instead of a=sprintf(...); print a you can just use printf(...), as long as you end the format string with \n, since printf does not append a newline.
Maybe $0 is undefined when you reach the END clause... If you change print a to print $0 in that last section, does it print the last line of input?
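If $0 really is empty in END (some older awk implementations behave that way), one way around it is to print from the copy the main block already saves. Below is a minimal sketch of that idea, not the original program: it assumes the three whitespace-separated fields as they appear in this thread rather than the fixed-width layout, and the function name count_groups is made up for illustration.

```shell
# Hypothetical helper: counts consecutive lines that share the same $1 and $2.
# Assumes whitespace-separated fields, which may differ from the real file layout.
count_groups() {
  awk '
  {
    curr = $1 " " $2                 # grouping key: name plus code
    if (curr != prev && prev != "") {
      print prev_0, count            # flush the group that just ended
      count = 0
    }
    count++
    prev = curr
    prev_0 = $0                      # keep a copy of the record for later
  }
  END {
    # print the final group from the saved copy instead of relying on $0
    if (prev != "") print prev_0, count
  }' "$@"
}
```

Called as count_groups OMAK_11, this prints each distinct line followed by its count, including the last group.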
#3
By the way, you could also use uniq -c and rearrange the order of the output columns using awk.
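A rough sketch of the uniq -c route, assuming the whitespace-separated fields as shown in the first post and a six-digit code at the start of the second field. The function name count_with_uniq is only for illustration.

```shell
# Hypothetical helper: let uniq -c do the counting, then reorder columns in awk.
# After uniq -c the fields are: $1 count, $2 name, $3 code+label, $4 trailing number.
count_with_uniq() {
  sort "$@" | uniq -c | awk '{
    code  = substr($3, 1, 6)         # assumed: first six characters are the numeric code
    label = substr($3, 7)            # the rest is the label, e.g. EXCLUDE
    print $2, label, code, $1, $4
  }'
}
```

The sort is there because uniq only merges adjacent duplicates; if the file is already grouped it can be dropped.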
#4
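Code:
[n]awk '{
c[$0]++                          # count identical input lines
split($2, m, /[A-Z]+/)           # m[1]: the digits at the start of $2
split($2, n, /[0-9]+/)           # n[2]: the letters that follow the digits
a[$1" "n[2]" "m[1]]=c[$0]" "$3   # key: name, letters, digits; value: count and $3
} END {for(i in a) print i, a[i]}' file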
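To see what the two split() calls produce, here is a small sketch that runs them on a single line. It assumes the name and the code are separated by whitespace, as in the first post, so that the second field looks like 000002EXCLUDE. The function name split_demo is hypothetical.

```shell
# Hypothetical helper: show the pieces that each split() call yields for one line.
split_demo() {
  echo "$1" | awk '{
    split($2, m, /[A-Z]+/)           # separators are runs of capitals, so m[1] holds the digits
    split($2, n, /[0-9]+/)           # separators are runs of digits: n[1] is empty, n[2] holds the letters
    print "m[1]=" m[1], "n[2]=" n[2]
  }'
}
```

For example, split_demo "OMAK 000002EXCLUDE 1341" prints m[1]=000002 n[2]=EXCLUDE, which is why the posted code builds its key from $1, n[2] and m[1].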
#5
awk - counting number of similar lines
Hi,

Thanks for the information provided. I read the source code that you have provided. For example, I have the data below:

SIZEC000002EXCLUDE 1341
SIZEC000002EXCLUDE 1341
SIZEC000002EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000003EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000004EXCLUDE 1341
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1341

I have two questions.

a] What is the purpose of these statements if the input is the data above?

split($2, m, /[A-Z]+/)
split($2, n, /[0-9]+/)

Here $2 will not contain any letters. Or is it necessary to have both m and n?

b] If I have the data below:

SIZEC000004EXCLUDE 1380
SIZEC000004EXCLUDE 1382
SIZEC000005EXCLUDE 1340
SIZEC000005EXCLUDE 1341
SIZEC000005EXCLUDE 1342

I want to group the data like this:

SIZEC000004EXCLUDE 1380 1382
SIZEC000005EXCLUDE 1340 1341 1342

Does awk have any standard function to do this?

Regards,
Dhana
#6
Use an array indexed by $1, and append $2 to it as you process each line.
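Something along these lines, as a minimal sketch: it assumes two whitespace-separated fields per line, as in question b], and the function name group_values is made up. The extra order array is only there to print the keys in the order they first appear, since for (k in array) gives no guaranteed order.

```shell
# Hypothetical helper: collect every $2 seen for each $1 on one output line.
group_values() {
  awk '
  {
    if ($1 in vals) {
      vals[$1] = vals[$1] " " $2     # key seen before: append this value
    } else {
      order[++n] = $1                # remember first-seen order of keys
      vals[$1] = $2
    }
  }
  END {
    for (i = 1; i <= n; i++) print order[i], vals[order[i]]
  }' "$@"
}
```

Run as group_values file, it prints one line per distinct first field followed by all of its second-field values.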
#7
Awk - Grouping Lines
Hi All,

I have the input file as:

INFOR00028114 GRAINS BAKERY 4000
INFOR00028114 GRAINS BAKERY 4000
INFOR00028114 GRAINS BAKERY 4000
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2672
INFOR0009183-RIVERS - IC 2671

I want the output like:

BRAND 14 GRAINS BAKERY 000281 3 4000
BRAND 3-RIVERS - IC 000918 1 2671
BRAND 3-RIVERS - IC 000918 3 2672
BRAND 5 STAR 001972 2 3618

The layout would be:

position 1-5 for NAME1
position 6-11 for NAME2
position 12-41 for NAME3
position 42-46 for NAME4

I framed the logic below, but I am getting this output:

BRAND 14 GRAINS BAKERY 000281 3 4000
BRAND 3-RIVERS - IC 000918 1 2671
BRAND 5 STAR 001972 2 3618

which is not what I expected.

Code:
awk '{
  c[$0]++
  a=substr($0,1,5)
  b=substr($0,12,30)
  ff=substr($0,6,6)
  d=substr($0,42,4)
  j[a" "b" "ff]=c[$0]" " d
}END {for(i in j) print i, j[i]}' tes|sort

I am not sure what needs to be changed. Can anyone help me?

Regards,
Dhana