awk, max value, array, row


 
# 1  
Old 07-16-2012

Hello:
I want to print the entire row that has the maximum value in column 3 for each distinct value in column 2. The input file has millions of rows. test.dat:
Code:
Contig1 lcl|1DL 111     155     265     27
Contig2 lcl|1DS 100     73      172     100
Contig3 lcl|1DL 140     698     837     140
Contig3 lcl|6DS 107     1488    1594    1
Contig5 lcl|6DL 193     59      251     374
Contig5 lcl|4DS 119     1       119     119
Contig5 lcl|6DL 107     145     251     596
Contig6 lcl|6DS 153     90      242     674
Contig7 lcl|4DL 103     913     1015    6590
Contig7 lcl|6DL 107     1016    1122    1152
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DS 244     3711    3954    1
Contig8 lcl|6DS 159     3796    3954    1
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|1DS 109     4069    4177    269

I first tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print i, a[i]}' test.dat

and the output is:
Code:
lcl|4DL 103
lcl|4DS 279
lcl|6DL 194
lcl|6DS 291
lcl|1DL 140
lcl|1DS 109

Since I want to print the whole row that holds the max value for each item, I then tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print  max[i]}' test.dat

and the output is:
Code:
Contig7 lcl|4DL 103     913     1015    6590
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|6DS 159     3796    3954    1
Contig3 lcl|1DL 140     698     837     140
Contig8 lcl|1DS 109     4069    4177    269

Obviously something is wrong with the second script: for example, it prints the 159 row for lcl|6DS instead of the 291 row. I am nervous about running it on millions of rows, but I could not figure it out myself. Thanks in advance!
YT
# 2  
Old 07-16-2012
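The max[$2]=$0 assignment is outside the if-else statement, so it runs for every line. As a result, max[$2] ends up holding the last row seen for each key rather than the row with the largest $3.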
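
One way to apply that fix, as a minimal sketch: put both assignments inside one pair of braces so the row is saved only when a new maximum is found. The file name sample.dat is just an assumption for illustration; it holds a few rows copied from test.dat in the question.

```shell
# A few rows copied from test.dat in the question (sample.dat is a made-up name).
cat > sample.dat <<'EOF'
Contig3 lcl|6DS 107     1488    1594    1
Contig6 lcl|6DS 153     90      242     674
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|6DS 159     3796    3954    1
Contig7 lcl|4DL 103     913     1015    6590
EOF

# Braces group both assignments, so max[$2] changes only together with a[$2].
result=$(awk '{
    if (!($2 in a) || $3 > a[$2]) {
        a[$2] = $3      # largest column-3 value seen so far for this key
        max[$2] = $0    # the full row that carries that value
    }
}
END { for (i in max) print max[i] }' sample.dat | sort)

printf '%s\n' "$result"
```

With these rows, lcl|6DS should come out as the 291 row rather than the 159 row. The sort is only there to make the output order predictable, since for (i in max) visits keys in an unspecified order.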

Regards,
Alister

---------- Post updated at 02:59 PM ---------- Previous update was at 02:55 PM ----------

Also, you don't really need to check whether an array element exists in awk. A reference to an element that does not exist yields an uninitialized value, which is treated as zero in a numeric context and as an empty string in a string context. Your script could be simplified a bit:
Code:
awk '$3 > a[$2] {a[$2]=$3; max[$2]=$0} END {for (i in max) print  max[i]}' test.dat
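
A caveat worth sketching: the simplified form compares against an implicit 0, so it works for data like test.dat where column 3 is always positive, but a key whose values are all zero or negative would never be stored. The data below is hypothetical (not from the thread), and the guarded variant is only one possible way to handle that case.

```shell
# Hypothetical data, not from the thread: key B is all negative, key C is zero.
cat > neg.dat <<'EOF'
r1 A 5
r2 B -3
r3 A 9
r4 B -7
r5 C 0
EOF

# Simplified form: B and C are dropped, because -3, -7 and 0 are not
# greater than the implicit 0 of the not-yet-assigned a[$2].
simple=$(awk '$3 > a[$2] {a[$2]=$3; max[$2]=$0} END {for (i in max) print max[i]}' neg.dat | sort)

# Guarded variant: the membership test runs first, so the first row of each
# key is always stored; $3+0 forces a numeric comparison.
guarded=$(awk '!($2 in a) || $3+0 > a[$2] {a[$2]=$3+0; max[$2]=$0} END {for (i in max) print max[i]}' neg.dat | sort)

printf 'simple:\n%s\n' "$simple"
printf 'guarded:\n%s\n' "$guarded"
```

Here the simplified form prints only the row for A, while the guarded variant prints one row per key. Both keep the first row on ties because the comparison is a strict >.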

Regards,
Alister