awk, max value, array, row
Posted by yifangt on Monday 16th of July 2012 02:42:35 PM

Hello:
I want to print the entire row that has the maximum value in column 3 for each distinct value of column 2. The input file has millions of rows. A sample, test.dat:
Code:
Contig1 lcl|1DL 111     155     265     27
Contig2 lcl|1DS 100     73      172     100
Contig3 lcl|1DL 140     698     837     140
Contig3 lcl|6DS 107     1488    1594    1
Contig5 lcl|6DL 193     59      251     374
Contig5 lcl|4DS 119     1       119     119
Contig5 lcl|6DL 107     145     251     596
Contig6 lcl|6DS 153     90      242     674
Contig7 lcl|4DL 103     913     1015    6590
Contig7 lcl|6DL 107     1016    1122    1152
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DS 244     3711    3954    1
Contig8 lcl|6DS 159     3796    3954    1
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|1DS 109     4069    4177    269

I first tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print i, a[i]}' test.dat

and the output is:
Code:
lcl|4DL 103
lcl|4DS 279
lcl|6DL 194
lcl|6DS 291
lcl|1DL 140
lcl|1DS 109

Since I want to print the whole row holding the maximum value for each item, I then tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print  max[i]}' test.dat

and the output is:
Code:
Contig7 lcl|4DL 103     913     1015    6590
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|6DS 159     3796    3954    1
Contig3 lcl|1DL 140     698     837     140
Contig8 lcl|1DS 109     4069    4177    269

Obviously something is wrong with the second script. I am nervous about running it on millions of rows, but I could not figure out the problem myself. Thanks in advance!
YT
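
One possible reading of the problem: in the second script, `max[$2]=$0` comes after the `if ... else if ...` statement has ended, so it runs for every record and `max` ends up holding the last row seen for each column-2 value, not the row with the largest column 3. Below is a minimal sketch of a fix, assuming whitespace-separated fields and that the first row should win on ties. The names `max_rows`, `best` and `row` are illustrative only, and `+ 0` is added to force a numeric comparison.

```shell
# Recreate the sample input from the post.
cat > test.dat <<'EOF'
Contig1 lcl|1DL 111     155     265     27
Contig2 lcl|1DS 100     73      172     100
Contig3 lcl|1DL 140     698     837     140
Contig3 lcl|6DS 107     1488    1594    1
Contig5 lcl|6DL 193     59      251     374
Contig5 lcl|4DS 119     1       119     119
Contig5 lcl|6DL 107     145     251     596
Contig6 lcl|6DS 153     90      242     674
Contig7 lcl|4DL 103     913     1015    6590
Contig7 lcl|6DL 107     1016    1122    1152
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DS 244     3711    3954    1
Contig8 lcl|6DS 159     3796    3954    1
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|1DS 109     4069    4177    269
EOF

# max_rows (hypothetical helper name): for each value of column 2,
# print the row whose column 3 is numerically largest.
max_rows() {
  awk '
    # Skip lines with fewer than 3 fields; update both arrays together,
    # and only when the key is new or this row beats the stored maximum.
    NF >= 3 && (!($2 in best) || $3 + 0 > best[$2] + 0) {
        best[$2] = $3     # largest column-3 value seen so far for this key
        row[$2]  = $0     # the complete row that carries that value
    }
    END { for (k in row) print row[k] }
  ' "$@"
}

# for-in order is unspecified in awk, so sort by column 2 for stable output.
max_rows test.dat | sort -k2,2
```

On the sample data this prints one row per column-2 value, and the `lcl|6DS` line is the `291` row where the second script printed the `159` row. Memory use grows with the number of distinct column-2 values, not with the number of rows, so a file with millions of rows should be manageable as long as the set of keys is modest.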
 
