Post 302672569 by yifangt on Monday 16th of July 2012 02:42:35 PM, in Shell Programming and Scripting
awk, max value, array, row

Hello:
I want to print the entire row that has the maximum value in column 3 for each distinct value in column 2. The input file has millions of rows. Sample input, test.dat:
Code:
Contig1 lcl|1DL 111     155     265     27
Contig2 lcl|1DS 100     73      172     100
Contig3 lcl|1DL 140     698     837     140
Contig3 lcl|6DS 107     1488    1594    1
Contig5 lcl|6DL 193     59      251     374
Contig5 lcl|4DS 119     1       119     119
Contig5 lcl|6DL 107     145     251     596
Contig6 lcl|6DS 153     90      242     674
Contig7 lcl|4DL 103     913     1015    6590
Contig7 lcl|6DL 107     1016    1122    1152
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DS 244     3711    3954    1
Contig8 lcl|6DS 159     3796    3954    1
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|1DS 109     4069    4177    269

I first tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print i, a[i]}' test.dat

and the output is:
Code:
lcl|4DL 103
lcl|4DS 279
lcl|6DL 194
lcl|6DS 291
lcl|1DL 140
lcl|1DS 109

Since I want to print the whole row that holds the maximum for each item, I then tried:
Code:
awk '{if(! ($2 in a)) a[$2]=$3; else if($3 > a[$2]) a[$2]=$3; max[$2]=$0} END {for (i in max) print  max[i]}' test.dat

and the output is:
Code:
Contig7 lcl|4DL 103     913     1015    6590
Contig8 lcl|4DS 279     2594    2872    279
Contig8 lcl|6DL 194     3237    3430    194
Contig8 lcl|6DS 159     3796    3954    1
Contig3 lcl|1DL 140     698     837     140
Contig8 lcl|1DS 109     4069    4177    269

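Obviously something is wrong with the second script: the rows it prints do not always contain the maximum (for example, lcl|6DS is shown with 159 instead of 291). I am nervous about running it on millions of rows, but I could not figure out the problem myself. Thanks in advance!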
YT
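
A possible explanation and fix, offered as a sketch rather than a definitive answer: in the second script, max[$2]=$0 comes after the if/else as a separate statement, so it runs for every input line and max ends up holding the last row seen for each key, not the row with the maximum. Updating the stored value and the stored row together inside one braced block avoids that. The array names best and row and the file name sample.dat are only illustrative, and the sample is a subset of test.dat.

```shell
# Hypothetical sample file: a subset of the rows from test.dat.
cat > sample.dat <<'EOF'
Contig3 lcl|6DS 107     1488    1594    1
Contig6 lcl|6DS 153     90      242     674
Contig8 lcl|6DS 291     2700    2990    291
Contig8 lcl|6DS 159     3796    3954    1
Contig5 lcl|6DL 193     59      251     374
Contig8 lcl|6DL 194     3237    3430    194
EOF

# Store the maximum and its row in the same block, so they always change together.
# "+ 0" forces a numeric comparison; on ties the first row seen is kept.
result=$(awk '
    !($2 in best) || $3 + 0 > best[$2] {
        best[$2] = $3 + 0
        row[$2]  = $0
    }
    END { for (k in row) print row[k] }
' sample.dat | sort -k2,2)

printf '%s\n' "$result"
```

The order of a for (k in row) loop is not specified in awk, which is why the output is piped through sort. The script keeps only one row per distinct column-2 value in memory, so memory use grows with the number of groups rather than with the number of input rows.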
 

Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.