Min/max/total for selected columns based on first column as ID


 
# 1  

For every ID in column one, I want the min and max of columns 3, 5 and 6, plus the total of column 6 and the number of records.


The values in those columns are not sorted; my attempt below works only if they are sorted.


input file
Code:
2010  1  44413               41105.0 21.75 146  
2010  1  44415               41105.0 21.75 146  
2010  1  44417               41105.0 21.75 100  
2010  1  44419               41105.0 28.00 146 
2010  1  50000               41105.0 21.75 200  
2010  1  44423               41105.0 21.75 146  
2011  1  44425               41105.0 21.75 146  
2011  1  44427               41105.0 20.00 146  
2011  1  70000               41105.0 21.75 146  
2011  1  44433               41105.0 29.00 700  
2011  1  44435               41105.0 21.00 146  
2012  1  44437               41105.0 21.75 146  
2012  1  20000               41105.0 21.75 150  
2012  1  44441               41105.0 21.75 146  
2012  1  90000               41105.0 21.75 146  
2012  1  44445               41105.0 21.75 600  
2012  1  44447               41105.0 21.75 146  
2012  1  44447               41105.0 21.75 146

attempt

Code:
 awk '{ currKey = $1 }
    currKey != prevKey { prt(); min=$3;min1=$5;min2=$6;cnt=0}
    { prevRec=$0; prevKey=currKey; max=$3;max1=$5;max2=$6; cnt++ }
      { prevKey=currKey; TOTAL+=$6}    
    END { prt() }
    function prt(   f) {
        if ( cnt ) {
        split(prevRec,f)
        print f[1],min,max,min1,max1,min2,max2,TOTAL, cnt
        }
    }' file | column -t

output from attempt
Code:
2010  44413  44423  21.75  21.75  146  146  884   6
2011  44425  44435  21.75  21.00  146  146  2668  5
2012  44437  44447  21.75  21.75  146  146  4002  7

output desired
Code:
2010  44413  50000  21.75  28.00  100  200  884   6
2011  44425  70000  20.00  29.00  146  700  1284  5
2012  20000  90000  21.75  21.75  146  600  1480  7

Appreciate your support

Last edited by jiam912; 03-27-2020 at 07:14 PM..
# 2  
I get the results using datamash, but I would like to get the same with awk.

Code:
datamash -W -g1 min 3 max 3 min 5 max 5 min 6 max 6 sum 6 count 1 < file


Code:
2010    44413    50000    21.75   28      100    200    884    6
2011    44425    70000    20      29      146    700    1284    5
2012    20000    90000    21.75   21.75   146    600    1480    7

# 3  
Hi
Maybe this will be a little clearer
Code:
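awk '
# First record for an ID: seed every min and max with the values of that record
!A9[$1]         {A2[$1]=$3;A3[$1]=$3;A4[$1]=$5;A5[$1]=$5;A6[$1]=$6;A7[$1]=$6}
A2[$1]>$3       {A2[$1]=$3}     # min of column 3
A3[$1]<$3       {A3[$1]=$3}     # max of column 3
A4[$1]>$5       {A4[$1]=$5}     # min of column 5
A5[$1]<$5       {A5[$1]=$5}     # max of column 5
A6[$1]>$6       {A6[$1]=$6}     # min of column 6
A7[$1]<$6       {A7[$1]=$6}     # max of column 6
                {A8[$1]+=$6; A9[$1]++}  # total of column 6, record count
END             {for(i in A9) print i,A2[i],A3[i],A4[i],A5[i],A6[i],A7[i],A8[i],A9[i]}
' OFS='\t' file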

# 4  
Slight optimization
Code:
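awk '
# !A9[$1]++ is true only on the first record of an ID and counts records as a side effect
!A9[$1]++       {A2[$1]=$3;A3[$1]=$3;A4[$1]=$5;A5[$1]=$5;A6[$1]=$6;A7[$1]=$6}
A2[$1]>$3       {A2[$1]=$3}     # min of column 3
A3[$1]<$3       {A3[$1]=$3}     # max of column 3
A4[$1]>$5       {A4[$1]=$5}     # min of column 5
A5[$1]<$5       {A5[$1]=$5}     # max of column 5
A6[$1]>$6       {A6[$1]=$6}     # min of column 6
A7[$1]<$6       {A7[$1]=$6}     # max of column 6
                {A8[$1]+=$6}    # total of column 6
END             {for(i in A9) print i,A2[i],A3[i],A4[i],A5[i],A6[i],A7[i],A8[i],A9[i]}
' OFS='\t' file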
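
One caveat with the array-based scripts above: awk does not define the order in which `for (i in A9)` visits the IDs, so the result lines may come out in any order. Below is a minimal sketch of one way to restore ID order, by piping the result through `sort -n`. The file `sample.txt`, the wrapper function `summarize` and the array names are made up for illustration and are not part of the original posts.

```shell
# Hypothetical sample data, deliberately not grouped by ID
cat > sample.txt <<'EOF'
2011  1  44425  41105.0 21.75 146
2010  1  44413  41105.0 21.75 146
2011  1  70000  41105.0 29.00 700
2010  1  50000  41105.0 28.00 200
2010  1  44417  41105.0 21.75 100
EOF

# summarize FILE: per-ID min/max of columns 3, 5, 6, total of column 6, record count
summarize() {
    awk '
    # First record of an ID: seed min and max with its own values
    !cnt[$1]++  {mn3[$1]=$3; mx3[$1]=$3; mn5[$1]=$5; mx5[$1]=$5; mn6[$1]=$6; mx6[$1]=$6}
    $3<mn3[$1]  {mn3[$1]=$3}
    $3>mx3[$1]  {mx3[$1]=$3}
    $5<mn5[$1]  {mn5[$1]=$5}
    $5>mx5[$1]  {mx5[$1]=$5}
    $6<mn6[$1]  {mn6[$1]=$6}
    $6>mx6[$1]  {mx6[$1]=$6}
                {sum6[$1]+=$6}
    END         {for (id in cnt) print id, mn3[id], mx3[id], mn5[id], mx5[id], mn6[id], mx6[id], sum6[id], cnt[id]}
    ' OFS='\t' "$1" | sort -n   # for-in order is unspecified, so sort by ID afterwards
}

summarize sample.txt
```

Sorting the few summary lines is cheap compared with sorting the whole input, which is why the sort sits after awk here.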

# 5  
I understand your idea. Your list is sorted by ID and you process one ID at a time, presumably because the file is too large for the memory available. Note that the script I proposed earlier also handles an unsorted list, and its memory use is limited only by what the system allows. The version below, however, works only for a list sorted by ID:
Code:
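awk '
# New ID: print the finished group (except before the first one), then seed the new group
a != $1         {if(NR!=1)print a,a2,a3,a4,a5,a6,a7,a8,a9
                 a=$1; a2=$3; a3=$3; a4=$5; a5=$5; a6=$6; a7=$6; a8=0; a9=0
                }
a2>$3           {a2=$3}         # min of column 3
a3<$3           {a3=$3}         # max of column 3
a4>$5           {a4=$5}         # min of column 5
a5<$5           {a5=$5}         # max of column 5
a6>$6           {a6=$6}         # min of column 6
a7<$6           {a7=$6}         # max of column 6
                {a8+=$6; a9++}  # total of column 6, record count
END             {print a,a2,a3,a4,a5,a6,a7,a8,a9}
' OFS='\t' file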
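
If the input is not already grouped by ID, the one-ID-at-a-time approach can still be used by sorting on column 1 first. A minimal sketch, assuming a hypothetical `unsorted.txt` and a hypothetical wrapper function `stream_summary` (neither appears in the original posts):

```shell
# Hypothetical sample data with the IDs interleaved
cat > unsorted.txt <<'EOF'
2011  1  44425  41105.0 21.75 146
2010  1  44413  41105.0 21.75 146
2011  1  70000  41105.0 29.00 700
2010  1  50000  41105.0 28.00 200
EOF

# stream_summary FILE: sort by ID, then let awk keep only the current ID in memory
stream_summary() {
    sort -k1,1n "$1" | awk '
    NR==1 || id != $1 {
        # A new ID starts: flush the previous group, then seed the new one
        if (NR > 1) print id, mn3, mx3, mn5, mx5, mn6, mx6, sum6, cnt
        id=$1; mn3=$3; mx3=$3; mn5=$5; mx5=$5; mn6=$6; mx6=$6; sum6=0; cnt=0
    }
    $3<mn3  {mn3=$3}
    $3>mx3  {mx3=$3}
    $5<mn5  {mn5=$5}
    $5>mx5  {mx5=$5}
    $6<mn6  {mn6=$6}
    $6>mx6  {mx6=$6}
            {sum6+=$6; cnt++}
    END     {if (NR) print id, mn3, mx3, mn5, mx5, mn6, mx6, sum6, cnt}
    ' OFS='\t'
}

stream_summary unsorted.txt
```

The trade-off is that `sort` has to read the whole file before awk sees the first line, so this only pays off when the number of distinct IDs is too large for the array-based script.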


Last edited by nezabudka; 03-28-2020 at 08:45 AM..
# 6  
Hi nezabudka.


The code works perfectly, thanks a lot for your help. Yes, as you mentioned, the file is sorted by ID (column 1) and we process one ID at a time.
