Sum based on certain column Post: 302834793

Post 302834793 by Don Cragun on Saturday 20th of July 2013 12:48:38 AM

You could try something like the following:

Code:

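awk '
BEGIN { FS = "[ /\t]+"          # split on runs of spaces, slashes, and tabs
        OFS = "    "            # four spaces between output columns
        s = " "                 # temporary separator between date parts
}
# Fields 1-3 are month, day, year; field 5 is the value; field 8 is column 6.
{       v[$1 s $2 s $3 OFS $8] += $5 }
# The precision passed to %.*f depends on the magnitude of each sum.
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}' file1 | sort -k3n,3 -k1n,1 -k2n,2 -k4,4 | sed 's# #/#;s# #/#'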

As always, if you are going to run this on a Solaris/SunOS system, use /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk instead of /usr/bin/awk or /bin/awk.
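If you want one script to run on both Solaris and other systems, something along these lines might work. This is only a sketch: the AWK and nf variable names are mine, the sample line is made up, and I am assuming nawk lives in /usr/bin on your Solaris release.

```shell
# Default to whatever awk is first in PATH.
AWK=awk

# On Solaris/SunOS, prefer a standards-conforming awk if one is installed.
# (The /usr/bin/nawk location is an assumption; adjust for your system.)
if [ "$(uname -s)" = SunOS ]; then
    for candidate in /usr/xpg4/bin/awk /usr/xpg6/bin/awk /usr/bin/nawk; do
        if [ -x "$candidate" ]; then
            AWK=$candidate
            break
        fi
    done
fi

# Quick check with a made-up line: the date splits into three fields,
# so a six-column line should be reported as eight fields.
nf=$(printf '%s\n' '1/1/2013 a 1 b c X1' |
    "$AWK" 'BEGIN { FS = "[ /\t]+" } { print NF }')
printf 'using %s, NF=%s\n' "$AWK" "$nf"
```

You would then write "$AWK" wherever the script above says awk.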

Because awk uses sequences of spaces, slashes, and tabs as field separators, the date field is split into month, day, and year fields as each input line is read. The v[] array accumulates the sum of the values in column 3 (field 5 after the date is split). Its subscript is the month, a space, the day of the month, a space, the year, four spaces, and the contents of the 6th column (field 8 after the date is split). The END clause prints each subscript along with the sum accumulated for it.
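To see the field splitting and the array subscript for yourself, you could feed awk a single line. The line below is hypothetical (your file1 is not shown in this post); a, b, and c are just placeholders for the columns the script ignores, and split_demo is a name I made up.

```shell
# Hypothetical input line: the real file1 is not shown in this post.
# Columns: date, filler, value to sum, filler, filler, column 6.
line='1/2/2013 a 844.25 b c X1'

split_demo=$(printf '%s\n' "$line" | awk '
BEGIN { FS = "[ /\t]+"; OFS = "    "; s = " " }
{
        # After splitting, the date occupies fields 1-3.
        printf("NF=%d value=%s\n", NF, $5)
        # Same subscript expression as in the script above.
        printf("key=[%s]\n", $1 s $2 s $3 OFS $8)
}')
printf '%s\n' "$split_demo"
# NF=8 value=844.25
# key=[1 2 2013    X1]
```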

Replacing the slashes in the date field with spaces lets the sort command order the output produced by awk by the numeric components of the date (year, then month, then day) and then by the alphanumeric contents of the input file's 6th column. After sorting, the sed command converts the first two spaces on each output line back to slashes, restoring the date field to its original format.
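Here is a small sketch of just the sort and sed stages, run on made-up lines in the format the awk step prints (the sums and the variable name sorted are invented for illustration). Note that the numeric keys put day 10 after day 2, and that 2012 sorts ahead of 2013 even though its month is 12.

```shell
# Made-up lines shaped like the awk output: "month day year    label    sum".
sorted=$(printf '%s\n' \
        '1 2 2013    X2    833.93' \
        '1 10 2013    X1    5.00' \
        '1 2 2013    X1    844.30' \
        '12 31 2012    X1    1.00' |
    sort -k3n,3 -k1n,1 -k2n,2 -k4,4 |   # year, month, day, then label
    sed 's# #/#;s# #/#')                # first two spaces back to slashes
printf '%s\n' "$sorted"
# 12/31/2012    X1    1.00
# 1/2/2013    X1    844.30
# 1/2/2013    X2    833.93
# 1/10/2013    X1    5.00
```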

The above script produces the output you said you wanted in the 1st message in this thread, except that some of the sums (marked in red in the original post) were rounded differently than in your example:

Code:

1/1/2013    X1    1012.909698
1/1/2013    X2    600.8333588
1/2/2013    X1    844.2973022
1/2/2013    X2    833.9300537
1/3/2013    X1    563.6917419
1/3/2013    X2    632.0749969
1/4/2013    X1    48.33055687

Note that the log() calculations in the awk printf statement are there to produce the varying number of decimal places you showed in your desired output (each sum shown above has ten significant digits). That printf statement could be simplified if you were willing to accept a constant number of digits after the decimal point in the printed sums.
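As a rough illustration, the first command below prints the number of decimal places the log() expression picks for three of the sums shown above, and the second shows what a simplified END clause with a constant six decimal places might look like. The input lines for the second command are invented, and digits and fixed are names I made up. Keep in mind that log() is only meaningful for positive sums, so the variable-precision form assumes every sum is greater than zero.

```shell
# Decimal places chosen by 9 - int(log(x) / log(10)) for three sums.
digits=$(awk 'BEGIN {
        n = split("1012.909698 600.8333588 48.33055687", x, " ")
        for (k = 1; k <= n; k++)
                printf("%s -> %d decimal places\n",
                        x[k], 9 - int(log(x[k]) / log(10)))
}')
printf '%s\n' "$digits"
# 1012.909698 -> 6 decimal places
# 600.8333588 -> 7 decimal places
# 48.33055687 -> 8 decimal places

# Simplified variant: always six digits after the decimal point.
# The two input lines are hypothetical.
fixed=$(printf '%s\n' '1/1/2013 a 500.25 b c X1' '1/1/2013 a 512.5 b c X1' |
awk '
BEGIN { FS = "[ /\t]+"; OFS = "    "; s = " " }
{       v[$1 s $2 s $3 OFS $8] += $5 }
END {   for (i in v) printf("%s%s%.6f\n", i, OFS, v[i]) }')
printf '%s\n' "$fixed"
# 1 1 2013    X1    1012.750000
```

The output of the simplified variant would still go through the same sort and sed stages as before.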

Alternatively, you could split the date field, sort the input into the desired output order, rebuild the date field in the sorted input, and then use the procedures outlined in the thread bakunin referenced. I haven't made any attempt to compare the efficiency of these alternative approaches.
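A sort-first pipeline might look roughly like the sketch below. The referenced thread is not reproduced here, so the last stage is my assumption about the general idea (sum each run of consecutive lines that share a key), not necessarily the procedure from that thread. The input lines and the name sorted_sum are made up, and for brevity the date is rebuilt inside the final awk step rather than in a separate pass.

```shell
sorted_sum=$(printf '%s\n' \
        '1/2/2013 a 3.5 b c X1' \
        '1/1/2013 a 1.25 b c X2' \
        '1/2/2013 a 4.5 b c X1' \
        '1/1/2013 a 2.5 b c X1' |
    sed 's#/# #;s#/# #' |                  # split the date into three fields
    sort -k3n,3 -k1n,1 -k2n,2 -k8,8 |      # year, month, day, then column 6
    awk '
    { key = $1 "/" $2 "/" $3 "    " $8 }   # rebuild the date inside the key
    # A new key means the previous run is complete: print it and reset.
    NR > 1 && key != prev { printf("%s    %.6f\n", prev, sum); sum = 0 }
    { sum += $5; prev = key }
    END { if (NR > 0) printf("%s    %.6f\n", prev, sum) }')
printf '%s\n' "$sorted_sum"
# 1/1/2013    X1    2.500000
# 1/1/2013    X2    1.250000
# 1/2/2013    X1    8.000000
```

With this arrangement awk only has to remember one running sum at a time instead of an array entry for every date and label pair.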

Hope this helps,
Don


