Sum based on certain column
Post 302834869 by bakunin on Saturday 20th of July 2013 07:19:01 AM
Quote:
Originally Posted by radius
but i would like to learn the master code of your Mr Don..eager to learn
That is a laudable attitude.

Quote:
Originally Posted by Don Cragun
Code:
awk '
BEGIN { FS = "[ /\t]+"
        OFS = "    "
        s = " "
}
{       v[$1 s $2 s $3 OFS $8] += $5 }
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}' file1 | sort -k3n,3 -k1n,1 -k2n,2 -k4,4 | sed 's# #/#;s# #/#'

First, to appreciate what each part of the above command (actually a pipeline of three different commands) does, you might want to redirect the output of each part into a file, examine it, and then run that file through the next step to see what that step does. I suggest you use a small input file so that the output is easy to survey and any changes are easy to spot. You can even use several slightly altered versions of an input file to see how each alteration affects the outcome.

In short: these are only files, which you can copy as often as you like, so play around.

Code:
awk '
BEGIN { FS = "[ /\t]+"
        OFS = "    "
        s = " "
}
{       v[$1 s $2 s $3 OFS $8] += $5 }
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}' file1 > tempfile1

sort -k3n,3 -k1n,1 -k2n,2 -k4,4 tempfile1 > tempfile2

sed 's# #/#;s# #/#' tempfile2 > tempfile3
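A variant that keeps the pipeline in one piece uses tee, which copies its input to a file and passes it on unchanged, so every intermediate stage ends up in a file you can inspect. This is only a sketch under assumptions: the file names sample.txt and stage1.txt to stage3.txt and the four input lines are made up. The lines just follow the layout that the field numbers in Don's script imply (an M/D/Y date, the value to sum in field 5 and the grouping key in field 8 after splitting).

```shell
#!/bin/sh
# Hypothetical sample input: after splitting on [ /\t]+ the date gives
# fields 1-3, field 5 is the value to sum and field 8 is the grouping key.
cat > sample.txt <<'EOF'
7/20/2013 in 1.5 a b north
7/3/2013 in 2 a b north
7/20/2013 in 2.5 a b north
12/1/2012 in 4 a b south
EOF

# Don's pipeline unchanged, except that tee saves each stage's output.
awk '
BEGIN { FS = "[ /\t]+"
        OFS = "    "
        s = " "
}
{       v[$1 s $2 s $3 OFS $8] += $5 }
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}' sample.txt | tee stage1.txt |
    sort -k3n,3 -k1n,1 -k2n,2 -k4,4 | tee stage2.txt |
    sed 's# #/#;s# #/#' > stage3.txt

cat stage3.txt
```

stage1.txt comes out in whatever order awk happens to walk its array, stage2.txt is sorted by date, and stage3.txt has the slashes back in the dates.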

Let us start with the last part. "sed" is a non-interactive text editor: it is given a script describing the changes to make to a text file and then applies them. Here, the script contains two change rules:

Code:
s# #/#
s# #/#

These are substitution rules: each one searches for the pattern given in the first part and replaces it with what is given in the second part:

Code:
s<delimiter><pattern-to-search-for><delimiter><replacement><delimiter>

Usually "/" is used as the delimiter, but Don's replacement text is itself a "/", so with "/" as the delimiter he would have had to escape it (s/ /\//). Therefore he went for "#". Each rule replaces a space character with a "/". The rule is there twice because by default a substitution only replaces the first occurrence on each line, and he wanted to change the first two spaces (those between month, day and year).
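To see the two rules at work, and that the "#" delimiter is only a matter of convenience, here is a small sketch. The function names and the sample line (shaped like the output of the awk step) are made up for illustration.

```shell
#!/bin/sh
# Don's rules: '#' as delimiter, so the '/' in the replacement needs no escaping.
slash_date_hash() { sed 's# #/#;s# #/#'; }

# The same rules with the usual '/' delimiter: the '/' in the replacement must be written '\/'.
slash_date_slash() { sed 's/ /\//;s/ /\//'; }

# A line as it comes out of the awk step: date parts still separated by single spaces.
line='7 20 2013    north    4.000000000'

printf '%s\n' "$line" | slash_date_hash    # 7/20/2013    north    4.000000000
printf '%s\n' "$line" | slash_date_slash   # same result
printf '%s\n' "$line" | sed 's# #/#g'      # 7/20/2013////north////4.000000000
```

The last line shows why the "g" (global) flag would not do here: it would also replace the spaces of the output separator.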

Code:
sort -k3n,3 -k1n,1 -k2n,2 -k4,4 tempfile1 > tempfile2

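This sorts the output. I suggest you read the man pages of all the commands used, but the man page of sort will explain most of this one. At this point the date is still three space-separated fields (month, day, year); the slashes are only put back by sed afterwards. By default sort splits a line into fields at blanks, and he constructs a sorting key for the date from these fields. As the date format is "M/D/Y", he first sorts on the year (field 3), then on the month (field 1), then on the day (field 2). Only then does he sort on field 4. "-k3n,3" means: a key that starts and ends at field 3 and is compared numerically, so that e.g. day 3 sorts before day 20. All but the last key are compared numerically.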
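A quick way to see why the numeric keys matter is to compare a plain sort with Don's keys. The four lines below are hypothetical awk-stage output; setting LC_ALL=C is my addition, only to make the plain sort behave the same everywhere.

```shell
#!/bin/sh
# Don's keys: year (field 3), month (field 1), day (field 2) numerically, then field 4.
# LC_ALL=C is added here only to make the comparison locale-independent.
by_date() { LC_ALL=C sort -k3n,3 -k1n,1 -k2n,2 -k4,4; }

# Hypothetical awk-stage output: "month day year    key    sum".
data='7 20 2013    north    4.000000000
1 5 2014    north    1.000000000
12 1 2012    south    4.000000000
7 3 2013    north    2.000000000'

echo "plain sort (character by character):"
printf '%s\n' "$data" | LC_ALL=C sort   # 1 5 2014 comes first, and day 20 before day 3
echo "sorted by date:"
printf '%s\n' "$data" | by_date         # 12 1 2012, 7 3 2013, 7 20 2013, 1 5 2014
```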

Finally, the core piece: a really elegant awk script, which consists of three parts.

Code:
BEGIN { FS = "[ /\t]+"
        OFS = "    "
        s = " "
}
{       v[$1 s $2 s $3 OFS $8] += $5 }
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}

awk processes input files line by line. The middle part:

Code:
{       v[$1 s $2 s $3 OFS $8] += $5 }

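is what is executed for every line of the input file. It adds the value of field 5 (the field to be summed) to an element of the associative array "v", using the key values as the array index: the three date fields joined by single spaces, followed by OFS and field 8. An element that has not been assigned yet counts as 0, so lines with identical key values get summed automatically.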
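Stripped down to the summing idea alone, the middle part boils down to the following sketch. The two-column input (a name and an amount per line) and the function name are hypothetical; the key here is simply field 1 instead of Don's date-plus-field-8 combination.

```shell
#!/bin/sh
# Sum column 2 for each distinct value of column 1 (hypothetical two-column input).
sum_by_first_field() {
    awk '
        { sum[$1] += $2 }                        # an unset element starts at 0
        END { for (k in sum) print k, sum[k] }   # visiting order is unspecified
    '
}

printf 'north 1.5\nsouth 4\nnorth 2.5\n' | sum_by_first_field | sort
# north 4
# south 4
```

The trailing sort is there for the same reason Don pipes into sort: awk makes no promise about the order of "for (k in ...)".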

The first part:

Code:
BEGIN { FS = "[ /\t]+"
        OFS = "    "
        s = " "
}

is executed once, before the first line of the input file is read. It sets up the "Field Separator" (FS), the "Output Field Separator" (OFS) and a variable "s", which holds a single space. When you use "$1" (field 1) or "$2" (field 2) in an awk script, awk has to be told how to separate field 1 from field 2. It does so by splitting the input line wherever the field separator matches. By default this is a space, meaning any run of blanks. Don redefines it here as the regular expression "[ /\t]+", i.e. any run of spaces, slashes or tabs, so that the month, day and year of a date like 7/20/2013 become separate fields and "field" is what you said it should be.
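Here is a small sketch comparing the default separator with Don's on one made-up input line. The function names and the line itself are hypothetical; the line only has the layout the field numbers in the script suggest.

```shell
#!/bin/sh
# Hypothetical input line: an M/D/Y date followed by five more fields.
line='7/20/2013 in 1.5 a b north'

# Default FS (a single space): runs of blanks separate fields, so the date stays one field.
fields_default() { awk '{ print NF ": " $1 }'; }

# Don's FS: any run of spaces, slashes or tabs separates fields.
fields_don() { awk 'BEGIN { FS = "[ /\t]+" } { print NF ": " $1, $2, $3, $5, $8 }'; }

printf '%s\n' "$line" | fields_default   # 6: 7/20/2013
printf '%s\n' "$line" | fields_don       # 8: 7 20 2013 1.5 north
```

One difference from the default is worth knowing: with a regular-expression FS, leading blanks are not skipped, so a line that starts with a space gets an empty $1 and every field number shifts by one.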

The last part

Code:
END {   for(i in v)
                printf("%s%s%.*f\n",
                        i, OFS, 9 - int(log(v[i]) / log(10)), v[i])
}

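is executed once, after the last line of the input has been processed. It is a simple for-loop which prints every element of the associative array collected in the middle part: the key, then OFS, then the sum. The "%.*f" format takes its precision from the argument list, and 9 - int(log(v[i]) / log(10)) (log(x) / log(10) being the base-10 logarithm of x) picks it so that each sum is printed with about ten significant digits; the formula is only meaningful for sums greater than zero. Because "for (i in v)" visits the elements in no particular order, the output is piped through sort afterwards.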
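To get a feeling for the precision expression, you can feed single numbers through the same printf call. The function name is made up; most awk implementations accept "*" in printf formats, but if yours complains, check its manual.

```shell
#!/bin/sh
# Format one number the way the END block does: precision = 9 - int(log10(value)).
# Only meaningful for values greater than zero (log is undefined otherwise).
fmt_sum() {
    awk -v x="$1" 'BEGIN { printf("%.*f\n", 9 - int(log(x) / log(10)), x) }'
}

fmt_sum 123.456   # log10 is about 2.09, int() gives 2, precision 7: 123.4560000
fmt_sum 0.5       # int(-0.30) is 0, precision 9: 0.500000000
fmt_sum 12345     # log10 is about 4.09, precision 5: 12345.00000
```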

I hope this helps.

bakunin
 

All times are GMT -4.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.