Sum of a column as new column based on header in a script


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sum of a column as new column based on header in a script
# 1  
Old 11-18-2018
Sum of a column as new column based on header in a script

Hello,

I am trying to store sum of a column as a new column inside a file but have to find the column names dynamically


I/p
Code:
c1,c2,c3,c4,c5
10,20,30,40,50
20,30,40,50,60

If i want to find sum only column c1, c3 and output it as c6,c7
O/p
Code:
c1,c2,c3,c4,c5,c6,c7
10,20,30,40,50,30,70
20,30,40,50,60,30,70


note that i want to do this dynamically what i mean is i do not know the position of the column and i want to do this sum for multiple columns

I can get the total sum of each column like this

Code:
#!/bin/sh
awk -F, '{for(i=1;i<=NF;i++)a[i]+=$i}
        END{for(i=1;i<=NF;i++)printf "%d%s", a[i], (i==NF?"\n":",")}'file

but how can i get the sum of each column as a seperate column in the file especially when i do not know the position of that column.

Thanks.
# 2  
Old 11-18-2018
How about


Code:
awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

Yes, you read correctly, supply the input file twice - once to determine the values, once to print them.
# 3  
Old 11-18-2018
Quote:
Originally Posted by RudiC
How about


Code:
awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

Yes, you read correctly, supply the input file twice - once to determine the values, once to print them.
Nice thanks ,this is a very interesting way to write code but here is the output I am getting when I execute this code

Code:
c1,c2,c3,c4,c5
c1,c2,c3,c4,c5,30,50
10,20,30,40,50,30,50
20,30,40,50,60,30,50

i tried to say

Code:
awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0,MCOL[i]
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

some thing like this but it is printing the whole line ideally this is how i want my out put

Code:
c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,50
20,30,40,50,60,30,50

is this possible. thanks for the input

------ Post updated at 02:13 AM ------

I am also trying something like this but I am pretty sure i am writing the code wrongly as i am getting syntax error. I am trying out the code on ideone so it doesnt print the error out unfortunately.

Please can you point out what is wrong with my approach or syntax here

Code:
#!/usr/bin/sh
col="c1,c3"   
idx=   
sum=0         
{
IFS=, read -ra values <<< "$col"
for v in "${values[@]}"
do
  for i in ${v//,/ } do
     while IFS=, read -r -a line; do
     sum=$(( sum + ${line[$i]} 
	 echo $sum
	 done
  done
done
}<testfile


Thanks.
# 4  
Old 11-19-2018
Quote:
Originally Posted by mkathi
Nice thanks ,this is a very interesting way to write code but here is the output I am getting when I execute this code

Code:
c1,c2,c3,c4,c5
c1,c2,c3,c4,c5,30,50
10,20,30,40,50,30,50
20,30,40,50,60,30,50

i tried to say

Code:
awk -F, '
NR == 1         {for (i=1; i<=NF; i++) if ("," MCOL "," ~ "," $i ",") COL[++CNT] = i
                 print $0,MCOL[i]
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c2" file file

some thing like this but it is printing the whole line ideally this is how i want my out put

Code:
c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,50
20,30,40,50,60,30,50

is this possible. thanks for the input

------ Post updated at 02:13 AM ------

I am also trying something like this but I am pretty sure i am writing the code wrongly as i am getting syntax error. I am trying out the code on ideone so it doesnt print the error out unfortunately.

Please can you point out what is wrong with my approach or syntax here

Code:
#!/usr/bin/sh
col="c1,c3"   
idx=   
sum=0         
{
IFS=, read -ra values <<< "$col"
for v in "${values[@]}"
do
  for i in ${v//,/ } do
     while IFS=, read -r -a line; do
     sum=$(( sum + ${line[$i]} 
	 echo $sum
	 done
  done
done
}<testfile


Thanks.
Obviously, if you're getting syntax errors you're doing something wrong. Since you haven't shown us what errors you're getting, haven't given us comments to explain what your code is trying to do, haven't told us what shell you're using, and haven't told us what operating system you're using; it is hard to suggest how to fix your code to do what you expect it to do.

It is clear that you have a do that is being interpreted as a command argument when you probably intended for it to be interpreted as a keyword. It seems likely that you're trying to read your input file twice in your inner loop but you have only given the nested loops one copy of the file to read.

And, I don't see anything in your code that would produce the output header you say you want.

If we look at the code RudiC suggested (which would not produce second line of the output you showed us at the top of the post I quoted above), it seems obvious to me that he was giving you sample code that would be easy to modify to produce the code you requested in post #1 in this thread (which is not what you asked for in your last post).

Did you try to read and understand his code? If you had trouble understanding it, why didn't you ask questions about the part(s) you didn't understand? Didn't you see that the MCOL variable specifies the headings of the fields that are to be summed and the order in which the sums of those fields are to appear in the output lines?

So his example code gave you the sums of fields with the heading c1 and c2 instead of c1 and c3. A one character change in his sample code would have fixed that for you.

And his example just copied the input file heading line to the output instead of adding the two extra field headings. If we go back to his suggested code and make some minor modifications for your new output request:
Code:
awk -F, '
NR == 1         {printf("%s", $0)
		 for (i=1; i<=NF; i++)
			if ("," MCOL "," ~ "," $i ",") {
				COL[++CNT] = i
				printf(",sum_%s", $i)
			}
                 print ""
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c3" file file

If I run this code with your sample input, I get the output:
Code:
c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,70
20,30,40,50,60,30,70

which seems to be what you said you wanted, but does not match what you have above where the last field in the last two lines of the output is 50 instead of 70. (Maybe RudiC's suggestion of calculating and printing the sums of the 1st and 2nd fields instead of the 1st and 3rd fields was correct???)
These 2 Users Gave Thanks to Don Cragun For This Post:
# 5  
Old 11-19-2018
Quote:
Originally Posted by Don Cragun
Obviously, if you're getting syntax errors you're doing something wrong. Since you haven't shown us what errors you're getting, haven't given us comments to explain what your code is trying to do, haven't told us what shell you're using, and haven't told us what operating system you're using; it is hard to suggest how to fix your code to do what you expect it to do.
Yes I was getting syntax issues which as you mentioned the error I should have posted here but I was trying my code on ideone.com which was not printing where the error is it was not me being lazy.I am using bash shell which is by default on the mentioned website.

Quote:
It is clear that you have a do that is being interpreted as a command argument when you probably intended for it to be interpreted as a keyword. It seems likely that you're trying to read your input file twice in your inner loop but you have only given the nested loops one copy of the file to read.
Code:
#!/usr/bin/sh
col="c1,c3"   
idx=   
sum=0         
{
IFS=, read -ra values <<< "$col" #Here I am trying to read column names from col variavble
for v in "${values[@]}"# looping all the column names i passed in the col variable
do
  for i in ${v//,/ } do# Here since the @ in the above for loop has values like c1,c3 comma seperated i was trying to read one by one
     while IFS=, read -r -a line; do# now read all the values under the column (first c1 then c3)
     sum=$(( sum + ${line[$i]} $sum the values under columns passed
	 echo $sum
	 done
  done
done
}<testfile

Now yes I know my syntax is wrong just was not getting error message but also I am pretty new to programming(Bad defense)

Quote:
And, I don't see anything in your code that would produce the output header you say you want.
Yes I haven't figured that part yet sorry my bad i should have been more precise.

Quote:
If we look at the code RudiC suggested (which would not produce second line of the output you showed us at the top of the post I quoted above), it seems obvious to me that he was giving you sample code that would be easy to modify to produce the code you requested in post #1 in this thread (which is not what you asked for in your last post).

Did you try to read and understand his code? If you had trouble understanding it, why didn't you ask questions about the part(s) you didn't understand? Didn't you see that the MCOL variable specifies the headings of the fields that are to be summed and the order in which the sums of those fields are to appear in the output lines?
I wasn't trying to say RudiC code is wrong. I was getting the above mentioned output on ideone.com now since i am at work i will try it on my terminal and yes MCOL i did try to print it by saying
Code:
print $0,MCOL

or
Code:
print $0,MCOL_sum

instead of
Code:
print $0,MCOL[i]

. The reason I did not ask clarification on his code yet is i am still trying to read through his code and once i do all my research i can come back with answers. Also i was trying to write the same code in my own way which obviously failed miserably.

Quote:
So his example code gave you the sums of fields with the heading c1 and c2 instead of c1 and c3. A one character change in his sample code would have fixed that for you.

And his example just copied the input file heading line to the output instead of adding the two extra field headings. If we go back to his suggested code and make some minor modifications for your new output request:
Code:
awk -F, '
NR == 1         {printf("%s", $0)
		 for (i=1; i<=NF; i++)
			if ("," MCOL "," ~ "," $i ",") {
				COL[++CNT] = i
				printf(",sum_%s", $i)
			}
                 print ""
                }
FNR == 1        {next
                }

NR == FNR       {for (i=1; i<=CNT; i++) SUM[i] += $(COL[i])
                 next
                }
                {for (i=1; i<=CNT; i++) $(NF+1) = SUM[i]
                 print
                }
' OFS="," MCOL="c1,c3" file file

If I run this code with your sample input, I get the output:
Code:
c1,c2,c3,c4,c5,sum_c1,sum_c3
10,20,30,40,50,30,70
20,30,40,50,60,30,70

which seems to be what you said you wanted, but does not match what you have above where the last field in the last two lines of the output is 50 instead of 70. (Maybe RudiC's suggestion of calculating and printing the sums of the 1st and 2nd fields instead of the 1st and 3rd fields was correct???)

Finally Thanks for everything I will keep this feedback in mind in my future posts
# 6  
Old 11-19-2018
Quote:
Originally Posted by mkathi
... I was trying my code on ideone.com which was not printing where the error ...

stderr from your original code on ideone.com :
Code:
./prog.sh: line 10: syntax error near unexpected token `while'
./prog.sh: line 10: `     while IFS=, read -r -a line; do'

... looks like a semicolon is missing in the preceding line.





Quote:
... I wasn't trying to say RudiC code is wrong. I was getting the above mentioned output on ideone.com now since i am at work i will try it on my terminal and yes MCOL i did try to print it by saying
Code:
print $0,MCOL

or
Code:
print $0,MCOL_sum

instead of
Code:
print $0,MCOL[i]

. ...


You should try to make very clear, what variables and arrays you use, and what values they are assigned (if at all), and where this might deviate from what you expect.
# 7  
Old 11-19-2018
Thanks RudiC and Don Cragun. your solutions work elegantly. I am still working through how i can write this code differently for my learning purpose and will update this thread once i am ablr to figure it out.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Sum in file based column

Hi All, I have a file as below and want to sum based on the id in the first column Input 10264;ATE; 12 10265;SES;11 10266AUT;50 10264;ATE;10 10265;SES;13 10266AUT;89 10264;ATE;1 10265;SES;15 10266AUT;78 Output 10264;ATE; 23 10265;SES;39 10266AUT;139 (6 Replies)
Discussion started by: arunkumar_mca
6 Replies

2. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

3. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

4. UNIX for Dummies Questions & Answers

Sum based on certain column

I have file 1 1/1/2013 A 553.0763397 96 16582 X1 X3 X5 X7 X9 1/1/2013 B 600.8333588 195 11992 X2 X3 X6 X7 X9 1/1/2013 B 459.8333588 195 11992 X1 X3 X6 X7 X9 1/2/2013 A 844.2973022 306 19555 X1 ... (12 Replies)
Discussion started by: radius
12 Replies

5. UNIX for Dummies Questions & Answers

Sum based on column 1

i have file input aaa ccc,45567,rterw,1 bbb dcs,564543,hjghgh,1 aaa ccc,454,rterw,6 i want to sum based on column 1 expected output aaa ccc,7 bbb dcs,1 (4 Replies)
Discussion started by: radius
4 Replies

6. Shell Programming and Scripting

Sum Of Column Based On Column Condition

I have a following inputfile MT,AP,CDM,TTML,MUM,GS,SUCC,3 MT,AP,CDM,TTSL,AP,GS,FAIL,9 MT,AP,CDM,RCom,MAH,GS,SUCC,3 MT,AP,CDM,RTL,HP,GS,SUCC,1 MT,AP,CDM,Uni,UPE,GS,SUCC,2 MT,AP,CDM,Uni,MUM,GS,SUCC,2 TTSL,AP,GS,MT,MAH,CDM,SUCC,20 TTML,AP,GS,MT,MAH,CDM,FAIL,10... (2 Replies)
Discussion started by: siramitsharma
2 Replies

7. UNIX for Dummies Questions & Answers

Rename a header column by adding another column entry to the header column name

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (1 Reply)
Discussion started by: Vavad
1 Replies

8. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (4 Replies)
Discussion started by: Vavad
4 Replies

9. Shell Programming and Scripting

Sum a column value based on multiple keys

Hi, I have below as i/p file: 5ABC 36488989 K 000010000ASB BYTRES 5PQR 45757754 K 000200005KPC HGTRET 5ABC 36488989 K 000045000ASB HGTRET 5GTH 36488989 K 000200200ASB BYTRES 5FTU ... (2 Replies)
Discussion started by: nirnkv
2 Replies

10. UNIX for Dummies Questions & Answers

How do I sum one column based on another column?

Hi, I am new to this forum and new to awk. I have a file that contains 2 columns. Heres an example of what it looks like: 10 + 20 + 40 + 50 - 70 - So the file is tab-delimited. What I want to do is add 10 to column 1 whenever column 2 is + and substract 10 from column 1... (1 Reply)
Discussion started by: phil_heath
1 Replies
Login or Register to Ask a Question