Sum column values based in common identifier in 1st column.


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sum column values based in common identifier in 1st column.
# 1  
Old 10-21-2014
Sum column values based in common identifier in 1st column.

Hi,
I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column)

The input is for example, after sorted:
Code:
K00001 1 1 4 3 5
K00001 2 4 4 3 3
K00001 8 9 1 5 2
K00006 7 2 3 2 8
K00006 3 4 6 6 3
K00006 5 1 9 7 7
K00008 1 1 2 1 1
K00011 14 18 18 12 15

I want an output as follow:
Code:
K00001 11 14 9 11 10
K00006 15 7 18 15 18
K00008 1 1 2 1 1
K00011 14 18 18 12 15

How can do this in awk? I queried some threads with similar task but all of those were about just a sum of one given column.

Please help. Thanks in advance !



Moderator's Comments:
Mod Comment Please use code tags next time for your code and data. Thanks

Last edited by vbe; 10-21-2014 at 10:06 AM..
# 2  
Old 10-21-2014
Any attempts from your side?
# 3  
Old 10-21-2014
Yes I tried to modify this example I found here in this forum,
Code:
awk '{A[$1]+=$2;next}END{for(i in A){print i,A[i]}}'

I'm pretty newbie, and mostly the way I'm learning is by googling similar examples and adapt to my objectives.

Regarding the example above I think it can't be used because
Code:
A[$1]+=$2

.
# 4  
Old 10-21-2014
Code:
awk '{a[$1] += $2; b[$1] += $3; c[$1] += $4; d[$1] += $5; e[$1] += $6} END {for(x in a) print x, a[x], b[x], c[x], d[x], e[x]}' file

---------- Post updated at 09:50 AM ---------- Previous update was at 09:48 AM ----------

or since the input is already sorted and if you want the sorted order in the output
Code:
awk 'a1==$1 {a2+=$2; a3+=$3; a4+=$4; a5+=$5; a6+=$6; next} {print a1, a2, a3, a4, a5, a6; a1=$1; a2=$2; a3=$3; a4=$4; a5=$5; a6=$6} END {print a1, a2, a3, a4, a5, a6}' file

This User Gave Thanks to SriniShoo For This Post:
# 5  
Old 10-21-2014
@SriniShoo : One array is enough I think

Try

Code:
akshay@nio:/tmp$ cat infile
K00001 1 1 4 3 5
K00001 2 4 4 3 3
K00001 8 9 1 5 2
K00006 7 2 3 2 8
K00006 3 4 6 6 3
K00006 5 1 9 7 7
K00008 1 1 2 1 1
K00011 14 18 18 12 15

Code:
awk '
	function p(v, s)
	{
		for(i=2;i<=NF;i++)
		{
		  if(v == "push"){
			A[$1,i]+=$i
		  }
		  else{	
			if(($1,i) in  A)
			  s = (length(s) ? s OFS:"") A[$1,i]
			  delete A[$1,i]
		  }
		}
		return s
	}

 FNR==NR{ p("push"); next }

        { if(length(s=p()))print $1,s }

    ' infile infile

Resulting
Code:
K00001 11 14 9 11 10
K00006 15 7 18 15 18
K00008 1 1 2 1 1
K00011 14 18 18 12 15


OR

Code:
awk '{
	for(i=2; i<=NF; i++)
	A[$1,i]+=$i; 
	max = ( NF > max ) ? NF : max
     }
  END{
	for(i in A)
	{
	  if(i in A)
	  {
		split(i,X,SUBSEP)
		s = X[1]
		for(j=2; j<=max; j++)
		{
			if((X[1],j) in A)
			{
				s = s OFS A[X[1],j] 
				delete A[X[1],j]
			} 
		}
			
		print s
	  }
	}
     }'  infile


Resulting (if you don't care order)

Code:
K00008 1 1 2 1 1
K00006 15 7 18 15 18
K00001 11 14 9 11 10
K00011 14 18 18 12 15


Last edited by Akshay Hegde; 10-21-2014 at 11:09 AM..
This User Gave Thanks to Akshay Hegde For This Post:
# 6  
Old 10-21-2014
Thanks, I will have a look to see i can understand the logic of the array, that seems more pratical since my actual input file has 26 columns.

Thank you both for the fast help! I might be back with some doubts about the logic or semantics of the array, so I wont indicate this topic for closure, although is solved! Smilie
# 7  
Old 10-22-2014
Try also
Code:
awk     'function SP()  {n=split ($0, T); ID=$1}
         function PR()  {printf "%s", ID; for (i=2; i<=n; i++) printf "\t%s", T[i]; printf "\n"}

         NR==1          {SP();next}
         $1 != ID       {PR(); SP(); next}
                        {for (i=2; i<=NF; i++) T[i]+=$i}
         END            {PR()}
        ' file
K00001  11      14      9       11      10
K00006  15      7       18      15      18
K00008  1       1       2       1       1
K00011  14      18      18      12      15

This User Gave Thanks to RudiC For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sum of a column as new column based on header in a script

Hello, I am trying to store sum of a column as a new column inside a file but have to find the column names dynamically I/p c1,c2,c3,c4,c5 10,20,30,40,50 20,30,40,50,60 If i want to find sum only column c1, c3 and output it as c6,c7 O/p c1,c2,c3,c4,c5,c6,c7 10,20,30,40,50,30,70... (6 Replies)
Discussion started by: mkathi
6 Replies

2. UNIX for Beginners Questions & Answers

Sum the values in the column using date column

I have a file which need to be summed up using date column. I/P: 2017/01/01 a 10 2017/01/01 b 20 2017/01/01 c 40 2017/01/01 a 60 2017/01/01 b 50 2017/01/01 c 40 2017/01/01 a 20 2017/01/01 b 30 2017/01/01 c 40 2017/02/01 a 10 2017/02/01 b 20 2017/02/01 c 30 2017/02/01 a 10... (6 Replies)
Discussion started by: Booo
6 Replies

3. UNIX for Dummies Questions & Answers

Match sum of values in each column with the corresponding column value present in trailer record

Hi All, I have a requirement where I need to find sum of values from column D through O present in a CSV file and check whether the sum of each Individual column matches with the value present for that corresponding column present in the trailer record. For example, let's assume for column D... (9 Replies)
Discussion started by: tpk
9 Replies

4. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

5. Shell Programming and Scripting

Sum Of Column Based On Column Condition

I have a following inputfile MT,AP,CDM,TTML,MUM,GS,SUCC,3 MT,AP,CDM,TTSL,AP,GS,FAIL,9 MT,AP,CDM,RCom,MAH,GS,SUCC,3 MT,AP,CDM,RTL,HP,GS,SUCC,1 MT,AP,CDM,Uni,UPE,GS,SUCC,2 MT,AP,CDM,Uni,MUM,GS,SUCC,2 TTSL,AP,GS,MT,MAH,CDM,SUCC,20 TTML,AP,GS,MT,MAH,CDM,FAIL,10... (2 Replies)
Discussion started by: siramitsharma
2 Replies

6. Shell Programming and Scripting

common entries between files based on 1st column

Hi, I am trying to get the common entries from 2 files based on 1st field.. However when I try to do in perl I am getting blank output.. How can I do this in awk? open(BUFF1, "my_genes"); open(BUFF3, "rawcounts"); #open(WRBUFF,">result_rawcounts"); while($line =<BUFF1>) { ... (3 Replies)
Discussion started by: Diya123
3 Replies

7. Shell Programming and Scripting

How to averaging column based on first column values

Hello I have file that consist of 2 columns of millions of entries timestamp and throughput I want to find the average (throughput ) for each equal timestamp before change it to proper format e.g : i want to average 2 coloumnd fot all 1308154800 values in column 1 and then print... (4 Replies)
Discussion started by: aadel
4 Replies

8. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies

9. Shell Programming and Scripting

merging column from two files based on identifier

Hi, I have two files consisting of two columns. So I want to merge column 2 if column 1 is the same. So heres an example of what I mean. FILE1 driver 444 car 333 hat 222 FILE2 driver 333 car 666 hat 999 So I want to merge the column 2's together so... (4 Replies)
Discussion started by: phil_heath
4 Replies

10. UNIX for Dummies Questions & Answers

How do I sum one column based on another column?

Hi, I am new to this forum and new to awk. I have a file that contains 2 columns. Heres an example of what it looks like: 10 + 20 + 40 + 50 - 70 - So the file is tab-delimited. What I want to do is add 10 to column 1 whenever column 2 is + and substract 10 from column 1... (1 Reply)
Discussion started by: phil_heath
1 Replies
Login or Register to Ask a Question