sum the columns of files


 
# 1  
Old 01-14-2011

I have several csv files like this:
file1.csv
1 12 1
2 8 9
3 9 2
4 5 9
...
file2.csv
1 0 1
2 2 3
3 4 1
...
file3.csv
1 0 1
2 4 0
...
I want the result like this
1 12 3
2 14 12
3 13 3
4 5 9
I have a script:
Code:
awk '{a[$1]+=$2;b[$1]+=$3}END{for(i in a)print i"\t"a[i]"\t"b[i]}' *.csv|sort -k1n

Because my datasets are huge, I don't want this to run for days. Is there a much more efficient approach?

Last edited by frewise; 01-14-2011 at 08:38 AM..
# 2  
Old 01-14-2011
You could try mawk, which is often much faster than other awk implementations.
# 3  
Old 01-14-2011
Another idea is to parallelize the work, since the output of the awk script has the same format as its input. We could split the list of files into, for example, 8 parts, run 8 awk scripts in parallel, and then feed the 8 partial results back into the same awk script to produce the final result.
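A rough sketch of that split-and-merge idea, assuming an xargs that supports -0 and -P (GNU and BSD xargs do; neither option is POSIX) and a mktemp that accepts a template path. The names parallel_sum and SUM_PROG and the temp-file layout are made up for illustration. Each batch writes to its own temp file so that parallel writers cannot interleave their output lines.

```shell
# Same summing program as in the first post; assumed to work for partial
# results too, because its output has the same three-column layout.
SUM_PROG='{a[$1]+=$2; b[$1]+=$3} END{for(i in a) print i"\t"a[i]"\t"b[i]}'

# parallel_sum NJOBS FILE... (hypothetical helper)
parallel_sum() {
    njobs=$1; shift
    [ "$#" -gt 0 ] || return 0
    tmpdir=$(mktemp -d) || return 1
    # Roughly equal batches: ceil(number of files / njobs) files per awk run.
    per_job=$(( ($# + njobs - 1) / njobs ))
    # Stage 1: one awk per batch, up to njobs at a time, each writing its own file.
    printf '%s\0' "$@" |
        SUM_PROG="$SUM_PROG" xargs -0 -n "$per_job" -P "$njobs" sh -c '
            out=$(mktemp "$0/part.XXXXXX") || exit 1
            awk "$SUM_PROG" "$@" > "$out"' "$tmpdir"
    # Stage 2: sum the partial sums and sort numerically by key.
    awk "$SUM_PROG" "$tmpdir"/part.* | sort -k1n
    rm -rf "$tmpdir"
}
```

It would be called as `parallel_sum 8 *.csv`. Whether it actually helps depends on whether the job is limited by CPU or by disk reads, so it is worth timing on a subset first.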
# 4  
Old 01-14-2011
You can also try this Perl script:
Code:
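#!/usr/bin/perl
use strict;
use warnings;

my (@sum2, @sum3);   # per-key sums of column 2 and column 3
my $max = 0;         # largest key seen so far

foreach my $file (glob "*.csv") {
    open(my $fh, '<', $file)
        or do { warn "Unable to open file $file: $!\n"; next };
    while (<$fh>) {
        # Skip lines that are not three whitespace-separated integers
        next unless /^\s*(\d+)\s+(\d+)\s+(\d+)/;
        $sum2[$1] += $2;
        $sum3[$1] += $3;
        $max = $1 if $1 > $max;
    }
    close($fh);
}

# Print keys in ascending order; keys that never appeared are skipped,
# but keys whose sums are zero are still printed
for my $i (0 .. $max) {
    print "$i $sum2[$i] $sum3[$i]\n" if defined $sum2[$i];
}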

# 5  
Old 01-14-2011
Or perhaps this:
Code:
paste -d'\n' *.csv | awk 'NF&&$1!=p x{if(p x)print p,s,t; s=t=0; p=$1} {s+=$2;t+=$3} END{print p,s,t}'

This works provided the CSV files are in sorted order like in the sample, so that line N of every file carries the same key.

Last edited by Scrutinizer; 03-04-2013 at 06:25 AM..
# 6  
Old 01-14-2011
Thanks, but this is a little slower than awk+sort.
# 7  
Old 01-14-2011
Is it the awk or the sort that takes most of the time?
If it's sort, you can change the awk code to display output in the same order as its input.
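A minimal sketch of what that could look like, assuming "same order" means the order in which each key is first seen. The function name sum_in_order and the order/n variables are made up for illustration. The result is only numerically sorted if the keys first appear in sorted order, for example when the first file is sorted and contains every key, as in the sample.

```shell
# sum_in_order FILE... (hypothetical helper)
# Sums columns 2 and 3 per key and prints keys in first-seen order,
# so no separate sort pass is needed.
sum_in_order() {
    awk '
        !($1 in a) { order[++n] = $1 }      # remember each new key once
        { a[$1] += $2; b[$1] += $3 }        # accumulate both columns
        END {
            for (i = 1; i <= n; i++) {
                k = order[i]
                print k "\t" a[k] "\t" b[k]
            }
        }
    ' "$@"
}
```

It would be called as `sum_in_order *.csv`. The extra order array costs one more entry per distinct key, which should be small next to the two sum arrays.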