Sum the rows number based on first field string value


 
Thread Tools Search this Thread
Top Forums UNIX for Dummies Questions & Answers Sum the rows number based on first field string value
# 1  
Old 05-05-2011
Sum the rows number based on first field string value

Hi,
I have a file like this one

Code:
h1	4.70650E-04	4.70650E-04	4.70650E-04
h2	1.92912E-04	1.92912E-04	1.92912E-04
h3A	3.10160E-11	2.94562E-11	2.78458E-11
h4	0.00000E+00	0.00000E+00	0.00000E+00
h1	1.18164E-12	2.74150E-12	4.35187E-12
h1	7.60813E-01	7.60813E-01	7.60813E-01
h2	0.00000E+00	0.00000E+00	0.00000E+00
h1	7.20928E-17	7.20928E-17	7.20928E-17
h2	3.91678E-13	3.91678E-13	3.91678E-13
h1	0.00000E+00	0.00000E+00	0.00000E+00
h2	0.00000E+00	0.00000E+00	0.00000E+00
h3A	6.02954E-12	6.02954E-12	6.02954E-12
h1	3.85772E-06	3.85772E-06	3.85772E-06
h2	0.00000E+00	0.00000E+00	0.00000E+00
h3A	1.15837E-12	2.69242E-12	4.36386E-12
h1	1.34083E-10	1.34083E-10	1.34083E-10
h2	0.00000E+00	0.00000E+00	0.00000E+00
h3A	5.03809E-06	5.03809E-06	5.03809E-06
h4	2.45852E+00	2.45852E+00	2.45852E+00

I would like to obtain a file like

Code:
h1	4.74508E-04	4.74508E-04	4.74508E-04
h2	1.92912E-04	1.92912E-04	1.92912E-04
h3A	5.03813E-06	5.03813E-06	5.03813E-06
h4	2.45852E+00	2.45852E+00	2.45852E+00

where the data in each column have been summed based on the content of the string.
the fields are tab separated and there can be any number of columns.

I've tried with awk something like

Code:
awk '{A[$1]+=$2} END{for (i in A) print A[i]","i}' test_001.dat

In this way I can sum the content of the second column, now I'm stuck and need some help.
Thanks a lot,
Sarah

P.S.
Sorry the formatting doesn't seems to work!

Last edited by radoulov; 05-05-2011 at 07:20 AM.. Reason: Code tags, please!
# 2  
Old 05-05-2011
Code:
awk '{A[$1]+=$2;B[$1]+=$3;C[$1]+=$4} END{for (i in A) print i,A[i],B[i],C[i]}' test_001.dat

You may then have to fix the calculation format (use printf with the relevant long float number format that suites)
This User Gave Thanks to ctsgnb For This Post:
# 3  
Old 05-05-2011
Code:
awk 'END {
  for (C in c) {
    printf "%s", C OFS
    for (i = 1; ++i <= c[C];) {
	  printf "%E", v[C,i] 
	  printf "%s", (i < c[C] ? OFS : RS)
	  }
	}
  }
{ 
  for (i = 1; ++i <= NF;)
    v[$1,i] += $i
  c[$1] = NF	
  }' OFS='\t' infile

This User Gave Thanks to radoulov For This Post:
# 4  
Old 05-05-2011
Thanks a lot to both of you.
Is there a way to print the results in a certain order (e.g. in the order the strings appear in the input file)? I'm not sure if I understood the script, and it seems that they appear in an order I cannot predict.
Maybe the file could be sorted afterwards.
Thank you once more,
Sarah
# 5  
Old 05-05-2011
Something like this?

Code:
awk 'END {
  for (j = 0; ++j <= cnt;) {
    printf "%s", idx[j] OFS
    for (i = 1; ++i <= c[idx[j]];) {
      printf "%E", v[idx[j],i] 
      printf "%s", (i < c[idx[j]] ? OFS : RS)
      }
    }
  }
{ 
  for (i = 1; ++i <= NF;)
    v[$1,i] += $i
  c[$1] = NF
  tmp[$1]++ || idx[++cnt] = $1  
  }' OFS='\t' infile

This User Gave Thanks to radoulov For This Post:
# 6  
Old 05-05-2011
Lovely,
I had also tried with success

Quote:
sort -t$'\t' -k 1,1 file_name
Cheers,
 
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Script to find string based on pattern and search for its corresponding rows in column

Experts, Need your support for this awk script. we have only one input file, all these column 1 and column 2 are in same file and have to do lookup for values in one file(column1 and column2) but output we need in another file Need to grep row whose string contains 9K from column 1. When found... (6 Replies)
Discussion started by: as7951
6 Replies

2. Shell Programming and Scripting

Splitting single row into multiple rows based on for every 10 digits of last field of the row

Hi ALL, We have requirement in a file, i have multiple rows. Example below: Input file rows 01,1,102319,0,0,70,26,U,1,331,000000113200000011920000001212 01,1,102319,0,1,80,20,U,1,241,00000059420000006021 I need my output file should be as mentioned below. Last field should split for... (4 Replies)
Discussion started by: kotra
4 Replies

3. Shell Programming and Scripting

Add specific string to last field of each line in perl based on value

I am trying to add a condition to the below perl that will capture the GTtag and place a specific string in the last field of each line. The problem is that the GT value used is not right after the tag rather it is a few fields away. The values should always be 0/1 or 1/2 and are in bold in the... (12 Replies)
Discussion started by: cmccabe
12 Replies

4. Shell Programming and Scripting

Split File based on number of rows

Hi I have a requirement, where i will receive multiple files in a folder (say: /fol1/fol2/). There will be at least 14 to 16 files. The size of the files will different, some may be 80GB or 90GB, some may be less than 5 GB (and the size of the files are very unpredictable). But the names of the... (10 Replies)
Discussion started by: kpk_ds
10 Replies

5. Shell Programming and Scripting

Comma separated to rows based on field

Hi to all, I have a file like: chr1 a1 a2 a3 a4 a5 a6,a7,a8,a9 chr1 b1 b2 b3 b4 b5 b6,b7 chr2 c1 c2 c3 c4 c5 c6,c7,c8,c9,c10 ... I would like an output like this: chr1 a6 chr1 a7 chr1 a8 chr1 a9 chr1 b6 chr1 b7 chr2 c6 chr2 c7 chr2 c8 (6 Replies)
Discussion started by: aec
6 Replies

6. UNIX for Dummies Questions & Answers

count number of rows based on other column values

Could anybody help with this? I have input below ..... david,39 david,39 emelie,40 clarissa,22 bob,42 bob,42 tim,32 bob,39 david,38 emelie,47 what i want to do is count how many names there are with different ages, so output would be like this .... david,2 emelie,2 clarissa,1... (3 Replies)
Discussion started by: itsme999
3 Replies

7. Shell Programming and Scripting

Average calculation based on number of rows

Dear users, I need your support, I have a file like this: 272134.548 6680572.715 272134.545 6680572.711 272134.546 6680572.713 272134.548 6680572.706 272134.545 6680572.721 272134.543 6680572.710 272134.544 6680572.715 272134.543 6680572.705 272134.540 6680572.720 272134.544... (10 Replies)
Discussion started by: Gery
10 Replies

8. UNIX for Dummies Questions & Answers

how to count number of rows and sum of column using awk

Hi All, I have the following input which i want to process using AWK. Rows,NC,amount 1,1202,0.192387 2,1201,0.111111 3,1201,0.123456 i want the following output count of rows = 3 ,sum of amount = 0.426954 Many thanks (2 Replies)
Discussion started by: pistachio
2 Replies

9. UNIX for Dummies Questions & Answers

count the number of field in a string with deliminter

hi i am new to shell and would like ask some question about shell how am i suppose to count the number of fields with delementer example a:b:c:d would return me 4 how am i going to do it using shell?? what is the syntax for file locking? thanks a lot (9 Replies)
Discussion started by: yang
9 Replies

10. Shell Programming and Scripting

Splitting file based on number of rows

Hi, I'm, new to shell scripting, I have a requirement where I have to split an incoming file into separate files each containing a maximum of 3 million rows. For e.g: if my incoming file say In.txt has 8 mn rows then I need to create 3 files, in which two will 3 mn rows and one will contain 2... (2 Replies)
Discussion started by: wahi80
2 Replies
Login or Register to Ask a Question