awk to sum a column based on duplicate strings in another column and show split totals


 
# 1  
Old 01-10-2014

Hi,
I have input in a format like this:
Code:
A_1 2
B_0 4
A_1 1
B_2 5
A_4 1

I would like to print it in the output format below, with headers. Can you suggest how to do this in awk? I ask for awk because I am already using it to pattern-match against a parent file and print column 1 of my input. Thanks!
letter number_of_letters Total Split
Code:
A        3        4     2+1+1
B        2        10    4+5


Last edited by Franklin52; 01-10-2014 at 03:14 AM.. Reason: Please use code tags
# 2  
Old 01-10-2014
That would require something like this:

Code:
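awk -F'[_ ]+' '
  {
    A[$1]++                                  # rows seen for this letter
    n=$2*$3                                  # number after "_" times column 2
    if(n>B[$1]) B[$1]=n                      # keep the largest product per letter
    C[$1]=C[$1] (C[$1]==""?"":"+") $3        # build the split string, e.g. 2+1+1
  } 
  END{
    # for (i in A) visits the keys in unspecified order
    for(i in A) print i, A[i], B[i], C[i]
  }
' OFS='\t' file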

If not, please specify in more detail what you need. Also, next time please show your own attempts at a solution.
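For reference, here is a quick check of what that script computes on the sample from post #1. It is only a sketch: `max_product` is a made-up wrapper name, `-v OFS` is used instead of the trailing assignment so the function can read from a pipe, and `sort` is added because `for (i in A)` does not guarantee any order. The "Total" column here is the largest value of (number after `_`) x (column 2) per letter, which is how the 4 and the 10 in the requested output can be reproduced.

```shell
# max_product is a hypothetical helper name, not something from the thread.
max_product() {
  awk -F'[_ ]+' -v OFS='\t' '
    {
      A[$1]++                                   # rows seen for this letter
      n = $2 * $3                               # number after "_" times column 2
      if (n > B[$1]) B[$1] = n                  # keep the largest product
      C[$1] = C[$1] (C[$1] == "" ? "" : "+") $3 # split string, e.g. 2+1+1
    }
    END { for (i in A) print i, A[i], B[i], C[i] }
  ' "$@" | sort
}

# Sample input from post #1
printf 'A_1 2\nB_0 4\nA_1 1\nB_2 5\nA_4 1\n' | max_product
```

With that input the two lines printed are A, 3, 4, 2+1+1 and B, 2, 10, 4+5 (tab-separated).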
# 3  
Old 01-10-2014
Thank you. Can you please tell me how to stop treating _ as a field separator and count the duplicates in $1 instead?
For example, for this input:
Code:
A_1 2
B_0 4
A_1 1
B_0 5
A_1 1

the output should be:
Code:
A_1       3        4     2+1+1
B_0       2        10    4+5


Last edited by Franklin52; 01-10-2014 at 10:10 AM.. Reason: fixed code tags
# 4  
Old 01-10-2014
How do you arrive at 10 for B_0?
# 5  
Old 01-10-2014
Sorry, typo.
It should read:
Code:
A        3        4     2+1+1
B        2        9    4+5

---------- Post updated at 09:28 AM ---------- Previous update was at 09:26 AM ----------

Oops, this is the correct required format:
Code:
A_1        3        4     2+1+1
B_0        2        9    4+5


Last edited by Scrutinizer; 01-10-2014 at 01:21 PM.. Reason: moved [/code] tags
# 6  
Old 01-10-2014
Try:
Code:
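awk '
  {
    A[$1]++                                  # count rows per key (whole first column)
    B[$1]+=$2                                # running total of column 2 per key
    C[$1]=C[$1] (C[$1]==""?"":"+") $2        # build the split string, e.g. 2+1+1
  } 
  END{
    # for (i in A) visits the keys in unspecified order
    for(i in A) print i, A[i], B[i], C[i]
  }
' OFS='\t' file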
