Awk/sed summation of one column based on some entry in first column


 
Thread Tools Search this Thread
Top Forums UNIX for Beginners Questions & Answers Awk/sed summation of one column based on some entry in first column
# 1  
Old 12-07-2019
Awk/sed summation of one column based on some entry in first column

Hi All ,

I am having an input file as stated below

Input file

Code:
6  ddk/djhdj/djhdj/Q  10 0.5 
    dhd/jdjd.djd.nd/QB 01 0.5 
    hdhd/jd/jd/jdj/Q  10 0.5
512 hd/hdh/gdh/Q 01 0.5
      jdjd/jd/ud/j/QB 10 0.5 
      HD/jsj/djd/Q  01 0.5 
71 hdh/jjd/dj/jd/Q  10 0.5
    jd/jdld/je/j/QB 01 0.5
    IDP/jd/jdd/Q 10 0.5 
    1K/JDJ/JDJK/QL 01 0.5

I need to take the summation of the 4th column and it should start based on the first column and print at the 5th column at every entry of first column .

My output file

Code:
6  ddk/djhdj/djhdj/Q  10 0.5  1.5
    dhd/jdjd.djd.nd/QB 01 0.5 
    hdhd/jd/jd/jdj/Q  10 0.5
512 hd/hdh/gdh/Q 01 0.5  1.5
      jdjd/jd/ud/j/QB 10 0.5 
      HD/jsj/djd/Q  01 0.5 
71 hdh/jjd/dj/jd/Q  10 0.5 2.0
    jd/jdld/je/j/QB 01 0.5
    IDP/jd/jdd/Q 10 0.5 
    1K/JDJ/JDJK/QL 01 0.5

I tried something like this but not able to extract the right values
Code:
awk '
BEGIN { FS=OFS="\t" }
NR==FNR { s[$1]+=$4; next }
{ print $0,s[$1] }
' Final.tran.map.pattern5

Please let me know.
# 2  
Old 12-07-2019
Try
Code:
tac file | awk -F" +" '{SUM += $4} $1 {$5 = SUM; SUM = 0} 1' | tac

# 3  
Old 12-07-2019
Thanks a lot Rudic ,

Could you give a brief explanation of this code, I am not able to get why we are selecting the field separator as " +"
?

Thanks and Regards
Kshitij Kulshreshtha

Last edited by RavinderSingh13; 12-07-2019 at 07:37 AM..
# 4  
Old 12-07-2019
Hello kshitij,

Could you please try following.

Code:
awk '
FNR==NR{
  if($0~/^[0-9]+/){
     ++count
  }
  sum[count]+=$NF
  next
}
/^[0-9]+/{
  print $0,sum[++var]
  next
}
1
'  Input_file  Input_file

Output will be as follows.

Code:
6  ddk/djhdj/djhdj/Q  10 0.5  1.5
    dhd/jdjd.djd.nd/QB 01 0.5
    hdhd/jd/jd/jdj/Q  10 0.5
512 hd/hdh/gdh/Q 01 0.5 1.5
      jdjd/jd/ud/j/QB 10 0.5
      HD/jsj/djd/Q  01 0.5
71 hdh/jjd/dj/jd/Q  10 0.5 2
    jd/jdld/je/j/QB 01 0.5
    IDP/jd/jdd/Q 10 0.5
    1K/JDJ/JDJK/QL 01 0.5

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 5  
Old 12-07-2019
Your input file is not too consistent in its usage of spaces as field separators. It has one or two between fields, one up to six from BOL to $2 if $1 is missing, and zero or one at EOL. That's what " +" is for: it stands for "one or many" spaces. c.f. man regex for further reference.
This User Gave Thanks to RudiC For This Post:
# 6  
Old 12-07-2019
Quote:
Originally Posted by RavinderSingh13
Hello kshitij,
Could you please try following.
Code:
awk '
FNR==NR{
  if($0~/^[0-9]+/){
     ++count
  }
  sum[count]+=$NF
  next
}
/^[0-9]+/{
  print $0,sum[++var]
  next
}
1
'  Input_file  Input_file

Output will be as follows.

Code:
6  ddk/djhdj/djhdj/Q  10 0.5  1.5
    dhd/jdjd.djd.nd/QB 01 0.5
    hdhd/jd/jd/jdj/Q  10 0.5
512 hd/hdh/gdh/Q 01 0.5 1.5
      jdjd/jd/ud/j/QB 10 0.5
      HD/jsj/djd/Q  01 0.5
71 hdh/jjd/dj/jd/Q  10 0.5 2
    jd/jdld/je/j/QB 01 0.5
    IDP/jd/jdd/Q 10 0.5
    1K/JDJ/JDJK/QL 01 0.5

Thanks,
R. Singh
Hello kshitij,

Adding a detailed level explanation for my code.

Code:
awk '                            ##Starting awk program from here.
FNR==NR{                         ##Checking condition FNR==NR which will be TRUE when Input_file is being read first time.
  if($0~/^[0-9]+/){              ##Checking condition if a line starts from digit then do following.
     ++count                     ##Increment variable count with 1, each time cursor comes here.
  }                              ##Closing BLOCK for if condition here.
  sum[count]+=$NF                ##Creating an array named sum with index of count and keep on adding value of $NF to its own value.
  next                           ##next will skip all further statements from here.
}                                ##Closing BLOCK for condition FNR==NR here.
/^[0-9]+/{                       ##Checking condition if a line starts from digit then do following.
  print $0,sum[++var]            ##Printing current line and array sum with index of variable var with its increment of 1 each time cursor comes here.
  next                           ##next will skip all further statements from here.
}                                ##Closing BLOCK for /^[0-9]+/ condition here.
1                                ##Mentioning 1 for printing edited/non-edited line here.
'  Input_file  Input_file        ##Mentioning Input_file 2 times here.

Thanks,
R. Singh
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

sed command to add a new column entry

My input file looks like this 12 3 5.122.281.413 172.31.15.220 3421 4133 2 2 1454 3421 4133 2 2 0 12 44036 214.215.52.146 90.123.245.211 2312 3911 4 4 521 2312 3911 4 4 1 14 504 6.254.324.219 192.61.27.120 4444 5611 7 5 1415 4444 5611 7 5 1 ... (2 Replies)
Discussion started by: sampitosh
2 Replies

2. Shell Programming and Scripting

Solution for replacement of 4th column with 3rd column in a file using awk/sed preserving delimters

input "A","B","C,D","E","F" "S","T","U,V","W","X" "AA","BB","CC,DD","EEEE","FFF" required output: "A","B","C,D","C,D","F" "S", T","U,V","U,V","X" "AA","BB","CC,DD","CC,DD","FFF" tried using awk but double quotes not preserving for every field. any help to solve this is much... (5 Replies)
Discussion started by: khblts
5 Replies

3. Shell Programming and Scripting

awk script to call another script based on second column entry

Hi I have a text file (Input.txt) with two column entries separated by tab as given below: aaa str1 bbb str2 cccccc str3 dddd str4 eee str3 ssss str2 sdf str3 hhh str1 fff str2 ccc str3 ..... ..... ..... (1 Reply)
Discussion started by: my_Perl
1 Replies

4. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

5. Shell Programming and Scripting

Pick the column value based on another column using awk or CUT

My scenario is that I need to pick value from third column based on fourth column value, if fourth column value is 1 then first value of third column.Third column (2|3|4|6|1) values are cancatenated. Please someone help me to resolve this issue. Source column1 column2 column3 column4... (2 Replies)
Discussion started by: Ganesh L
2 Replies

6. Shell Programming and Scripting

awk or sed: change the color of a column w/o screwing up column spacing

Hey folks. I wrote a little awk script that summarizes /proc/net/dev info and then pipes it to the nix column command to set up column spacing appropriately. Here's some example output: Iface RxMBytes RxPackets RxErrs RxDrop TxMBytes TxPackets TxErrs TxDrop bond0 9 83830... (3 Replies)
Discussion started by: ryran
3 Replies

7. Shell Programming and Scripting

Rename a header column by adding another column entry to the header column name URGENT!!

Hi All, I have a file example.csv which looks like this GrpID,TargetID,Signal,Avg_Num CSCH74_1_1,2007,61,256 CSCH74_1_1,212007,647,679 CSCH74_1_1,12007,3,32 CSCH74_1_1,207,299,777 I want the output as GrpID,TragetID,Signal-CSCH74_1_1,Avg_Num CSCH74_1_1,2007,61,256... (4 Replies)
Discussion started by: Vavad
4 Replies

8. Shell Programming and Scripting

for each different entry in column 1 extract maximum values from column 2 in unix/awk

Hello, I have 2 columns (1st column has multiple entries but the corresponding values in the column 2 may be the same or different.) however I want to extract unique values for each entry in column 1 by assigning the max value from column 2 SDF4 -0.211654 SDF4 0.978068 ... (1 Reply)
Discussion started by: Diya123
1 Replies

9. Shell Programming and Scripting

Awk or Sed, fubd match in column, then edit column.

FILE A: 9780743551526,(Abridged) 9780743551779,(Unabridged) 9780743582469,(Abridged) 9780743582483,(Unabridged) 9780743563468,(Abridged) 9780743563475,(Unabridged) FILE B: c3saCandyland 9780743518321 "CANDYLAND" "MCBAIN, ED" 2001 c3sbCandyland 9780743518321 ... (7 Replies)
Discussion started by: glev2005
7 Replies

10. Shell Programming and Scripting

awk/sed column replace using column header - help

$ cat log.txt Name Age Sex Lcation nfld alias xsd CC 25 M XYZ asx KK Y BB 21 F XAS awe SS N SD 21 M AQW rty SD A How can I replace the column with header "Lcation" with the column with header "alias" and delete the "alias" column? so that the final output will become: Name Age Sex... (10 Replies)
Discussion started by: jkl_jkl
10 Replies
Login or Register to Ask a Question