How to sum value of a column by range defined in another file awk?


 
# 8  
03-01-2019
Quote:
Originally Posted by Corona688
Is everything sorted? Can we depend on N1, N2, N3 being nicely grouped and coming in the same order in both file1 and file2? The ranges don't necessarily need to be in sorted order.
Yes, they are grouped and sorted nicely.
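Corona688's question matters because grouped, sorted input lets awk walk the ranges and the rows in step, instead of testing every row against every range. A minimal sketch of that idea, assuming (hypothetically) header-free tab-separated files: a ranges file with columns name, start, end, where the ranges of each name are sorted by start and do not overlap, and a values file with columns name, start, end, count, sorted by start within each name. The function name sum_sorted and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sums column 4 of a values file into the ranges of a
# ranges file in a single pass, relying on sorted input (assumptions above).
sum_sorted() {
    awk '
    NR == FNR {                                   # ranges file: keep ranges in input order
        n++
        name[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; sum[n] = 0
        if (!($1 in first)) first[$1] = n         # index of the first range of each name
        next
    }
    $1 != cur {                                   # a new name starts: jump to its first range
        cur = $1
        i = ($1 in first) ? first[$1] : n + 1     # n + 1 means no ranges exist for this name
    }
    {
        pos = $2 + 0
        # ranges ending before pos can never match a later (larger) pos, so skip them
        while (i <= n && name[i] == $1 && hi[i] < pos) i++
        # count the row only if pos lies inside the current range (gaps are ignored)
        if (i <= n && name[i] == $1 && lo[i] <= pos) sum[i] += $4
    }
    END { for (k = 1; k <= n; k++) print name[k], lo[k], hi[k], sum[k] }
    ' OFS="\t" "$1" "$2"
}

# Usage with made-up sample data (tab-separated, no header lines)
tmp=$(mktemp -d)
printf 'N1\t0\t99\nN1\t100\t199\nN1\t200\t299\nN2\t0\t99\nN2\t100\t199\n' > "$tmp/ranges"
printf 'N1\t10\t40\t1\nN1\t150\t180\t2\nN1\t160\t170\t3\nN2\t120\t130\t4\n' > "$tmp/values"
sum_sorted "$tmp/ranges" "$tmp/values"
```

For these sample rows the five ranges get 1, 5, 0, 0 and 4. Each file is read once, so the cost grows with the number of lines rather than lines times ranges; the price is that unsorted rows within a name would be silently miscounted.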
# 9  
03-01-2019
This is not a final solution (the range end and the empty intervals are missing), but a test of an algorithm, and it shows severe discrepancies from your desired result. Could you please check and explain the discrepancies?
Code:
awk '{SUM[$1 OFS int($2/100)*100] += $4} END {for (s in SUM) print s, SUM[s]}' OFS="\t" file1 | sort
N1	0	2
N1	100	2
N1	300	0
N1	400	3
N1	500	1
N1	600	0
N1	700	1
N2	0	0
N2	500	4
N2	600	5
N2	700	6


Last edited by RudiC; 03-01-2019 at 05:33 PM..
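One way to fill in the two pieces RudiC lists as missing (the range end and the empty intervals) is to remember every name seen and print each fixed-width bucket in the END block. This is a sketch under stated assumptions: buckets are 100 wide and cover 0 to 999 (the span nezbudka reports for file2.range further down), counts at positions above 999 are not printed, input has no header, and the helper name bucket_sum and the sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fixed-width bucketing that also prints the range end
# and zero-count buckets. Assumes width w=100, buckets 0..max with max=999,
# no header, column 1 = name, column 2 = position, column 4 = count.
bucket_sum() {
    awk -v w=100 -v max=999 '
    {
        sum[$1, int($2 / w) * w] += $4            # e.g. position 752 falls in bucket 700
        seen[$1] = 1                              # remember every name for the END loop
    }
    END {
        for (name in seen)
            for (b = 0; b <= max; b += w)         # every bucket start from 0 up to max
                print name, b, b + w - 1, sum[name, b] + 0   # + 0 prints empty buckets as 0
    }
    ' OFS="\t" "$1" | sort -k1,1 -k2,2n           # numeric sort on the bucket start
}

# Usage with made-up sample rows (tab-separated, no header)
tmp=$(mktemp -d)
printf 'N1\t10\t40\t1\nN1\t150\t180\t2\nN1\t752\t875\t1\nN2\t520\t560\t4\n' > "$tmp/file1"
bucket_sum "$tmp/file1"
```

The `sort -k1,1 -k2,2n` keeps bucket starts in numeric order; a plain `sort` happens to work for three-digit starts but would place 1000 before 200.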
# 10  
03-01-2019
The overlapping problem is ambiguous. If two rows overlap, which one wins, for a given bucket?
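One possible answer, offered purely as an assumption rather than what the thread settled on, is to let a row add its count to every range its start..end interval overlaps. Using the row N1 752 875 1 and the 600-999 ranges quoted later in the thread, the sketch below counts that row in both 700-799 and 800-899. The helper name sum_overlaps is hypothetical, and both files are assumed to be tab-separated with no header line.

```shell
#!/bin/sh
# Hypothetical helper for ONE possible overlap policy (an assumption):
# a row "name start end count" adds its count to every range it overlaps.
sum_overlaps() {
    awk '
    NR == FNR { n++; name[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0; sum[n] = 0; next }
    {
        for (k = 1; k <= n; k++)   # intervals overlap when each starts before the other ends
            if (name[k] == $1 && $2 + 0 <= hi[k] && $3 + 0 >= lo[k]) sum[k] += $4
    }
    END { for (k = 1; k <= n; k++) print name[k], lo[k], hi[k], sum[k] }
    ' OFS="\t" "$1" "$2"
}

# Ranges and row taken from the example later in this thread
tmp=$(mktemp -d)
printf 'N1\t600\t699\nN1\t700\t799\nN1\t800\t899\nN1\t900\t999\n' > "$tmp/ranges"
printf 'N1\t752\t875\t1\n' > "$tmp/values"
sum_overlaps "$tmp/ranges" "$tmp/values"
```

Under this policy a row spanning two ranges is counted twice, so the output total can exceed the input total. The simpler alternative, which yifangt adopts later, is to place each row by its start position (column 2) only.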
# 11  
03-01-2019
Try



Code:
awk '
FNR == 1        {next                                # skip the header line of each file
                }
NR == FNR       {SUM[$1 OFS $2 OFS $3] = 0           # file2: register every range with a 0 count
                 next
                }
                {for (s in SUM) {split (s, T, OFS)   # file1: add column 4 to each range containing column 2
                                 if ($1 == T[1] &&
                                     $2 >= T[2] &&
                                     $2 <= T[3]) SUM[s] += $4
                                }
                }
END             {for (s in SUM)  print s, SUM[s]
                }
' OFS="\t" file2 file1 | sort -k1,1 -k2,2n
N1    0    99    2
N1    100    199    2
N1    200    299    0
N1    300    399    0
N1    400    499    3
N1    500    599    1
N1    600    699    0
N1    700    799    1
N1    800    899    0
N1    900    999    0
N2    0    99    0
N2    100    199    0
N2    200    299    0
N2    300    399    0
N2    400    499    0
N2    500    599    4
N2    600    699    5
N2    700    799    6
N2    800    899    0
N2    900    999    0


Last edited by RudiC; 03-04-2019 at 04:40 PM..
# 12  
03-02-2019
Hi @RudiC, the first example does not work as expected.
Hi @yifangt,
the stated problem does not match the expected result.
Take the whole range from file2.range:
the file covers 0 to 999 without gaps,
and all values in file1.table fall within this range.
The counts in file1 sum to 11,
so all 11 should be present in the output file,
but the expected result contains only 10 counts.
By what rule does the row N1 752 875 1 fall outside the interval 700-999?
Code:
expected result:
N1    600    699    1   
N1    700    799    0   
N1    800    899    0   
N1    900    999    0

file1.table
N1 752 875  1


Last edited by Neo; 03-02-2019 at 03:02 AM.. Reason: please use code tags, not quote tags, for sample input and output. quote tags are for human-like speech. Thanks.
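nezbudka's 11-versus-10 argument can be checked mechanically: add up the count column of the input and the last column of the result, then compare. A small sketch, assuming header-free files with the count in column 4 of the input and in the last column of the result; the helper name check_totals and the sample rows are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compares the total of column 4 in the input with the
# total of the last column in the result; returns non-zero on a mismatch.
check_totals() {
    in_total=$(awk '{ t += $4 } END { print t + 0 }' "$1")
    out_total=$(awk '{ t += $NF } END { print t + 0 }' "$2")
    echo "input total: $in_total, output total: $out_total"
    [ "$in_total" = "$out_total" ]
}

# Made-up sample rows (tab-separated, no header)
tmp=$(mktemp -d)
printf 'N1\t10\t40\t1\nN1\t150\t180\t2\n' > "$tmp/file1"   # input total 3
printf 'N1\t0\t99\t1\nN1\t100\t199\t2\n' > "$tmp/good"     # output total 3
printf 'N1\t0\t99\t1\nN1\t100\t199\t0\n' > "$tmp/bad"      # output total 1
check_totals "$tmp/file1" "$tmp/good" && echo "totals match"
check_totals "$tmp/file1" "$tmp/bad" || echo "counts were lost or double-counted"
```

Equal totals are necessary but not sufficient: a count that lands in the wrong range leaves the totals unchanged, and a policy that counts one row in several overlapping ranges is expected to produce a larger output total.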
# 13  
03-04-2019
Thanks RudiC!
I like these tricks:
Code:
SUM[$1 OFS int($2/100)*100]     # a very good trick for simple situations
...
for (s in SUM) {split (s, T, OFS)
    if ($1 == T[1] && $2 >= T[2] && $2 <= T[3])
        SUM[s] += $4
}

The overlapping problem is quite complicated for me; I think it belongs in a separate topic.
Hi @nezbudka, the two input files were updated after the original message. Sorry for the confusion.
It is asked by what algorithm the pattern N1 752 875 1 does not fall within the interval 700-999 ?
No, I simplified this scenario to the interval 700-799, based on column 2 only (using 752 and ignoring 875).

Thanks again to all of you!

Last edited by yifangt; 03-04-2019 at 03:14 PM.. Reason: Elaboration on some points and corrected typos