Sum values of specific column in multiple files, considering ranges defined in another file


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Sum values of specific column in multiple files, considering ranges defined in another file
# 1  
Old 12-08-2014
Sum values of specific column in multiple files, considering ranges defined in another file

I have a file (let say file B) like this:

Code:
 File B:  
A1  3  5  
A1  7  9  
A2  2  5  
A3  1  3

The first column defines a filename and the other two define a range in that specific file. In the same directory, I have also three more files (File A1, A2 and A3). Here is 10 sample lines from each file:
Code:
File A1:           File A2:     File A3:   
    1   0.6           1  0.2       1  0.1       
    2   0.04          2  0.1       2  0.2 
    3   0.4           3  0.2       3  0.5 
    4   0.5           4  0.4       4  0.3 
    5   0.009         5  0.2       5  0.7
    6   0.2           6  0.3       6  0.3
    7   0.3           7  0.8       7  0.3   
    8   0.2           8  0.1       8  0.2 
    9   0.15          9  0.9       9  0.8 
    10  0.1           10 0.4       10 0.1

I need to add a new column to file B, which in each line gives the sum of values of column two in the defined range and file. For example, file B row 1 means that calculate the sum of values of line 3 to 5 in column two of file A1. The desired output is something like this:

Code:
File B:
    A1  3  5  0.909    
    A1  7  9  0.65  
    A2  2  5  0.9  
    A3  1  3  0.8

All files are in tabular text format. How can I perform this task? I have access to bash (ubuntu 14.04) and R but not an expert bash or R programmer. Any help would be greatly appreciated. Thanks in advance
# 2  
Old 12-08-2014
For above sample, this should do
Code:
awk     '       {printf "%s ", $0
                 FN=$1; ST=$2; EN=$3;
                 while ($1+0 < EN)      {getline < FN
                                         if ($1 >= ST) sum+=$2}
                                         printf "%s\n", sum
                                         sum=0}
        ' file
A1  3  5 0.909
A1  7  9 0.65
A2  2  5 0.9
A3  1  3 0.8

It is not aimed at nor tested for severely different input data.

Last edited by RudiC; 12-08-2014 at 04:37 PM..
This User Gave Thanks to RudiC For This Post:
# 3  
Old 12-08-2014
Code:
while read f1 f2 f3
do
 awk 'NR==start,NR==end {sum+=$2} END {print FILENAME,start,end,sum}' start="$f2" end="$f3" "$f1"
done < B > B.new

If B.new looks okay, you can rename it to B
Code:
mv B.new B

This User Gave Thanks to MadeInGermany For This Post:
# 4  
Old 12-09-2014
Thank you for your prompt reply RudiC and MadeInGermany. I will try your code today and post the results.

---------- Post updated at 03:10 PM ---------- Previous update was at 10:30 AM ----------

It solved the problem. Thank you for your clear and effective code and helping me.

---------- Post updated at 09:06 PM ---------- Previous update was at 03:10 PM ----------

sorry MadeInGermany, but I got an error running the code on real data. The code performs well in sample data (f1, f2, f3). but when I run it on multiple files I get this error:
Code:
read: f100: bad variable name

I renamed the file to '100' with no success. I also renamed to 'hundred' but the error was repeated for the next file (f101). Unfortunately googling wasn't helpful.
Thank you for your time and consideration.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

How to sum value of a column by range defined in another file awk?

I have two files, file1.table is the count table, and the other is the range condition file2.range. file1.table chr start end count N1 0 48 1 N1 48 181 2 N1 181 193 0 N1 193 326 2 N1 326 457 0 N1 457 471 1 N1 471 590 2 N1 590 604 1 N1 604 752 1 N1 752 875 1 file2.range... (12 Replies)
Discussion started by: yifangt
12 Replies

2. UNIX for Beginners Questions & Answers

Copy columns from one file into another and get sum of column values and row count

I have a file abc.csv, from which I need column 24(PurchaseOrder_TotalCost) to get the sum_of_amounts with date and row count into another file say output.csv abc.csv- UTF-8,,,,,,,,,,,,,,,,,,,,,,,,, ... (6 Replies)
Discussion started by: Tahir_M
6 Replies

3. Shell Programming and Scripting

Do replace operation and awk to sum multiple columns if another column has duplicate values

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (12 Replies)
Discussion started by: as7951
12 Replies

4. UNIX for Dummies Questions & Answers

Match sum of values in each column with the corresponding column value present in trailer record

Hi All, I have a requirement where I need to find sum of values from column D through O present in a CSV file and check whether the sum of each Individual column matches with the value present for that corresponding column present in the trailer record. For example, let's assume for column D... (9 Replies)
Discussion started by: tpk
9 Replies

5. Shell Programming and Scripting

Need to sum up a column value from multiple files into Credit and Debit categories using awk command

i have multiple files with Batch Header, Record detail & Batch trailer data in the files like : BH 20150225950050N8262 RD 20140918000000 99999999 unk Deferred Settlement -13950 BT01 -13950 *Above sample data donot have the spaces coorectly defined. I do have multiple batch trailer... (1 Reply)
Discussion started by: kcdg859
1 Replies

6. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

7. Shell Programming and Scripting

Sum of a column in multiple files

I am performing the following operation on a file that looks like this 1000 0 10 479.0 1115478.07497 0.0 0.0 0.0872665 1000 10 20 1500.0 3470012.29304 0.0 0.0 0.261799 1000 20 30 2442.0 5676346.87758 0.0 0.0 0.436332 1000 30 40 3378.0 7737905.30957 0.0 0.0 0.610865 1000 40 50 4131.0... (2 Replies)
Discussion started by: kayak
2 Replies

8. Shell Programming and Scripting

Divide data with specific column values into separate files

hello! i need a little help from you :) ... i need to split a file into separate files depending on two conditions using scripting. The file has no delimiters. The conditions are col 17 = "P" and col 81 = "*", this will go to one output file; col 17 = "R" and col 81 = " ". Here is an example. ... (3 Replies)
Discussion started by: chanclitas
3 Replies

9. Shell Programming and Scripting

extracting columns falling within specific ranges for multiple files

Hi, I need to create weekly files from daily records stored in individual monthly filenames from 1999-2010. my sample file structure is like the ones below: daily record stored per month: 199901.xyz, 199902.xyz, 199903.xyz, 199904.xyz ...199912.xyz records inside 199901.xyz (original data... (4 Replies)
Discussion started by: ida1215
4 Replies

10. Shell Programming and Scripting

print unique values of a column and sum up the corresponding values in next column

Hi All, I have a file which is having 3 columns as (string string integer) a b 1 x y 2 p k 5 y y 4 ..... ..... Question: I want get the unique value of column 2 in a sorted way(on column 2) and the sum of the 3rd column of the corresponding rows. e.g the above file should return the... (6 Replies)
Discussion started by: amigarus
6 Replies
Login or Register to Ask a Question