sum of a column and selecting lines with value above threshold


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting sum of a column and selecting lines with value above threshold
# 1  
Old 11-28-2011
sum of a column and selecting lines with value above threshold

Hi again,
I need to further process the results of a previous manipulation.
I have a file with three columns
e.g.
Code:
AAA5 0.00175 1.97996e-06
AAA5 0.01334 2.14159e-05
AAA5 0.01340 4.12155e-05
AAA5 0.01496 1.10312e-05
AAA5 0.51401 0.0175308
BB0 0.00204 2.8825e-07
BB0 0.01569 7.94746e-07
BB0 0.01578 1.51949e-06
BB0 0.01766 3.91196e-07
BB0 2.18630 3.60312e-10
BB1 1.20490 6.06204e-09
CCC5 0.00226 5.31546e-10

What I would like to do is to sum the total of the 3rd column.
then, select (print) only those line whose ratio 3rd column/total is bigger than a certain value (stored in a variable).

I tried awk, but I still don't get the way it works.

Thanks in advance,
# 2  
Old 11-28-2011
Code:
#! /usr/bin/perl -w
use strict;

my ($line, $sum, $crtn_value, $ratio) = (0, 0, 1e-4, 0);
my (@x);

open INPUT, "< input.txt";
for $line (<INPUT>) {
    @x = split /\s+/, $line;
    $sum = $sum + $x[2];
}
close INPUT;

open INPUT, "< input.txt";
for $line (<INPUT>) {
    @x = split /\s+/, $line;
    $ratio = $x[2] / $sum;
    ($ratio > $crtn_value) && print "$line";
}
close INPUT;

This User Gave Thanks to balajesuri For This Post:
# 3  
Old 11-28-2011
Do you think that could this be done with AWK (maybe I gain some insight how it works)?
# 4  
Old 11-28-2011
Try this...
Code:
awk '{sum+=$3;a[$0]=$3}END{for(i in a){if(a[i]/sum > val){print i}}}' val="0.016" input_file

--ahamed
This User Gave Thanks to ahamed101 For This Post:
# 5  
Old 11-28-2011
Quote:
Originally Posted by ahamed101
Try this...
Code:
awk '{sum+=$3;a[$0]=$3}END{for(i in a){if(a[i]/sum > val){print i}}}' val="0.016" input_file

--ahamed
Hi ahamed,
indeed it works thank you.
It is not clear in which order the results are printed.
Is there a way to print them in the same order as they appear in the input file?
# 6  
Old 11-28-2011
sort the output if that helps i.e. awk '{...}' input_file | sort
or
Try this...
Code:
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a[i],arr);if(arr[3]/sum > val){print a[i]}}}' val="0.016" input_file

--ahamed
This User Gave Thanks to ahamed101 For This Post:
# 7  
Old 11-30-2011
Quote:
Originally Posted by ahamed101
sort the output if that helps i.e. awk '{...}' input_file | sort
or
Try this...
Code:
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a[i],arr);if(arr[3]/sum > val){print a[i]}}}' val="0.016" input_file

--ahamed
Hi this solution works better than the previous, which didn't work all the time.
e.g.
file1
Code:
eu154     1.51000000        1.70513841e+10
eu154     1.52200000        2.07052522e+09
eu154     1.52200000        2.07052522e+09
eu154     1.53140000        2.07052522e+10
eu154     1.53780000        1.70513841e+11
eu154     1.55400000        4.87182404e+09
eu154     1.59300000        3.53207243e+12
eu154     1.59650000        6.27264499e+12


Code:
awk '{sum+=$3;a[$0]=$3}END{for(i in a){if(a[i]/sum > val){print i}}}' val=0.0 file3

was giving out

Code:
eu154     1.53140000        2.07052522e+10
eu154     1.59300000        3.53207243e+12
eu154     1.52200000        2.07052522e+09
eu154     1.59650000        6.27264499e+12
eu154     1.55400000        4.87182404e+09
eu154     1.51000000        1.70513841e+10
eu154     1.53780000        1.70513841e+11

So it was missing one of the 2 identical lines

Still I have to sort the file with sort afterwards, but the command

Code:
awk '{sum+=$3;a[++j]=$0}END{for(i=1;i<=j;i++){split(a[i],arr);if(arr[3]/sum > val){print a[i]}}}' val="0.0" file3

gives

Code:
eu154     1.51000000        1.70513841e+10
eu154     1.52200000        2.07052522e+09
eu154     1.52200000        2.07052522e+09
eu154     1.53140000        2.07052522e+10
eu154     1.53780000        1.70513841e+11
eu154     1.55400000        4.87182404e+09
eu154     1.59300000        3.53207243e+12
eu154     1.59650000        6.27264499e+12

as expected
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Sum of a column as new column based on header in a script

Hello, I am trying to store sum of a column as a new column inside a file but have to find the column names dynamically I/p c1,c2,c3,c4,c5 10,20,30,40,50 20,30,40,50,60 If i want to find sum only column c1, c3 and output it as c6,c7 O/p c1,c2,c3,c4,c5,c6,c7 10,20,30,40,50,30,70... (6 Replies)
Discussion started by: mkathi
6 Replies

2. Shell Programming and Scripting

awk to Sum columns when other column has duplicates and append one column value to another with Care

Hi Experts, Please bear with me, i need help I am learning AWk and stuck up in one issue. First point : I want to sum up column value for column 7, 9, 11,13 and column15 if rows in column 5 are duplicates.No action to be taken for rows where value in column 5 is unique. Second point : For... (1 Reply)
Discussion started by: as7951
1 Replies

3. UNIX for Beginners Questions & Answers

Selecting lines based on the value in the 3rd column.

Hello, I have a sample data like this: A1 B1 100.00 B1 A1 100.00 A2 B2 90.80 B2 A2 90.80 A3 B3 99.07 B3 A3 99.07 A4 B4 99.00 B4 A4 99.00 A5 B5 97.13 B5 A5 99.53 . . Ax By i By Ax j each two lines are same comparison with opposite order. What I expected is... (3 Replies)
Discussion started by: nengcheng
3 Replies

4. UNIX for Dummies Questions & Answers

Match sum of values in each column with the corresponding column value present in trailer record

Hi All, I have a requirement where I need to find sum of values from column D through O present in a CSV file and check whether the sum of each Individual column matches with the value present for that corresponding column present in the trailer record. For example, let's assume for column D... (9 Replies)
Discussion started by: tpk
9 Replies

5. Shell Programming and Scripting

Sum column values based in common identifier in 1st column.

Hi, I have a table to be imported for R as matrix or data.frame but I first need to edit it because I've got several lines with the same identifier (1st column), so I want to sum the each column (2nd -nth) of each identifier (1st column) The input is for example, after sorted: K00001 1 1 4 3... (8 Replies)
Discussion started by: sargotrons
8 Replies

6. Shell Programming and Scripting

Selecting a value of column through sqlplus

Hi All, The value of a column in my DB table is: LOAD DATA APPEND INTO TABLE MK9210.EG FIELDS TERMINATED BY '|' TRAILING NULLCOLS ( STUDY CHAR ,PATIENT CHAR ,CPEVENT CHAR ,NAMEG CHAR ,REFIDEG CHAR ,POSEG CHAR ,METHEG CHAR ,EGDAT CHAR ,EGTIM CHAR ,COMEG CHAR ,COM1EG CHAR (5 Replies)
Discussion started by: Pratiksha Mehra
5 Replies

7. Shell Programming and Scripting

awk to sum a column based on duplicate strings in another column and show split totals

Hi, I have a similar input format- A_1 2 B_0 4 A_1 1 B_2 5 A_4 1 and looking to print in this output format with headers. can you suggest in awk?awk because i am doing some pattern matching from parent file to print column 1 of my input using awk already.Thanks! letter number_of_letters... (5 Replies)
Discussion started by: prashob123
5 Replies

8. UNIX for Dummies Questions & Answers

awk to sum column field from duplicate row/lines

Hello, I am new to Linux environment , I working on Linux script which should send auto email based on the specific condition from log file. Below is the sample log file Name m/c usage abc xxx 10 abc xxx 20 abc xxx 5 xyz ... (6 Replies)
Discussion started by: asjaiswal
6 Replies

9. Shell Programming and Scripting

Select lines where at least x columns above threshold value

I have a file with 20 columns. I'd like to retain only the lines for which the values in at least x columns, looking only at columns 6-20, are above a threshold. For example, I'd like to retain only the lines in the file below that have at least 8 columns (again, looking only at columns 6-20)... (3 Replies)
Discussion started by: pathunkathunk
3 Replies

10. Shell Programming and Scripting

selecting column in perl

Dear all, I have a rather large file of numbers which i would like to read into a script and then do some maths on a specific column( e.g column). so far i have been using the following awk command awk '{print $4}' infile.txt > out.tmp to strip out the desired column within the in perl... (3 Replies)
Discussion started by: Mish_99
3 Replies
Login or Register to Ask a Question