averaging column values with awk


 
# 15  
01-26-2009
Quote:
Originally Posted by johnmillsbro
thanks. that appears to work. Could you explain the code in detail so I fully grasp it? I'm not sure I understand floating-point numbers. Also, if I then wanted to take the average for each BCxxxxxx ID and subtract that mean from each number, would that be equally difficult?
thanks a million. this is all new to me... like learning Chinese...

-J
I put comments in the last post with the code - hope it helps.
How do you define "mean" and what's the format of your desired output?
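On the floating-point question: awk stores every number as a double-precision floating-point value, so a division such as s/n keeps its fractional part instead of truncating the way integer division does in C. A minimal demo (the function name float_demo is hypothetical):

```shell
#!/bin/sh
# float_demo (hypothetical name): shows that awk division is floating-point.
float_demo() {
    awk 'BEGIN {
        s = 7; n = 2          # a running sum and a count, as in an average
        print s / n           # 3.5, not 3 as integer division in C would give
        print int(s / n)      # 3, because int() truncates toward zero
    }'
}

float_demo
```

print formats non-integer results with OFMT, which defaults to %.6g, so an average such as 10/3 prints as 3.33333.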
# 16  
01-26-2009
mean = average. The format would look like the original attachment, but each value would have the average for its set of data subtracted from it:
BC156041
56 subtract (avg all values for BC156041)
45 subtract (avg all values for BC156041)
n.. subtract (avg of n values for BC156041)
BC056472
12 subtract (avg all values for BC056472)
45 subtract (avg all values for BC056472)
n.. subtract (avg all values for BC056472)
etc.

so that the output looks identical to the input, except that the average for each data set has been subtracted from each of its original values. In other words, subtracting the mean for each ID sets the average for every ID to zero.

are you following me?
thanks, you are a great help.
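Written out for one ID with values x_1, ..., x_n, the centering described in this post is the following; the second line shows why every ID's centered values average to zero:

```latex
\begin{aligned}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad x_i' = x_i - \bar{x},\\
\sum_{i=1}^{n} x_i' &= \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0.
\end{aligned}
```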
# 18  
01-26-2009
e.g., for input
BC111111
4
8
12
BC555555
2
4
6

output
BC111111
-4
0
4
BC555555
-2
0
2
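One way to get exactly this output is to buffer each ID's values in memory and print them, minus the ID's average, in the END block. This is a minimal single-pass sketch, assuming IDs match ^BC[0-9]+$ and values are non-negative numbers on their own lines; the function name center_by_id is hypothetical:

```shell
#!/bin/sh
# center_by_id (hypothetical name): print each BC id followed by its values
# minus that id's average. Reads files given as arguments, or stdin.
center_by_id() {
    awk '
    /^BC[0-9]+$/      { id[++g] = $0; next }       # a new BC id starts a group
    /^[0-9]+[.0-9]*$/ { val[g, ++cnt[g]] = $0      # store the value by group and position
                        sum[g] += $0 }             # running sum per group
    END {
        for (i = 1; i <= g; i++) {
            print id[i]
            if (cnt[i] == 0) continue              # an id with no values
            avg = sum[i] / cnt[i]
            for (j = 1; j <= cnt[i]; j++)
                print val[i, j] - avg
        }
    }' "$@"
}

printf 'BC111111\n4\n8\n12\nBC555555\n2\n4\n6\n' | center_by_id
```

Because every value is held until END, memory grows with the size of the input; reading the file twice instead keeps only one sum, one count and one average per ID.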
# 19  
01-26-2009
assuming the output for each value should be 'value - average'...

I also print the average next to every 'BC' line to make validating easier; you can change
Code:
FNR!=NR && /^BC[0-9]+$/ {id=$0; print $0, arr[id]}

TO
Code:
FNR!=NR && /^BC[0-9]+$/ {id=$0; print $0}

in your final version.

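Run it with data.txt named twice (awk reads the file once to compute the averages and a second time to print the results):
Code:
awk -f john.awk data.txt data.txt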
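Naming the file twice relies on awk's two record counters: NR counts records across all input files, while FNR restarts at 1 for each file, so NR==FNR is true only while the first copy is being read. A small demo on a throwaway two-line file (the function name show_passes is hypothetical):

```shell
#!/bin/sh
# show_passes (hypothetical name): print NR, FNR and which pass each record
# belongs to when the same file is read twice.
show_passes() {
    awk '{ print NR, FNR, (NR == FNR ? "first pass" : "second pass") }' "$1" "$1"
}

tmp=$(mktemp)                 # hypothetical two-line sample file
printf 'a\nb\n' > "$tmp"
show_passes "$tmp"
# prints:
# 1 1 first pass
# 2 2 first pass
# 3 1 second pass
# 4 2 second pass
rm -f "$tmp"
```

The same trick does not work with a pipe, since stdin can only be read once.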

john.awk:
Code:
        # first line of the first file (FNR==1 && NR==1): it is the first BC id.
        # assign the entire record ($0) to the variable "id", then move on to the
        # next input line ("next")
        FNR==1 && NR==1{id=$0; next}

        # a line that is "BC" followed by one or more (+) digits ([0-9]), seen either
        # during the first pass (FNR==NR) or as the first line of the second pass (FNR==1):
        #     store the average of the PREVIOUS id in arr[id]:
        #         if "n" is non-zero, the average is "s" divided by "n"
        #         if "n" is 0 (the id had no values), store the string "NA"
        #     "s=n=0" - reset the running sum "s" and the counter "n" to 0
        #     assign the current record/line ($0) to the variable "id"
        # the FNR==1 case saves the average of the LAST id in the file
        (FNR==NR || FNR==1) && /^BC[0-9]+$/ {arr[id]= (n) ? s/n : "NA"; s=n=0; id=$0}

        # first pass only: a line made of one or more (+) digits ([0-9]) followed
        # by zero or more (*) digits or dots ([.0-9]), i.e. a non-negative number:
        #    add the current record value ($0) to the running sum (s): s+=$0
        #    increment the counter of values for the current "id": n++
        FNR==NR && /^[0-9]+[.0-9]*$/{s+=$0; n++}

        # second pass (FNR!=NR): print each BC id followed by its average
        FNR!=NR && /^BC[0-9]+$/ {id=$0; print $0, arr[id]}
        # second pass: print each value followed by (value - average);
        # print only "$0 - arr[id]" to get output in the same layout as the input
        FNR!=NR && /^[0-9]+[.0-9]*$/{print $0, $0 - arr[id] }
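To try john.awk end to end, here is a sketch with the BC line changed as suggested above and, as an assumption to match the requested layout, the value line changed to print only the centered value. The function name center_file and the temporary file are hypothetical; the sample data is the BC111111/BC555555 example from earlier in the thread:

```shell
#!/bin/sh
# center_file (hypothetical name): john.awk with BC lines printing only the id
# and value lines printing only value - average. The file is named twice
# because awk reads it in two passes.
center_file() {
    awk '
    FNR==1 && NR==1 {id=$0; next}
    (FNR==NR || FNR==1) && /^BC[0-9]+$/ {arr[id]= (n) ? s/n : "NA"; s=n=0; id=$0}
    FNR==NR && /^[0-9]+[.0-9]*$/ {s+=$0; n++}
    FNR!=NR && /^BC[0-9]+$/ {id=$0; print $0}
    FNR!=NR && /^[0-9]+[.0-9]*$/ {print $0 - arr[id]}
    ' "$1" "$1"
}

tmp=$(mktemp)                 # hypothetical file holding the sample input
printf 'BC111111\n4\n8\n12\nBC555555\n2\n4\n6\n' > "$tmp"
center_file "$tmp"
rm -f "$tmp"
```

Reading the file twice means only one running sum, one count and one average per ID are kept in memory, at the cost of a second read of the input.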
