averaging column values with awk


 
# 8  
Old 01-26-2009
It gives output for some of the IDs, but most receive NA. Why?
thanks
# 9  
Old 01-26-2009
Since they're not ordered linearly, you can pre-sort them, or use awk arrays:
Code:
awk '/^BC[0-9]+$/ {id=$1; next;} { sum[id]+=$0; count[id]++; } 
  END {   for (id in sum)  print id, sum[id]/count[id]; }'
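
To see what the array version does, here is a minimal sketch run on a made-up sample. The IDs and values are invented for illustration and are not from the poster's file. The trailing `sort` is an addition: awk does not guarantee any particular order for `for (id in sum)`, so the IDs can come out in a different order than they appear in the input.

```shell
# Hypothetical sample: each ID line is followed by its values; an ID may repeat.
sample='BC100199
4
6
BC100100
10
BC100199
8'

# Same array logic as the one-liner above; sort is added only for stable output.
result=$(printf '%s\n' "$sample" |
  awk '/^BC[0-9]+$/ { id = $1; next }
       { sum[id] += $0; count[id]++ }
       END { for (id in sum) print id, sum[id] / count[id] }' | sort)

printf '%s\n' "$result"
```

Because the sums are keyed by ID, the two separate BC100199 groups are pooled into one average, so no pre-sorting is needed and gaps between ID numbers do not matter.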

# 10  
Old 01-26-2009
Even if I pre-sort them, the ID numbers are not continuous (e.g. BC100100, BC100199, etc.).

-J
# 11  
Old 01-26-2009
Quote:
Originally Posted by johnmillsbro
It gives output for some of the IDs, but most receive NA. Why?
thanks
Please post a sample file.
# 12  
Old 01-26-2009
Here is the beginning of the file I am attempting to calculate averages from. This is just the first couple of entries; in total there are 25,000 BCxxxxxx entries.

thanks
# 13  
Old 01-26-2009
You have floating-point numbers, so the regex needs to accommodate them:
Code:
awk  '
        # On the very FIRST line of the file (NR==1), store the entire record ($0)
        # in the variable "id", then proceed to the next input line ("next").
        NR==1{id=$0; next}

        # If a line starts (^) with "BC" followed by one or more (+) digits ([0-9])
        # and nothing else ($), print the previous "id", followed by:
        #     s/n if "n" is non-zero
        #     the string "NA" if "n" is 0
        # "s=n=0" - reset both "s" and "n" to 0
        # then assign the current record/line ($0) to the variable "id"
        /^BC[0-9]+$/{print id, (n ? s/n : "NA"); s=n=0; id=$0}

        # If a line starts with one or more (+) digits ([0-9]), optionally followed
        # by zero or more (*) digits or dots ([.0-9]):
        #    add the current record value ($0) to the running sum: s+=$0
        #    increment the count of records for the current "id": n++
        /^[0-9]+[.0-9]*$/{s+=$0; n++}

        # At the END of the file the LAST "id" has not been printed yet,
        # so print the "id" value AND its average as described above.
        END{print id, (n ? s/n : "NA")}' data.txt
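
On the floating-point part: a pattern that only allows digits would skip a line such as 3.75, so the count stays at 0 and that ID prints NA. A quick sketch of what the pattern above accepts and rejects, using made-up test values (not from the poster's file):

```shell
# Hypothetical test values, one per line.
matched=$(printf '%s\n' 12 3.75 0.5 1.2.3 .5 -2.1 1e3 BC100100 |
  awk '/^[0-9]+[.0-9]*$/ { print }')

# Prints only the lines the pattern accepts: 12, 3.75, 0.5 and 1.2.3
printf '%s\n' "$matched"
```

So the pattern is deliberately loose. It lets 1.2.3 through, which awk would then treat as 1.2 in arithmetic, and it rejects values with a leading dot, a minus sign, or an exponent. If the data can contain those, the pattern would need to be extended.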


Last edited by vgersh99; 01-26-2009 at 12:46 PM..
# 14  
Old 01-26-2009
Thanks, that appears to work. Could you explain the code in detail so I fully grasp it? I'm not sure I understand floating-point numbers. Also, if I then wanted to take the average for each BCxxxxxx ID and subtract that mean from each number, would that be equally difficult?
Thanks a million. This is all new to me... like learning Chinese...

-J
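
One possible way to subtract the per-ID mean is to read the file twice: the first pass collects a sum and count for each ID, the second pass prints each value minus the mean of its ID. This is only a sketch under assumptions. The sample data and the temporary file are made up, and it assumes the same layout as above (an ID line followed by its values, each ID appearing once).

```shell
# Hypothetical sample in the same layout: an ID line, then its values.
tmp=$(mktemp)
printf '%s\n' BC100100 1.5 2.5 BC100199 10 20 30 > "$tmp"

# Pass 1 (NR==FNR, i.e. still reading the first copy): sum and count per ID.
# Pass 2: print each ID line as-is and each value minus the mean of its ID.
centered=$(awk '
  /^BC[0-9]+$/ { id = $0; if (NR != FNR) print; next }
  /^[0-9]+[.0-9]*$/ {
    if (NR == FNR) { sum[id] += $0; count[id]++ }
    else print $0 - sum[id] / count[id]
  }' "$tmp" "$tmp")
rm -f "$tmp"

printf '%s\n' "$centered"
```

With real data the same file name is simply given twice on the command line, e.g. `data.txt data.txt`. The mean has to be known before any value can be adjusted, which is why a single pass is not enough here unless all values are held in memory.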