Calculate the average of a column based on the value of another column


 
# 1  
Old 01-27-2013

Hi,

I would like to calculate the average of column 'y' based on the value of column 'pos'.

For example, here is file1:

Code:
id   pos      y     c
11   1        220   aa
11   4333     207   f
11   5333     112   ee
11   11116    305   e
11   11117    310   r
11   22228    781   gg
11   33310    121   hhh
11   55511    981   rr
11   111112   22    e
...



What I want is the average of 'y' grouped by ranges of 'pos': specifically, the average of y for pos in 1-10000, 10001-20000, ..., so that the output would look like this:

Code:
outputfile
id   pos     mean.y
11   1       179.6667    because 179.6667=(220+207+112)/3
11   10001   307.5               307.5=(305+310)/2
11   20001   781                 781=781/1
11   30001   121                 121=121/1
11   40001   0           because there is no 'pos' from 40001 to 50000
11   50001   981
...


Thanks a lot!

Note, the 'pos' of the outputfile is the starting value of a range.
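
To make the range rule concrete: a position belongs to the range that starts at pos - (pos - 1) % 10000. Here is a minimal sketch, assuming positive integer positions. The name `range_start` and its optional width argument are hypothetical, used only for illustration:

```shell
# Hypothetical helper: print the start of the range that contains position $1.
# Assumes positive integer positions; the range width defaults to 10000.
range_start() {
    awk -v pos="$1" -v width="${2:-10000}" 'BEGIN { print pos - (pos - 1) % width }'
}

range_start 5333     # prints 1
range_start 10000    # prints 1     (10000 is still inside 1-10000)
range_start 10001    # prints 10001
range_start 55511    # prints 50001
```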



Last edited by Scrutinizer; 01-27-2013 at 03:39 PM.. Reason: code tags
# 2  
Old 01-27-2013
PLEASE use code tags as demanded!
Try this as a starting point; I've left printing the empty 40001 line as an exercise for you...
Code:
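awk 'NR==1  {print "id pos   mean.y"}                # replace the header line
     NR>1   {tmp=($2-$2%10000)/10000;                # bin index: 0 for pos below 10000, 1 for 10000-19999, ...
             # on entering a new bin, print the mean of the previous one and reset
             if (tmp!=cnt) {printf "%s %05d% 8.4f\n", ID, cnt*10000+1, sum/n; cnt=tmp; n=sum=0}
             ID=$1; sum+=$3; n++;                    # accumulate the current row
            }
    ' file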
id pos   mean.y
11 00001 179.6667
11 10001 307.5000
11 20001 781.0000
11 30001 121.0000
11 50001 981.0000

or
Code:
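awk 'NR==1    {print "id pos   mean.y"; next}        # replace the header line
              {tmp=($2-$2%10000)/10000}              # bin index of the current row
     tmp!=cnt {printf "%s %05d% 8.4f\n", ID, cnt*10000+1, sum/n; cnt=tmp; n=sum=0}   # new bin: print previous mean, reset
              {ID=$1; sum+=$3; n++ }                 # accumulate the current row
    ' file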
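
Both commands above print a bin only when a later row opens the next one, so the last bin of the file is never printed, and empty bins are skipped. The following is a minimal sketch that fills those gaps; it is not a definitive solution. It assumes a single header line, whitespace-separated columns, and rows sorted by pos within each id. The function name `bin_means` and its optional width argument are hypothetical. Bins are computed as int((pos-1)/width), so that pos 10000 falls into 1-10000 as the question asks.

```shell
# Hypothetical helper: bin_means FILE [WIDTH]
# Prints the mean of column 3 per range of column 2 (range width defaults to 10000).
bin_means() {
    awk -v width="${2:-10000}" '
        # Print the mean of the bin being accumulated, then reset the counters.
        function flush() {
            if (n) printf "%s %d %.4f\n", id, cur * width + 1, sum / n
            n = sum = 0
        }
        NR == 1 { print "id pos mean.y"; next }    # replace the header line
        NF < 3 || $2 !~ /^[0-9]+$/ { next }        # skip blank or malformed rows such as "..."
        {
            bin = int(($2 - 1) / width)            # pos 1..width -> bin 0, and so on
            if (n && ($1 != id || bin != cur)) {
                flush()
                # Emit a 0 row for every empty bin between two observed bins of one id.
                if ($1 == id)
                    for (b = cur + 1; b < bin; b++)
                        printf "%s %d 0\n", id, b * width + 1
            }
            id = $1; cur = bin; sum += $3; n++     # accumulate the current row
        }
        END { flush() }                            # do not lose the final bin
    ' "$1"
}
```

Called as `bin_means file1` on the nine sample rows, this prints the five means shown above, plus `11 40001 0`, zero rows for 60001 through 100001, and a final `11 110001 22.0000`. Empty bins before the first observed position of an id are not filled; that is a design choice of this sketch.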


Last edited by RudiC; 01-27-2013 at 03:55 PM..
# 3  
Old 01-27-2013
It works like a charm!

Thank you so much!
