Compute average based on field values


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Compute average based on field values
# 1  
Old 02-22-2018
Compute average based on field values

Im looking for a way to average the values in field 14 (when field 2 is equal to 2016) and fields 3 and 4 (when field 2 is equal to 2017).

Any help is appreciated.

Code:
001001  2016    33.22   38.19   48.07   51.75   59.77   67.68   70.86   72.21   66.92   53.67   42.31   40.15
001001  2017    42.04   44.02   47.01   54.81   57.97   67.21   70.88   70.11   63.86   55.80   42.96   37.83

(40.15 + 42.04 + 44.02) / 3

Expected output
Code:
001001  42.07

# 2  
Old 02-22-2018
After six and a half years as a member and with more than 130 posts, some of which on average handling / calculation, one would presume you have at least some idea of an approach. So - any attempts from your side?
These 2 Users Gave Thanks to RudiC For This Post:
# 3  
Old 02-22-2018
Using the example that RudiC linked to - I know how to get the average of fields 3 and 4, but Im not sure how to include field 14 in the calculation. The below script is not clean but appears to work.

Code:
 awk 'NR == FNR {if ($2==2017) {SUM[$1] += ($3 + $4)/2; CNT[$1]++}; next} {print $0, SUM[$1]/CNT[$1]}' OFS="\t" OFMT="%.2f" file file

Code:
001001  2016    33.22   38.19   48.07   51.75   59.77   67.68   70.86   72.21   66.92   53.67   42.31   40.150          43.03
001001  2017    42.04   44.02   47.01   54.81   57.97   67.21   70.88   70.11   63.86   55.80   42.96   37.830          43.03

# 4  
Old 02-22-2018
Quote:
Originally Posted by ncwxpanther
. . . but appears to work.
Does it? For your 2017 line, yes, but not for your specified task for the entire data file. To implement that, you need to tell us,
- are the records sorted in ascending / descending order?
- are there more / other lines, on top of 2016 or 2017 lines, like 2015 or 2018? Should those be ignored?

Try
Code:
awk '$2==2016 {SUM[$1]+=$14; CNT[$1]++} $2==2017 {SUM[$1]+=$3+$4; CNT[$1]+=2} END {for (s in SUM) print s, SUM[s]/CNT[s]}' file

If there are no other data / years to be found, this can be simplified. The order of data is not conserved.


EDIT: IF there's 2016 & 2017 only, try
Code:
awk '
        {SUM[$1] += ($2==2016 ? $14 : $3+$4)
         CNT[$1] += $2-2015
        }

END     {for (s in SUM) print s, SUM[s]/CNT[s]
        }

' file
001001 42.07


Last edited by RudiC; 02-22-2018 at 03:39 PM..
# 5  
Old 02-23-2018
The records are sorted by field 2 then by field 1.

Code:
001001  2015    32.97   30.74   48.52   56.88   60.67   69.22   72.41   69.48   64.44   54.18   49.64   47.21
001001  2016    33.22   38.19   48.07   51.75   59.77   67.68   70.86   72.21   66.92   53.67   42.31   40.15
001001  2017    42.04   44.02   47.01   54.81   57.97   67.21   70.88   70.11   63.86   55.80   42.96   37.83
001001  2018    30.78   -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99  -99.99
001002  1900    39.87   34.00   49.69   56.97   63.68   71.04   73.20   73.63   71.76   54.52   47.71   40.05
001002  1901    40.68   43.45   49.51   60.91   69.08   71.44   73.69   73.94   67.93   57.72   52.77   41.38
001002  1902    38.23   47.23   58.32   56.21   60.51   72.48   73.54   72.19   67.95   60.04   49.51   43.66


Your script
Code:
awk '$2==2016 {SUM[$1]+=$14; CNT[$1]++} $2==2017 {SUM[$1]+=$3+$4; CNT[$1]+=2} END {for (s in SUM) print s, SUM[s]/CNT[s]}'

works for a 3 value computation ((2016 field 14 + 2017 field 3 + field 4)/3)

Output
Code:
001001 42.07

but once you add more values, the calculation is not correct.


For instance
Code:
awk '$2>=2015 {SUM[$1]+=$14; CNT[$1]++} $2<=2018 {SUM[$1]+=$3+$4; CNT[$1]+=2} END {for (s in SUM) print s, SUM[s]/CNT[s]}'

Output
Code:
001001 35.2234

I could not replicate the end result in excel. Perhaps its the ordering/sorting? There are additional lines on top of 2015 so that could be an issue. They should not be ignored.
# 6  
Old 02-23-2018
Quote:
Originally Posted by ncwxpanther
The records are sorted by field 2 then by field 1.
They are not. It's field 1, then field 2.

Quote:
. . . but once you add more values, the calculation is not correct. . . .
How can that be "not correct"? You didn't specify what to do for field 2 values other than 2016 and 2017, so "ignore" was assumed. With your NEW sample data, the proposal given yields
Code:
001001 42.07

- exactly what was requested.

With your modified code, several fields will be counted more than once, falsifying the average.
# 7  
Old 02-23-2018
You are correct. The data is sorted by field 1 than field 2.

Can the code be modified to work across multiple fields for instance for values between 2015 and 2018? Much like the script you helped me with here. Seems like you would just divide by the number of fields examined.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Problem with getting awk to multiply a field by a value set based on condition of another field

Hi, So awk is driving me crazy on this one. I have searched everywhere and read man, docs and every related post Google can find and still no luck. The actual files I need to run this on are sensitive in nature, but it is the same thing as if I needed to calculate weighted grades for multiple... (15 Replies)
Discussion started by: cotilloe
15 Replies

2. Shell Programming and Scripting

awk to look up values in File 2 from File 1, & printingNth field of File1 based value of File2 $2

I have two files which are the output of a multiple choice vocab test (60 separate questions) from 104 people (there are some missing responses) and the question list. I have the item list in one file (File1) Item,Stimulus,Choice1,Choice2,Choice3,Choice4,Correct... (5 Replies)
Discussion started by: samonl
5 Replies

3. UNIX for Beginners Questions & Answers

Print lines based upon unique values in Nth field

For some reason I am having difficulty performing what should be a fairly easy task. I would like to print lines of a file that have a unique value in the first field. For example, I have a large data-set with the following excerpt: PS003,001 MZMWR/ L-DWD// * PS003,001... (4 Replies)
Discussion started by: jvoot
4 Replies

4. Shell Programming and Scripting

awk to adjust coordinates in field based on sequential numbers in another field

I am trying to output a tab-delimited result that uses the data from a tab-delimited file to combine and subtract specific lines. If $4 matches in each line then the first matching sequential $6 value is added to $2, unless the value is 1, then the original $2 is used (like in the case of line... (3 Replies)
Discussion started by: cmccabe
3 Replies

5. Shell Programming and Scripting

awk to combine by field and average by another

In the below awk I am trying to combine all matching $4 into a single $5 (up to the -), and count the lines in $6 and average all values in $7. The awk is close but it seems to only be using the last line in the file and skipping all others. The posted input is a sample of the file that is over... (3 Replies)
Discussion started by: cmccabe
3 Replies

6. Shell Programming and Scripting

Shell or awk script to compute average of all the points within a circle

HI Help, I have a file which looks like below --- Input file ---> 1970113.00000 3460.00000 1.09516 1970116.00000 3791.00000 1.06350 1970120.00000 4120.00000 1.07588 1970115.00000 4450.00000 1.09591 1970116.00000 4780.00000 1.09965 1970120.00000 5109.00000 1.06733 ... (7 Replies)
Discussion started by: Indra2011
7 Replies

7. Shell Programming and Scripting

Compute average ignoring outliers of different segments within a dat file using awk

I have data files that look like this, say data.txt 0.00833 6.34 0.00833 6.95 0.00833 7.08 0.00833 8.07 0.00833 8.12 0.00833 8.26 0.00833 8.70 0.00833 9.36 0.01667 20.53 0.01667 6.35 0.01667 6.94 0.01667 7.07 0.01667 8.06 0.01667 8.10 0.01667 8.25 0.01667 8.71 0.01667 9.31... (7 Replies)
Discussion started by: malandisa
7 Replies

8. Shell Programming and Scripting

Get the average from column and write the value at the last field

Dear Experts, Kindly help me please to get the average from column 14 and to write the value at the last field., But we need to take as reference the column 16., example the first 4 lines has the same value in column 16, therefore I want ot get the average only for these lines in column 14. And... (2 Replies)
Discussion started by: jiam912
2 Replies

9. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my... (5 Replies)
Discussion started by: bhargavpbk88
5 Replies

10. Shell Programming and Scripting

Find top N values for field X based on field Y's value

I want to find the top N entries for a certain field based on the values of another field. For example if N=3, we want the 3 best values for each entry: Entry1 ||| 100 Entry1 ||| 95 Entry1 ||| 30 Entry1 ||| 80 Entry1 ||| 50 Entry2 ||| 40 Entry2 ||| 20 Entry2 ||| 10 Entry2 ||| 50... (1 Reply)
Discussion started by: FrancoisCN
1 Replies
Login or Register to Ask a Question