Calculating Running Variance Using Awk


 
# 1  
Old 07-03-2012
Hi all,

I am trying to calculate a running variance for a file containing a single column of numbers. I am using the formula variance = sum((x - mean(x))^2) / (n - 1), where the sum runs over every value x from the first row through the current row, mean(x) is the average of those values, and n is the number of rows up to and including the current row.

For example, given a column of three numbers:

Code:
100
100
-50

The variance should be:

Code:
0
0
7500

When we get to row three, mean(x) = (100+100+(-50))/3 = 50, so the variance is:

Code:
variance= ((100-50)^2 + (100-50)^2 + (-50-50)^2)/(3-1) = (50^2 + 50^2 + 100^2) / 2 = 15000/2 = 7500

My question is: how do I do this with awk so that it prints the running variance on each line? I am already using awk for several other mathematical operations on my data, so I would prefer to use it here as well; however, if there is a more appropriate tool for the job, I would like to hear about it.


Thanks,

-Jahn

Last edited by Scott; 07-03-2012 at 02:28 PM.. Reason: Added code tags
# 2  
Old 07-03-2012
Because the mean changes with every new row, the formula as written has to be recalculated from scratch on every single line, so it's not a matter of which 'tool' you use; it's just a matter of storing the data and doing the work:

Code:
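awk '{ D[NR] = $0; T += $0 }        # store every value and keep a running sum
     NR == 1 { print 0; next }      # a single value is reported as variance 0
     {
         V = 0
         A = T / NR                 # mean of rows 1..NR
         for (N = 1; N <= NR; N++)  # sum of squared deviations from that mean
             V += (D[N] - A) * (D[N] - A)
         V /= (NR - 1)              # sample variance (n-1 denominator)
         $0 = V                     # replace the line with the result
     } 1' data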
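
For what it's worth, if re-scanning the stored values on every line gets slow on a large file, a one-pass alternative is Welford's online algorithm, which updates the mean and the sum of squared deviations incrementally. This is not the code above, just a minimal sketch assuming one number per line in the first field; the function name running_variance and the variable names are made up for illustration.

```shell
# Hypothetical helper: the name running_variance is illustrative only.
# Reads one number per line on stdin, prints the running sample variance.
running_variance() {
    awk '
    {
        n++
        delta = $1 - mean           # distance from the previous mean
        mean += delta / n           # updated mean of rows 1..n
        m2 += delta * ($1 - mean)   # running sum of squared deviations
        print (n > 1 ? m2 / (n - 1) : 0)   # report 0 for the first row
    }'
}

# The three-row example from the first post
printf '100\n100\n-50\n' | running_variance
```

For the example data this prints 0, 0 and 7500, matching the expected output in the first post. It needs no array, so memory use stays constant however long the file is.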
