I am attempting to calculate a running variance for a file containing a column of numbers. I am using the formula variance = sum((x - mean(x))^2)/(n-1), where the sum runs over every value x from the first row up to and including the current row, mean(x) is the average of those values, and n is the number of rows read so far.
For example, given a column of three numbers (100, 100 and -50):
The variance should be:
Because when we get to row three, mean(x) = (100+100+(-50))/3 = 50, and the variance would therefore be ((100-50)^2 + (100-50)^2 + (-50-50)^2)/(3-1) = 15000/2 = 7500.
My question is, how do I do this with awk to generate a running total of the variance per line? I am using awk to perform several other mathematical operations on my data, so I would prefer to use it for this operation as well; however, if there is a more appropriate tool for doing this, I would like to hear about it.
Thanks,
-Jahn
Last edited by Scott; 07-03-2012 at 02:28 PM..
Reason: Added code tags
With the formula as written, the mean changes on every row, so the whole sum has to be recalculated for every single line. It's not a matter of which 'tool' you use; it's just a matter of storing the data and doing the work...
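As a rough sketch of that store-and-recompute approach in awk (not necessarily what the poster had in mind): the function name running_variance is made up for illustration, the values are assumed to be in the first column, and printing "NA" on the first row is my own choice, since n-1 is zero there.

```shell
# running_variance is a hypothetical helper name; it reads the numbers from
# column 1 of the given files, or from standard input if no file is named.
running_variance() {
    awk '
    {
        x[NR] = $1          # store every value seen so far
        sum += $1           # running total, used for the current mean
        mean = sum / NR

        ss = 0              # sum of squared deviations from the current mean
        for (i = 1; i <= NR; i++)
            ss += (x[i] - mean) ^ 2

        # n-1 is zero on the first row, so the sample variance is undefined there
        if (NR > 1)
            print $1, ss / (NR - 1)
        else
            print $1, "NA"
    }' "$@"
}

# The three values from the example above
printf '100\n100\n-50\n' | running_variance
```

For the example this should print NA for the first row, 0 for the second and 7500 for the third. Note that it rescans all stored rows on every line, so the work grows with the square of the file length; a one-pass update such as Welford's method avoids the rescan, but that is a different technique from the one described here.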