Script for finding standard deviation


 
# 1  
09-11-2008

I have a CSV file that looks like this:
Code:
 
0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0
10,11,7,0,4,12,2,3,7,0,11,3,12,4,0,5,5,4,5,0,8,6,12,0,9,3,3,0,2,7,8
19,11,7,0,4,14,16,10,8,2,13,7,15,6,0,76,6,4,10,0,18,10,17,1,11,3,3,0,9,9,8
22,11,13,1,5,14,16,10,9,10,13,7,16,6,0,59,6,4,10,0,18,13,17,1,11,3,3,0,12,9,10
22,11,13,1,5,14,16,10,9,10,13,7,16,6,22,90,6,4,10,0,18,13,17,1,11,3,4,0,12,9,10
41,18,27,9,27,41,59,20,27,54,63,34,28,43,40,131,7,8,19,0,62,16,30,23,25,3,4,9,24,12,19
42,18,27,9,27,41,59,20,27,55,68,36,28,46,41,132,7,8,19,13,64,16,31,25,25,3,4,9,24,12,19
125,124,78,62,97,87,145,70,87,119,150,124,99,95,41,175,85,58,57,88,142,83,92,102,107,80,45,64,64,94,89
125,126,78,62,99,87,145,70,87,119,161,124,99,95,41,175,85,58,58,88,142,84,112,103,108,80,68,64,65,98,89
189,254,164,153,192,153,230,132,188,163,210,210,167,198,93,235,146,110,97,130,211,107,181,140,151,119,105,105,178,126,165
189,324,168,192,194,159,233,132,192,169,244,210,167,201,103,235,147,152,180,181,213,107,192,190,212,119,119,126,195,126,166
189,324,168,255,194,225,233,141,192,230,244,260,167,201,172,283,181,206,217,216,261,107,192,235,212,119,169,197,264,189,229
366,438,315,319,382,287,398,320,416,382,407,397,342,448,276,392,297,368,237,347,336,332,384,405,412,284,329,350,396,326,356

I need to find the standard deviation for each individual row. Here is the code I have so far. I can't get the square root to work, and I also can't get any floating-point numbers.
Code:
 
for i in `cat file.csv ` 
do
     x1=0
     x2=0
     sigma=0
     IFS=, 
     for f in $i 
          do  
          let x1=$x1+$f
          let x2=$f*$f+$x2
     done 
     let x1=$x1/30
     let x2=$x2/30
     let sigma=sqrt($x2-$x1*$x1)
     echo "Mean = " $x1
     echo "Standard Deviation = " $sigma
done
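For reference, the script is using the one-pass formula for the population standard deviation, with x1 standing for the mean and x2 for the mean of the squares. Here N is the number of values in a row; note that each row of the sample file above actually has 31 comma-separated values, so dividing by a hard-coded 30 would be off by one:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} x_i^{2} \;-\; \bar{x}^{2}}$$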

# 2  
09-11-2008
The shell only does integer arithmetic. You need to use awk, perl, or some other environment that supports floating-point (FP) operations.
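A minimal illustration of the difference, assuming a POSIX shell and any POSIX awk (the variable name int_half is just for the example):

```shell
# Shell arithmetic expansion works on integers only, so 7/2 is truncated.
int_half=$(( 7 / 2 ))
echo "$int_half"                             # prints 3

# awk does its arithmetic in floating point and has a built-in sqrt().
awk 'BEGIN { print 7 / 2 }'                  # prints 3.5
awk 'BEGIN { printf "%.4f\n", sqrt(2) }'     # prints 1.4142
```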
# 3  
09-11-2008
OK. Can anyone help me rewrite the script above in awk or perl? awk would be preferred. Thanks.
# 4  
09-11-2008
Using your algorithm, in awk, which supports FP:
Code:
awk -F','  '{ sum=0; sumsq=0;
                for(i=1; i<=NF;i++) {sum+=$i; sumsq+=$i*$i}
                printf("mean=%f  stddev= %f\n", sum/NF, sqrt(sumsq - (sum*sum)) )
              } ' file.csv

# 5  
09-11-2008
Perhaps way out in left field, but

Depending on the accuracy required, you might consider:
(a) Multiplying each of your values by 100 or 1000 before doing any math. You then have to remember that the extra digits belong after the decimal point. For example, 3/2 = 1 in integer arithmetic, but 300/2 = 150, or 1.50 once adjusted.
(b) An approximation for the square root can be done in two parts. First, add up the odd numbers (1, 3, 5, 7, ...) until the sum is greater than the starting number; the integer part of the square root is one less than the number of terms you added. For example, for the square root of 10 you add 1+3+5+7 = 16, which is greater than 10; that took four terms, so the integer square root is 3. It is perhaps easier to see in the following list [integer part only]:
sqrt(1): 1+3 = 4 is more than 1, so 1
sqrt(2): 1+3 = 4 is more than 2, so 1
sqrt(3): 1+3 = 4 is more than 3, so 1
sqrt(4): 1+3+5 = 9 is more than 4, so 2 (again, one less than the number of terms)
sqrt(5): 1+3+5 = 9 is more than 5, so 2
...
sqrt(9): 1+3+5+7 = 16 is more than 9, so 3
To get the decimal part there is another, stranger method that looks at the remainder. The square root of 5 starts with 2, as seen above. Adding 1+3+5 gives 9, which is 4 too many (9-5). The last number in 1+3+5 was 5, and since I have 4 too many, I only needed 1 of it (5-4=1). Take the 1 and the 5 and divide: 1/5 = .2
Add the integer 2 to the .2 and you get 2.2, versus the actual value of about 2.236.

For 8, again start with 2 as the integer part. The sum 9 is 1 too many (9-8). The last number was 5 again (in 1+3+5), so I only needed 4 of it. Take that 4 and the 5: 4/5 = .8
Add the integer 2 to this .8 and you get 2.8, versus the actual value of about 2.828.

This estimate always comes out slightly low. For most inputs it is within a few hundredths of the exact answer, although for very small inputs the error is larger (for 2 it gives 1.33 instead of about 1.414).
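Both tricks can be tried with nothing but POSIX shell integer arithmetic. The sketch below is only an illustration: the function names fixdiv, isqrt and approx_sqrt are hypothetical, inputs are assumed to be non-negative integers, and results are truncated rather than rounded. In approx_sqrt, the last odd number added is 2k+1 (where k is the integer part) and the running sum overshoots to (k+1)*(k+1), so the part that was actually needed is N - k*k, which gives sqrt(N) ~= k + (N - k*k)/(2k+1). One caveat about (a): scale factors multiply, so if each value is multiplied by 100, each square is multiplied by 10000.

```shell
# (a) Fixed-point division: scale by 100 so two decimal places survive
# integer division. Assumes non-negative integers; result is truncated.
fixdiv() {
    scaled=$(( $1 * 100 / $2 ))              # e.g. 3 * 100 / 2 = 150
    printf '%d.%02d\n' $(( scaled / 100 )) $(( scaled % 100 ))
}

# (b) Integer part of sqrt(N): add odd numbers 1, 3, 5, ... until the sum
# exceeds N; the root is one less than the number of terms added.
isqrt() {
    n=$1 sum=0 odd=1 terms=0
    while [ "$sum" -le "$n" ]; do
        sum=$(( sum + odd ))
        odd=$(( odd + 2 ))
        terms=$(( terms + 1 ))
    done
    echo $(( terms - 1 ))
}

# (b) Decimal part: with k = isqrt(N), only N - k*k of the last odd number
# 2k+1 was needed, so sqrt(N) ~= k + (N - k*k) / (2k+1).
# The fraction is computed in hundredths and truncated.
approx_sqrt() {
    k=$(isqrt "$1")
    frac=$(( ($1 - k * k) * 100 / (2 * k + 1) ))
    printf '%d.%02d\n' "$k" "$frac"
}
```

For example, approx_sqrt 5 prints 2.20 and approx_sqrt 8 prints 2.80, the same as the hand calculations above.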

***
And I knew that by the time I wrote all that up, someone would have a program solution. But what the heck: if you can follow the logic of what I wrote for approximating sqrt, you might agree it's a cool function!

Last edited by joeyg; 09-11-2008 at 11:44 AM. Reason: added comment at end
# 6  
09-11-2008
Quote:
Originally Posted by jim mcnamara
Using your algorithm, in awk, which supports FP:
Code:
awk -F','  '{ sum=0; sumsq=0;
                for(i=1; i<=NF;i++) {sum+=$i; sumsq+=$i*$i}
                printf("mean=%f  stddev= %f\n", sum/NF, sqrt(sumsq - (sum*sum)) )
              } ' file.csv

This awk code throws the following error:
awk: The sqrt parameter to a math library function is not in the domain.

This means the part that computes the mean works fine, but because sqrt throws an error, the standard deviation does not work. I think this is because sumsq - (sum*sum) is a negative number.
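One way to confirm that diagnosis, sketched with the first row of the sample file (row_sums is a hypothetical helper name). For non-negative values, sum*sum equals sumsq plus twice the sum of all pairwise products, so it is never smaller than sumsq; the shell script avoids this by dividing both sums by the count before subtracting, a step the awk one-liner leaves out.

```shell
# Hypothetical helper: for each input row, print NF, the sum, the sum of
# squares, and the value the awk one-liner passes to sqrt().
row_sums() {
    awk -F',' '{
        sum = 0; sumsq = 0
        for (i = 1; i <= NF; i++) { sum += $i; sumsq += $i * $i }
        print NF, sum, sumsq, sumsq - sum * sum
    }'
}

# First row of file.csv from the question.
echo '0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0' | row_sums
# prints: 31 6 8 -28
```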

Last edited by RJ17; 09-11-2008 at 12:16 PM.
# 7  
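09-11-2008
You are correct. I copied your algorithm incorrectly: your shell script divides both sums by the number of values before subtracting, and I left that step out. Without it, sumsq - (sum*sum) can never be positive for non-negative data, so for these rows sqrt() gets a negative argument. Here it is with the division by NF:

Code:
awk -F',' '{ sum=0; sumsq=0
             for (i=1; i<=NF; i++) { sum+=$i; sumsq+=$i*$i }
             mean = sum/NF
             var  = sumsq/NF - mean*mean
             if (var < 0) var = 0    # guard against a tiny negative value from rounding
             printf("mean=%f  stddev=%f\n", mean, sqrt(var))
           }' file.csv

The check on var is still worth keeping as a safety net: when the values in a row are all (nearly) equal, floating-point rounding can leave sumsq/NF - mean*mean slightly below zero. You could also do the check in a function, but awk functions must be defined at the top level of the program, outside the { ... } action block:

Code:
awk -F',' 'function abs(n) { return (n < 0) ? -n : n }
{
    sum=0; sumsq=0
    for (i=1; i<=NF; i++) { sum+=$i; sumsq+=$i*$i }
    mean = sum/NF
    printf("mean=%f  stddev=%f\n", mean, sqrt(abs(sumsq/NF - mean*mean)))
}' file.csv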
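For comparison, here is a two-pass variant, sketched under the assumption of a POSIX awk and plain comma-separated rows with no header line. It first computes each row's mean and then sums the squared deviations from that mean. Every term of that sum is a square, so the value handed to sqrt() can never be negative and no sign check is needed; it also avoids subtracting two large, nearly equal numbers, which is where the one-pass sum-of-squares formula loses precision. The function name rowstats and its optional argument are hypothetical. Passing 1 divides by N-1, giving the sample standard deviation; the default divides by N, as in the thread.

```shell
# rowstats [sample]: hypothetical helper. Reads comma-separated rows on stdin
# and prints the mean and standard deviation of each row.
# With no argument it divides by N (population standard deviation);
# with 1 it divides by N-1 (sample standard deviation).
rowstats() {
    awk -F',' -v sample="${1:-0}" '
    NF > 0 {                                   # skip blank lines
        # Pass 1: the mean of the row.
        sum = 0
        for (i = 1; i <= NF; i++) sum += $i
        mean = sum / NF
        # Pass 2: sum of squared deviations, which is never negative.
        ss = 0
        for (i = 1; i <= NF; i++) ss += ($i - mean) ^ 2
        n = (sample && NF > 1) ? NF - 1 : NF
        printf("mean=%f  stddev=%f\n", mean, sqrt(ss / n))
    }'
}
```

Usage would be `rowstats < file.csv`, or `rowstats 1 < file.csv` for the sample version.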


10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

SMA (Single Moving Average) and Standard Deviation

Hello Team, I am using the following awk script to calculate the SMA (Single Moving Average) for an specific period but now I would like to include the standard deviation output. Could you please help me to modify this awk shell script awk -F, -v points=5 ' { a = $2; ... (4 Replies)
Discussion started by: csierra

2. Shell Programming and Scripting

Output mean and standard deviation of a row

I have a file that looks that this: 820 890 530 1650 1600 1800 1850 1900 2270 1640 2300 1670 2080 2200 2350 1150 1630 2210 I would like to output the mean and standard deviation of each row so that my final output would look like this 820 890 530 746.667 155.849 1650 1600 1800... (5 Replies)
Discussion started by: kayak

3. Shell Programming and Scripting

Computing average and standard deviation from multiple text files

Hello there, I found an elegant solution to computing average values from multiple text files awk '{for (i=1;i<=NF;i++){if ($i!~"n/a"){a+=$i}else{b++}}}END{for (i=1;i<=FNR;i++){for (j=1;j<=NF;j++){printf (a/(3-b))((b>0)?"~"b" ":" ")};printf "\n"}}' file1 file2 file3 I tried to modify... (2 Replies)
Discussion started by: charmmilein

4. Shell Programming and Scripting

calculating row-wise standard deviation using awk

Hi, I have a file containing 100,000 rows-by-120 columns and I need to compute for the standard deviation for each row. Any idea on how to calculate row-wise standard deviation using awk? My sample data looks like this: input data: 23 35 12 25 16 17 18 19 29 12 12 26 15 14 15 23 12 12... (2 Replies)
Discussion started by: ida1215

5. Shell Programming and Scripting

Finding standard deviation for all columns in a data file

Hi All, I want someone to modify the below script from this forum so that it can be used for all columns in the file( instead of only printing column 3 mean and standard deviation values). I don't know how to loop around all the columns. ... (3 Replies)
Discussion started by: ks_reddy

6. Shell Programming and Scripting

AWK script for standard deviation / root mean square deviation

I have a file with say 50 columns, each containing a whole lot of data. Each column contains data from a separate simulation, but each simulation is related to the data in the last (REFERENCE) column $50 I need to calculate the RMS deviation for each data line, i.e. column 1 relative to... (12 Replies)
Discussion started by: chrisjorg

7. Shell Programming and Scripting

Standard deviation in awk

Hi all, I need to find the standard deviation of each column of a dataset below for each hour. The data is given in 5 second intervals as shown below DATE TIME FRAC_DAYS_SINCE_JAN1 FRAC_HRS_SINCE_JAN1 EPOCH_TIME ... (11 Replies)
Discussion started by: gd9629

8. Shell Programming and Scripting

using awk to print average and standard deviation into a file

Hi I want to use awk to print avg and st deviation but it does not go into a file for column 1 only. I can do average and # of records but i cannot get st deviation. awk '{sum+=$1} END { print "Average = ",sum/NR}' thanks (1 Reply)
Discussion started by: phil_heath

9. UNIX for Dummies Questions & Answers

Calculating the Standard Deviation for a column

Hi all, I want to calculate the standard deviation for a column (happens to be column 3). Does any know of simple awk script to do this? Thanks (1 Reply)
Discussion started by: kylle345

10. Shell Programming and Scripting

Mean and Standard deviation

Hi all, I am new to shell scripting and wanna calculate the mean and standard deviation using shell programming. I have a file with letters that are repeating and their corresponding duration a 0.32 a 0.89 aa 0.34 aa 0.23 au 0.012 au 0.26... (4 Replies)
Discussion started by: lakshmikanth.pg