Standard deviation in awk


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Standard deviation in awk
# 1  
Old 08-19-2011
Standard deviation in awk

Hi all,

I need to find the standard deviation of each column of a dataset below for each hour. The data is given in 5 second intervals as shown below

Code:
DATE                      TIME                      FRAC_DAYS_SINCE_JAN1      FRAC_HRS_SINCE_JAN1       EPOCH_TIME                ALARM_STATUS              species                   solenoid_valves           MPVPosition               OutletValve               CavityPressure            CavityTemp                WarmBoxTemp               EtalonTemp                DasTemp                   CO2_sync                  CO2_dry_sync              CH4_sync                  CH4_dry_sync              H2O_sync                  
2011-06-25                08:03:20.000              175.33564815              4208.055556               1308989000.000            0                         1.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1033031768E+004         1.3998939700E+002         4.4999687195E+001         4.4999909923E+001         4.4982554694E+001         3.1751035912E+001         3.8940279934E+002         3.9515856123E+002         3.8618224512E+000         3.9114643916E+000         9.4040946797E-001         
2011-06-25                08:03:25.000              175.33570602              4208.056944               1308989005.000            0                         2.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1036267092E+004         1.3999910014E+002         4.4999724978E+001         4.5000055343E+001         4.4982432169E+001         3.1750000000E+001         3.8935193355E+002         3.9511128829E+002         3.8489583767E+000         3.8990622477E+000         9.4172965077E-001         
2011-06-25                08:03:30.000              175.33576389              4208.058333               1308989010.000            0                         3.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1035811523E+004         1.4000220783E+002         4.4999819829E+001         4.5000161391E+001         4.4982414246E+001         3.1795673077E+001         3.8938861262E+002         3.9514732622E+002         3.8595829527E+000         3.9062814635E+000         9.3441447742E-001         
2011-06-25                08:03:35.000              175.33582176              4208.059722               1308989015.000            0                         2.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1036211602E+004         1.4000074680E+002         4.4999892989E+001         4.5000205994E+001         4.4982414246E+001         3.1750000000E+001         3.8945211683E+002         3.9517328822E+002         3.8841252351E+000         3.9451982461E+000         9.3417578630E-001         
2011-06-25                08:03:40.000              175.33587963              4208.061111               1308989020.000            0                         1.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1036852113E+004         1.3999663742E+002         4.4999757503E+001         4.5000099772E+001         4.4982477978E+001         3.1793508287E+001         3.8936261353E+002         3.9511123990E+002         3.8395527261E+000         3.8931575825E+000         9.4752583244E-001         
2011-06-25                08:03:45.000              175.33593750              4208.062500               1308989025.000            0                         2.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1032849098E+004         1.3998770158E+002         4.4999748230E+001         4.5000053406E+001         4.4982505798E+001         3.1750000000E+001         3.8931505434E+002         3.9512509568E+002         3.8351808367E+000         3.8599191928E+000         9.5367870751E-001         
2011-06-25                08:03:50.000              175.33599537              4208.063889               1308989030.000            0                         1.0000000000E+000         0.0000000000E+000         0.0000000000E+000         3.1039831891E+004         1.4001021926E+002         4.5000107359E+001         4.4999785829E+001         4.4982363833E+001         3.1750000000E+001         3.8929289686E+002         3.9508579576E+002         3.8474802185E+000         3.8609535036E+000         9.4079656257E-001

As each line 5 seconds, each hour would represent 720 lines! Basically need to find the SD each 720 lines.

I've worked out how to find the SD of a column (column 6 here)

Code:
awk  'NR>2 {sum+=$6; array[NR]=$6} END {for(x=1;x<=NR;x++){sumsq+=((array[x]-(sum/NR))^2);}print sqrt(sumsq/NR)}' INPUTFILE.dat


but instead of the whole column I need every hour (every 720 lines).

I've attached a file containing 3 hours worth of data if needed

Thanks a lot!
# 2  
Old 08-19-2011
Re: standard deviation in awk

Give this a try:

Code:
#!/usr/bin/awk

function std_dev(data, count) {
    sum=0;
    for( x=1; x <= count; x++) {
        sum += data[x];
    }

    avg = sum/count;

    sumsq=0;
    for( x=1; x <= count; x++) {
        sumsq += (data[x] - avg)^2;
    }

    print sqrt(sumsq/count);
}

BEGIN {
    cnt = 0;
}

END {
    std_dev(array, cnt);
}

{
    array[cnt++] = $7;

    if (cnt == 720) {
        std_dev(array, cnt);
        cnt = 0;
    }
}

# 3  
Old 08-19-2011
Hi Roboticus,

Thanks for the help. Am I right in thinking this will give me the hourly SD's for column 7? How would I get the hourly SD's of say columns 17 19 and 20 and place them next to each other?

Last edited by gd9629; 08-22-2011 at 12:13 PM..
# 4  
Old 08-20-2011
I'd create an array for each of the columns, something like:
Code:
{
   array7[cnt]=$7; array8[cnt]=$8; array9[cnt]=$9; cnt++;
   if (cnt==720) {
      print std_dev(array7,cnt) ", " std_dev(array8,cnt)
       ", " std_dev(array9,cnt);
   }
}

But you'd need to modify std_dev to return the value instead of printing it, update the END block, etc.

(I'd give it a go right now, but am having production issues. If you're still fighting it tomorrow, I can tune it up some.)

--roboticus
# 5  
Old 08-22-2011
Hi Roboticus,

I've tried combining the codes you've given me into

Code:
function std_dev(data, count) {
    sum=0;
    for( x=1; x <= count; x++) {
        sum += data[x];
    }

    avg = sum/count;

    sumsq=0;
    for( x=1; x <= count; x++) {
        sumsq += (data[x] - avg)^2;
    }

    print sqrt(sumsq/count);
}


BEGIN {
    cnt = 0;
}


END {
    std_dev(array17, cnt); std_dev(array19, cnt); std_dev(array20, cnt);
}

{
   array17[cnt]=$17; array19[cnt]=$19; array20[cnt]=$20; cnt++;
   if (cnt==720) {
      print std_dev(array17, cnt) " " std_dev(array19, cnt)
       " " std_dev(array20, cnt);
   }
	
}

Unfortunately this just gives me a few numbers, the meaning of which I'm unsure lol

Code:
71.1077
0.178527
 
0.0675481
38.6887
0.825408
0.140382

How should I edit the script further to give the 3 separate columns of SD values?

Thanks!

---------- Post updated at 04:19 PM ---------- Previous update was at 09:58 AM ----------

Hmm I've tried this now

Code:
BEGIN {
    cnt = 0;
}


END {
    std_dev(array17, cnt); std_dev(array19, cnt); std_dev(array20, cnt);
}

{
   array17[cnt++] = $17; array19[cnt++] = $19; array20[cnt++] = $20;
   
   if (cnt==720) {
    print  std_dev(array17, cnt) " " std_dev(array19, cnt)
       " " std_dev(array20, cnt); 
	   cnt = 0; 
	  
   }
	
}

and get this as my results

Code:
272.963
1.14211
 
0.477465
251.055
1.05022
 
0.446297
232.296
1.00244
 
0.43713
222.966
0.970351
 
0.412753
219.27
0.957796
 
0.404489
216.511
0.948255

Any ideas?

Last edited by gd9629; 08-22-2011 at 11:02 AM..
# 6  
Old 08-22-2011
I've fixed your code a bit:
Code:
function std_dev(data, count) {
    sum=0;
    for( x=1; x <= count; x++) {
        sum += data[x];
    }
    avg = sum/count;
    sumsq=0;
    for( x=1; x <= count; x++) {
        sumsq += (data[x] - avg)^2;
    }
    return sqrt(sumsq/count);
}
BEGIN {
    cnt = 0;
}
END {
    std_dev(array17, cnt); std_dev(array19, cnt); std_dev(array20, cnt);
}
NR>1{
   array17[cnt]=$17; array19[cnt]=$19; array20[cnt]=$20; cnt++;
   if (cnt==720) {
      print std_dev(array17, cnt) " " std_dev(array19, cnt) " " std_dev(array20, cnt);
      cnt=0;
   }
}

Run it as: awk -f script.awk testfile.txt
If you need times to be printed before those values, I might come up with something (in an hour or so).
This User Gave Thanks to bartus11 For This Post:
# 7  
Old 08-23-2011
Bartus I do believe you're a magician. If I want to combine two files

File 1
Code:
Site: Ridge Hill
yyyy mm dd hh         CO2                CH4         H20
2011 06 09 14  5.2977498477e+02 2.2533536482e+00 9.5975030698e-01
2011 06 09 15  4.6520528305e+02 2.0304031747e+00 8.5550947898e-01
2011 06 09 16  4.2844761381e+02 1.9940230591e+00 8.1996490325e-01
2011 06 09 17  4.2039731519e+02 2.1120403967e+00 8.8853008313e-01
2011 06 09 18  4.1383091367e+02 2.2051511063e+00 9.2270478393e-01
2011 06 09 19  4.0919424013e+02 2.2731077128e+00 9.3647077771e-01
2011 06 09 20  4.0612124764e+02 2.3469439802e+00 9.3830980979e-01
2011 06 09 21  4.0448877229e+02 2.4314198862e+00 9.3857043465e-01
2011 06 09 22  4.0341963267e+02 2.5037286957e+00 9.3017004739e-01
2011 06 09 23  4.0258560618e+02 2.5635443249e+00 9.2232603099e-01

File 2 (3 columns with all these values to 4.d.p)
Code:
71.0168 0.178157 0.0674305
24.332 0.087471 0.0448028
19.5077 0.082566 0.0407083
15.8686 0.0851615 0.0366714
15.507 0.0850516 0.0349926
15.3 0.0870619 0.0349696
15.1607 0.0904091 0.0350349
15.0886 0.0938745 0.0350621
15.0478 0.0954998 0.0348352
15.0147 0.0965944 0.0344979

Desired Output
Code:
Site: Ridge Hill
yyyy mm dd hh         CO2       	    CH4           		   H20
2011 06 09 14  5.2977498477e+02 71.0168	2.2533536482e+00 0.178157 9.5975030698e-01 0.0674305
2011 06 09 15  4.6520528305e+02 24.332	2.0304031747e+00 0.087471 8.5550947898e-01 0.0448028
2011 06 09 16  4.2844761381e+02 19.5077	1.9940230591e+00 0.082566 8.1996490325e-01 0.0407083
2011 06 09 17  4.2039731519e+02 15.8686	2.1120403967e+00 0.0851615 8.8853008313e-01 0.0366714
2011 06 09 18  4.1383091367e+02 15.507	2.2051511063e+00 0.0850516 9.2270478393e-01 0.0349926
2011 06 09 19  4.0919424013e+02 15.3 	2.2731077128e+00 0.0870619 9.3647077771e-01 0.0349696
2011 06 09 20  4.0612124764e+02 15.1607	2.3469439802e+00 0.0904091 9.3830980979e-01 0.0350349
2011 06 09 21  4.0448877229e+02 15.0886	2.4314198862e+00 0.0938745 9.3857043465e-01 0.0350621
2011 06 09 22  4.0341963267e+02 15.0478	2.5037286957e+00 0.0954998 9.3017004739e-01 0.0348352
2011 06 09 23  4.0258560618e+02 15.0147	2.5635443249e+00 0.0965944 9.2232603099e-01 0.0344979

Can I use the 'join' command? Or would it be better to use the 'FNR==NR' or perhaps something different?

Thanks

Last edited by gd9629; 08-23-2011 at 06:02 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

SMA (Single Moving Average) and Standard Deviation

Hello Team, I am using the following awk script to calculate the SMA (Single Moving Average) for an specific period but now I would like to include the standard deviation output. Could you please help me to modify this awk shell script awk -F, -v points=5 ' { a = $2; ... (4 Replies)
Discussion started by: csierra
4 Replies

2. Shell Programming and Scripting

Output mean and standard deviation of a row

I have a file that looks that this: 820 890 530 1650 1600 1800 1850 1900 2270 1640 2300 1670 2080 2200 2350 1150 1630 2210 I would like to output the mean and standard deviation of each row so that my final output would look like this 820 890 530 746.667 155.849 1650 1600 1800... (5 Replies)
Discussion started by: kayak
5 Replies

3. Shell Programming and Scripting

Computing average and standard deviation from multiple text files

Hello there, I found an elegant solution to computing average values from multiple text files awk '{for (i=1;i<=NF;i++){if ($i!~"n/a"){a+=$i}else{b++}}}END{for (i=1;i<=FNR;i++){for (j=1;j<=NF;j++){printf (a/(3-b))((b>0)?"~"b" ":" ")};printf "\n"}}' file1 file2 file3 I tried to modify... (2 Replies)
Discussion started by: charmmilein
2 Replies

4. Shell Programming and Scripting

calculating row-wise standard deviation using awk

Hi, I have a file containing 100,000 rows-by-120 columns and I need to compute for the standard deviation for each row. Any idea on how to calculate row-wise standard deviation using awk? My sample data looks like this: input data: 23 35 12 25 16 17 18 19 29 12 12 26 15 14 15 23 12 12... (2 Replies)
Discussion started by: ida1215
2 Replies

5. Shell Programming and Scripting

Finding standard deviation for all columns in a data file

Hi All, I want someone to modify the below script from this forum so that it can be used for all columns in the file( instead of only printing column 3 mean and standard deviation values). I don't know how to loop around all the columns. ... (3 Replies)
Discussion started by: ks_reddy
3 Replies

6. Shell Programming and Scripting

AWK script for standard deviation / root mean square deviation

I have a file with say 50 columns, each containing a whole lot of data. Each column contains data from a separate simulation, but each simulation is related to the data in the last (REFERENCE) column $50 I need to calculate the RMS deviation for each data line, i.e. column 1 relative to... (12 Replies)
Discussion started by: chrisjorg
12 Replies

7. Shell Programming and Scripting

using awk to print average and standard deviation into a file

Hi I want to use awk to print avg and st deviation but it does not go into a file for column 1 only. I can do average and # of records but i cannot get st deviation. awk '{sum+=$1} END { print "Average = ",sum/NR}' thanks (1 Reply)
Discussion started by: phil_heath
1 Replies

8. UNIX for Dummies Questions & Answers

Calculating the Standard Deviation for a column

Hi all, I want to calculate the standard deviation for a column (happens to be column 3). Does any know of simple awk script to do this? Thanks (1 Reply)
Discussion started by: kylle345
1 Replies

9. Shell Programming and Scripting

Mean and Standard deviation

Hi all, I am new to shell scripting and wanna calculate the mean and standard deviation using shell programming. I have a file with letters that are repeating and their corresponding duration a 0.32 a 0.89 aa 0.34 aa 0.23 au 0.012 au 0.26... (4 Replies)
Discussion started by: lakshmikanth.pg
4 Replies

10. Shell Programming and Scripting

Script for finding standard deviation

I have a CSV file that looks like 0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0 10,11,7,0,4,12,2,3,7,0,11,3,12,4,0,5,5,4,5,0,8,6,12,0,9,3,3,0,2,7,8 19,11,7,0,4,14,16,10,8,2,13,7,15,6,0,76,6,4,10,0,18,10,17,1,11,3,3,0,9,9,8... (7 Replies)
Discussion started by: RJ17
7 Replies
Login or Register to Ask a Question