AWK script for standard deviation / root mean square deviation


 
# 8  
Old 01-12-2012
Code:
nawk '{s=0;for(i=1;i<=NF;i++) s+=($NF-$i)^2;print sqrt(s)}' slice.txt

# 9  
Old 01-13-2012
Thanks, guys,
this solved the issue.

I am very grateful for your patience and your support.

X

---------- Post updated 01-13-12 at 05:58 AM ---------- Previous update was 01-12-12 at 04:37 PM ----------

Hi guys,
a little extra question,

what if I wanted the program to work the other way around?

Parse ONLY two columns per iteration, e.g. column 1 ($1) and column 50 ($50), and subtract the value in column 1 from the value in column 50 on row 1, then on row 2, and so on down the rows, until I have one RMSD number.

THEN move on to column 2 ($2) and column 50 ($50) and obtain a second RMSD number,
and so on until it is column 50 ($50) vs. column 50 ($50).

Currently my script parses each line separately, and it is great, but I want to try the other way around.

---------- Post updated at 06:09 AM ---------- Previous update was at 05:58 AM ----------

so instead of NF I thought of using NR to go down the column instead of across?

---------- Post updated at 06:45 AM ---------- Previous update was at 06:09 AM ----------

So, for instance,

to obtain the RMSD between column 1 and column 50, I wrote:



Code:
{s=0;NF==1;for(i=1;i<=NR;i++)} s+=($NF-$50)^2;print sqrt(s/NR)}

but I get an error:

Code:
awk -f rmsd4 merge.pmf > test2
awk: syntax error at source line 2 source file rmsd4
 context is
    {s=0;NF==1;for(i=1;i<=NR;i++)} >>>  s+=($NR-$36)^2;print <<<  sqrt(s/NR)}
    extra }
awk: bailing out at source line 4


Last edited by chrisjorg; 01-13-2012 at 07:51 AM..
# 10  
Old 01-13-2012
If you could post a sample of your data, a sample of what output you want, and show your calculations, that would be much, much better than trying to explain an algorithm in casual English. It's a little more work for you, I understand, but it also means a much greater chance of being understood on the first try. And since all you need is a single line of data to demonstrate, it really isn't so awful. It may also help you get it organized better and see an algorithm yourself.

I don't understand what you're getting at now, and since your code doesn't work, it's not a good demonstration either.

I think I see your syntax error -- an extra closing brace right after the for(...):

Code:
{s=0;NF==1;for(i=1;i<=NR;i++)} s+=($NF-$50)^2;print sqrt(s/NR)}

There are logic errors too, though.

The NF==1 statement (the one in blue), though, what did you intend that to do? It is a comparison whose result is thrown away, so right now it's a complete no-op.

NR is the number of lines (records), not the number of fields. I think you want (i=1;i<=NF;i++) and sqrt(s/NF)

I also think that's off by one, since that will include the last column, the average itself, so:

(i=1;i<NF;i++) and sqrt(s/(NF-1))

And then there is ($NF-$50), the thing in green. Since NF doesn't change until the number of columns does, and the loop variable i is never used, this is always adding the same columns. In fact, if you have 50 columns, $NF will be column 50, causing the result to always be zero!
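
Putting those fixes together, the row-wise version might look like the sketch below. It assumes the last column is the reference value on every line and skips lines with fewer than two fields. The rmsd_rows wrapper name and the 3x3 sample input are made up for illustration and are not from the thread.

```shell
# Hypothetical wrapper: row-wise RMSD of fields 1..NF-1 against the last field.
rmsd_rows() {
  awk 'NF > 1 {
    s = 0
    for (i = 1; i < NF; i++)      # stop before NF: skip the reference column itself
      s += ($NF - $i)^2           # $i changes with the loop; $NF is the reference
    print sqrt(s / (NF - 1))      # divide by the number of columns actually summed
  }' "$@"
}

# Illustrative input only; prints 1.58114 for each of the three rows
printf '1 2 3\n4 5 6\n7 8 9\n' | rmsd_rows
```

With a real file it would be called as rmsd_rows slice.txt. Note that this still works across each line, one result per row, which is not the column-wise calculation asked about above.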

Last edited by Corona688; 01-13-2012 at 12:14 PM..
# 11  
Old 01-13-2012
I think the OP wants to do the calculations based on columns, not based on rows as initially stated. But I fail to understand the algorithm entirely - using 'NR' instead of 'NF' is not going to achieve the desired results.
One would need to parse all lines and build up a 2-D matrix and navigate it by columns performing the desired calculation.
Once again, this is all somewhat nebulous for me.
If the OP could demonstrate what he/she/it is after given a simple table (say 3x3 - not necessarily Nx50)
Code:
1 2 3
4 5 6
7 8 9

and a sample output - that would be helpful.
# 12  
Old 01-16-2012
Hi,
ok,
I understand I was not clear enough.

Code:
BEGIN {s_0=0;n_0=0}
      {n_0++;s_0+=($50-$1)^2}
END {print sqrt(s_0/n_0)}

BEGIN {s_1=0;n_1=0}
      {n_1++;s_1+=($50-$2)^2}
END {print sqrt(s_1/n_1)}
...
BEGIN {s_50=0;n_50=0}
      {n_50++;s_50+=($50-$50)^2}
END {print sqrt(s_50/n_50)}


I have written code that works on a file with 50 columns of data and calculates the RMSD of each column relative to the last column.

The script is pretty primitive and involves 50 different sets of variables. Is there a way to make it automatic by simply incrementing the x in s_x and n_x, instead of having to define the 50 variables manually? Also, instead of hard-coding the last column as $50, how do I do it with $NF?
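
One possible way to do this is to replace the numbered variables s_0 ... s_50 with an awk array indexed by column number, and to use $NF for the last column. The sketch below assumes every row has the same number of fields. The rmsd_cols wrapper name, the nf variable and the 3x3 sample input are hypothetical, chosen only for illustration. A separate per-column counter is not needed, because every column sees every row and NR already holds the row count.

```shell
# Hypothetical wrapper: RMSD of every column against the last column, down the rows.
rmsd_cols() {
  awk '{
    for (i = 1; i <= NF; i++)
      s[i] += ($NF - $i)^2        # one running sum per column, indexed by i
    if (NF > nf) nf = NF          # remember the column count for the END block
  }
  END {
    if (NR > 0)
      for (i = 1; i <= nf; i++)
        print i, sqrt(s[i] / NR)  # column number, then its RMSD over all rows
  }' "$@"
}

# Illustrative input only; prints "1 2", "2 1" and "3 0", one pair per line
printf '1 2 3\n4 5 6\n7 8 9\n' | rmsd_cols
```

On the real data this would be run as rmsd_cols merge.pmf. The last column compared with itself always gives 0, matching the s_50 case in the hand-written version. Because only running sums are kept, the whole table never has to be stored as a 2-D matrix.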
# 13  
Old 01-18-2012
How about that sample input and output?