Calculate Correlation between two fields !


 
# 1  
02-23-2011

Hello,

I would like help with a shell script (awk) that asks for two inputs and, for each one, calculates the correlation between two fields (3 and 4) over that many of the last rows.

Data:
Code:
EC-GLD,1/25/2011,41.270000,129.070000
EC-GLD,1/26/2011,41.550000,129.280000
EC-GLD,1/27/2011,42.260000,127.800000
EC-GLD,1/28/2011,41.940000,127.950000
EC-GLD,1/31/2011,42.380000,129.250000
EC-GLD,2/1/2011,42.580000,129.330000
EC-GLD,2/2/2011,42.650000,129.450000
EC-GLD,2/3/2011,41.990000,129.280000
EC-GLD,2/4/2011,41.170000,131.230000
EC-GLD,2/7/2011,41.300000,131.230000
EC-GLD,2/8/2011,41.650000,132.800000
EC-GLD,2/9/2011,41.900000,132.490000
EC-GLD,2/10/2011,41.400000,132.000000
EC-GLD,2/11/2011,41.240000,132.090000
EC-GLD,2/14/2011,41.200000,132.700000
EC-GLD,2/15/2011,41.010000,133.630000
EC-GLD,2/16/2011,41.040000,133.450000
EC-GLD,2/17/2011,40.540000,134.470000
EC-GLD,2/18/2011,39.690000,134.880000
EC-GLD,2/22/2011,39.660000,136.190000

For example:
----> Correlation 1 : 15 (Last rows)
----> Correlation 2 : 10 (Last rows)

Output
Code:
EC-GLD,15,-0.8979
EC-GLD,10,-0.9037

Thanks a lot for your help

Last edited by Franklin52; 02-24-2011 at 03:36 AM.. Reason: Please use code tags, thank you
# 2  
02-24-2011
cor.awk
Code:
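#!/bin/sh
# Usage: ./cor.awk N
# Prints the correlation of fields 3 and 4 over the last N rows of infile.
awk -F, -v c="$1" '
{ d=$1
  # shift the stored values up one slot so x[0], y[0] hold the newest row
  for(i=c;i;i--) {
    x[i]=x[i-1]
    y[i]=y[i-1] }
  x[0]=$3
  y[0]=$4
}
END { for(i=0;i<c;i++) {    # sums over the c most recent rows
        sx+=x[i]
        sy+=y[i]
        sxy+=x[i]*y[i]
        sx2+=x[i]*x[i]
        sy2+=y[i]*y[i] }
    print d, c, ( c * sxy - sx * sy ) / ( sqrt(c*sx2 - sx*sx) * sqrt(c*sy2 - sy * sy))
} ' OFS=, infile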
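
For reference, the expression in the print statement looks like the usual computational form of Pearson's r, with n standing for c (the number of rows used) and the sums matching sx, sy, sxy, sx2 and sy2 in the script. Written out (the notation is mine, not from the script):

```latex
r = \frac{n \sum_{i} x_i y_i \;-\; \sum_{i} x_i \sum_{i} y_i}
         {\sqrt{n \sum_{i} x_i^2 - \left(\sum_{i} x_i\right)^2}\;
          \sqrt{n \sum_{i} y_i^2 - \left(\sum_{i} y_i\right)^2}}
```

Here x_i is field 3 and y_i is field 4 of the i-th of the last n rows. The denominator is zero when either field is constant over those rows, in which case r is undefined.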

Code:
$ ./cor.awk 10
EC-GLD,10,-0.903706
 
$ ./cor.awk 15
EC-GLD,15,-0.897988
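
The script above shifts every stored value on each input line. A possible variation, sketched here under a few assumptions, keeps only the last N rows in a ring buffer instead. The function name corr_ring and the "NA" output for undefined results are my own choices for illustration, N is assumed to be an integer of at least 1, and the file name is passed as a second argument rather than hard-coded:

```shell
#!/bin/sh
# corr_ring N FILE: Pearson r of fields 3 and 4 over the last N rows of FILE.
# corr_ring is a hypothetical name; N is assumed to be an integer >= 1.
corr_ring() {
    awk -F, -v c="$1" '
        # Slot NR % c is overwritten every c rows, so x[] and y[] always
        # hold the most recent min(NR, c) values.
        { k = NR % c; x[k] = $3; y[k] = $4; id = $1 }
        END {
            for (k in x) {
                n++
                sx  += x[k];       sy  += y[k]
                sxy += x[k]*y[k]
                sx2 += x[k]*x[k];  sy2 += y[k]*y[k]
            }
            den = (n*sx2 - sx*sx) * (n*sy2 - sy*sy)
            # r is undefined when either field is constant or there is no data
            if (den <= 0) { printf "%s,%d,NA\n", id, n; exit }
            printf "%s,%d,%.4f\n", id, n, (n*sxy - sx*sy) / sqrt(den)
        }' "$2"
}
```

It would be called as corr_ring 10 infile. Note that it reports the number of rows actually used, which is smaller than N when the file is shorter than N rows.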


Script could now be something like:

Code:
printf "Correlation 1: "
read c1
printf "Correlation 2: "
read c2
./cor.awk "$c1"
./cor.awk "$c2"
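
Since the row counts come straight from the keyboard, it may be worth checking them before calling the script. A minimal sketch, assuming a positive whole number is the only valid input (is_pos_int is a hypothetical helper name, not part of the script above):

```shell
#!/bin/sh
# is_pos_int VALUE: succeed only if VALUE is a positive integer.
# (hypothetical helper, shown for illustration)
is_pos_int() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]                 # reject 0
}

# Possible use in the prompt script:
#   printf "Correlation 1: "
#   read c1
#   is_pos_int "$c1" || { echo "Please enter a positive integer" >&2; exit 1; }
```

This rejects empty input, signs, decimals and zero, so the awk script only ever sees a usable row count.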

---------- Post updated 25-02-11 at 12:44 PM ---------- Previous update was 24-02-11 at 02:32 PM ----------

This is the 2nd time I've done (what now looks like) stats homework for you, Carlos, without any thanks or recognition. To top things off, you have been asked several times to put [code] tags around the data files and scripts you post, which you just seem to ignore.

Well, no more solutions from me, sorry!

Last edited by Chubler_XL; 02-24-2011 at 10:48 PM.. Reason: Added EC-GLD and count to output