Sponsored Content
Top Forums Shell Programming and Scripting Pearson correlation between two files Post 302647185 by balajesuri on Monday 28th of May 2012 01:41:03 AM
Old 05-28-2012
Code:
#! /usr/bin/perl -w
use strict;

my ($x_bar, $x_sd, $y_bar, $y_sd, $i, $numerator, $r);
my (@f1_data, @f2_data);

open F1, "< file1";
for (<F1>) {
    push (@f1_data, (split /\s+/)[2]);
}
close F1;

open F2, "< file2";
for (<F2>) {
    push (@f2_data, (split /\s+/)[2]);
}
close F2;

($x_bar, $x_sd) = avg_sd (@f1_data);
($y_bar, $y_sd) = avg_sd (@f2_data);

for ($i=0; $i<@f1_data; $i++) {
    $numerator += (($f1_data[$i] - $x_bar) * ($f2_data[$i] - $y_bar));
}

$r = $numerator / (@f1_data * $x_sd * $y_sd);
print "$r\n";

sub avg_sd {
    my ($sum, $avg, $sum_of_sq, $sd) = (0, 0, 0, 0);
    my @data = @_;
    for (@data) {
        $sum += $_;
    }
    $avg = $sum / @data;
    
    for (@data) {
        $sum_of_sq += (($_ - $avg) ** 2);
    }
    
    $sd = sqrt ($sum_of_sq / @data);
    
    return ($avg, $sd);
}

For the given two input files viz. file1 and file2, the correlation coefficient is 0.999125083532687.

By the way, if the input data are fewer in number, I'd suggest you use a scientific calculator. I was using a Casio FX 991 MS back in college Smilie I still have it. Masterpiece.
This User Gave Thanks to balajesuri For This Post:
 

8 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

chmod and cgi correlation

How much do chmod settings affect cgi scripts?? I have a "webmaster" at my work that says I cannot change the permissions on the cgi scripts, and that they work with only certain permissions. They are set for 644, I want to change them to 775 and put her in her own group, like she should be, not... (6 Replies)
Discussion started by: bigmacc
6 Replies

2. Shell Programming and Scripting

correlation coefficient - Awk

Hi guys I have an input file with multiple columns and and rows. Is it possible to calculate correlation of certain value of certain No (For example x of S1 = 112) with all other values (for example start with x 112 corr a 3 of S1 = x-a 0.2 ) INPUT ******* No S1 S2 S3 S4 Sn a 3 ... (2 Replies)
Discussion started by: quincyjones
2 Replies

3. Shell Programming and Scripting

Calculate Correlation between two fields !

Hello, I request your help with a shell script (awk) that ask for two inputs in order to calculate the correlation of the last rows between two fields ( 3 and 4). Data: EC-GLD,1/25/2011,41.270000,129.070000 EC-GLD,1/26/2011,41.550000,129.280000 EC-GLD,1/27/2011,42.260000,127.800000... (1 Reply)
Discussion started by: csierra
1 Replies

4. Shell Programming and Scripting

AWK - calculating simple correlation of rows

Is there any way to calculate a simple correlation of few selected rows with all the rows in input ? In the below example I selected Row01,02,03 and correlated with all the rows. I was trying to run in R. But the this big data matrix is too much to handle for R and eventually my system is... (3 Replies)
Discussion started by: quincyjones
3 Replies

5. Shell Programming and Scripting

awk? adjacency matrix to adjacency list / correlation matrix to list

Hi everyone I am very new at awk but think that that might be the best strategy for this. I have a matrix very similar to a correlation matrix and in practical terms I need to convert it into a list containing the values from the matrix (one value per line) with the first field of the line (row... (5 Replies)
Discussion started by: stonemonkey
5 Replies

6. Shell Programming and Scripting

3 column .csv --> correlation matrix; awk, perl?

Greetings, salutations. I have a 3 column csv file with ~13 million rows and I would like to generate a correlation matrix. Interestingly, you all previously provided a solution to the inverse of this problem. Thread title: "awk? adjacency matrix to adjacency list / correlation matrix to list"... (6 Replies)
Discussion started by: R3353
6 Replies

7. Shell Programming and Scripting

Correlation Between 3 Different Loops using Bash

I have 3 loops that I use to determine the permission level of AWS user accounts. This array lists the AWS policy ARN (Amazon Resource Name): for ((policy_index=0;policy_index<${#aws_managed_policies};++policy_index)); do aws_policy_arn="${aws_managed_policies}" ... (1 Reply)
Discussion started by: bluethundr
1 Replies

8. UNIX for Beginners Questions & Answers

Automate splitting of files , scp files as each split completes and combine files on target server

i use the split command to split a one terabyte backup file into 10 chunks of 100 GB each. The files are split one after the other. While the files is being split, I will like to scp the files one after the other as soon as the previous one completes, from server A to Server B. Then on server B ,... (2 Replies)
Discussion started by: malaika
2 Replies
comm(1) 							   User Commands							   comm(1)

NAME
comm - select or reject lines common to two files SYNOPSIS
comm [-123] file1 file2 DESCRIPTION
The comm utility reads file1 and file2, which must be ordered in the current collating sequence, and produces three text columns as output: lines only in file1; lines only in file2; and lines in both files. If the input files were ordered according to the collating sequence of the current locale, the lines written will be in the collating sequence of the original lines. If not, the results are unspecified. OPTIONS
The following options are supported: -1 Suppresses the output column of lines unique to file1. -2 Suppresses the output column of lines unique to file2. -3 Suppresses the output column of lines duplicated in file1 and file2. OPERANDS
The following operands are supported: file1 A path name of the first file to be compared. If file1 is -, the standard input is used. file2 A path name of the second file to be compared. If file2 is -, the standard input is used. USAGE
See largefile(5) for the description of the behavior of comm when encountering files greater than or equal to 2 Gbyte ( 2^31 bytes). EXAMPLES
Example 1 Printing a list of utilities specified by files If file1, file2, and file3 each contain a sorted list of utilities, the command example% comm -23 file1 file2 | comm -23 - file3 prints a list of utilities in file1 not specified by either of the other files. The entry: example% comm -12 file1 file2 | comm -12 - file3 prints a list of utilities specified by all three files. And the entry: example% comm -12 file2 file3 | comm -23 -file1 prints a list of utilities specified by both file2 and file3, but not specified in file1. ENVIRONMENT VARIABLES
See environ(5) for descriptions of the following environment variables that affect the execution of comm: LANG, LC_ALL, LC_COLLATE, LC_CTYPE, LC_MESSAGES, and NLSPATH. EXIT STATUS
The following exit values are returned: 0 All input files were successfully output as specified. >0 An error occurred. ATTRIBUTES
See attributes(5) for descriptions of the following attributes: +-----------------------------+-----------------------------+ | ATTRIBUTE TYPE | ATTRIBUTE VALUE | +-----------------------------+-----------------------------+ |Availability |SUNWesu | +-----------------------------+-----------------------------+ |CSI |enabled | +-----------------------------+-----------------------------+ |Interface Stability |Standard | +-----------------------------+-----------------------------+ SEE ALSO
cmp(1), diff(1), sort(1), uniq(1), attributes(5), environ(5), largefile(5), standards(5) SunOS 5.11 3 Mar 2004 comm(1)
All times are GMT -4. The time now is 08:32 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy