Sponsored Content
Top Forums UNIX for Beginners Questions & Answers Calculating correlations across columns in awk Post 303026256 by Corona688 on Thursday 22nd of November 2018 11:52:27 AM
Old 11-22-2018
I've broken my code up into stdev and pearson parts for readability.

Code:
$ cat stdev.awk

function stdev_mean(TITLE) { return(DATA[TITLE,"T"]/DATA[TITLE,"C"]); }
function stdev_count(TITLE) { return(DATA[TITLE,"C"]-1); }

# Run first for data 1-n to get means
function stdev_pass1(TITLE, VAL) {
        DATA[TITLE,"T"] += VAL+0;
        DATA[TITLE,"C"] ++;
}

# Run second for data 1-n to get standard deviations
function stdev_pass2(TITLE,VAL,X) {
        X = stdev_mean(TITLE) - VAL+0;
        DATA[TITLE,"D"] += X*X;
}

# Final result after both passes
function stdev(TITLE) { return(sqrt(DATA[TITLE,"D"] / stdev_count(TITLE)));}

$ cat pearson.awk

# Integrate titles and skip first line
FNR==1 {        for(N=2; N<=NF; N++) COL[N]=$N ; MAX=NF; next   }

# First pass, calculate means and skip to next line
NR==FNR {       for(N=2; N<=NF; N++) stdev_pass1(N, $N); next }

# Second pass, means are now valid, calculate deviation and correlation
{
        for(N=2; N<=NF; N++) stdev_pass2(N, $N);
        for(N=2; N<=NF; N++) for(M=N; M<=NF; M++)
                CORR[N,M]+=(stdev_mean(N) - ($N+0)) * (stdev_mean(M) - ($M+0));
}

END { # Print final data
        for(N=2; N<=MAX; N++)   for(M=N; M<=MAX; M++)
        print COL[N], COL[M], CORR[N,M] / (stdev_count(N)*stdev(N)*stdev(M));
}

# Not a typo, awk is fed inputfile twice.
# This avoids needing to store the entire massive file in memory.
$ awk -f stdev.awk -f pearson.awk inputfile inputfile

C1 C1 1
C1 C2 -0.4
C1 C3 -0.069843
C1 C4 -0.597614
C1 C5 -0.175412
C1 C6 -0.362738
C2 C2 1
C2 C3 0.488901
C2 C4 0.239046
C2 C5 0.30697
C2 C6 0.362738
C3 C3 1
C3 C4 0.667827
C3 C5 0.581936
C3 C6 0.861381
C4 C4 1
C4 C5 0.576557
C4 C6 0.932143
C5 C5 1
C5 C6 0.731727
C6 C6 1

$

Be sure to use GNU awk, other versions may have a 2048-byte line size limit.
These 2 Users Gave Thanks to Corona688 For This Post:
 

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Calculating totals in AWK

Hello, With the following small script I list the size of documents belonging to a certain user by each time selecting the bytes-field of that file ($7). Now it fills the array with every file it finds so in the end the output of some users contains up to 200.000 numbers. So how can I calculate... (7 Replies)
Discussion started by: Hille
7 Replies

2. Shell Programming and Scripting

calculating endless columns

I have about 5000 columns of data that i need to convert all of it into pecentages. for shorter colums i have been using this code: {print $1/($1+$2)*100,$2/($1+$2),$3/($3+$4)*100 .....} but this is a teadious process... is there anyway to do it without having to write all of them out? sample... (20 Replies)
Discussion started by: chronicx
20 Replies

3. Shell Programming and Scripting

Awk program for calculating dates.

Hi All, I have a txt file which has hundreds of lines and 41 fields. I have a requirement to pick up field 14 from the text file which is a date fiels in the following format. Field 14 : Data Type : NUMERIC DATE (YYYYMMDD) Field Length : 8 Example of Data :20090415 Field 42 : Data Type... (2 Replies)
Discussion started by: nua7
2 Replies

4. Shell Programming and Scripting

[Solved] awk calculating between lines

Hey guys, maybe you can help me with this... I want to read input.dat line by line, while doing a simple calculation between the second column value of the current line and the second column value of the next line (like a difference). input is something like this: 0 3.945757 1 ... (1 Reply)
Discussion started by: origamisven
1 Replies

5. Shell Programming and Scripting

Calculating an integer with awk

I would like to extract a number from $0 and calculate if it can be devided by 25. Though the number can also be less then 25 or bigger than 100. How do i extract the number and how can the integer be calculated? String: "all_results">39</span>I am looking for the number between "all_results"> ... (5 Replies)
Discussion started by: sdf
5 Replies

6. Shell Programming and Scripting

AWK way of calculating growth

Hi All, IS there any 'awk' way to manipulate following data? Fruit Date Count Apple 20/08/2011 5 Apple 27/08/2011 7 Apple 05/09/2011 11 Apple 12/09/2011 3 Apple 19/09/2011 25 . . . . Orange 20/08/2011 9 Orange 27/08/2011 20 Orange 27/08/2011 7 Orange 05/09/2011 15 Orange... (3 Replies)
Discussion started by: aniketdixit
3 Replies

7. Shell Programming and Scripting

Calculating the epoch time from standard time using awk and calculating the duration

Hi All, I have the following time stamp data in 2 columns Date TimeStamp(also with milliseconds) 05/23/2012 08:30:11.250 05/23/2012 08:30:15.500 05/23/2012 08:31.15.500 . . etc From this data I need the following output. 0.00( row1-row1 in seconds) 04.25( row2-row1 in... (5 Replies)
Discussion started by: ks_reddy
5 Replies

8. Shell Programming and Scripting

Calculating average with awk

I need to find the average from a file like: data => BW:123 M:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1 0 1 0 0 1 1 1 1 0 0 1 1 0' data => BW:123 N:30 RTD:0 1... (4 Replies)
Discussion started by: Slagle
4 Replies

9. Shell Programming and Scripting

Calculating Running Variance Using Awk

Hi all, I am attempting to calculate a running variance for a file containing a column of numbers. I am using the formula variance=sum((x-mean(x))^2)/(n-1), where x is the value on the current row, and mean(x) is the average of all of the values up until that row. n represents the total number... (1 Reply)
Discussion started by: Jahn
1 Replies

10. UNIX for Dummies Questions & Answers

Calculating cumulative frequency using awk

Hi, I wanted to calculate cumulative frequency distribution of my data that involves several arithmetic calls. I did things in excel but its taking me forever. this is what I want to do: var1.txt contains n observations which I have to compute for frequency which is given by 1/n and subsequently... (7 Replies)
Discussion started by: ida1215
7 Replies
DPKG-AWK(1)						      General Commands Manual						       DPKG-AWK(1)

NAME
dpkg-awk - Utility to read a dpkg style db file SYNOPSIS
dpkg-awk [(-f|--file) filename] [(-d|--debug) ##] [(-s|--sort) list] [(-rs|--rec_sep) ??] '<fieldname>:<regex>' ... -- <out_fieldname> .. DESCRIPTION
dpkg-awk Parses a dpkg status file (or other similarly formatted file) and outputs the resulting records. It can use regex on the field values to limit the returned records, it can also be told which fields to output, and it can sort the matched fields. OPTIONS
-f filename --file filename The file to parse. The default is /var/lib/dpkg/status. -d [#] --debug [#] Each time this is specified, it increased the debug level. -s field(s) --sort field(s) A space or comma separated list of fields to sort on. -n field(s) --numeric field(s) A space or comma separated list of fields that should be interpreted as numeric in value. -rs ?? --rec_sep ?? Output this string at the end of each output paragraph. -h --help Display some help. fieldname The fields from the file, that are matched with the regex given. The fieldnames are case insensitive. out_fieldname The fields from the file, that are output for each record. If the first field listed begins with ^, then the list of fields that follows will NOT be output. BUGS
Be warned that the author has only a shallow understanding of the dpkg packaging system, so there are probably tons of bugs in this pro- gram. This program comes with no warranties. If running this program causes fire and brimstone to rain down upon the earth, you will be on your own. This program accesses the dpkg database directly in places, querying for data that cannot be gotten via dpkg. AUTHOR
Adam Heath <doogie@debian.org> DEBIAN
Debian Utilities DPKG-AWK(1)
All times are GMT -4. The time now is 06:53 PM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy