Average of columns with values of other column with same name

12-13-2012

Registered User

5, 0

Join Date: Nov 2012

Last Activity: 24 January 2013, 7:28 AM EST

Posts: 5

Thanks Given: 3

Thanked 0 Times in 0 Posts

Average of columns with values of other column with same name

I have a lot of input files that have the following form:

Code:

Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	
1WBIN	23.45	1WBIN	23.45	1CVSIN	23.96	1CVSIN	23.14	S1	31.37	
1WBIN	23.53	1WBIN	23.53	1CVSIN	23.81	1CVSIN	23.24	S1	31.49	
1WBIN	24.55	1WBIN	24.55	1CVSIN	23.86	1CVSIN	23.24	S1	31.74	
1CVSIN	23.62	1CVSIN	23.62	1CVSIP	22.12	1CVSIP	21.53	S10	31.13	
1CVSIN	23.46	1CVSIN	21.74	1CVSIP	22.24	1CVSIP	21.40	S10	31.10	
ICVSIN	21.74	1CVSIN	23.33	1CVSIP	22.22	1CVSIP	21.36	S10	31.29	
...

I need to read each column, get the row values from the column next to it that have same name in the first column and compute the average of their values and output in a new file. They are usually triplicates of the same name, i need to get all three of its values and compute the average of these . The output should look like this in a new file:

Code:

Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	
1WBIN	23.843	1WBIN	23.84	1CVSIN	23.87	1CVSIN	23.20	S1	31.53	
1CVSIN	22.94	1CVSIN	22.89	1CVSIP	22.19	1CVSIP	21.43	S10	31.17

I print only one of the triplicates and next to it the average of the three values that each of the triplicates had.
Is it possible to be done in awk? I am trying with perl.

With PERL:
I thought of converting the columns into rows, read each line and create a hash having as KEY each Sample name and adding the value of each triplicate every time i find the same name to its VALUE. At the end I will divide the value/3 and output the key->value in the output file.
To keep the records organised and in order I will read the input file again and print key->value from the hash for each line.

Is there an easier way to do this? Because with hashes I will lose the ordering of the columns in the file and it will get a bit messy I'm afraid.

Thank you for any input.

isildur1234

View Public Profile for isildur1234

Find all posts by isildur1234

12-13-2012

Registered User

1,650, 478

Join Date: Mar 2012

Last Activity: 11 September 2019, 8:06 AM EDT

Posts: 1,650

Thanks Given: 58

Thanked 478 Times in 474 Posts

Try....

Code:

awk 'function print_data(){
        for(i=1;i<=NF;i++){if(i%2){k=k?k"\t"P[s,i]:P[s,i]}else{p=sprintf("%.02f", P[s,i]/A[s]);k=k"\t"p}}print k;k="";
}
NR==1
{A[$1]++;for(i=1;i<=NF;i++){if(i%2){P[$1,i]=$i}else{P[$1,i]+=$i}}}
NR>1{if(s!=$1 && s){print_data()};s=$1}END{print_data()}' file

Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq
1WBIN   23.84   1WBIN   23.84   1CVSIN  23.88   1CVSIN  23.21   S1      31.53
1CVSIN  22.94   1CVSIN  22.90   1CVSIP  22.19   1CVSIP  21.43   S10     31.17

Last edited by pamu; 12-13-2012 at 05:44 AM.. Reason: Corrected

This User Gave Thanks to pamu For This Post:

pamu

View Public Profile for pamu

Find all posts by pamu

12-13-2012

Registered User

5, 0

Join Date: Nov 2012

Last Activity: 24 January 2013, 7:28 AM EST

Posts: 5

Thanks Given: 3

Thanked 0 Times in 0 Posts

Thank you so much pamu.
It works just great with this file.
Is there any clear reason that you can see that it does not work properly with columns that have spaces in their names?
for example if i try this input:

Code:

Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq
1 WB IN	23.45	1 WB IN	23.45	1 CVS IN	23.96	1 CVS IN	23.14	S1	31.37
1 WB IN	23.53	1 WB IN	23.53	1 CVS IN	23.81	1 CVS IN	23.24	S1	31.49
1 WB IP	24.55	1 WB IN	24.55	1 CVS IN	23.86	1 CVS IN	23.24	S1	31.74
1 CVS IN	23.62	1 CVS IN	23.62	1 CVS IP	22.12	1 CVS IP	38.3	S10	31.13
1 CVS IN	23.46	I CVS IN	21.74	1 CVS IP	22.24	1 CVS IP	21.4	S10	31.1
1 CVS IN	21.74	1 CVS IN	23.33	1 CVS IP	22.22	1 CVS IP	21.36	S10	31.29

it only outputs the first line.

Many thanks again.

isildur1234

View Public Profile for isildur1234

Find all posts by isildur1234

12-13-2012

Registered User

1,650, 478

Join Date: Mar 2012

Last Activity: 11 September 2019, 8:06 AM EDT

Posts: 1,650

Thanks Given: 58

Thanked 478 Times in 474 Posts

I think you have filed separator as tab then try

Code:

awk -F "\t" 'function print_data(){
        for(i=1;i<=NF;i++){if(i%2){k=k?k"\t"P[s,i]:P[s,i]}else{p=sprintf("%.02f", P[s,i]/A[s]);k=k"\t"p}}print k;k="";
}
NR==1
{A[$1]++;for(i=1;i<=NF;i++){if(i%2){P[$1,i]=$i}else{P[$1,i]+=$i}}}
NR>1{if(s!=$1 && s){print_data()};s=$1}END{print_data()}' file

Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq
1 WB IN 23.49   1 WB IN 23.49   1 CVS IN        23.88   1 CVS IN        23.19   S1      31.43
1 WB IP 24.55   1 WB IN 24.55   1 CVS IN        23.86   1 CVS IN        23.24   S1      31.74
1 CVS IN        22.94   1 CVS IN        22.90   1 CVS IP        22.19   1 CVS IP        27.02   S10     31.17

This User Gave Thanks to pamu For This Post:

pamu

View Public Profile for pamu

Find all posts by pamu

Shell Programming and Scripting

Average of columns with values of other column with same name

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Copy columns from one file into another and get sum of column values and row count

Discussion started by: Tahir_M

2. Shell Programming and Scripting

Match first two columns and calculate percent of average in third column

Discussion started by: ncwxpanther

3. Shell Programming and Scripting

Splitting the numeric vs alpha values in a column to distinct columns

Discussion started by: driftlogic

4. Linux

To get all the columns in a CSV file based on unique values of particular column

Discussion started by: sanvel

5. Shell Programming and Scripting

Get the average from column, and eliminate the duplicate values.

Discussion started by: jiam912

6. Shell Programming and Scripting

Add the values in second and third columns with group by on first column.

Discussion started by: angshuman

7. UNIX for Dummies Questions & Answers

Taking the average of two columns and printing it on a new column

Discussion started by: evelibertine

8. Shell Programming and Scripting

Average values in a column based on range

Discussion started by: bhargavpbk88

9. Shell Programming and Scripting

how to flip values of two columns and add an extra column

Discussion started by: smriti_shridhar

10. Shell Programming and Scripting

How to check Null values in a file column by column if columns are Not NULLs

Discussion started by: Mandab