Average of columns with values of other column with same name


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Average of columns with values of other column with same name
# 1  
Old 12-13-2012
Average of columns with values of other column with same name

I have a lot of input files that have the following form:
Code:
Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	
1WBIN	23.45	1WBIN	23.45	1CVSIN	23.96	1CVSIN	23.14	S1	31.37	
1WBIN	23.53	1WBIN	23.53	1CVSIN	23.81	1CVSIN	23.24	S1	31.49	
1WBIN	24.55	1WBIN	24.55	1CVSIN	23.86	1CVSIN	23.24	S1	31.74	
1CVSIN	23.62	1CVSIN	23.62	1CVSIP	22.12	1CVSIP	21.53	S10	31.13	
1CVSIN	23.46	1CVSIN	21.74	1CVSIP	22.24	1CVSIP	21.40	S10	31.10	
ICVSIN	21.74	1CVSIN	23.33	1CVSIP	22.22	1CVSIP	21.36	S10	31.29	
...

I need to read each column, get the row values from the column next to it that have same name in the first column and compute the average of their values and output in a new file. They are usually triplicates of the same name, i need to get all three of its values and compute the average of these . The output should look like this in a new file:

Code:
Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	
1WBIN	23.843	1WBIN	23.84	1CVSIN	23.87	1CVSIN	23.20	S1	31.53	
1CVSIN	22.94	1CVSIN	22.89	1CVSIP	22.19	1CVSIP	21.43	S10	31.17

I print only one of the triplicates and next to it the average of the three values that each of the triplicates had.
Is it possible to be done in awk? I am trying with perl.

With PERL:
I thought of converting the columns into rows, read each line and create a hash having as KEY each Sample name and adding the value of each triplicate every time i find the same name to its VALUE. At the end I will divide the value/3 and output the key->value in the output file.
To keep the records organised and in order I will read the input file again and print key->value from the hash for each line.


Is there an easier way to do this? Because with hashes I will lose the ordering of the columns in the file and it will get a bit messy I'm afraid.

Thank you for any input.
# 2  
Old 12-13-2012
Try....

Code:
awk 'function print_data(){
        for(i=1;i<=NF;i++){if(i%2){k=k?k"\t"P[s,i]:P[s,i]}else{p=sprintf("%.02f", P[s,i]/A[s]);k=k"\t"p}}print k;k="";
}
NR==1
{A[$1]++;for(i=1;i<=NF;i++){if(i%2){P[$1,i]=$i}else{P[$1,i]+=$i}}}
NR>1{if(s!=$1 && s){print_data()};s=$1}END{print_data()}' file

Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq
1WBIN   23.84   1WBIN   23.84   1CVSIN  23.88   1CVSIN  23.21   S1      31.53
1CVSIN  22.94   1CVSIN  22.90   1CVSIP  22.19   1CVSIP  21.43   S10     31.17


Last edited by pamu; 12-13-2012 at 05:44 AM.. Reason: Corrected
This User Gave Thanks to pamu For This Post:
# 3  
Old 12-13-2012
Thank you so much pamu.
It works just great with this file.
Is there any clear reason that you can see that it does not work properly with columns that have spaces in their names?
for example if i try this input:

Code:
Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq	Sample	Cq
1 WB IN	23.45	1 WB IN	23.45	1 CVS IN	23.96	1 CVS IN	23.14	S1	31.37
1 WB IN	23.53	1 WB IN	23.53	1 CVS IN	23.81	1 CVS IN	23.24	S1	31.49
1 WB IP	24.55	1 WB IN	24.55	1 CVS IN	23.86	1 CVS IN	23.24	S1	31.74
1 CVS IN	23.62	1 CVS IN	23.62	1 CVS IP	22.12	1 CVS IP	38.3	S10	31.13
1 CVS IN	23.46	I CVS IN	21.74	1 CVS IP	22.24	1 CVS IP	21.4	S10	31.1
1 CVS IN	21.74	1 CVS IN	23.33	1 CVS IP	22.22	1 CVS IP	21.36	S10	31.29

it only outputs the first line.

Many thanks again.
# 4  
Old 12-13-2012
I think you have filed separator as tab then try

Code:
awk -F "\t" 'function print_data(){
        for(i=1;i<=NF;i++){if(i%2){k=k?k"\t"P[s,i]:P[s,i]}else{p=sprintf("%.02f", P[s,i]/A[s]);k=k"\t"p}}print k;k="";
}
NR==1
{A[$1]++;for(i=1;i<=NF;i++){if(i%2){P[$1,i]=$i}else{P[$1,i]+=$i}}}
NR>1{if(s!=$1 && s){print_data()};s=$1}END{print_data()}' file

Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq      Sample  Cq
1 WB IN 23.49   1 WB IN 23.49   1 CVS IN        23.88   1 CVS IN        23.19   S1      31.43
1 WB IP 24.55   1 WB IN 24.55   1 CVS IN        23.86   1 CVS IN        23.24   S1      31.74
1 CVS IN        22.94   1 CVS IN        22.90   1 CVS IP        22.19   1 CVS IP        27.02   S10     31.17

This User Gave Thanks to pamu For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Copy columns from one file into another and get sum of column values and row count

I have a file abc.csv, from which I need column 24(PurchaseOrder_TotalCost) to get the sum_of_amounts with date and row count into another file say output.csv abc.csv- UTF-8,,,,,,,,,,,,,,,,,,,,,,,,, ... (6 Replies)
Discussion started by: Tahir_M
6 Replies

2. Shell Programming and Scripting

Match first two columns and calculate percent of average in third column

I have the need to match the first two columns and when they match, calculate the percent of average for the third columns. The following awk script does not give me the expected results. awk 'NR==FNR {T=$3; next} $1,$2 in T {P=T/$3*100; printf "%s %s %.0f\n", $1, $2, (P>=0)?P:-P}' diff.file... (1 Reply)
Discussion started by: ncwxpanther
1 Replies

3. Shell Programming and Scripting

Splitting the numeric vs alpha values in a column to distinct columns

How could i take an input file and split the numeric values from the alpha values (123 vs abc) to distinc columns, and if the source is blank to keep it blank (null) in both of the new columns: So if the source file had a column like: Value: |1 | |2.3| | | |No| I would... (7 Replies)
Discussion started by: driftlogic
7 Replies

4. Linux

To get all the columns in a CSV file based on unique values of particular column

cat sample.csv ID,Name,no 1,AAA,1 2,BBB,1 3,AAA,1 4,BBB,1 cut -d',' -f2 sample.csv | sort | uniq this gives only the 2nd column values Name AAA BBB How to I get all the columns of CSV along with this? (1 Reply)
Discussion started by: sanvel
1 Replies

5. Shell Programming and Scripting

Get the average from column, and eliminate the duplicate values.

Dear Experts, Kindly help me please, I have a big file where there is duplicate values in col 11 till col 23, every 2 rows appers a new numbers, but in each row there is different coordinates x and y in col 57 till col 74. Please i will like to get a single value and average of the x and y... (8 Replies)
Discussion started by: jiam912
8 Replies

6. Shell Programming and Scripting

Add the values in second and third columns with group by on first column.

Hi All, I have a pipe seperated file. I need to add the values in second and third columns with group by on first column. MYFILE_28012012_1115|47|173.90 MYFILE_28012012_1115|4|0.00 MYFILE_28012012_1115|6|22.20 MYFILE_28012012_1116|47|173.90 MYFILE_28012012_1116|4|0.00... (3 Replies)
Discussion started by: angshuman
3 Replies

7. UNIX for Dummies Questions & Answers

Taking the average of two columns and printing it on a new column

Hi, I have a space delimited text file that looks like the following: Aa 100 200 Bb 300 100 Cc X 500 Dd 600 X Basically, I want to take the average of columns 2 and 3 and print it in column 4. However if there is an X in either column 2 or 3, I want to print the non-X value. Therefore... (11 Replies)
Discussion started by: evelibertine
11 Replies

8. Shell Programming and Scripting

Average values in a column based on range

Hi i have data with two columns like below. I want to find average of column values like if the value in column 2 is between 0-250000 the average of column 1 is some xx and average of column2 is ww then if value is 250001-5000000 average of column 1 is yy and average of column 2 is zz. And my... (5 Replies)
Discussion started by: bhargavpbk88
5 Replies

9. Shell Programming and Scripting

how to flip values of two columns and add an extra column

Hi guys, Couldn't find the solution of this problem. Please Help! I have a file- Input_File TC200232 92 30 TC215306 2 74 TC210135 42 14 I want an output file in which if column2>column3, the values are swapped and an additional column with value Rev_Com is... (4 Replies)
Discussion started by: smriti_shridhar
4 Replies

10. Shell Programming and Scripting

How to check Null values in a file column by column if columns are Not NULLs

Hi All, I have a table with 10 columns. Some columns(2nd,4th,5th,7th,8th and 10th) are Not Null columns. I'll get a tab-delimited file and want to check col by col and generate seperate error code for each col eg:102 if 2nd col value is NULL and 104 if 4th col value is NULL so on... I am a... (7 Replies)
Discussion started by: Mandab
7 Replies
Login or Register to Ask a Question