Visit Our UNIX and Linux User Community


average of rows with same value in the first column


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting average of rows with same value in the first column
# 1  
Old 08-29-2012
average of rows with same value in the first column

Dear All,
I have this file tab delimited
Code:
A	1	12	22
B	3	34	33
C	55	9	32
A	12	81	71
D	11	1	66
E	455	4	2
B	89	4	3

I would like to make the average every column where the first column is the same, for example,
Code:
A	6,5	46,5	46,5
B	46,0	19,0	18,0
C	55,0	9,0	32,0
D	11,0	1,0	66,0
E	455,0	4,0	2,0

Any help?
thanks,
paolo

Moderator's Comments:
Mod Comment Please view this code tag video for how to use code tags when posting code and data.

Last edited by vbe; 08-29-2012 at 05:32 AM..
# 2  
Old 08-29-2012
Yes, nice, and what have you done so far?
# 3  
Old 08-29-2012
I have done it in excel but my file is huge, almost a million of rows, I can't go on by hand!
paolo
# 4  
Old 08-29-2012
Code:
awk '{
c[$1]=1
for(i=2;i<=NF;i++)
{
 total[$1,i]+=$i
 count[$1,i]++
}
}
END{
for(i in c)
{
 printf "%s",i
 for(j=2;count[i,j];j++)
  printf "\t%.1f", (total[i,j]/count[i,j])
 printf "\n"
}
}' file


Last edited by elixir_sinari; 08-29-2012 at 06:02 AM..
# 5  
Old 08-29-2012
thanks but unfortunately is not working the script,

I found a way to sum up values with same ID, now I need to calculate the average:
Code:
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5} END{for (i in a) print i,a[i],b[i],c[i],d[i]}' file

cheers,
paolo

Last edited by Franklin52; 08-29-2012 at 06:23 AM.. Reason: Please use code tags for data and code samples
# 6  
Old 08-29-2012
Another approach:
Code:
awk '{a[$1]+=$2; b[$1]+=$3; c[$1]+=$4; n[$1]++}
END{for(i in a) printf("%s\t%.1f\t%.1f\t%.1f\n", i, a[i]/n[i], b[i]/n[i], c[i]/n[i])}
' file

# 7  
Old 08-29-2012
with your code you only need to add a count line :

Code:
awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;count[$1]+=1} END{for (i in a) print i,a[i]/count[i],b[i]/count[i],c[i]/count[i],d[i]/count[i]}' file


Previous Thread | Next Thread
Test Your Knowledge in Computers #619
Difficulty: Medium
In Python, variables may not be used by just assigning them a value. Declaration and data type definition is required in Python.
True or False?

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Average select rows

I have no idea how to even get started with this script. I need to average field 3 for each of the unique identifiers found in field 1. However, I only want to average these rows when field 2 is equal to 1506 - 2000 (note that i replaced the values field 2 for security reasons, but the real... (6 Replies)
Discussion started by: ncwxpanther
6 Replies

2. Shell Programming and Scripting

Average across rows with a condition

Hi Friends, My input file Gene1 10 20 0 Gene2 5 0 15 Gene3 10 10 10 Gene4 5 0 0 If there is a zero for any gene in any column, I don't want that column to be considered which reduces the denominator value during average. Here is my output Gene1 10 20 0 10 Gene2 5 0 15 10 Gene3... (5 Replies)
Discussion started by: jacobs.smith
5 Replies

3. Shell Programming and Scripting

Check first column - average second column based on a condition

Hi, My input file Gene1 1 Gene1 2 Gene1 3 Gene1 0 Gene2 0 Gene2 0 Gene2 4 Gene2 8 Gene3 9 Gene3 9 Gene4 0 Condition: If the first column matches, then look in the second column. If there is a value of zero in the second column, then don't consider that record while averaging. ... (5 Replies)
Discussion started by: jacobs.smith
5 Replies

4. Shell Programming and Scripting

Average values of duplicate rows

I have this file input.txt. I want to take average column-wise for the rows having duplicate gene names. Gene Sample_1 Sample_2 Sample_3 gene_A 2 4 5 gene_B 1 2 3 gene_A 0 5 7 gene_B 4 5 6 gene_A 11 12 13 gene_C 2 3 4 Desired output: gene_A 4.3 7 8.3 gene_B 2.5 3.5 4.5 gene_C 2 3 4... (6 Replies)
Discussion started by: Sanchari
6 Replies

5. UNIX for Dummies Questions & Answers

Writing a script to take the average of two columns every 3 rows

I have a dataset with 120 columns. I would like to write a script, that takes the average of every two columns, starting from columns 2 and 3, and moving consecutively in frames of 3 columns, all the way until the last column. The first column in the output file would be the averages of columns... (1 Reply)
Discussion started by: evelibertine
1 Replies

6. UNIX Desktop Questions & Answers

Calculate average for rows in a text file

Dear Gurus, I have tab-delimited text files with matrix containing values. The first column is a identifier and other columns have the corresponding values. I would like to calculate the average value (total number/number of entries) for all entries from 2nd column to the last column in row... (3 Replies)
Discussion started by: Unilearn
3 Replies

7. Shell Programming and Scripting

Average calculation based on number of rows

Dear users, I need your support, I have a file like this: 272134.548 6680572.715 272134.545 6680572.711 272134.546 6680572.713 272134.548 6680572.706 272134.545 6680572.721 272134.543 6680572.710 272134.544 6680572.715 272134.543 6680572.705 272134.540 6680572.720 272134.544... (10 Replies)
Discussion started by: Gery
10 Replies

8. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows?

Hi Friends, In continuation to my earlier post https://www.unix.com/shell-programming-scripting/99166-script-find-average-given-column-also-specified-number-rows.html I am extending my problem as follows. Input: Column1 Column2 MAS 1 MAS 4 ... (2 Replies)
Discussion started by: ks_reddy
2 Replies

9. Shell Programming and Scripting

Script to find the average of a given column and also for specified number of rows??

Hi friends I have 100 files in my directory. Each file look like this.. Temp1 Temp2 Temp3 MAS 1 2 3 MAS 4 5 6 MAS 7 8 9 Delhi 10 11 12 Delhi 13 14 15 Delhi 16 17 ... (4 Replies)
Discussion started by: ks_reddy
4 Replies

10. UNIX for Dummies Questions & Answers

Calculating the Number of Rows and Average

Hi All I like to know how can we calculate the number of rows and the average of the values present in the file. I will not know what will be the rowcount, which will be dynamic in nature of the file. eg. 29 33 48 30 28 (6 Replies)
Discussion started by: pk_eee
6 Replies

Featured Tech Videos