How to average values that have the same annotation name?


 
# 1  
Old 11-03-2014

Hi, I have a file like this:
input_file
Code:
CR387793 -0.8
CR387793 -5.5
CR387794 -5.3
CR387795 -0.9
AR388755 -3.0
AR388755 3.8
AR388755 4.5

Each line has an annotation name and its associated value, separated by a space. I want to average the values of the lines that share the same annotation name. In this case, 2 lines have the CR387793 annotation and 3 lines have the AR388755 annotation in the input file, so the average value of CR387793 should be -3.15 and the average value of AR388755 should be 2.65. Every other annotation has only one value, so it is kept unchanged in the output file. I expect the output file to look like this:
output_file
Code:
CR387793 -3.15
CR387794 -5.3
CR387795 -0.9
AR388755 2.65

I have thousands of lines in my input file to be processed like the example. How can I achieve this with a Unix command? Thank you very much!

Last edited by yuejian; 11-03-2014 at 02:32 PM..
# 2  
Old 11-03-2014
Code:
awk '{A[$1]+=$2;++D[$1]} END{ for (n in A){print n, A[n]/D[n]}}' file

CR387793 -3.15
CR387794 -5.3
CR387795 -0.9
AR388755 1.76667
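
For anyone reading along, here is the same idea spelled out as a commented sketch. The function name average_by_name and the array names sum, count, and order are my own choices for illustration; they are not part of the one-liner above. Since `for (n in A)` visits keys in an unspecified order, this version also records the order in which names first appear, which is an addition on my part.

```shell
# Hypothetical helper: reads "name value" lines from stdin or from files.
average_by_name() {
  awk '
    {
      if (!($1 in count)) order[++n] = $1   # remember first-seen order
      sum[$1] += $2                         # running total per name
      count[$1]++                           # number of values per name
    }
    END {
      for (i = 1; i <= n; i++) {
        name = order[i]
        print name, sum[name] / count[name]
      }
    }
  ' "$@"
}

average_by_name <<'EOF'
CR387793 -0.8
CR387793 -5.5
CR387794 -5.3
CR387795 -0.9
AR388755 -3.0
AR388755 3.8
AR388755 4.5
EOF
```

With the sample input this should print the four names in the order they first appear in the file, with the same averages as above.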

Notice that the average of
Code:
AR388755 -3.0
AR388755 3.8
AR388755 4.5

is not
Code:
AR388755 2.65
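
A quick check of the arithmetic, as a sketch: the three values sum to 5.3, and dividing by 3 gives 1.76667. Dividing by 2 gives 2.65, so the expected figure may have come from using the wrong count (that is only my guess). The variable name check is just for illustration.

```shell
# Sum the three AR388755 values, then divide by 3 (the real count) and by 2.
check=$(awk 'BEGIN { s = -3.0 + 3.8 + 4.5; print s / 3; print s / 2 }')
printf '%s\n' "$check"
```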


Last edited by Aia; 11-03-2014 at 02:37 PM..
# 3  
Old 11-03-2014
Thanks Aia, it works perfectly. Also, thanks for pointing out my mistake; I will be more careful when I post my questions.
# 4  
Old 11-03-2014
I don't know if I missed this part or it was edited:
Quote:
all the other annotation has only one unique value so it will be kept in the output file. So I am expecting the output file is like this below
I am assuming that your file actually looks something like this:
Code:
CR387793 -0.8
CR387793 -5.5
CR387794 -5.3
CR387795 -0.9
CR387796
AR388755 -3.0
AR388755 3.8
AR388755 4.5

In that case:
Code:
awk 'NF==2{A[$1]+=$2;++D[$1]} END{ for (n in A){print n, A[n]/D[n]}}' file
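
To see what the NF==2 guard changes, here is a small sketch that runs both versions on the sample above. The variable names sample, unguarded, and guarded are assumptions for illustration, and the output is piped through sort only to make the order predictable. Without the guard, the empty second field on the CR387796 line counts as 0, so that name is reported with an average of 0; with the guard, the line is skipped.

```shell
sample='CR387793 -0.8
CR387793 -5.5
CR387794 -5.3
CR387795 -0.9
CR387796
AR388755 -3.0
AR388755 3.8
AR388755 4.5'

# No guard: the line without a value still adds 0 to the sum and 1 to the count.
unguarded=$(printf '%s\n' "$sample" |
  awk '{A[$1]+=$2;++D[$1]} END{for (n in A) print n, A[n]/D[n]}' | LC_ALL=C sort)

# With NF==2: only lines that have both a name and a value are counted.
guarded=$(printf '%s\n' "$sample" |
  awk 'NF==2{A[$1]+=$2;++D[$1]} END{for (n in A) print n, A[n]/D[n]}' | LC_ALL=C sort)

printf 'without guard:\n%s\n\nwith NF==2:\n%s\n' "$unguarded" "$guarded"
```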

# 5  
Old 11-04-2014
Hello yuejian,

The following may help if there is a line whose second field is empty. Let's say the following is the input file.

Code:
cat Input_file
CR387793 -0.8
CR387793 -5.5
CR387794 -5.3
CR387795 -0.9
ABCVDGSS
AR388755 -3.0
AR388755 3.8
AR388755 4.5

Code:
awk 'NF>1{A[$1]+=$2;++B[$1]} NF==1{A[$1]+=0} END{for(i in A){if(B[i]){print i OFS A[i] / B[i]} else {print i OFS " No Average."}}}' Input_file

Output will be as follows.

Code:
ABCVDGSS  No Average.
CR387793 -3.15
CR387794 -5.3
CR387795 -0.9
AR388755 1.76667

Thanks,
R. Singh

Last edited by RavinderSingh13; 11-04-2014 at 01:09 AM.. Reason: Added input file details
 