Average of columns


 
# 1  
Old 12-04-2019

I have files that have the following columns

Code:
chr    pos    ref    alt    sample 1    sample 2    sample 3
chr2    179644035    G    A    1,107    0,1    58,67
chr7    151945167    G    T    142,101    100,200    500,700
chr13    31789169    CTT    CT,C    6,37,8    0,0,0    15,46,89
chr22    50962208    T    G    1,107    1,10    0,0
chr23    4373957984    CTT    A,T,G,C    0,1,2,4    0,0,1,3    9,4,6,2


I need to take the average of the comma-separated values for each sample, rounded to 2 decimal places, to get the following output:


Code:
chr    pos    ref    alt    sample 1    sample 2    sample 3
chr2    179644035    G    A    54    0.5    62.5
chr7    151945167    G    T    121.5    150    600
chr13    31789169    CTT    CT,C    17    0    50
chr22    50962208    T    G    54    5.5    0
chr23    4373957984    CTT    A,T,G,C    1.75    1    5.25

Any guidance on using awk to achieve this would be extremely helpful.

Last edited by rbatte1; 12-05-2019 at 09:16 AM..
# 2  
Old 12-04-2019
Hello nans,

On these forums we encourage users to show the efforts they have put in to solve their own problems.
So kindly add your attempts to your question and let us know.

Thanks,
R. Singh
# 3  
Old 12-04-2019
Hello R.Singh,


I haven't come anywhere near a solution, but this is what I have so far, which is terribly wrong:


Code:
awk '{a[FNR]=a[FNR]+$5;b[FNR]++;}END{for(i in a){print i,a[i]/b[i]}}' file.txt > output

I was trying to get the average for sample 1 only. My intention was to take the 5th column, split its comma-separated values, and calculate their average, but obviously this doesn't do that.
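
To see why the attempt above goes wrong: when awk uses a field such as 1,107 as a number, it reads only the leading numeric part, so $5 contributes 1 rather than the average of 1 and 107. And because the arrays are indexed by FNR, each line only ever divides that 1 by a count of 1. A small sketch of the difference, using the sample 1 value from the first data row (the variable names first_only and average are made up for illustration, and LC_ALL=C is set as an assumption to keep number parsing locale-independent):

Code:
```shell
# Numeric context: awk keeps only the leading numeric prefix of "1,107".
first_only=$(echo "1,107" | LC_ALL=C awk '{ print $1 + 0 }')

# Splitting on the comma exposes every value, so a real average is possible.
average=$(echo "1,107" | LC_ALL=C awk '{
    n = split($1, T, ",")                   # n = number of comma-separated values
    s = 0
    for (j = 1; j <= n; j++) s += T[j]      # add up all values
    print s / n
}')

echo "numeric value of the field: $first_only"
echo "average after split:        $average"
```

So the field has to be split on the comma before any arithmetic is done on it.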
# 4  
Old 12-04-2019
Try
Code:
awk '
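NR > 1  {for (i=5; i<=NF; i++)  {n   = split ($i, T, ",")            # split the sample field on commas
                                 for (j=1; j<=n; j++) SUM += T[j]    # add up its values
                                 $i  = SUM/n                         # replace the field with the average
                                 SUM = 0                             # reset for the next field
                                }
        }
1                                                                    # print every line, header included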
' OFS="\t" file
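
The code above prints averages with awk's default number formatting, so a value like 1/3 would come out as 0.333333 rather than rounded to 2 decimals as requested. A possible variant that rounds is sketched below. It is not part of the original answer: the function name avg_samples is hypothetical, the rounding is a simple "add 0.5 and truncate" that assumes non-negative values, and the fields are assumed to contain no embedded blanks in the data rows.

Code:
```shell
# Hypothetical wrapper: reads the table from the file arguments, or from stdin.
avg_samples() {
  awk -v OFS='\t' '
  NR > 1 {                                   # leave the header line untouched
    for (i = 5; i <= NF; i++) {              # sample columns start at field 5
      n = split($i, T, ",")                  # comma-separated values of one sample
      if (n == 0) continue                   # skip empty fields, avoid division by zero
      sum = 0
      for (j = 1; j <= n; j++) sum += T[j]
      # round to 2 decimals (assumes values >= 0)
      $i = int(sum / n * 100 + 0.5) / 100
    }
  }
  1                                          # print every line
  ' "$@"
}
```

Usage would be something like avg_samples file.txt > output. For the sample rows in post #1 this should give the requested values (54, 0.5, 62.5 for chr2 and so on), with whole numbers printed without trailing zeros.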
