Average of a columns from three files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Average of a columns from three files
# 1  
Old 08-31-2017
Average of a columns from three files

hello, I have three files in the following order
Code:
==> File1 <==
1    20977000    20977000    A    C    1.00    0,15    15    45
1    115829313    115829313    G    A    0.500    6,7    13    99

==> File2 <==
1    20977000    20977000    A    C    1.00    0,13    13    39
1    115829313    115829313    G    A    0.500    7,7    14    99

==> File3 <==
1    20977000    20977000    A    C    1.00    0,16    16    48
1    115829313    115829313    G    A    0.500    7,10    17    99

I need to take the average of the 8th column from each file and output the average to the first file. So the results should look like

Code:
==> Results <==
1    20977000    20977000    A    C    1.00    0,15    15    45 14.6667
1    115829313    115829313    G    A    0.500    6,7    13    99 14.6667

I am able to calculate the average with this
Code:
awk '{a[FNR]=a[FNR]+$8;b[FNR]++;}END{for(i in a){print i,a[i]/b[i]}}' *.txt

but not able to write it into the first file. Any suggestions ?
# 2  
Old 08-31-2017
Hello nans,

Could you please try following and let me know if this helps you.
Code:
awk 'FNR==NR && FNR==1{first_file=FILENAME} FNR==1 && val{close(val)} FNR==1{val=FILENAME} {a[FNR]+=$8;b[FNR]++;c[FNR]=$0}  END{for(i in a){print c[i],a[i]/b[i] > first_file}}' *.txt

Let me know how it goes then. Also I would say you could test this command in few test files, once you are happy with it then you could give it a shot for real files.

Thanks,
R. Singh
This User Gave Thanks to RavinderSingh13 For This Post:
# 3  
Old 08-31-2017
Try also (for the fun of it)
Code:
awk '
FNR == 1                {FC++
                        }
 NR == 1                {AC = ARGC
                         ARGV[ARGC++] = FILENAME
                        }
FNR == NR               {next
                        }
                        {SUM[FNR] += $8
                        }
 FC == AC               {print $0, SUM[FNR] / (AC-1)
                        } 

' file[123]
1    20977000    20977000    A    C    1.00    0,15    15    45 14.6667
1    115829313    115829313    G    A    0.500    6,7    13    99 14.6667

This User Gave Thanks to RudiC For This Post:
# 4  
Old 09-04-2017
Both the solutions worked perfect! Thank you
# 5  
Old 09-07-2017
Revisiting this post. Can we condition the command to say,
go through first two columns in each of the three files and if it matches (assuming the files are not in the same order) or even present in a single file (out of three), print the average of the 8th columns in the first file ?


Code:
==> File1 <==
1    115829313    115829313    G    A    0.500    6,7    13    99
1    20977000    20977000    A    C    1.00    0,17    17   45

==> File2 <==
1    20977000    20977000    A    C    1.00    0,15    15    45
1    115829313    115829313    G    A    0.500    7,7    14    99

==> File3 <==
1    115829313    115829313    G    A    0.500    7,10    17    99
1    209897000    209897000    T    C    1.00    0,16    16    48

Code:
==> Results <==
1    115829313    115829313    G    A    0.500    6,7    13    99   14.66
1    20977000    20977000    A    C    1.00    0,17    17   45   16
1    209897000    209897000    T    C    1.00    0,16    16    48   16

Thank you
# 6  
Old 09-07-2017
Quote:
Originally Posted by nans
Revisiting this post. Can we condition the command to say,
go through first two columns in each of the three files and if it matches (assuming the files are not in the same order) or even present in a single file (out of three), print the average of the 8th columns in the first file ?
Code:
==> File1 <==
1    115829313    115829313    G    A    0.500    6,7    13    99
1    20977000    20977000    A    C    1.00    0,17    17   45

==> File2 <==
1    20977000    20977000    A    C    1.00    0,15    15    45
1    115829313    115829313    G    A    0.500    7,7    14    99

==> File3 <==
1    115829313    115829313    G    A    0.500    7,10    17    99
1    209897000    209897000    T    C    1.00    0,16    16    48

Code:
==> Results <==
1    115829313    115829313    G    A    0.500    6,7    13    99   14.66
1    20977000    20977000    A    C    1.00    0,17    17   45   16
1    209897000    209897000    T    C    1.00    0,16    16    48   16

Thank you
Hello nans,

Could you please try following and let me know if this helps you.
Code:
awk '
FNR==NR && FNR==1{
   first_file=FILENAME
}
FNR==1 && val{
   close(val)
}
FNR==NR{
   val=FILENAME;
   c[$2,$3]=$0;
   d[$2,$3]++;
   a[$2,$3]+=$8;
   b[$2,$3]++;
   next
}
!b[$2,$3]++{
   c[$2,$3]=$0
}
{
   d[$2,$3]++;
   a[$2,$3]+=$8;
   val=FILENAME
}
END{
   for(i in c){
       print c[i],(a[i]/d[i])
}
}
' SUBSEP=" " File1 File2  File3

Above will only print the output on terminal, if you want to put the output into first Input_file1 named File1 then you could change print c[i],(a[i]/d[i]) to print c[i],(a[i]/d[i]) > first_file.

Thanks,
R. Singh

Last edited by RavinderSingh13; 09-07-2017 at 03:45 PM.. Reason: Added statement for adding output into first File1.
This User Gave Thanks to RavinderSingh13 For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Average of columns

I have files that have the following columns chr pos ref alt sample 1 sample 2 sample 3 chr2 179644035 G A 1,107 0,1 58,67 chr7 151945167 G T 142,101 100,200 500,700 chr13 31789169 CTT CT,C 6,37,8 0,0,0 15,46,89 chr22 ... (3 Replies)
Discussion started by: nans
3 Replies

2. Shell Programming and Scripting

Average across multiple columns - awk

Hi forum members, I'm trying to get an average of multiple columns in a csv file using awk. A small example of my input data is as follows: cu,u3o8,au,ag -9,20,-9,3.6 0.005,30,-9,-9 0.005,50,10,3.44 0.021,-9,8,3.35 The following code seems to do most of what I want gawk -F","... (6 Replies)
Discussion started by: theflamingmoe
6 Replies

3. Emergency UNIX and Linux Support

Average columns based on header name

Hi Friends, I have files with columns like this. This sample input below is partial. Please check below for main file link. Each file will have only two rows. ... (8 Replies)
Discussion started by: jacobs.smith
8 Replies

4. Shell Programming and Scripting

Match first two columns and average third from multiple files

I have the following format of input from multiple files File 1 24.01 -81.01 1.0 24.02 -81.02 5.0 24.03 -81.03 0.0 File 2 24.01 -81.01 2.0 24.02 -81.02 -5.0 24.03 -81.03 10.0 I need to scan through the files and when the first 2 columns match I... (18 Replies)
Discussion started by: ncwxpanther
18 Replies

5. Shell Programming and Scripting

Match first two columns and calculate percent of average in third column

I have the need to match the first two columns and when they match, calculate the percent of average for the third columns. The following awk script does not give me the expected results. awk 'NR==FNR {T=$3; next} $1,$2 in T {P=T/$3*100; printf "%s %s %.0f\n", $1, $2, (P>=0)?P:-P}' diff.file... (1 Reply)
Discussion started by: ncwxpanther
1 Replies

6. Shell Programming and Scripting

Average across multiple columns group by

Hi experts, I want to group by average, for multiple columns starting column $7 until NF, group by ($1-$5), please help For just 7th column, I can do awk ' NR>1{ arr += $7 count += 1 } END{ for (a in arr) { print a, arr/count ... (10 Replies)
Discussion started by: ritakadm
10 Replies

7. UNIX for Dummies Questions & Answers

Writing a script to take the average of two columns every 3 rows

I have a dataset with 120 columns. I would like to write a script, that takes the average of every two columns, starting from columns 2 and 3, and moving consecutively in frames of 3 columns, all the way until the last column. The first column in the output file would be the averages of columns... (1 Reply)
Discussion started by: evelibertine
1 Replies

8. Shell Programming and Scripting

How to calculate average of two columns and copy into another file?

Hi, I need help with the awk command. I have a folder with aprox 500 files each one with two columns and I want to print in a new file, the average of column 1 and average of column 2 and the name of each file. Input files are: File-1: 100 99 20 99 50 99 50 99 File-2: 200 85... (3 Replies)
Discussion started by: Lokaps
3 Replies

9. Shell Programming and Scripting

Average of columns with values of other column with same name

I have a lot of input files that have the following form: Sample Cq Sample Cq Sample Cq Sample Cq Sample Cq 1WBIN 23.45 1WBIN 23.45 1CVSIN 23.96 1CVSIN 23.14 S1 31.37 1WBIN 23.53 1WBIN 23.53 1CVSIN 23.81 1CVSIN 23.24 S1 31.49 1WBIN 24.55 1WBIN 24.55 1CVSIN 23.86 1CVSIN 23.24 S1 31.74 ... (3 Replies)
Discussion started by: isildur1234
3 Replies

10. UNIX for Dummies Questions & Answers

Taking the average of two columns and printing it on a new column

Hi, I have a space delimited text file that looks like the following: Aa 100 200 Bb 300 100 Cc X 500 Dd 600 X Basically, I want to take the average of columns 2 and 3 and print it in column 4. However if there is an X in either column 2 or 3, I want to print the non-X value. Therefore... (11 Replies)
Discussion started by: evelibertine
11 Replies
Login or Register to Ask a Question