Match first two columns and average third from multiple files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match first two columns and average third from multiple files
# 1  
Old 01-30-2015
Match first two columns and average third from multiple files

I have the following format of input from multiple files

File 1
Code:
 24.01  -81.01    1.0
 24.02  -81.02    5.0
 24.03  -81.03    0.0

File 2
Code:
 24.01  -81.01    2.0
 24.02  -81.02   -5.0
 24.03  -81.03   10.0

I need to scan through the files and when the first 2 columns match I need to print out the first two columns and the average of the third.

Expected output:
Code:
 24.01  -81.01    1.5
 24.02  -81.02    0.0
 24.03  -81.03    5.0

I was using this code but I am not getting the expected output (data are wrong).

Code:
awk -F, '{array[$1 $2]+=$3} END { for (i in array) {print i array[i]}}' input.file

I think I need to remove the -F, (since there are no commas) and divide the array by number of records (NR).

Any help is appreciated
# 2  
Old 01-30-2015
Code:
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array[i]/(a[i])}}'  file1 file2 > newfile

Try that for a start.
This User Gave Thanks to jim mcnamara For This Post:
# 3  
Old 01-30-2015
Try

Code:
 awk '{arr[$1,$2]   += $3
        count[$1,$2] += 1
    }
    END{
        for (a in arr) {
            print  a  arr[a] / count[a]
        }
    }
' file*

# 4  
Old 02-01-2015
Quote:
Originally Posted by jim mcnamara
Code:
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array[i]/(a[i])}}'  file1 file2 > newfile

Try that for a start.
The results are as expected, but the format is not. What is the command to preserve the original format.
# 5  
Old 02-01-2015
Try this modification:

Any awk:
Code:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' file1 file2

GNU awk:
Code:
gawk '{A[$1$2]+=$3; C[$1$2]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' FIELDWIDTHS="6 8 9" file1 file2


Last edited by Scrutinizer; 02-01-2015 at 10:11 AM..
This User Gave Thanks to Scrutinizer For This Post:
# 6  
Old 02-01-2015
[QUOTE=Scrutinizer;302933668]Try this modification:

Any awk:
Code:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' file1 file2


Great! The first awk statement worked.

Now Im looking for the correct way to sort the first column (ascending).
# 7  
Old 02-01-2015
Try running it through sort :
Code:
awk ... | sort -n

This User Gave Thanks to Scrutinizer For This Post:
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. Shell Programming and Scripting

Average of a columns from three files

hello, I have three files in the following order ==> File1 <== 1 20977000 20977000 A C 1.00 0,15 15 45 1 115829313 115829313 G A 0.500 6,7 13 99 ==> File2 <== 1 20977000 20977000 A C 1.00 0,13 13 39 1 115829313 ... (5 Replies)
Discussion started by: nans
5 Replies

3. Shell Programming and Scripting

Average across multiple columns - awk

Hi forum members, I'm trying to get an average of multiple columns in a csv file using awk. A small example of my input data is as follows: cu,u3o8,au,ag -9,20,-9,3.6 0.005,30,-9,-9 0.005,50,10,3.44 0.021,-9,8,3.35 The following code seems to do most of what I want gawk -F","... (6 Replies)
Discussion started by: theflamingmoe
6 Replies

4. Shell Programming and Scripting

Match first two columns and calculate percent of average in third column

I have the need to match the first two columns and when they match, calculate the percent of average for the third columns. The following awk script does not give me the expected results. awk 'NR==FNR {T=$3; next} $1,$2 in T {P=T/$3*100; printf "%s %s %.0f\n", $1, $2, (P>=0)?P:-P}' diff.file... (1 Reply)
Discussion started by: ncwxpanther
1 Replies

5. Shell Programming and Scripting

Average across multiple columns group by

Hi experts, I want to group by average, for multiple columns starting column $7 until NF, group by ($1-$5), please help For just 7th column, I can do awk ' NR>1{ arr += $7 count += 1 } END{ for (a in arr) { print a, arr/count ... (10 Replies)
Discussion started by: ritakadm
10 Replies

6. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

7. Shell Programming and Scripting

Average of a column in multiple files

I have several sequential files with name stat.1000, stat.1001....to stat.1020 with a format like this 0.01 1 3822 4.97379915032e-14 4.96982253992e-09 0 0.01 3822 1 4.97379915032e-14 4.96982253992e-09 0 0.01 2 502 0.00993165137406 993.165137406 0 0.01 502 2 0.00993165137406 993.165137406 0... (6 Replies)
Discussion started by: kayak
6 Replies

8. Shell Programming and Scripting

Awk match multiple columns in multiple lines in single file

Hi, Input 7488 7389 chr1.fa chr1.fa 3546 9887 chr5.fa chr9.fa 7387 7898 chrX.fa chr3.fa 7488 7389 chr21.fa chr3.fa 7488 7389 chr1.fa chr1.fa 3546 9887 chr9.fa chr5.fa 7898 7387 chrX.fa chr3.fa Desired Output 7488 7389 chr1.fa chr1.fa 2 3546 9887 chr5.fa chr9.fa 2... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

9. Shell Programming and Scripting

Computing average values from multiple text files

Hi, first, I have searched in the forum for this, but I could not find the right answer. (There were some similar threads, but I was not sure how to adapt the ideas.) Anyway, I have a quite natural problem: Given are several text files. All files contain the same number of lines and the same... (3 Replies)
Discussion started by: rbredereck
3 Replies

10. Shell Programming and Scripting

Average of elements throught multiple files

Hi, I got a lot of files looking like this: 1 0.5 6 All together there are ard 1'000'000 lines in each of the ard 100 files. I want to build the average for every line, and write the result to a new file. The averaging should start at a specific line, here for example at line... (10 Replies)
Discussion started by: chillmaster
10 Replies
Login or Register to Ask a Question