Match first two columns and average third from multiple files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Match first two columns and average third from multiple files
# 15  
Old 03-05-2015
[QUOTE=Scrutinizer;302933668]Try this modification:

Any awk:
Code:
awk '{v=$NF; sub(/ +[^ ]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' file1 file2




Turns out that the above code did not work for me, at least in the way that I am using it.

I have the following script set up:

Code:
foreach numb (`seq 1 100`)
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array[i]/(a[i])}}' /dir/of/data/${numb}02.pnt > output.txt
end

I expect the code to search through each of the 100 files and when the first 2 columns match, average the 3rd.

Input File 102:
Code:
43.1042	-78.6042	-6.07
25.2708	-81.1458	19.27

Input File 202:
Code:
43.1042	-78.6042	-4.88
25.2708	-81.1458	17.76


Output:
Code:
43.1042	-78.6042   -5.5
25.2708	-81.1458	18.52

Any suggestions are appreciated!
# 16  
Old 03-05-2015
Hi, try this modification. This should work better:
Code:
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' file1 file2

---

Quote:
Originally Posted by ncwxpanther
[..]
Turns out that the above code did not work for me, at least in the way that I am using it.

I have the following script set up:

Code:
foreach numb (`seq 1 100`)
awk '{array[$1 $2]+=$3; a[$1 $2]++} END { for (i in array) {print i, array[i]/(a[i])}}' /dir/of/data/${numb}02.pnt > output.txt
end

Looking at you method, i have the following remarks:
  • why use csh?
  • like this awk will be called 100 times and the file output will get overwritten 100 times.
  • the result will be wrong 100 times, because the script needs to read all 100 files for it to work properly.

With bash you could try:
Code:
cd /dir/of/data
awk '.....' "${numb}"{1..100} > output.txt

or, if that gives line line length problems, try:
Code:
cd /dir/of/data
for i in "${numb}"{1..100}
do
  cat "$i"
done |
awk '.....' > output.txt


Last edited by Scrutinizer; 03-05-2015 at 12:52 PM..
This User Gave Thanks to Scrutinizer For This Post:
# 17  
Old 03-05-2015
Quote:
Originally Posted by Scrutinizer
Hi, try this modification. This should work better:
Code:
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' file1 file2

Thanks for replying Scrutinizer, but im not getting the expected results.

For instance this is the data for 1 match:

Code:
-8.44
-7.86
-3.28
-9.16
-8.71
-4.76
-7.75
-6.83
-2.45
-6.42
-4.17
-7.73
-6.32
-9.33
-3.07
-7.76
-8.43
-6.08
-2.55
-7.19
-2.09
-2.9
-7.4
-7.18
-2.24
-6.29
-2.02
-4.44
-5.88
-2.36
-3.06
-1.64
-3
-12.19
-5.57
-8.51
-2.61
-2.32
-3.99
-5.02
-5.36
-6.93
-3.64
-4.52
-4
-4.92
-6.34
-4.88
-1.27
-4.88
-3.33
-2.31
-1.44
-0.48
-3.76
-2.84
-2.01
-7.14
-5.36
-3.62
-3.01
-6.29
-8.16
-5.08
-4.38
-4.44
-6.72
-7.19
-3.94
-4.85
-3.72
-6.18
-6.05
-6.09
-2.58
-0.59
-4.65
-9.03
-10.09
-6.33
-0.75
-5.53
-2.15
0.21
-4.23
-4.79
-4.61
-4.57
-5.4
-1.98
-1.18
-2.83
-7.05
-6.07
-6.02
-4.88
-1.35
0.5
-0.86
-1.55

This is the expected result:
Code:
-4.72

This is what your code produced
Code:
-1.6

Any ideas?
# 18  
Old 03-05-2015
I had made an edit to my post, please have a look
# 19  
Old 03-05-2015
Quote:
Originally Posted by Scrutinizer
I had made an edit to my post, please have a look
This worked.

Code:
for numb in "${elem}"/{1..100}02.pnt 
do
cat "$numb"
done |
awk '{v=$NF; sub(/[ \t]+[^ \t]*$/,x); A[$0]+=v; C[$0]++} END{for(i in A) printf "%s%7.1f\n",i,A[i]/C[i]}' > output


Last edited by ncwxpanther; 03-05-2015 at 02:29 PM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

Data match 2 files based on first 2 columns matching only and join if match

Hi, i have 2 files , the data i need to match is in masterfile and i need to pull out column 3 from master if column 1 and 2 match and output entire row to new file I have tried with join and awk and i keep getting blank outputs or same file is there an easier way than what i am... (4 Replies)
Discussion started by: axis88
4 Replies

2. Shell Programming and Scripting

Average of a columns from three files

hello, I have three files in the following order ==> File1 <== 1 20977000 20977000 A C 1.00 0,15 15 45 1 115829313 115829313 G A 0.500 6,7 13 99 ==> File2 <== 1 20977000 20977000 A C 1.00 0,13 13 39 1 115829313 ... (5 Replies)
Discussion started by: nans
5 Replies

3. Shell Programming and Scripting

Average across multiple columns - awk

Hi forum members, I'm trying to get an average of multiple columns in a csv file using awk. A small example of my input data is as follows: cu,u3o8,au,ag -9,20,-9,3.6 0.005,30,-9,-9 0.005,50,10,3.44 0.021,-9,8,3.35 The following code seems to do most of what I want gawk -F","... (6 Replies)
Discussion started by: theflamingmoe
6 Replies

4. Shell Programming and Scripting

Match first two columns and calculate percent of average in third column

I have the need to match the first two columns and when they match, calculate the percent of average for the third columns. The following awk script does not give me the expected results. awk 'NR==FNR {T=$3; next} $1,$2 in T {P=T/$3*100; printf "%s %s %.0f\n", $1, $2, (P>=0)?P:-P}' diff.file... (1 Reply)
Discussion started by: ncwxpanther
1 Replies

5. Shell Programming and Scripting

Average across multiple columns group by

Hi experts, I want to group by average, for multiple columns starting column $7 until NF, group by ($1-$5), please help For just 7th column, I can do awk ' NR>1{ arr += $7 count += 1 } END{ for (a in arr) { print a, arr/count ... (10 Replies)
Discussion started by: ritakadm
10 Replies

6. Shell Programming and Scripting

Join two files combining multiple columns and produce mix and match output

I would like to join two files when two columns in each file matches with each other and then produce an output when taking multiple columns. Like I have file A 1234,ABCD,23,JOHN,NJ,USA 2345,ABCD,24,SAM,NY,USA 5678,GHIJ,24,TOM,NY,USA 5678,WXYZ,27,MAT,NJ,USA and file B ... (2 Replies)
Discussion started by: mady135
2 Replies

7. Shell Programming and Scripting

Average of a column in multiple files

I have several sequential files with name stat.1000, stat.1001....to stat.1020 with a format like this 0.01 1 3822 4.97379915032e-14 4.96982253992e-09 0 0.01 3822 1 4.97379915032e-14 4.96982253992e-09 0 0.01 2 502 0.00993165137406 993.165137406 0 0.01 502 2 0.00993165137406 993.165137406 0... (6 Replies)
Discussion started by: kayak
6 Replies

8. Shell Programming and Scripting

Awk match multiple columns in multiple lines in single file

Hi, Input 7488 7389 chr1.fa chr1.fa 3546 9887 chr5.fa chr9.fa 7387 7898 chrX.fa chr3.fa 7488 7389 chr21.fa chr3.fa 7488 7389 chr1.fa chr1.fa 3546 9887 chr9.fa chr5.fa 7898 7387 chrX.fa chr3.fa Desired Output 7488 7389 chr1.fa chr1.fa 2 3546 9887 chr5.fa chr9.fa 2... (2 Replies)
Discussion started by: jacobs.smith
2 Replies

9. Shell Programming and Scripting

Computing average values from multiple text files

Hi, first, I have searched in the forum for this, but I could not find the right answer. (There were some similar threads, but I was not sure how to adapt the ideas.) Anyway, I have a quite natural problem: Given are several text files. All files contain the same number of lines and the same... (3 Replies)
Discussion started by: rbredereck
3 Replies

10. Shell Programming and Scripting

Average of elements throught multiple files

Hi, I got a lot of files looking like this: 1 0.5 6 All together there are ard 1'000'000 lines in each of the ard 100 files. I want to build the average for every line, and write the result to a new file. The averaging should start at a specific line, here for example at line... (10 Replies)
Discussion started by: chillmaster
10 Replies
Login or Register to Ask a Question