awk match two fields in two files


 
# 1  
Old 06-12-2018

Hi, I have two test files, t.xyz and a.xyz, each with three columns. a.xyz has more rows than t.xyz. I would like to output the rows of a.xyz whose $1 and $2 match $1 and $2 of a row in t.xyz. The total number of output rows should equal the number of rows in t.xyz.
It works fine on the sample below, but when I apply it to a large file, the output has more rows than t.xyz.

I use the following:
Code:
awk 'FNR==NR{a[$1];b[$2];next} $1 in a && $2 in b'  t.xyz a.xyz > out.xyz

Code:
t.xyz
1907.05604682 2983.53399456 -5435.67749023
1908.05607621 2983.53399456 -3593.08154297
1910.05613499 2983.53399456 -1238.71289063
1911.05616438 2983.53399456 -4244.93823242
1912.05619377 2983.53399456 -3595.24414063
1913.05622316 2983.53399456 -2454.96728516
1923.05651706 2983.53399456 NaN

a.xyz
1907.05604682 2983.53399456 35.67749023
1908.05607621 2983.53399456 93.08154297
1910.05613499 2983.53399456 38.71289063
1911.05616438 2983.53399456 44.93823242
1912.05619377 2983.53399456 95.24414063
1913.05622316 2983.53399456 54.96728516
1923.05651706 2983.53399456 NaN
631.018545121 2646.58662319 24.715881348
635.018662681 2646.58662319 27.13696289

expected out.xyz
1907.05604682 2983.53399456 35.67749023
1908.05607621 2983.53399456 93.08154297
1910.05613499 2983.53399456 38.71289063
1911.05616438 2983.53399456 44.93823242
1912.05619377 2983.53399456 95.24414063
1913.05622316 2983.53399456 54.96728516
1923.05651706 2983.53399456 NaN

Any help to fix this will be appreciated.
# 2  
Old 06-12-2018
I tried your script and I get your expected output. Do you have a sample for which the expected output is not produced?
This User Gave Thanks to Scrutinizer For This Post:
# 3  
Old 06-12-2018
A slightly simplified variation:
Code:
awk '{idx=$1 SUBSEP $2} FNR==NR{a[idx];next} idx in a'  t.xyz a.xyz > out.xyz

This User Gave Thanks to vgersh99 For This Post:
# 4  
Old 06-12-2018
This works on my Linux mawk 1.3.3:
Code:
awk 'FNR==NR {a[$1,$2]; next} ($1,$2) in a'  t.xyz a.xyz

This User Gave Thanks to RudiC For This Post:
# 5  
Old 06-13-2018

Yes, I applied it to a large data file and it failed. I don't understand why the output (taken from a.xyz) has more rows than t.xyz.
I tried the version by vgersh99 and it works fine.
Code:
awk '{idx=$1 SUBSEP $2} FNR==NR{a[idx];next} idx in a'  t.xyz a.xyz > out.xyz

I now understand that my original command did not constrain the rows of a.xyz properly, so rows other than the intended matches were printed as well.
Thanks.
# 6  
Old 06-13-2018
Did you consider duplicates when the output is larger than t.xyz?
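
To illustrate the duplicates question, here is a minimal sketch, not taken from the thread. The files t.demo and a.demo, their contents, and the `seen` array are all hypothetical. It assumes that when a.xyz repeats a ($1,$2) pair, keeping only the first such row is acceptable, which the original poster never confirmed.

```shell
# Hypothetical sample data: the key "1 2" occurs twice in a.demo.
tmp=$(mktemp -d)
printf '1 2 x\n3 4 y\n'                 > "$tmp/t.demo"
printf '1 2 p\n1 2 q\n3 4 r\n5 6 s\n'   > "$tmp/a.demo"

# Combined-key match: every a.demo row whose ($1,$2) pair occurs in t.demo
# is printed, so a repeated key yields more output rows than t.demo has.
all_rows=$(awk 'FNR==NR{a[$1,$2];next} ($1,$2) in a' \
    "$tmp/t.demo" "$tmp/a.demo" | awk 'END{print NR}')

# Same match, but seen[] lets only the first row per key through
# (assumption: the first occurrence is the one to keep).
first_only=$(awk 'FNR==NR{a[$1,$2];next} ($1,$2) in a && !seen[$1,$2]++' \
    "$tmp/t.demo" "$tmp/a.demo" | awk 'END{print NR}')

printf 'all matching rows: %s\n' "$all_rows"
printf 'first per key:     %s\n' "$first_only"
rm -rf "$tmp"
```

If t.xyz itself contains repeated pairs, the output can instead be shorter than t.xyz, because the array stores each pair only once.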
# 7  
Old 06-13-2018
I think I found another "failure mode" in your post #1 approach, NOT in the proposals from the forum:
If $1 of a line in a.xyz matches $1 of any line in t.xyz, and $2 of that line matches $2 of any OTHER line in t.xyz, your code prints the line. The other approaches insist on both matches occurring in one single line of t.xyz before they print!
Example:


file1:
Code:
A B C
D E F

file2:
Code:
A E X

Your code prints A E X!
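
The file1/file2 example above can be reproduced with a short script. This is only a sketch: the scratch directory and the variable names `separate` and `combined` are made up for illustration, while the two awk commands are the ones from post #1 and post #4.

```shell
# Recreate the two small sample files from the post above in a scratch directory.
tmp=$(mktemp -d)
printf 'A B C\nD E F\n' > "$tmp/file1"
printf 'A E X\n'        > "$tmp/file2"

# Post #1 approach: $1 and $2 are remembered in separate arrays, so
# "A" (from line 1) and "E" (from line 2) both count as seen.
separate=$(awk 'FNR==NR{a[$1];b[$2];next} $1 in a && $2 in b' \
    "$tmp/file1" "$tmp/file2")

# Combined-key approach: the pair ($1,$2) must occur together on one line of file1.
combined=$(awk 'FNR==NR{a[$1,$2];next} ($1,$2) in a' \
    "$tmp/file1" "$tmp/file2")

printf 'separate arrays: [%s]\n' "$separate"
printf 'combined key:    [%s]\n' "$combined"
rm -rf "$tmp"
```

The first command reports A E X, while the second reports nothing, because no single line of file1 carries both A and E.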
This User Gave Thanks to RudiC For This Post: