awk for matching fields between files with repeated records

11-17-2019

Hello all, I am having trouble with what should be an easy task, but I seem to be missing something fundamental. I have two files: File 1 has a single field and many thousands of records, and File 2 has two fields and many thousands of records.

My goal is that when $1 of File 1 matches $1 of File 2, then print $1 and $2 of File 2, or alternatively, print $1 from File 1 with $2 of File 2 when $1 and $2 match between the files. The problem is that File 1 contains repeated records. When I apply awk 'FNR==NR{a[$1]; next} $1 in a' File 1 File 2, I get every record of File 2 whose $1 matches $1 in File 1, with both $1 and $2 printed, but without the repeated records. However, I need the output to retain the order of the records in File 1 as well as all of the repeated records.

File 1

Code:

ABC
DEF
XYZ
ABC
DEF
ABC
XYZ

File 2

Code:

ABC 123
DEF 345
XYZ 678

Desired Output:

Code:

ABC 123
DEF 345
XYZ 678
ABC 123
DEF 345
ABC 123
XYZ 678

NB: The records are much more varied and repeats much further spread out in the actual file than the simplified examples here.

I had a somewhat similar, albeit more involved, issue in the past that RudiC helped me with (see here), but I am having trouble applying his code to this simpler example.

I got it close with this:

Code:

awk 'NR==FNR {q=$1; $1=""; T[q "," ++C[q]] = $0; next} {q=$1; X=q "," ++D[q]; printf "%s\t",  $0; if(X in T); print T[X]}' File 2 File 1

This attempt prints all of the repeated records of File 1, but it supplies $2 from File 2 only the first time each $1 appears rather than every time, as in the following:

Code:

ABC 123
DEF 345
XYZ 678
ABC
DEF
ABC
XYZ
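
One way to see why the later rows come up empty is to print the keys that the second pass actually looks up. This is only a diagnostic sketch: it assumes the sample data above is saved as file1.txt and file2.txt (hypothetical names), and it stores $2 directly instead of blanking $1.

```shell
# Sample data from this post; file1.txt and file2.txt are placeholder names.
printf '%s\n' ABC DEF XYZ ABC DEF ABC XYZ > file1.txt
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > file2.txt

awk '
    # First file (file2.txt): same keying as the attempt, name plus occurrence count.
    NR == FNR { T[$1 "," ++C[$1]] = $2; next }

    # Second file (file1.txt): build the same kind of key and report whether it exists.
    {
        X = $1 "," ++D[$1]
        print X, ((X in T) ? "found" : "missing")
    }
' file2.txt file1.txt > keys.txt

cat keys.txt
```

With the sample data, only the keys ending in ",1" are reported as found. File 2 holds each name once, so T never contains ABC,2 or ABC,3. In the attempt itself, the stray semicolon after if(X in T) also means print T[X] runs regardless and emits an empty string for those keys.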

Thanks so much in advance.

Last edited by vbe; 11-17-2019 at 10:46 AM..

jvoot

11-17-2019

Hello jvoot,

Could you please try the following?

Code:

awk 'FNR==NR{a[$1]=$2;next} ($1 in a){print $1,a[$1]}' Input_file2 Input_file1

Output will be as follows.

Code:

ABC 123
DEF 345
XYZ 678
ABC 123
DEF 345
ABC 123
XYZ 678
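
For readers who want to see the moving parts, the same lookup can be spelled out with comments. This is a minimal sketch rather than a drop-in replacement: the file names file1.txt and file2.txt, the output file matched.txt, and the "NA" placeholder for keys that have no entry in File 2 are all assumptions added for illustration.

```shell
# Sample data from the question; file1.txt and file2.txt are placeholder names.
printf '%s\n' ABC DEF XYZ ABC DEF ABC XYZ > file1.txt
printf '%s\n' 'ABC 123' 'DEF 345' 'XYZ 678' > file2.txt

awk '
    # First file on the command line (file2.txt): remember $2 under its key $1.
    FNR == NR { value[$1] = $2; next }

    # Second file (file1.txt): every record, repeats included, is looked up in
    # input order, so both the order and the duplicates are preserved.
    $1 in value { print $1, value[$1]; next }

    # Assumption: keys missing from file2.txt are echoed with a placeholder
    # rather than dropped. Delete this rule to match the one-liner above.
    { print $1, "NA" }
' file2.txt file1.txt > matched.txt

cat matched.txt
```

Listing file2.txt first matters here: FNR == NR is true only while the first file is being read, so the lookup table is complete before any record of File 1 is processed.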

EDIT: After rereading your question, one question came up: do you also want $2 of Input_file2 to be checked in the comparison against Input_file1?

Thanks,
R. Singh

Last edited by RavinderSingh13; 11-17-2019 at 02:25 AM..

RavinderSingh13

11-17-2019

Thanks so much, RavinderSingh13. A quick early test suggests that did the trick!

jvoot
