Match pattern from file 1 with any/all columns in file 2


 
# 1  
Old 12-10-2015

Hi, I have been looking everywhere for an example so I can try to do this myself, but I am having difficulty. I have 2 large files of different sizes. If the pattern in the 3rd column of file 1 appears in "any" column of file 2, I want to print the whole line from file 1 and append the value from column 1 of the matching line in file 2.

Here is an example. The files are comma separated.

file 1

Code:
A,B,XXX,000
C,D,YYY,111
E,F,ZZZ,222

file 2

Code:
G,XXX,333,H,222
H,4,555,YYY,kkk
L,R,JJJ,LMN,KRP,ZZZ,0

In this example, I want to look for XXX, YYY and ZZZ (column 3 of file 1) in file 2, where they could be in any column. When I find one, I want to print the line from file 1 followed by the entry in the 1st column of the matching line in file 2, like this:

Code:
A,B,XXX,000,G
C,D,YYY,111,H
E,F,ZZZ,222,L

I have tried several variations of what I have seen in NR==FNR posts but I am finding it hard to get the right combination. All the NR==FNR posts talk about matching a "specific" column in one file with a "specific" column in another file.

I would really really appreciate help with this.

Thanks in advance.
# 2  
Old 12-10-2015
Hi,
Can the same pattern occur several times in file 1 or file 2?
Can file 1 be loaded into memory, or is it too large?
Regards.
# 3  
Old 12-10-2015
Well, try
Code:
awk 'NR==FNR {T[$3]=$0; next} {for (t in T) if ($0 ~ t) print T[t], $1}' FS=, OFS=, file[12]
A,B,XXX,000,G
C,D,YYY,111,H
E,F,ZZZ,222,L
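
For readers unfamiliar with the NR==FNR idiom, the one-liner above can be spelled out as a commented sketch. The temporary directory and the file names file1 and file2 are assumptions made only to keep the example self-contained, and -F / -v OFS are used in place of the trailing FS=, OFS=, assignments, which should behave the same here.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A,B,XXX,000' 'C,D,YYY,111' 'E,F,ZZZ,222' > "$tmp/file1"
printf '%s\n' 'G,XXX,333,H,222' 'H,4,555,YYY,kkk' 'L,R,JJJ,LMN,KRP,ZZZ,0' > "$tmp/file2"

result=$(awk -F, -v OFS=, '
    # NR counts lines across all inputs, FNR restarts for each file,
    # so NR == FNR holds only while the first file is being read.
    NR == FNR { T[$3] = $0; next }   # store each file1 line under its 3rd field
    # Second file: try every stored pattern as a regex against the whole line.
    { for (t in T) if ($0 ~ t) print T[t], $1 }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data each file 2 line matches exactly one pattern, so the output follows the order of file 2.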

# 4  
Old 12-10-2015
There are 2445 entries in file 1 and 2431 entries in file 2. File 1 is 264 KB and file 2 is 352 KB. File 2 has a lot more columns.

The pattern in column 3 of file 1 can occur in several lines of file 2, but only once in file 1. So I need to read column 3 from every line of file 1 and look for a match in every line of file 2. The pattern in column 3 of file 1 may also "not" exist in file 2 at all; when this happens I want to just print the line from file 1 with nothing appended.

RudiC, Can I put an else in your code to print the line from file 1 when no matches occur? Something like so?

Code:
awk 'NR==FNR {T[$3]=$0; next} {for (t in T) if ($0 ~ t) {print T[t], $1} else {print T[t]}}' FS=, OFS=, file1 file2

# 5  
Old 12-10-2015
More like
Code:
awk 'NR==FNR {T[$3]=$0; P[$3]; next} {for (t in T) if ($0 ~ t) {print T[t], $1; delete P[t]}} END {for (p in P) print T[p]}' FS=, OFS=, file[12]

---------- Post updated at 16:39 ---------- Previous update was at 15:10 ----------

The above has a drawback: although it won't match substrings of file1's $3 in file2, it will match superstrings, i.e. any file2 line that merely contains the pattern inside a longer field. If that's a problem, we need to reconsider.
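
A small sketch of that drawback, using made-up data (the fields XXXY and XX and the temporary file names are assumptions for illustration only): the regex test `$0 ~ t` reports the line containing XXXY as a hit for XXX, while the line containing only XX is correctly ignored.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A,B,XXX,000' > "$tmp/file1"
# G has a superstring of XXX, H has XXX itself, K has only a substring.
printf '%s\n' 'G,XXXY,333' 'H,1,XXX' 'K,XX,9' > "$tmp/file2"

regex_out=$(awk 'NR==FNR {T[$3]=$0; next}
                 {for (t in T) if ($0 ~ t) print T[t], $1}' \
            FS=, OFS=, "$tmp/file1" "$tmp/file2")

printf '%s\n' "$regex_out"
rm -rf "$tmp"
```

The line for G is the unwanted match. Because the pattern is used as a regular expression, any regex metacharacters in column 3 of file 1 would also be interpreted rather than matched literally.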
This User Gave Thanks to RudiC For This Post:
# 6  
Old 12-10-2015
Better to loop over the fields: the look-up in the array is then an exact match, and it may even be faster.
Code:
awk 'NR==FNR {T[$3]=$0; next} {for (i=1;i<=NF;i++) if ($i in T) print T[$i], $1}' FS=, OFS=, file[12]

---------- Post updated at 03:01 PM ---------- Previous update was at 02:29 PM ----------

This version uses "break" to leave the loop at the first match on a line, and it handles the latest requirement (printing file 1 lines that have no match):
Code:
awk 'NR==FNR {T[$3]=$0; next} {for (i=1;i<=NF;i++) if ($i in T) {print T[$i], $1; seen[$i]; break}} END {for (t in T) if (!(t in seen)) print T[t]}' FS=, OFS=, file1 file2
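
As a rough check of the command above, here is a sketch that runs it on the thread's sample data extended with two made-up lines (assumptions for illustration): a file 1 line whose pattern QQQ never appears in file 2, and an extra file 2 line P that repeats XXX. The temporary file names are also assumptions.

```shell
tmp=$(mktemp -d)
printf '%s\n' 'A,B,XXX,000' 'C,D,YYY,111' 'E,F,ZZZ,222' 'M,N,QQQ,333' > "$tmp/file1"
printf '%s\n' 'G,XXX,333,H,222' 'H,4,555,YYY,kkk' \
              'L,R,JJJ,LMN,KRP,ZZZ,0' 'P,XXX,1' > "$tmp/file2"

final_out=$(awk 'NR==FNR {T[$3]=$0; next}
                 {for (i=1;i<=NF;i++) if ($i in T) {print T[$i], $1; seen[$i]; break}}
                 END {for (t in T) if (!(t in seen)) print T[t]}' \
            FS=, OFS=, "$tmp/file1" "$tmp/file2")

printf '%s\n' "$final_out"
rm -rf "$tmp"
```

The XXX line is printed twice, once for G and once for P, and the unmatched QQQ line comes out last with nothing appended. Two things to keep in mind: the order of a `for (t in T)` loop is unspecified in awk, so several unmatched lines may not come out in file 1 order, and because of the `break` a file 2 line that contains two different file 1 patterns is reported only for the first one found.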

This User Gave Thanks to MadeInGermany For This Post: