Matching string on two files based on match rules.


 
# 1  
Old 12-12-2010

Hi,

How can I check whether a string in file2 exactly matches part or all of a string in file1, and return a match indicator based on these match rules?
1) Only records in file1 with category A should be matched. For any other category, the output match indicator should default to 'N'.
2) In file2 (the lookup file), a trailing space is represented by #, so PO BOX in file1 should match PO#.
Ex:
file1
Code:
Adrfld,category
PO BOX,A
POST,A
avenue,A
business,X
bus terminus,A
first cross,A
firstcross,A

file2 ( Pattern file)
Code:
Adrptrn,active
ave,y
PO#,y
bus,y
cross,y

output
Code:
Adrfld,category,matchind
PO BOX,A,Y (matches with PO#)
POST,A,N (no match )
avenue,A,Y (matches with ave)
business,X,N (matches with bus, but category is X, so matchind is N)
bus terminus,A,Y (matches with bus)
first cross,A,Y (matches with cross with a leading space)
firstcross,A,N (cross matches but has no leading space)

Please help me with this, as I'm new to Unix and am trying hard to achieve this.
Thanks.

Last edited by Franklin52; 12-14-2010 at 04:40 AM.. Reason: Please use code tags, thank you
# 2  
Old 12-12-2010
Hi,

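Here is a solution; give it a try. Note that in my file2 the # is already replaced by a literal trailing space, cross has a leading space, and neither file has a header line: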
Code:
$ cat file1
PO BOX,A
POST,A
avenue,A
business,X
bus terminus,A
first cross,A
firstcross,A
$ cat file2
ave,y
PO ,y
bus,y
 cross,y
$ cat script.pl
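#!/usr/bin/perl

use strict;
use warnings;

# File names are hard-coded; command-line arguments are ignored.
open my $f1, "<", "file1" or die "Error opening file1: $!";
open my $f2, "<", "file2" or die "Error opening file2: $!";

# Load the patterns from file2, dropping the trailing ",y" flag.
my @pat;
while ( <$f2> ) {
    s/\s+$//;
    s/,.$//;
    push @pat, $_;
}

LINE:
while ( my $line = <$f1> ) {
    $line =~ s/\s+$//;
    my ( $adr, $cat ) = split /,/, $line;

    # Only category A records are matched; all others default to N.
    if ( $cat ne 'A' ) {
        print "$line,N\n";
        next LINE;
    }

    # \Q...\E matches each pattern literally; /i ignores case.
    for my $pat (@pat) {
        if ( $adr =~ /\Q$pat\E/i ) {
            print "$line,Y\n";
            next LINE;
        }
    }
    print "$line,N\n";
}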
$ ./script.pl file1 file2
PO BOX,A,Y
POST,A,N
avenue,A,Y
business,X,N
bus terminus,A,Y
first cross,A,Y
firstcross,A,N

Regards,
Birei
# 3  
Old 12-12-2010
How about this (gensub requires GNU awk):

Code:
$ awk -F, '
  NR==FNR {if($2=="y") M[gensub(/#$/, " ", 1, " "$1)]++; next }
  FNR==1 { print "adrfld", "category", "matchind"; next}
  $2 != "A" { print $0, "N"; next }
  { for(c in M) {if (" "$1 ~ c) { print $0, "Y"; next } }
    print $0, "N";} ' OFS=, file2 file1
adrfld,category,matchind
PO BOX,A,Y
POST,A,N
avenue,A,Y
business,X,N
bus terminus,A,Y
first cross,A,Y
firstcross,A,N
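
For awks without gensub, the same idea can be sketched with the standard sub() function. This is only a sketch under assumptions: the scratch directory, the helper variable p and the capture variable out are my own additions, and the sample files are recreated from the first post so the block runs on its own.

```shell
# Work in a scratch directory (assumption) and recreate the sample files.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Adrfld,category' 'PO BOX,A' 'POST,A' 'avenue,A' 'business,X' \
  'bus terminus,A' 'first cross,A' 'firstcross,A' > file1
printf '%s\n' 'Adrptrn,active' 'ave,y' 'PO#,y' 'bus,y' 'cross,y' > file2

out=$(awk -F, '
  # First file (file2): keep active patterns only.
  NR == FNR {
    if ($2 == "y") {
      p = $1
      sub(/#$/, " ", p)   # a trailing # stands for a trailing space
      M[" " p]            # leading space ties the pattern to a word start
    }
    next
  }
  # Second file (file1): header, then one verdict per record.
  FNR == 1  { print "adrfld", "category", "matchind"; next }
  $2 != "A" { print $0, "N"; next }
  {
    for (c in M) if ((" " $1) ~ c) { print $0, "Y"; next }
    print $0, "N"
  }
' OFS=, file2 file1)

printf '%s\n' "$out"
```

On the sample data this should print the same eight lines as the gensub version above.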


Last edited by Chubler_XL; 12-12-2010 at 07:26 PM..
# 4  
Old 12-12-2010
Code:
awk -F, 'NR==FNR{sub(/#/," ");A[$1];next}{$3="N"}FNR==1{$3="matchind"}$2=="A"{for(i in A)if(" "$1~" "i)$3="Y"}1' OFS=, file2 file1

Code:
Adrfld,category,matchind
PO BOX,A,Y
POST,A,N
avenue,A,Y
business,X,N
bus terminus,A,Y
first cross,A,Y
firstcross,A,N


Last edited by Scrutinizer; 12-12-2010 at 07:56 PM..
# 5  
Old 12-12-2010
Scrutinizer, some nice optimisations of my original, but you missed support for disabled patterns (i.e. $2 must be "y" in file2).

Code:
awk -F, 'NR==FNR{sub(/#/," ");if($2=="y")A[$1];next}{$3="N"}FNR==1{$3="matchind"}$2=="A"{for(i in A)if(" "$1~" "i)$3="Y"}1' OFS=, file2 file1
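
One caveat with the ~ operator: it treats each pattern as a regular expression, so a lookup entry containing a character such as . can match more than intended. A possible variation, sketched here with index() for literal substring matching, is shown below. The st. pattern, the two "main" records, the scratch directory and the out variable are hypothetical additions for illustration and are not part of the original data.

```shell
# Scratch directory and reduced sample data (the st. rows are hypothetical).
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'Adrfld,category' 'PO BOX,A' 'firstcross,A' \
  'main st.,A' 'main street,A' > file1
printf '%s\n' 'Adrptrn,active' 'ave,y' 'PO#,y' 'bus,y' 'cross,y' 'st.,y' > file2

out=$(awk -F, '
  # file2: turn # into a space and remember active patterns.
  NR == FNR { sub(/#/, " "); if ($2 == "y") A[$1]; next }
  # file1: default to N, rename the header, then test category A records.
  { $3 = "N" }
  FNR == 1 { $3 = "matchind" }
  # index() returns 0 when the literal string is absent, so "." is not a wildcard.
  $2 == "A" { for (i in A) if (index(" " $1, " " i)) $3 = "Y" }
  1
' OFS=, file2 file1)

printf '%s\n' "$out"
```

With the ~ version, " st." would also match " str" in "main street"; with index() only the literal "st." counts.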


Last edited by Chubler_XL; 12-12-2010 at 09:40 PM..
# 6  
Old 12-13-2010

Thanks a ton for your help! It's working perfectly. As I'm new to Unix, I'd appreciate it if you could explain the solution and also suggest books for learning awk. Many thanks.
# 7  
Old 12-13-2010
Here is an explanation:

-F,                      Set the field separator to a comma.
'NR==FNR{                For every line: if we are reading the first file (that is, while FNR and NR are equal), then
sub(/#/," ")             replace the first # on the line with a space;
if($2=="y")A[$1]         if the second field is y (the pattern is active), create an empty element in array A whose index is field 1 ($1);
next}                    read the next line and do no further processing on this one.
{$3="N"}                 If we get here we are reading the second file: for every line, create a new field 3 and set it to "N".
FNR==1{$3="matchind"}    If we are on the first line of the second file (the header), field 3 becomes matchind.
$2=="A"{                 If the second field is "A", then
for(i in A)              for every index "i" of array A that was set while reading the first file,
if(" "$1~" "i)$3="Y"}    if the first field ($1) with a leading space contains "i" with a leading space, set field 3 to "Y". The leading space means a pattern only matches at the start of a word, so "first cross" matches cross but "firstcross" does not.
1                        Print every record (line),
OFS=,                    with a comma as the output field separator.
file2 file1              First read file2, then file1.
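
To see why NR==FNR is only true for the first file, a small experiment can help. This is just a sketch: the file names first and second, their contents, the scratch directory and the out variable are made up for the demonstration.

```shell
# Two throwaway files in a scratch directory (names and contents are hypothetical).
cd "$(mktemp -d)" || exit 1
printf '%s\n' a b > first
printf '%s\n' c d e > second

# NR counts lines across all files; FNR restarts at 1 for each file.
out=$(awk '{ print FILENAME, NR, FNR, (NR == FNR ? "first file" : "later file") }' first second)

printf '%s\n' "$out"
```

NR keeps counting while FNR restarts, so the two are only equal while the first file is being read (as long as that file is not empty).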

As for books, I like "sed & awk" from O'Reilly.

Quote:
Originally Posted by Chubler_XL
Scrutinizer, some nice optimisations of my original, but you missed support for disabled patterns (i.e. $2 must be "y" in file2).

Code:
awk -F, 'NR==FNR{sub(/#/," ");if($2=="y")A[$1];next}{$3="N"}FNR==1{$3="matchind"}$2=="A"{for(i in A)if(" "$1~" "i)$3="Y"}1' OFS=, file2 file1

Thanks Chubler, I did not know that that was a requirement.

Last edited by Scrutinizer; 12-13-2010 at 01:13 PM..