Find the exact and best match between 2 files


 
# 1  
Old 01-20-2016

Dear Forum,

File1: Reference
Code:
4474189	United Kingdom Mobile
4474188	United Kingdom Mobile
4474187	United Kingdom Mobile
447	United Kingdom
93	AFGHANISTAN 0093
1907	ALASKA 001907
355	ALBANIA 00355
35568	ALBANIA MOBILE 0035568
35569	ALBANIA MOBILE 0035569
213	ALGERIA 00213
2137	ALGERIA DJEZZY MOBILE 002137
21360	ALGERIA OTHER MOBILE 0021360
21361	ALGERIA OTHER MOBILE 0021361
21362	ALGERIA OTHER MOBILE 0021362
21363	ALGERIA OTHER MOBILE 0021363
21364	ALGERIA OTHER MOBILE 0021364
97155	United Arab Emirates Mobile Du
97152	United Arab Emirates Mobile Du
97189	United Arab Emirates Mobile
97180	United Arab Emirates Mobile
9715	United Arab Emirates Mobile
971929	United Arab Emirates Du
9717277	United Arab Emirates Du
9716678	United Arab Emirates Du
9714567	United Arab Emirates Du
971443	United Arab Emirates Du

File2:

Code:
9716678
9714432
44777
9330
1907
355680
35569
213700
971929
9717277

I want to compare the first column of File1 with the first column of File2 and print the File1 entry when an exact match is found. If no exact match is found, print the best match instead.

For example, here is what should be printed when looking up three values from File2 in File1:
Code:
9716678 will print (9716678, 9716678 United Arab Emirates Du) because it is an exact match
9714432 will print (9714432, 971443 United Arab Emirates Du) because 9714432 was not found and 971443 is the best match
213700 will print (213700, 2137 ALGERIA DJEZZY MOBILE 002137) because 213700 was not found and 2137 is the best match

Thanks

Last edited by Scrutinizer; 01-20-2016 at 01:13 PM.. Reason: CODE tags
# 2  
Old 01-20-2016
This one strips digits off the end of each File2 value, one at a time, until what remains matches a key from File1:
Code:
awk '
NR==FNR {
# file1
  s1=$1
  $1=""  # this changes $0
  s[s1]=$0
  next
}
{
# file2
  len=length($1)
  for (i=len; i>0; i--) {
    lookup=substr($1,1,i)
    if (lookup in s) {
      print $1 ",", lookup s[lookup], i==len ? "(exact match)" : "(best match)"
      next  # jump to next cycle
    }
  }
  print $1, "(no match)"
}
' file1 file2
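
To see what this kind of lookup prints, here is a minimal sketch of the same prefix-stripping idea run against a few rows taken from the samples in post #1. The scratch directory, the `desc` array, and the `result` variable are names made up for this illustration, and it assumes the reference file is tab-separated (hence `-F'\t'`), which differs slightly from the script above.

```shell
#!/bin/sh
# Hypothetical scratch files holding a few rows from the samples in this thread.
tmp=$(mktemp -d) || exit 1
printf '%s\t%s\n' 9716678 'United Arab Emirates Du' \
                  971443  'United Arab Emirates Du' \
                  2137    'ALGERIA DJEZZY MOBILE 002137' > "$tmp/file1"
printf '%s\n' 9716678 9714432 213700 1408 > "$tmp/file2"

result=$(awk -F'\t' '
NR == FNR { desc[$1] = $2; next }     # first file: prefix -> description
{
  n = length($1)
  for (i = n; i > 0; i--) {           # try the longest candidate first
    key = substr($1, 1, i)
    if (key in desc) {
      print $1 ", " key " " desc[key], (i == n ? "(exact match)" : "(best match)")
      next
    }
  }
  print $1, "(no match)"
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With those four lookups this prints:

```
9716678, 9716678 United Arab Emirates Du (exact match)
9714432, 971443 United Arab Emirates Du (best match)
213700, 2137 ALGERIA DJEZZY MOBILE 002137 (best match)
1408 (no match)
```

Each lookup costs at most one array test per digit of the File2 value, regardless of how many prefixes File1 holds.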

# 3  
Old 01-20-2016
Closest match can have several meanings... Assuming that you mean you want the longest string in File1 that is a leading substring of the string in File2, you could also try this slightly different approach:
Code:
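awk '
FNR == NR {
	# File1: remember each whole line, keyed by its first field
	prefix[$1] = $0
	next
}
{	# File2: lm is the length of the longest matching prefix seen so far
	lm = 0
	for(p in prefix)
		if($1 == p) {
			# exact match: print it and move on to the next input line
			print $1, prefix[$1]
			next
		} else if(match($1, "^" p) && RLENGTH > lm) {
			# p is a longer leading substring than any seen so far
			lm = RLENGTH
			v = p
		}
	print $1, lm ? prefix[v] : "** No match **"
}' OFS=", " File1 File2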

which, with your sample File1 contents and the following in File2:
Code:
9716678
9714432
44777
9330
1907
355680
35569
213700
971929
9717277
1408

(note the added line at the end) produces the output:
Code:
9716678, 9716678	United Arab Emirates Du
9714432, 971443	United Arab Emirates Du
44777, 447	United Kingdom
9330, 93	AFGHANISTAN 0093
1907, 1907	ALASKA 001907
355680, 35568	ALBANIA MOBILE 0035568
35569, 35569	ALBANIA MOBILE 0035569
213700, 2137	ALGERIA DJEZZY MOBILE 002137
971929, 971929	United Arab Emirates Du
9717277, 9717277	United Arab Emirates Du
1408, ** No match **

If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.
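
If the same script has to run on both Solaris and other systems, one way to avoid editing it by hand is to pick the awk binary at run time. This is only a sketch: the two alternatives are the ones named above, while the `AWK` and `check` variable names and the order of preference are assumptions made for illustration.

```shell
#!/bin/sh
# Prefer the XPG4 awk if it exists, then nawk, then whatever "awk" is on PATH.
# (The fallback order is an assumption, not something stated in the post.)
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
  AWK=nawk
else
  AWK=awk
fi

# Quick sanity check that the chosen awk supports match() and RLENGTH,
# which the script above relies on.
check=$(printf '9714432\n' | "$AWK" '{ if (match($1, "^971443")) print RLENGTH }')
printf '%s reports RLENGTH=%s\n' "$AWK" "$check"
```

The script above can then be invoked as `"$AWK" '...' OFS=", " File1 File2`.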