Deleting duplicate records from file 1 if records from file 2 match

05-05-2012

Registered User

16, 0

Join Date: Mar 2010

Last Activity: 18 July 2012, 8:26 AM EDT

Posts: 16

Thanks Given: 2

Thanked 0 Times in 0 Posts

Deleting duplicate records from file 1 if records from file 2 match

I have 2 files

"File 1" is delimited by ";" and "File 2" is delimited by "|".

File 1 below (3 record shown):

Code:

Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones
Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull
Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles O'Brien;Bill Rudd

File 2 below (4 records shown):

Code:

6 Main Street|New York
345 Tipp Road|Brewser
885 Peartree|Buffalo
779 Old Windy Road|Buffalo

"File 1" is faily small, "File 2" is huge.

My problem: Line by line I need to copare each record in "File 1", the 3rd field (city) and 4th field (address)against matching field data in "File 2", the 1st field (address) and 2nd field (city) to make sure that there are no record matches.

All records that do not match should be copied out or > redirected to a new file (the edited file). If there is a match then that record should not be copied out to the edited file.

In other words given the example data above from "File 1" and "File 2" the "new edited file" should look like this:

Code:

Doc2;03/01/2012;Syracuse;876 Broadway;John Davis;Barbara Lull

The other 2 files below would be discarded as records matched "File 2"

Code:

Doc1;03/01/2012;New York;6 Main Street;Mr. Smith 1;Mr. Jones
Doc3;03/01/2012;Buffalo;779 Old Windy Road;Charles O'Brien;Bill Rudd

I hope that is not too confusing. I know this can probably be done with awk but I am as rusty as the Titanic with coding and lucky I got as far as I did with this project. Many thanks to "agama" for helping out on the last issue!

Thanks in advance for any replies!

Art

Last edited by Franklin52; 05-05-2012 at 10:45 AM.. Reason: Please use code tags for data and code samples, thank you

vestport

View Public Profile for vestport

Find all posts by vestport

05-05-2012

Moderator

1,484, 567

Join Date: Mar 2011

Last Activity: 28 November 2020, 9:34 AM EST

Posts: 1,484

Thanks Given: 68

Thanked 567 Times in 444 Posts

First you convert the separator in File1 to pipe, use tr or sed, it's fairly simple.

Then try this code :

Code:

awk -F"|" 'NR==FNR { s=$1FS$2; a[s] = $0; next }  ! a[$4FS$3] { print > "nonmatch.txt" }  ' file2 file1

Hope that helps

Regards
Peasant.

Peasant

View Public Profile for Peasant

Find all posts by Peasant

05-05-2012

Registered User

16, 0

Join Date: Mar 2010

Last Activity: 18 July 2012, 8:26 AM EDT

Posts: 16

Thanks Given: 2

Thanked 0 Times in 0 Posts

peasant thanks so much for that! It worked perfectly!

What I did was as you suggested convert the ";" delimiters in the one file first to "|" to get a common delimiter as your code uses the -F"|" option:

Code:

cat FileThatNeedsConverting | sed 's/;/|/g' > ConvertedFile

I was going to ask how to see discarded data but a simple "diff" between the 2 files (original and nonmatch.txt) accomplishes that.

Code:

diff OriginalFile nonmatch.txt

Also by doing a:

Code:

wc -l OriginalFile

and a:

Code:

wc -l nonmatch.txt

you can see that records were shaved off. I just wanted to add that in the case that this may help someone else verify or similar project.

Many thanks man!

Art

Last edited by Franklin52; 05-05-2012 at 10:45 AM.. Reason: Please use code tags for data and code samples, thank you

vestport

View Public Profile for vestport

Find all posts by vestport

Shell Programming and Scripting

Deleting duplicate records from file 1 if records from file 2 match

10 More Discussions You Might Find Interesting

1. UNIX for Dummies Questions & Answers

Deleting records from .dat file

Discussion started by: narayanv

2. UNIX for Dummies Questions & Answers

CSV file:Find duplicates, save original and duplicate records in a new file

Discussion started by: arvindosu

3. Shell Programming and Scripting

Find Duplicate records in first Column in File

Discussion started by: Murugesh

4. Shell Programming and Scripting

Deleting Duplicate Records

Discussion started by: DFr0st

5. Shell Programming and Scripting

Deleting last records of a file

Discussion started by: vsairam

6. Shell Programming and Scripting

find out duplicate records in file?

Discussion started by: tiger2000

7. Shell Programming and Scripting

compare fields in a file with duplicate records

Discussion started by: rleal

8. Shell Programming and Scripting

How to find Duplicate Records in a text file

Discussion started by: G.Aavudai

9. UNIX for Advanced & Expert Users

Duplicate records from oracle to text file.

Discussion started by: shilendrajadon

10. Shell Programming and Scripting

Remove all instances of duplicate records from the file

Discussion started by: vukkusila