Removing duplicate records from 2 files


 
# 1  
Old 07-27-2010
Removing duplicate records from 2 files

Can anyone help me remove duplicate records from 2 separate files in UNIX?

Please find sample records for both files below.

Code:
cat Monday.dat
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6

Code:
cat Tuesday.dat
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH

I need to compare Monday.dat and Tuesday.dat, delete the duplicate records that exist in both files, and get the desired output like below
Code:
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH

Moderator's Comments:
Mod Comment: Having 20 posts, you should be familiar by now with using code tags - if not, you got a PM.

Last edited by zaxxon; 07-27-2010 at 11:35 AM.
# 2  
Old 07-27-2010
Code:
$> grep -vf Monday.dat Tuesday.dat
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
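
A caveat on grep -vf: each line of Monday.dat is treated as an unanchored regular expression by default, so a shorter record can match as a substring of a longer one and remove lines that are not true duplicates. If exact whole-line matching is wanted, a safer sketch (assuming your grep supports the POSIX -F and -x options) is:
Code:
# -F: treat the Monday.dat lines as fixed strings, not regular expressions
# -x: match whole lines only
# -v: print the Tuesday.dat lines that do NOT match any Monday.dat line
grep -F -x -v -f Monday.dat Tuesday.dat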

# 3  
Old 07-27-2010
I think there are lines missing from your output.

A possible solution:
Code:
$ head Monday.txt Tuesday.txt
==> Monday.txt <==
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6

==> Tuesday.txt <==
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
$ sort Monday.txt > Monday.tmp
$ sort Tuesday.txt > Tuesday.tmp
$ head Monday.tmp Tuesday.tmp
==> Monday.tmp <==
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6

==> Tuesday.tmp <==
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
3LNHL2GC6AR636361HEA DEUK CHOI 821057314531 KOR2010LE
3MEHM0JG7AR652083MUTLAB NAL-NAFISAH 966552299383 966552299383 SAU2010MI
$ join -v1 -v2 Monday.tmp Tuesday.tmp
1FAHP25106G169212O-GYEONG GWON 821191370489 KOR2006FH
1FAHP25196G136869SEONGYUL KIM 82117722451 KOR2006FH
1FAHP25W58G107612HYUNGKYOO PARK 82623642043 KOR2008FH
3FAHP0JA1AR319226MOHMED ATEK 966504453742 SAU2010DE
NM0KS9BN1AT035143JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2010C6
NM0KS9BN9AT030157JOSE AHERNANDEZ-RODRIGUEZ 7878545055 PRI2009C6
$ rm Monday.tmp Tuesday.tmp
$

Jean-Pierre.
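
A note on this approach: join compares only the first whitespace-separated field by default, which happens to work on this sample because the duplicated lines are identical throughout, but it is not a whole-line comparison. If whole lines should be compared, comm on the sorted files is a possible sketch (the .srt temporary names are just placeholders):
Code:
sort Monday.dat > Monday.srt
sort Tuesday.dat > Tuesday.srt
# -3 suppresses the lines common to both files, leaving only the
# records unique to one of them (lines unique to Tuesday.srt are
# printed with a leading tab).
comm -3 Monday.srt Tuesday.srt
rm Monday.srt Tuesday.srt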
# 4  
Old 07-27-2010
Code:
cat Monday.txt Tuesday.txt | sort | uniq -u
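
One subtlety here: uniq -u keeps only the lines that occur exactly once in the combined stream, so a record repeated inside a single file would also disappear, and records unique to Monday are kept alongside those unique to Tuesday. If within-file repeats should survive, a small variation (the .tmp names are just placeholders) is to de-duplicate each file first:
Code:
# sort -u removes repeats inside each file before the comparison
sort -u Monday.txt > m.tmp
sort -u Tuesday.txt > t.tmp
# uniq -u then leaves only the records that appear in exactly one file
cat m.tmp t.tmp | sort | uniq -u
rm m.tmp t.tmp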

# 5  
Old 07-27-2010
Since both files have a huge number of records, it is difficult for me to confirm that I got the exact result. The following commands ended with different counts. I need to delete the duplicate records which exist in both files.
Code:
grep -vf Monday.dat Tuesday.dat

Code:
grep -vf Tuesday.dat Monday.dat

I tried the join command, but it produced the result in a different file format. Thanks.
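
Different counts from those two grep commands are expected, since one lists the records unique to Tuesday.dat and the other the records unique to Monday.dat; grep -vf also does regular-expression substring matching rather than exact line matching. join additionally rewrites each record with its default single-space field separator, which is likely why the output format looked different. An awk sketch that compares whole lines exactly and preserves the original order and format (the .only output names are just placeholders) could be:
Code:
# Load every Monday.dat line into an array, then print only the
# Tuesday.dat lines that never appear in Monday.dat.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' Monday.dat Tuesday.dat > Tuesday.only

# The same idea in the other direction, if Monday-only records are wanted too.
awk 'NR==FNR { seen[$0]; next } !($0 in seen)' Tuesday.dat Monday.dat > Monday.only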