removing duplicate records comparing 2 csv files


 
# 1  
Old 02-18-2012

Hi All,

I want to remove rows from File1.csv by comparing a column/field against File2.csv. If the columns match, that row should be deleted from File1 using a shell script (awk). Here is an example of what I need.

File1.csv:

RAJAK,ACTIVE,1
VIJAY,ACTIVE,2
TAHA,ACTIVE,3

File2.csv:

VIJAY
TAHA

Output:

RAJAK,ACTIVE,1

In the scenario above, I need to delete a record if col1 of File1 matches col1 of File2, and the output should be File1 after the duplicate records have been removed.

Can you please help me prepare a shell script for the above.

Thanks in Advance.
# 2  
Old 02-18-2012
Use grep with the -v and -f options. Check your man page.
# 3  
Old 02-18-2012
Does the command snippet below serve your purpose?

Code:
egrep -v $(cat file2.csv | tr '\n' '|' | sed 's/.$//') file1.csv

# 4  
Old 02-19-2012
Hi codemaniac,

I tried the given code but it gives the following error:
Error: grep: can't open |VIJAY

Actually, I'm new to shell/awk scripting, so any help with an awk script would be very much appreciated.

Thank you.
# 5  
Old 02-19-2012
Working with Franklin52's suggestion this is probably all you need:

Code:
grep -v -f file2.csv file1.csv >output-file

I note that in your sample, file2 isn't actually a comma-separated list. If that is true, then the previous command will be fine. However, if file2 is indeed a comma-separated list (as the name and your description imply), then you'll need to take a different approach.
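Since an awk version was asked for: the same filtering can be done in awk by reading file2.csv first into an array and then printing only the file1.csv rows whose first field is not in it. A minimal sketch, assuming the files look exactly like the samples in this thread (the snippet creates them so it runs standalone); unlike plain grep -v -f, this matches the whole first field, so a short name in file2 cannot accidentally knock out a longer name in file1:

```shell
# Sample data from the thread (created here so the snippet runs standalone)
printf 'RAJAK,ACTIVE,1\nVIJAY,ACTIVE,2\nTAHA,ACTIVE,3\n' > file1.csv
printf 'VIJAY\nTAHA\n' > file2.csv

# Pass 1 (NR==FNR is true only while reading file2.csv): remember each name.
# Pass 2: print only those file1.csv rows whose first field was not remembered.
awk -F, 'NR==FNR { seen[$1]; next } !($1 in seen)' file2.csv file1.csv > output.csv

cat output.csv    # RAJAK,ACTIVE,1
```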
# 6  
Old 02-19-2012
Dear Rajak,

Can you check if egrep is available in your *NIX flavor?

Code:
which egrep

I have tested the command line below on my RHEL and Debian boxes.
Code:
egrep -v $(cat file2.csv | tr '\n' '|' | sed 's/.$//') file1.csv

Otherwise, Agama's approach is the easiest, and you can use that.
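For what it's worth, the "grep: can't open |VIJAY" error reported earlier most likely comes from the command substitution being unquoted: if file2.csv contains any stray whitespace or blank-ish lines, the shell word-splits the generated pattern and egrep treats the leftover pieces as file names. A safer variant of the same idea, quoted and anchored to the first field (a sketch, assuming file2.csv holds one name per line as in the samples above):

```shell
# Sample data from the thread (created here so the snippet runs standalone)
printf 'RAJAK,ACTIVE,1\nVIJAY,ACTIVE,2\nTAHA,ACTIVE,3\n' > file1.csv
printf 'VIJAY\nTAHA\n' > file2.csv

# Build the alternation VIJAY|TAHA, quote the substitution so the shell
# cannot word-split it, and anchor it so only the first field can match.
egrep -v "^($(tr '\n' '|' < file2.csv | sed 's/|$//'))," file1.csv
```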