removing duplicate records comparing 2 csv files


 
# 1  
Old 02-18-2012

Hi All,

I want to remove rows from File1.csv by comparing a column/field against File2.csv. If the columns match, that row should be deleted from File1 using a shell script (awk). Here is an example of what I need.

File1.csv:

RAJAK,ACTIVE,1
VIJAY,ACTIVE,2
TAHA,ACTIVE,3

File2.csv:

VIJAY
TAHA

Output:

RAJAK,ACTIVE,1

In the scenario above, I need to delete a record if col1 of File1 matches col1 of File2, and the output should be File1 after the duplicate records have been removed.

Can you please help me prepare a shell script for the above.

Thanks in Advance.
# 2  
Old 02-18-2012
Use grep with the -v and -f options. Check your man page.
# 3  
Old 02-18-2012
Does the command snippet below serve your purpose?

Code:
egrep -v $(cat file2.csv | tr '\n' '|' | sed 's/.$//') file1.csv

# 4  
Old 02-19-2012
Hi codemaniac,

I tried the given code but it gives the following error:
Error: grep: can't open |VIJAY

Actually, I'm new to shell/awk scripting, so any help with an awk script would be very much appreciated.

Thank you.
# 5  
Old 02-19-2012
Working with Franklin52's suggestion this is probably all you need:

Code:
grep -v -f file2.csv file1.csv >output-file

I note that in your sample, file2 isn't actually a comma-separated list. If that is true, then the previous command will be fine. However, if file2 is indeed a comma-separated list (as the name and your description imply), then you'll need to take a different approach.
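Since an awk version was asked for: the same filtering can be done in awk by reading file2.csv first into an array and then printing only the file1.csv rows whose first field is not in it. A minimal sketch, assuming the files look exactly like the samples in this thread (the snippet creates them so it runs standalone); unlike plain grep -v -f, this matches the whole first field, so a short name in file2 cannot accidentally knock out a longer name in file1:

```shell
# Sample data from the thread (created here so the snippet runs standalone)
printf 'RAJAK,ACTIVE,1\nVIJAY,ACTIVE,2\nTAHA,ACTIVE,3\n' > file1.csv
printf 'VIJAY\nTAHA\n' > file2.csv

# Pass 1 (NR==FNR is true only while reading file2.csv): remember each name.
# Pass 2: print only those file1.csv rows whose first field was not remembered.
awk -F, 'NR==FNR { seen[$1]; next } !($1 in seen)' file2.csv file1.csv > output.csv

cat output.csv    # RAJAK,ACTIVE,1
```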
# 6  
Old 02-19-2012
Dear Rajak,

Can you check if egrep is available in your *NIX flavor?

Code:
which egrep

I have tested the command line below on my RHEL and Debian boxes.
Code:
egrep -v $(cat file2.csv | tr '\n' '|' | sed 's/.$//') file1.csv

Otherwise, Agama's approach is the easiest, and you can use that.
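For what it's worth, the "grep: can't open |VIJAY" error reported earlier most likely comes from the command substitution being unquoted: if file2.csv contains any stray whitespace or blank-ish lines, the shell word-splits the generated pattern and egrep treats the leftover pieces as file names. A safer variant of the same idea, quoted and anchored to the first field (a sketch, assuming file2.csv holds one name per line as in the samples above):

```shell
# Sample data from the thread (created here so the snippet runs standalone)
printf 'RAJAK,ACTIVE,1\nVIJAY,ACTIVE,2\nTAHA,ACTIVE,3\n' > file1.csv
printf 'VIJAY\nTAHA\n' > file2.csv

# Build the alternation VIJAY|TAHA, quote the substitution so the shell
# cannot word-split it, and anchor it so only the first field can match.
egrep -v "^($(tr '\n' '|' < file2.csv | sed 's/|$//'))," file1.csv
```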