Comparing two CSV files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Comparing two CSV files
# 1  
Old 06-01-2016
Comparing two CSV files

I have two csv files and im trying to compare them. e.g.

SAMPLE DATA:

file one:
Code:
ZipCode            Name
20878             Washington
10023              Missouri
20304              Maryland

file two:
Code:
ID                 Name            City                 ZipCode
11654      Arizona               Phoenix              10001
12343     Washington             DC                   20878
12343     New Hampshire      Concord             20854
54211        Maryland           Silver Spring         20304             
54877       Missouri              St. Louis             10023

SAMPLE OUTPUT:

Code:
12343     Washington             DC                   20878
54211        Maryland           Silver Spring         20304             
54877       Missouri              St. Louis             10023

These 3 rows only display because they are the rows we were looking for from file 1. Basically filtering file2 by zipcode to display all the zipcodes from file 1 but including all the other data components of file 2.

these files have like tens of thousands of rows so I wanted to retrieve all the rows in file 2 that exist with the same zip code in file 1. I thought this would be like an awk command like :

Code:
awk*-F','*'NR==FNR{c[$1]++;next};c[$4] > 0'*file2 file1


Am I not understanding the AWK command or typing something wrong? Comparing field 1 from file 1 to field 4 from file 2.




Moderator's Comments:
Mod Comment Please use code tags as required by forum rules!

Last edited by RudiC; 06-01-2016 at 06:42 PM.. Reason: Added code tags.
# 2  
Old 06-01-2016
Quote:
Originally Posted by dan139
I have two csv files and im trying to compare them. e.g.

file one:
Code:
ZipCode   Name
20878      Washington

file two:
Code:
ID           Name     City           ZipCode Etc etc
11654      Arizona   Phoenix    10001
12343     Washington DC        20878
etc etc

these files have like tens of thousands of rows so I wanted to retrieve all the rows in file 2 that exist with the same zip code in file 1. I thought this would be like an awk command like :

Code:
awk*-F','*'NR==FNR{c[$1]++;next};c[$4] > 0'*file2 file1


Am I not understanding the AWK command or typing something wrong? Comparing field 1 from file 1 to field 4 from file 2.
Hi, dan139

It would be much helpful if you were to give a few more lines from file1 and file2, in that way we can see what's the real structure of the files, which it will make a difference. Also, it would be most helpful if you were to post your expected output example.

Make sure you use the the code tag for your posts, that preserves the structure of it. My quote of your first post has an example how I code tagged your entries. Make sure it looks the same.
This User Gave Thanks to Aia For This Post:
# 3  
Old 06-01-2016
Added a more detailed response. let me know if you need more clarification.

basically taking all of the zips from file 1 and getting the rows for those same zips in file 2.
# 4  
Old 06-01-2016
Give this a try:
Code:
awk 'FNR==NR {zip[$1]; next} $NF in zip' file1 file2

Code:
FNR==NR # this condition is true only for the first file
{zip[$1]; next} # record the zips from file1 only
$NF in zip # if the zip, (last column) in file2, is found in the records, display

Output:
Code:
ID                 Name            City                 ZipCode
12343     Washington             DC                   20878
54211        Maryland           Silver Spring         20304
54877       Missouri              St. Louis             10023

# 5  
Old 06-01-2016
Quote:
Originally Posted by Aia
Give this a try:
Code:
awk 'FNR==NR {zip[$1]; next} $NF in zip' file1 file2

Code:
FNR==NR # this condition is true only for the first file
{zip[$1]; next} # record the zips from file1 only
$NF in zip # if the zip, (last column) in file2, is found in the records, display

Output:
Code:
ID                 Name            City                 ZipCode
12343     Washington             DC                   20878
54211        Maryland           Silver Spring         20304
54877       Missouri              St. Louis             10023


Thanks for the reply. So I realized that my data in file 2, it has the Zipcode in column #2, not the last the column. would the command be:

Code:
awk 'FNR==NR {zip[$1]; next} $2 in zip' file1 file2

I tried both versions and nothing was in the output so not really sure.
# 6  
Old 06-01-2016
Quote:
Originally Posted by dan139
Thanks for the reply. So I realized that my data in file 2, it has the Zipcode in column #2, not the last the column. would the command be:

Code:
awk 'FNR==NR {zip[$1]; next} $2 in zip' file1 file2

I tried both versions and nothing was in the output so not really sure.
That's why it is important to post accurate representative information.
To answer your last question: Yes, your modification "should" work if what you are saying is factual. However, since you are not getting any output that could only mean that what you think is not accurate, neither.

Please, post the real information for file2
# 7  
Old 06-01-2016
Quote:
Originally Posted by dan139
.
.
.
Code:
awk*-F','*'NR==FNR{c[$1]++;next};c[$4] > 0'*file2 file1


Am I not understanding the AWK command or typing something wrong? Comparing field 1 from file 1 to field 4 from file 2.
.
.
.
You were close.
- What are the stars for? Replace by spaces.
- Why did you specify the comma field separator when there's not a single comma in any of your files? Remove, use default.
- Due to the spaces in city names, $4 is not always the zip code in file2. Use $NF.
- file1 should be used to populate the array. Reverse the files.


For your new approach with the zip code in $2, I can't see why it shouldn't work; actually it does for me.
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk assistance - Comparing 2 csv files

Hello all, I have searched high and low for a solution to this, many have come really close but not quite what I'm after. I have 2 files. One contains GUID's, for example: 8121E002-96FE-4C9C-BC5A-6AFF20DACECD 84468F30-F3B7-418B-81F0-0908E80792BF A second file, contains a path to the... (8 Replies)
Discussion started by: tirmUK
8 Replies

2. Shell Programming and Scripting

Comparing Select Columns from two CSV files in UNIX and create a third file based on comparision

Hi , I want to compare first 3 columns of File A and File B and create a new file File C which will have all rows from File B and will include rows that are present in File A and not in File B based on First 3 column comparison. Thanks in advance for your help. File A A,B,C,45,46... (2 Replies)
Discussion started by: ady_koolz
2 Replies

3. Shell Programming and Scripting

Comparing two large unsorted csv files

Hi All, My requirement is to write a shell script to compare two large csv files. I've created sample files for explaining my problem i.e., a.csv and b.csv contents of files: ----------------- a.csv ------ Type,Memory (Kb),Location HD,Size (Mb),Serial # XT,640,D402,0,MG0010... (2 Replies)
Discussion started by: vasavi
2 Replies

4. Shell Programming and Scripting

Comparing 2 CSV files and sending the difference to a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 ... (1 Reply)
Discussion started by: Naresh101
1 Replies

5. Shell Programming and Scripting

Comparing 2 difference csv files

Hello, I have about 10 csv files which range from csv1 - csv10. Each csv file has same type/set of tabs and we have around 5-6 tabs for each of the csv file which have slightly different content(data). A sample of CSV1 is shown below: Joins: Data related to Joins, it can be any number of... (2 Replies)
Discussion started by: bobby1015
2 Replies

6. Shell Programming and Scripting

removing duplicate records comparing 2 csv files

Hi All, I want to remove the rows from File1.csv by comparing a column/field in the File2.csv. If both columns matches then I want that row to be deleted from File1 using shell script(awk). Here is an example on what I need. File1.csv: RAJAK,ACTIVE,1 VIJAY,ACTIVE,2 TAHA,ACTIVE,3... (6 Replies)
Discussion started by: rajak.net
6 Replies

7. Shell Programming and Scripting

comparing csv files

Hi! I'm just new to shell scripting n simple tasks looks so tough in initial stage. i need to write a script which will read a property file, property file will be containing count of the csv files, and in a folder(same folder) there will be respective csv files. like Property file data1=100... (3 Replies)
Discussion started by: sukhdip
3 Replies

8. Shell Programming and Scripting

Comparing Strings in 2 .csv/txt files?

EDIT: My problems have been solved thanks to the help of bartus11 and pravin27 This code is just to help me learn. It serves no purpose other than that. Here's a sample csv that I'm working with - #listofpeeps.csv Jackie Chan,1954,M Chuck Norris,1930,M Bruce Lee,1940,M This code is... (13 Replies)
Discussion started by: chickeneaterguy
13 Replies

9. Shell Programming and Scripting

Comparing 2 csv files and matching content

Hello, I have the following problem: There are two csv files csv-file #1: aaa1, aaa2, ... aaan aaa1, bbb2, ... bbbn aaa1, ccc2, ... cccn bbb1, bbb2, ... bbbn ... zzz1, zzz2, ... zzzn csv-file #2: aaa1, matchvalue1 ccc1, matchvalue2 (7 Replies)
Discussion started by: ghl10000
7 Replies

10. Shell Programming and Scripting

Last field problem while comparing two csv files

Hi All, I've two .csv files as below file1.csv abc, tdf, 223, tpx jgsd, tex, 342, rpy a, jdjdsd, 423, djfkld Where as file2.csv is the new version of file1.csv with some added fields in the end of each line and some additional lines. lfj, eru, 98, jkldj, 39, jdkj9 abc, tdf, 223, tpx,... (3 Replies)
Discussion started by: ganapati
3 Replies
Login or Register to Ask a Question