Comparing two CSV files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Comparing two CSV files
# 8  
Old 06-01-2016
file 1:

Code:
ID,Zip,Address,Parent,Country
9874125,43232,"493 Marietta St",21152,'United States'
4845622,85489,"434 Beach St",21542,'United States'
9874126,43234,"368 John's Creek Way",21122,'United States'
9874122,43233,"345 Cherry Place",21152,'United States'

file2
Code:
Zip, Parent
43232,21152
43234,21122

desired output
Code:
ID,Zip,Address,Parent,Country
9874125,43232,"493 Marietta St",21152,'United States'
9874126,43234,"368 John's Creek Way",21122,'United States'

This was just a sample of the data, there's thousands of more lines.

when i run:
Code:
awk 'FNR==NR {zip[$1]; next} $2 in zip' file2 file1

i don't get anything in the output...

Also these files are separated by commas because they are CSV files.

---------- Post updated at 05:41 PM ---------- Previous update was at 05:25 PM ----------

Never mind, figured it out! Just needed to add a -F ', ' delimiter! thanks. Final command:

Code:
awk -F ',' 'FNR==NR {zip[$1]; next} $2 in zip' file2 file1


Last edited by dan139; 06-01-2016 at 08:08 PM..
# 9  
Old 06-01-2016
If you're trying to match on both of the fields that are present in file2, you might want to try something more like:
Code:
awk -F ',' 'FNR==NR {zip_parent[$1,$2]; next} ($2,$4) in zip_parent' file2 file1

And, note that with a <comma> character as your field separator, you MUST use -F ',' with no space between the <comma> character and the closing single-quote character. (With -F ', ', you are specifying a field separator that is a <comma> character followed by a <space> character.)
This User Gave Thanks to Don Cragun For This Post:
# 10  
Old 06-02-2016
Hi again -

So after reexamining the output using the following command:

Code:
awk -F ',' 'FNR==NR {zip[$1]; next} $2 in zip' file2 file1

the output has less rows than file2 which is impossible. Every row in file2 exists in file1 but with added data. For example,

file1 has 1000 rows with zipcodes to search through.
file2 has 500 zipcodes we are looking for.

the output of this command:
file3 yields only 350 zipcodes when it should yield 500 zipcodes. I know for a fact every zipcode in file2 exists in file1.

Anyone know what the problem could be?
# 11  
Old 06-02-2016
Perhaps, you can run the opposite to diagnose the problem:
Code:
awk -F ',' 'FNR==NR {zip[$1]; next} !($2 in zip)' file2 file1

That will show the lines that did not make it, from file1. After that, you can analyze what's not according to what you think.

---------- Post updated at 04:15 PM ---------- Previous update was at 04:01 PM ----------

After re-reading your post again, I think a better test would be:
Code:
 awk -F"," 'FNR==NR {zip[$2]; next} !($1 in zip)' file1 file2

That will show the zips found in file2 that it does not have a match in file1, meaning: those are the lines it will not produce a result when you run the real program.
This User Gave Thanks to Aia For This Post:
# 12  
Old 06-02-2016
Please also show us what output you get from the following awk script:
Code:
awk -F, '
NR == 1 {
	next
}
!($1 in z) {
	c++
}
{	z[$1]++
}
END {	printf("%d data lines read\n", NR - 1)
	printf("%d unique zip codes read\n", c)
	for(i in z)
		if(z[i] > 1)
			printf("zip:%s appears %d times\n", i, z[i])
}' file2

As has been said many times before, if you are running this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk or nawk.

With what you have told us so far, I would expect the 2nd line of output to be about 350 and I would expect several lines following that listing zip codes that appear on more than one line in file2 (i.e., some zip codes have more than one parent).
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk assistance - Comparing 2 csv files

Hello all, I have searched high and low for a solution to this, many have come really close but not quite what I'm after. I have 2 files. One contains GUID's, for example: 8121E002-96FE-4C9C-BC5A-6AFF20DACECD 84468F30-F3B7-418B-81F0-0908E80792BF A second file, contains a path to the... (8 Replies)
Discussion started by: tirmUK
8 Replies

2. Shell Programming and Scripting

Comparing Select Columns from two CSV files in UNIX and create a third file based on comparision

Hi , I want to compare first 3 columns of File A and File B and create a new file File C which will have all rows from File B and will include rows that are present in File A and not in File B based on First 3 column comparison. Thanks in advance for your help. File A A,B,C,45,46... (2 Replies)
Discussion started by: ady_koolz
2 Replies

3. Shell Programming and Scripting

Comparing two large unsorted csv files

Hi All, My requirement is to write a shell script to compare two large csv files. I've created sample files for explaining my problem i.e., a.csv and b.csv contents of files: ----------------- a.csv ------ Type,Memory (Kb),Location HD,Size (Mb),Serial # XT,640,D402,0,MG0010... (2 Replies)
Discussion started by: vasavi
2 Replies

4. Shell Programming and Scripting

Comparing 2 CSV files and sending the difference to a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 ... (1 Reply)
Discussion started by: Naresh101
1 Replies

5. Shell Programming and Scripting

Comparing 2 difference csv files

Hello, I have about 10 csv files which range from csv1 - csv10. Each csv file has same type/set of tabs and we have around 5-6 tabs for each of the csv file which have slightly different content(data). A sample of CSV1 is shown below: Joins: Data related to Joins, it can be any number of... (2 Replies)
Discussion started by: bobby1015
2 Replies

6. Shell Programming and Scripting

removing duplicate records comparing 2 csv files

Hi All, I want to remove the rows from File1.csv by comparing a column/field in the File2.csv. If both columns matches then I want that row to be deleted from File1 using shell script(awk). Here is an example on what I need. File1.csv: RAJAK,ACTIVE,1 VIJAY,ACTIVE,2 TAHA,ACTIVE,3... (6 Replies)
Discussion started by: rajak.net
6 Replies

7. Shell Programming and Scripting

comparing csv files

Hi! I'm just new to shell scripting n simple tasks looks so tough in initial stage. i need to write a script which will read a property file, property file will be containing count of the csv files, and in a folder(same folder) there will be respective csv files. like Property file data1=100... (3 Replies)
Discussion started by: sukhdip
3 Replies

8. Shell Programming and Scripting

Comparing Strings in 2 .csv/txt files?

EDIT: My problems have been solved thanks to the help of bartus11 and pravin27 This code is just to help me learn. It serves no purpose other than that. Here's a sample csv that I'm working with - #listofpeeps.csv Jackie Chan,1954,M Chuck Norris,1930,M Bruce Lee,1940,M This code is... (13 Replies)
Discussion started by: chickeneaterguy
13 Replies

9. Shell Programming and Scripting

Comparing 2 csv files and matching content

Hello, I have the following problem: There are two csv files csv-file #1: aaa1, aaa2, ... aaan aaa1, bbb2, ... bbbn aaa1, ccc2, ... cccn bbb1, bbb2, ... bbbn ... zzz1, zzz2, ... zzzn csv-file #2: aaa1, matchvalue1 ccc1, matchvalue2 (7 Replies)
Discussion started by: ghl10000
7 Replies

10. Shell Programming and Scripting

Last field problem while comparing two csv files

Hi All, I've two .csv files as below file1.csv abc, tdf, 223, tpx jgsd, tex, 342, rpy a, jdjdsd, 423, djfkld Where as file2.csv is the new version of file1.csv with some added fields in the end of each line and some additional lines. lfj, eru, 98, jkldj, 39, jdkj9 abc, tdf, 223, tpx,... (3 Replies)
Discussion started by: ganapati
3 Replies
Login or Register to Ask a Question