Comparing two large unsorted csv files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Comparing two large unsorted csv files
# 1  
Old 06-17-2013
Comparing two large unsorted csv files

Hi All,

My requirement is to write a shell script to compare two large csv files.
I've created sample files for explaining my problem
i.e., a.csv and b.csv

contents of files:
-----------------
a.csv
------
Code:
Type,Memory (Kb),Location HD,Size (Mb),Serial #
XT,640,D402,0,MG0010
386,2048,D403,100,MG0011
486,4096,D404,270,MG0012
386,8192,A423,400,CC0177
486,8192,A424,670,CC0182
286,4096,A423,C100,C0183
286,4096,A425,80,CC0184
Mac,4096,B407,80,EE1027

b.csv
------
Code:
Type,Memory (Kb),Serial #,Location HD
XT,640,MG0010,D402
386,2048,MG0011,D403
486,4096,CC0177,D404
386,8192,MG0012,A423
486,8192,C0183,A424
286,4096,CC0182,A423
286,4096,CC0184,A425
Mac,4096,EE1027,B407

cases to consider:
1. In a.csv and b.csv the column order may differ as shown above and in any of the files few extra columns might be there which are not present in other file.
2.The comparision should be like, for each column data in a.csv we should compare the matching column data in b.csv (ex., data for column "Type" from a.csv with data for column "Type" from b.csv) and write the DATA DIFFERENCE with FILE NAME, COLUMN NAME, COLUMN NUMBER and ROW NUMBER to a log file.

Here in a.csv one column is extra -> Size (Mb), we need to capture the extra columns as well in the same log.

Finally, the log file should contain something like the following data:

Code:
Filename: a.csv, Filename:b.csv    
 
serial #        serial #    
 
3 MG0012        CC0177
4 CC0177        MG0012
5 CC0182        A424
6 C0183        A423
7 CC0184        A425
8 EE1027        B407

Extra columns in a.csv Extra columns in b.csv

Size (Mb) column at field 4

Any help would be greatly appreciated.....

Last edited by vasavi; 06-18-2013 at 04:32 AM.. Reason: code tags
# 2  
Old 06-17-2013
What have you tried so far?

Regards and welcome to the forum,
Alister
# 3  
Old 06-17-2013
Tried with ulitilities like awk, sed etc., but not getting how to sort data based on column names common to both the files alphabetically so that extra cols would be at last in both files which will be easy for comparing.
Any scripts available for references would be of great help.

Last edited by vasavi; 06-18-2013 at 04:34 AM..
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. UNIX for Beginners Questions & Answers

awk assistance - Comparing 2 csv files

Hello all, I have searched high and low for a solution to this, many have come really close but not quite what I'm after. I have 2 files. One contains GUID's, for example: 8121E002-96FE-4C9C-BC5A-6AFF20DACECD 84468F30-F3B7-418B-81F0-0908E80792BF A second file, contains a path to the... (8 Replies)
Discussion started by: tirmUK
8 Replies

2. Shell Programming and Scripting

Comparing two CSV files

I have two csv files and im trying to compare them. e.g. SAMPLE DATA: file one: ZipCode Name 20878 Washington 10023 Missouri 20304 Maryland file two: ID Name City ZipCode 11654 ... (11 Replies)
Discussion started by: dan139
11 Replies

3. Shell Programming and Scripting

Comparing 2 CSV files and sending the difference to a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 ... (1 Reply)
Discussion started by: Naresh101
1 Replies

4. Shell Programming and Scripting

Comparing 2 difference csv files

Hello, I have about 10 csv files which range from csv1 - csv10. Each csv file has same type/set of tabs and we have around 5-6 tabs for each of the csv file which have slightly different content(data). A sample of CSV1 is shown below: Joins: Data related to Joins, it can be any number of... (2 Replies)
Discussion started by: bobby1015
2 Replies

5. Shell Programming and Scripting

Merging Very large CSV files in Unix

Hi, I have two very large CSV files, which I want to merge (equi-join) based on a key (column). One of the file (say F1) would have ~30 MM records and 700 columns. The other file (~f2) would have same # of records and lesser columns (say 50). I want to create an output file joining on a... (3 Replies)
Discussion started by: student_007
3 Replies

6. Shell Programming and Scripting

comparing csv files

Hi! I'm just new to shell scripting n simple tasks looks so tough in initial stage. i need to write a script which will read a property file, property file will be containing count of the csv files, and in a folder(same folder) there will be respective csv files. like Property file data1=100... (3 Replies)
Discussion started by: sukhdip
3 Replies

7. Shell Programming and Scripting

Comparing two unsorted files

Hi Guys, I'm a complete shell scripting newbie and need some help with comparing a file against a master file and outputting the results. master.txt would look something like this: 000123 000345 000341 000927 000762 000235 000155 000452 000846 000623 file.txt would look like... (1 Reply)
Discussion started by: ven
1 Replies

8. Shell Programming and Scripting

Comparing Strings in 2 .csv/txt files?

EDIT: My problems have been solved thanks to the help of bartus11 and pravin27 This code is just to help me learn. It serves no purpose other than that. Here's a sample csv that I'm working with - #listofpeeps.csv Jackie Chan,1954,M Chuck Norris,1930,M Bruce Lee,1940,M This code is... (13 Replies)
Discussion started by: chickeneaterguy
13 Replies

9. Shell Programming and Scripting

Comparing 2 csv files and matching content

Hello, I have the following problem: There are two csv files csv-file #1: aaa1, aaa2, ... aaan aaa1, bbb2, ... bbbn aaa1, ccc2, ... cccn bbb1, bbb2, ... bbbn ... zzz1, zzz2, ... zzzn csv-file #2: aaa1, matchvalue1 ccc1, matchvalue2 (7 Replies)
Discussion started by: ghl10000
7 Replies

10. Shell Programming and Scripting

Last field problem while comparing two csv files

Hi All, I've two .csv files as below file1.csv abc, tdf, 223, tpx jgsd, tex, 342, rpy a, jdjdsd, 423, djfkld Where as file2.csv is the new version of file1.csv with some added fields in the end of each line and some additional lines. lfj, eru, 98, jkldj, 39, jdkj9 abc, tdf, 223, tpx,... (3 Replies)
Discussion started by: ganapati
3 Replies
Login or Register to Ask a Question