Format & Compare two huge CSV files


 
Thread Tools Search this Thread
Top Forums Shell Programming and Scripting Format & Compare two huge CSV files
# 1  
Old 12-16-2011
Format & Compare two huge CSV files

I have two csv files having 90K records each & each row has around 50 columns.Lets say the file names are FILE1 and FILE2. I have to compare both the files and generate a new file that has rows from FILE2 if it differs.

Code:
FILE1
-----
2001,"John",25,19901130,21211.41,Unix Forum
2002,"Mike",26,19850101,0.0,"Linux Experts, Co."

FILE2
-----
ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION
2001,John,25,19901130,000000000021211.41,Unix Forum
2002,Mike,26,19850101,000000000000000.00,"Linux Experts, Co."

As you can see that the text values in one of the files are quoted and the salary field differs in format but not the values. Both the files are same right now and the only difference is the missing header. So the output file must have the header only.

Lets change the data in FILE2

Code:
FILE2
-----
ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION
2001,John,25,19901130,000000000021211.41,Unix Forum
2002,Mike,26,19850101,000000000000000.00,"Linux Experts, Co."

Now, the output file should have the header and row2 from FILE2.

Please suggest an awk command to do this.
# 2  
Old 12-16-2011
Understood first example when you said, output file must have the header only. I didn't understand the second example. How would the output file have header and row2 from FILE2?

I'm not able to see any changes made to row2 of FILE2 after you said "Lets change the data in FILE2"
# 3  
Old 12-16-2011
my bad..here is the modified file

Code:
FILE2
-----
ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION
2001,John,25,19901130,000000000021211.41,Unix Forum
2002,Mike,26,19850101,000000000000011.00,"Linux Experts, Co."

# 4  
Old 12-16-2011
Quote:
Originally Posted by Sheel
I have two csv files having 90K records each & each row has around 50 columns.Lets say the file names are FILE1 and FILE2. I have to compare both the files and generate a new file that has rows from FILE2 if it differs.

Code:
FILE1
-----
2001,"John",25,19901130,21211.41,Unix Forum
2002,"Mike",26,19850101,0.0,"Linux Experts, Co."

FILE2
-----
ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION
2001,John,25,19901130,000000000021211.41,Unix Forum
2002,Mike,26,19850101,000000000000000.00,"Linux Experts, Co."

As you can see that the text values in one of the files are quoted and the salary field differs in format but not the values. Both the files are same right now and the only difference is the missing header. So the output file must have the header only.
Now, the output file should have the header and row2 from FILE2.
Please suggest an awk command to do this.
If you really want to compare both the files and print then..
Code:
awk 'BEGIN{FS=OFS=",";print "ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION"} FNR==NR{a[FNR]=$2;next}{$2=a[FNR+1];print}' FILE2 FILE1

Or if you just want to remove the double quotes in column 2 (which it looks like..) in FILE1 then try
Code:
awk 'BEGIN{FS=OFS=",";print "ID,NAME,AGE,JOINDATE,SALARY,ORGANIZATION"} {gsub("\"","",$2);print}' FILE1

Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Shell Programming and Scripting

Compare 2 csv files

Hello, I want to compare two csv files expected.csv and actual.csv by fields "f3", "f4", "f5". Field "z" can be differnt. Each record identified by two fields: "number" +"key". Records are not sorted. Can you please help me to get such report: Error: Mismatch in the record... (4 Replies)
Discussion started by: Eugenne
4 Replies

2. Shell Programming and Scripting

Compare 2 files of csv file and match column data and create a new csv file of them

Hi, I am newbie in shell script. I need your help to solve my problem. Firstly, I have 2 files of csv and i want to compare of the contents then the output will be written in a new csv file. File1: SourceFile,DateTimeOriginal /home/intannf/foto/IMG_0713.JPG,2015:02:17 11:14:07... (8 Replies)
Discussion started by: refrain
8 Replies

3. Shell Programming and Scripting

Compare 2 csv files in ksh and o/p the difference in a new csv file

(say) I have 2 csv files - file1.csv & file2.csv as mentioned below: file1.csv ID,version,cost 1000,1,30 2000,2,40 3000,3,50 4000,4,60 file2.csv ID,version,cost 1000,1,30 2000,2,45 3000,4,55 6000,5,70 The... (7 Replies)
Discussion started by: Naresh101
7 Replies

4. Shell Programming and Scripting

Difference between two huge .csv files

Hi all, I need help on getting difference between 2 .csv files. I have 2 large . csv files which has equal number of columns. I nned to compare them and get output in new file which will have difference olny. E.g. File1.csv Name, Date, age,number Sakshi, 16-12-2011, 22, 56 Akash,... (10 Replies)
Discussion started by: Dimple
10 Replies

5. Shell Programming and Scripting

How to fix line breaks format text for huge files?

Hi, I need to correct line breaks for huge files (more than 1MM records in a file) and then format it properly. Except the header and trailer, each record starts with 'D'. Requirement:Scan the whole file except the header and trailer records and see if any of the records start with... (19 Replies)
Discussion started by: kikionline
19 Replies

6. Shell Programming and Scripting

How to compare the columns in two .csv files?

Hi I have to compare two .csv files which having 4 columns and i am expecting the output if there is difference in the 3,4columns in two files with respect to the first column. if my statement is not clear please refer the example. Input: ----- File 1 : hostname MAC SWITCH_IP SWITCH_PORT... (7 Replies)
Discussion started by: Kanchana
7 Replies

7. Shell Programming and Scripting

Compare 2 folders to find several missing files among huge amounts of files.

Hi, all: I've got two folders, say, "folder1" and "folder2". Under each, there are thousands of files. It's quite obvious that there are some files missing in each. I just would like to find them. I believe this can be done by "diff" command. However, if I change the above question a... (1 Reply)
Discussion started by: jiapei100
1 Replies

8. UNIX for Dummies Questions & Answers

How to compare 2 files & get specific value & replace it in other file.

Hiiii Friends I have 2 files with huge data. I want to compare this 2 files & if they hav same set of vales in specific rows & columns i need to get that value from one file & replace it in other. For example: I have few set data of both files here: a.dat: PDE-W 2009 12 16 5 29 11.11 ... (10 Replies)
Discussion started by: reva
10 Replies

9. Shell Programming and Scripting

Need to compare two csv files values and write into another csv file

Hi all, Am new to scripting. So i just need your ideas to help me out. Here goes my requirement. I have two csv files 1.csv 2.csv abc,1.24 abc,1 def,2.13 def,1 I need to compare the first column of 1.csv with 2.csv and if matches then need to compare... (2 Replies)
Discussion started by: chinnahyd
2 Replies

10. Shell Programming and Scripting

Compare 2 huge files wrt to a key using awk

Hi Folks, I need to compare two very huge file ( i.e the files would contain a minimum of 70k records each) using awk or sed. The comparison needs to be done with respect to a 'key'. For example : File1 ********** 1234|TONY|Y75634|20/07/2008 1235|TINA|XCVB56|30/07/2009... (13 Replies)
Discussion started by: Ranjani
13 Replies
Login or Register to Ask a Question