Difference between two huge .csv files | Unix Linux Forums | Shell Programming and Scripting

#1  10-07-2012  Dimple

Hi all,

I need help getting the difference between two .csv files.
I have two large .csv files with the same number of columns. I need to compare them and write the output to a new file that contains only the differences.

E.g.
File1.csv

Code:
Name, Date, age,number 
Sakshi, 16-12-2011, 22, 56
Akash, 14-12-2011, 23, 76

File2.csv

Code:
Name,Date,age,number
Sakshi, 14-12-2011,22,56
Akash,18-12-2011,23,76

Then the output should look like:

Code:
16-12-2011                      14-12-2011
14-12-2011                       18-12-2011

It's just an example. What I am trying to say is that I should get only the values of the columns that differ, not the whole line.
Assume the files are in sorted order.
There can be any number of columns, but both files will always have the same columns. If the values differ, those values should appear in the output.
It would also work to get the difference as a comma-separated line, with a blank wherever the values match between the two files, like
,16-12-2011,,
I hope I have explained the issue clearly.

Last edited by Franklin52; 10-08-2012 at 03:07 AM.. Reason: Please use code tags for data and code samples
#2  10-07-2012  pamu

Code:
awk -F, 'FNR==NR{a[$1]=$2;next}{if(a[$1]!=$2){print a[$1],$2}}' file1 file2
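
For anyone unpacking the one-liner above, here is the same logic spread out with comments. It is only a sketch of how the command reads: the wrapper name `diff_col2` is made up for illustration, and its two arguments stand for file1 and file2. It looks rows up by column 1 and compares column 2 only.

```shell
# Hypothetical wrapper; the awk body is the one-liner above, expanded.
# Usage: diff_col2 file1 file2
diff_col2() {
  awk -F, '
    # FNR==NR is true while the first file is being read:
    # remember column 2 under the key found in column 1.
    FNR == NR { a[$1] = $2; next }

    # Second file: when column 2 differs for the same key,
    # print the value from file1 followed by the value from file2.
    a[$1] != $2 { print a[$1], $2 }
  ' "$1" "$2"
}
```

Because only `$2` is stored and compared, differences in any other column go unreported.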

#3  10-07-2012  Dimple
I think I did not explain the issue properly.

In the example given, the difference is in the 2nd column only, but other columns can differ as well. This command reports differences in the 2nd column only.
Can you give me a command that produces the result in comma-separated format only? That way I will know wherever the values do not match between the files.
It's not necessary to get the values from both files. Let's say there is a difference in the 3rd and 7th columns; then my result should look like
,,17-12-2011,,,,10,,,,,,,,,

Please help.
#4  10-07-2012  pamu
Try this:


Code:
paste file1 file2 | awk -F "[,\t]" '{for(i=1;i<=(NF/2);i++){if($i != $(NF/2+i)){printf $i}else{printf ","}}}{print ""}'
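
The same command, spread out with comments, may be easier to follow. This is a sketch rather than a replacement: the wrapper name `diff_fields` is hypothetical, and the one deliberate change is `printf "%s", $i` instead of `printf $i`, so that a `%` in the data is printed literally instead of being treated as a format. It assumes both files have the same number of columns per line and that no field contains a comma or a tab.

```shell
# Hypothetical wrapper; same idea as the command above.
# Usage: diff_fields file1 file2
diff_fields() {
  # paste joins line N of both files with a tab, so each combined record
  # holds the columns of file1 followed by the columns of file2.
  paste "$1" "$2" | awk -F '[,\t]' '
    {
      half = NF / 2                 # number of columns per file
      for (i = 1; i <= half; i++) {
        if ($i != $(half + i))
          printf "%s", $i           # file1 value where the columns differ
        else
          printf ","                # a comma where the columns match
      }
      print ""                      # finish the output line
    }
  '
}
```

Nothing is printed between two values, so adjacent differing columns run together in the output.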

#5  10-08-2012  Dimple
Thanks for the help.

But I still have one issue.

If there are differences in two consecutive columns, the output shows no separator between them.
E.g.
File1
Rahul, 1203,113,11

File2
Malik, 121,113,11

The output comes out as Rahul1203,,

Expected Output: Rahul,1203,,
#6  10-08-2012  pamu
Quote:
Originally Posted by Dimple View Post

If i have difference in 2 consecutive columns, it's not showing any separation between them.
What about a field separator (FS) between all the values? Then you can easily distinguish them.


Code:
paste file1 file2 | awk -F "[,\t]" '{for(i=1;i<=(NF/2);i++){if($i != $(NF/2+i)){
if(s){s=s";"$i}else{s=$i}}else{if(s){s=s";,"}else{s=","}}}}{ print s;s=""}'
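
The command above joins the values with `;`. If the goal is the plain comma-separated layout asked for earlier in the thread (a blank wherever the two files match), a variation along these lines might work. It is a sketch, not pamu's command: the name `diff_csv` is hypothetical, trimming the spaces around each value is an assumption, and it expects both files to have the same number of columns with no commas or tabs inside a field.

```shell
# Hypothetical helper: one output line per input line, with the file1
# value where the columns differ and an empty cell where they match.
# Usage: diff_csv file1 file2 > differences.csv
diff_csv() {
  paste "$1" "$2" | awk -F '[,\t]' '
    {
      half = NF / 2                       # number of columns per file
      out = ""
      for (i = 1; i <= half; i++) {
        a = $i
        b = $(half + i)
        # Assumption: blanks around a value are not significant.
        gsub(/^ +| +$/, "", a)
        gsub(/^ +| +$/, "", b)
        # Appending "" forces a string comparison, so 1 and 1.0 differ.
        cell = ((a "") != (b "")) ? a : ""
        out = out (i > 1 ? "," : "") cell  # always put a comma between cells
      }
      print out
    }
  '
}
```

With the Rahul/Malik lines from the previous post this prints `Rahul,1203,,`, and with the first data line of the original example it prints `,16-12-2011,,`.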

#7  10-09-2012  Dimple

Thanks for your help

It works exactly the way I want.

If possible, can you please explain the code?

Thanks