The UNIX and Linux Forums  

#1  06-12-2008  ganapati (Registered User)
Last field problem while comparing two csv files

Hi All,

I have two .csv files, as below.
file1.csv:
abc, tdf, 223, tpx
jgsd, tex, 342, rpy
a, jdjdsd, 423, djfkld


file2.csv is the new version of file1.csv, with some fields added at the end of each line and some additional lines:
lfj, eru, 98, jkldj, 39, jdkj9
abc, tdf, 223, tpx, 4sdd, 43
jgsd, tex, 342, rpy, 343js, j88
a, jdjdsd, 423, djfkld, djd, 322i
djlfj, djd, 499, djld, 323u, d88l
fjdkl, jdfsld, 45k, djdkl, 343, 334p


My requirement is to get only these additional lines from file2.csv.
I tried 'diff' and 'comm', but because of the added fields I could not get the desired output, which is:
lfj, eru, 98, jkldj, 39, jdkj9
djlfj, djd, 499, djld, 323u, d88l
fjdkl, jdfsld, 45k, djdkl, 343, 334p


I am still trying to solve this myself. Any help from any of you would be much appreciated...

Thanks in advance / Mysore Ganapati

#2  06-12-2008  radoulov (addict)
You can try something like this, but it may fail in some situations.

Code:
fgrep -f file1.csv -v file2.csv
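
One situation where this can fail, sketched on made-up data (the file names old.csv and new.csv and their contents are assumptions for illustration, not from the thread): fgrep matches each pattern as a substring anywhere in the line, so a genuinely new line that merely contains an old line is dropped as well.

```shell
# Hypothetical data, not from the thread: the old file has one short line.
tmp=$(mktemp -d)
printf '%s\n' 'a, b' > "$tmp/old.csv"
printf '%s\n' 'a, b, 1' 'xa, b, 2' > "$tmp/new.csv"

# grep -F is the POSIX spelling of fgrep.
# 'xa, b, 2' is a new line, but it contains the string 'a, b',
# so -v suppresses it together with the real match 'a, b, 1'.
# "|| true" only hides the exit status grep returns when nothing is selected.
out=$(grep -F -v -f "$tmp/old.csv" "$tmp/new.csv" || true)
printf 'lines reported as new: [%s]\n' "$out"

rm -rf "$tmp"
```

Here nothing is reported as new, although the second line of new.csv has no counterpart in old.csv.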

#3  06-12-2008  ganapati (Registered User)
fgrep did not work for my problem

I gave you only a sample of the files. In reality, both files have 10 million records. When I tried your solution using fgrep, I got the error message below:
fgrep: could not allocate memory for wordlist

Are there any other solutions, please?

#4  06-12-2008  radoulov (addict)
Hm,
if the number of columns in the first csv is fixed,
you could try Awk:

Code:
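awk -F, 'NR == FNR {
  # first file: remember every line of file1.csv as an array key
  _[$0]
  next
  }
# second file: print lines whose first four fields are not a known key
!(($1 FS $2 FS $3 FS $4) in _)
' file1.csv file2.csv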
But you may encounter similar memory problems, because every line of file1.csv is kept in an array ...
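
As a quick check, here is a sketch that runs the same array-lookup idea against the sample files from post #1. The temporary directory and the array name `seen` (used instead of `_`) are choices made for this illustration only.

```shell
# Recreate the sample files from post #1 in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/file1.csv" <<'EOF'
abc, tdf, 223, tpx
jgsd, tex, 342, rpy
a, jdjdsd, 423, djfkld
EOF
cat > "$tmp/file2.csv" <<'EOF'
lfj, eru, 98, jkldj, 39, jdkj9
abc, tdf, 223, tpx, 4sdd, 43
jgsd, tex, 342, rpy, 343js, j88
a, jdjdsd, 423, djfkld, djd, 322i
djlfj, djd, 499, djld, 323u, d88l
fjdkl, jdfsld, 45k, djdkl, 343, 334p
EOF

# With -F, each field keeps its leading blank, so joining the first
# four fields with FS rebuilds the exact text of a file1.csv line.
new_lines=$(awk -F, '
NR == FNR { seen[$0]; next }          # first file: store whole lines as keys
!(($1 FS $2 FS $3 FS $4) in seen)     # second file: print unknown keys only
' "$tmp/file1.csv" "$tmp/file2.csv")

printf '%s\n' "$new_lines"
rm -rf "$tmp"
```

On this sample it prints the three lines requested in post #1 (the ones starting with lfj, djlfj and fjdkl). Note that the match is exact, so trailing blanks or differing spacing in either file would defeat it.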

Or something like this, which will not be fast:

Code:
awk -F, 'BEGIN {
  f = ARGV[1]
  ARGV[1] = ""
  }
{ 
  while ((getline _f < f) > 0)
    if (($1 FS $2 FS $3 FS $4) == _f) {
      # close before skipping, so the next record rescans file1.csv from the top
      close(f)
      next
    }
  close(f)
  }
1' file1.csv file2.csv
You could modify the second example and use it even if the number of columns in the first file is variable.
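
One possible way to do that, as a rough sketch rather than a tested solution for the real data: treat a file2 line as old when some file1 line is a prefix of it, followed by the field separator. The sample data, the file names and the variable names `old` and `found` below are made up for this illustration, and the approach rescans file1 for every file2 line, so it stays slow.

```shell
# Hypothetical data: the old file has lines with differing column counts.
tmp=$(mktemp -d)
printf '%s\n' 'abc, tdf' 'jgsd, tex, 342' > "$tmp/file1.csv"
printf '%s\n' 'abc, tdf, 223, tpx' 'abcd, tdf, 1' \
              'jgsd, tex, 342, rpy' 'jgsd, tex, 3421' > "$tmp/file2.csv"

variable_out=$(awk -F, 'BEGIN {
  f = ARGV[1]          # read file1 only through getline
  ARGV[1] = ""         # an empty ARGV element is skipped as main input
}
{
  found = 0
  while ((getline old < f) > 0)
    # old line unchanged, or old line plus separator starts this line
    if ($0 == old || index($0, old FS) == 1) { found = 1; break }
  close(f)             # always rewind file1 for the next record
  if (!found) print
}' "$tmp/file1.csv" "$tmp/file2.csv")

printf '%s\n' "$variable_out"
rm -rf "$tmp"
```

Appending FS before the prefix test keeps 'jgsd, tex, 342' from matching 'jgsd, tex, 3421'; on this made-up sample only 'abcd, tdf, 1' and 'jgsd, tex, 3421' are printed.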

Last edited by radoulov; 06-12-2008 at 05:27 AM.

All times are GMT -7.