compare huge file

# 1  
Old 02-09-2008
Question compare huge file

I have two files with 40,00,000 and 39,00,000 records, and I want to find out:

1. Which records exist in file1 but not in file2.
2. Which records exist in file2 but not in file1.

The format of the file will be like


If the files were smaller, I would use egrep -f.

Need your help to sort it out.
# 2  
Old 02-09-2008
comparing files

If your machine has enough memory (I would hope 2 GB is enough), you should be able to do something like this:

sort f1 >f1.$$
sort f2 >f2.$$
diff f1.$$ f2.$$
# rm f1.$$ f2.$$
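
As a variation on the sort-then-diff approach above, comm can split two sorted files directly into "only in the first" and "only in the second", which are the two lists asked for. This is a minimal sketch, not the method from the post: it assumes each record is one whole line, and the tiny sample files and the workdir variable are hypothetical stand-ins for the real f1 and f2.

```shell
#!/bin/sh
# Sketch only: small sample files stand in for the real f1 and f2.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/f1"
printf '%s\n' banana cherry date > "$workdir/f2"

# comm requires both inputs sorted in the same collation order,
# so the locale is pinned for sort and comm alike.
LC_ALL=C sort "$workdir/f1" > "$workdir/f1.sorted"
LC_ALL=C sort "$workdir/f2" > "$workdir/f2.sorted"

# -23 suppresses columns 2 and 3, leaving lines found only in the first file.
only_in_f1=$(LC_ALL=C comm -23 "$workdir/f1.sorted" "$workdir/f2.sorted")
# -13 suppresses columns 1 and 3, leaving lines found only in the second file.
only_in_f2=$(LC_ALL=C comm -13 "$workdir/f1.sorted" "$workdir/f2.sorted")

echo "only in f1: $only_in_f1"
echo "only in f2: $only_in_f2"
rm -rf "$workdir"
```

With the real files, the two comm outputs would normally be redirected to files rather than captured in variables.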

Here's a grep-based method that doesn't use a lot of memory, but takes a long time because it scans f2 once for every record in f1:
while IFS= read -r f1rec; do
  # -x matches whole lines only, so a record is not "found" inside a longer one
  fgrep -x -- "$f1rec" f2 >/dev/null || printf '%s\n' "$f1rec"
  # The -- may not work in all UNIXes; it ensures that a record
  # starting with "-" is not interpreted as an option
done < f1

That will find all the records in f1 that are not in f2. Swap f1 and f2 to get the reverse.
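
The loop above rescans f2 once per record. If memory allows, a single-pass alternative is to load one file's records into an awk array and test the other file against it. This is a sketch of a different technique, not the poster's method: it assumes whole-line records and a non-empty f2, the sample files are hypothetical, and on Solaris nawk or /usr/xpg4/bin/awk may be needed in place of awk.

```shell
#!/bin/sh
# Sketch only: small sample files stand in for the real f1 and f2.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/f1"
printf '%s\n' banana cherry date > "$workdir/f2"

# While reading the first file named (f2), NR equals FNR, so every f2
# record is stored as an array key. For the second file (f1), only
# records that are not keys in the array are printed.
missing=$(awk 'NR == FNR { seen[$0] = 1; next } !($0 in seen)' \
  "$workdir/f2" "$workdir/f1")

echo "in f1 but not in f2: $missing"
rm -rf "$workdir"
```

The trade-off is memory: every record of f2 is held in the array at once, so this suits the case where one file fits comfortably in RAM.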

If all you want to do is merge the files, and no duplicates are allowed, here you go:
sort -u f1 f2 >merged
