Grep -v -f and sort|diff which way is faster


# 1  
10-29-2014

Hi Gurus,

I have two big files and need to find the differences between them. Currently, I am using
Code:
 
sort file1 > file1_tmp
sort file2 > file2_tmp
diff file1_tmp file2_tmp

Alternatively, I could use the command
Code:
 
grep -v -f file1 file2

I'm just wondering which way is faster for comparing two big files.

Thanks in advance.
# 2  
10-29-2014
It depends on how much input data you have.

The grep method is very fast if you have enough memory, but that is its limit... If file1 is too large, it's liable to run out of memory and grind to a halt, or just plain crash. I wouldn't trust it with a file1 larger than a hundred or two megabytes. (file2 can be any size, though.) You should be doing grep -v -F -f file1 file2 by the way -- the -F makes sure the lines are all considered raw, instead of being used as regular expressions.
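A minimal sketch of why -F matters, with -x added so each pattern has to match a whole line (otherwise a pattern like c also removes cc). The file contents are made-up sample data written to a temporary directory; file1 and file2 are just the names from the question:

```shell
#!/bin/sh
# Sketch: print lines of file2 that do not appear, as whole lines, in file1.
# The sample data below is invented for illustration.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'a.b' 'c' > file1            # read as a regex, "a.b" would also match "axb"
printf '%s\n' 'a.b' 'axb' 'c' 'cc' > file2

# -F: each line of file1 is a fixed string, not a regular expression
# -x: a pattern must match the entire line, so "c" does not exclude "cc"
# -v: print the lines of file2 that match none of the patterns
# Exit status 1 only means "nothing selected", so accept it; 2 is a real error.
grep -v -F -x -f file1 file2 > only_in_file2 || [ "$?" -eq 1 ]
cat only_in_file2
```

With these inputs it prints axb and cc. Drop -F and -x and it prints nothing at all, because the regex a.b matches axb and c matches inside cc.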

The sort method can reliably tolerate any size of input (though I would have used comm -1 -3 rather than diff).
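A sketch of the sort method using comm -1 -3 as suggested: comm's column 1 is lines only in the first file, column 2 is lines only in the second, and column 3 is lines in both. Pinning LC_ALL=C is my own assumption, added so that sort and comm agree on the ordering; the sample data is invented:

```shell
#!/bin/sh
# Sketch: sort both files, then let comm report lines unique to file2.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'pear' 'apple' 'fig' > file1
printf '%s\n' 'kiwi' 'apple' 'pear' 'date' > file2

# comm requires both inputs sorted with the same collation, so use C order for all three
LC_ALL=C sort file1 > file1_tmp
LC_ALL=C sort file2 > file2_tmp

# -1 hides lines only in file1, -3 hides lines in both: what remains is unique to file2
LC_ALL=C comm -1 -3 file1_tmp file2_tmp > only_in_file2
cat only_in_file2
```

This prints date and kiwi. Use -2 -3 instead to get the lines unique to file1. GNU sort, for one, falls back to temporary files when the input does not fit in memory, which is why this approach does not need either file to fit in RAM.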

So all else being equal, I'd use the sort method and worry less.

Last edited by Corona688; 10-29-2014 at 02:24 PM.
# 3  
10-29-2014
The file after -f is read into memory, so the smaller file should go there:
Code:
grep -v -f smallfile bigfile

Some grep versions are rather slow on this, and even the faster full-line-match is not a race car:
Code:
grep -v -x -f smallfile bigfile

In this case, consider replacing it with awk:
Code:
awk 'FILENAME=="-" {s[$0]; next} !($0 in s)' - bigfile <smallfile
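The same hash-lookup idea can also be written with both files named on the command line. This is a variant, not the poster's exact command: it tests FILENAME == ARGV[1] rather than the common NR==FNR guard, because NR==FNR misbehaves when the first file is empty (every line of bigfile would then be stored instead of printed). smallfile, bigfile and their contents are sample data:

```shell
#!/bin/sh
# Sketch: print lines of bigfile that are not whole lines of smallfile.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

printf '%s\n' 'c' 'a.b' > smallfile
printf '%s\n' 'a.b' 'axb' 'c' 'cc' > bigfile

# While reading the first file (ARGV[1]), store each whole line as an array key.
# For the other file, print a line only if its exact text is not a stored key.
prog='FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)'
awk "$prog" smallfile bigfile > only_in_bigfile
cat only_in_bigfile

# With an empty first file nothing is stored, so every line of bigfile is kept.
: > emptyfile
awk "$prog" emptyfile bigfile > all_of_bigfile
```

The first run prints axb and cc; the second passes bigfile through unchanged. Like grep -F -x, the lookup compares whole lines literally, and memory use grows with the first file, so that is where the smaller file belongs.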

This User Gave Thanks to MadeInGermany For This Post:
# 4  
10-29-2014
You could time the two approaches on small to medium-sized files first. With GNU time (/usr/bin/time, not the shell keyword), the -v option shows additional information on resource usage.
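A sketch of that experiment: it generates test data of adjustable size and runs both methods. It assumes the -v refers to GNU time (/usr/bin/time -v); bash's time keyword only accepts -p, so when GNU time is missing the script simply runs the commands untimed. N and the numeric data are placeholders:

```shell
#!/bin/sh
# Sketch: compare sort|comm against grep -v -F -x -f on generated data.
# Raise N step by step to see where each method starts to struggle.
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"
export LC_ALL=C   # byte-order collation: consistent for sort/comm, usually faster too

N=20000
awk -v n="$N" 'BEGIN { for (i = 2; i <= n; i += 2) print i }' > file1   # even numbers
awk -v n="$N" 'BEGIN { for (i = 1; i <= n; i++) print i }' > file2      # 1..N

# Use GNU time's verbose report (elapsed time, maximum resident set size, ...)
# if it works here; otherwise TIME stays empty and expands to nothing.
TIME=
if /usr/bin/time -v true 2>/dev/null; then TIME='/usr/bin/time -v'; fi

echo '== sort + comm' >&2
$TIME sh -c 'sort file1 > f1_sorted && sort file2 > f2_sorted &&
             comm -1 -3 f1_sorted f2_sorted > out_sort'

echo '== grep -v -F -x -f' >&2
$TIME sh -c 'grep -v -F -x -f file1 file2 > out_grep || [ "$?" -eq 1 ]'

# grep keeps file2's order while comm emits sorted order, so sort before comparing
sort out_grep > out_grep_sorted
if cmp -s out_sort out_grep_sorted; then
    echo 'both methods agree'
else
    echo 'results differ' >&2
fi
```

Both methods should report the 10000 odd numbers; the interesting part is the timing and memory reports on stderr as N grows.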
This User Gave Thanks to RudiC For This Post:
# 5  
10-29-2014
Thanks, all of you, for your good suggestions.