Please suggest alternative to grep


 
# 1  
Old 06-07-2012

Hi Experts,
My requirement is as follows:
I have a file (named file1) containing numbers like:
Code:
372846078543002
372846078543003
372846078543004
372846078543005
372846078543006

I have another file (named file2) where lines containing these numbers (from file1) are present, e.g.:
Code:
lppza087; [2012-06-05 03:00:01,090] <PERFB  > JMSId :ID:414d51204c50505a41303837202020204f657ff1299e7bb7 SvcName :realtime.get.relationship Port :Port1 LobId :AMCSGCDERTUSUSD Card :372846078543002 SrcCd :16 versionNum :3.0 OO [MessageListenerThreadPool : 11  ] OO dao.CustDAO                       OO                 getRelnDetails() OO Entry : getRelnDetails
lppza087; [2012-06-05 03:00:01,100] <PERFB  > JMSId :ID:414d51204c50505a41303837202020204f657ff1299e7bb7 SvcName :realtime.get.relationship Port :Port1 LobId :AMCSGCDERTUSUSD Card :372846078543003 SrcCd :16 versionNum :3.0 OO [MessageListenerThreadPool : 11  ] OO dao.CustDAO                       OO                 getRelnDetails() OO Exit  : getRelnDetails

I need to pull from file2 all the lines that contain any of the numbers listed in file1.
One way is to loop over file1 and grep each number in file2, but my data volume is very high and it is taking 5-6 hours.
Can you please suggest a faster way to achieve this (maybe using awk/sed)?

# 2  
Old 06-07-2012
Hi

Code:
grep -f file1 file2
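
Since the entries in file1 are plain numbers with no regex metacharacters, fixed-string matching should also help; this is the same command with -F added (equivalent to fgrep):
Code:
grep -F -f file1 file2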

Guru.
# 3  
Old 06-07-2012
Thanks Guru for your prompt response.
But my second file is 15 GB and the first is 5 GB, so I wanted to know whether this process can be made faster.
I was also wondering whether the matched lines from file2 can be arranged in the same order as the search lines in file1.
# 4  
Old 06-07-2012
Used in a loop like that, grep, sed, and awk would all do the same thing: read the first file line by line and scan the entire second file for each number, which means re-reading 15 GB of data once for every line of file1.

One way it could be done faster would be a script/program that reads the second file (where the wanted number appears to sit in the same place on every line), builds a hash/list of the numbers and their line numbers, and then only has to go through the first file once.
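
A rough sketch of that single-pass idea in awk; this variant hashes the numbers from file1 (the smaller keys) and then makes one pass over file2, assuming the number always follows "Card :" exactly as in the sample:
Code:
# pass 1: load every number from file1 into an array (with ~300 million keys this needs a lot of RAM)
# pass 2: pull the number after "Card :" out of each log line and print the line if it is wanted
awk 'NR==FNR { want[$0]; next }
     match($0, /Card :[0-9]+/) {
         n = substr($0, RSTART + 6, RLENGTH - 6)   # skip the 6 characters of "Card :"
         if (n in want) print
     }' file1 file2 > matches.txt
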
# 5  
Old 06-07-2012
Hi.

The first question is: does this absolutely need to be faster? How many times are you going to run it? If it is a one-shot job, then perhaps just letting it run to completion is the best solution.

Secondly, the first file looks like it is a sequence. If so, then perhaps a regular expression could be used rather than 5 GB worth of patterns in memory. If not a regular expression, then possibly code that determines whether a piece of the line matches the base plus the sequence -- an arithmetic operation, which might be faster than string comparisons (for example, some mainframes and supercomputers had multiple units for arithmetic).
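
For instance, if file1 really is one contiguous numeric range, a sketch of that arithmetic test in awk (LOW and HIGH are placeholders for the real first and last numbers; 15-digit values are still exact in awk's floating-point arithmetic, and the "Card :" position is an assumption taken from the sample):
Code:
LOW=372846078543002 HIGH=372846078543006      # placeholders: first and last number of file1
awk -v lo="$LOW" -v hi="$HIGH" '
    match($0, /Card :[0-9]+/) {
        n = substr($0, RSTART + 6, RLENGTH - 6) + 0   # force a numeric comparison
        if (n >= lo && n <= hi) print
    }' file2 > matches.txt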

Thirdly, if you have sufficient IO throughput as well as multiple cores, then one could write a program that internally divides the main file into pieces by keeping track of start-stop line positions, and then uses processes or threads to process one segment each. A less elegant solution along the same lines would be to split the file into n sections, each in its own file, and then run n instances of grep.
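
A rough sketch of that split-and-parallel-grep variant (the chunk size and the number of pieces are guesses to tune; note that every grep instance still loads all of file1's patterns, so memory may become the real limit):
Code:
split -l 20000000 file2 chunk_            # cut the 15 GB log into line-based pieces
for c in chunk_*; do
    grep -F -f file1 "$c" > "$c.out" &    # one fixed-string grep per piece, in the background
done
wait                                      # let all the greps finish
cat chunk_*.out > matches.txt
rm chunk_*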

Fourthly, the task could be split up among a network of machines that share the disk; and there is always the easiest (but not cheapest) solution: get a faster box.

Best wishes ... cheers, drl
# 6  
Old 06-07-2012
Assuming both files are sorted, maybe you can use "join".
If all the 300 million numbers in file1 start with 372846 (if not, then maybe multiple passes), then you can treat them as integers (minus the prefix). That way you can store them in a bitmap and look the numbers up (checking the prefix separately) while reading file2. The first chapter of Jon Bentley's book "Programming Pearls" discusses exactly this problem.
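
A sketch of how the join could look, assuming the key always follows "Card :" in file2 and that there is enough temporary space for the sorts (LC_ALL=C keeps the sort order consistent with what join expects):
Code:
LC_ALL=C sort file1 > file1.sorted
awk 'match($0, /Card :[0-9]+/) { print substr($0, RSTART + 6, RLENGTH - 6) "\t" $0 }' file2 |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 > file2.keyed
LC_ALL=C join -t "$(printf '\t')" file1.sorted file2.keyed | cut -f 2- > matches.txt
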
# 7  
Old 06-18-2012
"fgrep -f file1 file2 " worked for me

Thanks,
Niladri