grep 1000s of files with 1000s of grep values | Unix Linux Forums | Shell Programming and Scripting

#1 mantis, 10-23-2012

Hi,

I have around 200,000 files in a given directory.

I need to cat each of these files and grep them for thousands of identifier values (or strings) listed in a given text file.

The text file looks something like this:

1234
1243545
1234353
121324

etc with thousands of entries.

Can you please advise how I can do this in the most efficient manner possible? This script will no doubt take a long time to run.

Thanks in advance.

Mantis

#2 rdrtx1, 10-23-2012

try:

Code:
grep -f text_file many_files > new_file
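
One caveat with 200,000 files: expanding them all onto a single grep command line may exceed the system's argument-length limit. A minimal sketch that sidesteps this, assuming GNU (or BSD) find and grep, identifiers that are literal strings rather than regular expressions, and files sitting directly in one directory. The function name search_ids and its arguments are hypothetical:

```shell
#!/bin/bash
# search_ids PATTERN_FILE DIR
# Hypothetical helper: print every line, from the regular files directly
# under DIR, that contains any literal string listed in PATTERN_FILE.
search_ids() {
  local patterns=$1 dir=$2
  # "-exec ... {} +" makes find batch the file names into as many grep
  # invocations as needed, so no single argument list grows too long.
  # -F: treat each pattern as a fixed string, not a regular expression.
  # -H: always prefix a match with its file name, even for a lone file.
  find "$dir" -maxdepth 1 -type f -exec grep -H -F -f "$patterns" {} +
}
```

Usage would look like `search_ids text_file /directory/with/files > new_file`, with text_file and new_file kept outside the searched directory. Note that a pattern such as 1234 also matches inside 1234353; adding -w to grep restricts matches to whole words if that is what is wanted.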

#3 jim mcnamara (Forum Staff), 10-23-2012

This task will definitely complete before the next ice age sets in. (humor... sort of)

Consider adding some parallelism. This will only do well on a multi-CPU box, or on one whose CPU supports the equivalent of hyperthreading. rdrtx1's solution is as good as it gets for a single-CPU box. You may be able to run two processes in parallel there; I do not know.

Split your pattern file into several smaller files, because the more lines the pattern file has, the more CPU time is spent on each line of the file being searched.

Example: a 1000-line pattern file split into n files of m lines each, such as 4 x 250; 8 x 125 might be better.

This benefits from disk controller caching and having grep run through fewer lines of patterns for each line of source. Let's say you think 8 parallel processes will do well.
Some systems do NOT do better with this, so set up a small test first.

Code:
#!/bin/bash
cd /directory/with/zillions/of/files || exit 1

# Start with an empty result file.
> /path/to/result

# For each file, run one grep per pattern chunk in the background,
# then wait for all eight to finish before moving on to the next file.
for fname in *
do
 grep -f /path/to/file1  "$fname" >> /path/to/result  &
 grep -f /path/to/file2  "$fname" >> /path/to/result  &
 grep -f /path/to/file3  "$fname" >> /path/to/result  &
 grep -f /path/to/file4  "$fname" >> /path/to/result  &
 grep -f /path/to/file5  "$fname" >> /path/to/result  &
 grep -f /path/to/file6  "$fname" >> /path/to/result  &
 grep -f /path/to/file7  "$fname" >> /path/to/result  &
 grep -f /path/to/file8  "$fname" >> /path/to/result  &
 wait
done
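
The smaller pattern files used above can be produced with the standard split utility. A minimal sketch, assuming the pattern file has one identifier per line and ends with a newline. The function name split_patterns and the chunk prefix are hypothetical:

```shell
#!/bin/bash
# split_patterns PATTERN_FILE N PREFIX
# Hypothetical helper: split PATTERN_FILE into at most N chunks of roughly
# equal line counts, named PREFIXaa, PREFIXab, and so on.
split_patterns() {
  local patterns=$1 n=$2 prefix=$3
  local total per_chunk
  total=$(wc -l < "$patterns")
  # Round up so that no more than N chunks are produced.
  per_chunk=$(( (total + n - 1) / n ))
  # An empty pattern file would give a chunk size of 0, which split rejects.
  [ "$per_chunk" -gt 0 ] || return 1
  split -l "$per_chunk" "$patterns" "$prefix"
}
```

For instance, `split_patterns /path/to/text_file 8 /path/to/file.` would turn a 1000-line file into eight 125-line chunks named file.aa through file.ah, so the grep lines in the script would need those names instead of file1 through file8.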

#4 mantis, 10-26-2012

Thank you so much, fellas. I have been so busy that I have only been able to try your suggestions today.

Regarding the parallelism, that is a great idea. The problem is that I have to put a condition after the grep, e.g.:


Code:
grep -f /path/to/file1  $fname >> /path/to/result  &
        if [ $? = 0 ];
        then
        cp -p $fname $PATH
        fi


But this won't work correctly because of the &: $? is then always 0, since it reflects launching the background job rather than grep's result. So how do I use parallelism in a script like this?

Thanks again.
Mantis

Last edited by Franklin52; 10-27-2012 at 12:09 PM.. Reason: Please use code tags
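
One way to keep the parallelism and still act on grep's result is to put the test and the copy inside the same background job, so that each job checks its own exit status. A minimal sketch, assuming bash plus a find and xargs that support -print0, -0, and -P (GNU and BSD versions do). Note that it parallelizes across files rather than across pattern chunks, which is a different split from the one in post #3. The function name copy_matching, its arguments, and the batch size of 500 are hypothetical, and the destination is deliberately not called PATH, since assigning to $PATH would break command lookup:

```shell
#!/bin/bash
# copy_matching PATTERN_FILE SRC_DIR DEST_DIR [JOBS]
# Hypothetical helper: copy every regular file directly under SRC_DIR that
# contains any literal string from PATTERN_FILE into DEST_DIR, testing up
# to JOBS batches of files in parallel (default 8).
copy_matching() {
  local patterns=$1 src=$2 dest=$3 jobs=${4:-8}
  mkdir -p "$dest"
  find "$src" -maxdepth 1 -type f -print0 |
    xargs -0 -P "$jobs" -n 500 sh -c '
      # First two arguments: pattern file and destination; the rest: files.
      patterns=$1; dest=$2; shift 2
      for f in "$@"; do
        # grep -q exits 0 at the first match, so cp runs only on a match.
        if grep -q -F -f "$patterns" "$f"; then
          cp -p "$f" "$dest"
        fi
      done
    ' sh "$patterns" "$dest"
}
```

A call might look like `copy_matching /path/to/text_file /directory/with/files /path/to/matches 8`. If the per-chunk layout is preferred instead, saving each job's PID from $! and later calling wait with that PID returns that job's exit status.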