grep 1000s of files with 1000s of grep values


 
# 1  
Old 10-23-2012

Hi,

I have around 200,000 files in a given directory.

I need to cat each of these files and grep them for thousands of identifier values (or strings) listed in a given text file.

The text file looks something like this:

1234
1243545
1234353
121324

and so on, with thousands of entries.

Can you please advise how I can do this in the most efficient manner possible? This script will no doubt take a long time to run.

Thanks in advance.

Mantis
# 2  
Old 10-23-2012
try:
Code:
grep -f text_file many_files > new_file
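
With around 200,000 files, expanding many_files as a shell glob may fail with "Argument list too long". A minimal sketch of one way around that, assuming the identifiers are plain strings rather than regular expressions and that only the names of matching files are needed. The scratch directory and file names (ids.txt, a.txt and so on) are made up for the demo:

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical names, for illustration only).
demo=$(mktemp -d)
mkdir "$demo/files"
printf '1234\n121324\n'       > "$demo/ids.txt"
printf 'order 1234 shipped\n' > "$demo/files/a.txt"
printf 'nothing to see\n'     > "$demo/files/b.txt"
printf 'ref 121324\n'         > "$demo/files/c.txt"

# -F: treat each line of ids.txt as a fixed string, not a regex.
# -l: print only the names of files that contain at least one match.
# find ... -exec ... {} + passes the file names to grep in batches,
# so the argument list never grows past the system limit.
find "$demo/files" -type f -exec grep -l -F -f "$demo/ids.txt" {} + |
  sort > "$demo/matched.txt"

cat "$demo/matched.txt"
```

Note that a fixed-string match on 1234 also hits 1234353. GNU and BSD grep accept -w to match whole words only, if that is what you need.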

# 3  
Old 10-23-2012
This task will definitely complete before the next ice age sets in. (humor... sort of)

Consider adding some parallelism. This will only do well on a multi-CPU box, or on a box whose CPU supports the equivalent of hyperthreads. rdrtx1's solution is as good as it gets for a single-CPU box. You may be able to run two processes in parallel there; I do not know.

Split your pattern file into several smaller files, because the more lines you have in the pattern file, the more CPU is spent on each line of the file being searched.

Example: a 1000-line pattern file split into n files of m lines each: 4 x 250, or 8 x 125 might be better.

This benefits from disk controller caching and from having grep run through fewer pattern lines for each line of source. Let's say you think 8 parallel processes will do well.
Some systems do NOT do better with this, so set up a small test first.
Code:
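#!/bin/bash
# Run 8 greps in parallel per file, one per slice of the pattern file.
cd /directory/with/zillions/of/files || exit 1

# Start with an empty result file.
: > /path/to/result

# A glob in a for loop avoids parsing ls output and is not subject to
# the "argument list too long" limit.
for fname in *
do
 grep -f /path/to/file1 "$fname" >> /path/to/result &
 grep -f /path/to/file2 "$fname" >> /path/to/result &
 grep -f /path/to/file3 "$fname" >> /path/to/result &
 grep -f /path/to/file4 "$fname" >> /path/to/result &
 grep -f /path/to/file5 "$fname" >> /path/to/result &
 grep -f /path/to/file6 "$fname" >> /path/to/result &
 grep -f /path/to/file7 "$fname" >> /path/to/result &
 grep -f /path/to/file8 "$fname" >> /path/to/result &
 # Wait for all 8 background greps before moving to the next file.
 # Lines from different jobs may land in the result file in any order.
 wait
done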
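
The pattern-file split described above can be sketched with the standard split utility. This is only an illustration: the scratch directory, the generated identifiers and the part. prefix are assumptions, not names from the thread.

```shell
#!/bin/sh
# Scratch directory with a 1000-line pattern file of made-up identifiers.
demo=$(mktemp -d)
awk 'BEGIN { for (i = 1; i <= 1000; i++) print 100000 + i }' > "$demo/patterns.txt"

# Cut the pattern file into pieces of 250 lines each.
# By default split names the pieces <prefix>aa, <prefix>ab, and so on.
split -l 250 "$demo/patterns.txt" "$demo/part."

# List the resulting pieces.
ls "$demo"/part.*
```

Changing 250 to 125 gives the 8 x 125 layout, so both variants are easy to time against each other on a small sample.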

# 4  
Old 10-26-2012
Thank you so much, fellas. I've been so busy that I've only been able to try your suggestions today.

Regarding the parallelism, that is a great idea. The problem is that I have to put a condition after the grep, e.g.:

Code:
grep -f /path/to/file1 "$fname" >> /path/to/result &
if [ $? = 0 ]; then
        cp -p "$fname" $PATH
fi


But this won't work correctly because of the &: the check will always succeed. So how do I use parallelism in a script like this?

Thanks again.
Mantis

Last edited by Franklin52; 10-27-2012 at 01:09 PM.. Reason: Please use code tags
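
One possible way to combine the condition with parallelism, offered as a sketch rather than a definitive answer: keep grep in the foreground of a small worker shell, so the if sees its real exit status, and let xargs run several workers at once. The -P and -0 options are assumed to be available (GNU and BSD xargs have them; they are not in older POSIX). The scratch directory, ids.txt and the matched directory are made-up names for the demo.

```shell
#!/bin/sh
# Demo data in a scratch directory (hypothetical names, for illustration only).
demo=$(mktemp -d)
mkdir "$demo/files" "$demo/matched"
printf '1234\n121324\n'       > "$demo/ids.txt"
printf 'order 1234 shipped\n' > "$demo/files/a.txt"
printf 'nothing to see\n'     > "$demo/files/b.txt"
printf 'ref 121324\n'         > "$demo/files/c.txt"

# xargs hands batches of up to 100 file names to as many as 4 worker
# shells at a time. Inside a worker, grep runs in the foreground, so
# the if tests the real exit status: 0 means at least one match.
# -q stops at the first match and prints nothing.
find "$demo/files" -type f -print0 |
  xargs -0 -n 100 -P 4 sh -c '
    ids=$1; dest=$2; shift 2
    for f in "$@"; do
      if grep -q -F -f "$ids" "$f"; then
        cp -p "$f" "$dest"
      fi
    done
  ' sh "$demo/ids.txt" "$demo/matched"

ls "$demo/matched"
```

The same idea works with a plain &: put the whole if ... fi in the background, for example wrapped in ( ... ) &, instead of backgrounding only the grep.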