Best Alternative to Search Text strings in directory


 
# 1  
05-08-2011

Hi All,

We have a file "Customers.lst". It contains a list of all the customers.

There is a directory containing a number of text files, each of which lists the names of defaulting customers.

We want to search for every customer in "Customers.lst" across the files of defaulting customers, and then
print each defaulting customer's name together with the file in which it appears.

Currently we read the Customers.lst file line by line and run grep on the directory for each name.
Code:
for customername in `cat ${customerlistfile}`
do
   cd $dir
   echo "Searching for $customername in the $dir" >> $mainlog
   find . -name "*"|xargs grep -il $customername |sed 's/.*/'"${customername}"', &/' >> $mainout
done

Is there a better way to do this, or any suggestions on the above command to make it faster?

Regards,
Arun M

Last edited by Scott; 05-08-2011 at 09:15 AM. Reason: Added code tags
# 2  
05-08-2011
Please use code tags.
A few minor suggestions for your code:
= Useless Use of Cat: no need for cat; redirect the file into the loop instead
= '-name "*"' is redundant: be more specific (e.g. -type f) or omit it if you are searching through all files
= sed doesn't need to match the whole line (.*) and re-insert it (&); use the anchor '^' instead
= do put double quotes around the grep pattern variable, and around the sed expression as well, so multi-word names stay a single argument

Is there a reason why you 'cd' in each loop iteration? Does $dir change?
Code:
while IFS= read -r customername ; do 
   echo "Searching for $customername in the $dir" >> "$mainlog"
   find "$dir" -type f -exec grep -il "$customername" {} + | 
     sed "s/^/${customername}, /" >> "$mainout"
done < "${customerlistfile}"
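A quick way to check the sed suggestion: the original 's/.*/name, &/' and the anchored 's/^/name, /' produce the same line; the anchored form simply doesn't have to match the whole line first. A minimal sketch; the name and file name are made-up examples, and like the original it assumes names contain no '/' or '&' (both are special in a sed replacement):

```shell
#!/bin/sh
customername="John Doe"   # made-up example name

# Original form: match the whole line (.*) and put it back with &
with_match()  { sed 's/.*/'"${customername}"', &/'; }

# Suggested form: anchor at the start of the line (^) and just insert the prefix
with_anchor() { sed "s/^/${customername}, /"; }

echo file2.log | with_match    # John Doe, file2.log
echo file2.log | with_anchor   # John Doe, file2.log
```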

# 3  
05-08-2011
Thanks a lot for your reply. It helps a lot.

I have some follow-up questions, as I am still a novice in shell scripting.

a) Which looping construct is more efficient when the file to loop through is big? In our case the customer list file will have 25k-35k records.

b) You suggested using double quotes around the grep pattern variable.

Could you please explain this suggestion?

Thanks once again.

Arun
# 4  
05-08-2011
a) Redirecting the file with < is not much more efficient than your for loop; it really just saves one process (cat). It does, however, keep multi-word names intact, whereas the for loop over cat splits them into separate words. The best approach is to avoid the shell loop altogether and run the file through a filter, like awk.
And since you're not using find(1) to do anything especially useful, it can be omitted as well. Something along these lines:
Code:
awk -v "dir=$someDir" -v "log=some.log" '{print "Searching for "$0 >> log; printf("grep -il \"%s\" * | sed \"s/^/%s, /\"\n", $0,$0); }' Customers.lst

This will print the commands to stdout. Look at them, take one and execute it, and if everything seems fine, pipe the output to sh and capture it in a file:
Code:
awk -v "dir=$someDir" -v "log=some.log" '{print "Searching for "$0  >> log; printf("grep -il \"%s\" * | sed \"s/^/%s, /\"\n", $0,$0);  }' Customers.lst | sh > output.log
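To see the other difference between the two loop forms, here is a small sketch with a hypothetical two-name list: the for loop over cat splits the text on whitespace, while 'while IFS= read -r' gets one whole line per iteration.

```shell
#!/bin/sh
# Hypothetical two-name customer list in a temporary file
list=$(mktemp)
printf 'John Doe\nAlice In Wonderland\n' > "$list"

# for loop over cat: the text is split on whitespace, one word per iteration
for_count=0
for customername in $(cat "$list"); do
    for_count=$((for_count + 1))
done

# while read with < redirection: one whole line per iteration
while_count=0
while IFS= read -r customername; do
    while_count=$((while_count + 1))
done < "$list"

echo "for loop iterations:   $for_count"     # 5 (John, Doe, Alice, In, Wonderland)
echo "while loop iterations: $while_count"   # 2
rm -f "$list"
```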


b) If you omit the double quotes and search for a two-word pattern, grep takes the second word as the name of a file to search:
Code:
$ cat file 
John Doe
John OtherDoe
$ grep John Doe file 
grep: Doe: No such file or directory
file:John Doe
file:John OtherDoe
$ grep "John Doe" file 
John Doe
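The same effect can be shown without grep. The sketch below uses a hypothetical helper, count_args, that only reports how many arguments it received, i.e. how many separate words grep would get in the same position:

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received
count_args() { echo "$#"; }

customername="John Doe"

count_args $customername file     # 3: "John", "Doe", "file" (Doe looks like a file name)
count_args "$customername" file   # 2: "John Doe", "file"
```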


Last edited by mirni; 05-08-2011 at 08:20 AM.
# 5  
05-09-2011
Hi,

Thanks for the reply.

I tried the example above, but it throws a syntax error whenever I try to use log or dir.

Could you please explain:

a) "grep -il \"%s\" * " : How will this search for $0 in the current directory and its subdirectories?

b) sed \"s/^/%s, /\"\n", $0,$0); : What does this sed part do?


Regards,
Arun M
# 6  
05-09-2011
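Sorry, that was my mistake: log is the name of awk's built-in logarithm function, so it can't be used as a variable name, hence the syntax error (renaming it, e.g. -v logfile=some.log, would avoid that). The following, which inserts the file name straight from the shell, works fine.
Let me format it more readably:
Code:
sh$ logFile=someLog.txt
sh$ awk '{
  print "Searching for "$0 > "'"$logFile"'"; 
  printf("grep -il \"%s\" * | sed \"s/^/%s, /\"\n", $0,$0); 
}' Customers.lst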

Input:
Code:
sh$ cat Customers.lst
John Doe
Alice In Wonderland
John Other

Output:
Code:
sh$ logFile=someOther.log; awk '
{print "Searching for " $0 > "'"$logFile"'" ; printf("grep -il \"%s\" * | sed \"s/^/%s, /\"\n", $0,$0); 
}' Customers.lst
grep -il "John Doe" * | sed "s/^/John Doe, /"
grep -il "Alice In Wonderland" * | sed "s/^/Alice In Wonderland, /"
grep -il "John Other" * | sed "s/^/John Other, /"

So awk is used to print out a series of commands built from the names in Customers.lst, one command per line.
Now, here is what each command does when you execute it:
Code:
grep -il "John Doe" *

will search for "John Doe" case-insensitively (-i) in all files in the current directory (*) and output only the names of the files where the string is found (-l).
So if you have 3 log files in the current directory:
Code:
sh$ ls
file1.log
file2.log
file3.log

and only one of them, file2.log, contains "John Doe", then:
Code:
sh$ grep -il "John Doe" *
file2.log

that's the filename that grep returns.
Now you want <name>, <filenameThatContains_name>,
so that's what sed does: it inserts 'John Doe, ' at the beginning of the line ('^'), so:
Code:
sh$ echo file2.log | sed "s/^/John Doe, /"
John Doe, file2.log

Those '%s' are format specifiers for printf: each one tells printf to print a string. $0,$0 are the arguments to printf, which it substitutes for the two %s in order. In awk, $0 is the whole record, i.e. the name (assuming each line of Customers.lst contains nothing but the name).
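To see the printf step on its own, you can feed a single name through the same awk program and look at the command it generates (a minimal sketch; "John Doe" is just a sample name):

```shell
#!/bin/sh
# One sample name through the same printf format used above
cmd=$(echo "John Doe" |
  awk '{ printf("grep -il \"%s\" * | sed \"s/^/%s, /\"\n", $0, $0) }')
echo "$cmd"
# grep -il "John Doe" * | sed "s/^/John Doe, /"
```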

So the awk part just formats the commands and prints them to the screen. You can then test one of them, if you wish, or just check them visually (useful for debugging, before you actually run them).
Then pipe the output to sh to execute the commands, and redirect to capture their output.
Note that '*' only matches files in the current directory; grep does not descend into subdirectories that way. To cover them, replace '*' with a directory name and add -r (supported by GNU and BSD grep, though not required by POSIX).
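An end-to-end run of the generate-then-execute approach, as a sketch on made-up data in a scratch directory. It replaces the plain '*' with a directory name (data, hypothetical) and adds -r, which makes GNU and BSD grep descend into subdirectories:

```shell
#!/bin/sh
# Made-up sample data in a scratch directory
work=$(mktemp -d)
cd "$work" || exit 1
printf 'John Doe\nAlice In Wonderland\nJohn Other\n' > Customers.lst
mkdir -p data/sub
printf 'nothing here\n'        > data/file1.log
printf 'paid: john doe\n'      > data/sub/file2.log
printf 'Alice In Wonderland\n' > data/file3.log

# Step 1: awk turns each name into a "grep | sed" command line
# Step 2: sh runs those command lines one after another
awk '{ printf("grep -ril \"%s\" data | sed \"s/^/%s, /\"\n", $0, $0) }' Customers.lst |
  sh > output.log

result=$(cat output.log)
echo "$result"
# John Doe, data/sub/file2.log
# Alice In Wonderland, data/file3.log

cd / && rm -rf "$work"
```

"John Other" produces no line because no file contains it.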

I know it's not very elegant, but it should be faster.
Loops in the shell are not nearly as efficient. Awk excels at reading lines from its input; it's a filter, after all, well crafted for exactly this purpose. You are still going to launch one grep process per name, though; this can be optimized further (e.g., by running several greps in parallel across the CPUs).
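One way to do that parallelization, as a sketch only: a hypothetical helper, parallel_search, hands each name to xargs and lets it run up to JOBS grep|sed pipelines at once. It assumes an xargs with -0 and -P (GNU and BSD have both), a grep with -r, and names without '/' or '&'. Parallel jobs finish in any order, so the output is sorted. Note that it still scans every file once per name; it spreads that work across CPUs rather than reducing it.

```shell
#!/bin/sh
# parallel_search LIST DIR JOBS   (hypothetical helper, not a standard tool)
# For each line of LIST, runs: grep -ril NAME DIR | sed "s/^/NAME, /"
# with up to JOBS of these pipelines running at the same time.
parallel_search() {
    ps_list=$1
    ps_dir=$2
    ps_jobs=${3:-4}
    # NUL-separate the names so xargs passes each whole line, spaces included,
    # as one argument; -n 1 starts one sh per name, -P runs them in parallel
    tr '\n' '\0' < "$ps_list" |
      xargs -0 -n 1 -P "$ps_jobs" sh -c '
        # xargs appends the name last: $1 is the directory, $2 the name
        grep -ril -- "$2" "$1" | sed "s/^/$2, /"
      ' sh "$ps_dir" |
      sort   # jobs finish in arbitrary order; sort for stable output
}
```

Usage would be something like: parallel_search Customers.lst /path/to/dir 4 > output.log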

If you want, you can do a little benchmark: take a subset of your files and process them with your script and with mine.
Run each one with 'time', like:
Code:
time ./scriptToRun.sh

which will report how long the script takes to finish. I'd be curious to see the results.
Also, I'm looking forward to answers from more experienced *nix people.

Last edited by mirni; 05-09-2011 at 06:24 AM.
# 7  
05-09-2011
Hi,

Thanks for the detailed reply.

awk is faster than the loop, based on the current run. I have a follow-up question:

a) We provided a list of 5000 customers. It ran pretty fast for the first 500-800, but then it slowed down considerably as it worked through the rest of the list.

Is this related to buffering or allocated memory?

Regards,
Arun M
