Can someone please help me optimize my code (script searches subdirectories)?


 
# 1  
Old 03-14-2012

Here is my code. It reads an input file (input.txt, which contains roughly 2,000 search phrases) and searches a directory for files that contain each search phrase. The directory contains roughly 1,900 files and 84 subdirectories. The output is a file (output.txt) that lists only the names of the files that contain the searched phrase. I timed this code and it took roughly 3.38 hours to run! Can someone help me optimize my code, or provide me with some suggestions?

Code:
#!/bin/sh
start=$SECONDS
while read word
do
a=$(find /path/to/files -exec grep -wi $word /dev/null {} \; | sort -u | cut -d \: -f1)
if [ -n "$a" ]; then
echo "$word is found in: $a"
fi
echo ""
done < input.txt >> output.txt
end3=$SECONDS
echo "Total Runtime: $((end3 - start3)) secs."
# 2  
Old 03-14-2012
Suggestions:
1) Correct the runtime calculation! The script sets "start" but subtracts "start3".
Code:
start3=$SECONDS

2) Search only regular files (-type f) and use "grep -l" so that each file name is printed only once. Put quotes around "$word", since it may be a multi-word phrase.
Code:
a=$(find /path/to/files -type f -exec grep -wil "$word" /dev/null {} \;)
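
Put together with the original loop, the two suggestions might look like the sketch below. It is only a sketch: the function name search_phrases and its two arguments are made up for illustration, printf replaces echo so that backslashes in a phrase are printed as they are, and "grep -e" is used so that a phrase starting with a dash is not taken as an option.

```shell
#!/bin/sh
# search_phrases DIR LIST  (hypothetical helper, not part of the original script)
# For every line of LIST, print the files under DIR that contain that line
# as a whole word or phrase, ignoring case.
search_phrases() {
    dir=$1
    list=$2
    while IFS= read -r word
    do
        # skip empty lines in the phrase list
        [ -n "$word" ] || continue
        # -type f: regular files only; -l: print each matching file name once
        a=$(find "$dir" -type f -exec grep -wil -e "$word" /dev/null {} \;)
        if [ -n "$a" ]; then
            printf '%s is found in: %s\n\n' "$word" "$a"
        fi
    done < "$list"
}
```

It would be called as: search_phrases /path/to/files input.txt >> output.txt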

# 3  
Old 03-14-2012
Quote:
Originally Posted by methyl
Suggestions:
1) Correct the Runtime calculation!
Code:
start3=$SECONDS


Sorry, I had to modify parts of my code for the forums and that one slipped through the cracks!
# 4  
Old 03-14-2012
In addition to methyl's suggestions, see whether using xargs improves the performance:
Code:
a=$(find /path/to/files -type f -print0 | xargs -0 grep -wil "$word" )
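
Why batching should help: with "-exec ... \;" find starts one grep per file, so roughly 2,000 phrases times 1,900 files comes to about 3.8 million grep processes. xargs instead hands many file names to each grep. If -print0 and "xargs -0" are not available on your system, "-exec ... {} +" batches in a similar way. A minimal sketch, assuming your find supports the "+" terminator (it is in POSIX, but very old finds may lack it); the function name files_with_phrase is made up for illustration:

```shell
#!/bin/sh
# files_with_phrase DIR PHRASE  (hypothetical helper name)
# Print each file under DIR that contains PHRASE as a whole word, ignoring case.
files_with_phrase() {
    # "{} +" makes find append as many file names as fit on one command line,
    # so grep is started once per batch rather than once per file.
    # /dev/null is kept as an extra operand, as in the earlier posts.
    find "$1" -type f -exec grep -wil -e "$2" /dev/null {} +
}
```

Inside the loop it would replace the find line: a=$(files_with_phrase /path/to/files "$word")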

# 5  
Old 03-15-2012
How about this using awk:

Code:
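find /path/to/files -type f -print | awk '
# First input (input.txt): store every word, lower-cased, as a key of w
NR==FNR{for(i=1;i<=NF;i++) w[tolower($i)]++ ; next }
# Second input (stdin): every record is a file name printed by find
{ FILE=$0
  # "> 0" stops the loop at end of file and also when FILE cannot be read
  while((getline < FILE) > 0) {
     for(i=1;i<=NF;i++) {
         if($i && tolower($i) in w) print tolower($i)" is found in: "FILE
      }
  }
  close(FILE)
}' input.txt - >> output.txt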

---------- Post updated at 04:08 PM ---------- Previous update was at 03:50 PM ----------

Sorry, I didn't pick up that the requirement was to find phrases, not individual words. This should work, although it is not quite as blazing fast.

Edit: it also avoids printing a result more than once if a phrase appears multiple times in a file.

Code:
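find /path/to/files -type f -print | awk '
# input.txt: store each phrase, lower-cased, with a trailing space
NR==FNR{w[tolower($0)" "]++ ; next}
# stdin: every record is a file name printed by find
{ FILE=$0
  delete h
  # "> 0" stops the loop at end of file and also when FILE cannot be read
  while((getline < FILE) > 0) {
     # pad the line so a phrase at either end still has a space on both sides
     $0=" "tolower($0)" "
     for(L in w)
           # index() compares literally; match() would treat L as a regex
           if(!(L in h) && index($0, " "L)) {
              print L "is found in: "FILE
              h[L]++
           }
  }
  close(FILE)
}' input.txt - >> output.txt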


Last edited by Chubler_XL; 03-15-2012 at 03:21 AM..
# 6  
Old 03-15-2012
Does your grep have recursive capabilities (-r / -R )? Then you could perhaps use this instead of your script:
Code:
grep -Frilwf input.txt /path/to/files > output.txt

# 7  
Old 03-15-2012
Quote:
Originally Posted by Scrutinizer
Does your grep have recursive capabilities (-r / -R )? Then you could perhaps use this instead of your script:
Code:
grep -Frilwf input.txt /path/to/files > output.txt


no recursive capabilities. now i'm jealous....
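
Without -r, much of the benefit may still be within reach by letting find supply the file names and handing grep the whole phrase list at once with -f. A sketch under assumptions: your grep accepts -F, -f, -i, -l and -w together (untested on your system), your find supports "{} +", and the function name files_with_any_phrase is invented here. Like the grep command quoted above, it only lists the files that contain at least one phrase; it does not say which phrase was found.

```shell
#!/bin/sh
# files_with_any_phrase DIR LIST  (hypothetical helper name)
# Print each file under DIR that contains at least one line of LIST
# as a fixed string and whole word, ignoring case.
files_with_any_phrase() {
    # One pass over the tree: find batches the file names and grep reads
    # every phrase from LIST through -f, so no per-phrase loop is needed.
    find "$1" -type f -exec grep -Filw -f "$2" /dev/null {} +
}
```

It would be called as: files_with_any_phrase /path/to/files input.txt > output.txt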