Script takes too long to complete


 
# 1  
Old 06-27-2016

Hi,

I have a lengthy script, which I have trimmed down to the test case below.
Code:
more run.sh
#!/bin/bash
paths="allpath.txt"
while IFS= read -r loc
do
  echo "Working on $loc"
  startdir=$loc
  find "$startdir" -type f \( ! -name "*.log*" ! -name "*.class*" \) -print |
  while read file
  do
    echo "TIME-STAMP:"$(date)
    input="alter.txt"
    while IFS= read -r var
    do
      searchterm=$(echo $var | awk -F'=' '{print $1}')
      replaceterm=$(echo $var | awk -F'=' '{print $2}')
      echo "ST:"$searchterm
      echo "RT:"$replaceterm
    done < "$input"
  done
done < "$paths"

more alter.txt
Code:
hello=yello
wow=how
ping=pong
seesaw=heehow
bongo=ringo
jazbant=toaster
westowin=restaurant

more allpath.txt
Code:
/tmp/web/var/APPLE_DOM
/tmp/bin/var/APPLE_ART

The script above reads each physical path from allpath.txt. It then lists all files except .log and .class files using the command find "$startdir" -type f \( ! -name "*.log*" ! -name "*.class*" \) -print.

Note: the find command above is nearly instant if I run it by itself from the bash shell in that directory. It takes less than 2 seconds to list all the files.

For each file found, the loop while IFS= read -r var goes through all the search strings listed in alter.txt and replaces each one with the corresponding text (I have not shared that part of the code, as I did not consider it necessary).

For a folder 4 GB in size, it takes around 25 minutes to complete.

Can you help me optimize the script so it completes in less time?

Last edited by mohtashims; 06-27-2016 at 09:50 AM..
# 2  
Old 06-27-2016
The entire logic and structure of that script seems suboptimal. For every file found, you (re)open "alter.txt", read every single line, invoke awk twice and - I'm guessing based on your other threads - run something like sed to do the replacements.

Depending on the found files' count this IS going to be lengthy.

I'm not talking of improving the innermost loop here - although there is quite some potential.
Why don't you leave the looping to one single instance of e.g. awk?
Create a list of all file candidates (find can have several paths as starting points) and run awk, first reading all the search/replacement pairs, and then working those on all files presented.
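
A minimal sketch of the "several starting points" part, assuming bash 4+ (for mapfile) and a throwaway directory tree. The demo directory, the file names and the variable names startdirs and found are made up for illustration. The point is that find runs once for all paths, rather than once per path or once per search term.

```shell
#!/bin/bash
# Hypothetical demo tree standing in for the real paths in allpath.txt.
demo=$(mktemp -d)
mkdir -p "$demo/APPLE_DOM" "$demo/APPLE_ART"
touch "$demo/APPLE_DOM/a.txt" "$demo/APPLE_DOM/a.log" \
      "$demo/APPLE_ART/b.xml" "$demo/APPLE_ART/B.class"
printf '%s\n' "$demo/APPLE_DOM" "$demo/APPLE_ART" > "$demo/allpath.txt"

# Read every starting directory into an array, one entry per line (bash 4+).
mapfile -t startdirs < "$demo/allpath.txt"

# A single find walks all starting points, with the same filters as the original script.
found=$(find "${startdirs[@]}" -type f \( ! -name "*.log*" ! -name "*.class*" \) -print | sort)
printf '%s\n' "$found"
```

The resulting list can then be handed to a single awk or sed run instead of being looped over in the shell.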
# 3  
Old 06-27-2016
Yes, you are right. I do use sed to do the replacement, but I left it out to keep the example simple for others.

I don't know whether I understood you correctly or whether I can work this out.

What I understood is:

Are you asking me to keep the find inside the while IFS= read -r var loop?

Is that correct?

---------- Post updated at 09:35 AM ---------- Previous update was at 08:31 AM ----------

Quote:
Originally Posted by RudiC
The entire logic and structure of that script seems suboptimal. For every file found, you (re)open "alter.txt", read every single line, invoke awk twice and - I'm guessing based on your other threads - run something like sed to do the replacements.

Depending on the found files' count this IS going to be lengthy.

I'm not talking of improving the innermost loop here - although there is quite some potential.
Why don't you leave the looping to one single instance of e.g. awk?
Create a list of all file candidates (find can have several paths as starting points) and run awk, first reading all the search/replacement pairs, and then working those on all files presented.
Keeping the find inside the while IFS= read -r var loop cuts the time taken by more than half!

Here is the latest code snippet:

Code:
more run.sh
#!/bin/bash
paths="allpath.txt"
while IFS= read -r loc
do
  echo "Working on $loc"
  startdir=$loc
  input="alter.txt"
  while IFS= read -r var
  do
    searchterm=$(echo $var | awk -F'=' '{print $1}')
    replaceterm=$(echo $var | awk -F'=' '{print $2}')
    find "$startdir" -type f \( ! -name "*.log*" ! -name "*.class*" \) -print |
    while read file
    do
      echo "TIME-STAMP:"$(date)
      echo "ST:"$searchterm
      echo "RT:"$replaceterm
    done
  done < "$input"
done < "$paths"

Can it be optimized further?

Last edited by mohtashims; 06-27-2016 at 12:15 PM..
# 4  
Old 06-27-2016
Quote:
Originally Posted by mohtashims
.
.
.
Are you asking me to keep the find inside the while IFS= read -r var loop?
.
.
.
No, this is not what I said.

Quote:
Can it be optimized further?
To repeat my statement bluntly: It should be replaced.
# 5  
Old 06-27-2016
Yes, your inner loop (in post #1) should fill both variables in one stroke by setting the right internal field separator (IFS):
Code:
    input="alter.txt"
    while IFS="=" read -r searchterm replaceterm
    do
      echo "ST:$searchterm"
      echo "RT:$replaceterm"
    done < "$input"
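
Building on the read above, one way to avoid reopening alter.txt for every file is to load the pairs once, before the file loop starts. The following is only a sketch, assuming bash 4+ (for declare -A). The array name replace and the temporary sample file are illustrative and not part of the original script.

```shell
#!/bin/bash
# Sample pairs in the same key=value layout as alter.txt (temporary file for the demo).
input=$(mktemp)
printf '%s\n' 'hello=yello' 'wow=how' 'ping=pong' > "$input"

# Load all pairs once; "replace" is a hypothetical name for the lookup table.
declare -A replace
while IFS="=" read -r searchterm replaceterm
do
  [ -n "$searchterm" ] || continue   # skip empty lines, which cannot be array keys
  replace[$searchterm]=$replaceterm
done < "$input"

# Later loops consult the array instead of rereading the file.
for searchterm in "${!replace[@]}"
do
  echo "ST:$searchterm RT:${replace[$searchterm]}"
done
rm -f "$input"
```

The order in which an associative array is traversed is not defined, so the ST/RT lines may print in a different order than they appear in the file.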

# 6  
Old 06-27-2016
With the assumptions:
- not too many lines in "allpath.txt",
- not too many files found,
- bash being used,
wouldn't this do?
Code:
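awk '
# First file (alter.txt): store each search term with its replacement
FNR == NR       {R[$1] = $2
                 next
                }
# Remaining files: apply every replacement, write the result to FILENAME.new
                {for (r in R) gsub (r, R[r])
                 print > (FILENAME ".new")
                }
' FS="=" alter.txt $(find $(< allpath.txt) -type f \( ! -name "*.log*" ! -name "*.class*" \))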

You will have to rename the ".new" files afterwards.
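
The rename step could look roughly like this. It is a sketch that assumes each ".new" file should simply overwrite the file it was generated from, shown on a throwaway directory; the demo file names and the variables newfile and result are made up. On real data, keep a backup until the results have been checked.

```shell
#!/bin/bash
# Throwaway directory with one original and its ".new" counterpart (hypothetical names).
demo=$(mktemp -d)
echo 'hello world' > "$demo/page.txt"
echo 'yello world' > "$demo/page.txt.new"

# Move every "*.new" file over the file it was generated from.
find "$demo" -type f -name '*.new' -print |
while IFS= read -r newfile
do
  mv -- "$newfile" "${newfile%.new}"   # strip the trailing ".new" to get the target name
done

result=$(cat "$demo/page.txt")
echo "$result"
```

This line-based loop assumes that no file name contains a newline character.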
# 7  
Old 06-27-2016
Quote:
Originally Posted by MadeInGermany
Yes, your inner loop (in post #1) should fill both variables in one stroke by setting the right internal field separator (IFS):
Code:
    input="alter.txt"
    while IFS="=" read -r searchterm replaceterm
    do
      echo "ST:$searchterm"
      echo "RT:$replaceterm"
    done < "$input"

If you look at the modified script in my last post, it takes the same time with or without this suggestion.

I was able to bring the execution time down from 25 minutes to just 7 minutes.

Please suggest anything else that can be done to optimize this.

@RudiC:

I am sorry that I could not completely understand your suggestion.

Could you please elaborate on the part below, if it is not already covered by the updated script in my last post?

Quote:
Why don't you leave the looping to one single instance of e.g. awk?
Create a list of all file candidates (find can have several paths as starting points) and run awk, first reading all the search/replacement pairs, and then working those on all files presented.