Improve the performance of a shell script


 
# 1  
Old 04-09-2010
Hi Friends,

I wrote the shell script below to generate a report on the alert messages received in a day, but it takes around 30 minutes to process about 4500 lines (alerts).

Please help me make it faster and improve the performance of the script. I would be very happy if this report could be generated within 5 minutes, if possible.
Below is my shell script:

Code:
cp a_15.txt abc_15.txt && cp a_15.txt xyz_15.txt

total_word_count=0; match_word_count=0; alerts_matched=0;

while read outer_line
do
    echo -e "$outer_line"
    total_word_count=`echo $outer_line |tr '[ : .' ' ' |awk '{ print NF} '`
    outer_line=`echo $outer_line |tr '[ : .' ' '`

    ##
    while read inner_line
    do

        ###
        for word in $outer_line
        do
            echo $inner_line |grep -i -w "$word" 1>/dev/null
            if [ $? -eq 0 ];then
                match_word_count=`echo $match_word_count + 1|bc`
            else
                :
            fi
        done
        ###

        match_pcnt=`echo "scale=2; $match_word_count/$total_word_count*100"|bc |awk -F"." '{print $1}'`

        if [ $match_pcnt -ge 66 ];then
            alerts_matched=`echo $alerts_matched + 1|bc`
            inner_line=`echo $inner_line| tr '[ : .' '.'`
            sed -i "s/$inner_line//g" ./abc_15.txt
            sed -i '/^$/d' ./abc_15.txt

        else
            :
        fi

        match_word_count=0;
        match_pcnt=0

    done <./abc_15.txt
    ##
    echo -e "\nALERTS_MATCHED: $alerts_matched\n\n"

    alerts_matched=0;
    cat ./abc_15.txt >./xyz_15.txt

done <./xyz_15.txt >APS.out

---------- Post updated at 10:00 AM ---------- Previous update was at 09:55 AM ----------

The alert messages (4000 lines) look like this:
Code:
ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es

The report I want looks like this:

Code:
ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 300

ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 55

ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
ALERTS_MATCHED: 700


Last edited by apsprabhu; 04-09-2010 at 02:38 AM..
# 2  
Old 04-09-2010
I can barely read the script because of the lack of contrast between foreground and background colors.
# 3  
Old 04-09-2010
URGENT

Someone please help me, it's quite URGENT.
# 4  
Old 04-09-2010
Okay - please don't bump your posts. Also, if this is an emergency, which doesn't seem likely, please post emergency requests in the emergency forums.
Thank you.

You create a child process every time you use the ` ` construct. You create further child processes with each awk, bc, etc. invocation. Some of your lines of code create 3 or 4 child processes, and those commands sit inside nested loops. You are creating thousands of child processes, and each one has a significant startup cost.
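
To illustrate the point about child processes, here is a minimal sketch of how the word counting, the counters and the percentage could be done with shell built-ins only. It is not a drop-in replacement: the two sample lines are made up, and the `case` test is a case-sensitive, space-delimited approximation of `grep -i -w`.

```shell
# Sketch only: sample lines are illustrative, not taken from the real alert file.
outer_line='ERROR Thread-37 Got exception parsing VAN query'
inner_line='ERROR Thread-26 Got exception parsing VAN query'

# Let the shell split the line into words instead of `echo | tr | awk '{print NF}'`.
set -f                      # disable globbing so words such as [APS are not expanded
set -- $outer_line
total_word_count=$#

match_word_count=0
for word in "$@"; do
    # Built-in pattern match instead of `echo | grep -i -w` (approximation:
    # case-sensitive, and a "word" is anything delimited by spaces).
    case " $inner_line " in
        *" $word "*) match_word_count=$((match_word_count + 1)) ;;
    esac
done
set +f

# Integer percentage without bc or awk; multiply first so the division keeps precision.
match_pcnt=$((match_word_count * 100 / total_word_count))

echo "words=$total_word_count matched=$match_word_count pcnt=$match_pcnt"
```

None of these lines starts another program, so nothing is forked per word or per line.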
# 5  
Old 04-09-2010
Does this work for you?

Code:
 
awk '{a[$0]++} END { for(i in a) { print i "\n" "ALERTS_MATCHED: " a[i]} }'  input_file
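
One thing to be aware of with the one-liner above: `for (i in a)` visits the array in an unspecified order, so the report lines may come out shuffled. Below is a rough sketch that keeps first-seen order instead. The temporary file and its three lines are made up for the demonstration, and like the one-liner it only groups lines that are exactly identical.

```shell
# Sketch only: the sample file stands in for the real input_file.
alert_file=$(mktemp)
printf '%s\n' 'alert B' 'alert A' 'alert B' > "$alert_file"

ordered_report=$(awk '
    !($0 in count) { order[++n] = $0 }   # remember each distinct line the first time it appears
    { count[$0]++ }                      # count every occurrence
    END {
        for (i = 1; i <= n; i++)
            printf "%s\nALERTS_MATCHED: %d\n\n", order[i], count[order[i]]
    }' "$alert_file")

printf '%s\n' "$ordered_report"
rm -f "$alert_file"
```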

# 6  
Old 04-09-2010
Ignoring what the script does at the moment, what constitutes a "match"?
Could you highlight the parts of some sample messages that you are trying to match?


Comment on efficiency and logic:
The script's inner loop is executed up to 4,000 x 4,000 = 16,000,000 times, and then two in-place edit commands ("sed -i") are executed on the INPUT file of the inner loop (./abc_15.txt) for every "match". This is possibly an attempt to reduce processing by removing records from one of the copies of the input file.

I guess that this is some Linux variant with bash?
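
To put the efficiency comment in concrete terms, one way to avoid comparing every line with every other line is to mask the parts that vary and let `sort` and `uniq` do the grouping in a single pass. This is only a sketch under a big assumption: that two alerts "match" when they are identical once the timestamp and thread number are masked. That definition is a guess, not something confirmed in this thread, and the sample lines are shortened, made-up variants of the ones posted.

```shell
# Sketch only. ASSUMPTION: alerts "match" when they are identical once the
# timestamp and the thread number are masked. The sample lines are made up.
alerts=$(mktemp)
cat > "$alerts" <<'EOF'
ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast Got exception parsing VAN query)
ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Fast Got exception parsing VAN query)
ERROR (01:31:24,073 ERROR Thread-214 Cycle detected)
EOF

# One pass over the file: mask the variable parts, then group and count.
grouped=$(
    sed -e 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9]*/TIME/' \
        -e 's/Thread-[0-9][0-9]*/Thread-N/' "$alerts" |
    sort | uniq -c | sort -rn
)

printf '%s\n' "$grouped"
rm -f "$alerts"
```

This starts four processes in total, however many lines the file has, instead of several per word per pair of lines.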
# 7  
Old 04-09-2010
Although this does not match your exact output format, it can do the job:
Code:
sort some-alert-file | uniq -c
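
If the exact report layout matters, the `uniq -c` output can be reshaped with a small awk filter. This is a rough sketch: the temporary file and its contents are made up, and it still groups only lines that are exactly identical, so alerts that differ in timestamp would each be counted separately.

```shell
# Sketch only: the sample file stands in for the real alert file.
sample_alerts=$(mktemp)
printf '%s\n' \
    'ERROR alert B' \
    'ERROR alert A' \
    'ERROR alert B' \
    'ERROR alert B' > "$sample_alerts"

report=$(sort "$sample_alerts" | uniq -c |
    awk '{
        count = $1                           # uniq -c puts the count first
        sub(/^[ \t]*[0-9]+[ \t]/, "")        # strip the count, keep the line (inner spaces intact)
        printf "%s\nALERTS_MATCHED: %d\n\n", $0, count
    }')

printf '%s\n' "$report"
rm -f "$sample_alerts"
```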
