Improve the performance of a shell script

04-09-2010

Hi Friends,

I wrote the shell script below to generate a report on the alert messages received in a day. But for around 4500 lines (alerts), the script takes around 30 minutes to run.

Please help me make it faster and improve the performance of the script. I would be very happy if this report could be generated within 5 minutes, if possible.
Below is my shell script:

Code:

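# Two working copies of the day's alerts (a_15.txt):
#   xyz_15.txt drives the outer loop, abc_15.txt the inner loop.
cp a_15.txt abc_15.txt && cp a_15.txt xyz_15.txt

total_word_count=0; match_word_count=0; alerts_matched=0

# Outer loop: take one alert at a time from xyz_15.txt
while read -r outer_line
do
    echo -e "$outer_line"
    # Count the words after turning '[', ':' and '.' into spaces
    total_word_count=`echo "$outer_line" | tr '[ : .' ' ' | awk '{ print NF }'`
    outer_line=`echo "$outer_line" | tr '[ : .' ' '`

    ## Inner loop: compare the alert with every line still left in abc_15.txt
    while read -r inner_line
    do
        ### Count how many words of the outer alert occur as whole words
        ### (case-insensitive, literal) in the inner line
        for word in $outer_line
        do
            if echo "$inner_line" | grep -q -i -w -F -- "$word"; then
                match_word_count=`echo $match_word_count + 1 | bc`
            fi
        done
        ###

        # Percentage of matching words, truncated to a whole number
        match_pcnt=`echo "scale=2; $match_word_count/$total_word_count*100" | bc | awk -F"." '{print $1}'`

        # 66% or more of the words match: count this line as the same alert
        # and delete it from abc_15.txt so it is not reported again
        if [ "$match_pcnt" -ge 66 ]; then
            alerts_matched=`echo $alerts_matched + 1 | bc`
            # '[', spaces, ':' and '.' become '.', which matches any character
            # in the sed regex (quoting keeps runs of spaces intact)
            inner_line=`echo "$inner_line" | tr '[ : .' '.'`
            sed -i "s/$inner_line//g" ./abc_15.txt
            sed -i '/^$/d' ./abc_15.txt
        fi

        match_word_count=0
        match_pcnt=0
    done <./abc_15.txt
    ##
    echo -e "\nALERTS_MATCHED: $alerts_matched\n\n"

    alerts_matched=0
    # Copy the reduced list back over xyz_15.txt, the file the outer loop reads
    cat ./abc_15.txt >./xyz_15.txt
done <./xyz_15.txt >APS.out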

---------- Post updated at 10:00 AM ---------- Previous update was at 09:55 AM ----------

The alert messages (4000 lines) look like this:

Code:

ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es

The report I want looks like this:

Code:

ERROR (19:58:50,781 ERROR Thread-37-RunnerPool-Fast RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 300

ERROR (19:45:40,529 ERROR Thread-26-RunnerPool-Std RamPricingQueryRunner.run:493 Got exception parsing VAN query result for product [APS   .Q])
ALERTS_MATCHED: 55

ERROR (01:31:24,073 ERROR Thread-214 STAbstractComputable.getComputeResult:114 Cycle detected at com.sample.es.calcrules.newstyle.RuleAwareMarketDataHolder@12345, key=rules|APS   .Q, subject=APS   .Q, , rule=com.sample.es
ALERTS_MATCHED: 700

Last edited by apsprabhu; 04-09-2010 at 02:38 AM.

apsprabhu

04-09-2010

I can barely read the script because of the lack of contrast between foreground and background colors.

alister

04-09-2010

URGENT

Someone please help me, it's quite URGENT.

apsprabhu

04-09-2010

Okay, please don't bump your posts. Also, if this is an emergency, which doesn't seem likely, please post emergency requests in the emergency forums.
Thank you.

You create a child process every time you use the ` ` (backtick) construct, and further child processes with each awk, bc, tr or grep invocation. Some of your lines of code create 3 or 4 child processes each. Those commands are inside nested loops, so you are creating thousands of children, and every one of them has a significant start-up cost.
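A minimal sketch of the builtin alternative, assuming bash: the counter and percentage steps from the script, rewritten with `$(( ))` arithmetic instead of `echo ... | bc | awk` inside backticks, so no child process is created for them. The variable names are the script's; the counts are made-up example values.

```shell
#!/usr/bin/env bash
# Sketch: the script's counters and percentage using shell arithmetic only.
# Example values: 2 of 3 words matched.

match_word_count=0
total_word_count=3
alerts_matched=0

# Was: match_word_count=`echo $match_word_count + 1|bc`
for _ in 1 2; do
    match_word_count=$(( match_word_count + 1 ))
done

# Was: `echo "scale=2; $match_word_count/$total_word_count*100"|bc |awk -F"." '{print $1}'`
# bc truncates a/b to 2 decimals, *100 turns that into a whole number and awk
# keeps the integer part; for non-negative counts that equals 100*a/b with
# integer (truncating) division: 200/3 -> 66.
match_pcnt=$(( match_word_count * 100 / total_word_count ))

if (( match_pcnt >= 66 )); then
    alerts_matched=$(( alerts_matched + 1 ))
fi

echo "match_pcnt=$match_pcnt alerts_matched=$alerts_matched"
```

This removes the process creation for the arithmetic only; the `tr`, `awk` and per-word `grep` calls in the loops still fork.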

jim mcnamara

04-09-2010

Does this work for you?

Code:

 
awk '{ a[$0]++ } END { for (i in a) print i "\nALERTS_MATCHED: " a[i] "\n" }' input_file
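The one-liner above counts byte-identical lines only, and awk's `for (i in a)` visits keys in an unspecified order, so the report need not follow the input. A sketch that keeps first-seen order and prints the requested layout, under the same assumption that a "match" means an identical line (the script's 66% word rule is not reproduced); `report_exact` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: one awk pass that counts identical alert lines and prints each
# distinct line once, in first-seen order, in the requested report format.
# Assumption: a "match" is a byte-identical line (NOT the 66% word rule).
# report_exact is a hypothetical helper; it reads files given, else stdin.

report_exact() {
    awk '
        !($0 in count) { order[++n] = $0 }    # remember first appearance
        { count[$0]++ }                        # tally every occurrence
        END {
            for (k = 1; k <= n; k++)
                printf "%s\nALERTS_MATCHED: %d\n\n", order[k], count[order[k]]
        }' "$@"
}

# Demo: "alert A" occurs twice, "alert B" once.
printf '%s\n' 'alert A' 'alert B' 'alert A' | report_exact
```

On the real data this would be `report_exact a_15.txt > APS.out`. The extra `order` array costs one entry per distinct line and makes the output order deterministic.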

panyam

04-09-2010

Ignoring what the script does at the moment, what constitutes a "match"?
Could you highlight those parts of some sample messages which you are trying to match.

Comment on efficiency and logic:
The script's inner loop is executed up to 4,000 x 4,000 = 16,000,000 times, and for every "match" two in-place edits ("sed -i") are run on the inner loop's INPUT file (./abc_15.txt), each of which rewrites the whole file. This is possibly an attempt to reduce processing by removing records from one of the copies of the input file.

I guess that this is some Linux variant with bash?
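If the 66% word-overlap rule is what is wanted, the same greedy grouping can at least be done in memory. A sketch assuming bash 4 or later (for `mapfile` and `${var,,}`): the file is read once, a "used" flag replaces the `sed -i` rewrites, and word tests use `[[ ]]` pattern matching instead of one `grep` per word. `normalize` and `group_alerts` are hypothetical names, the token test only approximates `grep -i -w`, and the demo lines are shortened from the samples in the question.

```shell
#!/usr/bin/env bash
# Sketch (bash 4+): greedy grouping by the script's 66% word rule, in memory.
# Hypothetical helpers: normalize, group_alerts.
# Assumption: a word "matches" when it appears as a whole space-separated
# token of the other normalized line; this only approximates grep -i -w.

normalize() {                 # sets REPLY instead of echoing, so no subshell
    local s=$1
    s=${s//\[/ }              # '[' -> space, like tr '[ : .' ' '
    s=${s//:/ }               # ':' -> space
    s=${s//./ }               # '.' -> space
    s=${s//$'\t'/ }           # tab -> space, so tokens are space-separated
    REPLY=${s,,}              # lowercase for a case-insensitive comparison
}

group_alerts() {              # usage: group_alerts FILE [THRESHOLD_PERCENT]
    local file=$1 threshold=${2:-66}
    local -a lines norm used words
    local i j n w hits total matched

    mapfile -t lines < "$file"            # read the file once
    n=${#lines[@]}
    for (( i = 0; i < n; i++ )); do
        normalize "${lines[i]}"
        norm[i]=" $REPLY "                # pad so every token has spaces around it
    done

    for (( i = 0; i < n; i++ )); do
        [[ -n ${used[i]} ]] && continue   # already counted in an earlier group
        read -r -a words <<< "${norm[i]}"
        total=${#words[@]}
        (( total == 0 )) && continue      # skip blank lines
        matched=0
        # Start at i so the alert counts itself, as in the original script
        for (( j = i; j < n; j++ )); do
            [[ -n ${used[j]} ]] && continue
            hits=0
            for w in "${words[@]}"; do
                [[ ${norm[j]} == *" $w "* ]] && hits=$(( hits + 1 ))
            done
            if (( hits * 100 / total >= threshold )); then
                used[j]=1                 # replaces the sed -i deletions
                matched=$(( matched + 1 ))
            fi
        done
        printf '%s\nALERTS_MATCHED: %d\n\n' "${lines[i]}" "$matched"
    done
}

# Demo on a temporary file with shortened sample lines.
tmp=$(mktemp)
printf '%s\n' \
    'ERROR (19:58:50,781 ERROR Thread-37 Got exception for product [APS   .Q])' \
    'ERROR (19:45:40,529 ERROR Thread-26 Got exception for product [APS   .Q])' \
    'ERROR (01:31:24,073 ERROR Thread-214 Cycle detected key=rules|APS   .Q' > "$tmp"
group_alerts "$tmp"
rm -f "$tmp"
```

On the real data this would be `group_alerts a_15.txt > APS.out`. It is still quadratic in the worst case, because every remaining line is compared with each new group leader; the saving comes from not forking processes and not rewriting files, not from a better algorithm.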

methyl

04-09-2010

Although this does not match your exact output format, it can do the job:

Code:

sort some-alert-file | uniq -c
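To get from `uniq -c` to the requested layout, a short sketch that reformats its output with awk. It has the same limits as the pipeline it wraps: only identical lines are counted, and the report comes out in sorted rather than original order. `uniq_report` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reshape "sort | uniq -c" output into the requested report format.
# uniq -c prefixes each distinct line with its count; awk saves that count,
# strips the prefix and prints line, count and a blank separator line.
# uniq_report is a hypothetical helper; it sorts the files given, else stdin.

uniq_report() {
    sort "$@" | uniq -c | awk '{
        n = $1                       # the count added by uniq -c
        sub(/^[ \t]*[0-9]+ /, "")    # drop the count prefix, keep the line
        printf "%s\nALERTS_MATCHED: %d\n\n", $0, n
    }'
}

# Demo: "alert B" occurs twice, "alert A" once.
printf '%s\n' 'alert B' 'alert A' 'alert B' | uniq_report
```

On the real data this would be `uniq_report a_15.txt > APS.out`.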

chihung
