Possible performance improvement (Bash and flat file)

05-07-2010


Hello,

I am pretty new to shell scripts. I recently wrote one that seems to do what it should, but I am exploring the possibility of improving its performance and would appreciate some help.

Here is what it does: it's meant to monitor a bunch of systems (it reads in IPs one at a time from a flat file). For each IP, it fetches a set of web pages, parses them to extract certain numbers, compares them against defined thresholds, and alerts if a metric falls outside the threshold range. The catch is that certain metrics require the last 5 values observed, so I store those in a flat file, and every time a new value is retrieved from the web page, it is compared against the threshold together with the stored values.

Basically, I am doing everything sequentially in 2 loops: one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. I wanted to see if there was a way to improve this. Intuitively, I feel that because all historical values are stored in a single flat file, something like multiprocessing would not work, since one process would have that file locked. Any ideas?

Thanks,
-p

prafulnama


05-07-2010


As the level of complexity increases, it begins to make more sense to use a database to manage the changing state of the environment. Maybe look into something simple to start with, like Berkeley DB.

avronius


05-07-2010


One major thing to look at: child process creation. Try to use shell builtins instead of a lot of backtick ` ` (or $( ... )) constructions.

You can also store your flat file in a variable, so you read it only once:

Code:

flatfile=$(< /path/to/my/flatfile)

Then you can step through the records or create arrays of the data.
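As a rough sketch of the array approach, assuming bash 4 or later (for the `mapfile` builtin): the temporary file and its contents below are made up purely so the example runs on its own.

```bash
#!/usr/bin/env bash
# Hypothetical flat file, created here only for illustration.
flatfile_path=$(mktemp)
printf '%s\n' '10.0.0.1' '10.0.0.2' '10.0.0.3' > "$flatfile_path"

# Read the file exactly once; each line becomes one array element (bash 4+).
mapfile -t records < "$flatfile_path"
rm -f "$flatfile_path"

# Step through the records in memory; no child process is started per line.
for i in "${!records[@]}"; do
    echo "record $i: ${records[i]}"
done
```

After this, every later pass over the data works on `records` in memory instead of reopening the file.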

jim mcnamara


05-07-2010


Quote:

Originally Posted by prafulnama

The catch is for certain metrics, it requires the last 5 values that it observed so I store those in a flat file and every time a new value is retrieved from the web page, that along with the stored values are used to compare against the threshold. Basically, I am doing everything sequentially so 2 loops, one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. I wanted to see if there was a way to improve this?

It would help to see the actual code.

Quote:

Intuitively, I feel, because all historical values are stored in a single flat file, something like multi processing would not work since, a process would have that file locked. Any ideas?

Most systems don't do that kind of locking unless you explicitly ask for it. But having two processes simultaneously read the same file handle wouldn't be a great idea, they might each get half a line or somesuch. If you're just reading flat files line by line, you could try a 'reader' script that reads everything for them and parcels them out individually. That'd have some extra overhead for the extra process and its pipes, but would let more than one reader operate at once.
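Here is a simplified variant of that idea, offered only as a sketch. Instead of feeding workers through pipes, a single dispatcher reads the list and passes each line as an argument to a background job. The `check_host` function and the per-host history files are assumptions for illustration; giving each worker its own file means no two processes ever write to the same one.

```bash
#!/usr/bin/env bash
# Hypothetical worker: a real one would fetch and check the metrics for a host.
# Each worker appends only to its own history file, so no locking is needed.
check_host() {
    local host=$1 histdir=$2
    echo "checked $host" >> "$histdir/$host.history"
}

workdir=$(mktemp -d)
printf '%s\n' '10.0.0.1' '10.0.0.2' '10.0.0.3' > "$workdir/hosts"

# The dispatcher is the only reader of the host list; it parcels out
# one host per background job.
while read -r host; do
    [ -z "$host" ] && continue
    check_host "$host" "$workdir" &
done < "$workdir/hosts"

wait    # block until every background worker has finished

done_count=$(cat "$workdir"/*.history | grep -c checked)
echo "$done_count hosts checked"
rm -rf "$workdir"
```

With many hosts you would probably want to cap how many jobs run at once, which this sketch does not attempt.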

I'll need to see your actual code to help you here, I think, or at least some of it. What needs to be optimized depends not just on what you're doing, but how you're doing it. If you're new to shell scripting, there are some trivial design mistakes that could be causing slowdowns... excessive use of pipes and/or backticks is particularly bad. If you've got pipe chains on almost every line, there's probably much room for improvement. In my early scripting days I wrote a linewrapper in BASH that fed everything through about 9 sub-processes; it ended up processing at 10 kilobytes per second!

Last edited by Corona688; 05-07-2010 at 03:02 PM.. Reason: fix inexplicable doublepost

Corona688


05-11-2010


Thanks a lot, everyone. I do seem to have a very large number of backticks. I would appreciate help in eliminating them, and any other way of improving performance.

Code:

#!/bin/bash
#Retrieve a list of proxies and compare specified metrics against their threshold values. Alert as required.

#Path to the list of proxies
proxylist="proxylist2"

#Path to the list of URLs, metrics and thresholds
metriclist="metriclist1"

#Path to the proxy history file
proxyhistory="proxyhistory1"

#Parse through the list of proxies and check the specified metrics
while true
do
while read line
do
    if [ "$line" ]
    then

        #Ping the machine to check status.
        ping -c 2 $line > /dev/null 2>&1
        status=`echo $?`

        if [ $status -eq 0 ]
        then
            #Retrieve device name using SNMP
            a=`snmpget $line system.sysName.0`
            set -- $a
            devicename=`echo $6`
        #echo "DEVICE - $devicename"

            #Read in a list of URLs, metrics and thresholds and apply them one at a time for each proxy
            while read line1
            do
                if [ "$line1" ]
                then
                    len=`echo $line1 | wc -w`
                    len1=$[len-1]
                    len1a=$[len-2]
                    set -- $line1
                    alertlevel=$1
                    url=$2
                    url1=$2
                    threshold=`echo $line1 | cut -d ' ' -f $len`
                    
                    
                    if [ "$threshold" != "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1`
                    elif [ "$threshold" == "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1a`
                        rate=`echo $line1 | cut -d ' ' -f $len1`
                    #echo "Rate - $rate"
                    fi

                    #Completing the URL
                    url="https://$line:8082$url"
                #echo "URL - $url"
                
                    #Retrieve the metric value(s) from the URL 
                    value=`retrievemetric1.sh "$url" "$metric"`
                #echo "VALUE = $value"

                    #If the threshold is explicitly defined
                    if [ "$threshold" != "RATE" ]
                    then

                        #Compare the metric value(s) against corresponding thresholds and alert if required
                        len2=`echo $value | wc -w`

                        for (( i = 1; i <= $len2 ; i++))
                        do
                            value1=`echo $value | cut -d ' ' -f $i`
                            check=`thresholdexceed1.sh "$value1" "$threshold"`
                            if [ "$check" == "true" ]
                            then
                                echo "$alertlevel - $devicename -> $metric - $value1. Threshold: $threshold"
                                echo "-------------------------------------------------------------------------------------"
                            fi
                        done

                    #If the threshold is rate based
                    elif [ "$threshold" == "RATE" ]
                    then
                        #Number of values from the URL
                        len2=`echo $value | wc -w`    
            
                        #Flag to check if all values are reflected in history file
                        stringMissing="false"

                        #Check if all values present in history file
                        for (( i = 1 ; i <= $len2 ; i++ ))
                        do
                            stringa="$devicename-$url1-$metric-$i-a"
                            stringb="$devicename-$url1-$metric-$i-b"
                            stringc="$devicename-$url1-$metric-$i-c"
                            stringd="$devicename-$url1-$metric-$i-d"
                            stringe="$devicename-$url1-$metric-$i-e"

                            if ! grep "$stringa" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringb" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringc" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringd" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringe" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                        done
                    
                        #If a value is missing, delete all the ones that are present for that metric
                        if [ "$stringMissing" == "true" ]
                        then
                            grep -v "$devicename-$url1-$metric-" "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"

                            #Create all the required strings and initialize them
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a $val"
                                stringb="$devicename-$url1-$metric-$i-b 0"
                                stringc="$devicename-$url1-$metric-$i-c 0"
                                stringd="$devicename-$url1-$metric-$i-d 0"
                                stringe="$devicename-$url1-$metric-$i-e 0"
            
                                echo "$stringa" >> "$proxyhistory"
                                echo "$stringb" >> "$proxyhistory"
                                echo "$stringc" >> "$proxyhistory"
                                echo "$stringd" >> "$proxyhistory"
                                echo "$stringe" >> "$proxyhistory"

                            done    
                    
                        #If all the required strings are present
                        elif [ "$stringMissing" == "false" ]
                        then
                    
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a"
                                stringb="$devicename-$url1-$metric-$i-b"
                                stringc="$devicename-$url1-$metric-$i-c"
                                stringd="$devicename-$url1-$metric-$i-d"
                                stringe="$devicename-$url1-$metric-$i-e"


                                vala=`grep "$stringa[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valb=`grep "$stringb[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valc=`grep "$stringc[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vald=`grep "$stringd[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vale=`grep "$stringe[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                    
                            #echo "VALA - $vala"
                            #echo "VALB - $valb"
                            #echo "VALC - $valc"
                            #echo "VALD - $vald"
                            #echo "VALE - $vale"

                                if [ $vala -eq 0 ]
                                then
                                    echo "$stringa $val" >> "$proxyhistory"
                                    
                                elif [ $vala -ne 0 ] && [ $valb -eq 0 ]
                                then
                                    grep -v "$stringb" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringb $val" >> "$proxyhistory"
                            
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -eq 0 ]
                                then
                                    grep -v "$stringc" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringc $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -eq 0 ]
                                then
                                    grep -v "$stringd" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringd $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -eq 0 ]
                                then
                                    grep -v "$stringe" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringe $val" >> "$proxyhistory"

                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -ne 0 ]
                                then                    
                                    #threshold1=$[rate*-1]
                                    threshold1=$(printf "%s\n" "scale = 2; $rate*-1" | bc)
                                    threshold2=$rate

                                    diff1=$(printf "%s\n" "scale = 4; (($valb-$vala)/$vala)*100" | bc)
                                    diff2=$(printf "%s\n" "scale = 4; (($valc-$valb)/$valb)*100" | bc)
                                    diff3=$(printf "%s\n" "scale = 4; (($vald-$valc)/$valc)*100" | bc)
                                    diff4=$(printf "%s\n" "scale = 4; (($vale-$vald)/$vald)*100" | bc)
                                    diff5=$(printf "%s\n" "scale = 4; (($val-$vale)/$vale)*100" | bc)

                                #echo "DIFF1 - $diff1"
                                #echo "DIFF2 - $diff2"
                                #echo "DIFF3 - $diff3"
                                #echo "DIFF4 - $diff4"
                                #echo "DIFF5 - $diff5"

                                    overThreshold1=`echo "$diff1 > $threshold2" | bc`
                                    underThreshold1=`echo "$diff1 < $threshold1" | bc`
                                    overThreshold2=`echo "$diff2 > $threshold2" | bc`
                                    underThreshold2=`echo "$diff2 < $threshold1" | bc`
                                    overThreshold3=`echo "$diff3 > $threshold2" | bc`
                                    underThreshold3=`echo "$diff3 < $threshold1" | bc`
                                    overThreshold4=`echo "$diff4 > $threshold2" | bc`
                                    underThreshold4=`echo "$diff4 < $threshold1" | bc`
                                    overThreshold5=`echo "$diff5 > $threshold2" | bc`
                                    underThreshold5=`echo "$diff5 < $threshold1" | bc`

                                #echo "TH1 - $overThreshold1, $underThreshold1"
                                #echo "TH2 - $overThreshold2, $underThreshold2"
                                #echo "TH3 - $overThreshold3, $underThreshold3"
                                #echo "TH4 - $overThreshold4, $underThreshold4"
                                #echo "TH5 - $overThreshold5, $underThreshold5"

                                    thresh1="false"
                                    thresh2="false"
                                    thresh3="false"
                                    thresh4="false"
                                    thresh5="false"

                                    if [ $overThreshold1 -ne 0 ] || [ $underThreshold1 -ne 0 ]
                                    then
                                        thresh1="true"
                                    fi
                                    if [ $overThreshold2 -ne 0 ] || [ $underThreshold2 -ne 0 ]
                                    then
                                        thresh2="true"
                                    fi
                                    if [ $overThreshold3 -ne 0 ] || [ $underThreshold3 -ne 0 ]
                                    then
                                        thresh3="true"
                                    fi
                                    if [ $overThreshold4 -ne 0 ] || [ $underThreshold4 -ne 0 ]
                                    then
                                        thresh4="true"
                                    fi
                                    if [ $overThreshold5 -ne 0 ] || [ $underThreshold5 -ne 0 ]
                                    then
                                        thresh5="true"
                                    fi

                                    if [ "$thresh1" == "true" ] && [ "$thresh2" == "true" ] && [ "$thresh3" == "true" ] && [ "$thresh4" == "true" ] && [ "$thresh5" == "true" ]
                                    then
                                        echo "$alertlevel - $devicename -> $metric - $diff1%, $diff2%, $diff3%, $diff4%, $diff5%. Threshold: $threshold1% to $threshold2%"
                                        echo "-------------------------------------------------------------------------------------"
                                    fi
                                                            
                                    grep -v "$devicename-$url1-$metric-$i-" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"

                                    stringa="$devicename-$url1-$metric-$i-a $valb"
                                    stringb="$devicename-$url1-$metric-$i-b $valc"
                                    stringc="$devicename-$url1-$metric-$i-c $vald"
                                    stringd="$devicename-$url1-$metric-$i-d $vale"
                                    stringe="$devicename-$url1-$metric-$i-e $val"

                                    echo "$stringa" >> "$proxyhistory"
                                    echo "$stringb" >> "$proxyhistory"
                                    echo "$stringc" >> "$proxyhistory"
                                    echo "$stringd" >> "$proxyhistory"
                                    echo "$stringe" >> "$proxyhistory"

                                fi
                            done
                            sort "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"
                        fi

                    fi    
                fi
            done < "$metriclist"
        else
            echo "$line did not respond to PING"
        fi
        echo "***********************************************************************************"    
    fi
done < "$proxylist"
done

prafulnama


05-11-2010


Wow, yeah... whenever you have

Code:

something="`echo $variable`"

just do

Code:

something="$variable"

Also,

Code:

ping -c 2 $line > /dev/null 2>&1
status=`echo $?`
if [ $status -eq 0 ]
then
...

can just be

Code:

if ping -c 2 $line > /dev/null 2>&1
then
...

Also, I'm not entirely sure what this line is doing:

Code:

if [ "$line" ]

...but if you're guarding against blank lines:

Code:

if [ ! -z "${line}" ]
...

Or better yet, do this. It will skip blank lines without another layer of nested if at all:

Code:

[ -z "${line}" ] && continue
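To show the flat version in context, here is a small self-contained loop. The here-document stands in for the proxy list and is made up for illustration.

```bash
#!/usr/bin/env bash
processed=0
while read -r line; do
    # Blank line: jump straight to the next iteration, no nested if needed.
    [ -z "$line" ] && continue
    processed=$((processed + 1))
    echo "would check: $line"
done <<'EOF'
10.0.0.1

10.0.0.2
EOF
echo "processed $processed lines"
```

The rest of the loop body stays at one indentation level, which helps a lot in a script with this much nesting.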

Constructs like these are extremely slow since they can run cut uncountable numbers of times.

Code:

value1=`echo $value | cut -d ' ' -f $i`

Instead, since you're using a shell that supports arrays, just split it into an array once then use the array. This should split fine on spaces:

Code:

ARRAY=( $value )

...

value1="${ARRAY[i-1]}"   # bash arrays start at 0, while cut -f fields start at 1

You can also split on other characters by changing the IFS variable but be aware that this affects read too.
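A minimal sketch of both points, with made-up sample values. It uses `read -a` rather than the bare `( $value )` form, which is a substitution on my part: it splits the same way on whitespace but does not expand glob characters. Putting `IFS=','` in front of `read` changes the separator for that single command only, so later reads are unaffected.

```bash
#!/usr/bin/env bash
value="12 7 42"

# Split once on whitespace; read -a fills a zero-based array.
read -r -a fields <<< "$value"

# Keep the 1-based loop counter from the original script, adjust the index.
for (( i = 1; i <= ${#fields[@]}; i++ )); do
    value1=${fields[i-1]}
    echo "field $i = $value1"
done

# IFS set as a prefix applies to this one read command only.
csv="a,b,c"
IFS=',' read -r -a parts <<< "$csv"
echo "${#parts[@]} parts, second is ${parts[1]}"
```

No `cut` or `wc` process is started anywhere in this loop; `${#fields[@]}` replaces the `wc -w` call as well.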

You're running grep many, many times per loop. This is slow. Instead of

Code:

if ! grep string1 file ; then str=no ; fi
if ! grep string2 file ; then str=no ; fi
...

try

Code:

HAS1=0
HAS2=0
HAS3=0
HAS4=0
while read TESTLINE
do
        [[ "${TESTLINE}" =~ $string1 ]] && HAS1=1
        [[ "${TESTLINE}" =~ $string2 ]] && HAS2=1
        [[ "${TESTLINE}" =~ $string3 ]] && HAS3=1
        [[ "${TESTLINE}" =~ $string4 ]] && HAS4=1
done < file
OKAY=0
[ "$HAS1" -gt 0 ] && [ "$HAS2" -gt 0 ] && [ "${HAS3}" -gt 0 ] && [ "${HAS4}" -gt 0 ] && OKAY=1

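This reads the file only once and doesn't execute four extra processes. Note that the =~ regular expression operator is a bash extension and is not available in plain sh. Because the unquoted right-hand side is treated as a regular expression, characters such as . in the strings act as wildcards rather than literal characters.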
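Applied to the history-file check, a single-pass version might look like the sketch below. It swaps the regex match for a literal substring match (`== *"..."*`), since the keys contain characters like `/` and `.`. The key names and file contents are invented for the example.

```bash
#!/usr/bin/env bash
# Hypothetical history file with two of the three expected keys present.
histfile=$(mktemp)
printf '%s\n' 'dev1-/stats-cpu-1-a 10' 'dev1-/stats-cpu-1-b 0' > "$histfile"

wanted=( 'dev1-/stats-cpu-1-a' 'dev1-/stats-cpu-1-b' 'dev1-/stats-cpu-1-c' )
found=( 0 0 0 )

# One pass over the file, testing every wanted key against each line.
while read -r testline; do
    for idx in "${!wanted[@]}"; do
        # The quoted part inside *...* is matched literally, not as a pattern.
        [[ $testline == *"${wanted[idx]}"* ]] && found[idx]=1
    done
done < "$histfile"
rm -f "$histfile"

string_missing=false
for flag in "${found[@]}"; do
    [ "$flag" -eq 0 ] && string_missing=true
done
echo "string_missing=$string_missing"
```

Keeping the keys in an array also removes the need for five separately named variables per metric.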

Whenever you have VAR=`something | grep something | grep something | grep something` that's an enormous performance waster, and likely possible with shell built-ins, though exactly how depends on what bits you want to get.
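One possible built-in replacement for the `grep ... | grep -o` lookups, sketched under two assumptions: bash 4 or later (for associative arrays), and history keys that contain no whitespace, so that `read` can split each line into a key and a value. The file contents are made up.

```bash
#!/usr/bin/env bash
# Hypothetical history file in "key value" form.
histfile=$(mktemp)
printf '%s\n' 'dev1-/stats-cpu-1-a 10' 'dev1-/stats-cpu-1-b 25' > "$histfile"

# Load the whole file into an associative array once (bash 4+).
declare -A hist
while read -r key val; do
    [ -z "$key" ] && continue
    hist[$key]=$val
done < "$histfile"
rm -f "$histfile"

# Lookups are now plain parameter expansions; a missing key defaults to 0.
key_a='dev1-/stats-cpu-1-a'
key_c='dev1-/stats-cpu-1-c'
vala=${hist[$key_a]:-0}
valc=${hist[$key_c]:-0}
echo "vala=$vala valc=$valc"
```

If the metric names can contain spaces, the key format would need a different separator before this approach works.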

...and so forth and so forth. Your script is enormous. You might want to break it into functions so you can tell what's happening where. Functions are easy:

Code:

function myfunc
{
  echo $1 $2
}

myfunc a b

They act like processes in that they return numbers, not strings, read from stdin, and write to stdout/stderr. But they can set global variables (as long as they're not behind a pipe).
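A small sketch of those three properties. The function name `exceeds` and the variable `excess` are made up for the example.

```bash
#!/usr/bin/env bash
# Succeeds (status 0) if $1 is greater than $2; stores the difference
# in a global variable as a side effect.
exceeds() {
    excess=$(( $1 - $2 ))      # global, visible to the caller
    [ "$1" -gt "$2" ]          # this test's status becomes the return status
}

# The numeric return status can be used directly by if.
if exceeds 95 80; then
    echo "over threshold by $excess"
fi

# Behind a pipe the function runs in a subshell, so the caller's
# copy of excess is left untouched.
excess=unset
echo | exceeds 95 80
echo "after pipe: excess=$excess"
```

This is why something like `thresholdexceed1.sh` could become a function: the `if` can test its status directly, with no backticks and no `"true"` string comparison.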

The Advanced Bash-Scripting Guide is a nice reference.

Corona688


05-11-2010


You can also keep the historical data manageable by tailing the file. Log all values into a single file, such as history.log.
At the beginning of the log file processing, execute:

Code:

tail history.log > fileToProcess.log

This will give you a smaller file from which to get your historical data. The size of your history.log file will not matter: tail prints the last 10 lines by default, so your processing file will always contain the last 10 entries.
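Since the original script only needs the last 5 values, the count can be given explicitly with `-n`. A quick sketch with generated sample data (the file names follow the example above, the contents are made up):

```bash
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Build a hypothetical 25-line history log.
for (( n = 1; n <= 25; n++ )); do
    echo "sample $n"
done > "$workdir/history.log"

# Keep only the 5 most recent entries for processing.
tail -n 5 "$workdir/history.log" > "$workdir/fileToProcess.log"

kept=$(grep -c '' "$workdir/fileToProcess.log")
first=$(head -n 1 "$workdir/fileToProcess.log")
echo "kept $kept lines, oldest is: $first"
rm -rf "$workdir"
```

Note that this keeps the last lines of the whole file, so it fits best when each metric is logged to its own file.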

dunkar70
