Possible performance improvement (Bash and flat file)


 
# 1  
Old 05-07-2010

Hello,

I am pretty new to shell scripts. I recently wrote one that seems to do what it should, but I would like to improve its performance and would appreciate some help.

Here is what it does. It's meant to monitor a bunch of systems, reading in IPs one at a time from a flat file. For each IP, it fetches a set of web pages, parses them to extract certain numbers, compares them against defined thresholds, and alerts if a metric falls outside the threshold range. The catch is that certain metrics require the last 5 observed values, so I store those in a flat file; every time a new value is retrieved from the web page, it is compared against the threshold together with the stored values.

Basically, I am doing everything sequentially in 2 loops: one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. Is there a way to improve this? Intuitively, I feel that because all historical values are stored in a single flat file, something like multiprocessing would not work, since a process would have that file locked. Any ideas?

Thanks,
-p
# 2  
Old 05-07-2010
As the level of complexity increases, it begins to make more sense to use a database to manage the changing state of the environment. Maybe look into something simple to start with, like Berkeley DB.
# 3  
Old 05-07-2010
One major thing to look at: child process creation. Try to use shell builtins instead of
a lot of backtick (` `) or $( ... ) constructions.

You can also store your flat file in a variable, so you read it only once:
Code:
flatfile=$(< /path/to/my/flatfile)

Then you can step thru the records or create arrays of the data.
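As a rough illustration of "create arrays of the data", here is a minimal sketch that loads a file into an array once and then loops over it in memory. The helper name load_lines, the array name records, and the sample addresses are made up for the example, and mapfile needs bash 4 or newer.

```bash
#!/bin/bash
# Sketch: load a flat file into an array once, then loop over it in memory.
# load_lines and records are hypothetical names; mapfile needs bash 4+.

load_lines() {
    # $1 = path to the file; fills the global array "records", one line each
    mapfile -t records < "$1"
}

# Throwaway file standing in for the real proxy list
demo_file=$(mktemp)
printf '%s\n' "10.0.0.1" "" "10.0.0.2" > "$demo_file"

load_lines "$demo_file"
rm -f "$demo_file"

for ip in "${records[@]}"; do
    [ -z "$ip" ] && continue      # skip blank lines
    echo "would check $ip"
done
```

The file is opened a single time; every later pass works on the in-memory copy, so no extra processes or file reads are needed.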
# 4  
Old 05-07-2010
Quote:
Originally Posted by prafulnama
The catch is for certain metrics, it requires the last 5 values that it observed so I store those in a flat file and every time a new value is retrieved from the web page, that along with the stored values are used to compare against the threshold. Basically, I am doing everything sequentially so 2 loops, one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. I wanted to see if there was a way to improve this?
It would help to see the actual code.
Quote:
Intuitively, I feel, because all historical values are stored in a single flat file, something like multi processing would not work since, a process would have that file locked. Any ideas?
Most systems don't do that kind of locking unless you explicitly ask for it. But having two processes simultaneously read from the same file handle wouldn't be a great idea; they might each get half a line or some such. If you're just reading flat files line by line, you could try a 'reader' script that reads everything for them and parcels the lines out individually. That would add some overhead for the extra process and its pipes, but would let more than one reader operate at once.

I'll need to see your actual code to help you here, I think, or at least some of it. What needs to be optimized depends not just on what you're doing, but how you're doing it. If you're new to shell scripting, there are some trivial design mistakes that could be causing slowdowns... excessive use of pipes and/or backticks is particularly bad. If you've got pipe chains on almost every line, there's probably much room for improvement. In my early scripting days I wrote a linewrapper in BASH that fed everything through about 9 sub-processes; it ended up processing at 10 kilobytes per second!
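One way to sidestep the shared-file worry altogether, offered here only as a sketch and not something proposed in the thread, is to give each host its own history file and run the per-host work as background jobs. check_host, the directory layout, and the addresses are all assumptions made for the example.

```bash
#!/bin/bash
# Sketch: one background job per host, each writing only to its own history
# file, so no two processes ever touch the same file. check_host is a
# hypothetical stand-in for the real ping/fetch/threshold work.

histdir=$(mktemp -d)

check_host() {
    local host=$1
    # ... fetch pages and compare thresholds for "$host" here ...
    echo "checked $host" >> "$histdir/$host.history"
}

for host in 10.0.0.1 10.0.0.2 10.0.0.3; do
    check_host "$host" &          # run this host's checks in the background
done
wait                              # block until every background job is done

ls "$histdir"
```

With many hosts you would probably want to cap the number of simultaneous jobs, but the basic idea stays the same: a cycle takes roughly as long as the slowest host rather than the sum of all hosts.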

Last edited by Corona688; 05-07-2010 at 03:02 PM.. Reason: fix inexplicable doublepost
# 5  
Old 05-11-2010
Thanks a lot, everyone. I do seem to have a very large number of backticks. I would appreciate help in eliminating them, and any other way of improving performance.

Code:
#!/bin/bash
#Retrieve a list of proxies and compare specified metrics against their threshold values. Alert as required.

#Path to the list of proxies
proxylist="proxylist2"

#Path to the list of URLs, metrics and thresholds
metriclist="metriclist1"

#Path to the proxy history file
proxyhistory="proxyhistory1"

#Parse through the list of proxies and check the specified metrics
while true
do
while read line
do
    if [ "$line" ]
    then

        #Ping the machine to check status.
        ping -c 2 $line > /dev/null 2>&1
        status=`echo $?`

        if [ $status -eq 0 ]
        then
            #Retrieve device name using SNMP
            a=`snmpget $line system.sysName.0`
            set -- $a
            devicename=`echo $6`
        #echo "DEVICE - $devicename"

            #Read in a list of URLs, metrics and thresholds and apply them one at a time for each proxy
            while read line1
            do
                if [ "$line1" ]
                then
                    len=`echo $line1 | wc -w`
                    len1=$[len-1]
                    len1a=$[len-2]
                    set -- $line1
                    alertlevel=$1
                    url=$2
                    url1=$2
                    threshold=`echo $line1 | cut -d ' ' -f $len`
                    
                    
                    if [ "$threshold" != "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1`
                    elif [ "$threshold" == "RATE" ]
                    then
                        metric=`echo $line1 | cut -d ' ' -f 3-$len1a`
                        rate=`echo $line1 | cut -d ' ' -f $len1`
                    #echo "Rate - $rate"
                    fi

                    #Completing the URL
                    url="https://$line:8082$url"
                #echo "URL - $url"
                
                    #Retrieve the metric value(s) from the URL 
                    value=`retrievemetric1.sh "$url" "$metric"`
                #echo "VALUE = $value"

                    #If the threshold is explicitly defined
                    if [ "$threshold" != "RATE" ]
                    then

                        #Compare the metric value(s) against corresponding thresholds and alert if required
                        len2=`echo $value | wc -w`

                        for (( i = 1; i <= $len2 ; i++))
                        do
                            value1=`echo $value | cut -d ' ' -f $i`
                            check=`thresholdexceed1.sh "$value1" "$threshold"`
                            if [ "$check" == "true" ]
                            then
                                echo "$alertlevel - $devicename -> $metric - $value1. Threshold: $threshold"
                                echo "-------------------------------------------------------------------------------------"
                            fi
                        done

                    #If the threshold is rate based
                    elif [ "$threshold" == "RATE" ]
                    then
                        #Number of values from the URL
                        len2=`echo $value | wc -w`    
            
                        #Flag to check if all values are reflected in history file
                        stringMissing="false"

                        #Check if all values present in history file
                        for (( i = 1 ; i <= $len2 ; i++ ))
                        do
                            stringa="$devicename-$url1-$metric-$i-a"
                            stringb="$devicename-$url1-$metric-$i-b"
                            stringc="$devicename-$url1-$metric-$i-c"
                            stringd="$devicename-$url1-$metric-$i-d"
                            stringe="$devicename-$url1-$metric-$i-e"

                            if ! grep "$stringa" "$proxyhistory" > /dev/null
                            then
                                stringMissing="true"
                            fi
                            if ! grep "$stringb" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi
                            if ! grep "$stringc" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                            if ! grep "$stringd" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                            if ! grep "$stringe" "$proxyhistory" > /dev/null
                                                 then
                                                         stringMissing="true"
                                                 fi    
                        done
                    
                        #If a value is missing, delete all the ones that are present for that metric
                        if [ "$stringMissing" == "true" ]
                        then
                            grep -v "$devicename-$url1-$metric-" "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"

                            #Create all the required strings and initialize them
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a $val"
                                stringb="$devicename-$url1-$metric-$i-b 0"
                                stringc="$devicename-$url1-$metric-$i-c 0"
                                stringd="$devicename-$url1-$metric-$i-d 0"
                                stringe="$devicename-$url1-$metric-$i-e 0"
            
                                echo "$stringa" >> "$proxyhistory"
                                echo "$stringb" >> "$proxyhistory"
                                echo "$stringc" >> "$proxyhistory"
                                echo "$stringd" >> "$proxyhistory"
                                echo "$stringe" >> "$proxyhistory"

                            done    
                    
                        #If all the required strings are present
                        elif [ "$stringMissing" == "false" ]
                        then
                    
                            for (( i = 1 ; i <= $len2 ; i++ ))
                            do
                                val=`echo $value | cut -d ' ' -f $i`
                                stringa="$devicename-$url1-$metric-$i-a"
                                stringb="$devicename-$url1-$metric-$i-b"
                                stringc="$devicename-$url1-$metric-$i-c"
                                stringd="$devicename-$url1-$metric-$i-d"
                                stringe="$devicename-$url1-$metric-$i-e"


                                vala=`grep "$stringa[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valb=`grep "$stringb[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                valc=`grep "$stringc[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vald=`grep "$stringd[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                                vale=`grep "$stringe[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
                    
                            #echo "VALA - $vala"
                            #echo "VALB - $valb"
                            #echo "VALC - $valc"
                            #echo "VALD - $vald"
                            #echo "VALE - $vale"

                                if [ $vala -eq 0 ]
                                then
                                    echo "$stringa $val" >> "$proxyhistory"
                                    
                                elif [ $vala -ne 0 ] && [ $valb -eq 0 ]
                                then
                                    grep -v "$stringb" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringb $val" >> "$proxyhistory"
                            
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -eq 0 ]
                                then
                                    grep -v "$stringc" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringc $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -eq 0 ]
                                then
                                    grep -v "$stringd" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringd $val" >> "$proxyhistory"
                                
                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -eq 0 ]
                                then
                                    grep -v "$stringe" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"
                                    echo "$stringe $val" >> "$proxyhistory"

                                elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -ne 0 ]
                                then                    
                                    #threshold1=$[rate*-1]
                                    threshold1=$(printf "%s\n" "scale = 2; $rate*-1" | bc)
                                    threshold2=$rate

                                    diff1=$(printf "%s\n" "scale = 4; (($valb-$vala)/$vala)*100" | bc)
                                    diff2=$(printf "%s\n" "scale = 4; (($valc-$valb)/$valb)*100" | bc)
                                    diff3=$(printf "%s\n" "scale = 4; (($vald-$valc)/$valc)*100" | bc)
                                    diff4=$(printf "%s\n" "scale = 4; (($vale-$vald)/$vald)*100" | bc)
                                    diff5=$(printf "%s\n" "scale = 4; (($val-$vale)/$vale)*100" | bc)

                                #echo "DIFF1 - $diff1"
                                #echo "DIFF2 - $diff2"
                                #echo "DIFF3 - $diff3"
                                #echo "DIFF4 - $diff4"
                                #echo "DIFF5 - $diff5"

                                    overThreshold1=`echo "$diff1 > $threshold2" | bc`
                                    underThreshold1=`echo "$diff1 < $threshold1" | bc`
                                    overThreshold2=`echo "$diff2 > $threshold2" | bc`
                                    underThreshold2=`echo "$diff2 < $threshold1" | bc`
                                    overThreshold3=`echo "$diff3 > $threshold2" | bc`
                                    underThreshold3=`echo "$diff3 < $threshold1" | bc`
                                    overThreshold4=`echo "$diff4 > $threshold2" | bc`
                                    underThreshold4=`echo "$diff4 < $threshold1" | bc`
                                    overThreshold5=`echo "$diff5 > $threshold2" | bc`
                                    underThreshold5=`echo "$diff5 < $threshold1" | bc`

                                #echo "TH1 - $overThreshold1, $underThreshold1"
                                #echo "TH2 - $overThreshold2, $underThreshold2"
                                #echo "TH3 - $overThreshold3, $underThreshold3"
                                #echo "TH4 - $overThreshold4, $underThreshold4"
                                #echo "TH5 - $overThreshold5, $underThreshold5"

                                    thresh1="false"
                                    thresh2="false"
                                    thresh3="false"
                                    thresh4="false"
                                    thresh5="false"

                                    if [ $overThreshold1 -ne 0 ] || [ $underThreshold1 -ne 0 ]
                                    then
                                        thresh1="true"
                                    fi
                                    if [ $overThreshold2 -ne 0 ] || [ $underThreshold2 -ne 0 ]
                                    then
                                        thresh2="true"
                                    fi
                                    if [ $overThreshold3 -ne 0 ] || [ $underThreshold3 -ne 0 ]
                                    then
                                        thresh3="true"
                                    fi
                                    if [ $overThreshold4 -ne 0 ] || [ $underThreshold4 -ne 0 ]
                                    then
                                        thresh4="true"
                                    fi
                                    if [ $overThreshold5 -ne 0 ] || [ $underThreshold5 -ne 0 ]
                                    then
                                        thresh5="true"
                                    fi

                                    if [ "$thresh1" == "true" ] && [ "$thresh2" == "true" ] && [ "$thresh3" == "true" ] && [ "$thresh4" == "true" ] && [ "$thresh5" == "true" ]
                                    then
                                        echo "$alertlevel - $devicename -> $metric - $diff1%, $diff2%, $diff3%, $diff4%, $diff5%. Threshold: $threshold1% to $threshold2%"
                                        echo "-------------------------------------------------------------------------------------"
                                    fi
                                                            
                                    grep -v "$devicename-$url1-$metric-$i-" "$proxyhistory" > "temp"
                                    mv "temp" "$proxyhistory"

                                    stringa="$devicename-$url1-$metric-$i-a $valb"
                                    stringb="$devicename-$url1-$metric-$i-b $valc"
                                    stringc="$devicename-$url1-$metric-$i-c $vald"
                                    stringd="$devicename-$url1-$metric-$i-d $vale"
                                    stringe="$devicename-$url1-$metric-$i-e $val"

                                    echo "$stringa" >> "$proxyhistory"
                                    echo "$stringb" >> "$proxyhistory"
                                    echo "$stringc" >> "$proxyhistory"
                                    echo "$stringd" >> "$proxyhistory"
                                    echo "$stringe" >> "$proxyhistory"

                                fi
                            done
                            sort "$proxyhistory" > "temp"
                            mv "temp" "$proxyhistory"
                        fi

                    fi    
                fi
            done < "$metriclist"
        else
            echo "$line did not respond to PING"
        fi
        echo "***********************************************************************************"    
    fi
done < "$proxylist"
done

# 6  
Old 05-11-2010
Wow, yeah... whenever you have

Code:
something="`echo $variable`"

just do
Code:
something="$variable"

Also,

Code:
ping -c 2 $line > /dev/null 2>&1
status=`echo $?`
if [ $status -eq 0 ]
then
...

can just be
Code:
if ping -c 2 $line > /dev/null 2>&1
then
...

Also, I'm not entirely sure what this line is doing:

Code:
if [ "$line" ]

...but if you're guarding against blank lines:

Code:
if [ ! -z "${line}" ]
...

Or better yet, do this. It will skip blank lines without another layer of nested if at all:

Code:
[ -z "${line}" ] && continue

Constructs like these are extremely slow, since every pass through the loop starts a new cut process:

Code:
value1=`echo $value | cut -d ' ' -f $i`

Instead, since you're using a shell that supports arrays, just split it into an array once then use the array. This should split fine on spaces:

Code:
ARRAY=( $value )

...

# bash arrays start at index 0, while cut -f starts at field 1
value1="${ARRAY[$((i-1))]}"

You can also split on other characters by changing the IFS variable but be aware that this affects read too.
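To make the array idea concrete, here is a small sketch that splits one metric-list line with a single builtin and sets IFS for that one read only, which avoids the side effect mentioned above. The sample line is invented; the field layout (alert level, URL, metric words, threshold last) is inferred from the posted script.

```bash
#!/bin/bash
# Sketch: split one metric-list line into an array with a single builtin.
# The sample line is made up; the layout (alert level, URL, metric words,
# threshold last) follows the script posted earlier in the thread.

line1="CRITICAL /stats/cpu CPU Utilization 90"

# IFS is set only for this one read, so the rest of the script is unaffected
IFS=' ' read -r -a fields <<< "$line1"

count=${#fields[@]}                  # replaces: echo $line1 | wc -w
alertlevel=${fields[0]}
url=${fields[1]}
threshold=${fields[count-1]}         # last field
metric="${fields[*]:2:count-3}"      # the words between URL and threshold

echo "$alertlevel | $url | $metric | $threshold"
```

This replaces the wc and cut calls for that line with zero child processes.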

You're running grep many, many times per loop. This is slow. Instead of
Code:
if ! grep string1 file ; then str=no ; fi
if ! grep string2 file ; then str=no ; fi
...

try
Code:
HAS1=0
HAS2=0
HAS3=0
HAS4=0
while read TESTLINE
do
        [[ "${TESTLINE}" =~ $string1 ]] && HAS1=1
        [[ "${TESTLINE}" =~ $string2 ]] && HAS2=1
        [[ "${TESTLINE}" =~ $string3 ]] && HAS3=1
        [[ "${TESTLINE}" =~ $string4 ]] && HAS4=1
done < file
OKAY=0
[ "$HAS1" -gt 0 ] && [ "$HAS2" -gt 0 ] && [ "${HAS3}" -gt 0 ] && [ "${HAS4}" -gt 0 ] && OKAY=1

This reads the file only once and doesn't execute four extra grep processes. Note that the =~ regular expression operator is a bash extension (bash 3.0 and later), so it will not work in a plain POSIX sh.
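Taking the "read the file only once" idea a step further, the history file could be loaded into an associative array, so that both the presence checks and the value lookups come from memory. This is only a sketch: it assumes bash 4 or newer, a "key value" line layout like the one in the posted script, and invented sample data.

```bash
#!/bin/bash
# Sketch: read the history file once into an associative array (bash 4+),
# then answer "is this key present?" and "what is its value?" from memory.
# The file contents are invented; "key value" per line follows the thread.

histfile=$(mktemp)
cat > "$histfile" <<'EOF'
proxy1-/stats-CPU-1-a 40
proxy1-/stats-CPU-1-b 42
proxy1-/stats-CPU-1-c 0
EOF

declare -A hist
while IFS= read -r entry; do
    [ -z "$entry" ] && continue
    # key = everything before the last space, value = everything after it
    hist["${entry% *}"]=${entry##* }
done < "$histfile"
rm -f "$histfile"

missing=false
for suffix in a b c d e; do
    key="proxy1-/stats-CPU-1-$suffix"
    # ${hist[$key]+set} expands to "set" only if the key exists
    [ -z "${hist[$key]+set}" ] && missing=true
done

key="proxy1-/stats-CPU-1-b"
echo "value b: ${hist[$key]}  missing: $missing"
```

Splitting on the last space keeps the lookup working even when the metric name inside the key contains spaces.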

Whenever you have VAR=`something | grep something | grep something | grep something` that's an enormous performance waster, and likely possible with shell built-ins, though exactly how depends on what bits you want to get.
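As one possible built-in replacement for a grep | grep -o chain that extracts a trailing number, parameter expansion can do the trimming without starting any process. The sample line below is invented and only mirrors the "key value" layout used in the posted script.

```bash
#!/bin/bash
# Sketch: pull the trailing number off a history line with parameter
# expansion instead of grep | grep -o. The sample line is invented.

entry="proxy1-/stats-CPU-1-a 40"

val=${entry##* }        # strip everything up to the last space -> value
key=${entry% *}         # strip the last space and what follows -> key

# Pattern test with a builtin: is the value made up of digits only?
case $val in
    ''|*[!0-9]*) echo "not a number: $val" ;;
    *)           echo "$key => $val" ;;
esac
```

Both expansions and the case statement are part of the shell itself, so this costs no forks at all.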

...and so forth and so forth. Your script is enormous. You might want to break it into functions so you can tell what's happening where. Functions are easy:

Code:
function myfunc
{
  echo $1 $2
}

myfunc a b

They act like processes in that they return numbers, not strings, and they read from stdin and write to stdout/stderr. But they can set global variables (as long as they're not behind a pipe).
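A short sketch of those three points: a numeric exit status used as a condition, a global variable set by a function, and the same assignment being lost behind a pipe. The function names exceeds and set_result are made up, and the pipe behaviour shown assumes bash with its default settings (the lastpipe option off).

```bash
#!/bin/bash
# Sketch: a function reports success or failure through its exit status and
# hands back data through stdout or a global variable. Names are made up.

exceeds() {
    # exit status 0 (true) if $1 is greater than $2; integers only
    [ "$1" -gt "$2" ]
}

set_result() {
    result="value was $1"     # global unless declared with 'local'
}

if exceeds 95 90; then
    echo "over threshold"
fi

set_result 1                  # runs in the current shell: result is kept
echo "$result"

echo ignored | set_result 2   # runs in a pipeline subshell: change is lost
echo "$result"
```

Because exceeds is a function and not a separate script, the threshold test costs no extra process, unlike calling an external script in backticks.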

The advanced bash scripting guide is a nice reference.
# 7  
Old 05-11-2010
You can also keep the historical data manageable by tailing the file. Log all values into a single file, such as history.log
At the beginning of the log file processing, execute:
Code:
tail history.log > fileToProcess.log

This will give you a smaller file from which to get your historical data. The size of your history.log file will not matter; since tail prints the last 10 lines by default, your processing file will never hold more than the last 10 entries.
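A variation on the same idea, sketched under the assumption of one small log per metric: append each new reading and trim the log to the last five lines, which matches the five values the original script keeps. The record helper, the temporary file, and the readings are invented for the example.

```bash
#!/bin/bash
# Sketch: append each new reading to a per-metric log and keep only the
# last five lines. The file name and the readings are invented.

log=$(mktemp)

record() {
    # $1 = log file, $2 = new value
    echo "$2" >> "$1"
    tail -n 5 "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

for reading in 10 20 30 40 50 60 70; do
    record "$log" "$reading"
done

cat "$log"
```

The oldest value simply falls off the top, so there is no need for the a/b/c/d/e suffixes or the repeated grep -v rewrites.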