Possible performance improvement (Bash and flat file)
Hello,
I am pretty new to shell scripts and recently wrote one that seems to do what it should, but I would like to improve its performance and would appreciate some help.
Here is what it does: it is meant to monitor a bunch of systems, reading in IPs one at a time from a flat file. For each IP, it fetches a set of web pages, parses them to extract certain numbers, compares them against defined thresholds, and alerts if a metric falls outside the threshold range. The catch is that certain metrics require the last 5 observed values, so I store those in a flat file; every time a new value is retrieved from the web page, it is compared against the threshold together with the stored values.
Basically, I am doing everything sequentially in 2 nested loops: the outer one reads in the IPs and the inner one does the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. Is there a way to improve this? Intuitively, I feel that because all historical values are stored in a single flat file, something like multiprocessing would not work, since one process would have that file locked. Any ideas?
As the level of complexity increases, it begins to make more sense to use a database to manage the changing state of the environment. Maybe look into something simple to start with, like Berkeley DB.
Quote:
The catch is for certain metrics, it requires the last 5 values that it observed so I store those in a flat file and every time a new value is retrieved from the web page, that along with the stored values are used to compare against the threshold. Basically, I am doing everything sequentially so 2 loops, one to read in the IP and the next to do the web page download, threshold check, etc. Every time a new IP is added or a new metric needs to be monitored, the time taken to loop back to a machine increases. I wanted to see if there was a way to improve this?
It would help to see the actual code.
Quote:
Intuitively, I feel, because all historical values are stored in a single flat file, something like multi processing would not work since, a process would have that file locked. Any ideas?
Most systems don't do that kind of locking unless you explicitly ask for it. But having two processes simultaneously read the same file handle wouldn't be a great idea, they might each get half a line or somesuch. If you're just reading flat files line by line, you could try a 'reader' script that reads everything for them and parcels them out individually. That'd have some extra overhead for the extra process and its pipes, but would let more than one reader operate at once.
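A different route from the 'reader' idea, sketched here only as an assumption about how the work could be split: run each proxy's checks as a background job and give every proxy its own history file, so no two jobs ever touch the same file. check_proxy, check_all and the history directory are made-up names, and the body of check_proxy is a placeholder for the real ping/SNMP/web page work.

```bash
#!/bin/bash
# Sketch: check proxies in parallel, one background job per proxy.
# check_proxy is a hypothetical stand-in for the per-proxy work.

histdir=$(mktemp -d)    # assumption: one history file per proxy lives here

check_proxy() {
    local ip=$1
    # Placeholder for the real work (ping, snmpget, page fetch, thresholds).
    printf '%s checked\n' "$ip" >> "$histdir/$ip.history"
}

check_all() {
    local list=$1 ip
    while read -r ip
    do
        [ -n "$ip" ] || continue      # skip blank lines
        check_proxy "$ip" &           # run this proxy's checks in the background
    done < "$list"
    wait                              # block until every background job has finished
}
```

Calling check_all "$proxylist" once per pass would replace the inner sequential loop. With a long proxy list it would probably be wise to cap how many jobs run at once.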
I'll need to see your actual code to help you here, I think, or at least some of it. What needs to be optimized depends not just on what you're doing, but how you're doing it. If you're new to shell scripting there are some trivial design mistakes that could be causing slowdowns... excessive use of pipes and/or backticks is particularly bad, since each one starts at least one extra process. If you've got pipe chains on almost every line, there's probably much room for improvement. In my early scripting days I wrote a linewrapper in BASH that fed everything through about 9 sub-processes; it ended up processing at 10 kilobytes per second!
Last edited by Corona688; 05-07-2010 at 03:02 PM..
Reason: fix inexplicable doublepost
Thanks a lot everyone. I do seem to have a very large number of backticks. I would appreciate help in eliminating them, and any other way of improving performance.
Code:
#!/bin/bash
#Retrieve a list of proxies and compare specified metrics against their threshold values. Alert as required.
#Path to the list of proxies
proxylist="proxylist2"
#Path to the list of URLs, metrics and thresholds
metriclist="metriclist1"
#Path to the proxy history file
proxyhistory="proxyhistory1"
#Parse through the list of proxies and check the specified metrics
while true
do
while read line
do
if [ "$line" ]
then
#Ping the machine to check status.
ping -c 2 $line > /dev/null 2>&1
status=`echo $?`
if [ $status -eq 0 ]
then
#Retrieve device name using SNMP
a=`snmpget $line system.sysName.0`
set -- $a
devicename=`echo $6`
#echo "DEVICE - $devicename"
#Read in a list of URLs, metrics and thresholds and apply them one at a time for each proxy
while read line1
do
if [ "$line1" ]
then
len=`echo $line1 | wc -w`
len1=$[len-1]
len1a=$[len-2]
set -- $line1
alertlevel=$1
url=$2
url1=$2
threshold=`echo $line1 | cut -d ' ' -f $len`
if [ "$threshold" != "RATE" ]
then
metric=`echo $line1 | cut -d ' ' -f 3-$len1`
elif [ "$threshold" == "RATE" ]
then
metric=`echo $line1 | cut -d ' ' -f 3-$len1a`
rate=`echo $line1 | cut -d ' ' -f $len1`
#echo "Rate - $rate"
fi
#Completing the URL
url="https://$line:8082$url"
#echo "URL - $url"
#Retrieve the metric value(s) from the URL
value=`retrievemetric1.sh "$url" "$metric"`
#echo "VALUE = $value"
#If the threshold is explicitly defined
if [ "$threshold" != "RATE" ]
then
#Compare the metric value(s) against corresponding thresholds and alert if required
len2=`echo $value | wc -w`
for (( i = 1; i <= $len2 ; i++))
do
value1=`echo $value | cut -d ' ' -f $i`
check=`thresholdexceed1.sh "$value1" "$threshold"`
if [ "$check" == "true" ]
then
echo "$alertlevel - $devicename -> $metric - $value1. Threshold: $threshold"
echo "-------------------------------------------------------------------------------------"
fi
done
#If the threshold is rate based
elif [ "$threshold" == "RATE" ]
then
#Number of values from the URL
len2=`echo $value | wc -w`
#Flag to check if all values are reflected in history file
stringMissing="false"
#Check if all values present in history file
for (( i = 1 ; i <= $len2 ; i++ ))
do
stringa="$devicename-$url1-$metric-$i-a"
stringb="$devicename-$url1-$metric-$i-b"
stringc="$devicename-$url1-$metric-$i-c"
stringd="$devicename-$url1-$metric-$i-d"
stringe="$devicename-$url1-$metric-$i-e"
if ! grep "$stringa" "$proxyhistory" > /dev/null
then
stringMissing="true"
fi
if ! grep "$stringb" "$proxyhistory" > /dev/null
then
stringMissing="true"
fi
if ! grep "$stringc" "$proxyhistory" > /dev/null
then
stringMissing="true"
fi
if ! grep "$stringd" "$proxyhistory" > /dev/null
then
stringMissing="true"
fi
if ! grep "$stringe" "$proxyhistory" > /dev/null
then
stringMissing="true"
fi
done
#If a value is missing, delete all the ones that are present for that metric
if [ "$stringMissing" == "true" ]
then
grep -v "$devicename-$url1-$metric-" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
#Create all the required strings and initialize them
for (( i = 1 ; i <= $len2 ; i++ ))
do
val=`echo $value | cut -d ' ' -f $i`
stringa="$devicename-$url1-$metric-$i-a $val"
stringb="$devicename-$url1-$metric-$i-b 0"
stringc="$devicename-$url1-$metric-$i-c 0"
stringd="$devicename-$url1-$metric-$i-d 0"
stringe="$devicename-$url1-$metric-$i-e 0"
echo "$stringa" >> "$proxyhistory"
echo "$stringb" >> "$proxyhistory"
echo "$stringc" >> "$proxyhistory"
echo "$stringd" >> "$proxyhistory"
echo "$stringe" >> "$proxyhistory"
done
#If all the required strings are present
elif [ "$stringMissing" == "false" ]
then
for (( i = 1 ; i <= $len2 ; i++ ))
do
val=`echo $value | cut -d ' ' -f $i`
stringa="$devicename-$url1-$metric-$i-a"
stringb="$devicename-$url1-$metric-$i-b"
stringc="$devicename-$url1-$metric-$i-c"
stringd="$devicename-$url1-$metric-$i-d"
stringe="$devicename-$url1-$metric-$i-e"
vala=`grep "$stringa[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
valb=`grep "$stringb[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
valc=`grep "$stringc[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
vald=`grep "$stringd[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
vale=`grep "$stringe[[:space:]]*[0-9]" "$proxyhistory" | grep -o '[0-9]*$'`
#echo "VALA - $vala"
#echo "VALB - $valb"
#echo "VALC - $valc"
#echo "VALD - $vald"
#echo "VALE - $vale"
if [ $vala -eq 0 ]
then
echo "$stringa $val" >> "$proxyhistory"
elif [ $vala -ne 0 ] && [ $valb -eq 0 ]
then
grep -v "$stringb" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
echo "$stringb $val" >> "$proxyhistory"
elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -eq 0 ]
then
grep -v "$stringc" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
echo "$stringc $val" >> "$proxyhistory"
elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -eq 0 ]
then
grep -v "$stringd" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
echo "$stringd $val" >> "$proxyhistory"
elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -eq 0 ]
then
grep -v "$stringe" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
echo "$stringe $val" >> "$proxyhistory"
elif [ $vala -ne 0 ] && [ $valb -ne 0 ] && [ $valc -ne 0 ] && [ $vald -ne 0 ] && [ $vale -ne 0 ]
then
#threshold1=$[rate*-1]
threshold1=$(printf "%s\n" "scale = 2; $rate*-1" | bc)
threshold2=$rate
diff1=$(printf "%s\n" "scale = 4; (($valb-$vala)/$vala)*100" | bc)
diff2=$(printf "%s\n" "scale = 4; (($valc-$valb)/$valb)*100" | bc)
diff3=$(printf "%s\n" "scale = 4; (($vald-$valc)/$valc)*100" | bc)
diff4=$(printf "%s\n" "scale = 4; (($vale-$vald)/$vald)*100" | bc)
diff5=$(printf "%s\n" "scale = 4; (($val-$vale)/$vale)*100" | bc)
#echo "DIFF1 - $diff1"
#echo "DIFF2 - $diff2"
#echo "DIFF3 - $diff3"
#echo "DIFF4 - $diff4"
#echo "DIFF5 - $diff5"
overThreshold1=`echo "$diff1 > $threshold2" | bc`
underThreshold1=`echo "$diff1 < $threshold1" | bc`
overThreshold2=`echo "$diff2 > $threshold2" | bc`
underThreshold2=`echo "$diff2 < $threshold1" | bc`
overThreshold3=`echo "$diff3 > $threshold2" | bc`
underThreshold3=`echo "$diff3 < $threshold1" | bc`
overThreshold4=`echo "$diff4 > $threshold2" | bc`
underThreshold4=`echo "$diff4 < $threshold1" | bc`
overThreshold5=`echo "$diff5 > $threshold2" | bc`
underThreshold5=`echo "$diff5 < $threshold1" | bc`
#echo "TH1 - $overThreshold1, $underThreshold1"
#echo "TH2 - $overThreshold2, $underThreshold2"
#echo "TH3 - $overThreshold3, $underThreshold3"
#echo "TH4 - $overThreshold4, $underThreshold4"
#echo "TH5 - $overThreshold5, $underThreshold5"
thresh1="false"
thresh2="false"
thresh3="false"
thresh4="false"
thresh5="false"
if [ $overThreshold1 -ne 0 ] || [ $underThreshold1 -ne 0 ]
then
thresh1="true"
fi
if [ $overThreshold2 -ne 0 ] || [ $underThreshold2 -ne 0 ]
then
thresh2="true"
fi
if [ $overThreshold3 -ne 0 ] || [ $underThreshold3 -ne 0 ]
then
thresh3="true"
fi
if [ $overThreshold4 -ne 0 ] || [ $underThreshold4 -ne 0 ]
then
thresh4="true"
fi
if [ $overThreshold5 -ne 0 ] || [ $underThreshold5 -ne 0 ]
then
thresh5="true"
fi
if [ "$thresh1" == "true" ] && [ "$thresh2" == "true" ] && [ "$thresh3" == "true" ] && [ "$thresh4" == "true" ] && [ "$thresh5" == "true" ]
then
echo "$alertlevel - $devicename -> $metric - $diff1%, $diff2%, $diff3%, $diff4%, $diff5%. Threshold: $threshold1% to $threshold2%"
echo "-------------------------------------------------------------------------------------"
fi
grep -v "$devicename-$url1-$metric-$i-" "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
stringa="$devicename-$url1-$metric-$i-a $valb"
stringb="$devicename-$url1-$metric-$i-b $valc"
stringc="$devicename-$url1-$metric-$i-c $vald"
stringd="$devicename-$url1-$metric-$i-d $vale"
stringe="$devicename-$url1-$metric-$i-e $val"
echo "$stringa" >> "$proxyhistory"
echo "$stringb" >> "$proxyhistory"
echo "$stringc" >> "$proxyhistory"
echo "$stringd" >> "$proxyhistory"
echo "$stringe" >> "$proxyhistory"
fi
done
sort "$proxyhistory" > "temp"
mv "temp" "$proxyhistory"
fi
fi
fi
done < "$metriclist"
else
echo "$line did not respond to PING"
fi
echo "***********************************************************************************"
fi
done < "$proxylist"
done
This reads the file only once and doesn't execute four extra processes. Note that the =~ regular expression operator only works in bash.
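As a rough illustration of reading the history file once with =~ (a sketch written for this thread, not necessarily the snippet the reply had in mind): the loop below collects the stored values for one metric in a single pass and uses no grep at all. It assumes history lines have the form "<prefix>-<index>-<slot> <number>" with slots a to e, as written by the script above; load_history and hist are made-up names, and the associative array needs bash 4 or later.

```bash
#!/bin/bash
# Sketch: load the stored values for one metric in a single pass over the file.
# Assumed line format: "<prefix>-<index>-<slot> <number>", slot is a..e.

declare -A hist     # associative array (bash 4+), keys look like "1-a"

load_history() {
    local prefix=$1 file=$2 line
    hist=()
    while IFS= read -r line
    do
        # The quoted prefix is matched literally; the rest is a regex.
        if [[ $line =~ ^"$prefix"-([0-9]+)-([a-e])[[:space:]]+([0-9]+)$ ]]
        then
            hist["${BASH_REMATCH[1]}-${BASH_REMATCH[2]}"]=${BASH_REMATCH[3]}
        fi
    done < "$file"
}
```

After load_history "$devicename-$url1-$metric" "$proxyhistory", a value such as ${hist[1-a]} is available without starting any further processes, and an empty ${hist[1-a]} means that entry is missing from the file.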
Whenever you have VAR=`something | grep something | grep something | grep something` that's an enormous performance waster, and likely possible with shell built-ins, though exactly how depends on what bits you want to get.
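For example, the echo | wc -w and echo | cut calls that split each metric-list line could probably be replaced by a bash array. This is a sketch under an assumed line layout inferred from the script above (alert level, URL, metric words, an optional rate, then the threshold or the word RATE); parse_metric_line is a made-up name that sets the same variables the script uses.

```bash
#!/bin/bash
# Sketch: split one metric-list line with built-ins instead of echo | cut | wc.
# Assumed layout: alertlevel url metric-words... [rate] threshold

parse_metric_line() {
    local -a f
    read -r -a f <<< "$1"             # split on whitespace into an array
    local n=${#f[@]}                  # word count, replaces: echo $line1 | wc -w
    alertlevel=${f[0]}
    url=${f[1]}
    threshold=${f[n-1]}               # last word
    rate=
    if [ "$threshold" = "RATE" ]
    then
        rate=${f[n-2]}                # second-to-last word
        metric="${f[*]:2:n-4}"        # words 3 .. n-2, joined with spaces
    else
        metric="${f[*]:2:n-3}"        # words 3 .. n-1, joined with spaces
    fi
}
```

One call to parse_metric_line "$line1" would stand in for the five backtick pipelines at the top of the inner loop, all without leaving the shell.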
...and so forth and so forth. Your script is enormous. You might want to break it into functions so you can tell what's happening where. Functions are easy:
Code:
# Define a function; $1 and $2 are its arguments
function myfunc
{
echo "$1" "$2"
}
# Call it like any other command
myfunc a b
They act like processes in that they return numbers, not strings, read from stdin, and write to stdout/stderr. But unlike processes they can set global variables (as long as they're not behind a pipe).
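A small sketch of both points, with made-up names (in_range, result): the function reports success or failure through its exit status, passes data back through a global variable, and loses that variable when it runs behind a pipe.

```bash
#!/bin/bash
# Sketch: exit status as the "return value", a global variable for data.

in_range() {
    local value=$1 low=$2 high=$3
    result="$value checked against $low..$high"           # global, visible to the caller
    [ "$value" -ge "$low" ] && [ "$value" -le "$high" ]   # last status becomes the return value
}

if in_range 42 10 50
then
    echo "ok: $result"            # prints: ok: 42 checked against 10..50
fi

# Behind a pipe the function runs in a subshell, so the assignment is lost.
result=
echo ignored | in_range 42 10 50
echo "after pipe: '$result'"      # prints: after pipe: ''
```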
You can also keep the historical data manageable by tailing the file. Log all values into a single file, such as history.log
At the beginning of the log file processing, execute:
Code:
tail history.log > fileToProcess.log
This will give you a smaller file from which to get your historical data. The size of your history.log file will not matter: tail prints the last 10 lines by default, so your processing file will always contain only the 10 most recent entries.
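Since the original question only needs the last 5 values per metric, the same idea could be narrowed with tail -n 5. The sketch below assumes one log file per metric, which is not how the posted script stores its history; record_value and last_five are made-up names.

```bash
#!/bin/bash
# Sketch: append-only log per metric, read back only the newest five values.

record_value() {
    local log=$1 value=$2
    printf '%s\n' "$value" >> "$log"      # append only, the file is never rewritten
}

last_five() {
    local log=$1
    tail -n 5 "$log"                      # -n 5 overrides tail's default of 10 lines
}
```

Appending and tailing this way would avoid the grep -v / mv / sort rewrites of the history file on every pass, at the cost of logs that grow until they are trimmed.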