False alerts


 
# 8  
Old 02-24-2017
Hi,

Also, could you please post the full crontab entry that's being used on the live server to run the script every five minutes (i.e. the entire line in the output of crontab -l on the live server that concerns the script in question)?

One other aside: you mentioned way back in your second response that your "Linux team" thinks it's a Nagios issue. If this server is being monitored by Nagios, or if it can be, then there are nrpe or check_mk plugins that can monitor server load directly without you having to write a script of your own. If you don't know much about them: nrpe and check_mk are pieces of software that run on a server monitored by Nagios and allow more complex checks than "is it up or down" to be carried out.

Whoever is responsible for the Nagios monitoring system at your site should be able to help you with that. Load monitoring is one of the most commonly-used Nagios plugins, so if you can run either nrpe or check_mk on this server that would definitely be the best way to go here, rather than rolling your own script.
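
For illustration only, here is a rough sketch of running the standard check_load plugin by hand. The plugin path and the threshold values are assumptions (the path differs between distributions), and note that check_load looks at load averages rather than CPU percentage, so it is not a drop-in copy of a sar-based check.

```shell
#!/bin/sh
# Assumed plugin location; adjust for your distribution (hypothetical default).
PLUGIN="${PLUGIN:-/usr/lib/nagios/plugins/check_load}"

# Runs check_load if it is installed; otherwise reports UNKNOWN (exit code 3),
# following the usual Nagios plugin exit code convention.
run_check_load() {
    if [ ! -x "$PLUGIN" ]; then
        echo "UNKNOWN: $PLUGIN not found or not executable"
        return 3
    fi
    # -w / -c take warning / critical thresholds for the
    # 1, 5 and 15 minute load averages (example values only).
    "$PLUGIN" -w 15,10,5 -c 30,25,20
}
```

Calling run_check_load prints the plugin's status line, and the return code follows the plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).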
This User Gave Thanks to drysdalk For This Post:
# 9  
Old 02-24-2017
Code:
#!/bin/sh
#description ...
THRESHOLD=90
ALERT="monitoringbox@abc.com"
TEMPFILE=/tmp/temp1
HOSTNAME=`hostname`
rm -f $TEMPFILE
CPU_LOAD=`sar -P ALL 10 1 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}' |cut -d \. -f1`
if [[ $CPU_LOAD > $THRESHOLD ]];
then
echo "CPU notification on $HOSTNAME is ${CPU_LOAD}% " `date`  >> $TEMPFILE
fi
if [ -e $TEMPFILE ]
then
mail -s "Check CPU usage on $HOSTNAME `date`  " $ALERT < $TEMPFILE
fi


Moderator's Comments:
Mod Comment Please use CODE tags as required by forum rules!

Last edited by RudiC; 02-24-2017 at 02:33 PM.. Reason: Added CODE tags.
# 10  
Old 02-24-2017
Hello,

On my own test system (which is running Ubuntu 16.04 LTS x86_64) this script does basically appear to work. At any rate, it certainly doesn't generate any false positives for me when run at the shell as a non-privileged user, and the values it's getting for load appear to be genuine and sensible.

So, if you could provide the full crontab entry from which the script is run, then it'll be possible to look at that as a source of the issues next.

Again though: if you do have a Nagios environment, using its own load monitoring plugins is a much, much better idea. Honestly, if you have a Nagios server and the ability to add your server to it, you're just re-inventing the wheel here for no real gain whatsoever by writing your own script.
These 2 Users Gave Thanks to drysdalk For This Post:
# 11  
Old 02-24-2017
Thank you, drysdalk.

Exactly, I wanted to make sure my thinking was correct: is my script interfering with Nagios in some way and causing the false alarms?

Maybe I need to take the script out of the cron job and run it from cron.hourly instead?

Would it be a good idea to add the script to cron.hourly so that it executes every 5 minutes?
# 12  
Old 02-24-2017
Hi,

I doubt a script this simple could cause any problems for Nagios. What monitoring is already configured in Nagios? Are these load alerts that you regard as false coming as e-mails from your script, or as alerts from Nagios? And can you please provide the crontab entry so it can be ruled out as a cause?
# 13  
Old 02-24-2017
Quote:
Originally Posted by anil529
Code:
#!/bin/sh
#description ...
THRESHOLD=90
ALERT="monitoringbox@abc.com"
TEMPFILE=/tmp/temp1
HOSTNAME=`hostname`
rm -f $TEMPFILE
CPU_LOAD=`sar -P ALL 10 1 |grep 'Average.*all' |awk -F" " '{print 100.0 -$NF}' |cut -d \. -f1`
if [[ $CPU_LOAD > $THRESHOLD ]];
then
echo "CPU notification on $HOSTNAME is ${CPU_LOAD}% " `date`  >> $TEMPFILE
fi
if [ -e $TEMPFILE ]
then
mail -s "Check CPU usage on $HOSTNAME `date`  " $ALERT < $TEMPFILE
fi

I'd suggest changing this:
Code:
if [[ $CPU_LOAD > $THRESHOLD ]];

to
Code:
if [ $CPU_LOAD -gt $THRESHOLD ];
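
The reason for the suggestion, as a small sketch (it needs bash because of [[ ]]; the function names are made up for the demo): inside [[ ]] the > operator compares strings in sort order, while -gt compares integers.

```shell
#!/bin/bash
# Inside [[ ]], '>' compares strings lexicographically; -gt compares integers.

# String comparison, as in the original test: "yes" if $1 sorts after $2.
string_gt() {
    if [[ $1 > $2 ]]; then echo yes; else echo no; fi
}

# Integer comparison, as suggested: "yes" if $1 is numerically larger than $2.
numeric_gt() {
    if [ "$1" -gt "$2" ]; then echo yes; else echo no; fi
}

echo "string:  100 > 90   -> $(string_gt 100 90)"
echo "numeric: 100 -gt 90 -> $(numeric_gt 100 90)"
echo "string:  9 > 10     -> $(string_gt 9 10)"
echo "numeric: 9 -gt 10   -> $(numeric_gt 9 10)"
```

As strings, "100" sorts before "90" and "9" sorts after "10", so the two forms disagree in both cases. With a threshold of 90, the string form would miss a load of 100.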

Also, I'd debug what the value of CPU_LOAD actually is at the time the email is sent.
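
One possible way to do that, as a sketch only: append every sample to a log file with a timestamp, so that an email can later be matched to the value that triggered it. The log path and the function name log_sample are assumptions, not part of the original script.

```shell
#!/bin/bash
# Assumed debug log location (hypothetical); override by setting LOGFILE.
LOGFILE="${LOGFILE:-/tmp/cpu_load_debug.log}"

# Appends one timestamped line per sample to $LOGFILE and prints the status.
# An empty or non-integer value is logged as INVALID instead of being compared.
log_sample() {
    local load="$1" threshold="$2" status
    case "$load" in
        ''|*[!0-9]*) status="INVALID" ;;
        *)
            if [ "$load" -gt "$threshold" ]; then
                status="ALERT"
            else
                status="OK"
            fi
            ;;
    esac
    printf '%s load=%s threshold=%s status=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$load" "$threshold" "$status" >> "$LOGFILE"
    echo "$status"
}

# Example call from the monitoring script:
#   log_sample "$CPU_LOAD" "$THRESHOLD"
```

Logging the INVALID case separately should make it visible if the sar pipeline ever produces no output or something that is not a whole number.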

Another question: you're doing this, and checking for the existence of the file later:
Code:
echo "CPU notification on $HOSTNAME is ${CPU_LOAD}% " `date`  >> $TEMPFILE

It sounds like after the FIRST triggered condition the file will be APPENDED to, and any subsequent run of the script will trigger an email.
Don't you want to remove the file AFTER the condition has been triggered?
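
One way to sidestep the temp file question altogether, sketched under the assumption that the mail only ever needs the single notification line: build the message and send it inside the threshold test itself. The function names build_alert and notify are made up for this sketch; THRESHOLD and ALERT are taken from the posted script.

```shell
#!/bin/bash
# Sketch only: no temp file, the mail is sent directly from the threshold test.
THRESHOLD=90
ALERT="monitoringbox@abc.com"

# Prints the alert text when the load exceeds the threshold, nothing otherwise.
build_alert() {
    local load="$1" host="$2"
    if [ "$load" -gt "$THRESHOLD" ]; then
        echo "CPU notification on $host is ${load}% $(date)"
    fi
}

# Sends a mail only when build_alert produced a message.
notify() {
    local msg
    msg=$(build_alert "$1" "$2")
    if [ -n "$msg" ]; then
        echo "$msg" | mail -s "Check CPU usage on $2" "$ALERT"
    fi
}

# Example call from the monitoring script:
#   notify "$CPU_LOAD" "$(hostname)"
```

Because nothing is written to disk, a leftover file from an earlier run cannot cause a later run to send an email.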
# 14  
Old 02-24-2017
Hi drysdalk

Quote:
Originally Posted by drysdalk
I doubt a script this simple could cause any problems for Nagios. What monitoring is already configured in Nagios? Are these load alerts that you regard as false coming as e-mails from your script, or as alerts from Nagios? And can you please provide the crontab entry so it can be ruled out as a cause?


Yes, Nagios is already monitoring load. Here are the crontab entries:
Code:
0,15,30,45 * * * * /unixmon/servermon.p > /dev/null 2>&1
*/5 * * * *  /etc/applicationMonitoring.sh

---------- Post updated at 03:04 PM ---------- Previous update was at 02:59 PM ----------

Quote:
Originally Posted by vgersh99
I'd suggest changing this:
Code:
if [[ $CPU_LOAD > $THRESHOLD ]];

to
Code:
if [ $CPU_LOAD -gt $THRESHOLD ];

Yes, I tried the -gt version and then thought I'd give > a try.
It did not help.


Also, I'd debug what the value of CPU_LOAD actually is at the time the email is sent.
The false alerts always report 1%; the real alerts were correct.

Another question: you're doing this, and checking for the existence of the file later:
Code:
echo "CPU notification on $HOSTNAME is ${CPU_LOAD}% " `date`  >> $TEMPFILE

Sounds like after the FIRST triggered condition, the file will be APPENDED to and any subsequent run of the script will trigger an email.
Don't you want to remove the file AFTER the condition has been triggered?
I have placed the rm command at the top of the script. Will that make a difference?

Moderator's Comments:
Mod Comment edit by bakunin: for chrissakes, how often have we told you now to use CODE-tags? grrrrrr....

Last edited by bakunin; 02-27-2017 at 06:35 AM..