First, I know that's a bad title. I couldn't think of anything short enough. ...
I wrote the following script to let me know when various parts of the network are down. It runs from cron every 5 minutes. This is how it looked until last weekend, when I got over 500 emails about a single host that was down all weekend:
Code:
# Script to ping the nodes listed in /home/scripts/watch.
# Supposed to email me when one does not respond.
while read -r HOST; do
    # Ping the host and count the lines returned (should be 9 for success, 4 for failure).
    live=$(ping -c4 "$HOST" | wc -l)
    if [ "$live" -eq 4 ]; then
        # Send a fancy email. printf expands \n the same way in every shell; echo does not.
        printf 'This is an automatically generated email to let you know that %s has not responded to a scheduled ping.\n\n%s\n\n%s\n\n%s\n' \
            "$HOST" "$(date)" "$(ping -c1 "$HOST")" "$(traceroute "$HOST")" |
            mail -s "IPwatch $HOST Down!" email@address.com
    fi
done < /home/scripts/watch # read IPs from this file
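
The line count in the check above depends on the exact output format of ping, which differs between versions. A minimal sketch of the same up/down test using ping's exit status instead, assuming a ping that exits non-zero when no replies arrive (iputils ping on Linux does). The names host_is_up, check_hosts and the PING override are made up for illustration and are not part of the script above:

```shell
# Sketch only: decide up/down from ping's exit status instead of counting lines.
# host_is_up, check_hosts and PING are hypothetical names for this example.

# Succeeds (returns 0) if the host answered the pings, fails otherwise.
# PING can be set to another command, e.g. for testing; defaults to ping.
host_is_up() {
    ${PING:-ping} -c4 "$1" >/dev/null 2>&1 </dev/null
}

# Read host names from the file given as $1 and print the ones that are down.
check_hosts() {
    while read -r host; do
        [ -n "$host" ] || continue      # skip blank lines
        host_is_up "$host" || echo "$host"
    done < "$1"
}

# Usage: check_hosts /home/scripts/watch
```

Each host printed by check_hosts could then be fed to the same mail command as before.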