Hi guys, not sure if this is the right place for this, but I don't know where else it would go... I'm new to Unix too, so please bear with me.
First up, some background on the situation. We have some scripts that run as cron jobs and monitor the health, etc. of our servers. They are configured to alert us by email whenever a server hits an exception, looks critical, or goes down.
This is all well and good, but at the moment we are having some issues and the servers' stability has been rocky, so the scripts are sending A LOT of emails. I arrived this morning to 250 alerts in my inbox.
Now granted, this is for about 8 different servers, but that's still a lot of noise. We even get alerted that a server is down each time we manually bounce it. The team is at the point where they are basically ignoring all these emails, which is not a good thing, because when something really bad does happen we'll miss it in the noise.
Now, my question is kind of two-fold. First up, is there a way we can make these scripts "smarter" so they only alert us when a certain threshold of errors is reached over a certain time period or something?
Secondly, rather than spamming our inboxes with emails, is there a way to publish alerts to a webpage or a widget or something, so that only the person who is on support that particular day/week can log on and see the server activity (rather than all of us getting emails)?
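To make the first question a bit more concrete, here is a rough sketch of the kind of "threshold over a time period" check I have in mind. It is only an illustration, not what our scripts do today: the function name `should_alert`, the state file path, and the numbers are all made up, and I'm assuming the check could be written in Python. Since each cron run is a fresh process, the sketch keeps the recent error timestamps in a small JSON file between runs.

```python
import json
import time


def should_alert(state_path, threshold, window_seconds, now=None):
    """Record one error; return True once `threshold` errors fall inside the window.

    state_path, threshold and window_seconds are hypothetical parameters
    chosen for this sketch, not settings from any existing tool.
    """
    now = time.time() if now is None else now

    # Load the timestamps saved by earlier cron runs (start empty if none).
    try:
        with open(state_path) as f:
            stamps = json.load(f)
    except (FileNotFoundError, ValueError):
        stamps = []

    # Keep only errors that are still inside the time window, then add this one.
    stamps = [t for t in stamps if now - t <= window_seconds]
    stamps.append(now)

    # Save the state for the next run.
    with open(state_path, "w") as f:
        json.dump(stamps, f)

    return len(stamps) >= threshold
```

So a monitoring script would call something like `should_alert("/tmp/server1_errors.json", 3, 600)` each time it sees an error, and only send the email when that returns True, i.e. on the third error within 10 minutes. Note that it would keep returning True for every further error inside the window, so some kind of "already alerted" flag would probably be needed too.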