Quote:
Originally Posted by
Corona688
Let me put it this way: What function does leaving them down perform for you? Does restarting them prevent you from debugging them?
Well, like I said, everything is in a pool behind a load balancer. An app that fails is a broken app. Maybe it was just a glitch and a restart would fix it, but maybe not. Maybe the app on that host is actually broken. If you restart it, the load balancer will keep sending traffic to it, and those customers will be affected. To me that's a huge negative.
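To make that concrete, here is a rough Python sketch of the two policies. It is a toy model, not our actual setup: the node names, the `Node` class, and `failed_requests` are all made up for illustration, and it assumes the load balancer's health check only looks at whether the process is running.

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    running: bool = True   # process is up, so the (assumed) health check passes
    broken: bool = False   # app is up but answers requests with errors

def failed_requests(pool, total_requests):
    """Round-robin requests over the running nodes and count the failures."""
    in_rotation = [n for n in pool if n.running]
    failures = 0
    for i in range(total_requests):
        node = in_rotation[i % len(in_rotation)]
        if node.broken:
            failures += 1
    return failures

def make_pool():
    # Hypothetical three-node pool where app3 is genuinely broken.
    return [Node("app1"), Node("app2"), Node("app3", broken=True)]

# Policy A: auto-restart. app3 comes back up, is still broken, stays in rotation.
restarted = make_pool()
print(failed_requests(restarted, 300))   # 100

# Policy B: leave it down. The balancer drops app3 and the rest take the load.
left_down = make_pool()
left_down[2].running = False
print(failed_requests(left_down, 300))   # 0
```

With a smarter health check the numbers would look different, but the point stands: a restarted node that is still broken keeps getting its share of the traffic.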
jim mcnamera said "sysadmins push for maximal uptime. This is what they are paid for". Where I work it's all about SLAs. If I'm sending customers to a malfunctioning node (or worse, sending the SLA monitor to a malfunctioning node), we take a hit on the SLA. Big no-no here. So, yes, uptime is what we're paid for, but uptime for the service as a whole, not for the individual app instances running behind it. Plus, what if you had an app that crashes once a day, but you auto-restart it and end up with 99.99% uptime? Would you seriously consider that a success?
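For a sense of scale, here is the arithmetic behind that 99.99% figure as a quick Python sketch. The function name is just something I made up, and it assumes uptime is measured over a single 24-hour day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def downtime_budget_seconds(uptime_percent, period_seconds=SECONDS_PER_DAY):
    """Seconds of downtime allowed in the period at the given uptime percentage."""
    return period_seconds * (1 - uptime_percent / 100)

budget = downtime_budget_seconds(99.99)
print(f"{budget:.2f} seconds per day")  # 8.64 seconds per day
```

So an app that crashes once a day and is back within about 8.6 seconds still reports 99.99% uptime for that day, even though it crashes every single day.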
MG