Opinion on auto-restart of failed apps/services


 
# 1  
Old 05-18-2011

I'm part of a dying breed where I work. More and more sys admins are advocating automatically restarting failed services such as Tomcat, JBoss, etc. I've always been against doing this, except for buggy apps that can't be fixed or avoided.

My main argument is that I feel it's a trick used by lazy sys admins who don't want to troubleshoot their apps. Almost everything we have that is customer facing is behind a load balancer (we have a lot of customers). If the LB is properly configured, it will pull a node out of the rotation if it fails a health check. If the pool is sized properly, it will have at least n+1 servers running and should be able to handle the load if one node dies or is removed. I feel we should let the app fail, alert on it, remove it from the pool, and troubleshoot it to find out why. Turn up a new node to take its place if necessary. If the bad app is auto-restarted and it is indeed bad, we will continue to route customers to it, and it could negatively affect them.
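The flow described above (fail a health check, leave the rotation, stay out until someone looks) can be sketched in a few lines of Python. This is only a toy model, not any particular load balancer's behaviour: the `Pool` class, the node names, and the `nodes_needed` figure are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pool:
    """Toy model of a load-balanced pool; every name here is hypothetical."""
    nodes_needed: int                                  # the "n" in n+1
    in_rotation: set = field(default_factory=set)      # nodes receiving traffic
    quarantined: set = field(default_factory=set)      # nodes pulled for troubleshooting

    def record_health_check(self, node: str, healthy: bool) -> None:
        # A failed check pulls the node out of rotation and keeps it out;
        # a passing check on an in-rotation node changes nothing.
        if not healthy and node in self.in_rotation:
            self.in_rotation.remove(node)
            self.quarantined.add(node)

    def has_capacity(self) -> bool:
        # True while the remaining nodes can still carry the load.
        return len(self.in_rotation) >= self.nodes_needed


# Four nodes for a load that needs three: an n+1 pool.
pool = Pool(nodes_needed=3, in_rotation={"app1", "app2", "app3", "app4"})
pool.record_health_check("app2", healthy=False)
print(sorted(pool.in_rotation), sorted(pool.quarantined), pool.has_capacity())
# ['app1', 'app3', 'app4'] ['app2'] True
```

With one node quarantined the pool still has capacity; a second failure would drop it below `nodes_needed`, which is the point at which turning up a replacement node stops being optional.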

They argue that "apps just fail" and that we should restart them asap to keep them up and servicing customers.

I'm starting to feel like the old geezer of the group and these damn kids won't get off my lawn. If you wouldn't mind, please let me know your take on this. I'm not looking for everyone to agree with me and I'm not against changing my views. They just haven't provided a good argument.

Thanks,

MG
# 2  
Old 05-18-2011
Let me put it this way: What function does leaving them down perform for you? Does restarting them prevent you from debugging them?
# 3  
Old 05-18-2011
Sysadmins push for maximal uptime. This is what they are paid for:

system availability
data security
# 4  
Old 05-20-2011
Quote:
Originally Posted by Corona688
Let me put it this way: What function does leaving them down perform for you? Does restarting them prevent you from debugging them?
Well, like I said, everything is in a pool, behind a load balancer. An app that fails is a broken app. Maybe it was just a glitch and a restart would remedy it, but maybe it's not. Maybe the app on that host is actually broken. If you restart it, the load balancer will continue to send traffic to it and those customers will be affected. To me that's a huge negative.

jim mcnamera said "sysadmins push for maximal uptime. This is what they are paid for". Where I work it's all about SLAs. If I'm sending customers to a malfunctioning node (or worse, sending the SLA monitor to a malfunctioning node), we take a hit on the SLA. Big no-no here. So, yes, uptime is what we're paid for, but uptime for the service as a whole, not for the individual app instances running behind it. Plus, what if you had an app that crashes once a day but you auto-restart it, resulting in 99.99% uptime? Would you seriously consider that a success?
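As a rough illustration of that last point, here is how little downtime 99.99% actually allows. The 24-hour measurement window and the helper name are assumptions for the example, not anything taken from a real SLA.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # assumed measurement window: one day


def allowed_downtime_seconds(uptime_percent: float,
                             period_seconds: int = SECONDS_PER_DAY) -> float:
    """Downtime budget, in seconds, for a given uptime percentage."""
    return period_seconds * (100 - uptime_percent) / 100


print(round(allowed_downtime_seconds(99.99), 2))  # 8.64
```

So an app that crashes once a day and comes back in under about 8.6 seconds still reports 99.99% uptime, even though it is failing every single day.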

MG
# 5  
Old 05-22-2011
I don't know why you quoted me; you didn't answer either question.
Quote:
Originally Posted by Corona688
What function does leaving them down perform for you?
Quote:
Originally Posted by Corona688
Does restarting them prevent you from debugging them?
# 6  
Old 05-22-2011
The hard bit can be detecting that the application has failed. Just relying on the output from a single "ps" command is not safe because a busy system may give a blank or incomplete response to a "ps" command.
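One way around a single unreliable check is to require several consecutive failed probes before declaring the application down. The sketch below assumes a `probe` callable that returns True when the service looks alive; what the probe actually does (a process lookup, a TCP connect, an application-level request) is left to the caller, and the attempt count and delay are arbitrary.

```python
import time
from typing import Callable


def is_down(probe: Callable[[], bool],
            attempts: int = 3,
            delay_seconds: float = 2.0) -> bool:
    """Report failure only if every one of `attempts` probes fails."""
    for attempt in range(attempts):
        if probe():
            return False              # one good answer means it is alive
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # give a busy system time to respond
    return True
```

A blank or incomplete answer from a loaded machine then costs one retry instead of triggering a needless restart.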

To paraphrase Corona688, there is no harm in installing a workaround while you find and repair the root cause ... or determine that the root cause cannot be repaired.

If, for example, you have a client-server application running on an unreliable network (like the Internet), there is a good case for configuring a client retry mechanism, backed by a carefully designed automatic client restart and a matching dead-session cleanup in the server.
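A minimal sketch of the client retry part only, using exponential backoff. The function name, the attempt limit, and the delays are assumptions for illustration; the client restart and the server-side dead-session cleanup are not shown.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(request: Callable[[], T],
                    max_attempts: int = 4,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying on ConnectionError with doubling delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise                                # out of attempts: let the caller see it
            sleep(base_delay * 2 ** (attempt - 1))   # 0.5s, 1s, 2s, ...
    raise RuntimeError("unreachable")
```

Only `ConnectionError` is retried here; any other exception propagates immediately, so a genuine application bug is not hidden behind retries.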
# 7  
Old 05-22-2011
Uptime is gold.

I am a strong advocate of always having a watchdog process in place to watch all critical services and restart them if they go down.
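A bare-bones version of such a watchdog might look like the sketch below. The restart cap is an addition for the example (so that a service that keeps dying is eventually left down for a human), as are the function name and the defaults; any exit of the child, clean or not, is treated as a failure.

```python
import subprocess
import sys
import time


def watch(command, max_restarts=3, restart_delay=1.0):
    """Run `command`, restarting it each time it exits, up to `max_restarts`.

    Returns the number of restarts performed.
    """
    restarts = 0
    proc = subprocess.Popen(command)
    while True:
        exit_code = proc.wait()                      # block until the service exits
        print(f"service exited with code {exit_code}", file=sys.stderr)
        if restarts >= max_restarts:
            print("restart budget exhausted; leaving it down", file=sys.stderr)
            return restarts
        restarts += 1
        time.sleep(restart_delay)                    # avoid a tight crash loop
        proc = subprocess.Popen(command)


if __name__ == "__main__":
    # Demo: a "service" that fails immediately, restarted twice.
    watch([sys.executable, "-c", "raise SystemExit(1)"],
          max_restarts=2, restart_delay=0.1)
```

Logging every exit code keeps the evidence around, so restarting the service does not rule out debugging it afterwards.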

Debugging why processes fail is another topic and certainly should not be used as an excuse to shave uptime down.