05-18-2011
Opinion on auto-restart of failed apps/services
I'm becoming part of a dying breed where I work. More and more sys admins are advocating automatically restarting failed services such as Tomcat, JBoss, etc. I've always been against doing this except for buggy apps that can't be fixed or avoided.
My main argument is that I feel it's a trick used by lazy sys admins who don't want to troubleshoot their apps. Almost everything we have that is customer-facing is behind a load balancer (we have a lot of customers). If the LB is properly configured, it will pull a node out of the rotation if it fails a health check. If the pool is sized properly, it will have at least n+1 servers running and should be able to handle the load if one node dies or is removed. I feel we should let the app fail, alert on it, remove it from the pool, and troubleshoot it to find out why. Turn up a new node to take its place if necessary. If a bad app is auto-restarted and it really is still broken, we will keep routing customers to it, and that could hurt them.
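A minimal Python sketch of the "fail, alert, eject, troubleshoot" policy described above, assuming a pool that needs n nodes to carry the load and is sized at n+1. The Node and Pool classes, the health-check callback, and the alert list are hypothetical stand-ins for whatever the load balancer and monitoring system actually provide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    name: str
    in_rotation: bool = True  # False once the node has been ejected


@dataclass
class Pool:
    nodes: List[Node]
    min_nodes: int  # "n": nodes needed to carry the load; target size is n+1
    alerts: List[str] = field(default_factory=list)

    def active(self) -> List[Node]:
        return [node for node in self.nodes if node.in_rotation]

    def run_health_checks(self, check: Callable[[Node], bool]) -> None:
        for node in self.active():
            if not check(node):
                # Eject and alert; the node is NOT restarted, so it can be
                # troubleshot while customers are routed to the other nodes.
                node.in_rotation = False
                self.alerts.append(f"ALERT: {node.name} failed health check, removed from pool")

    def restore(self, name: str) -> None:
        # An operator puts a node back only after the root cause is fixed.
        for node in self.nodes:
            if node.name == name:
                node.in_rotation = True

    def needs_replacement(self) -> bool:
        # True when the pool has lost its spare (+1) node and a new one should be turned up.
        return len(self.active()) < self.min_nodes + 1


pool = Pool([Node("web1"), Node("web2"), Node("web3")], min_nodes=2)
pool.run_health_checks(lambda node: node.name != "web2")  # web2 fails its check
print([node.name for node in pool.active()])  # ['web1', 'web3']
print(pool.needs_replacement())               # True
```

The deliberate choice here is that run_health_checks never restarts anything: an ejected node stays out of rotation until restore() is called, even if later checks would pass. That is the opposite of the auto-restart approach, where a restarted but still-broken app goes straight back to serving traffic.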
They argue that "apps just fail" and that we should restart them ASAP to keep them up and servicing customers.
I'm starting to feel like the old geezer of the group and these damn kids won't get off my lawn. If you wouldn't mind, please let me know your take on this. I'm not looking for everyone to agree with me, and I'm not against changing my views. They just haven't provided a good argument.
Thanks,
MG
pool(3erl) Erlang Module Definition pool(3erl)
NAME
pool - Load Distribution Facility
DESCRIPTION
pool can be used to run a set of Erlang nodes as a pool of computational processors. It is organized as a master and a set of slave nodes
and includes the following features:
* The slave nodes send regular reports to the master about their current load.
* Queries can be sent to the master to determine which node will have the least load.
The BIF statistics(run_queue) is used for estimating future loads. It returns the length of the queue of ready-to-run processes in the
Erlang runtime system.
The slave nodes are started with the slave module. This affects tty IO, file IO, and code loading.
If the master node fails, the entire pool will exit.
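The master/slave load-reporting scheme above can be modeled in a few lines of Python. This is a toy model, not the Erlang implementation: the PoolMaster class and its report/get_node methods are hypothetical, and the model simply assumes that the node with the shortest last-reported run queue (the value statistics(run_queue) returns on that node) is the one expected to have the lowest future load.

```python
from typing import Dict


class PoolMaster:
    """Toy model of a pool master that picks the least-loaded node."""

    def __init__(self) -> None:
        # node name -> run-queue length from that node's most recent report
        self.loads: Dict[str, int] = {}

    def report(self, node: str, run_queue: int) -> None:
        # Each slave calls this periodically with its current run-queue length.
        self.loads[node] = run_queue

    def get_node(self) -> str:
        # Lowest reported run queue = lowest expected future load.
        if not self.loads:
            raise LookupError("no node has reported its load yet")
        return min(self.loads, key=self.loads.__getitem__)


master = PoolMaster()
master.report("worker@host1", 4)
master.report("worker@host2", 1)
print(master.get_node())  # worker@host2
```

Because the choice is based only on the latest reports, it is an estimate: a node that looked idle at its last report may already be busy by the time work is placed on it.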
EXPORTS
start(Name) ->
start(Name, Args) -> Nodes
Types Name = atom()
Args = string()
Nodes = [node()]
Starts a new pool. The file .hosts.erlang is read to find host names where the pool nodes can be started. See section Files below.
The start-up procedure fails if the file is not found.
The slave nodes are started with slave:start/2,3, passing along Name and, if provided, Args. Name is used as the first part of the
node names, and Args is used to specify command-line arguments. See slave(3erl).
Access rights must be set so that all nodes in the pool have the authority to access each other.
The function is synchronous and all the nodes, as well as all the system servers, are running when it returns a value.
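The start-up steps described for start/1,2 (read .hosts.erlang, fail if it is missing, use Name as the first part of each node name) can be sketched in Python. This is a simplified illustration, not how pool or net_adm is implemented: it assumes the file lists one host per line as an Erlang atom terminated by a period (for example 'alpha.example.com'.), skips blank lines and % comments, and assumes one node named Name@Host per listed host. The function names parse_hosts, read_hosts_file, and pool_node_names are hypothetical.

```python
from typing import List


def parse_hosts(text: str) -> List[str]:
    """Extract host names from simplified .hosts.erlang content."""
    hosts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and Erlang-style comments
        # 'alpha.example.com'.  ->  alpha.example.com
        hosts.append(line.rstrip(".").strip("'"))
    return hosts


def read_hosts_file(path: str) -> List[str]:
    # open() raises FileNotFoundError if the file is missing, mirroring the
    # documented behavior that start-up fails when .hosts.erlang is not found.
    with open(path) as f:
        return parse_hosts(f.read())


def pool_node_names(name: str, hosts: List[str]) -> List[str]:
    # Name is the first part of each node name: Name@Host
    return [f"{name}@{host}" for host in hosts]


hosts = parse_hosts("'alpha.example.com'.\n'beta.example.com'.\n")
print(pool_node_names("worker", hosts))
# ['worker@alpha.example.com', 'worker@beta.example.com']
```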
attach(Node) -> already_attached | attached
Types Node = node()
This function ensures that a pool master is running and includes Node in the pool master's pool of nodes.
stop() -> stopped
Stops the pool and kills all the slave nodes.
get_nodes() -> Nodes
Types Nodes = [node()]
Returns a list of the current member nodes of the pool.
pspawn(Mod, Fun, Args) -> pid()
Types Mod = Fun = atom()
Args = [term()]
Spawns a process on the pool node which is expected to have the lowest future load.
pspawn_link(Mod, Fun, Args) -> pid()
Types Mod = Fun = atom()
Args = [term()]
Spawns and links a process on the pool node which is expected to have the lowest future load.
get_node() -> node()
Returns the node with the expected lowest future load.
FILES
.hosts.erlang is used to pick hosts where nodes can be started. See net_adm(3erl) for information about format and location of this file.
$HOME/.erlang.slave.out.HOST is used for all additional IO that may come from the slave nodes on standard IO. If the start-up procedure
does not work, this file may indicate the reason.
Ericsson AB stdlib 1.17.3 pool(3erl)