04-04-2020
Your problem is with network timeout settings, either on the cluster or on the clients.
Manually shutting down one of the cluster nodes may not give you the same result as a true CPU, power, or other hardware failure, because the cluster software suite will probably see you do that. It would be better to simply pull the RJ45 network cable out of one of the nodes to simulate a network connection failure.
Anyway, the point is that a cluster failover takes time. During this time the virtual IP address is switched from one node to the other. Depending on the cluster suite, this will take seconds or minutes. The fact that the client will reconnect to the surviving cluster node after you restart it proves that, had it waited long enough, it would have been able to reconnect on its own.
So the solution is to either (1) configure the cluster to fail over faster, or (2) increase the timeout that clients will wait before giving up. Either way, the aim is that a new connection to the virtual IP address can be made before the configured timeout period ends.
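For option (2), the idea can be sketched roughly as below. This is only an illustration in Python, assuming the client is something you can modify. The function name `connect_with_retry` and its default values are made up for this example and are not taken from any cluster suite. Size the overall timeout to how long your cluster really takes to fail over.

```python
import socket
import time


def connect_with_retry(host, port, total_timeout=120.0, attempt_timeout=5.0, pause=2.0):
    """Keep trying to connect to host:port until total_timeout seconds have passed.

    total_timeout should be longer than the time the cluster needs to move
    the virtual IP address to the surviving node (hypothetical defaults).
    """
    deadline = time.monotonic() + total_timeout
    last_error = None
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"no connection to {host}:{port} within {total_timeout}s"
            ) from last_error
        try:
            # Each single attempt is bounded so one hung attempt cannot
            # use up the whole budget.
            return socket.create_connection(
                (host, port), timeout=min(attempt_timeout, remaining)
            )
        except OSError as exc:
            # Refused, unreachable, or timed out: the failover may still be
            # in progress, so wait a little and try again.
            last_error = exc
            time.sleep(min(pause, max(0.0, deadline - time.monotonic())))
```

A client would call something like `connect_with_retry(virtual_ip, service_port, total_timeout=300)` where it currently makes a single connection attempt. Both argument names there are placeholders for your own values.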
CRM DAEMON(7) Pacemaker Configuration CRM DAEMON(7)
NAME
crmd - CRM Daemon Options
SYNOPSIS
[dc-version=string] [cluster-infrastructure=string] [dc-deadtime=time] [cluster-recheck-interval=time] [election-timeout=time]
[shutdown-escalation=time] [crmd-integration-timeout=time] [crmd-finalization-timeout=time] [crmd-transition-delay=time]
[expected-quorum-votes=integer]
DESCRIPTION
This is a fake resource that details the options that can be configured for the CRM Daemon.
SUPPORTED PARAMETERS
dc-version = string [none]
Version of Pacemaker on the cluster's DC.
Includes the hash which identifies the exact Mercurial changeset it was built from. Used for diagnostic purposes.
cluster-infrastructure = string [heartbeat]
The messaging stack on which Pacemaker is currently running.
Used for informational and diagnostic purposes.
dc-deadtime = time [20s]
How long to wait for a response from other nodes during startup.
The "correct" value will depend on the speed/load of your network and the type of switches used.
cluster-recheck-interval = time [15min]
Polling interval for time based changes to options, resource parameters and constraints.
The Cluster is primarily event driven, however the configuration can have elements that change based on time. To ensure these changes
take effect, we can optionally poll the cluster's status for changes. Allowed values: Zero disables polling. Positive values are an
interval in seconds (unless other SI units are specified, e.g. 5min).
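As a rough illustration of the "seconds unless a unit is given" rule, the sketch below converts values such as 20s, 15min, or a bare number into seconds. It is not Pacemaker's own parser. The function name `interval_to_seconds` and the set of accepted suffixes are assumptions made for this example.

```python
import re

# Unit suffixes assumed for this sketch; Pacemaker's own parser may accept
# a different set.
_UNIT_SECONDS = {"": 1, "s": 1, "sec": 1, "m": 60, "min": 60, "h": 3600, "hr": 3600}


def interval_to_seconds(value):
    """Convert a time value such as '15min' or '20s' to a number of seconds.

    A bare number is taken as seconds; zero is returned as 0, which the
    cluster-recheck-interval option treats as "polling disabled".
    """
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None or match.group(2) not in _UNIT_SECONDS:
        raise ValueError(f"unrecognised interval: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```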
election-timeout = time [2min]
*** Advanced Use Only ***.
If you need to adjust this value, it probably indicates the presence of a bug.
shutdown-escalation = time [20min]
*** Advanced Use Only ***.
If you need to adjust this value, it probably indicates the presence of a bug.
crmd-integration-timeout = time [3min]
*** Advanced Use Only ***.
If you need to adjust this value, it probably indicates the presence of a bug.
crmd-finalization-timeout = time [30min]
*** Advanced Use Only ***.
If you need to adjust this value, it probably indicates the presence of a bug.
crmd-transition-delay = time [0s]
*** Advanced Use Only ***. Enabling this option will slow down cluster recovery under all conditions.
Delay cluster recovery for the configured interval to allow for additional/related events to occur. Useful if your configuration is
sensitive to the order in which ping updates arrive.
expected-quorum-votes = integer [2]
The number of nodes expected to be in the cluster.
Used to calculate quorum in OpenAIS-based clusters.
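The sketch below shows how an expected vote count is typically used, assuming a plain majority rule with one vote per node. It is an illustration only: the function name `has_quorum` is made up, and the actual calculation performed by the messaging stack may differ.

```python
def has_quorum(active_votes, expected_votes=2):
    """Simple majority rule: more than half of the expected votes must be present."""
    return active_votes > expected_votes / 2
```

Under this rule, a cluster with the default of 2 expected votes has quorum only while both nodes are present.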
AUTHOR
Andrew Beekhof <andrew@beekhof.net>
Author.
Pacemaker Configuration 06/10/2014 CRM DAEMON(7)