11-22-2010
Quote: Originally Posted by samsungsamsung
Hi. We have a two-node HACMP cluster (active/passive).
Recently the primary node went into a shutdown state and all the resource groups moved to the secondary node.
People say that this happened because the /var filesystem was 100% utilized on the primary node. Is this true?
Short answer: Yes.
Long answer: something in the OS failed and therefore the cluster followed. It is the AIX server that cannot live with /var being 100% full. Ironically, this failure might have been caused by the HA software logging too much information in the /var filesystem. Works as designed...
Quote: Originally Posted by samsungsamsung
Will it switch over to the secondary node if the /var filesystem is 100% utilized on the primary node?
If the server malfunctions, the cluster is expected to fail over. Nothing wrong here.
If you have not guessed by now how to work around this classic error, here is my tip for you: an HACMP admin never installs HACMP/PowerHA on an AIX server without first creating a separate filesystem (e.g. /var/hacmp/) for the cluster logs.
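To catch a filling /var before it takes the node down, a small check along these lines could run from cron. This is only a sketch in Python, not something HACMP ships: the 90% threshold and the function names are my own assumptions, and the figure comes from `shutil.disk_usage`, so it can differ slightly from what `df` reports.

```python
import os
import shutil


def usage_percent(path):
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo filesystems report a size of zero.
        return 0.0
    return 100.0 * usage.used / usage.total


def over_threshold(path, threshold=90.0):
    """True when the filesystem holding `path` is at or above `threshold` percent."""
    return usage_percent(path) >= threshold


if __name__ == "__main__":
    # /var/hacmp is the example log filesystem from the post; paths that do
    # not exist on this host are skipped. The 90% default is an assumption.
    for fs in ("/var", "/var/hacmp"):
        if os.path.isdir(fs) and over_threshold(fs):
            print(f"WARNING: {fs} is {usage_percent(fs):.0f}% full")
```

With the cluster logs on their own filesystem, a warning for /var/hacmp tells you the logs are growing without the rest of /var being at risk.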
Last edited by shockneck; 11-22-2010 at 01:20 PM..
Reason: spelling, removed redundant information
CRM_FAILCOUNT(8) [FIXME: manual] CRM_FAILCOUNT(8)
NAME
crm_failcount - manipulate the failcount attribute on a given resource
SYNOPSIS
crm_failcount [-?|-V] -D -u|-U node -r resource
crm_failcount [-?|-V] -G -u|-U node -r resource
crm_failcount [-?|-V] -v string -u|-U node -r resource
DESCRIPTION
Heartbeat implements a sophisticated method to compute and force failover of a resource to another node in case that resource tends to fail
on the current node. A resource carries a resource_stickiness attribute to determine how much it prefers to run on a certain node. It also
carries a resource_failure_stickiness attribute that determines the threshold at which the resource should fail over to another node.
The failcount attribute is added to the resource and increased on resource monitoring failure. The value of failcount multiplied by the
value of resource_failure_stickiness determines the failover score of this resource. If this number exceeds the preference set for this
resource, the resource is moved to another node and not run again on the original node until the failure count is reset.
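The scoring rule described above can be illustrated with a short Python sketch. It only mirrors the wording of this page (failcount multiplied by resource_failure_stickiness, compared against the preference for the node). It is not Heartbeat's actual implementation, the function names are hypothetical, and it assumes all values are given as non-negative integer magnitudes.

```python
def failover_score(failcount, resource_failure_stickiness):
    """Failover score as described in the text: failcount times stickiness."""
    return failcount * resource_failure_stickiness


def should_move(failcount, resource_failure_stickiness, preference):
    """True once the failover score exceeds the preference for the current node."""
    return failover_score(failcount, resource_failure_stickiness) > preference


def failures_until_move(resource_failure_stickiness, preference):
    """Smallest failcount at which should_move() becomes true.

    Returns None when the stickiness is zero or negative, because the score
    then never exceeds a non-negative preference.
    """
    if resource_failure_stickiness <= 0:
        return None
    return preference // resource_failure_stickiness + 1
```

For example, with a stickiness of 100 and a preference of 150, one failure leaves the resource in place and the second failure moves it.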
The crm_failcount command queries the number of failures per resource on a given node. This tool can also be used to reset the failcount,
allowing the resource to run again on nodes where it had failed too often.
OPTIONS
--help, -?
Print a help message.
--verbose, -V
Turn on debug information.
Note
Increase the level of verbosity by providing additional instances.
--quiet, -Q
When doing an attribute query using -G, print just the value to stdout. Use this option with -G.
--get-value, -G
Retrieve rather than set the preference.
--delete-attr, -D
Specify the attribute to delete.
--attr-value string, -v string
Specify the value to use. This option is ignored when used with -G.
--node-uuid node_uuid, -u node_uuid
Specify the UUID of the node to change.
--node-uname node_uname, -U node_uname
Specify the uname of the node to change.
--resource-id resource name, -r resource name
Specify the name of the resource on which to operate.
EXAMPLES
Reset the failcount for the resource my_rsc on the node node1:
crm_failcount -D -U node1 -r my_rsc
Query the current failcount for the resource my_rsc on the node node1:
crm_failcount -G -U node1 -r my_rsc
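When these calls are scripted, the command line can be assembled from the options documented above. The following Python sketch is an illustration only: the helper names are hypothetical, it uses just the -G, -D, -U and -r options listed on this page, and it assumes crm_failcount is on the PATH of the node where it runs.

```python
import subprocess

# Map a readable action name to the documented option.
_ACTIONS = {"query": "-G", "reset": "-D"}


def failcount_argv(action, node, resource):
    """Build the crm_failcount argument list for a query or a reset."""
    if action not in _ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return ["crm_failcount", _ACTIONS[action], "-U", node, "-r", resource]


def run_failcount(action, node, resource):
    """Run crm_failcount and return whatever it prints on stdout."""
    result = subprocess.run(
        failcount_argv(action, node, resource),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout.strip()
```

Calling `failcount_argv("reset", "node1", "my_rsc")` yields the same arguments as the first example above.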
FILES
/var/lib/heartbeat/crm/cib.xml - the CIB (minus status section) on disk. Editing this file directly is strongly discouraged.
SEE ALSO
???, ???, and the Linux High Availability FAQ Web site[1]
AUTHOR
crm_failcount was written by Andrew Beekhof.
NOTES
1. Linux High Availability FAQ Web site
http://www.linux-ha.org/v2/faq/forced_failover
[FIXME: source] 07/05/2010 CRM_FAILCOUNT(8)