04-10-2013
Well, you need to differentiate between a full SAN loss (where no node can see the disks) and a failure where only one node loses access to the disks.
It has been years since I have debugged HACMP scripts - there have been a lot of additions to what is checked - but at its core the problem is that a resource has gone down, not a topology element, so it is up to the application stop script to make sure the resources are released before "standard processing" continues.
To have this fully automated you would need to write a recovery script that HACMP could call, because config_too_long means HACMP does not see this as an error.
What I would look at is using the application monitoring facilities to detect that the application is down, and then verifying the resources on the "active" node.
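As a rough illustration, a minimal custom application monitor could look like the sketch below - the process name, mount point and layout are my assumptions, not anything from your configuration:

    #!/bin/ksh
    # Hypothetical PowerHA custom application monitor (names are assumptions).
    # Exit 0 = healthy; any non-zero exit tells the cluster manager the
    # application has failed and recovery processing should start.
    APP_PROC="appserver"     # assumed name of the application process
    APP_FS="/app/data"       # assumed shared filesystem in the resource group

    # 1) Is the application process still running?
    ps -ef | grep -v grep | grep -q "$APP_PROC" || exit 1

    # 2) Is the shared filesystem still mounted on this node?
    #    df reports the containing filesystem, so if $APP_FS is not mounted
    #    the last column will show the parent mount point instead.
    df "$APP_FS" 2>/dev/null | tail -1 | awk '{print $NF}' | grep -qx "$APP_FS" || exit 1

    exit 0

You would register such a script as a custom application monitor so that the cluster manager, not an operator, decides the application is down.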
If I recall correctly, the steps PowerHA takes are:
1) application stop - the key here is that no files remain open on the filesystems, so that the following steps can succeed;
2) release the resources, i.e. 2a) unmount the filesystems and 2b) varyoffvg the volume group.
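In shell terms the release phase amounts to roughly the following (the volume group and mount point names are assumptions for the sketch):

    #!/bin/ksh
    # Sketch of the resource-release phase for an assumed VG "appvg"
    # with one filesystem mounted at /app/data.
    APP_FS="/app/data"
    APP_VG="appvg"

    # Any process still holding files open here makes umount fail, which is
    # exactly what leaves the event script hanging until config_too_long fires.
    fuser -cux "$APP_FS"             # list the offenders for diagnosis

    umount "$APP_FS" || exit 1       # 2a) unmount the filesystem
    varyoffvg "$APP_VG" || exit 1    # 2b) vary off the volume group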
Again, config_too_long means the script is not exiting with any status - so it is not an error; it is hanging. I would have to look at both the current script and the application monitoring setup to determine whether application monitoring could inject a new action for the cluster manager to forcibly unmount the filesystems. I am guessing that is not a possibility.
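If you do want to experiment with injecting a forcible release yourself (purely a suggestion of mine to test, not documented PowerHA behaviour), the classic sequence is to kill the holders first:

    #!/bin/ksh
    # Hypothetical forcible-release helper - APP_FS is an assumed mount point.
    # WARNING: fuser -k sends SIGKILL to every process using the filesystem.
    APP_FS="/app/data"

    fuser -kuc "$APP_FS"    # kill all processes with files open on APP_FS
    sleep 2                 # give the kernel a moment to reap them
    umount "$APP_FS"        # should now succeed if the SAN path is alive

Whether that is safe for your application is something only your own testing can answer.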
Comment: I would be nervous about mixing vSCSI and NPIV within a resource group. No real issue with a mix in the cluster as a whole, but real concerns when mixing the two technologies for a single resource in a resource group.
Hope this helps you advance your testing! Good diligence!