04-10-2013
Well, you need to differentiate between a full SAN loss (no nodes can see the disks) and a failure where only one node loses sight of its disks.
It has been years since I last debugged HACMP scripts - there have been a lot of additions to what is checked - but at its core the problem is that a resource has gone down, not a topology element, so it is up to the application stop script to make sure the resources are released before "standard processing" continues.
To have this fully automated you would need to write a recovery script that HACMP could call, because config_too_long means HACMP does not see this as an error.
What I would look at is using the application monitoring abilities to detect that the application is down, and then verifying the resources on the "active" node.
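A custom application monitor along those lines might look like the sketch below. This is only an illustration: the check_app function and the APP_PROCESS daemon name are my own inventions, not PowerHA-defined names; the general PowerHA convention is that a custom monitor method signals failure by exiting non-zero.

```shell
#!/bin/sh
# Sketch of a custom application monitor method (illustrative names only).
# PowerHA treats a non-zero exit from the monitor method as "application down".

check_app() {
    # Healthy if a process with the given command name is currently running.
    ps -eo comm | grep -qw "$1"
}

APP_PROCESS=${APP_PROCESS:-myappd}   # hypothetical application daemon name
if check_app "$APP_PROCESS"; then
    echo "monitor: $APP_PROCESS running"
else
    echo "monitor: $APP_PROCESS down"
fi
# A real monitor method would end with: exit $?  (propagating the check result)
```

In a real cluster the monitor's failure action (notify, restart, or fallover) is what would then drive the recovery behaviour discussed here.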
If I recall correctly, the steps PowerHA takes are:
1) application stop - the key here is that there are no open files left on the filesystems, so that the following steps can succeed.
2) release the resources - 2a) unmount the filesystems and 2b) varyoffvg the volume group.
Again, config_too_long means the script is not exiting with any status - so it is not an error; it is hanging. I would have to look at both the current script and the application monitoring to determine whether application monitoring could inject a new action via the cluster manager to forcibly unmount the filesystems. I am guessing that is not a possibility.
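For testing purposes, a forced-release cleanup script of the sort discussed above might look like this sketch. The volume group and filesystem names are assumptions for illustration; fuser -kxuc, umount -f and varyoffvg are the standard AIX commands for killing processes with open files, force-unmounting, and deactivating a volume group. The DRY_RUN switch is my own addition so the plan can be inspected before anything destructive runs.

```shell
#!/bin/sh
# Sketch of a forced-release script for a hung resource group.
# VG and FILESYSTEMS are illustrative names, not from the original post.
VG=${VG:-datavg}
FILESYSTEMS=${FILESYSTEMS:-"/data /logs"}
DRY_RUN=${DRY_RUN:-1}   # default: print the plan instead of acting

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

for fs in $FILESYSTEMS; do
    run fuser -kxuc "$fs"   # kill any processes holding files open on the fs
    run umount -f "$fs"     # then force the unmount
done
run varyoffvg "$VG"         # finally release the volume group
```

Run it once with DRY_RUN=1 to review the command sequence, then with DRY_RUN=0 on a test node only.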
Comment: I would be nervous about mixing vSCSI and NPIV within a resource group. No real issues with a mix in the cluster, but real concerns when mixing technologies for a single resource in a resource group.
Hope this helps you advance your testing! Good diligence!
10 More Discussions You Might Find Interesting
1. AIX
Hello,
I would like to know if anyone has faced this problem. Whenever there is a duplicate IP address, HACMP goes down - in fact HACMP (PowerHA) takes the whole system down.
Does anyone know how to solve this problem ? (3 Replies)
Discussion started by: filosophizer
2. Solaris
Greetings Forumers!
I tried installing Solaris Cluster 3.3 today. I should say I tried configuring the Cluster today. The software is already installed on two systems. I am trying to configure a shared filesystem between two 6320 Blades. I selected the "Custom" install because the "Typical"... (2 Replies)
Discussion started by: bluescreen
3. AIX
Hi
What is the procedure to upgrade the MQ from 6 to 7 in aix hacmp cluster. Do i need to bring down the cluster
services running in both the nodes and then give #smitty installp in both the nodes separately. Please assist... (0 Replies)
Discussion started by: samsungsamsung
4. AIX
Hi,
I have a IBM Power series machine that has 2 VIOs and hosting 20 LPARS.
I have two LPARs on which GPFS is configured (4-5 disks)
Now these two LPARs need to be configured for HACMP (PowerHA) as well.
What is recommended? Is it possible that HACMP can be done on this config or do i... (1 Reply)
Discussion started by: aixromeo
5. AIX
I am planning for building a new database server using AIX 6.1 and Oracle 11.2 using ASM.
As i have learned starting with Oracle 11.2 ASM can only be used in conjunction with Clusterware, which is Oracles HA-software. As is the companies policy we do intend to use PowerHA as HA-solution instead... (1 Reply)
Discussion started by: bakunin
6. AIX
Few questions regarding Power HA ( previously known as HACMP) and VIOS POWERVM IVM ( IBM Virtualization I/O Server )
Is it possible to create HACMP cluster between two VIOS servers
Physical Machine_1
VIOS_SERVER_1
LPAR_1
SHARED_DISK_XX
VIOS_SERVER_2
Physical Machine_2
LPAR_2... (6 Replies)
Discussion started by: filosophizer
7. AIX
As i have updated a lot of HACMP-nodes lately the question arises how to do it with minimal downtime. Of course it is easily possible to have a downtime and do the version update during this. In the best of worlds you always get the downtime you need - unfortunately we have yet to find this best of... (4 Replies)
Discussion started by: bakunin
8. AIX
Hi,
A customer I'm supporting once upon a time broke their 2 cluster node database servers so they could use the 2nd standby node for something else. Now sometime later they want to bring the 2nd node back into the cluster for resilance. Problem is there are now 3 VG's that have been set-up... (1 Reply)
Discussion started by: elcounto
9. AIX
Hi all,
I remember way back in some old environment, having the HA cluster services not being started automatically at startup, ie. no entry in /etc/inittab.
I remember reason was (taken a 2 node active/passive cluster), to avoid having a backup node being booted, so that it will not... (4 Replies)
Discussion started by: zaxxon
10. AIX
I have troubles making clstat work. All the "usual suspects" have been covered but still no luck. The topology is a two-node active/passive with only one network-interface (it is a test-setup). The application running is SAP with DB/2 as database. We do not use SmartAssists or other gadgets.
... (8 Replies)
Discussion started by: bakunin
LEARN ABOUT CENTOS
crm_mon
PACEMAKER(8) System Administration Utilities PACEMAKER(8)
NAME
Pacemaker - Part of the Pacemaker cluster resource manager
SYNOPSIS
crm_mon mode [options]
DESCRIPTION
crm_mon - Provides a summary of cluster's current state.
Outputs varying levels of detail in a number of different formats.
OPTIONS
-?, --help
This text
-$, --version
Version information
-V, --verbose
Increase debug output
-Q, --quiet
Display only essential output
Modes:
-h, --as-html=value
Write cluster status to the named html file
-X, --as-xml
Write cluster status as xml to stdout. This will enable one-shot mode.
-w, --web-cgi
Web mode with output suitable for cgi
-s, --simple-status
Display the cluster status once as a simple one line output (suitable for nagios)
Display Options:
-n, --group-by-node
Group resources by node
-r, --inactive
Display inactive resources
-f, --failcounts
Display resource fail counts
-o, --operations
Display resource operation history
-t, --timing-details
Display resource operation history with timing details
-c, --tickets
Display cluster tickets
-W, --watch-fencing
Listen for fencing events. For use with --external-agent, --mail-to and/or --snmp-traps where supported
-L, --neg-locations[=value]
Display negative location constraints [optionally filtered by id prefix]
-A, --show-node-attributes
Display node attributes
Additional Options:
-i, --interval=value
Update frequency in seconds
-1, --one-shot
Display the cluster status once on the console and exit
-N, --disable-ncurses
Disable the use of ncurses
-d, --daemonize
Run in the background as a daemon
-p, --pid-file=value
(Advanced) Daemon pid file location
-E, --external-agent=value
A program to run when resource operations take place.
-e, --external-recipient=value
A recipient for your program (assuming you want the program to send something to someone).
EXAMPLES
Display the cluster status on the console with updates as they occur:
# crm_mon
Display the cluster status on the console just once then exit:
# crm_mon -1
Display your cluster status, group resources by node, and include inactive resources in the list:
# crm_mon --group-by-node --inactive
Start crm_mon as a background daemon and have it write the cluster status to an HTML file:
# crm_mon --daemonize --as-html /path/to/docroot/filename.html
Start crm_mon and export the current cluster status as xml to stdout, then exit:
# crm_mon --as-xml
AUTHOR
Written by Andrew Beekhof
REPORTING BUGS
Report bugs to pacemaker@oss.clusterlabs.org
Pacemaker 1.1.10-29.el7 June 2014 PACEMAKER(8)
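Tying the man page back to monitoring practice: the --simple-status output above is designed for tools like Nagios, and a minimal wrapper around it might look like the sketch below. The check_cluster name and the OK/CRITICAL exit-code mapping are my own assumptions; the sketch relies on the one-line "CLUSTER OK" prefix that crm_mon -s prints on a healthy cluster.

```shell
#!/bin/sh
# Hypothetical Nagios-style check wrapping `crm_mon --simple-status`.
# check_cluster and the exit codes are illustrative, not part of Pacemaker.
check_cluster() {
    status=$("$@" 2>&1)               # e.g. crm_mon -s
    case "$status" in
        "CLUSTER OK"*) echo "OK - $status";       return 0 ;;
        *)             echo "CRITICAL - $status"; return 2 ;;
    esac
}

# Typical invocation on a cluster node:
# check_cluster crm_mon -s
```

Exit code 0 maps to Nagios OK and 2 to CRITICAL; adjust the pattern and codes to whatever your monitoring framework expects.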