Deactivating HACMP without deinstall


 
# 1  
Old 06-04-2013

Looking at the cluster configuration from clshowres, it seems to have 3 nodes, with one acting as the backup. The HACMP version is 5.2.

Node A --> Node C
Node B --> Node C

However, looking at the services currently running from clshowsrv -a, I can only see clcomd running. clstrmgrES and topsvcs are down.
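For reference, a small sketch of how the subsystem states could be checked in one pass. It assumes the usual lssrc column layout (Subsystem, Group, PID, Status) with "inoperative" as the status of a stopped subsystem; the function name down_subsystems is made up for illustration.

```shell
#!/bin/sh
# Print the names of subsystems that lssrc-style output reports as inoperative.
# Reads from stdin, so it can be tried on saved output before touching the node.
down_subsystems() {
    # The status is the last column; the PID column is empty for stopped subsystems.
    awk '$NF == "inoperative" { print $1 }'
}

# Possible use on the AIX node (subsystem names taken from the post above):
#   for s in clstrmgrES topsvcs clcomdES; do lssrc -s "$s"; done | down_subsystems
```

Anything it prints is a subsystem that is defined but not running.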

Looking at inittab, I can see the below startup related to cluster:

/usr/es/sbin/cluster/.telinit
startsrc -s clcomdES
/usr/es/sbin/cluster/etc/harc.net

The one that seems to be missing is the entry below, which I believe is the most important one, since it starts the cluster and the resource group itself.

hacmp:2:wait:/usr/sbin/etc/rc.cluster -boot> /dev/console 2>&1 #
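To double-check which cluster entries are actually present, something like the following could be used. This is only a sketch: the function name cluster_inittab_entries is hypothetical, and the patterns are simply the paths and names quoted in this post, so an entry written differently would be missed.

```shell
#!/bin/sh
# List inittab entries that mention the HACMP tree, clcomdES, or rc.cluster.
# $1: path to an inittab file (defaults to /etc/inittab). Read-only.
cluster_inittab_entries() {
    grep -E '/usr/es/sbin/cluster|clcomdES|rc\.cluster' "${1:-/etc/inittab}"
}

# Possible use:
#   cluster_inittab_entries              # live file
#   cluster_inittab_entries /tmp/inittab # saved copy
```

If the rc.cluster line shows up in the output after all, the cluster manager would be started at boot and the assumption above is wrong.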

This is a mission-critical server and I have never seen it brought up or down myself. The feedback I have collected is that one of the nodes shut down by itself. I can see TTY_HOG and "dead man switch" errors, which I have encountered before on a healthy HACMP setup: when the load gets too high and the nodes can no longer reach each other, a failover to the next available node is initiated and the overloaded server restarts. I also learned that the VG could not be varied on automatically after the system restart (which is expected if the RG is not brought up on the node).

The points below are only my opinion; I can only speculate, as I can't get enough downtime to diagnose the problem:

1.) Without rc.cluster -boot, clstrmgrES and the other services are not started, so the RG will not be brought up and the VG will not be varied on.
2.) Since clcomdES is started, it still checks communication over Ethernet and the non-IP network (tty in this setup). One node became unreachable, so a reboot was triggered, and since clstrmgrES is not started, the RG does not fail over to the next node.

Now a decision has been made to just break the cluster and let the server run standalone. I know there are better things to do, but that is what best suits the uptime. Is it enough to comment out the cluster-related startups in inittab, set the VG to vary on at reboot, set the filesystems to automount, and manually alias the service IP to one of the boot interfaces, so that the incident does not happen again? I know the proper process would be to stop the cluster services, remove all cluster.* filesets, and set the service IPs on the previous boot IP interface, but the downtime is not enough to do all of that.

I believe stopping clcomdES should do no harm even while the server is up, as the cluster holding the RG is itself down, but I would like to hear from anyone with experience. Is there anything else that can be done to disable the cluster and switch to standalone mode without uninstalling the cluster filesets?
# 2  
Old 06-04-2013
5.2! - going back a few years here :)

Normally, cluster services are not started by default. Why reboot, restart cluster services, and have the application return - only to crash again?

So, initially, I would go to a working node - one that currently holds the resource group - and set it so that the resource group does not return to the "primary" node when that node comes online. Sorry, I have forgotten the term used in the 5.2 days.

Or, you just run # smitty clstart when things are not too busy, because there is a good chance that the cluster will stop one or more resource groups - assume at least one - and try to reactivate them on the reactivated node.

The less drastic measure, compared to what you suggest, would be to remove the resource group from the cluster configuration. Just "force" synchronize to all remaining nodes.
P.S. make a snapshot - ALWAYS - before making any changes to any unknown configuration.

So - basically, the normal idea would be to clstart the inactive node(s) and let HACMP do its thing. This assumes the configuration is OK to begin with.
# 3  
Old 06-04-2013
The decision now is to make the OS standalone, as there is *not* much time to troubleshoot or correct the cluster configuration. Even removing the cluster filesets is not acceptable. Looking at the cluster resources and the verify logs, which seem to run every midnight, the cluster looks healthy (though I can't really say much, as I have never seen the resource group working). All this while it has been a cluster without substance, with only clcomd running and the RG's resources handled by hand (VG varied on and FS mounted normally, apps started manually), and the downtime is not generous. As much as possible, I want the cluster taken out completely if that is the direction, but even so, they can only give me a 5-minute window for a reboot.

My question is whether the steps below will suffice to take out the cluster services, so that no "dead man switch" or anything similar can take over the system and crash it:

1.) Comment out / remove the cluster startups (harc.net, telinit, clcomd) from inittab
2.) Set the VGs to vary on at boot and the filesystems to mount automatically
3.) Put the service IP as an alias on one interface
4.) Reboot the machine
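The steps above can be sketched as a dry run that changes nothing on the system. Everything here is an assumption for illustration: the function names are made up, the VG, filesystem, interface, and address are placeholders to be supplied, and the patterns only cover the entries quoted earlier in this thread. The commands printed (chvg -a y, chfs -A yes, chdev with an alias4 attribute) are the standard AIX ones as far as I know, but please check them against the man pages for your level before running anything.

```shell
#!/bin/sh
# Dry run only: both functions print text and execute no AIX command.

# Print the commands for steps 2 and 3, plus a backup of inittab.
# Arguments (all placeholders): VG name, filesystem, interface, IP, netmask.
print_standalone_plan() {
    vg=$1; fs=$2; ifc=$3; ip=$4; mask=$5
    echo "cp -p /etc/inittab /etc/inittab.pre_standalone"
    echo "chvg -a y $vg"                      # vary on the VG at boot
    echo "chfs -A yes $fs"                    # mount the filesystem at boot
    echo "chdev -l $ifc -a alias4=$ip,$mask"  # persistent IP alias
}

# Step 1: print a copy of an inittab with the cluster entries commented out.
# On AIX a leading ':' comments out an inittab entry.
# Entries that are already commented are left unchanged.
comment_cluster_entries() {
    awk '/^:/ { print; next }
         /\/usr\/es\/sbin\/cluster|clcomdES/ { print ":" $0; next }
         { print }' "$1"
}

# Possible use:
#   print_standalone_plan datavg /data en0 10.1.1.10 255.255.255.0
#   comment_cluster_entries /etc/inittab > /tmp/inittab.new   # review, then copy
```

The idea is to review the printed commands and the generated file beforehand, so the 5 minute window is spent only on the reboot.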

This would hold maybe until they can migrate to a new box, with more intensive tests done before rolling it out, including failover testing if need be, so that the resource group fails over the way it should. It may sound like an irresponsible move now, but uptime is more important for this 5.2 setup. They *think* even starting the cluster is a risk, hence the decision to make it standalone, since it currently functions entirely manually. Thanks.
# 4  
Old 06-04-2013
Assumption #1: no resource groups are active

If true, HACMP is not active so you can just remove the cluster.

Easiest way is to just go into the smitty hacmp menu and find the option to remove the cluster. If I recall correctly, there is no need to synchronize because the cluster definition is removed - thus, the node has no idea how to synchronize.

Repeat this on each node.

Whether you delete the cluster.* filesets is an option. Once the cluster definition is removed the software is dormant.

Hope this helps!
# 5  
Old 06-08-2013
Thanks a lot. I have tried this on one of our clustered servers with no problem.