As I have updated a lot of HACMP-nodes lately, the question arose how to do it with minimal downtime. Of course it is easy to schedule a downtime and do the version update during it. In the best of worlds you always get the downtime you need - unfortunately we have yet to find this best of worlds.
The following procedure is proven to work with AIX 5.3, 6.x and 7.x and the associated HACMP/PowerHA versions. It needs only one takeover, so the downtime ranges from under a minute to a few minutes, depending on the nature of your resource group(s).
Communication in HACMP happens via RSCT, and for a cluster to work the versions of the RSCT-packages on all nodes have to be in sync. Fortunately it is easy to update RSCT independently of the rest of the OS. This is what this procedure depends on. We will consider a dual-node cluster with an active and a standby-system (rotating cluster), but the procedure can easily be adapted to other cluster-architectures.
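Since everything hinges on the RSCT levels matching, it can help to compare them mechanically before and after the RSCT update. The following is only a minimal sketch in Python, not part of the procedure itself. It assumes you have captured the colon-separated output of something like `lslpp -Lc "rsct.*"` from each node as text, and that the fileset name is the second field and the level the third. The sample lines are shortened to four fields and the levels are made up for illustration; `parse_levels` and `rsct_mismatches` are hypothetical helper names.

```python
# Sketch: compare RSCT fileset levels captured from two cluster nodes.
# Assumption: input is colon-separated lslpp-style text where field 2 is
# the fileset name and field 3 is its level. Verify this on your system.

def parse_levels(lslpp_output):
    """Return {fileset: level} for all rsct.* filesets in the given text."""
    levels = {}
    for line in lslpp_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header line
        fields = line.split(":")
        if len(fields) < 3:
            continue  # not a fileset line
        fileset, level = fields[1], fields[2]
        if fileset.startswith("rsct."):
            levels[fileset] = level
    return levels


def rsct_mismatches(node_a_output, node_b_output):
    """Return {fileset: (level_a, level_b)} for filesets that differ.

    A fileset missing on one node is reported with None on that side.
    """
    a = parse_levels(node_a_output)
    b = parse_levels(node_b_output)
    diffs = {}
    for fileset in sorted(set(a) | set(b)):
        if a.get(fileset) != b.get(fileset):
            diffs[fileset] = (a.get(fileset), b.get(fileset))
    return diffs


# Hypothetical, shortened sample data (levels invented for illustration).
SAMPLE_ACTIVE = """#Package Name:Fileset:Level:State
rsct.core:rsct.core.rmc:3.1.5.0:C
rsct.core:rsct.core.utils:3.1.5.0:C
"""

SAMPLE_STANDBY = """#Package Name:Fileset:Level:State
rsct.core:rsct.core.rmc:3.1.5.0:C
rsct.core:rsct.core.utils:3.1.4.0:C
"""

if __name__ == "__main__":
    for fileset, (lvl_a, lvl_b) in rsct_mismatches(SAMPLE_ACTIVE, SAMPLE_STANDBY).items():
        print(f"{fileset}: active={lvl_a} standby={lvl_b}")
```

An empty result means both nodes report the same RSCT levels, which is the state you want before touching the rest of AIX or HACMP.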
- Stop the clustermanager on the standby-node. This will end the cluster-communication. The remaining node will be on its own.
- Update the RSCT-packages on both nodes. It won't matter that the communication path over the RSCT-daemons is disrupted, because there is nobody to communicate with anyway.
- Optional step: If you are of the well and truly paranoid type (like me) you can now restart the clustermanager on the standby-node and do a cluster-synchronization. I never experienced any problems when I tried this procedure in a test-environment and skipped this step, but I still feel better doing it when working on a PROD-system.
- Stop the clustermanager on the standby-system again and update the rest of AIX and/or HACMP. Because you made sure the RSCT-daemons are already updated and at the same version on both nodes, it won't do any harm if the versions of the other packages differ.
- Once the standby-system has finished the update, restart cluster-services there and move the resource-group to the standby-system. This takeover will be your downtime.
- Now update the remaining node after shutting down its cluster-services. After the update has finished, restart cluster-services and do a cluster-synchronization. You are finished.
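To keep the order straight when doing this on several clusters, the steps above can be written down as a simple checklist generator. This is only a sketch in Python under my own assumptions: it runs no AIX or HACMP commands, the names `Step` and `upgrade_plan` and the node names are hypothetical, and the synchronization is attributed to the cluster as a whole rather than to a particular node.

```python
# Sketch: the rolling-update procedure as an ordered checklist.
# Nothing here touches a real cluster; it only lists what to do where.
from dataclasses import dataclass


@dataclass
class Step:
    target: str             # node name, or "cluster" for cluster-wide actions
    action: str
    downtime: bool = False  # True for the one step that interrupts service


def upgrade_plan(active, standby, paranoid=True):
    """Return the ordered steps for a two-node active/standby cluster."""
    steps = [
        Step(standby, "stop cluster manager"),
        Step(active, "update RSCT packages"),
        Step(standby, "update RSCT packages"),
    ]
    if paranoid:
        # Optional check that the cluster still works with the new RSCT.
        steps += [
            Step(standby, "start cluster manager"),
            Step("cluster", "synchronize cluster"),
            Step(standby, "stop cluster manager"),
        ]
    steps += [
        Step(standby, "update rest of AIX and/or HACMP"),
        Step(standby, "start cluster services"),
        Step(standby, "take over resource group", downtime=True),
        Step(active, "stop cluster services"),
        Step(active, "update rest of AIX and/or HACMP"),
        Step(active, "start cluster services"),
        Step("cluster", "synchronize cluster"),
    ]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(upgrade_plan("nodeA", "nodeB"), start=1):
        marker = "  <-- downtime" if step.downtime else ""
        print(f"{number:2d}. [{step.target}] {step.action}{marker}")
```

Whichever variant you choose, exactly one step is flagged as downtime: the takeover of the resource group to the freshly updated standby-system.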
I hope this helps.
bakunin