FileSystems under HACMP


 
# 1  
Old 11-14-2017

Dear Fellows,

I'm working on an HACMP cluster (version 7.1) with 2 nodes (Node1 active / Node2 passive) and 1 resource group on the active node (Node1), which is currently UNMANAGED on both nodes.
So all data VGs are on Node1.
One of these data VGs, called "VG_Clust", had a full JFS2 filesystem that I had to enlarge, and I did it with plain LVM commands (chfs), not the CSPOC ones (cl_chfs); this worked fine.

This "VG_Clust" has "read/write" permission, is "Enhanced-Capable", and its VG mode is "Concurrent".

In case of a failover to Node2, do I need to run "Verify and Synchronize HACMP", or trigger a failover from Node1?

Does this require HACMP to be down?

Could you confirm whether the steps below are correct?
smitty hacmp -> Custom Cluster Configuration -> Verify and Synchronize Cluster Configuration (Advanced)

Thanks for your kind reply.
# 2  
Old 11-15-2017
Assuming you have paid-for support from IBM, that would probably be the best option: open a PMR, send them a snap, and give them all the details from here.

I don't know if your cluster is in production (or productive, i.e. non-prod but still important) use, but if it is, they will help you avoid risks of downtime, and if something goes wrong, management will have a contractual place to lay blame rather than just yourself.


Sorry to shy away, but I no longer work with AIX clusters, so I can't explore this for you.




Kind regards,
Robin

# 3  
Old 11-17-2017
Any thoughts on AIX clusters would still be much appreciated.
Thanks rbatte1
# 4  
Old 11-22-2017
It seems you do not really understand how an HACMP cluster works, so a few words of clarification. Bear with me if this is already known. Also note that I will leave out a lot of details, as I can't write a complete PowerHA documentation here.

OK, let us start with the central term in HACMP, which is "resource group". What is it?

Look at an application, say, a database: for it to run you first need some file systems where the DB files are stored. Then you need some processes (the DB process(es) running); basically, the application has to be started. Finally you need an IP address under which clients from outside can connect to the database and use its services.

Exactly these three components (file systems, started processes and an IP address) are what a "resource group" consists of.

File systems: one or more volume groups go into a resource group. When a resource group goes active, all these VGs are activated on one cluster node and all the file systems in them are mounted there. In case of a resource group move, the FSs are unmounted and the VGs deactivated on that node, then activated on another node and all the FSs mounted there. HACMP does this by itself for all the VGs defined in a resource group.

Processes: for each resource group there is a so-called "application server", a collection of a start and a stop script. Whenever a resource group is deactivated, the stop script is executed. It should make sure the application is down, so that afterwards the file systems can be unmounted. When an RG is activated, its start script is executed and should start the application. These start/stop scripts are provided by you and are simple shell scripts, so it is easy to integrate all sorts of applications into HACMP.
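To make this concrete, here is a minimal sketch of such a start/stop script; the application name and the commands inside the functions are placeholders (assumptions) that you would replace with your real application's start and stop commands:

```shell
#!/bin/sh
# Minimal HACMP start/stop script skeleton (placeholders only).

APP_NAME="mydb"     # hypothetical application name

start_app() {
    # Real script: start the application here, e.g. su - db2inst1 -c "db2start"
    echo "starting $APP_NAME"
}

stop_app() {
    # Real script: stop the application and make sure nothing still holds
    # the filesystems open, or the unmount during an RG move will fail.
    echo "stopping $APP_NAME"
}

# Dispatch only when called with an argument, as HACMP does.
if [ $# -gt 0 ]; then
    case "$1" in
        start) start_app ;;
        stop)  stop_app ;;
        *)     echo "usage: $0 {start|stop}" >&2; exit 1 ;;
    esac
fi
```

HACMP only checks the exit status, so make sure the script returns 0 on success and non-zero on failure.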

Finally, the IP address: every RG can have one (or several, but typically one) "service address". These service addresses are normal IP addresses which are added to a certain network adapter when the RG starts and removed when it stops. Technically they are IP aliases added to network interfaces.

An RG start now works like this: the VGs are acquired (varyon), the filesystems are mounted, then the start script of the application server is executed. Finally the service IP address is put onto a network interface and the clients can use it to connect to the application.

If the RG is moved, the service IP is taken down, the stop part of the application server stops the application, FSs are unmounted and VGs deactivated, then the start procedure is done on another node. The clients will notice that the service IP is (after a short time) available again. If a node crashes, the same as in an RG move happens, only the stop part is skipped (obviously). HACMP can handle that, but you may need to take care of the application part in your start script, like a cleanup in a DB after a sudden system shutdown, etc.
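Under the hood, the RG start sequence corresponds roughly to the following commands; the VG name, mount point, script path, interface and address below are made-up placeholders, and in a real cluster HACMP's event scripts run these steps for you, so never run them by hand on a live cluster:

```shell
# 1. Acquire the volume group and mount its filesystems
varyonvg VG_Clust
mount /data/db

# 2. Start the application (the start part of the application server)
/usr/local/cluster/start_mydb.sh

# 3. Put the service IP onto a network interface as an alias
ifconfig en0 alias 10.1.1.50 netmask 255.255.255.0
```

An RG move is the same sequence in reverse on one node (when possible) and forward on the other.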

When your RG is in the state "UNMANAGED" it means that its FSs, processes, etc. are there, but no longer controlled by HACMP. Stop it using HACMP so that it becomes "OFFLINE" (meaning: not active on any node), then bring it online again. Now it should be in status "ONLINE" on a certain node. You can move it to another node from there.

A word about CSPOC: you should absolutely, positively use these commands, not the normal commands, to do LVM management. The reason is that for all the components I talked about above to work, all the cluster nodes need to share consistent information about what the parts of the resource groups look like. The cluster commands do the same as the normal commands, but they distribute the changed information to the other nodes too. THIS IS VITAL!
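As an illustration (filesystem name and size are placeholders, and the exact CSPOC path may vary between PowerHA releases), the cluster-aware counterpart of a plain chfs call looks like this:

```shell
# not cluster-aware: only this node learns about the new size
chfs -a size=+2G /data/db

# CSPOC equivalent: same resize, but the updated LVM information
# is propagated to all cluster nodes as well
/usr/es/sbin/cluster/sbin/cl_chfs -a size=+2G /data/db
```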

You can get away with doing plain LVM operations if you do a "learning import" on the passive node afterwards, and possibly a cluster synchronisation too. But why take such risks if there are commands that do exactly this without any risk involved at all?
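For illustration, such a "learning import" on the passive node would look roughly like this (the VG and hdisk names are placeholders for your environment):

```shell
# re-read the VG metadata from the disk itself and update this node's
# ODM to match the changes made on the other node ("learning" import)
importvg -L VG_Clust hdisk4
```

followed, if needed, by a Verify and Synchronize of the cluster configuration.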

I hope this helps.

bakunin

# 5  
Old 11-23-2017
Hello Bakunin,
Thanks for your kind reply.

Actually, in my low-budget customer environment, this HACMP cluster is only configured and used when needed; that's why both nodes are UNMANAGED at the moment.
And which of the risks you mentioned above could happen when using LVM commands instead of the CSPOC ones?

Kind Regards
# 6  
Old 11-23-2017
So, do they both have access to a shared disk? The volume group is the smallest disk entity that you can define to share between them, so you can't usually have one logical volume/filesystem accessed on NodeA with a different one in the same VG accessed on NodeB.

If you force the issue, you can have both servers accessing the shared disk at the same time, but as you can imagine, there will be conflicts because there is no locking between them. Imagine that NodeA reads a directory. NodeB then updates it. NodeA is not aware (because it has cached it) and may make a different change that NodeB is then not aware of. There will very quickly be conflicts over the free block list, file names, timestamps etc. It is possible that replaced files will be seen separately and then have random parts overwritten as time progresses. You will end up with a badly corrupted filesystem that will require 'fixing', but it is pot luck what gets salvaged and what is lost/damaged.

Can you describe what resources you have? Do you have a shared IP address that clients connect to and you can move to the 'Active' node?


If you want an Active-Active style cluster for load balancing, you may be looking at Oracle RAC (with the associated costs), or you might achieve it with more servers: the servers running the application that needs the data would NFS-mount it from an HA cluster set up to serve the disk, with the cluster handling failover of the volume group and the IP address that the application servers connect to. The NFS mount on the application servers will wait if the NFS server (which appears to them as a single server) goes away, and should recover when it is made available again (probably by the other node).
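On the application servers, such an NFS mount would typically use hard mounts with background retry so that it survives a takeover; the server name and paths below are placeholders:

```shell
# AIX: mount the shared data from the HA-served service address
mount -o bg,hard,intr nfssvc:/export/data /data

# bg   = keep retrying the mount in the background if the server is away
# hard = block (rather than fail) I/O while the server is unreachable,
#        so the application resumes when the standby node takes over
```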

Of course, there is then the performance cost of NFS if that is an issue to you.


A better description of your configuration and application needs might get you a more useful response.



Kind regards,
Robin
# 7  
Old 11-26-2017
Quote:
Originally Posted by LoLo92
Actually, in my low-budget customer environment, this HACMP cluster is only configured and used when needed
What do you mean by that? The whole point of a cluster is high availability: if one of the nodes breaks, the application still runs. If you knew in advance when your node was going to break, you wouldn't need a cluster at all (although I don't believe such astute foretelling skills exist).

Quote:
Originally Posted by LoLo92
that's why both nodes are UNMANAGED at the moment.
I don't understand this. "Nodes" are the systems taking part in the cluster. They cannot be "unmanaged"; they can only have their cluster services started ("joined the cluster") or not.

"Unmanaged" is a state only a resource group can be in.

Quote:
Originally Posted by LoLo92
And which of the risks you mentioned above could happen when using LVM commands instead of the CSPOC ones?
I thought I described that in some detail: you have a cluster for the situations where something has (quite drastically) gone wrong. To make it possible that filesystems, volumes, etc. are taken over safely and started on the other node, the nodes share the information about how these FSes, LVs, etc. are built and in which state exactly they are right now. If you make changes to an LV (like increasing its size, etc.) using normal LVM commands, this information will not be propagated to the other nodes, because these commands are not cluster-aware. If you use the respective CSPOC commands, which indeed are cluster-aware, they will do the same as the normal LVM commands but also use the cluster's communication services (RSCT) to propagate the changed information to the other nodes immediately.

Again, you can get away with using "learning imports" on the other nodes to make the information consistent again, but why not just use the cluster commands, which do that automatically?

I hope this helps.

bakunin