

Clstat not working in a HACMP 7.1.3 cluster


 
# 1  

I am having trouble making clstat work. All the "usual suspects" have been covered, but still no luck. The topology is a two-node active/passive cluster with only one network interface (it is a test setup). The application running is SAP with DB2 as the database. We do not use SmartAssists or other gadgets.

Here are the OS and HACMP-versions:

Code:
# oslevel -s
7100-03-02-1412

# lslpp -L "cluster*"
  Fileset                      Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  cluster.adt.es.client.include
                             7.1.3.1    C     F    PowerHA SystemMirror Client
                                                   Include Files
  cluster.adt.es.client.samples.clinfo
                             7.1.3.0    C     F    PowerHA SystemMirror Client
                                                   CLINFO Samples 
  cluster.es.client.clcomd   7.1.3.1    C     F    Cluster Communication
                                                   Infrastructure
  cluster.es.client.lib      7.1.3.1    C     F    PowerHA SystemMirror Client
                                                   Libraries
  cluster.es.client.rte      7.1.3.1    C     F    PowerHA SystemMirror Client
                                                   Runtime
  cluster.es.client.utils    7.1.3.0    C     F    PowerHA SystemMirror Client
                                                   Utilities 
  cluster.es.cspoc.cmds      7.1.3.1    C     F    CSPOC Commands
  cluster.es.cspoc.rte       7.1.3.1    C     F    CSPOC Runtime Commands
  cluster.es.migcheck        7.1.3.0    C     F    PowerHA SystemMirror Migration
                                                   support 
  cluster.es.nfs.rte         7.1.3.0    C     F    NFS Support 
  cluster.es.server.diag     7.1.3.1    C     F    Server Diags
  cluster.es.server.events   7.1.3.1    C     F    Server Events
  cluster.es.server.rte      7.1.3.1    C     F    Base Server Runtime
  cluster.es.server.testtool
                             7.1.3.0    C     F    Cluster Test Tool 
  cluster.es.server.utils    7.1.3.1    C     F    Server Utilities
  cluster.license            7.1.3.0    C     F    PowerHA SystemMirror
                                                   Electronic License 
  cluster.man.en_US.es.data  7.1.3.1    C     F    Man Pages - U.S. English

cldump works, and all other cluster services are working as expected too. Alas, calling clstat fails:

Code:
# clstat -a
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.

I followed this procedure and double-checked everything mentioned there:

Code:
# tail -3 /etc/snmpdv3.conf
smux            1.3.6.1.4.1.2.3.1.2.1.2         gated_password
VACM_VIEW defaultView        1.3.6.1.4.1.2.3.1.2.1.5    - included -
smux     1.3.6.1.4.1.2.3.1.2.1.5      clsmuxpd_password ::1 128

Code:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster  
clusterId.0 = 1560242040
clusterName.0 = "<mycluster>"
clusterConfiguration.0 = ""
clusterState.0 = 2
clusterPrimary.0 = 1
clusterLastChange.0 = 1412260986
clusterGmtOffset.0 = -3600
clusterSubState.0 = 32
clusterNodeName.0 = "<my-node-name-a>"
clusterPrimaryNodeName.0 = "<my-node-name-a>"
clusterNumNodes.0 = 2
clusterNodeId.0 = 1
clusterNumSites.0 = 0

I also made sure the services are up and snmpd is the correct one:

Code:
# lssrc -g cluster
Subsystem         Group            PID          Status 
 clstrmgrES       cluster          10027094     active
 clinfoES         cluster          18743412     active

# lssrc -a
 aixmibd          tcpip            27263194     active
 snmpmibd         tcpip            5046514      active
 hostmibd         tcpip            30802078     active
[...]
 snmpd            tcpip            24772704     active

# ls -l /usr/sbin/snmpd
lrwxrwxrwx    1 root     system            9 Feb  5 2014  /usr/sbin/snmpd -> snmpdv3ne
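For completeness: AIX can switch between the different SNMP agents with snmpv3_ssw, which rewrites the /usr/sbin/snmpd symlink. Falling back to the SNMPv1 agent as a cross-check is a cheap and reversible test. This is only a sketch of the idea, not something the above output proves necessary:

```shell
# Cross-check sketch: switch the SNMP agent version and restart snmpd.
# snmpv3_ssw options:
#   -1  switch to the SNMPv1 agent
#   -n  switch to the non-encrypted SNMPv3 agent (snmpdv3ne, as above)
#   -e  switch to the encrypted SNMPv3 agent
/usr/sbin/snmpv3_ssw -1
stopsrc -s snmpd; sleep 5; startsrc -s snmpd
# run the same check back with "snmpv3_ssw -n" afterwards
```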

The IPv6 loopback address is present in /etc/hosts:

Code:
# head -2 /etc/hosts
127.0.0.1               loopback localhost      # loopback (lo0) name/address
::1                     loopback localhost      # IPv6 loopback (lo0) name/address

In the cited document it is mentioned, as a last-ditch effort, to remove the comments in /etc/snmpdv3.conf, which I did. The services were restarted as described there, and finally the whole system was rebooted. I also ran a cluster verification and synchronisation (in fact several times, before and after the reboot).

To be honest, I am out of ideas as to what else I could do.

bakunin
# 2  
I know nothing about AIX, but if the implementation of the SNMP protocol is anything like elsewhere (so there may be some huge faults in my understanding), consider:

Are there required MIB lists missing as a startup parameter for snmpd?

'Failed to retrieve' can alternatively be read as 'do not know how'; MIB lists provide the know-how. Or it can mean 'permission denied', so I assume the permission strings have not been changed from the default. And are your UDP stack/ports all up correctly?
# 3  
Quote:
Originally Posted by jim mcnamara
Are there required MIB lists missing as a startup parameter for snmpd?
Thank you, Jim.

In fact all the MIB settings are in place (this is basically what the mentioned entries in /etc/snmpdv3.conf do), and the quoted snmpinfo command proves that SNMP is up and working as expected. I could have (and in fact have) run snmpwalk instead, and it shows the whole MIB tree for HACMP being in place. The listing is quite long, so I didn't post it, but it is indeed all there.
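For reference, such a full walk of the HACMP subtree can look like this. The OID is the one from the clsmuxpd smux line in /etc/snmpdv3.conf quoted above; the community name and the extra group name are assumptions on my part:

```shell
# Walk the whole clsmuxpd subtree instead of just the "cluster" group.
# OID matches the smux entry shown earlier; "public" is an assumed
# community name -- adjust to your snmpdv3.conf.
snmpwalk -v1 -c public localhost 1.3.6.1.4.1.2.3.1.2.1.5

# Or resolve names via the HACMP defs file; the group name "node" is
# an assumption, pick any group defined in hacmp.defs:
snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs node
```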

In addition, if SNMP were not configured correctly with respect to HACMP, then cldump should not work either, but it does. This is why I believe SNMP is not the problem here; still, it is the most common cause when clstat fails, which is why I posted the respective info beforehand.

bakunin

Last edited by bakunin; 10-03-2014 at 05:58 PM..
# 4  
Hi, did you solve the issue?

If not, which level of AIX do you have?

It's an age-old issue on HACMP/PowerHA.

Did you check with support whether there is an efix?
http://www14.software.ibm.com/webapp/set2/psearch/search?domain=power&new=y&os=aix

I remember we solved this issue with an APAR.

Moderator's Comments:
Mod Comment edit by bakunin: fixed your link for you. You should be able to post links yourself now.

Last edited by bakunin; 10-28-2014 at 05:28 PM..
# 5  
Quote:
Originally Posted by igalvarez
Hi, did you solve the issue?
As it is, no. This is a test cluster for the latest AIX/PowerHA version and its integration with SAP.

Quote:
Originally Posted by igalvarez
If not, which level of AIX do you have?
See post #1, the output of the "oslevel" command.

Quote:
Originally Posted by igalvarez
It's an age-old issue on HACMP/PowerHA.
Not to my knowledge. I have about 50 other clusters in my environment (mostly HACMP 6.x and 5.x, but also a few on 7.x; OS versions are 6.1-7.1.3), and clstat is working on all of them. I usually check cluster statuses with cldump, so I do not commonly use clstat, but I would like to understand why it is not working - just out of curiosity.

Quote:
Originally Posted by igalvarez
Did you check on support if there any efix?
I would do so, but right now I do not even understand where the problem is. If I could point to a certain fileset as the culprit I would try to get an update/efix/whatever or open a PMR, but I am not sure there is anything left I could do beforehand. There is no point in opening a software call only to be told "just do this, that and that to make it work as expected".

Quote:
Originally Posted by igalvarez
I remember we solved this issue with an APAR.
I'd be thankful if you could tell me what the issue was, because right now I don't even understand where the problem lies.

bakunin

Last edited by bakunin; 10-27-2014 at 06:21 PM..
# 6  
Hi bakunin, sorry for the delay.

We have got this error from time to time in our old AIX 6.1 (PowerHA 6.1 GLVM) clusters. Indeed, last week we had to upgrade nodes from AIX 6.1 TL6 to TL9 because of a problem with clstat/cldump. But this is not your problem.

The steps we use here for all PowerHA 6.1 clusters (surely the same as in your link above) are:
Code:
#!/bin/ksh
#
# Stop the MIB daemons first, snmpd last ...
stopsrc -s hostmibd
stopsrc -s snmpmibd
stopsrc -s aixmibd
stopsrc -s snmpd
sleep 8
# ... then restart in reverse order, snmpd first, and give the
# daemons time to reconnect before bouncing clinfoES.
startsrc -s snmpd
startsrc -s aixmibd
startsrc -s snmpmibd
startsrc -s hostmibd
sleep 120
stopsrc -s clinfoES
startsrc -s clinfoES
sleep 120

I'm really sorry I cannot help in this case...

Last edited by igalvarez; 10-29-2014 at 09:21 AM..
# 7  
Finally I found a "solution" to my problem: install an even newer version. As it seems, the version I used was somewhat differently abled, as I believe the politically correct euphemism for "buggy" is. (A big THANK YOU goes to IBM for letting me do the beta-testing of software I thought to have purchased. I only bought cluster software but got a built-in adventure game at no cost.)

Here is what I did: first, install the latest AIX release (AIX 7.1, TL3 SP4):

Code:
# lslpp -l bos.rte
  Fileset                      Level  State      Description         
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  bos.rte                   7.1.3.30  COMMITTED  Base Operating System Runtime

Path: /etc/objrepos
  bos.rte                   7.1.3.30  COMMITTED  Base Operating System Runtime

# oslevel -s
7100-03-04-1441

# instfix -i | grep SP
    All filesets for 71-00-011037_SP were found.
    All filesets for 71-00-021041_SP were found.
    All filesets for 71-00-031115_SP were found.
    All filesets for 71-01-011141_SP were found.
    All filesets for 71-00-041140_SP were found.
    All filesets for 71-01-021150_SP were found.
    All filesets for 71-01-031207_SP were found.
    All filesets for 71-00-051207_SP were found.
    All filesets for 71-01-041216_SP were found.
    All filesets for 71-00-061216_SP were found.
    All filesets for 71-01-051228_SP were found.
    All filesets for 71-00-071228_SP were found.
    All filesets for 71-02-011245_SP were found.
    All filesets for 71-00-081241_SP were found.
    All filesets for 71-01-061241_SP were found.
    All filesets for 71-02-021316_SP were found.
    All filesets for 71-00-091316_SP were found.
    All filesets for 71-01-071316_SP were found.
    All filesets for 71-00-101334_SP were found.
    All filesets for 71-01-081334_SP were found.
    All filesets for 71-02-031334_SP were found.
    All filesets for 71-01-091341_SP were found.
    All filesets for 71-02-041341_SP were found.
    All filesets for 71-03-011341_SP were found.
    All filesets for 71-03-021412_SP were found.
    All filesets for 71-01-101415_SP were found.
    All filesets for 71-02-051415_SP were found.
    All filesets for 71-03-031415_SP were found.
    All filesets for 71-02-061441_SP were found.
    All filesets for 71-03-041441_SP were found.


This I did on both nodes. I am not sure whether this was necessary, but together with the other changes (see below) it did the job. Next was updating the cluster software itself:

Code:
# lslpp -l | grep -i cluster
  bos.cluster.rte           7.1.3.30  COMMITTED  Cluster Aware AIX
  bos.cluster.solid         7.1.1.15  COMMITTED  POWER HA Business Resiliency
  cluster.adt.es.client.include
  cluster.adt.es.client.samples.clinfo
  cluster.es.client.clcomd   7.1.3.2  COMMITTED  Cluster Communication
  cluster.es.client.lib      7.1.3.2  COMMITTED  PowerHA SystemMirror Client
  cluster.es.client.rte      7.1.3.2  COMMITTED  PowerHA SystemMirror Client
  cluster.es.client.utils    7.1.3.1  COMMITTED  PowerHA SystemMirror Client
  cluster.es.cspoc.cmds      7.1.3.2  COMMITTED  CSPOC Commands
  cluster.es.cspoc.rte       7.1.3.2  COMMITTED  CSPOC Runtime Commands
  cluster.es.migcheck        7.1.3.0  COMMITTED  PowerHA SystemMirror Migration
  cluster.es.nfs.rte         7.1.3.1  COMMITTED  NFS Support
  cluster.es.server.diag     7.1.3.2  COMMITTED  Server Diags
  cluster.es.server.events   7.1.3.2  COMMITTED  Server Events
  cluster.es.server.rte      7.1.3.2  COMMITTED  Base Server Runtime
  cluster.es.server.testtool
                             7.1.3.0  COMMITTED  Cluster Test Tool 
  cluster.es.server.utils    7.1.3.2  COMMITTED  Server Utilities
  cluster.license            7.1.3.0  COMMITTED  PowerHA SystemMirror
  mcr.rte                   7.1.3.30  COMMITTED  Metacluster Checkpoint and
  bos.cluster.rte           7.1.3.30  COMMITTED  Cluster Aware AIX
  bos.cluster.solid          7.1.1.0  COMMITTED  Cluster Aware AIX SolidDB 
  cluster.es.client.clcomd   7.1.3.0  COMMITTED  Cluster Communication
  cluster.es.client.lib      7.1.3.2  COMMITTED  PowerHA SystemMirror Client
  cluster.es.client.rte      7.1.3.2  COMMITTED  PowerHA SystemMirror Client
  cluster.es.cspoc.rte       7.1.3.0  COMMITTED  CSPOC Runtime Commands 
  cluster.es.migcheck        7.1.3.0  COMMITTED  PowerHA SystemMirror Migration
  cluster.es.nfs.rte         7.1.3.1  COMMITTED  NFS Support
  cluster.es.server.diag     7.1.3.0  COMMITTED  Server Diags 
  cluster.es.server.events   7.1.3.0  COMMITTED  Server Events 
  cluster.es.server.rte      7.1.3.2  COMMITTED  Base Server Runtime
  cluster.es.server.utils    7.1.3.2  COMMITTED  Server Utilities
  mcr.rte                   7.1.3.30  COMMITTED  Metacluster Checkpoint and
  cluster.man.en_US.es.data  7.1.3.2  COMMITTED  Man Pages - U.S. English

After this (and of course a reboot) I did a final cluster verification, then started the cluster without any problems. SNMP (and, as far as I can see, everything else) was working as expected. All in all it took me about 25 minutes per node, most of which can be done in parallel if the cluster can be stopped. Plan about 30-60 minutes for the whole update if you have the resources ready on the NIM server and everything else prepared.
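For anyone repeating this, the per-node sequence can be sketched roughly as below. The directory paths are hypothetical; adapt them to wherever your NIM server or local repository stages the filesets:

```shell
# Rough per-node update sequence (paths hypothetical).
# Stop cluster services on the node first, e.g. via smitty clstop.
install_all_updates -d /mnt/aix_7100-03-04 -Y     # OS update; -Y accepts licenses
install_all_updates -d /mnt/powerha_7.1.3_sp2 -Y  # PowerHA filesets
shutdown -Fr                                      # reboot the node
# afterwards: cluster verification & synchronisation, then restart
# cluster services
```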

I hope this helps.

bakunin
