Can anyone shed some light on this HACMP failover? Post: 302096203

Operating Systems AIX Can anyone shed some light on this HACMP failover? Post 302096203 by Wez on Tuesday 14th of November 2006 11:05:34 AM

Hello All,

Here is a snippet from our cluster.log. I was wondering if anyone could shed some light on what may have caused the failover.

The first two lines indicate a possible memory issue, which I am currently looking into.

Quote:

Nov 7 16:30:21 server_01 grpsvcs[16000]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6xYcC4/BO8I3/c2C/4Im5t....................:::Reference ID: :::Template ID: 463a893d:::Details File: :::Location: RSCT,pgsd.C,1.51,195 :::GS_ERROR_ER Internal logic error in Group Services daemon DIAGNOSTIC EXPLANATION Memory allocation failed. Please check the memory availability.
Nov 7 16:30:21 server_01 grpsvcs[16000]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6xYcC4/BO8I3/Ysc/4Im5t....................:::Reference ID: :::Template ID: 463a893d:::Details File: :::Location: RSCT,pgsd.C,1.51,195 :::GS_ERROR_ER Internal logic error in Group Services daemon DIAGNOSTIC EXPLANATION Memory allocation failed. Please check the memory availability.
Nov 7 16:32:10 server_01 clstrmgrES[17318]: Tue Nov 7 16:32:10 SendInfoBcast: ha_gs_send_message() failed rc=1
Nov 7 16:32:10 server_01 clstrmgrES[17318]: Tue Nov 7 16:32:10 clstrmgr on node 1 is exiting with code 4
Nov 7 16:32:10 server_01 haemd[16528]: LPP=PSSP,Fn=emd_gsi.c,SID=1.4.1.33,L#=1361, haemd: 2521-032 Cannot dispatch group services (1).
Nov 7 16:32:11 server_01 clsmuxpdES[17574]: clRGInfoGetRGHandle() failed, error: : The system call does not exist on this system.
Nov 7 16:32:11 server_01 clsmuxpdES[17574]: Error from ha_em_receive_response(): EMAPI error number 10 EMAPI error message 2521-649 An attempt to receive a command response was unsuccessful; read() detected end-of-file; connection with Event Manager lost. : The system call does not exist on this system.
Nov 7 16:32:11 server_01 clsmuxpdES[17574]: Event Manager API Disconnected:: The system call does not exist on this system.
Nov 7 16:32:11 server_01 snmpd[14998]: NOTICE: SMUX packet from (127.0.0.1+32771+1)
Nov 7 16:32:11 server_01 snmpd[14998]: NOTICE: SMUX trap: (6 10) (127.0.0.1+32771+1)
Nov 7 16:32:11 server_01 snmpd[14998]: NOTICE: SMUX packet from (127.0.0.1+32771+1)
Nov 7 16:32:11 server_01 snmpd[14998]: NOTICE: SMUX trap: (6 11) (127.0.0.1+32771+1)
Nov 7 16:32:12 server_01 snmpd[14998]: NOTICE: SMUX packet from (127.0.0.1+32771+1)
Nov 7 16:32:12 server_01 snmpd[14998]: NOTICE: SMUX trap: (6 15) (127.0.0.1+32771+1)
Nov 7 16:32:12 server_01 HACMP for AIX: clexit.rc : Unexpected termination of clstrmgrES.
Nov 7 16:32:12 server_01 HACMP for AIX: clexit.rc : Halting system immediately!!!
Nov 7 17:29:19 server_01 RMCdaemon[11610]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6eKora0TF9I3/6V2/4Im5t....................:::Reference ID: :::Template ID: a6df45aa:::Details File: :::Location: RSCT,rmcd.c,1.34,196 :::RMCD_INFO_0_ST The daemon is started.
Nov 7 17:29:19 server_01 ctcasd[11870]: (Recorded using libct_ffdc.a cv 2):::Error ID: 6YzeY.1TF9I3/UeV/4Im5t....................:::Reference ID: :::Template ID: c092afe4:::Details File: :::Location: rsct.core.sec,ctcas_main.c,1.13,295 :::ctcasd Daemon Started

Thanks.

Wez
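
When reading an excerpt like the one above, it can help to put the events on a timeline and look for silent periods. The Python sketch below is only an illustration, not an HACMP or RSCT tool: it parses the syslog-style prefix of each line and reports any gap between consecutive entries that exceeds a threshold. The names `parse_line` and `find_gaps` are hypothetical, the year is an assumption (taken from the post date, because the log lines carry none), and the sample lines are copied from the excerpt. It orders events; it does not diagnose the cause.

```python
import re
from datetime import datetime, timedelta

# Syslog-style prefix: "Nov 7 16:32:10 server_01 clstrmgrES[17318]: message"
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<tag>.+?):\s+(?P<msg>.*)$"
)

# A few lines copied verbatim from the cluster.log excerpt in the post.
SAMPLE_LINES = [
    "Nov 7 16:32:10 server_01 clstrmgrES[17318]: Tue Nov 7 16:32:10 "
    "SendInfoBcast: ha_gs_send_message() failed rc=1",
    "Nov 7 16:32:10 server_01 clstrmgrES[17318]: Tue Nov 7 16:32:10 "
    "clstrmgr on node 1 is exiting with code 4",
    "Nov 7 16:32:10 server_01 haemd[16528]: LPP=PSSP,Fn=emd_gsi.c,"
    "SID=1.4.1.33,L#=1361, haemd: 2521-032 Cannot dispatch group services (1).",
    "Nov 7 16:32:12 server_01 HACMP for AIX: clexit.rc : "
    "Unexpected termination of clstrmgrES.",
    "Nov 7 16:32:12 server_01 HACMP for AIX: clexit.rc : "
    "Halting system immediately!!!",
    "Nov 7 17:29:19 server_01 ctcasd[11870]: (Recorded using libct_ffdc.a cv 2)"
    ":::Error ID: 6YzeY.1TF9I3/UeV/4Im5t....................:::Reference ID: "
    ":::Template ID: c092afe4:::Details File: :::Location: "
    "rsct.core.sec,ctcas_main.c,1.13,295 :::ctcasd Daemon Started",
]


def parse_line(line, year=2006):
    """Return (timestamp, daemon, message), or None if the line does not match.

    The year is an assumption: syslog timestamps in this format omit it.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = " ".join(match.group("stamp").split())  # collapse padding spaces
    when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    daemon = match.group("tag").split("[", 1)[0]  # drop the "[pid]" suffix
    return when, daemon, match.group("msg")


def find_gaps(events, threshold=timedelta(minutes=5)):
    """Return pairs of consecutive events separated by at least `threshold`."""
    return [
        (prev, cur)
        for prev, cur in zip(events, events[1:])
        if cur[0] - prev[0] >= threshold
    ]


if __name__ == "__main__":
    events = [e for e in map(parse_line, SAMPLE_LINES) if e is not None]
    events.sort(key=lambda e: e[0])  # stable sort keeps same-second order
    for prev, cur in find_gaps(events):
        print(f"{cur[0] - prev[0]} of silence between "
              f"{prev[1]} at {prev[0]:%H:%M:%S} and {cur[1]} at {cur[0]:%H:%M:%S}")
```

On the sample lines the only gap it reports lies between the clexit.rc halt at 16:32:12 and the ctcasd start at 17:29:19, which matches the node going down and coming back up. To run it over a full log, replace `SAMPLE_LINES` with the lines read from the file.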

LEARN ABOUT OPENSOLARIS

scds_fm_sleep

scds_fm_sleep(3HA)					 Sun Cluster HA and Data Services					scds_fm_sleep(3HA)

NAME

       scds_fm_sleep - wait for a message on a fault monitor control socket

SYNOPSIS

       cc [flags...] -I /usr/cluster/include file -L /usr/cluster/lib  -l  dsdev
       #include <rgm/libdsdev.h>

       scha_err_t scds_fm_sleep(scds_handle_t handle, time_t timeout);

DESCRIPTION

       The scds_fm_sleep() function waits for a data service application process tree that is running under control of the process monitor
       facility to die. If no such death occurs within the specified timeout period, the function returns SCHA_ERR_NOERR.

       If a data service application process tree death occurs, scds_fm_sleep() records SCDS_COMPLETE_FAILURE in the failure history and either
       restarts the process tree or fails it over according to the algorithm described in the scds_fm_action(3HA) man page. If a failover attempt
       is unsuccessful, a restart of the application is attempted.

       If an attempted restart fails, the function returns SCHA_ERR_INTERNAL.

       Note that if the failure history causes this function to do a failover, and the failover attempt succeeds, scds_fm_sleep() never returns.

PARAMETERS

       The following parameters are supported:

       handle		   The handle returned from scds_initialize(3HA).

       timeout		   The timeout period measured in seconds.

RETURN VALUES

       The scds_fm_sleep() function returns the following:

       0		   The function succeeded.

       nonzero		   The function failed.

ERRORS

       SCHA_ERR_NOERR		   Indicates that the process tree has not died.

       SCHA_ERR_INTERNAL	   Indicates that the data service application process tree has died and failed to restart.

       Other values		   Indicate the function failed. See scha_calls(3HA) for the meaning of failure codes.

FILES

       /usr/cluster/include/rgm/libdsdev.h

	   Include file

       /usr/cluster/lib/libdsdev.so

	   Library

ATTRIBUTES

       See attributes(5) for descriptions of the following attributes:

       +-----------------------------+-----------------------------+
       |       ATTRIBUTE TYPE        |       ATTRIBUTE VALUE       |
       +-----------------------------+-----------------------------+
       |Availability                 |SUNWscdev                    |
       +-----------------------------+-----------------------------+
       |Interface Stability          |Evolving                     |
       +-----------------------------+-----------------------------+

SEE ALSO

	scha_calls(3HA), scds_fm_action(3HA), scds_initialize(3HA), attributes(5)

Sun Cluster 3.2 						    7 Sep 2007							scds_fm_sleep(3HA)