Sun Cluster resource group can't fail over


 
# 8  
Old 07-09-2008
Failover operation completed successfully for device ssd (GUID 600a0b800029d2160000057148649e21): failed over from <none> to secondary


What command did you issue to test the failover? Can you do it once more and capture exactly what you get during that time?
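For example (just a sketch; the node and group names are placeholders), run tail -f /var/adm/messages on the node you are switching the group to, and from another terminal issue:

# clrg switch -n <target-node> <resource-group>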
# 9  
Old 07-09-2008
# 10  
Old 07-09-2008
Hi,

The command I used was clrg switch -n C2SRV2 proxy2-rg

I ran tail -f on the messages file while the switch was running and have attached this information:


Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <1800> seconds
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_prenet_start>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 285716 daemon.notice] 20 fe_rpc_command: cmd_type(enum):<2>:cmd=<null>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<0>, ...)
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <proxy2-rg.proxy2-HAS-rs.10> has been suspended.
Jul 9 14:35:09 C2SRV2 Cluster.Framework: [ID 801593 daemon.notice] stdout: becoming primary for proxy2-dg
Jul 9 14:35:11 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:11 C2SRV2 /scsi_vhci/ssd@g600a0b800029d28e000005ff48649a5c (ssd24): path /pci@780/SUNW,qlc@0/fp@0,0 (fp1) target address 200a00a0b829d290,b is now STANDBY because of an externally initiated failover
Jul 9 14:35:16 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:16 C2SRV2 Initiating failover for device ssd (GUID 600a0b800029d28e000005ff48649a5c)
Jul 9 14:35:18 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:18 C2SRV2 Failover operation completed successfully for device ssd (GUID 600a0b800029d28e000005ff48649a5c): failed over from <none> to primary
Jul 9 14:35:18 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:18 C2SRV2 /scsi_vhci/ssd@g600a0b800029d2160000057148649e21 (ssd25): path /pci@780/SUNW,qlc@0/fp@0,0 (fp1) target address 200a00a0b829d290,c is now STANDBY because of an externally initiated failover
Jul 9 14:35:23 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:23 C2SRV2 Initiating failover for device ssd (GUID 600a0b800029d2160000057148649e21)
Jul 9 14:35:25 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:25 C2SRV2 Failover operation completed successfully for device ssd (GUID 600a0b800029d2160000057148649e21): failed over from <none> to secondary
Jul 9 14:35:25 C2SRV2 Cluster.RGM.rgmd: [ID 285716 daemon.notice] 20 fe_rpc_command: cmd_type(enum):<3>:cmd=<null>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<0>, ...)
Jul 9 14:35:25 C2SRV2 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <proxy2-rg.proxy2-HAS-rs.10> has been resumed.
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_prenet_start> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 1% of timeout <1800 seconds>
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_start> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <90> seconds
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_start> for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <300> seconds
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 333393 daemon.notice] 49 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_start>:tag=<proxy2-rg.proxy2-HAS-rs.7>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_start>:tag=<proxy2-rg.proxy2-zone-rs.0>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_start> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <90 seconds>
Jul 9 14:35:28 C2SRV2 genunix: [ID 408114 kern.info] /pseudo/zconsnex@1/zcons@1 (zcons1) online
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 764140 daemon.error] Method <gds_svc_start> on resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>: Timeout.
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_stop> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <90> seconds
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_stop> for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <300> seconds
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 333393 daemon.notice] 49 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<proxy2-rg.proxy2-HAS-rs.8>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_stop>:tag=<proxy2-rg.proxy2-zone-rs.1>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_stop> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <90 seconds>
Jul 9 14:43:35 C2SRV2 Cluster.RGM.fed: [ID 605976 daemon.notice] SCSLM zone <proxy2.mail.internal> down
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - Shutdown started. Wed Jul 9 13:40:33 BST 2008
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - Changing to init state 0 - please wait
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - showmount: proxy2.mail.internal: RPC: Program not registered
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 60% of timeout <300 seconds>
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <1800> seconds
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_postnet_stop>:tag=<proxy2-rg.proxy2-HAS-rs.11>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:43:36 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <1800 seconds>
Jul 9 14:43:36 C2SRV2 Cluster.Framework: [ID 801593 daemon.notice] stdout: no longer primary for proxy2-dg
# 11  
Old 07-09-2008
When a Solaris Zone is managed by the Sun Cluster HA for Solaris Containers data service, the Solaris Zone becomes a failover Solaris Zone, or multiple-masters Solaris Zone, across the Sun Cluster nodes. The failover is managed by the Sun Cluster HA for Solaris Containers data service, which runs only within the global zone.
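Because that data service runs in the global zone, you can watch from the global zone on C2SRV2 whether the zone actually boots and which SMF services are holding up the configured milestone while gds_svc_start is running. A rough sketch using the names from this thread (adjust to your setup):

# zoneadm list -cv
# zlogin proxy2.mail.internal svcs -xv
# clrs status -g proxy2-rg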
# 12  
Old 07-09-2008
Hi,

In relation to the SAN, I am using a

Sun StorageTek 6140

Thanks
# 13  
Old 07-09-2008
Perform the following step for each resource group you want to return to the original node:
# clrg switch -n nodename resourcegroup
If your cluster is running 3.2 you should not use Network_resources_used any more; just place your logical host in the dependency list.
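For example (a sketch only; proxy2-lh-rs stands for a hypothetical logical hostname resource), the logical host goes into the dependency list of the zone resource instead of Network_resources_used:

# clrs set -p Resource_dependencies=proxy2-lh-rs,proxy2-HAS-rs proxy2-zone-rs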

From the messages I see two probable root causes:
1. The master server is installed on shared storage.
2. The master server resource does not depend on the necessary HASP (HAStoragePlus) resource.
The problem arises from a probable misconfiguration.

It is nearly certain that the dependency from the master resource to the underlying HAStoragePlus resource is missing. The symptoms are classic: if the dependency is missing, the RGM runs the validation on the second node, where the shared storage is not available, so the agent behaves as expected and the start cannot succeed there. The problem is fixed once the necessary dependency is added.
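A quick way to check the current dependencies and, if the HAStoragePlus one is missing, add it (a sketch using the resource names from this thread):

# clrs show -v proxy2-zone-rs | grep -i dependencies
# clrs set -p Resource_dependencies+=proxy2-HAS-rs proxy2-zone-rs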
# 14  
Old 07-09-2008
I have the data service installed on all nodes in the cluster.

The following files are in place:
/opt/SUNWsczone/sczbt/util/proxy2-sczbt_config
/opt/SUNWsczone/sczbt/util/sczbt_register
/opt/ParameterFile/sczbt_proxy2-zone-rs

These are used to create the proxy2-zone-rs resource, and this resource does not run on all the servers. When you try to fail over the resource group proxy2-rg, it fails over all the resources apart from the last one, proxy2-zone-rs.
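The per-node state of the group and of that last resource can be checked with the standard status commands:

# clrg status proxy2-rg
# clrs status proxy2-zone-rs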

I created the resource by editing the proxy2-sczbt_config file,
then I registered the config file, i.e.
/opt/SUNWsczone/sczbt/util/sczbt_register -f /opt/SUNWsczone/sczbt/util/proxy2-sczbt_config

This created /opt/ParameterFile/sczbt_proxy2-zone-rs,

and then I copied /opt/ParameterFile/sczbt_proxy2-zone-rs
to all the other nodes in the cluster, so they should be identical.
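(The copy can be done with something like the following; C2SRV1 here is just a placeholder for the other node's name.)

# scp /opt/ParameterFile/sczbt_proxy2-zone-rs C2SRV1:/opt/ParameterFile/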

I have attached the config/parameter files:

bash-3.00# cat proxy2-sczbt_config
#
# Copyright 2007 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)sczbt_config 1.4 07/09/14 SMI"
#
# This file will be sourced in by sczbt_register and the parameters
# listed below will be used.
#
# These parameters can be customized in (key=value) form
#
# RS - Name of the resource
# RG - Name of the resource group containing RS
# PARAMETERDIR - Name of the parameter file directory
# SC_NETWORK - Identifies if SUNW.LogicalHostname will be used
# true = zone will use SUNW.LogicalHostname
# false = zone will use its own configuration
#
# NOTE: If the ip-type keyword for the non-global zone is set
# to "exclusive", only "false" is allowed for SC_NETWORK
#
# The configuration of a zone's network addresses depends on
# whether you require IPMP protection or protection against
# the failure of all physical interfaces.
#
# If you require only IPMP protection, configure the zone's
# addresses by using the zonecfg utility and then place the
# zone's address in an IPMP group.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=
#
# If IPMP protection is not required, just configure the
# zone's addresses by using the zonecfg utility.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=
#
# If you require protection against the failure of all physical
# interfaces, choose one option from the following list.
#
# - If you want the SUNW.LogicalHostName resource type to manage
# the zone's addresses, configure a SUNW.LogicalHostName
# resource with at least one of the zone's addresses.
#
# To configure this option set
# SC_NETWORK=true
# SC_LH=<Name of the SC Logical Hostname resource>
#
# - Otherwise, configure the zone's addresses by using the
# zonecfg utility and configure a redundant IP address
# for use by a SUNW.LogicalHostName resource.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=<Name of the SC Logical Hostname resource>
#
# Whichever option is chosen, multiple zone addresses can be
# used either in the zone's configuration or using several
# SUNW.LogicalHostname resources.
#
# e.g. SC_NETWORK=true
# SC_LH=zone1-lh1,zone1-lh2
#
# SC_LH - Name of the SC Logical Hostname resource
# FAILOVER - Identifies if the zone's zone path is on a
# highly available local file system
#
# e.g. FAILOVER=true - highly available local file system
# FAILOVER=false - local file system
#
# HAS_RS - Name of the HAStoragePlus SC resource
#

RS=proxy2-zone-rs
RG=proxy2-rg
PARAMETERDIR=/opt/ParameterFile
SC_NETWORK=false
SC_LH=
FAILOVER=true
HAS_RS=proxy2-HAS-rs

#
# The following variable will be placed in the parameter file
#
# Parameters for sczbt (Zone Boot)
#
# Zonename Name of the zone
# Zonebrand Brand of the zone. Current supported options are
# "native" (default), "lx" or "solaris8"
# Zonebootopt Zone boot options ("-s" requires that Milestone=single-user)
# Milestone SMF Milestone which needs to be online before the zone is
# considered booted. This option is only used for the
# "native" Zonebrand.
# LXrunlevel Runlevel which needs to get reached before the zone is
# considered booted. This option is only used for the "lx"
# Zonebrand.
# SLrunlevel Solaris legacy runlevel which needs to get reached before the
# zone is considered booted. This option is only used for the
# "solaris8" Zonebrand.
# Mounts Mounts is a list of directories and their mount options,
# which are loopback mounted from the global zone into the
# newly booted zone. The mountpoint in the local zone can
# be different to the mountpoint from the global zone.
#
# The Mounts parameter format is as follows,
#
# Mounts="/<global zone directory>:/<local zone directory>:<mount options>"
#
# The following are valid examples for the "Mounts" variable
#
# Mounts="/globalzone-dir1:/localzone-dir1:rw"
# Mounts="/globalzone-dir1:/localzone-dir1:rw /globalzone-dir2:rw"
#
# The only required entry is the /<global zone directory>, the
# /<local zone directory> and <mount options> can be omitted.
#
# Omitting /<local zone directory> will make the local zone
# mountpoint the same as the global zone directory.
#
# Omitting <mount options> will not provide any mount options
# except the default options from the mount command.
#
# Note: You must manually create any local zone mountpoint
# directories that will be used within the Mounts variable,
# before registering this resource within Sun Cluster.
#

Zonename="proxy2.mail.internal"
Zonebrand="native"
Zonebootopt=""
Milestone="multi-user-server"
LXrunlevel="3"
SLrunlevel="3"
Mounts=""
########################
Parameter file:

bash-3.00# cat sczbt_proxy2-zone-rs
#!/usr/bin/ksh
#
# Copyright 2007 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
#
# Parameters for sczbt (Zone Boot)
#
# Zonename Name of the zone
# Zonebrand Brand of the zone. Current supported options are
# "native" (default), "lx" or "solaris8"
# Zonebootopt Zone boot options ("-s" requires that Milestone=single-user)
# Milestone SMF Milestone which needs to be online before the zone is
# considered as booted. This option is only used for the
# "native" Zonebrand.
# LXrunlevel Runlevel which needs to get reached before the zone is
# considered booted. This option is only used for the "lx"
# Zonebrand.
# SLrunlevel Solaris legacy runlevel which needs to get reached before the
# zone is considered booted. This option is only used for the
# "solaris8" Zonebrand.
# Mounts Mounts is a list of directories and their mount options,
# which are loopback mounted from the global zone into the
# newly booted zone. The mountpoint in the local zone can
# be different to the mountpoint from the global zone.
#
# The Mounts parameter format is as follows,
#
# Mounts="/<global zone directory>:/<local zone directory>:<mount options>"
#
# The following are valid examples for the "Mounts" variable
#
# Mounts="/globalzone-dir1:/localzone-dir1:rw"
# Mounts="/globalzone-dir1:/localzone-dir1:rw /globalzone-dir2:rw"
# The only required entry is the /<global zone directory>, the
# /<local zone directory> and <mount options> can be omitted.
#
# Omitting /<local zone directory> will make the local zone
# mountpoint the same as the global zone directory.
#
# Omitting <mount options> will not provide any mount options
# except the default options from the mount command.
#
# Note: You must manually create any local zone mountpoint
# directories that will be used within the Mounts variable,
# before registering this resource within Sun Cluster.
#

Zonename="proxy2.mail.internal"
Zonebrand="native"
Zonebootopt=""
Milestone="multi-user-server"
LXrunlevel="3"
SLrunlevel="3"
Mounts=""