Sun Cluster resource group can't fail over


 
# 8  
Old 07-09-2008
Failover operation completed successfully for device ssd (GUID 600a0b800029d2160000057148649e21): failed over from <none> to secondary


What command did you issue to test the failover? Can you do it once more and capture exactly what you get during that time?
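For example (just a sketch; the node and group names are placeholders), run tail -f /var/adm/messages on the node you are switching the group to, and from another terminal issue:

# clrg switch -n <target-node> <resource-group>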
# 9  
Old 07-09-2008
# 10  
Old 07-09-2008
Hi,

The command I used was clrg switch -n C2SRV2 proxy2-rg

I ran tail -f on the messages file while the switch was running and have attached this information:


Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_prenet_start> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <1800> seconds
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_prenet_start>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 285716 daemon.notice] 20 fe_rpc_command: cmd_type(enum):<2>:cmd=<null>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<0>, ...)
Jul 9 14:35:06 C2SRV2 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <proxy2-rg.proxy2-HAS-rs.10> has been suspended.
Jul 9 14:35:09 C2SRV2 Cluster.Framework: [ID 801593 daemon.notice] stdout: becoming primary for proxy2-dg
Jul 9 14:35:11 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:11 C2SRV2 /scsi_vhci/ssd@g600a0b800029d28e000005ff48649a5c (ssd24): path /pci@780/SUNW,qlc@0/fp@0,0 (fp1) target address 200a00a0b829d290,b is now STANDBY because of an externally initiated failover
Jul 9 14:35:16 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:16 C2SRV2 Initiating failover for device ssd (GUID 600a0b800029d28e000005ff48649a5c)
Jul 9 14:35:18 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:18 C2SRV2 Failover operation completed successfully for device ssd (GUID 600a0b800029d28e000005ff48649a5c): failed over from <none> to primary
Jul 9 14:35:18 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:18 C2SRV2 /scsi_vhci/ssd@g600a0b800029d2160000057148649e21 (ssd25): path /pci@780/SUNW,qlc@0/fp@0,0 (fp1) target address 200a00a0b829d290,c is now STANDBY because of an externally initiated failover
Jul 9 14:35:23 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:23 C2SRV2 Initiating failover for device ssd (GUID 600a0b800029d2160000057148649e21)
Jul 9 14:35:25 C2SRV2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
Jul 9 14:35:25 C2SRV2 Failover operation completed successfully for device ssd (GUID 600a0b800029d2160000057148649e21): failed over from <none> to secondary
Jul 9 14:35:25 C2SRV2 Cluster.RGM.rgmd: [ID 285716 daemon.notice] 20 fe_rpc_command: cmd_type(enum):<3>:cmd=<null>:tag=<proxy2-rg.proxy2-HAS-rs.10>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<0>, ...)
Jul 9 14:35:25 C2SRV2 Cluster.RGM.rgmd: [ID 316625 daemon.notice] Timeout monitoring on method tag <proxy2-rg.proxy2-HAS-rs.10> has been resumed.
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_prenet_start> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 1% of timeout <1800 seconds>
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_start> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <90> seconds
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_start> for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <300> seconds
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 333393 daemon.notice] 49 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_start>:tag=<proxy2-rg.proxy2-HAS-rs.7>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_start>:tag=<proxy2-rg.proxy2-zone-rs.0>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:35:27 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_start> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <90 seconds>
Jul 9 14:35:28 C2SRV2 genunix: [ID 408114 kern.info] /pseudo/zconsnex@1/zcons@1 (zcons1) online
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 764140 daemon.error] Method <gds_svc_start> on resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>: Timeout.
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_monitor_stop> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <90> seconds
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_stop> for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <300> seconds
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 333393 daemon.notice] 49 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<proxy2-rg.proxy2-HAS-rs.8>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_stop>:tag=<proxy2-rg.proxy2-zone-rs.1>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:40:33 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_monitor_stop> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <90 seconds>
Jul 9 14:43:35 C2SRV2 Cluster.RGM.fed: [ID 605976 daemon.notice] SCSLM zone <proxy2.mail.internal> down
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - Shutdown started. Wed Jul 9 13:40:33 BST 2008
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - Changing to init state 0 - please wait
Jul 9 14:43:35 C2SRV2 SC[SUNWsczone.stop_sczbt]proxy2-rg proxy2-zone-rs: [ID 567783 daemon.notice] stop_command rc<0> - showmount: proxy2.mail.internal: RPC: Program not registered
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <proxy2-zone-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 60% of timeout <300 seconds>
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, timeout <1800> seconds
Jul 9 14:43:35 C2SRV2 Cluster.RGM.rgmd: [ID 252072 daemon.notice] 50 fe_rpc_command: cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_postnet_stop>:tag=<proxy2-rg.proxy2-HAS-rs.11>: Calling security_clnt_connect(..., host=<C2SRV2>, sec_type {0:WEAK, 1:STRONG, 2DES} =<1>, ...)
Jul 9 14:43:36 C2SRV2 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <proxy2-HAS-rs>, resource group <proxy2-rg>, node <C2SRV2>, time used: 0% of timeout <1800 seconds>
Jul 9 14:43:36 C2SRV2 Cluster.Framework: [ID 801593 daemon.notice] stdout: no longer primary for proxy2-dg
# 11  
Old 07-09-2008
When a Solaris Zone is managed by the Sun Cluster HA for Solaris Containers data service, the Solaris Zone becomes a failover Solaris Zone, or multiple-masters Solaris Zone, across the Sun Cluster nodes. The failover is managed by the Sun Cluster HA for Solaris Containers data service, which runs only within the global zone.
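Because that data service runs in the global zone, you can watch from the global zone on C2SRV2 whether the zone actually boots and which SMF services are holding up the configured milestone while gds_svc_start is running. A rough sketch using the names from this thread (adjust to your setup):

# zoneadm list -cv
# zlogin proxy2.mail.internal svcs -xv
# clrs status -g proxy2-rg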
# 12  
Old 07-09-2008
Hi,

In relation to the SAN, I am using a

Sun StorageTek 6140

Thanks
# 13  
Old 07-09-2008
Perform the following step for each resource group you want to return to the original node:
# clrg switch -n nodename resourcegroup
If your cluster is running 3.2 you should not use Network_resources_used any more; just place your logical host in the dependency list.
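For example (a sketch only; proxy2-lh-rs stands for a hypothetical logical hostname resource), the logical host goes into the dependency list of the zone resource instead of Network_resources_used:

# clrs set -p Resource_dependencies=proxy2-lh-rs,proxy2-HAS-rs proxy2-zone-rs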

From the messages I see two probable root causes:
1. The master server is installed on shared storage.
2. The master server resource does not depend on the necessary HASP (HAStoragePlus) resource.
The problem arises from a probable misconfiguration.

It is nearly certain that the dependency from the master resource to the underlying HAStoragePlus resource is missing. The symptoms are classic: if the dependency is missing, the RGM runs the validation on the second node, where the shared storage is not available, so the agent behaves as expected and the start cannot succeed there. The problem is fixed once the necessary dependency is added.
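A quick way to check the current dependencies and, if the HAStoragePlus one is missing, add it (a sketch using the resource names from this thread):

# clrs show -v proxy2-zone-rs | grep -i dependencies
# clrs set -p Resource_dependencies+=proxy2-HAS-rs proxy2-zone-rs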
# 14  
Old 07-09-2008
I have the data service installed on all nodes in the cluster.

The following files are in place:
/opt/SUNWsczone/sczbt/util/proxy2-sczbt_config
/opt/SUNWsczone/sczbt/util/sczbt_register
/opt/ParameterFile/sczbt_proxy2-zone-rs

These are used to create the proxy2-zone-rs resource, and this resource does not run on all the servers. When you try to fail over the resource group proxy2-rg, it fails over all the resources apart from the last one, proxy2-zone-rs.
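The per-node state of the group and of that last resource can be checked with the standard status commands:

# clrg status proxy2-rg
# clrs status proxy2-zone-rs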

I created the resource by editing the proxy2-sczbt_config file,
then I registered the config file, i.e.
/opt/SUNWsczone/sczbt/util/sczbt_register -f /opt/SUNWsczone/sczbt/util/proxy2-sczbt_config

This created /opt/ParameterFile/sczbt_proxy2-zone-rs,

and then I copied /opt/ParameterFile/sczbt_proxy2-zone-rs
to all the other nodes in the cluster, so they should be identical.
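(The copy can be done with something like the following; C2SRV1 here is just a placeholder for the other node's name.)

# scp /opt/ParameterFile/sczbt_proxy2-zone-rs C2SRV1:/opt/ParameterFile/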

I have attached the config/parameter files:

bash-3.00# cat proxy2-sczbt_config
#
# Copyright 2007 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
# ident "@(#)sczbt_config 1.4 07/09/14 SMI"
#
# This file will be sourced in by sczbt_register and the parameters
# listed below will be used.
#
# These parameters can be customized in (key=value) form
#
# RS - Name of the resource
# RG - Name of the resource group containing RS
# PARAMETERDIR - Name of the parameter file directory
# SC_NETWORK - Identifies if SUNW.LogicalHostname will be used
# true = zone will use SUNW.LogicalHostname
# false = zone will use its own configuration
#
# NOTE: If the ip-type keyword for the non-global zone is set
# to "exclusive", only "false" is allowed for SC_NETWORK
#
# The configuration of a zone's network addresses depends on
# whether you require IPMP protection or protection against
# the failure of all physical interfaces.
#
# If you require only IPMP protection, configure the zone's
# addresses by using the zonecfg utility and then place the
# zone's address in an IPMP group.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=
#
# If IPMP protection is not required, just configure the
# zone's addresses by using the zonecfg utility.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=
#
# If you require protection against the failure of all physical
# interfaces, choose one option from the following list.
#
# - If you want the SUNW.LogicalHostName resource type to manage
# the zone's addresses, configure a SUNW.LogicalHostName
# resource with at least one of the zone's addresses.
#
# To configure this option set
# SC_NETWORK=true
# SC_LH=<Name of the SC Logical Hostname resource>
#
# - Otherwise, configure the zone's addresses by using the
# zonecfg utility and configure a redundant IP address
# for use by a SUNW.LogicalHostName resource.
#
# To configure this option set
# SC_NETWORK=false
# SC_LH=<Name of the SC Logical Hostname resource>
#
# Whichever option is chosen, multiple zone addresses can be
# used either in the zone's configuration or using several
# SUNW.LogicalHostname resources.
#
# e.g. SC_NETWORK=true
# SC_LH=zone1-lh1,zone1-lh2
#
# SC_LH - Name of the SC Logical Hostname resource
# FAILOVER - Identifies if the zone's zone path is on a
# highly available local file system
#
# e.g. FAILOVER=true - highly available local file system
# FAILOVER=false - local file system
#
# HAS_RS - Name of the HAStoragePlus SC resource
#

RS=proxy2-zone-rs
RG=proxy2-rg
PARAMETERDIR=/opt/ParameterFile
SC_NETWORK=false
SC_LH=
FAILOVER=true
HAS_RS=proxy2-HAS-rs

#
# The following variable will be placed in the parameter file
#
# Parameters for sczbt (Zone Boot)
#
# Zonename Name of the zone
# Zonebrand Brand of the zone. Current supported options are
# "native" (default), "lx" or "solaris8"
# Zonebootopt Zone boot options ("-s" requires that Milestone=single-user)
# Milestone SMF Milestone which needs to be online before the zone is
# considered booted. This option is only used for the
# "native" Zonebrand.
# LXrunlevel Runlevel which needs to get reached before the zone is
# considered booted. This option is only used for the "lx"
# Zonebrand.
# SLrunlevel Solaris legacy runlevel which needs to get reached before the
# zone is considered booted. This option is only used for the
# "solaris8" Zonebrand.
# Mounts Mounts is a list of directories and their mount options,
# which are loopback mounted from the global zone into the
# newly booted zone. The mountpoint in the local zone can
# be different to the mountpoint from the global zone.
#
# The Mounts parameter format is as follows,
#
# Mounts="/<global zone directory>:/<local zone directory>:<mount options>"
#
# The following are valid examples for the "Mounts" variable
#
# Mounts="/globalzone-dir1:/localzone-dir1:rw"
# Mounts="/globalzone-dir1:/localzone-dir1:rw /globalzone-dir2:rw"
#
# The only required entry is the /<global zone directory>, the
# /<local zone directory> and <mount options> can be omitted.
#
# Omitting /<local zone directory> will make the local zone
# mountpoint the same as the global zone directory.
#
# Omitting <mount options> will not provide any mount options
# except the default options from the mount command.
#
# Note: You must manually create any local zone mountpoint
# directories that will be used within the Mounts variable,
# before registering this resource within Sun Cluster.
#

Zonename="proxy2.mail.internal"
Zonebrand="native"
Zonebootopt=""
Milestone="multi-user-server"
LXrunlevel="3"
SLrunlevel="3"
Mounts=""
########################
Parameter file:

bash-3.00# cat sczbt_proxy2-zone-rs
#!/usr/bin/ksh
#
# Copyright 2007 Sun Microsystems, Inc. All rights reserved.
# Use is subject to license terms.
#
#
# Parameters for sczbt (Zone Boot)
#
# Zonename Name of the zone
# Zonebrand Brand of the zone. Current supported options are
# "native" (default), "lx" or "solaris8"
# Zonebootopt Zone boot options ("-s" requires that Milestone=single-user)
# Milestone SMF Milestone which needs to be online before the zone is
# considered as booted. This option is only used for the
# "native" Zonebrand.
# LXrunlevel Runlevel which needs to get reached before the zone is
# considered booted. This option is only used for the "lx"
# Zonebrand.
# SLrunlevel Solaris legacy runlevel which needs to get reached before the
# zone is considered booted. This option is only used for the
# "solaris8" Zonebrand.
# Mounts Mounts is a list of directories and their mount options,
# which are loopback mounted from the global zone into the
# newly booted zone. The mountpoint in the local zone can
# be different to the mountpoint from the global zone.
#
# The Mounts parameter format is as follows,
#
# Mounts="/<global zone directory>:/<local zone directory>:<mount options>"
#
# The following are valid examples for the "Mounts" variable
#
# Mounts="/globalzone-dir1:/localzone-dir1:rw"
# Mounts="/globalzone-dir1:/localzone-dir1:rw /globalzone-dir2:rw"
# The only required entry is the /<global zone directory>, the
# /<local zone directory> and <mount options> can be omitted.
#
# Omitting /<local zone directory> will make the local zone
# mountpoint the same as the global zone directory.
#
# Omitting <mount options> will not provide any mount options
# except the default options from the mount command.
#
# Note: You must manually create any local zone mountpoint
# directories that will be used within the Mounts variable,
# before registering this resource within Sun Cluster.
#

Zonename="proxy2.mail.internal"
Zonebrand="native"
Zonebootopt=""
Milestone="multi-user-server"
LXrunlevel="3"
SLrunlevel="3"
Mounts=""