What is the procedure to reboot cluster nodes

12-08-2011

Hi,

We have two Solaris 10 servers in a Veritas cluster, with an Oracle cluster on the database side.
Both servers have been running for more than a year, and we now need to reboot them.
Can anyone tell me the procedure to bring down the cluster services on both nodes, and the reboot procedure?
If you need any command output from the servers, let me know.

newtoaixos

12-08-2011

Just reboot the nodes. The cluster software should handle the rest.

DukeNuke2

12-08-2011

You should probably review the cluster configuration before you do anything. Make sure you know the service groups, their dependencies, and how they will react to a reboot. In all my years as an SA, I have found that fully trusting what others did before me only results in a lot of pain. But then, I work in a large environment of 4000+ servers with 1000+ clusters of all shapes and sizes. If you have specific questions, post them.

garskoci

12-09-2011

Hi,
In AIX we normally run # smitty clstop on both nodes before going ahead with the reboot. What is the equivalent procedure in a Veritas cluster?

newtoaixos

12-09-2011

Take a look at the

Code:

hastop

command.

I don't know what clstop does. But if you want to offline all of the service groups and shut down the cluster, you can run

Code:

hastop -all

Or, run

Code:

hastop -local

on each node. If you add a

Code:

-force

to it, only the cluster will shut down. The service groups will remain online. But I don't think that you want that.

Not knowing your cluster configuration, I would think that you would want to run

Code:

hastop -local

on each cluster node to let VCS offline the service groups. You will need to know what's controlled by VCS and what's not. I often see Oracle RAC running outside of the cluster; in that case you might have a DBA shut down RAC.

garskoci

12-10-2011

Every cluster is different. I've seen some database clusters where the only thing the cluster controls is the filesystems (like that's not a disaster waiting to happen). In that particular case, offlining cluster resources without DBA involvement could make for a bad day.

Since it looks like you might not be familiar with the nuances of this cluster, here's what I consider the safe route for DB servers:

Run hastatus -sum and note the group that controls the database. Then look at

Code:

/etc/VRTSvcs/conf/config/main.cf

and see what that group actually does (or look via the hagrp and hares commands). Assuming the database itself is controlled by the cluster, bring it down like this -

In one window:

Code:

tail -f /var/VRTSvcs/log/engine_A.log

In another:

Code:

hagrp -offline <database group> -sys <system it's running on>

Watch hastatus and/or the log file you're tailing. If things go down smoothly, then great. If it hangs up waiting on the DB, let the DBA do their thing. The log will usually tell you everything you need to know. Be patient: depending on the DB, it can take a long time to come down.

Once the cluster and the DBA are both satisfied that the DB is down, you can usually then do a hastop -all, and the cluster should pretty easily take care of the dependencies. Wait for it to complete, and help it along if necessary using the info from the log file you're tailing.

Personally, if I'm not 100% comfortable with the system I'm on, I'm paranoid. In that case I like to do everything with cluster nodes one at a time. So offline all resources on all nodes of the cluster and stop VCS, then shut down one node, then the next, etc. Same on the way up. Bring up nodes one at a time, if you want to be extra careful. Let VCS find its brain on one node before another tries.

I've brought down CFS nodes at the same time and ended up with goofy fencing issues (in hindsight, I should have fully closed out GAB and LLT). It's never happened when I bring them down one at a time, so if I have the time, I like to do it that way.

Anyway, hastop -all would probably work just fine on a properly configured cluster. The fun is when the cluster isn't properly configured. And unless you know for sure either way, it's best to play it safe.
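The cautious sequence above can be written down as a dry-run checklist. This is only a sketch: print_plan is a hypothetical helper name, and the group and system names are placeholders to be replaced with whatever hastatus -sum shows for your database group. It prints the commands in order and executes nothing. For the single-group step it uses hagrp -offline, the VCS command for taking one service group offline on one system.

```shell
#!/bin/sh
# Dry run only: prints the shutdown order described above, runs nothing.
# print_plan is a hypothetical helper; dbgroup/node1 below are placeholders.
print_plan() {
    db_group=$1    # group that controls the database (from hastatus -sum)
    db_system=$2   # system that group is online on
    echo "hastatus -sum"                              # note groups and states first
    echo "hagrp -offline $db_group -sys $db_system"   # bring the database group down
    echo "hastatus -sum"                              # confirm it is offline (and the DBA agrees)
    echo "hastop -all"                                # then stop VCS on all nodes
}

print_plan "${DB_GROUP:-dbgroup}" "${DB_SYSTEM:-node1}"
```

Keep the engine log tail running in a second window while working through the printed steps by hand, and only move to the next line once the previous one has finished cleanly.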

Last edited by cubemonkey; 12-10-2011 at 12:55 AM..

cubemonkey

12-14-2011

Hi, thanks for your valuable suggestions. I'm pasting the hastatus output from both servers.

Code:

================================================================================
CLUSTERNODE:cdbpsdb02:# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  cdbpsdb02         RUNNING              0
A  cdbpsdb03         RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  AFASIT          cdbpsdb02         Y          N               ONLINE
B  AFASIT          cdbpsdb03         Y          N               ONLINE
B  FNTSIT          cdbpsdb02         Y          N               ONLINE
B  FNTSIT          cdbpsdb03         Y          N               ONLINE
B  RACSIT          cdbpsdb02         Y          N               ONLINE
B  RACSIT          cdbpsdb03         Y          N               ONLINE
B  SITFLASH        cdbpsdb02         Y          N               ONLINE
B  SITFLASH        cdbpsdb03         Y          N               ONLINE
B  cvm             cdbpsdb02         Y          N               ONLINE
B  cvm             cdbpsdb03         Y          N               ONLINE
================================================================================

CLUSTERNODE:cdbpsdb03:# hastatus -summary

-- SYSTEM STATE
-- System               State                Frozen

A  cdbpsdb02         RUNNING              0
A  cdbpsdb03         RUNNING              0

-- GROUP STATE
-- Group           System               Probed     AutoDisabled    State

B  AFASIT          cdbpsdb02         Y          N               ONLINE
B  AFASIT          cdbpsdb03         Y          N               ONLINE
B  FNTSIT          cdbpsdb02         Y          N               ONLINE
B  FNTSIT          cdbpsdb03         Y          N               ONLINE
B  RACSIT          cdbpsdb02         Y          N               ONLINE
B  RACSIT          cdbpsdb03         Y          N               ONLINE
B  SITFLASH        cdbpsdb02         Y          N               ONLINE
B  SITFLASH        cdbpsdb03         Y          N               ONLINE
B  cvm             cdbpsdb02         Y          N               ONLINE
B  cvm             cdbpsdb03         Y          N               ONLINE
================================================================================

Is it fine if I follow the procedure below? Please advise.

Code:

CLUSTERNODE:cdbpsdb02:# hastop -local
# Wait for 10 minutes
CLUSTERNODE:cdbpsdb03:# hastop -local
# Wait for 10 minutes
CLUSTERNODE:cdbpsdb02:# init 6
CLUSTERNODE:cdbpsdb03:# init 6

================================================================================
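Rather than relying on a fixed wait, one option is to check the summary output for groups that are still not reported OFFLINE on the node in question. The sketch below assumes the column layout shown in the hastatus -summary output above (group rows start with B, state in the sixth column). groups_not_offline is a hypothetical helper name, and the sample rows follow the format above with one state changed to OFFLINE for illustration.

```shell
#!/bin/sh
# List groups that the summary text does not report as OFFLINE on one node.
# Reads hastatus -summary text on stdin; groups_not_offline is a hypothetical helper.
groups_not_offline() {
    node=$1
    # Group rows: B <group> <system> <probed> <autodisabled> <state>
    awk -v node="$node" '$1 == "B" && $3 == node && $6 != "OFFLINE" { print $2 }'
}

# Illustrative input; in real use: hastatus -summary | groups_not_offline cdbpsdb02
groups_not_offline cdbpsdb02 <<'EOF'
B  RACSIT          cdbpsdb02         Y          N               ONLINE
B  RACSIT          cdbpsdb03         Y          N               ONLINE
B  cvm             cdbpsdb02         Y          N               OFFLINE
EOF
```

Empty output means every group row for that node reads OFFLINE. Note that hastatus needs a running VCS engine to answer, so after hastop -local on a node this check would have to be run from a node where VCS is still up.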

newtoaixos
