Solaris cluster critical issue

# 1  
Old 02-28-2015

Hi all,

A few hours ago I made some changes to our Solaris cluster servers. These are the changes I made:

1. Installed the latest Solaris 10 patchset from Oracle.
2. Enabled the BSM log module. I entered single-user mode and rebooted. After the reboot I changed to multi-user mode and rebooted again.

Now the cluster service stops automatically after about 10 minutes. Also, 2 of the 5 cluster resources are not online:
Code:
dbcon-rs - offline
svr-rs - starting 
lsnr-rs - online
hasp-rs - online
rs - online

The quorum device and shared disks are all online. Please help me; this is an enterprise production system and it is currently not working.

Last edited by Scott; 02-28-2015 at 06:30 PM.. Reason: Please use code tags
# 2  
Old 02-28-2015
How did you fix it on your test cluster, where you test these kinds of things before trying them on critical production systems?
# 3  
Old 03-01-2015
There is no test cluster environment, so here I am with this problem.
# 4  
Old 03-01-2015
Missing a test environment is the problem you need to fix in the first place.
# 5  
Old 03-01-2015
In the meantime, please show us more information. All you've told us is your cluster is broken.

The best thing you can do is back out all your changes and go back to the way you were set up before. You do have a way to do that, don't you?

If you don't, read this:

http://www.oracle.com/technetwork/se...-wp-167900.pdf
# 6  
Old 03-01-2015
Hi all, we have now stopped the cluster services and the system is running on the first node without clustering.
I'm trying to find the cause of the failure. Below is some information captured while the cluster was not working.

Code:
bash-3.2# /usr/cluster/bin/clresource status
=== Cluster Resources ===

Resource Name      Node Name   State                  Status Message
-------------      ---------   -----                  --------------
fepprod-dbcon-rs   fep1prod    Offline                Offline
                   fep2prod    Offline                Offline

fepprod-svr-rs     fep1prod    Offline                Offline
                   fep2prod    Starting               Unknown

fepprod-lsnr-rs    fep1prod    Offline                Offline
                   fep2prod    Offline                Offline

fepprod-hasp-rs    fep1prod    Offline                Offline
                   fep2prod    Online                 Online

fepprod-rs         fep1prod    Offline                Offline - LogicalHostname offline.
                   fep2prod    Online                 Online - LogicalHostname online.
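
To make the listing above easier to read, here is a minimal sketch that picks out the resources that are not Online on any node. It only assumes the column layout shown in this post (resource name, node name, state, status message); the function name `resources_without_online_node` is made up for illustration and is not part of Solaris Cluster. On Solaris 10 the legacy `/usr/bin/awk` may be too old for this, so `nawk` or `/usr/xpg4/bin/awk` may be needed in place of `awk`.

```shell
# Print every resource that has no node in the "Online" state.
# Input: text laid out like the `clresource status` output quoted above.
# NOTE: resources_without_online_node is a hypothetical helper name.
resources_without_online_node() {
    awk '
        # Skip the banner, column header, dashed rule and blank lines.
        /^===/ || /^Resource Name/ || /^-/ || NF == 0 { next }
        # A line starting in column 1 opens a new resource: name, node, state.
        /^[^ \t]/ {
            if (name != "" && !online) print name
            name = $1; online = 0; state = $3
        }
        # An indented line is a further node of the same resource: node, state.
        /^[ \t]/ { state = $2 }
        state == "Online" { online = 1 }
        END { if (name != "" && !online) print name }
    '
}

# Demo with the status lines from this post.
resources_without_online_node <<'EOF'
=== Cluster Resources ===

Resource Name      Node Name   State                  Status Message
-------------      ---------   -----                  --------------
fepprod-dbcon-rs   fep1prod    Offline                Offline
                   fep2prod    Offline                Offline

fepprod-svr-rs     fep1prod    Offline                Offline
                   fep2prod    Starting               Unknown

fepprod-lsnr-rs    fep1prod    Offline                Offline
                   fep2prod    Offline                Offline

fepprod-hasp-rs    fep1prod    Offline                Offline
                   fep2prod    Online                 Online

fepprod-rs         fep1prod    Offline                Offline - LogicalHostname offline.
                   fep2prod    Online                 Online - LogicalHostname online.
EOF
# prints:
# fepprod-dbcon-rs
# fepprod-svr-rs
# fepprod-lsnr-rs
```

On a live node the same function could read from `/usr/cluster/bin/clresource status` through a pipe. Here it points at three resources with no online node, including `fepprod-svr-rs`, which is stuck in Starting on fep2prod.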

Also, when I try to switch the active node, the following error occurs:
Code:
resource group is undergoing a reconfiguration, try again later

Node1 now has the patchset installed and is running without clustering. I'm going to install the patchset on node2 and switch the active node to node2, hoping that I can find something helpful after the patchset is installed on node2.

---------- Post updated 03-02-15 at 12:53 AM ---------- Previous update was 03-01-15 at 11:45 PM ----------

I found one interesting thing: there is one failed device among the cluster devices.

Code:
cldev status -v

/dev/did/rdsk/d8             fep1prod             Fail

Maybe this caused the problem?
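
When the device list is long, a small filter can pull out only the failed entries. This is a rough sketch that assumes the three-column layout of the line quoted above (device path, node name, status); `failed_did_devices` is a hypothetical helper name, not a cluster command.

```shell
# List DID devices whose status column reads "Fail".
# Input: text laid out like the `cldev status -v` line quoted above.
# NOTE: failed_did_devices is a hypothetical helper name.
failed_did_devices() {
    awk '$1 ~ /^\/dev\/did\// && $NF == "Fail" { print $1, $2 }'
}

# Demo with the single line from this post; on a live node the output of
# `cldev status -v` could be piped into the function instead.
printf '%s\n' '/dev/did/rdsk/d8             fep1prod             Fail' |
    failed_did_devices
# prints: /dev/did/rdsk/d8 fep1prod
```

The output names the device and the node that reports it as failed, which narrows down where to look next.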
# 7  
Old 03-04-2015
You can try using EASEUS Partition Master to check clusters.