Operating Systems > Solaris
Post 302257902 by MikaBaghinen, Thursday 13 November 2008, 10:22 AM
Why didn't she panic? (Sol 10 + SVM + HDS)

Hi folks,

the following incident occurred today:

By mistake, one of our renowned administrators deleted the complete SAN zoning for a 25K domain running Solaris 10.

Thus the system lost all of its external disks.

We've got Oracle datafiles and the Oracle software residing on those lost disks.

The system logged read and write errors to /var/adm/messages, but it did not panic because the write errors were classified as "Retryable".

The external disks were mounted as metadevices in metasets.

Does SVM keep the system from panicking?
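For reference, this is what I would run to capture SVM's view of the affected set (just a sketch; <metasetname> is a placeholder as above, and the exact state strings may differ by release):

metastat -s <metasetname>       # state of the metadevices in the set;
                                # errored components usually show "Needs maintenance"
metadb -s <metasetname> -i      # state database replicas of the set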

Background information:

uname -a:
SunOS <servername> 5.10 Generic_127111-11 sun4u sparc SUNW,Sun-Fire-15000

cat /etc/release:
Solaris 10 8/07 s10s_u4wos_12b SPARC
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007

/var/adm/messages:

Nov 13 10:28:47 sv0703 md_stripe: [ID 641072 kern.warning] WARNING: md: 0703m1/d101: write error on /dev/dsk/c6t60060E801526C300000126C300003265d0s0
Nov 13 10:28:47 sv0703 md_stripe: [ID 641072 kern.warning] WARNING: md: 0703m1/d80: write error on /dev/dsk/c6t60060E801526C300000126C300002265d0s0
Nov 13 10:28:47 sv0703 md_sp: [ID 641072 kern.warning] WARNING: md: 0703m1/d101: write error on /dev/md/0703m1/dsk/d90
Nov 13 10:56:09 sv0703 Error for Command: write(10) Error Level: Retryable
...and so on...

df -h /u02:
/dev/md/<metasetname>/dsk/d100 103G 85G 17G 83% /u02
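One more thing I still want to verify (just an idea, not a conclusion): the onerror mount option on /u02. As far as I know, UFS only forces a panic when it detects an internal inconsistency and onerror=panic is in effect (the default), and I am not sure driver-level retryable write errors count as such. Standard Solaris commands to check:

mount -v | grep /u02            # options currently in effect, look for onerror=...
grep u02 /etc/vfstab            # options requested at boot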

metaset -s <metasetname>:
Set name = <metasetname>, Set number = 4
Host Owner
<hostname> Yes (auto)
Drive Dbase
/dev/dsk/c6t60060E801526C300000126C3000022A2d0 Yes
/dev/dsk/c6t60060E801526C300000126C3000032A2d0 Yes
/dev/dsk/c6t50060E80000000000000F8FE000000A2d0 Yes
/dev/dsk/c6t50060E80000000000000F8FE000004A2d0 Yes


For each FS we've got:

submirror -> mirror -> soft partition on metaset
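To make the layering concrete, this is roughly how we build such a stack (illustrative only: the d-numbers, slice s0 and the 100g size are made up, and <LUN1>/<LUN2> stand for the HDS LUNs shown in the metaset output):

metainit -s <metasetname> d81 1 1 /dev/dsk/<LUN1>s0    # first submirror
metainit -s <metasetname> d82 1 1 /dev/dsk/<LUN2>s0    # second submirror
metainit -s <metasetname> d80 -m d81                   # one-way mirror on d81
metattach -s <metasetname> d80 d82                     # attach second submirror
metainit -s <metasetname> d100 -p d80 100g             # soft partition on the mirror
newfs /dev/md/<metasetname>/rdsk/d100                  # UFS on the soft partition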

Thanks & Regards

Mika
