Help with faulty Disk on Sun OS


 
# 1  
Old 03-27-2012

Hi,

Recently I came across a disk that seems to be faulty, and I need help with it. I have gathered some information by running the commands below; any help on how to solve this would be great.



Code:
# uname -a

SunOS XYZ 5.7 Generic_106541-16 sun4u sparc SUNW,Ultra-4

Code:
# df -k
Filesystem            kbytes     used    avail capacity  Mounted on
/proc                      0        0        0       0%  /proc
/dev/md/dsk/d0       2056211   791794  1202731      40%  /
fd                         0        0        0       0%  /dev/fd
/dev/md/dsk/d6        482455   129619   304591      30%  /var
/dev/md/dsk/d9      17404618 12474839  4755733      73%  /oracle
/dev/md/dsk/d12     15281351  4116289 11012249      28%  /archive
/dev/md/dsk/d15     52211532 16096689 35592728      32%  /db01
/dev/md/dsk/d18     52211532 17333076 34356341      34%  /backup
/dev/md/dsk/d21      2114063    87178  1963464       5%  /home
swap                 2325656     1080  2324576       1%  /tmp

Code:
# ./metadb
flags first blk block count
Wm p l 16 1034 /dev/dsk/c0t0d0s7
a p luo 16 1034 /dev/dsk/c2t0d0s7
a p luo 16 1034 /dev/dsk/c5t0d0s7
a p luo 16 1034 /dev/dsk/c4t0d0s7
a p luo 16 1034 /dev/dsk/c2t1d0s7
a p luo 16 1034 /dev/dsk/c3t1d0s7
a p luo 16 1034 /dev/dsk/c4t1d0s7
a p luo 16 1034 /dev/dsk/c5t1d0s7

Code:
# ./metastat -p

d0 -m d1 d2 1
d1 1 1 c0t0d0s0
d2 1 1 c2t3d0s0
d3 -m d4 d5 1
d4 1 1 c0t0d0s1
d5 1 1 c2t3d0s1
d6 -m d7 d8 1
d7 1 1 c0t0d0s3
d8 1 1 c2t3d0s3
d9 -m d101 d11 1
d101 1 1 c5t3d0s0
d11 1 1 c4t3d0s0
d12 -m d13 1
d13 1 1 c0t2d0s0
d15 -m d16 d17 1
d16 1 3 c2t0d0s0 c3t0d0s0 c2t1d0s0 -i 32b
d17 1 3 c4t2d0s0 c5t2d0s0 c5t1d0s0 -i 32b
d18 -m d19 d20 1
d19 1 3 c4t0d0s0 c5t0d0s0 c0t1d0s1 -i 32b
d20 1 3 c2t2d0s0 c3t2d0s0 c3t1d0s0 -i 32b
d21 -m d22 d23 1
d22 1 1 c0t2d0s1
d23 1 1 c0t0d0s4

After checking the metastat output for each metadevice, I can see that d1, d4, d7 and d23 are all in the maintenance state. For example:

Code:
d1: Submirror of d0
State: Needs maintenance
Invoke: metareplace d0 c0t0d0s0 <new device>
Size: 4198392 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Maintenance
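
To see at a glance which metadevices sit on a suspect disk, the `metastat -p` listing above can be filtered by disk name. The following is only a minimal sketch: the function name `submirrors_on_disk` is made up for illustration, and it assumes an awk that accepts `-v` (on Solaris that would be `nawk` rather than the old `/usr/bin/awk`).

```shell
# Hypothetical helper: list metadevices whose components live on a given disk.
# Reads `metastat -p` style output on stdin; $1 is the disk name (e.g. c0t0d0).
submirrors_on_disk() {
    awk -v disk="$1" '
        {
            for (i = 2; i <= NF; i++) {
                # A component looks like <disk>s<slice>, e.g. c0t0d0s3
                if ($i ~ ("^" disk "s[0-9]+$")) {
                    print $1, $i
                }
            }
        }'
}

# Sample lines copied (trimmed) from the metastat -p output above.
submirrors_on_disk c0t0d0 <<'EOF'
d0 -m d1 d2 1
d1 1 1 c0t0d0s0
d2 1 1 c2t3d0s0
d4 1 1 c0t0d0s1
d7 1 1 c0t0d0s3
d19 1 3 c4t0d0s0 c5t0d0s0 c0t1d0s1 -i 32b
d22 1 1 c0t2d0s1
d23 1 1 c0t0d0s4
EOF
```

On the sample this prints d1, d4, d7 and d23 with their slices, which matches the submirrors reported as needing maintenance.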

The format command and the messages below both show errors with c0t0d0:
Code:
# format

Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c0t0d0 <drive type unknown>
/pci@1f,4000/scsi@3/sd@0,0
1. c0t1d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@1f,4000/scsi@3/sd@1,0
2. c0t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> arc disk
/pci@1f,4000/scsi@3/sd@2,0
3. c2t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_1
/pci@4,2000/scsi@1/sd@0,0
4. c2t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_3
/pci@4,2000/scsi@1/sd@1,0
5. c2t2d0 <SUN36G cyl 24620 alt 2 hd 27 sec 107>
/pci@4,2000/scsi@1/sd@2,0
6. c2t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> OS mir
/pci@4,2000/scsi@1/sd@3,0
7. c3t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_2
/pci@4,2000/scsi@1,1/sd@0,0
8. c3t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_3m
/pci@4,2000/scsi@1,1/sd@1,0
9. c3t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_2m
/pci@4,2000/scsi@1,1/sd@2,0
10. c4t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_1
/pci@6,2000/scsi@1/sd@0,0
11. c4t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_3
/pci@6,2000/scsi@1/sd@1,0
12. c4t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_1m
/pci@6,2000/scsi@1/sd@2,0
13. c4t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> ora_mir
/pci@6,2000/scsi@1/sd@3,0
14. c5t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> back_2
/pci@6,2000/scsi@1,1/sd@0,0
15. c5t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_3m
/pci@6,2000/scsi@1,1/sd@1,0
16. c5t2d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> db01_2m
/pci@6,2000/scsi@1,1/sd@2,0
17. c5t3d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248> arch_mir
/pci@6,2000/scsi@1,1/sd@3,0
Specify disk (enter its number): 0
AVAILABLE DRIVE TYPES:
0. Auto configure
1. Quantum ProDrive 80S
2. Quantum ProDrive 105S
3. CDC Wren IV 94171-344
4. SUN0104
5. SUN0207
6. SUN0327
7. SUN0340
8. SUN0424
9. SUN0535
10. SUN0669
11. SUN1.0G
12. SUN1.05
13. SUN1.3G
14. SUN2.1G
15. SUN2.9G
16. SUN18G
17. SUN18G
18. SUN18G
19. SUN36G
20. other
Specify disk type (enter its number):

Code:
# iostat -En

c0t0d0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 3
Vendor: FUJITSU Product: MAJ3182M SUN18G Revision: 0804 Serial No: 02P19623
Size: 18.11GB <18110967808 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

When I checked the messages files, I found entries indicating that c0t0d0 is faulty.
Code:
# cat /var/adm/messages

Mar 27 08:43:00 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 27 08:43:00 XYZ disk not responding to selection
Mar 27 08:43:01 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 27 08:43:01 XYZ disk not responding to selection

Code:
# cat /var/adm/messages.0

Mar 20 21:10:25 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:10:25 XYZ SCSI transport failed: reason 'incomplete': retrying command
Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
Mar 20 21:11:30 XYZ Cmd (0x2aca648) dump for Target 0 Lun 0:
Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
Mar 20 21:11:30 XYZ cdb=[ 0x2a 0x0 0x2 0x1b 0x76 0x54 0x0 0x0 0x1 0x0 ]
Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
Mar 20 21:11:30 XYZ pkt_flags=0x4000 pkt_statistics=0x61 pkt_state=0x7
Mar 20 21:11:30 XYZ unix: /pci@1f,4000/scsi@3 (glm0):
Mar 20 21:11:30 XYZ pkt_scbp=0x0 cmd_flags=0x18e1
Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3 (glm0):
Mar 20 21:11:30 XYZ Disconnected tagged cmd(s) (1) timeout for Target 0.0
Mar 20 21:11:30 XYZ unix: WARNING: ID[SUNWpd.glm.cmd_timeout.6018]
Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:11:30 XYZ SCSI transport failed: reason 'reset': retrying command
Mar 20 21:11:30 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:11:30 XYZ SCSI transport failed: reason 'timeout': retrying command
Mar 20 21:11:34 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:11:34 XYZ SCSI transport failed: reason 'incomplete': retrying command
Mar 20 21:11:38 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:11:38 XYZ disk not responding to selection
Mar 20 21:11:40 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 20 21:11:40 XYZ disk not responding to selection
Mar 20 21:11:40 XYZ unix: WARNING: md: d1: write error on /dev/dsk/c0t0d0s0
Mar 20 21:11:40 XYZ unix: WARNING: md: d1: /dev/dsk/c0t0d0s0 needs maintenance
Mar 20 21:11:40 XYZ unix: WARNING: md: d4: read error on /dev/dsk/c0t0d0s1
Mar 20 21:11:42 XYZ unix: WARNING: md: d7: write error on /dev/dsk/c0t0d0s3
Mar 20 21:11:42 XYZ unix: WARNING: md: d4: /dev/dsk/c0t0d0s1 needs maintenance
Mar 20 21:11:42 XYZ unix: WARNING: md: d7: /dev/dsk/c0t0d0s3 needs maintenance
Mar 21 01:36:42 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 21 01:36:42 XYZ disk not responding to selection
Mar 21 01:36:42 XYZ unix: WARNING: md: d23: read error on /dev/dsk/c0t0d0s4
Mar 21 01:36:42 XYZ unix: WARNING: md: d23: /dev/dsk/c0t0d0s4 needs maintenance
Mar 21 08:00:02 XYZ unix: WARNING: /pci@1f,4000/scsi@3/sd@0,0 (sd0):
Mar 21 08:00:02 XYZ disk not responding to selection
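
The md driver warnings in a log like this can be pulled out mechanically, which helps when the messages file is long. This is a rough sketch only: `md_needs_maintenance` is a hypothetical name, and the pattern assumes the warning format shown above (`md: dN: <device> needs maintenance`).

```shell
# Hypothetical helper: reads syslog text on stdin and prints
# "<metadevice> <component>" for every "needs maintenance" warning
# from the md driver, with duplicates removed.
md_needs_maintenance() {
    awk '/md: d[0-9]+: .* needs maintenance/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "md:") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)      # strip the trailing colon from "d1:"
                    print dev, $(i + 2)
                }
            }
        }' | sort -u
}

# Example on a live system (path taken from the thread):
#   md_needs_maintenance < /var/adm/messages.0
```

Lines that only report a read or write error are ignored; only the lines that mark a component as needing maintenance are listed.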


Code:
# ./prtdiag

System Configuration: Sun Microsystems sun4u Sun Enterprise 450 (2 X UltraSPARC-II 400MHz)
System clock frequency: 100 MHz
Memory size: 1024 Megabytes
========================= CPUs =========================
Run Ecache CPU CPU
Brd CPU Module MHz MB Impl. Mask
--- --- ------- ----- ------ ------ ----
SYS 1 1 400 4.0 US-II 10.0
SYS 3 3 400 4.0 US-II 10.0
========================= Memory =========================
Interlv. Socket Size
Bank Group Name (MB) Status
---- ----- ------ ---- ------
0 none 1901 256 OK
0 none 1902 256 OK
0 none 1903 256 OK
0 none 1904 256 OK
========================= IO Cards =========================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---- -------------------------------- ----------------------
SYS PCI 33 4 pciclass,001000 Symbios,53C875
SYS PCI 33 6 pciclass,001000 Symbios,53C875

No failures found in System



When I tried to read the label of the disk holding the faulty submirror, prtvtoc reported that it was unable to read the disk geometry, but I was able to read the label of the disk holding the other half of the mirror, as shown below.


Code:
# prtvtoc /dev/dsk/c0t0d0s0
prtvtoc: /dev/rdsk/c0t0d0s0: Unable to read Disk geometry
# prtvtoc /dev/dsk/c2t3d0s0
* /dev/dsk/c2t3d0s0 (volume "OS mir") partition map
*
* Dimensions:
* 512 bytes/sector
* 248 sectors/track
* 19 tracks/cylinder
* 4712 sectors/cylinder
* 7508 cylinders
* 7506 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 9424000 25925424 35349423
* 35363560 4712 35368271
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 2 00 0 4198392 4198391
1 3 01 4198392 4198392 8396783
2 5 01 0 35368272 35368271
3 7 00 8396784 1027216 9423999
7 0 00 35349424 14136 35363559

This is what I usually do to solve this, but I have never done it for the master (primary) disk:
1. Copy the labels of d2, d5, d8 and d22, and detach the faulty submirrors from their mirrors.
2. Remove the old metadevices and unconfigure the disk using cfgadm.
3. Replace the disk, write the copied label to the new disk, recreate the metadevices and attach them to the mirrors.

But since c0t0d0 is the primary disk, which contains partitions such as root and boot, I am not sure whether I can hot-swap it. This is a production server, so I can't really experiment on it.

Last edited by DukeNuke2; 03-27-2012 at 11:06 AM..
# 2  
Old 03-28-2012
So, what's your question? It appears that every slice of the failed disk (c0t0d0) is mirrored, so you can follow your normal detach-replace-attach process to fix this. Your configuration does require some care, though, as c0t0d0 is not mirrored by a single disk. You will likely have to hand-construct the partition table to match the sizes of the slices on the other two disks involved in the various mirrors (c2t3d0 and c0t2d0).
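
When hand-building a partition table, it helps to confirm that every slice starts on a cylinder boundary, which SPARC disk labels expect as far as I know. Below is a small sketch that checks `prtvtoc` output for this; `check_slice_alignment` is a made-up name, and the sectors-per-cylinder value (4712 for the SUN18G disks in this thread) is passed in by hand rather than parsed from the header.

```shell
# Hypothetical helper: reads prtvtoc output on stdin; $1 is sectors per cylinder.
# Prints each slice with its start cylinder and flags misaligned starts.
check_slice_alignment() {
    awk -v spc="$1" '
        /^\*/ || NF < 6 { next }     # skip header comments and blank lines
        {
            start = $4               # "First Sector" column
            status = (start % spc == 0) ? "aligned" : "MISALIGNED"
            printf "slice %s start_cyl %d %s\n", $1, int(start / spc), status
        }'
}

# Slice lines copied from the prtvtoc output of c2t3d0s0 earlier in the thread.
check_slice_alignment 4712 <<'EOF'
* 4712 sectors/cylinder
0 2 00 0 4198392 4198391
1 3 01 4198392 4198392 8396783
2 5 01 0 35368272 35368271
3 7 00 8396784 1027216 9423999
7 0 00 35349424 14136 35363559
EOF
```

For the good mirror disk every slice comes out aligned (slice 1 starts at cylinder 891, slice 3 at 1782, slice 7 at 7502), so the same start sectors and counts are a reasonable template for slices 0, 1, 3 and 7 on the replacement. The size of slice 4 still has to come from c0t2d0s1, which is not shown in the thread.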

Also, since you are replacing a disk which is part of your boot mirror, you will need to ensure that you put the bootblocks onto the replacement. See the manpage for 'installboot' for details, but I suspect the command will be something like:

Code:
# installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t0d0s0

And, just in case some prior administrator didn't put the bootblocks on c2t3d0s0, you might want to proactively apply them there FIRST, before you eject c0t0d0 from the chassis.
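
Putting the commands that appear in this thread together, one plausible ordering of the replacement can be written down as a dry run that only prints the commands and executes nothing. Treat it as a checklist sketch, not a tested procedure: `print_replace_plan` is a hypothetical name, the mirror-to-slice pairs are read off the `metastat -p` output above, and the ordering (remove the replica, swap, relabel, re-add the replica, resync, bootblock) is an assumption that should be checked against the DiskSuite documentation.

```shell
# Hypothetical dry-run helper: prints (does not execute) the commands for
# replacing c0t0d0, using mirror/slice pairs taken from this thread.
print_replace_plan() {
    disk=c0t0d0
    echo "metadb -d /dev/dsk/${disk}s7"
    echo "# physically swap the disk, then rebuild the device nodes:"
    echo "drvconfig; disks; devlinks"
    echo "# label the new disk with format so slices 0,1,3,4,7 match the old sizes"
    echo "metadb -a /dev/dsk/${disk}s7"
    for pair in "d0 s0" "d3 s1" "d6 s3" "d21 s4"; do
        set -- $pair                 # $1 = mirror, $2 = slice suffix
        echo "metareplace -e $1 ${disk}$2"
    done
    echo "installboot /usr/platform/\`uname -i\`/lib/fs/ufs/bootblk /dev/rdsk/${disk}s0"
}

print_replace_plan
```

Printing the plan first makes it easy to review each line against the real metadevice layout before typing anything on a production box.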

Reliable replacements for those old 18G and 36G SCSI drives are becoming harder to find. Good luck finding some!
# 3  
Old 03-30-2012
Did this so far

Hi,

Thanks for your suggestions. This is what I have done so far.

To check the configuration:

Code:
# cfgadm -al
cfgadm: Configuration administration not supported

Any ideas which command should be used instead of cfgadm? Some people suggested devfsadm, which I have never used. So I then ran the following:

Code:
#  drvconfig


After that I can see c0t0d0 in the format utility, which shows it as the OS disk:

Code:
  # format 
  c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>  OS disk
            /pci@1f,4000/scsi@3/sd@0,0


Then I ran the metareplace command on the defective mirrors, as the metastat output says "Invoke: metareplace d0 c0t0d0s0 <new device>":

Code:
  # metareplace -e d21 c0t0d0s4
  # metareplace -e d6 c0t0d0s3
  # metareplace -e d3 c0t0d0s1
  # metareplace -e d0 c0t0d0s0


From the metastat output I can see that all the mirrors seem to be fine and in sync with their other submirrors, and I can run prtvtoc as well. The problem is that if I run metadb again, the "W" flag is still there, as shown below. Some suggest rebooting, which I don't want to do now.

Code:
  # metadb
  flags first blk block count
  Wm p l 16 1034 /dev/dsk/c0t0d0s7
  a p luo 16 1034 /dev/dsk/c2t0d0s7
  a p luo 16 1034 /dev/dsk/c5t0d0s7
  a p luo 16 1034 /dev/dsk/c4t0d0s7
  a p luo 16 1034 /dev/dsk/c2t1d0s7
  a p luo 16 1034 /dev/dsk/c3t1d0s7
  a p luo 16 1034 /dev/dsk/c4t1d0s7
  a p luo 16 1034 /dev/dsk/c5t1d0s7
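
Since metadb reports problem states in uppercase and normal states in lowercase, replicas in an error state can be picked out of output like the above with a short filter. This is a sketch under that assumption; `replicas_with_errors` is a hypothetical name, and only lines that end in a /dev/ path are considered so that the header and the legend printed by `metadb -i` are skipped.

```shell
# Hypothetical helper: reads `metadb` output on stdin and prints the device
# of every replica whose flag field contains an uppercase letter.
replicas_with_errors() {
    awk '$NF !~ /^\/dev\// { next }      # skip header, legend, blank lines
        {
            flags = ""
            # Every token before the first numeric field is part of the flags.
            for (i = 1; i <= NF && $i !~ /^[0-9]+$/; i++) flags = flags $i
            if (flags ~ /[A-Z]/) print $NF
        }'
}

# Example on a live system (metadb path as used in the thread):
#   ./metadb | replicas_with_errors
```

For the listing above this would report only /dev/dsk/c0t0d0s7, the replica carrying the "W" flag.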

To fix this I removed and re-added the state database replica as below:

Code:
  # metadb -d /dev/dsk/c0t0d0s7
  # metadb -a /dev/dsk/c0t0d0s7


Now when I check the metadb output the errors are gone, but some flags are missing (m for master, and p, l, o). According to some forums, these will only reappear after a reboot, which I don't want to do now.

Code:
  # metadb -i
          flags           first blk       block count
       a        u         16              1034            /dev/dsk/c0t0d0s7
       a    p  luo        16              1034            /dev/dsk/c2t0d0s7
       a    p  luo        16              1034            /dev/dsk/c5t0d0s7
       a    p  luo        16              1034            /dev/dsk/c4t0d0s7
       a    p  luo        16              1034            /dev/dsk/c2t1d0s7
       a    p  luo        16              1034            /dev/dsk/c3t1d0s7
       a    p  luo        16              1034            /dev/dsk/c4t1d0s7
       a    p  luo        16              1034            /dev/dsk/c5t1d0s7


Any suggestions on this, and on how it should be done?

Thanks in advance

Last edited by DukeNuke2; 03-30-2012 at 08:44 AM..
# 4  
Old 03-30-2012
Since your system is running Solaris 7, 'cfgadm' is not supported there (hence the error you saw), and I'm not even certain (without checking) whether 'devfsadm' was around back then. You'll probably want to do the following on replacement of SCSI disks:

Code:
# drvconfig; disks; devlinks

As far as the flags output from 'metadb -i' goes, you're just going to have to wait for a reboot to see the p, l, o and possibly m flags on that disk.