Bad disk, how to replace ?


 
# 1  
Old 06-12-2018

Hello,

I see hard and transport errors on all disks in the treso pool, and it looks like there is some data corruption too. I want to take a backup before I reboot and replace the disk. As of now, there are no free slots on the server, so one option is to break the mirror and remove the second disk (I need two disks, because the data is 400GB). I have two spare disks, which I will insert in those slots, mount, and copy the data to.
Can somebody help me understand whether the setup below shows that I can detach disks without disturbing the data and the mount?
Code:
pool: treso
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 0h42m with 0 errors on Thu Mar 24 12:11:13 2016
config:

        NAME        STATE     READ WRITE CKSUM
        zones2      DEGRADED    17     0     0
          raidz1    DEGRADED    17     0     0
            c1t4d0  ONLINE       0     0     0
            c1t5d0  DEGRADED    35     0     0  too many errors
            c1t6d0  ONLINE       0     0     0
            c1t8d0  FAULTED      2     0     0  too many errors

errors: 4 data errors, use '-v' for a list
#

Thanks
# 2  
Old 06-12-2018
In the current configuration, there is little you can do.
The reason is that your configuration (RAIDZ1) tolerates the failure of one disk, and one disk (c1t8d0) has already failed.

Another disk (c1t5d0) is close to failing, but the pool is still accessible.
When the degraded disk fails (which should happen soon enough), you will lose all the data in the zpool.

The course of action should be:
  1. Take a backup using zfs send / receive, or copy the data.
  2. zpool offline the faulted disk in the pool.
  3. Unconfigure the offlined disk using cfgadm.
  4. Insert a new working drive in the same slot, and configure it using cfgadm.
  5. Issue a zpool online / replace against the replaced disk.
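
The steps above can be sketched roughly as follows. This is only an illustration, not a tested procedure: the function name replacement_plan, the snapshot name pre-replace, the backup target backup/treso and the attachment point c1::dsk/c1t8d0 are all assumptions (check `cfgadm -al` for the real attachment point, and confirm that your Solaris release supports `zfs send -R`). The script only prints the commands so they can be reviewed and run by hand.

```shell
#!/bin/sh
# Prints a disk replacement sequence for review; it does not run anything.
# Arguments (all placeholders): pool, failed disk, cfgadm attachment point,
# and the dataset that will receive the backup.
replacement_plan() {
    pool=$1; disk=$2; ap_id=$3; backup=$4
    snap="$pool@pre-replace"    # hypothetical snapshot name
    cat <<EOF
zfs snapshot -r $snap
zfs send -R $snap | zfs receive -F $backup
zpool offline $pool $disk
cfgadm -c unconfigure $ap_id
# physically swap the drive in the same slot, then:
cfgadm -c configure $ap_id
zpool replace $pool $disk
zpool status $pool
EOF
}

replacement_plan treso c1t8d0 c1::dsk/c1t8d0 backup/treso
```

With a pool that already reports data errors, the send may fail on the damaged files, so a plain file copy of whatever is still readable is a reasonable fallback for step 1.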

https://docs.oracle.com/cd/E19253-01...cet/index.html

Regards
Peasant.

# 3  
Old 06-15-2018
I took the backup, destroyed the pool, replaced the disks and created a new pool, zones3.
Now, instead of using raidz1, I just want to mirror zones3. With the configuration below, if one disk fails, data will be lost. I have two new disks: c1t4d0 and c1t6d0.
Code:
# zpool status zones3
  pool: zones3
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zones3      ONLINE       0     0     0
          c1t9d0    ONLINE       0     0     0
          c1t10d0   ONLINE       0     0     0

errors: No known data errors
#

Is this the correct command to run?
Code:
zpool zones3 mirror c1t4d0 c1t6d0

# 4  
Old 06-15-2018
Take the following example, where I'm using files, but it's the same with real devices.
The resulting pool will tolerate one device failure, or two if they are in different mirrors.

If two devices fail in the same top-level vdev (mirror-N), you will lose data.

I would strongly suggest using an odd number of disks and keeping one as a hot spare in the pool.
In your configuration, get one more disk if you really love your data.

Code:
[root@gimmick ~]# ls -dl /zones/test/disk*
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk0
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk1
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk2
-rw------T   1 root     root     104857600 Jun 16 02:48 /zones/test/disk3
[root@gimmick ~]# 

[root@gimmick ~]# zpool status testpool

  pool: testpool
 state: ONLINE
  scan: none requested
config:

	NAME                 STATE     READ WRITE CKSUM
	testpool             ONLINE       0     0     0
	  /zones/test/disk1  ONLINE       0     0     0
	  /zones/test/disk0  ONLINE       0     0     0

errors: No known data errors
[root@gimmick ~]# zpool attach testpool /zones/test/disk0 /zones/test/disk2
[root@gimmick ~]# zpool attach testpool /zones/test/disk1 /zones/test/disk3
[root@gimmick ~]# zpool status testpool
  pool: testpool
 state: ONLINE
  scan: resilvered 49K in 0h0m with 0 errors on Sat Jun 16 02:48:41 2018
config:

	NAME                   STATE     READ WRITE CKSUM
	testpool               ONLINE       0     0     0
	  mirror-0             ONLINE       0     0     0
	    /zones/test/disk1  ONLINE       0     0     0
	    /zones/test/disk3  ONLINE       0     0     0
	  mirror-1             ONLINE       0     0     0
	    /zones/test/disk0  ONLINE       0     0     0
	    /zones/test/disk2  ONLINE       0     0     0

errors: No known data errors

[root@gimmick  ~]#
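
The example above starts from an existing file-backed pool. A minimal sketch of how such a test pool could be set up and torn down is below, assuming the files were created with mkfile at 100m each (which matches the 104857600 byte sizes in the listing). The function name mirror_test_plan is made up for illustration, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Prints commands for a throwaway file-backed pool that ends up as two
# mirrored pairs, following the attach order used in the example above.
mirror_test_plan() {
    pool=$1; dir=$2; size=$3
    # Four backing files of equal size
    for i in 0 1 2 3; do
        echo "mkfile $size $dir/disk$i"
    done
    # Start as a plain two-device stripe, then attach a partner to each device
    echo "zpool create $pool $dir/disk1 $dir/disk0"
    echo "zpool attach $pool $dir/disk0 $dir/disk2"
    echo "zpool attach $pool $dir/disk1 $dir/disk3"
    echo "zpool status $pool"
    # Clean up once done experimenting
    echo "zpool destroy $pool"
}

mirror_test_plan testpool /zones/test 100m
```

On a real pool, the hot spare suggested above would be added with `zpool add <pool> spare <disk>`, where the pool and disk names are placeholders for your own.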

Hope that helps
Regards
Peasant.
# 5  
Old 06-16-2018
Going through your example, can I run the commands below online, without interruption?
Code:
zpool attach zones3 c1t9d0 c1t4d0
zpool attach zones3 c1t10d0 c1t6d0
# 6  
Old 06-16-2018
Yes.

The only thing you should notice is increased read / write activity until resilvering is done.
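
To see when resilvering has finished, the status output can be checked for the "resilver in progress" text. The helper below is a rough sketch: the function names resilver_state and wait_for_resilver are made up for illustration, and it assumes that your release prints that phrase on the scrub: or scan: line while a resilver is running.

```shell
#!/bin/sh
# Reads `zpool status` text on stdin and prints "running" or "idle".
resilver_state() {
    if grep 'resilver in progress' >/dev/null; then
        echo running
    else
        echo idle
    fi
}

# Polls a pool once a minute until no resilver is reported.
# Defined here for reference only; it is not called by this script.
wait_for_resilver() {
    while [ "$(zpool status "$1" | resilver_state)" = running ]; do
        sleep 60
    done
}

# Demo using the completed-resilver line from the example output earlier
# in this thread; this prints "idle".
printf '%s\n' '  scan: resilvered 49K in 0h0m with 0 errors on Sat Jun 16 02:48:41 2018' | resilver_state
```

On the real system the usage would be something like `zpool status zones3 | resilver_state`, or `wait_for_resilver zones3` before attaching the second disk.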

Regards
Peasant.