RAID5 multi disk failure

01-23-2012

Hi there,

I'm not sure my title is accurate, but I'm dealing with something dangerous that I don't really understand, and I'm very afraid of messing something up.

I have a Debian 5.0.4 server with 4 x 1TB hard drives.

Here is my current /proc/mdstat:

Code:

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sda1[0] sdd1[3] sdb1[1] sdc1[2]
      1024896 blocks [4/4] [UUUU]

md5 : active raid1 sda5[0] sdd5[3] sdb5[1] sdc5[2]
      1023872 blocks [4/4] [UUUU]

md6 : active raid1 sda6[0] sdd6[3] sdb6[1]
      1023872 blocks [4/3] [UU_U]

md7 : active raid1 sda7[0] sdd7[3] sdb7[1] sdc7[2]
      1023872 blocks [4/4] [UUUU]

md8 : active raid1 sdd8[3] sdb8[1] sdc8[2]
      1023872 blocks [4/3] [_UUU]

unused devices: <none>

That's kind of weird, because I used to have a huge md10 array holding a monstrous amount of important files.

I have no idea where to start!

I tried examining the member partitions of the missing array:

Code:

root@titan:~# mdadm --examine /dev/sda10
/dev/sda10:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0b972a2e:3aaabcf9:a4d2adc2:26fd5302
  Creation Time : Sat Apr 17 16:30:50 2010
     Raid Level : raid5
  Used Dev Size : 1459502912 (1391.89 GiB 1494.53 GB)
     Array Size : 4378508736 (4175.67 GiB 4483.59 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 10

    Update Time : Sun Jun  5 16:00:41 2011
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ac3fac12 - correct
         Events : 2552115

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       10        0      active sync   /dev/sda10

   0     0       8       10        0      active sync   /dev/sda10
   1     1       8       26        1      active sync   /dev/sdb10
   2     2       8       42        2      active sync   /dev/sdc10
   3     3       8       58        3      active sync   /dev/sdd10
root@titan:~# mdadm --examine /dev/sdb10
/dev/sdb10:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0b972a2e:3aaabcf9:a4d2adc2:26fd5302
  Creation Time : Sat Apr 17 16:30:50 2010
     Raid Level : raid5
  Used Dev Size : 1459502912 (1391.89 GiB 1494.53 GB)
     Array Size : 4378508736 (4175.67 GiB 4483.59 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 10

    Update Time : Mon Jan 23 12:05:02 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : ade16f37 - correct
         Events : 6224199

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       26        1      active sync   /dev/sdb10

   0     0       0        0        0      removed
   1     1       8       26        1      active sync   /dev/sdb10
   2     2       0        0        2      faulty removed
   3     3       8       58        3      active sync   /dev/sdd10
root@titan:~# mdadm --examine /dev/sdc10
/dev/sdc10:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0b972a2e:3aaabcf9:a4d2adc2:26fd5302
  Creation Time : Sat Apr 17 16:30:50 2010
     Raid Level : raid5
  Used Dev Size : 1459502912 (1391.89 GiB 1494.53 GB)
     Array Size : 4378508736 (4175.67 GiB 4483.59 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 10

    Update Time : Fri Jan 20 23:16:43 2012
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : ad7f1c03 - correct
         Events : 6223465

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       42        2      active sync   /dev/sdc10

   0     0       0        0        0      removed
   1     1       8       26        1      active sync   /dev/sdb10
   2     2       8       42        2      active sync   /dev/sdc10
   3     3       8       58        3      active sync   /dev/sdd10
root@titan:~# mdadm --examine /dev/sdd10
/dev/sdd10:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 0b972a2e:3aaabcf9:a4d2adc2:26fd5302
  Creation Time : Sat Apr 17 16:30:50 2010
     Raid Level : raid5
  Used Dev Size : 1459502912 (1391.89 GiB 1494.53 GB)
     Array Size : 4378508736 (4175.67 GiB 4483.59 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 10

    Update Time : Mon Jan 23 12:05:02 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0
       Checksum : ade16f5b - correct
         Events : 6224199

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       58        3      active sync   /dev/sdd10

   0     0       0        0        0      removed
   1     1       8       26        1      active sync   /dev/sdb10
   2     2       0        0        2      faulty removed
   3     3       8       58        3      active sync   /dev/sdd10

But that doesn't really help...
I have no idea how to interpret the results!
I'm scared by the "faulty" and "removed" warnings.
Can anyone give me a hint?
Is there any other command I can run to regain access to the data, at least read-only?

Thanks for your help.
Santiago

chebarbudo

01-23-2012

It's been a while since I worked with md, so I can't help you much there. I would check all disks and their SMART data for errors.
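As a rough sketch of that check, assuming smartmontools is installed and the disks are /dev/sda through /dev/sdd as in the mdstat output above: the helper below (smart_check_commands is a made-up name) only prints the smartctl commands for each disk, so they can be reviewed before anything is run.

```shell
# Hypothetical helper: print, but do not run, the smartctl commands
# for every whole disk passed as an argument.
smart_check_commands() {
    for disk in "$@"; do
        # -H: overall health self-assessment reported by the drive
        printf 'smartctl -H %s\n' "$disk"
        # -A: vendor attributes (reallocated and pending sectors, etc.)
        printf 'smartctl -A %s\n' "$disk"
        # -l error: the drive's own error log
        printf 'smartctl -l error %s\n' "$disk"
    done
}
```

Calling smart_check_commands /dev/sda /dev/sdb /dev/sdc /dev/sdd lists twelve commands; once they look right they can be pasted into a root shell one at a time.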

RAID is not a substitute for backups.

Are you able to mount the file systems that are using those volumes?

frank_rizzo

01-25-2012

OK, thanks to your advice, I got a little further:
I can tell that two of my four disks have been removed from the array.

Code:

# mdadm --examine /dev/sda10 | grep 'Update Time'
    Update Time : Sun Jun  5 16:00:41 2011
# mdadm --examine /dev/sdb10 | grep 'Update Time'
    Update Time : Mon Jan 23 12:05:02 2012
# mdadm --examine /dev/sdc10 | grep 'Update Time'
    Update Time : Fri Jan 20 23:16:43 2012
# mdadm --examine /dev/sdd10 | grep 'Update Time'
    Update Time : Mon Jan 23 12:05:02 2012

One failed in June 2011, the second one failed 5 days ago.
I thought that RAID5 would turn read-only as soon as one disk fails.
Does anyone know more?
Please let's not discuss how crazy it was to let my RAID5 run with one disk removed for 6 months. I didn't know what SMART was until now (believe me, I'm reading the manual).
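The Update Time comparison above can be extended with the Events counter that mdadm --examine prints for each member: md bumps that counter whenever it updates the superblocks, so a member with a lower count dropped out of the array earlier. The following is only a sketch, assuming the examine output is fed in on stdin in the same format as shown earlier in this thread; summarise_examine is a made-up helper name, not part of mdadm.

```shell
# Hypothetical helper: read `mdadm --examine` output on stdin and print,
# for each member, its event counter, how far it lags behind the most
# recent member, and its last superblock update time.
summarise_examine() {
    awk '
        # Header line of each member, e.g. "/dev/sda10:"
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }
        # Keep everything after the "Update Time : " label
        /Update Time :/ { sub(/^.*Update Time : /, ""); when = $0 }
        # Events comes after Update Time, so the member is complete here
        /Events :/      { n++; d[n] = dev; e[n] = $NF + 0; t[n] = when
                          if (e[n] > max) max = e[n] }
        END             { for (i = 1; i <= n; i++)
                              printf "%s events=%d behind=%d updated=%s\n",
                                     d[i], e[i], max - e[i], t[i] }
    '
}
```

It could be used as mdadm --examine /dev/sd[abcd]10 | summarise_examine. With the four outputs posted earlier, sdb10 and sdd10 are level at 6224199 events, sdc10 is 734 events behind, and sda10 is 3672084 events behind.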

For more information, here is the status of the array

Code:

# mdadm --examine /dev/sdb10 | tail -6
this     1       8       26        1      active sync   /dev/sdb10

   0     0       0        0        0      removed
   1     1       8       26        1      active sync   /dev/sdb10
   2     2       0        0        2      faulty removed
   3     3       8       58        3      active sync   /dev/sdd10

Is there any chance I can resync 2 disks out of 4?

Any help will be appreciated.

chebarbudo

01-27-2012

Hi there, me again,

I think my problem lies somewhere else.
I believe no disk is physically broken, given that several other RAID arrays use the same 4 disks:

Code:

root@titan:~# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid1 sda1[0] sdd1[3] sdb1[1] sdc1[2]
      1024896 blocks [4/4] [UUUU]

md5 : active raid1 sda5[0] sdd5[3] sdb5[1] sdc5[2]
      1023872 blocks [4/4] [UUUU]

md6 : active raid1 sdc6[2] sda6[0] sdd6[3] sdb6[1]
      1023872 blocks [4/4] [UUUU]

md7 : active raid1 sda7[0] sdd7[3] sdb7[1] sdc7[2]
      1023872 blocks [4/4] [UUUU]

md8 : active raid1 sda8[0] sdd8[3] sdb8[1] sdc8[2]
      1023872 blocks [4/4] [UUUU]

unused devices: <none>

So I thought I should just check the disks.
Problem: fsck doesn't work:

Code:

root@titan:~# fsck.ext3 /dev/sdc10
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Superblock invalid, trying backup blocks...
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdc10

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

root@titan:~# fsck.ext3 -b 8193 /dev/sdc10
e2fsck 1.41.3 (12-Oct-2008)
fsck.ext3: Bad magic number in super-block while trying to open /dev/sdc10

The superblock could not be read or does not describe a correct ext2
filesystem.  If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

How can I repair the filesystem on /dev/sdc10?

Thanks for your help
Santiago

chebarbudo
