10-06-2011
Join Date: Aug 2009
Last Activity: 26 December 2011, 4:26 PM EST
Location: Munich (Germany)
Posts: 244
Thanks Given: 0
Thanked 25 Times in 25 Posts
Identify failed disk in Linux RAID
Good Evening,
Two years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. which physical drive corresponds to which /dev/sdX device).
Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I'd follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux RAID is fairly new territory for me.
1. Stop the RAID array
# umount /dev/md1
# mdadm -S /dev/md1
2. Unplug the hard drives one by one and look in dmesg for failure events on /dev/sdX. That way the mapping between each physical disk and its /dev/sdX device is revealed step by step.
3. Replace the failed disk, and partition the new one to match the layout of the other drives.
4. Rebuild the array with the new disk
- get UUID with mdadm --query
- assemble array with that new disk: mdadm --assemble /dev/md -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
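As a possible alternative to unplugging drives in step 2, the serial number printed on each drive's label can usually be matched against the names udev creates under /dev/disk/by-id. The following is only a sketch: the function name disk_serials is made up for illustration, and it assumes the usual ata-MODEL_SERIAL naming for SATA disks and GNU readlink.

```shell
#!/bin/sh
# Print "kernel-device<TAB>by-id name" for every whole-disk ATA entry.
# $1 is the directory to scan; it defaults to /dev/disk/by-id.
# Assumption: SATA disks show up as ata-MODEL_SERIAL symlinks.
disk_serials() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                       # glob matched nothing
        case $link in *-part[0-9]*) continue ;; esac     # skip partition entries
        dev=$(readlink -f "$link")                       # resolve to e.g. /dev/sdd
        printf '%s\t%s\n' "${dev##*/}" "${link##*/}"
    done
}
```

Calling disk_serials with no argument on the server should list one line per disk; the line starting with sdd would then carry the serial number to look for on the drive labels.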
You will find detailed information about the server setup below.
TIA,
Loïc
The setup:
Ubuntu server, 6 SATA Hard drives /dev/sda ... /dev/sdf
Each drive (X = a..f) is partitioned as follows:
/sdX1 type Linux Partition
/sdX2 type swap
/sdX3 type extended
/sdX5 type RAID
The server has two software RAID arrays:
/dev/md0 RAID1 /sda1 and /sdb1
/dev/md1 RAID5 /sda5, /sdb5, /sdc5, /sdd5, /sde5, /sdf5
The OS is located on /dev/md0; /dev/md1 holds only application data.
The Failure:
A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]
md0 : active raid1 sdb1[1] sda1[0]
20506816 blocks [2/2] [UU]
unused devices: <none>
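In the output above, the failed member is the one flagged (F), and the underscore in [UUU_UU] marks the missing slot. A minimal sketch for pulling the flagged members out of mdstat-format text follows; the function name failed_members is hypothetical, and the parsing assumes the line layout shown above.

```shell
#!/bin/sh
# List failed members as "array member" pairs, one per line.
# Reads the file given as $1, defaulting to /proc/mdstat.
failed_members() {
    awk '
        # Array lines look like: "md1 : active raid5 sde5[4] sdd5[6](F) ..."
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    name = $i
                    sub(/\[.*$/, "", name)   # "sdd5[6](F)" becomes "sdd5"
                    print $1, name
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run without an argument on the server, failed_members reads the live /proc/mdstat; a path can be passed instead to check a saved copy.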