10-06-2011
Identify failed disk in Linux RAID
Good Evening,
Two years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. which physical drive corresponds to which /dev/sdX device).
Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I would follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux RAID is fairly new territory for me.
1. Stop the RAID array
# umount /dev/md1
# mdadm -S /dev/md1
2. Unplug the hard drives one by one and look in dmesg for failure events on /dev/sdX. That way the mapping between each physical disk and its /dev/sdX device is revealed step by step.
3. Replace the failed disk, and partition the new one to match the layout of the other drives.
4. Rebuild the array with the new disk
- get the array UUID from a surviving member with mdadm --examine /dev/sda5
- assemble the array with the new disk: mdadm --assemble /dev/md1 -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
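As a possible alternative to unplugging drives in step 2, the model and serial number of each disk can usually be read from the symlink names under /dev/disk/by-id and compared with the label printed on the drive. The sketch below assumes udev populates that directory with ata-* entries. list_disk_ids is a hypothetical helper name, and its optional directory argument exists only so the function can be tried against a test directory.

```shell
# Print "device<TAB>by-id name" for every whole-disk ata-* entry.
# The by-id name normally embeds the drive model and serial number.
# $1 is an optional directory (assumption: defaults to /dev/disk/by-id).
list_disk_ids() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                 # glob matched nothing
        case "$link" in
            *-part[0-9]*) continue ;;              # skip partition entries
        esac
        target=$(readlink "$link")                 # e.g. ../../sdd
        printf '%s\t%s\n' "${target##*/}" "${link##*/}"
    done
}
```

Calling list_disk_ids with no argument lists the real disks; the line starting with the failed device name then gives the model and serial to look for on the drive label.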
You will find detailed information about the server setup below.
TIA,
Loïc
The setup:
Ubuntu server, 6 SATA hard drives /dev/sda ... /dev/sdf
Each drive (X=a..f) is partitioned as follows:
/sdX1 type Linux Partition
/sdX2 type swap
/sdX3 type extended
/sdX5 type RAID
The server has two software RAID arrays:
/dev/md0 RAID1 /sda1 and /sdb1
/dev/md1 RAID5 /sda5, /sdb5, /sdc5, /sdd5, /sde5, /sdf5
The OS is located on /dev/md0; only application data is located on /dev/md1.
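With this layout, steps 3 and 4 of the procedure could also be done without stopping and reassembling the array, by removing the failed member and hot-adding the new partition. This is a different technique from the one described in the procedure, shown only as a sketch for review. plan_member_swap is a hypothetical helper that prints the commands instead of running them, and it assumes the replacement disk comes up under the same /dev/sdX name as the old one.

```shell
# Print (do not run) the commands for swapping one RAID member.
# Arguments: array (e.g. /dev/md1), a healthy disk (e.g. sda),
# the replaced disk (e.g. sdd), the member partition number (e.g. 5).
plan_member_swap() {
    array=$1 healthy=$2 replaced=$3 partno=$4
    # Before pulling the old disk: drop the failed member from the array.
    echo "mdadm $array --remove /dev/${replaced}${partno}"
    # After fitting the new disk: copy the partition table from a healthy disk.
    echo "sfdisk -d /dev/${healthy} | sfdisk /dev/${replaced}"
    # Add the new partition; md then rebuilds onto it.
    echo "mdadm $array --add /dev/${replaced}${partno}"
    # Follow the rebuild progress.
    echo "cat /proc/mdstat"
}
```

For example, plan_member_swap /dev/md1 sda sdd 5 prints the four commands for the failure described below, which can then be reviewed and run by hand as root.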
The Failure:
A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]
md0 : active raid1 sdb1[1] sda1[0]
20506816 blocks [2/2] [UU]
unused devices: <none>
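The (F) flag in the output above marks the failed member. A minimal sketch for extracting it automatically, assuming the mdstat format shown here. failed_members is a hypothetical helper name, and it reads /proc/mdstat unless another file is given.

```shell
# Print "array member" for each component flagged (F) in mdstat-format input.
# $1 is an optional file (assumption: defaults to /proc/mdstat).
failed_members() {
    awk '/^md[0-9]+ :/ {
        for (i = 3; i <= NF; i++) {
            if ($i ~ /\(F\)$/) {
                dev = $i
                sub(/\(F\)$/, "", dev)        # drop the failure flag
                sub(/\[[0-9]+\]$/, "", dev)   # drop the slot number, if any
                print $1, dev
            }
        }
    }' "${1:-/proc/mdstat}"
}
```

Run against the output above, this prints md1 sdd5.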