Post 302562352 by Loic Domaigne on Thursday 6th of October 2011 03:04:10 PM
Identify failed disk in Linux RAID

Good Evening,

Two years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. which physical drive corresponds to which /dev/sdX device).

Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I'd follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux RAID is fairly new territory for me.

1. Stop the RAID array
# umount /dev/md1
# mdadm -S /dev/md1

2. Unplug the hard drives one by one and look in dmesg for failure events on /dev/sdX. That way, the mapping between each physical disk and its /dev/sdX device is revealed step by step.

3. Replace the failed disk and partition it to match the layout of the other drives.

4. Rebuild the array with the new disk
- get the UUID with mdadm --query
- assemble the array with the new disk: mdadm --assemble /dev/md1 -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
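
For comparison, steps 3 and 4 are often carried out with mdadm's manage mode instead of stopping and re-assembling the array. The sketch below is not the procedure from this post. It is a dry run that only prints a commonly described sequence, so it can be read and checked before anything is executed. The function name replacement_plan and its three arguments are made up for illustration, and it assumes plain sdX device names and an MBR partition table that sfdisk can dump and restore.

```shell
#!/bin/sh
# Dry run: print the commands, do not execute them.
# Arguments (hypothetical examples): the array, the failed member partition,
# and a healthy disk whose partition table serves as the template.
replacement_plan() {
    array=$1 failed=$2 template=$3
    disk=${failed%%[0-9]*}      # /dev/sdd5 -> /dev/sdd (plain sdX names only)
    cat <<EOF
mdadm --manage $array --fail $failed
mdadm --manage $array --remove $failed
# power down, swap the physical drive, boot again
sfdisk -d $template | sfdisk $disk
mdadm --manage $array --add $failed
cat /proc/mdstat
EOF
}

replacement_plan /dev/md1 /dev/sdd5 /dev/sda
```

After the physical swap the new drive may come up under a different device name, so the names in the printed plan would need to be checked before any line is run as root.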

You will find detailed information about the server setup below.

TIA,
Loïc

The setup:

Ubuntu server, 6 SATA Hard drives /dev/sda ... /dev/sdf
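
With six drives named /dev/sda ... /dev/sdf, the device-to-hardware mapping might also be recovered without unplugging anything. udev exposes model and serial number under /dev/disk/by-id, and the serial is normally printed on the drive's label. The sketch below is not part of the original setup. The function name list_disk_ids is made up for illustration, and it assumes the drives appear with the usual ata- prefix.

```shell
#!/bin/sh
# Print "<kernel name> <by-id name>" for every whole disk.
# The directory is a parameter so the function can be tried on a copy;
# on a live system it defaults to /dev/disk/by-id (populated by udev).
list_disk_ids() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/ata-*; do
        [ -L "$link" ] || continue                    # glob matched nothing
        case $link in *-part[0-9]*) continue ;; esac  # skip partition links
        target=$(readlink "$link")                    # e.g. ../../sdd
        printf '%s %s\n' "${target##*/}" "${link##*/}"
    done | sort
}
```

Calling list_disk_ids with no argument reads /dev/disk/by-id. Each output line pairs a kernel name such as sdd with an identifier that embeds the model and serial number, which can then be compared with the labels on the drives.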

Each drive (X = a..f) is partitioned as follows:
/sdX1 type Linux Partition
/sdX2 type swap
/sdX3 type extended
/sdX5 type RAID


The server has 2 software RAID arrays:
/dev/md0 RAID1 /sda1 and /sdb1
/dev/md1 RAID5 /sda5, /sdb5, /sdc5, /sdd5, /sde5, /sdf5

The OS is located on /dev/md0; only application data is located on /dev/md1.

The Failure:

A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
      9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]

md0 : active raid1 sdb1[1] sda1[0]
      20506816 blocks [2/2] [UU]

unused devices: <none>
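
The failed member can also be picked out of this output mechanically, because md marks it with (F). The following is a minimal sketch, assuming the /proc/mdstat layout shown above. failed_members is a hypothetical helper name, not an mdadm command.

```shell
#!/bin/sh
# Print "<array> <member>" for every member flagged (F) in an mdstat file.
# Defaults to /proc/mdstat; a saved copy can be passed as the first argument.
failed_members() {
    awk '
        $2 == ":" {                       # e.g. "md1 : active raid5 sde5[4] ..."
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {      # member marked as failed
                    dev = $i
                    sub(/\[.*/, "", dev)  # "sdd5[6](F)" -> "sdd5"
                    print $1, dev
                }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the output above, it prints md1 sdd5, which matches the component device named in the failure notice.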
 
