Identify failed disk in Linux RAID

Good Evening,

2 years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. note which physical drive corresponds to which /dev/sdX device).

Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I'd follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux software RAID is quite new territory for me.

1. Stop the RAID array:
# umount /dev/md1
# mdadm -S /dev/md1
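
Before the umount in step 1, I'd also double-check that nothing on the server still has files open on the array (/srv/data below is just a placeholder for wherever the share is actually mounted):
# fuser -vm /srv/data   # lists any processes still using the filesystem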

2. Unplug the hard drives one by one and look in dmesg for the failure events on /dev/sdX. That way, the mapping between each physical disk and its /dev/sdX device is revealed step by step.
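
For step 2, something along these lines should do the trick, assuming udev's by-id links are available on that Ubuntu box; matching the serial numbers against the drive stickers might even spare unplugging the healthy drives:
# ls -l /dev/disk/by-id/      # udev links show model + serial number for each /dev/sdX
# tail -f /var/log/kern.log   # watch for disconnect messages while a drive is pulled (dmesg | tail works too)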

3. Replace the failed disk, and partition it according to the expected layout (see the sketch below).
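
For the partitioning in step 3, I'd simply copy the partition table from one of the healthy drives with sfdisk (this assumes plain MBR partition tables, and that the replacement comes up as /dev/sdd again — both to be double-checked before running it):
# sfdisk -d /dev/sda > sda-table.dump   # dump the partition table of a healthy drive
# sfdisk /dev/sdd < sda-table.dump      # write the same layout onto the new disk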

4. Rebuild the array with the new disk (see the sketch below):
- get the array UUID with mdadm --detail /dev/md1 (or mdadm --examine on a member partition)
- assemble the array with that new disk: mdadm --assemble /dev/md1 -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
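
Put together, my idea for step 4 looks roughly like this (XXX stands for the array UUID, and I again assume the replacement comes back as /dev/sdd):
# mdadm --examine /dev/sda5 | grep -i uuid   # read the UUID from a surviving member's superblock
# mdadm --assemble /dev/md1 -u XXX           # assemble the degraded array from the 5 surviving members
# mdadm /dev/md1 --add /dev/sdd5             # add the new partition; the resync should start automatically
# cat /proc/mdstat                           # watch the rebuild progress
# mdadm --detail --scan >> /etc/mdadm.conf

I'm not sure whether assembling a degraded array needs --run to actually start it; that's exactly the kind of detail I'd appreciate a review on.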

You'll find detailed information about the server setup below.

TIA,
Loïc

The setup:

Ubuntu server, 6 SATA hard drives /dev/sda ... /dev/sdf

Each drive (X = a..f) is partitioned as follows:
/dev/sdX1 type Linux partition
/dev/sdX2 type swap
/dev/sdX3 type extended
/dev/sdX5 type RAID


The server has 2 software RAID arrays:
/dev/md0 RAID-1 on /dev/sda1 and /dev/sdb1
/dev/md1 RAID-5 on /dev/sda5, /dev/sdb5, /dev/sdc5, /dev/sdd5, /dev/sde5, /dev/sdf5

The OS is located on /dev/md0; only application data is located on /dev/md1.

The Failure:

A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]

md0 : active raid1 sdb1[1] sda1[0]
20506816 blocks [2/2] [UU]


unused devices: <none>
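
One more thought: before touching any cables, I'd confirm the faulty member and note its serial number while the drive is still visible to the kernel (assuming hdparm is installed; smartctl would do as well):
# mdadm --detail /dev/md1               # the failed component should be listed as "faulty"
# hdparm -I /dev/sdd | grep -i serial   # serial number to compare with the sticker on the drive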
 
