10-06-2011
Identify failed disk in Linux RAID
Good Evening,
Two years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. note which physical drive corresponds to which /dev/sdX device).
Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I'd follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux RAID is fairly new territory for me.
1. Stop the RAID array
# umount /dev/md1
# mdadm -S /dev/md1
2. Unplug the hard drives one by one and look in dmesg for failure events on /dev/sdX. That way the mapping between each physical disk and its /dev/sdX device is revealed step by step.
3. Replace the failed disk, and partition it to match the layout of the other drives.
4. Rebuild the array with the new disk
- get the array UUID with mdadm --examine on a surviving member partition (e.g. /dev/sda5)
- assemble the array with the new disk: mdadm --assemble /dev/md1 -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
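A possible shortcut for steps 2 to 4, as a sketch only and untested on this server: on udev-based systems the names under /dev/disk/by-id normally embed the model and serial number printed on each drive's label, so the mapping might be found without unplugging anything. mdadm can also usually swap a member while the array stays assembled. The helper names `ids_for_disk` and `print_replace_plan` are hypothetical, the device names are the ones from this post, and the plan is only printed, never executed.

```shell
# Sketch only: helper names are made up for illustration.

# Print every /dev/disk/by-id name that resolves to the given disk.
# Second argument (optional) is the directory holding the symlinks.
ids_for_disk() {
    local disk dir link target
    disk=$1
    dir=${2:-/dev/disk/by-id}
    target=$(readlink -f "$disk") || return 1
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$target" ]; then
            basename "$link"
        fi
    done
}

# Print (do NOT run) a replacement sequence that keeps the array assembled.
# Arguments: md device, failed member, healthy disk used as partition
# template, replacement disk, member partition on the replacement disk.
print_replace_plan() {
    local md failed template newdisk newmember
    md=$1; failed=$2; template=$3; newdisk=$4; newmember=$5
    cat <<EOF
mdadm $md --fail $failed
mdadm $md --remove $failed
# power off (unless the bay is hot-swappable), swap the drive, boot again
sfdisk -d $template | sfdisk $newdisk
mdadm $md --add $newmember
cat /proc/mdstat
EOF
}
```

For example, `ids_for_disk /dev/sdd` would list the labels to look for on the physical drives, and `print_replace_plan /dev/md1 /dev/sdd5 /dev/sda /dev/sdd /dev/sdd5` prints the sequence for review. Device names can change after a reboot, so they should be re-checked before any command is run for real.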
You will find detailed information about the server setup below.
TIA,
Loïc
The setup:
Ubuntu server, 6 SATA hard drives /dev/sda ... /dev/sdf
Each drive (X = a..f) is partitioned as follows:
/sdX1 type Linux Partition
/sdX2 type swap
/sdX3 type extended
/sdX5 type RAID
The server has 2 software RAID arrays:
/dev/md0 RAID1 /sda1 and /sdb1
/dev/md1 RAID5 /sda5, /sdb5, /sdc5, /sdd5, /sde5, /sdf5
The OS is located on /dev/md0; only application data is located on /dev/md1.
The Failure:
A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]
md0 : active raid1 sdb1[1] sda1[0]
20506816 blocks [2/2] [UU]
unused devices: <none>
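The failed member can also be pulled out of /proc/mdstat mechanically. A minimal sketch, assuming the mdstat layout shown above (array name at the start of the line, failed members flagged with `(F)`); the function name `failed_members` is made up for illustration and is not part of mdadm:

```shell
# Print "<array> <member>" for every member flagged (F) in an mdstat file.
# Argument (optional): path to the file, defaults to /proc/mdstat.
failed_members() {
    awk '
        /^md[^ ]* :/ {
            # Fields 1 and 2 are the array name and the colon; scan the rest.
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # strip "[6](F)" from "sdd5[6](F)"
                    print $1, dev
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the output above, this prints `md1 sdd5`.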
mdadm.conf
MDADM.CONF(5) File Formats Manual MDADM.CONF(5)
NAME
mdadm.conf - configuration for management of Software Raid with mdadm
SYNOPSIS
/etc/mdadm.conf
DESCRIPTION
mdadm is a tool for creating, managing, and monitoring RAID devices using the md driver in Linux.
Some common tasks, such as assembling all arrays, can be simplified by describing the devices and arrays in this configuration file.
SYNTAX
The file should be seen as a collection of words separated by white space (space, tab, or newline). Any word that begins with a hash sign
(#) starts a comment and that word together with the remainder of the line is ignored.
Any line that starts with white space (space or tab) is treated as though it were a continuation of the previous line.
Empty lines are ignored, but otherwise each (non continuation) line must start with a keyword as listed below. The keywords are case
insensitive and can be abbreviated to 3 characters.
The keywords are:
DEVICE A device line lists the devices (whole devices or partitions) that might contain a component of an MD array. When looking for the
components of an array, mdadm will scan these devices (or any devices listed on the command line).
The device line may contain a number of different devices (separated by spaces) and each device name can contain wild cards as
defined by glob(7).
Also, there may be several device lines present in the file.
For example:
DEVICE /dev/hda* /dev/hdc*
DEV /dev/sd*
DEVICE /dev/discs/disc*/disc
ARRAY The ARRAY lines identify actual arrays. The second word on the line should be the name of the device where the array is normally
assembled, such as /dev/md1. Subsequent words identify the array, or identify the array as a member of a group. If multiple identities
are given, then a component device must match ALL identities to be considered a match. Each identity word has a tag, an
equals sign, and some value. The tags are:
uuid= The value should be a 128 bit uuid in hexadecimal, with punctuation interspersed if desired. This must match the uuid stored in
the superblock.
super-minor=
The value is an integer which indicates the minor number that was stored in the superblock when the array was created. When an
array is created as /dev/mdX, then the minor number X is stored.
devices=
The value is a comma separated list of device names. Precisely these devices will be used to assemble the array. Note that the
devices listed there must also be listed on a DEVICE line.
level= The value is a raid level. This is not normally used to identify an array, but is supported so that the output of
mdadm --examine --scan
can be used directly in the configuration file.
num-devices=
The value is the number of devices in a complete active array. As with level= this is mainly for compatibility with the output
of
mdadm --examine --scan.
spare-group=
The value is a textual name for a group of arrays. All arrays with the same spare-group name are considered to be part of the
same group. The significance of a group of arrays is that mdadm will, when monitoring the arrays, move a spare drive from one
array in a group to another array in that group if the first array had a failed or missing drive but no spare.
MAILADDR
The mailaddr line gives an E-mail address that alerts should be sent to when mdadm is running in --monitor mode (and was given the --scan
option). There should only be one MAILADDR line and it should have only one address.
PROGRAM
The program line gives the name of a program to be run when mdadm --monitor detects potentially interesting events on any of the
arrays that it is monitoring. This program gets run with two or three arguments, they being the Event, the md device, and possibly
the related component device.
There should only be one program line and it should be given only one program.
EXAMPLE
DEVICE /dev/sd[bcdjkl]1
DEVICE /dev/hda1 /dev/hdb1
# /dev/md0 is known by its UUID.
ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
# /dev/md1 contains all devices with a minor number of
# 1 in the superblock.
ARRAY /dev/md1 super-minor=1
# /dev/md2 is made from precisely these two devices
ARRAY /dev/md2 devices=/dev/hda1,/dev/hdb1
# /dev/md4 and /dev/md5 are a spare-group and spares
# can be moved between them
ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df
   spare-group=group1
ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977
   spare-group=group1
MAILADDR root@mydomain.tld
PROGRAM /usr/sbin/handle-mdadm-events
SEE ALSO
mdadm(8), md(4).
MDADM.CONF(5)
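The syntax rules described in the man page (comments starting at a word that begins with `#`, continuation lines starting with white space, case-insensitive keywords that may be abbreviated to 3 characters) can be illustrated with a small reader. This is a sketch under those stated rules only, not how mdadm itself parses the file; `list_arrays` is a hypothetical name, and the output format (device, then uuid or `-`) is my own choice.

```shell
# Print "<device> <uuid>" for each ARRAY line of an mdadm.conf-style file.
# Prints "-" in place of the uuid when the line carries no uuid= tag.
list_arrays() {
    awk '
        function emit(   n, w, i, kw, uuid) {
            n = split(line, w, " ")          # split on runs of blanks
            kw = tolower(w[1])
            # Keyword is case insensitive and may be cut down to 3 letters.
            if (n < 2 || length(kw) < 3 || index("array", kw) != 1) return
            uuid = "-"
            for (i = 3; i <= n; i++)
                if (tolower(w[i]) ~ /^uuid=/) uuid = substr(w[i], 6)
            print w[2], uuid
        }
        {
            sub(/^#.*$/, "")                 # whole-line comment
            sub(/[ \t]#.*$/, "")             # comment after other words
        }
        /^[ \t]/ { line = line " " $0; next }   # continuation line
        { emit(); line = $0 }
        END { emit() }
    ' "$1"
}
```

Usage would be along the lines of `list_arrays /etc/mdadm.conf`; on Ubuntu the file may live at a different path, so the argument is left explicit.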