Identify failed disk in Linux RAID

Good Evening,

2 years ago, I set up an Ubuntu file server for a friend who is an amateur photographer. Basically, the server offers a software RAID-5 that can be accessed remotely from a Mac. Unfortunately, I didn't label the hard drives (i.e. note which physical drive corresponds to which /dev/sdX device).

Now a drive has failed, and the RAID-5 is at risk. I need to find out which physical drive we have to replace before we can rebuild the array. I have summed up below the procedure I'd follow. It would be great if some Linux software RAID connoisseur could review it. The more eyeballs, the better; besides, Linux software RAID is quite new territory for me.

1. Stop the RAID array:
# umount /dev/md1
# mdadm -S /dev/md1
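
Before the umount in step 1, I'd also double-check that nothing on the server still has files open on the array (/srv/data below is just a placeholder for wherever the share is actually mounted):
# fuser -vm /srv/data   # lists any processes still using the filesystem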

2. Unplug the hard drives one by one and look in dmesg for the failure events on /dev/sdX. That way, the mapping between each physical disk and its /dev/sdX device is revealed step by step.
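
For step 2, something along these lines should do the trick, assuming udev's by-id links are available on that Ubuntu box; matching the serial numbers against the drive stickers might even spare unplugging the healthy drives:
# ls -l /dev/disk/by-id/      # udev links show model + serial number for each /dev/sdX
# tail -f /var/log/kern.log   # watch for disconnect messages while a drive is pulled (dmesg | tail works too)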

3. Replace the failed disk, and partition it according to the expected layout (see the sketch below).
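
For the partitioning in step 3, I'd simply copy the partition table from one of the healthy drives with sfdisk (this assumes plain MBR partition tables, and that the replacement comes up as /dev/sdd again — both to be double-checked before running it):
# sfdisk -d /dev/sda > sda-table.dump   # dump the partition table of a healthy drive
# sfdisk /dev/sdd < sda-table.dump      # write the same layout onto the new disk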

4. Rebuild the array with the new disk (see the sketch below):
- get the array UUID with mdadm --detail /dev/md1 (or mdadm --examine on a member partition)
- assemble the array with that new disk: mdadm --assemble /dev/md1 -u XXX
- update /etc/mdadm.conf: mdadm --detail --scan >> /etc/mdadm.conf
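
Put together, my idea for step 4 looks roughly like this (XXX stands for the array UUID, and I again assume the replacement comes back as /dev/sdd):
# mdadm --examine /dev/sda5 | grep -i uuid   # read the UUID from a surviving member's superblock
# mdadm --assemble /dev/md1 -u XXX           # assemble the degraded array from the 5 surviving members
# mdadm /dev/md1 --add /dev/sdd5             # add the new partition; the resync should start automatically
# cat /proc/mdstat                           # watch the rebuild progress
# mdadm --detail --scan >> /etc/mdadm.conf

I'm not sure whether assembling a degraded array needs --run to actually start it; that's exactly the kind of detail I'd appreciate a review on.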

You'll find detailed information about the server setup below.

TIA,
Loïc

The setup:

Ubuntu server, 6 SATA hard drives /dev/sda ... /dev/sdf

Each drive (X = a..f) is partitioned as follows:
/dev/sdX1 type Linux partition
/dev/sdX2 type swap
/dev/sdX3 type extended
/dev/sdX5 type RAID


The server has 2 software RAID arrays:
/dev/md0 RAID-1 on /dev/sda1 and /dev/sdb1
/dev/md1 RAID-5 on /dev/sda5, /dev/sdb5, /dev/sdc5, /dev/sdd5, /dev/sde5, /dev/sdf5

The OS is located on /dev/md0; only application data is located on /dev/md1.

The Failure:

A Fail event had been detected on md device /dev/md1.
It could be related to component device /dev/sdd5.
The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid5 sde5[4] sdc5[2] sdd5[6](F) sdf5[5] sdb5[1] sda5[0]
9636429120 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUU_UU]

md0 : active raid1 sdb1[1] sda1[0]
20506816 blocks [2/2] [UU]


unused devices: <none>
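
One more thought: before touching any cables, I'd confirm the faulty member and note its serial number while the drive is still visible to the kernel (assuming hdparm is installed; smartctl would do as well):
# mdadm --detail /dev/md1               # the failed component should be listed as "faulty"
# hdparm -I /dev/sdd | grep -i serial   # serial number to compare with the sticker on the drive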
 
