NetBSD 6.1.5 - man page for raidctl (netbsd section 8)

RAIDCTL(8)			   BSD System Manager's Manual			       RAIDCTL(8)

NAME
     raidctl -- configuration utility for the RAIDframe disk driver

SYNOPSIS
     raidctl [-v] -a component dev
     raidctl [-v] -A [yes | no | root] dev
     raidctl [-v] -B dev
     raidctl [-v] -c config_file dev
     raidctl [-v] -C config_file dev
     raidctl [-v] -f component dev
     raidctl [-v] -F component dev
     raidctl [-v] -g component dev
     raidctl [-v] -G dev
     raidctl [-v] -i dev
     raidctl [-v] -I serial_number dev
     raidctl [-v] -m dev
     raidctl [-v] -M [yes | no | set params] dev
     raidctl [-v] -p dev
     raidctl [-v] -P dev
     raidctl [-v] -r component dev
     raidctl [-v] -R component dev
     raidctl [-v] -s dev
     raidctl [-v] -S dev
     raidctl [-v] -u dev

DESCRIPTION
     raidctl is the user-land control program for raid(4), the RAIDframe disk device.  raidctl is
     primarily used to dynamically configure and unconfigure RAIDframe disk devices.  For more
     information about the RAIDframe disk device, see raid(4).

     This document assumes the reader has at least rudimentary knowledge of RAID and RAID
     concepts.

     The command-line options for raidctl are as follows:

     -a component dev
	     Add component as a hot spare for the device dev.  Component labels (which identify
	     the location of a given component within a particular RAID set) are automatically
	     added to the hot spare after it has been used and are not required for component
	     before it is used.

     -A yes dev
	     Make the RAID set auto-configurable.  The RAID set will be automatically configured
	     at boot before the root file system is mounted.  Note that all components of the set
	     must be of type RAID in the disklabel.

     -A no dev
	     Turn off auto-configuration for the RAID set.

     -A root dev
	     Make the RAID set auto-configurable, and also mark the set as being eligible to be
	     the root partition.  A RAID set configured this way will override the use of the
	     boot disk as the root device.  All components of the set must be of type RAID in the
	     disklabel.  Note that only certain architectures (currently alpha, amd64, i386,
	     pmax, sparc, sparc64, and vax) support booting a kernel directly from a RAID set.

     -B dev  Initiate a copyback of reconstructed data from a spare disk to its original disk.
	     This is performed after a component has failed, and the failed drive has been recon-
	     structed onto a spare drive.

     -c config_file dev
	     Configure the RAIDframe device dev according to the configuration given in
	     config_file.  A description of the contents of config_file is given later.

     -C config_file dev
	     As for -c, but forces the configuration to take place.  Fatal errors due to
	     uninitialized components are ignored.  This is required the first time a RAID set
	     is configured.

     -f component dev
	     This marks the specified component as having failed, but does not initiate a recon-
	     struction of that component.

     -F component dev
	     Fails the specified component of the device, and immediately begins a reconstruction
	     of the failed disk onto an available hot spare.  This is one of the mechanisms used
	     to start the reconstruction process if a component does have a hardware failure.

     -g component dev
	     Get the component label for the specified component.

     -G dev  Generate the configuration of the RAIDframe device in a format suitable for use with
	     the -c or -C options.

     -i dev  Initialize the RAID device.  In particular, (re-)write the parity on the selected
	     device.  This MUST be done for all RAID sets before the RAID device is labeled and
	     before file systems are created on the RAID device.

     -I serial_number dev
	     Initialize the component labels on each component of the device.  serial_number is
	     used as one of the keys in determining whether a particular set of components belong
	     to the same RAID set.  While not strictly enforced, different serial numbers should
	     be used for different RAID sets.  This step MUST be performed when a new RAID set is
	     created.

     -m dev  Display status information about the parity map on the RAID set, if any.  If used
	     with -v then the current contents of the parity map will be output (in hexadecimal
	     format) as well.

     -M yes dev
	     Enable the use of a parity map on the RAID set; this is the default, and greatly
	     reduces the time taken to check parity after unclean shutdowns at the cost of some
	     very slight overhead during normal operation.  Changes to this setting will take
	     effect the next time the set is configured.  Note that RAID-0 sets, having no par-
	     ity, will not use a parity map in any case.

     -M no dev
	     Disable the use of a parity map on the RAID set; doing this is not recommended.
	     This will take effect the next time the set is configured.

     -M set cooldown tickms regions dev
	     Alter the parameters of the parity map; parameters to leave unchanged can be given
	     as 0, and trailing zeroes may be omitted.	The RAID set is divided into regions
	     regions; each region is marked dirty for at most cooldown intervals of tickms mil-
	     liseconds each after a write to it, and at least cooldown - 1 such intervals.
	     Changes to regions take effect the next time the set is configured, while changes to the
	     other parameters are applied immediately.	The default parameters are expected to be
	     reasonable for most workloads.

     -p dev  Check the status of the parity on the RAID set.  Displays a status message, and
	     returns successfully if the parity is up-to-date.

     -P dev  Check the status of the parity on the RAID set, and initialize (re-write) the parity
	     if the parity is not known to be up-to-date.  This is normally used after a system
	     crash (and before a fsck(8)) to ensure the integrity of the parity.

     -r component dev
	     Remove the spare disk specified by component from the set of available spare
	     components.

     -R component dev
	     Fails the specified component, if necessary, and immediately begins a reconstruction
	     back to component.  This is useful for reconstructing back onto a component after it
	     has been replaced following a failure.

     -s dev  Display the status of the RAIDframe device for each of the components and spares.

     -S dev  Check the status of parity re-writing, component reconstruction, and component copy-
	     back.  The output indicates the amount of progress achieved in each of these areas.

     -u dev  Unconfigure the RAIDframe device.	This does not remove any component labels or
	     change any configuration settings (e.g. auto-configuration settings) for the RAID
	     set.

     -v      Be more verbose.  For operations such as reconstructions, parity re-writing, and
	     copybacks, provide a progress indicator.

     The device used by raidctl is specified by dev.  dev may be either the full name of the
     device, e.g., /dev/rraid0d, for the i386 architecture, or /dev/rraid0c for many others, or
     just simply raid0 (for /dev/rraid0[cd]).  It is recommended that the partitions used to rep-
     resent the RAID device are not used for file systems.

   Configuration file
     The format of the configuration file is complex, and only an abbreviated treatment is given
     here.  In the configuration files, a '#' indicates the beginning of a comment.

     There are 4 required sections of a configuration file, and 2 optional sections.  Each sec-
     tion begins with a 'START', followed by the section name, and the configuration parameters
     associated with that section.  The first section is the 'array' section, and it specifies
     the number of rows, columns, and spare disks in the RAID set.  For example:

	   START array
	   1 3 0

     indicates an array with 1 row, 3 columns, and 0 spare disks.  Note that although multi-
     dimensional arrays may be specified, they are NOT supported in the driver.

     The second section, the 'disks' section, specifies the actual components of the device.  For
     example:

	   START disks

     specifies the three component disks to be used in the RAID device.  If any of the specified
     drives cannot be found when the RAID device is configured, then they will be marked as
     'failed', and the system will operate in degraded mode.  Note that it is imperative that the
     order of the components in the configuration file does not change between configurations of
     a RAID device.  Changing the order of the components will result in data loss if the set is
     configured with the -C option.  In normal circumstances, the RAID set will not configure if
     only -c is specified, and the components are out-of-order.

     The next section, which is the 'spare' section, is optional, and, if present, specifies the
     devices to be used as 'hot spares' -- devices which are on-line, but are not actively used
     by the RAID driver unless one of the main components fails.  A simple 'spare' section might
     be:

	   START spare

     for a configuration with a single spare component.  If no spare drives are to be used in the
     configuration, then the 'spare' section may be omitted.

     The next section is the 'layout' section.	This section describes the general layout parame-
     ters for the RAID device, and provides such information as sectors per stripe unit, stripe
     units per parity unit, stripe units per reconstruction unit, and the parity configuration to
     use.  This section might look like:

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
	   32 1 1 5

     The sectors per stripe unit specifies, in blocks, the interleave factor; i.e., the number of
     contiguous sectors to be written to each component for a single stripe.  Appropriate selec-
     tion of this value (32 in this example) is the subject of much research in RAID architec-
     tures.  The stripe units per parity unit and stripe units per reconstruction unit are nor-
     mally each set to 1.  While certain values above 1 are permitted, a discussion of valid val-
     ues and the consequences of using anything other than 1 are outside the scope of this docu-
     ment.  The last value in this section (5 in this example) indicates the parity configuration
     desired.  Valid entries include:

     0	   RAID level 0.  No parity, only simple striping.

     1	   RAID level 1.  Mirroring.  The parity is the mirror.

     4	   RAID level 4.  Striping across components, with parity stored on the last component.

     5	   RAID level 5.  Striping across components, parity distributed across all components.

     There are other valid entries here, including those for Even-Odd parity, RAID level 5 with
     rotated sparing, Chained declustering, and Interleaved declustering, but as of this writing
     the code for those parity operations has not been tested with NetBSD.

     The next required section is the 'queue' section.	This is most often specified as:

	   START queue
	   fifo 100

     where the queuing method is specified as fifo (first-in, first-out), and the size of the
     per-component queue is limited to 100 requests.  Other queuing methods may also be speci-
     fied, but a discussion of them is beyond the scope of this document.

     The final section, the 'debug' section, is optional.  For more details on this the reader is
     referred to the RAIDframe documentation discussed in the HISTORY section.

     See EXAMPLES for a more complete configuration file example.
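
     The section layout described above can be produced mechanically.  The following is a
     minimal sketch, not part of raidctl: raid_conf is a hypothetical helper that assumes a
     single row, no spares, one component per line in the 'disks' section, and the commonly
     used 'fifo 100' queue.

```shell
# raid_conf LEVEL SECT_PER_SU COMPONENT...
# Hypothetical helper: prints a RAIDframe configuration with one row,
# no spare section, and a FIFO queue of 100 requests per component.
raid_conf() {
    level=$1 sect_per_su=$2
    shift 2
    # 'array' section: numRow numCol numSpare (numCol = number of components)
    printf 'START array\n# numRow numCol numSpare\n1 %d 0\n\n' "$#"
    # 'disks' section: one component per line, in a fixed order
    printf 'START disks\n'
    printf '%s\n' "$@"
    # 'layout' section: stripe unit size and RAID level
    printf '\nSTART layout\n'
    printf '# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level\n'
    printf '%d 1 1 %d\n\n' "$sect_per_su" "$level"
    # 'queue' section
    printf 'START queue\nfifo 100\n'
}
```

     For instance, `raid_conf 5 32 /dev/sd1e /dev/sd2e /dev/sd3e > raid0.conf` would write a
     three-component RAID 5 configuration.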

FILES
     /dev/{,r}raid*  raid device special files.

EXAMPLES
     It is highly recommended that before using the RAID driver for real file systems that the
     system administrator(s) become quite familiar with the use of raidctl, and that they under-
     stand how the component reconstruction process works.  The examples in this section will
     focus on configuring a number of different RAID sets of varying degrees of redundancy.  By
     working through these examples, administrators should be able to develop a good feel for how
     to configure a RAID set, and how to initiate reconstruction of failed components.

     In the following examples 'raid0' will be used to denote the RAID device.	Depending on the
     architecture, /dev/rraid0c or /dev/rraid0d may be used in place of raid0.

   Initialization and Configuration
     The initial step in configuring a RAID set is to identify the components that will be used
     in the RAID set.  All components should be the same size.	Each component should have a
     disklabel type of FS_RAID, and a typical disklabel entry for a RAID component might look
     like:

	   f:  1800000	200495	   RAID 	     # (Cyl.  405*- 4041*)

     While FS_BSDFFS will also work as the component type, the type FS_RAID is preferred for
     RAIDframe use, as it is required for features such as auto-configuration.	As part of the
     initial configuration of each RAID set, each component will be given a 'component label'.	A
     'component label' contains important information about the component, including a user-spec-
     ified serial number, the row and column of that component in the RAID set, the redundancy
     level of the RAID set, a 'modification counter', and whether the parity information (if any)
     on that component is known to be correct.	Component labels are an integral part of the RAID
     set, since they are used to ensure that components are configured in the correct order, and
     used to keep track of other vital information about the RAID set.	Component labels are also
     required for the auto-detection and auto-configuration of RAID sets at boot time.	For a
     component label to be considered valid, that particular component label must be in agreement
     with the other component labels in the set.  For example, the serial number, 'modification
     counter', number of rows and number of columns must all be in agreement.  If any of these
     are different, then the component is not considered to be part of the set.  See raid(4) for
     more information about component labels.

     Once the components have been identified, and the disks have appropriate labels, raidctl is
     then used to configure the raid(4) device.  To configure the device, a configuration file
     which looks something like:

	   START array
	   # numRow numCol numSpare
	   1 3 1

	   START disks
	   /dev/sd1e
	   /dev/sd2e
	   /dev/sd3e

	   START spare
	   /dev/sd4e

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
	   32 1 1 5

	   START queue
	   fifo 100

     is created in a file.  The above configuration file specifies a RAID 5 set consisting of the
     components /dev/sd1e, /dev/sd2e, and /dev/sd3e, with /dev/sd4e available as a 'hot spare' in
     case one of the three main drives should fail.  A RAID 0 set would be specified in a similar
     way:

	   START array
	   # numRow numCol numSpare
	   1 4 0

	   START disks
	   /dev/sd10e
	   /dev/sd11e
	   /dev/sd12e
	   /dev/sd13e

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
	   64 1 1 0

	   START queue
	   fifo 100

     In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e are the components
     that make up this RAID set.  Note that there are no hot spares for a RAID 0 set, since there
     is no way to recover data if any of the components fail.

     For a RAID 1 (mirror) set, the following configuration might be used:

	   START array
	   # numRow numCol numSpare
	   1 2 0

	   START disks
	   /dev/sd20e
	   /dev/sd21e

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
	   128 1 1 1

	   START queue
	   fifo 100

     In this case, /dev/sd20e and /dev/sd21e are the two components of the mirror set.	While no
     hot spares have been specified in this configuration, they easily could be, just as they
     were specified in the RAID 5 case above.  Note as well that RAID 1 sets are currently lim-
     ited to only 2 components.  At present, n-way mirroring is not possible.

     The first time a RAID set is configured, the -C option must be used:

	   raidctl -C raid0.conf raid0

     where raid0.conf is the name of the RAID configuration file.  The -C forces the
     configuration to succeed, even if any of the component labels are incorrect.  Outside of
     initial configurations, the -C option should not be used lightly: if the system is
     refusing to configure a RAID set, there is probably a very good reason for it.  After the
     initial configuration is done (and appropriate component labels are added with the -I
     option) then raid0 can be configured normally with:

	   raidctl -c raid0.conf raid0

     When the RAID set is configured for the first time, it is necessary to initialize the compo-
     nent labels, and to initialize the parity on the RAID set.  Initializing the component
     labels is done with:

	   raidctl -I 112341 raid0

     where '112341' is a user-specified serial number for the RAID set.  This initialization step
     is required for all RAID sets.  As well, using different serial numbers between RAID sets is
     strongly encouraged, as using the same serial number for all RAID sets will only serve to
     decrease the usefulness of the component label checking.

     Initializing the RAID set is done via the -i option.  This initialization MUST be done for
     all RAID sets, since among other things it verifies that the parity (if any) on the RAID set
     is correct.  Since this initialization may be quite time-consuming, the -v option may be
     also used in conjunction with -i:

	   raidctl -iv raid0

     This will give more verbose output on the status of the initialization:

	   Initiating re-write of parity
	   Parity Re-write status:
	    10% |****					| ETA:	  06:03 /

     The output provides a 'Percent Complete' in both a numeric and graphical format, as well as
     an estimated time to completion of the operation.

     Since it is the parity that provides the 'redundancy' part of RAID, it is critical that the
     parity is correct as much as possible.  If the parity is not correct, then there is no guar-
     antee that data will not be lost if a component fails.

     Once the parity is known to be correct, it is then safe to perform disklabel(8), newfs(8),
     or fsck(8) on the device or its file systems, and then to mount the file systems for use.
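
     The first-time steps above (-C, then -I, then -i) can be collected into one script.  The
     following is a minimal sketch under stated assumptions: first_time_setup and the RUN
     variable are hypothetical names, and by default the commands are only printed so that
     the sequence can be reviewed before anything is written to the components.

```shell
# Dry run by default: each command is printed instead of executed.
# Set RUN to the empty string (RUN= sh script.sh) to execute for real.
RUN=${RUN-echo}

# first_time_setup CONFIG_FILE DEV SERIAL_NUMBER
first_time_setup() {
    conf=$1 dev=$2 serial=$3
    # Force the initial configuration; component labels do not exist yet.
    $RUN raidctl -C "$conf" "$dev" &&
    # Write the component labels with a serial number unique to this set.
    $RUN raidctl -I "$serial" "$dev" &&
    # Initialize (re-write) the parity, with a progress indicator.
    $RUN raidctl -iv "$dev"
}
```

     For instance, `first_time_setup raid0.conf raid0 112341` prints the three commands
     used in this section, in order.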

     Under certain circumstances (e.g., the additional component has not arrived, or data is
     being migrated off of a disk destined to become a component) it may be desirable to config-
     ure a RAID 1 set with only a single component.  This can be achieved by using the word
     ``absent'' to indicate that a particular component is not present.  In the following:

	   START array
	   # numRow numCol numSpare
	   1 2 0

	   START disks
	   absent
	   /dev/sd0e

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
	   128 1 1 1

	   START queue
	   fifo 100

     /dev/sd0e is the real component, and will be the second disk of a RAID 1 set.  The first
     component is simply marked as being absent.  Configuration (using -C and -I 12345 as above)
     proceeds normally, but initialization of the RAID set will have to wait until all physical
     components are present.  After configuration, this set can be used normally, but will be
     operating in degraded mode.  Once a second physical component is obtained, it can be hot-
     added, the existing data mirrored, and normal operation resumed.

     The size of the resulting RAID set will depend on the number of data components in the set.
     Space is automatically reserved for the component labels, and the actual amount of space
     used for data on a component will be rounded down to the largest possible multiple of the
     sectors per stripe unit (sectPerSU) value.  Thus, the amount of space provided by the RAID
     set will be less than the sum of the size of the components.
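
     The rounding described above can be worked through numerically.  The following is a
     rough sketch, not an authoritative calculation: usable_sectors and set_sectors are
     hypothetical helpers, and the 64 reserved sectors per component is an assumption chosen
     because it agrees with the 1800000-sector disklabel entry and the numBlocks value of
     1799936 shown in the sample output on this page.

```shell
# Sectors reserved on each component for the component label.
# Assumption: 64 (consistent with 1800000 - 1799936 in this page's examples).
RESERVED=64

# usable_sectors COMPONENT_SIZE SECT_PER_SU
# Data sectors on one component, rounded down to a multiple of sectPerSU.
usable_sectors() {
    echo $(( ($1 - RESERVED) / $2 * $2 ))
}

# set_sectors LEVEL NUM_COMPONENTS COMPONENT_SIZE SECT_PER_SU
# Data sectors provided by the whole set for RAID levels 0, 1, 4, and 5.
set_sectors() {
    per=$(usable_sectors "$3" "$4")
    case $1 in
        0)   echo $(( per * $2 )) ;;          # striping only, no redundancy
        1)   echo "$per" ;;                   # mirror: one component's worth
        4|5) echo $(( per * ($2 - 1) )) ;;    # one component's worth of parity
        *)   echo "unsupported RAID level: $1" >&2; return 1 ;;
    esac
}
```

     For instance, `set_sectors 5 3 1800000 32` gives the data capacity, in sectors, of a
     three-component RAID 5 set built from 1800000-sector components.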

   Maintenance of the RAID set
     After the parity has been initialized for the first time, the command:

	   raidctl -p raid0

     can be used to check the current status of the parity.  To check the parity and rebuild it
     if necessary (for example, after an unclean shutdown) the command:

	   raidctl -P raid0

     is used.  Note that re-writing the parity can be done while other operations on the RAID set
     are taking place (e.g., while doing a fsck(8) on a file system on the RAID set).  However,
     for maximum effectiveness of the RAID set, the parity should be known to be correct before
     any data on the set is modified.

     To see how the RAID set is doing, the following command can be used to show the RAID set's
     status:

	   raidctl -s raid0

     The output will look something like:

		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
		      /dev/sd4e: spare
	   Component label for /dev/sd1e:
	      Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
	      Version: 2 Serial Number: 13432 Mod Counter: 65
	      Clean: No Status: 0
	      sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
	      RAID Level: 5  blocksize: 512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Component label for /dev/sd2e:
	      Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
	      Version: 2 Serial Number: 13432 Mod Counter: 65
	      Clean: No Status: 0
	      sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
	      RAID Level: 5  blocksize: 512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Component label for /dev/sd3e:
	      Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
	      Version: 2 Serial Number: 13432 Mod Counter: 65
	      Clean: No Status: 0
	      sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
	      RAID Level: 5  blocksize: 512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0
	   Parity status: clean
	   Reconstruction is 100% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.

     This indicates that all is well with the RAID set.  Of importance here are the component
     lines which read 'optimal', and the 'Parity status' line.	'Parity status: clean' indicates
     that the parity is up-to-date for this RAID set, whether or not the RAID set is in redundant
     or degraded mode.	'Parity status: DIRTY' indicates that it is not known if the parity
     information is consistent with the data, and that the parity information needs to be
     checked.  Note that if there are file systems open on the RAID set, the individual compo-
     nents will not be 'clean' but the set as a whole can still be clean.
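
     A monitoring script can apply the same reading of the status output.  The following is
     a minimal sketch, not part of raidctl: raid_health is a hypothetical helper that reads
     the output of `raidctl -s` on standard input and assumes the line formats shown in the
     sample output above.

```shell
# raid_health < status-output
# Prints "ok" and returns 0 when no component is failed, reconstructing,
# or spared and the parity status is "clean"; otherwise prints the
# problem and returns 1.
raid_health() {
    awk '
        # Component lines have exactly two fields: "<name>: <state>".
        NF == 2 && $2 ~ /^(failed|reconstructing|spared)$/ {
            name = $1
            sub(/:$/, "", name)
            bad = bad " " name
        }
        /^[ \t]*Parity status:/ { parity = $3 }
        END {
            if (bad != "") { print "degraded:" bad; exit 1 }
            if (parity != "clean") {
                print "parity: " (parity == "" ? "unknown" : parity)
                exit 1
            }
            print "ok"
        }
    '
}
```

     For instance, `raidctl -s raid0 | raid_health` could be run from a periodic job, with
     a non-zero exit status treated as a reason to alert the administrator.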

     To check the component label of /dev/sd1e, the following is used:

	   raidctl -g /dev/sd1e raid0

     The output of this command will look something like:

	   Component label for /dev/sd1e:
	      Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
	      Version: 2 Serial Number: 13432 Mod Counter: 65
	      Clean: No Status: 0
	      sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
	      RAID Level: 5  blocksize: 512 numBlocks: 1799936
	      Autoconfig: No
	      Last configured as: raid0

   Dealing with Component Failures
     If for some reason (perhaps to test reconstruction) it is necessary to pretend a drive has
     failed, the following will perform that function:

	   raidctl -f /dev/sd2e raid0

     The system will then be performing all operations in degraded mode, where missing data is
     re-computed from existing data and the parity.  In this case, obtaining the status of raid0
     will return (in part):

		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
		      /dev/sd4e: spare

     Note that with the use of -f a reconstruction has not been started.  To both fail the disk
     and start a reconstruction, the -F option must be used:

	   raidctl -F /dev/sd2e raid0

     The -f option may be used first, and then the -F option used later, on the same disk, if
     desired.  Immediately after the reconstruction is started, the status will report:

		      /dev/sd1e: optimal
		      /dev/sd2e: reconstructing
		      /dev/sd3e: optimal
		      /dev/sd4e: used_spare
	   Parity status: clean
	   Reconstruction is 10% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.

     This indicates that a reconstruction is in progress.  To find out how the reconstruction is
     progressing the -S option may be used.  This will indicate the progress in terms of the per-
     centage of the reconstruction that is completed.  When the reconstruction is finished the -s
     option will show:

		      /dev/sd1e: optimal
		      /dev/sd2e: spared
		      /dev/sd3e: optimal
		      /dev/sd4e: used_spare
	   Parity status: clean
	   Reconstruction is 100% complete.
	   Parity Re-write is 100% complete.
	   Copyback is 100% complete.
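
     The reconstruction percentage in the status output can also be extracted by a script.
     The following is a minimal sketch: recon_percent is a hypothetical helper that assumes
     the 'Reconstruction is N% complete.' line format shown above.

```shell
# recon_percent < status-output
# Prints the reconstruction percentage found in `raidctl -s` output,
# or nothing if no such line is present.
recon_percent() {
    sed -n 's/^[[:space:]]*Reconstruction is \([0-9][0-9]*\)% complete\.[[:space:]]*$/\1/p'
}

# Possible use (not executed here): wait until reconstruction finishes.
#   while [ "$(raidctl -s raid0 | recon_percent)" -lt 100 ]; do
#       sleep 60
#   done
```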

     At this point there are at least two options.  First, if /dev/sd2e is known to be good
     (i.e., the failure was either caused by -f or -F, or the failed disk was replaced), then a
     copyback of the data can be initiated with the -B option.	In this example, this would copy
     the entire contents of /dev/sd4e to /dev/sd2e.  Once the copyback procedure is complete, the
     status of the device would be (in part):

		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
		      /dev/sd4e: spare

     and the system is back to normal operation.

     The second option after the reconstruction is to simply use /dev/sd4e in place of /dev/sd2e
     in the configuration file.  For example, the configuration file (in part) might now look
     like:

	   START array
	   1 3 0

	   START disks
	   /dev/sd1e
	   /dev/sd4e
	   /dev/sd3e

     This can be done as /dev/sd4e is completely interchangeable with /dev/sd2e at this point.
     Note that extreme care must be taken when changing the order of the drives in a configura-
     tion.  This is one of the few instances where the devices and/or their orderings can be
     changed without loss of data!  In general, the ordering of components in a configuration
     file should never be changed.

     If a component fails and there are no hot spares available on-line, the status of the RAID
     set might (in part) look like:

		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
	   No spares.

     In this case there are a number of options.  The first option is to add a hot spare using:

	   raidctl -a /dev/sd4e raid0

     After the hot add, the status would then be:

		      /dev/sd1e: optimal
		      /dev/sd2e: failed
		      /dev/sd3e: optimal
		      /dev/sd4e: spare

     Reconstruction could then take place using -F as described above.

     A second option is to rebuild directly onto /dev/sd2e.  Once the disk containing /dev/sd2e
     has been replaced, one can simply use:

	   raidctl -R /dev/sd2e raid0

     to rebuild the /dev/sd2e component.  As the rebuilding is in progress, the status will be:

		      /dev/sd1e: optimal
		      /dev/sd2e: reconstructing
		      /dev/sd3e: optimal
	   No spares.

     and when completed, will be:

		      /dev/sd1e: optimal
		      /dev/sd2e: optimal
		      /dev/sd3e: optimal
	   No spares.

     In circumstances where a particular component is completely unavailable after a reboot, a
     special component name will be used to indicate the missing component.  For example:

		      /dev/sd2e: optimal
		     component1: failed
	   No spares.

     indicates that the second component of this RAID set was not detected at all by the auto-
     configuration code.  The name 'component1' can be used anywhere a normal component name
     would be used.  For example, to add a hot spare to the above set, and rebuild to that hot
     spare, the following could be done:

	   raidctl -a /dev/sd3e raid0
	   raidctl -F component1 raid0

     at which point the data missing from 'component1' would be reconstructed onto /dev/sd3e.

     When more than one component is marked as 'failed' due to a non-component hardware failure
     (e.g., loss of power to two components, adapter problems, termination problems, or cabling
     issues) it is quite possible to recover the data on the RAID set.	The first thing to be
     aware of is that the first disk to fail will almost certainly be out-of-sync with the
     remainder of the array.  If any IO was performed between the time the first component is
     considered 'failed' and when the second component is considered 'failed', then the first
     component to fail will not contain correct data, and should be ignored.  When the second
     component is marked as failed, however, the RAID device will (currently) panic the system.
     At this point the data on the RAID set (not including the first failed component) is still
     self consistent, and will be in no worse state of repair than had the power gone out in the
     middle of a write to a file system on a non-RAID device.  The problem, however, is that the
     component labels may now have 3 different 'modification counters' (one value on the first
     component that failed, one value on the second component that failed, and a third value on
     the remaining components).  In such a situation, the RAID set will not autoconfigure, and
     can only be forcibly re-configured with the -C option.  To recover the RAID set, one must
     first remedy whatever physical problem caused the multiple-component failure.  After that is
     done, the RAID set can be restored by forcibly configuring the raid set without the compo-
     nent that failed first.  For example, if /dev/sd1e and /dev/sd2e fail (in that order) in a
     RAID set of the following configuration:

	   START array
	   1 4 0

	   START disks

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
	   64 1 1 5

	   START queue
	   fifo 100

     then the following configuration (say "recover_raid0.conf")

	   START array
	   1 4 0

	   START disks

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
	   64 1 1 5

	   START queue
	   fifo 100

     can be used with

	   raidctl -C recover_raid0.conf raid0

     to force the configuration of raid0.  A

	   raidctl -I 12345 raid0

     will be required in order to synchronize the component labels.  At this point the file sys-
     tems on the RAID set can then be checked and corrected.  To complete the re-construction of
     the RAID set, /dev/sd1e is simply hot-added back into the array, and reconstructed as
     described earlier.
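
     A recovery configuration of this kind can be derived from the original file by
     replacing the first-failed component with the word 'absent', as used for single-
     component RAID 1 sets earlier.  The following is a minimal sketch: mark_absent is a
     hypothetical helper, and it assumes one component per line in the 'disks' section.

```shell
# mark_absent COMPONENT < old.conf > new.conf
# Replaces COMPONENT with "absent" inside the 'disks' section only,
# keeping the position of the entry (and so the component order) unchanged.
mark_absent() {
    awk -v dev="$1" '
        $1 == "START" { section = $2 }
        section == "disks" && $1 == dev { sub(/[^ \t].*$/, "absent") }
        { print }
    '
}
```

     For instance, `mark_absent /dev/sd1e < raid0.conf > recover_raid0.conf` would produce
     a file suitable for the forced configuration shown above.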

     RAID sets can be layered to create more complex and much larger RAID sets.  A RAID 0 set,
     for example, could be constructed from four RAID 5 sets.  The following configuration file
     shows such a setup:

	   START array
	   # numRow numCol numSpare
	   1 4 0

	   START disks

	   START layout
	   # sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
	   128 1 1 0

	   START queue
	   fifo 100

     A similar configuration file might be used for a RAID 0 set constructed from components on
     RAID 1 sets.  In such a configuration, the mirroring provides a high degree of redundancy,
     while the striping provides additional speed benefits.

   Auto-configuration and Root on RAID
     RAID sets can also be auto-configured at boot.  To make a set auto-configurable, simply pre-
     pare the RAID set as above, and then do a:

	   raidctl -A yes raid0

     to turn on auto-configuration for that set.  To turn off auto-configuration, use:

	   raidctl -A no raid0

     RAID sets which are auto-configurable will be configured before the root file system is
     mounted.  These RAID sets are thus available for use as a root file system, or for any other
     file system.  A primary advantage of using auto-configuration is that RAID components
     become more independent of the disks they reside on.  For example, SCSI IDs can change, but
     auto-configured sets will always be configured correctly, even if the SCSI IDs of the com-
     ponent disks have become scrambled.

     Having a system's root file system (/) on a RAID set is also allowed, with the 'a' partition
     of such a RAID set being used for /.  To use raid0a as the root file system, simply use:

	   raidctl -A root raid0

     To return raid0a to being just an auto-configuring set, simply use the -A yes arguments.

     Note that kernels can only be directly read from RAID 1 components on architectures that
     support that (currently alpha, i386, pmax, sparc, sparc64, and vax).  On those architec-
     tures, the FS_RAID file system is recognized by the bootblocks, and will properly load the
     kernel directly from a RAID 1 component.  For other architectures, or to support the root
     file system on other RAID sets, some other mechanism must be used to get a kernel booting.
     For example, a small partition containing only the secondary boot-blocks and an alternate
     kernel (or two) could be used.  Once a kernel is booting, however, and an auto-configuring
     RAID set is found that is eligible to be root, then that RAID set will be auto-configured
     and used as the root device.  If two or more RAID sets claim to be root devices, then the
     user will be prompted to select the root device.  At this time, RAID 0, 1, 4, and 5 sets are
     all supported as root devices.

     A typical RAID 1 setup with root on RAID might be as follows:

     1.   wd0a - a small partition, which contains a complete, bootable, basic NetBSD installa-
	  tion.

     2.   wd1a - also contains a complete, bootable, basic NetBSD installation.

     3.   wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.

     4.   wd0f and wd1f - a RAID 1 set, raid1, which will be used only for swap space.

     5.   wd0g and wd1g - a RAID 1 set, raid2, used for /usr, /home, or other data, if desired.

     6.   wd0h and wd1h - a RAID 1 set, raid3, if desired.

     RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.  raid0 is marked as
     being a root file system.	When new kernels are installed, the kernel is not only copied to
     /, but also to wd0a and wd1a.  The kernel on wd0a is required, since that is the kernel the
     system boots from.  The kernel on wd1a is also required, since that will be the kernel used
     should wd0 fail.  The important point here is to have redundant copies of the kernel avail-
     able, in the event that one of the drives fails.
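
     One way to keep the kernel copies in step is a small helper that copies the new kernel to
     every location and verifies each copy.  This is only a sketch: the function name
     install_kernel_copies and the mount points in the usage note are assumptions, and it pre-
     sumes that wd0a and wd1a are already mounted somewhere.

```shell
#!/bin/sh
# install_kernel_copies KERNEL DIR...
# Hypothetical helper: copies one kernel image to DIR/netbsd for every
# listed directory and verifies each copy, stopping at the first failure.
install_kernel_copies() {
    kernel=$1
    shift
    for dir in "$@"; do
        cp "$kernel" "$dir/netbsd" || return 1
        cmp -s "$kernel" "$dir/netbsd" || return 1
        echo "installed $dir/netbsd"
    done
}
```

     With wd0a and wd1a mounted on the (hypothetical) directories /mnt/wd0a and /mnt/wd1a, a
     call such as "install_kernel_copies ./netbsd / /mnt/wd0a /mnt/wd1a" would place the same
     kernel on the RAID root and on both boot partitions.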

     There is no requirement that the root file system be on the same disk as the kernel.  For
     example, obtaining the kernel from wd0a, and using sd0e and sd1e for raid0, and the root
     file system, is fine.  It is critical, however, that there be multiple kernels available, in
     the event of media failure.

     Multi-layered RAID devices (such as a RAID 0 set made up of RAID 1 sets) are not supported
     as root devices or auto-configurable devices at this point.  (Multi-layered RAID devices are
     supported in general, however, as mentioned earlier.)  Note that in order to enable compo-
     nent auto-detection and auto-configuration of RAID devices, the line:

	   options    RAID_AUTOCONFIG

     must be in the kernel configuration file.	See raid(4) for more details.
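
     A quick way to check a kernel configuration file for this option is sketched below.  The
     function name has_raid_autoconfig is an assumption, and the check only looks at the one
     file it is given; options pulled in from included files are not followed.

```shell
#!/bin/sh
# has_raid_autoconfig FILE
# Succeeds if FILE contains an uncommented "options RAID_AUTOCONFIG" line.
# Lines starting with '#' do not match, nor do longer option names.
has_raid_autoconfig() {
    grep -Eq '^[[:space:]]*options[[:space:]]+RAID_AUTOCONFIG([[:space:]]|#|$)' "$1"
}
```

     For example, "has_raid_autoconfig MYKERNEL && echo enabled", where MYKERNEL stands for the
     path of the kernel configuration file in use.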

   Swapping on RAID
     A RAID device can be used as a swap device.  In order to ensure that a RAID device used as a
     swap device is correctly unconfigured when the system is shut down or rebooted, it is recom-
     mended that the line

	   swapoff=YES

     be added to /etc/rc.conf.

   Unconfiguration
     The final operation performed by raidctl is to unconfigure a raid(4) device.  This is accom-
     plished via a simple:

	   raidctl -u raid0

     at which point the device is ready to be reconfigured.

   Performance Tuning
     Selection of the various parameter values which result in the best performance can be quite
     tricky, and often requires a bit of trial-and-error to get those values most appropriate for
     a given system.  A whole range of factors come into play, including:

     1.   Types of components (e.g., SCSI vs. IDE) and their bandwidth

     2.   Types of controller cards and their bandwidth

     3.   Distribution of components among controllers

     4.   IO bandwidth

     5.   File system access patterns

     6.   CPU speed

     As with most performance tuning, benchmarking under real-life loads may be the only way to
     measure expected performance.  Understanding some of the underlying technology is also use-
     ful in tuning.  The goal of this section is to provide pointers to those parameters which
     may make significant differences in performance.

     For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.	Since data in a
     RAID 1 set is arranged in a linear fashion on each component, selecting an appropriate
     stripe size is somewhat less critical than it is for a RAID 5 set.  However, a stripe size
     that is too small will cause large IOs to be broken up into a number of smaller ones, hurt-
     ing performance.  At the same time, a large stripe size may cause problems with concurrent
     accesses to stripes, which may also affect performance.  Thus values in the range of 32 to
     128 are often the most effective.
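
     To get a feel for how a small stripe unit breaks up a large IO, the arithmetic can be
     sketched as below.  This is a simplified model, not a description of the driver: it
     assumes 512-byte sectors and an IO that starts on a stripe-unit boundary, and the function
     name io_pieces is made up for this example.

```shell
#!/bin/sh
# io_pieces IO_KB SECT_PER_SU
# Number of stripe units touched by one aligned IO of IO_KB kilobytes,
# assuming 512-byte sectors: ceil(io_bytes / stripe_unit_bytes).
io_pieces() {
    io_bytes=$(( $1 * 1024 ))
    su_bytes=$(( $2 * 512 ))
    echo $(( (io_bytes + su_bytes - 1) / su_bytes ))
}
```

     Under these assumptions, "io_pieces 64 8" prints 16 and "io_pieces 64 128" prints 1, so a
     64K IO touches sixteen 4K stripe units but fits in a single 64K one.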

     Tuning RAID 5 sets is trickier.  In the best case, IO is presented to the RAID set one
     stripe at a time.	Since the entire stripe is available at the beginning of the IO, the par-
     ity of that stripe can be calculated before the stripe is written, and then the stripe data
     and parity can be written in parallel.  When the amount of data being written is less than a
     full stripe worth, the 'small write' problem occurs.  Since a 'small write' means only a
     portion of the stripe on the components is going to change, the data (and parity) on the
     components must be updated slightly differently.  First, the 'old parity' and 'old data'
     must be read from the components.	Then the new parity is constructed, using the new data to
     be written, and the old data and old parity.  Finally, the new data and new parity are writ-
     ten.  All this extra data shuffling results in a serious loss of performance, and is typi-
     cally 2 to 4 times slower than a full stripe write (or read).  To combat this problem in the
     real world, it may be useful to ensure that stripe sizes are small enough that a 'large IO'
     from the system will use exactly one large stripe write.  As is seen later, there are some
     file system dependencies which may come into play here as well.
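
     The parity update for a small write can be illustrated with the usual XOR relation for
     single-parity RAID: new parity = old parity XOR old data XOR new data.  The sketch below
     works on single byte values rather than whole stripe units, and the function names are
     made up for this example; it is meant to show the relation, not how RAIDframe implements
     it.

```shell
#!/bin/sh
# stripe_parity VALUE...
# Parity of a full stripe: the XOR of all its data units (here, byte values).
stripe_parity() {
    p=0
    for d in "$@"; do
        p=$(( p ^ d ))
    done
    echo "$p"
}

# small_write_parity OLD_PARITY OLD_DATA NEW_DATA
# Parity after changing one data unit, computed without reading the others.
small_write_parity() {
    echo $(( $1 ^ $2 ^ $3 ))
}
```

     For the data units 17 34 51 68, stripe_parity prints 68.  Replacing 34 with 90 via
     "small_write_parity 68 34 90" prints 60, the same value that "stripe_parity 17 90 51 68"
     gives, but the small write needed two reads (old data, old parity) before its two writes.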

     Since the size of a 'large IO' is often (currently) only 32K or 64K, on a 5-drive RAID 5 set
     it may be desirable to select a SectPerSU value of 16 blocks (8K) or 32 blocks (16K).  Since
     a stripe on such a set holds 4 data stripe units (plus one parity unit), the maximum data
     per stripe is 64 blocks (32K) or 128 blocks (64K).  Again, empirical measurement will pro-
     vide the best indicators of which values will yield better performance.
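
     The stripe-size arithmetic above can be checked with a few lines of sh.  The function
     names are invented for this sketch, and 512-byte blocks are assumed, consistent with the
     "16 blocks (8K)" figure in the text.

```shell
#!/bin/sh
# raid5_stripe_kb SECT_PER_SU NCOMPONENTS
# Data carried by one full RAID 5 stripe, in KB, assuming 512-byte blocks
# and one stripe unit per stripe holding parity.
raid5_stripe_kb() {
    echo $(( $1 * ($2 - 1) * 512 / 1024 ))
}

# raid5_sect_per_su IO_KB NCOMPONENTS
# SectPerSU value for which an IO of IO_KB exactly fills one stripe.
raid5_sect_per_su() {
    echo $(( $1 * 1024 / 512 / ($2 - 1) ))
}
```

     For the 5-drive set discussed above, "raid5_stripe_kb 16 5" prints 32 and
     "raid5_stripe_kb 32 5" prints 64; going the other way, "raid5_sect_per_su 64 5" prints 32.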

     The parameters used for the file system are also critical to good performance.  For
     newfs(8), for example, increasing the block size to 32K or 64K may improve performance dra-
     matically.  As well, changing the cylinders-per-group parameter from 16 to 32 or higher is
     often not only necessary for larger file systems, but may also have positive performance
     implications.

   Summary
     Despite the length of this man page, configuring a RAID set is a relatively straightforward
     process.  All that needs to be done is the following:

     1.   Use disklabel(8) to create the components (of type RAID).

     2.   Construct a RAID configuration file: e.g., raid0.conf

     3.   Configure the RAID set with:

		raidctl -C raid0.conf raid0

     4.   Initialize the component labels with:

		raidctl -I 123456 raid0

     5.   Initialize other important parts of the set with:

		raidctl -i raid0

     6.   Get the default label for the RAID set:

		disklabel raid0 > /tmp/label

     7.   Edit the label:

		vi /tmp/label

     8.   Put the new label on the RAID set:

		disklabel -R -r raid0 /tmp/label

     9.   Create the file system:

		newfs /dev/rraid0e

     10.  Mount the file system:

		mount /dev/raid0e /mnt

     11.  Use:

		raidctl -c raid0.conf raid0

	  to re-configure the RAID set the next time it is needed, or put raid0.conf into /etc,
	  where it will automatically be started by the /etc/rc.d scripts.
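
     The first-time steps (3 through 10) can be collected into a helper that prints the com-
     mands for review rather than running them.  This is a convenience sketch only: the func-
     tion name raid_setup_plan and its argument order are assumptions, and the label-editing
     step remains interactive.

```shell
#!/bin/sh
# raid_setup_plan CONF DEV SERIAL PART MNT
# Hypothetical helper: prints (does not run) the commands from steps 3-10
# so they can be reviewed before being executed by hand.
raid_setup_plan() {
    conf=$1 dev=$2 serial=$3 part=$4 mnt=$5
    cat <<EOF
raidctl -C $conf $dev
raidctl -I $serial $dev
raidctl -i $dev
disklabel $dev > /tmp/label
vi /tmp/label
disklabel -R -r $dev /tmp/label
newfs /dev/r$dev$part
mount /dev/$dev$part $mnt
EOF
}
```

     Calling "raid_setup_plan raid0.conf raid0 123456 e /mnt" prints the same eight commands
     that appear in the steps above.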

     ccd(4), raid(4), rc(8)

     RAIDframe is a framework for rapid prototyping of RAID structures developed by the folks at
     the Parallel Data Laboratory at Carnegie Mellon University (CMU).	A more complete descrip-
     tion of the internals and functionality of RAIDframe is found in the paper "RAIDframe: A
     Rapid Prototyping Tool for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
     Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the Parallel Data Laboratory
     of Carnegie Mellon University.

     The raidctl command first appeared as a program in CMU's RAIDframe v1.1 distribution.  This
     version of raidctl is a complete re-write, and first appeared in NetBSD 1.4.

     The RAIDframe Copyright is as follows:

     Copyright (c) 1994-1996 Carnegie-Mellon University.
     All rights reserved.

     Permission to use, copy, modify and distribute this software and
     its documentation is hereby granted, provided that both the copyright
     notice and this permission notice appear in all copies of the
     software, derivative works or modified versions, and any portions
     thereof, and that both notices appear in supporting documentation.


     Carnegie Mellon requests users of this software to return to

      Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
      School of Computer Science
      Carnegie Mellon University
      Pittsburgh PA 15213-3890

     any improvements or extensions that they make and grant Carnegie the
     rights to redistribute these changes.

     Certain RAID levels (1, 4, 5, 6, and others) can protect against some data loss due to com-
     ponent failure.  However, the loss of two components of a RAID 4 or 5 system, or the loss of
     a single component of a RAID 0 system, will result in the entire file system being lost.
     RAID is NOT a substitute for good backup practices.

     Recomputation of parity MUST be performed whenever there is a chance that it may have been
     compromised.  This includes after system crashes, or before a RAID device has been used for
     the first time.  Failure to keep parity correct will be catastrophic should a component ever
     fail -- it is better to use RAID 0 and get the additional space and speed, than it is to use
     parity, but not keep the parity correct.  At least with RAID 0 there is no perception of
     increased data security.

     When replacing a failed component of a RAID set, it is a good idea to zero out the first 64
     blocks of the new component to ensure the RAIDframe driver doesn't erroneously detect a com-
     ponent label in the new component.  This is particularly true on RAID 1 sets because there
     is at most one correct component label in a failed RAID 1 installation, and the RAIDframe
     driver picks the component label with the highest serial number and modification value as
     the authoritative source for the failed RAID set when choosing which component label to use
     to configure the RAID set.
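
     Zeroing the start of a replacement component might be done with dd(1), as in the sketch
     below.  The function name zero_label_area is an assumption, as is the 512-byte block size
     (chosen to match the block sizes used elsewhere in this page).  The command is destruc-
     tive, so the target should be checked carefully before it is run against a real device.

```shell
#!/bin/sh
# zero_label_area TARGET
# Overwrites the first 64 blocks of TARGET with zeros, assuming 512-byte
# blocks.  conv=notrunc leaves the rest of a regular file in place.
# DESTRUCTIVE: double-check TARGET before running this on a real device.
zero_label_area() {
    dd if=/dev/zero of="$1" bs=512 count=64 conv=notrunc 2>/dev/null
}
```

     The argument would be the partition of the new component, given here only as a placeholder:
     "zero_label_area NEW_COMPONENT".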

     Hot-spare removal is currently not available.

BSD					 January 27, 2010				      BSD
