vxsparecheck(1M)														  vxsparecheck(1M)

NAME

       vxsparecheck - monitor Veritas Volume Manager for failure events and replace failed disks

SYNOPSIS

       /etc/vx/bin/vxsparecheck [mail-address...]

DESCRIPTION

       The vxsparecheck command monitors Veritas Volume Manager (VxVM) by analyzing the output of the vxnotify command, waiting for failures to
       occur. When a failure occurs, it sends mail via mailx to the logins specified on the command line (or, by default, to root) and then
       attempts to replace any failed disks. After each replacement attempt is complete, further mail is sent reporting the status of that disk
       replacement.

       The mail notification that is sent when a failure is detected follows this format:

	      Failures have been detected by the Veritas Volume Manager:

	      failed disks:
	      medianame
		...
	      failed plexes:
	      plexname
		...
	      failed subdisks:
	      subdiskname
		...
	      failed volumes:
	      volumename
		...

	      The Volume Manager will attempt to find hot-spare disks to replace any
	      failed disks and attempt to reconstruct any data in volumes that have
	      storage on the failed disk.

       The medianame list specifies disks that appear to have completely failed. The plexname list shows plexes of mirrored volumes that have been
       detached because of I/O failures on subdisks they contain. The subdiskname list specifies subdisks in RAID-5 volumes that have been
       detached because of I/O errors. The volumename list shows non-RAID-5 volumes that have become unusable because disks in all of their
       plexes have failed (and are listed in the ``failed disks'' list), as well as RAID-5 volumes that have become unusable because of
       multiple failures.

       If any volumes appear to have failed, the following paragraph will be included in the mail:

	      The data in the failed volumes listed above is no longer
	      available. It will need to be restored from backup.
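       The notification layout above can be reproduced with a small helper. This is a minimal sketch, not the code vxsparecheck uses: the
       function name failure_notice is hypothetical, and it assumes that a category with no entries is omitted from the mail rather than
       printed with an empty list. The backup paragraph is appended only when at least one volume has failed, as described above.

```python
from __future__ import annotations

from typing import Sequence


def failure_notice(disks: Sequence[str], plexes: Sequence[str],
                   subdisks: Sequence[str], volumes: Sequence[str]) -> str:
    """Build a failure-notification body laid out like the sample above.

    Hypothetical helper; assumes empty categories are simply omitted.
    """
    lines = ["Failures have been detected by the Veritas Volume Manager:", ""]
    sections = [
        ("failed disks:", disks),
        ("failed plexes:", plexes),
        ("failed subdisks:", subdisks),
        ("failed volumes:", volumes),
    ]
    for header, names in sections:
        if names:  # assumption: skip categories with no entries
            lines.append(header)
            lines.extend(names)
    lines += [
        "",
        "The Volume Manager will attempt to find hot-spare disks to replace any",
        "failed disks and attempt to reconstruct any data in volumes that have",
        "storage on the failed disk.",
    ]
    if volumes:  # extra paragraph only when some volume has failed
        lines += [
            "",
            "The data in the failed volumes listed above is no longer",
            "available. It will need to be restored from backup.",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(failure_notice(["disk01"], ["vol01-01"], [], []))
```

       A real implementation would pipe this text to mailx; the sketch only builds the string so that the layout rules are easy to check.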

Replacement Procedure
       After mail has been sent, vxsparecheck looks for a hot-spare replacement for each disk that appears to have failed (that is, each disk in
       the medianame list). The replacement is chosen from the eligible hot spares in the same disk group as the failed disk. A disk is eligible
       as a replacement if it is a valid Veritas Volume Manager disk (VM disk), has been marked as a hot-spare disk, and contains enough space to
       hold the data in all the subdisks on the failed disk.
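       The eligibility rules can be expressed as a single predicate. This is a minimal sketch under stated assumptions: VMDisk is a hypothetical,
       simplified record (not a VxVM data structure), sizes are assumed to be in sectors, and "enough space" is modeled as free space at least
       equal to the sum of the failed disk's subdisk sizes.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class VMDisk:
    """Simplified, hypothetical view of a VM disk; sizes are in sectors (assumed)."""
    name: str
    group: str
    valid_vm_disk: bool = True
    hot_spare: bool = False
    free_space: int = 0
    subdisk_sizes: list[int] = field(default_factory=list)


def is_eligible(spare: VMDisk, failed: VMDisk) -> bool:
    """Apply the eligibility rules, restricted to the failed disk's group."""
    needed = sum(failed.subdisk_sizes)  # data held by all subdisks on the failed disk
    return (
        spare.name != failed.name
        and spare.group == failed.group
        and spare.valid_vm_disk
        and spare.hot_spare
        and spare.free_space >= needed
    )
```

       For example, with disks holding the VMDisk records of a disk group, [d for d in disks if is_eligible(d, failed)] yields the candidates
       among which a replacement is then selected.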

       To determine which of the eligible hot spares to use, vxsparecheck first checks the file /etc/vx/sparelist (see Sparelist File below). If
       this file does not exist or lists no eligible hot spares for the failed disk, the disk that is ``closest'' to the failed disk is chosen.
       Closeness is determined by the controller, target, and disk number of the failed disk. A disk on the same controller as the failed disk
       is closer than a disk on a different controller, and a disk under the same target as the failed disk is closer than one under a
       different target.
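       The closeness rule can be modeled as a sort key. This is a sketch, not the actual vxsparecheck algorithm: it assumes legacy HP-UX device
       names of the form cXtYdZ (for example /dev/dsk/c2t5d0), assumes the candidates are already eligible spares, and uses the distance
       between disk numbers as a tie-breaker, which the text above does not specify. The helper names are hypothetical.

```python
from __future__ import annotations

import re

# Assumption: legacy HP-UX device names such as /dev/dsk/c2t5d0.
_DSF = re.compile(r"c(\d+)t(\d+)d(\d+)$")


def parse_dsf(dev: str) -> tuple[int, int, int]:
    """Return (controller, target, disk number) from a cXtYdZ name."""
    m = _DSF.search(dev)
    if m is None:
        raise ValueError(f"not a cXtYdZ device name: {dev!r}")
    c, t, d = (int(g) for g in m.groups())
    return c, t, d


def closeness_key(candidate: str, failed: str) -> tuple:
    c, t, d = parse_dsf(candidate)
    fc, ft, fd = parse_dsf(failed)
    return (
        c != fc,             # same controller sorts first (False < True)
        (c, t) != (fc, ft),  # then same target on that controller
        abs(d - fd),         # assumption: nearer disk number breaks ties
        candidate,           # deterministic final tie-breaker
    )


def closest_spare(candidates: list[str], failed: str) -> str | None:
    """Pick the eligible spare 'closest' to the failed device, or None."""
    return min(candidates, key=lambda dev: closeness_key(dev, failed), default=None)
```

       With a failed disk at c2t5d0, a spare at c2t5d3 is preferred over c2t6d0 (different target), which in turn is preferred over c4t5d0
       (different controller).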

       If no hot spare disk can be found, the following mail is sent:

	      No hot spare could be found for disk medianame in
	      diskgroup. No replacement has been made and the disk is still
	      unusable.

       The mail then explains the disposition of volumes that had storage on the failed disk. The following message lists volumes that had
       storage on the failed disk but are still usable:

	      The following volumes have storage on medianame:

	      volumename

	      These volumes are still usable, but the redundancy of
	      those volumes is reduced. Any RAID-5 volumes with storage
	      on the failed disk may become unusable in the face of further
	      failures.

       If any non-RAID-5 volumes were made unusable due to the failure of the disk, the following message is included:

	      The following volumes:

	      volumename

	      have data on medianame but have no other usable
	      mirrors on other disks. These volumes are now unusable
	      and the data on them is unavailable.

       If any RAID-5 volumes were made unusable by the disk failure, the following message is included:

	      The following RAID-5 volumes:

	      volumename

	      had storage on medianame and have experienced
	      other failures. These RAID-5 volumes are now unusable
	      and data on them is unavailable.
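       The three messages above sort the volumes on the failed disk into three groups. The sketch below shows that classification under a
       deliberately simplified model, which is an assumption rather than how VxVM tracks volume state: AffectedVolume is a hypothetical record,
       a non-RAID-5 volume stays usable only if it has another usable mirror, and a RAID-5 volume stays usable unless it has experienced other
       failures.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AffectedVolume:
    """Hypothetical summary of a volume with storage on the failed disk."""
    name: str
    raid5: bool
    other_usable_mirrors: int = 0  # non-RAID-5: usable plexes not on the failed disk
    other_failures: bool = False   # RAID-5: failures elsewhere in the volume


def classify(volumes: list[AffectedVolume]) -> tuple[list[str], list[str], list[str]]:
    """Split volumes into the three groups reported when no spare is found."""
    reduced, lost_mirrored, lost_raid5 = [], [], []
    for v in volumes:
        if v.raid5:
            # A RAID-5 volume tolerates one failed disk unless other failures exist.
            (lost_raid5 if v.other_failures else reduced).append(v.name)
        else:
            # A non-RAID-5 volume survives only if another mirror is still usable.
            (lost_mirrored if v.other_usable_mirrors == 0 else reduced).append(v.name)
    return reduced, lost_mirrored, lost_raid5
```

       The first list corresponds to the "still usable, but the redundancy ... is reduced" message, the second to the non-RAID-5 "now unusable"
       message, and the third to the RAID-5 "now unusable" message.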

       If a hot-spare disk was found, a hot-spare replacement is attempted. This involves associating the device marked as a hot spare with the
       media record that was associated with the failed disk. If this succeeds, the vxrecover(1M) command is run in the background to recover
       the data in any volumes that had storage on the disk.

       If the hot-spare replacement fails, the following message is sent:

	      Replacement of disk medianame in group diskgroup
	      failed. The error is:

	      error message

       If any volumes (RAID-5 or otherwise) are rendered unusable due to the failure, the following message is included:

	      The following volumes:

	      volumename

	      occupy space on the failed disk and have no other available
	      mirrors or have experienced other failures. These volumes are
	      unusable, and the data they contain is unavailable.

       If the hot-spare replacement procedure completed successfully and recovery is under way, a final mail message is sent:

	      Replacement of disk medianame in group diskgroup
	      with disk device sparedevice has successfully completed
	      and recovery is under way.

       If any non-RAID-5 volumes were rendered unusable by the failure despite the successful hot-spare procedure, the following message is
       included in the mail:

	      The following volumes:

	      volumename

	      occupy space on the replaced disk, but have no other enabled
	      mirrors on other disks from which to perform recovery. These
	      volumes must have their data restored.

       If any RAID-5 volumes were rendered unusable by the failure despite the successful hot-spare procedure, the following message  is  included
       in the mail:

	      The following RAID-5 volumes:

	      volumename

	      have subdisks on the replaced disk and have experienced
	      other failures that prevent recovery. These RAID-5 volumes
	      must have their data restored.

       If any volumes (RAID-5 or otherwise) were rendered unusable, the following message is also included:

	      To restore the contents of any volumes listed above, the
	      volume should be started with the command:

		   vxvol -f start volumename

	      and the data restored from backup.

Sparelist File
       The sparelist file is a text file that specifies an ordered list of disks to be used as hot spares when a specific disk fails. The
       system-wide sparelist file is located in /etc/vx/sparelist. Each line in the sparelist file specifies a list of spares for one disk.
       Lines beginning with the pound (#) character and empty lines are ignored. The format for a line in the sparelist file is:

	    [ diskgroup:] diskname : spare1 [ spare2 ... ]

       The diskgroup field, if present, specifies the disk group within which the disk and designated spares reside. If this field is not
       specified, the default disk group is determined using the rules given in the vxdg(1M) manual page. The diskname field specifies the disk
       for which spares are being designated. The spare list after the colon names the disks to be used as hot spares. The list is
       order-dependent: if diskname fails, the spares are tried in order. A spare is used only if it is a valid hot spare (see Replacement
       Procedure above). If the list is exhausted without finding a usable spare, the default policy of choosing the closest disk is applied.
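       The sparelist format and the ordered lookup can be sketched as follows. This is an illustration, not the parser vxsparecheck uses:
       parse_sparelist and pick_spare are hypothetical names, default_group stands in for the vxdg(1M) default-group rules, and the sketch
       assumes that a later line for the same disk replaces an earlier one. Malformed lines are skipped, matching the NOTES section.

```python
from __future__ import annotations

from typing import Callable


def parse_sparelist(text: str, default_group: str) -> dict[tuple[str, str], list[str]]:
    """Map (diskgroup, diskname) to its ordered spare list.

    default_group stands in for the vxdg(1M) default-group rules (assumption).
    Comments, blank lines, and malformed lines are skipped.
    """
    entries: dict[tuple[str, str], list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = [p.strip() for p in line.split(":")]
        if len(parts) == 2:    # diskname : spares...
            group, disk, spares = default_group, parts[0], parts[1].split()
        elif len(parts) == 3:  # diskgroup : diskname : spares...
            group, disk, spares = parts[0], parts[1], parts[2].split()
        else:
            continue  # malformed: wrong number of colons
        if len(group.split()) != 1 or len(disk.split()) != 1 or not spares:
            continue  # malformed: empty or multi-word names, or no spares
        entries[(group, disk)] = spares  # assumption: a later line overrides an earlier one
    return entries


def pick_spare(entries: dict[tuple[str, str], list[str]], group: str, disk: str,
               is_valid_spare: Callable[[str], bool]) -> str | None:
    """Try listed spares in order; None means fall back to the closest-disk policy."""
    for spare in entries.get((group, disk), []):
        if is_valid_spare(spare):
            return spare
    return None
```

       Here is_valid_spare is a caller-supplied check standing in for the eligibility rules in Replacement Procedure; a None result means the
       closest-disk policy should be applied instead.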

FILES

       /etc/vx/sparelist	     Specifies a list of disks to serve as hot spares for a disk.

NOTES

       The sparelist file is not checked in any way for correctness until a disk failure occurs. It is possible to inadvertently specify a
       nonexistent or inappropriate disk or disk group. Malformed lines are also ignored.

SEE ALSO

       mailx(1), vxintro(1M), vxnotify(1M), vxrecover(1M), vxrelocd(1M), vxunreloc(1M)

VxVM 5.0.31.1							    24 Mar 2008 						  vxsparecheck(1M)