sam_overview(8) [debian man page]

SAM_OVERVIEW(8) 				    Corosync Cluster Engine Programmer's Manual 				   SAM_OVERVIEW(8)

NAME
sam_overview - Overview of the Simple Availability Manager

OVERVIEW
The SAM library provides a tool to check the health of an application. The main purpose of SAM is to restart a local process when it fails to respond to a healthcheck request within a configured time interval.

During sam_initialize(3), a duplicate copy of the process is created using the fork(2) system call. This duplicate process contains the logic for executing the SAM server. The SAM server is responsible for requesting healthchecks from the active process and for controlling the lifecycle of the active process when it fails.

If the active process fails to respond to the healthcheck request sent by the SAM server, it is sent a user-configurable signal (default SIGTERM) to request shutdown of the application. After a configured time interval, the process is forcibly killed with a SIGKILL signal. Once the active process terminates, the SAM server creates a new active process.

The Simple Availability Manager is meant to be used in conjunction with the cpg service. Used together, they make it possible to restart a cpg process that fails healthchecking during operation.

The main features of SAM include:

o A configurable recovery policy.
o A configurable time interval for health check operations.
o A notification via signal before recovery action is taken.
o A mechanism to indicate to the application the number of times an active process has been created by the SAM server.
o Both application driven health checking and event driven health checking.

Initializing SAM
The SAM library is initialized by sam_initialize(3). sam_initialize(3) may only be called once per process. Calling it more than once has undefined results and is neither recommended nor tested.

Setting warning callback
A user-configurable signal (default SIGTERM) is sent to the application when a recovery action is planned. The application can use the signal(2) system call to monitor for this signal. There are no special constraints on which SAM APIs may be called in a warning callback.
After time_interval expires, a SIGKILL signal is sent to the active process to force its termination.

Registering the active process
The active process is registered with SAM by calling sam_register(3). This function should only be called once per process. After a recovery action is taken, the new active process begins execution at the next line of code in the user process after sam_register(3).

Enabling event driven healthchecking
Two types of healthchecking are available to the user. The first model is one where the user application healthchecks during its normal operation. It is never requested to healthcheck, and if the active process doesn't respond within the time interval, the process will be restarted.

A more useful mechanism for healthchecking is event driven healthchecking. Because this model is directed by the SAM server, it isn't necessary to guess or add timers to the active process to signal that a healthcheck operation is successful. To use event driven healthchecking, the sam_hc_callback_register(3) function should be executed.

BUGS
SEE ALSO
sam_initialize(3), sam_finalize(3), sam_start(3), sam_stop(3), sam_register(3), sam_warn_signal_set(3), sam_hc_send(3), sam_hc_callback_register(3)

corosync Man Page 12/01/2009 SAM_OVERVIEW(8)

Check Out this Related Man Page

SG_RESET(8)							     SG3_UTILS							       SG_RESET(8)

NAME
sg_reset - sends SCSI device, target, bus or host reset; or checks reset state

SYNOPSIS
sg_reset [-b] [-d] [-h] [-t] [-V] DEVICE

DESCRIPTION
The sg_reset utility with no options (just a DEVICE) reports on the reset state (e.g. if a reset is underway) of DEVICE. When given a -d, -t, -b or -h option it requests a device, target, bus or host reset respectively.

The ability to reset a SCSI target (often called a "hard reset" at the transport level) was added in Linux kernel 2.6.27. Low level drivers that support target reset hopefully reset a logical unit only when given the device reset (i.e. -d) option. This should remove the ambiguity of whether "device" meant LU or target that we have had in the past.

In the Linux kernel 2.6 series this utility can be called on sd, sr (cd/dvd), st or sg device nodes, if the user has appropriate permissions. In the Linux kernel 2.4 series support for this utility first appeared in lk 2.4.19 and could only be called on sg device nodes. Various vendors made this capability available in their kernels prior to lk 2.4.19.

OPTIONS
-b     attempt a SCSI bus reset. This would normally be tried if the device reset (i.e. option -d) was not successful.
-d     attempt a SCSI device reset. If the device seems stuck, this is the first reset that should be tried. This assumes the Linux SCSI mid level error handler is not already in the process of resetting DEVICE.
-h     attempt a host adapter reset. This would normally be tried if both device reset (i.e. option -d) and bus reset (i.e. option -b) were not successful.
-t     attempt a SCSI target reset. This assumes the Linux SCSI mid level error handler is not already in the process of resetting the target that contains the given DEVICE.
-V     prints the version string then exits.

NOTES
The error recovery code within the Linux kernel, when faced with a SCSI command timing out and no response from the device (LU), first tries a device reset and if that is not successful tries a target reset. If that is not successful it tries a bus reset. If that is not successful it tries a host reset. Users of this utility should check whether such a recovery is already underway before trying to reset with this utility. The "device,target,bus,host" order is also recommended (i.e. first start with the smallest hammer). The above is a generalization and exact details will vary depending on the transport and the low level driver concerned.

SAM-4 defines a hard reset, a logical unit reset and an I_T nexus reset. A hard reset is defined to be a power on condition, a microcode change or a transport reset event. A LU reset and an I_T nexus reset can be requested via task management function (and support for LU reset is mandatory).

In Linux the SCSI subsystem leaves it up to the low level drivers as to whether a "device reset" is only for the addressed LU or all the LUs in the device that contains the addressed LU (i.e. a target reset). The addition of the target reset (i.e. option -t) should give more control in this area. The "bus reset" is a transport reset and may be a dummy operation, depending on the transport. A "host reset" attempts to re-initialize the HBA that the request passes through en route to the DEVICE. Note that a "host reset" and a "bus reset" may cause collateral damage.

This utility does not allow individual SCSI commands (or tasks as they are called in SAM-4) to be aborted. SAM-4 defines ABORT TASK and ABORT TASK SET task management functions for that.

Prior to SAM-3 there was a TARGET RESET task management function. Several transports still support that function and many associated Linux low level drivers map the -t option to it.

AUTHORS
Written by Douglas Gilbert.

COPYRIGHT
Copyright (C) 1999-2009 Douglas Gilbert
This software is distributed under the GPL version 2. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

sg3_utils-1.28 July 2009 SG_RESET(8)
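The "device,target,bus,host" order recommended in NOTES can be sketched as a small wrapper around the options listed in OPTIONS. This is an illustration only: reset_commands and escalate are hypothetical helpers, /dev/sg1 in the usage note is a placeholder device, and treating exit status 0 as success is an assumption that this page does not state.

```python
import subprocess

# Reset levels in the order NOTES recommends (smallest hammer first),
# paired with the sg_reset option that requests each one.
RESET_ORDER = [("device", "-d"), ("target", "-t"), ("bus", "-b"), ("host", "-h")]


def reset_commands(device):
    """Return the sg_reset command lines to try, in escalation order."""
    return [["sg_reset", flag, device] for _level, flag in RESET_ORDER]


def escalate(device, run=subprocess.run):
    """Try each reset level in turn and stop at the first that succeeds.

    Returns the name of the level that succeeded, or None if all failed.
    Assumption: a zero exit status means the reset was accepted.
    This sketch does not check whether kernel error recovery is already
    underway, which NOTES says should be done first.
    """
    for (level, _flag), cmd in zip(RESET_ORDER, reset_commands(device)):
        if run(cmd).returncode == 0:
            return level
    return None
```

Calling reset_commands("/dev/sg1") only builds the command lists and touches no hardware; escalate("/dev/sg1") would actually invoke sg_reset and, as NOTES warns, bus and host resets may cause collateral damage.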