lamshrink(1) [redhat man page]

LAMSHRINK(1)							   LAM COMMANDS 						      LAMSHRINK(1)

NAME
       lamshrink - Shrink a LAM multicomputer.

SYNTAX
       lamshrink [-hv] [-w <delay>] [-u <userid>] <node> <hostname>

OPTIONS
       -h             Print useful information on this command.

       -v             Be verbose.

       <hostname>     Remove the LAM node on this host.

       <node>         Remove the LAM node with this ID.

       -w <delay>     Notify processes on the doomed node and pause for
                      <delay> seconds before proceeding.

       -u <userid>    Use this userid to access the host.

DESCRIPTION
       An existing LAM session, initiated by lamboot(1), can be shrunk to
       include fewer nodes with lamshrink.  One node is removed for each
       invocation.  At a minimum, the node ID and the associated host name
       are given on the command line.  Once lamshrink completes, the node ID
       is invalid across the remaining nodes.  If a different userid is
       required to access the host, it is specified with the -u option.

       Existing application processes on the target node can be warned of
       impending shutdown with the -w option.  A LAM signal (SIGFUSE) will
       be sent to these processes and lamshrink will then pause for the
       given number of seconds before proceeding with removing the node.
       By default, SIGFUSE is ignored.  A different handler can be installed
       with ksignal(2).

       All application processes on all remaining nodes are always informed
       of the death of a node.  This is also done with a signal (SIGSHRINK),
       which by default causes a process's runtime route cache to be flushed
       (to remove any cached information on the dead node).  If this signal
       is re-vectored for the purpose of fault tolerance, the old handler
       should be called at the beginning of the new handler.  The signal
       does not, by itself, give the process information on which node has
       been removed.  One technique for getting this information is to query
       the router for information on all relevant nodes using getroute(2).
       The dead node will cause this routine to return an error.

FAULT TOLERANCE
       If enabled with lamboot(1), LAM will watch for nodes that fail.  The
       procedure for removing a node that has failed is the same as the one
       lamshrink follows after the warning step.  In particular, the
       SIGSHRINK signal is delivered.

EXAMPLES
       lamshrink -v newhost n1
           Remove LAM on newhost, known within LAM as node 1.  Report about
           important steps as they are done.

       lamshrink newhost n30 -w 10
           Inform all processes on LAM node 30, which is running on newhost,
           that the node will be dead in 10 seconds.  Wait 10 seconds and
           remove the node.  Operate silently.

SEE ALSO
       lamboot(1), tkill(1), ksignal(2), getroute(2)

LAM 6.5.8                       November, 2002                    LAMSHRINK(1)
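
Because lamshrink removes exactly one node per invocation, shrinking a session by several nodes takes several calls, one after another. The following is a minimal sketch of such a loop, not part of LAM itself. It assumes the argument order shown in EXAMPLES (host name, then node ID). The function name shrink_nodes, the host:node pair notation, and the LAMSHRINK variable are hypothetical names introduced here so that the loop can be dry-run with echo.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with LAM): remove several LAM nodes,
# issuing one lamshrink call per node.
# LAMSHRINK defaults to the real command; set LAMSHRINK="echo lamshrink"
# to print the commands instead of running them (dry run).
LAMSHRINK=${LAMSHRINK:-lamshrink}

# usage: shrink_nodes <delay> <hostname>:<node> [<hostname>:<node> ...]
shrink_nodes() {
    delay=$1
    shift
    for pair in "$@"; do
        host=${pair%%:*}    # text before the first ':'
        node=${pair#*:}     # text after the first ':'
        # -w warns processes on the node (SIGFUSE), then waits <delay> seconds.
        # Stop at the first failure so later nodes are left untouched.
        $LAMSHRINK -w "$delay" "$host" "$node" || return 1
    done
}
```

With LAMSHRINK set to "echo lamshrink", calling shrink_nodes 10 hostA:n3 hostB:n4 prints the two commands that would be run, which is a cheap way to check the node list before touching a live session.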

LAMGROW(1)							   LAM COMMANDS 							LAMGROW(1)

NAME
       lamgrow - Extend a LAM multicomputer.

SYNOPSIS
       lamgrow [-hvd] [-cpu num] [-n nodeid] [-no-schedule] [-ssi key value]
       hostname

OPTIONS
       -cpu num       Indicate how many CPUs are available to LAM on the new
                      node.

       -d             Turn on debugging output.  This implies -v.

       -h             Print useful information on this command.

       -n nodeid      Assign this ID to the new node.

       -no-schedule   Indicate that C and N expansion in mpirun and lamexec
                      should not schedule on this node.

       -ssi key value Send arguments to various SSI modules.  See the "SSI"
                      section, below.

       -v             Be verbose.

       hostname       Extend LAM with this host.

DESCRIPTION
       An existing LAM universe, initiated by lamboot(1), can be enlarged to
       include more nodes with lamgrow.  One new node is added for each
       invocation.  At a minimum, the host name that will run the new node
       is given on the command line.  If a different userid is required to
       access the host, it is specified with the appropriate boot SSI
       options (see lamssi_boot(7)).

       The new node can be assigned any unused, non-negative identifier.  If
       no identifier is specified, the highest node identifier in the
       current LAM universe plus one is used.  Note that lamboot(1) always
       assigns node identifiers consecutively from 0.

       lamgrow can be run from any node in the current LAM universe.  In
       particular, it cannot be run from the intended new host.  Two
       invocations of lamgrow should not run concurrently, and the command
       attempts to detect this situation.  The host named in lamgrow should
       not be one that is already present in the user's LAM universe; the
       command attempts to detect this situation too.

       Resource managers will be the most common users of lamgrow.  When
       hosts become idle and a user has expressed a desire to the manager
       that extra cycles should be exploited, the manager could invoke
       lamgrow and then launch the specified application process(es) on the
       new node.

EXAMPLES
       lamgrow -v newhost
           Start LAM on newhost and add it to the existing LAM universe.
           Choose the next available node identifier and report about
           important steps as they are done.

       lamgrow -n 30 newhost
           Start LAM on newhost with node ID 30 and add it to the existing
           LAM universe.  Operate silently.

FILES
       laminstalldir/etc/lam-conf.lamd
           default configuration file for LAM nodes, where "laminstalldir"
           is the directory where LAM/MPI was installed.

BUGS
       It is not currently possible to specify a configuration file other
       than lam-conf.lamd on the remote node, even though this is possible
       with lamboot.

SEE ALSO
       lamboot(1), lamhalt(1), hboot(1), lamwipe(1), tkill(1), bhost(5),
       conf(5), lamssi_boot(7)

LAM 7.1.4                         July, 2007                        LAMGROW(1)
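
Two rules from the lamgrow DESCRIPTION lend themselves to a short illustration: the default node identifier is the highest existing identifier plus one, and invocations should not overlap. The sketch below is an assumption-laden illustration, not part of LAM. The function names next_node_id and grow_hosts and the LAMGROW variable are hypothetical, and the caller is assumed to supply the existing node identifiers as plain integers.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with LAM) illustrating the lamgrow rules.
# LAMGROW defaults to the real command; set LAMGROW="echo lamgrow"
# to print the commands instead of running them (dry run).
LAMGROW=${LAMGROW:-lamgrow}

# Print the ID that would be chosen by default: highest existing ID plus one.
# usage: next_node_id <id> [<id> ...]   (IDs are non-negative integers)
next_node_id() {
    max=-1
    for id in "$@"; do
        if [ "$id" -gt "$max" ]; then
            max=$id
        fi
    done
    echo $((max + 1))
}

# Add hosts one at a time. The loop is deliberately sequential, because two
# lamgrow invocations should not run concurrently.
# usage: grow_hosts <hostname> [<hostname> ...]
grow_hosts() {
    for host in "$@"; do
        $LAMGROW "$host" || return 1
    done
}
```

For a universe with node identifiers 0, 1 and 30, next_node_id 0 1 30 prints 31, which matches the "highest plus one" rule rather than filling the gap at 2. A resource manager that wants a specific identifier would pass -n to lamgrow instead.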