opal-restart(1) [debian man page]

OPAL-RESTART(1) 						     Open MPI							   OPAL-RESTART(1)

NAME
opal-restart - Restart a previously checkpointed sequential process using the Open PAL Checkpoint/Restart Service (CRS)

Note: Use this command only if the application being restarted is an OPAL-only application. For an Open RTE or Open MPI program, use the respective restart tool instead.

SYNOPSIS
opal-restart [ options ] <SNAPSHOT HANDLE>

Options

opal-restart will attempt to restart a previously checkpointed sequential process from the snapshot handle reference returned by opal-checkpoint(1).

<SNAPSHOT HANDLE>
    The snapshot handle reference returned by opal-checkpoint(1), used to restart the process. This must be the last argument to this command.

-h | --help
    Display help for this command.

--fork
    Fork off a new process, which becomes the restarted process. By default, the restarted process replaces the opal-restart process.

-w | --where
    The location of the local snapshot reference.

-s | --self
    Restart this process using the self CRS component. This component is a special case; all other CRS components are detected automatically.

-v | --verbose
    Enable verbose output for debugging.

-gmca | --gmca <key> <value>
    Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.

-mca | --mca <key> <value>
    Send arguments to various MCA modules.

DESCRIPTION
opal-restart can be invoked multiple, non-overlapping times. This allows the user to restart a previously running sequential process. See opal_crs(7) for more information about the CRS framework and components.

When using the self CRS component, the <SNAPSHOT HANDLE> argument is replaced by the name of the program to be restarted, followed by any arguments that need to be passed to the program. For example, if under normal execution we would start our program "foo" as:

    shell$ setenv OMPI_MCA_crs self
    shell$ setenv OMPI_MCA_crs_self_prefix my_callback_prefix
    shell$ ./foo arg1 arg2

To restart this process, we may only need to call:

    shell$ opal-restart --self -mca crs_self_prefix my_callback_prefix ./foo arg1 arg2

This will cause the "my_callback_prefix-restart" function to be called as soon as the program "foo" calls OPAL_INIT.

You do not have to call your program with the same argument set as before. Therefore, we could just as correctly have called:

    shell$ opal-restart --self -mca crs_self_prefix my_callback_prefix ./foo arg3

Whether this works depends on the behavior of the program "foo".

SEE ALSO
opal-checkpoint(1), opal_crs(7)

1.4.5 Feb 10, 2012 OPAL-RESTART(1)
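
The self CRS example in the DESCRIPTION above uses csh syntax. For Bourne-style shells, a minimal sketch of the same setup might look like the following. The restart_cmd helper is hypothetical and not part of Open MPI; it only prints the opal-restart command line rather than running it, so the sketch can be tried without a checkpointed process. The names my_callback_prefix, ./foo, and its arguments are the placeholders from the example above.

```shell
#!/bin/sh
# Bourne-shell equivalent of the csh "setenv" lines in the example above.
export OMPI_MCA_crs=self
export OMPI_MCA_crs_self_prefix=my_callback_prefix

# Hypothetical helper (not part of Open MPI): print, but do not run,
# the opal-restart command line for a program checkpointed with the
# self CRS component. The program name and its arguments come last.
restart_cmd() {
    printf 'opal-restart --self -mca crs_self_prefix %s' "$OMPI_MCA_crs_self_prefix"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Dry run: show the command that would restart "foo".
restart_cmd ./foo arg1 arg2
```

Once the printed command line looks right, it can be run directly. As noted above, the argument list passed on restart need not match the original one.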

ORTE_SNAPC(7)							     Open MPI							     ORTE_SNAPC(7)

NAME
ORTE_SNAPC - Open RTE MCA Snapshot Coordination (SnapC) Framework: Overview of Open RTE's SnapC framework, and selected modules. Open MPI 1.4.5

DESCRIPTION
Open RTE can coordinate the generation of a global snapshot of a parallel job from many distributed local snapshots. The components in this framework determine how to initiate the checkpoint of the parallel application, gather the many distributed local snapshots, and provide the user with a global snapshot handle reference that can be used to restart the parallel application.

GENERAL PROCESS REQUIREMENTS
For a process to use the Open RTE SnapC components, it must adhere to a few programmatic requirements.

First, the program must call ORTE_INIT early in its execution. This function should be called only once, and the process cannot be checkpointed until it has been called. The program must also call ORTE_FINALIZE before termination.

A user may checkpoint and restart a parallel application by using the orte-checkpoint(1) and orte-restart(1) commands, respectively.

AVAILABLE COMPONENTS
Open RTE ships with one SnapC component: full.

The following MCA parameters apply to all components:

snapc_base_verbose
    Set the verbosity level for all components. Default is 0, or silent except on error.

snapc_base_global_snapshot_dir
    The directory in which to store the checkpoint snapshots. Default is /tmp.

full SnapC Component

The full component gathers the local snapshots onto the disk local to the Head Node Process (HNP) before completing the checkpoint of the process. This component does not currently support replicated HNPs or timer-based gathering of local snapshot data. It uses a 3-tiered hierarchy of coordinators.

The full component has the following MCA parameters:

snapc_full_priority
    The component's priority to use when selecting the most appropriate component for a run.

snapc_full_verbose
    Set the verbosity level for this component. Default is 0, or silent except on error.

none SnapC Component

The none component simply selects no SnapC component. All of the SnapC function calls return immediately with ORTE_SUCCESS.

This component is the last component to be selected by default. This means that if another component is available, and the none component was not explicitly requested, then ORTE will attempt to activate all of the available components before falling back to this component.

SEE ALSO
orte-checkpoint(1), orte-restart(1), opal-checkpoint(1), opal-restart(1), orte_filem(7), opal_crs(7)

1.4.5 Feb 10, 2012 ORTE_SNAPC(7)
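
The SnapC MCA parameters listed under AVAILABLE COMPONENTS can presumably be set through the environment, following the OMPI_MCA_<name> naming that opal-restart(1) shows for the crs parameters. The sketch below assumes that naming applies to the snapc parameters as well. The mca_export helper is hypothetical and not part of Open MPI, and the verbosity level and directory path are illustrative values only.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): export an MCA parameter
# as an environment variable named OMPI_MCA_<parameter>.
mca_export() {
    # $1 = MCA parameter name, $2 = value
    export "OMPI_MCA_$1=$2"
}

# Illustrative values; the verbosity level and path are assumptions.
mca_export snapc_base_verbose 10
mca_export snapc_base_global_snapshot_dir /var/tmp/checkpoints

# Show which SnapC parameters are now set in the environment.
env | grep '^OMPI_MCA_snapc_' | sort
```

Processes started from this shell inherit these variables, so the settings would apply to a job launched afterwards.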