OMPI-CHECKPOINT(1) Open MPI OMPI-CHECKPOINT(1)
NAME
ompi-checkpoint, orte-checkpoint - Checkpoint a running parallel process using the Open MPI Checkpoint/Restart Service (CRS)
NOTE: ompi-checkpoint and orte-checkpoint are exact synonyms for each other. Using either name results in identical behavior.
SYNOPSIS
ompi-checkpoint [ options ] <PID_OF_MPIRUN>
Options
orte-checkpoint attempts to notify a running parallel job (identified by the PID of its mpirun process) that a checkpoint has been
requested. A global snapshot handle reference is presented to the user, which is then passed to ompi-restart(1) to restart the job.
<PID_OF_MPIRUN>
Process ID of the mpirun process.
-h | --help
Display help for this command.
-w | --nowait
Do not wait for the application to finish checkpointing before returning.
-s | --status
Display status messages regarding the progression of the checkpoint request.
--term
After checkpointing the running job, terminate it.
-v | --verbose
Enable verbose output for debugging.
-gmca | --gmca <key> <value>
Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.
-mca | --mca <key> <value>
Send arguments to various MCA modules.
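As an illustration of how the options above combine, the following is a minimal sketch rather than an official example. The helper name checkpoint_job and the DRY_RUN variable are hypothetical and not part of Open MPI; only the -s and --term flags and the trailing mpirun PID come from this page.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Open MPI): checkpoint the job owned by
# the given mpirun PID and show status messages (-s). Pass "term" as the
# second argument to also terminate the job after the checkpoint (--term).
checkpoint_job() {
    pid="$1"
    mode="${2:-keep}"

    # Build the argument list step by step.
    set -- ompi-checkpoint -s
    if [ "$mode" = "term" ]; then
        set -- "$@" --term
    fi
    set -- "$@" "$pid"

    if [ -n "${DRY_RUN:-}" ]; then
        # Dry run: print the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}
```

With DRY_RUN set to a non-empty value, calling checkpoint_job 1234 term prints "ompi-checkpoint -s --term 1234" without contacting any job, which is a convenient way to check the command line before running it against a real mpirun process.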
DESCRIPTION
orte-checkpoint can be invoked multiple times on the same job, provided the invocations do not overlap. Note that the user does not
need to specify the checkpointer to be used here, as that is determined entirely by each of the running processes in the job being
checkpointed.
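Repeated, non-overlapping invocations can be sketched as a simple loop. This is an illustrative assumption about usage, not a documented recipe: the helper name periodic_checkpoint and the CHECKPOINT_CMD override are hypothetical. Because -w (--nowait) is not passed, each call is expected to return only after its checkpoint has finished, so the next request does not overlap the previous one.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): take COUNT checkpoints of the
# job owned by the given mpirun PID, pausing INTERVAL seconds between them.
periodic_checkpoint() {
    pid="$1"
    count="$2"
    interval="$3"
    i=1
    while [ "$i" -le "$count" ]; do
        # CHECKPOINT_CMD is an illustrative override for dry runs;
        # by default the real ompi-checkpoint command is used.
        ${CHECKPOINT_CMD:-ompi-checkpoint} "$pid" || return 1
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
    return 0
}
```

For example, periodic_checkpoint 4321 3 600 would request three checkpoints roughly ten minutes apart and stop early if any request fails.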
SEE ALSO
orte-ps(1), orte-clean(1), ompi-restart(1), opal-checkpoint(1), opal-restart(1), opal_crs(7)
1.4.5 Feb 10, 2012 OMPI-CHECKPOINT(1)
OPAL-CHECKPOINT(1) 1.4.5 OPAL-CHECKPOINT(1)
NAME
opal-checkpoint - Checkpoint a running sequential process using the Open PAL Checkpoint/Restart Service (CRS).
Note: This command should only be used if the application being checkpointed is an OPAL-only application. If it is an Open RTE or Open
MPI program, the respective tools for those layers should be used instead.
SYNOPSIS
opal-checkpoint [ options ] <PID>
Options
opal-checkpoint attempts to notify a running process that a checkpoint has been requested. A snapshot handle reference is presented
to the user, which is then passed to opal-restart(1) to restart the process.
<PID>
Process ID of the running target process.
-h | --help
Display help for this command.
--term
After checkpointing the running process, terminate it.
-v | --verbose
Enable verbose output for debugging.
-n | --name
Request a specific name for the local snapshot reference.
-w | --where
Request that the local snapshot reference be placed in a specific location.
-gmca | --gmca <key> <value>
Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.
-mca | --mca <key> <value>
Send arguments to various MCA modules.
DESCRIPTION
opal-checkpoint can be invoked multiple times on the same process, provided the invocations do not overlap. This allows the user to
take involuntary checkpoints of a running sequential process. See opal_crs(7) for more information about the CRS framework and
components. Note that the user does not need to specify the checkpointer to be used here, as that is determined entirely by the
running process being checkpointed.
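The naming and placement options can be combined in one request. The sketch below only prints the command line it would run. It assumes that -n and -w each take a single argument (a snapshot name and a directory, respectively), which this page does not state explicitly; the helper name opal_checkpoint_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI): print an opal-checkpoint
# command line that requests a snapshot name and location.
# Assumption: -n and -w each accept exactly one argument.
opal_checkpoint_cmd() {
    pid="$1"
    name="$2"
    dir="$3"
    echo "opal-checkpoint -n $name -w $dir $pid"
}
```

Calling opal_checkpoint_cmd 4242 snap1 /tmp/ckpt prints "opal-checkpoint -n snap1 -w /tmp/ckpt 4242", which can be reviewed before being run against a real process.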
SEE ALSO
opal-restart(1), opal_crs(7)
Open MPI Feb 10, 2012 OPAL-CHECKPOINT(1)