ompi-checkpoint(1) [debian man page]

OMPI-CHECKPOINT(1)						     Open MPI							OMPI-CHECKPOINT(1)

NAME
ompi-checkpoint, orte-checkpoint - Checkpoint a running parallel process using the Open MPI Checkpoint/Restart Service (CRS)

NOTE: ompi-checkpoint and orte-checkpoint are exact synonyms for each other. Using either name results in identical behavior.

SYNOPSIS
ompi-checkpoint [ options ] <PID_OF_MPIRUN>

OPTIONS
orte-checkpoint attempts to notify a running parallel job (identified by its mpirun process) that it has been asked to checkpoint itself. A global snapshot handle reference is presented to the user; this reference is passed to ompi-restart(1) to restart the job.

<PID_OF_MPIRUN>
    Process ID of the mpirun process.

-h | --help
    Display help for this command.

-w | --nowait
    Do not wait for the application to finish checkpointing before returning.

-s | --status
    Display status messages regarding the progression of the checkpoint request.

--term
    After checkpointing the running job, terminate it.

-v | --verbose
    Enable verbose output for debugging.

-gmca | --gmca <key> <value>
    Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.

-mca | --mca <key> <value>
    Send arguments to various MCA modules.

DESCRIPTION
orte-checkpoint can be invoked multiple times, provided the invocations do not overlap.

The user does not need to specify the checkpointer to be used here; it is determined entirely by each of the running processes in the job being checkpointed.

SEE ALSO
orte-ps(1), orte-clean(1), ompi-restart(1), opal-checkpoint(1), opal-restart(1), opal_crs(7)

1.4.5								 Feb 10, 2012							OMPI-CHECKPOINT(1)
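
As an illustration of how the options listed above combine on one command line, the following is a minimal Python sketch and not part of the Open MPI distribution. The helper name build_ompi_checkpoint_argv and its keyword arguments are hypothetical; only the flags documented on this page are emitted, and the PID is expected to be that of a running mpirun process.

```python
from typing import List, Sequence, Tuple


def build_ompi_checkpoint_argv(
    mpirun_pid: int,
    *,
    status: bool = False,
    nowait: bool = False,
    term: bool = False,
    verbose: bool = False,
    mca: Sequence[Tuple[str, str]] = (),
) -> List[str]:
    """Assemble an ompi-checkpoint argument list (hypothetical helper).

    Only flags documented in the OPTIONS section are used.
    """
    if mpirun_pid <= 0:
        raise ValueError("mpirun_pid must be a positive process ID")

    argv = ["ompi-checkpoint"]
    if status:
        argv.append("--status")    # report progress of the request
    if nowait:
        argv.append("--nowait")    # return before checkpointing finishes
    if term:
        argv.append("--term")      # terminate the job after the checkpoint
    if verbose:
        argv.append("--verbose")   # debugging output
    for key, value in mca:
        argv += ["--mca", key, value]  # caller-supplied MCA key/value pairs

    # The synopsis places <PID_OF_MPIRUN> after the options.
    argv.append(str(mpirun_pid))
    return argv
```

The returned list can be handed to subprocess.run(argv, check=True) on a system where ompi-checkpoint is installed. Building a list instead of a shell string avoids quoting problems with MCA values.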

OPAL-CHECKPOINT(1)						       1.4.5							OPAL-CHECKPOINT(1)

NAME
opal-checkpoint - Checkpoint a running sequential process using the Open PAL Checkpoint/Restart Service (CRS).

Note: The user should invoke this tool directly only if the application being checkpointed is an OPAL-only application. For an Open RTE or Open MPI program, the respective tools should be used instead.

SYNOPSIS
opal-checkpoint [ options ] <PID>

OPTIONS
opal-checkpoint attempts to notify a running process that it has been asked to checkpoint itself. A snapshot handle reference is presented to the user; this reference is passed to opal-restart(1) to restart the process.

<PID>
    Process ID of the running target process.

-h | --help
    Display help for this command.

--term
    After checkpointing the running process, terminate it.

-v | --verbose
    Enable verbose output for debugging.

-n | --name
    Request a specific name for the local snapshot reference.

-w | --where
    Request that the local snapshot reference be placed in a specific location.

-gmca | --gmca <key> <value>
    Pass global MCA parameters that are applicable to all contexts. <key> is the parameter name; <value> is the parameter value.

-mca | --mca <key> <value>
    Send arguments to various MCA modules.

DESCRIPTION
opal-checkpoint can be invoked multiple times, provided the invocations do not overlap. This allows the user to take involuntary checkpoints of a running sequential process. See opal_crs(7) for more information about the CRS framework and components.

The user does not need to specify the checkpointer to be used here; it is determined entirely by the running process being checkpointed.

SEE ALSO
opal-restart(1), opal_crs(7)

Open MPI							 Feb 10, 2012							OPAL-CHECKPOINT(1)
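
The requirement that invocations not overlap can be enforced by a caller that issues checkpoint requests from several places, for example a timer and a signal handler. The following is a minimal Python sketch under that assumption, not part of Open MPI. The class name CheckpointRequester and the runner parameter are hypothetical; the command is invoked with the PID only, and the meaning of its exit status is not interpreted here.

```python
import subprocess
import threading
from typing import Callable, List, Optional


class CheckpointRequester:
    """Serialize opal-checkpoint invocations so no two overlap (hypothetical helper)."""

    def __init__(
        self,
        pid: int,
        runner: Callable[[List[str]], int] = subprocess.call,
    ) -> None:
        # Synopsis: opal-checkpoint [ options ] <PID>
        self._argv = ["opal-checkpoint", str(pid)]
        self._runner = runner          # injectable so the logic can be tested
        self._lock = threading.Lock()  # held while a request is in flight

    def request(self) -> Optional[int]:
        """Issue one checkpoint request.

        Returns the command's exit status, or None if another request
        is still running and this one was skipped to avoid overlap.
        """
        if not self._lock.acquire(blocking=False):
            return None
        try:
            return self._runner(self._argv)
        finally:
            self._lock.release()
```

Skipping an overlapping request (instead of queueing it) is a design choice of this sketch; a caller that must not lose a checkpoint could block on the lock instead.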