Restore Socket after checkpoint


 
Old 09-24-2012

Hello,

I have checkpointed a client-server application written in C with BLCR (Berkeley Lab Checkpoint/Restart). After a failure, I would like to restart the server (server.blcr) and the client (client.blcr), but I need to recreate the sockets between the new client and the new server. Do you have any ideas, please?

Thank you so much.
 
SRUN_CR(1)							 slurm components							SRUN_CR(1)

NAME
srun_cr - run parallel jobs with checkpoint/restart support

SYNOPSIS
srun_cr [OPTIONS...]

DESCRIPTION
The design of srun_cr is inspired by mpiexec_cr from MVAPICH2 and cr_restart from BLCR. It is a wrapper around the srun command that enables batch job checkpoint/restart support when used with SLURM's checkpoint/blcr plugin.

OPTIONS
The srun_cr execute line options are identical to those of the srun command. See "man srun" for details.

DETAILS
After initialization, srun_cr registers a thread context callback function. Then it forks a process and executes "cr_run --omit srun" with its arguments. cr_run is employed to exclude the srun process from being dumped upon checkpoint. All catchable signals except SIGCHLD sent to srun_cr will be forwarded to the child srun process. SIGCHLD will be captured to mimic the exit status of srun when it exits. Then srun_cr loops waiting for termination of tasks being launched from srun.

The step launch logic of SLURM is augmented to check if srun is running under srun_cr. If true, the environment variable SLURM_SRUN_CR_SOCKET should be present, the value of which is the address of a Unix domain socket created and listened on by srun_cr. After launching the tasks, srun tries to connect to the socket and sends the job ID, step ID and the nodes allocated to the step to srun_cr.

Upon checkpoint, srun_cr checks to see if the tasks have been launched. If so, srun_cr first forwards the checkpoint request to the tasks by calling the SLURM API slurm_checkpoint_tasks() before dumping its process context.

Upon restart, srun_cr checks to see if the tasks have been previously launched and checkpointed. If true, the environment variable SLURM_RESTART_DIR is set to the directory of the checkpoint image files of the tasks. Then srun is forked and executed again. The environment variable will be used by the srun command to restart execution of the tasks from the previous checkpoint.

COPYING
Copyright (C) 2009 National University of Defense Technology, China. Produced at National University of Defense Technology, China (cf, DISCLAIMER). CODE-OCEC-09-009. All rights reserved.

This file is part of SLURM, a resource management program. For details, see <http://www.schedmd.com/slurmdocs/>.

SLURM is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

SLURM is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

SEE ALSO
srun(1)

srun_cr 2.0 March 2009 SRUN_CR(1)
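
The socket handoff described under DETAILS (the wrapper listens on a Unix domain socket, publishes its address in an environment variable, and the launcher connects to report the job ID, step ID and node list) can be illustrated with a small sketch. This is not SLURM's code: it is written in Python for brevity, the function names are hypothetical, and the wire format (three 32-bit integers followed by the node list) is an assumption made only for the illustration. It assumes a Unix-like system.

```python
import os
import socket
import struct
import tempfile
import threading

ENV_VAR = "SLURM_SRUN_CR_SOCKET"  # variable name as given in the DETAILS text


def encode_step(job_id, step_id, nodelist):
    # Assumed wire format (not SLURM's): job ID, step ID, node list length, node list.
    nodes = nodelist.encode()
    return struct.pack("!III", job_id, step_id, len(nodes)) + nodes


def recv_exact(conn, n):
    # Read exactly n bytes, since recv() may return fewer than requested.
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def decode_step(conn):
    job_id, step_id, length = struct.unpack("!III", recv_exact(conn, 12))
    return job_id, step_id, recv_exact(conn, length).decode()


def wrapper_listen(path):
    # Wrapper role (srun_cr in the text): create the socket and publish its address.
    # Here both roles share one process; a real child would inherit the variable.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    os.environ[ENV_VAR] = path
    return server


def launcher_report(job_id, step_id, nodelist):
    # Launcher role (srun in the text): report the step only when the variable is set.
    path = os.environ.get(ENV_VAR)
    if path is None:
        return False  # not running under the wrapper
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(encode_step(job_id, step_id, nodelist))
    return True


def demo():
    # Run both roles once and return what the wrapper received.
    with tempfile.TemporaryDirectory() as tmp:
        server = wrapper_listen(os.path.join(tmp, "cr.sock"))
        launcher = threading.Thread(
            target=launcher_report, args=(1234, 0, "node[01-04]")
        )
        launcher.start()
        conn, _ = server.accept()
        with conn:
            step = decode_step(conn)
        launcher.join()
        server.close()
        os.environ.pop(ENV_VAR, None)
        return step


if __name__ == "__main__":
    print(demo())
```

Once the wrapper holds the job ID, step ID and node list, it has what it needs to forward a later checkpoint request to the tasks of that step. The same pattern (listen, publish the address, let the peer reconnect) is one way for restarted processes to re-establish a connection that did not survive a checkpoint.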