03-30-2009
MPI, recovering node
Hi all,
I'm writing an MPI application in which I handle failures and recover from them. To do that, when one node fails, I would like to remove that node from the MPI_COMM_WORLD group and continue with the remaining nodes.
Does anybody know how I can do that?
I'm using MPICH-G2 by the way.
Thanks in advance.
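One possible starting point, not specific to MPICH-G2: standard MPI can build a smaller communicator from a list of ranks using MPI_Comm_group, MPI_Group_incl and MPI_Comm_create. The sketch below covers only the bookkeeping side, that is, turning a per-rank "alive" table into the rank list those calls expect. The helper names collect_survivors and rank_after_shrink are made up for illustration, and how a failure is detected in the first place is assumed to be handled elsewhere in the application.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Collect the ranks still marked alive, in ascending order.
 * alive[r] != 0 means rank r is believed to be running.
 * Returns the number of ranks written to survivors[].
 * (Hypothetical helper, not part of any MPI library.)
 */
int collect_survivors(const int *alive, int world_size, int *survivors)
{
    int count = 0;

    assert(alive != NULL && survivors != NULL && world_size >= 0);
    for (int r = 0; r < world_size; r++) {
        if (alive[r]) {
            survivors[count++] = r;
        }
    }
    return count;
}

/*
 * Rank that old_rank would have in a group built from survivors[]:
 * MPI_Group_incl gives new rank i to the process listed at survivors[i].
 * Returns -1 if old_rank was excluded.
 */
int rank_after_shrink(const int *survivors, int count, int old_rank)
{
    for (int i = 0; i < count; i++) {
        if (survivors[i] == old_rank) {
            return i;
        }
    }
    return -1;
}
```

The array and count would then go to MPI_Group_incl(world_group, count, survivors, &new_group), followed by MPI_Comm_create(MPI_COMM_WORLD, new_group, &new_comm). Keep in mind that MPI_Comm_create is collective over the parent communicator and that the MPI standard does not guarantee a job can continue once a process has died, so whether this works after a real node failure depends on the implementation.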
MPIL_Signal(3) LAM/MPI MPIL_Signal(3)
NAME
MPIL_Signal - LAM/MPI-specific function to send a LAM signal to a rank in an MPI communicator
SYNOPSIS
#include <mpi.h>
int MPIL_Signal(MPI_Comm comm, int rank, int signo)
INPUT PARAMETERS
comm - communicator (handle)
rank - rank of the target process within comm (integer)
signo - LAM signal to deliver (integer)
NOTES
An asynchronous signal is delivered from one process to another with MPIL_Signal(). The target process is selected with a communicator and
a process rank within that communicator. The remaining argument, signo, identifies the signal to be delivered. These signals are
completely apart from the signals provided by the underlying operating system. LAM signals, defined in <lam_ksignal.h>, are listed below.
LAM_SIGTRACE     1   unload trace data
LAM_SIGUDIE      4   terminate
LAM_SIGARREST    5   suspend execution
LAM_SIGRELEASE   6   continue execution
LAM_SIGA         7   user defined
LAM_SIGB         8   user defined
LAM_SIGFUSE      9   node about to die
LAM_SIGSHRINK   10   another node has died
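As a rough illustration of how the table above might be used when logging or dispatching on a received signal number, the following sketch stores the rows in a small lookup table. The struct and function names are hypothetical, and the numbers are copied from this page rather than taken from <lam_ksignal.h>, which remains the authoritative source.

```c
#include <stddef.h>

/* One row of the signal table above. */
struct lam_signal_row {
    int signo;
    const char *name;
    const char *meaning;
};

/* Values copied from the table in this page, not from <lam_ksignal.h>. */
static const struct lam_signal_row lam_signal_rows[] = {
    {  1, "LAM_SIGTRACE",   "unload trace data" },
    {  4, "LAM_SIGUDIE",    "terminate" },
    {  5, "LAM_SIGARREST",  "suspend execution" },
    {  6, "LAM_SIGRELEASE", "continue execution" },
    {  7, "LAM_SIGA",       "user defined" },
    {  8, "LAM_SIGB",       "user defined" },
    {  9, "LAM_SIGFUSE",    "node about to die" },
    { 10, "LAM_SIGSHRINK",  "another node has died" },
};

/* Return the table row for signo, or NULL if the number is not listed. */
const struct lam_signal_row *lookup_lam_signal(int signo)
{
    size_t n = sizeof lam_signal_rows / sizeof lam_signal_rows[0];

    for (size_t i = 0; i < n; i++) {
        if (lam_signal_rows[i].signo == signo) {
            return &lam_signal_rows[i];
        }
    }
    return NULL;
}
```

A NULL result covers the gaps in the numbering (2 and 3 are not listed above), so callers should check for it before dereferencing.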
This is a LAM/MPI-specific function. Most users will have no use for it. If it is used, the call should be guarded with the LAM_MPI C
preprocessor macro so that the code still compiles under other MPI implementations:
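#if LAM_MPI
/* Send a LAM signal (not an operating system signal) to rank 0; LAM_SIGA is listed in <lam_ksignal.h>. */
MPIL_Signal(MPI_COMM_WORLD, 0, LAM_SIGA);
#endif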
NOTES FOR FORTRAN
All MPI routines in Fortran (except for MPI_WTIME and MPI_WTICK) have an additional argument ierr at the end of the argument list. ierr
is an integer and has the same meaning as the return value of the routine in C. In Fortran, MPI routines are subroutines, and are invoked
with the call statement.
All MPI objects (e.g., MPI_Datatype, MPI_Comm) are of type INTEGER in Fortran.
ERRORS
If an error occurs in an MPI function, the current MPI error handler is called to handle it. By default, this error handler aborts the MPI
job. The error handler may be changed with MPI_Errhandler_set; the predefined error handler MPI_ERRORS_RETURN may be used to cause error
values to be returned (in C and Fortran; this error handler is less useful with the C++ MPI bindings. The predefined error handler
MPI::ERRORS_THROW_EXCEPTIONS should be used in C++ if the error value needs to be recovered). Note that MPI does not guarantee that an MPI
program can continue past an error.
All MPI routines (except MPI_Wtime and MPI_Wtick) return an error value; C routines as the value of the function and Fortran routines in
the last argument. The C++ bindings for MPI do not return error values; instead, error values are communicated by throwing exceptions of
type MPI::Exception (but not by default). Exceptions are only thrown if the error value is not MPI::SUCCESS.
Note that if the MPI::ERRORS_RETURN handler is set in C++, while MPI functions will return upon an error, there will be no way to recover
what the actual error value was.
MPI_SUCCESS
- No error; MPI routine completed successfully.
MPI_ERR_COMM
- Invalid communicator. A common error is to use a null communicator in a call (not even allowed in MPI_Comm_rank).
MPI_ERR_RANK
- Invalid source or destination rank. Ranks must be between zero and the size of the communicator minus one; ranks in a receive
(MPI_Recv, MPI_Irecv, MPI_Sendrecv, etc.) may also be MPI_ANY_SOURCE.
MPI_ERR_OTHER
- Other error; use MPI_Error_string to get more information about this error code.
SEE ALSO
lam_ksignal
LOCATION
mpil_signal.c
LAM/MPI 6.5.8 11/10/2002 MPIL_Signal(3)