MPI_Bcast problem, bug?
Hi, I'm trying to define an MPI datatype for a structure and then pass messages using that datatype. However, when I broadcast the initialized data from rank 0, part of the data received on the other ranks is incorrect. Could you please take a look at my code and point out any errors I made? Thanks a lot!
-----------
//definition of datatype:
-----------
typedef struct paramtwo {
int exist;
double phi0;
double ap;
double r0;
double rc_low;
double rc;
} PARAMTWO;
PARAMTWO **para2;// global variable
//in main.cpp:
----------
void main(){
......
MPI_Datatype paramtwompi;
define_MPI_Paramtwo(&paramtwompi);
para2= (PARAMTWO **)(malloc(NTYPEMAX*sizeof(PARAMTWO*)));
for(i=0;i<NTYPEMAX;i++){
para2[i]=(PARAMTWO*)(malloc(NTYPEMAX*sizeof(PARAMTWO)));
}
if(myrank==0) initialize_para2(); //initialize para2[][] values at rank 0
MPI_Bcast(&(para2[0][0]), (NTYPEMAX*NTYPEMAX), paramtwompi, 0, MPI_COMM_WORLD);
...... output from all ranks ......
}
int define_MPI_Paramtwo(MPI_Datatype *paramtwompi){
int blocklen[6]={1,1,1,1,1,1};
MPI_Aint disp[6];
MPI_Datatype type[6]={MPI_INT,MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE};
PARAMTWO findsize[2];
MPI_Aint findsize_addr, exist_addr, phi0_addr, dumb_addr, ap_addr, r0_addr, rc_low_addr, rc_addr;
MPI_Get_address(&findsize[0], &findsize_addr);
MPI_Get_address(&(findsize[0]).exist, &exist_addr);
MPI_Get_address(&(findsize[0]).phi0, &phi0_addr);
MPI_Get_address(&(findsize[0]).ap, &ap_addr);
MPI_Get_address(&(findsize[0]).r0, &r0_addr);
MPI_Get_address(&(findsize[0]).rc_low, &rc_low_addr);
MPI_Get_address(&(findsize[0]).rc, &rc_addr);
disp[0]=exist_addr-findsize_addr;
disp[1]=phi0_addr-findsize_addr;
disp[2]=ap_addr-findsize_addr;
disp[3]=r0_addr-findsize_addr;
disp[4]=rc_low_addr-findsize_addr;
disp[5]=rc_addr-findsize_addr;
MPI_Type_create_struct(6, blocklen, disp, type, paramtwompi);
MPI_Type_commit(paramtwompi);
return 1;
}
void initialize_para2(){
for(i=1;i<=3;i++){
for(j=1;j<=3;j++){
para2[i][j].exist= ..;
para2[i][j].phi0 = ..;
... ...
para2[i][j].rc = ..;
}
}
}
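For reference, the displacements that define_MPI_Paramtwo derives by subtracting addresses can also be cross-checked with the standard offsetof macro. The following is only a minimal sketch in plain C (no MPI calls); paramtwo_offsets is a hypothetical helper name that does not appear in my program.

```c
#include <assert.h>
#include <stddef.h>

typedef struct paramtwo {
    int exist;
    double phi0;
    double ap;
    double r0;
    double rc_low;
    double rc;
} PARAMTWO;

/* Hypothetical helper: fill disp[0..5] with the byte offset of each
 * member of PARAMTWO, in declaration order. These should match the
 * address differences computed with MPI_Get_address. */
static void paramtwo_offsets(size_t disp[6]) {
    assert(disp != NULL);
    disp[0] = offsetof(PARAMTWO, exist);
    disp[1] = offsetof(PARAMTWO, phi0);
    disp[2] = offsetof(PARAMTWO, ap);
    disp[3] = offsetof(PARAMTWO, r0);
    disp[4] = offsetof(PARAMTWO, rc_low);
    disp[5] = offsetof(PARAMTWO, rc);
}
```

When arrays of the struct are sent, the extent of the committed MPI type should also equal sizeof(PARAMTWO); if it does not on a given platform, MPI_Type_create_resized can be used to set it explicitly.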
==================================
From the output:
rank: 0 phi0,ap,r0,rc_low,rc: 0.075832 1.6752 3.64207 0 5
rank: 0 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 0 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 0 phi0,ap,r0,rc_low,rc: 0.066001 2.8757 4.31169 0 5
rank: 0 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.085009 2.2124 4.20311 0 5.5
rank: 1 phi0,ap,r0,rc_low,rc: 0.075832 1.6752 3.64207 0 5
rank: 1 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 1 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.807371 0 4.49714 0 5.5
rank: 1 phi0,ap,r0,rc_low,rc: 0.066001 0 4.31169 0 5
rank: 1 phi0,ap,r0,rc_low,rc: 0.581624 0 3.25062 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.085009 2.2124 4.20311 0 5.5
The problem is in the second data column (ap), rows 4 to 6 of the rank 1 output: those values are zero, while all other data are correct.
The partially correct data on the other ranks indicates that the broadcast was performed, but I don't understand how only part of the data can be wrong.
So my question is: why does the broadcast partially fail? What did I do wrong in my code? Thanks for the help!
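One thing that may be worth checking, stated here as an assumption rather than a confirmed diagnosis: MPI_Bcast with a count of NTYPEMAX*NTYPEMAX starting at &(para2[0][0]) reads and writes one contiguous buffer, but each para2[i] comes from its own malloc, so the rows are not guaranteed to be adjacent in memory. Below is a minimal sketch of allocating all rows from a single block while keeping the para2[i][j] indexing. alloc_para2_contiguous and free_para2_contiguous are hypothetical helper names, not part of the code above.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct paramtwo {
    int exist;
    double phi0;
    double ap;
    double r0;
    double rc_low;
    double rc;
} PARAMTWO;

/* Hypothetical helper: allocate an n x n table as ONE zero-initialized
 * block plus an array of row pointers into that block, so that
 * rows[i][j] and (&rows[0][0])[i * n + j] name the same element. */
static PARAMTWO **alloc_para2_contiguous(size_t n) {
    assert(n > 0);
    PARAMTWO **rows = (PARAMTWO **)malloc(n * sizeof *rows);
    PARAMTWO *block = (PARAMTWO *)calloc(n * n, sizeof *block);
    if (rows == NULL || block == NULL) {
        free(rows);
        free(block);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        rows[i] = block + i * n;  /* row i starts n elements after row i-1 */
    }
    return rows;
}

/* Hypothetical helper: release the single data block, then the row pointers. */
static void free_para2_contiguous(PARAMTWO **rows) {
    if (rows != NULL) {
        free(rows[0]);
        free(rows);
    }
}
```

With this layout, para2 = alloc_para2_contiguous(NTYPEMAX) would need to be called on every rank before the broadcast, and the existing MPI_Bcast(&(para2[0][0]), (NTYPEMAX*NTYPEMAX), paramtwompi, 0, MPI_COMM_WORLD) call would then cover every element of the table.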