


MPI_Bcast problem, bug?

High Performance Computing


Posted by qb13 (Registered User) on 03-21-2010
 

Hi, I'm trying to define an MPI_Datatype for a structure and then pass messages of this derived datatype. However, when I broadcast the initialized data from rank 0, part of the data received on the other ranks is not correct. Could you please take a look at my code and suggest possible errors I made? Thanks a lot!

-----------
//definition of datatype:
-----------
typedef struct paramtwo {
    int exist;
    double phi0;
    double ap;
    double r0;
    double rc_low;
    double rc;
} PARAMTWO;

PARAMTWO **para2;// global variable

//in main.cpp:
----------
int main(int argc, char **argv){
    ......
    MPI_Datatype paramtwompi;
    define_MPI_Paramtwo(&paramtwompi);

    // table of NTYPEMAX row pointers; each row is malloc'ed separately
    para2= (PARAMTWO **)(malloc(NTYPEMAX*sizeof(PARAMTWO*)));
    for(i=0;i<NTYPEMAX;i++){
        para2[i]=(PARAMTWO*)(malloc(NTYPEMAX*sizeof(PARAMTWO)));
    }

    if(myrank==0) initialize_para2(); //initialize para2[][] values at rank 0
    // broadcast NTYPEMAX*NTYPEMAX structs starting at para2[0][0]
    MPI_Bcast(&(para2[0][0]), (NTYPEMAX*NTYPEMAX), paramtwompi, 0, MPI_COMM_WORLD);

    ...... output from all ranks ......

}

int define_MPI_Paramtwo(MPI_Datatype *paramtwompi){
    int blocklen[6]={1,1,1,1,1,1};
    MPI_Aint disp[6];
    MPI_Datatype type[6]={MPI_INT, MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE, MPI_DOUBLE};
    PARAMTWO findsize[2];
    MPI_Aint findsize_addr, exist_addr, phi0_addr, ap_addr, r0_addr, rc_low_addr, rc_addr;

    // absolute addresses of the struct and of each of its members
    MPI_Get_address(&findsize[0], &findsize_addr);
    MPI_Get_address(&(findsize[0]).exist, &exist_addr);
    MPI_Get_address(&(findsize[0]).phi0, &phi0_addr);
    MPI_Get_address(&(findsize[0]).ap, &ap_addr);
    MPI_Get_address(&(findsize[0]).r0, &r0_addr);
    MPI_Get_address(&(findsize[0]).rc_low, &rc_low_addr);
    MPI_Get_address(&(findsize[0]).rc, &rc_addr);

    // displacement of each member relative to the start of the struct
    disp[0]=exist_addr-findsize_addr;
    disp[1]=phi0_addr-findsize_addr;
    disp[2]=ap_addr-findsize_addr;
    disp[3]=r0_addr-findsize_addr;
    disp[4]=rc_low_addr-findsize_addr;
    disp[5]=rc_addr-findsize_addr;

    MPI_Type_create_struct(6, blocklen, disp, type, paramtwompi);
    MPI_Type_commit(paramtwompi);

    return 1;
}

void initialize_para2(){
    int i, j;
    for(i=1;i<=3;i++){
        for(j=1;j<=3;j++){
            para2[i][j].exist= ..;
            para2[i][j].phi0 = ..;
            ... ...
            para2[i][j].rc = ..;
        }
    }
}

==================================
From the output:

rank: 0 phi0,ap,r0,rc_low,rc: 0.075832 1.6752 3.64207 0 5
rank: 0 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 0 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 0 phi0,ap,r0,rc_low,rc: 0.066001 2.8757 4.31169 0 5
rank: 0 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 0 phi0,ap,r0,rc_low,rc: 0.085009 2.2124 4.20311 0 5.5

rank: 1 phi0,ap,r0,rc_low,rc: 0.075832 1.6752 3.64207 0 5
rank: 1 phi0,ap,r0,rc_low,rc: 0.807371 0.73124 4.49714 0 5.5
rank: 1 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.807371 0 4.49714 0 5.5
rank: 1 phi0,ap,r0,rc_low,rc: 0.066001 0 4.31169 0 5
rank: 1 phi0,ap,r0,rc_low,rc: 0.581624 0 3.25062 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.974501 1.2846 3.08858 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.581624 1.2566 3.25062 0 4
rank: 1 phi0,ap,r0,rc_low,rc: 0.085009 2.2124 4.20311 0 5.5

The problem is in the second data column (ap) of rows 4 to 6 on rank 1: those values are zero, while all other data are correct.
The partially correct data on the other ranks indicates that the broadcast was performed; however, I don't understand how only part of the data can be wrong.
So my question is: why does the broadcast partially fail? What did I do wrong in my code? Thanks for the help!