The UNIX and Linux Forums  

Posted 03-30-2005
Join Date: Mar 2005
Location: College Station, TX
Posts: 12
PPE/POE problems on AIX 5.2 on p690 systems

We have PPE/POE problems on a 32 PE p690 system. After upgrading to the latest AIX 5.2 (ML 05) POE/PPE environment on a p690, we've noticed that MPI jobs cannot start. I've traced the problem to the communication between the poe client routines and the pmdv4 (/etc/pmdv4) partition manager on our system. Specifically:

% lslpp -l '*poe*'
Fileset Level State Description
----------------------------------------------------------------------------
Path: /usr/lib/objrepos
ppe.poe 4.1.1.6 APPLIED poe Parallel Operating Environment
Path: /etc/objrepos
ppe.poe 4.1.1.6 APPLIED poe Parallel Operating Environment

with AIX patched to latest levels as of 03/29/2005.

Trying poe on a simple 'hello world' mpi program:

% poe -procs 4 ./mpi -ilevel 6
INFO: DEBUG_LEVEL changed from 0 to 4
D1<L4>: Open of file /home/miket/SC/LSF/host.list successful
D1<L4>: mp_euilib = ip
D1<L4>: task 0 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 1 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 2 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 3 agave.tamu.edu 165.91.16.6 10
D1<L4>: node allocation strategy = 0
D1<L4>: Entering pm_contact, jobid is 0
D1<L4>: Jobid = 1113711767
D1<L4>: POE security method is COMPAT
D1<L4>: Requesting service pmv4
D1<L4>: 1 master nodes
D1<L4>: Socket file descriptor for master 0 (agave.tamu.edu) is 4
ERROR: 0031-024 agave.tamu.edu: no response; rc = -1
D1<L4>: Non-zero status -1 returned from pm_mgr_init
D2<L4>: In pm_exit... About to call pm_remote_shutdown
D2<L4>: Elapsed time for pm_remote_shutdown: 0 seconds
D2<L4>: In pm_exit... Calling exit with status = -1 at Wed Mar 30 13:53:42 2005
-------------------

poe contacts pmdv4 via inetd, and each time pmdv4 writes the following log entries to /tmp/mplog.PID:


% cat /tmp/mplog.1306788
AIX Parallel Environment pmd4 version @(#) 2003/06/11 13:19:38
The ID of this process is 1306788
The version of this pmd for version checking is 4100
The hostname of this node is agave
The short hostname of this node is agave
Wed Mar 30 13:53:37 2005

ERROR: 0031-203 malformed from address: <Error 0>
pmd_exit reached!, exit code is 1
No collective communication shared memory segments to clean up.
-------------------
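Since poe reaches pmdv4 through inetd under the service name pmv4 (as the debug output above shows), one thing worth confirming is that the service is defined and enabled. The following is only a minimal sketch: `show_service_entries` is a hypothetical helper, and the default paths /etc/services and /etc/inetd.conf are assumptions about where the definitions live on this system.

```shell
# Hypothetical helper: print the active (non-commented) lines whose first
# field is the given service name, from a services file and an inetd
# configuration file.
# Arguments: service name, services file, inetd configuration file.
show_service_entries() {
    svc=$1
    services_file=${2:-/etc/services}
    inetd_conf=${3:-/etc/inetd.conf}
    for f in "$services_file" "$inetd_conf"; do
        echo "== $f"
        if [ ! -r "$f" ]; then
            echo "   (not readable)"
            continue
        fi
        # Drop commented-out lines, then match the service name exactly.
        grep -v '^[[:space:]]*#' "$f" | awk -v s="$svc" '$1 == s'
    done
}
```

Calling `show_service_entries pmv4` would then list whatever entries exist. An empty section for either file would suggest the service is missing or commented out there.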

After tracing (with truss) both poe (or the MPI executables) and /etc/pmdv4, I noticed that pmdv4 closes the socket to the poe client with the `malformed from address: <Error 0>' error after exchanging data with it for a while. A socket read on the poe side then returns 0, and the poe client simply quits.
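For anyone wanting to reproduce the observation, a read that returns 0 on a stream socket normally means the peer closed the connection. The sketch below picks such calls out of a saved truss log. `find_zero_reads` is a hypothetical helper, and it assumes the log has one system call per line ending in `= <return value>`, with read-style calls named something like `read` or `kread`.

```shell
# Hypothetical helper: list read-style calls in a saved truss log that
# returned exactly 0 (end of file; on a socket, the peer has closed).
# Argument: path to the truss output file.
find_zero_reads() {
    # Match "...read(" or "...readv(" followed by a return value of exactly 0.
    grep -E 'k?readv?\(.*=[[:space:]]*0[[:space:]]*$' "$1"
}
```

A log could be captured with something like `truss -f -o /tmp/poe.truss poe -procs 4 ./mpi -ilevel 6` (the output path is only an example) and then inspected with `find_zero_reads /tmp/poe.truss`.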

I do have a ~/.rhosts file in my home directory. However, the only services handled by inetd are SSH and pmv4; rsh/exec is not allowed. Even allowing rsh did not change the behavior of pmdv4.
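The logs above show the node under two names, agave.tamu.edu in the poe output and agave in the pmdv4 log, so it may be worth checking which of them ~/.rhosts actually lists. This is only a sketch and not a confirmed cause of the error. `check_rhosts` is a hypothetical helper, and it assumes the usual one-entry-per-line "host [user]" layout.

```shell
# Hypothetical helper: for each host name given, report whether it appears
# as the first field of some line in an rhosts-style file.
# Arguments: file path, then one or more host names.
check_rhosts() {
    file=$1
    shift
    for h in "$@"; do
        # awk exits 0 only if some line has the host name as its first field.
        if awk -v h="$h" '$1 == h { found = 1 } END { exit !found }' "$file"; then
            echo "$h: listed"
        else
            echo "$h: missing"
        fi
    done
}
```

For example, `check_rhosts ~/.rhosts agave.tamu.edu agave` would print one line per name saying whether it is listed.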

Should I be doing something different for POE/PPE to work now? Note that we are not using LoadLeveler or any other resource manager. Any hints would be greatly appreciated!

Thanks
Michael