PPE/POE problems on AIX 5.2 on p690 systems


 
Posted 03-31-2005

We have PPE/POE problems on a 32 PE p690 system. After upgrading a p690 to the latest AIX 5.2 (ML 05) POE/PPE environment, we noticed that MPI jobs cannot start. I've traced the problem to the communication between the poe client routines and the pmdv4 (/etc/pmdv4) partition manager on our system. Specifically:

% lslpp -l '*poe*'
  Fileset                      Level  State      Description
  ----------------------------------------------------------------------------
Path: /usr/lib/objrepos
  ppe.poe                    4.1.1.6  APPLIED    poe Parallel Operating Environment
Path: /etc/objrepos
  ppe.poe                    4.1.1.6  APPLIED    poe Parallel Operating Environment

AIX is patched to the latest levels as of 03/29/2005.

Trying poe on a simple 'hello world' MPI program:

% poe -procs 4 ./mpi -ilevel 6
INFO: DEBUG_LEVEL changed from 0 to 4
D1<L4>: Open of file /home/miket/SC/LSF/host.list successful
D1<L4>: mp_euilib = ip
D1<L4>: task 0 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 1 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 2 agave.tamu.edu 165.91.16.6 10
D1<L4>: task 3 agave.tamu.edu 165.91.16.6 10
D1<L4>: node allocation strategy = 0
D1<L4>: Entering pm_contact, jobid is 0
D1<L4>: Jobid = 1113711767
D1<L4>: POE security method is COMPAT
D1<L4>: Requesting service pmv4
D1<L4>: 1 master nodes
D1<L4>: Socket file descriptor for master 0 (agave.tamu.edu) is 4
ERROR: 0031-024 agave.tamu.edu: no response; rc = -1
D1<L4>: Non-zero status -1 returned from pm_mgr_init
D2<L4>: In pm_exit... About to call pm_remote_shutdown
D2<L4>: Elapsed time for pm_remote_shutdown: 0 seconds
D2<L4>: In pm_exit... Calling exit with status = -1 at Wed Mar 30 13:53:42 2005
-------------------

poe contacts pmdv4 via inetd, and each time pmdv4 leaves the following log entries in /tmp/mplog.PID:


% cat /tmp/mplog.1306788
AIX Parallel Environment pmd4 version @(#) 2003/06/11 13:19:38
The ID of this process is 1306788
The version of this pmd for version checking is 4100
The hostname of this node is agave
The short hostname of this node is agave
Wed Mar 30 13:53:37 2005

ERROR: 0031-203 malformed from address: <Error 0>
pmd_exit reached!, exit code is 1
No collective communication shared memory segments to clean up.
-------------------

After tracing (with truss) both poe (or the MPI executables) and /etc/pmdv4, I noticed that pmdv4 talks to the poe client for a while and then closes the socket, logging `malformed from address: <Error 0>'. The socket read on the poe side then returns 0, and the poe client simply quits.

I do have a ~/.rhosts file in my home directory. However, the only services handled by inetd are SSH and pmv4; rsh/exec are not allowed. Even allowing rsh did not change the behavior of pmdv4.
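
For anyone who wants to compare their inetd setup with ours, the pmv4 wiring can be listed with a small helper along these lines. This is only a minimal sketch: show_inetd_service is a hypothetical name (it is not part of POE), and the default paths /etc/services and /etc/inetd.conf are assumptions about where the service is defined on a given system.

```shell
# Hypothetical helper (not part of POE): print the lines that define a
# service in a services file and in an inetd configuration file.
# Usage: show_inetd_service NAME [SERVICES_FILE] [INETD_CONF]
# The default paths below are assumptions; pass other paths if needed.
show_inetd_service() {
    name=$1
    services=${2:-/etc/services}
    inetd_conf=${3:-/etc/inetd.conf}
    for f in "$services" "$inetd_conf"; do
        echo "== $f"
        if [ -r "$f" ]; then
            # Print lines whose first field is exactly the service name;
            # commented-out entries (first field starts with '#') are skipped.
            awk -v n="$name" '$1 == n' "$f"
        else
            echo "(not readable)"
        fi
    done
}
```

Calling `show_inetd_service pmv4` prints the active pmv4 entries from both files, so a commented-out or missing entry is easy to spot.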

Should I be doing something different for POE/PPE to work now? Note that we are not using LoadLeveler or any other resource manager. Any hints would be greatly appreciated!

Thanks
Michael