4 Servers hang with same error message

# 1  
Old 12-16-2015

I have three Power7 710 Express boxes and one Power7 750 Express box that all get into a weird hung state with the same console message. They all run AIX 6.1 TL7 (two at 6100-07-08-1339, two at 6100-07-10-1415).

"NIS: Server not responding for domain X.X.X; still trying."

Other clients use the same NIS server and do not have any issues. I have tried restarting the NIS server (/etc/init.d/ypserv restart) while the AIX boxes are hung, but to no avail.

The NIS server is a Fedora server running at ~14.1 utilized (yes, very high, I agree), but if that were the issue, I would expect other clients to have issues too, not just my AIX boxes.

When the problem is occurring, I can telnet to the AIX box and get a login prompt, but the login isn't processed (CPU overutilized?). The console is active but, again, the OS cannot process the login request; sometimes it does not even respond with the password prompt. The box does not respond to a ping.

The only cure is to reboot the box from the HMC; after that they are OK. There are NO errpt messages about any problem: nothing logged for network, kernel, disk, or memory.

They will run for 10-20 days, then all 4 go into the same state at almost the same time.

I just need a direction to go in, a script to collect system stats so that when the system hangs I can look at the output after reboot, or an idea of where to start looking. I have run diag and checked the network card, sysplanar0, sisass0, and memory, and all pass.
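
A minimal sketch of the kind of collection script asked for above, offered as a starting point rather than a tested AIX tool. The log path, the 60-second interval, the function names, and the choice of commands are all assumptions. Each snapshot is appended to a file and followed by sync so that the last entries have a chance of surviving a hard reset from the HMC.

```shell
#!/bin/sh
# Hypothetical hang-watch collector. LOG, INTERVAL and the command list
# are assumptions, not values taken from the thread.
LOG=${LOG:-/var/tmp/hangwatch.log}
INTERVAL=${INTERVAL:-60}

# Run one command if it is installed and append its output under a header.
run_logged() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "--- $*" >>"$LOG"
        "$@" >>"$LOG" 2>&1 || echo "--- exit status $?" >>"$LOG"
    else
        echo "--- $1: not installed, skipped" >>"$LOG"
    fi
}

# Append one timestamped snapshot to the log.
collect_snapshot() {
    echo "=== snapshot $(date '+%Y-%m-%d %H:%M:%S')" >>"$LOG"
    run_logged uptime
    run_logged vmstat 1 2
    run_logged netstat -in    # -n avoids name lookups
    # ypwhich goes last: it may block while NIS is unreachable, and a
    # header with no output after it is itself a hint.
    run_logged ypwhich
    sync
}

# Loop only when asked to, so the functions can be tried by hand first.
if [ "${1:-}" = "--loop" ]; then
    while :; do
        collect_snapshot
        sleep "$INTERVAL"
    done
fi
```

Started with the --loop argument (for example under nohup), it keeps appending snapshots; after a reboot, the tail of the log shows the last state recorded before the hang.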
# 2  
Old 12-16-2015
It didn't hang, it is waiting for the NIS server ;-) The first thing you can do is remove the NIS configuration from the server: rmyp -c. Personally, I would recommend stopping at this point, because you will never have any problems with NIS if you don't have it.

But I suppose you want your NIS configuration back. After removing NIS, I'd reboot the server again and check that everything goes well and that there are no other errors, such as hardware or network errors.

If the server starts without NIS, try to ping the NIS server, then check DNS and/or /etc/hosts.
Code:
# host NIS-server
# host IP-address-of-my-NIS-server
# host -n NIS-server
# host -n IP-address-of-my-NIS-server

If you have several DNS servers in your /etc/resolv.conf, try all of them:
Code:
# host -n NIS-server DNS-server-1
# host -n NIS-server DNS-server-2
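
To avoid typing one command per DNS server, the lookups can be wrapped in a loop. This is only a sketch: the function names are made up for illustration, and it assumes resolv.conf uses the usual "nameserver ADDRESS" lines and that host accepts the server as its last argument, as in the commands above.

```shell
#!/bin/sh
# List the nameserver addresses from a resolv.conf-style file.
nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Ask each listed nameserver in turn to resolve the given host name.
# check_resolvers is a hypothetical helper, not an AIX command.
check_resolvers() {
    target=$1
    for ns in $(nameservers "${2:-/etc/resolv.conf}"); do
        echo "== $ns"
        host -n "$target" "$ns" || echo "lookup via $ns failed"
    done
}
```

Calling check_resolvers NIS-server then queries every configured DNS server in turn, so a single resolver that answers differently stands out.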

If that is OK, you can try to configure the NIS client again:
Code:
# domainname NIS-domain
# mkclient -B

In this case AIX will broadcast to find a suitable server. You can also try to specify the NIS server:
Code:
# mkclient -B -S NIS-server

But I would first try without specifying the server: AIX should be able to find it, and if it can't, you have some problem with your NIS configuration. (But don't ask me which problem; I last configured a NIS server in 1998.)

Then check that ypbind has started:
Code:
# lssrc -s ypbind

If it has started, check which NIS server you are using:
Code:
# ypwhich

If everything is OK at this point, you can check that you see your users (with their passwords, of course):
Code:
# ypcat -k passwd
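
The three checks above can be chained so that the first failing step is obvious. A rough sketch, assuming lssrc reports a running subsystem with the word "active" in its status column; step and check_nis are made-up names.

```shell
#!/bin/sh
# Run one check quietly and report OK or FAIL for it.
step() {
    label=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# Run the NIS client checks in order and stop at the first failure.
# Assumption: "lssrc -s ypbind" prints "active" when ypbind is running.
check_nis() {
    step "ypbind subsystem active" sh -c 'lssrc -s ypbind | grep -q active' &&
    step "bound to a NIS server"   ypwhich &&
    step "passwd map readable"     ypcat -k passwd
}
```

Running check_nis prints one line per step and returns non-zero as soon as a step fails, which shows how far the client got.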

As a last step, you can call me and we will discuss your future migration to LDAP, because NIS+ is no longer part of AIX 7.2.