We have a 2-node Oracle RAC cluster running RHEL 5 (kernel 2.6.18-92.1.10.el5PAE).
The hardware is HP DL360.
We have a node eviction issue: Oracle evicts a node, though not consistently.
Oracle has looked at all the log files and said this is not an Oracle issue but rather a system or OS issue. The server does not respond within roughly 30 seconds, so it gets evicted.
Oracle has also given us some tools, the OS Watcher scripts, which basically run vmstat, top, mpstat, etc. every 10 seconds.
From these log files, we can see that they stop being updated before the eviction happens. For example, say a node eviction happened at 03:41:00 AM; we see that logging to the files stopped around 03:40:30 AM, which clearly tells us that the server is not responding.
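Spotting that gap by eye across many OS Watcher files is tedious. A minimal sketch of a gap finder, assuming you have first extracted one timestamp per line (e.g. with grep) in a `YYYY-MM-DD HH:MM:SS` format; the format, the 10-second interval and the 5-second slack are assumptions to adjust, and the sample timestamps are made up:

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

def parse_stamps(lines: Iterable[str], fmt: str = "%Y-%m-%d %H:%M:%S") -> List[datetime]:
    """Parse one timestamp per line; lines that do not match `fmt` are skipped."""
    stamps = []
    for line in lines:
        try:
            stamps.append(datetime.strptime(line.strip(), fmt))
        except ValueError:
            continue  # not a timestamp line in the assumed format
    return stamps

def find_gaps(stamps: Iterable[datetime], expected: timedelta,
              slack: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return (last_sample, next_sample) pairs spaced more than expected + slack apart."""
    ordered = sorted(stamps)
    limit = expected + slack
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]

if __name__ == "__main__":
    # Made-up samples: one every 10 seconds, then nothing for 90 seconds.
    lines = ["2009-01-01 03:40:00", "2009-01-01 03:40:10", "2009-01-01 03:40:20",
             "2009-01-01 03:40:30", "2009-01-01 03:42:00"]
    for before, after in find_gaps(parse_stamps(lines), timedelta(seconds=10),
                                   timedelta(seconds=5)):
        print(f"no samples between {before:%H:%M:%S} and {after:%H:%M:%S}")
    # prints: no samples between 03:40:30 and 03:42:00
```

Lining the reported gaps up against the eviction times in the clusterware logs shows whether every eviction is preceded by the same kind of freeze.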
These scripts run locally on the server.
I am also running a simple ping command across the servers to check their status, and we see similar behavior: even the ping command stops and does not write any output before an eviction happens.
So from Oracle's perspective, it is very clear that it is either a server (hardware) or an OS issue.
At the time the node eviction happens, the server is quite idle according to the OS Watcher data.
My question is: how do we go from here? I am not a sysadmin; I am a DBA.
What other information could we gather that would tell us more about what is causing the server to hang? We don't see anything in the server log files (/var/log/messages).
What tools could we use to troubleshoot either a hardware issue or an OS issue?
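One cheap extra data point is a watchdog-style process that sleeps for a fixed interval and measures, with a monotonic clock, how late it actually woke up. Nothing can log *during* a hang, but a late wake-up records how long the box was frozen once it resumes. This is similar in spirit to the hangcheck-timer module used with RAC on Linux. A minimal sketch, assuming Python 3.6+ is available (RHEL 5 ships an older Python, so treat this as illustrative); the function names and thresholds are hypothetical:

```python
import os
import time
from datetime import datetime
from typing import Callable

def file_logger(path: str) -> Callable[[str], None]:
    """Hypothetical helper: append each message to `path` and force it to disk."""
    def write(message: str) -> None:
        with open(path, "a") as fh:
            fh.write(message + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # so the line survives a sudden reset
    return write

def watch_for_stalls(interval: float, threshold: float, iterations: int,
                     log: Callable[[str], None] = print,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Sleep `interval` seconds repeatedly and log each wake-up that is late by
    more than `threshold` seconds. Returns the number of late wake-ups."""
    stalls = 0
    last = time.monotonic()
    for _ in range(iterations):
        sleep(interval)
        now = time.monotonic()
        late = (now - last) - interval  # time beyond the requested sleep
        if late > threshold:
            stalls += 1
            log(f"{datetime.now():%Y-%m-%d %H:%M:%S} woke up {late:.2f}s late")
        last = time.monotonic()  # exclude logging time from the next measurement
    return stalls
```

For example, `watch_for_stalls(1.0, 5.0, 86400, log=file_logger("/tmp/stalls.log"))` would watch for a day. If the eviction reboots the node, the longest stall will not be recorded; the value is in catching shorter freezes that stay under the eviction threshold but share the same cause.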
---------- Post updated at 04:09 PM ---------- Previous update was at 03:45 PM ----------
Since it was obviously working until recently, the question has to be what has recently changed in the operating system on your platforms. Have OS patches been applied? Are disks filling up? Is a NIC failing?
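The "are disks filling up?" question is easy to check routinely. A minimal sketch using the standard library's `shutil.disk_usage` (Python 3.3+ assumed); the mount list and the 90% threshold are assumptions:

```python
import shutil
from typing import Iterable, List, Tuple

def full_filesystems(mounts: Iterable[str], max_pct: float = 90.0) -> List[Tuple[str, float]]:
    """Return (mount point, percent used) for each mount above max_pct.

    Percent used is used / (used + available), which is how df computes Use%,
    so blocks reserved for root are not counted as free.
    """
    over = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)
        capacity = usage.used + usage.free
        pct = 100.0 * usage.used / capacity if capacity else 0.0
        if pct > max_pct:
            over.append((mount, round(pct, 1)))
    return over

if __name__ == "__main__":
    # "/" is the only mount assumed here; add /var, the Oracle home, etc. as needed.
    for mount, pct in full_filesystems(["/"], max_pct=90.0):
        print(f"{mount} is {pct}% full")
```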
This seems to be a network problem between host2 and the other nodes. Check host2's network settings, the network cables, the private cluster interconnect, and host2's RAC settings.
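A failing NIC or cable usually shows up as error and drop counters that keep climbing. A minimal sketch that parses the Linux `/proc/net/dev` table into per-interface receive/transmit error and drop counts (Python 3 assumed); running it twice a few minutes apart and comparing the interconnect interface (e.g. a hypothetical `eth1`) is more telling than a single reading:

```python
import os
from typing import Dict

def nic_errors(text: str) -> Dict[str, Dict[str, int]]:
    """Parse /proc/net/dev text into per-interface error and drop counters."""
    stats: Dict[str, Dict[str, int]] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, rest = line.split(":", 1)
        f = [int(x) for x in rest.split()]
        # Receive fields 0-7: bytes packets errs drop fifo frame compressed multicast
        # Transmit fields 8-15: bytes packets errs drop fifo colls carrier compressed
        stats[name.strip()] = {"rx_errs": f[2], "rx_drop": f[3],
                               "tx_errs": f[10], "tx_drop": f[11]}
    return stats

if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as fh:
        for iface, counters in nic_errors(fh.read()).items():
            if any(counters.values()):
                print(iface, counters)
```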
Quote:
Originally Posted by mardaff
Hi,
Can you please mention the version of Oracle RAC, the patchset, the number of nodes, and the number of instances running on each node?
If this one doesn't fix your problem, then please reply with the info I asked for. I faced the same problem a few months ago, and now it is... gone.
If you have experienced this issue before, then you should tell us how to proceed (other than setting up the inode again, reinstalling OCFS2 or the RPMs RAC requires, and reconfiguring the inode).
Sometimes version information can be unimportant.
I have a three-node Oracle RAC cluster on RHEL4u7. A user is using sftp to move files, and the process works well on two of the three nodes. The remaining node takes forever. I have gone through the /etc/sysconfig/network-scripts/ifcfg files and everything seems to be configured correctly.