|
I had a similar problem, but it was with RHAS 2.1 and RHAS 3.0 on the same system. 2.1 would just die every once in a while, but 3.0 was fine. It was caused by the hangcheck timer (or watchdog/softdog). Heavy disk I/O was causing the timer to fail to check in, which would cause the system to reboot. Because it was a hard reboot, syslogd didn't have time to write anything to /var/log, so it took a while to figure out what the problem was. I only caught it once we had built the RHAS 2.1 cluster with the machine: the other node had STONITH messages in its log. I guess the end of this story is to do an 'lsmod' and see if you have any similar modules loaded that might react this way.
|
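If you want to script the 'lsmod' check, here is a rough sketch in Python. It just scans lsmod-style output for module names that look like a hangcheck or watchdog timer. The function name `find_watchdog_modules` and the list of name fragments (`hangcheck`, `softdog`, `watchdog`, `wdt`) are my own assumptions, not a complete list, so extend the pattern for whatever your hardware or cluster software loads.

```python
import re
import shutil
import subprocess

# Name fragments to flag. This is an assumed starting list, not a complete one.
SUSPECT = re.compile(r"hangcheck|softdog|watchdog|wdt", re.IGNORECASE)


def find_watchdog_modules(lsmod_output):
    """Return module names from lsmod-style text that match SUSPECT."""
    names = []
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header row.
        if not fields or fields[0] == "Module":
            continue
        if SUSPECT.search(fields[0]):
            names.append(fields[0])
    return names


if __name__ == "__main__":
    if shutil.which("lsmod") is None:
        print("lsmod not found on this machine")
    else:
        # check=False: just report whatever lsmod managed to print.
        result = subprocess.run(
            ["lsmod"], capture_output=True, text=True, check=False
        )
        for name in find_watchdog_modules(result.stdout):
            print(name)
```

Anything it prints is only a candidate. You would still need to check how that module is configured before blaming it for the reboots.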