The UNIX and Linux Forums  


  #5 (permalink)  
09-05-2008
avronius (VIP Member)
  
 

Join Date: Apr 2008
Location: Calgary
Posts: 305
When I run into a hardware problem that I've not experienced before, I generally run SunVTS, then Explorer, then check SunSolve and Google. If the host is still under a service contract, I call Sun.

Install SunVTS and see what it reports. Testing degrades system performance, so don't serve anything from the host while it runs. Be prepared to let it run for a few hours, and don't be surprised if it finds nothing at first: it can take a couple of days before it hits on anything.

If you can monitor the power coming into the host with something that shows spikes and dips, do so: unstable power is a likely cause of this sort of error.

Then, on a separate host:

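- Start a terminal session.
- Type: script /somewhere/date.problem_hostname.capture (substitute the current date and the problem host's name), so that everything shown in the session is logged to that file.
- Telnet into the console on the problem host and leave the session open.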

Ideally, you'll write a small shell script that runs prtdiag -v and dmesg every 3 minutes or so. If you have other utilities that you like, include them in the script. The next time the server crashes, let it complete its power cycle, then check whether any new and interesting errors have appeared. Compare the prtdiag outputs from the hours before the crash and look for drastic changes in temperature, etc.
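
Something along these lines might do for the script. It's only a rough sketch: LOGDIR, INTERVAL, COMMANDS, and the function names are all made up for the example, and it assumes prtdiag and dmesg are on root's PATH. I've pointed it at /var/tmp because /tmp on Solaris is normally swap-backed and won't survive the reboot.

```shell
#!/bin/sh
# Periodic hardware snapshot. A sketch only; adjust to taste.
# LOGDIR, INTERVAL, COMMANDS and the function names are assumptions
# made for this example, not anything standard.

LOGDIR=/var/tmp/hwmon      # hypothetical output directory (survives reboot)
INTERVAL=180               # seconds between snapshots (3 minutes)

# Commands to capture, one per line. Add your favourite utilities here.
COMMANDS="prtdiag -v
dmesg"

# Write one timestamped file holding the output of every command,
# then print the name of that file.
capture_once() {
    stamp=`date '+%Y%m%d-%H%M%S'`
    out="$LOGDIR/snapshot.$stamp"
    mkdir -p "$LOGDIR" || return 1
    echo "$COMMANDS" | while read cmd; do
        [ -n "$cmd" ] || continue
        echo "===== $cmd ====="
        # stdin comes from /dev/null so a command cannot eat the list
        sh -c "$cmd" </dev/null 2>&1
        echo
    done > "$out"
    echo "$out"
}

# Take a snapshot every INTERVAL seconds until killed.
monitor() {
    while :; do
        capture_once >/dev/null
        sleep "$INTERVAL"
    done
}

# Loop only when started as "scriptname run", so the file can also be
# sourced to take a single snapshot by hand with capture_once.
if [ "${1:-}" = "run" ]; then
    monitor
fi
```

Start it in the background as root with the "run" argument (nohup is handy here). After the crash, diff the last few snapshot files in LOGDIR against an earlier one to see what changed.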

Good luck!