Solaris 10 server crashed two times


 
# 1  
Old 01-21-2014

Hi,

I have two Solaris 10 servers. The first crashed on Monday of last week and the second crashed over the weekend. I have checked the logs (/var/adm/messages, syslog and dmesg) and so far found nothing. My management wants to know why the servers crashed, and I need to come up with a reason.

I also searched for a core file and didn't find any. Can someone guide me on what else I can do to figure out why the servers crashed?
# 2  
Old 01-21-2014
What kind of hardware is it? Does it have ILOM? If it does, then you can check ILOM logs in /SP/logs/event/list (IIRC).
This User Gave Thanks to bartus11 For This Post:
# 3  
Old 01-21-2014
Both systems reboot OK? Did you look in the older /var/adm/messages log files and not just the current messages file? Is crash dump enabled? If not, you should enable it if possible.
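
A minimal sketch of checking whether crash dumps are enabled, assuming a root shell under ksh or bash. `dumpadm` with no arguments prints the current configuration and `dumpadm -y` turns on savecore at reboot; the function name `crash_dump_status` is made up for illustration.

```shell
# Hypothetical helper: report the crash dump configuration on this host.
crash_dump_status() {
    if command -v dumpadm >/dev/null 2>&1; then
        # With no arguments, dumpadm prints the dump device, the savecore
        # directory and whether savecore runs on reboot (needs root).
        dumpadm
    else
        echo "dumpadm not found; this does not look like a Solaris host"
    fi
}

crash_dump_status || echo "dumpadm failed; are you root?"

# To enable savecore on reboot if it is off (run as root):
#   dumpadm -y
```

If savecore is enabled and the system panics, the dump is written to the savecore directory that `dumpadm` reports.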
This User Gave Thanks to fpmurphy For This Post:
# 4  
Old 01-22-2014
Quote:
Originally Posted by bartus11
What kind of hardware is it? Does it have ILOM? If it does, then you can check ILOM logs in /SP/logs/event/list (IIRC).
I have a Sun Fire E6900, which has four domains, but I only have access to one. The other three are used by different groups; I think they have currently taken them offline.
The second one is a Sun Fire E2900.

On the E6900, I went to the console and got the hostname-sc:D prompt. I typed "help" and saw this:

Code:
 
history          -- show command history
password         -- set the domain password
poweroff         -- powers off components 
poweron          -- powers on components
reset            -- reset the domain
resume           -- return to domain console
setdate          -- set the date and time for the domain
setdefaults      -- set default configuration values
setkeyswitch     -- set the keyswitch position
setls            -- set FRU location status
setupdomain      -- configure the domain
showboards       -- show board information
showcodusage     -- show COD resource usage
showcomponent    -- show state of a component
showdate         -- show the current date and time for the domain
showdomain       -- show domain configuration and status
showenvironment  -- show environmental information
showkeyswitch    -- show the keyswitch position
showlogs         -- show the logs
showresetstate   -- show CPU registers after reset
testboard        -- test a CPU/Memory board

I then typed "showlogs":

Code:
Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 384869 local0.error] Domain watchdog timer expired.
Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 180029 local0.notice] Using default hang-policy (RESET).
Jan 17 10:28:47 dev-sc Domain-D.SC: [ID 838382 local0.error] Saving reset state data before XIR.
Jan 17 10:28:50 dev-sc Domain-D.SC: [ID 580408 local0.notice] Resetting (XIR) domain.
Jan 17 10:28:50 dev-sc Domain-D.SC: [ID 815168 local0.error] Saving reset state data after XIR.

Can you advise what else I can look at?

---------- Post updated at 06:34 PM ---------- Previous update was at 06:29 PM ----------

Quote:
Originally Posted by fpmurphy
Both systems reboot OK? Did you look in the older /var/adm/messages log files and not just the current messages file? Is crash dump enabled? If not, you should enable it if possible.
Yes, both systems are online now. Both servers run Sybase, a very important DB for the company. I did check the /var/adm/messages file. It has a lot of data, but I didn't find anything useful as to why the system crashed.
# 5  
Old 01-22-2014
What does this show:
Code:
grep panic /var/adm/messages*
grep kern /var/adm/messages*
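
A rough sketch that widens the search to a few more keywords across the current and rotated messages files. The function name `scan_messages`, the keyword list and the `GREP` variable are my own assumptions. On Solaris 10 the stock /usr/bin/grep may not accept multiple -e patterns, so you may need to point GREP at /usr/xpg4/bin/grep there.

```shell
# Hypothetical helper: search every messages* file in a log directory
# for a few crash-related keywords (case-insensitive).
GREP=${GREP:-grep}

scan_messages() {
    dir=${1:-/var/adm}
    # -i ignores case; each -e adds one pattern. Lines that match any
    # pattern are printed, prefixed with the file name when there are
    # several files.
    "$GREP" -i -e panic -e watchdog -e rebooting -e savecore "$dir"/messages*
}

scan_messages /var/adm || echo "no matching lines found"
```

On Solaris it could be run as `GREP=/usr/xpg4/bin/grep` followed by `scan_messages /var/adm`. An empty result is still useful to report, since it suggests the OS did not log a panic before the reset.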
