Post mortem for critical Production AIX System Reboot/Crash


 
# 1  
Old 12-16-2011

Hello All,

A critical AIX production box crashed or rebooted while our team was working on it, and we need to produce a detailed report. Below are a few questions the report must answer. (We are the system administration team; everyone on the team has root access via sudo and also knows the root password. The box is our main application server.)

1. Who was doing what at the time of the crash/reboot?
2. Was it a user-initiated reboot?
3. If it was user initiated, how can we substantiate that?
4. Which commands do we need to run to get the desired output, and why are we running them?

Regards,
Lovesaikrishna
# 2  
Old 12-16-2011
Check system logs and user shell histories for a start.
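
A rough sketch of the history check, assuming the admins use ksh with its default history file (`$HOME/.sh_history`). The function name `scan_history` is made up for illustration. Default history files usually carry no timestamps, so a hit only corroborates other evidence; it does not prove who rebooted the box or when.

```shell
# Print history lines that look like a reboot or shutdown request.
# $1 = path to a shell history file (for example a user's .sh_history)
scan_history() {
    [ -r "$1" ] || return 1
    # tr drops NUL bytes that ksh can leave in its history file;
    # the pattern matches the command as a whole word, with or without a path.
    tr -d '\000' <"$1" |
        grep -n -E '(^|[[:space:]/])(shutdown|reboot|halt|fastboot)([[:space:]]|$)'
}
```

Run it against each administrator's history file, for example `scan_history /home/someuser/.sh_history` (the path is only an example).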
# 3  
Old 12-16-2011
The last command will show you whether it was a crash and who was logged in at that point in time.
errpt will show you whether the server had any problems.
If it's an Oracle box, particularly RAC, check the Oracle logs to see whether the node was brought down by Oracle. If you run nmon in data collection mode, check whether you exceeded maxpin or were running out of paging space.

If you have a crash dump, send it to IBM; they can determine exactly what caused the crash.
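
The checks above could be gathered in one pass with something like the following sketch. The helper names `collect` and `gather_evidence` and the output file names are made up for illustration. It also assumes that a user-initiated shutdown is logged in the error log under the label `REBOOT_ID`; verify that against your own errpt output before relying on it.

```shell
# Run one command and save its output; skip it if the command is not installed.
# $1 = output file, remaining arguments = command to run
collect() {
    out=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1 || true
    else
        echo "skipped: $1 not found" >"$out"
    fi
}

# Gather reboot/crash evidence into one directory for the report.
# $1 = directory to create and fill
gather_evidence() {
    dir=$1
    mkdir -p "$dir" || return 1
    collect "$dir/last_reboot.txt"   last reboot            # reboot records from wtmp
    collect "$dir/last_logins.txt"   last                   # who was logged in, and when
    collect "$dir/who_boot.txt"      who -b                 # time of the last boot
    collect "$dir/errpt_summary.txt" errpt                  # one line per error log entry
    collect "$dir/errpt_reboot.txt"  errpt -a -J REBOOT_ID  # assumed label, see note above
    collect "$dir/sysdump_info.txt"  sysdumpdev -L          # details of the most recent dump
}
```

For example, `gather_evidence /tmp/postmortem` leaves one text file per command, which can be attached to the report along with a note on why each command was run.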

regards
zxmaus
# 4  
Old 01-10-2012
Hi

The best way to get a root cause analysis is to open a PMR with IBM. I recently had a crash on an SAP application server running on AIX. IBM may ask you to run a few commands like
Code:
# snap -r
# snap -gbc

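Here snap -r removes the output of any previous snap run, and snap -gbc gathers general system information into a compressed archive (under /tmp/ibmsupt by default). Once you have sent all the output they need, they will come back to you with their analysis.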