Full Discussion: system getting crashed
Operating Systems Linux system getting crashed Post 302319378 by otheus on Monday 25th of May 2009 06:23:55 AM
First, edit the sysstat cron job so that it collects data more frequently. On our CentOS systems, inside /etc/cron.d/sysstat, I use:
Code:
*/10 * * * * root /usr/lib/sa/sa1 -d -I 30 20

To have this take effect immediately, you need to delete today's sa file. Otherwise, the change will start taking place tomorrow.
Code:
rm -f /var/log/sa/sa`date +%d`

The next thing is to monitor processes. Something like this should work. Add to the sysstat cron file these two lines:
Code:
1 * * * * root find /var/log/sa -name "ps-*" -cmin +300 | xargs rm -f  &>/dev/null
* * * * * root ps -N --sort comm,pid -ww 
  -o  tty:1,pid,c,pmem:5,rss:8,sz:8,size:8=TSIZE,vsz:8,nlwp,lstart,wchan,args |
  sed -n 's/^? //p' |
  awk '$4 != "0" && $5 != "0"' 
  &>/var/log/sa/ps-`date +\%H\%M`

(Note, you must put these on exactly TWO lines in the cron file. For readability, I've broken up the second entry onto multiple lines. The % signs are escaped because cron treats an unescaped % as a newline, and the date format uses %M (minute) rather than %m (month) so that each minute gets its own file.)
Every hour, the first command cleans up after the second one, deleting any ps-* files that are more than 5 hours old (to prevent the directory from getting too full). You can raise that limit if 5 hours is not enough. The second command runs every minute and saves a very detailed ps listing to disk.
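If you want to try the file naming and the cleanup rule by hand before relying on cron, something like the following might help. It is only a sketch: the function names snapshot_name and prune_snapshots are made up for illustration, and the directory and age are passed as arguments rather than hard-coded. It uses find's -exec instead of xargs, which is a substitution on my part, not what the cron entry above does.

```shell
#!/bin/bash
# Hypothetical helpers (not part of sysstat) for testing the naming and
# retention logic outside of cron.

# Print the file name used for the current minute, e.g. ps-1407.
# In an interactive shell the % signs need no escaping; in a cron file they do.
snapshot_name() {
    echo "ps-$(date +%H%M)"
}

# Remove ps-* files under directory $1 whose status last changed
# more than $2 minutes ago (300 minutes = 5 hours in the cron entry).
prune_snapshots() {
    find "$1" -name 'ps-*' -cmin +"$2" -exec rm -f {} +
}
```

For example, "prune_snapshots /var/log/sa 300" should remove the same files as the hourly cron entry, and files newer than the limit are left alone.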

If you have a hang, reboot and then run "sar -A", which should now give you very detailed information about everything. You might notice a memory spike followed by IO, or vice versa. Note the time when the problem occurs, and then go into the corresponding ps-* files to see if you can spot the problem process. You might need to look at previous ps outputs to see a change. The processes are ordered by command name and pid, so you can run "diff" on two ps files to see where a change really occurs.
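As a rough sketch of that last step, a small helper like the one below could reduce a diff of two snapshots to just the lines that differ. The name ps_changes is hypothetical, and the times and file names in the comments are placeholders, not values from a real incident.

```shell
#!/bin/bash
# Hypothetical helper: show only the lines that appear in one of two
# ps snapshots but not the other ("<" = only in the older file,
# ">" = only in the newer file).
#
# To narrow sar down to the suspect window first, sar accepts start and
# end times and a data file, for example:
#   sar -A -s 14:00:00 -e 14:30:00 -f /var/log/sa/sa25
ps_changes() {
    # diff exits non-zero when the files differ, which is the expected case
    diff "$1" "$2" | grep '^[<>]'
    return 0
}
```

For example, "ps_changes /var/log/sa/ps-1410 /var/log/sa/ps-1411" would list the processes whose line changed between those two minutes. Because rss and similar columns change often, expect some noise; the interesting entries are processes that appear, disappear, or grow sharply.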

Last edited by otheus; 05-25-2009 at 07:28 AM.. Reason: a few filters to exclude non-system non-kernel processes
 

SA1(8)								Linux User's Manual							    SA1(8)

NAME
       sa1 - Collect and store binary data in the system activity daily data file.

SYNOPSIS
       /usr/lib/sysstat/sa1 [ --boot | interval count ]

DESCRIPTION
       The sa1 command is a shell procedure variant of the sadc command and handles all of the flags and parameters of that command. The sa1 command collects and stores binary data in the /var/log/sysstat/sadd file, where the dd parameter indicates the current day. The interval and count parameters specify that the record should be written count times at interval seconds. If no arguments are given to sa1 then a single record is written.

       The sa1 command is designed to be started automatically by the cron command.

OPTIONS
       --boot This option tells sa1 that the sadc command should be called without specifying the interval and count parameters in order to insert a dummy record, marking the time when the counters restarts from 0.

EXAMPLE
       To collect data (including those from disks) every 10 minutes, place the following entry in your root crontab file:

       0,10,20,30,40,50 * * * * /usr/lib/sysstat/sa1 1 1 -S DISK

       Debian note
       The Debian sysstat package has already placed such an entry in your system crontab. Please refer to the /usr/share/doc/sysstat/README.Debian file for details.

FILES
       /var/log/sysstat/sadd
              Indicate the daily data file, where the dd parameter is a number representing the day of the month.

AUTHOR
       Sebastien Godard (sysstat <at> orange.fr)

SEE ALSO
       sar(1), sadc(8), sa2(8), sadf(1), sysstat(5)

       http://pagesperso-orange.fr/sebastien.godard/

Linux                            FEBRUARY 2012                            SA1(8)