system getting crashed: Post 302319378 by otheus on Monday 25th of May 2009 06:23:55 AM
First, go into the cron folder and update sysstat to run more frequently. On our CentOS systems, inside /etc/cron.d/sysstat, I use:
Code:
*/10 * * * * root /usr/lib/sa/sa1 -d -I 30 20
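
As a rough sanity check on those numbers (a sketch, assuming the two trailing arguments are passed through to sadc as the sampling interval in seconds and the sample count), the samples taken by one run should span about the same time as the cron period. The helper name sa1_coverage is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: seconds covered by one sa1 run,
# i.e. sampling interval (seconds) multiplied by the number of samples.
sa1_coverage() {
    echo $(( $1 * $2 ))
}

# "*/10" in the minute field starts a new run every 10 minutes = 600 seconds.
cron_period=600
covered=$(sa1_coverage 30 20)

if [ "$covered" -eq "$cron_period" ]; then
    echo "sampling is roughly continuous: ${covered}s per run"
else
    echo "gap or overlap: ${covered}s per run vs ${cron_period}s cron period"
fi
```

If you change the interval, adjust the count (or the cron schedule) so the product still matches the period.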

To have this take effect immediately, you need to delete today's sa file. Otherwise, the change will start taking place tomorrow.
Code:
rm -f /var/log/sa/sa`date +%d`

The next thing is to monitor processes. Something like this should work. Add to the sysstat cron file these two lines:
Code:
1 * * * * root find /var/log/sa -name "ps-*" -cmin +300 | xargs rm -f >/dev/null 2>&1
* * * * * root ps -N --sort comm,pid -ww 
  -o  tty:1,pid,c,pmem:5,rss:8,sz:8,size:8=TSIZE,vsz:8,nlwp,lstart,wchan,args |
  sed -n 's/^? //p' |
  awk '$4 != "0" && $5 != "0"' 
  >/var/log/sa/ps-`date +\%H\%M` 2>&1

(Note: in the cron file you must put these on exactly TWO lines. For readability, I've broken up the second entry onto multiple lines. Also note that cron treats an unescaped % as a newline, so each % in the date format is written as \%. The file name is built from the hour and minute, %H%M.)
Every hour, the first command cleans up after the second command, removing any data that is more than 5 hours old (to prevent the directory from getting too full). You can change that if it's not enough. The second command runs every minute and saves a very detailed ps listing to disk.
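
If you would rather not keep a long pipeline inside the cron file, the same two jobs could live in a small wrapper script, which also sidesteps cron's special handling of %. This is only a sketch: the script path /usr/local/sbin/ps-snapshot, the SNAP_DIR variable and the function names are assumptions of mine, not part of sysstat.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as /usr/local/sbin/ps-snapshot (assumed name).
# Possible cron entries calling it:
#   * * * * * root /usr/local/sbin/ps-snapshot snapshot
#   1 * * * * root /usr/local/sbin/ps-snapshot prune
SNAP_DIR=${SNAP_DIR:-/var/log/sa}

# Build the per-minute file name from hour and minute, e.g. ps-0623 for 06:23.
snapshot_path() {
    printf '%s/ps-%s\n' "$SNAP_DIR" "$(date +%H%M)"
}

# Same ps pipeline as the cron entry; writes one snapshot file.
take_snapshot() {
    ps -N --sort comm,pid -ww \
       -o tty:1,pid,c,pmem:5,rss:8,sz:8,size:8=TSIZE,vsz:8,nlwp,lstart,wchan,args |
      sed -n 's/^? //p' |
      awk '$4 != "0" && $5 != "0"' > "$(snapshot_path)" 2>&1
}

# Delete snapshots whose status changed more than N minutes ago (default 300).
prune_snapshots() {
    find "$SNAP_DIR" -name 'ps-*' -cmin "+${1:-300}" -exec rm -f {} +
}

# Dispatch on the first argument; do nothing when called without one.
case "${1:-}" in
    snapshot) take_snapshot ;;
    prune)    prune_snapshots "${2:-300}" ;;
esac
```

Because the script runs under a normal shell rather than being parsed by cron, the date format needs no backslashes here.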

If you have a hang, reboot and then run "sar -A", which should now give you very detailed information about everything. You might notice a memory spike followed by IO, or vice versa. Note the time when the problem occurs, and then go into the appropriate ps-* files to see if you can spot the problem process. You might need to look at previous ps outputs to see a change. The processes are ordered by command name and pid, so you can do a "diff" between two ps files to see where a change really occurs.
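
A plain diff may be noisy, because several numeric columns shift a little from minute to minute. As one possible alternative (a sketch, not the only way to do it), the function below lists processes whose resident memory grew between two snapshots. It assumes the column layout left after the sed filter: pid in field 1, rss in field 4, the five-field lstart timestamp in fields 9 to 13, wchan in field 14, and the command line from field 15 on. The name rss_growth is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: compare two ps-* snapshots and print, for every pid
# present in both, how much its rss grew, largest growth first.
# Output columns: pid, rss growth, command line.
rss_growth() {
    awk '
        FILENAME == ARGV[1] { rss[$1] = $4 + 0; next }   # older file: remember rss per pid
        ($1 in rss) && ($4 + 0 > rss[$1]) {              # newer file: same pid, larger rss
            cmd = $15
            for (i = 16; i <= NF; i++) cmd = cmd " " $i  # rebuild the command line
            printf "%s %d %s\n", $1, $4 - rss[$1], cmd
        }
    ' "$1" "$2" | sort -k2,2nr
}
```

For example, `rss_growth /var/log/sa/ps-0610 /var/log/sa/ps-0620` would compare the 06:10 and 06:20 snapshots. A pid reused by a different process between the two snapshots would be misreported, so check the command column before drawing conclusions.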

Last edited by otheus; 05-25-2009 at 07:28 AM. Reason: a few filters to exclude non-system non-kernel processes
 

CRON(8)                   BSD System Manager's Manual                  CRON(8)

NAME
     cron -- daemon to execute scheduled commands (ISC Cron V4.1)

SYNOPSIS
     cron [-n] [-x debugflags]

DESCRIPTION
     cron is normally started during system boot by rc.d(8) framework, if cron is switched on in rc.conf(5). It will return immediately so you don't have to start it with '&'.

     cron searches /var/cron/tabs for crontab files which are named after accounts in /etc/passwd. Crontabs found are loaded into memory. cron also searches for /etc/crontab which is in a different format (see crontab(5)). Finally cron looks for crontabs in /etc/cron.d if it exists, and executes each file as a crontab.

     When cron looks in a directory for crontabs (either in /var/cron/tabs or /etc/cron.d) it will not process files that:
     -   Start with a '.' or a '#'.
     -   End with a '~' or with ``.rpmsave'', ``.rpmorig'', or ``.rpmnew''.
     -   Are of zero length.
     -   Their length is greater than MAXNAMLEN.

     cron then wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists).

     Events such as START and FINISH are recorded in the /var/log/cron log file with date and time details. This information is useful for a number of reasons, such as determining the amount of time required to run a particular job. By default, root has an hourly job that rotates these log files with compression to preserve disk space.

     Additionally, cron checks each minute to see if its spool directory's modtime (or the modtime on /etc/crontab or /etc/cron.d) has changed, and if it has, cron will then examine the modtime on all crontabs and reload those which have changed. Thus cron need not be restarted whenever a crontab file is modified. Note that the crontab(1) command updates the modtime of the spool directory whenever it changes a crontab.

     The following options are available:

     -x      This flag turns on some debugging flags. debugflags is comma-separated list of debugging flags to turn on. If a flag is turned on, cron writes some additional debugging information to system log during its work. Available debugging flags are:

             sch     scheduling
             proc    process control
             pars    parsing
             load    database loading
             misc    miscellaneous
             test    test mode - do not actually execute any commands
             bit     show how various bits are set (long)
             ext     print extended debugging information

     -n      Stay in the foreground and don't daemonize cron.

   Daylight Saving Time and other time changes
     Local time changes of less than three hours, such as those caused by the start or end of Daylight Saving Time, are handled specially. This only applies to jobs that run at a specific time and jobs that are run with a granularity greater than one hour. Jobs that run more frequently are scheduled normally.

     If time has moved forward, those jobs that would have run in the interval that has been skipped will be run immediately. Conversely, if time has moved backward, care is taken to avoid running jobs twice.

     Time changes of more than 3 hours are considered to be corrections to the clock or timezone, and the new time is used immediately.

SIGNALS
     On receipt of a SIGHUP, the cron daemon will close and reopen its log file. This is useful in scripts which rotate and age log files. Naturally this is not relevant if cron was built to use syslog(3).

FILES
     /var/cron/tabs     cron spool directory
     /etc/crontab       system crontab file
     /etc/cron.d/       system crontab directory
     /var/log/cron      log file for cron events

SEE ALSO
     crontab(1), crontab(5)

AUTHORS
     Paul Vixie <vixie@isc.org>

BSD                            October 12, 2011                            BSD