Find out process that crashed the server
Post 302411949 by MarkSeger on Saturday 10th of April 2010 08:13:33 AM
collectl - your one-stop tool

I just answered a previous note about memory usage and pointed the user at collectl. A couple of things are worth noting: collectl is VERY lightweight, using on the order of <0.1% of the CPU when sampling system data every 10 seconds! When trying to track down something tricky you ALWAYS need fine-grained timing, or you never see those spikes that so often occur at the least expected time. In fact, even if you sample once a second you're still at <1%.

But back to the problem at hand. While you can certainly run ps from cron every hour, there are 2 reasons why you might not want to. First, sampling once an hour isn't really going to help much unless you get really lucky. Second, even if ps did tell you something, you would probably also want to see what else was happening at the time in question, like CPU, memory usage, open files, etc., but you won't have that data because you didn't think to collect it ahead of time.
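To make the first point concrete, here is a small Python sketch (an illustration only, not anything ps, cron, or collectl actually runs) that checks whether a periodic sampler ever lands inside a short-lived spike. The spike start time and length are made-up numbers, and the function names are hypothetical.

```python
def sample_times(interval_s, horizon_s, offset_s=0):
    """Timestamps (in seconds) at which a periodic sampler fires."""
    return range(offset_s, horizon_s, interval_s)


def catches_spike(interval_s, spike_start_s, spike_len_s, horizon_s=86_400):
    """True if at least one sample lands inside the spike window."""
    spike_end_s = spike_start_s + spike_len_s
    return any(spike_start_s <= t < spike_end_s
               for t in sample_times(interval_s, horizon_s))


# Hypothetical spike: starts 5000 s into the day and lasts 90 s.
SPIKE_START, SPIKE_LEN = 5_000, 90

hourly = catches_spike(3_600, SPIKE_START, SPIKE_LEN)
every_10s = catches_spike(10, SPIKE_START, SPIKE_LEN)
print(f"hourly ps from cron sees the spike: {hourly}")
print(f"10-second sampling sees the spike:  {every_10s}")
```

With these assumed numbers the hourly job fires at 3600 s and 7200 s and misses the spike entirely, while the 10-second sampler lands inside it several times.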

With collectl, you just start it running as a daemon and it will collect more than you would have thought to ask for. It will even collect info on your slab usage, and a runaway allocation of slab memory can certainly trigger the out-of-memory killer.
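If you just want a quick look at slab memory by hand, the kernel exposes a total in the Slab field of /proc/meminfo. The Python sketch below parses that file and reports slab as a share of RAM. It is not how collectl gathers its slab data, and the helper names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_share(fields):
    """Fraction of MemTotal currently held by the slab allocator."""
    return fields["Slab"] / fields["MemTotal"]


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"Slab: {info['Slab']} kB ({slab_share(info):.1%} of RAM)")
    except (OSError, KeyError):
        print("no /proc/meminfo with a Slab field on this system")
```

A one-off reading like this only tells you where things stand right now; to see a runaway allocation you need the value recorded over time, which is the point of leaving a collector running.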

Just note that collectl only monitors slabs/processes once a minute because these are high-load tasks...
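The idea of running cheap collectors often and expensive ones rarely can be sketched in a few lines of Python. This is only an illustration of the scheduling idea using the 10-second and once-a-minute figures mentioned above; it is not collectl's code, and the "light"/"heavy" group names are invented for the example.

```python
def collectors_due(elapsed_s, light_interval_s=10, heavy_interval_s=60):
    """Return which collector groups should run at this point in time."""
    due = []
    if elapsed_s % light_interval_s == 0:
        due.append("light")   # e.g. CPU, memory, disk, network counters
    if elapsed_s % heavy_interval_s == 0:
        due.append("heavy")   # e.g. per-process and per-slab data
    return due


# Walk through the first minute in 10-second steps.
schedule = {t: collectors_due(t) for t in range(0, 61, 10)}
for t, due in schedule.items():
    print(f"t={t:>2}s -> {', '.join(due)}")
```

The heavy group only shows up at t=0 and t=60, so the expensive walk over every process and slab happens at one sixth the rate of the cheap counters.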

-mark
 

PMATOP(1)                General Commands Manual                PMATOP(1)

NAME
       pmatop - System & Process Monitor

SYNOPSIS
       Interactive usage:
           pmatop [-g|-m] [-L linelen] [-h host] [ interval [ samples ]]

       Writing and reading raw logfiles:
           pmatop -w rawfile [ interval [ samples ]]
           pmatop -r [ rawfile ] [-g|-m] [-L linelen] [-h host]

DESCRIPTION
       The program pmatop is an interactive monitor to view the load on a Linux system. It shows the occupation of the most critical hardware resources (from a performance point of view) on system level, i.e. cpu, memory, disk and network. By default metrics from the local host are displayed, but a different host may be specified with the [-h host] option. It is modeled after atop(1) and provides a showcase for the variety of data available via pmcd(1).

       Every interval (default: 10 seconds) information is shown about the resource occupation on system level (cpu, memory, disks and network layers), followed by a list of processes which have been active during the last interval. If the list of active processes does not entirely fit on the screen, only the top of the list is shown.

       The intervals are repeated till the number of samples (specified as command argument) is reached, or till the key 'q' is pressed in interactive mode.

       When pmatop is started, it checks whether the standard output channel is connected to a screen, or to a file/pipe. In the first case it produces screen control codes (via the ncurses library) and behaves interactively; in the second case it produces flat ASCII-output.

       In interactive mode, the output of pmatop scales dynamically to the current dimensions of the screen/window.

       Furthermore in interactive mode the output of pmatop can be controlled by pressing particular keys. However it is also possible to specify such key as flag on the command line. In that case pmatop switches to the indicated mode on beforehand; this mode can be modified again interactively. Specifying such key as flag is especially useful when running pmatop with output to a pipe or file (non-interactively). These flags are the same as the keys that can be pressed in interactive mode (see section INTERACTIVE COMMANDS).

OUTPUT FORMAT
       The output of pmatop consists of system level and process level information. The system level information consists of the following output lines:

       PRC    Process and thread level totals. This line contains the total cpu time consumed in system mode (`sys') and in user mode (`user'), the total number of processes present at this moment (`#proc'), `sleeping interruptible' (`#tslpi') and `sleeping uninterruptible' (`#tslpu'), and the number of zombie processes (`#zombie').

       CPU    The occupation percentage of this process related to the available capacity for this resource on system level. This line contains the total CPU usage in system mode, in user mode, in irq mode, in idle mode, and in wait mode. The cpu lines contain this information on a per cpu basis.

       CPL    This line contains load average information for the last minute, five minutes, and fifteen minutes. Also the number of context switches and the number of device interrupts.

       MEM    This line contains the size of physical memory, free memory, page cache, buffer cache, and slab.

       SWP    This line contains the size of swap, free swap, committed space, and committed space limit.

       PAG    This line contains the number of page scans, allocstalls, swapins, and swapouts.

       LVM/MDD/DSK
              For every logical volume/multiple device/hard disk one line is shown containing the name, number of reads, and number of writes.

       NET    The first line is for the upper TCP/IP layer and contains the number of packets received, packets transmitted, packets received. The next line is one per network interface and contains the number of packets received and number of packets transmitted.

       PROCESS
              The remaining lines are one line per process and can be controlled as described below.

INTERACTIVE COMMANDS
       When running pmatop interactively (no output redirection), keys can be pressed to control the output.

       g      Show generic output (default). Per process the following fields are shown in case of a window-width of 80 positions: process-id, cpu consumption during the last interval in system- and user mode, the virtual and resident memory growth of the process. The subsequent columns are the username, number of threads in the thread group, the status and exit code are shown. The last columns contain the state, the occupation percentage for the chosen resource (default: cpu) and the process name. When more than 80 positions are available, other information is added.

       m      Show memory related output. Per process the following fields are shown in case of a window-width of 80 positions: process-id, minor and major memory faults, size of virtual shared text, total virtual process size, total resident process size, virtual and resident growth during last interval, memory occupation percentage and process name. When more than 80 positions are available, other information is added.

       Miscellaneous interactive commands:

       ?      Request for help information (also the key 'h' can be pressed).

       z      The pause key can be used to freeze the current situation in order to investigate the output on the screen. While pmatop is paused, the keys described above can be pressed to show other information about the current list of processes. Whenever the pause key is pressed again, pmatop will continue with a next sample.

SEE ALSO
       PCPIntro(1), collectl(1), perl(1), python(1), pmlogger(1), pmcd(1), pmprobe(1), pmval(1), PMAPI(3), and pcp.conf(4).

Performance Co-Pilot                    PCP                    PMATOP(1)