Operating Systems Linux Find out process that crashed the server Post 302411949 by MarkSeger on Saturday 10th of April 2010 08:13:33 AM
collectl - your one-stop tool

I just answered a previous note about memory usage and pointed the user at collectl. There are a couple of things worth noting - collectl is VERY lightweight, using on the order of <0.1% of the CPU when sampling system data every 10 seconds! When trying to track down something tricky you ALWAYS need fine-grained time or you never see those spikes that so often occur at the least expected time. In fact, if you want to sample once a second you're still at <1%.

But back to the problem at hand. While you can certainly run ps from cron every hour, there are 2 reasons why you might not want to. First of all, sampling once an hour isn't really going to help much unless you get real lucky. Second, even if ps did tell you something, you might also want to see other things that happened at the time in question, like CPU, memory usage, open files, etc., but you don't have access to them because you didn't think to ask ahead of time.
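
To put a rough number on "real lucky", here is a minimal Python sketch. It assumes the spike starts at a random point relative to the sampling schedule and that each sample is instantaneous. The function name and the 30-second spike are made up for illustration; none of this is part of collectl.

```python
def catch_probability(spike_seconds: float, interval_seconds: float) -> float:
    """Chance that at least one periodic sample lands inside a spike.

    Assumption: the spike begins at a uniformly random offset within the
    sampling period, so the chance is spike/interval, capped at 1.
    """
    if spike_seconds <= 0 or interval_seconds <= 0:
        raise ValueError("durations must be positive")
    return min(1.0, spike_seconds / interval_seconds)


if __name__ == "__main__":
    spike = 30.0  # hypothetical 30-second memory spike
    schedules = [
        ("hourly cron", 3600.0),
        ("every minute", 60.0),
        ("every 10 s", 10.0),
    ]
    for label, interval in schedules:
        chance = catch_probability(spike, interval)
        print(f"{label:>12}: {chance:.1%} chance of seeing the spike")
```

Under these assumptions an hourly ps catches such a spike less than 1% of the time, while any interval shorter than the spike always gets at least one sample inside it.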

With collectl, you just start it running as a daemon and it will collect more than you thought to ask for. It will even collect info on your slab usage, and a runaway allocation of slab memory can certainly trigger the out-of-memory killer.

Just note that collectl only monitors slabs/processes once a minute because these are high-load tasks...

-mark
 

COLGUI(1)							      colgui								 COLGUI(1)

NAME
       colgui - realtime plotting for collectl on one or more systems (all must have collectl installed)

SYNOPSIS
       colgui [-switches]
       colgui --machines machinesfile [-switches]
       colgui --hosts pattern [-switches]
       colgui --address addresses [-switches]

DESCRIPTION
       Provides a graphical user interface to collectl, displaying real-time graphs for one or more hosts. By default, plots are generated for the local system. One can specify other/additional systems via a file containing a list of those addresses, via the hosts listed in /etc/hosts by applying an appropriate filter, or by specifying a specific address or addresses at the command line.

BASIC SWITCHES
       The easiest way to get started is to use one or more of the following switches, which many people find meet most of their needs. Over time the need may arise to change the way the display looks, modify the data collection itself, simultaneously log the data as it is being collected or even change the way colgui connects to remote systems. In those situations, more advanced switches are provided.

   Common Options
       When first getting started, you can use the following switches to generate plots for your local system. To generate remote plots see the following section on "Host Selection".

       --i interval
              The frequency at which data should be collected. This is passed unaltered to collectl as -i.

       --r rowsize
              The number of plots displayed in a row before starting a new row. By default, a new row is automatically started for each host. See --geometry to alter this behavior.

       --s subsys
              Select the plots to display by the "standard" subsystems that collectl uses. This too is passed unaltered to collectl as -s.

   Host Selection
       --hosts pattern
              The hosts are chosen from the /etc/hosts file by executing the command "grep pattern /etc/hosts". The display form of the hostname will be taken from the second field if it is defined. When using "--geometry nd", the third column will be used.

       --address addresses
              One or more host names, separated by spaces and quoted if necessary. If it is desired to display a shorter hostname when "--geometry nd" is chosen, append that synonym to the hostname separated by a colon.

       --machines machinesfile
              The machinesfile is a text file similar in format to the /etc/hosts file. See below for more details on the format and how the 2nd and 3rd names (if specified) will be used.

   Alternate Plot Selection
       These additional plot selection options can be used in any combination with or without -s.

       -p plots
              Select one or more plots, many of which can also be selected by -s. For more information see "Plot Selection" further below.

       -c plots
              Select one or more custom, user developed plots. Check out /opt/colplot/examples/*cfg to see how these work...

ADVANCED PLOTTING SELECTIONS
       The first set of these affect the size of individual plots and how they are displayed.

       --xaxis int
              Change the size of the x-axis to be n-intervals wide, where an interval corresponds to "-i int" seconds.

       --yaxis int
              Change the size of the y-axis to be "int" pixels high.

       --geometry [n, c, nd, cd]
              Choose the display geometry. By default, everything displays in "normal" mode, that is, a new row is started for each host. In "compact" mode, each row is filled to the number of plots specified by -r. Dense modes, specified by adding the "d" modifier to one of the other two modes, remove many of the elements common to each plot and display them elsewhere, providing more efficient use of the screen real estate, something that becomes more important as the number of plots grows.

              NOTE - colgui always generates the same number of plots for all systems. This means that if doing detail plots where the number of networks, disks, etc. can in fact be different, colgui will pad unused entries with blank plots which won't have an active sweeper line in them.

       Some of the less common plotting switches are:

       --homogeneous
              When colgui starts up, it queries each node for its configuration since some nodes can have different numbers of devices or device names. When there are a large number of nodes this can slow down the whole startup process. This switch will set the configurations of all nodes to that of the first one queried and can significantly speed startup. Be very careful when doing detail reporting because if two systems have a different number of devices, you will either get errors or incorrect data displayed. If any device names differ (and this is always the case with lustre), all systems will show the same names and this can be confusing.

       --plottype [l, p, b, s, r]
              Line plots, the default, are displayed using connected solid lines, indexed from the beginning Y axis value. A "point" plot, also known as a scatter plot but the "s" was taken, is one in which the points are not connected. "Bar" plots are vertical bars, more often associated with business graphics. Appending the "s" to any of the first three types of plots (I told you the "s" was taken) will produce "stacked" plots (when there are multiple values being plotted), such that rather than each point being relative to the base of the y-axis, it is stacked on top of the previous one. Radial or "radar" plots are actually circular plots and this must be combined with l or p and optionally s. At this time, radial plots may produce some oddly formatted displays.

       --radint num
              By default, a radial plot has the same number of intervals as an "xy" plot, that is, based on the value of --xaxis. This switch allows setting that interval independently.

       --smooth num
              Some data may be presented as very spiky and this allows one to provide a smoothing value which softens those spikes.

       --linewidth pixels
              For those who want a wider plotting line, this is the way to go. Enter the width in pixels.

       --plotwidth pixels
              This is actually the horizontal distance between points in pixels. Changing either this or --xaxis affects the width of the plot, but this does it without changing the number of data points that will fit on it.

   Data Collection
       --count num
              The number of samples to collect. This is passed unaltered to collectl as -c.

       --colbin path
              If collectl is stored somewhere other than /usr/sbin on the target machine, use this to specify its location. However, remember that this path will be passed to ALL machines being monitored.

       --colmuxbin path
              Like --colbin, this allows you to change the location of where to look for colmux.

       --lustype [cmo]
              This defines what types of lustre machines are being monitored when -sl is selected since there is no a priori way for colgui to know that. Choose any combination of "cmo" to choose client, mds or oss, noting these types of plots will be displayed for ALL machines selected. It is passed unaltered to collectl as -L. Also be aware that for any machines NOT configured as running lustre, at least version 1.5.3 of collectl will be required.

       --nfstype [c2]
              Collectl is capable of monitoring nfs clients or servers, supporting either nfs version 2 or 3, but only 1 of the 4 combinations during any single run. By default, it is assumed a machine is running as a v3 server. To change either the version or to make the target machine a client, use this switch. It is passed unaltered to collectl as -O.

   Data Logging
       In addition to displaying plots, colgui can also be requested to log the data simultaneously.

       --logterm
              Write a copy of each record received to the terminal. Naturally the speed of the display can affect how quickly the plots can be updated.

       --log1file dir
              Create a file in the specified directory named for the host this is running on and the date/time of the data collection. Each record will be preceded by the name of the host (or address) from which the data was collected.

       --logfiles dir
              Similar to log1file except now a separate file is created for each host, named for that host as well as the date/time that the collection was started.

       You can combine --log1file and --logfiles with --logterm but not each other. If Compress::Zlib is installed, the logs will automatically be compressed. If logging to the terminal AND a file simultaneously, compression will be turned off.

   Networking
       --port number
              By default, colgui communicates over port 1234. This option allows you to select a different one.

       --proxy address
              If colgui cannot directly connect to the target machines, one can put the "colmux" program on a machine that can, using it as a proxy. Specify the address of that machine with this switch.

       --realaddress address
              When communicating through a proxy, this machine's address is hidden from other machines. Enter the address that needs to be used to connect back to this machine.

       --rsh  By default, colgui uses ssh for all communications. If not available but rsh is, select this switch.

       --username name
              If rsh or ssh requires some username other than the one being run under, this is the way to change it.

PLOT SELECTION
       One can actually select plots in one of three ways. Using -s, one selects a default plot that matches the associated subsystem(s). Some of these plots contain multiple y-axes so that they can present the maximum amount of information in the minimal amount of space.

       Using -p, one selects specific plots by name. These names can be either comma separated (no whitespace) or separated by whitespace and quoted. The list of available plots can be displayed with --showplots, some of which are those displayed via -s. Many of these plots are actually the multi-yaxis plots broken into 2 single-axis plots. A number of these plots contain data fields not available as -s plots so it's worth familiarizing yourself with them.

       Finally, when nothing quite fits the bill, one can use custom plots, referred to by -c. Here too one can specify one or more names; however, in this case these name actual files, whose default extension is "cfg". These files contain user defined plots so that you can essentially plot any data fields known by collectl! The rules of how to define a custom plot are contained in the sample mem.cfg which can be found in the examples directory. There are also a number of custom lustre plots that can display a broad set of information. These can also be used as a starting point for building your own. There are also FAQs for both colplot and colgui that may provide additional help.

       One thing to remember is that colgui and colplot actually share ALL the plots, both standard as well as custom. This means that any custom plots constructed for colgui can be used by colplot and vice versa.

       If there appear to be problems using custom plots - either colgui is reporting errors OR the data being displayed doesn't look correct - you can also see the parameters colgui will be using to generate its plots by using --showparams, which shows ALL plot definitions, not just custom ones.

       Finally, you CAN mix -s, -p and -c in any combinations you like.

MACHINES FILE
       This is a file that names the machines which are to be monitored. At a minimum, it lists one machine per line. Each entry must be an address or a name that can be resolved to an address. Additional names may be specified, separated by whitespace. If a second name exists, it will be used when a title is displayed on a plot. If it doesn't exist, the value of the first field will be displayed.

       When displaying plots in compressed/dense format, host names are displayed vertically. In some cases, the names are simply too long to fit and, if specified, the value of the 3rd field will be used; otherwise the second field will be used.

USING COLMUX AS A PROXY
       This is a feature that allows you to monitor systems to which you have no direct connectivity. This is typically the case when a machine that does have connectivity isn't configured to run X. This feature has been successfully tested in a number of configurations but certainly not all. If you do encounter problems be sure to report them.

       To use this feature, you need to find a machine to act as a proxy and which is capable of accessing the target machines via both rsh/ssh and a socket connection. If there are firewalls involved they may have to be opened up, at least for a specific port which can then be specified with "--port". Since machines can have multiple interfaces on them, be sure to use addresses that the machine running colmux can see.

       If you do encounter problems, try logging into the machine on which colmux is running and try to run it manually using the same node list but without --proxy. Often this will reveal connectivity/reachability problems you didn't realize you had.

RESTRICTIONS
       Requires at least collectl V1.5.6.

       When displaying detail data normal/dense using --geometry nd, there is only a single title line displayed for all systems. This means that if the devices are not the same, the titles can be misleading. If you're not sure what you're displaying, use --showparams to see this level of information.

AUTHOR
       This program was written by Mark Seger (mjseger@gmail.com). Copyright 2005 Hewlett-Packard Development Company, L.P.

SEE ALSO
       collectl, colmux, colplot

LOCAL								   OCTOBER 2005								 COLGUI(1)
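
The naming rules from the MACHINES FILE section of the man page above can be sketched in Python as follows. This is only an illustration, not colgui's actual parsing code: the `Machine` type, the `parse_machines` and `display_name` helpers, the example addresses, the skipping of `#` comment lines, and the fallback to the first field in dense mode when no other name exists are all assumptions.

```python
from typing import List, NamedTuple, Optional


class Machine(NamedTuple):
    address: str                # first field: address or resolvable name
    title_name: Optional[str]   # second field, if present
    short_name: Optional[str]   # third field, if present


def parse_machines(text: str) -> List[Machine]:
    """Parse machines-file text: one machine per line, whitespace-separated."""
    machines = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: blank lines and '#' comments are ignored, as in /etc/hosts.
        if not fields or fields[0].startswith("#"):
            continue
        title = fields[1] if len(fields) > 1 else None
        short = fields[2] if len(fields) > 2 else None
        machines.append(Machine(fields[0], title, short))
    return machines


def display_name(machine: Machine, dense: bool = False) -> str:
    """Pick the name to show on a plot, following the man page's description."""
    if dense:
        # Dense modes: 3rd field if given, otherwise the 2nd.
        # Falling back to the 1st field is an assumption.
        return machine.short_name or machine.title_name or machine.address
    # Normal titles: 2nd field if it exists, otherwise the 1st.
    return machine.title_name or machine.address


if __name__ == "__main__":
    sample = """\
10.0.0.1 node1.example.com n1
10.0.0.2 node2.example.com
10.0.0.3
"""
    for m in parse_machines(sample):
        print(m.address, "|", display_name(m), "|", display_name(m, dense=True))
```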