04-10-2010
collectl - your one-stop tool
I just answered a previous note about memory usage and pointed the user at collectl. There are a couple of things worth noting. collectl is VERY lightweight: it uses on the order of <0.1% of the CPU when sampling system data every 10 seconds, and even if you sample once a second you're still under 1%. That matters, because when you're trying to track down something tricky you ALWAYS need fine-grained samples, or you never see the spikes that so often occur at the least expected time.
But back to the problem at hand. While you can certainly run ps from cron every hour, there are 2 reasons why you might not want to. First of all, sampling once an hour isn't really going to help much unless you get real lucky. Second, even if ps did tell you something, you might also want to know other things that happened at the time in question, like CPU, memory usage, open files, etc., but you won't have them because you didn't think to ask ahead of time.
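To put a number on "unless you get real lucky": for point-in-time snapshots such as a cron-driven ps, if a spike lasts D seconds and one reading is taken every I seconds at an arbitrary phase, the chance that some reading lands inside the spike is min(1, D/I). The sketch below tabulates this for a few intervals; the helper names and the 30-second spike are made-up examples, not anything collectl itself computes.

```sh
#!/bin/sh
# Chance (in percent) that a spike lasting $2 seconds is caught by a sampler
# taking one instantaneous reading every $1 seconds, assuming the spike starts
# at a uniformly random point relative to the sampling schedule.
spike_catch_pct() {
    awk -v i="$1" -v d="$2" 'BEGIN {
        p = d / i
        if (p > 1) p = 1      # a spike longer than the interval is always seen
        printf "%.2f\n", p * 100
    }'
}

# Number of readings per day at an interval of $1 seconds.
samples_per_day() {
    echo $(( 86400 / $1 ))
}

for interval in 3600 60 10 1; do
    printf 'every %4ss: %5s samples/day, %6s%% chance to see a 30s spike\n' \
        "$interval" "$(samples_per_day "$interval")" "$(spike_catch_pct "$interval" 30)"
done
```

With hourly sampling a 30-second spike has well under a 1% chance of being caught, while 10-second sampling sees it every time.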
With collectl, you just start it running as a daemon and it will collect more than you thought to ask for. It will even collect info on your slab usage, and a runaway allocation of slab memory can certainly trigger the out-of-memory killer.
Just note that collectl only monitors slabs and processes once a minute, because those are higher-load tasks...
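Total slab memory is reported on the Slab: line of /proc/meminfo, split into SReclaimable and SUnreclaim on reasonably recent kernels; the unreclaimable part is usually the one that matters when chasing the OOM killer. collectl records much more detailed per-cache slab data, which this does not try to reproduce. The sketch below (Linux only, helper names hypothetical) just pulls the totals so you can see the kind of numbers involved.

```sh
#!/bin/sh
# Print slab totals (kB) from a meminfo-format file; defaults to /proc/meminfo.
# Fields the running kernel does not report are simply omitted.
slab_kb() {
    awk '
        $1 == "Slab:" || $1 == "SReclaimable:" || $1 == "SUnreclaim:" {
            sub(/:$/, "", $1)   # strip the trailing colon from the field name
            print $1, $2
        }
    ' "${1:-/proc/meminfo}"
}

# Print a timestamped SUnreclaim reading $2 times, $1 seconds apart
# ($3 optionally overrides the meminfo path). Passing 60 as the interval
# mirrors the once-a-minute slab sampling mentioned above.
watch_unreclaimable_slab() {
    n=0
    while [ "$n" -lt "$2" ]; do
        printf '%s %s kB\n' "$(date '+%H:%M:%S')" \
            "$(slab_kb "${3:-/proc/meminfo}" | awk '$1 == "SUnreclaim" { print $2 }')"
        n=$((n + 1))
        if [ "$n" -lt "$2" ]; then sleep "$1"; fi
    done
}
```

For example, `watch_unreclaimable_slab 60 1440` would log one reading a minute for a day.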
-mark
LEARN ABOUT REDHAT
netdump-server
NETDUMP-SERVER(8) System Programs NETDUMP-SERVER(8)
NAME
netdump-server - handle crash dumps over the network
SYNOPSIS
netdump-server [--port portnumber]
[--concurrent number]
[--pidfile path]
[--daemon]
[--help] [--usage]
DESCRIPTION
Listens on the network for clients that crash and uses the netdump protocol to receive a memory dump and a stack trace from them. The memory dump
and oops message are stored in a timestamped directory in /var/crash. The server can also run scripts when certain events occur.
OPTIONS
--port portnumber
Specifies the IP port number for the netdump server to listen on. The default is 6666.
--concurrent number
You can limit the number of dumps being taken concurrently. If more clients than the specified maximum connect at one
time, the excess clients are only logged and then rebooted.
--pidfile path
Write the server's process ID to a pidfile at path. The default service uses /var/run/ttywatch.pid. The default is not to write a pidfile.
--daemon
netdump-server should background itself and run as a daemon.
EXAMPLES
netdump-server --daemon
This launches the netdump-server and puts it in the background, listening for crashed clients.
EXIT STATUS
Exit status is 0 for a clean exit and non-0 for a non-clean exit.
FILES
/etc/netdump.conf
A configuration file read by netdump-server on startup. It is a "key=value" style file. Currently it supports the options: port,
max_concurrent_dumps, daemon and pidfile.
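Since /etc/netdump.conf is a plain key=value file, a helper script (for example one of the hooks in /var/crash/scripts) can read it with a few lines of awk. This is a sketch under assumptions the man page does not state: that '#' lines and blank lines are comments, that whitespace around keys and values may be trimmed, and that the last occurrence of a key wins. The function name conf_get is hypothetical.

```sh
#!/bin/sh
# Print the value for key $2 from a "key=value" file $1 (e.g. /etc/netdump.conf).
# ASSUMPTIONS: '#' comment lines and blank lines are ignored, whitespace around
# key and value is trimmed, and the last matching line wins.
# Returns non-zero if the key is absent so callers can fall back to a default.
conf_get() {
    awk -v want="$2" '
        /^[ \t]*#/       { next }   # comment line
        /^[ \t]*$/       { next }   # blank line
        index($0, "=") == 0 { next }   # not a key=value line
        {
            eq  = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            sub(/^[ \t]+/, "", key); sub(/[ \t]+$/, "", key)
            sub(/^[ \t]+/, "", val); sub(/[ \t]+$/, "", val)
            if (key == want) { found = val; hit = 1 }
        }
        END {
            if (!hit) exit 1
            print found
        }
    ' "$1"
}
```

A caller could then write `port=$(conf_get /etc/netdump.conf port) || port=6666`, falling back to the documented default port.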
/etc/init.d/netdump-server
An init script to start a default system installation of netdump-server. The service is turned off by default; use the command
/sbin/chkconfig netdump-server on
to enable the netdump-server service.
/var/crash
The main directory where the crash dump files are stored. Each dump is put in a subdirectory named with the IP address of the crashed
machine and the date and time of the crash.
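Because every dump lands in its own per-client subdirectory, a quick tally shows which machines crash repeatedly. This is a sketch that assumes each directory name starts with the client's IPv4 address followed by a separator that is neither a digit nor a dot; the exact naming format is not spelled out here, so check it against a real /var/crash before relying on it. The helper name dumps_per_ip is hypothetical.

```sh
#!/bin/sh
# Count crash dumps per client under a netdump-style crash root ($1, default /var/crash).
# ASSUMPTION: each dump directory name begins with the client's IPv4 address,
# followed by a separator that is not a digit or '.'. The scripts/ directory is skipped.
dumps_per_ip() {
    for d in "${1:-/var/crash}"/*/; do
        [ -d "$d" ] || continue          # glob matched nothing
        name=$(basename "$d")
        [ "$name" = scripts ] && continue
        printf '%s\n' "$name"
    done |
        sed -n 's/^\([0-9][0-9.]*\).*/\1/p' |   # keep the leading IP address
        sort | uniq -c | sort -rn                # most frequent crasher first
}
```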
/var/crash/scripts
This directory can contain scripts that are run at various times. They all get passed the IP address of the crashing machine as the first
argument, and each one except netdump-start gets the directory that the dump is written into as the second argument.
netdump-start - This is called when a client connects to the server to report that it has just started the netdump client. This
normally means that the machine has just booted up.
netdump-crash - This is run when a client reports that it has crashed. If it returns a non-zero value, the dump request is
ignored and the client is told to reboot immediately.
netdump-nospace - This is run when there is not enough disk space for the dump of the crashed machine. If this script exits with a
non-zero return value, netdump-server will try once again (but only once) before giving up on the dump. If this script exits with a zero
return value, netdump-server will reboot the client without performing a dump.
netdump-reboot - This is run when netdump-server is finished with a client and is about to tell the client to reboot itself.
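The exit-code convention for netdump-nospace is inverted compared with netdump-crash: non-zero asks for a retry, zero gives up. As an illustration, here is a sketch of a hypothetical netdump-nospace hook that frees space by deleting the oldest existing dump, then exits non-zero so the server tries once more. The KEEP retention policy, the use of directory mtime to find the oldest dump, and the helper names are all assumptions for the example, not part of netdump.

```sh
#!/bin/sh
# Hypothetical /var/crash/scripts/netdump-nospace hook (a sketch, not shipped code).
# Per netdump-server(8): $1 = IP of the crashing client, $2 = dump directory.
# Exit non-zero -> netdump-server retries the dump once.
# Exit zero     -> the client is rebooted without a dump.

CRASH_DIR="${CRASH_DIR:-/var/crash}"  # dump root named in the man page
KEEP="${KEEP:-3}"                     # hypothetical policy: never prune below this many old dumps

# List dump directories under $1, oldest first by mtime, skipping scripts/ and
# the in-progress directory named $2. Dump names contain no newlines, so
# reading ls output line by line is acceptable here.
list_dumps_oldest_first() {
    ls -1tr "$1" | while IFS= read -r name; do
        case $name in scripts|"$2") continue ;; esac
        if [ -d "$1/$name" ]; then printf '%s\n' "$name"; fi
    done
}

# Delete the oldest dump in $1 if more than $2 remain (ignoring $3);
# return 1 if nothing was deleted.
prune_oldest_dump() {
    count=$(list_dumps_oldest_first "$1" "$3" | wc -l | tr -d ' ')
    [ "$count" -gt "$2" ] || return 1
    oldest=$(list_dumps_oldest_first "$1" "$3" | head -n 1)
    [ -n "$oldest" ] || return 1
    rm -rf -- "$1/$oldest"
}

# Only act when invoked as a hook (IP and dump directory supplied).
if [ "$#" -ge 2 ]; then
    if prune_oldest_dump "$CRASH_DIR" "$KEEP" "$(basename "$2")"; then
        echo "netdump-nospace: pruned an old dump, retrying for $1" >&2
        exit 1    # non-zero: netdump-server tries once more
    fi
    echo "netdump-nospace: nothing to prune, skipping dump for $1" >&2
    exit 0        # zero: client is rebooted without a dump
fi
```

Only one dump is pruned per call, which fits the server's single retry; if that still does not free enough space, the second attempt fails and the dump is given up.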
SEE ALSO
netdump(8)
BUGS
Report any bugs you find to http://bugzilla.redhat.com/bugzilla
AUTHOR
Alexander Larsson <alexl@redhat.com>
Linux 14 Feb 2002 NETDUMP-SERVER(8)