Find out process that crashed the server


 
# 8  
Old 02-08-2010
Quote:
Originally Posted by Corona688
My crystal ball tells me it was emacs. :)

But the process killed isn't necessarily the one that caused the out of memory condition. The kernel tries to identify it but when the whole system is memory starved, EVERYTHING is fighting for memory...
Ah point taken.

The system has 64 gigs of memory, and probably was using 30% of it before someone/something used it all up. What would emacs be doing?

I know it is possible in vi, since I once did a substitute command in vi on a big file and drained the memory.

---------- Post updated at 08:55 ---------- Previous update was at 08:50 ----------

Quote:
Originally Posted by mikep9
debug0:2> eps() <Enter>
The eps() command will give you a process listing.

This works with SCO Unix boxes that have had a kernel panic.
I tried looking for the eps() command on my box, RHEL, but couldn't find it. Is it available on RHEL or just SCO Unix?
# 9  
Old 02-09-2010
I think you misunderstood corona.

When a Linux box runs out of memory it starts killing processes (I think randomly) to free memory. In this case emacs was just the prey. As corona also said, you can't tell from these messages which other processes were using up all the memory and so caused this behaviour (the killing of other processes).

So, as already suggested, you might want to write a little script and place it in the crontab so that it takes a snapshot with ps and pmap every hour, every 10 minutes or whatever. Then compare the snapshots and look at which processes grow in memory usage between a fresh reboot and the time the problem usually starts.
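A rough sketch of that snapshot-and-compare idea in Python, in case it helps. It assumes a procps-style `ps` that accepts `-eo pid,rss,comm`; the function names (`parse_ps`, `growth`, `take_snapshot`, `report`) are made up for illustration, and PIDs reused between two snapshots would be misattributed.

```python
import subprocess


def parse_ps(text):
    """Turn `ps -eo pid,rss,comm` output into {pid: (rss_kib, command)}."""
    procs = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            procs[int(parts[0])] = (int(parts[1]), parts[2])
    return procs


def growth(before, after):
    """Return (delta_kib, pid, command) tuples, largest growth first."""
    deltas = []
    for pid, (rss, comm) in after.items():
        # A process missing from the earlier snapshot counts as starting at 0.
        old_rss = before.get(pid, (0, comm))[0]
        deltas.append((rss - old_rss, pid, comm))
    return sorted(deltas, reverse=True)


def take_snapshot():
    """Run ps once and return the parsed snapshot (assumes procps ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps(result.stdout)


def report(before, after, limit=10):
    """Print the processes whose resident memory grew the most."""
    for delta, pid, comm in growth(before, after)[:limit]:
        print(f"{delta:>10} KiB  {pid:>7}  {comm}")
```

Run from cron, each invocation could save `take_snapshot()` to a timestamped file, and `report()` could then be used on any two saved snapshots. Only resident size is compared here; pmap output would have to be captured separately.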
# 10  
Old 02-10-2010
Quote:
Originally Posted by zaxxon
I think you misunderstood corona.

When a Linux box runs out of memory it starts killing processes (I think randomly) to free memory.
Checking the source, it's got a complex scoring system to measure a process' "badness". It preferentially kills:
  • Things with lots of memory.
  • Things with lots of children (forkbombs).
  • Things with very high total CPU time, i.e. endless allocation loops.
  • Low-priority and/or non-root things (since they're presumably less important).
  • Above all else, swapoff. duh.
But it can only measure the stats, and gauges what's safe to kill as much as what should be killed.

This doesn't rule out emacs, either! It might have been killed because it was consuming too much memory. Or it might have been killed to make way for a runaway process that had higher priority or access privileges than it, which the OOM killer preferentially keeps.
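For anyone who wants to see the scores on a live box: Linux exposes the kernel's current badness value for each process in /proc/<pid>/oom_score. The sketch below just reads those files and sorts them; `oom_ranking` and `read_first_line` are names invented here, and the `comm` file used for the process name may be missing on older kernels, in which case the name is shown as "?".

```python
import os


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as handle:
            return handle.readline().strip()
    except OSError:
        return None  # process exited meanwhile, or file not readable


def oom_ranking(proc_root="/proc"):
    """Return [(oom_score, pid, name)], highest score (likeliest victim) first."""
    ranking = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        score = read_first_line(os.path.join(proc_root, entry, "oom_score"))
        if score is None or not score.isdigit():
            continue
        name = read_first_line(os.path.join(proc_root, entry, "comm")) or "?"
        ranking.append((int(score), int(entry), name))
    return sorted(ranking, reverse=True)
```

Printing the first few entries of `oom_ranking()` shows which processes the kernel would currently pick first. It only reflects the scores at that instant, so it says nothing about what the scores were when the server actually ran out of memory.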
# 11  
Old 04-10-2010
collectl - your one-stop tool

I just answered a previous note about memory usage and pointed the user at collectl. There are a couple of things worth noting - collectl is VERY lightweight, on the order of using <0.1% of the cpu when sampling system data every 10 seconds! When trying to track down something tricky you ALWAYS need fine-grained time or you never see those spikes that so often occur at the least expected time. In fact if you want to sample once a second you're still <1%.

But back to the problem at hand. While you can certainly run ps from cron every hour, there are 2 reasons why you might not want to. First of all, sampling once an hour isn't really going to help much unless you get real lucky. Second, even if ps did tell you something, you might also want to see other things that happened at the time in question, like CPU, memory usage, open files, etc., but you don't have access to them because you didn't think to ask ahead of time.

With collectl, you just start it running as a daemon and it will collect more than you thought of to ask. It will even collect info on your slab usage, and a runaway allocation of slab memory can certainly trigger the out-of-memory killer.

Just note that collectl only monitors slabs/processes once a minute because those are high-load tasks...

-mark
# 12  
Old 04-10-2010
Thanks Mark! collectl seems to be the type of monitor tool I need to use to keep track of events; certainly as you have pointed out a much better solution than cron and ps. I'll give it a go. Cheers!

Dave