Random Crashing


 
# 1  
Old 12-23-2012

Over the last month or so, my CentOS server has been crashing for reasons I cannot identify. It ran for over a year with regular yum updates and no problems. The load on the server is perfectly normal: CPU usage sits at 5-6% and RAM usage at less than half of the 32 GB installed (multiple smaller game servers run off this box). I am not sure this is a software issue at all.

I have pasted the part of my /var/log/messages file that leads up to my latest crash. I am a CentOS newb, so this is gibberish to me. Does anything in the file point to a crash of some kind? Are there other logs I could check and paste? If not, I would be led to believe there is a hardware issue or overheating.

Here are the messages:
Code:
Dec 21 14:58:03 server1 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
Dec 21 14:58:03 server1 kernel: Hardware name: X9SCL/X9SCM
Dec 21 14:58:03 server1 kernel: NETDEV WATCHDOG: eth2 (e1000e): transmit queue 0 timed out
Dec 21 14:58:03 server1 kernel: Modules linked in: fuse autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 sg microcode serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Dec 21 14:58:03 server1 kernel: Pid: 0, comm: swapper Not tainted 2.6.32-279.14.1.el6.x86_64 #1
Dec 21 14:58:03 server1 kernel: Call Trace:
Dec 21 14:58:03 server1 kernel: <IRQ>  [<ffffffff8106b7b7>] ? warn_slowpath_common+0x87/0xc0
Dec 21 14:58:03 server1 kernel: [<ffffffff8106b8a6>] ? warn_slowpath_fmt+0x46/0x50
Dec 21 14:58:03 server1 kernel: [<ffffffff81459c0d>] ? dev_watchdog+0x26d/0x280
Dec 21 14:58:03 server1 kernel: [<ffffffff8108caad>] ? insert_work+0x6d/0xb0
Dec 21 14:58:03 server1 kernel: [<ffffffff814599a0>] ? dev_watchdog+0x0/0x280
Dec 21 14:58:03 server1 kernel: [<ffffffff8107e937>] ? run_timer_softirq+0x197/0x340
Dec 21 14:58:03 server1 kernel: [<ffffffff810a23c0>] ? tick_sched_timer+0x0/0xc0
Dec 21 14:58:03 server1 kernel: [<ffffffff8102b40d>] ? lapic_next_event+0x1d/0x30
Dec 21 14:58:03 server1 kernel: [<ffffffff81073f61>] ? __do_softirq+0xc1/0x1e0
Dec 21 14:58:03 server1 kernel: [<ffffffff81096d60>] ? hrtimer_interrupt+0x140/0x250
Dec 21 14:58:03 server1 kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
Dec 21 14:58:03 server1 kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
Dec 21 14:58:03 server1 kernel: [<ffffffff81073d45>] ? irq_exit+0x85/0x90
Dec 21 14:58:03 server1 kernel: [<ffffffff81506450>] ? smp_apic_timer_interrupt+0x70/0x9b
Dec 21 14:58:03 server1 kernel: [<ffffffff8100bc13>] ? apic_timer_interrupt+0x13/0x20
Dec 21 14:58:03 server1 kernel: <EOI>  [<ffffffff812cddbe>] ? intel_idle+0xde/0x170
Dec 21 14:58:03 server1 kernel: [<ffffffff812cdda1>] ? intel_idle+0xc1/0x170
Dec 21 14:58:03 server1 kernel: [<ffffffff8109929d>] ? sched_clock_cpu+0xcd/0x110
Dec 21 14:58:03 server1 kernel: [<ffffffff81407c27>] ? cpuidle_idle_call+0xa7/0x140
Dec 21 14:58:03 server1 kernel: [<ffffffff81009e06>] ? cpu_idle+0xb6/0x110
Dec 21 14:58:03 server1 kernel: [<ffffffff814f754f>] ? start_secondary+0x22a/0x26d
Dec 21 14:58:03 server1 kernel: ---[ end trace c6b419e0a29214c3 ]---
Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:58:04 server1 abrtd: Directory 'oops-2012-12-21-14:58:04-2219-0' creation detected
Dec 21 14:58:04 server1 abrt-dump-oops: Reported 1 kernel oopses to Abrt
Dec 21 14:58:04 server1 abrtd: Can't open file '/var/spool/abrt/oops-2012-12-21-14:58:04-2219-0/uid': No such file or directory
Dec 21 14:58:06 server1 kernel: Bridge firewalling registered
Dec 21 14:58:13 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:13 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:13 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:58:14 server1 abrtd: Sending an email...
Dec 21 14:58:14 server1 abrtd: Email was sent to: root@localhost
Dec 21 14:58:14 server1 abrtd: New problem directory /var/spool/abrt/oops-2012-12-21-14:58:04-2219-0, processing
Dec 21 14:58:14 server1 abrtd: Can't open file '/var/spool/abrt/oops-2012-12-21-14:58:04-2219-0/uid': No such file or directory
Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:23 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:33 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:33 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:43 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:43 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:58:53 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:58:53 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:58:53 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:59:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:59:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:59:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 14:59:13 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 14:59:13 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 14:59:13 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Dec 21 15:03:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
Dec 21 15:03:03 server1 kernel: e1000e 0000:02:00.0: eth2: Error reading PHY register
Dec 21 15:03:03 server1 kernel: e1000e: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

Thanks in advance.
# 2  
Old 12-23-2012
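The watchdog is doing what it is supposed to do. In this log it is not a daemon that resets the system: "NETDEV WATCHDOG" is a timer in the kernel's network layer (dev_watchdog in net/sched/sch_generic.c) that fires when a transmit queue has made no progress for too long. It then asks the driver, here e1000e, to recover, which is why "Reset adapter" shows up right after the warning.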

In this instance it is complaining about what is going on with your Ethernet (network) port, that is, your NIC (eth2).
This is generally hardware: a loose connection, a bad component, and so on. If it happens every time within a short period of booting, an incorrect network configuration may be the cause instead.
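
To see when the resets happen and how closely they are spaced, the "Reset adapter" lines can be pulled out of the log. This is only a rough sketch, assuming Python 3 and the line layout shown in the paste above; the function names and the year argument are made up for illustration (syslog timestamps carry no year).

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"e1000e\s+\S+\s+(?P<iface>\w+):\s+Reset adapter"
)


def reset_times(lines, year=2012):
    """Return {interface: [datetime, ...]} for every 'Reset adapter' line."""
    events = defaultdict(list)
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            # The year is an assumption; the log itself does not record it.
            stamp = datetime.strptime(
                "{} {}".format(year, match.group("stamp")), "%Y %b %d %H:%M:%S"
            )
            events[match.group("iface")].append(stamp)
    return dict(events)


def gaps_in_seconds(times):
    """Seconds between consecutive reset events."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    # Three lines copied from the pasted log; on the server you would
    # read /var/log/messages instead.
    sample = [
        "Dec 21 14:58:03 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter",
        "Dec 21 14:58:13 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter",
        "Dec 21 14:58:23 server1 kernel: e1000e 0000:02:00.0: eth2: Reset adapter",
    ]
    for iface, times in reset_times(sample).items():
        print(iface, gaps_in_seconds(times))
```

In the pasted log the resets on eth2 repeat every 10 seconds from 14:58:03 to 14:59:13, so the adapter never really recovers once the first timeout hits. Comparing the first reset against the boot time would show whether it happens shortly after booting or only after days of uptime.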

I don't know enough about this to give you more information - maybe someone else does.

You do need to make sure the error "Can't open file '/var/spool/abrt/oops-2....'" gets resolved: check whether the disk is full, and check that the directory /var/spool/abrt exists and is writable.
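
Those checks can be scripted. A minimal sketch, assuming Python 3; the function name and the 100 MB free-space threshold are arbitrary choices for illustration, not anything abrt itself defines. Run it as the user that needs to write there (normally root), since the writability test depends on who runs it.

```python
import os
import shutil


def check_spool_dir(path="/var/spool/abrt", min_free_bytes=100 * 1024 * 1024):
    """Return a list of problems found with the directory (empty if none)."""
    problems = []
    if not os.path.isdir(path):
        problems.append("{} does not exist or is not a directory".format(path))
        return problems
    # Creating files in a directory needs both write and search permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append("{} is not writable by the current user".format(path))
    # Free space on the filesystem that holds the directory.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append("only {} bytes free on the filesystem holding {}".format(free, path))
    return problems


if __name__ == "__main__":
    found = check_spool_dir()
    if found:
        for problem in found:
            print("PROBLEM:", problem)
    else:
        print("/var/spool/abrt looks fine")
```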

Last edited by jim mcnamara; 12-23-2012 at 11:07 AM.
# 3  
Old 12-23-2012
Well, if it helps at all, the crashes are random. At their most frequent they happen every 3-4 days, and sometimes it can take a week or more.
# 4  
Old 12-23-2012
It is the NIC and the network transmit path that are tripping the watchdog. I know enough Red Hat to know that I do not know how to diagnose and then remedy the problem.

Start with connectivity: ports, Ethernet cables, maybe the NIC itself.

If that does not help, then assume the network configuration is wrong and start with man ifconfig. I assume that since everyone uses it, Red Hat does, too.
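
Before changing anything, it may be worth looking at the error counters the kernel keeps for the interface, which are the same figures ifconfig reports. A rough sketch, assuming Python 3 and the standard Linux sysfs layout under /sys/class/net; the function name and the choice of counters are only for illustration.

```python
import os

# A few of the per-interface counters Linux exposes under
# /sys/class/net/<iface>/statistics/
COUNTERS = (
    "tx_errors",
    "tx_dropped",
    "tx_carrier_errors",
    "rx_errors",
    "rx_dropped",
    "rx_crc_errors",
)


def read_counters(iface="eth2", sysfs_root="/sys/class/net"):
    """Return {counter_name: value} for the counters that could be read."""
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    result = {}
    for name in COUNTERS:
        try:
            with open(os.path.join(stats_dir, name)) as handle:
                result[name] = int(handle.read().strip())
        except (OSError, ValueError):
            # Interface or counter missing, or unreadable: skip it.
            continue
    return result


if __name__ == "__main__":
    counters = read_counters("eth2")
    if not counters:
        print("no counters found for eth2")
    for name, value in sorted(counters.items()):
        print("{:20s} {}".format(name, value))
```

Counters that keep climbing between two runs, carrier or CRC errors in particular, would point more toward the cable, switch port, or NIC than toward configuration.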
 