Full Discussion: System check after crash?
Post 302352535 by rhfrommn on Friday 11th of September 2009 05:02:47 PM in UNIX for Dummies Questions & Answers
It is hard to tell for sure what is going on from the info you have. Here are a couple of hints and starting points.

The Unix equivalent of ScanDisk is called "fsck". I've run it many times and never seen output like you're getting, so I doubt it is a disk check. You could google it to familiarize yourself with it so you can be sure.
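One way to tell whether fsck found anything, without reading its output, is its exit status. On Linux, the fsck wrapper from util-linux reports its result as a sum of bit flags. A minimal Python sketch, assuming the util-linux codes (other Unix variants document their own, so check your local man page); the dictionary and function names are hypothetical.

```python
from __future__ import annotations

# Exit-status bits documented for the Linux (util-linux) fsck(8) wrapper.
# Other Unix variants use different codes; check your local man page.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}


def decode_fsck_status(status: int) -> list[str]:
    """Translate an fsck exit status into readable conditions."""
    if status == 0:
        return ["no errors"]
    # The exit status is the bitwise OR of every condition that occurred.
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]


# 5 == 1 | 4: some errors were fixed, but others were left uncorrected.
print(decode_fsck_status(5))
# prints ['filesystem errors corrected', 'filesystem errors left uncorrected']
```

In a shell the same number is available as $? right after fsck returns.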

Based on all the hex addresses you're getting, I would guess you're getting a panic, analogous to the "blue screen of death" in Windows. This is often caused by a software problem (writing to restricted memory, an illegal instruction, etc.) but can also be caused by hardware errors. If this is a server with a service processor of some kind, try to access it and see if it logged any hardware faults or messages. If it's just a PC, obviously that won't work.
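The rule of thumb above (lots of hex addresses points to a panic rather than a disk check) can be written down as a rough triage check for captured console output. A minimal Python sketch; the keyword match, the hex pattern and the threshold of five addresses are assumptions chosen for illustration, not a reliable panic detector.

```python
import re

# Hex tokens such as 0x1f005000; at least 6 digits to skip small constants.
HEX_ADDR = re.compile(r"\b0x[0-9a-fA-F]{6,}\b")
PANIC_WORD = re.compile(r"\bpanic\b", re.IGNORECASE)


def looks_like_panic(console_text: str, min_hex_tokens: int = 5) -> bool:
    """Guess whether captured console text resembles a kernel panic.

    The default threshold of 5 hex tokens is an arbitrary assumption.
    """
    # An explicit "panic" message is the strongest hint.
    if PANIC_WORD.search(console_text):
        return True
    # Otherwise, a dense cluster of addresses suggests a register or stack dump.
    return len(HEX_ADDR.findall(console_text)) >= min_hex_tokens
```

A negative result proves nothing; it only means the text does not match these two crude patterns.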

You probably need to reboot or power cycle in either case. Carefully watch the messages and see if you can find a clue there.

Good luck.
 

MCELOG(8)						  Linux's Administrator's Manual						 MCELOG(8)

NAME
mcelog - Decode kernel machine check log on x86 machines

SYNOPSIS
mcelog [options] [device]
mcelog [options] --daemon
mcelog [options] --client
mcelog [options] --ascii
mcelog --version

DESCRIPTION
X86 CPUs report errors detected by the CPU as machine check events (MCEs). These can be data corruption detected in the CPU caches or in main memory by an integrated memory controller, data transfer errors on the front side bus or CPU interconnect, or other internal errors. Possible causes include cosmic radiation, unstable power supplies, cooling problems, broken hardware, or bad luck.

Most errors can be corrected by the CPU's internal error correction mechanisms. Uncorrected errors cause machine check exceptions, which may panic the machine.

When a corrected error happens, the x86 kernel writes a record describing the MCE into an internal ring buffer available through the /dev/mcelog device. mcelog retrieves errors from /dev/mcelog, decodes them into a human-readable format and prints them on standard output or, optionally, into the system log. It can also take further options, such as keeping statistics or triggering shell scripts on specific events.

The normal operating modes for mcelog are running as a regular cron job (the traditional way, now deprecated), running as a trigger executed directly by the kernel, or running as a daemon with the --daemon option.

When an uncorrected machine check error happens that the kernel cannot recover from, the kernel will usually panic the system. In this case, if there was a warm reset after the panic, mcelog should pick up the machine check errors after reboot. This is not possible after a cold reset.

In addition, mcelog can be used on the command line with the --ascii option to decode the kernel's text output for a fatal machine check panic. This is typically used to decode the panic console output of a fatal machine check if the system was power cycled or mcelog didn't run immediately after reboot. When the panic triggers a kdump kexec crash kernel, the crash kernel's boot-up script should log the machine checks to disk; otherwise they might be lost.
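Because machine checks that nobody records can be lost, a boot-up or cron script might run mcelog once and append whatever it prints to a file. A minimal Python sketch, assuming mcelog is on PATH and that running it without arguments reads the default /dev/mcelog (the mcelog [options] [device] form); the helper name and log path are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def save_mcelog_output(logfile: str, cmd: tuple = ("mcelog",)) -> bool:
    """Run mcelog once and append its decoded output to logfile.

    Hypothetical helper: the default command and the log path are assumptions.
    Raises FileNotFoundError if the command is not installed.
    Returns True if anything was written.
    """
    # check=False: mcelog's exit status is not interpreted in this sketch.
    result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    if not result.stdout:
        return False
    stamp = datetime.now(timezone.utc).isoformat()
    # Append rather than overwrite, so earlier records are never clobbered.
    with Path(logfile).open("a", encoding="utf-8") as fh:
        fh.write(f"--- {stamp} ---\n{result.stdout}")
    return True
```

For example, a boot script could call save_mcelog_output("/var/log/mcelog-boot.log"), where the path is only an illustration. For ongoing monitoring, the --daemon mode described in this page, which logs to /var/log/mcelog itself, is the recommended setup.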
Note that once mcelog retrieves an error, the kernel no longer stores it (unlike dmesg(1)), so the output should always be saved somewhere and mcelog should not be run in uncontrolled ways.

OPTIONS
When the --syslog option is specified, output is redirected to the system log. The --syslog-error option causes normal machine checks to be logged as LOG_ERR (implies --syslog). Normally only fatal errors or high-level remarks are logged with error level. High-level one-line summaries of specific errors are also logged to syslog by default unless mcelog operates in --ascii mode.

When the --logfile=file option is specified, log output is appended to the specified file.

With the --no-syslog option, mcelog will never log anything to syslog.

When the --cpu=cputype option is specified, cputype is used as the CPU type for decoding. See mcelog --help for a list of valid CPUs. Note that specifying an incorrect CPU can lead to incorrect decoding output. The default is either the CPU of the machine that reported the machine check (needs a newer kernel version) or the CPU of the machine mcelog is running on, so normally this option doesn't have to be used. Older versions of mcelog had separate options for different CPU types. These are still implemented, but are now deprecated and undocumented.

With the --dmi option, mcelog will look up the addresses reported in machine checks in the SMBIOS/DMI tables of the BIOS. This can sometimes tell you which DIMM or memory controller has developed a problem. More often, the information reported by the BIOS is either subtly or obviously wrong or useless. This option requires that mcelog has read access to /dev/mem (normally requires root) and runs on the same machine, in the same hardware configuration, as when the machine check event happened.

When --ignorenodev is specified, mcelog will exit silently when the device cannot be opened. This is useful in virtualized environments with limited devices.

When --filter is specified, mcelog will filter out known broken machine check events (the default). When the --no-filter option is specified, mcelog does not filter events.
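A wrapper script can make a similar check before calling mcelog, for instance on a virtual machine without the device, in the spirit of --ignorenodev. A Python sketch using the device numbers listed under FILES (char 10, minor 227); the function name is hypothetical, and this is not how mcelog itself decides to exit.

```python
import os
import stat

# From the FILES section: /dev/mcelog is character device 10, minor 227.
MCELOG_MAJOR, MCELOG_MINOR = 10, 227


def mcelog_device_available(path: str = "/dev/mcelog") -> bool:
    """Return True if path is the mcelog character device and is readable."""
    try:
        st = os.stat(path)
    except OSError:
        # Missing device, e.g. in a virtualized environment.
        return False
    if not stat.S_ISCHR(st.st_mode):
        return False
    if (os.major(st.st_rdev), os.minor(st.st_rdev)) != (MCELOG_MAJOR, MCELOG_MINOR):
        return False
    # Reading the device normally requires root.
    return os.access(path, os.R_OK)
```

A script could simply skip the mcelog call when this returns False, roughly mirroring what --ignorenodev does inside mcelog.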
When --raw is specified, mcelog will not decode, but just dump the mcelog in a raw hex format. This can be useful for automatic post-processing.

When a device is specified, the machine check logs are read from device instead of the default /dev/mcelog.

With the --ascii option, mcelog decodes a fatal machine check panic generated by the kernel ("CPU n: Machine Check Exception ...") in ASCII from standard input and exits afterwards. Note that when the panic comes from a different machine than the one mcelog is running on, you might need to specify the correct cputype on older kernels. On newer kernels, which output the PROCESSOR field, this is no longer needed.

When the --file filename option is specified, mcelog --ascii will read the ASCII machine check record from the input file filename instead of standard input.

With the --config-file file option, mcelog reads the specified config file. The default is /etc/mcelog/mcelog.conf. See also CONFIG FILE below.

With the --daemon option, mcelog will run in the background. This gives the fastest reaction time and is the recommended operating mode. This option implies --logfile=/var/log/mcelog. Important messages will be logged as one-line summaries to syslog unless --no-syslog is given. The option --foreground will prevent mcelog from giving up the terminal in daemon mode. This is intended for debugging.

With the --client option, mcelog will query a running daemon for accumulated errors.

With the --cpumhz=mhz option, mcelog assumes the CPU has mhz frequency when decoding the time of the event from the CPU time stamp counter. This also forces decoding. Note that this can be unreliable on some systems with CPU frequency scaling or deep C states, where the CPU time stamp counter does not increase linearly. By default, the frequency of the current CPU is used when mcelog determines it is safe to do so. Newer kernels report the time directly in the event and don't need this anymore.

The --pidfile file option writes the process id of the daemon into file file.
Only valid in daemon mode.

--version displays the version of mcelog and exits.

CONFIG FILE
mcelog supports a config file to set defaults. Command line options override the config file. By default the config file is read from /etc/mcelog/mcelog.conf unless overridden with the --config-file option.

The general format is

    optionname = value

White space is currently not allowed in value, except at the end, where it is dropped. Comments start with #.

All command line options that are not commands can be specified in the config file. For example, to enable the --no-syslog option use

    no-syslog = yes

(or no to disable). When the option takes an argument, use

    logfile = /tmp/logfile

NOTES
The kernel prefers old messages over new ones: if the log buffer overflows, only the old ones will be kept. The exact output in the log file depends on the CPU, unless the --raw option is used. mcelog will report serious errors to syslog during decoding.

SIGNALS
When mcelog runs in daemon mode and receives a SIGUSR1, it will close and reopen its log files. This can be used to rotate logs without restarting the daemon.

FILES
/dev/mcelog (char 10, minor 227)
/etc/mcelog/mcelog.conf
/var/log/mcelog
/var/run/mcelog.pid

SEE ALSO
AMD x86-64 Architecture Programmer's Manual, Volume 2: System Programming
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3: System Programming Guide, Parts 1 and 2. Machine checks are described in Chapter 14 in Part 1 and in Appendix E in Part 2.
Datasheet of your CPU.

May 2009						 MCELOG(8)
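The CONFIG FILE rules in the mcelog(8) page (optionname = value, comments starting with #, no white space inside a value, trailing white space dropped, command line options overriding the file) are simple enough to sketch as a parser for a script that needs to read the same file. This is a hypothetical Python illustration, not mcelog's own parser; treating # only as a whole-line comment and rejecting lines without "=" are assumptions.

```python
from __future__ import annotations


def parse_mcelog_conf(text: str) -> dict[str, str]:
    """Parse 'optionname = value' lines into a dict (a sketch).

    Assumptions: only whole-line '#' comments are recognized, and a value
    containing inner white space is rejected, per the man page's rule.
    """
    options: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()  # trailing white space is dropped
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'optionname = value'")
        name, value = name.strip(), value.strip()
        if any(ch.isspace() for ch in value):
            raise ValueError(f"line {lineno}: white space inside value {value!r}")
        options[name] = value
    return options


def merge_options(conf: dict[str, str], cli: dict[str, str]) -> dict[str, str]:
    """Command line options override values from the config file."""
    return {**conf, **cli}
```

For example, parsing "no-syslog = yes" and "logfile = /tmp/logfile" yields {"no-syslog": "yes", "logfile": "/tmp/logfile"}, and merging with a command line {"logfile": "/var/log/mcelog"} keeps no-syslog but takes the command line logfile.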
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.