Overheating causing system shutdowns: Post 302430790 by Corona688 on Friday 18th of June 2010 05:52:06 PM
Quote:
Originally Posted by Narnie
Here is the most recent kernel error.
Where are you getting these errors? If you're reading them from the logfile, understand that severe things like kernel panics won't be written to it -- a crashed system writes no files. To see the really bad messages, you have to watch a raw system console.

Where it says "machine check events logged", you can get more info by running the "mcelog" command, though I think that buffer gets cleared by a reboot...
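
As a rough sketch of the above, something like the following could pick that notice out of saved log text so you know when to go run mcelog. The function name and the sample lines are made up for illustration, and the exact wording of the notice on your kernel is an assumption based on the phrase quoted here.

```python
import re

# Case-insensitive pattern for the kernel notice quoted in the post.
MCE_NOTICE = re.compile(r"machine check events logged", re.IGNORECASE)


def find_mce_notices(log_text):
    """Return the lines of log_text that mention logged machine check events."""
    return [line for line in log_text.splitlines() if MCE_NOTICE.search(line)]


# Made-up sample text for illustration only; real log lines will differ.
sample = "\n".join([
    "example: CPU0 temperature/speed normal",
    "example: Machine check events logged",
])

for line in find_mce_notices(sample):
    print("run mcelog soon after seeing:", line)
```

This only finds the notice; the details still have to come from mcelog itself, and probably before the next reboot.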

Quote:
It never says it is shutting down. It just "dies" even after saying the temp/speed is normal.
It's pretty hard for the kernel to fake thermal throttling; I think it's really overheating. If Linux has taken control of the fans away from your BIOS, glitched readings from lm_sensors could keep your fans at low speed even under heavy load, and bad calibration in lm_sensors.conf could prevent the fans from speeding up as fast as they need to. Try disabling lm_sensors so it doesn't seize control of the sensors, leaving the fan under BIOS control. Also try cat-ing the temperature from something like /proc/acpi/processor/CPU0/THRM (if available).
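
If you'd rather log the temperature than cat it by hand, here is a minimal sketch. It assumes the file contains a line shaped like "temperature: 54 C", which is an assumption about the old /proc/acpi format and not something confirmed for your machine; the function names are hypothetical, and the path may not exist at all on your kernel.

```python
import re
from pathlib import Path

# Assumed line format, e.g. "temperature:             54 C".
TEMP_LINE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_temperature(text):
    """Return the temperature in Celsius found in text, or None if absent."""
    match = TEMP_LINE.search(text)
    return int(match.group(1)) if match else None


def read_temperature(path):
    """Read one temperature file; return None if it is missing or unparseable."""
    try:
        return parse_temperature(Path(path).read_text())
    except OSError:
        return None


# Path taken from the post; it is only present on some systems.
print(read_temperature("/proc/acpi/processor/CPU0/THRM"))
```

Calling read_temperature in a loop while the machine is under load would show whether the reading really climbs before it dies. A result of None just means the file isn't there or doesn't look the way this sketch assumes.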

Last edited by Corona688; 06-18-2010 at 06:58 PM..

ACPI_THERMAL(4)          BSD Kernel Interfaces Manual          ACPI_THERMAL(4)

NAME
     acpi_thermal -- ACPI thermal management subsystem

SYNOPSIS
     device acpi

DESCRIPTION
     The acpi_thermal driver provides the thermal management features of the
     ACPI module. This driver has a sysctl(8) interface and a devd(8)
     notification interface. The sysctls export properties of each ACPI
     thermal zone object.

     There can be multiple thermal zones in a system. For example, each CPU
     and the enclosure could all be separate thermal zones, each with its own
     setpoints and cooling devices. Thermal zones are numbered sequentially
     in the order they appear in the AML.

     The acpi_thermal driver also activates the active cooling system
     according to each thermal zone's setpoints.

SYSCTL VARIABLES
     hw.acpi.thermal.min_runtime
             Number of seconds to continue active cooling once started. A new
             active cooling level will not be selected until this interval
             expires.

     hw.acpi.thermal.polling_rate
             Number of seconds between polling the current temperature.

     hw.acpi.thermal.user_override
             If set to 1, allow user override of various setpoints (below).
             The original values for these settings are obtained from the
             BIOS and system overheating and possible damage could occur if
             changed. Default is 0 (no override).

     hw.acpi.thermal.tz%d.active
             Current active cooling system state. If this is non-negative,
             the appropriate _AC%d object is running. Set this value to the
             desired active cooling level to force the corresponding fan
             object to the appropriate level.

     hw.acpi.thermal.tz%d.passive_cooling
             If set to 1, passive cooling is enabled. It does cooling without
             fans using cpufreq(4) as the mechanism for controlling CPU
             speed. Default is enabled for tz0 where it is available.

     hw.acpi.thermal.tz%d.thermal_flags
             Current thermal zone status. These are bit-masked values.

     hw.acpi.thermal.tz%d.temperature
             Current temperature for this zone.

     hw.acpi.thermal.tz%d._PSV
             Temperature to start passive cooling by throttling down CPU,
             etc. This value can be overridden by the user.

     hw.acpi.thermal.tz%d._HOT
             Temperature to start critical suspend to disk (S4). This value
             can be overridden by the user.

     hw.acpi.thermal.tz%d._CRT
             Temperature to start critical shutdown (S5). This value can be
             overridden by the user.

     hw.acpi.thermal.tz%d._ACx
             Temperatures at which to switch to the corresponding active
             cooling level. The lower the _ACx value, the higher the cooling
             power.

     All temperatures are printed in Celsius. Values can be set in Celsius
     (by providing a trailing "C") or Kelvin (by leaving off any trailing
     letter). When setting a value by sysctl(8), do not specify a trailing
     decimal (i.e., 90C instead of 90.0C).

NOTIFIES
     Notifies are passed to userland via devd(8). See /etc/devd.conf and
     devd.conf(5) for examples. The acpi_thermal driver sends events with the
     following attributes:

           system     ACPI
           subsystem  Thermal
           type       The fully qualified thermal zone object path as in the
                      ASL.
           notify     An integer designating the event:

                      0x80    Current temperature has changed.
                      0x81    One or more trip points (_ACx, _PSV) have
                              changed.
                      0x82    One or more device lists (_ALx, _PSL, _TZD)
                              have changed.
                      0xcc    Non-standard notify that the system will
                              shutdown if the temperature stays above _CRT
                              or _HOT for one more poll cycle.

SEE ALSO
     acpi(4), cpufreq(4), acpidump(8)

AUTHORS
     Michael Smith

     This manual page was written by Takanori Watanabe.

BSD                             March 17, 2007                             BSD
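
The setpoint value rule in the man page (a trailing "C" means Celsius, no letter means Kelvin, no trailing decimal) can be illustrated with a small sketch. This takes the man page text literally; the function name is hypothetical, and it does not claim to reproduce how sysctl(8) itself parses or stores these values.

```python
import re

# Whole number, optionally followed by "C"; no decimal point is accepted.
SETPOINT = re.compile(r"(\d+)(C?)")


def setpoint_to_celsius(value):
    """Convert a setpoint string such as "90C" or "363" to degrees Celsius.

    A trailing "C" is read as Celsius and a bare number as Kelvin, following
    the man page wording. Anything else, including "90.0C", is rejected.
    """
    match = SETPOINT.fullmatch(value.strip())
    if match is None:
        raise ValueError("unsupported setpoint value: %r" % value)
    number, unit = int(match.group(1)), match.group(2)
    if unit == "C":
        return float(number)
    return number - 273.15  # bare number: treat as Kelvin


print(setpoint_to_celsius("90C"))
```

The point of the check is to catch a value like "90.0C" before it gets anywhere near a real setpoint, since the man page warns that changed setpoints can lead to overheating.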
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.