The UNIX and Linux Forums  





  #1 (permalink)  
Old 11-08-2006
Registered User
 

Join Date: Mar 2002
Posts: 165
Memory error causing reboot

Hi there

I have a box that started receiving soft errors on a DIMM at 4pm. Normally this is OK and we have time to swap it out, but I got the following error, which caused the box to reboot.

NOTE: there were about 6 or 7 normal "soft error encountered" messages before this one.

Code:
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 667005 kern.info] [AFT3] errID 0x00212a5e.7a11dcbb Above Error is in User Mode
Nov  7 16:02:37 my.box     and is fatal: will reboot
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 936573 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov  7 16:02:37 my.box     AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570 INVALID
Nov  7 16:02:37 my.box     Fault_PC 0x100350b0 Esynd 0x0027 INVALID J_AID 0 INVALID
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 337726 kern.info] NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov  7 16:02:37 my.box     AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570
Nov  7 16:02:37 my.box     Fault_PC 0x100350b0 Esynd 0x0027 INVALID
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 568294 kern.info] NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by CPU0 at TL=0, errID 0x00212a5e.7a11dcbb
Nov  7 16:02:37 my.box     AFSR 0x00000001<RUE>.81000000<RCE> AFAR 0x00000011.0a0fffe0 INVALID
Nov  7 16:02:37 my.box     Fault_PC 0xffffffff7dc04884 J_REQ 1 INVALID
Nov  7 16:02:37 my.box unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 7744 (apas_OaLgw)
Nov  7 16:02:38 my.box SUNW,UltraSPARC-IIIi: [ID 845842 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a6b8682

My question is this: is it possible that when a process tries to access the specific bad area of memory on the DIMM, it can cause the box to reboot? In the above example, it's only when the process (apas_OaLgw) gets involved that anything happens.

Any help would be greatly appreciated.
  #2 (permalink)  
Old 11-08-2006
Tornado
Registered User
 

Join Date: Nov 2006
Location: Melbourne
Posts: 240
Yes, it is possible.

From SunSolve:
"On EDP, LDP, CP, UE, BERR, and TO events the system will panic if the address is in kernel space or if the error occurs while the CPU is at a trap level greater than zero. Otherwise, if the affected address is in use by a process, the process will be killed immediately (sent SIGKILL) and the system will be rebooted (as if a privileged user had entered "init 6")."
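
The rule quoted from SunSolve can be sketched as a small decision function. This is only an illustration of the quoted text, not Solaris kernel code: `ErrorEvent`, `action_for`, and the returned strings are hypothetical names made up for the example, and anything the quote does not describe is reported as not covered rather than guessed.

```python
from dataclasses import dataclass

# Event types listed in the quoted SunSolve text.
LISTED_EVENT_TYPES = {"EDP", "LDP", "CP", "UE", "BERR", "TO"}


@dataclass
class ErrorEvent:
    """Hypothetical summary of one error event (not a Solaris structure)."""
    event_type: str         # e.g. "UE"
    in_kernel_space: bool   # affected address is in kernel space
    trap_level: int         # the TL value of the CPU when the error occurred
    used_by_process: bool   # affected address is in use by a process


def action_for(event: ErrorEvent) -> str:
    """Return the outcome described by the quote for this event."""
    if event.event_type not in LISTED_EVENT_TYPES:
        # The quote only talks about the listed event types.
        return "not covered by quote"
    if event.in_kernel_space or event.trap_level > 0:
        return "panic"
    if event.used_by_process:
        # Process gets SIGKILL, then the system reboots as if "init 6" was run.
        return "kill process and reboot"
    # Listed event, user space, TL=0, but no process uses the address:
    # the quote does not say what happens here.
    return "not covered by quote"


# A listed event at TL=0 on memory that a user process is using.
event = ErrorEvent("UE", in_kernel_space=False, trap_level=0, used_by_process=True)
print(action_for(event))  # kill process and reboot
```

Under this reading, a reboot that only happens once a process touches the bad address is the "otherwise" branch of the quote: the error is outside kernel space and at TL=0, so the process is killed and the box reboots instead of panicking.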

