Memory error causing reboot


 
# 1  
Old 11-08-2006

Hi there

I have a box that started receiving soft errors on a DIMM at 4pm. Normally this is OK and we have time to swap the DIMM out, but this time I got the following error, which caused the box to reboot.

NOTE: there were about 6 or 7 normal "soft error encountered" messages before this one.

Code:
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 667005 kern.info] [AFT3] errID 0x00212a5e.7a11dcbb Above Error is in User Mode
Nov  7 16:02:37 my.box     and is fatal: will reboot
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 936573 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov  7 16:02:37 my.box     AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570 INVALID
Nov  7 16:02:37 my.box     Fault_PC 0x100350b0 Esynd 0x0027 INVALID J_AID 0 INVALID
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 337726 kern.info] NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov  7 16:02:37 my.box     AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570
Nov  7 16:02:37 my.box     Fault_PC 0x100350b0 Esynd 0x0027 INVALID
Nov  7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 568294 kern.info] NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by CPU0 at TL=0, errID 0x00212a5e.7a11dcbb
Nov  7 16:02:37 my.box     AFSR 0x00000001<RUE>.81000000<RCE> AFAR 0x00000011.0a0fffe0 INVALID
Nov  7 16:02:37 my.box     Fault_PC 0xffffffff7dc04884 J_REQ 1 INVALID
Nov  7 16:02:37 my.box unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 7744 (apas_OaLgw)
Nov  7 16:02:38 my.box SUNW,UltraSPARC-IIIi: [ID 845842 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a6b8682
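
One way to see which of these messages belong to the same event is to group them by errID. The following is only a rough sketch, assuming the wrapped lines have been rejoined and the syslog prefix trimmed; the function name `group_by_errid` and the regular expressions are illustrative and not part of any Solaris tool.

```python
import re
from collections import defaultdict

# Matches the errID field printed in the AFT messages,
# e.g. "errID 0x00212a5e.7a11dcbb".
ERRID_RE = re.compile(r"errID\s+(0x[0-9a-fA-F]+\.[0-9a-fA-F]+)")
# Matches the AFT tag, e.g. "[AFT0]".
TAG_RE = re.compile(r"\[(AFT\d)\]")
# Matches the event class and reporting CPU, e.g. "(CE) Event detected by CPU1".
EVENT_RE = re.compile(r"\((\w+)\) Event detected by (CPU\d+)")


def group_by_errid(lines):
    """Map each errID to a list of (tag, event, cpu) tuples seen for it."""
    groups = defaultdict(list)
    for line in lines:
        errid = ERRID_RE.search(line)
        if not errid:
            continue  # continuation lines (AFSR, Fault_PC) carry no errID
        tag = TAG_RE.search(line)
        event = EVENT_RE.search(line)
        groups[errid.group(1)].append((
            tag.group(1) if tag else None,
            event.group(1) if event else None,
            event.group(2) if event else None,
        ))
    return dict(groups)


# Header lines copied from the log above, wrapped lines rejoined and the
# syslog prefix trimmed for brevity.
SAMPLE = [
    "[AFT3] errID 0x00212a5e.7a11dcbb Above Error is in User Mode",
    "NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5",
    "NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5",
    "NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by CPU0 at TL=0, errID 0x00212a5e.7a11dcbb",
]

for errid, events in group_by_errid(SAMPLE).items():
    print(errid, events)
```

On these sample lines, the fatal [AFT3] message shares its errID with the RCE event reported by CPU0, while the FRC and CE messages from CPU1 share a different errID.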

My question is really this: is it possible that when a process tries to access the specific bad area of memory on the DIMM, it can cause the box to reboot? In the above example, it's only when the process (apas_OaLgw) gets involved that anything happens.

Any help would be greatly appreciated.
# 2  
Old 11-08-2006
Yes, it is possible.

From SunSolve:
"On EDP, LDP, CP, UE, BERR, and TO events the system will panic if the address is in kernel space or if the error occurs while the CPU is at a trap level greater than zero. Otherwise, if the affected address is in use by a process, the process will be killed immediately (sent SIGKILL) and the system will be rebooted (as if a privileged user had entered "init 6")."
Tornado
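
The rule in the SunSolve quote above can be written out as a small decision function. This is only a sketch that models the quoted prose, not the actual Solaris kernel logic; the names `action_for`, `Action`, and all parameters are hypothetical, and cases the quote does not address are returned as `UNSPECIFIED` rather than guessed.

```python
from enum import Enum

# Event classes named in the quoted SunSolve text.
COVERED_EVENTS = {"EDP", "LDP", "CP", "UE", "BERR", "TO"}


class Action(Enum):
    PANIC = "panic"
    KILL_AND_REBOOT = "SIGKILL the process, then reboot"
    UNSPECIFIED = "not covered by the quoted text"


def action_for(event, in_kernel_space, trap_level, used_by_process):
    """Return the action the quoted text describes for one error event.

    event            -- event class string such as "UE" or "TO"
    in_kernel_space  -- True if the faulting address is in kernel space
    trap_level       -- CPU trap level (TL) at the time of the error
    used_by_process  -- True if the address is in use by a process
    """
    if event not in COVERED_EVENTS:
        return Action.UNSPECIFIED  # the quote lists only the six classes above
    if in_kernel_space or trap_level > 0:
        return Action.PANIC
    if used_by_process:
        return Action.KILL_AND_REBOOT
    return Action.UNSPECIFIED  # the quote is silent on this case


# Example: an uncorrectable error at TL=0 on a user-space address in use
# by a process.
print(action_for("UE", in_kernel_space=False, trap_level=0,
                 used_by_process=True).value)
```

Under this reading, an error at TL=0 on a user-space address still takes the whole box down, just through an orderly reboot after the process is killed rather than a panic.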