Hi there
I have a box that started receiving soft errors on a DIMM at 4pm. Normally this is OK and we have time to swap the DIMM out, but this time I got the following error, which caused the box to reboot.
NOTE: there were about 6 or 7 normal "soft error encountered" messages before this one.
Code:
Nov 7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 667005 kern.info] [AFT3] errID 0x00212a5e.7a11dcbb Above Error is in User Mode
Nov 7 16:02:37 my.box and is fatal: will reboot
Nov 7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 936573 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov 7 16:02:37 my.box AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570 INVALID
Nov 7 16:02:37 my.box Fault_PC 0x100350b0 Esynd 0x0027 INVALID J_AID 0 INVALID
Nov 7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 337726 kern.info] NOTICE: [AFT0] Corrected memory (CE) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a11ddb5
Nov 7 16:02:37 my.box AFSR 0x00100002<PRIV,CE>.18000027<FRC,FRU> AFAR 0x00000012.0a625570
Nov 7 16:02:37 my.box Fault_PC 0x100350b0 Esynd 0x0027 INVALID
Nov 7 16:02:37 my.box SUNW,UltraSPARC-IIIi: [ID 568294 kern.info] NOTICE: [AFT0] Corrected remote memory/cache (RCE) Event detected by CPU0 at TL=0, errID 0x00212a5e.7a11dcbb
Nov 7 16:02:37 my.box AFSR 0x00000001<RUE>.81000000<RCE> AFAR 0x00000011.0a0fffe0 INVALID
Nov 7 16:02:37 my.box Fault_PC 0xffffffff7dc04884 J_REQ 1 INVALID
Nov 7 16:02:37 my.box unix: [ID 855177 kern.warning] WARNING: [AFT1] initiating reboot due to above error in pid 7744 (apas_OaLgw)
Nov 7 16:02:38 my.box SUNW,UltraSPARC-IIIi: [ID 845842 kern.info] NOTICE: [AFT0] Corrected memory (FRC) Event detected by CPU1 at TL=0, errID 0x00212a5e.7a6b8682