Hardware Diagnosis

 
# 1  
Old 12-01-2012

Hello everyone,

I'm having an odd problem I've never encountered before, and I'm at the point where the only things I have not replaced are the CPUs, the motherboard and the power supply.

The errors:
Code:
[17100.704058] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc2b400013080a13
[17100.704069] [Hardware Error]: 	MC4_ADDR: 0x0000000001017d80
[17100.704071] [Hardware Error]: Northbridge Error (node 0): DRAM ECC error detected on the NB.
[17100.704075] [Hardware Error]: cache level: L3/GEN, mem/io: MEM, mem-tx: RD, part-proc: RES (no timeout)

These are the errors as logged under Slackware64 14.0. I get thousands of them, and they are so regular you could set your clock by them: the first one ALWAYS appears 300 seconds after boot, and the rest follow at 300-second intervals. I don't know if this pattern indicates anything.
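To make the raw MC4_STATUS value easier to compare against the kernel's own annotations, here is a minimal Python sketch that splits it into flags and error-code fields. The bit layout is an assumption (the architectural x86 MCA status bits plus the AMD northbridge CECC/UECC bits, with a 5-bit extended error code), and `decode_status` is a hypothetical helper, not part of any tool mentioned in this thread; check the layout against AMD's documentation for the exact CPU family before relying on it.

```python
# Flag bits: architectural x86 MCA status bits, plus the AMD northbridge
# CECC/UECC bits. Assumed layout; verify against AMD's docs for your CPU.
FLAG_BITS = [(63, "Val"), (62, "Over"), (61, "UC"), (60, "En"),
             (59, "MiscV"), (58, "AddrV"), (57, "PCC"),
             (46, "CECC"), (45, "UECC")]

# Field mnemonics. The ones seen in the log are RES, MEM, RD and L3/GEN;
# the remaining entries are assumed from the Linux decoder's tables.
PP = ["SRC", "RES", "OBS", "GEN"]        # participating processor
II = ["MEM", "RESV", "IO", "GEN"]        # memory or I/O
LL = ["RESV", "L1", "L2", "L3/GEN"]      # cache level
RRRR = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]


def decode_status(status):
    """Split a 64-bit MC4_STATUS value into flags and error-code fields."""
    flags = [name for bit, name in FLAG_BITS if (status >> bit) & 1]
    ec = status & 0xFFFF                      # low 16 bits: MCA error code
    fields = {"flags": flags,
              "xec": (status >> 16) & 0x1F,   # extended error code (assumed 5 bits)
              "error_code": ec}
    # Bus-error encoding of the error code: 0000 1PPT RRRR IILL
    if ec & 0xF800 == 0x0800:
        rrrr = (ec >> 4) & 0xF
        fields.update(
            part_proc=PP[(ec >> 9) & 0x3],
            timeout=bool((ec >> 8) & 0x1),
            mem_tx=RRRR[rrrr] if rrrr < len(RRRR) else "-",
            mem_io=II[(ec >> 2) & 0x3],
            cache_level=LL[ec & 0x3],
        )
    return fields


if __name__ == "__main__":
    # Status value taken from the first log excerpt in this post.
    for key, value in decode_status(0xDC2B400013080A13).items():
        print(f"{key}: {value}")
```

For the value in the excerpt above, the decoded fields line up with what the kernel printed: a corrected error with the overflow flag set, CECC set, and a read of memory with "RES" as the participating processor and no timeout.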

The build:
2x AMD Opteron 2435
8x HP DDR2 400 (first RAM installed)
4x Hynix DDR2 400 (mobo. mfg. recommended)
SuperMicro H8DAE-2
OCZ Petrol 128G SSD
Enermax NAXN 750AWT
Slackware64 13.37, recently updated to 14.0. I also tried CentOS and Knoppix live images and got the same type of symptoms.

Everything is new except the RAM. With one CPU installed, I tried everything I could to get the HP memory working. The motherboard manufacturer said the RAM was not supported and "too slow because it's CAS 3". So I bought some of their recommended RAM, whose timings differed only in tRAS and tRC, by 1 clock each. This didn't fix anything.

Can someone with more experience in this type of issue help me resolve it?

If I try one DIMM at a time with any of the RAM, I don't get any errors. I've done all the tests I can think of and I will redo them if requested.

Here is the pattern:
Code:
[  300.704038] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc454000bd080813
[  600.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc0540009a080813
[  900.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc52400041080813
[ 1200.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc0c4000a9080813
[ 1500.704039] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc33400030080a13
[ 1800.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc20c0006d080813
[ 2100.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[ 2400.704029] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[ 2700.704037] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc22400020080813
[ 3000.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc44c00086080813
[ 3300.704064] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc504000ca080813
[ 3600.704035] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc1b400055080813
[ 3900.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc324000cd080813
[ 4200.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc77c000b6080813
[ 4500.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc31c0007d080813
[ 4800.704040] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc2cc000c4080a13
[ 5100.704037] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc16c000c7080813
[ 5400.704040] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc3f400099080813
[ 5700.704038] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc0540009a080813
[ 6000.704044] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc14c0004c080813
[ 6300.704021] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc22400020080813
[ 6600.704035] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc60c0004a080813
[ 6900.704038] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc33c000f6080a13
[ 7200.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc6b400034080813
[ 7500.704038] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc52400041080813
[ 7800.704057] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc20c0006d080813
[ 8100.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc11400010080813
[ 8400.704052] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc77c000b6080813
[ 8700.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc77400070080813
[ 9000.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc12400066080813
[ 9300.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc39400075080813
[ 9600.704031] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc04400067080813
[ 9900.704030] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc77c000b6080813
[10200.704039] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[10500.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[10800.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc6ac0000f080813
[11100.704054] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc504000ca080813
[11400.704047] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc64c0002d080813
[11700.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc00c000c6080813
[12000.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc08c00008080813
[12300.704038] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc04400067080813
[12600.704031] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc52400041080813
[12900.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc5d40009e080a13
[13200.704042] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc22400020080813
[13500.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc44400040080813
[13800.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc08c00008080813
[14100.704044] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc7a400024080813
[14400.704047] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc04400067080813
[14700.704046] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc22400020080813
[15000.704058] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[15300.704056] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc23c0001b080813
[15600.704040] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc49400014080813
[15900.704045] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc504000ca080813
[16200.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc57c0001d080a13
[16500.704043] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc694000bf080813
[16800.704041] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc33400030080813
[17100.704058] [Hardware Error]: CPU:0	MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc2b400013080a13
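
The regularity of the pattern above can be checked mechanically. The following Python sketch parses a few of the log lines (the first five from the listing, copied into the hypothetical `LOG` string), computes the spacing between consecutive entries, and tallies the low 16 bits of each status value. The regular expression and helper names are illustrative assumptions, not an existing tool. One possible reading, which nothing in this thread confirms, is that a fixed 300-second spacing reflects how often the kernel polls the machine-check banks rather than when the errors actually occur; the `Over` flag on every line would be consistent with more than one error having been recorded between reads.

```python
import re
from collections import Counter

# First five entries of the pattern listing, copied verbatim.
LOG = """\
[  300.704038] [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc454000bd080813
[  600.704041] [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc0540009a080813
[  900.704043] [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc52400041080813
[ 1200.704041] [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc0c4000a9080813
[ 1500.704039] [Hardware Error]: CPU:0 MC4_STATUS[Over|CE|MiscV|-|AddrV|CECC]: 0xdc33400030080a13
"""

# Captures the kernel timestamp and the hexadecimal status value.
LINE = re.compile(
    r"\[\s*(\d+\.\d+)\] \[Hardware Error\]: CPU:\d+\s+"
    r"MC4_STATUS\[[^\]]*\]: (0x[0-9a-fA-F]+)"
)


def parse(log):
    """Return a list of (seconds_since_boot, status) tuples."""
    return [(float(t), int(s, 16)) for t, s in LINE.findall(log)]


def intervals(events):
    """Return the rounded gap in seconds between consecutive events."""
    return [round(b[0] - a[0]) for a, b in zip(events, events[1:])]


def error_code_counts(events):
    """Count how often each low-16-bit error code appears."""
    return Counter(status & 0xFFFF for _, status in events)


if __name__ == "__main__":
    events = parse(LOG)
    print("intervals:", intervals(events))
    for code, count in error_code_counts(events).items():
        print(f"error code {code:#06x}: {count}")
```

Feeding the full `dmesg` output through `parse` instead of the five-line sample would show whether any entry ever deviates from the 300-second spacing.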

# 2  
Old 12-01-2012
See if you can find an update for your BIOS.
# 3  
Old 12-02-2012
I had to install the latest BIOS to get the Opteron "Istanbul" Hex Core processors working with the motherboard.
# 4  
Old 12-07-2012
I would recommend checking that the power supply's output meets the hardware's requirements, and inspecting the condition of the capacitors on the motherboard.
It could also be some other sort of motherboard fault, and there is even a small chance that it is a CPU. If you have two CPUs, try swapping them: if the problem shows up on CPU0 again, it is the motherboard; if it moves to CPU1, it is the CPU.
# 5  
Old 12-07-2012
Considering AMD processors have their memory controllers built into them, it could easily be the CPU.

Unfortunately, it could easily be lots of other things too; hard to be certain without testing.
# 6  
Old 03-08-2013
I recommend first checking all your hardware, such as the power supply, motherboard and processors. After that, check your BIOS. I am confident that will work.