Failing to boot - DVMA problem


 
# 1  
03-17-2006

SunOS 5.6

I have a remote test system (test T1/T3, etc) that runs SunOS 5.6. I rebooted the system for no other reason than that it had been up for over 300 days. It failed to boot completely.

I have an ok prompt, and when I type boot, I see the following messages:
Boot device: /iommu/sbus/espdma@5,8400000/esp@5,8800000/sd@0,0 File and args: -r
Short read. 0x2000 chars read
Short read. 0x2000 chars read
Short read. 0x2000 chars read
Short read. 0x2000 chars read
|
Watchdog Reset

Some diagnostic tools pointed me to the problem:
CS4231 ASIC SelfTest Passed.
Error: DVMA failure internal loopback
expected 2 received 0
Error: DVMA failure internal loopback
expected 4 received 0
Error: DVMA failure internal loopback
expected 8 received 0
Error: DVMA failure internal loopback
expected 10 received 0
Error: DVMA failure internal loopback
expected 20 received 0
Error: DVMA failure internal loopback
expected 40 received 0
Error: DVMA failure internal loopback
expected 80 received 0
Selftest failed. Return code = 7
ok
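The expected values in this loopback output only make sense as hexadecimal (the sequence 2, 4, 8, 10, 20, 40, 80 is 0x02 through 0x80), which suggests a walking-ones pattern with one data bit set per pass. The short Python sketch below decodes the transcribed results. It is an interpretation of the printed output, not Sun's diagnostic code, and the helper names (is_walking_one, bad_bits, summarize) are hypothetical. Under that reading, every tested bit came back as 0, which is more consistent with no data returning over the DVMA path at all than with a single stuck data line.

```python
# (expected, received) pairs transcribed from the selftest output, read as
# hexadecimal (assumption: 2, 4, 8, 10, ... only makes sense as 0x02, 0x04, ...).
RESULTS = [(0x02, 0), (0x04, 0), (0x08, 0), (0x10, 0), (0x20, 0), (0x40, 0), (0x80, 0)]


def is_walking_one(value: int) -> bool:
    """True if exactly one bit is set, i.e. a walking-ones test pattern."""
    return value != 0 and value & (value - 1) == 0


def bad_bits(expected: int, received: int) -> list[int]:
    """Bit positions where the received byte differs from the expected byte."""
    diff = expected ^ received
    return [bit for bit in range(8) if (diff >> bit) & 1]


def summarize(results):
    """Return (sorted failing bit positions, whether every read was 0x00)."""
    failed = sorted({b for exp, got in results for b in bad_bits(exp, got)})
    all_zero = all(got == 0 for _, got in results)
    return failed, all_zero


if __name__ == "__main__":
    assert all(is_walking_one(exp) for exp, _ in RESULTS)
    failed, all_zero = summarize(RESULTS)
    print("bits that failed to loop back:", failed)
    print("every read returned 0x00:", all_zero)
```

Running it prints the failing bit positions 1 through 7 and confirms that every read returned 0x00.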

Based on this output, I used another diag tool to check the memory map and got the following results:
Virtual : 0000.0002
Context : @ 0.01ff.f000 001f.eec1 # 0
Region : @ 0.01fe.ec00 001f.ee71
Segment : @ 0.01fe.e700 001f.ee61
Page : @ 0.01fe.e600 0000.001c Invalid
Stack Underflow
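If this machine uses the SPARC V8 Reference MMU (an assumption, though the sun4m-style /iommu/sbus boot path suggests it), the memory-map output can be checked by hand. The low two bits of each descriptor are its entry type (0 = invalid, 1 = page table descriptor or PTD, 2 = page table entry), and a PTD with those bits cleared, shifted left by 4, gives the physical address of the next-level table. The Python sketch below is a hypothetical decoder applied to the values printed above, not the diag tool's own code. Under that assumption, the Context, Region and Segment entries are all valid PTDs that chain correctly to the next line's address, and only the final Page entry (0x1c) is invalid, which matches the tool's "Invalid" label.

```python
# Hypothetical decoder for SPARC V8 Reference MMU descriptors
# (assumes a sun4m-style SRMMU); not the diag tool's own code.
ET_NAMES = {0: "invalid", 1: "PTD", 2: "PTE", 3: "reserved"}


def entry_type(desc: int) -> str:
    """Entry type from the ET field in bits 1..0."""
    return ET_NAMES[desc & 0x3]


def next_table(ptd: int) -> int:
    """Physical address of the table a PTD points to.

    Bits 31..2 hold the page table pointer, i.e. physical address
    bits 35..6, so clearing ET and shifting left by 4 recovers it.
    """
    return (ptd & ~0x3) << 4


def va_indices(va: int) -> list[int]:
    """Level-1, level-2 and level-3 table indices of a 32-bit virtual address."""
    return [(va >> 24) & 0xFF, (va >> 18) & 0x3F, (va >> 12) & 0x3F]


# (label, physical address of the entry, descriptor) from the output above.
WALK = [
    ("Context", 0x01FFF000, 0x001FEEC1),
    ("Region", 0x01FEEC00, 0x001FEE71),
    ("Segment", 0x01FEE700, 0x001FEE61),
    ("Page", 0x01FEE600, 0x0000001C),
]


def check_walk(walk, va):
    """Return (label, entry type, link ok) for each level of the walk."""
    indices = va_indices(va)
    report = []
    for level, (label, _addr, desc) in enumerate(walk):
        kind = entry_type(desc)
        link_ok = None  # only meaningful for a PTD that has a next level
        if kind == "PTD" and level + 1 < len(walk):
            # Each descriptor is 4 bytes, so the next entry sits at base + 4 * index.
            expected = next_table(desc) + 4 * indices[level]
            link_ok = expected == walk[level + 1][1]
        report.append((label, kind, link_ok))
    return report


if __name__ == "__main__":
    for label, kind, link_ok in check_walk(WALK, 0x00000002):
        print(f"{label:8} {kind:8} link_ok={link_ok}")
```

Run as a script, it prints one line per level: the first three show PTD with link_ok=True, and the last shows invalid with link_ok=None.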

So it appears to me that the system can't find the pages it's looking for. I'm very short on spare CPUs, so I will probably attempt to reload the OS/software. This is a bit of a hassle because it's a remote site.

What I'm looking for is opinions on the most likely cause of the problem, and whether there is some way to fix this without a CPU replacement or an OS/software reload. I've already reseated the CPU and I get the same results.

Thanks for your time!
# 2  
03-17-2006
Quote:
The system did a watchdog reset.

Watchdog resets are usually hardware related. They occur when a processor
gets a second trap while in the middle of initial processing of a first trap,
during the period when trap handling is disabled. During this period, the
system does not know what to do with the second trap, so it just stops.
Software can cause this, but most Solaris bugs in this vein have been fixed.
A problematic piece of hardware, on the other hand, could cause spurious traps to be sent, some in the middle of processing of legitimate traps.

The system leaves no messages when it reboots.

This could be a watchdog reset on a system with its obprom "watchdog-reboot?"
flag set to true.
OS reload probably won't help you at all.
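The trap-within-a-trap mechanism described in the quote can be illustrated with a toy model: taking a trap disables trap handling, and a second trap that arrives before the handler re-enables it leaves the processor with nowhere to go. In SPARC V8 terms the flag roughly corresponds to the ET (enable traps) bit of the PSR and the stopped state to error mode, which Sun firmware reports as a watchdog reset. The Python class below is a hypothetical simplification for illustration only; it does not model real trap vectors, priorities or register windows.

```python
class ToyCpu:
    """Toy model: a trap taken while traps are disabled halts the CPU."""

    def __init__(self) -> None:
        self.traps_enabled = True
        self.halted = False

    def trap(self, source: str) -> str:
        if self.halted:
            return "halted"
        if not self.traps_enabled:
            # Second trap during the window where trap handling is off:
            # the CPU cannot dispatch it, so it stops (watchdog reset).
            self.halted = True
            return "watchdog reset"
        # Taking a trap disables further traps until the handler is ready.
        self.traps_enabled = False
        return f"handling {source}"

    def handler_done(self) -> None:
        # The handler has saved its state and re-enables traps.
        if not self.halted:
            self.traps_enabled = True


if __name__ == "__main__":
    cpu = ToyCpu()
    print(cpu.trap("timer interrupt"))  # prints "handling timer interrupt"
    cpu.handler_done()
    print(cpu.trap("disk interrupt"))   # prints "handling disk interrupt"
    print(cpu.trap("spurious trap"))    # arrives before handler_done(): "watchdog reset"
```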
# 3  
03-17-2006
Thanks, RTM. I appreciate the feedback and the education on watchdog resets. I will attempt the reload first, keeping in mind that I will likely need to dig up a CPU.
# 4  
03-17-2006
Why would you attempt the reload at all? Did you note the last line in my other post?
# 5  
03-17-2006
"This could be a watchdog reset on a system with its obprom "watchdog-reboot?"
flag set to true. "

I suppose I don't know enough to see how this would make a reload useless. I basically want to try everything possible before changing the CPU. I only have one left.

If you could expound on what I'm missing (why the reload may be pointless) I would appreciate it.
# 6  
03-18-2006
Watchdog resets are usually hardware related.

Does this server have one CPU or more than one? If more than one, remove one and see if the issue goes away. If it doesn't (it still gets a watchdog reset), put that CPU back in and pull the other one. If the issue goes away, you could always just run on one CPU (if they are so hard to come by) or replace the bad one. If the watchdog reset doesn't go away, then it could be your memory. But reloading the OS will not make a hardware issue go away. You will still have the watchdog reset issue, you may mess up what is already on the hard drive, and you will have wasted your time.
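The swap-and-retest procedure described above is a process of elimination, sketched below in Python. The still_resets callback is a hypothetical stand-in for "boot with these CPUs installed and see whether the watchdog reset recurs"; in practice each call is a manual power-down, module swap and boot attempt. With a single CPU installed there is nothing to pull, so the sketch can only report that the CPU and memory both remain suspects.

```python
from typing import Callable, Sequence


def isolate_cpu(
    cpus: Sequence[str],
    still_resets: Callable[[frozenset], bool],
) -> str:
    """Pull one CPU at a time; whichever removal cures the reset is the suspect."""
    installed = frozenset(cpus)
    if len(installed) < 2:
        return "only one CPU: cannot isolate by removal; suspect CPU or memory"
    for cpu in cpus:
        # Boot without this CPU. Each trial starts from the full set, which
        # models putting the previously pulled CPU back in first.
        if not still_resets(installed - {cpu}):
            return f"{cpu} is suspect: run without it or replace it"
    return "reset persists with each CPU removed: suspect memory"


if __name__ == "__main__":
    # Hypothetical fault: the reset happens whenever "cpu1" is installed.
    print(isolate_cpu(["cpu0", "cpu1"], lambda installed: "cpu1" in installed))
```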

The second part of the information I posted states:
The system leaves no messages when it reboots.

This could be a watchdog reset on a system with its obprom "watchdog-reboot?" flag set to true.

This means that if you run printenv at the ok prompt, you will see all the different parameters. If the watchdog-reboot? parameter IS set to true, there may not be a message saying why the watchdog reset happened. Set it to false for more info.

If you have more information about which CPU or possibly which memory board is bad, post the info.
# 7  
03-21-2006
Thanks for the explanation. watchdog-reboot? is currently set to false, and there is only one CPU in this box. I realize from what you've posted that reloading the software is probably an exercise in futility, but I have a good backup of the config and I really do want to avoid replacing the CPU if at all possible, so I'm going to try the reload just in case.

If the reload fails, I'll replace the CPU. Either way, I'll let you know what ultimately fixes the problem. Thanks again for the input.
 