Hi,
Recently I installed Fedora 9 on the following hardware:
- Asus A8N-SLI Deluxe motherboard, BIOS version 1805
- 2 GB TwinMOS RAM
- AMD 4400 CPU
- Tagan PSU 550 W
- Asus EN6200LE video card
- WD 74 GB Raptor
- Areca ARC-1222 RAID controller
- 4x 1 TB Seagate Barracudas
- Symbios Logic 53C875J SCSI controller card (made for Compaq)
- HP SureStore DAT40 tape drive
Fedora installed, booted and worked fine for a couple of days. With yum I installed all relevant updates.
Trouble started when I used Amanda for the first backups to tape. Amanda would work OK a few times, but then
the entire machine crashed. I mean really crashed: the machine would not get through the BIOS POST.
So I cleared the CMOS by removing the battery, setting the jumper appropriately and waiting 15 seconds. To no avail; the motherboard was dead.
I ordered an identical replacement motherboard and put everything back together. Linux booted fine and I did not touch Amanda
for a week. All was well, or so I thought. I used the machine intensively, copying over 1 TB of data to the RAID array, installing
Horde packages and all kinds of other fun stuff. No problems whatsoever.
I did look through the logs, obviously. The only entries of note were related to the SCSI controller card: a couple of
SCSI bus resets just prior to the crash. I did find a few articles from 2005 on the net about 53c8XX driver problems, e.g.
"Please fix bug #1852 (hald causes SYM53C8xx SCSI errors, device disconnects + GNOME hang)". Surely this problem was fixed a long time ago?
For good measure I also checked all termination caps and SCSI cables.
I am pretty sure, but not absolutely sure, that these resets were related to the 53C875J SCSI controller card and not to the
Areca RAID card. In any case, I had no problems with the RAID array at all, even when using it intensively.
The next weekend I ran an Amanda backup again. Two amflush jobs went fine, so the old backups on the holding disk were flushed to tape OK.
Then I proceeded with a new backup (amdump). After some time the machine crashed again, with absolutely identical symptoms.
This time, I stripped the machine down to the bare minimum:
only motherboard, PSU, 1 GB RAM, AMD 4400 CPU, an old PCI video card, keyboard and monitor.
Result: only one beep (that's good) and colorful gibberish on the monitor, not even the BIOS memory check and such.
So it appears that some error related to using the backup software (SCSI?) causes the motherboard to die.
At present, I can think of two courses of action:
1. I have ordered a new BIOS chip, hoping that the board will then get through POST.
If this works, it suggests to me that some error in the SCSI subsystem can actually overwrite (flash!) the motherboard BIOS.
Two weeks ago I would not have believed this possible, but here it is.
2. If option 1 does not work, I will order yet another replacement motherboard and think of a new backup strategy.
I do not mind chasing bugs, but losing a motherboard at every step of the way is not very appealing.
So out with the SCSI card.
BTW, until I get the machine up and running again I cannot look at the logs and present more detailed error reports.
This is all from memory.
I have spent quite some time googling this particular problem, but I cannot find any similar cases.
So, anyone out there: does this ring any bells?
Thanks
Jos