Solaris: The Solaris Operating System, usually known simply as Solaris, is a Unix-based operating system introduced by Sun Microsystems. The Solaris OS is now owned by Oracle.
#1
Sun Fire V490 occasionally goes down
I can't find the cause, but I suspect some hardware failure. I have a Sun Fire V490 running Solaris 10 5/08 with Sun Cluster 3.2, and it occasionally goes down under circumstances I can't pin down. I have also noticed some strange behavior after these failures: when I run any diagnostics from the OBP, they report a lot of errors:
Code:
{12} ok test-all
Testing /pci@9,600000/SUNW,qlc@2
ERROR : RISC RAM failed to load from host buffer.
DEVICE : /pci@9,600000/SUNW,qlc@2
SUBTEST : selftest:mats-test
CALLERS : (f010633c)
MACHINE : Sun Fire V490
SERIAL# : 71218358
DATE : 11/25/2009 07:44:25 GMT
CONTR0LS: diag-level=max test-args=
/pci@9,600000/SUNW,qlc@2 selftest failed, return code = 1
Testing /pci@9,600000/network@1
ERROR : TX DMA block never received packet.
DEVICE : /pci@9,600000/network@1
SUBTEST : selftest:mltpkt-gmii-int-lpb-test
CALLERS : (f010a928)
MACHINE : Sun Fire V490
SERIAL# : 71218358
DATE : 11/25/2009 07:44:37 GMT
CONTR0LS: diag-level=max test-args=
/pci@9,600000/network@1 selftest failed, return code = 1
Testing /pci@9,700000/network@2
ERROR : TX DMA block never received packet.
DEVICE : /pci@9,700000/network@2
SUBTEST : selftest:mltpkt-gmii-int-lpb-test
CALLERS : (f010e9e0)
MACHINE : Sun Fire V490
SERIAL# : 71218358
DATE : 11/25/2009 07:44:49 GMT
CONTR0LS: diag-level=max test-args=
/pci@9,700000/network@2 selftest failed, return code = 1
Testing /pci@9,700000/usb@1,3
Testing /pci@9,700000/ebus@1
ERROR : DMA control status register 1
SUMMARY : Obs=0xef Exp=0x00 XOR=0xef Addr=0x0
DEVICE : /pci@9,700000/ebus@1
SUBTEST : selftest:dma-func-test
CALLERS : (f010162c)
MACHINE : Sun Fire V490
SERIAL# : 71218358
DATE : 11/25/2009 07:44:50 GMT
CONTR0LS: diag-level=max test-args=
/pci@9,700000/ebus@1 selftest failed, return code = 1
But after powering the server off and leaving it off for a while, the diagnostics don't show any issues. The system log and fmdump -vu also show some interesting output:
Code:
Nov 30 12:00:45 server1 EVENT-TIME: Fri Mar 27 13:56:44 MSK 2009
Nov 30 12:00:45 server1 PLATFORM: SUNW,Sun-Fire-V490, CSN: -, HOSTNAME: server1
Nov 30 12:00:45 server1 SOURCE: eft, REV: 1.16
Nov 30 12:00:45 server1 EVENT-ID: 56b30d73-1f74-6d5c-812b-bca359fdc999
Nov 30 12:00:45 server1 DESC: The transmitting device sent an invalid request.
Nov 30 12:00:45 server1 Refer to Sun Message ID: PCIEX-8000-5Y for more information.
Nov 30 12:00:45 server1 AUTO-RESPONSE: One or more device instances may be disabled
Nov 30 12:00:45 server1 IMPACT: Loss of services provided by the device instances associated with this fault
Nov 30 12:00:45 server1 REC-ACTION: Ensure that the latest drivers and patches are installed. Otherwise schedule a repair procedure to replace the affected device(s). Use fmdump -v -u <EVENT_ID> to identify the devices or contact Sun for support.

[ root@server1 Mon Nov 30 12:09:17 2009 ] / # fmdump -vu 56b30d73-1f74-6d5c-812b-bca359fdc999
TIME                 UUID                                 SUNW-MSG-ID
Nov 30 12:00:45.5664 56b30d73-1f74-6d5c-812b-bca359fdc999 PCIEX-8000-5Y
  50%  fault.io.pci.device-invreq

Problem in: hc://:product-id=SUNW,Sun-Fire-V490:server-id=server1/motherboard=0/hostbridge=1/pcibus=0/pcidev=1/pcifn=0
   Affects: dev:////pci@9,600000/network@1
       FRU: hc:///component=MB
  Location: MB

  50%  fault.io.pci.device-invreq

Problem in: hc://:product-id=SUNW,Sun-Fire-V490:server-id=server1/motherboard=0/hostbridge=1/pcibus=0/pcidev=2/pcifn=0
   Affects: dev:////pci@9,600000/SUNW,qlc@2
       FRU: hc:///component=MB
  Location: MB

I'm stuck! I don't know how to determine the cause of this strange behavior. I considered checking it with SunVTS, but according to the documentation it cannot be used in a Sun Cluster environment. One way to find out what is going on would be to uninstall the Sun Cluster software and run stress tests for a couple of days. Maybe that would reveal the cause of the failures.
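To see at a glance which devices fmdump suspects and which FRU each one maps to, the suspect list can be pulled apart with a small script. This is a minimal sketch that assumes the fmdump -vu layout shown in this post (a "NN%  fault.class" line followed by "Problem in:", "Affects:", "FRU:" and "Location:" lines); the name parse_suspects is hypothetical and not part of any Sun tool.

```python
import re
from collections import Counter

# Suspect-list excerpt copied from the fmdump -vu output in this post.
FMDUMP_TEXT = """\
  50%  fault.io.pci.device-invreq

Problem in: hc://:product-id=SUNW,Sun-Fire-V490:server-id=server1/motherboard=0/hostbridge=1/pcibus=0/pcidev=1/pcifn=0
   Affects: dev:////pci@9,600000/network@1
       FRU: hc:///component=MB
  Location: MB

  50%  fault.io.pci.device-invreq

Problem in: hc://:product-id=SUNW,Sun-Fire-V490:server-id=server1/motherboard=0/hostbridge=1/pcibus=0/pcidev=2/pcifn=0
   Affects: dev:////pci@9,600000/SUNW,qlc@2
       FRU: hc:///component=MB
  Location: MB
"""

# A suspect starts with "NN%  fault.class"; its details follow as "Key: value".
SUSPECT_RE = re.compile(r"^\s*(\d+)%\s+(\S+)\s*$")
FIELD_RE = re.compile(r"^\s*(Problem in|Affects|FRU|Location):\s*(.*\S)\s*$")


def parse_suspects(text):
    """Return one dict per suspect found in an fmdump -v suspect list."""
    suspects = []
    for line in text.splitlines():
        m = SUSPECT_RE.match(line)
        if m:
            suspects.append({"certainty": int(m.group(1)), "class": m.group(2)})
            continue
        m = FIELD_RE.match(line)
        if m and suspects:
            # Attach the field to the most recently started suspect.
            suspects[-1][m.group(1)] = m.group(2)
    return suspects


if __name__ == "__main__":
    suspects = parse_suspects(FMDUMP_TEXT)
    for s in suspects:
        print(f"{s['certainty']:3d}%  {s['Affects']}  ->  FRU {s['Location']}")
    # Count how many suspects point at each FRU location.
    print(Counter(s["Location"] for s in suspects))
```

For this event both 50% suspects name the same FRU location, MB, so the two device instances are not independent candidates from fmd's point of view.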
#2
A few things to check: the OBP and the kernel patch level. If it turns out not to be a cluster issue after all, it could be a motherboard problem. Also analyze the messages file.
#3
You mean check the OBP? But how? And which kernel patch should I install? Searching SunSolve with the keywords 'pci network' and 'pci qlc' returns about 30 patches, but I can't find any that match my problem. The other node in this cluster has no such problems at all. Unfortunately, the system messages log is cut off abruptly when the server fails. Could it be an SB error?
---------- Post updated at 03:20 ---------- Previous update was at 03:18 ----------
Also, my metadevices are degraded after these failures, but things get better after I run metasync. I've put the faulty server through component stress testing with SunVTS; I hope it will find something.
#4
OBP -> prtconf -V
kernel patch -> is it the latest patch cluster? showrev
From the OBP test-all output, the errors point to something on the MB. It is unlikely that both the HBA card and the network card would go faulty at the same time.
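The argument that several onboard devices failing at once points to a shared component can be checked mechanically against the test-all output from post #1. A minimal sketch, assuming only the "... selftest failed, return code = N" lines from that run; failed_devices and by_pci_node are hypothetical helper names. It groups each failing device path by its top-level /pci@... node.

```python
import re
from collections import defaultdict

# "Testing" and "selftest failed" lines from the OBP test-all run in post #1.
TEST_ALL_OUTPUT = """\
Testing /pci@9,600000/SUNW,qlc@2
/pci@9,600000/SUNW,qlc@2 selftest failed, return code = 1
Testing /pci@9,600000/network@1
/pci@9,600000/network@1 selftest failed, return code = 1
Testing /pci@9,700000/network@2
/pci@9,700000/network@2 selftest failed, return code = 1
Testing /pci@9,700000/usb@1,3
Testing /pci@9,700000/ebus@1
/pci@9,700000/ebus@1 selftest failed, return code = 1
"""

FAILED_RE = re.compile(r"^(/\S+) selftest failed, return code = (\d+)")


def failed_devices(text):
    """Return the device paths whose selftest reported a failure."""
    paths = []
    for line in text.splitlines():
        m = FAILED_RE.match(line.strip())
        if m:
            paths.append(m.group(1))
    return paths


def by_pci_node(paths):
    """Group device paths by their first component, e.g. '/pci@9,600000'."""
    groups = defaultdict(list)
    for path in paths:
        node = "/" + path.strip("/").split("/")[0]
        groups[node].append(path)
    return dict(groups)


if __name__ == "__main__":
    failed = failed_devices(TEST_ALL_OUTPUT)
    for node, paths in by_pci_node(failed).items():
        # Print only the leaf device name under each PCI node.
        print(node, "->", ", ".join(p.rsplit("/", 1)[1] for p in paths))
```

With the output from post #1, every failing path starts with /pci@9, and no failure is reported for usb@1,3. That fits the fmdump verdict of FRU MB, but it is only a way of reading the logs, not a diagnosis.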
#5
I'm a little confused. I have a relatively recent OS release and two servers with identical hardware configurations. I haven't installed any patch clusters on either server, yet one of them fails occasionally and the other doesn't. Also, since the Sun Fire V490 was released before my OS version, I'd assume the OS already includes all the required patches and software updates. Maybe I'm wrong?
#6
The recommended patch cluster is very important. Please provide the outputs as requested; at least we can tell you if your patch level is too low.
#7
incredible
OK, I will post the output soon.
More UNIX and Linux Forum Topics You Might Find Helpful
| Thread | Thread Starter | Forum | Replies | Last Post |
| Disable Serials ports in Sun Fire v490 | enkei17 | UNIX for Dummies Questions & Answers | 1 | 08-31-2009 02:12 AM |
| Sun Server T2000 occasionally reboot | webster5u | Solaris | 8 | 06-09-2009 03:05 AM |
| Problem While Configuring IPMP on Sun Fire V490 | Linux Bot | Solaris BigAdmin RSS | 0 | 05-20-2009 11:00 PM |
| V490 - Centerplane failed | nam.nguyen | Solaris | 6 | 11-07-2008 08:35 PM |
| Sol 10 on SUN V490: Setting LOCALE | dewets | Solaris | 2 | 10-19-2007 06:03 AM |