Sun Fire v440 Hard disk or controller broken? WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1)


 
# 1  
Old 02-13-2020

Hi,


I have a Sun Fire V440 server that fails to boot up correctly. Many services do not start, and the system responds very slowly to commands. During boot I can see the following error:

Code:
WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
        SCSI transport failed: reason 'reset': retrying command
WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
        Error for Command: read                    Error Level: Retryable
        Requested Block: 689376                    Error Block: 689390
        Vendor: LSILOGIC                           Serial Number: LSI INTERNAL
        Sense Key: Media Error
        ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0

The first two disks, sd0 and sd1, appear to be configured as RAID 1, so I would assume that one of them is bad. But raidctl shows no errors:

Code:
RAID    Volume  RAID            RAID            Disk
Volume  Type    Status          Disk            Status
------------------------------------------------------
c1t0d0  IM      RESYNCING       c1t0d0          OK
                                 c1t1d0          OK


But iostat -en shows soft and hard errors for the raid:

Code:
bash-3.00# iostat -en
  ---- errors ---
  s/w h/w trn tot
    3   6   0   9 c1t0d0
    0   0   0   0 c1t2d0
    0   0   0   0 c1t3d0
    1   0   0   1 c3t600144F0A549542200005CC83C9C0003d0
    1   0   0   1 ssd3

Is it possible that the Raid controller is broken?

Code:
bash-3.00# prtdiag -v
System Configuration: Sun Microsystems  sun4u Sun Fire V440
System clock frequency: 183 MHZ
Memory size: 16GB

==================================== CPUs ====================================
               E$          CPU                    CPU
CPU  Freq      Size        Implementation         Mask    Status      Location
---  --------  ----------  ---------------------  -----   ------      --------
0    1281 MHz  1MB         SUNW,UltraSPARC-IIIi    2.4    on-line      -
1    1281 MHz  1MB         SUNW,UltraSPARC-IIIi    2.4    on-line      -
2    1281 MHz  1MB         SUNW,UltraSPARC-IIIi    2.4    on-line      -
3    1281 MHz  1MB         SUNW,UltraSPARC-IIIi    2.4    on-line      -

================================= IO Devices =================================
Bus     Freq  Slot +      Name +
Type    MHz   Status      Path                          Model
------  ----  ----------  ----------------------------  --------------------
pci     66    MB          pci108e,abba (network)        SUNW,pci-ce
              okay        /pci@1c,600000/network@2

pci     33    MB          isa/su (serial)
              okay        /pci@1e,600000/isa@7/serial@0,3f8

pci     33    MB          isa/su (serial)
              okay        /pci@1e,600000/isa@7/serial

pci     33    MB          isa/rmc-comm-rmc_comm (seria+
              okay        /pci@1e,600000/isa@7/rmc-comm@0,3e8

pci     33    MB          pci10b9,5229 (ide)
              okay        /pci@1e,600000/ide

pci     66    MB          pci108e,abba (network)        SUNW,pci-ce
              okay        /pci@1f,700000/network@1

pci     66    MB          scsi-pci1000,30 (scsi-2)      LSI,1030
              okay        /pci@1f,700000/scsi@2

pci     66    MB          scsi-pci1000,30 (scsi-2)      LSI,1030
              okay        /pci@1f,700000/scsi


============================ Memory Configuration ============================
Segment Table:
-----------------------------------------------------------------------
Base Address       Size       Interleave Factor  Contains
-----------------------------------------------------------------------
0x0                4GB               16          BankIDs 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0x1000000000       4GB               16          BankIDs 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31
0x2000000000       4GB               16          BankIDs 32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
0x3000000000       4GB               16          BankIDs 48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63

Bank Table:
-----------------------------------------------------------
           Physical Location
ID       ControllerID  GroupID   Size       Interleave Way
-----------------------------------------------------------
0        0             0         256MB           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
1        0             0         256MB
2        0             1         256MB
3        0             1         256MB
4        0             0         256MB
5        0             0         256MB
6        0             1         256MB
7        0             1         256MB
8        0             1         256MB
9        0             1         256MB
10       0             0         256MB
11       0             0         256MB
12       0             1         256MB
13       0             1         256MB
14       0             0         256MB
15       0             0         256MB
16       1             0         256MB           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
17       1             0         256MB
18       1             1         256MB
19       1             1         256MB
20       1             0         256MB
21       1             0         256MB
22       1             1         256MB
23       1             1         256MB
24       1             1         256MB
25       1             1         256MB
26       1             0         256MB
27       1             0         256MB
28       1             1         256MB
29       1             1         256MB
30       1             0         256MB
31       1             0         256MB
32       2             0         256MB           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
33       2             0         256MB
34       2             1         256MB
35       2             1         256MB
36       2             0         256MB
37       2             0         256MB
38       2             1         256MB
39       2             1         256MB
40       2             1         256MB
41       2             1         256MB
42       2             0         256MB
43       2             0         256MB
44       2             1         256MB
45       2             1         256MB
46       2             0         256MB
47       2             0         256MB
48       3             0         256MB           0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
49       3             0         256MB
50       3             1         256MB
51       3             1         256MB
52       3             0         256MB
53       3             0         256MB
54       3             1         256MB
55       3             1         256MB
56       3             1         256MB
57       3             1         256MB
58       3             0         256MB
59       3             0         256MB
60       3             1         256MB
61       3             1         256MB
62       3             0         256MB
63       3             0         256MB

Memory Module Groups:
--------------------------------------------------
ControllerID   GroupID  Labels         Status
--------------------------------------------------
0              0        C0/P0/B0/D0
0              0        C0/P0/B0/D1
0              1        C0/P0/B1/D0
0              1        C0/P0/B1/D1
1              0        C1/P0/B0/D0
1              0        C1/P0/B0/D1
1              1        C1/P0/B1/D0
1              1        C1/P0/B1/D1
2              0        C2/P0/B0/D0
2              0        C2/P0/B0/D1
2              1        C2/P0/B1/D0
2              1        C2/P0/B1/D1
3              0        C3/P0/B0/D0
3              0        C3/P0/B0/D1
3              1        C3/P0/B1/D0
3              1        C3/P0/B1/D1

============================ Environmental Status ============================
Fan Status:
-------------------------------------------
Location             Sensor          Status
-------------------------------------------
FT0/F0               TACH            okay
FT1/F0               TACH            okay
FT1/F1               TACH            okay
PS0                  FF_PDCT_FAN     okay

Temperature sensors:
-----------------------------------------
Location       Sensor              Status
-----------------------------------------
C0/P0          T_CORE              okay
C1/P0          T_CORE              okay
C2/P0          T_CORE              okay
C3/P0          T_CORE              okay
C0             T_AMB               okay
C1             T_AMB               okay
C2             T_AMB               okay
C3             T_AMB               okay
SCSIBP         T_AMB               okay
MB             T_AMB               okay
------------------------------------
Current sensors:
----------------------------------------
Location             Sensor       Status
----------------------------------------
MB                   FF_SCSIA     okay
MB                   FF_SCSIB     okay
MB                   FF_POK       okay
C0/P0                FF_POK       okay
C1/P0                FF_POK       okay
C2/P0                FF_POK       okay
C3/P0                FF_POK       okay
------------------------------------
Voltage sensors:
-----------------------------------
Location       Sensor        Status
-----------------------------------
MB             V_+1V5        okay
MB             V_VCCTM       okay
MB             V_NET0_1V2D   okay
MB             V_NET1_1V2D   okay
MB             V_NET0_1V2A   okay
MB             V_NET1_1V2A   okay
MB             V_+3V3        okay
MB             V_+3V3STBY    okay
MB/BAT         V_BAT         warning (0.00V)
MB             V_SCSI_CORE   okay
MB             V_+5V         okay
MB             V_+12V        okay
MB             V_-12V        okay
PS0            P_PWR         okay
PS0            FF_POK        okay
-----------------------------------------
Keyswitch:
-----------------------------------------
Location       Keyswitch   State
-----------------------------------------
SYS            SYSCTRL     NORMAL
--------------------------------------------------
Led State:
--------------------------------------------------------------
Location               Led                   State       Color
--------------------------------------------------------------
SYS                    ACT                   on          green
SYS                    SERVICE               on          amber
SYS                    LOCATE                off         white
PS0                    POK                   on          green
PS0                    STBY                  on          green
PS0                    SERVICE               off         amber
PS0                    OK2RM                 off         blue
HDD0                   SERVICE               off         amber
HDD0                   OK2RM                 off         blue
HDD1                   SERVICE               off         amber
HDD1                   OK2RM                 off         blue
HDD2                   SERVICE               off         amber
HDD2                   OK2RM                 off         blue
HDD3                   SERVICE               off         amber
HDD3                   OK2RM                 off         blue

=========================== FRU Operational Status ===========================
---------------------------------
Fru Operational Status:
---------------------------------
Location                Status
---------------------------------
SC                      okay
HDD0                    present
HDD1                    present
HDD2                    present
HDD3                    present
PS0                     okay

================================ HW Revisions ================================
ASIC Revisions:
-------------------------------------------------------------------
Path                   Device           Status             Revision
-------------------------------------------------------------------
/pci@1c,600000         pci108e,a801     okay               4
/pci@1d,700000         pci108e,a801     okay               4
/pci@1e,600000         pci108e,a801     okay               4
/pci@1f,700000         pci108e,a801     okay               4

System PROM revisions:
----------------------
OBP 4.16.4 2004/12/18 05:20 Sun Fire V440,Netra 440
OBDIAG 4.16.4 2004/12/18 05:21



I'm really thankful for any hints, as I have no clue how to proceed with this.


Best Regards,
Oliver
# 2  
Old 02-13-2020
The RAID controller is not reporting "no errors", as you put it.

RESYNCING means that the controller is re-mirroring the RAID 1 disks because of a problem. Depending on the capacity of the disks (they will typically be exactly the same size), this resyncing shouldn't take very long. While it is in progress, however, system response time will be impacted. Once complete, the status should become OPTIMAL.

If the resyncing is falling over for some reason, the process might be restarting over and over so that OPTIMAL is never reached. Watch for that. If that is the case, I would first, if possible, take the system down and re-seat all SCSI/SATA cables at both ends (disk and motherboard) and all disk power supply plugs. Reboot and see if the problem persists. If it does, then most likely one of the disks is faulty. It's possible but unlikely that the RAID controller is faulty; the disks are the only moving parts.

You could remove the faulty RAID 1 drive (the one continuously resyncing) and put it in another machine to run diagnostics on it. Perhaps completely reformat it and try again. Otherwise, a new disk is required.
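
As a rough aid for the disk-versus-controller question, the sense data in the kernel warning from the first post can be pulled apart mechanically. The sketch below is Python rather than anything shipped with the server, and the function name `parse_sd_warning` is made up for illustration. A Media Error sense key with ASC 0x11 generally points at an unreadable block on the medium rather than at the bus or the controller, but since the vendor field here is LSILOGIC the message is being relayed by the RAID firmware, so treat that reading as a hint and not a verdict.

```python
import re

# Warning text copied from the first post in this thread.
SD_WARNING = """\
WARNING: /pci@1f,700000/scsi@2/sd@0,0 (sd1):
        Error for Command: read                    Error Level: Retryable
        Requested Block: 689376                    Error Block: 689390
        Vendor: LSILOGIC                           Serial Number: LSI INTERNAL
        Sense Key: Media Error
        ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
"""


def parse_sd_warning(text):
    """Extract the sd instance, failing block and sense data from one warning."""
    patterns = {
        "instance": r"WARNING:\s+\S+\s+\((\w+)\)",
        "error_block": r"Error Block:\s+(\d+)",
        "sense_key": r"Sense Key:\s+(.+)",
        "asc": r"ASC:\s+(0x[0-9a-fA-F]+)",
        "ascq": r"ASCQ:\s+(0x[0-9a-fA-F]+)",
    }
    fields = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            fields[name] = match.group(1).strip()
    # Convert the numeric fields so they can be compared and sorted.
    if "error_block" in fields:
        fields["error_block"] = int(fields["error_block"])
    for key in ("asc", "ascq"):
        if key in fields:
            fields[key] = int(fields[key], 16)
    return fields


if __name__ == "__main__":
    print(parse_sd_warning(SD_WARNING))
```

Run against a saved copy of the console or messages log, this makes it easy to see whether the same error block keeps coming back, which would fit a bad spot on one disk.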
# 3  
Old 02-13-2020
Hi,


The status is shown as optimal. I would guess that if a disk had failed or were failing, raidctl would show that? How can I identify which of the two disks is bad if raidctl claims everything is OK? I have powered down the server many times, but I have not yet re-plugged all the cables. I will give it a try.

Last edited by hicksd8; 02-13-2020 at 06:32 AM..
# 4  
Old 02-13-2020
Watch for a repeat of the resyncing. If it keeps happening, something is wrong (probably with one of the disks). You will also see high activity on the disk LEDs, which might be easier to spot than repeatedly running raidctl.
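
If watching by hand gets tedious, the check can be scripted. This is only a sketch: it assumes Python 3.7 or later is available on the host, that running `raidctl` with no arguments prints the table shown in the first post, and that the column layout matches that sample. The names `parse_raidctl`, `count_resync_starts` and `watch` are hypothetical helpers, not part of any Solaris tool.

```python
import re
import subprocess
import time

# Output copied from the first post in this thread.
RAIDCTL_OUTPUT = """\
RAID    Volume  RAID            RAID            Disk
Volume  Type    Status          Disk            Status
------------------------------------------------------
c1t0d0  IM      RESYNCING       c1t0d0          OK
                                 c1t1d0          OK
"""

DEVICE = re.compile(r"^c\d+t\d+d\d+$")


def parse_raidctl(text):
    """Return {volume: {"type", "status", "disks": {disk: status}}}."""
    volumes = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 5 and DEVICE.match(parts[0]):
            # Volume row: volume, type, volume status, first disk, disk status.
            current = parts[0]
            volumes[current] = {
                "type": parts[1],
                "status": parts[2],
                "disks": {parts[3]: parts[4]},
            }
        elif len(parts) == 2 and DEVICE.match(parts[0]) and current:
            # Continuation row: a further member disk of the current volume.
            volumes[current]["disks"][parts[0]] = parts[1]
    return volumes


def count_resync_starts(statuses):
    """Count how many times a sequence of statuses enters RESYNCING."""
    starts = 0
    previous = None
    for status in statuses:
        if status == "RESYNCING" and previous != "RESYNCING":
            starts += 1
        previous = status
    return starts


def watch(volume="c1t0d0", interval=60, samples=30):
    """Poll raidctl and report how often the volume (re)started resyncing."""
    seen = []
    for _ in range(samples):
        # Assumption: plain `raidctl` prints the table parsed above.
        out = subprocess.run(["raidctl"], capture_output=True, text=True).stdout
        seen.append(parse_raidctl(out).get(volume, {}).get("status"))
        time.sleep(interval)
    return count_resync_starts(seen)
```

A result greater than one from `watch()` would mean the volume left RESYNCING and then entered it again during the observation window, which is the repeat described above.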
# 5  
Old 02-13-2020
Do I need to issue a command in order to remove one of the disks? Is the raid hotplug capable?
# 6  
Old 02-13-2020
Yes, the onboard raid is hotplug capable but, of course, you need to be sure that you're pulling the right disk.

With the system down you can pull out and re-seat both of them to try to ensure good connection with the hotplug sockets.
# 7  
Old 02-13-2020
Also, from your original post, it shows that c1t0d0 is the disk being rebuilt (RESYNCING) and c1t1d0 is running OK.
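
For reference, the `iostat -en` counters from the first post can be filtered the same way. The sketch below is Python, with `parse_iostat_errors` and `devices_with_hard_errors` as made-up helper names. Note that c1t1d0 does not appear in that iostat listing at all, so c1t0d0 there seems to stand for the mirrored volume as a whole. The counters show that the volume has hard errors but do not by themselves say which member disk caused them.

```python
# Output copied from the first post in this thread.
IOSTAT_OUTPUT = """\
  ---- errors ---
  s/w h/w trn tot
    3   6   0   9 c1t0d0
    0   0   0   0 c1t2d0
    0   0   0   0 c1t3d0
    1   0   0   1 c3t600144F0A549542200005CC83C9C0003d0
    1   0   0   1 ssd3
"""


def parse_iostat_errors(text):
    """Return {device: {"soft", "hard", "transport", "total"}}."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four integer counters followed by a device name.
        if len(parts) == 5 and all(p.isdigit() for p in parts[:4]):
            soft, hard, transport, total = (int(p) for p in parts[:4])
            rows[parts[4]] = {
                "soft": soft,
                "hard": hard,
                "transport": transport,
                "total": total,
            }
    return rows


def devices_with_hard_errors(rows):
    """List devices whose hard error counter is non-zero."""
    return [name for name, counts in rows.items() if counts["hard"] > 0]


if __name__ == "__main__":
    print(devices_with_hard_errors(parse_iostat_errors(IOSTAT_OUTPUT)))
```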