zpool status shows things NOT OK, but 3rd party raid says all is well


# 1  
Hi,
I've gone around with this on Oracle's site (and with tech support) and ended up empty-handed, with no idea how to fix the problem.
Background:
A V245 running Solaris 10 has two 12-disk Infortrend RAIDs attached.
I have replaced faulty disks many times and am familiar with the routine. However, this time didn't go as routine, and didn't go as "replacement disk was faulty" or anything else that would be normal and logical. This time, things went haywire.
Now, the RAID software says everything is OK - LUNs, logical disks, etc - no errors, no red lights, nothing in its log...
But,
Code:
idadcc# zpool status -v dp1
  pool: dp1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: 
 scrub: scrub completed after 14h14m with 65 errors on Thu Apr  5 05:25:13 2012
config:

        NAME           STATE     READ WRITE CKSUM
        dp1            DEGRADED     0     0   154
          raidz1-0     DEGRADED     0     0   308
            spare-0    DEGRADED     0     0     0
              c2t0d0   DEGRADED     0     0     0  too many errors
              c2t0d11  ONLINE       0     0     0
            spare-1    DEGRADED     0     0     0
              c2t0d1   DEGRADED     0     0     0  too many errors
              c2t0d10  ONLINE       0     0     0
            c2t0d2     DEGRADED     0     0     0  too many errors
            c2t0d3     DEGRADED     0     0     0  too many errors
            c2t0d4     DEGRADED     0     0     0  too many errors
          raidz1-1     ONLINE       0     0     0
            c2t0d5     ONLINE       0     0     0
            c2t0d6     ONLINE       0     0     0
            c2t0d7     ONLINE       0     0     0
            c2t0d8     ONLINE       0     0     0
            c2t0d9     ONLINE       0     0     0
        spares
          c2t0d10      INUSE     currently in use
          c2t0d11      INUSE     currently in use

errors: Permanent errors have been detected in the following files...

(and a list of -- files that are actually my snapshots)....

I've done the "clear" and the "scrub" - all the usual tricks. Same result.

Code:
idadcc# echo | format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@0,0
       1. c0t1d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@1,0
       2. c0t2d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@2,0
       3. c0t3d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1e,600000/pci@0/pci@a/pci@0/pci@8/scsi@1/sd@3,0
       4. c1t0d0 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,0
       5. c1t0d1 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,1
       6. c1t0d2 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,2
       7. c1t0d3 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,3
       8. c1t0d4 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,4
       9. c1t0d5 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,5
      10. c1t0d6 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,6
      11. c1t0d7 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,7
      12. c1t0d8 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,8
      13. c1t0d9 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,9
      14. c1t0d10 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,a
      15. c1t0d11 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0/LSILogic,scsi@2/sd@0,b
      16. c2t0d0 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,0
      17. c2t0d1 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,1
      18. c2t0d2 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,2
      19. c2t0d3 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,3
      20. c2t0d4 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,4
      21. c2t0d5 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,5
      22. c2t0d6 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,6
      23. c2t0d7 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,7
      24. c2t0d8 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,8
      25. c2t0d9 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,9
      26. c2t0d10 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,a
      27. c2t0d11 <IFT-A12U-G2421-347R-931.26GB>
          /pci@1f,700000/pci@0,2/scsi@1,1/sd@0,b

FWIW, it was the slot 1 disk that I replaced. I even replaced it with a different disk. I've gone through the delete/remake of the LUN/logical drive on the RAID several times. The RAID looks happy. Solaris doesn't.
ANY suggestions welcome. Suggestions on Solaris-friendly RAID vendors are welcome too - I'd like to get a system that doesn't end up with finger-pointing at "the other guy" whenever I have questions that I can't find answers to!
Thank you!!!

Moderator's Comments:
Welcome to the UNIX and Linux Forums. Please use code tags.

Last edited by Scrutinizer; 04-06-2012 at 04:34 PM. Reason: Code tags
# 2  
Here is why you're stuck between Oracle ZFS and the vendor:

Should ZFS Have a fsck Tool?

It doesn't help your situation. I have no good answer because ZFS is SUPPOSED to be self-healing. There is only zpool scrub, which I guess you've tried. It does sound like whatever the SAN software does to replicate/snapshot a LUN had some issues.
# 3  
An fsck would be of no use. The file systems are mountable here. Whatever the file system, fsck is a tool that fixes the structure, not the file content, which is the issue here.
I would take a snapshot to lock the faulty blocks and restore the problem files (and only them) from a backup.
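
To work out which files would need restoring, the list under the "errors: Permanent errors" line of zpool status -v can be pulled out with a small script. This is only a rough sketch: the sample text and the function names are made up for illustration, and it assumes entries inside a snapshot are printed as dataset@snapshot:/path while live files are printed as plain paths.

```python
# Hypothetical sample shaped like the tail of `zpool status -v`; the paths are invented.
SAMPLE = """\
  pool: dp1
 state: DEGRADED
errors: Permanent errors have been detected in the following files:

        dp1/home@monday:/docs/a.txt
        /dp1/home/b.txt
"""


def permanent_error_paths(status_text):
    """Return the non-empty entries that follow the 'Permanent errors' line."""
    entries = []
    in_errors = False
    for line in status_text.splitlines():
        if line.startswith("errors: Permanent errors"):
            in_errors = True
            continue
        if in_errors and line.strip():
            entries.append(line.strip())
    return entries


def split_snapshot_entry(entry):
    """Split 'dataset@snap:/path' into (dataset, snap, path).

    Entries without a dataset@snap prefix come back as (None, None, entry).
    """
    head, sep, path = entry.partition(":")
    if sep and "@" in head:
        dataset, _, snap = head.partition("@")
        return dataset, snap, path
    return None, None, entry


if __name__ == "__main__":
    for entry in permanent_error_paths(SAMPLE):
        print(split_snapshot_entry(entry))
```

Separating the snapshot entries matters because snapshots are read-only, so a file reported inside one cannot simply be overwritten from a backup the way a live file can.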
# 4  
zpool clear
# 5  
Quote:
Originally Posted by jlouki01
zpool clear
As stated in the first post, that doesn't work!
# 6  
You can check the disk errors with iostat -En.

Even if your controller says everything is OK, it could be wrong.
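
With 28 devices the iostat -En output gets long, so here is a rough sketch that flags only the devices with non-zero counters. It assumes each device block starts with a line of the form "c2t0d1  Soft Errors: N Hard Errors: N Transport Errors: N"; the sample text and its counts are invented, and devices_with_errors is just a name picked for this example.

```python
import re

# Matches the first line of each device block in `iostat -En` output.
HEADER = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)"
)

# Hypothetical sample; the error counts are invented for illustration.
SAMPLE = """\
c2t0d0           Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
c2t0d1           Soft Errors: 2 Hard Errors: 5 Transport Errors: 1
Media Error: 3 Device Not Ready: 0 No Device: 2 Recoverable: 2
"""


def devices_with_errors(iostat_text):
    """Map device name -> error counters, keeping only devices with any non-zero counter."""
    flagged = {}
    for line in iostat_text.splitlines():
        match = HEADER.match(line)
        if not match:
            continue  # detail lines (Vendor, Size, Media Error, ...) are skipped
        soft, hard, transport = (int(match.group(i)) for i in (2, 3, 4))
        if soft or hard or transport:
            flagged[match.group(1)] = {
                "soft": soft,
                "hard": hard,
                "transport": transport,
            }
    return flagged


if __name__ == "__main__":
    for device, counts in devices_with_errors(SAMPLE).items():
        print(device, counts)
```

To try it on the real box, save the output of iostat -En to a file and pass the file contents to devices_with_errors instead of SAMPLE.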

Juan