Full Discussion: 13 disk raidz2 pool lost
Post 302707289 by tatxo on Friday 28th of September 2012 10:05:05 AM

Hi guys, I would appreciate any help with this. We have lost sensitive company data.

We have one box with 2 mirrored disks and a 3ware controller handling 13 disks in a raidz2 pool. Suddenly the box restarted and then stayed at "Reading ZFS config" for hours.

Unplugging disk by disk, we isolated the disk that was preventing the system from restarting, and we ran 'zpool clear -F' as suggested by the 'zpool status' command. Hours into that process we got a console error from the controller and the system hung, so we decided to swap that disk, which took the pool from DEGRADED to FAULTED. After a 'zpool clear' the pool went back to DEGRADED, but with no access to the data, so we tried to roll back by putting the previous disks back in. (We did not run any 'zpool replace'.)

The box kept restarting, freezing and failing to boot, so we decided to plug the original 13 disks into another box with the same hardware.

Now we are trying to import the pool there. After hours of processing and heavy disk activity, the box hangs and the import does not succeed. This is the output of the 'zpool import' command:

Code:
state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        zsan08rz2     DEGRADED
          raidz2-0    DEGRADED
            c10t2d0   FAULTED  corrupted data
            c10t2d0   ONLINE
            c10t5d0   ONLINE
            c10t9d0   ONLINE
            c10t0d0   ONLINE
            c10t1d0   ONLINE
            c10t4d0   ONLINE
            c10t8d0   ONLINE
            c10t12d0  ONLINE
            c10t11d0  ONLINE
            c10t3d0   ONLINE
            c10t7d0   ONLINE
            c10t6d0   ONLINE

Any ideas? Note that c10t2d0 is listed twice, and that during the last import attempt we got this error from the controller on the console:

Code:
zsan08 tw: WARNING: tw0: tw_aen_task AEN 0x000a Drive error detected unit=7 port=13

This drive seems to be a different one from c10t2d0.
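
For anyone who wants to check the listing mechanically, here is a minimal Python sketch. It is plain text parsing, not a ZFS tool: it reads the device lines from the 'zpool import' output above, reports names that appear more than once, and splits the controller warning into its fields. The helper names (parse_devices, duplicated_names, parse_aen) and the regular expressions are made up for illustration, and the pattern for the tw message is inferred only from the single line quoted above.

```python
import re
from collections import Counter

# Device section copied from the 'zpool import' output in this post.
CONFIG = """\
        zsan08rz2     DEGRADED
          raidz2-0    DEGRADED
            c10t2d0   FAULTED  corrupted data
            c10t2d0   ONLINE
            c10t5d0   ONLINE
            c10t9d0   ONLINE
            c10t0d0   ONLINE
            c10t1d0   ONLINE
            c10t4d0   ONLINE
            c10t8d0   ONLINE
            c10t12d0  ONLINE
            c10t11d0  ONLINE
            c10t3d0   ONLINE
            c10t7d0   ONLINE
            c10t6d0   ONLINE
"""

# Console warning copied from this post.
AEN_LINE = ("zsan08 tw: WARNING: tw0: tw_aen_task AEN 0x000a "
            "Drive error detected unit=7 port=13")

# Solaris-style disk names of the form cXtYdZ, followed by a state and an
# optional note. Pool and vdev lines (zsan08rz2, raidz2-0) do not match.
DEVICE_RE = re.compile(r"^\s*(c\d+t\d+d\d+)\s+(\S+)(?:\s+(.*))?$")

# Assumed layout of the tw AEN message, based on the one sample line only.
AEN_RE = re.compile(r"AEN (0x[0-9a-fA-F]+) (.+?) unit=(\d+) port=(\d+)")


def parse_devices(text):
    """Return (name, state, note) for every line that looks like a disk."""
    devices = []
    for line in text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            note = (match.group(3) or "").strip()
            devices.append((match.group(1), match.group(2), note))
    return devices


def duplicated_names(devices):
    """Return the sorted device names that occur more than once."""
    counts = Counter(name for name, _, _ in devices)
    return sorted(name for name, count in counts.items() if count > 1)


def parse_aen(line):
    """Split the controller warning into code, text, unit and port."""
    match = AEN_RE.search(line)
    if match is None:
        return None
    return {
        "code": int(match.group(1), 16),
        "text": match.group(2),
        "unit": int(match.group(3)),
        "port": int(match.group(4)),
    }


devices = parse_devices(CONFIG)
print("device lines:", len(devices))
print("distinct names:", len({name for name, _, _ in devices}))
print("duplicated:", duplicated_names(devices))
print("not ONLINE:", [d for d in devices if d[1] != "ONLINE"])
print("AEN:", parse_aen(AEN_LINE))
```

On the listing above this finds 13 device lines but only 12 distinct names, with c10t2d0 as the only repeated one, and it reads the warning as unit 7, port 13. It cannot say how a controller port maps onto a cXtYdZ name; that depends on how the units are configured on the controller.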

Suggestions? Thanks!
 

TWE(4)							   BSD Kernel Interfaces Manual 						    TWE(4)

NAME
     twe -- 3ware 5000/6000/7000/8000 series PATA/SATA RAID adapter driver

SYNOPSIS
     To compile this driver into the kernel, place the following lines in
     your kernel configuration file:

           device pci
           device twe

     Alternatively, to load the driver as a module at boot time, place the
     following line in loader.conf(5):

           twe_load="YES"

DESCRIPTION
     The twe driver provides support for AMCC's 3ware 5000/6000/7000/8000
     series PATA/SATA RAID adapters. These adapters were formerly known as
     ``3ware Escalade''.

     These devices support 2, 4, 8, or 12 ATA disk drives and provide RAID0
     (striping) and RAID1 (mirroring) functionality.

HARDWARE
     The twe driver supports the following PATA/SATA RAID controllers:

     o   AMCC's 3ware 5000 series
     o   AMCC's 3ware 6000 series
     o   AMCC's 3ware 7000-2
     o   AMCC's 3ware 7006-2
     o   AMCC's 3ware 7500-4LP
     o   AMCC's 3ware 7500-8
     o   AMCC's 3ware 7500-12
     o   AMCC's 3ware 7506-4LP
     o   AMCC's 3ware 7506-8
     o   AMCC's 3ware 7506-12
     o   AMCC's 3ware 8006-2LP
     o   AMCC's 3ware 8500-4LP
     o   AMCC's 3ware 8500-8
     o   AMCC's 3ware 8500-12
     o   AMCC's 3ware 8506-4LP
     o   AMCC's 3ware 8506-8
     o   AMCC's 3ware 8506-8MI
     o   AMCC's 3ware 8506-12
     o   AMCC's 3ware 8506-12MI

DIAGNOSTICS
   Controller initialisation phase
     twe%d: microcontroller not ready

     The controller's onboard CPU is not reporting that it is ready; this
     may be due to either a board or system failure. Initialisation has
     failed.

     twe%d: no attention interrupt
     twe%d: can't drain AEN queue
     twe%d: reset not reported
     twe%d: controller errors detected
     twe%d: can't drain response queue
     twe%d: reset %d failed, trying again

     The controller is not responding correctly to the driver's attempts to
     reset and initialise it. This process is retried several times.

     twe%d: can't initialise controller, giving up

     Several attempts to reset and initialise the controller have failed;
     initialisation has failed and the driver will not attach to this
     controller.

   Driver initialisation/shutdown phase
     twe%d: register window not available
     twe%d: can't allocate register window
     twe%d: can't allocate parent DMA tag
     twe%d: can't allocate interrupt
     twe%d: can't set up interrupt
     twe%d: can't establish configuration hook

     A resource allocation error occurred while initialising the driver;
     initialisation has failed and the driver will not attach to this
     controller.

     twe%d: can't detect attached units

     Fetching the list of attached units failed; initialisation has failed.

     twe%d: error fetching capacity for unit %d
     twe%d: error fetching state for unit %d
     twe%d: error fetching descriptor size for unit %d
     twe%d: error fetching descriptor for unit %d
     twe%d: device_add_child failed
     twe%d: bus_generic_attach returned %d

     Creation of the disk devices failed, either due to communication
     problems with the adapter or due to resource shortage; attachment of
     one or more units may have been aborted.

   Operational phase
     twe%d: command completed - %s

     A command was reported completed with a warning by the controller. The
     warning may be one of:

           redundant/inconsequential request ignored
           failed to write zeroes to LBA 0
           failed to profile TwinStor zones

     twe%d: command failed - %s

     A command was reported as failed by the controller. The failure message
     may be one of:

           aborted due to system command or reconfiguration
           aborted
           access error
           access violation
           device failure
           controller error
           timed out
           invalid unit number
           unit not available
           undefined opcode
           request incompatible with unit
           invalid request
           firmware error, reset requested

     The command will be returned to the operating system after a fatal
     error.

     twe%d: command failed submission - controller wedged

     A command could not be delivered to the controller because the
     controller is unresponsive.

     twe%d: AEN: <%s>

     The controller has reported a change in status using an AEN
     (Asynchronous Event Notification). The following AENs may be reported:

           queue empty
           soft reset
           degraded mirror
           controller error
           rebuild fail
           rebuild done
           incomplete unit
           initialisation done
           unclean shutdown detected
           drive timeout
           drive error
           rebuild started
           aen queue full

     AENs are also queued internally for use by management tools.

     twe%d: error polling for signalled AENs

     The controller has reported that one or more status messages are ready
     for the driver, but attempting to fetch one of these has returned an
     error.

     twe%d: AEN queue overflow, lost AEN <%s>

     A status message was retrieved from the controller, but there is no
     more room to queue it in the driver. The message is lost (but will be
     printed to the console).

     twe%d: missing expected status bits %s
     twe%d: unexpected status bits %s

     A check of the controller's status bits indicates an unexpected
     condition.

     twe%d: host interrupt

     The controller has signalled a host interrupt. This serves an unknown
     purpose and is ignored.

     twe%d: command interrupt

     The controller has signalled a command interrupt. This is not used, and
     will be disabled.

     twe%d: controller reset in progress...

     The controller is being reset by the driver. Typically this is done
     when the driver has determined that the controller is in an
     unrecoverable state.

     twe%d: can't reset controller, giving up

     The driver has given up on resetting the controller. No further I/O
     will be handled.

     controller reset done, %d commands restarted

     The controller was successfully reset, and outstanding commands were
     restarted.

AUTHORS
     The twe driver and manual page were written by Michael Smith
     <msmith@FreeBSD.org>. Extensive work done on the driver by Vinod Kashyap
     <vkashyap@FreeBSD.org> and Paul Saab <ps@FreeBSD.org>.

BUGS
     The controller cannot handle I/O transfers that are not aligned to a
     512-byte boundary. In order to support raw device access from
     user-space, the driver will perform alignment fixup on non-aligned
     data. This process is inefficient, and thus in order to obtain best
     performance user-space applications accessing the device should do so
     with aligned buffers.

BSD                             August 15, 2004                            BSD