File System Corruption on IBM DS8300


 
# 1  
Old 05-27-2009

Hi All,

We are facing recurring file system corruption on file systems hosted on a DS8300. We have put a lot of effort into finding the root cause, so far without success. The system runs AIX 5.3 with the latest patches. We have upgraded the HBA firmware, the DS8300 firmware, the system firmware and the fabric switch firmware, and recently deployed brand-new switches, but the problem still exists.

When the problem occurs we have to take our live services down, unmount the affected file system, repair it with the fsck utility and then restart the services, which results in about 30-40 minutes of downtime. We have raised the problem with IBM each time it occurred, but their analysis, including the PE packages from the DS, did not find any abnormality.

Has anyone seen this kind of file system corruption on a DS, or does anyone have a suggestion or idea?

Regards
# 2  
Old 05-27-2009
Which application is writing or using the 'corrupt' data?
This could be an application problem.
Does the application use raw filesystems?
Get your application provider in the loop and get them talking to IBM.
HTH.
# 3  
Old 05-28-2009
You will have to ask IBM to help you fix this.

There are a few things I know you need to look at.
If you are using the new 450 disks and space-efficient FlashCopy, you must upgrade to the latest level.

There was a problem in this area. Also, disks that are SAN boot root disks will in some cases have to be recreated to recover from it. This can be done by remirroring and moving off the disk, but the old disk must then be removed and recreated on the storage.
# 4  
Old 05-28-2009
The corruption occurred sometimes on the application file system and sometimes on the database file system. We are using BEA WebLogic and Oracle 10g Database. Previously the corruption occurred about every two weeks, and over the last two months we have noted that it still comes back on the DS8300 after two weeks. We do not have an HACMP environment, but we are using SDDPCM with MPIO for attaching the host; we make the VGs and file systems available on one host at a time.

The error reported on the host was FILE SYSTEM CORRUPTION. When we looked at the error report with errpt -a, it showed a file name, j2imap.c, and at the end the name of the affected file system. Yes, fsck always fixes it. While repairing the file system, fsck displays messages like "superblock marked dirty", but reports them as fixed.
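
To see how regular the incidents really are, the timestamps could be pulled out of the error report. This is only a rough sketch: it assumes each entry in the `errpt -a` text has a `Date/Time:` line ahead of its description, and `corruption_times` is a made-up helper name, not an AIX command.

```shell
# Reads `errpt -a` text on stdin and prints the Date/Time of every entry
# that contains the string "FILE SYSTEM CORRUPTION".
# Assumption: the "Date/Time:" line of an entry comes before its description.
corruption_times() {
    awk '
        # Remember the timestamp of the entry currently being read.
        /^Date\/Time:/ { sub(/^Date\/Time:[ \t]*/, ""); when = $0 }
        # Print that timestamp once per entry that mentions the corruption.
        /FILE SYSTEM CORRUPTION/ && when != "" { print when; when = "" }
    '
}
```

It would be used as `errpt -a | corruption_times`, and the resulting list can then be compared against backup windows, FlashCopy schedules or other periodic jobs.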
# 5  
Old 05-28-2009
Quote:
Originally Posted by m_raheelahmed
We do not have an HACMP environment, but we are using SDDPCM with MPIO for attaching the host; we make the VGs and file systems available on one host at a time.
This sounds like asking for trouble: if somehow two machines concurrently access the volumes it will result in corrupted filesystems, probably even if not both systems actually write to the disk. I remember reading the HACMP scripts for taking over the shared volumes from one cluster node to another once (back in the days when disks were SCSI or SSA) and they were an absolute nightmare of low-level device manipulation to avoid such problems.

Verify you really really always access the LUNs only from one system at a time.
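
One rough way to check this is to capture the output of `lsvg -o` (the list of varied-on volume groups) on each host at the same moment and compare the two lists. The sketch below assumes the two outputs were saved to files and that a shared volume group has the same name on both hosts; `shared_active_vgs` and the file names are made up for illustration.

```shell
# Usage: shared_active_vgs HOST_A_FILE HOST_B_FILE
# Each file holds the output of `lsvg -o` captured on one host
# (one active volume group name per line).
# Prints every volume group that is active on both hosts, except rootvg,
# which each host legitimately has for itself.
shared_active_vgs() {
    # De-duplicate each list, merge them, and keep names that occur twice.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d | grep -v '^rootvg$'
    # grep exits non-zero when nothing is shared; that is the good case.
    return 0
}
```

For example, `shared_active_vgs hostA.lsvg hostB.lsvg` should print nothing if the volume groups are really only ever active on one system. A name-based comparison will miss a volume group that was imported under different names on the two hosts, so comparing PVIDs would be more robust.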

I hope this helps.

bakunin
# 6  
Old 05-29-2009
We use the same configuration:

MPIO with SDDPCM, two VIO servers for non-HACMP systems and two for HACMP systems, on a p570 POWER6 (9117-MMA) in this example.

We run Oracle 10g with OCR and ASM, Oracle 10g on JFS2, many DB2 v9.1 instances on JFS2 with SAP, and a lot of Java applications, which are always candidates for damaging file systems, and we have no problems.

I can't tell you what's wrong on your systems, but I can tell you our settings:

Code:
vio-server:
:/home/padmin-->lsdev -Ccadapter | grep fc
fcs0    Available 02-00 4Gb FC PCI Express Adapter (df1000fe)
fcs1    Available 02-01 4Gb FC PCI Express Adapter (df1000fe)
fcs2    Available 03-00 4Gb FC PCI Express Adapter (df1000fe)
fcs3    Available 03-01 4Gb FC PCI Express Adapter (df1000fe)

:/home/padmin-->lsattr -El fcs0
bus_intr_lvl             Bus interrupt level                                False
bus_io_addr   0xff800    Bus I/O address                                    False
bus_mem_addr  0xffe7e000 Bus memory address                                 False
init_link     al         INIT Link flags                                    True
intr_msi_1    66085      Bus interrupt level                                False
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x200000   Maximum Transfer Size                              True
num_cmd_elems 1024       Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True

all adapters have the same settings, mode is load balance

>pcmpath query device 37

DEV#:  37  DEVICE NAME: hdisk37  TYPE: 2107900  ALGORITHM:  Load Balance
SERIAL: 75DP8911018
===========================================================================
Path#      Adapter/Path Name          State     Mode     Select     Errors
    0           fscsi0/path0           OPEN   NORMAL   25009976         10
    1           fscsi0/path1           OPEN   NORMAL   25033949         10
    2           fscsi1/path2           OPEN   NORMAL   25019659          8
    3           fscsi1/path3           OPEN   NORMAL   25029155          9
    4           fscsi2/path4           OPEN   NORMAL   25018403          8
    5           fscsi2/path5           OPEN   NORMAL   25031846          8
    6           fscsi3/path6           OPEN   NORMAL   25034755          9
    7           fscsi3/path7           OPEN   NORMAL   25022454          9

/home/padmin-->lslpp -l | grep -i sddpc
  devices.sddpcm.53.rte      2.2.0.0  COMMITTED  IBM SDD PCM for AIX V53
  devices.sddpcm.53.rte      2.2.0.0  COMMITTED  IBM SDD PCM for AIX V53

/home/padmin-->oslevel -s
5300-08-01-0819

/home/padmin-->ioslevel
1.5.2.1-FP-11.1



>lscfg -vpl fcs0
  fcs0             U789D.001.DQD21A1-P1-C2-T1  4Gb FC PCI Express Adapter (df1000fe)

        Part Number.................10N7255
        Serial Number...............xxx
        Manufacturer................001F
        EC Level....................A
        Customer Card ID Number.....xxx
        FRU Number.................. 10N7255
        Device Specific.(ZM)........3
        Network Address.............xxx
        ROS Level and ID............02E82752
        Device Specific.(Z0)........2057706D
        Device Specific.(Z1)........00000000
        Device Specific.(Z2)........00000000
        Device Specific.(Z3)........03000909
        Device Specific.(Z4)........FFE01212
        Device Specific.(Z5)........02E82752
        Device Specific.(Z6)........06E12715
        Device Specific.(Z7)........07E12752
        Device Specific.(Z8)........xxx
        Device Specific.(Z9)........ZS2.71A2
        Device Specific.(ZA)........Z1F2.70A5
        Device Specific.(ZB)........Z2F2.71A2
        Device Specific.(ZC)........00000000
        Hardware Location Code......U789D.001.DQD21A1-P1-C2-T1


  PLATFORM SPECIFIC

  Name:  fibre-channel
    Model:  10N7255
    Node:  fibre-channel@0
    Device Type:  fcp
    Physical Location: U789D.001.DQD21A1-P1-C2-T1




lsmcode -d fcs0:

Microcode: 

DISPLAY MICROCODE LEVEL                                                   802110
fcs0    4Gb FC PCI Express Adapter (df1000fe)





sample lpar: 

:/-->lspath
Enabled hdisk0  vscsi2
Enabled hdisk57 vscsi1
Enabled hdisk2  vscsi2
Enabled hdisk58 vscsi1
Enabled hdisk60 vscsi1
Enabled hdisk55 vscsi1
Enabled hdisk61 vscsi1
Enabled hdisk62 vscsi1
Enabled hdisk56 vscsi1
Enabled hdisk63 vscsi1
Enabled hdisk64 vscsi1
Enabled hdisk59 vscsi1
Enabled hdisk2  vscsi0
Enabled hdisk0  vscsi0
Enabled hdisk55 vscsi3
Enabled hdisk56 vscsi3
Enabled hdisk57 vscsi3
Enabled hdisk58 vscsi3
Enabled hdisk59 vscsi3
Enabled hdisk60 vscsi3
Enabled hdisk61 vscsi3
Enabled hdisk62 vscsi3
Enabled hdisk63 vscsi3
Enabled hdisk64 vscsi3


vscsi0 root disks from vio1
vscsi2 root disks from vio2
vscsi1 data disks from vio1
vscsi3 data disks from vio2


:/-->oslevel -s
5300-07-01-0748

If you need more information, feel free to ask ^^


I would run
Code:
filemon -O lf,lv,pv

to trace read/write activity and the files accessed.


The trace file will be very big!
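
To compare path health between systems, the Errors column of the `pcmpath query device` listing shown earlier can be summed per hdisk. This is just a sketch that assumes the column layout of that listing (six columns per path row, Errors last); `path_error_totals` is a made-up helper name.

```shell
# Reads `pcmpath query device` output on stdin and prints, for each hdisk,
# the number of paths and the sum of the Errors column.
path_error_totals() {
    awk '
        # Header line: "DEV#: 37 DEVICE NAME: hdisk37 TYPE: ..."
        /DEVICE NAME:/ {
            for (i = 1; i <= NF; i++) if ($i == "NAME:") dev = $(i + 1)
        }
        # Path rows: path number, adapter/path name, state, mode, select, errors
        NF == 6 && $1 ~ /^[0-9]+$/ && $6 ~ /^[0-9]+$/ {
            if (!(dev in paths)) order[++n] = dev
            paths[dev]++
            errs[dev] += $6
        }
        END {
            for (k = 1; k <= n; k++)
                printf "%s paths=%d errors=%d\n", order[k], paths[order[k]], errs[order[k]]
        }
    '
}
```

Run as `pcmpath query device | path_error_totals`. For the hdisk37 listing above it prints `hdisk37 paths=8 errors=71`. A device whose error count grows quickly between two runs would be worth a closer look.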