Receiving: 4B436A3D 0313233216 T H fscsi0 LINK ERROR

03-14-2016


Hey All,

I'm receiving the following error on a Power5 9133-55A after writing 2-5 files to the LUN:

4B436A3D 0313233216 T H fscsi0 LINK ERROR

I can create the filesystem, volume groups, etc. All goes well until there is sustained activity on the LUN; then the above error shows up, with no messages on the target.

Code:

[ AIX root@mdsnim01:/htpc ] lsattr -El fcs0
bus_intr_lvl  277        Bus interrupt level                                False
bus_io_addr   0xdf800    Bus I/O address                                    False
bus_mem_addr  0xe8081000 Bus memory address                                 False
init_link     al         INIT Link flags                                    True
intr_priority 3          Interrupt priority                                 False
lg_term_dma   0x800000   Long term DMA                                      True
max_xfer_size 0x400000   Maximum Transfer Size                              True
num_cmd_elems 200        Maximum number of COMMANDS to queue to the adapter True
pref_alpa     0x1        Preferred AL_PA                                    True
sw_fc_class   2          FC Class for Fabric                                True
tme           no         Target Mode Enabled                                True
[ AIX root@mdsnim01:/htpc ] lsattr -El fscsi0
attach       al        How this adapter is CONNECTED         False
dyntrk       yes       Dynamic Tracking of FC Devices        True+
fc_err_recov fast_fail FC Fabric Event Error RECOVERY Policy True+
scsi_id      0x1       Adapter SCSI ID                       False
sw_fc_class  3         FC Class for Fabric                   True
[ AIX root@mdsnim01:/htpc ]

Code:

[ AIX root@mdsnim01:/ ] /hbainfo
Total Adapters:                 2
This Adapter Index:             0
Adapter Name:                   com.ibm-df1000fd-1
Manufacturer:                   IBM
SerialNumber:                   1B70704261
Model:                          df1000fd
Model Description:              FC Adapter
HBA WWN:                        20000000C9621B82
Node Symbolic Name:
Hardware Version:
Driver Version:                 7.1.3.0
Option ROM Version:             02C82774
Firmware Version:               271304
Vendor Specific ID:             0
Number Of Ports:                1
Driver Name:                    /usr/lib/drivers/pci/efcdd
Port Index:                     0
Node WWN:                       20000000C9621B82
Port WWN:                       10000000C9621B82
Port Fc Id:                     1
Port Type:                      Private Loop
Port State:                     Operational
Port Symbolic Name:
OS Device Name:                 fcs0
Port Supported Speed:           4 GBit/sec
Port Speed:                     4 GBit/sec
Port Max Frame Size:            2112
Fabric Name:                    0000000000000000
Number of Discovered Ports:     1
Seconds Since Last Reset:       5060
Tx Frames:                      938801
Tx Words:                       478609152
Rx Frames:                      35195
Rx Words:                       3098112
LIP Count:                      1
NOS Count:                      0
Error Frames:                   0
Dumped Frames:                  0
Link Failure Count:             0
Loss of Sync Count:             2
Loss of Signal Count:           0
Primitive Seq Protocol Err Cnt: 0
Invalid Tx Word Count:          4
Invalid CRC Count:              0
[ AIX root@mdsnim01:/ ]
[ AIX root@mdsnim01:/ ]
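
The non-zero link counters in the hbainfo output above can be picked out with a small filter. This is only a sketch: the function name nonzero_counters is made up for illustration, and it assumes the "Name: value" layout shown above.

```shell
#!/bin/sh
# Print every link/error counter whose value is not zero.
# Reads hbainfo-style "Name:   value" lines on stdin.
# nonzero_counters is a hypothetical helper, not part of hbainfo.
nonzero_counters() {
    awk -F':' '
        /Count|Cnt|Error Frames|Dumped Frames/ {
            name = $1
            value = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim the label
            gsub(/^[ \t]+|[ \t]+$/, "", value)   # trim the number
            if (value + 0 != 0) print name "=" value
        }'
}

# Sample lines copied from the output above.
sample='LIP Count:                      1
NOS Count:                      0
Error Frames:                   0
Dumped Frames:                  0
Link Failure Count:             0
Loss of Sync Count:             2
Loss of Signal Count:           0
Primitive Seq Protocol Err Cnt: 0
Invalid Tx Word Count:          4
Invalid CRC Count:              0'

printf '%s\n' "$sample" | nonzero_counters
```

Running it before and after a large write would show which counters move when the link drops.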

Code:

Error log information:
          Date: Sun Mar 13 23:32:52 EDT 2016
          Sequence number: 7007
          Label: FCP_ERR4

The initiator card above is an LP11002 and the target card is a QLogic 2464. I've tried all sorts of things over the last 2 months with no luck; I still get the above error. The connection breaks each time a significant amount of data (1-4 GB) is transferred. How can I debug that card further? I'm aware of an APAR for some AIX versions that throw the above error, but I upgraded the OS as suggested and the error remains. I also tried P2P: the cards negotiate for a few seconds, then the connection is dropped. Arbitrated loop seems to work best, but the connection fails on sustained writes.

Code:

[ AIX root@mdsnim01:/ ] oslevel -s
7100-03-00-0000
[ AIX root@mdsnim01:/ ]

Cheers,
DH

Devyn


03-14-2016


Could you please post the full output of the error, including the sense information - errpt -j 4B436A3D -a

agent.kgb


03-14-2016


The two error messages below always accompany each other, so I'll post both to get feedback from others as I work through the solutions on the IBM technical support search page. Following that page, I already tried disabling dynamic tracking (first on the initiator only, then on both initiator and target), but that didn't help:

Code:

26623394   0314022216 T H fscsi0         COMMUNICATION PROTOCOL ERROR
4B436A3D   0314022216 T H fscsi0         LINK ERROR

Code:

[ AIX root@mdsnim01:/ ] errpt -aDj 4B436A3D
---------------------------------------------------------------------------
LABEL:          FCP_ERR4
IDENTIFIER:     4B436A3D

Date/Time:       Mon Mar 14 19:46:05 EDT 2016
Sequence Number: 8605
Machine Id:      000D6210D600
Node Id:         mdsnim01
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   fscsi0
Resource Class:  driver
Resource Type:   efscsi
Location:        U787B.001.DNWCB61-P1-C1-T1


Description
LINK ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0000 0010 0000 002C 0000 0000 0301 0000 0000 0000 0000 0000 0000 0000 0000 2000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0002 0000 0000 0000 0000
2101 001B 32A1 8121 2001 001B 32A1 8121 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 012F 0000 0002 0000 0100 0000 0000 0000 0000 0301 0100 0000 0000 0002 0000
0000 0000 0000 0001 0000 0000 0000 0061 0000 0412 0000 0000 0000 0000 2A58 A000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2022 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2022 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 8C28 0000 0000 1801 0000 0010 8C28 0000 0000 0000 0000 0000 2022
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
1000 0000 C962 1B82 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Diagnostic Analysis
Diagnostic Log sequence number: 1031
Resource tested:        fscsi0
Menu Number:            2602902
Description:


Error Log Analysis has detected multiple communication
errors.  These errors can be caused by attached devices,
a switch, a hub, or a SCSI-to-FC convertor.

If connected to a switch, refer to the Storage Area
Network (SAN) problem determination procedures for
additional problem resolution.

If not connected to a switch, run diagnostics on the
attached devices.  If a hub or SCSI-to-FC convertor is
attached, refer to the product documentation for problem
resolution.


---------------------------------------------------------------------------
LABEL:          FCP_ERR4
IDENTIFIER:     4B436A3D

Date/Time:       Mon Mar 14 02:33:03 EDT 2016
Sequence Number: 7180
Machine Id:      000D6210D600
Node Id:         mdsnim01
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   fscsi0
Resource Class:  driver
Resource Type:   efscsi
Location:        U787B.001.DNWCB61-P1-C1-T1


Description
LINK ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0000 0010 0000 002C 0000 0000 0301 0000 0000 0000 0000 0000 0000 0000 0000 2000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0002 0000 0000 0000 0000
2101 001B 32A1 8121 2001 001B 32A1 8121 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 012F 0000 0002 0000 0100 0000 0000 0000 0000 0301 0100 0000 0000 0002 0000
0000 0000 0000 0001 0000 0000 0000 0061 0000 0412 0000 0000 0000 0000 2A58 A000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2015 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2015 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 8C28 0000 0000 1801 0000 0010 8C28 0000 0000 0000 0000 0000 2015
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
1000 0000 C962 1B82 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Diagnostic Analysis
Diagnostic Log sequence number: 1025
Resource tested:        fscsi0
Menu Number:            2602902
Description:


Error Log Analysis has detected multiple communication
errors.  These errors can be caused by attached devices,
a switch, a hub, or a SCSI-to-FC convertor.

If connected to a switch, refer to the Storage Area
Network (SAN) problem determination procedures for
additional problem resolution.

If not connected to a switch, run diagnostics on the
attached devices.  If a hub or SCSI-to-FC convertor is
attached, refer to the product documentation for problem
resolution.


---------------------------------------------------------------------------
LABEL:          FCP_ERR4
IDENTIFIER:     4B436A3D

Date/Time:       Mon Mar 14 02:22:14 EDT 2016
Sequence Number: 7160
Machine Id:      000D6210D600
Node Id:         mdsnim01
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   fscsi0
Resource Class:  driver
Resource Type:   efscsi
Location:        U787B.001.DNWCB61-P1-C1-T1


Description
LINK ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0000 0010 0000 002C 0000 0000 0301 0000 0000 0000 0000 0000 0000 0000 0000 2000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0002 0000 0000 0000 0000
2101 001B 32A1 8121 2001 001B 32A1 8121 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 012F 0000 0002 0000 0100 0000 0000 0000 0000 0301 0100 0000 0000 0002 0000
0000 0000 0000 0001 0000 0000 0000 0061 0000 0412 0000 0000 0000 0000 2A58 A000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2008 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2400 0000 48E0 8B28 0000 0000 0001 0001 0000 0000 0000 0000 2008 0100 069C 0200
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
2000 0000 0000 8C28 0000 0000 1801 0000 0010 8C28 0000 0000 0000 0000 0000 2008
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
1000 0000 C962 1B82 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Diagnostic Analysis
Diagnostic Log sequence number: 1019
Resource tested:        fscsi0
Menu Number:            2602902
Description:


Error Log Analysis has detected multiple communication
errors.  These errors can be caused by attached devices,
a switch, a hub, or a SCSI-to-FC convertor.

If connected to a switch, refer to the Storage Area
Network (SAN) problem determination procedures for
additional problem resolution.

If not connected to a switch, run diagnostics on the
attached devices.  If a hub or SCSI-to-FC convertor is
attached, refer to the product documentation for problem
resolution.


[ AIX root@mdsnim01:/ ]

Code:

[ AIX root@mdsnim01:/ ] errpt -aDj 26623394
---------------------------------------------------------------------------
LABEL:          FCP_ERR12
IDENTIFIER:     26623394

Date/Time:       Mon Mar 14 19:46:19 EDT 2016
Sequence Number: 8606
Machine Id:      000D6210D600
Node Id:         mdsnim01
Class:           H
Type:            TEMP
WPAR:            Global
Resource Name:   fscsi0
Resource Class:  driver
Resource Type:   efscsi
Location:        U787B.001.DNWCB61-P1-C1-T1


Description
COMMUNICATION PROTOCOL ERROR

        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0000 0010 0000 00A1 0000 0013 0303 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0001 0000 0000 0000 0002 0000 0000 0000 0000
2101 001B 32A1 8121 2001 001B 32A1 8121 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 012F 0000 0002 0000 0100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0001 0000 0000 0000 0040 0000 0412 0001 0000 0000 0000 2A58 A000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0602 8A13 0200
0019 0000 0000 0000 0000 0000 05D9 CBC8 0000 0001 0000 0000 0000 0000 0000 0000
0000 0001 636D 4643 F100 0A00 2BF5 80E8 F100 0A00 2BF5 815C F100 0A00 2BF5 706C
0000 0000 288B F0E8 0000 0000 288B F15C 0000 0000 288B E06C 0100 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0300 0000 0908 0000 8800 0800 00FF FFFF 0000 07D0 1000 0000 C962 1B82 2000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
1000 0000 C962 1B82 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0001 0000 0001 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
Duplicates
Number of duplicates
           2
Time of first duplicate
Mon Mar 14 02:22:27 EDT 2016
Time of last duplicate
Mon Mar 14 19:46:19 EDT 2016
[ AIX root@mdsnim01:/ ]

Cheers,
DH

---------- Post updated at 08:17 PM ---------- Previous update was at 08:15 PM ----------

Checking the timestamps, I noticed that all of these were logged at exactly the same time:

Code:

E86653C3   0314022216 P H LVDD           I/O ERROR DETECTED BY LVM
C62E1EB7   0314022216 P H hdisk2         DISK OPERATION ERROR
26623394   0314022216 T H fscsi0         COMMUNICATION PROTOCOL ERROR
4B436A3D   0314022216 T H fscsi0         LINK ERROR

Just let me know if you need to see the first two. They seem to be symptoms rather than causes, however.
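
For anyone reading along, the second column of the errpt summary is a timestamp in MMDDhhmmYY form. A minimal sketch for decoding it follows; decode_errpt_ts is a made-up helper name, and it assumes the two-digit year belongs to the 2000s.

```shell
#!/bin/sh
# Decode an errpt summary timestamp (MMDDhhmmYY) into a readable form.
# decode_errpt_ts is a hypothetical helper; the "20" century prefix is an assumption.
decode_errpt_ts() {
    ts=$1
    mm=$(echo "$ts" | cut -c1-2)    # month
    dd=$(echo "$ts" | cut -c3-4)    # day
    hh=$(echo "$ts" | cut -c5-6)    # hour
    mi=$(echo "$ts" | cut -c7-8)    # minute
    yy=$(echo "$ts" | cut -c9-10)   # two-digit year
    echo "20${yy}-${mm}-${dd} ${hh}:${mi}"
}

decode_errpt_ts 0314022216
```

The summary only has minute resolution, so entries that share a timestamp may still be several seconds apart; the detailed errpt -a output carries the seconds.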

Cheers,
DH

---------- Post updated at 09:15 PM ---------- Previous update was at 08:17 PM ----------

Code:

DISPLAY MICROCODE LEVEL                                                                                               802111
fcs0    FC Adapter

The current microcode level for fcs0 is 271304.

Use Enter to continue.

Devyn


03-15-2016


The first error you receive, FCP_ERR4 (4B436A3D), means, according to the sense data provided, that the AIX driver sent a RESET command to the SAN device and didn't receive an answer. Usually this means you have a SAN problem and should open a case with your SAN switch vendor or, better, your storage device vendor.

But as far as I can see from the output of lsattr -El fscsi0 (attach is al), you don't have a SAN; you have direct-attached storage. If you do have a SAN fabric rather than direct-attached storage, then you have a problem connecting to the fabric, and a broken cable is the most common cause.
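
The attach attribute of fscsi0 is what hints at the topology. A rough sketch of that reading follows; describe_attach is a made-up helper, and the descriptions are my shorthand rather than official wording.

```shell
#!/bin/sh
# Map the fscsi "attach" attribute to a rough description of the topology.
# describe_attach is a hypothetical helper written for this thread.
describe_attach() {
    case "$1" in
        al)     echo "arbitrated loop (direct attach or hub)" ;;
        switch) echo "switched fabric (SAN)" ;;
        none)   echo "no link detected" ;;
        *)      echo "unknown value: $1" ;;
    esac
}

# On the AIX host one could feed it the live value, e.g.:
#   describe_attach "$(lsattr -El fscsi0 -a attach -F value)"
describe_attach al
```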

If you really have direct-attached storage, then I have some other questions:
- How many LDEVs/LUNs do you receive from the storage?
- Does the problem happen only with this LDEV (Nr. 00:00:00:00:00:00:00:02) or also with other LDEVs?
- Is the storage connected through multiple adapters, or is this the only adapter to the storage?
- How many different storage systems are connected through this adapter?

If it is a single storage system directly connected through a single adapter, I would recommend:
- checking the cable
- switching off dyntrk (dyntrk=no) and fast-fail error recovery (fc_err_recov=delayed_fail) on fscsi0
- reducing max_xfer_size on the adapter and the corresponding transfer-size parameters on the hdisk
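
The attribute changes could look roughly like the dry run below, which only prints the commands. The device names come from this thread; the two size values and the max_transfer attribute name on the hdisk are assumptions to verify against what your devices actually accept.

```shell
#!/bin/sh
# Dry run: print the chdev commands instead of executing them.
# Device names come from the thread; the size values are illustrative assumptions.
FSCSI=fscsi0
FCS=fcs0
HDISK=hdisk2
MAX_XFER=0x100000      # assumed: smaller than the current 0x400000
MAX_TRANSFER=0x40000   # assumed hdisk value; check what the disk driver allows

# plan_changes is a hypothetical helper that lists the intended changes.
plan_changes() {
    echo "chdev -l $FSCSI -a dyntrk=no -a fc_err_recov=delayed_fail -P"
    echo "chdev -l $FCS -a max_xfer_size=$MAX_XFER -P"
    echo "chdev -l $HDISK -a max_transfer=$MAX_TRANSFER -P"
}

plan_changes
```

The -P flag records the change in the ODM only, so it takes effect once the device is reconfigured or the system is rebooted. lsattr -Rl fcs0 -a max_xfer_size lists the values the adapter accepts.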

agent.kgb


03-15-2016


It's fibre card to fibre card and I'm zoning a single FILEIO device, which itself sits on RAID 6 / XFS storage (6 disks). I tried disabling dynamic tracking: no luck. I tried changing the cable: no luck. I'll read about the other options you mentioned as well. There's only one LUN involved, and I can write to it fine until a large amount of data is written; then it fails 100% of the time.

The target system is SCST (apologies, I thought I had mentioned that, but reading back I hadn't). Funny thing is that after restarting the SCST subsystem I get the LUN back following a failure (maybe a memory leak). I might try LIO / targetcli next if the above doesn't work.

Cheers,
DH

Devyn


03-15-2016


Could you download the devscan tool and run it on your server?

https://www-304.ibm.com/support/docv...ixtoolsc9e095f

agent.kgb


03-15-2016


I see this thread has been open for over a day without resolution so, although I'm not qualified to answer the specifics, I thought I'd chip in anyway.

Firstly, my disclaimer: I'm not an AIX expert by any means and I have no knowledge of the LP11002. However, I do know the QL2464 very well; I was the technical director of a storage distributor many years ago and we shipped loads of fibre channel kit. So all I can do is tell you where I'd be looking in the first instance. I could well be completely wrong, but here goes...

The symptoms you describe indicate that everything is fine until the link gets really busy, then it screws up. The normal FC payload is 2112 bytes, giving an MTU of 2148 bytes in total once headers etc. are included. Some FC adapters support "jumbo" packets with a payload of up to 9000 bytes, giving an MTU of 9036 bytes with headers. If the adapter supports jumbos, whether they are enabled is a setting in the adapter BIOS. So if one adapter is set for jumbo and the other doesn't support it, everything will work fine under low traffic, but when things really get going one adapter suddenly sends a jumbo packet that the other cannot understand. If I were fighting this issue, I would look at both adapters and set the max payload to 2112, or the max MTU to 2148, or set "support jumbo packets=no", then test to see whether the problem has gone away.
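
The 2148 figure is just the payload plus 36 bytes of framing: a 4-byte start-of-frame, a 24-byte header, a 4-byte CRC and a 4-byte end-of-frame. A tiny sketch of that arithmetic, with fc_frame_size as a made-up helper name:

```shell
#!/bin/sh
# Total FC frame size for a given payload: SOF + header + payload + CRC + EOF.
# fc_frame_size is a hypothetical helper for illustration.
fc_frame_size() {
    payload=$1
    echo $((4 + 24 + payload + 4 + 4))
}

fc_frame_size 2112
```

The same sum applied to a 9000-byte payload gives the 9036 mentioned above.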

Needless to say, should you get to a known good working situation, only change one thing at a time afterwards, and fully test after each change that it hasn't screwed up again.

I have no clue whether this will help you or not.

Good luck anyway.

Last edited by hicksd8; 03-15-2016 at 05:35 PM..


hicksd8
