SSA_DEVICE_ERROR - What could be?


 
# 1  
Old 07-17-2014

Hello folks.

I have a problem with one of the disks in the RAID. The error message is the following:
Code:
LABEL:          SSA_DEVICE_ERROR
IDENTIFIER:     FE9E9357
 
Date/Time:       Mon Jul 14 11:00:01
Sequence Number: 48978087
Machine Id:      000DBC2D4C00
Node Id:         HAL1
Class:           H
Type:            PERM
Resource Name:   ssa0
Resource Class:  adapter
Resource Type:   ssa160
Location:        2A-08
VPD:
        Part Number................. 09L5693
        FRU Number.................. 09L2090
        Serial Number...............S9344332
        EC Level....................    F24713
        Manufacturer................IBM053
        ROS Level and ID............C400    0000
        Loadable Microcode Level....05
        Device Driver Level.........00
        Displayable Message.........SSA-ADAPTER
        Device Specific.(Z0)........SDRAM=064
        Device Specific.(Z1)........CACHE=00
        Device Specific.(Z2)........UID=000000062989EE7A
 
Description
DISK OPERATION ERROR
 
Probable Causes
DASD DEVICE
 
Failure Causes
DISK DRIVE
 
        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES
 
Detail Data
ERROR CODE
0440 0200 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

LABEL:          SSA_DISK_ERR4
IDENTIFIER:     F7863CFE

Date/Time:       Mon Jul 14 11:00:01
Sequence Number: 48978086
Machine Id:      000DBC2D4C00
Node Id:         HAL1
Class:           H
Type:            PERM
Resource Name:   pdisk15
Resource Class:  pdisk
Resource Type:   scsd
Location:        2A-08-P
VPD:
        Manufacturer................IBM-SSG
        Machine Type and Model......T54D073
        Part Number.................22R1518
        ROS Level and ID............7205
        Serial Number...............3KP0KDJ1
        EC Level....................8352180149
        Device Specific.(Z2)........  30C613
        Device Specific.(Z3)........22R1518
        Device Specific.(Z4)........000-0

Description

DISK OPERATION ERROR

Probable Causes
DASD DEVICE
 
Failure Causes
DISK DRIVE
        Recommended Actions
        PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
SENSE DATA
7000 0200 0000 0018 0000 0000 0400 0200 0000 0000 0400 0000 0000 0000 0000 0000
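
With several entries in the error log, a one-line summary per entry makes it easier to see which resources are affected. The following is only a minimal sketch: `summarize_errpt` is a hypothetical helper name, and it assumes the report keeps the field layout shown above (LABEL, then Type, then Resource Name, each starting in column one).

```shell
# Summarize `errpt -a` style text read from stdin.
# Prints one line per entry: label, resource name, error type.
# Assumes the field names and order seen in the report above.
summarize_errpt() {
    awk '
        /^LABEL:/         { label = $2 }   # e.g. SSA_DISK_ERR4
        /^Type:/          { type = $2 }    # e.g. PERM
        /^Resource Name:/ { print label, $3, type }
    '
}
```

It would be used as `errpt -a | summarize_errpt`. For the two entries above it should print one line for ssa0 and one for pdisk15, both of type PERM.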

If I run the following commands, they show that all disks are OK:
Code:
root [HAL1]% /usr/ssa/ssaraid/bin/ssaraid.smit lsmssaraid_hdr_cmd_to_exec  -l>
hdisk26         BC2D4E9E52AD4CK system    degraded   437.4GB raid_5
hdisk27         BC2D4E9E52CD4CK system    degraded   437.4GB raid_5

root [HAL1]% /usr/ssa/ssaraid/bin/ssaraid.smit ls_hsm_array_status   'ssa0'
ssa0
    Component        Location           Size   Pool     Protected  Status
    hdisk26          raid_5
    Blank-ReservedZ                       n/a  pool_A0     no      n/a
    pdisk5           2A-08-A002-08-P   72.9GB  pool_A0     no      good
    pdisk6           2A-08-A002-01-P   72.9GB  pool_A0     no      good
    pdisk7           2A-08-A002-07-P   72.9GB  pool_A0     no      good
    pdisk8           2A-08-A002-02-P   72.9GB  pool_A0     no      good
    pdisk13          2A-08-A002-04-P   72.9GB  pool_A0     no      good
    pdisk14          2A-08-A002-03-P   72.9GB  pool_A0     no      good
    hdisk27          raid_5
    pdisk0           2A-08-A002-12-P   72.9GB  pool_B0     no      good
    pdisk1           2A-08-A002-13-P   72.9GB  pool_B0     no      good
    pdisk2           2A-08-A002-10-P   72.9GB  pool_B0     no      good
    pdisk3           2A-08-A002-16-P   72.9GB  pool_B0     no      good
    pdisk9           2A-08-A002-15-P   72.9GB  pool_B0     no      good
    pdisk12          2A-08-A002-11-P   72.9GB  pool_B0     no      good
    Blank-ReservedZ                       n/a  pool_B0     no      n/a
root [HAL1]% /usr/ssa/ssaraid/bin/ssaraid.smit lcssaraid_hdr_cmd_to_exec  -l 'ssa0'
pdisk15         00B006FDE88D00D free      good      0       disk

But I keep getting the same error message about disk pdisk15 and the adapter.

Is there any further research I should do? What could be the cause of this?

Last edited by Scott; 07-17-2014 at 09:01 PM.. Reason: PLEASE use code tags
# 2  
Old 07-17-2014
Looks like pdisk15 has failed. What more do you want to know?
# 3  
Old 07-18-2014
Quote:
Description
DISK OPERATION ERROR

Probable Causes
DASD DEVICE

Failure Causes
DISK DRIVE
Says it all really. The drive will need to be replaced.

To find out what simple hdisk it relates to, try ssaxlate -l pdisk15 (lower case L flag)

If it's in one of your RAID sets, then I regret I have no experience in those.




Robin
# 4  
Old 07-18-2014
Hello.

Thank you both for your help. I'm extremely new to the world of AIX and pSeries.

You both agree that pdisk15 has died, but why do I see a green light on every disk on the physical pSeries? If one of them is dead, shouldn't its light be off or orange?

Sorry if my question sounds extremely awkward. I just want to know exactly which disk to replace.

Last edited by little_ball; 07-18-2014 at 10:42 AM..
# 5  
Old 07-18-2014
I just found this, which may help confirm that pdisk15 is the culprit, or soon will be:

Code:
ssa_diag command 
Purpose
        To Run Diagnostic style tests to a specified device.

[Syntax, Description]

ssa_diag is found in /usr/lpp/diagnostics/bin and is invoked:

 ssa_diag -l pdiskX
or
 ssa_diag -l ssaX

additional parameters are:

 [-a] : which causes the adapter to be reset
if it is an adapter being tested. (This has no effect on a disk test)
 [-u] : which forces a disk reservation to be broken, if it is a disk
which is being tested. (This has no effect on an adapter test)
 [-s] : This can only be used with a disk device, and requests the output
of the power status. (This flag cannot be used with the -a or -u flag)

Output

if an error is detected, then a message such as:

ssa0 SRN 42500

will be sent to stdout. If there is no problem, then there is no message
sent to stdout.
A non-zero return code indicates an error, for which output will be sent to
stderr.

Power Status output (to stdout) is as follows:
        pdisk0 0                which means pdisk0 Power good
        pdisk0 1                which means pdisk0 Lost Redundancy
        pdisk0 2                which means pdisk0 Failed

# 6  
Old 07-18-2014
Hello.

Thank you all for your help. I tried some of the commands posted here, but unfortunately every attempt to query pdisk15 returned "Error on device". Finally, "diag" showed me that pdisk15 was in "failed" status, and it also showed the serial number of the disk, so now I know which disk it is in the enclosure. Its light is still green, but if the system says it has failed, I should trust the system rather than the disk light.

I will replace the disk and see how it goes. Thank you all.
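
For anyone who needs to match a pdisk to a physical drive the same way, the serial number can also be pulled out of a VPD listing such as the one in the error report in the first post. This is a rough sketch: `vpd_serial` is a hypothetical helper name, and it assumes the "Serial Number....value" layout shown in that report.

```shell
# Print the first serial number found in VPD text read from stdin.
# Strips leading whitespace, the "Serial Number" label, and the dot leaders.
vpd_serial() {
    sed -n 's/^[[:space:]]*Serial Number\.*[[:space:]]*//p' | head -n 1
}
```

On AIX the VPD for a single device is normally available with `lscfg -vl pdisk15`, so `lscfg -vl pdisk15 | vpd_serial` should give the value to compare against the label on the drive, provided the device still responds.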