AIX hard disk failure


 
# 8  
Old 03-05-2019
Hi Bakunin,

Code:
rmdev -Rl hdisk5
cfgmgr

Can we actually remove it, and will it be rediscovered? So this action only deletes the device file, not the data?

We have pulled out the failed disk, pdisk5, and added a new hard disk.

At first the newly added disk was recognized as hdisk1. Then I formatted it with
Code:
Create an Array Candidate pdisk and Format to 522 Byte Sectors

Then it became the array candidate pdisk5. Then I used:
Code:
Add Disks to an Existing PCI-X SCSI Disk Array

The disk could be added, but I got a warning such as "the disk is not used for parity and not restriped". I have not captured the exact output.
--> Does this mean that the new disk is only used for data, and not for parity or striped data as in normal RAID 5 behavior?

I thought I should have tried this first instead of "Add Disks to an Existing ...", since after adding the disk the reconstruct option has no effect.
Code:
Reconstruct a PCI-X SCSI Disk Array

Here is the list of menu options:
Code:
  List PCI-X SCSI Disk Array Configuration
  Create an Array Candidate pdisk and Format to 522 Byte Sectors
  Create a PCI-X SCSI Disk Array
  Delete a PCI-X SCSI Disk Array
  Add Disks to an Existing PCI-X SCSI Disk Array
  Configure a Defined PCI-X SCSI Disk Array
  Change/Show Characteristics of a PCI-X SCSI Disk Array
  Reconstruct a PCI-X SCSI Disk Array
  Change/Show PCI-X SCSI pdisk Status
  Diagnostics and Recovery Options

This is what I followed for what I have done so far:
IBM Knowledge Center Error

And this for the rebuild --> PCI-X SCSI RAID Controller Reference Guide for AIX. As mentioned above, I did not actually get the chance to use it.
IBM Knowledge Center Error

Do you have any idea about this?

So I just wonder: as you recommended, we should unmount all devices/filesystems, but doesn't this also mean downtime for the application, so that it is not really a "hot swap"? In which cases can we do an online replacement? From what I have read, the disk is hot-swappable, so it can be done online, right? Please advise.

--- Post updated at 04:04 PM ---

Note: I have edited my post.

--- Post updated at 04:14 PM ---

If you look at the pdisks in the RAID 5 array, including pdisk5:
Code:
hdisk5    04-08-ff-0,1  Optimal     RAID 5 Array     714.3GB
 pdisk2   04-08-00-5,0  Active      Array Member     142.8GB
 pdisk8   04-08-00-8,0  Active      Array Member     142.8GB
 pdisk4   04-08-01-3,0  Active      Array Member     142.8GB
 pdisk3   04-08-01-5,0  Active      Array Member     142.8GB
 pdisk7   04-08-01-8,0  Active      Array Member     142.8GB
 pdisk5   04-08-01-4,0  Active      Array Member     142.8GB

We can see it has 6 pdisks: 6 x 142.8 = 856.8 GB.
But with RAID 5 the usable size is (number of disks - 1) x disk size, i.e. 5 x 142.8 = 714 GB. That matches the 714.3 GB above.
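
The arithmetic above can be checked with a small shell sketch. The function name raid5_usable_gb is made up for illustration, and the formula assumes a plain RAID 5 in which the equivalent of one member is used for parity.

```shell
#!/bin/sh
# Usable capacity of a RAID 5 array: (members - 1) * member size.
# raid5_usable_gb is a hypothetical helper, not an AIX command.
raid5_usable_gb() {
    # $1 = number of member disks, $2 = size of one member in GB
    awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", (n - 1) * s }'
}

# Six 142.8 GB pdisks, as in the array listing above.
raid5_usable_gb 6 142.8    # prints 714.0
```

The controller reports 714.3 GB rather than 714.0 GB, presumably because the 142.8 GB per-disk figure is itself rounded.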

So the OS should recognize hdisk5 and its VG as 714 GB instead of only about 540 GB:
Code:
xxx@/#lsvg -p SSAMvg
SSAMvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            532         45          00..00..00..00..45


xxx@/#lsvg  SSAMvg
VOLUME GROUP:       SSAMvg                   VG IDENTIFIER:  00096f540000d7000000015371bd6d50
VG STATE:           active                   PP SIZE:        1024 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      532 (544768 megabytes)
MAX LVs:            256                      FREE PPs:       45 (46080 megabytes)
LVs:                8                        USED PPs:       487 (498688 megabytes)
OPEN LVs:           8                        QUORUM:         2
TOTAL PVs:          1                        VG DESCRIPTORS: 2
STALE PVs:          0                        STALE PPs:      0
ACTIVE PVs:         1                        AUTO ON:        yes
MAX PPs per VG:     32512
MAX PPs per PV:     1016                     MAX PVs:        32
LTG size (Dynamic): 256 kilobyte(s)          AUTO SYNC:      no
HOT SPARE:          no                       BB POLICY:      relocatable
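
To compare the lsvg figures with the controller's figures, the units have to be lined up first. The sketch below assumes that an LVM "megabyte" is 2^20 bytes and that the array listing uses decimal gigabytes; neither is stated in the thread. The function name vg_size is made up for illustration.

```shell
#!/bin/sh
# Convert a PP count and PP size into MB and decimal GB.
# Assumption: 1 LVM megabyte = 1048576 bytes, 1 GB = 1000000000 bytes.
vg_size() {
    # $1 = total PPs, $2 = PP size in MB
    awk -v pps="$1" -v pp="$2" 'BEGIN {
        mb = pps * pp
        printf "%d MB = %.1f GB (decimal)\n", mb, mb * 1048576 / 1000000000
    }'
}

# 532 PPs of 1024 MB, as reported by lsvg SSAMvg.
vg_size 532 1024    # prints 544768 MB = 571.2 GB (decimal)
```

Under those assumptions the VG is about 571.2 GB, which is 4 x 142.8 GB, the usable size of a five-member RAID 5 rather than a six-member one. That would fit a VG created while the array had five disks, but it is only one possible reading of the numbers.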

# 9  
Old 03-05-2019
Quote:
Originally Posted by Phat
Code:
rmdev -Rl hdisk5
cfgmgr

Can we actually remove it, and will it be rediscovered? So this action only deletes the device file, not the data?

It does a bit more than that: it also cleans out the ODM entries for the disk and so on. They are created anew by cfgmgr. But you are right insofar as the device is deleted, not its contents.
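
A minimal dry-run sketch of the sequence discussed above. The helpers run and rediscover_disk are hypothetical names; only rmdev -Rl and cfgmgr come from the thread, and lsdev -Cc disk is added here just to show the device state before and after. By default the script only prints the commands instead of executing them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 executes it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Remove the device entry for a disk and let cfgmgr rediscover it.
rediscover_disk() {
    # $1 = hdisk name, e.g. hdisk5
    run lsdev -Cc disk      # device state before
    run rmdev -Rl "$1"      # as quoted in the thread
    run cfgmgr              # rescan and recreate device entries
    run lsdev -Cc disk      # device state after
}

rediscover_disk hdisk5
```

As far as the rmdev documentation goes, it is the -d flag that deletes the device definition from the ODM; without it the device is unconfigured and left in the Defined state. In neither case is the data on the disk touched.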

Quote:
Originally Posted by Phat
At first the newly added disk was recognized as hdisk1. Then I formatted it with
Code:
Create an Array Candidate pdisk and Format to 522 Byte Sectors

Of course it is created as hdisk1 - the system sees a (new) single disk and creates a device file for it.

Quote:
Originally Posted by Phat
Then it became the array candidate pdisk5. Then I used:
Code:
Add Disks to an Existing PCI-X SCSI Disk Array

Look - i am not all too proficient with the SMITty menus (i use them maybe once a year) and i have no AIX system at hand right now to look it up (besides, to get these menus you have to install special software like the RAID driver). When you are in a menu, instead of executing it you can press <F6> to display the command (or scriptlet) that would be executed. In addition you can look into the file ~root/smit.script to see what SMITty has executed before. I have no idea what the SMITty entry you quoted does.
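
A small sketch for pulling the most recent commands out of smit.script. It assumes that SMIT writes its timestamps and separators as lines starting with "#" and the executed commands as plain lines; smit_history is a made-up helper name.

```shell
#!/bin/sh
# Show the last N non-comment, non-blank lines of a smit.script file.
# Assumption: timestamp/separator lines in smit.script start with '#'.
smit_history() {
    # $1 = path to smit.script, $2 = number of lines to show (default 20)
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$' | tail -n "${2:-20}"
}

# Usage on the AIX host, as root:
#   smit_history ~root/smit.script 10
```

Multi-line scriptlets show up line by line, so a larger N may be needed to see a whole menu action.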

Quote:
Originally Posted by Phat
The disk could be added, but I got a warning such as "the disk is not used for parity and not restriped". I have not captured the exact output.
--> Does this mean that the new disk is only used for data, and not for parity or striped data as in normal RAID 5 behavior?

Probably - but since this would violate what a RAID does and how it does it, i suppose the disk is not used at all, neither for data nor for parity.

Quote:
Originally Posted by Phat
I thought I should have tried this first instead of "Add Disks to an Existing ...", since after adding the disk the reconstruct option has no effect.
Code:
Reconstruct a PCI-X SCSI Disk Array

Here is the list of menu options:
Code:
  List PCI-X SCSI Disk Array Configuration
  Create an Array Candidate pdisk and Format to 522 Byte Sectors
  Create a PCI-X SCSI Disk Array
  Delete a PCI-X SCSI Disk Array
  Add Disks to an Existing PCI-X SCSI Disk Array
  Configure a Defined PCI-X SCSI Disk Array
  Change/Show Characteristics of a PCI-X SCSI Disk Array
  Reconstruct a PCI-X SCSI Disk Array
  Change/Show PCI-X SCSI pdisk Status
  Diagnostics and Recovery Options

To be honest: i have no idea. But you should probably look at "Change/Show PCI-X SCSI pdisk Status" and/or "Diagnostics and Recovery Options".

Quote:
Originally Posted by Phat
This is what I followed for what I have done so far:
IBM Knowledge Center Error

And this for the rebuild --> PCI-X SCSI RAID Controller Reference Guide for AIX. As mentioned above, I did not actually get the chance to use it.
IBM Knowledge Center Error

Do you have any idea about this?

Yes: you should have followed the link on exactly this last-linked webpage, where it says "Prepare to remove a disk drive from a system or expansion unit controlled by AIX", and followed those instructions first. It says essentially what i told you too: do not physically pull a disk until it is deconfigured/removed from the system.

Quote:
Originally Posted by Phat
So I just wonder: as you recommended, we should unmount all devices/filesystems, but doesn't this also mean downtime for the application, so that it is not really a "hot swap"? In which cases can we do an online replacement? From what I have read, the disk is hot-swappable, so it can be done online, right?

Yes, it could have been done online, but be aware that you have already mistreated the system. Maybe it is still possible to do everything online, but out of sheer paranoia (sorry - it's a professional trait) i would take a downtime at this point to make sure everything goes well. You haven't told us anything about your configuration, but from what i see of the hardware configuration your system isn't exactly brand new (if i had to guess: POWER5 tops, probably running AIX 5.3 ML?, which would make it about 10-15 years old), so i would be even more paranoid. Probably everything is out of support if my wild guess is true. I haven't seen a RAID on any AIX system for perhaps 15 years now, and a RAID made of such small disks is probably quite old too.

Quote:
Originally Posted by Phat
So the OS should recognize hdisk5 and its VG as 714 GB instead of only about 540 GB:
Code:
xxx@/#lsvg -p SSAMvg
SSAMvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            532         45          00..00..00..00..45


xxx@/#lsvg  SSAMvg
VOLUME GROUP:       SSAMvg                   VG IDENTIFIER:  00096f540000d7000000015371bd6d50
VG STATE:           active                   PP SIZE:        1024 megabyte(s)
VG PERMISSION:      read/write               TOTAL PPs:      532 (544768 megabytes)

Yes, that is all correctly observed - it emphasizes what i said before: the pdisk is probably not used by the RAID at all. Still, i wonder how the size got reduced when the disk failed - this should not be the case with a RAID. If a disk fails, the array still has the same amount of capacity, just nothing to spare any more.

Did you issue the chvg -g SSAMvg already? Or has the size always been that small? IBM RAIDs use not only data/parity disks but also "Hot Spare" disks, which take over once another disk breaks. List the status of your pdisks; i think their current role should be displayed there.

I hope this helps.

bakunin
# 10  
Old 03-06-2019
Quote:
Originally Posted by bakunin
if i had to guess: POWER5 tops, probably running AIX 5.3 ML?, which would make it about 10-15 years old, so i would be even more paranoid. Probably everything is out of support if my wild guess is true. I haven't seen a RAID on any AIX system for perhaps 15 years now, and a RAID made of such small disks is probably quite old too.

Yes, that's correct. It's POWER5 and AIX 5.3 ML. We work for a customer who uses old technologies; that is their legacy. We work at risk but have no choice.

I did execute it, but it did not help. It seems it does not recognize the new size:
Code:
xxx@/#chvg -g SSAMvg
0516-1382 chvg: Volume group is not changed. None of the disks in the
        volume group have grown in size.
0516-732 chvg: Unable to change volume group SSAMvg.
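
The chvg message says the OS does not see a larger hdisk. One way to check that directly is to compare the disk size the OS reports with the VG size from lsvg. The sketch below is only that, a sketch: vg_total_mb and can_grow are made-up helper names, and it assumes bootinfo -s hdisk5 (run as root) prints the disk size in MB.

```shell
#!/bin/sh
# Extract the VG size in MB from `lsvg <vg>` output read on stdin,
# i.e. the number in "TOTAL PPs: 532 (544768 megabytes)".
vg_total_mb() {
    sed -n 's/.*TOTAL PPs:[[:space:]]*[0-9][0-9]*[[:space:]]*(\([0-9][0-9]*\) megabytes).*/\1/p'
}

# Succeeds if the disk is at least one PP larger than the VG currently uses.
can_grow() {
    # $1 = disk size in MB as seen by the OS
    # $2 = VG total in MB, $3 = PP size in MB
    [ $(( $1 - $2 )) -ge "$3" ]
}

# Usage on the AIX host (assumed commands, see above):
#   disk_mb=$(bootinfo -s hdisk5)
#   vg_mb=$(lsvg SSAMvg | vg_total_mb)
#   can_grow "$disk_mb" "$vg_mb" 1024 && echo "hdisk5 is larger than the VG"
```

If the disk size reported by the OS is still close to the VG size, the new capacity has not reached the hdisk device yet, and chvg -g has nothing to work with.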

I checked the status and it seems to be in a good state:
[Attached image: aix_raid.png]