AIX hard disk failure


 
# 1  
Old 03-05-2019
AIX hard disk failure

Hi all,

I have a problem with a hard disk: the disk has failed and needs to be replaced with a new one.

As I understand it, I just need to take out the failed disk and insert the new one, and that's all.

But the third-party hardware vendor said there should be an additional procedure in AIX for this activity.
I have attached screenshots of the disk check.

Please advise if I am missing something.
AIX hard disk failure-aix-disk1png
AIX hard disk failure-aix_disk2png
# 2  
Old 03-05-2019
I am only guessing, but I think you might need to format the new disk with the correct filesystem format and also correctly partition the disk before you swap them out.
# 3  
Old 03-05-2019
Hi Neo,

On further checking, I can see under "Change/Show PCI-X SCSI pdisk" in smitty that it is a local disk in a RAID 5 array. As shown below, it is a disk inside hdisk5, the SCSI RAID 5 disk array.

Code:
xxx@/#lsdev -Cc disk
hdisk0 Available 04-08-ff-0,0 SCSI RAID 10 Disk Array
hdisk2 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk3 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk4 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk5 Available 04-08-ff-0,1 SCSI RAID 5 Disk Array
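
A minimal sketch for picking the adapter-managed arrays out of a listing like the one above. It assumes the description column contains the text "SCSI RAID" for such arrays, as it does here; the function name `local_raid_hdisks` and the variable `sample_lsdev` are made up for illustration.

```shell
# Print the hdisks that the listing describes as local SCSI RAID arrays.
# Reads `lsdev -Cc disk` output on stdin.
local_raid_hdisks() {
    awk '/SCSI RAID/ { print $1 }'
}

# On a live system: lsdev -Cc disk | local_raid_hdisks
# Here the listing from this post is fed in instead.
sample_lsdev='hdisk0 Available 04-08-ff-0,0 SCSI RAID 10 Disk Array
hdisk2 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk3 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk4 Available 00-08-02     1814     DS4700 Disk Array Device
hdisk5 Available 04-08-ff-0,1 SCSI RAID 5 Disk Array'
printf '%s\n' "$sample_lsdev" | local_raid_hdisks
```

For the sample this prints hdisk0 and hdisk5; the DS4700 devices are external arrays and are skipped.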

So I thought I could just take out the missing disk, insert the new disk and format it with "Create an Array Candidate pdisk and Format to 522 Byte Sectors". But I'm not sure about this.

Code:
 
                                              x                       Change/Show PCI-X SCSI pdisk                       x
                                              x                                                                          x
                                              x Move cursor to desired item and press Enter. Use arrow keys to scroll.   x
                                              x                                                                          x
                                              x   0940-038 scsi2: Open not attempted. Device not Available.              x
                                              x   0940-038 scsi3: Open not attempted. Device not Available.              x
                                              x   pdisk0    04-08-00-3,0  Active      Array Member     142.8GB           x
                                              x   pdisk1    04-08-00-4,0  Active      Array Member     142.8GB           x
                                              x   pdisk2    04-08-00-5,0  Active      Array Member     142.8GB           x
                                              x   pdisk8    04-08-00-8,0  Active      Array Member     142.8GB           x
                                              x   pdisk4    04-08-01-3,0  Active      Array Member     142.8GB           x
                                              x   pdisk5    04-08-01-4,0  Missing     Disk             142.8GB           x
                                              x   pdisk7    04-08-01-8,0  Active      Array Member     142.8GB           x
                                              x   pdisk3    04-08-01-5,0  Active      Array Member     142.8GB           x
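
A small sketch for spotting the pdisk that needs attention in a listing like this one. It assumes the smitty border characters have been stripped first, so that the name is in column 1 and the state in column 3; `inactive_pdisks` and `sample_pdisks` are hypothetical names.

```shell
# Print name and state of every pdisk whose state is not "Active".
# Expects lines of the form: name location state description size
inactive_pdisks() {
    awk '$1 ~ /^pdisk/ && $3 != "Active" { print $1, $3 }'
}

sample_pdisks='pdisk0    04-08-00-3,0  Active      Array Member     142.8GB
pdisk4    04-08-01-3,0  Active      Array Member     142.8GB
pdisk5    04-08-01-4,0  Missing     Disk             142.8GB
pdisk7    04-08-01-8,0  Active      Array Member     142.8GB'
printf '%s\n' "$sample_pdisks" | inactive_pdisks   # prints: pdisk5 Missing
```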

# 4  
Old 03-05-2019
Did you let the system discover the new disk (cfgmgr...) ?
# 5  
Old 03-05-2019
I am a bit confused: is this a disk in your AIX system as you said in #1 or a disk in a separate RAID as you said in #2? Usually, if you have an external RAID, formatting it creates one virtual disk (the whole RAID set), which you in turn see in AIX as a single hdisk device. Please explain your hardware setup (what is connected to what, etc.) in a bit more detail.

If your disk is part of a RAID set which is managed by an external device, you need to follow the procedures of this external device. That may be anything; you will have to look it up in the respective manual of the device.

Notice that the following only applies if the system is not part of a cluster!

If the disk is directly attached to the system (that basically means you have a hdisk device /dev/hdiskNN for this single disk) you CANNOT simply remove it! Disks are uniquely identified by a "PVID" (physical volume ID) and AIX will notice that this disk is not that disk, regardless of them being identical hardware. (To be honest, it is, in fact, possible to de-configure a physically removed disk, but that is really complicated work including manually patching the ODM - you do NOT want to have to do that if you can avoid it.)

The correct way to remove the disk is as follows. Identify which volume group it belongs to by using the lspv command. Move all LVs occupying space on that disk to other disks by using the lmigratepp command. (If the disk holds only one copy of mirrored LVs, simply remove that copy with rmlvcopy and remirror once the new disk is in place.)

When the disk has no occupied PPs any more (check with lsvg -p <volume-group>) remove it from the VG with:

Code:
reducevg <vg-name> <hdiskNN>
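
The "no occupied PPs" check can be sketched as below. This is only an illustration: it assumes the column layout shown elsewhere in this thread (TOTAL PPs in column 3, FREE PPs in column 4), and `used_pps` and `sample_lsvg` are made-up names.

```shell
# Print the number of used PPs (TOTAL PPs minus FREE PPs) for one disk,
# reading `lsvg -p <volume-group>` output on stdin.
used_pps() {
    awk -v disk="$1" '$1 == disk { print $3 - $4 }'
}

# On a live system: lsvg -p <volume-group> | used_pps hdiskNN
sample_lsvg='SSAMvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            532         45          00..00..00..00..45'
printf '%s\n' "$sample_lsvg" | used_pps hdisk5   # prints: 487
```

Only when this prints 0 for the disk is it empty and ready for reducevg.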

Now - ONLY NOW! - you can pull the hdisk and replace it with the new one. Then run cfgmgr to discover the new disk. Add it to the VG with:

Code:
extendvg <vg-name> <hdiskNN>

This will format the disk and put a PVID on it in the process.

Now you can use the space provided by the new disk. If you have deleted a mirror from an LV before, create a new mirror using the mklvcopy command. If you want to move a whole (unmirrored) LV to the new disk: create a mirror copy the same way and remove the original. This is faster than moving single PPs around.
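
The order of the steps above can be sketched as a dry run that only prints the commands and executes nothing. The names datavg, hdisk3 and hdisk4 are placeholders, `plan_disk_swap` is a made-up helper, and the new disk will not necessarily come back under a different hdisk name. As said above, this applies to a directly attached disk, not to a member of a RAID array.

```shell
# Print (do not run) the command sequence for swapping a directly attached disk.
# Arguments: volume group, old hdisk, new hdisk.
plan_disk_swap() {
    vg=$1 old=$2 new=$3
    echo "lspv $old"
    echo "lsvg -p $vg"
    echo "# move the LVs off $old or drop their mirror copy before going on"
    echo "reducevg $vg $old"
    echo "cfgmgr"
    echo "extendvg $vg $new"
}

plan_disk_swap datavg hdisk3 hdisk4
```

The physical swap belongs between the reducevg and cfgmgr lines. Reviewing the printed list first is a cheap way to catch a wrong disk name before anything is changed.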

I hope this helps.

bakunin
# 6  
Old 03-05-2019
Hi Bakunin,

Quote:
I am a bit confused: is this a disk in your AIX system as you said in #1 or a disk in a separate RAID as you said in #2? Usually, if you have an external RAID, formatting it creates one virtual disk (the whole RAID set), which you in turn see in AIX as a single hdisk device. Please explain your hardware setup (what is connected to what, etc.) in a bit more detail.
--> My hardware setup: I have a RAID 5 disk array, hdisk5, and the pdisks listed below are its members.
Code:
hdisk0    04-08-ff-0,0  Optimal     RAID 10 Array    142.8GB
 pdisk0   04-08-00-3,0  Active      Array Member     142.8GB
 pdisk1   04-08-00-4,0  Active      Array Member     142.8GB

hdisk5    04-08-ff-0,1  Optimal     RAID 5 Array     714.3GB
 pdisk2   04-08-00-5,0  Active      Array Member     142.8GB
 pdisk8   04-08-00-8,0  Active      Array Member     142.8GB
 pdisk4   04-08-01-3,0  Active      Array Member     142.8GB
 pdisk3   04-08-01-5,0  Active      Array Member     142.8GB
 pdisk7   04-08-01-8,0  Active      Array Member     142.8GB
 pdisk5   04-08-01-4,0  Active      Array Member     142.8GB

My volume group
Code:
xxx@/#lsvg -p YYYY
SSAMvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk5            active            532         45          00..00..00..00..45

My raid configuration
Code:
0940-038 scsi2: Open not attempted. Device not Available.
0940-038 scsi3: Open not attempted. Device not Available.
------------------------------------------------------------------------
Name      Location      State       Description        Size
------------------------------------------------------------------------
sisioa0   04-08         Available   PCI-X Dual Channel U320 SCSI RAID Adapter
 scsi0    04-08-00-07,0 NoLink      No remote adapter target
 scsi1    04-08-01-07,0 NoLink      No remote adapter target

hdisk0    04-08-ff-0,0  Optimal     RAID 10 Array    142.8GB
 pdisk0   04-08-00-3,0  Active      Array Member     142.8GB
 pdisk1   04-08-00-4,0  Active      Array Member     142.8GB

hdisk5    04-08-ff-0,1  Optimal     RAID 5 Array     714.3GB
 pdisk2   04-08-00-5,0  Active      Array Member     142.8GB
 pdisk8   04-08-00-8,0  Active      Array Member     142.8GB
 pdisk4   04-08-01-3,0  Active      Array Member     142.8GB
 pdisk3   04-08-01-5,0  Active      Array Member     142.8GB
 pdisk7   04-08-01-8,0  Active      Array Member     142.8GB
 pdisk5   04-08-01-4,0  Active      Array Member     142.8GB

We have pulled out the failed disk pdisk5 and added a new hard disk.
Everything was fine until we tried to add the new disk to the RAID and got the message below:

Code:
hdisk5 changed. hdisk 5 has been expanded. However, hdisk5 needs to be unconfigured and reconfigured prior to the system being able to use the increased capacity.
Note: the volume group, logical volumes, and file systems associated with hdisk5 might need to be changed in order to make use of the increased capacity.

After some checks, I can see that hdisk5 has grown to the new size, but the VG is still at the old size.

Last edited by bakunin; 03-05-2019 at 10:21 AM..
# 7  
Old 03-05-2019
Quote:
Originally Posted by Phat
After some checks, I can see the hdisk5 becomes bigger with new size but the VG still in the old size.
It seems that the "physical" layer of the RAID is already reconfigured. Perhaps you have also re-read the configuration with the cfgmgr command and hence hdisk5 (this is the "logical" representation of the whole RAID) has become bigger. Anyway, you can make sure that the "new" hdisk5 is identified correctly in all its aspects.

Unmount the filesystems of the VG, then do a varyoffvg <VG> . Then unconfigure the hdisk device and rediscover it:

Code:
rmdev -Rl hdisk5
cfgmgr

Now you need to tell the volume manager that the VG has changed. Issue a

Code:
chvg -g <volume-groupname>

which should do the trick. I am not sure if the VG needs to be varied on or off for that, so try it with the VG varied off first, and if you get an error, do a varyonvg <VG> and try again.
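
Put together, the sequence might look like the dry run below, which prints the commands without running them. It assumes the VG has to be varied on again before chvg -g, which is left open above; `plan_vg_regrow` is a made-up helper, and the names are the ones from this thread.

```shell
# Print (do not run) the steps for making a VG pick up a grown hdisk.
# Arguments: volume group, hdisk.
plan_vg_regrow() {
    vg=$1 disk=$2
    echo "# unmount every filesystem in $vg first"
    echo "varyoffvg $vg"
    echo "rmdev -Rl $disk"
    echo "cfgmgr"
    echo "varyonvg $vg"      # assumption: chvg -g wants the VG varied on
    echo "chvg -g $vg"
    echo "lsvg -p $vg"
}

plan_vg_regrow SSAMvg hdisk5
```

The final lsvg -p is there so that TOTAL PPs can be compared with the value from before the change.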

Ah, a last thing:

Quote:
We have pulled out the failed disk pdisk5 and added a new hard disk.
DON'T DO THAT!

In this case you were lucky, but generally - as I wrote above - it is a bad idea to remove disks which are still known to the system. Always deconfigure them first and only then pull them.

I hope this helps.

bakunin

Last edited by bakunin; 03-05-2019 at 10:30 AM..