I have an AIX 7.1 system with 3 failed disks: 1 in rootvg and 2 in vg_usr1.
Here is the output of lspv.
Code:
# lspv
hdisk0 00044d4dfbb11575 vg_usr1 active
hdisk1 0000150179158027 vg_usr1 active
hdisk2 00001501791582c2 vg_usr1 active
hdisk3 000015013a9148db vg_usr1 active
hdisk4 00001501791587df vg_usr1 active
hdisk5 0000150179158a75 vg_usr1 active
hdisk6 00001501792ff38b vg_usr1 active
hdisk7 00001501792ff659 vg_usr1 active
hdisk8 000015019dbd4c34 vg_usr1 active
hdisk9 0000150178bb04a2 vg_usr1 active
hdisk10 0000150173c3f71f rootvg active
hdisk11 000015015b329219 rootvg active
hdisk12 0000150115562195 rootvg active
hdisk13 0000150173c3f833 rootvg active
hdisk14 0000150173c3f860 rootvg active
hdisk15 00001501f260dea1 rootvg active
hdisk16 00001501791a37d9 vg_usr2 active
hdisk17 00001501791a3a9d vg_usr2 active
hdisk18 00001501791a3d45 vg_usr2 active
hdisk19 00001501791a3fea vg_usr2 active
hdisk20 00001501791a4288 vg_usr3 active
hdisk21 00001501791a451e vg_usr3 active
hdisk22 000015019dbd4ed5 vg_usr1 active
hdisk23 000015019dbd517b vg_paging active
hdisk24 00001501791bf1c6 vg_usr1 active
hdisk25 00001501791bf48f vg_usr1 active
hdisk26 00001501791bf72e vg_usr1 active
hdisk27 00001501791bfa40 vg_usr1 active
hdisk28 00001501791bfce5 vg_usr1 active
hdisk29 00001501791bff84 vg_usr1 active
hdisk30 00001501792ff909 vg_usr1 active
hdisk32 00001501caf49d70 vg_usr1 active
hdisk33 0000150179310ec5 vg_usr1 active
hdisk34 000015010dd840e2 vg_usr1 active
hdisk35 00001501caf588da vg_paging active
hdisk36 00001501793177ff vg_usr3 active
hdisk37 0000150179317ac0 vg_usr3 active
hdisk38 0000150179317d6d vg_usr3 active
hdisk39 0000150179318031 vg_usr3 active
hdisk40 000015019dc3fe46 vg_usr3 active
hdisk42 00c55310ce0d3117 None
Output of lsvg for rootvg and vg_usr1.
Code:
# lsvg vg_usr1
VOLUME GROUP: vg_usr1 VG IDENTIFIER: 000015010000d60000000141138f2577
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 12012 (1537536 megabytes)
MAX LVs: 256 FREE PPs: 1 (128 megabytes)
LVs: 2 USED PPs: 12011 (1537408 megabytes)
OPEN LVs: 2 QUORUM: 12 (Enabled)
TOTAL PVs: 22 VG DESCRIPTORS: 22
STALE PVs: 2 STALE PPs: 768
ACTIVE PVs: 20 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 128 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512
# lsvg rootvg
VOLUME GROUP: rootvg VG IDENTIFIER: 000015010000d6000000013569180e7a
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 3276 (419328 megabytes)
MAX LVs: 256 FREE PPs: 2346 (300288 megabytes)
LVs: 15 USED PPs: 930 (119040 megabytes)
OPEN LVs: 13 QUORUM: 4 (Enabled)
TOTAL PVs: 6 VG DESCRIPTORS: 6
STALE PVs: 1 STALE PPs: 9
ACTIVE PVs: 5 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512
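Incidentally, the QUORUM values in the two lsvg outputs above follow directly from the descriptor counts: a quorum is a simple majority of the VG descriptor areas, floor(descriptors / 2) + 1. A quick shell check reproduces both figures:

```shell
# QUORUM = floor(VG DESCRIPTORS / 2) + 1, i.e. a majority of the VGDAs.
for d in 22 6; do                  # VG DESCRIPTORS of vg_usr1 and rootvg
    echo "descriptors=$d -> quorum=$(( d / 2 + 1 ))"
done
# descriptors=22 -> quorum=12
# descriptors=6 -> quorum=4
```

These match "QUORUM: 12" for vg_usr1 and "QUORUM: 4" for rootvg above.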
Output of lsvg -p rootvg and vg_usr1.
Code:
# lsvg -p vg_usr1
vg_usr1:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk0 active 546 0 00..00..00..00..00
hdisk1 active 546 0 00..00..00..00..00
hdisk2 active 546 0 00..00..00..00..00
hdisk3 missing 546 0 00..00..00..00..00
hdisk4 active 546 0 00..00..00..00..00
hdisk5 active 546 0 00..00..00..00..00
hdisk6 active 546 0 00..00..00..00..00
hdisk7 active 546 0 00..00..00..00..00
hdisk8 active 546 0 00..00..00..00..00
hdisk9 active 546 0 00..00..00..00..00
hdisk22 active 546 0 00..00..00..00..00
hdisk24 active 546 0 00..00..00..00..00
hdisk25 active 546 0 00..00..00..00..00
hdisk26 active 546 0 00..00..00..00..00
hdisk27 active 546 0 00..00..00..00..00
hdisk28 active 546 0 00..00..00..00..00
hdisk29 active 546 0 00..00..00..00..00
hdisk30 active 546 0 00..00..00..00..00
hdisk31 missing 546 0 00..00..00..00..00
hdisk32 active 546 0 00..00..00..00..00
hdisk33 active 546 0 00..00..00..00..00
hdisk34 active 546 1 00..00..00..00..01
# lsvg -p rootvg
rootvg:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk10 active 546 497 109..89..81..109..109
hdisk11 active 546 173 110..56..00..00..07
hdisk12 active 546 177 68..00..00..00..109
hdisk13 active 546 487 110..50..109..109..109
hdisk14 missing 546 511 109..89..95..109..109
hdisk15 active 546 501 110..109..64..109..109
Querying each VG, I found the following PVs in a missing state.
Code:
PV_NAME PV STATE TOTAL PPs FREE PPs FREE DISTRIBUTION
hdisk3 missing 546 0 00..00..00..00..00
hdisk14 missing 546 511 109..89..95..109..109
hdisk31 missing 546 0 00..00..00..00..00
It does look like I have an unassigned disk, hdisk42, so I'd like to put it into use while I source two replacement disks.
I was going to try to remove hdisk31 and put hdisk42 in its place.
My notes for removing the failed disk, swapping it, and adding a new one on AIX 7.1 are as follows:
Quote:
1) Unmirror hdisk31
unmirrorvg -c 1 vg_usr1 hdisk31
2) Take the disk out of the VG
reducevg vg_usr1 hdisk31
3) Remove the disk from system configuration
rmdev -dl hdisk31
4) Replace the disk
5) Get the disk back to the system configuration
cfgmgr
6) Add the disk to the VG
extendvg vg_usr1 hdisk31
7) Mirror hdisk31 again
mirrorvg -S -c 2 vg_usr1
"-S" will synchronize the stale partitions in the background.
8) After some time "lsvg vg_usr1" should show "STALE PPs: 0"
However, something is not happy; I can't get past the first step.
Code:
# unmirrorvg -c 1 vg_usr1 hdisk31
0516-1155 lreducelv: Last good copy of a partition cannot reside on a missing disk.
Try again after reactivating the disk using chpv and varyonvg.
0516-922 rmlvcopy: Unable to remove logical partition copies from
logical volume lv_usr1.
0516-1135 unmirrorvg: The unmirror of the volume group failed.
The volume group is still partially or fully mirrored.
Then I tried varyonvg, with no success.
Code:
# varyonvg vg_usr1
0516-934 /usr/sbin/syncvg: Unable to synchronize logical volume lv_usr1.
0516-932 /usr/sbin/syncvg: Unable to synchronize volume group vg_usr1.
This is a problem directly related to AIX, and we have a special forum for AIX. I am going to move the thread there.
Your notes are correct, but only for working disks. Because it is not clear how the copies in your VG are set up, it may well be that you have already lost data.
A word about that first: as a responsible admin you should never, NEVER let 3 disks go missing! You should act immediately when the first one fails. This usually announces itself beforehand in the errpt, when a disk logs an increasing number of disk errors (usually an hdisk error of type 3, which is temporary). What happens is that blocks are going bad and are relocated to good sectors. When formatting a disk, AIX sets aside a number of such contingency sectors; one by one these are used as sectors go bad, and once they are exhausted you usually get an hdisk error of type 4, which is permanent. I suggest checking the errpt and errpt -a output, respectively, to find out what happened to the disks.
Second: take stock of your data. Find out whether you still have a good copy of every LP (logical partition) by generating a map file for each LV. When you mirror an LV, two (or even three, depending on the number of mirrors) PPs represent one LP. Check the map files of all LVs for LPs that are represented only by PPs from hdisk31 or hdisk3. If you find any, you have lost data and will have to restore the contents from backup (you do have a backup, don't you?).
You will need to vary on the VG for that. Alas, that will not work while disks are missing, even if quorum checking is disabled. Use the "force" option for this; also use the -r option to vary on in read-only state and -n to disable synchronisation of stale partitions:
Code:
varyonvg -fnr vg_usr1
Now generate the map files for analysis and vary the VG off again:
Code:
lslv -m <LVname> > /path/to/file
What comes next depends on the result of your analysis. If you have not already lost data, you can try immediate removal of the failed disks: vary the VG on again and remove all the missing disks.
This last step has to work without any "brutal" handiwork: no "force" options or the like. If it works so far, you might try to put hdisk42 to work. Tell us how far you got and I will explain how to do that in a separate post.
Code:
# varyonvg -fnr vg_usr1
0516-1293 varyonvg: Volume group is currently in read-write mode.
Use varyoffvg before varyon volume group in different mode.
So now I try "varyoffvg -s vg_usr1".
According to man varyoffvg, -s puts the group in system maintenance mode.
But also no luck:
Code:
# varyoffvg -s vg_usr1
0516-012 lvaryoffvg: Logical volume must be closed. If the logical
volume contains a filesystem, the umount command will close
the LV device.
0516-942 varyoffvg: Unable to vary off volume group vg_usr1.
So, I keep going: I unmount /usr1 and get:
Code:
umount: 0506-349 Cannot unmount /dev/lv_usr1: The requested resource is busy.
I figured out which processes were using /usr1 via fuser and got rid of them; varyoffvg -s vg_usr1 then ran fine.
However, varyonvg still complained about read-write mode, so I ran varyoffvg without the -s switch. That worked, and I was then able to run varyonvg -fnr vg_usr1 fine.
I have attached the "map.txt" file, which is the output of lslv -m for lv_usr1. Any pointers on analyzing this file to check whether data is missing?
Am I looking to see that no LP has both of its copies on the failed disks?
I didn't bother to build one, but a little sed/awk/grep filtering should do the trick nicely. You are looking for LPs whose every copy resides on one of the failed disks (hdisk3 or hdisk31). Any such LP would mean that you lost both copies of it, and thus data; in that case you will need to restore the respective LV from backup no matter what. You need to verify whether this is the case first, because otherwise you will not be able to remove the missing disks with the method I described.
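To make the map check concrete, here is a small filter in the sed/awk spirit suggested above. The lslv -m map lists one LP per line followed by PP/PV pairs, so the PV names sit in fields 3, 5, 7. The sample map and the /tmp/map.txt path are made up for illustration; on the real system you would feed your saved map.txt instead.

```shell
# Hypothetical sample in "lslv -m" layout; replace with the real map file.
cat > /tmp/map.txt <<'EOF'
lv_usr1:/usr1
LP    PP1  PV1               PP2  PV2
0001  0100 hdisk0            0100 hdisk3
0002  0101 hdisk3            0101 hdisk31
0003  0102 hdisk31           0102 hdisk1
EOF

# Flag every LP whose copies ALL sit on the failed disks hdisk3/hdisk31.
awk 'NR > 2 {
    bad = 1
    for (i = 3; i <= NF; i += 2)      # PV names are in fields 3, 5, 7
        if ($i != "hdisk3" && $i != "hdisk31") bad = 0
    if (bad) print "LP " $1 " has no good copy"
}' /tmp/map.txt
# prints: LP 0002 has no good copy
```

An empty result means every LP still has at least one copy on a healthy disk, which is exactly what the map comparison is meant to confirm.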
I have followed your instructions and compared my map, didn't see faulty results and proceeded.
for vg_usr1 I now have..
Code:
# lsvg vg_usr1
VOLUME GROUP: vg_usr1 VG IDENTIFIER: 000015010000d60000000141138f2577
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 10920 (1397760 megabytes)
MAX LVs: 256 FREE PPs: 1 (128 megabytes)
LVs: 2 USED PPs: 10919 (1397632 megabytes)
OPEN LVs: 2 QUORUM: 11 (Enabled)
TOTAL PVs: 20 VG DESCRIPTORS: 20
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 20 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512
Now I believe hdisk42 is a good disk; is there any way to test that? If it proves good, how would I go about adding it to vg_usr1 to gain space?
Essentially, what we did was remove the bad disks because the partitions on them were still available on mirrored disks?
For tests, I was able to mount /usr1 (which is on vg_usr1) and the data is there.
Quote:
Originally Posted by c3rb3rus
I have followed your instructions and compared my map, didn't see faulty results and proceeded.
Good!
Quote:
Originally Posted by c3rb3rus
Now I believe hdisk42 is a good disk, anyway to test that? How would I go about adding this disk if the above proves true to vg_usr1 to gain space?
You do not need to test that. Adding a disk to a VG means formatting it, so the system will tell you if the disk is faulty. To add a new disk to a VG, do the following:
Code:
extendvg <vgname> <hdisk-device>
If everything goes well there is no output at all, otherwise diagnostic messages will appear.
Notice that a physical volume (a "disk" in LVM speak) can only contain 1016 PPs in an ordinary VG. This means that with your PP size of 128 MB (see the lsvg vg_usr1 output) you can only add disks somewhat below 128 GB in size. If your hdisk42 is bigger than this, you need to introduce a "factor": the usual limit of 32 PVs of 1016 PPs each is divided/multiplied by this factor. A factor of 2 means that only 16 PVs can be used, but each PV can hold 2032 PPs. You change the factor with:
Code:
chvg -t <factor> <vgname>
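To put numbers on the factor rule, here is a quick POSIX shell calculation. It is an illustrative aside; the 128 MB PP size and the 1016 PPs-per-PV default are taken from the lsvg output above.

```shell
# How big a disk fits per PV for a given factor, with a 128 MB PP size?
pp_size_mb=128     # PP SIZE from "lsvg vg_usr1"
max_pps=1016       # default MAX PPs per PV (factor 1)

for factor in 1 2 4; do
    pvs=$(( 32 / factor ))              # max PVs shrinks by the factor
    pps=$(( max_pps * factor ))         # max PPs per PV grows by it
    gb=$(( pps * pp_size_mb / 1024 ))   # largest usable disk, in GB
    echo "factor $factor: max $pvs PVs, $pps PPs/PV, up to $gb GB per disk"
done
# factor 1: max 32 PVs, 1016 PPs/PV, up to 127 GB per disk
# factor 2: max 16 PVs, 2032 PPs/PV, up to 254 GB per disk
# factor 4: max 8 PVs, 4064 PPs/PV, up to 508 GB per disk
```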
Quote:
Originally Posted by c3rb3rus
Essentially what we did is remove the bad disk because the partitions on it were available on a mirrored disk?
Not quite. You were able to remove the bad disks because you still had one good copy of every LP. Logical volumes (LVs) consist of logical partitions (LPs), which are in turn made of physical partitions (PPs); PPs are parts of a disk. If you create an unmirrored LV, one PP is assigned to each LP the LV consists of. Creating a mirrored LV means assigning 2 PPs to every LP and writing data to these two different (disk) locations in parallel.
After adding the new disk you might want to remirror the VG. First find out which LVs are affected. Here is an example:
Code:
# lsvg -l myvg
myvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
mirrorlv jfs2 80 160 2 open/syncd /some/where
loglv04 jfs2log 1 1 1 open/syncd N/A
nomirrorlv jfs2 3 3 1 open/syncd /else/where
The first LV is mirrored; the second is not, and neither is the jfs2log LV. Use the mklvcopy command to create mirror copies of the unmirrored LVs. You could use mirrorvg to do that, but only if hdisk42 offers at least as much space as the missing disks combined, because you have no space reserves in your VG (see "FREE PPs" in the lsvg output).
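For this thread's VG, the mklvcopy step might look like the following sketch. It assumes lv_usr1 is the LV that needs a second copy and that hdisk42 is the new disk; these are AIX-only commands, so treat this as an untested outline.

```
# Sketch, assuming lv_usr1 is the LV to remirror onto the new disk.
# The "2" is the new TOTAL number of copies, not the number added.
mklvcopy lv_usr1 2 hdisk42
# Synchronize the freshly created (stale) partitions afterwards:
syncvg -l lv_usr1
```

If lsvg -l shows further LVs with PPs equal to LPs (i.e. unmirrored), repeat mklvcopy for each of them.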
Code:
# extendvg vg_usr1 hdisk42
0516-1398 extendvg: The physical volume hdisk42, appears to belong to another volume group. Use the force option to add this physical volume to a volume group.
0516-792 extendvg: Unable to extend volume group.
Interesting error, since hdisk42 shows as not assigned. When I issue "lspv hdisk42", I get:
Code:
Physical volume hdisk42 is not assigned to a volume group.
So next I tried to force it, and it worked: "extendvg -f vg_usr1 hdisk42".
Then I tried mirrorvg vg_usr1, and I get:
Code:
# mirrorvg vg_usr1
0516-1119 mirrorvg: Not enough free physical partitions to satisfy request.
0516-1200 mirrorvg: Failed to mirror the volume group.
This corresponds to what you said: since I had 2 disks fail, I don't have enough space for this command. So next I try mklvcopy: looking at lslv lv_usr1, I have Copies = 2, so I will pass 2 to the mklvcopy command.
However, next I get this error.
Code:
# mklvcopy lv_usr1 2 hdisk42
0516-1509 mklvcopy: VGDA corruption: physical partition info for this LV is invalid.
0516-842 mklvcopy: Unable to make logical partition copies for
logical volume.
The output of lslv lv_usr1 is:
Code:
# lslv lv_usr1
LOGICAL VOLUME: lv_usr1 VOLUME GROUP: vg_usr1
LV IDENTIFIER: 000015010000d60000000141138f2577.1 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: jfs2 WRITE VERIFY: off
MAX LPs: 32512 PP SIZE: 128 megabyte(s)
COPIES: 2 SCHED POLICY: parallel
LPs: 6005 PPs: 12010
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: maximum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 32
MOUNT POINT: /usr1 LABEL: /usr1
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
INFINITE RETRY: no
Code:
# lslv -l lv_usr1
0516-1939 : PV identifier not found in VGDA.
Code:
# lsvg vg_usr1
VOLUME GROUP: vg_usr1 VG IDENTIFIER: 000015010000d60000000141138f2577
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 11466 (1467648 megabytes)
MAX LVs: 256 FREE PPs: 547 (70016 megabytes)
LVs: 2 USED PPs: 10919 (1397632 megabytes)
OPEN LVs: 0 QUORUM: 11 (Enabled)
TOTAL PVs: 21 VG DESCRIPTORS: 21
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 21 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512