I have an AIX 7.1 system with 3 failed disks: 1 in rootvg and 2 in vg_usr1.
Here is the output of lspv.
Output of lsvg for rootvg and vg_usr1.
Output of lsvg -p rootvg and vg_usr1.
When I queried each VG, I found that the following PVs are in a missing state.
It looks like I have an unassigned disk, "hdisk42", so I'd like to put it to use while I get 2 other disks as replacements.
I was going to try to remove hdisk31 and put hdisk42 in its place.
My notes for removing a failed disk, swapping it, and adding a new one on AIX 7.1 are as follows:
Quote:
1) Unmirror hdisk31
unmirrorvg -c 1 vg_usr1 hdisk31
2) Take the disk out of the VG
reducevg vg_usr1 hdisk31
3) Remove the disk from system configuration
rmdev -dl hdisk31
4) Replace the disk
5) Get the disk back to the system configuration
cfgmgr
6) Add the disk to the VG
extendvg vg_usr1 hdisk31
7) Mirror hdisk31 again
mirrorvg -S -c 2 vg_usr1
"-S" will synchronize the stale partitions in the background.
8) After some time "lsvg vg_usr1" should show "STALE PPs: 0"
However, something is not happy; I can't get past the first step.
Then I tried varyonvg, with no success.
Any help is appreciated! Thank you.
This is a problem directly related to AIX and we have a special forum for AIX. I am going to move the thread there.
Your notes are correct, but only for working disks. Because it is not clear how the copies in your VG are set up, it might well be that you have already lost data.
A word about that first: as a responsible admin you should never, NEVER let 3 disks become missing! You should act immediately when the first one fails. A failure is usually foreshadowed in the "errpt" output, when a disk logs an increasing number of disk errors (usually hdisk error 3, which is temporary). What happens there is that blocks go bad and are relocated to good sectors. When formatting a disk, AIX sets aside a number of such contingency sectors. One by one these are used up as sectors go bad, but at some point they are exhausted and then you usually get an hdisk error type 4, which is permanent. I suggest checking the "errpt" and "errpt -a" output to find out what happened to the disks.
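To get a quick overview of which disks have been logging errors, something like the following might help. It is only a sketch: the function name summarise_disk_errors is made up here, and it assumes the default "errpt" column layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION), where the T column holds the error type (P for permanent, T for temporary).

```shell
# Count errpt entries per disk and error type.
# stdin: default `errpt` output (assumed column layout:
#        IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION)
# stdout: one line per disk/type pair: "<hdisk> <type> <count>"
summarise_disk_errors() {
    awk '$5 ~ /^hdisk[0-9]+$/ { count[$5 " " $3]++ }
         END { for (k in count) print k, count[k] }' | sort
}

# On the AIX host this would be used as:
#   errpt | summarise_disk_errors
```

A disk whose temporary count keeps growing is the one to replace before a permanent error shows up.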
Second: take stock of your data. Find out if you still have a good copy of every LP (logical partition) by generating a map file for each LV. When you mirror an LV, two (or even three, depending on the number of mirrors) PPs represent one LP. Check the map files of all the LVs to see whether any LPs are represented only by PPs from hdisk31 or hdisk3. If so, you have lost data and you will have to restore its contents from backup (you do have a backup, don't you??).
You will need to varyon the VG for that. Alas, it will not work when disks are missing, even if the quorum checking is disabled. Use the "force" option for this, also use the "-r" option to varyon in read-only state and the "-n" to disable synchronisation of stale partitions:
Now generate the map files for analysis and varyoff again:
What comes now depends on what your analysis results in. In case you have not lost data already you can try immediate removal of the failed disks. Varyon again and remove all the missing disks:
Now try to settle the system:
The last one has to work without any "brutal" handiwork: without "force" options or the like. If this works so far you might try to put hdisk42 to work. Tell us how far you got and I will explain how to do that in a separate post.
According to man varyoffvg, -s puts the group in system maintenance mode.
But also no luck.
So I kept going: I tried to unmount /usr1 and got:
I figured out which processes were using /usr1 via fuser and got rid of them, and was then able to run varyoffvg -s vg_usr1 fine.
After varyoffvg -s vg_usr1, varyonvg still complained about read-write mode, so I ran varyoffvg without the -s switch. That worked, and I was then able to run varyonvg -fnr vg_usr1 fine.
I have attached the "map.txt" file, which is the output of lslv -m for lv_usr1. Any pointers on analyzing this file to see if data is missing?
Am I looking to see that a LP# is not on both failed disks?
I didn't bother to build it but a little sed/awk/grep-filtering should do the trick nicely.
Any of these would mean that you lost both copies of a certain LP and thus data. In this case you will need to restore the respective LV from backup no matter what. You need to verify whether this is the case first, because otherwise you will not be able to remove the missing disks with the method I described.
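For what it is worth, the filtering could be sketched roughly like this. The function name lost_lps is invented for illustration, and the sketch assumes the map rows look like "LP PP1 PV1 [PP2 PV2 [PP3 PV3]]", with the LV name line and the column header line in front. It prints every LP that has no copy on a disk outside the failed set.

```shell
# Print every LP whose copies all sit on failed disks.
# stdin: output of `lslv -m <lvname>` (assumed row layout:
#        LP PP1 PV1 [PP2 PV2 [PP3 PV3]])
# args:  names of the failed disks, e.g. hdisk31 hdisk3
lost_lps() {
    awk -v failed="$*" '
        BEGIN {
            n = split(failed, f, " ")
            for (i = 1; i <= n; i++) bad[f[i]] = 1
        }
        # Data rows start with a numeric LP number; header lines do not.
        $1 ~ /^[0-9]+$/ && NF >= 3 {
            good = 0
            # PV names sit in fields 3, 5 and 7.
            for (i = 3; i <= NF; i += 2)
                if (!($i in bad)) good++
            if (good == 0) print $1
        }'
}

# Against the saved map file this would be:
#   lost_lps hdisk31 hdisk3 < map.txt
```

No output means every LP still has at least one copy on a healthy disk. The disk names are matched exactly, so hdisk3 is not confused with hdisk31.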
I have followed your instructions and compared my map, didn't see faulty results and proceeded.
For vg_usr1 I now have:
Code:
# lsvg vg_usr1
VOLUME GROUP: vg_usr1 VG IDENTIFIER: 000015010000d60000000141138f2577
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 10920 (1397760 megabytes)
MAX LVs: 256 FREE PPs: 1 (128 megabytes)
LVs: 2 USED PPs: 10919 (1397632 megabytes)
OPEN LVs: 2 QUORUM: 11 (Enabled)
TOTAL PVs: 20 VG DESCRIPTORS: 20
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 20 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512
Now I believe hdisk42 is a good disk; is there any way to test that? If it proves to be good, how would I go about adding this disk to vg_usr1 to gain space?
Essentially what we did is remove the bad disk because the partitions on it were available on a mirrored disk?
For tests, I was able to mount /usr1 (which is on vg_usr1) and the data is there.
Quote:
Originally Posted by c3rb3rus
I have followed your instructions and compared my map, didn't see faulty results and proceeded.
Good!
Quote:
Originally Posted by c3rb3rus
Now I believe hdisk42 is a good disk; is there any way to test that? If it proves to be good, how would I go about adding this disk to vg_usr1 to gain space?
You do not need to test that. Adding a disk to a VG means formatting it, so the system will tell you if the disk is faulty. Do the following to add a new disk to a VG:
Code:
extendvg <vgname> <hdisk-device>
If everything goes well there is no output at all, otherwise diagnostic messages will appear.
Note that a physical volume (a "disk" in LVM speak) can only contain 1016 PPs in an ordinary VG (see "MAX PPs per PV" in the lsvg vg_usr1 output). With your PP size of 128M this means you can only add disks up to somewhat below 128G in size. If your hdisk42 is bigger than this you need to introduce a "factor". The usual limit of 32 PVs of 1016 PPs each is divided/multiplied by this factor: a factor of 2 means that only 16 PVs can be used but each PV can hold 2032 PPs. You change the factor with:
Code:
chvg -t <factor> <vgname>
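The arithmetic behind the factor can be sketched as follows. The helper names min_factor and max_pvs are made up for this post, and the numbers assume an ordinary VG with 32 PVs and 1016 PPs per PV at factor 1, as in the lsvg output above.

```shell
# Smallest factor that lets a disk of the given size fit into the VG.
# usage: min_factor <disk_size_in_MB> <pp_size_in_MB>
min_factor() {
    disk_mb=$1
    pp_mb=$2
    per_pv=$(( 1016 * pp_mb ))                    # MB one PV can hold at factor 1
    echo $(( (disk_mb + per_pv - 1) / per_pv ))   # integer division, rounded up
}

# Number of PVs an ordinary (32 PV) VG can still hold at a given factor.
# usage: max_pvs <factor>
max_pvs() {
    echo $(( 32 / $1 ))
}
```

With 128M PPs, one PV holds 1016 * 128 = 130048 MB at factor 1. A disk of 140000 MB would need factor 2, and max_pvs 2 gives 16. Since the lsvg output shows 21 PVs in vg_usr1, it is worth comparing that count with the new PV limit before changing the factor.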
Quote:
Originally Posted by c3rb3rus
Essentially what we did is remove the bad disk because the partitions on it were available on a mirrored disk?
Not quite. You were able to remove the bad disks because you still had one good copy of every LP. Logical volumes (LVs) consist of "logical partitions" (LP) which are in turn made of "PPs" (physical partitions). PPs are parts of a disk. If you create an unmirrored LV you assign one PP to each LP the LV consists of. Creating a mirrored LV means assigning 2 PPs to every LP and writing data to these two different (disk) locations in parallel.
After adding the new disk you might want to remirror the VG. First find out which LVs are affected. Here is an example:
Code:
# lsvg -l myvg
myvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
mirrorlv jfs2 80 160 2 open/syncd /some/where
loglv04 jfs2log 1 1 1 open/syncd N/A
nomirrorlv jfs2 3 3 1 open/syncd /else/where
The first LV is mirrored (160 PPs for 80 LPs); the jfs2log LV and the third LV are not (their PPs equal their LPs). Use the mklvcopy command to create mirror copies of unmirrored LVs. You could use mirrorvg to do that, but only if hdisk42 offers the same or more space as the missing disks together, because you do not have any space reserves in your VG (see "FREE PPs" in the lsvg output).
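If the VG has many LVs, the unmirrored ones can be picked out by comparing the LPs and PPs columns. A rough sketch, with unmirrored_lvs as an invented name and assuming the "lsvg -l" column order shown in the example (LV NAME, TYPE, LPs, PPs, ...):

```shell
# List LVs that have only a single copy (PPs equal to LPs).
# stdin: output of `lsvg -l <vgname>` (assumed columns:
#        LV NAME, TYPE, LPs, PPs, PVs, LV STATE, MOUNT POINT)
unmirrored_lvs() {
    # Only rows with a numeric LPs column are data rows; this skips
    # the "<vgname>:" line and the column header.
    awk '$3 ~ /^[0-9]+$/ && $3 == $4 { print $1 }'
}

# On the AIX host this would be used as:
#   lsvg -l myvg | unmirrored_lvs
```

For the example above this leaves loglv04 and nomirrorlv as candidates for mklvcopy.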
# extendvg vg_usr1 hdisk42
0516-1398 extendvg: The physical volume hdisk42, appears to belong to another volume group. Use the force option to add this physical volume to a volume group.
0516-792 extendvg: Unable to extend volume group.
Interesting error, since hdisk42 shows as not assigned. When I issue "lspv hdisk42", I get:
Code:
Physical volume hdisk42 is not assigned to a volume group.
So next I tried to force it, and it worked: "extendvg -f vg_usr1 hdisk42"
Then I tried mirrorvg vg_usr1, and I get:
Code:
# mirrorvg vg_usr1
0516-1119 mirrorvg: Not enough free physical partitions to satisfy request.
0516-1200 mirrorvg: Failed to mirror the volume group.
This corresponds to what you said: since I had 2 disks fail, I don't have enough space for this command, so next I tried mklvcopy.
Looking at lslv lv_usr1 I have it set for Copies = 2, so I will pass 2 to the mklvcopy command.
However, next I get this error.
Code:
# mklvcopy lv_usr1 2 hdisk42
0516-1509 mklvcopy: VGDA corruption: physical partition info for this LV is invalid.
0516-842 mklvcopy: Unable to make logical partition copies for
logical volume.
The output of lslv lv_usr1 is:
Code:
# lslv lv_usr1
LOGICAL VOLUME: lv_usr1 VOLUME GROUP: vg_usr1
LV IDENTIFIER: 000015010000d60000000141138f2577.1 PERMISSION: read/write
VG STATE: active/complete LV STATE: closed/syncd
TYPE: jfs2 WRITE VERIFY: off
MAX LPs: 32512 PP SIZE: 128 megabyte(s)
COPIES: 2 SCHED POLICY: parallel
LPs: 6005 PPs: 12010
STALE PPs: 0 BB POLICY: relocatable
INTER-POLICY: maximum RELOCATABLE: yes
INTRA-POLICY: middle UPPER BOUND: 32
MOUNT POINT: /usr1 LABEL: /usr1
MIRROR WRITE CONSISTENCY: on/ACTIVE
EACH LP COPY ON A SEPARATE PV ?: yes
Serialize IO ?: NO
INFINITE RETRY: no
Code:
# lslv -l lv_usr1
0516-1939 : PV identifier not found in VGDA.
Code:
# lsvg vg_usr1
VOLUME GROUP: vg_usr1 VG IDENTIFIER: 000015010000d60000000141138f2577
VG STATE: active PP SIZE: 128 megabyte(s)
VG PERMISSION: read/write TOTAL PPs: 11466 (1467648 megabytes)
MAX LVs: 256 FREE PPs: 547 (70016 megabytes)
LVs: 2 USED PPs: 10919 (1397632 megabytes)
OPEN LVs: 0 QUORUM: 11 (Enabled)
TOTAL PVs: 21 VG DESCRIPTORS: 21
STALE PVs: 0 STALE PPs: 0
ACTIVE PVs: 21 AUTO ON: yes
MAX PPs per VG: 32512
MAX PPs per PV: 1016 MAX PVs: 32
LTG size (Dynamic): 256 kilobyte(s) AUTO SYNC: no
HOT SPARE: no BB POLICY: relocatable
PV RESTRICTION: none INFINITE RETRY: no
DISK BLOCK SIZE: 512