In trying to rectify a stale LV problem, I ran rmlvcopy <lv> 1 <primary disk>, leaving the original OS disk without LV copies other than the stale LV.
Both disks seem operational, but lsvg rootvg shows 1 stale PV.
The end goal is to re-attach the LVs to hdisk1 and then attempt a reboot off of hdisk1 to sync things up again.
Last edited by jim mcnamara; 01-14-2019 at 02:46 PM..
Let us first establish what "stale" means here. Bear with me if this is old news to you: when you have a mirrored LV (basically there are only mirrored LVs; a mirrored VG just means that all of its LVs are mirrored), each LP (logical partition) is represented by two different PPs (physical partitions). An LV is considered stale if any of its LPs is not represented by two (or three, depending on the number of mirrors) PPs holding up-to-date data.
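To make the LP/PP relationship concrete, here is a minimal model of the definition above. It is only an illustration, not how LVM stores anything: the class and function names are hypothetical, and each LP is simply given a list of flags, one per PP copy, where True means "this copy is up to date".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalPartition:
    """One LP plus the sync state of every PP copy backing it (True = current)."""
    number: int
    copies: List[bool] = field(default_factory=list)


def stale_lps(lps: List[LogicalPartition], mirrors: int) -> List[int]:
    """Numbers of the LPs that do not have `mirrors` up-to-date PP copies."""
    return [lp.number for lp in lps if sum(lp.copies) < mirrors]


def lv_is_stale(lps: List[LogicalPartition], mirrors: int) -> bool:
    """A single stale LP is enough to mark the whole LV stale."""
    return len(stale_lps(lps, mirrors)) > 0


# Hypothetical two-way mirrored LV: LP 2 has an outdated copy,
# LP 3 has lost its second PP altogether.
lv = [
    LogicalPartition(1, [True, True]),
    LogicalPartition(2, [True, False]),
    LogicalPartition(3, [True]),
]
print(stale_lps(lv, mirrors=2))    # [2, 3]
print(lv_is_stale(lv, mirrors=2))  # True
```

Both an outdated copy and a missing copy count as stale here, which matches the "not represented by two (or three) PPs" wording.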
If the mirroring is being recreated (that happens in the background), all the LVs that are not completely mirrored yet are marked "stale" too. Check for a process named syncvg in the process list. If it is there, you just need to wait. You can also check the output of lsvg rootvg to see whether the number of stale PPs decreases.
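Both checks can be scripted. A minimal sketch, assuming you feed it the text of `ps -ef` and two `lsvg rootvg` snapshots taken a few minutes apart; it relies only on the `STALE PPs:` label that `lsvg` prints, and the sample values are made up.

```python
import re

STALE_PPS = re.compile(r"STALE PPs:\s*(\d+)")


def stale_pp_count(lsvg_output: str) -> int:
    """Pull the 'STALE PPs:' value out of `lsvg <vgname>` output."""
    match = STALE_PPS.search(lsvg_output)
    if match is None:
        raise ValueError("no 'STALE PPs:' field in the lsvg output")
    return int(match.group(1))


def syncvg_running(ps_output: str) -> bool:
    """True if any line of `ps -ef` output mentions syncvg."""
    return any("syncvg" in line for line in ps_output.splitlines())


def resync_progressing(earlier: str, later: str) -> bool:
    """Compare two lsvg snapshots: True if the stale count went down."""
    return stale_pp_count(later) < stale_pp_count(earlier)


# Hypothetical excerpts, trimmed to the relevant line.
before = "STALE PVs:          1                        STALE PPs:      40"
after = "STALE PVs:          1                        STALE PPs:      12"
print(stale_pp_count(before), stale_pp_count(after))  # 40 12
print(resync_progressing(before, after))              # True
```

If syncvg is not running and the count does not move, waiting will not help.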
Furthermore, your OS disk not only contains VG information but is also set up to be booted from. Whenever you alter (the disks of) your rootvg, you need to re-establish the boot code with the bosboot command: this puts the boot code onto the disk and thus makes it bootable. You may also need to adjust the boot list by (re-)creating it with the bootlist command. I just wanted to say this up front because it is easily forgotten.
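A small dry-run helper can keep the two steps from being forgotten. This sketch only builds the command lines (the function name is hypothetical). It uses `bosboot -ad /dev/<disk>`, the form quoted later in this thread, plus `bootlist -m normal <disks>` to set the normal-mode boot order and `bootlist -m normal -o` to display it. On the AIX host you could pass each list to `subprocess.run(cmd, check=True)` once you have reviewed it.

```python
from typing import List


def boot_refresh_commands(boot_disks: List[str]) -> List[List[str]]:
    """Build (but do not run) the commands that re-write the boot code on
    every given disk and make them the normal-mode boot list, in order."""
    if not boot_disks:
        raise ValueError("need at least one boot disk")
    commands = [["bosboot", "-ad", f"/dev/{disk}"] for disk in boot_disks]
    commands.append(["bootlist", "-m", "normal", *boot_disks])
    commands.append(["bootlist", "-m", "normal", "-o"])  # show the result
    return commands


for cmd in boot_refresh_commands(["hdisk0", "hdisk1"]):
    print(" ".join(cmd))
```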
Back to remedying your situation: the first thing you should do is make absolutely sure you have a valid, working and installable backup, preferably in the form of an mksysb image, most preferably on your NIM server. However far from ideal your current situation is, take the time to create such an image before you try anything else. Whenever you do non-trivial tasks on your rootvg, you run a non-zero chance of ending up with a non-working system. With an image you can at least get back to where you were. If you know your trade, you can run mkszfile before running mksysb and then edit the file it creates (/image.data) to create a non-mirrored backup image. Normally the image will be restored the same way the system was installed when the image was taken, with all the mirrors etc. in place. It may be preferable to have the image taken in an unmirrored fashion, so that it restores without a mirror onto one single disk, and only then do a mirrorvg manually. Again: don't forget bosboot and bootlist afterwards.
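For the image.data edit, a sketch of the idea. It assumes the `lv_data` stanza layout as I remember it from mkszfile output: indented `COPIES=`, `LPs=` and `PP=` lines, where PP is LPs times COPIES. Setting COPIES to 1 and PP to the LPs value asks for single-copy LVs. Everything else, including other stanzas and fields such as source disk lists, is left untouched, so compare the result with your own /image.data by hand. As far as I know, `mksysb -i` calls mkszfile again and would overwrite the edited file.

```python
import re
from typing import List, Optional

# Indented 'COPIES= 2', 'LPs= 40' and 'PP= 80' lines (assumed format).
FIELD = re.compile(r"^(\s+)(COPIES|LPs|PP)=\s*(\d+)\s*$")


def _unmirror_stanza(stanza: List[str]) -> List[str]:
    """For an lv_data stanza: COPIES -> 1 and PP -> the stanza's LPs value."""
    if not stanza or stanza[0].strip() != "lv_data:":
        return stanza
    lps: Optional[str] = None
    for line in stanza:
        m = FIELD.match(line)
        if m and m.group(2) == "LPs":
            lps = m.group(3)
    result = []
    for line in stanza:
        m = FIELD.match(line)
        if m and m.group(2) == "COPIES":
            line = f"{m.group(1)}COPIES= 1"
        elif m and m.group(2) == "PP" and lps is not None:
            line = f"{m.group(1)}PP= {lps}"
        result.append(line)
    return result


def unmirror_image_data(text: str) -> str:
    """Rewrite every lv_data stanza so the LV is restored with one copy."""
    out: List[str] = []
    stanza: List[str] = []
    for line in text.splitlines():
        if line and not line[0].isspace():  # an unindented line opens a stanza
            out.extend(_unmirror_stanza(stanza))
            stanza = []
        stanza.append(line)
    out.extend(_unmirror_stanza(stanza))
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```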
Which brings us to your disk: you have probably isolated the culprit to the one LV you still have on it after you removed all the other ones. Do an unmirrorvg to empty the disk completely and a reducevg to get it out of the VG. Your VG should now be in an unmirrored but otherwise healthy state. If you want, you could now test the disk extensively and possibly reuse it, but I wouldn't. The money saved on a single disk is simply not worth the effort it takes to reinstall a system that crashed because of a failing disk, not to mention the cost of the downtime of the service the system provides. Get a new disk, put it in, do an extendvg and finally a mirrorvg. After you issue the mirrorvg command, it takes some time until the mirrors are resynchronized. Until then, the LVs are still shown as "stale".
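The order of the LVM steps in the paragraph above, written down as a dry-run plan. The function name and the disk names are placeholders, and the physical swap happens between reducevg and extendvg. For a rootvg, bosboot and bootlist still have to follow, as described above.

```python
from typing import List


def disk_replacement_plan(vg: str, failing: str, new: str) -> List[List[str]]:
    """Build (not run) the LVM steps for swapping out a failing mirror disk."""
    if failing == new:
        raise ValueError("the replacement must be a different disk")
    return [
        ["unmirrorvg", vg, failing],  # drop every mirror copy on the failing disk
        ["reducevg", vg, failing],    # take the now empty disk out of the VG
        # ... the disk is physically swapped at this point ...
        ["extendvg", vg, new],        # add the replacement disk to the VG
        ["mirrorvg", vg, new],        # re-create the mirrors; the resync follows
    ]


for step in disk_replacement_plan("rootvg", "hdisk1", "hdisk6"):
    print(" ".join(step))
```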
To speed things up (and if you have enough RAM, because this takes some of it), you can do as I usually do:
Notice that 32 is the maximum. Use fewer if you do not have enough RAM: the amount needed is the PP size times the number of partitions synchronized in parallel. You can also set a certain number of parallel tasks in advance by putting the following line into /etc/environment:
This will also affect HACMP/PowerHA commands, which ignore the same setting in root's profile. Also notice that activatevg and varyonvg will (re-)start the synchronisation process too if the VG has stale partitions.
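The memory rule of thumb above as a quick calculation: estimated RAM = PP size × number of partitions synced in parallel, with 32 as the upper limit. The function names are hypothetical, and the estimate is only as good as the rule of thumb it encodes.

```python
MAX_PARALLEL = 32  # the maximum mentioned above


def sync_memory_mb(pp_size_mb: int, parallel: int) -> int:
    """Rough memory need of a parallel resync: PP size times parallel count."""
    if pp_size_mb <= 0:
        raise ValueError("PP size must be positive")
    if not 1 <= parallel <= MAX_PARALLEL:
        raise ValueError(f"parallel count must be 1..{MAX_PARALLEL}")
    return pp_size_mb * parallel


def parallel_for_budget(pp_size_mb: int, ram_budget_mb: int) -> int:
    """Largest parallel count whose estimate fits the budget (at least 1)."""
    if pp_size_mb <= 0:
        raise ValueError("PP size must be positive")
    return max(1, min(MAX_PARALLEL, ram_budget_mb // pp_size_mb))


print(sync_memory_mb(128, 32))         # 4096
print(parallel_for_budget(256, 2048))  # 8
```

With 128 MB PPs, the full 32 in parallel already needs about 4 GB. With 256 MB PPs and 2 GB to spare, 8 is the most that fits.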
I hope this helps.
bakunin
Last edited by bakunin; 01-14-2019 at 03:49 PM..
Thank you.
I may not be in as bad shape as I think I am. lslv -l hd2 shows hdisk1 with 0% in the IN BAND column, which, from the man pages, sounds like the OS is not writing to the LV anymore.
returns with no errors.
It almost seems like I could just pull hdisk1 out and be OK at this point. It's a gut-wrenching decision (I probably won't do it, though). I have re-run bosboot -ad /dev/hdisk0 and made sure my bootlist lists hdisk0 first. If there were OS-level trouble, I would expect the OS to be choking by now, with filesystem access errors, OS command errors or errors accessing hdisk0. However, it's still running fine (running DB2 and Informix development DBs).
Quote:
Originally Posted by mrmurdock
I may not be in as bad shape as I think I am. lslv -l hd2 shows hdisk1 with 0% in the IN BAND column, which, from the man pages, sounds like the OS is not writing to the LV anymore.
Sorry, but: no. "In band" means something completely different and has nothing to do with your problem. When you create LVs, their PPs are placed neatly on the disk, one after the other, with no free PPs in between (so there is no room to extend them in place). Like this, where a, b, c... mean the PPs of various LVs and X means free PPs:
Now, as you extend or shrink LVs over time, you end up in a situation where this strict succession is broken up, like this:
The initial situation is what is meant by "in band 100%": all the LVs are physically placed in one piece, and the PPs are in the order of ascending LPs. Once your disk becomes more and more disorganised, you can rectify this with the reorgvg command, which moves the PPs around until they are in order again. In your case, all PPs assigned to hd2 are placed in the "center" part of hdisk0, which gives "in band 100%" there, but on hdisk1 39 of the 40 are placed in "outer middle" and one in "outer edge". Therefore the "in band" indicator shows 0% for hdisk1. But again, this has nothing to do with your problem.
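The arithmetic behind the two numbers, as a sketch. IN BAND is the share of the LV's PPs on a given disk that lie in the region named by the LV's intra-physical allocation policy. The example assumes a "center" policy for hd2 and 40 PPs per copy on both disks. The region names are plain strings here, not anything read from the system.

```python
from typing import List


def in_band_percent(pp_regions: List[str], policy_region: str) -> float:
    """Percentage of an LV's PPs on one disk that sit in the region named
    by the LV's intra-physical allocation policy (what IN BAND reports)."""
    if not pp_regions:
        raise ValueError("the LV has no PPs on this disk")
    inside = sum(1 for region in pp_regions if region == policy_region)
    return 100.0 * inside / len(pp_regions)


# The hd2 layout described above, assuming 40 PPs per copy and a
# "center" intra-physical policy.
hdisk0 = ["center"] * 40
hdisk1 = ["outer middle"] * 39 + ["outer edge"]
print(in_band_percent(hdisk0, "center"))  # 100.0
print(in_band_percent(hdisk1, "center"))  # 0.0
```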
Quote:
Originally Posted by mrmurdock
returns with no errors.
This just means that the information in the ODM about the composition of the rootvg is accurate. This is a good thing but still does not help your problem.
Quote:
Originally Posted by mrmurdock
It almost seems like i could just pull hdisk1 out and be ok at this point.
DON'T!!
As I said before, the information about the VG is stored in the ODM, and if you simply remove the disk (without using the reducevg procedure I explained above), this information will NOT be accurate any more. Be prepared to repair the ODM manually, in a rather tedious fashion, if you do that. (Don't think you could put in another disk to make up for it: disks are identified by a unique PVID when they become part of a VG, so the system knows that this disk is not that disk.) Before you pull out the disk, remove it cleanly from the VG and the ODM using the commands I explained above.
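Before anyone at the data center pulls a disk, a quick sanity check is that plain `lspv` no longer lists it with a VG. In that output the columns are disk name, PVID (the unique ID mentioned above), VG and state, and a disk outside any VG shows `None` in the VG column. A minimal sketch; the sample output and PVIDs are made up.

```python
from typing import Optional


def volume_group_of(lspv_output: str, disk: str) -> Optional[str]:
    """VG column for `disk` in plain `lspv` output; None means no VG."""
    for line in lspv_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == disk:
            return None if fields[2] == "None" else fields[2]
    raise KeyError(f"{disk} not listed by lspv")


def safe_to_pull(lspv_output: str, disk: str) -> bool:
    """Only a disk that no longer belongs to any VG should be removed."""
    return volume_group_of(lspv_output, disk) is None


# Hypothetical lspv output (disk name, PVID, VG, state).
sample = """\
hdisk0          00c0ffee00000001                    rootvg          active
hdisk1          00c0ffee00000002                    rootvg          active
hdisk2          00c0ffee00000003                    None
"""
print(safe_to_pull(sample, "hdisk1"))  # False
print(safe_to_pull(sample, "hdisk2"))  # True
```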
I hope this helps.
bakunin
Last edited by bakunin; 01-14-2019 at 10:45 PM..
This morning (or maybe it just took a night's rest), lspv revealed the issue. lspv hdisk1 this morning also shows PV STATE: missing, although lspv shows all the disks online. None of the AIX LVM commands are working on the disk (reducevg complains about the open hd2 LV, which is /usr, even if I use -f to force it). syncvg is not running in the background.
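Since the plain `lspv` listing and `lspv hdisk1` disagree here, a monitoring script could read the per-disk `PV STATE:` field directly. A sketch that looks only for that label; the excerpt below is made up and trimmed.

```python
import re

PV_STATE = re.compile(r"PV STATE:\s*(\S+)")


def pv_state(lspv_disk_output: str) -> str:
    """Value of the 'PV STATE:' field in `lspv <disk>` output."""
    match = PV_STATE.search(lspv_disk_output)
    if match is None:
        raise ValueError("no 'PV STATE:' field in the lspv output")
    return match.group(1)


# Hypothetical excerpt of `lspv hdisk1`, trimmed to two lines.
excerpt = """\
PHYSICAL VOLUME:    hdisk1                   VOLUME GROUP:     rootvg
PV STATE:           missing
"""
state = pv_state(excerpt)
print(state)              # missing
print(state != "active")  # True, i.e. the disk needs attention
```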
This is AIX 6.1 TL7 SP10, build date 1415. I have had to run odmget and odmdelete on other boxes before; a little bit of tedious cleanup isn't all that bad. Unfortunately this box is in a remote DC, so I have to rely on another pair of hands to pull the disk.
And errpt (finally) shows:
Description
PV NO LONGER RELOCATING NEW BAD BLOCKS
Probable Causes
NON-MEDIA ERROR DURING SW RELOCATION
Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS
STORAGE DEVICE CABLE
Quote:
Originally Posted by mrmurdock
(reducevg complains about the open hd2 lv, which is /usr, even if I use -f to force it).
That was to be expected. I repeat:
Quote:
Do an unmirrorvg to completely make the disk empty and a reducevg to get it out of the VG.
You cannot use reducevg on a disk that has not been emptied first. Since you still have an LV occupying space on the PV (even if it is only a mirror copy), you cannot remove the disk from the VG. You either have to remove the mirror copy on this disk first or move it to another PV.
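One way to see what still blocks reducevg is to list the LVs that `lspv -l <disk>` reports. Until that list is empty, the copies have to be removed (for example with rmlvcopy <lv> 1 <disk>, the form used at the start of this thread) or moved. A sketch that assumes the table starts after the `LV NAME` header line; the sample is made up to match the layout described in this thread.

```python
from typing import List


def lvs_on_disk(lspv_l_output: str) -> List[str]:
    """LV names listed in `lspv -l <disk>` output (rows after the header)."""
    names: List[str] = []
    in_table = False
    for line in lspv_l_output.splitlines():
        if line.startswith("LV NAME"):
            in_table = True
        elif in_table and line.strip():
            names.append(line.split()[0])
    return names


# Hypothetical output for the failing disk in this thread.
sample = """\
hdisk1:
LV NAME               LPs     PPs     DISTRIBUTION          MOUNT POINT
hd2                   40      40      01..39..00..00..00    /usr
"""
remaining = lvs_on_disk(sample)
print(remaining)  # ['hd2']
print("reducevg can proceed" if not remaining else "empty the disk first")
```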
If this is not due to a broken cable or controller (that would explain the "missing" status), the error message suggests that the disk was in its last throes anyway: when a disk is formatted (when included in a VG), a certain number of blocks is set aside to compensate for blocks going bad. They are used up over time. Once they are depleted (or nearly depleted), you usually see a series of TEMP hdisk errors (IIRC "hdisk error 3", usually stretched out over some days or weeks) before a PERM one (IIRC "hdisk error 4") finally appears in the errpt.
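To spot that pattern early, one could count errpt summary entries per type for a disk. In the summary's T column, T marks temporary and P marks permanent errors. A sketch that assumes the default summary columns (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION). The sample lines are invented, and the identifiers and descriptions are placeholders.

```python
from collections import Counter


def error_types_for(errpt_output: str, resource: str) -> Counter:
    """Count errpt summary lines per type (T column) for one resource."""
    counts: Counter = Counter()
    for line in errpt_output.splitlines()[1:]:  # first line is the header
        fields = line.split(None, 5)
        if len(fields) >= 5 and fields[4] == resource:
            counts[fields[2]] += 1
    return counts


# Made-up summary lines; identifiers and descriptions are placeholders.
sample = """\
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
AAAAAAAA   0114101019 P H hdisk1         (placeholder)
AAAAAAAB   0110080019 T H hdisk1         (placeholder)
AAAAAAAB   0107120019 T H hdisk1         (placeholder)
AAAAAAAB   0103090019 T H hdisk0         (placeholder)
"""
counts = error_types_for(sample, "hdisk1")
print(counts["T"], counts["P"])  # 2 1
```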
So: migratepv -l hd2 hdisk1 hdisk6? (Yes, I found a spare, unused disk that is allocated, but I can delete the LV and VG on it.) My only concern is this: since it cannot read the bad block to finish the mirror, is migratepv smart enough to move the stuck LV? I guess if migratepv can't, it will just error out.