How to manually re-attach AIX LVs to a mirror?

Tags
aix, lv, lvm, mirror

 
# 1  
Old 1 Day Ago

In trying to rectify a stale LV problem I ran rmlvcopy <lv> 1 <primary disk>, which left the original OS disk without LV copies other than the stale LV.
Both disks seem operational, but lsvg rootvg shows 1 stale PV.

The end goal is to re-attach the LVs to hdisk1 and then attempt a reboot off hdisk1 to sync things up again.


Code:
# lsvg -l rootvg
rootvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
hd5                 boot       1       1       1    closed/syncd  N/A
hd6                 paging     4       4       1    open/syncd    N/A
hd8                 jfs2log    1       1       1    open/syncd    N/A
hd4                 jfs2       60      60      1    open/syncd    /
hd2                 jfs2       40      80      2    open/stale    /usr   <== not sure why the lv is in stale mode!
hd9var              jfs2       16      16      1    open/syncd    /var
hd3                 jfs2       20      20      1    open/syncd    /tmp
hd1                 jfs2       40      40      1    open/syncd    /home
hd10opt             jfs2       40      40      1    open/syncd    /opt
hd11admin           jfs2       1       1       1    open/syncd    /admin
livedump            jfs2       1       1       1    open/syncd    /var/adm/ras/livedump
lvol1               jfs2       60      60      1    open/syncd    /usr/sys/inst.images

Code:
# lslv -m hd2        <== stale LV
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0222 hdisk1            0509 hdisk0            
0002  0229 hdisk1            0510 hdisk0            
0003  0230 hdisk1            0511 hdisk0            
0004  0231 hdisk1            0512 hdisk0            
0005  0232 hdisk1            0513 hdisk0            

# lslv -m hd1
hd1:/home
LP    PP1  PV1               PP2  PV2               PP3  PV3
0001  0585 hdisk0            
0002  0586 hdisk0            
0003  0587 hdisk0            
0004  0588 hdisk0            
0005  0589 hdisk0


Last edited by jim mcnamara; 1 Day Ago at 01:46 PM..
# 2  
Old 1 Day Ago
Quote:
Originally Posted by mrmurdock
In trying to rectify a stale LV problem I ran rmlvcopy <lv> 1 <primary disk>, which left the original OS disk without LV copies other than the stale LV.
Both disks seem operational, but lsvg rootvg shows 1 stale PV.

The end goal is to re-attach the LVs to hdisk1 and then attempt a reboot off hdisk1 to sync things up again.
Let us first establish what "stale" means here. Bear with me if this is old news for you: when you have a mirrored LV (strictly speaking there are only mirrored LVs; a "mirrored VG" just means that all of its LVs are mirrored), each LP (logical partition) is represented by two different PPs (physical partitions). An LV is considered stale if any of its LPs is not represented by two (or three, depending on the number of copies) up-to-date PPs.

While the mirror is being (re)created, which happens in the background, all LVs that are not completely mirrored yet are marked "stale" too. Check for a process named syncvg in the process list. If it is there you just need to wait. You can also check the output of lsvg rootvg to see whether the number of stale partitions decreases.
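The checks described above can be sketched as a small helper. This is only an illustration: stale_lvs is a made-up name, and it assumes the lsvg -l column layout shown in the first post (LV STATE in the sixth whitespace-separated field).

```shell
#!/bin/sh
# stale_lvs (hypothetical helper): print the names of LVs whose LV STATE
# column contains "stale". Reads `lsvg -l <vg>` output on stdin and skips
# the first two lines (VG name and column headers).
stale_lvs() {
    awk 'NR > 2 && $6 ~ /stale/ { print $1 }'
}

# Only meaningful on the AIX box itself, so guard the real commands.
if command -v lsvg >/dev/null 2>&1; then
    lsvg -l rootvg | stale_lvs                            # LVs still stale
    ps -ef | grep '[s]yncvg' || echo "no syncvg running"  # resync in progress?
    lsvg rootvg | grep -i stale                           # stale PV / PP counters
fi
```

Run it a few times while a resync is going on: the stale counters reported by lsvg rootvg should go down and the list of stale LVs should eventually be empty.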

Furthermore, your OS disk does not only contain VG information, it is also set up to be booted from. Whenever you alter (the disks of) your rootvg you need to re-establish the boot code with the bosboot command: this puts the boot code onto the disk and thus makes it bootable. You may also need to alter the boot list by (re-)creating it with the bootlist command. I mention this up front because it is easily forgotten.

Back to remedying your situation: the first thing you should do is make absolutely sure you have a valid, working and installable backup, preferably in the form of an mksysb image, most preferably on your NIM server. However far from ideal your current situation is, take the time to create such an image before you try anything else. Whenever you do non-trivial things to your rootvg you run a non-zero chance of ending up with a non-working system. With an image you can at least get back to where you were. If you know your trade you can run mkszfile before running mksysb and then edit the resulting file to create a non-mirrored backup image. Normally the image is restored the same way the system was set up when the image was taken, with all the mirrors etc. in place. It may be preferable to have the image taken in an unmirrored fashion, so that it restores without a mirror onto one single disk, and only then do a mirrorvg manually. Again: don't forget bosboot and bootlist afterwards.

Which brings us to your disk: you have probably isolated the culprit to the one LV you still have on it after you removed all the other ones. Do an unmirrorvg to empty the disk completely and a reducevg to get it out of the VG. Your VG should now be in an unmirrored but otherwise healthy state. If you want, you could now test the disk extensively and eventually reuse it, but I wouldn't. The money saved on a single disk is simply not worth the effort it takes to reinstall a system that crashed because of a failing disk, not to mention the cost of the downtime of the service provided by the system itself. Get a new disk, put it in, do an extendvg and finally a mirrorvg. After you have issued the mirrorvg command it takes some time until the mirrors are resynchronized. Until then the LVs are still shown as "stale".
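Put together, the sequence described above might look like the following dry run. It only prints the commands and executes nothing; replace_mirror_plan and the disk names in the example call are placeholders, and whether unmirrorvg succeeds on a disk in this condition is not something the sketch can promise.

```shell
#!/bin/sh
# replace_mirror_plan (hypothetical helper): print, but do NOT run, the
# command sequence for swapping out one half of a mirrored rootvg.
#   $1 = VG name, $2 = disk to remove, $3 = disk that stays, $4 = new disk
replace_mirror_plan() {
    vg=$1; bad=$2; good=$3; new=$4
    cat <<EOF
unmirrorvg $vg $bad
reducevg $vg $bad
extendvg $vg $new
mirrorvg $vg $new
bosboot -ad /dev/$good
bosboot -ad /dev/$new
bootlist -m normal $good $new
EOF
}

# Example with the names from this thread; hdiskN stands for the new disk.
replace_mirror_plan rootvg hdisk1 hdisk0 hdiskN
```

The physical swap of the disk belongs between the reducevg and extendvg lines. Review the printed plan and run the commands by hand, one at a time, checking lsvg -l rootvg in between.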

To speed things up (if you have enough RAM, because this takes some of it) you can do what I usually do:

Code:
mirrorvg -s rootvg hdiskN     # mirrors but does NOT start to synchronise
syncvg -P 32 -v rootvg &      # synchronises in the background, 32 LPs in parallel

Notice that 32 is the maximum. Use less if you do not have enough RAM: the amount needed is the PP size times the number of parallel LPs. You can also set the number of parallel tasks in advance by putting the following line into /etc/environment:

Code:
NUM_PARALLEL_LPS=NN

This also affects HACMP/PowerHA commands, unlike the same setting in root's profile, which is ignored. Also notice that activatevg and varyonvg will (re-)start the synchronisation process too if the VG has stale partitions.
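To make the RAM rule of thumb above concrete, here is a small arithmetic sketch. The helper names are invented for illustration, and the formula is just the estimate given in this post (PP size times number of parallel LPs), not a documented figure.

```shell
#!/bin/sh
# sync_mem_mb (hypothetical): estimated RAM in MB for a parallel sync.
#   $1 = PP size in MB, $2 = number of LPs synced in parallel
sync_mem_mb() {
    echo $(( $1 * $2 ))
}

# max_parallel_lps (hypothetical): largest parallel count that fits a RAM
# budget, clamped to the 1..32 range.
#   $1 = RAM budget in MB, $2 = PP size in MB
max_parallel_lps() {
    n=$(( $1 / $2 ))
    if [ "$n" -gt 32 ]; then n=32; fi
    if [ "$n" -lt 1 ]; then n=1; fi
    echo "$n"
}

sync_mem_mb 256 32          # 256 MB PPs, 32 in parallel: prints 8192
max_parallel_lps 2048 256   # 2 GB budget, 256 MB PPs: prints 8
```

With 256 MB PPs, a full 32-way sync would by this estimate want about 8 GB, so on a smaller box a value such as 8 is the more realistic choice.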

I hope this helps.

bakunin

Last edited by bakunin; 1 Day Ago at 02:49 PM..
# 3  
Old 1 Day Ago
Thank you.
I may not be in as bad a shape as I think I am. lslv -l hd2 shows hdisk1 with 0% in the IN BAND column, which from the man pages sounds like the OS is not writing to the LV anymore.
Code:
# lslv -l hd2
hd2:/usr
PV                COPIES        IN BAND       DISTRIBUTION  
hdisk1            040:000:000   0%            001:039:000:000:000 
hdisk0            040:000:000   100%          000:000:040:000:000

Code:
# synclvodm rootvg

returns with no errors.
It almost seems like I could just pull hdisk1 out and be OK at this point. It's a gut-wrenching decision (I probably won't do it, though). I have re-run bosboot -ad /dev/hdisk0 and made sure my bootlist lists hdisk0 first. If there were OS trouble, I would expect the OS to be choking and dying by now on any filesystem access or OS command touching hdisk0; however, it is still running fine (running DB2 and Informix development DBs).
# 4  
Old 1 Day Ago
Quote:
Originally Posted by mrmurdock
I may not be in as bad a shape as I think I am. lslv -l hd2 shows hdisk1 with 0% in the IN BAND column, which from the man pages sounds like the OS is not writing to the LV anymore.
Code:
# lslv -l hd2
hd2:/usr
PV                COPIES        IN BAND       DISTRIBUTION  
hdisk1            040:000:000   0%            001:039:000:000:000 
hdisk0            040:000:000   100%          000:000:040:000:000

Sorry, but: no. "In band" means something completely different and has nothing to do with your problem. It means: when you create LVs they are placed contiguously on the disk, one after the other, with no room left between them for extension. Like this, where a, b, c, ... stand for the PPs of various LVs and X stands for free PPs:

Code:
aaaaabbccccccXXXXXXXX.....

Now, when you extend or shrink LVs you end up, over time, in a situation where this strict succession is broken up, like this:

Code:
aaaccbbccccccaacbaXXXX.....

The initial situation is what is meant by "in band 100%": all the LVs are physically placed in one piece and the PPs are in the order of ascending LPs. Once your disk becomes more and more disorganised you can rectify this with the reorgvg command, which moves the PPs around until they are in order again. In your case the "in band 100%" comes from all PPs assigned to hd2 being placed on the "center" part of hdisk0, whereas on hdisk1 39 of the 40 are placed on "outer middle" and one on "outer edge". Therefore the "in band" indicator shows 0% there. But again, this has nothing to do with your problem.

Quote:
Originally Posted by mrmurdock
Code:
# synclvodm rootvg

returns with no errors.
This just means that the information in the ODM about the composition of the rootvg is accurate. This is a good thing but still does not help your problem.

Quote:
Originally Posted by mrmurdock
It almost seems like I could just pull hdisk1 out and be OK at this point.
DON'T!!

As I said before, the information about the VG is stored in the ODM, and if you simply remove the disk (without using the reducevg procedure I explained above) you end up with this information NOT being accurate any more. Prepare to repair the ODM manually in a rather tedious fashion afterwards if you do that. (Don't think you could put in another disk to make up for it: disks are identified by a unique "PVID" when they become part of a VG, so the system knows that this disk is not that disk.) Before you pull out the disk, remove it cleanly from the ODM, which is done with the commands I explained above.

I hope this helps.

bakunin

Last edited by bakunin; 1 Day Ago at 09:45 PM..
# 5  
Old 8 Hours Ago
This morning (or maybe after a night's rest) lspv revealed the issue. lspv hdisk1 now shows PV STATE: missing, although lspv shows all the disks online. None of the AIX LVM commands work on the disk (reducevg complains about the open hd2 LV, which is /usr, even if I use -f to force it). syncvg is not running in the background.
This is AIX 6.1 TL7 SP10, 1415 build date. I have had to run odmget and odmdelete before on other boxes; a little bit of tedious cleanup isn't all that bad. Unfortunately this is in a remote DC, so I have to rely on another pair of hands to pull the disk.

Code:
PHYSICAL VOLUME:    hdisk1                   VOLUME GROUP:     rootvg
PV IDENTIFIER:      00f649e07720beb9 VG IDENTIFIER     00f7382000004c000000015706543652
PV STATE:           missing                                    
STALE PARTITIONS:   38                       ALLOCATABLE:      yes
PP SIZE:            256 megabyte(s)          LOGICAL VOLUMES:  1
TOTAL PPs:          1117 (285952 megabytes)  VG DESCRIPTORS:   2
FREE PPs:           1077 (275712 megabytes)  HOT SPARE:        no
USED PPs:           40 (10240 megabytes)     MAX REQUEST:      1 megabyte
FREE DISTRIBUTION:  223..184..223..223..224                    
USED DISTRIBUTION:  01..39..00..00..00                         
MIRROR POOL:        None

And errpt shows (finally):
Description
PV NO LONGER RELOCATING NEW BAD BLOCKS

Probable Causes
NON-MEDIA ERROR DURING SW RELOCATION

Failure Causes
DISK DRIVE
DISK DRIVE ELECTRONICS
STORAGE DEVICE CABLE
# 6  
Old 4 Hours Ago
Quote:
Originally Posted by mrmurdock
(reducevg complains about the open hd2 lv, which is /usr, even if I use -f to force it).
That was to be expected. I repeat:

Quote:
Do an unmirrorvg to completely make the disk empty and a reducevg to get it out of the VG.
You cannot use reducevg on a disk which has not been emptied first. Since you still have an LV occupying space on the PV (even if it is only a mirror copy) you cannot remove the disk from the VG. You either have to remove the mirror copy on this disk first or move it to another PV.
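A quick way to check that precondition is to read the LOGICAL VOLUMES and PV STATE fields out of the lspv output. The sketch below assumes the field layout of the lspv hdisk1 listing posted earlier in this thread; pv_lv_count and pv_state are invented names.

```shell
#!/bin/sh
# pv_lv_count (hypothetical): number of LVs still on the PV, read from the
# "LOGICAL VOLUMES:" field of `lspv <hdisk>` output on stdin.
pv_lv_count() {
    sed -n 's/.*LOGICAL VOLUMES:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# pv_state (hypothetical): value of the "PV STATE:" field (active, missing, ...).
pv_state() {
    sed -n 's/^PV STATE:[[:space:]]*\([a-z]*\).*/\1/p'
}

# Only meaningful on the AIX box itself, so guard the real command.
if command -v lspv >/dev/null 2>&1; then
    count=$(lspv hdisk1 | pv_lv_count)
    echo "hdisk1: state=$(lspv hdisk1 | pv_state), LVs on disk=$count"
    if [ "$count" != "0" ]; then
        echo "empty the disk first (unmirrorvg or rmlvcopy), then reducevg"
    fi
fi
```

On the listing above this reports one LV and state "missing", which matches reducevg refusing to remove the disk.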

If this is not due to a broken cable or controller (that would explain the "missing" status) the error message suggests that the disk was in its last throes anyway: when a disk is formatted (when included in a VG) a certain number of blocks is set aside to compensate for blocks going bad. They are used up over time. Once they are depleted (or nearly depleted) you usually see a series of TEMP hdisk errors (IIRC "hdisk error 3", usually stretched out over some days or weeks) before finally a PERM (IIRC "hdisk error 4") one in the errpt.

I hope this helps.

bakunin

Last edited by bakunin; 4 Hours Ago at 05:46 PM..
# 7  
Old 3 Hours Ago
So: migratepv -l hd2 hdisk1 hdisk6 (yes, I found a spare unused disk; it is allocated, but I can delete the LV and VG on it). My only concern: since it cannot read the bad block to finish the mirror, is migratepv smart enough to move the stuck LV? I guess if migratepv can't, it will just error out.
