Question about Grub2 and Multipathing


 
# 1  
Old 10-10-2018

I have a bit of an odd situation that I would like to float here to see if anyone has any ideas.

We are working on doing Disaster Recovery on a number of RHEL 7.4 systems. These are running on Cisco Blade Servers.

The mount point for /boot is on a multipathed SAN LUN. There are a couple of other SAN volumes in disk groups as well; none of those are causing problems, but /boot is.

Code:
/dev/mapper/360000970000197700328533030344233p1  976M  175M  751M  19% /boot

Above is the /boot mount on the production system. We use SRDF to copy this data to a DR site, then clone the storage and boot it on the same hardware at the DR site (the same Cisco blade server models as in production).

When we boot up the DR nodes, the system understandably gets a little 'confused' and mounts /boot on a raw single-path device.

Code:
/dev/sdy1                            976M  175M  751M  19% /boot

Note the /dev/sd* device instead of the multipath device /dev/mapper/*.
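For anyone wanting to script this check, here is a minimal sketch that labels a device path by its prefix. The function name and the labels are made up for illustration. Keep in mind that /dev/mapper also holds LVM volumes, so a "mapper" result only suggests multipath; `multipath -ll` remains the authority.

```shell
#!/bin/sh
# Label a block device path by its prefix (illustrative labels only).
#   mapper -> lives under /dev/mapper (multipath or LVM)
#   raw-sd -> a plain SCSI disk or partition, i.e. a single path
#   other  -> anything else
classify_boot_source() {
    case "$1" in
        /dev/mapper/*) echo "mapper" ;;
        /dev/sd*)      echo "raw-sd" ;;
        *)             echo "other" ;;
    esac
}

# The two devices shown in the df output above:
classify_boot_source /dev/mapper/360000970000197700328533030344233p1
classify_boot_source /dev/sdy1

# On a live system the source can be looked up with findmnt (util-linux):
#   classify_boot_source "$(findmnt -n -o SOURCE /boot)"
```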

So to fix this:
First, I run 'multipath -W', which resets the /etc/multipath/wwids file so that it no longer carries the extra WWIDs.

Code:
multipath -W
successfully reset wwids

Now the WWIDs from the production side are gone and only the new ones at the DR site remain, so that file is now a happy file. :)
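To confirm what the file holds before and after the reset, something like the sketch below could be used. It assumes the usual wwids layout, where each entry is written as /WWID/ on its own line and comment lines start with '#'. The helper name is hypothetical, and the demo runs on a throwaway sample file rather than anything under /etc.

```shell
#!/bin/sh
# Print the WWIDs recorded in a multipath wwids file, one per line.
# Entries are stored as /WWID/; comment lines start with '#'.
list_wwids() {
    sed -n 's|^/\(.*\)/$|\1|p' "$1"
}

# Demo on a temporary sample, using the production WWID from this thread.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# Multipath wwids, Version : 1.0
# Valid WWIDs:
/360000970000197700328533030344233/
EOF
list_wwids "$sample"
rm -f "$sample"

# On a real host:
#   list_wwids /etc/multipath/wwids
```

Saving the output before and after 'multipath -W' and diffing the two lists shows exactly which WWIDs were dropped.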

After that, I need to add a filter to /etc/lvm/lvm.conf to ignore any devices other than the /dev/mapper devices. (Red Hat support suggested that I add global_filter as well. It seemed to work OK with just 'filter', but adding it didn't hurt either.)

Code:
filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
global_filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
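
When this is automated, it is easy to end up editing a commented-out example line instead of the live setting. A small sketch for listing only the uncommented filter and global_filter assignments follows. The helper name is hypothetical, the sample file is a cut-down stand-in for a real lvm.conf, and multi-line filter definitions are not handled.

```shell
#!/bin/sh
# Print the uncommented filter / global_filter assignments in an lvm.conf.
# Lines whose first non-blank character is '#' are skipped by the pattern.
active_lvm_filters() {
    grep -E '^[[:space:]]*(global_)?filter[[:space:]]*=' "$1"
}

# Demo on a temporary sample instead of the real /etc/lvm/lvm.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
devices {
    # filter = [ "a|.*/|" ]
    filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
    global_filter = [ "a|/dev/mapper/.*|", "r|.*|" ]
}
EOF
active_lvm_filters "$sample"
rm -f "$sample"

# On a real host:
#   active_lvm_filters /etc/lvm/lvm.conf
```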

Then - create a new initramfs image:

Code:
dracut --force --add multipath --include /etc/multipath
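
Before rebooting, it may be worth checking that the multipath files actually landed in the new image. The sketch below reads a file listing on stdin, such as the one lsinitrd prints, and reports whether two expected paths appear. The helper name, the two paths it looks for, and the sample listing are all assumptions for illustration and not output captured from these servers.

```shell
#!/bin/sh
# Read an initramfs file listing on stdin and report, for each expected
# multipath file, whether some line of the listing ends with that path.
check_multipath_in_listing() {
    listing=$(cat)
    for f in etc/multipath.conf etc/multipath/wwids; do
        if printf '%s\n' "$listing" | grep -q "$f\$"; then
            echo "$f present"
        else
            echo "$f missing"
        fi
    done
}

# Demo with a made-up two-line listing.
check_multipath_in_listing <<'EOF'
-rw-r--r--   1 root     root         2415 Jan  1 00:00 etc/multipath.conf
-rw-------   1 root     root          200 Jan  1 00:00 etc/multipath/wwids
EOF

# On a real host, for the running kernel's image:
#   lsinitrd | check_multipath_in_listing
```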

And reboot.
The server comes back up in either rescue or emergency mode (I'll pay more attention next time) and EACH TIME, running grub2-mkconfig fixes it - and the server boots just fine.

I need to figure out what's going on for my own geeky obsessiveness. The thing is, I saved a backup copy of /boot/grub2/grub.cfg and compared it to the new one that was generated in emergency mode, and there are zero differences. I used Notepad++ to do a file comparison, even adding a character to verify that the plug-in was working right, and I can find no difference at all between the two files.
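The same comparison can be done on the server itself, without copying the files to a desktop editor. A minimal sketch using cmp follows; the helper name is hypothetical and the demo uses temporary files.

```shell
#!/bin/sh
# Report whether two files are byte-for-byte identical.
compare_files() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "different"
    fi
}

# Demo on two temporary files with the same contents.
a=$(mktemp)
b=$(mktemp)
printf 'menuentry placeholder\n' > "$a"
printf 'menuentry placeholder\n' > "$b"
compare_files "$a" "$b"
rm -f "$a" "$b"

# On a real host (the backup path is an assumption):
#   compare_files /root/grub.cfg.bak /boot/grub2/grub.cfg
```

A byte-level comparison also catches whitespace and line-ending differences that a visual diff can hide.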

I thought that grub2-mkconfig just generated a new grub.cfg file, but it almost seems like something else is going on here as well.

Any ideas?

It's not that I can't get these servers back online, it's just that I would like to skip the reboot into rescue mode - as we are looking to automate this process as much as possible.

We have recovered these 4 nodes a couple of times - this process seems consistent. I just can't figure out what change grub2-mkconfig is making to the system to get it to boot!

Thanks in advance!
# 2  
Old 10-10-2018
A small disclaimer up front: I have no experience with boot from SAN on bare metal.

Not related to the problem... what would really help you is a separately installed operating system in each environment (DC), with the data part replicated in separate volume groups.
That way, the only thing you do is import the volume group and possibly reassign an IP, depending on the topology and on services such as VRRP or anything used at a higher layer.

I never endorsed SAN boot on metal personally (in VMs, of course); I always had a couple of local disks to mirror...

Point being, I would go with service separation and a clean install at each of the sites, while cloning the data part via storage methods.

Hopefully someone else who has used Linux SAN boot will be of more assistance with the actual problem.

Regards
Peasant.
# 3  
Old 10-10-2018
Some of that is outside of my control :)

This is the method that was decided upon... we have become pretty attached to using recovered virtual machines or volumes from our production side; overall it has reduced our RTO significantly due to the ease of importing synchronized data.

Part of the long-standing issue we have had with maintaining the OS portion at the DR site is that changes made in production don't get replicated to the DR environment. Those changes are supposed to be made, but they often get missed. There is always the 'so and so forgot to...' and then management whines about it; doing a full replication of even the boot environments simply factors out those human problems. For instance, a new mount point is added or an old mount point is removed... a ton of little annoying things. Then when we test our DR processes, these 'little things' that we were unaware had changed can cost us large amounts of time. Lately, we have been consolidating ZPOOLS on Solaris, and every time we test we run into issues with them.

But overall this works very well. Everything is 100% except for this /boot MPATH issue; and honestly, the systems will run on a single path just fine. But you know... that's not good enough for a geek such as myself! lol

Thanks for your input too :)
# 4  
Old 10-11-2018
After some reading, it looks like a bug:
RHEL7: Booting fails for cloned SAN root disk on multipath systems - Red Hat Customer Portal

Can you check whether the version mentioned at the top is your version of dracut?

Regards
Peasant.
# 5  
Old 10-11-2018
That might just be it... I did find that running grub is *entirely* irrelevant, as /boot isn't even mounted in rescue mode in my case.

I added filters to /etc/lvm/lvm.conf
Rebuilt the initramfs ( dracut --force --add multipath --include /etc/multipath )
Cleaned up the additional WWIDs using 'multipath -W'

Then rebooted, and it came up in rescue mode.
I did nothing but reboot again, and it came up just fine.

I will check into this bug and workaround and see what happens. Thanks for the find, Peasant :)

------ Post updated at 01:54 PM ------

Oh, and the version is a bit higher:

dracut-033-502
But in the article you linked, under "affected systems", the output is what I'm getting.

Either way, I went ahead and updated to dracut-033-552.el7 - just to see if there's any difference.
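If the version check needs to be part of an automated run, a rough comparison can be sketched with GNU sort's version ordering. This is not rpm's own comparison algorithm, only an approximation that behaves sensibly for simple version-release strings like the ones above. The helper name is hypothetical, and the two versions compared are just the ones from this post, not a threshold taken from the Red Hat article.

```shell
#!/bin/sh
# Succeed (exit 0) when version $1 is greater than or equal to version $2,
# according to GNU sort -V. An approximation, not rpm's comparison rules.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Compare the updated package against the one that was installed before.
if version_ge 033-552 033-502; then
    echo "033-552 is at least 033-502"
fi

# On a real host the installed version-release can be read with:
#   rpm -q --qf '%{VERSION}-%{RELEASE}\n' dracut
```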

I think these are the related errors on boot:

------ Post updated at 01:56 PM ------

Only at 4 posts, so can't post a link - but if you take out the spaces, I uploaded a pic of the KVM screen with one of the errors I think are related.

Aukro and eBay auction gallery - eBayPhotoGallery.com

Last edited by Corona688; 10-11-2018 at 03:17 PM.. Reason: fixed link