I have a Solaris 9 server (a V240).
I got an alert that one of the interfaces in the IPMP configuration had failed. I found that two IPs (192.168.120.32 and 192.168.120.35) were not pingable from this server. These two IPs were plumbed on another server, and that server has since been decommissioned; that is why they are not pingable. As an immediate fix, I plumbed both of these IPs on another server, and after that I was able to ping them. I have seen this behaviour on other servers, so I suspected this might be the cause. But even now that all IPs in the routing table are pingable, I can't clear the FAILED flag from the ce0 interface.
I ran "pkill -HUP in.mpathd" twice in one terminal and checked /var/adm/messages in another session.
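For reference, the FAILED flag shows up in the flags field of "ifconfig -a" output; a minimal sketch of picking out the failed interfaces (the sample output below is illustrative, not taken from this server):

```shell
# List interfaces whose ifconfig line carries the FAILED flag.
# On the real server you would pipe `ifconfig -a` into the function.
failed_ifaces() {
    # A failed interface line looks roughly like:
    #   ce0: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 2
    awk -F: '/flags=.*FAILED/ { print $1 }'
}

sample='ce0: flags=19000802<BROADCAST,MULTICAST,IPv4,NOFAILOVER,FAILED> mtu 0 index 2
ce2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3'

echo "$sample" | failed_ifaces
# prints: ce0
```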
Please suggest what I am missing here and what I should check.
Thanks
Last edited by solaris_1977; 09-06-2019 at 02:51 AM..
I have read your post #1 countless times and I must confess that I am at a loss to understand your question. Sorry that, as a result, I cannot give you a specific answer.
So what I will do is bash some keys and provide some general network interface information as it pertains to Solaris 9. I apologize if you already know all this, but we have to start somewhere. This might be a long post before I'm finished; I don't know, it's just going to be written as it comes (into my head).
Why are you seemingly just plumbing the missing IP addresses that you can't ping onto another system? With IPMP the same IP address is aggregated across two or more NICs (on the same machine).
If you want to configure IPMP you would do that BEFORE you plumb. For example, if you have interfaces bge0 and bge1, you would create an aggregate interface (say, aggr1), and after that you would plumb and configure only aggr1. You would not configure bge0 and bge1 individually any more.
Now, Solaris 9 will look for files /etc/hostname.<interface> at boot time and try to plumb those interfaces. If this system was restored from a different hardware platform, then you might, for example, have a file /etc/hostname.ce0 existing, causing Solaris to try to plumb ce0 at boot time when ce0 doesn't actually exist on this hardware. To stop Solaris from trying to plumb ce0, simply delete the /etc/hostname.ce0 file.
When Solaris finds a file /etc/hostname.<interface> at boot time, it reads the hostname from this file and then (assuming the interface is not configured for DHCP, of course) looks up in /etc/hosts the IP address it should use on this interface.
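That boot-time lookup can be sketched like this (paths are parameterised so the sketch runs against copies rather than the live files, and the host names below are made up):

```shell
# Sketch of the boot-time resolution: read the name from the
# /etc/hostname.<if> file, then resolve it through /etc/hosts.
resolve_boot_ip() {   # usage: resolve_boot_ip <hostname.if file> <hosts file>
    name=$(head -n 1 "$1" | awk '{ print $1 }')
    awk -v n="$name" '$2 == n { print $1; exit }' "$2"
}

# Illustrative stand-ins for /etc/hostname.ce2 and /etc/hosts:
tmp=$(mktemp -d)
echo "myhost-ce2" > "$tmp/hostname.ce2"
printf '127.0.0.1 localhost\n192.168.120.32 myhost-ce2\n' > "$tmp/hosts"

resolve_boot_ip "$tmp/hostname.ce2" "$tmp/hosts"
# prints: 192.168.120.32
rm -rf "$tmp"
```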
If you aggregate bge0 and bge1 into aggr1, then a file /etc/hostname.aggr1 is created which Solaris will try to plumb at boot-time.
Now, you are trying to get the FAILED message for ce0 to disappear, yes? I can think of only two possibilities why a system would complain that ce0 has FAILED:
1. File /etc/hostname.ce0 exists but actual interface ce0 does not exist on this hardware. Delete the file.
2. The interface ce0 does not exist on this platform but is included in an aggregate IPMP configuration that has been restored from a different hardware platform. Down the aggregate interface and delete the IPMP configuration, then recreate the aggregate with interfaces that do exist on this platform and exclude ce0 which doesn't.
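Possibility 1 can be checked mechanically: compare the /etc/hostname.* files against the interfaces the machine actually has. A minimal sketch, using sample interface lists and a hypothetical helper name (in real use the "present" list would come from ifconfig -a):

```shell
# Report hostname.<if> files whose interface is not present on the box.
orphan_configs() {   # usage: orphan_configs "<present ifaces>" <hostname files...>
    present="$1"; shift
    for f in "$@"; do
        if=${f##*hostname.}                     # strip path prefix
        case " $present " in
            *" $if "*) : ;;                     # interface exists, nothing to do
            *) echo "orphan: $f (no interface $if)" ;;
        esac
    done
}

orphan_configs "bge0 bge1" /etc/hostname.bge0 /etc/hostname.ce0
# prints: orphan: /etc/hostname.ce0 (no interface ce0)
```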
Aggregating interfaces has nothing to do with other systems on the LAN. Provided the network cables from the aggregated interfaces go to network switch(es) that understand multi-pathing then all should be well.
I'm going to stop there. If I've completely misunderstood your question, then please give us a clue what this is about.
I am sorry to have confused you. I combined two issues in one post. I will re-word the issue.
IPMP is already configured on this server. Suddenly I got an alert that the IPMP group had failed over due to some error. When I logged into the server, I found that ce2 was in FAILED status, instead of its usual INACTIVE state.
The /etc/hostname.ce2 file is there and the physical interface is also present. There was never any change in its setup. Physically, I can see the light blinking on the network port at the back of the server. But since this interface is in FAILED state, IPMP is broken. Running snoop on ce2 gives me no output. To test further, I tried to detach bge0, and that is not working either.
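A blinking link light alone doesn't prove the link is good; the driver's own view can be queried too, e.g. with "ndd -set /dev/ce instance 2" followed by "ndd -get /dev/ce link_status". A small sketch interpreting that value, assuming the common Sun GigE convention of 0 = down and 1 = up (check your driver's docs, this is an assumption):

```shell
# Translate a ce-style link_status value into a human-readable verdict.
# The 0/1 meanings are assumed, not taken from this thread.
link_state() {
    case "$1" in
        0) echo "link down - check cable/switch port" ;;
        1) echo "link up" ;;
        *) echo "unknown status: $1" ;;
    esac
}

link_state 0
# prints: link down - check cable/switch port
```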
---------------------UPDATE-----------------
Found that the cable had a problem. After replacing it, I was able to fix this issue.
Thanks for the update.
If a NIC suddenly fails, and no admin did anything to your system or to the LAN switch, then the next suspect is hardware.
The IPMP concept is quite different from the port aggregation concept.
Does the latter exist in Solaris 9 at all? In the early days you had to purchase SunTrunking software.
@MadeInGermany: That's an interesting point you make. AFAIR port aggregation was around long before multi-pathing (IPMP), as it's a simpler technology (isn't it??).
I assumed that since this is Solaris 9 we were talking about aggregation and, from the posts, it sounded to me as though one port going down (perhaps by unplugging the cable) stopped all communication, thereby indicating that the other aggregated port was already down.
Perhaps I misunderstood the question in the first place. I had real difficulty getting a handle on it.
Yes, okay, I know that we techies are continuing a thread that's already tagged as solved.