IPMP group failed on Solaris 9


# 1  

Hi,

I have a Solaris 9 server, a V240.
I got an alert that one of the interfaces in the IPMP configuration has failed. I found that two IPs (192.168.120.32 and 192.168.120.35) were not pingable from this server. These two IPs were plumbed on another server that has since been decommissioned, which is why they were not pingable. As an immediate fix, I plumbed both IPs on another server, and after that I was able to ping them. I have seen this behaviour on another server, so I suspected this might be the cause. But even though all the IPs in the routing table are now pingable, I can't clear the FAILED flag from the ce0 interface.
Code:
# netstat -nr | grep 192.168.120.3
192.168.120.31 192.168.120.31 UGH 1 0
192.168.120.32 192.168.120.32 UGH 1 3
192.168.120.33 192.168.120.33 UGH 1 0
192.168.120.34 192.168.120.34 UGH 1 0
192.168.120.35 192.168.120.35 UGH 1 5
#
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
bge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
inet 192.168.120.51 netmask ffffff00 broadcast 192.168.120.255
groupname sbprd_data
ether 0:3:ba:c4:51:dd
bge0:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
inet 192.168.120.50 netmask ffffff00 broadcast 192.168.120.255
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 192.168.67.50 netmask ffffff00 broadcast 192.168.67.255
ether 0:3:ba:85:5e:bd
ce2: flags=39040803<UP,BROADCAST,MULTICAST,DEPRECATED,IPv4,NOFAILOVER,FAILED,STANDBY> mtu 1500 index 4
inet 192.168.120.52 netmask ffffff00 broadcast 192.168.120.255
groupname sbprd_data
ether 0:3:ba:85:5e:bf
# if_mpadm -d bge0
Offline failed as there is no other functional interface available in the multipathing group for failing over the network access.
#
# snoop -d ce2
Using device /dev/ce (promiscuous mode)
^C
#
# cat /etc/hostname.bge0
sbprda-app1-bge0 group sbsd_data netmask + broadcast + -failover deprecated up \
addif sbprda-app1-prod netmask + broadcast + failover up
# cat /etc/hostname.ce2
sbprda-app1-ce2 group sbsd_data netmask + broadcast + deprecated -failover standby up
# cat /etc/hostname.ce0
sbprda-app1-ce0
#
# cat /etc/hosts| egrep "ce0|ce2|bge0" | grep -v "#"
192.168.120.51  sbprda-app1-bge0 sbprda-app1-bge0.xypoint.com
192.168.120.52  sbprda-app1-ce2 sbprda-app1-ce2.xypoint.com
192.168.67.50   sbprda-app1-ce0 sbprda-app1-ce0.xypoint.com sbprda-app1-bkp
#
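
For reference, the FAILED flag in the `ifconfig -a` output above can be picked out with a small filter. This is only a minimal sketch: the function name `failed_ifaces` is made up for illustration, and it assumes the flags appear in the `<FLAG,FLAG,...>` form shown above.

```shell
# Print the name of every interface whose flag list contains FAILED.
# Reads `ifconfig -a` style output on stdin.
# The [<,] and [,>] guards keep NOFAILOVER from matching.
failed_ifaces() {
    sed -n '/flags=.*[<,]FAILED[,>]/s/: flags=.*//p'
}
```

Typical use would be `ifconfig -a | failed_ifaces`, which prints one interface name per line.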

I ran "pkill -HUP in.mpathd" twice in one terminal and watched /var/adm/messages in another session:
Code:
Sep  5 18:26:25 sbprda-app1-prod in.mpathd[1290]: [ID 111610 daemon.error] SIGHUP: restart and reread config file
Sep  5 18:26:25 sbprda-app1-prod in.mpathd[18166]: [ID 215189 daemon.error] The link has gone down on ce2
Sep  5 18:26:25 sbprda-app1-prod in.mpathd[18166]: [ID 832587 daemon.error] Successfully failed over from NIC ce2 to NIC bge0

Sep  5 18:26:34 sbprda-app1-prod in.mpathd[18166]: [ID 111610 daemon.error] SIGHUP: restart and reread config file
Sep  5 18:26:34 sbprda-app1-prod in.mpathd[18347]: [ID 215189 daemon.error] The link has gone down on ce2
Sep  5 18:26:34 sbprda-app1-prod in.mpathd[18347]: [ID 832587 daemon.error] Successfully failed over from NIC ce2 to NIC bge0
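
The messages above name the interface whose link the daemon considers down. A rough sketch for pulling those interface names out of the log, assuming the message wording is exactly as logged above (the function name `link_down_ifaces` is hypothetical):

```shell
# List each interface for which in.mpathd logged a link-down event.
# Reads syslog lines on stdin and prints every interface name once.
link_down_ifaces() {
    sed -n 's/.*in\.mpathd\[[0-9]*\]:.*The link has gone down on \([A-Za-z0-9]*\).*/\1/p' | sort -u
}
```

It could be run as `link_down_ifaces < /var/adm/messages`.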

Please suggest what I am missing here and what I should check.

Thanks

Last edited by solaris_1977; 09-06-2019 at 02:51 AM..
# 2  
Is it a production / continuity problem if you simply clear it by rebooting the server?
# 3  
I have read your post #1 countless times and I must confess that I am at a loss to understand your question. I'm sorry that I cannot give you a specific answer as a result.

So what I will do is bash some keys and provide some general network interface information as it pertains to Solaris 9. I apologize if you already know all this but we have to start somewhere. This might be a long post before I'm finished, I don't know, it's just going to be as it comes (into my head).

Why are you seemingly just plumbing missing IP addresses that you can't ping onto another system? With IPMP the same IP address is aggregated across two or more NICs (on the same machine).

If you want to configure IPMP you would do that BEFORE you 'plumb'. For example if you have interfaces bge0 and bge1, you would create an aggregate interface 'aggr1' for example and after that you would plumb and configure only aggr1. You would not try to configure bge0 and bge1 individually any more.

Now Solaris 9 will look for files /etc/hostname.<interface> at boot time and try to plumb those interfaces. If this system was restored from a different hardware platform, then you might for example have a file /etc/hostname.ce0 that causes Solaris to try to plumb ce0 at boot time when ce0 doesn't actually exist on this hardware. To stop Solaris from trying to plumb ce0, simply delete the /etc/hostname.ce0 file.

When Solaris finds a file /etc/hostname.<interface> at boot-time, it reads the hostname from this file and then (assuming the interface is not configured for DHCP of course) goes to /etc/hosts and looks up the IP address it should use on this interface.
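
As a rough illustration of that lookup (not the actual boot script), the sketch below takes the first word of a hostname.<interface> file and finds the matching address in a hosts file. The helper name `iface_ip` is made up, it assumes the file starts with a hostname rather than an IP address, and it ignores any other name service. On Solaris, use nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk does not accept -v.

```shell
# Hypothetical helper: print the address a hostname.<interface> file resolves to.
#   $1 = path to the hostname.<interface> file
#   $2 = path to the hosts file
iface_ip() {
    # The first word of the file is the hostname; the rest are ifconfig options.
    read name rest < "$1"
    # Find the first non-comment hosts entry that lists this name or alias.
    awk -v n="$name" '$1 !~ /^#/ {
        for (i = 2; i <= NF; i++) if ($i == n) { print $1; exit }
    }' "$2"
}
```

For example, `iface_ip /etc/hostname.ce2 /etc/hosts` would print the address that the boot-time configuration is expected to use for ce2.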

If you aggregate bge0 and bge1 into aggr1, then a file /etc/hostname.aggr1 is created which Solaris will try to plumb at boot-time.

Now, you are trying to get a FAIL message for ce0 to disappear, yes? I can think of only two possibilities why a system would complain about ce0 FAIL:

1. File /etc/hostname.ce0 exists but actual interface ce0 does not exist on this hardware. Delete the file.
2. The interface ce0 does not exist on this platform but is included in an aggregate IPMP configuration that has been restored from a different hardware platform. Down the aggregate interface and delete the IPMP configuration, then recreate the aggregate with interfaces that do exist on this platform and exclude ce0 which doesn't.

Aggregating interfaces has nothing to do with other systems on the LAN. Provided the network cables from the aggregated interfaces go to network switch(es) that understand multi-pathing then all should be well.

I'm going to stop there. If I've completely misunderstood your question then please give us a clue what this is about.

Hope that helps in some way.
# 4  
I am sorry to have confused you. I combined two issues into one, so let me re-word the problem.

IPMP is already configured on this server. I suddenly got an alert that the IPMP group had failed over due to some error. When I logged into the server, I found that ce2 was in FAILED status instead of the usual INACTIVE state.

The /etc/hostname.ce2 file is there and the physical interface is also present. There was never any change in its setup. Physically, I can see the light blinking on the network port at the back of the server. But since this interface is in the FAILED state, IPMP is broken. Running snoop on ce2 does not show any traffic. To test this, I tried to detach bge0 and it does not work:
Code:
# if_mpadm -d bge0
Offline failed as there is no other functional interface available in the multipathing group for failing over the network access.
#
# cat /etc/hostname.ce2
sbprda-app1-ce2 group sbsd_data netmask + broadcast + deprecated -failover standby up
#

---------------------UPDATE-----------------
Found that the cable was faulty. Replacing it fixed the issue.

Last edited by solaris_1977; 09-06-2019 at 11:44 PM..
# 5  
Thanks for the update.
If a NIC suddenly fails, and no admin changed anything on your system or on the LAN switch, then the next suspect is hardware.

The IPMP concept is quite different from the port aggregation concept.
Does the latter exist in Solaris 9 at all? In the early days you had to purchase SunTrunking software.
# 6  
@MadeInGermany: That's an interesting point you make. AFAIR port aggregation was around long before multi-pathing (IPMP) as it's a simpler technology (isn't it??).

I assumed that since this is Solaris 9 we were talking aggregation and, from the posts, it sounded to me that one port going down (perhaps by unplugging the cable) stopped all communication thereby indicating that the other aggregated port was already down.

Perhaps I misunderstood the question in the first place. I had real difficulty getting a handle on it.

Yes, okay, I know that we techies are continuing a thread that's already tagged as solved.

Last edited by hicksd8; 09-07-2019 at 10:32 AM..