AIX server problem - network connection is unstable!


 
# 1  
Old 03-04-2020
This is a pretty large box, an E870 as a matter of fact. Is the network coming from a VIO? If so: Is the VIO network set up correctly? Does the VIO have the correct resource assignments to provide services to however many LPARs you have running?

I ask because I ran into an issue where the VIO had been set up by an MSP, and every so often the active network path would switch from, say, c1-p1-t1 to c10-p1-t1. That caused a momentary ping delay, much like what you are seeing. My network team was the first to report it to me, as they saw the MAC address of the EtherChannel device change ports on the switch. The message they sent me was %SW_MATM-4-MACFLAP_NOTIF. It was caused by the EtherChannel on the VIO having two primary adapters and no backup adapter. Taking one of the primary adapters and making it the backup adapter fixed the issue. Your results may vary.
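
A minimal sketch for checking how an EtherChannel is laid out, assuming the `lsattr -El entX` output format posted elsewhere in this thread. The function name `summarize_etherchannel` is made up for illustration. It only reads text on stdin and reports what it finds; it does not judge the layout, since several primaries with no backup is a common setup for an 802.3ad aggregation.

```shell
# Hypothetical helper: summarize an EtherChannel from `lsattr -El entX` text on stdin.
summarize_etherchannel() {
  awk '
    # adapter_names holds a comma-separated list of primary adapters
    $1 == "adapter_names"  { primaries = split($2, names, ",") }
    $1 == "backup_adapter" { backup = $2 }
    $1 == "mode"           { mode = $2 }
    END { printf "primaries=%d backup=%s mode=%s\n", primaries, backup, mode }
  '
}

# On AIX (not run here): lsattr -El ent32 | summarize_etherchannel
```

Run it once per EtherChannel device and compare the results with what the switch side expects.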

If they are dedicated adapters and not coming from the VIO, again check your AIX configuration. One thing to note: your tcp_recvspace and rfc1323 settings are not consistent with your adapters. That might be by design, but it makes me wonder whether the network was set up correctly.

Now, everything else that everyone has posted here comes into play as well, and I'm not a network admin so I cannot weigh in on the other topics presented here.

If push comes to shove, open a ticket with the IBM AIX team.
These 2 Users Gave Thanks to RecoveryOne For This Post:
# 2  
Old 03-09-2020
Quote:
Yes, I understand.

When on a LAN segment you should also realize that every hardware interface to the LAN can have a different characteristic.

Some ethernet cards are more "chatty" and some are less "chatty" and some are "old" and others are "new" and some have drivers / firmware written by "A" and others have drivers / firmware written by "B".

You say "you don't have a lot of knowledge"... that's normal.

So, I am telling you what an experienced network engineer with a lot of knowledge would do.

If I had a busy LAN segment with a lot of devices on the same segment (subnet), as you have indicated, and I had two devices which I wanted to have the best communication speed between them, I would put them on their own LAN segment (subnet) and "be done with it" and maybe "retest" when the two devices are the only two devices (or three if you have a different gateway device) on that segment.
We're running with full resources, no VIO.

We've asked the network team to recheck the network. We are also planning to upgrade the OS to 7.1 so we can get help from IBM.
# 3  
Old 03-09-2020
Please show me the output of the following commands (where ent is mentioned, run it for every ent adapter configured; where ipaddress is stated, run it for every IP of your cluster partners):
Code:
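lsdev -Cc adapter            # list the configured adapters
entstat entx                 # repeat for each ent adapter
no -a                        # show all network tunables
vmstat -IWwt 2 10            # 10 samples, 2 seconds apart
ping -c 10 ipaddress 25000   # 10 packets of 25000 bytes each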

Are your cluster partners' IP addresses in /etc/hosts?
How are /etc/netsvc.conf and /etc/resolv.conf configured (order of things)?
How many disks are part of your GPFS cluster?
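
To collect the per-adapter statistics in one go, a loop along these lines might help. It is only a sketch: the function name `entstat_all` is made up, and the `-d` flag (device-specific detail) is an addition that the request above does not ask for.

```shell
# Hypothetical helper: run entstat for every ent adapter that lsdev reports.
entstat_all() {
  # -F name prints only the device name column
  lsdev -Cc adapter -F name | grep '^ent' | while read -r adapter; do
    echo "=== $adapter ==="
    entstat -d "$adapter"
  done
}

# On AIX (not run here): entstat_all > /tmp/entstat_all.out
```

The header line before each adapter makes it easier to tell the outputs apart when posting them.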
This User Gave Thanks to zxmaus For This Post:
# 4  
Old 03-11-2020
OK, I don't see any errors, overflows or underruns on any of your adapters. That is a good thing. You can transfer large packets to everywhere else without losing any, which is a good thing too.

It would have been nice to know which adapters make up which link aggregation, but I forgot to ask for it :)

I see a few things in your network tunables that I would probably change if these were my systems, to improve the general network flow (e.g. sack, tcp_nodelayack and buffer sizes), but for that it would help to know exactly which adapters make up which link aggregation. So if you could show me the lsattr -El entx output for your link aggregations and the underlying physical adapters, that would help to make sure you don't mix different speeds and that, depending on the adapters, the device attributes are set correctly. It might also help to set certain settings, like rfc1323 and buffers, on the adapters themselves in addition to the general system settings.

I can also see that you have relatively little free memory. This may or may not be a problem, so if you can give me the output of vmstat -v and vmstat -s, that would help me tell you the answer. You may also want to populate netsvc.conf, as it helps with the order of name resolution and might generally improve network speed when the system does not have to guess how to find another host.

To get rid of the pkcs error at the top of the lsdev -Cc adapter output, you probably need to install the security.pkcs11 fileset. And finally, can you please run lppchk -vm3 and post the output as well?
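
For a quick look at just the tunables mentioned above, the `no -a` output can be filtered. This is a rough sketch: the helper name `show_net_tunables` is made up, and picking tcp_recvspace and tcp_sendspace to stand in for "buffer sizes" is an assumption. It expects the usual `name = value` lines on stdin.

```shell
# Hypothetical helper: print selected tunables from `no -a` text on stdin.
show_net_tunables() {
  awk '
    BEGIN {
      # tunables of interest; adjust the list as needed
      n = split("rfc1323 sack tcp_nodelayack tcp_recvspace tcp_sendspace", list, " ")
      for (i = 1; i <= n; i++) want[list[i]] = 1
    }
    # keep only "name = value" lines whose name is in the list
    $2 == "=" && ($1 in want) { print $1 "=" $3 }
  '
}

# On AIX (not run here): no -a | show_net_tunables
```

Comparing this short list across the cluster partners is an easy way to spot a node that was tuned differently.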
This User Gave Thanks to zxmaus For This Post:
# 5  
Old 03-11-2020
Thanks for your help. Here is what you asked for. I hope you can find something.

Quote:
sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent32
adapter_names   ent2,ent18     EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent33
adapter_names   ent3,ent19     EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent34
adapter_names   ent10,ent26    EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent35
adapter_names   ent11,ent27    EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame no             Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent36
adapter_names   ent12,ent28    EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame yes            Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent37
adapter_names   ent13,ent29    EtherChannel Adapters                           True
alt_addr        0x000000000000 Alternate EtherChannel Address                  True
auto_recovery   yes            Enable automatic recovery after failover        True
backup_adapter  NONE           Adapter used when whole channel fails           True
hash_mode       default        Determines how outgoing adapter is chosen       True
interval        long           Determines interval value for IEEE 802.3ad mode True
mode            8023ad         EtherChannel mode of operation                  True
netaddr         0              Address to ping                                 True
noloss_failover yes            Enable lossless failover after ping failure     True
num_retries     3              Times to retry ping before failing              True
retry_time      1              Wait time (in seconds) between pings            True
use_alt_addr    no             Enable Alternate EtherChannel Address           True
use_jumbo_frame yes            Enable Gigabit Ethernet Jumbo Frames            True


sysopr1@oltpn8c:/home/sysopr1>vmstat -v
59703296 memory pages
24344784 lruable pages
2960431 free pages
12 memory pools
5104035 pinned pages
90.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
66.1 numperm percentage
16101397 file pages
0.0 compressed percentage
0 compressed pages
66.1 numclient percentage
90.0 maxclient percentage
16101397 client pages
0 remote pageouts scheduled
16925 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2310 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
963 external pager filesystem I/Os blocked with no fsbuf
68.1 percentage of memory used for computational pages


sysopr1@oltpn8c:/home/sysopr1>vmstat -s
2901456848 total address trans. faults
71529368 page ins
37972163 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
1499444920 zero filled pages faults
10061720 executable filled pages faults
129155399 pages examined by clock
5 revolutions of the clock hand
69804251 pages freed by the clock
35287579 backtracks
0 free frame waits
0 extend XPT waits
1787276 pending I/O waits
109500893 start I/Os
6814168 iodones
20209136397 cpu context switches
10241162941 device interrupts
5085027858 software interrupts
3949413325 decrementer interrupts
10361145 mpc-sent interrupts
10361144 mpc-receive interrupts
95687435 phantom interrupts
0 traps
160934074206 syscalls


sysopr1@oltpn8c:/home/sysopr1>lppchk -vm3
sysopr1@oltpn8c:/home/sysopr1>
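
On the free-memory question raised earlier in the thread, the share of free pages can be worked out from the `vmstat -v` output. This is just a sketch with a made-up helper name, `free_page_percent`, and it assumes the `memory pages` and `free pages` lines look like the ones posted above. With the numbers above (2960431 free out of 59703296 pages) it comes to about 5 percent.

```shell
# Hypothetical helper: percentage of free pages from `vmstat -v` text on stdin.
free_page_percent() {
  awk '
    $2 == "memory" && $3 == "pages" { total = $1 }
    $2 == "free"   && $3 == "pages" { free = $1 }
    # print nothing if the total was not found, to avoid dividing by zero
    END { if (total > 0) printf "%.1f\n", 100 * free / total }
  '
}

# On AIX (not run here): vmstat -v | free_page_percent
```

A low figure on its own does not prove a problem, since much of the remaining memory here is file pages (numperm is 66.1 percent).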