AIX server problem - network connection is unstable !


 
# 15  
03-04-2020
This is a pretty large box, an E870 as a matter of fact. Is the network coming from a VIO? If so: is the VIO network set up correctly? Does the VIO have the correct resource assignments to serve however many LPARs you have running?

I ask because I ran into an issue where the VIO had been set up by an MSP, and every so often the active network path would switch from, say, c1-p1-t1 to c10-p1-t1. That caused a momentary ping delay, much like what you are seeing. My network team was the first to report it to me: they saw the EtherChannel device's MAC address move from one switch port to another. The message they sent me was %SW_MATM-4-MACFLAP_NOTIF. It was caused by the EtherChannel on the VIO having two primary adapters and no backup adapter. Moving one of the primary adapters to the backup adapter role fixed the issue. Your results may vary.
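When a MAC flap like this is suspected, it helps to hand the network team the EtherChannel's MAC address in the notation their switch logs use. AIX prints it colon-separated (for example ent32's Hardware Address, 00:90:fa:d9:c8:77, in the entstat output later in this thread), while Cisco switches typically print three dot-separated groups of four hex digits. A small POSIX shell sketch; mac_to_cisco is a hypothetical helper name, not an AIX command:

```shell
# mac_to_cisco MAC: convert an AIX-style MAC address to Cisco dotted notation.
# Hypothetical helper; needs only printf, tr and sed.
mac_to_cisco() {
  # Drop the colons, lowercase the hex digits, then insert a dot
  # after every fourth digit: 00:90:fa:d9:c8:77 -> 0090.fad9.c877
  printf '%s\n' "$1" | tr -d ':' | tr 'A-F' 'a-f' |
    sed 's/^\([0-9a-f]\{4\}\)\([0-9a-f]\{4\}\)\([0-9a-f]\{4\}\)$/\1.\2.\3/'
}

# Example:
#   mac_to_cisco 00:90:fa:d9:c8:77
```

The result can then be searched for in the switch's MACFLAP_NOTIF messages or, on Cisco IOS, with something like show mac address-table address 0090.fad9.c877.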

If they are dedicated adapters and not coming from the VIO, again, check your AIX configuration. One thing to note: your tcp_recvspace and rfc1323 settings are not consistent across your adapters. That might be by design, but it makes me wonder whether the network was set up correctly.
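One way to check that kind of inconsistency is to compare the system-wide values from no -a with the interface-specific (ISNO) values that lsattr -El enX reports for each interface; an interface attribute left empty normally falls back to the system-wide value. A minimal sketch, assuming both outputs have been saved to files; isno_diff and the /tmp file names are hypothetical:

```shell
# isno_diff NO_FILE EN_FILE
#   NO_FILE: saved output of "no -a"           (lines like "rfc1323 = 1")
#   EN_FILE: saved output of "lsattr -El enX"  (lines like "rfc1323 0 Enable/... True")
# Prints each interface-specific value that differs from the system-wide one.
# Hypothetical helper; on AIX, roughly:
#   no -a > /tmp/no.txt; lsattr -El en0 > /tmp/en0.txt
#   isno_diff /tmp/no.txt /tmp/en0.txt
isno_diff() {
  awk '
    # First file: remember every system-wide tunable ("name = value").
    FNR == NR { if ($2 == "=") sys[$1] = $3; next }
    # Second file: an unset interface attribute has an empty value column,
    # so only compare attributes whose second field is a number.
    ($1 in sys) && $2 ~ /^[0-9]+$/ && $2 != sys[$1] {
      print $1 ": interface=" $2 " system=" sys[$1]
    }
  ' "$1" "$2"
}
```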

Now, everything else that everyone has posted here comes into play as well, and I'm not a network admin, so I cannot weigh in on the other topics raised here.

If push comes to shove, open a ticket with the IBM AIX support team.
These 2 Users Gave Thanks to RecoveryOne For This Post:
# 16  
03-09-2020
Quote:
Yes, I understand.

When on a LAN segment you should also realize that every hardware interface to the LAN can have a different characteristic.

Some ethernet cards are more "chatty" and some are less "chatty" and some are "old" and others are "new" and some have drivers / firmware written by "A" and others have drivers / firmware written by "B".

You say "you don't have a lot of knowledge"... that's normal.

So, I am telling you what an experienced network engineer with a lot of knowledge would do.

If I had a busy LAN segment with a lot of devices on the same segment (subnet), as you have indicated, and I had two devices which I wanted to have the best communication speed between them, I would put them on their own LAN segment (subnet) and "be done with it" and maybe "retest" when the two devices are the only two devices (or three if you have a different gateway device) on that segment.
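Before moving the two hosts to their own segment, it is worth confirming which hosts currently share a subnet with them. A small POSIX shell sketch; ip_to_int and same_subnet are hypothetical helpers, and the /24 prefix in the example is an assumption, since the thread never states the netmask:

```shell
# ip_to_int ADDR: print a dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split "10.0.91.18" into 10 0 91 18
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# same_subnet A B PREFIX: succeed when A and B fall in the same subnet
# for the given prefix length (e.g. 24 for 255.255.255.0).
# Hypothetical helper; note it sets the global variables a, b and mask.
same_subnet() {
  a=$(ip_to_int "$1")
  b=$(ip_to_int "$2")
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( a & mask )) -eq $(( b & mask )) ]
}

# Example (the /24 prefix is an assumption):
#   same_subnet 10.0.91.18 10.0.91.119 24 && echo "same subnet"
```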
Quote:
Originally Posted by RecoveryOne
This is a pretty large box, an E870 as a matter of fact. Is the network coming from a VIO? If so: is the VIO network set up correctly? Does the VIO have the correct resource assignments to serve however many LPARs you have running?

We're running with full resources, no VIO.

We've asked the network team to recheck the network. We are also planning to upgrade the OS to AIX 7.1 so that we can get support from IBM.
# 17  
03-09-2020
Please show me the output of the following commands (run entstat for every configured ent adapter, and ping for every IP address of your cluster partners):
Code:
lsdev -Cc adapter
entstat entx
no -a
vmstat -IWwt 2 10
ping -c 10  ipaddress 25000

Are your cluster partners' IP addresses in /etc/hosts?
How are /etc/netsvc.conf and /etc/resolv.conf configured (lookup order)?
How many disks are part of your GPFS cluster?
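Rather than typing entstat once per adapter, the ent names can be pulled from the lsdev -Cc adapter output. A sketch assuming a POSIX shell on the AIX host; ent_names is a hypothetical helper, and the IP addresses in the usage comment are examples taken from later in this thread:

```shell
# ent_names: read "lsdev -Cc adapter" output on stdin and print the name of
# every Ethernet adapter (physical or EtherChannel) in the Available state.
ent_names() {
  awk '$1 ~ /^ent[0-9]+$/ && $2 == "Available" { print $1 }'
}

# Usage on the AIX host:
#   for a in $(lsdev -Cc adapter | ent_names); do entstat "$a"; done
#   for ip in 10.0.91.82 10.0.91.119; do ping -c 10 "$ip" 25000; done
```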
This User Gave Thanks to zxmaus For This Post:
# 18  
03-10-2020
Quote:
sysopr1@oltpn8c:/home/sysopr1>lsdev -Cc adapter
lsdev: 0514-521 Cannot find information in the predefined device
configuration database for the customized device pkcs11.
ent0 Available 01-00 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent1 Available 01-01 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent2 Available 01-02 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent3 Available 01-03 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent4 Available 02-00 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent5 Available 02-01 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent6 Available 02-02 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent7 Available 02-03 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent8 Available 04-00 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent9 Available 04-01 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent10 Available 04-02 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent11 Available 04-03 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent12 Available 05-00 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent13 Available 05-01 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent14 Available 05-02 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent15 Available 05-03 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent16 Available 09-00 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent17 Available 09-01 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent18 Available 09-02 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent19 Available 09-03 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent20 Available 0B-00 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent21 Available 0B-01 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent22 Available 0B-02 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent23 Available 0B-03 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent24 Available 0F-00 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent25 Available 0F-01 PCIe2 10GbE SFP+ CU 4-port Converged Network Adapter (df1020e214103d04)
ent26 Available 0F-02 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent27 Available 0F-03 PCIe2 100/1000 Base-TX 4-port Converged Network Adapter (df1020e214103f04)
ent28 Available 0H-00 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent29 Available 0H-01 PCIe2 4-Port Adapter (10GbE SFP+) (e4148a1614109304)
ent30 Available 0H-02 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent31 Available 0H-03 PCIe2 4-Port Adapter (1GbE RJ45) (e4148a1614109404)
ent32 Available EtherChannel / IEEE 802.3ad Link Aggregation
ent33 Available EtherChannel / IEEE 802.3ad Link Aggregation
ent34 Available EtherChannel / IEEE 802.3ad Link Aggregation
ent35 Available EtherChannel / IEEE 802.3ad Link Aggregation
ent36 Available EtherChannel / IEEE 802.3ad Link Aggregation
ent37 Available EtherChannel / IEEE 802.3ad Link Aggregation
fcs0 Available 00-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs1 Available 00-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs2 Available 01-04 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs3 Available 01-05 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs4 Available 03-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs5 Available 03-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs6 Available 04-04 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs7 Available 04-05 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs8 Available 08-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs9 Available 08-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs10 Available 09-04 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs11 Available 09-05 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs12 Available 0A-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs13 Available 0A-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs14 Available 0D-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs15 Available 0D-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs16 Available 0E-00 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs17 Available 0E-01 PCIe2 2-Port 16Gb FC Adapter (df1000e21410f103)
fcs18 Available 0F-04 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
fcs19 Available 0F-05 PCIe2 10Gb Cu 4-Port FCoE Adapter (df1060e214103e04)
pkcs11 Defined N/A
sissas0 Available 06-00 PCIe3 RAID SAS Adapter Quad-port 6Gb x8
sissas1 Available 0C-00 PCIe3 RAID SAS Adapter Quad-port 6Gb x8
sissas2 Available 0G-00 PCIe3 RAID SAS Adapter Quad-port 6Gb x8
usbhc0 Available 07-00 PCIe2 USB 3.0 xHCI 4-Port Adapter (4c10418214109e04)
vsa0 Available LPAR Virtual Serial Adapter
Quote:
sysopr1@oltpn8c:/home/sysopr1>entstat ent32
-------------------------------------------------------------
ETHERNET STATISTICS (ent32) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:90:fa:d9:c8:77
Elapsed Time: 13 days 22 hours 4 minutes 2 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 3415644304 Packets: 4471422995
Bytes: 1267611213067 Bytes: 1185875560382
Interrupts: 52163060 Interrupts: 2672637348
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 180
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 80

Broadcast Packets: 1797 Broadcast Packets: 3216126
Multicast Packets: 84501 Multicast Packets: 82523
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 80

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
sysopr1@oltpn8c:/home/sysopr1>
sysopr1@oltpn8c:/home/sysopr1>entstat ent33
-------------------------------------------------------------
ETHERNET STATISTICS (ent33) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:90:fa:d9:c8:78
Elapsed Time: 11 days 23 hours 41 minutes 4 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 1043970 Packets: 2381761
Bytes: 459459779 Bytes: 270515040
Interrupts: 16284 Interrupts: 2298206
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 6
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 50

Broadcast Packets: 240 Broadcast Packets: 1344979
Multicast Packets: 78025 Multicast Packets: 80959
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 50

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
sysopr1@oltpn8c:/home/sysopr1>entstat ent34
-------------------------------------------------------------
ETHERNET STATISTICS (ent34) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:90:fa:c0:d3:b5
Elapsed Time: 13 days 22 hours 4 minutes 22 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 244022339 Packets: 320503599
Bytes: 122073968977 Bytes: 167392262368
Interrupts: 3810437 Interrupts: 248212087
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 5
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 195

Broadcast Packets: 1705 Broadcast Packets: 4421
Multicast Packets: 78754 Multicast Packets: 80818
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 195

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
sysopr1@oltpn8c:/home/sysopr1>entstat ent35
-------------------------------------------------------------
ETHERNET STATISTICS (ent35) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 00:90:fa:c0:d3:b6
Elapsed Time: 13 days 22 hours 4 minutes 45 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 23202676 Packets: 55789032
Bytes: 283260647446 Bytes: 3353483603
Interrupts: 362540 Interrupts: 38413169
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 2
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 116

Broadcast Packets: 68 Broadcast Packets: 2019
Multicast Packets: 78756 Multicast Packets: 80818
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 116

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 2000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
sysopr1@oltpn8c:/home/sysopr1>entstat ent36
-------------------------------------------------------------
ETHERNET STATISTICS (ent36) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 34:80:0d:66:02:58
Elapsed Time: 3 days 23 hours 29 minutes 46 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 276712971 Packets: 241026420
Bytes: 107028587332 Bytes: 177467891861
Interrupts: 187553411 Interrupts: 208539011
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 75
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 2359 Broadcast Packets: 6430
Multicast Packets: 11645 Multicast Packets: 11785
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 20000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
sysopr1@oltpn8c:/home/sysopr1>entstat ent37
-------------------------------------------------------------
ETHERNET STATISTICS (ent37) :
Device Type: IEEE 802.3ad Link Aggregation
Hardware Address: 34:80:0d:66:25:31
Elapsed Time: 13 days 22 hours 4 minutes 52 seconds

Transmit Statistics: Receive Statistics:
-------------------- -------------------
Packets: 226279262 Packets: 348178675
Bytes: 7287168578791 Bytes: 108190596025
Interrupts: 180561270 Interrupts: 283659203
Transmit Errors: 0 Receive Errors: 0
Packets Dropped: 0 Packets Dropped: 0
Bad Packets: 0
Max Packets on S/W Transmit Queue: 18
S/W Transmit Queue Overflow: 0
Current S/W+H/W Transmit Queue Length: 0

Broadcast Packets: 312 Broadcast Packets: 233113
Multicast Packets: 78756 Multicast Packets: 82891
No Carrier Sense: 0 CRC Errors: 0
DMA Underrun: 0 DMA Overrun: 0
Lost CTS Errors: 0 Alignment Errors: 0
Max Collision Errors: 0 No Resource Errors: 0
Late Collision Errors: 0 Receive Collision Errors: 0
Deferred: 0 Packet Too Short Errors: 0
SQE Test: 0 Packet Too Long Errors: 0
Timeout Errors: 0 Packets Discarded by Adapter: 0
Single Collision Count: 0 Receiver Start Count: 0
Multiple Collision Count: 0
Current HW Transmit Queue Length: 0

General Statistics:
-------------------
No mbuf Errors: 0
Adapter Reset Count: 0
Adapter Data Rate: 20000
Driver Flags: Up Broadcast Running
Simplex 64BitSupport ChecksumOffload
LargeSend DataRateSet ETHERCHANNEL
Quote:
sysopr1@oltpn8c:/home/sysopr1>no -a
arpqsize = 1024
arpt_killc = 20
arptab_bsiz = 7
arptab_nb = 149
bcastping = 0
bsd_loglevel = 3
clean_partial_conns = 0
delayack = 0
delayackports = {}
dgd_flush_cached_route = 0
dgd_packets_lost = 3
dgd_ping_time = 5
dgd_retry_time = 5
directed_broadcast = 0
fasttimo = 200
hstcp = 0
icmp6_errmsg_rate = 10
icmpaddressmask = 0
ie5_old_multicast_mapping = 0
ifsize = 256
igmpv2_deliver = 0
init_high_wat = 0
ip6_defttl = 64
ip6_prune = 1
ip6forwarding = 0
ip6srcrouteforward = 1
ip_ifdelete_notify = 0
ip_nfrag = 200
ipforwarding = 0
ipfragttl = 2
ipignoreredirects = 0
ipqmaxlen = 512
ipsendredirects = 1
ipsrcrouteforward = 1
ipsrcrouterecv = 0
ipsrcroutesend = 1
limited_ss = 0
llsleep_timeout = 3
lo_perf = 0
lowthresh = 90
main_if6 = 0
main_site6 = 0
maxnip6q = 20
maxttl = 255
medthresh = 95
mpr_policy = 1
multi_homed = 1
nbc_limit = 29851648
nbc_max_cache = 131072
nbc_min_cache = 1
nbc_ofile_hashsz = 12841
nbc_pseg = 0
nbc_pseg_limit = 59703296
ndd_event_name = {all}
ndd_event_tracing = 0
ndogthreads = 0
ndp_mmaxtries = 3
ndp_umaxtries = 3
ndpqsize = 50
ndpt_down = 3
ndpt_keep = 120
ndpt_probe = 5
ndpt_reachable = 30
ndpt_retrans = 1
net_buf_size = {all}
net_buf_type = {all}
net_malloc_frag_mask = {0}
netm_page_promote = 1
nonlocsrcroute = 0
nstrpush = 8
passive_dgd = 0
pmtu_default_age = 10
pmtu_expire = 10
pmtu_rediscover_interval = 30
psebufcalls = 20
psecache = 1
psetimers = 20
rfc1122addrchk = 0
rfc1323 = 1
rfc2414 = 1
route_expire = 1
routerevalidate = 0
rtentry_lock_complex = 0
rto_high = 64
rto_length = 13
rto_limit = 7
rto_low = 1
sack = 0
sb_max = 21053440
send_file_duration = 300
site6_index = 0
sockthresh = 85
sodebug = 0
sodebug_env = 0
somaxconn = 1024
strctlsz = 1024
strmsgsz = 0
strthresh = 85
strturncnt = 15
subnetsarelocal = 1
tcp_bad_port_limit = 0
tcp_cwnd_modified = 0
tcp_ecn = 0
tcp_ephemeral_high = 65500
tcp_ephemeral_low = 9000
tcp_fastlo = 0
tcp_fastlo_crosswpar = 0
tcp_finwait2 = 1200
tcp_icmpsecure = 0
tcp_init_window = 0
tcp_inpcb_hashtab_siz = 24499
tcp_keepcnt = 8
tcp_keepidle = 14400
tcp_keepinit = 150
tcp_keepintvl = 150
tcp_limited_transmit = 1
tcp_low_rto = 0
tcp_maxburst = 0
tcp_mssdflt = 1460
tcp_nagle_limit = 65535
tcp_nagleoverride = 0
tcp_ndebug = 100
tcp_newreno = 1
tcp_nodelayack = 0
tcp_pmtu_discover = 1
tcp_recvspace = 262144
tcp_sendspace = 262144
tcp_tcpsecure = 0
tcp_timewait = 1
tcp_ttl = 60
tcprexmtthresh = 3
tcptr_enable = 0
thewall = 119406592
timer_wheel_tick = 0
tn_filter = 1
udp_bad_port_limit = 0
udp_ephemeral_high = 65500
udp_ephemeral_low = 9000
udp_inpcb_hashtab_siz = 24499
udp_pmtu_discover = 1
udp_recv_perf = 0
udp_recvspace = 10526720
udp_sendspace = 1052672
udp_ttl = 30
udpcksum = 1
use_sndbufpool = 1
Quote:
sysopr1@oltpn8c:/home/sysopr1>vmstat -IWwt 2 10

System configuration: lcpu=80 mem=233216MB

kthr    memory          page               faults            cpu         time
------- --------------- ------------------ ----------------- ----------- --------
r b p w     avm     fre fi  fo pi po fr sr   in     sy    cs us sy id wa hr mi se
3 0 0 0 5842255 1482071  0   0  0  0  0  0 2567  93780  4660  1  7 93  0 10:10:46
1 0 0 0 5841994 1482346  0 102  0  0  0  0 4706 104535 11609  1  5 94  0 10:10:48
1 0 0 0 5841994 1482345  0   0  0  0  0  0 3011  96238  5377  1  7 93  0 10:10:50
3 0 0 0 5841994 1482343  0   0  0  0  0  0 5735 102653 11254  1  6 93  0 10:10:52
2 0 0 0 5841994 1482343  0   1  0  0  0  0 4820 101308  9536  1  5 94  0 10:10:54
2 0 0 0 5841994 1482341  0   0  0  0  0  0 5525 104230 10647  1  8 91  0 10:10:56
1 0 0 0 5841994 1482341  0   1  0  0  0  0 6889 104100 13478  1  5 94  0 10:10:58
3 0 0 0 5842133 1482190  0   6  0  0  0  0 3103  94743  6000  1  7 92  0 10:11:00
2 0 0 0 5842135 1482162  0  14  0  0  0  0 2069  95776  3544  1  6 93  0 10:11:02
2 0 0 0 5842135 1482162  0   0  0  0  0  0 2200  95534  4312  1  6 94  0 10:11:04
Ping to another server
Quote:
sysopr1@oltpn8c:/home/sysopr1>ping -c 10 10.0.91.82 25000
PING 10.0.91.82: (10.0.91.82): 25000 data bytes
25008 bytes from 10.0.91.82: icmp_seq=0 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=1 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=2 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=3 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=4 ttl=255 time=80 ms
25008 bytes from 10.0.91.82: icmp_seq=5 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=6 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=7 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=8 ttl=255 time=0 ms
25008 bytes from 10.0.91.82: icmp_seq=9 ttl=255 time=0 ms

--- 10.0.91.82 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0/8/80 ms
Ping from 10.0.91.82
Quote:
<icapp2:/home/sysopr1>ping -c 10 10.0.91.18 25000
PING 10.0.91.18 (10.0.91.18): 25000 data bytes
25008 bytes from 10.0.91.18: icmp_seq=0 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=1 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=2 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=3 ttl=255 time=57 ms
25008 bytes from 10.0.91.18: icmp_seq=4 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=5 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=6 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=7 ttl=255 time=0 ms
25008 bytes from 10.0.91.18: icmp_seq=8 ttl=255 time=1 ms
25008 bytes from 10.0.91.18: icmp_seq=9 ttl=255 time=45 ms

--- 10.0.91.18 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0/10/57 ms
Ping to one node
Quote:
sysopr1@oltpn8c:/home/sysopr1>ping -c 10 10.0.91.119 25000
PING 10.0.91.119: (10.0.91.119): 25000 data bytes
25008 bytes from 10.0.91.119: icmp_seq=0 ttl=255 time=20 ms
25008 bytes from 10.0.91.119: icmp_seq=1 ttl=255 time=18 ms
25008 bytes from 10.0.91.119: icmp_seq=2 ttl=255 time=18 ms
25008 bytes from 10.0.91.119: icmp_seq=3 ttl=255 time=0 ms
25008 bytes from 10.0.91.119: icmp_seq=4 ttl=255 time=22 ms
25008 bytes from 10.0.91.119: icmp_seq=5 ttl=255 time=50 ms
25008 bytes from 10.0.91.119: icmp_seq=6 ttl=255 time=52 ms
25008 bytes from 10.0.91.119: icmp_seq=7 ttl=255 time=0 ms
25008 bytes from 10.0.91.119: icmp_seq=8 ttl=255 time=39 ms
25008 bytes from 10.0.91.119: icmp_seq=9 ttl=255 time=34 ms

--- 10.0.91.119 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 0/25/52 ms
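The min/avg/max summary hides how often the slow replies occur. A sketch that also counts replies slower than a threshold (10 ms, an arbitrary choice) in saved ping output; ping_spikes is a hypothetical helper. Fed the 10.0.91.119 run above, it reports 8 of 10 replies over 10 ms, against 1 of 10 for the 10.0.91.82 run. Longer runs to each cluster partner give a better picture than 10 samples.

```shell
# ping_spikes [THRESHOLD_MS]: read ping output on stdin and report the reply
# count, min/avg/max round-trip time, and how many replies exceeded the
# threshold (default 10 ms). AIX ping prints whole milliseconds.
# Hypothetical helper; on AIX, e.g.:
#   ping -c 100 10.0.91.119 25000 | ping_spikes 10
ping_spikes() {
  threshold=${1:-10}
  awk -v t="$threshold" '
    /icmp_seq=/ {
      # Pick the "time=NN" field out of each reply line.
      for (i = 1; i <= NF; i++)
        if ($i ~ /^time=/) ms = substr($i, 6) + 0
      n++; sum += ms
      if (n == 1 || ms < min) min = ms
      if (ms > max) max = ms
      if (ms > t + 0) spikes++
    }
    END {
      if (n == 0) { print "no replies"; exit }
      printf "replies=%d min=%d avg=%.1f max=%d over_%sms=%d\n", n, min, sum / n, max, t, spikes + 0
    }'
}
```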
Quote:
Are your cluster partner's IP addresses in /etc/hosts
how are /etc/netsvc.conf and /etc/resolv.conf configured (order of things)
how many disks are part of your gpfs cluster
- Yes
- There's nothing in /etc/netsvc.conf
Quote:
sysopr1@oltpn8c:/home/sysopr1>vi /etc/resolv.conf
"/etc/resolv.conf" [Read only] 2 lines, 50 characters
nameserver 10.0.58.11
domain xxx.xxx.xxx
- 154 disks per site


# 19  
03-11-2020
OK, I don't see any errors, overflows, or underruns on any of your adapters. That is a good thing. You can also send large packets everywhere else without losing any, which is a good thing too.
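That check can be scripted across all adapters. A rough awk sketch that scans saved entstat output, where two "Label: value" pairs can share a line, and prints any non-zero error, overflow, underrun, overrun, dropped or discarded counter; entstat_problems is a hypothetical helper, and the label matching is an assumption based on the entstat output in this thread:

```shell
# entstat_problems: read entstat output on stdin and print every non-zero
# error/overflow/underrun/overrun/dropped/discarded counter, per device.
# Usage on AIX, e.g.:
#   for a in ent32 ent33; do entstat "$a"; done | entstat_problems
entstat_problems() {
  awk '
    /^ETHERNET STATISTICS/ { dev = $3; gsub(/[()]/, "", dev) }
    {
      label = ""
      for (i = 1; i <= NF; i++) {
        if ($i ~ /:$/) {
          # End of a label: the next field is its value.
          label = label (label == "" ? "" : " ") substr($i, 1, length($i) - 1)
          val = $(i + 1)
          if (label ~ /Error|Overflow|Underrun|Overrun|Dropped|Discarded/ &&
              val ~ /^[0-9]+$/ && val + 0 > 0)
            print dev ": " label " = " val
          label = ""
          i++            # skip the value field
        } else {
          label = label (label == "" ? "" : " ") $i
        }
      }
    }'
}
```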
It would have been nice to know which adapters make up which link aggregation, but I forgot to ask for it.
There are a few network tunables I would probably change to improve the general network flow if these were my systems (e.g. sack, tcp_nodelayack, and the buffer sizes), but for that it would help to know exactly which adapters make up which link aggregation. If you could show me the lsattr -El entx output for your link aggregations and their underlying physical adapters, I could make sure you are not mixing different speeds and that the device attributes are set correctly for those adapters. It might also help to set certain options, like rfc1323 and the buffer sizes, on the interfaces themselves in addition to the system-wide settings. I can also see that you have relatively little free memory. This may or may not be a problem; if you can give me the output of vmstat -v and vmstat -s, I can tell you which. You may also want to populate /etc/netsvc.conf: it sets the order of name resolution and may generally improve network performance, because the system does not have to guess how to find another host.
To get rid of the pkcs11 error at the top of the lsdev -Cc adapter output, you probably need to install the security.pkcs11 fileset. And finally, can you please run lppchk -vm3 and post its output as well?
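With /etc/netsvc.conf empty, the resolver uses its built-in default order. A sketch that reports the hosts lookup order configured in a netsvc.conf-style file, so it can be checked before and after adding a line such as hosts = local, bind (which tells AIX to consult /etc/hosts before DNS); hosts_order is a hypothetical helper:

```shell
# hosts_order [FILE]: print the "hosts" lookup order configured in an AIX
# netsvc.conf-style file (default /etc/netsvc.conf), or "unset" if none.
# Hypothetical helper; whitespace is removed from the value.
hosts_order() {
  awk -F= '
    /^[ \t]*#/ { next }                           # skip comment lines
    { key = $1; gsub(/[ \t]/, "", key) }
    key == "hosts" { v = $2; gsub(/[ \t]/, "", v); print v; found = 1; exit }
    END { if (!found) print "unset" }
  ' "${1:-/etc/netsvc.conf}"
}
```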
This User Gave Thanks to zxmaus For This Post:
# 20  
03-11-2020
Thanks for your help. Here is what you asked for. I hope you can find something.

Quote:
sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent32
adapter_names ent2,ent18 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame no Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent33
adapter_names ent3,ent19 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame no Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent34
adapter_names ent10,ent26 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame no Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent35
adapter_names ent11,ent27 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame no Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent36
adapter_names ent12,ent28 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame yes Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>lsattr -El ent37
adapter_names ent13,ent29 EtherChannel Adapters True
alt_addr 0x000000000000 Alternate EtherChannel Address True
auto_recovery yes Enable automatic recovery after failover True
backup_adapter NONE Adapter used when whole channel fails True
hash_mode default Determines how outgoing adapter is chosen True
interval long Determines interval value for IEEE 802.3ad mode True
mode 8023ad EtherChannel mode of operation True
netaddr 0 Address to ping True
noloss_failover yes Enable lossless failover after ping failure True
num_retries 3 Times to retry ping before failing True
retry_time 1 Wait time (in seconds) between pings True
use_alt_addr no Enable Alternate EtherChannel Address True
use_jumbo_frame yes Enable Gigabit Ethernet Jumbo Frames True


sysopr1@oltpn8c:/home/sysopr1>vmstat -v
59703296 memory pages
24344784 lruable pages
2960431 free pages
12 memory pools
5104035 pinned pages
90.0 maxpin percentage
3.0 minperm percentage
90.0 maxperm percentage
66.1 numperm percentage
16101397 file pages
0.0 compressed percentage
0 compressed pages
66.1 numclient percentage
90.0 maxclient percentage
16101397 client pages
0 remote pageouts scheduled
16925 pending disk I/Os blocked with no pbuf
0 paging space I/Os blocked with no psbuf
2310 filesystem I/Os blocked with no fsbuf
0 client filesystem I/Os blocked with no fsbuf
963 external pager filesystem I/Os blocked with no fsbuf
68.1 percentage of memory used for computational pages


sysopr1@oltpn8c:/home/sysopr1>vmstat -s
2901456848 total address trans. faults
71529368 page ins
37972163 page outs
0 paging space page ins
0 paging space page outs
0 total reclaims
1499444920 zero filled pages faults
10061720 executable filled pages faults
129155399 pages examined by clock
5 revolutions of the clock hand
69804251 pages freed by the clock
35287579 backtracks
0 free frame waits
0 extend XPT waits
1787276 pending I/O waits
109500893 start I/Os
6814168 iodones
20209136397 cpu context switches
10241162941 device interrupts
5085027858 software interrupts
3949413325 decrementer interrupts
10361145 mpc-sent interrupts
10361144 mpc-receive interrupts
95687435 phantom interrupts
0 traps
160934074206 syscalls


sysopr1@oltpn8c:/home/sysopr1>lppchk -vm3
sysopr1@oltpn8c:/home/sysopr1>
# 21  
Old 03-11-2020
OK, so you have some link aggregations built on 100/1000 adapters and some on 10G adapters. The 10G ones will almost always be faster. Considering how many 10G ports you have on your box, you may want to consider moving all of them to 10G. That would make tuning your network much easier.
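To compare the EtherChannels posted above at a glance, a small helper can boil each lsattr -El listing down to one line. In that output, ent34/ent35 have use_jumbo_frame no while ent36/ent37 have yes, and all four have backup_adapter NONE. This is only a sketch: the function name summarize_ec is made up for illustration, and it assumes the "attribute value description settable" column layout shown in this thread.

```shell
#!/bin/sh
# summarize_ec NAME: read `lsattr -El <EtherChannel>` output on stdin and print
# the attributes that matter most for comparing channels on a single line.
# Hypothetical helper; attribute names are taken from the output in this thread.
summarize_ec() {
    dev=$1
    awk -v dev="$dev" '
        $1 == "adapter_names"   { names = $2 }
        $1 == "mode"            { mode = $2 }
        $1 == "hash_mode"       { hash = $2 }
        $1 == "backup_adapter"  { backup = $2 }
        $1 == "use_jumbo_frame" { jumbo = $2 }
        END {
            printf "%s adapters=%s mode=%s hash=%s backup=%s jumbo=%s\n",
                   dev, names, mode, hash, backup, jumbo
        }'
}
```

On the LPAR you would feed it one device at a time, e.g. lsattr -El ent34 | summarize_ec ent34, then repeat for ent35 to ent37 and compare the lines.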

If and when you do this: after long and exhausting RAC testing on our systems (similar hardware), we found the network tunables below to be the most beneficial for us. But be careful: do not change them all at once. Change them one by one and check whether your performance gets better, worse, or stays the same.
You can change them with no -p -o name=value. Note there is NO space before or after the equal sign in that command.

Code:
hstcp = 1
lo_perf = 1
rtentry_lock_complex = 1
sack = 1
sb_max = 33554432
tcp_ephemeral_high = 65535
tcp_ephemeral_low = 32768
tcp_init_window = 16
tcp_nodelayack = 1
tcp_recvspace = 8388608
tcp_sendspace = 8388608
udp_ephemeral_high = 65535
udp_ephemeral_low = 32768
udp_recvspace = 655360
udp_sendspace = 65536
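The list above uses the "name = value" style that no -a prints, while no -o needs name=value with no blanks. A sketch like the one below turns the list into one command per tunable. It deliberately only prints the commands, so you can still apply and test them one at a time as suggested. The function name print_no_cmds is hypothetical. The -p flag is what makes no apply a change now and keep it across reboots.

```shell
#!/bin/sh
# print_no_cmds: read "name = value" lines on stdin and print one
# "no -p -o name=value" command per tunable. Nothing is executed here,
# so the changes can still be applied and tested one at a time by hand.
# Hypothetical helper, not part of AIX.
print_no_cmds() {
    awk -F'=' '
        NF == 2 {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)    # strip blanks: no -o wants name=value
            gsub(/[ \t]/, "", value)
            if (name != "" && value != "")
                printf "no -p -o %s=%s\n", name, value
        }'
}
```

For example, paste the list into a file (tunables.txt is just an example name), run print_no_cmds < tunables.txt, and copy out only the line for the tunable you want to try next.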

In addition, you probably want to put this into your /etc/netsvc.conf, so that host names are looked up in /etc/hosts first and then through DNS using IPv4 only:
hosts=local,bind4
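If you manage several LPARs, the netsvc.conf change can be scripted as well. The sketch below (set_hosts_order is a made-up name) drops any active hosts= line, keeps comments and all other entries, and appends hosts=local,bind4. It rewrites the file through a temporary copy, so try it on a copy of /etc/netsvc.conf before pointing it at the real file.

```shell
#!/bin/sh
# set_hosts_order FILE
# Hypothetical helper: leave exactly one active "hosts=local,bind4" line in FILE.
# Uncommented hosts= lines are removed; comments and other lines are kept.
set_hosts_order() {
    file=$1
    tmp="$file.tmp.$$"
    [ -f "$file" ] || : > "$file"        # create FILE if it does not exist yet
    # grep -v exits 1 when nothing is left to print; that is not an error here
    grep -v '^[[:space:]]*hosts[[:space:]]*=' "$file" > "$tmp"
    echo 'hosts=local,bind4' >> "$tmp"
    # copy back with cat so FILE keeps its original owner and permissions
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Running it a second time leaves the file unchanged, because the line it added is removed and re-added in the same place.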
This User Gave Thanks to zxmaus For This Post: