AIX 6.1 reaches the STREAMS threshold (no -a | grep strthresh)

# 1  
02-16-2013

Last night I wanted to take a full Oracle backup with expdp, but when I switched to the oracle user, the session hung:
su - oracle
There was no output at all and the command just hung, while su - root worked fine.
Then I ran su - oracle under truss and found it was stuck on an ENOSR error. I changed the kernel parameter strthresh from 85 to 90, and su - oracle worked again.
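The truss check above can be narrowed down to the failing calls. A minimal sketch, assuming the trace was saved with truss -f -o (follow child processes, write to a file); report_enosr and the trace path are hypothetical names used only for illustration:

```shell
#!/bin/sh
# report_enosr TRACE_FILE
# Prints the trace lines whose system call failed with ENOSR (no STREAMS
# resources available) and returns 1 if any were found, 0 otherwise.
# Example on the AIX host (the trace path is only an illustration):
#   truss -f -o /tmp/su_oracle.truss su - oracle
#   report_enosr /tmp/su_oracle.truss
report_enosr() {
    trace=$1
    # grep -c prints 0 but exits 1 when nothing matches; keep going anyway
    count=$(grep -c 'ENOSR' "$trace" || true)
    if [ "$count" -gt 0 ]; then
        echo "ENOSR seen $count time(s):"
        grep 'ENOSR' "$trace"
        return 1
    fi
    echo "no ENOSR in $trace"
    return 0
}
```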


The no -a output:
Code:
  no -a

                 arpqsize = 1024
               arpt_killc = 20
              arptab_bsiz = 7
                arptab_nb = 149
                bcastping = 0
      clean_partial_conns = 0
                 delayack = 0
            delayackports = {}
   dgd_flush_cached_route = 0
         dgd_packets_lost = 3
            dgd_ping_time = 5
           dgd_retry_time = 5
       directed_broadcast = 0
                 fasttimo = 200
                    hstcp = 0
        icmp6_errmsg_rate = 10
          icmpaddressmask = 0
ie5_old_multicast_mapping = 0
                   ifsize = 256
           igmpv2_deliver = 0
               ip6_defttl = 64
                ip6_prune = 1
            ip6forwarding = 0
       ip6srcrouteforward = 1
       ip_ifdelete_notify = 0
                 ip_nfrag = 200
             ipforwarding = 0
                ipfragttl = 2
        ipignoreredirects = 0
                ipqmaxlen = 512
          ipsendredirects = 1
        ipsrcrouteforward = 1
           ipsrcrouterecv = 0
           ipsrcroutesend = 1
               limited_ss = 0
          llsleep_timeout = 3
                  lo_perf = 1
                lowthresh = 90
                 main_if6 = 0
               main_site6 = 0
                 maxnip6q = 20
                   maxttl = 255
                medthresh = 95
               mpr_policy = 1
              multi_homed = 1
                nbc_limit = 11829248
            nbc_max_cache = 131072
            nbc_min_cache = 1
         nbc_ofile_hashsz = 12841
                 nbc_pseg = 0
           nbc_pseg_limit = 23658496
           ndd_event_name = {all}
        ndd_event_tracing = 0
              ndogthreads = 0
            ndp_mmaxtries = 3
            ndp_umaxtries = 3
                 ndpqsize = 50
                ndpt_down = 3
                ndpt_keep = 120
               ndpt_probe = 5
           ndpt_reachable = 30
             ndpt_retrans = 1
             net_buf_size = {all}
             net_buf_type = {all}
     net_malloc_frag_mask = {0}
        netm_page_promote = 1
           nonlocsrcroute = 0
                 nstrpush = 8
              passive_dgd = 0
         pmtu_default_age = 10
              pmtu_expire = 10
 pmtu_rediscover_interval = 30
              poolbuckets = 4
              psebufcalls = 20
                 psecache = 1
                psetimers = 20
           rfc1122addrchk = 0
                  rfc1323 = 1
                  rfc2414 = 1
             route_expire = 1
          routerevalidate = 0
     rtentry_lock_complex = 0
                 rto_high = 64
               rto_length = 13
                rto_limit = 7
                  rto_low = 1
                     sack = 0
                   sb_max = 4194304
       send_file_duration = 300
              site6_index = 0
               sockthresh = 85
                  sodebug = 0
              sodebug_env = 0
                somaxconn = 1024
                 strctlsz = 1024
                 strmsgsz = 0
                strthresh = 85
               strturncnt = 15
          subnetsarelocal = 1
       tcp_bad_port_limit = 0
        tcp_cwnd_modified = 0
                  tcp_ecn = 0
       tcp_ephemeral_high = 65500
        tcp_ephemeral_low = 9000
               tcp_fastlo = 0
     tcp_fastlo_crosswpar = 0
             tcp_finwait2 = 1200
           tcp_icmpsecure = 0
          tcp_init_window = 0
    tcp_inpcb_hashtab_siz = 24499
              tcp_keepcnt = 8
             tcp_keepidle = 14400
             tcp_keepinit = 150
            tcp_keepintvl = 150
     tcp_limited_transmit = 1
              tcp_low_rto = 0
             tcp_maxburst = 0
              tcp_mssdflt = 1460
          tcp_nagle_limit = 65535
        tcp_nagleoverride = 0
               tcp_ndebug = 100
              tcp_newreno = 1
           tcp_nodelayack = 0
        tcp_pmtu_discover = 1
            tcp_rand_port = 0
       tcp_rand_timestamp = 0
            tcp_recvspace = 65536
            tcp_sendspace = 65536
            tcp_tcpsecure = 0
             tcp_timewait = 1
                  tcp_ttl = 60
           tcprexmtthresh = 3
             tcptr_enable = 0
                  thewall = 47448064
         timer_wheel_tick = 0
                tn_filter = 1
       udp_bad_port_limit = 0
       udp_ephemeral_high = 65500
        udp_ephemeral_low = 9000
    udp_inpcb_hashtab_siz = 24499
        udp_pmtu_discover = 1
            udp_recvspace = 655360
            udp_sendspace = 65536
                  udp_ttl = 30
                 udpcksum = 1
           use_sndbufpool = 1
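To keep an eye on the few tunables involved here, the no -a listing can be filtered down. A sketch; show_stream_tunables is a hypothetical helper, and the choice of tunables (thewall, strthresh, sockthresh, lowthresh, medthresh) is an assumption about what is relevant. For a single value, no -o strthresh also works.

```shell
#!/bin/sh
# show_stream_tunables: reads "no -a" output on stdin and prints only the
# thresholds that matter for this problem, as name=value pairs.
# Example on the AIX host:  no -a | show_stream_tunables
show_stream_tunables() {
    awk -F'=' '
        {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)    # strip the alignment padding
            gsub(/[ \t]/, "", value)
            if (name == "thewall" || name == "strthresh" || name == "sockthresh" ||
                name == "lowthresh" || name == "medthresh")
                print name "=" value
        }'
}
```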

I also noticed that the "delayed" column in the netstat -m output is not 0, for example:
Code:
******* CPU 24 *******
By size           inuse     calls failed   delayed    free   hiwat   freed
64                  119    239402      0        12     713   14824       0
128                  89    238427      0         0     679    7412       0
256                   5      1279      0         0      11   14824       0
512                  40     20422      0         1     104   18530       0
1024                107    238984      0       190     665    7412       0
2048                 67    116067      0       209     357   11118       0
4096                  1         5      0         1       4    3706       0
8192                  1         6      0         1       0     926       0
16384                 1         5      0         1       0     463       0
32768                 0         2      0         2       1     231       0
65536                 0         5      0         2       2     231       0
131072                0         0      0         0      16      32       0

Streams mblk statistic failures:
0 high priority mblk failures
0 medium priority mblk failures
0 low priority mblk failures



My question is:
1. How can I check STREAMS usage on AIX 6.1?

I now suspect a network issue, but I don't know how to confirm it.

Thanks!

tony
2013/2/17

---------- Post updated at 10:08 PM ---------- Previous update was at 10:03 PM ----------

What strthresh means, according to IBM AIX Support:
AIX has another no option called "strthresh" which is defined as "Specifies the maximum number of bytes Streams are normally allowed to allocate. When the threshold is passed, does not allow users without the appropriate privilege to open Streams, push modules, or write to Streams devices, and returns ENOSR. The threshold applies only to output side and does not affect data coming into the system (e.g. console continues to work properly). A value of zero means that there is no threshold. The strthresh attribute represents a percentage of the thewall attribute and you can set its value from 0 to 100. The thewall attribute indicates the maximum number of bytes that can be allocated by Streams and Sockets using the net_malloc() call. When you change thewall attribute, the threshold gets updated accordingly."
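Since strthresh is a percentage of thewall, the ceiling it sets can be worked out directly. A sketch, assuming (as a later reply in this thread notes) that no reports thewall in KB; strthresh_limit_kb is a hypothetical helper name:

```shell
#!/bin/sh
# strthresh_limit_kb THEWALL_KB PERCENT
# Prints the STREAMS allocation ceiling for non-privileged users in KB:
# thewall (KB) * strthresh (%) / 100, using integer arithmetic.
# A PERCENT of 0 means "no threshold".
strthresh_limit_kb() {
    thewall_kb=$1
    pct=$2
    if [ "$pct" -eq 0 ]; then
        echo "no limit"
        return 0
    fi
    echo $(( thewall_kb * pct / 100 ))
}
```

With thewall = 47448064 from the no -a output, strthresh_limit_kb 47448064 85 gives 40330854 KB (roughly 38.5 GiB); raising strthresh to 90 moves the ceiling to 42703257 KB.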


Moderator's Comments:
Use code tags, thanks.

Last edited by zaxxon; 02-18-2013 at 07:31 AM.. Reason: code tags, see PM
# 2  
02-17-2013
sb_max, at 4 MB, looks large enough, but I would increase tcp_sendspace and tcp_recvspace to 256 KB or 512 KB rather than 64 KB. Note that an application can override the defaults, so your actual sizes may already be larger.
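A dry-run sketch of that suggestion: it only prints the no commands (the -p flag applies the change now and records it for the next boot) after checking the proposed size against sb_max, since a socket buffer cannot exceed sb_max. plan_tcp_space is a hypothetical name; sizes are in bytes, so 262144 and 524288 correspond to the 256 KB and 512 KB mentioned above.

```shell
#!/bin/sh
# plan_tcp_space SIZE_BYTES SB_MAX_BYTES
# Dry run only: validates the size and prints the "no" commands that would
# raise the default TCP send/receive buffers. Nothing is changed.
plan_tcp_space() {
    size=$1
    sb_max=$2
    if [ "$size" -gt "$sb_max" ]; then
        echo "error: $size exceeds sb_max ($sb_max)" >&2
        return 1
    fi
    echo "no -p -o tcp_sendspace=$size"
    echo "no -p -o tcp_recvspace=$size"
}
```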

How much real memory does the system have?
# 3  
02-17-2013
Quote:
Originally Posted by MichaelFelt
sb_max, at 4 MB, looks large enough, but I would increase tcp_sendspace and tcp_recvspace to 256 KB or 512 KB rather than 64 KB. Note that an application can override the defaults, so your actual sizes may already be larger.

How much real memory does the system have?

Thanks for your reply. The physical memory size is 96 GB.
The application running on this machine is Oracle 11gR2 RAC. I set tcp_sendspace as recommended in the Oracle manual, have done the same on other machines many times, and never faced this problem. How can I monitor STREAMS usage in AIX?

Thanks.
# 4  
02-18-2013
A rather simple way to monitor socket activity (aka streams), especially for blockage, is to look at the netstat -tn output.

Code:
michael@x054:[/home/michael]netstat -tn | head -2; netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0     48  192.168.129.54.22      192.168.129.20.1348    ESTABLISHED

What you are looking for is non-zero numbers in the Send-Q and/or Recv-Q columns. If they are consistently at the tcp_sendspace/tcp_recvspace size, you may be suffering from network congestion outside the box: TCP is doing what it can, then stopping and waiting for acknowledgements (Send-Q at max), while the "outside" is waiting for the server to wake up and respond when the Recv-Q is "stuck" at max.
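That check can be scripted as a filter over netstat -tn output. A sketch; busy_queues is a hypothetical helper, and the optional limit argument is assumed to be the tcp_sendspace/tcp_recvspace value in bytes (65536 in the no -a output earlier in the thread).

```shell
#!/bin/sh
# busy_queues [LIMIT]
# Reads "netstat -tn" output on stdin and prints only TCP connections whose
# Recv-Q or Send-Q is non-zero. With LIMIT > 0, lines where either queue has
# reached LIMIT bytes are marked "<-- at limit".
# Example on the AIX host:  netstat -tn | busy_queues 65536
busy_queues() {
    awk -v limit="${1:-0}" '
        BEGIN { lim = limit + 0 }
        $1 ~ /^tcp/ && ($2 + 0 > 0 || $3 + 0 > 0) {
            flag = ""
            if (lim > 0 && ($2 + 0 >= lim || $3 + 0 >= lim)) flag = "  <-- at limit"
            print $0 flag
        }'
}
```

Running it a few times in a row during the backup shows whether the same connections stay pinned at the limit, which is the "consistently" part of the advice above.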

I have looked at the netstat -nm output again. It is normal to see some "delayed" counts; I am not sure why, it probably has something to do with setting up the stack. What you want to watch for is "failed", as that mainly indicates not enough memory for communications.
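To watch the failed column across all CPUs without reading each table, a small awk filter can total it. A sketch assuming the eight-column per-size layout shown in the netstat -m output earlier; mbuf_failures is a hypothetical name.

```shell
#!/bin/sh
# mbuf_failures: reads "netstat -m" output on stdin and totals the "failed"
# and "delayed" columns of every per-size table (all CPUs). Only rows that
# start with a numeric size and have 8 fields are counted, so the
# "Streams mblk statistic failures" lines are skipped.
# Example on the AIX host:  netstat -m | mbuf_failures
mbuf_failures() {
    awk '$1 ~ /^[0-9]+$/ && NF == 8 { failed += $4; delayed += $5 }
         END { printf "failed=%d delayed=%d\n", failed, delayed }'
}
```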

Question, since this sometimes occurs: are you using large sends (e.g., an MTU of 9000) while the network and/or endpoints cannot support them?

What does netstat -p tcp report?
# 5  
02-18-2013
1. The failed column in the netstat -m output is consistently zero.
2. MTU:
Code:
for i in 1 2
do
    lsattr -El en$i | grep mtu
done

Both en1 and en2 have an MTU of 1500.
The application running on this machine is Oracle 11gR2 RAC, and the client is the Tuxedo middleware.
3. First netstat -tn output:
Code:
# netstat -tn | head -2;netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52799     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52905     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.53378     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.55738     ESTABLISHED
tcp4       0      0  127.0.0.1.6100         127.0.0.1.65429        ESTABLISHED
tcp4       0      0  127.0.0.1.65429        127.0.0.1.6100         ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.56564     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32805     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32806     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32807     ESTABLISHED

4. Second netstat -tn output:
Code:
 # netstat -tn | head -2; netstat -tn | grep ESTABLISHED | head
Active Internet connections
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52799     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.52905     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.53378     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.55738     ESTABLISHED
tcp4       0      0  127.0.0.1.6100         127.0.0.1.65429        ESTABLISHED
tcp4       0      0  127.0.0.1.65429        127.0.0.1.6100         ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.56564     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32805     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32806     ESTABLISHED
tcp4       0      0  172.18.100.9.1521      172.18.100.5.32807     ESTABLISHED

5. netstat -p tcp output:
Code:
tcp:
       447631444 packets sent
               425423593 data packets (2452701253 bytes)
               2796 data packets (2109430 bytes) retransmitted
               2332528 ack-only packets (1814129 delayed)
               0 URG only packets
               3 window probe packets
               19339756 window update packets
               1065558 control packets
               20593693 large sends
               2004898651 bytes sent using largesend
               4170240 bytes is the biggest largesend
       422219647 packets received
               341923413 acks (for 2452662545 bytes)
               216669 duplicate acks
               0 acks for unsent data
               315335086 packets (1007197678 bytes) received in-sequence
               7624 completely duplicate packets (60958 bytes)
               488 old duplicate packets
               0 packets with some dup. data (0 bytes duped)
               138640 out-of-order packets (42518 bytes)
               0 packets (0 bytes) of data after window
               0 window probes
               524552 window update packets
               2407 packets received after close
               0 packets with bad hardware assisted checksum
               0 discarded for bad checksums
               0 discarded for bad header offset fields
               0 discarded because packet too short
               1489 discarded by listeners
               0 discarded due to listener's queue full
               90949323 ack packet headers correctly predicted
               79016426 data packet headers correctly predicted
       267916 connection requests
       110757 connection accepts
       258066 connections established (including accepts)
       391662 connections closed (including 613 drops)
       0 connections with ECN capability
       0 times responded to ECN
       118143 embryonic connections dropped
       320167932 segments updated rtt (of 265052429 attempts)
       0 segments with congestion window reduced bit set
       0 segments with congestion experienced bit set
       0 resends due to path MTU discovery
       2563 path MTU discovery terminations due to retransmits
       9875 retransmit timeouts
               0 connections dropped by rexmit timeout
       55 fast retransmits
               4 when congestion window less than 4 segments
       48 newreno retransmits
       0 times avoided false fast retransmits
       2 persist timeouts
               0 connections dropped due to persist timeout
       7550 keepalive timeouts
               0 keepalive probes sent
               1 connection dropped by keepalive
       0 times SACK blocks array is extended
       0 times SACK holes array is extended
       0 packets dropped due to memory allocation failure
       0 connections in timewait reused
       0 delayed ACKs for SYN
       0 delayed ACKs for FIN
       0 send_and_disconnects
       0 spliced connections
       0 spliced connections closed
       0 spliced connections reset
       0 spliced connections timeout
       0 spliced connections persist timeout
       0 spliced connections keepalive timeout
       7 TCP checksum offload disabled during retransmit
       27 Connections dropped due to bad ACKs
       0 Connections dropped due to duplicate SYN packets
       0 fastpath loopback connections
       0 fastpath loopback sent packets (0 bytes)
       0 fastpath loopback received packets (0 bytes)

6. This problem has occurred twice. The first time I changed strthresh from 85 to 92, the second time from 90 to 92. A few days ago an IBM engineer told me I could set strthresh to 0 so that STREAMS has no limit. I haven't done that, because I am worried that with 0, if STREAMS usage reaches 100%, the whole system will hang until I reboot it from the HMC or similar.
7. The system was rebooted a week ago, and the su hang was seen before that, so the netstat statistics from that time were lost.
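The counters in item 5 can be condensed into one figure that is easy to compare from run to run: the share of sent data packets that had to be retransmitted. A sketch, assuming the netstat -p tcp layout shown above (the first "data packets" line counts data sent, the line ending in "retransmitted" counts resends); tcp_retrans_pct is a hypothetical name.

```shell
#!/bin/sh
# tcp_retrans_pct: reads "netstat -p tcp" output on stdin and prints the
# percentage of sent data packets that were retransmitted.
# Example on the AIX host:  netstat -p tcp | tcp_retrans_pct
tcp_retrans_pct() {
    awk '
        /data packets/ && /retransmitted/ { retrans = $1; next }
        /data packets/ && sent == ""      { sent = $1 }
        END {
            if (sent > 0) printf "%.6f%%\n", 100 * retrans / sent
        }'
}
```

Fed the counters from item 5 (2796 of 425423593 data packets retransmitted), it prints 0.000657%.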


Thanks a lot for your help.

Last edited by zaxxon; 02-19-2013 at 06:07 AM.. Reason: uncomplete usage of code tags
# 6  
02-19-2013
Quote:
and now i suspect the problem is network issue but i don't know how to affirm that.
I have been approaching this as an AIX configuration issue, because changing a setting made it "go away". What is needed is a better definition of what you mean by "network issue".

Collect some data during and after the problem. During the problem, repeating the same commands and looking at the deltas helps pinpoint what the system is trying to say.
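One way to collect such deltas: save the same command's output twice, some time apart, and subtract the leading counters line by line. A sketch assuming both snapshots have the same line layout (as two runs of netstat -p tcp on the same system should); counter_deltas and the /tmp file names are hypothetical.

```shell
#!/bin/sh
# counter_deltas OLD_FILE NEW_FILE
# Compares two saved snapshots line by line and, for lines that start with a
# counter, prints how much the counter grew followed by its description.
# Unchanged counters are not printed.
# Example on the AIX host:
#   netstat -p tcp > /tmp/tcp.1; sleep 60; netstat -p tcp > /tmp/tcp.2
#   counter_deltas /tmp/tcp.1 /tmp/tcp.2
counter_deltas() {
    awk '
        NR == FNR { old[FNR] = $1; next }       # first file: remember counters
        $1 ~ /^[0-9]+$/ && old[FNR] ~ /^[0-9]+$/ && $1 != old[FNR] {
            line = $0
            sub(/^[ \t]*[0-9]+[ \t]*/, "", line) # keep only the description
            print ($1 - old[FNR]) "\t" line
        }' "$1" "$2"
}
```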

Sort of: no Pain, no Gain.

In any case, real data values, captured during a backup, are needed to know whether we are looking at this properly.

FYI: I am not sure what the limits are these days. Back when CHRP (Common Hardware Reference Platform) first came out, IP buffers were limited to 50% of memory, capped at 4 x 256 MB, i.e. 1 GB (so with more than 2 GB of memory, the maximum was 1 GB).
no -o thewall tells us the HW limit (in KB), so roughly, drop 6 digits and you get the value in GB. On my system with 9 GB, that still comes to about 50%.
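The "drop 6 digits" shortcut works because thewall is given in KB, so 47448064 KB is about 47.4 GB in decimal units. A sketch of the exact binary conversion; thewall_report is a hypothetical name, and RAM_GIB assumes the 96 GB of physical memory reported in this thread means 96 GiB.

```shell
#!/bin/sh
# thewall_report THEWALL_KB RAM_GIB
# Converts thewall (KB, as shown by "no -o thewall") to MiB and GiB and
# shows it as a share of real memory, using integer arithmetic only.
thewall_report() {
    kb=$1
    ram_gib=$2
    mib=$(( kb / 1024 ))
    gib_x100=$(( kb * 100 / 1048576 ))              # GiB with two decimals
    pct=$(( kb * 100 / (ram_gib * 1048576) ))       # truncated percentage
    printf 'thewall = %d KB = %d MiB = %d.%02d GiB (%d%% of %d GiB RAM)\n' \
        "$kb" "$mib" $(( gib_x100 / 100 )) $(( gib_x100 % 100 )) "$pct" "$ram_gib"
}
```

For the value posted in this thread, thewall_report 47448064 96 prints `thewall = 47448064 KB = 46336 MiB = 45.25 GiB (47% of 96 GiB RAM)`.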

Your number, thewall = 47448064, comes down to 47 (GB).
I do not see this as your limiting factor, unless it conflicts with something else. However, there is a second variable that can set a limit below thewall.

Quote:
You can also use the maxmbuf tunable to lower the thewall limit. You can view the maxmbuf tunable value by running the lsattr -E -l sys0 command. If the maxmbuf value is greater than 0, the maxmbuf value is used regardless of the value of thewall tunable.
The default value for the maxmbuf tunable is 0. A value of 0 for the maxmbuf tunable indicates that the thewall tunable is used. You can change the maxmbuf tunable value by using the chdev or smitty commands.
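The rule quoted above (maxmbuf wins when it is greater than 0, otherwise thewall applies) as a small helper. A sketch: effective_net_limit_kb is a hypothetical name, and it assumes both values are in KB (sys0 describes maxmbuf as Kbytes of real memory allowed for MBUFS).

```shell
#!/bin/sh
# effective_net_limit_kb MAXMBUF_KB THEWALL_KB
# Prints the limit that actually applies: maxmbuf if it is > 0, else thewall.
# Example on the AIX host:
#   maxmbuf=$(lsattr -E -l sys0 -a maxmbuf -F value)
#   thewall=$(no -o thewall | awk '{print $3}')
#   effective_net_limit_kb "$maxmbuf" "$thewall"
effective_net_limit_kb() {
    if [ "$1" -gt 0 ]; then
        echo "$1"
    else
        echo "$2"
    fi
}
```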


Hope this helps moving forward!
# 7  
02-19-2013
Sorry about my English. By "network issue" I mean a network problem: maybe a network-related kernel parameter is set to a wrong value, or something is wrong with the network card or cable.


Code:
lsattr -El sys0|grep maxmbuf

It is the default, 0.

The physical memory is 96 GB and thewall is set to 47 GB, which is nearly 50% of RAM.


Thanks!


Moderator's Comments:
Use code tags consistently for code, logs, snippets, etc., thanks. You already got a reminder by PM, which apparently was not sufficient.

Last edited by zaxxon; 02-19-2013 at 06:09 AM.. Reason: code tags