So_keepalive

# 1  
Old 06-18-2008

Hi,

TCP/IP connection timeout issue:
Here is the scenario:
1) WLS Server1 (on Machine 1) creates a connection pool to Oracle RAC node1 (on Machine 2) and node2 (on Machine 3).
When we do a panic shutdown of Oracle node1, failover happens successfully, but a few threads get stuck. After analyzing the stack trace, we noticed that there is still a socket open from Machine 1 to node1 (Machine 2) in ESTABLISHED state, although node1 is dead.

When we contacted several teams, someone recommended enabling SO_KEEPALIVE. Any suggestions on how to set the SO_KEEPALIVE flag to true? Here are the current TCP settings for Machine 1:

Versions: AIX 5.3, WLS9.2, Oracle 10g Thin.
TCP/IP settings: no -a |grep tcp:
tcp_bad_port_limit = 0
tcp_ecn = 0
tcp_ephemeral_high = 65535
tcp_ephemeral_low = 32768
tcp_finwait2 = 1200
tcp_icmpsecure = 0
tcp_init_window = 0
tcp_inpcb_hashtab_siz = 24499
tcp_keepcnt = 8
tcp_keepidle = 600
tcp_keepinit = 150
tcp_keepintvl = 150
tcp_limited_transmit = 1
tcp_low_rto = 0
tcp_maxburst = 0
tcp_mssdflt = 1460
tcp_nagle_limit = 65535
tcp_nagleoverride = 0
tcp_ndebug = 100
tcp_newreno = 1
tcp_nodelayack = 0
tcp_pmtu_discover = 1
tcp_recvspace = 262144
tcp_sendspace = 262144
tcp_tcpsecure = 0
tcp_timewait = 1
tcp_ttl = 60
tcprexmtthresh = 3


Thanks in advance for your advice.
Regards,
Amir
# 2  
Old 06-19-2008
You might check the network option "tcp_keepidle" and maybe reduce its value.
# 3  
Old 06-19-2008
zaxxon is right, tcp_keepidle is probably the culprit here.

All these values can be set with the "no" command (see "man no" on how to do it, along with an explanation of what the items do). Changes made through "no" are volatile and do not survive the next reboot.

If you want to make them lasting, change the file /etc/tunables/nextboot accordingly.

I hope this helps.

bakunin
# 4  
Old 06-19-2008
You can also use the option "-p" on "no", which stands for "permanent", so the setting will still be in effect after the next reboot. I guess it will just make the entry bakunin mentioned for you.
# 5  
Old 06-19-2008
tcp_keepidle is set to 5 mins (600 half-seconds). We also tested with 120 half-seconds, which is 1 min. Still nothing happened. We still have the same issue even after 12 hours.
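
For reference, the unit conversion used above can be written down as a small sketch. It only assumes what the post states, namely that the tcp_keep* tunables are counted in half-seconds; the function name is made up for illustration.

```python
def half_seconds_to_seconds(value):
    """Convert a tunable given in half-second units to seconds."""
    return value / 2


# The two tcp_keepidle values tried in this post.
print(half_seconds_to_seconds(600) / 60)  # minutes for tcp_keepidle = 600
print(half_seconds_to_seconds(120) / 60)  # minutes for tcp_keepidle = 120
```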
# 6  
Old 06-19-2008
Tested with the recommended values, still no luck. We have some info from the tcpdump.

tcp_keepidle=600
tcp_keepintvl=10
tcp_keepinit=40

According to the core server documentation, SO_KEEPALIVE needs to be enabled, but it does not give much info.
Not sure how and where to set this. Do I set it in AIX, in WebLogic, or in the Oracle thin driver?
# 7  
Old 06-20-2008
OK, I think we are talking about two different things here:

tcp_keepidle is the time (in 0.5 sec units) a TCP connection has to be idle before the first keepalive probe is sent over it. This value is set via the no command or in /etc/tunables (no -p will do exactly this, as zaxxon rightly assumed).

SO_KEEPALIVE does the following: it switches the keepalive mechanism on for one socket. RFC 1122 (Requirements for Internet Hosts) describes keepalives as probe segments sent over an otherwise idle connection, which the peer answers with an ACK; it also says that they have to be off by default and that the application must be able to turn them on or off per connection. The option is an on/off flag, not an interval, and it is set with the setsockopt() system call (see pSeries and AIX Information Center). It has to be set on every new socket that is created.

Therefore it must be set by the application, not the OS.
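
As a minimal sketch of what "set by the application" means, here is how a program would switch the flag on for a socket it creates. Python is used only for illustration and the function name is made up; how WebLogic or the Oracle thin driver expose this (if at all) is not shown here and would have to be looked up in their documentation.

```python
import socket


def make_keepalive_socket():
    """Create a TCP socket with keepalive probing switched on."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_KEEPALIVE is an on/off flag at the SOL_SOCKET level. The timing
    # is not set here; it comes from the OS (on AIX the tcp_keep* values
    # listed by "no -a").
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


if __name__ == "__main__":
    s = make_keepalive_socket()
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

The flag is set before connect() here, which is allowed; it only matters that it is set on the same socket that later carries the connection.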

There is another tuning parameter, tcp_keepintvl, which belongs to the same mechanism: it is the interval between two keepalive probes once probing has started. The documentation does not explicitly state it, but I suppose tcp_keepidle, tcp_keepintvl and tcp_keepcnt are the system-wide timings used for every socket that has SO_KEEPALIVE switched on. The default for tcp_keepintvl is 150 (= 75 sec).

But none of these settings deals with idle connections as such; they deal with broken ones. A connection which is idle but valid will not be closed. Once it has been idle for the time stated in tcp_keepidle, it will be checked periodically (in tcp_keepintvl intervals) to see if it still works, and it will only be dropped if tcp_keepcnt probes in a row stay unanswered.
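
To put numbers on this, here is a rough sketch of how long it takes until a dead peer is noticed. It assumes the socket has SO_KEEPALIVE set, that all three tunables are in half-seconds, and that the connection is dropped after roughly tcp_keepidle + tcp_keepcnt * tcp_keepintvl, as in BSD-derived TCP stacks; the exact behaviour on AIX may differ slightly. The function name is made up for illustration.

```python
def dead_peer_detection_seconds(keepidle, keepintvl, keepcnt):
    """Rough time in seconds until a dead peer is detected.

    All three arguments are in half-second units, as shown by "no -a".
    """
    return (keepidle + keepcnt * keepintvl) / 2


# Values from the first post: tcp_keepidle=600, tcp_keepintvl=150, tcp_keepcnt=8
print(dead_peer_detection_seconds(600, 150, 8))  # 900.0 (15 minutes)

# Values tested later: tcp_keepidle=600, tcp_keepintvl=10, tcp_keepcnt unchanged
print(dead_peer_detection_seconds(600, 10, 8))   # 340.0
```

Either way this is minutes, not hours, so a socket that is still ESTABLISHED after 12 hours suggests that keepalive is not enabled on it at all.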

Searching for examples, I found that SO_KEEPALIVE is a configurable value in WebSphere MQ, which supports my assumption that it is an application parameter, not an OS parameter.

See here: Help -

I hope this helps.

bakunin