can-not detect TCP disconnects well


 
# 1  
Old 08-21-2008

Hello everyone, and thanks for reading. I am seeing this problem on Ubuntu 7.04.

I have written my own programs that communicate with each other, and I am having a hard time detecting a TCP socket disconnect when the remote computer loses power (for example).

On the computer that stays up, my program continually polls the socket and tries to send a status message. These sends never fail.

The poll for incoming data just returns 0, implying a timeout. The poll before a send returns with POLLOUT, and send() returns the number of bytes I tried to send, implying that the data was sent properly.

This goes on forever. I am trying to figure out how to detect that the socket is down so I can clean up my end and listen for the computer to connect again.




Thanks!
PJW
# 2  
Old 08-23-2008
It would be easier if you post the code you're using.

Generally, servers simply send a packet every X minutes and wait up to Y minutes for a reply; if there is no reply, they assume the connection has timed out. You can use poll(), alarm(), setitimer()...
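The send-and-wait idea above can be sketched in Python roughly as follows. This is only a minimal illustration: the PING/PONG messages and the function name peer_is_alive are made up for this example, and it assumes the peer's program answers every probe.

```python
import select
import socket

PING = b"PING\n"   # hypothetical probe message, not from the thread
PONG = b"PONG\n"   # hypothetical reply the peer is assumed to send back


def peer_is_alive(conn, reply_timeout=5.0):
    """Send one probe and wait up to reply_timeout seconds for any reply."""
    try:
        conn.sendall(PING)
    except OSError:
        return False                  # the local stack already reports a dead link
    readable, _, _ = select.select([conn], [], [], reply_timeout)
    if not readable:
        return False                  # no answer within the window: assume timed out
    try:
        reply = conn.recv(len(PONG))
    except OSError:
        return False                  # e.g. connection reset by peer
    return reply != b""               # an empty read means the peer closed
```

A caller would run this check after a quiet period and tear the connection down once it returns False. In a real protocol the probe and reply would have to be distinguishable from ordinary application data.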
# 3  
Old 09-03-2008
See the setsockopt() man page and pay special attention to the SO_KEEPALIVE option and the TCP_KEEPALIVE or TCP_KEEPIDLE options. BTW, the available options are system-specific.
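A minimal Python sketch of turning these options on, assuming a Linux-like system. The function name enable_keepalive and the timer values are illustrative only, and the hasattr guards are there because the TCP_KEEP* constants are not defined on every platform.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive; tune the timers where the platform exposes them."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle time before the first probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    # Number of unanswered probes before the connection is dropped.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```

With these example values, a silent peer would be noticed after roughly idle + interval * count seconds, at which point a pending or later recv()/send() on the socket fails with an error.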
# 4  
Old 09-03-2008

mika is correct. You really have to understand the TCP protocol to understand "exceptional" behavior, such as what you're dealing with. The TCP protocol is designed for fast AND slow networks. It's also built around reliable transmission when things in the middle temporarily break or get clogged. Thus, TCP is very tolerant of errors and transmission interruptions, and what you're trying to do is make it intolerant of errors.

Let's say you take mika's suggestion and set the socket options so that the timeout is shortened. Now what if the client application is operating over a modem or over a VPN (virtual private network)? In both of these cases, your connection might temporarily become very slow -- there's noise on the telephone line and the modems need several seconds to renegotiate, resulting in lots of lost packets, or in the case of the VPN, there is a re-exchange of encryption keys which suspends communications for several seconds. You might find you have made your TCP connection incapable of surviving these "normal" exceptions.

Another possibility is to leave these TCP parameters alone and implement an out-of-band "heartbeat". You could do this with ICMP or UDP. As soon as you lose your heartbeat (several times in a row to be sure), your client application shuts down the socket. The upside is portability (sort of). But there are several downsides as well, like learning a new set of programming protocols, multithreading (or at least interprocess communication between the heartbeat and your client application), and getting around firewalls, which might block your heartbeats.
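As a rough sketch of the UDP variant of such a heartbeat: the function name udp_peer_alive and the payload are invented for this example, and it assumes some responder on the peer answers each datagram. It declares the peer dead only after several probes in a row go unanswered.

```python
import socket


def udp_peer_alive(peer_addr, attempts=3, timeout=1.0):
    """Return True if the peer answers any of `attempts` UDP probes."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as probe:
        probe.settimeout(timeout)
        for _ in range(attempts):
            try:
                probe.sendto(b"heartbeat", peer_addr)
                probe.recvfrom(64)    # any datagram counts as a reply in this sketch
                return True
            except OSError:           # timeout, ICMP port unreachable, etc.
                continue
    return False
```

Because this check blocks for up to attempts * timeout seconds, it would normally run in its own thread or process, which is exactly the extra plumbing mentioned above.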

Is the can of worms making you sick yet?
# 5  
Old 09-03-2008
poll() and select() aren't the operative indicators.

send() to a closed port should return an error, so you need to check the return value of this function.
If your client really depends on the server's response, then make sure that the client knows
that the message was not received. It seems to me that in any TCP-based scenario the
client would have ample notification based on layer 4 feedback.
# 6  
Old 04-27-2009
Code:
"""
tcp_disconnect.py
Echo network data test program in python. This program easily translates to C & Java.

By TCP rules, the only way for a server program to know if a client has disconnected,
is to try to read from the socket. Specifically, if select() says there is data, but
recv() returns 0 bytes of data, then this implies the client has disconnected.

But a server program might want to confirm that a tcp client is still connected without
reading data. For example, before it performs some task or sends data to the client.
This program will demonstrate how to detect a TCP client disconnect without reading data.

The method to do this:
1) select on socket as poll (no wait)
2) if no recv data waiting, then client still connected
3) if recv data waiting, then read one char using PEEK flag
4) if PEEK data len=0, then client has disconnected, otherwise it's connected.
Note, the peek flag will read data without removing it from tcp queue.

To see it in action: 0) run this program on one computer 1) from another computer, 
connect via telnet port 12345, 2) type a line of data 3) wait to see it echo, 
4) type another line, 5) disconnect quickly, 6) watch the program detect the
disconnect and exit.

I hope this is helpful to someone. John Masinter, 17-Dec-2008.
"""

import socket
import time
import select

HOST = ''       # all local interfaces
PORT = 12345    # port to listen

# listen for new TCP connections
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
s.bind((HOST, PORT))
s.listen(1)
# accept new connection
conn, addr = s.accept()
print('Connected by', addr)
# loop reading/echoing, until client disconnects
try:
    conn.sendall(b"Send me data, and I will echo it back after a short delay.\n")
    while True:
        data = conn.recv(1024)                          # recv all data queued
        if not data:                                    # client disconnected
            break
        time.sleep(3)                                   # simulate time consuming work
        # below will detect if client disconnects during sleep
        r, w, e = select.select([conn], [], [], 0)      # more data waiting?
        print("select: r=%s w=%s e=%s" % (r, w, e))     # debug output to command line
        if r:                                           # yes, data avail to read.
            t = conn.recv(1024, socket.MSG_PEEK)        # read without remove from queue
            print("peek: len=%d, data=%r" % (len(t), t))  # debug output
            if len(t) == 0:                             # length of data peeked 0?
                print("Client disconnected.")           # client disconnected
                break                                   # quit program
        conn.sendall(b"-->" + data)                     # echo only if still connected
finally:
    conn.close()
    s.close()                                           # also release the listening socket

# 7  
Old 05-25-2009
Change Protocol

Hello pjwhite,

If the remote computer simply crashes or the network is cut, how do you catch that situation? As I understand it, that is the question.

Let me say a few words about TCP/IP.

1. It is a reliable, stream-based protocol ...

Each sending side has a buffer (for example, 32 KB), and if there is room in the buffer, every "send" or "write" operation succeeds immediately without any error, regardless of whether the data reaches the remote end. The TCP/IP implementation then starts to send the current window of the stream as a sequence of IP packets, retransmitting them until it gets a reply from the remote side. Without any reply, this side will not know what happened.
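The buffering effect can be seen in a few lines of Python. This is only a local demonstration: socket.socketpair() stands in for a real TCP connection, and the function name send_without_reader is made up for the example.

```python
import socket


def send_without_reader(nbytes=1024):
    """Return what send() reports when the peer never calls recv()."""
    a, b = socket.socketpair()        # local stand-in for a TCP connection
    try:
        return a.send(b"x" * nbytes)  # peer `b` is open but never reads
    finally:
        a.close()
        b.close()
```

send() reports the full byte count even though nothing was ever read on the other end, because success only means the data was copied into the local buffer.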


I would like to propose that you change your protocol (if it is yours) and introduce the ability to ping the other side with a special kind of message. Most existing protocols either have such a mechanism or do not need one.

I see from your post that you have already created something like that.
I mean that you can create a special message that requires a special reply:

1. After some period of inactivity, you can simply send the special message, just once, and if the reply does not come, start re-establishing the connection.

2. The same as 1, but using OOB (out-of-band) data for the "ping" message.

After sending the special message, if there is no reply, you can try to reconnect. However, reconnecting will also "hang" if the network is down, so a timeout and an asynchronous connect are also recommended.
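A reconnect attempt that cannot hang might be sketched like this in Python. The helper name try_reconnect is made up, and socket.create_connection with a timeout is used here as a simple substitute for a hand-written non-blocking connect.

```python
import socket


def try_reconnect(host, port, timeout=5.0):
    """Return a connected socket, or None if connecting fails or times out."""
    try:
        return socket.create_connection((host, port), timeout=timeout)
    except OSError:                   # refused, unreachable, timed out, ...
        return None
```

The returned socket keeps the same timeout for later operations, so a caller that wants ordinary blocking behaviour afterwards would call settimeout(None) on it.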

Best Regards
O.

Or try mika's recommendation to use keep-alive, but please check that it actually works. You will probably also need to configure it.

"A Transmission Control Protocol (TCP) keep-alive packet is an acknowledgment (ACK) with the sequence number set to one less than the current sequence number for the connection. The Transmission Control Protocol/Internet Protocol (TCP/IP) stack can automatically generate these keep-alive messages to verify that the computer at the remote end of a connection is still available."

Best Regards
O.