Socket endpoints disconnects

# 1  
09-13-2011

Hello,

I am able to establish a connection, and transfer data. Occasionally the receiving client will block in read(2) and stay that way until it is killed.

Initial state:
Code:
server: netstat -aveeopT
tcp        0      0 *:17398                     *:*                         LISTEN      server-user     49951      13406/server off (0.00/0/0)

(Connection established and data transferred)

Lost connection:
Code:
server: netstat -aveeopT
tcp        0 534312 server-ip:17398   client-ip:57566           ESTABLISHED server-user     57181      13406/server on (0.51/2/0)

client: netstat -aveeopT
tcp        0      0 client-ip:57566 server-ip:17398   ESTABLISHED client-user    44526      27082/client off (0.00/0/0)

At this point the server issues write(2) calls on the non-blocking socket and gets -1 with errno=EAGAIN. Eventually the socket times out and write(2) returns -1 with errno=ETIMEDOUT, and the server then calls close(2) on the connection.
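For readers following along, the server-side write pattern described above can be sketched roughly as follows, assuming a plain C server on Linux: non-blocking write(2), EAGAIN/EWOULDBLOCK while the send buffer is full, and any other error (such as ETIMEDOUT once the kernel stops retransmitting) treated as fatal for the connection. This is a minimal illustration, not the OP's code; the helper names `set_nonblocking`, `try_send` and `wait_writable` are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Result of one attempt to push data through a non-blocking socket. */
enum send_status { SEND_OK, SEND_WOULD_BLOCK, SEND_FATAL };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write as much of buf as the kernel accepts right now; *sent is advanced
 * by the number of bytes accepted. EAGAIN/EWOULDBLOCK means the local send
 * buffer is full (the peer is not draining it). Any other error, e.g.
 * ETIMEDOUT after the kernel gives up retransmitting, is fatal for the
 * connection and is left in errno. */
static enum send_status try_send(int fd, const char *buf, size_t len,
                                 size_t *sent)
{
    while (*sent < len) {
        ssize_t n = write(fd, buf + *sent, len - *sent);
        if (n > 0) {
            *sent += (size_t)n;
        } else if (n == -1 && errno == EINTR) {
            continue;                 /* interrupted by a signal: retry */
        } else if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return SEND_WOULD_BLOCK;  /* try again when writable */
        } else {
            return SEND_FATAL;        /* connection is unusable */
        }
    }
    return SEND_OK;
}

/* Wait up to timeout_ms for fd to become writable again. Returns >0 if
 * writable (or an error/hangup is pending), 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLOUT };
    return poll(&p, 1, timeout_ms);
}
```

A caller would loop on `wait_writable` while `try_send` returns SEND_WOULD_BLOCK and call close(2) on SEND_FATAL. Note that EAGAIN by itself says nothing about the peer's health; it only means the local send buffer is full, which matches the large Send-Q value in the server's netstat output.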

Code:
server:netstat -aveeopT
tcp        0      0 *:17398                     *:*                         LISTEN      server-user     49951      13406/server off (0.00/0/0)
client:netstat -aveeopT
tcp        0      0 client-ip:57566 server-ip:17398   ESTABLISHED client-user    44526      27082/client off (0.00/0/0)

********

The client is now stuck in read(2) on a blocking connection.
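Why the read never returns: a blocking read(2) on a TCP socket only comes back when data, a FIN (read returns 0), a RST or a local error arrives; if nothing of the kind reaches the client, it waits indefinitely. One way to bound the wait, not something suggested in the thread, is a receive timeout via SO_RCVTIMEO. This is a sketch assuming Linux; `set_read_timeout` and `read_or_timeout` (with its -2 "timed out" return value) are hypothetical names.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Make blocking read(2)/recv(2) on fd give up after timeout_ms ms.
 * On expiry the call returns -1 with errno EAGAIN or EWOULDBLOCK. */
static int set_read_timeout(int fd, long timeout_ms)
{
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Read into buf. Returns the byte count, 0 on orderly shutdown (FIN),
 * -2 if the timeout expired with no data, -1 on any other error. */
static ssize_t read_or_timeout(int fd, void *buf, size_t len)
{
    for (;;) {
        ssize_t n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;                 /* interrupted: read again */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return -2;                /* SO_RCVTIMEO expired */
        return -1;
    }
}
```

What the client does on a timeout (send a heartbeat, reconnect, give up) is an application decision; a timeout on its own cannot distinguish an idle server from a dead one.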

I don't have a lot of information about the server or client computers, but:
server:
Linux server-name 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

client:
Linux client-name 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Has anyone seen this before?
Any suggestions on how to debug this would be greatly appreciated. I am not exactly a TCP expert and could use some tips.

[This worked fine when the server was running on SunOS]


Regards,


Even

Last edited by pludi; 09-13-2011 at 10:35 AM..
# 2  
09-13-2011
I'm pretty sure this is a kernel bug. I get the same behavior on 2.6.19 sometimes; and unfortunately I am forced to stick with this version for the time being!
# 3  
09-16-2011
Gidday,

Quote:
Originally Posted by Corona688
I'm pretty sure this is a kernel bug. I get the same behavior on 2.6.19 sometimes; and unfortunately I am forced to stick with this version for the time being!
To confirm this assumption, it would be useful to capture with tcpdump the sequence the OP described above. We would see what happens at the protocol level, and possibly trace kernel misbehavior. In particular, I'd be interested to know whether the server sends a RST after closing the socket, and whether the client receives it at the link layer.

Cheers,
/Lew
# 4  
09-16-2011
client.log.gz and server.log.gz are attached.

I grepped for the port number 17398, so there might be some unrelated packets in there.

Server log: after the last TCP retransmission the connection timed out, causing write(2) to return ETIMEDOUT, which in turn caused the server to call close(2) on the socket.

This triggered no new packets (at least not on the server side). Then I shut down the server, but that didn't trigger anything either; the client was still blocked in read(2).

When I killed the client (Ctrl-C), the remaining entries (FIN etc) appeared in the server log.

I said earlier that this worked when the server ran on a SunOS computer. That is not correct; it fails in the same way.

For these tests the following computers were used:
server> uname -a
Linux server-name 2.6.34-gentoo-r12 #5 SMP Tue Apr 5 12:56:20 CEST 2011 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux

client> uname -a
Linux client-name 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:54:20 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

Even
# 5  
09-16-2011
Quote:
Originally Posted by NH2
To confirm this assumption, it would be useful to tcpdump what the OP did above.
I'm not talking about sockets hanging until they die. I'm talking about some sockets that, once they hang, fail to die by themselves -- ever.

They won't time out in the usual 10 minutes. They won't time out in 10 days. They don't abide by manually-set TCP timeouts. Not even -TERM or -QUIT will stop a process stuck reading one. Luckily -KILL still works.
# 6  
09-17-2011
Hi,

Quote:
I grepped for the port number 17398, so there might be some unrelated packets in there.
You could use a command similar to the following to trace only the communication between your server on port 17398 and your client(s):
Code:
tcpdump -i <interface> src port 17398 or dst port 17398

Quote:
server log: After the last TCP Retransmission the connection timed out causing write(2) to return ETIMEDOUT which caused the server to issue a close(2) on the socket.
Are you sure that you're performing a close()? A close() on the socket at the server side should cause a FIN segment to be sent to the client(s).
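This expectation can be checked on a single machine. In the normal case, close(2) on one end of a live TCP connection sends a FIN, and a read(2) on the other end returns 0. The sketch below, using a hypothetical helper `tcp_loopback_pair` over 127.0.0.1, shows that path. In the OP's case the connection had already failed with ETIMEDOUT before close(2) was called, which is consistent with no FIN appearing in the capture.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build a connected TCP pair over 127.0.0.1 on an ephemeral port.
 * Returns 0 on success, -1 on failure. */
static int tcp_loopback_pair(int *client_fd, int *server_fd)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == -1)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(lfd, 1) == -1 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) == -1)
        goto fail;
    *client_fd = socket(AF_INET, SOCK_STREAM, 0);
    if (*client_fd == -1)
        goto fail;
    if (connect(*client_fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
        close(*client_fd);
        goto fail;
    }
    *server_fd = accept(lfd, NULL, NULL);
    if (*server_fd == -1) {
        close(*client_fd);
        goto fail;
    }
    close(lfd);   /* closing the listener does not affect accepted sockets */
    return 0;
fail:
    close(lfd);
    return -1;
}
```

In a quick test, writing one byte from the accepted socket, reading it on the client, and then closing the accepted socket should make the client's next read(2) return 0.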

Quote:
This triggered no new packets (at least not on the server side). Then I shutdown the server, but that didn't trigger anything either, the client was still blocked in read(2).

What do you mean by "shutdown the server"? Was the process terminated (e.g. with KILL)?

Quote:
 I said earlier that this worked when the server was run on a SunOS computer, this is not correct, it fails in the same way.

This seems to indicate that there is a problem with your server code.

/Lew
# 7  
09-19-2011
Hi,

********
> Are you sure you're performing a close?
Yes, since I am now logging the calls (see below).

> What do you mean by "shutdown the server". ...
The server responds to Ctrl-C by calling close(2) on the listening socket and exiting cleanly.

>> run on a SunOS computer, ...
> This seems to indicate that there is a problem...
Both the client and server software have worked for years (I know this doesn't prove anything).
However, if one end is constantly calling write(2) and the other end is blocked in read(2), there aren't many things that can go wrong (are there?).

********

Four instances of the server run on ports 17395..17398; the clients open two connections, alternating between 17395/17396 and 17397/17398.

* When communication has stopped:
server issues non-blocking write(2) calls, client is blocked in read(2).

server retransmits continuously:
"17206","79.425849","server-ip","client-ip","TCP","[TCP Retransmission] 17395 > 34386 [ACK] Seq=1295964 Ack=224 Win=6912 Len=1448 TSV=528357141 TSER=3610227624"

Why does the client not respond to this?
(A tail on the client log at this point in time shows no new entries).
The last client entry is:
10:10:37.406609 IP (tos 0x0, ttl 64, id 58005, offset 0, flags [DF], proto: TCP (6), length: 80) client-ip:34386 > server-ip:17395: ., cksum 0xf106 (correct), 223:223(0) ack 1295964 win 5114 <nop,nop,timestamp 3610227622 528289197,nop,nop,sack 3 {1339404:1933084}{1330716:1332164}{1319132:1323476}>
(It is possible that the tcpdump I/O buffers have not been flushed, so the log may be incomplete here.)


netstat after the communication has stopped:

server$ netstat:
tcp 0 936856 server-ip:17395 client-ip:34386 ESTABLISHED

client$ netstat:
tcp 0 0 client-ip:34386 server-ip:17395 ESTABLISHED

The next thing that happens is that write(2) fails:
write(2) -> -1, errno = 110 (ETIMEDOUT)
shutdown(fd, 2) -> -1, errno = 107 (ENOTCONN)
close(fd) -> 0
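The errno sequence above fits a connection the kernel has already abandoned: after write(2) reports ETIMEDOUT the socket is no longer connected, so shutdown(2) fails with ENOTCONN and close(2) merely releases the descriptor. A teardown helper can treat ENOTCONN as expected. A minimal sketch, with the hypothetical name `teardown_connection`:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Tear down a connection after a fatal I/O error. If the kernel has
 * already discarded the connection (e.g. write(2) failed with ETIMEDOUT),
 * shutdown(2) reports ENOTCONN; that is expected and harmless, so only
 * close(2) matters. Returns 0 on success, -1 with errno set otherwise. */
static int teardown_connection(int fd)
{
    if (shutdown(fd, SHUT_RDWR) == -1 && errno != ENOTCONN) {
        int saved = errno;
        close(fd);          /* still release the descriptor */
        errno = saved;      /* report the shutdown(2) failure */
        return -1;
    }
    return close(fd);
}
```

A never-connected TCP socket behaves the same way as the OP's timed-out one here: shutdown(2) fails with ENOTCONN and the helper still returns the result of close(2).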

No packets are transmitted when this happens.
And it seems the retransmissions have stopped.

Now the connection does not show up in netstat on the server, but on the client it is the same:

client$ netstat
tcp 0 0 client:34386 server:17395 ESTABLISHED
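A mitigation that the thread does not mention: the client's socket stays ESTABLISHED because the client has nothing to send, so its TCP stack never contacts the server and never learns that the server has discarded the connection. Enabling TCP keepalive on the client makes the kernel probe an idle connection. A probe that reaches the server should be answered with a RST, since the server no longer has the connection; if the probes go unanswered, the kernel eventually times the connection out. Either way a blocked read(2) returns -1 (with ECONNRESET or ETIMEDOUT respectively). This is a sketch assuming the Linux option names TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT; `enable_keepalive` is a hypothetical helper and the timing values are placeholders.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalive on fd with explicit timing (Linux option names).
 * idle_s:  seconds of silence before the first probe
 * intvl_s: seconds between unanswered probes
 * count:   unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on the first failing setsockopt(2). */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int count)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) == -1)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count);
}
```

For example, `enable_keepalive(fd, 60, 10, 5)` on the client would start probing after 60 s of silence and give up after 5 unanswered probes sent 10 s apart.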


**********
The server reacts to Ctrl-C by calling close(2) on the file descriptor that was passed to bind/listen, and then exits cleanly.

When this happens no packets are transmitted, and the server process no longer exists.

*********
Finally, I kill the client with Ctrl-C.
That causes 9 FIN packets to be sent from the client.

netstat on the client shows that the connection is gone.

*********
It seems the client is continuously sending duplicate ACKs for sequence number 1295964. Why?
"6512","11.705225","10.21.33.125","10.30.33.154","TCP","[TCP Dup ACK 5345#428] 34386 > 17395 [ACK] Seq=224 Ack=1295964 Win=654592 Len=0 TSV=3610227624 TSER=528289197 SLE=1339404 SRE=1950460 SLE=1330716 SRE=1332164 SLE=1319132 SRE=1323476"

And packets carrying this sequence number (Seq=1295964 from the server, Ack=1295964 from the client) seem to be passed back and forth between the server and the client.
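The duplicate ACKs can be read as follows. The cumulative Ack=1295964 says the client has every byte before 1295964, and the SACK blocks list later ranges it has already received, so it keeps repeating the same ACK while the bytes starting at 1295964 are missing. The sketch below computes the gaps from the values in the client's last tcpdump line. With those numbers the first gap is [1295964, 1319132), 23168 bytes, i.e. 16 segments of the 1448-byte length seen in the retransmission. That the server keeps retransmitting Seq=1295964 while the client's ACK never moves past it suggests the retransmissions are not being accepted by the client, although the logs alone do not show whether they are lost on the path or dropped on arrival. `struct sack_block` and `find_holes` are hypothetical names, and sequence-number wraparound is ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* One SACK block: bytes [left, right) the receiver already holds. */
struct sack_block { uint32_t left, right; };

/* Given the cumulative ACK and the receiver's SACK blocks sorted by left
 * edge, write the missing byte ranges into holes[] (at most max_holes
 * entries) and return how many were found. Wraparound is ignored. */
static size_t find_holes(uint32_t ack, const struct sack_block *blocks,
                         size_t nblocks, struct sack_block *holes,
                         size_t max_holes)
{
    size_t nholes = 0;
    uint32_t next = ack;          /* first byte not yet accounted for */
    for (size_t i = 0; i < nblocks && nholes < max_holes; i++) {
        if (blocks[i].left > next) {          /* gap before this block */
            holes[nholes].left = next;
            holes[nholes].right = blocks[i].left;
            nholes++;
        }
        if (blocks[i].right > next)
            next = blocks[i].right;
    }
    return nholes;
}
```

Applied to the client's SACK blocks {1319132:1323476}{1330716:1332164}{1339404:1933084} with ack 1295964, this yields three gaps: [1295964, 1319132), [1323476, 1330716) and [1332164, 1339404).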

(I attached two more logs; I'm not sure whether they reveal anything else.)


Regards,

Even