I am able to establish a connection, and transfer data. Occasionally the receiving client will block in read(2) and stay that way until it is killed.
initial:
(Connection established and data transferred)
Lost connection:
At this point the server issues write(2) calls on the non-blocking socket and receives -1/errno=EAGAIN. Eventually the socket times out and write(2) returns -1/errno=ETIMEDOUT. The server then calls close(2) on the connection.
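For readers following along, the server-side behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not the poster's actual server code: the helper name `try_send` and the returned status strings are made up for the example. It attempts one non-blocking send and classifies the outcome as EAGAIN (buffer full, retry later), ETIMEDOUT (the kernel gave up retransmitting) or some other error.

```python
import socket


def try_send(sock, data):
    """Attempt one send on a non-blocking socket and classify the result.

    Hypothetical helper for illustration. Returns one of:
      ("sent", n)          n bytes were queued in the kernel send buffer
      ("would_block", 0)   EAGAIN/EWOULDBLOCK: send buffer is full, retry later
      ("timed_out", 0)     ETIMEDOUT: retransmissions exhausted, caller should close
      ("error", errno)     any other socket error
    """
    try:
        return ("sent", sock.send(data))
    except BlockingIOError:
        return ("would_block", 0)
    except TimeoutError:
        return ("timed_out", 0)
    except OSError as exc:
        return ("error", exc.errno)


if __name__ == "__main__":
    # A local socket pair stands in for the real TCP connection.
    writer, reader = socket.socketpair()
    writer.setblocking(False)
    status = "sent"
    while status == "sent":
        # Nobody reads from `reader`, so the buffer eventually fills up.
        status, _ = try_send(writer, b"x" * 65536)
    print("stopped with status:", status)
    writer.close()
    reader.close()
```

A real server would normally wait for the socket to become writable (select/poll) after a "would_block" result rather than retrying in a tight loop.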
********
The client is now stuck in read(2) on a blocking connection.
I don't have a lot of information about the server or client computers, but:
server:
Linux server-name 2.6.18-194.el5 #1 SMP Fri Apr 2 14:58:14 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
client:
Linux client-name 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
Has anyone seen this before?
Any suggestions on how to debug this would be greatly appreciated. I am not exactly a TCP expert and could use some tips.
[This worked fine when the server was running on SunOS]
I'm pretty sure this is a kernel bug. I get the same behavior on 2.6.19 sometimes; and unfortunately I am forced to stick with this version for the time being!
To confirm this assumption, it would be useful to tcpdump what the OP did above. We would see what happens at the protocol level, and possibly trace kernel misbehavior. I'd be particularly interested to know whether the server sends an RST after closing the socket, and whether the client received it at the link layer.
I grepped for the port number 17398, so there might be some unrelated packets in there.
server log: After the last TCP Retransmission the connection timed out causing write(2) to return ETIMEDOUT which caused the server to issue a close(2) on the socket.
This triggered no new packets (at least not on the server side). Then I shutdown the server, but that didn't trigger anything either, the client was still blocked in read(2).
When I killed the client (Ctrl-C), the remaining entries (FIN etc) appeared in the server log.
I said earlier that this worked when the server was run on a SunOS computer, this is not correct, it fails in the same way.
For these tests the following computers were used:
server> uname -a
Linux server-name 2.6.34-gentoo-r12 #5 SMP Tue Apr 5 12:56:20 CEST 2011 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux
client> uname -a
Linux client-name 2.6.18-194.26.1.el5 #1 SMP Tue Nov 9 12:54:20 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
Quote:
To confirm this assumption, it would be useful to tcpdump what the OP did above.
I'm not talking about sockets hanging until they die. I'm talking about some sockets that, once they hang, fail to die by themselves -- ever.
They won't time out in the usual 10 minutes. They won't time out in 10 days. They don't abide by manually-set TCP timeouts. Not even -TERM or -QUIT will stop a process stuck reading one. Luckily -KILL still works.
Quote:
I grepped for the port number 17398, so there might be some unrelated packets in there.
You could use a command similar to the following to trace the communication between your server running at port 17398 and your client(s):
Code:
tcpdump -i <interface> src port 17398 or dst port 17398
Quote:
server log: After the last TCP Retransmission the connection timed out causing write(2) to return ETIMEDOUT which caused the server to issue a close(2) on the socket.
Are you sure that you're performing a close()? A close() on the socket at the server side should cause a FIN segment to be sent to the client(s).
Quote:
This triggered no new packets (at least not on the server side). Then I shutdown the server, but that didn't trigger anything either, the client was still blocked in read(2).
What do you mean by "shutdown the server"? Is the process terminated (e.g. with KILL)?
Quote:
I said earlier that this worked when the server was run on a SunOS computer, this is not correct, it fails in the same way.
This seems to indicate that there is a problem with your server code.
********
> Are you sure you're performing a close.
Yes, since I am now logging the calls (see below).
> What do you mean by "shutdown the server". ...
The server responds to Ctrl-C by calling close(2) on the listening socket and exiting cleanly.
>> run on a SunOS computer, ...
> This seems to indicate that there is a problem...
Both the client and server software have worked for years (I know this doesn't prove anything).
However, if one end is constantly calling write(2) and the other end is blocked in read(2), there aren't a lot of things that can go wrong (are there?).
********
Four instances of the server run on ports 17395..17398; clients open two connections, alternating on 17395/17396 and 17397/17398.
* When communication has stopped:
server issues non-blocking write(2) calls, client is blocked in read(2).
Why does the client not respond to this?
(A tail on the client log at this point in time shows no new entries).
The last client entry is:
10:10:37.406609 IP (tos 0x0, ttl 64, id 58005, offset 0, flags [DF], proto: TCP (6), length: 80) client-ip:34386 > server-ip:17395: ., cksum 0xf106 (correct), 223:223(0) ack 1295964 win 5114 <nop,nop,timestamp 3610227622 528289197,nop,nop,sack 3 {1339404:1933084}{1330716:1332164}{1319132:1323476}>
(It is possible that the tcpdump I/O buffers have not been flushed, so the log may not be complete here).
netstat after the communication has stopped:
server$ netstat:
tcp 0 936856 server-ip:17395 client-ip:34386 ESTABLISHED
client$ netstat:
tcp 0 0 client-ip:34386 server-ip:17395 ESTABLISHED
The next thing that happens is that write(2) fails,
write(2) -> -1, errno = 110(TimeOut)
shutdown(fd, 2) -> -1, errno = 107 (Not connected)
close(fd) -> 0
No packets are transmitted when this happens.
And it seems the retransmissions have stopped.
Now the connection does not show up in netstat on the server, but on the client it is the same:
client$ netstat
tcp 0 0 client:34386 server:17395 ESTABLISHED
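The state shown above (connection gone on the server, still ESTABLISHED on the client) is what is usually called a half-open connection. One common way for a blocked reader to notice this is TCP keepalive. The following is a minimal sketch in Python, not the poster's client code: the helper name `enable_keepalive` and the timing values are assumptions chosen for illustration, and the `TCP_KEEP*` options are Linux-specific, so they are applied only where the platform exposes them. Note that another poster in this thread reports sockets that ignore manually set timeouts, so this may not help on an affected kernel.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive probes for `sock` (hypothetical helper).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are not available on every platform.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle=30, interval=5, count=3)
    print("keepalive on:", bool(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)))
    s.close()
```

With keepalive enabled, a blocking read on a dead connection should eventually fail with an error instead of waiting forever, provided the kernel honours the setting.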
**********
The server reacts to Ctrl-C by doing a close(2) on the file descriptor that was passed to bind/listen, and then exits cleanly.
When this happens no packets are transmitted, and the server process no longer exists.
*********
Finally I kill the client with Ctrl-C.
That causes 9 FIN events to be sent from the client.
netstat on the client shows that the connection is gone.
*********
It seems the client is continuously sending ACKs for packet 1295964. Why?
"6512","11.705225","10.21.33.125","10.30.33.154","TCP","[TCP Dup ACK 5345#428] 34386 > 17395 [ACK] Seq=224 Ack=1295964 Win=654592 Len=0 TSV=3610227624 TSER=528289197 SLE=1339404 SRE=1950460 SLE=1330716 SRE=1332164 SLE=1319132 SRE=1323476"
And packets with this id (or whatever it is called) seem to be passed between the server and the client.
(attached two more logs, not sure if they reveal anything else).
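On the duplicate ACK question: a duplicate ACK carrying SACK blocks is the receiver's way of saying "I am still waiting for the byte at Ack, but I already hold these later ranges". The small Python sketch below takes the Ack value and the SLE/SRE pairs from the Wireshark line quoted above and lists the byte ranges the client is still missing. The function name `sack_holes` is made up for this example, and the numbers are Wireshark's relative sequence numbers as shown in that line.

```python
def sack_holes(ack, sack_blocks):
    """Return the [start, end) ranges not yet received, given the cumulative
    ACK and a list of (left_edge, right_edge) SACK blocks."""
    holes = []
    cursor = ack  # everything below `ack` has been received in order
    for left, right in sorted(sack_blocks):
        if left > cursor:
            holes.append((cursor, left))
        cursor = max(cursor, right)
    return holes


# Values copied from the Dup ACK line above (SLE = left edge, SRE = right edge).
blocks = [(1339404, 1950460), (1330716, 1332164), (1319132, 1323476)]
for start, end in sack_holes(1295964, blocks):
    print(f"missing bytes {start}..{end - 1} ({end - start} bytes)")
```

Under that reading, the client keeps acknowledging 1295964 because the data starting at that sequence number has not arrived, even though later data has. Whether the server's retransmissions of that range are being lost on the way is something the server-side capture would have to show.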