Bizarre TCP/IP problem


 
# 1  
Old 02-01-2011

Hi all.

I have a really really weird problem that I've been working on for days.

The problem shows up as users being unable to connect to our web servers via SSH when they're using our wireless network. Here's where it gets weird:

- Clients from anywhere other than the wireless subnet can connect fine
- Wireless clients can connect to ssh servers on subnets other than the one our web servers are on (both onsite and offsite)
- I can run nc -l 22 on one of the web servers and transfer big files from a wireless client with cat bigfile | nc webserver 22.
- If I run telnetd on port 22 on one of our web servers, I cannot connect. It fails in a very similar way to ssh.
- Update (Three days later) I can recreate the problem in netcat by typing into the client and server alternately. If I just send one-way in netcat, the problem never comes up.
- The TCP handshake succeeds, then packets stop arriving and the client starts resending packets. The server seems to be waiting.
- When I kill the ssh or sshd process, a bunch of TCP packets start flowing. If I kill the client, the server will actually show a completed key exchange (ssh obviously). Said another way, the connection stalls, I kill the client, the connection continues a bit with the client dead and then closes.
- Googling around I found lots of folks who recommended fiddling with MTU and some IP /proc variables, but that did not help. The problem is too consistent to be that anyway. And I can nc big files (10Mb) with no problem. (md5 checked)
- I thought it might be a DNS problem, but tcpdump shows no DNS queries while the connection hangs (set UseDNS no in sshd_config).
- Update (Two days later...) - I plugged in a machine that is not a Xen host or client, and it shows the same behaviour, so we can rule out any Xen strangeness as the culprit.
- Update (Three days later...) - After the TCP handshake, the client can send as many packets as it wants UNTIL the server sends anything (again, after the initial handshake), after which any packets from the client do not reach the server.
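
The alternating pattern from the last update (client sends, server replies, client sends again) can be scripted so the test is repeatable without typing into netcat by hand. This is only a minimal sketch in Python, assuming plain TCP sockets and newline-terminated messages; the names recv_line, serve_once and probe are made up for illustration and are not from any tool mentioned in this thread.

```python
import socket
import threading

def recv_line(sock):
    """Read bytes one at a time until a newline or EOF."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = sock.recv(1)
        if not chunk:          # peer closed the connection
            break
        buf += chunk
    return buf

def serve_once(listener, reply=b"server-says-hi\n"):
    """Accept one connection and walk through the alternating exchange."""
    conn, _ = listener.accept()
    with conn:
        recv_line(conn)                 # client -> server, before the server has sent anything
        conn.sendall(reply)             # first server -> client data after the handshake
        line = recv_line(conn)          # client -> server, AFTER the server has spoken
        conn.sendall(b"ack:" + line)    # only reached if that second message arrived

def probe(host, port, timeout=5.0):
    """Return True if a message sent after the server's first reply gets through."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.sendall(b"first\n")
            recv_line(s)                # wait for the server's first reply
            s.sendall(b"second\n")
            return recv_line(s) == b"ack:second\n"
    except socket.timeout:
        return False                    # the stall described above would end up here
```

To use it, bind and listen on a socket on the web server, pass it to serve_once, then call probe from a wireless client with the server's address and port. A False result from the wireless subnet alongside a True result from a wired subnet would be fairly concrete evidence that something in the path reacts to the first server-to-client data.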

Other important info:
I only control the client I'm testing with and the web servers. I do not control the wireless setup or the routers or the firewalls. Those are all controlled by my boss. He's checked his config and it looks good to him, so if it really is something wrong on his end, I need really good evidence before I waste his time some more. Really, the clues so far point to my servers being the source of the problem.

The servers are all CentOS 5.5. They are virtualized under Xen. (Tcpdump shows the same stuff on the Xen host/Dom0 as on the client/DomU, so I don't think it's a Xen problem, but then again....) Update: My client is also Linux, Fedora 11. The problem was initially reported by a Mac user, version unknown.

Okay, I gotta go soak my head.... Thanks All!

-Pileofrogs

Last edited by pileofrogs; 02-04-2011 at 02:16 PM.. Reason: I am made of stilton cheese
# 2  
Old 02-03-2011
Did you mention what Operating System and ssh software the clients use? If Microsoft is involved, please be specific about software versions.
# 3  
Old 02-03-2011
Quote:
Originally Posted by pileofrogs
Hi all.

I have a really really weird problem that I've been working on for days.

The problem manifested as users cannot connect to our web servers via SSH when they're using our wireless network.
It sounds like an MTU problem. A while ago we had another fellow with a similar-looking problem -- he could connect on FTP, but the socket would transfer a few kilobytes and then time out, because his client's MTU was too large.

Early in the session, when they're still negotiating, they'll mostly be sending small packets and the problem goes unnoticed. But when you start transferring bulk data (or ssh keys?), some link between your clients and your web server chokes on packets larger than it's configured to handle and drops them into hyperspace, leaving both ends waiting for the other. Retransmits also get dropped, so the connection chokes and eventually dies.

TCP should be able to handle that gracefully -- compliant routers send an ICMP "fragmentation needed" reply (type 3, code 4), which says "too big! fragment them more!" Unfortunately there are lots and lots of firewalls set up by people convinced that all ICMP is bad, and they drop those replies.

Try reducing the MTU on your clients and see if that helps.

Try pinging hosts from the wireless with huge packets to see if some links start dropping before others and, if they do drop, whether anything sends back an ICMP reply.
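
One way to do that ping test systematically is to step through a few packet sizes with the "don't fragment" bit set. The sketch below is in Python and assumes the Linux iputils ping, where -M do prohibits fragmentation, -s sets the ICMP payload size and -c sets the count; other ping implementations use different flags. The helper names and the list of MTU values are illustrative only.

```python
import subprocess

IP_HEADER = 20    # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header

def payload_for_mtu(mtu):
    """ICMP payload size that produces an IPv4 packet of exactly `mtu` bytes."""
    return mtu - IP_HEADER - ICMP_HEADER

def ping_command(host, mtu):
    """Build an iputils ping command: one echo, fragmentation prohibited."""
    return ["ping", "-c", "1", "-M", "do", "-s", str(payload_for_mtu(mtu)), host]

def sweep(host, mtus=(576, 1280, 1400, 1492, 1500)):
    """Ping `host` at each packet size; map size -> True if a reply came back."""
    results = {}
    for mtu in mtus:
        proc = subprocess.run(ping_command(host, mtu),
                              capture_output=True, text=True)
        results[mtu] = proc.returncode == 0
    return results
```

Running sweep against the web server and against a host on another subnet, both from a wireless client, shows at which size (if any) replies stop. Watching with tcpdump at the same time shows whether an ICMP "fragmentation needed" message ever comes back.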

Last edited by Corona688; 02-03-2011 at 06:56 PM..
# 4  
Old 02-03-2011
If it is an MTU problem, try ftp with the parameter "-B 1". I have seen dramatic speed improvements because "-B 1" prevents "jumbo packets", which can be extremely slow unless every software and hardware component in the network is expecting this "enhancement" to the TCP/IP protocol.


Quote:
A while ago we had another fellow with a similar-looking problem -- he could connect on FTP, but the socket would transfer a few kilobytes then timeout, because his client's MTU was too large.
@Corona688
Hmm, sounds like a classic unix-to-Microsoft ftp problem. It is a firewall problem because IMHO Microsoft doesn't implement ftp correctly. In unix you can transmit small files on port 21 but need port 20 open to transmit large files. Nuff said.

If it's unix-to-unix lowering the MTU with the "-B" parameter to "ftp" can produce serious speed improvements on a mixed-manufacturer network.

Last edited by methyl; 02-03-2011 at 07:21 PM.. Reason: lots of afterthoughts
# 5  
Old 02-04-2011
Thanks for taking the time to answer! Sadly, I've already played around with the MTU and it didn't help. It's actually balking on packets with almost no data at all in them.

---------- Post updated at 10:13 AM ---------- Previous update was at 10:06 AM ----------

Quote:
Originally Posted by methyl
Did you mention what Operating System and ssh software the clients use? If Microsoft is involved, please be specific about software versions.
Sorry! Client is Fedora 11 Linux using OpenSSH. The problem was originally reported by someone using a Mac; I don't know the OS version. He tried using the command line ssh and something called Cyberduck.

Since I can recreate the problem using telnet and, now, netcat, I think it's not specific to any version or OS.