How to demonstrate network problems?


 
# 1  
Old 02-04-2011

Hi,
I work on several Sun servers running Solaris (SunOS 5.10). All of them are application servers (AS) running proprietary software.
Occasionally (irregularly and not very often, roughly twice a month) we see what I think are network problems.
I say this because the messages that appear in the AS alarm log say:
* Can't connect to NE
* AS not available
* Connection to NE time out
After such alarms we also ended up with a huge number of TCP connections stuck in CLOSE_WAIT. As a result, the file descriptors of the process that handles the TCP connections reached the system limit of 5120, and the AS started to transmit over UDP, which made transactions fail because the other side did not expect UDP packets.
This behavior (the stuck CLOSE_WAIT connections) has now been fixed with a patch from the vendor, but the network (?) problems that cause the alarms remain.
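
One way to put numbers on the CLOSE_WAIT pile-up described above is to sample the connection table at regular intervals. This is only a minimal sketch: the function name is made up for illustration, and it assumes the connection state is the last field of each line, as in the output of `netstat -an -P tcp` on Solaris 10, and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# Count TCP connections in CLOSE_WAIT from netstat output read on stdin.
# Assumption: the state is the last whitespace-separated field of each
# connection line (true for `netstat -an -P tcp` on Solaris 10).
# count_close_wait is a hypothetical helper name, not an existing tool.
count_close_wait() {
    awk '$NF == "CLOSE_WAIT" { n++ } END { print n + 0 }'
}
```

Run for example once a minute as `netstat -an -P tcp | count_close_wait` and logged together with a timestamp, this gives a series that can be compared with the alarm times. The number of open descriptors of the process can be sampled the same way with `ls /proc/PID/fd | wc -l`.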

So ever since the first time it happened I have had one fixed idea: network problems (on the Catalyst side, to be clear).
Whatever it is (a traffic switch, a downlink ...), it affects the SIP traffic of our AS.
The only problem is that the network guys say: from our point of view everything is fine! Always, every time the problem happens!

So, in the end:
What kind of data can I collect (and HOW!?) to DEMONSTRATE that the problem lies in the network (or in the AS)?
Do you know a script or tool (one that does not affect traffic) that can monitor the described scenario?

Thank you very much!

Regards,
Evan
# 2  
Old 02-04-2011
First, read up on snoop. You can capture and analyze packets, then run reports against those captures when you have a problem, to verify the network. The main downside of this strategy is disk usage, not performance. Your point of interest is being able to relate the send time of a packet from your app to the time it takes for the expected return packet to appear. This is not the same thing as ping return times.

You can also use netstat -i to gather information on packet collisions. When collisions rise to a significant level, traffic is impeded to a large degree.
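
As a rough sketch of how the netstat -i counters could be turned into something comparable over time: the helper below assumes the Solaris 10 column layout (Name, Mtu, Net/Dest, Address, Ipkts, Ierrs, Opkts, Oerrs, Collis, Queue) and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris). The function name and the output format are made up for illustration.

```shell
# Summarize per-interface error and collision counters from
# `netstat -i` output read on stdin.
# Assumption: Solaris 10 layout, i.e. field 6 = Ierrs, 7 = Opkts,
# 8 = Oerrs, 9 = Collis. Header lines (first field "Name") are skipped.
# iface_error_report is a hypothetical helper name.
iface_error_report() {
    awk '$1 != "Name" && NF >= 9 {
        opkts = $7; collis = $9
        # Collisions as a percentage of output packets (0 if no output yet).
        pct = (opkts > 0) ? 100 * collis / opkts : 0
        printf "%s ierrs=%s oerrs=%s collis=%s collis_pct=%.2f\n", $1, $6, $8, collis, pct
    }'
}
```

Usage would be something like `netstat -i -f inet | iface_error_report`. The counters are cumulative since boot, so what matters is how much they grow between two samples taken around an alarm, not their absolute value.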

However, most network problems I have seen are the result of crappy application code.
Once a network is set up correctly, and is not subject to huge torrents of random data, you do not see problems except for hardware issues or intrusions.

A common application fault is not implementing a circular buffer (queue) large enough to deal with full-bore traffic. In other words, the network outruns the application, or, between two apps, one is slower and one is faster.
# 3  
Old 02-04-2011
Hi jim, thanks for your reply.

Regarding snoop, I'll brush up a previous script: an infinite loop that creates one snoop file at a time, each with 5000 packets:
Code:
snoop -q -o ./trace.cap -d eth0 -c 5000

The file is then kept or deleted based on other stats, so there are no disk space problems.
The script didn't affect the system much in terms of CPU and memory.
I'll tune the number of packets.
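
The keep-or-delete criterion is not spelled out above, so the following is only a sketch of one possible variant: keep the newest N capture files and delete the rest. The file naming scheme (trace-<YYYYmmddHHMMSS>.cap), the function name and the value 20 are all assumptions, and the arithmetic expansion needs a POSIX shell such as ksh or bash rather than the old Solaris /bin/sh.

```shell
# Delete the oldest capture files so that at most $2 remain in directory $1.
# Assumptions: files are named trace-<YYYYmmddHHMMSS>.cap (hypothetical
# naming), so sorting by name is sorting by age; paths contain no whitespace.
# prune_captures is a hypothetical helper name.
prune_captures() {
    dir=$1
    keep=$2
    total=$(ls "$dir"/trace-*.cap 2>/dev/null | wc -l)
    excess=$((total - keep))
    [ "$excess" -gt 0 ] || return 0
    ls "$dir"/trace-*.cap | sort | head -n "$excess" | xargs rm -f
}

# Possible capture loop (not run here; eth0 is the interface name used
# in the command above):
# while :; do
#     snoop -q -o "./trace-$(date +%Y%m%d%H%M%S).cap" -d eth0 -c 5000
#     prune_captures . 20
# done
```

With timestamps in the file names, a capture can later be matched against the time of an alarm without opening it.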

Your last statement is not very clear to me.
Do you mean that in some cases the network could be slower than the application traffic, so that some packets are lost?
So you suppose that the application has a limited (undersized) buffer and that during a traffic peak it could lose packets, leading to the errors I mentioned.
Is this right?

In the meantime I'm studying kstat and dtrace as described here:
H.K. Jerry Chu's Weblog

Do you think I can get interesting information with these tools?

Thanks,
Evan
# 4  
Old 02-04-2011
No - the reverse. The network is not the problem. The application loses data because it cannot handle the traffic.

If your network guy knows what he/she is doing, the likelihood of your applications being at fault is pretty good.

Your goal should be: correlate times of network data with good performance and bad performance.
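
A minimal sketch of that correlation step, assuming the measurements were logged one per line with the time in epoch seconds as the first field (the log format and the function name are assumptions, not something from this thread), and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris):

```shell
# Print the log lines whose first field (epoch seconds) lies within
# +/- WINDOW seconds of the alarm time.
# Usage: samples_near ALARM_EPOCH WINDOW_SECONDS < samples.log
# samples_near is a hypothetical helper name; the log format is assumed.
samples_near() {
    awk -v t="$1" -v w="$2" '$1 >= t - w && $1 <= t + w'
}
```

For example, `samples_near 1300 60 < samples.log` prints only the samples taken in the minute before and after an alarm raised at time 1300, which can then be compared with samples from a period without alarms.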
# 5  
Old 05-23-2011
Hi jim,
sorry for not updating this post.
A TCP issue was the problem!

Let me explain: the core network sends TCP packets with Window=0 (this happens when a process restarts; it is still not clear why it restarts).
The AS handles this scenario in the standard manner: it sends no more traffic toward the core NE.
This leads to some asserts in the log files reporting disconnection from all the nodes.
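
For anyone chasing the same symptom, zero-window packets can be picked out of a capture with a small filter. This is only a sketch: it assumes that the one-line summary printed by snoop for TCP packets contains a `Win=<value>` field (worth checking against the output on your own system), and the function name is made up. It needs a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris).

```shell
# Print the summary lines that advertise a zero TCP receive window.
# Assumption: each TCP summary line on stdin carries a "Win=<n>" field.
# zero_window_lines is a hypothetical helper name.
zero_window_lines() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "Win=0") { print; next } }'
}
```

A possible invocation on a saved capture is `snoop -i ./trace.cap tcp | zero_window_lines`.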

Regards,
Evan