Gurus needed to diagnose severe performance degradation


 
# 8  
Old 09-01-2009
What version of NFS are you using?
# 9  
Old 09-02-2009
A 1 Gb Ethernet connection can never approach 1 Gb/s because of the collision algorithm used by the Ethernet MAC protocol specification.

Sorry, I don't mean to sound like I am finger pointing, but you should have measured network performance, including throughput and latency, on both channels (old and new) before cutting over.

Ethernet does not perform well under heavy loads because of the way Ethernet works (ALOHA-style contention, collisions, backoff), and when you add another protocol on top, performance gets worse.

A directly attached Fibre Channel link should be far superior to Ethernet in this case. The only way to get past "finger pointing" is to build a baseline of the system before production. You have to know the maximum throughput and latency of the Fibre Channel link, and the same for the Ethernet channel.

Then you move into the next phase of testing (for a commercial application). Without baselining, the team is always asking for trouble, because you cannot know the system constraints and bottlenecks.

Normally, the network communications channel is the bottleneck. The next problem is the I/O at the network interface level, which tends to perform worse than directly attached disk I/O.

I once worked in NYC on a TCP/IP throughput problem where people were about to get fired over the problems in production. There was finger pointing among all the teams (network, system, and DB admins). Finally, I forced them to let me run TCP spray with the system shut down (or it was a parallel system, I can't recall), and then everyone said, "Ah! It is the network!"

Start at the network layer and work up, just like the TCP/IP protocol stack (or OSI stack, if you prefer). Without baselining, you are simply shooting in the dark and guessing. The fastest path to a solution is to take the time to baseline the various critical systems; in this case, the network would be the best place to start.
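
As a rough illustration of that kind of network baseline, here is a minimal Python sketch that pushes a fixed number of bytes over one TCP connection and reports the rate. It is not the TCP spray tool mentioned above; the function names are made up for this example, and it runs sender and receiver on the same host over loopback just so it is self-contained. For a real baseline, the receiving half would have to run on the NFS server side of the link.

```python
import socket
import threading
import time


def _sink(server: socket.socket, result: dict) -> None:
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server.accept()
    total = 0
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            total += len(chunk)
    result["bytes"] = total


def measure_tcp_throughput(total_bytes: int = 64 * 1024 * 1024,
                           host: str = "127.0.0.1") -> float:
    """Send total_bytes over one TCP connection; return MB/s (10**6 bytes)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))            # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=_sink, args=(server, result))
    receiver.start()

    payload = b"\0" * 65536
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(len(payload), total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()                   # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()

    if result.get("bytes") != total_bytes:
        raise RuntimeError("receiver did not see all the bytes that were sent")
    return total_bytes / elapsed / 1e6
```

Calling `measure_tcp_throughput()` returns a single MB/s figure. Run a few times on an otherwise idle link, it gives a number to compare against whatever NFS achieves on the same path.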

Cheers.
# 10  
Old 09-02-2009
Quote:
Originally Posted by Neo
A 1 Gb Ethernet connection can never approach 1 Gb/s because of the collision algorithm used by the Ethernet MAC protocol specification.
How close can it get? I regularly get 90 Mbit/s with 100baseT, but haven't seen higher than 300 Mbit/s on our gigabit lines.
# 11  
Old 09-02-2009
Quote:
Originally Posted by Corona688
How close can it get? I regularly get 90 Mbit/s with 100baseT, but haven't seen higher than 300 Mbit/s on our gigabit lines.
Well, it has been a long time since I had to do these calculations. The limits are published by IEEE (assuming point-to-point in this discussion) - I would have to Google for the numbers.
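
For a ballpark without looking anything up, the point-to-point ceiling can be worked out from the per-frame overhead. The sketch below is only that arithmetic, under stated assumptions: a full-duplex, collision-free link, standard Ethernet framing with no VLAN tag, and IPv4 plus TCP headers without options. The function name is made up for this example.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap (bytes)
IP_TCP_HEADERS = 20 + 20         # IPv4 + TCP headers, no options (bytes)


def max_tcp_goodput(link_bits_per_s: float, mtu: int = 1500) -> float:
    """Upper bound on TCP payload rate in bytes/s for a collision-free link."""
    payload = mtu - IP_TCP_HEADERS     # application bytes carried per frame
    wire_bytes = mtu + ETH_OVERHEAD    # bytes of link time consumed per frame
    return link_bits_per_s / 8 * payload / wire_bytes


gige = max_tcp_goodput(1e9)                 # standard 1500-byte MTU
gige_jumbo = max_tcp_goodput(1e9, 9000)     # 9000-byte jumbo frames
fast_eth = max_tcp_goodput(100e6)           # 100baseT
```

Under those assumptions, gigabit Ethernet tops out at roughly 118.7 MB/s of TCP payload with a 1500-byte MTU and roughly 123.9 MB/s with 9000-byte jumbo frames, and 100baseT at roughly 11.9 MB/s. Real links land below that because of host, NIC, and protocol costs.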

---------- Post updated at 18:14 ---------- Previous update was at 18:03 ----------

Also, I forgot to mention that the theoretical maximum for point-to-point Ethernet (assuming no other network devices talking on the channel) is, of course, different from the practical maximum, which depends on things like "length of cable run", "crimps in the cable", "connector losses", etc.

I once was on a site where the entire performance of the network management system was terrible and the problem was a crimped cable (I think someone rolled their chair across it in the data center, LOL)

That is why I advise focusing on the network channel(s) when you are debugging performance issues on distributed applications.
# 12  
Old 09-02-2009
Quote:
Originally Posted by Neo
Well, it has been a long time since I had to do these calculations. The limits are published by IEEE (assuming point-to-point in this discussion) - I would have to Google for the numbers.

---------- Post updated at 18:14 ---------- Previous update was at 18:03 ----------

Also, I forgot to mention that the theoretical maximum for point-to-point Ethernet (assuming no other network devices talking on the channel), is different, of course, than the practical maximum based on things like "length of cable run", "crimps in the cable" , "connector losses", etc.

I once was on a site where the entire performance of the network management system was terrible and the problem was a crimped cable (I think someone rolled their chair across it in the data center, LOL)

That is why I advise to focus on the network channel(s) when you are debugging performance issues on distributed applications.
If you really want to get into the weeds, the theoretical max bandwidth utilization of an unswitched Ethernet network is 32% of the nominal rate. The key word is "unswitched".

Once you throw switching into the equation, it's a lot easier to get higher rates. I can sustain 90+ megabytes/sec on a gigE point-to-point link, as long as it's dedicated traffic. It does take newer hardware to do that: even a not-too-old IBM x305 that I've used as a WAN emulator (WANEM: The Wide Area Network Emulator) starts falling behind at about 30 or 40 megabytes/sec. And that's a somewhat newer, albeit smaller, box than the Fujitsu PrimePower that's the subject of this discussion.

Also, what are the NFS settings? The NFS version? What are the TCP send and receive hiwat settings? Jumbo frames?

Is direct IO enabled?

What is the exact version of Solaris 9? I'd suspect it needs to be as recent as possible.

Also, was the I/O utilization of the older Fibre Channel configuration ever measured? That'd be nice to know in order to solve this problem.

Knowing the I/O utilization now would be good, too. If it's moving 3.8 Gbps NOW over NFS, it'd be hard to go much faster than that.
# 13  
Old 09-04-2009
I have worked a lot with NetApp filers and Oracle databases on Solaris.

With NFS and a 1 Gbit Ethernet connection, you should be able to reach around 50-90 MB/sec in sequential reads and writes.

Check how your NFS filer is performing. Create a large file on the NFS share with the mkfile or dd command, and check your throughput with iostat -xnpr while the file is being created.
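
If you want a number straight from the test rather than reading it off iostat, something like the following Python sketch does the same kind of sequential write and times it. It is only a stand-in for mkfile/dd, not a replacement for watching iostat; the function name and the default block size are assumptions for this example, and the path you pass should sit on the NFS share under test.

```python
import os
import time


def sequential_write_rate(path: str, total_bytes: int,
                          block_size: int = 1024 * 1024) -> float:
    """Write total_bytes to path in block_size chunks; return MB/s (10**6 bytes)."""
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(block_size, total_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())   # count the time needed to flush cached data, too
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6
```

Use a file size well beyond the host's RAM so that caching does not flatter the result, and delete the test file afterwards.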

Also check your network card with netstat -i. Do you see any errors?
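
To make that check repeatable, here is a small Python sketch that picks the non-zero error and collision counters out of netstat -i output. It assumes the usual Solaris column headings (Ierrs, Oerrs, Collis) and whitespace-separated fields. The sample text and the interface and host names in it are invented for illustration; they are not from the system in this thread.

```python
def interfaces_with_errors(netstat_output: str) -> dict:
    """Return {interface: {counter: value}} for non-zero error counters."""
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    header = lines[0].split()
    wanted = [c for c in ("Ierrs", "Oerrs", "Collis") if c in header]
    problems = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        counters = {c: int(fields[header.index(c)]) for c in wanted}
        nonzero = {c: v for c, v in counters.items() if v}
        if nonzero:
            problems[fields[0]] = nonzero
    return problems


# Invented sample in the Solaris column layout; not output from the thread.
SAMPLE = """\
Name  Mtu  Net/Dest  Address    Ipkts    Ierrs Opkts    Oerrs Collis Queue
lo0   8232 loopback  localhost  1280     0     1280     0     0      0
ce0   1500 dbhost    dbhost     9375021  12    8210334  0     0      0
"""
```

With the invented sample, `interfaces_with_errors(SAMPLE)` returns `{"ce0": {"Ierrs": 12}}`. What matters in practice is whether the counters keep climbing between two runs taken under load, not their absolute value.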


DirectIO is a good option, but it's not a game changer. Also change the rsize and wsize for NFS (64k), but that's just fine tuning.


But as the above poster stated, it would be great if you could post some numbers on how much I/O the EMC did over FC.
# 14  
Old 09-08-2009
Thanks for the feedback, everyone. There is lots to digest and compile; I will update again.