Gurus needed to diagnose severe performance degradation

09-01-2009

What version of NFS are you using?

fpmurphy

09-02-2009

A 1 Gb Ethernet connection can never approach 1 Gb/s because of the collision algorithm used by the Ethernet MAC protocol specification.

Sorry, I don't mean to sound like I am finger pointing, but you should have measured network performance, including throughput and latency, on both channels (old and new) before cutting over.

Ethernet does not perform well under heavy loads because of the way Ethernet works (aloha, collision, backoff) and when you add another protocol on top, the performance is worse.

A directly attached fiber channel should be far superior to ethernet, in this case. The only way to get past "finger pointing" is to build a baseline of the system before production. You have to know the maximum throughput and latency of the fiber channel and the same for the ethernet channel.

Then, you move into the next phase of testing (for a commercial application). Without baselining, the team is always asking for trouble because you cannot know the system constraints and bottlenecks.

Normally, the network communications channel is the bottleneck. Then, the next problem is the I/O at the network interface level. These tend to perform worse than directly attached disk IO, etc.

I once worked in NYC on a TCP/IP throughput problem where people were about to get fired over the problems with production. There was finger pointing among everyone (network, system, and DB admins). Finally, I forced them to let me run TCP spray with the system shut down (or it was a parallel system, I can't recall), and then everyone said, "Ah! It is the network!"

Start at the network layer and work up, just like the TCP/IP protocol stack (or OSI stack, if you prefer). Without baselining, you are simply shooting in the dark and guessing. The fastest path to a solution is to take the time to baseline the various critical systems; in this case, the network would be the best place to start.
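As a rough illustration of what a network baseline measures, here is a minimal Python sketch that pushes a fixed amount of data through one TCP connection and reports the receive rate. It is not the spray tool mentioned above, and the function name, the defaults, and the loopback address are assumptions for the example. In a real test the receiver would run on the other host, so that the actual cable and switch path is exercised.

```python
import socket
import threading
import time


def measure_tcp_throughput(host="127.0.0.1", total_bytes=16 * 1024 * 1024,
                           chunk=64 * 1024):
    """Send total_bytes over one TCP connection; return the receive rate in MB/s.

    Hypothetical helper for illustration. Over loopback it only measures
    the local TCP stack, not a physical network.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}

    def receiver():
        conn, _ = server.accept()
        received = 0
        start = time.perf_counter()
        while True:
            data = conn.recv(chunk)
            if not data:  # sender closed the connection
                break
            received += len(data)
        result["seconds"] = time.perf_counter() - start
        result["bytes"] = received
        conn.close()

    worker = threading.Thread(target=receiver)
    worker.start()

    payload = b"\0" * chunk
    with socket.create_connection((host, port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += len(payload)

    worker.join()
    server.close()
    return result["bytes"] / result["seconds"] / 1e6


if __name__ == "__main__":
    print(f"{measure_tcp_throughput():.1f} MB/s")
```

Running the same test before and after a change, and at quiet and busy times, gives the baseline numbers that the later tuning questions depend on.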

Cheers.

Neo

09-02-2009

Quote:

Originally Posted by Neo

A 1 Gb Ethernet connection can never approach 1 Gb/s because of the collision algorithm used by the Ethernet MAC protocol specification.

How close can it get? I regularly get 90 Mb/s with 100baseT, but haven't seen higher than 300 Mb/s on our gigabit lines.

Corona688

09-02-2009

Quote:

Originally Posted by Corona688

How close can it get? I regularly get 90 Mb/s with 100baseT, but haven't seen higher than 300 Mb/s on our gigabit lines.

Well, it has been a long time since I had to do these calculations. The limits are published by IEEE (assuming point-to-point in this discussion) - I would have to Google for the numbers.

Also, I forgot to mention that the theoretical maximum for point-to-point Ethernet (assuming no other network devices talking on the channel), is different, of course, than the practical maximum based on things like "length of cable run", "crimps in the cable" , "connector losses", etc.

I once was on a site where the entire performance of the network management system was terrible and the problem was a crimped cable (I think someone rolled their chair across it in the data center, LOL)

That is why I advise focusing on the network channel(s) when you are debugging performance issues on distributed applications.

Neo

09-02-2009

If you really want to get into the weeds, the theoretical max bandwidth utilization of an unswitched Ethernet network is 32% of the nominal rate. The key word is "unswitched".

Once you throw switching into the equation, it's a lot easier to get higher rates. I can sustain 90+ megabytes/sec on a gigE point-to-point link, as long as it's dedicated traffic. It takes newer hardware to do that, though: even a not-too-old IBM x305 that I've used as a WAN emulator (WANem, the Wide Area Network Emulator) starts falling behind at about 30 or 40 megabytes/sec. And that's a somewhat newer, albeit smaller, box than the Fujitsu PrimePower that's the subject of this discussion.
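For a sense of the ceiling on a dedicated full-duplex link, the framing overhead can be worked out directly. The sketch below assumes standard Ethernet framing (8 bytes preamble and start delimiter, 14 bytes header, 4 bytes FCS, 12 bytes inter-frame gap) and plain IPv4 and TCP headers of 20 bytes each with no options; the function name is made up for the example.

```python
def max_tcp_payload_rate(link_bps=1_000_000_000, mtu=1500):
    """Return (efficiency, MB/s) for TCP payload on a full-duplex Ethernet link.

    Assumes no VLAN tag, no IP or TCP options, and no retransmissions.
    """
    # Bytes on the wire per frame in addition to the IP packet itself:
    # preamble + SFD (8), Ethernet header (14), FCS (4), inter-frame gap (12).
    wire_overhead = 8 + 14 + 4 + 12
    # IPv4 header (20) + TCP header (20) come out of the MTU.
    payload = mtu - 40
    efficiency = payload / (mtu + wire_overhead)
    return efficiency, link_bps * efficiency / 8 / 1e6


if __name__ == "__main__":
    for mtu in (1500, 9000):
        eff, rate = max_tcp_payload_rate(mtu=mtu)
        print(f"MTU {mtu}: {eff:.1%} efficient, about {rate:.1f} MB/s")
```

With a 1500-byte MTU this works out to roughly 95% efficiency, a little under 119 MB/s on gigabit, so 90+ MB/s sustained is within reach of the wire limit. Jumbo frames raise the efficiency to about 99%.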

Also, what are the NFS settings? The NFS version? What are the TCP send and receive hiwat settings? Jumbo frames?
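The reason the send and receive high-water marks matter: a single TCP connection cannot move more than one window of data per round trip. A small sketch of that arithmetic follows; the function names are hypothetical and the 48 KB window and 1 ms round-trip time are illustrative values, not measurements from this system.

```python
def bandwidth_delay_product(link_bps, rtt_seconds):
    """Bytes that must be in flight to keep the link full."""
    return link_bps * rtt_seconds / 8


def window_limited_rate(window_bytes, rtt_seconds):
    """Upper bound in MB/s for one TCP connection with the given window."""
    return window_bytes / rtt_seconds / 1e6


if __name__ == "__main__":
    # Illustrative values only: gigabit link, 1 ms round trip, 48 KB window.
    print(bandwidth_delay_product(1_000_000_000, 0.001), "bytes needed in flight")
    print(window_limited_rate(48 * 1024, 0.001), "MB/s ceiling with a 48 KB window")
```

If the configured window is smaller than the bandwidth-delay product, the connection is window-limited no matter how fast the link is.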

Is direct IO enabled?

What is the exact version of Solaris 9? I'd suspect it needs to be as recent as possible.

Also, was the IO utilization of the older fiber channel configuration ever measured? That'd be nice to know in order to solve this problem.

Knowing the IO utilization now would be good, too. If it's moving 3.8 Gbps now over NFS, it'd be hard to go much faster than that.

achenle

09-04-2009

I have worked a lot with NetApp filers and Oracle databases on Solaris.

With NFS and a 1 Gbit Ethernet connection you should be able to reach around 50-90 MB/s in sequential reads/writes.

Check how your NFS filer is performing. Create a large file on the NFS share with the mkfile or dd command and check your throughput with iostat -xnpr while the file is being created.
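If you want a number straight from the test rather than reading it off iostat, a timed sequential write does much the same job as dd. This is a minimal sketch, assuming a hypothetical helper name and a path you choose on the NFS mount; the fsync is there so the timing includes flushing the data out of the client cache.

```python
import os
import time


def write_throughput(path, size_mb=1024, block_kb=1024):
    """Write size_mb of zeros to path sequentially; return MB/s.

    Hypothetical helper for illustration. Point path at a file on the
    share being tested, and remove the file afterwards.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = size_mb * 1024 // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the client cache
    elapsed = time.perf_counter() - start
    return blocks * len(block) / elapsed / 1e6
```

Run it a few times and compare against the same test on local disk, so you know how much of the result is the network and filer.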

Also check your network card with netstat -i. Do you see any errors?
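To check this repeatedly, for example before and after a load test, the error counters can be pulled out of the netstat -i output with a few lines of Python. The sketch assumes Solaris-style column names (Name, Ierrs, Oerrs, Collis); other systems label the columns differently. The sample text is made up for the example, not output from the system in this thread.

```python
def interfaces_with_errors(netstat_output):
    """Return {interface: {counter: value}} for nonzero error counters.

    Assumes a header row containing Name and some of Ierrs, Oerrs, Collis.
    """
    lines = [ln for ln in netstat_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    header = lines[0].split()
    problems = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue  # skip rows that do not line up with the header
        row = dict(zip(header, fields))
        bad = {}
        for key in ("Ierrs", "Oerrs", "Collis"):
            value = row.get(key, "")
            if value.isdigit() and int(value) > 0:
                bad[key] = int(value)
        if bad:
            problems[row["Name"]] = bad
    return problems


# Made-up sample in the assumed format.
SAMPLE = """\
Name  Mtu  Net/Dest  Address    Ipkts   Ierrs Opkts   Oerrs Collis Queue
lo0   8232 loopback  localhost  1200    0     1200    0     0      0
bge0  1500 dbhost    dbhost     984512  37    871203  0     12     0
"""

if __name__ == "__main__":
    print(interfaces_with_errors(SAMPLE))
```

Counters that keep climbing between two runs point at cabling, duplex mismatch, or a bad port more than at NFS tuning.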

Direct I/O is a good option, but it's not a game changer. Also change the NFS rsize and wsize (to 64k), but that's just fine-tuning.

But as the poster above stated, it would be great if you could post some numbers on how much IO the EMC did over FC.

s93366

09-08-2009

Thanks for the feedback, everyone. There is a lot to digest and compile; I will update again.

DBA_guy
