Solaris 11.1 Slow Network Performance


 
# 1  
Old 06-25-2013

I have identical M5000 machines that need to transfer very large amounts of data between them. These are fully loaded machines, and I've already checked I/O, memory usage, etc. I get poor network performance even when the machines are idle or copying via loopback. The 10 Gb NICs are set up in aggregates, so we should be seeing some serious performance out of these. However, I'm only seeing about 30 MB/sec. These are brand-new, fully patched Solaris 11.1 installs.

I'm getting 30 MB/sec no matter which connection I use: 1 Gb or 10 Gb, physical, VLAN, etc.

Code:
richardc@SERVERX:/kernel/drv$ dladm show-phys
LINK              MEDIA                STATE      SPEED  DUPLEX    DEVICE
net1              Ethernet             unknown    0      unknown   bge1
net6              Ethernet             up         10000  full      ixgbe0
net0              Ethernet             unknown    0      unknown   bge0
net4              Ethernet             up         1000   full      nxge2
net2              Ethernet             unknown    0      unknown   nxge0
net5              Ethernet             unknown    0      unknown   nxge3
net7              Ethernet             up         10000  full      ixgbe1
net3              Ethernet             up         1000   full      nxge1
richardc@SERVERX:/kernel/drv$ dladm
LINK                CLASS     MTU    STATE    OVER
net1                phys      1500   unknown  --
net6                phys      1500   up       --
net0                phys      1500   unknown  --
net4                phys      1500   up       --
net2                phys      1500   unknown  --
net5                phys      1500   unknown  --
net7                phys      1500   up       --
net3                phys      1500   up       --
prim1               aggr      1500   up       net3 net4
bkup1               aggr      1500   up       net6 net7
rep1                vlan      1500   up       bkup1
richardc@SERVERX:/kernel/drv$ ipadm
NAME              CLASS/TYPE STATE        UNDER      ADDR
bkup1             ip         ok           --         --
   bkup1/v4       static     ok           --         XXX.XX.235.4/22
lo0               loopback   ok           --         --
   lo0/v4         static     ok           --         127.0.0.1/8
prim1             ip         ok           --         --
   prim1/v4       static     ok           --         XXX.XX.225.244/22
rep1              ip         ok           --         --
   rep1/v4        static     ok           --         XXX.XX.198.176/25
sppp0             ip         ok           --         --
richardc@SERVERX:/kernel/drv$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
prim1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet XXX.XX.225.244 netmask fffffc00 broadcast XXX.XX.227.255
bkup1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
        inet XXX.XX.235.4 netmask fffffc00 broadcast XXX.XX.235.255
rep1: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 4
        inet XXX.XX.198.176 netmask ffffff80 broadcast XXX.XX.198.255
sppp0: flags=10010008d1<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST,IPv4,FIXEDMTU> mtu 1500 index 5
        inet 10.1.1.3 --> 10.1.1.1 netmask ff000000
lo0: flags=2002000848<LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::/128
prim1: flags=20002000840<RUNNING,MULTICAST,IPv6> mtu 1500 index 2
        inet6 ::/0
bkup1: flags=20002000840<RUNNING,MULTICAST,IPv6> mtu 1500 index 3
        inet6 ::/0
rep1: flags=20202000840<RUNNING,MULTICAST,IPv6,CoS> mtu 1500 index 4
        inet6 ::/0
richardc@SERVERX:/kernel/drv$

For example, an scp of a roughly 200 MB file takes about 8 seconds and runs at about 30 MB/sec:

Code:
root@SERVERX:/var/tmp# dlstat -rt -i 1 | grep rep
           rep1  545.27K  314.76M  177.00K   32.18M
           rep1        0        0        0        0
           rep1        0        0        0        0
           rep1        0        0        0        0
           rep1        6      376        0        0
           rep1    1.45K  110.42K      919   15.09M
           rep1    2.84K  215.39K    1.80K   29.69M
           rep1    3.03K  229.36K    1.92K   31.66M
           rep1    3.03K  229.50K    1.69K   31.45M
           rep1    2.92K  220.44K    1.50K   30.17M
           rep1    2.97K  225.20K    1.74K   30.96M
           rep1    2.93K  221.15K    1.34K   30.25M
           rep1    2.89K  218.22K    1.35K   29.67M
           rep1      789   59.82K      306    8.01M
           rep1        0        0        0        0
           rep1        0        0        0        0
           rep1        0        0        0        0


I figured maybe it had something to do with the network, but I see this slow speed no matter which interface I use, even localhost. I tried a local scp back to the same machine, and it takes 8 seconds there too, even though it never touches the network cards. With a plain cp, the same file copies in less than a second.

Code:
richardc@SERVERX:/var/tmp$ scp esf40_20111108.iso 127.0.0.1:/var/tmp/test
Password:
esf40_20111108.iso   100% |************************************************************************************|   224 MB    00:08
richardc@SERVERX:/var/tmp$
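
As a quick sanity check on the figures above, the loopback scp rate can be worked out from the reported size and time and compared with the nominal rate of a 10 Gb link. This is only a rough sketch: the variable names are made up for illustration, and the link figure ignores protocol overhead.

```shell
#!/bin/sh
# Rough throughput check using the figures from the scp output above.
size_mb=224      # file size reported by scp, in MB
elapsed_s=8      # transfer time reported by scp, in seconds

# Integer division is close enough for a ballpark figure.
rate=$((size_mb / elapsed_s))
echo "scp throughput: ${rate} MB/sec"

# dladm show-phys reports 10000 Mb/s for the ixgbe links; 8 bits per byte.
link_mb=$((10000 / 8))
echo "nominal 10 Gb link: ${link_mb} MB/sec"
```

The result lines up with the roughly 30 MB/sec that dlstat shows on rep1, and it is a small fraction of what even one 10 Gb link could nominally carry.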


Does anyone have any ideas on how to improve the performance? I figure I can't be the only one who has run into this, but my Google searches haven't turned up anything.
# 2  
Old 06-25-2013
i used to see this kind of issue when the network device was set to auto-negotiate the speed/duplex ... see if editing the config files of the relevant interfaces (e.g., /kernel/drv/prim1.conf) to eliminate auto-negotiation helps ...
# 3  
Old 06-25-2013
But 127.0.0.1 shouldn't be affected by auto-negotiation, right? In our environment the standard is auto on both the switches and servers.
# 4  
Old 06-25-2013
you may be right about the loopback but i cannot say for sure as i do not have access to that kind of machine -- maybe somebody else here can explain further ...

if your other solaris servers are able to send big files much faster, you may also want to compare the configurations of their network devices and of their ports on the switch ...

btw, i had my network group actually hard set the affected ports on the switch to run 100/full so my servers' network ports were not continually auto-negotiating ... the standard at that company was also to set every network port to auto but they had to make an exception for my production servers ...

if your network group balks at the request to hard set the ports, call up oracle support to see if they have better ideas if nobody else here has one ...
# 5  
Old 06-25-2013
You're right, I may have to turn to support. There are just so many possible factors that they often seem to point the finger at "the other guy..."

I did notice one interesting thing. When I open multiple scp sessions, throughput jumps to 60 MB/sec with two transfers, 90 MB/sec with three, etc. CPU and memory never cap out.
# 6  
Old 06-25-2013
apparently what you are seeing with scp performance on solaris 11 is by design according to this oracle document ...

anyways, see if the script described here helps out with your issue ... if it does, please post the fix to your problem here so everybody learns ...
# 7  
Old 06-27-2013
The problem turned out to be LACP. The aggregates weren't communicating properly as LACP pairs, both because of a configuration problem on the switch and because the servers didn't have passive mode enabled. After I fixed the server end and the network guys fixed the switch end, the speeds went up dramatically.
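
For reference, here is a minimal sketch of what the server-side change might look like, assuming it is made with `dladm modify-aggr` and checked with `dladm show-aggr -L`. The exact commands used for the fix are not given in this thread. The aggregation names come from the dladm output in the first post, and including prim1 is an assumption. The `run` and `lacp_plan` helpers and the `APPLY` variable are hypothetical: by default the script only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: print (or, with APPLY=1, run) dladm commands that set LACP
# passive mode on the aggregations named in this thread.
APPLY=${APPLY:-0}

# Hypothetical helper: run the command if APPLY=1, otherwise just print it.
run() {
    if [ "$APPLY" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper: the per-aggregation steps.
lacp_plan() {
    for aggr in bkup1 prim1; do
        run dladm modify-aggr -L passive "$aggr"   # set LACP mode to passive
        run dladm show-aggr -L "$aggr"             # show LACP state for each port
    done
}

plan=$(lacp_plan)
echo "$plan"
```

Running it as-is prints the four commands. Running it with `APPLY=1` on the server (with sufficient privileges) would execute them. The switch ports still have to be configured to match, which in this case was handled by the network team.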