Performance investigation, very high runq-sz %runocc


 
# 1  
Old 03-23-2011



I've just been handed a hot potato by a colleague who left... our client has been complaining about slow performance on one of our servers.
I'm not very experienced in investigating performance issues, so I'm hoping someone will be kind enough to provide some guidance.

Here is an overview of the system:

- Running Solaris 10 SPARC with multiple Sybase instances and apps (Java, Perl, financial software)

- Kernel version: Generic_142900-13
Code:
$ uptime
1:23pm  up 13 day(s), 17:34,  19 users,  load average: 21.75, 22.65, 25.14

Huge amount of memory & CPUs:
# prtdiag -v
System Configuration:  Sun Microsystems  sun4u Sun Fire E25K
System clock frequency: 150 MHz
Memory size: 163840 Megabytes

========================= CPUs =========================

          CPU      Run    E$    CPU     CPU
Slot ID   ID       MHz    MB   Impl.    Mask
--------  -------  ----  ----  -------  ----
/SB00/P0    0,  4  1800  32.0  US-IV+   2.2
/SB00/P1    1,  5  1800  32.0  US-IV+   2.2
/SB00/P2    2,  6  1800  32.0  US-IV+   2.2
/SB00/P3    3,  7  1800  32.0  US-IV+   2.2
/SB01/P0   32, 36  1350  16.0  US-IV    3.1
/SB01/P1   33, 37  1350  16.0  US-IV    3.1
/SB01/P2   34, 38  1350  16.0  US-IV    3.1
/SB01/P3   35, 39  1350  16.0  US-IV    3.1
/SB04/P0  128,132  1800  32.0  US-IV+   2.2
/SB04/P1  129,133  1800  32.0  US-IV+   2.2
/SB04/P2  130,134  1800  32.0  US-IV+   2.2
/SB04/P3  131,135  1800  32.0  US-IV+   2.2
/SB05/P0  160,164  1800  32.0  US-IV+   2.2
/SB05/P1  161,165  1800  32.0  US-IV+   2.2
/SB05/P2  162,166  1800  32.0  US-IV+   2.2
/SB05/P3  163,167  1800  32.0  US-IV+   2.2
/SB08/P0  256,260  1350  16.0  US-IV    3.1
/SB08/P1  257,261  1350  16.0  US-IV    3.1
/SB08/P2  258,262  1350  16.0  US-IV    3.1
/SB08/P3  259,263  1350  16.0  US-IV    3.1

But even with all that CPU power, the system still seems to be choking:
# sar -q

SunOS aubbwsyd01 5.10 Generic_142900-13 sun4u    03/24/2011

00:00:01 runq-sz %runocc swpq-sz %swpocc
00:05:02    26.4      72     0.0       0
00:10:02    25.9      71     0.0       0
00:15:02    27.4      73     0.0       0
00:20:01    27.3      62     0.0       0
00:25:01    25.5      66     0.0       0
00:30:02    26.9      75     0.0       0
00:35:01    36.1      60     0.0       0
00:40:02    28.5      64     0.0       0
00:45:01    30.6      58     0.0       0
00:50:02    30.0      64     0.0       0
00:55:02    30.4      59     0.0       0
01:00:02    26.7      64     0.0       0
...
12:45:02    29.5      78     0.0       0
12:50:01    27.4      90     0.0       0
12:55:01    29.7      79     0.0       0
13:00:03    30.7      76     0.0       0
13:05:01    30.4      86     0.0       0
13:10:03    34.6      81     0.0       0
13:15:01    26.8      84     0.0       0
13:20:02    30.4      77     0.0       0
13:25:01    31.6      72     0.0       0

Average     29.5      69     0.0       0
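
(If it helps to line these numbers up with the complaint times, the same stats can be replayed for a narrower window from the daily sar archive. A sketch; the sa file name matches the day of month, so adjust the path for your system:)
Code:
# Replay run-queue stats for just the midday slow window
sar -q -f /var/adm/sa/sa24 -s 12:00 -e 14:00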

# sar -r

SunOS aubbwsyd01 5.10 Generic_142900-13 sun4u    03/24/2011

00:00:01 freemem freeswap
00:05:02  586184 110438515
00:10:02  562080 113580170
00:15:02  547328 111934356
00:20:01  577790 111795786
00:25:01  597018 112950564
00:30:02  630584 110620673
00:35:01  649792 113179258
00:40:02  662950 110557264
00:45:01  658017 113512159
00:50:02  633167 110902038
00:55:02  644952 113924963
01:00:02  610516 112041306
...
12:45:02  348721 97869521
12:50:01  340880 96804395
12:55:01  339169 98490899
13:00:03  327440 99308450
13:05:01  336337 97280372
13:10:03  341150 99300626
13:15:01  345920 98246498
13:20:02  369102 99563900
13:25:01  387421 99101277

Average   627886 118480917
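
(Note that sar reports freemem in pages, not bytes. A quick conversion sketch for the midday figures, assuming the 8 KB base page size of UltraSPARC:)
Code:
# Base page size on this hardware
$ pagesize
8192
# ~340,000 free pages around 12:50:
$ echo '340880 * 8192 / 1048576' | bc
2663
# i.e. about 2.6 GB free of 160 GB, the same ballpark as top's "3221M free mem"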

# mpstat 5 2
... (2nd iteration below)
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0 2152   1 26484   926  336 1593  276  649  976    5 14654   34  44   0  22
  1 2056   1 32114   796  285 1322  254  597  958    9 16580   38  43   0  20
  2 1715   1 25972   888  323 1578  262  602  822    3 22862   33  46   0  21
  3 1706   2 29307   724  279 1183  197  515  820    6 19937   40  39   0  21
  4 1378   0 25992   816  313 1464  211  564  779    1 16577   43  35   0  22
  5 1587   1 28487   808  302 1420  237  571  930    5 20051   31  48   0  21
  6 1429   1 19215   765  286 1338  207  521  830    3 21779   38  39   0  24
  7 1547   0 22940   801  293 1497  234  557  820    2 19536   35  44   0  22
 32 1217   2 15876  1314  641 1125  287  555  574    3  5699   31  57   0  12
 33 1304   3 23066   870  303 1469  307  664  603    3  7398   38  47   0  15
 34 1459   1 25564   951  337 1565  330  691  660    3  8834   32  51   0  16
 35 1282   2 22116   898  340 1565  280  633  585    3  7867   36  47   0  17
 36 1255   1 20946   802  286 1296  285  583  567    3  9369   30  61   0   9
 37 1348   0 23823   813  297 1426  260  581  601    3  7670   32  51   0  17
 38 1028   1 21024   810  296 1434  258  588  551    4  6874   32  51   0  17
 39 1065   1 21564   706  270 1321  192  512  771    1  7690   36  47   0  17
128 1517   1 25091  1059  375 1535  371  733  860    2 27353   41  44   0  16
129 1707   1 27668   927  334 1448  308  673  823    2 20142   39  44   0  17
130 1376   2 23294   866  318 1349  282  624  745    3 26822   37  46   0  17
131 1238   4 20804   895  322 1425  325  610  744    3 32165   46  39   0  15
132 1169   1 24721   780  283 1264  262  535  798    3 31841   47  39   0  14
133 1339   0 20148   789  289 1202  256  537  928    1 30757   46  41   0  13
134 1134   2 21571   862  315 1372  279  587  812    2 32827   46  38   0  16
135 1296   2 19052   898  331 1437  293  601  680    2 28036   43  39   0  18
160 1151   0 20643   730  241 1027  292  470 1065    3 57836   57  36   0   8
161 1094   0 13299   848  297 1188  323  473 1050    3 58257   45  46   0  10
162 1245   0 15682   923  330 1221  370  477  778    3 53849   49  42   0   9
163  927   0 9607   845  297 1145  370  423  678    2 69122   55  39   0   6
164  560   0 14091  4496 4033 1016  276  380  515    2 50642   50  42   0   9
165  675   0 18376  1595 1135 1002  259  377  662    2 62744   52  36   0  12
166  593   0 9206   901  331 1215  375  421  529    2 81789   59  33   0   8
167  838   0 24495   733  267  958  279  361  566    2 54789   53  35   0  12
256 1409   4 20748   878  309 1192  282  560  546    3 17693   36  49   0  16
257 1363   4 19532   848  298 1201  305  522  566    3 24880   39  48   0  13
258 1252   2 27165   865  322 1192  267  507  644    5 27032   32  52   0  15
259 1089   0 18189   902  379 1211  252  480  490    2 26119   36  47   0  17
260 1249   4 19819  1018  397 1508  303  570  468    3 28197   34  45   0  21
261 1081   6 18595   807  326  985  241  447  490    2 29507   34  51   0  14
262 1065   3 16197   882  351 1290  251  478  471    2 32525   33  48   0  19
263 1095   2 21474  1308  791 1218  237  477  562    3 26501   32  49   0  19
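
(A side note on the mpstat output: the smtx column (spins on kernel mutexes) and the xcal cross-call counts look fairly high across the board. If that turns out to matter, lockstat can show which kernel locks are contended. A sketch; it needs root and adds some probe overhead while it runs:)
Code:
# Profile kernel lock contention events for 5 seconds, top 10 per type
lockstat -C -D 10 sleep 5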

# top
last pid: 13141;  load avg:  24.9,  25.3,  25.3;       up 13+17:46:55                                                  13:36:00
1399 processes: 1382 sleeping, 1 running, 1 zombie, 15 on cpu
CPU states: 56.7% idle, 24.5% user, 18.8% kernel,  0.0% iowait,  0.0% swap
Memory: 160G phys mem, 3221M free mem, 281G swap, 276G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
  3035 sybdev   176   0    0   12G   12G cpu    115.2H   122% dataserver
 15934 sybase   264   0    0   12G   12G cpu    143.8H   108% dataserver
 15440 sybase   264   0    0   12G   12G cpu    170.9H 98.04% dataserver
  5436 sybdev   158   0    0   12G   12G cpu    195.5H 97.95% dataserver
 15932 sybase   264   0    0   12G   12G cpu     50.0H 97.94% dataserver
  2860 sybdev   264   0    0   12G   12G cpu    186.6H 88.24% dataserver
 15955 sybase   264   0    0   12G   12G cpu     26.6H 79.29% dataserver
 15966 sybase   264   4    0   12G   12G sleep   34.4H 59.64% dataserver
  2902 sybdev   264   0    0   12G   12G cpu    101.1H 59.48% dataserver
 15937 sybase   264   0    0   12G   12G cpu    140.2H 41.35% dataserver
 19421 appdev   1   0    0  443M  411M sleep  836:14 33.02% perl
 12074 appdev 999  59    0 3002M 2817M sleep   33.3H 31.77% java
 24636 appdev 999  59    0  485M  432M sleep   18:12 31.40% java
 27539 appdev   1   0    0 1843M 1655M cpu     46.6H 29.13% perl
 10297 appdev   1   0    2   39M   19M cpu    104:16 28.15% perl


So I just can't figure out where these huge run queues are coming from... can someone please tell me what I'm missing, or what the next thing to check would be?
Maybe it's staring me right in the face, but I just don't see it.

Many thanks in advance!!

# 2  
Old 03-28-2011
Does anyone have any ideas?
# 3  
Old 03-28-2011
The runq-sz isn't that high, given the number of cores (40).
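A quick way to sanity-check that: each UltraSPARC-IV/IV+ module in your prtdiag output carries two cores, and the scheduler treats each core as one virtual CPU, so the 20 modules give 40 vCPUs. A sketch:
Code:
# Count the virtual CPUs the dispatcher can use
$ psrinfo | wc -l
40
# runq-sz of ~30 spread over 40 vCPUs is under one runnable thread per CPU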
# 4  
Old 03-29-2011
Forget "top". That's inaccurate enough on a Solaris box with just a few CPUs.

What's "prstat -a" and "vmstat 2 20" show when the machine is slow?

How about "iostat -sndxz 2 20"?

FWIW, there appears to be plenty of CPU available.
# 5  
Old 04-04-2011
Thanks for your replies.

Achenle, here are the outputs of "prstat -a", "vmstat 2 20" and "iostat -sndxz 2 20".
And yep, I think the previous 'solution' to these performance issues was the "throw as much CPU at it and hope that resolves it" approach... which clearly didn't have the hoped-for effect.
Code:
root # prstat -a
   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
  2991 sybbck     12G   12G cpu162   0    0 181:17:47 2.9% dataserver/264
  2902 sybbck     12G   12G cpu164   0    0 201:44:38 2.9% dataserver/264
  3035 sybbck     12G   12G cpu260   0    0 217:30:41 2.8% dataserver/264
 15937 sybase     12G   12G cpu2     0    0 271:43:48 2.8% dataserver/264
  2931 sybbck     12G   12G cpu34    0    0 247:58:39 2.7% dataserver/264
 15950 sybase     12G   12G cpu135   0    0  70:47:34 2.6% dataserver/264
 15971 sybase     12G   12G cpu163   0    0 152:57:03 2.6% dataserver/264
 15934 sybase     12G   12G cpu128   0    0 294:14:32 2.6% dataserver/264
 15966 sybase     12G   12G cpu166   0    0  76:18:56 2.5% dataserver/264
 15970 sybase     12G   12G cpu167   0    0  78:03:39 2.5% dataserver/264
 15932 sybase     12G   12G cpu161   0    0 113:41:30 2.5% dataserver/264
 15955 sybase     12G   12G cpu131   0    0  71:25:05 2.5% dataserver/264
  2860 sybbck     12G   12G cpu5     0    0 363:30:51 2.5% dataserver/264
  3010 sybbck     12G   12G cpu129   0    0 280:51:08 2.4% dataserver/264
  5436 sybbck     12G   12G cpu258   0    0 374:04:30 2.3% dataserver/264
  2816 sybbck     12G   12G cpu132   0    0 258:57:25 2.3% dataserver/264
 15925 sybase     12G   12G cpu256   0    0 172:25:59 2.3% dataserver/264
 15440 sybase     12G   12G cpu164   0    0 332:39:20 2.2% dataserver/264
 15958 sybase     12G   12G sleep   41    0  58:39:32 1.9% dataserver/264
 15902 sybase     12G   12G sleep   41    0 160:19:43 1.8% dataserver/264
 21065 root     6488K 5112K cpu165  40    0   0:00:40 1.6% prstat/1
  6146 appdev  503M  423M sleep    0    0   0:10:34 1.1% appdevljv/24
 19598 appdev 2955M 2834M cpu3    31    0  83:58:38 1.0% java/2562
 23910 appdev 3091M 3011M sleep    0    0  19:52:02 0.7% perl/1
 25934 appdev 1011M  951M sleep   20    0  28:47:55 0.4% perl/1
  3501 appdev 1422M 1317M sleep   59    0 114:32:34 0.4% java/2872
 13726 appdev   99M   71M sleep   24    2   1:22:42 0.4% perl/1
  3828 appdev  589M  534M sleep   59    0  21:18:43 0.4% java/1785
 19854 appdev 2818M 2566M sleep   53    2   1:44:23 0.4% java/1919
  7581 daemon   3088K  888K sleep   60  -20  98:58:26 0.4% nfsd/16
 NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
   100 sybase     24G   24G    15% 1901:53:0  29%
    18 sybbck     12G   12G   7.7% 2125:59:0  21%
  1155 appdev  314G   95G    60% 1256:43:3  12%
   106 root      398M  275M   0.2%  24:51:47 1.6%
     7 daemon     15M   14M   0.0% 101:43:05 0.4%
    40 document      13G 2140M   1.3%  16:57:59 0.1%
     4 devuser2 5384K 9480K   0.0%   0:00:02 0.0%
     1 noaccess   86M   83M   0.1%   0:55:05 0.0%
     6 devuser1 7032K   10M   0.0%   0:00:11 0.0%
     1 tcm  512K 2472K   0.0%   0:02:40 0.0%
    17 mqm       125M   58M   0.0%   2:05:31 0.0%
     1 smmsp    2344K 6448K   0.0%   0:00:43 0.0%
Total: 1471 processes, 60965 lwps, load averages: 42.38, 45.79, 45.04
root #

Moderator's comment: output truncated.
# 6  
Old 04-06-2011
If they're seeing end-user performance issues, I'd have the Sybase DBAs check the databases as well and make sure they don't need some tuning.
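On the ASE side, sp_sysmon is the usual first look. A sketch, with the server name and the sa login as placeholders:
Code:
# Connect to the instance (server name and login are placeholders)
isql -Usa -SSYBASE_SERVER
# Then sample ASE internals for one minute and print the activity report:
1> sp_sysmon "00:01:00"
2> go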
# 7  
Old 04-07-2011
None of your disks seems to be all that busy; the service times are all really good. The vmstat output doesn't look exceptional either.

Yet the Sybase DB processes are pegging their CPUs.

Looks like you could use some serious database tuning.
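
One more check that would separate "busy" from "waiting": prstat's microstate accounting shows, per thread, how much time goes to real work (USR/SYS), to lock waits (LCK), and to sitting runnable on the dispatch queue (LAT). A sketch:
Code:
# Per-LWP microstates, top 20 lines, 5-second samples, 5 iterations
prstat -mL -n 20 5 5
# High LAT -> threads queued for a CPU (genuine run-queue pressure)
# High LCK -> threads blocked on locks (points back at app/DB tuning)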