Causes of high runq-sz and cswch/s output from sar


 
# 1  
Old 09-30-2008

Hi folks,

I'm running RHEL4 (2.6.9, 64-bit) on a 4-CPU dual-core Xeon server. The server runs a DB2 database. I've been getting the following readings from sar over the past week:

Code:
09:35:01 AM   cswch/s
09:40:01 AM   4774.95
09:45:01 AM  27342.76
09:50:02 AM 196015.02
09:55:01 AM 337021.92
10:00:01 AM 347007.79
10:05:01 AM 309210.99
10:10:01 AM 308174.09
10:15:01 AM 350074.07
10:20:01 AM 350716.36
10:25:01 AM 329279.95
10:30:02 AM 319551.01
10:35:01 AM 312952.02
10:40:01 AM 130142.16
10:45:01 AM   6056.06
10:50:01 AM   5131.25

Code:
09:35:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:40:01 AM         5      1156      7.31      6.63      6.69
09:45:01 AM        29      1190     29.53     15.30      9.89
09:50:02 AM        45      1298     58.97     39.22     21.26
09:55:01 AM       117      1323     85.29     67.64     38.24
10:00:01 AM        71      1332     70.71     76.38     50.85
10:05:01 AM        96      1286     58.03     65.37     53.27
10:10:01 AM       105      1316     77.59     70.25     58.42
10:15:01 AM       136      1334     97.88     82.74     66.72
10:20:01 AM        70      1308     80.77     84.55     72.34
10:25:01 AM        86      1308     77.55     81.79     74.77
10:30:02 AM       177      1327    121.49    100.03     83.81
10:35:01 AM        95      1334    135.49    118.93     95.86
10:40:01 AM        10      1212     14.40     71.86     84.11

The context switching and run queue length values stick out like a sore thumb. However, the CPU load at that time is normal, although slightly elevated:

Code:
                  CPU     %user     %nice   %system   %iowait     %idle
09:40:01 AM       all     58.65      0.00      1.81      0.16     39.38
09:45:01 AM       all     72.99      0.00      2.44      0.27     24.30
09:50:02 AM       all     53.78      0.00      6.21      1.19     38.82
09:55:01 AM       all     45.99      0.00      9.12      0.75     44.14
10:00:01 AM       all     46.96      0.00      9.25      0.76     43.02
10:05:01 AM       all     42.43      0.00      9.36      2.05     46.16
10:10:01 AM       all     47.45      0.00      8.60      0.38     43.56
10:15:01 AM       all     45.64      0.00      9.17      0.34     44.84
10:20:01 AM       all     44.27      0.00      8.69      0.62     46.43
10:25:01 AM       all     45.23      0.00      9.04      0.73     45.00
10:30:02 AM       all     57.97      0.00      9.27      0.08     32.68
10:35:01 AM       all     61.81      0.00      9.62      0.09     28.48
10:40:01 AM       all     66.45      0.00      6.24      0.21     27.11
10:45:01 AM       all     45.15      0.00      2.41      0.28     52.16
10:50:01 AM       all     33.64      0.00      2.05      0.23     64.08
10:55:01 AM       all     39.24      0.00      3.27      0.27     57.22
11:00:01 AM       all     49.34      0.00      4.09      0.22     46.35
11:05:01 AM       all     42.20      0.00      2.45      0.32     55.02
11:10:01 AM       all     35.33      0.00      2.14      0.20     62.33
11:15:01 AM       all     33.07      0.00      2.31      0.24     64.38
11:20:01 AM       all     35.54      0.00      4.20      0.21     60.05
11:25:01 AM       all     34.20      0.00      2.02      0.16     63.61
11:30:01 AM       all     31.26      0.00      1.79      0.17     66.78
11:35:01 AM       all     36.21      0.00      2.88      0.14     60.76

I've checked for long-running queries, but there don't seem to be any. My question is: how can I find the cause of the high system usage? Is it possible to pinpoint the process(es) causing the issue?


Thanks!
# 2  
Old 09-30-2008
It's probably I/O related.

Does "ps aux" show any processes in the "D" state?
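A rough sketch of one way to check: the snippet below filters ps output down to processes in uninterruptible sleep (state "D"). The function name list_dstate is made up for illustration; it only assumes input lines that start with the state letter, which is what "ps -eo state,pid,comm" prints.

```shell
#!/bin/sh
# list_dstate: hypothetical helper. Reads "STATE PID COMMAND" lines on stdin
# and prints only those whose state starts with D (uninterruptible sleep).
list_dstate() {
    awk '$1 ~ /^D/ { print }'
}

# Usage on a live system:
#   ps -eo state,pid,comm | list_dstate
```

D states are often short-lived, so it is probably worth taking several samples while the spike is happening rather than relying on a single run.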
# 3  
Old 09-30-2008
I don't know about processes in the "D" state. I'll have to run top in batch mode to find out when it happens again. Is context switching related to I/O?
# 4  
Old 09-30-2008
I/O may cause a process to lose its current context, i.e. trigger a context switch: a process that blocks waiting for I/O gives up the CPU so that another one can run.

What I/O scheduler are you using: AS, CFQ?

If your %iowait is high, try increasing the quantum.

It is possible to see 100% iowait. What iowait really measures is the percentage of time the CPUs were idle while at least one I/O request was outstanding. I would bet your system shows high values during the context switch rush.

CFQ is able to dedicate a time slice to each process that uses a block device, and that time slice is adjustable. Consider looking into that.
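To see what can actually be adjusted on a given kernel, something like the sketch below might help. It prints every tunable file in a scheduler's iosched directory without assuming any particular tunable names, since those differ between kernel versions. The function name show_iosched_tunables and the device name sda in the usage line are only placeholders.

```shell
#!/bin/sh
# show_iosched_tunables: hypothetical helper. Prints "name = value" for every
# readable regular file in the directory given as $1, for example
# /sys/block/sda/queue/iosched (sda is only an example device name).
show_iosched_tunables() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file
        # (this also covers an empty directory, where the glob stays literal).
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s = %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage:
#   show_iosched_tunables /sys/block/sda/queue/iosched
```

Recording the current values before changing anything makes it easy to put them back.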

Since you are not incurring a CPU hit, all you are doing is wasting more CPU time on context switches rather than on something more useful, IMO.
# 5  
Old 10-02-2008
How do I find out which I/O scheduler is in use on Linux?

Edit: Scratch that. Just Googled it... Thanks.
# 6  
Old 10-02-2008
as, cfq, deadline, and noop are the four choices, AFAIK, right now


/sys/block/$devicename/queue/scheduler for each device lists those values, with the active one in square brackets. Supposedly you can write another value into the "file"
Code:
echo "noop" > /sys/block/$devicename/queue/scheduler

and thereby change the behavior of a running kernel, where $devicename is the name of a whole block device such as sda (without the /dev/ prefix, and not a partition such as sda1).
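Before writing anything, it may be safer to just read the file. A minimal sketch, assuming the usual format where the file lists the available schedulers and marks the active one with square brackets (for example "noop anticipatory deadline [cfq]"); the function name active_scheduler is invented here:

```shell
#!/bin/sh
# active_scheduler: hypothetical helper. Reads one line in the format of
# /sys/block/<dev>/queue/scheduler on stdin and prints the bracketed entry,
# which is the scheduler currently in use.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage: report the active scheduler for every block device that has one.
#   for f in /sys/block/*/queue/scheduler; do
#       [ -r "$f" ] && printf '%s: %s\n' "$f" "$(active_scheduler < "$f")"
#   done
```

Reading the file is harmless, so this can be run on a production box without touching the scheduler itself.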


"elevator" is the kernel parameter used to control this at boot time.


I HAVE NEVER done this; be careful.
# 7  
Old 10-02-2008
That sounds risky, as I've never tweaked this setting on any of our servers either :) Will I be able to determine which process is causing the high context switching by using top in batch mode when the switching occurs?
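As far as I know, top reports CPU usage and process state but not context switch counts per process. On kernels that expose voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status, the counters can be read directly. This is only a sketch under that assumption: I believe those fields were added around 2.6.23, so a stock 2.6.9 kernel would most likely not have them. The function name ctxt_total is made up.

```shell
#!/bin/sh
# ctxt_total: hypothetical helper. Reads text in /proc/<pid>/status format on
# stdin and prints voluntary + nonvoluntary context switches as one number.
# Assumption: the kernel exposes the *_ctxt_switches lines (newer than 2.6.9);
# if the lines are missing, the function prints 0.
ctxt_total() {
    awk '/^(voluntary|nonvoluntary)_ctxt_switches:/ { sum += $2 }
         END { print sum + 0 }'
}

# Usage: rank processes by cumulative context switches since they started.
#   for d in /proc/[0-9]*; do
#       [ -r "$d/status" ] && echo "${d#/proc/} $(ctxt_total < "$d/status")"
#   done | sort -k2,2nr | head
```

The counters are cumulative, so to find what is switching during a spike, take two snapshots a few seconds apart and compare the difference per PID.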