Causes of high runq-sz and cswch/s output from sar

09-30-2008

Hi folks,

I'm running RHEL4 (kernel 2.6.9, 64-bit) on a server with four dual-core Xeon CPUs. The server runs a DB2 database. I've been getting the following readings from sar over the past week:

Code:

09:35:01 AM   cswch/s
09:40:01 AM   4774.95
09:45:01 AM  27342.76
09:50:02 AM 196015.02
09:55:01 AM 337021.92
10:00:01 AM 347007.79
10:05:01 AM 309210.99
10:10:01 AM 308174.09
10:15:01 AM 350074.07
10:20:01 AM 350716.36
10:25:01 AM 329279.95
10:30:02 AM 319551.01
10:35:01 AM 312952.02
10:40:01 AM 130142.16
10:45:01 AM   6056.06
10:50:01 AM   5131.25

Code:

09:35:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:40:01 AM         5      1156      7.31      6.63      6.69
09:45:01 AM        29      1190     29.53     15.30      9.89
09:50:02 AM        45      1298     58.97     39.22     21.26
09:55:01 AM       117      1323     85.29     67.64     38.24
10:00:01 AM        71      1332     70.71     76.38     50.85
10:05:01 AM        96      1286     58.03     65.37     53.27
10:10:01 AM       105      1316     77.59     70.25     58.42
10:15:01 AM       136      1334     97.88     82.74     66.72
10:20:01 AM        70      1308     80.77     84.55     72.34
10:25:01 AM        86      1308     77.55     81.79     74.77
10:30:02 AM       177      1327    121.49    100.03     83.81
10:35:01 AM        95      1334    135.49    118.93     95.86
10:40:01 AM        10      1212     14.40     71.86     84.11

The context switching and run queue length values stick out like a sore thumb. However, the CPU load at that time is normal, although slightly elevated:

Code:

09:35:01 AM       CPU     %user     %nice   %system   %iowait     %idle
09:40:01 AM       all     58.65      0.00      1.81      0.16     39.38
09:45:01 AM       all     72.99      0.00      2.44      0.27     24.30
09:50:02 AM       all     53.78      0.00      6.21      1.19     38.82
09:55:01 AM       all     45.99      0.00      9.12      0.75     44.14
10:00:01 AM       all     46.96      0.00      9.25      0.76     43.02
10:05:01 AM       all     42.43      0.00      9.36      2.05     46.16
10:10:01 AM       all     47.45      0.00      8.60      0.38     43.56
10:15:01 AM       all     45.64      0.00      9.17      0.34     44.84
10:20:01 AM       all     44.27      0.00      8.69      0.62     46.43
10:25:01 AM       all     45.23      0.00      9.04      0.73     45.00
10:30:02 AM       all     57.97      0.00      9.27      0.08     32.68
10:35:01 AM       all     61.81      0.00      9.62      0.09     28.48
10:40:01 AM       all     66.45      0.00      6.24      0.21     27.11
10:45:01 AM       all     45.15      0.00      2.41      0.28     52.16
10:50:01 AM       all     33.64      0.00      2.05      0.23     64.08
10:55:01 AM       all     39.24      0.00      3.27      0.27     57.22
11:00:01 AM       all     49.34      0.00      4.09      0.22     46.35
11:05:01 AM       all     42.20      0.00      2.45      0.32     55.02
11:10:01 AM       all     35.33      0.00      2.14      0.20     62.33
11:15:01 AM       all     33.07      0.00      2.31      0.24     64.38
11:20:01 AM       all     35.54      0.00      4.20      0.21     60.05
11:25:01 AM       all     34.20      0.00      2.02      0.16     63.61
11:30:01 AM       all     31.26      0.00      1.79      0.17     66.78
11:35:01 AM       all     36.21      0.00      2.88      0.14     60.76

I've checked for long-running queries, but there don't seem to be any. How can I find the cause of the high system usage? Is it possible to pinpoint the process or processes causing the issue?

Thanks!

fulat2k
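
For scale, the server in the post above has four dual-core CPUs, so 8 cores if Hyper-Threading is off (an assumption; the post does not say). Dividing runq-sz by the core count gives a rough idea of how many runnable tasks are queued per core. A minimal sketch, where the helper name runq_per_core is made up for illustration:

```shell
#!/bin/sh
# runq_per_core: hypothetical helper. Reads "sar -q" style lines on stdin
# (time, AM/PM, runq-sz, ...) and prints the time plus runq-sz divided by
# the core count passed as $1. Header lines are skipped because their
# third field is not a number.
runq_per_core() {
    awk -v cores="$1" '$3 ~ /^[0-9]+$/ { printf "%s %s %.1f\n", $1, $2, $3 / cores }'
}

# Two sample lines copied from the sar -q output above; 8 cores is assumed.
printf '%s\n' \
  '09:35:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15' \
  '10:15:01 AM       136      1334     97.88     82.74     66.72' \
  '10:30:02 AM       177      1327    121.49    100.03     83.81' | runq_per_core 8
```

On those two sample lines this prints 17.0 and 22.1 runnable tasks per core.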

09-30-2008

It's probably I/O related.

Does "ps aux" show any processes in the "D" state?

aluiken
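
The D-state check suggested above could be sketched like this. The helper name filter_dstate is made up for illustration; state, pid and comm are standard ps output fields. Processes tend to pass through the D state quickly, so running this repeatedly during the spike is probably more telling than a single sample.

```shell
#!/bin/sh
# filter_dstate: hypothetical helper. Reads "STATE PID COMMAND" lines on
# stdin and prints only those whose state starts with D (uninterruptible
# sleep, usually a process waiting on I/O).
filter_dstate() {
    awk '$1 ~ /^D/ { print }'
}

# One sample of the live system; prints nothing if no process is in D state.
# The ps header line is dropped too, since its first field is "S".
ps -eo state,pid,comm | filter_dstate
```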

09-30-2008

I don't know about processes in the "D" state. I'll have to run top in batch mode to find out when it happens again. Is context switching related to I/O?

fulat2k

09-30-2008

I/O may cause a process to lose its current context, i.e. to switch context.

What I/O scheduler are you using: AS or CFQ?

If your %iowait is high, try increasing the quantum.

It is possible to see 100% iowait. What iowait really measures is the percentage of time at least one process is in an iowait state. I would bet your system shows high values during the context switch rush.

CFQ is able to dedicate a time slice to each process that uses a block device, and you can adjust that time slice. Consider looking into that.

Since you are not incurring a CPU hit, all you are doing is wasting more CPU time on context switches rather than on something more useful, IMO.

jim mcnamara
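
To watch iowait at a finer grain than sar's 5-minute averages, the aggregate "cpu" line in /proc/stat can be sampled twice and the iowait counter compared with the total elapsed CPU time. This is only a sketch: the helper name iowait_pct is made up, and it assumes the usual /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal).

```shell
#!/bin/sh
# iowait_pct: hypothetical helper. Takes two aggregate "cpu" lines from
# /proc/stat (earlier sample first) and prints the percentage of elapsed
# CPU time that was accounted as iowait between the two samples.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          # Sum user..steal (fields 2-9); older kernels have fewer fields.
          for (i = 2; i <= NF && i <= 9; i++) total += $i
          t[NR] = total
          w[NR] = $6 }   # field 6 is the iowait counter
        END { d = t[2] - t[1]
              if (d > 0) printf "%.2f\n", 100 * (w[2] - w[1]) / d
              else print "0.00" }'
}

# Sample the live system one second apart (only where /proc/stat exists).
if [ -r /proc/stat ]; then
    a=$(grep '^cpu ' /proc/stat)
    sleep 1
    b=$(grep '^cpu ' /proc/stat)
    iowait_pct "$a" "$b"
fi
```

Run in a loop during the context switch rush, this would show whether iowait actually climbs at that moment.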

10-02-2008

How do I find out which I/O scheduler is in use on Linux?

Edit: Scratch that. Just Googled it... Thanks.

fulat2k

10-02-2008

as, cfq, deadline, and noop are the four choices, AFAIK, right now.

/sys/block/$devicename/queue/scheduler lists those values for each device, with the active one shown in square brackets. Supposedly you can write another value into the "file"

Code:

echo  "noop" > /sys/block/$devicename/queue/scheduler

and thereby change the behavior of a running kernel, where $devicename is the name of a whole block device as it appears under /sys/block, like sda (not a partition such as /dev/sda1).

"elevator" is the kernel parameter used to control this at boot time.

I HAVE NEVER done this; be careful.

jim mcnamara
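
Before changing anything, a read-only check of the current setting is harmless. A minimal sketch, assuming the usual file format in which the active scheduler appears in square brackets (for example "noop anticipatory deadline [cfq]"); the helper name active_scheduler is made up for illustration.

```shell
#!/bin/sh
# active_scheduler: hypothetical helper. Given the contents of a
# queue/scheduler file, prints the name inside the square brackets,
# which marks the scheduler currently in use.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Read-only: report the active scheduler for each block device that has one.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}   # strip the leading path
    dev=${dev%%/*}         # keep only the device name, e.g. sda
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```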

10-02-2008

That sounds risky, as I've never tweaked this setting on any of my servers either.

Will I be able to determine which process is causing the high context switching by using top in batch mode when the switching occurs?

fulat2k
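
As a possible alternative to top in batch mode, kernels that expose voluntary_ctxt_switches and nonvoluntary_ctxt_switches in /proc/<pid>/status allow a per-process ranking. Those fields were added in later 2.6 kernels, so the 2.6.9 kernel in this thread most likely does not have them; treat the following as a sketch for newer systems. The helper name ctxt_total is made up for illustration.

```shell
#!/bin/sh
# ctxt_total: hypothetical helper. Reads the text of a /proc/<pid>/status
# file on stdin and prints voluntary + nonvoluntary context switches
# (0 if the kernel does not report these fields).
ctxt_total() {
    awk '/^(non)?voluntary_ctxt_switches:/ { sum += $2 } END { print sum + 0 }'
}

# Rank processes by cumulative context switches; print the top ten as
# "count pid name". Processes that exit mid-scan are skipped.
for d in /proc/[0-9]*; do
    [ -r "$d/status" ] || continue
    n=$(ctxt_total < "$d/status" 2>/dev/null) || continue
    name=$(awk '/^Name:/ { print $2; exit }' "$d/status" 2>/dev/null)
    printf '%s %s %s\n' "${n:-0}" "${d#/proc/}" "$name"
done | sort -rn | head -n 10
```

The counts are cumulative since each process started, so taking two snapshots a few seconds apart during the spike and comparing them would show which processes are switching right now.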
