Causes of high runq-sz and cswch/s output from sar
Hi folks,
I'm running RHEL4 (2.6.9, 64-bit) on a 4-CPU dual-core Xeon server running a DB2 database. I've been getting the following readings from sar over the past week:
Code:
09:35:01 AM cswch/s
09:40:01 AM 4774.95
09:45:01 AM 27342.76
09:50:02 AM 196015.02
09:55:01 AM 337021.92
10:00:01 AM 347007.79
10:05:01 AM 309210.99
10:10:01 AM 308174.09
10:15:01 AM 350074.07
10:20:01 AM 350716.36
10:25:01 AM 329279.95
10:30:02 AM 319551.01
10:35:01 AM 312952.02
10:40:01 AM 130142.16
10:45:01 AM 6056.06
10:50:01 AM 5131.25
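A quick way to see how abrupt the jump is: the Python sketch below parses the cswch/s lines and flags samples that exceed a multiple of the quietest one. The 10x factor and the helper names are hypothetical choices for illustration, not anything sar itself provides.

```python
# Minimal sketch: flag sar cswch/s samples far above the quietest sample.
# The 10x factor is an arbitrary, hypothetical threshold.

def parse_sar_rate(lines):
    """Parse 'HH:MM:SS AM value' lines into (timestamp, float) pairs."""
    samples = []
    for line in lines:
        parts = line.split()
        # Data lines look like: ['09:40:01', 'AM', '4774.95']
        if len(parts) == 3:
            try:
                samples.append((f"{parts[0]} {parts[1]}", float(parts[2])))
            except ValueError:
                continue  # header line such as '09:35:01 AM cswch/s'
    return samples

def flag_spikes(samples, factor=10.0):
    """Return the samples whose rate exceeds `factor` times the lowest rate seen."""
    baseline = min(value for _, value in samples)
    return [(ts, value) for ts, value in samples if value > factor * baseline]

sar_output = """\
09:35:01 AM cswch/s
09:40:01 AM 4774.95
09:45:01 AM 27342.76
09:50:02 AM 196015.02
10:45:01 AM 6056.06
10:50:01 AM 5131.25"""

flagged = flag_spikes(parse_sar_rate(sar_output.splitlines()))
for ts, rate in flagged:
    print(ts, rate)
```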
Code:
09:35:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
09:40:01 AM 5 1156 7.31 6.63 6.69
09:45:01 AM 29 1190 29.53 15.30 9.89
09:50:02 AM 45 1298 58.97 39.22 21.26
09:55:01 AM 117 1323 85.29 67.64 38.24
10:00:01 AM 71 1332 70.71 76.38 50.85
10:05:01 AM 96 1286 58.03 65.37 53.27
10:10:01 AM 105 1316 77.59 70.25 58.42
10:15:01 AM 136 1334 97.88 82.74 66.72
10:20:01 AM 70 1308 80.77 84.55 72.34
10:25:01 AM 86 1308 77.55 81.79 74.77
10:30:02 AM 177 1327 121.49 100.03 83.81
10:35:01 AM 95 1334 135.49 118.93 95.86
10:40:01 AM 10 1212 14.40 71.86 84.11
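To put runq-sz in perspective, it helps to divide it by the number of CPUs; a common rule of thumb is that a run queue persistently above the CPU count means runnable tasks are waiting for a CPU. This sketch assumes 8 logical CPUs (4 sockets x 2 cores, no Hyper-Threading); on the server itself, Python's os.cpu_count() would report the real figure.

```python
# Hypothetical CPU count: 4 sockets x 2 cores = 8 logical CPUs
# (it would be 16 if Hyper-Threading were enabled).
ASSUMED_CPUS = 8

def runq_per_cpu(lines, ncpus=ASSUMED_CPUS):
    """Return (timestamp, runq-sz per CPU) for each sar -q data line."""
    result = []
    for line in lines:
        parts = line.split()
        # Data lines: time, AM/PM, runq-sz, plist-sz, ldavg-1, ldavg-5, ldavg-15
        if len(parts) == 7 and parts[2].isdigit():
            result.append((f"{parts[0]} {parts[1]}", int(parts[2]) / ncpus))
    return result

sar_q = """\
09:35:01 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
09:40:01 AM 5 1156 7.31 6.63 6.69
10:30:02 AM 177 1327 121.49 100.03 83.81
10:40:01 AM 10 1212 14.40 71.86 84.11"""

for ts, per_cpu in runq_per_cpu(sar_q.splitlines()):
    print(f"{ts}: {per_cpu:.1f} runnable tasks per CPU")
```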
The context switching and run queue length values stick out like a sore thumb. However, the CPU load at that time is normal, although slightly elevated:
Code:
09:35:01 AM CPU %user %nice %system %iowait %idle
09:40:01 AM all 58.65 0.00 1.81 0.16 39.38
09:45:01 AM all 72.99 0.00 2.44 0.27 24.30
09:50:02 AM all 53.78 0.00 6.21 1.19 38.82
09:55:01 AM all 45.99 0.00 9.12 0.75 44.14
10:00:01 AM all 46.96 0.00 9.25 0.76 43.02
10:05:01 AM all 42.43 0.00 9.36 2.05 46.16
10:10:01 AM all 47.45 0.00 8.60 0.38 43.56
10:15:01 AM all 45.64 0.00 9.17 0.34 44.84
10:20:01 AM all 44.27 0.00 8.69 0.62 46.43
10:25:01 AM all 45.23 0.00 9.04 0.73 45.00
10:30:02 AM all 57.97 0.00 9.27 0.08 32.68
10:35:01 AM all 61.81 0.00 9.62 0.09 28.48
10:40:01 AM all 66.45 0.00 6.24 0.21 27.11
10:45:01 AM all 45.15 0.00 2.41 0.28 52.16
10:50:01 AM all 33.64 0.00 2.05 0.23 64.08
10:55:01 AM all 39.24 0.00 3.27 0.27 57.22
11:00:01 AM all 49.34 0.00 4.09 0.22 46.35
11:05:01 AM all 42.20 0.00 2.45 0.32 55.02
11:10:01 AM all 35.33 0.00 2.14 0.20 62.33
11:15:01 AM all 33.07 0.00 2.31 0.24 64.38
11:20:01 AM all 35.54 0.00 4.20 0.21 60.05
11:25:01 AM all 34.20 0.00 2.02 0.16 63.61
11:30:01 AM all 31.26 0.00 1.79 0.17 66.78
11:35:01 AM all 36.21 0.00 2.88 0.14 60.76
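This sketch assumes the usual RHEL4 sar -u column order (CPU, %user, %nice, %system, %iowait, %idle). Averaging a few samples from inside and after the spike makes the %system shift easier to see than eyeballing the table; the helper names are hypothetical.

```python
def parse_sar_cpu(lines):
    """Parse sar -u lines of the form 'HH:MM:SS AM all %user %nice %system %iowait %idle'."""
    rows = []
    for line in lines:
        parts = line.split()
        # Assumed column order: time, AM/PM, CPU, %user, %nice, %system, %iowait, %idle
        if len(parts) == 8 and parts[2] == "all":
            user, nice, system, iowait, idle = (float(p) for p in parts[3:])
            rows.append({"time": f"{parts[0]} {parts[1]}", "user": user, "nice": nice,
                         "system": system, "iowait": iowait, "idle": idle})
    return rows

def mean_of(rows, column):
    """Average one column across the parsed rows."""
    return sum(r[column] for r in rows) / len(rows)

spike = parse_sar_cpu("""\
09:55:01 AM all 45.99 0.00 9.12 0.75 44.14
10:00:01 AM all 46.96 0.00 9.25 0.76 43.02
10:05:01 AM all 42.43 0.00 9.36 2.05 46.16""".splitlines())

quiet = parse_sar_cpu("""\
11:00:01 AM all 49.34 0.00 4.09 0.22 46.35
11:05:01 AM all 42.20 0.00 2.45 0.32 55.02
11:10:01 AM all 35.33 0.00 2.14 0.20 62.33""".splitlines())

print(f"%system during spike: {mean_of(spike, 'system'):.2f}")
print(f"%system afterwards:   {mean_of(quiet, 'system'):.2f}")
```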
I've checked for long-running queries, but there seem to be none. My question is: how can I find the cause of the high system usage? Is it possible to pinpoint the process(es) causing the issue?
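One way to pinpoint the processes is to sample each task's cumulative context-switch counters twice and rank them by the difference. The sketch below reads voluntary_ctxt_switches and nonvoluntary_ctxt_switches from /proc/<pid>/status. Those fields only appeared in later 2.6 kernels (around 2.6.23), so the 2.6.9 kernel in the question probably won't have them; on newer systems, pidstat -w from sysstat reports the same counters per task. The helper names and the 5-second interval are hypothetical, and it assumes Python 3 is available on the server.

```python
import os
import time

def read_ctxt_switches(pid):
    """Return (voluntary, nonvoluntary) switch counts for a pid, or None."""
    try:
        with open(f"/proc/{pid}/status") as f:
            fields = dict(line.split(":", 1) for line in f if ":" in line)
    except OSError:
        return None  # process exited or is not readable
    try:
        # int() tolerates the surrounding whitespace and newline
        return (int(fields["voluntary_ctxt_switches"]),
                int(fields["nonvoluntary_ctxt_switches"]))
    except KeyError:
        return None  # kernel too old to report these fields

def snapshot():
    """Map pid -> (voluntary, nonvoluntary) for every readable process."""
    counts = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            c = read_ctxt_switches(entry)
            if c is not None:
                counts[int(entry)] = c
    return counts

def top_switchers(before, after, n=10):
    """Rank pids present in both snapshots by total switches in the interval."""
    deltas = []
    for pid, (vol1, nonvol1) in after.items():
        if pid in before:
            vol0, nonvol0 = before[pid]
            deltas.append((pid, vol1 - vol0, nonvol1 - nonvol0))
    deltas.sort(key=lambda d: d[1] + d[2], reverse=True)
    return deltas[:n]

def main(interval=5.0):
    """Take two snapshots `interval` seconds apart and print the busiest switchers."""
    first = snapshot()
    time.sleep(interval)
    for pid, vol, nonvol in top_switchers(first, snapshot()):
        print(f"pid {pid}: {vol} voluntary, {nonvol} involuntary")
```

Calling main() during a spike prints the tasks with the most switches in that window. A reused pid between the two snapshots could skew one entry, which is acceptable for a rough look.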
I don't know about processes in "D" state. I'll have to run top in batch mode to find out when it happens again. Does context switching relate to I/O functions?
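Besides top in batch mode, processes in "D" state (uninterruptible sleep, usually waiting on I/O) can be listed by reading the state field of /proc/<pid>/stat. A minimal sketch, assuming Python 3 on the server; the function names are hypothetical. The command name is wrapped in parentheses and may itself contain spaces, so the parser splits on the last ")".

```python
import os

def parse_stat(line):
    """Return (pid, comm, state) from a /proc/<pid>/stat line.

    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split(" (", 1)
    state = right.split()[0]
    return int(pid_str), comm, state

def processes_in_state(wanted="D"):
    """List (pid, comm) for every process currently in the given state."""
    found = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited between listdir() and open()
        if state == wanted:
            found.append((pid, comm))
    return found
```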
I/O may cause a process to give up its current context, i.e. to be context-switched out.
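The kernel distinguishes voluntary switches (the task blocks, for example on a disk read, and gives up the CPU) from involuntary ones (the task is preempted). This small Python demonstration uses resource.getrusage(); time.sleep() stands in for a blocking I/O call, since both put the process to sleep and are therefore counted as voluntary switches. The helper names are hypothetical.

```python
import resource
import time

def voluntary_switches_during(func):
    """Return how many voluntary context switches this process made while running func()."""
    before = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    func()
    after = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    return after - before

def block_a_few_times():
    # Each sleep blocks the process, so the kernel switches to another task:
    # a voluntary context switch, the same kind a blocking disk read causes.
    for _ in range(20):
        time.sleep(0.001)

print("voluntary switches:", voluntary_switches_during(block_a_few_times))
```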
Which I/O scheduler are you using: AS (anticipatory) or CFQ?
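On kernels that expose it, the active scheduler is the bracketed entry in /sys/block/<dev>/queue/scheduler. As far as I know, switching it at runtime through that file arrived after 2.6.9, so on RHEL4 the scheduler may instead be fixed at boot by the elevator= kernel parameter (check /proc/cmdline). The device name sda below is a placeholder.

```python
def active_scheduler(text):
    """Return the bracketed (active) entry from a queue/scheduler file, or None."""
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None

def read_scheduler(device="sda"):
    """Read the active I/O scheduler for a block device (device name is a placeholder)."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```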
If your %iowait is high, try increasing the CFQ quantum.
It is possible to see 100% iowait. What %iowait really measures is the percentage of time the CPUs were idle while at least one disk I/O request was outstanding. I would bet your system shows high values during the context-switch rush.
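%iowait is derived from tick counters in /proc/stat: the aggregate "cpu" line lists user, nice, system, idle, iowait, irq, softirq and, on newer kernels, further fields. Ticks are only charged to iowait when a CPU is otherwise idle with I/O outstanding, so a CPU that stays busy accrues none even while I/O is pending. A sketch of the calculation between two samples; the function names are hypothetical and the example values are invented.

```python
def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(x) for x in fields[1:]]

def iowait_percent(before, after):
    """Share of elapsed CPU ticks charged to iowait (the 5th counter) between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total else 0.0

t0 = parse_cpu_line("cpu 100 0 50 800 50 0 0")
t1 = parse_cpu_line("cpu 200 0 100 1500 200 0 0")
print(f"%iowait: {iowait_percent(t0, t1):.1f}")
```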
CFQ can dedicate a time slice to each process that uses a block device, and the time slice is adjustable. Consider looking into that.
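Before changing anything, it may help to just look at the current values. With CFQ, the knobs live under /sys/block/<dev>/queue/iosched/ (for example quantum, slice_sync and slice_idle on newer CFQ versions; the older CFQ in 2.6.9 may expose fewer or none). This read-only sketch dumps whatever is there; the default path and function name are placeholders. Changing a value means writing to the corresponding file as root, which is worth trying on a non-production box first.

```python
import os

def read_iosched_tunables(queue_dir="/sys/block/sda/queue"):
    """Return {tunable name: current value} for the active I/O scheduler (read-only)."""
    iosched = os.path.join(queue_dir, "iosched")
    tunables = {}
    for name in sorted(os.listdir(iosched)):
        path = os.path.join(iosched, name)
        if os.path.isfile(path):
            with open(path) as f:
                tunables[name] = f.read().strip()
    return tunables
```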
Since you are not seeing a big CPU hit, the only effect is more CPU time wasted on context switches rather than on something more useful, IMO.
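To get a feel for how much CPU that churn could cost, multiply the switch rate by a per-switch cost and divide by the CPU count. The per-switch cost is not given anywhere in this thread and depends heavily on hardware and cache effects, so the 2 µs below is purely a placeholder, as is the 8-CPU count.

```python
def switch_overhead_fraction(switches_per_sec, cost_us, ncpus):
    """Fraction of total CPU capacity spent switching, for an assumed per-switch cost in microseconds."""
    cpu_seconds_per_second = switches_per_sec * cost_us * 1e-6
    return cpu_seconds_per_second / ncpus

# Placeholder inputs: ~350,000 switches/s from the sar output,
# a hypothetical 2 microseconds per switch, and an assumed 8 CPUs.
fraction = switch_overhead_fraction(350_000, 2.0, 8)
print(f"{fraction:.2%} of total CPU capacity")
```

With those placeholder numbers the estimate comes to roughly 8.75% of total capacity; since the formula is linear, a different per-switch cost scales the result proportionally.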
That sounds risky, as I've never tweaked this setting on any of our servers. Will I be able to determine which process is causing the high context switching by running top in batch mode when the switching occurs?
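top's default columns don't include per-process context-switch counts, so a batch-mode snapshot shows which processes are running or blocked during the spike rather than which ones are switching. To catch the spike without sitting at the console, one option is to poll the cumulative ctxt counter in /proc/stat and save a top -b -n 1 snapshot only when the rate crosses a threshold. A sketch assuming Python 3.7+ on the server; the threshold, interval, output path and function names are hypothetical.

```python
import subprocess
import time

def read_total_ctxt(stat_text):
    """Return the cumulative 'ctxt' counter from the contents of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")

def ctxt_rate(count_before, count_after, seconds):
    """Context switches per second between two counter readings."""
    return (count_after - count_before) / seconds

def watch(threshold=100_000, interval=5.0, outfile="/tmp/top-capture.txt"):
    """Poll /proc/stat forever; append a top -b -n 1 snapshot whenever switches/s exceed threshold."""
    with open("/proc/stat") as f:
        prev = read_total_ctxt(f.read())
    while True:
        time.sleep(interval)
        with open("/proc/stat") as f:
            cur = read_total_ctxt(f.read())
        rate = ctxt_rate(prev, cur, interval)
        prev = cur
        if rate > threshold:
            snapshot = subprocess.run(["top", "-b", "-n", "1"],
                                      capture_output=True, text=True).stdout
            with open(outfile, "a") as out:
                out.write(f"--- {time.ctime()} ctxt/s={rate:.0f}\n{snapshot}")
```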