Causes of high runq-sz and cswch/s output from sar
Hi folks,
I'm running RHEL4 (2.6.9, 64-bit) on a 4 CPU dual-core Xeon. This server is running a DB2 database. I've been getting the following readings from sar over the past week:
The context switching and run queue length values stick out like a sore thumb. However, the CPU load at that time is normal, although slightly elevated:
I've checked for long-running queries, but there seem to be none. My question is: how can I find the cause of the high system usage? Is it possible to pinpoint the process(es) causing the issue?
Don't know about processes in "D" state. Will have to run top in batch mode to find out when it happens again. Does context switching relate to IO functions?
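As an alternative to eyeballing top in batch mode, here is a minimal sketch that lists processes currently in "D" (uninterruptible sleep) state by reading /proc/<pid>/stat. The function names parse_stat and blocked_processes are made up for illustration, and it assumes a reasonably recent Python 3 is installed, which is not a given on a stock RHEL4 box.

```python
import glob


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    left, right = stat_text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_processes():
    """Return a sorted list of (pid, comm) for processes in state D."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited between the glob and the read, or the
            # line was not in the expected format: skip it.
            continue
        if state == "D":
            blocked.append((pid, comm))
    return blocked and sorted(blocked)
```

Calling blocked_processes() from cron or a loop while the run queue is high gives a rough picture of who is stuck waiting on I/O. It is a single snapshot, so short-lived D states can easily be missed.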
Yes. A process that blocks on I/O gives up the CPU, which is a context switch.
What I/O scheduler are you using: AS, CFQ?
If your %iowait is high try increasing quantum.
It is possible to see 100% iowait. What iowait really measures is the percentage of time the CPUs were idle while at least one I/O request was still outstanding. I would bet your system shows high values during the context switch rush.
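To make the iowait definition concrete, here is a rough sketch of how a percentage like this can be derived from two samples of the aggregate "cpu" line in /proc/stat. The function names are hypothetical, and this is not claimed to be exactly how sar computes its figure.

```python
def cpu_fields(stat_line):
    """Return the tick counters from a 'cpu ...' line of /proc/stat.

    Field order is user, nice, system, idle, iowait, irq, softirq, steal.
    Only the first eight are kept; older kernels report fewer, and later
    guest fields are skipped because guest time is already part of user.
    """
    parts = stat_line.split()
    return [int(x) for x in parts[1:9]]


def iowait_percent(before_line, after_line):
    """Percentage of elapsed CPU ticks spent in iowait between two samples."""
    before = cpu_fields(before_line)
    after = cpu_fields(after_line)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total
```

For example, if 1000 ticks elapsed between the samples and the iowait counter grew by 150, the result is 15.0. The counter only advances while a CPU is idle, so a busy CPU hides I/O waits that are still happening.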
cfq is able to dedicate a time slice to each process that uses a block device. You can adjust the time slice. Consider looking into that.
Since you are not incurring a CPU hit, all the switching does is burn CPU time on context switches rather than on something more useful, IMO.
as, cfq, deadline, and noop are the four choices right now, AFAIK.
/sys/block/$devicename/queue/scheduler for each device lists the available schedulers, with the active one in square brackets. Supposedly you can write another name into that "file" and thereby change the behavior of a running kernel. Here $devicename is the kernel name of a whole block device, such as sda (not a partition such as /dev/sda1).
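Before changing anything, it is worth just reading the current setting. Below is a small sketch that parses the contents of the scheduler file; parse_scheduler_line and read_scheduler are illustrative names, the default device "sda" is an assumption, and the sysfs file may not exist at all on kernels as old as 2.6.9.

```python
def parse_scheduler_line(text):
    """Split e.g. 'noop anticipatory deadline [cfq]' into (available, active).

    The kernel marks the active scheduler with square brackets. Note that
    sysfs spells out 'anticipatory' where the boot parameter uses 'as'.
    """
    available = []
    active = None
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return available, active


def read_scheduler(device="sda"):
    """Read and parse /sys/block/<device>/queue/scheduler (read-only)."""
    path = "/sys/block/%s/queue/scheduler" % device
    with open(path) as f:
        return parse_scheduler_line(f.read())
```

This only reads the file. Switching schedulers means writing one of the listed names back to the same path as root, which is best tried on a test box first.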
"elevator" is the kernel parameter used to control this at boot time.
That sounds risky, as I've never tweaked this setting before on any of our servers. Will I be able to determine which process is causing the high context switching by using top in batch mode when the switching occurs?
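top does not show per-process context switch counts, so batch mode alone will probably not answer that. One possible approach, sketched below, is to sample the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status twice and rank processes by the difference. Big caveat: those fields only exist in kernel 2.6.23 and later, so this would not work on a stock RHEL4 2.6.9 kernel. All function names here are made up for illustration.

```python
import glob
import re
import time


def parse_ctxt_switches(status_text):
    """Sum voluntary and nonvoluntary switches from /proc/<pid>/status text.

    Returns None when the kernel does not expose the counters.
    """
    total = 0
    found = False
    for line in status_text.splitlines():
        m = re.match(r"(?:non)?voluntary_ctxt_switches:\s+(\d+)", line)
        if m:
            total += int(m.group(1))
            found = True
    return total if found else None


def snapshot():
    """Map pid -> context switch count for every readable process.

    Only the main thread of each process is read; per-thread counters
    live under /proc/<pid>/task/<tid>/status.
    """
    counts = {}
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                n = parse_ctxt_switches(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if n is not None:
            counts[int(path.split("/")[2])] = n
    return counts


def top_switchers(before, after, limit=10):
    """Return up to `limit` (delta, pid) pairs, largest delta first."""
    deltas = [(after[pid] - before[pid], pid) for pid in after if pid in before]
    deltas.sort(reverse=True)
    return deltas[:limit]


def report(interval_s=5.0):
    """Take two snapshots interval_s apart and print the busiest pids."""
    first = snapshot()
    time.sleep(interval_s)
    for delta, pid in top_switchers(first, snapshot()):
        print(pid, delta)
```

Running report() while cswch/s is spiking should show which pids account for most of the switches over the interval. On systems with a newer sysstat, pidstat -w reports similar per-process figures.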
I am facing a situation where the sar -u command shows 0 for all CPUs. Does that mean all the CPUs are fully utilized? The OS is Oracle Linux 6.8.
01:34:13 PM all 0 0 0 0 0.00 0 (2 Replies)
We're experiencing some intermittent freezes on one of our systems and I'm trying to figure out what is happening.
We're running Solaris 10 zones mounting shares from netapp through nfs.
On the zone that freezes we have sar running and are getting this output:
SunOS prodserver 5.10... (3 Replies)
I've just been handed a hot potato from a colleague who left :(... our client has been complaining about slow performance on one of our servers.
I'm not very experienced in investigating performance issues, so I'm hoping someone will be so kind as to provide some guidance.
Here is an overview of the... (8 Replies)
I was reviewing yesterday's sar file and came across this strange output! What in the world? Any reason why there's output like that?
SunOS unixbox 5.10 Generic_144488-07 sun4v sparc SUNW,T5240 Solaris
00:00:58 device %busy avque r+w/s blks/s avwait avserv
11:20:01 ... (4 Replies)
Hi,
Does anyone know how to export sar command output to Excel, or is there any free graphical tool for extracting data from a sar log file? Thanks, regards (2 Replies)
Hi All,
I tried the sar command and the output appears to cover several days.
I would like to see just today's sar output. Please advise.
$sar
Linux 2.6.9-67.ELsmp (lrtp50) 02/28/09
00:00:01 CPU %user %nice %system %iowait %idle
00:05:02 all 3.10... (4 Replies)
Dear All,
Our HPUX 8 GB 8CPU database server is behaving abnormally for the last 4+ weeks. I have generated a sar output and it is here-
11:46:52 %usr %sys %wio %idle
11:46:53 1 1 6 92
11:46:54 0 1 0 99
11:46:55 0 1 0... (3 Replies)
I am trying to collect the sar output for around 90 minutes.
When I do
sar 1 5000 >> /tmp/sar.out
it's not updating the sar.out file. When we decrease the 5000 to a smaller number like 10, I can see the file sar.out updated after the 10 seconds. If I kill my sar while it is running it's not... (1 Reply)