Nearly Random, Uncorrelated Server Load Average Spikes
I have been wrangling with a small problem on an Ubuntu server which runs a LAMP application.
Basically, this server runs fine:
But around five or six times a day, the (1-minute) load average spikes to around 200 and quickly drops back to normal, all within less than a minute:
I have added sensors to the cron jobs, both the LAMP application's and the server's crontabs, and I log every cron start and stop event in the database. This instrumentation of the server yields no fruit.
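For what it's worth, the sensors are just thin wrappers around each cron command. A simplified sketch of the idea, assuming a flat-file log instead of the actual MySQL table (the `cronlog` name and `CRONLOG_FILE` path are made up for illustration):

```shell
#!/bin/sh
# cronlog: run a command and record its start/stop times and exit status.
# Simplified sketch -- the real sensors insert rows into MySQL instead of
# appending to a flat file, and CRONLOG_FILE is a made-up name.
CRONLOG_FILE="${CRONLOG_FILE:-/var/log/cron-sensors.log}"

cronlog() {
    printf '%s START %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$CRONLOG_FILE"
    "$@"
    status=$?
    printf '%s STOP %s exit=%d\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" "$status" >> "$CRONLOG_FILE"
    return "$status"
}
```

Each crontab entry then calls `cronlog /path/to/job` instead of the job directly, so every run leaves a START/STOP pair that can be lined up against the load graph.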
In addition, during the occasional spurious events, which seem to happen at "near regular" intervals, I have had a terminal window open running top, mytop, ifstat, and other command-line tools, trying to trap the process causing the load spikes. Nothing unusual shows up, and mysql is always the "leading contender" when the spikes occur.
There is no other process with high CPU during the spikes, so that seems to point at something related to mysql, obviously.
With only mysql showing up as the "leading contender", I added more instrumentation around the application's mysql activity, and I can find no event on the server or in the application that correlates with this spurious behavior.
I have checked the spikes against (1) network interface I/O stats, (2) apache processes, and (3) all crontab processes (in the app and on the server), and no matter how many sensors I add to various processes, I cannot trap the spurious process.
I have running time-series graphs, plus logging in the database. When there is an incident on a graph, I go to the database log, review all the sensor entries, and can find no correlation.
There is no correlation to cron processes, network I/O, users on the server, bots, or backup processes.
I keep adding more and more instrumentation to every process on the server and in the app, but all that searching for a correlation with some server process, including the LAMP app itself, bears no fruit.
At first, I thought this problem was caused by bots hitting the web server; but there is no correlation to increased bot traffic or LAN interface I/O.
Then I thought the problem was caused by various cron entries in the LAMP application; but reconfiguring them, turning them on and off, and logging their start/stop times also bore no fruit.
I've been working this problem on and off for over a week and cannot find a single cause, not one cause-and-effect relationship that might explain this spurious load average behavior.
It's only around five or six times a day, for less than a minute each time; but I want to find the cause and fix it. I'm not 100% convinced the issue is caused by a single process or issue.
I wonder if there is some underlying disk I/O activity on the server causing these spikes. Could this be related to a potential disk error on the SSD drives? Could it be related to underlying RAID activity?
If so, any ideas how to trap this?
My current "best guess" is that there is some underlying disk I/O activity causing this spurious issue, and that is why nothing at the application level correlates.
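To try to trap it at the disk level, my next step is a small watcher that polls /proc/loadavg and, whenever the 1-minute average crosses a threshold, snapshots the process table and the raw disk counters. This is only a rough sketch (the threshold and output directory are placeholder values, not tuned):

```shell
#!/bin/sh
# loadwatch: snapshot processes and disk counters when the 1-minute load
# average crosses a threshold. Rough sketch; THRESHOLD and OUTDIR are
# placeholder values.
THRESHOLD="${THRESHOLD:-50}"
OUTDIR="${OUTDIR:-/tmp/loadwatch}"
mkdir -p "$OUTDIR"

load_exceeds() {
    # /proc/loadavg looks like: "0.42 0.31 0.25 1/123 4567"
    awk -v t="$1" '{ exit ($1 >= t) ? 0 : 1 }' /proc/loadavg
}

snapshot() {
    stamp=$(date '+%Y%m%d-%H%M%S')
    # Top CPU consumers, with process state and wait channel, at the
    # moment of the spike.
    ps -eo pid,stat,pcpu,pmem,wchan:20,args --sort=-pcpu \
        | head -30 > "$OUTDIR/ps-$stamp.txt"
    # Raw per-device I/O counters; diffing consecutive snapshots shows
    # which device was busy during the spike.
    cat /proc/diskstats > "$OUTDIR/diskstats-$stamp.txt"
}

# One polling pass; run it from a loop or a once-a-minute cron entry.
if load_exceeds "$THRESHOLD"; then
    snapshot
fi
```

A `* * * * *` cron entry only samples once a minute and could miss a sub-minute spike, so a wrapper loop with a short sleep between passes is probably the better deployment; `iostat -x 5` and `pidstat -d 5` from the sysstat package would give a similar picture with less plumbing.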