Sponsored Content
Top Forums UNIX for Advanced & Expert Users Nearly Random, Uncorrelated Server Load Average Spikes Post 303044029 by Neo on Wednesday 12th of February 2020 09:42:41 PM
Old 02-12-2020
Nearly Random, Uncorrelated Server Load Average Spikes

I have been wrangling with a small problem on a Ubuntu server which runs a LAMP application.

Code:
Linux ubuntu 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

This server runs fine, basically:

Code:
ubuntu:/var/www# uptime
 20:17:13 up 105 days, 19:54,  1 user,  load average: 1.38, 1.12, 1.

But around five to six times a day, the (1 minute) load average spikes to around 200 and quickly goes back down to normal, all in the time interval of less than one minute:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-13-91244-amjpg


I have added sensors to the crontab processes, both for the LAMP application and the server crontabs and log all cron start and stop events in the database. This instrumentation of the server yields no fruit.

In addition, in the occasional spurious events that seem to happen at "near regular" intervals, I have had a terminal window open running top, mytop, ifstats and other command line tools trying to trap the process which is causing the load spurs. There has been nothing unusual and mysql is always the "leading contender" when the pikes occur.

There is no other process with a high CPU during the spikes, so that seems to indicate something related to mysql., obviously.

With only mysql showing up as the "leading contender", I added more instrumentation to various mysql processes in the application, and can find no event on the server or in the application which correlates to this spurious behavior.

I have correlated the spikes to (1) network interface i/o stats, (2) apache processes, (3) all crontab processes (in app and on server) and regardless of my skills at adding sensors to various processes, I cannot trap the spurious process.

I have running time series graphs, and logging in the database. When there is an incident on the graph, I go to the database log and look at all the sensor entries and can find no correlation.

There is no correlation to cron processes, network i/o, users on the server, bots, and backup processes.

I keep adding more and more instrumentation to every process on the server and in the app, but all that instrumentation looking for correlation to a server process, including the LAMP app, bears no fruit.

At first, I thought this problem was caused by bots hitting the web server; but there is no correlation to increased bot traffic or LAN interface I/O.

Then I thought the problem was caused by various cron entries in the LAMP application; but reconfiguring them, turning them on and off, adding instrumentation for start / stop times in a log, also bears no fruit.

I've been working this problem on and off for over a week and cannot find a single causal reason, no a single cause for any effect, which may be causing this spurious load average behavior.

It's only around five to six times a day for less than a minute at each event; but I want to find the cause and fix it. I'm not 100% convinced the issue is caused by a single process / issue.

I wonder if there is some underlying disk I/0 activity on the server causing these spikes? Could this be related to a potential disk error on the SSD drives? Could it be related to underlying disk raid activity?

Code:
ubuntu:/etc/mdadm# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md2 : active raid1 sdb4[1] sda4[0]
      479503360 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sdb3[1] sda3[0]
      7995392 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sdb2[1] sda2[0]
      498688 blocks super 1.2 [2/2] [UU]

If so, any ideas how to trap this?

My current "best guess" is that there is some underlying disk I/O activity causing this spurious issue, and that is why nothing at the application level correlates.
 

10 More Discussions You Might Find Interesting

1. UNIX for Advanced & Expert Users

load average

we have an unix system which has load average normally about 20. but while i am running a particular unix batch which performs heavy operations on filesystem and database average load reduces to 15. how can we explain this situation? while running that batch idle cpu time is about %60-65... (0 Replies)
Discussion started by: gfhgfnhhn
0 Replies

2. UNIX for Dummies Questions & Answers

Load Average

Hello all, I have a question about load averages. I've read the man pages for the uptime and w command for two or three different flavors of Unix (Red Hat, Tru64, Solaris). All of them agree that in the output of the 2 aforementioned commands, you are given the load average for the box, but... (3 Replies)
Discussion started by: Heathe_Kyle
3 Replies

3. UNIX for Dummies Questions & Answers

top - Load average

Hello, Here is the output of top command. My understanding here is, the load average 0.03 in last 1 min, 0.02 is in last 5 min, 0.00 is in last 15 min. By seeing this load average, When can we say that, the system load averge is too high? When can we say that, load average is medium/low??... (8 Replies)
Discussion started by: govindts
8 Replies

4. Solaris

load average query.

Hi, i have installed solaris 10 on t-5120 sparc enterprise. I am little surprised to see load average of 2 or around on this OS. when checked with ps command following process is using highest CPU. looks like it is running for long time and does not want to stop, but I do not know... (5 Replies)
Discussion started by: upengan78
5 Replies

5. UNIX for Dummies Questions & Answers

Please Help me in my load average

Hello AlL,.. I want from experts to help me as my load average is increased and i dont know where is the problem !! this is my top result : root@a4s # top top - 11:30:38 up 40 min, 1 user, load average: 3.06, 2.49, 4.66 Mem: 8168788k total, 2889596k used, 5279192k free, 47792k... (3 Replies)
Discussion started by: black-code
3 Replies

6. UNIX for Advanced & Expert Users

Load average in UNIX

Hi , I am using 48 CPU sunOS server at my work. The application has facility to check the current load average before starting a new process to control the load. Right now it is configured as 48. So it does mean that each CPU can take maximum one proces and no processe is waiting. ... (2 Replies)
Discussion started by: kumaran_5555
2 Replies

7. Solaris

Load Average and Lwps

NPROC USERNAME SWAP RSS MEMORY TIME CPU 320 oracle 23G 22G 69% 582:55:11 85% 47 root 148M 101M 0.3% 99:29:40 0.3% 53 rafmsdb 38M 60M 0.2% 0:46:17 0.1% 1 smmsp 1296K 5440K 0.0% 0:00:08 0.0% 7 daemon ... (2 Replies)
Discussion started by: snjksh
2 Replies

8. UNIX for Dummies Questions & Answers

Load average spikes once an hour

Hi, I am getting a high load average, around 7, once an hour. It last for about 4 minutes and makes things fairly unusable for this time. How do I find out what is using this. Looking at top the only thing running at the time is md5sum. I have looked at the crontab and there is nothing... (10 Replies)
Discussion started by: sm9ai
10 Replies

9. UNIX for Dummies Questions & Answers

Help with load average?

how load average is calculated and what exactly is it difference between cpu% and load average (9 Replies)
Discussion started by: robo
9 Replies

10. Programming

ESP32 (ESP-WROOM-32) as an MQTT Client Subscribed to Linux Server Load Average Messages

Here we go.... Preface: ..... so in a galaxy far, far, far away from commercial, data sharing corporations..... For this project, I used the ESP-WROOM-32 as an MQTT (publish / subscribe) client which receives Linux server "load averages" as messages published as MQTT pub/sub messages.... (6 Replies)
Discussion started by: Neo
6 Replies
All times are GMT -4. The time now is 11:17 AM.
Unix & Linux Forums Content Copyright 1993-2022. All Rights Reserved.
Privacy Policy