Nearly Random, Uncorrelated Server Load Average Spikes


 
Thread Tools Search this Thread
Top Forums UNIX for Advanced & Expert Users Nearly Random, Uncorrelated Server Load Average Spikes
# 1  
Old 02-12-2020
Nearly Random, Uncorrelated Server Load Average Spikes

I have been wrangling with a small problem on a Ubuntu server which runs a LAMP application.

Code:
Linux ubuntu 4.15.0-33-generic #36-Ubuntu SMP Wed Aug 15 16:00:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

This server runs fine, basically:

Code:
ubuntu:/var/www# uptime
 20:17:13 up 105 days, 19:54,  1 user,  load average: 1.38, 1.12, 1.

But around five to six times a day, the (1 minute) load average spikes to around 200 and quickly goes back down to normal, all in the time interval of less than one minute:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-13-91244-amjpg


I have added sensors to the crontab processes, both for the LAMP application and the server crontabs and log all cron start and stop events in the database. This instrumentation of the server yields no fruit.

In addition, in the occasional spurious events that seem to happen at "near regular" intervals, I have had a terminal window open running top, mytop, ifstats and other command line tools trying to trap the process which is causing the load spurs. There has been nothing unusual and mysql is always the "leading contender" when the pikes occur.

There is no other process with a high CPU during the spikes, so that seems to indicate something related to mysql., obviously.

With only mysql showing up as the "leading contender", I added more instrumentation to various mysql processes in the application, and can find no event on the server or in the application which correlates to this spurious behavior.

I have correlated the spikes to (1) network interface i/o stats, (2) apache processes, (3) all crontab processes (in app and on server) and regardless of my skills at adding sensors to various processes, I cannot trap the spurious process.

I have running time series graphs, and logging in the database. When there is an incident on the graph, I go to the database log and look at all the sensor entries and can find no correlation.

There is no correlation to cron processes, network i/o, users on the server, bots, and backup processes.

I keep adding more and more instrumentation to every process on the server and in the app, but all that instrumentation looking for correlation to a server process, including the LAMP app, bears no fruit.

At first, I thought this problem was caused by bots hitting the web server; but there is no correlation to increased bot traffic or LAN interface I/O.

Then I thought the problem was caused by various cron entries in the LAMP application; but reconfiguring them, turning them on and off, adding instrumentation for start / stop times in a log, also bears no fruit.

I've been working this problem on and off for over a week and cannot find a single causal reason, no a single cause for any effect, which may be causing this spurious load average behavior.

It's only around five to six times a day for less than a minute at each event; but I want to find the cause and fix it. I'm not 100% convinced the issue is caused by a single process / issue.

I wonder if there is some underlying disk I/0 activity on the server causing these spikes? Could this be related to a potential disk error on the SSD drives? Could it be related to underlying disk raid activity?

Code:
ubuntu:/etc/mdadm# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10] 
md2 : active raid1 sdb4[1] sda4[0]
      479503360 blocks super 1.2 [2/2] [UU]
      bitmap: 3/4 pages [12KB], 65536KB chunk

md1 : active raid1 sdb3[1] sda3[0]
      7995392 blocks super 1.2 [2/2] [UU]
      
md0 : active raid1 sdb2[1] sda2[0]
      498688 blocks super 1.2 [2/2] [UU]

If so, any ideas how to trap this?

My current "best guess" is that there is some underlying disk I/O activity causing this spurious issue, and that is why nothing at the application level correlates.
# 2  
Old 02-12-2020
For completeness, here are some entries from the tail of dmesg:

Code:
[    0.783084] intel_pstate: Intel P-state driver initializing
[    0.784360] intel_pstate: HWP enabled
[    0.785146] ledtrig-cpu: registered to indicate activity on CPUs
[    0.785426] EFI Variables Facility v0.08 2004-May-17
[    0.790146] intel_pmc_core:  initialized
[    0.790564] NET: Registered protocol family 10
[    0.794450] Segment Routing with IPv6
[    0.794735] NET: Registered protocol family 17
[    0.795098] Key type dns_resolver registered
[    0.796173] RAS: Correctable Errors collector initialized.
[    0.796471] microcode: sig=0x506e3, pf=0x2, revision=0xc6
[    0.797051] microcode: Microcode Update Driver: v2.2.
[    0.797068] sched_clock: Marking stable (797040485, 0)->(777089743, 19950742)
[    0.797950] registered taskstats version 1
[    0.798222] Loading compiled-in X.509 certificates
[    0.799763] Loaded X.509 cert 'Build time autogenerated kernel key: b2b2adbda3eef7b219e699c1b87c022096150517'
[    0.800481] Couldn't get size: 0x800000000000000e
[    0.800767] MODSIGN: Couldn't get UEFI db list
[    0.802200] Loaded UEFI:MokListRT cert 'Canonical Ltd. Master Certificate Authority: ad91990bc22ab1f517048c23b6655a268e345a63' linked to secondary sys keyring
[    0.802855] Couldn't get size: 0x800000000000000e
[    0.803150] MODSIGN: Couldn't get UEFI dbx list
[    0.803458] zswap: loaded using pool lzo/zbud
[    0.806301] Key type big_key registered
[    0.806595] Key type trusted registered
[    0.807944] Key type encrypted registered
[    0.808249] AppArmor: AppArmor sha1 policy hashing enabled
[    0.808530] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    0.808826] evm: HMAC attrs: 0x1
[    0.809603]   Magic number: 7:736:368
[    0.810095] rtc_cmos 00:05: setting system clock to 2019-10-30 06:23:54 UTC (1572416634)
[    0.810491] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    0.810787] EDD information not available.
[    1.155604] Freeing unused kernel memory: 2408K
[    1.188162] Write protecting the kernel read-only data: 20480k
[    1.189458] Freeing unused kernel memory: 2008K
[    1.191801] Freeing unused kernel memory: 1900K
[    1.195636] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    1.195905] x86/mm: Checking user space page tables
[    1.199568] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    1.253419] ahci 0000:00:17.0: version 3.0
[    1.253706] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[    1.254096] ahci 0000:00:17.0: flags: 64bit ncq sntf led clo only pio slum part ems deso sadm sds apst 
[    1.254129] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
[    1.254133] r8169 0000:01:00.0: can't disable ASPM; OS doesn't have ASPM control
[    1.265296] r8169 0000:01:00.0 eth0: RTL8168h/8111h at 0x        (ptrval), 4c:cc:6a:04:4c:37, XID 14100880 IRQ 124
[    1.265918] r8169 0000:01:00.0 eth0: jumbo features [frames: 9200 bytes, tx checksumming: ko]
[    1.296757] r8169 0000:01:00.0 enp1s0: renamed from eth0
[    1.320853] scsi host0: ahci
[    1.321512] scsi host1: ahci
[    1.322057] scsi host2: ahci
[    1.322601] scsi host3: ahci
[    1.323061] scsi host4: ahci
[    1.323651] scsi host5: ahci
[    1.323975] ata1: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118100 irq 123
[    1.324302] ata2: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118180 irq 123
[    1.324558] ata3: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118200 irq 123
[    1.324810] ata4: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118280 irq 123
[    1.325060] ata5: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118300 irq 123
[    1.325309] ata6: SATA max UDMA/133 abar m2048@0xdf118000 port 0xdf118380 irq 123
[    1.328090] [drm] Memory usable by graphics device = 4096M
[    1.328379] checking generic (c0000000 1e0000) vs hw (c0000000 10000000)
[    1.328380] fb: switching to inteldrmfb from EFI VGA
[    1.328641] Console: switching to colour dummy device 80x25
[    1.328696] [drm] Replacing VGA console driver
[    1.334329] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.334331] [drm] Driver supports precise vblank timestamp query.
[    1.338355] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    1.338674] [drm] Finished loading DMC firmware i915/skl_dmc_ver1_26.bin (v1.26)
[    1.342834] [drm] Initialized i915 1.6.0 20171023 for 0000:00:02.0 on minor 0
[    1.344165] ACPI: Video Device [GFX0] (multi-head: yes  rom: no  post: no)
[    1.344419] input: Video Bus as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/LNXVIDEO:00/input/input6
[    1.628416] clocksource: Switched to clocksource tsc
[    1.641018] ata3: SATA link down (SStatus 4 SControl 300)
[    1.641032] ata5: SATA link down (SStatus 4 SControl 300)
[    1.641048] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.641060] ata6: SATA link down (SStatus 4 SControl 300)
[    1.641074] ata4: SATA link down (SStatus 4 SControl 300)
[    1.641089] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.642670] ata1.00: supports DRM functions and may not be fully accessible
[    1.642759] ata2.00: supports DRM functions and may not be fully accessible
[    1.643571] ata1.00: disabling queued TRIM support
[    1.643572] ata1.00: ATA-9: Samsung SSD 850 EVO 500GB, EMT02B6Q, max UDMA/133
[    1.643574] ata1.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    1.643598] ata2.00: disabling queued TRIM support
[    1.643599] ata2.00: ATA-9: Samsung SSD 850 EVO 500GB, EMT02B6Q, max UDMA/133
[    1.643615] ata2.00: 976773168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[    1.645667] ata1.00: supports DRM functions and may not be fully accessible
[    1.645691] ata2.00: supports DRM functions and may not be fully accessible
[    1.646488] ata1.00: disabling queued TRIM support
[    1.646513] ata2.00: disabling queued TRIM support
[    1.648205] ata1.00: configured for UDMA/133
[    1.648232] ata2.00: configured for UDMA/133
[    1.648412] scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 5
[    1.648765] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.649033] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    1.649080] scsi 1:0:0:0: Direct-Access     ATA      Samsung SSD 850  2B6Q PQ: 0 ANSI: 5
[    1.649104] sd 0:0:0:0: [sda] Write Protect is off
[    1.649107] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    1.649216] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.649313] sd 1:0:0:0: Attached scsi generic sg1 type 0
[    1.649393] sd 1:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    1.649408] sd 1:0:0:0: [sdb] Write Protect is off
[    1.649411] sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    1.649425] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    1.650494]  sda: sda1 sda2 sda3 sda4
[    1.650580]  sdb: sdb1 sdb2 sdb3 sdb4
[    1.651233] sd 0:0:0:0: [sda] supports TCG Opal
[    1.651235] sd 0:0:0:0: [sda] Attached SCSI disk
[    1.651310] sd 1:0:0:0: [sdb] supports TCG Opal
[    1.651311] sd 1:0:0:0: [sdb] Attached SCSI disk
[    1.816138] [drm] Cannot find any crtc or sizes
[    1.832959] random: fast init done
[    1.833102] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    1.833120] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    1.833128] random: systemd-udevd: uninitialized urandom read (16 bytes read)
[    1.835976] md/raid1:md1: active with 2 out of 2 mirrors
[    1.835990] md1: detected capacity change from 0 to 8187281408
[    1.836596] md/raid1:md0: active with 2 out of 2 mirrors
[    1.836609] md0: detected capacity change from 0 to 510656512
[    1.836831] md/raid1:md2: active with 2 out of 2 mirrors
[    1.841013] md2: detected capacity change from 0 to 491011440640
[    1.920035] raid6: sse2x1   gen() 14417 MB/s
[    1.968041] raid6: sse2x1   xor() 10359 MB/s
[    2.016042] raid6: sse2x2   gen() 18297 MB/s
[    2.052151] [drm] Cannot find any crtc or sizes
[    2.064043] raid6: sse2x2   xor() 11977 MB/s
[    2.112038] raid6: sse2x4   gen() 21191 MB/s
[    2.160038] raid6: sse2x4   xor() 12870 MB/s
[    2.208039] raid6: avx2x1   gen() 29740 MB/s
[    2.256038] raid6: avx2x1   xor() 20399 MB/s
[    2.304039] raid6: avx2x2   gen() 36755 MB/s
[    2.352038] raid6: avx2x2   xor() 22326 MB/s
[    2.400039] raid6: avx2x4   gen() 40979 MB/s
[    2.448033] raid6: avx2x4   xor() 23242 MB/s
[    2.448035] raid6: using algorithm avx2x4 gen() 40979 MB/s
[    2.448036] raid6: .... xor() 23242 MB/s, rmw enabled
[    2.448037] raid6: using avx2x2 recovery algorithm
[    2.448681] xor: automatically using best checksumming function   avx       
[    2.449192] async_tx: api initialized (async)
[    2.780803] [drm] RC6 on
[    3.040192] random: crng init done
[    3.040195] random: 7 urandom warning(s) missed due to ratelimiting
[    3.302697] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: (null)
[    3.380050] ip_tables: (C) 2000-2006 Netfilter Core Team
[    3.391342] systemd[1]: systemd 237 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN -PCRE2 default-hierarchy=hybrid)
[    3.408121] systemd[1]: Detected architecture x86-64.
[    3.411068] systemd[1]: Set hostname to <condor2769>.
[    3.470691] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[    3.470764] systemd[1]: Reached target System Time Synchronized.
[    3.470880] systemd[1]: Created slice User and Session Slice.
[    3.470904] systemd[1]: Reached target User and Group Name Lookups.
[    3.470943] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[    3.471019] systemd[1]: Created slice System Slice.
[    3.471075] systemd[1]: Listening on Journal Socket (/dev/log).
[    3.480628] EXT4-fs (md2): re-mounted. Opts: errors=remount-ro,usrquota
[    3.505280] systemd-journald[364]: Received request to flush runtime journal from PID 1
[    3.513289] systemd-journald[364]: File /var/log/journal/c0688dc6550e40d9b44a66237890f6da/system.journal corrupted or uncleanly shut down, renaming and replacing.
[    3.530296] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[    3.555546] (NULL device *): hwmon_device_register() is deprecated. Please convert the driver to use hwmon_device_register_with_info().
[    3.555803] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    3.556869] intel-lpss 0000:00:15.0: enabling device (0000 -> 0002)
[    3.560814] idma64 idma64.0: Found Intel integrated DMA 64-bit
[    3.565947] intel-lpss 0000:00:15.1: enabling device (0000 -> 0002)
[    3.566174] idma64 idma64.1: Found Intel integrated DMA 64-bit
[    3.570988] mei_me 0000:00:16.0: enabling device (0000 -> 0002)
[    3.578580] r8169 0000:01:00.0 enp1s0: link down
[    3.578582] r8169 0000:01:00.0 enp1s0: link down
[    3.578660] IPv6: ADDRCONF(NETDEV_UP): enp1s0: link is not ready
[    3.585416] intel-lpss 0000:00:1e.0: enabling device (0000 -> 0002)
[    3.585799] idma64 idma64.2: Found Intel integrated DMA 64-bit
[    3.592856] RAPL PMU: API unit is 2^-32 Joules, 5 fixed counters, 655360 ms ovfl timer
[    3.592857] RAPL PMU: hw unit of domain pp0-core 2^-14 Joules
[    3.592858] RAPL PMU: hw unit of domain package 2^-14 Joules
[    3.592859] RAPL PMU: hw unit of domain dram 2^-14 Joules
[    3.592859] RAPL PMU: hw unit of domain pp1-gpu 2^-14 Joules
[    3.592859] RAPL PMU: hw unit of domain psys 2^-14 Joules
[    3.602188] AVX2 version of gcm_enc/dec engaged.
[    3.602189] AES CTR mode by8 optimization enabled
[    3.631620] Adding 7995388k swap on /dev/md1.  Priority:-2 extents:1 across:7995388k SSFS
[    3.649228] intel_rapl: Found RAPL domain package
[    3.649229] intel_rapl: Found RAPL domain core
[    3.649230] intel_rapl: Found RAPL domain uncore
[    3.649231] intel_rapl: Found RAPL domain dram
[    3.716185] dw-apb-uart.2: ttyS4 at MMIO 0xdf117000 (irq = 20, base_baud = 115200) is a 16550A
[    3.768473] EXT4-fs (md0): mounting ext2 file system using the ext4 subsystem
[    3.769920] EXT4-fs (md0): mounted filesystem without journal. Opts: (null)
[    7.404362] r8169 0000:01:00.0 enp1s0: link up
[    7.404368] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0: link becomes ready
[ 4075.546801] perf: interrupt took too long (2508 > 2500), lowering kernel.perf_event_max_sample_rate to 79500
[ 5689.837424] perf: interrupt took too long (3142 > 3135), lowering kernel.perf_event_max_sample_rate to 63500
[10115.250368] perf: interrupt took too long (3945 > 3927), lowering kernel.perf_event_max_sample_rate to 50500
[22838.481641] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[28028.743897] perf: interrupt took too long (4950 > 4931), lowering kernel.perf_event_max_sample_rate to 40250
[280629.260406] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[343987.398818] md: data-check of RAID array md0
[343987.411753] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
[343987.467934] md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
[343989.916428] md: md0: data-check done.
[343989.920038] md: data-check of RAID array md1
[344029.804168] md: md1: data-check done.
[344029.808416] md: data-check of RAID array md2
[346437.124572] md: md2: data-check done.
[385142.122374] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[526986.800286] perf: interrupt took too long (6196 > 6187), lowering kernel.perf_event_max_sample_rate to 32250
[1150592.768812] TCP: request_sock_TCP: Possible SYN flooding on port 11211. Sending cookies.  Check SNMP counters.
[1150631.783170] cgroup: fork rejected by pids controller in /system.slice/apache2.service
[1212935.515650] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[2186110.449750] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[2410857.588817] cgroup: fork rejected by pids controller in /system.slice/apache2.service
[2766788.336203] md: data-check of RAID array md0
[2766788.352563] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
[2766788.392480] md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
[2766790.823840] md: md0: data-check done.
[2766790.826994] md: data-check of RAID array md1
[2766830.713540] md: md1: data-check done.
[2766830.716909] md: data-check of RAID array md2
[2769236.777339] md: md2: data-check done.
[4077245.109452] usb 1-4: new low-speed USB device number 2 using xhci_hcd
[4077245.266436] usb 1-4: New USB device found, idVendor=0a81, idProduct=0205
[4077245.266441] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[4077245.266446] usb 1-4: Product: PS2 to USB Converter
[4077245.266450] usb 1-4: Manufacturer: CHESEN
[4077245.304834] hidraw: raw HID events driver (C) Jiri Kosina
[4077245.321865] usbcore: registered new interface driver usbhid
[4077245.321867] usbhid: USB HID core driver
[4077245.330021] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.0/0003:0A81:0205.0001/input/input7
[4077245.389637] hid-generic 0003:0A81:0205.0001: input,hidraw0: USB HID v1.10 Keyboard [CHESEN PS2 to USB Converter] on usb-0000:00:14.0-4/input0
[4077245.390055] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:14.0/usb1/1-4/1-4:1.1/0003:0A81:0205.0002/input/input8
[4077245.449900] hid-generic 0003:0A81:0205.0002: input,hidraw1: USB HID v1.10 Mouse [CHESEN PS2 to USB Converter] on usb-0000:00:14.0-4/input1
[4077272.721817] fbcon: inteldrmfb (fb0) is primary device
[4077272.748609] Console: switching to colour frame buffer device 128x48
[4077272.767415] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[4077295.853502] usb 1-6: new high-speed USB device number 3 using xhci_hcd
[4077296.002270] usb 1-6: New USB device found, idVendor=14dd, idProduct=0002
[4077296.002275] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4077296.002281] usb 1-6: Product: Multidevice
[4077296.002285] usb 1-6: Manufacturer: Peppercon AG
[4077296.002289] usb 1-6: SerialNumber: 03CDF9B8AC3A66ED5EE5F7C3072A91E8
[4077296.004069] input: Peppercon AG Multidevice as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0/0003:14DD:0002.0003/input/input9
[4077296.061725] hid-generic 0003:14DD:0002.0003: input,hidraw2: USB HID v1.01 Keyboard [Peppercon AG Multidevice] on usb-0000:00:14.0-6/input0
[4077296.062405] input: Peppercon AG Multidevice as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1/0003:14DD:0002.0004/input/input10
[4077296.062625] hid-generic 0003:14DD:0002.0004: input,hidraw3: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:14.0-6/input1
[4088040.967662] usb 1-6: USB disconnect, device number 3
[4088104.869754] usb 1-6: new high-speed USB device number 4 using xhci_hcd
[4088105.018599] usb 1-6: New USB device found, idVendor=14dd, idProduct=0002
[4088105.018605] usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[4088105.018609] usb 1-6: Product: Multidevice
[4088105.018613] usb 1-6: Manufacturer: Peppercon AG
[4088105.018616] usb 1-6: SerialNumber: 03CDF9B8AC3A66ED5EE5F7C3072A91E8
[4088105.020475] input: Peppercon AG Multidevice as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.0/0003:14DD:0002.0005/input/input11
[4088105.078213] hid-generic 0003:14DD:0002.0005: input,hidraw2: USB HID v1.01 Keyboard [Peppercon AG Multidevice] on usb-0000:00:14.0-6/input0
[4088105.079216] input: Peppercon AG Multidevice as /devices/pci0000:00/0000:00:14.0/usb1/1-6/1-6:1.1/0003:14DD:0002.0006/input/input12
[4088105.079532] hid-generic 0003:14DD:0002.0006: input,hidraw3: USB HID v1.01 Mouse [Peppercon AG Multidevice] on usb-0000:00:14.0-6/input1
[4425126.063597] usb 1-6: USB disconnect, device number 4
[4425127.120530] usb 1-4: USB disconnect, device number 2
[4608173.543565] audit: type=1400 audit(1577024807.084:2): apparmor="DENIED" operation="change_onexec" info="label not found" error=-2 profile="unconfined" name="system_tor" pid=4141 comm="(tor)"
[4674102.600246] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[5389167.147735] TCP: enp1s0: Driver has suspect GRO implementation, TCP performance may be compromised.
[5790886.556050] md: data-check of RAID array md0
[5790886.570198] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
[5790886.580154] md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
[5790889.088669] md: md0: data-check done.
[5790889.091915] md: data-check of RAID array md1
[5790928.804089] md: md1: data-check done.
[5790928.808618] md: data-check of RAID array md2
[5793342.403501] md: md2: data-check done.
[5885167.576271] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[5942225.927974] r8169 0000:01:00.0 enp1s0: link down
[5942286.125907] r8169 0000:01:00.0 enp1s0: link up
[6100421.130628] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[6848807.673874] DCCP: Activated CCID 2 (TCP-like)
[6848807.681997] sctp: Hash tables configured (bind 1024/1024)
[8210127.728955] md: data-check of RAID array md0
[8210127.742698] md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
[8210127.780876] md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
[8210130.257361] md: md0: data-check done.
[8210130.260788] md: data-check of RAID array md1
[8210170.116940] md: md1: data-check done.
[8210170.121703] md: data-check of RAID array md2
[8212579.951548] md: md2: data-check done.
[8325516.433600] TCP: request_sock_TCP: Possible SYN flooding on port 443. Sending cookies.  Check SNMP counters.
[9051600.650503] r8169 0000:01:00.0 enp1s0: link down
[9051660.833780] r8169 0000:01:00.0 enp1s0: link up

# 3  
Old 02-13-2020
Update:

Have checked the health of the disks and all seems fine, for example (one of many checks):

Code:
ubuntu#  smartctl -H /dev/sda
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.15.0-33-generic] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

# 4  
Old 02-13-2020
I have seen cases like that when you have enormous disc cache synchronising... all is fine when pure transactional mode ( the periodic sync has hmmm "little" to update ) but when it comes to batches running, it fills the cache before periodic sync and then the system has to wait the end of forced sync (file cache OS side...)
Other contention possible big file cache with big SGA: parsing both and not finding reduces perfs...

Last edited by vbe; 02-13-2020 at 04:07 AM..
This User Gave Thanks to vbe For This Post:
# 5  
Old 02-13-2020
I would check if at that time your server is doing something "heavy batch mod style" at the same time like backup or import/export of tables...
# 6  
Old 02-13-2020
Quote:
Originally Posted by vbe
I would check if at that time your server is doing something "heavy batch mod style" at the same time like backup or import/export of tables...
Yes, I know...

That is why I previously mentioned that I have checked all cron and cron-like processes which are all batch-like processes. The server only runs a LAMP application, so there are no other application processes of any consequence.

I think I mentioned all the things you mentioned already in an earlier post; but thanks for brainstorming... and coming up with ideas.

Honestly, I am already logging all batch-like processing at the application level, and there is no correlation, as I mentioned.
# 7  
Old 02-13-2020
you tried
Code:
sar -b 5 5

or more adequate value?
Login or Register to Ask a Question

Previous Thread | Next Thread

10 More Discussions You Might Find Interesting

1. Programming

ESP32 (ESP-WROOM-32) as an MQTT Client Subscribed to Linux Server Load Average Messages

Here we go.... Preface: ..... so in a galaxy far, far, far away from commercial, data sharing corporations..... For this project, I used the ESP-WROOM-32 as an MQTT (publish / subscribe) client which receives Linux server "load averages" as messages published as MQTT pub/sub messages.... (6 Replies)
Discussion started by: Neo
6 Replies

2. UNIX for Dummies Questions & Answers

Help with load average?

how load average is calculated and what exactly is it difference between cpu% and load average (9 Replies)
Discussion started by: robo
9 Replies

3. UNIX for Dummies Questions & Answers

Load average spikes once an hour

Hi, I am getting a high load average, around 7, once an hour. It last for about 4 minutes and makes things fairly unusable for this time. How do I find out what is using this. Looking at top the only thing running at the time is md5sum. I have looked at the crontab and there is nothing... (10 Replies)
Discussion started by: sm9ai
10 Replies

4. Solaris

Load Average and Lwps

NPROC USERNAME SWAP RSS MEMORY TIME CPU 320 oracle 23G 22G 69% 582:55:11 85% 47 root 148M 101M 0.3% 99:29:40 0.3% 53 rafmsdb 38M 60M 0.2% 0:46:17 0.1% 1 smmsp 1296K 5440K 0.0% 0:00:08 0.0% 7 daemon ... (2 Replies)
Discussion started by: snjksh
2 Replies

5. UNIX for Advanced & Expert Users

Load average in UNIX

Hi , I am using 48 CPU sunOS server at my work. The application has facility to check the current load average before starting a new process to control the load. Right now it is configured as 48. So it does mean that each CPU can take maximum one proces and no processe is waiting. ... (2 Replies)
Discussion started by: kumaran_5555
2 Replies

6. UNIX for Dummies Questions & Answers

Please Help me in my load average

Hello AlL,.. I want from experts to help me as my load average is increased and i dont know where is the problem !! this is my top result : root@a4s # top top - 11:30:38 up 40 min, 1 user, load average: 3.06, 2.49, 4.66 Mem: 8168788k total, 2889596k used, 5279192k free, 47792k... (3 Replies)
Discussion started by: black-code
3 Replies

7. Solaris

load average query.

Hi, i have installed solaris 10 on t-5120 sparc enterprise. I am little surprised to see load average of 2 or around on this OS. when checked with ps command following process is using highest CPU. looks like it is running for long time and does not want to stop, but I do not know... (5 Replies)
Discussion started by: upengan78
5 Replies

8. UNIX for Dummies Questions & Answers

top - Load average

Hello, Here is the output of top command. My understanding here is, the load average 0.03 in last 1 min, 0.02 is in last 5 min, 0.00 is in last 15 min. By seeing this load average, When can we say that, the system load averge is too high? When can we say that, load average is medium/low??... (8 Replies)
Discussion started by: govindts
8 Replies

9. UNIX for Dummies Questions & Answers

Load Average

Hello all, I have a question about load averages. I've read the man pages for the uptime and w command for two or three different flavors of Unix (Red Hat, Tru64, Solaris). All of them agree that in the output of the 2 aforementioned commands, you are given the load average for the box, but... (3 Replies)
Discussion started by: Heathe_Kyle
3 Replies

10. UNIX for Advanced & Expert Users

load average

we have an unix system which has load average normally about 20. but while i am running a particular unix batch which performs heavy operations on filesystem and database average load reduces to 15. how can we explain this situation? while running that batch idle cpu time is about %60-65... (0 Replies)
Discussion started by: gfhgfnhhn
0 Replies
Login or Register to Ask a Question