UNIX for Advanced &amp; Expert Users

02-14-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Update:

Two more spikes over the course of the past 12 hours, none of which show any correlation to an increase the number of bots; however, that does not say anything about the velocity of bots hitting the site. However, there is no correlation (over the last 12 hours and with two spikes recorded) with an increased overall number of bots. In addition, there is no correlation to increase network I/0.

The only consistent correlation so far is:

The one minute load average spike.
The increase in the total number of apache2 processes.
The increase in the total percentage of apache2 CPU % recorded.

There is weak, but inconsistent, correlation with:

An increase in the number of bots on the site .
An increase in the overall MySQL CPU percent.

There is no correlation, weak or otherwise to:

An increase in network I/O.
Any application cron activity.
Any system cron activity.
Any system disk I/O errors or anomalies.

So, as a sanity check, I have disabled apache2 mod pagespeed (just now) to see if there is any effect at all.

Code:

<IfModule pagespeed_module>
    # Turn on mod_pagespeed. To completely disable mod_pagespeed, you
    # can set this to "off".
    ModPagespeed off

    ###
    ###

This is just a "shot in the dark" (disabling mod pagespeed), but at least we will know something. If the spikes continue, I will turn it back on, of course.

Neo

02-15-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Update:

Quote:

Originally Posted by Neo

So, as a sanity check, I have disabled apache2 mod pagespeed (just now) to see if there is any effect at all.

Code:

<IfModule pagespeed_module>
    # Turn on mod_pagespeed. To completely disable mod_pagespeed, you
    # can set this to "off".
    ModPagespeed off

    ###
    ###

This is just a "shot in the dark" (disabling mod pagespeed), but at least we will know something. If the spikes continue, I will turn it back on, of course.

Did not help at all. Slowed the site down a bit and did not stop any spikes.

Code:

ModPagespeed on

Neo

02-15-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Next:

I have some old "cyberspace situational awareness" PHP code I used for a visualization project a few years ago, which captures and stores details information on web session activity; this code has proven handy identifying rouge bots in the past.

So, I have modified that code to capture and store detailed session information, including the number of hits per IP address, the user agent string, country code, etc. when the 1 minute load average is above 20 and less than 50.

Code:

$theload = getLoadAvg();
if (floatval($theload) > 20.0 && floatval($theload) < 50.0) 
{
  /// the old CSA code to parse web session activity and store the results in the DB
}

So, let's see what happens the next time we get a spike... this should be interesting.

Code:

mysql> describe neo_csa_session_manager;
+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| id           | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| user_id      | int(11)          | NO   | MUL | 0       |                |
| session_id   | varchar(255)     | NO   |     | NULL    |                |
| url          | text             | NO   |     | NULL    |                |
| ip_address   | varchar(45)      | NO   | MUL | NULL    |                |
| user_agent   | varchar(255)     | NO   |     | NULL    |                |
| bot_flag     | tinyint(1)       | NO   |     | 0       |                |
| robot_txt    | mediumint(6)     | NO   |     | 0       |                |
| sitemap      | mediumint(6)     | NO   |     | 0       |                |
| riskscore    | int(11)          | NO   |     | 0       |                |
| country_iso2 | varchar(2)       | NO   |     | UN      |                |
| country      | varchar(50)      | NO   |     | UNKNOWN |                |
| hitcount     | int(10) unsigned | NO   |     | 1       |                |
| firstseen    | bigint(11)       | NO   |     | NULL    |                |
| unixtime     | bigint(11)       | YES  |     | NULL    |                |
| longitude    | float            | NO   |     | 0       |                |
| latitude     | float            | NO   |     | 0       |                |
+--------------+------------------+------+-----+---------+----------------+
17 rows in set (0.00 sec)

This User Gave Thanks to Neo For This Post:

Neo

02-15-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Update:

Just noticed, after digging around in the DB logs from my MQTT instrumentation, that the last spike correlated with a jump in data transferred out of the network interface:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-15-65207-pmjpg

Typical values are much less (see below), so this would seem to validate the "rouge bots hypothesis", currently leading the candidate to explain these periodic spikes:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-15-65655-pmjpg

This is also the first "hard correlation" of a spike with network interface iostats, so, let's see if my code in the post before this one will trap the next big spike

Neo

02-15-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Update:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-92316-amjpg

There were two spikes three hours apart; both were captured by my HTTP session logging program, which logs session detaisl aggregated by IP address. In this case, the code starts logging (kicks off) when the one minute load average exceeds 20 and ends when the same load average exceeds 50. So, in a spike we will record a very short snap shot in time of the traffic (on the way up and on the way down, but I may change this in the future to only capture on the way up).

The results were as follows:

In both spikes, there were at least fort Chinese IP addresses present at the top of the "hit count" chart (the DB table):

116.232.49.231
116.232.48.112
117.144.138.130
117.135.187.18

All four of these IP addresses were present during the 4AM and 7AM (Bangkok Time) spikes, and all three identified with the same user agent string:

Code:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

This indicates these IP addresses (in China) are running the same bot software; but that is only an indication (but a fairly strong indication).

However, there is no denying that my "trap the bots" code has identified four Chinese IP addresses running some bot software which is more-than-likely contributing to the cause of the spikes.

In addition, during the same two spikes spaced three hours apart (as mentioned), there was one US-based IP address running with a blank user agent string:

208.91.198.98

Keep in mind in this capture, the code only captured the session information when the one minute load average was above 20 and below 50, and there were two spikes spaced almost exactly three hours apart:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-90255-amjpg

So, having recorded the events above, I have just now emptied that DB table and have "reset the trap" for the next spikes.

Now, turning our attention to my instrumentation log where I am using MQTT to log all application and system cron (batch) events (start and end times) as well as a number of system metrics, we see there is a correlation (during the first spike) at 1581800045 of a spike in traffic out of the network interface, along with correlating spikes in Apache2 processes and CPU.

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-90944-amjpg

Now looking at the second event (spike 2), there is a similar pattern, but of interest in that proceeding both spikes, is an hourly application cron function:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-91546-amjpg

This seems to indicate that the cause of the spikes, in this case, is a combination of aggressive bot activity coincidental with an hourly cron / batch process, causing spikes.

To be more certain of this, I am going to change the time of the "update attachment view" cron process from kicking off on the 53 minute mark of every hour, to the 23 minute mark of every hour, and see if the times of the spikes shift in time as well.

Neo

02-15-2020

Administrator

19,118, 3,359

Join Date: Sep 2000

Last Activity: 15 July 2022, 8:51 AM EDT

Location: Asia Pacific, Cyberspace, in the Dark Dystopia

Posts: 19,118

Thanks Given: 2,351

Thanked 3,359 Times in 1,878 Posts

Just got another spike exactly three hours after the last one, not correlated to any cron / batch process:

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-100552-amjpg

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-100408-amjpg

Chinese IPs:

117.144.138.130
116.232.49.231
116.232.48.112

Nearly Random, Uncorrelated Server Load Average Spikes-screen-shot-2020-02-16-100817-amjpg

Same Chinese bots, same IPs, same user agent strings.

This User Gave Thanks to Neo For This Post:

Neo