Location: Asia Pacific, Cyberspace, in the Dark Dystopia
Posts: 19,118
Thanks Given: 2,351
Thanked 3,359 Times in 1,878 Posts
So, let's try this:
Empty the "trap" again and block two Chinese subnetworks with rogue, unidentified bot activity.
Honestly, this is starting to "annoy me a lot". It is quite possible that these performance hits, and all the valuable "time in life" I am spending to find the cause of these hits / spikes, are down to rogue, unidentified bots from Chinese networks.
If this continues, I am going to start blocking Chinese networks at the /16 and /8 levels (entire networks).
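To get a feel for what blocking at the /16 or /8 level means, here is a minimal sketch using Python's standard `ipaddress` module. The address used is a documentation address (RFC 5737), not one of the actual offenders, and the helper name `enclosing_network` is made up for illustration.

```python
import ipaddress

def enclosing_network(ip, prefix_len):
    """Return the network of the given prefix length that contains ip."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)

# 203.0.113.57 is a placeholder documentation address, not a real bot.
for prefix_len in (24, 16, 8):
    net = enclosing_network("203.0.113.57", prefix_len)
    print(f"/{prefix_len}: {net} covers {net.num_addresses:,} addresses")
```

A /16 covers 65,536 addresses and a /8 covers 16,777,216, so widening the block quickly takes out a lot of traffic along with the bots.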
First, let's see if this is indeed the main source of these spikes. As we all know from situational awareness theory, the famous OODA loop by John Boyd goes:
OBSERVE
ORIENT
DECIDE
ACT
Already, we have enough information to ACT. But let's continue to OBSERVE.
The loop goes on ... and on ....
Please note that we cannot trust apache2 modules and other third-party software to automatically block IPs, because this can result in blocking the "good bots" which are important for search engine optimization and site traffic.
That means that if it is confirmed that these kinds of bots continue to be the cause of problems, I will need to DECIDE how to deal with this situation moving forward. At this point in time, I am going to continue to "trap and trace" before making a decision. However, it does seem that rogue, unidentified bots from Chinese networks are causing performance issues and need to be "dealt with".
If anyone else has experienced similar issues and has an interesting potential solution to this problem, please reply and share your ideas.
Thanks!
PS: I may consider automating this, as follows:
Capture network session activity when one minute load average exceeds a threshold (as I am doing now).
Filter results captured in the DB based on "hitcount" and "country".
If the "hitcount" exceeds a certain threshold and "country" is in an array of "countries known to have rogue bots".
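The steps above could be sketched roughly as follows. This is only an illustration, not the code running on the site: the thresholds, the country list, and the row layout (`ip`, `hitcount`, `country`) are all assumptions, and the sketch only lists candidates for manual review, since the post does not say what action would follow.

```python
import os

# All three values below are hypothetical and would need tuning.
LOAD_THRESHOLD = 5.0        # one-minute load average that triggers a capture
HIT_THRESHOLD = 100         # hits per captured window considered suspicious
WATCHED_COUNTRIES = {"CN"}  # country codes on the watch list

def load_exceeded(threshold=LOAD_THRESHOLD):
    """True when the one-minute load average is above the threshold (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return one_minute > threshold

def candidates(rows, hit_threshold=HIT_THRESHOLD, countries=WATCHED_COUNTRIES):
    """Return the IPs whose hitcount and country both match the filter.

    rows is assumed to be an iterable of dicts with the keys
    'ip', 'hitcount' and 'country', as they might come out of the DB.
    """
    return [
        row["ip"]
        for row in rows
        if row["hitcount"] > hit_threshold and row["country"] in countries
    ]

# Placeholder rows using documentation addresses, not real captured data.
sample_rows = [
    {"ip": "198.51.100.7", "hitcount": 450, "country": "CN"},
    {"ip": "198.51.100.8", "hitcount": 12, "country": "CN"},
    {"ip": "192.0.2.10", "hitcount": 900, "country": "US"},
]
print(candidates(sample_rows))
```

Keeping the output as a review list rather than an automatic block fits the concern above about accidentally blocking "good bots".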
Update:
I have confirmed 100% that the source of these spikes was very aggressive, rogue, unidentified bots originating on Chinese networks. After blocking the resident networks of these bots, all spikes have stopped, completely.
This is a "huge success story", going from unknown, uncorrelated, nearly random performance hits / spikes to cause identification and total resolution. As you can see from the graph over the last 24 hours, there have been zero spikes.
I will keep the same MQTT and Node-RED instrumentation in place (which I am very pleased with) and will also keep all "spike trapping" instrumentation and DB logging in place. I am fairly confident more of these "pesky" bots will appear sooner or later, so if other spikes appear, I will trap them, identify the source and block their resident networks.
Success!
MQTT and Node-RED did not "solve the problem". MQTT and Node-RED provided a very powerful and flexible way for me to quickly instrument custom sensors and logging, which helped me identify the problem.
I highly recommend MQTT and Node-RED. These tools are free. Thank you very much, MQTT and Node-RED developers!
Here we go....
Preface:
..... so in a galaxy far, far, far away from commercial, data sharing corporations.....
For this project, I used the ESP-WROOM-32 as an MQTT (publish / subscribe) client which receives Linux server "load averages" as messages published as MQTT pub/sub messages.... (6 Replies)
Hi,
I am getting a high load average, around 7, once an hour. It lasts for about 4 minutes and makes things fairly unusable for this time.
How do I find out what is causing this? Looking at top, the only thing running at the time is md5sum.
I have looked at the crontab and there is nothing... (10 Replies)
Hi,
I am using a 48-CPU SunOS server at my work.
The application has a facility to check the current load average before starting a new process, to control the load.
Right now it is configured as 48. So it means that each CPU can take at most one process and no process is waiting.
... (2 Replies)
Hello all,
I would like help from the experts, as my load average has increased and I don't know where the problem is!
This is my top output:
root@a4s # top
top - 11:30:38 up 40 min, 1 user, load average: 3.06, 2.49, 4.66
Mem: 8168788k total, 2889596k used, 5279192k free, 47792k... (3 Replies)
Hi,
I have installed Solaris 10 on a T-5120 SPARC Enterprise.
I am a little surprised to see a load average of around 2 on this OS.
When checked with the ps command, the following process is using the most CPU. It looks like it has been running for a long time and does not want to stop, but I do not know... (5 Replies)
Hello, here is the output of the top command. My understanding is that
the load average is 0.03 over the last 1 min, 0.02 over the last 5 min, and 0.00 over the last 15 min.
Looking at this load average, when can we say that the system load average is too high?
When can we say that the load average is medium/low??... (8 Replies)
Hello all, I have a question about load averages.
I've read the man pages for the uptime and w command for two or three different flavors of Unix (Red Hat, Tru64, Solaris). All of them agree that in the output of the 2 aforementioned commands, you are given the load average for the box, but... (3 Replies)
We have a Unix system which has
a load average of normally about 20.
But while I am running a particular Unix batch which performs heavy
operations on the filesystem and database, the average load
reduces to 15.
How can we explain this situation?
While running that batch, idle CPU time is about 60-65%... (0 Replies)