I spent many hours hunting down hungry, rogue spiders in the log files. I found many spiders not obeying robots.txt, so I blocked their networks completely.
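For anyone curious what that log hunting looks like in practice, here is a minimal sketch: it scans an Apache combined-format access log for user agents requesting paths that robots.txt disallows, then ranks the worst offenders. The file name and the disallowed prefixes are placeholders, not our actual setup.

```python
#!/usr/bin/env python3
# Rough sketch of hunting rogue spiders in an access log: count requests
# for paths that robots.txt disallows, grouped by IP and user agent.
# File name and disallowed prefixes are assumptions, not the real config.
import re
from collections import Counter

# Paths our hypothetical robots.txt disallows for all user agents.
DISALLOWED_PREFIXES = ("/cgi-bin/", "/private/", "/search")

# Apache "combined" log format: IP ... "METHOD PATH PROTO" status size "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

offenders = Counter()  # (ip, user agent) -> number of disallowed requests

with open("access.log") as log:  # hypothetical log file name
    for line in log:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if m.group("path").startswith(DISALLOWED_PREFIXES):
            offenders[(m.group("ip"), m.group("agent"))] += 1

# Print the worst offenders so their networks can be reviewed and blocked.
for (ip, agent), hits in offenders.most_common(20):
    print(f"{hits:6d}  {ip:15s}  {agent}")
```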
Maybe we will see some improvement before the upgrade, but I am not confident we will see much, since the bot traffic is very small compared to the Google search referral traffic.
Also, I spent many hours trying various MySQL and Apache configurations and tuning parameters, but that did not result in any tangible improvements, and in some cases adding or allocating more resources just made things worse. For example, we were getting errors because Apache needed more than the 250 concurrent server processes it was allowed. I raised the limit to 350, and the load average went to over 350 (!!) during peak time, crushing the server.
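The lesson for me was that the concurrent-server limit has to fit in RAM; raising it past that point just trades "server too busy" errors for swapping and a runaway load average. Here is a rough back-of-envelope, with made-up numbers (the per-process size and RAM figures are assumptions, not our actual hardware):

```python
# Back-of-envelope sizing for Apache's concurrent-server limit, with assumed numbers.
ram_mb = 4096            # total RAM on the box (assumed)
reserved_mb = 1024       # leave room for MySQL, OS, file cache, etc. (assumed)
apache_process_mb = 12   # resident size of one httpd child (assumed; check with ps/top)

max_servers = (ram_mb - reserved_mb) // apache_process_mb
print(f"Concurrent server limit should stay around {max_servers}")  # ~256 with these numbers

# Pushing the limit to 350 would need roughly 350 * 12 MB = over 4 GB for Apache
# alone, which is how a box like this ends up swapping and the load average explodes.
```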
Obviously, when we upgrade, traffic will greatly increase again. That is great, and it is thanks to all of your super posts and great expertise, helping others help themselves.
Thank you for your patience during this phase of our growing pains!
As a side note, I am amazed at the number of spiders that roam the internet these days, from all over the world. It seems everyone wants to be the next Google. I even noticed that two of the spiders were operating out of the new Amazon Elastic Compute Cloud (EC2).
Now I have a new security topic to write about, something like:
The Attack of the Cyberspiders from the Clouds!!!
You have to love life on the net