How to Stop Bad Robots from Crawling Your Blog
Hey, everyone. Today’s blog post comes out of a very real and scary situation that I dealt with a couple of weeks ago, so strap in! It’s story tiiiime. If you’re reading this and you have a blog, this is extremely important info, and I strongly suggest pinning or bookmarking this post in case, god forbid, you experience something like this. None of my blogger friends had gone through this, so I didn’t have anyone besides dudes on Stack Overflow to go to for advice. So ya, saddle up, because this is a long, wild, and EDUCATIONAL ride.
THE SPIKE IN TRAFFIC
Three weeks ago, after the launch of this post, I noticed it was getting a ton of hits.
My WordPress app sent me multiple notifications that my traffic was spiking, and I checked and it was spiking A LOT. Like, we’re talking record-breaking.
At first, I was so excited. A ton of traffic to my website? Heck yessss this is what I’ve been waiting and working for! I didn’t think anything of it at first, and I was really happy that I was getting the most traffic I’ve ever received in my (nearly) five years of blogging.
SOMETHING WAS WRONG
After two days, I noticed my stats kept rising. At this point, my curiosity kicked in. How was I getting so many hits to that one post? Did someone share it somewhere? Where was the traffic coming from?
My WordPress analytics were telling me that people were accessing my site through Facebook. Since WordPress analytics aren’t as robust as Google Analytics, I logged on to my GA account to see if I could gain any more insight. WHERE on Facebook was my link?
This was the critical moment when I knew something was wrong. After trying to find the exact place on Facebook where the traffic was coming from, all I could find was this eerie, mysterious string of letters and numbers.
I mean, what the HECK is this?? WHAT IS P-2228/FB? (For the record, I still don’t know.) That can’t be good, y’all. My stomach began to sink. I also typed my link and keywords into the Facebook search bar, and searched all relevant pages that may have reposted it. Nothing turned up. The number of hits to my website KEPT rising! Something was definitely fishy here.
HOW I FOUND THE SOURCE OF TRAFFIC
I host with SiteGround, which is SUPPOSED to be the number 1 hosting site (I have beef with them now though, more on that later). SiteGround automatically scans my website for malware/suspicious activity on a weekly basis. If something was up, wouldn’t it have been detected in the weekly scan?
You guys may not know this, but I actually have a decent technical background. I used to study computers way back when, and I feel comfortable enough navigating the server end/code of my website. I thought to myself, if there was something off about my site, I should be able to tell through SiteGround somehow. Inside SiteGround, I went to my site’s control panel, aka cPanel. Once inside the cPanel, I clicked on “AWStats.”
BAD VS. GOOD ROBOTS
I had never looked at AWStats in my LIFE, but I needed to know if there was info in there that Google/WordPress were failing to provide me with. My intuition was indeed correct, and it was here that I found the source of the problem.
I scrolled and scrolled until I came upon this box of data. Here, I saw that the majority of the traffic to my site was from “Unknown robot (identified by ‘bot’ followed by a space or one of the following characters _+:,.;/\-).” Jeez Louise. THAT CAN’T BE GOOD! I then Googled that exact line and learned several things:
1. What is a robot (in terms of cyber security)? Wikipedia provides a solid definition:
“An Internet bot, also known as web robot, WWW robot or simply bot, is a software application that runs automated tasks (scripts) over the Internet. The largest use of bots is in web spidering (web crawler), in which an automated script fetches, analyzes and files information from web servers at many times the speed of a human. More than half of all web traffic is made up of bots.”
2. What’s the difference between a good robot and a bad robot? This website puts it simply:
“Good bots exist to monitor the web. For example, a “Googlebot” is Google’s web crawling bot, often referred to as a “spider.” Googlebots crawl the Internet for SEO purposes and discover new pages to add to the Google index. These bots make sure we’re being rewarded for our SEO efforts and penalize those who use black hat SEO techniques.
Bad bots represent over 35 percent of all bot traffic. Hackers execute bad bots to perform simple and repetitive tasks. These bots scan millions of websites and aim to steal website content, consume bandwidth and look for outdated software and plugins that they can use as a way into your website and database.”
3. HOW THE HECK DO I STOP THE BAD UNKNOWN ROBOTS!! OMG! (I was in full-on panic mode at this point).
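For the techie readers: that AWStats label is really just describing a pattern match on the visitor’s user-agent string. Here’s a rough Python sketch of what I understand the rule to mean. To be clear, this is my own approximation of the wording in the label, not AWStats’ actual code, and the user-agent strings and the function name are made up for the example.

```python
import re

# Approximation of the rule quoted in the AWStats label: the letters "bot"
# followed by a space or one of the characters _+:,.;/\-
# This is an illustrative guess, not AWStats' actual source code.
BOT_PATTERN = re.compile(r"bot[ _+:,.;/\\-]", re.IGNORECASE)


def looks_like_unknown_bot(user_agent: str) -> bool:
    """Return True if the user-agent matches the 'bot' + separator rule."""
    return BOT_PATTERN.search(user_agent) is not None


# Hypothetical user-agent strings, invented for this example.
samples = [
    "ExampleBot/1.0 (+http://example.com/info)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
    "some-crawler-bot 2.3",
]

for ua in samples:
    print(looks_like_unknown_bot(ua), ua)
```

Keep in mind that a rule like this only catches bots that are polite enough to put “bot” in their name. A sneaky one can pretend to be a regular browser, so treat it as a hint, not proof.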
HOW TO STOP BAD ROBOTS FROM CRAWLING YOUR BLOG
At this point, I reached out to SiteGround’s live support chat while having a breakdown. I was unbelievably upset that bad robots were trying to drive traffic to my site with the intention of crashing my servers and hacking my site. I have worked on my blog for nearly five years and suddenly, out of nowhere, someone was trying to take that away from me. WHY! WHY ME?!? I was hyperventilating and on the verge of tears. In the end, I contacted their live chat support not one, not two, not three, but a total of FIVE TIMES (this is the beef I was talking about earlier). My site was up against some TOUGH robots, guys.
The folks at SiteGround did a number of things to combat the problem. They added a robots.txt file (which you should ALL HAVE! I think I deleted mine years ago, thinking I wouldn’t need it. Big mistake), as well as some lines of code to my .htaccess file (found on cPanel). After all of the changes, it took well over 24 hours for my traffic to return to normalcy. I was ready to accept that most of my traffic would be artificial going forward, when finally, it died down.
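If you’ve never seen a robots.txt file, here’s a small Python sketch of how one gets read. I don’t have the exact rules SiteGround added, so the file contents below are hypothetical (“BadBot” is a made-up crawler name and example.com is a placeholder). The sketch uses Python’s built-in urllib.robotparser to show which crawlers the rules would turn away.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, not the file SiteGround created.
# "BadBot" is an invented crawler name used only for illustration.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text and return a parser that can answer can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# BadBot is told to stay away from the whole site.
print(parser.can_fetch("BadBot", "https://example.com/my-post/"))
# Every other crawler may read posts but not the admin area.
print(parser.can_fetch("Googlebot", "https://example.com/my-post/"))
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))
```

One big caveat: robots.txt is a polite request, not a lock. Well-behaved crawlers check it and obey, but a bad bot can simply ignore it. Rules in .htaccess are different because the server itself enforces them, which may be why both files were changed in my case.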
OTHER THINGS YOU CAN DO
Right before my traffic went back to normal, I was looking into paying for firewall/security software (thank goodness I didn’t end up doing so). Two services I was looking into were Cloudflare and Wordfence. I found out about both of those sites through trusted sources.
I certainly went to hell and back but I also learned so much from this experience. Not only did I learn how to stop bad robots from crawling your blog, I developed a solid understanding of robots in the world of modern cyber security. The scary thing is, bots are continuously evolving. Hackers are constantly inventing new ways to steal information. Knowing this, purchasing firewall software is probably a smart move for the long run.
Have you ever had a cyber security scare on your site? Do you have a robust robots.txt file? Do you feel like you have a good grip on how to stop bad robots from crawling your blog? Let me know what you thought of this post in the comments! I’ve never written about techie stuff before and I want to know if you enjoyed it/found it helpful 🙂
This post uses affiliate links. NRoH thanks you for your support!