Gearside Design

How to stop spambots like semalt, buttons-for-website, darodar, and others!

Updates
June 1, 2015: Because keeping up with spambots is very time-consuming and requires so many filters/exclusions, I now prefer an alternate "Valid Hostname" method (see below).

Who doesn’t love seeing traffic spikes from referrers in their Google Analytics reports? The first thing most of us do is copy the referrer path into a new browser tab and try to find where the link back is. Have you ever gone to the referring page only to have it try to sell you on their SEO services and a link to your site is nowhere to be found? Congratulations, you found a spambot!

Spambots are as clever as they are frustrating. They prey upon web admins who don’t know how they work as a means of getting traffic and ultimately selling their “services” (which may well include hijacking your web server to use as a bot in their scheme). Regardless of their motive, they are ruining your metrics! Spambot traffic needs to be dealt with ASAP, so let’s get started.

There are many ways to remove spambots from your Google Analytics reports. I’ve narrowed it down to my favorite four.

Google Analytics Bot Filtering

Let’s start with the easiest one. As with the GA Filters (coming up), the traffic still hits your website; Google Analytics simply doesn’t report it. If you ever change Google Analytics accounts or create a new website, you’ll need to enable this again. Fortunately, that’s easy to do:

  • Log into Google Analytics.
  • Select the property you’d like to work on.
  • Under the Admin tab (at the very top), click “View Settings” in the right-most column.
  • Scroll down and check the box for “Exclude all hits from known bots and spiders”.

Google Analytics Filters

This is a last-chance type of filter. Setting up a filter to hide traffic from a certain ISP or domain prevents future hits from that referrer from being reported. I consider this an out-of-sight, out-of-mind fix: the spambots are still going to your site, but Google Analytics will not report the traffic. The caveat is that you’ll be making plenty of filters, and if a spambot changes its TLD, you’ll need another one. That said, I still recommend it, because if you change web hosts or re-code your website you won’t need to remember to copy anything over. As long as you’re still using the same Google Analytics account, the filters will continue working.

 

To set up a filter, log into Google Analytics and select the property you’d like to work with. Click on the “Admin” tab at the top of the page, and in the right-most column click “Filters”. Create a new filter and give it a name like “Semalt Spambot” so you know exactly what it is for. Set the filter type to Custom, choose “Exclude”, and select “Campaign Source” as the filter field. In the Filter Pattern input field, type in the domain of the spambot. In this example it would be “semalt.com”. Save the filter and you’re done! Rinse and repeat for each domain you’d like to filter.

For advanced users, you could use a regex pattern here like .*domain1\.com|.*domain2\.com, but keep in mind that you need to constantly maintain these filters, and Google Analytics only allows filter patterns of up to 255 characters.
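If you keep your spambot domains in a list, a small script can build these patterns for you and split them whenever one would go over the 255-character limit. This is only a rough sketch: the function name `build_exclude_patterns` and the sample domains are my own placeholders, and it assumes the domains contain nothing but letters, digits, hyphens, and periods.

```python
MAX_PATTERN_LENGTH = 255  # filter pattern limit mentioned above


def build_exclude_patterns(domains, max_length=MAX_PATTERN_LENGTH):
    """Join domains into as few "|"-separated patterns as fit within max_length."""
    patterns = []
    current = ""
    for domain in domains:
        # Escape periods so "." only matches a literal period.
        piece = ".*" + domain.replace(".", r"\.")
        candidate = piece if not current else current + "|" + piece
        if len(candidate) <= max_length:
            current = candidate
        else:
            # The pattern is full: store it and start a new one.
            if current:
                patterns.append(current)
            current = piece
    if current:
        patterns.append(current)
    return patterns


if __name__ == "__main__":
    # Example domains only; swap in your own list.
    for pattern in build_exclude_patterns(["semalt.com", "darodar.com"]):
        print(pattern)
```

Each pattern it returns would go into its own filter, so a long domain list simply turns into several filters instead of one.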

htaccess Rules

This method will stop the spambot before it even sees the first byte of code from your front-end. What’s nice about this is that the .htaccess file can live in your public_html (or equivalent) directory and block spambots for everything served from beneath that directory. This means that if you have multiple sites there, you only have to do this once. The drawback is that you need to remember to bring these rules over any time you change hosts, or if you are re-coding a website that isn’t covered by a previous .htaccess file.

To create these rules, locate your .htaccess file (if it exists). If you are using WordPress, it will be found in the top-most directory where you installed WP. If it doesn’t exist, create a new one (there are tons of other benefits to using an .htaccess file too, like compression, caching, and other security precautions!). Warning: any syntax error in your .htaccess file will trigger a 500-level error on your server, so just be careful. If you do trigger an error, either remove or comment out the code and try again (comments use a “#” symbol).

Here is an example that blocks a handful of spambots:

[Image: example .htaccess rules]
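Rules like these usually rely on Apache’s mod_rewrite to check the referrer and answer with a 403 Forbidden. Since the list of domains keeps growing, here is a rough sketch of generating the block from a list instead of typing it by hand. The function name `build_htaccess_rules` and the three domains are just illustrations, and it assumes mod_rewrite is enabled on your server.

```python
def build_htaccess_rules(domains):
    """Return mod_rewrite rules that send 403 Forbidden to the listed referrers."""
    if not domains:
        # Without any conditions the RewriteRule would block every request.
        return ""
    lines = ["<IfModule mod_rewrite.c>", "RewriteEngine On"]
    for index, domain in enumerate(domains):
        escaped = domain.replace(".", r"\.")  # match a literal period
        # NC = case-insensitive; OR chains every condition except the last.
        flags = "[NC]" if index == len(domains) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escaped} {flags}")
    lines.append("RewriteRule .* - [F]")  # F = respond with 403 Forbidden
    lines.append("</IfModule>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example domains only; replace with the referrers you want to block.
    print(build_htaccess_rules(["semalt.com", "buttons-for-website.com", "darodar.com"]))
```

Paste the printed block into your .htaccess file and test the site right away, since a typo there can take the whole site down with a 500 error.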

Valid Hostname Include

This is now my preferred method for blocking spambots. I use this in combination with various server-side filters (like the .htaccess method above) along with a custom PHP function that pulls the domain list from my regularly updated Gist of common referral spambots.

Instead of filtering out bad domains, this method only allows valid hostnames. To set this up, create a new segment that only allows hostnames matching a given regex pattern (Don’t worry! This pattern is very easy to write!). Here’s a screenshot of my hostname segment:

[Screenshot: valid-hostname-segment]

The pattern should include any hostname/domain that you use, as well as the hostnames of Google’s translate and cache services. Here is mine, so you can modify it as needed. Just remember to separate domains with a | and escape periods with a \

RegEx

.*gearside\.com|.*gearsidecreative\.com|.*googleusercontent\.com
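Before relying on a pattern like this, it can help to run it against a few hostnames and see which ones get through. The sketch below does that with Python’s `re` module. The sample hostnames are made up, and I’m assuming Google Analytics treats the pattern as a partial match, which is why it uses `re.search`.

```python
import re

# The hostname pattern from above, with periods escaped.
VALID_HOSTNAMES = r".*gearside\.com|.*gearsidecreative\.com|.*googleusercontent\.com"


def is_valid_hostname(hostname, pattern=VALID_HOSTNAMES):
    """Return True if the pattern matches anywhere in the hostname."""
    return re.search(pattern, hostname) is not None


if __name__ == "__main__":
    # Hypothetical hostnames as they might show up in a hostname report.
    samples = [
        "gearside.com",
        "www.gearside.com",
        "webcache.googleusercontent.com",
        "semalt.com",
        "(not set)",
    ]
    for hostname in samples:
        print(hostname, is_valid_hostname(hostname))
```

The first three samples pass and the last two are rejected. One thing to keep in mind: because the pattern is not anchored, a hostname such as gearside.com.example.net would also pass, so you may want to end each alternative with a $ if that worries you.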

Once you create the segment, view a report with that segment applied. Once you’re certain that you’ve included all hostnames that you get actual traffic from, it’s time to create a new View with a filter. In the Google Analytics admin, create a new View called “My Hostnames”. Then, under “Filters”, create a custom Include filter that uses the same filter pattern that you used in your segment. Be sure to select “Hostname” from the filter field dropdown.

The reason I like this so much is that it has the obscurity of a UA-000000-2 tracking ID (which, at least right now, is targeted much less often than the UA-000000-1 IDs), and it stops all ghost spambots, the ones that send hits straight to Google Analytics without ever visiting your site.

 

Hope this helps! If you have other ideas or have a good case study be sure to comment below. Here is a good example of one motive for spambots »