9th February, 2017 7 Min read
Data is the cornerstone of all online marketing and business decisions. Understanding visitors, market trends, behaviour and industry shifts all comes down to data-driven insights, and those insights can only be as good as your data.
To gather data we use a variety of monitoring and analytical tools, and one of the most widely used is Google Analytics (GA). It helps gather intel about traffic, users, channels and behaviour, making it the source of all the main website metrics. But if you’ve used it, you have probably noticed that some of the data simply doesn’t add up, and you rightfully doubt its legitimacy.
The most frequent source of inaccurate data picked up by your GA is bot traffic. This should come as no surprise considering that over half of the Internet’s traffic is made by bots, according to a report by Imperva Incapsula. Bot traffic that gets picked up by analytical tools can end up skewing your data reports, which can lead to wrong assumptions and conclusions, impact your site performance and ultimately even harm your business. In this article we’ll cover the basics of bot traffic, how to detect it and how to eliminate it.
Considering that for every human hit there’s a bot hit to a server, it is safe to say that bots are an essential component of the internet infrastructure. On one side we have “good bots”, which organisations use to gather information and perform automated tasks (such as search engine bots that crawl and index your site). On the other side there are “bad bots” deployed by cyber-criminals, whose sole intent is to steal data or to participate in botnets that launch powerful DDoS attacks (such as the Dyn one).
According to Imperva Incapsula, 22.9% of all traffic is made by good bots while 28.9% is malicious. Good bots, such as search engine crawlers, respect your robots.txt file instructions and are excluded from GA reports by default. Bad bots, on the other hand, visit your site with all kinds of intentions, like spamming, content scraping or malware distribution. Advanced bots often do a great job of imitating human behaviour, making it very difficult to separate them from regular human visitors. Vast media coverage of security breaches has further pushed the online community to rethink their security solutions. Many of those solutions offer a high level of bot protection, filtering out most bot traffic.
Because bots are treated mainly as a security issue, they are often an afterthought for marketers. However, even if most of the bot traffic gets eliminated, there are still residues left that your GA easily picks up and treats as legitimate visits. These residues can end up distorting your analytical data. When talking about bad bots that get picked up by GA, there are two main types: ghost bots, which are filtered by hostname, and zombie bots, which are filtered by the footprints they leave in your reports.
If you check your GA dashboard regularly, you may occasionally notice a sudden burst in traffic. If there were no special occasions, campaigns or social events that could justify the increase, it is likely due to bot traffic. In that case make sure to check some of the following GA reports for those unexpected spikes in traffic:
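Whichever report you check, a spike can also be flagged programmatically once daily session counts have been exported. The following is only a minimal sketch: the function name `find_spikes`, the 7-day window and the 3x threshold are illustrative assumptions, not GA features, and the sample numbers are made up.

```python
from statistics import median


def find_spikes(daily_sessions, window=7, factor=3.0):
    """Return indices of days whose sessions exceed `factor` times the
    median of the preceding `window` days (thresholds are assumptions)."""
    spikes = []
    for i in range(window, len(daily_sessions)):
        baseline = median(daily_sessions[i - window:i])
        if baseline > 0 and daily_sessions[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Made-up daily session counts, e.g. copied from an exported report.
sessions = [120, 130, 125, 118, 140, 135, 128, 131, 900, 127]
print(find_spikes(sessions))  # indices of days worth a closer look
```

A median baseline is used here so that a single earlier spike does not inflate the reference level for the following days. A flagged day is only a hint; it still needs to be checked against campaigns and events before being blamed on bots.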
Google has announced a global solution for filtering bots and referral spam in GA, but until it gets released here are a few tricks you can implement yourself. Before you start applying any filters, make sure to set up an unfiltered view, a “view with no filters”, that will encompass all your website traffic data, including bots. Don’t skip this step: nobody wants to lose data to a typo.
As a first step, you should get to know your bot traffic. As bots can be good, bad or neutral, make sure all your internal teams are coordinated on the topic. Align marketing, IT, sales, site operations and so on, as some bots may be related to partners, tools in use or extensions, and thus be legitimate.
What follows is a recap of the procedures needed to successfully exclude bot data. For an in-depth, step-by-step guide, make sure to check our previous article on the topic of filtering bot traffic.
To exclude bot data, you need to label bots as bots. Next, you need to create appropriate filters in GA and apply them to your data in order to keep it as clean as possible. Start by going to the Admin section in GA, then Settings and then Create Copy. Name it – www.yourwebsite.com// Bot Exclusion View or similar. You will then use this new view to filter out bot traffic. At first it will appear empty, but it will build up with time.
Having detected your bot sources, you can now proceed to setting up filters that will eliminate invalid data from your reports. For ghost bots, you will need to set up a filter by Hostname. Take note of all the valid hostnames, then create a regex that matches only those. The new regex will also capture all subdomains of the main domain. Now head back to your Bot Exclusion view to add a new custom filter. Select Include Only Hostname and add the created regex into the field.
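The hostname regex described above could be assembled along these lines. This is a sketch only: `build_hostname_regex` is a hypothetical helper, `yourcheckout.example` is a made-up second hostname, and the resulting pattern should be checked against your own hostname report before it is pasted into a filter.

```python
import re


def build_hostname_regex(valid_domains):
    """Build one pattern that matches each valid domain and any of its
    subdomains, and nothing else."""
    escaped = [re.escape(domain) for domain in valid_domains]
    # Optional "<anything>." prefix covers subdomains; ^ and $ reject
    # look-alikes such as "yourwebsite.com.spam.example".
    return r"^(.+\.)?(" + "|".join(escaped) + r")$"


# Hypothetical list of hostnames that legitimately serve your tracking code.
pattern = build_hostname_regex(["yourwebsite.com", "yourcheckout.example"])

for host in ["www.yourwebsite.com", "yourwebsite.com",
             "fake-yourwebsite.com", "(not set)"]:
    verdict = "keep" if re.search(pattern, host) else "drop"
    print(f"{host}: {verdict}")
```

Testing the pattern locally against a list of hostnames copied from your reports is a cheap way to catch a typo before the filter starts discarding real visits.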
With zombie bots, it’s a bit more complicated. You’ll have to filter zombie bots by detecting their footprints first. Use the reports where you previously detected bot traffic and add those suspicious sources to a new regex. Proceed to detect other bot footprints and repeat the regex procedure. After detecting zombie bot footprints, head back to the Admin section – Filters in the Bot Exclusion view to apply the filters. Follow the same steps as for ghost bots, but instead of including by Hostname, create filters that exclude the data matching each regex. You can see a step-by-step guide for setting up filters here.
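Collecting suspicious sources into exclusion regexes can be sketched as follows. The source names are invented, `build_exclude_patterns` is a hypothetical helper, and the 255-character default for `max_len` is an assumption about the filter field limit that you should verify in your own GA account.

```python
import re


def build_exclude_patterns(spam_sources, max_len=255):
    """Group escaped source names into alternation patterns, each kept
    within max_len characters where possible (the limit is an assumption)."""
    patterns, current = [], ""
    for source in sorted(set(spam_sources)):
        escaped = re.escape(source)
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) > max_len and current:
            patterns.append(current)   # current pattern is full, start a new one
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns


def is_spam(source, patterns):
    """True if the source matches any of the exclusion patterns."""
    return any(re.search(p, source) for p in patterns)


# Invented footprints; replace with the sources found in your own reports.
footprints = ["spamreferrer.example", "freetraffic.example", "seooffers.example"]
patterns = build_exclude_patterns(footprints)
print(patterns)
print(is_spam("freetraffic.example", patterns), is_spam("google", patterns))
```

Each resulting pattern would go into its own exclude filter. Escaping the dots matters: an unescaped `.` matches any character and could exclude legitimate sources by accident.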
Now the new view will filter almost all bot traffic. At some point, however, you will probably need to verify historical traffic in your original view. This will require you to set up an Advanced Segment that replicates the filters applied earlier. You’ll need to Add a Segment in the Reporting dashboard of your original view. Name it something like “Bot Filter” and add all the new filters in the Advanced – Conditions section (keep in mind the Include/Exclude setting).
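The Include/Exclude logic of such a segment can be illustrated on exported data. In this sketch the two patterns, the row layout and the source names are all assumptions made for the example; the point is only that a row is kept when its hostname matches the include pattern and its source does not match the exclude pattern.

```python
import re

# Assumed patterns, mirroring a hostname include filter and a source exclude filter.
HOSTNAME_INCLUDE = r"^(.+\.)?yourwebsite\.com$"
SOURCE_EXCLUDE = r"spamreferrer\.example|freetraffic\.example"


def passes_bot_filter(row):
    """Keep a row only if the hostname is valid AND the source is not spam."""
    valid_host = re.search(HOSTNAME_INCLUDE, row["hostname"]) is not None
    spam_source = re.search(SOURCE_EXCLUDE, row["source"]) is not None
    return valid_host and not spam_source


# Made-up rows standing in for historical data exported from the original view.
rows = [
    {"hostname": "www.yourwebsite.com", "source": "google", "sessions": 310},
    {"hostname": "www.yourwebsite.com", "source": "spamreferrer.example", "sessions": 95},
    {"hostname": "(not set)", "source": "direct", "sessions": 48},
]

clean = [row for row in rows if passes_bot_filter(row)]
print(sum(row["sessions"] for row in clean), "sessions left after filtering")
```

Mixing up the two settings is the usual mistake: including by the spam pattern or excluding by the hostname pattern would invert the result, so it is worth checking the outcome on a few known rows.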
By doing so you will have an Advanced Segment filter at your disposal. It will contain all your bot filters, and you will be able to apply it to any report and for any selected date range. There are also lots of other advanced techniques that can be implemented. But if you don’t have adequate technical skills, we strongly suggest avoiding these:
As said, bots make up more than half of all Internet traffic and are a force to be reckoned with. With the advance of technology, bot sophistication is rising and their impact on businesses is increasing. Until a global solution gets put in place, creating GA filters will be your safest pick. It is worth mentioning that eliminating bot traffic from your GA reports simply hides it from your data: the bots still hit your servers, and may consume resources as well as impact the performance of your web assets. Stopping bot traffic altogether requires high expertise and adequate tools. Also, all the steps described above can be stressful and time-consuming, so at a certain point you might want to consider contacting experts on the matter. If you seek excellence in eliminating analytics spam without risking your unfiltered data, filtering false positives or creating unsustainable server changes, our experts are always here to help. Feel free to contact our experts at GlobalDots, as they can help you pick the security and performance solutions that best suit your needs.