Why Direct Traffic Isn't So Direct
by Greg Lett
Have you checked the performance of your direct traffic in Google Analytics lately? Does it look like it tanked? Are you left wondering what the heck happened? Good news! Chances are that it looks much worse than it actually is. There are lots of reasons that your direct traffic might be struggling, but one thing in particular may be single-handedly responsible for dragging down your metrics. In fact, thousands of websites are dealing with the same problem. You're not alone. In this post, I'll explain how to investigate, identify, and resolve the issue so you can fix your data.
How do you know if your site is afflicted? The fastest way to check is to look at your direct traffic's bounce rate and percentage of new sessions. If either metric is north of 50%, there is cause for more digging. Shortly, we'll discuss how to determine with more certainty whether your metrics are skewed.
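As a rough illustration of that first-pass check, here is a minimal Python sketch. The function name and the sample numbers are made up for this example; only the 50% rule of thumb comes from the text above.

```python
# Hypothetical helper: the name and sample values are illustrative only.
THRESHOLD = 0.50  # the rule of thumb above: "north of 50%"


def needs_investigation(bounce_rate: float, pct_new_sessions: float,
                        threshold: float = THRESHOLD) -> bool:
    """Return True if either direct-traffic metric exceeds the threshold."""
    return bounce_rate > threshold or pct_new_sessions > threshold


# Metrics are expressed as fractions (0.82 means 82%).
print(needs_investigation(bounce_rate=0.82, pct_new_sessions=0.47))  # True
print(needs_investigation(bounce_rate=0.38, pct_new_sessions=0.44))  # False
```

A True result is only a prompt to dig further, not proof that the traffic is fake.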
What's the cause? Truthfully, we only have some of the answers. We do know that this traffic is completely bogus. These visitors are actually robots/crawlers/spiders, not humans. They always come directly to the site, but from different locations, ISPs, browsers, etc. At a more general level, we have seen this happen twice in the last two years. The first instance is more consistent than the second and is easier to filter from your data.
The first instance can be identified fairly easily in Google Analytics by viewing the Audience > Technology > Browser & OS report. You are looking for a browser named "Mozilla Compatible Agent". If present, you should see the bounce rate and percentage of new sessions nearing 100%, while average session duration and pages per session are close to zero. Not all Mozilla Compatible Agent traffic is bogus, but much of it is likely not real human traffic. Excluding most of this phony Mozilla traffic is relatively easy.
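To see how much a single bogus browser row can drag down an overall number, here is a small worked example in Python. The row structure and every figure in it are invented for illustration; they are not from a real account. It also drops all Mozilla Compatible Agent sessions, which is a simplification, since some of that traffic may be legitimate.

```python
from typing import List, NamedTuple


class BrowserRow(NamedTuple):
    browser: str
    sessions: int
    bounces: int


def bounce_rate(rows: List[BrowserRow]) -> float:
    """Overall bounce rate = total bounces / total sessions."""
    sessions = sum(r.sessions for r in rows)
    bounces = sum(r.bounces for r in rows)
    return bounces / sessions if sessions else 0.0


# Made-up direct-traffic rows, as they might appear in a Browser & OS report.
rows = [
    BrowserRow("Chrome", 600, 240),
    BrowserRow("Safari", 400, 160),
    BrowserRow("Mozilla Compatible Agent", 1000, 990),
]

# Simplification: treat every Mozilla Compatible Agent session as non-human.
human_rows = [r for r in rows if r.browser != "Mozilla Compatible Agent"]

print(f"With suspect row:    {bounce_rate(rows):.1%}")        # 1390 / 2000
print(f"Without suspect row: {bounce_rate(human_rows):.1%}")  # 400 / 1000
```

In this invented data set the bounce rate falls from roughly 70% to 40% once the suspect row is set aside, which is the kind of gap that makes direct traffic look worse than it is.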
On July 30, 2014, Google announced a new Analytics feature designed specifically to filter out non-human traffic. You can enable it in your GA account by navigating to the View Settings section in your admin panel and checking the new box, appropriately labeled "exclude all hits from known bots and spiders".
Enabling this feature will help clean up your future data, but not your historical data. The best way to view clean historical data is to create a new Advanced Segment in your GA account. You can do that by simply following this hyperlink, which will automatically create the new segment for you. It uses something called RegEx (Regular Expressions) to filter the unwanted traffic from your data. The filter pattern is borrowed from the folks over at LunaMetrics.
The second instance of this fake traffic is a little more complicated to exclude. We are not completely confident of the problem's source, but some believe that scammers utilizing AdRoll's retargeting network are to blame. However, we have identified several clients' sites with the same problem that are not currently using AdRoll's services. This traffic is always "direct" and it always lands on the homepage. It's unique in that it always uses Internet Explorer 7. All other characteristics (location, service provider, etc.) are highly inconsistent. Once again, you can check for this in your own account by viewing the Audience > Technology > Browser & OS report. Make sure you have the Direct Traffic segment enabled and click on "Internet Explorer". If browser version 7.0 is present and its % New Sessions and Bounce Rate are both close to 100%, you've got it.
We first noticed this in our clients' accounts on July 7, 2014. It continued to grow throughout the month, until it suddenly disappeared on July 30th. Unfortunately, it reappeared in our data exactly 7 days later, on August 6th. As of this writing, the traffic is still present in many accounts we manage. As I mentioned earlier, this particular issue is more difficult to filter.
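The Internet Explorer 7 check described above can be sketched in Python as follows. This is only an illustration: the row fields are hypothetical, the rows are assumed to already be limited to the Direct Traffic segment, and the 95% cutoff for "close to 100%" is an assumption rather than a number from our data.

```python
from dataclasses import dataclass


@dataclass
class VersionRow:
    # Hypothetical fields; assumed to come from the Direct Traffic segment.
    browser: str
    version: str
    pct_new_sessions: float  # fraction, e.g. 0.99 for 99%
    bounce_rate: float       # fraction, e.g. 0.98 for 98%


def looks_like_ie7_bot(row: VersionRow, near_100: float = 0.95) -> bool:
    """Flag the signature: IE 7.0 with both metrics close to 100%.

    The 0.95 cutoff is an assumed stand-in for "close to 100%".
    """
    return (
        row.browser == "Internet Explorer"
        and row.version == "7.0"
        and row.pct_new_sessions >= near_100
        and row.bounce_rate >= near_100
    )


# Made-up rows for illustration.
suspect = VersionRow("Internet Explorer", "7.0", 0.99, 0.98)
normal = VersionRow("Internet Explorer", "11.0", 0.41, 0.36)

print(looks_like_ie7_bot(suspect))  # True
print(looks_like_ie7_bot(normal))   # False
```

A match only tells you the pattern is present in the report; it does not remove anything from your data.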
Google's built-in bot removal feature doesn't exclude this traffic. You'll need two filters to remove the bad data, and be warned: you might lose a little bit of good data in the process. However, losing a little bit of good data is far outweighed by the benefit of excluding the large volume of bad data. First, you must implement a custom advanced filter, and then a custom exclude filter. Demonstrating how to create and apply these filters is probably a little too technical for this post. If you're interested in setting this up for your account, please let us know and we'll be happy to help.
Analytics are important to the performance of any site, and you need to be able to rely on the data to make solid, grounded decisions. It's imperative that you question the data when something doesn't quite make sense. If it's illogical on the surface, chances are that there's an underlying issue obscuring your true metrics. Just because Analytics records it doesn't make it fact. Be sure you have a certain level of confidence in the accuracy of your data before making decisions that affect the direction of your business.