Posts Tagged ‘Metrics’
Essential guide to data accuracy in web analytics
The issue of data quality and accuracy in web analytics is something that most web analysts have no option but to learn and internalise very quickly, especially when people start asking why numbers don’t match. However, it is often easy for us to forget that our clients, business users and marketing teams don’t live and breathe this data as we do. This post is therefore a reminder of the essential (by no means definitive) facts about why web analytics data can’t necessarily be taken as fact.
Why are the numbers different?
Most people first recognise a problem with web analytics data because they are trying to reconcile absolute numbers between two different systems, for example when comparing visits in Google Analytics with clicks as reported by Atlas (or some other ad tracking tool). The following are the key reasons why these numbers don’t match:
- The definitions used to calculate metrics usually differ slightly. For example, unique visitors must always be unique visitors within [a certain time frame], and different vendors may use different time frames. Neither is right or wrong; they are just different. The same principle applies to lots of other metrics, sometimes on a much more subtle level.
- Whilst advances are constantly being made, there are currently no agreed standards for these definitions. Analytics vendors often try to name-drop ABCe standards (at least in the UK), but these are generally considered to be outdated and were created for reporting on visits that derive from banner advertising and search, not for web analysis. Here is a good synopsis of the current state of standards.
- Tracking methodologies such as cookies, packet sniffers and IP addresses all collect data in different ways, and each has its pros and cons. See the example below for more on this one.
- The Internet is composed of a huge array of different technologies, which are all constantly evolving and changing. These technologies play a big part in the accuracy of data collection.
- New browser versions invariably feature new types of technology that allow increasingly savvy web users to hide their on-line behaviour, or even block this behaviour by default.
- Robots and spiders crawl Internet pages, for example to index their content for search engines. Data quality in web analytics is a race to keep up with these creatures!
Cookies and Unique Visitors – An Example
The issue of cookies is generally the biggest area of confusion. A client of mine was recently comparing Google Analytics to their incumbent provider, Sophus3. They noticed large differences in unique visitors and wanted to understand why. Whilst this issue is in some respects the product of all the points raised above, the main cause is the type of cookie used:
With Google Analytics, visitors are tracked using 1st party cookies. Estimates suggest that around 1% of users block these cookies and a further 4% block JavaScript. GA is therefore unable to track these users, so real visitors may be under-counted by about 5%.
Sophus3, on the other hand, uses 3rd party cookies. Many browsers block these by default, so estimates suggest that around 65% of traffic is lost due to the combination of this and JavaScript blocking.
Sophus3 then uses IP addresses to track visitors who have blocked cookies. However, most broadband providers use dynamic IP addresses, which change periodically. In some cases, the IP address could change every time the person switches on their computer. Sophus3 will therefore register individual people as multiple visitors, and overall numbers will be inflated.
The following chart illustrates this issue in a more visual way (numbers are rough estimates to illustrate a point, and are not meant to be accurate):
[Chart: How cookies can affect data accuracy in web analytics]
Whilst 1st party cookies are generally considered in the industry to be best practice, in truth neither is perfect. For more information, here is a more detailed overview of how cookies affect web analytics data.
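To make the arithmetic in this example concrete, here is a minimal sketch in JavaScript. The blocking percentages are the rough estimates quoted above. The audience of 1,000 people and the figure of two IP addresses per cookie-less visitor are made-up values, and the function names are hypothetical. It also simplifies by assuming that every visitor missed by the 3rd party cookie is picked up by the IP fallback.

```javascript
// Rough model of how two tracking methods count the same audience.
// All inputs are illustrative estimates, not measurements.

// 1st party cookies: visitors who block cookies or JavaScript are simply missed.
function firstPartyCount(trueVisitors, blockedPct) {
  return Math.round((trueVisitors * (100 - blockedPct)) / 100);
}

// 3rd party cookies with an IP fallback: visitors who cannot be tracked
// by cookie are counted once for every IP address they appear under.
function thirdPartyWithIpFallbackCount(trueVisitors, blockedPct, ipsPerVisitor) {
  const cookieTracked = Math.round((trueVisitors * (100 - blockedPct)) / 100);
  const ipTracked = trueVisitors - cookieTracked;
  return cookieTracked + ipTracked * ipsPerVisitor;
}

const trueVisitors = 1000; // assumed audience size, for illustration only
const firstParty = firstPartyCount(trueVisitors, 5); // 1% cookies + 4% JavaScript
const thirdParty = thirdPartyWithIpFallbackCount(trueVisitors, 65, 2); // assumes 2 IPs each

console.log({ trueVisitors, firstParty, thirdParty });
```

Under these assumptions the same 1,000 people appear as 950 visitors in one tool and 1,650 in the other. Neither is "the" number, which is the point of the comparison.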
Get over it!
The issue of data accuracy can cripple companies and cause vast amounts of wasted time. In truth there is no solution. It is much better to:
- Understand the limitations in as much detail as possible and ensure that all recipients of web reporting and analysis are familiar with what the numbers do and don’t tell them.
- Focus on trends and segments, and not on absolute numbers. This is easy to do when the focus is on analysis and not pure reporting; insight never comes from pure numbers.
- Use confidence levels to make reasonable judgements about numbers such as unique visitors, where these are required for decision making.
- Set a consistent baseline of data that is as accurate as we can get it, then use it to make reliable trend assumptions and draw conclusions from time-series analyses.
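The points about trends and confidence can be sketched in a few lines of JavaScript. This is only an illustration: the visitor numbers and the 3% to 7% under-count are assumptions, the function names are hypothetical, and the "range" below is a simple plausibility band rather than a formal statistical confidence interval.

```javascript
// Percentage change between two periods.
function percentChange(previous, current) {
  return ((current - previous) / previous) * 100;
}

// Plausible range for the real figure, given a reported count and
// lower/upper estimates of the share of visitors the tool misses.
function plausibleRange(reported, minMissedShare, maxMissedShare) {
  return {
    low: Math.round(reported / (1 - minMissedShare)),
    high: Math.round(reported / (1 - maxMissedShare)),
  };
}

// Assumed example: real visitors grow from 1,000 to 1,200, and the tool
// consistently misses 5% of them in both periods.
const realChange = percentChange(1000, 1200);
const reportedChange = percentChange(950, 1140);
console.log(realChange.toFixed(1), reportedChange.toFixed(1));

// Assumed example: 950 reported visitors, with between 3% and 7% missed.
const range = plausibleRange(950, 0.03, 0.07);
console.log(range);
```

As long as the under-count is consistent from one period to the next, the reported trend matches the real one even though both absolute numbers are wrong. Where an absolute number is needed, a range is more honest than a single figure.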
Errors of causation in web analytics
The other day I was presenting the findings of some analysis to a client. The focus of this analysis was to discover the behavioural factors affecting checkout completion rates in order to shed some light on why people drop out. For example, within this analysis I was able to say fairly basic things such as:
- Visitors who spend a lot of time on the site before their purchase are less likely to drop out of the checkout process than those whose session is shorter
- Visitors who land on the homepage are more likely to drop out of the process than those who land on a product page
Now, the client immediately got rather excited about this and began to say, regarding the first point, “Wow, so if we can increase the time on site then we can improve our drop-out rates. Excellent, how do we increase time on site?”. Had I allowed it, this person would no doubt have been rushing back to the marketing team with a new objective to get the time on site up!
So what’s wrong with this? Well, apart from the numerous issues with dwell-time in this specific example, this represents a very common misunderstanding in web analysis. Put very simply:
Spending a long time on your site is not what made your customer complete their purchase. They were probably on your site for a long time because they are interested in your products and your site is relevant to them which, in turn, means they are more likely to complete their purchase. Increasing dwell-time per se doesn’t make any sense in this example because it isn’t the cause.
To provide a simpler example of this: you might notice that people who dress smartly often have quite tidy hair as well. Does this mean that dressing smartly causes tidy hair? If I put a suit on, will my hair instantly become much tidier because I’m wearing a suit? No, there is some other factor (the person’s need to look smart) that is driving both of these things.
This leads to all kinds of problems in web analysis, some of which are quite subtle. Furthermore, this problem is inextricably bound up with our obsession with click-stream; if we can’t see beyond the web analysis tool then we have to find our causes within it. The biggest danger is that we stop being able to see our visitors as real people with real needs, and instead just view them as lines of data or collections of behaviours.
A couple more examples of how this can cause problems:
- You notice that direct traffic is of a higher quality than other sources. Does this mean that you should simply get more people to come to you direct? You could do this by displaying your URL as a static image in non-clickable banners, meaning that people have to physically type it into the browser. Again, no. Real direct traffic is direct because of brand familiarity and relevance, which may have nothing to do with advertising. The pure fact that it’s direct is of little relevance. (By the way, be careful: direct traffic isn’t always what it seems.)
- You notice a correlation between downloads of your latest white paper and calls to your sales team. Excellent, the white paper is a successful acquisition tool and is driving leads! Or is it? Which way round is it really? Are people calling you because they downloaded the white paper, or did they look at your site and download the white paper because they called you?
Remember, correlation does not imply causation! You can avoid this by remembering that your customers are real people with needs, desires, habits and lifestyles. They are not lines of data with dwell-times, page counts and completion rates. These things are only behavioural indications of something else more complex that is happening. Look beyond the click-stream and understand how your customers think and feel.
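The dwell-time example can be reproduced with a small simulation. What follows is a sketch with invented data, not an analysis of any real site. It assumes a hidden factor (`interested`) that raises both dwell-time and the chance of purchase, while dwell-time itself never influences the purchase. All names and probabilities are hypothetical, and a small seeded generator is used so the run is repeatable.

```javascript
// Small seeded pseudo-random generator (mulberry32-style) for repeatable runs.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// "interested" is the hidden factor: it raises both dwell-time and the
// chance of purchase. Dwell-time never feeds into the purchase decision.
function simulateVisitors(count, seed) {
  const random = seededRandom(seed);
  const visitors = [];
  for (let i = 0; i < count; i++) {
    const interested = random() < 0.5;
    const dwellMinutes = (interested ? 5 : 0) + 10 * random();
    const purchased = random() < (interested ? 0.3 : 0.02);
    visitors.push({ interested, dwellMinutes, purchased });
  }
  return visitors;
}

function purchaseRate(visitors) {
  if (visitors.length === 0) return 0;
  return visitors.filter((v) => v.purchased).length / visitors.length;
}

const visitors = simulateVisitors(20000, 42);
const isLong = (v) => v.dwellMinutes >= 7.5;

// Across all visitors, long sessions appear to "drive" purchases...
const longRate = purchaseRate(visitors.filter(isLong));
const shortRate = purchaseRate(visitors.filter((v) => !isLong(v)));

// ...but among equally interested visitors the gap largely disappears.
const interestedOnly = visitors.filter((v) => v.interested);
const interestedLongRate = purchaseRate(interestedOnly.filter(isLong));
const interestedShortRate = purchaseRate(interestedOnly.filter((v) => !isLong(v)));

console.log({ longRate, shortRate, interestedLongRate, interestedShortRate });
```

In this toy model, making an uninterested visitor stay longer would change nothing, because interest is the only thing that affects the purchase. Click-stream data alone cannot show you that hidden factor.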
Analytics Direct Traffic is NOT What You Think It Is
Analytics direct traffic reports are often viewed as both a highly insightful metric and, in itself, as a particularly valuable stream of visitors. These are people that typed your URL directly into their browser, right? They must have seen your TV ad or just been really engaged with your brand because they remembered your address and didn’t need to use search. Who could ask for better visitors? They are motivated and focused and really intended to come here.
This kind of language continues to dominate all kinds of discussions about web analytics, including blogs, forums, and articles – and even reaches into the field of the experts; just look at the way Google Analytics defines direct traffic. It’s even more worrying when I hear the way my clients talk about it!
The fact is, this definition of direct traffic in web analysis is extremely misleading. It’s true that the direct traffic bucket does include bookmark traffic and typed URLs, but these days (unless you are very strict about your campaign tracking parameters) it can and does include all kinds of other stuff. All it really means is that the session started without a referrer being passed by the user’s browser, and this can happen for lots of reasons as defined in this rather neat list. I have done some tests on some of my clients’ sites and estimate that in some cases up to 90% of ‘direct’ traffic is in fact banner ad or PPC traffic!
Here’s an exercise you can perform that will demonstrate exactly how widespread this problem is: as you’re browsing the Internet and following links from one site to the next, you can check the referrer that is passed by typing the following snippet of code into the address bar of your browser:
javascript:alert(document.referrer)
For example, if you visit a site like AOL and click on one of the advertising banners, when you arrive at the destination page replace the URL with the code – a pop-up will appear showing you the referrer (or nothing if one wasn’t passed). Try this with different types of sites, banners and links. Also try it with different browsers. As you will see, quite a lot of the time the referrer is blank. This means that your visit would have been counted as direct traffic in the analytics reports of that site!
So, it’s time to stop thinking of direct traffic as people typing in your URL; this isn’t necessarily the case. ‘Other’ or ‘unknown’ would be a more accurate description.
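The bucketing behaviour described above can be sketched as follows. This is a simplified, hypothetical function and not how any particular analytics product is implemented; it only shows that an empty referrer is all it takes for a session to be labelled direct.

```javascript
// Simplified, hypothetical version of how a tool might bucket a session
// using only the referrer the browser passed (e.g. document.referrer).
function bucketByReferrer(referrer) {
  if (!referrer) {
    // Typed URLs, bookmarks, stripped referrers and some ad clicks all land here.
    return "direct";
  }
  return "referral: " + new URL(referrer).hostname;
}

console.log(bucketByReferrer(""));
console.log(bucketByReferrer("https://news.example.org/story?id=1"));
```

The first call returns the direct bucket even though nothing in the input says the visitor typed the address. The tool simply has no information about where the visit came from.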
It’s also time to realise the importance of campaign tracking on your inbound links, as Avinash Kaushik points out in his definition of analytics direct traffic. If you always ensure that your links pass source and campaign info, then the analytics tool can attribute the visit correctly even if the browser doesn’t pass a referrer. Here’s an easy way to build campaign tracking URLs in Google Analytics.
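As a rough illustration of what a tagged link looks like, the sketch below appends campaign parameters to a landing page URL. `utm_source`, `utm_medium` and `utm_campaign` are the parameter names used by Google Analytics campaign tagging. The helper function, the example domain and the campaign values are all hypothetical.

```javascript
// Append campaign parameters to a landing page URL.
// The parameter names are Google Analytics campaign tags; the function
// and the example values are made up for this sketch.
function buildCampaignUrl(landingUrl, source, medium, campaign) {
  const url = new URL(landingUrl);
  url.searchParams.set("utm_source", source);
  url.searchParams.set("utm_medium", medium);
  url.searchParams.set("utm_campaign", campaign);
  return url.toString();
}

const tagged = buildCampaignUrl(
  "https://www.example.com/offers",
  "aol",
  "banner",
  "spring_sale"
);
console.log(tagged);
```

Because the source travels in the URL itself, it survives even when the browser sends no referrer at all.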
Measuring engagement & the dangers of dwell-time
I was driven to write this post after chatting to the online marketing manager of a large international company, who proudly told me that ‘dwell-time’ was now one of their most important KPIs, and that they had issued instructions to all local marketing teams that the primary focus for the coming year was to ‘increase dwell-time’, thereby getting customers ‘more engaged’. I suggested that they make the pages take longer to load. He didn’t get the joke!
In seriousness though, this is a very common example of the way many companies view their websites. Personally I think it might come from too many years dealing with traditional offline media – “if only we could find a way to get people to look at our billboard for longer, and pay more attention to it!” But beware…
The danger of dwell-time

In most cases measuring dwell-time as ‘engagement’ (or even at all) is not only wrong, but is frankly dangerous. Just a few of the reasons for this are as follows:
- A lot of your visitors are at your site because they want to get something done, quickly: place an order for something they decided to buy last week; find your address; get help; and so on. Why do you want this to take longer? If you ran a supermarket you might want people to spend longer browsing the aisles, but would you want them to have to queue for longer at the checkout?
- I might spend 2 hours ‘engaging’ with every aspect of your site, but that might be because I despise you and am learning everything about you so I can destroy you! This is extreme, but the point is that engagement isn’t necessarily positive engagement.
- Most companies find, if they run the analysis, that people who buy things spent longer on the site than people who didn’t. This leads them to think that if they can get people to spend longer on the site then they will surely buy more stuff. This is one of the biggest errors I see in web analytics, and not just regarding this example. People who buy things don’t buy things because they were on the site longer, they were on the site longer because they were in the mood to buy something, or because your site was relevant to them. Simply getting people to stay on the site longer doesn’t change their state of mind, and by obsessing over it you ignore the real underlying drivers.
- If what you really want to do is get people more engaged with your content, and get them to think positively about it – why not just measure that? Do a survey or run some focus groups; ask them what they thought and, if they don’t like it, ask them why not and how you can improve it. This kind of brand engagement is a deeply emotional and qualitative thing – how on earth do you expect to correlate it to something so cold and bland as the time they spent on your site?
But there is something more fundamental underlying all this. I think in most of these cases companies (especially non-ecommerce sites) are unsure what their website IS; what it means to them strategically and, more importantly, the role it plays in the overall journeys taken by their different customer segments. How exactly do you want the content on your site to influence your customers’ behaviour? Do you even know how your customers are using the site at the moment? Until these questions are answered (quantitatively and qualitatively) you will never be able to meet them in relevant dialogue through your site. And if you really think this through, and then think back to the concept of pure dwell-time – how absurd does that sound now? It’s like locking the doors of the shop and not letting people out!
But we are trying to achieve something, so what is it and how do we go about it?

Nevertheless, websites do have a communicative role to play. Our visitors need to be influenced, motivated, persuaded, dazzled, awed – not just to make them buy something, but so that we become part of their lives in whatever way is relevant to them. So how do we do it? Well, unfortunately the answer to this question is unique to every business – you need to go on your own voyage of discovery in order to understand exactly what ‘success’ and ‘performance’ mean to you and therefore how to influence them. However, here are some tips to set you off:
- Push the site itself (and especially anything to do with click-stream data) out of your mind temporarily. Work out who your customers are and why and how they want to interact with you as a business. Similarly, work out how you want them to think of you, and what role you want to play in their lives. Now, in the middle of all this – what does/might the website mean to them; how does it help them; what would make it important to them? If you have the budget I would strongly recommend this being a major research project.
- Remember that you don’t just have one type of customer, and even similar customers want different things at different times. Segment your customers by who they are and what they want to achieve, and make sure you understand the above question according to these different types of customers. What role does the site play for them at the current stage in their journey with you?
- Ensure that your objectives and KPIs reflect this understanding. If by engagement you really mean that all visitors successfully completed what they came to do, then ask them whether they did or not and use this as a KPI. If the journeys and tasks that people want to perform are totally different, then you need different KPIs.
- If things like dwell-time are still relevant to some of these journeys then use them, but remember and take heed: these are indicators of other behaviours or attitudes. You cannot influence this metric directly. Know what drives it!
- Never rely solely on click-stream data as your source of insight. Sometimes it is easier for continual reporting if all KPIs are based on click-stream, but if this is the case then you need to make sure you explain and drive these metrics using other, qualitative sources of data. Click-stream is the what, not the why!
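As one possible reading of the tip about asking visitors whether they completed what they came to do, a task-completion KPI per segment could be computed along these lines. The segment names, field names and survey responses are all invented for the sketch.

```javascript
// Share of survey respondents in each segment who said they completed
// what they came to do. Field names are assumptions for this sketch.
function completionRateBySegment(responses) {
  const totals = new Map();
  for (const { segment, completedTask } of responses) {
    const entry = totals.get(segment) || { asked: 0, completed: 0 };
    entry.asked += 1;
    if (completedTask) entry.completed += 1;
    totals.set(segment, entry);
  }
  const rates = {};
  for (const [segment, { asked, completed }] of totals) {
    rates[segment] = completed / asked;
  }
  return rates;
}

// Made-up survey responses, for illustration only.
const sampleResponses = [
  { segment: "existing customer seeking help", completedTask: true },
  { segment: "existing customer seeking help", completedTask: false },
  { segment: "prospect comparing products", completedTask: true },
  { segment: "prospect comparing products", completedTask: true },
  { segment: "prospect comparing products", completedTask: false },
  { segment: "prospect comparing products", completedTask: true },
];

const completionRates = completionRateBySegment(sampleResponses);
console.log(completionRates);
```

Keeping the rate separate for each segment reflects the earlier tip: different journeys need different KPIs, and a single blended figure would hide which group is struggling.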
Above all, remember that your website is not and will never be a ‘pamphlet on the web’. You might think of it like this, but your customers most certainly don’t. These days brands sink or swim based on how effectively they ‘engage’ with people through digital channels, but this ‘engagement’ is a million miles away from ‘dwell-time’!
