Companies of all sectors and sizes are potentially affected by unreliable data. Small and medium-sized organisations generally lack the time or resources to assess quality and resolve reliability issues.
Large companies tend to generate many campaigns and new web pages, and cannot keep up with the pace of publication while maintaining data quality standards.
The poor quality of analytical data can take different forms, with sometimes serious consequences for your business:
- loss of income
- reduction in the ROI of marketing actions
- loss of quality in decision-making
- contamination of other data projects (CRM, Data Lake, CDP, etc.)
- decreased internal trust and credibility
The intrinsic risk in web analytics
Without preventive action, data quality naturally degrades. Sources of error are diverse and inherent to certain web technologies: unmeasured data, robot traffic, browser inaccuracies, traffic congestion, etc. We have identified, quantified and represented here the main risk factors that threaten the quality of your analytical data.
The critical phase of collection
The data collection phase is critical because it is permanent. Each optimisation, new feature, new campaign or new piece of content poses a risk to the quality of data collection. An effective collection strategy brings together all the company’s decision-making players and adapts to each development on an ongoing basis. Data collection must therefore be considered when defining a data governance policy.
The more you update and enrich your sites and mobile applications, the more likely you are to inadvertently break your analytics tags. It may seem an elementary mistake, but missing, defective or duplicated tags are very common, especially on large sites with a lot of content. While these tagging problems, sometimes minimal, can be difficult to detect, they have a significant impact on performance, so vigilance over tag integrity is vital. Checking the source code of every page is essential, but who has the time for this tedious manual task? Crawling tools can automatically browse a site, across all pages and sections, to check for the presence of digital analytics tags. Others let you check your tags live once they are implemented on a site, and can issue a report indicating the problems to solve.
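The tag audit such a crawler performs can be sketched in a few lines. This is a minimal illustration, not a real crawler: the tag pattern is a hypothetical placeholder for your own analytics loader, and the page HTML is supplied directly rather than fetched over the network.

```python
import re

# Hypothetical tracking snippet we expect on every page; replace the
# pattern with the loader URL of your own analytics tag.
TAG_PATTERN = re.compile(r'<script[^>]*src="[^"]*analytics[^"]*\.js"', re.IGNORECASE)

def audit_pages(pages):
    """Classify each page as missing its tag, tagged once, or tagged twice+."""
    report = {"missing": [], "ok": [], "duplicated": []}
    for url, html in pages.items():
        hits = len(TAG_PATTERN.findall(html))
        if hits == 0:
            report["missing"].append(url)
        elif hits == 1:
            report["ok"].append(url)
        else:
            report["duplicated"].append(url)
    return report

# Simulated crawl results (in practice the HTML comes from fetching each URL)
pages = {
    "/home": '<html><script src="/js/analytics.js"></script></html>',
    "/blog": "<html><p>No tag here</p></html>",
    "/shop": '<html><script src="/js/analytics.js"></script>'
             '<script src="/cdn/analytics.js"></script></html>',
}
print(audit_pages(pages))
# → {'missing': ['/blog'], 'ok': ['/home'], 'duplicated': ['/shop']}
```

Commercial crawling tools do exactly this at scale, following internal links and validating the tag payload as well as its mere presence.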
According to some estimates, robots (or “bots”) account for more than half of web traffic. To know the actual volume of your traffic, it is essential to be able to identify and exclude the portion generated by the robots that visit your sites. However, some “bad bots” can be very difficult to detect; hence the importance of working with a digital analytics provider that has the experience and means necessary to recognise and eliminate this traffic. The ability to discard flows caused by robots has a direct impact on data quality.
Beyond the qualitative aspect, manually sorting out this polluting traffic represents an enormous, if not impossible, workload for the analyst. As a first step, your web analytics provider should be able to identify these robots using the official exclusion list published and regularly updated by the IAB. It should then offer you the possibility to regenerate your data, over the desired period, excluding this robot traffic.
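The principle of list-based bot exclusion can be sketched as follows. The signature list here is a tiny illustrative stand-in: the real IAB/ABC International Spiders & Bots List is licensed and far more comprehensive, and user-agent matching alone misses the "bad bots" that spoof ordinary browsers.

```python
# Illustrative bot signatures only; a production list (e.g. the licensed
# IAB/ABC Spiders & Bots List) contains thousands of entries.
BOT_SIGNATURES = ["bot", "crawler", "spider", "headlesschrome"]

def is_bot(user_agent):
    """Flag a hit as robot traffic if its user agent matches a known signature."""
    ua = user_agent.lower()
    return any(sig in ua for sig in BOT_SIGNATURES)

hits = [
    {"page": "/home", "ua": "Mozilla/5.0 (Windows NT 10.0) Chrome/120.0"},
    {"page": "/home", "ua": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"page": "/pricing", "ua": "python-requests/2.31"},
]
human_hits = [h for h in hits if not is_bot(h["ua"])]
print(len(human_hits))  # prints 2: the scripted "python-requests" hit slips through
```

Note that the last hit is clearly automated yet escapes the naive list, which is exactly why providers combine signature lists with behavioural detection and why regenerating historical data after list updates matters.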
Source allocation biases
Some events, such as Facebook’s overestimation of video viewing time and the temporary suspension of two Google indicators by the Media Rating Council for “non-compliance” with measurement guidelines, have given companies reasons to question the accuracy and validity of the data they receive.
At a time when transparency seems to be lacking, one can begin to question the accuracy (and impartiality) of indicator calculation in these closed systems. Consider a very simple question: can we really rely on the figures an analytics tool provides for a “search engine” source when that same engine generates the tool’s revenue? One of the latest and most striking examples is the bias in source attribution in Google’s analytics tool: a conversion is automatically assigned to a Google source (organic search or sponsored link) if the visitor has clicked, even once, on a Google link in the last six months. The measurement tool thus completely ignores direct traffic sources (a bookmarked link or a URL typed directly, for example) and assigns the conversion to itself. In other words, if the source is not determined, Google claims it. The result: conversions add up and the counters of advertising channels like Google Ads are inflated; nearly 20% of conversions are overestimated due to this misallocation of sources.
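The attribution rule described above, often called "last non-direct click", can be sketched in a few lines. This is a simplified model for illustration: the journey data, source labels and the exact window are assumptions, not Google's actual implementation.

```python
from datetime import datetime, timedelta

# Roughly the six-month lookback window mentioned above (assumed value).
LOOKBACK = timedelta(days=180)

def attribute(touchpoints, conversion_time):
    """Last non-direct click: credit the most recent non-direct touchpoint
    within the lookback window; fall back to 'direct' only if none exists.
    touchpoints is a chronological list of (timestamp, source) pairs."""
    winner = "direct"
    for ts, source in touchpoints:
        if source != "direct" and conversion_time - ts <= LOOKBACK:
            winner = source
    return winner

journey = [
    (datetime(2023, 1, 10), "google / cpc"),  # clicked an ad months earlier
    (datetime(2023, 5, 2), "direct"),         # later returned via a bookmark
]
print(attribute(journey, datetime(2023, 5, 2)))  # → google / cpc
```

Even though the converting visit came from a bookmark, the model credits the months-old ad click, which is precisely how conversions accumulate in paid-channel counters.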
Fortunately, it is possible and simple to act to reduce risks with the appropriate tools and procedures. The most difficult thing is to be aware of the potential sources of errors.
AT Internet offers a wide range of tools for the quality control of analytical data, so fewer errors are likely to alter your data and influence your decisions.
If you are keen to find out more about data quality, download our latest guide: