Digital analytics is used to support a wide range of data science projects, and vice versa. Features built with machine learning and AI are also emerging in web analytics solutions – predictions, targeting, new services, etc. In a recent publication, Gartner refers to augmented analytics as the main trend and priority for CDOs in 2020. We recently interviewed Jérémie Bureau, data scientist and head of the data science team at AT Internet, about data science and analytics. Read on to learn more…
How do you become a data scientist?
There are many ways into the world of data science. Engineering schools and universities offer courses from master’s to PhD level, and demand for data scientists is so high that specialised private schools are starting to spring up. Personally, I studied applied mathematics at the University of Bordeaux, then went on to a doctorate in mathematics and statistics at the University of Toulouse. I wrote my thesis under a CIFRE agreement (the French industrial convention on training through research) and worked as an R&D engineer for a startup during the three years of my doctorate. My thesis was on the reliability of geolocation systems in an aeronautical context. I have since worked in various fields such as health, employment and digital.
How does data science specifically apply to an analytics solution in SaaS mode?
When working on issues that require data processing, before even going into predictive models or machine learning, we need to meet two requirements to extract information that is actionable and value-added – firstly, collecting a sufficient volume of data, and secondly, ensuring it is representative of the population we want to study. AT Internet’s huge advantage is its variety of customer websites, making it possible to tick both boxes!
However, each site will have its own specific properties depending on its business sector. These differences can vary enormously from one sector to another – e-commerce sites, media, advertisers, banks, institutional sites etc.
The data science team needs to provide tools that target all our customers to help them optimise their marketing strategy. The tools, based on mathematical algorithms and models, must make it possible to describe and predict the behaviour of Internet users.
An example of this is a segmentation method to identify the users who purchase the most, or alternatively the users who have a high probability of churn (unsubscribing or not returning to a site). It is often a case of choosing between a generic model with acceptable average performance across all customer sites and a specific model for a group of similar sites.
How and why is data science useful for web analysts today?
Data science can now provide descriptive, predictive and even prescriptive tools to support analysts. There are numerous metrics to monitor and understand in order to obtain useful information, and it is not realistic to follow this many metrics manually. One application of machine learning that supports analysts is an automatic anomaly detection service. The goal is to capture unusual or suspicious fluctuations in metrics over time. Our teams are currently working on analyses to explain the probable causes of these anomalies. For example, if a bot crawls a site and causes a significant peak in traffic, an anomaly is detected on the number of pages viewed. We aim to support the analyst in his or her investigative work by automatically exploring a set of dimensions (source, device, browser, etc.). In this example, our causality analysis module shows that the anomaly was caused by an abnormal increase in traffic on the direct traffic segment, in Canada, on Chrome version 55. This type of tool will enable the analyst to carry out an initial analysis and gain a better understanding of behaviour, in order to anticipate and implement the necessary actions or strategies.
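To make the idea more concrete, here is a minimal sketch of the two steps described above: flagging an unusual value in a daily metric, then ranking segments by how much of the excess traffic they account for. It is only an illustration and not AT Internet's actual method. The robust z-score (median and MAD over a trailing window), the function names `detect_anomalies` and `rank_segments`, the thresholds and all the figures are assumptions chosen for the example.

```python
from statistics import median

def detect_anomalies(values, window=7, threshold=3.5):
    """Return the indices whose value deviates strongly from the trailing window.

    Assumption: a robust z-score based on the median and the median absolute
    deviation (MAD) stands in for whatever model a real service would use.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        med = median(history)
        mad = median([abs(v - med) for v in history])
        # 1.4826 makes the MAD comparable to a standard deviation;
        # fall back to 1.0 when the history is perfectly flat.
        scale = 1.4826 * mad if mad > 0 else 1.0
        if abs(values[i] - med) / scale > threshold:
            anomalies.append(i)
    return anomalies

def rank_segments(baseline, observed):
    """Rank segments by their share of the total positive excess traffic."""
    excess = {seg: count - baseline.get(seg, 0) for seg, count in observed.items()}
    total = sum(diff for diff in excess.values() if diff > 0)
    if total == 0:
        return []
    ranked = sorted(excess.items(), key=lambda item: item[1], reverse=True)
    return [(seg, diff / total) for seg, diff in ranked if diff > 0]

# Hypothetical daily page views with a bot-like spike on day 8.
page_views = [1000, 1020, 980, 1010, 990, 1005, 995, 1015, 5000, 1000]
anomalous_days = detect_anomalies(page_views)

# Hypothetical breakdown by (source, country, browser):
# a typical day versus the anomalous day.
baseline = {
    ("direct", "Canada", "Chrome 55"): 100,
    ("search", "France", "Firefox"): 500,
    ("direct", "France", "Chrome 55"): 400,
}
observed = {
    ("direct", "Canada", "Chrome 55"): 4100,
    ("search", "France", "Firefox"): 510,
    ("direct", "France", "Chrome 55"): 390,
}
suspects = rank_segments(baseline, observed)
print(anomalous_days, suspects[0])
```

In this toy data, almost all of the excess traffic comes from a single combination of source, country and browser, which is the kind of lead an analyst would then investigate by hand.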
RFM segmentation is another use case – it is a clustering (segmentation) of customers according to their purchasing habits to optimise a marketing strategy. Customer transactions are analysed based on three criteria: the date of the last purchase (Recency), the number of purchases over a given period (Frequency) and the cumulative amount spent over this period (Monetary value). Scoring methods are then used to create the customer segments, such as "Stars", who buy a lot and have bought recently, or "Thrifty dormants", who have a poor recency score. At AT Internet, we have decided to integrate an automatic RFM clustering feature – the idea is to offer a turnkey analysis that automatically adjusts to the customer context, and especially to seasonal fluctuations. In addition, predictive elements are added and presented in a set of suitable graphs. Our teams are currently applying the same segmentation methodologies to metrics related to engagement rather than purchasing, so that these features can be used on non-transactional sites.
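As a rough illustration of RFM scoring, the sketch below aggregates a few transactions into recency, frequency and monetary values, converts each into a score from 1 to 3 by rank, and assigns a label. The function names, the three-level scoring, the label rules and the sample customers are all assumptions made for the example. It does not reproduce AT Internet's feature, and it ignores the seasonal adjustment mentioned above.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, today):
    """Aggregate (customer, purchase_date, amount) rows into (R, F, M) tuples."""
    last, freq, amount = {}, defaultdict(int), defaultdict(float)
    for customer, day, value in transactions:
        last[customer] = max(last.get(customer, day), day)
        freq[customer] += 1
        amount[customer] += value
    # Recency is expressed in days since the last purchase.
    return {c: ((today - last[c]).days, freq[c], amount[c]) for c in last}

def score(values, higher_is_better=True, bins=3):
    """Map each customer to a score in 1..bins by rank (ties broken by input order)."""
    ordered = sorted(values, key=values.get, reverse=not higher_is_better)
    return {c: 1 + (rank * bins) // len(ordered) for rank, c in enumerate(ordered)}

def segment(rfm):
    """Label customers from their scores; the rules below are illustrative only."""
    r = score({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
    f = score({c: v[1] for c, v in rfm.items()})
    m = score({c: v[2] for c, v in rfm.items()})
    labels = {}
    for c in rfm:
        if r[c] == 3 and f[c] + m[c] >= 5:
            labels[c] = "Star"             # bought recently, buys a lot
        elif r[c] == 1 and m[c] == 1:
            labels[c] = "Thrifty dormant"  # poor recency, low spend
        else:
            labels[c] = "Other"
    return labels

# Hypothetical transactions: (customer, purchase date, amount).
transactions = [
    ("alice", date(2020, 2, 25), 120.0),
    ("alice", date(2020, 2, 10), 80.0),
    ("alice", date(2020, 1, 15), 200.0),
    ("bob", date(2019, 9, 1), 15.0),
    ("carol", date(2020, 1, 20), 60.0),
    ("carol", date(2019, 12, 5), 40.0),
]
rfm = rfm_table(transactions, today=date(2020, 3, 1))
labels = segment(rfm)
print(labels)
```

Rank-based scores keep the example short. A production feature would more likely use quantiles or a clustering algorithm computed over the whole customer base.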
What are the data science team’s challenges at AT Internet?
Firstly, the construction of a data science roadmap in line with the needs of our users. Our priority is to be attentive and responsive. From an organisational point of view, our team is now part of a high-level development environment. This requires the implementation of a workflow combining major R&D work, industrialisation and continuous optimisation of our models.
Each member of the team must now be able to handle both modelling and industrialisation issues. The technologies and tools the team uses are very diverse: Python, R, Shiny, Scala, Spark, Elasticsearch, Kibana, Snowflake, AWS, Kubernetes, Jenkins, Git, etc. The other key challenge is to ensure that the team’s skill base progresses consistently for everyone. To do this, we work with platforms such as DataCamp or Kaggle.
And to sum up…
It’s important to always stay sharp and attentive, with a passion for discovery and learning – “Data science is driven by curiosity”.