Today we’re hearing from Dr. Sébastien Foucaud, head of data science at Scout 24. After sharing his experience working with diverse types of data with us, he’s back today to talk machine learning.
Artificial intelligence and machine learning are no longer just the stuff of sci-fi films. From speech recognition and voice control on our mobile phones to personalised item recommendations on shopping sites… from facial recognition in our shared photos to the customer service chatbots that answer our questions… examples of applied AI and machine learning surround us every day.
Even if the fruits of AI and machine learning technologies are increasingly present in our lives as consumers, many businesses are still questioning whether (and how) they can ride the machine learning wave and use AI to their company’s benefit. You might be asking yourself:
Can machine learning be applied to any type of business?
Can even small companies do machine learning?
What resources do I need to do machine learning, and where do I start?
For many years, machine learning and AI were reserved for the biggest, most resource-rich companies and brands. But today, machine learning has truly become accessible to all types of businesses. Two factors in particular have contributed to the democratisation of machine learning:
- Off-the-shelf technology: Out-of-the-box machine learning and AI solutions make it simpler to implement and apply this technology. Data streaming technology and cloud-based solutions have become more powerful, while at the same time becoming cheaper and more accessible.
- Data abundance: It has become easier to acquire data, and even smaller companies typically have enough exploitable data to form the basis of a machine learning project.
With a lowered barrier to entry, machine learning and data science are no longer just for big companies. Businesses of all types and sizes can now start considering and implementing these technologies.
In this article, I’ll cover the elements essential to machine learning projects, recommendations for successfully launching your machine learning project, and pitfalls to avoid.
Small data set? No problem!
Naturally, data will be the core of your machine learning project. If you have large data sets – and if that data is labelled (meaning it has been “classified” or “tagged” in some way, in order to serve as “training data”) – you’re well on your way to integrating AI into your company (and you might even think about deep learning or neural networks, if you have a business case for it).
But on the other end of the spectrum, if you only have a small data set, don’t despair – it’s not a deal-breaker. You can actually do quite a bit of machine learning with smaller data sets as well. Good models can be built using simple regression on a small data set, and once you’ve built a solid model, it’s easy to implement it and put it to work.
When working with a huge data set, your technology matters much more. But when working with a smaller data set, it’s your human talent that carries more weight, specifically domain knowledge that cannot be substituted by a machine. Pairing this human domain knowledge with simple modelling can take you far!
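To make the idea of simple regression on a small data set concrete, here is a minimal sketch in Python using only the standard library. The data (flat sizes and monthly rents) and the function names `fit_simple_regression` and `predict` are invented for illustration; a real project would choose its features with the help of domain knowledge.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical toy data: flat size in square metres vs. monthly rent.
sizes = [30, 45, 60, 75, 90]
rents = [500, 650, 800, 950, 1100]

model = fit_simple_regression(sizes, rents)
estimated_rent = predict(model, 50)
```

Five observations are obviously too few for a real product, but the point stands: a model like this fits in a few lines, is easy to explain to the business, and is cheap to maintain.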
Build your project with the right people
Despite the focus on the “machine” in machine learning, humans are still key here, and it’s especially important to have the right kind of talent and the right approach.
A huge team is not necessary to do machine learning. You can build a great product with 6 to 9 people: a product owner or manager, someone from UX/UI, a data scientist or two, a data engineer, and then a few software engineers (2 to 4, depending on how fast you want to build and how complex the product is).
But the caveat is that you must have done your homework first! You must know which specific business problems you’re looking to solve, and whether you have the right data to achieve this (see the 8 steps outlined below). This is where it’s crucial to involve a data scientist who understands the business needs and has some experience with product development.
Because the “data scientist” title has become blurred these days, it can be very confusing for companies to know what kind of data science talent they need, and whom they should be working with. There are several different types of data science specialists (for more details you can always read this article), but the three profiles that are most useful in the context of a machine learning project are:
- A data analyst or insights-type person: This type of data scientist can work with large data sets and derive insights. In the early stages of your project, your data analyst can help you identify and understand trends in your data and determine any problems that must be corrected.
- A data strategist: This type of data scientist has the business expertise necessary to understand the identified problems, ensure your company has the right data to resolve them, and develop business use cases.
- A machine learning engineer: This type of data scientist has a strong background in software development and engineering, and can be instrumental in actually building the product that will solve the identified problem.
No matter the specialisation, the core skill of a data scientist is applied statistics and the ability to derive insights from large data sets using statistical models (gained from years of first-hand experience with data). A good data scientist therefore combines deep technical knowledge with enough domain knowledge to address business needs. More broadly, the data scientist should be able to steer machine learning projects with help from software and data engineers. Finding the right data scientist for your project is a complicated task and may take time. In the meantime, you may want to temporarily hire a certified expert (from platforms like certace.com).
8 steps to a successful machine learning project
You’ve got the right data and the right human talent to power your machine learning project. But that still doesn’t guarantee success. The deciding factor will be your project execution: how you combine your data sets, your human talent and your implementation. Here are 8 steps to running an effective machine learning project that optimises your resources and investments:
- Identify your business use cases and the business needs. Take a top-down approach and start with the business problem you want to solve (as opposed to first investing in massive infrastructure and expensive resources that ultimately prove to be ill-adapted to your needs). Use your business roadmap as a guide; be “business-driven” and not “tech-driven”.
- Identify the required data sets. What data do you need to solve this problem? A data strategist can be especially helpful at this stage, along with a high-quality data provider.
- Determine which product solution should be built. Combining knowledge of available data and business needs, you can then start scoping your project and clearly define your data product.
- Determine and scale the right architecture to stream data into your platform. Many different solutions are possible here, so a data engineer will be able to recommend how to best stream data depending on your available infrastructure and technology.
- Take care of your data. Clean the data (it’s usually 90% of the job), remove outliers, replace missing values, work out the formatting, etc.; at this stage it is also worth looking into compliance and privacy (should you anonymise part of the data set?). This is also the time to reduce the number of free features with feature engineering (combining features and removing duplicates) – this last task is essential to reduce the complexity of your models, but requires advanced domain knowledge to perform well.
- Build the right model for your machine. At the core of your product is obviously the machine learning model. Many algorithms can be used to solve the business case, with varying levels of precision. One key element to keep in mind is that, as I mentioned earlier, even a simple regression model can work, and simple models are usually easier to implement, scale and maintain. So between a very fancy model with a very high level of accuracy and a far simpler one (although obviously less accurate), you may prefer the latter if it suits your business needs sufficiently.
- Build your product. Interactions between your product manager, software engineers, data scientists and data engineers are essential here. Opt for lean development and a small, minimum viable product (MVP) so that you can iterate quickly (see next step).
- Test and adjust. There’s never just one answer to a problem, so don’t assume your product is the only solution! When launching your product, set a hypothesis about the impact it will have, and measure success by collecting more data. If you’ve taken a lean approach, you’ll be able to quickly test and readjust your product where needed.
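As an illustration of the data-cleaning step in the list above, the following Python sketch replaces missing values and removes outliers using only the standard library. The choices made here (filling gaps with the median, dropping points outside 1.5 times the interquartile range) are common conventions assumed for the example, not rules from this article, and the sample values and function names are hypothetical.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = median(observed)
    return [fill if v is None else v for v in values]


def remove_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical sensor-style readings with two gaps and one obvious outlier.
raw = [52, 48, None, 50, 51, 49, 500, None, 47]

imputed = impute_missing(raw)
cleaned = remove_outliers(imputed)
```

The median is used for filling gaps because it is barely affected by the extreme value that is removed afterwards. Whether a point is a genuine outlier or a meaningful signal is exactly the kind of question that needs domain knowledge.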
Today, thanks to accessible technology and abundant data, machine learning has come within reach of all businesses – whether your company is big or small, no matter your sector of activity. But the key to leading a successful machine learning project is to adopt a sound approach from the very start. A business-driven (and not tech-driven) approach will enable you to identify and align the right elements (talent, data and execution) in the right way. In doing so, you’ll make the most of your machine learning investments and increase your chances of achieving your specific goals.