A bit over a year ago, I started as a Lead Data Scientist at Travix, after having worked as a data science consultant for over 10 years. I am writing this blog post to tell you about this journey from consultancy to growing a data science team in an e-commerce company. I will also tell you more about the skills required in my new position and how they differ from those needed in consultancy. This might be interesting to you if you are planning a similar career move or are interested in joining our team.
My name is Maarten Soomer. I have a Master’s and PhD in Business Analytics and previously worked as a Data Science consultant for over 10 years. This experience included a lot of projects in the travel industry (airlines, tour operators, leisure) mainly related to Pricing & Revenue Management and Marketing Analytics.
After 10 years of consultancy, I was looking for a longer-term view and I was sure it had to be in e-commerce. Travix is the perfect company for this, as it combines e-commerce with my experience and passion for travel.
Travix is an Online Travel Agent (OTA), selling flight tickets in 39 countries with 5 brands using 43 websites. We don't own planes or buy any seats in advance. We do have data though. Lots of it. We use it in our platform to make the best match between demand and supply and give the best experience to our customers. And this abundance of data coming from all systems/processes in the platforms allows for many amazing data science opportunities.
Currently, we are in the process of growing our Data Science capability and impact. We have a small but growing central Data Science team, focused on delivering (automated) data science applications, i.e. Machine Learning models. The team works alongside Data Engineering, which is responsible for the development of data applications.
Given that we are the central data science team we work on projects covering the full spectrum of our business. I personally really like this, because it gives a lot of variety, and you learn a lot about the different business areas. Some of our key recent projects include:
As you can see, not only are the topics very broad, but also the techniques used. We are also very pragmatic in the approach we take for each individual project; it doesn't have to be Machine Learning if a simpler solution will do.
The list of future projects appears limitless!
As a small team we have the challenging task of delivering data science applications that provide business value in production. This goes beyond just building a machine learning model (although this is an essential part of it). The team is also (heavily) involved in delivering a solution in production. This is a great opportunity for the team to learn and build a different set of skills.
The first challenge is to make sure that we understand the business problem and frame it as a data science problem. Business people typically do not fully understand what is possible with Machine Learning: they expect that you will do some magic with the data and will always have amazing results. So we need to interact with the business to understand the problem, make sure we are solving the right one, and get an idea of whether and how we can solve it with the right data science techniques.
At the same time we must also have a rough idea of what the final solution should look like. For example, the model prediction may be served online through an API. The system calling the API might be required to use the predicted value in an automated process, for instance to block fraudulent payment transactions. It is important to know this early in the process, because knowing how the model will be used helps us make the right choices in the modeling phase (e.g. when is the model good enough, which error metrics to use) and might also put constraints on the data that will be available at prediction time.
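To make the link between usage and error metrics concrete, here is a minimal sketch, not our production code. It assumes a fraud model that outputs a score between 0 and 1 and an automated process that blocks every transaction at or above a threshold. The function names, scores, labels and thresholds are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    precision: float  # share of blocked transactions that were really fraud
    recall: float     # share of fraudulent transactions that were blocked


def block_decisions(scores, threshold):
    """Return True for each transaction the automated process would block."""
    return [score >= threshold for score in scores]


def precision_recall(labels, blocked):
    """Compare block decisions with the true fraud labels."""
    tp = sum(1 for y, b in zip(labels, blocked) if y and b)
    fp = sum(1 for y, b in zip(labels, blocked) if not y and b)
    fn = sum(1 for y, b in zip(labels, blocked) if y and not b)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, recall)


# Hypothetical model scores and true labels (True = fraud), purely illustrative.
scores = [0.05, 0.20, 0.35, 0.60, 0.80, 0.95]
labels = [False, False, True, False, True, True]

for threshold in (0.3, 0.7):
    m = precision_recall(labels, block_decisions(scores, threshold))
    print(f"threshold={threshold}: precision={m.precision:.2f}, recall={m.recall:.2f}")
```

With this toy data, the lower threshold catches all fraud but also blocks a genuine customer, while the higher threshold blocks only fraud but lets one fraudulent transaction through. Which trade-off is acceptable depends on how the prediction is used, which is why it helps to know this before modeling starts.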
Now that we have an idea of what problem we will solve in what way, we need to get our hands on some data. Luckily we are working alongside a talented group of data engineers, who have built a mature data stack in the cloud. They help us find the right data sources and ultimately build the data pipelines that turn data science into production software. We need to make sure that the data has the quality, consistency and volume required to build a solid solution.
Finally, we can perform exploratory data analysis and start modeling. This is our favorite part of the work, where we can play around with the data in a Jupyter notebook on our laptop (if the data is small enough) or in the cloud, and apply our favorite machine learning packages (scikit-learn, LightGBM, TensorFlow, etc.). And of course we will celebrate when we have a model that performs well enough!
But the story doesn't end after a successful modeling phase. We want to put our model in production in order to deliver the desired business improvements. Running a model in production requires thinking about topics like scalability, high availability and model monitoring. So we have to move from notebooks to production code and work with data engineers and software engineers to develop and deploy everything that is required to run the solution in production. Luckily there is a good infrastructure to work with: fully based on Google Cloud, with Kubernetes clusters and Apache Airflow.
Once the solution is live we have to measure its impact in practice and monitor (model) performance. This can lead to a new iteration of improving the solution.
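As a rough illustration of what such monitoring can look like, the sketch below compares the model's recent error with the error measured at validation time and flags the model for review when it has degraded. This is a simplified assumption rather than a description of our actual setup. The function name, the 10% tolerance and the numbers are hypothetical.

```python
def needs_review(baseline_error, recent_errors, tolerance=0.10):
    """Flag the model when the mean recent error is more than
    `tolerance` (relative) above the error measured at validation time."""
    if not recent_errors:
        # Nothing measured yet, so there is nothing to flag.
        return False
    recent_mean = sum(recent_errors) / len(recent_errors)
    return recent_mean > baseline_error * (1 + tolerance)


# Hypothetical daily error rates after go-live.
stable_week = [0.21, 0.19, 0.20]
degraded_week = [0.25, 0.27, 0.26]

print(needs_review(0.20, stable_week))    # error close to the baseline
print(needs_review(0.20, degraded_week))  # error clearly above the baseline
```

A flag like this would typically be the trigger for the next iteration: investigating what changed in the data and retraining or adjusting the model.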
Besides building our legacy of data-driven applications, we also aim to improve the tools and processes we are using. This continuous improvement makes sure the next project can be run even more efficiently and that we can reuse components. This involves topics like MLOps, a data (science) platform, best practices, etc.
Overall our way of working is agile and pragmatic: start simple, scale (or fail) fast.
These are some of the types of tasks and meetings I have in a typical week:
As you can see, we are working on a wide range of projects and have to build relationships with many stakeholders. This is similar to what I was used to when working in consultancy. What is really different is the longer-term view and the involvement in every aspect of the project from start to finish. In consultancy there is a lot of specialization: the assignment is typically scoped very tightly, and within the project team there is a clear division of tasks.
I really like that at Travix there is less pressure and a lot of freedom and autonomy, and that I am involved in many more aspects of delivering a solution. It’s more broad than deep: a solid background in math/data science is definitely required, but we do not have a single person specializing solely in NLP models, for example. Rather, we work with multiple algorithms and combine that with coding and communication skills. This involves a fair amount of pragmatism.
Personally I switch between very hands-on modeling, coding and analysis tasks and strategic involvement (what should the data science team look like in five years), which sometimes makes time management challenging, but I really enjoy this mix.
There will be a little more specialization in the team as we grow over the coming years. In the end we want to have data science embedded in the business teams, combined with a central expertise team (a hub-and-spoke organisation). In the coming year we’ll first focus on growing a solid hub.
What we’ll continue to do is build data science knowledge and skills across the company. Although data science is still a centralised activity, it is very valuable to have some people in the business teams who have a good understanding of what is possible with data science and can start performing analysis on their own. Last year we ran a Data Science Club training program aimed at this. It was truly inspiring and impressive to see analysts from business teams building their first Machine Learning model to solve their own business problem at the end of the program. And some of these models were developed further into production models by our team.
I am really enjoying being a part of this journey and am proud of the business impact we have delivered in the past year!
Lead Data Scientist