How to Build Data Products in a Fast-Paced Environment
How do you meet expectations from different stakeholders while being truly data-driven? From changes in requirements and ad-hoc requests to international macroeconomic demands and constraints, it has become even harder to build scalable products in a fast-paced environment.
Monique shares learnings from the past year on how Trustpilot built data products in a cross-functional team to help its global customers overcome these challenges.
Hi, everyone. My name is Monique, and I hope you're enjoying the conference. I'm happy to be here to talk about how to build data products in a fast-paced environment.
Today I'm going to share my experience working in cross-functional teams, and some of the lessons I learned while building data products and having to adapt quickly to changing requirements and demands.
Before we jump into the presentation, I'd like to share a bit about myself. My background is actually in finance, and I transitioned into data science because my passion is using data in different ways. I've worked across different markets, including projects with blue-chip companies, applying AI techniques to solve real problems. More recently, over the last few years, I have also experienced shifts in how we work, and I'd like to share that with you.
Today, I work as a senior data scientist at Trustpilot, designing and building data products and improving the customer experience. A bit more about Trustpilot: it was founded in 2007 with the vision of creating an independent currency of trust. It's a digital platform that connects consumers and businesses. We help consumers shop with more confidence, and at the same time we provide actionable insights to businesses. We are spread across different locations, we have more than 167 million reviews and more than 700 websites reviewed on the platform, and more than 50 nationalities work at Trustpilot.
To give more context about my role and the team: I work on data science products within the data science team, across all of our products, focusing on both the business and the consumer side. We work in cross-functional teams to design and improve those features and data products. My experience and the learnings I'm going to share come from the business side, but they can easily be applied to other scenarios. I hope you enjoy it.
To start off, let's talk about data products and some of the elements we need to consider. First, the customer: think about the customers' problems and what we want to solve. Second, the data characteristics: the data we produce in-house, the raw data. Third, the value we can add at each stage: how can we enrich this data to make it actionable? Before we go into more detail about the development workflow, I'd like to share some statistics about these projects.
According to Gartner, only 20% of analytic insights will deliver business outcomes, which is quite alarming. It's actually a little worse: 85% of all big data projects fail. You're probably wondering why, given so much investment of time and resources, most of these projects fail. I've highlighted the most interesting reasons. In most cases, we're trying to solve a poorly defined problem.
We are not actually adding value. Another interesting reason is that we only think about performance in the last phase of the process. Or it's simply a problem with the process itself: there is no process, or we are following the wrong one. That's the part I'd like to focus on and give more insight into. If we look at the data product development workflow in general terms, the first steps are talking to customers, understanding the problem and identifying our target audience. Then we quickly work on a solution, create a prototype, get feedback from the team or from stakeholders, and implement the solution. We iterate a little more until we are happy with it, and finally we launch the feature or product, and then we start over.
So what can possibly go wrong? I'd like to highlight a couple of things. In the first phase, talking to customers, we sometimes assume that we already know what the problem is, and we jump into the next phase and build a solution without really understanding the problem. Or we have assumptions about what customers want and jump straight into the prototype and development phases. In those phases, again as a result of not really understanding the problem, we see a lack of requirements or very custom customer requirements. While implementing the solution, we run into technical problems and integration issues, or we discover that it is more complex than we estimated at the beginning. Common issues also relate to dependencies and short-staffed teams, and all these problems cause this part of the process to take longer than estimated.
If we look at the roles and people involved in this process, what we usually see is that the roles that represent customers, such as project managers who talk to customers, UX designers, marketing, sales and customer success teams, are heavily involved in the first phases. In the prototype phase, you see more of the development teams: engineering, data science, analysts and machine learning engineers. The development team also drives the implementation phase. Finally, when we launch the product, marketing and communications handle the release. But given the challenges we've seen and how these roles are traditionally involved, can we change the way we work so that we solve some of those challenges? I'd like to propose a new way of working. If we think about all the people involved, the process starts with customers and identifying the problem.
It's not a big shift, but what I'm proposing is that we are all involved in the whole process, and we contribute in different ways: in some steps we contribute more, and in others we contribute differently, for example by giving feedback. Now you might wonder how the other roles can contribute in the prototype and development phases. I'd like to give some examples, because it's actually a difficult question. In this phase, data science works on the prototype, and it is quite helpful to get feedback from other roles, even on choosing the right metrics to measure success or on keeping track of any integration issues.
That way we can make sure we meet our deadline for launching the product or the new feature. In the same way, how can engineering and data teams contribute to the phases where they are traditionally less active? In the first phase, for example, data scientists and data analysts can help by analysing data, validating personas, identifying market trends and providing more information to stakeholders, so that we actually understand the current business and the customer problem.
In the last phase, data scientists, data analysts and the team as a whole can be involved by listening to customers, helping answer their questions and understanding why customers have those questions in the first place. This is a way to take part in the whole process. As I mentioned before, data science can be quite active across the whole development process, and I'd like to go a bit deeper into how this works in practice. It's also a good opportunity for professionals outside the technical development area to understand how they can collaborate with data science and how data science is used at different steps.
I'd like to start with a problem statement. At Trustpilot, businesses receive tons of reviews on a daily basis. How can we translate all those reviews, all that data, into actionable insights? That is the problem we have to solve.
The initial phase, as I mentioned, is to understand the problem and who the customer is. With traditional methodologies, we interview customers, create personas to understand their journeys and the reasons why a person has that problem, run customer surveys and interviews, and finally try to come up with a solution. What I'd like to share is how data science can contribute to that.
From the data perspective, we can help by exploring the data and checking whether there are any patterns, for example relationships with location, language or customer segments. Establishing these correlations and patterns helps UX and the rest of the team understand the problem. In ideation sessions, where the team tries to come up with solutions, data science can also be quite helpful by giving stakeholders an early assessment of what data we actually have available and by identifying potential red flags for some of the solutions we brainstorm.
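As a small illustration of this kind of exploration, the sketch below groups review records by a field such as language or country and reports the count and mean star rating per group, which is often the first step in spotting patterns. The record fields (`language`, `country`, `stars`) and the sample data are hypothetical assumptions, not Trustpilot's actual schema.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical review records: field names and values are illustrative only.
reviews = [
    {"language": "en", "country": "GB", "stars": 5},
    {"language": "en", "country": "US", "stars": 2},
    {"language": "da", "country": "DK", "stars": 4},
    {"language": "en", "country": "GB", "stars": 1},
    {"language": "da", "country": "DK", "stars": 5},
]


def summarize_by(records, key):
    """Group records by `key` and return the count and mean star rating per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["stars"])
    return {
        group: {"count": len(stars), "mean_stars": mean(stars)}
        for group, stars in groups.items()
    }


by_language = summarize_by(reviews, "language")
by_country = summarize_by(reviews, "country")
print(by_language)
print(by_country)
```

In practice the same grouping would run over far more data with a dataframe or SQL engine; the point is that a simple segment-level summary can give UX and stakeholders evidence before any solution is designed.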
As a side note, especially for the technical members of the team: we (and I include myself here) tend to jump into solution mode. We think about how to solve the problem, sometimes focus too much on the solution, and forget about the problem. It is important that we focus on the problem instead of jumping straight to the solution or to how to build it.
The next step is the prototype phase. A prototype is an early sample, model or release of a product, built to test the concept you're trying to build. It is typically used to validate the product design or functionality and to gather user feedback before deploying it to production. That is a common definition of a prototype in software development, and today I'd like to challenge it by looking at how we see it in data science, when building data products. When designing the solution and exploring approaches, I'd like to highlight four elements.
The first is scalability. During the prototype phase, we usually work with a sample of the data or of our customers. So how would it work if we wanted to scale it to all customers? When thinking about the solution, the model and the approach, it is important to consider scalability. The second is explainability, and I really like this one: it's basically the bridge between the technical side and a non-technical audience. How can you explain your model, and its outputs, to a non-technical audience, for example?
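One way to make explainability concrete is to prefer approaches whose outputs carry their own reasons. The sketch below is a deliberately simple, hypothetical keyword-based topic tagger (not the approach described in the talk): every topic it assigns comes with the exact words that triggered it, so a non-technical reader can check the output. The lexicon and topic names are illustrative assumptions.

```python
import re

# Hypothetical topic lexicon; a real one would come from data and domain experts.
TOPIC_KEYWORDS = {
    "delivery": {"delivery", "shipping", "late", "arrived"},
    "customer_service": {"support", "staff", "helpful", "rude"},
    "price": {"price", "expensive", "cheap", "refund"},
}


def tag_review(text):
    """Return each matched topic together with the words that triggered it."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    explanation = {}
    for topic, keywords in TOPIC_KEYWORDS.items():
        matched = sorted(tokens & keywords)
        if matched:
            explanation[topic] = matched
    return explanation


result = tag_review("Delivery was late but the support staff were helpful.")
print(result)
```

A learned model would usually be more accurate, which is exactly the trade-off to weigh: the more complex the approach, the more work is needed to explain its outputs.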
The third is implementation. Here you should consider the challenges of your chosen approach when implementing it in the product: the technology stack, how difficult it is, and how complex the changes to the current architecture are.
The fourth is flexibility. Flexibility is about the fact that changes will come. We don't know when, but they will come, and you need to keep that in mind when choosing your approach. I'll go through some examples of these four elements in the next few slides. But before that, I'd like to highlight that when creating a prototype, it is important to keep in mind how to validate our approach. I like to think about that in terms of two categories of metrics. First, technical metrics, related to the machine learning metrics you usually use, such as precision and recall. Second, business metrics: how will the output of your model or approach impact customers?
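To make the two categories of metrics concrete, here is a minimal sketch that computes precision and recall (technical metrics) for binary labels alongside a simple business metric, coverage, taken here to mean the share of the customer base for which a feature produces an output. The customer identifiers and labels are made-up illustrative data, and this definition of coverage is an assumption for the example.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def coverage(customers_with_output, all_customers):
    """Share of the customer base for which the feature produces an output."""
    base = set(all_customers)
    if not base:
        return 0.0
    return len(set(customers_with_output) & base) / len(base)


# Made-up labels: 1 = the review really belongs to the topic / was tagged with it.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)
cov = coverage(["acme", "globex"], ["acme", "globex", "initech", "umbrella"])
print(p, r, cov)
```

Tracking both side by side helps avoid a model that looks good on precision and recall but only reaches a small part of the customer base.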
For example, we can talk about coverage: how much of the customer base your feature impacts, and how, meaning the changes it will promote when deployed to production. Now let me go through some examples of how the four elements work in practice when choosing our approach and creating a prototype. When it comes to scalability, as I mentioned before: is your approach, your model, able to scale when deployed in production? Is it able to scale to our whole customer base? When it comes to explainability, the chosen approach should make it possible to explain the model outputs to non-technical audiences. Remember that we are building products for different audiences, and we need to understand our audience.
Your model should be able to speak to them. From the data science perspective, the model output should also be easy to understand, so that data scientists have a good intuition for how to interpret it, how to improve it, and how to correct it when something is wrong. On the implementation side, as I mentioned before, we need to consider the current architecture when choosing the machine learning approach: how the model outputs will be consumed in the existing data structures, how the approach fits the current technology, and latency. Those are a couple of things to consider.
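A sketch of two of these implementation concerns, under stated assumptions: the model writes to a fixed, versioned output record (`ReviewScore`, a hypothetical schema agreed with whoever consumes it), and scoring runs in fixed-size batches so the same code works on a prototype sample or on the full customer base. `in_batches` and `score_batch` are hypothetical helpers, and `score_batch` is only a placeholder for a real model.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class ReviewScore:
    """Hypothetical output record that downstream consumers read."""
    review_id: str
    score: float
    model_version: str


def in_batches(items: Iterable, size: int) -> Iterator[List]:
    """Yield lists of at most `size` items without materialising the whole input."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def score_batch(batch: List[Tuple[str, str]]) -> List[float]:
    # Placeholder model: a real one would be loaded from the team's model store.
    return [min(len(text) / 100, 1.0) for _, text in batch]


def score_all(reviews: Iterable[Tuple[str, str]], batch_size: int = 1000,
              model_version: str = "v1") -> Iterator[ReviewScore]:
    """Score (review_id, text) pairs batch by batch, emitting the agreed schema."""
    for batch in in_batches(reviews, batch_size):
        for (review_id, _), score in zip(batch, score_batch(batch)):
            yield ReviewScore(review_id, score, model_version)


scores = list(score_all([("r1", "Great service"), ("r2", "Slow delivery"), ("r3", "Ok")],
                        batch_size=2))
print(scores)
```

Because the output schema carries a model version, a new model can be rolled out without consumers having to change how they read the results.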
Then there is flexibility: consider a machine learning approach that allows you, as a data scientist, to iterate easily over time. Part of that is the modelling approach itself, and part of it is how mature your development process is with your current tech stack. For example, machine learning engineers can help you build a pipeline that automates processes, so you can deploy experiments faster.
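One hedged way to picture this kind of flexibility: the pipeline looks up the model by name from a configuration object, so trying a different approach is a configuration change rather than a rewrite, and a monitored metric decides when retraining should be triggered. The model registry, both toy models and the `min_precision` threshold are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def keyword_model(texts: List[str]) -> List[int]:
    """Toy model: flags reviews that mention a refund."""
    return [int("refund" in t.lower()) for t in texts]


def length_model(texts: List[str]) -> List[int]:
    """Toy model: flags long reviews."""
    return [int(len(t) > 50) for t in texts]


# Hypothetical registry: swapping approaches means changing a config value.
MODELS: Dict[str, Callable[[List[str]], List[int]]] = {
    "keyword": keyword_model,
    "length": length_model,
}


@dataclass
class PipelineConfig:
    model_name: str
    min_precision: float  # assumed threshold agreed with stakeholders


def run_pipeline(config: PipelineConfig, texts: List[str]) -> List[int]:
    """Run whichever model the configuration selects."""
    return MODELS[config.model_name](texts)


def needs_retraining(config: PipelineConfig, latest_precision: float) -> bool:
    """Trigger retraining when the monitored metric drops below the threshold."""
    return latest_precision < config.min_precision


config = PipelineConfig(model_name="keyword", min_precision=0.8)
predictions = run_pipeline(config, ["I want a refund", "Great"])
print(predictions, needs_retraining(config, 0.7))
```

In a production setting the registry, configuration and monitoring would usually live in the team's MLOps tooling; the sketch only shows why separating them from the model code makes iteration cheaper.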
All of that goes into flexibility: flexibility in how you build, but also in the algorithm behind it. How easy is it to change something? Now let's say we are finishing the prototype, we've implemented the model in the product stack, and we launch the new release. That's great, so we're finished? Well, no. Usually there is then a customer feedback phase, and we can expect feedback from different customers, with different opinions about the same feature. Some customers can be quite positive and actually validate what you built. Others may not like the feature, may not find it useful, may find it is not exactly what they want, or may have additional requests.
From the couple of feedback examples on this slide, it's easy to see that there isn't just one reason: customer feedback ranges from user interface issues to communication, the data science model and engineering problems. So how do we proceed? It is important that we understand the feedback by asking why. For example, if a customer asks for a new feature, why is it important, why is it critical for them? We try to understand the reason behind the feedback, and then work as a team to prioritize and improve the product.
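A first pass at this kind of triage can be as simple as routing each feedback item to the areas it mentions, so the team can see where feedback concentrates, while anything that matches no area is set aside for a human "why?" conversation. The areas mirror those mentioned above; the keyword rules are hypothetical and far cruder than a real triage process.

```python
from collections import Counter

# Hypothetical routing rules; the areas mirror the ones mentioned in the talk.
AREA_KEYWORDS = {
    "user_interface": ("button", "layout", "screen", "confusing"),
    "communication": ("email", "announcement", "didn't know"),
    "data_science_model": ("wrong topic", "inaccurate", "irrelevant"),
    "engineering": ("error", "slow", "crash", "timeout"),
}


def triage(feedback):
    """Count feedback per area; items matching no area are returned for follow-up."""
    counts = Counter()
    unassigned = []
    for item in feedback:
        text = item.lower()
        areas = [area for area, keywords in AREA_KEYWORDS.items()
                 if any(keyword in text for keyword in keywords)]
        if areas:
            counts.update(areas)
        else:
            unassigned.append(item)  # needs a human conversation to understand why
    return counts, unassigned


counts, unassigned = triage([
    "The page is slow and shows an error",
    "Topics look inaccurate",
    "Love it",
    "The layout is confusing",
])
print(counts, unassigned)
```

The counts only indicate where to look; prioritization still depends on understanding why each request matters to the customer.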
Now you may think: yes, but how can data science and engineering teams work together in the same sprint when they work quite differently? Retraining a model is not the same as fixing a bug; there are different requirements. There is no right answer here, but one of our learnings concerns the short term. For what we consider bug fixes from a data science perspective, having the elements of automation and flexibility, and the way we implemented them, allows us to make changes quickly, so we can keep up with the flow of demands and the critical short-term items.
In the medium term, for example when we think about the next release, it is important as data scientists to plan improvements. We can identify areas of improvement from the data and from our metrics, but it's also important to listen to stakeholders, because they have the domain knowledge. Consider what they say, but validate it with data: if we receive feedback that something is important to customers, and we can also find it in the data, then we can implement it. Also, as I mentioned earlier, for scalability it is important to have MLOps in place.
That supports scalability, and also flexibility, by having a retraining process and pipeline in place, which is quite important in a production setting. It is also important to say no sometimes to very custom requests. This applies to engineering teams, who cannot implement every single feature a customer requests, but I'd say it is just as important for data science, so that we don't favour some business segments by implementing custom requests and features. With that, I'd like to recap some key takeaways.
As a team, it is important that we work together and focus on delivering value to customers. It is easy to get distracted by bugs and different requests, but keep the customer pain and the customer problem in mind. Also, get yourself involved in all steps. As I mentioned before, we are more active in some steps than in others, but what you do as a data scientist, as an engineer, in UX, or in marketing promoting the product will be affected by what your colleagues are doing, so it's important that we work well together. For data science, I'd like to highlight a couple of things. First, ask for feedback in all phases; early feedback is especially important. Whenever you have the opportunity, ask your colleagues, engineers and other data scientists for feedback.
Prioritize an approach that is fit for purpose. It's quite easy to be seduced by state-of-the-art, complex models, but sometimes they are difficult to implement or to understand, which brings us back to explainability. So keep scalability, explainability, implementation and flexibility in mind when choosing your approach, and be aware of the trade-off between performance and time. We tend to aim for perfection, improving and improving, and we can spend a lot of time on that.
So it is important to draw the line at good enough, so that we can focus on what is important to customers. Thank you for your time. I hope some of the learnings I shared today are useful to you and your team, whether you're in data science or in another role. Thank you, and enjoy the conference.