A/B Testing At Scale


Continuous Discovery
UXDX Europe 2019

When teams move to outcomes, they need to be able to confirm that it was their work, and not some external factor, that delivered the expected benefits. As teams scale up their releases, managing the number of concurrent tests becomes more difficult.

This session will look at how to build and scale an experimentation culture that can run hundreds if not thousands of tests a year.

Tatiana Tretyak, Product Manager, Booking.com

Hi everyone! As was just mentioned, my name is Tatiana Tretyak. I'm a product manager at Booking.com. I've been with Booking.com for about three years, and we're based in Amsterdam, which is where I flew in from. It's my first time in Dublin, so I'm very excited.

But before we dive into the topic of culture and the cultural shifts that come with experimentation, I want to share a factual story. What you don't know about me is that I grew up in a really small town in Russia, and actually moved to Amsterdam just for this job. When I say small town, it has a population of 15,000 people, which, if you think of the size of Russia, is a little drop in the population. And as in every small town, people had very fixed expectations of what normal is: you go to school, work, maybe go to university, and it's all kind of planned out for you. But I always wanted to try new things, try different things. And I'm not talking about the crazy hairstyles we all went through as teenagers. It's more like learning Photoshop at 12, because I wanted to change the hairstyle of my favorite actress, or learning Spanish in a town where no one speaks Spanish, for sure. I didn't get much criticism, but the question was always, "Why do you bother, Tatiana? What's in it for you?" And my answer was always, "Well, I'm curious. And if I fail, at least I learnt something."

Well, this approach of ‘I'll try, and if I fail, I learn something’ got me from my little hometown in Russia to working for one of the leading travel companies in the world, where ‘we at least try and learn something’ became the motto we use every day.

So around four years back, I came to Booking.com, which had almost 18,000 employees. So it's bigger than my hometown, no pressure. Around 200 product teams work on product development every day from Amsterdam, and they all do it with the goal of making it easier for everyone to experience the world.

Of course, every mission statement is meant to be ambitious and inspiring. But look at every single word of it: what's ‘easier’? Who's ‘everyone’? What do you even mean by ‘experience the world’? And when you are sitting in Amsterdam, it's very hard to understand what customers actually want.

So we made a decision that we'll let customers drive our product. And as probably all of you in this room know, predicting customer behavior is hard. Raise your hand if you ever predicted it right. Let's talk afterwards, because I want to know how you manage. Since predicting customer behavior is hard, the approach the company took is to rely on data and experiment on everything, every day: every single pixel, every interaction, every feature has been tested. It gave us an immense amount of data, and we built some tools to process this data. But in this talk, I don't want to focus on the tools, because technology can be bought or replicated. Culture is something that you need to build from the beginning. And even if you have the most amazing tools in the world, the wrong culture will not let you experiment at scale, every single day.

So I'd like to share with you four aspects of culture and organizational processes that actually give you this feeling that, "Okay, it's a new day, I'm going to run a new A/B test, and I might fail, but I'll have learned something."

The first one is that failure is okay. I think it's been covered in previous talks in this room, so I don't want to reiterate the message too much. But failure is different for everyone. Statistically speaking, only one out of 10 A/B tests is successful, which means 9 are not. So if you obsess over the failures, you basically will not have a life, because you'll only think about the failed A/B tests. Another point is that breaking things is an amazing learning opportunity, because you learn how not to break them again next time.

And what is failure, actually? This is another question. To me and my peers at Booking, a negative result is not a failure. It is actually a very, very good thing, because you managed to touch a pain point of the user which is so bad that just by touching it, you made it worse. It means that you found something where you have to dig deeper, something that requires more insights and more research, and you're moving in this direction. The real failure is if you run an A/B test which you spent 2 weeks preparing and then 3 weeks running, and you didn't learn anything, because that's just time wasted, and you didn't get any insights out of it.

The second point is that experimentation is the key to agreement.

Raise your hand if, during the last month, you disagreed with your designer or product manager on how a feature should look.

Imagine you have five teams trying to come to this agreement, or 10 teams, or you have a CTO coming in saying, "We need this tomorrow, I don't care." Well, if experimentation is something that you do frequently, it becomes a part of the decision making process. This way, there is no right or wrong opinion, there is no right or wrong idea. Every idea is worth testing. Even if you are 100% sure that this is something you want to roll out, testing and experimentation still becomes an important step in the delivery process. Whether you were hugely successful or hugely unsuccessful, which happens 9 times out of 10, experimentation will help you measure how successful or unsuccessful you were. It also helps you to avoid HIPPOs, the highest paid person's opinion. Even if they're super confident about what they want to do, if the data proves them wrong, it gives you as a PM or UX person the grounds to push back and do it right and not just do it fast.
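To make "measure how successful or unsuccessful you were" concrete, here is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion counts. This is a textbook method used for illustration, not a description of Booking.com's tooling, and the function name and the numbers in the usage note are made up.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variant B against control A.

    conv_* are conversion counts, n_* are visitor counts (hypothetical inputs).
    Returns (absolute lift, z statistic, two-sided p-value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the assumption that A and B do not differ.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value
```

For example, `two_proportion_z_test(100, 1000, 130, 1000)` returns the lift of B over A together with a p-value, so a confident opinion can be checked against a number rather than argued about.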

Point number three: data is king, but empathy wins.

I don't think it really matters what the size of your company is. The moment you start getting your hands on data, it becomes very tempting to just spend days running analyses, running queries, and analysing the results. But in the end, data and experimentation are not the ‘what’. They're not the reason your company exists, and they're not what will make the difference at the end of the day.

Experimentation is just the ‘how’: it's the way you validate hypotheses. In order to understand the ‘what’ for this ‘how’, you actually have to work a lot on the qualitative side of things, so user research and customer interviews. It's sometimes a bit tricky, because it becomes a chicken and egg problem. Do you first start with research, and then run an experiment to validate the findings? Or do you first run the experiment, and if the findings don't make sense, invest in research? There is no right or wrong answer, because ideally you can do both. If you set up your experiments in a way that you collect as much data as possible, every experiment can lead you to new insights and new ideas of what needs to be investigated qualitatively.

For example, say you run an A/B test that changes the checkout process, and you see that overall, great, people are booking, everyone's happy. But you also see a drop in family bookings. So families booked fewer holidays, which is not great, because families definitely need a holiday. It gives you an idea.

Okay, something went wrong for families. Let's talk to family bookers and understand why they're not booking and how we can improve the process for them.
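The family example amounts to breaking an experiment's results down by customer segment. A minimal sketch of that breakdown follows; the segment names, the row layout, and all the numbers are made up for illustration and are not Booking.com data or tooling. It assumes every segment has visitors in both variants.

```python
from collections import defaultdict

# Made-up rows: (variant, segment, visitors, bookings).
SAMPLE_ROWS = [
    ("A", "solo", 1000, 50),
    ("B", "solo", 1000, 70),
    ("A", "family", 500, 40),
    ("B", "family", 500, 30),
]


def conversion_by_segment(rows):
    """Return {segment: {variant: conversion rate}}."""
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for variant, segment, visitors, bookings in rows:
        totals[segment][variant][0] += visitors
        totals[segment][variant][1] += bookings
    return {
        segment: {v: b / n for v, (n, b) in variants.items()}
        for segment, variants in totals.items()
    }


def segments_that_got_worse(rows, control="A", treatment="B"):
    """List segments whose conversion rate dropped in the treatment."""
    rates = conversion_by_segment(rows)
    return sorted(s for s, r in rates.items() if r[treatment] < r[control])
```

In the sample rows the treatment books more overall (100 bookings against 90), yet `segments_that_got_worse(SAMPLE_ROWS)` flags the family segment, which is the cue to go and talk to family bookers. A drop in a single segment can also be noise, so it is a prompt for research rather than a conclusion.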

So experimentation can actually fire up your research. Or it works the other way around: if you're working on something which you have no way to implement as an A/B test right away, you start with research, and then you test it through experimentation. So, in my opinion, data is king, but empathy is still very important. You shouldn't just run A/B tests every day, because then you lose touch with your customer, and you lose the ability to understand what problem you're actually trying to solve.

Another very important point is that you have to try to create a community to keep the culture strong. That applies to both startups and big corporations, because it takes a very long time to set up a culture, and in times of rapid growth and rapid hiring, it's the first thing that can be broken very easily. If you're just a startup, it's a privilege to be able to set up the culture you want, and then you can define the user research and experimentation rules and how you trust data. But as you get more successful and you get bigger, you have more and more people coming in, including people from other big companies who have their own rules of the game and believe that those are the right rules. So creating the culture is hard, and maintaining it is even harder.

At the very beginning, when I joined, it was still possible to do one-on-one training with someone who created the whole culture, or to talk to someone more experienced to ask for advice. But once you start having 50 people joining every month, it's no longer scalable to teach every person individually how experimentation works and what the rules of the game are.

We found two ways to solve it. One of them is to work with your most passionate, most loyal employees and make them ambassadors, so they become the go-to people for these questions. They usually do it voluntarily, because they've been in the company for so long and have cared about the matter for so long that you don't need extra incentives to make them do it. They want to pass the baton, to share what they know, to keep this culture and keep the standards high. But it also doesn't go very far, because you can only have so many experts to cover a 2,000-person organisation.

So another approach we've tested, and which worked, is to create a community that does peer-to-peer reviews of each other's experiments. It becomes a win-win-win for all three parties. We usually pair up someone who is more experienced in the organisation with someone who is new, so people learn from each other how experiments should be run, what is good practice and what is bad practice. And since you review random experiments (you really just press a shortcut and you get a random experiment), joining the community is also an amazing way to stay aware of what's happening in the company outside of your immediate scope. You learn what's happening in other departments. And since you're leaving a review, you also help the experiment's owners understand what they can do better in future. You don't need a massive organisation to start doing it, because once the community gets started, it grows at a similar rate to your company and becomes self-sustaining.
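As a rough illustration of the review process described above, here is a minimal sketch of pairing newcomers with experienced reviewers and handing a reviewer a random experiment. The data layout, the function names, and the rule that reviewers never draw their own experiment are assumptions made for the example; the talk does not describe how the real tool works.

```python
import random


def pair_reviewers(experienced, newcomers, rng=random):
    """Pair each newcomer with a randomly chosen experienced reviewer."""
    return [(rng.choice(experienced), newcomer) for newcomer in newcomers]


def pick_experiment_to_review(experiments, reviewer, rng=random):
    """Pick a random running experiment that the reviewer does not own.

    `experiments` is a list of dicts with hypothetical "id" and "owner" keys.
    Returns None when there is nothing the reviewer is allowed to review.
    """
    candidates = [e for e in experiments if e["owner"] != reviewer]
    if not candidates:
        return None
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which is handy when testing the assignment logic.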

One thing to keep in mind, though, is that when you start this review process, it's very important to explain why it is happening. If people just suddenly start leaving reviews on your experiments, you're like, "Why me? What did I do wrong? Why, out of all the 100 experiments running, was it mine that got picked?" This can create a bit of animosity in the beginning, and confusion about why someone is grading you.

So set up the rules of the game at the very beginning. And then the community will keep the culture strong.

So I think these are the four key ideas I would like you to take away. Once you go back to the office on Monday, well, not Monday, Wednesday, remember that you have to revisit the definition of failure. A negative test is not a failure. A negative test is just one step in what will, hopefully, turn out to be the right direction at some point. And try to get buy-in not just from your team, but also from management, because we all sometimes have people who just want results and don't understand why they're happening. That will create an atmosphere which is more welcoming to experimentation, and failure becomes part of the game.

Balance experimentation with qualitative research. Depending on the problem you're working on, you can either start with research and continue with experimentation, or do it the other way around.

Try to experiment as part of the decision making process. Whatever project plan you're making, or whatever slide you're submitting with the roadmap, experimentation should be one of the aspects. You can never assume that whatever you've built will be right on the first go. And once you start communicating this to upper management and to peers, others will also start thinking this way. So whatever idea you push, always set aside a slot of time to experiment and test it.

And, of course, try to build a community to keep the culture thriving. Even if at first you're just one person in that community, it's a good start, and then it can keep growing further.

So I have some links here, because I know a lot of people are more interested in the technical side of things and how the infrastructure is set up. I'm probably not the right person to tell you about it, because I wasn't the one who built it. But on our blog, blog.booking.com, there are plenty of articles written by my colleagues who explain this in a bit more detail. So if you're more interested in the technical side of things, that's the place you should go. And we're hiring.

Thank you.