WebRTC in the Trenches – A Survival Guide
This is the story of how I built a WebRTC app to help people connect for 1-on-1 video chats during the pandemic.
A tale of product learnings, technical decisions, and lots of conversations!
Hello and welcome to my talk, Building Speakeasy: WebRTC in the Real World. Today I'm going to share with you the story of how I built an application called Speakeasy. It's a web application for meeting people online through video calling. You can think of it like this: if you took a bar or a party and tried to bring it to the online world, that's what Speakeasy is. So my talk today is going to be the story of how I built it, why I built it, and what went right and what went wrong along the way. Hopefully there'll be some lessons in there for you if you're ever interested in building a video app or developing a product on your own. That's what we're going to talk about today. So, let's get started.
So, I'll show you what the application looked like when I finished prototyping it. This is the idea: you can select whether you want to do a video or audio chat, and you get matched up with a partner who the site recommends that you speak with. In this case, we're given a conversation starter question: what do you value more, intelligence or common sense? And then we can discuss that with the person we've been matched with. The site was created at the beginning of the coronavirus lockdowns. Also, in the United States we were having a bunch of protests going on and the police actually created a curfew, so a lot of people couldn't leave the house after 8:00 PM. There was a certain feeling of social isolation that I wanted to help address. I wanted to make a way for people to connect with each other and have conversations even without leaving their homes, and to create a little bit of a sense of community. That's the idea behind the design of the site.
One other aspect you will notice is that there's a countdown timer at the top right that is counting down how much time is left in the conversation. That, along with the conversation starter question, creates a very intense conversation with the person you're connected to, because you just have two minutes and you also have this question that's encouraging you to skip past the small talk. A lot of times when you meet somebody, you'll start the conversation with the basics, like where you work and where you live. In this application you only have two minutes to talk to the person, so if you spend those two minutes reciting the basic information that you tell everybody you meet, the whole conversation will be over by the time you're done. So this encourages you to skip past the small talk, skip past the basics that you've said hundreds of times before, and really get to something interesting. And if you do connect with the person you're speaking with, there's a button that pops up in the last 15 seconds of the call which allows you to extend the conversation, but it only works if both people agree to extend. So that's the idea of the site. Hopefully that makes sense.
Now what I want to share is what went right in the process of building this. The first thing I'll focus on is how I built it. There's this open source library that I created called simple-peer, and it's one of the popular libraries for building WebRTC applications. If you're not familiar with WebRTC, it stands for Web Real-Time Communication, and it's the API in the web browser that allows you to do video and voice calling. So pretty much any video or voice app that you've ever used in a browser, something like Google Meet or Skype, is using WebRTC to make that video or voice calling work.
The thing, though, is that it's actually a really complicated API. There are dozens of methods that you need to understand. There are a lot of acronyms and terms like STUN, TURN, ICE, and signaling, and a whole bunch of concepts that you have to understand in order to get an application like this off the ground. And it turns out all those concepts are there for a reason: it's actually pretty hard to do a video call and get all the little technical pieces in place to make it work. But when you're building an application, you don't really want to be faced with all of those details, so it really helps to pick some kind of library or framework that gives you a simpler way to access that same functionality. In this case, I wrote this library, simple-peer, when I was building WebTorrent, which is that other open source project I mentioned. I had never really used simple-peer for video or voice calling before, but it turns out that because of the great work of some other open source contributors on the project, it was really ready to go for this video and voice calling use case, and I was pleasantly surprised how well it worked. So if you're building any kind of video or voice app, I really recommend you take a look at this library, or maybe even some kind of third-party service for WebRTC calling. It will be a great help to you.
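To make the signaling idea a little more concrete, here is a minimal sketch; it is not code from the talk. A simple-peer peer emits a `signal` event whose data has to reach the other side, which then hands it to `signal()`. The `SignalingPeer` interface and the `connectLocally` helper are hypothetical names that cover only those two members, so the sketch runs without a browser. In a real app the two peers live in different browsers and the data travels through your own server.

```typescript
// Hypothetical minimal shape: only the two simple-peer members this sketch relies on.
interface SignalingPeer {
  on(event: "signal", listener: (data: unknown) => void): void;
  signal(data: unknown): void;
}

// Relay signaling data in both directions so each peer hears what the other emits.
// A real deployment would replace these direct calls with messages sent via a server.
function connectLocally(a: SignalingPeer, b: SignalingPeer): void {
  a.on("signal", (data) => b.signal(data));
  b.on("signal", (data) => a.signal(data));
}
```

With simple-peer itself, one of the two peers is created with `initiator: true` so that it starts the exchange; everything after that is the relay shown above.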
And so, now I want to focus on how I promoted the site on launch day. This right here is our Product Hunt listing page, which is where I posted the site when I first created it. One thing you'll notice is that the site was actually called Virus Cafe when I first created it. Now it's called Speakeasy, which I think is a much better name. The idea behind the original name, Virus Cafe, was, "Well, we all have this virus and we can't really meet up in a real cafe to hang out with people or to meet people, so why don't we do a virtual cafe?" That's where the name came from. You'll see here that when we launched it, we got a lot of votes, a lot of interest, and a lot of people trying the app out. One tip, by the way, if you're ever launching a product on Product Hunt: post it at midnight, or really a couple of minutes after midnight. The way the site does ranking is that it resets the number of upvotes on all the other products down to zero at midnight, so if you submit your app right after midnight, you'll be more fairly in the running with all the other applications on the site and you'll have a better shot at making it into the list of top products for the day. As you can see here, we were number four for the day. Throughout the day we were at number one and number two, fighting for that top position. In the end we got number four, which is a little bit disappointing, but as you can see we still got a ton of interest, had people come through and try it out, and got a lot of really good feedback. So I would say that the launch on Product Hunt was actually quite a success for us.
The other thing we did is launch on Hacker News. For those of you who don't know, this is a really developer-centric place to hang out.
And it's a link and news sharing site, but they also have a feature where you can show products that you've built to the community, and that's what we did there. One of the things about Hacker News that's cool is that there are a lot of critics in the community. You can see here, as I'm introducing the site to people, I was mentioning that the goal was to eliminate small talk and get people to really connect over a good conversation. And then the very first comment here is somebody saying, "Actually, small talk is really great and I can't believe you would want to get rid of small talk." That's just how these things are when you put your product out there into the world. There are always going to be critics, and there are always going to be people who have something to say. That's not a reason not to launch stuff, and it's not a reason not to create stuff. You've just got to take it and roll with it.
Then on Twitter, of course, there was quite a lot of feedback. People were really happy with the app. They thought it was amazing and they wanted to share it with people. You can see here this guy said that it was a much needed replacement for talking to his colleagues in between breaks, so as he transitions to remote work, having a place where you can actually hang out and socialize with people was really valuable. This person was saying that they had really wholesome conversations on the site, which is really great. The other thing that was interesting is that people would spend more time talking with others than they intended to. They would come in thinking, "I'm going to check this out for a couple of minutes," but then they would get really addicted to this two-minute conversation where you get matched to a stranger, and then at the end it disconnects and you get matched to somebody else.
So, it was this really interesting opportunity to just meet all kinds of different people from around the world, and some people spent hours and hours on the app. One user in the first week actually spent 17 hours talking to people on this app, which is pretty wild. This is the kind of feedback I was getting. Here's another one: someone said they made a friend in Paris. So, making friends all over the world. This person said they made two really good friends in different parts of the world within an hour, which is really great. And right here, this person met a bunch of strangers and never imagined they would spend the weekend just chatting with people on the internet, so they had a lot of fun.
And then this story here was actually my favorite story from a user of the app. This person said that they met a German girl on the app and they had a really great time chatting, and right when they were exchanging telephone numbers, the server restarted. That was actually my fault. Anytime I deploy a new version of the site, I built it in a way where it forces all the calls on the site to end for a quick second while the server restarts. So, not the best design, but one thing I did is send a little message to the users on the app that says, "Hey, there. Please wrap up your call because the server is about to restart in 10 seconds." In this case, these people were trying to exchange their telephone numbers really quickly before the call ended and the server restarted, but they ran out of time. So this guy was complaining about it, and then, lo and behold, the German girl that he was trying to talk to comes into the comment section and says, "Hi, it's me. I just sent you an email. Hopefully, we can stay in touch." I thought that was a really great story where they actually met in the end.
So, this is one of the fun things about working on social apps: you're helping humans connect. One thing that you might be thinking at this point is, "Okay, so this is great. All these great interactions are happening, but what about the bad actors? What about the malicious people? Won't there be a moderation problem with a site like this?" That was actually my concern as well when building this. I was really worried that people would come onto the site, misbehave, and create a really bad experience for others. So one of the first things I did when I was building this is I built a moderation dashboard, or control panel. I'll just show you a little screenshot of me testing it out so you can get an idea. On the left is the moderation dashboard, and what I can do is just click on a little message and click Send, and then you'll see over on the right-hand side of the screen, that's the user, and they're going to get this little message from the admin saying, "Please behave appropriately. This app is for friendly people." I could also send custom messages to people to basically teach them how to behave on the site correctly. I had all this built and ready to go because I was really worried about users not being aware of the cultural norms I wanted to create around the site. Fortunately, I never really had to use this too much, and nobody was behaving in a really terrible way, so I was really pleased that was the case.
One really cool thing that came out of building this feature is that I now had the ability to make these little thumbnails of anyone who's talking on the site, and this actually made its way into the public-facing product. So now when you go to Speakeasy, you can see all the people that are at the event and that are talking to each other.
So, you see all the thumbnails of the other calls that are going on, even the ones that you're not part of. What's great about this is that it gives you the feeling that you're at a party or at an event. Even if you're talking to one person, you have this sense that, over your shoulder to the left and to the right, you can see all the other people chatting in your peripheral vision, which is really cool.
Now, I'm going to focus on some things that went wrong in building this thing, the things that could have been better. I apologize to the product people in the audience, but this is going to be a pretty developer-centric part of the talk, because the stuff that went wrong is mostly coding related. It's mostly to do with iOS bugs. There were a whole lot of bugs on iOS in particular. It turns out Safari was one of the last browsers to add WebRTC support. And even though it's been a couple of years now that you've technically been able to do this kind of video calling on Safari, they still have quite a lot of bugs that they've yet to iron out, in particular on iOS. So I'm just going to give you a little taste of the suffering that I had to go through in dealing with these bugs while building the product. Hopefully, if you're building a video or voice app of your own, some of the issues I mention now will save you the weeks of debugging time that I had to spend figuring out these bugs. So, without further ado, let's get into it.
So, this one was really a joy: 10% of the time, audio would be muted for one end of the WebRTC call. As you can imagine, having your video calling app fail randomly 10% of the time is a terrible user experience, and people were really confused why their call wouldn't connect 10% of the time. Fortunately, this was fixed in iOS 13.6, so this is not a problem anymore, but I'll share with you how I managed to work around it before it was actually fixed in iOS. The way I figured out the solution was really strange. I opened the web inspector, selected the video element, and then typed pause and play into the console. It turns out that if you pause the video and then play the video, suddenly the video starts to work. So, completely unexplainable, but if you write this code, you can basically hack around the problem.
So, what this code does is basically say: when the video starts to play, go ahead and pause it, and then after it's been paused, make it play again. That seems to fix this problem of the video not working 10% of the time. So, extremely gross code, but it turns out that it fixes the problem. Cool.
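The slide code is not part of this transcript, so the following is only a sketch of what such a pause-then-play workaround might look like. `VideoLike` and `restartOnFirstPlay` are hypothetical names; `VideoLike` is modeled on the `pause()`, `play()`, and `addEventListener()` members of a browser video element so that the function can be exercised without a DOM.

```typescript
// Hypothetical stand-in for the parts of a <video> element this workaround touches.
interface VideoLike {
  pause(): void;
  play(): Promise<void>;
  addEventListener(type: "playing", listener: () => void, options: { once: boolean }): void;
}

// When playback first starts, pause and immediately play again.
function restartOnFirstPlay(video: VideoLike): void {
  video.addEventListener(
    "playing",
    () => {
      video.pause();
      // play() returns a promise that may reject (for example if playback is blocked);
      // this sketch simply ignores that case.
      video.play().catch(() => undefined);
    },
    // Run only once: the second play() fires "playing" again and would otherwise loop.
    { once: true }
  );
}
```

The `once` option is the one design choice worth noting: without it, the handler would keep re-triggering itself every time playback resumed.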
So, here's iOS bug number two. There's this API called getUserMedia, which allows you to get control of the user's camera and microphone so that you can build these kinds of applications. On iOS, if you call this function twice, it actually breaks the video stream that you got from the first time you called it. This causes a lot of problems if you're trying to build a WebRTC application, and unfortunately it seems like it was an intentional decision from Apple, because it's still an issue in the current version of iOS. There's some code here that I used to try to work around it. It's not that important. Long story short, it's a really annoying iOS issue.
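The workaround shown on the slide is not in the transcript. One plausible approach, sketched here under that caveat, is to make sure the capture call only ever happens once and to hand the same stream to every caller. `memoizeStream` is a hypothetical helper; it is generic so that it can be tried without a browser.

```typescript
// Wrap an "acquire a stream" function so the underlying call happens at most once
// while it is pending or has succeeded. Every caller gets the same promise back.
function memoizeStream<T>(acquire: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = acquire().catch((error: unknown) => {
        pending = undefined; // allow a retry after a failure, such as a denied permission prompt
        throw error;
      });
    }
    return pending;
  };
}

// In a browser this might be used as (assumption, not from the talk):
//   const getLocalStream = memoizeStream(() =>
//     navigator.mediaDevices.getUserMedia({ audio: true, video: true }));
```

Any part of the app that needs the camera would then call `getLocalStream()` instead of calling `getUserMedia` directly.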
Okay, so let's talk about iOS bug number three. With this one, it's really hard to understand how these kinds of bugs made it into iOS. It's really just so surprising. If you take an app and add it to the home screen, so you save the website to your home screen and now you have a little icon there, and then you open up the app from that icon, the camera and the microphone will not work. That obviously completely breaks the application. For this bug, I basically just had to wait until Apple fixed it in iOS 13.5.1, but at least that's fixed now. Okay.
Here's iOS bug number four. The way you trigger this one is that, while you're in a video call, you switch from Safari to another application and then switch back to Safari, and what that does is break your local video stream. The video of yourself that you'd see in the app just turns black and you can't see yourself anymore. This one was fortunately fixed in iOS 13.5, but here's the code I used to work around it in the meantime, before that fix came out from Apple. Again, this is going to be really gross code, but you can see what's going on here. I listen for visibilitychange, which is the event that fires when the application is brought into focus or sent away from focus. When that happens, I pause the video and I play the video, and that happens to fix this bug as well. So it's like the video gets stuck in some way.
And then I just have a couple more bugs I'm going to share. I could go on and on all day about these iOS bugs, because really, I have to emphasize that the amount of time I spent on these iOS bugs was at least 50% of the time building the app. I was able to prototype it in about three weeks, and then I spent more like a month or more just fixing it on iOS. That's why I'm spending so much time on these bugs in this talk: that's where the time was spent when actually building the app.
Okay. Let me explain what this bug is. What happened is that if you ever took your AirPods out during a call, or put them in, the audio would stop for the person on the other end, so you wouldn't be able to hear the person you're chatting with anymore. This bug seems like it's fixed in the latest iOS, iOS 14, but the WebKit bug is still open. So I'm not sure if it's fully fixed, but fortunately it seems to mostly not be a problem anymore.
And here, again, is my hacky fix: if the video freezes, play it. When it pauses, just call play. That seemed to improve the situation a bit.
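Neither of the two slides is reproduced in the transcript, so here is a rough sketch of both workarounds as described: nudging the video when the page becomes visible again, and calling play whenever the element pauses. `PageLike`, `RecoverableVideo`, `resumeOnReturn`, and `resumeOnPause` are hypothetical names, shaped after `document` and a video element so the logic can run outside a browser. The sketch assumes the app never pauses the video on purpose.

```typescript
// Hypothetical stand-ins for the parts of `document` and <video> used below.
interface PageLike {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
}
interface RecoverableVideo {
  pause(): void;
  play(): Promise<void>;
  addEventListener(type: string, listener: () => void): void;
}

// Bug four: when the user comes back to the page, pause and play the local preview.
function resumeOnReturn(page: PageLike, video: RecoverableVideo): void {
  page.addEventListener("visibilitychange", () => {
    if (page.visibilityState === "visible") {
      video.pause();
      video.play().catch(() => undefined); // ignore rejected play() in this sketch
    }
  });
}

// AirPods bug: if the element pauses by itself, start it again.
function resumeOnPause(video: RecoverableVideo): void {
  video.addEventListener("pause", () => {
    video.play().catch(() => undefined);
  });
}
```

If both helpers are installed on the same element, the explicit `pause()` in `resumeOnReturn` will also trigger the pause handler; the extra `play()` call should be harmless.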
Just a couple more. I'm going to throw in just a couple more bugs here. This one is about trying to get an HD 1080p video stream from the camera. What's supposed to happen in WebRTC when you ask for a particular video quality and it's not available, because the camera happens to be lower resolution than you've asked for, is that the browser gives you the closest match. So if I want 1080p but only, I don't know, 480p is available, then it should give me 480p without a problem. But it turns out that on iOS, if you ask for 1080p, you just get an error. The highest you can ask for without it erroring is 720p, which is silly. There's no reason why I shouldn't be able to get the highest quality video when I'm building my application, and of course native apps on iOS don't have this problem, so this is a pretty disappointing bug to run into. The fix was, of course, to have an if statement: if it's iOS, we'll use one resolution, and otherwise we'll use the 1080p resolution. That's the solution.
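That if statement might look something like the sketch below. The names `isIOS`, `preferredVideoSize`, and `VideoSize` are hypothetical, the user-agent test is a simple assumption, and the talk does not say whether the original constraints used `ideal`, `exact`, or bare numbers.

```typescript
// Shape compatible with the `video` part of getUserMedia constraints.
interface VideoSize {
  width: { ideal: number };
  height: { ideal: number };
}

// Naive user-agent check (assumption). Newer iPads can report a Mac-like
// user agent, so a production check would need to be more careful.
function isIOS(userAgent: string): boolean {
  return /iPad|iPhone|iPod/.test(userAgent);
}

// Ask for 720p on iOS, where a 1080p request reportedly failed, and 1080p elsewhere.
function preferredVideoSize(userAgent: string): VideoSize {
  return isIOS(userAgent)
    ? { width: { ideal: 1280 }, height: { ideal: 720 } }
    : { width: { ideal: 1920 }, height: { ideal: 1080 } };
}
```

In a browser, the result would be passed along the lines of `getUserMedia({ audio: true, video: preferredVideoSize(navigator.userAgent) })`.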
And now I'll share the very last iOS bug that I'm going to share with you today, which is to do with web views. This is actually really useful information if you don't know it already. On iOS, it turns out all the third-party browsers, Chrome, Firefox, Brave, are not actually running true Chrome, true Firefox, or true Brave. What the browsers really are is Safari with a skin, a UI layer on top of Safari. That's because Apple doesn't allow third-party browsers into the App Store. They don't allow third-party browser engines, that is. So Chrome, Firefox, and Brave are just a set of UI enhancements, but under the hood you're really running Safari. The implication is that whatever decisions Apple makes about how the browser is going to work apply to all these third-party browsers. The way these browsers are built is that they use a feature called WKWebView, which is a way for an application to embed a Safari instance. And WKWebView doesn't support the WebRTC API. So, long story short, what does this mean? It means that all third-party browsers on iOS can't use WebRTC. If your users come to your site from Chrome or Firefox or Brave, then WebRTC will be absent. Unfortunately, there's nothing you can do for these users other than tell them to switch to Safari, and that's basically what we do here. You can see here, this is what we show the user on Chrome. We say, "Sorry, you have to use Safari," which is very unfortunate.
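A simple way to decide when to show that "please use Safari" message is feature detection rather than browser sniffing. The sketch below assumes that, in an environment without WebRTC, the relevant properties are simply missing. `NavigatorLike`, `WindowLike`, and `canDoVideoCalls` are hypothetical names modeled on `navigator` and `window`.

```typescript
// Hypothetical stand-ins for the parts of `navigator` and `window` we inspect.
interface NavigatorLike {
  mediaDevices?: { getUserMedia?: unknown };
}
interface WindowLike {
  RTCPeerConnection?: unknown;
}

// True only when both camera/microphone capture and peer connections are present.
function canDoVideoCalls(nav: NavigatorLike, win: WindowLike): boolean {
  return (
    typeof nav.mediaDevices?.getUserMedia === "function" &&
    typeof win.RTCPeerConnection === "function"
  );
}
```

In a page, `canDoVideoCalls(navigator, window)` returning `false` would be the cue to show the fallback message instead of the call UI.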
So, anyway, I could go on and on about this, but what I want to focus on now is a couple of final learnings about things that happened that I didn't expect while I was building this. The first thing is the URL of the site, Virus.Cafe. It turns out people thought that this site was actually a virus that was going to infect their computer. I don't know why they think a virus site would declare itself to be a virus in the URL. I thought the URL was kind of cute when I came up with the name, but it turns out that it seemed to be scaring people away from trying the app out. So one of the things I did was rename the site to Speakeasy, as I mentioned before, which I think is a much better name. This is how the site looks and works now. It's called Speakeasy and it's a lot friendlier to potential users.
And so, let's go to the next thing. I noticed this one behavior from a couple of users which was extremely shocking to me. Those of you who have built applications before and observed your users, I'm sure you've seen all kinds of really strange stuff, completely inexplicable user behavior, and this is one of those cases for me. I noticed that there were a couple of users who would use the app and get matched with a random person, but instead of talking to them, they would immediately disconnect and continue to do so until they got matched to each other. They were friends who were using the app and had basically found a way to match with each other, and once they did that, they would just talk for hours and hours and talk to nobody else. I was mystified. Why would they be doing this? So I managed to get matched to them myself while using the app and I asked them, "Why are you using the app this way?" And they said, "Well, we just want to have an audio chat. I just was trying to chat with my friend and your site is the only site that lets me do that without making an account." So it turns out that if you let people chat to each other without making an account, some people will value that, and they will literally force the app to be used in a way that it's not really meant for, just because of this one feature. Very surprising.
The other thing that happened that was kind of surprising is that a Saudi influencer shared the site with all of his Twitter followers, and overnight the feeling, the culture on the site changed. It used to be all Silicon Valley people who found the site on Product Hunt, and then suddenly I'm getting matched with people who may be speaking Arabic. That was a very surprising turn of events. I noticed this behavior that some of them had, which is that they would block the camera with their finger. At first I was like, this seems really bad. Why would a user want to block their camera? They must be up to no good. But after getting matched to several people doing that, I learned that a lot of them were women who just didn't want to show their face to the people they were talking to, maybe because some of the conversation starter questions that the site was presenting are pretty personal, and they felt more comfortable discussing these questions if they weren't showing their face. That was very interesting behavior, and it actually led to me building the audio-only option that you saw earlier. It turns out that by observing your users and talking to them, you can learn a lot.
So, I'm just going to finish up with a couple of final ideas here and then we'll do questions. What did I learn? One of the most important lessons from this whole experience is the importance of having critical mass when you're building an application like this. Because you're doing synchronous video calls, you need to have people online to match to each other. Even if the site had thousands of people coming to it throughout the day, if they all came at different times, let's say one minute apart, you're going to end up in a situation where you can't actually match anyone to each other, because someone comes, there's no one online, and they leave. So if you want an app like this to work, you really have to think about how you're going to get a lot of different people on the site at the same time to create this matching.
The second thing is the importance of shared context. As I mentioned with that example of all the Saudis coming onto my site, if you don't have any context for the people that you're getting matched with, you're just not going to have as good a conversation. Don't get me wrong. You can have a great conversation with a random person and that can be really fun, but there's nothing that can replace having some kind of context with the person you're talking to: knowing why we were matched and what we are both interested in. It makes the conversation flow a lot better, and that's actually why the current version of the site, Speakeasy, is based around events instead of just getting matched to random people.
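The critical-mass point can be illustrated with a toy calculation. The sketch below is not from the talk: `countMatches` is a hypothetical function, and the rule that a visitor waits a fixed number of seconds before leaving is an assumption made purely for illustration.

```typescript
// Count how many pairs get matched when visitors arrive at the given times
// (in seconds) and each one waits at most `patience` seconds for a partner.
// Anyone who arrives while somebody is waiting is matched immediately, so
// at most one visitor is ever waiting at a time.
function countMatches(arrivals: number[], patience: number): number {
  let matches = 0;
  let waitingSince: number | undefined;
  for (const t of arrivals.slice().sort((a, b) => a - b)) {
    if (waitingSince !== undefined && t - waitingSince <= patience) {
      matches += 1;
      waitingSince = undefined;
    } else {
      // Nobody is waiting, or the previous visitor already gave up and left.
      waitingSince = t;
    }
  }
  return matches;
}
```

With a 30-second patience, four visitors arriving a minute apart (`countMatches([0, 60, 120, 180], 30)`) produce no matches at all, while the same four visitors arriving within the same half minute (`countMatches([0, 10, 20, 25], 30)`) produce two. The traffic is identical; only the timing differs.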
So, the way it works is you make an event, and then you as the host invite the people that you want to come to the event, and now there's some context around it. In this example here, we're actually making an event for working out, and the reason I chose this as an example is that it demonstrates the flexibility of the tool. You can create all kinds of different events using the tools in Speakeasy, and in this case we're making a Speakeasy for doing workouts, 60-second workouts with people that you don't know. You can see here, we're going to set dark mode to make it look cooler. We're going to set the length of the chats to 60 seconds so that we do a 60-second workout with the person we're matched with, and then we automatically get matched to somebody else to do a different workout. We're going to disable the ability to extend the chat, so it's going to be forced to be 60 seconds. And lastly, we're going to set the prompt that the two users see when they get connected to be one of the workouts from this list here. So maybe I'll get matched to somebody and it'll say, "Do a plank for 60 seconds with them." Once I've done that, I'll just hit the back button. And you can see, just like that, with a couple of clicks we were able to create a completely new video app experience for working out with strangers in 60-second segments. That's how Speakeasy works.
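As a rough illustration of how little configuration separates the two experiences described in this talk, here is a sketch of what the event settings could look like as data. Every name in it (`EventSettings`, `canExtend`, `promptFor`, and so on) is hypothetical; Speakeasy's real data model is not shown in the talk.

```typescript
// Hypothetical settings for one event.
interface EventSettings {
  darkMode: boolean;
  chatSeconds: number;
  allowExtend: boolean;
  prompts: string[];
}

// The original two-minute random chat, with the starter question mentioned earlier.
const quickChat: EventSettings = {
  darkMode: false,
  chatSeconds: 120,
  allowExtend: true,
  prompts: ["What do you value more, intelligence or common sense?"],
};

// The workout event configured in the demo.
const workoutEvent: EventSettings = {
  darkMode: true,
  chatSeconds: 60,
  allowExtend: false,
  prompts: ["Do a plank for 60 seconds"],
};

// The extend button appears in the last 15 seconds and needs both people to agree.
const EXTEND_WINDOW_SECONDS = 15;

function canExtend(s: EventSettings, secondsLeft: number, aAgrees: boolean, bAgrees: boolean): boolean {
  const inWindow = secondsLeft > 0 && secondsLeft <= EXTEND_WINDOW_SECONDS;
  return s.allowExtend && inWindow && aAgrees && bAgrees;
}

// Cycle through the prompt list; returns an empty string if no prompts are set.
function promptFor(s: EventSettings, chatIndex: number): string {
  return s.prompts[chatIndex % s.prompts.length] ?? "";
}
```

Under these assumptions, `canExtend(quickChat, 10, true, true)` is `true`, while the same call with `workoutEvent` is `false` because extending is disabled for that event.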
So, what I want to emphasize is that building WebRTC apps is extremely fun. WebRTC lets you build stuff with video and voice, and especially in the current environment that we're in with the coronavirus pandemic, I'm so excited to see that one of the silver linings is all this interesting experimentation going on with video and voice meeting formats. There are so many apps now where you can meet in all kinds of different interesting ways, and people feel really empowered to experiment with the format. I feel really confident that at the end of this, we're going to see some new video format emerge that's really interesting and really exciting and that will change the way we do meetings online. And to that point, one of the socialising platforms that UXDX is offering to you as an attendee is Speakeasy: you can actually go and use it, meet with people, and check out the app for yourself. So if that sounds interesting at all, make sure to look for the Speakeasy option, select it, and go forth and meet other UXDX attendees using the platform. Hopefully this talk today will have given you more insight into how and why I built the site.
And hopefully that makes it all the more fun when you're using it to talk to people from UXDX. So, with that, thank you very much for coming to my talk and I hope that you enjoy the rest of UXDX.