HTTP

Café with Yang from Facebook, on HTTP/3 and QUIC

Transcript of the App Performance Café episode with Yang Chi, from Facebook, originally published on July 2, 2020. More details on the podcast episode page.

Rui Costa:

Hi, everyone. Welcome to the App Performance Cafe, a podcast on mobile app performance. My name is Rui and our guest for today is Yang Chi from Facebook. As you will see, Yang and I are both network protocol enthusiasts, but we will start our conversation with the relationship between performance and user engagement, as well as why it is so relevant to get visibility into the performance your end users are getting. Then we'll dive into our shared passion: Yang will tell us a lot about HTTP/3 and QUIC - their potential benefits and the trade-offs we can expect. And finally, Yang will also share with us some interesting facts about what happened during the lockdown stage of the COVID-19 pandemic. I hope you enjoy it. And don't forget to follow us on the usual podcast platforms, as well as visit performancecafe.codavel.com.

 

Episode Start

Hi, everyone. Welcome to the App Performance Cafe. My name is Rui and today I'm delighted to have with me Yang Chi from Facebook. Thank you, Yang, for joining us today. Can you tell us a little bit more about yourself and what you do at Facebook?

 

Yang Chi:

Sure, right. Thank you for having me here. My name is Yang, as you said. I'm a software engineer on the Facebook traffic protocols team. We work on the HTTP stack on both Facebook servers and Facebook mobile clients. And in the past two years, I guess, we have been working on QUIC. We have our open source QUIC implementation, mvfst, which is deployed on Facebook servers, and we have been working on rolling it out in Facebook mobile applications.

 

Rui Costa:

Wow, cool! So today's episode is something that I truly enjoy, because I'm a network engineer - I've been working on protocols all my life. And QUIC is one of the most exciting initiatives out there. I have been following it for -

 

Yang Chi:

It is kinda exciting.

 

Rui Costa:

Yes, I believe so. But before going into that - before diving into QUIC and mvfst and what you guys have been doing - I always like to start by setting the tone for the conversation with a question: why, in your opinion, should we care about mobile app performance? Why is it relevant, and why should we devote ourselves to things like building new protocols, for example?



Yang Chi:

Sure, yeah. I think that's a reasonable way to start this conversation: why people should care and why we do all this work. So from our perspective, there are quite a few reasons.

One is we want to have features available to our own developers and our users before the system can provide those features to us. For example, Facebook rolled out SPDY and HTTP/2 before Android and iOS had such support. We have been working on QUIC, and Android and iOS would probably still need maybe one year to offer such APIs. But by working on our own networking stack, we can have those features before the system can provide them to us. And from there, we provide better performance to the user, and what we see is that when the performance is better, users actually engage with the app a lot more. We have been running a lot of A/B tests, turning different features in the networking stack on and off, and we can clearly see that when the networking is faster, people use the app more: they watch more videos, they see more images from their friends - they even send more friend requests and write more comments.

 

Rui Costa:

Really?!

 

Yang Chi:

Yeah, people are just having a better time when the performance is better. And that's a big reason. Another reason that Facebook did all this mobile networking work is that we also want to increase the visibility, the observability. A big part that's missing from the system APIs is that we don't know the details of what's going on. How much time you actually spend making a connection, how much time you spend sending out and processing request and response headers - all that detailed timing information, we don't have it. And that's quite important. And also, when a request fails, why does it fail? And on what type of network does it fail more? That's information we really want to know, and without our own networking stack, and without working on mobile networking performance issues ourselves, we'd never find out. So the visibility story, that's also very important.
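As an illustration of the kind of detailed per-phase timing Yang is describing, here is a minimal Python sketch. The phase names and structure are illustrative assumptions, not Facebook's actual instrumentation:

```python
# A minimal sketch of per-request phase timing with hypothetical phase
# names - illustrative only, not Facebook's actual instrumentation.
import time
from dataclasses import dataclass, field

@dataclass
class RequestTimings:
    """Monotonic timestamps for the phases of one request."""
    marks: dict = field(default_factory=dict)

    def mark(self, phase: str) -> None:
        self.marks[phase] = time.monotonic()

    def duration_ms(self, start: str, end: str) -> float:
        return (self.marks[end] - self.marks[start]) * 1000.0

t = RequestTimings()
t.mark("start")
# ... resolve DNS here ...
t.mark("dns_done")
# ... open the connection / run the handshake here ...
t.mark("connected")
# ... send the request, wait for the first response byte ...
t.mark("first_byte")

print(f"connect: {t.duration_ms('dns_done', 'connected'):.2f} ms")
print(f"time to first byte: {t.duration_ms('start', 'first_byte'):.2f} ms")
```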

 

Rui Costa:

So, measuring every millisecond that we spend, to get visibility into network performance. And as you mentioned - and I totally relate to that - probably the most challenging piece is when you have problems: either requests fail or, the most challenging of all, things are slow. Why is that happening? And with the diversity of devices, networks, locations - you name it - it's very, very tricky to do that. So I absolutely relate to that.

 

Yang Chi:

Sometimes the numbers really come back as a surprise. One thing that surprised me, at least, is I never realized that we send so many requests when there's no network. There's no network, and users do try to use the app and try to send a lot of requests. That actually happens a lot. And without those numbers, without this visibility, we'd never know.

 

Rui Costa:

Yeah. That's actually a very interesting use case, because I always think - and Sailesh in a past episode mentioned precisely this - that one of the use cases you should have in mind is to prepare your app for no connectivity. If you are not able, as you said, to actually track that, you have no way to know what's happening. I absolutely relate to that experience.

 

But taking two steps back - I want to go into QUIC. We as a community have been working on QUIC, or HTTP/3, this revisiting of standard protocols - I was going to say application protocols, but in its essence it's actually a transport protocol. So why do we need to go in this direction, from your perspective?

 

Yang Chi:

I see. I think people tried to improve TCP. There was a lot of very interesting work on it. The problem is that anytime you try to change TCP, there are middleboxes in the network that assume TCP should work in certain ways, and any improvement would actually break. So I think the community was in this weird position where you have to use TCP without any improvement. You're using a protocol that was designed - I don't know - decades ago, and you can't iterate and optimize on top of that. That puts you in a very tough spot to make performance better. I think that's why QUIC came in. And, at least the way I see it, the people in the QUIC working group have been designing this protocol so that there is no visibility for the middleboxes, which means they can still change the protocol once a spec version is out there.

 

Rui Costa:

Yes, absolutely! And then there's the time it takes to actually see things rolling out... From my perspective, all of that also leads to the fact that it's very hard to optimize, because it's very hard to measure and very hard to test. So you need more flexibility there. And as you mentioned, TCP - although very successful, obviously, otherwise I guess we wouldn't be here having this discussion - was designed with a mindset of wired connectivity, or, I was going to say, more stable connectivity. Things have evolved, and I absolutely agree - I've been dedicating myself to this for quite a while now, actually - that we need to revisit that. Networks are not stable; wireless networks in particular pose challenges that TCP, by design I would say, will find very difficult to tackle, or to optimize with something built on top of TCP.

And I guess that's why we moved to something based on top of UDP: building, in essence, the same core features - reliable delivery, ordered delivery and, of course, congestion control - but on top of UDP, so that we can get that observability, easy deployment and fast testing of different optimizations. So I'm absolutely excited to see what's been going on. I was going to ask you: as it stands today, what would you say are the key improvements that QUIC brought us, first from a conceptual perspective - protocol-wise, how far are we in terms of moving from TCP to QUIC, let's say? And then, what are the outcomes of that, from your perspective?

 

Yang Chi:

I would say a lot of people's go-to features, the ones they want from QUIC, are 0-RTT connection establishment and also connection migration. Those two features are very appealing to a lot of people. They're very appealing to us as well, and we have been working very hard on those two features. But one thing that's quite surprising, from the numbers we see, from our experiment of, you know, rolling out QUIC at Facebook, is that even without those two very important features we already see a lot of improvement in performance.

It turns out that once you have everything in user space, you have control of pretty much everything, right? And we wrote our library in a way that packetization happens very, very late. Imagine you have multiple streams trying to write into the same connection in HTTP/2: you're in user space, you don't see anything happening in the kernel. You just pour your data from two streams into the same connection, and the connection doesn't really know those are two different HTTP/2 streams; it's going to do the packetization based on the order in which you give it the data. But because we moved the transport into user space, we make the packetization decision very, very late - right before we write the packets. And we can interleave different streams, because we also understand the streams; we know that concept in the transport layer. It turns out that's also very important for mobile performance, especially for video, where one video request is not just one stream: you have the metadata, you have video, you have audio - that's at least three streams - and you need to deliver them at the same time so the player can play. Just by doing the packetization better, and interleaving different streams better in the transport layer, we can already see a big improvement.
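To make the late-packetization idea concrete, here is a toy Python sketch: frames from several streams are chosen only at the moment each packet is built, so metadata, video and audio can be interleaved. The packet size, per-stream share and round-robin policy are assumptions for illustration - this is not mvfst's actual scheduler:

```python
# Sketch of "late packetization": frames from multiple streams are chosen
# only when a packet is built, so streams can be interleaved at send time.
# Constants and policy are illustrative, not mvfst's actual scheduler.
from collections import deque

MAX_PACKET_PAYLOAD = 1200  # conservative QUIC-style packet payload size

class Stream:
    def __init__(self, stream_id: int, data: bytes):
        self.stream_id = stream_id
        self.buffer = bytearray(data)

    def next_frame(self, max_len: int) -> bytes:
        chunk = bytes(self.buffer[:max_len])
        self.buffer = self.buffer[max_len:]
        return chunk

def build_packet(streams: deque) -> list:
    """Fill one packet round-robin across streams, deciding at send time."""
    frames, budget = [], MAX_PACKET_PAYLOAD
    while budget > 0 and streams:
        stream = streams.popleft()
        frame = stream.next_frame(min(budget, 300))  # cap per-stream share
        frames.append((stream.stream_id, frame))
        budget -= len(frame)
        if stream.buffer:            # stream still has data: requeue it
            streams.append(stream)
    return frames

streams = deque([Stream(0, b"M" * 500),    # metadata
                 Stream(4, b"V" * 5000),   # video
                 Stream(8, b"A" * 1000)])  # audio
while streams:
    packet = build_packet(streams)
    print([(sid, len(f)) for sid, f in packet])
```

The key point is that the stream-to-packet decision is deferred until the moment of sending, which a kernel TCP socket, fed a single ordered byte stream, cannot offer.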

 

Rui Costa:

Wow, interesting! Multiplexing, in some essence, I would say.

 

Yang Chi:

And also, of course, there's no head-of-line blocking in the same way as in HTTP/2.



Rui Costa:

For everyone listening to us: we just mentioned three key features. 0-RTT, which is basically the fact that when the user is connecting to a known server there are no handshakes, so data starts coming in immediately. Then we mentioned connection migration, which is basically this: say a user is on 4G, then he gets home and connects to WiFi. Whereas with standard HTTP/2 over TCP you would need to establish a new connection - meaning more handshakes - and the connection would stall from the application's perspective, with QUIC and this connection migration feature it's the same connection from the application's perspective, so nothing stalls. And then we were just discussing multiplexing, which, as you mentioned, completely kills head-of-line blocking, and that has a tremendous impact, definitely. I don't want to go into very big detail about QUIC, but - as you mentioned, the ability to build this networking logic at the application layer brings tremendous opportunities, but also challenges, right? I'm thinking, for example, of processing power and processing efficiency; that's one of the challenges I have in mind. What do you see happening there? Are we getting closer to something like the optimal TCP usage at the kernel level, or on a path to get there?
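A toy sketch of why migration works, following the standard QUIC design: TCP demultiplexes connections by the address 4-tuple, while QUIC uses a connection ID carried in each packet, so the same connection state survives an address change. The table layout and values below are illustrative only:

```python
# Why migration works: TCP looks connections up by the address 4-tuple,
# QUIC by a connection ID carried in each packet. Toy illustration only.

tcp_table = {}   # (src_ip, src_port, dst_ip, dst_port) -> state
quic_table = {}  # connection_id -> state

def tcp_lookup(src_ip, src_port, dst_ip, dst_port):
    # A new source address (4G -> WiFi) means a brand-new key: no match,
    # so the old connection is useless and must be re-established.
    return tcp_table.get((src_ip, src_port, dst_ip, dst_port))

def quic_lookup(connection_id, src_addr):
    # The key ignores the peer's address entirely, so the same connection
    # state is found even after the client's IP changes.
    return quic_table.get(connection_id)

tcp_table[("10.0.0.5", 50000, "157.240.1.1", 443)] = "tcp-conn-state"
quic_table[b"\x1a\x2b\x3c\x4d"] = "quic-conn-state"

print(tcp_lookup("192.168.1.7", 50001, "157.240.1.1", 443))     # None: stalled
print(quic_lookup(b"\x1a\x2b\x3c\x4d", ("192.168.1.7", 50001))) # state found
```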

 

Yang Chi:

I think there's still a gap. That's definitely the case. But how large that gap is, I think, depends on the use case. What we see is that if you're talking about internet traffic, then the gap is not that big. But if you're talking about data-center-to-data-center or internal server traffic, I think that's where QUIC still has a lot of work to do to catch up with TCP performance.

 

Rui Costa:

Yeah, it makes sense, right? The volume of data is tremendously bigger, so every small, less-optimized piece will cascade into something bigger. That absolutely makes sense. And where else do we see this ability - we already mentioned three features that come from building this at the application level. What else is out there? What else do you think is ahead in terms of what we can do, given that we now have the ability to manipulate - and manipulate in a good way, I would say - the traffic that goes out of and into the application, and how the application connects with the internet or with the cloud servers?

 

Yang Chi:

So, with the flexibility in QUIC, I think there are lots of things you can do. For example, say you just want to try a different loss recovery algorithm. If you want to do that in TCP, it means you go into the kernel code, you make a code change there, you have to build your kernel, you have to deploy your kernel - which, for a large internet company, means a lot of time. But for a user space transport, that's very easy to write, very easy to test, very easy to canary, and you're going to see results in just a few days.

Another area is congestion control. Facebook was able to experiment with a congestion control algorithm called COPA for our live stream traffic, which one of my colleagues gave a presentation about a few months ago. We were able to do that because we have a user space transport. There's no kernel support for COPA today, but with QUIC we were able to do it.
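As a hedged illustration of what "swappable congestion control in user space" can look like, here is a minimal Python interface with two caricature algorithms - one loss-based, one delay-based in the spirit of COPA. The interface, constants and update rules are assumptions for the sketch, not mvfst's API or the real COPA algorithm:

```python
# Sketch of why user-space transport makes congestion control swappable:
# the sender talks to a small interface, so shipping a new algorithm is
# an app update, not a kernel change. Interface is illustrative only.
from abc import ABC, abstractmethod

class CongestionController(ABC):
    @abstractmethod
    def on_ack(self, acked_bytes: int, rtt_ms: float) -> None: ...
    @abstractmethod
    def on_loss(self, lost_bytes: int) -> None: ...
    @abstractmethod
    def cwnd(self) -> int: ...

class LossBased(CongestionController):
    """Loss-based: grow until loss, then back off (rough caricature)."""
    def __init__(self):
        self._cwnd = 10 * 1200
    def on_ack(self, acked_bytes, rtt_ms):
        self._cwnd += acked_bytes // 10
    def on_loss(self, lost_bytes):
        self._cwnd = max(2 * 1200, self._cwnd // 2)
    def cwnd(self):
        return self._cwnd

class DelayBased(CongestionController):
    """Delay-based, in the spirit of COPA: shrink when RTT inflates."""
    def __init__(self, base_rtt_ms=50.0):
        self._cwnd, self.base_rtt = 10 * 1200, base_rtt_ms
    def on_ack(self, acked_bytes, rtt_ms):
        if rtt_ms < self.base_rtt * 1.2:
            self._cwnd += acked_bytes // 8
        else:
            self._cwnd = max(2 * 1200, self._cwnd - acked_bytes // 8)
    def on_loss(self, lost_bytes):
        pass  # the delay signal dominates in this caricature
    def cwnd(self):
        return self._cwnd

# Swapping algorithms is a one-line change (or an A/B test flag):
cc: CongestionController = DelayBased()
cc.on_ack(12000, rtt_ms=48.0)
print(cc.cwnd())
```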

 

Rui Costa:

Yeah. Just update the app, update the server and that's it. 

 

Yang Chi:

Yeah.

 

Rui Costa:

And in terms of observability, as you mentioned - in the episode with Prassana, he mentioned this concept of personalized performance. I guess that was the term he was using.

So basically something that you have within the app that can measure everything that's going on in the application, within the device, so that you can make better decisions. Do you think - I think there is, but - given this major opportunity from the fact that you're building the entire transfer of data at the application layer, is there an opportunity to go further in terms of observability within the app, so that the app can adapt? Do you think there is?

 

Yang Chi:

I think there absolutely is, but it's also weird and a little tricky. I'll probably discuss that in two parts. One is that now we have this transport here, right? We can certainly observe lots of things we couldn't in the past. That's cool, and I think that's absolutely where we should be going. In fact, there is this extension spec that comes from a researcher in Europe - I think his name is Robin Marx. He proposed this logging schema, qlog, that's very rich. We have been using that on our server side. The trickiest thing about doing that in a mobile application is that it's a lot of data to log, and that's a lot of CPU power that you have to burn on a mobile device. So that's where it gets a little tricky. And I think one example of not doing this right that I have seen in the past: I saw someone build a very, very extensive tracing framework in a mobile application that would, unfortunately, stall the networking thread for about 10 milliseconds every time a response came back, in my profile.

It gives you very good visibility into what's going on in the networking thread and all the other in-flight requests. But ten milliseconds means that if your company built a server in New York City to serve users in New York City, those 10 milliseconds just moved the server all the way to Denver. Which is not cool. I think it's going to be very hard to balance this, but we should definitely try. And the second part of this discussion, about this personalized mobile experience: that's also very hard to achieve. I think it's a great, great inspiration, but we still have a long way to go. At least the way I see our networking stack, I would love for our stack to be able to, you know, just run on a user's device for a little bit of time and then figure out: this is the most optimal configuration I should have for this user. But I believe we still have a long way to go there.
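One plausible way to get rich qlog-style events without stalling the networking thread is to sample events and hand them to a background thread, as in this sketch. The sampling rate, event shape and queue size are illustrative assumptions, not qlog's actual schema or anyone's production design:

```python
# Sketch of the trade-off Yang describes: emit rich trace events without
# stalling the networking thread, by sampling and handing events to a
# background thread. Event shape and sampling rate are illustrative.
import queue
import random
import threading

SAMPLE_RATE = 0.01  # trace ~1% of events; 100% would burn CPU on-device
events = queue.Queue(maxsize=1000)

def log_event(name: str, **fields) -> None:
    """Called on the networking thread: must be cheap and never block."""
    if random.random() > SAMPLE_RATE:
        return
    try:
        events.put_nowait({"event": name, **fields})
    except queue.Full:
        pass  # dropping a trace event beats stalling a request

def drain() -> None:
    """Background thread: serialize/upload events off the hot path."""
    while True:
        print("trace:", events.get())

threading.Thread(target=drain, daemon=True).start()
for i in range(1000):
    log_event("packet_received", packet_number=i, size=1200)
```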

 

Rui Costa:

Yeah, definitely. I absolutely agree. But, thinking as a network engineer: all the information about the network now happens at the application layer. So you can get per-packet delay, you can get jitter measured in a much more granular way. You can do a lot of things due to the fact that you have the data there. But I agree, we don't want to stall the application for 10 milliseconds every time we want to do that. And that's actually a very good point, because more information is not always better for making these decisions, in the sense that the fact that you are computing it will impair what you are actually measuring. It's the Heisenberg principle, right? You are influencing what you're actually observing. So we need to keep this very sensitive trade-off between how much we're collecting and analyzing versus how much we're processing on top of that and giving that information to the application, so that the application can adapt.

Before we jump to the end of the episode: you were sharing something about what you have seen throughout this COVID-19 - I was going to say crisis, I don't want to take it to a negative side - but you have seen different things in terms of network behavior, right?

 

Yang Chi:

Yes. I think in the past two months - not just Facebook, quite a few other internet companies have also seen this - as this pandemic spread around the world, we saw more and more countries having shelter-in-place or lockdown policies going on, and people started to work from home. They spend more time on the internet. You see internet traffic go up, and behind the scenes, of course, we see a heavier load on our server side, and a lot of engineers on our side have been working very hard just to keep the servers running. On the other side, in the network that's closer to the user, we also see more congestion going on.

One interesting result we see from our QUIC experiment - because we have been running this A/B test between QUIC and TCP on Facebook users. What we saw was that around late February, I think, we had already pretty much figured out the optimal configuration to use in our experiment, and we already saw very positive results coming from it: people watch more video, people use the app a lot more when they are in the QUIC test group. And then over the next two months, as everybody went home and used the internet more, and the network got more congested, the results just automatically got better by themselves, without us touching the code. I think that's a very good indication that we are able to handle network congestion a lot better.

 

Rui Costa:

Wow. So, basically, let me see if I understood. You had this A/B test running: HTTP/2 over standard TCP versus your QUIC, mvfst, experiment.

 

Yang Chi:

Right. 

 

Rui Costa:

And then, as soon as you started seeing the volume of data increasing, congestion started increasing on the last mile - so over the end users' last-mile network. And from there, you saw that the gap in user experience, comparing users on TCP versus users on QUIC, increased - is that correct?

 

Yang Chi:

Yes.

 

Rui Costa:

Wow. That's impressive. So, trying to figure it out: is it because TCP got significantly slower, or because QUIC got faster? That's probably hard to tell, right? Or QUIC was able to -

 

Yang Chi:

I don't think that's the case, unless someone running the network has been treating TCP and UDP very differently during this pandemic, globally, in the past two months - which I don't believe is the case. That would be crazy. I think this probably goes back to what we just discussed about head-of-line blocking and how we can multiplex a lot better than TCP. I think that's the main driver of why our engagement numbers just became much better in the past two months.

 

Rui Costa:

Wow, that's superb - in the sense that - would you say that this is a very impressive result? 

 

Yang Chi:

I was going to say "surprising".

 

Rui Costa:

Well, yes - I agree. I agree, it's surprising. But it's kind of what the community has been working towards. I was going to say it's what it's supposed to be, but from building the protocol to actually seeing the outcome is a long way.

 

Yang Chi:

I understood the theoretical benefit on paper. We never expected that there would be this worldwide event for us to see it - and I really hope we don't see it again - but, you know, it happened, and we were able to witness this type of result. That's quite amazing.

 

Rui Costa:

And it goes to show that working on these networking protocols has an impact on people's lives - I guess that's also something you should be proud of in that regard. I was going to complete that sentence with something like: we already see significant positive outcomes with just a small step into what we can do with QUIC - would you agree?

 

Yang Chi:

Yes.

 

Rui Costa:

What are the next big things that you expect to see happening around QUIC? Anything in particular, or is it about continuing to optimize what has been done so far?

 

Yang Chi:

Yeah, I guess this question can touch both where, let's say, the spec is going and also our work at Facebook. Unfortunately, I don't think I'm the right person to talk about where the spec is going and where all of this is going to go. But I do have my wishlist of what I want to see happen in this area.

So, one thing that's quite important is path MTU detection in QUIC. Unfortunately, in our deployment so far we are only able to use the minimum allowed packet size in QUIC, and we use this value globally, and I'm pretty sure there are networks where we could use a slightly larger value. But getting the value right - getting the packet size correct - turns out to be very hard. We ran an experiment to just increase the packet size by 20 bytes, and then suddenly quite a few users in one country were not able to connect to the Facebook server over QUIC. If we can have path MTU detection in QUIC, that would be super cool.
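For intuition, here is a simulated sketch of path MTU probing: start from QUIC's minimum datagram size and binary-search upward, treating a lost probe as "too big". The probe function just simulates a path with a fixed MTU; a real implementation would send padded probe packets and wait for acknowledgments:

```python
# Sketch of path MTU probing: start at the minimum QUIC datagram size and
# binary-search upward; a lost probe means "too big". The probe function
# is a simulated stand-in for sending a padded packet and awaiting an ack.
QUIC_MIN = 1200

def probe(size: int, path_mtu: int = 1380) -> bool:
    """Simulated probe: succeeds only if the packet fits the path MTU."""
    return size <= path_mtu

def discover_mtu(low: int = QUIC_MIN, high: int = 1500) -> int:
    good = low
    while low <= high:
        mid = (low + high) // 2
        if probe(mid):
            good, low = mid, mid + 1   # probe acked: try bigger
        else:
            high = mid - 1             # probe lost: try smaller
    return good

print(discover_mtu())  # 1380 for this simulated path
```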

Another thing I want to see in this area is partial reliability. QUIC today is a fully-reliable transport. It doesn't have to be. And there are applications that we have today that can certainly benefit from a partially reliable transport.

 

Rui Costa:

Which applications are you thinking about? When you say that, I always think about live video streaming. Is that one?

 

Yang Chi:

Definitely. That's a very good example. Live video, both upload and playback.

 

Rui Costa:

Interesting, because in my PhD thesis -

 

Yang Chi:

Right, you worked on network coding. 

 

Rui Costa:

Yes. The final piece of the puzzle was precisely to work on this notion of partial reliability, in the sense that you have X slots, or time slots, to recover from a given loss; otherwise you just forget it and keep going - precisely because of this use case. So that's why I'm excited about QUIC and all that's going on with protocols: because I do see a lot of potential benefit for a wide variety of applications. We just mentioned live streaming, but I actually believe this applies to tons and tons of things. For example, the specific use case we are in right now, a conference call - I believe the impact will be even bigger, because this is quite delay-sensitive, so it is very important to do that.
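A minimal sketch of the deadline idea Rui describes: each chunk carries a recovery budget, and once that budget expires the sender stops retransmitting it and moves on. The buffer design and lifetime value are illustrative assumptions, not a QUIC extension API:

```python
# Sketch of partial reliability via deadlines: an unacknowledged chunk
# past its deadline is dropped instead of retransmitted - fine for live
# video, fatal for a file transfer. Design and constants are illustrative.
import time

class ExpiringSendBuffer:
    def __init__(self, lifetime_s: float):
        self.lifetime = lifetime_s
        self.unacked = {}  # seq -> (payload, enqueue_time)

    def enqueue(self, seq: int, payload: bytes) -> None:
        self.unacked[seq] = (payload, time.monotonic())

    def on_ack(self, seq: int) -> None:
        self.unacked.pop(seq, None)

    def retransmittable(self) -> list:
        """Only chunks still inside their deadline are worth resending."""
        now = time.monotonic()
        expired = [s for s, (_, t) in self.unacked.items()
                   if now - t > self.lifetime]
        for s in expired:          # too late to matter: drop, don't resend
            del self.unacked[s]
        return sorted(self.unacked)

buf = ExpiringSendBuffer(lifetime_s=0.2)  # rough one-frame-interval budget
buf.enqueue(1, b"frame-1")
buf.enqueue(2, b"frame-2")
buf.on_ack(1)
time.sleep(0.25)
print(buf.retransmittable())  # []: frame-2 expired, skip it and keep going
```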

Wow. I believe we could be talking about QUIC all day long.

 

Yang Chi:

It has the potential to be a long conversation. 

 

Rui Costa:

Yes, it does. I'm thinking about - so, you mentioned it already - the part that I worked on is loss recovery, let's say, so coding for loss recovery. That's one. You have congestion control. You have ARQ-based versus coding-based. Wow, you have tons and tons of things. For example, one thing that always comes to mind when thinking about this stuff: there are some networks where, for some reason, the network caps or throttles down UDP connections. So how can you handle that? How can you actually detect that UDP is being throttled, so that you can switch back to TCP in that specific case? I don't think that's the rule - far from it - but do you see these cases? Well, there are tons and tons of things, but do you know about this specific case where UDP is capped or throttled?
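One mechanical way to handle the case Rui raises is to race a QUIC attempt against a TCP attempt and use whichever completes first, happy-eyeballs style - though, as Yang notes next, falling back is not the outcome he hopes for. The attempt functions below are simulated stand-ins, not a real stack:

```python
# Sketch of one way to handle networks that degrade UDP: race a QUIC and
# a TCP attempt and use whichever answers first (happy-eyeballs style).
# The attempt functions are simulated stand-ins, not a real stack.
import concurrent.futures as futures
import time

def quic_attempt() -> str:
    time.sleep(0.5)   # pretend UDP is being throttled on this network
    return "quic"

def tcp_attempt() -> str:
    time.sleep(0.05)
    return "tcp"

with futures.ThreadPoolExecutor(max_workers=2) as pool:
    attempts = [pool.submit(quic_attempt), pool.submit(tcp_attempt)]
    winner = next(futures.as_completed(attempts)).result()

print(f"using {winner} for this session")  # tcp wins when UDP is degraded
```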

 

Yang Chi:

Well, I know at least one major carrier in the United States that's capping both TCP and UDP. As for capping only UDP, I'm not aware of a carrier doing that today, but it wouldn't surprise me if there are such networks, and we have some ongoing work to detect traffic policers or traffic shapers in real time. Where I hope that conversation goes is not that we fall back to TCP; I'm hoping we can have a conversation with the people who are running the network, understand why they do it, and maybe push them forward into this new world where QUIC would be the majority.

 

Rui Costa:

I absolutely support you in that expectation. Wow, Yang - fantastic. So, before we wrap up, let me tease you with the final question.

 

Yang Chi:

Sure.

 

Rui Costa:

Let's pose it in a different tone than we did before: say you're at Starbucks and you meet this young engineer who is jumping into a performance engineering career. You have 30 seconds before you leave. What would you say to them? What would be the key takeaways from your experience that you would share?

 

Yang Chi:

I would say that performance work is beyond performance. What you're chasing here is your user's experience of using your application. And that's something you should definitely care about - otherwise you shouldn't ever put your product or application in front of the user. If you care about their experience of using your product, then you go chase your performance problems. Because if you solve them, they will have a better time.

 

Rui Costa:

Wow, that's very good advice, I would say. Yang, thank you so much. It was a true pleasure. It was very, very enlightening. I feel we could go on and on and do three or four episodes on HTTP/3 and QUIC and all of that. I guess in the future I will have the opportunity to come back to what's been going on with HTTP/3 and QUIC, and mvfst and the other libraries you have out there. Thank you so much, Yang. And thank you all for listening. See you next week!

Hope you have enjoyed the conversation. I will leave Yang's Twitter in the description of the episode. Don't forget to follow us on the usual podcast platforms, like Apple Podcasts, Google Podcasts or Spotify, as well as visit performancecafe.codavel.com.

See you next week.