Café with Sailesh from Western Digital, on why every component matters for performance

Transcript of the App Performance Café episode with Sailesh Rachabathuni, from Western Digital, originally published on June 4, 2020. More details on the podcast episode page.

Rui Costa:

Hi everyone! My name is Rui and welcome to the App Performance Cafe, a podcast on mobile app performance. Today our guest is Sailesh from Western Digital. We start by discussing why every component matters with respect to performance - from backend optimization to screen rendering, for example - and then we move on to the challenges posed by the instability of wireless networks like 4G, 3G or wifi. Hope you enjoy it - don't forget to follow us on the usual podcast platforms, and visit performancecafe.codavel.com

Episode Start

Rui Costa:

Welcome to the App Performance Cafe. My name is Rui and I'm very excited to share a cafe with Sailesh Rachabathuni from Western Digital. Thank you so much, Sailesh, for joining us today. Can you please tell the audience a little bit about yourself and what you do at Western Digital?

 

Sailesh Rachabathuni:

Hi Rui, glad to be on the App Performance Cafe. My name is Sailesh Rachabathuni. I'm an engineer, an architect, a technical manager, and I've been in the industry for about 20 years, 20 plus years, mostly focusing on consumer electronics products. You know, one of my proud products is one of the early Palm Treo phones. Most people don't remember them anymore. One of the first smartphones - I was part of the team that delivered them. I was also part of the team that delivered the original Kindle, the Amazon e-reader, a long time ago.

 

Rui Costa:

I wasn't aware of that!

Sailesh Rachabathuni:

It was a great project to be on. I mean, it was groundbreaking. Talking about the App Performance Cafe, talking about performance problems: think about an e-ink device that takes like one to three seconds to update the screen. You have to be very careful about when you update and what you update and so on. So yeah, that's where we got our exposure to app performance, on the e-ink screen.

 

Rui Costa:

Wow!

 

Sailesh Rachabathuni:

Yeah, more recently I shipped a webOS TV for LG, and currently I'm working on personal cloud solutions at Western Digital.

 

Rui Costa:

My Cloud, right?

 

Sailesh Rachabathuni:

My Cloud device, yes.

 

Rui Costa:

Thank you so much for that intro. I was not aware at all that you were involved in the Kindle, so yeah, that's fantastic. So Sailesh, I always like to start with this broad question: why should we care about mobile app performance, and what does performance, or app performance, mean to you in this case?

 

Sailesh Rachabathuni:

So, on mobile app performance, I'm going to repeat statistics we're probably all aware of by now, a statistic from Nielsen about user attention span, right? The app has to respond to your command within 0.1 seconds for the user to feel that it is highly responsive, and users typically tolerate about one second for a response. After one second is when the user's attention starts fading, and 10 seconds is the absolute maximum before they switch to something else. And think about both mobile apps and web apps: mobile apps have a little more tolerance, but on a web app, switching to a different task comes at almost no cost. So in those cases, if the app takes like 10 seconds to respond, they're out somewhere, they're checking Twitter, right? Yeah. So we already lost the user. So it's important for app developers to make sure that we're not losing the user, that we're holding their attention as we go through the app's tasks - that's quite important.

 

Rui Costa:

Yeah, so that they stick with the app and don't get lost with, I dunno - my phone right now has, I don't know, 25 notifications. So if something takes a long time to actually load, yeah, that would happen, and I would just jump into something else.

 

Sailesh Rachabathuni:

You are competing with the Instagram personal message. So you have to always keep it more interesting than that.

 

Rui Costa:

Yes, absolutely. Absolutely. So what do you see as the key factors that may affect this responsiveness of the application? 

 

Sailesh Rachabathuni:

Cool. So, yes, I think we kind of narrowed down on responsiveness, because when people talk about app performance, there are various factors of performance, right? People talk about battery drain, CPU usage, and several other things. But I'm trying to focus on the responsiveness of the application. To me that's very important, not only as a developer, but also as a user. It drives me crazy if an app is taking too long to respond to my commands. Responsiveness is a very important factor in creating a delightful application. Now, tuning responsiveness changes based on context, right? How you tune it depends on what kind of app you're working on. For example, how you tune a game is very different from how you tune an application that uses GPS location services, and it's different from how you tune an application that has a cloud backend. And I actually want to dig into cloud-backend kinds of applications in this session.

 

Rui Costa:

Okay, cloud-connected applications, that's where we're focused. Yeah, and I agree, performance can mean so many things, and that's why we usually start by framing it around where we want to take it and the specific use cases we have in mind; otherwise we can get lost about where we want to go with the conversation.

So, what do you see as - so, take a cloud-connected mobile application - what do you see that we should be most concerned about when we look into performance or responsiveness in this specific case? What do you see as the biggest challenges there?

 

Sailesh Rachabathuni:

The biggest challenges, okay. So there are several challenges in making a responsive cloud application. First of all, the trend is that pretty much every application these days has a cloud backend. You can't find applications that work completely locally these days. Any useful application - take your E-Trade, your Twitters, or even Evernote - they all have some kind of cloud backend to them, even though most of the data is local.

So, there are several challenges - let's talk about the challenges that these app developers have to think about, right? Talking to a backend server is not only slow compared to local access, but also, primarily, unpredictable. It depends on your situation how fast the server responds, right? And they also need to be able to manage the volume of data they want to consume. Think about it: on the server, you have volumes of data about the user, about the application, the context, and it's the developer's job to figure out which small portion of that data to retrieve and how to slice it and show it to the user, because all these things factor into how the app performs for the user. Simple example: Gmail. You may have 10,000 messages in Gmail. When they designed the Gmail application for mobile, they had to make a call: how much of it am I going to get? How much processing am I going to do on the server side to make it simpler for the application? And how much of it can I try to fit on the screen and render? All these little decisions have a huge impact on performance.

 

Rui Costa:

So I guess we are touching on everything from server-side processing to content delivery - so the network impairing performance - and also the volume of data, of course: how do you deliver that data into the application, or even onto the screen? And then on the client side, on the app side, how do you render that, how do you provide that to the UI, right? Those are the biggest challenges you've talked about, right?

 

Sailesh Rachabathuni:

Right. But let me break it down. When somebody wants to performance-tune a cloud-connected application, there are primarily three factors they're looking at, right? One is the server response time: how quickly the server responds to the calls you're making. Then they're dealing with network conditions. And they're dealing with the screen update time itself: how complex am I making the screen rendering, and how quickly does it update? Those are the three factors most developers would like to tune to get the best performance possible.

So let's talk about how people approach this and what are the things you can do to improve this kind of performance. The fundamental thing everybody does is, you know, you go performance-tune your backend. You have DevOps dedicated to performance tuning; they look at where the accesses are, memcached on the database side. They do everything they can to performance-tune. Obviously that's the first thing anybody does. And there is testing under load, right? Again, these are not new things for most development teams and DevOps teams; this is their bread and butter.

You performance-test it, test it under load, make sure the performance is up to par. But that can only go so far, right? I mean, you're not going to be able to guarantee a 10 millisecond response on every server call. So while we try to get the best out of the server, that cannot be the only way to tune your application. I'll also point out that these days, with microservice architectures, these calls chain, right? You call one microservice, it calls something else, and it all adds to the latency of the service. Performance-tuning the backend is obviously one of the most basic things to do, and we should get that out of the way.



Rui Costa:

Even if you have the perfect backend performance-wise, you still have the rest, right? You still have the delivery of the content to the device, and then, on the device itself, the processing and rendering of the content.

 

Sailesh Rachabathuni:

Right. So on top of that, one of the things most people do is caching on the client side - caching and prefetching that you know you should get done. One of the things I feel many developers are not looking at is HTTP cache control. Before we go and write our own cache control, we should make sure that we are utilizing everything that HTTP provides us. The Cache-Control headers that let you control the cache need to be taken advantage of. HTTP defines ETags - those need to be taken advantage of too, right? So that's the second thing.
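To make the idea concrete, here's a minimal sketch of what honoring HTTP's built-in cache machinery can look like on the client side. The function names are illustrative, not any particular framework's API, and it only covers the two mechanisms Sailesh mentions: Cache-Control's max-age and ETag revalidation via a conditional request.

```python
import re
import time

def is_fresh(cached_at, cache_control):
    """True if a cached response is still fresh per Cache-Control max-age."""
    cc = cache_control or ""
    m = re.search(r"max-age=(\d+)", cc)
    if m is None or "no-store" in cc:
        return False
    return (time.time() - cached_at) < int(m.group(1))

def revalidation_headers(etag):
    """Headers for a conditional GET; a 304 reply means the cached copy is valid."""
    return {"If-None-Match": etag} if etag else {}

# Cached 10 seconds ago, fresh for 60 seconds:
print(is_fresh(time.time() - 10, "public, max-age=60"))   # True
print(is_fresh(time.time() - 90, "public, max-age=60"))   # False
print(revalidation_headers('"abc123"'))                   # {'If-None-Match': '"abc123"'}
```

The win is that a 304 Not Modified response carries no body, so revalidating a stale entry costs one small round trip instead of a full re-download.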

And if I may bring up one thing I noticed recently - I'm not sure how many people realize it, but one of the best things you can do for your application is to make sure you're using HTTP/2 end to end. Most clients these days automatically support HTTP/2, not all of them, but that would be a very good low-hanging fruit to take care of. Make sure it's HTTP/2, because as you know, Rui, HTTP/1.x is sequential in request and response; HTTP/2 takes that out of the picture and gives you much better performance.
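On the server side, enabling HTTP/2 is often a small configuration change. As a sketch, a hypothetical nginx server block might look like this (syntax for nginx releases before 1.25, where `listen ... http2` was the switch; `example.com` and the certificate paths are placeholders):

```nginx
server {
    # Browsers only negotiate HTTP/2 over TLS, so it rides on the ssl listener
    listen 443 ssl http2;
    server_name example.com;

    ssl_certificate     /etc/ssl/certs/example.com.pem;
    ssl_certificate_key /etc/ssl/private/example.com.key;
}
```

Newer nginx versions split this into a separate `http2 on;` directive, and most other servers and load balancers have an equivalent; the point is simply to verify, end to end, that h2 is actually being negotiated rather than silently falling back to HTTP/1.1.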

 

Rui Costa:

Yes. It enables you to do multiplexing, for example. That has a tremendous impact in terms of performance, definitely.

 

Sailesh Rachabathuni:

Absolutely.

 

Rui Costa:

Yeah. But still, even if you do, say - again, perfect backend - you optimize your client, so you're using HTTP/2 with state-of-the-art features, or even the upcoming - well, it's already there - HTTP/3, based on the QUIC protocol: even if you use all that, you still have the uncertainty from the network perspective, right? So, taken to the extreme, if you're not connected, you're gone, right?

 

Sailesh Rachabathuni:

That's correct.

 

Rui Costa:

We should prepare ourselves for that. So you should be prepared for the unconnected setup, the unconnected experience. But then, I guess that's the - I'm going to say easy, although it's not easy at all - the easiest of all the problems to tackle, because you can clearly reproduce the end-user experience: it's just shutting down internet access on your testing phone, for example.

But when we go to the network perspective, so the instability or the unpredictability of the network experience: what kind of challenges do you see there, and what kind of approaches would you most recommend taking into account? How can you prepare yourself for this unpredictability? How can you optimize for it?

 

Sailesh Rachabathuni:

That's a really good topic, and this is something really dear to my heart. One piece of advice I give to engineers all the time is: don't program assuming the most optimal network conditions. It seems like a platitude, but it's surprising how many applications don't actually follow that.

And as you brought up, the problem could be simple or very difficult. In the most simple case, you know, you check the network, and if you don't have network you throw up a dialog that says "No network, sorry, this can't work" - that's the simple case. But even within that, there are several nuances.

So that's the basic stuff to take care of, obviously: your application needs to look for no connectivity and then deal with that. And also make the assumption that data rates are variable, they can change. Even within the same location, they can change from client to client, from customer to customer - the data rate changes. So applications at a minimum have to take care of those, right? But as I said, that's the easy part of it.

There are several difficult parts in there, which I think many applications overlook. I'll give you a simple example: most applications look at the device's connected state to decide whether we have network or no network, right? But there are cases where the device thinks it's connected, but you still don't have data. I actually experienced the brunt of this problem recently, when my internet started having problems and would go out. From my laptop's point of view, it was still connected to wifi, but there was no data. And I was surprised how many applications just fall on their faces in this situation. A good example would be, say, Evernote. You start Evernote, it has all the data locally, but it just gets stuck on launch because it's still waiting for something to happen. I go turn the wifi off, and everything works fine, because then the laptop says "I have no connection" and everybody's happy.

As you work with many network conditions, you realize that there are several corner cases that applications have to worry about. So that's a good example: don't assume that you have data flowing back and forth just because the devices are connected.
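That distinction - the interface says connected versus data actually flowing - can be captured with a tiny state model. A sketch, where the probe callable and the state names are my own illustration rather than any platform API (a real probe would be something like an HTTP HEAD to your own backend with a short timeout):

```python
from enum import Enum

class NetState(Enum):
    OFFLINE = "offline"            # no interface connected at all
    NO_DATA = "connected_no_data"  # interface up, but nothing flows (the Evernote case)
    ONLINE = "online"              # interface up and data confirmed

def assess_network(interface_connected, probe):
    """Classify connectivity using a real data probe, not just the interface flag.

    `probe` is any callable that attempts a tiny request and returns True
    only if data actually made a round trip.
    """
    if not interface_connected:
        return NetState.OFFLINE
    return NetState.ONLINE if probe() else NetState.NO_DATA

# Wi-Fi reports "connected" but the probe times out:
print(assess_network(True, lambda: False))   # NetState.NO_DATA
print(assess_network(True, lambda: True))    # NetState.ONLINE
print(assess_network(False, lambda: True))   # NetState.OFFLINE
```

An app that distinguishes NO_DATA from ONLINE can fall back to its local data immediately instead of hanging on launch the way Sailesh describes.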

 

Rui Costa:

Yeah. It also kind of relates to last week's episode with Prasanna, where we were talking about the next gray area, which is: okay, you are connected, you do have connectivity in the sense that you do get some data, but one thing is being on wifi in my house with a perfect fiber-optic backend; it's completely different to be on 4G in the city center, or when I go to the countryside, where I have 4G, maybe 3G in most cases, and the connectivity or available throughput is completely different. Not only that, but also, I would say, the stability, right? At home I usually get - I don't know, in my case it's 200 megabits per second, pretty much stable, unless my daughter is watching YouTube or something like that, then it changes a little bit. But when you go outside, when you go mobile, that completely changes, right? It just fluctuates. And so Prasanna was telling us about this concept of network quality as a service, which is something that, I guess, also relates to you and to your concerns.

 

Sailesh Rachabathuni:

It does, right. That's actually a great segue. You talked about network quality assessment, and it is not an easy problem - that's the reason, I think, people are going about figuring it out as a service. Now, I'm not going to go into that, but let's make the assumption, on the app side, that you have some metric of the quality of the network. That can do miracles for your application. Knowingly or unknowingly, at a gross level, we do it today. Many applications give you the option that if you are not on wifi, don't pre-download your thumbnails, don't auto-backup your phone, for example. In the My Cloud application we have that option, and even in Dropbox and many other applications: when you are not on wifi, don't upload the content, for example.

So that goes towards something we refer to as degrading gracefully, right? As the network conditions change, as the quality changes, you degrade the app's behavior in a graceful way so that the user can still have a good experience. Some features may not work, but it's still a decent experience. But that's at a very gross level: are you on wifi, or are you on 3G/4G - then I behave differently. The introduction of network quality assessment gives you much better control over several things you can do. For example, you could choose to disable a set of features when the quality is not at a certain level. And it changes the programming paradigm. Think of it this way: if you have a network assessment from zero to a hundred, you basically associate features with a certain threshold, and if you're not meeting that threshold, the app tells you that this feature won't work because the network is not quite there yet, right? So I think that changes the game a lot. Whether or not you have a numerical metric for network quality, it is always important to think about network quality while you're developing these features. And if there's any way for the app to be programmed so that it degrades gracefully, that is a much better experience than trying to do everything in every network condition.
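The threshold pattern Sailesh describes can be sketched in a few lines. The feature names and cutoffs here are invented for illustration; the pattern is just a table mapping each feature to the minimum quality score it needs.

```python
# Illustrative thresholds on a hypothetical 0-100 network-quality score.
FEATURE_THRESHOLDS = {
    "browse_cached_files": 0,   # works offline, always on
    "thumbnail_prefetch": 40,
    "photo_upload": 60,
    "video_streaming": 80,
}

def enabled_features(quality_score):
    """Return the features whose minimum-quality threshold is currently met."""
    return {name for name, minimum in FEATURE_THRESHOLDS.items()
            if quality_score >= minimum}

print(sorted(enabled_features(50)))
# ['browse_cached_files', 'thumbnail_prefetch']
print(sorted(enabled_features(95)))
# all four features
```

Features then disable themselves (with an explanatory message) rather than hanging, which is exactly the graceful degradation being described.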

Talking about network quality, right - this topic is quite complex, because we just talked about a numerical value for network quality. When you say that, we have the impression, if you think about the graph, that it goes to 60 and stays at 60. More likely, if you actually look at it, it's going to be 20, 60, 80, 90, 20 - it changes quite frequently. And that's a challenge for applications to handle too. So the other thing I'll advise is that data rates are going to be unpredictable, and we need to be able to handle that.
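One common way to keep that kind of spiky quality signal from flapping features on and off is to smooth it before gating on it, for instance with an exponentially weighted moving average. A sketch (the samples are the illustrative trace from above; `alpha` is a tuning knob, not a standard value):

```python
def smooth(samples, alpha=0.3):
    """Exponentially weighted moving average over raw quality samples (0-100)."""
    estimate = None
    smoothed = []
    for s in samples:
        # The first sample seeds the estimate; later samples blend in gradually.
        estimate = s if estimate is None else alpha * s + (1 - alpha) * estimate
        smoothed.append(round(estimate, 1))
    return smoothed

print(smooth([20, 60, 80, 90, 20]))
# [20, 32.0, 46.4, 59.5, 47.6] -- the dips and spikes are damped
```

Gating features on the smoothed value means a single bad sample doesn't instantly disable half the app, at the cost of reacting a little more slowly to genuine changes.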

 

Rui Costa:

Absolutely. As you know, for me this is a very dear problem and something that relates a lot to what we do. That fluctuation, that instability, is something that's more frequent than we're usually aware of. People tend to think about remote areas, old devices, but even with very good 4G connections and very good wifi connections, with the latest devices, you will also see that fluctuation. Which makes you do a wide variety of things, like [inaudible] to the backend, as we were just discussing.

 

Sailesh Rachabathuni:

Yeah, of course.

 

Rui Costa:

Usually the most critical point is actually the delivery piece. So, I'm thinking: okay, we can optimize the backend, we can take into account measuring, as you mentioned, network quality - and we could do an entire episode on what network quality means and how to measure it, because it sounds so simple. But, as you said, we have latency, we have jitter, we have packet loss, we have throughput, and then we have the variance of all of this. It's very difficult to actually sum it up into something.

Yeah, go ahead. 

 

Sailesh Rachabathuni:

You touched on something very interesting that we typically don't pay attention to, which is that network quality is not just data throughput, which is what we always think about; it's also latency. You may be getting 200 megabits per second download speed, but your ping time could be a hundred milliseconds, for example, right? So when apps are trying to measure the quality of the network, they have to keep in mind that latency is also important. If we could look at them separately and then devise strategies to handle them separately, that's even better.

 

Rui Costa:

Yeah, absolutely. For example, when I think about HTTP/3 or QUIC, in the end what they do - you can sum it up, it's more than that, but they optimize the handshake. So, the three-way handshake that TCP usually does: with HTTP/3 there will be some cases where you actually don't need a handshake at all. Think about it: I just want to send a packet to you, and I immediately start transmitting. Even though it's just three packets out of thousands, or hundreds of thousands, of packets, removing those three steps already brings a significant impact in terms of performance, precisely because latency plays a very important role when you look at it from the end user's perspective, the user experience perspective. So, alright, let's say that we have optimized the backend, everything in the cloud is fully perfect, whatever that means. We've also done our best job in terms of preparing the app for unstable networks, for instability, all of that. But we'll still have problems. Even if we do our best job, we will still have performance issues within the application. So, from your perspective - well, this is annoying in the sense that you do your best job, but you still have problems, right? - what should we do about that? How should we tackle that?
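Rui's point about handshakes is easy to see with back-of-the-envelope arithmetic: before the first response byte arrives, you pay roughly one round-trip time per sequential exchange. The round-trip counts below are the textbook cases (TCP plus TLS 1.3 versus QUIC's combined handshake and its 0-RTT resumption); real connections vary.

```python
def time_to_first_response_ms(rtt_ms, round_trips):
    """Rough time before the first response byte: round trips x latency."""
    return rtt_ms * round_trips

rtt = 100  # a plausible mobile round-trip time, in milliseconds

# TCP handshake (1 RTT) + TLS 1.3 handshake (1 RTT) + request/response (1 RTT)
print(time_to_first_response_ms(rtt, 3))  # 300
# QUIC folds transport and TLS into one handshake RTT, then request/response
print(time_to_first_response_ms(rtt, 2))  # 200
# QUIC 0-RTT resumption: the request travels with the very first packets
print(time_to_first_response_ms(rtt, 1))  # 100
```

On a 100 ms mobile link, shaving two round trips cuts time-to-first-byte for a fresh connection from 300 ms to 100 ms, which is why the handshake matters even though it's only a few packets.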

 

Sailesh Rachabathuni:

Right. So this is where, you know, some of the things we've done in the past with our applications come in: you realize that you can never really get rid of all the performance problems. As we discussed, the network condition is inherently unpredictable. Just to give you a little more insight into what we deal with: on the Western Digital side, we work on these My Cloud devices.

So in this case, our client is talking to a device that's sitting at the user's home. It's not even like a central server. There are hundreds of thousands of these little devices in the field, and clients are talking to them. Network conditions are even worse for us, because they depend on various factors within that end-to-end network connection. In these cases ...

 

Rui Costa:

In that case you have two hops that you don't control, right?

 

Sailesh Rachabathuni:

Right.

 

Rui Costa:

Between the device, the storage device that you have at the user's home whether it's wifi or cable ...  

 

Sailesh Rachabathuni:

Exactly. 

 

Rui Costa:

And then the mobile in the end. 

 

Sailesh Rachabathuni:

Precisely. So there are two networks you're dealing with: the network on the client side, the network at the user's home end, and of course the server side. We deal with all these things in the My Cloud system. So we realized pretty early on that you really can't program your way to the best performance you want out of it. You have to find ways to live with it. And this is where we keep going back to not just trying to solve all the performance issues, but also being clever about your UX - that has got to be helpful.

You know, one of the tricks everybody does is adding animations. Well, definitely don't just add a creative spinner - however creative the spinner is, it's going to be just as annoying. One technique is tips: "Hey, did you do this? Did you know about that? Here's a new thing that we introduced" - keep them occupied; that's one thing we do. The other thing I've seen some applications do recently is split their screens into multiple screens, especially on data entry. Let's take a hypothetical example of an application that asks for user information, like name, email, zip code and all that, and assume that each update takes a second.

In that case, instead of asking for all the information in five fields and then taking five seconds to update, split that into five screens where each screen takes one second to update. Thereby you're keeping the user engaged. Of course, you have to balance that: the user might get annoyed that you take them through five screens for this one thing, but if you don't do it, they'll be just as annoyed - "Why are you taking five seconds to make this update?" You have to balance these two things, but it's an interesting idea; I've seen some people do it, splitting one screen into multiple screens just to keep the user engaged. Because, to be honest, it doesn't matter what the task is: as long as there is something happening, as long as they are interacting with the application, users are typically engaged with it.

 

Rui Costa:

It relates a lot to what Nolan shared in the first episode, which is controlling or managing user expectation and frustration. That's actually a very clever example. If you do know that it's going to take time, instead of putting those five actions in one slot, where the user will just be looking at the phone waiting for a thing that takes ages, you can split them up and control that expectation and that frustration from the end user's perspective. Wow, very interesting - I never thought about that specific angle on how we can manage that.

Sailesh, I also always like to tease the guests with this challenging question before we end: from your perspective, what should be the key takeaways you want every performance engineer or developer out there to take home, and what advice would you share with these people if you ran into them on the street?

 

Sailesh Rachabathuni:

You know, nothing beats the user experience you are providing in the field. You can test as much as you want, you can draw a lot of pictures and flow diagrams and then test them at work, but it doesn't matter, right? So the best advice I can give people is that you need to be able to measure the performance the users are getting in the field.

This is where your analytics becomes extremely important. And not just analytics on which screens the users are at, or what flows they're doing, or when they're quitting - it's not only that. You need to be able to measure the time between your transitions, log it somewhere, and then be able to analyze your field data and see where your performance problems are.
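A minimal sketch of that kind of transition timing - the names and the in-memory list are illustrative; a real app would batch these records and ship them to whatever analytics backend it uses:

```python
import time
from contextlib import contextmanager

# Stand-in for an analytics queue; real code would batch and upload these.
TIMINGS = []

@contextmanager
def measure(transition):
    """Record how long a user-visible transition took, for field analytics."""
    start = time.monotonic()
    try:
        yield
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000.0
        TIMINGS.append({"transition": transition, "ms": elapsed_ms})

with measure("inbox -> message_detail"):
    time.sleep(0.05)  # stand-in for the real fetch + render work

record = TIMINGS[0]
print(record["transition"], f"{record['ms']:.0f} ms")
```

Once every transition is wrapped like this, the field data can be sliced by region, network type, or device, which is exactly the analysis Sailesh recommends next.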

And a little more advice on top of that: you have to think about an international audience, unless your application is only released within the Bay Area. There are applications that just work within the Bay Area, but most applications you would want to be international. Global, right? And it's surprising how drastically network conditions change between countries - some of the problems can be local to a country or to a region. So my best advice is: definitely gather information from the field about the performance your application is getting, and then be ready to slice the data by region and by various conditions to see how your app is performing. That's really the only way to get the best out of the app.

 

Rui Costa:

Cool, that's very good advice. I absolutely relate to that. We ourselves have been feeling that pain from the very beginning. I still remember the time when we were optimizing our own thing, and then we went to a McDonald's to test it out, and this McDonald's wifi network was UDP-blocked - or, you know, had management policies - and we said "Oh my God, we totally forgot about this use case!", which is the fact that there are some networks where UDP is blocked. And the most challenging ones for us were actually UDP-capped networks: you do have connectivity, but you're limited to a lower available bandwidth. These are the things where you actually need to be out there, observing what happens in the field, to be able to grasp the dimension of the issues. Number one, how frequent are they, right? And then, what's the impact on the end user experience when this happens? So I absolutely relate to your last comment and suggestion. Sailesh, thank you so much, it was a pleasure to have you on board. Too bad we can't have this conversation over a beer or a coffee in Mountain View, but we will do that next time. Thank you so much, and thank you all for listening, and see you next week.

 

Sailesh Rachabathuni:

Alright. Thank you Rui, for having me. Stay safe. 

 

Rui Costa:

I hope you have enjoyed the conversation with Sailesh. I believe Sailesh's LinkedIn address is in the description of the episode. Please don't forget to follow us on the usual podcast platforms and leave us your review, your feedback. You can also visit performancecafe.codavel.com to get in touch with us.

Thank you so much and see you next week.