Café with Manuel from Farfetch, on performance budgets and metrics

Written by Rui Costa | July 03, 2020
 

Transcript of the App Performance Café episode with Manuel Garcia, from Farfetch, originally published on June 4, 2020. More details on the podcast episode page.

Rui Costa:

Hi, everyone. Welcome to the App Performance Cafe, a podcast on mobile app performance. My name is Rui and our guest for today is Manuel Garcia from Farfetch. Manuel is an expert on performance and so in our conversation, as you will see, we went through performance metrics and how complex and how customized these performance metrics should be for your specific use case. And then Manuel took us to the very interesting concept of performance budgets, which is a key tool to prevent performance regressions. And then in the end Manuel just shared why he believes that performance should be seen as a feature in the product, and why performance must be a shared responsibility across the entire company.

I hope you like it. It was a very, very interesting conversation we had. Don't forget to follow us on the usual podcast platforms as well as visit performancecafe.codavel.com.

Episode Start

Hi, everyone. Welcome to the App Performance Cafe. I'm Rui and I'm very excited to have here with me today Manuel Garcia from Farfetch. Manuel, thank you so much for accepting the invitation. Can you please tell us a little bit more about yourself?

 

Manuel Garcia:

Hi, Rui. Thanks a lot for this opportunity. It's a pleasure to be here talking about a topic that I really enjoy - performance. So my name is Manuel Garcia, I work as a principal engineer at Farfetch.

Farfetch is an eCommerce platform for luxury fashion. And I work as a principal engineer, which is to say that I'm involved with top technical concerns - performance being one of those - resilience, scalability, monitoring, architecture. So basically, I try to look into the processes that we have, trying to improve them, trying to find new opportunities, setting the vision, sometimes working with other teams, and also being a mentor, which is very important for a principal engineer. So basically a lot of stuff, a big cocktail of skills that I need to have as an individual contributor.

 

Rui Costa:

Yeah. I was imagining that because from the ... so the reason why I invited Manuel was I saw, although we are neighbors, we never actually met, but I saw his work on the Farfetch blog and Manuel has at least three - if I'm not mistaken - very powerful posts about performance, and it's clearly the case that you take a very broad perspective, then you go deep on different levels. So that's why I immediately thought I have to have this guy on the podcast, because I'm sure that we'll have quite an interesting conversation. Manuel, I always start with a question: Why should we care about mobile app performance? And what do you understand by mobile app performance?

 

Manuel Garcia:

I mean, it's important, I cannot say any different - everyone understands that it's important. But if I look at it from my reality at the moment, which is working on an eCommerce application that is available to the general public, it can be the difference between making a sale or losing the customer to the competition.

As simple as that, but I actually believe that it can go way deeper than that. It's a sense of added responsibility that we have, because it's not just losing the customer. For a lot of people, if they come to our web application, if they are browsing on their phones, if the page takes a long time to load, they will not simply go away - they will keep this vision of the website and also of the whole brand. This happens with a lot of people.

So, all of a sudden, besides the site, they kind of feel that the whole experience of the brand will also be slow. It will be a painful experience. So, getting the order at home, making a return, all of that is tainted already. So people go away, and then when they are joined by their friends and their family, that's the same impression that they're going to broadcast. And this influences other people into thinking, "Man, I don't want to go there. I don't want to have that experience." So that's why I think it's an added responsibility. It's our first impression and it can have a lot more importance than what people typically realize.

 

Rui Costa:

You don't get a second chance to make a very good first impression. That's interesting because basically what you're saying is that it goes way beyond just the specific action that we are trying to optimize. It actually goes more into a brand and loyalty perspective. So I'm curious: in the process of optimizing and trying to build this perfect experience, one that brings the awareness that your brand, your application and your service are top notch, the first step, I guess, is to actually measure that. So what kind of measurements do you do? And what have you guys seen with respect to what matters and what does not matter that much with respect to performance?

 

Manuel Garcia:

Yeah, I'd say that that would actually be the first point - the first step - trying to understand where we stand at the moment.

So if anyone is starting to do this kind of work, they have to have some sort of assessment of how things are. And actually that's the tricky part as well, because over the years I have seen a lot of metrics. So we are talking about metrics - there are a lot of them, metrics that have been with us since the nineties and so on. But it kind of feels to me that we now have more complex applications that are more and more dynamic over the years. So metrics that perhaps worked in the past do not work as well now with all the complexity that we have. I'll say that trying to find the right metrics is not easy, but you definitely need those. Because this is like your North Star - it tells you where you are and it will also guide you in further improvements that you do. And which metrics to choose - that's complicated. For sure you need more than one. So choosing an all-in-one metric and judging the performance by that is a myth. You cannot have that. You have to have more metrics because the process of loading the application is like a journey - it's a process. And also what I like about working with this application that's available for everyone is that I can talk with other people and ask what they think, and it's not difficult. I just need them to open the application on their phones and tell me about it.

But when trying to understand which metrics we can use, there's no perfect answer for that. I've been to all sorts of conferences where people talk about different metrics and we all agreed that there is no perfect one. It's also a bit of experimentation that we need to do to understand if some have more importance than others, of course.

 

Rui Costa:

Would you say that each application should have its own metrics in some essence, even if you have some generic ones?

 

Manuel Garcia:

Actually, that's a very good point. I mean, the generic ones are the ones that we hear about in the community and so on. But I actually believe that besides the generic ones, it's definitely a good idea that you try to formalize your own metrics. The reason I say that is that generic metrics will work the same way, regardless of what application you are pointing them at. But with customized metrics, you can design metrics that best serve your business, right? So we can define all of the variables and so on. And what's a hundred percent guaranteed is that no one will know our business better than we do.

So that's why we need to come up with understandings on perception and so on. And we have all the control to define the metrics that are customized to our website. And maybe you can even have different metrics for different pages because you are trying to extract more on some pages than others. So I just wanted to say that we have all of this control over what we can define, but it's not something that I see that often. I see other companies do it; for instance, Twitter formalized one a few years ago, which was "the time to first tweet". So for them, what was really important was the speed they wanted to have for the users to just write the tweet, and that was good because it worked for their business. In our case, there are also other areas that we are focused on. Since it's an eCommerce site - basically it's a business, right? We want the customer to buy, so we want to design a metric that tells us how fast the user can buy on our site.

 

Rui Costa:

Yeah, I read in one of your posts - "time to basket" - is it?

 

Manuel Garcia:

"Time to add to bag", yeah. I think it's a cool idea in a way that - I mean, I was just thinking out loud and it was one of my ideas. It's not hard to implement, but I think the cool part about it is that it expresses what that page is all about. Because that's what we want. If you open it on a mobile, you will see all the different elements that are important. You want to see an image, it's a product that you want to buy. You want to see the name and the price and you want to have that magic button there and you just want the customer to click on that. It's just what we want. It's our intention. And so having a metric that expresses that alone - I think it's very, very powerful.

 

Rui Costa:

Yeah, I absolutely agree because it's - in that example and the example you gave about Twitter -  it's the key action. So you should definitely measure that and optimize that. But do you break it down? So I'm thinking like adding something to the bag involves multiple things, like rendering, the name of the product, the images, context - and all of that. So do you break it down? And if so, do you see any of these pieces as the most relevant ones? Or do you take an approach more from the perspective of like progressive content delivery in the sense that you have some key information that you want to ensure that gets first to the app, so that you optimize the time to add to the bag. 

 

Manuel Garcia:

Yeah, great question. Yes. I mean, besides this one that is really focused on the action of the user, there are other stages that I feel are very relevant, and when trying to break that down, I always look at psychology in this case. So, how the human brain thinks. I always felt that it was not something very abstract. I think it's very easy to explain to people because as users, we go through the same set of stimuli when we open a page and perceive it. Although perception is subjective, we are talking about objective metrics. So that's why it's a bit complex, but I'll say that the first visual checkpoint, for me, is crucial.

I mean the first moment you start to see something on the screen. This is very important because it's important for bounce rates, you know. Say you are on a mobile and you are just walking around and you want to go to a site. Just for the fact that you are walking, the amount of time that you are willing to wait for that first impression goes down very dramatically. So you are not as patient. So that's why it's very important to put that first paint on the screen as fast as possible, because it's our first impression. I kind of say that until that moment, all that we are doing is just wasting money, because all that we have before that moment are just cycles, CPU cycles that we spend on the backend, trying to come up with all of these pieces of information: the rendering and all of that stuff, just to assemble everything and give it to the user. If we fail to deliver, because we are not fast enough, we have already lost the client. So until that moment, it was just costs. We're just wasting money. And so that's why for me it's very important to have that first thing very fast, because it's also the moment that the user understands that the site is actually working - so he starts to see something. So that will be the first visual checkpoint that I think is crucial, and then comes all the other stuff, which is building the layout.

So on a mobile, I would say that everything that's above the fold is what we prioritize getting to the user as soon as possible. And the interesting part is where this ends. So the moment where the user sees that everything has loaded, not necessarily the whole page, and that's why we break it down into what is more important and what is below the fold, because our first commitment is to be fast in delivering what the user sees. Users only relate to what they see - this is very important. So this is about understanding that what's below has no meaning for the user at that point; it has a different importance. And we need to build all of that stuff: the image, the button, and all of that, and this is a very crucial point as well, because it's the stage where the user switches from a passive state to an active state. Users don't want to be in a passive state because they do not feel in control. So they are just waiting for something, they do not feel in control and that's why they become impatient. And when all of this seems completely loaded, it's the time where the user has all the visual cues that invite him to interact. And so he switches into an active state and he tries to interact, and we want him to interact with that button, and we need to make sure that the button is alive. This is very important and that's where the custom metric also helps, in that it will only fire if the button is alive: because if it's not alive, this brings another level of frustration to the experience and also brings this other pattern - we call it rage clicks. It's the point where you just start to tap on the button because it does not give you any feedback. All of a sudden, instead of clicking once you are already clicking twice, and this is good because we can measure this. It's your level of frustration as well. We need to provide this to the user as a fast, living experience, not a dead experience.
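To make the two ideas above concrete, here is a minimal Python sketch of a "time to add to bag" metric that only fires when the button is alive, plus a simple rage-click counter. It is an illustration, not Farfetch's implementation: the field names, the one-second window and the rule that two or more quick taps count as a burst are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ProductPageTimings:
    """Milliseconds since navigation start; None means the event never happened.

    All field names are hypothetical, picked to mirror the elements mentioned
    in the conversation (image, name, price, button).
    """
    image_visible_ms: Optional[float]
    name_visible_ms: Optional[float]
    price_visible_ms: Optional[float]
    button_interactive_ms: Optional[float]  # button is rendered AND responds to taps


def time_to_add_to_bag(t: ProductPageTimings) -> Optional[float]:
    """Return the moment the user can really add the product to the bag.

    The metric fires only when every key element is visible and the button
    is alive; if any of them is missing there is nothing to report.
    """
    marks = [t.image_visible_ms, t.name_visible_ms,
             t.price_visible_ms, t.button_interactive_ms]
    if any(m is None for m in marks):
        return None
    return max(marks)  # the slowest element decides when the page is usable


def count_rage_clicks(tap_times_ms: Sequence[float],
                      window_ms: float = 1000.0,
                      min_taps: int = 2) -> int:
    """Count bursts of at least `min_taps` taps on one element within `window_ms`.

    Both thresholds are assumptions; tune them against real user data.
    """
    taps = sorted(tap_times_ms)
    bursts = 0
    i = 0
    while i < len(taps):
        j = i
        # Extend the burst while the next tap is still inside the window
        # that started with the first tap of the burst.
        while j + 1 < len(taps) and taps[j + 1] - taps[i] <= window_ms:
            j += 1
        if j - i + 1 >= min_taps:
            bursts += 1
        i = j + 1
    return bursts
```

With this sketch, a page whose button never becomes interactive reports no "time to add to bag" at all, which keeps a visually complete but dead page from looking fast in the numbers.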

But there are also other parts where other metrics start to shine, because they are not visual metrics, but are more in the interactivity realm. This is where we are trying to study what the browser's main thread is doing, because we only have one thread and all the code that we ship goes through this main thread, which means that it can become busy. And that means all of the code: our own code, and then other third-party code - which is completely understandable, because in the case of an eCommerce site you download more than just the site itself. There's a lot of stuff that goes on the site, but it has to go through the same thread. So this is where things start to get complicated and complex, because everything is urgent, right? Everything has to come and go through that same pipeline. And then you can reach a moment where you try to interact with the page and it's busy doing something else, because we are talking about a single thread. And so it means that if the thread is occupied for more than 15 milliseconds, you are not guaranteed to have fluid feedback. And there are a lot of studies that show that when you have this pattern, it triggers this disruption in the user experience, and this is normally correlated with conversion rates. So people feel that the experience is not fluid, they get frustrated and then they just go away.
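The single-thread argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tasks are given as hypothetical (start, duration) pairs that do not overlap, and the threshold is a parameter. Its default of 15 ms is the figure quoted in the conversation; other tools use other cut-offs (the browser Long Tasks API, for instance, reports tasks longer than 50 ms).

```python
from typing import List, Sequence, Tuple

# One main-thread task as (start_ms, duration_ms). Because there is a single
# thread, tasks are assumed not to overlap.
Task = Tuple[float, float]


def blocking_tasks(tasks: Sequence[Task], threshold_ms: float = 15.0) -> List[Task]:
    """Return the tasks that keep the thread busy for longer than the threshold."""
    return [task for task in tasks if task[1] > threshold_ms]


def input_delay(tasks: Sequence[Task], tap_ms: float) -> float:
    """How long a tap at `tap_ms` waits before the thread is free to handle it."""
    for start, duration in tasks:
        if start <= tap_ms < start + duration:
            return start + duration - tap_ms  # wait until the running task ends
    return 0.0  # thread was idle, the tap is handled right away
```

For example, with tasks `[(0, 10), (20, 120), (200, 5)]`, only the 120 ms task is flagged, and a tap at 100 ms has to wait 40 ms before anything can respond to it.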

 

Rui Costa:

I believe it was the first time that I heard this “rage clicks” expression, and so you measure user frustration with this. At least in my case, it's a very accurate metric for user frustration. So, thank you for that. It was very, very enlightening. I guess I will share in the description of the episode a few of your articles, where people will be able to see in more detail what metrics you measure and what the impact is of what you're saying. But now I wanted to take it from a different perspective, which is: you measure all this, then you detect problems, and then you optimize. But how do you ensure that you don't go back to worse performance, so that you don't get regressions with respect to performance? How do you manage that in such a complex metric setup?

 

Manuel Garcia:

Yeah, definitely complex. Yeah, trying to be at the top of your game with so many things that go on the site, it's very challenging because we don't control all the variables, right? There are different areas. Performance is like a shared responsibility, or it should be. This is always good on paper, but then in reality, I mean, there have to be people that have this alignment and try to align people around this. It has also been a challenge that we have in trying to set a culture of performance in the company, but definitely trying to set the stage and keep the standard is very complex.

Regressions are something that happens every day, and typically this can happen on just small things, just simple implementations that seem naive, that can bring performance bottlenecks. I believe that a good concept that we can introduce in companies that want to protect that standard is creating performance budgets.

Performance budgets is a concept that's also not hard to understand. It's having performance and living on a budget. Let's say it's defining - 

 

Rui Costa:

Can you give us an example? Or I can give an example, and you can say if my interpretation is right or wrong. I think a performance budget is like ... I'm a project manager, and my team is developing this feature, I don't know, a new page or something like that. And the performance budget means that I'm bounded: you tell me this page cannot take more than X milliseconds - something like that.

 

Manuel Garcia:

Yeah. That's exactly it. That amount of time is your threshold. So the performance budget is no more than a threshold that we define. There can be different kinds of budgets. I mean, the example that you gave is what we call a time-based budget, and it's easier to explain to people, like "we want the page to load in under five seconds" or something. So everyone understands time, right? So regardless of what metric we are trying to establish, everyone knows that the sooner the better, right? But there are other ways of looking at it that also bring value, which is like quantity-based metrics, for instance - the page weight. Typically in my performance analysis, I tend to look a lot at what the page weight is.

So over the years pages have come to weigh more than they used to: more images, eye candy, higher quality - I mean, that all comes at a price to our clients' data plans. I mean, there's also that responsibility in trying to understand if we are following what the user defines on their phone and abiding by that; there are protocols, or contracts, that you can follow to stay within that. And in those terms, we can say, we don't want the page to weigh more than one megabyte of data, or we don't want to have more than a certain amount of images. It also makes sense. We don't want the page to have more than a certain amount of requests - so that also counts. And also there are tools that bring the concept of rule-based budgets, which is like a score. You want to test performance and these tools just make an analysis of it. And they basically join a lot of best practices that you should follow.
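A quantity-based budget of this kind is easy to express in code. The Python sketch below is an illustration only: the one-megabyte page weight comes from the conversation, while the image and request limits, the class names and the field names are placeholders invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    url: str
    kind: str        # e.g. "image", "script", "font"
    size_bytes: int  # transferred size of this request


@dataclass
class QuantityBudget:
    max_page_bytes: int = 1_000_000  # "one megabyte of data", as in the conversation
    max_images: int = 30             # placeholder value, not from the conversation
    max_requests: int = 80           # placeholder value, not from the conversation


def check_quantity_budget(resources: List[Resource],
                          budget: QuantityBudget) -> List[str]:
    """Return one message per exceeded limit; an empty list means within budget."""
    violations: List[str] = []
    page_bytes = sum(r.size_bytes for r in resources)
    image_count = sum(1 for r in resources if r.kind == "image")
    request_count = len(resources)

    if page_bytes > budget.max_page_bytes:
        violations.append(
            f"page weight {page_bytes} B exceeds {budget.max_page_bytes} B")
    if image_count > budget.max_images:
        violations.append(
            f"{image_count} images exceed the limit of {budget.max_images}")
    if request_count > budget.max_requests:
        violations.append(
            f"{request_count} requests exceed the limit of {budget.max_requests}")
    return violations
```

Returning messages rather than a single boolean keeps the result explainable to people outside engineering, which matters for budgets that are meant to be shared across teams.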

Basically there are certain important metrics that the tool accounts for. You have to have good indicators on these metrics, and so the tool sums that up and gives you a score. It's good to show the technical quality of the product, but it's kind of hard also to explain to other people what that means. Typically you want to have a higher score and that suits better. I mean, it's hard to measure and benchmark. I see that most of the time we kind of use time-related budgets, and so we use this concept of budgets to validate that we are not having performance regressions, and there are a lot of places where you can have this. I think that the sooner you are aware that you have regressions, the better. So if you are able to spot regressions before shipping the code to the live environment, that would be great.

It would mean that you can do this on your own, on your machine - you can do that - but typically this will work in a continuous integration, continuous deployment pipeline, where you do validations. And this will be one of those validations. So you'll have your budgets, which can be this number - under three seconds, under five seconds - and you spin up a few tests to understand if you are still under that budget, taking into account all of the features that were developed and so on. And this is very interesting .... yeah, go ahead.

 

Rui Costa:

Yeah, you actually put performance as one of the integration tests. 

 

Manuel Garcia:

Yes, we do that - that's where synthetic tools, which is something that I didn't talk about before, are really interesting, because they give you a glimpse of what your performance is like. It's a laboratory where you have control of all the different angles, the network constraints, the device that you use, and typically you perhaps use different kinds of devices. Like the baseline of your customers: we have an idea of what device most of our customers use - so we can use that.

But since we are always trying to go after tech excellence, we also want to understand how performance is on the slower devices. So we want to understand how that experience is, and we also want to understand how our page loads under those circumstances and have a good budget for that. And we do these tests, we test against these budgets that we have set up, and if the tests reveal that we are over budget - so the whole idea is that it can break the pipeline, right? We don't want to ship this code because we are regressing. That's the idea.
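As an illustration of such a pipeline gate, here is a minimal Python sketch. It assumes a synthetic tool has already produced a few load-time measurements per device profile; the profile names, the use of the median and the function names are assumptions for the example, and the three- and five-second budgets simply echo the numbers mentioned earlier in the conversation.

```python
import statistics
from typing import Dict, List


def over_budget(load_times_ms: Dict[str, List[float]],
                budgets_ms: Dict[str, float]) -> Dict[str, float]:
    """Map each failing device profile to how far its median run overshoots.

    The median of several runs is used so that one noisy run does not
    break the build on its own (a choice made for this sketch).
    """
    failures: Dict[str, float] = {}
    for profile, budget in budgets_ms.items():
        median = statistics.median(load_times_ms[profile])
        if median > budget:
            failures[profile] = median - budget
    return failures


def pipeline_exit_code(load_times_ms: Dict[str, List[float]],
                       budgets_ms: Dict[str, float]) -> int:
    """Return a non-zero exit code when any profile regresses past its budget."""
    return 1 if over_budget(load_times_ms, budgets_ms) else 0


if __name__ == "__main__":
    # Hypothetical measurements from three synthetic runs per profile.
    runs = {
        "baseline_device": [2800, 2900, 3100],
        "slow_device": [5200, 5600, 4900],
    }
    budgets = {"baseline_device": 3000, "slow_device": 5000}
    for profile, overshoot in over_budget(runs, budgets).items():
        print(f"{profile}: over budget by {overshoot} ms")
    raise SystemExit(pipeline_exit_code(runs, budgets))
```

A CI step that runs this script fails whenever the exit code is non-zero, which is what stops the regressing code from being shipped.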

 

Rui Costa:

It sounds like -  I'll say that it sounds like you look at performance as a feature to some extent.

 

Manuel Garcia:

Yeah, for sure, for sure. I totally believe that it's a feature.

And I think that when you have this concept of budgets, if you are able to understand what a regression means - I mean, not necessarily in terms of time, but perhaps in terms of business impact. So if you went through the process of mapping certain metrics, time metrics typically, to business metrics like conversion rates or profit, or whatever is important for the company to follow, like engagement. I mean, there can be a lot of things that are important for the company.

I mean, this gives something quantifiable to, for instance, a product manager or product owner - who is responsible for the functional side of the product, right? - to understand if that feature is worth it or not. I mean, if the changes that we are making to the code are really worth it or not, taking into account the performance trade-off. Because you are able to quantify what that means for the business, and maybe all of a sudden, this feature that you think is like the best thing ever, maybe it's not that good, because it will increase certain metrics by one second.

So I think this is amazing when you can reach this stage because it gives the product managers another lever, understand? Most of the time they are focused on the functional side - again, the feature - but on the other hand, they now know what performance means, and they can actually use performance to drive value. So perhaps at certain points, the product manager can tell the team to focus more on performance improvements than on, for instance, an A/B test or some other feature, because maybe concentrating on improving the performance brings a lot more value. And I think that it is good for a company to give product managers this other lever to pull.

 

Rui Costa:

So I was hearing you and I was thinking, if you take it to an extreme, you'd say something like: I'm thinking about deploying this feature, and even before actually writing one single line of code, you're already saying, well, we will do this if and only if we can be within this budget. So within this loading time, within this size, whatever - but within this performance budget. So, does that happen? Do you do that before actually going into development, or is it more of, say, an iterative process where you fine-tune these goals?

 

Manuel Garcia:

I mean, you should set this budget soon. It's not something that should be changed often. And this is a topic that I normally talk about, because what would be great is that we are able to have an even harder budget. Keep it down, not pull the bar up. Because there will always be a time - there's a balance between the feature and the performance. And sometimes it's hard for us to understand what we should do: whether the feature or the performance suffers a bit. I mean, it's still something that we think is achievable, so there's a balance here that's not very easy to identify. But that threshold is important because I think that it motivates the team, because they see this as their standard, and they want to defend this standard. They will defend the standard at all costs, and so they will interpret changes to the code and they will be quicker to understand, or to see already, if there will be some performance penalty. And they are already trying to talk to product managers that don't see this right away. And so when this starts to set in and work, I think it's kind of interesting, because the team is already embodying this concept of performance budget, and for them it starts to seem like something that is familiar; it's what they do. And also they show other people that they are able to sustain the level. And if they can, they can even go further by making the budget even harder and being more ambitious.

But I mean, this is something that doesn't happen every day. It's challenging because there is a lot of stuff. And if you work like we do, in a microservices architecture - typically I see this as a good thing - in terms of the overall performance experience it sometimes feels like it's disconnected, because you are looking at your own garden, right? You are looking at your own application service, which is just a little piece of the whole puzzle. So sometimes it can feel okay if you look at it from that perspective, but then when you see the big picture it kind of feels different, which also brings another level of complexity. Which is "OK, I may be shipping this out. It may look okay, but not that okay in the whole picture". So perhaps instead of having this budget set for your own garden, you have to look at the whole property and have a budget that represents the whole page, because if you think about it, that's what the user experience is. It's not about the little piece, it's the whole picture. So we need to protect, above all, that big picture.
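The "own garden versus whole property" point can be shown with a small Python sketch. It is an illustration only: the service names and numbers are invented, and page weight is used as the budgeted quantity because bytes from different services simply add up on the page.

```python
from typing import Dict, List, NamedTuple, Tuple


class ServiceWeight(NamedTuple):
    shipped_bytes: int     # what this service contributes to the page
    own_budget_bytes: int  # the budget the owning team set for itself


def review_page(services: Dict[str, ServiceWeight],
                page_budget_bytes: int) -> Tuple[List[str], int, bool]:
    """Check every service against its own budget and the page against the whole.

    Returns (services over their own budget, total page bytes,
    whether the whole page is over budget).
    """
    over_own = [name for name, s in services.items()
                if s.shipped_bytes > s.own_budget_bytes]
    total = sum(s.shipped_bytes for s in services.values())
    return over_own, total, total > page_budget_bytes


if __name__ == "__main__":
    # Hypothetical page assembled from three services.
    page = {
        "header": ServiceWeight(300_000, 350_000),
        "product": ServiceWeight(450_000, 500_000),
        "recommendations": ServiceWeight(400_000, 400_000),
    }
    print(review_page(page, page_budget_bytes=1_000_000))
```

In this made-up example every service is within its own budget, yet the assembled page is over the page-level budget, which is exactly the situation a per-service view cannot detect.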

 

Rui Costa:

So just before we started recording, we were sharing that we kind of do the same thing. So you sell performance internally and I sell performance externally. So, from your experience - what I'm thinking is - how do you put this in people's minds? So how do you convince people that this is something very relevant  for the company?

 

Manuel Garcia:

I can only talk from my experience. I mean, sometimes I feel that it works, sometimes I don't feel that it works that well - but the one thing that I know is that I cannot be bothered by that. Sometimes it's the same speech, and you know about that, because it's the selling part: you have to repeat yourself over and over again. But I feel like our end goal in all of this is creating this culture around performance. We want performance to be a first-class citizen - besides engineering, which is the part where I work, it's also very important that other areas also know what the importance of performance is and why it is important. And also to put aside the idea that performance is just something that gets dropped on an engineering team to fix. Performance is a shared responsibility, so other areas - I already talked about the product managers - the product area also needs to understand that it's a feature, so they need to protect that non-functional requirement of the product, because for most people how fast the site loads is more important than actually how good it looks and how easy it is for me to find what I'm looking for.

But you have other areas like design, for instance. Is the design team designing with a performance mindset? I always believe that if you are building a house and you have a strong foundation, the house at the end will be much better than if you don't think about this early on. So, if you understand perceived performance, if you are doing stuff in the application to trick the user into feeling that the site is faster - it's like that mantra "Be fast, but feel faster" - because this is what's important for the user and for the customer to feel in the experience. We talk with these people, and also it's a culture made of people. You need to talk with these people over and over again, and be accountable. So if you are able to show some numbers, of course that's a different level, right? You are already saying something objective, because if you are talking with people about it being something that is important, they will agree with you of course. But if you bring numbers, you are more objective. So they understand the level of importance that you are talking about.

But you also need to be accountable. So set the stage, set a goal - try to understand what you want to do next and keep improving. Of course, you have to measure metrics to tell them the current state, and if you want to go further, you need to improve, and then you have to socialize all of this. If you are making improvements and if you are bringing this extra value to the company, you need to socialize that. This is the part that I think works well with people, especially if you have a company with a lot of offices: you celebrate your success. And people feel drawn to people that are successful, right? And so they want to be part of it. Even if they were not interested before, they now see, "Oh, they are working on performance and they did this and this resulted in that." And so they want to know more about that. And by wanting to know more about that, that's already a win, because they already start to understand what all of this performance part is and how it can make a difference in their own area, in their own domain.

 

Rui Costa:

Yeah, interesting. I'm thinking about stuff like having TVs with performance outcomes. We did that in the past; in the new office we're still not there, but we did that in the past because, as you said, it's all about getting it into your instinct; it's all about it being natural for you to think about performance. I absolutely agree with you. Manuel, wow. Thank you so much. It was so cool. So I always like to tease the guests in the end with this provocative question: if you meet another performance engineer, or someone just starting, in the street and you have like one minute, two minutes to give them key advice or key takeaways from your experience, what would that be?

 

Manuel Garcia:

Good question. I feel that if you are starting into performance work, the advice that I always give is: prepare to get frustrated. Because frustration will come. It will come. I'm not saying this like I'm at a higher level where I don't get any frustration - I do. The only difference is that I already accepted it. It's the only difference. And I kind of feel like this is what happens even with senior engineers. When you work in performance, there are certain things that don't seem to have any logic. I mean, you try to improve, it looks good on paper, it looks good on your machine, but then you go into the real thing and you don't see it. And this really frustrates people. And what I typically say as advice is: don't give up. The first time it will not happen the way you want, but you have to persevere and you have to try it out, understand what's going on. Understanding what's going on is actually a very challenging thing to do. Explaining why stuff is degrading when at the same time you are not doing anything. This happens all the time. It's like stock options, right? You don't do a thing. You have to understand that it's like that and don't give up because it's complex, but I think it's amazing to work in this area.

 

Rui Costa:

Yeah, I believe you just summed up my past - I dunno - eight years of work. That's exactly it. So frustration. Do not give up. Keep improving. Things that seem to work will not work and things that do not seem to work, will work and still trying to figure all that out can get very frustrating. Manuel, thank you so much - it was so cool.  And thank you all for listening and see you next week. 

Hope you have enjoyed the conversation. I will leave Manuel's LinkedIn address in the description of the episode, as well as his articles on performance. You should definitely read them - quite, quite interesting. Don't forget to follow us on the usual podcast platforms like Apple Podcasts or Spotify, as well as visit performancecafe.codavel.com and leave us your reviews, your feedback - looking forward to hearing from you.

See you next week!