Yes, we do monitor individual requests too, but...
In every single application, performance will eventually become a topic of discussion. There are many ways to measure an app’s performance, but let’s focus here on HTTP requests. Tracking every request is undeniably important: it gives valuable insight into which parts of an application are faulty and which can be improved, and careful management of every request will almost surely improve performance. But there is one thing these individual requests do not give you: a direct representation of a user action.
A user action is an interaction a user has with the app to perform a specific activity. These actions may then trigger a chain of procedures to deliver specific content to the user: when you click on a link, you hope to be redirected to the new website; when you click on a video thumbnail, you hope to see the entire video without interruptions.
HTTP requests are performed when a user takes an action, whether to complete the user’s intent or to make more actions available to them. Analysing a single request can reveal some degree of usability (if I receive a POST at /login and it fails or takes too long, I’m pretty sure what the user was trying to do). But this inference isn’t always possible, and we needed a different approach to replicate a user’s action as closely as possible to a real-case scenario. As such, we decided to create sequences of requests that simulate the user’s experience with the app, and to evaluate the results of the whole sequence as well.
Let’s say you’re trying to buy a specific product on a website, and you go to the product’s page. That page is probably composed of dozens of requests: you need to load the product images, descriptions and availability, similar items you might be interested in, and so on. You have probably already experienced this: you enter a webpage, you see a big bold title and a bunch of text descriptions, and you start reading. All of a sudden, a big image of the product pops up. You try to refocus on the text you were reading and, bam, another bunch of images pops up on the sides: unwanted advertisements for some completely different product. What a terrible experience.
What is happening here? Let’s say each text description and icon on the page took around 50 ms to load and there were 20 of them (yes, they can be grouped into a single request, but let’s keep them separate for simplicity). Each related-product thumbnail took 100 ms and there were 6 of them. The big image took 500 ms to load, and the advertisement requests also took 500 ms to complete. Before all that, the page has to fetch every JS file and every CSS file, and there will probably be at least one request to some remote database; let’s say this process takes 500 ms. If you look at these requests individually, you’ll think “heh, not great, not terrible”. However, if you look at the user’s experience as a whole, it adds up to 3.1 seconds to complete the whole webpage. Not only was the user’s attention interrupted multiple times, but the time it took to complete the page is well above the 2 seconds established as a loading threshold for an ecommerce page. You’ve probably already lost 40% of your potential customers, and those poor souls that make up the remaining 60% will probably not be left with a good impression of the website, will they? And keep in mind that we assumed a time-to-first-byte (TTFB) of just 500 ms, but this can be much longer!
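The arithmetic above is simple enough to sketch in a few lines. This is a minimal illustration using the made-up timings from the example, assuming every request runs strictly one after another:

```python
# Illustrative request timings from the example above (all in milliseconds).
# These numbers are assumptions for the sake of the arithmetic, not real data.
initial_load_ms = 500          # JS, CSS and the remote database fetch
text_and_icons_ms = 20 * 50    # 20 descriptions/icons at 50 ms each
thumbnails_ms = 6 * 100        # 6 related-product thumbnails at 100 ms each
big_image_ms = 500             # the big product image
ads_ms = 500                   # the advertisement requests

total_ms = (initial_load_ms + text_and_icons_ms + thumbnails_ms
            + big_image_ms + ads_ms)
print(f"Total sequential load: {total_ms / 1000:.1f} s")  # 3.1 s
```

Individually, none of these numbers looks alarming; it is only the sum that crosses the 2-second threshold.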
The above example assumed that the application made its requests sequentially. However, starting a request right after the previous one ends is not the best approach for most use cases. In fact, most apps make some requests in parallel, and only after getting the responses to those requests do they make more. This means our sequences can have sub-sequences whose requests are made sequentially, while other sub-sequences have requests that can be made in parallel, i.e. multiplexed.
In a sequential sequence (pardon me), the whole sequence duration equals the sum of the durations of every request, while in a multiplexed sequence, the sequence duration equals the maximum of the durations of the requests made.
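That sum-versus-maximum rule composes naturally when sequential and multiplexed sub-sequences are nested. Here is a hedged sketch of the model; the node representation and the example timings are mine, not part of our actual tooling:

```python
def duration(node):
    """Duration of a sequence node, in milliseconds.

    A node is either:
      - a number: one request's duration,
      - ('seq', children): children run one after another, durations add up,
      - ('par', children): children run in parallel, the slowest dominates.
    """
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    child_durations = [duration(c) for c in children]
    return sum(child_durations) if kind == 'seq' else max(child_durations)

# Example: fetch the HTML first, then CSS/JS/images multiplexed,
# then one final API call once those responses are in.
page = ('seq', [120, ('par', [80, 200, 150]), 60])
print(duration(page))  # 380
```

Note how the parallel group contributes only its slowest request (200 ms) to the total, which is exactly why multiplexing shortens the user-perceived sequence.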
Now we have a model for understanding user actions. That is, by modeling a sequence of requests, we can analyse how much time it takes for a user to take a particular action and get the desired content for that action. But what do we assess from these sequences? In our particular case, we use them to compare HTTP with our Bolina protocol: we measure how long the whole sequence took, and then check the benefit the user gets from using Bolina versus HTTP. However, speed is not the only thing that matters. One of Bolina’s main points is its decreased variability compared with HTTP. With these sequences, we also take every single run’s duration and place it in a scatterplot, for easier reading.
With this we can compare the sequence-duration trends of both protocols, as well as the reduced variance between runs (reinforced by the calculated mean, median and standard deviation values).
In essence, using sequences instead of individual requests lets you run your statistical analysis on what the user is actually perceiving.
Understanding users’ behaviour
Every app is targeted at some kind of end user. But more often than not, teams fall into the same pitfall: they create a UX flow, and all seems well and done in theory, but when it is put into practice, users just do not follow that flow.
And how does a good or bad UX translate into something more tangible? According to Akamai, rebuffering events during a video stream lead to a 16% increase in negative emotions, including disgust, sadness and a decrease in focus. On the other hand, high-quality streams lead to almost 20% more emotional engagement. The point is, there is a serious need to understand the user’s experience from a more practical point of view and find out what the end user values most. The closer we get to those values, the closer we get to success.
Going back to our HTTP request sequences, we see them as a combination of user actions. To get there, we go to an app, record every single interaction we, as a user, have with it, and capture every request of that interaction. We’ll expand on this in a future blogpost, but the idea is to translate these interaction requests into actions. Actions like scrolling, clicking a button, and loading and buffering videos need to be represented in our requests to better simulate a practical user experience in an app. By combining all these actions into a single sequence, we can get closer to a real simulation. So we’ll have sequences with 9 requests for starting the app, 8 requests for scrolling actions, 5 more for the “add to cart” action, and so on.
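One way to picture this translation is a simple action-to-requests mapping. This is a hypothetical sketch (the `Action` structure and placeholder request names are mine); the request counts mirror the example above:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    """One user interaction and the HTTP requests recorded for it."""
    name: str
    requests: list = field(default_factory=list)  # placeholder request IDs

# A hypothetical recorded sequence, built from the actions in the text.
sequence = [
    Action("app start",   [f"req-{i}" for i in range(9)]),
    Action("scroll feed", [f"req-{i}" for i in range(8)]),
    Action("add to cart", [f"req-{i}" for i in range(5)]),
]

total_requests = sum(len(a.requests) for a in sequence)
print(f"{len(sequence)} actions, {total_requests} requests in total")
```

Grouping requests under the action that triggered them is what lets us later report results per action rather than per request.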
Your client’s KPIs should be your KPIs too
Until now, I’ve talked about how important it is to understand users’ behaviour in our apps, but let’s face it: companies do this to improve their KPIs, be it their conversion rate, their churn rate, their retention rate or anything else.
Farfetch established a correlation between Time To Interactive (TTI) and their business metrics, namely that a lower TTI improves their percentage of visits with more than 3 page views. Not only that, but the chance of a user leaving increases by 5% for each additional second of TTI. That’s significant.
This analysis is what we try to bring to the last step before creating a sequence. “How will this sequence give my customers value?” and “What can we analyze from this sequence’s results?” are some of the questions we ask in this process. After all, they probably do not want us to test the performance of a user visiting the whole sitemap. So our metrics and sequences must be designed to bring something valuable to the client, like “video streaming buffering events are X% faster, which translates into Y% more engagement”. Then we have to think about which user actions best reflect a difference in these metrics, and create a use-case scenario to simulate them. Optimizing each request is relevant, but optimizing the user’s actions is extremely important for a better experience, leading to better results overall.
But this is not a simple task, as these KPIs may vary wildly from company to company, and even more so across different segments. A social media app will surely not be looking for the same metrics as a video streaming service. So before we even start creating a sequence, we need to understand the value our customers give to their data. Our team puts a great deal of effort into this research, aiming to give back something valuable not only to the end user, but also to the companies behind the app.
Let’s dive into more concrete examples, shall we?
Social media example
Let’s take a look at Instagram for example. First, let’s define the metrics we need to evaluate:
- Time to fetch an image - the time it takes to load an image in scrolling events
- Time to fetch a video - the time it takes to load a video in scrolling events
- Time to load a feed - time to load an entire feed, consisting of both images and videos
Now, our second step: which user actions are most relevant, and allow us to capture the previous data?
- Opening the feed
- Opening an image
- More scrolling
- Opening a video
To simulate this user behaviour, we would capture the initial requests to open the user’s feed, then scroll through the feed, which would load some images and/or videos, then pick one specific image and open it. After this, we’d close the image and begin scrolling again until we found an interesting video, open it, and watch it until the end.
In this case, we captured the time to get all the content in the feed, as we do not want users to be hindered by any kind of loading event while scrolling. These loads need to be blazingly fast, as users don’t want to wait for content when they reach the bottom of their currently loaded feed.
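One simple way to express that “blazingly fast” requirement is to check every scroll-triggered load against a latency budget. This is a hedged sketch; the 100 ms budget and the load times are illustrative assumptions, not our real thresholds or data:

```python
# Hypothetical latency budget for scroll-triggered content loads.
SCROLL_BUDGET_MS = 100

# Made-up fetch times (ms) for images/videos loaded during scrolling.
scroll_loads_ms = [45, 80, 120, 60, 95]

slow = [t for t in scroll_loads_ms if t > SCROLL_BUDGET_MS]
print(f"{len(slow)} of {len(scroll_loads_ms)} scroll loads over budget")
```

A sequence run would then flag not just the total duration, but exactly which in-feed loads risked interrupting the user.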
Video streaming example
Let’s now talk about the example of creating a sequence for video streaming, taking the YouTube platform as our case study.
So first, let’s define the metrics we want to evaluate:
- Video startup time - the time it takes for the video to start playing, from the moment the user clicks on the wanted video
- # of buffering events - how many buffering events occurred throughout the whole video
- Buffering time (average, maximum, minimum and total) - the time it took for each buffering event to complete
- Time per video quality - how much time was spent in each video quality
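The buffering metrics above all derive from one recorded list of buffering-event durations per run. A minimal sketch, with made-up sample events:

```python
# Hypothetical buffering events recorded during one playback run (ms each).
buffer_events_ms = [320, 150, 500]

count = len(buffer_events_ms)          # "# of buffering events"
total_ms = sum(buffer_events_ms)       # total buffering time
print(f"events={count}, avg={total_ms / count:.0f} ms, "
      f"min={min(buffer_events_ms)} ms, max={max(buffer_events_ms)} ms, "
      f"total={total_ms} ms")
```

One list of raw event durations is enough to reconstruct the count, average, minimum, maximum and total for each run.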
Again, which user actions are most relevant, and allow us to capture the previous data?
- Opening a catalog of videos
- Selecting a specific video
- Opening and starting the video
- Watching until the end
To simulate a user’s behaviour, we would capture the requests to load the video, but also the rebuffering events that occur throughout playback. We also want to capture the events where video quality is switched mid-video.
However, in this specific case, we know that video quality varies around the globe and depends on variables that are out of our control, like infrastructure. Therefore, we create sequences that differ in video quality, explicitly specifying which quality we want to assess and comparing the results, whether the video is delivered at a bitrate of 24 Mbps, 4 Mbps or even less.
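In practice this means generating one sequence variant per target bitrate. A hypothetical sketch (the `make_sequence` helper and the dictionary shape are illustrative; the bitrates come from the text):

```python
# Target bitrates to assess explicitly, in Mbps (from the example above).
bitrates_mbps = [24, 4]  # lower values could be appended as well

def make_sequence(bitrate_mbps):
    """Build one sequence variant pinned to a specific video quality."""
    return {
        "name": f"watch-video-{bitrate_mbps}mbps",
        "steps": ["open catalog", "select video", "start video",
                  "watch to end"],
        "target_bitrate_mbps": bitrate_mbps,
    }

variants = [make_sequence(b) for b in bitrates_mbps]
print([v["name"] for v in variants])
```

Pinning the quality per variant keeps the comparison fair: any difference between runs comes from the protocol, not from an adaptive-bitrate switch we didn’t control.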
Thank you for reading all the way through here! Let’s just review the main topics we’ve discussed above:
- Difference between viewing individual requests vs a sequence of requests
- What can we get from analysing a sequence of requests
- A sequence must reflect a user action on the app
- We need to understand a user’s behaviour in an app, to better simulate their actions in the sequences
- But also the company’s KPIs
Next, we’ll dive more deeply into how these sequences are created, and the technical challenges they present.
To know more about how we monitor a sequence of requests, check this article.