Frontend performance optimization for newbies. Part 1/n. How browsers render the page

Posted: May 01, 2021

It's quite difficult to cover all the themes about frontend optimization. So I'm going to have a number of them: 1) How browsers render the page <ā€” this article 2) What can we do to improve FMP and TTI? 3) In progress The first 2 parts of the article are republished from my dev.to account: https://dev.to/xnimorz/hitchhiker-s-guide-to-frontend-performance-optimization-4607 However, I'm going to extend it.

I'm Nik and I'm a frontend developer. Besides writing code, I was a mentor at HeadHunter's developers school: https://school.hh.ru/

We recorded our lectures in 2018-2019. These lectures are opened on our YouTube channel (but in Russian). Here is a playlist https://www.youtube.com/watch?v=eHWMtfqxjes&list=PLGn25JCaSSFQQOab_xMXI3vJ0tDUkFaCI However, in 2019-2020 school we didn't record our lectures. I had a talk dedicated to frontend performance optimization. After it, I decided to make an article based on the material. As the lecture was 3 hours long, I divided the article into 2 parts.

This longread could be useful as a handbook. We will cover:

  1. Why performance is important;
  2. FMP (First Meaningful Paint), TTI (Time To Interactive);
  3. Critical render path, DOM, CSSOM, RenderTree;
  4. Base steps to improve performance.

The rest of the themes, which were in my lecture, will be in the second article. The second part will cover such topics as layout, reflow, repaint, composite, and their optimization.

Why performance is important. Motivational part.

0.1 seconds ā€” it is a gap when we perceive a connection between our mouse click or keyboard press and changes in the application or interface.

I think almost everybody saw a lag when you input a text, but the interface handles only a previous word. A similar problem exists with button clicks. The good UX helps me, it tells me: "Okay, just a moment and everything will be done". The latest example I had was when I tried to remove a huge number of emails through a web-version in one email webapp (let it be an anonymous service). When I selected emails and clicked the "remove" button, nothing happened. At those moments I didn't understand either I misclicked or the interface had a lag. The second variant was correct :) It is frustrating. I want to have a responsive interface.

Why should it be 0.1 seconds? The key is that our consciousness makes connections between our actions and the definite changes in the website and 100ms is a good time for it.

Let me show an example. Here is a video clip of 30 Seconds to mars ā€” Hurricane (be careful, it is an explicit one, and has some NSFW parts. You can open the clip on 9:30 and you will be able to catch frames, which we are talking about, during the next 30 seconds): https://www.youtube.com/watch?v=MjyvlD0TwiA this clip has several moments when a screen appears for only 1-2 frames. Our consciousness not only handles this screen but recognizes content (partly).

1 second is a perfect time to load a site. Users perceive surfing smoothly in this case. If your service could be loaded within 1 second you are awesome! Unfortunately, we have a different situation in general.

Let's count what we have to do when a user navigates to our site: network outgoings, backend processings, microservice queries (usually), DB queries, templating, data processing on the client-side (we are going to talk about it today), static resource loading, script initialization. Summing up: it's painful.

That's why usually 1 second is an ideal timing.

10 seconds. Lots of analytics tell us that people spend about 30 seconds visiting a website on average. A site that is loaded 5 seconds consumes 1/6 of user time. 10 seconds ā€” a third.

The next numbers are 1 minute and 10 minutes. 1 minute is a perfect time to complete a small task using a site like reading product info or getting registered. Why should it be only a minute? We don't spend much time these days concentrating on one thing. We change objects of our attention pretty often.

  • Opened an article, read the tenth part of it, then a colleague sent a meme on Slack, web-site trigger alerted, wow coronavirus news, all of it. Only in the evening, you get time to read an article.

When a user spent 10 minutes on a site, it means they tried to solve their problem at least. They compared plans, made an order, etc.

Big companies have good analytics for performance metrics:

  • Walmart: 1 second means + 2% conversion
  • Amazon: 0,1 seconds increases proceeds for 1%

The latest motivator is from Wikipedia:

https://twitter.com/wikipedia/status/585186967685619712

An image from Notion

Let's go further:

Two eternal questions

Let's run a lighthouse check on hh.ru. Looks pretty bad (pay attention it's a mobile configuration of the lighthouse):

An image from Notion

Here we have 2 traditional questions:

1) Who's to blame for this? :) (and it's better to replace with a question why we have this)

2) What do we do with it?

Spoiler: there won't be a picture of how good our metrics became at the end.

Let's dive

We have 3 common scenarios:

  1. First paint
  2. Page processing (user clicks, data input, etc.)
  3. SPA ā€” changing pages without reloading

Talking about first-page loading, we have 2 the most important stages of page readiness from the user's point of view: FMP (First Meaningful Paint) and TTI (Time to interactive):

An image from Notion

FMP for users indicates that we have text, and they can start consuming content (of course in case you are not Instagram or youtube).

TTI === the site is ready to work. Scripts are downloaded, initialized, all resources are ready.

The most important metric for HeadHunter (hh.ru) is FMP, as applicants base behavior is to open vacancies search and then open each vacancy in a new tab so that users can read them one by one and make a decision whether they want to apply to this vacancy or not.

With some nuances, FMP is one of the best metrics to measure websites' critical render path. A critical render path is a number of actions, resources, which should be downloaded and processed by the browser before showing a first result appropriate to users' work. Minimal resources, we have to download, are HTML, CSS stylesheets, and blocking js scripts.

Critical render path or what the browsers do to show user text

TL&DR;

0) Make a navigate request (DNS resolve, TCP request, etc.)

1) Receive HTML-doc;

2) Parse HTML

3) Build the DOM (Document object model)

4) Send requests to download blocking resources (works in parallel with the previous process)

5) Receive blocking resources, especially CSS-code. In case we have blocking JS code, execute it.

6) Rebuild the DOM if needed (especially in case blocking JS mutates DOM)

7) Make CSSOM tree

8) Build Render tree

9) Draw a page (Layout ā‡’ paint ā‡’ Composite)

Note: Reflow could be executed additionally on previous stages, due to the fact that js could force it. We will cover this part in the second article

In details:

Request

An image from Notion

Make a request, resolve DNS, IP, TCP, etc. Bytes are running through the sockets, the server receives a request.

Response

Backends execute a request, write bytes into the socket. We receive the answer like this:

An image from Notion

We receive a bunch of bytes, form a string due to the text/html data type. Interesting thing: first requests are marked by the browser as a "navigate" request. You can see it if you subscribe to fetch action in ServiceWorker. After receiving data, the browser should parse it and make DOM.

DOM processing

DOM

We receive a string or a Stream. In this stage browser parses it and transform a string into a special object (DOM):

An image from Notion

This is only a carcass. At this point, the browser knows nothing about styles, hence it doesn't know how to render the page.

Downloading of Blocking resources

Browsers synchronously process HTML. Each resource either CSS or JS could be downloaded synchronously or asynchronously. When we download a resource synchronously we block the rest of DOM processing before we receive it. That's why people recommend putting blocking javascript without defer and async attributes right before the closing body tag.

So each time browsers get to the blocking resource, they make a request, parse the response, and so on. Here we have some limitations such as the max number of simultaneous domain requests.

After all blocking resources are received, we can form CSSOM

CSSOM

Let's suggest, besides meta and title tags we have style or link. Now browsers merge DOM and CSS and make an object model for CSS:

An image from Notion

The left part of the object (head and the children) isn't interesting for CSSOM, as it wouldn't be shown to the user. For the rest of the nodes, we define styles, which browsers will apply. CSSOM is important, as it helps us to form RenderTree.

RenderTree

The last step between making trees and render. At this stage, we form a tree that will be rendered. In our example, the left part won't be rendered, so we will remove it:

An image from Notion

This tree will be rendered. However, we could get a question. Why do we render "RenderTree" instead of DOM? We can check it easily by opening DevTools. Even though DevTools has all DOM elements, all computed styles are based on RenderTree:

An image from Notion

Here we selected a button in the Elements tab. We got all the computed data of the button: its size, position, styles, even inherited ones, etc. After making the RenderTree the browser's next task is to execute Layout ā‡’ Paint ā‡’ Composite for our app. Once Composite is ended user will see the site. Layout ā‡’ Paint ā‡’ Composite could be a problem not only for the first render but also during user interaction with the website. it's why I moved this part to another article.

Read next part: What can we do to improve FMP and TTI?