The long journey to the runtime. Part 3. Event loop, layout, paint, composite, call stack

Posted: May 31, 2021

The first two parts were dedicated to resource loading and the application's critical path. Today we will talk about the next part of performance optimization: what happens with the page once it is downloaded, what our CPU executes, and how we can optimize it. Keywords for the topic: event loop, paint \ repaint, layout \ reflow, composite.

Previous parts:
1) How browsers render the page
2) What can we do to improve FMP and TTI
3) Event loop, layout, paint, composite, call stack <- this article

Questions for self-check

If you want to get straight to the content, you can skip this part. Knowledge of the event loop is handy not only in our everyday routine; it is also a popular interview topic. I like these 4 questions, which let you check whether you understand this browser mechanism. 3 of these questions you can usually find in interviews, but the third one is more practical. They are quite simple, and I sorted them from the simplest to the most difficult:

1) Will the console output "1"? Why?

function loop() {
	Promise.resolve().then(loop);
}
setTimeout(() => {console.log(1)}, 0);

loop();

2) There is a site with a link that has cursor: pointer in its :hover style. The site also has a button whose :hover style changes background-color from grey to blue. Now, let's run this script in the console:

while (true);

The question is: what will happen when we move the mouse over the link? Over the button? Why?

3) What will the console output? Why?

Promise.resolve(1)
	.then((x) => { console.log(x); return x + 1 })
	.then((x) => {console.log(x);})

Promise.resolve(10)
	.then((x) => { console.log(x); return x * 10})
	.then((x) => {console.log(x);})

4) How do you animate a popup element's height from 0 to auto? It's important to discuss the different methods using JS and\or CSS. By the way, if you google this question, Stack Overflow gives a wrong answer ;).

I won't give direct answers to these questions at the moment. If you feel stuck on one of them, try reading the article and then return to the questions. If you still have any questions, don't hesitate to drop a line in the discussion chat in Telegram: https://t.me/fxnim/31

Our target

Our target is to fully understand the schema below. We will assemble this schema step by step, with a detailed description of each stage.

An image from Notion

Event Loop

Old operating systems had something similar, which looked roughly like this:

while (true) {
	if (execQueue.isNotEmpty()) {
		execQueue.pop().exec();
	}
}

This code uses all the available CPU time. That is how it worked in old Windows versions. Modern OS schedulers are far more complicated: they deal with prioritization, execution, and lots of queues. So, to start, we need some infinite loop that checks whether we have tasks to execute. Like this:

An image from Notion

Now we should receive tasks somehow. Let's ask ourselves a question: what triggers the execution of JS code? It could be:
1) The browser has downloaded a <script> tag
2) Postponed tasks: setTimeout, setInterval, requestIdleCallback
3) A server response through XMLHttpRequest, fetch, and so on
4) Events and subscriber notifications from the browser API: click, mousedown, input, blur, visibilitychange, message, and plenty of others. Some of them are initiated by the user (clicking a button, alt-tabbing, etc.)
5) A promise state change. In some cases, it happens outside of our JS code
6) Observers like MutationObserver, IntersectionObserver
7) requestAnimationFrame
8) Something else?

Almost all these calls are planned via WebAPI (sometimes it's called browser API). For instance:
1) We call setTimeout(function a() {}, 100)
2) WebAPI postpones the task for 100ms
3) After 100ms, WebAPI puts function a() into the queue (TaskQueue)
4) EventLoop executes the task on its cycle

Our JS code has to work with the DOM somehow: getting the sizes of elements, adding properties, drawing pop-ups, etc. This is what makes the interface alive, but it also puts some limitations on how elements are drawn. It's complicated to run 2 threads, executing JS in one of them and rendering with CSS in the other, as it requires a lot of synchronization; otherwise it could lead to inconsistent execution. That's why both JS and element rendering work in the same thread. Okay, it means we should add "rendering" to our schema. As it's not a single operation, it's better to use a separate queue. Let's call it the render queue:

An image from Notion

We have 2 entry points: one for most JS operations and the second for rendering. Our first queue is called "SomeJsTasks", and it's time to review how it works. Browsers use 2 queues to execute most JS code:
1) TaskQueue is for all events, postponed tasks, almost everything. An element of this queue is a "Task".
2) MicroTaskQueue is for promise callbacks (both fulfilled and rejected) and MutationObserver. An element of this queue is a "MicroTask".
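The difference between the two queues is easy to observe. Below is a minimal sketch, assuming a standard JS runtime (a browser or Node.js); the function name runOrderDemo and the order array are made up for this illustration.

```javascript
// Records the order in which synchronous code, a microtask and a task run.
function runOrderDemo() {
  const order = [];
  return new Promise((resolve) => {
    // Task: the timer API puts this callback into TaskQueue.
    setTimeout(() => {
      order.push('task');
      resolve(order);
    }, 0);
    // Microtask: the promise callback goes into MicroTaskQueue.
    Promise.resolve().then(() => order.push('microtask'));
    // Synchronous code: runs right away, as part of the current task.
    order.push('sync');
  });
}

runOrderDemo().then((order) => console.log(order.join(' -> ')));
// sync -> microtask -> task
```

Even though the timer is registered first and has a zero delay, its callback runs last: the microtask is executed as soon as the current synchronous code finishes, and only then does the event loop take the next task.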

Screen updating

The event loop is inextricably linked with frames. It not only executes JS code but also calculates new frames. Browsers try to show changes on the page as quickly as possible, but there are some limitations. Hardware limits: screen refresh rate. Software limits: OS, browser, energy-saving settings, etc. Most modern devices (and applications) support 60 FPS (frames per second), and most browsers try to update the screen at this rate. So we will use 60 FPS in the article, but keep in mind that the actual rate may vary. For our event loop it means that, if we want to keep 60 FPS, we have time slots of about 16.6 ms for our tasks.

What is TaskQueue

As soon as tasks appear in TaskQueue, the event loop takes the first task from the queue and executes it on each cycle. After the execution, if we have enough time (in other words, if the render queue hasn't received any tasks), it takes another task, and then another, until the render queue gets a task. Let's review some examples:

An image from Notion

We have 3 tasks: A, B, C. The event loop gets the first one and executes it. It takes 4 ms. Then the event loop checks the other queues (MicroTaskQueue and Render Queue). They are empty, which is why the event loop executes the second task. It takes 12 ms. In total, the two tasks use 16 ms. Then the browser adds tasks to the Render Queue to draw a new frame. The event loop checks the render queue and starts executing these tasks. They take approximately 1 ms. After these operations the event loop returns to TaskQueue. The event loop can't predict how long a task will run. Furthermore, the event loop isn't able to pause a task to render a frame, as the browser engine doesn't know whether the changes made by custom JS code can already be drawn or are just an intermediate step and not the final state. We just don't have an API for this. In other words: during JS code execution, none of the changes made by JS will be presented to the user as a rendered frame, but they can be calculated. Now, let's look at the second example:

An image from Notion

We have only 2 tasks in the queue. The first one is quite long: it takes 240ms. As 60FPS means that a frame should be rendered every 16.6ms, we lose approximately 14 frames. As soon as the task ends, the event loop executes the tasks from the render queue to draw the frame. Important note: even though we lost 14 frames, it doesn't mean the browser will then render 15 frames in a row to catch up; the skipped frames are simply dropped. Before reviewing MicroTaskQueue, let's talk about the call stack.
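The arithmetic behind the lost frames can be written down as a small sketch. The helper names frameBudgetMs and droppedFrames are hypothetical, and the model is simplified: it assumes a fixed refresh rate and ignores the time spent on rendering itself.

```javascript
// Time budget for one frame at the given refresh rate, in milliseconds.
function frameBudgetMs(fps = 60) {
  return 1000 / fps;
}

// Number of whole frames that cannot be rendered while one task blocks the thread.
function droppedFrames(taskDurationMs, fps = 60) {
  return Math.floor((taskDurationMs * fps) / 1000);
}

console.log(frameBudgetMs().toFixed(1)); // "16.7"
console.log(droppedFrames(240));         // 14 (the long task from the second example)
console.log(droppedFrames(16));          // 0  (tasks A and B from the first example)
```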

Call Stack

The call stack is a list that shows which functions are currently being executed and where execution will return when the current function finishes. Let's look at the example:

function findJinny() {
  debugger;
  console.log('It seems you get confused with universe');
}

function goToTheCave() {
  findJinny();
}
function becomeAPrince() {
  goToTheCave();  
}
function findAFriend() {
   // ¯\_(ツ)_/¯
}
function startDndGame() {
  const friends = [];
  while (friends.length < 2) {
    friends.push(findAFriend());
  }
  becomeAPrince();
}
console.log(startDndGame());

We run this code in the browser console, and it pauses on the debugger statement. What will our call stack look like? We start our stack from the inline code console.log(startDndGame()); hence it is the start of the call stack. Generally, Chrome shows a reference to this line. Let's mark it as inline. Then we go down into the startDndGame function, where findAFriend is called several times. This function won't be present in the call stack, as it finishes before we get to the debugger. Our call stack will be the following:

An image from Notion

So, this is how the call stack works. It is a stack (last in, first out) of all the functions that are being executed at the moment, and it helps to return to the right place after the current function ends. When the call stack becomes empty, the current task has ended.
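The same stack can be inspected without the debugger. The sketch below is an illustration only: it relies on the non-standard but widely supported Error stack property and assumes the V8 text format (Chrome, Node.js); other engines format the stack differently. The helper currentStackNames is a made-up name.

```javascript
// Returns the names of the functions currently on the call stack, innermost first.
// Assumes V8-style lines such as "    at findJinny (file.js:10:3)".
function currentStackNames() {
  return new Error().stack
    .split('\n')
    .slice(2) // skip the "Error" header line and currentStackNames itself
    .map((line) => (line.match(/at ([\w$.<>]+) \(/) || [])[1])
    .filter(Boolean);
}

function findJinny() { return currentStackNames(); }
function goToTheCave() { return findJinny(); }
function becomeAPrince() { return goToTheCave(); }
function findAFriend() {}
function startDndGame() {
  findAFriend(); // already finished, so it is no longer on the stack
  return becomeAPrince();
}

// The first four entries are the functions from the example, innermost first.
console.log(startDndGame().slice(0, 4));
```

As in the debugger, findAFriend never shows up: by the time findJinny runs, it has already returned and been removed from the stack.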

What are microtasks?

Microtasks are specific: they can only be promise callbacks or MutationObserver callbacks. Microtasks are a kind of hack, and they have both pros and cons compared with regular tasks. The main feature of microtasks is that they are executed as soon as the call stack becomes empty. For instance, we may have this call stack:

An image from Notion

If we have a promise in the fulfilled or rejected state, its callback will be executed as soon as all the functions in the stack have finished. Any JS code is registered in the call stack (which is logical). The end of the call stack is the end of the task or microtask. Here we have an interesting fact: microtasks can create new microtasks, which will be executed when the call stack empties. In other words, page render may be postponed forever. This is the main "feature" of microtasks:

An image from Notion

If we have 4 microtasks in MicroTaskQueue, they will be executed one after another. The render will be executed only after these 4 microtasks, even though they may take seconds. All this time the user won't be able to work with the website. This feature of microtasks can be both an advantage and a disadvantage. For example, when MutationObserver calls its callback after a DOM change, the user won't see the changes on the page before the callback completes. Thereby, we can effectively manage the content the user sees. Our event loop schema:

An image from Notion

We've already figured out that tasks can follow each other, skipping the RenderQueue (if there are no tasks in RenderQueue).
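The fact that microtasks can keep creating new microtasks and hold back everything else can be checked with a bounded experiment. This is a minimal sketch, assuming a standard JS runtime; starveTaskDemo is a hypothetical name, and the chain is deliberately limited by count so that the demo terminates.

```javascript
// Schedules one regular task and a chain of `count` microtasks,
// each created by the previous one, and records the execution order.
function starveTaskDemo(count) {
  const order = [];
  return new Promise((resolve) => {
    setTimeout(() => {
      order.push('task');
      resolve(order);
    }, 0);

    let remaining = count;
    function microtask() {
      order.push('microtask');
      remaining -= 1;
      if (remaining > 0) {
        Promise.resolve().then(microtask); // a microtask queues another microtask
      }
    }
    Promise.resolve().then(microtask);
  });
}

starveTaskDemo(3).then((order) => console.log(order.join(' -> ')));
// microtask -> microtask -> microtask -> task
```

The timer callback waits until the whole chain is done, however long the chain is. Rendering waits in the same way, so an unbounded chain would freeze the page.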

What is executed inside RenderQueue?

Each frame render may be divided into several stages, and each stage may be divided into substages. We will see this in the Layout example.

An image from Notion

Let's dwell on each stage in more detail:

RequestAnimationFrame (raf)

An image from Notion

When the browser is ready to start rendering, we can subscribe to this moment and calculate or prepare the frame for the animation step. This callback suits animations well, as well as planning DOM changes right before the frame renders. Some interesting facts:
1) Raf's callback has an argument: DOMHighResTimeStamp, which is the number of milliseconds passed since the "time origin" (the start of the document lifetime). Therefore you may not need to call performance.now() inside the callback, you already have it;
2) raf returns a descriptor (id), hence you can cancel raf using cancelAnimationFrame (like setTimeout);
3) If the user switches to another tab or minimizes the browser, there is no re-render, which means there is no raf either;
4) JS code that changes the size of elements or reads element properties may force the next rendering steps (style and layout) to run early, without waiting for the frame;
5) How to check how often the browser renders frames? This code would help:

const checkRequestAnimationDiff = () => {
	let prev;
	function call() {
		requestAnimationFrame((timestamp) => {
			if (prev) {
				console.log(timestamp - prev); // It should be around 16.6 ms for 60FPS
			}
			prev = timestamp;
			call();
		});
	}
	call();
}
checkRequestAnimationDiff();

Here is my experiment on hh.ru:

An image from Notion

6) Safari calls (or used to call) raf after the frame is rendered. It is the only browser with this different behavior. https://github.com/whatwg/html/issues/2569#issuecomment-332150901

Style (recalculation)

An image from Notion

The browser recalculates the styles that should be applied. This step also calculates which media queries will be active. The recalculation covers both direct changes, such as a.style.left = '10px', and changes described through CSS files, such as element.classList.add('my-styles-class'). They will all be recalculated in terms of CSSOM and render tree production. If you run the profiler and open the hh.ru website, this is where you can find the time spent on Style:

An image from Notion

Layout

An image from Notion

The browser calculates layers, element positions, their sizes, and their mutual influence on each other. The more DOM elements on the page, the harder the operation is. To show how the time is spent, some browsers divide the process into subprocesses. For example, in Chrome you can see Update Layer Tree and Layout Shift calls. Layout Shift is in charge of shifting elements relative to each other.

An image from Notion

Layout is quite a painful operation for modern websites. Layout happens each time you:
1) Read properties associated with the size and position of an element (offsetWidth, offsetLeft, getBoundingClientRect, etc.)
2) Write properties associated with the size and position of elements, with some exceptions (like transform and will-change). transform operates in the composition process. will-change signals to the browser that changes to the property should be calculated in the composition stage. Here you can check the actual list of the reasons for that: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/core/paint/compositing/compositing_reason_finder.cc;l=39

Layout is in charge of:
1. Calculating layouts
2. Elements interposition on the layer

Layout (with or without raf or style) may happen not only in its turn, when the browser is about to render the page and apply the changes, but also when JS has resized elements or read properties. This process is called forced layout. The full list of properties that force layout: https://gist.github.com/paulirish/5d52fb081b3570c81e3a.

Important note: when layout is forced, the browser pauses JS in the main thread even though the call stack isn't empty. Let's check it with an example:

div1.style.height = "200px"; // Change element size
var height1 = div1.clientHeight; // Read property

The browser cannot calculate clientHeight of our div1 without recalculating its real size. In this case, the browser pauses JS execution and runs Style (to check what has changed) and Layout (to recalculate sizes). Layout recalculates not only the elements placed before our div1, but the ones after it as well. Modern browsers optimize the calculation so that the whole DOM tree isn't recalculated each time, but it still happens in bad cases. The process of recalculation is called Layout Shift. In the screenshot you can see the list of elements that will be modified and shifted during layout:

An image from Notion

Browsers try not to run layout after every change, so they group operations:

div1.style.height = "200px";
var height1 = div1.clientHeight; // <-- layout 1
div2.style.margin = "300px";
var height2 = div2.clientHeight; // <-- layout 2 

On the first line, the browser just plans the height change. On the second line, the browser gets a request to read a property. As there is a pending height change, the browser has to force layout. The same happens on the 3rd and 4th lines. To make it easier for the browser, we can group read and write operations:

div1.style.height = "200px";
div2.style.margin = "300px";
var height1 = div1.clientHeight; // <-- layout 1
var height2 = div2.clientHeight;

By grouping operations, we get rid of the second layout, because when the browser reaches the 4th line it already has all the needed data. Thus, our event loop changes from a single loop to several, as we can force layout in both the task and microtask stages:

An image from Notion

Some advice on how to optimize layout:
1. Reduce the number of DOM nodes
2. Group read \ write operations to get rid of unnecessary layouts
3. Replace operations that force layout with operations that force composite
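Grouping reads and writes by hand gets harder as the code base grows, so the idea can be wrapped into a small helper. The sketch below is only one possible approach, not an established API: createDomBatcher, measure and mutate are hypothetical names, and schedule is assumed to be requestAnimationFrame in a browser (it is a parameter so the helper can be tried without a DOM).

```javascript
// Collects DOM writes and reads and runs them in two groups per flush:
// first all writes, then all reads, so only the first read can force a layout.
function createDomBatcher(schedule) {
  let writes = [];
  let reads = [];
  let scheduled = false;

  function flush() {
    const pendingWrites = writes;
    const pendingReads = reads;
    writes = [];
    reads = [];
    scheduled = false;
    pendingWrites.forEach((fn) => fn()); // e.g. div1.style.height = "200px"
    pendingReads.forEach((fn) => fn());  // e.g. div1.clientHeight
  }

  function request() {
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  }

  return {
    mutate(fn) { writes.push(fn); request(); },
    measure(fn) { reads.push(fn); request(); },
  };
}

// Browser usage (assumes div1 and div2 exist on the page):
// const batcher = createDomBatcher(requestAnimationFrame);
// batcher.mutate(() => { div1.style.height = '200px'; });
// batcher.measure(() => console.log(div1.clientHeight));
// batcher.mutate(() => { div2.style.margin = '300px'; });
// batcher.measure(() => console.log(div2.clientHeight));
```

Callers can register reads and writes in any order; the helper reorders them into the "all writes, then all reads" shape from the grouped example above.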

Paint

An image from Notion

We have the element, its position in the viewport, and its size. Now we have to apply colors and backgrounds, that is to say, to "draw" it.

An image from Notion

This operation usually doesn't take much time; however, it may be long during the first render. After this step we should "physically" draw the frame. The last operation is "Composition".

Composition

An image from Notion

Composition is the only stage that runs on the GPU by default. In this step the browser handles only specific CSS styles like "transform". Important note: transform: translate doesn't "turn on" rendering on the GPU. So, if you have transform: translateZ(0) in your codebase to move rendering to the GPU, it doesn't work that way. It's a misconception. Modern browsers can move part of the operations to the GPU on their own. I didn't find an up-to-date list for that, so it's better to check the source code: https://source.chromium.org/chromium/chromium/src/+/master:third_party/blink/renderer/core/paint/compositing/compositing_reason_finder.cc;l=39

An image from Notion

transform is the best choice for complex animations:
1) We don't force layout each frame, so we save CPU time
2) These animations are free of artifacts (blurriness) and the small lags you may notice when a website implements animations through top, right, bottom, left.
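As an illustration of the first point, here is a minimal sketch of a JS-driven animation that writes only transform. The function slideWithTransform is a hypothetical helper; requestFrame defaults to the browser's requestAnimationFrame and is a parameter only so that the sketch can be exercised outside a browser. The movement is linear, without easing.

```javascript
// Moves `element` horizontally by `distancePx` over `durationMs`, writing only transform.
function slideWithTransform(element, distancePx, durationMs, requestFrame = requestAnimationFrame) {
  let start;
  function step(timestamp) {
    // The timestamp comes from raf, so there is no need for performance.now().
    if (start === undefined) start = timestamp;
    const progress = Math.min((timestamp - start) / durationMs, 1);
    // Writing transform does not invalidate layout, unlike writing left or top.
    element.style.transform = `translateX(${progress * distancePx}px)`;
    if (progress < 1) requestFrame(step);
  }
  requestFrame(step);
}

// Browser usage (assumes an element with id "popup" exists):
// slideWithTransform(document.getElementById('popup'), 300, 500);
```

For simple cases a CSS transition on transform does the same job without any JS; the sketch is mainly useful when the animation has to be driven from code.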

How to optimize render?

The most difficult operation in frame rendering is layout. When you have a complex animation, each render may require shifting all the DOM elements, which is inefficient, as you'd spend 13-20ms (or even more). You will lose frames and, hence, your website's performance. Several examples:

An image from Notion

We may skip the layout phase if we change only colours, background images, etc.

An image from Notion

We may need neither layout nor paint when we use transform and don't read properties from our DOM elements. You may cache those properties and store them in memory.

Summing up, here is some advice:

1) Move animations from JS to CSS. Running additional JS code is not "for free"
2) Animate transform for "moving" objects
3) Use the will-change property. It allows browsers to "prepare" DOM elements for property mutations: it simply tells the browser that the developer is about to change the property. https://developer.mozilla.org/en-US/docs/Web/CSS/will-change
4) Batch changes to the DOM
5) Use requestAnimationFrame to plan changes in the next frame
6) Combine read \ write operations on element CSS properties, use memoization
7) Pay attention to properties that force layout: https://gist.github.com/paulirish/5d52fb081b3570c81e3a
8) In a non-trivial situation it's better to run the profiler and check frequency and timings. It shows you which phase is slow
9) Optimize step by step, do not try to do everything at once

Thanks for your attention :)

In this part, we found out how the runtime works in our browsers, along with its pros and cons. It allows us to:
1. Understand how to write better code
2. Diagnose problems when we see lags.

An image from Notion