Async & Multithread - 1 Async

Table of Contents for the whole series

1. What is asynchronous programming? (JavaScript)
1.1 How to perform asynchronous programming in JavaScript?
2. Asynchronous programming in other languages
3. Multithreading programming (C++)
a. What is multithreading? What is its relationship with hardware?
b. What does it mean to use multithreading?
c. What is the goal of using multithreading?
4. Multithreading programming in other languages
5. The relationship between multithreading and asynchronous programming (mainly discussing goroutines, analyzing the source code, and examining whether Rust has made improvements)
6. GPU parallel programming

Introduction

Background (Asynchronous & Multithreading)

This series mainly discusses asynchronous and multithreading programming. Both asynchronous and multithreading programming aim to fully utilize computing resources.

As single-core CPU clock speeds hit a bottleneck, the industry shifted towards multi-core CPUs. In recent years, parallel matrix operations on GPUs have also greatly accelerated the development of machine learning and LLMs. This series mainly focuses on CPUs and leaves GPUs aside for now; the main goal is to introduce asynchronous and multithreading programming and how they relate to each other.

So, what is asynchronous programming? How is it related to multithreading? What is the relationship between multithreading and hardware? And how are asynchronous & multithreading programming related to parallel computing? We will address these questions throughout this series.

(If I discover errors or new viewpoints, I will update the published articles or change the structure of this series.)

Experimental Environment

OS: Windows 11 21H2 22000.2538

Node.js: 20.13.1

What is asynchronous programming?

What is asynchronous?

The opposite of asynchronous is synchronous. Let's look at an everyday example: I go to the post office to mail a letter asking a friend to lend me money, and after receiving the cheque, I deposit it at the bank. I can do this either synchronously or asynchronously:

Figure 1. Basic Introduction to Async and Sync

I wrote a sample program in Node.js to compare the two approaches (source code: MailDemo). Here are the execution results:

Figure 2. Synchronized Mailing Demo

Figure 3. Asynchronized Mailing Demo

So the biggest difference between synchronous and asynchronous is whether I (the sender), after handing the letter to the post office, wait there or go do other things.

From this example, the asynchronous mailing process is clearly more reasonable. The synchronous version raises an obvious question: why do I need to wait at the post office after handing over the letter? I could be eating, shopping, watching movies, or doing many other things. When my friend replies, the post office can call me, and I can then pick up the letter.

So waiting at the post office is an unreasonable way to spend time. The term blocking often comes up with synchronous programming: the caller does nothing but wait for a result. That is unreasonable in real life and in programming alike, and asynchronous programming solves this problem.
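The blocking vs. non-blocking difference can be sketched in a few lines of Node.js. This is not the MailDemo source from the figures: mailSync(), mailAsync(), and the 50 ms delay are hypothetical stand-ins, with a busy-wait loop playing "waiting at the post office" and setTimeout() playing the post office calling me back later.

```javascript
// Hypothetical mailing sketch: `delayMs` stands for how long the post office
// takes to bring back the friend's reply.
const log = [];

// Synchronous: stand at the post office (busy-wait) until the reply arrives.
function mailSync(delayMs) {
  const end = Date.now() + delayMs;
  while (Date.now() < end) {
    // blocking: nothing else can run on this thread meanwhile
  }
  log.push("sync: deposit cheque");
}

// Asynchronous: hand the letter over and register what to do with the reply.
function mailAsync(delayMs, onReply) {
  setTimeout(() => onReply("cheque"), delayMs);
}

mailSync(50);
log.push("sync: go eating"); // only possible after the reply

mailAsync(50, (cheque) => log.push(`async: deposit ${cheque}`));
log.push("async: go eating"); // runs right away, before the reply
```

Once the timer has fired, `log` holds the two synchronous steps in order, followed by "async: go eating" and only then "async: deposit cheque": the asynchronous sender went eating before the reply came back.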

In the industry, there is a field called operations research, which aims to keep resources (including machines, labor, etc.) running efficiently to complete more tasks in the same amount of time. (I hope everyone can enjoy life while earning money, and not run like machines. Operations research shouldn't apply to humans.)

Back to programming, let's look at the sequence diagrams of the synchronous and asynchronous versions.

Figure 4. Synchronized Mailing Sequence Diagram

Figure 5. Asynchronized Mailing Sequence Diagram

These two sequence diagrams make it even clearer why the asynchronous version is more reasonable: while the letter is in the mail, I can eat, shop, and do many other things instead of waiting at the post office.

What are the characteristics of this mailing activity?
1. It's handled by someone else (the post office), not by me.
2. Waiting for the reply takes a relatively long time.

Asynchronous Programming

In programming, there are some operations similar to mailing a letter, such as file reading/writing, network requests (TCP, HTTP, DNS), etc.

File reading/writing: The disk controller helps us find the content we want.
Network requests: The server processes the request.
Timer: The timer simply waits for a period of time; there is no result to wait for.

The disk controller and server are the third parties, equivalent to the post office in the previous example. The time it takes to handle these tasks depends on them, and I can't control it.

To summarize: Asynchronous programming allows the main thread to perform other tasks while waiting for some tasks to be handled by a third party.
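A real asynchronous I/O call in Node.js shows this directly. The file name async-demo-letter.txt and its content are made up for the example, and writeFileSync() is only setup; fs.readFile() hands the read off and calls the callback once the data is ready, while the main thread moves straight on to the next line.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Setup: create a small file so the example is self-contained.
const file = path.join(os.tmpdir(), "async-demo-letter.txt");
fs.writeFileSync(file, "Dear friend, could you lend me some money?");

const order = [];

// Hand the read over and register a callback for the result.
fs.readFile(file, "utf8", (err, text) => {
  if (err) throw err;
  order.push(`read finished: ${text}`);
});

// The main thread does not wait for the disk; this line runs first.
order.push("main thread keeps working");
```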

Asynchronous Programming in JavaScript

Why do I want to discuss asynchronous programming in JavaScript?

Because JavaScript was designed to run on a single thread, follows an event-driven model in the browser, and makes frequent network requests (such as Ajax). If anything blocks that thread, the browser cannot handle user clicks or render the page in time, which leads to an unresponsive UI and a poor user experience. (In the browser, JavaScript's main thread is supported by some auxiliary threads.)

Why not make it multithreaded? Multithreaded code is extremely complicated to get right and gives developers many more headaches; by comparison, the asynchronous model is much simpler to write.

For example, suppose a web page in the browser calls a REST API (HTTP), Mail, which takes 2 seconds to return data.
If it's synchronous: during these 2 seconds, the page does not respond to user clicks, because the main thread can only wait for Mail to return and cannot do anything else.
If it's asynchronous: during these 2 seconds, the main thread is free to handle user input and render the page, so the UI stays responsive. If Mail's data happens to arrive at 1.2 seconds, it does not interrupt whatever task is running: JavaScript puts Mail's callback function into the event loop's queue, and the event loop takes it out and runs it once the current task has completed.
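The claim that an arriving callback waits for the current task can be checked directly. In this hypothetical sketch, a 20 ms setTimeout() stands in for the Mail request and a 100 ms busy loop stands in for the task that happens to be running when the reply arrives; both timings are assumptions chosen only to make the effect visible.

```javascript
const start = Date.now();
const events = [];

// Simulated "Mail" request whose reply is ready after 20 ms.
setTimeout(() => {
  events.push({ name: "Mail callback", at: Date.now() - start });
}, 20);

// The current task (say, handling a click) runs for 100 ms without yielding.
while (Date.now() - start < 100) {
  // busy work
}
events.push({ name: "current task done", at: Date.now() - start });
```

The Mail callback runs after the busy loop, at roughly 100 ms rather than 20 ms: the reply did not interrupt the running task, it waited in the queue until that task finished.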

We encountered two new terms: callback function and event loop. Let's explain how JavaScript uses them to achieve asynchronous programming.

Callback function

From now on, we will only talk about the asynchronous model. Please forget the synchronous model.

What is a callback function? As shown in Figure 5, "Saved the cheque to my bank account" is a callback function because it can only be executed after my friend sends back the cheque, and the caller (notifier) should be the third-party post office because it knows precisely when the reply letter arrives.

Let's continue with the mailing example, this time from the code side (source code: MailDemo CallBack).

Figure 6. CallBack code in mailing example

We can see that callBackFromPostOffice() is the callback function: the post office sends me a message after receiving the reply letter, and then I call this function myself. (My demo uses multiple threads, so it is important that the post office communicates through messages instead of calling main-thread functions directly from the new thread. JavaScript itself is single-threaded.)
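The message-passing idea can be sketched with Node.js's worker_threads module. This is an assumption about how such a demo could look, not the MailDemo CallBack code itself: the worker thread plays the post office and only posts a message, and callBackFromPostOffice() runs on the main thread when that message event is delivered. The 50 ms delay and the { cheque: 100 } payload are made up.

```javascript
const { Worker } = require("node:worker_threads");

// Hypothetical "post office" running in its own thread: it never calls
// main-thread functions, it only posts a message when the reply arrives.
const postOfficeCode = `
  const { parentPort } = require("node:worker_threads");
  setTimeout(() => parentPort.postMessage({ cheque: 100 }), 50);
`;

const received = [];

function callBackFromPostOffice(reply) {
  // Runs on the main thread, triggered by the "message" event.
  received.push(`deposit ${reply.cheque}`);
}

const postOffice = new Worker(postOfficeCode, { eval: true });
postOffice.on("message", callBackFromPostOffice);

// The main thread is free to do other things meanwhile.
received.push("main thread: go shopping");
```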

My demo uses a single event, so after the post office receives the reply letter, it directly calls the only callback function. However, a typical user interface is filled with various events. Where are these callback functions stored? How are they handled?

JavaScript uses a queue (first in, first out) to store these callback functions. The event loop constantly checks if the post office has received a letter, and if it has, it inserts the callback function callBackFromPostOffice() into the queue, then checks if there are waiting callback functions in the queue, and executes them.

To summarize: A callback function processes the result data after receiving it (depositing the friend's cheque into the bank).

Event loop

If I were to name it, I might call it TaskLoop because its core function is to execute callback functions continuously. If we consider each callback function as a task, most of the time is spent handling tasks. Of course, it also checks if the friend's mail has arrived (if the IO operation is complete or if the timer has counted down).

JavaScript was originally used in browsers, and most tasks come from user actions: clicks, mouseovers, page loads, etc. These are all events, so the loop that handles them is called the EventLoop.

I changed the previous mailing example to use an event loop (source code: MailDemo EventLoop).

Figure 7. EventLoop in mailing example

In fact, after we execute eventLoop.addTask(), callBackFromPostOffice() will not execute immediately. It must wait for eventLoop.run() to check the queue and find that there are tasks (callback functions) waiting to be executed before it runs.
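The addTask()/run() behaviour described above fits in a few lines. The class below is a hypothetical reconstruction for illustration; only the names eventLoop.addTask(), eventLoop.run(), and callBackFromPostOffice() come from the demo, and the actual code in Figure 7 may differ. In this sketch, run() also processes tasks added while it is running and returns once the queue is empty.

```javascript
// Hypothetical toy event loop; only the method names come from the demo.
class EventLoop {
  constructor() {
    this.queue = []; // FIFO queue of tasks (callback functions)
  }

  addTask(task) {
    this.queue.push(task); // enqueue only; nothing runs yet
  }

  run() {
    // Take tasks from the front until none are left, including tasks
    // that other tasks add while the loop is running.
    while (this.queue.length > 0) {
      const task = this.queue.shift();
      task();
    }
  }
}

const trace = [];
const eventLoop = new EventLoop();

function callBackFromPostOffice() {
  trace.push("callBackFromPostOffice: deposit cheque");
  eventLoop.addTask(() => trace.push("follow-up task: thank my friend"));
}

eventLoop.addTask(callBackFromPostOffice);
trace.push("after addTask: callback has not run yet");
eventLoop.run();
```

After run() returns, `trace` lists "after addTask" first: adding a task only queues it, and the callback runs when run() finds it in the queue.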

The JavaScript EventLoop in the browser and in Node.js is a more elaborate version of this example. (JavaScript itself is just a language; parsing it and providing the underlying functionality is the job of a runtime such as Chrome or Node.js, both of which use the V8 engine for the language itself.) Their task types and trigger times are more complex than in this example, and the browser and Node.js also differ from each other.

What is the event loop, then? When a function needs to run asynchronously, its callback is put into a queue, and the event loop keeps checking the queue and executing any tasks it finds. For asynchronous I/O such as file and network operations, it also regularly checks whether the operation has completed and, if so, puts the callback into the queue. Next, let's see how Node.js implements this.
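The two duties described here, running queued callbacks and checking whether I/O has completed, can be combined in one toy loop. It is a sketch under a simplifying assumption: real I/O is replaced by a tick counter, so an operation "completes" when the loop reaches its readyAtTick. runLoop(), readyAtTick, and the two operations are hypothetical.

```javascript
// Hypothetical toy event loop. Real I/O is replaced by a tick counter:
// an operation "completes" once the loop reaches its readyAtTick.
function runLoop(pendingOps) {
  const queue = []; // callbacks whose operations have completed
  const log = [];
  let tick = 0;

  // Keep looping while something is still pending or queued.
  while (pendingOps.length > 0 || queue.length > 0) {
    // 1. Poll: move callbacks of completed operations into the queue.
    const done = pendingOps.filter((op) => op.readyAtTick <= tick);
    pendingOps = pendingOps.filter((op) => op.readyAtTick > tick);
    for (const op of done) queue.push(op.callback);

    // 2. Run all queued callbacks in FIFO order.
    while (queue.length > 0) {
      const callback = queue.shift();
      log.push(callback(tick));
    }
    tick++;
  }
  return log; // nothing left to wait for: the loop exits
}

const result = runLoop([
  { readyAtTick: 3, callback: (t) => `file read done at tick ${t}` },
  { readyAtTick: 1, callback: (t) => `DNS reply at tick ${t}` },
]);
console.log(result); // [ 'DNS reply at tick 1', 'file read done at tick 3' ]
```

Callbacks run in the order their operations complete, not the order they were registered, and the loop returns as soon as nothing is pending.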

Node.js

Let's look at the flowchart first. The event loop and asynchronous operations in Node.js are managed by libuv (the UV library). The code behind Figure 8 is libuv's uv_run() function (source: Node.js EventLoop).

Figure 8. EventLoop in Node.js

Figure 9 corresponds to the source code of the Node.js loop's data structures (Linux platform), and Figure 8 to the source code of the Node.js EventLoop itself.

Figure 9. Main relationship of the EventLoop in Node.js

The EventLoop exits when there is nothing left to wait for: no pending I/O operations, no active timers, and no callbacks left to run.

Let's talk about each step of this event loop (timers=>Pending Queue=>Idle Handlers=>Prepare Handlers=>IO Poll=>Check=>Close):

timers: Executes the callback functions set by setTimeout() and setInterval(). There are two interesting things here:

Figure 9.1 EventLoop Timer in Node.js

The data structure behind timers is a min-heap. A heap is only partially sorted, which is all the timers phase needs: each time, it takes out the smallest timer, i.e. the one that expires soonest (see the source code related to the EventLoop timer).
Let's review the time complexity of heap sort, which has two main steps: building the heap and removing elements. With n elements, building the heap takes O(n) with bottom-up heapify (O(n log n) if the elements are inserted one by one), and removing all elements, re-adjusting the heap after each removal, takes O(n log n), i.e. O(log n) per removal. A heap is a partial sort that does a little work each time one element is taken out, whereas other sorting algorithms sort everything once up front, after which taking out all n elements costs O(n), or O(1) each.
Node.js timers are based on polling, so a timer callback can be delayed by other callbacks, even by another timer's callback. A timer set with setTimeout() therefore fires no earlier than the requested delay, but possibly later. Building very precise timers is another topic.
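To make the min-heap idea concrete, here is a minimal binary min-heap keyed by expiry time, plus a timers phase that runs every timer already due, nearest first. This is a JavaScript sketch of the technique, not libuv's C implementation; TimerHeap, runDueTimers(), and the timer objects are hypothetical. push() and pop() cost O(log n), and peek() is O(1).

```javascript
// Minimal binary min-heap ordered by `expiry` (a sketch, not libuv's code).
class TimerHeap {
  constructor() {
    this.items = [];
  }

  get size() {
    return this.items.length;
  }

  peek() {
    return this.items[0]; // nearest timer, O(1)
  }

  push(timer) {
    const a = this.items;
    a.push(timer);
    // Sift up: swap with the parent while the parent expires later.
    let i = a.length - 1;
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (a[parent].expiry <= a[i].expiry) break;
      [a[parent], a[i]] = [a[i], a[parent]];
      i = parent;
    }
  }

  pop() {
    const a = this.items;
    const top = a[0];
    const last = a.pop();
    if (a.length > 0) {
      // Move the last element to the root, then sift it down.
      a[0] = last;
      let i = 0;
      for (;;) {
        const l = 2 * i + 1;
        const r = l + 1;
        let smallest = i;
        if (l < a.length && a[l].expiry < a[smallest].expiry) smallest = l;
        if (r < a.length && a[r].expiry < a[smallest].expiry) smallest = r;
        if (smallest === i) break;
        [a[smallest], a[i]] = [a[i], a[smallest]];
        i = smallest;
      }
    }
    return top;
  }
}

// Timers phase: run every timer whose expiry is <= now, nearest first.
function runDueTimers(heap, now) {
  const fired = [];
  while (heap.size > 0 && heap.peek().expiry <= now) {
    fired.push(heap.pop().name);
  }
  return fired;
}

const heap = new TimerHeap();
heap.push({ name: "t300", expiry: 300 });
heap.push({ name: "t100", expiry: 100 });
heap.push({ name: "t200", expiry: 200 });
const firstRound = runDueTimers(heap, 250);
console.log(firstRound); // [ 't100', 't200' ]
```

The timers phase only ever looks at the top of the heap, so it can stop as soon as the nearest remaining timer (t300 here) is not yet due.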

Pending Queue: Executes I/O callbacks deferred from the IO Poll phase. Usually, when the IO Poll phase finds that an I/O operation is complete, it runs the corresponding callback immediately, but in some cases (certain error reports, for example) the callback is deferred and placed in the Pending Queue to be processed in the next round. (In fact, the current code also runs the Pending Queue up to 8 times right after the IO Poll phase. I'm also curious where the number 8 comes from; see the middle part of the code in Figure 8.)

Idle Handlers & Prepare Handlers: Used internally by libuv (the UV library).

IO Poll: Here, the loop checks whether pending I/O operations are complete and, if so, executes the corresponding callback functions. If further processing is needed, it creates new callback functions and places them in the Pending Queue. For example, it checks whether a file read is complete, whether an HTTP API request has returned, or whether a DNS query has returned.

This step is crucial and has a large impact on performance. On Linux, libuv uses epoll, the mechanism behind Nginx's well-known high performance; Windows has the corresponding I/O completion ports (IOCP), and macOS has kqueue. I will introduce these details in the next article.

Check: Execute the callback functions set by setImmediate().

Close: Executes the callbacks for close events, such as socket.on('close', ...). These are related to network operations, not file operations.
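The position of the Check phase right after IO Poll has a visible consequence. In this sketch, fs.stat() is used only to get into an I/O callback; from there, the setImmediate() callback (Check phase of the same iteration) runs before the setTimeout(..., 0) callback (timers phase of the next iteration).

```javascript
const fs = require("node:fs");

const order = [];

// Inside an I/O callback we are in the IO Poll phase; the Check phase
// (setImmediate) comes before the loop wraps around to the timers phase.
fs.stat(".", () => {
  setTimeout(() => order.push("setTimeout"), 0);
  setImmediate(() => order.push("setImmediate"));
});
```

Outside an I/O callback, for example directly in the main module, the relative order of setTimeout(..., 0) and setImmediate() is not guaranteed, because it depends on how quickly the loop reaches the timers phase.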

Microtasks

In JavaScript, besides using setTimeout(), setInterval(), and setImmediate() to schedule asynchronous callbacks, you can also use process.nextTick(), Promises, and async/await (syntax built on top of Promises) to have any function you want run asynchronously by the event loop. I will cover the details of Promise and async/await in the next article.

Functions inserted by process.nextTick() and Promise.then() are called microtasks. The other tasks mentioned in the previous section (setTimeout(), setInterval(), IO, and setImmediate()) are called macro tasks. When are microtasks and macro tasks executed?

Figure 10. Tasks Execution Process in Node.js

We can see that microtasks have a very high priority: they run immediately after the synchronous code finishes. Moreover, after each macro task (a setTimeout() or setInterval() callback, an I/O Poll callback, and so on) is executed, the microtask queue is checked, and any microtasks in it run immediately.

Note:

The priority of functions inserted by process.nextTick() is higher than those inserted by Promise.then().
The microtask queue is checked after each task is executed, not after the entire macro task queue has been executed (this is the behavior since Node.js 11; see the related pull request).

I wrote a program to check the execution order of different asynchronous tasks (see the source code).

Figure 11. Tasks Execution Order in Node.js

Comparing the output with the flowchart in Figure 10, the two match completely.

Browser

The event loop in the browser is a bit different:

There is no process.nextTick().
There is MutationObserver(), mainly used to watch for changes to DOM elements and their attributes; its callbacks are also microtasks.
There are rendering tasks. (The browser also has a separate compositor thread that takes over part of the rendering work sent by the main thread, so not all rendering work blocks the main thread.)
Executing the synchronous code (the initial script) counts as the first macro task.
In the event loop, the execution order is: one macro task => all microtasks => a rendering update, when the browser decides one is due (typically once per display frame). This is similar to Node.js, which also drains the microtask queue after each macro task; the differences are that Node.js runs process.nextTick() callbacks before Promise callbacks and has no rendering tasks.
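The rule "one macro task, then all microtasks" also covers microtasks queued by other microtasks. The sketch below uses queueMicrotask(), which exists in both browsers and Node.js; the labels are arbitrary, and the rendering step is not shown.

```javascript
const order = [];

setTimeout(() => order.push("next macrotask"), 0);

queueMicrotask(() => {
  order.push("microtask 1");
  // A microtask queued by a microtask still runs before the next macrotask.
  queueMicrotask(() => order.push("microtask queued by microtask 1"));
});

order.push("current macrotask (script)");
```

`order` ends up as the script first, then both microtasks, and only then the timer callback: the microtask queue is emptied completely, including newly added entries, before the next macro task starts.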

Summary

In this article, we mainly introduced what asynchronous programming is. Starting from a mailing example, moving on to callback functions, and ending with the event loop in Node.js, we went from an everyday view of asynchrony down to the code level.

In the next article, we will discuss how to perform asynchronous programming in JavaScript and some related underlying implementations.

If you find any errors in my article or have any new viewpoints, feel free to contact me or leave a comment.
