Async Rust in Three Parts

2024 October 23

Async/await, or "async IO", is a new-ish language feature that lets our programs do more than one thing at a time. (Rust added async/await in 2019. For comparison, C# added it in 2012, Python in 2015, JS in 2017, and C++ in 2020.) It's sort of an alternative to multithreading, though Rust programs often use both. Async is popular with websites and network services that handle many connections at once, because running lots of "futures" or "tasks" is more efficient than running lots of threads. ("Many" here conventionally means ten thousand or more. This is sometimes called the "C10K problem", short for 10,000 clients or connections.)

This series is an introduction to futures, tasks, and async IO in Rust. Our goal will be to get a good look at the machinery, so that async code doesn't feel like magic. We'll start by translating ("desugaring") async examples into ordinary Rust, and gradually we'll build our own async "runtime". (For now, a "runtime" is a library or framework that we use to write async programs. Building our own futures, tasks, and IO will gradually make it clear what a runtime does for us.) I'll assume that you've written some Rust before and that you've read The Rust Programming Language ("The Book") or similar. (The multithreaded web server project in Chapter 20 is especially relevant.)

Let's get started by doing more than one thing at a time with threads. This will go well at first, but we'll run into trouble as the number of threads grows. Then we'll get the same thing working with async/await, to see what all the fuss is about. That'll give us some example code to play with, and in Part One we'll start digging into it.

Threads

Here's an example function foo that takes a second to run:

use std::thread;
use std::time::Duration;

fn foo(n: u64) {
    println!("start {n}");
    thread::sleep(Duration::from_secs(1));
    println!("end {n}");
}

If we want to make several calls to foo at the same time, we can spawn a thread for each one. You'll probably see the "start" and "end" prints appear out of order. Different threads running at the same time run in an unpredictable order, and that can be true of futures too. Run it on the Playground to see that this takes one second instead of ten:

fn main() {
    let mut thread_handles = Vec::new();
    for n in 1..=10 {
        thread_handles.push(thread::spawn(move || foo(n)));
    }
    for handle in thread_handles {
        handle.join().unwrap();
    }
}

Note that join here means "wait for the thread to finish". Threads start running in the background as soon as we call spawn, so all of them are making progress while we wait on the first one, and the rest of the calls to join return quickly.
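If you'd rather measure the "one second instead of ten" claim than eyeball it, here's a minimal sketch that times the same loop. The run_on_threads helper and the nap parameter aren't part of the example above; they're assumptions added here so that the sleep duration is adjustable.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Same as `foo` above, except that the sleep duration is a parameter
// (an addition for this sketch).
fn foo(n: u64, nap: Duration) {
    println!("start {n}");
    thread::sleep(nap);
    println!("end {n}");
}

// Hypothetical helper: run `count` calls to `foo` on separate threads and
// return how long it took for all of them to finish.
fn run_on_threads(count: u64, nap: Duration) -> Duration {
    let start = Instant::now();
    let mut thread_handles = Vec::new();
    for n in 1..=count {
        thread_handles.push(thread::spawn(move || foo(n, nap)));
    }
    // Every thread is already sleeping in the background, so waiting on the
    // first handle covers most of the waiting for the rest.
    for handle in thread_handles {
        handle.join().unwrap();
    }
    start.elapsed()
}

fn main() {
    let elapsed = run_on_threads(10, Duration::from_secs(1));
    println!("ten calls took {elapsed:?}");
}
```

If the threads really do sleep at the same time, the elapsed time should land close to one nap rather than ten of them, though the exact figure will vary from machine to machine.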

We can bump this example up to a hundred threads, and it works just fine. But if we try to run a thousand threads, it doesn't work anymore (on my Linux laptop I can spawn almost 19k threads before I hit this crash, but the Playground has tighter resource limits):

thread 'main' panicked at /rustc/3f5fd8dd41153bc5fdca9427e9e05...
failed to spawn thread: Os { code: 11, kind: WouldBlock, message:
"Resource temporarily unavailable" }

Each thread uses a lot of memory, so there's a limit on how many threads we can spawn. (In particular, each thread allocates space for its "stack". The Linux default is 8 MiB, though Rust's standard library currently requests 2 MiB for the threads it spawns. The OS uses fancy tricks to allocate this space "lazily", but it's still a lot if we spawn thousands of threads.) It's harder to see on the Playground, but we can also cause performance problems by switching between lots of threads at once. (Here's a demo of passing "basketballs" back and forth among many threads, to show how thread switching overhead affects performance as the number of threads grows. It's longer and more complicated than the other examples here, and it's ok to skip it.) Threads are a fine way to run a few jobs in parallel, or even a few hundred, but for various reasons they don't scale well beyond that. (A thread pool can be a good approach for CPU-intensive work, but when each job spends most of its time blocked on IO, the pool quickly runs out of worker threads, and there's not enough parallelism to go around.) If we want to run thousands of jobs, we need something different.
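The panic above comes from thread::spawn, which gives up when the OS refuses to create a thread. The standard library's thread::Builder reports the same failure as an io::Result instead. As a rough sketch, assuming a hypothetical helper named spawn_until_failure and an arbitrary 64 KiB stack size, we can count how many idle threads we manage to spawn before hitting a limit:

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical helper: try to spawn up to `limit` idle threads. Returns how
// many were spawned, plus the error that stopped us early, if there was one.
fn spawn_until_failure(limit: usize) -> (usize, Option<io::Error>) {
    let done = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::new();
    let mut error = None;
    for _ in 0..limit {
        let done = Arc::clone(&done);
        // Builder::spawn returns an error instead of panicking. The small
        // stack size is an arbitrary choice for this sketch.
        let result = thread::Builder::new()
            .stack_size(64 * 1024)
            .spawn(move || {
                // Stay alive, doing nothing, until main says we're done.
                // park() can wake up spuriously, so check the flag in a loop.
                while !done.load(Ordering::Acquire) {
                    thread::park();
                }
            });
        match result {
            Ok(handle) => handles.push(handle),
            Err(e) => {
                error = Some(e);
                break;
            }
        }
    }
    let spawned = handles.len();
    // Release all the idle threads and wait for them to exit.
    done.store(true, Ordering::Release);
    for handle in handles {
        handle.thread().unpark();
        handle.join().unwrap();
    }
    (spawned, error)
}

fn main() {
    let (spawned, error) = spawn_until_failure(100);
    println!("spawned {spawned} threads");
    if let Some(e) = error {
        println!("stopped early: {e}");
    }
}
```

Raising the limit far enough should eventually produce the same "Resource temporarily unavailable" error, but where that happens depends on the machine and its resource limits.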

Async

Let's try the same thing with async/await. For now we'll just type it out and run it on the Playground without explaining anything. The async examples in this introduction and in most of Part One will use the Tokio runtime. There are several async runtimes available in Rust, but the differences between them aren't important for this series. Tokio is the most popular and the most widely supported. Our async foo function looks like this:

use std::time::Duration;

async fn foo(n: u64) {
    println!("start {n}");
    tokio::time::sleep(Duration::from_secs(1)).await;
    println!("end {n}");
}

In Parts Two and Three of this series, we'll implement a lot of what #[tokio::main] is doing. Until then we can just take it on faith that it's "the thing we put before main when we use Tokio." Making a few calls to foo one-at-a-time looks like this:

#[tokio::main]
async fn main() {
    foo(1).await;
    foo(2).await;
    foo(3).await;
}

The first thing that's different about async functions is that we declare them with the async keyword, and we write .await when we call them. Fair enough.

Now let's make several calls to foo at the same time. Unlike the version with threads above, you'll always see this version print its start messages in order, and you'll usually see it print the end messages in order too. However, it's possible for the end messages to appear out of order, because Tokio's timer implementation is complicated. It looks like this:

use futures::future; // assumed: join_all comes from the futures crate

#[tokio::main]
async fn main() {
    let mut futures = Vec::new();
    for n in 1..=10 {
        futures.push(foo(n));
    }
    let joined_future = future::join_all(futures);
    joined_future.await;
}

Note that we don't .await each call to foo this time. Each un-awaited foo returns a "future", which we collect in a Vec, kind of like the Vec of thread handles above. But join_all is very different from the join method we used with threads. Previously joining meant waiting on something, but here it means combining multiple "futures" together somehow. We'll get to the details in Part One, but for now we can add some more prints to see that join_all doesn't take any time, and none of the foos start running until we .await the joined future.
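We can also check the "nothing runs until it's awaited" part using only the standard library. The sketch below is a simplification rather than the Tokio example itself. Its foo has no sleep and writes to a shared log instead of printing, and poll_once is a hypothetical stand-in for .await that jumps ahead to machinery from Part One. Don't worry about how it works yet.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A waker that does nothing. These futures never need to be woken up.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

// Hypothetical stand-in for `.await`: run a future that's expected to finish
// immediately. Returns None if the future isn't finished after one poll.
fn poll_once<F: Future>(future: F) -> Option<F::Output> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut context = Context::from_waker(&waker);
    let mut future = pin!(future);
    match future.as_mut().poll(&mut context) {
        Poll::Ready(output) => Some(output),
        Poll::Pending => None,
    }
}

// A simplified `foo` that records its messages in `log` and never sleeps.
async fn foo(n: u64, log: &RefCell<Vec<String>>) {
    log.borrow_mut().push(format!("start {n}"));
    log.borrow_mut().push(format!("end {n}"));
}

// Returns the number of log entries before and after running the futures.
fn count_log_entries() -> (usize, usize) {
    let log: RefCell<Vec<String>> = RefCell::new(Vec::new());
    // Calling foo only creates futures. None of the function bodies run yet.
    let futures: Vec<_> = (1..=3).map(|n| foo(n, &log)).collect();
    let before = log.borrow().len();
    for future in futures {
        poll_once(future).expect("foo never waits, so one poll finishes it");
    }
    let after = log.borrow().len();
    (before, after)
}

fn main() {
    let (before, after) = count_log_entries();
    println!("log entries before running the futures: {before}");
    println!("log entries after running the futures: {after}");
}
```

The log stays empty while the futures are only sitting in the Vec, and it fills up once something actually runs them.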

Unlike the threads example above, this works even if we bump it up to a thousand futures. In fact, if we comment out the prints and build in release mode, we can run a million futures at once. This sort of thing is why async is popular.

Important Mistakes

We can get some hints about how async works if we start making mistakes. First, let's try using thread::sleep instead of tokio::time::sleep in our async function:

async fn foo(n: u64) {
    println!("start {n}");
    thread::sleep(Duration::from_secs(1)); // Oops!
    println!("end {n}");
}

Oh no! Everything is one-at-a-time again. It's an easy mistake to make, unfortunately. (There have been attempts to automatically detect and handle blocking in async functions, but that's led to performance problems, and it hasn't been possible to handle all cases.) As we work through Part One, it'll become clear how thread::sleep gets in the way of async. For now, we might guess that these foo functions running "at the same time" are actually all running on one thread.

We can also try awaiting each future in a loop, the same way we originally joined threads in a loop:

#[tokio::main]
async fn main() {
    let mut futures = Vec::new();
    for n in 1..=10 {
        futures.push(foo(n));
    }
    for future in futures {
        future.await; // Oops!
    }
}

This also doesn't work! What we're seeing is that futures don't do any work "in the background". Instead, they do their work when we .await them. ("Tasks" are futures that run in the background. We'll get to those in Part Two.) If we .await them one-at-a-time, they do their work one-at-a-time. But somehow join_all lets us .await all of them at the same time.

Ok, we've got a lot of mysteries here. Let's start solving them.