Object Soup is Made of Indexes

2023 October 23

When objects come and go and change all the time, and any one might point to any other, I call that "object soup".In other words, object soup is an implicit, heterogeneous, mutable graph in a program that might not look like it's trying to build a graph. It's hard to write object soup in Rust, because it breaks the rules for references.I'm assuming that you've already seen Rust's ownership, borrowing, and mutability rules. If not, here's an overview from a talk by Niko Matsakis, and here's the relevant chapter of The Book. But sometimes it's just how the world works: A creature in a game targets another creature, and then its target disappears. A cell in a spreadsheet depends on another cell, and the other cell's value changes. A song in a music player links to a singer, and the singer links to their songs. These programs are object soup by design, but Rust doesn't let us do things like this with references. So what do we do?

The short answer is: use indexes instead of references. To see why, we'll look at three other approaches that don't work. If you just want to see the code that works, skip to part four.

Our object soup of the day is a toy program that models two friends, Alice and Bob. Here's the Python version:

class Person:
def __init__(self, name):
self.name = name
self.friends = []

def add_friend(self, other):
self.friends.append(other)

alice = Person("Alice")
bob = Person("Bob")
alice.add_friend(bob)
bob.add_friend(alice)

This is simple, boring Python, and it would be simple and boring in most languages.Garbage collection solves a lot of problems, but there are monsters lurking in dark corners. But in Rust it's surprisingly tricky.

Part One: Move Semantics

A naive Rust translation doesn't compile:

struct Person {
name: String,
friends: Vec<Person>,
}

impl Person {
fn new(name: &str) -> Person {
Person { name: name.into(), friends: Vec::new() }
}

fn add_friend(&mut self, other: Person) {
self.friends.push(other);
}
}

fn main() {
let mut alice = Person::new("Alice");
let mut bob = Person::new("Bob");
alice.add_friend(bob);
bob.add_friend(alice); // error: borrow of moved value: `bob`
}

Here's the full error:

error[E0382]: borrow of moved value: `bob`
--> src/main.rs:20:5
|
18 | let mut bob = Person::new("Bob");
| ------- move occurs because `bob` has type `Person`, which does not implement the `Copy` trait
19 | alice.add_friend(bob);
| --- value moved here
20 | bob.add_friend(alice); // error: borrow of moved value: `bob`
| ^^^ value borrowed here after move

Passing Bob to add_friend by value moves him, because Person isn't Copy.Again I'm assuming that you've already seen move semantics in Rust. If not, here's the relevant chapter of The Book, and here's a comparison with move semantics in C++. A quick fix is to add #[derive(Clone)] to Person and to clone each argument to add_friend. But copying or cloning isn't what we want when we're writing object soup. The real Alice and Bob will change over time, and any copies of them will quickly get out of sync.In fact this example is already out of sync. The copy of Bob in Alice's friends list doesn't get updated by the second call to add_friend. A naive C++ translation has the same problem.

Like most garbage-collected languages, Python doesn't have this problem, because it passes objects around "by reference". Can we use references in Rust?

Part Two: Borrowing

No we can't, because Rust doesn't let us mutate objects that are borrowed.The exception to this rule is "interior mutability", and we'll get to that in the next section. If we use shared references:

alice.add_friend(&bob);
bob.add_friend(&alice);

We get a compiler error when we try to modify Bob:

error[E0502]: cannot borrow `bob` as mutable because it is also borrowed as immutable
--> src/main.rs:20:5
|
19 | alice.add_friend(&bob);
| ---- immutable borrow occurs here
20 | bob.add_friend(&alice);
| ^^^^----------^^^^^^^^
| | |
| | immutable borrow later used by call
| mutable borrow occurs here

If we use mutable references, we can avoid aliasing Bob by going through Alice's friends list to modify him:This is worth a shot, but the uniqueness rule means we can't use mutable references for many-to-many relationships, so we definitely can't make object soup out of them in general.

alice.add_friend(&mut bob);
alice.friends[0].add_friend(&mut alice);

But we still get an error about aliasing Alice:

error[E0499]: cannot borrow `alice` as mutable more than once at a time
--> src/main.rs:20:33
|
20 | alice.friends[0].add_friend(&mut alice);
| ------------- ---------- ^^^^^^^^^^ second mutable borrow occurs here
| | |
| | first borrow later used by call
| first mutable borrow occurs here

Playing with these examples is educational,It's worth spending some time "fighting the borrow checker" to build up intuition about what works and what doesn't. But when you get stuck, a good rule of thumb is to avoid putting lifetime parameters on structs. but there's no way to make them work.Ok I lied. We can get something working by combining shared references and interior mutability. Circular borrows in safe code! It's a neat party trick, but it's not useful in real programs, because it breaks if we try to move anything. Object soup wants to do aliasing and mutation at the same time, and that's exactly what references in Rust are supposed to prevent. We need something different.

Part Three: Interior Mutability

If you search for "how to mutate a shared object in Rust", you'll find articles about RcRc stands for "reference counting", which is the strategy it uses to free its contents. It behaves like a shared reference with no lifetime. It's similar to std::shared_ptr in C++ and automatic reference counting in Swift. and RefCell,RefCell is like an RwLock that panics instead of blocking and can't be shared across threads. It lets us get &mut T from &RefCell<T> (which we get from Rc). but Rc<RefCell<T>> doesn't work well for object soup. To see why, let's try it:

let alice = Rc::new(RefCell::new(Person::new("Alice")));
let bob = Rc::new(RefCell::new(Person::new("Bob")));
alice.borrow_mut().add_friend(Rc::clone(&bob));
bob.borrow_mut().add_friend(Rc::clone(&alice));

There's a lot going on there,borrow_mut returns a smart pointer type called RefMut that implements the Deref and DerefMut traits. A lot of Rust magic works through those traits and "deref coercions". Spelling out all the types is helpful for seeing what's going on. The same pattern comes up with Arc<Mutex<T>>, which is fundamental for multithreading. and it's pretty verbose, but it compiles and runs. That's progress! Unfortunately it has a memory leak,Usually it's hard to leak memory by accident in Rust, but reference cycles in Rc and Arc are the main exception. Again this is similar to C++ and to Swift, and you can make Python do the same thing if you call gc.disable. which we can see if we run it under ASan or Miri.Tools → Miri on the Playground To fix that, we need to either explicitly break cycles before Alice and Bob go out of scope or use Weak references. Both options are error-prone.Weak references are a good fit for asymmetrical relationships like child nodes and parent nodes in a tree, but here it's not clear who should be weak and who should be strong. If all friends are weak, then we need to hold strong references somewhere else to keep people alive.

As our program grows, the uniqueness rule will also come back to bite us in the form of RefCell panics. To provoke that, let's change add_friend to check for people befriending themselves. Here's the change in Python:No two people ever have the same name. It's fine.

def add_friend(self, other):
if other.name != self.name:
self.friends.append(other)

And in Rust:

fn add_friend(&mut self, other: &Rc<RefCell<Person>>) {
if other.borrow().name != self.name {
self.friends.push(Rc::clone(other));
}
}

The Rust version compiles, but if we make Alice call add_friend on herself, it panics:

thread 'main' panicked at src/main.rs:15:18:
already mutably borrowed: BorrowError

The problem is that we "locked" the RefCell to get &mut self, and that conflicts with other.borrow() when other is aliasing self.In multithreaded code using Arc<Mutex<T>> or Arc<RwLock<T>>, the same mistake is a deadlock. The fix is to avoid &mut self methods and keep our borrows short-lived, but this is also error-prone. We might've missed this bug without a test case.

Rc<RefCell<T>> isn't a good way to write object soup, because it has problems with aliasing and cycles.Unsafe code has similar problems. Unless you're extremely careful, raw pointer soup usually breaks the uniqueness rule when you convert pointers back into references to call safe functions. That's undefined behavior in Rust, even when the same code would've been legal in C or C++. Again we need something different.

Part Four: Indexes

We can do better with simpler tools. Keep Alice and Bob in a Vec and have them refer to each other by index:

struct Person {
name: String,
friends: Vec<usize>,
}

fn new_person(people: &mut Vec<Person>, name: &str) -> usize {
people.push(Person { name: name.into(), friends: Vec::new() });
people.len() - 1
}

fn add_friend(people: &mut Vec<Person>, this_id: usize, other_id: usize) {
if people[other_id].name != people[this_id].name {
people[this_id].friends.push(other_id);
}
}

fn main() {
let mut people = Vec::new();
let alice_id = new_person(&mut people, "Alice");
let bob_id = new_person(&mut people, "Bob");
add_friend(&mut people, alice_id, bob_id);
add_friend(&mut people, bob_id, alice_id);
add_friend(&mut people, alice_id, alice_id); // no-op
}

This is how we write object soup in Rust. We still need to avoid &mut self methods, and each function has an extra people argument. But aliasing mistakes are compiler errors instead of panics, and there's no risk of memory leaks. We can also serialize the Vec with serdeRc implements Serialize if you enable the rc feature, but trying to serialize a reference cycle will trigger infinite recursion and panic. or parallelize it with rayon.

Part Five: Next Steps

Even though we're technically not leaking memory, we can't delete anything from the Vec without messing up the indexes of other elements. One way to allow for deletion is to replace the Vec with a HashMap, using either an incrementing counter or random UUIDs for the keys. There are also more specialized data structures like Slab and SlotMap.

When you have more than one type of object to keep track of, you'll probably want to group them in a struct with a name like World or State or Entities. In her 2018 keynote on writing games in Rust,This article is really just a rehash of Catherine's talk. I can't recommend it highly enough. Catherine West talked about how this pattern is a precursor to what game developers call an "entity component system". These patterns solve borrowing and mutability problems in Rust, but they also solve problems like serialization, synchronization, and cache locality that come up when we write object soup in any language.


Discussion threads on r/rust, Hacker News, and lobste.rs.