Object Soup is Made of Indexes
2023 October 23
When objects come and go and change all the time, and any one might point to any other, I call that "object soup".​In other words, object soup is an implicit, heterogeneous, mutable graph in a program that might not look like it's trying to build a graph. It's hard to write object soup in Rust, because it breaks the rules for references.​I'm assuming that you've already seen Rust's ownership, borrowing, and mutability rules. If not, here's an overview from a talk by Niko Matsakis, and here's the relevant chapter of The Book. But sometimes it's just how the world works: A creature in a game targets another creature, and then its target disappears. A cell in a spreadsheet depends on another cell, and the other cell's value changes. A song in a music player links to a singer, and the singer links to their songs. These programs are object soup by design, but Rust doesn't let us do things like this with references. So what do we do?
The short answer is: use indexes instead of references. To see why, we'll look at three other approaches that don't work. If you just want to see the code that works, skip to part four.
Our object soup of the day is a toy program that models two friends, Alice and Bob. Here's the Python version:
class Person:
def __init__(self, name):
self.name = name
self.friends = []
def add_friend(self, other):
self.friends.append(other)
alice = Person("Alice")
bob = Person("Bob")
alice.add_friend(bob)
bob.add_friend(alice)
This is simple, boring Python, and it would be simple and boring in most languages.​Garbage collection solves a lot of problems, but there are monsters lurking in dark corners. But in Rust it's surprisingly tricky.
Part One: Move Semantics
A naive Rust translation doesn't compile:
struct Person {
name: String,
friends: Vec<Person>,
}
impl Person {
fn new(name: &str) -> Person {
Person { name: name.into(), friends: Vec::new() }
}
fn add_friend(&mut self, other: Person) {
self.friends.push(other);
}
}
fn main() {
let mut alice = Person::new("Alice");
let mut bob = Person::new("Bob");
alice.add_friend(bob);
bob.add_friend(alice); // error: borrow of moved value: `bob`
}
Here's the full error:
error[E0382]: borrow of moved value: `bob`
--> src/main.rs:20:5
|
18 | let mut bob = Person::new("Bob");
| ------- move occurs because `bob` has type `Person`, which
| does not implement the `Copy` trait
19 | alice.add_friend(bob);
| --- value moved here
20 | bob.add_friend(alice); // error: borrow of moved value: `bob`
| ^^^^^^^^^^^^^^^^^^^^^ value borrowed here after move
Passing Bob to add_friend
by value moves him, because Person
isn't
Copy
.​Again I'm assuming that you've already seen move semantics
in Rust. If not, here's the relevant chapter of The
Book,
and here's a comparison with move semantics in
C++. A quick fix is to add #[derive(Clone)]
to Person
and to clone
each argument to add_friend
. But copying or
cloning isn't what we want when we're writing object soup. The real Alice and
Bob will change over time, and any copies of them will quickly get out of
sync.​In fact this example is already out of sync. The copy of Bob
in Alice's friends list doesn't get updated by the second call to
add_friend
. A naive C++ translation has the same problem.
Like most garbage-collected languages, Python doesn't have this problem, because it passes objects around "by reference". Can we use references in Rust?
Part Two: Borrowing
No we can't, because Rust doesn't let us mutate objects that are borrowed.​The exception to this rule is "interior mutability", and we'll get to that in the next section. If we use shared references:
alice.add_friend(&bob);
bob.add_friend(&alice);
We get a compiler error when we try to modify Bob:
error[E0502]: cannot borrow `bob` as mutable because it is also borrowed
as immutable
--> src/main.rs:20:5
|
19 | alice.add_friend(&bob);
| ---- immutable borrow occurs here
20 | bob.add_friend(&alice);
| ^^^^----------^^^^^^^^
| | |
| | immutable borrow later used by call
| mutable borrow occurs here
If we use mutable references, we can avoid aliasing Bob by going through Alice's friends list to modify him:​This is worth a shot, but the uniqueness rule means we can't use mutable references for many-to-many relationships, so we definitely can't make object soup out of them in general.
alice.add_friend(&mut bob);
alice.friends[0].add_friend(&mut alice);
But we still get an error about aliasing Alice:
error[E0499]: cannot borrow `alice` as mutable more than once at a time
--> src/main.rs:20:33
|
20 | alice.friends[0].add_friend(&mut alice);
| ------------- ---------- ^^^^^^^^^^ second mutable borrow
| | | occurs here
| | first borrow later used by call
| first mutable borrow occurs here
Playing with these examples is educational,​It's worth spending some time "fighting the borrow checker" to build up intuition about what works and what doesn't. But when you get stuck, a good rule of thumb is to avoid putting lifetime parameters on structs. but there's no way to make them work.​Ok I lied. We can get something working by combining shared references and interior mutability. Circular borrows in safe code! It's a neat party trick, but it's not useful in real programs, because it breaks if we try to move anything. Object soup wants to do aliasing and mutation at the same time, and that's exactly what references in Rust are supposed to prevent. We need something different.
Part Three: Interior Mutability
If you search for "how to mutate a shared object in Rust", you'll find articles
about Rc
​Rc
stands for
"reference counting",
which is the strategy it uses to free its contents. It behaves like a
shared reference with no lifetime. It's similar to std::shared_ptr
in C++
and automatic reference counting in Swift. and RefCell
,​RefCell
is like an
RwLock
that panics instead of blocking and can't be shared across threads. It lets
us get &mut
T
from &RefCell<T>
(which we get from Rc
). but Rc<RefCell<T>>
doesn't work
well for object soup. To see why, let's try it:
let alice = Rc::new(RefCell::new(Person::new("Alice")));
let bob = Rc::new(RefCell::new(Person::new("Bob")));
alice.borrow_mut().add_friend(Rc::clone(&bob));
bob.borrow_mut().add_friend(Rc::clone(&alice));
There's a lot going on there,​borrow_mut
returns a smart
pointer type
called RefMut
that implements the
Deref
and
DerefMut
traits.
A lot of Rust magic works through those traits and "deref
coercions". Spelling
out all the types is helpful for seeing what's going on. The same pattern
comes up with Arc<Mutex<T>>
, which is fundamental for multithreading. and it's pretty verbose, but it
compiles and runs. That's progress! Unfortunately it has a memory
leak,​Usually it's hard to leak memory by accident in Rust, but
reference cycles in Rc
and Arc
are the main exception. Again this is
similar to C++ and to Swift, and you can make Python do the same thing if
you call
gc.disable
. which we can see if we run it under ASan or
Miri.​Tools → Miri on the Playground To fix that, we need to either explicitly break cycles before
Alice and Bob go out of scope or use Weak
references. Both options are
error-prone.​Weak
references are a good fit for asymmetrical
relationships like child nodes and parent nodes in a tree, but here it's
not clear who should be weak and who should be strong. If all friends are
weak, then we need to hold strong references somewhere else to keep people
alive.
As our program grows, the uniqueness rule will also come back to bite us in the
form of RefCell
panics. To provoke that, let's change add_friend
to check
for people befriending themselves. Here's the change in
Python:​No two people ever have the same name. It's fine.
def add_friend(self, other):
if other.name != self.name:
self.friends.append(other)
And in Rust:
fn add_friend(&mut self, other: &Rc<RefCell<Person>>) {
if other.borrow().name != self.name {
self.friends.push(Rc::clone(other));
}
}
The Rust version compiles, but if we make Alice call add_friend
on
herself, it panics:
thread 'main' panicked at 'already mutably borrowed: BorrowError',
src/main.rs:15:18
The problem is that we "locked" the RefCell
to get &mut
self
, and that
conflicts with other.borrow()
when other
is aliasing self
.​In multithreaded code using Arc<Mutex<T>>
or Arc<RwLock<T>>
,
the same mistake is a deadlock. The
fix is to avoid &mut
self
methods and keep our borrows short-lived,
but this is also error-prone. We might've missed this bug without a test case.
Rc<RefCell<T>>
isn't a good way to write object soup, because it has problems
with aliasing and cycles.​Unsafe code has similar problems. Unless you're extremely
careful, raw pointer soup usually breaks the uniqueness rule when you
convert pointers back into references to call safe functions. That's
undefined behavior in Rust, even when the same code would've been legal in
C or C++. Again we need something different.
Part Four: Indexes
We can do better with simpler tools. Keep Alice and Bob in a Vec
and have
them refer to each other by index:
struct Person {
name: String,
friends: Vec<usize>,
}
fn new_person(people: &mut Vec<Person>, name: &str) -> usize {
people.push(Person { name: name.into(), friends: Vec::new() });
people.len() - 1
}
fn add_friend(people: &mut Vec<Person>, this_id: usize, other_id: usize) {
if people[other_id].name != people[this_id].name {
people[this_id].friends.push(other_id);
}
}
fn main() {
let mut people = Vec::new();
let alice_id = new_person(&mut people, "Alice");
let bob_id = new_person(&mut people, "Bob");
add_friend(&mut people, alice_id, bob_id);
add_friend(&mut people, bob_id, alice_id);
add_friend(&mut people, alice_id, alice_id); // no-op
}
This is how we write object soup in Rust. We still need to avoid &mut
self
methods, and each function has an extra people
argument. But aliasing
mistakes are compiler errors instead of panics, and there's no risk of memory
leaks. We can also serialize the Vec
with serde
​Rc
implements Serialize
if you enable the rc
feature, but trying to serialize a
reference cycle will trigger infinite recursion and panic. or
parallelize it with rayon
.
Part Five: Next Steps
Even though we're technically not leaking memory, we can't delete anything from
the Vec
without messing up the indexes of other elements. One way to allow
for deletion is to replace the Vec
with a HashMap
, using either an
incrementing counter or random UUIDs for the keys.
There are also more specialized data structures like
Slab
and SlotMap
.
When you have more than one type of object to keep track of, you'll probably
want to group them in a struct with a name like World
or State
or
Entities
. In her 2018 keynote on writing games in
Rust,​This article is really just a rehash of Catherine's talk. I
can't recommend it highly enough. Catherine
West talked about how this pattern is a precursor to what game developers call
an "entity component system". These patterns solve borrowing and mutability
problems in Rust, but they also solve problems like serialization,
synchronization, and cache locality that come up when we write object soup in
any language.
Discussion threads on r/rust, Hacker News, and lobste.rs.