Crafting Interpreters with Rust: On Garbage Collection

2024/07/24

programming language

#crafting interpreters #garbage collector #interpreter #lox #memory management #mark and sweep #programming language #reference counting #rust #unsafe #virtual machine

TL;DR Here’s the link to the project.

I became interested in implementing programming languages a few years ago and discovered Crafting Interpreters by Bob Nystrom. At the time, I had experience with Rust and decided to use it to follow the book. Albeit, being a noob, I managed to implement a fully functional bytecode interpreter that supported every feature of the Lox language as described. However, my implementation suffered memory leaks due to reference counting. Back then, I didn’t fully grasp Rust to design and implement a proper garbage collector (GC)! Now that I have more confidence in the language, I decided to revisit the project and improve its memory management scheme.

As for the goals, I aim to implement a mark-and-sweep GC that is on par with the C implementation. You probably guessed that lots of unsafe statements will be thrown around. If you have the ick for unsafe, look elsewhere. Although I’m not an expert on this subject matter and might make many mistakes, I hope you’ll find something useful here.

Viewer discretion is advised!

Viewer discretion is advised!

preface

Following the book, I implemented an interpreter for the Lox language, which has three phases - scanner, compiler, and virtual machine (VM). This post will only touch on the GC residing in the VM. Here’s a picture of how the interpreter operates and its components to clarify our starting point.

Interpreter pipeline.

Interpreter pipeline.

managed memory

Before diving deeper, let’s define garbage and what the GC does. Many readers likely know this already, but I’ll briefly review it so we’re all on the same page. If you need to learn more about this subject, there are already much better resources, and the chapter on Garbage Collection is also a great read on the topic.

We all know software needs to store data on your computer memory, i.e., RAM, to operate. Most, if not all, programs have two separate regions to hold data - the stack and the heap. How much memory is used for the stack and when it is freed can always be determined at compile-time. We are more interested in the heap, where data can be of arbitrary sizes, and the program can ask your computer for more memory when needed. Think about taking in user inputs. Before running the program, we couldn’t know how much memory would be used to hold those inputs and would have to rely on runtime information to allocate enough memory to hold them.

Your browser.

Your browser.

This is one of the most important aspects where languages differ. The programming language we’re trying to implement, Lox, is a managed high-level language. That means all the heap memory is automatically dealt with for you behind the scenes. When data is needed to be put on the heap, the VM automatically allocates memory to hold it. Once that data is no longer used, the VM ensures it is returned to your computer so your program doesn’t eat up all the memory. The component doing all of these is called the garbage collector.

garbage collection

One of the most important tasks of a GC is determining which piece of data is no longer used. Because memory is dynamically allocated, the GC cannot determine precisely at any particular moment if a piece of memory is still in use. Thus, the GC can only approximate by considering that data is still in use if it can potentially be read at any future point.

Lox uses the term object to refer to any values that are heap-allocated.

Let’s consider an example from the book:

var a = "first value";
a = "updated";
// GC here.
print a;

If the GC runs right after a is reassigned to "updated", the string "first value" is in memory, but the program can no longer access it. As a result, the GC can safely remove it. We say an object is reachable if there’s a way for the program to reference it, like the string "updated" in this case. Conversely, the string "first value" is unreachable. Intuitively, we see that reachable objects can’t be freed.

Objects that can be reached directly from the VM are called roots. Root objects can be global variables, objects on the VM’s stack, objects on the frame stack, or upvalues, i.e., values hoisted by closures. Additionally, objects can be referenced indirectly through other references, and we must also consider those. Again, taken from the book:

fun makeClosure() {
    var a = "data";
    fun f() { print a; }
    return f;
}
{
    var closure = makeClosure();
    // GC here.
    closure();
}

Here, we have a function makeClosure returning another function f. Suppose the GC runs once makeClosure was called after we have returned a reference to the function f. After the GC finishes, the returned closure is called printing out the string "data". Therefore, the GC must not free "data" when it runs. However, if we take a look at the stack when the GC ran, it’s like this:

Closure object in memory.

Closure object in memory.

The string "data" is nowhere to be found on the stack because it is hoisted into the closure, where it can be later referenced. Because the program can, at some point, indirectly access objects through root objects, they must also be considered reachable. Hence, the problem of finding garbage reduces to a graph reachability problem.

Garbage Collection - Mark-Sweep.

Garbage Collection - Mark-Sweep.

by Bob Nystrom licensed under CC BY-NC-ND 4.0

Considering the above, we arrive at a straightforward GC algorithm called mark-and-sweep. This was the first algorithm for garbage collection, devised by John McCarthy in his paper on the Lisp language. It works in two phases:

when worlds collide

Fundamentally, the issues we faced when implementing Lox’s object system aren’t due to the GC but to the difference between how Lox and Rust handle objects. In Rust, a variable can roughly be one of three kinds:

These rules are enforced by the borrow-checker, famously known for being hard to grasp for a Rust beginner. By making our program conform to these rules, the compiler can reason about one of the hardest problems to solve in static analysis - aliasing.

Variables aliasing is when two variables point to the same memory address and can mutate the data held at that address. Effectively, within the same scope, the effects of mutating one variable can be observed by the other. If two variables are not aliases, the compiler can perform some neat optimizations, and developers can make fewer mistakes.

Furthermore, by combining the borrow rules with Rust’s explicit lifetime model, the compiler can deduce when objects are no longer used and automatically insert destroy statements for us. This can also be considered a form of garbage collection, but the only difference is that it’s done at compile time, not runtime. But, unlike stack memory, we don’t know beforehand how much memory needs to be allocated.

On the other hand, Lox’s semantics deal with none of these rules. Essentially, every object in Lox is passed by reference, with no distinction between whether it’s shared or exclusive. Consider the following Lox program:

class Number {
    init(value) {
        this.value = value;
    }
}
fun add(a, b) {
    a.value = a.value + b.value;
}
var a = Number(3);
add(a, a);
print a.value;

The example above is perfectly valid Lox and will print out the value 6 when it is run. If we were to translate it to Rust while keeping similar semantics, it would look like this:

struct Number {
    value: f64,
}
fn add(a: &mut Number, b: &mut Number) {
    a.value = a.value + b.value;
}
fn main() {
    let mut a = Number { value: 3.0 };
    add(&mut a, &mut a);
    print!("{}", a.value);
}

The Rust program will not compile, and the compiler will return the following error:

error[E0499]: cannot borrow `a` as mutable more than once at a time
  --> src/main.rs:11:17
   |
11 |     add(&mut a, &mut a);
   |     --- ------  ^^^^^^ second mutable borrow occurs here
   |     |   |
   |     |   first mutable borrow occurs here
   |     first borrow later used by call

As an experiment, let’s say we modify the function add() in Rust to take two mutable values instead of two mutable references, like so:

fn add(mut a: Number, mut b: Number) {
    a.value = a.value + b.value;
}

We then face another error:

error[E0382]: use of moved value: `a`
  --> src/main.rs:11:12
   |
10 |     let a = Number { value: 3.0 };
   |         - move occurs because `a` has type `Number`, which does not implement the `Copy` trait
11 |     add(a, a);
   |         -  ^ value used here after move
   |         |
   |         value moved here
   |

error[E0382]: borrow of moved value: `a`
  --> src/main.rs:12:18
   |
10 |     let a = Number { value: 3.0 };
   |         - move occurs because `a` has type `Number`, which does not implement the `Copy` trait
11 |     add(a, a);
   |            - value moved here
12 |     print!("{}", a.value);
   |                  ^^^^^^^ value borrowed here after move
   |

Needless to say, if we strictly follow Rust’s borrow-checker, it’s impossible to have two variables modifying the same memory address. To do so, we are required to jump through a few hoops to bypass the borrow-checker.

safe bets

We went a long-winded path to show the difficulties of making the two language semantics work happily together. But how do we address these fundamental differences? In essence, we want some mechanism to enable pointer aliasing and shared mutable access to objects. Lucky for us, Rust does provide tools for dealing with these cases. Despite that, we’ll soon see there’s more to desire from doing things the Rust’s way.

In the following sections, I’ll discuss two approaches for implementing a GC for Lox VM without using unsafe that I know of. This is not a comprehensive list, but it shows the choices that can be made while working on this problem. I’ve also encountered other approaches that utilize Rust’s type system and macros to their full extent, ensuring both safety and performance. Still, I don’t understand them enough for a meaningful discussion.

reference counting

The simplest approach to GC is to utilize existing Rust types for automatic memory management, ditching the mark-and-sweep algorithm. In Rust, we typically reach out for reference counting when we need shared resource management, i.e., shared ownership. This is supported through two different types - Rc<T> and Arc<T>. Rc<T> uses a non-atomic reference count and can’t be sent across threads, while Arc<T> uses an atomic reference count and can be sent across threads. We only cover Rc<T> for brevity, but the concept will also hold for Arc<T>.

According to Rust’s documentation, the type Rc<T> provides shared ownership of a value of type T allocated in the heap. Invoking clone on it produces a new pointer to the same allocation in the heap. When the last Rc<T> pointer to a given allocation is destroyed, the value stored in that allocation is also dropped. Thanks to Rust semantics, there’s no way to get a new reference from an Rc<T> without using clone or unsafe.

Reference counting with interior mutability.

Reference counting with interior mutability.

By using Rc<T>, we are free from caring about how resources are released, but that only solves half of the problem since the reference we get from Rc<T> is immutable. In addition to Rc<T>, we have to introduce RefCell<T> to enable mutation through immutable reference. This pattern in Rust is called interior mutability. Instead of enforcing the borrowing rules with the compiler, RefCell<T> implements it at runtime. This has one significant implication - failure to adhere to the borrowing rule results in a runtime panic instead of a compile-time error. The resulting type Rc<RefCell<T>> provides exactly what we need, a shared mutable reference to a heap-allocated value that gets free automatically when it’s no longer referenced. But what’s the catch?

One crucial thing to remember when using Rc<T> is to not create cycles between Rc<T>. Nonetheless, Lox doesn’t have this constraint and allows reference cycles to form. Moreover, Lox users should not care about the underlying memory model to avoid forming reference cycles. Consider the following Lox program:

class Node {}
{
    var n1 = Node();
    var n2 = Node();
    n1.inner = n2;
    n2.inner = n1;
}

Here, we create a reference cycle between two objects. When the stack variables n1 and n2 go out of scope, the reference count of both objects gets decremented to 1, and we lose access to both objects. Due to the reference cycle, there’s no way to get the count to 0 because the count of one object depends on the deallocation of the other and vice versa.

Reference counting with cycle.

Reference counting with cycle.

Using Rust’s API, such a cycle can be broken using Weak<T>. Giving out a Weak<T> only increments the weak count, which does not prevent the value from being destroyed even when it’s greater than zero. However, there’s no way for a Lox program to specify to the VM that it wants to use a weak pointer, and we don’t even want to expose such functionality due to the high-level nature of the language. Therefore, while being extremely simple to implement, this approach is ultimately unsound on its own.

On top of reference counting, languages typically employ a cycle detection algorithm and have a cycle collection phase to clean up unreachable objects that form reference cycles. Examples of such languages include Python, Nim, etc. You can implement this in Rust by defining your own Rc<T> like the one described in this article. Even so, to keep true to how Lox is implemented in the book, I decided to not use reference counting and explored other methods for doing mark-and-sweep instead.

object manager

We now know that reference counting is a no-go. What options do we have left? If we squint and look a bit closer at Lox’s memory model and object system, we see that every heap-allocated object is actually owned by “the heap”, and variables are just borrowing from it.

Taking this idea, we can explicitly model Lox’s heap using an object manager, i.e., a dynamic array containing all the objects. But then we have a problem - how do more than two variables hold mutable references to the same object inside the heap? We again face the scrutiny of the borrow-checker.

All problems in computer science can be solved by another level of indirection, except for the problem of too many layers of indirection – David Wheeler

Instead of directly referencing objects’ data in the object manager with memory pointers, our objects simply hold an index into the dynamic array where their data is stored. Essentially, we’re building our own memory address translation layer. This is a typical pattern in Rust for complex object graphs and cyclic data structures. When we need to access an object, we simply give the index to the object manager and get a mutable/immutable reference to that object’s data in return.

Object manager.

Object manager.

One caveat when applying this pattern is ensuring that the index held by the objects is, in fact, pointing to a valid position within the dynamic array. In general, supporting data deletion is quite complex because we don’t know whether an item at a particular index is no longer in use. We may still have an index into the deleted position somewhere in our program if we’re not careful. Fortunately, we are building a GC, and within the confines of our VM, we know exactly when the object will no longer be accessible using mark-and-sweep!

erm… ackchyually

With the “object manager” approach, haven’t we just solved the GC problem? Well, yes and no. While having an indirection is nice and easy to reason about, we lose lots of performance doing so.

Besides the performance implications, this approach doesn’t return memory to the operating system. Unless we employ a much more complex strategy, we can’t shrink the dynamic array because of the indexing issue mentioned earlier. As a result, the GC only reclaims memory for reuse without actually reducing the program’s memory usage. Imagine a situation when we allocate a bunch of objects at the start of the program and then don’t use them. The large memory region is still being held until the program exits, even though it was “freed” by the GC.

oii, speed up!

Sanic.

Sanic.

What if we forget about “safety” and implement the GC exactly as described? How do we do that, then? Following the original C implementation, we can define the following structures:

#[derive(Debug)]
pub struct Gc<T> {
    ptr: NonNull<GcData<T>>,
}
#[derive(Debug)]
pub struct GcData<T> {
    marked: bool,
    next: Option<Object>,
    data: T,
}

Here, objects are just pointers to their data with no fancy reference counting or indirection. Object data simply contains two sections - its header containing metadata and its body containing actual information about the object. The type Gc<T> is just a simple wrapper around NonNull<T> to indicate that the pointer is managed by our GC and provides some questionable unsafe implementation that makes our life a bit easier. Additionally, objects form a linked list, so the GC can easily traverse through all of them while cleaning up unreachable objects.

For those who don’t know, NonNull<T> is an abstraction for a raw pointer that is non-null and covariant. Don’t ask me about covariance since I honestly don’t know the answer. I recommend reading the section on subtyping and variance and The Rustonomicon. I went through them a few times but still don’t entirely get all the details.

Then, we can define each type of object like so:

/// The content of a function object.
#[derive(Debug)]
pub struct ObjFun {
    /// The name of the function.
    pub name: Option<RefString>,
    /// The number of parameters of the function.
    pub arity: u8,
    /// The number of upvalues captured by the function.
    pub upvalue_count: usize,
    /// The bytecode chunk of this function.
    pub chunk: Chunk,
}
/// The content of a closure object.
#[derive(Debug)]
pub struct ObjClosure {
    /// The function definition of this closure.
    pub fun: RefFun,
    /// The variables hoisted by this closure.
    pub upvalues: Vec<RefUpvalue>,
}
/// A type alias for a heap-allocated function.
pub type RefFun = Gc<ObjFun>;
/// A type alias for a heap-allocated closure.
pub type RefClosure = Gc<ObjClosure>;

Now, for the questionable parts:

impl<T> ops::Deref for Gc<T> {
    type Target = GcData<T>;

    fn deref(&self) -> &Self::Target {
        unsafe { self.ptr.as_ref() }
    }
}
impl<T> ops::DerefMut for Gc<T> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        unsafe { self.ptr.as_mut() }
    }
}

These two trait implementations define a way for Rust to dereference the pointers and access the data behind them. It’s questionable because Deref and DerefMut must not fail, and we cannot guarantee that. In addition, we cannot guarantee this won’t result in an undefined behavior. If we’re being rigorous, we must not implement these traits but instead provide a distinct method to Gc<T> for performing dereference and mark them as unsafe.

However, I would argue that it’s okay to do so because we only use Gc<T> within the VM to represent garbage-collected objects. With the GC guarantees that no object in use will be freed, we can always dereference the raw pointers in a “safe” manner. Any issue that arises, primarily dangling pointers, is a bug in our GC that causes the VM to malfunction regardless of how we represent an object.

Finally, we have a heap structure to manage all operations performed on the VM’s heap and provide a destructor for freeing all objects once the program finishes running.

pub struct Heap {
    // Interned string table.
    strings: Table<()>,
    // The head of the linked list of heap-allocated objects.
    head: Option<Object>,
}
impl Heap {
    /// Allocates an object and returns its handle. The object is pushed to the front of
    /// the list of allocated objects.
    pub fn alloc<T: GcSized, F>(&mut self, data: T, map: F) -> (Object, Gc<T>)
    where
        F: Fn(Gc<T>) -> Object;

    /// Interns a string and returns its handle. The same reference is returned for 2
    /// equal strings.
    pub fn intern(&mut self, data: String) -> RefString;

    /// Releases all objects that aren't marked. This also removes interned strings
    /// when no object is referencing them.
    pub unsafe fn sweep(&mut self);

    /// Deallocates an object.
    unsafe fn dealloc(&mut self, object: Object);
}
impl Drop for Heap {
    fn drop(&mut self) {
        for object in &*self {
            unsafe { self.dealloc(object) };
        }
        debug_assert_eq!(0, self.alloc_bytes);
    }
}

And we’re done. As for the actual GC implementation, I recommend reading the book and following along because it’s pretty straightforward. These Rust data structures are more or less one-to-one mappings of the C implementation, and there should be no trouble translating them.

afterword

{
  var s = "Hello, world!";
  fun helloWorld() {
    print s;
  }
  var f = helloWorld;
  f();
}
A Lox program and its memory at runtime (incorrect linked list order, but you get the idea).

A Lox program and its memory at runtime (incorrect linked list order, but you get the idea).

Implementing an interpreter is a fun project that anyone should try once. Crafting Interpreters is one of the most impressive technical books I have ever read, and I would recommend it to anyone interested in the topic.

With Rust, I think we can agree that its memory model is unsuitable for creating a mark-and-sweep GC, and there are many different methods to overcome its limitations. The simplest way to achieve maximum performance is to ignore all Rust’s safety guarantees like the method I presented here. While there are better ways to go about it, it’s effortless to understand, especially when you have some experience with C or C++ and are following along the book. Excluding the GC, other components of the interpreter are actually quite easy to implement in Rust. Its excellent type system and capabilities make it very nice for the task.

One thing to keep in mind is that it’s feasible to utilize Rust’s type system to prove that the GC is safe. An intriguing approach I want to try at some point is utilizing Pin<T> and carefully weaving the lifetime of rooted objects through our program to prove they’re safe for access. Such a strategy is described in this blog series on shifgrethor and implemented in rune. Furthermore, there are many more exciting developments in providing GC as a library in Rust, and all have been nicely summarized in this article.

With that, thanks for reading! To see more of the code, you can visit https://github.com/ltungv/rox. If you’re interested, I keep a repository containing the old GC implementation using Rc<RefCell<T>> at https://github.com/ltungv/lox.