Reference Cycles Can Leak Memory

Rust’s memory safety guarantees make it difficult to accidentally create memory that’s never cleaned up, known as a memory leak, but not impossible. Entirely preventing memory leaks is not one of Rust’s guarantees in the same way that disallowing data races at compile time is, meaning memory leaks are memory safe in Rust. We can see this with Rc<T> and RefCell<T>: it’s possible to create references where items refer to each other in a cycle. This creates memory leaks because the reference count of each item in the cycle will never reach 0, and the values will never be dropped.

Creating a Reference Cycle

Let’s take a look at how a reference cycle might happen and how to prevent it, starting with the definition of the List enum and a tail method in Listing 15-20:

Filename: src/main.rs

use std::rc::Rc;
use std::cell::RefCell;
use List::{Cons, Nil};

#[derive(Debug)]
enum List {
    Cons(i32, RefCell<Rc<List>>),
    Nil,
}

impl List {
    fn tail(&self) -> Option<&RefCell<Rc<List>>> {
        match *self {
            Cons(_, ref item) => Some(item),
            Nil => None,
        }
    }
}

Listing 15-20: A cons list definition that holds a RefCell so that we can modify what a Cons variant is referring to

We’re using another variation of the List definition from Listing 15-6. The second element in the Cons variant is now RefCell<Rc<List>>, meaning that instead of having the ability to modify the i32 value like we did in Listing 15-19, we want to be able to modify which List a Cons variant is pointing to. We’ve also added a tail method to make it convenient for us to access the second item, if we have a Cons variant.

In listing 15-21, we’re adding a main function that uses the definitions from Listing 15-20. This code creates a list in a, a list in b that points to the list in a, and then modifies the list in a to point to b, which creates a reference cycle. There are println! statements along the way to show what the reference counts are at various points in this process.

Filename: src/main.rs

# use List::{Cons, Nil};
# use std::rc::Rc;
# use std::cell::RefCell;
# #[derive(Debug)]
# enum List {
#     Cons(i32, RefCell<Rc<List>>),
#     Nil,
# }
#
# impl List {
#     fn tail(&self) -> Option<&RefCell<Rc<List>>> {
#         match *self {
#             Cons(_, ref item) => Some(item),
#             Nil => None,
#         }
#     }
# }
#
fn main() {
    let a = Rc::new(Cons(5, RefCell::new(Rc::new(Nil))));

    println!("a initial rc count = {}", Rc::strong_count(&a));
    println!("a next item = {:?}", a.tail());

    let b = Rc::new(Cons(10, RefCell::new(Rc::clone(&a))));

    println!("a rc count after b creation = {}", Rc::strong_count(&a));
    println!("b initial rc count = {}", Rc::strong_count(&b));
    println!("b next item = {:?}", b.tail());

    if let Some(ref link) = a.tail() {
        *link.borrow_mut() = Rc::clone(&b);
    }

    println!("b rc count after changing a = {}", Rc::strong_count(&b));
    println!("a rc count after changing a = {}", Rc::strong_count(&a));

    // Uncomment the next line to see that we have a cycle; it will
    // overflow the stack
    // println!("a next item = {:?}", a.tail());
}

Listing 15-21: Creating a reference cycle of two List values pointing to each other

We create an Rc instance holding a List value in the variable a with an initial list of 5, Nil. We then create an Rc instance holding another List value in the variable b that contains the value 10, then points to the list in a.

Finally, we modify a so that it points to b instead of Nil, which creates a cycle. We do that by using the tail method to get a reference to the RefCell in a, which we put in the variable link. Then we use the borrow_mut method on the RefCell to change the value inside from an Rc that holds a Nil value to the Rc in b.

If we run this code, keeping the last println! commented out for the moment, we’ll get this output:

a initial rc count = 1
a next item = Some(RefCell { value: Nil })
a rc count after b creation = 2
b initial rc count = 1
b next item = Some(RefCell { value: Cons(5, RefCell { value: Nil }) })
b rc count after changing a = 2
a rc count after changing a = 2

We can see that the reference count of the Rc instances in both a and b are 2 after we change the list in a to point to b. At the end of main, Rust will try and drop b first, which will decrease the count in each of the Rc instances in a and b by one.

However, because a is still referencing the Rc that was in b, that Rc has a count of 1 rather than 0, so the memory the Rc has on the heap won’t be dropped. The memory will just sit there with a count of one, forever.

To visualize this, we’ve created a reference cycle that looks like Figure 15-22:

Reference cycle of lists

Figure 15-22: A reference cycle of lists a and b pointing to each other

If you uncomment the last println! and run the program, Rust will try and print this cycle out with a pointing to b pointing to a and so forth until it overflows the stack.

In this specific case, right after we create the reference cycle, the program ends. The consequences of this cycle aren’t so dire. If a more complex program allocates lots of memory in a cycle and holds onto it for a long time, the program would be using more memory than it needs, and might overwhelm the system and cause it to run out of available memory.

Creating reference cycles is not easily done, but it’s not impossible either. If you have RefCell<T> values that contain Rc<T> values or similar nested combinations of types with interior mutability and reference counting, be aware that you have to ensure you don’t create cycles yourself; you can’t rely on Rust to catch them. Creating a reference cycle would be a logic bug in your program that you should use automated tests, code reviews, and other software development practices to minimize.

Another solution is reorganizing your data structures so that some references express ownership and some references don’t. In this way, we can have cycles made up of some ownership relationships and some non-ownership relationships, and only the ownership relationships affect whether a value may be dropped or not. In Listing 15-20, we always want Cons variants to own their list, so reorganizing the data structure isn’t possible. Let’s look at an example using graphs made up of parent nodes and child nodes to see when non-ownership relationships are an appropriate way to prevent reference cycles.

Preventing Reference Cycles: Turn an Rc<T> into a Weak<T>

So far, we’ve shown how calling Rc::clone increases the strong_count of an Rc instance, and that an Rc instance is only cleaned up if its strong_count is 0. We can also create a weak reference to the value within an Rc instance by calling Rc::downgrade and passing a reference to the Rc. When we call Rc::downgrade, we get a smart pointer of type Weak<T>. Instead of increasing the strong_count in the Rc instance by one, calling Rc::downgrade increases the weak_count by one. The Rc type uses weak_count to keep track of how many Weak<T> references exist, similarly to strong_count. The difference is the weak_count does not need to be 0 in order for the Rc instance to be cleaned up.

Strong references are how we can share ownership of an Rc instance. Weak references don’t express an ownership relationship. They won’t cause a reference cycle since any cycle involving some weak references will be broken once the strong reference count of values involved is 0.

Because the value that Weak<T> references might have been dropped, in order to do anything with the value that a Weak<T> is pointing to, we have to check to make sure the value is still around. We do this by calling the upgrade method on a Weak<T> instance, which will return an Option<Rc<T>>. We’ll get a result of Some if the Rc value has not been dropped yet, and None if the Rc value has been dropped. Because upgrade returns an Option, we can be sure that Rust will handle both the Some case and the None case, and there won’t be an invalid pointer.

As an example, rather than using a list whose items know only about the next item, we’ll create a tree whose items know about their children items and their parent items.

Creating a Tree Data Structure: a Node with Child Nodes

To start building this tree, we’ll create a struct named Node that holds its own i32 value as well as references to its children Node values:

Filename: src/main.rs

use std::rc::Rc;
use std::cell::RefCell;

#[derive(Debug)]
struct Node {
    value: i32,
    children: RefCell<Vec<Rc<Node>>>,
}

We want a Node to own its children, and we want to be able to share that ownership with variables so we can access each Node in the tree directly. To do this, we define the Vec items to be values of type Rc<Node>. We also want to be able to modify which nodes are children of another node, so we have a RefCell in children around the Vec.

Next, let’s use our struct definition and create one Node instance named leaf with the value 3 and no children, and another instance named branch with the value 5 and leaf as one of its children, as shown in Listing 15-23:

Filename: src/main.rs

# use std::rc::Rc;
# use std::cell::RefCell;
#
# #[derive(Debug)]
# struct Node {
#     value: i32,
#    children: RefCell<Vec<Rc<Node>>>,
# }
#
fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        children: RefCell::new(vec![]),
    });

    let branch = Rc::new(Node {
        value: 5,
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });
}

Listing 15-23: Creating a leaf node with no children and a branch node with leaf as one of its children

We clone the Rc in leaf and store that in branch, meaning the Node in leaf now has two owners: leaf and branch. We can get from branch to leaf through branch.children, but there’s no way to get from leaf to branch. leaf has no reference to branch and doesn’t know they are related. We’d like leaf to know that branch is its parent.

Adding a Reference from a Child to its Parent

To make the child node aware of its parent, we need to add a parent field to our Node struct definition. The trouble is in deciding what the type of parent should be. We know it can’t contain an Rc<T> because that would create a reference cycle, with leaf.parent pointing to branch and branch.children pointing to leaf, which would cause their strong_count values to never be zero.

Thinking about the relationships another way, a parent node should own its children: if a parent node is dropped, its child nodes should be dropped as well. However, a child should not own its parent: if we drop a child node, the parent should still exist. This is a case for weak references!

So instead of Rc, we’ll make the type of parent use Weak<T>, specifically a RefCell<Weak<Node>>. Now our Node struct definition looks like this:

Filename: src/main.rs

use std::rc::{Rc, Weak};
use std::cell::RefCell;

#[derive(Debug)]
struct Node {
    value: i32,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

This way, a node will be able to refer to its parent node, but does not own its parent. In Listing 15-24, let’s update main to use this new definition so that the leaf node will have a way to refer to its parent, branch:

Filename: src/main.rs

# use std::rc::{Rc, Weak};
# use std::cell::RefCell;
#
# #[derive(Debug)]
# struct Node {
#     value: i32,
#     parent: RefCell<Weak<Node>>,
#     children: RefCell<Vec<Rc<Node>>>,
# }
#
fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());

    let branch = Rc::new(Node {
        value: 5,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![Rc::clone(&leaf)]),
    });

    *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
}

Listing 15-24: A leaf node with a Weak reference to its parent node, branch

Creating the leaf node looks similar to how creating the leaf node looked in Listing 15-23, with the exception of the parent field: leaf starts out without a parent, so we create a new, empty Weak reference instance.

At this point, when we try to get a reference to the parent of leaf by using the upgrade method, we get a None value. We see this in the output from the first println!:

leaf parent = None

When we create the branch node, it will also have a new Weak reference, since branch does not have a parent node. We still have leaf as one of the children of branch. Once we have the Node instance in branch, we can modify leaf to give it a Weak reference to its parent. We use the borrow_mut method on the RefCell in the parent field of leaf, then we use the Rc::downgrade function to create a Weak reference to branch from the Rc in branch.

When we print out the parent of leaf again, this time we’ll get a Some variant holding branch: leaf can now access its parent! When we print out leaf, we also avoid the cycle that eventually ended in a stack overflow like we had in Listing 15-21: the Weak references are printed as (Weak):

leaf parent = Some(Node { value: 5, parent: RefCell { value: (Weak) },
children: RefCell { value: [Node { value: 3, parent: RefCell { value: (Weak) },
children: RefCell { value: [] } }] } })

The lack of infinite output indicates that this code didn’t create a reference cycle. We can also tell this by looking at the values we get from calling Rc::strong_count and Rc::weak_count.

Visualizing Changes to strong_count and weak_count

Let’s look at how the strong_count and weak_count values of the Rc instances change by creating a new inner scope and moving the creation of branch into that scope. This will let us see what happens when branch is created and then dropped when it goes out of scope. The modifications are shown in Listing 15-25:

Filename: src/main.rs

fn main() {
    let leaf = Rc::new(Node {
        value: 3,
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(vec![]),
    });

    println!(
        "leaf strong = {}, weak = {}",
        Rc::strong_count(&leaf),
        Rc::weak_count(&leaf),
    );

    {
        let branch = Rc::new(Node {
            value: 5,
            parent: RefCell::new(Weak::new()),
            children: RefCell::new(vec![Rc::clone(&leaf)]),
        });
        *leaf.parent.borrow_mut() = Rc::downgrade(&branch);

        println!(
            "branch strong = {}, weak = {}",
            Rc::strong_count(&branch),
            Rc::weak_count(&branch),
        );

        println!(
            "leaf strong = {}, weak = {}",
            Rc::strong_count(&leaf),
            Rc::weak_count(&leaf),
        );
    }

    println!("leaf parent = {:?}", leaf.parent.borrow().upgrade());
    println!(
        "leaf strong = {}, weak = {}",
        Rc::strong_count(&leaf),
        Rc::weak_count(&leaf),
    );
}

Listing 15-25: Creating branch in an inner scope and examining strong and weak reference counts

Once leaf is created, its Rc has a strong count of 1 and a weak count of 0. In the inner scope we create branch and associate it with leaf, at which point the Rc in branch will have a strong count of 1 and a weak count of 1 (for leaf.parent pointing to branch with a Weak<T>). Here leaf will have a strong count of 2, because branch now has a clone of the Rc of leaf stored in branch.children, but will still have a weak count of 0.

When the inner scope ends, branch goes out of scope and the strong count of the Rc decreases to 0, so its Node gets dropped. The weak count of 1 from leaf.parent has no bearing on whether Node is dropped or not, so we don’t get any memory leaks!

If we try to access the parent of leaf after the end of the scope, we’ll get None again. At the end of the program, the Rc in leaf has a strong count of 1 and a weak count of 0, because the variable leaf is now the only reference to the Rc again.

All of the logic that manages the counts and value dropping is built in to Rc and Weak and their implementations of the Drop trait. By specifying that the relationship from a child to its parent should be a Weak<T> reference in the definition of Node, we’re able to have parent nodes point to child nodes and vice versa without creating a reference cycle and memory leaks.

Summary

This chapter covered how you can use smart pointers to make different guarantees and tradeoffs than those Rust makes by default with regular references. Box<T> has a known size and points to data allocated on the heap. Rc<T> keeps track of the number of references to data on the heap so that data can have multiple owners. RefCell<T> with its interior mutability gives us a type that can be used when we need an immutable type but need the ability to change an inner value of that type, and enforces the borrowing rules at runtime instead of at compile time.

We also discussed the Deref and Drop traits that enable a lot of the functionality of smart pointers. We explored reference cycles that can cause memory leaks, and how to prevent them using Weak<T>.

If this chapter has piqued your interest and you want to implement your own smart pointers, check out “The Nomicon” for even more useful information.

Next, let’s talk about concurrency in Rust. We’ll even learn about a few new smart pointers.

results matching ""

    No results matching ""