A Grounded Conceptual Model for Ownership Types in Rust

1. Introduction

Ownership is a programming discipline for managing the aliasing and mutation of data, enforced statically through ownership types. The flagship programming language for ownership is Rust, which empowers programmers to write memory-safe code without garbage collection. Rust’s ownership model synthesizes several ideas from programming language research, such as linear logic,⁹ class-based alias management,⁶ and region-based memory management.¹⁰ Developers cannot use languages like C and C++ to build memory-safe systems at scale,¹⁹ so the software industry is turning toward Rust. For example, Google’s Android team has thus far found zero memory vulnerabilities in 1.5 million lines of Rust code.²⁴

This rosy picture of tech transfer belies a persistent obstacle: teaching Rust to prospective users, especially about ownership. Over the last four years, studies have found that Rust learners struggle to fix ownership type errors,²⁸ and users self-report that ownership is among their biggest barriers to learning Rust.²³ To wit: Advances in the technical factors of type systems require commensurate advances in the human factors of type systems.

Our work started with the question: How can we systematically design a pedagogy for ownership types? Today, popular pedagogies for advanced type systems are driven by experts’ intuitions about how people learn, as well as by intuitions about what makes type systems difficult to understand. As practicing (computer) scientists, we wanted to approach pedagogic design through scientific principles: grounding the pedagogy in observations about the struggles of Rust learners, and then evaluating the pedagogy by its effects on learning outcomes. This paper describes how we put these principles into practice:

We ran a formative user study to identify misconceptions about ownership types held by Rust learners (Section 3). We designed a new instrument for evaluating understanding of ownership, the Ownership Inventory, by drawing tasks from commonly reported Rust issues on StackOverflow. We studied N = 36 Rust learners trying to solve Ownership Inventory problems. We found that learners can generally identify the surface-level reason for why a program is ill-typed with respect to ownership. However, learners do not understand what undefined behavior (if any) would occur if an ill-typed program were executed. This misunderstanding is reflected in inefficient and incorrect strategies used to fix ownership errors.
We developed a conceptual model of ownership types to address these misconceptions (Section 4). The conceptual model represents the aspects of Rust’s static and dynamic semantics that are relevant to ownership while abstracting other details. The model provides learners a foundation to understand essential concepts such as undefined behavior and the incompleteness of Rust’s ownership type-checker, or “borrow checker.” We implemented tools to execute Rust programs under the conceptual model, generating traces which we visualize to illustrate how the model applies to concrete examples. We wrote a new chapter on ownership for a popular Rust textbook, The Rust Programming Language (TRPL),¹² using this conceptual model with these visualizations.
We A/B tested our pedagogy against the TRPL baseline (Section 5). We set up and advertised a public website that hosts our TRPL fork. We measured learning outcomes with two kinds of quizzes: simpler comprehension questions about the conceptual model, and more difficult multiple-choice versions of the Ownership Inventory. Learners could correctly answer comprehension questions with 72% accuracy. Our initial deployment improved the average Inventory score from 48% to 57% (N = 342, p < 0.001, d = 0.56).

2. Background

If you, the reader, are not familiar with Rust or ownership, then it’s important to understand the basics to understand this paper’s contributions. One contribution is exactly an explanation of the basics of ownership. We will therefore give a brief exposition of Rust in the form of a heavily abbreviated version of the actual pedagogy we developed for Rust learners. Later sections will refer back to explain the design rationale of the concepts and diagrams mentioned here.

2.1 Ownership.

Rust is a systems programming language, most similar in its aims and scope to C++. Rust contains many standard imperative language features, such as mutable variables, conditionals, arrays, loops, structs, and functions.

Rust manages memory through a system of compile-time checks known as ownership, as opposed to runtime checks (garbage collection) or manual checks (malloc/free). The basic idea is that heap allocations are owned by a variable, and allocations are freed when their owner goes out of scope, unless the owner moves ownership to another variable. This concept is illustrated in Figure 1.

Figure 1. An example of how ownership and moves work in Rust with boxed arrays. At L1, assigning an array to a variable `a` puts the data on the stack. At L2, assigning `b` to `a` copies the array into `b`. At L3, we introduce a box, which owns data on the heap, therefore `c` points to a heap allocation. At L4, assigning `d` to `c` copies the pointer (but not the array). This copy is called a move because `c` can no longer be legally used. Once the function ends, the heap allocation is deallocated on behalf of its owner, `d`.

The goal of ownership is to ensure memory safety, or more generally, to prevent undefined behavior. Undefined behavior means runtime operations without semantics under the Rust language specification, such as dereferencing a null pointer or pretending a string is a boolean. To prevent undefined behavior, the Rust compiler enforces several rules. First, data always has exactly one owner. Second, data is only deallocated on behalf of its owner. Third, data can only be accessed through its owner (except through references, discussed next). These rules prevent situations such as double-frees or use after-frees as shown in Figure 2.

Figure 2. Rust’s ownership checks prevent undefined behavior. This diagram shows the counterfactual semantics for a program which does not actually compile. If allowed to compile, the box `x` would be moved into the call to `print` at L2, and get deallocated at the end of `print`, leaving `x` pointing to invalid memory at L3. It would violate memory safety to use `x` at L4. Rust catches this error, reporting “`error: use of moved value: ‘x‘`” at L4.

2.2 Borrowing.

To allow accessing data without owning it, Rust provides non-owning pointers called references. References borrow data owned by another variable. Those references can be either immutable (created by &) or mutable (created by &mut). Figure 3 shows an example of creating an immutable reference to a piece of a string, and creating a mutable reference to pass to a helper function.

Figure 3. An example of immutable and mutable references in Rust. At L1, `s` owns the string “Hello” on the heap. At L2, the reference `lo` points to the end of `s` but does not own it. At L3, a mutable reference to `s` is used to append to the string. This causes the string’s buffer to resize, invalidating the reference `lo`. At L4, `s` is still valid, because `arg` did not own `s`.

Rust enforces safety with references through the concept of permissions. Permissions describe the kinds of operations one can do to data; they come and go based on which owners and references are in use at any given time. Specifically, Rust tracks three permission: readable (R), writable (W), or ownable (O). Figure 4 shows one example of how permissions change over the course of a program.

Figure 4. Each table shows how the permissions on variables change after each statement. On line 1, the mutable owned string `x` has all three permissions. On line 2, creating an immutable reference `y` to `x` removes the **`W`** and `O` permissions from `x` so `y` cannot be invalidated while in use. `*y` only has `R` permissions because it is an immutable reference. On line 3, `x` regains its permissions after `y` is no longer used. On line 4, `x` loses its permissions after it is no longer used.

Each operation expects certain permissions from its inputs. For example, getting the length of a vector requires R. Pushing an element to a vector requires W. Deallocating a vector requires O. Rust’s borrow checker analyzes a program to ensure that variables have their requisite permissions in each operation, raising a compiler error otherwise. Figure 5 shows how these expectations occur in a Rust program.

Figure 5. Operations which expect permissions show their expectations inline. On line 2, the borrow `&x` requires **`R`** permission on `x`, which it has. On line 3, the method call `.push_str` requires **`R`** and **`W`**, but `x` is not writable due to the borrow by `y`. This program is rejected by the compiler with the error: “`error: cannot borrow ‘x‘ as mutable because it is also borrowed as immutable`”.

3. A Concept Inventory for Ownership

Before this work, Rust learners did not learn about ownership in the style of Section 2. Prior Rust learning resources adopted a range of pedagogical techniques, but the consistent outcome was that Rust learners struggled in practice to work with ownership. We first sought to understand the core misconceptions that underlay those struggles. Our overarching methodology was the development of a concept inventory for ownership, henceforth called the “Ownership Inventory.” In education research, a concept inventory (CI) is a test, usually composed of multiple-choice questions, about a narrow domain where the questions and distractors are drawn from common misconceptions about the domain.¹¹

We set out to design the Ownership Inventory for two reasons. First, the misconceptions we would observe in creating the Inventory would inform our eventual pedagogy. Second, we could use the Inventory to evaluate the efficacy of an intervention. If our pedagogy is better than before, then it should cause learners to score higher on the Inventory.

To construct the Inventory, we designed open-ended questions about ownership in situations that frequently stymie Rust learners. We searched for the most common questions asked about Rust on StackOverflow that pertain to ownership. We iteratively categorized each question and identified four main categories of ownership problems: dangling pointers, overlapping borrows, illegal borrow promotion, and lifetime parameters. For each category, we selected a few representative StackOverflow questions and cleaned up the snippet in question. For example, Figure 6 shows a StackOverflow question in the “illegal borrow promotion” category and the corresponding clean program.

Figure 6. An example of how we created snippets for the Ownership Inventory.

We designed a single set of template questions that apply to each program. The template represents each stage of reasoning involved in fixing an ownership error:

What error message would you expect from the compiler?
Assume the compiler did not reject this function. (A) What is a program that calls this function which would violate memory safety or cause a data race? (B) Explain your reasoning.
(A) How can this function be changed to pass the compiler while preserving its intent? (B) Explain your reasoning.

3.1 Methodology.

After developing the open-ended Ownership Inventory, we next administered the Inventory to elicit misconceptions that Rust learners have about ownership. We recruited $N = 36$ participants for the study. We found Rust learners by embedding an advertisement for the study within the online version of TRPL. Participants were required to be 18 years or older, and they were required to have completed reading TRPL before participating. Participants were compensated $20 and had, on average, $1.7$ ( $σ = 2.2$ ) prior years of experience with either C or C++.

We created a Web interface that presents participants with a program and prompts for open-ended responses to each question. The interface uses the Monaco code editor running a Rust language server via a WebAssembly build of Rust Analyzer. The in-browser IDE allows participants to get information about the type and functionality of unfamiliar methods.

To evaluate the accuracy of participant’s answers, the first two authors independently coded each response as correct or incorrect. After the first round of coding, the authors resolved major disagreements, then independently re-coded the data. After the second round, the inter-rater reliability was 91% in terms of raw agreement and $κ = 0.81$ as measured by Cohen’s $κ$ . We considered this sufficient agreement to proceed with the analysis. For the quantitative results, we report scores as the average of the two raters’ scores on each item. To characterize the specific misconceptions that led to incorrect answers, we performed a thematic analysis of participant responses.

3.2 Results.

Participants could usually predict why the borrow checker would reject a program (Q1, 64% correct). However, participants were less able to fix the program (Q3a, 46%) and design counterexample (Q2a, 31%). Their accuracy further drops when asked to justify their answer (Q2b, 15% and Q3b, 31%). That is, participants could sometimes create counterexamples and fixes without understanding why their answer is correct.

Participants’ reasonable performance on Q1 suggests that Rust learners generally understand the surface-level reason for why a program is rejected. However, participants’ comparatively poor performance on Q2a and Q2b suggests Rust learners do not understand the deeper reasons that justify the ownership rules. For instance:

Participants struggled to construct a correct counterexample to an unsafe function. Participants often gave counterexamples which executed the function but failed to actually trigger undefined behavior, such as dereferencing an invalid pointer.
Participants struggled to identify when a function is actually safe and no counterexample exists, such as mutably referencing disjoint indexes in an array.
Participants could usually change a broken function to pass the borrow checker, but these fixes were not always correct and idiomatic. For example, participants might clone a data structure, mutate the clone, and then immediately throw away the cloned data.

4. A Conceptual Model for Ownership

Our study of the open-ended Ownership Inventory suggested Rust learners lacked a depth of understanding of ownership in a few key areas. In particular:

Rust learners generally did not understand what the borrow checker was protecting them from, that is, what would go wrong without ownership.
Rust learners generally could not distinguish between issues of soundness and completeness, that is, whether a program was genuinely unsafe, or rejected due to limitations of the borrow checker.

Our hypothesis was that the core missing pedagogical component was a durable conceptual model of Rust’s semantics with respect to ownership. A conceptual model is a way of thinking about a thing, distinct from a mental model as the concrete way a given person thinks about a thing. The challenge for designers of pedagogy is to provide a conceptual model at an appropriate level of abstraction.

Starting with an example simpler than ownership, consider teaching about integer addition in Rust. The challenge is to design a conceptual model of the semantics of the statement let z = x + y where x, y : i32. The true runtime semantics of Rust’s integer addition include aspects like two’s complement overflow and auto-vectorization due to LLVM’s optimizations. But for the average Rust user, these details are usually irrelevant for correctly using addition in routine programming tasks. A conceptual model that approximates the semantics as “ $x, y \in Z$ and $x + y$ is standard integer addition” is a generally viable model.

In the context of ownership, Rust learning resources would often provide a conceptual model in the form of patterns of bad code. These books explain concepts like mutable references by saying, for example, “Borrows can be read-only or read-write. Only one read-write borrow can exist at any one time.”¹⁸ However, these types of explanations provide little grounding to understand either the safety justification for ownership, or the distinction between soundness and completeness. We therefore set out to build a more holistic conceptual model of ownership that could encompass these concerns.

4.1 Conceptual model.

We developed two related conceptual models of Rust’s runtime (dynamic) semantics and Rust’s compile-time (static) semantics. A key design constraint for both models is that they needed to be executable, that is, we could simulate a program under each model to collect trace information for visualization as opposed to a conceptual model that exists purely in prose or mathematical formalism.

The main goal of the dynamic semantics model is to abstract Rust’s runtime enough to eliminate details irrelevant to ownership (e.g., whether data is placed in a register or spilled onto the stack) while maintaining details essential to ownership (e.g., whether data is allocated on the heap). Conveniently, the Rust compiler contains an intermediate representation of Rust programs called MIR, and MIR satisfies both design goals: Its semantics hit the right level of abstraction, and the Rust compiler provides a built-in interpreter for MIR for purposes of constant evaluation. We therefore instrumented this interpreter to generate the diagrams shown in Figure 1 and Figure 2.

A key aspect of the dynamic semantic model is that we can configure the MIR interpreter to evaluate programs even when they contain borrow checker violations. This allows Rust learners to observe the counterfactual behavior of what would go wrong without the protections of the borrow checker, as shown in Figure 2.

The dynamic model provides the foundation for understanding how programs can go wrong. The static model should then help learners understand how Rust’s borrow checker catches programs that could go wrong (soundness), as well as when safe programs may be rejected (incompleteness). As a flow-sensitive analysis, borrow checking is more complex than the usual type system encountered by today’s programmers. So, in designing a conceptual model of the borrow checker, our main goal was to condense the complex intermediate state of the analysis into a comprehensible, visualizable object. The result is the permissions model of borrow checking, as shown in Figure 4 and Figure 5.

The permissions model allows us to explain how the borrow checker “thinks” about a program in a more general manner than describing one-off rule violations. For example, permissions can be computed and visualized for programs that pass the borrow checker, as shown in Figure 4. This model separates out how the compiler computes facts about a program (the permission change tables) versus how those facts are used to check a program (the permission expectations).

These permissions are a conceptual model in that the Rust compiler does not literally compute the RWO permissions. We implemented an analysis which derives the permissions from the internal state of the borrow checker, and that analysis drives the permissions visualization. The permissions are formally equivalent to the actual borrow checker in a manner we demonstrate in the full paper.

4.2 Pedagogy.

After designing the conceptual models, we next designed a pedagogy to help Rust learners internalize these models, as in Section 2. Rather than designing an entire Rust curriculum from scratch, we instead forked TRPL,¹² which covers most of the language’s core features, and it is the official Rust learning resource endorsed by the Rust project.

We designed a new pedagogy of ownership as a replacement for the existing chapter on ownership in TRPL. We start by explaining the core ideas of undefined behavior and memory safety through boxes and moves (the dynamic model). We then introduce references, the borrow checker, and permissions (the static model). Finally, we synthesize these ideas by providing multiple examples of how a Rust programmer can interpret and fix ownership errors, emphasizing the distinction between soundness and completeness. The full text of the chapter is available online.^a

5. Evaluation

We sought to evaluate our pedagogy on whether it helps learners understand ownership in Rust. This raised two immediate questions. First, how do we find learners to try out our pedagogy? The vast majority of CS education research takes place in a classroom, but we explored an alternative route: free online textbooks. These resources provide access to a larger and more diverse population of learners than CS undergraduates at a single institution. To that end, we set up a publicly accessible website that hosts our TRPL fork, and it has been visited by tens of thousands of Rust learners to date. This site provides a research platform for analyzing and intervening in the Rust learning process. The intervention described in this paper is the first step in an ongoing experiment to leverage the platform for systematically improving Rust education at scale.

The second key question is: How do we know if learners understand ownership after following our pedagogy? vc is difficult to define—ideally, a longitudinal study might measure understanding as learners’ ability to productively write safe and performant Rust code in their context of use. But for lack of such data, we instead opted for a common substitute: quiz questions. We diffused the quiz questions throughout the ownership chapter and the rest of the book. Furthermore, we distinguished between two kinds of quiz questions designed to answer two research questions:

Does our pedagogy help learners understand ownership at all?
Does our pedagogy help learners understand ownership better than before?

For RQ1, we asked participants simple comprehension questions about ownership, presented immediately following the book content that is relevant to a given question. These questions determine whether participants can transfer their ownership knowledge to situations similar to the text. Because the questions make references to the permissions model, we cannot establish a score baseline. Therefore we judge the scores in absolute rather than relative terms.

For RQ2, we gave participants a multiple-choice version of the Ownership Inventory. We inserted these questions later in the book after covering the essential prerequisites for a given program. To compare the baseline TRPL pedagogy against ours, we ran a kind of temporal A/B test. Participants answered Inventory questions after reading the original TRPL content for a few weeks. We then deployed the intervention and continued receiving responses to the Inventory. We quantified the pedagogy’s effect based on the resulting change in scores.

5.1 Methodology.

We recruited participants by advertising in the title page of the official Web version of TRPL, courtesy of the authors. Since this advertisement was put up on Nov. 1, 2022, our TRPL fork has received an average of 450 visitors per day, as measured by unique session IDs stored via cookies.

We developed 11 comprehension questions to cover the content of the ownership chapter. Figure 7 shows one example—to test understanding of permission diagrams, we asked participants to infer the permissions for a path at a given point.

We developed 24 Ownership Inventory questions based on the programs in Section 3. For each program, we created a close-ended version of Q1, Q2a, and Q3a (the justification questions Q2b/Q3b did not translate to the multiple-choice setting). Following the concept inventory methodology, we selected distractors from common misconceptions about the Inventory programs.

Figure 8 shows an example of a multiple-choice Ownership Inventory question. The make_separator task is an instance of a dangling stack reference. A common incorrect counterexample provided by participants in Section 3 was the snippet let s = make_separator("") which creates a dangling pointer, but does not use it. By using that incorrect answer as a distractor, the multiple-choice question is more likely to test for the presence of this misconception. As an example, see the Chapter 6 Inventory questions.^b

On January 3, 2023 we deployed the questions to our website and started gathering pre-intervention data on performance with the baseline TRPL pedagogy. After 45 days, on Feb. 17, 2023 we deployed the initial draft of our new ownership chapter. After a few months of iteration, on June 15, 2023 we began gathering post-intervention data. Data collection continued for 45 days until July 30, 2023.

5.2 Results.

5.2.1 RQ1: Does our pedagogy help learners understand ownership at all?

Overall, readers’ accuracy on the comprehension questions was 72%, suggesting that readers could mostly understand the basic concepts within our pedagogy. Readers were able to successfully interpret both runtime and compile-time diagrams and identify when the compiler was going to reject a program.

However, readers’ mediocre performance on a few of the comprehension questions suggests their understanding may be somewhat shallow. For example, the ownership chapter provides an example program containing a variable x : &Box<i32>, and explains that two dereferences like **x are needed to access the inner integer. One question presents a program that constructs an expression of type Box<&Box<i32>> (including a runtime diagram), and asks respondents to determine the number of dereference operations needed to access the inner i32. Only 47% of respondents correctly answer three, suggesting readers still leave with a somewhat fragile understanding of an essential concept like pointers.

5.2.2 RQ2: Does our pedagogy help learners understand ownership better than before?

We focus on the first 18 Inventory questions, as those questions received enough responses to make statistical inferences. First, we analyze the overall Inventory scores for the $N =$ 177 (pre) / 165 (post) participants who completed the first 18 questions. The average pre-intervention score was 48% ( $σ = 16 %$ ). Notably, the average score on the open-ended Inventory questions in Section 3 was 41% (which should be more difficult than equivalent multiple-choice questions), showing that the quantitative results of the formative study reasonably generalized to a larger sample. The average post-intervention score was 57% ( $σ = 15 %$ ). Using a two-tailed $t$ -test, the difference is statistically significant ( $p < 0.001$ ). The normalized effect size as measured by Cohen’s $d$ is $0.56$ . Therefore, the pedagogy had a statistically significant positive effect (+9%) on overall Inventory performance. The results also confirm that Inventory questions are substantially harder than the comprehension questions.

Second, we analyze the intervention’s effect on each Inventory question individually. The intervention had a statistically significant effect on 10/18 questions. Table 1 shows the size of these effects, including almost-significant effects. Overall, the pooled significant effect was 10% or $d = 0.22$ (note that the question-level effect size is smaller than the quiz-level effect size due to the higher per-question variance). Between the different types of questions, the intervention primarily affected performance on questions about undefined behavior (Q2) and fixing a type error (Q3) more so than identifying a type error (Q1). For example, with make_separator Q2, the +7% effect corresponds to an 8% decrease in the incorrect response of “does not have counterexamples.” Conversely, for reverse and apply_curve Q2, the +13%/+17% effects correspond to participants answering correctly that these functions are safe and do not have counterexamples.

Table 1. Effects of the permissions pedagogy for readers’ accuracy on Ownership Inventory questions. Questions are presented in the order encountered by readers. Only effects with $p < 0.15$ are included here, with $p < 0.05$ in bold.

Task	Q.	Before	$N$	After	$N$	Effect	$d$	$p$
`make_separator`	Q2	33%	1120	40%	660	+7%	0.14	0.007
`make_separator`	Q3	57%	1120	62%	660	+5%	0.11	0.024
`get_or_default`	Q1	56%	1120	66%	660	+10%	0.21	<0.001
`get_or_default`	Q2	10%	1120	16%	660	+6%	0.19	<0.001
`remove_zeros`	Q3	35%	629	52%	470	+17%	0.34	<0.001
`reverse`	Q2	28%	629	40%	470	+13%	0.27	<0.001
`reverse`	Q3	21%	629	33%	470	+13%	0.29	<0.001
`find_nth`	Q1	86%	314	90%	374	+4%	0.12	0.099
`find_nth`	Q2	16%	314	23%	374	+7%	0.18	0.018
`find_nth`	Q3	27%	316	34%	374	+7%	0.15	0.041
`apply_curve`	Q2	41%	452	58%	374	+17%	0.34	<0.001
Pooled significant effect:						+10%	0.22

5.3 Threats to validity.

This experiment assumes that the Ownership Inventory is a valid instrument to measure a person’s understanding of ownership. To that end, we designed the Inventory such that the situations reflect common ownership problems (by weighting based on StackOverflow), and such that the questions reflect each stage of reasoning about ownership (based on our formative study). However, future work should validate the extent to which performance on the Inventory correlates to performance in solving ownership problems in practice.

The setting of an online textbook provides the benefit of scale, but it also poses methodological challenges due to lack of controls. One such threat is the uncontrolled quizzing environment. A reader could augment their problem-solving with external aids like a friend, a compiler, a Google search, a large language model, and so on. Participants could also be influenced by learning material outside the book, such as the official or Rust-related YouTube videos. To combat this threat, we explicitly instructed participants to not use external resources while solving quiz problems, and the quiz widget takes over the browser tab while taking a quiz. Moreover, we assume the average participant will be a good actor—our readers are taking these quizzes for their own edification, not to get paid by us or to get a good grade. Gathering enough data should turn bad actors into noise.

Another threat is the uncontrolled assignment to experimental conditions. We chose not to perform a randomized-controlled trial for the reasons discussed in Section 5.1. However, it is possible that temporal correlations in readership could have affected our results. For example, if all the C++ engineers at one company decided to start learning Rust at the same time, then average scores would likely go up in that window of time compared to the average in the limit.

A final threat is teaching to the test. Unlike us, the TRPL authors were not aware of the Ownership Inventory when they wrote the book. At the extreme, if our pedagogy taught the exact answers to Inventory questions, then Inventory scores would not be a useful measure of ownership understanding. At the same time, part of the point of our experiment is exactly to teach to the test! For example, the Inventory is intentionally designed to measure understanding of undefined behavior; in turn, we intentionally designed our new pedagogy to explain undefined behavior. Like any well-meaning educator, we sought a balance. The Inventory materials do not appear anywhere in the revised text. But we do, for instance, walk through an example of how iterator invalidation causes undefined behavior, which is similar to the remove_zeros problem.

6. Related Work

Almeida et al.¹ created RustViz, a visualization format for ownership annotations on a Rust program. In terms of pedagogy, RustViz’s premise is that the key challenge with ownership is “the user must learn to mentally simulate the logic of the borrow checker.” Our pedagogy is based more on connecting Rust’s static and dynamic semantics, which we show in our formative study to be a more serious problem for Rust learners. In terms of implementation, RustViz diagrams are constructed by hand using a DSL while we automatically generate our diagrams from the compiler. In terms of evaluation, Almeida et al. deployed RustViz in a classroom, finding that students responded to a Likert item that the visualizations were “helpful in terms of improving their understanding of ownership.” Our evaluation goes further to quantify the effect of our pedagogy on learning outcomes.

Our runtime diagram is similar to program-state visualizers in prior work—see Sorva²² for a survey. Our findings about misconceptions of undefined behavior and memory safety are consistent with prior work on teaching C. For instance, Lam et al.¹³ found in a study of undergraduates who had taken a computer systems course that “many students displayed little knowledge or had misunderstandings about memory and memory layout” and would simply say “something bad” happens when unsafe operations occur.

Our work continues a line of CS education research about conceptual models. Bayman and Mayer² first showed that an appropriate conceptual model for BASIC could help students “develop fewer misconceptions […] and perform better on transfer tests.” du Boulay⁸ coined the term “notional machine” for conceptual models specifically of a language’s dynamic semantics, which has received renewed focus in recent years.⁷ Our work differs from most research on notional machines by focusing equally on a conceptual model of static semantics.

Our work also intersects with a line of programming language research on the human factors of type systems and functional languages. Most prior work has focused on algorithms for identifying the root cause of confusing type-inference errors.²⁵^,²⁷ Recent work has broadened scope to develop theories about how programmers read functional programs¹⁶ and leverage the type system during development.¹⁴

This paper focuses on ownership types as they are implemented in Rust, but ownership types have taken many forms in prior work.⁵ For instance, early systems of ownership focused on ensuring uniqueness of access to data by checking for dominance in the alias graph.⁶ Later systems relaxed this constraint by permitting temporary borrowing of data, both mutably³ and immutably.²⁰ The connection between ownership and permissions has been well-established within formal models, such as fractional permissions.⁴

7. General Discussion

Future programming languages will undoubtedly have increasingly complex type systems. Rust is the language du jour, so this work focused on ownership types. But the next popular language could bring a renewed emphasis to any existing line of PL research: refinement types, session types, or even theorem proving. Effective transfer of these technologies will require pedagogies that do not expect learners to come equipped with Ph.D.-level knowledge of programming languages, mathematics, and Greek. While our immediate goal in this work was to make ownership types more understandable, our broader goal was to explore the viability of different techniques for improving PL pedagogy. In this section, we will briefly reflect on lessons learned.

First, to develop a metric for understanding of ownership, we created a concept inventory by combining data from StackOverflow with a formative study of Rust learners. StackOverflow works for popular languages like Rust but is less useful for niche languages. Human-factors research on niche languages can instead consider using telemetry from developer interactions as has been explored for Racket¹⁵ and Coq.²¹ The concept inventory is an idea that could easily be reused in the context of other languages. Inventories can serve as communal benchmarks for progress in education research, like how datasets of programs serve as benchmarks for performance in compiler research.

Second, to develop a conceptual model for ownership, we carefully selected a level of abstraction that was concrete enough to explain relevant phenomena like undefined behavior, while abstract enough to avoid unnecessary details. We leveraged the rich prior work on distilling the Rust type system into a small, explainable set of mechanisms, especially the Oxide²⁶ and Polonius¹⁷ models. However, PL research usually distills type systems to permit formal reasoning, such as a soundness proof. An open question is how to distill type systems for didactic reasoning, that is, to help learners acquire a conceptual model valid for common tasks. For example, one of our principles was that our model must be encodable in a concise visual representation, which is not a property usually expected of standard PL research. Future work can investigate the properties of semantics that make them more or less explainable.

Finally, to evaluate the efficacy of our pedagogy, we publicly deployed our textbook and compared pre/post-intervention scores on the Ownership Inventory. Collecting telemetry from quizzes in online learning resources is a readily applicable strategy for other contexts. Learners want to take quizzes to engage with the content they are reading. Temporal A/B testing offers a lightweight method for evaluating content changes without sophisticated infrastructure. We encourage anyone interested in programming language learning to try out our methodology. To that end, we have open-sourced our front-end quiz plug-in and our back-end telemetry system.^c

8. Acknowledgments

The authors are immensely grateful to Niko Matsakis and Amazon. They provided both the encouragement and the funding to initiate this project, and supplied additional emergency funding when our first grant application fell through because we were studying Rust instead of C++. We thank Carol Nichols for taking a leap of faith in allowing us to advertise in trpl; this was essential for driving traffic to the experiment. Later parts of this work are partially supported by the US NSF under Grant No. 2319014.

9. Originally Published

This paper was originally published in Proceedings of the 2023 Intern. Conf. on Object-oriented Programming, Systems, Languages, and Applications.