Motivation
Companies ship software containing security vulnerabilities to millions of customers. For C++ products, 70% of those vulnerabilities would be stopped by a memory-safe language. There’s growing pressure to move off memory-unsafe languages and onto safe languages like Rust, Swift, Java and C#. The US Government is calling for safety roadmaps from big vendors outlining how they’ll migrate to memory-safe languages for new code. The deadline for these roadmaps is coming up: January 1, 2026.
What can be done to hasten the migration to safe coding?
I proposed the Safe C++ extension. This overhauls Standard C++ with memory safety capabilities. It implements the same borrow-checking technology as featured in Rust. This is one path for C++ projects to start writing memory-safe code.
A second viable path to safety is through improved Rust interop. The recent study Eliminating Memory Safety Vulnerabilities at the Source demonstrates that old production code contains fewer vulnerabilities than new code. Time has debugged it. Consequently, the best way to reduce vulnerabilities is to put existing code in maintenance mode and write new code in a memory-safe language. This document explores an idea for dramatically reducing interop friction between C++ and Rust. If it’s easy to use C++ code from Rust, developers will be more open to making the transition.
This is a proposal about molding C++ to support all of Rust’s vocabulary types to increase the surface area of interop between the languages.
C interoperability
Operating systems expose functionality through C APIs. Standard libraries, for any language, are built atop these system APIs. Interoperability with C is very easy for language developers. There’s no overloading of declarations. There are no templates or generics. Structs have straightforward layout rules that are no challenge to implement.
From a compiler’s point of view, C’s ABI is just the parameter-passing convention of the platform your program targets. Unix-like systems follow the ELF object file conventions. Each CPU architecture has an ELF or System V supplement that specifies struct layouts, parameter passing and object file definitions.
Peruse the x86-64 System V ABI for details on processor-specific conventions. C abstracts these concerns from the user. If you code against the C language, your software should compile for many operating systems and hardware architectures.
Languages provide a way to define C-layout structs. Compilers implement the parameter-passing conventions for each target. Voilà. C interoperability.
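In Rust, the two ingredients named above are `#[repr(C)]` for layout and `extern "C"` for the calling convention. The sketch below is illustrative: `Point`, `point_sum` and `point_layout` are hypothetical names, and the expected offsets assume a target where `i32` is 4-byte aligned, as on mainstream platforms.

```rust
use std::mem;

/// C-layout struct: fields keep declaration order and get the padding the
/// platform C ABI would insert, matching the C declaration
/// `struct Point { int8_t tag; int32_t x; int32_t y; };`
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Point {
    pub tag: i8,
    pub x: i32,
    pub y: i32,
}

/// Uses the platform's C calling convention, so a C caller can pass `Point`
/// by value. (A real export would also add `#[no_mangle]` so the symbol is
/// plain `point_sum`.)
pub extern "C" fn point_sum(p: Point) -> i32 {
    p.x + p.y
}

/// The layout facts both toolchains must agree on:
/// (size, alignment, offset of tag, offset of x, offset of y).
pub fn point_layout() -> (usize, usize, usize, usize, usize) {
    (
        mem::size_of::<Point>(),
        mem::align_of::<Point>(),
        mem::offset_of!(Point, tag),
        mem::offset_of!(Point, x),
        mem::offset_of!(Point, y),
    )
}
```

Three bytes of padding follow `tag`, exactly as a C compiler would insert them, which is why no additional negotiation between the toolchains is needed.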
C++ interoperability
To call C functions, you don’t need much. To call C++ functions you need all the intelligence of a C++ frontend. There are a lot of factors that contribute to making C++ interoperability a colossal challenge.
- C++ functions can overload. Which function candidate does a call expression select? That requires overload resolution, an extremely complicated process for finding the best viable candidate.
- How are function templates considered? That requires argument deduction, specialization and substitution. Very complicated subsystems.
- Function parameters bind arguments using implicit conversion sequences. Conversion sequences may invoke user-defined conversions or copy constructors, and those may require argument deduction, specialization and substitution.
- It’s not feasible to express C++ function declarations in other languages, because C++ function declarations may involve almost every feature in the language. Dependent expressions appear in array bounds, requires-clauses and noexcept-specifiers. Expressions also appear in unevaluated contexts, such as decltype-specifiers, sizeof-operator, alignof-operator and noexcept-operator. These expressions may require overload resolution, argument deduction and substitution to evaluate.
- How do you distinguish overloaded declarations? You need name mangling, which stringifies many parts of a function declaration to prevent different overloads from colliding in the binary.
C++ is a big knot that can’t be untangled. If each language feature is a hitch or a bend, tugging at one concern just tightens the others.
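Name mangling is the most mechanical item in the list above, and even a tiny example shows that it depends on type information. The sketch below covers a deliberately small subset of the Itanium C++ ABI scheme used on Unix-like platforms: free functions in the global namespace whose parameters are a few builtin types. The `Builtin` enum and `mangle_free_function` are hypothetical names. Real mangling also encodes namespaces, classes, templates, qualifiers, references and substitutions. Its output for `throw_it()` matches the `_Z8throw_itv` symbol in the RTTI listing of this document.

```rust
/// A few Itanium C++ ABI <builtin-type> codes (subset only).
#[derive(Clone, Copy)]
pub enum Builtin {
    Void,
    Bool,
    Char,
    Int,
    UnsignedInt,
    Long,
    Float,
    Double,
}

fn code(t: Builtin) -> char {
    match t {
        Builtin::Void => 'v',
        Builtin::Bool => 'b',
        Builtin::Char => 'c',
        Builtin::Int => 'i',
        Builtin::UnsignedInt => 'j',
        Builtin::Long => 'l',
        Builtin::Float => 'f',
        Builtin::Double => 'd',
    }
}

/// Mangle a global-namespace free function: `_Z <length><name> <param types>`.
/// An empty parameter list is encoded as a single `v` (void).
pub fn mangle_free_function(name: &str, params: &[Builtin]) -> String {
    let mut out = format!("_Z{}{}", name.len(), name);
    if params.is_empty() {
        out.push('v');
    } else {
        for &p in params {
            out.push(code(p));
        }
    }
    out
}
```

Because the parameter types are part of the symbol, `f(int)` and `f(long)` land on different linker names, which is how overloads coexist in one binary.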
Two classes of interop
Let’s break the interop problem into first- and second-class levels of support.
- Coverage - First-class support for language features. By definition, C++ has coverage of all C++ features and Rust has coverage of all Rust features. We can grow the interop surface area by adding Rust feature coverage to C++ and C++ feature coverage to Rust. Existing C++/Rust interop tools rely on coverage, which is why they operate at the C language level of abstraction: both toolchains support C layout and function calls. Coverage is high-quality but it increases the complexity of the toolchain.
- Intelligence - Second-class support for language features. Language A needs access to data and semantics of language B. If you want to call a C++ function, you need argument-dependent lookup, overload resolution, argument deduction, substitution and so on. Rather than language A implementing this complex logic, it gets provided across toolchains by language B.
Intelligence is the novel portion of this design. Expose compiler functionality through an API. Using the API, point the compiler at a module or header file to parse it and return a metadata tree of declarations. Submit a query, such as a request for the primary, partial or explicit specialization of a class template, and retrieve a result. This compiler-as-library, which provides intelligence, is a language server. Rust and C++ compilers can access data and semantics by utilizing one another as language servers.
Coverage goes beyond intelligence in letting you not just use declarations, but define them. We don’t propose adding coverage of C++ templates into Rust (it already has a rich generics system), so you can’t define templates in Rust. But you can use them through the C++ language server.
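To make the parse-then-query workflow concrete, here is a sketch under loudly stated assumptions: the `LanguageServer` trait, `FnDecl`, `ToyCxxServer` and `decl` are all hypothetical. No shipping compiler exposes this interface. The toy server’s “overload resolution” accepts only exact type matches.

```rust
/// Hypothetical declaration metadata a language server might return.
#[derive(Clone, Debug, PartialEq)]
pub struct FnDecl {
    pub name: String,
    pub params: Vec<String>, // parameter types, spelled as C++ source text
    pub symbol: String,      // mangled symbol the caller must link against
}

/// Hypothetical compiler-as-library interface.
pub trait LanguageServer {
    /// Declarations discovered when the server parsed a module or header.
    fn declarations(&self) -> Vec<FnDecl>;
    /// Ask the owning language to pick the callee for `name(args...)`.
    fn resolve_call(&self, name: &str, arg_types: &[&str]) -> Option<FnDecl>;
}

/// Toy C++ server whose overload resolution is exact-match only.
pub struct ToyCxxServer {
    pub decls: Vec<FnDecl>,
}

impl LanguageServer for ToyCxxServer {
    fn declarations(&self) -> Vec<FnDecl> {
        self.decls.clone()
    }

    fn resolve_call(&self, name: &str, arg_types: &[&str]) -> Option<FnDecl> {
        // A real C++ frontend would rank implicit conversion sequences,
        // deduce template arguments and do argument-dependent lookup here.
        self.decls
            .iter()
            .find(|d| {
                d.name == name
                    && d.params.iter().map(String::as_str).eq(arg_types.iter().copied())
            })
            .cloned()
    }
}

/// Helper to build a declaration from string slices.
pub fn decl(name: &str, params: &[&str], symbol: &str) -> FnDecl {
    FnDecl {
        name: name.to_string(),
        params: params.iter().map(|p| p.to_string()).collect(),
        symbol: symbol.to_string(),
    }
}
```

The toy fails on a call like `f('a')`. A real C++ compiler would promote the `char` argument and choose `f(int)`. Reproducing that ranking is exactly the work this design leaves inside the C++ language server instead of duplicating it in Rust.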
Wide coverage for Rust interop
In the Safe C++ proposal, I introduced a new std2 standard library. The containers are designed with borrows, lifetime parameters and relocation semantics to provide rigorous memory safety. But the excellent Eliminating Memory Safety Vulnerabilities at the Source study out of Google made me reconsider this design choice. The study makes a strong case that rather than worrying about rewriting C++ code, the best strategy for improving software quality is to focus on a quick transition to memory-safe languages.
- If the goal is safety within C++, you’ll need a new, safe std2 standard library.
- If the goal is reduced C++/Rust interop friction, build the infrastructure on the C++ side to use Rust’s standard library directly.
Extending C++ to natively use Rust’s standard library directly improves interoperability.
Using Rust from C++
Consider extending language coverage and accessing cross-language intelligence to model a toolchain where Rust declarations can be used directly from C++ without bridge code. Let’s walk through a scenario.
- Intelligence: A C++ file imports a Rust module into a C++ namespace. The Rust language server parses the module code and returns metadata of all parsed declarations. Rust has a different layout scheme than C++, and only structs marked with #[repr(C)] are guaranteed compatible with C layout. Since we want to support all Rust types, struct and enum layouts are part of the discovery data provided by the language server.
- Coverage: The C++ frontend injects these declarations into the requested namespace, making them available for qualified lookup. Name lookup is natively supported by C++ and doesn’t require use of a language server.
- Coverage: C++ code can define functions originally declared on the Rust side. Safe C++ already has a safe-specifier, borrow types and lifetime parameters with outlives-constraints.
- Coverage: Lower C++ functions from AST to MIR. Safe C++ implements NLL borrow checking, which guarantees that the exclusivity and lifetime invariants implied by the function declaration are upheld through the definition of the function. This is coverage, because it’s a first-class feature.
- Intelligence: Function declarations that are ODR-used require that definitions be emitted to satisfy the linker. C++ transmits ODR usage of Rust-provided functions to the Rust language server. Rust must lower these implementations to LLVM bitcode, which is merged with the C++ bitcode prior to optimization.
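The layout point in the first bullet can be checked directly in Rust. Without `#[repr(C)]`, the compiler is free to reorder fields, so the offsets a C++ frontend needs can only come from the Rust side. In this sketch, `LayoutInfo` is a hypothetical stand-in for the layout portion of the language server’s discovery data. The default-layout assertions are deliberately loose because Rust does not guarantee that layout.

```rust
use std::mem;

// Same fields, two layout policies.
#[allow(dead_code)]
pub struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

#[allow(dead_code)]
#[repr(C)]
pub struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Hypothetical layout record a Rust language server could send to C++.
#[derive(Debug, PartialEq)]
pub struct LayoutInfo {
    pub size: usize,
    pub align: usize,
    pub offsets: [usize; 3], // offsets of a, b, c
}

pub fn rust_layout() -> LayoutInfo {
    LayoutInfo {
        size: mem::size_of::<RustLayout>(),
        align: mem::align_of::<RustLayout>(),
        offsets: [
            mem::offset_of!(RustLayout, a),
            mem::offset_of!(RustLayout, b),
            mem::offset_of!(RustLayout, c),
        ],
    }
}

pub fn c_layout() -> LayoutInfo {
    LayoutInfo {
        size: mem::size_of::<CLayout>(),
        align: mem::align_of::<CLayout>(),
        offsets: [
            mem::offset_of!(CLayout, a),
            mem::offset_of!(CLayout, b),
            mem::offset_of!(CLayout, c),
        ],
    }
}
```

Current rustc typically packs the default-layout struct into 8 bytes instead of 12. Only the compiler that made that choice can report it reliably, which is why layout belongs in the discovery data.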
Prioritize a list of features to improve C++’s coverage of Rust:
- Borrow types. In Circle as T^ and const T^.
- Borrow operators. In Circle as ^ and ^const.
- Lifetime parameters and outlives-constraints.
- Relocation. In Circle as rel.
- safe-specifier. In Circle as safe.
- self function parameter. Enables self-consuming functions.
- Rust enums. In Circle as choice.
- Zero-sized types.
- First-class tuples. In Circle as (T1, T2).
- Slices and arrays. In Circle as [T; dyn] and [T; N].
- trait and impl.
- dyn Trait. These are trait objects and were implemented in an earlier version of Circle.
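Several items on this list already exist in Rust, and their runtime representations are what a C++ frontend would have to model. This sketch uses illustrative names (`Marker`, `Area`, `Square`, `total_area`, `min_max`). It shows a zero-sized type, trait objects behind `dyn`, slices as pointer-plus-length, fixed-size arrays and first-class tuples.

```rust
use std::mem;

/// Zero-sized type: in C++, a complete object always occupies at least one byte.
pub struct Marker;

/// trait + impl; `&dyn Area` is a trait object (data pointer + vtable pointer).
pub trait Area {
    fn area(&self) -> f64;
}

pub struct Square(pub f64);

impl Area for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// A slice `&[T]` is a (pointer, length) pair; here its elements are trait objects.
pub fn total_area(shapes: &[&dyn Area]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// First-class tuple returned by value; no library pair type is needed.
pub fn min_max(xs: &[i32]) -> Option<(i32, i32)> {
    let first = *xs.first()?;
    Some(xs.iter().fold((first, first), |(lo, hi), &x| (lo.min(x), hi.max(x))))
}

/// Size of one machine word, for comparing pointer sizes.
pub fn word() -> usize {
    mem::size_of::<usize>()
}
```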
These are profound extensions to C++. Rust types use relocation semantics rather than C++11 move semantics, so C++ compilers need a new mid-level IR subsystem to perform initialization analysis and drop elaboration. In order to call functions with Rust types, we essentially embed Rust’s object model into the C++ extension.
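Drop elaboration is a dataflow problem. The sketch below is a simplified, single-local version of the idea behind rustc’s drop elaboration, not its actual code. A forward analysis tracks whether a local may be initialized and whether it may be uninitialized on entry to each block. It then classifies a scheduled drop as static, dead, or conditional, where a conditional drop needs a runtime drop flag. `Stmt`, `Block`, `DropKind` and `classify_drop` are hypothetical names.

```rust
/// What a statement does to the one local being tracked.
#[derive(Clone, Copy)]
pub enum Stmt {
    Init,  // the local is (re)initialized
    Move,  // the local is relocated out
    Other, // no effect on the local
}

/// A basic block: statements plus indices of successor blocks.
pub struct Block {
    pub stmts: Vec<Stmt>,
    pub succs: Vec<usize>,
}

/// How a scheduled drop is elaborated.
#[derive(Debug, PartialEq)]
pub enum DropKind {
    Static,      // definitely initialized: drop unconditionally
    Dead,        // never initialized here: remove the drop
    Conditional, // depends on the path taken: test a runtime drop flag
}

/// Dataflow fact: (maybe initialized, maybe uninitialized).
type Fact = (bool, bool);

fn transfer(mut fact: Fact, stmts: &[Stmt]) -> Fact {
    for stmt in stmts {
        match stmt {
            Stmt::Init => fact = (true, false),
            Stmt::Move => fact = (false, true),
            Stmt::Other => {}
        }
    }
    fact
}

/// Classify a drop placed at the start of block `at`, using forward "may"
/// dataflow iterated to a fixpoint (the join is logical OR).
pub fn classify_drop(blocks: &[Block], entry: usize, at: usize) -> DropKind {
    let mut facts: Vec<Option<Fact>> = vec![None; blocks.len()];
    facts[entry] = Some((false, true)); // locals start uninitialized
    let mut changed = true;
    while changed {
        changed = false;
        for (b, block) in blocks.iter().enumerate() {
            if let Some(input) = facts[b] {
                let out = transfer(input, &block.stmts);
                for &s in &block.succs {
                    let joined = match facts[s] {
                        None => out,
                        Some((i, u)) => (i || out.0, u || out.1),
                    };
                    if facts[s] != Some(joined) {
                        facts[s] = Some(joined);
                        changed = true;
                    }
                }
            }
        }
    }
    match facts[at] {
        Some((true, false)) => DropKind::Static,
        Some((true, true)) => DropKind::Conditional,
        _ => DropKind::Dead, // uninitialized on every path, or unreachable
    }
}
```

A local relocated on only one branch of an `if` is the case that needs a drop flag. When every path initializes it, or every path moves it out, the drop is resolved at compile time.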
Using C++ from Rust
Let’s go in the other direction and use C++ entities from Rust:
- Intelligence: A Rust module imports a C++ header. The C++ language server parses the header’s text and returns metadata of all parsed declarations.
- Coverage: The Rust frontend injects the supported declarations into the requested namespace, making them available for qualified lookup.
- Intelligence: Rust code can use C++ types and functions. If it wants to specialize a class template or call a member function on a C++ object, it uses the C++ language server to perform specialization or overload resolution. While it’s possible to include C++ semantics directly into a Rust frontend, that is a big lift. A language server provides the same result for your function call without the immense cost in tooling development.
- Coverage: After argument deduction and overload resolution choose a best viable candidate, Rust has to make a direct function call. This requires coverage of the C++ function-call ABI, which is absent in existing interop tools. Rust has to be able to copy or move its function arguments. Types with non-trivial move constructors, like std::string, will require that Rust support non-trivial relocation.
- Intelligence: As with C++, any ODR-used foreign functions (ODR usage is discovered when a function is lowered to IR) must be communicated back through the language server. The C++ compiler generates those definitions in LLVM bitcode. The linker incorporates both the C++ and Rust definitions in the same crate.
What new coverage does Rust need for high-quality C++ interop? This is more modest than the C++ extensions, because there’s a desire to maintain the relative simplicity of Rust. The C++ coverage can be considered an interop extension rather than “core language.”
- Lvalue- and rvalue-reference types. These are non-null unsafe pointer types. They may be implemented as primitives or as library types with compiler magic to hook into interop support. These types are necessary for overload resolution. For example, std::vector has two push_back overloads. One takes a const T& value and the other takes a T&& value. The former overload copies the argument and the latter moves it. Efficient usage of C++ requires differentiating between functions that accept lvalue references and functions that accept rvalue references. The addition of lvalue- and rvalue-references doesn’t imply any change to Rust’s object model.
- Non-trivial relocation functions. For C++ classes with non-trivial move constructors, the relocation function move-constructs into the target and runs the destructor on the source. This corresponds to the operator rel relocation constructor in Safe C++.
- noexcept-specifier. This is to accurately match functions declared with a noexcept-specifier on the C++ side. Without this capability, we’d fail to support Rust definitions of non-throwing functions. Rust’s coverage guarantees that there are no unwind paths out of a noexcept function, a requirement of the function’s implementation. A noexcept-specifier is something Rust would benefit from generally, especially in builds where panics abort, because it increases the distance between potential throws, which promotes the ability to relocate out of references. It also generates smaller code.
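The non-trivial relocation bullet can be modeled in Rust with raw pointers. This is a sketch under stated assumptions: the `Relocate` trait, `SmallStr` and its methods are hypothetical, and they are not Safe C++’s `operator rel`. `SmallStr` keeps a pointer into its own buffer, like the libstdc++ `std::string` discussed below. A bitwise (trivial) relocation leaves that pointer aimed at the old object, while the non-trivial version re-points it.

```rust
use std::ptr;

/// Hypothetical relocation hook: move the object at `src` into uninitialized
/// storage at `dst`. Afterwards `src` is uninitialized and must not be dropped.
/// The default is trivial relocation, a plain bitwise copy.
pub unsafe trait Relocate: Sized {
    unsafe fn relocate(src: *mut Self, dst: *mut Self) {
        unsafe { ptr::copy_nonoverlapping(src, dst, 1) }
    }
}

/// Small-string-like type whose `data` pointer aims into its own buffer.
pub struct SmallStr {
    buf: [u8; 16],
    len: usize,
    data: *const u8,
}

impl SmallStr {
    /// Construct in place so `data` can point at this object's own `buf`.
    pub unsafe fn init(this: *mut Self, s: &str) {
        assert!(s.len() < 16, "small mode holds at most 15 bytes");
        let mut buf = [0u8; 16];
        buf[..s.len()].copy_from_slice(s.as_bytes());
        unsafe {
            ptr::write(this, SmallStr { buf, len: s.len(), data: ptr::null() });
            (*this).data = (*this).buf.as_ptr();
        }
    }

    /// True when `data` points into this object's own buffer.
    pub fn is_consistent(&self) -> bool {
        self.data == self.buf.as_ptr()
    }

    pub fn as_str(&self) -> &str {
        assert!(self.is_consistent(), "dangling small-string pointer");
        let bytes = unsafe { std::slice::from_raw_parts(self.data, self.len) };
        std::str::from_utf8(bytes).unwrap()
    }
}

/// Non-trivial relocation: "move-construct" into `dst` by copying the fields
/// and re-pointing `data`, then "destroy" `src`. Destroying is a no-op here
/// because SmallStr owns no heap memory.
unsafe impl Relocate for SmallStr {
    unsafe fn relocate(src: *mut Self, dst: *mut Self) {
        unsafe {
            ptr::copy_nonoverlapping(src, dst, 1);
            (*dst).data = (*dst).buf.as_ptr();
        }
    }
}
```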
We won’t be able to define in Rust all functions previously declared in C++, since some function prototypes involve language entities that extended Rust doesn’t have coverage for. Overloading is supported, but templates aren’t. But that should be okay. You can still use C++ types and functions directly from Rust. The C++ language server is responsible for evaluating the semantics around function calls and template specializations.
One-sided interop without language servers
We can add a lot of value even with one-sided interop. Extend C++ with coverage of all of Rust’s types. This provides an environment to write idiomatic wrappers that provide access to C++ functionality through Rust’s native types and traits. There’s still the impedance of only being able to access C++ assets via these wrappers, but since the wrappers are implemented on the C++ side, they have unfettered access to your legacy C++ code. It’s just C++ wrapper implementations calling into other C++ code.
Contrast with the current practice of trying to bridge the language divide with unsafe C APIs, and then wrapping those in Rust. It’s that loss of expressiveness that makes C API bridge interop so frustrating.
Parameter destructors
The C++ Standard doesn’t prescribe parameter-passing conventions. That’s left to the platform ABI. On Unix-like platforms, the Itanium C++ ABI stipulates that callers are responsible for calling destructors on function arguments.
If the type has a non-trivial destructor, the caller calls that destructor after control returns to it (including when the caller throws an exception).
– Itanium C++ ABI: Non-Trivial Parameters[https://itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial-parameters]
fn f1(s: String) {
    // s is owned and destroyed by f1.
}
fn f2(s: String) {
    // s is owned by f2.
    // s is relocated to f1. It's no longer owned by f2.
    f1(s);
    // s is not destroyed here: it was relocated out, so the place is uninitialized.
}
Rust performs relocation on objects that are not Copy. Relocating s from f2 into f1 leaves f2’s s parameter uninitialized. Drop instructions for local objects with non-trivial destructors are emitted when a function is lowered to MIR, but a subsequent drop elaboration pass eliminates drops for places that are uninitialized.
In Rust, callees destroy parameter objects. This is necessary since a parameter may be relocated or dropped before the end of its scope. Calling a Rust function with the C++ convention risks a double-free: from C++, the caller would destroy the argument; from Rust, the callee would destroy the parameter.
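The ownership transfer in f1 and f2 can be observed with a type whose destructor records when it runs. In this sketch, `Tracked` stands in for `String`, and the thread-local `LOG` and the `take_log` helper are illustrative. The single drop happens inside the callee, and the caller runs no destructor for the relocated parameter, so nothing is destroyed twice.

```rust
use std::cell::RefCell;

thread_local! {
    // Records events so the destruction order can be inspected.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

/// Return and clear the recorded events.
pub fn take_log() -> Vec<&'static str> {
    LOG.with(|l| std::mem::take(&mut *l.borrow_mut()))
}

/// Stand-in for String whose destructor logs when it runs.
pub struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        log("drop");
    }
}

pub fn f1(_s: Tracked) {
    log("f1 body");
    // _s is owned by f1 and dropped here, at the end of the callee.
}

pub fn f2(s: Tracked) {
    f1(s); // s is relocated into f1
    log("back in f2");
    // No drop here: s was moved out, so f2 no longer owns it.
}
```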
C++ needs an alternate calling convention to support Rust’s affine type system. The Safe C++ draft discusses function parameter ownership, proposing a __relocate calling convention that gives ownership of parameters to callees.
The std::string tragedy
Almost all C++ container types are trivially relocatable without knowing it. Important types like unique_ptr, shared_ptr and vector are trivially relocatable. Their declarations could be marked with a [[trivially_relocatable]] attribute for compatibility with Rust’s relocation semantics.
Unfortunately, the libstdc++ version of std::string is not trivially relocatable. It implements a small-string optimization that maintains a pointer back into its own storage. Move construction and move assignment reset the small-string pointer back to local storage. Trivial relocation would leave a dangling pointer. The idea was to get rid of a branch when calling the std::string::data() member function. But this optimization makes for a pretty wasteful implementation. On 64-bit targets, std::string weighs 32 bytes (std::vector is 24 bytes) but only has a local capacity for strings of 15 characters or fewer.
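The size figures follow from the layout. The sketch below mirrors the two classes field for field. It assumes the libstdc++ layout described above (data pointer, length, and a 16-byte local buffer that shares storage with the capacity) and a typical three-pointer vector. The struct names are hypothetical.

```rust
use std::mem;

/// Mirror of libstdc++'s std::string: data pointer, length, then a 16-byte
/// union of the heap capacity and the local small-string buffer.
#[allow(dead_code)]
#[repr(C)]
pub struct StringLayout {
    data: *mut u8,
    len: usize,
    local_buf: [u8; 16],
}

/// Mirror of a typical std::vector: begin, end and end-of-capacity pointers.
#[allow(dead_code)]
#[repr(C)]
pub struct VectorLayout {
    begin: *mut u8,
    end: *mut u8,
    cap_end: *mut u8,
}

/// Characters that fit locally: one byte is reserved for the NUL terminator.
pub const LOCAL_CAPACITY: usize = 16 - 1;

/// (string size, vector size) in bytes on the current target.
pub fn sizes() -> (usize, usize) {
    (mem::size_of::<StringLayout>(), mem::size_of::<VectorLayout>())
}
```

On a 64-bit target that is 8 + 8 + 16 = 32 bytes for the string versus 3 × 8 = 24 bytes for the vector.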
This is not the first troublesome string. There was a previous libstdc++ std::string that used copy-on-write to deliver cheap string copies. This was no good, because using the non-const operator[] function would technically invalidate the string, spawning a copy of the data if the string didn’t have exclusive ownership. (A pity. Why are you even allowed to modify strings like that?)
The move away from the copy-on-write string was one of the few ABI breaks in C++ history. Rust’s avoidance of a stable ABI makes it easier to change library implementations to satisfy new requirements. But C++ has a stable ABI, for better or worse, and you have to play it as it lays. Thanks to std::string and the transitive property of containment, non-trivial relocation is a necessary buy-in for Rust to support move semantics for many C++ types.
Swift’s coverage tradeoffs
The Swift team has been working for several years on improving C++ interop. Their effort also embeds a C++ compiler (Clang) into the Swift toolchain. There’s no way to interop with C++ without embedding a C++ frontend.
The question of how much C++ coverage to incorporate in Swift is one that the engineers are wrestling with.
- Functions and constructors that use r-value reference types are not yet available in Swift.
- Swift supports calling some C++ function templates.
- Any function or function template that uses a dependent type in its signature, or a universal reference (T &&) is not available in Swift.
- Any function template with non-type template parameters is not available in Swift.
- Variadic function templates are not available in Swift.
The Swift language remains slightly smaller at the cost of not being able to use a large amount of C++. Without access to move semantics, it’s really not able to use any of it efficiently. Is this tradeoff worth it? I don’t think so. I don’t advocate a maximalist approach to extending Rust with C++ capabilities (although I do favor maximalism in the other direction), but I am convinced that a few strategic extensions to Rust will have enormous payoff for a quality interop experience.
- Enumerations that have an enumeration case with more than one associated value [are not yet supported]
Swift didn’t extend its embedded C++ compiler with first-class enum types. Therefore, the C++ side can’t use Swift enums with more than one associated value. Enums are a flagship feature for both Rust and Swift. I think it’s worth it to extend the C++ side to fully support them. Safe C++ has first-class choice types with pattern matching. While maintaining these extensions is a burden for C++ tooling engineers, the goal of interop isn’t to make their life easier, it’s to make everyone else’s life easier.
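For reference, here is the kind of enum at issue: cases carrying more than one associated value, consumed with pattern matching. `Shape` and `area` are illustrative names. A C++ frontend would need first-class choice support to use a type like this directly.

```rust
/// Cases with zero, one, or several associated values.
#[derive(Debug, Clone, PartialEq)]
pub enum Shape {
    Circle { radius: f64 },
    Rect(f64, f64),              // two associated values
    Labeled(String, Box<Shape>), // two values, one of them recursive
}

/// Pattern matching binds each case's associated values by position or name.
pub fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect(w, h) => w * h,
        Shape::Labeled(_, inner) => area(inner),
    }
}
```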
Exception handling
C++ exception handling is a major source of friction when dealing with Rust interop. But it doesn’t have to be. Rust is 99% of the way to supporting C++ exceptions. When compiled with -C panic=unwind, which is the default, Rust functions are all potentially throwing. When lowered to MIR and then to LLVM, function calls have a normal edge leading to the next statement and a cleanup edge that catches the exception, calls the destructor for all in-scope objects with non-trivial drops, and resumes unwinding. This is exactly what C++ does.
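The cleanup edge is observable from safe Rust. This sketch mirrors the func/may_throw example that follows, with a panic standing in for a C++ exception. It assumes the default -C panic=unwind. The `DROPS` counter and the `run` helper are illustrative. Catching the panic with `std::panic::catch_unwind` shows that `_a`’s destructor runs exactly once on both the normal path and the unwinding path.

```rust
use std::panic;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times HasDtor's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct HasDtor {
    pub i: i32,
}

impl Drop for HasDtor {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Potentially throwing: panics when asked to.
pub fn may_throw(fail: bool) {
    if fail {
        panic!("thrown from may_throw");
    }
}

pub fn func(fail: bool) -> i32 {
    let _a = HasDtor { i: 1 };
    may_throw(fail); // on the unwind (cleanup) edge, _a is still dropped
    1
}

/// Run func, returning (its result if it didn't unwind, destructors run).
pub fn run(fail: bool) -> (Option<i32>, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    let result = panic::catch_unwind(|| func(fail)).ok();
    (result, DROPS.load(Ordering::SeqCst) - before)
}
```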
How C++ unwinds
struct HasDtor {
int i;
~HasDtor() { }
};
// Potentially throwing. (i.e. not noexcept)
void may_throw() { }
int func() {
HasDtor a { 1 };
// On the cleanup edge out of may_throw, run a's dtor.
may_throw();
return 1;
}
define dso_local noundef i32 @func()() #0 personality ptr @__gxx_personality_v0 !dbg !15 {
%1 = alloca %struct.HasDtor, align 4
%2 = alloca ptr, align 8
%3 = alloca i32, align 4
call void @llvm.dbg.declare(metadata ptr %1, metadata !20, metadata !DIExpression()), !dbg !28
%4 = getelementptr inbounds %struct.HasDtor, ptr %1, i32 0, i32 0, !dbg !29
store i32 1, ptr %4, align 4, !dbg !29
invoke void @may_throw()()
to label %5 unwind label %6, !dbg !30
5:
call void @HasDtor::~HasDtor()(ptr noundef nonnull align 4 dereferenceable(4) %1) #4, !dbg !31
ret i32 1, !dbg !31
6:
%7 = landingpad { ptr, i32 }
cleanup, !dbg !31
%8 = extractvalue { ptr, i32 } %7, 0, !dbg !31
store ptr %8, ptr %2, align 8, !dbg !31
%9 = extractvalue { ptr, i32 } %7, 1, !dbg !31
store i32 %9, ptr %3, align 4, !dbg !31
call void @HasDtor::~HasDtor()(ptr noundef nonnull align 4 dereferenceable(4) %1) #4, !dbg !31
br label %10, !dbg !31
10:
%11 = load ptr, ptr %2, align 8, !dbg !31
%12 = load i32, ptr %3, align 4, !dbg !31
%13 = insertvalue { ptr, i32 } poison, ptr %11, 0, !dbg !31
%14 = insertvalue { ptr, i32 } %13, i32 %12, 1, !dbg !31
resume { ptr, i32 } %14, !dbg !31
}
declare i32 @__gxx_personality_v0(...)
In C++, in-scope objects with non-trivial destructors are destroyed by the cleanup block. Here the cleanup block is 6. The landingpad instruction advertises its intent to clean up in-scope objects. The cleanup block copies out a { ptr, i32 } pair, which identifies the exception object, calls HasDtor’s destructor, and resumes on that cached pair. Since the function participates in exception handling, it is associated with __gxx_personality_v0, C++’s standard personality function, which abstracts some even lower-level exception-handling APIs.
How Rust unwinds
struct HasDtor { i: i32 }
impl Drop for HasDtor {
fn drop(&mut self) { }
}
// Potentially throwing. (i.e. not noexcept)
fn may_throw() { }
fn func() -> i32 {
let _a = HasDtor { i: 1 };
// On the cleanup edge out of may_throw, run _a's dtor.
may_throw();
return 1;
}
define internal i32 @_ZN5throw4func17hd08044f7eb69f50cE() unnamed_addr #1 personality ptr @rust_eh_personality {
start:
%0 = alloca [16 x i8], align 8
%_a = alloca [4 x i8], align 4
store i32 1, ptr %_a, align 4
; invoke throw::may_throw
invoke void @_ZN5throw9may_throw17hb8b8ce4f5b598848E()
to label %bb1 unwind label %cleanup
bb3: ; preds = %cleanup
; invoke core::ptr::drop_in_place<throw::HasDtor>
invoke void @"_ZN4core3ptr35drop_in_place$LT$throw..HasDtor$GT$17hcc21909492c17e73E"(ptr align 4 %_a) #5
to label %bb4 unwind label %terminate
cleanup: ; preds = %start
%1 = landingpad { ptr, i32 }
cleanup
%2 = extractvalue { ptr, i32 } %1, 0
%3 = extractvalue { ptr, i32 } %1, 1
store ptr %2, ptr %0, align 8
%4 = getelementptr inbounds i8, ptr %0, i64 8
store i32 %3, ptr %4, align 8
br label %bb3
bb1: ; preds = %start
; call core::ptr::drop_in_place<throw::HasDtor>
call void @"_ZN4core3ptr35drop_in_place$LT$throw..HasDtor$GT$17hcc21909492c17e73E"(ptr align 4 %_a)
ret i32 1
terminate: ; preds = %bb3
%5 = landingpad { ptr, i32 }
filter [0 x ptr] zeroinitializer
%6 = extractvalue { ptr, i32 } %5, 0
%7 = extractvalue { ptr, i32 } %5, 1
; call core::panicking::panic_in_cleanup
call void @_ZN4core9panicking16panic_in_cleanup17hb5e4521fe5c4d68fE() #6
unreachable
bb4: ; preds = %bb3
%8 = load ptr, ptr %0, align 8
%9 = getelementptr inbounds i8, ptr %0, i64 8
%10 = load i32, ptr %9, align 8
%11 = insertvalue { ptr, i32 } poison, ptr %8, 0
%12 = insertvalue { ptr, i32 } %11, i32 %10, 1
resume { ptr, i32 } %12
}
declare i32 @rust_eh_personality(i32, i32, i64, ptr, ptr) unnamed_addr #1
Rust does all the same cleanup as C++. In fact, it does more cleanup, because even its destructors are potentially throwing. C++ destructors are implicitly noexcept. In this Rust example, the cleanup block is called cleanup. The landingpad instruction expresses the cleanup handler and caches the same { ptr, i32 } pair. The cleanup code branches to bb3, which calls HasDtor’s destructor. But that destructor is also potentially throwing. If the destructor throws, it’s non-recoverable, since we’re already on the cleanup path. That cleanup edge jumps to the terminate block, which calls core::panicking::panic_in_cleanup. That function prints “panic in a destructor during cleanup” and aborts. The normal path out of the destructor branches to bb4, which resumes stack unwinding.
If you look closely you may notice one salient difference: Rust uses the rust_eh_personality personality function. This is closely modeled on the C++ version: rust_eh_personality_impl. If Rust’s personality function is actually incompatible with C++ cleanup (I don’t know if it is or not), it can be replaced by __gxx_personality_v0. Additionally, for consistency with C++ exceptions, Rust’s panic objects could be allocated with __cxa_allocate_exception, the same storage that backs C++ exceptions. That function is part of the C++ ABI runtime (libc++abi, or libsupc++ with GCC).
RTTI
struct S { int i; };
void throw_it() {
throw S { 10 };
}
int main() {
try {
throw_it();
} catch(S s) {
} catch(int i) {
}
}
%struct.S = type { i32 }
$_ZTS1S = comdat any
$_ZTI1S = comdat any
@_ZTVN10__cxxabiv117__class_type_infoE = external global [0 x ptr]
@_ZTS1S = linkonce_odr dso_local constant [3 x i8] c"1S\00", comdat, align 1
@_ZTI1S = linkonce_odr dso_local constant { ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv117__class_type_infoE, i64 2), ptr @_ZTS1S }, comdat, align 8
@_ZTIi = external constant ptr
; Function Attrs: mustprogress noinline optnone uwtable
define dso_local void @_Z8throw_itv() #0 {
%1 = call ptr @__cxa_allocate_exception(i64 4) #4
%2 = getelementptr inbounds %struct.S, ptr %1, i32 0, i32 0
store i32 10, ptr %2, align 16
call void @__cxa_throw(ptr %1, ptr @_ZTI1S, ptr null) #5
unreachable
}
C++ uses RTTI typeinfo data to identify the type of a thrown exception. The throw-expression passes a pointer to _ZTI1S to __cxa_throw. That’s the RTTI typeinfo structure for class S; it in turn points to _ZTS1S, the mangled type-name string “1S”.
%8 = landingpad { ptr, i32 }
catch ptr @_ZTI1S
catch ptr @_ZTIi
%9 = extractvalue { ptr, i32 } %8, 0
store ptr %9, ptr %2, align 8
%10 = extractvalue { ptr, i32 } %8, 1
store i32 %10, ptr %3, align 4
br label %11
The try-statement in main lists the RTTI typeinfo data for each of its catch-clauses in the landingpad. Rust doesn’t exactly conform to this convention. Does that create interoperability problems? I’m not sure. It is the case that C++ can’t catch panic objects by type. But this is easy to resolve: emit a C++ RTTI typeinfo struct for the Rust panic type and point __cxa_throw at that. This is a very minor change, if it is necessary at all.
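Rust’s own catch mechanism also matches payloads by type, using `TypeId` through `std::any::Any` rather than C++ RTTI. This sketch mirrors the throw_it/main example above. `std::panic::panic_any` throws an `S`, and a hypothetical `try_catch` helper tests the payload against each “catch-clause” in order, much as the landingpad lists _ZTI1S and then _ZTIi. `S`, `Caught` and `try_catch` are illustrative names, and the default panic=unwind strategy is assumed.

```rust
use std::any::Any;
use std::panic;

/// Rust analog of the C++ `struct S { int i; };` exception type.
#[derive(Debug)]
pub struct S {
    pub i: i32,
}

pub fn throw_it() {
    // Like `throw S { 10 };`: unwind with an arbitrary typed payload.
    panic::panic_any(S { i: 10 });
}

/// Which "catch-clause" matched: catch(S), catch(int), or neither.
#[derive(Debug, PartialEq)]
pub enum Caught {
    Struct(i32),
    Int(i32),
    Other,
}

/// Run `f`. If it unwinds, test the payload's type against each clause in order.
pub fn try_catch<F: FnOnce() + panic::UnwindSafe>(f: F) -> Option<Caught> {
    match panic::catch_unwind(f) {
        Ok(()) => None,
        Err(payload) => Some(classify(payload)),
    }
}

fn classify(payload: Box<dyn Any + Send>) -> Caught {
    // First clause: catch(S s).
    let payload = match payload.downcast::<S>() {
        Ok(s) => return Caught::Struct(s.i),
        Err(p) => p,
    };
    // Second clause: catch(int i).
    match payload.downcast::<i32>() {
        Ok(i) => Caught::Int(*i),
        Err(_) => Caught::Other,
    }
}
```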
We can unstick one of interop’s most irritating sticking points. C++ exceptions will propagate safely through Rust frames, properly destroying all in-scope objects. As for the ability to catch C++ exceptions, coverage could be added to Rust. But since that’s already part of C++, you may as well do it there: write your catch/throw handler on the C++ side. Interop will let you return Result or any other Rust type.