Think twice before translating all C to Rust

Herbie Bradley and Girish Sastry, writing for the IFP, proposed The Great Refactor: translating all C code to Rust to eliminate memory corruption vulnerabilities. This is a noble goal. Memory corruption is a huge security problem: 70% of vulnerabilities plaguing C and C++ codebases are due to memory corruption. They write that a Focused Research Organization (FRO) using AI could translate all the code for $100 million by 2030, which in my opinion is relatively cheap and quick.

But for most of the code, translation would be a mistake. Writing a new codebase is almost never a good idea, because old code has accumulated lots of lessons about how it should behave.1 These lessons are sometimes explicitly encoded in regression tests, but often they remain implicit in the code. Human programmers invariably fail to notice some of these lessons and forget to put them in the new codebase. It is likely that AIs would do the same, unless they become much more reliable agents and incredibly thorough at writing tests. For example, on November 10th sudo-rs, a crucial Linux utility translated to Rust, was found to have introduced two vulnerabilities (one ‘medium’, one ‘low’, it must be said). These were not memory-related, but were logic vulnerabilities: errors in the details of specifying what the program should do.

The risk of logic bugs would be worth it if translation to Rust were the only way to get memory-safe code. However, there is another way: use Fil-C, the fanatically compatible memory-safe C and C++ compiler, to re-compile all existing open-source code. Because Fil-C only needs minimal changes, we could provide safe versions of almost all of the ~70,000 Debian packages by the end of 2026, for a fraction of the price of a Rust rewrite and while introducing practically no logic bugs. This is not theoretical: Filip Pizło, the author of Fil-C, already has a Linux demo with 100 memory-safe user programs, including CPython 3.12.5 and Ruby. Noted cryptographer DJB is already building Debian packages with Fil-C.

Fil-C does present tradeoffs: it is slower (1-4x) and uses more memory (3-6x) than the normally compiled C program. For most workloads, the extra resources are negligible, but in some cases (web browsers, probably) they will be a problem. It is for web browsers that Rust was designed, and there translating to Rust might be unavoidable. I would also recommend Rust for new projects—it is a great language.

Ambitiously, the Fil-C approach could make a much larger quantity of code memory-safe than a Rust translation can realistically do, with much lower adoption costs. Beyond the userland, some Linux kernel subsystems (filesystems, drivers, namespaces) could be compiled with Fil-C, with escape hatches in performance-critical regions reminiscent of Rust’s unsafe blocks. Depending on the amount of code changes required, it is even plausible that closed-source driver vendors (e.g. NVidia) could distribute safe versions of their drivers.

When the program allocates some memory and assigns it to a pointer, the object’s bounds are stored in the Fil-C runtime. Then, before every access through a pointer, Fil-C checks that the access is within bounds. If it is not, the program crashes. Fil-C also uses more memory, to keep track of the pointer bounds. To safely and efficiently handle the extra pointer bounds objects, Fil-C introduces a garbage collector. (IMO, this makes Fil-C compiled programs similar to Golang: memory safe, garbage collected, and 1-4x slower than C.)

The overwhelming majority of open-source C programs use only a small amount of compute and memory resources, and we should be willing to spend more of them to acquire security. Even so, in some cases the extra resources are a downside.

Another downside is that Fil-C is not compatible with libraries in binary form: it requires a re-compile. This is fine for open-source software, but not for proprietary libraries unless the vendor re-compiles using Fil-C. This would require only small engineering efforts, once the general approach to safely compiling drivers is established. The FRO can prove this approach works by compiling the AMD drivers, which are open source.

The memory-safety guarantees provided by Fil-C are enforced at runtime, rather than proven at compile time. This means that Fil-C programs may crash unexpectedly. This is infinitely better than the current state of memory vulnerabilities, and can be handled just fine.

Finally, Fil-C code is new and untested. However, it builds on LLVM, a widely used and scrutinized compiler infrastructure. The changes made to the compiler, while extremely clever, are only a few thousand lines of code and thus auditable. It’s probably easier to audit this than all the unsafe blocks in Rust dependencies.

Fil-C augments each pointer with a lower and upper bound, which jointly determine what values of the pointer are allowed to dereference. The compiler generates code to check bounds when dereferencing the pointer, and crashes the program if the pointer’s value is out of bounds.
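To make the idea concrete, here is a minimal sketch in plain C of a pointer that carries its own lower and upper bound and is checked before every access. It is only an illustration of the concept described above: the `checked_ptr` type and the `checked_*` functions are hypothetical names, and Fil-C performs the equivalent work inside the compiler and runtime rather than through a library like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical "fat pointer": the address plus the bounds of its object. */
typedef struct {
    uintptr_t value; /* address the pointer currently holds */
    uintptr_t lower; /* first valid address of the object */
    uintptr_t upper; /* one past the last valid address */
} checked_ptr;

/* Allocate a zeroed object and record its bounds alongside the pointer. */
checked_ptr checked_alloc(size_t size) {
    assert(size > 0);
    unsigned char *base = calloc(size, 1);
    if (base == NULL) {
        abort();
    }
    checked_ptr p = { (uintptr_t)base, (uintptr_t)base, (uintptr_t)base + size };
    return p;
}

/* Pointer arithmetic moves the value but never changes the bounds. */
checked_ptr checked_add(checked_ptr p, ptrdiff_t offset) {
    p.value += (uintptr_t)offset;
    return p;
}

/* True if an access of `width` bytes at p.value stays inside the object. */
int checked_in_bounds(checked_ptr p, size_t width) {
    return p.value >= p.lower && p.value <= p.upper &&
           width <= p.upper - p.value;
}

/* Load one byte, crashing instead of reading out of bounds. */
unsigned char checked_load(checked_ptr p) {
    if (!checked_in_bounds(p, 1)) {
        fprintf(stderr, "out-of-bounds load\n");
        abort();
    }
    return *(unsigned char *)p.value;
}

/* Store one byte, crashing instead of writing out of bounds. */
void checked_store(checked_ptr p, unsigned char byte) {
    if (!checked_in_bounds(p, 1)) {
        fprintf(stderr, "out-of-bounds store\n");
        abort();
    }
    *(unsigned char *)p.value = byte;
}
```

With an 8-byte allocation `p`, a call such as `checked_store(checked_add(p, 3), 42)` succeeds, while `checked_load(checked_add(p, 8))` aborts the program instead of reading past the end of the object.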

Pointers in 64-bit normally take up 8 bytes. In Fil-C, pointers on the stack (‘in flight’) incur 24 extra bytes of overhead (lower, upper and aux word). On the heap (‘at rest’), the overhead scales with object size: it is the size of the object holding the pointer, plus 16 bytes for normal and 32 bytes for atomic pointers.
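The arithmetic can be written out as a small sketch. The constants below are simply the figures quoted in the paragraph above, not values measured from Fil-C, and the function names are hypothetical. The sketch also assumes the at-rest overhead is counted once per pointer-holding object, which is one reading of the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Figures taken from the text above (assumed, not measured). */
enum {
    NATIVE_POINTER_BYTES = 8,   /* a plain 64-bit pointer */
    IN_FLIGHT_EXTRA_BYTES = 24, /* lower, upper and aux word */
    AT_REST_EXTRA_NORMAL = 16,  /* added for a normal pointer */
    AT_REST_EXTRA_ATOMIC = 32   /* added for an atomic pointer */
};

/* Total size of a pointer while it is "in flight" on the stack. */
size_t in_flight_pointer_bytes(void) {
    return NATIVE_POINTER_BYTES + IN_FLIGHT_EXTRA_BYTES;
}

/* Overhead for an object "at rest" on the heap that holds a pointer:
 * the object's own size again, plus a fixed amount. */
size_t at_rest_overhead_bytes(size_t object_size, bool atomic) {
    assert(object_size >= NATIVE_POINTER_BYTES);
    return object_size + (atomic ? AT_REST_EXTRA_ATOMIC : AT_REST_EXTRA_NORMAL);
}

/* Object size plus overhead. */
size_t at_rest_total_bytes(size_t object_size, bool atomic) {
    return object_size + at_rest_overhead_bytes(object_size, atomic);
}
```

Under this reading, a 64-byte list node holding a normal pointer would carry 80 bytes of overhead and occupy 144 bytes in total, about 2.25 times its native size.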

This can add up to a lot of memory if all your objects contain pointers. The impact can be mitigated by allocating one large chunk for objects and using indices instead of pointers, but that means a corrupted index can reach anything in the object chunk. This is fine most of the time, but could be a big problem in JavaScript runtimes, where both memory efficiency and the integrity of objects within the runtime are essential. For Python machine learning workloads, it is less of a problem, because most memory is used for massive arrays with no pointers.
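The tradeoff of replacing pointers with indices can be sketched in a few lines of C. The pool below is a hypothetical example, not code from Fil-C or any real runtime: nodes live in one array and refer to each other by index, so there are no per-node pointers to track, but the only check possible is against the bounds of the whole pool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_CAPACITY 4
#define NO_NODE UINT32_MAX /* sentinel index meaning "no next node" */

typedef struct {
    int value;
    uint32_t next; /* index into the pool instead of a pointer */
} node;

typedef struct {
    node slots[POOL_CAPACITY];
    uint32_t used; /* number of slots handed out so far */
} node_pool;

/* Hand out the next free slot and return its index. */
uint32_t pool_new(node_pool *pool, int value, uint32_t next) {
    assert(pool != NULL);
    if (pool->used >= POOL_CAPACITY) {
        abort();
    }
    uint32_t index = pool->used++;
    pool->slots[index].value = value;
    pool->slots[index].next = next;
    return index;
}

/* Look up a node. The check covers the pool as a whole, so any index
 * below `used` is accepted, even if it points at the wrong object. */
node *pool_get(node_pool *pool, uint32_t index) {
    assert(pool != NULL);
    if (index >= pool->used) {
        abort();
    }
    return &pool->slots[index];
}
```

If a bug overwrites a node's `next` field with the index of an unrelated node, `pool_get` still succeeds and returns that other object. A per-object bounds check would not catch this, because from the compiler's point of view every access stays inside the single pool allocation.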

It would cost much less money and be done faster. Because the open-source programs involved are the same (rather than a rewrite), they can be distributed through existing channels that distribute software versions with different ABIs, helping adoption. This is the approach taken by

Because Fil-C doesn’t require much in terms of rewrites, it would be easy for software vendors that publish C/C++ libraries to update them. Once it has been demonstrated on a complex open-source driver, software vendors could adopt it.

Fil-C would also have positive externalities on the memory safety of code for people who don’t make the switch. First, it is easy. Second, data corruption does not have to cause a crash, so today a user of the software who triggers incorrect write patterns wouldn’t notice that anything is wrong. I expect that when a Linux userland is compiled fully in Fil-C, people’s programs will crash a lot more often, and this way we’ll discover some security vulnerabilities that would otherwise have remained hidden forever.

Fil-C and the Rust compiler are both built on LLVM, so with some development Fil-C could plug into the part of Rust’s compiler that generates code from unsafe blocks. As Graydon Hoare points out, it would make the unsafe parts of Rust even safer by checking pointers there, thus getting performance when the Rust borrow checker can prove memory safety, and otherwise a sound but slower approach.
