David A. Wheeler – Countering Trusting Trust via Diverse Double-Compiling (2009)

dwheeler.com

22 points by ce4 2 years ago · 4 comments

vlovich123 2 years ago

If I understand the mechanism correctly, you compile the compiler with itself. Then you compile the compiler with a trusted compiler, and compile the compiler once more with the result. You then make sure the two versions are bit-identical.
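The check as described above can be sketched with a toy model. Everything here is invented for illustration: the "compilers" are Python functions, a "binary" is a tagged hash string, and `run` stands in for executing a binary; this is not Wheeler's actual tooling.

```python
import hashlib

def h(s):
    """Short deterministic digest standing in for compiled bits."""
    return hashlib.sha256(s.encode()).hexdigest()[:12]

# Stand-in for the source code of the compiler under test (hypothetical).
COMPILER_SRC = "source of compiler A"

def honest(src):
    """Honest, deterministic compilation: output depends only on the input."""
    return "clean:" + h(src)

def trojaned(src):
    """Thompson-style trojan: when asked to compile the compiler's own
    source, it silently emits a backdoored binary; otherwise it is honest."""
    if src == COMPILER_SRC:
        return "trojan:" + h(src)
    return honest(src)

def run(binary):
    """'Execute' a binary as a compiler in this toy model."""
    return trojaned if binary.startswith("trojan:") else honest

def ddc(suspect_bin, trusted_bin, src):
    """Diverse double-compiling as the comment describes it:
    1. self-compile the compiler source with the suspect binary;
    2. compile it with the trusted compiler, then once more with that result;
    3. the two outputs must be bit-identical (here: equal strings)."""
    stage1 = run(suspect_bin)(src)
    stage2 = run(run(trusted_bin)(src))(src)
    return stage1 == stage2

trusted_bin = honest("source of an independent trusted compiler")
clean_bin = honest(COMPILER_SRC)
evil_bin = trojaned(COMPILER_SRC)

print(ddc(clean_bin, trusted_bin, COMPILER_SRC))  # True  - clean binary passes
print(ddc(evil_bin, trusted_bin, COMPILER_SRC))   # False - trojan detected
```

The trojan survives self-compilation (stage 1 stays backdoored) but cannot influence the trusted compiler's output, so the two stages diverge and the mismatch exposes it.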

I think one big limitation with this is the assumption that the compiler applies optimizations in a deterministic manner. That’s possible today through control of the RNG seed via a command-line parameter, but one could imagine alternate attacks that violate the assumption of a deterministic mapping from input to output of the malicious compiler (e.g. `__TIME__` in the compiler itself would completely thwart this approach until the compiler is fixed to be deterministic). For example, you could have the exploit injected only when it detects a specially crafted PGO payload that adjusts the heuristics in a way that turns specific passes on or off. The mechanism would still protect against that, I think, but only if you had access to the exact PGO profile used. Similarly, I suspect there are non-determinism sources to exploit around file iteration ordering. Again, not that it defeats the protection, but it subverts the assumption the protection mechanism rests on. You could also imagine a compiler that uses a search mechanism to refine and apply optimizations (i.e. an e-graph compiler's output depends on the amount of search time allowed).
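The `__TIME__` point above can be made concrete with the same toy model: an honest but non-deterministic compiler defeats the bit-identical comparison all by itself. The timestamp mixed into the output is an invented stand-in for a `__TIME__` expansion baked into the build.

```python
import hashlib
import time

def h(s):
    """Short deterministic digest standing in for compiled bits."""
    return hashlib.sha256(s.encode()).hexdigest()[:12]

def build(src):
    """An honest compiler whose output embeds the build time (modeling
    something like __TIME__ reaching the emitted binary). No trojan is
    present, yet no two builds of the same source are bit-identical."""
    return "clean:" + h(src + str(time.time_ns()))

first = build("source of compiler A")
time.sleep(0.01)  # make sure the clock has advanced between builds
second = build("source of compiler A")

# The bit-identical check fails spuriously, so it can no longer
# distinguish an honest non-deterministic build from a tampered one.
print(first == second)  # False
```

This is why reproducible-build work (pinning timestamps, seeds, and iteration orders) is a prerequisite for the comparison to mean anything.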

It would be nice, though, if compilers adopted this now to harden against attacks, so that (a) attackers have to resort to more esoteric attacks and (b) compiler authors think about sources of non-determinism and how to control for them (they already do, but this would be another layer).

The hardest thing, though, would be coming up with the trusted compiler. The easiest hack is to have gcc be that for clang and clang be that for gcc, to raise the cost of the required attack. Not sure what a better mechanism might look like.

mhh__ 2 years ago

Something I've been wondering about: is it possible to use something like mutation testing / a reduction tool to heuristically "prove" that the binary can be attributed to real lines of code in the input?

Solves a different problem to this.

  • vlovich123 2 years ago

    My hunch is that’s been proven to be a Turing-completeness-level challenge (i.e. I think you’d have a tough time even coming up with a way to compute a probability that it’s non-tampered).

    I suspect even doing that heuristically will be exceedingly hard for an optimizing compiler, since the optimizations it performs involve very complex data crunching to detect the invariants that allow a safe optimization (e.g. notice how, when you debug optimized native code, it jumps around between random lines?).

    A stochastic approach might give you something but you’d need to actually explore the idea to understand what that is if anything. I’m not aware of anything in that space but I’m also only a casual observer and not that read up on the literature.
