March 20, 2025
This started as a Mastodon thread, but I’m putting it here for future reference.
Tell me (email, I guess, I don’t do comments anymore) if this chain of thinking is reasonable:
We deploy fleets of container-borne VMs that are – in theory – whole but zero-user systems. They’re SaaS shims with well-understood, well-constrained roles.
Sometimes zero-user systems unexpectedly become one-user systems. When that happens, it’s because an uninvited user has figured out how to do something unanticipated to make themselves at home, and that’s bad.
Our uninvited guest’s exploration will involve novel-for-this-VM interactions, so the correct behaviour for that machine in that moment is to die and leave an admin the most complete forensic experience possible: just core dump and halt.
With all of that in mind, containerized VMs destined for production should be gathering a list of known-permitted actions during a prerelease/testing phase, and permitting only those actions in production.
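To make the learn-then-enforce idea concrete, here is a minimal userspace sketch, not a real LSM. It assumes an "action" can be modelled as a (verb, object) pair; the names ActionRecorder, Enforcer and on_violation are hypothetical and exist only for illustration.

```python
from typing import Callable, FrozenSet, Set, Tuple

# Assumption: an "action" is just a (verb, object) pair,
# e.g. ("open", "/etc/hosts"). Real LSM hooks carry far more context.
Action = Tuple[str, str]


class ActionRecorder:
    """Prerelease/testing phase: remember every action the workload performs."""

    def __init__(self) -> None:
        self._seen: Set[Action] = set()

    def observe(self, verb: str, obj: str) -> None:
        self._seen.add((verb, obj))

    def freeze(self) -> FrozenSet[Action]:
        # The frozen set is the allowlist shipped to production.
        return frozenset(self._seen)


class Enforcer:
    """Production phase: anything not seen in testing is a violation."""

    def __init__(self, allowed: FrozenSet[Action],
                 on_violation: Callable[[Action], None]) -> None:
        self._allowed = allowed
        self._on_violation = on_violation  # e.g. "dump core and halt"

    def check(self, verb: str, obj: str) -> bool:
        action = (verb, obj)
        if action in self._allowed:
            return True
        self._on_violation(action)
        return False


if __name__ == "__main__":
    recorder = ActionRecorder()
    recorder.observe("open", "/etc/hosts")
    enforcer = Enforcer(recorder.freeze(), lambda a: print("would halt on", a))
    enforcer.check("open", "/etc/hosts")  # seen in testing, permitted
    enforcer.check("exec", "/bin/sh")     # novel, triggers the handler
```

The handler is injected rather than hard-coded so the same allowlist can log in staging and halt in production.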
I admit that this is mostly a new space to me but it’s kind of surprising that none of the LSM frameworks – SELinux, AppArmor, Tomoyo, Smack, none of them – admit the existence of verbs, much less conditionals.
Put differently, all of these frameworks offer permit or deny but not react; that is: there is not a built-in way to say “in the event of this policy violation, take the following action”, even if that action is as simple as “halt immediately”.
All the contention in that space seems to be around how difficult vs. “expressive” the different context descriptions are. All of the LSM frameworks look like they’re intended to constrain known actors within multiuser systems that are themselves expected to stay resiliently running, which might be a product of their times but… I think that most of the systems we deploy in the world now are the opposite of all three of those things.
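As a rough illustration of what "react" might look like next to permit and deny, here is a toy policy evaluator. It is a sketch under stated assumptions, not how any of the LSMs above work: Verdict, Rule and evaluate are hypothetical names, and actions are reduced to plain strings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional


class Verdict(Enum):
    PERMIT = "permit"
    DENY = "deny"


@dataclass(frozen=True)
class Rule:
    verdict: Verdict
    # The missing piece: an optional action to run when this rule denies.
    reaction: Optional[Callable[[str], None]] = None


def evaluate(policy: Dict[str, Rule], action: str, default: Rule) -> Verdict:
    """Look up the rule for an action; run its reaction if it denies."""
    rule = policy.get(action, default)
    if rule.verdict is Verdict.DENY and rule.reaction is not None:
        rule.reaction(action)
    return rule.verdict


if __name__ == "__main__":
    def halt(action: str) -> None:
        # Stand-in for "core dump and halt".
        print("halt immediately:", action)

    policy = {"read_config": Rule(Verdict.PERMIT)}
    default = Rule(Verdict.DENY, reaction=halt)
    evaluate(policy, "read_config", default)  # permitted, no reaction
    evaluate(policy, "spawn_shell", default)  # denied, reaction runs
```

Making the reaction a property of the rule, rather than of the whole policy, would let one violation halt the machine while another only gets logged.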
I’m about 95% sure that if I want the most basic version of this, I can crowbar that into existence without being much of a programmer at all, much less a kernel dev: in theory, I’d just need to pick an LSM, find the line where it logs policy violations, add “panic("policy violation");” on the line under it. Nothing to it, but… is this all people want this to be? Sure, I can get a more complicated version of what I want with journalctl | grep | something chmodded +s, but: c’mon. I shouldn’t be cobbling this together; I can’t be the only person in the world who wants this.
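For the cobbled-together log-watching route, a minimal sketch might look like the following. It assumes denials show up in the log stream as SELinux "avc: denied" or AppArmor apparmor="DENIED" records; first_denial, watch and crash_kernel are hypothetical names, and crash_kernel further assumes root and an enabled magic SysRq, so it is defined here but never called.

```python
import re
from typing import Callable, Iterable, Optional

# Matches the usual shape of SELinux and AppArmor denial records.
DENIAL = re.compile(r'avc:\s+denied|apparmor="DENIED"')


def first_denial(lines: Iterable[str]) -> Optional[str]:
    """Return the first log line that looks like a policy denial, if any."""
    for line in lines:
        if DENIAL.search(line):
            return line
    return None


def watch(lines: Iterable[str], react: Callable[[str], None]) -> bool:
    """Scan a log stream; on the first denial, run the reaction and stop."""
    hit = first_denial(lines)
    if hit is None:
        return False
    react(hit)
    return True


def crash_kernel(_line: str) -> None:
    # Assumption: running as root with magic SysRq enabled. Writing "c"
    # to /proc/sysrq-trigger deliberately crashes the kernel.
    with open("/proc/sysrq-trigger", "w") as trigger:
        trigger.write("c")


if __name__ == "__main__":
    # Made-up sample lines in the usual shape, for illustration only.
    sample = [
        "systemd[1]: Started example.service.",
        'audit: type=1400 apparmor="DENIED" operation="open" name="/etc/shadow"',
    ]
    watch(sample, lambda line: print("would halt on:", line))
```

In practice the lines could come from something like journalctl -f piped into sys.stdin, with crash_kernel as the reaction. Note the gap this leaves: the violation has already been logged, and possibly acted on by the intruder, before userspace gets to respond.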
There has to be prior art here, this can’t be a thing we don’t already have somewhere. What am I missing?