eBPF Verification Is Untenable
This is weird.
1. Instead of having the kernel verify the program about to be installed at installation time, they rely on a trusted compiler and have the kernel perform signature validation. This means the kernel relies on a userspace component to enforce kernel-level safety guarantees, it adds another layer of coupling (via key infrastructure) between the kernel and a particular version of the Rust compiler, and if someone gets hold of the signing key, the kernel will run their signed code, no problem.
2. The Rust compiler famously prevents various memory-safety bugs, but it does not enforce other properties important to eBPF, such as termination. The proposed solution is basically to have a timeout instead. This moves bug checking from load time (with the verifier) to runtime, which means you will not know you have a buggy eBPF program until you actually hit the bug and the program gets terminated. Timeouts are strictly worse than termination checking because they are always either too long or too short.
3. Their major problem is with "escape hatches", the kernel code that eBPF programs call out to. They show that various escape hatches can be eliminated or simplified. However, they don't have a plan to eliminate all escape hatches, and they don't even demonstrate that their technique would eliminate the particularly problematic ones.
The escape hatches are unfortunately core to how eBPF in the kernel can work at the moment. They keep the kernel from having to provide every possible piece of data the program might need as an argument, and they provide a lot of assists that otherwise wouldn't be verifiable in general code, such as string operations that would cut into the maximum 4096-instruction count.
For the most part, the kernel doesn't provide generic string processing helpers. Most helpers are there to form the basis for specific eBPF integration points (the majority are packet and socket handling tools).
The kernel provides quite a few string processing helpers; I'm not sure why they're not documented. Perhaps the purpose of the document is to highlight BPF-specific functionality rather than underlying runtime helpers?
You can see them listed here: https://elixir.bootlin.com/linux/latest/source/kernel/bpf/he...
And my point in highlighting them in this discussion was to get one to consider how trap-outs to the kernel are sort of fundamental. You get 4096 instructions to execute before you're cut off by the kernel (albeit enforced statically). Given that PATH_MAX is 4096, a single pass over a maximal path already blows the budget, since each byte costs more than one instruction; you simply don't have enough compute time available to process certain strings you'd expect some BPF filters to be able to handle.
So basically I just wanted to drive home that some raw computation expected of BPF is incompatible with its current verification model, and at the very least extremely deep changes to BPF would need to happen to get rid of helpers.
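To make the budget concrete, here's a minimal sketch (hypothetical code, assuming clang's BPF target and libbpf's bpf_helpers.h) of roughly the only kind of strlen the verifier will accept. The explicit cap is what makes it verifiable, and once unrolled it eats a large slice of the instruction budget on its own:

```c
#include <bpf/bpf_helpers.h> /* for __always_inline */

/* Hypothetical bounded strlen for a BPF program. The constant cap lets
 * the verifier prove termination; every unrolled iteration spends
 * instructions from the overall budget. */
#define MAX_LEN 256

static __always_inline int bounded_strlen(const char *s)
{
    int i;

#pragma unroll
    for (i = 0; i < MAX_LEN; i++) {
        if (s[i] == '\0')
            return i;
    }
    return MAX_LEN; /* ran out of budget, not out of string */
}
```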
This is a link to string->integer conversion and comparison, and nothing else.
That's what the GP said.
They are semantically simple string operations whose computational complexity scales with string length. Near-unbounded string lengths would exhaust the time budget (somewhat approximated by instruction count) of eBPF applications doing even a single one.
That's all that's there. String-integer conversion and comparison.
I think the point was not to show "there are a lot of helper functions" but rather to point out "the (undocumented) helper functions that exist are problematic"
That's not "quite a few" string helpers.
There are others too, for instance string formatting: https://elixir.bootlin.com/linux/latest/source/kernel/bpf/he...
Write an eBPF program that actually needs to do any kind of meaningful string manipulation, and you'll quickly get a sense of just how rich the BPF helper inventory is in string processing functions. It'll be sharply obvious, because bounded loop enforcement will keep you from writing even the simplest string functions yourself.
I never said that it's anywhere near a complete set. My point is that they exist as things that must be helpers because they represent a type of raw computation expected of BPF programs but incompatible with its verifier model.
The original parent was talking about removing all helpers,
> They show that various escape hatches can be eliminated or simplified. However they don't have a plan to eliminate all escape hatches
which simply doesn't make sense. The string helpers are a good example one might not have thought of, beyond the helpers that expose Linux-specific functionality.
The specific quantity of "quite a few" was left intentionally vague as it's orthogonal to my core point.
And yes, string ops are difficult if not impossible to write in verified bpf; that's almost a restatement of what I've been saying this whole thread.
I don't think it's super common to do any kind of serious string manipulation in BPF. I happened to have Facebook's `dnswatch` open in my editor, and there are zero calls to strtol or printf. The idiomatic design here is to do that kind of thing in userland, piping raw stuff down a perf or ring buffer to postprocess in a "real" language runtime.
So my rebuttal would be: you could just remove those few string helpers, and not change much about the programming model.
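To illustrate the idiomatic split, here's a minimal sketch (hypothetical tracepoint and event struct, assuming libbpf's ringbuf API): the BPF side just ships raw bytes down the buffer, and any string work happens in userland.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical event record; the point is that it's raw bytes. */
struct event {
    __u32 pid;
    char comm[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 256 * 1024);
} rb SEC(".maps");

SEC("tracepoint/syscalls/sys_enter_execve")
int on_execve(void *ctx)
{
    struct event *e = bpf_ringbuf_reserve(&rb, sizeof(*e), 0);
    if (!e)
        return 0;
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(e->comm, sizeof(e->comm));
    bpf_ringbuf_submit(e, 0); /* the "real" postprocessing happens in userland */
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```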
It's common enough in seccomp filters; I've personally used them there. Sometimes you trap out to a userspace daemon via seccomp_unotify to do relatively unbounded brain surgery on the unprivileged process, but trapping out to a daemon and back is pretty expensive, and it's much better to make the decision in the filter program if you can get away with it.
Edit: Perhaps you'd be happier with an example that isn't a string processing function but follows the same core idea I'm trying to get across, of offloading compute from the verified program to create useful verifiable programs: bpf_l3_csum_replace. It's not too hard to hit an MTU where you'd run out of instructions just recomputing the checksum, given the per-byte computation required if it were to happen in regular BPF. This helper isn't really exposing a specific part of the network stack or the kernel (other than: who else needs ones' complement?); it's a unit of computation that isn't very amenable to BPF verification but is still required by the use cases expected of BPF.
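For a rough sense of the cost, here's a sketch (hypothetical plain bounded-loop code, not a real helper) of what recomputing a ones' complement sum in-program would look like; at a 1500-byte MTU the loop alone runs to thousands of instructions:

```c
#include <linux/types.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical in-program ones' complement sum over a packet: the work
 * bpf_l3_csum_replace spares you. The constant bound is there for the
 * verifier; ~750 iterations of a few instructions each approaches the
 * classic 4096-instruction budget all by itself. */
static __always_inline __u16 csum_bytes(const __u8 *data, const __u8 *data_end)
{
    __u32 sum = 0;
    int i;

    for (i = 0; i < 1500; i += 2) {
        if (data + i + 2 > data_end) /* bounds check for the verifier */
            break;
        sum += (data[i] << 8) | data[i + 1];
    }
    sum = (sum & 0xffff) + (sum >> 16); /* fold the carries, twice */
    sum = (sum & 0xffff) + (sum >> 16);
    return ~sum & 0xffff;
}
```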
So "generic string processing helpers" provided by the kernel?
"quite a few" tho
There are others listed in the thread. The specific quantity of 'quite a few' is orthogonal to the core point and was intentionally left vague in what now appears to be a misguided attempt to keep the comment concise and focused.
Rookie mistake! :P
> 1. Instead of having the kernel verify the program about to be installed at installation time, they rely on a trusted compiler and having the kernel perform signature validation. This means that the kernel is relying on a userspace component to enforce kernel-level safety guarantees, adds another level of coupling (via key infrastructure) between the kernel and a particular version of the Rust compiler, and if someone can get the signing key then the kernel will run their signed code no problem.
FWIW I'm pretty sure this is how Microsoft does it. The verifier is in userland and signs programs post-verification. This keeps the attack surface unprivileged, and it's a great idea if you couple your kernel to your userland and if your operating system has a notion of process protection. Linux does neither.
Driver Verifier? That’s not intended to prove the code under test secure, only to hopefully show that it’s not complete crap in well-known ways. Even a signed driver is still trusted code and requires administrator privileges to install. I guess the closest Linux counterpart would be a distro maintainer running a hardware vendor’s out-of-tree module under KASAN and, if it passes, signing the package with their PGP key.
But none of that is intended or able to check the module (resp. driver) is not gimmeroot.ko (resp. gimmesystem.dll)—that’s left to humans inspecting the source (resp. thoughts and prayers[1]). On the other hand, the eBPF VM absolutely is intended to be able to load anything any unprivileged user throws at it and emerge unscathed.
It’s not precisely essential that a kernel have this capability, but if one is to have it, restricting the allowable code to a predetermined vendor-approved set defeats most of the point. (The authors propose that a userspace compiler running on the user’s computer be allowed to extend it, as I understood them.)
[1] https://www.zdnet.com/article/these-hackers-used-microsoft-s...
No, not driver verifier. https://github.com/vbpf/ebpf-verifier
This link is about a proposed new eBPF verifier for the Linux kernel that doesn't use signing. As a research project it is not integrated into the kernel, but their plan does not involve trusting user space (instead they suggest doing the heavy lifting of the verification in user space and providing a proof of safety that the kernel checks, which seems sensible to me).
I believe you meant to link https://github.com/microsoft/ebpf-for-windows/ instead (discussed on HN recently), which is Microsoft's implementation built on the above research project and which indeed does not follow the research authors' suggestion to use validation, and so does require trusting user space.
Yeah, I had intended to link to that repo, which also links to the one I provided - unsure what happened there.
> FWIW I'm pretty sure this is how Microsoft does it. Verifier is in userland and signs programs post-verification.
Almost. Yes, the verifier is in userland, but it doesn't sign things: it's a trusted component of the system, so there's no need for a signature at this step. It simply says "OK". The verifier itself is covered by the usual system integrity mechanisms.
I see, thanks.
I'm a bit skeptical of this. It will work for some BPF use cases, but for others it might be a nightmare to deploy something in production at scale this way. Essentially, on the target machine you're no better off than with signed kernel modules. If someone gets possession of the key, they can do whatever they want, given there is no verification mechanism anymore. It sounds good for programs of a rather static nature, but for more complex applications it's rather theoretical imo.
Your point 1 is the elephant herd in the room. If I were a paranoid person, I would think it's by design: build in a way to compromise a system retroactively.
That makes no sense
You should read Ken Thompson's "Reflections on Trusting Trust". Outsourcing security to a tool which you have to blindly trust and can't verify is very, very dangerous.
You've obviously misunderstood the proposal - there's nothing about this that is "blind trust" at all.
Anything based on PKI which at some unknown time in the future can be leaked or otherwise compromised is “blind trust”.
This is why perfect forward secrecy techniques have been developed.
In re 1, the system operator could configure the kernel to trust their signing key, and build the extensions themselves, which is still highly complex but would minimize the risk of a general compromise.
That said, I agree in general that this approach is mostly going backwards and fails to address the core risks. It’s also important to push back on the Rust-as-security-panacea meme. Rust prevents a certain class of bugs, but it doesn’t ensure reliable operation.
Questions about whether it'll work aside, the architecture is not that unusual. The Burroughs machines from the 1960/70s relied on a trusted compiler. Not signed though - just regular OS permissions ensured only the "operator" account could run the compiler :-) https://en.wikipedia.org/wiki/Burroughs_MCP https://seclab.cs.ucdavis.edu/projects/vulnerabilities/doves...
Hm. Doesn’t look viable to me.
I’m not against language-based security, proof-carrying code, and all that, but I have less than perfect confidence that the Rust compiler currently is or will soon be sound enough to be secure against actively hostile code—AFAIU the language designers haven’t even written down their core calculus, let alone proven it sound. Putting the entirety of the Rust compiler (including, at least for now, millions of lines of C++ from LLVM) in the TCB of your system also feels less than inspiring.
There’s also the part where if you want to instrument the kernel with something other than Rust but still relatively powerful—I dunno, Ada—then you’re looking at putting the compiler for that in the TCB, too; you benefit from none of the verification work. Sound, tractable, and expressive type systems are usually fairly isolated in design space, so source-to-source translation of arbitrary programs is impossible most of the time.
Uploading System F (e.g. Dhall) or CoC to the kernel I could see—except for the tiny problem of memory management of course—but uploading Rust, even precompiled, I honestly can’t.
> I’m not against language-based security, proof-carrying code, and all that, but I have less than perfect confidence that the Rust compiler currently is or will soon be sound enough to be secure against actively hostile code
Yeah, rustc currently does not claim to be resilient to hostile source inputs. Those are bugs that need to be fixed, but they're not so critical as to warrant a point release.
> to be secure against actively hostile code
Was that a requirement for the predecessor of eBPF: Custom kernel modules?
Kernel modules require root privileges to load and the Linux kernel's philosophy (pre user namespaces lollllll) was that root -> kernel privesc didn't matter.
Of course it would be nice if every app could load its own untrusted eBPF code without the kernel being compromised. But why such high standards; where else is that the standard? Seems like perfect is the enemy of good.
I don't think "standard" is the point. It's about unlocking new features and capabilities.
eBPF is not a replacement for the general concept of custom kernel modules.
Also, the JVM, ART, and .NET verifiers show how hard it is to write bytecode verifiers, and that is with bytecode that was designed for verification to start with.
I’m not sure how true that is. I seem to remember that some of the problems in JVM bytecode verification are due to a wrong design and not shared by e.g. WASM, and I’m under the impression that (if you don’t try for the absolute best performance and streaming) WASM verification is fairly straightforward. Also, eBPF should probably also fall into the “designed for verification” category, so I can’t figure out what your point is here.
If it had been, we wouldn't be having this discussion thread on a security paper.
OK then! Back to C it is I guess. More seriously, we're talking about the Linux kernel here: it's written in C, and there's some momentum to write new code in Rust. You're asking for the moon, but you may have to settle for a picture of it.
I’m not talking about writing parts of the kernel in Rust. I‘m not even talking about using Rust inside the eBPF implementation specifically. In either case that’s replacing C with Rust, and if that’s what you want, sure, knock yourself out. In the spirit of full disclosure, I’ll admit to not being a fan, but it’s still entirely plausible and one can seriously argue it’d be an improvement.
But what TFA talks about amounts to replacing the eBPF verifier with (a blessed userspace version of) the Rust typechecker—dragging the rest of the compiler along for the ride—and that just feels like a downgrade in almost every respect. It’s humongous, it requires strange contortions due to not fitting in the kernel, it implements a comparatively very complicated spec, that spec is not written down, etc. The eBPF machine is not perfect, especially (as the authors point out) when you account for the “helpers“, but it mostly avoids these downsides. It’s not the moon—it’s already there.
You can write eBPF programs in Rust. Bpftrace generates programs from an awk-like language; you can produce the program however you want. Solana does it with Rust, though I don't know what that gains you, given the verifier protects you from most of the pitfalls of C.
Back to DTrace would be the obvious solution. Only Oracle Linux has that.
Seamless probes across the kernel, libs, and the user-facing app.
No arrays.
It has worked for decades, but Linux devs thought they could do better.
Yes! But, NIH. Hmm, well, then again, BPF is NIH..
I hope no one tries to use the Rust "safety" guarantees as security guarantees.
They are designed to prevent bugs, not intentional abuse.
If they were perfect and bug-free they might theoretically be usable for security, but that's not where the priorities lie when it comes to bug fixes and design.
And people mistaking Rust safety plus a no-unsafe lint for "security against evil code" could long term be quite an issue for Rust in various subtle ways (not technical problems but people problems).
I agree -- relying on Safe Rust's "guarantees" for security purposes is very likely to be problematic. To make the reasons concrete: for the last 4 years rustc has had a bug that allows writing transmute (arbitrary type conversion) without the use of unsafe: https://zyedidia.github.io/blog/posts/5-safe-transmute/. This is one of the 77 currently open unsoundness bugs on the Rust issue tracker. To make this tenable you would probably have to use a separate language -- maybe some formally verified, minimal Rust-like language -- designed with different priorities from a people perspective.
I am a bit skeptical this is a workable approach long term, but there is a project based on an attempt to enumerate all of Rust's soundness holes and use Rust's compiler infrastructure to detect and forbid them. They think that by erring on the side of forbidding valid code this is feasible. https://news.ycombinator.com/item?id=35501065
From the horse's mouth (article of the linked HN post):
> PL/Rust contains a small set of lints to block what the developers have deemed the most egregious "I-Unsound" Rust bugs.
> [...]
> Note that this is done on a best-effort basis, and does not provide a strong level of security — it's not a sandbox, and as such, it's likely that a skilled hostile attacker who is sufficiently motivated could find ways around it (PostgreSQL itself is not a particularly hardened codebase, after all).
They have extra lints to help you avoid what they deem the most common soundness bugs. They make no claims that there is a way to make this approach safe against an attacker.
Postgres does that for its new Rust support.
https://news.ycombinator.com/item?id=35501065
They do ban unsafe and also the stdlib, which probably covers a lot of soundness holes.
Also, I suspect the trust level required is somewhere in the middle.
In most situations custom SQL functions come from a trusted source, though they potentially run with untrusted inputs in an unprivileged/trusted execution environment.
This would mean they don't necessarily rely on it for sandboxing untrusted code;
it's more like a convenient way to write a native extension function.
Though it's still a bit worrisome.
They didn't mistake Rust safety for anything. This is called out by them as a shortcoming of their approach that has to be mitigated separately.
Because my comment was "in general", not specific to any particular case, I very intentionally did not refer to the paper at all.
I guess I find it weird that you posted something totally random and unrelated to the paper as a direct reply to the paper.
First off, I kinda skimmed this.
So I think the critical point here is that verification alone is not enough. It has to be, because while the implementation in the kernel might suck, Microsoft has shown that it's possible to build a powerful eBPF verifier that isn't a hacky mess.
The main issue is seemingly these helper functions. The position is that even a perfectly verified program won't be safe because of them. To me, the situation makes me think "so why are we allowing these helper functions?". The suggestion is, among other things, to replace these helpers with Rust code. But couldn't we just have the helpers not suck to begin with?
Using the Rust compiler as a sort of safety oracle also ignores the fact that rustc has numerous problems that can lead to unsafe code without `unsafe` (and tbh I don't really see the project prioritizing these cases because it's just not a meaningful problem for the typical rust threat model). They sort of address this but not very well imo - timers and runtime mitigations aren't ideal.
I think what might make much more sense is to instead have the eBPF Virtual Machine (and verifier) written in Rust, including all helper functions, but to still execute pure, verified ebpf within it, using a verifier that's been built in a way that's actually sound.
1. The verifier attack surface goes down because it's Rust. I think that removes the need to keep it in userland, which would fly for Windows / BSD but not Linux.
2. Helpers are in Rust so they're at least safer - I feel like this addresses a (the?) major priority in the paper. Based on the paper's notes about implementing helpers in rust requiring no unsafe, it's probably safe to say that the verifier and helpers being in Rust would solve a lot of problems without requiring eBPF programs to be in Rust (and good news, Rust programs can expose a C API).
3. We don't throw out the baby with the bath water. A verified program is a cool thing to have. I would rather keep verification.
It's worth noting the verifier doesn't verify C code; it verifies the compiled eBPF bytecode. You can generate that bytecode from Rust (the Solana cryptocurrency does this), but you still need to verify the actual instructions, since someone can just write whatever they want by hand.
I'm suggesting that the ebpf code still be verified and that the only rust code used is to implement the verifier and the virtual machine itself.
This paper is an easy read, but it's basically just restating the premises of eBPF:
* Most programs can't be expressed in verified eBPF.
* The verifier functions, to the extent it does, in large part by rejecting most programs (and implicitly limiting the uses to which eBPF can be put).
* This is "extension code", and by definition, it interacts with the unsafe, unverified C code that the kernel is built out of.
(In addition to helpers, most serious eBPF-based systems also interact extensively with userland code, which is also not verified, and might even be memory-unsafe, though that's increasingly less likely).
It follows from these premises that vendors should be careful about enabling non-root access to eBPF; when you do that, you really are placing a lot of faith in the verifier. And: most people don't allow non-root eBPF. The verifier is in an uncomfortable place between being a security boundary and a reliability tool.
I'd argue that most of the benefit of eBPF is that you're unlikely to panic your kernel playing with it. Ironically, that's a feature you might not get out of signed, userland-verified, memory-safe Rust code.
Surely if you are allowing non-root eBPF then security of the programs is one of your least worries? Given all the implicit privilege escalation that comes with allowing non-root to spy on everything the kernel does.
Unprivileged BPF is used for socket filters, for programs to BPF-extend themselves. It wasn't ever the case that unprivileged eBPF would allow you to, say, load a TC filter and read everybody's traffic.
Ok but you can like put a tracepoint on read/write and peek at what’s going through those, no?
Nope. Tracepoint eBPF programs require root to load always. For eBPF you select a program type, and that limits what you can do (aka what helper functions are available to you) and what privileges are required.
I have no idea, because every system I've ever worked on has disabled unprivileged eBPF.
> It follows from these premises that vendors should be careful about enabling non-root access to eBPF
The thing is that it would be really nice to be able to set up a seccomp filter without a suid :\
seccomp does not use the eBPF userspace interface or any of the associated permission checks. seccomp (and also the classic socket filter interface) takes cBPF (classic BPF), with no privilege checks; it uses completely separate verification logic for this cBPF bytecode (the eBPF verifier is not involved, IIRC), and the cBPF code is then (on almost all architectures) translated into eBPF. The eBPF kernel component is then only responsible for execution/JITting of this already-verified code, nothing else.
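For reference, here's roughly what that path looks like from userspace (a minimal sketch; blocking ptrace(2) is just an arbitrary example): a hand-built cBPF filter installed with prctl(), no eBPF interface involved.

```c
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>

static int install_filter(void)
{
    struct sock_filter filter[] = {
        /* load the syscall number out of struct seccomp_data */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                 offsetof(struct seccomp_data, nr)),
        /* kill on ptrace(2), allow everything else */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_ptrace, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* this bit is what allows installation without privileges/suid */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```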
Makes sense, thanks.
Put extensions in a Wasm sandbox. The type system has been proven sound to the highest level of assurance possible with today's technology, mechanized at least twice, once in Coq and once in Isabelle. The algorithm is efficiently implementable and there are approaching a dozen production Wasm engines, some of which have tiers with proven safety guarantees. There is even an interpreter written in a proof assistant that has been proven fully functionally correct.
eBPF code gets to read and, with many limits, write kernel memory; further, the most fundamental guarantee BPF provides, going back to 1991, is that programs terminate, which isn't a Wasm guarantee.
The verifier is doing something much more ambitious than hardened runtimes do (and that only because it makes drastic compromises in the otherwise valid programs it will accept).
> eBPF code gets to read and, with many limits, write kernel memory
Import kernel read/write functions into the Wasm module so they can be policed. Or, if performance demands it, map limited portions of kernel memory into the Wasm extension's linear memory.
> programs terminate,
Several Wasm runtimes count Wasm instructions (e.g. by internal bytecode rewriting) and dynamically enforce execution times. If static enforcement of termination is really all that important, exactly the same kinds of restrictions could be applied to Wasm code (e.g. bounded loops, no recursion, limits on function size, memory size, etc).
The BPF verifier doesn't simply count instructions (though there is a maximum instruction count as a failsafe). It can't: eBPF programs are JIT'd down to machine code --- that's part of what makes eBPF so attractive, because the code you're running is comparably fast to the "native" kernel code. Instead, it refuses to admit programs that can't be proven to constrain their loops.
I don't think JITting necessarily precludes counting runtime instructions. You could always JIT in an internal variable that gets incremented for each high-level instruction being translated. And of course you can optimize the increments within each basic block. There's even a cool minimum-spanning-tree algorithm, due to Ball and Larus originally for path profiling, that might be adaptable to reduce increments across the blocks.
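As a toy illustration of the metering idea (a hypothetical mini-VM, nothing Wasm- or eBPF-specific): charge one unit per dispatched instruction and bail out when the budget is gone. A JIT would inline the same decrement-and-check, amortized per basic block.

```c
/* Toy fuel-metered dispatch loop (hypothetical VM, illustrative only). */
enum op { OP_NOP, OP_JMP, OP_HALT };

struct insn {
    enum op op;
    unsigned target; /* jump destination for OP_JMP */
};

int run(const struct insn *code, long fuel)
{
    unsigned pc = 0;

    while (fuel-- > 0) {            /* every instruction costs one unit */
        switch (code[pc].op) {
        case OP_NOP:  pc++; break;
        case OP_JMP:  pc = code[pc].target; break; /* backedges are fine */
        case OP_HALT: return 0;     /* terminated on its own */
        }
    }
    return -1;                      /* out of fuel: forcibly stopped */
}
```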
I know this is a bit of an aside. The point still stands that the user doesn't want their BPF program terminated at runtime.
I get that. Maybe you should read my comment again. Enforcement doesn't have to be dynamic. Any restrictions put on eBPF code could be enforced statically on Wasm code too. Wasm has way better JITs, some of which have been subjected to formal verification. The tech curve for Wasm engines is still pointing up, and eBPF has completely fallen off it and is a liability at this point. It should be abandoned in favor of Wasm.
If you're back to relying on the same verifier, what does switching to WASM accomplish? I don't understand your "tech curve" point at all. If Rust programs compiled to WASM had to be BPF-verified, you'd be in exactly the same tooling pain you are now with eBPF. The hard part of writing eBPF programs isn't eBPF bytecode, which nobody uses (virtually all eBPF is either C or Rust now), it's passing the verifier.
Program termination is a solved problem with gas metering. Ethereum popularized the idea, but the idea itself is as old as the hills.
There's a subtlety being missed here. Proven termination of BPF programs far predates the adversarial threat model you and the WASM person are thinking about. It was a property of the original 1991 McCanne BPF. It's a safeguard for the programmer against themselves. eBPF shims in all over the place in the kernel; it would not be OK for the guarantee to simply be "there's a worst case maximum cycle budget for programs". eBPF programs are bounded, so they can be installed in hot places in the kernel.
The solved problem you're referring to is a much simpler problem.
What is the difference between "there's a worst case maximum cycle budget for programs" and bounded? And why would it not be OK?
This is such an obvious solution that I wonder why eBPF exists at all. WebAssembly is better for the purpose in like every way? Be against Not-Invented-Here, don't reinvent the wheel.
Well, for one thing, eBPF predates WebAssembly.
I feel that this proposal defeats the entire purpose of eBPF. The point is to have a bytecode language that can do simple processing in the kernel. This code is frequently generated ad hoc, such as with bpftrace. I don't like all the limitations that currently exist in BPF, but just replacing it with Rust and signature verification basically turns this into kernel modules all over again.
There's nothing really "simple" about eBPF bytecode; it's a full fledged ISA, so much so that the idiomatic way to build eBPF programs is to compile them from straight C with clang.
The Rust compiler has several unsoundness bugs that are years old. If you trusted Rust language security in the kernel, these would all be security holes.
Somewhat tangentially related, if anyone is interested in writing eBPF programs in Rust, check out aya-rs (https://aya-rs.dev/).
Rustc supports eBPF bytecode as a target, and aya-rs avoids using clang/llvm. So you can use rust to write eBPF code in both user and kernel space.
This is a different beast from the usual rust though - lots of `unsafe`s.
I haven't been following the eBPF situation for a while, but... how did it come to this? I thought the point of BPF (sans 'e' anyway) was that it was pretty much secure by construction, or at minimum was simple enough to fully verify in polynomial time. So these eBPF vulnerabilities sound like a completely invented, unnecessary class of problems.
The track record of eBPF to date has been reasonably strong, and the threat model serious systems give to eBPF is narrow: you care a lot about the formal soundness of the verifier if you're loading untrusted code, and much less if you're never doing that. eBPF has been a pretty important victory for the Linux systems design model.
The real goal of eBPF verification is to avoid kernel crashes, and for that goal, eBPF has been unreasonably successful.
Because devs latched onto their interpretation of eBPF's promises and found ways to make them happen via the way any problem in computer science gets solved: indirection. This is human nature (just look at the stack of BIOS/EFI/OS execution rings that pre-empt each other to provide more features at lower hardware levels). Responsible operators should avoid these hacks where possible.
When I read the title I thought this was maybe about eBPF verification and the difficulty of creating eBPF programs that actually pass the verifier. What's the HN take on this?
I'm not happy about the entire concept of running user code in the kernel. As a special-purpose hack for servers that do very little else, maybe. As a standard OS feature, it seems to create too big an attack surface. One which has been exploited.[1]
[1] https://www.theregister.com/2022/02/23/chinese_nsa_linux/
Your example doesn't really document an exploit, but rather the use of eBPF as a tool in an attack. It's just an interface they chose to use, not something they broke.
It requires root to use; if someone has root, they've already owned your system anyway.
The kernel should be considered a tier above root, they shouldn't be considered the same level.
a) Root can be constrained by the kernel via LSM - you can run a program as root and it could be limited to very little given the current set of tools we have.
b) These days unprivileged users can be "root" in their own namespaces, so what "root" is means something very different
Re b): Yeah but, like, colloquially "root" means "a process in the init user namespace with all UIDs set to 0 and a full capability mask".
Re a): If you are root in that sense (and haven't been blanket-denied the ability to use capabilities like CAP_SYS_ADMIN by an LSM), and not subject to a strict seccomp policy, then you cannot really in general be securely constrained with LSMs.
The kernel essentially treats CAP_SYS_ADMIN in the init userns as the catch-all for "you have been granted the ability to administer and access anything on the system", for anything that doesn't have a more specific permission and isn't access-controlled by UID. And if you can, like, call swapon() on an arbitrary file to make the kernel swap memory from the whole system into that file of your choice, LSM-enforced security boundaries probably don't work all that well anymore.
The actual paper.
To secure Linux, both eBPF and io_uring need to be disabled in Kconfig at kernel compile time.
In security-insensitive scenarios, they are both interesting tech.
Radically different threat models. io_uring is conventionally exposed to unprivileged programs, and eBPF virtually never is.
Isn't the current Linux security mindset that all access is potentially privileged?
The whole BPF verifier and development process is so botched, it's ridiculous. It's like maintainers decided to make this as hard as possible out of pettiness and "they have to use C APIs instead" or something.
- Loading an eBPF module without the CAP_BPF capability (and in some cases without CAP_NET_ADMIN, which you need for XDP) generates an "unknown/invalid memory access" error, which is super useless as an error message.
- In my personal opinion, a bytecode format for both little-endian (bpfel) and big-endian (bpfeb) machines is kind of unnecessary. I mean, it's a virtual bytecode format for a reason, right!?
- Compiling eBPF via clang to the BPF bytecode format without debug symbols makes every following error message down the line utterly useless. It took me a while to figure out what "unknown scalar" really means. If you forget that "-g" flag, you're totally fucked.
- Anything pointer-related that the eBPF verifier itself doesn't support leads to "unknown scalar" errors, which are actually out-of-bounds errors most of the time (e.g. you have to wrap the access in an if (pointer < size(packet)) check; see the sketch after this list). They only show up in the verification process and can only be displayed using bpftool. If you miss them, good luck getting a better error message out of the kernel while loading the module.
- The bpftool maintainer is kind of unfriendly; he tells you to read a book about the bytecode format when your code doesn't compile and you ask for examples of how to use pointers inside a BPF codebase, because it seems to enforce specific rules about what kinds of methods (__always_static) are allowed to modify or allocate memory. There are a lot of limitations that are documented _nowhere_ on the internet, and seemingly all developers are supposed to learn them by reading the bpftool codebase itself!? Who's the audience for bpftool then? Developers of bpftool itself?
- The BCC tools (BPF compiler collection) still use examples that can't compile on an up-to-date kernel. [1] If you don't have the old headers, you'll find a lot of issues that point you to the specific git hash where the "bpf-helpers.h" file was still inside the kernel codebase.
- The libbpf repo also contains examples that won't compile, especially the XDP-related ones. [2]
- There's also an ongoing migration of all projects (?) to xdp-tools, which seems redundant in terms of BPF-related topics, but it too has only a couple of examples that somehow work. [3]
- Literally the only userspace eBPF generation framework that worked outside a super-outdated enterprise Linux environment is the cilium ebpf project [4], but only because they ship the old "bpf-helpers.h" files that have meanwhile been removed from the kernel itself. [5] They're also incomplete for things like the new "__u128" and "__bpf_helper_methods" syntax, which are sometimes missing.
- The only working examples that can also be used as a reference for "what's available" in terms of eBPF and kernel/userspace APIs are in a forked repo of the bootlin project [6], which literally taught me how to use eBPF in practice.
- All other (official?) examples show you how to make a bpf_printk call, but _none_ of them show you how to even interact with BPF maps (whose syntax changed like 5 times over the course of the last years, and 4 of those versions don't get through the verifier, obviously). They're also somewhat documented in the wiki of the libbpf project, without further explanation of why or what [7]. Without that bootlin repo I would still have no idea how to do anything beyond a print inside a "kretprobe". Anything more advanced is totally undocumented.
- OpenSnitch even has a workflow that copies its own codebase into the kernel codebase, just to make it compile, because all the other ways are too redundant or too broken. Not kidding you. [8]
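Here's the bounds-check idiom mentioned above (a minimal hypothetical XDP sketch, assuming libbpf headers); without the comparison against data_end, every later access to the packet comes back as an invalid access / "unknown scalar":

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_check(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* the verifier rejects any access to eth without this check */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    return eth->h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS : XDP_DROP;
}

char LICENSE[] SEC("license") = "GPL";
```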
Note that none of the BPF-related projects uses any kind of reliable versioning scheme, and none of them uses anything "modern" like conan (or whatever) as a package manager. Because that would have been too easy to use, and too easy for documenting what breaks when. /s
Overall I have to say BPF was the worst development experience I ever had. Writing a kernel module is _easier_ than writing a BPF module, because then you at least have reliable tooling. In the BPF world, anything will and can break at any unpredictable moment. If you compare that to the experience of other development environments, say the JVM or even the JS world, where debuggers that interact with JIT compilers are the norm, well... then you've successfully been transported back to the PTSD moments of the 90s.
Honestly I don't know how people can use BPF and say "yeah this has been a great experience and I love it" and not realize how broken the tooling is on every damn level.
I totally recommend reading the book [9] and watching the YouTube videos of Liz Rice [10]. They're awesome, and they show you how to tackle some of the problems I mentioned. I think that without her work, BPF would have had zero chance of success.
What's missing in the BPF world is definitely better tooling, better error messages (e.g. "did you forget to do this?" or even "unexpected statement" would be sooooo much better than the current state), and an easier way to debug an eBPF program. Documentation on what's available and what is not is also necessary, because it's impossible to find out right now. If I am not allowed to use pointers or whatever, then say so in the beginning.
[1] https://github.com/iovisor/bcc
[2] https://github.com/libbpf/libbpf
[3] https://github.com/xdp-project/xdp-tools
[4] https://github.com/cilium/ebpf/
[5] https://github.com/cilium/ebpf/tree/master/examples/headers
[6] https://elixir.bootlin.com/linux/latest/source/tools/testing...
[7] https://github.com/libbpf/libbpf/wiki/Libbpf-1.0-migration-g...
[8] https://github.com/evilsocket/opensnitch/blob/master/ebpf_pr...
[9] https://isovalent.com/learning-ebpf/
[10] (e.g.) https://www.youtube.com/watch?v=L3_AOFSNKK8
It sounds like you're mostly trying to use higher level toolkits like BCC and clang for eBPF, which I agree isn't a great experience, if you don't know what's going on underneath.
I used to hand-code BPF before LLVM had a backend for it, and I can tell you that each enhancement added to the userland tooling made sense in isolation to help you if you already knew what you were doing. But the overall picture isn't really an SDK - it's more like a collection of someone's bash scripts used to automate repetitive parts of writing the bytecode.
For one thing, 80% of the contents of the popular toolkits is just there to accomplish two goals:
1) Let you pretend to write C / some other higher-level language
2) Cut down on manual setup of the maps, checking if BPF is enabled, etc.
Arguably the only really complicated thing the tooling does is CO-RE, which is largely done in the loader, with some C macros to support it.
What you pay for this "convenience" is that the kernel has no idea what the hell you're trying to do. All it sees is the generated, rewritten and relocated BPF bytecode which it has no way of tying back to the C code you made it from.
To arrive at a point - I would honestly recommend trying to write BPF by hand. The bytecode is pretty friendly, the BPF helpers are numbered and you'll see what the verifier is talking about.
After you've got that down, you'll see the two annoying parts, doing BTF-based relocations and doing the setup with BPF maps, etc., and you'll get a feeling for how the clang-based tooling does those things and what cost it exacts: IMO it's not worth it.
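For the flavor of it, here's a hand-assembled "return 0" (a minimal sketch using the BPF_* instruction macros as found in the kernel tree's samples/bpf/bpf_insn.h); you'd hand the array straight to the bpf(2) BPF_PROG_LOAD command:

```c
#include <linux/bpf.h>
#include "bpf_insn.h" /* BPF_* macros from the kernel's samples/bpf */

/* The smallest verifiable program: set r0 (the return value) and exit. */
struct bpf_insn prog[] = {
    BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
    BPF_EXIT_INSN(),             /* return r0 */
};
```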
Windows on houses (and other buildings) are flawed. Look! I just broke one with a sledgehammer to prove it. News at 11.
After bashing Java and .NET, the Linux kernel folks discover the complexity of bytecode verification.
Secure code inside the kernel is untenable.
We can do OK; lots of hard work goes into doing OK, but this isn't the kernel's top priority, and never will be.
Userspace is the security boundary.
eBPF verification was always a laugh from the very beginning design stages, if you ask me, because, as this paper demonstrates, it was never going to be enough. Anyone with a modicum of security or PLT experience could have told you this when evaluating the design and history. If I had to be completely honest, the very fact that the security/robustness model started from principles like "fixed number of loop iterations" or "no backedge jumps" (among several others) in the verifier was a pretty good sign that this was always going to be a source of continuous vulnerabilities. It makes me think people are flying blind. If you're not systematically fixing these issues in the very design stages of the system, and are instead using duct tape, you're just going to patch every single thing one by one as it happens, and then how is that any different from today?
The basic idea is simple. You have the verifier, and the TCB. The verifier has to reject invalid programs, so the TCB does not have its integrity compromised by the program. The verifier is small, so it can be audited. That's nice -- until you back out and realize the TCB is "the entire linux kernel and everything inside of it and all of the surface area API between it and the BPF Virtual Machine" and it dawns on you that at that point the principle of "system integrity being maintained" relies very little on the verifier and actually a whole lot on Linux being functionally correct. Which is where you started at in the first place. The goal of eBPF after all isn't just to burn CPU cycles and return an integer code. It has to interact with the system. Having the TCB functionally be "every line of code we're trying to protect" is the Windows 3.1 of integrity models.
Now, this might also be OK and quantifiable to some extent. Except for the other fact that the guiding design principle in Linux is to pretty much grow without bound, without end, rewrite code left and right, and the eBPF subsystem itself has been endlessly tacking on features left and right for what -- years now?
If you take away any of these three things (flawed design basis, ridiculously large TCB, endless and boundless growth) and modify or remove one of them, the picture looks much better. Solid basis? You can maybe handle the other two if you're careful and on top of things, big hand-wave. Very small TCB? Great, you can put significantly more trust in the verifier, freeing you from the need to worry about every line of code. No endless growth? Then you have a target you can monitor and maybe improve on, e.g. push trends downward over time. But the combination of all three of these things means the end result is "greater than the sum of the parts", so to speak, and it will always be a matter of pushing the boulder up the hill every day, all so it can fall back down again.
That said, eBPF is really useful. I get a ton of value out of it. The verifier does allow you to have greater trust in running things in the kernel. In this case, doing something is quite literally 1,000% better than doing nothing in this if you ask me, at least for most intents and purposes. So making it safer and more robust is worthwhile. But it was pretty easy to see this sort of stuff from a long way out, IMO.
The original BPF model was "no backedge jumps, constrained memory model", and its track record is quite good. Say more about why you think "no backedge jumps" --- which isn't the current, more-sophisticated, harder-to-understand verification model --- was obviously weak.
When I read about eBPF for kernel extension, it immediately made me think it would be full of security problems. I don't even know anything about the kernel, eBPF validation and barely anything about security, but just from a theoretical level, it seems highly insecure to run someone else's code in the kernel. "Verifying" it seems impossible from a theoretical level. Am I wrong? What's the limits of security in eBPF kernel extensions?
1. Using eBPF requires root
2. The verifier checks memory bounds access, guarantees termination within a certain number of instructions, and restricts function calls to a limited set of helper functions provided by the kernel (see the sketch below).
3. BPF code runs on a VM, think the JVM. It's impossible to express a lot of nasty stuff given the restrictive bytecode language.
There have been bugs in the verifier, but overall it works very well, the biggest issue being that it drastically limits the complexity of your program.
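As a load-time illustration of point 2 (a hypothetical program): the verifier rejects this outright instead of letting it spin, which is exactly the difference from a runtime timeout.

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical filter with no provable progress: skb->len never changes,
 * so loading fails at verification time and the program never runs. */
SEC("socket")
int bad_filter(struct __sk_buff *skb)
{
    volatile int spins = 0;

    while (skb->len > 0)
        spins++;
    return 0;
}
```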
> 1. Using eBPF requires root
Unprivileged eBPF has been around for a long time.
Except that it's been almost universally disabled, for many years. Nobody trusts it.
Idk if I'd call 2 years "many", but yes.
In eBPF years, 2 years is an eternity.