Coding Agent VMs on NixOS with Microvm.nix

michael.stapelberg.ch

109 points by secure 2 months ago · 57 comments

rootnod3 2 months ago

That is quite an involved setup to get a costly autocomplete going.

Is that really where we are at? Just outsource convenience to a few big players that can afford the hardware? Just to save on typing and god forbid…thinking?

“Sorry boss, I can’t write code because cloudflare is down.”

  • Cyph0n 2 months ago

    Keep in mind that this setup is a one-time cost. Also, a lot of the code is related to configuring it the way the author wants it (via Home Manager).

    Generally speaking, once you have a working NixOS config, incremental changes become extremely trivial, safe, and easy to rollback.
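
    For a sense of scale, a typical incremental change is a one-line edit to the config, and a bad change can be reverted by switching back to the previous generation. A sketch (the hostname is made up; `services.openssh.enable` is a real NixOS option):

        # configuration.nix -- enable SSH on an existing machine
        { config, pkgs, ... }: {
          networking.hostName = "myhost";
          services.openssh.enable = true;  # the entire incremental change
        }

        # apply:    sudo nixos-rebuild switch
        # rollback: sudo nixos-rebuild switch --rollback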

    • aquariusDue 2 months ago

      To provide another data point: I too use NixOS, and oh boy, that one-time cost is really steep. And while we're sharing Nix stuff for LLMs, there's this piece of kit too: https://github.com/YPares/rigup.nix

      • Cyph0n 2 months ago

        Agreed, the learning curve is insane and docs are sparse. But it is truly worth it imo, even if you’re just using Nix as a build tool, or using Home Manager on Linux or macOS.

  • groby_b 2 months ago

    If you believe "costly autocomplete" is all you get, you absolutely shouldn't bother.

    You're opting for "sorry boss, it's going to take me 10 times as long, but it's going to be loving craftsmanship, not industrial production" instead. You want different tools, for a different job.

0xcb0 2 months ago

I was looking for a way to isolate my agents in a more convenient way, and I really love your idea. I'm going to give this a try over the weekend and will report back.

But the one-time setup seems like a really fair investment for more secure development. Of course, this won't help with the problem of malicious code making it to production. But with a little overhead, I think it will really make local development much more secure.

And you can automate it a lot. And it will finally be my chance to get more into NixOS :D

NJL3000 2 months ago

A pair of containers felt a bit cheaper than a VM:

https://github.com/5L-Labs/amp_in_a_box

I was going to add Gemini / OpenCode Kilo next.

There is some upfront cost to define what endpoints to map inside, but it definitely adds a veneer of preventing the crazy…

  • phrotoma 2 months ago

    One problem with using containers as an isolation environment for a coding assistant is that it becomes challenging to have the agent work on a containerized project. You often need some janky "docker-in-docker" nonsense that hampers efforts.

giancarlostoro 2 months ago

This brings me back to my college days. We had Windows, and Deep Freeze. Students could do anything on the computer; we'd restart it and it was all wiped and new. How long before Deep Freeze realizes they could sell their tool to vibe coders? Funnily enough, they have Deep Freeze for Mac, but not for Linux.

mxs_ 2 months ago

Is there a way to make this work with macOS hosts, preferably without having to install a Linux toolchain inside the VM for the language the agent will be writing code in?

ghxst 2 months ago

I'm working on a shared remote box for AI assisted development, will definitely look at this for some inspiration.

messh 2 months ago

I use shellbox.dev to create sandboxes over SSH, without ever leaving the terminal.

heliumtera 2 months ago

Couldn't you replicate all of your setup with qemu microvm?

Without nix I mean

  • rictic 2 months ago

    Yep. What nix adds is a declarative and reproducible way to build customized OS images to boot into.
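
    As a sketch of what that looks like with microvm.nix (attribute names follow the microvm.nix README; the VM name and sizes are arbitrary):

        # flake.nix -- declaring a microVM guest as a NixOS configuration
        {
          inputs = {
            nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
            microvm.url = "github:astro/microvm.nix";
          };
          outputs = { self, nixpkgs, microvm }: {
            nixosConfigurations.agent-vm = nixpkgs.lib.nixosSystem {
              system = "x86_64-linux";
              modules = [
                microvm.nixosModules.microvm
                {
                  microvm = {
                    vcpu = 2;
                    mem = 2048;  # MiB
                    hypervisor = "cloud-hypervisor";
                  };
                }
              ];
            };
          };
        }

    microvm.nix then exposes a runner attribute (`config.microvm.declaredRunner`) that boots the resulting image.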

    • CuriouslyC 2 months ago

      Nix is the best answer to "works on my machine," which is a problem I've seen at pretty much every place I've ever worked.

      • 0x457 2 months ago

        It's also an answer to caching with /nix/store. I wish more cloud services supported "give me your nixosConfiguration or something similar" instead of providing api to build containers/vms imperatively. Dockerfile and everything that mimics it is my least favorite way to do this.

        • Cyph0n 2 months ago

          It’s fairly trivial to map your NixOS config into a VM image: https://nixos.org/manual/nixos/stable/#sec-image-nixos-rebui...

          An alternative is to “infect” a VM running in whatever cloud and convert it into a NixOS VM in-place: https://github.com/nix-community/nixos-anywhere

          In fact, it is a common practice to use the latter to install NixOS on new machines. You start off by booting into a live USB with SSH enabled, then use nixos-anywhere to install NixOS and partition disks via disko. Here is an example I used recently to provision a new gaming desktop:

              nix run github:nix-community/nixos-anywhere -- \
                --flake .#myhost \
                --target-host user@192.168.0.100 \
                --generate-hardware-config nixos-generate-config ./hosts/myhost/hardware-configuration.nix
          
          At the end of this invocation, you end up with a NixOS machine running your config, partitioned based on your disk config. My disko config in this case (ZFS pool with a 1-disk vdev): https://gist.github.com/aksiksi/7fed39f17037e9ae82c043457ed2...

          • 0x457 2 months ago

            I know that part is easy; I used nixos-anywhere just yesterday to reinstall one of my servers. That's not what I'm talking about.

            • Cyph0n 2 months ago

              Okay, so your idea is that cloud providers should make this even easier?

                  $ nixos-rebuild build-image --flake .#myhost --image-variant amazon
                  $ aws-cli image upload < result/images/image.ami
                  $ aws-cli create vm --image={image}

              • 0x457 2 months ago

                Less about IaaS providers, more about PaaS providers that often abstract away the image you're running and tell you to "just run pip/apt/gem install whatever".

                Same with the CI platforms: instead of `setup-*` steps in GHA, they could just take a flake as input. Yes, I know I can build an OCI image with Nix; again, that's not the issue.

                My private CI runs on top of Nix; all workers on the same host share /nix/store. My pipelines focus on running actual things rather than on getting a worker ready to run things. If I didn't want the output to be parsed by CI, I could just reduce my pipeline to `nix flake check`.

                I share the exact same pipeline and worker image across multiple projects in multiple languages, all because everything is hidden behind devenv's tasks. When I switched projects to different Rust and Node versions, I didn't have to touch my CI at all. When I added a bunch of native deps that usually need to be installed separately on GHA, again, I didn't have to touch anything beyond my Nix env.
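
                (For context, `nix flake check` builds everything under the flake's `checks` output. A sketch, with a made-up derivation and assuming `pkgs` is in scope:)

                    # flake.nix excerpt -- anything under `checks` is built by `nix flake check`
                    checks.x86_64-linux.smoke = pkgs.runCommand "smoke" { } ''
                      ${pkgs.hello}/bin/hello | grep -q Hello
                      touch $out
                    '';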

    • schmuhblaster 2 months ago

      Or try this: https://github.com/deepclause/agentvm. It's based on container2wasm, so the VM is fully defined by a Dockerfile.

clawsyndicate 2 months ago

we run ~10k agent pods on k3s and went with gvisor over microvms purely for density. the memory overhead of a dedicated kernel per tenant just doesn't scale when you're trying to pack thousands of instances onto a few nodes. strict network policies and pid limits cover most of the isolation gaps anyway.

  • secureOP 2 months ago

    Yeah, when you run ≈10k agents instead of ≈10, you need a different solution :)

    I’m curious what gVisor is getting you in your setup — of course gVisor is good for running untrusted code, but would you say that gVisor prevents issues that would otherwise make the agent break out of the kubernetes pod? Like, do you have examples you’ve observed where gVisor has saved the day?

    • zeroxfe 2 months ago

      I've used both gVisor and microvms for this (at very large scales), and there are various tradeoffs between the two.

      The huge gVisor drawback is that it *drastically* slows down applications (despite startup time being faster).

      For agents, the startup time latency is less of an issue than the runtime cost, so microvms perform a lot better. If you're doing this in kube, then there's a bunch of other challenges to deal with if you want standard k8s features, but if you're just looking for isolated sandboxes for agents, microvms work really well.

    • clawsyndicate 2 months ago

      since we allow agents to execute arbitrary python, we treat every container as hostile. we've definitely seen logs of agents trying to crawl /proc or hit the k8s metadata api. gvisor intercepts those syscalls so they never actually reach the host kernel.

      • alexzenla 2 months ago

        The reason virtualization approaches with true Linux kernels are still important is that whatever you do allow via syscalls ultimately does result in a syscall on the host system, even if through layers of indirection. Ultimately, if you fork() in gVisor, that calls fork() on the host (btw, fork()/execve() is still expensive on gVisor).

        The middle ground we've built is that a real Linux kernel interfaces with your application in the VM (we call it a zone), but that kernel then can make specialized and specific interface calls to the host system.

        For example, with NVIDIA on gVisor, the ioctl()s are passed through directly, so an NVIDIA driver vulnerability that causes memory corruption leads directly to corruption in the host kernel. With our platform at Edera (https://edera.dev), the NVIDIA driver runs in the VM itself, so a memory corruption bug doesn't percolate to other systems.

        • syzcowboy99 2 months ago

          > Ultimately, if you fork() in gVisor, that calls fork() on the host

          This isn't true. You can look at the code right here[1], there is no code path in gVisor that calls fork() on the host. In fact, the only syscalls gVisor is allowed to make to the host are listed right here in their seccomp filters[2].

          [1] https://github.com/google/gvisor/blob/master/pkg/sentry/sysc...

          [2] https://github.com/google/gvisor/tree/master/runsc/boot/filt...

          • alexzenla 2 months ago

            I was more specifically referring to the fact that to implement threads, gVisor calls into the Go runtime, which does make calls to clone() (not fork()), but I see the pushback :)

            I think it's a small distinction. fork() itself isn't all that useful anyways.

            However, consider reading a file in gVisor. This passes through the IO layers and will ultimately end up as a read in the host kernel, through one of the many interfaces to do so.

      • rootnod3 2 months ago

        And you see no problem in that at all? Just “throw a box around it and let the potentially malicious code run”?

        Wait until they find a hole. Then good luck.

        • alexzenla 2 months ago

          This is why you can't build these microVM systems to just do isolation; they have to provide more value than that: observability, policy, etc.

  • alexzenla 2 months ago

    This is a big reason for our strategy at Edera (https://edera.dev) of building hypervisor technology that eliminates the standard x86/ARM kernel overhead in favor of deep para-virtualization.

    The performance of gVisor is often a big limiting factor in deployment.

    • souvik1997 2 months ago

      Edera looks very cool! Awesome team too.

      I read the thesis on arxiv. Do you see any limitations from using Xen instead of KVM? I think that was the biggest surprise for me as I have very rarely seen teams build on Xen.

      • alexzenla 2 months ago

        I'd say the limitation has been that sometimes we have to implement things by hand. But it has enabled us to do things that others can't achieve since KVM is a singular stack in many ways. For example, VFIO-PCI is largely the same across all VMMs, but we have true full control over the PCI passthrough on our platform which has allowed us to do things KVM VMMs can't.

    • yearolinuxdsktp 2 months ago

      How do you compete with Nitro-based VMs on AWS with 0.5% overhead?

      • alexzenla 2 months ago

        When running on bare metal, the CPU performance is within 1%, so usually quite well! Hardest thing is I/O, but we do a lot to help with that too.

  • souvik1997 2 months ago

    Hey @clawsyndicate I'd love to learn more about your use case. We are working on a product that would potentially get you the best of both worlds (microVM security and containers/gVisor scalability). My email is in my profile.

  • dist-epoch 2 months ago

    LXC containers inside a VM scale. Bonus points: LXC containers feel like a VM.
