It is 00:43 at night. I look at the plan and press y. “s4.ruuda.nl: connecting …” I hold my breath. “Applying” briefly flashes in my terminal window before settling on “done.” It worked. Now the real work can start. Pop stack frame. What was I doing again?
People who have worked with me for a while might accuse me of suffering from not-invented-here syndrome. I prefer to call it “having higher standards.” Why subject yourself to endless frustration that mediocre tools inflict on you, when you can build your own tools that are actually nice to use? Anyway, what was I doing again?
Right, I wanted to write a blog post about European digital sovereignty. It would be hypocritical to publish that on a blog hosted in the US, with a US-controlled hyperscaler. So let’s move that to Europe, easy enough. Push stack frame. Spawn a new VM, point the DNS records at … oh, right, DNS. I use Cloudflare for that. Another entity that Trump can order to stop providing services when shit hits the fan. Maybe I should self-host my DNS servers then. Push stack frame.
My webserver is a tiny VM that runs Nginx, plus Lego to renew certificates. The Nginx configuration grew somewhat complicated over the years, but I generate it with Nix so it’s fine, just two configuration files. I wrote a small Python script that copies the files to the server and restarts Nginx. The script has served me well for the past few years, but now I want to start running DNS servers, which means I need at least two servers. And more systemd units, configuration files, zonefiles … The script is not going to cut it any more; I need serious cluster configuration management. Push stack frame.
“NixOS” I hear Arian’s voice whisper in my head. “Just use NixOS. It’s only one line to configure, services.nsd.enable = true.” He’s right of course. I already use Nix to build minimal EROFS images for Nginx and Lego. That’s how I run them on Flatcar. But I like the idea of a minimal base OS, and running my services from readonly chroots with no more binaries than needed. “Let’s not scope-creep this into switching distros right now,” I tell myself. “Let’s build a new deployment tool instead.”
How it looks
It is now one month later, and Deptool exists. This is me updating my DNS records:
$ deptool deploy
s4.ruuda.nl
  update nsd
    ~ zones/ruuda.nl.zone
    restart unit nsd.service
s5.ruuda.nl
  update nsd
    ~ zones/ruuda.nl.zone
    restart unit nsd.service
Auto-rollback if deploy fails.
Apply to 2 hosts in cluster 'prod'? [y/N/d] y
s4.ruuda.nl: done
s5.ruuda.nl: done
Changes deployed successfully to 2 hosts in 0.99s.
In this post we’ll walk through how it works, but let’s not get ahead of ourselves. How did I get here?
Wishlist
If I’m going to build my own tool … what would an actually nice configuration management tool look like? Here we can look to Ansible for guidance: it made all the mistakes so that others can learn from them. I want my tool to be:
Fast. A configuration update should be sub-second. There’s no fundamental reason for it to be slower than that, even a transatlantic ping is only 100ms.
Predictable. The tool should show me what it’s going to do, and then do just that. Like OpenTofu, with a separate plan and apply phase. Not like Ansible, where check mode is useless because every imperative step can trigger a cascade of changes that are only known after executing the step. And where nothing prevents the host from changing between the check and the real run, making the check more of a vibe check than something you can depend on.
Safe. If I break my Nginx configuration, I don’t want my webserver to be down for minutes while I frantically try to fix it. (Me: “Ah, only a small typo, faster to just fix it than to try and restore the previous version.” Narrator: “If only that typo were the only problem …”) No, I want the tool to automatically roll back for me. In milliseconds.
Simple. I just need to copy configuration files from my laptop to my servers and restart a few systemd units. I don’t need to solve every deployment problem for everybody; I don’t need control flow or arbitrary code execution. I do need to be able to template — excuse me, generate — configuration files, but a separate tool can do that.
Declarative. If I remove a file or application from my config, it should be removed from the server. I don’t want to have to add explicit cleanup steps, and end up with drift and lingering files when I inevitably forget that.
Zero-setup. I want to use this tool to manage my servers right after provisioning them. I don’t want to manually install agents, daemons, or dependencies, and I don’t want to have to enroll or register the host anywhere, because then I’d have the new problem of automating that.
Decouple distribution
The core idea feels obvious in hindsight, which is maybe why it keeps appearing everywhere. At work, David had recently built Unsible, a tool that takes Ansible playbooks, but instead of executing them step by step, it locally builds a tarball and ships it to the host, where it mostly just needs to put the files in place. I like this idea of decoupling configuration generation from distribution. It’s how my barebones deploy script worked too: build the configuration externally, and the script is mostly a dumb file copier. In a sense, NixOS is this idea applied to the local system. We can learn more from Nix: store generated artifacts in a place where different versions can coexist, and limit the imperative parts of system administration to a small activation step that swaps a few symlinks. This design works well for both package management and system configuration.
With the scope reduced to distributing small files and running a simple activation step, deployment becomes a tractable problem to solve. I call my take on it Deptool. Here’s how it works.
Pre-render config files for the entire cluster. Store them in a directory on disk. This directory tree is two levels deep: a directory per target host at the top level, a directory per application below that.
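For example, with the two DNS hosts from the plan above, the rendered tree might look roughly like this (the individual file names are made up; the point is the host/application nesting):

s4.ruuda.nl/
  nsd/
    nsd.conf
    zones/ruuda.nl.zone
s5.ruuda.nl/
  nsd/
    nsd.conf
    zones/ruuda.nl.zone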
Put that in a Git repository. Now we can diff one version against another, and see what changed across the entire cluster. From the diffstat we can see which hosts are affected and which apps changed there, and for every config file we have a precise diff.
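For example, after editing the shared zonefile, a diffstat against the last deployed commit immediately shows which hosts and apps are affected (the deployed ref name is just a stand-in here):

$ git diff --stat deployed HEAD
 s4.ruuda.nl/nsd/zones/ruuda.nl.zone | 2 +-
 s5.ruuda.nl/nsd/zones/ruuda.nl.zone | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)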
Materialize files in an isolated directory on the hosts. We put everything in /var/lib/deptool where it doesn’t interfere with anything. In there we create a directory named after the commit to deploy. Now multiple versions can coexist on disk. Then we point a current symlink at the deployed version. Now we can swap versions atomically, and we never have lingering files. Deleted files are simply not materialized in a next version. For applications that demand files in particular locations, we can create symlinks to /var/lib/deptool anywhere on the filesystem. That step is not atomic, but it only has to happen when adding or removing symlinks, not when we edit the files they point to. And if we later deploy a version that no longer includes the symlink, we know from the diff that we need to delete it. Nothing lingers.
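In shell terms, the swap itself can be as small as this sketch; renaming a fresh symlink over the old one is atomic on the same filesystem (the paths are simplified, Deptool may nest things differently):

commit=3f9c2ab1                # hypothetical commit id, already materialized on disk
cd /var/lib/deptool
ln -s "$commit" current.tmp    # build the new symlink next to the old one
mv -T current.tmp current      # rename over 'current': one atomic step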
Track what is deployed with remote-tracking refs. Back on the operator laptop, we track the commit we deployed to each host. This is a per-host property rather than a cluster-wide property, because if a change doesn’t affect a host, we don’t need to deploy the new commit there. Now we can compute the cluster diff offline. This diff is the deployment plan, and we can present it in milliseconds.
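On the laptop this can be as little as one ref per host; the refs/deployed namespace below is my own naming for illustration, not necessarily what Deptool uses:

# Record what we just deployed to s4 (hypothetical ref name and variable):
git update-ref refs/deployed/s4.ruuda.nl "$deployed_commit"

# The per-host part of the plan is then an offline diff, no network needed:
git diff --stat refs/deployed/s4.ruuda.nl HEAD -- s4.ruuda.nl/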
To deploy, first try to take a lock on the target host. We connect over SSH and send a lock request. In the request we include the commit that we think is deployed on that host. If we get the lock, then our plan was valid, and nothing else can be deployed on that host until we release the lock, so the plan remains valid. Deployment only proceeds if we hold the lock for all hosts affected by a change. If any ref was outdated and something else is deployed on a host, that means the plan is stale, and we abort. We update our local refs, and the next run will show an up-to-date plan.
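Conceptually the lock request is a compare-and-set on the deployed commit. A sketch of the check, using shell idioms as stand-ins for what the agent actually does (the mkdir lock and the current symlink are illustrative, not Deptool’s real mechanism):

expected=3f9c2ab1                              # the commit our local ref says is deployed
actual=$(readlink /var/lib/deptool/current)    # what is actually deployed right now
if [[ "$actual" == "$expected" ]] && mkdir /var/lib/deptool/lock 2>/dev/null; then
  echo "lock granted, the plan is still valid"
else
  echo "stale plan or concurrent deploy, abort and re-plan"
fi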
Restart systemd units. I run all my services as systemd units, and they are quick to start, so I can err on the side of restarting them when in doubt. After an application’s configuration changed, we restart the affected systemd units. If the unit fails to start, we can point the symlink back at the previous known-good version, and restart again. Automatic rollback in milliseconds.
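The rollback amounts to something like this (again a conceptual sketch reusing the layout from the symlink example above, not Deptool’s actual code):

systemctl restart nsd.service
if ! systemctl is-active --quiet nsd.service; then
  # The new config broke the unit: point 'current' back at the previous
  # known-good commit and restart once more. $previous_commit is hypothetical.
  ln -s "$previous_commit" /var/lib/deptool/current.tmp
  mv -T /var/lib/deptool/current.tmp /var/lib/deptool/current
  systemctl restart nsd.service
fi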
So this is the core idea behind Deptool. Cluster configuration is just a directory tree tracked in a Git repository, and on each host we check out the relevant files into isolated directories, change a few symlinks, and restart systemd units. That’s it!
Optimistic concurrency
A Deptool deployment has elements of optimistic concurrency. We assume that we know the current cluster state and plan based on that. If the assumption turned out wrong, we have to retry. This means it’s very fast when there is no contention (when I’m the only person deploying, always from the same laptop), at the cost of performing abysmally under contention (a team of people all trying to deploy constantly, where only one can win and everybody else has to retry). This is the same model as git push. It’s not going to scale to hundreds of people or thousands of servers, but that’s okay. I’m the only person managing my personal infra, and I don’t have thousands of servers. The nice thing about building your own tools is that you can optimize them for exactly your own use case!
Building the agent
We have the design, we know on a high level what to do. How do we make it happen on a managed host? My webserver runs Flatcar Linux, an image-based OS with a very minimal userspace. It has coreutils and Bash, but no package manager and no Python. This is great for reducing attack surface. It’s not so great for installing things. I want to use my tool to manage my servers, so it needs to be compatible with Flatcar out of the box. If I need to install something for the tool to work, then I’d have the new problem of automating that installation process!
So we need to manage the fresh host from the outside. I can SSH into it, and use passwordless sudo there. Maybe we can just execute commands over SSH? If you have ever tried to automate anything mildly nontrivial that way, you will have quickly learned the hard way that the handshake is not just slow — the argv also doesn’t cross the SSH boundary unscathed. I want a fast tool and I don’t want to deal with the subtleties of word splitting and escaping shell-over-SSH. So let’s sidestep this minefield. How about we run one simple command? A program that takes zero args, at a predictable location. This is the agent. The agent reads messages from stdin, and responds on stdout. We use SSH only for transport, as a socket. Now we have code execution on the host, and we can send whatever input we want, without ever having to worry about escaping!
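For a taste of the minefield we just sidestepped: arguments do not cross the SSH boundary intact, because ssh joins them into a single string and the remote shell re-parses it.

$ ssh s4.ruuda.nl touch 'my file'
# Creates two files, 'my' and 'file': the local shell strips the quotes,
# ssh joins the arguments with spaces, and the remote shell splits them again.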
Now we just need to start the agent … But how did the agent get there? This is a fresh host! And how do we even build this agent? Here too we can look to Ansible for mistakes to avoid. After you mitigate its worst flaws, Ansible still sends over megabytes of Python modules every single time it connects to a host. Needless to say, this is slow. Excruciatingly slow. The one nice thing about this design is that you know exactly what’s running on the remote end, so you never have version mixups. In theory, at least. In practice I’ve wasted more hours than I dare to admit staring at cryptic Python errors, because Ansible is not fully self-contained. That creates a compatibility nightmare. Besides, we don’t even have Python on Flatcar. So how about this instead?
We build a static binary. No assumptions about what’s available except the kernel, and no interpreter that has to parse through megabytes of code before it can start doing anything useful.
We put it at a location named after the commit it was built from. This ensures that both ends of the connection run the same version, so there are never any compatibility issues. If we managed to start the binary, it speaks the right protocol. We’ll put it at /var/lib/deptool/bin/deptool-<version>-<commit>.
We optimistically assume the binary is already there. An SSH handshake is fairly expensive. Let’s not waste any on probing or on idempotent installation steps. The binary is about 1.6 MB. Not prohibitively slow to send, but not free either, and cluster configuration changes far more frequently than Deptool updates; the common case is that the binary exists.
If we failed to start the binary, then we need to install it. We connect over SSH a second time, now with this command:
uname -sm
  && sudo mkdir -p /var/lib/deptool/{bin,apps,store}
  && sudo dd status=none of=<remote_bin_path>
  && sudo chmod +x <remote_bin_path>
  && sudo sha256sum <remote_bin_path>

We drive that like so. First, we read one line from stdout to get the uname output. This tells us the machine’s OS and CPU architecture, so we can send the appropriate agent binary for that platform. We write that binary to stdin; dd writes it to disk on the remote end. Finally, we read one more line from stdout: the remotely computed shasum. This confirms that the transfer was successful. The entire process depends only on standardized coreutils programs. When we retry starting the agent now, it should succeed, and the agent cleans up old versions to avoid filling up the disk.
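For concreteness, here is roughly what driving that exchange could look like from the operator’s side, written as a throwaway bash script; the remote command is the one above, while the coproc plumbing, paths, and file names around it are mine, not Deptool’s:

#!/usr/bin/env bash
# Sketch only: read the platform, stream the matching binary, verify the checksum.
set -euo pipefail

host=s4.ruuda.nl
remote_bin=/var/lib/deptool/bin/deptool-0.1.0-3f9c2ab   # hypothetical <remote_bin_path>

coproc SSH {
  ssh "$host" "uname -sm \
    && sudo mkdir -p /var/lib/deptool/{bin,apps,store} \
    && sudo dd status=none of=$remote_bin \
    && sudo chmod +x $remote_bin \
    && sudo sha256sum $remote_bin"
}

# 1. The first line of output tells us the OS and CPU architecture.
read -r os arch <&"${SSH[0]}"
local_bin="agents/deptool-$os-$arch"    # hypothetical local path to the right build

# 2. Stream the binary; dd on the remote end writes it to disk.
ssh_stdin=${SSH[1]}
cat "$local_bin" >&"$ssh_stdin"
exec {ssh_stdin}>&-                     # close the pipe so dd sees EOF

# 3. The last line of output is the remote checksum; compare it to our own.
read -r remote_sha _ <&"${SSH[0]}"
[[ "$remote_sha" == "$(sha256sum "$local_bin" | cut -d ' ' -f 1)" ]] \
  || { echo "checksum mismatch" >&2; exit 1; }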
What did we gain?
- A way to execute an agent on a remote host and communicate with it.
- Automatic installation that requires nothing beyond coreutils on the remote host.
- Both sides run the same version, the protocol is compatible by construction.
- All user-controlled input goes over our SSH-backed socket; it never enters SSH or shell commands, so we sidestep any escaping issues and length limits.
- It costs only a single SSH handshake in the common case, so latency is minimal.
- In the uncommon case (deploying against a new machine, or after updating the tool) it’s two additional connects plus a one-time 1.6 MB transfer. With ControlMaster we can skip most of the overhead on the later connections, so overall it costs a few seconds. No sub-second deploy in this case, but still better than Ansible.
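ControlMaster is plain OpenSSH client configuration; something like this in ~/.ssh/config makes later connections reuse the first session (the host pattern and timeout here are only examples):

Host *.ruuda.nl
  ControlMaster auto
  ControlPath ~/.ssh/control-%r@%h:%p
  ControlPersist 10m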
This approach has been working out great for me, even while I was still building Deptool and the case of having to copy the binary was relatively common. Especially for the flow where I deploy a configuration, tweak it a bit, and deploy again, SSH can keep the underlying connection alive, so deployment really feels instant.
Conclusion
I’ve been using Deptool to manage my personal infra for the past month, and I’m very pleased with it. It’s super nice to instantly see an accurate plan before connecting, and to have automatic rollback, but sub-second deploys are the real game-changer. When the proper way to deploy takes minutes, I’m always tempted to edit files directly on the server to shorten the feedback loop. With Deptool, it’s faster to make the edit locally and deploy, than to SSH into the server and open an editor there. The proper way is the one with least friction. Every applied edit gets recorded in the Git history, and if I break something, Deptool rolls back before I even realize it was broken. It’s been refreshing to use a tool that gets out of the way, rather than one I dread running.
I built Deptool for myself, to solve exactly my problem. The fact that it doesn’t try to solve every deployment problem for everybody is also what enables it to shine for my use case. Still, I think it’s too good not to share, and I hope others may find it useful, especially for working with image-based operating systems. Deptool is now available from Codeberg and GitHub, and it has an extensive manual.