Self-hosting my websites using bootable containers

I've been running this website since 2008. Over the years I've changed hosting providers and the software I use a bunch of times. Some time in 2015 I switched to hosting it as a static website on Amazon CloudFront. A few years ago I moved from Amazon to Cloudflare, primarily because Cloudflare Pages and Cloudflare R2 have a generous free tier, and an interface that isn't a confusing mess.

Cloudflare isn't without its issues though. For example, the manual and standard library documentation for Inko is versioned using sub-directories, giving each release its own set of URLs.

While you can technically do this with Cloudflare Pages, the deployment model is such that you'd have to keep these generated files around and include them as part of each deployment. As far as I know you also can't download the source files of a deployment, thus requiring you to track all generated files in Git just so future deployments keep the data around.

Then there are the release artifacts of Inko's release process, such as source archives and pre-compiled runtime libraries for cross-compilation. Here we run into a similar issue: we generate the files once then never touch them again. Unlike the documentation they're also binary blobs which Git doesn't handle well, unless you use Git LFS.

To work around these issues I was using two different setups:

  • https://yorickpeterse.com/ and https://inko-lang.org/ used Cloudflare Pages as these sites are always built from source using inko-wobsite
  • The documentation and release artifacts used two separate Cloudflare R2 buckets with public website hosting enabled, using a custom domain name to make this transparent to clients

While this worked, I was never a fan of it, especially as I've found R2's pricing structure somewhat confusing. I also never liked how they took a similar approach to AWS by dangling a carrot in your face, only to say "if you want this carrot and 200 other things you didn't ask for, you need to pay $200/month". In the case of Cloudflare that carrot could be something as simple as "website metrics that aren't useless".

In 2025 two things happened that made me decide it was time to move away from Cloudflare and US provided services (where possible):

  • Several high-profile outages that led to many Cloudflare services being unavailable for hours, including my websites
  • The United States decided the best course of action was to screw over all its allies, kidnap a president, consider invading Greenland, and do a whole bunch of other dumb things

With that in mind I spent the last few months looking into alternative hosting providers and technology stacks. "Just get to the point Yorick!" OK OK, I hear you, let's get started.

Immutable infrastructure

I've been a fan of immutable infrastructure ever since I was first introduced to it back in 2012. Back then I was working for a small company that did a lot of scraping and analysis of travel reviews. The company used AWS and made heavy use of EC2 spot instances. To allow for fast deployments, the VM images contained almost everything needed to run our applications. Upon first boot each VM would download the service(s) it was meant to run from S3 and start them.

To apply server updates we'd build a new VM. I think this part was still somewhat manual. I don't think Packer was around yet, and we certainly didn't use Chef or Ansible. I guess it was a combination of shell scripts and manual work. To deploy the update we'd gradually roll it out by deploying new servers, replacing the old ones in the process.

The result was a semi-immutable setup: downloading the applications would mutate the server (but only upon boot), while everything else was provided by the VM image.

The benefits of such a setup are the ability to quickly deploy new servers without the need for a centralized configuration management system, and a more deterministic environment: deploying the same image 10 times should produce the same result 10 times.

While these ideas have been around forever, the process of building immutable server images has historically been rather clunky and often tied to specific cloud hosting providers. In recent years there's been a push in the Linux ecosystem towards more immutable distributions and infrastructure, with some examples being Fedora Silverblue, Fedora CoreOS, Bazzite, and whatever systemd is doing.

Wanting to use something similar to what I worked with back in 2012, I spent a few weeks evaluating different tools for building immutable infrastructure.

For each tool I tried to build an image to provision a new OS, and an image of sorts to update it. The resulting source code is found in this Git repository.

FreeBSD and Poudriere

The first tool I tried was Poudriere. While Poudriere is primarily used for building FreeBSD package repositories it's also able to produce OS images and ZFS based update images. There's also NanoBSD that's been around since 2006. I chose not to look into NanoBSD because it seems focused on building images for e.g. USB sticks using UFS, and because from what I could find it builds everything from source rather than reusing FreeBSD's package manager to install existing packages where possible.

The experience of using Poudriere was mixed. For example, the command you need to run to build an image is pretty simple:

poudriere image \
    -j custom-image \         # The name of the jail to use
    -p latest \               # The ports tree to use
    -n custom \               # The name of the image to build
    -h freebsd-custom \       # The hostname for the image (not required)
    -s 10g \                  # The size of the disk image
    -w 1g \                   # The size of the swap partition
    -f ./packages.txt \       # A list of packages to install
    -t zfs+gpt \              # Produce a ZFS disk image with a GPT layout
    -A hooks/post-build.sh \  # A post-build script to run
    -c overlay \              # A directory of files to copy into the image
    -o build                  # The directory to store output files in

In contrast, I spent quite some time fighting the configuration file Poudriere requires to work in the first place. I ended up with the following:

# Settings that I changed:

# The default name of the root pool. Change this if you use something else.
ZPOOL=zroot

# Where to download data from.
FREEBSD_HOST=https://download.FreeBSD.org

# I changed this from the default (/usr/ports) because otherwise Poudriere
# starts screaming about a "distfiles" directory not existing. So much for
# sensible defaults.
DISTFILES_CACHE=/usr/local/poudriere/ports

# Without this Poudriere tries to build using `nobody:nobody` which will then
# fail. For some reason this setting is ignored by regular poudriere but honored
# by poudriere-devel. OK then?
BUILD_AS_NON_ROOT=no

# The "pkg" branch to fetch from, in this case the latest branch instead of
# quarterly so we actually get updates in a reasonable timeframe.
PACKAGE_FETCH_BRANCH=latest

# The base URL to fetch from. DO NOT use "pkg+https..." or something like that
# because Poudriere will silently accept it and then just fail to fetch
# packages, but not provide you with a reasonable error message of some sort.
# See https://forums.freebsd.org/threads/problem-with-poudriere-and-packages-fetch.99072/
# for more details.
PACKAGE_FETCH_URL=https://pkg.freebsd.org/\${ABI}

# Settings that I didn't change and were uncommented by default:

RESOLV_CONF=/etc/resolv.conf
BASEFS=/usr/local/poudriere
USE_PORTLINT=no
USE_TMPFS=yes

I'm not the only one that ran into issues while configuring Poudriere. A big source of frustration here is that if Poudriere is given the wrong configuration it will accept the value and produce an error message that amounts to "Something went wrong". This is further complicated by Poudriere's lacking documentation, especially when it comes to its various imaging related features.

There are also issues with Poudriere insisting on building packages from source even when you tell it to use existing binary packages. This pull request from 2024 is supposed to fix that (at least based on what I could find on the FreeBSD forums and such), but it never got reviewed or merged. I ran into this issue myself when Poudriere decided to build FreeBSD's own package manager from source, in spite of the above configuration that should make it use pre-built packages instead. To resolve this I had to explicitly add pkg to the list of packages to install (using the packages.txt file passed to the poudriere image command).

In the end my conclusion is that while the combination of FreeBSD and Poudriere seems interesting, it's sorely lacking in the polish and documentation department, and that's ignoring the many challenges you'll face by using FreeBSD instead of Linux, something I wrote about here and here. You can find some additional notes about my experience with Poudriere here.

FreeBSD and bsdinstall

Besides experimenting with Poudriere I also experimented with using bsdinstall (provided by FreeBSD itself) and raw jails, hoping this would allow me to work around the issues of Poudriere.

This process was frustrating because while all the pieces necessary seem to be there, bsdinstall doesn't appear to be built with unattended installations in mind, or at least doesn't consider it a primary use case. For example, bsdinstall itself always shows a TUI interface during the installation process which is something you don't want when running your build in an unattended manner (e.g. as part of a CI job).

To avoid this issue you have to use the underlying commands directly (bsdinstall scriptedpart, bsdinstall mount, bsdinstall bootconfig, etc). Even then, the bsdinstall scriptedpart command still pops up a TUI, despite the process being fully automated.

While I was able to produce a disk image I wasn't able to get it to boot. Since this was just the initial image and I'd still have to figure out how to produce ZFS boot environment images, I decided I'd had enough of FreeBSD and moved on. If you're curious, you can find the failed experiment here.

mkosi

mkosi is a tool by the same people that brought us systemd. Two types of outputs can be produced using mkosi: disk images (raw images, qcow images, etc), and images for specific partitions that may be consumed by systemd-sysupdate to update an existing system.

Getting started with mkosi wasn't too bad: create a configuration file, run mkosi build, ignore all the noise it writes to STDOUT and there you have it: a bootable disk image that you can then try in a VM using the mkosi vm command. Neat!

Going beyond the basics turned out to be a challenge though. For example, the manual pages cover the various options in great detail but are sorely lacking when it comes to simple end-to-end examples. There also aren't many articles about using mkosi, and the few that I did find either used Nix in ways that made no sense to me, or were years old and no longer compatible with the latest version of mkosi.

One practical problem I ran into is that the order of sections in the configuration file matters: if you customize the initrd/initramfs image the Output section must come before the Content section so settings in the Content section can refer to values from the Output section. Trying to figure out why things weren't working before I realized this was certainly "fun".
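As an illustration, a minimal configuration with the sections in the required order might look like this (the option names come from the mkosi manual pages; the values are just an example):

```ini
# mkosi.conf: when customizing the initrd, the [Output] section must
# come before [Content] so that [Content] settings can refer to values
# defined in [Output].
[Distribution]
Distribution=fedora
Release=43

[Output]
Format=disk
ImageId=custom

[Content]
Bootable=yes
Packages=openssh-server
```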

I'm also not a fan of how systemd-sysupdate applies updates: you have to ship what is essentially an image of an entire partition to every host that needs updating. If that partition contains 10 GiB of data then every update will be 10 GiB, even if you only changed a tiny configuration file. Compression may help but is likely to be of limited benefit. Perhaps there's some magical option to work around this, but I wasn't able to find any. To make things worse, I never actually got the updating part to work and just gave up. You can find some additional notes on this here.

There's also the larger issue of systemd lock-in. Don't get me wrong, I like systemd as an init system and process supervisor, something it does a much better job at than anything that came before. What I don't like is how there's an ever increasing suite of systemd-something tools with questionable reasons for their existence (for example why is systemd-homed a thing?). Perhaps I'm wrong here, but either way I'd rather not depend too much on the systemd suite outside of using it as an init and process supervisor.

This means that mkosi wouldn't cut it either, so on to the next tool!

bootc

Bootable containers, and specifically bootc, are an interesting new way of building disk images and updates for image-based systems. Using the same approach as for building OCI containers (which you either love or hate), you can now build an initial disk image and an update image. The update images use the same technology as OCI containers (because they are OCI containers), and thus you benefit from incremental updates by only downloading new or changed layers.

Using the 10 GiB partition example from earlier, an update to such a system wouldn't produce a 10 GiB update but rather one whose size is the sum of the affected layers. If you order your layers such that the more frequently changed ones come last, you can drastically reduce the size of updates. Clever use of layers also speeds up the build process as you only need to rebuild what changed.
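As a sketch (the package and directory names are hypothetical), ordering layers from least to most frequently changed might look like this:

```dockerfile
FROM quay.io/fedora/fedora-bootc:43

# Layer 1: base packages. These rarely change, so this layer is almost
# always served from the build cache and never re-downloaded.
RUN dnf install --assumeyes --quiet openssh-server

# Layer 2: configuration files, which change occasionally.
COPY config/ /etc/

# Layer 3: application files, which change often. On most updates only
# this layer (and anything after it) is rebuilt and transferred.
COPY app/ /opt/app/
```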

Getting started took a bit of effort though: as with many Fedora related projects, the documentation around bootc is messy. And just as with Fedora's packaging tooling, there isn't just one tool to familiarize yourself with; instead there's a range of projects that are either deprecated, experimental and/or not well documented.

I actually gave up on bootc initially but later came back to it and managed to power my way through the remaining challenges. Who said stubbornness is a bad thing?

In fact, this website is running in a container hosted on a bootc powered server. Thanks to bootc I can rebuild the server from scratch (or just update it) in a matter of minutes. I can also test my changes locally in a VM without the need for a different build process or tool.

Let's take a closer look at how this works, and how you can get started with bootc yourself without going through the trouble I had to go through to make it work.

Getting started with bootc

As mentioned before, bootc turns containers into bootable disk images. Updates are applied by downloading a new image and staging it such that the next reboot applies it. Rollbacks are performed by staging an older image followed by a reboot. At the moment the ecosystem is limited to Fedora and CentOS, though a bunch of people have managed to get it to work for other distributions as well; it's just not officially supported.
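On a running bootc host, this cycle maps onto a handful of subcommands (shown here as a sketch; check bootc --help for the details):

```shell
# Show the currently booted image, plus any staged or rollback images
bootc status

# Download the latest image, stage it, and reboot into it
bootc upgrade --apply

# Stage the previous image so the next reboot rolls back to it
bootc rollback
```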

To get started we'll need the following:

  • Podman: the container engine used to build and run the containers locally
  • bootc-image-builder: used to build initial disk images and installers
  • QEMU: to test the disk images in a local VM

To start, let's create a simple Fedora image that contains fastfetch:

FROM quay.io/fedora/fedora-bootc:43

RUN --mount=type=cache,target=/var/cache,sharing=locked \
    --mount=type=cache,target=/var/lib/dnf,sharing=locked \
    --mount=type=tmpfs,target=/var/log \
    dnf install --assumeyes --quiet fastfetch

RUN bootc container lint --fatal-warnings

The FROM command specifies the base image to use for building our container. The first RUN command installs the fastfetch package and ensures that temporary data (e.g. the package manager's cache) isn't persisted in the container itself. This ensures we don't end up with a container that contains a bunch of build output we don't need. The last RUN command runs a linter during the build process, and I highly recommend you always include that line as the last command in your Containerfile.

To build the container, save this somewhere in a Containerfile and run the following:

podman build -t bootc-test .

The output will be something along the lines of the following:

STEP 1/3: FROM quay.io/fedora/fedora-bootc:43
STEP 2/3: RUN --mount=type=cache,target=/var/cache,sharing=locked     --mount=type=cache,target=/var/lib/dnf,sharing=locked     --mount=type=tmpfs,target=/var/
log     dnf install --assumeyes --quiet fastfetch
Package    Arch   Version       Repository      Size
Installing:
 fastfetch x86_64 2.57.1-1.fc43 updates      1.6 MiB
Installing dependencies:
 yyjson    x86_64 0.12.0-1.fc43 fedora     264.2 KiB

Transaction Summary:
 Installing:         2 packages

Importing OpenPGP key 0x31645531:
 UserID     : "Fedora (43) <fedora-43-primary@fedoraproject.org>"
 Fingerprint: C6E7F081CF80E13146676E88829B606631645531
 From       : file:///etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-43-x86_64
The key was successfully imported.
[1/4] Verify package files              100% |   2.0 KiB/s |   2.0   B |  00m00s
[2/4] Prepare transaction               100% |  44.0   B/s |   2.0   B |  00m00s
[3/4] Installing yyjson-0:0.12.0-1.fc43 100% |  86.5 MiB/s | 265.6 KiB |  00m00s
[4/4] Installing fastfetch-0:2.57.1-1.f 100% |  22.3 MiB/s |   1.6 MiB |  00m00s
--> 674e550d0e18
STEP 3/3: RUN bootc container lint --fatal-warnings
Checks passed: 12
Checks skipped: 1
COMMIT bootc-test
--> 72b0d8f11141
Successfully tagged localhost/bootc-test:latest
72b0d8f11141d11ac4086dd151c4c78e970587d61043e6a08fe538974a098f5b

Let's quickly test our container by running the following:

podman run --rm bootc-test:latest fastfetch

On my system this produces the following:

             .',;::::;,'.                 root@c486b84564bd
         .';:cccccccccccc:;,.             -----------------
      .;cccccccccccccccccccccc;.          OS: Fedora Linux 43 (Forty Three) x86_64
    .:cccccccccccccccccccccccccc:.        Kernel: Linux 6.18.8-200.fc43.x86_64
  .;ccccccccccccc;.:dddl:.;ccccccc;.      Uptime: 6 hours, 28 mins
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.     Packages: 530 (rpm)
.:ccccccccccccc;KMMc;cc;xMMc;ccccccc:.    Shell: bash 5.3.0
,cccccccccccccc;MMM.;cc;;WW:;cccccccc,    Display (CS2740): 3840x2160 in 27", 60 Hz [External]
:cccccccccccccc;MMM.;cccccccccccccccc:    CPU: AMD Ryzen 5 5600X (12) @ 4.65 GHz
:ccccccc;oxOOOo;MMM000k.;cccccccccccc:    GPU: Intel Arc A380 @ 2.45 GHz [Discrete]
cccccc;0MMKxdd:;MMMkddc.;cccccccccccc;    Memory: 3.24 GiB / 15.51 GiB (21%)
ccccc;XMO';cccc;MMM.;cccccccccccccccc'    Swap: 132.00 KiB / 8.00 GiB (0%)
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;     Disk (/): 62.49 GiB / 463.16 GiB (13%) - overlay
ccccc;0MNc.ccc.xMMd;ccccccccccccccc;      Local IP (enp7s0): 192.168.1.123/24
cccccc;dNMWXXXWM0:;cccccccccccccc:,       Locale: C
cccccccc;.:odl:.;cccccccccccccc:,.
ccccccccccccccccccccccccccccc:'.
:ccccccccccccccccccccccc:;,..
 ':cccccccccccccccc::;,.

Now that we have a working container, let's build a disk image using bootc-image-builder:

mkdir -p build
sudo podman build -t bootc-test .
sudo podman run \
    --rm \
    --interactive \
    --tty \
    --privileged \
    --security-opt label=type:unconfined_t \
    --volume ./build:/output \
    --volume /var/lib/containers/storage:/var/lib/containers/storage \
    quay.io/centos-bootc/bootc-image-builder:latest \
    --type raw \
    --use-librepo=True \
    --rootfs ext4 \
    --chown $(id -u):$(id -g) \
    localhost/bootc-test:latest

"Woah! What's all this business?" you may wonder. Well, first we need to run the commands as root because bootc-image-builder does certain things that require a privileged container and privileged containers require the user to have root access. bcvk is supposed to remove the need for root and make this process easier, but I found it to be a buggy mess. Perhaps this has something to do with large portions of it being written by Claude Code. Either way, we're going to stick to bootc-image-builder since it's the least immature option available.

The options such as --security-opt and --volume are all necessary to give bootc-image-builder the permissions it needs to operate. The more interesting options are the following:

  • --type raw: we're building a raw disk image
  • --use-librepo=True: speed up the build process
  • --rootfs ext4: use ext4 as the root file system, necessary when building a Fedora based container
  • --chown ...: set the permissions to the current user (before the sudo invocation) so the image isn't owned by root

The use of --use-librepo=True instead of --use-librepo True is deliberate as without the = you'll get an incorrect argument error. Why this is the case I don't know.

The last argument is the qualified name of our container to turn into an image. If you're building a local container you must include the repository name (= localhost), otherwise bootc-image-builder won't be able to find the image for some reason.

Building a disk image using the above commands should take a few minutes or so on commodity hardware. Once done your newly built image is found in the build directory:

$ tree build
build
├── image
│   └── disk.raw
└── manifest-raw.json

2 directories, 2 files

To test our image in a VM, run the following:

qemu-system-x86_64 \
    -enable-kvm \
    -cpu host \
    -smp 4 \
    -m 4096 \
    -bios /usr/share/OVMF/OVMF_CODE.fd \
    -net user,hostfwd=tcp::2200-:22 \
    -net nic \
    -snapshot build/image/disk.raw

This command assumes the BIOS/UEFI firmware file /usr/share/OVMF/OVMF_CODE.fd exists. Depending on your system that file may be located elsewhere. You can also just remove the -bios option as it's not strictly required.

This starts a new QEMU VM with 4 CPU cores and 4 GiB of memory. SSH connections to port 2200 on the host are forwarded to port 22 on the VM. The use of -snapshot means the VM won't update the disk image.

At this point you may ask yourself "Wait, who do I log in as?". That's the wonderful thing, you don't! Jokes aside, by default there's a root user but it doesn't allow you to log in through the console. Let's see how we can fix that.

User management

User management in bootc is a little different, and arguably not as fleshed out as it should be. The issue is that adding users modifies state such as /etc/shadow, but those files are also considered locally mutable on the host you'll deploy your image to. What this means is that if the host modifies such a file and a future update also modifies it, the host retains its version and ignores the changes introduced by the update. In other words, you basically never want to change files in /etc that you may also update through your image.

The bootc documentation briefly mentions a few approaches, but I didn't find it to be that helpful. What I recommend is to not add any users in the container (be it using adduser, systemd-sysusers or some other mechanism). For servers you use the root account over SSH and disable password based authentication. For desktops you'd create users as part of the installation process using the Anaconda installer (more on that later).

Let's update our Containerfile so we can log in using SSH. First we'll create a directory for the files to copy into the container:

mkdir -p overlay/etc/ssh/keys/
mkdir -p overlay/etc/ssh/sshd_config.d/

Next we'll create overlay/etc/ssh/sshd_config.d/10-custom.conf with the following contents:

AuthorizedKeysFile .ssh/authorized_keys /etc/ssh/keys/%u
PasswordAuthentication no
PermitRootLogin yes

This disables password authentication over SSH, allows logging in as the root user, and configures /etc/ssh/keys/%u (with %u expanding to the username) as an extra source of authorized_keys files. This allows us to change the list of keys over time as part of the image and potentially add keys for different users, all in a single, easy to find place.

Next, create overlay/etc/ssh/keys/root and add the appropriate public keys to this file. For example, this is what I use:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAtIdG1mSd5MRlfWiy0n7XF3K3s+yaq26qeur7LVgJFT desktop
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIZQJ5WP5Z3epZU4gN+sXczNSm3DB3NsYRGU0WMgSNTj laptop

With the files in place, let's edit the Containerfile to copy these files into the container:

FROM quay.io/fedora/fedora-bootc:43

RUN --mount=type=cache,target=/var/cache,sharing=locked \
    --mount=type=cache,target=/var/lib/dnf,sharing=locked \
    --mount=type=tmpfs,target=/var/log \
    dnf install --assumeyes --quiet fastfetch

COPY overlay/ /

RUN bootc container lint --fatal-warnings

The newly added COPY command copies everything (recursively) from the overlay directory into the root of the container, such that overlay/etc/foo becomes /etc/foo in the container.

Now let's build the container to make sure the new COPY command works:

podman build -t bootc-test .

This produces the following:

STEP 1/4: FROM quay.io/fedora/fedora-bootc:43
STEP 2/4: RUN --mount=type=cache,target=/var/cache,sharing=locked     --mount=type=cache,target=/var/lib/dnf,sharing=locked     --mount=type=tmpfs,target=/var/
log     dnf install --assumeyes --quiet fastfetch
--> Using cache 674e550d0e189697b51ad6fc431d57435b887fd623de93f7f2cf55f3f836f4d6
--> 674e550d0e18
STEP 3/4: COPY overlay/ /
--> 0bf3001b91e7
STEP 4/4: RUN bootc container lint --fatal-warnings
Checks passed: 12
Checks skipped: 1
COMMIT bootc-test
--> 6d6f9607e100
Successfully tagged localhost/bootc-test:latest
6d6f9607e10029b48a952d5def7d2adc41db2c64b6d428bd8d5ec07acda1ccb7

We can now rebuild our disk image and test it in a VM using the bootc-image-builder and qemu-system commands from earlier. Once the VM is running you can SSH into it using the following command:

ssh -o "StrictHostKeyChecking no" -o "UserKnownHostsFile=/dev/null" -p 2200 root@localhost

The -o options are used so we don't save the VM host in ~/.ssh/known_hosts and to disable host verification. This way we don't end up with scary warnings when we SSH into the VM each time we rebuild its image, and we don't need to explicitly approve the connection.

If all went well you should now be able to log in to your VM without the need for a password.

Besides including SSH keys you'll likely want to include other files in the container, such as firewall configuration, secrets, Podman Quadlets and more. The approach of using a dedicated directory to copy into the container (the overlay directory above) is by far the easiest way of doing so, and probably good enough for most cases.
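For example, a Podman Quadlet placed in the overlay could look like this (a hypothetical unit; the image and port are placeholders):

```ini
# overlay/etc/containers/systemd/web.container
# Podman's Quadlet generator turns this into a systemd service at boot.
[Unit]
Description=Example web server

[Container]
Image=docker.io/library/nginx:latest
PublishPort=8080:80

[Service]
Restart=always

[Install]
WantedBy=multi-user.target
```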

Deploying the initial image

OK, so we have our image and confirmed it works by running it in a VM. How do we get this on a server? Well, this depends on the hosting provider. For example, Hetzner servers come with a Debian based rescue system that you can SSH into. Using this rescue system you can then stream the image directly to the target disk as follows:

cat build/image/disk.raw | \
    zstd -3 | \
    ssh \
        -o "UserKnownHostsFile=/dev/null" \
        -o "StrictHostKeyChecking no" \
        root@SERVER 'zstd --decompress | dd of=/dev/sda bs=1M status=progress'

Using this command we send a compressed image over SSH to the server running our rescue system, then decompress it on the fly and write the output directly to the target disk. We also disable strict host checking and updating of ~/.ssh/known_hosts since rescue systems typically have their own SSH host keys while running on the same IP/host, and SSH won't like that.

The use of compression is deliberate: bootc-image-builder is configured to always produce disk images that are at least 10 GiB in size. While tools such as du -hs will report a smaller size (e.g. 2 GiB), the moment you send that image over the network you end up transferring 10 GiB of data. By compressing the data we're able to greatly reduce the amount of data transferred. For example, the image built thus far is about 2.1 GiB according to du -hs. By compressing it I only need to transfer 1.1 GiB instead of 10 GiB.
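The gap between the reported and the transferred size comes from the image being sparse. You can see the same effect with a throwaway sparse file:

```shell
# Create a sparse 10 MiB file: its apparent size is 10 MiB, but almost
# no blocks are actually allocated on disk.
truncate -s 10M sparse.img

ls -lh sparse.img   # reports the apparent size: 10M
du -h sparse.img    # reports the allocated size: (close to) 0

# A plain copy over the network transfers the apparent size, while
# compression collapses the unwritten zeroes almost entirely.
zstd -3 --force sparse.img -o sparse.img.zst
du -h sparse.img.zst
```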

For hosting providers without a rescue system the process may be a bit more tricky. One option is to instead first install Fedora CoreOS (assuming the hosting provider supports this out of the box) then rebase it to your container image (see this article for an example). This does require that the image is hosted in a container registry somewhere, which we'll get to in a moment.

Updating existing servers

So we've deployed the image to a server, we've made some changes to the image and we want to deploy those changes. To do so there are two options:

  1. Build the container image locally and export it using podman image save, upload it to the server and rebase to the new image
  2. Upload the container image to a container registry such as quay.io (if you like Red Hat charging you a premium), GitHub (if you like Microsoft charging you a slightly smaller premium), or your own (if you like being woken up at 04:00 on a Saturday because the registry is down)

Updates using OCI archives

Let's start with the first option. First we'll start our VM but use the -drive option instead of -snapshot to simulate it using an actual disk:

qemu-system-x86_64 \
    -enable-kvm \
    -cpu host \
    -smp 4 \
    -m 4096 \
    -bios /usr/share/OVMF/OVMF_CODE.fd \
    -net user,hostfwd=tcp::2200-:22 \
    -net nic \
    -drive format=raw,index=0,media=disk,file=build/image/disk.raw

Now let's change the Containerfile to also install htop:

FROM quay.io/fedora/fedora-bootc:43

RUN --mount=type=cache,target=/var/cache,sharing=locked \
    --mount=type=cache,target=/var/lib/dnf,sharing=locked \
    --mount=type=tmpfs,target=/var/log \
    dnf install --assumeyes --quiet fastfetch htop

COPY overlay/ /

RUN bootc container lint --fatal-warnings

For our updates we don't need to use bootc-image-builder, instead we build them using Podman:

podman build -t bootc-test .

Building as root isn't necessary for updates, only when using bootc-image-builder.

Once the image is built we export it as an OCI archive and upload it to the VM using scp:

podman image save bootc-test:latest \
    --format oci-archive \
    --output "bootc-test-$(date +%s).oci"

scp \
    -o "StrictHostKeyChecking no" \
    -o "UserKnownHostsFile=/dev/null" \
    -P 2200 \
    bootc-test-*.oci root@localhost:~/

Make sure to use root@localhost:~/ instead of just root@localhost otherwise scp will copy the archive to a local file called root@localhost, instead of copying it to the VM.

Then log in to the VM and switch to the new image and reboot the VM:

ssh \
    -o "StrictHostKeyChecking no" ]
    -o "UserKnownHostsFile=/dev/null" \
    -p 2200 \
    root@localhost 'bootc switch --transport oci-archive --apply bootc-test-*.oci'

Once the VM is rebooted you should be able to run htop. Don't forget to remove the .oci file!

While this approach technically works, it suffers from a few flaws:

  1. Exporting images exports all their layers, so the image we end up deploying is larger than necessary (i.e. no incremental updates)
  2. bootc switch with a local file uses the file name to determine if the image is already applied, meaning you have to give each file a unique name (as done using the date command in the above examples)
  3. It's annoying if you need to update multiple servers

Updates using container registries

The second and better approach is to upload an image to a container registry and use bootc update to download and apply the changed layers. This approach requires less data to be transferred and makes it easier to update multiple servers.

For this to work there are two things we'll need. First, when building the image using bootc-image-builder, the image name must be a qualified name that matches the one used for our container registry. The reason for this is that bootc update uses that name to pull new updates. If that name happens to be localhost/whatever it won't be able to pull any updates because they don't exist on the server itself. The second requirement is that if the container image is hosted in a private registry we'll need to add the necessary credentials to the image.

Let's for a moment assume you have a private GitHub repository at github.com/kitten/mittens that contains the code used to build your image, and you're using GitHub's container registry such that the image is available at ghcr.io/kitten/mittens:latest. This means you'd use bootc-image-builder to build the initial image as follows:

sudo podman build -t ghcr.io/kitten/mittens:latest .
sudo podman run \
    --rm \
    --interactive \
    --tty \
    --privileged \
    --security-opt label=type:unconfined_t \
    --volume ./build:/output \
    --volume /var/lib/containers/storage:/var/lib/containers/storage \
    quay.io/centos-bootc/bootc-image-builder:latest \
    --type raw \
    --use-librepo=True \
    --rootfs ext4 \
    --chown $(id -u):$(id -g) \
    ghcr.io/kitten/mittens:latest

To allow the server to pull the image you'll need to generate a personal access token with the read:packages scope. Let's assume the token value is hunter2. Create the file overlay/etc/ostree/auth.json with the following contents:

{
  "auths": {
    "ghcr.io": {
      "auth": "BASE64"
    }
  }
}

Replace BASE64 with the output of the following command:

echo -n 'your-github-username:hunter2' | base64 --wrap=0

This would result in something like this:

{
  "auths": {
    "ghcr.io": {
      "auth": "eW91ci1naXRodWItdXNlcm5hbWU6aHVudGVyMg=="
    }
  }
}

If you're wondering why base64 is necessary: I believe this is because Podman requires it (auth.json is a Podman configuration file that bootc just happens to reuse), though why it uses base64 instead of anything else I don't know.
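To sanity-check an auth value, you can decode it again. Using the example value from above:

```shell
# Decode the "auth" value from auth.json to confirm it contains the
# expected "username:token" pair (example value from above).
echo 'eW91ci1naXRodWItdXNlcm5hbWU6aHVudGVyMg==' | base64 -d
# prints: your-github-username:hunter2
```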

With the configuration file in place you can rebuild your image and deploy it using the methods discussed thus far. From this point on you can update using bootc update instead (using --apply to automatically reboot if desired).

You don't have to include the secret in the image. Because /etc is locally mutable you can also upload the file after setting up the server, or use something like cloud-init to create the file on first boot.
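For example, a sketch of uploading the credentials after the first boot; the host name is a placeholder, and the function wrapper is just for illustration:

```shell
# upload_auth HOST: copy the registry credentials to a running bootc
# host instead of baking them into the image. Because /etc is locally
# mutable, the file survives subsequent image updates.
upload_auth() {
    scp overlay/etc/ostree/auth.json "root@$1:/etc/ostree/auth.json"
}
```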

Disabling automatic updates

By default updates are periodically applied in the background using two systemd units:

  • bootc-fetch-apply-updates.service
  • bootc-fetch-apply-updates.timer

The problem with this approach is that no synchronization of any kind is performed across a cluster, meaning that if you're unlucky, your 20 bootc servers may all decide to reboot at once. I don't like systems that automatically reboot at certain intervals, so I disabled these units:

systemctl disable \
    bootc-fetch-apply-updates.service \
    bootc-fetch-apply-updates.timer

Using the enable.txt and disable.txt setup introduced earlier, you'd add the following to systemd/disable.txt, then rebuild and deploy the container:

bootc-fetch-apply-updates.service
bootc-fetch-apply-updates.timer

Unless you're OK with your server rebooting some random amount of time after pushing your container updates, I suggest you do the same. My current approach (since I only have a single server) is to just run ssh root@host 'bootc update --apply' whenever I want to apply updates and reboot the server.
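With more than one server, the same idea extends to a small loop. A sketch, with placeholder hostnames:

```shell
# update_hosts HOST...: run `bootc update --apply` on each host in turn,
# so at most one server is rebooting at any given time. The hostnames
# passed in are placeholders for your own servers.
update_hosts() {
    for host in "$@"
    do
        echo "updating ${host}"
        ssh "root@${host}" 'bootc update --apply'
    done
}
```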

Applying temporary changes

Image based deployments are great but sometimes we need to quickly roll out a temporary change, such as a security update or a firewall configuration change. Or maybe you need to change something but can't afford to reboot right now.

For this we can use the bootc usr-overlay command. Running this command results in a mutable /usr overlay that we can then mutate as if we were using a regular Linux distribution. This overlay is lost upon a reboot.

Imagine for a moment that our server is performing important work we can't interrupt right now, but we also need to apply a critical security update. Using the overlay command we can do something along the lines of the following:

bootc usr-overlay          # Enable the overlay
dnf update package-name    # Update the relevant package(s)

Not only is this useful for quickly applying updates, it's also useful for desktop environments as it allows you to play around with certain packages without being forced to work in a Toolbox or Distrobox container.

Of course it's important to keep in mind that the overlay is temporary and lost upon a reboot, so if you make any changes you need to persist you'll need to also include them in your image and deploy that image at some point.

Mutating local state

While the /usr tree is immutable, /etc and /var/lib (and a few other directories such as /var/log) are considered "local mutable state". This means that changes made on the host remain present between updates. This does come with a caveat: if both the host and a future image update change the same file (e.g. /etc/foo), the changes on the host are used instead of those introduced by the image.

This approach can be both useful and annoying. It's useful because you can make certain changes and have them persisted, such as host specific firewall rules that live in /etc/firewalld. It's annoying because if a file changes when this isn't expected (e.g. some program decides to reformat its configuration file when it starts), future updates to that file introduced by the image are ignored.

The ostree admin config-diff command is used to generate a diff of the files added and modified in /etc. Its output will be something along the lines of the following:

M    ld.so.cache
M    machine-id
M    selinux/targeted/active/commit_num
A    alternatives/ld
A    alternatives-admindir/ld
A    issue.d/22_clhm_ens3.issue
A    selinux/targeted/semanage.read.LOCK
A    selinux/targeted/semanage.trans.LOCK
A    ssh/ssh_host_ecdsa_key
A    ssh/ssh_host_ecdsa_key.pub
A    ssh/ssh_host_ed25519_key
A    ssh/ssh_host_rsa_key
A    ssh/ssh_host_rsa_key.pub
A    ssh/ssh_host_ed25519_key.pub
A    systemd/system/multi-user.target.wants/-.mount
A    systemd/system/multi-user.target.wants/boot-efi.mount
A    systemd/system/multi-user.target.wants/boot.mount
A    systemd/system/-.mount
A    systemd/system/boot-efi.mount
A    systemd/system/boot.mount
A    .updated
A    fstab
A    .pwd.lock
A    locale.conf
A    vconsole.conf
A    .rpm-ostree-shadow-mode-fixed2.stamp
A    npmrc
A    mailcap
A    mime.types

If you're only interested in modified files, run the following instead:

ostree admin config-diff | grep '^M'

While bootc is supposed to support making /etc immutable, doing so breaks a number of services, and I remember running into various other issues with it as well. I personally keep /etc mutable as it makes it easier to test and deploy firewall changes (before including them in an image update), as the configuration for this resides in /etc/firewalld.

Building an installer

Building raw disk images works well enough when you can somehow connect to the target host, such as by using a rescue system. If this isn't possible but you can mount an ISO somehow, building an Anaconda installer is another option. Installers are also useful for more advanced host specific configuration such as disk layouts, adding additional users, etc.

To build an installer we need to do the following things:

  1. We need to create a config.toml that contains a basic kickstart configuration to set up the host
  2. We need to build a container containing all the dependencies necessary to build the installer itself (not the container we want to deploy)
  3. We need to build an installer ISO by combining this installer container and the container we want to deploy

Configuring Anaconda

Let's start with a basic installer that sets up a ext4 root disk without encryption:

[customizations.installer.kickstart]
contents = """
text
zerombr
clearpart --all --initlabel --disklabel=gpt
autopart --noswap --type=plain --fstype=ext4
rootpw --lock

%post --erroronfail
grep \"boot \" /etc/fstab > /etc/fstab-new
mv /etc/fstab-new /etc/fstab
%end
"""

The %post block is necessary to work around this issue where Anaconda generates a broken /etc/fstab.

Full disk encryption

To enable full disk encryption, use the following instead:

[customizations.installer.kickstart]
contents = """
text
zerombr
clearpart --all --initlabel --disklabel=gpt
autopart --noswap --type=plain --fstype=ext4 --encrypted
rootpw --lock

%post --erroronfail
grep \"boot \" /etc/fstab > /etc/fstab-new
mv /etc/fstab-new /etc/fstab
%end
"""

This will result in the installer asking for a passphrase during the installation process.

Automatic unlocking using a TPM2 device

To automatically unlock the root disk using a TPM2 device, create overlay/etc/dracut.conf.d/tpm2.conf with the following contents:

add_dracutmodules+=" tpm2-tss "

Then create overlay/usr/lib/bootc/kargs.d/10-tpm2.toml with the following contents:

kargs = ["luks.options=tpm2-device=auto,headless=true,tpm2-pcrs=7+15"]

The overlay directory should now look something like this:

$ tree overlay
overlay
├── etc
│   ├── dracut.conf.d
│   │   └── tpm2.conf
│   └── ssh
│       ├── keys
│       │   └── root
│       └── sshd_config.d
│           └── 10-custom.conf
└── usr
    └── lib
        └── bootc
            └── kargs.d
                └── 10-tpm2.toml

10 directories, 4 files

Then change the contents of config.toml to the following:

[customizations.installer.kickstart]
contents = """
text
zerombr
clearpart --all --initlabel --disklabel=gpt
autopart --noswap --type=plain --fstype=ext4 --encrypted --passphrase 1234567890
rootpw --lock

%post --erroronfail
grep \"boot \" /etc/fstab > /etc/fstab-new
mv /etc/fstab-new /etc/fstab

env PASSWORD=1234567890 \
  systemd-cryptenroll --wipe-slot tpm2 --tpm2-device auto \
  --tpm2-pcrs 7+15 \
  $(blkid --match-token TYPE=crypto_LUKS --output device)

env PASSWORD=1234567890 \
  systemd-cryptenroll --wipe-slot 0 \
  $(blkid --match-token TYPE=crypto_LUKS --output device)

%end
"""

This sets up TPM2 unlocking using systemd-cryptenroll. The passphrase is temporary and removed by the second call to systemd-cryptenroll.

I experimented with different PCR registers but found that only the combination of registers 7 and 15 worked reliably. It's possible this is because of the use of a virtual machine for testing the changes instead of using physical hardware. If you change the registers make sure to do so in both the config.toml file and in kargs.d/10-tpm2.toml.

After completing the installation process and booting into the host you'll want to either add a custom passphrase or generate a recovery key, as the PCR registers may change. For example, to generate a recovery key:

systemd-cryptenroll --recovery-key \
  $(blkid --match-token TYPE=crypto_LUKS --output device)

If you change the PCR registers in config.toml you'll also need to update the kargs.d/10-tpm2.toml configuration file accordingly.

Building the installer container

There's no pre-built container containing the Anaconda dependencies, so we have to build our own container. While bootc-image-builder has a dedicated anaconda-iso output option that doesn't require a dedicated container, it's deprecated (though you wouldn't know unless you look at the source code).

Create an installer directory containing a Containerfile with the following contents (based on this example):

FROM quay.io/fedora/fedora-bootc:43
RUN dnf install -y \
     anaconda \
     anaconda-install-env-deps \
     anaconda-dracut \
     dracut-config-generic \
     dracut-network \
     net-tools \
     squashfs-tools \
     grub2-efi-x64-cdboot \
     python3-mako \
     lorax-templates-* \
     biosdevname \
     prefixdevname \
     && dnf clean all

RUN mkdir -p /boot/efi && cp -ra /usr/lib/efi/*/*/EFI /boot/efi
RUN mkdir /var/mnt

Then build the container:

sudo podman build -t bootc-installer installer

This step only needs to be done once and not for every update. Changes to config.toml don't require a rebuild of this container either.

Building the installer

Now we can build the Anaconda installer that deploys our container as follows:

mkdir -p build
sudo podman build -t bootc-test .
sudo podman run \
    --rm \
    --interactive \
    --tty \
    --privileged \
    --security-opt label=type:unconfined_t \
    --volume ./config.toml:/config.toml:ro \
    --volume ./build:/output \
    --volume /var/lib/containers/storage:/var/lib/containers/storage \
    quay.io/centos-bootc/bootc-image-builder:latest \
    --type bootc-installer \
    --installer-payload-ref localhost/bootc-test:latest \
    --use-librepo=True \
    --rootfs ext4 \
    --chown $(id -u):$(id -g) \
    localhost/bootc-installer:latest

Here the --installer-payload-ref specifies the qualified name of the container we wish to deploy, while localhost/bootc-installer:latest refers to the qualified name of the installer container (i.e. the one from the installer/ directory). The additional --volume ./config.toml:/config.toml:ro option ensures the configuration file is available to the bootc-image-builder container.

Building an installer this way can take a while, and if you're using a laptop it may attempt to fly away. For example, on my Framework 13 with a Ryzen 5 7640U it takes around 2.5 minutes, during which the CPU temperature reaches a crisp 99°C.

Once built we can test the ISO using a VM as follows:

truncate -s 10G iso-disk.raw
qemu-system-x86_64 \
    -enable-kvm \
    -cpu host \
    -smp 4 \
    -m 4096 \
    -bios /usr/share/OVMF/OVMF_CODE.fd \
    -net user,hostfwd=tcp::2200-:22 \
    -net nic \
    -cdrom build/bootiso/install.iso \
    -drive format=raw,index=0,media=disk,file=iso-disk.raw

When booting choose "Install Fedora Linux" and the installation process begins. Once complete, press Enter to reboot into the new installation.

Managing packages and services

So we now know how to build a basic image and installer and we can SSH into it. What about enabling additional services such as firewalld?

The approach is pretty simple: in your Containerfile you use systemctl enable and systemctl disable to enable and disable the appropriate services respectively. For example:

FROM quay.io/fedora/fedora-bootc:43

RUN --mount=type=cache,target=/var/cache,sharing=locked \
    --mount=type=cache,target=/var/lib/dnf,sharing=locked \
    --mount=type=tmpfs,target=/var/log \
    dnf install --assumeyes --quiet fastfetch firewalld

COPY overlay/ /
RUN systemctl enable firewalld
RUN bootc container lint --fatal-warnings

This would install and enable firewalld, which isn't installed by default when using the fedora-bootc base image.

Using the above approach can get a little tedious to work with as the list of packages and/or services increases. I prefer storing such lists in text files and mounting those into the container:

FROM quay.io/fedora/fedora-bootc:43

RUN --mount=type=cache,target=/var/cache,sharing=locked \
    --mount=type=cache,target=/var/lib/dnf,sharing=locked \
    --mount=type=tmpfs,target=/var/log \
    --mount=type=bind,source=dnf,target=/dnf,z \
    dnf install --assumeyes --quiet $(< /dnf/install.txt) >/dev/null

COPY overlay/ /

RUN --mount=type=bind,source=systemd,target=/systemd,z \
    systemctl disable $(< /systemd/disable.txt) && \
    systemctl enable $(< /systemd/enable.txt)

RUN bootc container lint --fatal-warnings

Using this setup you can add the packages to install to dnf/install.txt and those to remove to dnf/remove.txt. Services listed in systemd/disable.txt are disabled and those listed in systemd/enable.txt are enabled. For example, here's the contents of dnf/install.txt that I use for building my web server:

firewalld
htop
rsync
zram-generator-defaults

Do note that the above approach will fail if one of the files is empty. For example, if you don't need to remove any packages you should remove the dnf line that uses dnf/remove.txt or it will produce an error.
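One way to guard against this is to skip the command when the list file is empty. A sketch, with the logic pulled into a function for clarity (in a Containerfile you'd inline this into the RUN instruction):

```shell
# maybe_remove FILE: only invoke `dnf remove` when FILE exists and is
# non-empty, avoiding the error dnf produces for an empty package list.
maybe_remove() {
    if [ -s "$1" ]; then
        # Intentionally unquoted: the file contains one package per line.
        dnf remove --assumeyes --quiet $(cat "$1")
    fi
}
```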

Those familiar with systemd, and specifically systemd presets, may wonder if they can use presets instead of the text file approach. The short answer is that presets aren't reliable enough, as they're only applied upon first boot. This means that if you introduce new presets in a future update, they'll be ignored. The text file approach doesn't suffer from this problem, so I recommend using it instead.

Running applications and containers

There are two ways to run an application in a bootc environment: install them when building the container and run them the usual way (e.g. as a dedicated user), or run them inside a Podman container. I highly recommend taking the second approach for two reasons:

  1. You don't have to fiddle with creating users in the container (e.g. using systemd-sysusers)
  2. You can take advantage of the isolation features provided by Podman to isolate the application from the rest of the system

Fortunately, running containers in a bootc environment is pretty easy thanks to Podman Quadlets. To add a container, create a NAME.container in overlay/etc/containers/systemd/ using the following template:

[Container]
ContainerName=CONTAINER-NAME
Image=QUALIFIED-IMAGE-NAME
Pull=missing
UserNS=keep-id

[Unit]
After=network-online.target

[Service]
Restart=on-failure
RestartSec=60

[Install]
WantedBy=default.target

Pull=missing means that if the container image isn't present it's pulled from the source specified in the Image setting. Depending on your needs you may want to change this to Pull=newer to also update it when a newer version is available. I recommend only doing so with container images you control.

The UserNS=keep-id setting is important. Using the setup introduced thus far, the containers are started by the root user, which is what you typically want in a server environment. The UserNS setting ensures that the container is given its own namespace (instead of reusing the root namespace) while the IDs are still mapped to the root user ID on the host. This is important because it allows such containers to share files using Podman volumes, without file ownership getting messed up. In other words: you get isolation but without the headaches.

The Unit section ensures the container starts after the network is available, which may or may not be necessary for your use case. The Service section ensures the container is restarted upon a failure, using a 60 second interval between restarts. The default is 100 msec, which may cause the host to go nuts if there's an error (e.g. you gave it a non-existing image name) as it will constantly try to restart the container.

The Install section ensures the container is enabled upon boot. This ensures we don't have to explicitly start the container using systemctl.

To see the available quadlets and their status, run podman quadlet list on the host. For example, this is the output for my web server:

$ podman quadlet list
NAME                     UNIT NAME              PATH ON DISK                                     STATUS          APPLICATION
certbot.container        certbot.service        /etc/containers/systemd/certbot.container        inactive/dead
shost.container          shost.service          /etc/containers/systemd/shost.container          active/running
ssh-container.container  ssh-container.service  /etc/containers/systemd/ssh-container.container  active/running

To get the status of a specific quadlet, use systemctl status NAME. For example:

$ systemctl status ssh-container.service
● ssh-container.service
     Loaded: loaded (/etc/containers/systemd/ssh-container.container; generated)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (running) since Tue 2026-02-10 03:14:36 UTC; 11h ago
 Invocation: f6b0b884b62340bc872cee29a8037c06
   Main PID: 1117 (conmon)
      Tasks: 3 (limit: 4423)
     Memory: 55.9M (peak: 64.9M)
        CPU: 12.601s
     CGroup: /system.slice/ssh-container.service
             ├─libpod-payload-2550d67a0cd564fcfa3075875169e8cb648f9254b92ba4f069548c87ab1c3b88
             │ ├─1122 bash /usr/local/bin/sshd
             │ └─1173 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
             └─runtime
               └─1117 /usr/bin/conmon --api-version 1 -c [...]

Feb 10 03:14:32 fedora systemd[1]: Starting ssh-container.service...
Feb 10 03:14:34 fedora podman[1047]: 2026-02-10 03:14:34.166096899 +0000 UTC m=+1.165587651 container create [...]
Feb 10 03:14:34 fedora podman[1047]: 2026-02-10 03:14:34.126733649 +0000 UTC m=+1.126224421 image pull [...]
Feb 10 03:14:36 fedora podman[1047]: 2026-02-10 03:14:36.326839872 +0000 UTC m=+3.326330634 container init [...]
Feb 10 03:14:36 fedora podman[1047]: 2026-02-10 03:14:36.330256351 +0000 UTC m=+3.329747113 container start [...]
Feb 10 03:14:36 fedora systemd[1]: Started ssh-container.service.
Feb 10 03:14:36 fedora ssh-container[1047]: 2550d67a0cd564fcfa3075875169e8cb648f9254b92ba4f069548c87ab1c3b88

Example repository

I've set up a GitHub repository that contains all the files necessary to build a bootc based system that uses full disk encryption and TPM2 unlocking, using the setup discussed in this article. When playing around with it make sure to change overlay/etc/ssh/keys/root to include your own SSH public keys instead of mine, otherwise you won't be able to log in to the system but I will be, which is probably not what you want.

To build the project, clone it and then run sudo make installer to build an unattended Anaconda installer.

My new hosting setup

OK, so we now know how to build a bootable container using bootc, and we have a reasonable setup for managing its configuration. Let's look at the new setup that I'm using for the few websites that I'm hosting.

As you can probably guess, I'm using bootc. The contents of the image are stored in a private GitHub repository, using the same structure as introduced in this article. Of course the image I use for the server contains more than what I've introduced so far. For example, I include various files to configure firewalld so I can block annoying bots. Applying these is done by either building an image and rebooting into it, or by uploading them using rsync followed by a systemctl restart firewalld.

Running services

The server runs the following services in a container using quadlets:

  • shost: a small static file server written in Inko, basically "nginx at home"
  • certbot for generating TLS certificates
  • SSH

There are a few reasons I'm using a custom file server instead of nginx or Caddy:

  • To test/further develop Inko
  • Because I don't need the full laundry list of features provided by nginx and Caddy (e.g. proxying to a backend)
  • Because I can
  • I like being in control/having ownership of the software that I run

Serving websites

The definition for the shost container is as follows:

[Container]
ContainerName=shost
Exec=shost --tls /etc/shost/tls --no-timestamps
Image=ghcr.io/yorickpeterse/shost:main
PodmanArgs=--memory 512m
IP=10.88.0.2
Pull=newer
ReadOnly=true
ReloadSignal=HUP
UserNS=keep-id
AddCapability=CAP_NET_BIND_SERVICE
DropCapability=all

# These volumes are all mounted such that they may be easily shared between
# other containers (e.g. the certbot container).
Volume=/etc/shost/tls:/etc/shost/tls:z,ro
Volume=/var/lib/shost:/var/lib/shost:z,ro

[Unit]
After=network-online.target

[Service]
Restart=on-failure    # Automatically restart upon a failure
RestartSec=60         # Wait 60 seconds before restarting a failed service
TimeoutStopSec=15     # Wait up to 15 seconds for the service to stop

[Install]
WantedBy=default.target

When this article was first published I used PublishPort to expose container ports to the world. When using firewalld as a firewall this results in additional firewall rules (e.g. blocking IPs) being ignored. See this discussion for more details.

The solution is to give each container a static IP address, add StrictForwardPorts=yes to /etc/firewalld/firewalld.conf, and, instead of using PublishPort, add firewalld port-forwarding rules for each port that should be publicly accessible.
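As a sketch, the equivalent firewall-cmd invocations look something like this; the IP addresses here match the static IPs I assigned to my containers, so adjust them for your own setup, and run firewall-cmd --reload afterwards:

```shell
# add_forwards: add permanent firewalld rules forwarding the public
# HTTPS and HTTP ports to the containers' static IPs (10.88.0.2 for the
# web server, 10.88.0.3 for certbot). These IPs are examples.
add_forwards() {
    firewall-cmd --permanent \
        --add-forward-port=port=443:proto=tcp:toport=443:toaddr=10.88.0.2
    firewall-cmd --permanent \
        --add-forward-port=port=80:proto=tcp:toport=80:toaddr=10.88.0.3
}
```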

The use of ReadOnly=true means the root file system inside the container is read-only. This way, if somebody somehow manages to exploit shost and gains shell access inside the container, there isn't much they can do.

The container runs as the root user instead of a dedicated rootless user. While this still gives isolation, it does mean the container runs in the same namespace as the root user. To avoid this, I use UserNS=keep-id. This option results in the container using its own namespace while still mapping the user ID inside the container to the ID of the root user outside the container. This is useful when sharing (writable) data between containers and/or the host as it ensures the file permissions don't get all messed up. Another possibility would be to use UserNS=auto. This behaves much like UserNS=keep-id, except the IDs are mapped to an automatically chosen range instead of the calling user, making it much more difficult to share files through volumes.

The options AddCapability=CAP_NET_BIND_SERVICE and DropCapability=all result in the container dropping all capabilities except for CAP_NET_BIND_SERVICE. There is no easy way to determine which capabilities an application needs other than just trying it out, but most network applications that just serve files as-is won't need more than CAP_NET_BIND_SERVICE. It shouldn't technically be necessary as Podman containers already give the necessary isolation to prevent them from taking over the host (barring any bugs in Podman itself of course), but it doesn't hurt either.

The volumes mount two sets of data: /etc/shost/tls contains the TLS certificates to use for each domain, while /var/lib/shost contains the files to serve for each website:

$ ls /etc/shost/tls
docs.inko-lang.org  inko-lang.org  releases.inko-lang.org  www.inko-lang.org  www.yorickpeterse.com  yorickpeterse.com

$ ls /var/lib/shost
docs.inko-lang.org  inko-lang.org  releases.inko-lang.org  yorickpeterse.com

$ ls /var/lib/shost/yorickpeterse.com
404.html  articles  css  favicon.ico  feed.xml  images  index.html  resume  robots.txt

Besides the container there are two additional auxiliary units:

# /etc/systemd/system/reload-shost.path
[Unit]
Description=Reload shost when a reload file changes

[Path]
PathModified=/var/lib/shost/reload

[Install]
WantedBy=default.target

# /etc/systemd/system/reload-shost.service
[Unit]
Description=Reload shost
ConditionPathExists=/var/lib/shost/reload

[Service]
Type=oneshot
ExecStart=systemctl reload shost
ExecStartPost=rm /var/lib/shost/reload

[Install]
WantedBy=default.target

The service unit just invokes a reload of shost using systemctl, while the path unit triggers it whenever /var/lib/shost/reload is created or modified (but not deleted). This allows the certbot container to reload shost without requiring special privileges; more on that below.
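The same mechanism works for manual reloads, e.g. after uploading new certificates by hand. A sketch, with a placeholder host name:

```shell
# trigger_reload HOST: ask shost to reload by touching the file watched
# by the reload-shost.path unit. Nothing beyond write access to
# /var/lib/shost is required. HOST is a placeholder.
trigger_reload() {
    ssh "root@$1" 'touch /var/lib/shost/reload'
}
```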

Blocking bots

When running a website you'll inevitably run into misbehaving bots, such as feed readers refreshing your Atom feed every 5 minutes while ignoring the 304 Not Modified status and the various caching related response headers.

The first line of defense is a simple heuristic employed by shost: if a client pretends to be a browser but doesn't do a good job of it, an error response is produced. Although this heuristic is simple, it's still effective, blocking many bots that seemingly put no effort into hiding themselves. It won't catch every bot, but that's OK: the load caused by serving static files isn't a big deal, and it's more about the principle of things.

The second approach is to block IPs and/or entire ASNs at the firewall level. I don't have any automatic tools for this, instead I periodically check the access logs for suspicious/problematic behavior and ban the IP or ASN accordingly. At the moment I have the following ASNs blocked:

  • Alibaba: a common source of misbehaving bots
  • Techoff: some hosting provider (I think?) that is a source of questionable/misbehaving clients (e.g. scanning for WordPress PHP files)

I also blocked a handful of individual residential IPs used by feed readers that kept slamming my Atom feeds every few minutes. I thought about serving some sort of "Atom bomb" where each request produces a dynamically generated list of nonsense articles or warnings, triggering "You have X unread articles" notifications (if the user enabled them at least), but I couldn't be bothered with actually implementing this.

Generating TLS certificates

The certbot setup is a little more involved, starting with the container:

[Container]
ContainerName=certbot
Entrypoint=/bin/sh
Exec=/usr/share/letsencrypt/run.sh
Image=docker.io/certbot/certbot@sha256:5255405f241cd64b121f36ef0172711420816bbbd9c029674de53e5ed953182d
PodmanArgs=--memory 128m
IP=10.88.0.3
UserNS=keep-id
DropCapability=all
AddCapability=CAP_NET_BIND_SERVICE

# I don't control certbot so I don't want new updates to be pulled
# automatically.
Pull=missing

# The volumes that certbot needs.
Volume=/usr/share/letsencrypt:/usr/share/letsencrypt
Volume=/etc/letsencrypt:/etc/letsencrypt:z
Volume=/var/lib/letsencrypt:/var/lib/letsencrypt:z

# These volumes are mounted so we can move the TLS data in the right place and
# reload the webserver, and so we can detect the domains to issue certificates
# for.
Volume=/etc/shost/tls:/etc/shost/tls:z
Volume=/var/lib/shost:/var/lib/shost:z

[Unit]
Wants=certbot.timer
ConditionHost=*yorickpeterse.com*

[Service]
Type=oneshot

I don't control the container image, so instead of pulling a mutable tag I'm pulling a specific SHA. This way I don't have to worry about the underlying image changing while still using the same tag.

The unit is only enabled on hosts whose name matches the pattern *yorickpeterse.com*. This way I can run the image in a local VM without having to worry about certbot trying to request or renew certificates in a way that might interfere with the production server.

Unlike the shost container the certbot container is a oneshot container. This means that it will run and then stop, rather than running forever. The container is started periodically using the following systemd timer:

[Unit]
Description=Renew TLS certificates using certbot
Requires=certbot.service
After=network-online.target

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

certbot takes a list of domain names and generates a single certificate and private key for all the domain names. I instead want one certificate/private key pair per domain. To achieve this I use a custom script that invokes certbot appropriately, then mount that script into the container and run it instead of the default entry point/command. The run.sh script is as follows:

#!/usr/bin/env sh

set -e

for dir in /var/lib/shost/*
do
    certbot certonly --standalone -n -d $(basename "${dir}")
done

certbot certonly --standalone -n -d www.yorickpeterse.com
certbot certonly --standalone -n -d www.inko-lang.org

# Reload shost if one or more certificates are renewed. We don't do this
# directly in the deploy hook so that if 3 domains are renewed we don't trigger
# a reload 3 times in a row.
if [ -f /var/lib/letsencrypt/renewed ]
then
    rm /var/lib/letsencrypt/renewed
    touch /var/lib/shost/reload
fi

certbot is configured to use a custom deploy hook as follows:

email = Removed so I don't get spammed as much
agree-tos = true
no-eff-email = true
keep-until-expiring = true
deploy-hook = /bin/sh /usr/share/letsencrypt/deploy.sh

The deploy.sh script is as follows:

#!/usr/bin/env sh

set -e

cd /etc/shost/tls

for name in ${RENEWED_DOMAINS}
do
    mkdir -p "${name}"
    cp "${RENEWED_LINEAGE}/fullchain.pem" "${name}/cert.pem"
    cp "${RENEWED_LINEAGE}/privkey.pem" "${name}/key.pem"
done

# Signal the main process that one or more certificates are renewed.
touch /var/lib/letsencrypt/renewed

The way this all ties together is as follows: the deploy hook is called only if the certificates changed. This script moves the certificates and private keys into the right place, then it creates /var/lib/letsencrypt/renewed. The run.sh script checks for this file at the end and if present creates /var/lib/shost/reload, triggering a reload of the shost container without the certbot container requiring full root access to run systemctl reload shost or to send a signal to the shost process.

This setup could be simplified drastically if certbot supported producing separate certificate/key pairs in a single command, but alas it doesn't and so here we are.

SSH and deploying websites

The server runs two instances of SSH: one allows the root user to log in to the server (using only a public key of course, password authentication is disabled). Access to this instance is restricted to a list of allowed IPs using firewalld: my home IP address and the address of the private network attached to the server. This way if I'm ever not home I can start up a new server, attach it to the private network and still log in using this server as a proxy.

The way this is achieved is as follows:

  • I define an IP set called "admins" including the allowed IP addresses
  • The image includes a custom version of the "public" firewalld zone

The custom public zone is as follows:

<?xml version="1.0" encoding="utf-8"?>
<zone>
  <short>Public</short>
  <description>The default zone</description>

  <!-- These services are available to everybody. -->
  <service name="http" />
  <service name="https" />
  <service name="mdns" />
  <service name="dhcpv6-client" />

  <!-- The port used for the SSH container. -->
  <port port="2222" protocol="tcp" />

  <!-- Only allow regular/root SSH from a known list of sources -->
  <rule>
    <source ipset="admins" />
    <service name="ssh" />
    <accept />
  </rule>

  <!-- Port-forwarding for the containers. -->
  <forward-port port="443" protocol="tcp" to-port="443" to-addr="10.88.0.2" />
  <forward-port port="80" protocol="tcp" to-port="80" to-addr="10.88.0.3" />
  <forward-port port="2222" protocol="tcp" to-port="22" to-addr="10.88.0.4" />

  <forward/>
</zone>

Here the rule block ensures that SSH on port 22 is allowed only for connections matching the "admins" IP set, while SSH on port 2222 is available to everybody.
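When tweaking this it's useful to check what firewalld actually loaded. Assuming the zone and IP set are installed (e.g. as /etc/firewalld/zones/public.xml and /etc/firewalld/ipsets/admins.xml in the image), the usual firewall-cmd invocations apply:

```shell
# Inspect the active zone and the "admins" IP set (run as root on the host).
firewall-cmd --zone=public --list-all
firewall-cmd --info-ipset=admins

# Add an extra admin address at runtime, without rebuilding the image
# (the address here is just an example):
firewall-cmd --permanent --ipset=admins --add-entry=203.0.113.10
firewall-cmd --reload
```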

The "admins" IP set is as follows:

<?xml version="1.0" encoding="utf-8"?>
<ipset type="hash:net">
  <short>admins</short>
  <description>IP addresses of admins</description>
  <entry>My home IP address here</entry>
  <entry>10.0.0.0/16</entry>
</ipset>

While technically my home IP address may change, it hasn't changed in years. This was also the case for past homes with the same ISP, so for all practical purposes it's a fixed IP address.

The second SSH instance runs in a container and has severely restricted access. The purpose of this container is to allow tools such as rsync and rclone to deploy website changes without requiring full root access. The quadlet definition of this container is as follows:

[Container]
ContainerName=ssh-container
Image=ghcr.io/yorickpeterse/servers/ssh:main
PodmanArgs=--memory 64m
IP=10.88.0.4
Pull=newer
UserNS=keep-id
Volume=/etc/ssh-container/host-keys:/etc/ssh/host-keys:z
Volume=/etc/ssh-container/keys:/etc/ssh/keys:z
Volume=/var/lib/shost:/var/lib/shost:z

[Unit]
After=network-online.target

[Service]
Restart=on-failure
RestartSec=60
TimeoutStopSec=15

[Install]
WantedBy=default.target

Port 2222 on the host is forwarded to port 22 in this container through the forward-port rule in the firewalld zone shown earlier. Unlike port 22 on the host, this port is publicly available and thus subject to annoying bots trying to log in (though the use of a non-standard port makes this much less common).

The volume mount of /etc/ssh-container/host-keys ensures that the SSH server keys generated by the SSH daemon inside the container are persisted on the host such that restarting the container doesn't result in new keys being generated.

The /etc/ssh-container/keys directory is mounted into the container and contains a file called root that lists all the SSH public keys allowed to log in as the root user inside the container. This file contains my own public keys and one for each project that should be able to deploy website changes:

ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIAtIdG1mSd5MRlfWiy0n7XF3K3s+yaq26qeur7LVgJFT desktop
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIIZQJ5WP5Z3epZU4gN+sXczNSm3DB3NsYRGU0WMgSNTj laptop
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIEylPjsCiQlx/w2ldJDdHIA2xF1Yq3trlxqjbqkAl6Lf github-yorickpeterse.com
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIJMN6lde4pKJopsoLGKePByIF1H/uBHx2WW2I/GcN6Ab github-inko
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIA7DHDKL6gWzIW+FgjYxlHz+Z9aHEGcvU1i3BBS0TWUm home server

The Containerfile used to build this container image is as follows:

FROM registry.fedoraproject.org/fedora-minimal:43
RUN microdnf install --assumeyes openssh-server coreutils
COPY overlay/ /
CMD /usr/local/bin/sshd

The overlay directory defines two files that are worth discussing: an sshd wrapper script and the sshd configuration file. The wrapper script is as follows:

#!/usr/bin/env bash

set -e

if [ ! -f /etc/ssh/host-keys/ssh_host_ecdsa_key ]
then
    ssh-keygen -A
    mv /etc/ssh/ssh_host* /etc/ssh/host-keys/
fi

/usr/sbin/sshd -D

All this does is generate the host keys (if they don't exist yet) and move them into the right place, then start sshd in the foreground.

The configuration file is as follows:

# We store the keys in /etc/ssh/host-keys so this directory can be mounted as
# a volume without messing up the package-provided configuration in /etc/ssh.
HostKey /etc/ssh/host-keys/ssh_host_ecdsa_key
HostKey /etc/ssh/host-keys/ssh_host_ed25519_key
HostKey /etc/ssh/host-keys/ssh_host_rsa_key

# Authorized keys are stored in a dedicated directory so it's possible to mount
# them onto the host without messing up the configuration files.
AuthorizedKeysFile .ssh/authorized_keys /etc/ssh/keys/%u
PasswordAuthentication no
PermitRootLogin yes

This allows us to keep the SSH host keys in a persistent location. If we kept these files directly in /etc/ssh we'd run into mounting conflicts: you can't persist /etc/ssh as a whole while also mounting host data onto a sub-directory of it (e.g. /etc/ssh/keys), or at least I couldn't get that to work.

The alternative approach would be to create a dedicated user and have each service that needs to deploy changes log in as that user instead of root. This approach has three downsides:

  • You need to create a real user you can log in as and that gets a bit annoying with bootc
  • That real user now needs read-write access to your websites' data, meaning you have to start fiddling with file permissions and user groups so the data can live in /var/lib while the dedicated user still has access to it
  • If the user is compromised it does have more privileges compared to the container approach, e.g. it's able to see more processes and possibly read their environment data through the /proc file system

That last reason is mainly why I went with the somewhat convoluted SSH container approach: I don't want to give GitHub (or any other service for that matter) any more access to my servers than it strictly needs, and I don't trust dedicated user accounts to be secure enough.

In theory you could use a single dedicated user account and run each SSH command in a dedicated namespace of sorts, but at that point you basically end up with a similar but (probably) more clunky approach.
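As a rough sketch of that alternative (untested; the "deploy" user and wrapper script are hypothetical), sshd's Match and ForceCommand directives could be combined with unshare(1) to confine each session:

```
# sshd_config fragment: force every session of the hypothetical "deploy"
# user through a wrapper that runs in its own user/mount namespaces. The
# wrapper would have to inspect $SSH_ORIGINAL_COMMAND to support rsync.
Match User deploy
    ForceCommand /usr/bin/unshare --user --mount -- /usr/local/bin/deploy-wrapper
    AllowTcpForwarding no
    X11Forwarding no
```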

Hardware

For the "hardware" I'm renting a Hetzner cloud VPS. I went with Hetzner for the following reasons:

  • They offer both virtual and dedicated servers
  • Their pricing is much better compared to the usual cloud providers such as AWS and Digital Ocean
  • They are based in Europe
  • They have been around for a long time, generally have a good track record, and probably will stay around for a long time

I started with a CPX22 with 2 shared cores and 8 GiB of memory. This revealed a bunch of potential performance issues, resulting in a two week long investigation and a bunch of performance optimizations and bug fixes for Inko.

There is still an ongoing issue of the CPX22 server experiencing random latency spikes, possibly due to noisy neighbors. For this reason I upgraded to a CCX13 with two dedicated CPU cores instead of shared cores, though tests thus far suggest the performance of such a VPS still isn't great.

I'm contemplating renting a dedicated server instead, though I haven't convinced myself that an occasional latency spike justifies paying €40-45 per month for a massively overpowered server just to serve a bunch of static websites.

A downside of using a single centralized server in Europe is that latency for far-away visitors (e.g. in south-east Asia) is higher compared to using a CDN such as the one provided by Cloudflare. Since the websites I host all load super fast, I'm not sure this matters that much. Maybe if I ever end up hating myself enough I'll dabble with GeoDNS or building an anycast network, but for now the single server setup will have to do.

DNS

For DNS I still use Cloudflare, mainly because I recently renewed a few domain names and don't feel like paying again just to move them elsewhere. Proxying through Cloudflare is disabled as I have no use for it.

In the future I may move my DNS elsewhere, such as to Hetzner, though I do prefer keeping it separate from my infrastructure provider so I don't end up putting all my eggs in one basket, so to speak.

Conclusion

I'm pretty happy with the new setup: I'm fully in control, the cost is reasonable and the setup has proven to be a good way of stress testing various parts of Inko.

Getting started with bootc was a challenge though: bootc itself works well enough but its documentation is lacking, and the same applies to bootc-image-builder. bcvk is a buggy vibe-coded mess and I would stay away from it until its authors realise that having Claude Code projectile-vomit buggy code isn't going to work out in the long run.

Managing real users in container images is also in dire need of a better solution. You can sort of get there with systemd-sysusers but it's not meant for that, hence the name systemd-sysusers.

I hope the steps outlined in this article will make getting started with bootc a little easier, because in spite of it still being in an experimental stage I think it's a promising approach to building operating system images and updates.