Migrating infrastructure off Coolify • Cory Dransfeldt


I've spent a while recently migrating my personal infrastructure off Coolify. Coolify is an excellent tool, and one that helped flatten the initial learning curve of managing and deploying things when I started self-hosting. As I grew more comfortable with the process, its issues and its details, I found myself wanting to remove it as an abstraction on top of what I had become comfortable managing myself.

Thankfully, moving off of Coolify wasn't all that bad.

  • I inventoried what I was running (docker ps --format '{{.Names}}' | sort).
  • Authored a consolidated docker-compose.yml for each server I manage that's running Docker (the server hosting this site is not).
  • Created new volumes to replace those generated by Coolify (this let me simplify the names and drop the dynamically generated IDs from them).
  • Copied data to the new volumes:
docker run --rm \
    -v rccw4kock44w0g0kgckkww0o_forgejo-data:/source:ro \
    -v apps-server_forgejo-data:/dest \
    alpine sh -c "cp -av /source/. /dest/"
  • Verified the move with something non-critical (I went with GoatCounter).
  • Stopped the Coolify-managed containers.
  • Brought up the docker-compose.yml-managed containers (each container in the compose file also got a human-readable name).
  • Removed the old containers.
  • Verified and removed the old volumes.
  • Pruned and cleaned up.
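The cut-over steps above look roughly like this as a command sequence; the container names here are illustrative, not the real ones (the volume name is the one from the copy step):

```shell
# Illustrative cut-over sequence; container names are hypothetical.
docker stop old-forgejo                                  # stop the Coolify-managed container
docker compose -f ~/infra/apps-server/docker-compose.yml up -d
docker rm old-forgejo                                    # remove the old container
docker volume rm rccw4kock44w0g0kgckkww0o_forgejo-data   # only after verifying the copy
docker system prune -f                                   # clean up dangling images and networks
```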

I did this for my primary apps server and followed the same pattern for audiobookshelf, Navidrome and Immich. I now have the files for my infrastructure versioned in git and pushed to my forgejo instance (the structure is ~/infra/HOSTNAME-server).
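A minimal sketch of what one of those consolidated compose files might look like; the image tag, mount path, and service details are assumptions for illustration, but the pattern of human-readable container names and explicitly named volumes is the point:

```yaml
# ~/infra/apps-server/docker-compose.yml (illustrative sketch)
services:
  forgejo:
    container_name: forgejo            # human-readable name
    image: codeberg.org/forgejo/forgejo:9   # version tag assumed
    restart: unless-stopped
    volumes:
      - forgejo-data:/data

volumes:
  forgejo-data:
    name: apps-server_forgejo-data     # matches the renamed volume
```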

I'd been deploying a few services using Coolify webhooks: two fired when changes were pushed to their respective forgejo repos, and this site is deployed manually via my Filament dashboard. Replicating this without Coolify was, well, fun. I'd wanted to run adnanh/webhook in a container, but my deploy scripts run docker compose commands. The webhook container is Alpine-based, and there's no need to be running Docker in Docker or building a custom image just for this.

Instead, I'm running adnanh/webhook as a binary on the host and proxying it through Caddy. I define each hook in hooks.json like this:

{
  "id": "coryd-dev",
  "execute-command": "/root/infra/apps-server/webhook/scripts/deploy-coryd-dev.sh",
  "trigger-rule": {
    "match": {
      "type": "value",
      "value": "Bearer <token>",
      "parameter": {
        "source": "header",
        "name": "Authorization"
      }
    }
  }
}

A POST to the matching endpoint with the right Authorization header runs the corresponding script.
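Triggering a deploy is then just an authenticated request; the hook id is the one from the config above, and the URL is the bridge-gateway form used internally:

```shell
# Fire the coryd-dev hook (token elided)
curl -X POST \
  -H "Authorization: Bearer <token>" \
  http://10.0.10.1:9000/hooks/coryd-dev
```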

forgejo runs in a container and can't reach services outside the Docker bridge network via their public domains, so those webhooks point directly at the bridge gateway: http://10.0.10.1:9000/hooks/<service-name>.

For this site, a CNAME record resolves to the Caddy-managed webhook service, since the two live on separate servers and this site doesn't run in a container.
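The Caddy side of that is a plain reverse proxy to the webhook binary on its default port (9000); the hostname here is an assumption:

```
# Caddyfile (illustrative): proxy the CNAME'd hostname to the webhook binary
hooks.coryd.dev {
    reverse_proxy localhost:9000
}
```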

As if this wasn't enough, I also moved these services behind Tailscale.

  • I installed Tailscale (curl -fsSL https://tailscale.com/install.sh | sh).
  • Enabled Tailscale SSH (sudo tailscale up --ssh).
  • Enabled Tailscale on boot (sudo systemctl enable tailscaled).

To keep these servers on the tailnet persistently I also disabled key expiry for each one in the Tailscale admin console.

ufw is used to manage access to the servers and scoping incoming traffic is as simple as ufw allow in on tailscale0 (with a few caveats). Forward rules are also needed to route traffic to containers:

ufw route allow in on tailscale0 proto tcp to 10.0.10.2 port 80
ufw route allow in on tailscale0 proto tcp to 10.0.10.2 port 443

Where 10.0.10.2 is Caddy's static IP on the bridge network. With those rules in place, traffic to ports 80 and 443 is forwarded to Caddy. Each container also gets a static IP, which allows for stable firewall rules.
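Pinning those static IPs happens in the compose file via a user-defined network with IPAM config; the subnet and Caddy's address follow the values above, the rest is a sketch:

```yaml
# Illustrative: pin Caddy to 10.0.10.2 on a user-defined bridge network
networks:
  apps:
    driver: bridge
    ipam:
      config:
        - subnet: 10.0.10.0/24
          gateway: 10.0.10.1

services:
  caddy:
    image: caddy:2
    networks:
      apps:
        ipv4_address: 10.0.10.2
```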

Next, because services behind Tailscale aren't publicly accessible, Let's Encrypt's standard HTTP-01 challenges won't work to issue SSL certificates for the subdomains I have configured. To handle this, I'm using DNS-01 challenges via certbot and the DNSimple plugin:

certbot certonly --dns-dnsimple \
  --dns-dnsimple-credentials ~/.secrets/dnsimple.ini \
  -d '*.coryd.dev' -d 'coryd.dev'

This covers the subdomains I configure for the A records I point at my Tailscale IPs. DNSimple manages my public DNS records and the same API token is used by certbot to complete the DNS-01 challenge. The certs are renewed using a systemd timer. This happens on each of my servers in the tailnet. Caddy is running in a container with auto_https off and certs directly accessible via a volume mount.
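With auto_https off, each site block in the Caddyfile has to point at the mounted certs explicitly; the container paths and proxy target here are assumptions:

```
# Caddyfile (illustrative)
{
    auto_https off
}

<service>.coryd.dev {
    tls /certs/fullchain.pem /certs/privkey.pem
    reverse_proxy forgejo:3000
}
```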

With each A record pointed to a Tailscale IP, devices off of the tailnet can't reach them. Devices on the tailnet need a DNS resolver that knows these domains map to Tailscale IPs. I set up CoreDNS in a container on my apps server with a Corefile mapping each A record to the appropriate IP:

coryd.dev {
    hosts {
        IP <service>.coryd.dev
        fallthrough
    }
    forward . 1.1.1.1
    log
    errors
}

. {
    forward . 1.1.1.1
    log
    errors
}
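CoreDNS itself is just another container with that Corefile mounted in; this is a sketch, and the host path is an assumption:

```shell
# Illustrative: run CoreDNS against the Corefile above
docker run -d --name coredns \
  -p 53:53/udp -p 53:53/tcp \
  -v ~/infra/apps-server/coredns/Corefile:/Corefile:ro \
  coredns/coredns -conf /Corefile
```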

In the Tailscale admin console, I added my apps server's IP as a split DNS resolver, routing queries for coryd.dev through CoreDNS before querying NextDNS (my go-to DNS service).

I keep my admin panel behind Tailscale by configuring nginx to allow access only from tailnet source IPs. It also connects to the database connection pooler and PostgREST via the tailnet.
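That nginx guard is a pair of allow/deny rules scoped to Tailscale's CGNAT range (tailnet devices get addresses from 100.64.0.0/10); a minimal sketch, with a hypothetical hostname:

```nginx
# Only allow tailnet clients; Tailscale assigns IPs from 100.64.0.0/10
server {
    listen 443 ssl;
    server_name admin.example.com;  # hypothetical hostname

    allow 100.64.0.0/10;
    deny all;

    # ... proxy / app config ...
}
```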


This was a significant amount of work and it's a migration I had to plan and think through in detail. I'd originally started with the goal of simply moving off of Coolify, but I'd recently set up Tailscale to make Jellyfin accessible outside of my home network (there are other services I plan to expose from my NAS in the future). Moving my cloud-based services behind Tailscale for added security made too much sense not to do.

With all of this in place, my infrastructure is versioned, more secure and easier to reason about. Every change is a commit and every server's state is a text file I can read. I also managed to reinforce how much of a pain dealing with networking can be and can't imagine how painful it would be without Tailscale.