Faster Docker builds using a remote BuildKit instance

blacksmith.sh

75 points by adityamaru a year ago · 47 comments

TechSquidTV a year ago

This is fairly similar in concept to what we do over at depot.dev https://depot.dev/blog/depot-magic-explained

We've found that BuildKit has several inefficiencies preventing it from being as fast as it could be in the cloud, especially when dealing with simultaneous builds (common in CI). That led us to create our own optimized fork of BuildKit.

The number of fine-tuning knobs you can turn when running a self-hosted BuildKit instance is practically limitless, but I also encourage everyone to try it as a fantastic learning exercise.

rtpg a year ago

I really am hopeful we come a bit full circle on builders and machines to "we buy one or two very expensive machines that run CI and builds". Caching in particular is just sitting there, waiting to be properly captured, instead of constantly churning on various machines.

Of course, CI SaaSes implement a lot of caching on their end, but they also try to put people on the most anemic machines possible to try and capture those juicy margins.

  • aayushshah15 a year ago

    > we buy one or two very expensive machines that run CI and builds

    This unfortunately does not work for orgs that have, say, more than 20 engineers. The core issue is that once you have a test suite large enough to have ~30 shards, you only need one engineer `git push`ing once to saturate those 1-2 expensive machines you've got sitting in the office.

    The CI workload is quite amenable to "serverless" when you get to a large enough org size, where most of the time you actually want to pay nothing (i.e. outside your business hours) but when your engineers are pushing code, you want 1500 vCPUs on-demand to run 4 or 5 test suites concurrently.

    • rtpg a year ago

      Sounds like somebody should set up incremental CI with Bazel :)

      Seriously though, of course there are a lot of details here, but I think people tend to not really internalize how much testing is about confidence, and things like incremental CI can really chew away at how big/small your test suite needs to be. There are some things that are just inherently slow, but I've seen a lot of test suites that spend most of their runtime rerunning tests that only exercise unchanged code.

      My glib assertion is that there is likely no test suite generated by 20 engineers and requiring 30 shards that is impossible to chop up with incremental CI. And downstream of that, getting incremental CI would improve DX a lot, cuz I bet those 30 shards take a long time.
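
      A rough sketch of what "only run what changed" can look like with Bazel (the branch name and target pattern are placeholders, and real setups usually lean on something like bazel-diff to map changed files to targets):

          # Collect the files touched by this change and map each one to its Bazel label.
          changed_files=$(git diff --name-only origin/main...HEAD)
          changed_labels=$(for f in $changed_files; do bazel query "$f" 2>/dev/null; done)
          # Run only the tests that (transitively) depend on one of those files.
          bazel test $(bazel query "tests(rdeps(//..., set($changed_labels)))")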

    • BonoboIO a year ago

      I can get a 48-core/96-thread dedicated server for 200€ a month at Hetzner. The cheapest EC2 instance that comes close costs 2€ per hour. I can get nearly 10 Hetzner servers running continuously for that price.

      Obviously the dedicated machines are not rentable per hour, but the cloud is so much more expensive.

      • adamcharnock a year ago

        Very much this. I’ve overseen this process for one of my clients and we’ve seen build + deploy times go from 5-10 minutes down to around 1-2 minutes. This was down to increased performance, improved caching, and being able to cut out some minor workflow setup steps.

        So 10x cheaper and 5x the performance.

        Still using GitHub Actions, but now just using self-hosted runners.

      • aayushshah15 a year ago

        Well, the point was that if 4 concurrent `git push`es saturate up to 1500 vCPUs, then you'd need 16 of those Hetzner dedicated servers (whose uptime you have to manage) that you're paying for the entire month. ~4 concurrent pushes is a very small number, and an org with a few dozen engineers will regularly see peaks higher than this.

        Additionally, you'd have to ensure some isolation across your test runs (either by running the test suites in ephemeral containers, or VMs) which is additional engineering work for something that isn't business critical.

        • adamcharnock a year ago

          In my experience, dedicated hardware has provided a baseline real-world 2x speed-up over cloud instances (presumably down to no contention and local NVMe). So that would be 8 Hetzner instances.

          I managed to squeeze out a 5x speed up total (see my other comment). In which case that would mean 3-4 instances.

          Plus, with shorter build times, you may then find that having builds queued up is acceptable.

zerotolerance a year ago

There are a few tragedies in the Docker story, but at least two are specifically tied to naming things. First, Swarm (mode): by the time they released Swarm (mode), the world had already taken a collective dump on Swarm (the proof of concept). Even in 2024, most of the time people talk about Swarm and start dumping on it, they're actually talking about the proof-of-concept architecture. Second, they should never have called the subcommand "build." It isn't building anything. In this case "build" is performing a packaging step with very raw tools. But the minute they called it build, people started literally building software INSIDE intermediate container layers on the way to assembling a packaged container. Dockerfile is about as weak a build tool as you could possibly ask for, with zero useful features with respect to building software. But Docker named it "build", and now we've got Dockerfiles calling compilation steps, test commands, and dependency retrieval steps.

  • Noumenon72 a year ago

    What is the alternative? Like one pipeline step running Gradle or make or something, and then copying the result into a container that's basically an apt-get and nothing else?

  • imtringued a year ago

    I don't see how this is a tragedy. You're blowing something trivial out of proportion.

    When you build alpine packages, you literally have to call abuild on your APKBUILD files. It's the same for Arch Linux. The files are called PKGBUILD. So even if you decide to package your applications (uh, using docker run? that changes nothing!) before docker build and then install them with the OS package manager, you will run into exactly the same thing.

  • jahewson a year ago

    I can just imagine someone inside Docker, Inc. saying “the name is not important” over and over again until one day they shipped it. They’ll never know what they missed out on.

pxc a year ago

I'm in the process of rolling out something analogous at work, where Nix jobs run inside rootless Podman containers but the Nix store and Nix daemon socket are passed through from the host. That way the jobs' dependencies all persist, dependencies shared between projects are stored only once, and when two concurrent jobs ask for the same dependency, they both just wait for it to be fetched once, and so on.
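
A rough sketch of that kind of pass-through with rootless Podman (the image name here is hypothetical and is assumed to ship a Nix client installed outside /nix, e.g. a static build, since the host's store is mounted over the container's):

    # Mount the host store read-only plus the nix-daemon socket; the host daemon
    # performs the actual builds/fetches and writes to the shared store.
    podman run --rm \
      -v /nix/store:/nix/store:ro \
      -v /nix/var/nix/daemon-socket/socket:/nix/var/nix/daemon-socket/socket \
      -v "$PWD":/work -w /work \
      -e NIX_REMOTE=daemon \
      ci-nix-client:latest \
      nix build .#default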

We also currently have some jobs that build OCI images via the Docker/Podman CLI using traditional Dockerfile/Containerfile scripts. For now those are centralized and run on just one host, on bare metal. I'd like to get those working via rootless Docker-in-Docker/Podman-in-Podman, but one thing that will be a little annoying with that is that we won't have any persistent caching at the Docker/Podman layer anymore. I suppose we'll end up using something like what's in the article to get that cache persistence back.

  • adityamaruOP a year ago

    > Nix jobs run inside rootless Podman containers but the Nix store and Nix daemon socket are passed through from the host

    That's a neat idea. Was the primary motivation for building this out the perf gains left on the table?

    • pxc a year ago

      More or less! Before that we basically had two kinds of jobs: one where Docker provided the job's dependencies and one where Nix provided the job's dependencies. The former was, naturally, always containerized, but the latter ran on bare metal. (We also had some jobs that used Nix without a persistent /nix/store, but never mind that.) Given both options, choosing Nix over Docker is really nice in two ways: (1) you don't need to deal with any infrastructure (i.e., push an image somewhere) for your job; and (2) even so, the second run and later can be really, really fast. We also use Nix to provide dependencies on our local machines for some projects, so just reusing that same environment in CI is a natural fit, too.

      But as we started to mature our own CI 'infrastructure' (the automation we use to set up our self-hosted runners), I wanted to containerize the Nix builds. Using 'shell executors' in GitLab just feels icky to me, like a step backwards into Jenkins hell. Those jobs do leave a little bit more behind on disk. More importantly, though, while all of my team's Nix jobs use Nix in an ephemeral way, it is possible to run `nix profile install ...` in one of these bare-metal jobs. That could potentially affect other such jobs, plus it creates a 'garbage collector root' that slightly reduces how much `nix-collect-garbage` can clean up. Our jobs are ones we'd like other teams across the company to run, so we also want to provide some really low-effort ways for them to do so, namely: via shared infrastructure we host, via any Docker-capable runners they might already have, and by leveraging the same IaC we use to stand up our own runners.

      To that end, we really want to have just one type of job that requires just one type of execution environment, and we definitely want opt-in persistence instead of a mess where jobs can very easily influence one another by accident or malice. But we don't want to lose the speedup! The real action in these jobs is small, so by sharing a persistent Nix store between runs, they go down from 2-10 minutes to 2-10 seconds, which is the kind of UX we want for our internal customers.

      The new Nix image is more suitable for all three target scenarios: it's less risky on runner hosts shared by multiple teams, it still works normally (downloading deps via Nix on every run) on 'naive' Docker/Podman setups, and our runner initialization script actually uses Nix to provide Docker and Podman (both rootless), so any team can use it on top of whatever VM images they're already using for their CI runners regardless of distro or version once they're ready to opt into that performance optimization.

aliasxneo a year ago

This is basically what we do, except we use Earthly[1]. An Earthly satellite is basically a modified remote Docker BuildKit instance.

[1]: https://earthly.dev/

AkihiroSuda a year ago

> endpoint: tcp://${{ secrets.BUILDKIT_HOST }}:9999

This should be protected with mTLS (https://docs.docker.com/build/drivers/remote/) or SSH (`endpoint: ssh://user@host`) to avoid potential cryptomining attacks, etc.
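
For reference, the mTLS variant from the linked remote-driver docs looks roughly like this (hostname, port, builder name, and certificate paths are placeholders):

    # Register the remote BuildKit instance as a builder, authenticating with client certs.
    docker buildx create \
      --name remote-buildkit \
      --driver remote \
      --driver-opt cacert=.certs/ca.pem,cert=.certs/client-cert.pem,key=.certs/client-key.pem \
      tcp://buildkit.example.com:9999

    # Then point builds at it.
    docker buildx build --builder remote-buildkit -t myimage:latest .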

bhouston a year ago

30-minute Docker builds? Crazy.

I know it is out of style for some, but my microservice architecture has a dozen services, each of which takes about 1.5 minutes to build, maybe 2 minutes at most (if there is a slow Next.js build in there and a few thousand npm packages), and that is just on a 4-core GitHub Actions worker.

My microservices all build and deploy in parallel so this system doesn't get slower as you expand to more services.

(Open source template which shows how it works: https://github.com/bhouston/template-typescript-monorepo/act... )

  • throw1948012309 a year ago

    > My microservices all build and deploy in parallel so this system doesn't get slower as you expand to more services.

    If you're deploying all your "microservices" in parallel, then what you might have built is a distributed monolith.

    A microservice can be tested and deployed independently.

    • nine_k a year ago

      I don't see a contradiction. I read it that the microservices are independent and thus can build in parallel, if several teams work on changes to several microservices.

      Spinning up a build worker outright when a change is pushed is the fastest way, but it may be expensive if the build process is prolonged.

      OTOH I've seen much faster image build times with smart reuse of layers, so that you don't have to re-run that huge npm install if your package-lock.json did not change.
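
      For illustration, the usual shape of that layer-reuse trick in a Dockerfile (base image and build commands are placeholders):

          FROM node:20-slim
          WORKDIR /app
          # Copy only the manifests first: this layer and the install below stay
          # cached as long as package-lock.json is unchanged.
          COPY package.json package-lock.json ./
          RUN npm ci
          # Source changes only invalidate the layers from here on.
          COPY . .
          RUN npm run build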

    • bhouston a year ago

      Whether it is a distributed monolith or a set of microservices is independent of the speed of build.

  • adityamaruOP a year ago

    > 30 minute docker builds?

    At Blacksmith we do see this pretty often! Rust services in particular are the most common offender.

    • jeffparsons a year ago

      I'm working on an ungodly pile of hacks (https://github.com/jeffparsons/hope) to help with this. Coming Soon™: S3 backend and better tests.

      • adityamaruOP a year ago

        `hope` is a good name for a service trying to solve this problem :D

        • jeffparsons a year ago

          Haha, thanks. I chose it because:

          - Here's One (I? You?) Prepared Earlier
          - Sometimes Hope _is_ a (caching) strategy
          - And yeah, I really hope I can make this thing work well.

          I like silly puns. They bring me joy.

    • bhouston a year ago

      Yikes. I would be so much less productive with a 30 minute build time.

  • jjayj a year ago

    I regularly build images where we install Python from source, which makes 30m seem quite normal...

    • bhouston a year ago

      Why?

      • jjayj a year ago

        We do development/builds in containers to make things easier on the devs. These are the Python build containers, so we have to rebuild every time there's a new version of Python or one of its dependencies.

      • xyst a year ago

        Maybe to secure the supply chain

spankalee a year ago

Google's Cloud Build has always worked very well for me for remote builds, but it'd be nice if BuildKit worked as a consistent service interface so it's easy to switch between build backend providers.

suryao a year ago

This is pretty cool - it provides a good speed-up for container builds. A couple of beefy instances can set you back $200-1000 a month on AWS, on top of the regular GitHub Actions runner costs, and it only goes up from there. We have a way around that, plus effective scaling for multiple parallel builds, with WarpBuild.

As a side note: in my time running a CI infra company, we've seen that a majority of the workflow time for large teams comes from tests, which can have over 200 shards in some cases.

crohr a year ago

Another option is to simply cache layers in a fast cache close by (e.g. S3)? Like https://runs-on.com/features/s3-cache-for-github-actions/#us...?
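
For reference, BuildKit's S3 cache backend looks roughly like this (bucket, region, and image names are placeholders; it needs a docker-container or remote builder rather than the default docker driver):

    docker buildx build \
      --cache-to   type=s3,region=us-east-1,bucket=my-build-cache,name=myapp,mode=max \
      --cache-from type=s3,region=us-east-1,bucket=my-build-cache,name=myapp \
      --push -t registry.example.com/myapp:latest .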

xyst a year ago

> At Blacksmith, we regularly see our customer’s Docker builds taking 30 minutes or more

What’s the most common cause of builds taking this long in the first place…

The worst I have ever had was 5 minutes, and subsequent builds were reduced to under a minute thanks to the build cache, multi-stage builds, thin layers, and an optimized .dockerignore.
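
For illustration, a sketch of the multi-stage pattern (module path and base images are placeholders); a .dockerignore keeps the `COPY . .` in the build stage from invalidating the cache with irrelevant files, and the runtime stage stays thin:

    # Build stage: heavy toolchain, with the dependency download keyed on go.mod/go.sum.
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY go.mod go.sum ./
    RUN go mod download
    COPY . .
    RUN CGO_ENABLED=0 go build -o /out/app ./cmd/app

    # Runtime stage: only the compiled binary ships.
    FROM gcr.io/distroless/static-debian12
    COPY --from=build /out/app /app
    ENTRYPOINT ["/app"]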

  • zerotolerance a year ago

    People doing all the work of dependency fetches, code builds, and test execution inside ephemeral environments that were never designed for building software.

    • imtringued a year ago

      Fetching packages isn't the problem. The problem is the lack of out-of-the-box caching of the downloaded packages. You have to set that up yourself with something like Artifactory, and Docker does not nudge you towards doing that at all.
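
      One way to get that kind of download caching out of BuildKit itself is a cache mount, which persists the package manager's cache directory across builds on the same builder (image and paths are illustrative, shown here for pip):

          # syntax=docker/dockerfile:1
          FROM python:3.12-slim
          WORKDIR /app
          COPY requirements.txt ./
          # The cache mount keeps pip's download cache between builds.
          RUN --mount=type=cache,target=/root/.cache/pip pip install -r requirements.txt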

  • kapilvt a year ago

    Multiarch via qemu
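
    For context, that setup typically looks something like this (image name is a placeholder); the emulated, non-native half of the build is what gets slow:

        # Register QEMU emulators for foreign architectures, create a container builder,
        # and build both platforms in one go (the non-native one runs under emulation).
        docker run --privileged --rm tonistiigi/binfmt --install all
        docker buildx create --use
        docker buildx build --platform linux/amd64,linux/arm64 \
          -t registry.example.com/myapp:latest --push .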

    • adityamaruOP a year ago

      This is a common cause, yeah, but it's becoming less of an issue with increasing support for ARM runners.

delduca a year ago

I prefer to use a VPS

  • adityamaruOP a year ago

    What provider do you prefer? We're big fans of Hetzner.

  • selcuka a year ago

    Well, EC2 is a type of VPS.

    • pxc a year ago

      'Cloud VPS' is generally a lot more expensive than the cheapest old-school VPS providers, sometimes by 10x or so, similar to the numbers commenters elsewhere were discussing for dedicated servers vs. EC2.
