Cheap Docker images with Nix

lethalman.blogspot.com

171 points by Lethalman 10 years ago · 50 comments

nickjj 10 years ago

Why isn't he using the Alpine based Redis image when comparing final image sizes?

It's unfair to say the official Redis image is 177mb because the Alpine version is available on the Docker Hub[1] and it's only 15.95mb.

Alpine is pretty awesome if your main goal is to shrink images without any effort[2].

[1] https://hub.docker.com/_/redis/

[2] http://nickjanetakis.com/blog/alpine-based-docker-images-mak...

LethalmanOP 10 years ago

You are right. It's unfair, I'm going to modify the post to mention alpine.
The post is about both tiny images AND how to build Docker images with Nix. I think it was interesting tooling to make for our community. Some Nix people are already using it for obvious reasons.
- yhodique 10 years ago
  
  I commented on the original post, but actually I believe the strength of the Nix-based approach is to provide package management capabilities outside of the target image. In other words, it makes the "scratch" image actually usable.
  So now, if you take an Alpine-like approach to the problem (musl, no extra stuff) in Nix, you can get much smaller images. And the reason is you don't have to pay for the limitations of the Dockerfile-based approach.
  As a proof of concept, here's an extension of the Nix recipe to produce a 1.2MB redis image: https://gist.github.com/sigma/9887c299da60955734f0fff6e2faee...
  Now, the numbers start getting a little bit meaningless (although that's still an order of magnitude...), but the point is that regardless of how great Alpine is (and it is definitely great), as a base image for a Dockerfile it'll always contain way too much stuff compared to what's really needed for the application itself.
markbnj 10 years ago

Agreed, and alpine is the official Docker base distro at this time. I've built some very small images for haproxy and iperf with it.

_0w8t 10 years ago

A common problem with Docker is that after running a compiler or preprocessor during an image build, one ends up with the build tools inside the image. A workaround is to use a shell script that first gets the image with the compiler, runs it, and then passes the output to the Docker build. But this is non-standard and encourages running random scripts on the build host, defeating the advantage of using Docker during development to isolate the build system. It is nice that Nix addresses this.

xienze 10 years ago

Or you could remove the tools at the end of your Dockerfile...
Edit: am I missing something? This is a legitimate solution to the problem. Install the tools, compile, and remove them. The parent is suggesting a very clumsy approach (build on the host and pass the binary to the container as it's being built).
- shykes 10 years ago
  
  (I didn't downvote you)
  Disclaimer: I work at Docker.
  Your approach is the logical one... But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.
  The long-term solution is to support image "squashing" or "flattening" in docker-build.
  A less clumsy short-term solution is to build a Docker image of the build environment; then 'docker run' that image to produce the final artifact. At least that way you get rid of the dependency on the host, which keeps your build more portable (if not as convenient as a single 'docker build')
  - akbar501 10 years ago
    
    Our approach is to view Docker as part of our overall development process and then develop stage specific containers.
    For example, we have development containers, build containers and runtime containers. Runtime containers are further segmented into product demo containers, testing containers and production containers.
    I just published a new article on Docker this morning: http://www.dynomitedb.com/blog/2016/04/13/docker-containers/
    An important point is that the build containers produce binaries that are used in both native package managers (ex. apt) and in Docker containers.
    If you're interested in seeing this in action, then check out our source on GH: https://github.com/DynomiteDB
    IMHO, a well designed approach to UnionFS layers is vital to high quality container architecture.
    While we're focused on container use for databases (both in-memory and on-disk), much of our approach applies equally well to application layer containers.
    
    _0w8t 10 years ago
    
    Nice reference about https://www.projectcalico.org . At some point insanity of using ethernet on top of UDP to carry IP traffic between containers must stop.
  - jasonjei 10 years ago
    
    Straight from the horse's mouth--admire your product, Mr. Hykes!
    I love how you can run docker inside of a container. What I've done sometimes is run docker inside my build environment container. I use Docker Machine (OSX), so I just send the same machine environmental variables over to the container, but on Linux you could just link the socket file. In fact, I have a container just for Google Cloud that maintains my GKE config and makes it easy for me to prepare new deployments to the cloud.
    
    bogomipz 10 years ago
    
    Could you elaborate on this process of deploying your container from inside docker via socket linking? I'm not sure I follow.
  - xienze 10 years ago
    
    > But Docker currently has a limitation in how it handles removing files in a build. After each build step, the intermediary state is committed as a layer, just like a git commit. So removing files in a docker build is like removing files in git: they are still taking up space in the history.
    That's true unless you do the "everything in a single RUN statement" trick that is very popular.
    
    fapjacks 10 years ago
    
    "Very popular" for the one case of installing things via package manager.
  - lisivka 10 years ago
    
    Proper solution is to build packages first using build environment, then install built binary packages in container, like any other package.
- zenlikethat 10 years ago
  
  There's also another workaround to the ones mentioned by other posters: You could install your compilers, do your build, and clean up all your build tools within just one shell script invoked by a RUN line in Dockerfile. It's not pretty but it works.
  - shykes 10 years ago
    
    Yes that is a common workaround as well.
    By the way: there is an open request for contributions to help improve this. The core Docker team very much wants to improve squashing in build, but it's a matter of time and resources.
    If somebody cares enough to take the time to carry a design proposal then a patch, we would be happy to support that effort!
- _0w8t 10 years ago
  
  The idea is not to build anything on the host. Rather it is more like a staged build initiated through a shell script. First pull/build image with the compiler, then docker run it to compile the application and finally build the final image with the application.
- mayank 10 years ago
  
  The layered fs that Docker uses is based on additive snapshots, so removing tools at the end will paradoxically increase your image size with a useless snapshot.
  - xienze 10 years ago
    
    You can do everything within a single RUN statement to avoid that.
    
    mayank 10 years ago
    
    Yes, but at the expense of readability, development speed, and incremental updates to images (where typically the dependencies layer changes slower than your target code).
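
To make the layering point in this subthread concrete, here is a minimal toy model in Python. It is only a sketch: the `Layer` class, the `commit` helper, and the file sizes are all hypothetical and do not reflect Docker's actual storage format. It assumes each build step is committed as its own additive layer, and that deleting a file from an earlier layer records only a marker (a "whiteout") without freeing the bytes already stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

MB = 1024 * 1024


@dataclass
class Layer:
    """One committed build step: files it added and deletion markers it recorded."""
    added: Dict[str, int] = field(default_factory=dict)   # path -> size in bytes
    whiteouts: Set[str] = field(default_factory=set)      # paths hidden from lower layers


def commit(ops: List[Tuple[str, str, int]]) -> Layer:
    """Collapse the operations of a single build step into one layer.

    Only the net effect of the step is stored: a file that is added and
    removed within the same step never reaches the layer at all.
    (Simplification: a path is assumed not to exist in lower layers too.)
    """
    layer = Layer()
    for op, path, size in ops:
        if op == "add":
            layer.added[path] = size
            layer.whiteouts.discard(path)
        elif op == "rm":
            if path in layer.added:
                del layer.added[path]      # created in this step, so it just vanishes
            else:
                layer.whiteouts.add(path)  # lives in a lower layer: can only be hidden
        else:
            raise ValueError(f"unknown operation: {op}")
    return layer


def stored_size(layers: List[Layer]) -> int:
    """Bytes kept in the image history; whiteouts never subtract anything."""
    return sum(sum(layer.added.values()) for layer in layers)


def visible_size(layers: List[Layer]) -> int:
    """Bytes visible in the final filesystem after applying all layers in order."""
    files: Dict[str, int] = {}
    for layer in layers:
        for path in layer.whiteouts:
            files.pop(path, None)
        files.update(layer.added)
    return sum(files.values())


# Three separate steps: install the compiler, build the app, remove the compiler.
separate_steps = [
    commit([("add", "/usr/bin/gcc", 300 * MB)]),
    commit([("add", "/app/server", 5 * MB)]),
    commit([("rm", "/usr/bin/gcc", 0)]),
]

# The same operations performed inside one step (the "single RUN" trick).
single_step = [
    commit([
        ("add", "/usr/bin/gcc", 300 * MB),
        ("add", "/app/server", 5 * MB),
        ("rm", "/usr/bin/gcc", 0),
    ]),
]
```

In this model both variants expose the same 5 MB filesystem, but the three-step history still stores the compiler's 300 MB, which is the trade-off the single-RUN workaround avoids at the cost of layer caching.
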
mmerickel 10 years ago

The idea is actually to do something similar to a heroku buildpack where you have a container with build tools that generates binary assets. You then inject the built binary into a new image that has only runtime dependencies installed.
I've experimented (and use) a variant of this workflow myself built around my marina tool [1]. The basic idea is to define a file that uses a dev/builder image to build, then exports a tarball into a runner image.
[1] https://pypi.python.org/pypi/marina
- lisivka 10 years ago
  
  It is much easier to just use standard package format (rpm/deb/apk/etc.) and standard installer (yum/dnf/apt/apk/etc.). Of course, you can invent your own build and installation system, it will work too.
  - mmerickel 10 years ago
    
    That depends on a lot of factors. The advantage of an approach like this is that every package is built in a clean-room container independent of the host. For example my host is os x and I'm building binary tarballs to run on ubuntu. If you have a build server obviously this is less of an issue.
    
    lisivka 10 years ago
    
    Just build .deb's instead of tarballs. Use "alien" to convert .tgz into .deb, for example. I see no reason to invent my own build system, package format and package management software. I have built my rpm packages in a clean-room chroot (using mock) for about a decade. It works fine in docker too.
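
The builder-image-then-runner-image workflow described in this thread (build a compiler image, `docker run` it to produce the artifact, then build the final image) could be scripted roughly as below. This is a sketch under assumptions, not anyone's actual tooling: the image names, the `Dockerfile.build` file name, and the `/out` mount point are hypothetical, and the builder image is assumed to write its artifact to `/out` when run. Only standard `docker build` (`-t`, `-f`) and `docker run` (`--rm`, `-v`) options are used.

```python
import subprocess
from typing import List


def staged_build_commands(
    builder_image: str,
    final_image: str,
    out_dir: str,
    builder_dockerfile: str = "Dockerfile.build",  # hypothetical file name
    runtime_dockerfile: str = "Dockerfile",
) -> List[List[str]]:
    """Return the three docker invocations of a staged build, in order.

    out_dir should be an absolute host path; it is bind-mounted at /out
    (an assumed convention) so the builder can drop its artifact there.
    """
    return [
        # 1. Build the image that contains the compiler and build tools.
        ["docker", "build", "-t", builder_image, "-f", builder_dockerfile, "."],
        # 2. Run it once; the build tools stay in this throwaway container.
        ["docker", "run", "--rm", "-v", f"{out_dir}:/out", builder_image],
        # 3. Build the runtime image, which only copies in the finished artifact.
        ["docker", "build", "-t", final_image, "-f", runtime_dockerfile, "."],
    ]


def execute(commands: List[List[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `execute(staged_build_commands("myapp-builder", "myapp", "/abs/path/to/out"))` would run the three steps. Nothing is compiled on the host itself, which is the property both sides of this subthread want.
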
stevvooe 10 years ago

Nix has been a very cool project to watch over the years.
You can address part of the problem of picking up extra data in final images by declaring temporary build locations, such as `/var/lib/cache`, as a volume. Anything written to a volume won't be included in the final image.

TomFrost 10 years ago

If the goal is solely Docker images with a standard size in the 20-40MB range, this can be achieved without additional tooling. After switching our development and deployment flow to docker, my team quickly tired of the 200-400MB images that seemed to be accepted as commonplace. We started basing our containers on alpine (essentially, busybox with a package manager) or alpine derivatives, and dropped into that target size immediately. Spinning up 8-10 microservices locally for a full frontend development stack is a shockingly better experience when that involves a 200MB download rather than a 2GB one.

This is in no way a negative commentary on Nix; it looks like an interesting solution to a well-known problem.

Artemis2 10 years ago

Same here! Switching to Alpine for most services was essentially painless. To go a step further, the images with binaries that have no dependencies (mostly programs written in Go) use scratch Docker images. This way we get 5MB images, where the size overhead of Docker is nothing.
- _0w8t 10 years ago
  
  I have found that images with a single executable, and perhaps /etc/passwd, and no other files prevent using docker exec as a valuable debugging/poking tool. My preference is to have a single image with all the services and basic tools included and use it to run all the containers on the machine.

iElectric2 10 years ago

Once https://github.com/NixOS/nixpkgs/pull/14711 is merged, the images might also be binary deterministic (depending on what packages you use).

edofic 10 years ago

Already merged :)

josh-wrale 10 years ago

@Lethalman: Can you expound on this?

> Be aware that things like PAM configurations, or other stuff, created to be suitable for Debian may not work with Nix programs that use a different glibc.

So this would not be a factor using the method which has no base? The Debian base approach seems like a non-starter if negative emergent behavior like PAM config mismatches is common.

Also, to be sure, I can do this "no base Docker build" using Nix on let's say CentOS 7? Meaning, I'm not required to use NixOS natively?

I plan to read the post closer later today, so feel free to ignore these questions if they are answered in-post, but I usually don't post to HN from work computer, so I thought I'd get my questions out here early in case the thread drops and I forget to ask. :-)

Nice work!

trishume 10 years ago

Yes, AFAIK you can build docker containers with Nix on any Linux machine with the nix package manager installed. You can even build them on OSX if you configure your OSX Nix install with a remote build machine that runs Linux (yes, Nix can automatically and transparently distribute your builds).

k__ 10 years ago

Haha, very good.

So Nix makes even the use of Docker better, while some Nix user here claimed that you don't even need Docker if you're using Nix(OS).

LethalmanOP 10 years ago

In fact, you don't need it. But sometimes you are forced to use Docker anyway.

speedkills 10 years ago

Installed emacs with nix last weekend to use with spacemacs. Three errors at startup that didn't make sense but I had a feeling that nix was the problem so rolled back and reinstalled with brew. Worked perfectly. Nix has some great ideas but when the install of emacs took 10 times as long as homebrew, then didn't work correctly it didn't leave me wanting to use it for anything work related like my docker images.

Hopefully with time, their planned cli improvements and binary caching it will be a contender but that feels a ways off at this point.

thinkpad20 10 years ago

> Nix has some great ideas but when the install of emacs took 10 times as long as homebrew, then didn't work correctly it didn't leave me wanting to use it for anything work related like my docker images.
I understand it's not fun to watch a long build process finish only to have the final product not work, but that assessment is not really fair. Nix itself is not the problem there; the problem is in the definition of that package. The same thing could just as easily occur in a package defined in homebrew. Saying that nix itself is at fault when an individual package (among hundreds of thousands) is faulty is sort of akin to encountering a buggy program and equating it to a bug in the language the program was written in.
I'd also add that due to the closed-source nature of OSX core libraries, it's hard to achieve the same degree of robust determinism in OSX that nix allows in other platforms. Fortunately things tend to be very solid on Linux. My company has been using Nix in production for about a year now and it has been a huge benefit to platform stability and speed of deployment.
bennofs 10 years ago

Nix does have binary caching (but only if you use `/nix/store` for the nix store path); I'm using it all the time on Linux. OSX support is definitely much less stable though, and there have been problems with build machines for OSX in the past, so I do not know what the current situation is regarding binary caches on OSX.
Regarding your point about build time, I'm not sure why installing with nix would make that much of a difference. Nix still just executes the build system of the underlying packages. However, nix might have to build more packages since it doesn't rely on the underlying system to provide dependencies which helps to make it more robust. Additionally, emacs has various build configurations, which might also require different dependencies, affecting build time.
ocharles 10 years ago

If there are any bug reports you could file, that would be much appreciated. Installing Emacs should just be a case of downloading the prebuilt binaries, which shouldn't take long at all. I use Spacemacs and have never encountered any problems. I have a feeling things are still a lot worse on OSX though...

bogomipz 10 years ago

I really like the Nix package manager; however, is there an upside to using Nix to build a Docker image over just writing a regular Dockerfile? Is this an odd use case? Maybe it's just for demo purposes? Is there a benefit I'm overlooking?

cstrahan 10 years ago

For reference, I left a comment further down:
https://news.ycombinator.com/item?id=11509065

isido 10 years ago

Interesting. Not being that familiar with Nix(OS), how much of a moving target is Nix? Can you do these kinds of things with stable versions or do you need to keep up with HEAD?

LethalmanOP 10 years ago

The blog post code is supposed to run with the latest master as of a few days ago. We've merged a big change that greatly reduces the closure of our packages.
Nix moves fast, but we usually do a good job of not breaking things. Still, we necessarily have to introduce innovations in our frameworks.

meta_AU 10 years ago

Now all we need is a non-Docker image push and I can remove docker-in-docker from my build system.

fishnchips 10 years ago

I'm wondering what would be the advantage of using Nix versus building on Alpine Linux with a good understanding of how Docker layers work. My main reason to be skeptical about Nix is the need to learn a new single-purpose language as opposed to just using shell like you do in Dockerfiles.

davexunit 10 years ago

Both the Nix language and the Dockerfile language include embedded shell scripts. Let's not pretend that Docker doesn't have its own DSL to learn. Dockerfiles are imperative, whereas Nix is declarative and functional, which is a big improvement.
- fishnchips 10 years ago
  
  I see Docker's DSL as a rather thin layer of abstraction as compared with Nix. Re: declarative vs. functional I believe this does not matter all this much in containerland. As long as I can get sh*t done deterministically, I could not care less about the programming paradigm that got me there.
  - davexunit 10 years ago
    
    >As long as I can get sh*t done deterministically
    Docker is non-deterministic. If you and I build the same image, we are not going to get the same result. See https://reproducible-builds.org for more information on the subject.
bogomipz 10 years ago

This is the same thing I was asking. As much as I like the idea of a declarative, functional package manager, what value does it provide if you are just building docker images?
- cstrahan 10 years ago
  
  Here's a copy of a comment I left on the post:
  1. Better abstraction (e.g. the example of a function that produces docker images).
  2. The Hydra build/CI server obviates the need for paying for (or administering a self hosted) docker registry, and avoids the imperative push and pull model. Because a docker image is just another Nix package, you get distributed building, caching and signing for free.
  3. Because Nix caches intermediate package builds, building a Docker image via Nix will likely be faster than letting Docker do it.
  4. Determinism. With Docker, you're not guaranteed that you'll build the same image across two machines (imagine the state of package repositories changing -- it's trivial to find different versions of packages across two builds of the same Dockerfile). With Nix, you're guaranteed that you have the same determinism that any other Nix package has (e.g. everything builds in a chroot without network access (unless you provide a hash of the result, for e.g. tarball downloads))
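
The parenthetical in point 4 (network access is only allowed when you provide a hash of the result) can be illustrated with a small Python sketch. This is not how Nix implements it; the function names are hypothetical, and the idea shown is simply that a download is accepted only if its content matches a hash pinned in advance, so two machines either get identical bytes or a failed build.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_fixed_output(data: bytes, expected_sha256: str) -> bytes:
    """Accept fetched content only if it matches the hash pinned in the recipe.

    A mismatch aborts instead of silently using whatever the remote
    repository happens to serve today.
    """
    actual = sha256_hex(data)
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")
    return data


# The hash is pinned once, when the recipe is written...
tarball_v1 = b"contents of the source tarball, version 1"
pinned = sha256_hex(tarball_v1)

# ...and every later build checks whatever it fetched against that pin.
verify_fixed_output(tarball_v1, pinned)  # same bytes: accepted
```

If the upstream content changes, `verify_fixed_output` raises rather than producing a different image, which is the contrast being drawn with a Dockerfile that installs whatever package version is current at build time.
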

awinter-py 10 years ago

what docker build extensions would make it possible to do this without 'rolling your own' tarball docker layer? having access to volumes at build time?
