Optimizing Docker image size and why it matters

contains.dev

252 points by swazzy 4 years ago · 107 comments

jasonpeacock 4 years ago

A common mistake that's not covered in this article is the need to perform your add & remove operations in the same RUN command. Doing them separately creates two separate layers which inflates the image size.

This creates two image layers - the first layer has all the added foo, including any intermediate artifacts. Then the second layer removes the intermediate artifacts, but that's saved as a diff against the previous layer:

    RUN ./install-foo
    RUN ./cleanup-foo
Instead, you need to do them in the same RUN command:

    RUN ./install-foo && ./cleanup-foo
This creates a single layer which has only the foo artifacts you need.

This is why the official Dockerfile best practices show[1] the apt cache being cleaned up in the same RUN command:

    RUN apt-get update && apt-get install -y \
        package-bar \
        package-baz \
        package-foo  \
        && rm -rf /var/lib/apt/lists/*
[1] https://docs.docker.com/develop/develop-images/dockerfile_be...
  • gavinray 4 years ago

    You can use "--squash" to remove all intermediate layers

    https://docs.docker.com/engine/reference/commandline/build/#...

    The downside of trying to jam all of your commands into a gigantic single RUN invocation is that if it isn't correct/you need to troubleshoot it, you can wind up waiting 10-20 minutes between each single line change just waiting for your build to finish.

    You lose all the layer caching benefits and it has to re-do the entire build.

    Just a heads up for anyone that's not suffered through this before.
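
    For reference, the flag goes on the build command itself (the image name here is just a placeholder):

      docker build --squash -t myapp:latest .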

    • JimDabell 4 years ago

      That’s useful, thanks.

      I’m confused why they haven’t implemented a COMMIT instruction.

      It’s so common to have people chain “command && command && command && command” to group things into a single layer. Surely it would be better to put something like “AUTOCOMMIT off” at the start of the Dockerfile and then “COMMIT” whenever you want to explicitly close the current layer. It seems much simpler than everybody hacking around it with shell side-effects.
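
      Something like this, say (hypothetical syntax; neither AUTOCOMMIT nor COMMIT exists today):

        AUTOCOMMIT off
        RUN ./install-foo
        RUN ./cleanup-foo
        COMMIT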

      • franga2000 4 years ago

        There's an issue on GitHub about that, and it's been open for about as long as Docker has existed. Looks like they just don't care.

      • bborud 4 years ago

        ...or at least some syntax that represents layers like (block) scopes do in a programming language so it is visually easier to see what is going on.

    • imglorp 4 years ago

      This is huge, thanks for the lead. Others should note it's still experimental and your build command may fail with

      > "--squash" is only supported on a Docker daemon with experimental features enabled

      Up til now, our biggest improvement was with "FROM SCRATCH".

      • gavinray 4 years ago

        No problem.

          > Others should note it's still experimental and your build command may fail with
        
        You might try "docker buildx build", to use the BuildKit client -- squash isn't experimental in that one I believe =)

        https://docs.docker.com/engine/reference/commandline/buildx_...

      • selfup 4 years ago

        Good to know. `FROM scratch` is such a breath of fresh air for compiled apps. No need for Alpine if I just need to run a binary!
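
        For a static Go binary the whole Dockerfile can be as small as this (the paths and module layout are placeholders):

          FROM golang:1.17 AS build
          WORKDIR /src
          COPY . .
          RUN CGO_ENABLED=0 go build -o /app .

          FROM scratch
          COPY --from=build /app /app
          ENTRYPOINT ["/app"]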

        • jrockway 4 years ago

          Do keep in mind that you might want a set of trusted TLS certificates and the timezone database. Both will be annoying runtime errors when you don't trust https://api.example.com or try to return a time to a user in their preferred time zone. Distroless includes these.
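
          If you want to stay on scratch, you can copy those in from the build stage; assuming a Debian-based builder, the usual paths are:

            COPY --from=build /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
            COPY --from=build /usr/share/zoneinfo /usr/share/zoneinfo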

          • selfup 4 years ago

            Yea CA certs are the first pain point I hit. Worth the hurdle. Noted on the timezone. Never really thought about that one. Thanks!

    • a_t48 4 years ago

      Downside - squash makes for longer pulls from the image repository, which can matter for large images or slow connections (you keep build layers but now have no caching for consumers of the image). There are various tricks to be pulled that don't use squash - I've had the most luck putting multiple commands into a buildkit stage, then mounting the results of that stage and copying the output in (either by manually getting the list of files, or using rsync to figure it out for me).

    • yjftsjthsd-h 4 years ago

      But then you end up with just one layer, so you lose out on any caching and sharing you might have gotten. Whether this matters is of course very context dependent, but there are times when it'll cost you space.

    • selfup 4 years ago

      Had no idea about squash. Using cached layers can really save time, especially when you already have OS deps/project deps installed. Thanks!

    • franga2000 4 years ago

      Doesn't that squash all your layers though? That defeats the whole purpose of there being layers. Instead of a huge total size where you only push a fraction of it, your total is lower but you're pushing all of it every time. Same goes for disk space if you're running multiple instances or many images with shared lineage.

    • fragmede 4 years ago

      Or build using the extra layers, and remove them by squishing the commands together once you've got the Dockerfile right.

  • qbasic_forever 4 years ago

    You don't have to do this anymore, the buildkit frontend for docker has a new feature that supports multiline heredoc strings for commands: https://www.docker.com/blog/introduction-to-heredocs-in-dock... It's a game changer but unfortunately barely mentioned anywhere.

    • chessmango 4 years ago

      Yeah, been using this myself for a whole bunch of sanity. Until buildkit is the default though, I wouldn't expect it to gain too much traction.

    • pstuart 4 years ago

      Wow, that deserves its own Tell HN ;-)

      Thanks for the tip!

  • kristjansson 4 years ago

    Multistage builds are a better solution for this. Write as many steps as required in the build image and copy only what’s needed into the runtime image in a single COPY command
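
    Roughly like this, assuming the build step bundles everything the server needs into dist/ (image tags and paths are placeholders):

      FROM node:16 AS build
      WORKDIR /app
      COPY . .
      RUN npm ci && npm run build

      FROM node:16-slim
      COPY --from=build /app/dist /app/dist
      CMD ["node", "/app/dist/server.js"]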

  • martinpw 4 years ago

    Is it an option to put all the setup and configuration into a script? So the Dockerfile becomes effectively just:

      RUN ./setup.sh
    
    I have seen that in some cases as a way to reduce layer count while avoiding complex hard to read RUN commands. Also seen it as a way to share common setup across multiple Docker images:

      RUN ./common_setup_for_all_images.sh
      RUN ./custom_setup_for_this_image.sh
    
    However this approach of doing most of the work in scripts does not seem common, so I'm wondering if there is a downside to doing that.

    • kdmccormick 4 years ago

      The downside of this is the same as the upside: it stuffs all that logic into one layer. If the result of your setup script changes at all, then the cache for that entire layer and all later layers are busted. This may or may not be what you want.

      As a concrete example... if your setup.sh were:

        #!/bin/bash
        ./update_static_assets.sh
        ./install_libraries.sh
        ./clone_application_repo.sh
      
      then any time a static asset is updated, a library is changed, or your application code changes, the digest of the Docker layer for `RUN ./setup.sh` will change. Your team will then have to re-download the result of all three of those sub-scripts next time they `docker pull`.

      However, if you found that static assets changed less often than libraries, which changed less often than your application code, then splitting setup.sh into three correspondingly-ordered `RUN` statements would put the result of each sub-script in its own layer. Then, if just your application code changed, you and your team wouldn't need to re-download the library and static asset layers.

    • mathstuf 4 years ago

      I do this for all of the CI images I maintain. Additionally, it leaves evidence of the setup in the container itself. Usually I have a couple of these scripts (installing distro-provided deps, building each group of other deps, etc.).

  • zelphirkalt 4 years ago

    During development, or with any image that you need to update rather often, you usually don't want to lose all of Docker's caching by putting everything into one giant RUN directive. This is a case where premature optimization strikes hard. Don't merge RUN directives from the start. First build your image in a non-optimized way, saving loads of build time by making use of the Docker build cache.

    Personally I would not merge steps that have nothing to do with each other, unless I am sure they are basically set in stone forever.

    With public and widely popular base images, which are not changed once they are released, the choices might be weighed differently, as all the people who build on top of your image will want fast downloads and a small resulting image.

    Simply put: don't make your development more annoying than necessary by introducing long wait times for building Docker images.

qbasic_forever 4 years ago

There's some more to consider with the latest buildkit frontend for docker, check it out here: https://hub.docker.com/r/docker/dockerfile

In particular, cache mounts (RUN --mount=type=cache) can help with the package manager cache size issue, and heredocs are a game-changer for inline scripts. Forget doing all that && nonsense and write clean multiline RUN commands:

    RUN <<EOF
      apt-get update
      apt-get install -y foo bar baz
      etc...
    EOF
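
And a sketch of the cache mount mentioned above, here pointed at the standard apt cache locations (package names are placeholders):

    RUN --mount=type=cache,target=/var/cache/apt \
        --mount=type=cache,target=/var/lib/apt \
        apt-get update && apt-get install -y foo bar baz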
All of this works right now in plain old desktop Docker; you just need to use the buildx command (the BuildKit engine) and reference the Docker Labs BuildKit frontend image above. Unfortunately it's barely mentioned in the docs or anywhere else other than their blog right now.
  • rochacon 4 years ago

    I guess you don't even need to use `docker buildx`, just `export DOCKER_BUILDKIT=1` and go with it (great to enable globally in a CI system). Heredocs make these multi-lines so much cleaner, awesome.

miyuru 4 years ago

There are other base images, from Google, that are smaller than the usual base images and come in handy when deploying applications that run as a single binary.

> Distroless images are very small. The smallest distroless image, gcr.io/distroless/static-debian11, is around 2 MiB. That's about 50% of the size of alpine (~5 MiB), and less than 2% of the size of debian (124 MiB).

https://github.com/GoogleContainerTools/distroless

  • Ramiro 4 years ago

    Distroless images are tiny, but sometimes the fact that they don't have anything on them other than the application binary makes them harder to interact with, especially when troubleshooting or profiling. We recently moved a lot of our stuff back to vanilla Debian for this reason. We figured that the extra 100MB wouldn't make that big of a difference when pulling for our Kubernetes clusters. YMMV.

    • podge 4 years ago

      I found this to be an issue as well, but there are a few ways around this for when you need to debug something. The most useful approach I found was to launch a new container from a standard image (like Ubuntu) which shares the same process namespace, for example:

      docker run --rm -it --pid=container:distroless-app ubuntu:20.04

      You can then see processes in the 'distroless-app' container from the new container, and then you can install as many debugging tools as you like without affecting the original container.

      Alternatively, distroless has debug images you could use as a base instead, which are probably still smaller than many other base images:

      https://github.com/GoogleContainerTools/distroless#debug-ima...

    • jrockway 4 years ago

      I've found myself exec-ing into containers a lot less often recently. Kubernetes has ephemeral containers for debugging. This is of limited use to me; the problem is usually lower level (container engine or networking malfunctioning) or higher level (app is broke, and there is no command "fix-app" included in Debian). For the problems that are lower level, it's simplest to resolve by just ssh-ing to the node (great for a targeted tcpdump). For the problems that are higher level, it's easier to just integrate things into your app (I would die without net/http/pprof in Go apps, for example).
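
      For reference, spinning one up looks roughly like this (pod and container names are placeholders):

        kubectl debug -it mypod --image=busybox:1.35 --target=myapp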

      I was an early adopter of distroless, though, so I'm probably just used to not having a shell in the container. If you use it everyday I'm sure it must be helpful in some way. My philosophy is as soon as you start having a shell on your cattle, it becomes a pet, though. Easy to leave one-off fixes around that are auto-reverted when you reschedule your deployment or whatever. This has never happened to me but I do worry about it. I'd also say that if you are uncomfortable about how "exec" lets people do anything in a container, you'd probably be even more uncomfortable giving them root on the node itself. And of course it's very easy to break things at that level as well.

    • theptip 4 years ago

      Also if you are running k8s, and use the same base image for your app containers, you amortize this cost as you only need to pull the base layers once per node. So in practice you won’t pull that 100mb many times.

      (This benefit compounds the more frequently you rebuild your app containers.)

      • yjftsjthsd-h 4 years ago

        Doesn't that only work if you used the exact same base? If I build 2 images from debian:11 but one of them used debian:11 last month and one uses debian:11 today, I thought they end up not sharing a base layer because they're resolving debian:11 to different hashes and actually using the base image by exact image ID.

        • gui77aume 4 years ago

          Indeed. But the old debian and the new debian images may have a common layer.

      • PaulKeeble 4 years ago

        Base images like alpine/debian/ubuntu get used by a lot of third party containers too so if you have multiple containers running on the same device they may in practice be very small until the base image gets an upgrade.

        • erik_seaberg 4 years ago

          This. The article talks about

          > Each layer in your image might have a leaner version that is sufficient for your needs.

          when reusing a huge layer is cheaper than choosing a small layer that is not reused.

          • Ramiro 4 years ago

            I think this is something that people miss a lot when trying to optimize their Docker builds: the whole question of optimizing for most of your builds vs optimizing for a specific build. Not easy.

    • gravypod 4 years ago

      There are some tools that allow you to copy debug tools into a container when needed. I think all that needs to be in the container is tar, and it runs `kubectl exec ... tar` in the container. This allows you to get in when needed but still keep your production attack surface low.
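
      `kubectl cp` is one example of that pattern; it only needs tar in the target container (names here are placeholders):

        kubectl cp ./busybox mypod:/tmp/busybox -c myapp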

      Either way, as long as all your containers share the same base layer it doesn't really matter, since they will be deduplicated.

    • staticassertion 4 years ago

      The way I imagine this is best solved is by keeping a compressed set of tools on your host and then mounting those tools into a volume for your container.

      So if you have N containers on a host you only end up with one set of tooling across all of them, and it's compressed until you need it.

      You can decouple your test tooling from your images/containers, which has a number of benefits. One that's perhaps understated is reducing attacker capabilities in the container.

      With log4j some of the payloads were essentially just calling out to various binaries on Linux. If you don't have those they die instantly.

  • ImJasonH 4 years ago

    It got removed from the README at some point, but the smallest distroless image, gcr.io/distroless/static is 786KB compressed -- 1/3 the size of this image of shipping containers[0], and small enough to fit on a 3.5" floppy disk.

    0: https://unsplash.com/photos/bukjsECgmeU

  • yjftsjthsd-h 4 years ago

    So the percentage makes it look impressive, but... you're saving no more than 5MB. Don't get me wrong, I like smaller images, but I feel like "smaller than Alpine" is getting into -funroll-loops territory of over-optimizing.

    • tallclair 4 years ago

      The best advantage I’ve found to the distroless static image is that it cuts down on the noise from container vulnerability scanners.

tonymet 4 years ago

This app is great for discovering waste

https://github.com/wagoodman/dive

I've found 100MB fonts and other waste.

All the tips are good, but until you actually inspect your images, you won't know why they are so bloated.

  • Twirrim 4 years ago

    Every now and then I break out dive and take a look at container images. Almost without fail I'll find something we can improve.

    The UX is great for the tool, gives me absolutely everything I need to see, in such a clear fashion, and with virtually no learning curve at all for using it.

  • a_t48 4 years ago

    +1 for dive - I just wish it performed better on larger images.

    • angrais 4 years ago

      Can you outline the issues you've had?

      I have used it with 15GB sized images without problems. (Size due to machine learning related image)

      • a_t48 4 years ago

        The biggest issue was the number of files - it would hang for a bit while walking through the history

sriku 4 years ago

If you really want to optimize image size, use Nix!

Ex: https://gist.github.com/sigma/9887c299da60955734f0fff6e2faee...

Since it captures exact dependencies, it becomes easier to put just what you need in the image. Prior to Nix, my team (many years ago) built a Redis image that was about 15MB in size by tracking the used files and removing unused files. Nix does that reliably.

  • steve-chavez 4 years ago

    Can second Nix! With Nix we were able to reduce PostgREST image size[1] from over 30 MB to about 4 MB.

    [1]: https://github.com/PostgREST/postgrest/tree/main/nix/tools/d...

  • jvolkman 4 years ago

    We use Nix + Bazel. Nix builds the base image with Python and whatever else we want. Bazel layers our actual Python app on top of it. No dockerfiles at all.

    Example: https://github.com/jvolkman/bazel-nix-example

    • kristjansson 4 years ago

      It sounds so cool, but then I don’t get out of the base image before you’re writing your own Python launcher in a heredoc in a shell script in a docker image builder in a nix derivation[0]? Curiosity compels me to ask: how did all that become necessary?

      [0]: https://github.com/jvolkman/bazel-nix-example/blob/e0208355f...

      • jvolkman 4 years ago

        It mostly grew out of using Nix to fetch a python interpreter for builds and tests. By default, Bazel will use whichever python binary is on the host (if any), which can lead to discrepancies between build hosts and various developer machines.

  • chriswarbo 4 years ago

    The main difference between Dockerfiles and something like Nix is that the former is run "internally" and the latter "externally".

    For example, a Dockerfile containing 'my-package-manager install foo' will create an image with foo and my-package-manager (which usually involves an entire OS, and at least a shell, etc.). An image built with Nix will only contain foo and its dependencies.

    Note that it's actually quite easy to make container images "externally", using just `tar`, `jq` and `sha256sum`. The nice thing about using Nix for this (rather than, e.g. Make) is the tracking of dependencies, all the way down to the particular libc, etc.

bingohbangoh 4 years ago

For my two cents, if your image requires anything not vanilla, you may be better off stomaching the larger Ubuntu image.

Lots of edge cases around specific libraries come up that you don't expect. I spent hours tearing my hair out trying to get Selenium and python working on an alpine image that worked out-of-the-box on the Ubuntu image.

  • aledalgrande 4 years ago

    I would rather install the needed libraries myself and not have to deal with tons of security fixes of libraries I don't use.

    • erik_seaberg 4 years ago

      That’s rolling your own distro. We could do that but it’s not really our job. It also prevents the libraries from being shared between images, unless you build one base layer and use it for everything in your org (third parties won’t).

    • coredog64 4 years ago

      Once you start adding stuff, I think Alpine gets worse. For example, there’s a libgmp issue that’s in the latest Alpine versions since November. It’s fixed upstream but hasn’t been pulled into Alpine.

    • pas 4 years ago

      musl DNS stub resolver is "broken" unfortunately (it doesn't do TCP, which is a problem usually when you want to deploy something into a highly dynamic DNS-configured environment, eg. k8s)

    • CJefferson 4 years ago

      Do libraries just sitting there on disk do any damage?

      Also, are you going to update those libraries as soon as a security issue arises? Debian/Ubuntu and friends have teams dedicated to that type of thing.

      • postalrat 4 years ago

        Can they be used somehow? Then perhaps.

        Depending where you work you might also need to pass some sort of imaging scan that will look at the versions of everything installed.

    • curiousgal 4 years ago

      I mean honestly if you're that paranoid then you shouldn't be using Docker in the first place.

      • aledalgrande 4 years ago

        What does docker have to do with patching security fixes? If you have an EC2 box it's going to be the same. I don't consider that paranoid.

        • xyzzy_plugh 4 years ago

          This is not a valid comparison. You're comparing bare metal virtual machines wherein you are responsible for all of the software running on the VM, with a bundled set of tarballs containing binaries you probably cannot reproduce.

          Many, many vendors provide docker images but no Dockerfile. And even if you had the Dockerfile you might not have access to the environment in which it needs to be run.

          Docker is successful in part because it punts library versioning and security patches and distro maintenance to a third party. Not only do you not have to worry about these things (but you should!) now you might not be able to even do anything if you wanted to.

          • prmoustache 4 years ago

            > Docker is successful in part because it punts library versioning and security patches and distro maintenance to a third party. Not only do you not have to worry about these things (but you should!) now you might not be able to even do anything if you wanted to.

            This is a very restricted view.

            Besides this article is about building your own images, not using existing ones.

  • mianos 4 years ago

    I found this is not actually an "Alpine" issue but a musl issue. Lots of stuff like locale support does not work on musl. I do like the compact size of Alpine, but if you are not also developing with musl underneath, there seem to be lots of surprises.

  • m000 4 years ago

    True. I had a somewhat similar experience with the official Alpine-based Python images. They are supposedly leaner than the Debian-based ones, but any advantage is cancelled out if you need any PyPI packages that use native libraries. Now you suddenly need to include a compiler toolchain in the image and compile the native interface every time you build the image.

  • FinalBriefing 4 years ago

    I generally agree.

    I start all my projects based on Alpine (alpine-node, for example). I'll sometimes need to install a few libraries like ImageMagick, but if that list starts to grow, I'll just use Ubuntu.

nodesocket 4 years ago

A very common mistake I see (though not related to image size per se) when running Node apps is to do CMD ["npm", "run", "start"]. This is, first, wasteful of memory, as npm runs as the parent process and forks node to run the main script. The bigger problem is that the npm process does not send signals down to its child, so SIGINT and SIGTERM are not passed from npm into node, which means your server may not be gracefully closing connections.
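
The usual fix is to make node PID 1 itself (the entrypoint filename is just a placeholder):

    CMD ["node", "server.js"]
If you still want an init process for signal forwarding and zombie reaping, docker run's --init flag adds one without the npm overhead.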

2OEH8eoCRo0 4 years ago

I also liked this one:

https://fedoramagazine.org/build-smaller-containers/

I don't avoid large images because of their size, I avoid them because it's an indicator that I'm packaging much more than is necessary. If I package a lot more than is necessary then perhaps I do not understand my dependencies well enough or my container is doing too much.

yjftsjthsd-h 4 years ago

> 1. Pick an appropriate base image

Starting with: Use the ones that are supposed to be small. Ubuntu does this by default, I think, but debian:stable-slim is 30 MB (down from the non-slim 52MB), node has slim and alpine tags, etc. If you want to do more intensive changes that's fine, but start with the nearly-zero-effort one first.

EDIT: Also, where is the author getting these numbers? They've got a chart that shows Debian at 124MB, but just clicking that link lands you at a page listing it at 52MB.

bravetraveler 4 years ago

The article doesn't seem to do much... in the 'why'. I'm inundated with how, though.

I've been on both sides of this argument, and I really think it's a case-by-case thing.

A highly compliant environment? As minimal as possible. A hobbyist/developer that wants to debug? Go as big of an image as you want.

It shouldn't be an expensive operation to update your image base and deploy a new one, regardless of size.

Network/resource constraints (should) be becoming less of an issue. In a lot of cases, a local registry cache is all you need.

I worry partly about how much time is spent on this quest, or secondary effects.

Has the situation with name resolution been dealt with in musl?

For example, something like /etc/hosts overrides not taking proper precedence (or working at all). To be sure, that's not a great thing to use - but it does, and leads to a lot of head scratching

  • yjftsjthsd-h 4 years ago

    > A highly compliant environment? As minimal as possible. A hobbyist/developer that wants to debug? Go as big of an image as you want.

    Hah, I go the other way; at work hardware is cheap and the company wants me to ship yesterday, so sure I'll ship the big image now and hope to optimize later. At home, I'm on a slow internet connection and old hardware and I have no deadlines, so I'm going to carefully cut down what I pull and what I build.

    • bravetraveler 4 years ago

      Haha, definitely understandable! The constraints one operates in always differ, so that's why I really try to stay flexible (or forgiving) in this situation.

      Our development teams at work have a lot of [vulnerability scanning] trouble from bundling things they don't need. In that light, I suggest keeping things small - but that's the 'later' part you alluded towards :)

  • 3np 4 years ago

    I mean on one hand, yeah, but comparing Debian (124 MB) with Ubuntu (73 MB) shows that with some effort you can eat your cake and have it too.

    • bravetraveler 4 years ago

      Definitely - I guess my biggest takeaway with my post is... keep size in mind (in terms of vuln surface area) - but don't let it become a time sink.

      In the quest of the smallest image possible, one can bring about many unwarranted problems.

adamgordonbell 4 years ago

You might not need to care about image size at all if your image can be packaged as stargz.

stargz is a gamechanger for startup time.

kubernetes and podman support it, and docker support is likely coming. It lazy loads the filesystem on start-up, making network requests for things as needed and therefore can often start up large images very fast.

Take a look at the startup graph here:

https://github.com/containerd/stargz-snapshotter

no_wizard 4 years ago

I like this article, and there is a ton of nuance in the image and how you should choose the appropriate one. I also like how they cover only copying the files you actually need, particularly with things like vendor or node_modules, you might be better off just doing a volume mount instead of copying it over to the entire image.

The only thing they didn't seem to cover is consider your target. My general policy is dev images are almost always going to be whatever lets me do one of the following:

- Easily install the tool I need

- All things being equal, if multiple image base OS's satisfy the above, I go with alpine, cause its smallest

One thing I've noticed is simple purpose built images are faster, even when there are a lot of them (big docker-compose user myself for this reason) rather than stuffing a lot of services inside of a single container or even "fewer" containers

EDIT: spelling, nuisance -> nuance

  • Sebb767 4 years ago

    > I also like how they cover only copying the files you actually need, particularly with things like vendor or node_modules, you might be better off just doing a volume mount instead of copying it over to the entire image.

    I'd highly suggest not to do that. If you do this, you directly throw away reproducibility, since you can't simply revert back to an older image if something stops working - you need to also check the node_modules directory. You also can't simply run old images or be sure that you have the same setup on your local machine as in production, since you also need to copy the state. Not to mention problems that might appear when your servers have differing versions of the folder or the headaches when needing to upgrade it together with your image.

    Reducing your image size is important, but this way you'll lose a lot of what Docker actually offers. It might make sense in some specific cases, but you should be very aware of the drawbacks.

  • dvtrn 4 years ago

    > I like this article, and there is a ton of nuisance in the image and how you should choose the appropriate one.

    By chance, did you mean nuance? Because while I agree that you can quickly get into some messy weeds optimizing an image... hearing someone call it a "nuisance" made me chuckle this afternoon

alanwreath 4 years ago

I always feel helpless with Python containers - it seems there aren't many savings ever eked out of multi-stage builds and the other strategies that typically get suggested. Docker container size really has made compiled languages more attractive to me.

  • alanwreath 4 years ago

    That said, i love love love python. So, I mean even a tree shaking capability would be awesome.

hrez 4 years ago

Nobody mentioned https://github.com/docker-slim/docker-slim yet.

So here it is.

hamiltont 4 years ago

There is some strange allure to spending time crafting Dockerfiles. IMO it's overglorified - for most situations the juice is not worth the squeeze.

As a process for getting stuff done, a standard buildpack will get you a better result than a manual Dockerfile for all but the most extreme end of advanced users. Even for those users, they are typically advanced in a single domain (e.g. image layering, but not security). While buildpacks are not available for all use cases, when available I can't see a reason to use a manual Dockerfile for prod packaging

For our team of 20+ people, we actively discourage Dockerfiles for production usage. There are just too many things to be an expert on; packers get us a pretty decent (not perfect) result. Once we add the packer to the build toolchain it becomes a single command to get an image that has most security considerations factored in, layer and cache optimization done far better than a human, etc. No need for 20+ people to be trained to be a packaging expert, no need to hire additional build engineers that become a global bottleneck, etc. I also love that our ops team could, if they needed, write their own buildpack to participate in the packaging process and we could slot it in without a huge amount of pain
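
For anyone who hasn't seen the workflow, a buildpack build is roughly a one-liner (the Paketo builder here is just one example):

    pack build myapp --builder paketobuildpacks/builder:base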

no_wizard 4 years ago

Somewhat tangentially related to the topic of this post: does anyone know any good tech for keeping an image "warm"? For instance, I like to spin up separate containers for my tests vs development so they can be "one single process" focused, but it is not always practical (due to system resources on my local dev machine) to just keep my test runner in "watch" mode, so I spin it down and have to spin it back up, and there's always some delay - even when cached. Is there a way to keep this "hot" without running a process as a result? I generally try to do watch mode for tests, but with webdev I've got a lot of file watchers running, and this can cause a lot of overhead with my containers (on macOS, for what it's worth).

Is there anything one can do to help this issue?

  • pas 4 years ago

    You could launch the container itself with sleep. (docker run --entrypoint /bin/sh [image] sleep inf) Then start the dev watch thing with 'docker exec', and when you don't need it anymore you can kill it. (Eg. via htop)

    With uwsgi you can control which file to watch. I usually just set it to watch the index.py so when I want to restart it, I just switch to that and save the file.

    Similarly you could do this with "entr" https://github.com/eradman/entr

  • PhilippGille 4 years ago

    > keeping an image "warm"

    Do you mean container? So you'd like to have your long running dev container, and a separate test container that keeps running but you only use it every now and then, right? Because you neither want to include the test stuff in your dev container, nor use file watchers for the tests?

    Then while I don't know your exact environment and flow, could you start the container with `docker run ... sh -c "while true; do sleep 1; done"` to "keep it warm" and then `docker exec ...` to run the tests?

nnx 4 years ago

One way to simply optimize Docker image size is to use https://github.com/GoogleContainerTools/distroless

Supports Go, Python, Java, out of the box.

gui77aume 4 years ago

For Java, JIB on distroless works pretty well. It's small, fast and secure.

- https://github.com/GoogleContainerTools/jib

- https://github.com/GoogleContainerTools/distroless

somehnacct3757 4 years ago

The analyzer product this post is content marketing for looks interesting, but I would want to run it locally rather than connect my image repo to it.

Am I being paranoid? Is it reasonable to connect my images to a random third party service like this?

  • EscargotCult 4 years ago

    > Am I being paranoid? Is it reasonable to connect my images to a random third party service like this?

    Depends on your line of work I suppose

jiggunjer 4 years ago

When I want to run a containerized service I just look for the Docker Hub image or GitHub repo that requires the least effort to get running. In these cases, is it very common to write Dockerfiles and try to optimize them?

MDingas 4 years ago

I've heard that using alpine over a base image like debian makes it harder for current vulnerability scanners to find problems. Is this still true?

  • detaro 4 years ago

    in the sense that with less stuff around they find fewer things to throw mostly-meaningless warnings about? /s

betaby 4 years ago

https://paketo.io/ is worth mentioning.

funcDropShadow 4 years ago

Is there a tool to get an overview over the size of a set of images, considering sharing?

encryptluks2 4 years ago

Has anyone had more luck using podman and alternative builders?

  • natrys 4 years ago

    Using buildah you have pretty much complete control over layers. You can mount the image in filesystem, manipulate it in as many steps as you want, and then commit it (thus explicitly create layer) when you want.

    On the flip side, it's a shell script that calls various buildah sub-commands rather than a nicer declarative DSL. Also, you don't get the implicit cache-reuse behaviour of Dockerfiles, since everything runs anew on the next invocation. You have to implement your own scheme for that; IIRC, breaking the script into segments for each commit, writing the ID to a file at the end of each, and combining that with make worked for me.
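
    A rough sketch of that workflow (package, path, and image names are placeholders):

      ctr=$(buildah from debian:11)
      buildah run "$ctr" -- sh -c 'apt-get update && apt-get install -y package-foo'
      buildah copy "$ctr" ./app /usr/local/bin/app
      buildah commit "$ctr" myimage:latest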

epberry 4 years ago

If you click ‘Pricing’ on the main site an error occurs just FYI.
