What's new in Docker 1.13: prune, secrets, checkpoints and more
I'm really looking forward to seeing the scientific community adopt docker as a way to distribute reproducible research and coursework.
MIT 6.S094 has a Dockerfile[^1] that contains all the software required for taking part in the class. This is a huge boon for getting stuck into the class and its coursework.
Most of the excitement that I've seen in the HPC scientific world has been around Singularity [1] containers. In particular, the main advantage seems to be keeping processes running as non-privileged users. This lets these containers integrate with existing HPC clusters much more easily.
How is publishing a Dockerfile even remotely reproducible? Almost every Dockerfile is a series of apt-get install, or yum install or pip install commands. How do I know what versions of packages I am downloading or whether they will even be available to download if I build from this Dockerfile, say two months from now?
IMHO, every Dockerfile has left-pad written all over it.
Good question.
Reproducibility is all about the starting point. If your computation requires high entropy from some random source and on the next run there isn't enough entropy, your experiment may fail - but that's really a corner case. A Docker image keeps the state of the starting point (packages, bashrc, shell history, etc.) version controlled. It is as if someone gave you a copy of a VirtualBox image.
So how do we lock down?
1) When you write a Dockerfile, pin the versions of the packages you are installing.
2) When you want to reproduce, you can rebuild an image from that Dockerfile.
3) But most people are just going to use your image, which is the same now or next year. Building an image != launching a container from an image.
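For example, a pinned Dockerfile might look like this (the packages and version strings here are made up, just to show the shape - `apt-cache policy <pkg>` shows what's actually available):

```dockerfile
# Pin the base image to an exact tag instead of a floating "latest"
FROM ubuntu:16.04

# Pin exact apt package versions (hypothetical version strings)
RUN apt-get update && apt-get install -y \
    curl=7.47.0-1ubuntu2 \
    python-pip

# Pin exact Python package versions too
RUN pip install numpy==1.12.0 scipy==0.18.1
```

Even this isn't airtight - apt mirrors drop old versions - but it at least makes version drift visible instead of silent.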
Currently a lot of research computation is run using Condor to schedule jobs and yeah, they span multiple machines, like how Jenkins master/slaves work. It's been a go-to for a lot of HPC research.
There's been some effort individually to integrate Docker with Condor (after all, both are just processes running on some host machine).
I'm just a guy that wants to deploy web apps. Is docker overkill for me? Basically, I want to be able to test something on my local machine under the same conditions it will be running on my server. Containerisation seems like the only way to do this that doesn't involve keeping packages and system configurations in sync in two or more systems.
Docker may be overkill to start but it's relatively low cost to implement and it will definitely pay dividends over time:
* You can be sure that what you're running locally is exactly what you'll be running on the server
* Your deployment experience will be the same regardless of which tech stack you're using for the web application
* There are many places you can deploy docker containers (Google GCE, Amazon ECS, Amazon EB, etc.)
* A web application is often composed of several services (e.g. the web app, a database, redis, etc.) and docker-compose makes it easy to fire all of those up in development, e.g. if a new developer joins, they only need to install docker rather than web app framework + database + redis
* Docker sets you up quite well to grow into a more complex deployment (e.g. using Kubernetes)

> it's relatively low cost to implement

Running Docker in production takes a huge amount of effort to get right and is not easily done.
I don't believe that's an accurate assessment. If the grandparent wants to run a one-off container with reproducible results, something like docker-compose is perfect. If he wants to run a multi-node microservices architecture, then the story gets more complicated.
I run a lot of small projects with docker-compose on a single host and it makes deploying my changes very easy. Maybe there is some cost to setting it up, but I think even with a small project it pays dividends pretty fast.
I could have been clearer. I meant that setting up docker for his use case i.e. a single 'standard' web application, is relatively easy. Especially if you're using something like Amazon Elastic Beanstalk. At least, that's been my experience.
You're right that docker can become very complex e.g. dockerizing and orchestrating mariadb with galera for high availability was not pleasant.
I agree that it's actually OK for that use case. But then you don't have a big initial pain to solve anyway - people using Docker in production usually have few other choices due to the scale they are operating at, and Nomad+Ansible doesn't cut it because there are complex dependencies.
Docker is very well suited for local development and testing, particularly since the launch of Docker for Mac and Windows. It makes utilities like MAMP less necessary.
But apart from local development, I'd say that depends on your needs. If you want more ease-of-use, and you run a single-server hosting environment with multiple projects, it may be easier to keep doing that without adding Docker. But if you want increased security and better isolation between your projects, Docker is likely a better solution.
In any case, I would strongly recommend that you familiarize yourself with Docker, at least locally. After a while, you can decide if you want to take the leap and use it on your server as well.
Docker for local development has been a pain in the butt for us:
- We've hit performance problems with the filesystem
- Problems with caching things like yarn and npm installs
- The need to constantly rebuild the images for changes to be picked up
- Difficulty dealing with a single Dockerfile for prod and testing, making us want to maintain 2 Dockerfiles
Probably some bad setup on our part, but we've been using it in production with Kubernetes and have hit none of those problems.
We're still using compose to bootstrap the database, caching, etc.
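For what it's worth, that bootstrap is only a few lines of compose file - something along these lines (service names and images below are placeholders):

```yaml
version: "2"
services:
  web:
    build: .
    ports:
      - "8000:8000"
    depends_on:
      - db
      - cache
  db:
    image: postgres:9.6
  cache:
    image: redis:3.2
```

A new developer runs `docker-compose up` and gets the whole stack without installing Postgres or Redis locally.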
I don't understand how kubernetes solved your base image issue. That's a clustering system, so by default it can't help.
It sounds as though your setup doesn't work with the immutable filesystems introduced by docker. That's not an issue with docker at all - just something to learn.
I can't imagine dev or deployment without docker any more - all of my tests, yarn installs, dev workflow and prod runs through it.
My experience has been that it's great for local development if your app is reasonably complex (i.e. Docker doesn't make sense if you only have an app worker and a SQLite database), but I don't love it for production. In order for Docker to work well in production, you need something like Kubernetes, and that's a huge hassle for a small app.
I don't think that Kubernetes is the most important thing on prod. Some colleagues from another team at $WORK use plain Docker and "orchestrate" their containers with simple systemd units that run `docker stop|start`. If the app is only a single container, that should do it. (Actually, in that case, I think that `rkt run` would be better since the process runs below the same cgroup, and systemd can detect crashes and restart the container.)
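A minimal sketch of such a unit, assuming an image called `myorg/myapp` that's already pulled (all names here are made up):

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=myapp container
After=docker.service
Requires=docker.service

[Service]
# remove any stale container, then run in the foreground so systemd
# can watch the process and restart on failure
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --name myapp myorg/myapp
ExecStop=/usr/bin/docker stop myapp
Restart=always

[Install]
WantedBy=multi-user.target
```

(The caveat about rkt stands: the process systemd watches here is the docker client, not the containerized process itself.)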
Anyway, Kubernetes is not so important for small deployments, but what I've found really helpful is CoreOS: an auto-updating base OS that gets out of the way and (more importantly) ships a combination of Linux kernel + Docker that usually works really well.
Recent versions of systemd-nspawn can directly download a docker image and run it in a service unit.
What about docker-compose? We've recently started using it, and we don't see any problems; did your colleagues evaluate it?
docker-compose is really straightforward to get running, even more so with docker-machine, and it gives you dev/prod parity, but the downside is that there's no built-in way to do zero-downtime deploys.
Actually, with the new docker-compose version 3 you can do rolling updates[1].

1. https://docs.docker.com/compose/compose-file/#/deploy
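Roughly like this in a v3 compose file (service name and image are placeholders):

```yaml
version: "3"
services:
  web:
    image: myorg/web:latest
    deploy:
      replicas: 4
      update_config:
        parallelism: 1   # replace one replica at a time
        delay: 10s       # pause between replacements
```

Note the `deploy` section only takes effect when deployed to a swarm with `docker stack deploy`, not with plain `docker-compose up`.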
That doesn't suggest zero downtime though, no? Still needs an LB to know to stop routing to that host for a moment.
That's how I do deployments, but they take a while to start/stop. Whereas, with uwsgi, for example, deployments are zero-downtime, since uwsgi loads a new interpreter and uses that for new connections from that point on, without interrupting any old connections.
For your use case, containers aren't overkill, but a full orchestration system probably is. Letting some simple outside process handle starting them up is fine, and Swarm seems to have improved enough that you can use it for single-computer "keep my app running with x instances" stuff with no overhead.
I'll go against the grain here, and say that Vagrant + Ansible (or your favorite config management tool) will be easier to handle. It's well understood, simple, and you can try out any config changes in your local Vagrant environment before running the same changes on production.
At least in my mind, it's much more simple to say "OK, I installed these packages, let me add that to Ansible" than it is to get a production-ready Docker setup going.
You can try rkt[1] instead. It's a container runtime from CoreOS which makes a lot of things easier than docker.
Running it in a simple production setup is simply writing a systemd/initd job which starts the container. No container management daemon or orchestration framework involved.
> Containerisation seems like the only way to do this that doesn't involve keeping packages and system configurations in sync in two or more systems.
In a nutshell, this is why I'm now hooked on docker. I can reproducibly build things on my macbook without tearing up the system packages, and I can deploy them to my small datacenter without thinking twice.
I'd suggest you at least try it out.
For single-node applications, I develop on a LXC setup with a base template of the distribution that will run a production VM. This combination provides maximum dev/prod parity, the benefits of lightweight virtualization for development, and a boring, battle-tested production environment. The setup and deployment is written once for the choice distribution.
> Containerisation seems like the only way to do this that doesn't involve keeping packages and system configurations in sync in two or more systems.
Virtual machines will also work.
Docker serves as a lightweight virtualization that will provide the same experience, assuming you are willing to keep to the kernel and Docker version "in sync" between prod and local.
Lately I've been sending a bunch of patches upstream to the runv project (https://github.com/hyperhq/runv). Turns out that wrapping the docker interface with full VM isolation is a model I very much like.
If you only care about controlling the software configuration and versions, nix (nixos.org/nix/) will do this far more elegantly than docker.
Does your web app have a database?
Does this mean the qcow2 disk space usage in Mac is fixed?
Yep, just tested it. The qcow2 disk space gets reclaimed on Docker restart.
Sweet. Now time to play. I followed that bug until I got tired of it. Took a couple of months! Good on them for fixing it. Thanks!
As much as I welcome the CLI cleanup, I can't stop thinking that the 'docker ps -> docker container ls' change makes no sense to anyone who has any experience with bsd/unix/linux systems. Seriously, why?
I agree. It looks like `docker ps` still works so it's nothing to really be concerned about just yet.
`docker ps` will never be removed.
Secrets is a big one! Will really help speed up enterprise adoption.
Looks like there's a mistake about image pruning:
"Add -f to get rid of all unused images (ones with no containers running them)."
But the option is actually `-a` -- `-f` simply skips the prompt.
Oops. Thanks for bringing this to my attention. Fixing...
Like this?
docker rmi -af
I'm a bit confused by the backticks as I use them all the time scripting, but also in Markdown.
I have a gist for it: https://gist.github.com/pubkey/73dcb894cf5f7d262863
# stop and delete all containers
docker rm -f $(docker ps -a -q)
# delete all images
docker rmi -f $(docker images -q)
This is NOT equivalent. The OP was talking about removing unused images. Your commands remove all images.
Maybe this?
docker rm $(docker ps -qa --no-trunc --filter "status=exited")
docker rmi $(docker images --filter "dangling=true" -q --no-trunc)
fwiw, there's a new syntax for this, that is a bit more verbose, but probably worth adopting:
docker container rm $(docker container ls -qa)
docker image rm $(docker image ls -q)
Prune seems not that well thought out to me. Don't get me wrong, I do find it useful, but many people use containers as environments. Think about how many people are going to run prune only to find their work gone missing.
If you are gonna add a nuclear button, do it with a big red alert and give the option to whitelist some containers.
But that's really what `docker rm` is for, isn't it? I mean, if you want to only delete specific containers, use that. Prune has a specific purpose, which I think is very clear. If you're running the command, you (presumably) know what it should be doing.
I suppose you could argue it might be nice to be able to do something like `docker container prune startsWith*` or something similar. But on the other hand, that functionality is already available -- just use `docker rm` with xargs or something.
But the thing people complain about most isn't that they want to delete everything, but that they need docker rm, xargs, and complex bash foo to delete the containers and images they don't need.
For example I want to delete all old and all untagged versions of an image. I want to delete all stopped containers that use a specific image, or that were created more than two weeks ago. I want to delete all images starting with test.
Nuke everything? Not so much and to be honest this would be the easiest even with xargs and docker rm.
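For the record, a couple of these are already expressible with filters today (the image name here is hypothetical, and filter support varies by Docker version, so check `docker ps --help` first):

```shell
# stopped containers created from a specific image
docker rm $(docker ps -aq --filter status=exited --filter ancestor=test-image)

# untagged ("dangling") images
docker rmi $(docker images -q --filter dangling=true)
```

Time-based and name-prefix deletion still need external scripting, though.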
fyi, you do not need xargs.
`docker rm $(docker ps -q --filter blash)`
But agreed, `prune` is currently a sledgehammer and needs some refinement. It's not about not being well thought out; it's about getting something out there that can be built on top of.
Thanks, it was too long since I used filter and it wasn't that interesting. Seems much better now!
That's kind of like saying the 'rm' command is not well thought out, because many people wouldn't want to delete the whole file system when running 'rm -r /'.
Actually the rm command won't let you remove the root filesystem.
Also, when run in bash (and other shells support this too), you get extended pattern matching, letting you, for example, specify not which files you want deleted, but which files you want to keep.
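A quick sketch of that in bash (the scratch directory and file names are made up):

```shell
# work in a scratch directory with a few throwaway files
mkdir -p /tmp/extglob-demo
cd /tmp/extglob-demo
touch keep.txt a.log b.log

# enable extended pattern matching, then delete everything EXCEPT keep.txt
shopt -s extglob
rm -- !(keep.txt)

ls    # only keep.txt remains
```

`!(pattern)` matches anything except the pattern, so the `rm` never touches the file you want to keep.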
Curious what methods others use for handling secrets at build time (using docker-compose). I'm currently installing (private) dependencies at runtime by mounting my secrets as a volume. I couldn't find a method that didn't seem to have some risk of inadvertently exposing them.
There are only two methods that I'm aware of:
- Exposing the secrets on a (http) server that the Dockerfile can use to fetch
- What we use: Create a one time use secret that is destroyed after the image is built and before it is pushed.
> What we use: Create a one time use secret that is destroyed after the image is built and before it is pushed.
This approach has sparked my interest, could you post an example of any open source docker-compose file and/or associated scripts that would do this?
I did actually encounter this solution while researching the problem, didn't love it, but you can check out the solution at: https://github.com/docker/docker/issues/13490#issuecomment-1...
As long as you add the file and remove it in the same command it doesn't get committed as an extra layer, so the container won't have any history of the secrets. You'll run into problems if you do multiple RUN's or an ADD and then RUN.
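The shape of that trick, hedged heavily - the URL, key path, and repo below are all invented for illustration:

```dockerfile
FROM alpine:3.4
RUN apk add --no-cache git openssh-client

# Fetch the one-time key, use it, and delete it in a SINGLE RUN, so no
# committed layer ever contains the secret. Splitting this into separate
# ADD/RUN instructions would leak the key into the layer history.
RUN wget -q http://build-server.local:8000/deploy_key -O /tmp/deploy_key \
 && chmod 600 /tmp/deploy_key \
 && GIT_SSH_COMMAND="ssh -i /tmp/deploy_key -o StrictHostKeyChecking=no" \
      git clone git@github.com:myorg/private-dep.git /srv/private-dep \
 && rm /tmp/deploy_key
```

The key is served from a throwaway local server during the build and revoked afterwards, so even a leaked intermediate layer is useless.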
Stay tuned for `docker build` support for secrets, and more secret backends in later versions.
Why not one 'prune' command with 'containers', 'images', ... as an argument / subcommand?
Would have seemed more intuitive to me.
All of the other commands have been namespaced by what they deal with, so I think it makes more sense in the context of everything else.