Dockerfiles provide a simple syntax for building images. The following are a few tips and tricks to help you get the most out of Dockerfiles.
1: Use the cache
Each instruction in a Dockerfile commits its change into a new image, which is then used as the base for the next instruction. If an image already exists with the same parent and the same instruction (except for ADD), docker reuses that image instead of executing the instruction again. This is the cache.
To use the cache effectively, keep your Dockerfiles consistent and make your alterations only at the end. All my Dockerfiles start with the same 5 lines.
    FROM ubuntu
    MAINTAINER Michael Crosby <[email protected]>
    RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
    RUN apt-get update
    RUN apt-get upgrade -y
Changing the MAINTAINER instruction will force docker to execute the RUN instructions that follow it, updating apt again instead of hitting the cache.
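To make the caching rule concrete, here is a minimal sketch that counts how many leading lines two Dockerfiles share. It is only an approximation of what docker does (it compares raw lines, and the helper name shared_prefix_lines and both sample files are hypothetical), but it shows why a common prefix matters: only the instructions before the first difference can come from the cache.

```shell
#!/bin/sh
# Count how many leading lines two Dockerfiles have in common.
# Only this shared prefix can be reused from the cache when the
# second file is built after the first.
shared_prefix_lines() {
    count=0
    # Read both files in lockstep on file descriptors 3 and 4.
    while IFS= read -r a <&3 && IFS= read -r b <&4; do
        [ "$a" = "$b" ] || break
        count=$((count + 1))
    done 3<"$1" 4<"$2"
    echo "$count"
}

# Two hypothetical Dockerfiles that start with the same three lines.
dir=$(mktemp -d)
cat > "$dir/Dockerfile.a" <<'EOF'
FROM ubuntu
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y redis-server
EOF
cat > "$dir/Dockerfile.b" <<'EOF'
FROM ubuntu
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y nginx
EOF

# Prints 3: the first three instructions are identical.
shared_prefix_lines "$dir/Dockerfile.a" "$dir/Dockerfile.b"
```

If the differing line were moved to the top of one file, the count would drop to 0 and every later instruction would have to run again.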
1. Keep common instructions at the top of the Dockerfile to utilize the cache.
2: Use tags
Unless you are experimenting with docker, you should always pass the -t option to docker build so that the resulting image is tagged. A simple, human-readable tag will help you keep track of what each image was created for.
    docker build -t="crosbymichael/sentry" .
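If you build the same image often, a small wrapper keeps the tag consistent. This is a sketch only: the function name build_tagged, the DOCKER variable, and the version number are assumptions for illustration, and the repository:version form is an optional extension of the plain tag shown above.

```shell
#!/bin/sh
# DOCKER defaults to the real CLI. Set DOCKER=echo to print the
# command instead of running it (a dry run).
DOCKER=${DOCKER:-docker}

# Build the current directory and tag the result as <repository>:<version>.
build_tagged() {
    repository=$1
    version=$2
    $DOCKER build -t "$repository:$version" .
}

# Example (hypothetical version number):
#   build_tagged crosbymichael/sentry 1.0
```

Running it with DOCKER=echo first is a cheap way to check the exact command line before anything is built.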
2. Always pass -t to tag the resulting image.
3: EXPOSE-ing ports
Two of the core concepts of docker are repeatability and portability. Images should be able to run on any host and as many times as needed. With Dockerfiles you have the ability to map both the private and the public port; however, you should never map the public port in a Dockerfile. If you map to a public port on your host, you will only be able to have one instance of your dockerized app running.
    # private and public mapping
    EXPOSE 80:8080

    # private only
    EXPOSE 80
If the consumer of the image cares what public port the container maps to, they will pass the -p option when running the image; otherwise, docker will automatically assign a port for the container.
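The payoff of leaving the public port out of the Dockerfile is that the same image can be started several times on one host. A minimal sketch, assuming the host:container form of -p and an image that exposes private port 80; the function name run_on_ports, the DOCKER variable, and the image name example/webapp are hypothetical.

```shell
#!/bin/sh
# DOCKER defaults to the real CLI. Set DOCKER=echo for a dry run.
DOCKER=${DOCKER:-docker}

# Start one detached container per host port, all from the same image.
# Each container maps a different public (host) port to private port 80.
run_on_ports() {
    image=$1
    shift
    for host_port in "$@"; do
        $DOCKER run -d -p "$host_port:80" "$image"
    done
}

# Example: two instances of a hypothetical image on ports 8080 and 8081.
#   run_on_ports example/webapp 8080 8081
```

Had the Dockerfile fixed the public port, the second container would have nothing free to bind to.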
3. Never map the public port in a Dockerfile.
4: CMD and ENTRYPOINT syntax
Both CMD and ENTRYPOINT are straightforward, but they have a hidden, err, “feature” that can cause issues if you are not aware of it. Two different syntaxes are supported for these instructions.
    CMD /bin/echo
    # or
    CMD ["/bin/echo"]
This may not look like it would be an issue, but the devil is in the details, and it will trip you up. If you use the second syntax, where the CMD (or ENTRYPOINT) is an array, it acts exactly as you would expect. If you use the first syntax, without the array, docker prepends /bin/sh -c to your command. This has always been in docker as far as I can remember.
Prepending /bin/sh -c can cause unexpected behavior that is hard to understand if you do not know that docker modified your CMD. Therefore, you should always use the array syntax for both instructions, because both will then be executed exactly as you intended.
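You can see the difference without docker at all. The sketch below simulates the two forms on the host: the first command is what the shell form effectively becomes once /bin/sh -c is prepended, and the second passes its argument directly, as the array form does. The GREETING variable is an assumption made up for this demonstration.

```shell
#!/bin/sh
# A variable to make shell expansion visible.
GREETING=hello
export GREETING

# Shell form, e.g. CMD /bin/echo $GREETING
# The string goes through /bin/sh -c, so a shell expands $GREETING first.
shell_form=$(/bin/sh -c '/bin/echo $GREETING')

# Array form, e.g. CMD ["/bin/echo", "$GREETING"]
# The argument reaches /bin/echo untouched, so the text stays literal.
exec_form=$(/bin/echo '$GREETING')

echo "shell form: $shell_form"
echo "array form: $exec_form"
```

The shell form prints hello, while the array form prints the literal text $GREETING. The extra shell also means your program is no longer the process docker started directly.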
4. Always use the array syntax when using CMD and ENTRYPOINT.
5: CMD and ENTRYPOINT better together
In case you don’t know, ENTRYPOINT makes your dockerized application behave like a binary. You can pass arguments to the ENTRYPOINT during docker run and not worry about it being overwritten (unlike CMD). ENTRYPOINT is even better when used with CMD. Let’s check out my Rethinkdb Dockerfile and see how to use this.
    # Dockerfile for Rethinkdb
    # http://www.rethinkdb.com/

    FROM ubuntu
    MAINTAINER Michael Crosby <[email protected]>

    RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
    RUN apt-get update
    RUN apt-get upgrade -y

    RUN apt-get install -y python-software-properties
    RUN add-apt-repository ppa:rethinkdb/ppa
    RUN apt-get update
    RUN apt-get install -y rethinkdb

    # Rethinkdb process
    EXPOSE 28015
    # Rethinkdb admin console
    EXPOSE 8080

    # Create the /rethinkdb_data dir structure
    RUN /usr/bin/rethinkdb create

    ENTRYPOINT ["/usr/bin/rethinkdb"]
    CMD ["--help"]
This is everything that is required to get Rethinkdb dockerized. We have my standard 5 lines at the top to make sure the base image is updated, followed by the install steps, the exposed ports, and so on. With the ENTRYPOINT set, we know that whenever this image is run, all arguments passed during docker run will be arguments to the ENTRYPOINT (/usr/bin/rethinkdb).
I also have a default CMD set in the Dockerfile to --help. This means that if no arguments are passed during docker run, rethinkdb’s default help output will be displayed to the user. This is the same behavior you would expect when interacting with the rethinkdb binary directly.
    docker run crosbymichael/rethinkdb
Output
    Running 'rethinkdb' will create a new data directory or use an existing one,
    and serve as a RethinkDB cluster node.
    File path options:
      -d [ --directory ] path    specify directory to store data and metadata
      --io-threads n             how many simultaneous I/O operations can happen at the same time
    Machine name options:
      -n [ --machine-name ] arg  the name for this machine (as will appear in the metadata). If not specified, it will be randomly chosen from a short list of names.
    Network options:
      --bind {all | addr}        add the address of a local interface to listen on when accepting connections; loopback addresses are enabled by default
      --cluster-port port        port for receiving connections from other nodes
      --driver-port port         port for rethinkdb protocol client drivers
      -o [ --port-offset ] offset  all ports used locally will have this value added
      -j [ --join ] host:port    host and port of a rethinkdb node to connect to
    .................
Now let’s run the container with the --bind all argument.
    docker run crosbymichael/rethinkdb --bind all
Output
    info: Running rethinkdb 1.7.1-0ubuntu1~precise (GCC 4.6.3)...
    info: Running on Linux 3.2.0-45-virtual x86_64
    info: Loading data from directory /rethinkdb_data
    warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
    warn: Could not turn off filesystem caching for database file: "/rethinkdb_data/auth_metadata" (Is the file located on a filesystem that doesn't support direct I/O (e.g. some encrypted or journaled file systems)?) This can cause performance problems.
    info: Listening for intracluster connections on port 29015
    info: Listening for client driver connections on port 28015
    info: Listening for administrative HTTP connections on port 8080
    info: Listening on addresses: 127.0.0.1, 172.16.42.13
    info: Server ready
    info: Someone asked for the nonwhitelisted file /js/handlebars.runtime-1.0.0.beta.6.js, if this should be accessible add it to the whitelist.
And there it is: a full Rethinkdb instance running, with access to the db and the admin console, by interacting with the image the same way you interact with the binary. Very powerful and yet extremely simple. I love simple.
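The rule behind the two runs above is small enough to model in a few lines of shell. This is a simplified sketch, not docker's own code: the function final_command is hypothetical, and it only joins strings to show which arguments win.

```shell
#!/bin/sh
# Model of how the final command line is assembled:
#   $1        = the ENTRYPOINT
#   $2        = the default CMD from the Dockerfile
#   remaining = any arguments given after the image name on docker run
final_command() {
    entrypoint=$1
    default_cmd=$2
    shift 2
    if [ "$#" -gt 0 ]; then
        # Arguments from docker run replace the default CMD.
        echo "$entrypoint $*"
    else
        # No arguments: fall back to the default CMD.
        echo "$entrypoint $default_cmd"
    fi
}

# docker run crosbymichael/rethinkdb
final_command /usr/bin/rethinkdb --help

# docker run crosbymichael/rethinkdb --bind all
final_command /usr/bin/rethinkdb --help --bind all
```

The first call prints /usr/bin/rethinkdb --help and the second prints /usr/bin/rethinkdb --bind all. The ENTRYPOINT stays fixed in both cases; only the part supplied by CMD changes.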
5. ENTRYPOINT and CMD are better together.
I hope this post helps you to get started working with Dockerfiles and building images that we all can use and benefit from. Going forward, I believe that Dockerfiles will be a very important part of what makes docker so simple and easy to use whether you are consuming or producing images. I plan to invest much of my time to provide a complete, powerful, yet simple solution to building docker images via the Dockerfile.