The three laws of config dynamics

How to fight against entropy in your configuration files

The birth of configuration files

A long, long time ago, a developer wrote a simple web app to store the payrolls for his company. He used two different databases: the “production” database containing the real employees and salaries, and a “staging” database with dummy data he used while developing the app.

Once, late at night, he deployed a new feature and forgot to remove the hardcoded reference to the staging database:

mysql_connect("db-staging.example.com", "admin", "admin");

The next morning, the boss logged in to the system and found the employees’ names had been mysteriously replaced by Disney characters.

The boss was not amused.

The developer, determined never to repeat the mistake, decided to replace the hardcoded hostname with a variable.

The first configuration file was born.

; Do not store in SVN
[db]
host = db1.example.com
dbname = payrolls
user = admin
pass = s3cur3

Fast forward a few years and database credentials are still the first thing everyone puts in a config file. But now we also have API keys, third-party tokens and all kinds of crazy stuff with the annoying tendency to change.

At buildo, our web apps used to rely on two config files.

Our Scala back-end APIs would read their config values from a file named application.conf. It looked like this:

app {
  db = {
    url = "jdbc:postgresql://localhost:5432/app"
    user = "postgres"
    password = ""
    driver = org.h2.Driver
    keepAliveConnection = true
    queriesTimeoutSeconds = 1
    connectionTimeout = 1000
    numThreads = 1
  }
  interface = "0.0.0.0"
  port = 8082
  allowedHostnames = ["localhost"]
  allowedHeaders = ["Content-Type", "Authorization", "Cache-Control", "Pragma"]
  #Timeout for reading routine time from file
  routineTimeDataTimeout = 1
  localBackupPath = "/Users/fra/buildo/app/backup"
  serviceEndpoint = "http://localhost:8083"
  maximumItemsNumber = 100000000
  #Wait 5 seconds before killing all connections
  waitBeforeKillingConnectionsMillis = 5000
  #Size of the buffer used to read and write files
  streamBufferSize = 4096
}

The JS front-end clients did the same using a JSON file named config.json:

{
  "NODE_ENV": "development",
  "hostname": "localhost",
  "port": 9090,
  "apiEndpoint": "https://api-buildo-dev.example.com/v2",
  "gMapsAPIKey": "abcdefghijklmnopqrstuvwxyz",
  "title": "Yolo",
  "username": "test-temp@buildo.io",
  "password": "test",
  "debug": "state*,react-avenger*"
}

Both files were added to .gitignore and never committed to Git.

Wait, you should use environment variables!

You might be familiar with the Twelve-Factor App manifesto, which recommends storing all configuration in environment variables rather than in config files. Environment variables are a nice, language-independent way to store this information, but they do not change the underlying problem. In fact, once your app requires more than a few config variables, you will probably end up storing them in a shell include file:

export MY_APP_VAR1="foo"
export MY_APP_VAR2="bar"

Congratulations, you have just created an unversioned configuration file. You have also discovered the first law of config dynamics:

Config values can be transformed from one form to another, but can be neither created nor destroyed.

https://commons.wikimedia.org/wiki/File:Carnot_heat_engine_2.svg
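
To see the first law at work, here is a minimal sketch of what an app typically does with those exported variables. The `configFromEnv` helper is made up for illustration and the `MY_APP_` prefix is borrowed from the shell snippet above. It simply turns the environment back into the same key/value object a config file would have given you.

```javascript
// Hypothetical helper: collect every variable starting with `prefix`
// into a plain config object, stripping the prefix from the key.
function configFromEnv(prefix, env = process.env) {
  const config = {};
  for (const [name, value] of Object.entries(env)) {
    if (name.startsWith(prefix)) {
      config[name.slice(prefix.length)] = value;
    }
  }
  return config;
}

// The same values, in a different form.
const config = configFromEnv("MY_APP_", {
  MY_APP_VAR1: "foo",
  MY_APP_VAR2: "bar",
  HOME: "/home/dev",
});
```

The number of values you have to maintain is exactly the same as before. Only the container has changed.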

Sharing is caring

When I start working on a project and clone the repo for the first time, I often find out the project won’t start without a config file.

$ npm install && npm start
[...]
Error: Cannot find module './config.json'

I will usually ask on Slack and another developer will drag&drop a working version of the config file to a private chat. (Yes, this is silly. No, we’re not the only ones to do this.)

If you look again at the two examples above, you might also notice they are pretty long. This is because of the second law of config dynamics:

The total length of a config file can only increase over time.

https://commons.wikimedia.org/wiki/File:PSM_V76_D246_Demonstration_of_the_second_law_of_thermodynamics.png

I actually omitted some of the values from real files I was sent over Slack a few months ago. How can you tell if all these values are the right ones, or just custom settings someone was using temporarily to test an edge case?

The only way to reverse the second law is to use some (developer) energy and look for config variables that you rarely need to change. Maybe you don’t want to hardcode them in your codebase, but nobody stops you from committing them to the repo in some other form.
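
One possible way to spend that energy, sketched here as an assumption rather than as something we actually run: collect the config files your teammates are using and look for the entries that are identical in all of them. Those are good candidates for committed defaults. The `sharedValues` function is hypothetical.

```javascript
// Return the entries that have the same value in every config object.
// Values are compared by their JSON representation, which is good enough
// for the strings, numbers, booleans and arrays usually found in config files.
function sharedValues(configs) {
  const [first, ...rest] = configs;
  const shared = {};
  for (const [key, value] of Object.entries(first || {})) {
    const sameEverywhere = rest.every(
      (cfg) => JSON.stringify(cfg[key]) === JSON.stringify(value)
    );
    if (sameEverywhere) shared[key] = value;
  }
  return shared;
}

// Two made-up files, as they might arrive over Slack.
const candidates = sharedValues([
  { port: 9090, title: "Yolo", debug: "state*" },
  { port: 9090, title: "Test" },
]);
```

Anything that does not show up in the result is either a real per-developer setting or a leftover from someone's experiment, and deserves a second look.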

For Scala apps, we use Lightbend’s Config, which lets you define a reference.conf containing default values that can be safely committed.

Recently, we’ve started paying more attention to what goes in the reference.conf file, so that it's not just a skeleton but includes all the config values required to start the app.

If you want to override any of those values, you can either set a local env variable, or create an application.conf file that will not be committed as it's listed in .gitignore.

This is the beginning of one of our new-style reference.conf:

# This is the reference config file that contains all the default settings.
# Make your edits/overrides in your application.conf.

app {
  interface = "0.0.0.0"
  interface = ${?SERVICE_INTERFACE}
  port = 8080
  port = ${?SERVICE_PORT}
  ...
}
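
The `${?SERVICE_PORT}` syntax is an optional substitution: when the environment variable is not set, the second assignment is dropped and the default from the line above stays in place. For readers more at home on the front-end side, the same fallback can be sketched in JavaScript. The `withEnvOverride` helper is hypothetical and is not part of Lightbend's Config.

```javascript
// Return the env override when it is set, otherwise the committed default.
function withEnvOverride(defaultValue, envName, env = process.env) {
  const raw = env[envName];
  if (raw === undefined) return defaultValue;
  // Env variables are always strings: convert to a number
  // when the default is a number, so the type stays consistent.
  return typeof defaultValue === "number" ? Number(raw) : raw;
}

const serviceConfig = {
  interface: withEnvOverride("0.0.0.0", "SERVICE_INTERFACE"),
  port: withEnvOverride(8080, "SERVICE_PORT"),
};
```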

A similar thing happens for the front-ends, where we now always create a development.json file (committed) with some default values that can be overridden with an optional local.json file (not committed). We also create a production.json file with the default production settings. In this case we are not relying on an open-source library but we have written our own simple implementation.
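
Our implementation is not shown here, but a loader of this kind can be very small. The following is a minimal sketch, not our actual code. It assumes that all the JSON files live in a single directory, that the committed file is picked by `NODE_ENV`, and that local.json (when present) always wins. The names `readJsonIfExists` and `loadConfig` are made up for the example.

```javascript
const fs = require("fs");
const path = require("path");

// Read a JSON file, returning an empty object when the file does not exist.
function readJsonIfExists(file) {
  if (!fs.existsSync(file)) return {};
  return JSON.parse(fs.readFileSync(file, "utf8"));
}

// Load the committed defaults for the given environment, then apply
// the optional, uncommitted local.json on top (shallow merge, last one wins).
function loadConfig(configDir, env = process.env.NODE_ENV || "development") {
  const defaults = readJsonIfExists(path.join(configDir, `${env}.json`));
  const local = readJsonIfExists(path.join(configDir, "local.json"));
  return Object.assign({}, defaults, local);
}
```

A shallow merge is enough for a flat file like the config.json above. Nested sections would need a deep merge instead.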

This allowed us to transform, for example, this old CI build script:

echo '{
"NODE_ENV": "production",
"port": 9090,
"apiEndpoint": "/api",
"uglify": true,
"gzip": false,
"title": "Awesome App"
}' > config.json
npm run build

Into this new one:

NODE_ENV=production npm run build

A tale of many environments

The thing you should strive for is to have at least one default configuration that is committed to the repo and is enough to start the app successfully in a local dev environment.

This is also known as the third law of config dynamics:

The length of a perfect config file in a development environment is exactly equal to zero.

https://commons.wikimedia.org/wiki/File:Can_T%3D0_be_reached.jpg
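
If you want to keep the third law from silently breaking, one option is a small check that runs in CI against the committed defaults. This is only a sketch: the list of required keys is hypothetical (borrowed from the config.json example earlier) and every app will have its own.

```javascript
// Hypothetical list of keys the app cannot start without.
const REQUIRED_KEYS = ["hostname", "port", "apiEndpoint"];

// Return the required keys that the given config does not define.
function missingKeys(config, requiredKeys = REQUIRED_KEYS) {
  return requiredKeys.filter((key) => config[key] === undefined);
}

// Throw when the committed defaults alone are not enough to start the app.
function assertStartsFromDefaults(defaults) {
  const missing = missingKeys(defaults);
  if (missing.length > 0) {
    throw new Error(`Committed defaults are missing: ${missing.join(", ")}`);
  }
}
```

Run against the committed development defaults, a failure here means a fresh clone would need a file from Slack again.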

But what about your other environments? You probably want to deploy this app in production, and quite possibly you have a staging server with some subtle difference, like more verbose logging.

The first step is to minimize the unnecessary differences. If several environments can share exactly the same config, they probably should.

Then create override files for each environment. Store them in Git, either in the app repo or in a separate “infrastructure” repo. All developers in your team should be able to quickly find the config values for the different environments and, if necessary, apply the same values to their development environment.

Finally, make sure the artifacts you have versioned in Git are automatically deployed to the servers. Resist the temptation to SSH into the server and modify a config file manually. Use Ansible, Chef or another configuration management tool, or use Packer to bake new AMIs and deploy them with Terraform. Pick the tool you’re most familiar with, but always keep your config files in sync.

The power of Docker

We use Docker to package our applications, and this makes things easier by reducing the differences between our environments.

The following Docker Compose file will work fine both on a MacBook and on a production server. The API hostname will always be api, the db hostname will always be db and Docker will take care of pointing them to the right containers. No need to configure different hostnames for each environment!

services:
  web:
    image: quay.io/buildo/app-frontend
    ports:
    - "80:5000"
    links:
    - api
  api:
    image: quay.io/buildo/app-backend
    links:
    - db
  db:
    image: postgres

We are also using Docker’s support for multiple compose files to aggregate environment-specific configuration into a single file. With a quick glance at testing.yml you can see the testing environment uses some custom HTTP ports, enables a development token, and loads a custom db config.

services:
  web:
    ports:
      - 8008:5000
  api:
    environment:
      - "USE_DEVELOPMENT_TOKEN=true"
    ports:
      - 8005:8080
  db:
    volumes:
      - ./config/postgres/staging.conf:/usr/share/postgresql/postgresql.conf.sample

In this way, our Docker images are exactly the same for all environments, and we use Compose files to configure environment-specific settings, either via env variables or config files.

Env variables are often a good choice here because config values for different (micro)services can be easily embedded in the same Compose file, which is then committed to Git. As shown in the example above, when you think a config file is more appropriate you can bind-mount it using Docker volumes, but don’t forget that every config file referenced from a Compose file should be committed as well.

How to handle secrets

Sometimes configuration values can be too sensitive to be stored in Git, even if the repo is not publicly accessible. These include AWS secret keys, production API tokens and so on. Diogo Monica has recently argued you should not store them in environment variables either.

At buildo, we often use git-crypt to encrypt sensitive values, so that we can commit them to Git but they cannot be accessed without a whitelisted PGP key.

More complex solutions like Vault or Docker secrets would offer some advantages. This is something we’re still working on and might be the topic of a future blog post…

tl;dr

Please remember the three laws of config dynamics: moving config to env variables does not change the problem, regularly check if you have unnecessary config values, and make sure that your app can start with no config at all.

If you do, you will decrease config entropy and, ultimately, save a lot of time. This is sometimes referred to as the zeroth law of config dynamics:

If a config A is committed to Git and pulled by two devs B and C, then B and C will be in config equilibrium with each other.

http://hyperphysics.phy-astr.gsu.edu/hbase/thermo/thereq.html#c2

—

If you want to work in a place where we care about the quality of our development workflow, take a look at https://buildo.io/careers