If you are a CTO or a DevOps lead of some sort and have taken the initiative to adopt a continuous delivery style of shipping software, you are likely overwhelmed and wondering where to start. This is an opinionated roadmap for getting to the end result in several steps, delivering value to developers at each step.
Continuous Delivery?
There are many schools of thought on what continuous delivery, continuous integration and continuous deployment mean and represent, so let’s make it clear what we are talking about in this post.
In this write-up we define continuous delivery as the following process:
- Upon pushing our code changes, we start by running tests on a CI platform (think Travis, Circle CI or similar),
- then we build our application, if it needs building or packaging (e.g. your Go code needs compiling, your Python app can be packaged into a wheel, or you can build your Docker container),
- followed by an automatic or manual trigger of some sort that initiates the deployment of our application (if we automatically trigger a deploy after every successful build, we get continuous deployment).
Roadmap and Scenario
The process described above requires several components to be built or introduced into your overall system. Instead of trying to tackle everything at once, we can treat the development and adoption of continuous delivery as a lean product development cycle: we build something in every step and deliver it to the “customers”, the customers being the developers themselves, since the entire process brings value to them.
The scenario this roadmap is based on is fairly realistic: it draws on my personal experience and that of colleagues who worked for startups that gained traction, started expanding the team and then needed to adopt developer tooling to support continuous delivery.
Obviously there are numerous technologies you can pick to implement your continuous delivery pipeline, but in this scenario we will explore two broad approaches: one based on Docker and an alternative without Docker.
Let’s start our scenario with a typical setup where developers still deploy the web application by ssh-ing into a production machine, running git pull and then running a script that restarts the application.
Given these circumstances, we can outline the roadmap here:
- Automated deployment — in this step we will build a deployer component in our CD pipeline, making sure that we can deploy an application in a single step. If our team is currently deploying manually, we deliver value here by providing them with a single command to deploy.
- Staging environment — here we will introduce a separate instance of the application to deploy to, and use it for testing new features.
- Introduction of a CI environment — after being able to deploy in a single step, we will introduce a CI system that lets us test and build the application on a remote server. The value for developers here is being able to run the test suite remotely and possibly in parallel, thus speeding up the build considerably compared to running all tests locally in a single thread.
- Building/packaging of the app within the CI environment — a logical step after introducing the CI environment is to build the application (e.g. building the container and pushing it to a registry, or producing a Python wheel or similar) in the pipeline.
- Tooling for running database migrations — schema migrations in the context of continuous delivery require extra attention, so we will postpone solving this problem until this point in the roadmap.
- Fine tuning and making the most of our CI environment — since we now have the ability to run a process upon every code push, we can capitalize on this and make sure our code is automatically checked for quality, complexity and even security flaws.
- Review apps — this is an emerging pattern in the continuous delivery ecosystem and is supported by Heroku and CI systems like GitLab.
Automated deployment
To kick things off we will design a standalone system component that is conceptually capable of performing the following steps:
- Issuing a remote command that instructs the system to deploy an application at a certain version to the designated environment, the environment being production, staging or similar.
- As a next step the deployer component fetches the application build and the corresponding configuration from wherever you store them.
- As a last step the system deploys the application to the app servers and the static files to the CDN or what have you.
Luckily for us we don’t have to design such a system from scratch since the following solutions satisfy the desired system design:
- Kubernetes on its own is a perfect fit since it is a true application management layer on top of an OS and, together with Docker and its ecosystem, provides all the tooling required: we can simply grab the desired build from the Docker registry, configure the container via ConfigMaps and secrets, and finally perform a rolling update to deploy a new replica set (see the sketch after this list). Kubernetes can also be called remotely since the CLI tool kubectl just wraps a well designed RESTful API.
- Saltstack — could be a good alternative in case you don’t have a dockerized architecture. Saltstack is a master-minion type of config management system, where each application server is a minion and we have a master to which we can issue commands via a RESTful API (it even comes with a CLI wrapper named pepper).
- Other alternatives — we could conceptually achieve the same results with Ansible and its add-on Ansible Tower, since it implements a nice REST API for invoking a remote deployment command.
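To make the Kubernetes option concrete, here is a minimal sketch of what the deployer could run under the hood; the application, deployment and registry names (foo, registry.example.com) are placeholders:

```
# Roll out the image that the CI platform previously pushed to the registry
kubectl set image deployment/foo foo=registry.example.com/foo:build-123

# Watch the rolling update until the new replica set is fully available
kubectl rollout status deployment/foo

# Roll back to the previous revision if something looks wrong
kubectl rollout undo deployment/foo
```

Because kubectl is just a thin wrapper around the Kubernetes REST API, the deployer component can issue the same calls remotely.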
When building or setting up such a system we have to be sure to follow some design principles:
- We should be able to roll back to the exact version of the application, using exactly the same config and the same dependencies. In case we don’t use Docker, we should make sure that we cache the node_modules or virtualenv or whatever the equivalent dependency state is in the language we are using.
- Keep the application’s config and secrets decoupled from the application package and code, i.e. make sure that you don’t push secrets and credentials to your Docker image. At the same time we have to make sure that the config is versioned and not just overwritten, so that in the case of a rollback we know exactly how our application was configured.
Why is #1 important? Imagine a scenario where we don’t use Docker and a developer bumps a library version without realizing that this will break the app. If we simply cache the dependency state of the previously deployed app version on the local filesystem, we don’t need to play a guessing game, hoping to restore the previous environment by reinstalling the previous version of the library; we simply revert back to the EXACT same dependencies, config and app version. #2 is self-explanatory and a widely adopted practice from the twelve-factor app manifesto.
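For the non-Docker case, here is a minimal sketch of what caching the exact dependency state could look like for a Python app; the paths, build number and S3 bucket are hypothetical:

```
# Freeze the exact dependency versions this build was tested with
pip freeze > requirements.build-123.txt

# Package the app, its versioned config and the frozen dependencies together
tar czf foo-build-123.tar.gz app/ config/ requirements.build-123.txt

# Upload the artifact to the storage the deployer later fetches builds from
aws s3 cp foo-build-123.tar.gz s3://example-builds/foo/
```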
You can take all you have built in this step of the roadmap, simply wrap the REST API in a CLI tool and provide it to your dev team. It’s not perfect, but following the lean methodology of shipping products, we deliver value at each step.
$ deployer --app=foo --version=build-123 --env staging

If you are a developer and a DevOps person just made this CLI tool for you, relieving you of the tedious work of having to SSH into a machine (or even multiple machines!) and perform multiple steps to deploy, this intermediate CLI command is a breath of fresh air.
Staging Environment
Having a separate instance of the application isn’t exactly vital to the process of continuous delivery, but it is a widely adopted pattern for QA-ing, testing and gathering feedback on a new version of the application prior to the final production release. At the end of this section we will explore an alternative for QA-ing features without a staging environment.
The value of a staging environment for developers is pure peace of mind: they are able to release their changes to a neutral and separate instance of the application (neutral in the sense that it is not their own laptop or personal setup) before taking the effects of their code to production.
Having a staging environment also solves a lot of problems for continuous delivery with respect to deployment since we can implement a pattern named blue-green deployment.
The basic idea here is to run a staging green instance of your application in parallel to the production blue instance. Your blue instance is considered production and you use your green instance as staging. In front of both instances is a traffic router that can, depending on the type of application we are running, mirror traffic to the green instance for the purposes of performance QA. You simply integrate all the changes into your green instance and, once you are happy with them, instruct the router to start redirecting traffic to the green instance, which becomes your new production while blue becomes staging. The beauty of such a system design is also being able to trivially roll back by switching traffic back to the blue instance in case of an uncaught regression caused by the new changes.
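Assuming a Kubernetes setup where a Service routes traffic to either the blue or the green deployment via a label selector, the cut-over can be sketched as a single selector switch (the names are hypothetical):

```
# Point the production Service at the green deployment; blue keeps running as the rollback target
kubectl patch service foo -p '{"spec":{"selector":{"app":"foo","color":"green"}}}'

# Rolling back is just switching the selector back to blue
kubectl patch service foo -p '{"spec":{"selector":{"app":"foo","color":"blue"}}}'
```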
In this setup there are a lot of options for how to deal with database transactions, and this really depends on the nature of your application and data. One of the options, as per Martin Fowler, is to switch the app to read-only mode before the cut-over and then switch back to read-write mode afterwards.
An alternative to the staging environment pattern for the purposes of QA and gathering feedback are so-called feature flags (also known as feature toggles). This is a mechanism that hides specific features from users based on a built-in permissions system that defines whether a specific user has access to a certain feature. It enables developers to simply release the newly built feature to production, give permissions to QA users in the system and then test the feature directly in production. Feature flags aren’t exclusive to the staging environment pattern; they can be a complementary component of the release process.
CI Platform
CI Platform = you push code, this system is hooked to that event and it grabs the code, tests it, builds it and optionally triggers subsequent events such as deployment.
When we go out and start evaluating candidates for our CI platform, there are several important features we should be looking for:
- Support for pipelines — this is an emerging pattern among CI platforms that allows developers to configure stages in the build, e.g. test, lint code, compute test coverage, build, package, deploy etc.
- Cache — your builds will most likely become slow and being able to cache some part of the build is a valuable feature (perhaps you will want to cache a tarball that represents a dependency so that you don’t have to download it upon every build)
- Support for docker — being able to rely on running docker containers on our CI platform can prove to be very useful since we might have to set up some elaborate dependencies to run our test suite (e.g. running postgres, a headless browser etc.)
- Parallelization — nowadays many CI platforms support running tests in parallel in isolated containers, which can significantly speed up your build time.
- Easy build config — almost all modern CI platforms support hidden YML files (e.g. .gitlab-ci.yml or .travis.yml) that are committed into the repository. This proves to be really useful since it allows your developers to have a versioned recipe of your build pipeline close to the code and allows them to fine-tune the build system with little to no involvement of ops (a minimal example follows this list).
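For illustration, here is a minimal .gitlab-ci.yml sketch with the test/build/deploy stages discussed above; the image name, registry and deployer invocation are hypothetical:

```
stages:
  - test
  - build
  - deploy

test:
  stage: test
  script:
    - pip install -r requirements.txt
    - pytest

build:
  stage: build
  script:
    - docker build -t registry.example.com/foo:build-$CI_PIPELINE_ID .
    - docker push registry.example.com/foo:build-$CI_PIPELINE_ID

deploy_staging:
  stage: deploy
  script:
    - deployer --app=foo --version=build-$CI_PIPELINE_ID --env staging
  when: manual
```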
There are other things to consider when choosing a CI platform, and one of them is deciding whether we want to host the platform ourselves or use a multi-tenant hosted solution. When making such a decision it’s important that we weigh the security vs convenience tradeoff. Keep in mind that a hosted CI platform can save us a ton of time when trying to set up complex end-to-end tests that require us to spin up a ton of supporting infrastructure (e.g. Circle CI has a lot of databases already set up for you).
If you have followed the pattern of building a standalone deployment system that can be triggered via a REST API, you shouldn’t be worried about security as much: you aren’t letting a CI platform, which may even live outside the boundaries of your system, SSH into your machines. The responsibility of the CI platform stops at pushing the Docker image to your registry or uploading the application package to S3 or similar.
Once we introduce the CI platform to our tech stack, developers can start developing build pipelines and will be relieved of a lot of the stress of potentially breaking the app, since they will have an independent system running tests automatically for every build. A CI platform also brings accountability to team dynamics, since the team will be able to clearly see if coverage is going up or down due to someone pushing untested code (pssst... some CI systems have first class support for coverage data, or you can make use of coverage analysis tooling like coveralls).
Building the Application
Now that we have a system that is capable of deploying our application and a system that is testing it continuously (among other steps in the pipeline), we can establish a contract between the two:
- The CI platform is responsible for compiling the code and/or packaging it (e.g. a Python wheel, Ruby gem, Java jar etc.), minifying and concatenating the JavaScript static files, and then simply pushing the result to a pre-agreed location: a docker registry, an S3 folder, or the CI platform’s artifacts.
- When the deployer is instructed to deploy an application at a certain version, it simply fetches the packaged application from the pre-agreed location, unpacks it, installs the dependencies and installs the app onto the individual application servers (a sketch of the non-Docker variant follows this list). In case you use Docker+k8s this step is trivial, since all you need to do is deploy a new replica set based on an image that was pushed by the CI platform.
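For the non-Docker variant, the deployer side of this contract can be sketched as follows, mirroring the packaging sketch shown earlier (paths, bucket and service name are hypothetical):

```
# Fetch the exact build the CI platform uploaded to the pre-agreed location
aws s3 cp s3://example-builds/foo/foo-build-123.tar.gz /tmp/
mkdir -p /srv/foo/releases/build-123
tar xzf /tmp/foo-build-123.tar.gz -C /srv/foo/releases/build-123

# Install the dependencies that were frozen at build time, then restart the app
pip install -r /srv/foo/releases/build-123/requirements.build-123.txt
systemctl restart foo
```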
A common pitfall is to build the complexity of pulling code from git and compiling it into the deployer. Architecturally this defeats the purpose of a CI platform and the pipeline it implements, and with such a system design we would be duplicating the same functionality.
What’s the value for developers in this step? It’s not having to manually build the application locally for the purposes of deployment. In the ecosystem we have introduced into our tech stack everything happens on remote servers away from dev laptops.
At this point we can take different turns in our continuous delivery strategy, since all the necessary system components are mostly there. If we wanted to, we could throw away the wrapper deployer script introduced two steps back and develop a simple chatbot that updates a Slack channel when a new stable build is available on the master branch, plus a Slack command to deploy the app at that version. Building such a command is now trivial since all we need to do is call the deployer via its REST API (e.g. kubectl is just a wrapper).
Alternatively, if we have deep trust in our test suite, we can simply trigger a deployment automatically after every build in our CI platform, thus implementing continuous deployment.
Tooling for Database Migrations
So let’s first address how to deal with database migrations conceptually before designing any tooling for developers. Since database migrations are indeed a critical section in our continuous delivery pipeline, there is only one solution to the problem, and that solution is called preparation.
“By failing to prepare you prepare yourself to fail” — Benjamin Franklin
Database migrations should be separate from deployment: you shouldn’t couple schema changes to the deployment of the application, since that creates a point of no return in case you need a rollback.
We should always prepare the new version of the application to work with both the new and the old version of the database schema. Luckily for us, the majority of migrations are trivial (e.g. adding columns with either default or null values) and adapting the application to the old and new schema will not be that painful.
Once we have prepared the new version of the application, we first deploy the new version, commit the schema changes, make sure everything is working as expected and then remove support for the old schema version.
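A minimal sketch of this sequence for a hypothetical column rename in Postgres, driven with plain SQL through psql (the table, columns and build numbers are made up):

```
# 1. Expand: add the new column as nullable so the old app version keeps working
psql "$DATABASE_URL" -c "ALTER TABLE users ADD COLUMN display_name TEXT;"

# 2. Deploy the app version that reads both columns but writes to the new one
deployer --app=foo --version=build-124 --env production

# 3. Backfill, verify everything works, then contract: drop the old column
psql "$DATABASE_URL" -c "UPDATE users SET display_name = full_name WHERE display_name IS NULL;"
psql "$DATABASE_URL" -c "ALTER TABLE users DROP COLUMN full_name;"
```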
In terms of tooling, here is how we can deliver value to developers. Instead of having developers store credentials to production databases locally and putting the responsibility onto them and their laptops to carefully apply a database schema migration or even a data migration (perhaps they need to insert a ton of CSV data into the DB...?), we can provide them with tooling that relieves them of such a responsibility.
Personally I’ve seen one really good solution, implemented by my former colleague Marko Čelan, that is worth mentioning here. Marko designed a simple Slack chatbot that receives a simple command:
> developer: bot, I need a migration container at build #235
> bot: spinning up a new container for you ...
> bot: here is a container with your ssh key already installed on it ssh -p 28239 ubuntu@migration234.internal.domain
The idea is to spin up a new container based on a build number coming from the CI platform, putting into it the code, data and dependencies required to run a successful data or schema migration.
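A rough sketch of what such a bot could run behind the scenes, assuming the CI platform has already pushed an image tagged with the requested build number (the registry, image and variable names are hypothetical):

```
# Start a long-lived one-off container from the exact CI build, with prod DB credentials injected
docker run -d --name migration-235 \
  -e DATABASE_URL="$PROD_DATABASE_URL" \
  registry.example.com/foo:build-235 sleep infinity

# The bot then installs the requesting developer's SSH key inside the container
# and replies with the address to connect to
```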
This pattern of a chatbot spinning up dedicated containers that we ssh into and run migrations from solves the following problems:
- We don’t need developers carrying around prod DB credentials,
- for long running migrations, we don’t need to worry about developers losing their connection to production from local dev environments mid-migration,
- developers can coordinate the migrations via slack and even see who is running them in real time,
- in case of data heavy migrations, we don’t rely on the disks of developers’ laptops to carry around lots of data.
Fine Tuning
Since we have introduced a system that is capable of performing automated tasks based on our code, why not use this opportunity to ensure that the same system checks our code for quality, complexity, best practices and conventions?
The value for developers by introducing these checks into your CI pipeline is to make sure that the code committed is vetted by a machine before it reaches a fellow developer for a code review. Your code reviews can then be more focused on substance and not on “hey you should be following the PEP8 standard for naming this variable”.
In the wake of codeclimate, there are a ton of open source tools that can help you achieve almost the same functionality that codeclimate offers. Here are just a couple of examples of automated checks you can run for JS, Go and Python, but I am sure that with a little bit of googling you can find the same tools for other popular languages as well.
- Computation of the McCabe score aka cyclomatic complexity — by monitoring the number of linearly independent paths through a function, we can alert a developer that they have concocted a monster function that should probably be broken down or simplified before submitting that code review (example: eslint has support for cyclomatic complexity computation).
- Code formatting and linting — pretty much every language has a tool that can check your code for style, idioms and formatting and expose common coding pitfalls for that specific language, e.g. the Javascript ecosystem has eslint, Python has pylint, and lately coala.io is gaining traction in the community by providing a common CLI interface across the linting ecosystems of several languages.
- Tracking code coverage — we can also use the CI platform to continuously compute code coverage and prevent untested code from being pushed into the codebase. More important than computing coverage for a single build is keeping an eye on coverage statistics over time: test suite execution duration, number of tests and overall coverage. For example, if the number of test cases increases linearly over time while your test suite execution time increases exponentially, then you must be doing something wrong. There is a great service for tracking code coverage called coveralls.io and there’s also this DIY CLI tool (disclaimer: this is my hack) that parses junit xml files and pushes observed metrics to your metrics backend. A few example invocations of these checks follow this list.
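Here are some example invocations that can be dropped straight into a CI pipeline; the thresholds are arbitrary and the commands assume eslint, pylint and coverage.py are already installed:

```
# Fail the build when a function exceeds a cyclomatic complexity of 10 (JavaScript)
npx eslint --rule '{"complexity": ["error", 10]}' src/

# Lint the Python code and fail below a minimum score
pylint --fail-under=8 app/

# Run the test suite under coverage and fail if overall coverage drops below 80%
coverage run -m pytest && coverage report --fail-under=80
```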
Review Apps
An emerging pattern in the process of reviewing app changes is to orchestrate the creation of a new instance of the application (or parts of it) for every new branch pushed to CI and then to destroy the instance once the branch is deleted.
The pattern of review apps is supported by both Heroku and GitLab CI.
Using this pattern, we enable every developer, or even a group of developers working on a feature or topic branch, to push their changes and obtain a completely isolated instance of the application, which they can play around with and use for demoing the newly created feature to their teammates, product managers and/or QA engineers/testers.
As mentioned, Heroku supports this out of the box, but if you don’t have the luxury of working with Heroku, you can build your own solution on top of GitLab CI. There are some challenges to overcome: creating a new instance of the application is trivial using Docker and Kubernetes, but the non-trivial part is how to spin up a new instance of the app alongside a separate database with representative data. One option here is to maintain representative fixtures alongside your code that developers can use both for populating a review app and for writing integration tests.
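A hedged sketch of the Kubernetes side of such a do-it-yourself review app, driven from a GitLab CI job (the namespace layout and manifest path are hypothetical):

```
# On every branch push: create an isolated namespace and deploy the freshly built image into it
kubectl create namespace "review-${CI_COMMIT_REF_SLUG}"
kubectl --namespace "review-${CI_COMMIT_REF_SLUG}" apply -f k8s/review-app.yaml

# When the branch is deleted: tear the whole namespace (app, database, fixtures) down again
kubectl delete namespace "review-${CI_COMMIT_REF_SLUG}"
```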
Conclusion
This was a thinking essay on how I would approach introducing continuous delivery at a company that is growing or hasn’t yet adopted modern CD practices. Introducing CD can be overwhelming if you try to do a little of everything at once and then fail to deliver value to your team for a long time; that’s why the proposed roadmap above sticks to the principle of iterating and delivering value at each step.
About the author
Tomaž Kovačič is an engineering manager, advisor to companies and other fellow technical managers, meetup and conference organizer and a devops enthusiast.
Would you have done any of the steps differently? Are you currently trying to introduce CD to your workflow? Ping me, I’m happy to chat. (tomaz dot kovacic at gmail dot com)