When was the last time you asked your Ops person to help you with some task? Maybe it was resetting a VM that wasn’t responding, checking the logs for the CI server, or perhaps creating a new account for the developer who joined the company this morning.
Chances are the response fell somewhere between the following categories:
- “Please open a Jira ticket.”
- “Sure, I’ll do it as soon as I have fixed this production outage.”
- “I’m really busy today, can you ping me again tomorrow?”
- “Actually, you should be able to do it yourself. Here’s a link to the documentation.”
Most people would probably prefer to receive the last answer. However, that is still far from being the most common one, especially in small companies.
Ops people often think it’s easier to do something than to teach someone to do it, but for repeated tasks this is not true. The effort spent on documenting a task can be offset very quickly.
Documentation is also useful for the person who writes it: it is easier to follow a sequence of written steps than to rely on memory.
Empowering your team
This is the core of the DevOps philosophy:
we want to enable everyone to perform Ops tasks.
When developers are autonomous in performing operations, they’re also less likely to depend on other people to complete their work. If a task can be performed by several people, this creates redundancy, increasing robustness.
A short history of runbooks
You might have heard of a specific term for this kind of documentation: runbooks, or playbooks.
But what precisely is a runbook? I first encountered runbooks during my time at the BBC. All teams were required to create a runbook following a standard template so that the 24/7 team could effectively fix issues that occurred out of office hours.
Working in a smaller company, especially one where DevOps is preached and there is no separate Operations team, you might think that a structured, exhaustive runbook is not necessary as everyone will slowly learn how to perform essential tasks.
In reality, knowledge just doesn’t spread automatically. Devs will always find it easier to call someone else to perform Ops tasks. Ops will always struggle to prioritize documenting things over addressing urgent issues.
How to get there
If you don’t have any runbook at all, writing the first one might seem a daunting job. Our advice is to follow a simple rule:
If you are doing a task that you expect you will do again in the future, take some extra time and write down the steps that you perform.
This applies even more if it’s something that you have already done several times but that still does not have any documentation.
You might not have noticed, but you have just started your first runbook!
Choosing the right tool
How do you go from a single documented procedure to a collaborative runbook that you continuously improve?
One key thing is to remove as much friction as you can from the process of editing and improving a runbook. We started out using GitHub Wiki, but the editing experience was not pleasant. To make a small change you had to switch to “edit mode” and provide a commit message, and it wasn’t possible to leave a comment on an existing page.
We switched to Dropbox Paper and haven’t looked back since. We were already familiar with it since we use it as our primary collaboration tool at buildo. But even if you have never used it, it’s intuitive and easy to pick up. Edits are saved in real-time, and you don’t have to worry about other people editing the same document. You can quickly assign small tasks without resorting to an external tool, and we find the commenting feature extremely useful to discuss a section that needs improving.
Defining a template
Once you have written a few runbooks, you will probably find some recurring sections and themes. At this point (and not before!) it’s a good idea to consolidate these into a runbook template that can be used as a starting point for new projects.
We encourage you to find your own template but, just as a reference, these are some of the sections we have in ours:
- List of environments: a table where you list all the existing environments for an app, together with the hostname, the type of deployment, etc.
- CI: links to the CI environment, with a description of the different CI jobs.
- Running the app on your computer: dependencies, prerequisites, and setup instructions for new Devs that need to start working on the project on their local machine.
- Logs: where to find logs for all environments and services.
- Maintenance & troubleshooting: a list of procedures and short scripts that can be copy-pasted to address recurring situations (e.g., restarting an instance, cloning a database to your local machine, cleaning up disk space, etc.)
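As a rough illustration, a template like this could also live as a tiny script that produces an empty skeleton for a new project. This is only a sketch: the section titles come from the list above, while the function name `render_runbook_skeleton`, the Markdown-style headings, the `Owner:` line, and the placeholder hints are hypothetical choices made for the example.

```python
# Section titles are taken from the list above; the hints and the output
# format (Markdown-style headings) are assumptions made for illustration.
TEMPLATE_SECTIONS = [
    ("List of environments", "Table of environments: hostname, type of deployment, etc."),
    ("CI", "Links to the CI environment and a description of each CI job."),
    ("Running the app on your computer", "Dependencies, prerequisites, and setup instructions."),
    ("Logs", "Where to find logs for all environments and services."),
    ("Maintenance & troubleshooting", "Procedures and links to scripts for recurring situations."),
]


def render_runbook_skeleton(project: str, owner: str) -> str:
    """Return an empty runbook for `project` with one heading per template section."""
    lines = [f"# {project} runbook", f"Owner: {owner}", ""]
    for title, hint in TEMPLATE_SECTIONS:
        lines.append(f"## {title}")
        lines.append(f"TODO: {hint}")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical project and owner names.
    print(render_runbook_skeleton("example-app", "jane"))
```

The output can be pasted into whatever tool hosts your runbooks; the point is only that every new project starts from the same set of headings.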
A word about scripts
Many operations will involve some command-line script that must be copy-pasted into a terminal. While writing it down in a runbook is already an improvement over typing it by hand, or relying on your shell history (Ctrl+R, anyone?), we quickly realized that scripts longer than two lines are better treated as code; thus they should be versioned!
bash history !== knowledge
Instead of copy-pasting your bash code into the runbook, we advise keeping a scripts folder in each repo and having the runbook simply link to the relevant scripts.
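To make the linking step concrete, here is a minimal sketch of a helper that lists the files in a repo's scripts folder and builds one link per script, ready to paste into a runbook. The function name `script_links`, the `name: URL` output format, and the example base URL are all hypothetical; adapt them to wherever your repos are hosted.

```python
from pathlib import Path


def script_links(repo_root: Path, base_url: str, folder: str = "scripts") -> list[str]:
    """Build one 'name: URL' line per file found directly in the scripts folder."""
    scripts_dir = repo_root / folder
    links = []
    for path in sorted(scripts_dir.iterdir()):
        if path.is_file():  # skip sub-directories
            relative = path.relative_to(repo_root).as_posix()
            links.append(f"{path.name}: {base_url.rstrip('/')}/{relative}")
    return links


if __name__ == "__main__":
    # Hypothetical base URL; assumes the current directory is a repo root
    # that contains a scripts folder.
    for line in script_links(Path("."), "https://example.com/my-org/my-repo/blob/main"):
        print(line)
```

Because the runbook only holds links, the scripts themselves stay versioned and reviewed like any other code.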
Keeping runbooks up-to-date
As our runbooks grow, it gets easier for some sections to become outdated. We don’t have a silver bullet to avoid this, but we believe this is normal and outdated documentation is often better than no documentation at all.
If you find something that is outdated but don’t have time to fix it immediately, leave a quick note or add a task for someone to update it. Also, make sure every runbook has a clear owner, and write their name at the top of the document.
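If your runbooks can be exported as plain text, both habits are easy to check automatically. The sketch below assumes two conventions that are purely hypothetical: an `Owner: <name>` line within the first few lines of the document, and the word `OUTDATED` used as the marker for quick notes. The function name `check_runbook` is also made up for the example.

```python
def check_runbook(text: str, header_lines: int = 5) -> dict:
    """Report whether an owner is declared near the top and list outdated notes."""
    lines = text.splitlines()
    # An owner counts only if the "Owner:" line has a non-empty value.
    has_owner = any(
        line.strip().lower().startswith("owner:") and line.split(":", 1)[1].strip()
        for line in lines[:header_lines]
    )
    # Collect (line number, text) for every line carrying the OUTDATED marker.
    outdated = [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if "OUTDATED" in line
    ]
    return {"has_owner": has_owner, "outdated": outdated}


if __name__ == "__main__":
    sample = "Example runbook\nOwner: jane\n\nLogs\nOUTDATED: log path changed\n"
    print(check_runbook(sample))
```

A check like this does not keep the content fresh by itself, but it makes missing owners and pending notes visible.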
What we gained so far
At buildo, we started using runbooks only a few months ago. Now we have a runbook for every major project and tool in our company.
Some operations that only one or two people in the company could do are now routinely performed by different devs, sometimes even by project managers!
The conversation about Ops tasks has shifted to runbooks and it’s now common to see people saying on Slack “you can find it on the runbook 📋”, or “remember to add this to the runbook 👮”.
Finally, writing runbooks increased the overall transparency in buildo. Outdated or convoluted procedures have come under the spotlight and we have implicitly mapped our entire DevOps stack. Because every journey starts with a map.
__
If you want to work in a place where we take DevOps seriously, and we care about quality in our development workflow, take a look at https://buildo.io/careers