Paint it Black!

David Wobrock

How we formatted our Python codebase using Black

On improving code reviews through homogeneous and consistent code formatting.

The need

At Botify, code reviews are at the heart of our development process. We benefit from each developer’s vigilant eyes as the code is read, challenged and improved as a team. A great article about Botify’s review process is already available on the engineering blog.

However, we faced a recurring, irritating issue when doing reviews on our Python codebase: formatting.

Pull request reviews are paramount to a functional working codebase and we consider that they should be as efficient as possible. Part of that is removing a lot of the extraneous discussions through automatic tests and continuous integration (CI). During the review process, this allows us to entirely focus on the platform and how it works rather than just how the code looks.

Frequently, code review comments pointed out a missing trailing comma, which annoyed the author, polluted the produced diff and reduced our overall velocity.

We, as a team, aimed to fix this problem.

Formatting issues in pull request reviews are par for the course in any programming language, but this is especially true in Python.

We could try to conform to the standard style guide for Python code proposed by the famous PEP 8, but the specification is loose, and the ways of formatting code that comply with PEP 8 are as numerous as the programmers writing it.

We listed our requirements in order to find the best fitting tool:

  • An opinionated tool to avoid unnecessary discussions
  • A code format that we can all agree on
  • Fast code formatting
  • Fast code format assertion
  • Easy integration in our CI system

We benchmarked a few tools, including autopep8 and yapf, but one stood out for us: Black.

The tool

Black is described by its author Łukasz Langa as the uncompromising Python code formatter, which frees developers from painstaking manual code formatting. The aspects of Black that matter most to us are that it leaves no room for indecision on formatting, it minimises diffs and it strongly leans towards Python 3.

This lets developers expect a specific formatting when reading and reviewing Python code, which makes the process more pleasant: less brain-time is spent on unnecessary style questions and more on the code and its architecture.

In addition, since all of our other codebases at Botify were using opinionated code formatting tools (JavaScript using ESLint/Prettier and Golang using gofmt), we wanted our Python codebase to follow suit. The mentioned formatters most certainly inspired Black and its philosophy.

Snippet of our Project model, formatted with Black

The integration

To roll out the chosen tool and format our codebase, we did not want to force a rebase of the dozens of open pull requests. We therefore started by applying Black to all files that had not been modified since the beginning of the year. We felt that those files had a relatively high probability of not being edited in the next few weeks, as they were not part of active feature development.
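As a rough illustration of how such files could be selected, here is a minimal sketch. It assumes the project lives in a git repository and that the history of the current branch is a good enough proxy for recent activity. The function names are hypothetical and are not taken from our actual scripts.

```python
import subprocess
from typing import List, Set


def parse_name_only(git_output: str) -> Set[str]:
    """Collect the paths printed by `git log --name-only --pretty=format:`."""
    return {line.strip() for line in git_output.splitlines() if line.strip()}


def untouched_python_files(all_files: List[str], modified: Set[str]) -> List[str]:
    """Return the .py files that are absent from the set of modified paths."""
    return sorted(f for f in all_files if f.endswith(".py") and f not in modified)


def files_modified_since(since: str, repo: str = ".") -> Set[str]:
    """Ask git which paths were touched by any commit made after `since`."""
    result = subprocess.run(
        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_name_only(result.stdout)
```

Calling `untouched_python_files(candidates, files_modified_since("2018-01-01"))` with a list of candidate paths would then yield the files that are safe to format first. The date is only a placeholder, and files touched by open pull requests would still need to be excluded separately.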

Black has no built-in support for maintaining a partially formatted codebase, so we built on our experience with Prettier (the JavaScript equivalent), which uses a file pragma to mark formatted files. We set up some scripts around this concept to format the files, add the pragma, and then verify the formatting of only those files.

Example of the pragma in a file:

# -*- coding: utf-8 -*-
# @black 18.6b4
import …

The first application of the pragma was done in two steps: identifying the files that had not been modified since the beginning of the year, and then adding the pragma to those. It is worth noting that the pragma comment has to be inserted after all leading one-line comments in the file, to avoid breaking the file encoding declaration or the interpreter line (shebang).
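The insertion rule can be sketched as follows. This is a simplified illustration rather than our actual script: `add_pragma` is a hypothetical helper, and the pragma text is copied from the example above.

```python
PRAGMA = "# @black 18.6b4"  # version string copied from the example pragma


def add_pragma(source: str, pragma: str = PRAGMA) -> str:
    """Insert `pragma` right after the leading run of one-line comments.

    A shebang or encoding declaration only works on the first lines of a
    file, so the pragma goes below them rather than at the very top.
    """
    lines = source.splitlines(keepends=True)
    if any(line.strip() == pragma for line in lines):
        return source  # already marked, leave the file untouched

    index = 0
    while index < len(lines) and lines[index].startswith("#"):
        index += 1

    # If the file consists only of comments and lacks a final newline,
    # terminate the last comment before appending the pragma.
    if index > 0 and not lines[index - 1].endswith("\n"):
        lines[index - 1] += "\n"

    lines.insert(index, pragma + "\n")
    return "".join(lines)
```

The function is idempotent, so running it twice on the same file does not add a second pragma.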

Additionally, to avoid huge diffs and impossible code reviews, we followed a rule of creating a separate formatting commit which didn’t change any logic but only applied Black to the modified files. This allowed developers to skip formatting commits during reviews, as they know that the commits are deterministically applied by the tool and do not need reviewing.

Once the first change with the partial formatting was applied, we enforced that every modified file had to contain the formatting pragma, and thus incrementally applied Black to our entire codebase. With this strategy, we quickly rolled out formatting on the codebase. However, some corners of the code were left unchanged for quite some time, and nasty conflicts emerged when multiple patches changed the same unformatted files.

The benefit of having a formatter rapidly became clear to every developer, so we decided to swallow the pill and format the entire codebase in a one-time change. We planned to format the entire codebase (over 3,000 Python files and 400k Python LOC) and to remove our pragma implementation and scripts once done.
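The rule that every modified file must carry the pragma is easy to check mechanically. The sketch below is an assumption about how such a check might look, not our real implementation: it takes a hypothetical mapping of file paths to file contents and reports the Python files that lack the marker.

```python
import re
from typing import Dict, List

# Matches a line that starts with the pragma marker, whatever the version is.
PRAGMA_RE = re.compile(r"^# @black\b", re.MULTILINE)


def has_pragma(source: str) -> bool:
    """Tell whether the file content contains a `# @black` pragma line."""
    return PRAGMA_RE.search(source) is not None


def files_missing_pragma(sources: Dict[str, str]) -> List[str]:
    """Return the .py paths whose content does not carry the pragma."""
    return sorted(
        path
        for path, text in sources.items()
        if path.endswith(".py") and not has_pragma(text)
    )
```

A CI step could fail the build whenever `files_missing_pragma` returns a non-empty list for the files touched by a pull request.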

The fundamental work of formatting the code was done quickly by calling black on the entire codebase:

black botify/

Likewise, removing our pragma from the concerned files is a one-liner:

find botify/ -name "*.py" -type f -print | xargs sed -i.bak "/^# @black/d"

Pitfalls

  • On the formatting aspect itself, some of our older scripts contained Python 2 syntax that is incompatible with Python 3. Black failed gracefully on those files, which then required manual editing. Painful.
    Additionally, multiline type hint comments in Python 2 are not correctly handled by Black. The issue is known but has not been addressed at the time of writing. Since those comments are not that numerous in our codebase, we decided to fix them manually as they occur. However, they could become more painful if they appear more often.
  • Introducing the formatting incrementally seemed like a good idea at first, but it introduced some side problems along the way. It is definitely worth doing a one-shot action once everyone has integrated formatting into their workflow, and after all large pull requests have been merged, to avoid large conflicts just before a piece of work is accepted.
  • It is worth noting that Black is still in beta and might be subject to format changes. We are confident that if changes happen, they will be for the best. In any event, thanks to fixed version dependencies, our entire continuous integration pipeline will not fail when the Black maintainers decide on a change.
  • A bad side-effect of applying a formatter during the lifetime of a repository is that we slightly mess up the version control system’s history. For a lot of lines of code, the last modification is not a semantic change but only a style change. This might slow down the process of looking up the history of a portion of code.
  • On a large codebase, checking the format can take a long time, even for an efficient tool such as Black. On the developer’s computer, Black uses a cache to remember the already formatted files. On a CI server, however, no such cache exists and checking the formatting can add several minutes to each build. These add up quickly when multiple developers are suggesting several changes throughout the day. It might be worthwhile to only verify the style of the altered files, and therefore adopt a smarter strategy than naively checking the entire repository. Another possibility is to parallelize CI jobs by splitting format assertion, code linting and test executions.
  • Depending on your team’s development habits, it can be beneficial to provide an integration for automatic Black formatting in different environments. For example, we provided a sample configuration for the three editors used by our Python developers so that .py files are formatted automatically on save.
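For the CI cost mentioned in the pitfalls above, verifying only the altered files could look like the sketch below. It is an illustration under assumptions: the base branch name `origin/master` and the helper names are hypothetical, while `black --check` and `git diff --name-only --diff-filter=ACMR` are the standard options of both tools.

```python
import subprocess
from typing import List


def changed_python_files(diff_output: str) -> List[str]:
    """Keep the .py paths from the output of `git diff --name-only`."""
    return [line for line in diff_output.splitlines() if line.endswith(".py")]


def black_check_command(files: List[str]) -> List[str]:
    """Build a `black --check` invocation for the given files."""
    return ["black", "--check", *files]


def check_changed_files(base: str = "origin/master") -> int:
    """Run Black in check mode on the files changed relative to `base`.

    Returns Black's exit code, or 0 when no Python file was changed.
    """
    diff = subprocess.run(
        # ACMR keeps added, copied, modified and renamed files (no deletions).
        ["git", "diff", "--name-only", "--diff-filter=ACMR", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    files = changed_python_files(diff.stdout)
    if not files:
        return 0  # nothing to verify
    return subprocess.run(black_check_command(files)).returncode
```

A CI job could call `check_changed_files()` and exit with the returned code. A full-repository check can still run less frequently to catch anything the diff-based check misses.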

Lessons learned

Now that formatting is completely enabled in our Python codebase, we are fully embracing the benefits it brings in terms of code readability and homogeneity. Making the change in a concentrated one-time effort allowed us to roll out Black quickly and painlessly.

Interested in joining us? We’re hiring! Don’t hesitate to send us a resume even if there are no open positions that match your skills. We are always on the lookout for passionate people.