A tour of invariant techniques

12 min read Original article ↗

Emmanuel Stephan

Tales of Software Engineering — 4

Invariants are a very useful, but usually ignored, technique to reduce the number of bugs in code and speed-up development. There are several types of invariants, but they all revolve around this simple idea: verify that certain conditions hold at specific points in the code, usually object or method boundaries. Invariants are checkpoints in your code that assert predicates to police the state and behavior of your code. With some discipline, this is a powerful technique to reason about the code, and to reach correctness faster.

Class invariants

Let’s look at a first example, actually a counter-example, an example of something you might not want to do, unless you really have very good reasons.

Press enter or click to view image in full size

Here, we have a class very similar to std::pair, which contains only 2 pointers. It looks innocent enough, but there is no indication of whether these 2 pointers can be nullptr or not. We do not know if it is possible to have del_index be nullptr, but not add_ptr for example (in other words, if we can rely on the fact that add_index is always present when using a LookupData, but del_ptr is optional). What happens here is that the clients of this struct are left to their own devices and can make any assumption they want about these 2 pointers. If LookupData is used in many places in the code, it is very possible that different or even contradictory assumptions will be made about these 2 pointers, which will at least make the code harder to refactor. A better solution is to use encapsulation together with invariants:

Press enter or click to view image in full size

Here, we make the object immutable after construction (there are no setters). This makes it really easy to assert in the constructor that both pointers should not be nullptr. Now we have a properly defined state for instances of LookupData, and the assumptions are explicit, for everybody to see when they look at the constructor. This will also simplify the code, because the users of getAddIndex() and getDelIndex() can now be guaranteed that the pointers they get are not nullptr. They don’t need to check it on their end.

Let’s asssume that in the application that uses this LookupData, it turned out that del_index could be nullptr. For example, LookupData is used in a DB to return pointers to Indexes for a given key: there is always an index of additions to the DB for a given key, but sometimes there simply aren’t any deletions for that key (the deletions are stored instead as well as the additions for speed and to avoid locks). In fact, we could also imagine that the fact that del_index could be nullptr was discovered precisely because one of the test cases failed. That would be very easy to debug, and would give more insight into the functioning of the system. Fixing would be equally easy, by just relaxing the second invariant. The invariants in that case would be a way to better understand the overall system, like a learning guide in case the specification for LookupData was not communicated properly. Here is the code fixed with this learning. It is good to leave the assertion in place, but commented out, and to leave a note there, so that future maintainers of the code do not reintroduce the original bug. So, the lesson here is that systematically thinking about invariants is a great way to understand your code more deeply, which will invariably improve its quality. When you design a new class, you should right away ask yourself: what invariants hold for this class?

Press enter or click to view image in full size

A few things to note: you want to assert as much as you can upfront and then relax the assertions, rather than the other way around — once your code is in use, relaxing is usually easier than tightening ; invariants work great with encapsulation and immutability — it’s harder to check invariants if a portion of a class state is mutable ; as you relax invariants, document abundantly — the game is to make assumptions explicit, because implicit assumptions kill (i.e. introduce bugs).

Let’s look at another example in C++, where this time the invariants are made explicit by having their own checking method:

Press enter or click to view image in full size

Note that the first constructor needs to assert one more thing than is verified in the check_invariants() method: no problem! And that check is of course very important, it’s a trap you put for users of that class (which include you as you write more code that uses class Bytes). It will help weed out bugs faster. In this example, we also verify a static (compile time) invariant, that the value_type has a size of exactly 1 byte. This is very valuable to communicate to users of your class, and can be leveraged by algorithms that use this class.

Invariants can become quite complex in classes that have a lot of state, but that is all the more reason to ask yourself what the invariants should be. Here is a fragment of Java code with two classes that interact to verify invariants:

Press enter or click to view image in full size

In this example, the twoclasses DirectoryPair and Experiment both have their invariant checks, which are verified at the end of their respective constructors, and DirectoryPair constrains the possible values for ownerType quite tightly. Another solution for that would be to create an enum with two values. Types are very good tools to enforce invariants in code. Note also that the strategy here to flag an invariant failure is to return an Exception rather than assert. In this case, the invariants can presumably be recovered from, whereas in previous examples, any failure could not be recovered from. This depends on the situation at hand, but the invariant technique can be useful in both cases.

Relational invariants

Here is a fragment of a complex invariant check for a ConfigurationManager class that has “entities” and “externalResources”:

Press enter or click to view image in full size

The invariant makes sure that a certain dependency graph (defined by the order in which the Configuration objects are processed) holds between components of the ConfigurationManager class: a resource can depend only on resources that have seen before in the Configuration. It cannot be verified in the constructor of the Configuration objects, because it asserts something about the relations between several Configuration objects. It is nonetheless extremely valuable for the correctness of further algorithms.

Health-check invariants

Sometimes the invariant-upon-construction technique can become an outright health-check for the system. For example, the ConfigurationManager class from which we saw a snippet just before actually verifies the presence of certain resources in the system and the configuration of these resources as part of its construction, as well as some relational constraints between these resources (the part we already saw, which is in _graph.checkInvariant()). This may seem like a lot of work, but object construction is a good place to do all that work, and it pays handsome dividends later when writing the rest of the class.

Press enter or click to view image in full size

Let’s now look at another type of invariant: method invariants.

Method invariants

Class invariants are very powerful when used in conjunction with encapsulation and immutability. But the technique is not limited to classes. It can be applied to method calls too. When you do that, you setup more checkpoints that will prevent more bugs. Here is our first example, a C++ static template method that decodes variable byte encoded positive integers pointed to by an iterator.

Press enter or click to view image in full size

Here, we use 2 static pre-conditions to mare sure that we work with positive integers, and that the iterator iterates on bytes (the algorithm is incorrect otherwise). Note that passing a reference to val is superior to passing a T* (which is often seen), because in C++, references cannot be null. So, as we mentioned earlier, types themselves enforce invariants (here that val has already be allocated somewhere). There can be a lot of benefit for correctness to work with types which have a domain as small as possible and to relax these domains later if needed.

In this C++ example also, the invariants are checked at compile time, which is again superior to runtime verification, in the sense that the compiler won’t even let you run anything that does not respect the invariants. One final note: there are still implicit assumptions in this code that could be made explicit. Can you find them? Here is one: the iterator is assumed to be a forward iterator. On the other hand, nothing more is required from the iterator. In particular, we don’t assume that the vbe-encoded value lies in a contiguous chunk of memory. Here again, we see how thinking about invariant/pre-conditions deepens the understanding of the code.

Here is another example, in Java this time:

Press enter or click to view image in full size

Here the runtime pre-conditions to calling removeEdge are neatly isolated in a scope (that can be folded in any decent IDE). 3 conditions are verified: that src and dst are indeed node names in this Graph, and that there is an edge between src and dst. Of course, it is also possible to just return silently if any of these conditions does not hold. But catching them is more valuable, as it can be indicative of a bug in the client code that calls this function. This kind of aggressive pre-condition can be very efficient to catch logical bugs.

Post-conditions are rarer, but can be very useful too. If the class invariants and pre-conditions on all methods are tight, there might be less use for post-conditions in methods, except when the body of the method contains complex logic. These pieces of code can also usually benefit from loop invariants. Post-conditions can help guarantee that the results satisfy some basic conditions when the complex code that produces these results is tweaked/refactored. Here is an example:

Press enter or click to view image in full size

This function receives 4 lists, 2 of them sorted, 2 of them unsorted. Imagine that this is coming from a lock-free, log-style data structure that stores elements added and elements removed (“dels”) separately. New “adds” and “dels” are appended as fast as possible to a “delta” section, and a separate thread regularly “compacts” the data structure to produce sorted “add” and “del” sections including the new “adds” and “dels” that were in the delta sections. At the end of compaction, the add and del delta sections are empty and ready to receive more data. The “minus” function is in change of computing the difference between the “adds” and the “dels”, i.e. to produce a list of elements where those who were deleted do not appear.

To write this algorithm, pre- and post-conditions were first written, before anything else:

Press enter or click to view image in full size

Then a loop invariant was established, based on the post-condition and the fact that the algorithm would gradually build the merged result:

Press enter or click to view image in full size

Here, the invariant is immediately useful to detect a bug… can you find it? It is on line 41, where i needs to be incremented at the same time as j:

Press enter or click to view image in full size

With that fix, and thanks to the invariant, the algorithm (a simple one, admittedly) can be written correctly in record time. This example also illustrates how to use a known implementation of an algorithm to check a hand-written variant: after all, we could have made the code much simpler by just calling Python’s set.difference to compute merged. But in fact, this code was implemented because it is cheaper to work on lists than to allocate sets.

Other considerations

There can be a distinction between invariant checks that occur as part of development, where a developer makes assumptions explicit and sets up careful traps to catch “endogenous” development bugs, and “exogenous” runtime error conditions, that are meant to be surfaced to the user. The latter are usually related to user input mistakes or to the environment in which the code runs (e.g. resource unavailability).

Since invariant checks of the first kind are supposed to catch mistakes made by the developer, one can ask whether they should be removed once development is “complete”. But that is easy to answer, since development is usually never “complete”. There is always another refactoring on the horizon, and the invariant checks will be invaluable when refactoring. Better to leave them in the code. Of course, there are sometimes cases where an invariant, maybe a loop invariant inside an algorithm, can be removed once the algorithms has been thoroughly tested (with many unit tests including corner cases). This is usually done for speed.

And this is another concern you hear often, and that is actually a valid concern: invariant checks can slow down the code. If that happens, invariants can be guarded by an if-statement, or removed from the release version of the code entirely at compile time (like C assert). It is probably best to keep the invariants and make them conditional, rather than removing them entirely from the code base. For two reasons: when a bug shows up in the released code, reproducing the code with the invariant checks turned on can be very productive, and again, when refactoring time comes, the invariants will help with development velocity. Note that if invariant checking is made conditional, there is always the possibility that the correctness of the code was shown with asserts, but that nothing was shown about the correctness of the code without asserts. Here, one has to be very careful that the invariant checks do not introduce side-effects. Invariant checks should be purely functional predicates.

Conclusion

Systematically ask yourself what invariants hold for your classes, and what conditions need to be true of the arguments to any method. You cannot assert too much. At the very least, trying to find invariants makes you think about the code more deeply, and bring to the light implicit assumptions, always a good thing. Assert defensively and aggressively, relax later. Use invariants together with immutability, and in conjunction with copious amounts of unit tests. This will mechanize correctness verification, and speed up development, as well as inevitable refactorings.