Comment commentary - NFHN Reader

A recent highly-voted-up question on Programmers asked what’s wrong with comments that explain complex code? I think quite a bit about how I comment my code but rather than posting an answer on Programmers I thought I’d blog about it a bit here. I already discussed this topic — HOLY GOODNESS TEN YEARS AGO wow I have been writing this blog for a long time — and everything I said then still applies. But today thanks to Roslyn being open-sourced there is now a large corpus of my code on the internet so I can talk about my comment strategy in the context of real production code.

The code which does semantic analysis of user-defined explicit conversions in C# is a pretty good example.

To start with, notice that the code is in a partial class and every method is private. Because these are private implementation details of a larger class, not public API methods for consumption by Roslyn’s users, there are no XML doc comments anywhere.

Next, notice that every method has an absolutely immense name, like ComputeApplicableUserDefinedExplicitConversionSet. These names clearly explain what the method does using the jargon of the specification; if you want to know what “applicable” means, read the specification. By making the names of methods so verbose I can do things like search the entire source code text for “Applicable” and know that I’ll find all the methods that compute applicability. Codebases where the developers use abbrevs lk n emo 13yo drive me nuts; naming this method CompAppUDEC does no one any favours. Long descriptive names decrease the need for comments because the code reflects its purpose without comments.

Let’s look at some statistics. This file has 108 lines that are either blank or contain a single brace, 165 lines of comments, 15 lines of precondition assertions, and 122 lines of meaningful code. The comments-to-real-code ratio seems huge; more than one line of commentary per line of code! But if you take a closer look at those comments, you see that one line is the copyright, only three lines are single-line comments explaining the meaning of the next line of code, 39 lines are direct quotes from the specification, and the remaining 122 lines of comments are divided up into seven blocks, several of which are quite long — more than 40 lines in one case.

What’s really going on here should now be clear. The code is structured so that every method implements a particular paragraph of the specification. Each line of code that is motivated by a given line of the specification has that line quoted. This way I can go back and forth between the text of the specification and the text of the code easily. When the spec changes I can do a search on the changed line and then update the code and quoted spec line. And when I find a bug in the implementation, I can easily find the corresponding section in the specification.

The really big comments are places where the code deliberately diverges from the specification in order to maintain a bug from the previous version because that is precisely what needs a huge amount of explanation! Those big comments are there to ensure that no one maintaining the code in the future says “oh, this is a spec violation and therefore a bug, I will fix it”, and to ensure that the consequences of every violation of the specification can be clearly understood.

And of course by reading these comments and this code you probably now know far more about explicit conversion processing than any sane human being ought to know. This is one of the most confusing parts of the specification, so it is no wonder that the previous implementation was buggy, and that Roslyn has to be bug-for-bug compatible. People often ask why it is that Roslyn took so long; remember, I and my colleagues had to discover all those bugs, deduce what the previous compiler’s behavior was, and then copy that behavior into Roslyn, and then write huge numbers of regression tests to ensure that the bug stayed enshrined forever. Not to mention the arguments about whether the spec ought to be updated to match the compiler. It’s a lot of work rewriting a ten-year-old compiler in a new language!