Test coverage only matters if it's at 100%

dein.fr

23 points by charlax 6 years ago · 52 comments

cbanek 6 years ago

As someone who worked somewhere 100% code coverage was practically required: while it does point out flaws in your testing where you may have "missed a spot," it still doesn't prove that you're testing everything. You never know if a library function you are calling beneath your code has a different path it can also follow.

> I believe the only level of test coverage that's worth it is 100% of the lines you want to be covered.

This statement I vehemently disagree with. It seems like the kind of thing someone would hit you with as an excuse for why testing isn't useful at all because it's not perfect. I have found in my experience that simple end-to-end tests give me the best bang for my buck, and even a test suite that covers 80% of the code is pretty good.

To paraphrase from the Google code reviewer guide: No code is perfect, just try to make it better than it was.

100% test coverage doesn't tell you if you have race conditions, security problems, or networking issues. 100% test coverage doesn't matter much in distributed systems, where you can individually test each component to 100% but still have the system not work as a whole. 100% coverage doesn't mean you're checking that all your returned results are verified and what they should be.

> The problem with having a goal that is not 100% is that it leaves room for interpretation and negotiation.

By handing out edicts like these, you're taking away tools like negotiation and common sense and replacing them with blanket rules. Time is a finite resource and test coverage is only one small part of testing. Going from 95% to 100% might take more time and provide less value than other kinds of testing and other concepts (like UX, or market fit, where you test that what you're building is usable and useful).

Just because you answer all the questions right on the test doesn't mean you have perfect knowledge of the situation. Thinking so is hubris and leads to errors/defects.

By excluding lines and saying you only need to cover what you choose to test, you're intentionally creating blind spots. Just let the coverage number be what it is. Look at your coverage reports. Check the coverage of key components.

> I think it is a perfectly fine decision to exclude some code from coverage if the tests are too expensive to write (because of the required setup for instance).

No, no, no, no, no x a million. This is the fallacy that if you unit test each component, then when you put them together the result is perfect because each part is perfect. You still have to do integration tests and complicated-setup tests. If a test takes a long time, try to make it shorter; don't try to avoid testing it. Anything you avoid testing is where bugs will nest and code will churn.

To sum up, most of the important, complicated, and hairy bugs are not found by unit testing. It is the simple bugs that are found by unit testing. These are important to get out of the way, because a thousand cuts can still kill a system, but in no way does 100% coverage with all tests passing mean anything other than 100% coverage with all tests passing.

  • virgilp 6 years ago

    I would go further and claim that 100% test coverage is worse than 90% test coverage. You're probably testing irrelevant stuff - which just means you have more test code to maintain, and also likely means your tests are bad (i.e. not testing business logic, but testing irrelevant implementation details). Last, but not least, 100% test coverage means that your code is bad / you're not programming defensively: how exactly did you cover all those error-handling bits for unexpected exceptions, the "stuff that theoretically should never happen, but in practice it might, so let's have some sensible treatment/logging"?
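
    A minimal sketch of what I mean by those error-handling bits (my own example, not from the article) - the final guard is essentially unreachable from a test:

      import json
      import logging

      def read_config(path: str) -> dict:
          try:
              with open(path) as f:
                  return json.load(f)
          except (OSError, ValueError):  # expected failures, easy to test
              return {}
          except Exception:  # pragma: no cover - "should never happen" guard
              logging.exception("unexpected failure reading %s", path)
              return {}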

    • plorkyeran 6 years ago

      Or it could mean that you've contorted your code to make sure that you can hit every single defensive edge case, and in the process possibly introduced new bugs due to the increased complexity.

    • davidw 6 years ago

      Yep. At a certain point you start hitting diminishing returns and increased costs.

  • charlax (OP) 6 years ago

    Hey - author of the article here.

    Thanks for the detailed answer! I have answered some of your points and some of the points made in the comments below in the article.

    The article might have been perceived as more dogmatic than it meant to be. It is actually, I believe, a pretty pragmatic position. I tried to clarify the wording here and there, let me know if you still disagree with the point I'm making.

    ### 100% test coverage does not mean you're testing everything right

    Absolutely - this point is explicitly stated in this article. I even give an example situation showing how just checking the test coverage leads to missing an important test.

    100% test coverage does not make your code invulnerable, and it should obviously not be your only metric. This article is only about the test coverage metric.

    ### A test suite that covers 80% is pretty good

    Absolutely. It is a good number and a good goal for any codebase.

    However, what about the remaining 20%? Why is it not tested? Will it be clear in 2 months why it was not? In 6 months? In a year? While it may make perfect sense not to test it, you should be explicit about that decision and keep the reason in the code.

    If you don't keep the test coverage _metric_ at 100%, then you leave it up to the code reviewer to challenge your test coverage assumption.

    ### 100% is a blanket rule that leaves no room for negotiation

    Once again, the goal is not to cover 100% of the lines of code - that would be almost impossible. Thanks to `no cover` markers, you can still decide to exclude code from coverage. It actually makes this negotiation explicit in the code, as opposed to implicit during code review.

    Consider the example below:

      def toast(bread: str) -> str:
          if bread == "brioche":
              raise ValueError("...")
          return f"toasted {bread}"
    
    
    Let's say you're fixing a bug in this code, and you find out that the `if` branch is not covered. You're left to wonder why. Did the developer not have enough time? Did they decide it was too trivial to need a test?

    With an explicit marker, the intent is clear and gives you insight into what is considered appropriate coverage for _this_ codebase:

      def toast(bread: str) -> str:
          if bread == "brioche":  # pragma: no cover, trivial error case
              raise ValueError("...")
          return f"toasted {bread}"
    
    
    ### It is not feasible in an old codebase

    Right, that's probably the case. However, this is no different from adopting proper testing, monitoring, or logging practices. Decide if it's important, then start with something attainable that brings you closer to the goal, and iterate.

    Also, if it is not a desirable goal for the codebase, then for sure don't monitor test coverage!

    ### Enforcing 100% test coverage leads to bad tests

    It bears repeating: this is not about testing 100% of the lines; this is about keeping the code coverage metric at 100%. I am not sure how that would lead to bad tests.

    Putting too much focus on one particular metric might lead to gaming behavior. That does not mean that no metric should be enforced. I believe there isn't enough material and training about what it takes to write good tests; producing more would be beneficial regardless of whether you enforce test coverage. Also, the more you talk about test coverage, the more you iterate and learn about what it takes to write good tests in your language and codebase.

    What is certain is that it is straightforward to write bad tests. It takes a lot of skill, experience, and hygiene to write great, maintainable tests. That's a topic for another article.

    ### 100% coverage only matters for languages without a type checker

    No. Sure, a type checker might catch some classes of bugs that would require a test in a language without type checking, but you still need to test correctness (among others).

    ### You're creating blind spots

    Unless you're reviewing your test coverage report after every single commit, leaving explicit markers in the code and keeping the metric at 100% is actually a much safer practice.

    When working in code that has been excluded, you'll immediately see the `no cover` marker, perhaps with a comment explaining why the code is excluded. This lets you reconsider the coverage decision.

    Any regression in coverage will break the tests.
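
    For instance (a minimal sketch using coverage.py; other coverage tools have equivalents), the gate lives in the configuration, so any uncovered, unexcluded line fails the run:

      # .coveragerc (minimal sketch)
      [report]
      # Fail the run if total coverage drops below 100%.
      fail_under = 100
      # Print the line numbers that are missing coverage.
      show_missing = True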

    ### You should not exclude code from coverage because of setup cost

    This article is not about end-to-end vs. unit testing. I have provided some examples of code that I sometimes exclude from testing, but your mileage may vary.

aluminum96 6 years ago

This is totally ridiculous.

Test coverage clearly matters even if it's not 100%, as anyone who has ever worked on an older codebase knows.

As long as you're not causing coverage regressions, it is perfectly acceptable to leave untested code that "should" be tested, but hasn't been changed in years.

Test it when you change it; don't test for the sake of achieving a coverage benchmark.

  • rightbyte 6 years ago

    And implying that there is 100% coverage just because the instruction register had all valid values is silly.

    When writing tests one should worry about the other registers too ...

    • DoofusOfDeath 6 years ago

      Thank you for stating this so succinctly!

      This concern has been nagging me in the back of my mind, but I hadn't convinced myself that code-coverage was insufficient (in practice) until you framed it this way.

      Now all kinds of supporting examples come to mind:

      - divide-by-0 errors

      - floating-point exceptions

      - numeric underflow / overflow (both silent and signaled)

      The alternative code paths exercised by these situations are generally outside the purview of code-coverage tools, because they reside in the silicon / firmware, process' default signal handlers, etc.

aeternum 6 years ago

This is the type of article I'd expect from a junior engineer.

Test coverage matters even if it is not 100%, especially if it covers the most important use cases. Cost and time always need to be considered; 100% coverage may not be worth it. Coverage based on LOC is also an arbitrary metric in some sense - the author himself has examples of that. From a business POV, use-case coverage is more important.

lvturner 6 years ago

We had some mild debates on this matter at an old company (I don't think anyone was ever really pro 100% coverage, but we were under pressure to increase coverage, to build momentum and to emphasise that automated testing should be routine).

It resulted in a mug being made in the same style as our company mugs but with the words "Fuck Coverage" instead of the company name.

Eyebrows were raised, however, when it was mistaken for a 'real' mug and taken into a board meeting by a manager.

kempbellt 6 years ago

So, don't bother writing any tests unless you can test everything?

At a previous shop, for a while we had a simple test that ensured a user could log in, until we created more coverage. We didn't test every aspect of the code. This simple sanity check saved us once or twice.
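
It was nothing fancier than this kind of check (a hypothetical sketch, not our actual code):

    import requests

    def test_user_can_log_in():
        # Smoke test: a known staging account can authenticate at all.
        resp = requests.post("https://staging.example.com/login",
                             data={"user": "smoke-test", "password": "not-a-real-secret"})
        assert resp.status_code == 200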

Should we have not written this test just because we couldn't test every conceivable piece of code?

Don't let perfect be the enemy of good...

  • AgloeDreams 6 years ago

    Exactly this, plus, the entire concept of 100% coverage is hilariously wrong.

    Are you covering 100% of code interactions? How much of your integration logic are you mocking? Are you covering 100% of HTML and interface logic? How much code is not in your code - what about packages and imports? It's incredibly arbitrary, for no good reason except to slow the development of a product that may depend on fast development for survival, in exchange for catching bugs that honestly may never happen. To be clear, testing is good, but you write tests to serve the code and product, not the other way around.

josepot 6 years ago

Test coverage is a very dangerous metric. IMO it will only be useful the day that we find a way to not take into account those tests that are testing implementation details. Every time that I see a high number being enforced, the end result is always a disaster: lots of poor quality tests that are only there in order to bump up that metric. Those tests are poison: they don't aim to test correctness, they make refactors/improvements a PITA, they just lock you down to a concrete implementation and they don't even give you any guarantees that your code works correctly.

  • AgloeDreams 6 years ago

    Bingo, this *1000.

    Your team ends up spending half their time writing painful depressing tests that are bad afterthoughts. Generally with TONS of mocks and little real meaning.

    • cbanek 6 years ago

      The other half of the time is spent editing the tightly coupled mocks/tests when you refactor the real code. Bad tests can be a liability, not an asset.

amflare 6 years ago

The title is disingenuous. A better title would be "100% test coverage should be the goal". Though that is laughably obvious and would generate no clicks, so I can see why the author made the choice they did.

That said, if you have a single test, that's better than no tests. And 90% is better than 80%. Yes, 100% is ideal, but if you let perfect be the enemy of good, then you waste time when low-hanging fruit breaks.

t-writescode 6 years ago

The irony to me is that the examples he provides for things not to cover are exactly things I think should be covered. Short-circuit logic is ripe for bugs and the code that causes the short-circuit should definitely be tested!

  • dbcurtis 6 years ago

    Yes, and exceptions. Tests must tickle all the exception paths. An exception that contains a dumb typo crashes the system instead of propagating up to get handled. Just touching the exception path roots those out.

      raise ValueError(' '.join('meaningful message with:', parameter))
    
    Is a stupid mistake that I still make all too often inside an exception handler.
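
    A quick sketch (my example, using pytest) of what touching the path buys you: the test below fails with the TypeError instead of passing, whenever the raise line itself has that join bug:

      import pytest

      def eject(bread: str) -> None:
          if bread == "brioche":
              raise ValueError(' '.join(['no brioche allowed:', bread]))

      def test_brioche_is_rejected():
          with pytest.raises(ValueError):
              eject("brioche")  # would surface a TypeError if the raise were broken
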
rraghur 6 years ago

- Don't write unit tests for testing code, use them to guide your design.

While that statement is itself somewhat questionable, it's a little better and IMO reflects the purpose of unit tests better. Usually, if I'm finding it hard to write or set up a test, that's a signal that something's off. Also, UTs are absolutely lovely when you're refactoring code - as long as the tests cover invariants and publicly visible behavior instead of implementation. So even if the part you're refactoring has, say, 60% coverage, you'd still be much more confident making changes.
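
A minimal sketch of that distinction (my own toy example): this test pins an invariant and survives the internal list being swapped for a set:

    class UniqueBag:
        def __init__(self):
            self._items = []  # implementation detail; could become a set

        def add(self, item):
            if item not in self._items:
                self._items.append(item)

        def count(self):
            return len(self._items)

    def test_add_is_idempotent():
        bag = UniqueBag()
        bag.add("x")
        bag.add("x")
        assert bag.count() == 1  # asserts behavior, not internals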

Code coverage under anything other than unit tests can be very misleading. For example, I've seen places where system & integration tests are run on instrumented code (with little to no unit tests), and since a huge swathe of code gets hit (mostly the happy path), there's a false sense of code quality and safety - which is probably more dangerous than no tests.

Overall, "When a measure becomes a target, it ceases to be a good measure." applies most often to test coverage.

everyone 6 years ago

The author is either crazy, or else maybe works in a very niche area where this makes sense. I work in games, and some modules are very conducive to automated testing.

Eg. complex math functions: it'd be very easy for them to be not quite right, you would never notice without tests, and they will probably never change.

An area that would be stupid to test (with automated tests) for example, would be a UI menu.

* The test (ie. replicating a user pressing buttons via code) is probably gonna be more complex and less maintainable than the code it's testing.

* If there is a bug it will be obvious in human acceptance testing.

* UI menus are gonna change drastically during development.

Anyway, I could see a misguided requirement for 100% code coverage massively hampering a project and potentially killing it.

  • cbanek 6 years ago

    And yet, UI menus are one of the things that always break, and break right in front of the user ("I can't click this button. This is greyed out and it shouldn't be.") Menus should definitely be tested, and I think testing them in an automated way is fine. Without a working UI, you have no control.

    You can even do model based testing.

    For example, you're testing a word processor. You know that your state is no document. Then you click the open file button. Press some other buttons, and knowing what button it is, you should expect some change.

    That being said, usually you can "press" the buttons virtually, for example by sending window messages or calling the same functions the buttons are wired up to, rather than relying on the pixel layout.

    People have been doing these things for years.
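
    Roughly like this (a toy sketch of the word-processor example above, pressing buttons virtually against a model):

      class WordProcessor:
          def __init__(self):
              self.document = None  # model state: no document open

          def press_open_file(self, path):
              # Calls the same handler the toolbar button is wired to.
              self.document = path

      def test_open_button_loads_document():
          app = WordProcessor()
          assert app.document is None
          app.press_open_file("letter.txt")    # "press" the button virtually
          assert app.document == "letter.txt"  # expected state transition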

    • everyone 6 years ago

      You're sort of reiterating my point.

      I know UI stuff breaks a lot, but my point is that when it does break, it's obvious. Bugs should be easily caught in human acceptance testing.

      Also, an automated UI test is never going to be the same as a human test; a lot of UI bugs result from the vagaries of various physical devices: touchscreens, gamepads, motion controls, etc.

      Also, anything that tests such high-level functionality is going to be very complicated and touch many parts of your code; it will probably be more complicated and less maintainable than the code it's testing, which defeats the purpose imo. Do you want to write tests for your tests?

      UI stuff tends to change radically over development, so you'd be more than doubling your work with all the test code you'd have to write.

      And it would still never replicate an actual human running the actual game. You absolutely are still going to be doing human acceptance tests, and the closer you try to get automated tests to that, the more complex and less maintainable they get.

dbcurtis 6 years ago

100% of what? Oh... line coverage, in Python.

Well, with Python, the compiler doesn't do type checking, so 100% coverage (or close) is very important. One of his cases of what you don't need to check is error handling. Yet in production, the difference between an exception that gets raised and handled and an exception that blows up and crashes the system is a critical stability defect. And in fact, in my personal code, since I tend to try to create exception messages that are as specific as possible, I must admit with chagrin that screwing up a ''.join() to create the exception message is one of my most common errors. I at least touch-test my exception code, without exception.

And then... line coverage? Oh my, that is just the beginning. I pretty much always use the hypothesis package so that everything gets exercised with a large variety of inputs - inputs that I didn't have to think up, and which avoid my pre-existing biases about what might get thrown at the function.
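
For instance (a small sketch of the idea, with a made-up function):

    from hypothesis import given, strategies as st

    def normalize_spaces(s: str) -> str:
        return " ".join(s.split())

    @given(st.text())  # hypothesis generates the inputs, not me
    def test_normalize_is_idempotent(s):
        once = normalize_spaces(s)
        assert normalize_spaces(once) == once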

Line coverage does not do much to really ensure that you have reached "interesting conditions". I wish Python had a good tool for that kind of instrumentation. Line coverage is really only a "touch-test", mainly useful as a check-in gate, not as production testing.

  • DoofusOfDeath 6 years ago

    > Well, with Python, the compiler doesn't do type checking, so 100% coverage (or close) is very important.

    Intuitively I agree with you: In my personal experience, the more dynamically typed a language is, the more unpleasant surprises I get at runtime.

    Some years ago I did a project in MLTON [0], and I was delighted by how many bugs were caught at compile time that, in other languages, would likely go unnoticed until runtime.

    [0] http://www.mlton.org/

pacaro 6 years ago

For me the absence of coverage, or low coverage, is a useful piece of information. It's not the whole story, but it is a smell, and it should be taken into consideration along with the other smells/signals.

Conversely, perfect or high coverage is not a positive signal per se, but rather just the absence of a negative signal.

alephnan 6 years ago

Edgy, but wrong.

Is there a difference between 79.9% and 80%? Maybe not. My team is increasing test coverage for a legacy code base. 60% to 80% DOES matter. And it's not just that the tests flag regressions; the very process of increasing test coverage uncovers bugs and dead code.

nrook 6 years ago

There's an insight here I haven't seen elsewhere. Everyone but the most blind ideologue knows 100% test coverage isn't a good use of your time. But it's hard to care about increasing coverage from 89% to 90%, or to feel justified in objecting during a code review because the change under review would decrease coverage from 90% to 89%.

The solution to this dilemma is to make it easy to mark code as "OK not to be tested". It's annoying to have test annotations in code, but I'd expect mostly entire modules will be exempted, so it's not that big a deal. And this way, the coder has to either write tests or make a specific decision not to.
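
With coverage.py, for example, entire modules can be exempted in the configuration rather than annotated line by line (paths here are hypothetical):

    # .coveragerc
    [run]
    # Exclude whole modules from measurement.
    omit =
        myapp/migrations/*
        myapp/vendored/*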

choward 6 years ago

> With this setup, test coverage will be at 100% even though we did not test the case where the toaster is not plugged because it is too simple. Writing a test for this "functionality" would not be worth it. Therefore we explicitly exclude it from coverage. You should revisit this decision if it becomes more complex and needs to be tested.

This is ridiculous. He writes unreachable code and says he doesn't test it because it's "too simple". No, you're not testing it because you can't test it, because it's code that will NEVER run. Instead of deleting the pointless code he marks it as not needing test coverage.

rbongers 6 years ago

Not my experience at all... Tests need to be intelligently written for the project at hand, which sometimes means testing something to freeze the API, writing tests for small functions that are easier to test in a suite than anywhere else, and especially catching a problem where your program keeps failing, like testing with an older version of a library or runtime, or even testing something that you know could be fragile.

ihaveajob 6 years ago

Seems like the Intelligent Design nuts arguing "what good is half an eye?" Well, it's half as good as a full eye, and a lot better than no eye.

the_narrator 6 years ago

I think what really matters is that your test coverage percentage increases over time and never goes down. A goal of 100% might be ideal, but is rarely realistic.

mabbo 6 years ago

This is such bad advice that's so well written I can only assume the author is trolling HN-types with it.

I'm serious. I think we're being trolled here everyone.

al2o3cr 6 years ago

    Some lines of code should be tested twice to ensure they're correct.

Translation: "100% coverage" is a vanity metric

m4r35n357 6 years ago

Not only that, you should subject your automated tests to mutation testing. No link coz wikipedia seems to be down for me.
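
The idea in miniature (a toy illustration, not a real mutation-testing tool): mutate the code, re-run the suite, and expect at least one failure:

    import ast

    source = "def add(a, b):\n    return a + b\n"

    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            node.op = ast.Sub()  # the mutant: '+' becomes '-'

    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    mutant_add = namespace["add"]

    # A suite asserting add(2, 2) == 4 "kills" this mutant, since it now returns 0.
    assert mutant_add(2, 2) == 0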

matt2000 6 years ago

Given unlimited time, money and importance of the code in question - go for 100% coverage. Otherwise, probably not.

tracker1 6 years ago

100% coverage only ensures that you've looked at everything twice. Nothing more, nothing less.

mpweiher 6 years ago

There is actual empirical evidence that this is false.

  • wvenable 6 years ago

    Feels like you didn't read the (whole) article.

    The author makes a good point -- everything should either be covered by tests or explicitly marked as excluded. Having a gray area of code that is included but not covered by tests is a recipe for missing something. That much seems obvious.

    • nimblegorilla 6 years ago

      Seems like 50% coverage is better than 0% coverage and 75% coverage is better than 50% coverage.

      The author argues about the horrors of going from 80% to 79.9%, but I don't see how that is any different from the suggestion to explicitly exclude code. The excluded code is still in the gray area of code that is included but not covered by tests. What happens when someone starts making breaking changes to excluded code?

      • derefr 6 years ago

        > What happens when someone starts making breaking changes to excluded code?

        You... don't. You un-exclude it first, and then, because of the rule, you're not allowed to commit the code until you write the tests that bring it back up to 100% coverage. Then you make breaking changes.

        • nimblegorilla 6 years ago

          I'm not following the benefit of excluding code in the first place. Your method seems like more work, more error prone, and more likely to leave unintended gaps in coverage.

      • wvenable 6 years ago

        Lots of code bases have code coverage in some middle range, let's say it's 82.34%. What does that number tell you? Did the team write 7.64% more code this month that doesn't have test cases?

        The point you're making is actually irrelevant to the discussion. If you explicitly excluded code, you probably have a reason for that and you take your chances. Ideally, in a perfect world, you'd have no excluded code.

        • AgloeDreams 6 years ago

          This is why we have tools like coveralls.

          Also, this method doesn't show when you remove excluded code or add coverage for it. In a perfect world you should have no excluded code, and in a perfect world you should have 100% coverage.

          The world isn't perfect though, and a good working product with critical features is better than one lacking important features but with '100%' coverage.

    • blumomo 6 years ago

      That's exactly what we practice. And it works quite well. Can only recommend this methodology.

    • mpweiher 6 years ago

      Telling the obvious from the true is the role of science.

    • youeseh 6 years ago

      What about the opposite? Explicitly marked as tested and assumed to not be otherwise?

  • markl42 6 years ago

    Care to share?
