3 percent of Python codebases we checked had silently failing unit tests

richardtier.com

98 points by rikatee 4 years ago · 67 comments

misnome 4 years ago

We work with a rather large upstream scientific codebase in python. They have an innate distrust of anything that isn't written by them.

Their testing system depends on tests printing "OK" after every test. This means that in many cases, tests failing are indicated by the _absence_ of "OK" being printed.

(We've attempted to isolate those parts and write our own stuff testing against upstream in pytest. We once presented a proposal to move them to pytest, offering to do any work and even wrote pytest plugins to seamlessly integrate with their current system. We got a - literal - "Thanks, but no thanks.")
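
To make the failure mode concrete, here is a rough sketch of what such a harness amounts to (entirely hypothetical code; not the upstream project's):

  import subprocess

  EXPECTED_TESTS = 42  # has to be kept in sync by hand

  # Run the suite and capture everything it prints.
  output = subprocess.run(
      ["python", "run_tests.py"],  # hypothetical entry point
      capture_output=True, text=True,
  ).stdout

  # A test that crashes never prints "OK", so the only signal of
  # failure is a count that comes up short.
  assert output.count("OK") == EXPECTED_TESTS, "some test did not print OK"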

  • chriswarbo 4 years ago

    > Their testing system depends on tests printing "OK" after every test.

    Oof. If they'd instead put "ok" before every test, they might have been accidentally compatible with TAP! https://testanything.org
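
    For reference, a TAP stream is roughly a plan line followed by one "ok"/"not ok" line per test, e.g.:

      TAP version 13
      1..3
      ok 1 - parses empty input
      not ok 2 - handles unicode
      ok 3 - closes file handles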

    • rightbyte 4 years ago

      I think relying on stdout is kinda fine. I do tests with TCL/Expect like that.

      It is nice to not have to depend on the language runtime to do the test.

  • danuker 4 years ago

    There is a point where enough mismanagement makes relying on said scientific codebase a liability rather than an advantage.

    You could even be a victim of "Embrace, Extend, Extinguish".

    My advice is to consider forking it, and poaching contributors, in the interests of common good.

  • adenozine 4 years ago

    And you’re unwilling to name the library?

    I’d love to investigate this further.

  • popularonion 4 years ago

    That’s not the way I would do it, but is there really a problem here? Assuming the library itself isn’t littered with print statements that cause false positives.

    Experience has taught me that the “right” testing framework for a project is whatever the developers are happy and productive with.

  • oli5679 4 years ago

    Did they catch exceptions and then print 'not ok'?

    • danuker 4 years ago

      > tests failing are indicated by the _absence_ of "OK" being printed

      I guess they count the OKs, then bisect their test suite until they find the failing test.

byteface 4 years ago

I just grepped some site-packages and saw this. Is it an example of what the author is saying?... https://github.com/gitpython-developers/GitPython/blob/main/...

If so, that would appear to be common, since it came up right away.

forgotusername6 4 years ago

I've had unit test suites in the past that silently skipped a test if it failed to compile. Those were the worst. I only found out because I roughly knew how many tests I expected to run.

  • ianbicking 4 years ago

    During development I sometimes end my tests with assert 0, and only once I get to that failure do I know I've finished.

dc-programmer 4 years ago

I’ve shipped unit tests that would fail but don’t run by copying the function signature of the previous test then forgetting to change the name of the new one. Only one of the tests will run (the first?).

Maybe unit tests need unit tests? (There’s probably a lint rule to catch what I describe above)

  • jka 4 years ago

    > Maybe unit tests need unit tests? (There’s probably a lint rule to catch what I describe above)

    Yep - meta-testing (ensuring that every unit test that exists in a project adds unique coverage, remains valid, runs as expected, and I'm sure many other properties) could (and should!) definitely be automated.

    Some more advanced meta-testing could involve tracking changes to a project's source history over time (in other words: tests that run with commit history). By that I'm thinking of situations like: "does this test genuinely still test what it used to, after the test and/or application code was modified?"

    • rcxdude 4 years ago

      mutation testing is one example: if you make random changes (random in terms of transforming valid code to different valid code) to the code being tested, you should expect that the test will then fail. If not, there is some part of the code's behaviour which is not being tested.
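
      A tiny illustration of the idea (hypothetical function and test):

        def is_adult(age):
            return age >= 18   # a mutation tool might change >= to >

        def test_is_adult():
            assert is_adult(30)       # the > mutant also passes this
            assert not is_adult(10)   # and this
            assert is_adult(18)       # boundary case: kills the > mutant

      Without the boundary assertion, the mutant would survive, flagging a behaviour the suite never pins down.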

    • dc-programmer 4 years ago

      Wow, that sounds like the future of testing. It'd be a hard sell to a manager right now, though. Some of those checks seem like they could be auto-generated.

  • pure_simplicity 4 years ago

    The second one will run, because as the file gets executed top down, the second declaration overwrites the first declaration, just like when you reassign a variable.

    But yeah, that would be a good thing for a linter to catch. I'm not aware of any that do.
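
    A minimal illustration of the shadowing (hypothetical names):

      import unittest

      class TestWidget(unittest.TestCase):
          def test_widget(self):
              self.assertEqual(1 + 1, 2)    # never runs...

          def test_widget(self):            # ...because this rebinding
              self.assertEqual(2 + 2, 4)    # replaces it in the class body

    For what it's worth, pyflakes/flake8 flags this pattern as F811 ("redefinition of unused name").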

njharman 4 years ago

Huh? Seems super clear to me that "assertTrue" is asserting truthiness and not equality. It's right there in the method name! And if you don't know that "True" means truthiness in Python, then you don't know the basics of Python.

A reviewer should catch this error easily. I kind of think many don't give much attention to unittests when reviewing. Which is bad. Good unittests are far harder to write than good code.

There are much more subtle errors in this class (false negatives / tests that always pass).
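
For the record, the mistake under discussion looks like this (hypothetical test):

  import unittest

  class TestMath(unittest.TestCase):
      def test_sum(self):
          result = 2 + 2
          # BUG: assertEqual was intended. assertTrue treats 5 as its
          # msg argument, and 4 is truthy, so this always passes.
          self.assertTrue(result, 5)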

  • js2 4 years ago

    That this is a not-uncommon mistake, despite the name, indicates that people make mistakes. They don't read. They're in a hurry. They see what they want to see.

    The fix isn't to blame people for making mistakes. It's to figure out a design that doesn't allow this mistake to happen in the first place.

    For example, the method could (today) require the second argument to be a keyword argument. This is also something a good linter should be able to warn on.

    edit: rikatee and I wrote essentially the same reply at the same time. :-)

  • rikateeOP 4 years ago

    Agreed, in a perfect world, but unfortunately any process that involves humans will involve human error.

    We do code review because we expect human error when code is written by a human, but then we expect no human error when the code is read (reviewed) by a human? Any process that expects zero human error will always fail.

    That's where linters add value: they let devs do what humans are good at (the creative, complex, and interesting stuff) while the bots do what bots are good at (the boring, repetitive stuff).

krab 4 years ago

Why Shroeder's test? Does it have anything to do with Gerhard Schroeder? I fail to see what. :-)

  • Retr0id 4 years ago

    I wonder if they meant Schrödinger - the test could be both passing or failing, but we don't know until we use the correct function to check the results.

    • tialaramex 4 years ago

      Note that Schrödinger's thought experiment is intended to ridicule this way of thinking. Schrödinger is trying to suggest that since it's clearly nonsensical to imagine that a whole cat could be both dead and alive, the same would be true for other macroscopic subjects.

      Instead, popular culture has decided that at best this is what Schrödinger believed (ha, those crazy scientists) and at worst that somehow the cat being dead and not-dead at the same time is the core idea of quantum physics :/

      • 3pt14159 4 years ago

        Writing software for a long time really changes your perspective on things. It no longer seems weird that "the" cat is both alive and dead. It's just compression between two universes and until we open the box our perspective is the same.

      • marcosdumay 4 years ago

        > and at worst that somehow the cats being dead and not-dead at the same time is the core idea of quantum physics

        Yet people are keeping larger and larger objects in a coherent state. Probably nobody will ever do it with a cat, but quantum physics is keeping its tradition of taking anything people consider absurd and saying "well, not really, look at this".

    • rikateeOP 4 years ago

      You're right, thanks! Updated :)

  • carpenecopinum 4 years ago

    That typo/misread (Schroeder instead of Schrödinger) honestly almost makes more sense to me, because I don't really see how those tests relate to quantum mechanics. Comparing them instead to a chancellor of the local labour party who was expected to improve the situation of workers in the country, but essentially only used the office as a stepping stone to becoming a Russian oligarch, making the situation even worse in the process, makes plenty of sense to me...

  • rikateeOP 4 years ago

    Oh dear, time for an edit, as I did indeed mean Schrödinger!

  • contravariant 4 years ago

    Maybe it's supposed to be Ernst Schroeder? Though I don't see the immediate connection.

generic-husky 4 years ago

Nobody's going to talk about the cute fursona?

dragonwriter 4 years ago

Though the change is unlikely to be made in the existing testing library because it's hugely breaking, the API would be better if the assertXxx methods’ optional message argument were keyword-only, and assertTrue (and assertFalse) were replaced with assertIsTrue and assertTruthy (and assertIsFalse and assertFalsey).
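
A minimal sketch of what that proposed API could look like (the names come from the comment above; the implementation is an assumption):

  class StricterAssertsMixin:
      # Intended to be mixed into a unittest.TestCase subclass, e.g.
      # class MyTests(StricterAssertsMixin, unittest.TestCase): ...

      def assertIsTrue(self, expr, *, msg=None):
          # identity check: passes only for the True singleton
          self.assertIs(expr, True, msg)

      def assertTruthy(self, expr, *, msg=None):
          # truthiness check, with msg forced to be keyword-only
          self.assertTrue(expr, msg)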

robertlagrant 4 years ago

Good lord, I forgot how ugly pre-pytest was.

  • hyperzeit 4 years ago

    How come pytest does not appear in the post?

    • rikateeOP 4 years ago

      The post covers the built-in unittest package, which 28% of devs still use. But pytest is nicer to work with. I think brownfield codebases and inertia are the reason 28% of devs work (or have to work) with unittest.

      • KptMarchewa 4 years ago

        Pytest easily runs unittest codebases, and you can just start writing new tests in pytest, and gradually move to it. Most of those left in pure unittest land are probably in some category of "amount of legacy is too large" or "I don't care anymore", and most probably some amount of both.

        • thraxil 4 years ago

          I actually prefer unittest style tests to pytest. I hop between languages a lot and find them easier to remember how to write when I'm doing Python. I also place a lot of value on minimizing the dependencies that I have to install and every codebase I see using pytest seems to also have to pull in a dozen other pytest plugins that then have to be reviewed, pinned, and updated. I also feel like whatever niceties pytest brings in to make writing tests easier are balanced out on Django apps by having to add a `@pytest.mark.django_db` decorator on basically every single test function.

          • ginja 4 years ago

            Regarding Django, if you add `pytestmark = pytest.mark.django_db` to the top of your file or organize your tests in classes and decorate those, then you won't have to decorate every single test :)
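
            A file-level example of that (hypothetical test module):

              import pytest

              # applies the mark to every test in this module
              pytestmark = pytest.mark.django_db

              def test_user_is_created():
                  ...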

          • KptMarchewa 4 years ago

            I mean, pytest ones are the easiest. They are just functions. If you're not doing anything fancy, then you don't need to do anything else. Use normal assert, not some fancy functions. Only plugin I really use is coverage.

            Minimizing runtime dependencies is nice, but personally I couldn't care less about build/test time dependencies.

            I don't touch Django so can't comment on that though.
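
            To illustrate "just functions" (hypothetical example): pytest rewrites the plain assert, so a failure reports both sides of the comparison without any assert* methods.

              def test_addition():
                  result = 2 + 2
                  assert result == 4   # on failure, pytest shows result's value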

      • teddyh 4 years ago

        unittest is included in the Python standard library. Adding third-party libraries is a huge step to take for a project, and just “nicer to work with” does not cut it. Third-party libraries come and go, and depending on one means being subject to the storms of changes and lulls of inactivity and death. But the standard library is dependable.

        • njharman 4 years ago

          > Adding third-party libraries is a huge step to take for a project

          Really? I can't think of a "real" project that did not include 3rd-party libs.

          Reinventing wheels or struggling with poorer implementations (unittest vs pytest, http vs requests, etc.) is a huge drain on a project. A huge misstep.

          • teddyh 4 years ago

            I should have written “Adding each extra third-party library as an additional dependency is always a huge step to take for a project”.

            Any one third-party library comes with these drawbacks, and each library must be evaluated individually. Some may be worth the pain (requests, etc.), but many are not. One should try to minimize the number of third-party dependencies one has, not necessarily eliminate them entirely. It’s simply that the fewer third-party libraries you depend on, the less pain you get of the kinds I listed. Every individual third-party library will have to overcome that threshold by being useful enough. And I doubt that, for most people, pytest is that much better to work with than the built-in unittest is.

            • robertlagrant 4 years ago

              From the article:

              > While pytest is very popular, 28% of codebases still use the built-in unittest package.

              • teddyh 4 years ago

                Yes? This proves that the writer thinks that everybody should obviously change from unittest to pytest. It does not prove the writer to be correct in all circumstances. Each project should make their own determination of whether a third-party library is worth it.

                • robertlagrant 4 years ago

                  No, that doesn't prove that. It just indicates the opposite of what you said, here:

                  > And I doubt that, for most people, pytest is that much better to work with than the built-in unittest is.

                  • teddyh 4 years ago

                    Then why did you quote the author so selectively as to appear to indicate the opposite?

                    • robertlagrant 4 years ago

                      Sorry, I'm not following. I'm saying the article says less than 30% of repos scanned used Unittest. That was in response to you seeming to say that most projects would choose Unittest.

                      What've I missed?

                      • teddyh 4 years ago

                        Oh, I see, you misunderstood me. I did not claim that “most projects would choose Unittest”. I claimed that for many projects, it would be wise to consider using unittest instead of, for example, pytest, as a way of minimizing usage of third-party modules.

                        • robertlagrant 4 years ago

                          > Every individual third-party library will have to overcome that threshold by being useful enough. And I doubt that, for most people, pytest is that much better to work with than the built-in unittest is.

                          This sounds more like you're saying that pytest wouldn't overcome that threshold; I was saying that has already happened, at least according to the article's statement.

                          • teddyh 4 years ago

                            Maybe. But I would personally guess that most people nowadays are way too eager to use a new library than would be good for their projects in the long run. I very much doubt that most new libraries are that much better than the Python standard library counterparts that it’s worth using them in non-toy projects. Of course, all progress depends on the unreasonable man, etc.

boxed 4 years ago

This (and so much more) will be caught by mutation testing. For Python, that means mutmut.

pc86 4 years ago

I'm no Python expert, but why does assertTrue() even accept two arguments?

  • js2 4 years ago

    Explained later down in the post:

    > assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently.

  • justinsaccount 4 years ago

    The 2nd argument is the 'msg' argument.

    With modern features in Python you could change the signature to

      assertTrue(expr, *, msg=None)
    
    which would prevent that issue.

    • d0mine 4 years ago

      or just:

          assert expr, "custom message"
      
      Though given the verbose API, it's OK to require the explicit msg kwarg (duplication in the tests is fine if it makes them more robust).

  • guilherme-puida 4 years ago

    The second argument is the message that is displayed if the value is not truthy.

  • cyberia23424 4 years ago

    Second arg is a string message to display when the test fails.

    • 6LLvveMx2koXfwn 4 years ago

      Indeed, from TFA:

      "assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently."

charcircuit 4 years ago

Does unittest raise an exception when the second argument isn't a string? That would catch some of these, I'd imagine.

dingosity 4 years ago

Well, yeah. They were probably written for 2.x or 3.y and are being run in 2.(x+n) or 3.(y+n).

dcdc123 4 years ago

Only 3 percent?

  • rikateeOP 4 years ago

    Of the codebases checked, yep (20 of 666).

    Bear in mind that only 28% of codebases actually use the built-in unittest package this gotcha affects, so really it's 20 out of 28% of 666, i.e. roughly 10% ... but that claim would be hard to justify to folks who dig into stats.
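
    The arithmetic, spelled out:

      print(20 / 666)           # ≈ 0.030 -> the 3% headline figure
      print(20 / (0.28 * 666))  # ≈ 0.107 -> roughly 10% of the unittest subset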
