TDD from the Factorio Team

factorio.com

457 points by sorahn 5 years ago · 319 comments

dgb23 5 years ago

Two interesting takeaways:

> This is the beautiful thing about having a company that isn't on the stock market. Imagine you have a company that goes slower and slower every quarter, and then you confront the shareholders with the statement, that the way to solve it, is to do absolutely no new features for a quarter or two, refactor the code, learn new methodologies etc. I doubt that the shareholders would allow that. Luckily, we don't have any shareholders, and we understand the vital importance of this investment in the long run. Not only in the project, but also in our skill and knowledge, so we do better next time.

This reinforces the notion of what I think actually matters, of what the real essence of developing a product is, whether that's a piece of art and entertainment (like here) or a productivity tool, etc.

There are creators and there are consumers. We split the creators up into developers, designers, domain experts and so on, but what matters is that all the other participants, especially those who traditionally exert power, are not part of that essence, and if they are not careful and responsible they can easily add complexity and limitations that are entirely accidental and can even be harmful.

This reminds me of the agile manifesto, modern UX approaches and other processes that are driven by creators, but are often, and very unfortunately, bent over backwards to fit into hierarchical power structures.

> TDD actually is the constant fast switching between extending the tests and making them pass continuously. So as you write tests, you write code to satisfy them basically at the same time. This allows you to instantly test what you write, and mainly use tests as specification of what the code should actually do, which guides the thought process to make you think about where you are headed to, and to write code that is more structured and testable from the very beginning.

The important notion here is that TDD is not about tests and correctness, but about development. It continuously checks assumptions and explores the surrounding code, state and data until a sufficient solution is found.

If we squint a little we can see how closely related TDD and REPL-driven development are. In essence they are the same thing and even have similar results, where the tests or REPL code can be left as an artifact for further, likely historical, understanding.

We know now that neither is sufficient for a high degree of correctness, but they are certainly useful for understanding and development.

  • rpastuszak 5 years ago

    > The important notion here is that TDD is not about tests and correctness, but about development.

    Yup, writing tests helps me sleep at night. TDD helps me manage my mental resources and iterate.

    (another reason is communication--we code for our colleagues first, then for the machine: https://sonnet.io/posts/code-sober-debug-drunk/)

    IIRC Smalltalk had a workflow where you'd debug and write your program at the same time. You'd just reach a path that has not been implemented yet, break, implement it and continue.

    • dgb23 5 years ago

      This „keeps working while it breaks", does it have a name? It often comes up in highly dynamic environments like Smalltalk, as you mentioned, but also Lisp, Erlang and others.

      • dkarp 5 years ago

        I think this may be "REPL driven development" popular in the lisp community

        • dgb23 5 years ago

          I meant more at the system/vm level. You're going to use a REPL (of some sort) for this, but the system needs to accommodate failure and just keep going, while giving you the opportunity to fix and change things.

          This has been implemented to varying degrees of reliability, granularity (in terms of preserving state) and utility. It's a common concept but I don't know the name for it!

          Examples:

          https://en.wikipedia.org/wiki/Continuation

          https://en.wikipedia.org/wiki/Fault_tolerance

          Maybe fault tolerance is the right word? But it doesn't imply that you actually modify and build the thing while it already runs.

      • aiisjustanif 5 years ago

        In systems security you would call the design "fail open". There are many systems that fail open.

        An example of fail-open design in the real world is electromagnetic locks at apartment complexes or office buildings. If the building loses power, the doors demagnetize and unlock, and they remain open indefinitely: the system has failed, but failed open. The opposite would be doors that fail closed, locking indefinitely when power is lost, so that people can't escape.

      • FigmentEngine 5 years ago

        This could be many different concepts. Fault tolerance is the most obvious, along with self-healing and auto-rollback.

        A good read is ROC: https://en.m.wikipedia.org/wiki/Recovery-oriented_computing

    • mulmboy 5 years ago

      > You'd just reach a path that has not been implemented yet, break, implement it and continue.

      This is exactly what I do with PyCharm: hit a breakpoint, write the code as it should be, and execute it in the debug REPL to do basic initial testing [repeat]. Extremely productive.
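
      For anyone curious, a rough sketch of that loop (the event handler and helper below are invented for illustration): leave the unimplemented path on a breakpoint, prototype the missing branch in the debug REPL when you hit it, then paste the working code back into the source.

          def process_purchase(event):
              return f"charged {event['amount']}"

          def handle(event):
              if event["kind"] == "purchase":
                  return process_purchase(event)
              # Unimplemented path: pause here, try out candidate code in the
              # debug REPL against the live `event`, then move the working
              # version into the source and continue.
              breakpoint()

          handle({"kind": "refund", "amount": 5})  # hits the breakpoint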

  • wpietri 5 years ago

    I appreciate this, but I still think it's a suboptimal approach:

    > that the way to solve it, is to do absolutely no new features for a quarter or two, refactor the code, learn new methodologies

    Even in a world without shareholders, we still are building things for users. 6 months without improvements has an effect on them, too. When possible, I think it's better to spread cleanup work out. Even if one spends 80% of the time on cleanup and 20% on features, that is much better for relationships than going dark for a quarter or two. And in my experience, continuing to do productive work during that period makes the behind-the-scenes improvements better.

    • mikewarot 5 years ago

      I've both watched all of Uncle Bob's videos (parts more than once), and been a long time Reddit/Factorio reader/game player. (The factory must grow!)

      They did exactly what you suggested! The relationship with the users never went dark. They kept up with bug fixes, and kept feeding new features at a more than acceptable pace. [Edit: Here's the pace of their status updates, as evidence at https://www.reddit.com/r/factorio/?f=flair_name%3A%22FFF%22 ]

      I didn't quite believe that Uncle Bob's lessons worked in the real world, but if the Wube team is sold on them, that's about as good an endorsement as I'll ever get.

      • wpietri 5 years ago

        Yay! Glad to hear it.

        I keep thinking I should try Factorio. But given the amount of time friends have spent on it, I'm scared I'll never be heard from again. One of my friends has built a whole Grafana setup for it.

        • bregma 5 years ago

          Try heroin first. It's easier to quit.

        • mikewarot 5 years ago

          Factorio is easy to dump a ton of time into, if you are the type that likes solving puzzles and optimizing systems. You are wise in your caution.

          • elevatortrim 5 years ago

            True, and it is probably worth it. In my late 30s, I thought I'd never enjoy a game like I used to in my teen years. Well, Factorio was easily as enjoyable, if not more so. I do not know if I'll enjoy a game this much ever again.

        • oarsinsync 5 years ago

          If you spend your working hours solving problems and architecting systems, this game will be fun until you cross whatever your personal limits are for immediately grasping concepts.

          As long as you're immediately understanding, you'll continue to have fun.

          As soon as you need to stop and research to understand what it is that you need to do next, and then start to re-architect a bunch of things, you may or may not realise this is the same thing you do for a living.

          Nothing quite like burning the candle at both ends, for fun and profit.

          Personally, it's not for me. Others (including some of my colleagues) definitely do not share that sentiment.

          That said, I enjoyed Mindustry for a similar period too. Factorio-lite tower-defence game.

          • pault 5 years ago

            You should take a look at Satisfactory[0] if you haven't already. It's not (quite) as deep as Factorio, but the game world is absolutely beautiful and immersive, and it's still in early access so new content is constantly arriving. I personally had to ban it because I played it for 300 hours in less than three weeks. I literally could not get myself to stop playing (until I did). :)

            [0] https://store.steampowered.com/app/526870/Satisfactory/

    • scotty79 5 years ago

      In this specific case, Factorio is such a beloved product with such a great development history that 6 months without updates is nothing.

      • elevatortrim 5 years ago

        It is not entirely without updates; the team have been pushing bug fixes for the most obscure edge cases that almost no one is running into.

      • raxxorrax 4 years ago

        There were times when no new features were included and polish was the main goal. I don't think it was 6 months, but their FFF series seriously is a recommended read, just for how problems manifested and the approaches taken to solve them. It's not only interesting for game developers.

      • wpietri 5 years ago

        Sure, but why eat into that belovedness if they don't have to?

  • indigochill 5 years ago

    Regarding the power structure observation, I've lately been weighing the merits of co-ops in the tech industry. Intuitively, it seems like a tech product owned by its users would lead to more product-centric decision-making. Leaving governance to shareholders who aren't necessarily directly involved in the product feels weird by comparison.

    • dgb23 5 years ago

      I think so too. A lot of that free energy is in the FOSS movement, it seems, and there are plenty of freelancers and single consultants in tech. This is a strong indicator that there are enough developers who primarily want to create, while having agency, responsibility and a direct communication channel. So why not start businesses that share these values and bring consumers and producers closer together?

  • wpietri 5 years ago

    Yes, you're exactly right about REPL-driven development and TDD being the same spirit.

    I took to TDD pretty easily because I was already used to doing short run-it-and-see-if-it-works cycles. The main difference was that instead of checking via eyeball, I tell the computer how to check it. This is slightly slower early on, but so much faster once a program is big enough that manually checking everything would take a while.

  • bjornjajayaja 5 years ago

    I’d love to see a language where you write the tests and then the compiler creates the application code.

    • kilburn 5 years ago

      This is an active field of research. Search for "program synthesis".

      We are advancing, but the current state is... not mind-blowing yet (albeit somewhat cool!). See [1] for an example interactive demo and [2] for the corresponding presentation.

      [1] http://comcom.csail.mit.edu/comcom/#Synquid

      [2] https://www.youtube.com/watch?v=HnOix9TFy1A

      • hwayne 5 years ago

        I was in the audience for that talk. The recording doesn't capture how much energy there was; we were all gasping and cheering throughout.

        Still a decade off from production, though. Note that it doesn't just take "test cases": you have to know how to formally express the program properties, which is a separate skill from both unit testing and implementing.

    • oats 5 years ago

      Is this not almost how logic programming (Prolog, etc.) works? You tell the language some things which are true, and then it'll be able to infer answers to "questions" you ask:

      https://wiki.c2.com/?LogicProgramming

      • victorNicollet 5 years ago

        Let's say we work on the "reverse list" function. A few tests I could write are that the following clauses are true:

            rev([1], [1])
            rev([1, 2, 3], [3, 2, 1])
        
        And maybe also that the following clauses are false:

            rev([1, 2], [1, 2])
            rev([], [1])
            rev([1, 1], [1])
        
        Is Prolog able to infer, from the above, that rev([4, 5], [5, 4])? Or to synthesize the general form rev([H|T],R) :- rev(T, RT), concat(RT,[H],R)?

        • ffhhj 5 years ago

          Sounds like NN training; that could be achievable. The problem comes with unit testing, because the minimal testable unit could be more complex than a function.

      • iamwil 5 years ago

        Yeah. I've often seen TDD where you act both as the logic programmer writing the tests and the imperative programmer writing the implementation.

      • JohnHaugeland 5 years ago

        No. Not even a little bit.

    • bregma 5 years ago

      It's incredibly easy. First, you start with a customer that actually knows exactly what they want.

    • nyberg 5 years ago

      This sounds a lot like miniKanren (https://github.com/webyrd/Barliman), where you give test cases, and Idris2 (https://github.com/idris-lang/Idris2), where you give type constraints, as tools for building programs.

    • dgb23 5 years ago

      We typically write _sample_ tests in boolean logic, which isn't quite expressive enough for this.

      But if you look at logic programming with more expressive systems you can have something like what you propose. We describe what we expect to have and the system deduces a result. Not quite what you want, but it's closer.

      Now there is also a ubiquitous logic system that many use: static typing. In a sense you are describing the general properties of something and the compiler infers optimizations based on your assertions. The concrete program is not a line-by-line translation from your code to machine code, but perhaps looked at in its entirety.

      I agree that there is a lot of merit in pushing these things further and further. Right now we're kind of in a stage of patching things together. But I hope and assume that programming becomes more holistic in the future. Ironically we have to look at the past first, there was a lot of momentum in this direction up until the 80's roughly.

    • sidlls 5 years ago

      How would this even work? At best you might get a set of class/function stubs with some minimal logic. At worst what you'd have is a compiler that is actually a complicated, truly AGI brain, which could produce some truly awful code. In which case you've simply reproduced TDD's normal result: truly awful code.

      Aside from that, to produce a program that did what is actually required would require test cases and functions that cover the set of inputs and outputs. This is trivial for mathematical functions, but impractical (or impossible) for more general applications (e.g. anything dealing with human inputs).

    • mikewarot 5 years ago

      You could just use a fuzzer to generate code, and run tests on the output. Each new test would approximately double the run time until a new "correct" output was found.

      This doesn't become practical until you can do it on a quantum computer with millions of qubits.

    • iamwil 5 years ago

      This is kinda what logic programming is, like in prolog. You tell the computer what you want, and it finds the answer for you.

      TDD is where you both write what you want, and you do the implementation also.

    • phtrivier 5 years ago

      I would like to see this language handle some lawmaker-specified code.

      Show examples of French retirement pension computation, and watch if a computer can actually commit petit-suicide.

    • astrange 5 years ago

      Well, that's how machine learning works.

      • mikewarot 5 years ago

        Machine learning requires a problem that can be partially correct, so that it can climb the gradient to optimize. If you can build tests that have an analog output instead of pass/fail, you could, in theory, do it with machine learning.

        Beware that machine learning in a single pass/fail is more like having an infinite number of monkeys trying to write the works of Isaac Asimov.

        [Edit/Update] All of the tests could be individual values, so non-zero (but nowhere near all ones) might help. Thanks for making me reconsider this, sdenton4.

        • sdenton4 5 years ago

          Not all ml is gradient based. Other options exist: Bayesian black box optimization (like vizier), or genetic algos. And Vizier is actually quite efficient for small problems.

          • peheje 5 years ago

            With genetic algorithms you still need to be able to calculate a fitness. Usually a test is fail/success. There's no fitness in that. I would guess the other optimizers also need such a signal?

            • sdenton4 5 years ago

              Success/failure is just a binary classification signal: you can look at how it correlates with your target variables, how noisy it is for particular choices of variables over multiple trials, etc. The noisier it is, the harder it is to learn, but such is life.

              • peheje 5 years ago

                I think it would be impractical with a naive fitness function that is 0 for failure and 1 for success. Wouldn't the signal be too difficult to find? GA would be brute force until you found code that passed a test. I don't think tests for Factorio are trivial.

                Maybe you could move the goal/fitness function along the way: start with something that compiles, then something with the desired input/output, etc.
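
                Something like this graded fitness is what I have in mind (only a sketch; the solve() entry point and the test cases are made up):

                    def fitness(candidate_source, test_cases):
                        # Graded signal instead of a single pass/fail bit:
                        # 1 point for compiling, 1 for running, 1 per passing test.
                        try:
                            code = compile(candidate_source, "<candidate>", "exec")
                        except SyntaxError:
                            return 0.0
                        env = {}
                        try:
                            exec(code, env)
                        except Exception:
                            return 1.0
                        score = 2.0
                        for args, expected in test_cases:
                            try:
                                if env["solve"](*args) == expected:
                                    score += 1.0
                            except Exception:
                                pass
                        return score

                    # Candidates are expected to define solve(); a correct doubler
                    # scores 2 + 2 here.
                    print(fitness("def solve(x): return x * 2", [((2,), 4), ((3,), 6)]))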

                • peheje 5 years ago

                  Addendum: more tests, I see. Would you then lead the algorithm along a trajectory? If you can pass this simple test you can probably pass this one as well, then this... babysitting it along the way. Ideally you wouldn't need to, but maybe it's needed to make it possible.

                  • sdenton4 4 years ago

                    Check out Thompson Sampling and multi-armed bandit problems for how this can work out in real life. (I tend to think this approach is much better than genetic algorithms...)

                    Each 'bandit' is a random boolean outcome, governed by some hidden success probability. Thompson Sampling trades off exploration and exploitation. If there's no successes ever, then all bandits are equally bad, and you just keep exploring randomly until you find some success (or give up). If you do have some success, you can try to exploit it.

                    For a problem with continuous parameters, you can discretize the parameter space by binning, and then choose randomly within a bin for each trial. 'Exploiting' a particular bin might lead to breaking it into more bins for finer resolution.
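
                    A bare-bones sketch of the Bernoulli case (the three hidden success probabilities here are invented for illustration):

                        import random

                        # Thompson Sampling over three Bernoulli "bandits" with
                        # Beta(1,1) priors: sample a plausible success rate for each
                        # arm from its posterior, play the arm with the best sample,
                        # then update that arm's win/loss counts.
                        true_p = [0.02, 0.10, 0.30]
                        wins, losses = [0, 0, 0], [0, 0, 0]

                        for _ in range(2000):
                            samples = [random.betavariate(1 + w, 1 + l)
                                       for w, l in zip(wins, losses)]
                            arm = samples.index(max(samples))
                            if random.random() < true_p[arm]:
                                wins[arm] += 1
                            else:
                                losses[arm] += 1

                        print(wins, losses)  # most plays end up on the best arm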

          • astrange 5 years ago

            I was actually thinking about decision trees.

  • CraigJPerry 5 years ago

    For those curious about REPL-driven development, this is a good example that I ran across recently:

    https://gist.github.com/daveray/1441520

    Even if you don’t follow along and try it, you can probably get the gist of how experimental / exploratory you can be.

  • truetraveller 5 years ago

    TDD is basically automated and "named" REPL-driven development. Which is very nice!

    • infogulch 5 years ago

      Oh that's an interesting take, and feels right to me. So the thing that holds back TDD is ergonomics (compile time/cached results) and maybe marketing.

  • lugged 5 years ago

    It's almost like Test Driven Development is the practice of using tests to drive development.

    • dgb23 5 years ago

      Much of the TDD mantra is less about development and more about (perceived) correctness; the article in question emphasizes the development part, where they describe their aha moment.

      • berkes 5 years ago

        > Much of the TDD mantra is less about development and more about (perceived) correctness,

        I have (seriously!) never heard it referenced like this at all. Every introductory article, every book, every tutorial, every video about TDD (or BDD) that I consumed emphasised that it is about "workflow", about "a way of development", and not about perceived correctness.

        Do note that the crucial difference is that TDD > automated testing. In the sense that with TDD you write automated tests (that live in a test suite), but "writing tests" or "developing a test suite" is not necessarily TDD.

        • dgb23 5 years ago

          Yes in relation to automated testing it is obviously focusing on the development aspect, but as a whole method there is an at least implied and sometimes explicitly emphasized correctness benefit, that doesn't just come from automated tests but from the method of driving design through testing.

          In my personal opinion this claim goes too far. I think we can often explore aspects of design through code (including tests) but it can't be the main driver of design in any circumstance, nor does it by default or in general make our modules and programs more correct.

          The key benefit of doing this is that we have to think of code in terms of invariants and testability. But that is just one of many, sometimes conflicting factors and there are certainly more approaches to doing this than writing automated (unit-) tests.

        • hwayne 5 years ago

          The original TDD book said it was both.

      • ffhhj 5 years ago

        > (perceived) correctness

        Right, in TDD there is "always" a range of values against which you are not testing. A developer can only create a finite number of tests, and the program can fail on any of the missing permutations.

      • lugged 5 years ago

        Also need to echo the above.

        The TDD book itself is barely even about testing; it's almost entirely about development processes and different types of refactoring, and how to use tests to get from a to b.

        Correctness is rarely emphasized if at all.

ramblerman 5 years ago

I'd be curious to hear what kovarex thinks in 2-3 months.

TDD is often sold as a fix-all solution, which is incredibly appealing to mgmt and quite fun for most programmers as a new paradigm, allowing for quick adoption.

It also has its uses, especially in the enterprise space where requirements often aren't clear. But I don't know many good programmers that truly stick to the dogma after the honeymoon period. It becomes just another tool in your toolset.

Uncle Bob is a salesman, not a "craftsman".

  • naikrovek 5 years ago

    TDD and OOP as dogmas are very bad, for different reasons.

    TDD encourages you to write many times the number of lines of code for your tests as the code you're testing, often 10X or more. So when you inevitably decide "there's a much better way to do this" (which happens to me 100% of the time) then you're not only changing your code, you're changing all of those tests. That's a lot of weight that you now have to deal with.

    Most of the time, that's enough weight that the better design simply doesn't happen and becomes yet another chunk of tech debt that prevents certain things from happening in the future. I've seen that happen, and it's given me a very bad taste in my mouth for "TDD" because "TDD inertia" is the reason that better designs aren't implemented. If that piece of software lives for a while, and grows in scope, someone is going to have to deal with that, and eventually make the architectural change anyway.

    By all means, test your code, of course. If you can find a way to easily generate test cases for an arbitrary code base, by all means, do that, because the test cases are no longer influencing your decision to change how your application works.

    Otherwise, being "test-driven" is bad, IMHO. Software development is never as simple as the various dogmas would lead you to believe.

    • kllrnohj 5 years ago

      I think kovarex is avoiding your TDD concerns by rejecting the notion that everything must be covered by minimal-scope unit tests. This means that there are far fewer (if any) mocks that need updating when the implementation changes, and fewer tests should end up needing to be touched as a result as well.

      See specifically the "Fig. 5 - Test dependencies" section of the post.

      Personally I think mocks are far over-used in tests, and I much prefer the solution kovarex outlines. I think dealing with mocks is the bigger source of test-update friction here than the tests themselves. I was already layering my tests instead of using mocks, as in having no clear line between "unit" and "integration" tests; everything just uses the "real" implementations of things and tests naturally get more complex the higher up the stack they're testing. The idea of sorting by dependency depth is a cool idea I hadn't ever considered, though; I'll be borrowing that.

    • CharlesW 5 years ago

      > So when you inevitably decide "there's a much better way to do this" (which happens to me 100% of the time) then you're not only changing your code, you're changing all of those tests.

      Isn't the point of TDD that you can change implementation at will, safe in the knowledge that the tests will help guarantee that you're not inadvertently changing behavior?

      • bogdanoff_2 5 years ago

        Unless what you're testing has a very clear interface boundary (like a pure function whose behavior is unlikely to change, or the program's end-to-end behavior), there's no clear distinction between behavior and implementation.

        Most of the time you have classes that interact with one another. If you test each class individually, and you decide to refactor what classes do what, you'll need to rewrite a lot of tests, even though the higher level behavior should not change.

      • twh270 5 years ago

        Usually, yes -- your tests should be decoupled from the implementation.

        Usually however, what I see (in Java and Spring Boot land) is test methods which use mocks that are set up to expect certain method calls, e.g. when(accountService.getAccount(id)).thenReturn(account). Typically there are anywhere from six to 20 lines of this stuff per test (and, it's rarely de-duplicated into a separate method). So as soon as the contract to getAccount() changes, a bunch of tests need to change.

        Second, it's common to believe that the "unit" in a "unit test" is either a class or (less often) a method. Consequently, people write test classes for every single class of a group of classes that are acting in collaboration, when what they really ought to do is write tests against the single class supplying the public interface, varying input to that class as necessary to achieve good confidence.
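
        A rough Python analogue of that mock-heavy style, just to make the coupling concrete (the AccountReport class and get_account contract are invented for illustration):

            import unittest
            from unittest.mock import Mock

            class AccountReport:
                def __init__(self, account_service):
                    self.account_service = account_service

                def summary(self, account_id):
                    account = self.account_service.get_account(account_id)
                    return f"{account['name']}: {account['balance']}"

            class AccountReportTest(unittest.TestCase):
                def test_summary(self):
                    # Every test re-states the get_account contract, so any change
                    # to that contract ripples through every one of these set-ups.
                    service = Mock()
                    service.get_account.return_value = {"name": "Ada", "balance": 10}
                    self.assertEqual(AccountReport(service).summary(42), "Ada: 10")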

      • cryvate1284 5 years ago

        This depends on what you're doing. For example, if you are changing an intermediary (abstraction) layer, then the (unit) tests (TDD or not) for those will have to change and the "guarantee that you're not inadvertently changing behaviour" is kinda moot.

        If you do not have TDD, I guess the reasoning is that either you do not have tests for this (probably bad) or, if you do, fewer of them (and so more easily changed).

        Not sure it's an argument against TDD, but I guess if management/the programmer do not know about sunk cost fallacy, it might make them hold on to bad abstractions/layers.

      • ema 5 years ago

        Often the better way to do something is achieved by changing how two components interface with each other. In this case you have to change the tests that test this interface.

    • pault 5 years ago

      I always thought of TDD as a way to exercise your code while writing it, not a set in stone spec of your product. I consider unit tests disposable and basically noise; they're only useful when you're writing code. They are by nature a test of your implementation, not your specification. I usually throw away a lot of them after I'm done writing, and if I refactor I just delete the ones that are failing because I'm writing new ones as I go. There are places at boundaries where more stable unit tests can go, but even there a change to the interface is going to break all the tests anyway.

      The better way to test your specification and behavior, IMO, is to use comprehensive E2E tests. The traditional test pyramid is upside down. If you have an E2E test for every user story in your spec, you can develop with confidence that your new code will not disrupt your users' activity. Unit tests are cattle, E2E tests are pets.

    • imiric 5 years ago

      > TDD encourages you to write many times the number of lines of code for your tests as the code you're testing, often 10X or more.

      If that happens I'd say you're doing TDD too early. TDD can be very useful, but early on in the process of transferring a design to code, all your interfaces are highly unstable and there's a lot of experimentation, so sticking strictly to TDD would naturally be frustrating. I would start a bit later when at least some of the API has stabilized and you don't have to do major changes.

      TDD helps with creating user friendly APIs, since you experience it from the user's perspective. And it forces you to actually write testable code and not incur technical debt that's very costly to remove later (having to refactor code to make it testable).

      > Most of the time, that's enough weight that the better design simply doesn't happen and becomes yet another chunk of tech debt that prevents certain things from happening in the future.

      This reads like you're saying that tests themselves are technical debt...? Because you're going to run into this issue (or should run into it) regardless of whether you use TDD or not. Eventually you'll want to refactor parts of your codebase and, sure, you'll have to change some of your (hopefully mostly unit) tests. So in that sense you can say that doing TDD early creates a lot of technical debt that needs to be resolved quickly, but like I said above, that doesn't have to be the case.

      TDD can be a dogma just like any practice (Agile is my favorite), but it doesn't mean that it's not useful if used correctly.

      Kudos to the Factorio team for adopting it, which I think is rare in the gaming industry. The idea alone of testing the complexities of a video game is mind boggling to me as a web developer. Especially in an industry where it's popular to hype and sell broken products with the promise of patches and DLC.

      • naikrovek 5 years ago

        Does TDD not dictate that you write tests first, before you write the code under test? Then you write only enough code to make the test pass, as I recall.

        Yeah, TDD is a mess.

        • imiric 5 years ago

          Why does a practice have to "dictate" anything? TDD is a workflow suggestion, not gospel that must be strictly followed.

          Writing and maintaining a test before any experiments have been done with the design would indeed be frustrating and lead to a lot of rewriting. If you first explore the design and allow it to settle into a usable interface and then write the test for it as assurance that it's stable, you'd have the benefit of being able to safely refactor the feature and quickly add more tests for it. As long as the deliverable includes tests with decent coverage, who cares whether you wrote a test first without writing a single line of the implementation?

          • naikrovek 5 years ago

            > TDD is a workflow suggestion, not gospel that must be strictly followed.

            You haven't been talking to the same people I've been talking to, I guess. TDD is sacred and must be strictly adhered to, according to them. "Anything less is just plain testing and is useless."

            And people sometimes ask why I have such strong opinions against things like this. It's because people who don't write software come up with these things, and their lack of experience hides the realities of software development from them. They mean well, and they cause harm.

  • wpietri 5 years ago

    Nobody should stick to any dogma past the honeymoon period. The point of a dogma is to get you into the behavior space where you will learn how to do something well. It's like using a recipe in a cookbook. Once I had enough practice making scrambled eggs, I didn't need a recipe anymore.

    I learned TDD ~20 years ago from Beck's "TDD by Example" book. For me it's far more than "just another tool". I'll certainly do exploratory work with throwaway code. And in the early stages of something, I might be a bit slack. But the lesson over and over for me is that if I don't get to significant test coverage soon, I end up wasting a lot of time on bugs. And TDD is the easiest way to get to test coverage. Test-first ends up feeling like a set of small successes with the red-green-refactor loop. With test-after, going back and adding tests feels more like drudgery, and I'm less likely to have written the code in a testable way.

    So TDD is not dogma for me, just the inevitable place I end up if I want to maximize the amount of time getting things done while working on a non-trivial codebase.

    • wpietri 5 years ago

      That said, I totally agree that Martin is a salesman. I knew him in the early days of the Agile movement and he did a lot of good then. But he's become consistently more strident, more dogmatic, more unpleasant. It's sad to see, really.

      • roguas 5 years ago

        I think it's the archetypal story of "you either die a hero...". People find new ways to cope with complex problems, then soon become bound to those new ways.

        • wpietri 5 years ago

          No, I think he's become the villain for unrelated reasons. A lot of the early Agile people are still out there doing good work. More than I'd expect, honestly; the people who are attracted to a wild, early-stage idea are a very mixed bag.

    • nightski 5 years ago

      I had a similar trajectory but the opposite experience. After doing heavy TDD, and even FP, for a while, TDD felt more and more like a waste of time. I find that I naturally think and design from a test-driven perspective without actually writing the tests, due to many years of experience.

      1.) I'd much rather lean on the type system to prove things than automated tests if possible. But of course depending on the language that often isn't possible.

      2.) I find that only a small fraction of my code really benefits from automated testing. This is the logic/calculation parts of the code. The rest is slinging IO and SQL queries which all ends up being mocked out anyways and the tests just become secondary implementations of the original code.

      • wpietri 5 years ago

        I am all for people finding the minimum sufficient amount of testing to get the quality they want. And I agree that if one's project is amenable to strong typing, that's a better place to put some concerns than unit tests. I also agree that people newer to testing things tend to ritualistically over-test in ways that feel complete to them but don't really advance quality.

        That said, my solution to that isn't to not test, it's to start with broad, high-level tests that make sure the whole thing works together. Then I add automated tests whenever I get bugs, whenever I find myself manually testing something to make sure it works, and whenever tests would have documentary value.

        For those edge-of-system things, if you haven't tried it you might consider abstracting them out. E.g., with SQL persistence I might end up with an abstract Store and implementations of InMemoryStore and DatabaseStore. The fast version of the tests uses the former and I run it all the time when testing. The slow version uses the latter and it gets run less often. That can get rid of a lot of essentially duplicative testing while still giving a fast feedback loop.
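
        Roughly this shape, for the sake of illustration (the names and the SQL here are invented; it's only a sketch of the layering):

            import sqlite3

            class Store:
                """Abstract persistence boundary the rest of the code talks to."""
                def save(self, key, value): raise NotImplementedError
                def load(self, key): raise NotImplementedError

            class InMemoryStore(Store):
                # Used by the fast suite that runs constantly while developing.
                def __init__(self):
                    self.data = {}
                def save(self, key, value):
                    self.data[key] = value
                def load(self, key):
                    return self.data.get(key)

            class DatabaseStore(Store):
                # Used by the slower suite that runs less often.
                def __init__(self, path="store.db"):
                    self.db = sqlite3.connect(path)
                    self.db.execute(
                        "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
                def save(self, key, value):
                    self.db.execute(
                        "INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
                def load(self, key):
                    row = self.db.execute(
                        "SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
                    return row[0] if row else None

        Everything above the Store boundary gets exercised the same way against either implementation, so the duplication lives in one place.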

  • alanfranz 5 years ago

    My 2c: TDD is great for learning to create testable designs. You can't TDD non-testable code.

    Once you understand how to design testable code, TDD offers minimal benefits.

    The real value comes from thorough, automated testing suites, whatever their origin is.

    • Chris_Newton 5 years ago

      The real value comes from thorough, automated testing suites, whatever their origin is.

      Automated testing is useful, for sure. However, TDD purists tend to focus on one very specific type of testing: automated unit testing, in the small, where you already know the expected output for given input and can easily specify that output using assertions in code. By its nature, TDD emphasises being testable in that specific sense above all else. I don’t think that is necessarily a good thing, partly because that type of universal unit testing may not be a good strategy for every software system, and partly because other useful properties of the code might be diminished because of the changes needed to make it “testable” in the TDD sense.

      • alanfranz 5 years ago

        > where you already know the expected output for given input and can easily specify that output using assertions in code

        Well, "knowing" (or manually generating) the output for your code is required to check whether your code is working. If you execute your code and copy-paste its output blindly, you risk testing the wrong thing, don't you?

        And if you can't specify your code's output, how do you test?

        BTW, I agree with the fact that TDD is overly pedantic at times.

        • Chris_Newton 5 years ago

          Well, "knowing" (or manually generating) the output for your code is required to check whether your code is working.

          Not all code is written knowing the answer in advance. Sometimes you’re implementing a mathematical model to predict next week’s weather, or the effects of a change in economic policy, or the performance of different hull shapes for a ship under different conditions at sea.

          My second standard challenge to TDD evangelists is to write a program to draw a classic multicoloured Mandelbrot image, showing an arbitrary rectangular region of the complex plane, using only the mathematical specification without any reference to existing code or output data.

          This can be done with a short program that is simple enough to verify by eye that it matches the required mathematical specification. However, the problem has an infinite input space, its fractal nature means predicting output values is impossible in the most interesting parts of that input space, and you have no idea where those interesting parts are anyway.
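
          For concreteness, here's roughly the kind of short program I mean (a sketch; the character ramp and viewing region are arbitrary). The escape-time loop can be checked by eye against "iterate z = z*z + c, count steps until |z| > 2", but pinning its output down with example-based tests is another matter:

              def escape_time(c, max_iter=100):
                  # Count iterations of z = z*z + c before |z| exceeds 2.
                  z = 0
                  for n in range(max_iter):
                      if abs(z) > 2:
                          return n
                      z = z * z + c
                  return max_iter

              def render(x0, x1, y0, y1, width=78, height=36):
                  ramp = " .:-=+*#%@"
                  rows = []
                  for j in range(height):
                      y = y0 + (y1 - y0) * j / (height - 1)
                      row = ""
                      for i in range(width):
                          x = x0 + (x1 - x0) * i / (width - 1)
                          row += ramp[escape_time(complex(x, y)) * (len(ramp) - 1) // 100]
                      rows.append(row)
                  return "\n".join(rows)

              print(render(-2.5, 1.0, -1.25, 1.25))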

          • sterlind 5 years ago

            That actually sounds like a fun thing to do in a formally verified language like Dafny. You'd write a "ghost function" for the Mandelbrot equation, then write your code, then prove a theorem that the epsilon between the equation and the code was below a threshold at every pixel in your rectangle.

            I guess that'd still be TDD, but with the first T standing for Theorem :p

          • snovv_crash 5 years ago

            Yes, of course there are cases where it doesn't work. But what percentage of code is that, really?

            • Chris_Newton 5 years ago

              I don’t know what percentage of all code that gets written would be better tested in other ways. I doubt anyone else does either.

              For me, unit testing (in the typical xunit style associated with TDD) is most useful for basic data processing code with predictable outputs. That might include a wide range of code, from little utility functions on strings to the business rules for whole CRUD applications.

              On the other hand, anything with input or output data in an awkward format, anything communicating with any external equipment or remote API, anything involving nondeterminism or heuristics or where the purpose of the code is to perform some calculation where you don’t know the correct answer in advance, these kinds of code don’t tend to fit well with a TDD approach and that style of unit testing as the main test strategy IMHO. Those cover a pretty wide range of code as well.

              • snovv_crash 5 years ago

                Then it's really a matter of the granularity of the tests, right? Maybe they should target a higher level of abstraction, where the nasty details (which can hopefully be simplified in the future) aren't an issue.

                If you don't know what the code is meant to do (non-determinism, heuristics, etc.), to me that says you're targeting the wrong level of abstraction. At some level you know what it's meant to do, unless you're working on an abstract art project.

                TDD done dogmatically is a mess, sure, but then so is anything.

                • Chris_Newton 5 years ago

                  I agree that it may be more useful to test a larger part of the code together instead of trying to do everything at the finest levels of detail. However, my point here is about more than just the level(s) where you perform the tests. It’s also about what kinds of testing you are doing. Here are some of the most common possibilities:

                  • xUnit-style tests for specific cases

                  • Property-based testing

                  • Snapshot-based testing

                  • Manual testing

                  • Formal verification

                  • Peer code review

                  All of these can be useful under the right circumstances.

                  If your code produces output that is best checked by human inspection but shouldn’t then change (or change very much) then snapshots may be a good choice.

                  If your code won’t deterministically produce the same correct output every run but whatever output it does produce should always satisfy certain conditions, maybe property-based testing is a good way to go.

                  If your code involves communication with external equipment that requires operator interaction to do anything interesting, fully automated testing might simply not be possible. In that case, manual integration testing with someone physically operating the equipment might be appropriate.

                  Formal verification covers a wide range of possibilities from the likes of basic static type checking all the way up to the use of automated theorem provers in specialised programming languages. It almost always has some extra cost in terms of annotating the code but it can sometimes produce far more powerful evidence of correctness in the general case than any test suite checking individual cases ever could.

                  Code reviews are pretty much universally good, as long as you’ve got enough people available with the relevant knowledge to do them.

                  Sometimes, these testing techniques are complementary and using more than one of them together might be beneficial. At other times, a coding style or process that favours one might make another more difficult.

                  This HN discussion is mainly about the style of “testable” code that TDD tends to produce, with many small units and lots of dependency injection, which is of course very friendly to small-scale unit testing. However, it might also be more difficult to review because of all the configurability and indirection. If it relies on doubles to stand in for external resources, it can end up testing the accuracy of the simulation more than anything else, so adding very little (justified) confidence that the real system is operating correctly. And as a rule of thumb, individual testing of specific cases may be less effective than testing large numbers of generated cases with property testing, which in turn may be less effective than proving that all cases work via rigorous analysis.

                  And this brings me back to where I came in, which was that prioritising “testable” code in the TDD sense isn’t necessarily a good thing. The real meaning of “testable” depends greatly on what type of code you’re writing and which testing strategies are most helpful.

          • iudqnolq 5 years ago

            What's the issue with using the formula and a calculator to compute sample points?

            • Chris_Newton 5 years ago

              What you are advocating is duplicating the calculations the program does manually for some set of specific cases, which can then be used for tests. Let’s consider a few practical questions about how that might work.

              1. How do you choose which specific cases to calculate manually?

              Remember, you have no prior knowledge of what the correct answer looks like or where the interesting parts of the input space are. You have no way to determine whether any given set of manual calculations is representative of the work your program will be doing or covers the areas with the greatest risk of error.

              2. How practical is it to make all those manual calculations?

              Even in this very simple case, calculating the correct answer for a single input point manually might require hundreds of complex arithmetic operations to be performed. That’s going to be slow and error-prone. After all, isn’t that why we’re writing a program to do this for us?

              Now, how does this idea scale? What if our program isn’t computing a nice analytical solution to a simple arithmetic problem, but instead running a complicated numerical method to process many thousands of data points in each input? It quickly becomes impractical to rely on this strategy for testing.

              3. How will having a set of known outputs you can test against drive an implementation from scratch?

              I mentioned before that Mandelbrot is my second standard challenge to TDD evangelists. The first is to write add(x,y) driven by tests. After a bit of back and forth, this invariably ends up with an implementation that was generalised from however many specific cases were given to the general case. Invariably, that generalisation is the step that actually creates a useful solution to the original problem, and invariably it uses an insight that was not driven by the tests.

              Our Mandelbrot scenario is the same situation, just a slightly more complicated example. No matter how many individual tests you create by choosing sample points in the input space and manually calculating the expected output, you won’t have a systematic way to work back from those answers to derive a correct general implementation of the Mandelbrot calculation. Your specific cases might be useful for verifying an existing implementation, but they give you no insight into how to write a good implementation from scratch. (If I’m dealing with a particularly strident advocate of TDD, this is the point where I mention the word “sudoku”.)

              And again, we have to ask how this process scales. What if we had a more challenging problem, say extracting an audio track from a video file and running a speech recognition process on it to generate subtitles? It might actually be easier in that scenario to identify individual test cases: just take a known video file as input and write down the expected output for it, and now you have an end-to-end test. However, it’s still true that no amount of end-to-end tests will necessarily offer any insight into how to structure a good implementation of the required functionality in detail, nor will it tell us how to implement any specific part or generate useful unit tests cases for those parts. End-to-end test cases might help us to verify an existing implementation, but in general they won’t reliably drive a correct one from scratch.

              • iudqnolq 5 years ago

                You're right that if you play the sort of game where you write the test "assertEq(2, add(1, 1))" and then write "fun add(_, _) = 2" you won't get very far with this sort of problem. I personally (without much experience) prefer the kind of TDD where you write, say, "assertEq(0, add(0, 0)); assertEq(43, add(42, 1))", fully implement the add function, and then move on. In the Mandelbrot case I'd maybe compute what a few pixels should be, write all the code, and see if those pixels are right. Not perfect, but I find it better than nothing.

                • Chris_Newton 5 years ago

                  The point of the example is that the fundamental premise of TDD is flawed here.

                  You can’t reach a program that solves the general case by iteratively writing a failing test for a specific case, making the smallest change required to make that test pass, refactoring, and then moving on to the next test. The important step — what you called “fully implementing the add function” — is taken by implementing some other insight that is not driven by the tests. That step isn’t a refactoring either, because it very much does change the behaviour of the program.

                  To design and implement a good program, sometimes you just have to know what you’re doing. There is no substitute for understanding the problem you are trying to solve and how you intend to solve it. And if you have that understanding and you design and implement your program accordingly, what is the value of following the red-green-refactor process compared to simply writing any unit tests you find helpful for verifying your implementation?

                  • iudqnolq 5 years ago

                    It depends on the atomic unit of work. I'm saying implementing the function add could be an atomic unit that doesn't need iteration.

              • snovv_crash 5 years ago

                It gives you something that can be verified for when someone does some optimization, or ports to a different platform. It isn't for today, it is for tomorrow.

                • Chris_Newton 5 years ago

                  I agree that having an automated test suite might be helpful in those situations, but TDD comes with a lot more baggage.

                  • snovv_crash 5 years ago

                    For sure, it's a tool that some people take as a dogma. But I can say definitively that untested codebases beyond toy-scale are harder to work with, and for all its flaws TDD makes sure that doesn't happen.

    • pydry 5 years ago

      Sacrificing at the altar of unit testability doesn't necessarily make better code.

      Unit tests' inability to handle state is far too often viewed as a problem with the code they can't properly test, rather than as the general crappiness of this form of test.

      • pc86 5 years ago

        Doesn't the state get handled in integration tests? The best environment I've worked in (from a testing perspective) had thousands of super fast unit tests. We had it run continuously (.NET / Visual Studio) and the entire suite probably ran in 3-4 seconds. When you submitted a PR, a suite of longer-running integration tests kicked off that took a couple minutes but a failure automatically kicked the PR back to the dev, and success notified everyone there was a new PR ready for review.

        • pydry 5 years ago

          It does.

          For some reason "code testability" is almost exclusively used to refer to the overuse of dependency inversion to make it easier (although not necessarily more useful) to write unit tests, though.

          I find that code falls into three camps - integration/logical/mixed.

          If it's integration code only an integration test is really useful and a unit test is a pointless waste of time that will do little more than mirror the code you wrote and fail when it is changed for any reason at all.

          If it's logical code a unit test is most useful and integration tests are likely going to be too slow.

          If it's mixed, integration tests are most useful although drawing out and decoupling the logical from the integration code and testing them separately is better in the long run.

        • bcrosby95 5 years ago

          Most of our tests are "integration" tests, with an in-memory database which is blazingly fast. Our longer tests use a real database, which is just a config change away.

          Our projects are so database-centric that I've almost never seen a true unit test fail. The majority of the time it's the tests that use the in-memory RDBMS.

    • ashtonkem 5 years ago

      Good observability standards and fast releases catch more bugs than meticulously maintained test suites, imho.

      • nd 5 years ago

        I actually think if you squint testing is part of observability, except it gives you signal during the development phase. And observability as a concept is fundamental to a reliable system.

        • ashtonkem 5 years ago

          I see where you’re going, but I think if we start defining tests as part of observability then we’ll water that term down to meaninglessness.

          • snovv_crash 5 years ago

            If there's a bug and no test suite to try to reproduce it, how do you know you actually fixed it?

            • ashtonkem 5 years ago

              I define observability as the tools I use to check on the behavior of my system in ad-hoc ways in production. This is quite different from tests, which are designed to exercise specific behavior in a fixed environment. Both are used towards the goal of software quality, but they're different.

              • snovv_crash 5 years ago

                I dunno. Working on projects with no tests for me always feels like a complete crapshoot. Any change could have broken some implicit assumption in some module you didn't know depended on it. Even worse is if it's a dynamic language and you don't even have a compiler to do the basic checks.

                For things like reproducing bugs, in particular, functional tests plus a debugger have been indispensable in seeing how the code actually behaves.

                Sure, production monitoring is nice. But it really only tells you about high level problems, and there are entire classes of bugs, eg. silently corrupted data, which you'll never see there.

                • ashtonkem 4 years ago

                  You’re arguing against a more extreme version of the point I’m making. I’m not arguing for no tests, I’m arguing that uncle bob over states their capabilities by a lot.

  • koonsolo 5 years ago

    The thing is that all that extra test code also needs to be maintained, also contains bugs, etc.

    It all comes down to return on investment. For example, I used to agree with TDD that for every bug, first write a test that fails, and then fix the bug. That way you prevent regression.

    So I proposed this to my manager, and he responded: we tracked all bugs in our system for the last 10 years, and when you look at fixed bugs that got broken again, it only occurred very rarely. So in the end, doing that was not the best investment of effort.

    • asddubs 5 years ago

      One point uncle bob makes is that doing this for everything allows you to make far-reaching architectural changes with confidence that you haven't broken a bunch of things in places you don't even realize. So the tests are a tool that allows you to do refactorings you would otherwise be scared of.

      • Chris_Newton 5 years ago

        One point uncle bob makes is that doing this for everything allows you to make far-reaching architectural changes with confidence that you haven't broken a bunch of things in places you don't even realize.

        I was working on a new project recently. Over the first month or so I went through four or five significant iterations of the architecture before I settled on one that seemed to have a healthy mix of power, flexibility and simplicity.

        Each time, I found the test cases I’d identified during earlier iterations helpful. Ultimately they led me to a more rigorous analysis to make sure I’d covered all required cases in all required places.

        However, I hardly ever kept the test code from one iteration to the next. Each unit test follows a general pattern of arranging whatever scenario I want to test, then calling the function under test itself, and then asserting some expected result. With a significant architectural change, the interfaces for arranging things or to the function under test might be changed, leaving much of the code in the test obsolete. The responsibility being tested might not even be in the same part of the code any more, meaning a whole set of test cases now need to be applied to a different component in the system.

      • ashtonkem 5 years ago

        Statements like that make me wonder when the last time he committed to production code, because that’s just laughably wrong.

        Maybe if you work in a monolith, sure. But most of us work in distributed systems with really complex behavior. No TDD suite in the world is going to catch a thread pool issue that’ll open a circuit breaker in your client.

        • berkes 5 years ago

          > No TDD suite in the world is going to catch a thread pool issue that’ll open a circuit breaker in your client.

          I don't know your setup. But this is normally exactly what system tests do (aka E2E tests, GUI tests and so on; the tip of the Testing Pyramid). In distributed systems, those often live outside of the various components' codebases; maybe in their own project even.

          Edit: because it is a) very impractical to check these things manually, b) often simply impossible to check them before each release, and c) it requires hoops and tricks to get into a state where such events/regressions occur in the first place; a state that is near impossible to reach manually.

          • KronisLV 5 years ago

            I think that what the majority of the people within the industry want to do (microservices and distributed systems, much like FAANG) far exceeds their capability to deal with the complexities (such as testing distributed systems) and doesn't always even fit their needs.

            I get the feeling that if more people were okay with developing monoliths (albeit modular ones), then a lot of things could be easier to do, such as doing TDD properly and being able to refactor without fearing the unforeseen consequences.

            Heck, maybe the projects that I work on in my $dayjob would even have proper documentation, decent test coverage, as well as a set of tools to support the development processes, instead of having to waste time managing the configuration, deployments, as well as system integrations. Maybe it's a matter of there not being enough workforce (or the managerial folk not seeing much point in investing resources into testing and other ops related activities that don't generate business value directly, a worrying trend I've noticed), but right now I'm implementing Ansible and containerization to at least simplify some of this stuff, but it feels like an uphill battle, as I'm also supposed to ship new features.

            Surely I'm not the only one who kind of sees why the person that you're replying to would express the viewpoints that they did? It's hard to do elaborate E2E tests when everything is figuratively on fire constantly and you're struggling with a complicated architecture which may or may not be necessary. I'm probably projecting here, but every single enterprise project that I've worked on has been like that.

      • the_gipsy 5 years ago

        This can also lead to an asphyxiating second-system that prevents you making the slightest architectural change.

      • koonsolo 5 years ago

        That is definitely true and I fully agree with this.

        The benefit is basically all the tests that you don't have to change and that give you confidence that your refactorings don't break certain things.

        But the drawback is all those tests that do need to be rewritten because of the refactoring, and so will slow you down again.

        But in the end for this use-case, I think it's a good ROI. Probably the best use-case for TDD.

      • IggleSniggle 5 years ago

        This is what good sum types are for, and they have the added benefit of not just testing what the output is after the fact, but telling you, as you're writing the thing, what the output must be, making refactoring far faster than waiting on a test suite to fail.

    • tikhonj 5 years ago

      When I fix a bug, I have to reproduce it somehow to make sure my fix actually works. Adding a regression test is a way to save that work as code. Once I have a reasonable test infrastructure set up, the majority of the effort is in understanding and reproducing the bug; going from that to an automated test should not take significantly more effort.

      In return, I get to have some extra confidence that a bug doesn't return (which would be embarrassing, even if it's infrequent!) and I get a more thorough test suite that lets me refactor more quickly and aggressively. And if an old bug does come up again, the advantage is not only that the test will catch it before a release, but also that the person fixing the bug won't have to go through the effort of figuring out how to reproduce it from scratch—it's reproduced right in the test suite!

      So I am not sure that just looking at how often regressions actually happen in the existing codebase is sufficient to make any real conclusion by itself.
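
      For example, a hypothetical sketch of what that saved work can look like (the bug and the names are made up):

        import unittest

        def parse_quantity(text):
            # Function under test; suppose it once crashed on whitespace-only input.
            text = text.strip()
            return int(text) if text else 0

        class QuantityRegressionTests(unittest.TestCase):
            def test_whitespace_only_input_is_treated_as_zero(self):
                # Regression test: this input used to raise ValueError.
                # The reproduction is saved as code instead of as a memory.
                self.assertEqual(parse_quantity("   "), 0)

        if __name__ == "__main__":
            unittest.main()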

    • linspace 5 years ago

      > The thing is that all that extra test code also needs to be maintained, also contains bugs, etc.

      I have often observed an evolutionary behavior in tests: tests that pass easily survive.

    • cbushko 5 years ago

      > So I proposed this to my manager, and he responded: we tracked all bugs in our system for the last 10 years, and when you look at fixed bugs that get broken again, it only occurred very rarely. So in the end, doing that was not the best investment of effort.

      That sounds very hand wavy.

      It is making the assumption that:

        - your bug tracking system and people are so good that they have found all the duplicate tickets.  
        - they don't make mistakes and find all duplicates for tickets. 
        - your code is so good that it hasn't had any side effects that caused regressions. 
        - your boss is so good that he has a full grasp on 10 years worth of bugs.
      
      edit: formatting
    • dgb23 5 years ago

      Regression tests are apparently an effective tool to maintain code stability.

    • peregren 5 years ago

      Even if it's only rare, the test also shows clearly to a reviewer that the bug has been fixed.

      In my opinion, as soon as a test suite finds a bug it has added a lot of value, even if it's rare.

    • drewcoo 5 years ago

      Brushing _that_ tooth is a bad ROI because it rarely keeps a filled cavity from recurring.

  • cjfd 5 years ago

    I learned TDD years ago and never looked back. It is true that there are some things where it is less suitable and therefore I would perhaps decide not to use it. E.g., user interfaces in case of changes that are mainly visual. Or code that tightly integrates with the system. E.g., file system manipulations. For all other code I think TDD is absolutely the best way to write anything more complicated than a throw-away script. Frankly, I am pretty much at the point where I consider anything less than TDD borderline unprofessional. I have come to expect code that was not TDD-ed to be buggy or more complicated than necessary or both.

    • sidlls 5 years ago

      I take the opposite view. Code developed with TDD tends to be overly complicated and inefficient, and the test suites themselves are often bizarre labyrinthian hellscapes of mocks, facades, and workarounds. "They just did it wrong, then," you say? That statement ceases to be meaningful when it can be applied routinely: it's the norm, and if it's the norm, it's not something "they just did it wrong" applies to.

      • blacktriangle 5 years ago

        The way TDD advocates put it: if your tests are ugly, it's a sign your code is ugly. I think this applies to languages and frameworks as well.

        For example, I think of the experience of doing TDD in Rails, with all its ORM and callback interconnectedness, compared to TDD on functional code that is mostly exchanging values between systems. All the stub and mock crap goes away since values become the boundary of the system. Gary Bernhardt has an awesome talk making that exact point.

        I'm now of the opinion that if TDD is a struggle, it's a sign that your environment has serious underlying flaws.

        • sidlls 5 years ago

          "[I]f your tests are ugly, its a sign your code is ugly" is a bit tautological, and highly subjective besides.

          I consider the 50-line function to be "less ugly" than the same function butchered for "testability" into a bog of poor abstractions and single-use smaller functions.

          • disease 5 years ago

            For me at least, TDD has driven me to look for more functional solutions to problems. It's hard to come up with a more testable piece of code than a pure function that spits out the same output every time for a given input.

            Unfortunately this approach tends to be less effective in languages that do not have good support for functional ideas like Java and C#.
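
            A trivial made-up example of the kind of thing I mean (pytest-style):

              def crates_needed(total_weight, max_per_crate):
                  # Pure function: same inputs always give the same output,
                  # no hidden state, nothing to mock.
                  full, remainder = divmod(total_weight, max_per_crate)
                  return full + (1 if remainder else 0)

              def test_crates_needed():
                  assert crates_needed(0, 10) == 0
                  assert crates_needed(25, 10) == 3
                  assert crates_needed(30, 10) == 3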

          • blacktriangle 5 years ago

            I agree some TDD advocates are way too big on LOC as a metric for testability. LOC is pretty much irrelevant for both testability and readability.

        • carlmr 5 years ago

          >TDD on functional code that is mostly exchanging values between systems. All the stub and mock crap goes away since values become the boundary of the system.

          That's generally why I think functional programming is taking off. It's much easier to reason about input-output of stateless functions. They're much easier to test and develop. They're way easier to understand for the reader, because the reader can understand what's happening in one place, instead of having to look through many different files, creating a complex model of all the inheritance relationships between classes.

    • serverholic 5 years ago

      Have you considered that maybe TDD is just a really good fit for your brain?

      I've worked at a TDD-focused company and it always felt like I was coding through molasses. Some of us weren't as strict about TDD and I didn't notice a difference in our code quality.

      • wpietri 5 years ago

        I would be very curious to see the code base there. Some places are pretty bonkers about testing, in a way that seems almost religious to me. Excess mocking, extremely verbose tests, and a lot of cleanup to do in testing-land whenever you make significant changes in the production code. Maybe you were at one of those?

        To me, doing TDD right is about having the minimum amount of testing needed to keep bug rates extremely low. It's also about keeping the test code as well factored as the production code, such that changing or refactoring the production code doesn't create disproportionate work in the test suite to bring it back in line.

        An experiment I've tried repeatedly is to start a new project and shoot for zero bugs. I think I'm pretty smart, so at first that seems achievable without tests. But pretty quickly the complexity increases to the point that I at least have to test manually. And not long after that, manually testing everything gets tedious, so I have to start automating the things I am manually checking. If I keep pursuing zero bugs and feeling like I'm spending my time optimally, I keep ending back up at TDD. Maybe try that for yourself on a hobby project? You might find that the TDD-focused company was better at talking about TDD than doing it.

        • serverholic 5 years ago

          Everything you said is about good testing practice, not specifically TDD.

          TDD can be nice for small modules when you have a good idea of where you want to go with your code. For large modules with a lot of unknowns it can be a pain, and in those cases I prefer testing after a first draft.

          Perhaps you haven't built a module that was very large yet? Maybe try writing a large module with lots of unknowns. You might find that TDD gets in your way more than it helps in the early stages.

          • wpietri 5 years ago

            It's true that TDD and good testing practice are related, so I agree on that part of it. I also agree that sometimes when I don't know what I'm doing, it's worth writing prototype code and throwing it away or bringing it up to standards. Other times I'm fine starting with TDD as a way of exploring the design space.

            As to whether or not I have enough experience with TDD to judge it, I'll leave that up to you. But I've been doing TDD ~20 years, and I wrote a lot of code before that. Let me know if that's sufficient.

        • mypalmike 5 years ago

          I've found more problems with insufficient mocking as opposed to excess mocking. If you're not using mocks, you're likely using real dependencies. And that's where you get molasses testing. One small code change has cascading effects on tests, and small feature changes that take a few minutes to implement can result in a day or more of just fixing the affected tests.

          • twh270 5 years ago

            Look up the Object Mother pattern. The idea is to have 'baked' objects which you can use repeatedly in tests, which fulfill a certain scenario. For example a hotel reservation system needs to be able to handle a single person up to a whole family (maybe with an allergy or special needs thrown in for good measure). So you set up a bunch of Guest objects using a GuestObjectMother.

            When you make a change to Guest, you only need to change the GuestObjectMother and any tests that may directly be involved in the change to Guest. Tests that simply require a Guest to be involved in the test probably won't have to change, as they're simply retrieving the appropriate Guest object and then handing it off to something else.
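
            A rough sketch of the shape (hypothetical names, in Python rather than whatever language you happen to use):

              from dataclasses import dataclass, field

              @dataclass
              class Guest:
                  name: str
                  party_size: int = 1
                  allergies: list = field(default_factory=list)

              class GuestObjectMother:
                  # One central place that knows how to bake Guests for common scenarios.
                  @staticmethod
                  def single_traveller():
                      return Guest(name="Alice")

                  @staticmethod
                  def family_with_allergy():
                      return Guest(name="Bob", party_size=4, allergies=["peanuts"])

              def test_reservation_handles_a_family():
                  guest = GuestObjectMother.family_with_allergy()
                  # ...hand the guest off to the reservation code under test...
                  assert guest.party_size == 4

            If Guest grows a new required field, only the mother (and tests that poke at that field directly) needs to change; tests that just need "some valid Guest" keep working.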

          • wpietri 5 years ago

            It can go wrong both ways. But if changing real dependencies introduces cascading failures that have to be changed in a lot of tests, I'd suspect that there is expressive or structural duplication in the tests that could be eliminated.

    • ramblerman 5 years ago

      > Frankly, I am pretty much at the point where I consider anything less than TDD borderline unprofessional.

      Would you feel that way about the linux kernel for example?

      • wpietri 5 years ago

        If it started today, I certainly would. But I give them a pass because a) legacy code, b) legacy process, c) distributed process, and d) hardware is hard.

        But consider this article on how it is tested: https://embeddedbits.org/how-is-the-linux-kernel-tested/

        It mentions a lot of automated testing, which is good. And if you're doing automated testing anyhow, I think TDD is the most effective way to get there. But consider this:

        "The kernel development process also proves that there is a big focus on testing since around 20% of the development process (2 weeks) is reserved for code integration and the other 80% (8 to 10 weeks) is focused on testing and bug fixes."

        80% is a lot! And that also suggests significant lag between writing a bug and having to fix it, which means a lot more time figuring out what went wrong, and a lot more code written after the bug, meaning really fixing it might be expensive.

        So I think it's reasonable to ask how much more effective kernel development would be if they could safely eliminate that 80% of time spent on cleaning up after the 20%.

        • Jap2-0 5 years ago

          I'm not a kernel developer, but I'm not sure if it's entirely accurate to call it an 80/20 split. Sure, Linus only merges new features during 20% of the release process, but as far as I can tell (again, not a kernel developer) most everyone else continues to work on new features outside of the merge window, and just submits them during that 20%, rather than spending that 80% only bugfixing.

          • wpietri 5 years ago

            Oh sure, I'd expect the labor numbers to be different. But it's still a pretty obvious constraint, and one that causes long feedback loops.

      • cjfd 5 years ago

        Actually, I am not entirely sure how much of the testing there is automated. Presumably there are at least some automated test tools floating around here and there for the kernel. Major parts of the kernel could fall under the 'systems' exceptions where they are very close to the hardware. It is possible to do TDD on stuff that is close to hardware, but it certainly is harder and it is more likely that some failures fall through the cracks when doing TDD. One thing that is different about the linux kernel, as far as I know, is how thorough the reviewing is that is going on for every patch. Also, there is the merge window followed by a long stabilization period. I would say that if you are not doing TDD you are going to need processes like that if you want to maintain stability. Also note that processes like thorough review by multiple people and a stabilization period four or five times longer than the merge window sound really expensive. Much more expensive than TDD. But even if one does TDD, system/end-to-end tests are still desirable. Not as many as without TDD, but without them things would still break somewhat regularly.

        • berkes 5 years ago

          > One thing that is different about the linux kernel, as far as I know, is how thorough the reviewing is that is going on for every patch.

          This is another underexposed benefit of TDD: reviews. When I see a PR with solid test coverage, and those tests are green, it greatly speeds up the process. I just need to read through the tests, see if they make sense, match the requirements, and actually test them, and then glance over the added code quickly.

          It (unfortunately, I might add) is not always possible to flat out reject a PR without solid test coverage, so it is often less work for me to add some tests myself to a PR than to manually poke around the changed code or read through it in detail. This is not TDD but testing-after-the-fact, and those tests are often of much lower quality, but at least it is less work for me.

          Testing greatly speeds up the work for a reviewer.

        • Chris_Newton 5 years ago

          With systems programming, performance matters, and all that extra indirection can be expensive.

          Indirection can also make it harder to rigorously analyse all possible paths through a piece of code and make sure you’ve handled all required cases.

          • dnautics 5 years ago

            It's very possible to do the indirection at compile time, with a flag that's only active during tests.

            • Chris_Newton 5 years ago

              It’s possible, but typically you then pay a higher price on the other point I mentioned.

              I used to work on a library that solved a certain type of mathematical problem, written mainly in C++. It was built separately for many target platforms in production: different operating systems, processor architectures, compilers. Each supported combination of these might require special provisions in the code due to compiler bugs, hardware limitations, etc. Then you’d have diagnostic code that was only included in the debug builds used during development. There are more than enough preprocessor shenanigans to go around in that kind of environment already, and another layer for test-related flags and checks isn’t going to break any records for most readable code.

              • dnautics 5 years ago

                In the languages I work with this is not a problem. Mox, for example, is what I use in Elixir, and even though "swapping out a module" is kind of a global change, it has facilities to track these mappings at runtime, even in concurrent testing, but the module is selected at compile-time, and only in test, so it is a "zero-cost abstraction" for running in prod.

                I think this is just internalized pain from working with C++ (I suffered a lot with that 20 years ago). For systems-level stuff, I'm getting very excited about Zig; I imagine it will be basically "equally easy" to instrument these sorts of things at compile-time in Zig, as types are first-class values at compile-time. Zig is also super-composable; I have an example prime sieve that can run 60+ settings (combinations of single threaded, multithreaded, bool array, bitmap, hyperthread-awareness) by using a compile-time for loop across a series of instrumented settings; so long as I stay within the standard library, the language itself already takes care of the polyfill for platform stuff, and I have access to those flags if, say, I wanted to hyper-optimize for things like SSE. I don't see any mocking libraries yet, but I doubt that instrumenting a mocking namespace in test will be any more difficult than it would be in, say, Elixir.

        • chitowneats 5 years ago

          Tests != TDD. TDD encourages an architecture with lots of indirection so that it can achieve the necessary polymorphism for injecting dependencies in test.

          I can assure you the linux kernel is not TDD'ed.

          • dnautics 5 years ago

            > TDD encourages an architecture with lots of indirection so that it can achieve the necessary polymorphism for injecting dependencies in test.

            This is a language/library thing. Some language/libraries let you do TDD with very little indirection, a table of mocked dependencies as compile-time configuration, and all of the mocking events happen in code in your tests.

          • wpietri 5 years ago

            I think that's typically true, but that's because most TDD is done in languages where there's not a lot of cost for indirection. I expect we'd see a different outcome if something like a kernel were started with a TDD approach.

            For example, we might see ways of expressing indirection for testing that then get removed when building the final product. Or we can look at the great strides made in virtualization since the Linux kernel was started. Something previously hard to test becomes much easier. And I'd expect more so if the testing had co-evolved.

            Of course, there still might be hard-to-test areas. But given the massive ingenuity applied to the kernel development process over the years, I expect that they would have advanced what's possible in TDD if they had pursued it.

            • chitowneats 5 years ago

              Why do you think that linux kernel developers have not changed their approach? TDD is not a new idea.

              If indeed they have continued for decades to do it "the wrong way", imposing costs onto the project and society proportional to the number of preventable bugs creeping in, is this not a major scandal for the industry?

              I suspect that the core team have reasons that they have not developed tooling to enable TDD of the Linux kernel. And that those reasons are more substantive than "meh".

              • wpietri 5 years ago

                TDD is not a new idea, but it's about 10 years newer than the Linux kernel. My experience is that projects very rarely make a switch like the one described by the Factorio team, so I assumed the Linux folks haven't. If you're saying they do work in a TDD fashion, I'm happy to take your word for it.

    • echelon 5 years ago

      It also seems really ill-suited for new systems and places where you're experimenting with new languages or frameworks. You don't know the shape well enough to be able to test the inputs and outputs before writing the thing, and learning via TDD seems highly suboptimal.

      • berkes 5 years ago

        I don't entirely agree here, though.

        If you are learning a new language, new paradigms or new frameworks, then, indeed, often testing is an extra burden that adds little at this point. But in this stage, I hope, one should be aware that the code you are writing is bad anyway. In a year, you'll probably be very ashamed of all the things you did there.

        But in languages or paradigms that you are already familiar with, TDD helps a lot when learning new things. You can very easily learn about a new library by writing tests that use the lib. Or learn a new pattern by implementing them in tests.

        For example, write some tests that wrap a Stripe library, then change the test to change some setting, call a method, or feed it some weird data, and assert some outcome, rather than manually putting stuff in a cart, filling in your address and CC details, and then seeing that the setting you thought did X actually does Y. And repeat again.
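
        A very rough sketch of that style of exploration (the payments client here is a made-up stand-in so the example is self-contained; in practice you would point the same kind of test at the real library and keep tweaking the inputs):

          import unittest

          class FakePaymentsClient:
              # Hypothetical stand-in for the library you are learning.
              def create_charge(self, amount_cents, currency="EUR"):
                  if amount_cents <= 0:
                      raise ValueError("amount must be positive")
                  return {"status": "succeeded", "amount": amount_cents}

          class ExploreChargeBehaviour(unittest.TestCase):
              def test_zero_amount_is_rejected(self):
                  with self.assertRaises(ValueError):
                      FakePaymentsClient().create_charge(0)

              def test_successful_charge_reports_the_amount(self):
                  charge = FakePaymentsClient().create_charge(1250)
                  self.assertEqual(charge["amount"], 1250)

          if __name__ == "__main__":
              unittest.main()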

      • rytor718 5 years ago

        I think you have a fair point in regards to learning new things. You don't need tests to explore tools and languages. Tests are for development.

        That's just the thing with TDD: used properly (which is clearly tricky for many of us), it should help you think through what the inputs and outputs of the program you hope to write are. It does assume you know what you want to build though, even if you're unsure of how to build it.

        I think of it as a design tool to be honest. When I'm not quite sure exactly how something should work, but I have a clear idea of what I want it to do, tests help me work through it in a way that produces extensible code.

        • user-the-name 5 years ago

          It's not tools or languages you are exploring, you are exploring the problem you are trying to solve, and its possible solutions.

          If you have a problem that is well understood, well documented and that has a straightforward solution, TDD is great because you know where you are going.

          But a lot of the time, you have none of that. You don't know where you are going. You need to experiment, you need to iterate, you need to occasionally change directions completely. When you're doing that, TDD holds you back and gives you nothing.

          • karmelapple 5 years ago

            In my experience a problem doesn’t need to be well understood, well documented, and straightforward to benefit from TDD.

            Rarely are we building something with absolutely no idea what the final outcome should be, are we? There's a general idea, a "this new feature should mostly work like this" description, correct?

            If not, you’re doing code very different than the stuff I build.

            And that might be the case! But I’m curious what kind of projects you’re working on where you have basically no idea what you’re aiming for.

            Even when I don’t understand what’s wrong, I’ll generally have an idea of either:

              * what undesired behavior to fix, or

              * what new behavior to add

            I can at least play around with the code a little to see how that impacts existing tests, or write some simple test that will demonstrate a really high level desired outcome. As I iterate the functionality, I might see the opportunity for another assertion or three, and then keep iterating on that until I’ve figured out what I truly want, both functionally and test-wise.

          • rytor718 4 years ago

            I agree with Karm on this. I'm not usually trying to figure out how to solve a problem at the first stage. I'm trying to understand what the app should do. I'm exploring the language or framework to understand how to pull that off.

            Once I know, I can write a test based on just that information. The problems usually come well after that step as my implementation becomes more complex with the size of the app.

      • pydry 5 years ago

        Yes, it's awful for spikes.

        I have done a variant of TDD, for cases where the output is slightly uncertain, where:

        * I build a test with everything filled in except output of some kind.

        * I write code that generates an output I want to eyeball.

        * I run the test in "rewrite mode" where the output is generated by the program and saved with the test. I eyeball it to check if it's ok and then commit.

        * In CI it checks the output against the fixed version.

        There are plenty of scenarios where it doesn't work or is generally inadvisable, but I find it to be a super effective technique where it does.

        I think it's an approach that works better than common-or-garden TDD for code that generates text, images, or sounds.
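
        Hand-rolled, the mechanism looks roughly like this (a minimal sketch; snapshot-testing libraries do the same thing with more polish):

          import os, sys

          def render_report(data):
              # Code under test: the output is easier to eyeball than to predict.
              return "\n".join(f"{k}: {v}" for k, v in sorted(data.items()))

          def test_report_snapshot(rewrite=False):
              output = render_report({"copper": 5, "iron": 3})
              snapshot_path = "report.snapshot.txt"
              if rewrite or not os.path.exists(snapshot_path):
                  # "Rewrite mode": save the output, eyeball it, commit it.
                  with open(snapshot_path, "w") as f:
                      f.write(output)
              else:
                  # In CI: compare against the committed snapshot.
                  with open(snapshot_path) as f:
                      assert output == f.read()

          if __name__ == "__main__":
              test_report_snapshot(rewrite="--rewrite" in sys.argv)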

  • blacktriangle 5 years ago

    You say that like it's a bad thing. Sure, dogmatic TDD can lead to issues, but going through a period of dogmatic TDD has, for me, resulted in becoming a far better programmer. And as time goes on I find myself drifting further back towards the dogmatic side of TDD, as I know the difference in how it feels working on tested portions of our code vs the untested portions.

    • astrange 5 years ago

      It is kind of a bad thing. Recently at work there was a quality program introduced by some training guy, but by quality they seemed to mean write more tests and do TDD, and our junior engineers read that as write a ton of unit tests (the only kind they'd heard of).

      I pointed out the 10 year old Norvig vs Ron Jeffries fight[1] which demonstrates that TDD is useless when you don't already know what you're writing, but they just looked confused.

      The other problem here is that unit tests never break (since you've mocked everything that can break) and therefore aren't worth running; it might be more productive to write them for TDD and then just not commit them.

      [1] https://news.ycombinator.com/item?id=3033446

      • bluGill 5 years ago

        Unit tests do break once in a while. 80% will never break once written and should be thrown away, but that other 20% will unexpectedly break in the future, alerting you to a problem that you otherwise wouldn't find until much later in the release cycle. I don't have any way to know which tests are in the 20% and which are in the 80% to throw away, which is why I keep them all.

        Of course the 80%/20% figures are estimates. The longer your code is maintained the more likely it is some tests will break.

        • karmelapple 5 years ago

          Can confirm. Recently had simple unit tests that almost seemed, at first glance, to indicate "huh, why are these tests here?"

          Upgraded to NodeJS 12 where some details about sorting arrays were changed, and we got 2 failing tests. Before that, I would have bet money that those tests would never fail.

          • astrange 5 years ago

            Was that the point of the tests, or just a coincidence?

            Now that they've failed once they've become regression tests (one of the most useful kind of test), but if you set out wanting to test the platform under you I think you'd want to do a lot more work than that.

            • karmelapple 4 years ago

              I might be missing the question - are you asking if the point of the tests was to check that it would work when we upgrade a version of NodeJS?

              The point of our tests was to document a requirement we had - in this case, how some data sorted. That test was able to eliminate questions of the "how" - whether the sort was done in our database layer or our NodeJS layer - and let us focus on the important outcome: are the things sorted in the right way?

              We didn't write them because of the NodeJS 12 upgrade. We wrote them 5 years ago. We added them because writing them captured a requirement. And when an unexpected reason caused that functionality to break, we avoided shipping a bug in our software.

              Regression tests are certainly useful, and every time we ship a bug, we attempt to write an automated regression test. But I wouldn't argue they're one of the most useful - equally useful is any test that captures a requirement.

      • blacktriangle 5 years ago

        I agree with you, which is why I think the GP's post, that TDD is not a dogma but another tool in your toolbox, makes sense. Just like the larger development community when faced with new tools, individual developers go through a pattern of adoption where they start using a technique for everything (the honeymoon phase) and then, through that phase, they learn when the tool is appropriate and when it is not.

        As posters in your linked thread point out, the irony is that Norvig would probably have been better off using TDD, since he already knew the general shape of his solution, whereas Jeffries likely should have been writing several spikes to explore the solution space rather than using TDD, since he was unfamiliar with the problem and did not yet have a good solution.

      • solipsism 5 years ago

        > The other problem here is that unit tests never break (since you've mocked everything that can break) and therefore aren't worth running; it might be more productive to write them for TDD and then just not commit them.

        That's fine for code that never changes, or that has no logic in it. As soon as you want to change some code, you'll want a portion of the tests to require modification and a portion to stay passing.

      • 8note 5 years ago

        Unit tests that don't break are documentation, both about your code, and what you expect dependencies to do.

        Gives you something to compare against if the dependency changes its behaviour

    • JohnHaugeland 5 years ago

      > You say that like it's a bad thing.

      That's because it is

  • tziki 5 years ago

    TDD, when normalized for time spent writing tests, has not been found to be any better than the 'normal' way of writing tests afterwards. It's interesting how much of programming lore falls apart when you actually try to measure the impact.

  • fendy3002 5 years ago

    From my experience, TDD is useful if you already have a solid specification, the code / part can be tested, and you / your team understand how to develop unit / integrated tests. The problem arises because usually one or some of the points are unfulfilled.

    And it's not without drawbacks. The increased development time and the need to maintain the unit tests are costly, but rewarding.

    Also, I don't like too many interfaces and mocks that exist only for the sake of testing. I find it usually breaks when integrated and makes code harder to maintain. Maybe I'm just inexperienced.

    • berkes 5 years ago

      > The problem arises because usually one or some of the points are unfulfilled.

      An important idea of TDD is that it allows you to discover those "unfulfilled points" in the tests, when writing code (the tests) that uses an API, instead of when writing the actual API.

      When writing code that uses objects, methods, interfaces and so on, you are in a mindset of writing "what a user of the code would wish there was". This is probably the best place, mindset and moment to define those specs in detail.
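
      A made-up sketch of what that looks like in practice: the first thing written is the call you wish existed, and the implementation follows only once the test has pinned down the spec.

        def test_reserving_a_slot_rejects_double_booking():
            # Written first: this is "what a user of the code would wish there was".
            scheduler = Scheduler(slots=1)
            scheduler.reserve("alice")
            try:
                scheduler.reserve("bob")
                assert False, "expected a FullError"
            except FullError:
                pass

        class FullError(Exception):
            pass

        class Scheduler:
            # Written second, with just enough behaviour to make the test pass.
            def __init__(self, slots):
                self.slots = slots
                self.reservations = []

            def reserve(self, who):
                if len(self.reservations) >= self.slots:
                    raise FullError()
                self.reservations.append(who)

        if __name__ == "__main__":
            test_reserving_a_slot_rejects_double_booking()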

  • mbrodersen 5 years ago

    I haven't had a bug in production for 9+ years. Not a single one. And I routinely do major refactorings, feature improvements, optimisations etc. I can do it because of 9000+ tests: 1800 hand written and the rest auto generated to detect changed behaviour. The test suite allows me to spend 95% of my time adding features instead of bug fixing. Most excellent.

  • skinnyarms 5 years ago

    Am I missing something? It sounds like they are already doing this - not striking off on a new venture.

    • hatsuseno 5 years ago

      > I have to admit, that I didn't know what TDD really was until recently.

      They do already do this (hence the blogpost), but it's something kovarex hasn't explored before, so they're pretty new to TDD.

  • ashtonkem 5 years ago

    Red/Green is a good technique for fixing bugs and extending existing functionality.

    • koonsolo 5 years ago

      What is the chance of a fixed bug getting broken again? As it turned out in our analytics over 10 years: very very low.

      So the effort you put into writing a test for a bug most likely has a negative return on investment. That time could be better spent somewhere else.

      • RHSeeger 5 years ago

        You're ignoring various parts of the equation, though.

        - Writing a test for a bug is part of understanding the bug, much like rubber ducking, it helps you make sure you know exactly what causes the bug.

        - The better you are at writing tests, the less time it takes you to do so.

        - Having tests for the code means the code is testable. In general (but not always), code being testable means it's better code (more readable, less complex, etc).

        - Having tests for code means you have to worry less about breaking it when making changes. This allows work to proceed faster.

        So it's not just "did adding this test prevent this one bug from re-occurring", it's "did adding tests improve our development overall". The latter is far more likely to be true than the former.

      • duncan-donuts 5 years ago

        Maybe for you and your org, though. This isn't universal. I've worked somewhere that would use the phrase "whack-a-mole bugs" because teams would start thrashing and break each other's shit over and over. There's a ton of stuff we could have done — communicate (even just a tiny bit), fewer silos, communicate!, write better code, write tests.

        When your problems are much more fundamental, like teams just flat out don’t talk, writing the test is an easier investment.

      • synthc 5 years ago

        I disagree: when fixing a bug I usually start with writing a failing test that reproduces and exposes the problem, and then fix the problem so that the test passes.

        Understanding, reproducing and fixing the problem is the hard part; once that is done, making a test out of it hardly takes any time.

        Writing the test does not only prevent regressions, it also helps with confirming you actually fixed the bug.

ineedasername 5 years ago

Every time a Factorio thread makes it to HN I feel like a fully recovered meth addict who suddenly has their old dealer knocking at their door:

--

Dealer: "Hey? You there? I got some meth for you."

Me: "Go away! I don't want any!"

Dealer: "Oh now don't say that. You remember how good it is? I know you do."

Me: "I can't, I can't afford it, the price is too high."

Dealer: "What? Come on, it's free! You already paid for it!.

Me: "I'll lose my job, I can't, just go away!"

Dealer: "Your JOB? This IS your job. Open the damn door, THE FACTORY MUST GROW"

Me: ::cowers in the closet chanting please leave please leave please leave::

--

Also it was never good. It was more like a mind virus. Like the sort of problem or project at work you can't stop thinking about until it's done. Only with Factorio, it's never done. Never.

My best defense against it is other video games I can stop & start when needed. Or booting up my VPN connection and picking a work task from my backlog until the cravings go away.

  • hatsuseno 5 years ago

    Factorio is just a personal project for people who are too tired after work to actually do one. Like me. It scratches the itch to "build" something, even if it's only an outlet and not intrinsically productive.

  • LegitGandalf 5 years ago

    I always say, never start with a zero vacation balance!

truncate 5 years ago

> (1) no new features for a quarter or two, refactor the code, learn new methodologies etc

> (2) This allows you to instantly test what you write, and mainly use tests as specification

> (3) the problem comes when you break something and a lot of tests start to fail suddenly

My favorites. Don't expect an entire quarter, but at least some time would definitely be nice. All three are so fundamental, and so often ignored. In my experience, if you get these right, it makes developer life so much easier. As someone mentioned earlier in the thread, TDD is kind of like REPL driven development.

I think one immediate benefit of companies focusing on good code is that engineers can aim for much more ambitious projects, and they can be braver with the codebase. Instead we often end up with 100 over-engineered components with no well defined/enforced contracts, and a set of monolithic tests which runs the entire stack to test the most basic case.

  • sidlls 5 years ago

    On the other hand, with TDD we often end up with code that has been butchered in the name of "testability," and which is both less efficient and more complex than necessary.

    • fendy3002 5 years ago

      Which is, IMO, a bad practice. Too many interfaces, abstractions, and mocks do not reflect the real process. I find it often breaks when integrated with services.

    • leprechaun1066 5 years ago

      This usually happens when the developers in this situation are focusing (or are being forced to focus) on the tests over focusing on the solution to the actual problem in the product. TDD is just a development methodology which is a means to an end, not the goal.

      • truncate 5 years ago

        Yes, I think being a little extreme either way is bad. If TDD makes it hard to test a certain new thing we are implementing, maybe do it some other way.

        I've found that a functional style of programming, or just decomposing functions into smaller functions, often works better with TDD. I'm personally not as much into pure TDD as into writing code that can be tested quickly and easily. In the end, what I want is that when I'm writing code, I should be able to quickly run that specific piece of code and verify some basic scenarios, instead of waiting a minute to push to a cluster, another couple of minutes starting the whole process, and then running the actual test.

      • sidlls 5 years ago

        Development methodologies exist to solve product development problems, not the product problems. In that sense, an organization that adopts TDD necessarily focuses on the tests (as part of the development process), by definition.

        The problem is, TDD is a poor development methodology.

  • fendy3002 5 years ago

    This should be the way. I really like the three-iteration approach: research, stable, enforce.

    First you develop fast and break things with alpha / beta versions. Then you make it stable with bug fixes and minor enhancements. Finally, enforce the code with review, unit tests, code coverage, etc.

    Theoretically they already have a good game engine (a long-lasting product) and are interested in developing it further. Without enforcement, any future changes have the potential to break things. Unit tests (enforcement) reduce that risk and keep any changes / refactoring closer to the specification.

blindmute 5 years ago

I'm not sure I understand why they're committing to such a long term refactor for a game which has already reached the tail end of its sales curve. As far as I know there are no internal monetization schemes in Factorio, and I really doubt further updates will boost sales anywhere near enough to justify the dev salaries.

  • trollied 5 years ago

    They're working on an expansion. See: https://factorio.com/blog/post/fff-365

  • colonwqbang 5 years ago

    I'm also surprised. Factorio feels like a finished game. It's more polished than most games I've played.

    Maybe the devs haven't found a worthy new project yet?

    • naikrovek 5 years ago

      They're working on an expansion for Factorio, and this refactor may have something to do with that.

      Maybe they just want to leave their code in good shape so they (or someone else) can come back to it at a later time and pick it up relatively quickly.

      • kllrnohj 5 years ago

        Expansion packs typically depend on the base game, so improving the base engine is likely directly contributing to the work on the expansion. The expansion is likely changing things about the base game, and they'd want tests to assert both with & without the expansion are still working as intended.

  • EndXA 5 years ago

    Worth pointing out something that they said in the post (emphasis in italics is mine, the full quote is given for context)

    > Imagine you have a company that goes slower and slower every quarter, and then you confront the shareholders with the statement, that the way to solve it, is to do absolutely no new features for a quarter or two, refactor the code, learn new methodologies etc. I doubt that the shareholders would allow that. Luckily, we don't have any shareholders, and we understand the vital importance of this investment in the long run. Not only in the project, but also in our skill and knowledge, so we do better next time.

    This isn't necessarily the full explanation, but it's certainly something to keep in mind.

  • fooey 5 years ago

    I would assume they're working on either an expansion or another game using the same engine

    If you're building a new game and your current GUI paradigm sucks, overhauling it first makes a lot of sense.

  • shepherdjerred 5 years ago

    They're planning to release a paid expansion.

  • robryan 5 years ago

    As he says, they don't have external shareholders demanding that everything they do maximise profit.

    • mattmanser 5 years ago

      Because the worst case scenario is that they get half-way through the refactor, the game's a buggy mess, and then they all move on.

      And that's got a non-trivial chance of happening.

      • xyzzyz 5 years ago

        Sounds like a good deal for customers: either the dev team delivers, in which case they get better product, or they don’t, in which case they can keep using the current version.

  • ashtonkem 5 years ago

    There are some mentions of a DLC in the works, probably that.

    • js8 5 years ago

      Yes. I don't know what the DLC will be (personally I hope for water- and air-borne structures, vehicles and enemies), but I am sure I will pay for it.

      • ashtonkem 5 years ago

        I was planning to, but then I saw Kovarex bare his metaphorical ass on the subreddit, and now I'm conflicted.

        • js8 5 years ago

          Oh god, you should really think about what freedom of speech is (suggested reading: https://en.wikipedia.org/wiki/Faurisson_affair). People should have the right not to be involved in political discussions, and so they should have the right to tell people to take it elsewhere, even harshly.

          I am no right-winger (in fact, I am a socialist), but I understand why Kovarex said what he said, and I think he had a right to do it. Yes, civilized discourse is important, but it's also important to respect individual freedom not to be pushed into such discourse from random Reddit hecklers.

          I suppose it offends your sensibilities not because Kovarex did anything immoral, but because of his word choice. And honestly, I hate this surface level of analysis, where we judge the morality of a person by looking only at the words they use. It's counterproductive to addressing real problems, as there are far too many real villains in this world that are very nice in person.

          • ashtonkem 5 years ago

            > Oh god, you should really think about what a freedom of speech is.

            You're free to read my comment history on this subject matter rather than just making an assumption and patronizing me. You'll probably find that it's more persuasive to actually engage with people than treat them like an idiot right out of the gate.

            > but I understand why Kovarex said what he said, and I think he had a right to do it

            Who claimed that he didn't have the right to say that? This is a weak straw man; nobody is claiming that he can't say that, merely that he shouldn't.

            > I suppose it offends your sensibilities not because Kovarex did anything immoral, but because of his word choice.

            ... yeah? That's kind of how people work. How you communicate matters. Pretending that we should all just ignore him telling a fan to "shove it up your ass" is to basically ignore how humans actually work and think.

            > And honestly, I hate this surface level of analysis, where we judge the morality of a person by looking only at the words they use.

            As compared to what, mind reading? Literally all I have is his words online, and they're toxic and combative.

            I also really dislike that you're upgrading this to "morality" when my original point was about whether I wanted to do business with him. I feel like this is a neat way to side step my freedom to not do business with him.

            > It's counterproductive to addressing real problems.

            So what, I can't make my own decisions based on someone else's behavior because there are other issues? I'm obligated to give him money after he behaved like an asshole because reasons? How absurd.

            • wetmore 5 years ago

              As an outside observer I find it weird that you two are arguing about an event without any reference to what that event was. What did Kovarex say?

            • js8 5 years ago

              > nobody is claiming that he can't say that, merely that he shouldn't

              No, there is no practical difference between the "can't" and "shouldn't", if your intent is to participate in the punishment for what someone does. (Maybe you feel, as an individual, you cannot influence it. But in fact you can easily become part of the mob that does have the real power of punishing somebody.)

              > Pretending that we should all just ignore him telling a fan to "shove it up your ass"

              Yes, of course we should ignore it, it was just a heckler. Even a heckler can be a fan. Everybody can be an a* sometimes, and deserves to be called out about it.

              > As compared to what, mind reading?

              Compared to his actions, of course. If you don't have additional information, only words, think twice about acting upon such information.

              > you're upgrading this to "morality" when my original point was about whether I wanted to do business with him

              It's of course up to you. But why wouldn't you want to do business with him anymore for other reason than to uphold a moral standard? It isn't like he cheated you, in fact we probably agree that Factorio is a great product. The morality comes into it from reasoning about your response.

              In any case, I was speaking in a larger context. People do this all the time. For example, there are people who think Linus Torvalds is worse than Bill Gates because of the harsh words he sometimes uses (in public).

              To me, that is a very superficial way of looking at things. And same here, I think Kovarex is one of very few capitalists who actually deserve all the money they got.

              And that's so sad about all this "cancel culture" thing. It's boycotting people who are not necessarily the worst, just easy targets.

              > So what, I can't make my own decisions based on someone else's behavior because there are other issues?

              You can do whatever you want, all I am saying is you "shouldn't".

              • ashtonkem 5 years ago

                > No, there is no practical difference between the "can't" and "shouldn't", if your intent is to participate in the punishment for what someone does. (Maybe you feel, as an individual, you cannot influence it. But in fact you can easily become part of the mob that does have the real power of punishing somebody.)

                Oh the irony of attacking free speech in order to protect it. How, pray tell, do you intend to differentiate between those who are using their speech legitimately and those who are part of "the mob"?

                • blindmute 5 years ago

                  I don't think anyone in these kinds of discussions is trying to make any provision for limiting speech, or your ability to not buy a product out of protest. People are free to do whatever they want. There is no objective criteria that makes your actions "wrong" or his "wrong".

                  These arguments really just boil down to people disagreeing about each other's reactions to things. What somehow gets obscured by oblique discussions of ethics and politics every time I see this kind of exchange online, is that it's simply two sides disagreeing about what a reasonable response is. There's nothing really to discuss about that. I (and the other guy, probably) think your decision to not buy the game because of the developer's comment is a dumb decision. You think otherwise. There's nothing to argue about here.

                • js8 5 years ago

                  > How, prey tell, do you intend to differentiate

                  I don't think I need to do it (why would I have to?), but it seems pretty straightforward. The online lynching mobs cross the line when they cause real world consequences for the lynched. (However, it is hard to predict in advance that the mob will happen and what the consequences will be. That's why people should be generally cautious about such participations.)

                  In other words, the punishment should be proportional to the crime. Kovarex got his comment deleted and that should close the issue and end the drama.

                  • ashtonkem 5 years ago

                    Yeah, that's what I expected, a proposal to silence the speech of some people to protect others. Maybe you haven't thought enough about free speech, not me?

                    I think Ken White covered this dichotomy best, in this case a few years ago over a guy named Pax who got fired for his intemperate speech. See below.

                    > The foundation of "witch hunt" rhetoric is the notion that some free speech (say, Pax's) is acceptable, and other free speech (say, the speech of people criticizing and ridiculing Pax and his employer) is not. You can try to find a coherent or principled way to reconcile that, but you will fail.

                    https://www.popehat.com/2013/09/10/speech-and-consequences/

                    • js8 5 years ago

                      I simply think there is a difference between criticism (saying you disagree or even cursing someone) and punishment (withdrawing funding, getting somebody fired).

                      And I also think that the whole point of free speech is to be free of the societal consequences of it, whether they carry a government authority stamp or not, because IMHO it's impossible to discern that either (and I think the former Soviet regimes, which started in good faith as inclusive communities and ended up authoritarian, show exactly that problem).

                      • ashtonkem 4 years ago

                        > I simply think there is a difference between criticism (saying you disagree or even cursing someone) and punishment (withdrawing funding, getting somebody fired).

                        What an extremely weird take. Are you saying I don’t have the right to decide who to do business with based on what they say, or that that’s somehow inappropriate? Restraining the right of free association to protect free speech is an awful idea.

                        > And I also think that the whole point of free speech is to be free of societal consequences of it

                        You are wrong. Full stop.

                        Free speech is about the right to say what you want without the government punishing you for it. This also includes the freedom to express disapproval or opprobrium. The ability to express disapproval is fundamental to free speech, any attempt to curtail that is in fact an attempt to curtail free speech itself.

                        If you wish to speak without social consequences, speak anonymously. This is the exact reason why the Supreme Court has held that anonymous speech is a right, to speak controversial ideas without suffering social consequences.

                        Also, you’re criticizing me for considering not buying a future DLC. Under your own theory, aren’t you being anti free speech for criticizing me?

                        > whether they have government authority stamp or not,

                        If you can’t tell the difference between government action and individual people expressing an opinion, that seems like a problem specific to you and not this society.

                        > and I think the former Soviet regimes, which started in good faith of inclusive community, and ended up authoritarian, show exactly that problem

                        Ironic, given that you’re proposing authoritarian responses to protect free speech.

  • ranger207 5 years ago

    Maybe just for fun? This kind of thing is half the game itself after all.

IMTDb 5 years ago

> Imagine you have a company that goes slower and slower every quarter, and then you confront the shareholders with the statement, that the way to solve it, is to do absolutely no new features for a quarter or two, refactor the code, learn new methodologies etc. I doubt that the shareholders would allow that

> now there are 9 programmers

Companies on the stock market don't have "9 programmers". They have a lot of teams of 9 programmers. So while it's true that it would probably be impossible for a stock-market company to completely freeze for a quarter or two, individual teams can still do that.

If the Factorio team grows to tens of programmers (it probably won't and probably shouldn't), I would be very surprised if they found the need - and managed - to freeze all teams together for a big refactoring round. I am also unsure that it would be the right approach. That observation holds whether they go public or stay private.

  • Aditya_Garg 5 years ago

    Okay, imagine you are a small startup backed by VC money. The same situation can still arise.

    • meesles 5 years ago

      Except that, as the author of the article says, they don't owe investors any explanations. VC money will demand results, which you cannot just ignore for a quarter.

tobyhinloopen 5 years ago

I wonder how common TDD is in game development land, especially when you’re using things like Unity or Unreal.

I feel like testing your behaviors is pretty hard, and even if you unit-test your behaviors, there are still integration tests.

I only write games as a hobby and never use TDD, even though I’d like to, since the tooling is either poorly documented or too slow, or both.

Usually this ends with me getting frustrated with the slow development cycle, which pushes me towards more unconventional methods: developing games in JavaScript and using Mocha to run the tests directly in the browser.

  • jayd16 5 years ago

    So there are a few reasons tests aren't as ubiquitous in games as they are in non-game dev.

    You need a large QA team (relative to your team size) to test for fun anyway. The game is constantly getting tested and bugs will get logged. The marginal benefit of automated tests is lower than in other fields because of this.

    Games have no specification. You have almost no idea of even the genre of game you'll end up with at the end of the dev cycle unless you're making a sequel that has to fit into a mold. Sure, you can write tests as you go along and test that enemies with negative health die. The next day someone will suggest "what if they stay alive for a period of time and then explode!" The definition of correct is constantly changing.
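
    To make that concrete, here is a minimal sketch (in C++, purely illustrative; not Factorio's code or anyone's real API) of the kind of assertion that encodes today's rule:

      #include <cassert>

      // Today's design rule: enemies whose health drops to zero or below die immediately.
      struct Enemy {
          int health = 10;
          bool dead = false;

          void take_damage(int amount) {
              health -= amount;
              if (health <= 0)
                  dead = true;
          }
      };

      int main() {
          Enemy e;
          e.take_damage(15);
          assert(e.dead);  // breaks the day the rule becomes "stay alive for a bit, then explode"
      }

    The assertion is trivially correct today and wrong after tomorrow's design meeting, which is exactly the ossification problem below.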

    Tests ossify functionality. It makes it harder to change things because at the very least you need to also change the test. If you're just changing tests whenever you want to suit your new desires, then it's hard to build trust in the tests.

    Games don't need to be correct. They just need to be fun. This also decreases the marginal benefit of tests compared to other industries.

    That said, it would be natural to unit test some data structure or some well-defined system. Also, once your game is done, a la Factorio, you can go back and write tests for some refactor because you know the full design specs.

    • indeedmug 5 years ago

      I don't know if "correct" and "fun" are at odds with each other. There are very famous examples of games failing at launch because of bugs, like Cyberpunk 2077. There is a point where the game is too broken to enjoy. You want games where the correct behavior is the fun behavior. (However, there are counterexamples like Goat Simulator.)

      To be fair, I don't pretend to know how CD Projekt developed their games. They might already have had testing, and the timetable was the problem.

  • mewse 5 years ago

    In twenty years in game development I have never worked on a game which had real unit tests or even integration tests.

    I’ve seen engines which used them, but not games. The rationale was that it was just too hard, which always felt like a cop-out to me.

    I would dearly love to have more automated tests in the game I’m working on now, but I’ve never seen a model of it working well which I could copy, and part of me suspects that it’d be a huge investment of time to figure it out entirely on my own when I’m already vastly overworked as it is.

    If anybody has references to in-depth case studies of making game engines more friendly toward automated tests, I’d be super interested to see whether there are lessons I could apply to my own situation!

    • teamonkey 5 years ago

      It's not exactly what you mean, but automated gameplay testing is fairly common in the AAA space, although maybe not taken seriously enough.

      A simple example might be starting and closing the engine after the automated build & package process to make sure the game actually runs (a rough sketch of that idea is below), but I've seen things like using bots to emulate player behaviour to smoke test gameplay functionality. No Man's Sky used automated tools to evaluate the procedural-generation algorithms[1]. Here's a more comprehensive example of automated gameplay testing[2].

      [1] https://youtu.be/sCRzxEEcO2Y?t=3100 [2] https://www.youtube.com/watch?v=VVq_hgaX8MQ
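
      A rough sketch of that start-and-close smoke test, assuming a hypothetical `game` binary with a hypothetical `--smoke-test` mode (neither is a real engine's interface):

        #include <cstdlib>
        #include <iostream>

        int main() {
            // Launch the freshly packaged build; the imagined --smoke-test mode
            // boots the engine, runs a handful of frames, and exits cleanly.
            const int status = std::system("./game --smoke-test");
            if (status != 0) {
                std::cerr << "smoke test failed, status " << status << "\n";
                return 1;  // fail the build & package pipeline
            }
            return 0;
        }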

      • mewse 5 years ago

        I’ve definitely seen some automated “pretend to be a player” systems which either spammed controller inputs at random or were able to take some very constrained actions within some portion of a game (undirected driving of a car, for example). Those are neat and helpful, but don’t really have the rigor of proper testing where you know that the same thing is being tested every time.

        One of my stumbling blocks at the moment is that the machines in my CI setup don’t have graphics cards in them (or monitors attached, etc.); they actually can’t run the game, and I’d have trouble detecting the difference between “runs successfully” and “runs, but only displays a black screen” or other critical graphical issues.
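
        If the capture part ever gets solved, the check itself is straightforward; a minimal sketch (assuming I could get raw RGBA bytes out of some offscreen or software renderer, which is the genuinely hard part):

          #include <cstdint>
          #include <vector>

          // Heuristic: call a captured frame "effectively black" when fewer than
          // 1% of its pixels rise above a small brightness threshold.
          bool frame_is_effectively_black(const std::vector<std::uint8_t>& rgba,
                                          std::uint8_t threshold = 8) {
              std::size_t bright = 0;
              for (std::size_t i = 0; i + 3 < rgba.size(); i += 4) {
                  if (rgba[i] > threshold || rgba[i + 1] > threshold ||
                      rgba[i + 2] > threshold)
                      ++bright;
              }
              const std::size_t pixels = rgba.size() / 4;
              return pixels == 0 || bright * 100 < pixels;
          }

          int main() {
              std::vector<std::uint8_t> all_black(4 * 640 * 480, 0);
              return frame_is_effectively_black(all_black) ? 0 : 1;  // 0 = heuristic flags the black frame
          }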

    • dsego 5 years ago

      Not TDD, but here is a talk about automated testing at Croteam:

      Continuous integration and testing pipelines in games - case studies of The Talos Principle and Serious Sam https://www.youtube.com/watch?v=YGIvWT-NBHk

    • tarcon 5 years ago

      Interesting. I always assumed that to get the balancing feel right, a game would have to run huge parameterized test suites to make sure the win/lose outcomes for given user inputs stay balanced.
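
      Something like the sketch below is what I had in mind, with simulate_match() as a stand-in for whatever headless simulation a real game might expose (all names hypothetical):

        #include <cassert>
        #include <random>

        // Stand-in for a real headless simulation; returns true if loadout A wins.
        bool simulate_match(int loadout_a, int loadout_b, std::mt19937& rng) {
            (void)loadout_a; (void)loadout_b;
            return std::bernoulli_distribution(0.5)(rng);  // placeholder for real game logic
        }

        int main() {
            std::mt19937 rng(42);  // fixed seed keeps the suite reproducible
            const int matches = 10000;
            int wins = 0;
            for (int i = 0; i < matches; ++i)
                if (simulate_match(1, 2, rng)) ++wins;
            const double win_rate = static_cast<double>(wins) / matches;
            assert(win_rate > 0.45 && win_rate < 0.55);  // tolerance band for "balanced"
        }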

      • beckingz 5 years ago

        Good luck doing this for a huge modern AAA game.

        • mywittyname 5 years ago

          I've always assumed they did this for games like LoL.

          Bots already exist, so the foundation for automated play testing is in place. Take the basic AI and add some functionality to track the effectiveness of various skills or loadouts across plays. Using A/B/n testing to choose the most effective character strategy would probably highlight overpowered loadouts within a few thousand game-test-hours.

          They could probably take analytics from real players, do what's outlined above, and get a reasonable idea of the impact a change will have.

          • anchpop 5 years ago

            The issue with that is that LoL bots are laughably bad, and making them not bad requires a substantial investment. But something like what you describe is possible. DeepMind has a paper where they evaluate chess variants by training AlphaZero on them and seeing which ones generate more balanced games between white and black, and which ones lead to more dynamic games with fewer stalemates, etc.

            • 8note 5 years ago

              Stalemates are great fun; it's "the players agreed to a draw" and "draw by repetition" that are boring.

          • wbc 5 years ago

            I don't think they used bots to test, just RPC endpoints, at least as of their 2016 blog post:

            https://technology.riotgames.com/news/automated-testing-leag...

        • kempbellt 5 years ago

          This is likely why "Game Tester" is an actual job, or at least used to be. 14-year-old me didn't believe it was a real thing...

          "Early Access" seems to be the more popular route now. People are happy to pay to test a game and feel like they participated in its development. Check out Star Citizen if you want to be amazed and depressed by this truth at the same time.

          Sometimes it works well. Sometimes it doesn't. Seems largely dependent on how engaged the devs are with their community.

  • CodeGlitch 5 years ago

    I was in the industry from early 2000 to early 2010. It was only towards the end that unit testing was a thing. At the start we didn't even do code reviews or use a sensible source-control system.

    Yeah it was a painful experience, but I survived.

  • Danieru 5 years ago

    Factorio is perhaps the only game I am aware of using TDD. Lots of engine teams use extensive automated testing, but Factorio is the only major game I know of that applies it to game logic.

  • exdsq 5 years ago

    Anecdotal but I think it's pretty rare - my friend worked as a game developer for Epic and didn't know what TDD (or SQL for that matter) actually meant.

    • tobyhinloopen 5 years ago

      Given how common it is for bugs to reappear in Fortnite, I’m pretty confident their testing suite is either incomplete or not present at all.

  • Thaxll 5 years ago

    Video game clients don't have tests.

  • ashtonkem 5 years ago

    I feel like game dev is probably an area where TDD has some value.

    My team writes distributed systems, which drastically reduces the value of a TDD approach. There’s only so far you can take the technique with a database-backed API before it just becomes absurd.

  • mschuster91 5 years ago

    Testing? In games? There is no such thing on a wide scale, not anymore, since the cost of distributing patches essentially became a small CDN line in the budget.

    Modern games are notorious for using the first, most loyal customers as beta testers (hello Fallout 76...).

    The reason is two-fold... while you absolutely can test some parts of the engine (e.g. collision detection, networking), you can't really "test" stuff that needs a human eye to see if it's working as intended (anything that's rendered) or involves randomness (e.g. fire, fog, water, opponent spawning, loot). That means you have to hire lots of skilled (!) humans, provide them with expensive rigs, and give them time. Which is incredibly expensive.

    • gmueckl 5 years ago

      Just to give you some perspective: my current employer's main product isn't called a game, but it has an engine at its core that is a game engine in all aspects except its name. And we test the sh*t out of it. We have thousands of automated and very sensitive tests on that stuff. Our test suite goes as far as testing for pixel-perfect output. And this involves stuff that is "random". It took us some effort to be random in a perfectly reproducible way, but we got there.
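
      Stripped of all our specifics (the names here are illustrative, not our code), the reproducible-randomness trick is just that no system ever touches global entropy; every "random" effect draws from an engine the test seeds:

        #include <cassert>
        #include <random>
        #include <vector>

        // "Random" particle placement that is fully determined by the injected engine.
        std::vector<float> scatter_particles(std::mt19937& rng, int count) {
            std::uniform_real_distribution<float> pos(0.0f, 1.0f);
            std::vector<float> xs;
            xs.reserve(count);
            for (int i = 0; i < count; ++i)
                xs.push_back(pos(rng));
            return xs;
        }

        int main() {
            std::mt19937 a(1234), b(1234);  // same seed -> bit-identical "random" output
            assert(scatter_particles(a, 100) == scatter_particles(b, 100));
        }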

      Game QA is more involved than that, of course. Content needs to go through a signoff and QA process that involves humans (we do that, too).

    • kempbellt 5 years ago

      This is definitely true for some games, and Early Access is increasingly popular, but for others QA and testing are very much part of the development process.

      I interviewed at a game company a few years ago where one of my daily tasks would have been to spend an hour just playing the game and seeing if I spotted any bugs. I didn't end up taking the job, so I didn't see how involved their actual code testing process was, but it was apparent that they actually cared a bit about quality control.

kevmo314 5 years ago

> Which is a big improvement already, as adding and maintaining the new logic only requires you to look at one place instead of several, and it makes it generally more readable and less prone to errors.

It's interesting to think about the other HN thread discussion about comments vs one-time-call function abstractions in this light: https://news.ycombinator.com/item?id=27546135

I'm a big fan of "put code in one place" too. It was the biggest factor that convinced me that JSX was a great idea compared to separating the templating logic out.

ashtonkem 5 years ago

Honestly, I think the Factorio team probably now knows more than Uncle Bob does, based on their blog posts.

nanis 5 years ago

This is a neat article. I do have comments about testing in general, though.

IME most developers do not understand that each test has four possible outcomes:

* Code is good and test passes

* Code is bad and test fails

These are the only two outcomes developers focus on: when I ask what they should do if a test that used to pass now fails, they always tell me stories about how to debug the code under test.

There are two additional possible outcomes:

* Code is bad yet test passes (false negative)

* Code is good yet test fails (false positive)

Again, IME, most people do not look at the test again once it passes for the first time.

As a result, tests, which are themselves code, become the largest untested part of the code base. You get thousands and thousands of lines of untested code, yet you have 100% code coverage.
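
A toy illustration of the third outcome (not taken from any of the posts below): median() is wrong for even-sized input, yet the test passes and every line counts as covered.

  #include <algorithm>
  #include <cassert>
  #include <vector>

  double median(std::vector<double> v) {
      std::sort(v.begin(), v.end());
      return v[v.size() / 2];  // wrong for even-sized inputs: should average the middle two
  }

  int main() {
      const double m = median({1.0, 2.0, 3.0, 4.0});  // true median is 2.5; this returns 3.0
      assert(m >= 1.0 && m <= 4.0);  // so loose it can't fail for this input: a false negative
  }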

Some of my blog posts on testing:

* Deception in tests considered harmful https://www.nu42.com/2017/02/deception-in-tests-harmful.html

* Know what you are testing: The case of the test for median in Boost.Accumulators C++ Library https://www.nu42.com/2016/12/cpp-boost-median-test.html

* Who is testing the tests? https://www.nu42.com/2015/05/who-is-testing-the-tests.html

* Slashing one's feet with tests, or, how to fix 2,950 test failures in one fell swoop https://www.nu42.com/2015/08/fix-2950-test-failures.html

harryf 5 years ago

Came here hoping they’d turned Factorio into a tool for creating tests in other codebases. Like literal gamification of work.

  • dgb23 5 years ago

    Have you looked at Flow-Based Programming?

    It has certain characteristics that align with games like Factorio or Oxygen Not Included, such as visual programming, backpressure, common interfaces, local retention, etc.

    I can imagine this being applied to distributed/cloud computing as a way to reason about high level interactions and perhaps functional/integrated testing.

    • ashtonkem 5 years ago

      I’ve done a bit in my home automation; it’s no replacement for a scripting language.

tgtweak 5 years ago

Damn, I misread TDD as TTD and got excited that they were building it into Factorio... Great article though, was not disappointed.

achairapart 5 years ago

Warning: This page almost crashed my browser (Firefox on macOS) and set my CPU on fire.

  • DizzyDoo 5 years ago

    My poor 2015 MacBook Air with Firefox went to full 100% CPU on this page; I think it's the gifs.

    I think Factorio itself actually runs better on this laptop than that Factorio blog post does.

    • kart23 5 years ago

      I have a dual-core 2015 MacBook Pro running FF too. htop showed my CPU pinned at 100%, with VTDecoderXPCService taking the lion's share; there probably is something weird about the gifs.

      • kllrnohj 5 years ago

        the "gifs" all appear to be mp4 <video> elements, not actual gifs. Possibly FF is doing something weird here (such as playing videos that aren't visible), possibly it comes down to whether or not the hardware decoding can handle it or if it's falling back to CPU decoding.

  • Metacelsus 5 years ago

    I'm also using Firefox on Mac and had no issues. I'm blocking their JavaScript, though.

    • kllrnohj 5 years ago

      There doesn't seem to be any meaningful JS on the page. A single Google Analytics script and a tiny[0] little toy script for doing a silly animation when you click on the #rocket element.

      Possibly the Google Analytics script is doing something heavy (although it doesn't look to be when spot-checking with a profiler), but there's otherwise no JS that runs continuously.

      0: https://factorio.com/static/js/factorio.js

  • Diggsey 5 years ago

    Strange. I'm using Firefox on Windows and didn't notice any problems.

    • Aachen 5 years ago

      Firefox @ Linux, also no problems (and a crappy CPU at that). I've noticed some of their very-gif-heavy posts slowing down this laptop before, but not this post.

  • Smaug123 5 years ago

    Likewise (Firefox 89.0.1 on macOS 10.14.6) I had to close the page once I'd scrolled down to the layout of the various building interfaces; ended up just reading the HTML.

  • diimdeep 5 years ago

    Same, this page hangs all of Firefox on macOS.

bluGill 5 years ago

> Imagine you have a company that goes slower and slower every quarter, and then you confront the shareholders with the statement, that the way to solve it, is to do absolutely no new features for a quarter or two, refactor the code, learn new methodologies etc. I doubt that the shareholders would allow that.

It is called restructuring, and big companies do it all the time. Investors allow it, though they are rightly suspicious - sometimes it is good, but often it is change for the sake of change and not change for the better.

happyweasel 5 years ago

You can TDD as much as you want once the initial game mechanics are in place and a game prototype shows enough promise to be carried through to completion, because then most of the core stuff/ideas/principles won't wildly change or be thrown away. The core mechanics are in place. But would TDD help you reach that stage? I guess it is simply too much overhead. So yeah, this is TDD after the fact ;-). I still love TDD :)

  • chii 5 years ago

    > But would TDD help you reach that stage?

    If your game has a lot of interactions and you want to make sure that your changes are not causing unintended interactions, tests like these would help a lot during development.

    • marcosdumay 5 years ago

      There is always a comment with that claim on a TDD thread. Just to clarify, writing tests is not TDD. There are plenty of ways you can end up with a codebase full of tests; TDD is only one of them.

      But anyway, I doubt tests help at all in the prototype phase (however you arrive at them). My guess is that they are incredibly harmful.

  • meheleventyone 5 years ago

    > You can TDD as much as you want once the initial game mechanics are in place and a game prototype shows enough promise to be carried through to completion, because then most of the core stuff/ideas/principles won't wildly change or be thrown away.

    If only game development worked this way!

Aardwolf 5 years ago

I used to follow Friday Facts until it stopped being weekly. I'm glad that every Friday Facts post now gets posted on Hacker News; that serves as my notification for new ones :)

I also misread TDD as TTD (related to the trains in Factorio) at first.

  • depaya 5 years ago

    The trains are my favorite part of Factorio. I would love "TTD but in Factorio"

AwaAwa 5 years ago

While I'm ambivalent on TDD, there seems to be an attempt at cancellation brewing over the author's invocation of Uncle Bob.

swiley 5 years ago

Factorio convinced me that some people still write good closed-source commercial games. I wish the best for the authors and hope they don't stop any time soon.
