Settings

Theme

Ask HN: How do you deal with 2000 line functions?

26 points by pinkunicorn 10 years ago · 71 comments · 1 min read


I'm working with an archaic code base and I'm stuck at trying to add new business logic into a 2000 line function that is absolutely crappily written with nothing but if() conditions in it. Given such a scenario, how do I go about adding a new functionality to this mess? Should I tell my manager that the function needs to be refactored and take time to refactor it?

Edit: Removed offensive word

ndesaulniers 10 years ago

1. Unit tests. Otherwise modifying old code is full of potentially unforeseen landmines.

2. Read the function a few times to get a sense of what it's doing.

3. Break the function into blocks of statements that are performing a related task.

4. Factor those out into smaller functions.

5. Recursively repeat 3-5 until you're satisfied with the new functions' line count.

> Should I tell my manager that the function needs to be refactored and take time to refactor it?

It's absolutely worth mentioning. Time estimations are one of the harder tasks of Software Engineering. Being up front with your manager can help better set expectations.

  • rrauenza 10 years ago

    I would add assertions in as well as you iterate. It will validate whether you understand the flow of information through the function.

    • phpnode 10 years ago

      Absolutely this. Not only do assertions make it easier to validate your understanding, they capture and preserve that understanding for future developers (including yourself, 6 weeks from now).

      Assertions are comments which can always be trusted and which never get out of date. It baffles me that there's so much reluctance to use them (especially in dynamic languages).

  • bdavisx 10 years ago

    There's no way to add unit tests to a multi-thousand line method. There's almost no way something that long was written in a testable manner.

    You can probably add some tests around the inputs/outputs, depending on the structure. For example you might have to check the database before/after and other things like that. But it won't be "unit" tests in the conventional sense. More like Integration tests, because something that long should not only be multiple methods, but multiple classes.

  • afarrell 10 years ago

    Unit tests might be hard on a function like this and you might have to admit that your tests are more integrationey than you'd like.

tinco 10 years ago

Don't tell you manager anything, just do it. If someone tells you to add functionality to a 2000 line function, it takes extra time. If you want the company to benefit a little bit more from the time you spend on grokking it (by saving the next poor soul some time) by splitting it up.

Go to the part that you suspect you need to modify, identify a block of code that together forms something you can give a name, i.e. "initialize_doodad_struct", "validate_widget_params", pull it out put it in a function. Now look at all identifiers that are referenced or declared in your block of code, do a ack or ctrl+f through the 2000-N line function to 100% verify that they're not referenced anywhere. If they aren't you can safely encapsulate them in your new function (if you're working in a static language you probably could just right-mouse-click-extract to do all this).

Then immediately verify that the code still works. Run it, ideally in unit tests but most certainly also in real world(-like) scenarios. Do this for every small step. If your codebase takes 2 hours to compile, instead make a commit. Make a commit for every single 3-10 line extraction you do. (do it anyway, it'll look good and everyone will love you)

And then repeat until either the part you wanted to modify is clear to you, or there's nothing more to extract ;)

By the way, I'm a Ruby programmer mainly, and in Ruby any function over 10 lines is frowned upon. I know that in some programming language communities (looking at you C and C++!) a 200 line function doesn't raise any eyebrows, and they'll sometimes even scold you for extracting a function if it doesn't 'add' anything. I think this is because in C every function pollutes the global namespace, so there's a little more cost associated with it.

  • radicalbyte 10 years ago

    If you write C/C++ then you probably need fast code. Function calls - unless inlined - have a small cost. The cost is really rather small (think memory writes) and can be negligible depending on where the values are stored.

    That is of course rarely the reason. From what I've experienced it's usually that the developer is uncomfortable with abstraction (at least that was my reason many moons ago).

    • joshka 10 years ago

      If speed of algorithm / function is a concern on particular functions, write a test that measures this. Refactor as much as you like until you hit the limit of readability vs adequate speed.

  • azth 10 years ago

    > and in Ruby any function over 10 lines is frowned upon

    Citation required.

mod 10 years ago

You test the shit out of it, then re-write it ensuring tests pass, then add your new code.

It's probably easier to do that than to muck around in the 2000-line version hoping you don't screw something else up.

  • NDizzle 10 years ago

    Yep, this is the only way to handle these beasts. Don't touch anything. Make a ton of tests. Spend some time validating those tests. Get someone to write off that those tests are sufficient. Start refactoring!

    • falcolas 10 years ago

      For a function with 2,000 lines of code, we have to be honest with ourselves and accept that testing will never be sufficient; if there's 2,000 lines of code, you can bet good money that there's global state manipulation as well.

      Making the assumption that anybody is capable of sufficiently testing such functions and subsequently re-writing them will only introduce new bugs and old regressions.

      Seems harsh, but I've had to work with a few such monstrosities. Global state galore.

      • johnmaguire2013 10 years ago

        Global state is one of the trickiest problems we are trying to solve at my workplace. Our entire, 10 year old, PHP codebase relies on a single static class (and now some supporting static classes) aptly called "Meta". What does it do? Meta things.

        It's basically an abstraction over the entire database layer, that has probably over 30 toggles that denote where and how it should fetch data, given a table name and a function call.

        And every database call in the entire project is dependent on it.

      • aeneasreid 10 years ago

        Test if you can, but I agree. Thinking unit testing will solve the problem can be incredibly naive especially if the code has lots of global state. Sometimes, just setting up all this state for a test is equivalent to rewriting the application. If the rest of the application is written like this, chances are this function makes a ton of 2k line calls of its own. Unit testing just might make sure you aren't doing something catastrophically dumb.

      • seanwilson 10 years ago

        I've been here too. If possible, try and pull someone in who's familiar with the code to hopefully help you test and simplify it.

        2000 lines of code is a massive amount of behaviour to understand. There's a minuscule chance you can infer all that behaviour from reading the code unfortunately.

      • NDizzle 10 years ago

        That's why I mentioned getting a write off on the changes. If several peers, project managers, etc write off that they expect the behavior to be one of these things, covered by these other tests, and it's not, well... Everyone just got educated if and when it breaks!

  • awayaccount53 10 years ago

    The other trick once you have tests is to change the code to be easier to work with without changing its behavior. Prefer doing this in small steps that you can be confident about. Once the code is in a form amenable to change, which usually means a change in one place doesn't have non-obvious behavior in other places, bugfixes can be applied.

  • pinkunicornOP 10 years ago

    Makes sense. So step #1, get 100% coverage for that thing.

falcolas 10 years ago

Slowly and carefully, like you're defusing a malfunctioning clockwork bomb.

1) Understand at a very fundamental level what it does (there are probably multiple things).

2) Understand at a very fundamental level what it does (this is really the hard part)

3) Document your hard-earned understanding (tests, ironically, do not work well for this, since there are no fundamental units to test at, and there likely is a metric ton of shared state)

4) Refactor it into something easier to understand and test.

I've had to work with a similar 5,000 line C function at the heart of a DSL parser; literally the core of the entire business. Changes were made very slowly, and with a lot of consideration. Refactoring is still underway 6 years later.

kpil 10 years ago

Pft, 2000 lines :-)

What you do is that you:

A) Realize that there might be years of hidden bugs, but also fixes to complex problems, speedups, kludges, weird changes to requirements, etc, hiding in the code, so even really creative unit-test might not cover real-world edge-cases.

B) Take as small bit as you can, and just refactor out the smelly code into smaller parts. Often a good name around smelly code helps a long way.

D) Iterate. Until good enough.But not more... Maybe you have more important things to do than rewriting ugly code that works.

There is a big risk that since you probably won't understand everything that goes on, and why - you will break something important.

mh8h 10 years ago

Michael Feathers' book, Working Effectively with Legacy Code, is highly recommended. In that book he explains that you can choose between two approaches: Change and pray, or Cover and modify. Then he provides dozens of different strategies to "cover" the existing legacy codes before you make the changes.

  • DanielBMarkham 10 years ago

    +1 on the Feathers book recommendation.

    The good news is that many, many, many people have been here before you. While you may have a lot of work ahead, you can do it safely and in an orderly and safe fashion. (And with something as butt-ugly as that sounds, that's the only way I'd do it)

azeirah 10 years ago

Something I found to be really useful is to remove all statements inside the code, so you'll be left with large skeleton structures. Look at it from a distance, (small font), see some patterns. Can you split it up?

getting a feel for the structure of the function helps me understand it.

Ex:

function someFunction () {

    if () {
        for () {
            if () {

            } else {

            }
        }

        if () {
            for () {
            
            }
        }
    }

}
  • al2o3cr 10 years ago

    To expand on that, something I sometimes find useful with giant functions with too many 'ifs' is to pull out the code that runs for each of the possibilities. Frequently I'll discover that there are actually several different overlapping functions (possibly with some common parts) mashed together with the differences wrapped in 'if'.

    Ferinstance:

      def giant_method
        if thingy1?
          do_stuff
        else
          do_something_else
        end
        do_common_stuff
        if thingy1?
          finish_up_for_thingy1
        else
          finish_up
        end
      end
    
    It's easy to spot that there are really two paths through the method (depending on what thingy1? returns) in this simplified form, but rewriting it will make it clearer even in a 2k line function:

      def giant_method
        if thingy1?
          do_stuff
          do_common_stuff
          finish_up_for_thingy1
        else
          do_something_else
          do_common_stuff
          finish_up
        end
      end
    
    This is also useful even when the 'if' clauses aren't identical - even then, there are frequently implied relationships (for instance, if one checks if something is positive and the other checks for zero) that can simplify things.
  • pinkunicornOP 10 years ago

    Ah I do this with if() block folding in Vim.

    In this case, the developer actually has return statements inside some if blocks. A general point that has stood out from all comments is to first understand how the function achieves what the function is trying to achieve.

    • odonnellryan 10 years ago

      A return point inside an if statement is a great place to do a refactor. Refactor anything inside that if into its own function, write a test, and see what happens :)

yk 10 years ago

Depends, but very likely. There may be cases were there is practically no way around a 2000 line function, one example would be a something like a very long switch-case statement, were the logic is very simple, but there are a lot of cases to cover. For example something equivalent to (python pseudo code)

     dict[ keyword](args)
were dict is a large dictionary containing function objects. In that case the dictionary approach has the advantage of separating the logic from the clutter, but it is possible to work with 2000 lines of

     case something:
         foo();
         break;
However if it is not such a case, then you are in trouble. Thing is, management does not like to be told that the code is a steaming pile of shit, and management is especially unhappy if you tell them that you are going to spend a few month on accomplishing absolutely nothing. From their perspective refactoring is accomplishing nothing: the functionality is there, it works and they do not care about the gory details. So you need to convince your manager that refactoring is a good idea, which just goes against every basic assumption of his job. So think about the actual problems you have with the code, and what the company gains by refactoring it. (Easier extensibility, less risk of bugs and faster turn around if a bug occurs.) Then make sure that you convince your manager that refactoring is the right thing to do. ( Ideally you should enlist your coworkers for that, if they know the kinds of problems.)

Speaking a bit more generally, this is part of why technical dept happens. Selling refactoring is hard, so everybody tries to tip-toe around the 2000 line gorilla until some unhappy soul (read someone else) can no longer avoid to wrestle with it.

lutorm 10 years ago

I like this book, it has a lot of tips for situations like these:

http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

noonespecial 10 years ago

I've had worse. I was instructed to add to the > 2000 line function of nested if/than/else but the function was written by the manager who said "Don't change any of my code, its been working perfectly".

This function was used as a container to hold all the business logic and flow control. It had dozens of parameters that all had to have some value or another passed to keep from throwing undefined errors. There was an integer called "wat_to_do" that would choose different blocks of if/than/else inside the function. Not like case, mind you, scattered everywhere.

I've never really been the same since then. Can you get code PTSD?

joshka 10 years ago

A few people have mentioned Michael Feathers' book Working Effectively with Legacy code already, but I'll add my piece on this. The book describes legacy code as any code without tests.

It then goes on to describe the "legacy software change algorithm" (google that for more articles that will help you. Get targeted legacy code into test harness and cover with tests: 1. Identify change points for the target change or new code addition. a. Find test points where the behavior of the code can be sensed. b. Break dependencies (very carefully and often without sufficient tests) and get the targeted legacy code into a test harness. c. Cover targeted legacy code with (characterization) unit tests 2. Add new functionality with Test Driven Development (TDD) 3. Refactor to remove duplication, clean up, etc.

It really is worth reading this book as it covers almost exactly your question: "22. I Need To Change a Monster Method and I Can’t Write Tests for It." and various other relevant chapters.

Answering your manager question with a standard consultant answer "It depends". Refactoring code is about setting yourself up for future development. You should always be refactoring your code after you confirm it works. Asking your manager whether you do it should take on the same amount of importance as asking whether you should wash your hands after visiting the restroom. That said, developers can often feel a sense of ownership of code and not wish change for various reason (usually familiarity with the current crap situation).

  • joshka 10 years ago

    Additionally to this. In your refactoring process, stick a commit on each step of the refactoring. It will help you go back to a point where you were happy. Squash the commits when you're done if needs be for cleaner history, or leave them if there aren't a bunch of people on the project.

davimack 10 years ago

I would actually approach this differently. I'd go back to whomever the authority is with regards to what the function is supposed to do, or to the spec, and then rewrite it from scratch. This won't ensure that it behaves as it does now but with a little bit added - it will ensure that it behaves as per the specification / user requirement. If the function is that huge, with just if's, it's very likely that it's buggy and that those bugs haven't been addressed. They won't be addressed if you duplicate the functionality in a refactor unless you can get a handle on the purpose of the thing to begin with. Scrapping it and rewriting it is not a bad thing - frequently, you'll make something much tighter if you do that.

(note: would be happy for you to have left the "offensive" word in - 2,000 lines of code for a single function screams for a refactor, and says that the original dev had no clue how to write good code. if the code makes you want to cuss, well....)

  • joshka 10 years ago

    -1 on this. 2000 lines shows that there's a good history of edge cases that are potentially relied on by various other parts of the application or system. Much of the time, the code is the spec. Writing tests for this function will allow you to rewrite an equivalent better function safely, as well as discover functionality that is hidden / ambiguous from the spec, and bugs that may trip you up in the second iteration (whether generated via refactoring or rewriting).

alkonaut 10 years ago

I think making small opportunistic fixes is best. Do what you can do without massively delaying your work every time you need to work in that area of the code. That is, leave the code better than you found it, each time.

If you add the new logic this time and also split it into four 500-line methods and maybe a few extra tests, you have done quite a lot and it is likely doable within the time frame of the work you are doing.

But

If there are zero tests and you aren't confident that you can create those tests then I'd bring it up with management. Maybe they'll say that changes in this functionality will be very rare after this small change and that it's very well tested manually and by customers - then that's it. Not all code has to be made concise and elegant, it's better to focus your effort on the code that regularly needs changing. However, in this case I'd ask management to consider not changing it at all, or make sure it will be very well tested manually after your changes).

wnevets 10 years ago

I would add landmarks and do small refactoring within the function until I'm confident in my ability to start splitting it up into smaller functions. This should happen naturally as you make changes and add new features. However if you're not in the function enough to make this happen then it's probably not worth refactoring to the business.

ufmace 10 years ago

Start out by studying and understanding the function, the way that it's used, and everything that it does. Does it mess with a lot of global state? Is it called on one place for one thing, or a whole bunch of places, or even for multiple different reasons? How hard would it be to set up some unit tests?

I would recommend against any kind of rewrites or radical refactoring right away. You probably don't know what it does well enough yet or what kind of workarounds and bug fixes are in there, so you're likely to bring back bugs. And not many businesses have the time to spare to actually redo something like that correctly all at once.

Aside: You could just give up and quit, but I consider it part of the job of a good developer to be able to take a mess of spaghetti code and gradually turn it into something easily understandable without causing huge delays in business processes or show-stopper bugs. I don't think a developer who is only willing to touch pristine code is very valuable in the real world.

I'd make the first goal to add the required changes with as few changes to the function as possible. Focus your refactoring work at first on trying to get some tests around this thing. Don't try to test every use case right away, but do try to comprehensively test a few tasks, including capturing whatever it does to any state in any other systems. Most likely, a lot of the initial refactoring work will hardly touch the function at all, but instead focus on putting the things that it touches into modular, testable parts. Getting good tests around it will help you understand what it does better, what parts of it are important, and how state is handled around the system.

Then as you go, start adding tests, and gradually refactor things that are tested. Write tests for any bugs that are reported back to you. Anything that looks weird, ask around and see if you can figure out what it's for and if you really need to test for it.

rbrogan 10 years ago

You could start by writing comments. There is little chance of any bugs coming from that. If there are any tests available then you could work on understanding the tests and then step through the code in the debugger. Hope that helps.

tienthanh8490 10 years ago

I know other people already suggested writing test, carefully documenting etc so I won't repeat that. Just want to add one more thing: try to prune the code first as such a large codebase is likely to have redundant code built up over time. By doing this you will have some time to understand the code (to know what you can kill and what you can't), and also save yourself some time as the codebase you have to refactor now is now much shorter and probably cleaner.

agentultra 10 years ago

I've had to do this once. They don't teach you managing code like this! A friend gave me a copy of Working Effectively With Legacy Code[0] which helped me.

The gist of it: a strong suite of integration and unit tests. Isolate small code paths into logical units and test for equivalency.

[0] http://www.amazon.com/Working-Effectively-Legacy-Michael-Fea...

timonoko 10 years ago

I remember linear lists of 1000+ if-statements. These were "state machines", popular form of real-time multitasking 50 years ago. Nothing wrong with that. You might make an analyzer which recognizes longer series of events and impossible states, but why bother. This is a programming paradigm, best you can do is to go with the flow.

fallingfrog 10 years ago

We have a saying where I'm from - "That's the smell of money!" Relax, I'm sure your boss knows that this is going to take a while. Make prudent changes, don't go overboard, take your time and test thoroughly.

frozenport 10 years ago

Where a function ends and where a function starts is somewhat artificial. For example, if this were BASIC it would be just be a label.

I would read the thing a few times and figure out exactly where to add the functionality. Refactor second, if time permits. If the thing is 2k long clearly, refactoring it isn't a priority.

there4 10 years ago

This is a great presentation about refactoring that may be useful: "RailsConf 2014 - All the Little Things by Sandi Metz" https://www.youtube.com/watch?v=8bZh5LMaSmE

lectrick 10 years ago

1) Cover code with tests 2) Refactor. Martin Fowler wrote a famous book on refactoring patterns that are guaranteed to merely transform the code and not break functionality. 3) Rerun tests, repeat till everything passes

tahssa 10 years ago

Do a re-write. If your manager says no to the re-write then add another if condition to handle the current business logic and, at the same time, look for a new job (which is probably what the last guy did).

scarface74 10 years ago

It depends on the tooling you have available. If I encountered a 2000 line C# method, I would use Resharper and start with the extract method command and then extract class.

kaizensoze 10 years ago

https://www.youtube.com/watch?v=dWIHvuABaFU&t=01m05s

odonnellryan 10 years ago

You should read Clean Code, even if it is just the bits on refactoring, so you can refactor properly! :)

quintes 10 years ago

unit test first so you know if you've broken it. Then lots of test input variations best taken from production sources.

Refactor thereafter and test after each iteration of changes. It's not a rewrite, so don't lose the plot just Refactor into smaller methods :)

panamafrank 10 years ago

quit, get a job working in a newer more modern language or just a company working on 'green field' projects. I worked on legacy code bases for ~3 years as a junior grunt and it contributed to some kind of PTSD-lite disorder.

or read the 'ol refactoring book...

zippy786 10 years ago

Send it to me with requirements, I'll do it for 500$

jarsin 10 years ago

Get a new job :)

sharemywin 10 years ago

you probably need a ton of unit tests for this thing.

zerr 10 years ago

Do not touch it.

davidw 10 years ago

Indeed.com, Craigslist and maybe Linkedin.

PaulHoule 10 years ago

Yes, just don't use the word "shit".

The obvious refactoring is to split the function up into smaller pieces, even if you can split it into two 1500 line functions that is a win.

  • josephmx 10 years ago

    Is swearing on HN banned now?

    • hamiltonkibbe 10 years ago

      Apparently not, I personally find the phrase "2000 line function" offensive but that part isn't censored.

    • tinco 10 years ago

      No, but a large part of the community is in the US where many people (not all) take offense to even mild swear words (in contrast to for example the UK where even politicians won't hesitate to call someone a wazzock).

  • pinkunicornOP 10 years ago

    Thanks, that came out of frustration from looking at similar long-sized functions for an extended period of time.

    • Nadya 10 years ago

      Er... am I misunderstanding or is everyone else?

      Tell your manager that you want to refactor it. Just don't tell your manager "it's shit".

      The "don't say it's shit" was when speaking to the manager... not using it on your thread. Unless I'm the one misunderstanding the statement...

      • PaulHoule 10 years ago

        Both, but more to the boss. I can't stand the whole "Get Shit Done" thing, the way that word gets used on HN is like a universal qualifier, i.e.

        ForAll(x): Shit(x)

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection