How to use feature flags without technical debt

blog.launchdarkly.com

60 points by sboak 10 years ago · 50 comments

barrkel 10 years ago

Hmm. This is a different kind of feature flag than I've used, to solve a different kind of problem.

If the feature you're writing takes several man-years of effort, you can't have a feature branch living for several months; continuously keeping it up to date with the trunk is expensive and easy to procrastinate on.

Migrations are expensive and you want to front load them to make turning the feature on less stressful. And you may want to let customers use the feature on a beta basis for a few months before committing to it, and then it may take a year or more before all customers have moved.

For a big feature that cuts across large segments of a big app, I don't think there's an alternative to if statements.

Different apps, different business models, etc.

ori_b 10 years ago

This misses the problem -- often, the new feature is buggy in ways that are not seen in the initial testing. In critical systems, it's often desirable to keep the ability to flip back to the old code around for a couple of release cycles. In a previous life, that has saved my team's bacon.

This comes at a cost, and simply saying "just delete the flags promptly" is a facile solution; if it were easy to delete them quickly, it would be even easier to land the change without the flag and use rollbacks as your 'undo a bad feature' hammer.

  • gioele 10 years ago

    > This misses the problem -- often, the new feature is buggy in ways that are not seen in the initial testing.

    Indeed. The only good way to enable features gradually is to use a tool like GitHub Scientist [0] that exercises the new code path and records its effects but then uses the effects of the old code path in production. This allows weird edge cases to be found and dealt with before enabling the new feature.

    [0] http://githubengineering.com/scientist/ Previous discussion: https://news.ycombinator.com/item?id=11027581
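    Scientist itself is a Ruby library, but the pattern is small enough to sketch. A minimal, hypothetical Python version (not GitHub's actual API): run both code paths, record whether they agreed, and always return the old path's result:

```python
def experiment(control, candidate, publish):
    """Wrap two code paths: run both, record whether the new path
    agreed with the old one, but always return the old result."""
    def run(*args, **kwargs):
        expected = control(*args, **kwargs)
        try:
            observed = candidate(*args, **kwargs)
            publish(matched=(observed == expected))
        except Exception:
            # A crash in the candidate must never affect production.
            publish(matched=False)
        return expected  # production stays on the old code path
    return run
```

    The production answer always comes from `control`; mismatches surface in whatever `publish` records, so weird edge cases are found before the new path is ever trusted.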

  • richman777 10 years ago

    This is similar to my experience, though there have been situations like the one the article describes. Usually, however, those are weeded out by the time deployment to production comes around.

    In that case, the QA team is already testing off the feature branch, and when they're done testing you probably don't need a cleanup branch, since you've already tested the case where you're not going to use the flag.

    It seems that the method posted in the article would be very useful for when you have a strict release cycle with versioning/features in place and you know with pretty good certainty when you are either going to sunset an old feature or require all users to be on the new feature. In that regard, I could see this working very well.

  • pkaeding 10 years ago

    Author here.

    The point is not to delete the flag promptly, the point is to delete it cleanly, when you are confident the new feature is good.

    • ori_b 10 years ago

      In that case: I don't think I've ever seen any team have problems with how to delete a flag. The struggles tend to be about when.

      • lijason 10 years ago

        When you say "when", do you mean deciding when the best time is, or making sure you, as a team, consistently do it at all? The latter is, in my experience, the hard part. It's hard to make it part of the process, because by necessity it comes quite a while after the feature has been released, when no one is actively working on bugs related to it.

        We've started to use automated scripts that identify "old" feature flags that are turned on to 100% that are still in the code base and treat them similar to bug tasks.

      • alblue 10 years ago

        Define a macro which has the feature name and a future date for when the build should start complaining. For extra points, have two dates: one for generating warnings and another for generating errors. Works like a charm.
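        The same idea works at runtime in languages without macros. A hypothetical Python sketch (the two-date convention is from the comment above; names and messages are invented):

```python
import datetime
import warnings

def expiring_flag(name, enabled, warn_after, fail_after):
    """A feature-flag check that starts complaining past its cleanup
    date, and refuses to run at all past a hard deadline."""
    today = datetime.date.today()
    if today >= fail_after:
        raise RuntimeError(
            f"flag {name!r} should have been removed by {fail_after}")
    if today >= warn_after:
        warnings.warn(
            f"flag {name!r} is overdue for cleanup (since {warn_after})")
    return enabled
```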

  • kazinator 10 years ago

    If a new feature is completely buggy, but it does not impact any existing features, then that doesn't require any compatibility switch. You just fix the feature and issue an update or new release.

    If a new feature is buggy but useful (say, the behavior is wrong in a way that users figure out, and start depending on), then you can have an option to emulate that buggy behavior.

    If a new feature causes a regression in existing features, then that is a call you have to make. You have a situation in which something worked up to version K. Then was broken between K+1 and L, either not working at all or working differently. Then as of L+1, it was discovered, fixed and works again as before.

    If it was completely broken, then you just fix it and that's that. If it was broken in ways that left it useful, such that users may have come to depend on the altered behavior, then just subject it to compatibility. Emulate that behavior if users request compatibility with a version between K+1 and L. If they request compatibility with K or less, or L+1 and higher, enable the current fixed behavior.

aaron695 10 years ago

To clarify...

This is to roll out a feature to a small number of customers for initial live beta testing, before rolling it out to all customers?

If so, I think this is good. It documents the real issue: the complexity of coming back to old code and old problems (old being weeks), even if you don't merge it because of merge hell. At least you have a record of what to do.

And you are culling dead/dangerous code.

  • pkaeding 10 years ago

    Yes, this clarification is accurate. I was thinking of 'canary launch'-style releases, where you release the new thing to a small group, then a larger group, etc, until everyone is getting it, and you don't need the flag any more.

ozten 10 years ago

In my experience with large codebases and multiple teams, another developer might copy that flag into another part of the code to get some desired side-effect.

Yes, this is horrible, but in the real world...

I find you have to grep through the code and think about all the changes that impact your feature flag before systematically removing it. Your cleanup branch isn't being maintained and could provide a false sense of safety.

  • pkaeding 10 years ago

    You are right that the real world is always more complicated. However, I think the idea holds. If another dev needs to use the flag in another area of code, the flag cleanup branch should be maintained with this change.

    The point is not to make flag cleanup automatic. It is to front-load the work of cleaning it up when the complexities involved are fresh in your mind. That way, when it comes time to clean it up, it is much easier to be more confident that you found all of the edge cases.

throwaway6497 10 years ago

"You will need to merge master back into your cleanup branch periodically, but that is usually easier than it would be to recall all of the context relating to the original change."

Won't there be merge conflicts the first time you do this, since the clean feature code on the cleanup branch differs from the flag-based feature on master? Of course, all the subsequent merges should be conflict-free.

  • pkaeding 10 years ago

    There shouldn't necessarily be any merge conflicts if you branch the cleanup branch off of your feature branch. So, it might look like this:

      -master---------------*--------------------*--
        \-feature-branch---/                    /
                          \-cleanup-branch-----/
  • cosmolev 10 years ago

    But why only the first time? This code is always different.

perlgeek 10 years ago

On a tangent, how long-lived are feature toggles usually?

I have very limited experience, and it points to a wide range, from a few days to a few months. When I stumble upon a 1y+ old flag, I tend to delete it (and the dead code path that comes with it).

What's your experience?

  • kevan 10 years ago

    Our feature toggles have tended to live for months to years. Sometimes the really old ones were our fault for not removing, for example we rewrote our payments UI in mid-2014 and the flag stayed around for more than a year after we were at 100%.

    Other times integration with third-party tools was what held us back. We rewrote our product pages in 2013 but our recommendations vendor was scraping the old version until 2015 because no one wanted to spend the vendor hours switching it to the new version.

    My favorite was our add-to-cart actions. In the old platform we ended up with about 10 different user flows after clicking the "Add to cart" button from 2013-2016. This case was driven by heavy AB testing (should we show a confirmation modal? tooltip? send them to the cart page? what about an interstitial page that shows recommended add-on products? etc). In this case we accepted the overhead of lots of feature switches because a .1% conversion bump moved the needle pretty far.

    The shorter flags have lived for a couple months as we build a new feature and then test into it at small percentages to work the bugs out. Once they're in at 100% and we're confident we rip the flag out.

  • pkaeding 10 years ago

    In my experience, there are a few different types of feature toggles. Some are permanent, and are useful for operational tasks, like putting an application into read-only mode, or disabling one service that is overloaded to prevent a cascading failure.

    For the temporary type of toggle, which is what I was addressing with this blog post, my experience coincides with yours-- usually a few weeks.

    The tricky part with deleting a year-old flag (which I was trying to address with this post) is that you need to be careful when deleting code you haven't worked on in over a year. If you have the list of necessary changes pre-baked in a branch, this becomes at least a little easier.

  • kazinator 10 years ago

    Feature toggles can last decades.

    For instance GCC has a feature flag called -ansi which gives you C90 compatibility.

    C90 was superseded in 1999 by C99, and so that's 17 years of compatibility, and counting.

    • startling 10 years ago

      I don't think this is a feature flag in the same way the rest of the discussion is using the phrase.

      • kazinator 10 years ago

        How so? C99+ support/conformance is a compiler feature. That feature breaks/conflicts with some aspects of C90 support, an existing, older feature. So you need a feature flag. Inside the compiler, there are various places with the equivalent of "if C90, do this; else do that".

        • startling 10 years ago

          "Feature flags" tend to be for behavior that is being developed and tested. The -ansi flag is more like configuration. It's pretty valuable to continue to support C90.

marc_omorain 10 years ago

I've had good success in the past adding @deprecated annotations to the old code when adding new code behind a feature flag.

It makes it much easier to come along later and know which functions can be deleted when the decision is made to kill the old code.

0x0 10 years ago

Sounds dangerous. Later commits on master might actually add more if(feature-flag) statements, so if you then just mindlessly merge the cleanup branch, you'll miss the added ifs.

I'd prefer to create the cleanup branch like any other feature branch, only when actually going to clean up, and spend the extra cost of getting back into context by studying all the if(feature-flag) checks on master. Otherwise you might miss some, or forget some interaction that you learned about after the feature deployed.

  • pkaeding 10 years ago

    Yeah, I think if you mindlessly do anything, you're gonna have a bad time.

    Think of the cleanup branch as a running list of changes that you know you will need to make to remove the flag. Any future references to the flag should keep this cleanup list in mind. Code reviewers should keep these cleanup lists in mind.

    This list of cleanup tasks happens to be expressed as a branch in your VCS (this is a pretty good way to express changes that need to be applied to a codebase). You will still need to be careful when you execute that list, but it will be helpful to have the running tally of things that need to be done.

vemv 10 years ago

Haven't tried it myself, but why not use authorization libraries instead of specialized 'toggle' libraries?

After all, both are concerned with whether user X is allowed to do Y.

Using just one approach might be a clean, maintainable approach.

The original code `if can?(:use_feature_x, user)` is written just once and never needs to be removed. The only thing that changes, gradually and cleanly, is the business rules in :use_feature_x (e.g. update the method in your ability.rb, in Ruby CanCan terminology).
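A rough Python sketch of that idea (CanCan itself is Ruby; the names here are hypothetical): the call site checks an ability once, and rollout happens purely by editing the rule:

```python
def can(action, user, rules):
    """Authorization-style check: the call site never changes;
    only the rule registered for the action evolves over time."""
    rule = rules.get(action)
    return bool(rule and rule(user))

# Beta phase: only flagged users get the feature.
rules = {"use_feature_x": lambda user: user.get("beta_tester", False)}

# Full rollout later is a one-line rule change, not code removal:
# rules["use_feature_x"] = lambda user: True
```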

  • pkaeding 10 years ago

    In canary launches, you might want to roll a new feature out to 10% of your users, then 20%, etc. Once it is released to 100% of your users, you might want to remove the check, since it is a no-op.

    I'm not aware of any authorization libraries that let you grant access to a percentage of your users, but maybe they are out there? It is a strange use case from an 'authorization' standpoint.

    • vemv 10 years ago

      The no-op point is true (except for logged-out users; there the check is still useful).

      Anyway, how do you consistently decide which 10% of users see the new feature?

      That piece of data is better stored in your Users table, as I see it. It plays well with authorization libs.

      • pkaeding 10 years ago

        The way LaunchDarkly does it is to hash the user key, along with the feature key. This way the same users aren't always included in the '10% set' for all features, but they are consistently in the 10% set for a single feature.

        This also allows the decision to be made in memory, without an additional round-trip to the DB.
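        A sketch of that kind of bucketing in Python (this shows the general technique; LaunchDarkly's actual implementation may differ):

```python
import hashlib

def in_rollout(user_key, feature_key, percentage):
    """Deterministically bucket a user for one feature: the same user
    always gets the same answer for a given feature, but different
    features shuffle users into different cohorts."""
    digest = hashlib.sha1(f"{feature_key}:{user_key}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # map the hash into [0, 1]
    return bucket < percentage
```

Because the decision is a pure function of the two keys, it needs no database lookup and no coordination between servers.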

kazinator 10 years ago

In the TXR language interpreter, I have a -C option which takes a numeric argument: it means, simulate the old behaviors of version N. If you don't specify -C, you get the latest behavior.

Throughout the code, old behaviors are emulated, subject to tests which look similar to this:

   if (opt_compat && opt_compat <= 130) {
     /* simulate 130-and-older behavior */
   } else {
     /* just the new behavior please: -C was not specified,
        or is at a value above 130. */
   }
  
I think that tying specific old behavior to a proliferation of specific options is a bad idea. It does provide more flexibility (give me some old behavior in one specific regard, but everything new otherwise), but that flexibility is not all that useful, given its level of "debt".

The purpose of compatibility is to help out the users who are impacted by an incompatible change; it gives them a quick and dirty workaround to be up and running in spite of the upgrade to the newest. They can enjoy some security fix or whatever, without having to rewrite their code now.

However, they should put in a plan to fix their code and then stop relying on -C.

If users are given individual options, that encourages a pattern where they use new features with emerging releases yet perpetually rely on some compatibility behaviors. This leads to ironies: like being on version 150 and starting to use a feature that was introduced in 145 and changed incompatibly in 147 and 148, while at the same time relying on a version 70 emulation of some other feature. Hey, we don't care that this new thing was broken recently, twice, before settling down; we never used it before! But we forever want this other thing to work like it did in version 70, because we did use it in version 70. It's like using C++14 move semantics and lambdas, but crying that GCC took away your writable string literals and -fpcc-struct-return (static buffer for structure passing).

It's very easy to hunt down the opt_compat uses in the source code just by looking for that identifier, and the version numbers are right there. If I decide that no emulation older than 120 will be supported in new releases going forward, I just grep out all the compat switch sites, and remove anything that provided 119 or older compatibility. The debt is quite minimal, and provides quite a bit of value.

  • retbull 10 years ago

    Is there any way you could explain that again? I don't quite get what you are doing.

    • kazinator 10 years ago

      Without knowing which part you don't get, there is a risk I just repeat everything in a scrambled order!

      The highlights are:

      We have software (a programming language and its library) that is versioned in a simple, linear way: it goes from version N, to N+1, to N+2 and so on.

      Users who are using version K now depend on some features. Suppose the behavior in version K+1 changes some of the features. The users will be rightfully unhappy; they upgrade to K+1 and things work differently, breaking their code.

      To anticipate this, we can have a command line switch or environment variable whereby users can request "please emulate version K". Then version K+1 (and K+2, K+3 ...) will restore those behaviors which were altered starting in K+1.

      This does not disable purely new features that don't break existing behaviors. For instance, if a two-argument function can now take an optional third argument, such that a two-argument call behaves exactly the same way as before, that won't be subject to emulation. A whole new function that didn't exist in version K is not going to disappear under K emulation.

      This isn't a perfect strategy. Things can go wrong. But it's fairly decent.

    • mschuster91 10 years ago

      It's similar to the stuff you see in raw "mysqldump" output, PHP extensions, or Microsoft's C stdlib headers. Shitloads of stuff hidden behind #if VER > xxx checks.

      Pretty easy to deal with, tbh. And it's flexible as hell.

      You can flame MS for a LOT of things, but not for ignoring backwards compatibility. You can take most age-old VC6 projects, import them in a modern VS version, and BUILD them and it will WORK.

      Not so much in the Linux space. A statically compiled binary from Win95 may very well run on a Win7 machine (e.g. EarthSiege 2)... good luck trying to get a Linux binary from the same era running on a similarly fresh Linux kernel.

    • shoo 10 years ago

      the user can optionally specify which version of behaviour they want. This is named the `opt_compat` value in the code. All through the code there are checks against the `opt_compat` value to decide which version of which old/current behaviour to use.

      • kazinator 10 years ago

        And the opt_compat has C integer/boolean semantics in this case, so the test

          if (opt_compat) ...
        
        tests whether the option has a nonzero value (has been specified).

        And so

          if (opt_compat && opt_compat <= 130)
        
        means "user has requested compatibility, with a value of 130 or less".

        By the way, -C 0, which would look like a Boolean false, as if -C were not specified, is not allowed. If the user specifies -C N such that N is lower than the oldest version that we emulate, the implementation terminates with an error message like "sorry, compatibility with versions less than 70 is not supported by version 140".

jupp0r 10 years ago

If you do feature flags by inserting if blocks throughout your code you will create tech debt anyways. The goal is to have one if block and hide the changed behavior behind interfaces (or polymorphic functions if you are using functional languages). Dependency injection is your friend.

If you don't do this, you won't scale beyond a handful of feature flags. Chrome has hundreds, for example.

  • backslash_16 10 years ago

    Can you expand on this? I'm interested to see how this works in a real code base.

    I'm thinking of something like an initial (maybe massive) if block in the application's setup code that configures all of the behavior/features by declaring which implementations get bound to which interfaces? After this if block, all of the DI wiring is set?

    This of course means you need to use a DI framework of some sort.

    Using feature flags is something I'm investigating because our current model is a git branch for every feature, and I wonder/fear it only works because we're a small team that has worked together for a while and in the future when we grow this will break down.

    • adamconroy 10 years ago

      I was about to say the same thing as your parent. I recently had a nice feature toggle experience using a strategy pattern with DI.

      Basically there was one 'if' statement in the DI container configuration code that looked up a config:

        if (newPricingStrategy)
          bind IPricingStrategy to NewPricingStrategyImplementation
        else
          bind IPricingStrategy to OldPricingStrategyImplementation
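      In Python, a minimal sketch of that wiring (the class and config names are made up for illustration):

```python
class OldPricingStrategy:
    def price(self, base):
        return base  # existing behavior, unchanged

class NewPricingStrategy:
    def price(self, base):
        return round(base * 0.9, 2)  # hypothetical new pricing rule

def build_container(config):
    """The single 'if' lives in the wiring; everything else in the
    app just asks the container for a pricing strategy."""
    if config.get("newPricingStrategy"):
        pricing = NewPricingStrategy()
    else:
        pricing = OldPricingStrategy()
    return {"pricing": pricing}
```

      Removing the flag later means deleting one branch here and one class, instead of hunting through the codebase for scattered ifs.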

  • cosmolev 10 years ago

    In the case of Chrome are they feature flags or configuration flags?

johansch 10 years ago

This obsession about avoiding technical debt is quite strange. It's a tool to use when it makes sense, like loans in the bank...

  • dahart 10 years ago

    Debt, both technical and financial, is always best avoided. And in both cases, once you have it, you have to pay it down sooner or later, and the later you do the more expensive it gets.

    It is a tool, but having it is always a negative that is offsetting a bigger negative. By all means, take the loan when you need a boost that you can't otherwise afford. But take the smallest loan you need, and pay it back as fast as you can.

    • adamconroy 10 years ago

      No, often you don't have to pay for technical debt. The code may never be refactored, and/or the whole system/module may be replaced before much of that code is ever touched.

    • johansch 10 years ago

      Do you think of VC the same way?

  • aaron695 10 years ago

    Worked on many 10+ year code bases?

    Technical debt comes in many forms, like developers will refuse to work on messy code.

    Not saying you're wrong; it could be survivor bias. Most code is thrown away because the messy project's specs fail. It might be worth paying extra for devs to work on the mess that works.

    But I don't like it: messy code is annoying, which is why I'll remove technical debt. And the company will pay for that, another cost of technical debt that wasn't cleaned up at the time, when it was easy.

    • johansch 10 years ago

      > Worked on many 10+ year code bases?

      Two separate codebases got 10+ year old while I managed the development on them, so, yeah. The latter one is something I started together with one colleague; it's now twelve years old and has 250 million monthly active users - Opera Mini; most of those users are in "growth regions". (I left after ten years of that.)

      The main attitude I see nowadays in younger developers seems like an overreaction against technical debt... I blame HN etc. :)

  • adamconroy 10 years ago

    I agree, and on top of that I've always thought 'technical debt' was a bad analogy (like most analogies).

    My reasoning is that real-world debt is inescapable, whereas if someone writes some crap code that is hard to read/maintain/extend, the debt is only called in if someone has to maintain it. My intuition is that a fair percentage of code, once it works and is tested, never has changes made to it, or the whole system is replaced without the code ever changing.

    If you buy a house with a mortgage but never live in it, you still have to pay the debt. If you write some code and never touch it again, there is no debt to pay.

  • spelunker 10 years ago

    Sure, but it's still a yellow flag. Let the tech debt pile up too much and resolving it feels insurmountable.

  • pkaeding 10 years ago

    Sure, use debt where appropriate (be it technical or financial). But be responsible with it, and manage it well.
