Ask HN: How often should we deploy to production?
I'm the team lead of a full-stack team. We currently accumulate features and bug fixes, and deploy a version every month. Should we deploy more often? How should we handle it? I'd love to hear about your experience with this.

You should be aiming to release on at least a weekly cadence, preferably daily or possibly multiple times a day.

Why?

1) If a bug is found in production, it's more likely that the dev who built it still has context on what they did. The dev will most likely be able to fix the bug faster if told about the issue within a week.

2) For customers, it's amazing to be able to see a bug fixed within a day or two.

The caveats: there may be legitimate challenges to deploying often that you will have to overcome. Some industries (such as FDA-regulated industries) require documentation and some specialized testing. If you have a change management board, you will probably need to work with your compliance/security team to change the signoff process to something more lightweight that still addresses the risks/requirements. If you have manual testing, you will need to automate some of it. A lot of this requires work to automate steps.

How often are your features/fixes ready? How long does your testing take (this sets a lower bound)? How long is the deployment process (adds to that lower bound)? How often does deployment have to be rolled back?

If testing takes a week, you can't deploy more often than weekly. Fix this: get your testing setup to the point where you have enough servers to hit your test instance(s) and get testing done fast (don't cut corners and remove tests).

If deployment takes days, work on getting it more automated so that it can happen more quickly and with less manual intervention. Write more deployment scripts/programs, and test them like you test your code.

If you're rolling back deployments, do an analysis of why. Was there an undetected bug? Why? Manual testing didn't catch it? Make it automatic.
Testing isn't comprehensive enough? Check out other testing techniques, like property-based testing, that can throw more cases at your system. Was it because the deployment process is manual and someone fat-fingered the wrong value? Move it into a configuration file and automate that step.

Repeat. Iterate on all of these things just like you iterate on the code for the system you're deploying. As you reduce the deployment and testing time (but not their correctness or completeness), you'll be able to get down to a deployment cadence that is weekly or daily (or, in the ideal, continuous, with each piece of code pushed to master).

One factor no one has mentioned is "what's the cost of doing a deploy" and "what's the cost of recovering from a bad deploy".

If your product is purely server-side, then both of these costs can be made small and you can perhaps release multiple times per day.

If your product is a desktop app that users have to download, or a mobile app that has to be certified, then deploy costs are larger (especially for your users) and may be bottlenecked by some external gatekeeper.

If your product is embedded software that cannot be remotely updated, where you need to do a complete product recall to fix any serious bugs or pay technicians to visit each site to roll out a new version... you might want to release once every year or two with a very high quality bar.

Weigh up the pros and cons. If you've got other issues holding you back (you're lacking continuous integration, builds break often, you need more tests, etc.), weigh up whether they're worth addressing too.

Different projects have many unique requirements and characteristics. Pick the deploy process that suits your project, and don't, for example, go with "move fast and break things" just because one project says it works for them. I'm guessing you've probably got at least a few good reasons why you're deploying once a month right now.
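
On the property-based testing suggestion above: here is a minimal sketch of the idea, using only Python's standard library. It generates many random inputs and checks that a property holds for all of them (here, that parsing a rendered config gives back the original). The functions render_config and parse_config are made-up stand-ins for whatever code you actually want to test, and a real project would more likely reach for a dedicated library such as Hypothesis, which also shrinks failing cases.

```python
import random
import string


def render_config(settings):
    """Hypothetical serializer: one KEY=VALUE pair per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


def parse_config(text):
    """Hypothetical parser that should invert render_config."""
    settings = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        settings[key] = value
    return settings


def random_settings(rng):
    """Build a random settings dict; keys never contain '=', values may."""
    key_chars = string.ascii_uppercase + "_"
    value_chars = string.ascii_letters + string.digits + "=:/. "
    settings = {}
    for _ in range(rng.randint(0, 8)):
        key = "".join(rng.choice(key_chars) for _ in range(rng.randint(1, 10)))
        value = "".join(rng.choice(value_chars) for _ in range(rng.randint(0, 20)))
        settings[key] = value
    return settings


def check_round_trip(trials=1000, seed=0):
    """Property: parse_config(render_config(s)) == s for every generated s."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        settings = random_settings(rng)
        assert parse_config(render_config(settings)) == settings, settings
    return trials


if __name__ == "__main__":
    print(f"{check_round_trip()} random cases passed")
```

The point is less the specific property than the shape: one generator plus one invariant covers far more cases than a handful of hand-written examples, and it can run in CI on every push.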
The more things that go out in a single deploy, the more chances that something will break and everything will have to be reverted. The goal of CI/CD is that each new feature, bug fix, etc. gets deployed independently; at least, that is our take on it. We deploy to thousands of servers, sometimes multiple times a day, sometimes once a month. It just depends on the throughput of code changes and the required testing/observing that goes with it.

Try to decouple features that are being built, and implement feature toggles (turn a feature off or on with remote config). We use release trains, which are weekly. If a feature is built and tested before the train leaves, it's turned on; otherwise it's off.

I see people mentioning feature flags. Is there a framework or pattern for feature flags? Thinking of Python, where people use argparse as a library, I imagine this can become a huge list of CLI arguments, or values to be parsed from a config file or from the environment, that have to trickle down to a lot of parts of a system full of "if not feature_flag: return" checks.

Environment variables are the best way to handle feature flags. The 12-factor app provides some guidance for doing it the right way. [0]

Feature flags do introduce complexity, but that can be partially mitigated by having good defaults and deleting flags once they're no longer needed. Even without deleting them, as long as there are good defaults, you won't run into any problems until you have hundreds of flags.

I once worked on a system with tens of thousands of configuration flags that could all be overridden in an inheritance-esque chaining scheme: Server A inherits regional config Foo, which inherits from business unit config Bar, which inherits from three other 5000-line properties files. And naturally, almost none of these properties had defaults, so starting from scratch with a new configuration was impossible, leading to snowballing technical debt from copy-pasting old configs. *shudders*

This question has a simple answer.
Regardless of your situation, consider the "time required to release" and "how often we release" as two important metrics of efficiency. Try to make the first as short, and the second as frequent, as possible.

Feature flags + automated deploy on git push. If this freaks you out, write better tests.

With feature flags, do you just phase out the flags later? Is that ever a messy process?

There's a brief discussion of the tech debt of cleaning up old feature flags at https://trunkbaseddevelopment.com

I am imagining some big Django app, or even something like Elasticsearch, which is a big Java app.

    if not feature_flag:
        return
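
For what it's worth, here is a rough sketch of the environment-variable approach with defaults that was suggested in the thread, so the guard above doesn't require threading CLI arguments through every layer. Everything named here is an assumption made up for illustration: the FEATURE_ prefix, the flag names, and the helper functions are not from any particular framework.

```python
import os

# Hypothetical flag names and defaults, chosen only for illustration.
# Every flag has a default, so a fresh environment works with no setup.
DEFAULTS = {
    "NEW_CHECKOUT": False,
    "FAST_SEARCH": True,
}

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def is_enabled(name, environ=None):
    """Return the value of FEATURE_<name> if set and valid, else the default."""
    environ = os.environ if environ is None else environ
    default = DEFAULTS[name]  # KeyError on unknown flags catches typos early
    raw = environ.get("FEATURE_" + name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    return default  # unrecognized value: fall back to the default


def checkout_total(prices, environ=None):
    """Example call site using the guard pattern from the thread."""
    if not is_enabled("NEW_CHECKOUT", environ):
        return None  # feature is off, so the new code path is skipped
    return sum(prices)
```

Keeping all flags and their defaults in one dictionary also gives you a single place to look when it's time to delete a flag that is no longer needed.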