Ask HN: How do you keep track of releases/deployments of dozens micro-services?

142 points by Kunix 5 years ago · 139 comments

It's been three years since the last thread (https://news.ycombinator.com/item?id=16166645), maybe there are more mature solutions now.

Interested to hear about current setups, and how it works for you.

johnx123-up 5 years ago

From the previous discussion https://news.ycombinator.com/item?id=16166645

1. https://github.com/gocd/gocd - 6.1k stars

2. https://github.com/Shopify/shipit-engine - 1.2k stars

3. https://github.com/guardian/riff-raff - 252 stars

4. https://github.com/ankyra/escape - 201 stars

5. https://github.com/kiwicom/crane - 92 stars

6. https://github.com/tim-group/orc - 34 stars

7. https://github.com/wballard/starphleet - 19 stars (dead?)

8. https://spinnaker.io/

bhouston 5 years ago

We just made all the microservices into one big monorepo and we deploy all at the same time.

To be honest, we tried to avoid the monorepo, but it was hellish. Maybe it would work if each microservice were larger and our team were larger, but then would they be microservices any more?

  • lovedswain 5 years ago

    The biggest difficulty I've experienced is "librification", where some common code ends up in a little library, and soon that library is not so little any more, and not long after it starts to look like half of every service. I can maintain discipline when working on small systems alone, but on a team there will always be one lazy person or urgent need, which means that eventually some shared component gains enough gravity to start sucking code out of its nice, isolated home in each service.

    Giving up and dumping everything into a monorepo isn't going to help at all. At that point you're probably better off abandoning any hope of carefully split-up, individually managed services.

    • bhouston 5 years ago

      These libraries already exist whether you write them or use someone else's. In our case most of our microservices are node.js based, so Koa is in every microservice and we use middleware for authentication -- and thus if the authentication system evolves (moving to JWT or a microservice gateway) we have to evolve that middleware everywhere.

      Same with our consistent logging system.

      Libraries are better than unique code everywhere for the same task: they allow you to fix a bug once and to do consistency checking.

      • rgoulter 5 years ago

        The problem isn't "some code which different services use is in a library".

        The problem is the not-ideal case where the code in libraries shared by different services is tightly coupled to particular services, e.g. changing the shared library might break some service that depends on it.

        Obviously, you don't want code like that. But it's easy to write code which is slightly coupled; and then when you're in a hurry, to increase the coupling.

    • mewpmewp2 5 years ago

      I don't get what you're saying in the first paragraph. This lazy person puts some random code into a shared component, or...?

      Wouldn't this urgent need mean that they put this code into the microservice that needs this urgent update as opposed to going through the effort to make it available for everyone to use?

    • slifin 5 years ago

      This video was interesting to me

      https://youtu.be/pebwHmibla4

  • Cthulhu_ 5 years ago

    Microservice is a misnomer; it should have a responsibility, but that could be 10 lines of code or 10 million.

    Anyway, it sounds like you have a distributed monolith. If you cannot maintain and deploy a microservice independently, it should not be a microservice.

    • notwedtm 5 years ago

      I don't build microservices anymore. All of the reasons listed in this thread tend to cause bottlenecks. I aim for domain services: define your domains, and build a program to service each one.

      • UK-Al05 5 years ago

        That's microservices. Microservices handle a bounded domain, at least in the traditional advice for splitting them, and provide an API for that domain.

        • jblwps 5 years ago

          Doesn't a "service" handle a domain?

          • UK-Al05 5 years ago

            Putting micro in front of service was a bad idea. Lots of people have the wrong idea because of that.

        • goodpoint 5 years ago

          No, that's SOA. Microservices are almost always described as quite small.

          • UK-Al05 5 years ago

            The "Building Microservices" book, which used to be the go-to book for microservices, suggests on pages 31-38 that bounded domains are a good model for microservices.

            The main difference between SOA and microservices is dumb pipes vs. smart pipes, not service size, as explained in that book.

            Most people trying to implement microservices seem to have not read much about them outside of blogs.

            • ex_amazon_sde 5 years ago

              Quoting some book does not change the fact that most people understand SOA as being team or domain-bound and microservices as being much smaller.

              • UK-Al05 5 years ago

                I'm assuming most people adopting micro-services weren't even around when SOA was a big thing.

                People not reading the literature and then implementing it badly isn't a problem with the idea, in the same way that not studying calculus and then doing badly isn't a problem of calculus.

                Most of these issues around microservices, like how to split them so they don't cause problems, are known and solved. People just haven't read the literature beyond the odd blog. For example: don't do microservices for your first iteration; build backwards-compatible APIs; and understand your bounded contexts before you try.

                They end up creating chatty nanoservices with tons of version-pinned cross-dependencies.

              • tdeck 5 years ago

                Interesting because I always thought SOA was just a slightly older term for microservices.

    • bhouston 5 years ago

      > Anyway, it sounds like you have a distributed monolith. If you cannot maintain and deploy a microservice independently, it should not be a microservice.

      We can maintain and deploy them independently, but it was annoying to try to track which version was deployed where and having to check it out independently, etc.

      The overhead was incredibly high. So we plopped them all into a single monorepo as sub projects. We can still update each one individually but we know what is live on the website is what is in the head of that branch.

      As someone whose last website (Clara.io) was a monolith, I feel we are now getting the benefits of microservices with little of their downsides. It is like night and day.

      It may be that we have a lot of microservices for the size of our team: 20+ microservices and a team of around 12.

  • BurningFrog 5 years ago

    One "wisdom" I hear is that the benefit of microservices is organizational:

    One microservice per team, so you cut down on intra-team friction, and the team can manage their own releases.

    • beastcoast 5 years ago

      What people miss is that micro services/SOA is an organizational concept more than anything. At Amazon, SOA is tied to the concept of the two pizza team. Each 2PT completely owns a collection of related services, owns the roadmap for those services, and can make deployments independently of any other team. If your company doesn’t have enough engineers to justify at least say 5-10 scrum teams then you might not be large enough to need microservices.

    • dfcowell 5 years ago

      Microservices are a solution for scaling teams, not software or infrastructure.

      (In other words, you're 100% spot on!)

  • myzie 5 years ago

    +1 to this. In our case we say "deploy all" but use Zim[1] to automatically determine which services have associated changes. This keeps the overall deploy quick.

    This is comparable to CloudFormation or Terraform in terms of determining whether something is up-to-date, but more general purpose.

    [1] https://github.com/fugue/zim/

  • JensRantil 5 years ago

    "Microservices" is a tool to solve problems. The goal isn't microservices; the goal is to solve those problems. Examples of problems it might solve: unclear ownership, slow innovation, unstable software, or orthogonal scaling of features.

  • nicoburns 5 years ago

    Perhaps they shouldn't be microservices. What would be the disadvantage if you combined your microservices into a monolith? Or perhaps 4-5 "macroservices"?

    • jayd16 5 years ago

      In this case I think the disadvantages would be you're locked into one machine type and your blast radius is a bit wider.

  • taneq 5 years ago

    I was going to make some sarcastic response to the effect of "iT's EvErGReeN" but this is basically the only answer. You don't do individual releases of your microservices any more than you do individual releases of the classes/functions in your C++ project.

monster_group 5 years ago

I don't keep track. All microservices use continuous deployment pipelines. If you check in code and it passes all the tests, it will make it out to prod some time in the next few hours.

  • jjice 5 years ago

    How do updates to a database work through that pipeline? Do migrations get run and rolled back automatically as needed?

    • sokoloff 5 years ago

      Not in the context of microservices, but we ran our production DB for years in a “both N and N+1 work” mode by following a few simple rules which turn out to be not that restrictive in practice.

      Short version: have DB1 hold the transactional data (data generated while running the system). Have DB2a hold the release-bound data (data about and connected to the code itself: settings, prices, whatever).

      Have DB2a contain views onto DB1 tables. Version a of the code only “knows about” DB2a, but any transactional CRUD ops hit the tables on DB1.

      Now version b of the code just needs to ship/create a DB2b and both a and b can run in parallel.

      If you need to change the shape of DB1 tables, those changes need to be backward compatible (can only add nullable columns, no use of "select *", etc).

      There are a few details about how to make it fully practical, but that’s the gist, and we ran that for about 12 years on a moderately heavily trafficked e-commerce site.
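The two-database view pattern described above can be sketched with SQLite standing in for both databases (a minimal sketch, not the commenter's actual setup: table/view name prefixes substitute for separate schemas, and all table and column names are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DB1: transactional data, shared by every running code version.
conn.execute("CREATE TABLE db1_orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("INSERT INTO db1_orders (total) VALUES (19.99)")

# DB2a: the release-bound schema for version a -- just views onto DB1.
conn.execute("CREATE VIEW db2a_orders AS SELECT id, total FROM db1_orders")

# Shipping version b: change DB1 only in backward-compatible ways (here,
# adding a nullable column), then create DB2b with a view that exposes it.
conn.execute("ALTER TABLE db1_orders ADD COLUMN currency TEXT")  # nullable: OK
conn.execute(
    "CREATE VIEW db2b_orders AS SELECT id, total, currency FROM db1_orders"
)

# Versions a and b now run in parallel against their own views; version a
# never sees the new column because its view names its columns explicitly.
print(conn.execute("SELECT * FROM db2a_orders").fetchone())
print(conn.execute("SELECT * FROM db2b_orders").fetchone())
```

The key property is that each release ships its own view schema while the shared transactional tables only ever change additively.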

      • SkyBelow 5 years ago

        Database versioning and backwards compatibility has been something of an upcoming problem where I'm at as we've been getting better at CI/CD. Do you have any recommended resources for different approaches to it, or even some keywords that'll help when searching for such resources?

      • rblatz 5 years ago

        This sounds a lot like CQRS

      • atonse 5 years ago

        Whoa I love this idea. I usually refer to it as "transactional" (always growing, represents the daily activity, read and write heavy) vs "lookup" data (almost exclusively read, not changed often). But still store them in the same database.

        What ends up happening is we end up separating the two more with "cache this one, not that one" rather than two different databases.

        I will explore this idea on my next greenfield project, whenever that happens.

      • dandigangi 5 years ago

      Ran a similar setup to this and it worked pretty well!

      • atrandom 5 years ago

        This sounds interesting! Is there any more detailed write-up that you can link me to? Thanks!

        • sokoloff 5 years ago

          I looked briefly (and I could have sworn I posted our "nine rules" on HN years ago, but I couldn't find it in a quick search).

          I'll look again later tonight more thoroughly to see if I've posted the mechanisms and restrictions publicly anywhere before. If I haven't, I'll try to dig it out of our old dev doc system and post them here, but I can't make any promises as the docs I recall are now over a decade old, so I'm not fully sure they exist any more. :)

          • sokoloff 5 years ago

            The internal docs for this are not on any of our documentation systems that we've moved to zero-trust (as they're 12 years old and unchanged for 5+ years). I will probably be able to retrieve them when we're back in the offices; shoot me an email (in my profile) and I'll find a way to get something over to you with some significant delay.

    • GordonS 5 years ago

      Not the OP, but the way I handle this is to ensure that all migrations are backwards compatible: the current and new versions of the app/API/service must be able to run with both the old and new database.

      This requires a little discipline, but if you follow a few simple rules it's not really that arduous:

        - when adding a new column, it must have a default value set, or be nullable
      
        - don't drop any columns
      
        - don't rename any columns
      
      Now, for those last 2, what I really mean is "don't do it in a single release" - if you want to make destructive changes, do it over the course of 2 releases.

        - release 1: remove dependencies on the column from the app/API/service
      
        - release 2: perform the database migration with destructive changes
      
      It probably sounds more difficult than it actually is :) In reality, I don't make destructive changes that often though.
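The two-release sequence for destructive changes can be illustrated with a small SQLite sketch (a hypothetical example, not the commenter's code; table and column names are invented, and the rebuild stands in for `ALTER TABLE ... DROP COLUMN`, which SQLite only gained in 3.35):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_flag TEXT)"
)
conn.execute("INSERT INTO users (name, legacy_flag) VALUES ('ada', 'y')")

# Release 1: ship application code that no longer references legacy_flag.
# The schema is untouched, so old and new app versions both still run.
app_v2_query = "SELECT id, name FROM users"

# Release 2: once no running code depends on the column, perform the
# destructive migration. Shown as a table rebuild, which works on any
# SQLite version; on 3.35+ a plain DROP COLUMN does the same job.
conn.executescript("""
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
""")

print(conn.execute(app_v2_query).fetchone())
```

Between release 1 and release 2, both the old schema and the new code coexist safely, which is the whole point of the rule.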

    • rblatz 5 years ago

      We always do it by not pushing breaking changes to the database. It’s extremely freeing. It does require some discipline to go back and cleanup things later, but not worrying about database “versions” is the way to go in my opinion.

    • vbsteven 5 years ago

      Not gp but here's a possible answer: I usually require db migrations to have a "down" script as well but "down" is never applied automatically. I only auto-apply "up", and when a rollback is needed (which has been very infrequent in my case) I manually apply the "down" scripts using Flyway cli commands or by hand.

      • BatteryMountain 5 years ago

        To add.

        Same here. A forward-only approach works best for us too. If you need to clean up a mess, it is a new migration script. It's too complex to try and work backwards. What if multiple scripts were run? Then you have to roll back, say, script number 2 out of 5, and there were destructive operations. It becomes really hairy really quickly. So forward-only is the easiest to reason about.

        Please do make sure that you have snapshots for restoring if you really mess up badly. I know it's not always feasible to do snapshots before every deploy, but having a daily snapshot can bring you a lot of comfort.

        If you build your own migration tool (I highly encourage it; it's not that hard to build a forward-only migration tool), then you can trigger selective snapshots/table dumps for only the tables that get changed, and only for specific operations (updating schema, dropping columns, dropping tables) before your migration scripts touch the db - that way you have a path to restore. You don't always need a full DB dump (say you have 500+ tables but are only changing 2, 1 of which is destructive, so the backup is tiny and quick). It also helps if different data sets live in isolation to help manage this kind of admin.
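A forward-only migration runner of the kind described above really is small. Here is a hypothetical minimal sketch in Python over SQLite (the migration list and the `schema_migrations` bookkeeping table are assumptions for illustration, not any particular tool's design):

```python
import sqlite3

# Migrations are an ordered list of (version, sql) pairs. There is no
# "down": fixing a mistake means writing the next migration (version 3).
MIGRATIONS = [
    (1, "CREATE TABLE accounts (id INTEGER PRIMARY KEY, email TEXT)"),
    (2, "ALTER TABLE accounts ADD COLUMN created_at TEXT"),
    (3, "ALTER TABLE accounts ADD COLUMN email_verified INTEGER DEFAULT 0"),
]

def migrate(conn: sqlite3.Connection) -> int:
    """Apply any unapplied migrations in order; return the latest version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version INTEGER PRIMARY KEY)"
    )
    applied = {v for (v,) in conn.execute("SELECT version FROM schema_migrations")}
    for version, sql in MIGRATIONS:
        if version in applied:
            continue
        with conn:  # each migration commits (or rolls back) atomically
            conn.execute(sql)
            conn.execute("INSERT INTO schema_migrations VALUES (?)", (version,))
    return max(applied | {v for v, _ in MIGRATIONS}, default=0)

conn = sqlite3.connect(":memory:")
print(migrate(conn))   # applies all three
print(migrate(conn))   # idempotent: nothing left to apply
```

Selective pre-migration snapshots, as suggested above, would slot in just before the `conn.execute(sql)` line, keyed on what the statement touches.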

  • imafish 5 years ago

    Do you ever release serious errors into prod?

    • BatteryMountain 5 years ago

      The question is not IF, but WHEN.

      So ideally you have some kind of monitoring that reports/shows how many services are alive (and where they live in a cluster), how many errors they generate, etc. Then based on some thresholds you can take them out of circulation and let them cool down. If certain kinds of errors occur, or at a certain frequency, the system can notify a site reliability engineer (or equivalent) to check it out. Then they can decide if it should be permanently removed, log an internal support ticket, and so forth for the developers or product teams.
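The "take it out of circulation and let it cool down" behavior is essentially a sliding-window error-rate check with a cooldown period. A hypothetical minimal sketch (thresholds, window size, and class name are all invented for illustration):

```python
import time
from collections import deque

class InstanceHealth:
    """Track recent request outcomes; eject an instance whose error rate
    crosses a threshold, and readmit it after a cooldown."""

    def __init__(self, window=100, max_error_rate=0.5, cooldown_s=30.0):
        self.results = deque(maxlen=window)   # recent True/False outcomes
        self.max_error_rate = max_error_rate
        self.cooldown_s = cooldown_s
        self.removed_at = None

    def record(self, ok: bool) -> None:
        self.results.append(ok)
        errors = self.results.count(False)
        # Require a minimum sample so one early failure doesn't eject it.
        if len(self.results) >= 10 and errors / len(self.results) > self.max_error_rate:
            self.removed_at = time.monotonic()  # out of circulation
            self.results.clear()                # fresh window after cooldown

    def in_rotation(self) -> bool:
        if self.removed_at is None:
            return True
        if time.monotonic() - self.removed_at >= self.cooldown_s:
            self.removed_at = None              # cooled down, try again
            return True
        return False

h = InstanceHealth(cooldown_s=0.01)
for _ in range(10):
    h.record(False)                # a burst of failures...
print(h.in_rotation())             # ...takes the instance out: False
time.sleep(0.02)
print(h.in_rotation())             # cooldown elapsed: True
```

Real systems (load balancer health checks, circuit breakers) add alerting and permanent-removal decisions on top of this core loop.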

      Production issues are a part of life. You need to have some visibility on issues and their severity. Every company and tech stack is different, also depending on their SLA's and uptime promises.

      Ads not rendering in an app might be less severe than a pump failure at a fuel station, so they have different kinds of monitoring and reaction times to faults. Obviously things like hospitals, banks, and airlines/aircraft manufacturers have way different requirements and infrastructure from, say, a system that manages all school libraries for a state/province.

      There are too many products and approaches to mention here if you were looking for a list of those. I have one or two favorite approaches and a handful of tools for this kind of stuff, half of which are homemade, so not something you can google. But you can google around and see a few different approaches: "microservices monitoring java" or "microservices monitoring best practice" or something along those lines will get you on a path. Try to find 5 different approaches, reflect on what each one is missing or how it may help you, and then ponder what you would like to see from a reporting system with hundreds/thousands of services.

      And then obviously the best lessons will come from production itself.

      Good luck!

      • killtimeatwork 5 years ago

        > Production issues are a part of life.

        Only if you accept them. The alternative is to do very few, rigorously tested releases per year. This way you don't have production issues. That's how industries like banking make sure bank transfers and card payments work and people's money is not randomly lost... It's a shame many other industries just accept their product failing for users as something normal/inevitable.

        • lordgilman 5 years ago

          I can't say my experience echoes your comment. I'm a former employee of a financial services (billing) company built around a mainframe code base started in the 70s. We probably qualify as the sort of business you had in mind with your comment.

          We did four releases a year, across the entire organization (so mainframe and more modern platforms), on Saturday nights/early Sunday mornings. There was plenty of testing, but there were still plenty of errors only found on the day of, and rushed to fix in the wee hours or daylight hours of Sunday morning.

          The only thing that seemed to correlate with release quality was the overall risk of the release, i.e. the complexity and number of new features written during that quarter.

          • killtimeatwork 5 years ago

            > We did four releases a year, across the entire organization (so mainframe and more modern platforms), on Saturday nights/early Sunday mornings. There was plenty of testing but there was still plenty of errors only found on the day of, and rushed to fix in the wee hours or daylight hours of Sunday morning.

            This way, you had bugs in prod for less than a day once every quarter, as opposed to having buggy prod all the time, as is common in organizations doing Continuous Deployment.

        • taormina 5 years ago

          That's adorable. You know that no matter how much testing you do, something WILL slip through the cracks? Always.

          • killtimeatwork 5 years ago

            Of course. Even the Space Shuttles blew up, twice. I'm guessing even pacemakers and software in nuclear power plants have bugs. The point is, these things are exceedingly rare or have very limited scope (they occur only in the most obscure corner cases and also do limited damage), while in web companies which adopted Continuous Deployment, serious bugs are just common and, I think, seen as part of life.

          • scruple 5 years ago

            Work in healthcare where we have heavily tested, quarterly releases. Well, we had a release today and some stuff was pretty horribly broken, despite being so heavily tested. We didn't adequately load test one piece of the new release under production-like conditions. Oops. Thankfully the fix was simple and a hotfix only took a couple of hours in total. Yet another lesson learned.

            • killtimeatwork 5 years ago

              That's pretty bad, but nonetheless you detected and fixed it very quickly. Compare that to lingering bugs in the Twitter iOS client (it's just broken on the iPhone 5s; I guess they simply don't test on that device anymore), or random bugs in Windows 10 that appear after they CD an update to their users.

        • SkyBelow 5 years ago

          Then you get the worst of both worlds. You are in an industry where few very well tested releases are needed to meet SLA and customer expectations, but you have enough of the company looking at entirely different industries and wanting to follow their pipeline instead.

    • Townley 5 years ago

      Sometimes, though thankfully less frequently (and for a less-disastrous definition of "serious") than I used to.

      Luckily, a good CI/CD pipeline makes reversions just as easy as deployments. So even when you have errors, it's easier to correct than if you suddenly discovered "our deployment bash script / ansible playbook isn't as reversible as we thought it was"

    • monster_group 5 years ago

      Rarely. All features are gated by feature flags with the capability to dial up the feature gradually and dial down the launch instantly. I can monitor if the feature launch is going as expected by monitoring errors and metrics in the logs.
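The gradual dial-up described above is commonly implemented with deterministic hash bucketing, so a user's cohort is stable as the percentage changes. A hypothetical sketch (flag names and percentages are made up; this is not any particular flag system's API):

```python
import hashlib

def flag_enabled(flag: str, user_id: str, rollout_percent: float) -> bool:
    """A user is in the cohort if a stable hash of (flag, user) falls under
    the dial setting. Dialing up from 5% to 50% keeps the original 5%
    enabled, because the hash is deterministic."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rollout_percent / 100

users = [f"user-{i}" for i in range(10_000)]
at_5 = {u for u in users if flag_enabled("new-checkout", u, 5)}
at_50 = {u for u in users if flag_enabled("new-checkout", u, 50)}
print(len(at_5), len(at_50))   # roughly 500 and 5000
print(at_5 <= at_50)           # dial-up is monotonic: True
```

Dialing down to 0 is the "instant kill switch": the function returns False for everyone without any deployment.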

    • ex_amazon_sde 5 years ago

      YES, and this is why deployments to prod should go through many stages and have a long bake-in time for critical applications.

      The idea of deploying every commit all the way to prod is very questionable.

jayd16 5 years ago

If you need to keep track, it's probably too late. What makes them services is that they should be able to be deployed without a bunch of orchestration of other services. You can solve this by having backwards-compatible APIs.

That said, to know what changes would actually break things you'd ideally have a suite of tests.

  • giantg2 5 years ago

    "If you need to keep track, it's probably too late. What makes them services is that they should be able to be deployed without a bunch of orchestration of other services."

    If only you could tell my bosses/architects that. They won't listen to me.

    Edit: why downvote?

    • kjeetgill 5 years ago

      Quick counterpoint:

      Just because you should be able to release without orchestration doesn't mean you shouldn't be able to watch and track things.

      You shouldn't have frequent breaking changes but you should still have the tools to manage when you do.

      • giantg2 5 years ago

        I'm not sure what your counterpoint is in reference to. I didn't see anything about tracking or recovery from breaking changes.

    • lowbloodsugar 5 years ago

      >If only you could tell my bosses/architects that. They won't listen to me.

      Then leave and go somewhere where they will. I wasted too much of my life trying to "change things from within", but I finally learned the lesson. If you have no authority but are held accountable, then GTFO.

    • edoceo 5 years ago

      I'm not a downvoter, but maybe it's because "they won't listen".

      My theory is that your presentation was not compelling. Was your CBA clear? What risk/reward metrics did you highlight?

      • giantg2 5 years ago

        I'm a mid-level dev at a large company. I don't even have access to make a presentation to the true stakeholders. I can only make suggestions to my lead/boss and have them move it up the chain. On a side note, I've tried moving up the chain when I felt it appropriate, but apparently a SQL injection vulnerability with full schema-level privileges that was not being prioritized for remediation was not important enough to waste my department head's time.

        Sometimes they take my suggestion, sometimes they are already working on it behind the scenes, and sometimes they go nowhere. In this case, they are already measuring most of the metrics, like cycle times and mean time to recovery, etc. They have already stated that they want microservices and have been building them out. The problem is that they implement it wrong and have no interest in re-architecting. Many of the apps are rewrites of legacy apps. Instead of evaluating the underlying business process, they just want us to build it the same in the new tech but use "microservices". The problem is that some of the business process was designed around the restrictions of the old technology or old industry norms. We should be evaluating the business process before building the technical system; otherwise we will continue to bake in these old constraints and not fully leverage the capabilities of new technology.

        Edit: looks like I made someone angry since this is downvoted too.

        • ethbr0 5 years ago

          Meta comments tend to get downvoted. Try editing the complaints out?

        • jupp0r 5 years ago

          Time for a job change? Lots of red flags.

          • giantg2 5 years ago

            I've thought about it. I have no local options and my wife won't move. My remote options are very limited based on the stacks/languages that I have experience in.

            • brianwawok 5 years ago

              So you have few options. Work on more? Pick up another stack. Do some weekend consulting in a new framework. Better yourself. By being stuck in the only 1 job you could possibly work for, you are just delaying the mess that happens when they close down / get bought / you get fired. The best part of programming is you can teach yourself almost anything, especially if you have previous experience to build on.

              • giantg2 5 years ago

                I can't really pick up any off-hours stuff since my wife works and I have to watch the kid. Maybe once they're older I'll have time.

                • brianwawok 5 years ago

                  Lunch hour. After bedtime. Morning. You are responsible for making yourself better. Do it.

                  • giantg2 5 years ago

                    Nah, I've done everything right for years. No point in wasting my time if there's no reward.

                    • brianwawok 5 years ago

                      Okay, I read your post as you were stuck in a dead end job with no alternatives.

                      If you are fine to stay in that exact role till you die, then of course there is no reason to do anything else. It does get harder and harder to find time to learn as you age though; I saw this firsthand with 60-year-old COBOL programmers 20 years out of date trying to learn Java...

                    • ranguna 5 years ago

                      The reward is literally having a better job that allows you to work remotely, so you can feel better about your work without thinking about the moving constraints of your household.

  • acoard 5 years ago

    >What makes them services is that they should be able to be deployed without a bunch of orchestration of other services.

    True, but you absolutely should still be versioning/tagging your releases for each service. It's not to provide sophisticated orchestration, but just to know each of your releases and be able to roll back to them.

    Also I'll point out that some loose coupling between services is unavoidable even in the best case scenario. Sometimes breaking changes happen, or new features need to be taken advantage of. This necessitates some level of (perhaps ad-hoc) "orchestration." If you add a new feature to a microservice and rely on it elsewhere, there's an implicit dependency to that version (or later) of the microservice now.

adamhp 5 years ago

Frankly, we don't do a great job of it. We have some Ansible deploying to OpenShift via openshift applier, that gets run from some Jenkins jobs. We use a form of Git Flow to do branching and tag releases. It's messy.

I've been looking at Sentry for this, recently. They have a specific feature for tracking releases (and even relating them to errors vs. commits) which looks very interesting. Haven't tried it yet though.

anishdhar 5 years ago

We're building Cortex (https://www.getcortexapp.com/) to solve this problem :) We help you track all your microservices and integrate with all your 3rd party tooling to build a single pane of glass for your architecture. Happy to give you a demo if you're interested!

  • headcanon 5 years ago

    We're new customers of Cortex at my workplace and I can't recommend it enough. It's delivering a lot of value to us, mainly around keeping track of service "quality" metrics. The team is very responsive and is constantly improving the app. Big fan!

drbojingle 5 years ago

I'm curious to see how microservice management evolves over time and whether or not it will become viable for small companies. Hopefully one day it's as cheap as writing a function is.

As it stands, with what I've seen and heard about microservices, I'd say the best way to deal with them is to use a monolith 90% of the time, and for the rest of the time make sure your microservice could stand as its own SaaS if given enough love.

Not a direct solution to your problem but might be an indirect one.

  • twh270 5 years ago

    Right now I don't think microservice management is 'viable' even at larger companies. The custom deployment scripts & YAML to manage building, package/artifact repositories, versioning, and deployment tend to be Too Damn Big, at least at the shops I've seen.

    • drbojingle 5 years ago

      Oof. I haven't seen microservices much, but what I have seen makes me wonder what value people are actually expecting to get back. I hope the next iterations of this idea work better.

      • dcow 5 years ago

        The problem is a few gigantic tech companies did it and it became trendy. The reality is it shouldn't be done until you are at a scale that requires it. Most companies never reach that scale.

        • milesvp 5 years ago

          Seriously trendy. I had a CTO try to claim we were doing microservices even though our stack just had a couple of different systems serving different APIs. We would just roll our eyes whenever this person would try to brag about it to other C-levels. This was for a team that was just big enough to maybe justify 2 managers at its peak.

  • ex_amazon_sde 5 years ago

    > Hopefully one day its as cheap as writing a function is.

    When a network call is involved, never.

nonameiguess 5 years ago

Two ways I've seen it done reasonably well.

The somewhat more modern way with Kubernetes deployments is the Helm "chart of charts" pattern, where your system-level deployment is a single Helm chart that does nothing but pull in other charts, specifying the required semantic version of each sub-chart in its dependencies (requirements.yaml in Helm 2, the dependencies section of Chart.yaml in Helm 3).
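For illustration, a hypothetical umbrella chart in this style might look like the following (chart names, versions, and the repository URL are invented; Helm 3 declares dependencies in Chart.yaml):

```yaml
# Hypothetical umbrella Chart.yaml: the "system release" is nothing but a
# list of pinned sub-chart versions.
apiVersion: v2
name: platform
version: 4.2.0               # one version number for the whole system
dependencies:
  - name: auth-service
    version: "1.8.3"         # exact pin; semver ranges also work
    repository: "https://charts.example.com"
  - name: billing-service
    version: "2.1.0"
    repository: "https://charts.example.com"
```

Per-service configuration overrides then live in the umbrella chart's values.yaml, keyed by dependency name.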

The older, but also much more flexible, way I've seen it done is through something a local system architect developed a while back that he called a "metamodule." This was back when Apache Ivy was a preferred means of dependency management, Apache Ant was still a popular build tool, and microservices were being deployed as Java OSGi components. Ivy defines a coordinate to uniquely identify a software dependency by organization, module, and revision. So a metamodule was just a module, but like the chart of charts, it doesn't define an actual software component, but rather a top-level grouping of other modules. Apache Ivy is significantly more flexible than Helm, however, allowing you to define version ranges, custom conflict managers, and even multiple dependencies that globally conflict but can be locally reconciled as long as the respective downstreams don't actually interact with each other.

Be aware both of these systems were for defense and intelligence applications. Personally, I would just recommend trunk based development and fail fast in production for most consumer applications, but for things that are safety or mission critical, you can't do that and may have very stringent pre-release testing and demonstration requirements and formal customer acceptance before you can release anything at all into ops, in which case you need the more complicated dependency management schemes to be able to use microservices.

Arguably, in this case, the simplest thing to do from the developer's perspective is don't use microservices and do everything as a monorepo instead, but government and other enterprise applications usually don't want to operate this way because of being burned so much in the past by single-vendor solutions. It's not totally impossible to have a monorepo with multiple vendors, but it's certainly a lot harder when they tend to all want to keep secrets from each other and have locally incompatible standards and practices and no direct authority over each other.

klohto 5 years ago

Flux with GitOps approach, using Helm charts.

All of our microservices have deployment charts, with frozen image versioning. That way, we can roll out a whole release knowing they are all compatible with each other and can easily fall back just by rolling back in git.

CI/CD updates image versions in affected YAMLs on every backend release and Flux keeps staging in sync. When we are happy, we sync to production branch, Flux syncs and it's done.

If we spot an issue that we didn't see in staging, we either release a hotfix or rollback.
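The "CI bumps a frozen tag, Flux syncs" step can be sketched in shell; the manifest path, registry name, and sed-based bump below are illustrative, not this poster's actual tooling:

```shell
set -eu
# Hypothetical manifest in the GitOps repo, with a fully pinned tag
mkdir -p deploy/payments
cat > deploy/payments/deployment.yaml <<'EOF'
image: registry.example.com/payments:1.7.2
EOF

# CI rewrites the frozen tag on each backend release; committing this
# to the watched branch is what triggers Flux to sync the cluster
new_tag="1.7.3"
sed -i.bak -E "s|(payments:)[0-9][0-9.]*|\1${new_tag}|" deploy/payments/deployment.yaml
grep "payments:${new_tag}" deploy/payments/deployment.yaml
```

Because every tag is a full semver (never `latest` or a bare major), the git history of the manifest is an exact record of what ran when.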

  • theptip 5 years ago

    Do you have a separate git repo for the deploy config/manifests? Or just force-push your `master` branch to the `staging` and `production` branches to do a deploy (i.e. not keeping full history in the env branches)?

    I've seen both advocated for, interested in what the consensus is.

    • klohto 5 years ago

      We have gitops repo which contains state of both clusters. Staging and production. The only difference is that production flux watches only production folder and production branch, while staging flux watches staging folder and master branch. Production branch is kept in sync with master when releasing, ff-only.

      Backend is a monorepo. I can easily check the commit history in gitops repo to see what was the state of backend when the release was made.

      Nothing should be lost, we keep history of everything this way.

  • computershit 5 years ago

    Have you looked into Jenkins-X at all? I'm at a point where I'm starting to adopt GitOps and I'm torn between Flux and (what I consider) a far more opinionated but pretty elegant solution in JX.

    • klohto 5 years ago

      I did, it’s overly complicated for what I need (single team, apply YAMLs in git repo, specific branch, tagging). I see the industry using mostly Flux and ArgoCD and I really don’t want anything Jenkins related in infra again.

  • dclausen 5 years ago

    Could you explain more about your "frozen image versioning?"

    • klohto 5 years ago

      Was wondering if I just invented the term, or it’s something known :)

      Basically a specific semver, no major.minor or just major. Whole version including patch.

abunuwas 5 years ago

In a previous job we had tons of microservices and tons of environments, so it was getting difficult to track what was deployed where. We opted for a simple solution to this: we wrote a very simple CLI that makes the deployments and at the same time registers the deployment in a DynamoDB table. Then to get a picture of a certain environment we just had to list all services for that environment. You could also list the history of releases for a certain service in a certain environment.

  • abunuwas 5 years ago

    To clarify: we tracked not only microservices but also UI deployments. We had what they now call "microfrontends"

sdevonoes 5 years ago

We don't. I'm just waiting for the day we reach more than 100 microservices and my company realises that microservices were a bad idea to begin with. That's usually the way it works: learning the hard way.

To elaborate:

- I do think there is value in "utility microservices". For example: a microservice to send email, a microservice to filter spam, etc. These are the next-level libraries (because they do need to run as services 24/7). Management usually doesn't like these kinds of microservices because these "domains" usually don't belong to any particular team, so managers cannot "own" their success.

- I don't think there's much value in building microservices for the core of your business (e.g., a checkout microservice, a payments microservice, etc.). The usual argument management gives is: "we'll make teams more independent and they will be able to deliver stuff faster than with a monolith!". While this is sometimes true, "faster software delivery" is not on my top list of priorities when it comes to building software.

vbsteven 5 years ago

My usual setup is pretty simple with each service in its own git repository with a Gitlab pipeline:

  * build code
  * run tests (unit + integration using database)
  * build docker image
  * push to gitlab registry
  * deploy to staging k8s environment by using a custom image that just templates a .yml and does `kubectl apply` against the staging cluster
  * optional extra "deploy to production" that works in the same way but is triggered with a manual button click in the pipeline.
I don't do canary deploys or anything. Just deploy to staging, and if it works, promote to production.
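The templating step can be as small as a placeholder substitution; the file names and tag variable here are assumptions, and the actual `kubectl` call is left commented out:

```shell
set -eu
IMAGE_TAG="${CI_COMMIT_SHORT_SHA:-abc1234}"   # GitLab predefines this in CI jobs

# Hypothetical template checked into the service repo
cat > deploy.template.yml <<'EOF'
image: registry.gitlab.example.com/group/service:__IMAGE_TAG__
EOF

# Fill in the placeholder, then apply against the staging cluster
sed "s|__IMAGE_TAG__|${IMAGE_TAG}|" deploy.template.yml > deploy.yml
# kubectl --context staging apply -f deploy.yml
grep "service:${IMAGE_TAG}" deploy.yml
```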

For some projects I have "staging test scripts" which I can run from my devmachine or CI that check some common scenarios. The test scripts are mostly blackbox using an HTTP client to perform a series of requests and assert responses. (signup flow scenario for example)

I would like to move to a monorepo, but I have not yet figured out an easy way to have a separate pipeline for each service that is only triggered when that service has changed.

edit: formatting
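(GitLab's `rules: changes:` keyword is one commonly suggested way to scope a monorepo job to a single service's directory; the job name and paths below are hypothetical:)

```yaml
# .gitlab-ci.yml fragment: the job runs only when files under
# services/billing/ change in the push
build-billing:
  stage: build
  script:
    - make -C services/billing build
  rules:
    - changes:
        - services/billing/**/*
```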

  • Chico75 5 years ago

    The issue with this model of manual deployment to production is that it creates uncertainty about which version was last deployed, and the team can lose confidence in the deployment process if deployments don't happen regularly

itielshwartz 5 years ago

Founder of https://Komodor.com here, we track changes and alerts across your complete K8s-based stack, analyzing their ripple effect and then providing devs, DevOps and SRE teams the context they need to troubleshoot efficiently. Independently.

Feel free to ask questions or reach out :)

UK-Al05 5 years ago

You've broken the microservice abstraction if this is a problem.

A team should own a microservice, you release as soon as the team able to.

You version your apis, so you don't break any services which rely on yours.

  • giantg2 5 years ago

    "You've broken the microservice abstraction if this is a problem."

    I agree, but in practice it seems more companies break it rather than follow it.

    • UK-Al05 5 years ago

      I agree people hop on the microservice bandwagon without really understanding the "philosophy" behind it. Then blame microservices when they struggle.

      • giantg2 5 years ago

        I actually prefer a monolith compared to the way we do microservices. One or two big apps to own vs ten microservices or apps that require coordinating with others. It's really just a duplication of paperwork and other overhead processes.

        • UK-Al05 5 years ago

          We don't require any coordination with ours. It's low cost for us.

          We may make the occasional announcement that v5 our api is coming soon which has x features based on feedback from other teams. Then announce when we've released it.

          It's a product, and we own it.

          • giantg2 5 years ago

            That sounds nice. My most recent elevation involved coordinating with 5 other services to stop/restart/reprocess items with a couple deploys between the 5. Had to be off-hours too. Sort of normal for us.

        • ex_amazon_sde 5 years ago

          > vs ten microservices or apps that require coordinating with others

          That's called the "distributed monolith" and it's worse than a monolith.

  • nonameiguess 5 years ago

    Although this is true, there is still the additional problem that a lot of customers, specifically government customers, require frozen known versions of everything as a requirement for acceptance testing. Auditing standards for the aerospace industry require that what is running in ops is exactly the same version of everything that was running when an acceptance test was witnessed and signed.

    It's a totally impractical standard for modern software development, but the developers themselves have no choice in the matter until the customers change.

jokethrowaway 5 years ago

Push to master -> jenkins runs linting, tests, applies migration (or fails, requiring manual intervention), builds docker image, k8s deploys to canary, monitors canary for a bit for errors, k8s deploys to production, tags docker image, notifies slack.

In the past, instead of canary, we used a staging environment with manual promotion. That was costing us a cool half a million in overpriced AWS machines (but we were committed to spending a certain amount per year in exchange for discounts, so it's hard to price things) and it was doubling the testing process (promote to staging, test, promote to prod, test). We had also been bitten by issues happening in production and not in staging. With the canary, prod-only approach we have higher risks of messing up real data, but we have safeguards in place and the canary means only a small portion of users will see problems. We also have the option to deploy to a canary for devs only.

I'm not happy about using / running / maintaining jenkins (terrible UI, upgrade path, API to add plugins, etc) but it does the job and it improved a fair bit over the last 5 years. Jenkinsfiles are especially nice, even though not being able to easily run them locally is a bit annoying.

mrdonbrown 5 years ago

When I worked at Atlassian, we had this issue as well, given all the many services that were deployed for products. A few of us left and created Sleuth [1] to solve it for Atlassian and folks like you. Sleuth helps you know what is deployed and its health, and helps with workflow automation. It also tracks the DORA metrics so you know how healthy a service's release process is.

[1] https://sleuth.io

moksly 5 years ago

It depends a little on your definition of “microservice”, but we keep track of a lot of our “mostly single responsibility” data-processes that make up the bulk of our AD, IDM and organisational database for 10,000 employees and 300+ IT systems with a mix of Azure Automation runbooks and local tasks that are activated by Azure Automation. This gives us a clear picture of when what is run, alerts humans on errors and halts processes.

For always-on systems we have a simple dashboard that each service interacts with.

We don’t have a fancy CI/CD pipeline or anything like that, just a set of rules that you have to follow.

Database-wise a service has to register itself with one of our data-gatekeepers, which involves asking for permission for the exact data used, with a reason. But beyond that services are rather free to make “add” changes, often in the form of new tables that are linked with views. It’s not efficient, and we have a couple of cleanup scripts that check if anyone subscribed to all the data, but we’re not exactly Netflix, so the inefficiency is less expensive than doing something about it.

romanhn 5 years ago

Check out OpsLevel, seems in line with what you might be looking for. I know the folks behind it, they're top tier.

  • kenrose 5 years ago

    Thanks Roman.

    Founder of OpsLevel here (https://www.opslevel.com).

    A lot of companies build their own internal microservice tracking tools. Not just for release/deployments, but also for tracking service owners and production readiness.

    e.g., Shopify has ServicesDB ([1]) and Spotify has System-Z [2], which they recently open sourced as Backstage [3].

    If you're down to build / maintain your own service catalog, those are good places to start.

    We started OpsLevel a few years back because we saw a pretty clear need for a product in this space. OpsLevel tracks your services and their owners, production readiness of your services, and brings together lots of event/metadata about your services (including deploys).

    There's been a lot of traction in this space over the last few years with a lot of new companies popping up. I'm glad to see some of our newer friends in the space chiming in this thread.

    [1] - https://shopify.engineering/e-commerce-at-scale-inside-shopi...

    [2] - https://dzone.com/articles/modeling-microservices-at-spotify...

    [3] - https://backstage.io/

kerblang 5 years ago

Since the specific question is "how do you keep track of", my build & deploy script copies a quick one-liner dump of git information (SHA, date, environment, branch, etc.) to a directory on a shared server, as a text file. Later I can go to that server and `cat versions/* | sort` to get a report of what is deployed where/when and so on.
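A sketch of that dump-and-report idea (the `versions/` directory, field order, and file naming here are assumptions, not this poster's actual format):

```shell
set -eu
mkdir -p versions
service=checkout   # hypothetical service name
env=prod

# Inside the deploy script, grab a one-line summary of what's shipping;
# fall back to placeholders when not run from a git checkout
sha=$(git rev-parse --short HEAD 2>/dev/null || echo unknown)
branch=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo unknown)
printf '%s %s %s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  "$env" "$service" "$sha" "$branch" > "versions/${service}-${env}.txt"

# Later, the ad-hoc report: what is deployed where, sorted by timestamp
cat versions/* | sort
```

One file per service-environment pair means each deploy overwrites the previous record, so the directory always reflects current state while git (or the deploy logs) keeps the history.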

It helps that I have One Deployment Script To Rule Them All (or really, a couple DSTRTA's). When every service has its own special build & deploy script you have to ask nicely and hope people keep up with it. A lot of CI/CD systems force you into that corner because of an implicit assumption that each build & deploy is its own special one-off.

Anyhow, text files rule, at least as an ad-hoc solution.

100011_100001 5 years ago

My team is responsible for about 200 microservices being deployed, some of them have 10 or more pods. We don't do continuous delivery. Instead it's done by 5 different groups deploying once every week or two.

Our production deployment jobs are in Jenkins and isolated. It's easy to check what was deployed when. We also have a script written that can run an environment report to see what versions and which microservices have been deployed. Along with their CPU/memory allocations, number of pods etc.

Release management tracks which JIRA stories are in which release, they do it mainly by looking at master merges between prod deployments.

  • JamesSwift 5 years ago

    That seems overwhelming to me. Do you feel this is a tenable strategy? Or is the number of deployments becoming an issue?

    • twh270 5 years ago

      This is similar to what $myclient is doing.

      Parent comment doesn't mention whether identification of versions is done manually or whether they just grab master. If the latter, it's probably reasonable. At $myclient, every release to stage and prod requires teams to manually identify each version of each microservice as well as the stories (JIRA tickets) that are being deployed. This is extremely painful, time-consuming, and error-prone. Avoid at all cost; as the number of services grows, the pain/time/error cost appears to increase geometrically.

      • 100011_100001 5 years ago

        Our master versions automatically create new images with unique ids and JIRA stories are associated with them. You can see that from JIRA to Gitlab and vice versa.

        Having said that there are people that do some manual correlation. Mainly to be able to determine that the fixes actually got in.

    • 100011_100001 5 years ago

      If you automate enough you almost don't need to know the specifics. We no longer monitor actively during production deployments. They just work, once every 5 months there is a deployment that needs to be escalated to us.

  • Crazyontap 5 years ago

    Any plans for showing user's birthday on the settings page?

    Sorry, I'm just kidding but that's the only thing I could think of when I heard the number 200!

k8s_Hero 5 years ago

Have you heard of Komodor? They just held a joint webinar with Epsagon regarding this very issue! You can see a recording of the webinar here: https://www.youtube.com/watch?v=J32ZoiRVvPg Or the product overview here: https://www.youtube.com/watch?v=Qgio3vF1sPE&t=6s

znpy 5 years ago

you don't. each team keeps track of its own set of micro-services.

  • igetspam 5 years ago

    This.

    We have standardized pipeline models that we reuse everywhere. Service owners are responsible for updating their pipelines to pick up changes. As we mature, we're moving a lot of it into ci templates and key changes will be picked up automatically. There are a few pipelines that occasionally require manual steps but those are uncommon. As we add more continuous testing, we'll be deploying more frequently. Once we've gotten good at that, then we'll be working on a/b testing and/or feature flags.

mandeepj 5 years ago

Looks like someone from Airbnb could shed some light on the topic. They seem to have nailed microservices deployments :-)

https://www.altoros.com/blog/airbnb-deploys-125000-times-per...

taleodor 5 years ago

We work on a solution - https://relizahub.com

Our community Discord Server (questions on DevOps and DataOps, not limited to Reliza Hub) - https://discord.gg/UTxjBf9juQ

selphy1 5 years ago

We use Octopus Deploy. On commit it autodeploys to dev and sends the team a slack message ([environment] version x (previous was y) deployed by z). Prod deployments are also done through octopus but "manually" by the team when we are ready to make a release. Usually every week or two.

sasfn 5 years ago

Sauron is a solution that helps track as many microservices as you have, indexing this information into Elasticsearch.

https://github.com/freenowtech/sauron

raphaelj 5 years ago

In the case of web/RESTful services, I'm just relying on Heroku/Dokku.

exabrial 5 years ago

I hate to state the obvious... but don't do microservices? It's the lunacy of the 2000's ESB craze but without all the formal testing.

The only way to accomplish what you're asking for would be extremely thorough mock testing.

geritwo 5 years ago

CI/CD can make it manageable with Git and Atlassian tools, or you can build a custom web dashboard if needed. Personally I like version tagging and release management based on semantic versioning.

sidcool 5 years ago

GoCD or Gitlab CI work well for me. I have CircleCI too, but its initial version did not impress me. I am sure it has improved since.

rileymichael 5 years ago

GitOps w/Flux, although currently evaluating ArgoCD for its “app of apps” pattern to more easily provide feature preview environments.

wikibob 5 years ago

Don’t have dozens of micro services.

This is a serious comment.

anotherhue 5 years ago

ArgoCD has been very helpful here. I understand that the upcoming 'Argo Rollouts' will be even more so.

caniszczyk 5 years ago

check out https://backstage.io
