Tolerating full cloud outages with Monzo Stand-in

monzo.com

64 points by abritishguy 10 months ago · 44 comments

QuinnyPig 10 months ago

What I wonder is “have they isolated third party dependencies?” If AWS is hard down, those may well be impacted—in some cases, by their own third party dependencies. You can test turning off your AWS environment, but you can’t really test turning off S3 for everyone…

  • abritishguyOP 10 months ago

    It's a very good question. The stand-in system itself has been built to have basically no external dependencies itself.

    So, the question you are really asking is "to what extent are the other parties involved in the processing of payments resilient to AWS failure" – e.g. Stripe probably isn't and that's probably a decent chunk of e-commerce.

    I definitely don't think this would be anything close to smooth sailing if AWS was to fully go down, but we do have the benefit that underlying payment infra is still dominated by on-prem with leased lines etc. My best guess of the actual behaviour would be that bank transfers would keep working, the card networks themselves would keep working but the average e-commerce website would not.

    Naturally, we can only control for what we can control for – and for us the primary benefit of stand-in is what it gives us in the much more likely scenario of an incident in our platform.

  • sleepgou 10 months ago

    From what I understand of payment systems this is so that payments through card machines, contactless payments for public transport, cash withdrawals from ATMs, etc. all continue to work. A lot of those systems are surprisingly insulated from AWS simply by virtue of being extremely archaic

    • fujinghg 10 months ago

      I wouldn’t assume that is the case. The failure modes are different, that is all.

      I saw a whole corp POS platform a couple of decades ago that was hanging off a TFTP server on a machine that no one dared turn off in case the world ended. One day the DC UPS failed, it didn’t come back up and they had no retail operations for several hours while they sent a bunch of cash to a guy who had left to help them fix it.

      There’s stuff like that everywhere lurking in the archaic.

      I know of a modem in a DC which is used to talk to a branch office running AS400 hardware that is so old they have to buy spares off eBay.

      • chrisldgk 10 months ago

        To add to this, I remember a story my father told me. This is off the top of my head and a few years ago so it might not be fully accurate.

        My father worked as a banker for most of his life and when he was in his late twenties he got a position to oversee a smaller investment bank. This was sometime in the late 90s. When he started, he took a general look around, checked with everyone how things were going and happened to meet one of the few IT people working in the building. When the IT guy realized that he was speaking to a new person who might be able to change things around there, he was elated and told him that there was an issue the previous boss never treated as urgent, even though it was quite critical. Apparently the servers that were running pretty much all of the transactions of that investment bank were located in the basement of that building and had literally never been migrated, upgraded or anything else. What was left over from that time was literally one running machine and another machine that had died a few years prior and was now only used for spares in case anything on the single still-working machine broke. Since the hardware was so old, there apparently weren’t many replacement parts left and the ones that were left were incredibly expensive due to many banks depending on those specific servers.

        Anyway, my father heard that story and immediately got the guy the funding he needed to migrate to a newer and better system. Sometimes I think about this kind of stuff: we think banks are really resilient (and they try to be), but I wouldn’t be surprised if setups like these still exist somewhere because people are too scared to touch them.

Koffiepoeder 10 months ago

Unrelated tangent: I was reading the article and suddenly realised that I could not identify the font. After a quick search:

> Our functional typeface is Monzo Sans, a custom cut of Universal Sans, meaning it’s unique to Monzo. We chose it for maximum readability, with generous dots and curled ends.

Interesting choice, but I dig it :)

noodlesUK 10 months ago

This seems especially relevant given the massive outage that Barclays, another major UK bank, just suffered. Barclays was down for around two days with customers unable to spend money at all.

I suppose had they implemented a similar system, they would have degraded into a minimum viable banking system rather than the total outage that impacted so many Brits.

tikkabhuna 10 months ago

These blog posts are why I continue to support Monzo. Their openness is really appreciated.

theginger 10 months ago

A decent setup which allows you to prove you are not dependent on 1 cloud provider will probably pay for itself when it's time to negotiate discounts.

  • cbg0 10 months ago

    I doubt the sales folks you'll be talking to will care about your multi cloud deployment, as they don't have the skills to verify something like that.

    • matt-p 10 months ago

      Well you can turn them off for a day and they have the skills to see that.

paulbjensen 10 months ago

My only conclusion is that Monzo would rather embrace the apocalypse than rely on Microsoft Azure to provide a tertiary fallback.

4ndrewl 10 months ago

Really interesting. Would love to understand how they came to the decision to build this, and whether there's any precedent for it.

  • matt-p 10 months ago

    Part of being a regulated bank in the UK is proving infrastructure resiliency.

    Monzo were the first bank here to run entirely on the cloud, so I imagine the regulators were extra strict with them.

    I'm not saying this level of resilience is due to that alone, but perhaps it started them on the path?

  • quesera 10 months ago

    Payment card networks have delegated authorization plans, where if a major processor goes down, they will still route transactions and use a simplified secondary network for making approval decisions.

    It's called "stand-in processing", and I assume it's the inspiration here.

    • 4ndrewl 10 months ago

      The Monzo example feels different though, as they're explicitly not looking to replicate all functionality, just something minimal to get by whilst they fix the primary cloud services.
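
For readers unfamiliar with the "stand-in processing" quesera describes above, here is a minimal sketch of the idea: when the primary authorizer is unreachable, a much simpler fallback makes the approval decision from a few conservative rules. Everything here is hypothetical and for illustration only (the `AuthRequest` and `StandInAuthorizer` names, the limits, the use of `ConnectionError` as the outage signal); it is not how any card network or Monzo actually implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthRequest:
    card_id: str
    amount_pence: int


# Hypothetical limits chosen for the example only.
PER_TXN_LIMIT_PENCE = 10_000
DAILY_LIMIT_PENCE = 25_000


class StandInAuthorizer:
    """Simplified approval rules used only while the primary is down."""

    def __init__(self, blocked_cards):
        self.blocked_cards = set(blocked_cards)
        self.spent_today = {}  # card_id -> pence approved by stand-in

    def authorize(self, req):
        if req.card_id in self.blocked_cards:
            return False
        if req.amount_pence <= 0 or req.amount_pence > PER_TXN_LIMIT_PENCE:
            return False
        spent = self.spent_today.get(req.card_id, 0)
        if spent + req.amount_pence > DAILY_LIMIT_PENCE:
            return False
        # Record the approval so it can be reconciled once the primary returns.
        self.spent_today[req.card_id] = spent + req.amount_pence
        return True


def route(req, primary, stand_in):
    """Try the primary authorizer; fall back only if it is unreachable."""
    try:
        return primary(req)
    except ConnectionError:
        return stand_in.authorize(req)
```

The point of the design is that the fallback only ever sees traffic when the primary fails outright, and it trades accuracy for having almost nothing that can break.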

joshstrange 10 months ago

Completely unrelated to this blog post but I really dislike Fintech saying "Get paid early" in their promos.

It's clearly marketing at someone too stupid to be able to see right through how utterly useless that is. If you are celebrating getting your paycheck 1 day earlier (every time) then your financial literacy and financial health are probably in the toilet. They _must_ know they are preying on people with statements like that.

Then again, 90% of Fintech seems to be just a heavy layer of lipstick over an archaic system. Often with very little care of if any of the tools actually help people and more of a focus on how flashy or how much people think they are being helped.

  • jkingsman 10 months ago

    Though, in some cases (like when it's your bank saying it), it's usually just them frontrunning reliable (coming from a payroll provider) and predictable (getting paid the same time each month) ACH transactions with a near-zero likelihood of not settling, then crediting you the money before the ACH is totally settled, so not ALL cases are fintech gimmicks.

    But most are, and unfortunately, as the proliferation of payday loans shows us, there is no shortage of desperate people and organizations willing to take advantage of that.

    • quesera 10 months ago

      Right, some banks will not post a deposit to your account until after a holding period. I deal with a lot of ACH payments, and despite a very strict schedule in the network, the retail customer-facing side is surprisingly unpredictable.

      So the "post credit early" promise is not a gimmick, but the whole idea of being paid early is a gimmick. The next pay period is still a full period away, so any benefit to being credited early is literally a one-time, and probably just one-day thing.

    • andrewaylett 10 months ago

      Remember that Monzo is a UK institution -- ACH isn't relevant, and they can see the payment in flight if it's using BACS.

      https://monzo.com/blog/2019/08/20/monzo-now-lets-you-get-pai...

    • blibble 10 months ago

      as a banker, when I first heard about that I did wonder if they've modeled that risk correctly

      it's the sort of thing that could probably wipe out their capital completely in a black swan event

      • quesera 10 months ago

        There's an ocean of historical data to predict reversal or settlement failure of ACH transactions.

        I would guess that payroll credits are the second most-reliable category in the ocean of ACH transactions, right after US Treasury payments.

        How black would this swan need to be to blow up this stability?

        • blibble 10 months ago

          not sure what american payment transfers have to do with UK BACS payments, but ok

          > I would guess that payroll credits are the second most-reliable category in the ocean of ACH transactions, right after US Treasury payments.

          maybe some sort of lunatic getting control of the US treasury payment systems?

          I suppose that can't ever happen

          • quesera 10 months ago

            This thread is "completely unrelated to this blog post" and your previous comment was responding to comments about the US ACH network.

            • lol768 10 months ago

              Right, but it's talking about "Get paid early" - and in the context of this particular post that's a specific Monzo feature that has absolutely nothing to do with ACH.

              Do US fintechs offer something similar? Perhaps, but I bet it works pretty differently to BACS where the bank already knows about the money transfer.

              • quesera 10 months ago

                Yes, "get paid early" is a feature that US fintechs and banks offer, and have for years. So in the context of this subthread, it's all about ACH.

                The feature is based on the predictable pattern of payroll direct deposits, and/or a pre-settlement view into the ACH transfers for the day. The latter sounds like what you are describing for BACS, but I don't know the UK details.

      • simonvc 10 months ago

        It's a risk that was very much understood and it's fully covered.

        • blibble 10 months ago

          I've heard that one before

          • lol768 10 months ago

            I mean, it's been in production for years, has a cap of £20k and I'm pretty sure the design of BACS means it's very difficult to recall a transaction after the 4pm time when the feature becomes available. Simon is probably in a very good position to know how often that sort of thing happened, if ever.

            I'm pretty happy with them offering this based on my understanding of BACS, as a shareholder.

            • blibble 10 months ago

              20k... per account? adds up

              I suppose most people won't have 20k (net) payslips, or draw it all out on pay day!

              I'm not saying he's wrong, but covering a risk of this nature? I'm sure you'd be able to find someone to take your premiums

              whether you'd be able to collect on it when needed, in the situation where the financial system is under serious stress is something else (see: 2008)

    • joshstrange 10 months ago

      > not ALL cases are fintech gimmicks.

      Fair and that's all well and good. I'm just saying if 1-3 days delay of getting your paycheck is going to have a big impact on one's life then I encourage one to reexamine their decisions, something else is the problem.
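
As a rough illustration of the "get paid early" mechanics discussed in this subthread (the bank can see the BACS payment in flight, the feature becomes available at 4pm, and there is a £20k cap, all per the comments above), here is a small sketch. The function name is hypothetical, and so are the assumptions that the advance happens on the calendar day before settlement, that larger payments are capped rather than refused, and that weekends and bank holidays can be ignored.

```python
from datetime import datetime, date, time, timedelta

EARLY_CAP_PENCE = 20_000 * 100   # £20k cap, as mentioned in the thread
AVAILABLE_FROM = time(16, 0)     # 4pm, as mentioned in the thread


def early_credit_pence(now, settlement_date, amount_pence):
    """Return how much of a pending payment could be advanced right now.

    Assumes (hypothetically) that the advance is offered on the calendar
    day before settlement, and ignores non-working days.
    """
    day_before = settlement_date - timedelta(days=1)
    if now.date() != day_before or now.time() < AVAILABLE_FROM:
        return 0
    return min(amount_pence, EARLY_CAP_PENCE)
```

For example, a £2,500 salary settling on a Friday would be eligible from 4pm on Thursday under these assumptions, and not before.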
