Cloudflare R2 Global Outage

cloudflarestatus.com

133 points by leiferik a year ago · 36 comments

jgrahamc a year ago

This is resolved. Sorry about the downtime.

  • ozgune a year ago

    On our side, we first saw the Cloudflare outage. Then, Docker Hub started failing, followed by GitHub API errors.

    It's amazing how much of the internet runs / depends on Cloudflare these days. Thank you for keeping the lights on. :)

    • martin_a a year ago

      > It's amazing [...]

      "Shocking" is the word you're looking for. Keeping the lights on would be easier if we didn't have an internet resting on a few shoulders.

      • neom a year ago

        Critical infrastructure is critical so it belongs in the hands of a few. You can make it reliant on nobody so I have to trust nobody, or you can put it in the hands of a trusted few who I can research and understand. Michelle, Matthew and John GC have done such a good job of building trust that we've rewarded them to the tune of billions of dollars. I understand your perspective; I just wanted to share that there are people on the other side who don't find it shocking and appreciate how it is.

        • wongarsu a year ago

          Counterargument: centralization leads to all critical infrastructure failing at once, which is far worse than single pieces of infrastructure experiencing intermittent failures at different uncorrelated points in time.

          • electroly a year ago

            Counter-counter-argument: users forgive you when all their other apps are down, too. Countless cloud outages have proven there is safety in numbers. If your app is the only one down, you're having a very bad day. If everyone's app is down, nobody seems to actually care that much. Not enough to start making my outages my own problem instead of just waiting for someone else to fix it, at least.

          • neom a year ago

            Yes, I also want to be clear that my point was philosophical, not technical, and I'm not trying to start a flamewar. I don't know that there is a "right answer" here. I just have one perspective, and I don't see the others as any less valid to explore. I've thought about your point a lot over the years (I helped build a large cloud provider from scratch). You might be right, I don't know, but in my experience outages on less centralized systems tend to go on for longer and are harder to deal with if the surface area is too diffuse. There is probably a happy medium. Still, I don't have a problem with Cloudflare; they seem generally fine, and we've known them a long ass time now. I am quite concerned, however, about who takes over when Matthew and Michelle move on, as I expect they will one day... you have to have a lot of gall to run Cloudflare correctly.

        • nullstyle a year ago

          "Critical infrastructure is critical so it belongs in the hands of a few."

          Holy shit no. This is the sort of thing that gets kooky politicians randomly turning off the fluoride in a city's water supply: https://alaskapublic.org/news/2021-12-15/anchorage-mayor-tur...

          Things aren't as simple as you seem to think they are.

          • neom a year ago

            I take your point, and it's well presented. We could easily get into neo-liberalism here, but it's too early on a Thursday morning for that, so I'll just accept it's more nuanced and hope you'll look for the nuance on my side also. :)

        • johnmaguire a year ago

          > Critical infrastructure is critical so it belongs in the hands of a few.

          Well that's a non-sequitur.

        • diggan a year ago

          > Critical infrastructure is critical so it belongs in the hands of a few.

          Yeah, why even have ASNs, BGP and distributed network infrastructure when we could just have GooFlareZon host it all, with basically no drawbacks?

          There are many good reasons why the internet is distributed and why that was the architecture that allowed it to go global. Going back from that would do no one any good except those who end up as the new owners.

          • neom a year ago

            The internet is distributed control, yes. BOFH operating tables... As I said, for me it's either no humans or humans I can audit. The DNSSEC Root KSK Ceremony is neat.

  • randomtoast a year ago

    Thanks, when will we get a post-mortem?

your_challenger a year ago

Thank God my side hustle doesn't have any real users

evertedsphere a year ago

Suddenly realising that DeepSeek could do something very funny and name their next reasoning model R3 to avoid a clash with this, just like OpenAI did.

danielskogly a year ago

Status page lists R2 as Operational.

From the Discord:

> They already have been paged and acknowledged the issue, unfortunately there is delay putting up a statuspage but it should be there soon

Edit: Status message was added right as I posted this.

  • eknkc a year ago

    Looks like we logged the first errors at 08:12 UTC, and the status page posted the issue at 08:34 UTC.

roboben a year ago

Docker Hub reported an incident[1] at the same time. Are they running on R2?

[1] https://www.dockerstatus.com/pages/incident/533c6539221ae15e...

roelb a year ago

Normal Workers seem to work, although the 500 error response mentions a worker failure. Seems to be limited to R2.

  • dvrp a year ago

    If you check their latest updates it says “Update - This incident is impacting R2, Durable Objects, Cache Reserve, Key Transparency Auditor, Stream, Logpush and Images.”

immibis a year ago

The cause: while trying to block a phishing site they accidentally turned off R2.

I am not joking. Check the blog.

lousken a year ago

Pulling Docker Hub images also returns a 500, so I'm assuming this is the cause?

roelb a year ago

Seems to be resolved now; requests and listing work again over here.

vault a year ago

why is the date `Oct 17, 2024 - 20:26 UTC`?