Cloudflare Experiencing Latency Issues

cloudflarestatus.com

151 points by z0a 4 years ago · 68 comments

zertop 4 years ago

> Monitoring
> Cloudflare has implemented a fix for this issue and is currently monitoring the results.
> Posted 1 minute ago. Dec 16, 2021 - 20:44 UTC

Fast feedback, communication and fix. Always impressed with them...

jgrahamc 4 years ago

Should be cleared up now. Sorry about that.

  • tentacleuno 4 years ago

    I really like it when people in your position at such a big company post on here, even if it is a brief comment like this. Thank you!

    • jgrahamc 4 years ago

      I would have posted faster, but I was too busy doing my own little bit of debugging, which consisted of:

          1. dig @1.1.1.1 jgc.org
          2. nc -v 104.22.11.223 80
          3. curl -v https://jgc.org/cdn-cgi/trace
          4. curl -v https://jgc.org/
      
      Hmm, #1 was fast, so the network is routing OK. Hmm, #2 was fast, so TCP is OK. Hmm, #3 was fast, so I know (because I worked on that code) that this code path is good. Hmm, #4 is slow, so component X is slow but still working.

      Of course, in parallel I'm in a conference call with about 40 other people who have actual access to monitoring and systems and other things who can see exactly where things are.

      But I was damn close with four commands, and it gave me confidence in what people were saying. I have to say, though, Cloudflare's internal distributed tracing system is pretty cool: I got sent a trace and could see right where the slowdown was.
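The four-step check above can be sketched as a small bash script that times each layer in turn and reports the first one that exceeds a threshold. This is a hypothetical sketch, not Cloudflare tooling: the function names (`time_cmd`, `is_slow`, `diagnose`, `probe`), the 1-second default threshold, the `--max-time 30` cap, and the use of `nc -z -w 5` instead of the interactive `nc -v` are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical layered latency check modeled on the four commands above.
# Each probe exercises one more layer than the previous one:
#   1. DNS via 1.1.1.1   2. TCP connect to the edge
#   3. an edge-served endpoint (/cdn-cgi/trace)   4. the full page

# time_cmd CMD...: print the wall-clock seconds CMD took (its output is discarded).
# Relies on bash's `time` keyword and TIMEFORMAT (%R = real time in seconds).
time_cmd() {
  local TIMEFORMAT=%R
  { time "$@" >/dev/null 2>&1; } 2>&1
}

# is_slow T LIMIT: succeed (exit 0) when T seconds is strictly greater than LIMIT.
# awk is used because plain [ ] cannot compare fractional numbers.
is_slow() {
  awk -v t="$1" -v lim="$2" 'BEGIN { exit !(t + 0 > lim + 0) }'
}

# diagnose T_DNS T_TCP T_TRACE T_FULL [LIMIT]: name the first layer slower than
# LIMIT seconds (default 1.0). Layers are checked bottom-up, so a fast trace
# followed by a slow full request points past the edge's own code path.
diagnose() {
  local limit=${5:-1.0}
  if is_slow "$1" "$limit"; then echo "slow: DNS resolution"
  elif is_slow "$2" "$limit"; then echo "slow: TCP connect"
  elif is_slow "$3" "$limit"; then echo "slow: edge (cdn-cgi/trace)"
  elif is_slow "$4" "$limit"; then echo "slow: full request path"
  else echo "all layers fast"
  fi
}

# probe HOST IP: run the four checks against HOST (served from edge address IP)
# and print the timings followed by a one-line diagnosis.
probe() {
  local host=$1 ip=$2
  local t_dns t_tcp t_trace t_full
  t_dns=$(time_cmd dig @1.1.1.1 "$host")
  t_tcp=$(time_cmd nc -z -w 5 "$ip" 80)
  t_trace=$(time_cmd curl -s -o /dev/null --max-time 30 "https://$host/cdn-cgi/trace")
  t_full=$(time_cmd curl -s -o /dev/null --max-time 30 "https://$host/")
  echo "dns=${t_dns}s tcp=${t_tcp}s trace=${t_trace}s full=${t_full}s"
  diagnose "$t_dns" "$t_tcp" "$t_trace" "$t_full"
}
```

Usage, with the host and edge address from the comment: `probe jgc.org 104.22.11.223`. Checking the layers bottom-up mirrors the reasoning above: each fast step rules out one layer, so the first slow step marks where to start looking.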

      • tailspin2019 4 years ago

        Now you're just showing off :)

        • jgrahamc 4 years ago

          Allow an old man the fantasy that he still knows how the whole of the system works.

          • maxgashkov 4 years ago

            That's a huge problem for companies of almost any scale. Can you shed some light on tools used internally in Cloudflare for tracing?

          • jatone 4 years ago

            sounds like you do. ;)

      • vinay_ys 4 years ago

        What was component X? Was it a buggy rollout?

        • jgrahamc 4 years ago

          A proxy that serves traffic. No, it was way more complicated than “buggy rollout”.

      • tuananh 4 years ago

        Do you often code these days?

        • jgrahamc 4 years ago

          Hardly ever. If I start a project I don’t end up having time to finish or maintain it which isn’t fair to the team.

          If I write something, it’s for my own use. And I like to write things that test Cloudflare. Доверяй, но проверяй (trust, but verify).

  • odiroot 4 years ago

    I'm always surprised you have the time to respond in these threads, jgrahamc.

  • rexreed 4 years ago

    Thanks for the post. Yesterday you strongly rebutted me for saying that the widespread outages were also impacting Cloudflare [0], even though it wasn't obvious to me at the time who was impacting whom. Knowing the intricate connections between all the app, infrastructure, and cloud providers is tricky! When stuff goes down, the blame gets spread around.

    [0] https://news.ycombinator.com/item?id=29568319

    edit: changed "scolded" to "strongly rebutted"

    • jgrahamc 4 years ago

      Sorry if that came across harshly. ASCII is a tough medium.

      • rexreed 4 years ago

        Yes, text is indeed devoid of context and sentiment. But to the main point, I fear that the comments on tone are a bit beside the point anyway. So let's move past that to talk about what I really wanted to comment on.

        To be direct: yesterday I spotted what seemed like an Internet-wide issue that was also impacting Cloudflare. You told me that, no, in fact there was no impact on Cloudflare. Today there is a post about a separate issue where there is an impact on Cloudflare. In my mind I connect these two events: on the first day, a quick and direct denial that the issue was Cloudflare's; today, an acknowledgement of issues, even if they were a different set of problems.

        When Cloudflare is showing an outage but the problem doesn't originate with Cloudflare, it would be helpful to indicate on your own error page where the error might be. I know this might be touchy, but you should feel free to point fingers when you know that an outage affecting your client is caused by another party.

        For example:

        "Error. Cloudflare reports this site is down. Issues point to an outage with [AWS, Google, Azure, Oracle <-- just kidding] as being the source of that outage"

        That would help make it clear that yes, there is an outage, and no, Cloudflare is not the proximate cause.

        All this chatter about your use of words and my use of words kinda misses the main point of what I was trying to communicate.

    • tailspin2019 4 years ago

      Unless the comment was edited, I don't think that was a "scolding" :-)

    • eightysixfour 4 years ago

      Is there a particular reason you perceived their response as "scolding?" It just looks like a straightforward answer.

      • sneak 4 years ago

        In my experience, a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational. Euphemism and indirect implication is the norm in American communication, much to my dismay.

        It can wrap around to extremes sometimes, too.

        https://sneak.berlin/20191201/american-communication/

        • Buttons840 4 years ago

          This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.

          I'm also sometimes surprised by how effectively a simple statement like "I don't want to spend money on that" can shut down even a pushy salesman. Or even the simple "No." can work wonders.

          • ziddoap 4 years ago

            >This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.

            I've never seen any comments regarding a single period, but I've seen comments (and sometimes agree with them) regarding the perceived rudeness when ending messages in ellipses.

            "Good job..." seems almost sarcastic compared to "Good job.".

        • wolverine876 4 years ago

          Interesting: IME, it's the Americans who are called rude and overly direct. Go to Japan and give it a try, for example.

          Edit: Reading your link: First, that's well-written and insightful; thank you.

          However, it seems like a common (young, if I dare guess) frustration with human communication, especially among geeks (if I dare guess, here on HN, and including myself as one): Communication is not transmission of information, but a social interaction. You have to think about all these other things (where many geeks feel out of their depth), and in fact those other things are more consequential than the information (with which many geeks feel very confident). In other words, it sucks to have all the information, to be a master at it, and find that it doesn't matter so much.

          Tip: Don't try to dismiss it; it's human nature and won't change; learn the skills. 'Skill' #1: learn to not objectify the other party (they aren't an endpoint device in your communication network), and the best tools for that: curiosity about them - about their unique universe in their mind, their own wants and perspectives, completely unrelated to yours - and compassion: they have a difficult life too. (Of course, that's just my perspective! :) )

        • tailspin2019 4 years ago

          > a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational.

          I think this can be said of the British too. Though we would probably make the mistake of interpreting it as rude rather than aggressive. As someone who doesn't communicate particularly directly, I often make this mistake myself.

          Though I'm not sure which side of "the pond" is worse in this respect.

        • rexreed 4 years ago

          It's possible it's cultural. Sometimes strong, absolute rebuttals come across as someone just trying to shut down a conversation and deny. Other times a direct answer is just a direct answer. The problem is that context and tone are helpful here.

        • freedomben 4 years ago

          As a native, I can say this is absolutely true and also horrible for productive communication. The Pacific and Mountain West in particular are really rough.

          Especially when discussing politics, it can be confusing as hell trying to figure out what somebody really believes or wants, because the tiptoeing around eggshells can make the words impossible to decode.

tailspin2019 4 years ago

All my sites running through Cloudflare Tunnel are very very slow, but still just about online.

As a side note, I'll take this opportunity to call out the superb Checkmk monitoring system, which alerted me to this. I don't see Checkmk mentioned on HN that often...

https://checkmk.com

EDIT: Seems to be fixed. Good job!

  • marcolussetti 4 years ago

    Checkmk offers pretty solid monitoring out of the box, but I think it falls quite short of the mark when you want to add more than the default monitoring. It's still probably the easiest solution for monitoring a bunch of basic VMs, servers, etc. You can set it up and get a solid idea of how it works in only a few hours.

  • donmcronald 4 years ago

    I don’t think I’d use it even if they have a self-hosted version. The way they set their pricing based on “number of services” seems like the kind of tactic you use when you want to intentionally make things confusing so you can extract as much value as possible.

    What happened to honest businesses with fair, easy to understand pricing?

    • tailspin2019 4 years ago

      > What happened to honest businesses with fair, easy to understand pricing?

      Well in my case the pricing is very easy to understand. It's free!

      I only have < 25 hosts so I self-host the open source version on a $5/month DigitalOcean instance (ironically also reverse proxied through Cloudflare)

      So I certainly don't think that's exactly dishonest or unfair. It's been rock solid since I've used it. I don't know how many services you'd need to monitor but the starting prices for Standard and Enterprise seem pretty reasonable to me?

      It probably doesn't scale to a very large operation - but then it's not really "cloud first" monitoring akin to something like Prometheus, so perhaps their target audience isn't really likely to have a huge number of services to monitor.

yRetsyM 4 years ago

I'm impressed they've updated their status page so quickly, unlike some of their cloud competitors

  • jgrahamc 4 years ago

    It's part of our process. Here's the internal timeline:

        T+0 Automatic comms thread created
        T+1 XXX Is this a P0, do we need a status page?
                @YYY
        T+1 YYY Eyes on
        T+4 ZZZ Yes
                let's get super-generic status page up
                @XXX / @YYY - you have one handy?
                I see it now thx

mrcnkoba 4 years ago

I would love it if their customer service were this fast. They've been ghosting us for 7 days. Poking them on live chat results in "hey, we'll look at this".

Truly loving the service but we had to "unproxy" our website. When it works, it brings so much value. I'm guessing our issue isn't trivial to solve though.

lend000 4 years ago

This impacted just about every API I regularly request for ~20 minutes, but it seems to be fixed now. Hopefully permanently.

Jamie9912 4 years ago

Seems to be a regional thing, because I'm not experiencing any issues with Cloudflare-hosted things, reaching the SYD PoP.

chizhik-pyzhik 4 years ago

CircleCI still very slow/returning 504s, not sure if related?

amar0c 4 years ago

Is this related to Telia (1299), maybe?

spurgu 4 years ago

Yeah seems to apply to all the sites I've tried so far.

gildedage77 4 years ago

Maybe because they're powered by ClickHouse?