A Tale of DNS and BGP: The Facebook Outage, October 2021

riskledger.com

66 points by jamescun 4 years ago · 17 comments

fauria 4 years ago

> Facebook in this case, operates a set of intermediary DNS servers that are responsible for everything between your ISP's recursers and the roots. These are responsible for facebook.com, instagram.com, whatsapp.com and everything else they operate.

This is not the case for instagram.com, which is hosted on a different provider (AWS Route53) and was resolvable during the whole outage.

I'm not sure why Instagram's frontend servers returned 503, though. Maybe their backend fleet was included in the withdrawn prefixes, or maybe it was referenced through the affected domains.

  • 1vuio0pswjnm7 4 years ago

    "I'm not sure why Instagram's frontend servers returned 503, though."

    One explanation is that Facebook uses a proxy configuration that requires DNS in order to resolve the internal IP addresses for the backend servers. High availability proxy servers like haproxy can easily use files loaded into memory to do lookups, instead of making DNS requests. Apparently Facebook had no backup plan if the DNS method started failing. Facebook remained down until their DNS servers became available. The proxies continued to work and no doubt the backend servers were available the entire time, but the proxies could not connect to them because the DNS lookups for their internal IP addresses (serv)failed. After the retried DNS queries finally time out, a 503 is returned.

    "Maybe their backend fleet was included in the withdrawn prefixes..."

    According to Cloudflare's writeup the only prefixes withdrawn were for DNS servers.
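The fallback idea in the comment above can be sketched roughly as follows. This is only an illustration of "try DNS, then fall back to an in-memory table, else return 503"; it is not how Facebook's or haproxy's proxies are actually implemented. The hostname, the IP address, and the names `STATIC_BACKENDS`, `resolve_backend`, and `handle_request` are all hypothetical.

```python
import socket

# Hypothetical static table, loaded into memory at startup (assumption:
# the operator maintains such a file alongside the DNS records).
STATIC_BACKENDS = {"backend.internal.example": "10.0.0.12"}


def resolve_backend(hostname, dns_lookup=socket.gethostbyname,
                    static_map=STATIC_BACKENDS):
    """Return (ip, source). Try DNS first, then the in-memory table."""
    try:
        return dns_lookup(hostname), "dns"
    except OSError:  # socket.gaierror is a subclass of OSError
        ip = static_map.get(hostname)
        if ip is None:
            return None, "unresolved"
        return ip, "static"


def handle_request(hostname, **kwargs):
    """Return an (HTTP status, message) pair for a proxied request."""
    ip, source = resolve_backend(hostname, **kwargs)
    if ip is None:
        # No address from DNS and no static entry: the proxy is up,
        # but it cannot locate a backend, so it answers 503.
        return 503, "backend address unavailable"
    return 200, f"would proxy to {ip} (resolved via {source})"
```

Without the static table, every request would take the 503 branch for as long as DNS lookups fail, which matches the behaviour described in the comment.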

    • 1vuio0pswjnm7 4 years ago

      Another possibility is that failing to announce the prefixes for their DNS server IPs was just a symptom of a larger problem, like misconfigured routers.
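To make "withdrawn prefixes" concrete, here is a toy routing table with longest-prefix matching. It is a simplified sketch, not a BGP implementation: the `RoutingTable` class and the next-hop labels are hypothetical, and the prefixes come from the reserved documentation ranges rather than from Facebook's real address space.

```python
import ipaddress


class RoutingTable:
    """Toy table mapping announced prefixes to a next-hop label."""

    def __init__(self):
        self.routes = {}

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        # Withdrawing a prefix that was never announced is a no-op.
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the next hop for the most specific matching prefix."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None  # no route: the destination is unreachable
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", "peer-A")     # stands in for the DNS servers
table.announce("198.51.100.0/24", "peer-B")  # stands in for other services

table.withdraw("192.0.2.0/24")
dns_route = table.lookup("192.0.2.53")     # None: DNS servers unreachable
other_route = table.lookup("198.51.100.7")  # still "peer-B"
```

In this model only the withdrawn prefix loses reachability; everything else keeps its route, which is why a DNS-only withdrawal can still take down services that depend on those DNS servers.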

  • jvolkman 4 years ago

    Kind of funny that instagram.com uses Route53, but amazon.com does not.

shric 4 years ago

> No two devices on the internet are directly connected.

I get the need for brevity and simplicity in a post like this, but is there really a need for obviously false statements?

  • xapata 4 years ago

    You can't get there from here.

  • thehappypm 4 years ago

    What’s a counterexample?

    • shric 4 years ago

      My router and my desktop are one of several billion counterexamples.

      • thehappypm 4 years ago

        Your router and desktop are not an example of an internet connection. That's an intranet connection. I think that's what they mean -- for two devices to be connected on the internet there are always (at least) routers in between.

        • shric 4 years ago

          I can infer what they mean, but that doesn't make it a correct statement. Maybe I'm being pedantic, but routers are devices too, and I have computers with multiple NICs that act as routers as well as servers. Intranet vs internet is an arbitrary distinction. If a "device" has an IP address that's reachable from "the internet" then it's on the internet, regardless.

          • thehappypm 4 years ago

            The article's point is that getting information from Device A to Device B across the internet never involves a straight link from Device A to Device B; there are always middlemen whose purpose is just to forward the data along. There's always something between the end nodes.

nazgulsenpai 4 years ago

This page makes Brave think it's unavailable and offer an archived version, lol.

cryptodan 4 years ago

This was likely an inside job. This outage prevented employees from entering their office buildings.
