What is BGP? – BGP routing explained
There is already a nice writeup on the current incident from Cloudflare at https://blog.cloudflare.com/october-2021-facebook-outage/
The key observations:
"Due to Facebook stopping announcing their DNS prefix routes through BGP, our and everyone else's DNS resolvers had no way to connect to their nameservers. Consequently, 1.1.1.1, 8.8.8.8, and other major public DNS resolvers started issuing (and caching) SERVFAIL responses.
But that's not all. Now human behavior and application logic kicks in and causes another exponential effect. A tsunami of additional DNS traffic follows.
This happened in part because apps won't accept an error for an answer and start retrying, sometimes aggressively, and in part because end-users also won't take an error for an answer and start reloading the pages, or killing and relaunching their apps, sometimes also aggressively."
> apps won't accept an error for an answer and start retrying, sometimes aggressively
I'm certainly guilty of this. Retries make the world go round, and round again. I've been given attitude by teams that own downstream services.
Them: "Why are you retrying so aggressively?" Me: "Why is your service so damn flakey?"
> Retries make the world go round, and round again.
Depends on the rate I would think:
Surely they are upstream from you if they need to rate limit you?
(And that sounds like you giving, rather than being given, attitude.)
I don't reload often, but when I do, I do it rapidly and in anger.
-some tester I know
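For anyone wondering what "less aggressive" retrying might look like in practice: one common compromise is capped exponential backoff with jitter. This is only a minimal sketch, not anything taken from the Cloudflare writeup. The function names `backoff_delays` and `call_with_retries` and the default values are made up for illustration.

```python
import random
import time


def backoff_delays(retries, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay (seconds) per retry: capped exponential backoff, full jitter."""
    for attempt in range(retries):
        ceiling = min(cap, base * 2 ** attempt)
        # Full jitter: pick uniformly in [0, ceiling) so clients do not retry in lockstep.
        yield rng() * ceiling


def call_with_retries(operation, retries=4, sleep=time.sleep, **backoff_kwargs):
    """Call operation(); on failure, wait and retry up to `retries` more times."""
    delays = backoff_delays(retries, **backoff_kwargs)
    while True:
        try:
            return operation()
        except Exception:
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the error instead of hammering
            sleep(delay)
```

The point of the cap and the jitter is that a million clients hitting the same failure spread their retries out over time instead of all coming back at once.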
Cloudflare has a useful tool for measuring if your ISP is using RPKI.[0] For Facebook, this is the latest I could find for their implementation of BGP.[1][2]
[1] https://engineering.fb.com/2021/05/13/data-center-engineerin...
[2] https://www.usenix.org/conference/nsdi21/presentation/abhash...
Was banging on about this with some of the people probably here over 20 years ago. Not sure what this issue with FB was as I'm not on NANOG anymore, but if it's BGP, it's a short list of likely events, as I foggily remember.
- someone big redistributed their static routes for FB into their announcements to peers.
- someone who has mapped peer filters and their prefix lengths has figured out how to announce smaller prefixes for FB routes and have them propagate.
- someone with enable somewhere in one of the major ASNs (like 701 back in my day etc) is doing a straightforward attack on FB.
- someone inside FB messed with load balancing and prepended a bunch of their routes internally and redistributed the long AS paths themselves and just broke shit with internal routing loops.
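To illustrate why the "announce smaller prefixes" scenario in the list above works at all: routers forward on the longest matching prefix, so a more-specific route wins over the covering one if it is accepted. A rough sketch of that lookup, assuming a toy routing table. The `best_route` helper and the next-hop labels are hypothetical, and the addresses are RFC 5737 documentation prefixes, not real Facebook space.

```python
import ipaddress


def best_route(table, destination):
    """Return the (prefix, next_hop) entry with the longest prefix covering destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [
        (net, hop)
        for net, hop in table
        if net.version == addr.version and addr in net
    ]
    if not candidates:
        return None
    # Longest-prefix match: the most specific covering route wins.
    return max(candidates, key=lambda entry: entry[0].prefixlen)


# Toy table: a legitimate /24 and a more-specific /25 from someone else.
table = [
    (ipaddress.ip_network("203.0.113.0/24"), "legitimate-origin"),
    (ipaddress.ip_network("203.0.113.128/25"), "more-specific-announcer"),
]

print(best_route(table, "203.0.113.200"))  # covered by both, the /25 wins
print(best_route(table, "203.0.113.5"))    # only the /24 covers this one
```

This is only the forwarding side. Whether the more-specific announcement propagates in the first place depends on the peer filters mentioned above.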
I have no idea how people unbefunge routing problems now that you have to coordinate multiple teams on the phone to get anything done instead of just one router guru just logging into everything and fixing it. I would be useless at it now, but this is not a recent problem. If it's still a problem, it will always be a problem.
> While there have been a number of ambitious proposals intended to make BGP more secure, these are hard to implement because they would require every autonomous system to simultaneously update their behavior. Since this would require the coordination of hundreds of thousands of organizations and potentially result in a temporary takedown of the entire Internet, it seems unlikely that any of these major proposals will be put into place anytime soon.
Excellent. Just what I like to hear /s
It's a Cloudflare lie. Probably for business reasons. One of the solutions that does not require every AS to simultaneously update their (irresponsible) behavior is RPKI. https://www.ripe.net/manage-ips-and-asns/resource-management...
I am not sure why we say that in that way. Have raised internally. We are big fans of RPKI. See https://isbgpsafeyet.com/.
Thank you a lot for doing so! <3
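Since RPKI came up: the reason it can be rolled out one AS at a time is that each network can check announcements against signed ROAs on its own. Here is a simplified sketch of the origin-validation outcome (valid / invalid / not found) as I understand it from RFC 6811. The `ROA` tuple and `validate_origin` function are illustrative names, the prefixes and AS numbers are documentation values, and real validators also handle things this skips (signature checking, AS 0, and so on).

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    prefix: ipaddress.IPv4Network  # address block the holder authorised
    max_length: int                # longest prefix length allowed inside it
    asn: int                       # AS authorised to originate it


def validate_origin(roas, announced_prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(announced_prefix)
    covered = False
    for roa in roas:
        if roa.prefix.version != route.version or not route.subnet_of(roa.prefix):
            continue  # this ROA says nothing about the announced prefix
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: invalid. No covering ROA: not found.
    return "invalid" if covered else "not-found"


roas = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64496)]

print(validate_origin(roas, "192.0.2.0/24", 64496))     # right AS, right length
print(validate_origin(roas, "192.0.2.0/25", 64496))     # more specific than max_length
print(validate_origin(roas, "192.0.2.0/24", 64497))     # wrong origin AS
print(validate_origin(roas, "198.51.100.0/24", 64496))  # no ROA covers it
```

A network that drops the "invalid" results protects itself and its customers even if nobody else has deployed anything yet.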
You might be entertained to know that this is exactly what happened when 'the net' switched from NCP to TCP/IP -- there was a 'flag day' and poof! we were henceforth on TCP. So, it can be (successfully) done.
The diversity of the stakeholders was arguably _much_ lower (mostly US education and defense) when this was done.
I have a hunch that the "How BGP can break the Internet" will get updated in the near future :^)
Why can’t they at least start to inform who is advertising what? After, say, 1 year we would have most if not all … Gradually we can build a grey BGP, not all white, but at least in case if some … wonder. Or any other option. Total trust is so untrustworthy.
I recall from networking classes that messing around with BGP can be bad. Very bad.
How does one go about setting up an autonomous system? Seems like a shadowy world based on the impact they could potentially have.