Playing Battleships over BGP (2018)
blog.benjojo.co.uk
> For a protocol that was produced on two napkins in 1989 [...]
I'm not sure I'd want to deal with a protocol that can't be explained on a napkin or two. UTF-8 was designed on a diner placemat:
Author of the post here. Ask me almost anything, I guess(?)
If you could replace BGP globally, instantly, with no problems, what would you replace it with?
(Keeping in mind that replacing BGP is about as hard as replacing SMTP, and thus might not be worth it)
Honestly, the issue with BGP is not the protocol. The issue is one of trust, and that is not a problem a different protocol would instantly fix.
One issue with the internet as a whole is that seemingly simple questions are actually hard. The one that is slowly being fixed with RPKI is "Who actually owns this IP address?" Knowing this, we can build better filters against direct (origin AS != owner AS) hijacks.
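To make the "origin AS != owner AS" check concrete, here is a minimal sketch of route origin validation in the style of RFC 6811. The names `Roa` and `validate_origin` are made up for illustration, and the prefixes and AS numbers come from the documentation ranges. A real validator works from signed ROAs fetched from the RPKI repositories, which this sketch skips entirely.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class Roa:
    """Hypothetical, already-verified ROA: who may originate which prefix."""
    prefix: Network
    max_length: int  # longest prefix length the owner allows
    asn: int         # AS number authorised to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        if roa.prefix.version != route.version:
            continue  # never compare IPv4 with IPv6
        if not route.subnet_of(roa.prefix):
            continue  # this ROA does not cover the announced prefix
        covered = True
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by some ROA but matched by none: likely a hijack or a typo.
    return "invalid" if covered else "not-found"


# Illustrative data only (documentation prefixes and AS numbers).
ROAS = [
    Roa(ipaddress.ip_network("192.0.2.0/24"), 24, 64496),
    Roa(ipaddress.ip_network("203.0.113.0/24"), 24, 64497),
]

print(validate_origin("192.0.2.0/24", 64496, ROAS))     # owner originates it
print(validate_origin("192.0.2.0/24", 64511, ROAS))     # someone else does
print(validate_origin("203.0.113.0/25", 64497, ROAS))   # more specific than allowed
print(validate_origin("198.51.100.0/24", 64496, ROAS))  # no ROA covers it
```

Note that this only answers the ownership question. It says nothing about which networks are allowed to sit in the middle of the path.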
However, the next question, which has no solution yet, is "Who is allowed to carry this route/transit this data?" This is going to be unbelievably hard to solve with certainty. It has been suggested that a PKI solution (BGPSEC) could be deployed. However, you will then hit the next issue.
The BGP table is massive: 1M+ routes, stored on machines with reasonably long lifetimes. It does not help that, in terms of computing power, these machines are generally very slow. A multi-Tbit/s router may only have a 2014-era laptop CPU powering it. So computing anything 1M times quickly is a massive ask, and when links go down, it is reasonable to expect fast recompute/reconvergence times.
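A back-of-envelope sketch of why "anything 1M times" hurts. The 1M route count is from the comment above; the per-route costs are invented placeholders, not measurements of any real router or of any signature scheme.

```python
def full_table_seconds(routes: int, micros_per_route: float) -> float:
    """Time to touch every route once on a single core, in seconds."""
    return routes * micros_per_route / 1_000_000


ROUTES = 1_000_000  # table size quoted in the comment

# Hypothetical per-route costs in microseconds (assumptions, not benchmarks).
for micros in (1, 50, 1000):
    seconds = full_table_seconds(ROUTES, micros)
    print(f"{micros:>5} us per route -> {seconds:g} s for the full table")
```

Whatever the real per-route number is, it gets multiplied by a million every time the whole table has to be recomputed after a link goes down.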
Fixing BGP is not an easy issue. Anyone who tells you otherwise is either fraudulent or does not understand the sheer scale/scope of the issues attached to the protocol.
it is if you relax the constraint that the providers keep the legacy allocations and can advertise whatever the hell they want
Steve Deering had a really nice proposal on geographic addressing that would make pki sufficiently performant by using hierarchical assignments
Have you seen Yggdrasil? It provides an alternate routing idea, among other things.
and keep IPv[4|6]?
IPv9 is where it's at.
no question, just a selfish request for more blog posts please
Considering yesterday's events, how do you test non-live?
Maybe dn42.eu?
> Experiment with routing technology
> Participating in dn42 is primarily useful for learning routing technologies such as BGP, using a reasonably large network (> 1500 AS, > 1700 prefixes).
> Since dn42 is very similar to the Internet, it can be used as a hands-on testing ground for new ideas, or simply to learn real networking stuff that you probably can't do on the Internet (BGP multihoming, transit). The biggest advantage when compared to the Internet: if you break something in the network, you won't have any big network operator yelling angrily at you.
Who said I tested non-live?
The actual beta builds/sanity checks were done just with two VMs peered with each other, but the live internet one was done in one take (and never again, at least by me)
To add on, BGP very much has a "meme" status of being scary and dangerous, where any touching will break YouTube etc. [Mostly perpetuated by infosec circles]
It's really not the 2000s anymore; BGP is mostly safe and filtered. There are still improvements to be made (I've even written on the blog about them), but one person's immense fuck-ups are far less likely to cause issues now that IRR filters and prefix limits exist.
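For readers who have not met the two safeguards mentioned here, a rough sketch of the idea, with heavy simplifications. `accept_announcements` and `PrefixLimitExceeded` are hypothetical names. The "IRR filter" is reduced to an exact-match set of registered prefixes, and the prefix limit counts everything received before filtering. Real routers express both in vendor config and differ in the details.

```python
import ipaddress
from typing import Iterable, List, Set, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class PrefixLimitExceeded(Exception):
    """Raised when a peer sends more prefixes than the configured limit."""


def accept_announcements(
    announced: Iterable[str],
    registered: Set[Network],
    max_prefixes: int,
) -> List[Network]:
    """Apply a prefix limit, then keep only prefixes registered for this peer."""
    accepted: List[Network] = []
    received = 0
    for text in announced:
        received += 1
        if received > max_prefixes:
            # A peer leaking a huge table trips this; a router would
            # typically tear the session down at this point.
            raise PrefixLimitExceeded(f"more than {max_prefixes} prefixes received")
        network = ipaddress.ip_network(text)
        if network in registered:  # simplified IRR-style allow-list
            accepted.append(network)
    return accepted


# Illustrative data only (documentation prefixes).
REGISTERED = {ipaddress.ip_network("192.0.2.0/24")}
print(accept_announcements(["192.0.2.0/24", "198.51.100.0/24"], REGISTERED, 10))
```

The unregistered prefix is dropped quietly, and a peer that suddenly announces far more than expected is cut off instead of being believed.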
> It's really not the 2000s anymore; BGP is mostly safe and filtered. There are still improvements to be made (I've even written on the blog about them), but one person's immense fuck-ups are far less likely to cause issues now that IRR filters and prefix limits exist.
Any non-maliciously designed protocol probably can be used safely, but surely yesterday's events show that it is still eminently possible to use BGP dangerously?
What part of yesterday was showing that it was possible to use BGP dangerously?
If you are certain of this argument, then your master electric switch is dangerous because you could switch off the power to your house.
> If you are certain of this argument, then your master electric switch is dangerous because you could switch off the power to your house.
This seems like a response to an argument I haven't made yet! (All else aside, if you prebut my argument, it allows me not to make that argument.)
Sure, it's possible to do dangerous things with BGP; that alone is not why I say it's possible to use it dangerously. What is dangerous is the fact that a small and apparently innocent change can have such far-reaching consequences—for example, I'll bet there was no serious consideration at Facebook of not being able to open electronic door locks in the case of an apparently innocent BGP update.
I don't consider my master electric switch dangerous because I could switch off the power to my house. I would consider it dangerous if, after switching off the power to my house, I was ejected from my house, and could no longer open the doors of my house to get in and switch the power back on.
>for example, I'll bet there was no serious consideration at Facebook of not being able to open electronic door locks in the case of an apparently innocent BGP update.
If that was actually the case, a lot of heads at FB should roll over this. The logic is simple and obvious, and if the sysadmins and network admins didn't follow this line of thinking then they're overpaid:
1) Our door control system is accessed via a public IP/address, not via an internal/private address.
2) Accessing our public IPs/addresses is dependent on BGP and DNS not getting borked.
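The two premises above can be written down as a tiny dependency graph. This is only a sketch of the commenter's hypothetical, not Facebook's actual architecture; the service names and the `affected_by` helper are invented for illustration.

```python
from typing import Dict, List, Set

# Hypothetical graph encoding premises 1) and 2): service -> what it needs.
DEPENDS_ON: Dict[str, List[str]] = {
    "door_control": ["public_ip_reachability"],
    "public_ip_reachability": ["bgp", "dns"],
    "bgp": [],
    "dns": [],
}


def affected_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively needs `failed`."""
    affected: Set[str] = set()
    changed = True
    while changed:  # repeat until no new service is marked as affected
        changed = False
        for service, deps in depends_on.items():
            if service == failed or service in affected:
                continue
            if any(dep == failed or dep in affected for dep in deps):
                affected.add(service)
                changed = True
    return affected


print(sorted(affected_by("bgp", DEPENDS_ON)))
```

Under those two premises, a BGP failure reaches the door controller in two hops, which is the whole argument.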
> If that was actually the case, a lot of heads at FB should roll over this. The logic is simple and obvious, and if the sysadmins and network admins didn't follow this line of thinking then they're overpaid:
You say "if" as if it's a conditional, but surely the fact that it happened proves that no-one considered it (or, I suppose, that whoever did consider it didn't have enough sway to stop it from happening). There are, rightly, so many laws and regulations requiring that safe egress in case of emergency not be prevented, and I can't imagine anyone actually considering and tolerating even the slightest risk of an Internet issue preventing that egress. Well, I guess I can imagine lots of people doing lots of awful and harmful things, but I can't imagine anyone doing it in such a way that it would be this easy to get caught doing something blatantly illegal.
(Or were there safety measures in place that allowed egress, just not entry? I don't know the specifics, since my source is just the news stories that mention that the door locks didn't work and people couldn't get in—but maybe they could still get out?)
What happened yesterday was (or appears to be) Facebook screwing up their own routing and DNS, not anyone else's. They didn't take down routing for any IPs and domains they didn't own. I can't imagine any other protocol making a mistake like FB's impossible.
So how should these problems be mitigated? Have separate infrastructure for critical services or staging BGP or what?
It seems that the main problem Facebook had in restoring its devices was the lack of a completely separate out-of-band management network.
If my network (way smaller than FB's, and with a way lower budget) goes down, I can get in via another ISP and WireGuard into the OOB network, which is completely separate from the in-band management.
Not every access switch is on OOB, but the core ones and a few critical devices are.
This was very cool and also IMO very irresponsible.
So this is why Facebook went down, eh? ;)