New AWS UDP Load Balancing for Network Load Balancer (aws.amazon.com)
This is a Big Deal because it enables support for QUIC, which is now being standardized as HTTP/3.
To work around the TCP head-of-line blocking problem (among others), QUIC uses UDP.
QUIC does some incredible patching over legacy decisions in the TCP and IP stack to make things faster, more reliable (especially on mobile networks), and more secure.
Here’s a great summary from Fastly on what QUIC means for the Internet: https://www.fastly.com/blog/why-fastly-loves-quic-http3
This is big for making services which rely on DNS much easier to roll out in a container environment (ECS, EKS, etc.). Traditionally we've had to create custom AMIs, use CloudFormation to keep them running with EIPs, and then make those EIPs part of the runtime configuration for our services.
One "downside" of AWS is we've rolled a lot of custom solutions like this, at significant time/expense, only to have them be made obsolete by eventual native feature support. So we get left with a mixture of legacy systems using the custom solution and newer ones using native support and it makes things more complicated. It's actually a good problem to have in many ways, and basically unavoidable in many circumstances, but an interesting dynamic nonetheless. Reminds me of interstellar wait calculation[0] - do we defer dependent features until there's native support, or forge ahead knowing there's a likelihood of being 'overtaken'?
[0] https://en.m.wikipedia.org/wiki/Interstellar_travel#Wait_cal...
Another way to look at it: customers like you, who build custom workarounds to some problem, influence our decision that a particular problem is important enough to be solved.
Yup, and that's overwhelmingly a good thing! The one thing I will say is that AWS does tend to lean on this attitude a bit too much, IMO, with a tendency to ignore common sense about what people will inevitably need, thus causing the kind of thrash I described when it could have been avoided. It errs on the right side of delivering vs. waiting, generally, but the balance could stand to be fine-tuned.
Nothing is forcing you to switch from your custom solution to the native support. If you don't switch, you're in the same situation as if the native solution had never been invented.
Nothing except every new hire who complains about having to learn it instead of the native solution.
It's, as the parent said, very much a first-world problem / good problem to have, since it's a situation that only exists if you're on a very productive team.
Can you elaborate a bit on your architecture? I'd love to understand what your use case is.
In most architectures I've seen where containers are involved, the rendezvous point between external clients and containerized services is an external proxy (i.e., a load balancer), and the only DNS lookup required by such clients is of the proxy itself, so no DNS UDP traffic needs to be sent into the cluster. In K8S we call this proxy an "ingress."
Is the situation that you want to expose the cluster's internal DNS to the outside world to avoid having to configure ingress? Or is it something else?
Containers that require custom DNS queries about incoming connections from a non-HTTP service (we're using the NLB for this), using a caching DNS server that isn't publicly accessible.
I could see SRV-record-style load balancing being done on containers, optimizing that layer by cutting out a hop.
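A rough sketch of what that client-side SRV lookup might look like, assuming dnspython is available and using a made-up service name:

```python
# Rough sketch: resolve an SRV record and pick a backend by weight.
# Assumes dnspython is installed; the service name below is made up.
import random

import dns.resolver


def pick_backend(service="_flows._udp.internal.example.com"):
    answers = dns.resolver.resolve(service, "SRV")
    records = list(answers)
    # Ignore SRV priority for brevity; weight the random choice instead.
    weights = [max(r.weight, 1) for r in records]
    chosen = random.choices(records, weights=weights, k=1)[0]
    return str(chosen.target).rstrip("."), chosen.port


host, port = pick_backend()
print(f"sending UDP traffic to {host}:{port}")
```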
I wonder if the AWS Route53 VPC resolvers work in that same way for internal VPC DNS resolution.
Related - has anyone done much with UDP load balancing on prem?
We're starting to hit performance and HA walls with ingesting Netflows from edge routers - you can only nominate one target, and using Elasticsearch / Logstash there are some hard limits.
Would AWS be appropriating nginx under the hood here?
Lots of people use IPVS, but the more efficient modes don't work on AWS. That's generally why most people who need a LOT of traffic use a cloud provider for regular servers and their own servers in a colo for the heavy stuff.
With how Amazon likes to use OSS in their services, I'm pretty sure their UDP load balancer is in fact just using IPVS.
NLB is built on top of AWS HyperPlane, a hybrid system that has components distributed in our Nitro security system, and pseudo-central components that keep flow-tracking state. It's different from IPVS.
Interesting, thanks. Hadn't considered this option before, and will do some more exploring, though I note on the IPVS page they say:
"For scheduling UDP datagrams, IPVS load balancer records UDP datagram scheduling with configurable timeout, and the default UDP timeout is 300 seconds. Before UDP connection timeouts, all UDP datagrams from the same socket (protocol, ip address and port) will be directed to the same server."
I'm hopeful / confident that affinity can be fully de-tuned here, as we're looking at around 5-10k UDP Netflows per second from a given router that need to be distributed to a set of receivers.
I may be wrong, but I think you can tell IPVS to schedule using only a tuple hash when using Direct Return mode, which means no stored state for connection tracking.
Edit: that doesn't appear to be true, but it uses its own "lightweight" connection tracking table, so you can unload the conntrack modules from the kernel.
Realistically, IPVS can probably route 40 gigabits of traffic per instance. Combine that with DNS round robin, and maybe even multi-homing at the front, and you could handle basically anything.
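For anyone curious what that setup looks like in practice, here's a rough sketch of configuring an IPVS UDP virtual service for Netflow fan-out; the addresses, ports, and timeout values are placeholders, and it just shells out to ipvsadm:

```python
# Rough sketch: configure an IPVS UDP virtual service for Netflow fan-out.
# Run as root on the load balancer host; IPs and ports here are placeholders.
import subprocess

VIP = "10.0.0.10:2055"                       # address the routers export flows to
RECEIVERS = ["10.0.1.11:2055", "10.0.1.12:2055", "10.0.1.13:2055"]


def ipvs(*args):
    subprocess.run(["ipvsadm", *args], check=True)


# Shorten the connection timeouts (tcp, tcpfin, udp) so UDP flow affinity
# expires quickly instead of sticking around for the default 300 seconds.
ipvs("--set", "900", "120", "10")

# Create the UDP virtual service with round-robin scheduling.
ipvs("-A", "-u", VIP, "-s", "rr")

# Add each receiver as a real server, using direct routing (-g).
for rip in RECEIVERS:
    ipvs("-a", "-u", VIP, "-r", rip, "-g")
```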
Nice! I wonder if this is a preparatory step for future QUIC/HTTP3 support?
That's great! Any idea what load balancing algorithm this would use?
We have a need for some stickiness in the load balancer (for example: UDP packets from a source must be routed to the same instance, at least for a short while).
It's documented as:
> For UDP traffic, the load balancer selects a target using a flow hash algorithm based on the protocol, source IP address, source port, destination IP address, and destination port. A UDP flow has the same source and destination, so it is consistently routed to a single target throughout its lifetime. Different UDP flows have different sources, so they can be routed to different targets.
From the NLB docs at https://docs.aws.amazon.com/elasticloadbalancing/latest/netw...
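For illustration, a toy version of that flow-hash selection might look like this (not AWS's actual implementation, just the idea from the docs; target addresses are made up):

```python
# Toy illustration of 5-tuple flow hashing: every datagram in the same UDP
# flow hashes to the same target, so the flow stays pinned to one backend.
import hashlib

TARGETS = ["10.0.2.21", "10.0.2.22", "10.0.2.23"]  # placeholder backends


def pick_target(proto, src_ip, src_port, dst_ip, dst_port):
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return TARGETS[digest % len(TARGETS)]


# Same flow -> same target; a different source port is a new flow.
print(pick_target("udp", "203.0.113.7", 40000, "10.0.0.10", 53))
print(pick_target("udp", "203.0.113.7", 40001, "10.0.0.10", 53))
```

Which should cover the stickiness question above: as long as the client keeps the same source address and port, its packets keep landing on the same target.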
This is great news, and something I’ve been requesting for years. I manage an IoT backend based on CoAP, which is typically UDP-based. I’ve looked at Nginx support for UDP, but a managed load balancer is much more appealing.
Same story here; getting NGINX to help, even with the highest support tiers, was a PIA too.
Apparently, if the target is the instance ID, this can preserve the public source IP and port. That can be a big deal for, e.g., bootstrap nodes for P2P networks.
Can be nice for games, QUIC and DNSCrypt.
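If that's right, a plain UDP server behind the NLB should see the real client address straight off the socket; a toy sketch (port and reply are arbitrary):

```python
# Toy UDP server behind the NLB: with instance targets and preserved source
# addresses, `addr` below should be the actual client, not the load balancer.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9000))  # arbitrary port for illustration

while True:
    data, addr = sock.recvfrom(2048)
    print(f"{len(data)} bytes from client {addr[0]}:{addr[1]}")
    sock.sendto(b"ack", addr)  # reply to the client address
```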
Curious: how does one generally load balance UDP? Drop packets? Slow them down?
It means taking a set of packets sent to one address and spreading them across multiple servers to share out the load.
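As a concrete (and very simplified) sketch of the idea, hashing each sender to one of several backends and forwarding the datagram:

```python
# Minimal one-way sketch of the idea: hash each sender to one of several
# backends and forward the datagram. Real load balancers also handle return
# traffic, health checks, and flow timeouts. Backend addresses are made up.
import socket

BACKENDS = [("10.0.3.31", 5000), ("10.0.3.32", 5000)]

front = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
front.bind(("0.0.0.0", 5000))
out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

while True:
    data, client = front.recvfrom(65535)
    # Same client -> same backend (within this process run).
    backend = BACKENDS[hash(client) % len(BACKENDS)]
    out.sendto(data, backend)
```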
oh, geez. Thank you. Somehow I was thinking about throttling, not load balancing.
A plug for our (Cloudflare's) product — we support managed load balancing for UDP as well.
- https://blog.cloudflare.com/spectrum-for-udp-ddos-protection...
- https://blog.cloudflare.com/introducing-spectrum-with-load-b...
Looks cool but if the product is only available for "Enterprise" customers and the pricing is "Request Quote" that means it's expensive. At least the AWS pricing is published.
Sweet. Now add support for multiple ports on a single service[1] and this load balancer might actually become useful.
ALB !== NLB
With NLB targeting EC2 you can only specify one port per target group. To achieve multiple ports going to a single instance (or autoscaling group), you need one listener and target group per port.
It affects both (and that ticket is about both, if you go through the comments).
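To make the per-port workaround above concrete, here's a rough boto3 sketch of the one-listener-plus-target-group-per-port pattern; the ARNs, IDs, and ports are placeholders:

```python
# Rough sketch: one UDP listener + target group per port on a single NLB.
# All identifiers below are placeholders.
import boto3

elbv2 = boto3.client("elbv2")

NLB_ARN = "arn:aws:elasticloadbalancing:...:loadbalancer/net/my-nlb/..."
VPC_ID = "vpc-0123456789abcdef0"
INSTANCE_ID = "i-0123456789abcdef0"

for port in (5000, 5001, 5002):
    # One target group per port, since a target group only carries one port.
    tg = elbv2.create_target_group(
        Name=f"my-service-udp-{port}",
        Protocol="UDP",
        Port=port,
        VpcId=VPC_ID,
        TargetType="instance",
    )["TargetGroups"][0]

    elbv2.register_targets(
        TargetGroupArn=tg["TargetGroupArn"],
        Targets=[{"Id": INSTANCE_ID, "Port": port}],
    )

    # One listener per port, forwarding to the matching target group.
    elbv2.create_listener(
        LoadBalancerArn=NLB_ARN,
        Protocol="UDP",
        Port=port,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
    )
```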