Ask HN: How can I monetize a load balancer for ML applications?
I have a solution to a set of problems that keep showing up in ML workloads. The kind of systems I'm talking about are ones where:
- You have a GPU attached to each instance.
- Each request takes anywhere from 10ms to 2min.
- There's a hard limit on the number of in-flight requests/queries (I assume because of the GPUs).
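To make the third constraint concrete, here's a minimal sketch (Python, with the limit value and class name hypothetical) of the kind of backend-side admission control these applications enforce: a full instance fails fast rather than accepting work the GPU can't start.

```python
import threading

MAX_INFLIGHT = 4  # hypothetical hard limit, e.g. bounded by GPU memory/batch slots


class AdmissionGate:
    """Hard cap on concurrent requests; a full backend rejects immediately."""

    def __init__(self, limit: int):
        self._slots = threading.Semaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: when all slots are taken, the request is rejected
        # (e.g. an HTTP 503) instead of waiting on a GPU that can't start it.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


gate = AdmissionGate(MAX_INFLIGHT)
results = [gate.try_acquire() for _ in range(MAX_INFLIGHT + 2)]
# the first MAX_INFLIGHT acquisitions succeed; the remaining two fail fast
```

The load balancer's job is to route so that `try_acquire` never returns False in the first place.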
Normally, I see people fronting the instances with software load balancers, but this doesn't work very well for reasons. Assuming I have a solution in the form of a fancy load balancer, how would I go about monetizing it? Let's assume the solution is non-trivial to create, but very straightforward to use (essentially a drop-in replacement).
I ask because I don't think I can just "sell a fancy load balancer" like it's the late 90s or something. Modern companies appear to always have more complicated products and I just want to sell a straightforward piece of infrastructure that solves a fairly hard problem.
Thanks in advance.

---

Is there accessible documentation that covers installation and non-functional requirements (i.e. hardware/software requirements and how to set up and use the solution)?

---

I can write these things, and I assume they'll be necessary for anything someone pays for, but I'm mainly asking so I can learn how people would monetize this kind of thing (if it's even possible to) nowadays.

---

Ok, backing up a few steps. You can look at what's been done in the past with licensing/usage fees tied to computational power, i.e. statistical/mathematical, operating-system, and database packages/software. Modern cloud processing ties fees/payment to scale/balancing per-use requirements (not necessarily tied specifically to GPUs).

What was done to assess "doesn't work very well for reasons"? I.e. did you monitor the systems in question and see... what? What were the "reasons" for "doesn't work very well"? (Trying to do Google-search-type work on a 2MB Intel 486 over a 2Mb network and expecting to compete with Google is never going to work out.)

What type of load balancing? Load balancing typically has to be tuned/adjusted for the end-usage requirements and production environment (not just left at factory settings).

---

From what I've seen, there's usually a hard requirement that a single instance of the application can handle N in-flight requests. This is fine when you have a single load balancer that uses some flavor of "least request" or "least connection" balancing algorithm. The problem arises when you have a fleet of load balancers: they don't share state, so each LB can't know the state of any individual backend without sending a request. This results in a failed request if the backend is already at full capacity.

Your tone is coming off as condescending and I'm not sure if it's intentional. I guess I should mention that I'm _very_ familiar with how load balancing works. This is a real problem, and you can't "tune" your way into a consistent global view of the backend states.

---

But you do have control over how the data parcels causing the load are partitioned/prepped before being parceled out (which requires knowledge of resource metrics/topology: distributed resources within a computing box vs. distributed external computing resources). Pre-modern computing, load balancing was a telecommunications-field thing. Cloud computing is the modern "load balancing" take.

---

I'm familiar with the "modern load balancing" take. Modern load balancers, like you'd find in any cloud vendor or in OSS projects like Envoy/nginx/HAProxy, learn about the endpoints they're balancing across via some service-discovery mechanism. DNS is usually what you find, but there are other ways, like Envoy's xDS mechanism. Generalizing this as "cloud computing" glosses over the fact that there's still a fleet of load balancers somewhere. When you use an NLB or ALB in AWS, there are many machines behind the scenes and a very complex control plane providing those machines with the information they need to balance load. The problem I'm talking about still exists there. I _know_ they have this problem because I'm familiar with how those systems are built and what shortcomings they have.

---

> but this doesn't work very well for reasons.

Which reasons? In my experience/exposure, people are perfectly happy with Proxmox on big GPU-laden boxen.

---

I didn't mean to imply that this is a problem with arbitrary machines that have GPUs. There are specific kinds of applications that use GPUs and, because of how they work, have hard limits on the number of tasks they can concurrently process on those GPUs. Either way, I feel like these details are orthogonal to my original question. Do you think it matters?

---

Why don't software load balancers work? Also, surely you just implement queuing?
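The fleet-of-balancers failure mode described above (each LB can only count the traffic *it* routed) can be shown with a toy simulation; all the numbers are made up, and the burst is assumed shorter than the service time so no request completes mid-run:

```python
import random

CAPACITY = 4       # hypothetical hard per-backend in-flight limit
NUM_LBS = 3        # independent load balancers with no shared state
NUM_BACKENDS = 2

# Each LB only counts the requests it has routed itself, so its
# "least request" view undercounts load added by the other LBs.
local_counts = [[0] * NUM_BACKENDS for _ in range(NUM_LBS)]
actual_inflight = [0] * NUM_BACKENDS   # ground truth at the backends
rejected = 0

# A burst shorter than the service time: nothing completes meanwhile.
for _ in range(NUM_LBS * CAPACITY * NUM_BACKENDS):
    lb = random.randrange(NUM_LBS)
    # "least request" using only this balancer's local view
    backend = min(range(NUM_BACKENDS), key=lambda b: local_counts[lb][b])
    local_counts[lb][backend] += 1
    if actual_inflight[backend] >= CAPACITY:
        rejected += 1                  # backend already full: request fails
    else:
        actual_inflight[backend] += 1
```

A single balancer with accurate counts would admit exactly `NUM_BACKENDS * CAPACITY` requests and could shed the rest up front; here the balancers keep routing blind into full backends no matter how each one is tuned, which is the "no consistent global view" point.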