Ask HN: How long before we get "coding agent in a box"?

2 points by heavyarms 2 months ago · 6 comments · 2 min read

I've been using Claude Code since the early beta days and, since the 4.5 Sonnet release, it's changed my workflow a lot. At least in my view, the current iteration of frontier coding agents are good enough to automate a lot of rote software development tasks and are worth the money... if put in the hands of capable developers who know how to use them. But giving unrestricted access to all of your developers to something like Claude Code is also signing yourself up for huge variability in OpEx budgets.

I understand the current hardware limitations and that you can't just put a frontier LLM in a black box and hook it up to your existing MBP via USB-C. In my estimation, something like a Apple Mac Studio M3 (256gb or more of unified memory) is maybe one possible option ($7,500 - $10,000) for running a 405b open weights model... but it wouldn't be very fast. And it wouldn't come close to the level of quality or workflow of Claude Code.

To really run a current frontier LLM locally with something like >30 tokens per second would probably require four A100s.. add in NVLink bridges, expensive cooling, 256GB RAM, a cool case with LED lights (optional) and we're talking about ~$60,000? $80,000?

So my question is: How many generations—or what specific architectural shifts (specialized ASICs, better quantization, etc.)—do we need before we can buy a dedicated co-processor box that sits on a desk and runs a Sonnet-level agent at viable speeds... at a price point where it makes sense vs. spending $500-$2,000 per month per developer on API fees? In my opinion that "makes sense to me, here's the credit card" price point might be $10,000 right now, but I could be wrong.

And related question: Who will do this? Anthropic could probably make a killing right now IF they had could sell "Claude Code in a box for $10,000" but would they ever want to? It would be cannibalizing the majority of their business. But Apple might do this. And it might only be one or two generations of hardware upgrades away. They just need the "frontier LLM" to stick into the box.

PaulHoule 2 months ago

I think it's the memory which is the bottleneck more so than processing speed and the obvious levers to push on are:

- more memory efficient models

- a whole system approach to getting better performance out of a less capable model

- more memory

the memory crisis, Micron shutting down a beloved brand that built trust over almost 30 years, and all of that is an economic externalization of memory as the bottleneck.

wmf 2 months ago

Barring some breakthrough... never. The cloud will always be cheaper.

Settings

Ask HN: How long before we get "coding agent in a box"?

Keyboard Shortcuts