543 Hours: What happens when AI runs while you sleep
michael.roth.rocks
This would be a lot more convincing if it were written by a human. The omission of showing what actually "shipped" says more than all the words in the writeup. It demonstrates a profound misunderstanding of what AI critics are saying and what evidence would change their mind.
It really doesn't take much: "I used AI to make X. You can find it at https://whatever." Show people the actual results.
I'm the author of that post. Thank you for your feedback.
The production code is proprietary work for clients, so I can't link to it directly. But the tooling I built to support the pipeline is open source: the log analyzer that computed these statistics.
There are a couple of other in-flight projects I will open source soon, created by this process, but they aren't out yet.
The research page is about the methodology because that's what generalizes. The specific microservices I ship are just microservices.
Thank you for being so graceful. I regret being too harsh in my criticism. I'm sorry.
I've tried reading this and I can't. It's not that the text is AI generated, it's that the whole structure seems to be. (Hope you appreciate the irony of my LLMisms). It's not human-parseable, at least not by this human. And it's not that my attention is shot, luckily I'm still able to read copious amounts of long-form text and analysis.
Also, opening with "I'm a top performer"... That's not how writing for other humans works. It's perfectly legitimate to establish authority in the opening of a piece, but you have to show some credible proof. "I'm a top performer" is immediately off-putting.
Thank you for your feedback. These are fair points.
I get that "top performer" is off-putting. You're right that authority has to be earned in the text (and I hope I do that), not declared.
On the structure: yes, it's a novel format and I can see how that would be hard to parse. It won't work for everyone.
Both of these are artifacts of trying to blend research into the modern social-media driven world.
Based on the numbers from the article, this person is writing a prompt every 3 minutes, all day long, every day.
This is just nonsense. The whole thing looks like the fever dream of someone in a severe manic episode. Even the formatting and writing style of the blog have a manic feeling. Hard to tell if that’s coming from the user or the AI.
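For what it's worth, that rate can be back-computed from the article's own 543-hour figure. The "~10,860 prompts" total below is an inference from the one-prompt-per-3-minutes claim in this thread, not a number published in the post:

```python
# Assumed inputs: 543 hours is the article's figure; one prompt
# every 3 minutes is this comment's inference, not a published stat.
hours = 543
minutes_per_prompt = 3

prompts = hours * 60 // minutes_per_prompt
print(prompts)  # 10860 prompts over the full run, if the rate held
```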
I’d like to know how many users all this “shipped code” has.
>35 years building event-driven distributed systems.
Also this guy was not building event-driven distributed systems in 1991.
Hi, I'm the original author and I can clarify a few things.
The 543 hours are the agent compute hours, not me at the keyboard. The pipeline runs autonomously, the agents execute in parallel, and the gates verify the output. Most of the prompts are agent-to-agent, not human-to-agent.
On the timeline: I have a BSCS (1995) and MSCS (1997) with a specialty in distributed systems. I actually worked my way through school doing this work so I didn't need loans. Let's call it almost 35 years.
The terminology has evolved but the architecture hasn't changed as much as people think.
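To make the "parallel agents plus verification gates" shape concrete: here is a minimal sketch of that architecture, with entirely hypothetical names and a placeholder check, since the real pipeline isn't public.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the pipeline's pieces; the actual
# agents, gates, and task format are proprietary and not shown here.
def run_agent(task: str) -> tuple[str, str]:
    """Pretend agent: does the work and returns (task, output)."""
    return task, f"output for {task}"

def gate(output: str) -> bool:
    """Verification gate: only output that passes the check ships."""
    return "output" in output  # placeholder for real verification

def run_pipeline(tasks: list[str]) -> tuple[list[str], list[str]]:
    shipped, rejected = [], []
    # Agents execute in parallel; each result passes through the gate.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for task, output in pool.map(run_agent, tasks):
            (shipped if gate(output) else rejected).append(task)
    return shipped, rejected

shipped, rejected = run_pipeline(["task-1", "task-2", "task-3"])
```

The point of the sketch is only the control flow: humans prompt the pipeline, agents talk to agents in parallel, and a gate decides what counts as done.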
> Most of the prompts are agent-to-agent, not human-to-agent.
I can’t even begin to parse any of this if that’s the case.
> Let's call it almost 35 years.
I was hooking up TI-83s to each other when I was 12, so I guess I’ll tell people I’ve been building distributed systems for 30 years.
I’m going to bet that you didn’t have “building event driven distributed systems for 15 years” on your resume in 2006.
> High performers have built infrastructure that makes AI effective
And yet these "high performers" ship nothing but thousands of words of how AI makes them performant, or hundreds of thousands of the worst quality slop you can imagine (see Garry Tan's GStack, Steve Yegge's Gastown etc.)
> 650 work arcs clustered into distinct types.
And the result of these arcs are?
At this point I lost all interest in the navel-gazing, AI-generated or AI-corrected verbiage that can rival Yegge's, and I have no idea what he spent all those 543 autonomous hours actually doing.
I addressed this in my reply to kelseyfrog above. The short version: the production work is proprietary, the tooling I used to do the analysis is open source.
Yeah, it's "my code lives in another (gas) town, you wouldn't know her". Same for some undisclosed open-source projects.
I can't imagine letting any LLMs do 500+ hours of autonomous work on any code at my company, or even for my own project (hundreds of thousands of lines of unreviewable slop? no thank you). Especially for the amount of features you claim they implement from scratch.
I also don't believe anything about "2 agents running for 12 hours" given how fast they exhaust context, become extremely stupid, and completely ignore most of previous work on subsequent runs, and will happily ignore any explicit instructions. Despite any "guardrails".
Funnily enough, literally right now in my current session Claude has "forgotten" most of the instructions from its global memory *and* its local CLAUDE.md.