Eight more months of agents
crawshaw.io> I deeply appreciate hand-tool carpentry and mastery of the art, but people need houses and framing teams should obviously have skillsaws.
Where are all the new houses? I admit I am not a bleeding edge seeker when it comes to software consumption, but surely a 10x increase in the industry output would be noticeable to anyone?
This weekend I tried what I'd call a medium scale agentic coding project[0], following what Anthropic demonstrated last week autonomously building a C compiler [1]. Bottom line is, it's possible to make demos that look good, but it really doesn't work well enough to build software you would actually use. This naturally lends itself to the "everybody is talking about how great it is but nobody is building anything real with it" construct we're in right now. It is great, but also not really useful.
[0] https://www.marble.onl/posts/this_cost_170.html
[1] https://www.anthropic.com/engineering/building-c-compiler
Related, this reminds me of the time Cursor spent millions of dollars worth of tokens to write a new browser with LLMs and ended up with a non-functioning wrapper of existing browser libraries.
Thank you for doing this.
Org processes have not changed. Lots of the devs I know are enjoying the speedup on mundane work, consuming it as a temporary lifestyle surplus until everything else catches up.
You can't saw faster than the wood arrives. Also the layout of the whole job site is now wrong and the council approvals were the actual bottleneck to how many houses could be built in the first place... :/
Basically this. My last several tickets were HITL coding with AI for several hours and then waiting 1-2 days while the code worked its way through the PR and CI/CD process.
Coding speed was never really a bottleneck anywhere I have worked - it’s all the processes around it that take the most time and AI doesn’t help that much there.
I’m seeing it slightly differently. So much of our new slowdown is rework because we’ve seen a bunch more API and contract churn. The project I’m on has had more rework than I care to contemplate and most of it stems from everyone’s coding agents failing to stay synced up with each other on the details and their human handlers not noticing the discrepancies until we’re well into systems integration work.
If I may hijack your analogy, it would be like if all the construction crews got really fast at their work, so much so that the city decided to go for an “iterative construction” strategy because, in isolation, the cost of one team trying different designs on-site until they hit on one they liked became very small compared to the cost of getting city planners and civil engineers involved up-front. But what wasn’t considered was the rework multiplier effect that comes into play when the people building the water, sewage, electricity, telephones, roads, etc. are all repeatedly tweaking designs with minimal coordination amongst each other. So then those tweaks keep inducing additional design tweaks and rework on adjacent contractors because none of these design changes happen in a vacuum. Next thing you know all the houses are built but now need to be rewired because the electricity panel is designed for a different mains voltage from the drop and also it’s in the wrong part of the house because of a late change from overhead lines in the alleys to underground lines below the street.
Many have observed that coding agents lack object permanence, so keeping them on a coherent plan requires giving them a thoroughly documented plan up front. It actually has me wondering if optimal coding agent usage at scale resembles something of a return to waterfall (probably in more of a Royce sense than the bogeyman agile evangelists derived from the original idea), where the humans on the team mostly spend their time banging out systems specifications and testing protocols, and iteration on the spec becomes somewhat more removed from implementing it than it is in typical practice nowadays.
To me the hard problem isn’t building things, it’s knowing what to build (finding the things that provide value) and how to build it (e.g. finding novel approaches to doing something that makes something possible that wasn’t possible before).
I don’t see AI helping with knowing what to build at all and I also don’t see AI finding novel approaches to anything.
Sure, I do think there is some unrealized potential somewhere in terms of relatively low value things nobody built before because it just wasn’t worth the time investment – but those things are necessarily relatively low value (or else it would have been worth it to build it) and as such also relatively limited.
Software has amazing economies of scale. So I don’t think the builder/tool analogy works at all. The economics don’t map. Since you only have to build software once and then it doesn’t matter how often you use it (yeah, a simplification), even pretty low value things have always been worth building. In other words: there is tons of software out there. That’s not the issue. The issue is: what is the right software and can it solve my problems?
> To me the hard problem isn’t building things, it’s knowing what to build (finding the things that provide value) and how to build it (e.g. finding novel approaches to doing something that makes something possible that wasn’t possible before).
The problem with this is that after doing this hard work, someone can easily copy your hard work and UI/UX taste. I think distribution will be very important in the future.
We might end up with what you already see in social media, where influencers copy someone's post or video without giving credit to the original author.
>The problem with this is that after doing this hard work, someone can easily copy your hard work and UI/UX taste.
Or indeed, somebody might steal and launder your work by scooping them up into a training set for their model and letting it spit out sloppy versions of your thing.
I agree. It really is easier to build low-impact tools for personal use. I managed to produce tools I would never have had time to build and I use them every day. But I will never sell them because they're tailored to my needs, and it makes no sense to open source anything nowadays. For work it's different: product teams still need to decide what to build and what is helpful to the clients. Our bugs are not self-fixed by AI yet. I think Anthropic saying 100% of their code is AI generated is a marketing stunt. They have every reason to say that to sell their tool that generates code. It sends a strong signal to the industry that if they can do it, it could be easier for smaller companies. We are not there yet from a client perspective: asking for a feature and having it shipped to prod 2 days later without human interaction.
I wonder what happened to the old adage of "you only spend 10% of your time actually coding; the rest is figuring out what is needed".
At the same time I see people claiming 100x increases and how they produce 15k lines of code each day thanks to AI, but all I can wonder is how these people managed to find 100x work that needed to be done.
For me, I'm demotivated to work on many ideas, thinking that anyone can easily copy them or that OpenClaw/Nanobot will easily replicate 90% of the functionality.
So now I need to think of a different kind of idea, something along the lines of games that may take multiple iterations to get perfected.
>thinking that anyone can easily copy it
I mean this is how it's always been throughout history.
Creating something new is hard; copying something, in terms of energy spent, is far easier. This is true for software or for physical objects that don't require massive amounts of expensive technology to reproduce.
In a few decades, AIs will probably be better at those than most humans. Possibly even sooner.
At my $work this manifests as more backlog items being ticked off, more one-off internal tooling, features (and tools) getting more bells-and-whistles and much more elaborate UI. Also some long-standing bugs being fixed by claude code.
Headline features aren't much faster. You still need to gather requirements, design a good architecture, talk with stakeholders, test your implementation, gather feedback, etc. Speeding up the actual coding can only move the needle so much.
I feel like we work at the same place. IT Husbandry/Debt Paying/KTLO, whatever you call it, is being ground into dust. Especially repetitive stuff that I originally would've needed a week to automate and that never could get to the top of the once-quarterly DevOps sprint... bam. GitHub Action workflow runs weekly to pull in the latest OS images, update and roll over a smoke test VM, monitor, roll over the rest or rollback and ping me in Slack. Done in half a day.
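(For a rough sense of the shape of that kind of rollover automation, here's a minimal Python sketch of the script such a workflow might call. The `update-image` and `rollback-image` commands and the webhook URL are hypothetical placeholders, not details from the comment above.)

```python
import subprocess
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder

def notify(msg: str) -> None:
    # Slack incoming webhooks accept a simple {"text": ...} payload.
    requests.post(SLACK_WEBHOOK, json={"text": msg}, timeout=10)

def smoke_test(vm: str) -> bool:
    # Hypothetical health check; substitute whatever your canary actually runs.
    probe = subprocess.run(["ssh", vm, "systemctl", "is-system-running"])
    return probe.returncode == 0

def weekly_rollover(canary: str, fleet: list[str]) -> None:
    # Roll the canary onto the new image first; only touch the fleet if it's healthy.
    subprocess.run(["update-image", canary], check=True)        # hypothetical command
    if not smoke_test(canary):
        subprocess.run(["rollback-image", canary], check=True)  # hypothetical command
        notify(f"Rollover aborted: {canary} failed its smoke test")
        return
    for vm in fleet:
        subprocess.run(["update-image", vm], check=True)
    notify("Weekly image rollover complete")
```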
I've got a couple Claude Code skills set up where I just copy/paste a Slack link into it and it links people relevant docs, gives them relevant troubleshooting from our logs, and a hook on the slack tools appends a Claude signature to make sure they know they weren't worth my time.
That said, there's this weird quicksand people around me get in where they just spend weeks and weeks on their AI tools and don't actually do much of anything? Like bro you burned your 5 hour CC Enterprise limit all week and committed...nothing?
I'm sure there's plenty of new software being released and built by agents, but the same problem as handcrafted software remains - finding an audience. The easier and quicker it is to build software, or the more developers build software, the more stuff is thrown at a wall to see what sticks, but I don't think there's more capacity for sticktivity, if my analogy hasn't broken down by now.
I think, if there were to be a noticeable increase in software quantity due to agentic coding, we should test it by looking into indie games.
According to SteamDB (and Reddit), 2024 and 2025 both saw about 19,000 games released on Steam - there's a big jump between '23 and '24 of about 5,000 games, but oddly it plateaued then.
https://www.reddit.com/r/pcgaming/comments/1pl7kg1/over_1900...
LLMs are unusably bad at generating game code
It's AI features and 10x more bugs. Microsoft is leading the way.
Quite a few - and I know I am only speaking for myself - live on my different computers. I created a few CLI tools that make information retrieval smoother sailing for me and my agent. Inspired by a blog post, I created a digital personal assistant that really enables me to better juggle different work contexts as well as different projects within those work contexts.
I created a platform for a virtual pub quiz for my team at my day job, built multiple landing pages for events, debugged darktable to recognize my new camera (it was too new to be included in the camera.xml file, but the specs were known). I debugged quite a few parts of a legacy shitshow of an application, did a lot of infrastructure optimization, and I also created a massive ton of content as a centaur in dialog with the help of Claude Code.
But I don't do "Show HN" posts. And I don't advertise my builds - because other than those named, most are one-off things that I throw away once that one problem is solved.
To me code became way more ephemeral.
But YMMV - and that is a good thing. I also believe that far fewer people than the hype bubble implies are actually really into hard-core usage like Pete Steinberger or Armin Ronacher and the likes.
> Quite a few - and I know I am only speaking for myself - live on my different computers
I use AI/agents in quite similar ways, and even rekindled multiple personal projects that had stalled. However, to borrow the OP's parlance, these are not "houses" - more like sheds and tree-houses. They are fun and useful, but not moving the needle on housing stock supply, so to speak.
If you recall it, would you mind sharing the blog post that inspired the digital personal assistant?
It's not 10x, but https://www.ft.com/content/5ac2ee5f-f8bd-4f39-a759-3c5c50c8b... has some graphs suggesting a 1.5x increase in metrics like "number of new apps published in the iOS App Store" and "lines of code committed by US GitHub users".
People haven't noticed because the software industry was already mostly unoriginal slop, even prior to LLMs, and people are good at ignoring unoriginal slop.
The real outcome is mostly a change in workflow and a reasonable increase in throughput. There might be a 10x or even 100x increase in creation of tiny tools or apps (yay to another 1000 budget assistant/egg timer/etc. apps on the app/play store), but hardly something one would notice.
To be honest, I think the surrounding paragraph lumps together all anti-AI sentiments.
For example, there is a big difference between "all AI output is slop" (which is objectively false) and "AI enables sloppy people to do sloppy work" (which is objectively true), and there's a whole spectrum.
What bugs me personally is not at all my own usage of these tools, but the increase in workload caused by other people using these tools to drown me in nonsensical garbage. In recent months, the extra workload has far exceeded my own productivity gains.
For the non-technical, imagine a hypochondriac using chatgpt to generate hundreds of pages of "health analysis" that they then hand to their doctor and expect a thorough read and opinion of, vs. the doctor using chatgpt for sparring on a particular issue.
>people using these tools to drown me in nonsensical garbage
https://en.wikipedia.org/wiki/Brandolini%27s_law
>The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.
do you have the right expectations?
rather than new stuff for everyone to use, the future could easily be everyone building their own bespoke tools for their own problems.
You aren't noticing it?
Small and mid sized companies are getting custom software now.
Small software can be packed with extra features instead of the bare minimum.
I am seeing shit tons of chatbots for everything under the sun being onboarded at corporate
yes, there is a very large increase in TUI tools
> Pay through the nose for Opus or GPT-7.9-xhigh-with-cheese. Don't worry, it's only for a few years.
> You have to turn off the sandbox, which means you have to provide your own sandbox. I have tried just about everything and I highly recommend: use a fresh VM.
> I am extremely out of touch with anti-LLM arguments
'Just pay out the arse and run models without a sandbox or in some annoying VM just to see them fail. Wait, some people are against this?'
I'm doing web development in a VM (on exe.dev) and it works quite nicely.
In case you missed the several links, exe.dev is his startup which provides sandboxing for agents. So it makes sense he wants to get people used to paying for agents and in need of a good sandbox.
Well, not 'against' per se, just watching LLM-enthusiasts tumble in the mud for now. Though I have heard that if I don't jump into the mud this instant, I will be left behind, apparently, for some reason. So you either get left behind or get a muddy behind, your choice.
Everybody keeps saying the models are getting better, the tooling is getting better, people are discovering better practices...
So why not just wait out this insane initial phase, and if anything is left standing afterwards and proves itself, just learn that.
Because there's nothing to learn. "learning" to use claude code is less effort than learning how to use the basics of git.
They provide value today so I'm using them today.
This. If you use a modern frontier model like Opus 4.5, there's nothing to learn. No special prompting techniques. You give it a task, and most of the time it's capable of solving a big chunk quickly. You still need to babysit it, review its plan/code and make adjustments. But that's already faster than achieving the same results manually. Especially when you're at low energy levels and can't bring yourself to look into a task and figure it out from zero.
I don't trust the idea of "not getting", "not understanding", or "being out of touch" with anti-LLM (or pro-LLM) sentiment. There is nothing complicated about this divide. The pros and cons are both as plain as anything has ever been. You can disagree - even strongly - with either side. You can't "not understand".
> There is nothing complicated about this divide [...] You can't "not understand"
I beg to differ. There are a whole lot of folks with astonishingly incomplete understanding about all the facts here who are going to continue to make things very, very complicated. Disagreement is meaningless when the relevant parties are not working from the same assumption of basic knowledge.
Bolstering your point, check out the comments in this thread: https://www.reddit.com/r/rust/comments/1qy9dcs/who_has_compl...
There’s a lot of unwillingness to even attempt to try the tools.
Those people are absolutely going to get left in the dust. In the hands of a skilled dev, these things are massive force multipliers.
That's one of the sentiments I don't quite grasp, though. Why can't they just learn the tools when they're stable? So far it's been sooo many changes in workflows, basically relearn the tools every three months. It's maybe a bit more stabilized the last year, but still one could spend an enormous amount of time twiddling with various models or tools, knowledge that someone else probably could learn quicker at a later time.
"Being left in the dust" would also mean it's impossible for new people / graduates to ever catch up. I don't think it is. Even though I learned react a few years after it was in vogue (my company bet on the wrong horse), I quickly got up to speed and am just as productive now as someone that started a bit earlier.
Not the person you asked, but my interpretation of “left in the dust” here (not a phrasing I particularly agree with) would be the same way iOS development took off in the 2010s.
There was a land rush to create apps. Basic stuff like the flash light, todo lists, etc, were created and found a huge audience. Development studios were established, people became very successful out of it.
I think the same thing will happen here. There is a first mover advantage. The future is not yet evenly distributed.
You can still start as an iOS developer today, but the opportunity is different.
I’m not sure your analogy is applicable here.
The introduction of the App Store did not increase developer productivity per se. If anything, it decreased developer productivity, because unless you were already a Mac developer, you had to learn a programming language you'd never used, Objective-C (now it's largely Swift, but that's still mainly used only on Apple platforms), and a brand new Apple-specific API, so a lot of your previous programming expertise became obsolete on a new platform. What the App Store did that was valuable to developers was open up a new market and bring a bunch of new potential customers, iPhone users, indeed relatively wealthy customers willing to spend money on software.
What new market is brought by LLMs? They can produce as much source code as you like, but how exactly do you monetize that massive amount of source code? If anything, the value of source code and software products will drop as more is able to be produced rapidly.
The only new market I see is actually the developer tool market for LLM fans, essentially a circular market of LLM developers marketing to other LLM developers.
As far as the developer job market is concerned, it's painfully clear that companies are in a mass layoff mood. Whether that's due to LLMs, or whether LLMs are just the cover story, the result is the same. Developer compensation is not on the rise, unless you happen to be recruited by one of the LLM vendors themselves.
My impression is that from the developer perspective, LLMs are a scheme to transfer massive amounts of wealth from developers to the LLM vendors. And you can bet the prices for access to LLMs will go up, up, up over time as developers become hooked and demand increases. To me, the whole "OpenClaw" hype looks like a crowd of gamblers at a casino, putting coins in slot machines. One thing is for certain: the house always wins.
My take is more optimistic.
I think it will make prototyping and MVP more accessible to a wider range of people than before. This goes all the way from people who don't know how to code up to people who know very well how to code, but don't have the free time/energy to pursue every idea.
Project activation energy decreases. I think this is a net positive, as it allows more and different things to be started. I'm sure some think it's a net negative for the same reasons. If you're a developer selling the same knowledge and capacity you sold ten years ago things will change. But that was always the case.
My comparison to iOS was about the market opportunity, and the opportunity for entrepreneurship. It's not magic, not yet anyway. This is the time to go start a company, or build every weird idea that you were never going to get around to.
There are so many opportunities to create software and companies, we're not running out of those just because it's faster to generate some of the code.
What you just said seems reasonable. However, what the earlier commenter said, which led to this subthread, seems unreasonable: those people unwilling to try the tools "are absolutely going to get left in the dust."
Returning to the iOS analogy, though, there was only a short period of time in history when a random developer with a flashlight or fart app could become successful in the App Store. Nowadays, such a new app would flop, if Apple even allowed it, as you admitted: "You can still start as an iOS developer today, but the opportunity is different." The software market in general is not new. There are already a huge number of competitors. Thus, when you say, "This is the time to go start a company, or build every weird idea that you were never going to get around to," it's unclear why this would be the case. Perhaps the barrier to entry for competitors has been lowered, yet the competition is as fierce as ever (unlike in the early App Store).
In any case, there's a huge difference between "the barrier to entry has been lowered" and "those who don't use LLMs will be left in the dust". I think the latter is ridiculous.
Where are the original flashlight and fart app developers now? Hopefully they made enough money to last a lifetime, otherwise they're back in the same boat as everyone else.
> In any case, there's a huge difference between "the barrier to entry has been lowered" and "those who don't use LLMs will be left in the dust". I think the latter is ridiculous.
Yeah, it’s a bit incendiary, I just wanted to turn it into a more useful conversation.
I also think it overstates the case, but I do think it’s an opportunity.
It’s not just that the barrier to entry has been lowered (which it has) but that someone with a lot of existing skill can leverage that. Not everyone can bring that to the table, and not everyone who can is doing so. That’s the current advantage (in my opinion, of course).
All that said, I thought the Vision Pro was going to usher in a new era of computing, so I’m not much of a prognosticator.
> it’s a bit incendiary
> I also think it overstates the case
I think it's a mistake to defend and/or "reinterpret" the hype, which is not helping to promote the technology to people who aren't bandwagoners. If anything, it drives them away. It's a red flag.
I wish you would just say to the previous commenter, hey, you appear to be exaggerating, and that's not a good idea.
I didn't read the comment as such a direct analogy. It was more recalling a lesson of history that maybe doesn't repeat but probably will rhyme.
The App Store reshuffled the deck. Some people recognized that and took advantage of the decalcification. Some of them did well.
You've recognized some implications of the reshuffle that's currently underway. Maybe you're right that there's a bias toward the LLM vendors. But among all of it, is there a niche you can exploit?
> In the hands of a skilled dev, these things are massive force multipliers.
What do you get from it? Say you produce more, do you get a higher salary?
What I have seen so far is the opposite: if you don't produce more, you risk getting fired.
I am not denying that LLMs make me more productive. Just saying that they don't make me more wealthy. On the other hand, they use a ton of energy at a time where we as a society should probably know better. The way I see it, we are killing the Earth because we produce too much. LLMs help us produce more, why should we be happy?
(Imagine me posting the graph of worker productivity in the US climbing quickly over time while pay remains flat or falls)
Using these tools comes down to basically just writing what you want in a natural language. I don't think it will be a problem to catch up if they need to.
Context management, plan mode versus agent mode, skills vs system prompt, all make a huge difference and all take some time to build intuition around.
Not all that hard to learn, but waiting for things to settle down assumes things are going to settle down. Are they? When?
That these facets of use exist at all are indicative of immature product design.
These are leaked implementation details that the labs are forcing us to know because these are weak, early products and they’re still exploring the design space. The median user doesn’t want to and shouldn’t have to care about details like this.
Future products in this space won’t have them and future users won’t be left in the dust by not learning them today.
Python programmers aren’t left behind by not knowing malloc and free.
Someone will package up all that intuition and skills and I imagine people won't have to do any of these things in future.
You wait for everyone to go broke chasing whatever, and then take their work for your own. It's not that hard to copy and paste.
It doesn’t matter how fast you run if it’s not the correct direction.
Good LLM wielders run in widening circles and get to the goal faster than good old school programmers running in a straight line
I try to avoid LLMs as much as I can in my role as SWE. I'm not ideologically opposed to switching, I just don't have any pressing need.
There are people I work with who are deep in the AI ecosystem, and it's obvious what tools they're using. It would not be uncharitable in any way to characterize their work as pure slop that doesn't work, is buggy, isn't adequately tested, etc.
The moment I start to feel behind I'll gladly start adopting agentic AI tools, but as things stand now, I'm not seeing any pressing need.
Comments like these make me feel like I'm being gaslit.
We are all constantly being gaslit. People have insane amounts of money and prestige riding on this thing paying off in such a comically huge way that it can absolutely not deliver on it in the foreseeable future. Creating a constant pressing sentiment that actually You Are Being Left Behind Get On Now Now Now is the only way they can keep inflating the balloon.
If this stuff was self-evidently as useful as it's being made out to be, there would be no point in constantly trying to pressure, coax and cajole people into it. You don't need to spook people into using things that are useful, they'll do it when it makes sense.
The actual use-case of LLMs is dwarfed by the massive investment bubble it has become, and it's all riding on future gains that are so hugely inflated they will leave a crater that makes the dotcom bubble look like a pothole.
Then where is all this new and amazing software? If LLM can 10x or 100x someones output we should've seen an explosion of great software by now.
One dude with an LLM should be able to write a browser fully capable of browsing the modern web or an OS from scratch in a year, right?
That's a silly bar to ask for.
Chrome took at least a thousand man years i.e. 100 people working for 10 years.
I'm lowballing here: it's likely way, way more.
If AI gives a 10x speedup, to reproduce Chrome as it is today would require 1 person working for 100 years, 10 people working for 10 years, or 100 people working for 1 year.
Clearly, unrealistic bar to meet.
If you want a concrete example: https://github.com/antirez/flux2.c
The creator of Redis started this project 3 weeks ago and used Claude Code to vibe code it.
It works, it's fast, and the code quality is as high as I've ever seen in a C code base. Easily in the top 1% for quality.
Look at this one-shotted working implementation of a JPEG decoder: https://github.com/antirez/flux2.c/commit/a14b0ff5c3b74c7660...
Now, it takes a skilled person to guide Claude Code to generate this but I have zero doubts that this was done at least 5x-10x faster than Antirez writing the same code by hand.
Ah, right, so it's a "skill issue" when GPT5.3 has no idea what is going on in a private use case.
Literally yes
I still haven’t seen those mythical LLM wielders in the wild. While I’m using tools like curl, jq, cmus, calibre, openbsd,… that has been most certainly created by those old school programmers.
I only wish I was there when that cocky "skilled dev" is laid off.
The negative impacts of generative AI are most sharply being felt by "creatives" (artists, writers, musicians, etc), and the consumers in those markets. If the OP here is 1. a programmer 2. who works solely with other programmers and 3. who is "on the grind", mostly just consuming non-fiction blog-post content related to software development these days, rather than paying much attention to what's currently happening to the world of movies/music/literature/etc... then it'd be pretty easy for them to not be exposed very much to anti-LLM sentiment, since that sentiment is entirely occurring in these other fields that might have no relevance to their (professional or personal) life.
"Anti-LLM sentiment" within software development is nearly non-existent. The biggest kind of push-back to LLMs that we see on HN and elsewhere, is effectively just pragmatic skepticism around the effectiveness/utility/ROI of LLMs when employed for specific use-cases. Which isn't "anti-LLM sentiment" any more than skepticism around the ability of junior programmers to complete complex projects is "anti-junior-programmer sentiment."
The difference between the perspectives you find in the creative professions vs in software dev, don't come down to "not getting" or "not understanding"; they really are a question of relative exposure to these pro-LLM vs anti-LLM ideas. Software dev and the creative professions are acting as entirely separate filter-bubbles of conversation here. You can end up entirely on the outside of one or the other of them by accident, and so end up entirely without exposure to one or the other set of ideas/beliefs/memes.
(If you're curious, my own SO actually has this filter-bubble effect from the opposite end, so I can describe what that looks like. She only hears the negative sentiment coming from the creatives she follows, while also having to dodge endless AI slop flooding all the marketplaces and recommendation feeds she previously used to discover new media to consume. And her job is one you do with your hands and specialized domain knowledge; so none of her coworkers use AI for literally anything. [Industry magazines in her field say "AI is revolutionizing her industry" — but they mean ML, not generative AI.] She has no questions that ChatGPT could answer for her. She doesn't have any friends who are productively co-working with AI. She is 100% out-of-touch with pro-LLM sentiment.)
I think this is an interesting point; my one area of disagreement is the claim that there is no "anti-LLM sentiment" in the programming community. Sure, plenty of folks expressing skepticism or disagreement are doing so from a genuine place, but just from reading this site and a few email newsletters I get, I can say that there is a non-trivial percentage in the programming world who are adamantly opposed to LLMs/AI. When I see comments from people in that subset, it's quite clear that they aren't approaching it from a place of skepticism, where they could be convinced given appropriate evidence or experiences.
But there's a difference. Being opposed to AI-generated art/music/writing is valid because humans still contribute something extraordinarily meaningful when they do it themselves. There's no market for AI-generated music, and AI-generated art and writing tends to get called out right away when it's detected. People want the human expression in human-generated art, and the AI stuff is a weak placeholder at best.
For software the situation is different. Being opposed to LLM-generated software is just batshit crazy at this point. The value that LLMs provide to the process makes learning to use them, objectively, an absolute must; otherwise you are simply wasting time and money. Eric S. Raymond put it something like "If you call yourself a software engineer, you have no excuse not to be using these tools. Get your thumb out of your ass and learn."
Ok, I’ll bite. What’s there to learn that you can tie directly to an increase of productivity?
I can say "learn how to use vim's makeprg feature so that you can jump directly to errors reported by the build tool" and it's very clear where the ROI is. But all the AI hypers are selling are hope, prayers, and rituals.
I’m not an AI hyper, I just don’t code manually anymore. Tickets take about as much time to close as before, but the code shipped now has higher test coverage, higher performance, better concurrency error handling, fewer follow-up refactor PRs, fewer escapes to staging/prod and better documentation; some of it is now also modeled in a model checker.
> I just don’t code manually anymore
I'm curious about what industry you are in and the tech stack you are using?
Without revealing too much: generic SaaS at non-toy scale, 95% TS + Postgres + 5% a very long tail of other stuff.
So the code was an unknown (to the world) X quality, but now it’s X+k quality? How does that help me exactly?
I don't care to be honest, it's up to you to learn to use the tool.
The skill is learning to supply the LLM with enough context to do anything a developer does: turn specs into code, check its work including generating and running tests, debug and analyze the code for faults or errors, and run these in a loop to converge on a solution. If you're about to do something by hand in an IDE, STOP. Think about what the LLM will need to know to perform that task for you.
It may take some human intervention, but the productivity results are pretty consistent: tasks that used to take weeks now take hours or days. This puts in reach the ability to try things you wouldn't countenance otherwise due to the effort and tedium involved. You'd have to be a damn fool not to take advantage of the added velocity. This is why what we do is called "engineering", not a handicraft.
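Concretely, that "run it in a loop to converge" idea can be as small as this Python sketch (the `llm` callable is a stand-in for whatever model API you use; nothing here is taken from a specific tool):

```python
import subprocess

def agent_loop(spec: str, llm, max_iters: int = 5) -> str:
    """Spec -> code -> test -> repair loop; `llm` is any prompt-in, text-out callable."""
    code = llm(f"Write a Python module that satisfies this spec:\n{spec}")
    for _ in range(max_iters):
        with open("candidate.py", "w") as f:
            f.write(code)
        # Check the model's work the same way you'd check a human's: run the tests.
        result = subprocess.run(["python", "-m", "pytest", "-q"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # tests pass; good enough to hand to a reviewer
        # Feed the failures back so the model can debug its own output.
        code = llm(f"These tests failed:\n{result.stdout}\n{result.stderr}\n"
                   f"Fix the module. Current code:\n{code}")
    raise RuntimeError("did not converge; needs human attention")
```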
An engineering take on this would have provided numbers like success and failure rate, guaranteed results, operation manuals,…
> This puts in reach the ability to try things you wouldn't countenance otherwise due to the effort and tedium involved.
If you’re talking about prototypes, a whiteboard is way cheaper and less time consuming than an agent.
> "Anti-LLM sentiment" within software development is nearly non-existent.
Strong disagree right there. I remember talking to a (developer) coworker a few months ago who seemed like the biggest AI proponent on our team. When we were one-on-one during a lunch though, he revealed that he really doesn't like AI that much at all, he's just afraid to speak up against it. I'm in a few Discord channels with a lot of highly skilled (senior and principal programmers) who mostly work in game development (or adjacent), and most of them either mock LLMs or have a lot of derision for it. Hacker News is kind of a weird pro-AI bubble, most other places are not nearly as keen on this stuff.
> "Anti-LLM sentiment" within software development is nearly non-existent.
I see it all the time in professional and personal circles. For one, you are shifting the goalposts on what counts as "anti-LLM"; for two, people are talking about the negative social, political and environmental impacts.
What is your source here?
>"Anti-LLM sentiment" within software development is nearly non-existent
This is certainly untrue. I want to say "obviously", which means that maybe I am misunderstanding you. Below are some examples of negative sentiments programmers have - can you explain why you are not counting these?
NOTE: I am not presenting these as an "LLMs are bad" argument. My own feelings go both ways. There is a lot that's great about LLMs, and I don't necessarily agree with every word I've written below - some of it is just my paraphrasing of what other people say. I'm only listing examples of what drives existing anti-LLM sentiment in programmers.
1. Job loss, loss of income, or threat thereof
These two are exacerbated by the pace of change, since so many people already spent their lives and money establishing themselves in the career and can't realistically pivot without becoming miserable - this is the same story for every large, fast change - though arguably this one is very large and very fast even by those standards. Lots of tech leadership is focusing even more than they already were on cheap contractors, and/or pushing employees for unrealistic productivity increases. I.e. it's exacerbating the "fast > good" problem, and a lot of leadership is also overestimating how far it reduces the barrier to creating things, as opposed to mostly just speeding up a person's existing capabilities. Some leadership is also using the apparent loss of job security as leverage beyond salary suppression (even less proportion of remote work allowed, more surveillance, worse office conditions, etc).
2. Happiness loss (in regards to the job itself, not all the other stuff in this list)
This is regarding people who enjoy writing/designing programs but don't enjoy directing LLMs; or who don't enjoy debugging the types of mistakes LLMs tend to make, as opposed to the types of mistakes that human devs tend to make. For these people, it's like their job was forcibly changed to a different, almost unrelated job, which can be miserable depending on why you were good at - or why you enjoyed - the old job.
3. Uncertainty/skepticism
I'm pushing back on your dismissal of this one as "not anti-LLM sentiment" - the comparison doesn't make sense. If I was forced to only review junior dev code instead of ever writing my own code or reviewing experienced dev code, I would be unhappy. And I love teaching juniors! And even if we ignore the subset of cases where it doesn't do a good job or assume it will soon be senior-level for every use case, this still overlaps with the above problem: The mistakes it makes are not like the mistakes a human makes. For some people, it's more unnatural/stressful to keep your eyes peeled for the kinds of mistakes it makes. For these people, it's a shift away from objective, detail-oriented, controlled, concrete thinking; away from the feeling of making something with your hands; and toward a more wishy-washy creation experience that can create a feeling of lack of control.
4. Expertise loss
A lot of positive outcomes with LLMs come from being already experienced. Some argue this will be eroded - both for new devs and existing experienced devs.
5. The training data ownership/morality angle
> 4. Expertise loss
> A lot of positive outcomes with LLMs come from being already experienced. Some argue this will be eroded - both for new devs and existing experienced devs.
This is true, but the pace of progress is so mind blowing the experts we have now might just be enough until the whole industry becomes obsolete (10-20 years assuming the lower bound of the trend line holds?)
Facebook's algorithm has picked up on the idea that "I like art". It has subsequently given me more examples of (human-created) art in my feed. Art from comic book artists, art from manga-style creators, "weird art", even a make-up artist who painted a scene from Where the Wild Things Are on her face.
I like this. What's more, while AI-generated art has a characteristic sameyness to it, the human-produced art stands out in its originality. It has character and soul. Even if it's bad! AI slop has made the human-created stuff seem even more striking by comparison. The market for human art isn't going anywhere, just like the audience for human-played chess went nowhere after Deep Blue. I think people will pay a premium for it, just to distinguish themselves from the slop. The same is true of writing and especially music. I know of no one who likes listening to AI-generated music. Even Sabrina Carpenter would raise less objection.
The same, I'm afraid, cannot be said for software—because there is little value for human expression in the code itself. Code is—almost entirely—strictly utilitarian. So we are now at an inflection point where LLMs can generate and validate code that's nearly as good, if not better, than what we can produce on our own. And to not make use of them is about as silly as Mel Kaye still punching in instruction opcodes in hex into the RPC-4000, while his colleagues make use of these fancy new things called "compilers". They're off building unimaginably more complex software than they could before, but hey, he gets his pick of locations on the rotating memory drum!
I'm one of the nonexistent anti-LLMers when it comes to software. I hate talking to a clanker, whose training data set I don't even have access to let alone the ability to understand how my input affects its output, just to do what I do normally with the neural net I've carried around in my skull and trained extensively for this very purpose. I like working directly with code. Code is not just a product for me; it is a medium of thought and expression. It is a formalized notation of a process that I can use to understand and shape that process.
But with the right agentic loops, LLMs can just do more, faster. There's really no point in resisting. The marginal value of what I do has just dropped to zero.
I don’t code anymore basically, but I have no issues accepting that there are huge gaps in the training sets in e.g. finance code bases. There is a long tail of niches to still code (semi) manually in.
Yeah, "not understanding" means they aren't engaging with the issue honestly. They go on to compare to carpentry, which is a classic sign the speaker understands neither carpentry or software development.
The anti-LLM arguments aren't just "hand tools are more pure." I would even say that isn't even a majority argument. There are plenty more arguments to make about environmental and economic sustainability, correctness, safety, intellectual property rights, and whether there are actual productivity gains distinguishable from placebo.
It's one of the reasons why "I am enjoying programming again" is such a frustrating genre of blog post right now. Like, I'm soooo glad we could fire up some old coal plants so you could have a little treat, Brian from Middle Management.
Local models are decent now. Qwen3 coder is pretty good and decent speed. I use smaller models (qwen2.5:1.5b) with keyboard shortcuts and speech to text to ask for man page entries, and get 'em back faster than my internet connection and a "robust" frontier model does. And web search/RAG hides a multitude of sins.
"Using anything other than the frontier models is actively harmful" - so how come I'm getting solid results from Copilot and Haiku/Flash? Observe, Orient, Decide, Act, Review, Modify, Repeat. Loops with fancy heuristics, optimized prompts, and decent tools, have good results with most models released in the past year.
Have you used the frontier models recently? It's hard to communicate the difference the last 6 months has seen.
We're at the point where copilot is irrelevant. Your way of working is irrelevant. Because that's not how you interact with coding AIs anymore, you're chatting with them about the code outside the IDE.
I have.
Just this month I've burned through 80% of my Copilot quota of Claude Opus 4.6 in a couple of days to get it to help me with a silly hobby project: https://github.com/ncruces/dbldbl
It did help. The project had been sitting for 3 years without trig and hyperbolic trig, and in a couple days of spare time I'm adding it. Some of it through rubber ducking chat and/or algorithmic papers review (give me formulas, I'll do it), some through agent mode (give me code).
But if you review the PR written in agent mode, the model still lies to my face, in trivial but hard to verify ways. Like adding tests that say cosh(1) is this number at that OEIS link, and both the number and the OEIS link are wrong, but obviously tests pass because it's a lie.
I'm not trying to bash the tech. I use it at work in limited but helpful ways, and use hobby stuff like this as a testbed precisely to try to figure out what they're good at in a low stakes setting.
But you trust the plausibly looking output of these things at your own peril.
Just to be clear: I mean Copilot CLI. I had used the IDE, and it was terrible; I tried the CLI, and for some reason it was much better. I explain carefully what I want, and it iterates until it's done, quickly, on cheap models.
If you check the docs, smaller, faster, older models are recommended for 'lightweight' coding. There's several reasons for this. 1) a smaller model doesn't have as good deep reasoning, so it works okay for a simple ask. 2) small context, small task, small model can produce better results than big context, big task, big model. The lost-in-the-middle problem is still unsolved, leading to mistakes that get worse with big context, and longer runs exacerbate issues. So small context/task that ends and starts a new loop (with planning & learning) ends up working really well and quickly.
There's a difference between tasks and problem-solving, though. For difficult problems, you want a frontier reasoning model.
Honestly, I've been using the frontier models and I'm not sure where people are seeing these massive improvements. It's not that they're bad, it's just that I don't see that much of an improvement the last 6 months. They're so inconsistent that it's hard to have a clear idea of what's happening. I usually switch between models and I don't see either those massive differences either. Not to mention that sometimes models regress in certain aspects (e.g., I've seen later models that tend to "think" more and end up at the same result but taking far more time and tokens).
> Have you used the frontier models recently?
Yes.
> It's hard to communicate the difference the last 6 months has seen.
No, it isn't. The hypebeast discovered Claude code, but hasn't yet realized that the "let the model burn tokens with access to a shell" part is the key innovation, not the model itself.
I can (and do) use GH Copilot's "agent" mode with older generation models, and it's fine. There's no step function of improvement from one model to another, though there are always specific situations where one outperforms. My current go-to model for "sit and spin" mode is actually Grok, and I will splurge for tokens when that doesn't work. Tools and skills and blahblahblah are nice to have (and in fact, part of GH Copilot now), but not at all core to the process.
The author is correct in that agents are becoming more and more capable and that you don't need the IDE to the same extent, but I don't see that as good. I find that IDE-based agentic programming actually encourages you to read and understand your codebase as opposed to CLI-based workflows. It's so much easier to flip through files, review the changes it made, or highlight a specific function and give it to the agent, as opposed to through the CLI where you usually just give it an entire file by typing the name, and often you just pray that it manages to find the context by itself. My prompts in Cursor are generally a lot more specific and I get more surgical results than with Claude Code in the terminal purely because of the convenience of the UX.
But secondly, there's an entire field of LLM-assisted coding that's being almost entirely neglected and that's code autocomplete models. Fundamentally they're the same technology as agents and should be doing the same thing: indexing your code in the background, filtering the context, etc, but there's much less attention and it does feel like the models are stagnating.
I find that very unfortunate. Compare the two workflows:
With a normal coding agent, you write your prompt, then you have to wait at least a full minute for the result (generally more, depending on the task), breaking your flow and forcing you to task-switch. Then it gives you a giant mass of code, and of course 99% of the time you just approve and test it because it's a slog to read through what it did. If it doesn't work as intended, you get angry at the model and retry your prompt, spending a larger amount of tokens the longer your chat history gets.
But with LLM-powered auto-complete, when you want, say, a function to do X, you write your comment describing it first, just like you should if you were writing it yourself. You instantly see a small section of code and if it's not what you want, you can alter your comment. Even if it's not 100% correct, multi-line autocomplete is great because you approve it line by line and can stop when it gets to the incorrect parts, and you're not forced to task switch and you don't lose your concentration, that great sense of "flow".
Fundamentally it's not that different from agentic coding - except instead of prompting in a chatbox, you write comments in the files directly. But I much prefer the quick feedback loop, the ability to ignore outputs you don't want, and the fact that I don't feel like I'm losing track of what my code is doing.
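As a small illustration of that comment-first workflow (Python here; the function body is the kind of thing the completion model proposes and you accept line by line):

```python
import re

# Parse an ISO-8601 duration like "PT1H30M" into a number of seconds.
# (You write only the comment above; everything below is the proposed completion,
#  accepted line by line and abandoned the moment it drifts from what you meant.)
def iso_duration_to_seconds(value: str) -> int:
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", value)
    if not match:
        raise ValueError(f"unsupported ISO-8601 duration: {value!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds
```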
The other thing about non-agent workflows is they’re much, much less compute intensive. This is going to matter.
I agree with you wholeheartedly. It seems like a lot of the work on making AI autocomplete better (better indexing, context management, codebase awareness, etc) has stagnated in favor of full-on agentic development, which simply isn't suited for many kinds of tasks.
But if you try some penny-saving cheap model like Sonnet [..bad things..]. [Better] pay through the nose for Opus.
After blowing $800 of my bootstrap startup funds for Cursor with Opus for myself in a very productive January, I figured I had to try to change things up... so this month I'm jumping between Claude Code and Cursor, sometimes writing the plans and having the conversation in Cursor and dumping the implementation plan into Claude. Opus in Cursor is just so much more responsive and easy to talk to, compared to Opus in Claude.
Cursor has this "Auto" mode which feels like it has very liberal limits (amortized cost I guess) that I'm also trying to use more, but -- I don't really like to flip a coin and if it lands up head then waste half hour discovering the LLM made a mess the LLM and try again forcing the model.
Perhaps in March I'll bite the bullet and take this author's advice.
Just use Codex 5.3 in codex cli, the $20/mo plan is basically limitless at least for me and I keep reasoning efforts high.
You can enjoy it while it lasts, OpenAI is being very liberal with their limits because of CC eating their lunch rn.
Yeah, I can’t recommend gpt-5.3-codex enough, it’s great! I’ve been using it with the new macOS app and I’m impressed. I’ve always been a Claude Code guy and I find myself using codex more and more. Opus is still much nicer explaining issues and walking me through implementations but codex is faster (even with xhigh effort) and gets the job done 95% of the time.
I was spending unholy amounts of money and tokens (subsidized cloud credits tho) forcing Opus for everything but I'm very happy with this new setup. I've also experimented with OpenCode and their Zen subscription to test Kimi K2.5 and similar models, and they also seem like a very good alternative for some tasks.
What I cannot stand tho is using Sonnet directly (it's fine as a subagent); I've found it hard to control, and it doesn't follow detailed instructions.
Out of curiosity, what’s your flow? Do you have codex write plans to markdown files? Just chat? What languages or frameworks do you use?
I’m an avid cursor user (with opus), and have been trying alternatives recently. Codex has been an immense letdown. I think I was too spoiled by cursor’s UX and internal planning prompt.
It’s incredibly slow, produces terribly verbose and over-complicated code (unless I use high or xhigh, which are even slower), and misses a lot of details. Python/Django and React frontend.
For the first time I felt like I could relate to those people who say it doesn't make them faster, because they have to keep fixing the agent's output. Never felt that with Opus 4.5 and 4.6 and Cursor.
Codex cli is a very performant cli though, better than any other cli code assistant I've used.
I mean, does it matter what code it's producing? If it renders and functions, just use it. I think it's better to take the L on verbose code and optimize the really ugly bits by hand in a few minutes than be kneecapped every 5 hours by limits and constant pleas to shift to Sonnet.
you've always been a Claude Code guy? this has existed less than a year.
I was born clutching a Claude Code shell, you peasant.
The first sentence out of my mouth was a system prompt
To be fair that still feels like an eternity somehow.
Perhaps AI time is the inverse of Valve time.
Thanks a lot, after today I have fully switched to Codex I think.
This vscode extension makes it almost as easy to point codex to something as when doing it in cursor:
https://github.com/suzukenz/vscode-copy-selection-with-line-...
+1, codex 5.2 was really good and 5.3 seems to be better at everything; caveat - I had little time to test it.
I promise you you're just going to continue to light money on fire. Don't fall for this token madness, the bigger your project gets, the less capable the llm will get and the more you spend per request on average. This is literally all marketing tricks by inference providers. Save your money and code it yourself, or use very inexpensive llm methods if you must.
I think we are going to start hearing stories of people going into thousands in CC debt because they were essentially gambling with token usage thinking they would hit some startup jackpot.
Compared to the salary I lose by not taking a consulting gig for half a year, these $800 aren't all that much. (I guess depending on the definition of bootstrap, mine might not be, as I support myself with saved consulting income.)
Startup is a gamble with or without the LLM costs.
I have been coding for 20 years, I have a good feel for how much time I would have spent without LLM assistance. And if LLMs vanish from the face of the earth tomorrow, I still saved myself that time.
Well if you've been "coding" for 20 years you should have known after a pretty short period of time it doesn't save you any time.
You can rent a GPU server and run your own Qwen models.
It's 90 percent the same thing as Claude but with flat-rate costs.
Any sufficiently complicated LLM generated program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of an open source project.
We had an effort recently where one much more experienced dev from our company ran Claude on our oldish codebase for one system, with the goal of transforming it into newer structure, newer libraries etc. while preserving various built in functionalities. Not the first time this guy did such a thing and he is supposed to be an expert.
I took a look at the result and it's maybe half the stuff missing completely; the rest is cryptic. I know that codebase by heart since I created it. From my 20+ years of experience, correcting all this would take way more effort than a manual rewrite from scratch by a senior. Suffice to say that's not what upper management wants to hear; LLM adoption often became one of their yearly targets to be evaluated against. So we have a hammer and are looking for nails to bend and crook.
Suffice to say this effort led nowhere since we have other high priority goals, for now. Smaller things here & there, why not. Bigger efforts, so far sawed-off 2-barrel shotgun loaded with buckshot right into both feet.
Not to take away from your experience but to offer a counterpoint.
I used Claude Code to port a Rust PDB-parsing library to TypeScript.
My SumatraPDF is a large C++ app and I wanted visibility into where the size of functions/data goes and how classes are laid out. So I wanted to build a tool to dump info out of a PDB. But I have been diagnosed with an extreme case of Rustophobiatis so I just can't touch Rust code. Hence, the port to TypeScript.
With my assistance it did the work in an afternoon and did it well. The code worked. I ran it against large PDB from SumatraPDF and it matched the output of other tools.
In a way, porting from one language to another is an extreme case of refactoring, and Claude did it very well.
I think that in general (your experience notwithstanding) Claude Code is excellent at refactorings.
Here are 3 refactorings from SumatraPDF where I asked claude code to simplify code written by a human:
https://github.com/sumatrapdfreader/sumatrapdf/commit/a472d3... https://github.com/sumatrapdfreader/sumatrapdf/commit/5624aa... https://github.com/sumatrapdfreader/sumatrapdf/commit/a40bc9...
I hope you agree the code written by Claude is better than the code written by a human.
Granted, those are small changes, but I think it generalizes to bigger changes. I have a few refactorings in mind that I've wanted to do for a long time, and maybe with Claude they will finally be feasible (they were not feasible before only because I don't have an infinite amount of time to do everything I want to do).
“I want this thing, but in a different language” seems to be something that the current generation of cutting edge LLMs are pretty good at.
Translating a vibe is something the Ur-LLMs (GPT-3 etc.) were very good at, so it's not entirely surprising that the current state of the art is to be found in things of a "translate thing X that already exists into context Y" nature.
All software before LLMs had a copious number of bugs, many of which were never fixed.
The real insight buried in here is "build what programmers love and everyone will follow." If every user has an agent that can write code against your product, your API docs become your actual product. That's a massive shift.
I'm very much looking forward to this shift. It is SO MUCH more pro-consumer than the existing SaaS model. Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation. It feels like every time I ask for programmatic access to SaaS tools in order to simplify a workflow, I get stuck in endless meetings with product managers trying to "understand my use case", even for products explicitly marketed to programmers.
Using agents that interact with APIs means people can own their user experience more. Why not craft a frontend that behaves exactly the way YOU want it to, tailor-made for YOUR work, abstracting the set of products you are using and focusing only on the actually relevant bits of the work you are doing? Maybe a downside is that there is more explicit metering of use in these products instead of the per-user licensing that is common today. But the upside is that there is so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.
> Right now every app feels like a walled garden, with broken UX, constant redesigns, enormous amounts of telemetry and user manipulation
OK, but: that's an economic situation.
> so much less scope for engagement-hacking, dark patterns, useless upselling, and so on.
Right, so there's less profit in it.
To me it seems this will make the market more adversarial, not less. Increasing amounts of effort will be expended to prevent LLMs interacting with your software or web pages. Or in some cases exploit the user's agentic LLM to make a bad decision on their behalf.
the "exploit the user's agentic LLM" angle is underappreciated imo. we already see prompt injection attacks in the wild -- hidden text on web pages that tells the agent to do things the user didn't ask for. now scale that to every e-commerce site, every SaaS onboarding flow, every comparison page.
it's basically SEO all over again but worse, because the attack surface is the user's own decision-making proxy. at least with google you could see the search results and decide yourself. when your agent just picks a vendor for you based on what it "found," the incentive to manipulate that process is enormous.
we're going to need something like a trust layer between agents and the services they interact with. otherwise it's just an arms race between agent-facing dark patterns and whatever defenses the model providers build in.
Maybe. Or maybe services will switch to charging per API call or whatever instead of monthly or per-seat. Who can predict the future?
I mean, services _could_ make it harder to use LLMs to interact with them, but if agents are popular enough they might see customers start to revolt over it.
This extends further than most people realize. If agents are the primary consumers of your product surface, then the entire discoverability layer shifts too. Right now Google indexes your marketing page -- soon the question is whether Claude or GPT can even find and correctly describe what your product does when a user asks.
We're already seeing this with search. Ask an LLM "what tools do X" and the answer depends heavily on structured data, citation patterns, and how well your docs/content map to the LLM's training. Companies with great API docs but zero presence in the training data just won't exist to these agents.
So it's not just "API docs = product" -- it's more like "machine-legible presence = existence." Which is a weird new SEO-like discipline that barely has a name yet.
> Using anything other than the frontier models is actively harmful
If that is true, why should one invest in learning now rather than waiting for 8 months to learn whatever is the frontier model then?
So that you can be using the current frontier model for the next 8 months instead of twiddling your thumbs waiting for the next one to come out?
I think you (and others) might be misunderstanding his statement a bit. He's not saying that using an old model is harmful in the sense that it outputs bad code -- he's saying it's harmful because some of the lessons you learn will be out of date and not apply to the latest models.
So yes, if you use current frontier models, you'll need to recalibrate and unlearn a few things when the next generation comes out. But in the meantime, you will have gotten 8 months (or however long it takes) of value out of the current generation.
You also don't have to throw away everything you've learnt in those 8 months; there are some things that you'll subtly pick up that you can carry over into the next generation as well.
Also a lot of what you learn is how to work around limitations of today's models and agent frameworks. That will all change, and I imagine things like skills and subagents will just be an internal detail that you don't need to know about.
snarky answer: so you can be that 'AI guy' at your office that everyone avoids in the snackroom
Because you might want to use LLMs now. If not, it's definitely better to not chase the hype - ignore the whole shebang.
But if you do want to use LLMs for coding now, not using the best models just doesn't make sense.
It's not like you need to take a course. The frontier models are the best, just using them and their harnesses and figuring out what works for your use case is the 'investing in learning'.
There's not that much learning involved. Modern SOTA models are much more intelligent than what they used to be not long ago. It's quite scary/amazing.
How could it be actively harmful if it wasn't harmful last month when it was the frontier model?
> Agent harnesses have not improved much since then. There are things Sketch could do well six months ago that the most popular agents cannot do today.
I think this is a neglected area that will see a lot of development in the near future. I think that even if development on AI models stopped today - if no new model was ever trained again - there are still decades of innovation ahead of us in harnessing the models we already have.
Consider ChatGPT: the first release relied entirely on its training data to answer questions. Today, it typically does a few Google searches and summarizes the results. The model has improved, but so has the way we use it.
agreed, and i'd go further - the harness is where evaluation actually happens, not in some separate benchmark suite. the model doesn't know if it succeeded at a web task. the harness has to verify DOM state, check that the right element was clicked, confirm the page transitioned correctly. right now most harnesses just check "did the model say it was done," which is why pass rates on benchmarks don't translate to production reliability. the interesting harness work is building verification into the loop itself, not as an afterthought.
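As a rough illustration of verification-in-the-loop (not any particular harness's implementation), here is a sketch using Playwright; the URL and selectors are hypothetical:

    # Sketch: after the model claims it completed a browser action, the harness checks
    # the actual page state rather than trusting the claim. URL and selectors are made up.
    from playwright.sync_api import sync_playwright

    def checkout_reached(page) -> bool:
        page.wait_for_load_state("networkidle")
        # Verify the transition really happened and the expected element is present.
        return page.url.endswith("/checkout") and page.locator("#order-summary").is_visible()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://shop.example.com")     # hypothetical target site
        page.click("text=Checkout")               # the action the agent was asked to perform
        if not checkout_reached(page):
            # Surface the failure to the agent loop so it can retry or backtrack,
            # instead of marking the task done because the model said so.
            print("verification failed")
        browser.close()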
Really? I hardly think it's neglected. The Claude Code harness is the only reason I come back to it. I've tried Claude via OpenCode or others and it doesn't work as well for me. If anything, I would even argue that prior to 4.6, the main reason Opus 4.5 felt like it improved over months was the harness.
Related. Others?
How I program with agents - https://news.ycombinator.com/item?id=44221655 - June 2025 (295 comments)
> It sounds like someone saying power tools should be outlawed in carpentry.
I see this a lot here
All metaphors break down at a certain point, but power tools and generative AI/LLMs being compared feels like somebody is romanticizing the art of programming a bit too much.
Copyright law, education, just the sheer scale of things changing because of LLMs are some things off the top of my head why "power tools vs carpentry" is a bad analogy.
if that someone is clumsy, had an active war going on against basic tools before, and wandered into carpentry from a completely different area, then power tools might be a bad idea.
On HN lately? Haven't seen anything about outlawing. But I see a lot of "powertools don't work and make me slower"
They're not powertools lol. Tech has plenty of powertools and we automated the crap out of our job already.
Writing code has never been the limiting factor, it's everything else that goes into it.
Like, I don't mind that there's a bunch of weekend warriors out here building shoddy gazebos and sheds with their brand new overpriced tools, incorrecting each other on the best way to do things. We had that with the bitcoin and NFT bros already.
What I do roll my eyes at is when the bros start talking about how they're totally going to build bridges and planes and it's gonna be soooo easy to get to new places, just slap down a bridge.
Uh huh. Y'all do not understand what building those actually entails lol.
Yes, because a tech-bro AI's dream is hundreds of thousands of developers being let go and replaced with no-code tools.
Sure, replace me with AI, but I had better get royalties on my public contributions. I, like many other developers, have kids and other responsibilities to pay for.
We did not share our work publicly to be replaced. The same way I did not lend my neighbour my car so he could run me over, that was implicit.
We’ve been doing this to other professions for half a century. Live by the sword, die by the sword.
Are you really going to appeal to nature here? As a species we are basically magic. We have gone to the moon, cured countless diseases and many other amazing things, but not to stop being taken advantage of? Is it too high of an ask?
> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.
Not a plug, but really, that's exactly why we're building sandboxes for agents with local-laptop quality. Starting with remote Xcode+simulator sandboxes for iOS, and a high-mem sandbox with the Android Emulator on GPU acceleration for Android.
No machine allocation but composable sandboxes that make up a developer persona’s laptop.
If interested, a quick demo here https://www.loom.com/share/c0c618ed756d46d39f0e20c7feec996d
muvaf[at]limrun[dot]com
Disagree with the point about anything less than opus being harmful to learning.
Much of my learning still requires experimentation - including lots of token volume so hitting limits is a problem.
And secondly, I'm looking for workflows that build the thing without needing to be at the absolute edge of LLM capability. That's where fragility and unpredictability live, where a new model with a slightly different personality is released and it breaks everything. I'd rather have a flow that is simple and idiot-proof and doesn't fall apart at the first sign of non-bleeding-edge tokens. That means skipping the gains from something Opus could one-shot, of course, but that's acceptable to me.
Regarding the shift away from time spent on agriculture over the last century or so..
> That was a net benefit to the world, that we all don't have to work to eat.
I'm pretty sure almost all of us are still working to have food to eat and shelter for ourselves and our families.
Also, while the on-going industrial and technological revolution has certainly brought benefits, it’s an open question as to whether it will turn out to be a net benefit. There’s a large-scale tragedy of the commons experiment playing out and it’s hard to say what the result will be.
> Along the way I have developed a programming philosophy I now apply to everything: the best software for an agent is whatever is best for a programmer.
I agree with this and I think it's funny to see people publish best practices for working with AI that are like, "Write a clear spec. Have a style guide. Use automated tests."
I'm not convinced it's 100% true because I think there are code patterns that AI handles better than humans and vice versa. But I think it's true enough to use as a guiding philosophy.
> Some believe AI Super-intelligence is just around the corner (for good or evil). Others believe we're mistaking philosophical zombies for true intelligence, and speedrunning our own brainrot
Not sure which camp I'm in, but I enjoyed the imagery.
> I am having more fun programming than I ever have, because so many more of the programs I wish I could find the time to write actually exist. I wish I could share this joy with the people who are fearful about the changes agents are bringing.
It might be just me but this reads as very tone deaf. From my perspective, CEOs are seething at the mouth to make as many developers redundant as possible, not being shy about this desire. (I don't see this at all as inevitable, but tech leaders have made their position clear)
Like, imagine the smugness of some 18th-century "CEO" telling an artisan, despite the fact that he'll be resigned to working in horrific conditions at a factory, not to worry and to think of all the mass-produced consumer goods he may enjoy one day.
It's not at all a stretch of the imagination that current tech workers may be in a very precarious situation. All the slopware in the world wouldn't console them.
I bought Steve Yegge's "Vibe Coding" book. I think I'm about 1/4th of the way through it or so. One thing that surprised me is there's this naivete on display that workers are going to be the ones to reap the benefits of this. Like, Steve was using an example of being able to direct the agent while doing leisure activities (never mind that Steve is more of an executive/thought leader in this company, and, prior to LLMs, seemed to be out of the business of writing code). That's a nice snapshot of a reality that isn't going to persist..
While the idea of programmers working two hours a day and spending the rest of it with their family seems sunny, that's absolutely not how business is going to treat it.
Thought experiment... A CEO has a team of 8 engineers. They do some experiments with AI, and they discover that their engineers are 2x more effective on average. What does the CEO do?
a) Change the workweek to 4 hours a day so that all the engineers have better work/life balance since the same amount of work is being done.
b) Fire half the engineers, make the 4 remaining guys pick up the slack, rinse and repeat until there's one guy left?
Like, come on. There's pushback on this stuff not because the technology is bad (although it's overhyped), but because no sane person trusts our current economic system to provide anything resembling humane treatment of workers. The super rich are perfectly fine seeing half the population become unemployed, as far as I can tell, as long as their stock numbers go up.
You missed option c: keep all 8 engineers so the team can pump out features faster, all still working 8-hour days. The CEO will probably be forced to do it to keep up with the competition.
Haven't read that book, but agree that if anyone thinks the workers are likely to capture the value of this productivity shift, they haven't been paying attention to reality.
Though at the same time I also think a lot of the CEO types (at least in the pure software world) who believe they are going to capture the value of this productivity shift are also in for a rude awakening, because if AI doesn't stall out, it's only a matter of time from when their engineers are replaceable to when their company doesn't need to exist at all anymore.
in addition to B, the best part is the 4 fired engineers can go around and say "we'll do the same work as those 4 for 10% less" and so on.
"AI won't replace you. The guy who's about to get fired but has more to lose is going to replace you."
IDEs are going to come roaring back.
As the author says, there's nothing wrong with the idea of the IDE. Of course you want to be using the best, most powerful tools!
AI showed us that our current-gen text-editor-first IDEs are massively underserving the needs of the public, yes, but it didn't really solve that problem. We still need better IDEs! What has changed is that we now understand how badly we need them. (source: I am an IDE author)
> In 2026, I don't use an IDE any more.
I don't think that is the best way to look at it. I think that now every team has the power to build and maintain an internal agent (tool + UX) to manage software products. I don't necessarily think that chat-only is enough except for small projects, so teams will build agents that give them access to the level of abstraction that works best.
It's a data point but this weekend (e.g. in 2 days) I build a desktop + web agent that is able to help me reason on system design and code. Built with Codex powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.
Curious what kind of agent you built? I'm building a programming agent myself; it's intentionally archaic in that you run it by constantly copy-pasting to and from fresh ChatGPT sessions (: I'm finding it challenging to have it do good context management. I'm trying to solve this by declaring parts of code or spec as "collections" with an overview md file attached that acts like a map of why/where/what, but that can't scale indefinitely.
Send a DM on Twitter to @edfixyz (one of my accounts) and I'll reply with a link to the website tomorrow to give you a sense. Can't share a link here; it will kill my backend.
Care to send me a link on my email? It's in my about. I don't use my X account and can't seem to login, the verification SMS never arrives.
> ...and director of engineering for 10 years. I'm blown away.
It's always the CTO types who get most enthusiastic.
I’m not saying this is definitely a bot. However, this is the 7th time I’ve read a post and thought it might be an OpenAI promotion bot, clicked on the username, and noticed that the account was created in 2011.
I have yet to do this and see any other year. Was there someone who bought a ton of accounts in 2011 to farm them out? A data breach? Was 2011 just a very big year for new users? (My own account is from 2011)
I'm not a bot. You are saying that because for some reason you resent people who have a good experience with Codex / OpenAI. Curious what that is - people hate the CEO or what?
I like Claude Code too btw.
The crazy thing here is that I wrote the initial comment myself!
> It's a data point but this weekend (e.g. in 2 days) I build a desktop + web agent that is able to help me reason on system design and code. Built with Codex powered by the Codex SDK. It is high quality. I've been a software engineer and director of engineering for 10 years. I'm blown away.
Assuming you’re not a bot. It’s nothing to do with you having a good experience, it’s the way you wrote about that experience that sounds like a product placement.
I asked OpenAI's very own ChatGPT 5.2, powered by OpenAI, to tell you why it sounds like a product placement:
“Because it hits a bunch of “native ad / testimonial” tells at once:
• Brand-name density in a tiny space. “Built with Codex powered by the Codex SDK” repeats the same brand in two adjacent phrases, like copy that’s trying to lodge a name in your head rather than naturally describe a build.
• Overly polished value signals. “High quality” is a generic superlative with no concrete evidence (features, metrics, constraints, tradeoffs). Ads often lean on verdict words instead of specifics.
• Credential + astonishment combo. “I’ve been a software engineer and director of engineering for 10 years” is classic authority framing, immediately followed by “I’m blown away.” That’s a common testimonial structure: I’m hard to impress → I’m impressed.
• Time-compressed “miracle build” narrative. “This weekend (in 2 days) I build a desktop + web agent…” reads like the “you can do it fast/easily now” story arc you see in promos. Not impossible—just a familiar marketing shape.
• “It’s a data point” language. That phrase feels like social-proof seeding: “don’t treat this as hype, just one datapoint,” which paradoxically makes it feel more like deliberate persuasion.
• No friction or downsides. Real engineer excitement usually includes at least one caveat (bugs, rough edges, limitations, cost, setup pain). The total absence makes it sound curated.
• Benefit phrased like positioning. “Able to help me reason on system design and code” is basically a product pitch line (target user + problem + outcome) rather than a personal anecdote (“it helped me untangle X design and refactor Y”).”
That's exactly what a bot would say
2011 just so happened to be 4 years before a very important year: 2015 — The founding of OpenAI. Unrelated note, have you tried Codex and the Codex SDK?
It's definitely a bot, just like probably around 10% of comments on HN at this point, and the majority of upvotes. And it's only increasing.
Calling it bot is a bit dismissive though. It's an agent!
Care to have a phone call with who you call a bot tonight?
If so, send a DM on twitter to @edfixyz with your phone number and I will call you immediately. Or give me your twitter handle.
I'm tired of that BS - when people don't like what you write they call you a bot.
it is giving a very agentic vibe
Look, I'm very negative about this AI thing. I think there is a great chance it will lead to something terrible and we will all die, or worse. But on the other hand, we are all going to die anyway. Some of us, the lucky ones, will die of a heart attack and will learn of our imminent demise in the second it happens, or not at all. The rest of us will have it worse. It has always been like that, and it has only gotten more devastating since we started wearing clothes and stopped being eaten alive by a savanna crocodile or freezing to death during the first snowfall of winter.
But if AI keeps getting better at code, it will produce entire in-silico simulation workflows to test new drugs or even to design synthetic life (which, again, could make us all die, or worse). Yet there is a tiny, tiny chance we will use it to fix some of the darkest aspects of human existence. I will take that.
That's stupid. If you genuinely think there's a great chance AI will kill us all, you wouldn't spin the wheel just for some small, vague chance that it doesn't and that something good (what exactly, nobody knows) will happen.
A lot of very silly people have convinced themselves and you of this, but it is not true, was never true, and is never going to be true.
We have a lot of actual problems to deal with that aren't telling ghost stories about sand. Focus on those.
It's funny how many variations of meaning people assign to agent-related terms. Conflating "agent" with "CLI," as the opposite end of the spectrum from an IDE, is a new one I had not encountered before. I run agents with vscode-server, also in a VM, and would not give up the ability to have a proper GUI any time I feel like it; being able to switch seamlessly between more autonomous operation and more interactive work seems useful at any level.
The sandboxing pain is real. Sadly, a new VM seems like the simplest viable solution. I don't think the masses are doing any sandboxing at all. We really need a sandbox solution that is somewhat dynamic and doesn't pester the user with allow/deny requests. It has to be intelligent and keep up with the LLM agents.
> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000. I have never felt like a toolchain was so complete and consistent with its environment as there.
+1. I've tried many times, and failed, to replicate the joy of using that toolchain.
I agree with his assessment up until this point in time, it is where we currently are. But it seems to me there is still a large chunk of engineers who don't extrapolate capability out to the engineer being taken out of the loop completely. Imo, it happens in fairly short order. 2-3 years.
On what basis are you making that prediction?
> the best software for an agent is whatever is best for a programmer
My conclusion as well. It feels paradoxical, maybe because on some level I still think of an LLM as some weird gadget, not a coworker. Context ephemerality is more or less the only veritable difference from a human programmer, I'd say. And, even then, context introduction with LLMs is a speedrun of how you'd do it with new human members of a project. Awesome times we live in.
> By far the greatest IDE I have ever used was Visual Studio C++ 6.0 on Windows 2000
Visual C++ 6 was incredible! My favourite IDE of all time too.
Poor fellow has never used IntelliJ IDEA.
Yes I have.
"Eight more months of Bitcoin. It's usage continues to dramatically expand. The amount of transactions is increasing exponentially. Soon, fiat currencies will collapse, all replaced by Bitcoin transactions. If you haven't converted your assets over to Bitcoin you're going to be left behind and lose it all. I can't even understand people that don't see the obvious technical superiority of Bitcoin, such people are going to go through rough times."
>this is why I'm building
My clipart folder of that kid with the lollipop continues to stay relevant.
What are those 3 sentences that the author typed to replicate Stripe for his situation?
Curious what you mean by "agent harness" here... are you distinguishing between true autonomous agents (model decides next step) vs workflows that use LLMs at specific nodes? I've found the latter dramatically more reliable for anything beyond prototyping, which makes me wonder if the "model improvement" is partly better prompting and scaffolding.
An agent harness is what enables the user to seamlessly interact with both a model and tool calls. Claude Code is an agent harness.
Here's an example of a harness with less code: https://github.com/badlogic/pi-mono/blob/fdcd9ab783104285764...

    ┌────────────────────────────┐
    │            User            │
    └──────────────┬─────────────┘
                   │
                   ▼
    ┌────────────────────────────┐
    │        Agent Harness       │
    │    (software interface)    │
    └──────┬──────────────┬──────┘
           │              │
           ▼              ▼
    ┌────────────┐  ┌────────────┐
    │   Models   │  │   Tools    │
    └────────────┘  └────────────┘

Hi, author here. I mean the piece of code that calls the model and executes the tool calls. My colleague Philip calls it “9 lines of code”: https://sketch.dev/blog/agent-loop
We have built two of them now, and clearly the state of the art here can be improved. But it is hard to push too much on this while the models keep improving.
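For readers who haven't seen it spelled out, the "9 lines" framing maps onto something like the sketch below; call_model and run_tool are hypothetical stand-ins for a chat-completions client and a tool dispatcher, not the Sketch code itself:

    # Minimal agent loop sketch: ask the model, execute whatever tools it requests,
    # feed the results back, stop when it answers without requesting tools.
    # call_model and run_tool are hypothetical stand-ins, not a specific SDK.
    def agent_loop(task, call_model, run_tool, max_steps=20):
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            reply = call_model(messages)              # e.g. {"content": "...", "tool_calls": [...]}
            messages.append({"role": "assistant", **reply})
            if not reply.get("tool_calls"):
                return reply["content"]               # no more tool calls: the model is done
            for call in reply["tool_calls"]:
                result = run_tool(call["name"], call["arguments"])
                messages.append({"role": "tool", "name": call["name"], "content": result})
        return "stopped: step limit reached"

Everything interesting (sandboxing, verification, recovery) lives in what call_model and run_tool actually do, which is the point being debated below.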
the harness being "9 lines of code" is deceptive in the same way a web server is "just accept connections and serve files."
the hard part isn't the loop itself — it's everything around failure recovery.
when a browser agent misclicks, loads a page that renders differently than expected, or hits a CAPTCHA mid-flow, the 9-line loop just retries blindly. the real harness innovation is going to be in structured state checkpointing so the agent can backtrack to the last known-good state instead of restarting the whole task. that's where the gap between "works in a demo" and "works on the 50th run" lives.
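A sketch of the "checkpoint and backtrack" idea, as opposed to blind retry; execute_step and verify_step are hypothetical hooks a harness would supply:

    # Sketch: commit a checkpoint only after a step verifies, and retry failed steps
    # from the last known-good state instead of restarting the whole task.
    # execute_step and verify_step are hypothetical hooks supplied by the harness.
    import copy

    def run_plan(plan, execute_step, verify_step, initial_state, max_retries=3):
        good = copy.deepcopy(initial_state)           # last known-good checkpoint
        for step in plan:
            for _ in range(max_retries):
                candidate = execute_step(step, copy.deepcopy(good))
                if verify_step(step, candidate):      # e.g. check DOM state, test output, file diff
                    good = candidate                  # commit the checkpoint
                    break
            else:
                raise RuntimeError(f"step kept failing from last checkpoint: {step}")
        return good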
I have no problem with experienced senior devs using agents to write good code faster. What I have a problem with is inexperienced "vibecoders" who don't care to learn and instead use agents to write awful buggy code that will make the product harder to build on even for the agents. It used to be that lack of a basic understanding of the system was a barrier for people, but now it's not, so we're flooded with code written by imperfect models conducted by people who don't know good from bad.
The number of experienced, senior programmers, though, who are in the “anti-LLM” camp is still fairly staggering.
Why is that staggering? That feels like a pretty dramatic expression. Is it a foregone conclusion that one must use agents?
one does not have to use anything at all… but if someone is “senior” and is incapable of using LLMs for some parts of her/his job, then the “senior” part is just age-related and not tied to skill level.
You originally said being "anti-llm", but you now refer to being "incapable". Surely you can see that those are different things?
I mean when the tag line is "this will replace senior engineers and you, the senior engineer, must be forced to use it"
Then yeah, it makes sense.
Yeah I’m baffled why people are surprised that senior+ engineers who are being told in one breath they will be replaced by this tool and also they MUST use this tool to make it better to replace them aren’t happy about it or want to use it willingly.
I also find it wild how we’re sleepwalking into this, but I’m also part of the problem and using these things too.
As Nvidia's CEO wisely said: you won’t be replaced by these tools, you will be replaced by folk who excel at utilizing these tools.
If you're forced to use it by company mandate then that's fine. If you're not forced and still use it being fully aware, then I wish you well.
I’m forced to yes. It’s tracked.
Where are you encountering all this slop code? At my work we use LLMs heavily and I don't see this issue. Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.
> Maybe I'm just lucky that my colleagues all have Uni degrees in CS and at least a few years experience.
That's why. I was using Claude the other day to greenfield a side project and it wanted to do some important logic on the frontend that would have allowed unauthenticated users to write into my database.
It was easy to spot for me, because I've been writing software for years, and it only took a single prompt to fix. But a vibe coder wouldn't have caught it and hackers would've pwned their webapp.
You can also ask Claude to review all the code for security issues and code smells; you'd be surprised what it finds. We all write insecure code on our first pass if we're too focused on getting the proof of concept worked out. Security isn't always the very first thing coded; maybe it's the very next thing, maybe it comes 10 changes later.
> We all write insecure code in our first pass through
no, we don't
Yes we do, you don't just start a brand new web project and spit out CORS rules, authentication schemes, roles, etc in one sitting do you? Are you an AI?
> are you an AI?
no, I'm a competent engineer
maybe you've not worked with any
So let me get this straight: you get instructed to build an Instagram clone, and you sit down and one-shot code every single feature for the project? My point is about one sitting, doing EVERYTHING all at once, without pausing, without standing up, without breaks. I don't know about you, but people who tend to rush code out make just as many, if not worse, mistakes than AI does.
I've worked with many competent engineers and have built things people couldn't even google help for before AI existed, things that surpassed my and my team's expectations both solo and in a team setting. None of them were done in one sitting, which is what you're suggesting. Everything is planned out and done piecemeal.
For the record, I can one shot an AI model to do all of those things, with all the detail they need and get similar output as if I gave a human all those tasks, I know because I've built the exact tooling to loop AI around the same processes competent developers use, and it still can do all of it in record time.
> I can one shot an AI model to do all of those things
Bullshit you can lol. If it's that trivial, create an instagram right now and post the code.
Yes, I really do, because this has been a solved problem for a while. Also, it's necessary to get right because retrofitting it later is a pain.
So if you're going to build a massive application, say YouTube, Facebook, or Instagram, are you going to sit down and write out every template, DB model, controller, view model, etc. in one single sitting for the entire application? No bathroom breaks, no lunch, no "I'll finish that part tomorrow"; you do it ALL in one sitting? Because you will miss something, and that's my point: nobody gets their first crack at a greenfield project 100% right in one sitting; you build it up to what it is. The AI is used the same way.
No, the AI writes far less secure code than I do to start, even with the SotA models and careful prompting/detailed plans.
You’ve moved the goalposts so far that you’re now talking about a different game altogether.
I actually do build all of those things before standing something up in prod. Not doing that is insane. Literally every web framework has reasonable defaults baked in.
Any competent tech company will have canned ways to do all of those things that have already been reviewed and vetted
I never said anything about before hitting production, I said do you build everything in one shot when you start a brand new project, in one sitting.
Why are you building and deploying a site critical enough to need CSP and user security & so on in one sitting lol
Anyways, yes, if I know I'm gonna need it? Because every framework has reasonable defaults or libraries for all of those things, and if you're in a corporate environment, you have vetted ways of doing them
1. import middleware.whatever
2. configure it
3. done
Like, you don't write these things unless you need custom behavior.
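To make the "import it, configure it, done" point concrete, a sketch using FastAPI's stock CORS middleware; the framework and the allowed origin are just illustrative choices, not what any particular commenter uses:

    # Sketch: lean on the framework's vetted middleware instead of hand-rolling CORS.
    # FastAPI and the origin below are illustrative only.
    from fastapi import FastAPI
    from fastapi.middleware.cors import CORSMiddleware

    app = FastAPI()
    app.add_middleware(
        CORSMiddleware,
        allow_origins=["https://app.example.com"],  # the frontend origin you actually trust
        allow_credentials=True,
        allow_methods=["GET", "POST"],
        allow_headers=["*"],
    )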
The issue isn't when the programmers start using it. It's when the project managers start using it and think that they're producing something similar to the programmers
We're in a transition phase, but this will shake out in the near future. In the non-professional space, poorly built vibecoded apps simply won't last, for any number of reasons. When it comes to professional devs, this is a problem that is solved by a combination of tooling, process, and management:
(1) Tooling to enable better evaluation of generated code and its adherence to conventions and norms
(2) Process to impose requirements on the creation/exposure of PRDs/prompts/traces
(3) Management to guide devs in the use of the above and to implement concrete rewards and consequences
Some organizations will be exposed as being deficient in some or all of these areas, and they will struggle. Better organizations will adapt.
The unfortunate reality is that (1) and (2) are what many, many engineers would like to do, but management is going EXACTLY in the opposite direction: go faster! Go faster! Why are you spending time on these things?
> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.
The jury's still out on that one, because climate change is an existential risk.
Existential? Maybe to beachfront property owners
Everybody gangsta until the permafrost starts leaking massive amounts of methane.
Did you know that 10% of the world's population lives in coastal zones at low elevations?
But they'll just disappear into thin air peacefully when that happens right? It's not like they're gonna fight tooth and nail to find a place to survive, that'd be rude.
Idiot
>I wish I could share this joy with the people who are fearful about the changes agents are bringing.
The 'fear' is about losing one's livelihood and getting locked out of homeownership and financial security. It's not complicated. Life is actually largely determined by your access to capital, despite whatever fresh coping strategy the afflicted (and the afflicting) like to peddle.
the relationship between quality of life and capital availability is very non-linear. there is a step-change around the $500k mark where you reach 'orbital velocity', where as long as you don't suffer severe misfortune or make mistakes, you will start accelerating upwards (albeit very slowly.)
under that line, you are constantly having to fight 'gravity'.
basically everyone in tech is openly or quietly aiming to get there, and LLMs have made that trek ever more precarious than before.
guys it's an ad
Nah, it's not, really
> In 2000, less than one percent lived on farms and 1% of workers are in agriculture. That was a net benefit to the world, that we all don't have to work to eat.
Not obvious
> To me that statement is as obvious as "water is wet".
Well... is water *wet* or does it *wet things*? So not obvious either.
I'm really dubious when reading posts posing some things as obvious or trivial. In general they are not.
"In 2026, I don't use an IDE any more."
Just a question: what IDE feature is obsolete now? Ability to navigate the code? Integration with databases, Docker, JIRA, GitHub (like having PR comments available, listed, etc.), Git? Working with remote files? Building the project?
Yes, I can ask Copilot to build my project and verify test results, but it will eat a lot of tokens and the added value is almost none.
> I can ask copilot to build my project and verify tests results, but [..] added value is almost none.
The added value is that it can iterate autonomously and finish tasks that it can't one-shot in its first code edit. Which is basically all tasks that I assign to Copilot.
The added value is that I get to review fully-baked PRs that meet some bar of quality. Just like I don't review human PRs if they don't pass CI.
Fully agree on IDEs, though. I absolutely still need an IDE to iterate on PRs, review them, and tweak them manually. I find VSCode+Copilot to be very good for this workflow. I'm not into vibe coding.
Listen to this guy. I've been using his code for a long time, and it works. I am a happy customer of his service, and it works. I listen to his advice and it works.
The author has a github.
In the past couple days I've become less skeptical of the capabilities of LLMs and now more alarmed by them, contra the author. I think if we as a society continue to accept the development of LLMs and the control of them by the major AI companies there will be massively negative repercussions. And I don't mean repercussions like "a rogue AI will destroy humanity" per se, but these things will potentially cause massive social upheaval, a large amount of negative impacts on mental health and cognition, etc. I think if you see LLMs as powerful but not dangerous you are not being honest.
There are some good things here:
First, we currently have 4 frontier labs, and a bunch of 2nd-tier ones following. The fact that we don't have just oAI, or just Anthropic, or just Google is good in the general sense, I would say. The 4 labs racing each other and trading SotA status every few weeks is good for the end consumer. They keep each other honest and keep the prices down. Imagine if Anthropic could charge $60/MTok or oAI could charge $120/MTok for their GPT-4-style models. They can't, in good part because of the competition.
Second, there's a bunch of labs / companies that have released and are continuing to release open models. That's as close to "intelligence on tap" as you can get. And those models are ~6-12 months behind the SotA models, depending on your use case. Even though the labs have largely different incentives to do so, a lot of them are still releasing open models. Hopefully that continues to hold. So not all control will be in the hands of big tech, even if the "best" will still be theirs. At some point "good enough" is fine.
There's also the thing about geopolitics being involved in this. So far we've seen the EU jumping the gun on regulation, and we're kinda sorta paying for it. Everyone is still confused about what can or cannot be done in the EU. The US seems to be waiting to see what happens, and China will do whatever they do. The worst thing that can happen is that at some point the big players (Anthropic is the main driver) push for regulatory capture. That would really suck. Thankfully atm there's this lingering thinking that "if we do it, the others won't so we'll be on the back foot". Hopefully this holds, at least until the "good enough" from above is out :)
I'm not just concerned about control by one company, I'm concerned by control for the profit motive, and probably concerned about the wisdom of using these things for anything except extremely limited use cases (breakthrough scientific research, etc.). I think tech people have a bad tendency of viewing this through the lens of platform wars type stakes, and there are much bigger problems with AI. The fact that an alarming number of ex-and-current Anthropic people I've met think the world is going to end is something we should take heed of!
The AI labs started down this path using the Manhattan Project as a metaphor and guess what? It's a good metaphor and we should embrace most of the wider implications of that (though I'd love to avoid all the MAD/cold war bullshit this time).
I see them as powerful and dangerous. The goal for decades now is to reduce the human population to 500 million. All human technology was pushed to this end, covertly. If we suddenly have a technology that renders white collar workers useless, we will get to that number faster than expected.
I don't believe that is true, but if it WAS true that human technology was covertly pushed to this end: there are people out there who are demanding that this technology come up with social manipulations (using language) to reduce the human population to a SPECIFIC 500 million.
Or less.
And I don't think it's collar color they're going to be checking against.
So I guess I'm saying I agree that this is powerful and dangerous. These are language models, so they're more effective against humans and their languages. And self-preservation, empathy, humanity do not play a role as there is nobody in there to be offended at the notion of intentionally killing more than 9/10 of humanity… for some definitions of humanity, ones I'm sympathetic to.
I see a lot of people here saying things like:
>ah, they're so dumb, they don't get it, the anti-LLM people
This is one of the reasons I see AI failing in the short term. If I call you an idiot, are you more or less likely to be open-minded and try what I'm selling? AI isn't making money; 95% of companies are failing with AI:
https://fortune.com/2025/08/18/mit-report-95-percent-generat...
I mean, your AIs might be a lot more powerful if they were generating money, but that's not happening. I guess being condescending to the 95% of potential buyers isn't really working out.
> To me that statement is as obvious as "water is wet".
Water is not wet. Water makes things wet. Perhaps the inaccuracy of that statement should be taken as a hint that the other statements that you hold on the same level are worthy of reconsideration.
The good old classic technically correct and completely beside the point observation.
username checks out
Yes, technically HN is full of these kinds of corrections... but HN isn't actually wet.