Why agents are bad pair programmers

justin.searls.co

261 points by sh_tomer a day ago


Traster - 12 hours ago

I think this has put into words a reason why I bounced off using AI this way. When I need something done I often have a rough idea of how I want it done, and how the AI does it often doesn't match what I want. But because it's gone off and written 2,000 lines of code, it's suddenly more work for me to go through and say "Ok, so first off, strip all these comments out, you're doubling the file with trivial explanations of simple code. I don't want X to be abstracted this way, I want that..." etc. And then when I give it feedback, the 2,000 lines of code suddenly switch to 700 lines of completely different code and I can't keep up. And I don't want my codebase full of disjoint scripts that I don't really understand and that all have weirdly different approaches to the problem. I want an AI that I have similar opinions to, which is obviously tough. It's like working with someone on their first day.

I don't know if it's about giving the tools less self-confidence per se, but I think it's about exposing more of the design process. Like, ideally you want your designer to go "Ok, I'm thinking of this approach, I'll probably have these sorts of functions or classes, this state will be owned here" and we can approve that first, rather than going straight from prompt -> implementation.

searls - 13 hours ago

Whenever I land on the front page, I check the comments and brace for HN coming and telling me how stupid I am and lighting me aflame in front of my peers.

But sometimes, if I manage to nail the right headline, nobody reads my post, everyone just has their own discussion, and I am spared.

khendron - 21 hours ago

When I first tried an LLM agent, I was hoping for an interactive, 2-way, pair collaboration. Instead, what I got was a pairing partner who wanted to do everything themselves. I couldn't even tweak the code they had written, because it would mess up their context.

I want a pairing partner where I can write a little, they write a little, I write a little, they write a little. You know, an actual collaboration.

bluefirebrand - a day ago

Pair programming is also not suitable for all cases

Maybe not for many cases

I mentioned this elsewhere, but I find it absolutely impossible to get into a good programming flow anymore while the LLM constantly interrupts me with suggested autocompletes that I have to stop for, read, review, and accept or reject.

It's been miserable trying to incorporate this into my workflow

jyounker - 29 minutes ago

I think the author's dislike for pair programming says a great deal more about the author than it does about pair programming or LLMs.

If you're pair programming and you're not driving, then it's your job to ask the driver to slow down so you can understand what they're doing. You may have to ask them to explain it to you. You may have to explain it back to them. This back-and-forth is what makes pairing work. If you don't do this, then of course you'll get lost.

The author seems to take the same passive position with an LLM, and the results are similar.

motbus3 - 12 hours ago

I have mixed feelings about this situation. I have committed myself to learning how to use it as effectively as possible and to utilising it extensively for at least one month. Through my company, I have access to multiple products, so I am trying them all.

I can say that I am more productive in terms of the number of lines written. However, I cannot claim to be more productive overall.

For every task it completes, it often performs some inexplicable actions that undo or disrupt other elements, sometimes unrelated ones. The tests it generates initially appear impressive, but upon examining other metrics, such as coverage, it becomes evident that its performance is lacking. The amount of time required to guide it to the correct outcome makes it feel as though I am taking many steps backwards before making any significant progress forward—and not in a beneficial way. On one occasion, it added 50,000 unnecessary import lines into a module that it should not have been altering.

On another occasion, one of the agents completely dismantled the object-oriented programming hierarchy, opting instead to use if/else statements throughout, despite the rules I had set.

The issue is that you can never be certain of its performance. Sometimes, for the same task, it operates flawlessly, while at other times, it either breaks everything or behaves unpredictably.

I have tried various techniques to specify what needs to be done and how to accomplish it, yet often, for similar tasks, its behaviour varies so significantly between runs that I find myself needing to review every change it makes each time. Frustratingly, even if the code is nearly correct and you request an update to just one part, it may still behave erratically.

My experience thus far suggests that it is quite effective for small support tools, but when dealing with a medium-sized codebase, one cannot expect it to function reliably every time.

palisade - 20 hours ago

LLM agents don't know how to shut up and always think they're right about everything. They also lack the ability to be brief. Sometimes things can be solved with a single character or line, but no, they write a full page. And they write paragraphs of comments for even the most minuscule of changes.

They talk at you, are overbearing and arrogant.

throwawayffffas - 14 hours ago

In my experience the problem is not they are too fast, they are too slow.

Honestly, their speed is just the right amount to make them bad. If they were faster, I could focus on following the code they are writing. But they take so much time for every edit that I tune out. On the other hand if they were slower, I could do other work while they are working, but they are done every 50 seconds to a few minutes which means I can't focus on other tasks.

If they did smaller faster changes it would probably be better.

Ideally though I would prefer them to be more autonomous, and the collaboration mode to be more like going over merge requests than pair programming. I'd ideally like to hand them a task and have them go away for a few hours, or even just 30 minutes.

The current loop (provide a task, wait 1 to 3 minutes, see a bunch of changes, provide guidance, repeat) is the worst-case scenario in my view.

rhizome31 - 15 hours ago

As a developer who doesn't use AI for coding, except for the occasional non-project-specific question to a chat bot, I am wondering if you use it for client projects or only for your own projects. If you do use it for client projects, do you have some kind of agreement that you're going to share their code with a third party? I'm asking because most clients will make you sign a contract saying that you shouldn't disclose any information about the project to a third party. I even once had a client who explicitly stated that AI should not be used. Do you find clients willing to make an exception for AI coding agents?

bsenftner - 11 hours ago

The collaborative style of AI use struck me as the obviously correct use of AI, just as the more popular "AI writing code" style struck me as horribly incorrect and an indication of the software industry going off on yet another fool's tangent, as the larger industry so often does.

I never have AI write code. I ask it to criticize code I've written, and I use it to strategize about large-scale code organization. Used as a strategy consultant, with careful LLM context construction, it can produce amazingly effective guides that teach you new information very successfully. That is me using my mind to understand and then do, never giving any AI responsibilities beyond advice. AI is an idiot savant, and must be treated as such.

Onewildgamer - 19 hours ago

Finally someone said it: they're overconfident in their approach, they don't consult us on the details of the implementation, and they're trained to create mock APIs that don't follow the structure, leading to a lot of rework. An LLM's actions should be measured and collaborative, asking for details when they're missing. It is impossible to give every single detail in the initial prompt, and a follow-up prompt derails the train of thought and the design of the application.

I don't know if I'm using it right; I'd love to know if that's the case. In a way the LLM should improve at being iterative and taking feedback, though maybe it's a hard problem to add to or update the context. I don't know about that either, but I'd love to learn more.

shultays - 13 hours ago

A week or so ago I needed to convince ChatGPT that the following code will indeed initialize the x values in the struct:

  struct MyStruct
  {
    int x = 5;
  };
  ...
  MyStruct myStructs[100];
It was insisting very passionately that you need MyStruct myStructs[100] = {}; instead.

I even showed it the MSVC assembly output and pointed to the place where it loops over and assigns all the x values, and then it started hallucinating about MSVC not conforming to the standard. Then I did the same with GCC and it said the same thing. It was surreal how strongly it believed it was correct.
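
For reference, a minimal standalone check (any C++11-or-later compiler) shows the default member initializer being applied to every element:

  #include <cassert>

  struct MyStruct
  {
    int x = 5;  // default member initializer
  };

  int main()
  {
    MyStruct myStructs[100];         // no explicit "= {}" needed
    for (const auto& s : myStructs)
      assert(s.x == 5);              // holds for every element
  }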

ChrisMarshallNY - a day ago

I use an LLM as a reference (on-demand), and don't use agents (yet). I was never into pair programming, anyway, so it isn't a familiar workflow for me.

I will admit that it encourages "laziness," on my part, but I'm OK with that (remember when they said that calculators would do that? They were right).

For example, I am working on a SwiftUI project (an Apple Watch app), and forgot how to do a fairly basic thing. I could have looked it up, in a few minutes, but it was easier to just spin up ChatGPT, and ask it how to do it. I had the answer in a few seconds. Looking up SwiftUI stuff is a misery. The documentation is ... a work in progress ...

__MatrixMan__ - 20 hours ago

I've been considering a... protocol? for improving this. Consider this repo:

    foo.py
    bar.py
    bar.py.vibes.md
This would indicate that foo.py is human-written (or at least thoroughly reviewed by a human), while bar.py is LLM-written with a lower bar of human scrutiny.

bar.py.vibes.md would contain whatever human-written guidance describes how bar should look. It could be an empty file, or a few paragraphs, or it could contain function signatures and partially defined data types.
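
To make it concrete (the content here is purely hypothetical), a short bar.py.vibes.md might read:

    # vibes for bar.py (example)
    Parse the nightly export CSV and append one summary row per account
    to reports.sqlite. Standard library only, small pure functions,
    and keep the public surface to load_rows() and write_summary().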

If an LLM wants to add a new file, it gets a vibes.md with whatever prompt motivated the addition.

Maybe some files keep their associated *.vibes.md forever, ready to be totally rewritten as the LLM sees fit. Maybe others stick around only until the next release, after which the associated code is reviewed and the vibes files are removed (or somehow deactivated; I could imagine it being useful for them to still be present).

What do people think, do we need handcuffs of this kind for our pair programming friends the LLMs?

pjmlp - 3 hours ago

Even humans are bad pair programmers. I always try to steer away from projects or companies that have drunk the whole XP Kool-Aid.

jamil7 - 9 hours ago

I think one huge issue with pairing for non-programmers or junior programmers is that the LLM never pushes back on whatever you throw at it. Like, it can't deconstruct and examine what the actual problem is and suggest a more robust or simpler alternative.

sltr - 9 hours ago

> to ask a clarifying question

I think one shortfall of LLMs is their reluctance to ask clarifying questions. From my own post [1]:

> LLMs are poor communicators, so you have to make up the difference. Unlike a talented direct report, LLMs don't yet seem generally able to ask the question-behind-the-question or to infer a larger context behind a prompt, or even ask for clarification.

[1] https://www.slater.dev/dev-skills-for-the-llm-era/

pradeepodela - 11 hours ago

The major problem I see with current LLM-based code generation is their overconfidence beyond a certain point. I've experienced agents losing track of what they are developing; a single line change can literally mess up my entire codebase, making debugging a nightmare.

I believe we need more structured, policy-driven models that exhibit a bit of self-doubt, prompting them to come back to us for clarification. Furthermore, there should be certain industry standards in place. Another significant issue is testing and handling edge cases: no matter what, today's AI consistently fails when dealing with these scenarios, and security remains a concern. What are some problems you have noticed?

skeptrune - 5 hours ago

The title of the blog is negative, but the contents seem fairly positive? If a few UX improvements are the only blocker to the author finding LLMs to be useful pair programmers, then we are in a good spot.

epolanski - 4 hours ago

I'm conflicted; you can slow down, take all the time you need to understand, and ask it to clarify further.

woah - 6 hours ago

> give up on editor-based agentic pairing in favor of asynchronous workflows like GitHub's new Coding Agent, whose work you can also review via pull request

Why not just review the agent's work before making a git commit?

azhenley - 11 hours ago

Writing out hundreds of lines of code is not what I meant by proactive tools…

Where are the proactive coding tools? https://austinhenley.com/blog/proactiveai.html

tobyhinloopen - 18 hours ago

This guy needs a custom prompt. I keep a prompt doc around that is constantly updated based on my preferences and corrections.

Not a few sentences, but many, many lines of examples and documentation.

travisgriggs - 17 hours ago

> Allow users to pause the agent to ask a clarifying question or push back on its direction without derailing the entire activity or train of thought

I think I've seen Zed/Claude do something like this. A couple of times I've hit return, then realized from the direction it starts going that I missed a clarifying statement, so I type it in quickly and it corrects course.

syllogism - 14 hours ago

LLM agents are very hard to talk about because they're not any one thing. Your action space in what you say and what approach you take varies enormously, and we have very little shared body of knowledge about what other people are doing and how they're doing it. Then the agent changes underneath you, or you tweak your prompt, and it's different again.

In my last few sessions I saw the efficacy of Claude Code plummet on the problem I was working on. I have no idea whether it was just the particular task, a modelling change, or changes I made to the prompt. But suddenly it was glazing every message ("you're absolutely right"), confidently telling me up is down (saying things like "tests now pass" when they completely didn't), it even cheerfully suggested "rm db.sqlite", which would have set me back a fair bit if I said yes.

The fact that the LLM agent can churn out a lot of stuff quickly greatly increases 'skill expression' though. The sharper your insight about the task, the more you can direct it to do something specific.

For instance, most debugging is basically a binary search across the set of processes being conducted. However, the tricky thing is that the optimal search procedure is going to be weighted by the probability of the problem occurring at the different steps, and the expense of conducting different probes.

A common trap when debugging is to take an overly greedy approach. Due to the availability heuristic, our hypotheses about the problem are often too specific. And the more specific the hypothesis, the easier it is to think of a probe that would eliminate it. If you keep doing this you're basically playing Guess Who by asking "Is it Paul? Is it Anne?" and so on, instead of "Is the person a man? Does the person have facial hair?"

I find LLM agents extremely helpful at forming efficient probes of parts of the stack I'm less fluent in. If I need to know whether the service is able to contact the database, asking the LLM agent to write out the necessary cloud commands is much faster than getting that from the docs. It's also much faster at writing specific tests than I would be. This means I can much more neutrally think about how to bisect the space, which makes debugging time more uniform, which in itself is a significant net win.

I also find LLM agents to be good at the 'eat your vegetables' stuff -- the things I know I should do but would economise on to save time. Populate the tests with more cases, write more tests in general, write more docs as I go, add more output to the scripts, etc.

Pandabob - 15 hours ago

I basically jump away from Cursor to ChatGPT when I need to think thoroughly about something like an architecture decision or an edge case. Then, when I've used ChatGPT to come up with an implementation plan, I jump back to Cursor and have Claude do the actual coding. o3 and ChatGPT's search functionality are currently just better (at least for me) for "type 2" thinking tasks.

cmrdporcupine - 7 hours ago

Article has some good points. They move fast, and they can easily run off the rails if you don't watch carefully.

And I've found that it's just as mentally exhausting programming alongside one as it is doing it yourself.

The chief advantage I've found of working alongside Claude is its automation of tedious (to me) tasks.

UltraSane - 20 hours ago

It is rather soul crushing how fast LLMs spit out decent code.

SkyBelow - 10 hours ago

>Continue to practice pair-programming with your editor, but throttle down from the semi-autonomous "Agent" mode to the turn-based "Edit" or "Ask" modes.

This can be done while staying in agent mode. I never used edit mode and only use ask mode when my question has nothing to do with the project I have open. Any other time, I tell it to either make no changes at all as I'm only asking a question to research something, or to limit changes to a much smaller scope and style. It doesn't work perfectly, but it works well enough that it is worth the tradeoff given the extra capabilities agent mode seems to provide (this likely depends upon the specific AI/LLM system you are using, so given another tool I might not arrive at the same conclusion).

atemerev - 17 hours ago

Aider does everything right. Stop using Cursor or any other agentic environments. Try Aider, it works exactly as suggested here.

ninetyninenine - 20 hours ago

>LLM agents make bad pairs because they code faster than humans think.

Easily solved. Use less compute. Use slower hardware. Or put in the prompt to pause at certain intervals.

ramesh31 - a day ago

>LLM agents make bad pairs because they code faster than humans think

This is why I strongly dislike all of the terminal based tools and PR based stuff. If you're left to read through a completed chunk of code it is just overwhelming and your cycle time is too slow. The key to productivity is using an IDE based tool that shows you every line of code as it is being written, so you're reading it and understanding where it's going in real time. Augmentation, not automation, is the path forward. Think of it like the difference between walking and having a manual transmission car to drive, not the difference between having a car and having a self driving car.

natch - 6 hours ago

tldr: author is a bad prompter.

Good prompting takes real and ongoing work, thought, foresight, attention, and fastidious communication.
