Why agents are bad pair programmers
justin.searls.co
261 points by sh_tomer a day ago
I think this puts into words a reason why I bounced off using AI this way. When I need something done I often have a rough idea of how I want it done, and how the AI does it often doesn't match what I want. But because it's gone off and written 2,000 lines of code, it's suddenly more work for me to go through and say "Ok, so first off, strip all these comments out, you're doubling the file with trivial explanations of simple code. I don't want X to be abstracted this way, I want that..." etc. And then when I give it feedback, the 2,000 lines of code suddenly switch to 700 lines of completely different code and I can't keep up. And I don't want my codebase full of disjoint scripts that I don't really understand and that all have weirdly different approaches to the problem. I want an AI that I share opinions with, which is obviously tough. It's like working with someone on their first day.
I don't know if it's giving the tools less self-confidence per se, but I think it's exposing more of the design process. Ideally you want your designer to go "Ok, I'm thinking of this approach, I'll probably have these sorts of functions or classes, this state will be owned here" and we can approve that first, rather than going straight from prompt -> implementation.
Just like with human engineers, you need to start with a planning session. This involves a back-and-forth discussion to hammer out the details before writing any code. I start off as vague as possible to see if the LLM recommends anything I hadn't thought of, then get more detailed as I go. When I'm satisfied, I have it create two documents, initialprompt.txt and TODO.md. The initial prompt file includes a summary of the project along with instructions to read the to-do file and mark each step as complete after finishing it.
This ensures the LLM has a complete understanding of the overall goals, along with a step by step list of tasks to get there. It also allows me to quickly get the LLM back up to speed when I need to start a new conversation due to context limits.
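For what it's worth, a minimal sketch of what those two files might contain (the project and steps here are made up, just to show the shape):

    initialprompt.txt
      This project is a CLI tool that syncs a local folder to S3.
      Read TODO.md before doing anything else. Work through it one step
      at a time, and mark each step [x] complete after finishing it.

    TODO.md
      - [ ] 1. Set up the project skeleton and dependency manifest
      - [ ] 2. Implement folder scanning and change detection
      - [ ] 3. Implement upload with retry/backoff
      - [ ] 4. Add tests for the change-detection logic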
In essence, I need to schedule a meeting with the LLM and 'hammer out a game plan.' Gotta make sure we're 'in sync' and everybody's 'on the same page.'
Meeting-based programming. No wonder management loves it and thinks it should be the future.
LLMs are stealing the jobs of developers who go off half-cocked and spend three days writing 2000 lines of code implementing the wrong feature instead of attending a 30 minute meeting
That's dumb, of course, but sometimes people really just do the bare minimum to describe what they want and they can only think clearly once there's something in front of them. The 2000 lines there should be considered a POC, even at 2000 lines.
and the jobs of developers that want to schedule another breakout session to discuss the pros and cons of a 2-line change.
my manager has been experimenting with having AI first write the specs as architecture decision records (ADRs), then explain how it would implement them, then slowly actually implement with lots of breaks, review, and approval/feedback. He says it's been far superior to typical agent coding, but not perfect.
> This ensures the LLM has a complete understanding of the overall goals
Forget about the overall goal. I have this simple instruction that I send with every request:
"stop after every failing unit test and discuss implementation with me before writing source code "
but it only does that about 7 times out of 10. Other times it just proceeds with implementation anyways.
I've found similar behaviour with stopping at linting errors. I wonder if my instructions are conflicting with the agent's system prompt.
System prompts themselves have many contradictions. I remember hearing an Anthropic engineer (possibly Lex Fridman's interview with Amanda Askell) talking about using exaggerated language like "NEVER" just to steer Claude to rarely do something.
This absolutely captures my experience.
My successful AI-written projects are those where I care solely about the output and have little to no knowledge about the subject matter.
When I try to walk an agent through creating anything about which I have a deeply held opinion of what good looks like, I end up frustrated and abandoning the project.
I've enjoyed using roo code's architect function to document an agreed upon approach, then been delighted and frustrated in equal measure by the implementation of code mode.
One revelation is to always start new tasks and avoid continuing large conversations, because I would typically tackle any problem myself in smaller steps with verifiable outputs, whereas I tend to pose the entire problem space to the agent, which it invariably fails at.
I've settled on spending time finding what works for me. Earlier today I took 30 minutes to add functionality to an app that would've taken me days to write. And what's more I only put 30 minutes into the diary for it, because I knew what I wanted and didn't care how it got there.
This leads me to conclude that using AI to write code that a(nother) human is one day to interact with is a no-go, for all the reasons listed.
> "This leads me to conclude that using AI to write code that a(nother) human is one day to interact with is a no-go, for all the reasons listed." So, if one's goal is to develop code that is easily maintainable by others, do you think that AI writing code gets in the way of that goal?
Anthropic's guide to using Claude Code [1] is worth reading.
Specifically, their recommended workflow is "first ask it to read the code, then ask it to make a plan to implement your change, then tell it to execute". That sounds like the workflow you're asking for - you can read its plan and make adjustments before it writes a single line of code.
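As a purely illustrative sketch (the file names here are invented), that three-step exchange might look like:

    1. "Read src/billing/ and summarize how invoices are generated. Don't write any code yet."
    2. "Propose a plan for adding proration to invoice generation. List the files and functions you'd touch."
    3. "Plan looks good, but keep the changes inside invoice.py. Go ahead and implement it."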
One of the weird things about using agents is that if they're doing things in a way you don't like, including things like writing code without first running the design by you, you can simply ask them to do things a different way.
[1] https://www.anthropic.com/engineering/claude-code-best-pract...
> you can simply ask them to do things a different way
Instead of a writing a blog post about how they didn't guess how you wanted things done?
good one!
I'm wondering how some can complain about Claude AI:
- it's actually enlightening
- it saves a lot of time
- by intuition, I did what's written in this blog from the beginning
Yes:
- sometimes the solution is rubbish, because I can see that it's "randomly" connecting/integrating stuff
- ...but in about 95% of cases the output is exactly what I asked for
Yeah, and then you just wind up feeling exhausted AND unsatisfied with where you end up. You are exactly who I posted this for.
100% of my positive experiences with agent coding are when I don't have reason to care about the intrinsic qualities of the code (one-off scripts or genuine leaf node functions that can't impact anything else).
I prompt it to come up with 3 potential implementation plans, choose which one it thinks is best, and explain its plan to me before it does anything. I also ask it to enumerate which files/functions it will modify in its chosen plan. Then you can give feedback on what it thinks and have it come up with a new plan if you don't like it. Every bit of context and constraints you give it helps. Having a design doc + a little description of your design/code "philosophy" helps. This is easy to do in cursor with rules, I'm guessing other tools have a similar feature. Also, if there's a somewhat similar feature implemented already or if you have a particular style, tell it to reference example files/code snippets.
> but because it's gone off and written a 2,000 lines of code
That’s a you problem, not an AI problem. You have to give it small tasks broken down the same way you would break them down.
Your ai is generating 2000 line code chunks? Are you prompting it to create the entire Skyrim game for SNES? Then after taking long lunch, getting mad when you press run and you find out it made fallout with only melee weapons in a ps1 style?
> I want an AI that I have similar opinions to, which is obviously tough. It's like working with someone on their first day.
Most of what you're describing does apply to humans on their first day, and to AIs on their first day. If you aren't capturing these preferences somewhere and giving them to either a human or the AI, then why would they somehow know your preferences? For AI, the standard that's forming is that you create some markdown file(s) with these preferences so they only need to be explained once and are auto-provided as context.
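For example, a preferences file of that kind might be as simple as the following (filename and rules purely illustrative; the exact convention varies by tool):

    # Coding preferences
    - No explanatory comments on trivial code; comment only non-obvious decisions.
    - Prefer small, composable functions over new layers of abstraction.
    - Don't add error handling that swallows errors; let them propagate.
    - Match the existing module layout; don't introduce new top-level folders.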
I honestly don't expect to use AI tools extensively for code generation until we figure out how to have the models learn and become accustomed to me aside from clever context prompting. I want my own models derived from the baseline.
That said, I also value not becoming too dependent on any service which isn't free and efficient. Relying on a CNC machine when you never learned how to whittle strips me of a sense of security and power I'm not comfortable with.
> "Ok, I'm thinking of this approach, i'll probably have these sorts of functions or classes, this state will be owned here"
This is the gist of what I've always wanted from a programming mentor, instructor, or tutor.
It can be surprisingly hard to find. Knowing that current LLMs still struggle with it perhaps helps explain why.
The trick is to have rules specific to the project and your programming style & preferences.
Think of the AI like an outsourced consultant who will never say no. It'll always do everything it can to solve the problem you've given it. If it doesn't know how, it'll write a thousand lines when a single additional dependency and 2 lines of code would've done it.
There is an old, related issue about why one should avoid using equality relations in AI systems used for creating proofs. The system will go back and forth and fill the log with trivial statements before it finds the right proof path. This might end up being the majority of the proof, and could be an unbounded part. Then somebody has to read that and spend a good deal of time deciding what's trivial and what isn't. Whereas if you remove equality, you have something that isn't very natural.
Personally, I've gone from working with the AI to write code to working with it to develop specifications. It's also useful for troubleshooting issues.
I’m no longer a developer by trade and it’s more a hobby or specific problem solving scenario now. But I find using it to identify gaps in my thinking and edit English is ultimately better than getting random code — I love editing English text, but find editing code without consistent style a drag for my purposes.
At the start of the prompt, before the project requirements, I copy-paste a paragraph about the code I want.
No emojis, no comments, no console log statements, no read me file, no error handling. Act as a senior developer working with other experienced developers.
Otherwise it happily generates a bunch of unreadable trash. The error handling it generates will most of the time just hide errors instead of actually dealing with them.
initial prompting like this has a huge impact, yes.
also: I clean the chat and start over sometimes, because results may differ.
People are thinking too much of humans and LOC as something valuable or worth their consideration when working with AI (because usually those LOC would have required human effort). This is simply not the case when doing AI coding, and you need to adjust how you work because of that and play to the strengths of this setup, if you want to get something out of it and not frustrate yourself.
Here is how to do this: Have it generate something. That first 2000 lines of not so great first attempt code, don't even think about understanding all of that, or, worse, about correcting it.
Review it loosely. You are not dealing with a human! There is absolutely no need to be thorough or nice. You are not hurting any feelings. Go for 80/20 (or the best ratio you think you can get).
Then, think:
- Anything you forgot to tell the AI about? Update your initial prompt.
- Anything the AI simply does not do well or to your liking? Write general instructions (all of the IDEs have some way of doing that) that are very explicit about what you don't want to see again, and what you want to see instead.
Then revert everything the AI did, and have it go again from the start. You should converge on something better.
This approach is essentially the PR workflow preferred by the author. Why let an LLM make huge changes to your working copy just for you to revert them next, instead of just writing patches to be asynchronously reviewed? What you propose is no way of doing pair programming in particular, and seems to support the author’s argument.
1. There is not a mention of "pair programming" in the comment I was addressing. As often happens, the discussion evolves.
2. The point is, that you are training the AI through this process. You can do pair programming afterwards (or not). Aim to instruct it to give you ballpark answers first, and take it from there.
> LOCs as something valuable or worth their consideration when working with AI (because usually that LOCs would have required human effort)
At this point AI generated code absolutely requires review by a human so LOC is still an important metric.
> It's like working with someone on their first day.
This matches my experience exactly, but it's worse than working with a human on their first day: day 100 for an AI is still like working with it on its first day. Humans have effectively infinite context windows over a long enough period of time; an AI's context window is so constrained that it's not worthwhile to invest the effort to 'teach' it like you would a junior engineer.
It's not really that humans have infinite context windows, it's more that context windows are a very poor substitute for long-term memory.
Memory, even in a text-heavy field like programming, is not only text-based, so it's often hard to describe, for example, an appropriate amount of error checking in prompt.md. Giving a person with anterograde amnesia a book of everything they know - no matter how well indexed or how searchable - will not fix the lack of long-term memory.
I don't have this experience with Claude, frankly. I do have to correct it, but I try to give very specific prompts with very specific instructions. It does well with highly commented code.
Now, I have the best luck in my personal project codebase, which I know extremely intimately so can be very surgical with.
Work, which has far fewer comments and is full of very high-level abstractions that I don't know as well... it struggles with. We both do.
It's a fine pair programmer when one of the pair knows the codebase extremely well. It's a bad companion elsewhere.
Whenever I land on the front page, I check the comments and brace for HN coming and telling me how stupid I am and lighting me aflame in front of my peers.
But sometimes if I manage to nail the right headline, nobody reads my post and just has their own discussion, and I am spared.
Hilarious but realistic take. I've noticed a similar trend with other posts. I'm a fan of the discourse either way tbh.
I liked your post, a bit of a “how to enjoy pair programming with an AI”. Useful, so thank you!
When I first tried an LLM agent, I was hoping for an interactive, 2-way, pair collaboration. Instead, what I got was a pairing partner who wanted to do everything themselves. I couldn't even tweak the code they had written, because it would mess up their context.
I want a pairing partner where I can write a little, they write a little, I write a little, they write a little. You know, an actual collaboration.
My approach has generally been to accept, refactor and reprompt if I need to tweak things.
Of course this does artificially inflate the "accept rate" which the AI companies use to claim that it's writing good code, rather than being a "sigh, I'll fix this myself" moment.
I do this too and it drives me nuts. It's very obvious to me (and perhaps anyone without an incentive to maximize the accept rate) that the diff view really struggles. If you leave a large diff, Copilot and Cursor will both get confused and start duplicating chunks, or they'll fail to see the new (or the old) code, but if you accept it, it always works.
Aider solves this by turn-taking. Each modification is a commit. If you hate it, you can undo it (type /undo and it does the git reset --hard for you). If you can live with the code but want to start tweaking it, do so, then /commit (it writes the commit message for you by reading the diffs you made). Working in turns, by commits, Aider can see what you changed and keep up with you. I usually squash the commits at the end, because the wandering way of correcting the AI is not really useful history.
Have you tried recently? This hasn't been my experience. I modify the code it's written, then ask it to reread the file. It generally responds "I see you changed file and [something.]" Or when it makes a change, I tell it I need to run some tests. I provide feedback, explain the problem, and it iterates. This is with Zed and Claude Sonnet.
I do notice though that if I edit what it wrote before accepting it, and then it sees it (either because I didn’t wait for it to finish or because I send it another message), it will overwrite my changes with what it had before my changes every single time, without fail.
(Zed with Claude 4)
Gemini has insisted on remembering an earlier version of a file even after its own edits.
“We removed that, remember?”
“Yes! I see now …”
Sometimes it circles back to that same detail that no longer exists.
Interesting. I always wait for it to finish with my workflow.
It does it even if I wait for it to finish, but don't accept. Eg:
Starting code: a quick brown fox
prompt 1: "Capitalize the words"
AI: A Quick Brown Fox
I don't accept or reject, but change it to "A Quick Red Fox"
prompt 2: "Change it to dog"
AI: A Quick Brown Dog
Do you tell it to reread the file? Seems like the updates aren't in the context.
I usually add “discuss first. Don’t modify code yet”. Then we do some back and forth. And finally, “apply”.
Claude Code has "plan mode" for this now. It enforces this behavior. But its still poorly documented.
They should add a “cmd-enter” for ask, and “enter” to go.
Separately, if I were at cursor (or any other company for that matter), I’d have the AI scouring HN comments for “I wish x did y” suggestions.
I've been thinking about this a lot recently - having AI automate product manager user research. My thread of thought goes something like this:
0. AI can scour the web for user comments/complaints about our product and automatically synthesize those into insights.
1. AI research can be integrated directly into our product, allowing the user to complain to it just-in-time, whereby the AI would ask for clarification, analyze the user needs, and autonomously create/update an idea ticket on behalf of the user.
2. An AI integrated into the product could actually change the product UI/UX on its own in some cases, perform ad-hoc user research, asking the user "would it be better if things were like this?" and also measuring objective usability metrics (e.g. task completion time), and then use that validated insight to automatically spawn a PR for an A/B experiment.
3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?
> 3. Wait a minute - if the AI can change the interface on its own - do we even need to have a single interface for everyone? Perhaps future software would only expose an API and a collection of customizable UI widgets (perhaps coupled with official example interfaces), which each user's "user agent AI" would then continuously adapt to that user's needs?
Nice, in theory. In practice it will be "Use our Premium Agent at 24.99$/month to get all the best features, or use the Basic Agent at 9.99$ that will be less effective, less customizable and inject ads".
Well, at the end of the day, capitalism is about competition, and I would hope for a future where that "User Agent AI" is a local model fully controlled by the user, and the competition is about which APIs you access through them - so maybe "24.99$/month to get all the best features", but (unless you relinquish control to MS or Google), users wouldn't be shown any ads unless they choose to receive them.
We're seeing something similar in VS Code and its zoo of forks - we're choosing which API/subscriptions to access (e.g. GitLens Pro, or Copilot, or Cursor/Windsurf/Trae etc.), but because the client itself is open source, there aren't any ads.
I try to be super careful: I type the prompt I want to execute in a text file, ask the agent to validate and improve on it, and ask it to add an implementation plan. I even let another agent review the final plan. But even then, occasionally it still starts implementing halfway through a refinement.
Same. I use /ask in Aider so I can read what it's planning, ask follow-up questions, get it to change things, then after a few iterations I can type "Make it so" while sitting back to sip on my Earl Grey.
I had done something slightly different. I would ask LLM to prepare a design doc, not code, and iterate on that doc before I ask them to start coding. That seems to have worked a little better as it’s less likely to go rogue.
You can totally do that. Just tell it to.
If you want an LLM to do something, you have to explain it. Keep a few prompt docs around to load every conversation.
I've asked for hints/snippets to give ideas and then implemented what I wanted myself (not commercially). Worked OK for me.
Hmm you can tweak fine these days without messing up context. But, I run in “ask mode” only, with opus in claude code and o3 max in cursor. I specifically avoid agent mode because, like in the post, I feel like I gain less over time.
I infrequently tab complete. I type out 80-90% of what is suggested, with some modifications. It does help I can maintain 170 wpm indefinitely on the low-medium end.
Keeping up with the output isn’t much an issue at the moment given the limited typing speed of opus and o3 max. Having gained more familiarity with the workflow, the reading feels easier. Felt too fast at first for sure.
My hot take is that if GitHub copilot is your window into llms, you’re getting the motel experience.
> My hot take is that if GitHub copilot is your window into llms, you’re getting the motel experience.
I’ve long suspected this; I lean heavily on tab completion from copilot to speed up my coding. Unsurprisingly, it fails to read my mind a large portion of the time.
Thing is, mind reading tab completion is what I actually want in my tooling. It is easier for me to communicate via code rather than prose, and I find the experience of pausing and using natural language to be jarring and distracting.
Writing the code feels like a much more direct form of communicating my intent (in this case to the compiler/interpreter). Maybe I’m just weird; and to be honest I’m afraid to give up my “code first” communication style for programming.
Edit: I think the reason why I find the conversational approach so difficult is that I tend to think as I code. I have fairly strong ADHD and coding gives me appropriate amount of stimulation to do design work.
Take a look at aider's watch mode. It seems like a bridge for code completion with more powerful models than Copilot.
In all honesty - have you tried doing what you would do with a paired programmer - that is, talk to them about it? Communicate? I’ve never had trouble getting cursor or copilot to chat with me about solutions first before making changes, and usually they’ll notice if I make my own changes and say “oh, I see you already added XYZ, I’ll go ahead and move on to the next part.”
> I've never had trouble getting cursor or copilot to chat with me about solutions first before making changes
Never had any trouble.... and then they lived together happy forever.
I do this all the time with Claude Code. I’ll accept its changes, make adjustments, then tell it what I did and point to the files or tell it to look at the diff.
Pair programming requires communicating both ways. A human would also lose context if you silently changed their stuff.
Pair programming is also not suitable for all cases
Maybe not for many cases
I mentioned this elsewhere but I find it absolutely impossible to get into a good programming flow anymore while the LLM constantly interrupts me with suggested autocompletes that I have to stop, read, review, and accept/reject
It's been miserable trying to incorporate this into my workflow
Second this. My solution is to have a 'non-AI' IDE and then a Cursor/VS Code to switch between. Deep work cannot be achieved by chatting with the coding bots, sorry.
I do this as well and it works quite well for me like that!
Additionally, when working on microservices and on issues that don’t seem too straightforward, I use o3 and copy the whole code of the repo into the prompt and refine a plan there and then paste it as a prompt into cursor. Handy if you don’t have MAX mode, but a company-sponsored ChatGPT.
I do this too by pasting only the relevant context files into O3 or Claude 4. We have an internal tool that just lets us select folders/files and spit out one giant markdown.
Thirded. It was just completely distracting and I had to turn it off. I use AI but not after every keystroke, jeez.
But but but... "we are an AI-first company".
Yeah, nah. Fourthed!
> AI-first company
Does any company introduce itself like that?
It's like when your date sends subtle signals, like kicking sleeping tramps in the street and snorting the flour over bread at the restaurant.
(The shocking thing is that the expression would even make sense when taken properly - "we have organized our workflows through AI-intelligent systems" - while at this time it easily means the opposite.)
> > AI-first company
> Does anybody introduce itself like that?
Yes, I've started getting job posts sent to me that say that.
Declaring one's company "AI-first" right now is a great time-saver: I know instantly that I can disregard that company.
I just cmd + shift + p -> disable Cursor tab -> enter
Sure, you could just add a shortcut too.
After a while, it turns into a habit.
This is kind of intentionally the flow with Claude code as I’ve experienced it.
I’m in VSCode doing my thing, and it’s in a terminal window that occasionally needs or wants my attention. I can go off and be AI-Free for as long as I like.
> Deep work cannot be achieved by chatting with the coding bots, sorry.
...by you. Meanwhile, plenty of us have found a way to enhance our productivity during deep work. No need for the patronization.
I don't believe you experience deep work the same way I do then
In my mind you cannot do deep work while being interrupted constantly, and LLM agents are constant interruptions
We're getting constantly interrupted with Slack messages, Zoom meetings, emails, Slack messages about checking said emails, etc. At least an LLM isn't constantly pinging you for updates (yet?) - you can get back to it whenever.
if you're using a computer at all, you're doing it wrong. deep work can only be done from the forest with no internet reception, pencil and paper
Everyone knows real programmers only need to use a butterfly.
If you've opened your eyes, it's not deep work.
Deep work happens in a sensor deprivation tank. And you have to memorize everything you thought through, and write it down (with quill pen) after you emerge.
Anything else isn't really deep. Sorry, you posers.
This sounds like an issue with the specific UI setup you are using. I have mine configured so it only starts doing stuff if I ask it to. It never interrupts me.
You can do better than a No true Scotsman fallacy. The fact is that not everyone works the same way you do, or interacts the same way with agents. They are not constant interruptions if you use them correctly.
Essentially, this is a skill issue and you're at the first peak of the Dunning–Kruger curve, sooner ready to dismiss those with more experience in this area as being less experienced, instead of keeping an open mind and attempting to learn from those who contradict your beliefs.
You could have asked for tips since I said I've found a way to work deeply with them, but instead chose to assume that you knew better. This kind of attitude will stunt your ability to adopt these programs in the same way that many people were dismissive about personal computers or the internet and got left behind.
It’s quite amusing to see you complain about patronisation, and then see you turn about and do it yourself one comment later.
As an observer to this conversation, I can't help but notice that both have a good point here.
Soulofmischief’s main point is that meesles made an inappropriate generalization. Meesles said that something was impossible to do, and soulofmischief pointed out that you can't really infer that it's impossible for everyone just because you couldn't find a way. This is a perfectly valid point, but it wasn't helped by soulofmischief calling the generalization “patronizing”.
Bluefirebrand pushed back on that by merely stating that their experience and intuition match those of meesles, but soulofmischief then interpreted that as implying they're not a real programmer and called it a No True Scotsman fallacy.
It went downhill from there with soulofmischief trying to reiterate their point but only doing so in terms of insults such as the Dunning-Kruger line.
I only took issue with ", sorry." The rest of it I was fine with. I definitely didn't need to match their energy so much though, I should have toned it down. Also, the No true Scotsman was about deep work, not being a programmer, but otherwise yeah. I didn't mean to be insulting but I could have done better :)
Oh 100%. I deliberately passed no judgement on the actual main points, as my experience is quite literally in between both of theirs.
I find agent mode incredibly distracting and it does get in the way of very deep focus for implementation for myself for the work I do... but not always. It has serious value for some tasks!
I'm open to hearing how being honest with them about their negative approach is patronizing them.
Calling someone "on the first peak of the Dunning-Kruger curve" is patronizing them.
How would you have handled it?
Here is how I might have handled it differently:
Instead of
> Meanwhile, plenty of us have found a way to enhance our productivity during deep work. No need for the patronization.
you could have written
> Personally, I found doing X does enhance my productivity during deep work.
Why it's better: 1) cuts out the confrontation (“you're being patronizing!”), 2) offers the information directly instead of merely implying that you've found it, and 3) speaks for yourself and avoids the generalization about “plenty of people”, which could be taken as a veiled insult (“you must be living as a hermit or something”).
Next:
> You can do better than a No true Scotsman fallacy.
Even if the comment were a No True Scotsman, I would not have made that fact the central thesis of this paragraph. Instead, I might have explained the error in the argument instead. Advantages: 1) you can come out clean in the case that you might be wrong about the fallacy, and 2) the commenter might appreciate the insight.
Reason you're wrong in this case: The commenter referred entirely to their own experience and made no “true programmer” assertions.
Next:
> Essentially, this is a skill issue [...] Dunning–Kruger curve [...] chose to assume that you knew better. [...]
I would have left out these entire two paragraphs. As best as I can tell, they contain only personal attacks. As a result, the reader comes away feeling like your only purpose here is to put others down. Instead, when you wrote
> You could have asked for tips
I personally would have just written out the tips. Advantage: the reader may find it useful in the best case, and even if not, at least appreciate your contribution.
That's real patronizing. His answers were fine, unless you think he is totally wrong.
It would be informative if both sides shared what their problem domain is when describing their experiences.
It's possible that the domain or the complexity of the problems are the deciding factor for success with AI supported programming. Statements like 'you'll be left behind' or 'it's a skill issue' are as helpful as 'It fails miserably'
For what it’s worth, the deepest-thinking and most profound programmers I have met—hell, thinkers in general—have a peculiar tendency to favour pen and paper. Perhaps because once their work is recognised, they are generally working with a team that can amplify them without needing to interrupt their thought flow.
Ha, I would count myself among those if my handwriting wasn't so terrible and I didn't have bad arthritis since my youth. I still reach for pen and paper on the go or when I need to draw something out, but I've gotten more productive using an outliner on my laptop, specifically Logseq.
I think there's still room for thought augmentation via LLMs here. Years back when I used Obsidian, I created probably the first or second copilot-for-Obsidian plugin and I found it very helpful, even though GPT-3 was generally pretty awful. I still find myself in deep flow, thinking in abstract, working alongside my agent to solve deep problems in less time than I otherwise would.
> You could have asked for tips since I said I've found a way to work deeply with them
How do you work deeply with them? Looking for some tips.
Analysis in the last 5-10 years has shown the Dunning-Kruger effect may not really exist. So it’s a poor basis on which to be judgmental and condescending.
> judgmental and condescending
pushing back against judgement and condescension is not judgemental and condescending.
> may not really exist
I'm open to reading over any resources you would like to provide. Maybe it's "real", maybe it isn't, but I have personally both experienced and witnessed the effect in myself, other individuals, and groups. It's a good heuristic for certain scenarios, even if it isn't necessarily generalizable.
I would invite you to re-read some of the comments you perceived as judgement and condescension and keep an open mind. You might find that you took them as judgement and condescension unfairly.
Meanwhile, you have absolutely been judgemental and condescending yourself. If you really keep the open mind that you profess, you'll take a moment to reflect on this and not dismiss it out of hand. It does not do you any favors to blissfully assume everyone is wrong about you and obliviously continue to be judgmental and condescending.
I recently got a new laptop and had to setup my IDE again.
After a couple hours of coding something felt "weird" - turns out I forgot to login to GitHub Copilot and I was working without it the entire time. I felt a lot more proactive and confident as I wasn't waiting on the autocomplete.
Also, Cursor was exceptional at interrupting any kind of "flow" - who even wants their next cursor position predicted?
I'll probably keep Copilot disabled for now and stick to the agent-style tools like aider for boilerplate or redundant tasks.
The pure LLM workflow is strange and boring. I still write most of my own code and use LLMs when I'm too lazy to write the next piece.
If I give it to an LLM, most of my time is spent debugging and reprompting. I hate fixing someone else's bugs.
Plus I like the feeling of the coding flow... wind at my back. Each keystroke putting us one step closer.
The apps I made with LLMs I never want to go back to, but the apps I made by hand, piece by piece, getting a chemical reaction when problems were solved, are the ones I think positively about and want to go back to.
I always did math on paper or in my head and never used a calculator. It's a skill I've never forgotten, and I worry how many programmers won't be able to code without LLMs in the future.
> Also, Cursor was exceptional at interrupting any kind of "flow" - who even wants their next cursor position predicted?
Me, I use this all the time. It’s actually predictable and saves lots of time when doing similar edits in a large file. It’s about as powerful as multi-line regex search and replace, except you don’t have to write the regex.
AI "auto-complete" or "code suggestions" is the worst, especially if you are in a strongly-type language because it's 80% correct and competing with an IDE that can be 100% correct.
AI agents are much better for me because 1) they don't constantly interrupt your train of thought and 2) they can run compile, run tests, etc. to discover they are incorrect and fix it before handing the code back to you.
> Pair programming is also not suitable for all cases
I think this is true but pair programming can work for most circumstances.
The times where it doesn't work is usually because one or both parties are not all-in with the process. Either someone is skeptical about pair programming and thinks it never works or they're trying to enforce a strict interpretation of pair programming.
It doesn't work when someone already has a solution in mind and all they need to do is type it into the editor
I've been doing this a while. This is most of my work
I love the autocomplete, honestly use it more than any other AI feature.
But I'm forced to write in Go which has a lot of boilerplate (and no, some kind of code library or whatever would not help... it's just easier to type at that point).
It's great because it helps with stuff that's too much of a hassle to talk to the AI for (just quicker to type).
I also read very fast so one line suggestions are just instant anyway (like non AI autocomplete), and longer ones I can see if it's close enough to what I was going to type anyway. And eventually it gets to the point where you just kinda know what it's going to do.
Not an amazing boost, but it does let me be lazy writing log messages and for loops and such. I think you do need to read it much faster than you can write it to be helpful though.
I’ve always seen it as primarily an _education_ tool; the purpose of pair programming isn’t that two people pair programming are more productive than two people working individually, they’re generally not. So pair programming with a magic robot seems rather futile; it’s not going to learn anything.
LLMs in their current incarnation will not, but there's nothing inherently preventing them from learning. Contexts are getting large enough that having a sidecar database living with each project or individual as a sort of corpus of "shit I learned pairing with Justin" is already completely achievable, if only a product company wanted to do that.
Claude Plays Pokemon is kind of an interesting case study for this. This sort of knowledgebase is implemented, but even state of the art models struggle to use it effectively. They seem to fixate on small snippets from the knowledge base without any ability to consider greater context.
Zed has a "subtle" mode, hopefully that feature can become table stakes in all AI editor integrations
I’m a Vim user and couldn’t agree more.
Didn’t like any of the AI-IDEs, but loved using LLMs for spinning up one off solutions (copy/paste).
Not to be a fan boy, but Claude Code is my new LLM workflow. It’s tough trying to get it to do everything, but works really well with a targeted task on an existing code base.
Perfect harmony of a traditional code editor (Vim) with an LLM-enhanced workflow in my experience.
Code regularly, and use AI to get unblocked when you do, or to review your code for mistakes.
Or have the ai write the entire first draft for some piece and then you give it a once over, correcting it either manually or with prompts.
I think the author's dislike for pair programming says a great deal more about the author than it does about pair programming or LLMs.
If you're pair programming and you're not driving, then it's your job to ask the driver to slow down so you can understand what they're doing. You may have to ask them to explain it to you. You may have to explain it back to them. This back-and-forth is what makes pairing work. If you don't do this, then of course you'll get lost.
The author seems to take the same passive position with an LLM, and the results are similar.
I have mixed feelings about this situation. I have committed myself to learning how to use it as effectively as possible and to utilising it extensively for at least one month. Through my company, I have access to multiple products, so I am trying them all.
I can say that I am more productive in terms of the number of lines written. However, I cannot claim to be more productive overall.
For every task it completes, it often performs some inexplicable actions that undo or disrupt other elements, sometimes unrelated ones. The tests it generates initially appear impressive, but upon examining other metrics, such as coverage, it becomes evident that its performance is lacking. The amount of time required to guide it to the correct outcome makes it feel as though I am taking many steps backwards before making any significant progress forward—and not in a beneficial way. On one occasion, it added 50,000 unnecessary import lines into a module that it should not have been altering.
On another occasion, one of the agents completely dismantled the object-oriented programming hierarchy, opting instead to use if/else statements throughout, despite the rules I had set.
The issue is that you can never be certain of its performance. Sometimes, for the same task, it operates flawlessly, while at other times, it either breaks everything or behaves unpredictably.
I have tried various techniques to specify what needs to be done and how to accomplish it, yet often, for similar tasks, its behaviour varies so significantly between runs that I find myself needing to review every change it makes each time. Frustratingly, even if the code is nearly correct and you request an update to just one part, it may still behave erratically.
My experience thus far suggests that it is quite effective for small support tools, but when dealing with a medium-sized codebase, one cannot expect it to function reliably every time.
LLM agents don't know how to shut up and always think they're right about everything. They also lack the ability to be brief. Sometimes things can be solved with a single character or line, but no they write a full page. And, they write paragraphs of comments for even the most minuscule of changes.
They talk at you, are overbearing and arrogant.
I expect a lot of the things people don't like ("output too long, too many comments in code") are side effects of making the LLM good in other areas.
Long output correlates with less laziness when writing code, and higher performance on benchmarks due to the monotone relationship between number of output tokens and scores. Comment spam correlates with better performance because it's locally-specific reasoning it can attend on when writing the next line of code, leading to reduced errors.
Just add to the prompt not to include comments and to talk less.
I have a prompt document that includes a complete summary of the Clean Code book, which includes the rules about comments.
You do have to remind it occasionally.
I have added it in the guidelines doc for Junie and that won't stop it. It can't help itself - it needs to write a comment every three lines, no matter the language it's writing in.
You can, but I would expect code correctness to be reduced, you're removing one mechanism the model uses to dump local reasoning immediately prior to where it's needed.
With that logic, I should ask the AI to _increase_ the amount of comments. I highly doubt the comments it generates are useful, they're usually very superficial.
Perhaps not useful to you, but they are the only way the LLM has to know what it is doing.
It has to reason about the problem in its output, since its output comprises almost the entirety of its "awareness". Unlike you, the LLM doesn't "know" anything, even superficial things.
In some sense it's like us when we are working on a problem with lots of novel parts. We usually have to write down notes to refer to in the process of solving the problem, except for the LLM the problem is always a novel problem.
I usually use huge context/prompt documents (10-100K tokens) before doing anything, I suppose that helps.
I’ll experiment with comments, I can always delete them later. My strategy is to have self-documenting code (and my prompts include a how-to on self-documenting code)
But that information is scattered. It's helpful for the LLM to cluster and isolate local reasoning that it can then "forget" about when it moves on to the next thing. Attending to nearby recent tokens is easy for it, looking up relevant information as needle in a haystack every single time is more error prone. I'm not saying asking it to remove comments will lead to a catastrophic drop off in performance, maybe something like a few percent or even less. Just that it's not useless for pure benchmaxxing.
I was trying out Sonnet 4 yesterday and it spent 15 minutes changing, testing, changing, etc. just to get one config item changed. It ended up changing 40 files for no reason. It also kept trying to open a debugger that didn't exist and load a webpage that requires auth.
They’re far from perfect that’s for sure.
I don't think anyone is seriously claiming perfect. The thing is, all of AI is moving 5 times faster than any disruptive tech before it.
We went from proofreading single emails to researching agentic coding in a year.
It should have been five.
In my experience the problem is not that they are too fast; it's that they are too slow.
Honestly, their speed is just the right amount to make them bad. If they were faster, I could focus on following the code they are writing. But they take so much time for every edit that I tune out. On the other hand if they were slower, I could do other work while they are working, but they are done every 50 seconds to a few minutes which means I can't focus on other tasks.
If they did smaller faster changes it would probably be better.
Ideally though I would prefer them to be more autonomous, and the collaboration mode to be more like going over merge requests than pair programming. I ideally would like to have them take a task and go away for a few hours or even like 30 minutes.
The current loop, provide a task, wait 1 to 3 minutes, see a bunch of changes, provide guidance, repeat is the worst case scenario in my view.
Yeah, I could completely see this. Reminds me of the "Slow Internet vs No Internet" oatmeal comic
> that I tune out
You need a 30L fishtank for your desk. Great for tuning out.
As a developer who doesn't use AI for coding, except for the occasional non-project specific question to a chat bot, I am wondering if you use it for client projects or only for your own projects. If you do use it for client projects, do you have some kind of agreement that you're going to share their code with a third-party? I'm asking because most clients will make you sign a contract saying that you shouldn't disclose any information about the project to a third-party. I even once had a client who explicitly stated that AI should not be used. Do you find clients willing to make an exception for AI coding agents?
I basically only use it in the workplace, and largely because of one of those AI mandates.
I don't think it actually saves me enough time (or for many tasks, any time) so I wouldn't pay for it for my own projects, and also for my own projects, the enjoyability is a big factor, and I enjoy doing more than prompting.
Thank you for the reply. What do you mean by "AI mandates"? Does it mean your company has an explicit policy allowing sharing code with AI services?
Sadly, I mean my current employer is doing the whole "tracking to see AI usage rates" and basically checking in performance reviews if people are using as much AI as the AI sales people told the CEO people need to use.
We're a SaaS company so we own all our code.
I just puked in my mouth a little. Sorry to hear you're being subjected to that.
Wow, really?! I had no idea that such policies existed. Quite astonishing I have to say.
Klarna, Shopify and either Google or Meta made a lot of press promoting policies like that and also the AI companies themselves are selling this kind of approach in the "how to make the best use of our tools" advice they give to execs.
I have a client that actively asks me to use AI more and more. They expect to get better quality code faster, ie. to reduce costs. (That's not my experience but that's beside the point).
I don't share anything with openai/anthropic that I wouldn't feel comfortable pasting into a web search prompt.
So no AI autocomplete I suppose?
I assume AI autocomplete may send any part of your code base or even all of it to a third-party.
No, I don't. This goes for internal projects as well; we're not going to share code unless paid to do so.
We commonly work with personal information, so it would also introduce rather harsh legal risks if US corporations could reach it.
The collaborative style of AI use struck me as the obvious correct use of AI, just as the more popular "AI writing code" style struck me as horribly incorrect and indication of yet again the software industry going off on a fool's tangent, as the larger industry so often does.
I never have AI write code. I ask it to criticize code I've written, and I use it to strategize about large-scale code organization. Used as a strategy consultant, with careful LLM context construction, one can create amazingly effective guides that teach you new information very successfully. That is me using my mind to understand and then do, never giving any AI responsibilities beyond advice. AI is an idiot savant, and must be treated as such.
Finally someone said it: they're overconfident in their approach, they don't consult us on the details of the implementation, and they're trained to create mock APIs that don't follow the existing structure, leading to a lot of rework. The LLM's actions should be measured and collaborative, asking for details when they're not present. It is impossible to give every single detail in the initial prompt, and a follow-up prompt derails the train of thought and design of the application.
I don't know if I'm using it right; I'd love to know more if that's the case. In a way the LLM should improve at being iterative and taking feedback. Maybe it's a hard problem to add to or update the context. I don't know about that either, but I'd love to learn more.
Most stacks now support some form of "plan" workflows. You'd want to first do this, and see if it improves your experience.
One workflow that works well for me, even with small local models, is to start a plan session with something like: "based on @file, and @docs and @examples, I'd like to _ in @path with the following requirements @module_requirements.md. Let's talk through this and make sure we have all the info before starting to code it."
Then go back and forth, make sure everything is mentioned, and when satisfied either put it into a .md file (so you can retry the coding flow later) or just say "ok do it", and go grab a cup of coffee or something.
You can also make this into a workflow with .rules files or .md files, have a snippets thing from your IDE drop this whenever you start a new task, and so on. The idea with all the advancements in LLMs is that they need lots of context if you want them to be anything other than what they were trained on. And you need to try different flows and see what works on your specific codebase. Something that works for projectA might not work for projectB ootb.
Also, giving them more details seems to confuse them. There is probably a way around this, though. They are pretty good at finding a tiny sliver of information out of the ocean. What I hate is that the industry is all geared toward the same model (chat bot). Imagine if we had never invented the keyboard, mouse, GUI, touch screen, etc...
Yes, this is exactly why the "planning" approach never seems to work for me. Like every ounce of planning I do with the LLM it becomes a pound stupider at implementation time
A week or so ago I needed to convince ChatGPT that the following code will indeed initialize the x values in the struct:
    struct MyStruct
    {
        int x = 5;
    };

    ...

    MyStruct myStructs[100];
It was insisting very passionately that you need MyStruct myStructs[100] = {}; instead. I even showed it MSVC assembly output and pointed to the place where it is looping and assigning all the x values, and then it started hallucinating about MSVC not conforming to the standard. Then I did it for GCC and it said the same. It was surreal how strongly it believed it was correct.
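For what it's worth, here's a minimal standalone sketch (mine, not from the thread) that checks the behaviour in question: since C++11 the default member initializer applies to every default-initialized array element, so the = {} isn't needed.

    #include <cassert>

    struct MyStruct
    {
        int x = 5;  // default member initializer (C++11)
    };

    int main()
    {
        MyStruct myStructs[100];         // default-initialization of each element
        for (const auto& s : myStructs)
            assert(s.x == 5);            // holds for all 100 elements; no "= {}" needed
    }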
LLMs don't have beliefs, so "convincing" them of this or that is a waste of your time. The way to handle such cases is to start anew with a clean context and just add your insight to the prompt so that it lands on the right track from the beginning. Remember these models are ultimately just next-token predictors, and anthropomorphizing them will invariably lead to suboptimal interactions.
That is not even valid C code, so you would have to seriously convince me, too.
What makes it invalid is "= 5", and lack of "struct" before "MyStruct" (could have used typedef).
I use an LLM as a reference (on-demand), and don't use agents (yet). I was never into pair programming, anyway, so it isn't a familiar workflow for me.
I will admit that it encourages "laziness," on my part, but I'm OK with that (remember when they said that calculators would do that? They were right).
For example, I am working on a SwiftUI project (an Apple Watch app), and forgot how to do a fairly basic thing. I could have looked it up, in a few minutes, but it was easier to just spin up ChatGPT, and ask it how to do it. I had the answer in a few seconds. Looking up SwiftUI stuff is a misery. The documentation is ... a work in progress ...
> I use an LLM as a reference (on-demand), and don't use agents (yet)
This was me until about three weeks ago. Then, during a week of holiday, I decided I didn't want to get left behind and tried a few side-projects using agents -- specifically I've been using Roo. Now I use agents when appropriate, which I'd guess is about 50% of the work I'm doing.
Roo looks interesting. How does it compare with Cursor and Windsurf?
It burns tokens if you BYOK but you can hook into GH Copilot LLMs directly
I really like the orchestrator and architect personas as-is, out of the box. I prefer it over Cursor / Windsurf for a few reasons:
- no indexing (double-edged sword)
- I find the orchestrator much more useful than Windsurf cascades
- tool usage is fantastic
The no indexing is a double edged sword, it does need to read files constantly, contributing to token burn. However, you don't have to worry about indexed data being on a 3rd party server (cursor), and also since it has to crawl to understand the codebase for it to implement, to me it seems like it is more capable of trickier code implementations, as long as you utilize context properly.
For more complex tasks, I usually either spend 20-30 minutes writing a prompt to give it what I'm looking to implement, or write up a document detailing the approach I'd like to take and iterate with the architect agent.
Afterwards, hand it off to the orchestrator and it manages and creates subtasks, which is to provide targeted implementation steps / tasks with a fresh context window.
If you have a GH Copilot license already, give it a shot. I personally think it's a good balance between control as an architect and not having to tie my time down for implementations, since really a lot of the work in coding is figuring out the implementation plan anyways, and the coding can be busy work, to me personally anyways. I prefer it over the others as I feel Windsurf/Cursor encourages YOLO too much.
I've been considering a... protocol? for improving this. Consider this repo:
foo.py
bar.py
bar.py.vibes.md
This would indicate that foo.py is human-written (or at least thoroughly reviewed by a human), while bar.py is LLM-written with a lower bar of human scrutiny. bar.py.vibes.md would contain whatever human-written guidance describes how bar should look. It could be an empty file, or a few paragraphs, or it could contain function signatures and partially defined data types.
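To make that concrete, a bar.py.vibes.md under this proposal might read something like the following (contents invented purely for illustration):

    # bar.py vibes
    Prompt: wrap the flaky upstream API calls used by foo.py in a retry helper.
    Constraints:
    - exponential backoff, max 5 attempts, no new dependencies
    - expose roughly: def retry_call(fn, *, attempts=5, base_delay=0.5)
    - everything else in bar.py is free for the LLM to reorganize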
If an LLM wants to add a new file, it gets a vibes.md with whatever prompt motivated the addition.
Maybe some files keep their associated *.vibes.md forever, ready to be totally rewritten as the LLM sees fit. Maybe others keep it only until the next release, after which the associated code is reviewed and the vibes file is removed (or somehow deactivated; I could imagine it being useful for them to still be present).
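To make the idea concrete, a hypothetical bar.py.vibes.md might read something like this (entirely invented, just to show the shape: a bit of prose guidance plus partially defined Python signatures):

    Parse the vendor CSV export and return normalized Order records.
    Keep it dependency-free: stdlib csv only. Skip rows with a zero quantity.

    @dataclass
    class Order:
        sku: str
        quantity: int

    def load_orders(path: str) -> list[Order]: ...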
What do people think? Do we need handcuffs of this kind for our pair-programming friends, the LLMs?
I think coding will eventually go away in favor of models with metadata built around them.
How many times have you had a mutation operation where you had to hand-code the insert of 3 or 4 entities and make sure they all come back successfully, or back out properly (perhaps without a transaction, perhaps across multiple databases)?
Make sure the required fields are present, grab the newly created ID, rinse, repeat.
Or, if you're mutating a list, writing code that inserts a new element when you don't know which one is new. And you end up, again, hand-coding loops and checking whatever you remember to check.
What about when you need to do an auth check?
And the hand coder may fail to remember one little thing somewhere.
With LLM-generated code, you can just describe that function and it will remember to do all of those things.
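For concreteness, here is a minimal sketch of the kind of all-or-nothing insert being described, assuming a single SQLite database and invented table names:

    import sqlite3

    def create_order(conn: sqlite3.Connection, customer: dict, items: list[dict]) -> int:
        """Insert a customer, an order, and its line items; commit all or none."""
        with conn:  # commits on success, rolls back if any statement raises
            cur = conn.execute(
                "INSERT INTO customers (name, email) VALUES (?, ?)",
                (customer["name"], customer["email"]),  # required fields
            )
            customer_id = cur.lastrowid  # grab the created ID
            cur = conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
            order_id = cur.lastrowid
            for item in items:  # rinse, repeat for each entity
                conn.execute(
                    "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                    (order_id, item["sku"], item["qty"]),
                )
        return order_id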
With an LLM plus a model and metadata, we won't really need to think of it as editing User.java or User.py anymore. Instead there's User.yaml, and the LLM will just consume that, build out ALL of your required biz logic, and be done with it. It could create a fully authenticating/authorizing REST API + GraphQL API with sane defaults and consistent notions throughout.
And moving into UIs, we can have the same thing. The UI can be described in an organized way: which fields are required for user registration, which fields are optional according to the backend. It's hard to visualize this future, but I think it's a no-code future: models of requirements instead of code.
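As a purely hypothetical sketch, such a User.yaml might look something like this (not any existing tool's format, just an illustration of the idea):

    # User.yaml - hypothetical model-plus-metadata the generator/LLM would consume
    model: User
    fields:
      email:    {type: string, required: true, unique: true}
      name:     {type: string, required: true}
      nickname: {type: string, required: false}
    auth:
      registration_fields: [email, name]
      roles:
        admin:  [read, write]
        member: [read]
    expose:
      rest: true
      graphql: true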
> I think coding will eventually go away in favor of models with metadata built around them.
You can pry my understanding of, and desire to use, traditional programming languages from my cold dead neurons. The entire point of computer systems is that they automatically and unerringly follow precise, explicit instructions.
In writing the code that is supposed to implement my idea, I find that my idea has many flaws.
Sending that idea to an LLM (in absence of AGI) seems like a great way to find out about the flaws too late.
Otherwise, specifying an application in such detail as to obtain the same effect is essentially coding, just in natural language, which is less precise.
What do you suppose that metadata is going to look like if not partially complete code where the LLM fills in the gaps?
I don't understand all of what you wrote, but a lot of it is very old news, usually handled with deterministic tooling you don't have to wait for, some of which you should have built or configured yourself so it's tailored to the type of work you do.
And some of it we've had under the RAD umbrella, basically using configuration files and tools to generate those that are used to generate large portions of systems.
Even humans are bad pair programmers; I always try to steer away from projects or companies that have drunk the whole XP Kool-Aid.
Indeed, this submission is also a good article on why pair programming often fails with humans.
It's not that AIs are bad. It's that pair programming is (often) not effective. In the majority of cases, one side dominates the other.
In my experience, a modified pair programming system works better where the two of us discuss the problem to be solved, then each of us goes off for a few hours independently coming up with ideas, and doing experiments. We then get together, discuss our findings, and finalize a plan of attack. And then pair programming helps as we're both at the same level and on the same page. But even then, having to watch someone else's screen (or have people interrupt you while you're typing) is a pain.
I think one huge issue with pairing for non-programmers or junior programmers is that the LLM never pushes back on whatever you throw at it. It can't deconstruct and examine the actual problem and suggest a more robust or simpler alternative.
> to ask a clarifying question
I think one shortfall of LLMs is their reluctance to ask clarifying questions. From my own post [1]:
> LLMs are poor communicators, so you have to make up the difference. Unlike a talented direct report, LLMs don't yet seem generally able to ask the question-behind-the-question or to infer a larger context behind a prompt, or even ask for clarification.
The major problem I see with current LLM-based code generation is their overconfidence beyond a certain point. I've experienced agents losing track of what they are developing; a single line change can literally mess up my entire codebase, making debugging a nightmare.
I believe we need more structured, policy-driven models that exhibit a bit of self-doubt, prompting them to come back to us for clarification. Furthermore, there should be certain industry standards in place. Another significant issue is testing and handling edge cases: no matter what, today's AI consistently fails when dealing with these scenarios, and security remains a concern. What are some problems you have noticed?
Title of the blog is negative, but the contents seem fairly positive? If a few UX improvements are the only blocker to the author finding LLMs to be useful pair programmers then we are in a good spot.
I'm conflicted: you can slow down, take all the time you need to understand, and ask for further clarification.
> give up on editor-based agentic pairing in favor of asynchronous workflows like GitHub's new Coding Agent, whose work you can also review via pull request
Why not just review the agent's work before making a git commit?
Writing out hundreds of lines of code is not what I meant by proactive tools…
Where are the proactive coding tools? https://austinhenley.com/blog/proactiveai.html
This guy needs a custom prompt. I keep a prompt doc around that is constantly updated based on my preferences and corrections.
Not a few sentences but many many lines of examples and documentation
Can you gist an example of what you mean? In my experience, very large prompts and exacting custom instructions drastically erode "intelligence".
Many of my prompts include _somewhat_ sensitive details because they're tailor-made for each project. This is a more generic prompt I've been using for my code generation tool:
https://gist.github.com/tobyhinloopen/e567d551c9f30390b23a0a...
More about this prompt:
https://bonaroo.nl/2025/05/20/enforced-ai-test-driven-develo...
Lately, I've been letting the agent write the prompt by ordering it to "update the prompt document with my expressed preferences and code conventions", manually reviewing the doc. Literally while writing this comment, I'm waiting for the agent to do:
> note any findings about this project and my expressed preferences and write them to a new prompt document in doc, named 20250610-<summary>.md
I keep a folder of prompt documents because there are so many of them (over 30 as of writing this comment, for a single project). I have more generic ones and more specific ones, and I usually either tell the agent to find relevant prompt documents or tell the agent to read the relevant ones.
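Purely for illustration (these file names are invented, not taken from the linked gists), such a folder might look like:

    doc/
      20250601-base-conventions.md
      20250603-controllers-and-views.md
      20250607-external-sync-acme-api.md
      20250610-testing-happy-and-bad-flows.md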
Usually over 100K tokens is spent on reading the prompts & documentation before performing any task.
Here's a snippet of the prompt doc it just generated:
https://gist.github.com/tobyhinloopen/c059067037a6edb19065cd...
I'm experimenting a lot with prompts, I have yet to learn what works and what doesn't, but one thing is sure: A good prompt makes a huge difference. It's the difference between constantly babysitting and instructing the agent and telling it to do something and waiting for it to complete.
I had many MRs merged with little to no post-prompt guidance. Just fire and forget, commit, read the results, manually test it, and submit as MR. While the code is usually somewhere between "acceptable" and "obviously AI written", it usually works just fine.
I guess it is tool-dependent, but do you pass in that enormous prompt on each request?
Yes, I inject multiple documents like that before every session. The documents I inject are relevant to the upcoming task.
The one I shared is a variant of the "Base" document; I have specific documents per use case. If I know I'm adding features (controller actions), I inject a prompt containing documentation on how to add routes, controllers, controller actions, and views, plus how to format views and which helpers are commonly used.
If I’m working on APIs, I have API specific prompts. If I’m working on syncs with specific external services, I have prompts containing the details about these services.
Basically I consider every session a conversation with a new employee. I give them a single task and include all the relevant documentation and guidelines and wish them good luck.
Sometimes it takes a while, but I generally have a second issue to work on, in parallel. So while one agent is fixing one issue, I prepare the other agent to work on the second. Very occasionally I have 3 sessions running at the same time.
I barely write code anymore. I think I’ve not written a single line of code in the last few work days. Almost everything I submit is written by AI, and every time I have to guide the LLM and I expect the mistake to be repeated, I expand the relevant prompt document.
Last few days I also had the LLM update the prompt documents for me since they’re getting pretty big.
I do thoroughly review the code. The generated code is different from how I would write it, sometimes worse but sometimes better.
I also let it write tests, obviously, and I have a few paragraphs to write happy flow tests and “bad flow” tests.
I feel like I’m just scratching the surface of the possibilities. Im writing my own tools to further automate the process, including being able to generate code directly on production and have different versions of modules running based on the current user, so I can test new versions and deploy them instantly to a select group of users. This is just a wild fantasy I have and I’m sure I will find out why it’s a terrible idea, but it doesn’t stop me from trying.
Thanks!
Sorry to belabor the question: when you say "before every session", how many "things" do you do in a session? You say you give them a single task, but do you end up chatting back and forth with the agent in that session? I guess I'm unsure how far back the "context" goes in a conversation, and whether it would drift from your directives if the conversation went back and forth too much.
> Allow users to pause the agent to ask a clarifying question or push back on its direction without derailing the entire activity or train of thought
I think I’ve seen Zed/Claude do kind of this. A couple times, I’ve hit return, and then see that I missed a clarifying statement based on the direction it starts going and I put it in fast, and it corrects.
LLM agents are very hard to talk about because they're not any one thing. Your action-space in what you say and what approach you take varies enormously and we have very little body of common knowledge about what other people are doing and how they're doing it. Then the agent changes underneath you or you tweak your prompt and it's different again.
In my last few sessions I saw the efficacy of Claude Code plummet on the problem I was working on. I have no idea whether it was just the particular task, a modelling change, or changes I made to the prompt. But suddenly it was glazing every message ("you're absolutely right"), confidently telling me up is down (saying things like "tests now pass" when they completely didn't), it even cheerfully suggested "rm db.sqlite", which would have set me back a fair bit if I said yes.
The fact that the LLM agent can churn out a lot of stuff quickly greatly increases 'skill expression' though. The sharper your insight about the task, the more you can direct it to do something specific.
For instance, most debugging is basically a binary search across the steps of the process being executed. The tricky part is that the optimal search procedure has to be weighted by the probability of the problem occurring at each step and by the expense of conducting the different probes.
A common trap when debugging is to take an overly greedy approach. Due to the availability heuristic, our hypotheses about the problem are often too specific. And the more specific the hypothesis, the easier it is to think of a probe that would eliminate it. If you keep doing this you're basically playing Guess Who by asking "Is it Paul? Is it Anne?" instead of "Is the person a man? Does the person have facial hair?" and so on.
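A minimal sketch of that weighted bisection idea, with invented step names, probabilities, and probe costs: pick the yes/no probe that buys the most information per unit cost.

    import math

    # Hypothetical prior: probability the bug lives in each step of the pipeline.
    suspects = {"parse_input": 0.1, "transform": 0.2, "db_write": 0.5, "render": 0.2}

    # Hypothetical probes: each answers "is the bug in one of these steps?"
    # mapped to (steps covered, cost in minutes to run).
    probes = {
        "unit-test parse_input": ({"parse_input"}, 2),
        "integration test through transform": ({"parse_input", "transform"}, 10),
        "check the row landed in the db": ({"parse_input", "transform", "db_write"}, 5),
    }

    def binary_entropy(p: float) -> float:
        """Bits of uncertainty resolved by a yes/no answer with probability p of 'yes'."""
        return 0.0 if p in (0.0, 1.0) else -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def gain_per_cost(covered: set[str], cost: float) -> float:
        p_yes = sum(p for step, p in suspects.items() if step in covered)
        return binary_entropy(p_yes) / cost

    # Greedy choice: the probe with the best expected information gain per minute.
    best = max(probes, key=lambda name: gain_per_cost(*probes[name]))
    print("run first:", best)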
I find LLM agents extremely helpful at forming efficient probes of parts of the stack I'm less fluent in. If I need to know whether the service is able to contact the database, asking the LLM agent to write out the necessary cloud commands is much faster than getting that from the docs. It's also much faster at writing specific tests than I would be. This means I can much more neutrally think about how to bisect the space, which makes debugging time more uniform, which in itself is a significant net win.
I also find LLM agents to be good at the 'eat your vegetables' stuff -- the things I know I should do but would economise on to save time. Populate the tests with more cases, write more tests in general, write more docs as I go, add more output to the scripts, etc.
I basically jump away from Cursor to ChatGPT when I need to think thoroughly on something like an architecture decision or an edge case etc. Then when I've used ChatGPT to come up with an implementation plan, I jump back to Cursor and have Claude do the actual coding. O3 and ChatGPT's search functionality are just better (at least for myself) currently for "type 2" thinking tasks.
Article has some good points. They move fast, and they can easily run off the rails if you don't watch carefully.
And I've found that it's just as mentally exhausting programming alongside one as it is doing it yourself.
The chief advantage I've found of working alongside Claude is its automation of tedious (to me) tasks.
It is rather soul crushing how fast LLMs spit out decent code.
In my experience, LLMs are idiot savant coders--but currently more idiot than savant. Claude 3.7 (via Cursor and Roo) can comment code well, create a starter project 10x faster than I could, and spit out common CRUD apps pretty well.
However I've come to the conclusion that LLMs are terrible at decision making. I would much rather have an intern architect my code than let AI do it. It's just too unreliable. It seems like 3 out of 4 decisions that it makes are fine. But that 4th decision is usually asinine.
That said, I now consider LLMs a mandatory addition to my toolkit because they have improved my developer efficiency so much. I really am a fan. But without a seasoned dev to write detailed instructions, break down the project into manageable chunks, make all of the key design decisions, and review every line of code that it writes, today's AI will only add a mountain of technical debt to your project.
I guess I'm trying to say: don't worry, because the robots cannot replace us yet. We're still in the middle of the hype cycle.
But what do I know? I'm just an average meat coder.
LLMs currently can generate a few thousand lines of coherent code but they cannot write a cohesive large scale code base.
But LLMs are very good at writing SQL and Cypher queries that I would spend hours or days figuring out how to write.
Agreed.
I find it interesting that LLMs seem pretty good at spitting out SQL that works well enough. But on the other hand LLMs seem pretty awful at working with CSS. I wonder if this is due to a difference in the amount of training data available for SQL vs CSS, or is this because CSS is a finicky pain in the ass when compared to SQL.
There should be an insane amount of CSS on the web, but CSS output is primarily visual, which I think makes it hard for a text-only model to get right.
Interesting. I've been having a great time telling LLMs to generate CSS for me so I don't have to fight with Tailwind.
>Continue to practice pair-programming with your editor, but throttle down from the semi-autonomous "Agent" mode to the turn-based "Edit" or "Ask" modes.
This can be done while staying in agent mode. I never used edit mode, and I only use ask mode when my question has nothing to do with the project I have open. Any other time, I tell it either to make no changes at all because I'm only asking a question to research something, or to limit changes to a much smaller scope. It doesn't work perfectly, but it works well enough to be worth the tradeoff, given the extra capabilities agent mode seems to provide (this likely depends on the specific AI/LLM system you are using, so with another tool I might not arrive at the same conclusion).
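For example, a research-only instruction along these lines (the file path is invented):

    Do not modify any files. I'm only researching: explain how the retry logic
    in app/jobs/sync_worker.py decides when to give up, and list every call
    site that enqueues it.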
Aider does everything right. Stop using Cursor or any other agentic environments. Try Aider, it works exactly as suggested here.
I prefer Claude Code (the `claude` cmd line version, with Sonnet 4) because it's more like an actual pair-programming session. It uses my claude acct rather than costing extra per token. It also hooks into all my MCP tools (shell (restricted), filesystem, ripgrep, test runners, etc. etc.) which makes it pretty amazing.
After turning off its annoying auto-commit-for-everything behavior, aider does work OK but it's harder to really get it to understand what I want during planning. Its new `--watch-files` thing is pretty darn cool though.
I've been wanting to use Aider but I want to use Copilot as a provider (to stay compliant with the wishes of my employer). Haven't gone down that road yet because Aider copilot support seems a bit tentative. I see they have some docs about it up now though: https://aider.chat/docs/llms/github.html
"The easiest path is to sign in to Copilot from any JetBrains IDE"
Somebody must've made a standalone login script by now right? I wonder if `gh auth login` can be used to get a token?
>LLM agents make bad pairs because they code faster than humans think.
Easily solved. Use less compute. Use slower hardware. Or put in the prompt to pause at certain intervals.
>LLM agents make bad pairs because they code faster than humans think
This is why I strongly dislike all of the terminal-based tools and PR-based stuff. If you're left to read through a completed chunk of code, it is just overwhelming and your cycle time is too slow. The key to productivity is an IDE-based tool that shows you every line of code as it is being written, so you're reading it and understanding where it's going in real time. Augmentation, not automation, is the path forward. Think of it like the difference between walking and having a manual-transmission car to drive, not the difference between having a car and having a self-driving car.
If I have a 20 line function in my mind and the LLM injects 20 lines for me to accept or reject, I have two problems
First I have to review the 20 lines the LLM has produced
Second, if I reject those lines, it has probably shoved the function I had in mind out of my head
It's enormously disruptive to my progress
The hard truth here is in accepting that the 20 lines in your head were probably wrong, or suboptimal, and letting go of that urge. Think in interfaces, not implementations. Successive rendering, not one-shot.
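One way to read "think in interfaces": pin down the signature and contract yourself and let the agent fill in the body. A minimal Python sketch, with a hypothetical example:

    from typing import Protocol

    class RateLimiter(Protocol):
        """The contract I write and review myself (hypothetical example)."""

        def allow(self, key: str, now: float) -> bool:
            """Return True if the request identified by `key` may proceed at time `now`."""
            ...

    # The agent is asked to supply a concrete implementation of this Protocol
    # (say, a token bucket); I review it against the contract rather than
    # against the 20 lines I had in my head.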
This is just fundamentally not the case most of the time. LLMs guess where you're going, but so often what they produce is a "similar looking" non sequitur relative to the lines above it. It guesses, and sometimes that guess is good, but as often, or more, it's not.
The suggestion "think in interfaces" is fine; if you spell out enough context in comments, the LLM may be able to guess more accurately, but in spelling out that much context for it, you've likely already done the mental exercise of the implementation.
I'm also baffled by "wrong or suboptimal"; I don't think I've ever seen an LLM come up with a better solution.
Maybe it's the domain I work in, or the languages I use, but the 20 lines the LLM comes up with is almost certainly wrong.
> The hard truth here is in accepting that the 20 lines in your head were probably wrong, or suboptimal, and letting go of that urge.
Maybe, but the dogshit that Cursor generates is definitely wrong, so frankly, if it's gonna be my name on the PR, then I want it to be my wrong code, not to hide behind some automated tool.
> Think in interfaces, not implementations
In my experience you likely won't know if you've designed the right interface until you successfully implement the solution. Trying to design the perfect interface upfront is almost guaranteed to take longer than just building the thing
> ...and letting go of that urge.
What urge? The urge to understand what the software you're about to build upon is doing? If so, uh... no. No thanks.
I've seen some proponents of these code-generation machines say things like "You don't check the output of your optimizing compiler, so why check the output of Claude/Devon/whatever?". The problem with this analogy is that the output from mainstream optimizing compilers is very nearly always correct. It may be notably worse than hand-generated output, but it's nearly never wrong. Not even the most rabid proponent will claim the same of today's output from these code-generation machines.
So, when these machines emit code, I will inevitably have to switch from "designing and implementing my software system" mode into "reading and understanding someone else's code" mode. Some folks may actually be able to do this context-shuffling quickly and easily. I am not one of those people. The studies from a while back that found folks take something like a quarter-hour to really get back into the groove after being interrupted during a technical task suggest that not many folks can.
> Think in interfaces...
Like has been said already, you don't tend to get the right interface until you've attempted to use it with a bunch of client code. "Take a good, educated stab at it and refine it as the client implementations reveal problems in your design." is the way you're going to go for all but the most well-known problems. (And if your problem is that well-known, why are you writing more than a handful of lines solving that problem again? Why haven't you bundled up the solution to that problem in a library already?)
> Successive rendering, not one-shot.
Yes, like nearly all problem-solving, most programming is and always has been an iterative process. One rarely gets things right on the first try.
tldr: author is a bad prompter.
Good prompting takes real and ongoing work, thought, foresight, attention, and fastidious communication.