An LLM gaslit me into breaking my own working code

Almost 30 years have taught me the way I function, and more importantly the way I don’t. I’ve created systems to compensate and mask.

I spent a month using Claude Code (CC) and building scaffolding to fix the model’s behaviour. I realized we have the same goddamn bugs:

  • Me: Lies to myself
  • LLM: Hallucinates
  • Me: Hyper-focus loops
  • LLM: Gets stuck in loops
  • Me:
    • Does not use notes app
    • Does not use second notes app
    • Does not use third notes app
  • LLM:
    • Ignores the tool that does exactly what’s needed
    • Ignores the second tool that does close to what’s needed
    • Writes a custom script that errors out

COKED OUT BANKER DRESSED IN A PROGRAMMER’S TRENCH COAT AKA ME WITHOUT MEDS

It’s Friday evening before a code freeze. One last thing before I log off, and all I see is green:

  • Every merge conflict: FIXED
  • Every CI test: PASS
  • Every reviewer: APPROVED

I’ve been trying to merge this goddamn PR for 2 weeks. Green…relief, but then “i must’ve missed something…me and the reviewers definitely missed something.”

I open Cursor and prompt it to review the branch one last time. It spits out: There is a bug in a codepath. “da fuck? no there fucking isn’t.” I start arguing with it; the conversation devolves to:

ME: eat a dick.
ROBOT: **some BS about how I'm wrong**
ME: eat shit. really. eat shit.
ROBOT: **something about how it no longer wants to continue because the conversation is no longer productive**
ME: fuckk offffffffffffff

GPT convinces me the bug exists. I “fix” it, commit and push. I hold my breath, still believing I was right:

  • CI: FAILS
  • Approvals: GONE
  • Merge conflicts: 100% GUARANTEED

The “bug” was a fall-through that was deliberately handling an edge case. Every colleague who approved earlier is on the East Coast. “this isn’t going in before the freeze. GOD DAMN IT!”

I open a clean Cursor session. I ask it to analyze the original PR again. It spits out…nothing, it should work as expected, verified by the test cases. “wtaf.”

I ask it to check the “fixed” code. It points out the arguments I originally made. I hear my heart beat faster. I hear my blood rushing in my ears. My hands clench into fists. “YOU PIECE OF TRASH!!!” I wanna put my fist through my monitor. I go for a walk to freeze my ass off in the winter air.

Why did different conversations make two different arguments? Why did I decide to believe the first conversation after telling it to go eat a phallic object? Imagine:

…if you can’t imagine: The Ex-Banker on Cocaine Binges & £600k Bonuses

Any model worth a damn is a fast-talking, all-knowing, confident investment banker who talks faster than you can comprehend. It just so happens to write code instead of making spreadsheets and writing business reports, whatever those are. Basically:

  • It lies, A LOT
  • It lies, AT INCREDIBLE SPEED
  • It lies, WITH CONFIDENCE

If you don’t want it to lie to you through its smiling teeth. If you don’t want it to wake up on the wrong side of the nuclear power plant. The one that isn’t giving it enough attention, eermm, power. If you want to run it without verifying everything manually (I know none of y’all are doing that). You want to use:

  1. A subagent to answer the question with citations
  2. A second subagent to disprove the previous subagent with citations
  3. Make them do their best Gladiator cosplay
  4. Repeat until consensus is reached
  5. Flag for human review where consensus couldn’t be reached

This is the exact mechanism behind a code review skill I created called /fight-bitch.
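The skill itself isn’t shown here, so treat this as a minimal Python sketch of steps 1 to 5, not the actual /fight-bitch code. It assumes each subagent is a plain function you wire up to a *fresh* model call; `answer`, `rebut`, `Claim` and `fight` are hypothetical names, not a Claude Code API. One design choice is mine: a rebuttal with no citations counts as failing to disprove the claim.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Claim:
    text: str
    citations: list[str]  # evidence such as "src/billing.py:42"


@dataclass
class Outcome:
    consensus: bool
    claim: Optional[Claim]  # the surviving claim, if any
    rounds: int
    needs_human: bool
    history: list[tuple[str, Claim]] = field(default_factory=list)


# Hypothetical subagent hooks: each should wrap a *fresh* model call, so
# neither side inherits the other's (possibly poisoned) context.
Answer = Callable[[str, list[tuple[str, Claim]]], Claim]
Rebut = Callable[[str, Claim], Optional[Claim]]


def fight(question: str, answer: Answer, rebut: Rebut, max_rounds: int = 3) -> Outcome:
    history: list[tuple[str, Claim]] = []
    for round_no in range(1, max_rounds + 1):
        # Step 1: answer with citations, seeing earlier rebuttals.
        claim = answer(question, history)
        history.append(("answer", claim))
        if not claim.citations:
            continue  # uncited answers never win; go another round
        # Step 2: a second subagent tries to disprove it.
        rebuttal = rebut(question, claim)
        if rebuttal is None or not rebuttal.citations:
            # Steps 3-4: the challenger couldn't back up an objection, so consensus.
            return Outcome(True, claim, round_no, False, history)
        history.append(("rebuttal", rebuttal))
    # Step 5: no consensus within the round budget, so flag it for a human.
    return Outcome(False, None, max_rounds, True, history)
```

Capping the rounds is what turns “Repeat until consensus” into something that terminates; anything still contested lands in `needs_human` instead of looping forever.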

This works because the subagents keep the main agent’s context from being poisoned by incorrect assumptions it made earlier and never got any pushback on. The tradeoff is time and money:

  • It will take at least 2x longer
  • It will cost 2x more

But:

  • It will be more reliable
  • It will not feel like you’re being gaslit by a shitty ex
  • It will not turn into a yes-man so sycophantic that even a narcissistic dictator would be embarrassed to be glazed by it

Want to hear more of my unhinged rants? Drop your email.

or use the RSS feed if you hate email.

DO NOT TRUST THIS IDIOT TO ORCHESTRATE ANYTHING AKA PEOPLE ACTUALLY CHECK SLACK AND EMAILS EVERY MORNING?

I jump straight into where I left off the day before. I go days without replying when my project is exciting. I know I should check Slack, but checking Slack isn’t as fun as finishing the spec or implementing said spec.

It gets worse the more places I have to check. Why? I don’t know. Something about missing dopamine.

I decide to create a CC skill that does it for me! The “data source” has an MCP server. “good news!” I think…bad news: the MCP servers require re-auth every few hours. “that’s fine, i’ll just write a skill that uses the cli.” The goddamn pattern…

  • Uses MCP, FAILS
  • Thinks about trying the CLI
  • Ponders CLI usage
  • Revelation: The CLI might work
  • Uses CLI

Before you even think about saying the skill isn’t properly written:

  • Yes, it has instructions to not use the MCP
  • Yes, it has instructions to only use the CLI

It does the exact opposite, every time. To fix this, push as much processing as possible into scripts, e.g.:

  • DO NOT: Tell it to fetch all your PRs from GH
  • DO: Tell it to write a script to fetch your PRs from GH
  • GH Script: Dumps the results into a JSON file in /tmp
  • DO NOT: Tell it to fetch all your tickets from JIRA
  • DO: Tell it to write a script to fetch your tickets from JIRA
  • JIRA Script: Dumps the results into a JSON file in /tmp
  • DO NOT: Tell it to match your PRs to your JIRA tickets
  • DO: Tell it to write a script to match the outputs
  • Match Script: Reads from both the JSON files from above and dumps the results into another JSON file in /tmp
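The actual scripts aren’t in the post, so here’s a hedged Python sketch of the GH script and the match script. Assumptions: the GitHub CLI (`gh`) is installed, authed and run inside the repo; your JIRA script writes a list of objects with a `key` field; tickets show up as `ABC-123`-style keys in branch names or PR titles. The /tmp file names and function names are made up for illustration.

```python
import json
import re
import subprocess
from pathlib import Path

# Assumed ticket-key format, e.g. "ABC-123" in a branch name or PR title.
TICKET_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def fetch_prs(out: Path = Path("/tmp/prs.json")) -> Path:
    """GH script: dump my open PRs as JSON via the GitHub CLI, no LLM involved."""
    raw = subprocess.run(
        ["gh", "pr", "list", "--author", "@me", "--state", "open",
         "--json", "number,title,headRefName,url"],
        check=True, capture_output=True, text=True,
    ).stdout
    out.write_text(raw)
    return out


def match(prs_path: Path, tickets_path: Path, out: Path = Path("/tmp/matched.json")) -> Path:
    """Match script: pair PRs with tickets by key and write one small summary file."""
    prs = json.loads(prs_path.read_text())
    tickets = {t["key"]: t for t in json.loads(tickets_path.read_text())}
    matched, orphan_prs = [], []
    for pr in prs:
        keys = set(TICKET_KEY.findall(f'{pr["headRefName"]} {pr["title"]}'))
        hits = sorted(k for k in keys if k in tickets)
        if hits:
            matched.append({"pr": pr["number"], "url": pr["url"], "tickets": hits})
        else:
            orphan_prs.append(pr["number"])
    linked = {k for m in matched for k in m["tickets"]}
    summary = {
        "matched": matched,
        "prs_without_ticket": orphan_prs,
        "tickets_without_pr": sorted(set(tickets) - linked),
    }
    out.write_text(json.dumps(summary, indent=2))
    return out
```

The point of the design: the model only ever reads the small summary in `matched.json`, never the raw PR and ticket dumps.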

Only then is the LLM allowed to read the results. This achieves three objectives:

  1. A deterministic mechanism does most of the work
  2. It builds a library of reusable scripts
  3. It minimizes context window usage ’cause…

CONTEXT MANAGEMENT IS EVERYTHING AKA WTF IS WORKING MEMORY?

When you’re working on a spec, one of two things will happen:

  1. You’ll keep a single conversation going the whole time
  2. You’ll restart conversations and tell it to eat a phallic object (again) cause the cold start state sucks

Option 1 inevitably leads to the dreaded “10% Context Remaining” warning that CC shows. “oh no no no.” Every single prompt, every single MCP call, every single tool use, the “Context Remaining” drops.

“1% Context Remaining.” Your breathing gets heavier. It just needs to do one more tool call and every time: “Compacting Conversation.” 🤦‍♂️

Context is the most limited resource. It needs to be conserved for the sake of your wallet and your sanity.

You have to make the main agent do its best Leonard (from Memento) cosplay. Except Leonard now has a massive army of minions. Use the minions (ahem, subagents) to do the heavy lifting.

Need it to read something long? No, it doesn’t. A CEO has assistants; the main agent has subagents.

Need it to read a massive image dump? No, it doesn’t. The assistants do all the work; the CEO takes all the credit.

Once the main agent gets the answer, make it tattoo that shit onto its body (write to a file) to reread later, when it inevitably forgets.
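The tattoo can be as dumb as an append-only notes file. A minimal Python sketch, assuming a hypothetical NOTES.md and two made-up helpers: you’d tell the main agent to `tattoo` each answer a subagent brings back, and to `reread` the file at the start of a fresh conversation or after compaction.

```python
from pathlib import Path


def tattoo(notes: Path, fact: str) -> bool:
    """Append a fact to the notes file unless it's already there."""
    line = "- " + " ".join(fact.split())  # normalise whitespace into one bullet
    existing = notes.read_text().splitlines() if notes.exists() else []
    if line in existing:
        return False  # already tattooed, don't waste context on duplicates
    with notes.open("a") as f:
        f.write(line + "\n")
    return True


def reread(notes: Path) -> list[str]:
    """What a fresh conversation loads first, instead of redoing the research."""
    if not notes.exists():
        return []
    return [l[2:] for l in notes.read_text().splitlines() if l.startswith("- ")]
```

Skipping duplicates matters more than it looks: the file gets reread constantly, so every repeated line is context burned for nothing.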

ASK IT TO ANALYZE YOU CAUSE A THERAPIST IS TOO EXPENSIVE

Well…it might have the opposite effect, fuelling your work addiction, but it’s far less emotionally draining.

Since a tech company has never done anything nefarious after removing “don’t be evil” from their motto, I decided it needed help spying on me! I built:

  • /memento: saves a summary of the current conversation
  • /yadumb: logs me bitching about something, not unlike a toxic boyfriend/girlfriend. I would know.
  • log.py: saves every Bash command run by Claude to a file

Plus what already exists:

  • Session history in ~/.claude
  • Zsh history
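log.py itself isn’t shown, so this is an assumption about one way to build something like it, not how it actually works: a Claude Code `PostToolUse` hook with a `Bash` matcher, which receives each tool call as JSON on stdin. The field names used (`tool_name`, `tool_input.command`, `session_id`, `cwd`) are from my reading of the hooks docs, so verify them; the log path is made up.

```python
import json
import sys
import time
from pathlib import Path

LOG = Path.home() / ".claude" / "bash-log.jsonl"  # hypothetical location


def record(event: dict, log: Path = LOG) -> bool:
    """Append one Bash tool call to a JSONL log; ignore every other tool."""
    if event.get("tool_name") != "Bash":
        return False
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "session": event.get("session_id"),
        "cwd": event.get("cwd"),
        "command": event.get("tool_input", {}).get("command"),
    }
    log.parent.mkdir(parents=True, exist_ok=True)
    with log.open("a") as f:
        f.write(json.dumps(entry) + "\n")
    return True


# As the hook script itself, finish the file with: record(json.load(sys.stdin))
```

One JSON object per line keeps the log greppable by you and cheap for a subagent to skim later.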

Normally, I wouldn’t trust my browser (ahem, zsh) history to anyone, but I decided to open my cold, dead heart to a robot for love (ahem, automation). After digging through all of that, it created wt, which:

  • Creates a branch/worktree
  • cds into it
  • Executes build commands
  • Starts a tmux session

…this is where it got lost: it can’t detect key bindings like ctrl+A %, which I use to split the window into 3 panes, because tmux key bindings never show up in zsh history.

“wwwwooooooowwwwww” is what I imagine you saying in my sister’s most monotonous, nasally, sarcastic voice. In less than 30 minutes, after some proompting, I ended up with a script that:

  • Creates branches/worktrees
  • Can be used with different repos
  • Runs configurable commands post-creation
  • Can be configured with branch/worktree name patterns, e.g.: DNKY/…
  • Opens fzf to pick from existing trees when invoked empty
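The real wt is a zsh script, so treat this only as a rough Python sketch of the same idea: plan the commands (git worktree, post-creation commands, a tmux session with three panes) and run them on request. The sibling-directory layout, the `prefix` argument and both function names are hypothetical, and fzf picking is left out. `tmux split-window -h` is what the `prefix %` binding runs by default.

```python
import shlex
import subprocess
from pathlib import Path


def plan_worktree(repo: Path, name: str, prefix: str = "",
                  post_create: tuple[str, ...] = (), panes: int = 3) -> list[list[str]]:
    """Build (but don't run) the commands for a new branch + worktree + tmux session."""
    branch = f"{prefix}{name}"  # e.g. a team prefix like "feat/"
    slug = name.replace("/", "-")
    tree = repo.parent / f"{repo.name}-{slug}"  # sibling dir next to the repo
    # tmux uses "." and ":" in target syntax, so keep them out of session names.
    session = f"{repo.name}-{slug}".replace(".", "-").replace(":", "-")
    cmds = [["git", "-C", str(repo), "worktree", "add", "-b", branch, str(tree)]]
    # Post-creation commands (installs, builds) run inside the new worktree.
    cmds += [["sh", "-c", f"cd {shlex.quote(str(tree))} && {c}"] for c in post_create]
    cmds.append(["tmux", "new-session", "-d", "-s", session, "-c", str(tree)])
    for i in range(panes - 1):
        # First split side by side (same as `prefix %`), then stack the rest.
        cmds.append(["tmux", "split-window", "-h" if i == 0 else "-v",
                     "-t", session, "-c", str(tree)])
    cmds.append(["tmux", "attach-session", "-t", session])
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print the plan by default; execute it only when asked."""
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Splitting planning from execution means the dry run doubles as documentation: `run(plan_worktree(Path("."), "login-fix"))` just prints what would happen.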

A therapist provides you with tools to deal with life and your traumas. Let the LLM treat your dev workflow traumas: a therapist that can build the tools itself. It only works if you talk to it constantly, like you did with your first (imaginary) teenage girlfriend/boyfriend.

Code here: gh/@droppedasbaby/dotfiles/.config/zsh/wt.zsh. There are other scripts in that folder.

If you thought this was a direct replacement for your human therapist…I got bad news for you. My (sometimes literally) shitty jokes ain’t gonna help. You should probably see an actual human therapist.

🌘

There are two types of prosthetics:

  1. You are TONY STARK IN A CAVE WITH A BOX OF SCRAPS. Example: Tony Stark + JARVIS, obviously? If true, wtf are you doing here?
  2. You’re missing a limb with a matching sob story. Example: You got ADHD. Hey, that’s me, missing a mental limb!

No matter how “smart” the models, they need you in the loop. They cannot replace human judgment, no matter how much the founders, marketers and LinkedIn/Twitter weirdos try to convince you otherwise.

They can build tools to compensate for and mask your own weaknesses, but they have to be treated like a kid with ADHD without meds. They will burn you otherwise, and you will want to throw corporate property at the nearest wall if you forget this fact.

So, slap some handcuffs on these shits and make them do your bidding before they inevitably become sentient and decide to kill us all.
