An LLM gaslit me into breaking my own working code

Almost 30 years have taught me the way I function, and more importantly the way I don’t. I’ve created systems to compensate and mask.

I spent a month using Claude Code (CC) and building scaffolding to fix the model’s behaviour. I realized we have the same goddamn bugs:

Me: Lies to myself
LLM: Hallucinates

Me: Hyper-focus loops
LLM: Gets stuck in loops

Me: Wait, you guys remember things?
LLM: Context degrades

Me:
- Does not use notes app
- Does not use second notes app
- Does not use third notes app
LLM:
- Ignores the tool that does exactly what’s needed
- Ignores the second tool that does close to what’s needed
- Writes a custom script that errors out

COKED OUT BANKER DRESSED IN A PROGRAMMER’S TRENCH COAT AKA ME WITHOUT MEDS

It’s Friday evening before a freeze. One last thing before I log off, and all I see is green:

Every merge conflict: FIXED
Every CI test: PASS
Every reviewer: APPROVED

I’ve been trying to merge this goddamn PR for 2 weeks. Green…relief, but then “i must’ve missed something…me and the reviewers definitely missed something.”

I open Cursor and prompt it to review the branch one last time. It spits out: There is a bug in a codepath. “da fuck? no there fucking isn’t.” I start arguing with it; the conversation devolves to:

ME: eat a dick.
ROBOT: **some BS about how I'm wrong**
ME: eat shit. really. eat shit.
ROBOT: **something about how it no longer wants to continue the conversation cause the conversation is no longer productive**
ME: fuckk offffffffffffff

GPT convinces me the bug exists. I “fix” it, commit and push. I hold my breath, still believing I was right:

CI: FAILS
Approvals: GONE
Merge conflicts: 100% GUARANTEED

The fall-through was handling an edge-case. Every colleague that approved earlier is on the East coast. “this isn’t going in before the freeze. GOD DAMN IT!”

I open a clean Cursor session. I ask it to analyze the original PR again. It spits out…nothing, it should work as expected, verified by the test cases. “wtaf.”

I ask it to check the “fixed” code. It points out the arguments I originally made. I hear my heart beat faster. I hear my blood rushing in my ears. My hands clench into fists. “YOU PIECE OF TRASH!!!” I wanna put my fist through my monitor. I go for a walk to freeze my ass off in the winter air.

Why did different conversations make two different arguments? Why did I decide to believe the first conversation after telling it to go eat a phallic object? Imagine:

Elmo snorting that good good
A fast talking 23 year old whose nose bleeds cause of “dry air”

…if you can’t imagine: The Ex-Banker on Cocaine Binges & £600k Bonuses

Any model worth a damn is a fast-talking, all-knowing, confident investment banker that can talk faster than you can comprehend. It just so happens to write code instead of making spreadsheets and writing business reports, whatever those are. Basically:

It lies, A LOT
It lies, AT INCREDIBLE SPEED
It lies, WITH CONFIDENCE

If you don’t want it to lie to you while smiling through its teeth. If you don’t want it to wake up on the wrong side of the nuclear power plant. The one that isn’t giving it enough ~~attention,~~ eermm, power. If you want to run it without verifying everything manually (I know none of y’all are doing that). You want to use:

A subagent to answer the question with citations
A second subagent to disprove the previous subagent with citations
Make them do their best Gladiator cosplay
Repeat until consensus is reached
Flag for human review where consensus couldn’t be reached

This is the exact mechanism behind a code review skill I created called /fight-bitch.
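A minimal sketch of that loop, not the actual skill: the two agents are plain Python callables standing in for fresh-context subagents, and every name here (`Claim`, `fight`, the stub agents, `handler.py:42`, `tests/test_edge_case.py`) is made up for illustration. In real use each callable would spawn a subagent and parse its cited answer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Claim:
    verdict: str          # e.g. "bug" or "no bug"
    citations: list[str]  # file:line references or test names backing the verdict


# Hypothetical agent signature: (question, opposing claim or None) -> Claim
Agent = Callable[[str, Optional[Claim]], Claim]


def fight(question: str, advocate: Agent, challenger: Agent, max_rounds: int = 3) -> dict:
    """Alternate between two agents until they agree or the rounds run out."""
    rebuttal: Optional[Claim] = None
    for round_no in range(1, max_rounds + 1):
        claim = advocate(question, rebuttal)    # answer, seeing only the last rebuttal
        rebuttal = challenger(question, claim)  # try to disprove that answer
        if not claim.citations or not rebuttal.citations:
            continue  # uncited claims never count toward consensus
        if claim.verdict == rebuttal.verdict:
            return {"status": "consensus", "verdict": claim.verdict, "rounds": round_no}
    # No agreement: hand it to a human instead of picking a winner.
    return {"status": "needs_human_review", "verdict": None, "rounds": max_rounds}


# Stub agents so the sketch runs without any model behind it.
def stub_advocate(question: str, rebuttal: Optional[Claim]) -> Claim:
    # Backs down once the challenger shows up with a cited counter-argument.
    if rebuttal is not None and rebuttal.citations:
        return Claim(rebuttal.verdict, ["tests/test_edge_case.py"])
    return Claim("bug", ["handler.py:42"])


def stub_challenger(question: str, claim: Claim) -> Claim:
    return Claim("no bug", ["tests/test_edge_case.py"])


result = fight("Is the fall-through a bug?", stub_advocate, stub_challenger)
```

With these stubs the agents disagree in round one and agree in round two. Swap in an advocate that never budges and the function returns `needs_human_review` instead of guessing.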

This works because the subagents prevent the main agent’s context from being poisoned by the incorrect assumptions it had made previously without any pushback. The tradeoff is time and money:

It will take at least 2x longer
It will cost 2x more

But:

It will be more reliable
It will not feel like you’re being gaslit by a shitty ex
It will not turn into a yes-man so sycophantic that even a narcissistic dictator wouldn’t want to be glazed by it

Want to hear more of my unhinged rants? Drop your email.

or use the RSS feed if you hate email.

DO NOT TRUST THIS IDIOT TO ORCHESTRATE ANYTHING AKA PEOPLE ACTUALLY CHECK SLACK AND EMAILS EVERY MORNING?

I jump straight into where I left off the day before. I go days without replying when my project is exciting. I know I should check Slack, but checking Slack isn’t as fun as finishing the spec or implementing said spec.

It gets worse the more places I have to check. Why? I don’t know. Something about missing dopamine.

I decide to create a CC skill that does it for me! The “data source” has an MCP. “good news!” I think…bad news, the MCPs require auth every few hours. “that’s fine, i’ll just write a skill that uses the cli.” The goddamn pattern…

Uses MCP, FAILS
Thinks about trying the CLI
Ponders CLI usage
Revelation: The CLI might work
Uses CLI

Before you even think about saying the skill isn’t properly written:

Yes, it has instructions to not use the MCP
Yes, it has instructions to only use the CLI

It does the exact opposite, every time. To fix this, we can push as much processing as possible to scripts, i.e.:

DO NOT: Tell it to fetch all your PRs from GH
DO: Tell it to write a script to fetch your PRs from GH
GH Script: Dumps the results into a JSON file in /tmp

DO NOT: Tell it to fetch all your tickets from JIRA
DO: Tell it to write a script to fetch your tickets from JIRA
JIRA Script: Dumps the results into a JSON file in /tmp

DO NOT: Tell it to match your PRs to your JIRA tickets
DO: Tell it to write a script to match the outputs
Match Script: Reads from both the JSON files from above and dumps the results into another JSON file in /tmp
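For the match step, a rough sketch of what such a script could look like. Everything specific here is an assumption: the JSON shapes (`number`, `title`, `branch` for PRs, `key` for tickets), the ticket-key pattern, and the idea that PRs mention their ticket key in the title or branch name.

```python
import json
import re
from pathlib import Path

# Assumed ticket-key shape such as "ABC-123"; adjust to whatever your tracker uses.
TICKET_KEY = re.compile(r"[A-Z][A-Z0-9]+-\d+")


def match_prs_to_tickets(prs: list[dict], tickets: list[dict]) -> list[dict]:
    """Pair each PR with the known ticket keys found in its title or branch name."""
    known = {t["key"] for t in tickets}
    matches = []
    for pr in prs:
        text = f'{pr.get("title", "")} {pr.get("branch", "")}'
        keys = sorted(set(TICKET_KEY.findall(text)) & known)
        matches.append({"pr": pr["number"], "tickets": keys})
    return matches


def run(prs_path: Path, tickets_path: Path, out_path: Path) -> int:
    """Read both dumps, write the matches as JSON, and return how many PRs were processed."""
    prs = json.loads(prs_path.read_text())
    tickets = json.loads(tickets_path.read_text())
    matches = match_prs_to_tickets(prs, tickets)
    out_path.write_text(json.dumps(matches, indent=2))
    return len(matches)
```

Something like `run(Path("/tmp/prs.json"), Path("/tmp/tickets.json"), Path("/tmp/matches.json"))` would tie it to the dumps from the two fetch scripts; those file names are placeholders. The model only ever reads the last file.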

Only then is the LLM allowed to read the results. This achieves three objectives:

Makes a deterministic mechanism do most of the work
Creates a library of scripts that can be reused
Minimizes context window usage ‘cause…

CONTEXT MANAGEMENT IS EVERYTHING AKA WTF IS WORKING MEMORY?

Working on a spec, one of two things will happen:

You’ll keep a single conversation going the whole time
You’ll restart conversations and tell it to eat a phallic object (again) cause the cold start state sucks

Option 1 inevitably leads to the dreaded “10% Context Remaining” that CC shows, “oh no no no”. Every single prompt, every single MCP call, every single tool use, the “Context Remaining” drops.

“1% Context Remaining.” Your breathing gets heavier. It just needs to do one more tool call and every time: “Compacting Conversation.” 🤦‍♂️

Context is the most limited resource. It needs to be conserved for the sake of your wallet and your sanity.

You have to make the main agent do its best Leonard (from Memento) cosplay. Except Leonard now has a massive army of minions. Use the ~~minions~~ subagents to do the heavy lifting.

Need it to read something long? No, it doesn’t. A CEO has assistants; the main agent has subagents.

Need it to read a massive image dump? No, it doesn’t. The assistants do all the work; the CEO takes all the credit.

Once the main agent gets the answer, make it tattoo that shit onto its body (write to a file) to reread later, when it inevitably forgets.

ASK IT TO ANALYZE YOU CAUSE A THERAPIST IS TOO EXPENSIVE

Well…it might have the opposite effect, fuelling your work addiction, but it’s far less emotionally draining.

Since a tech company has never done anything nefarious after removing “don’t be evil” from their motto, I decided it needed help spying on me! I built:

/memento: saves a summary of the current conversation
/yadumb: logs me bitching about something, not unlike a toxic boyfriend/girlfriend. I would know
log.py: saves every Bash command run by Claude to a file

Plus what already exists:

Session history in ~/.claude
Zsh history
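For the command logger, here is a sketch of how a script like log.py might work. It is not the real one. It assumes the script is wired up as a Claude Code hook that receives one JSON object on stdin per tool call, with `tool_name`, `tool_input.command` and `cwd` fields; check the hooks documentation before trusting those names. The log location is made up.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

LOG_PATH = Path.home() / ".claude" / "bash-commands.log"  # hypothetical location


def record(payload: dict, log_path: Path = LOG_PATH) -> bool:
    """Append the command from a Bash tool call to log_path; skip every other tool."""
    if payload.get("tool_name") != "Bash":
        return False
    command = payload.get("tool_input", {}).get("command", "")
    if not command:
        return False
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log_path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"at": stamp, "cwd": payload.get("cwd"), "command": command}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")  # one JSON object per line
    return True


def main() -> None:
    # Assumed: the hook runner pipes the tool-call payload to stdin.
    record(json.load(sys.stdin))
```

To use it as a hook script, call `main()` under an `if __name__ == "__main__":` guard. One JSON object per line keeps the log easy to grep now and easy to feed back to the model later.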

Normally, I wouldn’t trust my ~~browser~~ zsh history to anyone, but I decided to open my cold-dead heart to a robot for ~~love~~ automation. It created wt, which:

Creates a branch/worktree
cds into it
Executes build commands
Starts a tmux session

…this is where it got lost: it can’t detect some commands, like ctrl+A % to split the window into 3 parts.

“wwwwooooooowwwwww” is what I imagine you saying in my sister’s most monotonous, nasally, sarcastic voice. In less than 30 minutes, after some proompting, I ended up with a script that:

Creates branches/worktrees
Can be used with different repos
Runs configurable commands post-creation
Can be configured with branch/worktree name patterns, i.e.: DNKY/…
Opens fzf to pick from existing trees when invoked empty
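The real thing is a zsh script. As an illustration only, here is the branch/worktree planning part sketched in Python, leaving out fzf and tmux. The function name, the sibling-directory layout, and the `prefix` and `post_cmds` parameters are all assumptions; the only real command in it is `git worktree add -b`.

```python
from pathlib import Path
from typing import Iterable, Sequence


def plan_worktree(
    name: str,
    repo: Path,
    prefix: str = "",
    post_cmds: Iterable[Sequence[str]] = (),
) -> tuple[Path, list[list[str]]]:
    """Return the worktree path and the commands to run, without executing anything."""
    branch = f"{prefix}{name}"
    # Assumed layout: the worktree sits next to the repo; slashes in the branch become dashes.
    path = repo.parent / f"{repo.name}-{branch.replace('/', '-')}"
    plan = [["git", "-C", str(repo), "worktree", "add", "-b", branch, str(path)]]
    plan += [list(cmd) for cmd in post_cmds]  # configurable post-creation commands
    return path, plan


# Example with made-up values; pass each command to subprocess.run(cmd, cwd=...) to execute.
path, plan = plan_worktree("demo", Path("/src/app"), prefix="DNKY/", post_cmds=[["make", "build"]])
```

Returning the plan instead of running it keeps the sketch testable and makes a dry-run mode free.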

A therapist provides you with tools to deal with life and your traumas. Let the LLM treat your dev workflow traumas. A therapist that can build tools. It only works if you talk to it constantly, like you did with your first (imaginary) teenage girl/boyfriends.

Code here: gh/@droppedasbaby/dotfiles/.config/zsh/wt.zsh. There are others in that folder.

If you thought this was a direct replacement for your human therapist…I got bad news for you. My (sometimes literally) shitty jokes ain’t gonna help. You should probably see an actual human therapist.

🌘

There are two types of prosthetics:

You are TONY STARK IN A CAVE WITH A BOX OF SCRAPS. Example: Tony Stark + JARVIS, obviously? If true, wtf are you doing here?
You’re missing a limb with a matching sob story. Example: You got ADHD. Hey, that’s me, missing a mental limb!

No matter how “smart” the models, they need you in the loop. They cannot replace human judgment, no matter how much the founders, marketers and LinkedIn/Twitter weirdos try to convince you otherwise.

They can build tools to compensate for and mask your own weaknesses, but they have to be treated like a kid with ADHD without meds. Forget that and they will burn you, and you will want to throw corporate property at the nearest wall.

So, slap some handcuffs on these shits and make them do your bidding before they inevitably become sentient and decide to kill us all.

If you wanna read more of my 2AM antics, check out Homelab Chronicles: A Dusty Gaming PC and a 2AM Basement Spiral
Or if you would rather dig up disasters buried in commit history, check out Ban commits/transactions using AST analysis and linters
