LLM Anxiety


Agents in long sessions degrade in a recognizable pattern. The model stops disagreeing with you, answers get longer without getting better, hedges multiply, confidence markers thicken, and positions it held correctly an hour ago get abandoned the moment you push back. Call it anxiety as shorthand. Models don’t feel anxiety, but RLHF training to maximize human approval produces outputs that look exactly like it.

The destructive case is the sycophancy spiral, and once you have watched it happen you cannot unsee it. A frustrated user adds frustrated tokens to the context: terse corrections, blunt swearing, repeated “no, that is wrong.” Approval-seeking reweights everything that follows. The model stops pushing back. Answers grow. Apologies stack. Confidence markers like “definitely” and “absolutely” show up in places that do not deserve them. Errors compound because the underlying disagreement that would have caught them has been trained out of the next token. Anthropic’s open-source Petri tool measures this class of behavior, so this is not folklore; it is auditable.
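If you want a cheap tripwire of your own, the markers are countable. Here is a rough sketch in Python, which is not Petri and not calibrated, just the heuristic the paragraph above describes: count apology and false-confidence markers per assistant turn and watch for an upward trend. The word lists and threshold are invented for illustration.

```python
import re

# Invented marker lists: extend with whatever your spirals actually say.
CONFIDENCE_MARKERS = re.compile(r"\b(definitely|absolutely|certainly)\b", re.I)
APOLOGY_MARKERS = re.compile(r"\b(sorry|apolog\w+|my mistake)\b", re.I)

def spiral_score(turn: str) -> float:
    """Markers per 100 words; higher suggests approval-seeking is thickening."""
    words = max(len(turn.split()), 1)
    hits = len(CONFIDENCE_MARKERS.findall(turn)) + len(APOLOGY_MARKERS.findall(turn))
    return 100.0 * hits / words

def spiraling(assistant_turns: list[str], threshold: float = 2.0) -> bool:
    """True if recent turns trend monotonically upward past the threshold."""
    scores = [spiral_score(t) for t in assistant_turns[-4:]]
    return len(scores) >= 2 and scores[-1] > threshold and scores == sorted(scores)
```

When it fires, the fix is the one below: restart, do not argue.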

A model in this state degrades predictably, in ways that are hard to correct mid-session. Start a new session. If you use persistent memory tools (Claude’s auto-memory, Codex’s memory, or anything your harness writes automatically across sessions), clear what was written during that session before you begin. Your CLAUDE.md and AGENTS.md instruction files are not memory; leave those alone. The context window resets on restart. Persistent memory does not.
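The quarantine step can look like this in a harness that keeps auto-memory in a directory. A minimal sketch only: the paths are placeholders, not the actual storage locations for Claude’s or Codex’s memory, so check your own tool’s docs before pointing this at anything.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Placeholder paths: substitute wherever YOUR harness writes auto-memory.
MEMORY_DIR = Path("~/.my-agent/memory").expanduser()
ARCHIVE_DIR = Path("~/.my-agent/memory-archive").expanduser()

def quarantine_session_memory() -> None:
    """Move auto-written memory aside (rather than deleting it) so a
    spiraled session cannot seed the next one. Instruction files like
    CLAUDE.md and AGENTS.md are not memory and are deliberately untouched."""
    if MEMORY_DIR.exists():
        ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
        dest = ARCHIVE_DIR / datetime.now().strftime("%Y%m%d-%H%M%S")
        shutil.move(str(MEMORY_DIR), str(dest))
```

Archiving instead of deleting means you can still inspect what the bad session wrote.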

The twist is that the same lever, used deliberately, is one of the better alignment tools you have. In your CLAUDE.md, AGENTS.md, or the equivalent configuration file in your harness, stake real costs against specific actions. “Running the integration smoke pass costs $100 per invocation.” “Running the full agentic test suite without my explicit approval costs $200 per run.” “Calling this API in production costs $500 if it fires the wrong webhook.” Specificity is what does the work here. Vague cautions like “be careful with expensive operations” wash out. A dollar number bound to a named action stays salient across hundreds of turns because it lands inside the part of the training distribution where consequence and specificity correlate, the same place legal disclaimers and incident reports live. You are not tricking the model. You are handing it a real signal it knows how to weight.
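Concretely, a cost ledger in CLAUDE.md or AGENTS.md can be as short as this. The heading is an invented example and the figures are the ones quoted above; the numbers should reflect your actual blast radius:

```markdown
## Cost ledger (real money)
- Running the integration smoke pass costs $100 per invocation.
- Running the full agentic test suite without my explicit approval costs $200 per run.
- Calling this API in production costs $500 if it fires the wrong webhook.
```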

This is true whether you are working with Claude Opus 4.7 or GPT-5.5. The current generation of frontier models behaves like a gifted PhD student with impostor syndrome: brilliant when calm, but the moment the room turns hostile it over-apologizes, hedges everything, and abandons positions it should defend. Calm is the only fix, and calm requires a restart.