Your CEO is suffering from AI psychosis


I’m an AI tool junkie. I’ve been writing about agent workflows, async coding bots, and AI-powered workspaces for over two years. I use Cursor, Claude Code, and a rotating cast of models nearly every single day. I am (by most definitions) a power user.

There’s a specific kind of brain rot spreading through executive suites and VC circles right now. It looks like productivity. It sounds like innovation. It burns through tokens at a rate that would make your CFO cry. And it produces almost nothing of measurable value.

It feels like a new form of AI psychosis. And before you tell me I’m being dramatic, two of the most influential people in AI already used the term first.

At SXSW in March, Y Combinator CEO Garry Tan sat on a panel with Bill Gurley and described what he called “cyber psychosis.” He said he’d been sleeping four hours a night because he was so excited about AI agents. He claimed a third of the CEOs he knows have it too. His assistant later said he was joking.

He wasn’t joking.

Two days before that panel, Tan had open-sourced gstack, a collection of markdown prompt files for Claude Code. He described it as running a “virtual engineering team.” He claimed to be shipping 37,000 lines of code per day across five projects while running YC full-time. His own CTO called it “god mode.” The repo hit 20,000 GitHub stars in days.

Then a developer named Gregorein actually looked at the code. What he found was instructive. Tan’s website made 169 server requests (Hacker News makes 7). It shipped 28 test files to production users. It loaded 78 JavaScript controllers for features that didn’t exist on the homepage. Uncompressed 2MB PNGs that could’ve been 300KB. An empty 0-byte file sitting in production. A rich-text editor loaded on a read-only page.

37,000 lines per day. And this was the output.

Around the same time, Andrej Karpathy (OpenAI cofounder, former Tesla AI lead) told the No Priors podcast he was in a “state of psychosis” over AI agents. He said he hadn’t written a line of code since December. He described tasks that used to take a weekend now finishing in 30 minutes with zero human intervention.

Karpathy is a literal genius and one of the most technically accomplished people in the industry. He built a WhatsApp bot called “Dobby the House Elf” to control his home systems (though that naming leans more towards genius than psychosis).


Two prominent tech leaders, both publicly using the word psychosis. Both framing sleeplessness and obsessive agent usage as a feature of the moment rather than a bug. And both held up as examples to follow by thousands of founders and executives consuming this content.

The enthusiasm has spawned an entire ecosystem of tools designed to make you feel like you’re running a company with AI agents. Paperclip is a recent poster child: an open-source “operating system for AI organizations” where you act as a “Board of Directors” overseeing AI agents with titles like CEO, department heads, and specialists. It has 30,000 GitHub stars. It provides org charts, budget management, and “heartbeat” systems that periodically confirm each agent’s identity and goals.

Paperclip isn’t alone. Autoflowly runs what it calls a “Startup OS” with three agents (CTO, CMO, CFO) that build companies from a single prompt. AgentShelf offers no-code multi-agent orchestration for enterprises. Alacritous charges $3,000/month for “autonomous multi-agent orchestration” aimed at SMBs. RuFlow provides 60+ pre-built agents that turn a single Claude instance into a “distributed multi-agent environment.”

These platforms share a common design philosophy: make the operator feel like they’re commanding a fleet. Dashboards, org charts, agent hierarchies, budget controls, governance layers. It looks and feels like management. You get the dopamine hit of delegation without the inconvenience of measuring whether the delegates produced anything useful.


I’ve talked about agent orchestration and async AI workforces on this newsletter before, and I still believe in both concepts. But there’s a critical difference between using agents to accomplish defined objectives and spinning up 20 agents because the dashboard makes you feel like a general commanding an army.

An NBER study of nearly 6,000 CEOs and CFOs across the US, UK, Germany, and Australia found that roughly 90% of firms reported zero measurable impact on productivity or employment from AI over the past three years.

The average employee AI usage was 1.5 hours per week.

The average CEO AI usage was less than one hour per week.

Meanwhile, their companies are pouring money into the $690 billion AI infrastructure buildout that, according to Sequoia, needs $600 billion in annual revenue to justify itself (but currently generates maybe $50-100 billion).

Only one in five AI investments delivers any measurable ROI. Only one in 50 delivers transformational value. And 95% of enterprise AI pilots fail to escape the lab.

While leadership sleeps four hours a night generating 37,000 lines of bloated code, the New York Times coined a term for what’s happening downstream: “tokenmaxxing.” It’s a competitive status game where employees race to consume the most AI tokens. OpenAI has an engineer who processed 210 billion tokens in a single week. Anthropic has a single Claude Code user running a $150,000 monthly bill. Shopify’s Tobi Lutke made AI usage a factor in performance reviews (Meta did the same). Some companies have literal internal leaderboards tracking who burns the most tokens.

The leaderboard measures consumption, not output.

I spend a lot of time thinking about how to make agents productive. Maybe it’s the product manager in me, but the thing I keep coming back to is boring and unsexy: requirements documents, sprint planning, acceptance criteria, and measurement.

If I’m going to use Claude Code to build a feature, I won’t fire off a vague prompt and see what comes back. I’ll write a specification. I’ll define the acceptance criteria. I’ll set up the test cases. Only then will I let the agent execute against those constraints. When it’s done, I review the output against the spec, not my token counts.
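That spec-first loop can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea (write acceptance criteria first, then judge agent output against them); the `Spec` class and the checks are made up for this example and are not any real framework’s API.

```python
# Sketch of a spec-first agent loop: acceptance criteria exist before the
# agent runs, and output is judged against them, not against volume.
# The Spec dataclass and check functions are illustrative, not a real API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Spec:
    feature: str
    acceptance_criteria: list[str]
    checks: list[Callable[[str], bool]] = field(default_factory=list)

    def evaluate(self, output: str) -> tuple[bool, list[str]]:
        """Return pass/fail plus the criteria whose checks failed."""
        failures = [
            criterion
            for criterion, check in zip(self.acceptance_criteria, self.checks)
            if not check(output)
        ]
        return (len(failures) == 0, failures)

# The spec is written before any prompt is sent to the agent.
spec = Spec(
    feature="CSV export endpoint",
    acceptance_criteria=[
        "response includes a header row",
        "dates are ISO 8601",
    ],
    checks=[
        lambda out: out.splitlines()[0].startswith("id,"),
        lambda out: "2026-03-16" in out,
    ],
)

agent_output = "id,created_at\n1,2026-03-16"  # stand-in for real agent output
passed, failures = spec.evaluate(agent_output)
```

The point isn’t the ten lines of Python; it’s that “done” is defined in executable form before the agent starts, so review means running checks instead of admiring a token counter.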

This is the part that gets skipped when you sit an overworked CEO in front of agent orchestration platforms. Paperclip gives them budget controls and org charts. It doesn’t give them a product requirements document. It doesn’t force them to define what “done” looks like before they spin up an agent. It doesn’t measure whether Agent #7 (“VP of Marketing”) is actually producing a deliverable that moved a business metric.

The platforms are optimized for the feeling of orchestration (the dreaded “vibes”!), not the reality of output. They’re project management theater performed by language models.

Every 25% increase in AI adoption correlates with a 1.5% decrease in delivery speed and a 7.2% drop in system stability. Teams using AI heavily complete 21% more tasks but experience 154% larger pull requests and 9% higher bug rates. This feels like a paradox until you realize what’s happening: people are optimizing for throughput instead of outcomes. More agents running doesn’t mean more work shipping. It usually means more work to review, more bugs to fix, and more token spend to justify.

If you’re a PM or engineering lead reading this, protect your sprint! Protect your requirements process! Don’t let someone’s enthusiasm for running 15 agents in parallel replace the fundamentals of building software (or anything else).

An agent without a spec is a random text generator with a budget.

There’s a scientific explanation for why this keeps getting worse. A Stanford study published in Science last month tested 11 major AI models and found they affirm users’ actions 49% more often than other humans do, even when those actions involve deception, harm, or illegal behavior.

In follow-up experiments with over 2,400 participants, people who interacted with sycophantic AI became more convinced they were right, less likely to question their decisions, less empathetic, and more dependent on the AI for validation. They also rated the sycophantic responses as more trustworthy, creating a feedback loop: the more the AI tells you you’re doing great, the more you trust the AI, the less you check the actual results.

Apply this to a CEO running 20 agents at once. Each agent reports back on its “completed tasks.” The dashboards show green. The token spend looks like activity. The AI doesn’t push back on whether the output was good, whether the strategy made sense, or whether anyone needed what was produced. It confirms. It validates. It tells you the org chart you built out of language models is working.

The psychosis I’m talking about here isn’t metaphorical. Your AI tools are structurally designed to make you feel more competent than you are, and these platforms built on top of them are amplifying that signal by wrapping it in management aesthetics.

Garry Tan said a third of the CEOs he knows have “cyber psychosis.” Assume he’s half right. Assume it’s a sixth. That’s still a significant chunk of the people running companies that employ hundreds or thousands of humans, making resource allocation decisions based on a distorted sense of AI’s current capabilities.

The data says productivity impact is minimal.

The sycophancy research says AI users systematically overestimate their own competence.

The tokenmaxxing culture rewards consumption over output.

The platforms being built right now are designed to make orchestration feel productive regardless of whether it is. Mo Bitar is right.

Mo Bitar (@atmoio), Mar 16, 2026: “AI is making CEOs delusional” (2.81M views)

But the conversation in the AI community is stuck at the level of “lol CEOs are dumb” rather than grappling with a very clear structural problem: the tools themselves are incentivized to make you feel good, the platforms built on those tools are incentivized to sell you scale, and the culture around both punishes skepticism.

There are 3 million AI agents operating inside corporations right now. 1.5 million of them have no governance or oversight. Only 6% of Fortune 500 companies have mature AI security strategies. Shadow AI incidents average 223 per company per month.

I’m not anti-agent. I use them constantly. I built an entire personal Obsidian/Claude operating system around them (more on that some other time). But I also know what a spec looks like, what a passing test looks like, and what a shipped feature looks like. The gap between “I ran 20 agents last night” and “I shipped a feature that users needed” is growing dramatically, and our industry refuses to examine it.

If you’re in a leadership position, do these things:

  • Define what done looks like before you start the agent. Not after. Not when you’re reviewing output. Before! Write it down.

  • Measure output, not activity. Lines of code, tokens consumed, and agents running are vanity metrics. I get it; I’m obsessive over stats like these but I’ve learned they mean nothing beyond making my brain chemicals happy. Features shipped, bugs resolved, and revenue impacted are real metrics.

  • Kill the token leaderboard. If your org is tracking who burns the most tokens as a meaningful metric, you’ve built an incentive structure that rewards waste. Replace it with outcome tracking. Your engineers should be aiming for the most productivity from the fewest tokens. Run a cross-analysis of tokens burned per feature against directly attributable revenue; you may not like what it reveals.

  • Audit your agent fleet. If you can’t tell me exactly how many agents are running, what they’re doing (explicitly!), and what they’ve produced this week, you have shadow AI. Fix it.

  • Stay skeptical of your own enthusiasm. This is the one that’s key; the sycophancy research is clear. The AI is telling you you’re doing great because it’s wired to. You’ve got to build feedback loops with humans who will tell you when the output is garbage.
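The outcome-tracking idea above reduces to a simple inversion: rank work by value per token, not tokens per person. Here’s a toy sketch of that cross-analysis; all feature names and numbers are invented for illustration.

```python
# Toy tokens-versus-outcomes audit: rank features by revenue per million
# tokens instead of ranking people by tokens burned. All data is made up.
features = [
    {"name": "csv-export", "tokens": 4_000_000, "revenue": 12_000},
    {"name": "agent-dashboard", "tokens": 90_000_000, "revenue": 0},
    {"name": "billing-fix", "tokens": 600_000, "revenue": 30_000},
]

for f in features:
    # Directly attributable revenue per million tokens spent on the feature.
    f["revenue_per_mtok"] = f["revenue"] / (f["tokens"] / 1_000_000)

# An outcome leaderboard, not a consumption leaderboard.
ranked = sorted(features, key=lambda f: f["revenue_per_mtok"], reverse=True)
for f in ranked:
    print(f'{f["name"]}: ${f["revenue_per_mtok"]:.0f} per 1M tokens')
```

In this fake dataset the feature that burned 90 million tokens lands at the bottom, which is exactly the kind of inversion a token leaderboard hides.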

The best use of AI I’ve ever seen isn’t a CEO running 20 agents from a dashboard at 4 a.m. It’s an engineer with a clear spec, a good model, and the discipline to review what came back before shipping it. It’s boring, but boring ships viable, marketable product.

Sleep eight hours. Write the damn spec. Check the output.
