Vanity metrics are back!!


As a follow-up to my note a couple weeks ago about maxxing, I’ve been thinking about how vanity metrics are back in a big way. Maybe they never went away, but they’re back in a form worth writing about.

When I was in New York I worked for a company that required us to be in the office on Saturdays and from 9 a.m. to 11 p.m. every weekday (essentially the 2015 version of 996). It didn't make anyone more productive. We drank a lot of beer, ate a lot of Skinny Pop, and worked occasionally. What the founder actually wanted was to show investors we were so busy we had to work every Saturday. The most positive outcome was that some really deep friendships formed, because it made no material difference to the business whether we were in a seat. It was the asses-in-seats vanity metric, and because everyone around us was doing it, we did it too. (The business went under.)

Vanity metrics aren’t new. Eric Ries coined the term in The Lean Startup to describe numbers that make you feel good without telling you whether you’re doing better work. The canonical examples are a hall of fame of misplaced confidence: page views, registered users, downloads, app installs, social followers, email opens, GitHub stars. GitHub stars is still having a moment — I saw a hilarious post recently about a founder who paid a Fiverr company to get his project an insane number of them.

Page views gave way to session duration and conversion. Registered users gave way to monthly actives, then daily actives, then hourly actives for apps like Snapchat where the unit of competition got smaller as the metric got more honest. Downloads gave way to retention curves once everyone noticed roughly 70% of installs churn in the first few months. Lines of code got destroyed when people realized you could write an enormous amount of it and ship nothing of value.

The pattern is always the same: a metric gets popular because it’s easy to measure, then it gets gamed because it’s easy to measure, then it gets demoted when someone ties it to an outcome a CFO can recognize.

Which brings me to the vanity metric of the moment: token usage (ARR as well, but I can't even begin). The way it's being discussed on the internet right now is a near-perfect re-run of every cycle I just described.

Most of the CEOs I talk to are all about the leaderboards, mostly modeled off Silicon Valley's. Meta had one called "Claudeonomics" that ranked all 85,000 employees by tokens consumed and handed out titles like "Token Legend" and "Cache Wizard." Over thirty days the company burned through more than 60 trillion tokens, with the top individual averaging 281 billion (at Opus pricing that's north of a million dollars for one person). Some employees were reportedly just leaving agents running in loops overnight to pad their numbers. The dashboard came down two days after The Information wrote about it. It's still mind-boggling to me that Meta is a bajillion-dollar company after the Facebook crash of 2016 (although I heard Zuck is trying to outsource himself to his avatar, so let's see where that goes), but I digress.

The framing from leadership is what gives the game away. Jensen Huang has said he’d be “deeply alarmed” if an engineer making $500,000 a year wasn’t consuming at least $250,000 worth of tokens. Meta’s CTO said a top engineer spending their salary equivalent on tokens supposedly 10x’d their output, and that there is “no cap.” These are CEOs publicly declaring that an input cost should scale linearly with compensation, and nobody is showing the work on the output side. The assumption seems to be that if you’re a good enough engineer to make $500k, your ability to build high-quality agents must be equally high. I’m not sure that holds.

Google has been leaning on token consumption as its headline AI-progress number (a quadrillion-plus tokens a month across its cloud to be precise) but analysts have pointed out that reasoning models inflate those counts without corresponding to actual customer demand, and the company’s own pricing adjustments have quietly conceded the point. That buys you a quarter or two of narrative. It doesn’t hold up forever.

Here’s where this cycle feels genuinely different, and where enterprises are going to get hit harder than they’re prepared for.

When the vanity metric was asses in seats, the worst case was wasted payroll and a few hungover engineers. The human was still visibly the one doing (or visibly not doing) the work. When it was page views, the worst case was misallocated marketing spend, but someone could eventually ask where the revenue was.

Agents change the shape of this. When an agent is acting on your behalf (filing a ticket, sending an email, closing a loop with a customer, committing code, running in a loop it can't get out of, or, even better, powering your new personal stock portfolio), the activity and the accountability get separated in a way they never really were before. An agent burning 50,000 tokens on a bad answer looks identical, on a dashboard, to one burning 50,000 tokens on a great answer. The token count is just effort expended by a system that doesn't get tired, doesn't feel shame, and doesn't know whether the outcome was good or catastrophic.
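To make the dashboard-blindness point concrete, here's a toy sketch (all names and numbers are hypothetical, not from any real system): two agent runs that consume identical tokens, where only an outcome field distinguishes the great answer from the bad one.

```python
# Toy illustration (hypothetical data): a token dashboard cannot
# distinguish a good agent run from a bad one.
from dataclasses import dataclass

@dataclass
class AgentRun:
    agent: str
    tokens: int
    resolved: bool  # did the run actually produce a good outcome?

runs = [
    AgentRun("support-bot", 50_000, resolved=True),   # great answer
    AgentRun("support-bot", 50_000, resolved=False),  # bad answer
]

# What the token dashboard shows: identical activity for both runs.
dashboard_view = [r.tokens for r in runs]
print(dashboard_view)

# What an outcome-aware view shows: only half the spend did anything.
total_tokens = sum(r.tokens for r in runs)
resolved_tokens = sum(r.tokens for r in runs if r.resolved)
print(f"{resolved_tokens:,} of {total_tokens:,} tokens tied to a resolved outcome")
```

The point of the `resolved` field is that it has to come from somewhere outside the token meter; nothing in the consumption data itself can supply it.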

The reliability numbers aren't reassuring. Stanford puts the task failure rate for enterprise agents at around one in three. A 2025 Asana survey found a third of knowledge workers managing AI agents didn't know who to contact when something went wrong. McKinsey reports 80% of organizations have already encountered risky behavior from agents in production. The honest answer most companies have to "who owns this" is nobody specifically. It's a tale as old as time, but ownership really needs to have its moment in an autonomous world.

The discipline that's going to replace tokenmaxxing, and I do think it will, is the same one that killed every previous vanity metric: the question every CFO eventually asks, namely, is it worth it? For agents, that means cost per resolved ticket rather than tokens per engineer, cost per qualified lead rather than messages sent, cost per merged PR rather than commits attempted. These numbers are harder to produce. They require you to instrument the outcome side, assign a real owner, and be willing to look at a high token count paired with a low success rate and call it what it is: a bad investment, not a productivity win.
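In code, the shift from a vanity view to an outcome view is just a change of denominator. A minimal sketch, with entirely made-up pricing and counts (the blended token price, fleet size, and ticket volumes below are assumptions for illustration, not real figures):

```python
# Hypothetical monthly numbers for a support-agent fleet.
TOKEN_PRICE_PER_M = 10.0          # assumed blended $/million tokens
ENGINEERS = 50
tokens_used = 2_000_000_000       # 2B tokens this month
tickets_attempted = 30_000
tickets_resolved = 20_000         # roughly the 1-in-3 failure rate cited above

spend = tokens_used / 1_000_000 * TOKEN_PRICE_PER_M

# Vanity view: raw consumption per head. Big number, says nothing.
tokens_per_engineer = tokens_used / ENGINEERS

# Outcome view: the denominators a CFO can recognize.
cost_per_resolved_ticket = spend / tickets_resolved
success_rate = tickets_resolved / tickets_attempted

print(f"tokens per engineer: {tokens_per_engineer:,.0f}")
print(f"spend: ${spend:,.0f}")
print(f"cost per resolved ticket: ${cost_per_resolved_ticket:.2f}")
print(f"success rate: {success_rate:.0%}")
```

Note that the vanity metric and the outcome metrics are computed from almost the same inputs; the hard part isn't the arithmetic, it's instrumenting `tickets_resolved` honestly in the first place.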

Agents as extensions of humans are either accountable proxies or machines for laundering unmeasured activity. There isn't a middle option, because the whole point of an agent is that it acts without you watching and 10x's everything in every direction. The companies that figure out granular attribution will be the ones that can reliably gauge the success of agent fleets, or the agent workforce. Without that, 50,000 tokens is just 50,000 tokens, with no real ability to optimize and no clear expectations to optimize against. And that's the part that should worry every CEO currently celebrating their leaderboard: you can't manage what you can't attribute, and you definitely can't scale it.
