Why ‘Thank You’ Might Be the Best Metric in AI Products


One side effect of the non-determinism of AI is that defining good product usage metrics becomes a bit more challenging in certain categories.

Traditional Product Usage Metrics for SaaS Companies:

  1. Growth: qualified leads, trial or demo requests, activation to “aha” moment, sales cycle length, activation rate

  2. Retention: logo retention, revenue retention (GRR/NRR), churn by account size, renewal rate

  3. Engagement: active accounts, key workflow usage, seats used vs seats purchased, weekly active users per account

  4. Revenue: MRR/ARR, average contract value (ACV), expansion revenue, customer lifetime value (LTV)

I would say revenue and growth are pretty straightforward and don’t change much with AI products. You could maybe argue that growth gains some new dimensions, such as growth in token input/output. For retention there are a few interesting metrics you could measure, such as “AI abandonment rate,” but a lot of the core principles stay the same. What is really different compared to typical B2B SaaS are the engagement metrics:

Engagement: accepted output rate, edits per output, AI usage within core workflows, human-in-the-loop rate, token output

In certain contexts, I think measuring engagement is still a clear quantitative exercise: think number of tokens outputted for either text or images. In other cases, I’d argue that engagement is really an inference metric for success. Usage ≠ value, and product analytics teams need to get more creative.
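To make that concrete, here’s a minimal sketch of how a product analytics team might compute two of these engagement metrics, accepted output rate and edits per output, from raw event logs. The event schema and field names are entirely hypothetical:

```python
from collections import defaultdict

# Hypothetical event log: one record per AI-generated output.
# "accepted" = the user kept the output; "edits" = edits made before keeping it.
events = [
    {"account": "acme", "accepted": True, "edits": 2},
    {"account": "acme", "accepted": False, "edits": 0},
    {"account": "globex", "accepted": True, "edits": 7},
]

by_account = defaultdict(list)
for e in events:
    by_account[e["account"]].append(e)

for account, evts in by_account.items():
    accepted = [e for e in evts if e["accepted"]]
    accepted_rate = len(accepted) / len(evts)  # accepted output rate
    edits_per_output = (
        sum(e["edits"] for e in accepted) / len(accepted) if accepted else 0.0
    )  # edits per accepted output
    print(f"{account}: accepted rate {accepted_rate:.0%}, edits per output {edits_per_output:.1f}")
```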

A great example is Elise AI:

Screenshot from LinkedIn post

I specifically want to highlight:

😊 or the 5.7 million times someone said “Thank You!” to your AI agents

First time I’ve actually seen someone measure this. The product analytics team at EliseAI was probably thinking of creative ways to measure task success and came up with this approach. And honestly, I think this actually beats the thumbs up/thumbs down approach, since I never actually click those, but I do often thank the LLM.
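I have no idea how EliseAI implements this internally, but a naive version is just pattern matching over conversation transcripts. A rough sketch, with a made-up message format and phrase list (a real system would likely use a classifier rather than keywords):

```python
import re

# Hypothetical transcript: (speaker, text) pairs from one conversation.
conversation = [
    ("user", "Can you reschedule my tour to Friday?"),
    ("agent", "Done! Your tour is now on Friday at 2pm."),
    ("user", "thank you so much!"),
]

# Very rough gratitude detector over user messages only.
GRATITUDE = re.compile(r"\b(thank you|thanks|thx|appreciate it)\b", re.IGNORECASE)

thank_yous = sum(
    1
    for speaker, text in conversation
    if speaker == "user" and GRATITUDE.search(text)
)
print(f"Thank-you messages in this conversation: {thank_yous}")
```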

Generally, I’ve seen two ways AI companies justify their value, which then translate into their success metrics:

  1. Task Completion

  2. Time/Cost Saved

Let’s set up a hypothetical. Say you’re building a tool that helps lawyers draft contracts.

What are some out-of-the-box deterministic ways we could measure task completion and time saved without requiring too much effort from the user (so we’re excluding users pressing any buttons to confirm completion)?

Here are a few out-of-the-box ideas (a rough sketch of computing a couple of them follows the list):

  1. Edit-to-Save Ratio or Time-to-Save: How many edits (or how much time) it takes until the lawyer saves the final PDF.

  2. Number of Rejections/Thank You’s: Keywords such as “no,” “incorrect,” or “thank you” captured from user responses.

  3. Prompt Compression: Do prompt lengths get longer or shorter over time?

  4. Clause Survival Rate: % of clauses that survive negotiation unchanged.

  5. Senior Review Avoidance Rate: % of contracts that required no partner-level rewrite.

  6. Prompt Déjà Vu Index: % of times lawyers rephrase the same prompt multiple ways

  7. Late-Night Drafting Decline: % reduction in drafting activity after 9pm

  8. LLM as a Judge for Documents: Train a small model to classify whether the produced draft and the lawyer’s conversation with the chatbot indicate a good or bad outcome.
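As a rough illustration of the sketch mentioned above, here’s what #2 (Rejections/Thank You’s) and #6 (Prompt Déjà Vu Index) could look like computed from a single drafting session. The data shapes, keyword lists, and similarity threshold are all illustrative assumptions, not how any real legal-drafting product works:

```python
import re
from difflib import SequenceMatcher

# Hypothetical free-text replies the lawyer typed back to the assistant.
replies = ["No, that's incorrect.", "Looks good.", "Thank you!"]

REJECTION = re.compile(r"\b(no|incorrect|wrong|redo)\b", re.IGNORECASE)
THANKS = re.compile(r"\b(thank you|thanks)\b", re.IGNORECASE)

rejections = sum(bool(REJECTION.search(r)) for r in replies)
thank_yous = sum(bool(THANKS.search(r)) for r in replies)

# Hypothetical prompts from the same drafting session, in order.
prompts = [
    "Draft an NDA for a software vendor",
    "Draft a non-disclosure agreement for a software vendor",  # likely a rephrase
    "Add a 2-year confidentiality term",
]

# Prompt Déjà Vu Index: share of prompts that are near-duplicates of an earlier one.
def is_rephrase(a: str, b: str, threshold: float = 0.7) -> bool:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

deja_vu = sum(
    any(is_rephrase(prompt, earlier) for earlier in prompts[:i])
    for i, prompt in enumerate(prompts)
)
deja_vu_index = deja_vu / len(prompts)

print(f"Rejections: {rejections}, thank-yous: {thank_yous}")
print(f"Prompt Déjà Vu Index: {deja_vu_index:.0%}")
```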

Embedded video: Suits’ Patrick J. Adams on Crafting Mike & Harvey’s Friendship

As Harvey once said, “I don’t play the odds. I play the man.” Measuring success in non-deterministic AI products follows the same logic: usage metrics describe activity, but human behavior reveals value.

At the end of the day, chatting with AI exposes a lot of human behavior beneath the surface, much of which can be captured if we know where to look. That’s not unlike how lawyers operate, paying close attention to subtle signals, not just what’s said outright.
