One aspect of the non-determinism of AI is that defining good product usage metrics becomes a bit more challenging in certain categories.
Traditional product usage metrics for SaaS companies:
Growth: qualified leads, trial or demo requests, activation to “aha” moment, sales cycle length, activation rate
Retention: logo retention, revenue retention (GRR/NRR), churn by account size, renewal rate
Engagement: active accounts, key workflow usage, seats used vs seats purchased, weekly active users per account
Revenue: MRR/ARR, average contract value (ACV), expansion revenue, customer lifetime value (LTV)
I would say revenue and growth are pretty straightforward and don’t change much with AI products. You could maybe argue that growth gains some new dimensions, such as growth in token input/output. I think for retention there are a few interesting metrics you could measure, such as “AI abandonment rate,” but a lot of the core principles stay the same. What is really different from typical B2B SaaS is the engagement metrics:
Engagement: accepted output rate, edits per output, AI usage within core workflows, human-in-the-loop rate, token output
In certain contexts, I think measuring engagement is still a clear quantitative exercise: think number of tokens output for either text or images. In other cases, I’d argue that engagement is really an inferred proxy for success. In those cases usage ≠ value, and product analytics teams need to get more creative.
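To make the quantitative side concrete, here is a minimal sketch of how a product analytics team might compute two of the engagement metrics above, accepted output rate and tokens output per account, from an event log. The event schema and field names here are my own assumptions for illustration, not any particular vendor’s.

```python
from collections import defaultdict

# Hypothetical event log; in practice this would come from your analytics
# pipeline (e.g., a warehouse table of AI interaction events).
events = [
    {"account": "acme", "type": "ai_output", "tokens": 512, "accepted": True},
    {"account": "acme", "type": "ai_output", "tokens": 380, "accepted": False},
    {"account": "globex", "type": "ai_output", "tokens": 910, "accepted": True},
]

def engagement_metrics(events):
    """Accepted output rate and total tokens output, per account."""
    stats = defaultdict(lambda: {"outputs": 0, "accepted": 0, "tokens": 0})
    for e in events:
        if e["type"] != "ai_output":
            continue
        s = stats[e["account"]]
        s["outputs"] += 1
        s["accepted"] += int(e["accepted"])
        s["tokens"] += e["tokens"]
    return {
        account: {
            "accepted_output_rate": s["accepted"] / s["outputs"],
            "tokens_output": s["tokens"],
        }
        for account, s in stats.items()
    }

print(engagement_metrics(events))
# {'acme': {'accepted_output_rate': 0.5, 'tokens_output': 892},
#  'globex': {'accepted_output_rate': 1.0, 'tokens_output': 910}}
```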
A great example is EliseAI. I specifically want to highlight:
😊 or the 5.7 million times someone said “Thank You!” to your AI agents
First time I’ve actually seen someone measure this. The product analytics team at EliseAI was probably thinking of creative ways to measure task success and came up with this approach. And honestly, I think it actually beats the “thumbs up/thumbs down” approach, since I never click that, but I do often try to thank the LLM.
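If you wanted to approximate that “thank you” signal yourself, a rough sketch might look like the following: scan user messages for gratitude phrases and count them per conversation. The transcript format and phrase list are assumptions on my part, not how EliseAI actually instruments this.

```python
import re

# Illustrative phrase list; a real implementation would be tuned against
# actual user language (and probably handle multiple languages).
GRATITUDE = re.compile(r"\b(thank you|thanks|thx|appreciate it)\b", re.IGNORECASE)

def count_thank_yous(transcript):
    """Count user messages that express gratitude toward the agent.

    Assumes a transcript is a list of (speaker, message) tuples.
    """
    return sum(
        1
        for speaker, message in transcript
        if speaker == "user" and GRATITUDE.search(message)
    )

transcript = [
    ("user", "Can you reschedule my tour to Friday?"),
    ("agent", "Done! Your tour is now Friday at 2pm."),
    ("user", "Perfect, thank you!"),
]
print(count_thank_yous(transcript))  # 1
```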
Generally, I’ve seen two ways AI companies justify their value, which then translate into their success metrics:
Task Completion
Time/Cost Saved
Let’s set up a hypothetical. Say you are a tool that helps lawyers draft contracts.
What are some out-of-the-box deterministic ways we could measure task completion and time saved without requiring too much effort from the user (so we’re excluding users pressing any buttons to confirm completion)?
Here are a few out-of-the-box ideas (a couple of them are sketched in code after the list):
Edit-to-Save ratio or Time-to-Save: how many edits (or how much time) it takes before the lawyer saves the final PDF.
Number of Rejections/Thank-yous: keywords such as “no,” “incorrect,” or “thank you” captured from user responses.
Prompt compression: Do prompt lengths get longer or shorter over time?
Clause Survival Rate: % of clauses that survive negotiation unchanged.
Senior Review Avoidance Rate: % of contracts that required no partner-level rewrite.
Prompt Déjà Vu Index: % of times lawyers rephrase the same prompt multiple ways.
Late-Night Drafting Decline: % reduction in drafting activity after 9pm.
LLM as a judge for documents: train a small model to judge whether the produced draft and the conversation the lawyer had with the chatbot were good or bad.
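Here is a rough sketch of how two of these, the Edit-to-Save ratio and the prompt-compression trend, could be computed from session logs. The log format and field names are hypothetical, just to show the shape of the calculation.

```python
from statistics import mean

# Hypothetical session logs: one record per drafting session in the
# contract tool. Field names are illustrative only.
sessions = [
    {"week": 1, "edits_before_save": 14, "prompt_lengths": [220, 180, 240]},
    {"week": 2, "edits_before_save": 9, "prompt_lengths": [160, 150]},
    {"week": 3, "edits_before_save": 4, "prompt_lengths": [90, 110]},
]

def edit_to_save_ratio(sessions):
    """Average number of edits a lawyer makes before saving the final PDF."""
    return mean(s["edits_before_save"] for s in sessions)

def prompt_compression_trend(sessions):
    """Average prompt length per week; a downward trend suggests users need
    less prompting effort to get an acceptable draft."""
    return {s["week"]: mean(s["prompt_lengths"]) for s in sessions}

print(edit_to_save_ratio(sessions))        # 9
print(prompt_compression_trend(sessions))  # {1: 213.33..., 2: 155, 3: 100}
```

The nice property of both is that they fall out of events the lawyer generates anyway, with no feedback buttons required.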
As Harvey once said, “I don’t play the odds. I play the man.” Measuring success in non-deterministic AI products follows the same logic: usage metrics describe activity, but human behavior reveals value.
At the end of the day, chatting with AI exposes a lot of human behavior beneath the surface, much of which can be captured if we know where to look. That’s not unlike how lawyers operate, paying close attention to subtle signals, not just what’s said outright.

