The rate trap: how one architecture decision kills flexibility

6 min read Original article ↗

Any company reselling AI through OpenAI, Anthropic, or Google inherits their pricing complexity. Most picked their billing vendor in 2023 when the pricing dimensions looked stable.

They don't look stable anymore. GPT-4 launched at $30/$60 per million tokens. GPT-5.4 nano is now $0.20/$1.25. The bigger shift is structural. OpenAI has separate rates for standard input, cached input (10% of base), batch (50% off), and model-specific ratios that range from 3x to 8x. Anthropic's pricing page went from two columns to eight per model. Google added mandatory spending caps and is deprecating models by June. (OpenAI, Anthropic, Google)

AI provider pricing dimensions over time

Provider 2023 2024 2025 2026
OpenAI input, output input, output input, cached, output, batch input, cached (0.1x), output, batch (0.5x), regional surcharge (1.1x)
Anthropic input, output input, output input, cache write, cache hit, output base input, 5min cache (1.25x), 1hr cache (2x), cache hit (0.1x), output, fast mode (6x), batch (0.5x), data residency (1.1x)
Google input, output input, output input, output, free tier input, output, Flash-Lite tier, spending caps, 3 billing tiers, deprecation cycles

Dimension count: OpenAI 2 → 6+. Anthropic 2 → 8+. Google 2 → 6+.

Every company reselling these APIs is now dealing with a measurement problem, not a pricing problem. The question isn't how much to charge. It's how to split the existing metric into dimensions that didn't exist when they started measuring.

That's where most billing systems break.

The rate trap

Most billing systems couple ingestion to rating. An event comes in, gets rated against whatever metric definition exists at that moment, and the raw event is gone. The rated output is all that remains. This approach is fast and it scales. It's also why changing your measurement model later means starting over.

Call it the rate trap. You pick a billing vendor, configure your metric, and watch events flow in. For a while, changing prices is easy. Then you need to measure something differently. Split a metric into input and output tokens. Add cached vs non-cached. Introduce a new region. You discover what got baked in at ingestion: the measurement model is stuck.

The coupling problem

In most billing engines, the metric definition and the rating configuration are the same object. You define "tokens consumed" and the rate attaches to that definition. When you want to split that metric into "input tokens" and "output tokens" priced separately, you're not adding a dimension to a metric. You're replacing a metric with two new metrics. The rating for the old metric no longer applies to anything.

This is why stream-based systems struggle with evolution. Not because streams can't store events. Because when the metric definition is baked into the event at ingestion, you can't apply a new definition to events already in the system. Some vendors let you change the price on historical events. None let you change the measurement.

"Just design the event schema right the first time"

The best objection. If you know what you'll want to price on two years from now, design the event payload to carry those properties from day one. Problem solved.

This works in theory. In practice, product teams don't know what dimensions will matter until customers force them to add ones. OpenAI didn't charge for cached tokens in 2023. Anthropic didn't have cache writes with two duration tiers. Google didn't segment by billing tier with spending caps. These dimensions didn't exist when anyone built the metric definitions they're now stuck with.

You can argue the event schema was forward-looking enough to carry these properties even before they mattered for pricing. Sometimes it was. Often it wasn't. And even when the raw data is there, if your metering layer couples the schema to the rating configuration, having the data in the event doesn't help.

The decorrelation principle

There's an architectural alternative. Keep the metering layer (what you measure, what dimensions you track) separate from the rating layer (what you charge for each dimension). Change one without changing the other.

Under this model, a product team can add a dimension (new region, new model tier, new token type) to an existing metric while a finance team adds a rate for that dimension on its own schedule. Events already in the system pick up the new dimension as long as they haven't been invoiced yet. Past invoices are locked. The current billing period absorbs the change.

The event is the source of truth. The metric definition is a view over events. The rating configuration is a view over the metric. Separated like that, each can evolve without forcing a rewrite of the others.

The ingestion commitment

Flexibility in billing is often pitched as "you can change the price." That's the easy part. The hard part is changing what you measure after events are in the system.

The architecture that makes this possible starts at ingestion. Carry rich property payloads on every event from any source your infrastructure prefers: REST API, Kafka, Kinesis, S3. Include every property you might ever need to price on, even if you aren't pricing on them today. Let the metric layer decide which properties matter through filters and group keys added later. Let the rate layer decide what each dimension costs, independent of the metric.

The alternative looks fast on day 1 and rigid on day 90.

What this lets you do

Ingest events with rich properties from day one: region, model, type, tier, cache_status, whatever you might ever want to price on. Start with a small metric definition that ignores most of them. When you discover you need a new dimension, add a filter on the existing metric. Events you've already ingested pick it up. Add a rate. Done.

What this doesn't let you do: ingest events with a metric code you haven't declared yet. You can't retroactively apply a brand new metric to events that arrived before that metric existed. The event schema is permanent. The metric definition is layered on top.

Three questions that reveal the architecture

If you're evaluating a metering platform:

  1. If I add a new filter dimension to an existing metric in month 6, do events from month 1-5 that haven't been invoiced yet pick up the new dimension? Or is the new dimension only visible going forward?
  2. Can I change what I measure without simultaneously changing how I charge for it?
  3. If I launch a new product line and need a new price, can I add the rate separately from defining the dimension, or are the two steps coupled?

If the answer to #2 is "they're coupled," you've found the architectural constraint. That's fine if your metrics are stable. If you're in AI, they won't be.

Design forward

Define your metrics broadly on day one. Include dimensions you're not yet pricing on. Send region in your event properties even if you only have one region. Send model even if you only use one. The cost of carrying extra properties is near zero. The cost of not having them when you need them is a migration.

Your metric definition is a commitment. Your rating configuration should not be.