The call that sticks with us was with an engineering leader who'd already built billing. He wasn't shopping for anything. He just wanted to know if we could handle the edge cases his homegrown system couldn't.
About 20 minutes in, he mentioned his team was spending somewhere between 20 and 30 percent of their daily engineering bandwidth just maintaining the billing engine.
We asked if that felt like a lot.
"It's just how it is," he said.
That person is Shubhendu Shishir. He heads engineering at Simplismart, a full-stack MLOps platform serving companies like InVideo and Swiggy. By the time we spoke, his team had already spent 1.5 to 2 months building a custom billing engine.
The system worked. It also broke under heavy load, synced data via cron jobs instead of in real time, and meant sales couldn't close any deal with a custom rate without filing an engineering ticket first. He later described the whole thing as "building billing as a second product."
Simplismart isn't a one-off. Kush Daga, founding engineer of Segwise, an AI-driven ad analytics platform, spent three weeks building credit-based billing infrastructure in-house.
The system was fragile, had no customer-facing UI, hadn't been load tested, and every credit decision needed an engineer to unblock it.
They eventually scrapped it and shipped working credit infrastructure in three days on a third-party platform. Kush's advice afterward: "I'd tell any AI company, don't build a credit system yourself."
The pattern is the same each time: a team scopes billing, feels good about the estimate, builds the parts they can see, and then spends the next 12 to 18 months discovering the parts they couldn't.
The work never stops. It just gets less visible. It turns into maintenance tickets, support escalations when an invoice is wrong, and engineering time that should be going toward the actual product.
So why does this keep happening to smart teams, especially new AI teams?
The estimation trap nobody talks about
Billing estimates are almost always wrong in the same direction and for the same reason. It's not that engineers are bad at estimating. It's that billing has a deceptive surface area.
When an engineering lead scopes a billing system for SaaS, they're thinking about the parts they can see: capture usage events, run some math, generate an invoice, charge a card.
That's a reasonable thing to build in a few weeks. The catch is that those parts are also what you'd build for a prototype. They work fine until the business model changes, a customer does something unexpected, or finance asks for something nobody accounted for.
The billing system architecture that survives contact with the real world isn't the one you scoped. It's the one you ended up with after 18 months of edge cases.
There's a second trap hiding inside the first one. Even when a team knows billing will be more complex than a prototype, they scope the build, not the ownership. Building billing once is expensive; owning it permanently is more expensive.
But that second cost never shows up as a line item. It shows up as a percentage of every sprint for two years: maintenance tickets, proration bugs, finance reconciliation requests, and the slow grind of keeping a system running that nobody really wants to own.
The question that actually matters isn't "can we build billing?" Most teams can. The real question is what it costs to own billing as permanent infrastructure, and whether that's the best use of the engineering time available.
What you're actually signing up to build
The part that surprises most teams isn't any single subsystem. It's how many there are.
When a team scopes billing, they're usually thinking about three or four pieces: usage metering, invoice generation, Stripe for payments, and maybe credit wallets if the product needs them.
In practice, a production-ready billing system has nine distinct subsystems, each with its own edge cases and maintenance surface. Most teams discover this in production rather than in planning.
Here's what that actually looks like.
1. Usage metering
Before Simplismart had proper metering infrastructure, every billing cycle meant manual log parsing, SQL queries, and cron jobs.
In Shubhendu's words: "Very initially, we basically captured the log lines and whatever was necessarily required, we wrote some queries on top of it, and finally did the billing manually." That's not metered billing, that's archaeology.
What metering actually requires: deduplication at scale, backfill handling for events that arrive out of order, exactly-once semantics when things crash, and support for multiple billing units per product.
Simplismart now meters tokens, audio minutes, megapixels, and GPU minutes across 750-plus pricing features. All of that needs to be designed for from day one, not retrofitted after things start breaking.
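To make the deduplication and out-of-order requirements concrete, here is a minimal Python sketch. It assumes every event carries a producer-supplied idempotency key (`event_id`) and an event-time timestamp; `UsageEvent`, `UsageLedger`, and the sample data are hypothetical names for illustration, and the in-memory dict stands in for whatever durable store a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    event_id: str          # idempotency key set by the producer (assumed)
    customer_id: str
    unit: str              # e.g. "tokens" or "gpu_minutes"
    quantity: Decimal
    occurred_at: datetime  # when the usage happened, not when it arrived


class UsageLedger:
    """In-memory stand-in for a durable event store."""

    def __init__(self) -> None:
        self._events: dict[str, UsageEvent] = {}

    def ingest(self, event: UsageEvent) -> bool:
        """Store the event once. Returns False for a repeated delivery."""
        if event.event_id in self._events:
            return False
        self._events[event.event_id] = event
        return True

    def total(self, customer_id: str, unit: str,
              start: datetime, end: datetime) -> Decimal:
        """Sum usage in [start, end) by event time, so late arrivals still count."""
        return sum(
            (e.quantity for e in self._events.values()
             if e.customer_id == customer_id and e.unit == unit
             and start <= e.occurred_at < end),
            Decimal("0"),
        )


def at(day: int) -> datetime:
    """Helper for the demo: a UTC timestamp in January 2025."""
    return datetime(2025, 1, day, tzinfo=timezone.utc)


ledger = UsageLedger()
ledger.ingest(UsageEvent("evt-2", "cust-1", "tokens", Decimal("500"), at(20)))
ledger.ingest(UsageEvent("evt-1", "cust-1", "tokens", Decimal("1200"), at(5)))  # arrives late
ledger.ingest(UsageEvent("evt-2", "cust-1", "tokens", Decimal("500"), at(20)))  # retry, ignored
ledger.ingest(UsageEvent("evt-3", "cust-1", "gpu_minutes", Decimal("42"), at(9)))

january_tokens = ledger.total("cust-1", "tokens", at(1), at(31))
```

Keying on `event_id` makes retried deliveries harmless, and aggregating by `occurred_at` rather than arrival order means a late event lands in the right billing window. What the sketch leaves out is the hard part: doing this durably, at volume, and for windows that have already been invoiced.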
2. The pricing engine
The first pricing model any team builds is never the last one. Tiered pricing, volume discounts, per-customer overrides, contract-specific minimums, promotional rates: none of these exist on day one, but the billing engine needs to support them when they do.
One engineer in a Hacker News thread on billing infrastructure described rewriting their pricing logic four separate times in 18 months as their business model evolved. Each rewrite meant a migration on live customer data.
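As a rough illustration of why pricing logic gets rewritten, here is a sketch of graduated tiered pricing in Python. The tier boundaries, the prices, and the function name `graduated_charge` are made up for the example. Passing a different `tiers` table is one simple way to model a per-customer override.

```python
from decimal import Decimal

# (upper bound of the tier in units, price per unit). None means "no upper bound".
# All numbers here are illustrative.
DEFAULT_TIERS = [
    (1_000, Decimal("0.010")),
    (10_000, Decimal("0.008")),
    (None, Decimal("0.005")),
]


def graduated_charge(quantity: int, tiers=DEFAULT_TIERS) -> Decimal:
    """Charge each slice of usage at the rate of the tier it falls into."""
    total = Decimal("0")
    floor = 0  # units already priced by lower tiers
    for cap, unit_price in tiers:
        if quantity <= floor:
            break
        upper = quantity if cap is None else min(quantity, cap)
        total += (upper - floor) * unit_price
        if cap is None:
            break
        floor = cap
    return total


# A negotiated contract can swap in its own table without touching the logic.
ENTERPRISE_TIERS = [(None, Decimal("0.004"))]

list_price = graduated_charge(12_000)
contract_price = graduated_charge(12_000, ENTERPRISE_TIERS)
```

Even this small version embeds a decision (graduated rather than volume pricing) that a later contract may contradict, which is roughly how the rewrites start.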
3. Proration and mid-cycle changes
This one is genuinely tricky math, and it's easy to convince yourself it doesn't matter until you have enterprise customers. What happens when a customer upgrades from monthly to annual on day 17 of a 30-day cycle, while carrying a usage add-on that has already been billed for 12 days under the old plan?
Most homegrown billing systems have "it mostly works" logic for this. The edge cases show up as customer support tickets six months after launch.
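Here is a deliberately simplified sketch of day-based proration in Python, covering only the easy case: a plan change partway through a 30-day cycle. The prices are invented, and the sketch assumes a change on day 17 means 16 days were consumed on the old plan. That assumption is itself a policy choice, and it's exactly the kind of detail "it mostly works" logic gets wrong.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def prorate(price: Decimal, days: int, days_in_cycle: int) -> Decimal:
    """Share of a cycle price covering `days` out of `days_in_cycle`, to the cent."""
    return (price * days / days_in_cycle).quantize(CENT, rounding=ROUND_HALF_UP)


def upgrade_adjustment(old_price: Decimal, new_price: Decimal,
                       change_day: int, days_in_cycle: int) -> Decimal:
    """Net amount due today: new plan for the remaining days minus unused old plan."""
    days_used = change_day - 1            # assumption: the change day bills at the new rate
    days_left = days_in_cycle - days_used
    credit = prorate(old_price, days_left, days_in_cycle)
    charge = prorate(new_price, days_left, days_in_cycle)
    return charge - credit


# Illustrative numbers: a 300.00 plan upgraded to a 600.00 plan on day 17 of 30.
due_today = upgrade_adjustment(Decimal("300"), Decimal("600"), 17, 30)
```

The monthly-to-annual case with a usage add-on layers several more decisions on top of this: which cycle length applies, how the add-on's already-billed days are credited, and where the rounding remainder goes.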
4. Invoice generation
This is the subsystem teams most often think they've handled when they haven't. An invoice isn't a template with some math. It's tax calculations, credits, discounts, minimum commitments, overages, and the kind of line-item transparency that enterprise customers expect.
When Simplismart's manual billing couldn't show customers what they were consuming in real time, customers had to take the final invoice on faith. For a platform serving banks and healthcare companies, that's not a minor inconvenience. It shows up in renewal conversations.
Four down, five to go. And these next five are the ones nobody budgets for.
5. Credit wallets and prepaid balances
A credit balance sounds like a number that decrements. In practice it means tracking balance state correctly, deciding the credit application order (promotional before paid? oldest first?), running real-time balance checks without adding latency to your API call path, blocking access automatically when credits run out, and handling top-up flows and grace periods.
Segwise built this in-house. Every credit decision needed an engineer, nobody had visibility into burn rate, and there was no customer-facing UI or alerting. Just an engineer manually unblocking things when something went wrong.
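The credit application order is a good example of a decision that looks trivial until it has to be written down. The Python sketch below assumes one possible policy: promotional credits are spent before paid ones, and within each kind the grant that expires soonest goes first. `CreditGrant` and `consume` are hypothetical names, and the balances are invented.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class CreditGrant:
    grant_id: str
    kind: str            # "promo" or "paid"
    remaining: Decimal
    expires_on: date


def consume(grants: list, amount: Decimal, today: date) -> Decimal:
    """Deduct `amount` across grants; return the shortfall that credits could not cover."""
    usable = [g for g in grants if g.expires_on >= today and g.remaining > 0]
    # Assumed policy: promo before paid, then earliest expiry first.
    usable.sort(key=lambda g: (g.kind != "promo", g.expires_on))
    for grant in usable:
        if amount == 0:
            break
        take = min(grant.remaining, amount)
        grant.remaining -= take
        amount -= take
    return amount


wallet = [
    CreditGrant("g-paid", "paid", Decimal("100"), date(2025, 12, 31)),
    CreditGrant("g-promo-late", "promo", Decimal("20"), date(2025, 6, 30)),
    CreditGrant("g-promo-soon", "promo", Decimal("10"), date(2025, 3, 31)),
]

shortfall = consume(wallet, Decimal("45"), today=date(2025, 3, 1))
balances = {g.grant_id: g.remaining for g in wallet}
```

A non-zero shortfall is where the next set of questions begins: block the request, allow a grace period, or trigger a top-up. A production version would also need to make the deduction atomic under concurrent requests, which this sketch does not attempt.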
6. Entitlements and feature gating
Most teams forget to scope this entirely, and it's the one that bites hardest. Billing isn't only about money. It's about access. What features is this customer allowed to use right now, given their plan?
What's their API rate limit? This logic lives directly in your product's critical request path, not in a billing admin panel.
Getting it wrong means either over-serving customers and leaking revenue, or blocking them when they shouldn't be blocked.
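A minimal sketch of what that check looks like in the request path, written in Python. The plan names, features, and limits are hypothetical, and the caller is assumed to already know how many requests the customer has made in the current minute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlements:
    features: frozenset
    requests_per_minute: int


# Illustrative plan catalog. In practice this would be derived from billing state.
PLANS = {
    "starter": Entitlements(frozenset({"inference"}), 60),
    "scale": Entitlements(frozenset({"inference", "fine_tuning"}), 600),
}


def check_access(plan: str, feature: str, requests_this_minute: int) -> tuple:
    """Return (allowed, reason) for a single incoming request."""
    entitlements = PLANS.get(plan)
    if entitlements is None:
        return False, "unknown_plan"
    if feature not in entitlements.features:
        return False, "feature_not_in_plan"
    if requests_this_minute >= entitlements.requests_per_minute:
        return False, "rate_limited"
    return True, "ok"


allowed = check_access("scale", "fine_tuning", requests_this_minute=12)
blocked = check_access("starter", "fine_tuning", requests_this_minute=12)
throttled = check_access("starter", "inference", requests_this_minute=60)
```

The lookup itself is easy. The cost is in keeping `PLANS` consistent with what each customer is actually paying for, on every request, without adding latency.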
7. Payment processing and dunning
Teams correctly account for Stripe handling the payment mechanics. What they don't account for is dunning: what happens when a card fails, how many times to retry, when to send recovery emails, when to suspend access, and when to write off the invoice.
That's entirely yours to build. Involuntary churn from failed payments is one of the largest sources of revenue leakage in SaaS, and most homegrown systems have almost no dunning logic because it never feels urgent until a lot of revenue has already walked out the door.
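Even a bare-bones dunning policy has to answer those questions explicitly. The Python sketch below encodes one hypothetical schedule: retry on days 3, 5, and 7 after the first failure, suspend access at day 14, and write off at day 30. None of these numbers are recommendations, and recovery emails are left out to keep it short.

```python
from datetime import date

# Hypothetical policy values, chosen only for illustration.
RETRY_OFFSETS_DAYS = (3, 5, 7)   # days after the first failure at which to retry
SUSPEND_AFTER_DAYS = 14
WRITE_OFF_AFTER_DAYS = 30


def dunning_action(first_failed_on: date, today: date, retries_done: int) -> str:
    """Decide what to do today for an invoice whose payment is still failing."""
    elapsed = (today - first_failed_on).days
    if elapsed >= WRITE_OFF_AFTER_DAYS:
        return "write_off"
    if elapsed >= SUSPEND_AFTER_DAYS:
        return "suspend_access"
    if (retries_done < len(RETRY_OFFSETS_DAYS)
            and elapsed >= RETRY_OFFSETS_DAYS[retries_done]):
        return "retry_charge"
    return "wait"


failed_on = date(2025, 1, 1)
day_one = dunning_action(failed_on, date(2025, 1, 2), retries_done=0)
first_retry = dunning_action(failed_on, date(2025, 1, 4), retries_done=0)
after_grace = dunning_action(failed_on, date(2025, 1, 15), retries_done=3)
```

A function like this would typically run from a daily job, with `retries_done` read from the invoice record. What it does not capture is everything around it: telling the customer, restoring access once a retry succeeds, and reporting written-off revenue to finance.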
8. Subscription lifecycle management
Trials, activations, renewals, pauses, plan changes, cancellations, reactivations, contract amendments. The happy path is straightforward. The gaps show up at the edges: what happens to a paused subscription when its trial expires?
What happens to in-flight usage when a customer downgrades mid-cycle? These don't surface during development. They surface in production, months later, usually when something else is also broken.
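One way to force these questions into the open early is to write the lifecycle down as an explicit transition table. The Python sketch below uses a reduced set of hypothetical states; which transitions are allowed is an assumption made for illustration, not a description of any particular product.

```python
from enum import Enum


class State(Enum):
    TRIALING = "trialing"
    ACTIVE = "active"
    PAUSED = "paused"
    CANCELED = "canceled"


# Assumed policy: every transition not listed here is rejected.
ALLOWED = {
    State.TRIALING: {State.ACTIVE, State.PAUSED, State.CANCELED},
    State.ACTIVE: {State.PAUSED, State.CANCELED},
    State.PAUSED: {State.ACTIVE, State.CANCELED},
    State.CANCELED: {State.ACTIVE},   # reactivation
}


class InvalidTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value} is not allowed")
    return target


state = State.TRIALING
state = transition(state, State.PAUSED)   # a trial can be paused in this table
state = transition(state, State.ACTIVE)
```

The table allows a trial to be paused, which immediately raises the question from above: what should happen when that trial's end date passes while the subscription is still paused? The sketch doesn't answer it. It just makes it hard to avoid asking.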
9. Reporting and revenue recognition
Finance needs MRR, ARR, expansion revenue, and churn. Accounting needs ASC 606-compliant revenue recognition.
If your billing system can't produce clean data for both, someone is reconciling in a spreadsheet every month-end. It stops being a billing problem and becomes an engineering-and-finance problem that pulls in three teams every quarter.
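As a small taste of what "clean data" means here, this Python sketch spreads a prepaid contract evenly across its service months, with the rounding remainder pushed into the final month so the schedule sums to the contract value. Even ratable recognition is only the simplest case, and nothing here should be read as a complete ASC 606 treatment. The amounts and the function name are illustrative.

```python
from decimal import Decimal, ROUND_DOWN

CENT = Decimal("0.01")


def straight_line_schedule(total: Decimal, periods: int) -> list:
    """Split `total` evenly over `periods`; the last period absorbs the rounding remainder."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    per_period = (total / periods).quantize(CENT, rounding=ROUND_DOWN)
    schedule = [per_period] * (periods - 1)
    schedule.append(total - per_period * (periods - 1))
    return schedule


# Illustrative: a 1,000.00 annual prepayment recognized over 12 months.
schedule = straight_line_schedule(Decimal("1000.00"), 12)
```

Usage-based revenue, mid-term upgrades, credits, and refunds each change this schedule after the fact, which is where the month-end spreadsheet tends to come from.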
Teams that plan carefully get two or three of these right in the first sprint. The rest show up in production.