Show HN: I lost $200 from an agent loop, so I built per-tool AI budget controls

lava.so

2 points by mej2020 a month ago · 2 comments · 2 min read


I left an agent running before bed. It got stuck in a loop. By morning it had burned through $200 in LLM calls.

That was the breaking point, but the real problem had been building for a while. I use tools like OpenClaw and Cursor daily, each hitting various AI providers. But I had no idea what each tool was actually costing me. One shared key across everything, no per-tool visibility, no way to cap spend.

So I built AI Spend into Lava. The idea is simple: create isolated API keys, each with its own:

- Spend limit (daily/weekly/monthly/total)
- Model restriction (lock to a specific model or allow any)
- Real-time usage tracking
- Instant revoke

It works as a transparent proxy. Your tools point to a single OpenAI-compatible endpoint. Lava validates the key, checks the spend limit and model restrictions, then forwards the request to the right provider. Spend is tracked per key per cycle. When a key hits its limit, requests are rejected until the cycle resets. Under the hood it translates requests across 38+ providers (OpenAI, Anthropic, Google, Mistral, DeepSeek, etc.), so anything that works with the OpenAI API works with this. No SDK changes.
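One way a proxy like this can pick the upstream provider is by dispatching on the model name in the request. A minimal sketch of that routing idea, with a made-up prefix map (the real provider list and model names will differ):

```python
# Hypothetical model-prefix -> provider routing table; the actual proxy
# supports 38+ providers and presumably a richer mapping than this.
PROVIDER_BY_PREFIX = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "gemini-": "google",
    "mistral-": "mistral",
    "deepseek-": "deepseek",
}

def route(model: str) -> str:
    """Return the upstream provider for an OpenAI-style request."""
    for prefix, provider in PROVIDER_BY_PREFIX.items():
        if model.startswith(prefix):
            return provider
    raise ValueError(f"no provider configured for model {model!r}")

route("claude-sonnet-4")  # "anthropic"
route("deepseek-chat")    # "deepseek"
```

Because the dispatch happens server-side, every client just speaks the OpenAI wire format to one endpoint, which is what makes the "no SDK changes" claim work.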

Would love to hear how others are handling AI cost control, especially if you're running agents in production.

aura-guard 24 days ago

API-level spend caps solve the "how much" problem but not the "why." The agent still loops 50 times before hitting the limit. You just lose $50 instead of $200.

The missing layer is detection inside the agent loop itself. If the agent is calling search_kb for the 8th time with slightly different args, or about to issue a refund it already issued, you want to catch that at iteration 3, not at the dollar ceiling.
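The core of that detection can be very small: track how many times each tool has been called, ignoring minor argument differences so that "slightly different args" still count as repeats. A generic sketch of the idea (not Aura Guard's actual API):

```python
from collections import Counter

def make_loop_detector(max_repeats: int = 3):
    """Return a check(tool, args) -> bool that flags probable loops.

    Counts calls per tool name, deliberately ignoring args so that
    near-duplicate calls with jittered arguments still count as repeats.
    """
    counts: Counter[str] = Counter()

    def check(tool_name: str, args: dict) -> bool:
        counts[tool_name] += 1
        return counts[tool_name] > max_repeats

    return check

detect = make_loop_detector(max_repeats=3)
for i in range(3):
    assert not detect("search_kb", {"q": f"refund policy v{i}"})
# the 4th near-identical call trips the detector, at iteration 4
# rather than at the dollar ceiling
assert detect("search_kb", {"q": "refund policy v3"})
```

A real implementation would also want per-conversation scoping and argument-similarity checks, but the point stands: this signal is only visible inside the loop, not at the proxy.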

I built an open-source middleware called Aura Guard (https://github.com/auraguardhq/aura-guard) that does exactly this. It sits in the agent loop and detects repeated tool calls, argument jitter, duplicate side-effects, stall patterns, and budget overruns. When it catches a loop it can rewrite the prompt, return a cached result, or escalate instead of letting the agent spin until an external limit kills it.

Zero dependencies, framework-agnostic, works with any LLM provider. Has a shadow mode so you can see what it would catch without blocking anything.

Your approach and this are complementary. Spend caps at the proxy level, loop detection at the agent level. Both are needed if you're running agents in production.

amavashev 23 days ago

Per-key isolation + model locking is a solid baseline — especially for multi-tool stacks where one shared key hides everything.

One thing we’ve noticed though: spend caps stop damage, but they don’t prevent pathological behavior. By the time the cap trips, the agent has already drifted.

We’ve been experimenting with pre-authorization per action (reserve → commit style) rather than just per-key ceilings. It lets you detect anomalous patterns before the burn accumulates — especially in looping or tool-chaining scenarios.

Curious — have you seen most overruns come from loops, retries, or just high-token completions?
