Ask HN: How are you developing and testing agents without burning tokens?
Every agent run (and especially when running tests in CI/CD) adds token cost, nondeterminism, and slows iteration.
I’m not talking about LLM evals here. I mean integration-style testing.
There are some workarounds like VCR-style record/replay (e.g., Python's VCR “cassettes”) and similar ideas for capturing LLM calls. They can work for certain frameworks when your test/runtime is the one making the HTTP requests.
Using local LLMs to save costs adds significant delay for each execution (+ doesn't solve nondeterminism).
However, it becomes significantly more challenging for MCP-style flows, where the interaction can be IDE-driven (stdio/JSON-RPC) and the HTTP calls may not even originate from your code.
I'm curious what people are using.
No comments yet.