Ask HN: How are you developing and testing agents without burning tokens?

1 points by zlatkov 2 months ago · 0 comments · 1 min read

Every agent run (and especially when running tests in CI/CD) adds token cost, nondeterminism, and slows iteration.

I’m not talking about LLM evals here. I mean integration-style testing.

There are some workarounds like VCR-style record/replay (e.g., Python's VCR “cassettes”) and similar ideas for capturing LLM calls. They can work for certain frameworks when your test/runtime is the one making the HTTP requests.

Using local LLMs to save costs adds significant delay for each execution (+ doesn't solve nondeterminism).

However, it becomes significantly more challenging for MCP-style flows, where the interaction can be IDE-driven (stdio/JSON-RPC) and the HTTP calls may not even originate from your code.

I'm curious what people are using.

No comments yet.

Settings

Ask HN: How are you developing and testing agents without burning tokens?

Keyboard Shortcuts