Show HN: Pitstop-check – finds the retry bug that turns 429s into request storms

3 points by SirBrenton 3 months ago · 4 comments · 1 min read


I kept running into the same bug in AI agent codebases: retry logic that ignores Retry-After under concurrency.

It looks fine at first, but under load it turns rate limits into request storms.

I wrote a small CLI to catch it:

  npx pitstop-check ./src

It scans TS/JS and flags things like:

  - 429 handled without Retry-After
  - blanket retry of all 429s (no CAP vs WAIT distinction)
  - unbounded retry loops (no max elapsed)
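
The kinds of checks listed above can be approximated with plain text matching. The sketch below is a toy, not pitstop-check's actual implementation; the function name scanSource, the regexes, and the warning text are all hypothetical. It flags a 429 literal in a file that never mentions Retry-After, and a while (true) loop in a file that talks about retries.

```typescript
// Toy heuristic scanner. NOT pitstop-check's implementation: the rules,
// names, and messages here are hypothetical and purely illustrative.
interface Finding {
  file: string;
  line: number;
  message: string;
}

function scanSource(file: string, source: string): Finding[] {
  const findings: Finding[] = [];
  // File-level signals: is Retry-After (or retryAfter) ever read, and are retries mentioned?
  const mentionsRetryAfter = /retry-?after/i.test(source);
  const mentionsRetry = /retry/i.test(source);

  source.split("\n").forEach((text, i) => {
    // A 429 literal in a file that never looks at Retry-After.
    if (/\b429\b/.test(text) && !mentionsRetryAfter) {
      findings.push({ file, line: i + 1, message: "429 handled without Retry-After" });
    }
    // An infinite loop in a file that talks about retries.
    if (/while\s*\(\s*true\s*\)/.test(text) && mentionsRetry) {
      findings.push({ file, line: i + 1, message: "possible unbounded retry loop" });
    }
  });
  return findings;
}

// The code under scan is just a string here; it is never executed.
const sample = [
  "async function call() {",
  "  while (true) {",
  "    const res = await send();",
  "    if (res.status === 429) { await retry(); continue; }",
  "    return res;",
  "  }",
  "}",
].join("\n");

for (const f of scanSource("example.ts", sample)) {
  console.log(`[WARN] ${f.file}:${f.line} - ${f.message}`);
}
// [WARN] example.ts:2 - possible unbounded retry loop
// [WARN] example.ts:4 - 429 handled without Retry-After
```

Because rules like these are purely textual, a test file that only asserts on a 429 status would be flagged too, which is the kind of false positive a heuristic scanner has to live with.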

Example (run against OpenClaw):

  [WARN] src/agents/venice-models.ts:24 — 429 handled without Retry-After
  [WARN] src/agents/venice-models.ts:24 — All 429s treated as retryable — CAP vs WAIT not distinguished

The retry primitive supports Retry-After. The callers just don’t wire it up.

So when the API returns Retry-After: 600, the client retries on its own schedule instead of backing off.
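
Retry-After: 600 asks the client to wait 600 seconds (ten minutes). Per HTTP semantics the header carries either a number of seconds or an HTTP-date. A minimal sketch of turning it into a delay, assuming the header value has already been read from the response; the name retryAfterMs is hypothetical:

```typescript
// Convert a Retry-After header value into a delay in milliseconds.
// The header is either delay-seconds ("600") or an HTTP-date
// ("Wed, 21 Oct 2015 07:38:00 GMT"). Returns undefined when the header is
// missing or unparseable so the caller can fall back to its own backoff.
function retryAfterMs(
  value: string | null | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value == null) return undefined;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000; // delay-seconds form
  }
  const dateMs = Date.parse(trimmed); // HTTP-date form
  if (Number.isNaN(dateMs)) return undefined;
  // A date already in the past means "retry now", never a negative delay.
  return Math.max(0, dateMs - nowMs);
}

const now = Date.parse("Wed, 21 Oct 2015 07:28:00 GMT");
console.log(retryAfterMs("600", now)); // 600000 (ten minutes)
console.log(retryAfterMs("Wed, 21 Oct 2015 07:38:00 GMT", now)); // 600000
console.log(retryAfterMs("soon", now)); // undefined
```

Returning undefined rather than 0 for a missing or unparseable header keeps "the server gave no hint" distinct from "retry immediately", so the caller only falls back to its own schedule in the first case.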

The underlying problem is that three distinct failure modes get collapsed into one:

  WAIT — respect Retry-After
  CAP  — limit retries / concurrency
  STOP — don’t retry
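
One way to keep WAIT, CAP, and STOP apart is to classify each failed response before deciding whether to retry at all. A sketch under stated assumptions: the status-code mapping, the time budget, and the names Decision and classify are illustrative, not taken from pitstop-check. In particular, treating a Retry-After longer than the caller's budget as STOP is a design choice, not something the header mandates.

```typescript
// Hypothetical classifier for the three modes in the post. The status-code
// mapping and the time budget are assumptions, not pitstop-check's rules.
// Intended for failed responses only (status >= 400).
type Decision =
  | { kind: "WAIT"; delayMs: number } // server said how long to back off
  | { kind: "CAP" }                   // retryable, but bound attempts/concurrency
  | { kind: "STOP" };                 // retrying will not help

function classify(
  status: number,
  retryAfterSeconds: number | undefined,
  budgetMs: number,
): Decision {
  if (status === 429 || status === 503) {
    if (retryAfterSeconds !== undefined) {
      const delayMs = retryAfterSeconds * 1000;
      // Asked to wait longer than the caller can afford: give up cleanly.
      return delayMs <= budgetMs ? { kind: "WAIT", delayMs } : { kind: "STOP" };
    }
    // Throttled with no hint: retry, but only under a cap with backoff.
    return { kind: "CAP" };
  }
  if (status >= 500) return { kind: "CAP" };
  // Remaining 4xx (bad request, auth, not found) will not change on retry.
  return { kind: "STOP" };
}

console.log(classify(429, 600, 30000).kind);       // STOP
console.log(classify(429, 5, 30000));              // WAIT with delayMs 5000
console.log(classify(429, undefined, 30000).kind); // CAP
console.log(classify(401, undefined, 30000).kind); // STOP
```

With a 30-second budget, a Retry-After of 600 seconds becomes an explicit STOP that the caller can surface, instead of a silent retry on the client's own schedule.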

Most code just does:

  retry()
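
In contrast to a bare retry(), a retry loop can honor the server's hint, cap the number of attempts, and stop once a total time budget is spent. A minimal sketch, assuming each attempt reports a status code and the raw Retry-After value; all names and option fields are hypothetical, and the clock and sleep are injected so the loop can be exercised without real waiting. For brevity it only parses the delay-seconds form of Retry-After.

```typescript
// Sketch of a bounded retry loop that keeps WAIT, CAP and STOP apart.
// All names and option fields are hypothetical. Only the delay-seconds
// form of Retry-After is parsed, to keep the example short.
interface AttemptResult {
  status: number;
  retryAfter?: string; // raw Retry-After header value, if present
}

interface RetryOptions {
  maxAttempts: number;  // CAP: hard limit on tries
  maxElapsedMs: number; // CAP: total time budget across all tries
  baseDelayMs: number;  // backoff base when the server gives no hint
  sleep: (ms: number) => Promise<void>; // injected so tests need no real waiting
  now: () => number;                    // injected clock
}

async function retryWithBudget(
  attempt: () => Promise<AttemptResult>,
  opts: RetryOptions,
): Promise<AttemptResult> {
  const start = opts.now();
  for (let i = 1; ; i++) {
    const res = await attempt();
    if (res.status < 400) return res; // success

    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || i >= opts.maxAttempts) return res; // STOP, or cap reached

    // WAIT: prefer the server's hint; otherwise fall back to exponential backoff.
    const hint = res.retryAfter?.trim();
    const hintedMs = hint !== undefined && /^\d+$/.test(hint) ? Number(hint) * 1000 : undefined;
    const delayMs = hintedMs ?? opts.baseDelayMs * 2 ** (i - 1);

    // Never sleep past the overall budget; surface the failure instead.
    if (opts.now() - start + delayMs > opts.maxElapsedMs) return res;
    await opts.sleep(delayMs);
  }
}

// Usage with a fake clock: a hinted 429, an unhinted 429, then success.
let clock = 0;
const slept: number[] = [];
const responses: AttemptResult[] = [{ status: 429, retryAfter: "2" }, { status: 429 }, { status: 200 }];
retryWithBudget(async () => responses.shift()!, {
  maxAttempts: 5,
  maxElapsedMs: 60000,
  baseDelayMs: 500,
  sleep: async (ms) => { slept.push(ms); clock += ms; },
  now: () => clock,
}).then((res) => console.log(res.status, slept.join(", "))); // 200 2000, 1000
```

Concurrency limiting, the other half of CAP, is not shown; it would typically sit in front of a loop like this, for example as a limiter shared by all callers of the same API.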

The tool is heuristic (it will flag some test files), but it’s been useful for quickly spotting this pattern in real repos.

https://github.com/SirBrenton/pitstop-check

SirBrenton (OP) 3 months ago

If you want to try it on your own code, run npx pitstop-check ./src. There’s no config and no install step, and it works on any TS/JS repo. Happy to answer questions about the pattern or about false positives.

rjpruitt16 3 months ago

Cool stuff.

  • SirBrenton (OP) 3 months ago

    Thanks, Rahmi. I saw EZThrottle a while back and have been thinking about how the layers relate. Your coordinated retry layer assumes the failure is classified correctly upstream; that’s the gap I’ve been working on.

    I’d be curious whether you’re seeing cases where region racing picks the wrong route because the original 429 was misclassified.

    • rjpruitt16 3 months ago

      EZThrottle works by sending the request and, depending on which error codes the user chooses to reroute on, sending it to another region. That gives the user a way to override the API in case it misclassifies the error, though the user has to tune it.
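
The rerouting described here can be sketched generically: try regions in order and move on only when the status is one the user has opted to reroute on. This is not EZThrottle's API; sendWithReroute, the region names, and the send callback are hypothetical.

```typescript
// Generic "reroute on chosen status codes" failover. NOT EZThrottle's API;
// sendWithReroute, the send callback, and region names are hypothetical.
type Send = (region: string) => Promise<number>; // resolves to an HTTP status

async function sendWithReroute(
  regions: string[],
  rerouteOn: ReadonlySet<number>,
  send: Send,
): Promise<{ region: string; status: number } | undefined> {
  let last: { region: string; status: number } | undefined;
  for (const region of regions) {
    const status = await send(region);
    last = { region, status };
    // Only codes the user opted into move the request to the next region;
    // success or any other error is returned as-is.
    if (!rerouteOn.has(status)) return last;
  }
  return last; // every region answered with a reroutable code (or none given)
}

// Usage: us-east is throttled, eu-west succeeds.
const fakeStatus: Record<string, number> = { "us-east": 429, "eu-west": 200 };
sendWithReroute(["us-east", "eu-west"], new Set([429]), async (r) => fakeStatus[r])
  .then((out) => console.log(out?.region, out?.status)); // eu-west 200
```

If an API reports what is really a rate limit as a 500, adding 500 to the reroute set is the kind of tuning the comment refers to; the set is only as good as the classification behind it.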
