Ask HN: How to scale agent systems when Layer 7 is unreliable?

1 point by rjpruitt16 4 months ago · 2 comments · 1 min read

Agent workflows often involve 10+ API calls to different services (LLMs, data APIs, web scraping). When Layer 7 (the application layer: the HTTP APIs and the services behind them) is unreliable, workflows either fail outright or set off retry storms.

Common failure modes I'm thinking about:
- 429 rate limits → agents retry → the API gets hammered even harder
- Partial outages → synchronized retries across customers
- LangGraph workflows fail mid-execution → how do you resume them?
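
A common mitigation for the first two failure modes is capped exponential backoff with "full jitter" (a random delay between zero and the backoff ceiling), which spreads retries out instead of synchronizing them, combined with honoring a Retry-After value when the server sends one. A minimal Python sketch follows. It assumes a hypothetical `RateLimited` exception that your HTTP wrapper raises on a 429; `call_with_retries` and `backoff_delay` are likewise illustrative names, not a library API:

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error type: raised by a caller-supplied API call on HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after  # seconds, if the server sent Retry-After


def backoff_delay(attempt, base=0.5, cap=30.0, rng=random.random):
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(fn, max_attempts=5, base=0.5, cap=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited as err:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure instead of looping forever
            delay = backoff_delay(attempt, base, cap)
            if err.retry_after is not None:
                # Never retry sooner than the server asked us to.
                delay = max(delay, err.retry_after)
            sleep(delay)
```

The jitter matters more than the exact base and cap values: without it, clients that failed together retry together. Retries also multiply across layers (an agent retrying a tool that itself retries), so it's usually worth retrying at one layer only.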

For those running agent systems at scale:
- How do you handle Layer 7 failures?
- Do you use retry coordination? Circuit breakers?
- How do you prevent retry storms against downstream dependencies?
- Do LangGraph workflows handle API failures gracefully?
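
On the circuit-breaker question, here is a minimal single-process sketch of the usual closed → open → half-open pattern. After `failure_threshold` consecutive failures the breaker fails fast for `reset_timeout` seconds, so callers stop hammering a dependency that is already down. It then lets a trial call through (half-open) and closes again if that call succeeds. `CircuitBreaker`, `CircuitOpenError` and their parameters are hypothetical names for illustration. This version is not thread-safe and does not limit how many trial calls pass while half-open:

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Minimal closed -> open -> half-open breaker for one downstream dependency."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("dependency marked unhealthy; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed) opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

One breaker per downstream dependency (per LLM provider, per data API) keeps one bad service from tripping calls to the healthy ones.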

Curious what the production reality looks like.

verdverm 4 months ago

There's nothing special about agents in this respect; use the same techniques we use for other services.
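
One such standard technique, applied to the "how do you resume?" question in the original post, is checkpointing: persist each step's result as it completes, and on a rerun skip the steps that already finished. The framework-agnostic sketch below is not LangGraph's API. `run_workflow`, the `(name, fn)` step format and the JSON checkpoint file are all hypothetical choices made for illustration:

```python
import json
import os


def run_workflow(steps, checkpoint_path):
    """Run (name, fn) steps in order, persisting each result so a crash can resume.

    Each fn receives the dict of results so far and returns a JSON-serializable value.
    """
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)
    else:
        results = {}
    for name, fn in steps:
        if name in results:
            continue  # completed in an earlier run; don't repeat the API call
        results[name] = fn(results)
        # Write to a temp file, then rename over the checkpoint,
        # so a crash never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(results, f)
        os.replace(tmp_path, checkpoint_path)
    return results
```

A step can still run twice if the process dies after the external call but before the checkpoint write, so steps with side effects should be idempotent or pass an idempotency key to the downstream API.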

rjpruitt16 (OP) 3 months ago

What do you use?
