Show HN: LangGraph profiling – 737x Faster Checkpoints via Rust (PyO3)

github.com

2 points by ticktockten 23 days ago · 0 comments · 1 min read

Building AI agents with LangGraph, I noticed graph invocations were slow even before hitting the LLM. Dug into the Pregel execution engine to find out why.

THE PROBLEM

Profiled my LangGraph agents: 50-100ms per invocation, most of it spent outside the LLM call. Two culprits:

1. ThreadPoolExecutor created fresh on every invoke() — 20ms overhead

2. Checkpointing uses deepcopy() — 52ms for 35KB state, 206ms for 250KB
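You can reproduce the second culprit in a few lines of stdlib Python. The state dict below is hypothetical (real LangGraph state shapes vary), but it shows the pattern: copying the state is far more expensive than serializing it once.

```python
# Sketch: the deepcopy cost that dominates checkpointing, vs. serializing.
# The state dict is a made-up stand-in for an agent's message history.
import copy
import json
import time

# Build a state dict in the tens-of-KB range.
state = {"messages": [{"role": "user", "content": "x" * 100} for _ in range(300)]}

t0 = time.perf_counter()
snapshot = copy.deepcopy(state)       # what a checkpoint copy costs
deepcopy_ms = (time.perf_counter() - t0) * 1000

t0 = time.perf_counter()
blob = json.dumps(state).encode()     # what serde-style serialization costs
serialize_ms = (time.perf_counter() - t0) * 1000

print(f"deepcopy:  {deepcopy_ms:.2f} ms")
print(f"serialize: {serialize_ms:.2f} ms")
```

Absolute numbers depend on the machine and state shape; the point is that deepcopy walks and reconstructs every nested Python object, while serialization does one linear pass.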

THE FIX

Rewrote hot paths in Rust via PyO3:

Checkpoint serialization (serde vs deepcopy):

35KB state: 0.29ms vs 52ms = 178x faster

250KB state: 0.28ms vs 206ms = 737x faster

E2E with checkpointing: 2-3x faster

Drop-in usage:

export FAST_LANGGRAPH_AUTO_PATCH=1

# or explicitly:

from fast_langgraph import RustSQLiteCheckpointer

checkpointer = RustSQLiteCheckpointer("state.db")

KEY INSIGHT

PyO3 boundary costs ~1-2μs per call. Rust only wins when you:

- Avoid intermediate Python objects (checkpoint serialization)

- Batch operations (channel updates)

- Handle large data (state > 10KB)

For simple dict ops, Python's C-dict still wins.
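The batching point can be illustrated in pure Python: a function call has a fixed per-call cost (a stand-in for the PyO3 boundary crossing), so doing the work item by item pays that cost N times, while one batched pass pays it once. Numbers are machine-dependent; `crossing` is a hypothetical stand-in, not a real API.

```python
import time

def crossing(x):
    """Stand-in for a call that crosses a language boundary:
    a fixed cost per call, regardless of payload size."""
    return x * 2

items = list(range(100_000))

t0 = time.perf_counter()
per_item = [crossing(x) for x in items]  # one "crossing" per element
per_item_ms = (time.perf_counter() - t0) * 1000

t0 = time.perf_counter()
batched = [x * 2 for x in items]         # one pass, no per-element calls
batched_ms = (time.perf_counter() - t0) * 1000

print(f"per-item: {per_item_ms:.1f} ms, batched: {batched_ms:.1f} ms")
```

With a real FFI boundary the gap is larger, since each crossing also converts arguments and results between Python and Rust representations.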

Architecture: Python orchestration (compatibility) + Rust hot paths (performance).

The repo runs regular compatibility checks against LangGraph.

MIT licensed. Feedback welcome.
