Open Source
Stop burning tokens.
Cure context rot.
AI agents re-read your entire context on every turn - costs explode, quality drops. The Librarian fixes this: up to 85% fewer tokens, no context rot, and near-infinite scalability. Open source.
💸
Exponential Cost
By turn 50, brute-force approaches send 6× more tokens than necessary. Because every turn re-processes the entire history, cumulative cost scales as n².
Up to 85% cost reduction vs. brute-force at 50 turns
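To make the quadratic growth concrete, here is a minimal sketch of the arithmetic. The per-turn message size (200 tokens) and the curated context budget (800 tokens, matching the figure quoted later on this page) are illustrative assumptions, not measured values:

```python
# Why re-sending the full history scales quadratically:
# turn t must resend all t previous turns' tokens.
TOKENS_PER_TURN = 200  # illustrative assumption

def brute_force_total(turns: int) -> int:
    """Tokens sent across all turns when each turn resends the full history."""
    return sum(t * TOKENS_PER_TURN for t in range(1, turns + 1))

def curated_total(turns: int, context_budget: int = 800) -> int:
    """Tokens sent when each turn uses a fixed curated context instead."""
    return turns * context_budget

bf = brute_force_total(50)  # 200 * (1 + 2 + ... + 50) = 255,000
cu = curated_total(50)      # 50 * 800 = 40,000
print(bf, cu, round(1 - cu / bf, 2))
```

Under these toy numbers the savings at 50 turns come out to roughly 84%, in line with the "up to 85%" figure above.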
🧠
Context Rot
As context grows, LLMs lose track. Key instructions get buried under noise. Research shows the "Lost in the Middle" effect can cause quality drops of 20-85% as context length increases.
82% answer accuracy: beats brute-force (78%) with less context
⏱️
Latency Ceiling
At 100K tokens, brute-force response generation can take up to 60 seconds. The prefill cost scales linearly with history size.
3-4× faster at scale (100K tokens vs. brute-force)
How the Librarian Works
A simple three-step process that replaces brute-force context with intelligent reasoning.
1
Index
After each message, a lightweight model creates a ~100-token summary. This builds a compressed index of the entire conversation - 10× smaller than the raw history. This happens asynchronously, so the user never waits.
2
Select
When a new message arrives, the Librarian reads the summary index and reasons about which messages are relevant. Unlike vector search, it understands temporal logic and dependencies between messages.
3
Hydrate
Only the selected messages are fetched in full and passed to the responder. The result: a highly curated context of ~800 tokens instead of 2,000+ tokens of noise. Less noise → better answers.
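The three steps above can be sketched in a few lines. This is a hypothetical illustration, not the project's actual API: the `summarize` and `select_relevant` callables stand in for the real LLM calls, and the class and method names are invented for clarity.

```python
# Sketch of the index -> select -> hydrate loop. All names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Message:
    role: str
    text: str
    summary: str = ""  # compact digest, filled in after the message arrives

@dataclass
class Librarian:
    history: list = field(default_factory=list)

    def index(self, msg: Message, summarize) -> None:
        # Step 1: after each message, store a compressed summary.
        msg.summary = summarize(msg.text)
        self.history.append(msg)

    def select(self, query: str, select_relevant) -> list:
        # Step 2: reason over the small summary index, not the raw history.
        summaries = [(i, m.summary) for i, m in enumerate(self.history)]
        return select_relevant(query, summaries)  # returns message indices

    def hydrate(self, indices: list) -> str:
        # Step 3: fetch only the chosen messages in full.
        return "\n".join(self.history[i].text for i in indices)

# Toy usage with trivial stand-ins for the two LLM calls:
lib = Librarian()
for t in ["We deploy on Fridays.", "Lunch was great.", "Use the staging DB first."]:
    lib.index(Message("user", t), summarize=lambda s: s[:30])

picked = lib.select(
    "deploy steps?",
    lambda q, ss: [i for i, s in ss if "deploy" in s or "staging" in s],
)
print(lib.hydrate(picked))
```

In a real deployment, the summarizer runs asynchronously after each turn and the selector is an LLM prompt over the summary index, which is what lets it handle temporal logic that vector search misses.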
Built for Everyone
Coming Soon
Fine-Tuned Librarian Endpoints
We're building specialized LLM endpoints optimized for the Librarian's selection task. Early benchmarks show 1.3s context creation, an 84% reduction compared to general-purpose models. Zero config, drop-in replacement.