Librarian - Intelligent Context Management for AI


Open Source

Stop burning tokens.
Cure context rot.

AI agents re-read your entire context on every turn - costs explode, quality drops. The Librarian fixes this: up to 85% fewer tokens, no context rot, and near-infinite scalability. Open source.

💸

Quadratic Cost

By turn 50, brute-force approaches send 6× more tokens than necessary. Every turn re-processes the entire history - costs scale as n².

Up to 85% cost reduction vs. brute-force at 50 turns
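The quadratic-vs-linear scaling can be checked with a back-of-envelope calculation. The per-message size and context budget below are illustrative assumptions, not the project's measured benchmarks:

```python
# Back-of-envelope: cumulative tokens sent over n turns.
# TOKENS_PER_MSG is a hypothetical figure, not a measured one.
TOKENS_PER_MSG = 500

def brute_force_total(turns: int) -> int:
    # Each turn re-sends the entire history: 1 + 2 + ... + n messages,
    # so cumulative cost grows as n^2.
    return sum(t * TOKENS_PER_MSG for t in range(1, turns + 1))

def librarian_total(turns: int, index_tokens: int = 100,
                    hydrated_budget: int = 800) -> int:
    # Each turn sends the compressed index (~100 tokens per past message)
    # plus a curated context capped at a fixed budget.
    return sum(t * index_tokens + hydrated_budget for t in range(1, turns + 1))

bf = brute_force_total(50)
lib = librarian_total(50)
print(f"brute-force: {bf}, librarian: {lib}, savings: {1 - lib / bf:.0%}")
```

The exact savings depend on message sizes and the index compression ratio; the point is that brute-force cost is quadratic in turn count while the indexed approach stays close to linear.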

🧠

Context Rot

As context grows, LLMs lose track. Key instructions get buried under noise. Research shows the "Lost in the Middle" effect can cause quality to drop by 20-85% as context length increases.

82% answer accuracy - beats brute-force (78%) with less context

⏱️

Latency Ceiling

At 100K tokens, brute-force response generation can take up to 60 seconds. The prefill cost scales linearly with history size.

3-4× faster at scale - at 100K tokens vs. brute-force

How the Librarian Works

A simple three-step process that replaces brute-force context with intelligent reasoning.

1. Index

After each message, a lightweight model creates a ~100-token summary. This builds a compressed index of the entire conversation - 10× smaller than the raw history. This happens asynchronously, so the user never waits.
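The indexing step above can be sketched as follows. `summarize` is a hypothetical stand-in for a call to a lightweight summarization model, and the class and method names are illustrative, not the Librarian's actual API:

```python
import asyncio

async def summarize(message: str) -> str:
    # Placeholder: in practice this asks a small LLM for a ~100-token
    # summary of the message.
    return message[:400]

class SummaryIndex:
    def __init__(self) -> None:
        # Each entry pairs a message id with its compressed summary.
        self.entries: list[tuple[int, str]] = []

    async def add(self, message_id: int, message: str) -> None:
        summary = await summarize(message)
        self.entries.append((message_id, summary))

    def render(self) -> str:
        # The compressed index the Librarian reads at selection time.
        return "\n".join(f"[{mid}] {s}" for mid, s in self.entries)

# Indexing can run in the background so the user never waits, e.g.:
#   asyncio.create_task(index.add(msg_id, msg))
```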

2. Select

When a new message arrives, the Librarian reads the summary index and reasons about which messages are relevant. Unlike vector search, it understands temporal logic and dependencies between messages.
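A minimal sketch of the selection step: the model reasons over the summary index and returns the ids of messages worth fetching in full. `call_llm` is a hypothetical stand-in for your LLM client, and the prompt wording is illustrative:

```python
import json

SELECT_PROMPT = """You are a context librarian. Given the summary index
below and a new user message, return a JSON list of the message ids whose
full content is needed to answer. Consider temporal order and
dependencies between messages, not just keyword overlap.

Index:
{index}

New message: {query}
"""

def select_messages(index_text: str, query: str, call_llm) -> list[int]:
    # The model returns something like "[3, 7, 12]"; parse it into ids.
    raw = call_llm(SELECT_PROMPT.format(index=index_text, query=query))
    return [int(i) for i in json.loads(raw)]
```

Because the model reads the summaries as prose, it can apply temporal logic ("the fix discussed *after* the bug report") that a pure embedding lookup would miss.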

3. Hydrate

Only the selected messages are fetched in full and passed to the responder. The result: a highly curated context of ~800 tokens instead of 2,000+ tokens of noise. Less noise → better answers.
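The hydration step can be sketched like this. `store` is a hypothetical mapping from message id to full message text, and the 4-characters-per-token estimate is a common rough heuristic, not the Librarian's tokenizer:

```python
def hydrate(selected_ids: list[int], store: dict[int, str],
            budget: int = 800) -> str:
    # Fetch only the selected messages in full, stopping once the
    # curated context would exceed the token budget.
    context: list[str] = []
    used = 0
    for mid in selected_ids:
        text = store[mid]
        cost = len(text) // 4  # rough token estimate (~4 chars/token)
        if used + cost > budget:
            break
        context.append(text)
        used += cost
    return "\n\n".join(context)
```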

Built for Everyone

Coming Soon

Fine-Tuned Librarian Endpoints

We're building specialized LLM endpoints optimized for the Librarian's selection task. Early benchmarks show 1.3s context creation - an 84% reduction from general-purpose models. Zero config, drop-in replacement.