I built a persistent memory layer for AI agents in Rust
Every Claude Code session starts from zero. It doesn't remember the bug you debugged yesterday, the architecture decision you made last week, or that you prefer Tailwind over Bootstrap. I built Memori to fix this.
It's a Rust core with a Python CLI. One SQLite file stores everything -- text, 384-dim vector embeddings, JSON metadata, access tracking. No API keys, no cloud, no external vector DB.
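To make the "one SQLite file stores everything" claim concrete, here's a minimal sketch of what such a single-table layout can look like. This is a hypothetical schema for illustration, not Memori's actual schema; the column names and the float32 packing are assumptions.

```python
import sqlite3
import struct

# Hypothetical single-file layout (NOT Memori's actual schema) showing how
# text, a 384-dim embedding, JSON metadata, and access tracking can all
# live in one SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memories (
        id           TEXT PRIMARY KEY,
        body         TEXT NOT NULL,
        embedding    BLOB,               -- 384 float32s, little-endian
        metadata     TEXT,               -- JSON blob
        access_count INTEGER DEFAULT 0,
        last_access  REAL
    )
""")

# Embeddings are just bytes to SQLite; pack 384 floats into a BLOB.
vec = [0.0] * 384
conn.execute(
    "INSERT INTO memories (id, body, embedding, metadata) VALUES (?, ?, ?, ?)",
    ("m1", "prefers Tailwind over Bootstrap", struct.pack("<384f", *vec), "{}"),
)
```

Everything lives in one file (or `:memory:` here), so backup is `cp` and there's no external vector DB to run.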
What makes it different from Mem0/Engram/agent-recall:
- Hybrid search: FTS5 full-text + cosine vector search, fused with Reciprocal Rank Fusion. Text queries auto-vectorize -- no manual --vector flag needed.
- Auto-dedup: cosine similarity > 0.92 between same-type memories triggers an update instead of a new insert. Your agent can store aggressively without worrying about duplicates.
- Decay scoring: logarithmic access boost + exponential time decay (~69 day half-life). Frequently-used memories surface first; stale ones fade.
- Built-in embeddings: fastembed AllMiniLM-L6-V2 ships with the binary. No OpenAI calls.
- One-step setup: `memori setup` injects a behavioral snippet into ~/.claude/CLAUDE.md that teaches the agent when to store, search, and self-maintain its own memory.
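Reciprocal Rank Fusion is simple enough to sketch in a few lines (Python here for readability; the real implementation is in the Rust core, and `k=60` below is the conventional RRF constant, not necessarily what Memori uses):

```python
def rrf_fuse(fts_ranked, vector_ranked, k=60):
    # Each result contributes 1 / (k + rank) from every list it appears in,
    # so items ranked well by BOTH FTS5 and vector search float to the top.
    scores = {}
    for ranked in (fts_ranked, vector_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```

Because RRF only looks at ranks, not raw scores, there's no need to normalize BM25 scores against cosine similarities before combining them.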
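The dedup path can be sketched as a similarity gate on insert. The 0.92 threshold is from the post; the in-memory list, dict shapes, and `store` function are illustrative stand-ins for the real SQLite-backed logic:

```python
import math

DEDUP_THRESHOLD = 0.92  # from the post; compared only within the same type


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def store(memories, new_text, new_vec, mem_type):
    # If an existing same-type memory is near-identical, update it in
    # place instead of inserting a duplicate.
    for m in memories:
        if m["type"] == mem_type and cosine(m["vec"], new_vec) > DEDUP_THRESHOLD:
            m["text"] = new_text
            m["vec"] = new_vec
            return "updated"
    memories.append({"type": mem_type, "text": new_text, "vec": new_vec})
    return "inserted"
```

This is what lets the agent store aggressively: a near-duplicate becomes a refresh of the existing memory rather than a new row.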
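One plausible shape for the decay score, assuming the two components named above; the exact weighting is a guess, but a decay rate of 0.01/day is consistent with the stated ~69-day half-life (ln 2 / 0.01 ≈ 69.3 days):

```python
import math

DECAY_RATE = 0.01  # per day; ln(2) / 0.01 ~= a 69-day half-life


def decay_score(access_count, age_days):
    # Logarithmic boost for frequently-accessed memories, exponential
    # decay with age. How Memori actually combines the two terms is an
    # assumption; the post only names the components.
    return math.log1p(access_count) * math.exp(-DECAY_RATE * age_days)
```

The log keeps a memory accessed 1,000 times from drowning out everything else, while the exponential term halves a memory's score roughly every 69 days of disuse.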
Performance (Apple M4 Pro):
- UUID get: 43µs
- FTS5 text search: 65µs (1K memories) to 7.5ms (500K)
- Hybrid search: 1.1ms (1K) to 913ms (500K)
- Storage: 4.3 KB/memory, 8,100 writes/sec
- Insert + auto-embed: 18ms end-to-end
The vector search is brute-force (adequate up to ~100K memories), deliberately isolated in one function for a drop-in HNSW replacement when someone needs it.
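The "isolated in one function" design can be sketched like this (illustrative Python; the real function lives in the Rust core, and the dict shapes are assumptions):

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_search(query_vec, memories, top_k=10):
    # Brute-force scan over every stored embedding. Keeping the entire
    # search inside this one function is the isolation seam: swapping
    # this body for an HNSW index lookup touches nothing else.
    scored = sorted(memories, key=lambda m: cosine(m["vec"], query_vec), reverse=True)
    return scored[:top_k]
```

O(n) per query is the whole story, which is why the quoted hybrid-search latency grows roughly linearly from 1.1ms at 1K memories to 913ms at 500K.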
After setup, Claude Code autonomously:
- Recalls relevant debugging lessons before investigating bugs
- Stores architecture insights that save the next session 10+ minutes of reading
- Remembers your tool preferences and workflow choices
- Cleans up stale memories and backfills embeddings

~195 tests (Rust integration + Python API + CLI subprocess), all real SQLite, no mocking.
MIT licensed.
GitHub: https://github.com/archit15singh/memori
Blog post on the design principles: https://archit15singh.github.io/posts/2026-02-28-designing-cli-tools-for-ai-agents/