
Hey HN,

4.5 months ago I was climbing cell towers and installing closets. Zero programming background. Today I'm releasing VAC Memory System, an open-source conversational memory that scores 80.1% on the LoCoMo benchmark, beating Mem0 (68.5%), Letta/MemGPT (74%), Zep (75.1%), and Memobase (75.8%).

GitHub: https://github.com/vac-architector/VAC-Memory-System


🔥 The Problem I Solved

Vector databases have a critical blind spot: they return semantically similar but factually wrong results.

Example:

Query: "Where did I meet Alice?"
FAISS returns: "I met Bob at the coffee shop"  

Why? High cosine similarity: same sentence structure, a location mention. But the wrong entity.

BM25 catches "Alice" via exact match but misses paraphrasing ("encountered Alice", "ran into her").

Mem0's approach: LLM-driven extraction → graph storage → semantic retrieval. Great for preferences, but still vulnerable to entity confusion on factual questions.

MemGPT/Letta's approach: OS-like virtual memory with paging between "core" and "archival" memory. Elegant for context management, but retrieval relies on the same semantic search that causes false positives.


🎯 My Solution: MCA (Multi-Candidate Assessment)

I invented a physics-inspired pre-filter that runs before expensive vector search:

# The "gravitational" ranking formula
coverage = len(query_keywords & memory_keywords) / len(query_keywords)
distance = max(0.1, 1.0 - coverage)
mass = coverage * importance_weight
force = G * (query_mass * memory_mass) / (distance**2 + delta)  # delta: small constant guarding the denominator

The insight: Treat memories as planets with mass. Keywords create "gravitational attraction". High entity overlap = strong pull toward the query.

Why it works:

  • Query: "Where did I meet Alice?" β†’ keywords: {alice, meet}
  • Memory: "Met Bob at coffee shop" β†’ coverage = 0/2 = 0% β†’ filtered out
  • Memory: "I met Alice at the library" β†’ coverage = 2/2 = 100% β†’ passed through

MCA eliminates false positives before FAISS even runs.
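
To make the idea concrete, here's a toy version of the coverage filter. It's an illustrative sketch, not the repo's production code (see mca_lite.py for that); the stopword list and memories are invented:

STOPWORDS = {"i", "a", "the", "at", "did", "where"}

def keywords(text):
    return {w.strip("?.,!") for w in text.lower().split()} - STOPWORDS

def mca_filter(query, memories, min_coverage=0.1):
    q = keywords(query)
    kept = []
    for mem in memories:
        coverage = len(q & keywords(mem)) / len(q)  # fraction of query keywords present
        if coverage >= min_coverage:
            kept.append((coverage, mem))
    return [m for _, m in sorted(kept, reverse=True)]

print(mca_filter("Where did I meet Alice?", [
    "Met Bob at the coffee shop",   # coverage 0/2 -> filtered out
    "I met Alice at the library",   # coverage 1/2 -> kept
]))  # -> ['I met Alice at the library']

Without synonym expansion this toy scores the Alice memory at 1/2 ("met" ≠ "meet"); step [2] of the pipeline below is what lifts it to the 2/2 shown above.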


βš™οΈ Full Architecture (8 Steps)

Query: "Where did I meet Alice?"
         ↓
[1] Query Classification (factual/temporal/conceptual)
         ↓
[2] LLM Synonym Expansion (Qwen 14B via Ollama)
    "alice" β†’ ["alice", "alicia", "her"]
    "meet" β†’ ["meet", "met", "encountered", "ran into"]
         ↓
[3] MCA-FIRST FILTER (coverage β‰₯ 0.1)
    1000 memories β†’ ~30 candidates
         ↓
[4] FAISS (BGE-large, 1024D)
    Adds semantic matches: "visited Alice", "saw her"
    β†’ 100 candidates
         ↓
[5] BM25 (Okapi with custom tokenization)
    Catches keyword variations FAISS missed
    β†’ 40 more candidates
         ↓
[6] Union + Deduplication β†’ ~120 unique
         ↓
[7] Cross-Encoder Reranking (bge-reranker-v2-m3, 278M params)
    120 β†’ 15 best
         ↓
[8] GPT-4o-mini (T=0.0, max_tokens=150)
    β†’ Final answer
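
Steps [3]-[6] amount to a union of candidate ids with deduplication; scoring is deferred to the reranker in step [7]. A minimal sketch (the ids are invented and the function name is mine, not the repo's API):

def union_dedupe(*candidate_lists):
    seen, merged = set(), []
    for candidates in candidate_lists:
        for mem_id in candidates:
            if mem_id not in seen:  # keep the first occurrence only
                seen.add(mem_id)
                merged.append(mem_id)
    return merged

mca_ids   = ["m1", "m7"]   # [3] MCA survivors
faiss_ids = ["m1", "m3"]   # [4] semantic matches
bm25_ids  = ["m3", "m9"]   # [5] keyword matches
print(union_dedupe(mca_ids, faiss_ids, bm25_ids))  # -> ['m1', 'm7', 'm3', 'm9']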

📊 Head-to-Head Comparison

| Aspect | VAC Memory | Mem0 | Letta/MemGPT | Zep |
|---|---|---|---|---|
| LoCoMo Accuracy | 80.1% | 68.5% | 74.0% | 75.1% |
| Architecture | MCA + FAISS + BM25 + Cross-Encoder | LLM extraction + Graph | OS-like paging + Archive search | Summarize + Vector |
| Entity Protection | ✅ MCA pre-filter | ❌ Semantic only | ❌ Semantic only | ❌ Semantic only |
| Latency | 2.5 sec/query | ~3-5 sec | ~2-4 sec | ~2-3 sec |
| Cost per 1M tokens | <$0.10 | ~$0.50+ | ~$0.30+ | ~$0.20+ |
| Reproducibility | 100% (seed-locked) | Variable | Variable | Variable |
| Conversation Isolation | 100% | Partial | Partial | Partial |

🔬 Why Existing Solutions Fail on Factual Questions

Mem0's Graph Memory

  • Strength: Great for storing relationships and preferences ("User likes pizza")
  • Weakness: On factual retrieval ("When did I meet Alice? "), the LLM-driven extraction can miss nuances. Graph traversal still relies on semantic similarity for node matching.
  • VAC advantage: MCA ensures entity-level precision before any semantic matching

Letta/MemGPT's Virtual Memory

  • Strength: Elegant OS-inspired design. Self-editing memory blocks. Multi-step reasoning via "heartbeats".
  • Weakness: Archival retrieval uses archival_memory_search which is... vector search. Same entity-confusion problem.
  • VAC advantage: Hybrid retrieval (MCA + FAISS + BM25) covers all failure modes

Pure Vector Search (FAISS/Pinecone)

  • Strength: Fast, scalable, catches paraphrasing
  • Weakness: Optimizes for cosine similarity, not factual correctness
  • VAC advantage: Cross-encoder reranking on filtered candidates, not raw vectors

📈 The Numbers Behind 80.1%

Validated across:

  • 10 conversations Γ— 10 seeds = 100 runs
  • 1,540 total questions
  • 3 question types: Single-hop (87%), Multi-hop (78%), Temporal (72%)

Component Recall (ground truth coverage):

  • MCA alone: 40-50%
  • FAISS alone: 65-70%
  • BM25 alone: 50%
  • Union (MCA + FAISS + BM25): 94-100%

Key insight: No single retrieval method is sufficient. The union catches what each individual method misses.
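
The same point as a toy recall computation (ids invented purely for illustration):

def recall(retrieved, ground_truth):
    return len(set(retrieved) & set(ground_truth)) / len(ground_truth)

truth = ["m1", "m3", "m9", "m12"]
mca, faiss_hits, bm25 = ["m1", "m3"], ["m3", "m9", "m2"], ["m12"]
print(recall(mca, truth), recall(faiss_hits, truth), recall(bm25, truth))  # 0.5 0.5 0.25
print(recall(mca + faiss_hits + bm25, truth))                              # 1.0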


πŸ› οΈ Technical Deep Dive

Why MCA Works (The Physics Metaphor)

When I explained my idea to Claude CLI, I said: "Memories are like planets. They have MASS based on frequency. They ATTRACT the query with gravitational force."

Claude thought I was crazy. Three hours of arguing later:

def gravitational_force(m1, m2, distance):
    # m1/m2: "masses" of query and memory; distance = max(0.1, 1 - coverage)
    G = 1.0  # scaling constant; the +0.001 below guards against a zero denominator
    return G * (m1 * m2) / (distance ** 2 + 0.001)

Result: +15% recall improvement.

Embedding Stack

  • Model: BAAI/bge-large-en-v1.5 (1024D)
  • Index: FAISS IVF1024,Flat
  • Why BGE: #1 on MTEB leaderboard for retrieval

Cross-Encoder Precision

  • Model: BAAI/bge-reranker-v2-m3 (278M params)
  • Why: Cross-encoders see query+document together, not separate embeddings
  • Impact: Converts 94% recall β†’ 80% accuracy

LLM Generation

  • Model: GPT-4o-mini
  • Temperature: 0.0 (deterministic)
  • Why: Cheapest + fastest + reproducible

💰 Cost Comparison

| System | Cost per 1M tokens | Notes |
|---|---|---|
| VAC Memory | <$0.10 | GPT-4o-mini at T=0.0 |
| Mem0 | ~$0.50+ | LLM extraction overhead |
| Letta Cloud | ~$0.30+ | Agent orchestration |
| OpenAI Memory | ~$0.30+ | Built-in, but limited |

VAC is 5-10x cheaper because:

  1. MCA filter reduces candidate pool before expensive operations
  2. Single LLM call for final answer only
  3. No LLM-driven memory extraction/consolidation

🧪 Reproducibility

Every result is verifiable:

# Run with seed
SEED=2001 LOCOMO_CONV_INDEX=0 python orchestrator.py

# Same seed = same results
# 100 runs validated

This matters. Most AI benchmarks are non-reproducible. VAC locks everything:

  • Random seeds
  • Temperature = 0. 0
  • Deterministic FAISS search
  • Hash verification of indexes

📦 What's in the Repo

VAC-Memory-System/
├── mca_lite.py          # ~40 lines: learn the MCA concept
├── pipeline_lite.py     # ~250 lines: 4-step demo pipeline
├── Core/*.so            # Compiled production binaries
├── data/                # Pre-built SQLite + FAISS indexes
├── baseline_100/        # 100 verified benchmark runs
└── run_test.sh          # One-click testing

LITE version: Fully open-source Python to learn the architecture.
FULL version: Compiled binaries that achieve 80.1%.