Hey HN,
4.5 months ago I was climbing cell towers and installing closets. Zero programming background. Today I'm releasing VAC Memory System, an open-source conversational memory system that scores 80.1% on the LoCoMo benchmark, beating Mem0 (68.5%), Letta/MemGPT (74%), Zep (75.1%), and Memobase (75.8%).
GitHub: https://github.com/vac-architector/VAC-Memory-System
The Problem I Solved
Vector databases have a critical blind spot: they return semantically similar but factually wrong results.
Example:
Query: "Where did I meet Alice?"
FAISS returns: "I met Bob at the coffee shop"
Why? High cosine similarity: same sentence structure, a location mention. But the wrong entity.
BM25 catches "Alice" via exact match but misses paraphrasing ("encountered Alice", "ran into her").
Mem0's approach: LLM-driven extraction → graph storage → semantic retrieval. Great for preferences, but still vulnerable to entity confusion on factual questions.
MemGPT/Letta's approach: OS-like virtual memory with paging between "core" and "archival" memory. Elegant for context management, but retrieval relies on the same semantic search that causes false positives.
My Solution: MCA (Multi-Candidate Assessment)
I invented a physics-inspired pre-filter that runs before expensive vector search:
# The "gravitational" ranking formula coverage = len(query_keywords & memory_keywords) / len(query_keywords) distance = max(0. 1, 1. 0 - coverage) mass = coverage * importance_weight force = G * (query_mass * memory_mass) / (distanceΒ² + Ξ΄)
The insight: Treat memories as planets with mass. Keywords create "gravitational attraction". High entity overlap = strong pull toward the query.
Why it works:
- Query: "Where did I meet Alice?" → keywords: {alice, meet}
- Memory: "Met Bob at coffee shop" → coverage = 0/2 = 0% → filtered out
- Memory: "I met Alice at the library" → coverage = 2/2 = 100% → passed through
MCA eliminates false positives before FAISS even runs.
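Here's a minimal, self-contained sketch of the idea (function and field names are illustrative, not the repo's actual mca_lite.py API; query_mass is fixed at 1.0):

```python
# Minimal MCA pre-filter sketch. Keywords are taken as-is here; upstream
# synonym expansion is what maps "met" -> "meet" in the real pipeline.
def mca_filter(query_keywords, memories, threshold=0.1, G=1.0, delta=0.001):
    ranked = []
    for mem in memories:
        coverage = len(query_keywords & mem["keywords"]) / len(query_keywords)
        if coverage < threshold:
            continue  # entity mismatch: dropped before FAISS ever runs
        distance = max(0.1, 1.0 - coverage)
        mass = coverage * mem.get("importance", 1.0)
        force = G * (1.0 * mass) / (distance ** 2 + delta)
        ranked.append((force, mem["text"]))
    return [text for _, text in sorted(ranked, reverse=True)]

memories = [
    {"text": "Met Bob at coffee shop", "keywords": {"bob", "met", "coffee"}},
    {"text": "I met Alice at the library", "keywords": {"alice", "meet", "library"}},
]
print(mca_filter({"alice", "meet"}, memories))  # ['I met Alice at the library']
```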
Full Architecture (8 Steps)
Query: "Where did I meet Alice?"
β
[1] Query Classification (factual/temporal/conceptual)
β
[2] LLM Synonym Expansion (Qwen 14B via Ollama)
"alice" β ["alice", "alicia", "her"]
"meet" β ["meet", "met", "encountered", "ran into"]
β
[3] MCA-FIRST FILTER (coverage β₯ 0.1)
1000 memories β ~30 candidates
β
[4] FAISS (BGE-large, 1024D)
Adds semantic matches: "visited Alice", "saw her"
β 100 candidates
β
[5] BM25 (Okapi with custom tokenization)
Catches keyword variations FAISS missed
β 40 more candidates
β
[6] Union + Deduplication β ~120 unique
β
[7] Cross-Encoder Reranking (bge-reranker-v2-m3, 278M params)
120 β 15 best
β
[8] GPT-4o-mini (T=0.0, max_tokens=150)
β Final answer
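To make steps [5] and [6] concrete, here is a hedged sketch of BM25 scoring plus the union/dedup step using the rank_bm25 package (the candidate IDs and whitespace tokenization are stand-ins; the real pipeline uses custom tokenization):

```python
# Sketch of steps [5]-[6]: BM25 scoring, then union + order-preserving dedup.
from rank_bm25 import BM25Okapi

corpus = ["i met alice at the library", "met bob at coffee shop"]
bm25 = BM25Okapi([doc.split() for doc in corpus])  # toy tokenizer: str.split
scores = bm25.get_scores("where did i meet alice".split())
bm25_ids = [i for i, s in enumerate(scores) if s > 0]

mca_ids, faiss_ids = [0], [0, 1]  # pretend outputs of steps [3] and [4]
candidates = list(dict.fromkeys(mca_ids + faiss_ids + bm25_ids))  # step [6]
```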
Head-to-Head Comparison
| Aspect | VAC Memory | Mem0 | Letta/MemGPT | Zep |
|---|---|---|---|---|
| LoCoMo Accuracy | 80.1% | 68.5% | 74.0% | 75.1% |
| Architecture | MCA + FAISS + BM25 + Cross-Encoder | LLM extraction + Graph | OS-like paging + Archive search | Summarize + Vector |
| Entity Protection | ✅ MCA pre-filter | ❌ Semantic only | ❌ Semantic only | ❌ Semantic only |
| Latency | 2.5 sec/query | ~3-5 sec | ~2-4 sec | ~2-3 sec |
| Cost per 1M tokens | <$0.10 | ~$0.50+ | ~$0.30+ | ~$0.20+ |
| Reproducibility | 100% (seed-locked) | Variable | Variable | Variable |
| Conversation Isolation | 100% | Partial | Partial | Partial |
Why Existing Solutions Fail on Factual Questions
Mem0's Graph Memory
- Strength: Great for storing relationships and preferences ("User likes pizza")
- Weakness: On factual retrieval ("When did I meet Alice?"), the LLM-driven extraction can miss nuances. Graph traversal still relies on semantic similarity for node matching.
- VAC advantage: MCA ensures entity-level precision before any semantic matching
Letta/MemGPT's Virtual Memory
- Strength: Elegant OS-inspired design. Self-editing memory blocks. Multi-step reasoning via "heartbeats".
- Weakness: Archival retrieval uses `archival_memory_search`, which is... vector search. Same entity-confusion problem.
- VAC advantage: Hybrid retrieval (MCA + FAISS + BM25) covers all failure modes
Pure Vector Search (FAISS/Pinecone)
- Strength: Fast, scalable, catches paraphrasing
- Weakness: Optimizes for cosine similarity, not factual correctness
- VAC advantage: Cross-encoder reranking on filtered candidates, not raw vectors
The Numbers Behind 80.1%
Validated across:
- 10 conversations × 10 seeds = 100 runs
- 1,540 total questions
- 3 question types: Single-hop (87%), Multi-hop (78%), Temporal (72%)
Component Recall (ground truth coverage):
- MCA alone: 40-50%
- FAISS alone: 65-70%
- BM25 alone: 50%
- Union (MCA + FAISS + BM25): 94-100%
Key insight: No single retrieval method is sufficient. The union catches what each individual method misses.
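As a toy illustration of that insight (made-up memory IDs, not benchmark data):

```python
# Toy recall arithmetic: each method misses something,
# but the union covers the full ground truth.
def recall(retrieved, ground_truth):
    return len(retrieved & ground_truth) / len(ground_truth)

gt = {"m1", "m2", "m3", "m4"}
mca, faiss_hits, bm25_hits = {"m1", "m2"}, {"m2", "m3"}, {"m1", "m4"}
print(recall(mca, gt), recall(faiss_hits, gt), recall(bm25_hits, gt))  # 0.5 0.5 0.5
print(recall(mca | faiss_hits | bm25_hits, gt))                        # 1.0
```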
Technical Deep Dive
Why MCA Works (The Physics Metaphor)
When I explained my idea to Claude CLI, I said: "Memories are like planets. They have MASS based on frequency. They ATTRACT the query with gravitational force."
Claude thought I was crazy. Three hours of arguing later:
```python
def gravitational_force(m1, m2, distance):
    G = 1.0
    return G * (m1 * m2) / (distance ** 2 + 0.001)
```
Result: +15% recall improvement.
Embedding Stack
- Model: BAAI/bge-large-en-v1.5 (1024D)
- Index: FAISS IVF1024,Flat
- Why BGE: #1 on MTEB leaderboard for retrieval
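A hedged sketch of that stack (a flat inner-product index stands in for the production IVF1024,Flat build so the toy example runs without a training pass):

```python
# Embedding + retrieval sketch. IndexFlatIP substitutes for "IVF1024,Flat";
# on unit-normalized vectors, inner product equals cosine similarity.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
texts = ["I met Alice at the library", "Met Bob at coffee shop"]
emb = model.encode(texts, normalize_embeddings=True)  # unit-norm 1024-D vectors

index = faiss.IndexFlatIP(1024)
index.add(np.asarray(emb, dtype="float32"))

q = model.encode(["Where did I meet Alice?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q, dtype="float32"), 2)
```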
Cross-Encoder Precision
- Model: BAAI/bge-reranker-v2-m3 (278M params)
- Why: Cross-encoders see query+document together, not separate embeddings
- Impact: Converts 94% recall → 80% accuracy
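Roughly, via sentence-transformers (the candidate texts are illustrative):

```python
# Reranking sketch: the cross-encoder scores (query, document) pairs jointly.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")
query = "Where did I meet Alice?"
docs = ["I met Alice at the library", "Met Bob at coffee shop"]
scores = reranker.predict([(query, d) for d in docs])
top = [d for _, d in sorted(zip(scores, docs), reverse=True)][:15]
```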
LLM Generation
- Model: GPT-4o-mini
- Temperature: 0.0 (deterministic)
- Why: Cheapest + fastest + reproducible
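The final call looks roughly like this (the prompt wording is a placeholder, not the repo's actual prompt):

```python
# Final answer generation sketch with the OpenAI SDK.
from openai import OpenAI

client = OpenAI()
top_memories = ["I met Alice at the library"]  # output of the reranking step
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    temperature=0.0,
    max_tokens=150,
    messages=[
        {"role": "system", "content": "Answer using only the memories provided."},
        {"role": "user", "content": "Memories:\n"
            + "\n".join(top_memories)
            + "\n\nQuestion: Where did I meet Alice?"},
    ],
)
print(resp.choices[0].message.content)
```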
Cost Comparison
| System | Cost per 1M tokens | Notes |
|---|---|---|
| VAC Memory | <$0.10 | GPT-4o-mini at T=0.0 |
| Mem0 | ~$0.50+ | LLM extraction overhead |
| Letta Cloud | ~$0.30+ | Agent orchestration |
| OpenAI Memory | ~$0.30+ | Built-in, but limited |
VAC is 5-10x cheaper because:
- MCA filter reduces candidate pool before expensive operations
- Single LLM call for final answer only
- No LLM-driven memory extraction/consolidation
Reproducibility
Every result is verifiable:
```bash
# Run with a fixed seed
SEED=2001 LOCOMO_CONV_INDEX=0 python orchestrator.py
# Same seed = same results; 100 runs validated
```
This matters. Most AI benchmarks are non-reproducible. VAC locks everything:
- Random seeds
- Temperature = 0.0
- Deterministic FAISS search
- Hash verification of indexes
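A sketch of what that locking looks like in practice (the hash check is illustrative; the repo's actual mechanism may differ):

```python
# Illustrative seed locking + index hash verification; not the repo's
# exact mechanism.
import hashlib, os, random
import numpy as np

seed = int(os.environ.get("SEED", "2001"))
random.seed(seed)
np.random.seed(seed)

def sha256_of(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Compare against a pinned hash before trusting a pre-built index, e.g.:
# assert sha256_of("data/faiss.index") == PINNED_HASH
```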
What's in the Repo
```
VAC-Memory-System/
├── mca_lite.py       # ~40 lines: learn the MCA concept
├── pipeline_lite.py  # ~250 lines: 4-step demo pipeline
├── Core/*.so         # Compiled production binaries
├── data/             # Pre-built SQLite + FAISS indexes
├── baseline_100/     # 100 verified benchmark runs
└── run_test.sh       # One-click testing
```
LITE version: Fully open-source Python to learn the architecture.
FULL version: Compiled binaries that achieve 80.1%.