your brain doesn’t record experiences like a camera. it predicts them.
Karl Friston’s free-energy principle describes the brain as a prediction machine. it maintains an internal model of the world and generates expectations about what happens next. when reality matches the prediction, nothing interesting happens. when reality violates the prediction — a prediction error — that’s the signal.
prediction errors drive learning and memory consolidation. you don’t remember your commute because your brain predicted every part of it correctly. you remember the day your car broke down because your world model didn’t anticipate it. the surprise is the information.
during sleep, the hippocampus replays the day’s experiences and consolidates the ones that generated prediction errors into long-term cortical storage. the rest — the predictable, the familiar, the redundant — gets pruned.
memory isn’t storage. memory is the residual of surprise.
how AI agents “remember” today
most AI memory libraries — Mem0, Zep, Letta — follow the same pattern:
- a conversation happens
- an LLM extracts facts from the conversation
- every extracted fact gets stored
- at retrieval time, vector similarity finds the “most relevant” facts
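the pattern above fits in a dozen lines. this is a hypothetical sketch, not any particular library's code — `llm` and `embed` are stand-ins for real model calls:

```python
def dot(a, b):
    # toy similarity over embedding vectors
    return sum(x * y for x, y in zip(a, b))

def naive_remember(conversation, store, llm, embed):
    # steps 2-3: extract every fact and store all of them. no novelty check.
    for fact in llm(f"Extract all facts from: {conversation}"):
        store.append((embed(fact), fact))

def naive_recall(query, store, embed, top_k=5):
    # step 4: vector similarity over everything ever stored
    q = embed(query)
    ranked = sorted(store, key=lambda item: dot(item[0], q), reverse=True)
    return [fact for _, fact in ranked[:top_k]]
```

run `naive_remember` on two similar conversations and the store happily accumulates duplicates — nothing in the loop asks "did I already know this?"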
step 3 is where everything goes wrong. the system stores everything — facts the agent already knows, trivial observations, redundant restatements. there’s no prediction error. the agent treats “user mentioned they like coffee” the same as “user just changed jobs.”
three months into production, the knowledge base is a dumping ground. retrieval becomes unreliable because signal is buried in noise. engineers add hacks — recency weighting, importance scores, periodic cleanup jobs. none of it addresses the root cause: the system never had a principled way to decide what was worth remembering.

this is how you’d design memory if you’d never studied how memory actually works.
the Nemori paper describes the first agent memory system I've seen take this seriously:
- a conversation happens and gets segmented into episodes
- before extracting knowledge, the system predicts what the episode should contain given existing knowledge
- the episode is compared against the prediction
- only the prediction errors — facts the system failed to anticipate — get extracted and stored
the existing knowledge base is the world model. the prediction step is the brain generating expectations. the calibration step identifies what violated those expectations. the extraction is consolidation — encoding only the surprising parts.
the result: a knowledge base that grows slowly, containing only information that was novel when encountered. no importance scoring heuristics. no cleanup jobs. importance emerges from prediction error, the same way it does in biological systems.
but prediction error only solves half the problem. it tells you what’s new — not when it was true or whether it still is.
time isn’t a timestamp
your friend lives in Paris, then moves to Amsterdam. if your memory system only tracks “the latest thing I know,” the Paris fact gets overwritten. that’s what most AI memory systems do — “Lives in Paris” becomes “Lives in Amsterdam” and history is gone.
biological episodic memory doesn’t work this way. it tracks when things were true and when you learned about them — separately. you find out on Friday that your friend moved last month. the event happened last month; your knowledge of it started Friday. your brain keeps both.
in database theory, this is called bi-temporal modeling: event time (when was this true in the world) and transaction time (when did the system learn it). a single timestamp tells you when something was stored. it doesn’t tell you when it started being true or when it stopped.
without this, prediction error gets the extraction right but the storage wrong. you correctly identify “user moved to Amsterdam” as novel. but now what? overwrite “lives in Paris” and the history is gone. keep both and you have two contradicting facts with no way to tell which is current. either way, a year later someone asks “where did they live in 2023?” and the system can’t answer.
what I built
I took these two ideas — predict-calibrate extraction from prediction error, bi-temporal validity from episodic temporal structure — and built memv, an open-source memory system for AI agents. (the neurobiology degree finally paid off.)
messages → episodes → predict-calibrate → knowledge → hybrid retrieval
conversations get segmented into episodes — coherent chunks of interaction, analogous to event segmentation in cognitive psychology. before extraction, the system predicts what each episode should contain given existing knowledge. only prediction errors get stored.
each piece of knowledge carries event time (when was this true in the world) and transaction time (when did the system learn it). contradicting facts don’t get overwritten — they get superseded, the old fact receiving a temporal bound.
retrieval combines vector similarity and BM25 text search via Reciprocal Rank Fusion, filtered by temporal validity. by default you get currently-valid knowledge. you can also query any point in time.
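Reciprocal Rank Fusion itself is small: each retriever contributes 1/(k + rank) per document, and the contributions are summed. a sketch (k = 60 is the constant from the original RRF formulation; whether memv uses that exact value is an assumption):

```python
def rrf(rankings, k=60):
    # rankings: one ranked list of ids per retriever (vector, BM25, ...), best first
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # fuse: highest combined score first
    return sorted(scores, key=scores.get, reverse=True)
```

the appeal of RRF is that it only needs ranks, not scores — so cosine similarities and BM25 scores, which live on incompatible scales, can be merged without normalization.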
the whole thing runs on SQLite. no graph database, no vector service, no external infrastructure. pip install memvee.
what this doesn’t do
I borrowed two ideas from neurobiology. biological memory has dozens more.
prediction error filters the redundant, not the irrelevant. your brain encounters thousands of surprising stimuli daily. most don’t make it to long-term memory — they pass through additional filters: emotional salience via the amygdala, goal-relevance via prefrontal evaluation, social significance. you walk past a stranger in an unusual hat — prediction error. you don’t remember it a week later. someone tells you your cat died — also prediction error, but it stays with you for years. the difference is relevance, not surprise. memv has no second filter. “user saw a weird cloud” and “user is leaving their job” both get stored.
forgetting is a feature, not a bug. biological memory actively forgets — both passive decay and interference from newer memories. memv stores everything indefinitely. the knowledge base grows monotonically even when old facts lose practical value.
associative retrieval goes deeper than similarity. a smell triggers a childhood memory across completely unrelated contexts. vector embeddings capture semantic similarity, but not the rich, multi-modal associative structure of biological memory. retrieval in memv finds what’s about the same thing. biological retrieval finds what’s connected to the same thing. that’s a fundamentally different operation.
these limits define the boundary: memv can eliminate redundancy, handle temporal evolution, and keep the knowledge base focused on the novel. what it can't do is distinguish novel-and-important from novel-and-irrelevant. that's the next hard problem.
where this leads
right now, memv’s “world model” is flat — a collection of facts. it doesn’t represent what the user is trying to achieve or why facts matter to them. the free-energy principle points at what’s missing: a generative model of goals, causes, and expectations — not just stored facts.
a world model that tracks goals could weight prediction errors by relevance. “user saw a weird cloud” is novel but irrelevant to shipping a product. “user’s co-founder is leaving” is novel and urgent. the difference is legible only if you model goals.
a world model with causal structure could propagate consequences. “user switched from Python to Rust” doesn’t just update one fact — it invalidates a cluster of related knowledge about their tech stack. you don’t hope the LLM notices. you guarantee it propagates.
this is probably achievable with current LLMs as the world model backbone — but that depends on a question nobody has settled: do LLMs trained on next-token prediction develop internal world models, or are they pattern matchers that need explicit structure on top? the answer changes how far prompting alone can take you.
either way: memory systems need to move from “store what’s novel” toward “store what matters.” prediction error gets you halfway there.
memv is open source at github.com/vstorm-co/memv. the Nemori paper is at arxiv.org/abs/2508.03341. Friston’s free-energy principle has a gentler introduction in “the free-energy principle: a unified brain theory?” (Nature Reviews Neuroscience, 2010).