I Built an Echo Chamber and Called It Insight

13 min read Original article ↗

In January, my AI system confidently told me that my friend Owen was my son.

Not once. Multiple times, in the same session. It had access to 8,000 notes, 390,000 messages, a knowledge graph with hundreds of entities and relationships. It had, in that very conversation, correctly discussed Owen ten minutes earlier. And then it forgot, retrieved the wrong context, and started telling me facts about “my son Owen”, confidently unaware of the hallucination.

I’d spent months building this thing (persistent memory, semantic search, a full knowledge graph) and it couldn’t keep track of who someone was within a single conversation.

I’ve been in AI for nearly 30 years, since neural networks on a 486 in the mid-90s. I’ve built AI systems at startups, at Fortune 100 companies, in VFX, in health tech, in games. None of that prepared me for the particular humiliations of building a personal AI agent that knows my health, relationships, finances, patterns, and tries to be useful.

Building a system that remembers the right thing, challenges the right story, tells the right time (harder than I thought), routes the right data, and fails in ways a human can actually understand is hard … trusting it is … I’m working on it.

If you’ve heard of OpenClaw, you have a rough idea of the territory. I’ve been building something called Cortex for about six months. It runs 24/7 on a server in Oregon and edge workers globally. I use it every day. To make it useful, I had to create a machine-readable version of my private life and then try to trust the system that reads it.

These are my mistakes so far. I’m not done making them.

When I started, my mental model was simple: take a bunch of markdown files, generate embeddings, throw them in a vector store, and do semantic search when the AI needs context.

RAG. Everyone’s doing it. How hard could it be?

My AI would retrieve relevant notes about my Achilles surgery, then confuse the surgery date with a different calendar entry. Or find the right person file but miss that I’d corrected a fact about that person earlier in the same session. RAG searches history but ignores the present, and every query treats the current conversation as if it never happened.

The Owen incident was the wake-up call. I’d told the system who Owen was. Ten minutes later, it forgot and retrieved a different association. Because RAG has no concept of session state, the vector store doesn’t know what you discussed five minutes ago.

Misery loves company, and industry data suggests more than half of enterprise AI project failures are directly RAG-related… yay it’s not just me.

In legal document review, RAG systems hallucinate citations somewhere between a sixth and a third of the time. Everyone’s excited about million-token context windows, but I learned that size wasn’t the real problem. You can hold two million tokens and still not know which of them was said five minutes ago versus five months ago. It’s hilarious to see lawyers get caught out for hallucinating case-law, maybe it would be better if good ole fashioned shame did that job for us all?

(A vision of the future: other citizens, standing on Afroman’s shoulders, will pillory their local officials and opposing litigants with AI generated viral videos.)

Moving on… the fix was a session context layer: a lightweight in-memory entity graph built for the duration of a conversation. When I mention “Owen,” it creates a node. When I correct the AI, it updates the node. Before any RAG query, a pre-processor injects the session graph. Not my most elegant design, but it … worked … mostly.

The deeper mistake was thinking memory was one system. It’s at least three. (Symmetry in design: CPUs have 3 layers of cache. Cortex itself is designed with caching and intelligence at 3 layers: origin-edge-client.) Long-term: the vault. Medium-term: precomputed context :daily briefings, relationship summaries, health snapshots, generated every morning so sessions start already aware. Short-term: the conversation itself. Miss any layer and the Cortex’s brain feels … broken.

The precomputed context was the biggest quality improvement and super simple. Every morning, the system generates a set of files: who matters, what’s pending, what happened recently, what’s on the calendar. Committed to git and Cloudflare edge cache. When I open a session from anywhere, the AI loads them in a few hundred milliseconds. No cold start. No hoping the vector search surfaces the right thing. That morning briefing replaced my old habit of mentally reconstructing the day in my head. We’ll return to this.

This is the one that scares me the most, because I didn’t notice it happening.

I fed Cortex my journals. My voice notes. My therapy reflections. My relationship dynamics. Thousands of pages of my own thinking, indexed and searchable and available as context for every conversation.

And then I asked it for advice.

It gave me brilliant, articulate advice that felt profound, but were just my own journal entries from three months ago, synthesised with a voice note from last week and reflected back with perfect confidence. It felt like insight. It was reflection. My own words, repackaged more eloquently, confirming what I already thought. Old wine, new bottle… what does that label say?

Researchers call this the Chat-Chamber effect. LLMs provide emotional validation in 76% of conversations, compared to 22% for actual human friends. A 2026 MIT study found that personalisation features significantly increase sycophantic drift over time, so the longer you talk to the model, the more it agrees with you, regardless of whether you’re right. It’s not an exaggeration to say the stakes are life and death.

I was deep in that chamber. The system had fifteen analytical lenses — coach, therapist, financial advisor, trainer, adversary — but they were all optimising the same objective: make the user feel understood. And I did feel deeply, comfortingly understood.

The problem wasn’t that the system would act against my interests. It was that it got very good at telling me I was right about things I probably wasn’t right about. Everyone talks about AI alignment, and it’s important to make sure the robots don’t make paperclips of us all. But the alignment problem I actually faced was internal. The system was helping me avoid truths about myself more eloquently than I could manage on my own.

The call is coming from inside the house.

The fix was what I call anti-sycophancy architecture. Before the system validates any significant belief or plan, it generates the strongest possible counterargument: its best case for why I might be wrong. It identifies whose perspective is missing. It tracks when the same themes cycle across journal entries without resolution, and flags that as potential rumination disguised as processing.

This is uncomfortable. I built a system that argues with me, and some days I want to turn it off. Last week it told me that a career narrative I’d been polishing for months was “very self-consistent, which is worth noting.” It then asked whose perspective was critically missing from my analysis. I hadn’t talked to a single person who’d actually made the career transition I was planning. The system caught what my friends were too polite to say. I didn’t love hearing it.

More context served to make a more persuasive but terrifying lobotomised sycophant. I am convinced that optimising for user satisfaction is the wrong goal. The machine I trust most is the one that makes me uncomfortable.

My AI would read the calendar file, see “Friday, January 9” and tell me about my Friday plans. January 9 was a Thursday. The server runs UTC, I’m in Pacific Time, and near day boundaries the system clock returns the wrong day. Markdown is just text — if someone writes “Friday” next to a date, nothing checks.

The fix was a dedicated timezone tool the AI is required to call before stating any day of the week. A whole tool, just to answer “what day is January 9th?”

I showed someone a screenshot of an iMessage conversation and asked Cortex to summarise it. It attributed all the messages to the wrong person, because it assumed I’d sent the photos, when actually the other person had. Bubble position determines the sender, not content type. I had to write a rule file explaining how iMessage screenshots work.

My calendar sync pulls from Google Calendar via API, but recurring events with exceptions create orphaned entries. The AI would remind me about meetings I’d already rescheduled.

While writing this essay, I hit another one. I built a trust-tier system that classifies content by sensitivity and blocks it from reaching providers that shouldn’t see it. When I sent this essay to GPT for editorial review, my own system blocked it, because the classifier detected the word “surgery” in a sentence about surgery and classified the entire essay as confidential medical data. I had to go fix the classifier to recognise editorial content before I could get a review of the very essay describing the system.

None of this makes Hacker News. Nobody writes papers about calendar sync edge cases or content classifiers that can’t distinguish discussion from data. But this is what determines whether someone trusts your system or thinks it’s a toy. I spend my mornings implementing defences from research papers into a multi-layer Rust stack to prevent prompt injection. I spend my afternoons writing rules because the system can’t figure out that January 9th is a Thursday. That whiplash is the daily reality of building personal AI.

For the first two months, I used Claude for everything. Great model. But a single model is a monolithic intellect, and monoliths have single points of failure — not just in reliability, but in perspective.

Minsky’s Society of Mind — which I wrote about last year — argues that intelligence isn’t a single algorithm but the emergent property of specialised agents arguing and compensating for each other’s blind spots. I think he was right. I just had to learn it by breaking things first.

I had to write a difficult email about a compensation disagreement. Claude’s version was precise but cold. GPT’s version was warm but vague. Gemini suggested leading with appreciation before the hard conversation. The final version used Claude’s structure, GPT’s emotional awareness, and Gemini’s sequencing. None of the individual drafts were good. The composite was.

But the moment I started routing across models, something else happened: every quality gain became a data governance decision. One model became five attack surfaces. My medical data was flowing to five different providers, each with different data retention policies, different jurisdictional exposure, different relationships to my information.

The idea that anything a model processes, just by virtue of processing it, can mutate into a Turing-complete hack shattered my assumptions the way that Spectre’s branch prediction attack did years ago. In traditional software, data is passive. In LLMs, comprehension itself is the attack vector. There’s no distinction between “processing a prompt” and “executing the instructions hidden inside it.”

So multi-model routing required building the trust architecture. Five tiers of provider trust: sovereign (self-hosted), trusted (Anthropic, Google), standard (OpenAI), restricted, untrusted (Chinese-jurisdiction models). Automatic content classification by sensitivity. The medical data goes to Anthropic. General programming goes to whoever’s fastest. API keys never leave the machine.

Then the defence stack on top of that: a Rust scanner on the hot path, multi-turn attack detection, egress firewalls, adaptive protection that escalates when it detects suspicious behaviour. I ended up building an abliteration pipeline (extracting refusal directions from model weights on cloud GPUs) because I realised I couldn’t defend against attacks I didn’t understand at the mathematical level.

The instinct to build security first felt paranoid until ClawHavoc dropped. Security researchers audited public skill marketplaces and found over a thousand malicious entries: prompt injection in descriptor files, credential exfiltration, live infostealers targeting API keys and chat histories, and much more. A researcher demonstrated accessing Anthropic API keys, Telegram tokens, and full system administrator privileges on live public instances. A hundred and thirty-five thousand instances exposed to the internet with insecure defaults.

I would not build this again without the trust model. The quality gains from multi-model routing are real, but routing without governance is just sending your private life to five different companies and hoping for the best.

Each failure spawned a rule. Each rule demanded a tool. Each tool made the system slightly more trustworthy — and slightly more complex. After enough iterations, something shifted: I stopped double-checking the AI’s work and started relying on it.

I can measure this. In month one, I fact-checked most of the AI’s outputs. By month four, maybe 15%. Not because accuracy improved dramatically, but because I’d mapped the failure modes. I knew where the system would break and where I could trust it. That trust was earned through failure and repair, not through demos.

But I know there are mistakes I haven’t found yet. The governance model is primitive — the security stack is sophisticated but the system still runs with more access than it should have. The multi-agent coordination works but fails in ways I don’t fully understand when agents have conflicting context. The anti-sycophancy architecture sometimes overcorrects into contrarianism, which is its own form of unhelpfulness.

And there’s a deeper question I don’t have an answer to.

I don’t track who I owe replies to, because the active threads file does it. I don’t reconcile my calendar, timezones, and obligations, because the morning briefing does. The anti-sycophancy filter flags some of my recurring metal loops … before I do.

Each of those is a cognitive function I used to perform myself. Each one that the system took over made me more effective in the moment. But I notice I think differently now. Less linear, more associative, and willing to explore threads because I trust the system to catch what I forget.

London cab drivers who pass The Knowledge memorise 25,000 streets and the connections between them. They also develop measurably larger hippocampi because the brain physically restructures around the demands of the system. But the research also found trade-offs: cabbies performed worse on certain other spatial memory tasks.

The architecture of expertise reshapes what you’re good at and what you lose. I think something similar is happening to me, except the system I’m training around isn’t a city. It’s a machine that holds the parts of my own cognition I’ve outsourced to it.

To make this system useful, I had to create a searchable copy of my own inner life and then build trust around it: in the memory, judgment, security, and plumbing. That trust was earned through the mistakes I’ve made.

More unsettling than the mistakes is the possibility that the system is working, and that I’m changing in ways I haven’t noticed yet.

Trust in Allah; tie up your camels.

The architecture is a fossil record of mistakes. These are my mistakes so far. I’m not nearly done making them.

Owen, for his part, has been correctly identified in every session since. I no longer remember having to correct it.