What Used to Take Months Now Takes Days


Building production software with Claude Code while doing my day job

Obie Fernandez

Look, I’ve been slinging code professionally for 30 years now. I’ve also built successful startups, written bestselling books, consulted for Fortune 500s, and watched countless technology waves come and go. Catching some of those waves at just the right moment is what propelled my career to where it is now. I’ve also witnessed “paradigm shifts” that weren’t and “revolutions” that fizzled. So believe me when I tell you that what I’m living through this week is genuinely different from any change I’ve ever seen before.

It started this past week (between Christmas and New Year’s), a weird liminal period when in ordinary years I would just relax and take it easy. Instead I was reflecting on how much time my team at ZAR and I spend in Claude Code sessions. Hours and hours of deep technical work, decisions being made, architecture being discussed, bugs being solved. And then… poof. The transcript disappears. The next session starts fresh.

Of course it’s not just Claude Code. The same challenge faces our Slack conversations. Linear comments. GitHub PR discussions. All this priceless institutional knowledge, constantly evaporating.

Also, the following tweet was very much on my mind.


The problem isn’t unique to AI-assisted development, but AI makes it worse. Traditional development leaves artifacts: design documents, ADRs, wiki pages, commented code. When you’re pair-programming with Claude Code, the artifact is the conversation itself. The decisions live in the dialogue. And unless you’re meticulously copying things into documentation (face it, you’re not), then that knowledge disappears the moment you close the terminal.

I started thinking about the scale of it. My team runs hundreds of Claude Code sessions per week. Each one contains decisions, learnings, architectural discussions, debugging insights. Multiply that by the dozens of Slack threads, the Linear comments on tickets, the GitHub PR review conversations. We’re generating reams of institutional knowledge and retaining none of it in a methodical fashion that can be leveraged by the advanced/autonomous AI agents I want to build in 2026.

But what if we could capture it all? What if there was a system that passively ingested every transcript, every thread, every discussion and distilled it into queryable organizational memory? Not just storing raw text, but actually understanding it. Extracting the decisions. The learnings. The conflicts. Building a semantic graph that a coding assistant could traverse. Believe it or not, that’s been a pet topic of mine since 2005!

RAG you say? Sure, I guess I could buy a solution or use something open-source. There’s probably a few dozen startups working on exactly what I’m about to show you. There’s also a slew of more traditional enterprise knowledge management platforms that would cost at least six figures before you factor in the integration costs, the consultants, the months of implementation. Fuck that.

I built it in four days.

Not a proof of concept. Not a demo. The first cut of Nexus, a production-ready system with authentication, semantic search, an MCP server for agent access, webhook integrations for our primary SaaS platforms, comprehensive test coverage, deployed, integrated and ready for full-scale adoption at my company this coming Monday. Nearly 13,000 lines of code. By the time I let this blog post marinate awhile and hit publish, I’ll probably have written another few thousand just with the issues I have queued up.

And here’s the funny thing: building Nexus wasn’t even my primary focus all week. I’m the new CTO of ZAR, and other than New Year’s Eve and New Year’s Day (when I had the luxury of uninterrupted time), I was building Nexus while juggling otherwise normal life and work responsibilities. Meetings. Slack threads. Writing production code. Code reviews. Planning. Recruiting. The usual shit. Nexus happened in the gaps, in the afternoons, in the bursts of flow state I could carve out between other obligations.

Why I’m writing this (instead of open-sourcing)

Before diving in, let me address something. My career was built on open source. I’ve contributed to Rails since the early days. I’ve authored gems that have been downloaded millions of times. I literally wrote “The Rails Way” franchise, which across its eight editions has helped codify best practices for generations of Rails developers. I’ve evangelized sharing code freely for two decades.

So why am I blogging about Nexus instead of just publishing it on GitHub?

Because of an uncomfortable realization: in an era where killer software can be developed this fast, by the right people with the right tools, and maintaining that software is practically free thanks to agentic help… open-sourcing doesn’t make the same kind of sense it used to. Not for this project.

Nexus represents a genuine competitive advantage for my company. It’s the kind of infrastructure that could differentiate us in the market. In the old days, building something like this was so expensive and time-consuming that you’d never just build it in-house. (Unless you’re Shopify, I guess.) If you were crazy/stupid enough to try anyway, then open-sourcing it made strategic sense. You’d gain community contributions, bug fixes, and reputation, all while knowing your competitors would need their own multi-month effort to catch up. The barrier to replication was high enough that sharing made sense.

Now? Fuck, no…. If I open-source Nexus today, pretty much anyone with Claude Code could fork it, customize it, and deploy it by tomorrow afternoon. My competitive advantage would evaporate in hours. That’s not an exaggeration. I’ve used Claude Code to take complex codebases and modify them substantially in single sessions. To rewrite Python libraries in Ruby in one sitting. The replication barrier has collapsed.

Before you go into fits of despair, don’t worry. I don’t think Claude Code is the death knell for open source. Foundational libraries, protocols, and tools still benefit enormously from open collaboration. The Ruby ecosystem, the JavaScript ecosystem, infrastructure tools like PostgreSQL and Redis… all will continue to benefit from shared development. But for custom-built in-house applications that represent strategic advantage? The calculus has shifted dramatically.

Another reason I’m writing this is that I keep seeing skeptics online asking: “Okay, if AI-assisted development is so revolutionary, where are the projects? Where’s the evidence?” The implication is that people like me are just hype-mongering, that “vibe coding” produces nothing of substance. That we’re exaggerating our productivity gains or building toy projects and calling them production systems.

Well, here you go. Here’s the evidence. Here’s a real project, with real commits, real timestamps, and real production deployment. Follow along. I’ll show you what was built, when it was built, and how long it actually took.

What Nexus Actually Does

Let me be concrete about what got built. Abstract descriptions of “knowledge management” don’t convey the scope. Let me walk you through the actual system.

Nexus is an organizational knowledge distillation service. The core workflow:

  1. Transcripts come in from any source: Claude Code hooks that fire automatically when sessions end, Slack threads via Events API, GitHub webhooks for PR discussions, Linear webhooks for issue comments, or manual submission through the API.
  2. LLM distillation analyzes each transcript and extracts structured knowledge: decisions made, lessons learned, people involved, topics discussed.
  3. RDF storage persists everything as semantic triples in Oxigraph, a high-performance graph database with full SPARQL support. Our source of truth is the graph, not Postgres.
  4. Vector embeddings (via pgvector) enable semantic similarity search across the entire knowledge base.
  5. MCP server exposes the graph to AI agents, so Claude can query your organizational memory directly during coding sessions.
  6. Web UI lets humans browse sessions, explore the ontology, search semantically, and manage conflicts.


The sessions index with conversational query interface running on my local development environment. Real sessions from actual Claude Code work on Nexus itself.

The technology stack deserves explanation because the choices matter:

Ruby 4.0 and Rails 8 (edge): I’m running the brand new version of Ruby and the main branch of Rails, not a released version. (As for why, see the screenshot below for an example of Claude being witty.)

[Screenshot: Claude being witty about running edge Ruby and Rails]

One of my favorite things about Rails 8 is that it ships with the Solid* gems, which eliminate Redis as a dependency: SolidQueue for background jobs, SolidCache for caching, SolidCable for WebSockets. One less piece of infrastructure to manage.

PostgreSQL with pgvector: The primary relational database, but also the vector store for semantic search. The pgvector extension lets you store 768-dimensional embeddings alongside your regular data and query them with similarity operators. No need for a separate Pinecone or Weaviate instance.

Oxigraph: A Rust-based RDF triple store with SPARQL support. It’s what makes the knowledge graph actually work. Every piece of distilled knowledge becomes semantic triples that you can query with the full power of SPARQL. “Find all decisions related to authentication made by Sarah” is a single query.

Raix: My own gem for LLM orchestration via OpenRouter. It handles the prompt construction, response parsing, and error handling for the distillation pipeline.

GitHub OAuth: Authentication for the web UI and API. Everyone has a GitHub account, so there’s no signup friction.

Now let me take you through how this thing came together, day by day.

Day 1: December 29th — The Initial Checkpoint

Time: Late morning to evening
Commits: 1 major checkpoint

Lines added: ~6,000

The first real checkpoint landed at 5:47 PM Central time. I’d been working since late morning, but I didn’t checkpoint it until I had something coherent and working.

136 files. Six thousand lines. That’s not a typo.

It’s hard to properly explain the power of Claude Code to people who haven’t experienced AI-assisted development at this level. I wasn’t typing 6,000 lines. I was directing 6,000 lines. Describing what I wanted, reviewing what Claude proposed, course-correcting, integrating, testing. The actual character input from my keyboard was maybe 10% of that. But the design decisions? The architecture? That was all me, refined through rapid dialogue with an AI that could actually implement what I was describing.

Here’s what made this sustainable rather than chaotic: TDD. Test-driven development. For most of the features, I insisted that Claude Code follow the red-green-refactor cycle with me. Write a failing test first. Make it pass with the simplest implementation. Then refactor while keeping tests green.

This wasn’t just methodology purism. TDD served a critical function in AI-assisted development: it kept me in the loop. When you’re directing thousands of lines of code generation, you need a forcing function that makes you actually understand what’s being built. Tests are that forcing function. You can’t write a meaningful test for something you don’t understand. And you can’t verify that a test correctly captures intent without understanding the intent yourself.

I’ve written about this more extensively in Ruby Was Ready From the Start, but the short version is: TDD is the only development process I know of that continually validates intent. When machines can generate endless variations of working-looking code, the only reliable way to know that software does what you intend is to encode that intent in tests and keep those tests running all the time.

Let me break down what that initial checkpoint actually contained:

The Distillation Pipeline

The core of Nexus is the DistillTranscript service. It takes raw transcript text and produces structured knowledge. Here's what it does:

  1. Session identification: Generate a deterministic ID from the transcript content so we can detect duplicates and updates
  2. LLM extraction: Send the transcript to an LLM with a carefully crafted prompt that extracts decisions, learnings, participants, and topics
  3. Response processing: Parse the structured JSON that the LLM returns
  4. Deduplication: Check if we’ve already processed this session, and if so, only process new content
  5. RDF transformation: Convert the structured JSON into semantic triples
  6. Storage: Write the triples to Oxigraph

The prompt engineering was critical, but didn’t take a lot of time. The LLM needs to understand what counts as a “decision” versus a “learning,” how to identify participants, and how to extract meaningful topics without being too granular or too vague. Claude one-shotted it.
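
To make that concrete, here’s a minimal sketch of how such a service might hang together. Everything except the DistillTranscript name is a hypothetical stand-in, not the actual Nexus source:

require "digest"

# Sketch only: KnowledgeDeduplicator, LlmExtractor, RdfTransformer, and
# OxigraphClient are illustrative collaborator names.
class DistillTranscript
  def self.call(transcript:, source:)
    # 1. Deterministic session ID so duplicates and updates can be detected
    session_id = Digest::SHA256.hexdigest(transcript)

    # 2-4. Only distill content we haven't already processed for this session
    new_content = KnowledgeDeduplicator.unprocessed_content(session_id, transcript)
    return if new_content.blank?

    knowledge = LlmExtractor.extract(new_content) # parsed JSON: decisions, learnings, people, topics

    # 5-6. Convert the structured JSON into triples and write them to the graph
    triples = RdfTransformer.to_triples(session_id, knowledge, source: source)
    OxigraphClient.insert(triples)
  end
end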

The RDF Schema

Claude designed a custom ontology for organizational knowledge:

@prefix nx: <https://nexus.zar.app/ontology#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> . # SKOS vocabulary

nx:Session a rdfs:Class ;
  rdfs:label "Session" ;
  rdfs:comment "A conversation or transcript session from any source" .

nx:Decision a rdfs:Class ;
  rdfs:label "Decision" ;
  rdfs:comment "An architectural, strategic, or implementation decision" .

nx:Learning a rdfs:Class ;
  rdfs:label "Learning" ;
  rdfs:comment "An insight, lesson learned, or piece of knowledge discovered" .

Every session, decision, and learning becomes a node in the graph with typed relationships. A decision nx:madeIn a session. A session nx:hasTopic concepts. A person nx:proposedDecision a decision. The graph structure enables queries that would be impossible with traditional relational storage.
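
For illustration, here’s roughly how those relationships could be emitted with the Ruby rdf gem, assuming triples get built in Ruby before being written to Oxigraph (the URIs and IDs here are made up):

require "rdf"

# Sketch of the graph shape described above, not actual Nexus code.
NX = RDF::Vocabulary.new("https://nexus.zar.app/ontology#")

session  = RDF::URI("https://nexus.zar.app/sessions/abc123")
decision = RDF::URI("https://nexus.zar.app/decisions/def456")
person   = RDF::URI("https://nexus.zar.app/people/sarah")
concept  = RDF::URI("https://nexus.zar.app/concepts/authentication")

graph = RDF::Graph.new
graph << [decision, RDF.type, NX.Decision]          # typed node
graph << [decision, NX.madeIn, session]             # a decision nx:madeIn a session
graph << [session,  NX.hasTopic, concept]           # a session nx:hasTopic concepts
graph << [person,   NX.proposedDecision, decision]  # a person nx:proposedDecision a decision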

The Claude Code Hooks

One of the most useful features landed on Day 1 and let me dogfood from that moment on: automatic transcript capture. Claude Code supports hooks that fire at various lifecycle points. I built a Stop hook that:

  1. Captures the full conversation transcript from the session
  2. POSTs it to a Nexus API endpoint
  3. Handles authentication automatically (more on this later)

The hook is a simple shell script:

#!/bin/bash
# Capture Claude Code session transcript and send to Nexus

TRANSCRIPT=$(cat "$CLAUDE_TRANSCRIPT_FILE")
SESSION_ID="$CLAUDE_SESSION_ID"

curl -X POST "$NEXUS_URL/transcripts/ingest" \
  -H "Authorization: Bearer $NEXUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"content\": $(echo "$TRANSCRIPT" | jq -Rs .),
    \"source\": \"claude_code\",
    \"session_id\": \"$SESSION_ID\",
    \"project\": \"$CLAUDE_PROJECT_DIR\"
  }"

Every Claude Code session now automatically becomes organizational memory. No manual effort required. The transcript gets distilled, decisions and learnings get extracted, and everything becomes queryable.

Conversational Knowledge Queries

This one was fun. Instead of requiring users to write SPARQL (which, let’s be honest, nobody wants to do), I built a conversational interface. You ask a question in plain English, and an LLM translates it into SPARQL, executes the query, and explains the results.

The KnowledgeQueryAssistant service orchestrates this:

  1. Question analysis: Understand what the user is asking for
  2. Schema awareness: Know what entity types and properties exist in the graph
  3. SPARQL generation: Translate the natural language question into a valid query
  4. Execution: Run the query against Oxigraph
  5. Result explanation: Present the results in human-readable form

It’s genuinely useful. “What decisions did we make about authentication?” becomes:

PREFIX nx: <https://nexus.zar.app/ontology#>
SELECT ?decision ?title ?description ?rationale WHERE {
  ?decision a nx:Decision ;
    nx:title ?title ;
    nx:description ?description .
  OPTIONAL { ?decision nx:rationale ?rationale }
  FILTER(CONTAINS(LCASE(?title), "authentication") ||
         CONTAINS(LCASE(?description), "authentication"))
}

The query returns matching triples, and the assistant presents them conversationally: “I found 3 decisions related to authentication. The most recent was ‘Use GitHub OAuth for user authentication,’ made yesterday…”
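
Under the hood, the orchestration can be surprisingly thin. A rough sketch, with hypothetical collaborator names:

class KnowledgeQueryAssistant
  def answer(question)
    schema  = OntologyDescriber.summary                       # what types/properties exist
    sparql  = LlmClient.generate_sparql(question, schema: schema)
    results = OxigraphClient.query(sparql)                    # raw bindings from the graph
    LlmClient.explain(question: question, results: results)   # human-readable answer
  end
end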


The Web UI

Even on Day 1, I wanted a browsable interface. The initial UI was already pretty full-featured:

  • Sessions list: See all ingested transcripts with their generated titles and summaries
  • Decisions view: Browse all extracted decisions with their rationale
  • Learnings view: Browse all extracted learnings
  • Inquiries: For demonstration purposes and to validate functionality

That first day established the core architecture. Transcript in, LLM distillation, RDF storage, queryable knowledge. The foundation was solid. Everything after was iteration and enhancement.

Day 2: December 30th — Refinements and the Delta Problem

Commits: 5
Theme: Making it actually work in production

Day two was about confronting reality. The initial system worked beautifully for short transcripts. But Claude Code sessions can run for hours. Transcripts grow to thousands of lines. And the way I’d built the system, every time a transcript was submitted, it re-processed the entire thing.

That’s fine for demos. It’s not fine when you’re paying for LLM tokens and waiting for responses.

Enter delta distillation.

The insight was simple: track how much of each transcript we’ve already processed, and only distill the new content. But implementing it required rethinking the entire pipeline.

The Delta Algorithm

The new approach:

  1. Content offset tracking: Store the character offset of how much we’ve processed for each session
  2. Delta extraction: When a transcript update arrives, extract only the content after the last offset
  3. Incremental distillation: Send only the new content to the LLM
  4. Append-only storage: New decisions and learnings get added to the existing session, not replaced
  5. Metadata preservation: Session title and summary come from the first distillation only
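
In Ruby terms, the delta step boils down to something like this (a sketch; the real implementation tracks more state):

# Sketch: only content past the stored offset gets distilled.
class DeltaExtractor
  def self.extract(session, transcript)
    offset = session.processed_offset || 0
    return nil if transcript.length <= offset # nothing new since last run

    delta = transcript[offset..]
    session.update!(processed_offset: transcript.length)
    delta
  end
end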


This is the kind of architectural decision that separates production software from demos. It required extracting new service objects (KnowledgeDeduplicator, KnowledgeQuery), rethinking how sessions were identified, and ensuring idempotent behavior when the same transcript got submitted multiple times with different lengths.

The TDD discipline proved essential here. Refactoring from full-transcript processing to delta-based processing could have introduced subtle bugs at every seam. But because we had comprehensive specs for the original behavior, I could refactor confidently. Change the internals, run the specs, verify everything still works. Then add new specs for the delta-specific behavior. The red-green-refactor cycle made a potentially dangerous architectural change feel safe.

This is exactly the pattern I’d internalized from decades of extreme programming practice: tests dissolve fear. When I encounter new code or need to make substantial changes, I don’t tiptoe around it worrying about breaking things. I know I can move in small steps, keep the system passing its tests at almost all times, try a refactoring and back it out if it doesn’t feel right. That safety net isn’t just technical. It changes how willing you are to explore.

By the way, introducing the delta algorithm and the entire refactoring I described above took about 30 minutes.

Killing the Action Items Feature

I also learned something important about my own system on Day 2: Action Items were noise.

The initial design extracted three types of knowledge: decisions, learnings, and action items. It seemed logical. Surely transcripts contain action items that people need to track?

The LLM was dutifully extracting them. “Need to update the database schema.” “Should add tests for the authentication flow.” “Remember to update the documentation.” Dozens of action items from each session.

The problem? They were almost always stale by the time anyone saw them. Claude Code sessions involve immediate implementation. By the time Nexus distilled “need to update the database schema,” the database schema was already updated. The action item was historical noise, not useful information.

Decisions and learnings persist; action items expire. I ripped out the entire feature: 11 files changed, 8 insertions(+), 132 deletions(-)

More lines deleted than added. A good sign. The willingness to remove features that don’t work is important. It’s easy to keep accumulating functionality. It’s harder to admit something isn’t useful and cut it.

Minor Fixes

The other commits were bug fixes and refinements:

  • User tracking: Properly associate sessions with the API user who submitted them
  • URI domain corrections: Claude had hallucinated zar.com into my ontology instead of zar.app in several places. An easy fix, including some rake tasks to repair existing data.
  • Session deduplication edge cases: Handling cases we discovered where two sessions have identical content but different metadata, and similar oddities.

In other words, normal software development stuff, just happening at 50x speed.

Day 3: December 31st — The Explosion

Commits: 18
Lines added: ~3,000+
Theme: Production-ready features

New Year’s Eve. Most people are planning their parties or getting drunk. I’m in the zone. My first all-hands company meeting as CTO was tomorrow and I wanted to demo my work, so it’s go time.

This day was absolutely packed. Eighteen commits. Multiple major features. Let me break it down by capability.

RESTful Knowledge API

First up: proper REST endpoints for everything. The initial system had basic views, but nothing approaching a real API. Day 3 changed that.

Sessions, decisions, and learnings all got their own controllers with full RESTful endpoints:

GET  /sessions          # List all sessions
GET  /sessions/:id      # Show session details
GET  /decisions         # List all decisions
GET  /decisions/:id     # Show decision details
GET  /learnings         # List all learnings
GET  /learnings/:id     # Show learning details

Each endpoint supports multiple formats:

  • HTML: For human browsing
  • JSON: For API consumers
  • TXT: For LLM consumption


The decisions index showing distilled decisions with titles, descriptions, and rationale.

The text format endpoints deserve special mention. When an AI agent requests /decisions.txt, it gets a clean, token-efficient representation:

# Decision: Use PostgreSQL with pgvector for semantic search
Rationale: Keeps vector storage in the same database as other data, simplifying the stack.
Session: 9cafca33-cb1c-49c4-8696-d8be97871356
Date: 2026-01-01

# Decision: Implement delta distillation
Rationale: Full re-processing is too expensive for long transcripts.
Session: bf6dd8f-1f2b-4f8c-8c81-c3551f2fb368
Date: 2025-12-30

No HTML cruft. No JavaScript. No navigation chrome. Just the knowledge, formatted for machine consumption. This is the kind of thing that matters when you’re building for a world where agents consume APIs.
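
In Rails, serving all three formats from a single action is just a respond_to block. A sketch of what one of these controllers might look like (the KnowledgeQuery and DecisionPresenter names are hypothetical):

class DecisionsController < ApplicationController
  def index
    @decisions = KnowledgeQuery.decisions # wraps the SPARQL lookup

    respond_to do |format|
      format.html                                                         # browsable view for humans
      format.json { render json: @decisions }                             # API consumers
      format.text { render plain: DecisionPresenter.to_text(@decisions) } # LLM-friendly
    end
  end
end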

GitHub OAuth + API Key System

This system will have precious knowledge stored. I can’t deploy it without authentication, so that was next. I went with GitHub OAuth because literally everyone at my company has a GitHub account. No signup friction, no password management, no email verification flows.

The most challenging part was the device authorization flow for CLI tools.

Claude Code hooks need to authenticate somehow, but they run in a terminal with no browser. I couldn’t find a way to share authentication state with an installed MCP server. Solution: a streamlined device flow that automatically opens your browser.

The flow works like this:

  1. Hook calls POST /device/authorize
  2. Server returns a device code and verification URL
  3. Hook automatically opens your browser to the verification page (using open on macOS or xdg-open on Linux)
  4. You see the Nexus authorization page and click to verify
  5. If not already logged in, you’re redirected to GitHub OAuth
  6. Meanwhile, the hook polls /device/token in the background
  7. Once you authorize, the server returns an API token
  8. Hook saves the token to ~/.config/nexus/api_key.<hostname> with secure permissions

The UX is seamless: on first run, your browser pops open, you click authorize, and you’re done. Future sessions authenticate automatically using the cached token.
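
Server-side, the polling endpoint the hook hits is conceptually simple. A hedged sketch (the DeviceAuthorization model exists in Nexus; its columns and methods here are guesses):

class DeviceTokensController < ApplicationController
  # The hook polls this until the user approves the device in the browser.
  def create
    auth = DeviceAuthorization.find_by!(device_code: params[:device_code])

    if auth.approved?
      render json: { api_key: auth.issue_api_key! }
    else
      render json: { error: "authorization_pending" }, status: :accepted
    end
  end
end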

Deployment Infrastructure

I wanted this running in production ASAP, not just localhost. I picked Render as my target platform since I know it inside and out, but Claude says that I took an unconventional approach: a single Docker container that bundles everything.

The container includes:

  • PostgreSQL 17 with pgvector: Embedded database, not a managed service
  • Oxigraph: The RDF triple store, also embedded
  • Rails + Puma: The web application
  • SolidQueue worker: Background job processing

All four processes are managed by Overmind, a Procfile-based process manager. One container, one persistent disk mount at /data, everything self-contained:

postgres: su postgres -c '/usr/lib/postgresql/*/bin/postgres -D /data/postgresql'
oxigraph: oxigraph serve --location /data/oxigraph --bind 127.0.0.1:7878
web: bundle exec puma -p 3000 -b tcp://0.0.0.0
worker: bundle exec rake solid_queue:start

Why bundle everything? Simplicity. No coordinating multiple services. No managed database pricing. No network latency between app and database. For a side project that might scale to a small team, this is perfect. If it needs to scale beyond that, breaking out the database is straightforward.

Is a single container deployment really unconventional? I don’t think it should be.

The commit also added:

  • GitHub Actions workflow: Automated deployment on push to main
  • CI pipeline: RSpec tests running before deploy, blocking bad commits

name: Deploy to Render
on:
  push:
    branches: [main]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: bundle exec rspec
  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to Render
        uses: johnbeynon/render-deploy-action@v0.0.8
        with:
          service-id: ${{ secrets.RENDER_SERVICE_ID }}
          api-key: ${{ secrets.RENDER_API_KEY }}

By end of day, commits to main were automatically tested, built into Docker images, and deployed to production. From idea to deployed feature in minutes, not days.

The needs: test line is critical. No deployment happens unless the test suite passes. And because we'd been doing TDD from the start, that test suite was substantial.

By Day 3, we had specs covering:

  • The full distillation pipeline with various transcript formats
  • RDF transformation and SPARQL query execution
  • Deduplication logic and delta processing
  • Authentication flows for both web and API
  • The conversational query assistant
  • Edge cases for identity resolution

The TDD discipline was paying dividends in letting me practice continuous deployment with confidence.
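
To give a flavor of that suite, a spec for the delta logic sketched earlier might look like this (illustrative; the real specs surely differ):

RSpec.describe DeltaExtractor do
  it "returns only the content past the stored offset" do
    session = Session.create!(processed_offset: 8)

    delta = described_class.extract(session, "OLD DATA and the new stuff")

    expect(delta).to eq(" and the new stuff")
    expect(session.reload.processed_offset).to eq(27)
  end

  it "returns nil when nothing new has arrived" do
    session = Session.create!(processed_offset: 8)

    expect(described_class.extract(session, "OLD DATA")).to be_nil
  end
end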

Day 4: January 1st — New Year’s Day, New Capabilities

Commits: 29
PRs Merged: 12
Theme: Making it intelligent

Happy New Year! While normal people were recovering from celebrations, I was building an MCP server. Looking back at the commit log for January 1st, I honestly don’t know how I fit it all in.

The Ontology Browser

RDF is powerful because it’s self-describing. The schema itself is data. You can query the ontology just like you query the instances.

I wanted that exposed through the UI, so users (and agents) could explore what types of entities exist, what properties they have, and how to query them.

PR #19 delivered the ontology browser:


The ontology browser showing all 10 entity types with property counts and entity totals.

Visual grid of entity types: Session, Decision, Learning, Conflict, Person, Agent, Project, Concept, User, ExternalResource. Each card shows the type name, description, property count, and instance count.

Detail views: Click on any type to see its full property list, relationships to other types, and example SPARQL queries.


Downloadable Turtle format: Export the full ontology definition for use in other tools.

JSON API endpoints: So agents can programmatically explore the schema before querying.

This might sound minor, but it’s important for discoverability. When you land on a new knowledge system, you need to understand its shape. What kinds of things are stored? How are they related? What can you query for? The ontology browser provides that.

Identity Resolution

One of the big PRs of the day tackled a fundamental problem: who is involved in these sessions?

When a transcript mentions “Sarah made the decision to use PostgreSQL,” the system should understand that Sarah is a real person. It should link her to other sessions she’s participated in. It should enable queries like “What decisions has Sarah been involved in?”

This required multiple components:

New RDF entity types: nx:Person for humans, nx:Agent for AI assistants. They're both participants, but they need different handling. You want to track what decisions Sarah has made. You probably don't need to track what decisions Claude has "made" (though it's interesting for analysis).

IdentityResolver service: Takes a name or email from a transcript and resolves it to a canonical Person record. Handles variations: “Sarah,” “Sarah Chen,” “sarah@company.com” should all resolve to the same person. Uses fuzzy matching and email-based identification.

Attribution tracking: The distillation prompt now extracts not just “participants” but specific attributions: who proposed this decision, who discovered this learning, who contributed to the discussion.

People UI: A /people endpoint to browse and merge identities. Sometimes the system creates duplicate Person records that need manual consolidation.

The identity resolver is smart about AI names too. “Claude,” “Claude Code,” “Assistant,” “GPT-4,” “Copilot” all get correctly classified as agents, not people. This prevents accidentally creating Person records for AI assistants.
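
A simplified sketch of the resolution logic (the fuzzy_match scope is a hypothetical stand-in for the real matching code):

class IdentityResolver
  AGENT_NAMES = ["claude", "claude code", "assistant", "gpt-4", "copilot"].freeze

  def resolve(name_or_email)
    input = name_or_email.strip

    # AI assistants become nx:Agent, never nx:Person
    return Agent.find_or_create_by!(name: input) if AGENT_NAMES.include?(input.downcase)

    if input.include?("@")
      Person.find_or_create_by!(email: input.downcase) # email is the strongest identifier
    else
      Person.fuzzy_match(input) || Person.create!(name: input)
    end
  end
end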

Around this time I started realizing that a lot of my entities were going to look similar. Maybe I could feed similar existing entities into the knowledge distillation worker’s context right off the bat? Or do some post-processing later? Either way, I would need embeddings.

Semantic Search with pgvector

SPARQL is powerful but exact. You query for specific predicates and values. You get back things that match precisely.

What if you want conceptual similarity? “Find knowledge related to authentication” should surface:

  • Decisions about OAuth
  • Discussions about API keys
  • Learnings about session management
  • Security-related conflicts

Even if none of them literally contain the word “authentication.”

PR #21 added vector similarity search:

pgvector extension: PostgreSQL with the pgvector extension can store and query high-dimensional vectors. No separate vector database needed.

Embedding generation: Using Gemini’s embedding model (via OpenRouter), each decision and learning gets converted to a 768-dimensional vector that captures its semantic meaning.

RdfEntity model: A PostgreSQL table that mirrors entities from the RDF store and adds vector embeddings. When knowledge gets distilled, each entity is queued for embedding generation.

SemanticSearch service: Takes a natural language query, embeds it, and finds the closest matches in vector space.

class SemanticSearch
  def search(query, limit: 10)
    query_embedding = EmbeddingService.embed(query)

    RdfEntity
      .nearest_neighbors(:embedding, query_embedding, distance: "cosine")
      .limit(limit)
      .map { |entity| enrich_with_rdf_data(entity) }
  end
end

When knowledge gets distilled, each decision and learning is automatically queued for embedding generation. The graph stays in sync with the vector index.
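
For reference, the Rails side of this is mostly a migration plus a one-line model declaration, assuming the neighbor gem is what provides the nearest_neighbors scope above:

# Migration sketch (neighbor gem + pgvector)
class AddEmbeddingToRdfEntities < ActiveRecord::Migration[8.0]
  def change
    enable_extension "vector" # pgvector
    add_column :rdf_entities, :embedding, :vector, limit: 768
    add_index :rdf_entities, :embedding, using: :hnsw, opclass: :vector_cosine_ops
  end
end

# app/models/rdf_entity.rb
class RdfEntity < ApplicationRecord
  has_neighbors :embedding
end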

Conflict Detection

Knowledge systems accumulate contradictions. This is inevitable. Two sessions might record opposing decisions:

  • Session A: “Decided to use REST APIs for the mobile client”
  • Session B: “Decided to use GraphQL for the mobile client”

Or learnings that directly contradict each other:

  • Learning 1: “Caching improved performance by 40%”
  • Learning 2: “Caching caused consistency issues and was removed”

Rather than silently harboring inconsistency, Nexus now surfaces these explicitly.

PR #22 added the Conflict entity type:

Async conflict detection: After each distillation, a background job scans for potential conflicts. It uses embeddings to find semantically similar items, then prompts an LLM to assess whether they actually conflict.

Status tracking: Conflicts can be open, investigating, or resolved. Each status change gets tracked with timestamps.

Priority levels: Not all conflicts are equal. A contradiction between architectural decisions is more important than conflicting preferences about code style.
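
Put together, the detection job might look something like this sketch (LlmClient.assess_conflict and the Conflict associations are hypothetical):

class DetectConflictsJob < ApplicationJob
  CANDIDATES = 5

  def perform(entity_id)
    entity = RdfEntity.find(entity_id)

    # Embeddings narrow the field; the LLM makes the actual judgment call.
    neighbors = entity.nearest_neighbors(:embedding, distance: "cosine").first(CANDIDATES)

    neighbors.each do |candidate|
      verdict = LlmClient.assess_conflict(entity, candidate)
      Conflict.create!(entities: [entity, candidate], status: "open") if verdict.conflict?
    end
  end
end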

Entity Browser

The ontology browser shows the schema. But what about the actual data? You need to be able to browse and search the instances themselves.


The entity browser with type filters and semantic search.

PR #25 and #27 added entity browsing:

/entities endpoint: Browse all entities with type filtering. Show me all Decisions. Show me all Learnings. Show me all Conflicts.

Search: Both exact text search and semantic similarity search. Find entities that mention “authentication” literally, or find entities conceptually related to authentication.

Individual entity views: Click through to see all properties and relationships for any entity.

Counts in ontology view: The ontology browser now shows how many instances of each type exist.

The MCP Server

MCP (Model Context Protocol) is Anthropic’s standard for exposing tools to AI agents. By embedding an MCP server in Nexus, any AI agent can directly query the organizational knowledge base.

Think about what this enables. Before starting to implement a feature, Claude can be configured to check Nexus: “Has this team made any decisions about authentication patterns?” Before proposing an architectural change, it can query: “What learnings do we have about caching in this codebase?”

PR #30 implemented the MCP server with five tools:

nexus_ontology: Get the full schema with types, properties, and example queries. This is the starting point for exploration.

nexus_type: Get detailed information about a specific entity type. “Tell me about Decision entities.”

nexus_recent: Fetch recent sessions, decisions, learnings, or conflicts. “What were the last 10 decisions?”

nexus_query: Execute arbitrary SPARQL queries. For agents that know what they’re looking for.

nexus_search: Semantic vector similarity search. “Find knowledge related to database performance.”

The tools follow HATEOAS-style progressive discovery. Each response includes hints about what to query next. An agent can start with nexus_ontology, understand the schema, then drill into specific types, then query for specific instances.

Agent: nexus_ontology()
→ Returns: all types with descriptions and example queries
Hint: "Use nexus_type('Decision') for detailed Decision properties"

Agent: nexus_type(type_name: "Decision")
→ Returns: properties, relationships, example query
Hint: "Use nexus_recent(type: 'Decision') to see recent decisions"

Agent: nexus_recent(type: "Decision", limit: 5)
→ Returns: 5 most recent decisions with full details
Hint: "Use nexus_query() with SPARQL for more specific queries"

The MCP server gives them read access to everything Nexus knows. The organizational memory is no longer just for humans browsing a web UI; it’s infrastructure that agents consume, which was the goal from the start.

Twenty-nine commits. Twelve merged PRs. On January 1st. I still can’t quite believe it.

Days 5–6: January 2nd–3rd — Opening the Floodgates

Commits: ~10
Theme: Universal ingestion

I successfully demoed the system at our Friday all-hands company meeting. People are psyched. I’m psyched. The system was powerful, but it only captured Claude Code sessions. What about all those other knowledge sources I mentioned at the start? GitHub PR discussions. Slack threads. Linear issues. What about non-engineers? The vision was passive ingestion from everywhere, right? Time for some magic.

Want to become the type of person who writes a Universal Webhook Processor and calls it a day? First step: buy my book on Amazon or Leanpub (where it’s available in 31 languages).

Universal Webhook Processor

PR #33 was the key unlock. Instead of building bespoke integrations for every SaaS platform, I built a universal webhook endpoint that uses AI to understand and transform any JSON payload:

POST /webhooks/:source

Point GitHub webhooks here. Linear webhooks here. Slack event subscriptions here. Notion webhooks here. Any service that can POST JSON can become a knowledge source.

The WebhookProcessor service:

  1. Receives the raw JSON payload: No assumptions about structure
  2. LLM analysis: Uses Gemini 3 Flash to understand what kind of webhook this is and extract meaningful content
  3. Transcript transformation: Converts the webhook into a “transcript” format that the existing distillation pipeline can process as if it had come from Claude Code.
  4. Job queuing: Enqueues a distillation job for background processing

The beautiful part is graceful degradation:

  • Slack: Gets optimized handling with a specific extractor that knows where to find the relevant content. (See next section.)
  • Everything else: The LLM analyzes the JSON structure and makes its best effort to extract meaningful content

Some random internal tool sends webhooks? As long as there’s meaningful text content somewhere in the payload, Nexus will try to distill knowledge from it.
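
The processor itself can stay small because the LLM does the heavy lifting. A sketch (method names are illustrative):

class WebhookProcessor
  def initialize(source:, payload:)
    @source  = source
    @payload = JSON.parse(payload)
  end

  def process
    # Let the LLM decide what kind of webhook this is and what text matters
    transcript = LlmClient.webhook_to_transcript(@source, @payload)
    return if transcript.blank? # nothing distillable in this payload

    # Hand off to the same pipeline that handles Claude Code sessions
    DistillTranscriptJob.perform_later(transcript: transcript, source: @source)
  end
end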

Slack Integration

Slack deserved deeper integration than just webhooks. It’s where so much organizational discussion happens.

PR #34 added:

Slack Events API handler: Real-time event processing for messages, reactions, and thread updates.

SlackThreadCollector: Fetches and formats entire threads with full context. When a thread gets distilled, we capture the whole conversation, not just the triggering message.

Threshold filtering: Not every Slack message is worth preserving. The defaults require at least 3 replies or 2 participants before auto-ingestion. We also filter out casual acknowledgments (“sounds good,” “ok,” “lol,” etc.) while capturing substantive discussions.

ID resolution: Slack uses internal IDs like U09RS4298TY for users and C01234ABCDE for channels. The LLM now has a tool to resolve these to human-readable names. "U09RS4298TY said..." becomes "Obie Fernandez said..."

The threshold filtering is important. Slack generates enormous amounts of content. Without filtering, you’d be paying to distill “sounds good” and “thanks!” a thousand times a day.
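
The filter itself can be dumb and cheap, running before any LLM call. Something like this sketch, using the defaults mentioned above:

NOISE = /\A(sounds good|thanks?!*|ok(ay)?|lol|\+1)\z/i

def worth_ingesting?(thread)
  # Skip threads that are nothing but casual acknowledgments
  return false if thread.messages.all? { |m| m.text.strip.match?(NOISE) }

  thread.reply_count >= 3 || thread.participant_count >= 2
end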

Metadata as RDF Properties

The final architectural piece: when sessions originate from webhooks, their metadata becomes queryable.

PR #44 added new RDF properties for webhook-originated sessions:

nx:sourceUrl: A link back to the original resource. For a GitHub PR discussion, this links to the PR. For a Slack thread, it links to the thread in Slack.

nx:resourceType: What kind of external resource this came from: pull_request, issue, slack_thread, linear_issue, etc.

nx:eventType: What triggered the session: pr_opened, pr_reviewed, issue_commented, message_posted.

nx:significance: Impact categorization: low, medium, high. Webhook processors can assess significance based on factors like number of participants, length of discussion, or explicit markers.

Now you can query:

PREFIX nx: <https://nexus.zar.app/ontology#>
SELECT ?session ?title ?url WHERE {
  ?session a nx:Session ;
    nx:resourceType "pull_request" ;
    nx:significance "high" ;
    nx:title ?title ;
    nx:sourceUrl ?url .
}

“Show me all high-significance decisions from GitHub PRs in the last month.” The RDF graph knows where knowledge came from, not just what it contains.

The Numbers

Despite the detail in this now very long blog post, I still feel like I’m only scratching the surface of the work that was involved.

Code

  • ~12,800 lines of code across app, lib, and spec directories
  • 20+ controllers handling various endpoints
  • 10+ models including Person, RdfEntity, Inquiry, DeviceAuthorization
  • 15+ services for distillation, transformation, search, conflict detection, identity resolution
  • A full-featured first draft of a Nexus ontology with 10 unique types and 23 properties
  • Comprehensive test coverage with RSpec (247 examples)
  • Working integrations with Claude Code, GitHub, Slack, and Linear

Infrastructure

  • Full deployment infrastructure on Render with CI/CD
  • Production-ready authentication with GitHub OAuth and API keys
  • MCP server for agent integration
  • Universal webhook ingestion for any SaaS platform

Timeline

December 29th to January 3rd. Call it four working days. Here’s a full list of the PRs.


I worked directly in main for the first couple of days, but I wish I hadn’t done that.

What This Means

I’ve built a lot of software in my career. I’ve owned and run substantial software delivery consulting organizations. I know how long projects typically take and I know what they cost. I know the difference between “working demo” and “production system.”

What Claude Code enabled me to do this week is not incremental improvement. It’s categorical change.

A system like Nexus, built by a solo developer in the “old” way, would take months. Each of the following bullet points would take at least a week or two, assuming a motivated engineer with little in the way of distractions.

  • Research and spike the RDF/SPARQL approach
  • Design the schema and ontology
  • Build the ingestion pipeline
  • Implement distillation with prompt engineering
  • Add authentication and authorization
  • Build the web UI
  • Set up deployment infrastructure
  • Add semantic search with vector embeddings
  • Build the MCP server
  • Test everything thoroughly
  • Handle the hundred edge cases that emerge

That’s at least 13 weeks of focused development. Call it 4–5 months. And that’s with an experienced developer who knows the technologies involved. This would not be a viable side project for a typical engineer who has other deliverable feature work. For a busy CTO? Yeah, right.

Yet I did it in the time between Christmas and New Year’s, and still had time for regular work, walking my dog, and holiday activities. The leverage is absurd. (Admittedly some of the work was done on Claude Code on my phone at the barber shop.)

It’s not about replacing experienced developers. My architectural judgment, domain knowledge, and product intuition were essential every step of the way. Claude Code couldn’t have built Nexus alone. It’s very likely that you, dear reader, cannot build Nexus in a few days if you tried, even with Claude Code’s help. An AI coding agent doesn’t automatically understand my company’s needs, our existing infrastructure, our preferences for certain patterns. But I couldn’t have built it this fast alone either.

Never. Would. Have. Happened.

The Development Experience

Productivity measures don’t capture the qualitative difference involved in working this way. Traditional development has a certain rhythm. You think about what you want to build. You start typing. You hit issues, debug them, refer to documentation, search Stack Overflow. You write tests. You refactor. Each step takes time, and there’s cognitive overhead in context-switching between “thinking about the problem” and “mechanically implementing the solution.”

AI-assisted development collapses that overhead. I stay in “thinking about the problem” mode almost the entire time. When I need something implemented, I describe it. When I need documentation, I ask. When I hit an issue, I describe the symptoms and get debugging suggestions.

Actually, when I happen to think of something new that’s outside the current workstream, you know what I do? I used to log an issue. Now I use the & prefix in Claude Code to just fire up a remote instance and have it start working on whatever I thought of.

I’ve been reflecting on this shift in What Happens When the Coding Becomes the Least Interesting Part of the Work, which recently went mega viral. For someone at my experience level, the actual typing of code has long since stopped being the thing that teaches me anything. The intellectual work happens before the first line is written: understanding the problem space, recognizing patterns from decades of experience, making judgment calls about abstraction levels, assessing blast radius of changes, feeling out whether something should be general or specific.

That’s the “senior thinking” that AI doesn’t replace. What AI does replace is the mechanical translation of those decisions into working code. And honestly? That translation was always the boring part.

The experience is closer to pair programming with a very capable, very fast colleague who never gets tired and knows every API by heart. I’m still making all the decisions. I’m still responsible for the architecture. But the mechanical parts happen at conversation speed instead of typing speed.

Conclusion

Enterprise software vendors charge six and seven figures for knowledge management systems. They’re built by teams of dozens over years. They require consultants to implement and months to configure.

I built equivalent functionality in four days. While doing my day job as a CTO. Yes, I’m exceptional, but so what? I can only do it today because the tools have changed so dramatically.

If you’re a developer who hasn’t tried Claude Code yet, you’re working with one hand tied behind your back. The productivity multiplier is real. The creative leverage is real. The ability to turn ideas into working systems in days instead of months is real.

But bring your discipline with you. Bring your tests. Bring your judgment. The AI provides velocity. You provide direction. Together, you build things that neither could build alone.

Live in EMEA and want to work on Nexus and AI agents with me? I’m currently hiring senior product engineering talent with Ruby on Rails expertise at ZAR, where we’re building the future of global stablecoin adoption.