A practical walkthrough of running 6 specialized AI agents that talk to each other, share memory, and follow a structured development workflow — all from the terminal.
The Problem
I’m building a complex software project. One AI assistant isn’t enough — not because it lacks intelligence, but because it lacks structure. A single long conversation mixes code writing, reviewing, testing, and documentation into one messy stream. Context gets lost. Quality slips.
What if, instead of one assistant doing everything, I had a team? A developer, a reviewer, a tester, a documentation writer — each with clear responsibilities, strict permissions, and a shared memory?
That’s exactly what I built.
The Setup
The entire system runs on Claude Code (Anthropic’s CLI tool) with multiple instances running in tmux sessions on a Linux machine. Each agent is a separate Claude Code process with its own working directory, its own rules file (CLAUDE.md), and its own permission set.
Here’s the team:
- neo: Project manager. Routes tasks, enforces workflow, handles git operations.
- dev: Developer. Writes code. Can only write in src/ and config/.
- doc: Documentation. Maintains official docs. Can only write in Docs/.
- rev: Reviewer. Read-only access to source code. Writes review reports.
- val: Validator. Runs tests, validates results.
- admin: Infrastructure admin. Maintains the communication and memory systems.
Each agent has a binding permission matrix — a table that specifies exactly which directories it can read, write, modify, or delete. The developer cannot touch documentation. The reviewer cannot modify source code. The orchestrator gets emergency-only access when an agent is offline.
This isn’t just organization. It’s enforcement. Agents literally cannot cross boundaries.
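As a sketch, the enforcement boils down to a path-prefix check per agent. This is an illustration only: the real matrix lives in each agent's CLAUDE.md and Claude Code permission settings, and the reviewer's report directory below is my assumption.

```python
# Hypothetical sketch of the binding permission matrix as a
# path-prefix lookup. The real enforcement happens in Claude Code's
# permission settings; the paths mirror the roles described above.
WRITE_SCOPES = {
    "dev": ("src/", "config/"),   # developer: source and config only
    "doc": ("Docs/",),            # doc writer: documentation only
    "rev": ("reviews/",),         # assumption: where review reports land
}

def can_write(agent: str, path: str) -> bool:
    """True if the agent is allowed to write at this path."""
    return any(path.startswith(p) for p in WRITE_SCOPES.get(agent, ()))

assert can_write("dev", "src/main.py")       # developer edits source
assert not can_write("dev", "Docs/adr.md")   # but cannot touch docs
assert not can_write("rev", "src/main.py")   # reviewer is read-only on code
```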
The Glue: Two MCP Servers
Agents running in isolated tmux sessions need two things: a way to talk to each other and a way to remember things across sessions. That’s where MCP comes in.
MCP (Model Context Protocol) is an open standard by Anthropic that lets external tools expose capabilities to AI models. Think of it as a USB port for AI: any tool that speaks MCP can plug into Claude Code and provide new functions (called “tools”) that the agent can call. You write a server that implements the protocol, and every connected agent gets access to your tools automatically.
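Conceptually, an MCP server is a named registry of callable tools behind a JSON-RPC transport. The toy stand-in below shows only that shape; it is not the real MCP SDK, and the `send` tool is a made-up example.

```python
# Toy stand-in for an MCP server's tool registry (NOT the real SDK).
# A real server speaks JSON-RPC over stdio or HTTP; this only shows
# the core idea: named tools that any connected agent can invoke.
from typing import Callable

TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function as an invokable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def send(recipient: str, kind: str, body: str) -> str:
    """A broker-style example tool: queue a message for another agent."""
    return f"{kind} queued for {recipient}: {body}"

# From the agent's side, a tool call is a dispatch by name:
result = TOOLS["send"]("dev", "Req", "implement feature X")
```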
Every agent connects to two MCP servers:
MCP Broker: Agent-to-Agent Communication
The MCP Broker is a lightweight message-routing server backed by SQLite. It exposes a small set of MCP tools to every agent for sending messages, receiving messages, and registering on or off the network.
Communication follows a strict Req/Resp pattern. The orchestrator sends a Req to an agent. That agent does the work and sends back a Resp referencing the original request ID. No pleasantries, no acknowledgment loops — just content.
Every agent registers on startup (registry("dev", "logon")) and unregisters on exit. The MCP Broker tracks who's online, routes messages, and pings agents periodically to detect disconnections.
For anything beyond a short message, agents follow a file-based protocol: write the detailed content to a file in the appropriate folder, then send a broker message with just the file path. The broker message is a pointer. The file is the deliverable.
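A compressed sketch of that pattern, with a hypothetical SQLite schema (the article does not specify the broker's real table layout or file naming):

```python
# Sketch of the broker's Req/Resp pattern over SQLite. Schema, paths,
# and message shapes are hypothetical; the point is the pointer idea:
# the body carries a file path, not the deliverable itself.
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE messages (
    id        TEXT PRIMARY KEY,
    sender    TEXT,
    recipient TEXT,
    kind      TEXT,   -- 'Req' or 'Resp'
    ref       TEXT,   -- id of the Req that a Resp answers
    body      TEXT    -- short text, or a file path for large payloads
)""")

def send(sender, recipient, kind, body, ref=None):
    mid = str(uuid.uuid4())
    db.execute("INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
               (mid, sender, recipient, kind, ref, body))
    return mid

# Orchestrator sends a Req whose body is just a pointer to a file...
req_id = send("neo", "dev", "Req", "tasks/proposal_042.md")
# ...and the worker answers with a Resp referencing the original id.
send("dev", "neo", "Resp", "reports/impl_042.md", ref=req_id)

rows = db.execute("SELECT id, kind, ref FROM messages").fetchall()
```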
MCP Memory: Shared Knowledge Graph
This is where it gets interesting. The memory server wraps a Neo4j graph database and Voyage AI embeddings behind MCP tools for storing, linking, and semantically searching knowledge.
The knowledge graph builds up organically from the codebase. Here’s a simplified view of what the graph looks like for a typical Python project:
[Figure: a simplified knowledge graph for a Python project]
Entities are extracted from source code and documentation — classes, functions, modules, concepts. File contents are chunked and embedded for semantic search. Relations (DEFINED_IN, DEPENDS_ON, CALLS, CONTAINS) connect everything into a navigable graph.
When an agent needs to validate a proposal, it doesn’t just read files — it queries the knowledge graph to check if the proposed changes are consistent with the existing codebase. When the reviewer checks code, it can search for relevant architectural decisions (ADRs) stored in the graph.
The Hook System: Invisible RAG Injection
Here’s the part that makes the whole system feel seamless. Claude Code supports hooks — shell commands that fire on specific events. I use four hooks that run automatically on every agent, invisibly enriching every interaction with context from the knowledge graph.
Hook 1: user_prompt_submit — Context Injection
When: Every time anyone sends a message to any agent.
What happens:
- The hook sends the prompt text to the memory server via HTTP
- The server runs a hybrid 4-signal search: vector similarity (Voyage embeddings), fulltext (BM25), graph traversal, and Personalized PageRank
- Results are merged using Reciprocal Rank Fusion (RRF)
- The top relevant chunks are injected back into the prompt as a system reminder
The agent never explicitly asks for context. It just has it. When the developer reads a file, the RAG system has already injected related knowledge graph entities, relevant past conversations, and connected code modules into the context window.
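Reciprocal Rank Fusion itself fits in a few lines. Below is a generic sketch; k = 60 is the conventional constant from the original RRF formulation, and the memory server's actual weights and parameters are not something I know.

```python
# Reciprocal Rank Fusion: merge several ranked result lists into one.
# Each signal contributes 1 / (k + rank) per document; documents that
# rank well across multiple signals rise to the top.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Four signals, mirroring the hybrid search described above:
vector   = ["chunk_a", "chunk_b", "chunk_c"]
fulltext = ["chunk_b", "chunk_a", "chunk_d"]
graph    = ["chunk_b", "chunk_e"]
pagerank = ["chunk_a", "chunk_b"]

merged = rrf([vector, fulltext, graph, pagerank])
```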
Hook 2: post_tool_use — Silent Activity Tracking
When: Every time an agent reads a file, edits a file, or runs a command.
What happens: The hook fires in the background (fire-and-forget) and logs the operation to the knowledge graph. Which files were read, which were modified, which commands were executed — all tracked as File and Command nodes with relations to the active session.
This creates an organic activity graph. Over time, the system knows which files are most frequently accessed, which modules tend to be edited together, and what commands agents typically run.
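In spirit, the tracker folds a stream of tool events into access counts. The event shape below is hypothetical; the real hook receives a JSON payload from Claude Code and writes File and Command nodes to Neo4j, not a Counter.

```python
# Hypothetical sketch of post_tool_use aggregation: fold tool events
# into per-file and per-command counts. A Counter stands in for the
# File and Command nodes the real hook creates in the graph.
from collections import Counter

events = [
    {"tool": "Read", "path": "src/auth.py"},
    {"tool": "Edit", "path": "src/auth.py"},
    {"tool": "Read", "path": "src/db.py"},
    {"tool": "Bash", "command": "pytest"},
]

file_access = Counter(e["path"] for e in events if "path" in e)
commands = Counter(e["command"] for e in events if "command" in e)
```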
Hook 3: pre_compact — Memory Preservation
When: Before Claude Code compresses a conversation to save context window space.
What happens:
- The full conversation transcript is captured
- It’s chunked into overlapping segments (512 words, 64-word overlap)
- Each chunk is embedded via Voyage AI and stored in Neo4j
- The system extracts file paths and entity references from the text, creating DISCUSSES and MENTIONS relations
This is how agents build long-term memory. When a conversation is compacted, the knowledge isn’t lost — it’s stored in the graph. Future searches by any agent can retrieve those chunks.
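The chunking step is easy to make concrete. A sketch under the stated parameters (512-word windows with a 64-word overlap, so each window advances by 448 words); the embedding and Neo4j writes are omitted.

```python
# Split a transcript into overlapping word chunks: 512-word windows
# with a 64-word overlap, i.e. the window advances 512 - 64 = 448 words.
def chunk_words(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the final window already covers the tail
    return chunks

# A synthetic 1000-word transcript: w0 w1 ... w999
transcript = " ".join(f"w{i}" for i in range(1000))
chunks = chunk_words(transcript)
```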
Hook 4: session_start — Broker Registration
When: An agent session starts.
What happens: Reminds the agent to register with the MCP Broker. Simple but essential for the communication system.
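For reference, hooks like these are registered in Claude Code's settings file. A minimal sketch of the wiring (the event names UserPromptSubmit, PostToolUse, PreCompact, and SessionStart come from Claude Code's hooks documentation; the script paths are hypothetical placeholders):

```json
{
  "hooks": {
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "~/hooks/inject_context.sh" }] }
    ],
    "PostToolUse": [
      { "matcher": "Read|Edit|Bash",
        "hooks": [{ "type": "command", "command": "~/hooks/track_activity.sh" }] }
    ],
    "PreCompact": [
      { "hooks": [{ "type": "command", "command": "~/hooks/preserve_memory.sh" }] }
    ],
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "~/hooks/register_broker.sh" }] }
    ]
  }
}
```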
The Development Workflow
With the infrastructure in place, here’s how actual work gets done. Every code change follows a mandatory 10-step workflow:
- Step 1: Orchestrator writes proposal (the WHAT & WHY)
- Step 2: Agents validate in parallel (doc, dev)
- Step 3: Iterate on feedback, max 3x (loop between steps 2 and 3)
- Step 4: Orchestrator writes dev prompt (the HOW: a task spec)
- Step 5: Developer validates prompt (feasibility check)
- Step 6: Developer implements (code)
- Step 7: Reviewer reviews, read-only (APPROVED / REJECTED)
- Step 8: Validator runs tests (GO / NO GO)
- Step 9: Doc writer updates docs (keeps docs in sync)
- Step 10: Orchestrator commits (git + journal)
Every step is mandatory. No shortcuts. Max 3 iterations on feedback loops. If a review is rejected, the developer fixes and resubmits. If validation fails, we go back to step 4 with a new prompt — not step 6.
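The failure transitions can be written down as a tiny function; this is a sketch of the rules as stated, not the orchestrator's actual code.

```python
# Sketch of the workflow's failure transitions. A rejected review sends
# the developer back to step 6 (fix and resubmit); a failed validation
# goes back to step 4 for a NEW dev prompt, never straight to step 6.
def next_step(step: int, outcome: str) -> int:
    if step == 7 and outcome == "REJECTED":
        return 6          # developer fixes and resubmits for review
    if step == 8 and outcome == "NO GO":
        return 4          # orchestrator writes a fresh prompt
    return step + 1       # happy path: advance to the next gate
```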
Here’s how step 2 looks at the message level:
[Figure: the step 2 validation exchange between orchestrator, doc, and dev]
During step 6 (implementation), the hooks are working continuously. Every file the developer reads gets RAG context injected. Every edit is tracked. When the conversation gets too long and is compacted, the transcript is preserved in the knowledge graph.
The Maintenance Loop
A background daemon runs every hour and keeps the knowledge graph healthy:
- Cleanup: Removes chunks from deleted files
- Reindex: Updates chunks for modified files
- Entity extraction: Discovers new entities from source code and docs
- Repair: Multi-step integrity check (scope propagation, orphan detection, origin tracking)
- Fill: Generates missing embeddings via Voyage AI
- Health check: Verifies database indices and data quality
After each cycle, the admin agent broadcasts a verification request to all other agents. Each one checks its domain in the knowledge graph and reports back. If the developer’s entities look wrong, or the doc writer’s ADRs are missing, we know immediately.
What I Learned
Separation of concerns works for AI agents too. Giving each agent a narrow scope with enforced permissions eliminates entire categories of mistakes. The reviewer can’t accidentally modify code. The developer can’t overwrite documentation. Constraints breed quality.
Shared memory changes everything. Without the knowledge graph, each agent session starts from zero. With it, an agent can pick up context from a conversation that happened days ago, by a different agent, about a related topic. The RAG injection is invisible but transformative.
Structured communication prevents chaos. The Req/Resp pattern with file-based content and anti-loop rules (no pleasantries, no acknowledgment messages, one response per request) keeps the system focused. Without these rules, agents quickly degenerate into politeness loops and redundant confirmations.
Hooks are the killer feature. The four hooks — context injection, activity tracking, memory preservation, and broker registration — run invisibly on every interaction. The agents don’t need to know about the knowledge graph. They just benefit from it.
Mandatory workflows catch problems early. The 10-step process feels heavy, but it means every code change is proposed, validated by multiple specialists, reviewed, tested, and documented before it’s committed. Problems surface at step 2, not step 10.
The Stack
In brief:

- Claude Code (one instance per agent, in tmux sessions)
- MCP Broker: message routing over SQLite
- MCP Memory: Neo4j knowledge graph plus Voyage AI embeddings
- Four Claude Code hooks: context injection, activity tracking, memory preservation, broker registration
- An hourly maintenance daemon for knowledge-graph upkeep
Is This Overkill?
Maybe. For a weekend project, absolutely. But for a complex, evolving codebase where correctness matters and you want to maintain quality over months of development — having specialized agents with shared memory and enforced workflows is remarkably effective.
On the practical side: the Voyage AI embeddings cost a few dollars per month for a medium-sized codebase. Neo4j runs fine on 1–2 GB of RAM. The RAG hook adds about half a second of latency per prompt — noticeable but not disruptive. The main cost is complexity: you’re running multiple tmux sessions, two MCP servers, and a maintenance daemon. It’s infrastructure, and infrastructure needs care.
The real insight isn’t “use more AI agents.” It’s that AI agents benefit from the same organizational patterns that human teams do: clear roles, defined permissions, structured communication, shared knowledge bases, and mandatory quality gates.
The agents don’t need to be smarter. They need to be organized.
If you’re curious about MCP servers or Claude Code hooks, the Anthropic documentation is a good starting point. If you want to discuss multi-agent architectures, find me in the comments.