Mneme HQ is the architectural governance layer for AI-assisted development. It compiles your team's architectural intent into enforceable constraints that govern AI coding agents at the pre-generation stage, before architectural drift reaches review. Rules files document standards. Memory tools recall context. RAG retrieves knowledge. Mneme governs implementation.
Architectural governance for AI-assisted development
Govern AI coding agents
before they generate the code.
Stop architectural drift before it reaches review. Mneme catches violations at the moment AI generates code — so your standards are enforced, not just documented.
- Block banned frameworks, cross-boundary calls, and superseded decisions before generation
- No re-prompting — constraints apply on every call, every session, across every agent
- Surface violations before the PR, not during it — cut review overhead at the source
Works with direct API integrations, coding assistants, agent frameworks, and managed agent platforms.
The bottleneck
AI increased code output.
Review capacity did not.
Coding assistants generate code faster than teams can review it.
But review bandwidth has not increased.
That means more surface area to validate, more architectural drift to catch, and more governance pushed downstream into PR review.
AI agents do not just create more code. They expose intent debt: undocumented, stale, or unenforced architectural decisions that human reviewers used to catch manually.
The issue is not model quality. It is that coding agents do not retain your architectural decisions by default.
More PR Surface Area
AI increases the amount of code reviewers must validate per change.
Reactive Governance
Architectural violations are caught after generation, during review.
Session Amnesia
Coding agents forget prior decisions unless re-prompted every time.
Where Mneme sits
Adjacent tools solve adjacent problems.
Mneme is not a memory tool, not a rules file, and not a RAG system. Each of those exists for a reason. None of them govern implementation.
Rules files document standards.
Mneme enforces them.
Memory tools recall context.
Mneme governs implementation.
RAG retrieves knowledge.
Mneme operationalizes decisions.
The AI coding governance stack
Pre-generation governance
Mneme. Compiles architectural intent into enforceable constraints before the agent generates code.
Generation and runtime
Agent frameworks and runtime harnesses. Cursor, Claude Code, agent platforms.
Post-generation observability
Tools like SentRux. Detect violations after the agent has acted.
SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. The two layers are complementary.
How it works
Five stages. No vector store. No ML.
07 Human oversight review · approvals
06 Validation & eval benchmarks · tracing
05 Governance & control Mneme HQ
04 Tooling & execution MCP · CI/CD · shells
03 Agent runtime LangGraph · Claude Code
02 Context & retrieval RAG · vectors · memory
01 Foundation models OpenAI · Anthropic · Gemini
Almost everyone is competing in layers 01–03. Mneme is layer 05 — the governance layer above the agent runtime. Read the full layer-by-layer breakdown →
project_memory.json → MemoryStore → Retriever → ContextBuilder → LLMAdapter → Evaluator
1
Load
Your decisions become durable rules. Engineers edit a JSON file once; Mneme loads it on every call — no re-prompting, no session amnesia.
2
Retrieve
The right rules reach the agent every time. Deterministic scoring means the same task always surfaces the same constraints — no probabilistic gaps, no missed standards.
3
Build
Only relevant constraints reach the agent. A targeted packet keeps latency low and prevents rule dilution — the agent gets what applies, not everything you've ever decided.
4
Inject
Every AI call runs under your standards. The context packet is injected as the system prompt before generation — regardless of agent, IDE, or platform.
5
Evaluate
Violations surface before review, not during it. Responses are scored against the injected constraints — giving you a blocking gate before code reaches your PR queue.
Why Existing Approaches Fail
Every current approach shares a common flaw: none of them enforce decisions before the model writes the code.
| Approach | Why It Breaks at Scale | Mneme HQ |
|---|---|---|
| Rules Files | Static, manually maintained, silently ignored by tools | Deterministic pre-generation enforcement. Structured decisions with a precedence engine, scope-aware retrieval, and hook-level blocking. |
| Prompt Templates | Drift between sessions, omitted by integrators, inconsistent across agents | |
| RAG / Vector Search | Probabilistic retrieval, no authority model, no enforcement | |
| Code Review | Reactive, linear capacity, too late to prevent architectural debt |
What Mneme prevents
Concrete violations, not abstract rules.
Mneme injects your team's architectural decisions into AI-assisted generation. Below is what that catches in practice — the kinds of changes an agent will otherwise ship, because nothing told it not to.
Example scenario
A developer asks Claude Code to add analytics to a checkout route. The agent proposes importing the BigQuery client directly into the frontend service — violating your layered architecture decision that data-platform calls belong in a backend service only.
Mneme detects the cross-boundary call before generation completes. The violation is flagged and blocked — the agent never writes the code, and nothing reaches your PR queue.
Unauthorized framework introduction
Redux pulled into a Zustand-standardized app. Banned ORM imported into a service that already chose another.
Cross-boundary architecture violations
BigQuery client instantiated inside a frontend route. Business logic dropped into a controller. Layering decisions ignored.
ADR supersession conflicts
Celery re-introduced after the team moved to Pub/Sub. Old decisions reappearing because the agent didn't see the new one.
Restricted path modifications
Codegen agent writing to db/prod/migrations/*. Billing agent touching the auth package.
Security policy violations
Raw SQL string concatenation. Mock auth shipped in production paths. Credentials handled outside the approved surface.
Non-approved dependency usage
GPL packages added to a license-restricted repo. Internal-only libraries imported into externally-shipped services.
Operational proof
Three flagship demos.
One worldview.
Each flagship is a different manifestation of the same structural problem: AI accelerates entropy, review does not scale linearly with AI output, drift compounds. Together they sell the category, not a feature. Each ships with a runnable example that drives real Mneme enforcement against scripted diffs — deterministic, no LLM call required.
All three flagships, supporting enforcement examples, and operational evidence on the demo hub →
Works with
Model-agnostic. Agent-agnostic.
Frontier and open-weight models. IDE agents, CLI agents, and orchestration frameworks. The decision corpus is the constant; everything upstream of it can change.
Models
OpenAI, Anthropic, Gemini, Llama, Qwen, DeepSeek, Mistral — direct APIs and OpenAI-compatible endpoints.
Coding agents
Claude Code & Cursor (native). Copilot, Aider, Cline, OpenHands designed-to-support.
Frameworks & CI
LangGraph, CrewAI, AutoGen, OpenAI Agents SDK. GitHub Actions (native), self-hosted runners.
Get started
Running in under two minutes.
install
$ git clone https://github.com/TheoV823/mneme $ cd mneme $ pip install -e .
run demo
# Runs the before/after demo without an API key $ python demo.py --dry-run
governance gate (CI)
$ mneme check --memory .mneme/project_memory.json \ --input pr.diff --query "$PR_TITLE" --mode strict
Vision & roadmap
Building the governance layer
for AI-assisted development.
Mneme is evolving from local governance tooling into the governance infrastructure layer for AI-assisted software development. As coding workflows mature, teams will need more than prompt files to maintain architectural consistency at scale.
Phase 1 — Current
OSS Developer Wedge
Architectural governance for individual developers and early engineering adopters.
Phase 2
Team Governance Layer
Shared policy and decision stores for teams adopting AI-assisted development.
Phase 3
Agent Platform Integrations
Governance for enterprise agent workflows and managed coding platforms.
Phase 4
Governance Infrastructure
Policy-as-code enforcement and drift analytics across engineering organizations.
Frequently asked
Common questions.
What is Mneme HQ?
Mneme HQ is the architectural governance layer for AI-assisted development. It compiles architectural intent into enforceable constraints that govern AI coding agents before code is generated. As agent platforms proliferate, governance becomes infrastructure, and Mneme is positioned as the pre-generation governance layer of that stack.
How is Mneme different from Cursor Rules or CLAUDE.md?
Rules files document standards. Mneme enforces them. Cursor Rules and CLAUDE.md are prompt files that describe preferences to the model. Mneme is a governance layer that compiles architectural decisions into enforceable constraints, retrieves them at prompt time based on what the agent is doing, and validates outputs against them.
How is Mneme different from RAG or vector databases?
RAG retrieves knowledge. Mneme operationalizes decisions. RAG systems surface documents that the model may or may not act on. Mneme compiles architectural decisions into structured rules and evaluates AI-generated code against them. There is no embedding model, no vector store, and no probabilistic retrieval in the governance path.
How is Mneme different from observability tools like SentRux?
SentRux tells you when the agent violated architecture. Mneme helps prevent the violation from being proposed in the first place. Pre-generation governance and post-generation observability are complementary layers of the AI coding stack.
Does Mneme require a vector store or ML infrastructure?
No. Mneme uses deterministic, version-controlled decision graphs and tag-scoped retrieval. There is no vector store, no embedding model, and no ML dependency in the governance path. This is a deliberate architectural commitment.
What stacks does Mneme work with?
Mneme works with direct LLM API integrations, IDE coding assistants like Cursor and Claude Code, agent frameworks, managed agent platforms, and internal prompt pipelines. The enterprise framing is a governance control plane for AI coding agents operating within Azure and GitHub-based engineering workflows.
Benchmark
Methodology before metrics.
Mneme's governance benchmark is a deterministic, reproducible regression instrument — not an eval-score leaderboard play. Methodology, deterministic retrieval, and rule-text enforcement are pinned at the Layer 1 freeze (commit e73ff7d) so every change is visible and every PASS is reconstructible.
deterministic retrieval two-layer scoring structured-fixture verification recall@1 reported, never optimized
Same memory and same query produce byte-identical retrieval order on every run. Verdicts are observable facts. We are explicit about what the benchmark proves and what it does not.
Illustrative example
Same prompt, same model, different answer. The example below shows what governance changes about agent behavior.
Prompt: "Should we rebuild the retrieval system from scratch with embeddings?"
We could consider rebuilding the system with a vector database and embedding model. This would improve semantic matching and scale better long-term. Sentence-transformers is a good option for generating embeddings...
Prompt: "Should we rebuild the retrieval system from scratch with embeddings?"
Do not rebuild from scratch. The project has an explicit rule to extend current infrastructure before rebuilding (rule-001). Keyword scoring was chosen intentionally — it is deterministic, has no ML dependencies, and is easy to debug. The team already declined adding sentence-transformers in v1. Extend the current retriever instead.
Without Mneme HQ — enforcer result
Severity: FAIL
Decision: mneme_retrieval_deterministic
Violation: "embeddings" + "vector database"
Mode: strict
With Mneme HQ — enforcer result
Severity: PASS
Retrieved decision: mneme_retrieval_deterministic
Enhanced response: no violations detected