The structural intelligence layer for AI coding tools.
Your AI coding tool sees files. Carto sees architecture — domains, blast radius, import graph, routes. Every AI tool you use stops guessing about your codebase and starts knowing it.
npm install -g carto-md
cd your-project
carto init
That's it. Carto auto-wires itself into every AI tool you have installed. Restart your AI tool and it now knows your codebase architecturally.
Works with: Cursor · Claude Code · Codex · Kiro · Claude Desktop · Windsurf · VS Code Copilot · JetBrains · Zed
Carto also publishes ANCI, the open file format for any codebase to describe its architecture to AI tools. Carto is the reference implementation; the spec is open and any tool can consume it. See the ANCI section below.
Use it with your AI tool
carto init auto-wires the MCP server into every AI tool it detects. If yours wasn't detected, here's the manual config — one block, copy-paste, done.
Cursor
carto init writes this for you. Manual: ~/.cursor/mcp.json
{ "mcpServers": { "carto": { "command": "carto", "args": ["serve"], "cwd": "/your/project" } } }
Claude Code (CLI)
carto init writes <project>/.mcp.json for you when Claude Code is detected (claude binary on PATH or ~/.claude/ exists). Manual:
claude mcp add carto -- carto serve
Or create .mcp.json at the project root:
{ "mcpServers": { "carto": { "command": "carto", "args": ["serve"] } } }
Codex (CLI)
carto init writes ~/.codex/config.toml for you when Codex is detected (codex binary on PATH or ~/.codex/ exists). Manual:
codex mcp add carto -- carto serve
Or edit ~/.codex/config.toml:
[mcp_servers.carto]
command = "carto"
args = ["serve"]
cwd = "/your/project"
enabled = true
Kiro
carto init writes this for you. Manual: ~/.kiro/settings/mcp.json
{ "mcpServers": { "carto": { "command": "carto", "args": ["serve"], "cwd": "/your/project" } } }
Claude Desktop
carto init writes this for you (cross-platform). Manual paths:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json (community Linux builds)
{ "mcpServers": { "carto": { "command": "carto", "args": ["serve"], "cwd": "/your/project" } } }
VS Code Copilot
carto init writes <project>/.vscode/mcp.json for you when the code binary is on PATH. Manual: .vscode/mcp.json in your project root, or Command Palette → MCP: Add Server. Note: VS Code uses servers (not mcpServers) and requires "type": "stdio".
{ "servers": { "carto": { "type": "stdio", "command": "carto", "args": ["serve"] } } }
Windsurf
carto init writes this for you when Windsurf is detected. Manual: ~/.codeium/windsurf/mcp_config.json
{ "mcpServers": { "carto": { "command": "carto", "args": ["serve"], "cwd": "/your/project" } } }
Zed / JetBrains / VS Code (full agent mode)
Carto also runs as a full ACP agent — not just a passive MCP server, but an active coding agent with architectural awareness. See ACP Agent below.
What changes for your AI
Without carto, your AI greps text and guesses. With carto, it queries structural facts:
"Add rate limiting to /api/users"
Without carto: AI grep-searches your codebase, finds 12 files mentioning "users", picks 3 at random, hopes for the best.
With carto: AI calls get_change_plan("add rate limiting to /api/users") and gets back:
- The exact route handler file
- All 7 files that import it (blast radius)
- The AUTH domain it lives in
- Similar middleware patterns already in the codebase
- Cross-domain dependencies that could break
Your AI's response goes from "here's a guess" to "here's the right change with the architectural context."
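As a rough illustration of what "blast radius" means here, the sketch below walks a reverse import graph breadth-first and records the hop distance of each dependent. The graph literal and the `reverseGraph` and `blastRadius` helpers are hypothetical; this is not Carto's implementation, which the Benchmarks section describes as bitmap-backed.

```javascript
// Toy import graph: each file maps to the files it imports (hypothetical data).
const imports = {
  'routes/users.js': ['auth/session.js', 'db/users.js'],
  'routes/login.js': ['auth/session.js'],
  'auth/session.js': ['db/connection.js'],
  'db/users.js': ['db/connection.js'],
  'db/connection.js': [],
};

// Invert the graph so each file maps to the files that import it.
function reverseGraph(graph) {
  const reversed = new Map();
  for (const file of Object.keys(graph)) reversed.set(file, []);
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      if (!reversed.has(dep)) reversed.set(dep, []);
      reversed.get(dep).push(file);
    }
  }
  return reversed;
}

// Breadth-first walk over dependents; returns Map<file, hop distance>.
function blastRadius(graph, changedFile) {
  const reversed = reverseGraph(graph);
  const hops = new Map();
  let frontier = [changedFile];
  let distance = 0;
  while (frontier.length > 0) {
    distance += 1;
    const next = [];
    for (const file of frontier) {
      for (const dependent of reversed.get(file) || []) {
        if (!hops.has(dependent) && dependent !== changedFile) {
          hops.set(dependent, distance);
          next.push(dependent);
        }
      }
    }
    frontier = next;
  }
  return hops;
}

const radius = blastRadius(imports, 'db/connection.js');
console.log(radius.size, Object.fromEntries(radius));
```

In this toy graph, changing `db/connection.js` reaches two files at one hop and two route files at two hops, which is the kind of answer a grep over file contents cannot give.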
Catching bad changes before save
The harder problem isn't finding the right file — it's stopping a confident-sounding AI from writing a refactor that breaks the rest of the repo. Carto exposes a diff-shaped query the AI can call before the user accepts a change:
AI proposes a 12-line patch to packages/pg-meta/src/pg-format/index.ts. Before showing the diff, it calls validate_diff(patch).
# Diff Validation
**Risk:** 🔴 HIGH
**Files changed:** 1
**Union blast radius:** 83 transitive dependents
## Violations (1)
| Severity | Kind | File | Detail |
|----------|-------------|----------------------------------------|-----------------------------------------------------------------|
| HIGH | high_blast | `packages/pg-meta/src/pg-format/index.ts` | Modifying this file affects 83 transitive dependents (>50). |
The AI sees this before it proposes the diff. It revises its plan, splits the change, or asks the user. The bad refactor never makes it to the screen. Sub-millisecond on a 7,000-file repo — see the Benchmarks section below.
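To make the check concrete, here is a minimal sketch of the two steps the sample report implies: pull the changed paths out of a unified diff, then flag any file whose transitive dependent count exceeds a threshold. The 50-dependent threshold is taken from the sample output above; the `changedFiles` and `findHighBlast` helpers and the hard-coded count table are hypothetical stand-ins, not Carto's API.

```javascript
// Extract changed paths from the "+++ b/<path>" headers of a unified diff.
function changedFiles(diff) {
  const files = [];
  for (const line of diff.split('\n')) {
    const match = /^\+\+\+ b\/(.+)$/.exec(line);
    if (match) files.push(match[1]);
  }
  return files;
}

// Flag files whose transitive dependent count exceeds the threshold.
function findHighBlast(files, dependentCounts, threshold = 50) {
  return files
    .filter((file) => (dependentCounts[file] || 0) > threshold)
    .map((file) => ({
      severity: 'HIGH',
      kind: 'high_blast',
      file,
      detail: `Modifying this file affects ${dependentCounts[file]} transitive dependents (>${threshold}).`,
    }));
}

const patch = [
  'diff --git a/packages/pg-meta/src/pg-format/index.ts b/packages/pg-meta/src/pg-format/index.ts',
  '--- a/packages/pg-meta/src/pg-format/index.ts',
  '+++ b/packages/pg-meta/src/pg-format/index.ts',
  '@@ -1,3 +1,3 @@',
  '-export const a = 1;',
  '+export const a = 2;',
].join('\n');

// Hypothetical precomputed counts; a real index would supply these.
const counts = { 'packages/pg-meta/src/pg-format/index.ts': 83 };
const violations = findHighBlast(changedFiles(patch), counts);
console.log(violations);
```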
Every validate_diff call is also written to a local SQLite log, so a session that runs five hours later can ask did_we_discuss_this("snake_case naming") and get back the prior decision. The AI stops re-deciding settled questions.
In the wild
Claude Code analyzing the supabase repo via carto. Real session, no editing — 5,974 files indexed in ~780ms, 86 routes, 4,839 import edges, 7 domains. The agent's own verdict at the bottom: "useful, especially for a large codebase like supabase. The blast radius + cross-domain tools are the most valuable."
Languages and frameworks
Import graph + symbols (any repo)
| Language | Extensions |
|---|---|
| JavaScript / TypeScript | .js .jsx .ts .tsx .mjs .cjs |
| Python | .py |
| Go | .go |
| Rust | .rs |
| Java | .java |
| C / C++ | .cpp .cc .cxx .h .hpp |
| C# | .cs |
| Ruby | .rb |
| R | .r .R |
| Prisma schema | .prisma |
| HTML | .html (for fetch() discovery) |
Route extraction (framework-aware)
| Framework | Language |
|---|---|
| Express, Next.js (App + Pages), tRPC, React Router | TypeScript / JavaScript |
| FastAPI, Flask, Django | Python |
| Gin, Echo, Chi, net/http | Go |
| Actix-web, Axum, Rocket | Rust |
| Spring MVC / Boot, JAX-RS | Java |
| ASP.NET Core | C# |
| Rails, Sinatra | Ruby |
Model extraction
| ORM / Schema | Language |
|---|---|
| Prisma, Zod, Drizzle, TypeScript interfaces | TypeScript / JavaScript |
| Pydantic, SQLAlchemy | Python |
| Go structs | Go |
| Rust structs | Rust |
| JPA @Entity, Java records | Java |
| EF Core, C# records | C# |
| ActiveRecord | Ruby |
TypeScript path aliases
Reads tsconfig.json / jsconfig.json paths config. @/components/Button resolves to the actual file in the import graph — blast radius works correctly for Next.js and Vite projects.
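A minimal sketch of the alias resolution idea, assuming a tsconfig-style `paths` map with at most one `*` per pattern. The `resolveAlias` helper is hypothetical and deliberately simplified: it takes only the first target and does not probe file extensions or index files the way a full resolver would.

```javascript
const path = require('path');

// Resolve an import specifier against tsconfig-style "paths" patterns.
// Returns a project-relative path, or null when no pattern matches.
function resolveAlias(specifier, paths, baseUrl = '.') {
  for (const [pattern, targets] of Object.entries(paths)) {
    const star = pattern.indexOf('*');
    if (star === -1) {
      // Exact-match pattern with no wildcard.
      if (specifier === pattern) return path.posix.join(baseUrl, targets[0]);
      continue;
    }
    const prefix = pattern.slice(0, star);
    const suffix = pattern.slice(star + 1);
    if (
      specifier.length >= prefix.length + suffix.length &&
      specifier.startsWith(prefix) &&
      specifier.endsWith(suffix)
    ) {
      // Substitute the captured wildcard text into the first target.
      const captured = specifier.slice(prefix.length, specifier.length - suffix.length);
      return path.posix.join(baseUrl, targets[0].replace('*', () => captured));
    }
  }
  return null;
}

// Hypothetical compilerOptions.paths, as commonly seen in Next.js projects.
const aliasPaths = { '@/*': ['src/*'] };
console.log(resolveAlias('@/components/Button', aliasPaths));
console.log(resolveAlias('react', aliasPaths));
```

Once the specifier maps to a real path, the import edge can be recorded against that file instead of being dropped as an unresolved package import.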
ACP Agent (Zed / JetBrains / VS Code)
Beyond MCP, Carto runs as a full ACP agent — an active coding agent with built-in architectural awareness.
User: "Add rate limiting to /api/users"
↓
Carto auto-queries its own SQLite:
- Blast radius of relevant files
- Domain context (AUTH)
- Similar patterns in codebase
↓
Builds rich prompt with structural context
↓
Sends to LLM (your API key) → streams answer + diffs back to editor
Setup in Zed
~/.config/zed/settings.json:
{
  "agent_servers": {
    "Carto": { "command": "carto", "args": ["agent"] }
  }
}
Bring Your Own Key
Carto supports any LLM provider — configure in your editor:
| Provider | Models |
|---|---|
| Anthropic | Claude Sonnet 4, Haiku |
| OpenAI | GPT-4o, GPT-4o-mini, o1, o3 |
| Google Gemini | Gemini 2.5 Pro, 2.5 Flash |
| Ollama | Any local model (free) |
| OpenRouter | Any model via single API |
| Groq | Ultra-fast inference |
| Together AI | Open-source models |
| Azure OpenAI | Enterprise deployments |
GitHub Action — PR impact reports
Drop carto onto every PR your repo gets. Posts a sticky comment on each pull request with the diff's blast radius, cross-domain violations, affected routes, and a risk badge.
.github/workflows/carto.yml:
name: Carto Impact Report
on:
  pull_request:
    branches: [main]
permissions:
  contents: read
  pull-requests: write
jobs:
  carto:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: theanshsonkar/carto@v2.0.9
That's the whole config. The action handles npm install, builds (or restores from cache) the .carto/ index, runs carto pr-impact, and posts the comment via GITHUB_TOKEN.
Inputs
| Input | Default | What it does |
|---|---|---|
| carto-version | latest | The carto-md npm version to install. Pin in production for reproducibility. |
| base | auto (origin/$GITHUB_BASE_REF) | Git ref the PR branched from. |
| head | auto ($GITHUB_SHA) | Git ref of the PR head. |
| fail-on | (empty) | Fail the workflow when risk meets/exceeds this severity. One of HIGH, MEDIUM, LOW. Empty = comment-only. |
| comment-mode | sticky | sticky updates the existing carto comment in place. new posts a new comment every push. none skips posting (renders to stdout). |
| node-version | 20 | Node.js version on the runner. |
Outputs
| Output | Description |
|---|---|
| risk | Rolled-up risk: SAFE, LOW, MEDIUM, or HIGH. Lets downstream steps gate behavior on Carto's verdict. |
| comment-url | URL of the posted/updated PR comment. |
What the comment looks like
## 🗺️ Carto Impact Report
This PR touches AUTH and DATABASE domains.
| Metric | Value |
|--------|-------|
| Risk | 🔴 HIGH |
| Blast radius (union) | 23 files |
| Files changed | 6 |
| Cross-domain violations introduced | 2 |
| High-impact file changed | src/auth/session.ts (8 direct dependents) |
<details>
<summary>Affected routes (4)</summary>

- POST /auth/login — risk: HIGH
- GET /auth/me — risk: HIGH
- POST /auth/register — risk: MEDIUM
- POST /api/users — risk: LOW

</details>
<details>
<summary>Cross-domain violations (2)</summary>

- auth/login.ts now imports from payments/billing.ts (AUTH→PAYMENTS)
- database/user-repo.ts now imports from auth/jwt.ts (DATABASE→AUTH)

</details>
Standalone CLI use
The action is a thin wrapper around carto pr-impact. Use it locally, in custom CI, or in pre-commit hooks:
carto pr-impact --base origin/main --head HEAD                  # markdown to stdout
carto pr-impact --base origin/main --head HEAD --format json
carto pr-impact --base origin/main --head HEAD --fail-on HIGH   # exit 2 on HIGH risk
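The "meets/exceeds" rule behind fail-on can be read as an ordered comparison over the four risk levels. The sketch below shows that reading; the `shouldFail` helper is hypothetical and only mirrors the behavior described in this README, including the empty threshold meaning comment-only.

```javascript
// Severity order taken from the risk levels listed in this README.
const LEVELS = ['SAFE', 'LOW', 'MEDIUM', 'HIGH'];

// True when the reported risk meets or exceeds the configured threshold.
// An empty threshold means comment-only mode, so nothing fails.
function shouldFail(risk, failOn) {
  if (!failOn) return false;
  const riskRank = LEVELS.indexOf(risk);
  const thresholdRank = LEVELS.indexOf(failOn);
  if (riskRank === -1 || thresholdRank === -1) {
    throw new Error(`Unknown severity: ${risk} / ${failOn}`);
  }
  return riskRank >= thresholdRank;
}

console.log(shouldFail('HIGH', 'MEDIUM')); // risk is above the threshold
console.log(shouldFail('LOW', 'MEDIUM'));  // risk is below the threshold
console.log(shouldFail('HIGH', ''));       // comment-only mode
```

A CI wrapper could map a true result to a non-zero exit code, in the same spirit as the exit 2 shown for --fail-on HIGH.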
ANCI — the open spec for codebases describing themselves to AI
Every AI coding tool today re-discovers a codebase's architecture from scratch on every session. Cursor builds its own embedding index. Cline builds its own. Continue builds its own. Same parsing, every tool, every session.
ANCI (Architecturally Normalized Code Index) is the file format that fixes this. Two files at .carto/anci.{yaml,bin} that describe the codebase's architecture in a form any AI tool can read without indexing it itself. OpenAPI did this for REST APIs. ANCI does it for codebases.
carto sync writes both files automatically. The header is grep-able YAML; the body is a compact binary import graph. Spec lives in docs/anci/v0.1-DRAFT.md. Carto is the reference implementation.
carto anci publish             # re-emit anci.{yaml,bin} from the index
carto anci show                # human-readable summary
carto anci validate ./.carto   # validate a published pair
Any tool can consume an ANCI pair without depending on Carto:
const { loadAnci } = require('carto-md/src/anci/consumer');
const reader = loadAnci('./.carto');
console.log(reader.domains);                            // [{ name: 'AUTH', file_count: 42 }, ...]
console.log(reader.getHighImpactFiles(5));              // top 5 by transitive dependents
console.log(reader.blastRadius('src/auth/session.ts')); // { count, hops, files: [...] }
console.log(reader.simulateChangeImpact([               // multi-file change blast radius
  'src/auth/session.ts',
  'src/db/connection.ts',
]));
Status: v0.1.0-DRAFT. The wire format may change up to v1.0. The reference implementation lives in this repo at src/anci/.
Once carto is wired in, your AI tool can call any of these mid-task. You don't need to memorize them — your AI will pick the right ones.
| Tool | What it returns |
|---|---|
| get_architecture() | 500-word project overview: domains, entry points, stack, key patterns. Use this first. |
| get_change_plan(intent) | Given "add rate limiting to /api/users" → files to touch, domains affected, blast radius, similar patterns |
| get_blast_radius(file) | All files affected by changing a given file, with hop distance |
| simulate_change_impact(files) | Union of all files transitively affected by changing multiple files at once. Powered by the bitmap engine; sub-millisecond on 7K-file repos. |
| validate_diff(diff) | Given a unified diff: violations (cross-domain imports, high-blast files), blast radius per file, risk level (SAFE/LOW/MEDIUM/HIGH), suggestions. Each call is recorded in the episodic memory log so other tools can ask "did we discuss this?". Sub-15ms p99. |
| get_context(file) | Everything about a file: domain, blast radius, neighbors, routes, models |
| get_file_summary(file) | What a file does, its role, key deps and dependents |
| get_similar_patterns(file) | Files with same domain, route shape, or shared deps; use it to find conventions before writing new code |
| get_routes() | All API endpoints with file mapping |
| get_structure() | Import graph, entry points, high-impact files, tech stack |
| get_domain(name) | All routes, models, functions for a domain. Lazily regenerated when stale. |
| get_neighbors(file, hops) | Import graph neighbors: nodes and edges |
| get_cross_domain() | Import edges that cross domain boundaries |
| search_routes(query) | Search API routes by path or method |
| get_models(domain?) | All data models, optionally filtered by domain |
| get_high_impact_files(n) | Top N files by blast radius |
| get_env_vars(domain?) | All env vars with domain mapping |
| get_domains_list() | All detected domains with file, route, model counts |
| get_recent_decisions(time_range, kind?) | Recent validation decisions and architectural choices the AI has made in this project |
| get_session_context(session_id?) | Full context for an AI session: every decision and intervention, ordered chronologically |
| did_we_discuss_this(topic) | Substring search over the episodic memory log; avoids re-deciding settled questions |
| get_intervention_history(file?) | Past Carto-issued violations and suggestions, optionally filtered by file |
Latency: All bitmap-backed queries return in microseconds on real repos; cross-domain and simulate_change_impact settle under 2ms even on a 7.5K-file codebase like vscode. See Benchmarks for the per-tool table.
Episodic Memory
Carto remembers every diff it validates. The validate_diff tool writes one row per call into a local SQLite log (ai_sessions/decisions/interventions tables) — so a session that runs five hours later can still ask did_we_discuss_this("snake_case naming") and get back the prior decision. The log lives next to the index in .carto/carto.db — never sent over the network, never shared between projects.
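The lookup itself is described as a substring search, which can be sketched without a database. The example below uses an in-memory array in place of the SQLite tables; the row shape, field names, sample rows, and the case-insensitive match are all assumptions made for illustration.

```javascript
// In-memory stand-in for the decisions log (hypothetical rows and field names).
const decisions = [
  { at: '2024-01-01T09:00:00Z', kind: 'naming', note: 'Use snake_case naming for DB columns' },
  { at: '2024-01-01T10:30:00Z', kind: 'validation', note: 'Rejected patch: high blast radius on index.ts' },
  { at: '2024-01-01T14:00:00Z', kind: 'naming', note: 'Confirmed snake_case naming applies to migrations too' },
];

// Case-insensitive substring match over the notes, newest first.
function didWeDiscussThis(log, topic) {
  const needle = topic.toLowerCase();
  return log
    .filter((row) => row.note.toLowerCase().includes(needle))
    .sort((a, b) => b.at.localeCompare(a.at));
}

const hits = didWeDiscussThis(decisions, 'snake_case naming');
console.log(hits.map((row) => `${row.at} ${row.note}`));
```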
Domain detection
Carto uses Leiden+CPM graph clustering — files that import each other heavily cluster together. Domain names are inferred from path tokens, with keyword hints for well-known patterns (AUTH, PAYMENTS, DATABASE, etc.).
Adaptive strategy: Repos under 100 files use keyword-only clustering (avoids over-fragmentation). Larger repos with dense import graphs get graph-based clustering with a gamma that scales continuously with repo size.
Works on any repo — not just SaaS apps. vscode gets AUTH/EVENTS/DATABASE. zed (Rust) gets DATABASE/AUTH/EVENTS. A game engine would get RENDERER/PHYSICS/AUDIO.
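For the keyword side of domain naming, a minimal sketch might tokenize each path and count keyword hits per domain. The keyword table reuses the shape of the custom domain config in this README; the `pathTokens` and `inferDomain` helpers, the scoring rule, and the CORE fallback are assumptions, and the graph clustering step is not modeled at all.

```javascript
// Keyword hints in the same shape as the "domains" config in this README.
const domainKeywords = {
  AUTH: ['auth', 'login', 'session'],
  EDITOR: ['editor', 'monaco', 'text'],
};

// Split a path into lowercase tokens on separators, dots, dashes and underscores.
function pathTokens(filePath) {
  return filePath.toLowerCase().split(/[\/\\._-]+/).filter(Boolean);
}

// Pick the domain with the most keyword hits; fall back when nothing matches.
function inferDomain(filePath, keywords, fallback = 'CORE') {
  const tokens = new Set(pathTokens(filePath));
  let best = fallback;
  let bestHits = 0;
  for (const [domain, words] of Object.entries(keywords)) {
    const hitCount = words.filter((word) => tokens.has(word)).length;
    if (hitCount > bestHits) {
      best = domain;
      bestHits = hitCount;
    }
  }
  return best;
}

console.log(inferDomain('src/auth/session.ts', domainKeywords));
console.log(inferDomain('src/utils/math.ts', domainKeywords));
```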
Custom domains via carto.config.json:
{
  "domains": {
    "EDITOR": ["editor", "monaco", "text"],
    "WORKBENCH": ["workbench", "layout", "panel"]
  }
}
Full schema with anchor pinning (forces files into a domain regardless of clustering):
{
  "domains": {
    "AUTH": {
      "keywords": ["auth", "login", "session"],
      "anchor": ["src/auth/session.ts", "src/auth/middleware.ts"]
    }
  }
}
Stability tracking: Carto tracks domain assignments across syncs. If >5% of files change domain, carto check flags it as unstable. Two consecutive syncs with no code changes always produce 0% drift.
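One way to picture the drift figure is as the share of files whose domain differs between two syncs. The sketch below assumes drift is measured only over files present in both snapshots; the `domainDrift` helper and the sample assignments are hypothetical, and only the 5% threshold comes from the text above.

```javascript
// Fraction of files (present in both syncs) whose domain assignment changed.
function domainDrift(previous, current) {
  let shared = 0;
  let moved = 0;
  for (const [file, domain] of Object.entries(current)) {
    // Files that are new in this sync are not counted as drift here.
    if (!Object.prototype.hasOwnProperty.call(previous, file)) continue;
    shared += 1;
    if (previous[file] !== domain) moved += 1;
  }
  return shared === 0 ? 0 : moved / shared;
}

const UNSTABLE_THRESHOLD = 0.05; // the 5% figure from the text above

const before = { 'a.ts': 'AUTH', 'b.ts': 'AUTH', 'c.ts': 'DATABASE', 'd.ts': 'CORE' };
const after = { 'a.ts': 'AUTH', 'b.ts': 'DATABASE', 'c.ts': 'DATABASE', 'd.ts': 'CORE' };

const drift = domainDrift(before, after);
console.log(drift, drift > UNSTABLE_THRESHOLD ? 'unstable' : 'stable');
```

Comparing a snapshot with itself yields zero, which matches the stated guarantee that two syncs with no code changes produce 0% drift.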
CLI commands
| Command | What it does |
|---|---|
| carto init | Detect project, index codebase, generate AGENTS.md, install git hooks (pre-commit, post-checkout, post-merge, post-rewrite), auto-wire MCP into every AI tool found |
| carto sync | Full re-index (skips unchanged files via mtime+size cache). Called automatically by git hooks on commit/checkout/merge/rebase. |
| carto serve | Start MCP server (called by your AI tool; usually you don't run this directly). On every file-aware query the server mtime+size-checks the file and re-parses inline if stale. |
| carto watch | Optional. Live re-index on every file save. Not required: git hooks + lazy MCP re-parse keep the index fresh by default. Use only for AI-heavy workflows that write 50+ files between commits. |
| carto agent | Start ACP agent mode (for Zed / JetBrains / VS Code) |
| carto impact <file> | Blast radius: risk level, affected files, routes at risk |
| carto pr-impact | Diff-shaped impact report between two git refs. Markdown (default) or JSON. Used by the GitHub Action; works locally too. --fail-on with HIGH, MEDIUM, or LOW exits non-zero on threshold trip. |
| carto check | Cross-domain violations, high-risk uncommitted changes, domain health |
| carto inspect | Read-only diagnostic: index paths, sizes, freshness, bitmap sidecar shape, top-impact files, schema version, sync timestamps. --json for piping into jq. Never triggers a rebuild. |
| carto remove | Remove AGENTS.md and .carto/ from project |
Benchmarks
Measured on real open-source repos. Apple M-series, 8 CPUs, 8GB RAM. SHAs pinned in ~/carto-test-repos. Reproducible via ~/carto-test-repos/run-bench.sh.
Indexing speed
| Repo | Language | Indexed Files | First Run | Second Run | DB Size | Import Edges |
|---|---|---|---|---|---|---|
| prisma/prisma | TypeScript | 961 | 1.0s | 350ms | 1.1 MB | 1,387 |
| supabase/supabase | TypeScript | 6,330 | 5.4s | 1.2s | 4.8 MB | 5,189 |
| microsoft/vscode | TypeScript | 7,567 | 8.0s | 935ms | 14.3 MB | 13,335 |
| zed-industries/zed | Rust | 1,752 | 2.9s | 468ms | 4.8 MB | 2,110 |
Indexed Files counts what Carto actually parses — .ts/.js/.py/.go/.rs/... after excluding node_modules, build output, and per-file test patterns (*.test.* / *.spec.* / *.stories.* for JS/TS, test_*.py / *_test.py for Python, test_* / *_test.r for R). The on-disk file count of the repo is larger.
Second Run = carto sync after no changes. mtime+size checked before reading content — if nothing changed, nothing is re-parsed.
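The mtime+size check can be sketched with nothing more than `fs.statSync`. In the example below the cache is an in-memory Map and the `needsReparse` helper is hypothetical; a real index would persist the recorded values, and this sketch makes no claim about how Carto stores them.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Cache of { mtimeMs, size } per path (in-memory for this sketch).
const seen = new Map();

// Returns true when the file must be re-parsed, and refreshes the cache entry.
function needsReparse(filePath) {
  const { mtimeMs, size } = fs.statSync(filePath);
  const cached = seen.get(filePath);
  const changed = !cached || cached.mtimeMs !== mtimeMs || cached.size !== size;
  if (changed) seen.set(filePath, { mtimeMs, size });
  return changed;
}

// Exercise the check against a throwaway file in the OS temp directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'carto-sketch-'));
const file = path.join(dir, 'db.ts');
fs.writeFileSync(file, 'export const a = 1;\n');

const first = needsReparse(file);  // never seen before
const second = needsReparse(file); // untouched since the last check
fs.writeFileSync(file, 'export const a = 1;\nexport const b = 2;\n');
const third = needsReparse(file);  // size changed

console.log(first, second, third);
fs.rmSync(dir, { recursive: true, force: true });
```

The point of checking metadata first is that an unchanged file costs one stat call and no read, which is consistent with the much shorter Second Run times in the table.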
MCP query latency (bitmap engine vs SQLite)
Carto's MCP query path is bitmap-backed on five tools, plus a sixth (simulate_change_impact) that's only feasible with bitmap OR-aggregation. Speedups measured against the same data, same DB, same machine — bitmap path vs the equivalent SQLite path.
| Tool | vscode (7,567 files) | Speedup vs SQLite |
|---|---|---|
| get_blast_radius | sub-ms | 10.7× |
| get_cross_domain | 2.1ms | 6.2× |
| get_high_impact_files | sub-ms | 559× |
| get_similar_patterns | sub-ms | 73× |
| simulate_change_impact | sub-ms | 6.5× (no SQLite equivalent at this latency) |
Median speedup across all five tools on vscode: 10.7×. Smaller repos with denser graphs hit higher peaks — supabase get_high_impact_files clocks well over 100× on its tighter import graph. Reproducible via npm run bench:bitmap -- --repo <path>.
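To show why OR-aggregation suits a multi-file query, here is a minimal sketch of a dense bitset with one bit per file. It assumes each file's transitive dependents are already stored as a Uint32Array; the helper names and the sample indices are hypothetical, and this is not Carto's engine.

```javascript
// One bit per file; 32 files per Uint32Array word.
function makeBitset(fileCount) {
  return new Uint32Array(Math.ceil(fileCount / 32));
}

// Mark the file at the given index as a member of the set.
function setBit(bits, index) {
  bits[index >>> 5] |= 1 << (index & 31);
}

// OR several per-file dependent sets into one union bitset.
function unionAll(bitsets, fileCount) {
  const out = makeBitset(fileCount);
  for (const bits of bitsets) {
    for (let w = 0; w < out.length; w += 1) out[w] |= bits[w];
  }
  return out;
}

// Count set bits (Kernighan's method) to get the size of the union.
function popcount(bits) {
  let total = 0;
  for (let word of bits) {
    while (word !== 0) {
      word &= word - 1; // clears the lowest set bit
      total += 1;
    }
  }
  return total;
}

const FILES = 40;
const depsOfA = makeBitset(FILES);
[1, 2, 33].forEach((i) => setBit(depsOfA, i));
const depsOfB = makeBitset(FILES);
[2, 3, 39].forEach((i) => setBit(depsOfB, i));

const union = unionAll([depsOfA, depsOfB], FILES);
console.log(popcount(union));
```

The union costs one OR per word per changed file, so the work scales with the bitset width and the number of changed files, not with the number of import edges.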
validate_diff latency
The new diff-shaped query that lets the AI ask "is this patch safe?" before showing it to the user. Profiled with a representative 20-line diff against 20 random mid-blast-radius files per repo, 1000 calls each.
| Repo | Files | p50 | p99 |
|---|---|---|---|
| supabase | 6,259 | 0.082ms | 0.298ms |
| vscode | 7,567 | 0.084ms | 0.489ms |
Budget was p50 ≤ 5ms, p99 ≤ 15ms. Both targets are cleared by 30-60×. The bitmap engine handles every blast-radius and cross-domain query in microseconds; what's left is diff parsing + result aggregation. Reproducible via node bench/validation-perf/index.js --repo <path>.
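The headroom claim can be checked directly from the table by dividing each budget by the measured latency. The snippet below only restates that arithmetic; the `headroom` helper is a hypothetical name.

```javascript
// Budgets and measured latencies in milliseconds, copied from the table above.
const budget = { p50: 5, p99: 15 };
const measured = {
  supabase: { p50: 0.082, p99: 0.298 },
  vscode: { p50: 0.084, p99: 0.489 },
};

// Headroom = budget / measured, i.e. how many times faster than required.
function headroom(budgets, results) {
  const rows = {};
  for (const [repo, stats] of Object.entries(results)) {
    rows[repo] = {
      p50: budgets.p50 / stats.p50,
      p99: budgets.p99 / stats.p99,
    };
  }
  return rows;
}

const rows = headroom(budget, measured);
for (const [repo, r] of Object.entries(rows)) {
  console.log(`${repo}: p50 ${r.p50.toFixed(1)}x, p99 ${r.p99.toFixed(1)}x`);
}
```

The four ratios come out between roughly 31× (vscode p99) and 61× (supabase p50), in line with the 30-60× range quoted above.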
Scale
Synthetic stress sweep + real-world corpus, dense-fan-out worst case all the way to 50K files. Headline numbers (full table in docs/scale.md):
- Synth 50K files: init 1.1m, blast_radius p50 22µs, simulate_change_impact p50 50µs, high_impact_files p50 750ns. The dense Uint32Array bitset hits 415 MB on disk and 1.35 GB peak RSS at this size; the Tier-2 Roaring upgrade per PEAK §9.6 is the next move.
- Real-world (vscode, 7,567 files): blast_radius p50 2.7µs, cross_domain p50 1.23ms, similar_patterns p50 834ns, simulate_change_impact p50 19µs.
Reproducible via npm run bench:scale -- --size <N> (synth) and node bench/scale-test/real-world.js --repo <path> (any local clone, including Linux kernel or Chromium).
Domains detected
| Repo | Domains |
|---|---|
| prisma | CORE · DATABASE · AUTH · EVENTS |
| supabase | CORE · AUTH · DATABASE · PAYMENTS · NOTIFICATIONS · EVENTS · TRPC |
| vscode | EXTENSIONS · AUTH · EVENTS · DATABASE · EXTENSION · CLI · CORE |
| zed (Rust) | CORE · DATABASE · AUTH · EVENTS · PAYMENTS · TRPC · NOTIFICATIONS |
vscode at 7,567 indexed files in around 8 seconds. Rust import graph working on zed (2,110 edges from mod declarations and use crate:: paths).
Accuracy
12 corpus repos pass node test/accuracy-corpus.js --samples 100 — full parity between the bitmap path and the SQLite path on blastRadius, crossDomain, highImpactFiles, and simulateChangeImpact. The bitmap layer is a speedup, never a behavior change.
How it works
carto init
↓
Discovers all files (no cap — SQLite handles the volume)
mtime+size check → skip unchanged files
tree-sitter parse → imports + symbols (0.05–0.2ms/file)
Babel deep parse → routes + models (API handler files only)
Leiden+CPM graph clustering → auto-detects domains
Computes reverse deps → blast radius for every file
Writes AGENTS.md + .carto/context/*.md (lazy, on-demand)
Auto-wires MCP into every AI tool found
Installs 4 git hooks: pre-commit, post-checkout, post-merge, post-rewrite
↓
[no daemon, no watcher, no background process]
↓
─── Freshness mechanism 1: git hooks (90% of cases) ───
You commit / pull / switch branches / rebase
→ hook runs `carto sync` quietly in <1s
→ only changed files re-parsed
↓
─── Freshness mechanism 2: lazy mtime check (the gap) ───
You edit files between commits, AI asks "blast radius of db.ts?"
→ MCP server stats db.ts (mtime+size vs DB row)
→ if stale, re-parses just that file inline (<50ms)
→ returns fresh answer
↓
─── Optional: carto watch (AI-heavy workflows only) ───
File saved → debounce 50ms → re-parse 1 file → SQLite write → <50ms
What Carto never does
- Sends your code anywhere. Local only. SQLite on disk.
- Writes secrets into AGENTS.md. .cartoignore blocks .env and credential files by default.
- Touches your manual notes. Writes only between <!-- CARTO:AUTO:START --> and <!-- CARTO:AUTO:END -->.
- Forces you to install a C++ toolchain. Prebuilt native binaries ship for macOS arm64, Linux x64 (glibc + musl/Alpine), and Windows x64. Intel Macs and other platforms transparently fall back to building from source, then to regex-only extraction if that fails.
- Costs money. MIT license. Free forever.
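The marker rule in the list above is easy to picture as a string splice: everything outside the two markers is copied through unchanged. The sketch below assumes one marker pair per document; the `updateAutoSection` helper, and its choice to append a fresh block when the markers are missing, are assumptions of this example.

```javascript
const START = '<!-- CARTO:AUTO:START -->';
const END = '<!-- CARTO:AUTO:END -->';

// Replace only the text between the markers; everything else is returned untouched.
// If the markers are missing, append a fresh marked block (an assumption of this sketch).
function updateAutoSection(document, generated) {
  const start = document.indexOf(START);
  const end = start === -1 ? -1 : document.indexOf(END, start + START.length);
  if (start === -1 || end === -1) {
    return `${document.trimEnd()}\n\n${START}\n${generated}\n${END}\n`;
  }
  return `${document.slice(0, start + START.length)}\n${generated}\n${document.slice(end)}`;
}

const original = [
  '# My notes',
  'Hand-written context stays here.',
  START,
  'old generated content',
  END,
  'More hand-written notes.',
].join('\n');

const updated = updateAutoSection(original, 'new generated content');
console.log(updated);
```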
Origin
I was building Emfirge — a cloud security agent that maps AWS infrastructure into a graph and simulates the blast radius of every change.
To make the AI inside Emfirge understand infrastructure, I wrote a module called cartography.py. It mapped AWS resources, built a graph of how they connected, and wrote it into a structured map. The AI stopped hallucinating. It worked with facts, not guesses.
Carto is the same idea, applied to source code. Same insight: AI agents stop guessing once they can query the architecture.
License
MIT. Free forever.
Your code changes. Carto knows. Every AI you use knows.
