Graphify — Knowledge Graph Skill for AI Coding Assistants

5 min read Original article ↗

Open-Source Knowledge Graph Skill

Graphify is an open-source skill that helps AI coding assistants understand multi-modal codebases by building a queryable knowledge graph from code, docs, papers and diagrams.

$ pip install graphifyy

What is Graphify?

Graphify is a multi-modal knowledge graph builder created for AI coding assistants such as Claude Code, OpenAI Codex and OpenCode. By combining Tree-sitter static analysis with LLM-driven semantic extraction, Graphify turns an entire repository — including source code, documentation, research papers and diagrams — into an interactive graph that explains both what the code does and why it was designed that way. The project is maintained by Safi Shamsi, released under the permissive MIT license, and built on widely-trusted libraries including NetworkX and Tree-sitter.

3.7k+GitHub Stars

MITLicense

71.5×Token Reduction

Python 3.10+Runtime

Core Capabilities

Graphify unifies static analysis, semantic extraction and graph clustering into a single skill that any AI coding assistant can invoke.

Multi-Modal Extraction

Parses code (.py, .js, .go, .java, …), Markdown, PDFs and images. Tree-sitter extracts ASTs, call graphs and docstrings; LLMs extract concepts from prose; vision models read diagrams.

Knowledge Graph Build

Merges all extracted nodes and edges into a NetworkX graph and applies the Leiden algorithm for semantic community detection — no vector embeddings required.

God Nodes & Surprises

Identifies the highest-degree "god nodes" at the heart of the system and flags unexpected cross-file or cross-domain connections worth investigating.

Interactive Outputs

Exports an interactive graph.html, a queryable graph.json, and a human-readable GRAPH_REPORT.md audit report.

Assistant Integration

Ships with /graphify, /graphify query, /graphify path and /graphify explain commands for Claude Code, Codex, OpenCode and more.

Secure by Design

Strict input validation: only http/https URLs, size and timeout limits, path containment, HTML-escaped node labels — defending against SSRF, injection and XSS.

Architecture & Pipeline

Graphify is a multi-stage pipeline. Each stage is an isolated module so contributors can extend any step independently.

detect — collect files extract — AST + LLM nodes/edges build — NetworkX graph cluster — Leiden communities analyze — god nodes & surprises report — GRAPH_REPORT.md export — HTML / JSON / Obsidian

Supporting modules include ingest.py for URL fetching, cache.py for semantic caching, security.py for input validation, watch.py for live updates and serve.py for MCP-protocol service.

Install & Run

Graphify is distributed on PyPI. The package name is graphifyy; the CLI command remains graphify.

# Requires Python 3.10+
pip install graphifyy && graphify install

# Build a knowledge graph for any project folder
/graphify ./raw

# Outputs land in graphify-out/
graphify-out/
├── graph.html        # interactive visualization
├── GRAPH_REPORT.md   # core nodes, surprises, suggested questions
├── graph.json        # persistent, queryable graph
└── cache/            # incremental cache

Graphify does not bundle an LLM. It uses the model API key already configured by your AI coding assistant (Claude, Codex, etc.) and only sends semantic content — never raw source code — to the upstream model.

Worked Examples

The repository ships with reproducible corpora demonstrating Graphify on both small libraries and large mixed code-and-paper collections.

httpx (small)

6 Python files modeling an HTTP transport layer. Result: 144 nodes, 330 edges, 6 communities. God nodes: Client, AsyncClient, Response, Request. Surprise edge: DigestAuth → Response.

Karpathy mixed corpus

3 GPT framework repos + 5 attention papers + 4 diagrams (~52 files, ~92k words). Result: 285 nodes, 340 edges, 53 communities. Average query cost ~1.7k tokens vs ~123k naive — a 71.5× reduction.

Comparison

How Graphify relates to adjacent open-source projects in the code-intelligence space.

ProjectFocusStrengthLimitation vs Graphify
SourcegraphCross-repo code searchEnterprise-grade navigationNot a knowledge graph; limited design semantics
Code2VecFunction-level embeddingsVector retrieval & classificationNo graph structure, no multi-modal input
Neo4jGeneral graph databasePowerful Cypher queriesDoes not generate graphs from code itself

Security, Licensing & Trust

Graphify is released under the MIT License. Its core dependencies — NetworkX (BSD) and Tree-sitter (MIT) — are all permissive open-source licenses with no conflicts. The project performs no telemetry. The only outbound network call is the semantic-extraction step, which uses your own configured AI model API key; only semantic descriptions of documents are transmitted, never raw source code. URLs are restricted to http/https, downloads are size- and time-bounded, output paths are containment-checked, and node labels are HTML-escaped to prevent SSRF, Cypher injection and XSS.

Learn more about Graphify

Deeper guides on how Graphify builds, clusters and serves knowledge graphs to AI coding assistants.

Frequently Asked Questions

Does Graphify send my code to a third-party model?

No. Graphify only sends semantic descriptions of documents and diagrams to the AI model you have already configured in your assistant — never raw source files.

Which AI coding assistants are supported?

Claude Code, OpenAI Codex and OpenCode are supported out of the box via dedicated skill-*.md manifests. Any assistant that can call shell commands can invoke graphify.

How large a codebase can Graphify handle?

Tree-sitter parsing and NetworkX construction scale linearly with code size. On a ~500k-word corpus, BFS subgraph queries stay around ~2k tokens versus ~670k naive — preserving compression at scale.

Is Graphify free for commercial use?

Yes. Graphify is MIT-licensed and free for both personal and commercial use.