GitHub - laleshii/engrym: A fast, AI-first code repository knowledge base.

6 min read Original article ↗

A knowledge base for your codebase — stored as Markdown your coding agent can read and write.

engrym captures what a repo knows (architecture, decisions, the non-obvious gotchas) as a graph of plain .md files, then builds a disposable SQLite index over them for instant keyword, semantic, topic, and graph search. It's built so coding agents retrieve that knowledge before a task and record durable findings after — so the same things stop getting re-explained every session. The source of truth stays plain Markdown with a little YAML frontmatter, so a human reviews it in a normal diff.

engrym — a play on engram, a stored memory trace.

Why you'd want it

  • Onboard in minutes, not days. engrym search "how does auth work" returns the exact passage — no spelunking through the codebase.
  • Your agent stops re-deriving the obvious. Knowledge compounds in the repo instead of evaporating when the chat window closes.
  • No lock-in, no cloud. Just Markdown plus a rebuildable index. Embeddings run locally and offline by default — your code never leaves the machine.
  • Reviewable like code. Every fact is a line in a .md file; changes show up in pull requests.

Install

Needs a Rust toolchain. If you don't have one:

brew install rust                                                # macOS (Homebrew)
# or, any platform, via rustup:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh   # then: source "$HOME/.cargo/env"

Then install from crates.io (cargo builds it and places engrym on your PATH; all deps, incl. bundled SQLite, are pulled in — no system libraries needed):

Or build from a clone of this repo:

cargo install --path .     # builds + installs onto PATH
# or just: cargo build --release   → binary at target/release/engrym

Quick start

In any repo, one command sets everything up:

engrym init      # scaffold engrym + hand off to your agent to build the initial KB

init writes engrym.toml, installs the agent skills, and hands a prompt to your coding agent (Claude Code, Codex, …) to author the first docs from your codebase. From then on the agent retrieves and records knowledge on its own.

Just want to try it without touching the repo? Add --local:

engrym init --local   # KB lives outside the repo, in ~/.engrym/ — zero files added

Local mode keeps the entire KB (docs + index) under ~/.engrym/, keyed to the git root, so nothing is committed and the working tree stays clean. It's the low-commitment way to start; everything below works identically. (See Local mode for how the agent still finds it.)

Once a KB exists, query it:

engrym search "how does hybrid search work"   # hybrid keyword + semantic retrieval
engrym topic indexing                         # everything under a topic
engrym related hybrid-search                  # a document's typed graph neighborhood
engrym show engrym-overview                   # print a document
engrym browse                                 # read & navigate the KB in your browser
engrym index                                  # (re)build the index after editing docs

Every command takes --json (for agents) and --repo <dir> (target another repo). This repo dogfoods itself — its docs/ is an engrym KB about engrym, so the queries above all work right here, right now.

Commands

Notation: <required>, [optional], a|b = choose one. Anything not bracketed is typed literally.

Command What it does
engrym init [--local] [--docs <dir>] Scaffold a repo and hand off to an agent
engrym index [--no-embed] (Re)build the index
engrym search <query> [--keyword|--semantic] [--altitude <n>] Retrieve passages
engrym topic <path> List documents under a topic
engrym related <id> Show a document's graph neighborhood
engrym show <id> Print a document
engrym new <id> … Create a document (also set, rm, relocate)
engrym lint [--strict] Validate the frontmatter contract
engrym browse [--port <n>] [--open] Local web UI to read/navigate the KB
engrym serve [--stop] Warm embedding daemon (usually automatic)
engrym install <skills|memory> Install agent skills, or record the repo in agent memory
engrym uninstall <skills|memory> Inverse of install
engrym reset Delete the KB's documents + index (keeps config)
engrym deinit Remove engrym from the repo entirely (inverse of init)

Data model

Each document is Markdown with a small frontmatter contract (full spec: spec/document-schema.md):

---
id: oauth-token-refresh          # required · stable, unique — the identity
title: OAuth token refresh flow  # required
altitude: 3                      # required · 0 = overview … 3 = impl detail
topics: [backend/auth/oauth]     # required · slash-paths, hierarchy implicit
relations:                       # optional · typed edges to other ids
  - { type: refines,    target: auth-architecture }
  - { type: depends_on, target: token-store }
---
Body prose. Inline [[wikilinks]] become `references` edges for free.

Three hierarchies make "abstract → specific" navigable: the topic taxonomy (engrym topic), typed relations (refines/part_of/depends_on/…), and altitude (0–3). A document's id is its identity — links never reference file paths — so the on-disk layout (flat / topic / altitude) is purely for human review, and relocate rearranges files safely.

How it works

  • Hybrid search — BM25 (exact terms, identifiers) and vector cosine (meaning) fused with reciprocal rank fusion. --keyword / --semantic force one ranker; an unembedded index falls back to keyword.
  • Local embeddings — offline by default (fastembed, bge-small-en-v1.5). Your code never leaves the machine; only changed passages re-embed.
  • Warm daemon — the first semantic query spawns a tiny background daemon that keeps the model resident (~130ms → ~13ms), self-terminating when idle. Auto-managed; ENGRYM_NO_DAEMON=1 opts out.
  • Authoringnew/set/rm generate frontmatter (never hand-write it) and edit source files by id, so they work regardless of index freshness.

Agents

init installs two skills into your chosen agent (Claude Code, Codex, …): a bootstrap skill that builds the initial KB, and a working skill that retrieves before a task and captures durable findings after — pull-based and model-judged, never a hook on every prompt.

Local mode (engrym init --local) keeps the entire KB outside the repo (~/.engrym/projects/<repo>-<hash>/, keyed by git root) so the repo is never touched. Because there's then no in-repo cue, init --local also records the repo in the agent's global memory (~/.claude/CLAUDE.md, ~/.codex/AGENTS.md); install/uninstall memory manage it on demand.

Configuration (engrym.toml)

[docs]
root   = "docs"                # where the Markdown KB lives
layout = "altitude"            # flat | topic | altitude

[embedding]
provider = "local"             # offline by default
model    = "bge-small-en-v1.5"

[search]
rrf_k = 60

[lint]
strict = false                 # CI passes --strict

[daemon]
enabled   = true
idle_secs = 300

Architecture

Markdown + frontmatter   →   SQLite index (.engrym/)   →   CLI / agent
(authored, git-tracked)      (derived, gitignored)         (query surface)

The index is never hand-edited and always rebuildable from the docs. Schema: spec/index-schema.sql. Deeper design notes live in the KB itself — try engrym search "…" or read docs/.