This project exposes documentation from any Sphinx-based repository through an MCP stdio server. It can:
- Clone/update a target repo (public or private GitHub).
- Build Sphinx plain-text docs.
- Index docs into SQLite FTS5.
- Optionally build OpenAI embeddings into `sqlite-vec`.
- Serve MCP tools for docs search and source-file reads.
## MCP Tools

- `search_docs(query)`: FTS5 search (default) or hybrid (FTS5 + vector + RRF)
- `read_doc_page(filepath)`: read a full docs page returned by search
- `read_source_code(module_path)`: read a source file from the configured repo
## MCP Resources

The server also exposes explicit resources:

- `mcp://sphinx-docs`
- `mcp://sphinx-docs/index`
- `mcp://sphinx-docs/status`
- `mcp://sphinx-docs/config`

You can change the namespace (`sphinx-docs`) with `MCP_RESOURCE_NAMESPACE` in `.env`.
## Prerequisites

- Python 3.11+
- Git
- Build tools required by your target repo docs (commonly `make`, `pandoc`, LaTeX, etc.)
- For hybrid mode: an OpenAI API key (`OPENAI_API_KEY`)
Sphinx versions and extension requirements are intentionally not pinned in this
project. They should be installed by your target repo via the configurable
commands in .env.
## Quick Start (Local)
From this repository root:
```bash
cp .env.example .env
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt
```

Edit `.env` at minimum:

```bash
REPO_URL=https://github.com/owner/repo.git
REPO_REF=main
SEARCH_MODE=fts5
REPO_INSTALL_COMMAND="pip install -e ."
DOCS_BUILD_COMMAND="make -C docs text"
```
Then run:

```bash
python setup_repo.py
python server.py
```
## `.env` Configuration

Copy `.env.example` to `.env` and set values for your repo.
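Reading such `KEY=value` settings can be sketched with a few lines of Python (a simplified illustration; the project may well use a dedicated loader such as python-dotenv instead):

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

cfg = parse_env('REPO_URL=https://github.com/owner/repo.git\n# note\nSEARCH_MODE="fts5"\n')
print(cfg["SEARCH_MODE"])  # fts5
```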
Required/typical:
- `REPO_URL`: repo clone URL
- `REPO_REF`: branch/tag to clone
- `MCP_RESOURCE_NAMESPACE`: resource URI namespace (default `sphinx-docs`)
- `REPO_DIR`: local checkout path (relative to current working dir)
- `DOCS_TEXT_DIR`: text-doc output path relative to `REPO_DIR` (default `docs/_build/text`)
- `REPO_INSTALL_COMMAND`: dependency/project install command
- `DOCS_BUILD_COMMAND`: command that builds Sphinx text docs
- `SEARCH_MODE`: `fts5` (default) or `hybrid`
Useful optional:
- `DOCS_INSTALL_COMMAND`: extra docs dependency command
- `AUTO_UPDATE_REPO=true`: fetch/pull repo on each `setup_repo.py` run
- `CLONE_DEPTH`: initial clone depth (default `1`, shallow clone)
- `FORCE_REBUILD=true`: rebuild docs + index even if outputs exist
- `DB_PATH`: index database output path
- `SOURCE_ROOT`: root folder used by `read_source_code`
- `VECTOR_DB_PATH`: path to the sqlite database for the vector index (hybrid mode)
- `MAX_FILE_READ_BYTES`: max bytes returned by `read_doc_page`/`read_source_code`
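The effect of a byte cap like `MAX_FILE_READ_BYTES` on file reads can be sketched as follows (a hypothetical helper, not the server's actual code):

```python
import tempfile
from pathlib import Path

def read_capped(path: Path, max_bytes: int) -> str:
    """Read at most max_bytes from a file, decoding UTF-8 leniently."""
    data = path.read_bytes()[:max_bytes]
    return data.decode("utf-8", errors="replace")

# usage sketch: a 100-byte file read with a 10-byte cap
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("x" * 100)
print(len(read_capped(Path(f.name), 10)))  # 10
```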
## Search Modes

### `fts5` (default)
- Uses only SQLite FTS5.
- Requires no OpenAI credentials.
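In this mode the index is an ordinary SQLite FTS5 virtual table; a self-contained sketch of that kind of index (the table and column names here are illustrative, not the server's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages USING fts5(filepath, content)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?)",
    [
        ("api/client.txt", "Client connects to the server over stdio."),
        ("guide/install.txt", "Install dependencies with pip."),
    ],
)
# bm25() returns a relevance score where lower is better, so sort ascending.
rows = conn.execute(
    "SELECT filepath FROM pages WHERE pages MATCH ? ORDER BY bm25(pages)",
    ("stdio",),
).fetchall()
print(rows)  # [('api/client.txt',)]
```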
### `hybrid`
- Uses OpenAI embeddings + `sqlite-vec` nearest-neighbor lookup.
- Combines FTS and vector ranks with Reciprocal Rank Fusion (RRF).
- Requires `OPENAI_API_KEY`.
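RRF scores each document as the sum over both ranked lists of `weight / (k + rank)`; a minimal sketch of the fusion step (an illustrative implementation whose parameters mirror the `.env` settings, not the server's exact code):

```python
def rrf_fuse(fts_ranked, vector_ranked, k=60, fts_weight=1.0, vector_weight=1.0):
    """Fuse two best-first ranked lists of doc ids with Reciprocal Rank Fusion.

    score(d) = sum over lists containing d of weight / (k + rank), rank 1-based.
    """
    scores: dict[str, float] = {}
    for weight, ranked in ((fts_weight, fts_ranked), (vector_weight, vector_ranked)):
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(["a", "b", "c"], ["b", "c", "d"])
print(fused)  # ['b', 'c', 'a', 'd'] -- 'b' wins by ranking high in both lists
```

Documents appearing in both lists accumulate two contributions, which is why `b` and `c` outrank `a` even though `a` tops the FTS list.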
Set in `.env`:

```bash
SEARCH_MODE=hybrid
OPENAI_API_KEY=your-openai-api-key
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
VECTOR_DIMENSIONS=1536
VECTOR_DISTANCE_METRIC=cosine
VECTOR_TOP_K=20
FTS_CANDIDATE_LIMIT=25
RRF_K=60
RRF_FTS_WEIGHT=1.0
RRF_VECTOR_WEIGHT=1.0
```
Important:

- `VECTOR_DIMENSIONS` must exactly match the embedding vector size.
- If `EMBEDDING_DIMENSIONS` is set, it controls the output size from OpenAI.
- `setup_repo.py` rebuilds the vector index when needed in hybrid mode.
- Current limitation: vector rebuild is a full refresh (drop + rebuild), not incremental.
- Both FTS and vector indexes include schema version metadata and are auto-rebuilt when outdated.
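The dimension-match requirement amounts to a guard like this before any vector is stored (a hypothetical check, mirroring what an indexer would need to do):

```python
def validate_embedding(vector: list[float], configured_dims: int) -> list[float]:
    """Reject vectors whose size differs from the configured VECTOR_DIMENSIONS."""
    if len(vector) != configured_dims:
        raise ValueError(
            f"embedding has {len(vector)} dims, expected {configured_dims}; "
            "check EMBEDDING_DIMENSIONS / VECTOR_DIMENSIONS in .env"
        )
    return vector

validate_embedding([0.0] * 1536, 1536)  # ok: sizes agree
```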
## Public vs Private GitHub Repos
For public repos, no credentials are needed.
For private GitHub repos, set:
```bash
GITHUB_USERNAME=your-github-username
GITHUB_TOKEN=your-personal-access-token
```
`setup_repo.py` uses temporary per-process Git auth headers for GitHub HTTPS
operations, so credentials are not written into the remote URL or git config.
Existing local clones can also be reused by setting `REPO_DIR` to that path.
If `AUTO_UPDATE_REPO=true` and the clone was shallow (`CLONE_DEPTH=1`),
`setup_repo.py` automatically unshallows before fetching tags/history.
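One way to pass per-process credentials without touching git config is git's `-c http.extraheader=...` option with an HTTP basic auth header; a sketch of building such a command (an illustration of the mechanism, not necessarily how `setup_repo.py` implements it):

```python
import base64

def git_fetch_cmd(username: str, token: str, remote: str = "origin") -> list[str]:
    """Build a git command that authenticates via a one-off HTTP header."""
    creds = base64.b64encode(f"{username}:{token}".encode()).decode()
    header = f"AUTHORIZATION: basic {creds}"
    # -c applies only to this invocation; nothing is written to .git/config.
    return ["git", "-c", f"http.extraheader={header}", "fetch", remote]

cmd = git_fetch_cmd("alice", "ghp_example")
print(cmd)
```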
## Docker Workflow
From this repository root:
```bash
cp .env.example .env
# edit .env
docker compose build
docker compose run --rm --no-deps -T sphinx-mcp
```

Notes:
- The container runs `/app/start_server.sh` (`setup_repo.py` then `server.py`).
- The Docker healthcheck runs `/app/healthcheck.py` to verify index readiness.
- Data persists in the named Docker volume `sphinx-mcp-data`.
- First run can take time due to clone/build/index.
- In hybrid mode, the first run also generates embeddings and the vector index.
## IDE MCP Configuration

### Option A: Docker-backed MCP

```json
{
  "mcpServers": {
    "sphinx-docs": {
      "command": "docker",
      "args": [
        "compose",
        "-f",
        "/absolute/path/to/sphinxdocs_mcp/docker-compose.yml",
        "run",
        "--rm",
        "--no-deps",
        "-T",
        "sphinx-mcp"
      ]
    }
  }
}
```

### Option B: Local virtualenv MCP
```json
{
  "mcpServers": {
    "sphinx-docs": {
      "command": "/absolute/path/to/sphinxdocs_mcp/.venv/bin/python",
      "args": ["/absolute/path/to/sphinxdocs_mcp/server.py"],
      "cwd": "/absolute/path/to/sphinxdocs_mcp"
    }
  }
}
```

### Option C: IDE on Windows, server in WSL
```json
{
  "mcpServers": {
    "sphinx-docs": {
      "command": "wsl",
      "args": [
        "-e",
        "bash",
        "-lc",
        "cd /absolute/path/to/sphinxdocs_mcp && .venv/bin/python server.py"
      ]
    }
  }
}
```

## Troubleshooting
- If search fails: run `python setup_repo.py` and confirm `DB_PATH` exists.
- If docs file reads fail: verify `DOCS_TEXT_DIR` points to the Sphinx `text` output.
- If a private clone fails: verify PAT scopes and that `REPO_URL` uses `https://github.com/...`.
- If startup is slow: keep `AUTO_UPDATE_REPO=false` and avoid `FORCE_REBUILD` unless needed.
- If hybrid search fails: verify `OPENAI_API_KEY`, `SEARCH_MODE=hybrid`, and matching `EMBEDDING_DIMENSIONS`/`VECTOR_DIMENSIONS`.
- If resource reads fail with `Unknown resource`: verify your client reads one of `mcp://<MCP_RESOURCE_NAMESPACE>`, `/index`, `/status`, or `/config`.