GitHub - AUrbanec/sphinxdocs_mcp: An MCP server that allows AI agents to query the docs of any repo that uses Sphinx

This project exposes documentation from any Sphinx-based repository through an MCP stdio server. It can:

  1. Clone/update a target repo (public or private GitHub).
  2. Build Sphinx plain-text docs.
  3. Index docs into SQLite FTS5.
  4. Optionally build OpenAI embeddings into sqlite-vec.
  5. Serve MCP tools for docs search and source-file reads.
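
Step 3 can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch, not the project's actual indexer: the table name `docs`, its columns, and the function `build_index` are hypothetical, and it assumes the local SQLite build includes FTS5 and that the Sphinx text builder wrote `.txt` pages.

```python
import sqlite3
from pathlib import Path


def build_index(db_path: str, docs_text_dir: str) -> int:
    """Index every .txt page under docs_text_dir into an FTS5 table."""
    root = Path(docs_text_dir)
    conn = sqlite3.connect(db_path)
    try:
        # Full refresh: drop and recreate the (hypothetical) docs table.
        conn.execute("DROP TABLE IF EXISTS docs")
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(filepath, content)")
        count = 0
        for page in sorted(root.rglob("*.txt")):
            text = page.read_text(encoding="utf-8", errors="replace")
            conn.execute(
                "INSERT INTO docs (filepath, content) VALUES (?, ?)",
                (page.relative_to(root).as_posix(), text),
            )
            count += 1
        conn.commit()
        return count
    finally:
        conn.close()
```

Storing the path relative to the docs directory keeps the index portable if the checkout moves.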

MCP Tools

  • search_docs(query): FTS5 search (default) or hybrid (FTS5 + vector + RRF)
  • read_doc_page(filepath): read a full docs page returned by search
  • read_source_code(module_path): read a source file from the configured repo

MCP Resources

The server also exposes explicit resources:

  • mcp://sphinx-docs
  • mcp://sphinx-docs/index
  • mcp://sphinx-docs/status
  • mcp://sphinx-docs/config

You can change the namespace (sphinx-docs) with MCP_RESOURCE_NAMESPACE in .env.

Prerequisites

  • Python 3.11+
  • Git
  • Build tools required by your target repo docs (commonly make, pandoc, LaTeX, etc.)
  • For hybrid mode: OpenAI API key (OPENAI_API_KEY)

This project intentionally does not pin Sphinx versions or extension requirements. Install them from your target repo through the configurable commands in .env.

Quick Start (Local)

From this repository root:

cp .env.example .env
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip setuptools wheel
pip install -r requirements.txt

Edit .env and set at least:

REPO_URL=https://github.com/owner/repo.git
REPO_REF=main
SEARCH_MODE=fts5
REPO_INSTALL_COMMAND="pip install -e ."
DOCS_BUILD_COMMAND="make -C docs text"

Then run:

python setup_repo.py
python server.py

.env Configuration

Copy .env.example to .env and set values for your repo.

Required/typical:

  • REPO_URL: repo clone URL
  • REPO_REF: branch/tag to clone
  • MCP_RESOURCE_NAMESPACE: resource URI namespace (default sphinx-docs)
  • REPO_DIR: local checkout path (relative to current working dir)
  • DOCS_TEXT_DIR: text-doc output path relative to REPO_DIR (default docs/_build/text)
  • REPO_INSTALL_COMMAND: dependency/project install command
  • DOCS_BUILD_COMMAND: command that builds Sphinx text docs
  • SEARCH_MODE: fts5 (default) or hybrid
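
The sketch below shows one way these values could be resolved, applying the documented defaults and joining DOCS_TEXT_DIR onto REPO_DIR. The function `resolve_settings` and the returned dictionary keys are hypothetical; the project's own loader may behave differently.

```python
from pathlib import Path

VALID_SEARCH_MODES = {"fts5", "hybrid"}


def resolve_settings(env: dict[str, str]) -> dict[str, object]:
    """Apply documented defaults and resolve the text-docs directory."""
    mode = env.get("SEARCH_MODE", "fts5")
    if mode not in VALID_SEARCH_MODES:
        raise ValueError(f"SEARCH_MODE must be fts5 or hybrid, got {mode!r}")
    repo_dir = Path(env["REPO_DIR"])
    # DOCS_TEXT_DIR is relative to REPO_DIR, not to the working directory.
    docs_text_dir = repo_dir / env.get("DOCS_TEXT_DIR", "docs/_build/text")
    return {"search_mode": mode, "repo_dir": repo_dir, "docs_text_dir": docs_text_dir}
```

For example, `resolve_settings({"REPO_DIR": "checkout"})` yields a docs directory of `checkout/docs/_build/text` and the fts5 search mode.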

Useful optional:

  • DOCS_INSTALL_COMMAND: extra docs dependency command
  • AUTO_UPDATE_REPO=true: fetch/pull repo on each setup_repo.py run
  • CLONE_DEPTH: initial clone depth (default 1, shallow clone)
  • FORCE_REBUILD=true: rebuild docs + index even if outputs exist
  • DB_PATH: index database output path
  • SOURCE_ROOT: root folder used by read_source_code
  • VECTOR_DB_PATH: path to sqlite database for vector index (hybrid mode)
  • MAX_FILE_READ_BYTES: max bytes returned by read_doc_page/read_source_code
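
SOURCE_ROOT and MAX_FILE_READ_BYTES suggest a bounded read confined to one directory. The following sketch shows that idea under stated assumptions: the function `read_limited` is hypothetical, and the path-containment check is an assumption about sensible behavior rather than something this README confirms the server does.

```python
from pathlib import Path


def read_limited(root: str, relative_path: str, max_bytes: int) -> str:
    """Read at most max_bytes from a file that must live under root."""
    root_path = Path(root).resolve()
    target = (root_path / relative_path).resolve()
    # Reject paths such as "../secrets" that resolve outside the root.
    if not target.is_relative_to(root_path):
        raise ValueError(f"{relative_path!r} is outside {root_path}")
    with target.open("rb") as handle:
        data = handle.read(max_bytes)
    # A cut in the middle of a multi-byte character becomes U+FFFD.
    return data.decode("utf-8", errors="replace")
```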

Search Modes

fts5 (default)

  • Uses only SQLite FTS5.
  • Requires no OpenAI credentials.

hybrid

  • Uses OpenAI embeddings + sqlite-vec nearest-neighbor lookup.
  • Combines FTS and vector ranks with Reciprocal Rank Fusion (RRF).
  • Requires OPENAI_API_KEY.
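
Reciprocal Rank Fusion scores each document as the sum of weight / (k + rank) over the rankings it appears in, with ranks starting at 1. The sketch below is a generic weighted RRF whose parameters mirror RRF_K, RRF_FTS_WEIGHT, and RRF_VECTOR_WEIGHT; the function name and the alphabetical tie-break are assumptions, not the project's code.

```python
def rrf_fuse(
    fts_ranked: list[str],
    vector_ranked: list[str],
    k: int = 60,
    fts_weight: float = 1.0,
    vector_weight: float = 1.0,
) -> list[str]:
    """Merge two best-first rankings into one using weighted RRF."""
    scores: dict[str, float] = {}
    for weight, ranked in ((fts_weight, fts_ranked), (vector_weight, vector_ranked)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fused = rrf_fuse(["a", "b", "c"], ["b", "d", "a"])
```

Here `b` ranks first because it places second and first in the two lists, which narrowly beats `a` at first and third. A larger k flattens the difference between adjacent ranks.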

Set in .env:

SEARCH_MODE=hybrid
OPENAI_API_KEY=your-openai-api-key
EMBEDDING_MODEL=text-embedding-3-small
EMBEDDING_DIMENSIONS=1536
VECTOR_DIMENSIONS=1536
VECTOR_DISTANCE_METRIC=cosine
VECTOR_TOP_K=20
FTS_CANDIDATE_LIMIT=25
RRF_K=60
RRF_FTS_WEIGHT=1.0
RRF_VECTOR_WEIGHT=1.0

Important:

  • VECTOR_DIMENSIONS must exactly match embedding vector size.
  • If EMBEDDING_DIMENSIONS is set, it controls output size from OpenAI.
  • setup_repo.py rebuilds vector index when needed in hybrid mode.
  • Current limitation: vector rebuild is full refresh (drop + rebuild), not incremental.
  • Both FTS and vector indexes include schema version metadata and are auto-rebuilt when outdated.

Public vs Private GitHub Repos

For public repos, no credentials are needed.

For private GitHub repos, set:

GITHUB_USERNAME=your-github-username
GITHUB_TOKEN=your-personal-access-token

setup_repo.py uses temporary per-process Git auth headers for GitHub HTTPS operations, so credentials are not written into the remote URL or git config. Existing local clones can also be reused by setting REPO_DIR to that path. If AUTO_UPDATE_REPO=true and the clone was shallow (CLONE_DEPTH=1), setup_repo.py automatically unshallows before fetching tags/history.
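
One way to pass credentials per process, without touching the remote URL or git config, is Git's environment-based configuration. The sketch below is an assumption about how this could be done, not a description of setup_repo.py: it relies on the GIT_CONFIG_COUNT, GIT_CONFIG_KEY_<n>, and GIT_CONFIG_VALUE_<n> variables (Git 2.31 or newer) and on the http.<url>.extraHeader setting, and `github_auth_env` is a hypothetical name.

```python
import base64
import os


def github_auth_env(username: str, token: str, base_env=None) -> dict[str, str]:
    """Return an environment that adds a Basic auth header for github.com."""
    env = dict(os.environ if base_env is None else base_env)
    credentials = base64.b64encode(f"{username}:{token}".encode()).decode()
    # The header exists only in this environment mapping; nothing is written
    # to .git/config. Any GIT_CONFIG_COUNT already present is overwritten.
    env["GIT_CONFIG_COUNT"] = "1"
    env["GIT_CONFIG_KEY_0"] = "http.https://github.com/.extraheader"
    env["GIT_CONFIG_VALUE_0"] = f"Authorization: Basic {credentials}"
    return env
```

The returned mapping would be passed to a child process, for example `subprocess.run(["git", "fetch", "--tags"], cwd=repo_dir, env=env, check=True)`, where `repo_dir` stands for the local checkout.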

Docker Workflow

From this repository root:

cp .env.example .env
# edit .env
docker compose build
docker compose run --rm --no-deps -T sphinx-mcp

Notes:

  • The container runs /app/start_server.sh (setup_repo.py then server.py).
  • Docker healthcheck runs /app/healthcheck.py to verify index readiness.
  • Data persists in the named Docker volume sphinx-mcp-data.
  • First run can take time due to clone/build/index.
  • In hybrid mode, first run also generates embeddings and vector index.
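
An index-readiness check can be as simple as confirming that the database file opens and contains at least one table. The sketch below shows that idea; it is not the contents of /app/healthcheck.py, and `index_ready` is a hypothetical name.

```python
import sqlite3
from pathlib import Path


def index_ready(db_path: str) -> bool:
    """Return True if db_path is a readable SQLite file with tables."""
    path = Path(db_path)
    if not path.is_file():
        return False
    try:
        # Open read-only so a health probe can never modify the index.
        conn = sqlite3.connect(f"{path.resolve().as_uri()}?mode=ro", uri=True)
        try:
            (tables,) = conn.execute(
                "SELECT count(*) FROM sqlite_master WHERE type = 'table'"
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.Error:
        return False
    return tables > 0
```

A health script built on this would exit with status 0 when the function returns True and non-zero otherwise, which is how Docker distinguishes healthy from unhealthy containers.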

IDE MCP Configuration

Option A: Docker-backed MCP

{
  "mcpServers": {
    "sphinx-docs": {
      "command": "docker",
      "args": [
        "compose",
        "-f",
        "/absolute/path/to/sphinxdocs_mcp/docker-compose.yml",
        "run",
        "--rm",
        "--no-deps",
        "-T",
        "sphinx-mcp"
      ]
    }
  }
}

Option B: Local virtualenv MCP

{
  "mcpServers": {
    "sphinx-docs": {
      "command": "/absolute/path/to/sphinxdocs_mcp/.venv/bin/python",
      "args": ["/absolute/path/to/sphinxdocs_mcp/server.py"],
      "cwd": "/absolute/path/to/sphinxdocs_mcp"
    }
  }
}

Option C: IDE on Windows, server in WSL

{
  "mcpServers": {
    "sphinx-docs": {
      "command": "wsl",
      "args": [
        "-e",
        "bash",
        "-lc",
        "cd /absolute/path/to/sphinxdocs_mcp && .venv/bin/python server.py"
      ]
    }
  }
}

Troubleshooting

  • If search fails: run python setup_repo.py and confirm DB_PATH exists.
  • If docs file reads fail: verify DOCS_TEXT_DIR points to Sphinx text output.
  • If private clone fails: verify the PAT scopes and that REPO_URL uses the https://github.com/... form.
  • If startup is slow: keep AUTO_UPDATE_REPO=false and avoid FORCE_REBUILD unless needed.
  • If hybrid search fails: verify OPENAI_API_KEY, SEARCH_MODE=hybrid, and matching EMBEDDING_DIMENSIONS/VECTOR_DIMENSIONS.
  • If resource reads fail with Unknown resource: verify your client reads mcp://<MCP_RESOURCE_NAMESPACE> or one of its /index, /status, or /config sub-paths.