GitHub - 2dogsandanerd/DAUT: DAUT – Documentation Auto Updater - AI-powered documentation generator for your codebase. MCP-Connector


📚 DAUT - Documentation Auto-Update Tool

AI-powered documentation generator that keeps your docs in sync with your code

Python 3.9+ | License: MIT | Streamlit

DAUT scans your codebase, detects undocumented code, and automatically generates documentation using an LLM running on Ollama. It is well suited to keeping API docs, class references, and function documentation up to date across Python, JavaScript, and TypeScript projects.

✨ Features

  • 🔍 Universal Code Scanner - Detects functions, classes, and API endpoints across Python, JS, and TS
  • 🤖 AI Documentation Generation - Uses Ollama to generate human-readable docs
  • 📊 Live Progress Tracking - Real-time progress bars and statistics
  • 🎯 Smart File Detection - Respects .gitignore, skips venv/node_modules automatically
  • 💾 ChromaDB Integration - Semantic search and context-aware documentation
  • ⚡ Resume Support - Skip already-generated docs, continue where you left off
  • 🔌 MCP Server - Expose RAG capabilities to external agents (Claude, Cursor, etc.)
  • 🎨 Beautiful UI - Streamlit-based interface + powerful CLI

🧠 RAG Strategy (Under the Hood)

DAUT indexes code and documentation by structure rather than by arbitrary text chunks. The approach rests on three choices:

  1. Unified Knowledge Base 🌐 All files, regardless of their folder depth, are indexed into a single project-wide collection (e.g., rag_enterprise_core_code). This prevents context fragmentation and ensures the AI sees the "Big Picture".

  2. Full-Content Embedding 📖 Unlike simple splitters that chop text into arbitrary chunks, DAUT indexes the full content of your documentation files. This preserves the complete context of tutorials and guides.

  3. Structure-Aware Code Indexing 🏗️ Code is not just text. We parse the AST (Abstract Syntax Tree) to treat Classes, Functions, and API Endpoints as distinct semantic entities.
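
As a rough illustration of structure-aware indexing, the sketch below uses Python's standard `ast` module to turn a source string into one record per class or function. The `extract_entities` helper and the record fields (`kind`, `name`, `lineno`, `doc`) are hypothetical names chosen for this example; DAUT's own parser and schema may look different.

```python
import ast

def extract_entities(source: str) -> list[dict]:
    """Return one record per class or function found in `source`."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue  # ignore everything that is not a class or function
        entities.append({
            "kind": kind,
            "name": node.name,
            "lineno": node.lineno,
            "doc": ast.get_docstring(node),  # None when the element is undocumented
        })
    # ast.walk does not guarantee source order, so sort by line number
    return sorted(entities, key=lambda e: e["lineno"])

sample = '''
class SessionStore:
    """Keeps session history."""
    def get(self, session_id):
        return None
'''
entities = extract_entities(sample)
```

Each record can then be embedded and stored as its own entry, and a `doc` value of `None` is a simple signal that the element has no docstring yet.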

🚀 Quick Start

Installation

# Clone and setup
git clone <your-repo>
cd doc_updater_app
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install -r requirements.txt

Generate Docs in 3 Steps

# Launch UI
streamlit run src/ui/main.py

  1. Select Project → Browse to your codebase
  2. Scan → Analyze code and find undocumented elements
  3. Generate → AI creates comprehensive docs

CLI Mode:

python -m src.docs_updater /path/to/project

📸 Screenshots

Scan Progress

Scanning with real-time progress

Analysis Dashboard

Discrepancy analysis and statistics

AI Documentation Generation

Live documentation generation with Ollama

Generated Documentation Files

Auto-generated markdown documentation

Documentation Files Overview

Explorer view of generated docs


🎯 Use Cases

  • API Documentation - Auto-generate REST API endpoint docs
  • Code Onboarding - Help new developers understand your codebase
  • Documentation Audits - Find and fix documentation gaps
  • Legacy Code - Document undocumented legacy systems
  • Continuous Docs - Keep docs in sync with code changes

💡 Best Practices

Ensure High-Quality RAG Results

To avoid "diluting" the AI's knowledge base with outdated information:

  • Prioritize auto_docs: These files are generated directly from the current codebase and represent the "source of truth".
  • Exclude Legacy Docs: If you have an old docs/ folder with manual (potentially outdated) documentation, consider adding docs/ to the Exclude Patterns in the Filter Management sidebar.
  • Why? If the RAG system indexes both current code (via auto_docs) and outdated manuals (via docs/), it might retrieve conflicting information. By filtering out legacy docs, you ensure a "Pure Code-Truth" knowledge base.
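
The filtering idea above can be sketched in a few lines of Python. This is only an illustration: `is_indexable`, `DEFAULT_SKIP_DIRS`, and the way patterns such as `docs/` are matched (by directory name) are assumptions made for the example, not DAUT's actual filter logic, and real .gitignore handling is more involved.

```python
from pathlib import PurePosixPath

# Directories skipped by default in this sketch (an assumption, not DAUT's real list).
DEFAULT_SKIP_DIRS = {"venv", ".venv", "node_modules", ".git", "__pycache__"}

def is_indexable(rel_path: str, exclude_patterns=()) -> bool:
    """Return False if any directory component of rel_path is excluded."""
    # "docs/" and "docs" are treated the same way: as a directory name
    excluded = DEFAULT_SKIP_DIRS | {p.strip("/") for p in exclude_patterns}
    directories = PurePosixPath(rel_path).parts[:-1]  # drop the file name
    return not any(part in excluded for part in directories)

paths = [
    "auto_docs/get_session.api.md",
    "docs/old_manual.md",
    "venv/lib/site.py",
    "src/app.py",
]
kept = [p for p in paths if is_indexable(p, exclude_patterns=["docs/"])]
```

Matching on whole directory names keeps `auto_docs/` in the index while `docs/` is dropped, which is the behaviour the recommendation relies on.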

📋 Example Output

Input: Python function

def get_session(session_id: str):
    """Retrieve session history."""
    return db.query(session_id)

Generated Documentation:

## get_session

### Description
The `get_session` API endpoint retrieves the conversation history for
a specific session. Requires permission to view session history.

### Parameters
| Name | Type | Default |
|------|------|---------|
| session_id | str | None |

### Return Value
Returns the session history including session ID and message list.

### Example
```bash
GET /sessions/12345
```

### Error Handling
Returns 500 on errors, 403 if permission denied.

Project Structure

doc_updater_app/
├── src/
│   ├── core/       # Config management, project analysis
│   ├── scanner/    # Code & documentation scanners
│   ├── matcher/    # Discrepancy detection
│   ├── llm/        # Ollama integration
│   ├── chroma/     # ChromaDB vector store
│   ├── updater/    # Documentation update engine
│   └── ui/         # Streamlit interface
├── requirements.txt
└── setup.py


🔧 Configuration

service_config.json:

```json
{
  "ollama_host": "http://localhost:11434",
  "chroma_host": "localhost",
  "chroma_port": 8000,
  "ollama_timeout": 120
}
```
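
For readers who want to consume the same settings from their own scripts, here is a minimal sketch of loading the file with fallbacks. The `load_service_config` helper and the behaviour of falling back to defaults when the file is missing are assumptions for illustration; how DAUT itself reads the file is not shown in this README.

```python
import json
from pathlib import Path

# Defaults mirror the values shown in service_config.json.
DEFAULTS = {
    "ollama_host": "http://localhost:11434",
    "chroma_host": "localhost",
    "chroma_port": 8000,
    "ollama_timeout": 120,
}

def load_service_config(path="service_config.json") -> dict:
    """Merge values from the JSON file over DEFAULTS; a missing file yields the defaults."""
    config = dict(DEFAULTS)
    config_file = Path(path)
    if config_file.exists():
        config.update(json.loads(config_file.read_text(encoding="utf-8")))
    return config

# A path that does not exist simply returns the defaults
config = load_service_config("no_such_service_config.json")
```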

🛠️ Requirements

  • Python 3.9+
  • Ollama (optional, for AI generation)
    # Install: https://ollama.ai
    ollama pull llama3
  • ChromaDB (optional, for semantic search)
    pip install chromadb
    chroma run --path ./chromadb_data --port 8000

📚 Supported Languages & Formats

Code:

  • Python (.py)
  • JavaScript/TypeScript (.js, .ts, .tsx, .jsx)

Documentation:

  • Markdown (.md)
  • reStructuredText (.rst)
  • Plain text (.txt)

🎨 Features in Detail

Smart Progress Tracking

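🔍 Scanning: [45/1234] 3.6% - api_service.py

[1/150] Verarbeite: get_session (api_endpoint)
    ✅ Gespeichert: get_session.api.md
[2/150] Verarbeite: delete_session (api_endpoint)
    ⏭️  Übersprungen (existiert): delete_session.api.md

The generation log is printed in German: "Verarbeite" means "processing", "Gespeichert" means "saved", and "Übersprungen (existiert)" means "skipped (already exists)".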

Resume Support

Stop and restart anytime - already generated docs are automatically skipped!
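
One simple way to get this behaviour is to check whether the target file already exists before generating anything. The sketch below assumes output files are named `<name>.<suffix>.md` (as in `get_session.api.md`); the `pending_elements` helper is hypothetical and not taken from DAUT's source.

```python
import tempfile
from pathlib import Path

def pending_elements(elements, output_dir):
    """Yield (name, suffix) pairs whose documentation file does not exist yet."""
    out = Path(output_dir)
    for name, suffix in elements:
        target = out / f"{name}.{suffix}.md"  # e.g. get_session.api.md
        if not target.exists():
            yield name, suffix

# Demo: one doc already exists on disk, so only the other one is still pending
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "delete_session.api.md").write_text("# delete_session\n")
    todo = list(pending_elements(
        [("get_session", "api"), ("delete_session", "api")], tmp
    ))
```

An existence check is cheap and survives restarts without extra state, at the cost of not noticing when an existing doc has become outdated.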

Discrepancy Analysis

  • Undocumented Code - Functions/classes without docs

  • Outdated Documentation - Docs that don't match current code

  • Mismatched Elements - Signature changes, parameter updates
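
To make the "mismatched elements" category concrete, the following sketch compares the parameters a function actually takes with the parameters its documentation lists. The helper names (`parameter_names`, `signature_drift`) and the result keys are assumptions for this example; DAUT's matcher may use different criteria.

```python
import ast

def parameter_names(source: str, func_name: str) -> list[str]:
    """Parameter names of `func_name` as defined in `source`."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == func_name:
            args = node.args
            return [a.arg for a in args.posonlyargs + args.args + args.kwonlyargs]
    raise LookupError(f"{func_name} not found")

def signature_drift(source: str, func_name: str, documented: list[str]) -> dict:
    """Return parameters missing from the docs and parameters the docs still list."""
    actual = parameter_names(source, func_name)
    return {
        "undocumented": [p for p in actual if p not in documented],
        "stale": [p for p in documented if p not in actual],
    }

code = "def get_session(session_id: str, limit: int = 50):\n    return None\n"
drift = signature_drift(code, "get_session", documented=["session_id", "user"])
```

Here `limit` was added to the code without a doc update, and `user` is still documented although the function no longer accepts it.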


🔌 MCP Server Integration

DAUT includes a Model Context Protocol (MCP) server, allowing you to connect external AI agents (like Claude Desktop, Cursor, or other LLMs) directly to your project's knowledge base.

Features

  • Secure Access: Protected via API Key (Bearer Token).
  • RAG Tools:
    • query_rag(query): Semantic search in your code and documentation.
    • read_documentation_file(path): Read full content of generated docs.
    • list_documentation_files(): List available documentation.
  • Monitoring: Live connection tracking via the Web UI.

🚀 Usage

Manual Start:

# Start the server (Default port: 8001)
./start_mcp.sh

Auto-Start (Systemd): Run as a background service that survives reboots:

chmod +x install_service.sh
sudo ./install_service.sh

🔐 Security & Configuration

The server requires an API key for all requests. The default key (secret-token-123) is publicly known, so you MUST set your own key before exposing the server.

Setting the API Key:

  1. Edit start_mcp.sh (for manual start) or daut-mcp.service (for systemd).
  2. Change the MCP_API_KEY variable:
    export MCP_API_KEY="your-secure-password-here"
  3. Restart the server.

Environment Variables:

| Variable | Default | Description |
|----------|---------|-------------|
| MCP_PORT | 8001 | Port for the MCP SSE endpoint |
| MCP_HOST | 0.0.0.0 | Bind address |
| MCP_API_KEY | secret-token-123 | REQUIRED: Auth token for clients |
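
A small sketch of how these variables could be read in Python, including a flag for the case where the default key is still in use. The `read_mcp_settings` helper and the `uses_default_key` field are hypothetical and only illustrate the documented defaults; they are not taken from the server's code.

```python
import os

DEFAULT_API_KEY = "secret-token-123"  # documented default; should be replaced

def read_mcp_settings(env=None) -> dict:
    """Read MCP_* variables from `env` (defaults to os.environ) with the documented defaults."""
    env = os.environ if env is None else env
    settings = {
        "host": env.get("MCP_HOST", "0.0.0.0"),
        "port": int(env.get("MCP_PORT", "8001")),
        "api_key": env.get("MCP_API_KEY", DEFAULT_API_KEY),
    }
    # Flag the insecure case so a caller can refuse to start or print a warning
    settings["uses_default_key"] = settings["api_key"] == DEFAULT_API_KEY
    return settings

settings = read_mcp_settings({"MCP_PORT": "9001", "MCP_API_KEY": "change-me"})
```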

Connect a Client:

  • URL: http://<your-server-ip>:8001/mcp/sse
  • Auth: Header Authorization: Bearer <your-key>
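
The connection details above translate into a request like the one sketched here with Python's standard library. The `build_sse_request` helper is hypothetical, and the `Accept: text/event-stream` header is an assumption based on the endpoint being SSE. The code only builds the request object; it does not open a connection or perform the MCP handshake, which an MCP client would handle.

```python
import urllib.request

def build_sse_request(server: str, api_key: str, port: int = 8001) -> urllib.request.Request:
    """Build (but do not send) an authenticated request for the MCP SSE endpoint."""
    url = f"http://{server}:{port}/mcp/sse"
    return urllib.request.Request(url, headers={
        "Authorization": f"Bearer {api_key}",
        "Accept": "text/event-stream",  # assumption: standard SSE media type
    })

request = build_sse_request("127.0.0.1", "your-key")
```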

🤝 Contributing

Contributions welcome! This project is under active development.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

🚦 Project Status

Current Version: 1.0.0 (Stable)

All core features implemented:

  • ✅ Universal code scanning
  • ✅ AI documentation generation
  • ✅ Progress tracking and resume support
  • ✅ ChromaDB integration
  • ✅ Streamlit UI + CLI

📞 Support

Found a bug or have a feature request? Open an issue!


Made with ❤️ for developers who love good documentation