# Enterprise RAG Core – Feature Manifest V2.55

Note: This manifest describes a system designed for "Zero Data Loss" ingestion. It prioritizes accuracy and auditability over speed.

**Version:** V2.55 (Public Release Candidate)

**Status:** Production Ready / Code Verified

**Summary:** A high-precision, hybrid Graph-Vector RAG platform featuring a multi-lane ingestion engine with consensus reconciliation.

---

## 1. Agent Service (The Orchestrator)

*Handles query planning, decomposition, and context synthesis.*

* **Query Decomposition Engine:**
    * **Plan-and-Solve Pattern:** Breaks complex user prompts into multi-step execution plans.
    * **Sub-Query Modeling:** Auto-generates dependencies (`depends_on`) between steps.
    * **Routing:** Dynamically routes sub-queries to Vector Search, Graph Traversal, or Mathematical Calculation tools.
* **Semantic Caching (Redis + Embeddings):**
    * **Similarity Matching:** Caches responses based on vector similarity (>95%) rather than exact string matching.
    * **Performance:** ~40x latency reduction for recurring semantic queries.
    * **Cost Efficiency:** Estimated 80% reduction in LLM token usage for high-traffic topics.
* **Resilience & API:**
    * **Streaming Responses:** Real-time token streaming with state progress updates.
    * **Rate Limiting:** IP-based throttling (SlowAPI).
    * **Session Management:** Full conversation history with "Time-to-Live" support.

## 2. Ingest Service (The Multi-Lane Engine)

*The system's USP. Instead of a single extraction method, we use parallel lanes and a consensus engine.*

* **Pipeline Routing ("The Triage"):**
    * Analyzes incoming documents for complexity (layout, scan quality, text layer).
    * Routes to one or more specific processing lanes based on confidence scoring.
* **Multi-Lane Architecture:**
    * **Lane A (Fast/LedZeppelin):** Raw text extraction via PyMuPDF. <100 ms/page.
    * **Lane B (Smart/Goethe):** Structure-aware extraction using Docling (Tables, Headers, Markdown).
    * **Lane C (Vision/Hawk):** VLM-based extraction (Ollama Vision) for charts, photos, and complex layouts.
* **"Solomon" Consensus Engine:**
    * **Parallel Execution:** Runs selected lanes concurrently.
    * **Reconciliation:** Compares "Ground Truth" (Text) against "Visual" (Vision) layers.
    * **Conflict Resolution:** Merges outputs to maximize coverage and accuracy.
* **Entity Extraction:**
    * LLM-based extraction of 8 core entity types (Person, Org, Location, Skill, etc.) and 7 relationship types.
    * **JSON Schema Enforcement:** Ensures strict adherence to the Graph Schema.
* **Control Room API:**
    * Real-time WebSocket feeds for pipeline status.
    * Live metrics on batch processing and lane performance.

## 3. Knowledge Service (Vector Layer)

*Optimized for unstructured semantic retrieval.*

* **ChromaDB Integration:**
    * Custom HNSW configuration for Cosine/L2 distance metrics.
    * Batch chunk ingestion with metadata preservation (page numbers, source refs).
* **Retrieval Logic:**
    * **Metadata Filtering:** Pre-filtering chunks based on document ownership or attributes.
    * **Score Normalization:** Standardizes distance metrics to similarity scores (0-1).

## 4. Graph Service (Context Layer)

*Optimized for structured relationships and "Multi-Hop" reasoning.*

* **Neo4j Implementation:**
    * **Strict Schema:** Pre-defined ontology for Enterprise contexts (Entities: `Organization`, `Person`, `Contract`, etc.).
    * **Cypher Injection Protection:** Strict allow-listing of property names and relationship types.
* **Graph Algorithms:**
    * **Traversal:** Recursive retrieval of connected entities (e.g., "Who reports to X who works on Project Y?").
    * **Constraint Management:** Enforced uniqueness constraints to prevent node duplication.

## 5. Shared Infrastructure & Security

* **Observability Stack:**
    * **OpenTelemetry:** Distributed tracing across all microservices (instrumented for Jaeger).
    * **Prometheus/Grafana:** Metrics for request latency, LLM token usage, and cache hit rates.
    * **Audit Logging:** Immutable logs of every data access intent (e.g., "User X viewed Document Y").
* **Security:**
    * **RBAC:** Granular permissions (`INGEST`, `READ`, `ADMIN`, `SERVICE_INTERN`).
    * **Auth:** Dual-mode authentication (JWT for users, API Keys for service-to-service).
    * **Input Sanitization:** Protection against Path Traversal and ReDoS (Regular Expression Denial of Service).

## 6. Tech Stack

* **Core:** Python 3.11, FastAPI, Pydantic V2
* **Orchestration:** LangGraph, LangChain
* **Databases:** Neo4j 5.x (Graph), ChromaDB (Vector), Redis 7 (Cache/Queue)
* **AI/ML:** Ollama (Local Inference), Docling (PDF Processing), PyMuPDF
* **Ops:** Docker Compose, Prometheus, OpenTelemetry

---

## Roadmap (Coming in V3.0)

* **Cross-Encoder Reranking:** Re-scoring retrieval results for higher precision.
* **Full Human-in-the-Loop (HITL) UI:** A frontend for manually resolving the specific edge cases flagged by the Solomon Consensus Engine.
* **Distributed Tracing UI:** Full visualization of the request lifecycle in Grafana.

---

### Why this exists?

Most RAG systems fail on complex PDFs or hallucinate relationships. By splitting ingestion into **"Lanes"** (Text, Vision, Layout) and having them vote on the result (**Consensus**), and then storing the data in both a **Vector DB** (for fuzzy search) and a **Knowledge Graph** (for hard facts), we achieve a much higher level of factual consistency for enterprise use cases.
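
To make the `depends_on` modeling from the Query Decomposition Engine (section 1) more concrete, here is a minimal sketch of how a plan could be ordered so that every step runs after its dependencies. The `SubQuery` class, its fields other than `depends_on`, and the tool labels are assumptions for illustration only; the manifest does not show the real data model.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SubQuery:
    """Hypothetical sub-query model; only `depends_on` is named in the manifest."""
    id: str
    text: str
    tool: str  # assumed labels: "vector", "graph", or "math"
    depends_on: list[str] = field(default_factory=list)


def execution_order(plan: list[SubQuery]) -> list[str]:
    """Return step ids ordered so that dependencies always come first."""
    # TopologicalSorter expects {node: predecessors}; cycles raise graphlib.CycleError.
    sorter = TopologicalSorter({step.id: step.depends_on for step in plan})
    return list(sorter.static_order())


plan = [
    SubQuery("s3", "Compute total contract value", "math", depends_on=["s1", "s2"]),
    SubQuery("s1", "Find contracts for Project Y", "vector"),
    SubQuery("s2", "Who signed those contracts?", "graph", depends_on=["s1"]),
]
order = execution_order(plan)
```

In this toy plan the only valid order is `s1`, `s2`, `s3`. A cyclic plan fails fast with `graphlib.CycleError`, which is one reasonable place to reject a malformed decomposition.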
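
The manifest does not spell out how the "Solomon" Consensus Engine (section 2) reconciles lane outputs, so the following is only one plausible reading, not the actual algorithm. It assumes each lane yields plain text per page, uses `difflib.SequenceMatcher` as a stand-in agreement measure, and invents the `Reconciled` type, the `reconcile` function, and the 0.8 threshold for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Reconciled:
    text: str
    source: str         # "text", "vision", or "merged" (assumed labels)
    needs_review: bool  # True when the lanes disagree or nothing was extracted


def reconcile(text_lane: str, vision_lane: str, agree_threshold: float = 0.8) -> Reconciled:
    """Combine a text-layer extraction with a vision extraction of the same page."""
    text_lane, vision_lane = text_lane.strip(), vision_lane.strip()
    if not vision_lane:
        # Nothing from vision: keep the text layer, flag only if it is empty too.
        return Reconciled(text_lane, "text", needs_review=not text_lane)
    if not text_lane:
        # No text layer (e.g. a scan): fall back to the vision output.
        return Reconciled(vision_lane, "vision", needs_review=False)
    similarity = SequenceMatcher(None, text_lane, vision_lane).ratio()
    if similarity >= agree_threshold:
        # Lanes agree: prefer the text layer as "ground truth".
        return Reconciled(text_lane, "text", needs_review=False)
    # Lanes disagree: keep the text layer, append vision-only lines, flag for review.
    seen = {line.strip() for line in text_lane.splitlines()}
    extra = [ln for ln in vision_lane.splitlines() if ln.strip() and ln.strip() not in seen]
    return Reconciled("\n".join([text_lane, *extra]), "merged", needs_review=True)


agreed = reconcile("Revenue 2023: 10M", "Revenue 2023: 10M")
scan_only = reconcile("", "Chart: sales up")
conflict = reconcile("Invoice total 100", "Bar chart showing quarterly growth")
```

Flagging disagreements instead of silently picking a winner matches the roadmap item about a Human-in-the-Loop UI for edge cases flagged by the engine.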
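
For the Cypher Injection Protection item in section 4, a minimal sketch of allow-listing might look like this. Cypher cannot take relationship types or property keys as query parameters, so those identifiers are checked against a fixed set before being interpolated, while values travel as parameters. The allow-list contents and the `build_match_query` helper are hypothetical.

```python
# Assumed allow-lists; the real ontology is not shown in the manifest.
ALLOWED_REL_TYPES = frozenset({"REPORTS_TO", "WORKS_ON"})
ALLOWED_PROPERTIES = frozenset({"id", "name"})


def build_match_query(rel_type: str, prop: str, value: str) -> tuple[str, dict[str, str]]:
    """Build a parameterized Cypher query from validated identifiers."""
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"relationship type not allowed: {rel_type!r}")
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"property not allowed: {prop!r}")
    # Identifiers are safe to interpolate only because they passed the allow-list;
    # the user-supplied value is never interpolated and goes in as $value.
    query = f"MATCH (a)-[:{rel_type}]->(b) WHERE b.{prop} = $value RETURN a"
    return query, {"value": value}


query, params = build_match_query("REPORTS_TO", "name", "Alice")
```

The returned pair can be handed to any driver call that accepts a query string plus a parameter mapping.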
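
The Path Traversal protection listed under Input Sanitization (section 5) is commonly implemented by resolving the requested path and checking that it stays inside a base directory. The sketch below shows that general technique on a POSIX system; the `safe_resolve` helper and the `/tmp/uploads` directory are assumptions, not the project's actual code.

```python
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve `user_path` under `base_dir`, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute `user_path` discards `base`, so the check below
    # also catches inputs such as "/etc/passwd".
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate


allowed = safe_resolve("/tmp/uploads", "reports/q1.pdf")
```

Resolving before comparing is the important step: a plain string prefix check would miss `..` segments and symlinks.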