Note: This manifest describes a system designed for "Zero Data Loss" ingestion. It prioritizes accuracy and auditability over speed.

# Enterprise RAG Core – Feature Manifest V2.55

**Version:** V2.55 (Public Release Candidate)

**Status:** Production Ready / Code Verified

**Summary:** A high-precision, hybrid Graph-Vector RAG platform featuring a multi-lane ingestion engine with consensus reconciliation.

---

## 1. Agent Service (The Orchestrator)

*Handles query planning, decomposition, and context synthesis.*

* **Query Decomposition Engine:**
    * **Plan-and-Solve Pattern:** Breaks complex user prompts into multi-step execution plans.
    * **Sub-Query Modeling:** Auto-generates dependencies (`depends_on`) between steps.
    * **Routing:** Dynamically routes sub-queries to Vector Search, Graph Traversal, or Mathematical Calculation tools.
* **Semantic Caching (Redis + Embeddings):**
    * **Similarity Matching:** Caches responses based on vector similarity (>95%) rather than exact string matching.
    * **Performance:** ~40x latency reduction on cache hits for semantically recurring queries.
    * **Cost Efficiency:** Estimated 80% reduction in LLM token usage for high-traffic topics.
* **Resilience & API:**
    * **Streaming Responses:** Real-time token streaming with state progress updates.
    * **Rate Limiting:** IP-based throttling (SlowAPI).
    * **Session Management:** Full conversation history with time-to-live (TTL) support.
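
The similarity-based cache lookup described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the service's actual code: the `SemanticCache` class, its method names, and the linear scan are assumptions made for the example, and the real system is stated to use Redis and an embedding model. Only the 0.95 threshold comes from the manifest.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SemanticCache:
    """Hypothetical in-memory stand-in for the Redis-backed cache."""

    threshold: float = 0.95  # the ">95%" similarity cut-off from the manifest
    _entries: list[tuple[list[float], str]] = field(default_factory=list)

    def put(self, query_embedding: list[float], response: str) -> None:
        """Store a response under the embedding of the query that produced it."""
        self._entries.append((list(query_embedding), response))

    def get(self, query_embedding: list[float]) -> Optional[str]:
        """Return the best cached response whose similarity exceeds the threshold."""
        best_score, best_response = 0.0, None
        for embedding, response in self._entries:
            score = cosine_similarity(query_embedding, embedding)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score > self.threshold else None
```

A near-duplicate query embedding returns the cached response, while an unrelated one returns `None` and would fall through to the LLM. A production version would likely replace the linear scan with a vector index.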

## 2. Ingest Service (The Multi-Lane Engine)

*The system's unique selling point (USP). Instead of a single extraction method, we use parallel lanes and a consensus engine.*

* **Pipeline Routing ("The Triage"):**
    * Analyzes incoming documents for complexity (layout, scan quality, text layer).
    * Routes each document to one or more processing lanes based on confidence scoring.
* **Multi-Lane Architecture:**
    * **Lane A (Fast/LedZeppelin):** Raw text extraction via PyMuPDF. <100ms/page.
    * **Lane B (Smart/Goethe):** Structure-aware extraction using Docling (Tables, Headers, Markdown).
    * **Lane C (Vision/Hawk):** VLM-based extraction (Ollama Vision) for charts, photos, and complex layouts.
* **"Solomon" Consensus Engine:**
    * **Parallel Execution:** Runs selected lanes concurrently.
    * **Reconciliation:** Compares "Ground Truth" (Text) against "Visual" (Vision) layers.
    * **Conflict Resolution:** Merges outputs to maximize coverage and accuracy.
* **Entity Extraction:**
    * LLM-based extraction of 8 core entity types (Person, Org, Location, Skill, etc.) and 7 relationship types.
    * **JSON Schema Enforcement:** Ensures strict adherence to the Graph Schema.
* **Control Room API:**
    * Real-time WebSocket feeds for pipeline status.
    * Live metrics on batch processing and lane performance.
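
To make the triage, parallel-lane, and consensus flow more concrete, here is a small sketch under stated assumptions. The `DocumentProfile` fields, the routing rules, the stubbed `run_lane` extractor, and the merge strategy in `reconcile` are all hypothetical; the manifest does not specify how the real triage scores documents or how Solomon resolves conflicts. The lane stubs return canned text so the example runs without PyMuPDF, Docling, or Ollama.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DocumentProfile:
    """Hypothetical triage signals; the real feature set is not specified."""
    has_text_layer: bool
    has_tables: bool
    has_images: bool


def select_lanes(profile: DocumentProfile) -> list[str]:
    """Pick lanes A (text), B (structure), C (vision) from simple assumed rules."""
    lanes = []
    if profile.has_text_layer:
        lanes.append("A")
    if profile.has_tables:
        lanes.append("B")
    if profile.has_images or not profile.has_text_layer:
        lanes.append("C")
    return lanes


async def run_lane(name: str) -> list[str]:
    """Stub extractor returning canned text blocks in place of a real lane."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound lane would
    canned = {
        "A": ["Revenue 2023", "Total: 42"],
        "B": ["Revenue 2023", "Total: 42"],
        "C": ["revenue  2023", "Chart: growth by quarter"],
    }
    return canned[name]


def reconcile(outputs: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Merge blocks across lanes, recording which lanes support each block."""
    merged: dict[str, tuple[str, list[str]]] = {}
    for lane, blocks in outputs.items():
        for block in blocks:
            key = " ".join(block.lower().split())  # normalize case and whitespace
            if key not in merged:
                merged[key] = (block, [])
            merged[key][1].append(lane)
    return list(merged.values())


async def ingest(profile: DocumentProfile) -> list[tuple[str, list[str]]]:
    """Run the selected lanes concurrently, then reconcile their outputs."""
    lanes = select_lanes(profile)
    results = await asyncio.gather(*(run_lane(lane) for lane in lanes))
    return reconcile(dict(zip(lanes, results)))
```

Calling `asyncio.run(ingest(DocumentProfile(True, False, True)))` runs lanes A and C. Blocks backed by only one lane are the natural candidates for review, though how the actual engine flags them is not described here.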

## 3. Knowledge Service (Vector Layer)

*Optimized for unstructured semantic retrieval.*

* **ChromaDB Integration:**
    * Custom HNSW configuration for Cosine/L2 distance metrics.
    * Batch chunk ingestion with metadata preservation (page numbers, source refs).
* **Retrieval Logic:**
    * **Metadata Filtering:** Pre-filtering chunks based on document ownership or attributes.
    * **Score Normalization:** Standardizes distance metrics to similarity scores (0-1).
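
One plausible way to perform the score normalization above is shown below. The specific formulas are assumptions, since the manifest only states that distances are mapped to a 0-1 similarity. The sketch treats cosine distance as `1 - cosine_similarity` (range 0 to 2) and L2 distance as an unbounded non-negative value; `distance_to_similarity` is a hypothetical helper name.

```python
def distance_to_similarity(distance: float, metric: str) -> float:
    """Map a non-negative distance to a similarity score in [0, 1].

    Assumed conventions (not confirmed by the manifest):
      - "cosine": distance = 1 - cos_sim, so it lies in [0, 2].
      - "l2": distance is unbounded above, so it is squashed with 1 / (1 + d).
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if metric == "cosine":
        # Linear rescale of [0, 2] onto [1, 0]; clamp against float noise.
        return max(0.0, 1.0 - distance / 2.0)
    if metric == "l2":
        # Monotonic squash: 0 -> 1.0, large distances approach 0.
        return 1.0 / (1.0 + distance)
    raise ValueError(f"unsupported metric: {metric!r}")
```

Both mappings preserve ranking order within a metric, but scores from the two metrics are not directly comparable with each other.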

## 4. Graph Service (Context Layer)

*Optimized for structured relationships and "Multi-Hop" reasoning.*

* **Neo4j Implementation:**
    * **Strict Schema:** Pre-defined ontology for Enterprise contexts (Entities: `Organization`, `Person`, `Contract`, etc.).
    * **Cypher Injection Protection:** Strict allow-listing of property names and relationship types.
* **Graph Algorithms:**
    * **Traversal:** Recursive retrieval of connected entities (e.g., "Who reports to X who works on Project Y?").
    * **Constraint Management:** Enforced uniqueness constraints to prevent node duplication.
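
The allow-listing approach to Cypher injection can be illustrated with a small query builder. This is a sketch, not the service's code: `ALLOWED_REL_TYPES`, `ALLOWED_PROPERTIES`, and `build_neighbor_query` are hypothetical names, and the listed relationship types are invented for the example. The idea is that identifiers interpolated into the query text are checked against a fixed set, while user-supplied values travel separately as query parameters.

```python
# Hypothetical allow-lists; the real schema's names are not given in the manifest.
ALLOWED_REL_TYPES = frozenset({"REPORTS_TO", "WORKS_ON"})
ALLOWED_PROPERTIES = frozenset({"name", "title"})


def build_neighbor_query(
    rel_type: str, prop: str, value: str, max_hops: int = 2
) -> tuple[str, dict[str, str]]:
    """Build a variable-length traversal query from validated identifiers.

    Only allow-listed identifiers and a bounded integer are interpolated into
    the query text. The property value is returned as a parameter instead.
    """
    if rel_type not in ALLOWED_REL_TYPES:
        raise ValueError(f"relationship type not allowed: {rel_type!r}")
    if prop not in ALLOWED_PROPERTIES:
        raise ValueError(f"property name not allowed: {prop!r}")
    if isinstance(max_hops, bool) or not isinstance(max_hops, int):
        raise ValueError("max_hops must be an integer")
    if not 1 <= max_hops <= 5:
        raise ValueError("max_hops must be between 1 and 5")

    query = (
        f"MATCH (start {{{prop}: $value}})-[:{rel_type}*1..{max_hops}]->(n) "
        "RETURN DISTINCT n"
    )
    return query, {"value": value}
```

The returned pair would be handed to the database driver as query text plus parameters, so a malicious `value` never becomes part of the Cypher string.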

## 5. Shared Infrastructure & Security

* **Observability Stack:**
    * **OpenTelemetry:** Distributed tracing across all microservices (instrumented for Jaeger).
    * **Prometheus/Grafana:** Metrics for request latency, LLM token usage, and cache hit rates.
    * **Audit Logging:** Immutable logs of every data access intent (e.g., "User X viewed Document Y").
* **Security:**
    * **RBAC:** Granular permissions (`INGEST`, `READ`, `ADMIN`, `SERVICE_INTERN`).
    * **Auth:** Dual-mode authentication (JWT for users, API Keys for service-to-service).
    * **Input Sanitization:** Protection against Path Traversal and ReDoS (Regular Expression Denial of Service).
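
As an illustration of the path traversal protection mentioned above, a common pattern is to resolve the requested path and confirm it still lies inside the permitted base directory. The function name `resolve_upload_path` and the example base directory are assumptions; the manifest does not describe how the service implements this check.

```python
from pathlib import Path


def resolve_upload_path(base_dir: Path, user_supplied: str) -> Path:
    """Resolve a user-supplied relative path and reject escapes from base_dir.

    Both paths are fully resolved first, so ".." segments and absolute paths
    are normalized before the containment check.
    """
    base = base_dir.resolve()
    candidate = (base / user_supplied).resolve()
    # Path.is_relative_to is available from Python 3.9 onward.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_supplied!r}")
    return candidate
```

With a base of `/srv/uploads`, a request for `reports/q1.pdf` resolves normally, while `../../etc/passwd` or an absolute path such as `/etc/passwd` raises `ValueError`.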

## 6. Tech Stack

* **Core:** Python 3.11, FastAPI, Pydantic V2
* **Orchestration:** LangGraph, LangChain
* **Databases:** Neo4j 5.x (Graph), ChromaDB (Vector), Redis 7 (Cache/Queue)
* **AI/ML:** Ollama (Local Inference), Docling (PDF Processing), PyMuPDF
* **Ops:** Docker Compose, Prometheus, OpenTelemetry

---

## Roadmap (Coming in V3.0)

* **Cross-Encoder Reranking:** Re-scoring retrieval results for higher precision.
* **Full Human-in-the-Loop (HITL) UI:** A frontend for manually resolving the edge cases flagged by the Solomon Consensus Engine.
* **Distributed Tracing UI:** Full visualization of the request lifecycle in Grafana.

---

### Why does this exist?

Most RAG systems fail on complex PDFs or hallucinate relationships. We split ingestion into **"Lanes"** (Text, Vision, Layout) and have them vote on the result (**Consensus**). We then store the data in both a **Vector DB** (for fuzzy search) and a **Knowledge Graph** (for hard facts). Together, these steps achieve a much higher level of factual consistency for enterprise use cases.