GitHub - rodmena-limited/ragit: Correct complete RAG -- built for Highway Workflow Engine

RAG toolkit for Python. Document loading, chunking, vector search, LLM integration.

Installation

Quick Start

You must provide an embedding source: custom function, Ollama, or any provider.

Custom Embedding Function

from ragit import RAGAssistant

def my_embed(text: str) -> list[float]:
    # Use any embedding API: OpenAI, Cohere, HuggingFace, etc.
    return embedding_vector

assistant = RAGAssistant("docs/", embed_fn=my_embed)
results = assistant.retrieve("search query")

With LLM for Q&A

def my_embed(text: str) -> list[float]:
    return embedding_vector

def my_generate(prompt: str, system_prompt: str = "") -> str:
    return llm_response

assistant = RAGAssistant("docs/", embed_fn=my_embed, generate_fn=my_generate)
answer = assistant.ask("How does authentication work?")

With Ollama (nomic-embed-text)

from ragit import RAGAssistant
from ragit.providers import OllamaProvider

# Uses nomic-embed-text for embeddings (768d)
assistant = RAGAssistant("docs/", provider=OllamaProvider())
results = assistant.retrieve("search query")

Core API

assistant = RAGAssistant(
    documents,           # Path, list of Documents, or list of Chunks
    embed_fn=...,        # Embedding function: (str) -> list[float]
    generate_fn=...,     # LLM function: (prompt, system_prompt) -> str
    provider=...,        # Or use a provider instead of functions
    chunk_size=512,
    chunk_overlap=50
)

results = assistant.retrieve(query, top_k=3)      # [(Chunk, score), ...]
context = assistant.get_context(query, top_k=3)   # Formatted string
answer = assistant.ask(question, top_k=3)         # Requires generate_fn/LLM
code = assistant.generate_code(request)           # Requires generate_fn/LLM

Index Persistence

Save and load indexes to avoid re-computing embeddings:

# Save index to disk
assistant.save_index("./my_index")

# Load index later (much faster than re-indexing)
loaded = RAGAssistant.load_index("./my_index", provider=OllamaProvider())
results = loaded.retrieve("query")

Thread Safety

RAGAssistant is thread-safe. Multiple threads can safely read while another writes:

import threading

assistant = RAGAssistant("docs/", provider=OllamaProvider())

# Safe: concurrent reads and writes
threading.Thread(target=lambda: assistant.retrieve("query")).start()
threading.Thread(target=lambda: assistant.add_documents([new_doc])).start()

Resource Management

Use context managers for automatic cleanup:

from ragit.providers import OllamaProvider

with OllamaProvider() as provider:
    response = provider.generate("Hello", model="llama3")
# Session automatically closed

Document Loading

from ragit import load_text, load_directory, chunk_text

doc = load_text("file.md")
docs = load_directory("docs/", "*.md")
chunks = chunk_text(text, chunk_size=512, chunk_overlap=50, doc_id="id")

Hyperparameter Optimization

from ragit import RagitExperiment, Document, BenchmarkQuestion

def my_embed(text: str) -> list[float]:
    return embedding_vector

def my_generate(prompt: str, system_prompt: str = "") -> str:
    return llm_response

docs = [Document(id="1", content="...")]
benchmark = [BenchmarkQuestion(question="...", ground_truth="...")]

experiment = RagitExperiment(
    docs, benchmark,
    embed_fn=my_embed,
    generate_fn=my_generate
)
results = experiment.run(max_configs=20)
print(results[0])  # Best config

License

Apache-2.0 - RODMENA LIMITED