GitHub - ory/rerag-rebac: ReRAG - Secure retrieval augmented generation with Google Zanzibar style relationship based access control (ReBAC) and SQLite Vector

RAG (Retrieval-Augmented Generation) lets LLMs answer questions about documents by fetching relevant content and adding it to the prompt. It's everywhere: customer support, enterprise search, legal discovery. But RAG doesn't work in multi-user contexts where different users have different permissions. This repository shows how to fix it with ReBAC (relationship based access control) using Ollama and Ory Keto, an open source Google Zanzibar implementation.

TL;DR: Most RAG systems leak private data across users. This repo demonstrates permission-aware RAG that guarantees the LLM never sees unauthorized documents. Think Google Zanzibar meets embeddings — fork it, break it, extend it.

The Problem & Solution

RAG only

# Alice queries the system
curl -X POST /query -H "Auth: bad-actor" \
  -d '{"question": "What was the total refund?"}'
# Response: "$1,200 for John Doe and $3,500 for ABC Corp"  ❌ DATA LEAK

With ReRAG (ReBAC-powered RAG)

# Alice queries (can only see John Doe's docs)
curl -X POST /query -H "Auth: alice" \
  -d '{"question": "What was the total refund?"}'
# Response: "$1,200 for John Doe"  ✅

# Bob queries (can only see ABC Corp's docs)
curl -X POST /query -H "Auth: bob" \
  -d '{"question": "What was the total refund?"}'
# Response: "$3,500 for ABC Corporation"  ✅

# Bad actor queries (no docs at all)
curl -X POST /query -H "Auth: bad-actor" \
  -d '{"question": "What was the total refund?"}'
# Response: "You don't have access to any tax returns."  ✅

The model never sees text the user isn't authorized for. No prompt injection can leak it.

Quick demo

Prerequisites:

Docker (for Ollama)
Golang (1.22+)
tmux (optional, for make dev)

First clone the repository:

git clone https://github.com/ory/rerag-rbac-rag-llm.git
cd rerag-rbac-rag-llm

Then run the demo:

# Install dependencies (starts Ollama via Docker, installs Keto, pulls models)
make install

# If you have tmux (starts Keto and app in split panes):
make dev

# If you do not have tmux (run in separate terminals):
make start-keto  # Terminal 1
make start-app   # Terminal 2

# Setup and run demo
make demo

Note: This project requires CGO (C compiler) for sqlite-vec integration. Ensure you have a C compiler installed:

macOS: Install Xcode Command Line Tools (xcode-select --install)
Linux: Install build-essential (apt-get install build-essential)
Windows: Install MinGW-w64 or use WSL

This will:

Start Ollama via Docker and pull required models (llama3.2:1b, nomic-embed-text)
Install Keto and Go dependencies
Start Keto and the application server
Load demo documents
Run permission-aware queries showing different results per user

The Ollama container runs as rerag-ollama on port 11434. To stop it, run make reset.

See config.example.yaml for all configuration options.

Why this matters

Standard RAG pulls all matching documents into context, then relies on the LLM to "respect" permissions. That's a compliance nightmare waiting to happen. This architecture:

Filters at retrieval: Only authorized documents enter the vector search results
Never leaks: Unauthorized content never reaches the LLM context window
No prompt injection: Users can't trick the LLM into revealing data they shouldn't see
Audit-ready: Every permission check is logged and traceable
Transport security: Optional TLS/HTTPS encryption
Data at rest: Optional SQLite database encryption

Tech stack

All open source, runs locally:

Ory Keto: Google Zanzibar-based ReBAC for permissions
Ollama: Local LLM runner via Docker (llama3.2:1b for inference, nomic-embed-text for embeddings)
SQLite: Persistent vector storage with optional encryption
sqlite-vec: Fast vector similarity search directly in SQLite using KNN
Go: For performance and hackability (requires CGO for sqlite-vec)
Docker: For running Ollama in a container
TLS/HTTPS: Optional SSL encryption for secure transport

How it works

graph TD

    %% ------------------------
    %% Add documents flow
    %% ------------------------
    subgraph ADD["📥 Document Management"]
      AA["New Document (POST /documents)"]
      AA --> H["Permission Assignment (Ory Keto)"]
      AA --> DD["Generate Embeddings (Ollama)"]
      DD --> I
    end

    %% ------------------------
    %% Query flow
    %% ------------------------
    subgraph QUERY["🔎 Query Documents"]
      A["📝 User Query"]
      A --> B["🔒 Auth Middleware"]
      B --> D["🔍 Vector KNN Search (sqlite-vec)"]
      D --> E["🛂 Permission Check (Ory Keto)"]
      E --> F["🤖 LLM Processing (Ollama)"]
      F --> G["✅ Secure Response"]
      I["SQLite vec0 Virtual Table"]
      J["Ollama / LLM"]
    end

    %% Wiring external systems
    H --> E
    I --> D
    J --> F

Upload: Documents tagged with owner metadata, embeddings stored in sqlite-vec
Permissions: Relationships defined in Keto (who can see what)
Query: User asks a question, embedding generated
Vector Search: sqlite-vec performs efficient KNN search in SQLite
Filter: Permission check ensures user can access retrieved documents
Answer: LLM processes authorized subset only

Vector Search Performance

The system uses sqlite-vec for efficient vector similarity search directly in SQLite:

Native SQL operations: Vector search happens in the database, not in application memory
KNN algorithm: K-nearest neighbors search using cosine distance
Efficient storage: Vectors stored in a vec0 virtual table with automatic indexing
No memory overhead: Documents don't need to be loaded into memory for similarity computation
Scales with SQLite: Leverages SQLite's proven performance and reliability
Adaptive recursive search: Dynamically increases candidate pool when filtering reduces results
Permission-aware filtering: Efficiently handles sparse permission scenarios without over-fetching

Recursive Search Algorithm

When searching with permission filters, the system uses an adaptive approach:

Initial Search: Fetches topK × 2 candidates from sqlite-vec
Filter Application: Applies permission filter to candidates
Adaptive Expansion: If insufficient matches found:
- Recursively doubles the candidate pool (growth factor: 2.0)
- Continues until topK matches found or all documents searched
- Safety limit of 10 attempts prevents infinite recursion
Optimization: Stops early when enough matches found or no more documents exist

This approach balances efficiency with completeness, adapting to different permission distributions without requiring manual tuning.

API examples

# Upload document
curl -X POST localhost:4477/documents \
  -d '{"title": "Tax Return", "content": "...", "metadata": {"taxpayer": "John Doe"}}'

# Query with permissions
curl -X POST localhost:4477/query \
  -H "Authorization: Bearer alice" \
  -d '{"question": "What was the refund amount?"}'

# Check what Alice can see
curl localhost:4477/permissions -H "Authorization: Bearer alice"

Configuration

ReRAG supports flexible configuration via config files and environment variables.

Config File

Create a config.yaml file for persistent settings:

# Example configuration file for LLM RAG ReBAC OSS
# Copy this to config.yaml and modify as needed

# Server configuration
server:
  host: 'localhost'
  port: 4477
  read_timeout: 30 # seconds
  write_timeout: 30 # seconds

  # TLS/HTTPS configuration
  tls:
    enabled: false # Set to true to enable HTTPS
    cert_file: '' # Path to TLS certificate file (required if enabled)
    key_file: '' # Path to TLS private key file (required if enabled)
    min_version: '1.3' # Minimum TLS version ("1.2" or "1.3")

# Database configuration
database:
  path: 'data/vector_store.db'

  # Database encryption using SQLCipher
  encryption:
    enabled: false # Set to true to enable database encryption
    key: '' # Encryption key (required if enabled)

# External services
services:
  # Ollama configuration
  ollama:
    base_url: 'http://localhost:11434'
    embedding_model: 'nomic-embed-text'
    llm_model: 'llama3.2:1b' # A model that fits on your machine / use case
    timeout: 60 # seconds

  # Ory Keto configuration
  keto:
    read_url: 'http://localhost:4466'
    write_url: 'http://localhost:4467'
    timeout: 10 # seconds

# Security settings
security:
  auth_mode: 'mock' # "mock" or "jwt"
  jwt_secret: '' # JWT secret (required if auth_mode is "jwt")
  error_mode: 'detailed' # "detailed" or "secure"

# Application settings
app:
  environment: 'development' # "development", "staging", or "production"
  log_level: 'info' # "debug", "info", "warn", or "error"
  log_format: 'text' # "text" or "json"

Environment Variables

Override any setting with environment variables:

# Enable HTTPS
export SERVER_TLS_ENABLED=true
export SERVER_TLS_CERT_FILE=certs/cert.pem
export SERVER_TLS_KEY_FILE=certs/key.pem

# Enable database encryption
export DATABASE_ENCRYPTION_ENABLED=true
export DATABASE_ENCRYPTION_KEY=your-secret-key

# Production settings
export APP_ENVIRONMENT=production
export SECURITY_ERROR_MODE=secure

SSL/TLS Setup

For HTTPS support, generate certificates:

# Development certificates (not for production!)
mkdir certs
openssl req -x509 -newkey rsa:4096 -keyout certs/key.pem \
  -out certs/cert.pem -days 365 -nodes \
  -subj "/CN=localhost"

# Enable in config
echo "server:" > config.yaml
echo "  tls:" >> config.yaml
echo "    enabled: true" >> config.yaml
echo "    cert_file: certs/cert.pem" >> config.yaml
echo "    key_file: certs/key.pem" >> config.yaml

Database Encryption

Enable SQLite encryption for data at rest:

database:
  encryption:
    enabled: true
    key: 'your-32-character-encryption-key'

⚠️ Important: Store encryption keys securely using environment variables or key management systems in production.

Architecture Details

Vector Storage with sqlite-vec

The system uses a dual-table approach for efficient storage and retrieval:

documents table: Stores document metadata (id, title, content)
vec_documents virtual table: Stores vector embeddings using sqlite-vec's vec0 module

This separation allows:

Fast metadata queries without loading embeddings
Efficient vector similarity search using native SQLite operations
Dynamic embedding dimension support (auto-detected from first document)
Adaptive search that scales with permission filtering requirements

Permission-Aware Vector Search

The vector search implementation combines sqlite-vec's KNN algorithm with an adaptive recursive approach:

SQL Query Pattern:

-- Vector KNN search returning top K candidates
SELECT d.id, d.title, d.content, v.distance
FROM vec_documents v
JOIN documents d ON d.id = v.id
WHERE v.embedding MATCH ? AND k = ?
ORDER BY v.distance;

Adaptive Filtering Algorithm:

1. Start: Fetch topK × 2 candidates via KNN
2. Filter: Apply permission check to candidates
3. Evaluate:
   - If ≥ topK matches → Return results ✓
   - If all documents fetched → Return partial results ✓
   - Otherwise → Increase multiplier (×2) and recurse
4. Safety: Stop after 10 attempts, return best effort

Example Scenario:

User requests 5 documents
- Attempt 1: Fetch 10 candidates → 2 authorized → insufficient
- Attempt 2: Fetch 20 candidates → 4 authorized → insufficient
- Attempt 3: Fetch 40 candidates → 6 authorized → success (return 5)

This approach is particularly efficient when:

Users have access to a significant subset of documents (minimal recursion)
Permission distribution is sparse but consistent (predictable growth)
Document corpus is large but user access is limited (avoids loading all vectors)

Building and Development

The project requires CGO enabled for sqlite-vec:

# Build with CGO
CGO_ENABLED=1 go build -o bin/server .

# Run tests
CGO_ENABLED=1 go test ./...

The Makefile automatically sets CGO_ENABLED=1 for all build operations.

Future work

This is a working reference, not production code. Ideas for extensions:

Real Auth: Replace mock tokens with OAuth2/OIDC ([Ory Hydra] works great with Ory Keto)
Scale Storage: Swap SQLite for Pinecone/Weaviate/pgvector (keep sqlite-vec approach)
Audit Trail: Add comprehensive logging for compliance
Reverse Expand: Instead of using vector search to filter, use Keto to pre-filter document IDs
UI: Build a simple web interface for uploading/querying documents
Vector Indexing: Add HNSW or other ANN indexes for larger datasets

CI/CD Performance

The GitHub Actions workflow includes optimizations for faster CI runs:

Key Optimizations

🎯 Model Caching: Ollama models are cached between CI runs using GitHub's cache action
⚡ Simple Setup: Straightforward installation with minimal complexity
🔍 Quick Health Checks: Simple service readiness verification

Performance Gains

First run: Downloads and caches models (~3-4 minutes)
Subsequent runs: Uses cached models (~1-2 minutes)
Cache hit rate: 90%+ for models that don't change

Common issues

Problem	Solution
Ollama connection refused	Run `make install-ollama` or `docker start rerag-ollama`
Models missing	Run `docker exec rerag-ollama ollama pull llama3.2:1b nomic-embed-text`
Keto not running	Check with `curl localhost:4467/health/ready`
Docker not found	Install Docker from https://www.docker.com/get-started
Port 11434 in use	Stop other Ollama instances: `docker stop rerag-ollama`
TLS certificate errors	Check cert file paths and permissions
Database encryption fails	Verify encryption key and SQLite encryption support
Config validation errors	Check required fields when features are enabled
CGO build errors	Ensure C compiler is installed (see requirements above)
sqlite-vec not found	Run `go mod tidy` and ensure CGO is enabled

Contributing

This is experimental code meant for learning and extending. PRs welcome!

Feedback

Found this useful? Hit us with a star. Have ideas? Open an issue or PR.