GitHub - varshith-Git/Valori-Kernel: Valori is a Deterministic Memory OS that sits between intelligence (LLMs) and reality (devices, products, decisions).


Valori

The Only Vector Database That Can Cryptographically Prove Perfect Crash Recovery


Valori is a vector database built for regulated industries (healthcare, finance, legal) that need verifiable AI memory. Pinecone and Weaviate ask you to trust that crash recovery worked; Valori lets you verify it by comparing cryptographic state hashes taken before and after the crash.


🎯 Why Valori?

The Problem: You deploy an AI system with vector memory. It crashes. Did it lose data? Did it corrupt state? You have no way to know.

Other Solutions: Pinecone and Weaviate claim they have crash recovery. But you have to trust them.

Valori's Solution: We give you cryptographic proof. Bit-identical state hash before and after crash. Zero trust required.


🛡️ Crash Recovery: Proven, Not Claimed

Production Test (Koyeb Deployment - 2026-01-12)

# Before crash: save the state proof
curl -s "$VALORI_URL/v1/proof/state" > before_crash.json
# State Hash: aea3a9e17b6f220b3d7ae860005b756c759e58f1d56c665f0855178ee3a8d668

# [Force restart - simulate production outage]

# After recovery: save the state proof again
curl -s "$VALORI_URL/v1/proof/state" > after_crash.json
# State Hash: aea3a9e17b6f220b3d7ae860005b756c759e58f1d56c665f0855178ee3a8d668

# Verify: compare the two saved proofs
diff before_crash.json after_crash.json
# Output: (empty) ← Bit-perfect recovery. Zero data loss. Cryptographically proven.

What this means:

  • Zero data loss - Every operation recovered
  • Bit-identical state - Exact same memory structure
  • Cryptographic proof - BLAKE3 hash verification
  • Production tested - Real deployment, real crash

Full case study →


📊 Valori vs. Competitors

| Feature | Pinecone | Weaviate | Valori |
|---|---|---|---|
| Crash Recovery | ✓ (claimed) | ✓ (claimed) | Proven with cryptographic hash |
| State Verification | | | ✅ Cryptographic proof via /v1/proof/state |
| Forensic Replay | | | ✅ Event sourcing - replay any incident |
| Audit Compliance | Partial | Partial | ✅ Full trail (HIPAA/SOC2 ready) |
| Multi-arch Determinism | | | ✅ Identical on x86/ARM/WASM |
| Open Source | | | ✅ AGPL-3.0 |
| Pricing | Usage-based | Usage-based | Free (open source) |

Valori's advantage: We're the only one that lets you verify recovery, not just hope it worked.


🚀 Quick Start

Install

# Clone the repository
git clone https://github.com/varshith-Git/Valori-Kernel.git
cd Valori-Kernel/python
pip install .

Use

from valori import Valori

client = Valori()
# Atomic Batch Insert
client.insert_batch([[0.1] * 16, [0.2] * 16])
# Search
results = client.search([0.1] * 16, k=5)

That's it. Simple embedded mode. No Docker. No Kubernetes.

Full documentation →


👥 Who Should Use Valori?

✅ You Need Valori If:

  • You're building AI for healthcare (HIPAA compliance requires audit trails)
  • You're building AI for finance (SOC2 audits need verifiable state)
  • You're building AI for legal (forensic replay of decisions)
  • You need to debug production incidents (replay exact state)
  • You deploy on multiple architectures (ARM, x86, WASM)

❌ You DON'T Need Valori If:

  • You need massive queries-per-second scale (use Pinecone)
  • You don't care about crash recovery
  • You're okay trusting your vendor
  • You don't need audit compliance

⚡ Performance: Is Determinism Slow?

TL;DR: No. Fixed-point math has negligible overhead.

Benchmarks (SIFT1M dataset, MacBook Air M2)

| Metric | Result | Status |
|---|---|---|
| Ingestion | 1.24M vectors/sec | ⚡ Production-ready |
| Search Accuracy | 99% Recall@10 | ✅ State-of-the-art |
| Search Latency | 0.47ms | ⚡ Real-time |
| Snapshot Save | 50ms (50K vectors) | ✅ Fast checkpointing |
| Snapshot Load | 33ms (50K vectors) | ✅ Fast recovery |

Verdict: In these benchmarks, determinism comes at negligible performance cost, so verifiability is effectively free.

[Figures: 1M Vector Benchmark, Ingestion Speed, Persistence Speed]


🎯 Accuracy Benchmark

We benchmarked Valori's Q16.16 Fixed-Point Kernel against the SIFT1M Ground Truth.

| Metric | Valori (Fixed-Point) | Target | Verdict |
|---|---|---|---|
| Recall@1 | 99.00% | >90% | 🌟 State of the Art |
| Recall@10 | 99.00% | >95% | Production Ready |
| Filter Accuracy | 100.00% | 100% | 🎯 Strict Enforcement |
| Latency | 0.47 ms | <1.0ms | Real-Time |

Methodology: Ingested SIFT1M subset, built HNSW graph using integer-only arithmetic, queried against pre-computed ground truth integers.
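For readers unfamiliar with the metric, here is a minimal sketch of one common definition of Recall@k: the fraction of the true top-k neighbours that show up in the returned top-k, averaged over queries. The `recall_at_k` helper and the toy IDs are hypothetical. This is not the `bench_recall` implementation, which may define the metric differently.

```python
def recall_at_k(results: list[list[int]], ground_truth: list[list[int]], k: int) -> float:
    """Average overlap between returned top-k IDs and true top-k IDs."""
    hits = 0
    for returned, truth in zip(results, ground_truth):
        hits += len(set(returned[:k]) & set(truth[:k]))
    return hits / (k * len(results))

# Two toy queries: the first misses one true neighbour, the second is perfect
truth = [[10, 11, 12], [20, 21, 22]]
returned = [[10, 12, 99], [20, 21, 22]]

print(recall_at_k(returned, truth, k=3))  # 5 of 6 true neighbours found
```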

[Figures: Recall Benchmark, Filter Performance]


Key Features

1. Event-Sourced Architecture

  • Every operation is logged to an immutable event log
  • State is deterministic - replay events = identical result
  • Forensic debugging - reproduce exact production state
  • Audit trail - full history of all changes
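The replay property can be sketched in a few lines. This is a toy model under stated assumptions: the event shapes (`op`, `id`, `vector`), the `replay` and `digest` helpers, and the use of SHA-256 over canonical JSON are all hypothetical, chosen only to show that the same log always rebuilds the same state.

```python
import hashlib
import json

def apply(state: dict, event: dict) -> None:
    """Apply one event to the in-memory state (hypothetical event shapes)."""
    if event["op"] == "insert":
        state[event["id"]] = event["vector"]
    elif event["op"] == "delete":
        state.pop(event["id"], None)

def replay(log: list[dict]) -> dict:
    """Rebuild state from scratch by applying every event in order."""
    state: dict = {}
    for event in log:
        apply(state, event)
    return state

def digest(state: dict) -> str:
    """Hash a canonical JSON encoding of the state."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

log = [
    {"op": "insert", "id": 1, "vector": [6554, 13107]},
    {"op": "insert", "id": 2, "vector": [19661, 26214]},
    {"op": "delete", "id": 1},
]

live = replay(log)
rebuilt = replay(log)  # e.g. after a crash, or on another machine
print(digest(live) == digest(rebuilt))
```

Replaying a prefix of the log (`log[:2]`) reproduces the state as it was before the delete, which is the basis for forensic debugging.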

2. Multi-Architecture Determinism

Valori uses Q16.16 Fixed-Point Arithmetic instead of standard f32 floats.

  • Bit-identical results on x86, ARM, WASM
  • No floating-point drift - integer operations yield identical results across CPUs
  • Cross-platform verified - tested on x86, ARM, and WASM
  • Benefits: Deploy anywhere, test once
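As an illustration of why this works, here is a minimal Q16.16 sketch: values are stored as integers scaled by 2^16, and all arithmetic after the initial quantization is integer-only. The helper names are hypothetical, and Python's unbounded integers hide the overflow handling that a real 32-bit kernel has to do with wider intermediates.

```python
SCALE = 1 << 16  # Q16.16: 16 integer bits, 16 fractional bits

def to_fixed(x: float) -> int:
    """Quantize a float to Q16.16 (done once, at the boundary)."""
    return int(round(x * SCALE))

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the raw product has 32 fractional bits, so shift back."""
    return (a * b) >> 16

def fixed_dot(u: list[int], v: list[int]) -> int:
    """Integer-only dot product of two Q16.16 vectors."""
    return sum(fixed_mul(a, b) for a, b in zip(u, v))

def to_float(q: int) -> float:
    """Convert back to float for display only."""
    return q / SCALE

u = [to_fixed(0.5), to_fixed(-1.25)]
v = [to_fixed(2.0), to_fixed(4.0)]
print(to_float(fixed_dot(u, v)))  # -4.0
```

Integer addition, multiplication, and shifts are defined bit-for-bit, so the same inputs give the same outputs regardless of the CPU's floating-point unit.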

3. Zero-Cost Tag Filtering

  • O(1) tag filtering via parallel arrays
  • 100% accuracy - no false positives
  • Use case: Filter by user_id, tenant_id, document_type
  • Performance: No graph traversal overhead
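A rough sketch of the parallel-array idea, assuming record `i` keeps its vector at `vectors[i]` and its tag at `tags[i]`. The layout, the helper names, and the brute-force ranking (a stand-in for the HNSW index) are all hypothetical. The point is only that checking a candidate's tag is a single array lookup.

```python
vectors = [[1, 0], [0, 1], [1, 1], [2, 2]]  # record i lives at index i
tags = [7, 7, 9, 7]                          # tags[i] is the tag of record i

def passes_filter(record_index: int, wanted_tag: int) -> bool:
    """One array lookup and one integer comparison per candidate."""
    return tags[record_index] == wanted_tag

def filtered_search(query: list[int], wanted_tag: int, k: int) -> list[int]:
    """Brute-force stand-in for the index: rank matching records by squared distance."""
    candidates = [i for i in range(len(vectors)) if passes_filter(i, wanted_tag)]

    def dist(i: int) -> int:
        return sum((a - b) ** 2 for a, b in zip(vectors[i], query))

    return sorted(candidates, key=dist)[:k]

# Record 2 is the exact match for the query but carries tag 9, so it is excluded
print(filtered_search([1, 1], wanted_tag=7, k=2))  # [0, 1]
```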

4. Metadata & Knowledge Graph

  • Zero-Cost Filtering: Filter searches by tag (u64) with O(1) overhead
  • Strict Enforcement: 100% accuracy without graph traversal penalties
  • Metadata: Attach optional binary metadata (up to 64KB) per record
  • Knowledge Graph: Create nodes and edges for complex relationships

5. Crash Recovery & Durability

  • WAL & Event Log: Every operation is synced to disk via length-prefixed logs
  • Batch Ingestion: Atomic commits for high-throughput bulk inserts
  • Snapshots: Instant checkpointing and restoration
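To illustrate what length-prefixed framing buys you during recovery, here is a minimal sketch. The 4-byte little-endian prefix, the payload contents, and the helper names are assumptions. Valori's actual on-disk format is Bincode-serialized and may differ, and a real log would also fsync and typically checksum each record.

```python
import io
import struct

def append_record(log: io.BytesIO, payload: bytes) -> None:
    """Write a 4-byte little-endian length, then the payload."""
    log.write(struct.pack("<I", len(payload)))
    log.write(payload)

def read_records(data: bytes) -> list[bytes]:
    """Return every complete record; stop at a torn (partially written) tail."""
    records, offset = [], 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<I", data, offset)
        end = offset + 4 + length
        if end > len(data):
            break  # crash happened mid-write; ignore the incomplete record
        records.append(data[offset + 4:end])
        offset = end
    return records

log = io.BytesIO()
append_record(log, b"insert:1")
append_record(log, b"insert:2")
intact = log.getvalue()
torn = intact[:-3]  # simulate a crash that cut off the last record

print(read_records(intact))  # both records
print(read_records(torn))    # only the first, complete record
```

The length prefix lets the reader tell a complete record from a partial one without parsing the payload.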

6. Flexible Deployment

  • Embedded (Python FFI): Link directly into Python for microsecond latency
  • HTTP Server: Run as standalone node with REST API
  • Bare Metal: no_std compatible for ARM Cortex-M embedded systems
  • Replication: Leader-follower for read scaling

📚 Documentation


🛠️ Setup

Prerequisites

  • Rust 1.70+ (rustup recommended)
  • Python 3.8+ (for Python bindings, optional)

Quick Start

  1. Clone the repository:

    git clone https://github.com/varshith-Git/Valori-Kernel.git
    cd Valori-Kernel
  2. Download benchmark dataset (optional):

    chmod +x scripts/download_data.sh
    ./scripts/download_data.sh
  3. Build and test:

    cargo build --release
    cargo test --workspace --exclude valori-embedded
  4. Run benchmarks:

    cargo run --release --bin bench_recall
    cargo run --release --bin bench_ingest
    cargo run --release --bin bench_filter

📡 HTTP Server (Production Mode)

Run Valori as a standalone node.

Start Server:

cargo run --release -p valori-node
# Server listening on 0.0.0.0:3000

Client Usage:

from valori import Valori

# Connect to Remote Server
client = Valori(remote="http://localhost:3000")

# Atomic Batch Insert
batch = [[0.1] * 16, [0.2] * 16, [0.3] * 16]
ids = client.insert_batch(batch)
print(f"Batch inserted IDs: {ids}")

# Search
hits = client.search([0.1] * 16, k=1)

Replication & Clustering

Valori supports Leader-Follower Replication.

Running a Leader

# Default (Leader)
cargo run --release -p valori-node

Running a Follower

Followers stream the WAL/Event Log from the leader and maintain an identical in-memory replica.

VALORI_REPLICATION_MODE=follower \
VALORI_LEADER_URL=http://localhost:3000 \
VALORI_HTTP_PORT=3001 \
cargo run --release -p valori-node

The follower will:

  1. Bootstrap: Download a snapshot from the leader
  2. Stream: Replay the WAL/Event Log in real-time
  3. Cross-Check: Verify state hashes to ensure zero divergence
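The three steps can be modelled in memory as follows. This is a toy sketch, not the replication protocol itself: the `Node` class, the event tuples, and the SHA-256 state hash are hypothetical, and no networking is involved.

```python
import hashlib

class Node:
    def __init__(self) -> None:
        self.state: dict[int, list[int]] = {}
        self.applied = 0  # number of events applied so far

    def apply(self, event: tuple[int, list[int]]) -> None:
        record_id, vector = event
        self.state[record_id] = vector
        self.applied += 1

    def state_hash(self) -> str:
        # Sort items so the hash does not depend on insertion order
        return hashlib.sha256(repr(sorted(self.state.items())).encode()).hexdigest()

leader = Node()
event_log = [(1, [10, 20]), (2, [30, 40]), (3, [50, 60])]
for event in event_log[:2]:
    leader.apply(event)

# 1. Bootstrap: copy a snapshot of the leader's state
follower = Node()
follower.state = {k: list(v) for k, v in leader.state.items()}
follower.applied = leader.applied

# 2. Stream: the leader commits a new event, the follower replays what it is missing
leader.apply(event_log[2])
for event in event_log[follower.applied:]:
    follower.apply(event)

# 3. Cross-check: identical hashes mean no divergence
print(follower.state_hash() == leader.state_hash())
```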

📊 Observability

Valori exposes Prometheus metrics at /metrics.

Key Metrics:

  • valori_events_committed_total: Total events persisted
  • valori_batch_commit_duration_seconds: Latency of batch commits
  • valori_replication_lag: Seconds behind leader (on followers)
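If you want to check these values from a script rather than a Prometheus server, a small parser for the text exposition format is enough. The sketch below handles only unlabelled samples. The sample payload uses made-up values, and the alert threshold is a hypothetical choice.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.partition(" ")
        samples[name] = float(value.split()[0])  # ignore an optional timestamp
    return samples

# Sample payload with made-up values, shaped like a /metrics response
sample = """\
# TYPE valori_events_committed_total counter
valori_events_committed_total 1200
# TYPE valori_replication_lag gauge
valori_replication_lag 0.25
"""

metrics = parse_metrics(sample)
MAX_LAG_SECONDS = 1.0  # hypothetical alert threshold
print(metrics["valori_replication_lag"] <= MAX_LAG_SECONDS)
```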

📐 Architecture

Event Sourcing Pipeline

[Request] -> [Batch Buffer] -> [Shadow Execute (Validation)] 
                                     |
                                     v
                             [Append to Event Log (fsync)]
                                     |
                                     v
                             [Update In-Memory Kernel]
                                     |
                                     v
                             [Update Index (HNSW)]
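The ordering in the diagram is what makes batch commits atomic: a batch is validated against a shadow copy first, and only a fully valid batch reaches the log and the live state. Below is a minimal sketch of that ordering. The `Kernel` class, its validation rules, and `commit_batch` are hypothetical. The list append stands in for the fsync'd event log, and the HNSW index update is omitted.

```python
import copy

class Kernel:
    DIM = 2  # hypothetical fixed vector dimension

    def __init__(self) -> None:
        self.records: dict[int, list[int]] = {}

    def apply(self, event: dict) -> None:
        if len(event["vector"]) != self.DIM:
            raise ValueError("dimension mismatch")
        if event["id"] in self.records:
            raise ValueError("duplicate id")
        self.records[event["id"]] = event["vector"]

def commit_batch(kernel: Kernel, event_log: list[dict], batch: list[dict]) -> bool:
    """Validate on a shadow copy, then log, then apply to the live kernel."""
    shadow = copy.deepcopy(kernel)
    try:
        for event in batch:
            shadow.apply(event)   # shadow execute: may raise
    except ValueError:
        return False              # nothing logged, live state untouched
    event_log.extend(batch)       # stands in for append + fsync
    for event in batch:
        kernel.apply(event)       # already validated against the shadow
    return True

kernel, event_log = Kernel(), []
ok = commit_batch(kernel, event_log, [{"id": 1, "vector": [1, 2]}, {"id": 2, "vector": [3, 4]}])
# Second batch fails on its last event (duplicate id), so record 3 is not kept either
bad = commit_batch(kernel, event_log, [{"id": 3, "vector": [5, 6]}, {"id": 1, "vector": [7, 8]}])
print(ok, bad, sorted(kernel.records), len(event_log))  # True False [1, 2] 2
```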

Tech Stack

  • Kernel: Pure Rust, no_std, Q16.16 Fixed Point
  • Storage: Append-only Logs (Bincode serialized)
  • Network: Axum (HTTP), Tokio (Async)
  • Interface: PyO3 (Python FFI)

🛠️ Development

Build:

cargo build --release --workspace

Test:

# Unit & Integration Tests
cargo test --workspace

# Batch Ingestion Verification
cargo test -p valori-node --test api_batch_ingest

# Replication Verification
cargo test -p valori-node --test api_replication

Python FFI Dev:

cd python
pip install -e .
python test_valori_integrated.py

⭐ Star History

If you find Valori useful, please star the repository! It helps others discover the project.



🔬 Research & Citations

Valori is based on research into deterministic substrates, available as an arXiv preprint.

Paper: Deterministic Memory: A Substrate for Verifiable AI Agents

@article{valori2025deterministic,
  title={Deterministic Memory: A Substrate for Verifiable AI Agents},
  author={Valori Research Team},
  journal={arXiv preprint arXiv:2512.22280},
  year={2025}
}

🏢 Enterprise Support

Need help deploying Valori in production?

  • Production deployment consulting
  • Custom compliance implementations (SOC2, HIPAA)
  • Priority bug fixes & SLAs
  • Forensic analysis tools (Deterministic Evaluator, Compliance Packs)

Contact: varshith.gudur17@gmail.com


📄 License

AGPL-3.0 - See LICENSE

Core features are free forever. Enterprise extensions available commercially.