GitHub - varshith-Git/Valori-Kernel: Valori is a Deterministic Memory OS that sits between intelligence (LLMs) and reality (devices, products, decisions).

Valori

The Only Vector Database That Can Cryptographically Prove Perfect Crash Recovery

Valori is a vector database built for regulated industries (healthcare, finance, legal) that need verifiable AI memory. Pinecone and Weaviate ask you to trust that crash recovery worked; Valori lets you verify it by comparing a cryptographic (BLAKE3) hash of the full state before and after the crash.

🎯 Why Valori?

The Problem: You deploy an AI system with vector memory. It crashes. Did it lose data? Did it corrupt state? You have no way to know.

Other Solutions: Pinecone and Weaviate claim they have crash recovery. But you have to trust them.

Valori's Solution: We give you cryptographic proof. Bit-identical state hash before and after crash. Zero trust required.

🛡️ Crash Recovery: Proven, Not Claimed

Production Test (Koyeb Deployment - 2026-01-12)

# Before crash: save the proof so it can be compared later
curl -s "$VALORI_URL/v1/proof/state" > before_crash.json
# State Hash: aea3a9e17b6f220b3d7ae860005b756c759e58f1d56c665f0855178ee3a8d668

# [Force restart - simulate production outage]

# After recovery
curl -s "$VALORI_URL/v1/proof/state" > after_crash.json
# State Hash: aea3a9e17b6f220b3d7ae860005b756c759e58f1d56c665f0855178ee3a8d668

# Verify
diff before_crash.json after_crash.json
# Output: (empty) ← Bit-perfect recovery. Zero data loss. Cryptographically proven.

What this means:

✅ Zero data loss - Every operation recovered
✅ Bit-identical state - Exact same memory structure
✅ Cryptographic proof - BLAKE3 hash verification
✅ Production tested - Real deployment, real crash
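
The same before/after check can be scripted. The following is a minimal sketch, not part of the Valori SDK: it assumes `/v1/proof/state` returns a JSON object and that the hash sits in a field named `state_hash`. That field name is hypothetical, so adjust it to the actual response.

```python
import json
import urllib.request


def fetch_state_proof(base_url: str) -> dict:
    """GET /v1/proof/state and decode the JSON body (response shape is assumed)."""
    with urllib.request.urlopen(f"{base_url}/v1/proof/state") as resp:
        return json.load(resp)


def same_state(before: dict, after: dict, field: str = "state_hash") -> bool:
    """Return True only if both proofs carry the same non-empty hash.

    `field` is a hypothetical key name, not confirmed by the Valori docs.
    """
    h_before, h_after = before.get(field), after.get(field)
    return bool(h_before) and h_before == h_after
```

Call `fetch_state_proof` once before the restart and once after recovery, then pass both results to `same_state`. A missing hash counts as a failure rather than a match.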

Full case study →

📊 Valori vs. Competitors

Feature	Pinecone	Weaviate	Valori
Crash Recovery	✓ (claimed)	✓ (claimed)	✅ Proven with cryptographic hash
State Verification	❌	❌	✅ Cryptographic proof via `/v1/proof/state`
Forensic Replay	❌	❌	✅ Event sourcing - replay any incident
Audit Compliance	Partial	Partial	✅ Full trail (HIPAA/SOC2 ready)
Multi-arch Determinism	❌	❌	✅ Identical on x86/ARM/WASM
Open Source	❌	✅	✅ AGPL-3.0
Pricing	Usage-based	Usage-based	Free (open source)

Valori's advantage: We're the only one that lets you verify recovery, not just hope it worked.

🚀 Quick Start

Install

# Clone the repository
git clone https://github.com/varshith-Git/Valori-Kernel.git
cd Valori-Kernel/python
pip install .

Use

from valori import Valori

client = Valori()
# Atomic Batch Insert
client.insert_batch([[0.1]*16, [0.2]*16]) 
# Search
results = client.search([0.1] * 16, k=5)

That's it. Simple embedded mode. No Docker. No Kubernetes.

Full documentation →

👥 Who Should Use Valori?

✅ You Need Valori If:

You're building AI for healthcare (HIPAA compliance requires audit trails)
You're building AI for finance (SOC2 audits need verifiable state)
You're building AI for legal (forensic replay of decisions)
You need to debug production incidents (replay exact state)
You deploy on multiple architectures (ARM, x86, WASM)

❌ You DON'T Need Valori If:

You need massive query-per-second scale (use Pinecone)
You don't care about crash recovery
You're okay trusting your vendor
You don't need audit compliance

⚡ Performance: Is Determinism Slow?

TL;DR: No. Fixed-point math has negligible overhead.

Benchmarks (SIFT1M dataset, MacBook Air M2)

Metric	Result	Status
Ingestion	1.24M vectors/sec	⚡ Production-ready
Search Accuracy	99% Recall@10	✅ State-of-the-art
Search Latency	0.47ms	⚡ Real-time
Snapshot Save	50ms (50K vectors)	✅ Fast checkpointing
Snapshot Load	33ms (50K vectors)	✅ Fast recovery

Verdict: In these benchmarks, determinism is effectively free. You get verifiability with negligible performance overhead.

🎯 Accuracy Benchmark

We benchmarked Valori's Q16.16 Fixed-Point Kernel against the SIFT1M Ground Truth.

Metric	Valori (Fixed-Point)	Target	Verdict
Recall@1	99.00%	>90%	🌟 State of the Art
Recall@10	99.00%	>95%	✅ Production Ready
Filter Accuracy	100.00%	100%	🎯 Strict Enforcement
Latency	0.47 ms	<1.0ms	⚡ Real-Time

Methodology: Ingested SIFT1M subset, built HNSW graph using integer-only arithmetic, queried against pre-computed ground truth integers.
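
For readers who want to reproduce the recall figures, here is a minimal sketch of one common Recall@k definition: the fraction of the true top-k neighbours that appear in the returned top-k, averaged over queries. This is an illustration; the `bench_recall` binary may compute the metric differently.

```python
from typing import Sequence


def recall_at_k(results: Sequence[Sequence[int]],
                ground_truth: Sequence[Sequence[int]],
                k: int) -> float:
    """Mean fraction of the true top-k neighbours found in the returned top-k."""
    if not results or len(results) != len(ground_truth):
        raise ValueError("need one non-empty result list per ground-truth list")
    total = 0.0
    for found, truth in zip(results, ground_truth):
        expected = set(truth[:k])
        total += len(expected & set(found[:k])) / len(expected)
    return total / len(results)
```

Each inner list holds record IDs ordered from nearest to farthest, one list per query.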

Key Features

1. Event-Sourced Architecture

Every operation is logged to an immutable event log
State is deterministic - replay events = identical result
Forensic debugging - reproduce exact production state
Audit trail - full history of all changes
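
The replay property can be sketched in a few lines. This is a toy model, not Valori's kernel: the event shapes (`insert`, `delete`) are invented for illustration, and BLAKE2b from Python's standard library stands in for BLAKE3, which is not in the standard library.

```python
import hashlib
import json


def apply(state: dict, event: dict) -> None:
    """Apply one event to the state in place (toy event types, for illustration)."""
    if event["op"] == "insert":
        state[event["id"]] = event["vector"]
    elif event["op"] == "delete":
        state.pop(event["id"], None)
    else:
        raise ValueError(f"unknown op: {event['op']}")


def replay(events: list) -> dict:
    """Rebuild state from scratch by applying every logged event in order."""
    state: dict = {}
    for event in events:
        apply(state, event)
    return state


def state_hash(state: dict) -> str:
    """Hash a canonical serialization so equal states give equal digests."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.blake2b(canonical.encode(), digest_size=32).hexdigest()


# Vectors are stored as integers so that replay involves no float arithmetic.
log = [
    {"op": "insert", "id": 1, "vector": [6554, 13107]},
    {"op": "insert", "id": 2, "vector": [19661, 26214]},
    {"op": "delete", "id": 1},
]
```

Replaying `log` twice gives the same digest, and replaying a prefix such as `log[:2]` reconstructs the state as it was at that point, which is the basis for forensic debugging.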

2. Multi-Architecture Determinism

Valori uses Q16.16 Fixed-Point Arithmetic instead of standard f32 floats.

Bit-identical results on x86, ARM, WASM
No floating-point drift - integer operations yield identical results across CPUs
Cross-platform verified - tested on x86, ARM, and WASM
Benefits: Deploy anywhere, test once
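
To make the Q16.16 idea concrete, here is a minimal sketch in Python. It assumes round-to-nearest quantization and a plain shift after multiplication; the kernel's actual rounding, accumulator width, and overflow handling may differ, and Python's unbounded integers do not model 32-bit overflow.

```python
SCALE = 1 << 16  # Q16.16: 16 integer bits, 16 fractional bits


def to_fxp(x: float) -> int:
    """Quantize a float to Q16.16 (round to nearest, ties to even)."""
    return round(x * SCALE)


def fxp_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values; the shift drops the extra 16 fractional bits."""
    return (a * b) >> 16


def fxp_dot(u: list, v: list) -> int:
    """Dot product in pure integer arithmetic, returned as Q16.16."""
    return sum(fxp_mul(a, b) for a, b in zip(u, v))


def to_float(a: int) -> float:
    """Convert a Q16.16 value back to a float for display."""
    return a / SCALE
```

Once vectors are quantized, every distance computation is integer-only, so the result does not depend on the CPU's floating-point unit or on compiler reordering of float operations.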

3. Zero-Cost Tag Filtering

O(1) tag filtering via parallel arrays
100% accuracy - no false positives
Use case: Filter by user_id, tenant_id, document_type
Performance: No graph traversal overhead
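
A parallel-array layout can be sketched as follows. This is an illustration only: it uses a brute-force scan in place of the HNSW index, and the function name and signature are hypothetical. The point is that `tags[i]` sits at the same position as `vectors[i]`, so the filter is one indexed lookup per candidate.

```python
def filtered_search(query, vectors, tags, wanted_tag, k):
    """Brute-force top-k restricted to records whose tag matches.

    `vectors[i]` and `tags[i]` describe the same record (parallel arrays),
    so the tag check is a single list index per candidate.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        if tags[idx] != wanted_tag:      # O(1) lookup, no extra structure walked
            continue
        dist = sum((a - b) ** 2 for a, b in zip(query, vec))
        scored.append((dist, idx))
    scored.sort()
    return [idx for _, idx in scored[:k]]
```

Because non-matching records are rejected before scoring, a record with the wrong tag can never appear in the results.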

4. Metadata & Knowledge Graph

Zero-Cost Filtering: Filter searches by tag (u64) with O(1) overhead
Strict Enforcement: 100% accuracy without graph traversal penalties
Metadata: Attach optional binary metadata (up to 64KB) per record
Knowledge Graph: Create nodes and edges for complex relationships

5. Crash Recovery & Durability

WAL & Event Log: Every operation is synced to disk via length-prefixed logs
Batch Ingestion: Atomic commits for high-throughput bulk inserts
Snapshots: Instant checkpointing and restoration
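
Length-prefixed framing is what lets a reader detect a record that was cut off by a crash. The sketch below assumes a 4-byte little-endian length header; Valori's actual on-disk layout (header width, checksums, Bincode payloads) is not shown here and may differ.

```python
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian length (assumed width)


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length."""
    return HEADER.pack(len(payload)) + payload


def decode_records(buf: bytes) -> list:
    """Return every complete record; stop at a truncated tail (torn write)."""
    records, pos = [], 0
    while pos + HEADER.size <= len(buf):
        (length,) = HEADER.unpack_from(buf, pos)
        start, end = pos + HEADER.size, pos + HEADER.size + length
        if end > len(buf):
            break  # incomplete final record, e.g. crash mid-write
        records.append(buf[start:end])
        pos = end
    return records
```

On recovery, a reader built this way keeps every fully written record and ignores a partial one at the end of the log.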

6. Flexible Deployment

Embedded (Python FFI): Link directly into Python for microsecond latency
HTTP Server: Run as standalone node with REST API
Bare Metal: no_std compatible for ARM Cortex-M embedded systems
Replication: Leader-follower for read scaling

📚 Documentation

Node API Reference - HTTP endpoints (/health, /v1/memory/...)
Python SDK Guide - Valori & ProtocolClient usage
FFI Internals - Rust ↔ Python bridge
Architecture Deep Dive - Kernel design, Fxp Math, State Machine
Crash Recovery Case Study - Production proof

🛠️ Setup

Prerequisites

Rust 1.70+ (rustup recommended)
Python 3.8+ (for Python bindings, optional)

Quick Start

Clone the repository:

git clone https://github.com/varshith-Git/Valori-Kernel.git
cd Valori-Kernel

Download benchmark dataset (optional):

chmod +x scripts/download_data.sh
./scripts/download_data.sh

Build and test:

cargo build --release
cargo test --workspace --exclude valori-embedded

Run benchmarks:

cargo run --release --bin bench_recall
cargo run --release --bin bench_ingest
cargo run --release --bin bench_filter

📡 HTTP Server (Production Mode)

Run Valori as a standalone node.

Start Server:

cargo run --release -p valori-node
# Server listening on 0.0.0.0:3000

Client Usage:

from valori import Valori

# Connect to Remote Server
client = Valori(remote="http://localhost:3000")

# Atomic Batch Insert
batch = [[0.1] * 16, [0.2] * 16, [0.3] * 16]
ids = client.insert_batch(batch)
print(f"Batch inserted IDs: {ids}")

# Search
hits = client.search([0.1] * 16, k=1)

Replication & Clustering

Valori supports Leader-Follower Replication.

Running a Leader

# Default (Leader)
cargo run --release -p valori-node

Running a Follower

Followers stream the WAL/Event Log from the leader and maintain an identical in-memory replica.

VALORI_REPLICATION_MODE=follower \
VALORI_LEADER_URL=http://localhost:3000 \
VALORI_HTTP_PORT=3001 \
cargo run --release -p valori-node

The follower will:

Bootstrap: Download a snapshot from the leader
Stream: Replay the WAL/Event Log in real-time
Cross-Check: Verify state hashes to ensure zero divergence
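
The three follower steps can be modelled in-process. This is a simplified sketch with no networking: the `Leader` and `Follower` classes are hypothetical, and BLAKE2b from the standard library stands in for the real state hash.

```python
import copy
import hashlib
import json


def digest(state: dict) -> str:
    """Hash a canonical JSON form of the state."""
    blob = json.dumps(state, sort_keys=True).encode()
    return hashlib.blake2b(blob, digest_size=32).hexdigest()


class Leader:
    def __init__(self):
        self.state, self.log = {}, []

    def commit(self, key: str, value: list) -> None:
        self.log.append((key, value))
        self.state[key] = value

    def snapshot(self):
        """Return a copy of the state plus the log offset it corresponds to."""
        return copy.deepcopy(self.state), len(self.log)


class Follower:
    def __init__(self, leader: Leader):
        self.leader = leader
        self.state, self.offset = leader.snapshot()           # 1. bootstrap

    def catch_up(self) -> None:
        for key, value in self.leader.log[self.offset:]:      # 2. stream
            self.state[key] = value
        self.offset = len(self.leader.log)
        if digest(self.state) != digest(self.leader.state):   # 3. cross-check
            raise RuntimeError("follower diverged from leader")
```

The snapshot records the log offset it was taken at, so the follower only replays events committed after that point.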

📊 Observability

Valori exposes Prometheus metrics at /metrics.

Key Metrics:

valori_events_committed_total: Total events persisted
valori_batch_commit_duration_seconds: Latency of batch commits
valori_replication_lag: Seconds behind leader (on followers)
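
If you want to read these metrics without a Prometheus server, the text exposition format is simple enough to parse by hand. The sketch below handles only unlabelled samples, and the sample values are made up for illustration.

```python
def parse_metrics(text: str) -> dict:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):    # skip blanks, HELP and TYPE lines
            continue
        name, _, rest = line.partition(" ")
        if "{" in name or not rest:             # labelled series are out of scope here
            continue
        samples[name] = float(rest.split()[0])  # ignore optional timestamp
    return samples


# Illustrative scrape body; the numbers are invented.
sample = """\
# HELP valori_events_committed_total Total events persisted
# TYPE valori_events_committed_total counter
valori_events_committed_total 1024
valori_replication_lag 0.25
"""

metrics = parse_metrics(sample)
```

A health check on a follower could then alert when `metrics["valori_replication_lag"]` exceeds a threshold you choose.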

📐 Architecture

Event Sourcing Pipeline

[Request] -> [Batch Buffer] -> [Shadow Execute (Validation)] 
                                     |
                                     v
                             [Append to Event Log (fsync)]
                                     |
                                     v
                             [Update In-Memory Kernel]
                                     |
                                     v
                             [Update Index (HNSW)]
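
The ordering in the diagram above can be sketched as a toy node. This is not Valori's implementation: the `Node` class is hypothetical, a Python list stands in for the fsynced event log, and a list of IDs stands in for the HNSW index. It shows why validating first makes a batch atomic: a bad item is rejected before anything is logged or mutated.

```python
import json

DIM = 2  # toy dimensionality


class Node:
    def __init__(self):
        self.log = []      # stand-in for the fsynced event log
        self.kernel = {}   # id -> vector
        self.index = []    # stand-in for the HNSW index (just a list of ids)

    def _shadow_execute(self, batch):
        """Validate the whole batch without touching state; raise on any bad item."""
        next_id = len(self.kernel)
        staged = []
        for vec in batch:
            if len(vec) != DIM:
                raise ValueError(f"expected {DIM} dims, got {len(vec)}")
            staged.append((next_id, list(vec)))
            next_id += 1
        return staged

    def insert_batch(self, batch):
        staged = self._shadow_execute(batch)   # nothing mutated yet
        self.log.append(json.dumps(staged))    # durable append comes first
        for rec_id, vec in staged:
            self.kernel[rec_id] = vec          # then in-memory state
            self.index.append(rec_id)          # then the search index
        return [rec_id for rec_id, _ in staged]
```

Because the log append happens before the in-memory update, a crash between the two steps loses nothing: replaying the log on restart rebuilds the kernel and index.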

Tech Stack

Kernel: Pure Rust, no_std, Q16.16 Fixed Point
Storage: Append-only Logs (Bincode serialized)
Network: Axum (HTTP), Tokio (Async)
Interface: PyO3 (Python FFI)

🛠️ Development

Build:

cargo build --release --workspace

Test:

# Unit & Integration Tests
cargo test --workspace

# Batch Ingestion Verification
cargo test -p valori-node --test api_batch_ingest

# Replication Verification
cargo test -p valori-node --test api_replication

Python FFI Dev:

cd python
pip install -e .
python test_valori_integrated.py

⭐ Star History

If you find Valori useful, please star the repository! It helps others discover the project.

🔬 Research & Citations

Valori is based on research into deterministic substrates, published as an arXiv preprint.

Paper: Deterministic Memory: A Substrate for Verifiable AI Agents

@article{valori2025deterministic,
  title={Deterministic Memory: A Substrate for Verifiable AI Agents},
  author={Valori Research Team},
  journal={arXiv preprint arXiv:2512.22280},
  year={2025}
}

🏢 Enterprise Support

Need help deploying Valori in production?

Production deployment consulting
Custom compliance implementations (SOC2, HIPAA)
Priority bug fixes & SLAs
Forensic analysis tools (Deterministic Evaluator, Compliance Packs)

Contact: varshith.gudur17@gmail.com

📄 License

AGPL-3.0 - See LICENSE

Core features are free forever. Enterprise extensions available commercially.