feat: Phase 2 — levels with Arc<Run> and left-right read path
Refactor Level<I> to store runs behind Arc<Run<I>>, enabling cheap
cloning for the optional left-right read path.
Changes:
- level.rs: runs: BTreeMap<RunId, Arc<Run<I>>>; manual Clone impl;
add_run takes Arc<Run<I>>; get_run returns Option<Arc<Run<I>>>;
iterators still expose (RunId, &Run<I>) via deref for zero churn
at call sites.
- lr_levels.rs (new): LevelOp<I>::SetAll + Absorb impl for
Vec<Level<I>>; sync_with clones only Arc handles (O(num_runs)
pointer copies, not run data). 4 unit tests including roundtrip.
- mod.rs: cfg-gated levels_lr_writer + levels_factory fields;
publish_levels() helper publishes snapshot while write lock is
held; read paths in get() and level_stats() use factory handle
when feature is enabled; publish called from flush_write_buffer,
insert (merger finalization), and handle_command.
- query.rs: merge_range and merge_prefix_scan use levels_factory
handle (wait-free) when left-right-lsm feature is enabled.
- adaptive.rs: wrap new Run<I> with Arc::new() at add_run call sites.
- benches/left_right_lsm.rs: add bench_level_reads group comparing
RwLock vs left-right at 1/2/4/8/16 reader threads.
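For reviewers, here is a minimal sketch of why the Arc-based layout makes syncing cheap. Run, RunId, and sync_levels are simplified, hypothetical stand-ins: the real sync_with lives in the Absorb impl in lr_levels.rs, and the real types carry index data. The sketch shows two things. Cloning a Level copies Arc handles, not runs. A manual Clone impl avoids the I: Clone bound that #[derive(Clone)] would add.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical stand-ins for the crate's RunId and Run<I>; real types differ.
type RunId = u64;

pub struct Run<I> {
    pub entries: Vec<I>, // placeholder payload; a real run wraps an index
}

pub struct Level<I> {
    runs: BTreeMap<RunId, Arc<Run<I>>>,
}

// Manual Clone: #[derive(Clone)] would require `I: Clone`, but cloning an
// Arc<Run<I>> only bumps a reference count and never clones the run itself.
impl<I> Clone for Level<I> {
    fn clone(&self) -> Self {
        Level { runs: self.runs.clone() }
    }
}

impl<I> Level<I> {
    pub fn new() -> Self {
        Level { runs: BTreeMap::new() }
    }

    pub fn add_run(&mut self, id: RunId, run: Arc<Run<I>>) {
        self.runs.insert(id, run);
    }

    pub fn get_run(&self, id: RunId) -> Option<Arc<Run<I>>> {
        self.runs.get(&id).cloned()
    }

    // Iterators still hand out plain references: Arc<Run<I>> derefs to Run<I>.
    pub fn iter(&self) -> impl Iterator<Item = (RunId, &Run<I>)> {
        self.runs.iter().map(|(id, run)| (*id, &**run))
    }
}

// Hypothetical analogue of sync_with: make the reader copy match the writer.
// Cost is O(num_runs) pointer copies; no run data is duplicated.
pub fn sync_levels<I>(reader: &mut Vec<Level<I>>, writer: &[Level<I>]) {
    reader.clear();
    reader.extend(writer.iter().cloned());
}
```

After a sync, the writer's and reader's handles point at the same allocation, so readers see the writer's runs without any run being copied.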
Test results:
default: 1209 passed
--features left-right-lsm: 1219 passed (10 more than default, all feature-gated; includes the new lr_levels tests)
A high-performance, formally-verified database storage engine written in Rust
Aether DB is an ACID-compliant database storage engine featuring:
- Persistent B+ Tree with buffer manager integration
- Write-Ahead Logging (WAL) with Taurus algorithm and formal TLA+ verification
- ARIES Recovery Protocol for crash recovery with redo/undo
- LeanStore-inspired buffer manager with 24ns hot path
- Multiple index types: B+ Tree, Skiplist, RAX radix tree
- LSM tree framework with pluggable compaction strategies
- ACID transactions with savepoints and two-phase commit
- Formal verification via TLA+ specifications
Quick Start
Add to your Cargo.toml:
[dependencies]
aether = "1.0"
Basic Example
use aether::{Db, KvStore};
use tempfile::TempDir;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dir = TempDir::new()?;

    // Key-value API
    let mut kv = KvStore::open(dir.path().join("kv.db"))?;
    kv.put(b"hello", b"world")?;
    assert_eq!(kv.get(b"hello")?, Some(b"world".to_vec()));
    Ok(())
}
See aether/examples/ for more demonstrations including B+ Tree, Skiplist, RAX radix tree, LSM tree, concurrent access, and WAL recovery.
Workspace Structure
Aether is organized as a Cargo workspace with multiple applications:
- aether/ - Core database engine (library + CLI)
- cask/ - Redis-compatible KV store
- cash/ - Memcache-compatible server
- ftdb/ - Financial transactions database
See WORKSPACE.md for detailed workspace documentation including cargo-hakari integration and development workflow.
Features
Management Tools
- Unified CLI: Single aether command for all database operations
- Interactive TUI: Full-screen terminal interface with vim-like navigation
- 15 Management Commands: stat, checkpoint, backup, restore, compact, inspect, printlog, verify, deadlock, tune, recover, upgrade, dump, load, archive
- Multiple Output Formats: Plain text, JSON, and table formats for all commands
- Scriptable: Consistent exit codes and machine-readable output for automation
Index Types
- B+ Tree (btree): Concurrent B+ tree with lock crabbing, range scans, prefix searches, automatic node splitting/merging, and a free-page chain for space reuse.
- Skiplist (skiplist): Lock-free concurrent skiplist with WAL integration and range and prefix scans, suitable for write-heavy workloads.
- RAX Radix Tree (rax): Compressed radix tree for string keys with prefix search, iterator-based traversal, and a persistent (buffer-managed) variant.
- Generic Index Trait (index): UnifiedIndex and OrderedIndex traits allow plugging any index type into GenericKvStore.
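To illustrate the idea behind GenericKvStore, here is a sketch of a store that is generic over its index. OrderedIndexSketch, BTreeMapIndex, and KvStoreSketch are hypothetical simplifications, not aether's UnifiedIndex/OrderedIndex API, whose signatures may differ. A std BTreeMap stands in for a real index backend.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified ordered-index trait (not aether's real traits).
trait OrderedIndexSketch {
    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>);
    fn get(&self, key: &[u8]) -> Option<&[u8]>;
    fn prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)>;
}

// One possible backend: std's BTreeMap keeps keys sorted, so a prefix scan
// is a range scan that stops at the first key without the prefix.
struct BTreeMapIndex(BTreeMap<Vec<u8>, Vec<u8>>);

impl OrderedIndexSketch for BTreeMapIndex {
    fn insert(&mut self, key: Vec<u8>, value: Vec<u8>) {
        self.0.insert(key, value);
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        self.0.get(key).map(|v| v.as_slice())
    }

    fn prefix(&self, prefix: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.0
            .range(prefix.to_vec()..)
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.clone(), v.clone()))
            .collect()
    }
}

// A store that works with any backend implementing the trait.
struct KvStoreSketch<I: OrderedIndexSketch> {
    index: I,
}

impl<I: OrderedIndexSketch> KvStoreSketch<I> {
    fn put(&mut self, k: &[u8], v: &[u8]) {
        self.index.insert(k.to_vec(), v.to_vec());
    }

    fn get(&self, k: &[u8]) -> Option<&[u8]> {
        self.index.get(k)
    }

    fn scan_prefix(&self, p: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
        self.index.prefix(p)
    }
}
```

Swapping the backend means supplying a different type that implements the trait. The store code itself does not change.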
LSM Tree Framework
- Pure HanoiDB Implementation: True 2-way fractional cascading with constant 2.0x write amplification
- Incremental Merging: Compaction work distributed across writes for stable latency (<2µs p99)
- Work Budget System: Adaptive merge scheduling with automatic resumption across operations
- Bloom & SuRF Filters: Negative lookup acceleration with range query support
- Smart Sizing: Automatic level capacity adjustment based on data size
- Multiple Merge Strategies: Fast, Predictable, and HanoiDB compaction strategies
- Any Index + OrderedIndex implementor can serve as an LSM level
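Incremental merging can be pictured as a merge cursor that does a bounded amount of work per call and resumes where it left off. The sketch below is a hypothetical, in-memory simplification, not aether's merger: IncrementalMerge, step, and the (u64, String) entry type are all assumptions. It assumes each run is sorted with unique keys and lets the newer run win when both runs contain the same key.

```rust
/// Hypothetical sketch: merges two sorted runs a bounded amount at a time.
/// Assumes keys are unique within each run; the newer run wins on ties.
struct IncrementalMerge {
    newer: Vec<(u64, String)>,
    older: Vec<(u64, String)>,
    i: usize, // cursor into `newer`
    j: usize, // cursor into `older`
    out: Vec<(u64, String)>,
}

impl IncrementalMerge {
    fn new(newer: Vec<(u64, String)>, older: Vec<(u64, String)>) -> Self {
        IncrementalMerge { newer, older, i: 0, j: 0, out: Vec::new() }
    }

    fn is_done(&self) -> bool {
        self.i == self.newer.len() && self.j == self.older.len()
    }

    /// Emits at most `budget` merged entries, then reports whether the merge
    /// is complete. Cursor state persists, so the next call resumes here.
    fn step(&mut self, budget: usize) -> bool {
        for _ in 0..budget {
            let next = match (self.newer.get(self.i), self.older.get(self.j)) {
                (None, None) => break,
                (Some(a), None) => {
                    self.i += 1;
                    a.clone()
                }
                (None, Some(b)) => {
                    self.j += 1;
                    b.clone()
                }
                (Some(a), Some(b)) if a.0 < b.0 => {
                    self.i += 1;
                    a.clone()
                }
                (Some(a), Some(b)) if b.0 < a.0 => {
                    self.j += 1;
                    b.clone()
                }
                (Some(a), Some(_)) => {
                    // Same key in both runs: keep the newer value, drop the older.
                    self.i += 1;
                    self.j += 1;
                    a.clone()
                }
            };
            self.out.push(next);
        }
        self.is_done()
    }
}
```

Calling step with a small budget on each write spreads a merge across many operations. This is the general idea behind bounding the merge cost paid by any single write.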
Storage Engine
- Buffer Manager: LeanStore-style caching with swizzled pointers (HOT/COOL/EVICTED states), 24ns hot path access, clock eviction, background writeback, and buffer access strategies for scan isolation.
- Write-Ahead Logging: Taurus WAL algorithm with lock-free per-thread streams, atomic LSN allocation, and formal TLA+ verification.
- ARIES Recovery: Full crash recovery with analysis, redo, undo, and CLR generation. Handles crash-during-recovery.
- Overflow Pages: Transparent large value support with chained overflow pages.
- Value Compression (optional, zstd feature): Zstd compression with dictionary training.
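The atomic LSN allocation mentioned for the WAL can be sketched with a single fetch_add. Each writer reserves a contiguous range of log bytes, and the returned offset serves as its LSN. This is a hypothetical illustration of the general technique, not the Taurus algorithm or aether's WAL code. LsnAllocator, reserve, and current are made-up names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical sketch of atomic LSN allocation via a shared byte counter.
struct LsnAllocator {
    next: AtomicU64,
}

impl LsnAllocator {
    fn new(start: u64) -> Self {
        LsnAllocator { next: AtomicU64::new(start) }
    }

    /// Reserves `len` bytes and returns the LSN (start offset) of the record.
    /// fetch_add is a single atomic read-modify-write, so concurrent callers
    /// always receive distinct, non-overlapping ranges. Relaxed ordering is
    /// enough for uniqueness; publishing the record bytes to other threads
    /// would need stronger ordering elsewhere.
    fn reserve(&self, len: u64) -> u64 {
        self.next.fetch_add(len, Ordering::Relaxed)
    }

    /// Next LSN that would be handed out.
    fn current(&self) -> u64 {
        self.next.load(Ordering::Relaxed)
    }
}
```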
Transaction Support
- ACID transaction semantics with begin/commit/abort
- Nested transactions with savepoints
- Two-phase commit (2PC) for distributed transactions
- Lock manager with hierarchical locking (IS/IX/S/SIX/X)
- Deadlock detection via wait-for-graph
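The IS/IX/S/SIX/X modes follow the standard multi-granularity locking scheme. Its textbook compatibility matrix can be written as a small function. LockMode and compatible are hypothetical names, and it is an assumption that aether's lock manager uses exactly this table.

```rust
/// Hierarchical lock modes, as listed above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum LockMode {
    IS,
    IX,
    S,
    SIX,
    X,
}

/// Textbook multi-granularity compatibility matrix: returns true if another
/// transaction may be granted `requested` while `held` is in place.
fn compatible(held: LockMode, requested: LockMode) -> bool {
    use LockMode::*;
    match held {
        IS => requested != X,               // intention-shared conflicts only with X
        IX => matches!(requested, IS | IX), // intention modes coexist
        S => matches!(requested, IS | S),   // readers coexist with readers
        SIX => requested == IS,             // S + IX: only IS is still allowed
        X => false,                         // exclusive blocks everything
    }
}
```

The matrix is symmetric. A lock manager grants a request only if it is compatible with every mode already held on the resource by other transactions.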
Formal Verification
TLA+ specifications verify:
- LSN uniqueness and monotonicity
- No data loss on crash
- Valid buffer positions
- Concurrent safety
Architecture
+-------------------------------------------+
| API Layer (KvStore, DbEnv)                |
+-------------------------------------------+
| Index Layer (B+Tree, Skiplist, RAX)       |
+-------------------------------------------+
| LSM Framework (optional)                  |
+-------------------------------------------+
| Transactions & Recovery (ARIES)           |
+-------------------------------------------+
| WAL (Taurus with TLA+ verification)       |
+-------------------------------------------+
| Buffer Manager (LeanStore, Swizzling)     |
+-------------------------------------------+
| Page Layer (Slotted, Fixed, BTree)        |
+-------------------------------------------+
| File Storage (4KB Pages)                  |
+-------------------------------------------+
See docs/ARCHITECTURE.md for detailed design.
Performance
Buffer Manager:
- Hot path: 24ns (41M ops/s)
- Cold path: ~100ns (hash table lookup)
- Evicted path: ~10µs (disk I/O)
WAL Throughput:
- Lock-free per-thread streams with atomic LSN allocation
- Linear scaling with thread count (16+ threads optimal)
- Group commit for 5-10x throughput under concurrent load
LSM Performance (Pure HanoiDB):
- p99 Latency: 1-2µs (2500x better than expected)
- p999 Latency: 4-22µs (500x better than expected)
- Write Amplification: Exactly 2.00x (constant, not variable)
- Throughput Stability: <2x degradation with incremental merging
- Max Latency: <1ms (no spikes with distributed compaction)
Recovery:
- Scales linearly with log size
Building from Source
git clone https://codeberg.org/gregburd/aether.git
cd aether
cargo build --release
cargo test
cargo bench
cargo clippy -- -D warnings
cargo fmt -- --check
Requirements
- Rust 1.75+ (MSRV)
- TLC model checker (optional, for TLA+ verification)
Testing
Current status: 870 tests passing, 3 ignored (flaky concurrency tests)
cargo test # All tests
cargo test --lib # Library tests (870 passing)
cargo test --lib btree::tests # B+tree unit tests
cargo test --lib recovery::tests # Recovery unit tests
cargo test --test integration_test # End-to-end integration
cargo test --test proptest_btree # Property-based B+tree tests
cargo test --test index_integration # Multi-index integration
cargo test --test rax_tests # RAX radix tree tests
cargo test --test lsm_tests # LSM tree tests
Bitrot Prevention Tests:
# Fast sanity checks (verify files exist)
cargo test --test examples_test test_all_examples_exist
cargo test --test cli_tools_test test_all_cli_tools_exist
# Full tests (compile and run all examples and CLI tools)
cargo test --test examples_test -- --ignored
cargo test --test cli_tools_test -- --ignored
See docs/TESTING_EXAMPLES_AND_CLI.md for details.
Examples:
# Core Examples
cargo run --example basic_btree # B+tree persistence and restart
cargo run --example skiplist_concurrent # Concurrent skiplist with readers/writers
cargo run --example rax_prefix # RAX prefix matching and iterators
cargo run --example lsm_usage # LSM tree with BTree/Skiplist/RAX backends
cargo run --example transactions # ACID transactions: commit, abort, isolation
cargo run --example generic_index # Generic Index trait programming
cargo run --example kv_store # Key-value store with range/prefix scans
cargo run --example wal_recovery # WAL recovery demo
# Production Examples
cargo run --release --bin cash # Memcache-compatible server with LSM persistence
cargo run --release --bin cask # Redis-compatible server with transactions
cargo run --release --bin ftdb # TigerBeetle-like financial transactions database
cargo run --release --example lsm_visualization # Real-time LSM visualization (HanoiDB)
# Getting Started (C-compatible FFI examples)
./examples/getting_started/gsg_001_hello # Hello, Aether!
./examples/getting_started/gsg_010_group_commit # Group commit optimization
./examples/getting_started/gsg_015_adaptive_lsm # Adaptive LSM mode switching
./examples/getting_started/gsg_020_full_stack # Production-ready configuration
Benchmarks (cargo bench builds with optimizations by default):
cargo bench --bench buffer_manager # Buffer latency
cargo bench --bench wal_bench # WAL throughput
cargo bench --bench btree_bench # B+tree operations
cargo bench --bench index_comparison # Index type comparison
cargo bench --bench skiplist_bench # Skiplist operations
cargo bench --bench rax_bench # RAX operations
Unified CLI and TUI
The aether command provides a unified interface to all database management utilities:
# Launch interactive TUI (default)
aether
# Or use CLI commands directly
aether stat # Statistics and monitoring
aether checkpoint --once # Manual checkpoint
aether verify # Integrity verification
aether recover # Crash recovery
aether printlog # WAL inspection
# Global options work across all commands
aether --home /var/db/myapp stat
aether --format json stat
aether --verbose checkpoint --once
See docs/CLI.md for complete CLI reference and docs/TUI.md for TUI guide.
Legacy Commands (Deprecated)
Legacy db_* commands are still available but deprecated (scheduled for removal in v2.0):
db_stat mydb.db # Use: aether stat
db_checkpoint mydb.db # Use: aether checkpoint --once
See docs/CLI_MIGRATION.md for migration guide.
Configuration
Tune buffer pool and index behavior:
use aether::buffer::BufferConfig;

let config = BufferConfig {
    num_frames: 4096,           // 16MB buffer pool (4096 * 4KB)
    enable_page_provider: true, // Background pre-fetch
    ..BufferConfig::default()
};
See docs/user-guide/configuration-reference.md for all parameters.
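As a quick sizing aid, the frame count is the memory budget divided by the page size, as the 4096 * 4KB comment above shows. frames_for_budget is a hypothetical helper, not part of BufferConfig. It assumes the 4KB (4096-byte) pages described in the architecture section.

```rust
/// Page size assumed from the README's "4KB Pages" (4096 bytes).
const PAGE_SIZE: usize = 4096;

/// Hypothetical helper (not part of aether's API): how many buffer frames
/// fit in `budget_bytes`, rounding down to whole pages.
fn frames_for_budget(budget_bytes: usize) -> usize {
    budget_bytes / PAGE_SIZE
}
```

For example, a 16 MiB budget gives the 4096 frames used above, and a 1 GiB budget gives 262,144 frames. Either value would then be passed as num_frames.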
Documentation
Getting Started:
- Getting Started Guide - Complete tutorial from basics to production
- Examples Guide - All examples with detailed explanations
- Features Reference - Comprehensive feature documentation
Performance & Production:
- Performance Tuning Guide - Optimization for different workloads
- Production Checklist - Deployment and operations guide
Architecture & Development:
- Architecture Overview - System design and internals
- Contributing Guide - How to contribute
- Testing Guide - Testing examples and CLI tools
Legacy Documentation (being consolidated):
- User Guide - Individual topic guides
- CLI Tools - Unified CLI reference
- TUI Guide - Interactive terminal interface
Browse all documentation: docs/
Inspiration
Aether DB synthesizes ideas from several influential systems:
- log-buffer: Original inspiration for WAL design
- ARIES Paper (Mohan et al.): Recovery protocol implementation
- LeanStore Paper (Leis et al.): Buffer manager with swizzled pointers
- Redis RAX: Radix tree implementation (BSD-3-Clause, see src/rax/LICENSE-REDIS-RAX)
- HanoiDB: LSM merge strategy
See docs/INSPIRATION.md for detailed attribution.
Roadmap
v1.0.0 (Complete)
- Persistent B+ Tree with WAL logging
- ARIES recovery protocol
- Buffer manager with swizzled pointers
- Taurus WAL algorithm with formal TLA+ verification
- ACID transaction support
- Group commit, checkpointing, compression
v1.1.0 (Complete)
- Multiple index types (Skiplist, RAX radix tree)
- Generic index trait hierarchy
- LSM tree framework with pluggable compaction
- Persistent index variants with buffer manager integration
- WAL integration for skiplist and RAX operations
- Major refactoring: modularized the codebase, splitting up 25 files that exceeded 500 lines
- Enhanced test coverage (870 passing tests)
- Comprehensive documentation (2,192 doc comments)
- 32 working examples demonstrating all features
- Production-ready code quality
v1.2.0 (Complete)
- Unified aether CLI command replacing 15 separate utilities
- Interactive TUI with vim-like navigation
- Consistent output formats (plain, JSON, table)
- Enhanced CLI framework with shared utilities
- Complete documentation (CLI.md, TUI.md, CLI_MIGRATION.md)
- Backward compatibility with legacy db_* commands
v2.0.0 (Planned)
- MVCC/Snapshot Isolation
- Lock-free read path with EBR
- Buffer pool partitioning
- SQL query layer
License
Licensed under the MIT License. See LICENSE for details.
The RAX radix tree implementation is derived from antirez/rax
and is licensed under BSD-3-Clause. See src/rax/LICENSE-REDIS-RAX for the full license text.
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Acknowledgments
- C. Mohan et al. for the ARIES recovery protocol
- Viktor Leis et al. for LeanStore buffer management
- Sunny Bains for log-buffer inspiration
- Salvatore Sanfilippo for the RAX radix tree
- The Rust community for excellent systems programming tools
Citation
If you use Aether DB in your research, please cite:
@software{aetherdb2026,
  title = {Aether DB: A Formally-Verified Database Storage Engine},
  author = {Greg Burd},
  year = {2026},
  url = {https://codeberg.org/gregburd/aether}
}
Support
- Issues: issue tracker on Codeberg (https://codeberg.org/gregburd/aether)
- Documentation: docs/
- Examples: examples/