EloqKV: Achieving Predictable P99.99 Latency on NVMe with Redis API

eloqdata.com

16 points by hubertzhang 2 months ago · 5 comments

hubertzhang (OP) 2 months ago

Most Redis alternatives that use disk for persistence struggle with tail latency (P99.99) because of background maintenance work or OS filesystem overhead. We built EloqKV on a custom storage engine, EloqStore, to solve this.

Key Architectural Choices:

- Custom B-tree Variant: Unlike LSM-trees used in many disk-backed stores, our B-tree variant avoids the "compaction stalls" that typically cause high tail latency during heavy writes.

- Coroutines & io_uring: We use io_uring for asynchronous I/O and coroutines to manage thousands of concurrent I/O requests without the context-switching overhead of a thread per request.

- Object Storage Integration (optional): EloqStore uses object storage as the primary persistent layer, with NVMe acting as a high-speed cache/tier, providing durability without sacrificing speed.
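To illustrate the coroutine side of the second point, here is a minimal Python sketch. It is not EloqStore code: Python's asyncio has no io_uring backend, so the hypothetical helper `fake_read` simulates device latency with `asyncio.sleep`, and `max_in_flight` is an assumed cap on outstanding requests. What it does show is thousands of requests in flight on a single OS thread, each one suspended at an `await` instead of blocking a thread.

```python
import asyncio
import random


async def fake_read(block_id: int) -> bytes:
    # Hypothetical stand-in for an async block read. A real engine would submit
    # an io_uring request here; asyncio.sleep only simulates device latency.
    await asyncio.sleep(random.uniform(0.0005, 0.002))
    return block_id.to_bytes(8, "little")


async def read_many(block_ids, max_in_flight=1024):
    # Cap outstanding requests, loosely like a fixed-size submission queue.
    sem = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(bid):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                # Suspending here is a cheap user-space switch between
                # coroutines, not a kernel thread context switch.
                return await fake_read(bid)
            finally:
                in_flight -= 1

    # All coroutines share one OS thread; gather returns results in input order.
    results = await asyncio.gather(*(one(b) for b in block_ids))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(read_many(range(10_000)))
    print(len(results), peak)
```

The semaphore keeps the number of concurrently outstanding reads bounded, which is the kind of back-pressure an engine needs so that a burst of requests cannot grow its queues without limit.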
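A rough sketch of the third point, under a deliberately simple design: a plain dict stands in for object storage (the source of truth) and a bounded LRU map stands in for the NVMe tier. `TieredStore`, `local_capacity`, and the write-through policy are assumptions for illustration only; the post does not say how EloqStore decides what stays on NVMe or how writes reach object storage.

```python
from collections import OrderedDict


class TieredStore:
    """Object storage is authoritative; a bounded local tier caches hot keys."""

    def __init__(self, object_store: dict, local_capacity: int):
        self.object_store = object_store  # hypothetical stand-in for S3-style storage
        self.local = OrderedDict()        # hypothetical stand-in for the NVMe tier
        self.capacity = local_capacity
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.local:
            # Fast path: served from the local tier; mark as most recently used.
            self.local.move_to_end(key)
            self.hits += 1
            return self.local[key]
        # Slow path: fetch from object storage, then keep a local copy.
        self.misses += 1
        value = self.object_store[key]
        self._admit(key, value)
        return value

    def put(self, key, value):
        # Assumed write-through: the write is durable once it is in object
        # storage, and the local tier keeps a copy for subsequent reads.
        self.object_store[key] = value
        self._admit(key, value)

    def _admit(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        # Evict least recently used keys locally; they remain in object storage.
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)
```

Because eviction only drops the local copy, losing the cache tier costs latency rather than data, which is the property the bullet describes.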

We’ve reached a point where we can provide predictable P99.99 latency even when the working set is primarily on NVMe. We’d love to answer any questions about the storage internals or our benchmarking process.
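On the P99.99 figure itself: one common way to compute it from raw latency samples is the nearest-rank method, sketched below in Python. The function name is hypothetical, and the post does not say which percentile definition their benchmarks use. A practical consequence of this definition is that P99.99 needs at least 10,000 samples before it can differ from the maximum.

```python
import math
from fractions import Fraction


def percentile_nearest_rank(samples, p):
    """Return the nearest-rank p-th percentile of `samples`.

    This is the smallest sample such that at least p% of all samples are <= it.
    """
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Fraction(str(p)) keeps 99.99 exact, avoiding float rounding in the rank.
    rank = math.ceil(Fraction(str(p)) * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in microseconds; real data would come from a benchmark run.
latencies_us = [120, 95, 110, 4000, 130, 101, 99, 250]
print(percentile_nearest_rank(latencies_us, 50))     # 110
print(percentile_nearest_rank(latencies_us, 99.99))  # 4000 (just the max: too few samples)
```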

the_precipitate 2 months ago

With DRAM prices this high, this is certainly a welcome feature. But how do you control write latency? B+ trees are pretty bad at updates, and LMDB, another B-tree-based store, is lightning fast on reads but quite bad on writes compared with RocksDB.

  • iamlintaoz 2 months ago

    The disk storage EloqKV uses (EloqStore [1]) is optimized for batch updates because the upper Data Substrate layer manages buffering and the Write-Ahead Log (WAL), absorbing writes and guaranteeing durability. When durability is not required, the WAL can be optionally disabled.

    [1] github.com/eloqdata/eloqstore

    Disclaimer: I am the CEO of EloqData
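The buffering described in this reply can be sketched roughly as follows. This is a toy model, not the Data Substrate API: `BufferedWriter`, `batch_size`, `replay_wal`, and the tab-separated log format are hypothetical, a dict stands in for the disk store, and keys and values are assumed to contain no tabs or newlines. Each write is appended to the WAL and fsync'd before it is buffered, so the store only ever receives whole batches; passing `wal_path=None` models the option of disabling the WAL when durability isn't needed.

```python
import os


class BufferedWriter:
    """Absorbs writes in memory and hands them to the store in batches."""

    def __init__(self, store: dict, wal_path=None, batch_size=64):
        self.store = store            # hypothetical stand-in for the disk store
        self.batch_size = batch_size
        self.pending = {}
        # wal_path=None models running with the WAL disabled.
        self.wal = open(wal_path, "ab") if wal_path else None

    def put(self, key: str, value: str):
        if self.wal is not None:
            self.wal.write(f"{key}\t{value}\n".encode())
            self.wal.flush()
            os.fsync(self.wal.fileno())  # durable before the write is acknowledged
        self.pending[key] = value        # repeated writes to a key coalesce
        if len(self.pending) >= self.batch_size:
            self.flush()

    def get(self, key):
        # Unflushed writes must still be visible to readers.
        if key in self.pending:
            return self.pending[key]
        return self.store.get(key)

    def flush(self):
        if self.pending:
            self.store.update(self.pending)  # one batched write to the store
            self.pending.clear()

    def close(self):
        self.flush()
        if self.wal is not None:
            self.wal.close()


def replay_wal(wal_path):
    """Rebuild the latest value per key from the log after a crash."""
    recovered = {}
    with open(wal_path, "rb") as f:
        for line in f:
            key, value = line.decode().rstrip("\n").split("\t", 1)
            recovered[key] = value
    return recovered
```

Calling fsync on every write is the simplest durable choice but is slow; real systems usually group-commit several writes per fsync and truncate the log once a batch is safely in the store. Both are omitted here.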

  • hubertzhang (OP) 2 months ago

    We leverage a batch-write optimization: a copy-on-write B-tree variant enables high-throughput batch writes without blocking concurrent reads, and an MVCC-based design eliminates lock contention and provides predictable write amplification.
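A minimal Python sketch of the copy-on-write idea with snapshot reads. For brevity it uses an unbalanced binary search tree rather than a B-tree, but the mechanism is the same in spirit: a write copies only the nodes on the path from the root to the changed key, and a batch becomes visible by swapping a single root reference. Readers take a `(version, root)` snapshot without locking and keep seeing that version while later batches are applied. `CowStore`, `apply_batch`, and `snapshot` are hypothetical names, not EloqStore's API.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Immutable node: once published, it is never modified in place.
    key: str
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root; only nodes on the search path are copied."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # overwrite existing key


def lookup(root, key):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


class CowStore:
    def __init__(self):
        # (version, root) kept in one tuple so readers see a consistent pair.
        self._head = (0, None)
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # Readers take no lock: a reference to an immutable root is enough.
        return self._head

    def apply_batch(self, items):
        with self._write_lock:
            version, root = self._head
            for key, value in items:
                root = insert(root, key, value)
            # Publish the whole batch at once with a single reference swap.
            self._head = (version + 1, root)
```

Old versions are reclaimed by Python's garbage collector once no reader holds them; a real engine has to track which pages live snapshots still reference before reusing them, and would normally copy each page at most once per batch rather than once per key as this sketch does.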
