Show HN: How do you reliably benchmark sub-100ns code paths in Rust

1 points by yousef06 23 days ago · 1 comment · 1 min read

Hi everyone,

I just finished the core version of a library called Cuttlefish, written entirely in Rust. It's a CRDT-inspired framework that uses io_uring, SIMD, zero-copy pipelines, and so on. Here's what it is:

Most distributed systems provide strong consistency, but the trade-off is latency. Cuttlefish is a coordination-free state kernel that preserves invariants and constraints at the speed of your L1 cache.

Correctness here is defined algebraically: if your operations commute, you don't need coordination. If they don't, you find out at admission time, in nanoseconds (or at least that's the goal).
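To make the commutativity idea concrete, here is a minimal sketch using a hypothetical `Op` type (not Cuttlefish's actual API): two operations commute when applying them in either order yields the same state. Increments always commute; overwrites generally don't. `commutes` is a conservative check based only on the operations, the kind of thing that could run at admission time, while `commutes_at` spot-checks one concrete starting state.

```rust
/// Hypothetical operation on an i64 register, for illustration only.
#[derive(Clone, Copy, Debug)]
enum Op {
    /// Increment by a delta; wrapping addition is commutative.
    Add(i64),
    /// Overwrite the value; order matters unless both writes agree.
    Set(i64),
}

/// Applies one operation to a state.
fn apply(state: i64, op: Op) -> i64 {
    match op {
        Op::Add(d) => state.wrapping_add(d),
        Op::Set(v) => v,
    }
}

/// Conservative admission-style check based only on the operations.
/// It may answer `false` for pairs that happen to commute (e.g. `Add(0)`
/// with a `Set`), but it never answers `true` for a pair that doesn't.
fn commutes(a: Op, b: Op) -> bool {
    match (a, b) {
        (Op::Add(_), Op::Add(_)) => true,
        (Op::Set(x), Op::Set(y)) => x == y,
        _ => false,
    }
}

/// Checks commutation from one concrete starting state by applying
/// the operations in both orders.
fn commutes_at(state: i64, a: Op, b: Op) -> bool {
    apply(apply(state, a), b) == apply(apply(state, b), a)
}
```

The conservative version is what you'd want on a hot path: it inspects only the operation kinds, so it costs a couple of comparisons and never admits a non-commuting pair without coordination.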

Running the full benchmark suite produced the following results:

- Full admission cycle: ~40 ns
- Kernel admit: ~13 ns
- Causal clock dominance: ~700 ps
- Tiered hash verification: ~280 ns
- Durable admission: ~5.2 ns
- WAL hash: ~230 ns
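One quick sanity check on numbers like these is to convert them to clock cycles: at a clock of f GHz, one nanosecond is f cycles. The sketch below assumes a round 5 GHz clock (an assumption; the actual frequency of a desktop CPU varies with boost and load). At that rate the ~700 ps figure is only about 3.5 cycles, so it's worth confirming, for example from the generated assembly, that the compiler didn't hoist or eliminate the work being timed.

```rust
/// Reported timings from the post, in nanoseconds.
const RESULTS_NS: [(&str, f64); 6] = [
    ("Full admission cycle", 40.0),
    ("Kernel admit", 13.0),
    ("Causal clock dominance", 0.7),
    ("Tiered hash verification", 280.0),
    ("Durable admission", 5.2),
    ("WAL hash", 230.0),
];

/// Converts nanoseconds to cycles: 1 GHz is one cycle per nanosecond.
fn ns_to_cycles(ns: f64, clock_ghz: f64) -> f64 {
    ns * clock_ghz
}

/// Returns (label, approximate cycles) for each reported timing
/// at the assumed clock frequency.
fn results_in_cycles(clock_ghz: f64) -> Vec<(&'static str, f64)> {
    RESULTS_NS
        .iter()
        .map(|&(label, ns)| (label, ns_to_cycles(ns, clock_ghz)))
        .collect()
}
```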

On my CPU (Ryzen 5 7600X), I measure about 40 ns for the full cycle including the causality check, but I'm not confident in my benchmark setup because most of it was written by AI. How are other people reliably measuring sub-100 ns Rust code paths? Repo: https://github.com/abokhalill/cuttlefish
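A common approach without external crates is to time a large batch of calls and divide, because a single `Instant::now()` read costs roughly tens of nanoseconds on many systems and would swamp a sub-100 ns operation. Below is a minimal sketch (the `bench_ns` name and its parameters are hypothetical, not part of the Cuttlefish repo) that uses `std::hint::black_box` to discourage the optimizer from deleting the call and reports the best per-iteration time across several samples. Established crates such as Criterion.rs do the same kind of batching with proper statistics.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `f` in batches of `iters` calls and returns the minimum
/// per-iteration time in nanoseconds across `samples` batches.
/// Batching amortizes the cost of reading the clock.
fn bench_ns<T, F: FnMut() -> T>(mut f: F, iters: u64, samples: usize) -> f64 {
    assert!(iters > 0 && samples > 0);
    // Warm-up batch: populate caches and branch predictors.
    for _ in 0..iters {
        black_box(f());
    }
    let mut best = f64::INFINITY;
    for _ in 0..samples {
        let start = Instant::now();
        for _ in 0..iters {
            // black_box marks the result as used so the call is not
            // optimized away.
            black_box(f());
        }
        let per_iter = start.elapsed().as_nanos() as f64 / iters as f64;
        if per_iter < best {
            best = per_iter;
        }
    }
    best
}
```

Passing inputs through `black_box` as well (e.g. `bench_ns(|| hash(black_box(&input)), 1_000_000, 20)`, where `hash` stands in for whatever function is under test) prevents constant folding, and timing an empty closure first shows how much of the result is harness overhead. Pinning the thread to one core and fixing the CPU frequency also reduces noise.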

verdverm 23 days ago

Not sure about Rust (I assume it's the same), but my PhD advisor gave me an inline C assembly snippet I could use for cycle-accurate benchmarking.

It read the CPU's hardware counters; it was something super basic, like reading those registers into a variable.
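In Rust, an equivalent of that kind of snippet doesn't need inline assembly: `std::arch::x86_64` exposes `_rdtsc` (read the time-stamp counter) and `_mm_lfence` as intrinsics. The sketch below is an illustration along those lines (the `read_tsc` and `ticks_per_iter` names are hypothetical). One caveat: on modern x86 CPUs with an invariant TSC, the counter ticks at a fixed reference rate rather than counting the core's actual cycles, so the results are TSC ticks, not cycles; true core cycles come from the performance counters (e.g. via Linux `perf stat`). On targets other than x86-64, the sketch falls back to a monotonic nanosecond clock.

```rust
use std::hint::black_box;

/// Reads the time-stamp counter on x86-64. The LFENCEs are meant to keep
/// out-of-order execution from moving work across the read; the exact
/// ordering guarantees differ between Intel and AMD.
#[cfg(target_arch = "x86_64")]
fn read_tsc() -> u64 {
    use std::arch::x86_64::{_mm_lfence, _rdtsc};
    // SAFETY: RDTSC and LFENCE (SSE2) are available on every x86-64 CPU.
    unsafe {
        _mm_lfence();
        let t = _rdtsc();
        _mm_lfence();
        t
    }
}

/// Fallback on other architectures: nanoseconds since the first call,
/// from the monotonic clock.
#[cfg(not(target_arch = "x86_64"))]
fn read_tsc() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Average counter ticks per call of `f` over `iters` calls.
fn ticks_per_iter<T>(mut f: impl FnMut() -> T, iters: u64) -> f64 {
    assert!(iters > 0);
    let start = read_tsc();
    for _ in 0..iters {
        black_box(f());
    }
    let end = read_tsc();
    end.wrapping_sub(start) as f64 / iters as f64
}
```

To turn TSC ticks into nanoseconds you would divide by the TSC frequency, which has to be obtained separately.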

---

You can probably take the above to a coding agent or LLM and get what you need back.
