Show HN: How do you reliably benchmark sub-100ns code paths in Rust?
Hi everyone,
I just finished the core version of a library called Cuttlefish, written entirely in Rust. It's a CRDT-inspired framework that packs in things like io_uring, SIMD, and zero-copy pipelines. Here's what it is:
Most distributed systems are strongly consistent, but the tradeoff is latency. Cuttlefish is a coordination-free state kernel that preserves invariants and constraints at the speed of your L1 cache.
Correctness here is defined as an algebraic property: if your operations commute, you don't need coordination. If they don't, you find out at admission time, in nanoseconds (or at least that's the intent).
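To make the commutativity idea concrete, here is a minimal sketch on a toy single-register state. Two operations commute if applying them in either order yields the same state for every starting state; an incoming operation can then be admitted without coordination if it commutes with everything already in flight. The names `Op`, `Admission`, `apply`, `commutes`, and `admit` are hypothetical and are not Cuttlefish's API; the post doesn't show how the library actually classifies operations.

```rust
/// Hypothetical operation on a single i64 register (not Cuttlefish's type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Add(i64), // increments commute with each other
    Set(i64), // overwrites the state
}

#[derive(Debug, PartialEq)]
enum Admission {
    CoordinationFree,
    NeedsCoordination,
}

fn apply(state: i64, op: Op) -> i64 {
    match op {
        Op::Add(d) => state.wrapping_add(d),
        Op::Set(v) => v,
    }
}

/// True if applying `a` then `b` gives the same state as `b` then `a`,
/// for every starting state.
fn commutes(a: Op, b: Op) -> bool {
    match (a, b) {
        (Op::Add(_), Op::Add(_)) => true,
        (Op::Set(x), Op::Set(y)) => x == y,
        // Set(v) then Add(d) yields v + d; Add(d) then Set(v) yields v.
        (Op::Set(_), Op::Add(d)) | (Op::Add(d), Op::Set(_)) => d == 0,
    }
}

/// Admit `op` without coordination only if it commutes with everything in flight.
fn admit(op: Op, in_flight: &[Op]) -> Admission {
    if in_flight.iter().all(|&other| commutes(op, other)) {
        Admission::CoordinationFree
    } else {
        Admission::NeedsCoordination
    }
}

fn main() {
    let in_flight = [Op::Add(3), Op::Add(-1)];
    println!("{:?}", admit(Op::Add(5), &in_flight)); // CoordinationFree
    println!("{:?}", admit(Op::Set(0), &in_flight)); // NeedsCoordination
}
```

Because `commutes` is a pure function of the two operations, a check like this can in principle run at admission time without touching shared state, which is what makes a nanosecond-scale decision plausible.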
Running the full benchmark suite gave the following results:
Full admission cycle: ~40 ns
Kernel admit: ~13 ns
Causal clock dominance: ~700 ps
Tiered hash verification: ~280 ns
Durable admission: ~5.2 ns
WAL hash: ~230 ns
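"Causal clock dominance" presumably means comparing vector clocks. Assuming a fixed-size vector clock with one counter per replica (the `VClock` type and the replica count of 4 below are hypothetical, not taken from the repo), dominance is an element-wise comparison: every entry is greater than or equal to the other clock's, and at least one is strictly greater. For scale, ~700 ps is roughly 3 to 4 cycles at the 7600X's ~5 GHz clocks, so it may be worth checking that the benchmark isn't timing a comparison the optimizer has already folded away.

```rust
/// Hypothetical fixed-size vector clock: one counter per replica.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct VClock<const N: usize>([u64; N]);

impl<const N: usize> VClock<N> {
    /// True if every entry of `self` is >= the matching entry of `other`
    /// and at least one is strictly greater (`self` happened after `other`).
    fn dominates(&self, other: &Self) -> bool {
        let all_ge = self.0.iter().zip(&other.0).all(|(a, b)| a >= b);
        let any_gt = self.0.iter().zip(&other.0).any(|(a, b)| a > b);
        all_ge && any_gt
    }

    /// Neither clock dominates the other and they differ: concurrent events.
    fn concurrent(&self, other: &Self) -> bool {
        self != other && !self.dominates(other) && !other.dominates(self)
    }
}

fn main() {
    let a = VClock([2, 1, 0, 0]);
    let b = VClock([1, 1, 0, 0]);
    let c = VClock([0, 2, 0, 0]);
    println!("a dominates b: {}", a.dominates(&b));
    println!("a concurrent with c: {}", a.concurrent(&c));
}
```

Using a const-generic array keeps the clock on the stack with a compile-time length, which gives the compiler the option to unroll the comparison.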
On my CPU (Ryzen 5 7600X) I measure ~40 ns for the full cycle, including the causality check, but I'm not confident in my benchmark setup because most of it was written by AI. How are other people reliably measuring sub-100 ns code paths in Rust?
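One common std-only approach is to time a large batch of calls per sample, keep the fastest sample, and divide by the batch size. The sketch below assumes a toolchain recent enough for `std::hint::black_box`; `bench_ns` is a hypothetical helper, not part of Cuttlefish. `black_box` discourages the optimizer from deleting or hoisting the measured work, and batching amortizes the cost of reading the clock, which is not free at this scale. In practice the Criterion crate handles warm-up and statistics for you, and pinning the process to one core (e.g. with `taskset` on Linux) reduces noise.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Time `iters` calls of `f` per sample, repeat `samples` times, and return
/// the fastest sample's per-call time in nanoseconds. Taking the minimum
/// filters out interrupts and scheduler noise.
fn bench_ns<T, F: FnMut() -> T>(mut f: F, iters: u32, samples: u32) -> f64 {
    // Warm up caches, branch predictors, and CPU frequency.
    for _ in 0..iters {
        black_box(f());
    }
    let mut best = Duration::MAX;
    for _ in 0..samples {
        let start = Instant::now();
        for _ in 0..iters {
            // black_box keeps the result "used" so the call isn't optimized out.
            black_box(f());
        }
        best = best.min(start.elapsed());
    }
    best.as_nanos() as f64 / iters as f64
}

fn main() {
    let xs: Vec<u64> = (0..64).collect();
    // Pass inputs through black_box too, so the sum can't be constant-folded.
    let ns = bench_ns(|| black_box(&xs).iter().sum::<u64>(), 100_000, 20);
    println!("sum of 64 u64s: {ns:.2} ns/iter");
}
```

Note that this measures the throughput of back-to-back calls: consecutive iterations can overlap in the CPU pipeline, so the number can come out lower than the latency of a single isolated call.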
Repo: https://github.com/abokhalill/cuttlefish

Not sure about Rust, but I assume it works the same way: my PhD advisor gave me an inline C assembly snippet for cycle-accurate benchmarking. It used the CPU's counters, something super basic like reading those registers into a variable.
---
You can probably take the above to a coding agent or LLM and get what you need back.
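Assuming the advisor's snippet read the time-stamp counter with RDTSC (the usual choice for this), Rust exposes the same instruction through `std::arch::x86_64::_rdtsc`, so no inline assembly is needed. The helpers `rdtsc_fenced` and `ticks_per_call` below are hypothetical; the LFENCE on each side is a common way to limit reordering around the read. On CPUs with an invariant TSC, which includes recent Intel and AMD parts, the counter ticks at a fixed reference rate rather than the actual core clock, so ticks are not exactly core cycles; for true cycle counts, hardware performance counters (for example via Linux `perf stat`) are the usual tool.

```rust
use std::hint::black_box;

/// Read the x86_64 time-stamp counter with LFENCE on both sides to limit
/// reordering with the code being measured.
#[cfg(target_arch = "x86_64")]
fn rdtsc_fenced() -> u64 {
    use std::arch::x86_64::{_mm_lfence, _rdtsc};
    // SAFETY: LFENCE (SSE2) and RDTSC are available on all x86_64 CPUs.
    unsafe {
        _mm_lfence();
        let t = _rdtsc();
        _mm_lfence();
        t
    }
}

/// Fallback for other targets: monotonic nanoseconds, NOT cycles.
#[cfg(not(target_arch = "x86_64"))]
fn rdtsc_fenced() -> u64 {
    use std::sync::OnceLock;
    use std::time::Instant;
    static START: OnceLock<Instant> = OnceLock::new();
    START.get_or_init(Instant::now).elapsed().as_nanos() as u64
}

/// Average counter ticks per call over `iters` back-to-back calls.
fn ticks_per_call<T>(mut f: impl FnMut() -> T, iters: u64) -> f64 {
    let start = rdtsc_fenced();
    for _ in 0..iters {
        black_box(f());
    }
    let end = rdtsc_fenced();
    end.saturating_sub(start) as f64 / iters as f64
}

fn main() {
    let xs: Vec<u64> = (0..64).collect();
    let ticks = ticks_per_call(|| black_box(&xs).iter().sum::<u64>(), 1_000_000);
    println!("{ticks:.1} TSC ticks per call");
}
```

To turn ticks into nanoseconds, one option is to calibrate by reading the counter alongside `std::time::Instant` over a longer interval and dividing.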