Rust zero-cost abstractions vs. SIMD

turbopuffer.com

24 points by Sirupsen 21 days ago · 4 comments

A04eArchitect 12 days ago

The real pitfall is overhead in the standard memory allocator. On ARMv8-A, I bypassed it entirely for my audit engine. Result: 85ns latency for 10.8T data points on a $100 board. I recorded the memory profiler and benchmarks as proof since the numbers look 'impossible'. See the video here:

https://x.com/NayakaPambudi

  • A04eArchitect 12 days ago

    Actually, the bottleneck wasn't the I/O, it was the context switching. If anyone wants the specific memory map addresses I used for the ARMv8-A bypass, let me know.

verglasz 15 days ago

Sounds like the cost isn't really in the abstraction, but in implementing a traversal of the merge tree that produced one value at a time instead of creating a batch, with what is presumably fewer total wasted computations... I doubt that they'd have had better codegen if they had inlined their `next()` into the loop consuming the values. And vice versa, an `Iterator` for the merge tree that internally produces a batch and then yields from it would probably perform pretty much the same as their current code (since it's thin enough to be inlined, I expect).
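
To make the "`Iterator` that internally produces a batch and then yields from it" idea concrete, here is a rough sketch. It is not the article's code: `Merge`, `Batched`, `fill`, and `BATCH` are hypothetical names, and a plain heap-based k-way merge over sorted `u32` slices stands in for the merge tree. This stand-in does the same work per value either way, so it only shows the shape of the wrapper, not the batch savings themselves.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical batch size; the right value would need measuring.
const BATCH: usize = 64;

/// Stand-in for the merge tree: a k-way merge over sorted runs.
pub struct Merge<'a> {
    runs: Vec<&'a [u32]>,
    // Min-heap of (next value, run index, position within that run).
    heap: BinaryHeap<Reverse<(u32, usize, usize)>>,
}

impl<'a> Merge<'a> {
    pub fn new(runs: Vec<&'a [u32]>) -> Self {
        // Seed the heap with the first element of every non-empty run.
        let heap: BinaryHeap<Reverse<(u32, usize, usize)>> = runs
            .iter()
            .enumerate()
            .filter_map(|(i, r)| r.first().map(|&v| Reverse((v, i, 0))))
            .collect();
        Merge { runs, heap }
    }

    /// Batch API: append up to `max` merged values to `out`.
    /// Returns how many values were appended (0 means exhausted).
    pub fn fill(&mut self, out: &mut Vec<u32>, max: usize) -> usize {
        let mut n = 0;
        while n < max {
            match self.heap.pop() {
                Some(Reverse((v, run, pos))) => {
                    out.push(v);
                    n += 1;
                    // Re-insert the run's next element, if it has one.
                    if let Some(&next) = self.runs[run].get(pos + 1) {
                        self.heap.push(Reverse((next, run, pos + 1)));
                    }
                }
                None => break,
            }
        }
        n
    }
}

/// Thin `Iterator` wrapper: refills an internal buffer one batch at a
/// time and yields values from it.
pub struct Batched<'a> {
    merge: Merge<'a>,
    buf: Vec<u32>,
    cursor: usize,
}

impl<'a> Batched<'a> {
    pub fn new(merge: Merge<'a>) -> Self {
        Batched { merge, buf: Vec::with_capacity(BATCH), cursor: 0 }
    }
}

impl<'a> Iterator for Batched<'a> {
    type Item = u32;

    #[inline]
    fn next(&mut self) -> Option<u32> {
        // Buffer drained: produce the next batch, or stop if none is left.
        if self.cursor == self.buf.len() {
            self.buf.clear();
            self.cursor = 0;
            if self.merge.fill(&mut self.buf, BATCH) == 0 {
                return None;
            }
        }
        let v = self.buf[self.cursor];
        self.cursor += 1;
        Some(v)
    }
}
```

The consumer still writes an ordinary `for v in Batched::new(merge)` loop; the hot path of `next()` is an index and an increment, which is the "thin enough to be inlined" part. Whether it actually matches a hand-written batch loop would have to be checked with a benchmark.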

jason_s 21 days ago

Can we please encourage variable-width fonts for text, fixed-width fonts for code? It improves readability.
