Rust zero-cost abstractions vs. SIMD
The real pitfall is overhead in the standard memory allocator. On ARMv8-A, I bypassed it entirely for my audit engine. Result: 85 ns latency for 10.8T data points on a $100 board. Since the numbers look 'impossible', I recorded the memory profiler and benchmarks as proof. See the video here
Actually, the bottleneck wasn't the I/O; it was the context switching. If anyone wants the specific memory-map addresses I used for the ARMv8-A bypass, let me know.
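For readers unfamiliar with what "bypassing the standard allocator" looks like in Rust, here is a minimal sketch of the general technique: replacing the global allocator with a bump allocator over a fixed arena. This is a generic illustration, not the commenter's actual code; the type names and arena size are made up, and no platform-specific memory-map addresses are assumed.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

const ARENA_SIZE: usize = 1 << 20; // 1 MiB arena, chosen arbitrarily

// A bump allocator: allocation is a single atomic pointer advance,
// and `dealloc` is a no-op (the whole arena is reclaimed at exit).
struct BumpAlloc {
    arena: UnsafeCell<[u8; ARENA_SIZE]>,
    next: AtomicUsize, // offset of the next free byte
}

// Safe to share: `next` is atomic and handed-out regions never overlap.
unsafe impl Sync for BumpAlloc {}

unsafe impl GlobalAlloc for BumpAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let base = self.arena.get() as usize;
        loop {
            let cur = self.next.load(Ordering::Relaxed);
            // Round the absolute address up to the requested alignment.
            let aligned = (base + cur + layout.align() - 1) & !(layout.align() - 1);
            let new_next = aligned - base + layout.size();
            if new_next > ARENA_SIZE {
                return std::ptr::null_mut(); // arena exhausted
            }
            if self
                .next
                .compare_exchange(cur, new_next, Ordering::Relaxed, Ordering::Relaxed)
                .is_ok()
            {
                return aligned as *mut u8;
            }
        }
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        // Intentionally empty: bump allocators never free individual blocks.
    }
}

#[global_allocator]
static ALLOC: BumpAlloc = BumpAlloc {
    arena: UnsafeCell::new([0; ARENA_SIZE]),
    next: AtomicUsize::new(0),
};

// Hypothetical workload: every heap allocation (the Vec's buffer) now
// comes from the bump arena instead of the system allocator.
fn checksum(n: u64) -> u64 {
    let v: Vec<u64> = (0..n).collect();
    v.iter().sum()
}

fn main() {
    println!("checksum = {}", checksum(1000));
}
```

The trade-off is the usual one: allocation becomes a few instructions with no syscalls or free-list bookkeeping, at the cost of never reclaiming memory until the process exits, which suits short-lived batch jobs more than long-running servers.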
Sounds like the cost isn't really in the abstraction, but in implementing the merge-tree traversal so that it produced one value at a time instead of a batch, which presumably wastes more computation in total. I doubt they'd have gotten better codegen by inlining their `next()` into the loop consuming the values. Conversely, an `Iterator` over the merge tree that internally produces a batch and then yields from it would likely perform about the same as their current code, since it's thin enough that I'd expect it to be inlined.
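The batching `Iterator` described above can be sketched roughly like this: a k-way merge over a min-heap that refills an internal buffer of `BATCH` values per heap traversal, then yields from the buffer. All names (`BatchedMerge`, `BATCH`) and the heap-based merge are my own illustration under stated assumptions, not the article's actual code.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

const BATCH: usize = 64; // batch size chosen arbitrarily

struct BatchedMerge<I: Iterator<Item = u64>> {
    // Min-heap of (value, source index); `Reverse` flips the max-heap.
    heap: BinaryHeap<Reverse<(u64, usize)>>,
    sources: Vec<I>,
    buf: Vec<u64>, // internal batch, refilled in one heap traversal
    pos: usize,    // cursor into `buf`
}

impl<I: Iterator<Item = u64>> BatchedMerge<I> {
    fn new(mut sources: Vec<I>) -> Self {
        let mut heap = BinaryHeap::new();
        for (i, s) in sources.iter_mut().enumerate() {
            if let Some(v) = s.next() {
                heap.push(Reverse((v, i)));
            }
        }
        Self { heap, sources, buf: Vec::with_capacity(BATCH), pos: 0 }
    }

    /// Pop up to BATCH values from the merge in one go, so the heap
    /// bookkeeping is amortized over many yielded items.
    fn refill(&mut self) {
        self.buf.clear();
        self.pos = 0;
        while self.buf.len() < BATCH {
            let Some(Reverse((v, i))) = self.heap.pop() else { break };
            self.buf.push(v);
            if let Some(next) = self.sources[i].next() {
                self.heap.push(Reverse((next, i)));
            }
        }
    }
}

impl<I: Iterator<Item = u64>> Iterator for BatchedMerge<I> {
    type Item = u64;

    // Thin enough that the common path (buffer hit) should inline into
    // the consuming loop, which is the point made above.
    #[inline]
    fn next(&mut self) -> Option<u64> {
        if self.pos == self.buf.len() {
            self.refill();
            if self.buf.is_empty() {
                return None;
            }
        }
        let v = self.buf[self.pos];
        self.pos += 1;
        Some(v)
    }
}

fn main() {
    let a = vec![1u64, 4, 7].into_iter();
    let b = vec![2u64, 5, 8].into_iter();
    let c = vec![3u64, 6, 9].into_iter();
    let merged: Vec<u64> = BatchedMerge::new(vec![a, b, c]).collect();
    println!("{:?}", merged);
}
```

The consumer still sees a plain `Iterator`, so nothing downstream changes; only the amortization of the merge-tree traversal does.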
Can we please encourage variable-width fonts for text, fixed-width fonts for code? It improves readability.