Rust: Zero-Cost Abstraction in Action
idursun.com

These are not about abstractions but compile-time optimizations. These are also not implemented in the Rust compiler but in LLVM, so any language frontend in front of LLVM would yield the same optimizations. According to Wikipedia the list is:
> [...] variety of front ends: languages with compilers that use LLVM include ActionScript, Ada, C#, Common Lisp, Crystal, CUDA, D, Delphi, Dylan, Fortran, Graphical G Programming Language, Halide, Haskell, Java bytecode, Julia, Kotlin, Lua, Objective-C, OpenGL Shading Language, Ruby, Rust, Scala, Swift, Xojo, and Zig.
Sometimes I think a mention of Rust in the title just gets upvotes here, without anyone even reading the article.
Only those that emit LLVM IR that can be optimized like this.
In poorly performing languages like Python and Ruby, calling a method on a value (like sum(), or even the += operator) essentially means looking up the method name in a hash table, and then doing a dynamic function call on the resulting function pointer. This sort of optimization therefore cannot be done in an ahead-of-time compiler unless all objects are created locally and thus have known method dictionaries (and even then, it depends on whether the methods are available to the compiler in IR form, whether the lookup code is inlined, and whether the code can actually be simplified).
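Concretely, the difference looks something like this in Rust (an illustrative sketch of mine, not from the article; a `dyn` vtable call is much cheaper than a hash-table lookup, but it blocks the folding in a similar way):

    // Static dispatch: the concrete iterator type is known, everything
    // inlines, and LLVM can reduce the loop to a closed-form expression.
    pub fn sum_static(n: i32) -> i32 {
        (1..n).sum()
    }

    // Dynamic dispatch: every call to `next` goes through a vtable
    // pointer, so the optimizer generally cannot see through it and
    // the loop survives in the generated code.
    pub fn sum_dynamic(iter: &mut dyn Iterator<Item = i32>) -> i32 {
        let mut total = 0;
        while let Some(x) = iter.next() {
            total += x;
        }
        total
    }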
> essentially means looking up the method name in a hash table, and then doing a dynamic function call on the resulting function pointer
Well, you do need something very much like this if you want to support "openly extensible" classes in a backward- and forward-compatible way. Otherwise you end up with something like the C++ ABI hell, where changing things around in a base class breaks every binary that links against earlier versions of that class. This is only a "poorly performing" pattern inasmuch as the developer has no way to "opt out" of that extensibility when that's the sensible choice.
10+ years ago this was true, but CRuby's basic operations like += (the bytecode is opt_plus followed by setlocal) don't do a method lookup unless you redefine them, and even if you do, it uses inline caches in the bytecode, so there's only one hash lookup.
TruffleRuby has been constant folding trivial code like this since 2014. AOT is also an option.
Actually, there's a very good chance that this particular optimization is happening due to Miri, which allows very sophisticated constant evaluation of MIR (a Rust intermediate representation):
https://rust-lang.github.io/rustc-guide/miri.html
So while, yes, in general pretty much all Rust optimizations are actually due to LLVM, this one in particular might not be.
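For illustration, a minimal sketch of the kind of thing that const-evaluation machinery handles (my example, and note it only fires in const contexts, which is part of the caveat above):

    // The MIR interpreter (the same machinery that powers Miri) runs this
    // loop at compile time; TOTAL is baked into the binary as 499500 and
    // no loop appears in the generated code.
    const fn triangle(n: u32) -> u32 {
        let mut total = 0;
        let mut i = 1;
        while i < n {
            total += i;
            i += 1;
        }
        total
    }

    pub const TOTAL: u32 = triangle(1000);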
Here's the equivalent in C++, compiled with clang 9.0.0 at optimization level 1. https://godbolt.org/z/nKSG9_
Byte for byte identical assembly.
I think it's both about abstraction and about compile-time optimization. The point was that by default you can write code that uses higher-level abstractions but compiles to the same code as an imperative-style loop. Most widely used languages (or their default implementations) don't use LLVM; moreover, they are interpreted or JIT-compiled, so there might be a performance difference between e.g. for-loops and forEach with a closure.
Even worse, the lack of proper compiler-design courses at many universities and "engineer" bootcamps means that many attribute to language X what is actually a compiler-backend capability, as you explain quite well.
The C# compiler uses LLVM? Is that for something like Xamarin? I was under the impression that it was self-contained. Does anyone know the details?
Unity's Burst compiler https://docs.unity3d.com/Packages/com.unity.burst@1.2/manual... uses LLVM, but only supports a somewhat limited subset of C#. I've used it a bit in some performance-critical parts of a game and saw some dramatic speed increases. I can't speak to the rest of the C# ecosystem beyond Unity, though.
Anyone can build a frontend. "Java bytecode" is in that list too. It doesn't mean the de facto compilers of these languages rely on LLVM.
In this case it is Azul Zing, a very serious product. It's one of the four major JVM JIT compilers (OpenJ9, C2, and Graal are the others, in my opinion).
There are also PTC, Aicas, Virtenio, Ricoh and Gemalto, all targeted at embedded deployments, of which PTC and Aicas are the best known.
Sadly, Excelsior is no more. I imagine that regular JIT compilers making AOT/JIT caches available is what killed them.
There are many implementations of C#; a few of them use LLVM, like Xamarin AOT, IL2CPP, and Burst.
.NET Core and .NET Framework do not.
This isn't really talking about Rust; it's actually talking about the optimization capabilities of LLVM (the compiler backend, which quite a lot of languages use: Clang for C and C++, Rust, Swift, Julia, Zig, ...). These languages all have similar chances of performing those same optimizations, at least in simple cases like these.
What I'm interested in is how well-optimizable the IR emitted by each of those language frontends is for LLVM (in practical situations, not just a few lines of simple numerical code). I've heard that you need to be careful to encode the IR in the right way so that LLVM doesn't generate needless memcpys or something, but I'm not that much of a compiler expert...
rustc does indeed deliberately seek to emit LLVM IR that resembles what Clang would emit, in order to benefit from the same sorts of optimizations that default LLVM is tuned for.
It's both, it's just complicated by the fact that LLVM is optimizing it to a known formula.
The zero-cost abstraction is the fact that you get the same output, special optimization and all, whether you write the original loop, the "(1..n).fold(0, |x,y| x + y)" variant, or the "(1..n).sum()" variant. Rust converts those to intermediate code that is the same, or close enough, that LLVM is able to apply the same optimization.
Would it be better if sample code had been chosen that wasn't special-cased to the degree that the entire algorithm gets replaced wholesale? Probably. It doesn't invalidate the premise, though, just slightly obscures it.
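For concreteness, runnable versions of all three variants (the function names are mine):

    // All three compile to the same closed-form code under optimization.
    pub fn sum_loop(n: i32) -> i32 {
        let mut total = 0;
        for i in 1..n {
            total += i;
        }
        total
    }

    pub fn sum_fold(n: i32) -> i32 {
        (1..n).fold(0, |x, y| x + y)
    }

    pub fn sum_iter(n: i32) -> i32 {
        (1..n).sum()
    }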
There is nothing about zero-cost abstractions in this article, just basic compiler optimizations.
The OP doesn't explicitly demonstrate any zero-cost abstractions; however, these "basic compiler optimizations" are only feasible to achieve statically (i.e. without a JIT) in the presence of sufficiently transparent abstractions, which takes a good deal of language-design effort to enable.
Iterators in Rust are (usually) zero-cost abstractions of loops. Bjarne Stroustrup's definition:
> What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better.
Rust iterators fit in the second part.
Well, functions are abstractions too, but yes, to me the interesting part of "zero-cost abstraction" is "zero-cost data structure composition", and that indeed is not covered.
For anyone interested in reading more about this, I would recommend Zero Cost Abstractions[1] by withoutboats, who is a contributor to Rust.
1: https://boats.gitlab.io/blog/post/zero-cost-abstractions/
This is not Rust-specific; this is a compiler thing. Both LLVM and GCC can detect sums and generate closed-form formulas instead. There are other fun algorithm detections, e.g. if you try to do bit counts yourself, LLVM will use popcnt instead.
Compilers are awesome, check out Matt Godbolt's talk on this very topic: https://www.youtube.com/watch?v=nAbCKa0FzjQ
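For example, a hand-rolled bit count like this sketch (my example) gets recognized:

    // Kernighan-style popcount: clears the lowest set bit per iteration.
    // With optimizations on, LLVM recognizes the idiom and emits a single
    // popcnt instruction on targets that have one.
    pub fn count_ones(mut x: u64) -> u32 {
        let mut count = 0;
        while x != 0 {
            x &= x - 1; // clear the lowest set bit
            count += 1;
        }
        count
    }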
I assume it's doing `(N-2)(N-3)/2 + 2N - 3` instead of `N(N-1)/2` due to overflow concerns? But couldn't `(N-2)(N-3)` also possibly overflow, just supporting a larger range of `N`?
In this assembly code it cannot overflow because N is a 32-bit integer and the multiplication gives a 64-bit result, which is converted to 32-bit only after shifting.
I can't figure out why it doesn't use the simpler formula (other than the optimizer being bad).
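In Rust terms, what the assembly computes is roughly this (my reconstruction from the description above; it ignores the branch the real code needs so that small n still yields 0):

    // The operands are widened to 64 bits before the multiply, so
    // (n-2)*(n-3) cannot overflow even when it exceeds i32::MAX; only the
    // final result is truncated back to 32 bits. Algebraically,
    // (n-2)*(n-3)/2 + 2*n - 3 == n*(n-1)/2.
    pub fn sum_closed_form(n: i32) -> i32 {
        let w = n as i64;
        ((w - 2) * (w - 3) / 2 + 2 * w - 3) as i32
    }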
The earlier transformations are actually more impressive. Constant folding almost always has a good RoI, so lots of compilers do it, and it's simple because, well, it's a constant: there are no variables and no partial evaluation needed. The others require lots of inlining before the final rule fires, and so are more ambitious.
Seeing what
    pub fn sum3(n: i32) -> i32 {
        (1..n).sum::<i32>() + (1..2 * n).sum::<i32>() + (1..(n + 2)).sum::<i32>()
    }
does would be more interesting to me. Also, while all the inlining is rustc, I assume the "triangle number trick" is LLVM.
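For what it's worth, since (1..k).sum() is k*(k-1)/2, the best case would be a reduction to something like this (my arithmetic; it glosses over the fact that empty ranges must sum to zero for small or negative n):

    // Each (1..k).sum() term collapses to k*(k-1)/2 for k >= 1.
    pub fn sum3_closed(n: i64) -> i64 {
        n * (n - 1) / 2 + (2 * n) * (2 * n - 1) / 2 + (n + 2) * (n + 1) / 2
    }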
Godbolt supports Rust, and can show the LLVM IR: https://godbolt.org/z/GsccW3
A bit off-topic: I love how Godbolt grew out of the C++ community to embrace as many AOT toolchains as possible. Kudos to Matt and everyone involved in making it happen.
I wonder if the const syntax introduced in ES6 will ever result in constant folding, or any JIT optimizations, for code running in JS runtimes. Apparently, from what I've read, const is ignored as far as optimizations go and is only used as a way to prevent the developer from ever reassigning certain variables.
The OCaml compiler does constant folding too, and I consistently look at the output (usually JavaScript, because of BuckleScript) in amazement when it finds shit like that. Unrolling all my tail recursion is great too.
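The same transformation is easy to see in Rust through LLVM, for what it's worth (an illustrative sketch of mine, not OCaml):

    // LLVM's tail-call elimination turns this self-recursive accumulator
    // into a plain loop, which can then be folded further like the other
    // loops discussed in this thread.
    pub fn sum_rec(n: i32, acc: i32) -> i32 {
        if n <= 0 { acc } else { sum_rec(n - 1, acc + n) }
    }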
For anyone that wants to learn about optimizations in AOT compiled ML languages, have a look at:
"The Implementation of Functional Programming Languages"
"Compiling with Continuations"
"Modern Compiler Implementation in ..." (C, Java and ML variants)
Although oriented towards Lisp, "Lisp in Small Pieces" is a classic book as well, with many optimizations like inlining of lambda calls across multiple call levels.
Are compiler optimizations like summing a series done on an ad-hoc basis? Certainly the compiler couldn’t have discovered or inferred (not sure what term to use) that formula, no?
Just generally curious.
Ad-hoc basis. It usually comes out of studying the most common classes of computation and optimizing for them. For example, the "summing a series" case probably falls out of loop optimizations: https://en.wikipedia.org/wiki/Loop_optimization
Those optimizations are happening on LLVM's end (though the higher-level language will have to expose sufficiently transparent abstractions to make these optimizations possible). I'd love to read a book on the optimization techniques that are implemented in LLVM/GCC to make transformations like this possible.
Gotcha. Thanks. So presumably Clang would have done the same for C?
Easy to check, thanks to godbolt:
Yes, and it's mentioned in the post that the C code was also brought into parity.
Yeah I saw that, but just wasn’t sure how using intrinsics played a role.
There is nothing impressive about this. https://godbolt.org/z/S2tDDh
> he was very disappointed because Rust version was twice as fast than the C version which was hand-optimised by pulling off all the tricks he knew to make it perform well.
Why disappointed? It just highlights the quality of Rust's approach.
Evidently the tricks were what made it slow.
This happens all too often: once the code changes enough that the optimizer doesn't recognize the pattern anymore, it throws up its hands, and you're on your own. Some people call this optimizer roulette.
It's not just compilers, either. CPUs have their own peephole optimizers and patterns they recognize, or don't, and it can easily make a 2x difference in your run time depending on whether it cottons to what you're trying, or doesn't.
With CPUs it gets even worse, because that cleverly optimized assembly code can stop being so on another CPU or after a firmware update.
The days of Z80, 6502 and similar are long gone.
This is neat, but there is still a cost in compile time.
Usually in the Rust community, "zero cost" means zero runtime cost. Obviously, there are many other costs you could define.