Speed Without Wizardry

fitzgeraldnick.com

206 points by mnemonik 8 years ago · 71 comments

austincheney 8 years ago

The simple rule I have found for achieving superior performance in high level languages, particularly JavaScript, is to simply do less. It isn't that simple, though.

Doing less really means less total code at the current compilation target, essentially feeding fewer total instructions to the compiler. This means no frameworks and minimal abstractions. It means having a clear appreciation for the APIs you are writing to. It means minimizing use of nested loops, which exponentially increase statement count.

Sometimes caching groups of instructions in functions can allow for cleaner code with a positive performance impact.

V8 cannot compile arithmetic assignment operators, which it calls left-side expressions, so you can see a rapid speed boost in V8 when you replace something like a += 1 with a = a + 1.

The side benefit of less code is generally clearer and cleaner code to read. There isn't any wizardry or black magic. No tricks or super weapon utilities.

As an example I wrote a new diff algorithm last year that I thought was really fast. https://news.ycombinator.com/item?id=13983085 This algorithm is only fast because it does substantially less than other algorithms. I only wrote it because I could not wrap my head around the more famous Myers' O(ND) algorithm. A side benefit, in this case, of doing less is an algorithm that produces substantially more accurate results.

  • Narishma 8 years ago

    > V8 cannot compile arithmetic assignment operators, which it calls left-side expressions, so you can see a rapid speed boost in V8 when you replace something like a += 1 with a = a + 1.

    Is there a reason it can't? I'm not familiar with Javascript, but aren't the two expressions equivalent?

    • austincheney 8 years ago

      The two expressions are equivalent. V8 cannot compile that logic into optimized bytecode due to a violation in its code engine that conflicts with other optimization logic. So instead of fast compiled code, the code in the local scope of that expression is slow, string-interpreted code.

      https://github.com/vhf/v8-bailout-reasons

      • titzer 8 years ago

        That list is for Crankshaft, which was replaced by TurboFan more than 6 months ago. If you continue to experience slowdowns, please file a bug and it can be investigated.

        • maaark 8 years ago

          There's an automotive engineer somewhere reading this, irrationally upset at the idea of replacing a crankshaft with a turbofan.

      • chillee 8 years ago

        Do you have a source that cites the += example? I can't seem to find it on the page you linked.

        • austincheney 8 years ago

          I remember seeing the cause of this specific case mentioned in a slide deck by one of the V8 engineers. I don't remember where online it is. I was able to validate this performance limitation more than a year ago through self-testing in my personal code.

          As titzer mentioned this issue may no longer exist. I would have to run additional tests to independently make an assessment with the current V8.
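
          For anyone who wants to check, here is a minimal sketch of such a test (illustrative only; timings vary by engine and version, and current V8 may well treat both forms identically):

            function addAssign(n) {
              let a = 0;
              for (let i = 0; i < n; i = i + 1) {
                a += 1; // compound assignment form
              }
              return a;
            }

            function plainAssign(n) {
              let a = 0;
              for (let i = 0; i < n; i = i + 1) {
                a = a + 1; // expanded form
              }
              return a;
            }

            // Crude timing; a real comparison would use warm-up runs and many samples.
            for (const fn of [plainAssign, addAssign]) {
              const start = Date.now();
              fn(1e8);
              console.log(fn.name, Date.now() - start, "ms");
            }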

  • whatyoucantsay 8 years ago

    > "It means minimizing use of nested loops, which exponentially increase statement count."

    Nesting two loops has an n^2 cost and nesting 3 levels deep costs n^3. At no point does it ever cost 2^n or any x^n.

    It's polynomial, not exponential.

    • austincheney 8 years ago

      Before I begin just let me say I am not a mathematician. I program so that I don't have to do complex math.

      I know in reality the frequency of iterations varies considerably but for simplicity of discussion let's remove variability.

      Say we have a loop with 1000 iterations. That is at minimum 1000 statements in the loop body plus expression overhead from the loop itself. If this loop is nested once with a same sized loop there are now 1,000,000 iterations plus some expression overhead per iteration. If it is nested twice deep there are now 1 billion iterations.

      That example is exponential of 1000. Given that there is overhead associated with operation of a loop it is actually greater than exponential. It may not be quite so dramatic as a logarithmic growth curve though.

      I completely concede that in reality loops vary in iteration count and so nested loops aren't likely perfectly exponential unless their iteration counts are identical. The increase of iterations from nesting loops does increase more dramatically than a simple multiplicative operation as the depth of loop nesting increases, such that the growth of total iterations is a curve on a graph. A polynomial growth operation when graphed should present a straight diagonal line without the presence of a third variable.

      • tachyoff 8 years ago

        I wouldn’t call time complexity particularly complex math. Imagine a loop. It does something n times. Within that loop, you do something n times. For each outer loop through n, you do an inner loop through n. Trivially, this is n*n, or n^2, also known as quadratic time. If you nest another loop, it becomes n^3, or cubic time.

        Anyway, there might be some confusion of terms, perhaps. Exponential time in the algorithmic complexity sense is any algorithm that takes 2^n operations to complete. If you’re talking about the number of iterations after you loop through n things n times, then that increase itself would not be exponential. But that delta in iterations is unrelated to exponential and polynomial time complexity.

        • austincheney 8 years ago

          I am sooo not a math person.

          Let's not forget there is overhead to loops. At a minimum let's assume there is a single statement in the loop body, an increment statement and a terminal condition. In a single loop of 1000 iterations there are 3000 things to evaluate. The math becomes:

          (n * 3)^x

          If a simple loop is nested twice (3 depths) there would be 1 billion iterations but about 27 billion evaluations. Would it be correct to say that is just slightly faster polynomial growth?

      • mlevental 8 years ago

        >that example is exponential of 1000.

        no it's not. you simply have the definitions mixed up. exponential slow down or speed up means a^x where x=1000. you are describing polynomial growth, i.e. x^a (where x=1000 and a=3)

        • yorwba 8 years ago

          It is exponential in the number of nested loops, which is what's important for the realization that adding more nested loops is bad.

          • whatyoucantsay 8 years ago

            > "It is exponential in the number of nested loops, which is what's important for the realization that adding more nested loops is bad."

            Adding more nested loops is bad, but it's not exponential. It's polynomial.

            As you nest more and more loops, the big O complexity goes from N to N^2 (quadratic) to N^3 (cubic) to N^4, etc... N^(any number) is polynomial. Exponential would be 2^N or 3^N or any number raised to the N.

            See: https://en.wikipedia.org/wiki/Exponential_growth

            • nickloewen 8 years ago

              OK I had to write things down to try and make sense of this. If anyone is like me, consider this loop that loops 5 times... maybe it will help?

                let defines = 0     // counts variable initializations (`let i = 0`)
                let tests = 0       // counts loop-condition checks (one per body run, naively)
                let increments = 0  // counts `i++` steps

                defines++
                for (let i = 0; i < 5; i++) {
                  tests++
                  increments++
                }

                console.log(`defines: ${defines}, tests: ${tests}, increments: ${increments}`)
              
              For the sake of space I'm not going to copy nested versions of that, but imagine nesting it two and then three levels deep.

                1 level will increment 5 (5^1) times    
                2 levels will increment 25 (5^2) times    
                3 levels will increment 125 (5^3) times
              
              More interesting are the number of definitions and tests:

                levels    tests    defines
                1         5        1
                2         30       6
                3         155      31
              
              But what I don't know is how the interpreter works and whether it has tricks to shrink those numbers; those just reflect my naive mental model of how things work.
            • kazagistar 8 years ago

              If N is the number of nested loops, and M is the number of times through the loop, then it is indeed O(M^N). So indeed, complexity scales exponentially with the level of nesting. The wording was just off and confusing due to it being a nontraditional formulation of the problem, but what he was saying does actually make sense.
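
              In symbols, a quick sketch of that framing (nothing new, just restating it):

                n = iterations per loop, d = nesting depth
                total inner-body executions = n^d
                d fixed, n growing: n^d is polynomial in n (n^2, n^3, ...)
                n fixed, d growing: n^d = e^{d \ln n}, which is exponential in d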

              • AstralStorm 8 years ago

                You will probably blow up the stack on counters or iterators way before you get even close to 2^N behaviour.

            • yorwba 8 years ago

              Something can be exponential in one context and polynomial in another. Asymptotic analysis is about the growth rate of a function in terms of some input variable. The results you get depend on which values you assume to be fixed while others vary. It is usually applied to analyze the runtime of a program in terms of the input size, which is the basis of classifying the runtime complexity, but that's not the only way you can do it.

              If you have a sequence of programs with polynomial runtime, but which grows as N, N^2, N^3, N^4, ..., that's a textbook example of exponential growth. It is not generated by running a program on increasingly larger inputs, so there's nothing to be put in the exponential runtime complexity class, but it's exponential nonetheless.

              • whatyoucantsay 8 years ago

                > If you have a sequence of programs with polynomial runtime, but which grows as N, N^2, N^3, N^4, ..., that's a textbook example of exponential growth

                No, it isn't. N, N^2, N^3, N^4, ... is polynomial, not exponential. Exponential would be X^N. Look at the graph on the Wikipedia page I linked to.

                • yorwba 8 years ago

                  Each element of the sequence is a polynomial, but the sequence of polynomials grows exponentially. X^N is exponential in N, polynomial in X. N^X is exponential in X, polynomial in N.

                  • whatyoucantsay 8 years ago

                    > Each element of the sequence is a polynomial, but the sequence of polynomials grows exponentially.

                    Bringing this back to the original comment about nesting loops, perhaps what you're describing is some sort of metaprogramming that dynamically generates increasingly deeply nested loops based upon the size of the input. If someone were to write that program then yes, it would be exponential.

                    In the far more common case of a programmer nesting a few loops, the resulting run-time would be polynomial, not exponential.

                • IncRnd 8 years ago

                  What you are describing now is called superpolynomial time in computer science.

            • dbaupp 8 years ago

              In this case, the number of iterations (1000) of each loop is being held fixed, and the depth of nesting is varying, i.e. the body of the inner loop executes O(1000^depth) times.

              • colanderman 8 years ago

                Nitpick: "O(…) times" is nonsensical. O-notation applies only to behavior in the limit. Notably, O(some constant) is exactly equivalent to O(1).

                • dbaupp 8 years ago

                  What exactly are you nitpicking?

                  1000^depth is not a constant: the function here is "f(depth) = number of times the inner loop body executes". f(depth) = O(1000^depth).

                  And, "times" here is serving the same role as "comparisons" in "Mergesort takes O(n log n) comparisons": it's referring to the thing that f is actually counting.

                  • colanderman 8 years ago

                    Given O(f(x)), x is almost always understood to be measurement of the work input to the algorithm. Loop nesting depth is not a measurement of work input, it's a property of the code – "1000^depth" is constant for any given algorithm.

                    Yes, O(…) is just a mathematical construct, so you can apply it the way you have to mean "the amount of work done for a given input size increases exponentially with respect to the number of nested loops iterating over the input"… but that's not how most people interpret it in the context of a computer algorithm. (The exception would be if the nesting depth were dynamically determined by an input parameter, but I don't think that's what anyone here is talking about.)

                    Anyway I'm not trying to argue, I agree with your point, but I think everyone here is talking past each other trying to say the same thing, ultimately due to imprecision in semantics.

                    • dbaupp 8 years ago

                      The only difference between what is "code" and what is "work" is context: in this case, the work of interest is a property of some input code. If you want a framing that matches your notions of input better, just imagine we're measuring the asymptotics of a VM/interpreter, and then the code truly is the input.

                • yorwba 8 years ago

                  In this case depth is not constant, so 1000^depth isn't either.

                  And usually when you encounter O(some constant), it's meant as "of the order of magnitude of", i.e. somewhere between some constant/10 and some constant * 10. That isn't the definition used here, but seems to be the cause of most complaints about asymptotic analysis being misapplied when no asymptotic analysis was being done in the first place.

                  • colanderman 8 years ago

                    See my answer to the GP. `depth` – the lexical nesting depth of "for" loops in the program text – is constant with respect to any given algorithm.

      • Ar-Curunir 8 years ago

        You won't have random extra nested loops appear out of nowhere in your code

    • simonbw 8 years ago

      > At no point does it ever cost 2^n or any x^n.

      That's only true if your n is referring to the number of iterations of each loop. If your n instead refers to the number of nested loops, you will indeed get a runtime of O(2^n).

      Seeing as in this context we were talking about varying the number of nested loops, it makes sense that we would define our variable n = number of nested loops.

  • oldandtired 8 years ago

    One would have expected the arithmetic assignment operators to be faster, as

    a += 1 would compute the address of "a" only once and then duplicate it, whereas

    a = a + 1 would compute the address of "a" twice. For more complicated examples the first should see an even larger speedup.

    So, from my perspective, there is a serious problem here in the optimiser.

nickm12 8 years ago

I've got to admire the graciousness in this response. It's making the point that mraleph's “Maybe you don’t need Rust and WASM to speed up your JS” article completely neglected code maintainability as a factor, but it does so without turning the whole thing into a pissing match. It's all been fascinating to read.

  • mraleph 8 years ago

    > completely neglected code maintainability

    Where did I neglect maintainability as a factor? The only optimization that potentially affects maintainability is manually allocating Mapping-s in the typed array. And there I openly acknowledged that it affects readability and makes the code error prone. All other optimizations are not in any way affecting maintainability.

    Even the typed array optimization is purely confined to the library internals... On the other hand, WASM spills out of the library by requiring users to explicitly destroy SourceMapConsumer.

    • ghusbands 8 years ago

      Turning a function into a string and back is also a reduction in maintainability, as programmers now have to be sure it doesn't use any captured variables or such, in future edits. Using a Uint8Array and integer constants rather than a string and character constants makes the code harder to read. Separating the sorting pass and doing some of it lazily (for good performance reasons) makes the final ordering slightly less clear.
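
      For readers unfamiliar with that trick, here is a rough sketch of the general pattern (hypothetical names, not the library's actual code): the comparator is assembled as a string and compiled with the Function constructor, so the engine sees one specialized, monomorphic function per key list.

        // Sketch only: builds a comparator roughly like
        //   (a, b) => a.line - b.line || a.column - b.column
        // from a list of numeric keys. The generated body cannot use any
        // captured variables, which is exactly the maintenance hazard above.
        function compileComparator(keys) {
          const body = keys.map(k => `a.${k} - b.${k}`).join(" || ");
          return new Function("a", "b", "return " + body + ";");
        }

        const byPosition = compileComparator(["line", "column"]);
        console.log([{ line: 2, column: 1 }, { line: 1, column: 5 }].sort(byPosition));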

      But "completely neglected code maintainability" is definitely unfair. While you made changes that reduced maintainability, you weren't neglectful.

  • wwwigham 8 years ago

    It's also probably not worth overtly begging the question of whether maintaining multiple languages within one project is worth the burden (seriously, the full build for the `source-map` package is complex in comparison to the usual JS state of affairs if you want to experiment with the now-Rust bits), especially considering that at least part of the reason node became popular was to have one language for all needs; focusing on that would invite a charged discussion. As a polyglot developer, I welcome all the mixing of languages that wasm is bringing (certainly, it makes my flexible skillset more valuable), but honestly I do think we're leaving _something_ of the past homogeneity behind in doing so. (Was FFI quality all that was stopping us from mixing tons of languages in our non-browser projects before?)

    Circling back: both authors have their biases, so it's best to read both articles skeptically and, IMO, take away the sage bits of advice they both echo rather than anything about a specific technology: You should make the right choices for your project and your design, performance, and maintenance needs (including making your own evaluations), rather than jumping on some bandwagon without being properly informed. Also, profile-guided algorithmic improvements are usually the easiest place to make sweet, sweet perf gains (in any language) before you have to get into hairy maintainability trade-off decisions.

    • AstralStorm 8 years ago

      Unfortunately, JS does not lend itself to ease of maintenance - it is bad that it became the language of web browsers.

      You get dynamic weak typing that makes reuse more complex, unpredictable (nonlinear in performance with code changes) GC and JIT, and a weird type system. No error handling facilities in the language either.

      Heck, compared to JS even modern Java (which shares the GC and JIT unpredictability) or C++ (incl. arcane syntax and less safety if you like living dangerously) seem easy to achieve predictable results with.

      Rust takes more work up front - but not that much more.

Felz 8 years ago

Funny, I was using the source-map library under Nashorn. The performance was poor enough that I had to switch to embedding V8; I'm not sure whether that was a consequence of Nashorn itself being too slow, or the Javascript optimizations intended for V8/Firefox just completely missing their mark.

Not that the WASM version of the library would've helped, since Nashorn doesn't do WASM at all. But maybe the performance would've been decent if it had.

hobofan 8 years ago

> But a distinction between JavaScript and Rust+WebAssembly emerges when we consider the effort required to attain inlining and monomorphization, or to avoid allocations.

I'm not sure that is true. Having worked/interacted with a lot of people working with Rust on different experience levels, most of them (that includes me) don't have a deep knowledge of what Rust concept maps to a specific concept with which performance implications. And if they do it's often only partial. I'd say that right now, only very few people that don't work on the Rust compiler have a broad knowledge in that area. Sure, it's much better to have the result of the optimization expressed in the code itself, but I'd say that the amount of knowledge and effort required to get to such a level of optimization is similar to optimizing Javascript.

I also found the hint about `#[inline]` suggestions a bit disingenuous. In the end they are just _suggestions_, and you are just as much at the mercy of the Rust/LLVM optimizer to accept them as you are with a Javascript JIT.

I'm a big fan of Rust, and I'm a big fan of Rust+Webassembly (working with it is the most fun I had programming in a long time!). Generally I think that Rust has one of the better performance optimization stories; I just don't agree with some of the sentiments in the post. There are also enough other reasons to love Rust+WebAssembly besides just the performance!

  • lambda 8 years ago

    > I'd say that the amount of knowledge and effort required to get to such a level of optimization is similar to optimizing Javascript.

    Really? After reading all of the articles in this series (the original about porting to Rust, the rebuttal about optimizing JavaScript, and this one)?

    I'm more familiar with Rust than JavaScript, but I found that other than the algorithmic optimization, the rest of the JavaScript optimizations were very non-obvious in order to achieve an effect that is entirely natural in Rust.

    There's no need to be careful to avoid particular types of function calls, since there are no dynamic variadic function calls in Rust. There's no need to resort to using the equivalent of `eval` for monomorphization, it's a natural feature of the language. There's no need to do manual memory management by encoding values into typed arrays of integers; you can simply allocate vectors of structs, borrow references, and so on.

    These are all things that had significant cost in JS, and needed to be worked around via some intensive profiling and knowledge of how JIT engines work, but just doing things naturally in Rust leads to a pretty much zero-cost solution.
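
    To make "encoding values into typed arrays of integers" concrete, here is a rough sketch of what that pattern looks like on the JS side (the field layout is made up for illustration and is not the library's actual layout):

      // Natural JS: one heap-allocated object per record.
      const asObjects = [{ generatedLine: 1, generatedColumn: 4, source: 0 }];

      // "Manual memory management": pack the same fields into one flat typed array.
      const FIELDS = 3; // generatedLine, generatedColumn, source
      const packed = new Uint32Array(1024 * FIELDS);

      function setRecord(i, line, column, source) {
        const base = i * FIELDS;
        packed[base] = line;
        packed[base + 1] = column;
        packed[base + 2] = source;
      }

      function getGeneratedLine(i) {
        return packed[i * FIELDS];
      }

      setRecord(0, 1, 4, 0);
      console.log(asObjects[0].generatedLine, getGeneratedLine(0)); // 1 1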

    It's true that if you want to really get into the nitty-gritty of micro-optimization in Rust, you need to learn some more and do things like profiling and inspecting the generated code to see what optimizations the compiler was able to apply or not.

    But the Rust rewrite of source maps did none of that; they just rewrote it in fairly idiomatic Rust, and achieved a similar speedup to what a JIT engineer armed with a profiler, a deep knowledge of what affects how well a JIT works, and a willingness to do manual memory management in typed arrays was able to do.

    > Having worked/interacted with a lot of people working with Rust on different experience levels, most of them (that includes me) don't have a deep knowledge of what Rust concept maps to a specific concept with which performance implications

    What parts of Rust do you feel like you don't understand the performance implications of? I feel like the mental model for performance is relatively similar to C or C++, with relatively few things that would surprise you if you're familiar with modern C++ (in fact, a lot fewer performance surprises than modern C++ offers, in addition to the extra safety).

DannyBee 8 years ago

I really don't understand this article, and the claims really rub me the wrong way.

The main point it makes is, again: "He perfectly demonstrates one of the points my “Oxidizing” article was making: with Rust and WebAssembly we have reliable performance without the wizard-level shenanigans that are required to get the same performance in JavaScript."

This doesn't make a lot of sense as a claim.

Why? Because underneath all that rust .... is an optimizing compiler, and it happens the author has decided to stay on the happy path of that. There is also an unhappy path there. Is that happy path wider? Maybe. It's a significantly longer and more complex optimization pipeline just to get to wasm output, let alone the interpretation of that output. I have doubts it's as "reliable" as the author claims (among other things, WebAssembly is still an experimental target for LLVM). Adding the adjective "reliable" repeatedly does not make it so.

Let's ignore this though, because there are easier claims to pick a bone with.

It also tries to differentiate optimizations between the two in ways that don't make sense to me: "In some cases, JITs can optimize away such allocations, but (once again) that depends on unreliable heuristics, and JIT engines vary in their effectiveness at removing the allocations."

I don't see a guarantee in the rust language spec that these allocations will be optimized away. Maybe i missed it. Pointers welcome.

Instead, i have watched plenty of patches to LLVM go by to try to improve its heuristics (oh god, there's that evil word they used above!) for removing allocations for rust. They are all heuristic based; they deliberately do not guarantee attempting to remove every allocation (for a variety of reasons). In general, it can be proven this is a statically undecidable problem for a language like rust (and most languages), so i doubt rustc has it down either (though i'm sure it does a great job in general!)

The author also writes the following: "WebAssembly is designed to perform well without relying on heuristic-based optimizations, avoiding the performance cliffs that come if code doesn’t meet those heuristics. It is expected that the compiler emitting the WebAssembly (in this case rustc and LLVM) already has sophisticated optimization infrastructure,"

These two sentences literally do not make sense together. The "sophisticated optimization infrastructure" is also using heuristics to avoid expensive compilation times, pretty much all over the place, LLVM included, even in basic analysis, where it still depends on quadratic algorithms for basic things.

If you have a block with 99 stores, and ask LLVM's memory dependence analysis about the dependency between the first and the last, you will get a real answer. If you have 100 stores, it will tell you it has no idea.

What happened to reliable?

Why does this matter? For example: Every time rust emits a memcpy (which is not infrequent), if there are more than 99 instructions in between them in the same block, it will not eliminate it, even if it could. Whoops. That's a random example. These things are endless. Because compilers make tradeoffs (and because LLVM has some infrastructure that badly needs rewriting/reworking).

These "sophisticated optimization infrastructures" are not different than JITs in their use of heuristics. They often use the same algorithms. The only difference is the time budget allocated to them and how expensive the heuristics let things get.

There may be good reasons to want to write code in rust and good reasons to believe it will perform better, but they certainly are not the things mentioned above.

Maybe what the author really wants to say is "we expect the ahead of time compiler we use is better and more mature than most JITs and can spend more time optimizing". But they don't.

Maybe it would also surprise the author to learn that there are JITs that beat the pants off LLVM AOT for dynamic languages like javascript (they just don't happen to be integrated into web browsers).

But instead, they make ridiculous claims about heuristics and JITs. Pretending the compiler they use doesn't also depend, all over the place, on heuristics and other things is just flat out wrong. At least to me (and i don't really give a crap about what programming language people use), it makes it come off as rampant fanboism. (Which is sad, because i suspect, had it been written less so, it might be actually convincing)

  • tedmielczarek 8 years ago

    > Why? Because underneath all that rust .... is an optimizing compiler, and it happens the author has decided to stay on the happy path of that.

    There are two big differences here: 1) You're comparing "staying on the happy path" of a JIT compiler in the JS case, vs. an optimizing compiler in the Rust case. With the latter you can just compile your code and see what comes out and it tends to be fairly predictable. With the former, I'm not even sure there are tools to inspect the generated JIT code, and you're constantly walking the line of JS engines changing their heuristics and throwing you off the fast path. This was one of the primary motivations for the asm.js/WebAssembly work: the ability to get predictable performance.

    2) Many of the optimizations mraleph performed were tricks to avoid allocation (which is normal optimization stuff, but more of a pain in GCed languages). In JS he winds up having to effectively write C-in-JS which looks pretty hairy. In Rust controlling allocation is a built-in language feature, so you can write very idiomatic code without heap allocation.

    • hobofan 8 years ago

      > what comes out and it tends to be fairly predictable

      Predictable as long as you stay on the same version of the compiler (yes, I know that there are crater runs to prevent regressions). Also, how much can/does the output for different target architectures differ in performance? Couldn't that be likened to trying to optimize for multiple JS engines?

  • tmandry 8 years ago

    To address your question about allocations, in Rust, you always know when you are allocating on the heap vs on the stack. The way you write your code guarantees it. Stack allocations are “basically free” compared to the heap because the memory management overhead is negligible or nonexistent. There are also guarantees about when objects you allocate on the heap are freed; there is no garbage collection.

    This is in contrast to JavaScript where objects are always allocated on the heap, and are garbage collected.

    So Rust gives you a lot more control and flexibility about how memory is managed. This might or might not matter for your use case, of course. There are compiler optimizations and heuristics -on top of that-, to be sure, but you end up with a lot more guarantees about how your code executes, because that’s what the language is designed for.

    EDIT: If you want to learn more about memory management in Rust, see https://doc.rust-lang.org/book/first-edition/the-stack-and-t...

    • DannyBee 8 years ago

      "here are also guarantees about when objects you allocate on the heap are freed; there is no garbage collection. This is in contrast to JavaScript where objects are always allocated on the heap, and are garbage collected."

      Surely you realize that this, and what the author wrote, are basically the same ever-rehashed GC vs non-GC language discussion. The performance characteristics of each are not anywhere near as simple as any of these things claim, so i'm going to leave this alone.

      "EDIT: If you want to learn more about memory management in Rust, see https://doc.rust-lang.org/book/first-edition/the-stack-and-t...

      Again, i see no guarantees here.

      I see it saying things like "This program has one variable binding, x. This memory needs to be allocated from somewhere. Rust ‘stack allocates’ by default, which means that basic values ‘go on the stack’."

      Can someone point me to an actual language level guarantee in a spec? I looked, and i don't see it.

      I can't see anything that makes it non-conformant to build a rust compiler that dynamically allocates and places all of these on the heap, and is very non-constant time for local variable allocation.

      But again, i didn't spend more than a few minutes browsing, so i may have missed it (for example, https://doc.rust-lang.org/reference/memory-allocation-and-lifetime.html says nothing here, and the requirements i can find in this entire chapter can easily be met by using dynamic allocation. There are no guarantees on space or time usage that would require a real stack :P).
      
      In fact, the vast majority of requirements here look like they could be met by a garbage collected heap/stack. It would be horribly inefficient, but ...

      I'm totally willing to believe nobody has written down a good enough spec yet, just saying i don't see it ATM :)

      I want to strongly differentiate between what "one implementation of rust does" and what "the language guarantees". Because if you are going to claim it's rust that makes the guarantee, as the author did, you should be able to back the claim up.

      • pcwalton 8 years ago

        > I want to strongly differentiate between what "one implementation of rust does" and what "the language guarantees". Because if you are going to claim it's rust that makes the guarantee, as the author did, you should be able to back the claim up.

        It's perfectly reasonable to use "Rust" as a proxy for the only implementation that can be realistically deployed to the Web (rustc). Everyone knows what the author meant.

      • dbaupp 8 years ago

        > I can't see anything that makes it non-conformant to build a rust compiler that dynamically allocates and places all of these on the heap, and is very non-constant time for local variable allocation.

        This is such a ... pointless nitpick? https://www.xkcd.com/115/

        I mean sure, the language doesn't require that compilers don't emit dumb code, it is just designed so that it's easy to emit good code. Something with fewer static guarantees like JS makes it much harder for compilers to emit that good code. I don't think there's any ground to contest that.

        In any case, if you just replace "stack allocates" etc. with "semantically stack allocates", you get the language's guaranteed behaviour (although Rust has no ISO spec or anything, so you're probably going to say that even that isn't truly guaranteed).

        I mean, it's fair that the language used is quite strong, but still, spelling literally everything out with every possible caveat is a great way to have bad pedagogy.

        • jdright 8 years ago

          He is a lawyer; what would you expect? Being pedantic is his job. Not that I agree with him btw. I think he just does not have enough real practical engineering experience to understand it well.

          • pcwalton 8 years ago

            Daniel Berlin has way more practical engineering experience than you or I. He's a longtime GCC contributor.

            (I just think in this specific instance he's wrong.)

  • acemarke 8 years ago

    I think the author's primary point was that the work done by mraleph required _deep_ knowledge of the V8 JIT internals and low-level profiling to get those "3x speedup" results, plus the algorithmic improvements. Meanwhile, the original Rust implementation got the same "3x" results without having to do deep analysis of how the compiler was behaving. It also seems (based on the commentary) that the way JS/JIT engines treat WASM bytecode is likely to require fewer special-cases or heuristics than plain JS.

    Sure, the Rust compiler and the LLVM infrastructure are doing a lot of complicated work internally... but the end users of the compiler aren't having to spend time digging through the guts of it to guess what kind of magic sequences are needed to get fairly good performance.

    • DannyBee 8 years ago

      "but the end users of the compiler aren't having to spend time digging through the guts of it to guess what kind of magic sequences are needed to get fairly good performance."

      That's precisely my point: the claims this article makes that purport to prove this are demonstrably false. So that may even be true, but it's definitely not anything in this article that shows it.

      If the author had just said "hey, when i wrote this version, it performed better and was easier", that'd be great, and awesome.

      Instead, it seems they wanted to make more general claims, and to be honest, don't seem to have a lot of idea what they are talking about on that front, and to anyone who does have experience in this area, like i said, it comes off very badly.

      I'll also point out, if you expect to not need to do profiling and algorithmic improvement to anything to get significant speedups, that's also similarly silly. It's just not realistic for any language in the real world. The only question is "which of the code you write will this be true for" not whether it will be true.

  • imtringued 8 years ago

    The article clearly admits that the LLVM compiler uses heuristics. You even quoted that part of the article.

    >WebAssembly is designed to perform well without relying on heuristic-based optimizations, avoiding the performance cliffs that come if code doesn’t meet those heuristics. It is expected that the compiler emitting the WebAssembly (in this case rustc and LLVM) already has sophisticated optimization infrastructure, that the engine is receiving WebAssembly code that has already had optimization passes applied, and that the WebAssembly is close to its final form.

    But really the point of the article is the last part.

    >that the engine is receiving WebAssembly code that has already had optimization passes applied, and that the WebAssembly is close to its final form.

    The compiler applies heuristics once, during the compilation step, and then never again. Compare this to JITs, which are constantly changing and can pull the rug out from under you.

    >Maybe it would also surprise the author to learn that there are JITs that beat the pants off LLVM AOT for dynamic languages like javascript (they just don't happen to be integrated into web browsers).

    Yes, but it gets even better! If you limit yourself to a formalised subset of javascript called asm.js, which gives you fine control over memory layout and allocation, you can reach even the performance of C! Have you heard of its successor? I think its name was WebAssembly, and every major browser has integrated it. It's a really cool technology that shows how sandboxed JITs can have the same performance characteristics as AOT compilers.

  • littlestymaar 8 years ago

    Allocations in Rust are opt-in: you allocate when using the `Box` type, when using reference-counted pointers, or when using a heap-allocated data structure (vector, hashmap, etc.). If you do none of those, you have zero heap allocations, 100% of the time; there is no heuristic involved here.

    For optimizations, you're right: LLVM uses a lot of heuristics. But this happens at compile time, when you generate your webassembly blob. Once you have it, it will run at a consistent speed across browsers and between several generations of the same browser. None of this is achievable in plain JavaScript, where JS optimized for old V8 (Crankshaft) won't be optimal for SpiderMonkey or newer versions of V8 (TurboFan).

  • gpm 8 years ago

    What you're completely missing here is that a naive rust compiler would be reasonably close in speed to LLVM's non-naive compilation. Sure, some inlining helps a bit, and llvm does some really fancy things that help a bit, but even without those Rust would still be a reasonably fast language. As such you can just write normal rust, and your worst case, where the compiler completely fails to help, is reasonably fast.

    A naive javascript interpreter is really slow; SpiderMonkey and V8 do a ton of work to make it reasonably fast. If there's an important piece of code where the compiler doesn't completely fail to help, but just helps less than normal, the code is no longer reasonably fast.

  • dbaupp 8 years ago

    > Why? Because underneath all that rust .... is an optimizing compiler, and it happens the author has decided to stay on the happy path of that. There is also an unhappy path there. Is that happy path wider? Maybe

    I think you've got a fair point, that all optimizing compilers and JITs have happy paths and unhappy ones, falling off the happy path for either form will result in slower code. But, the happy path being wider kinda seems like the key thing here?

    Additionally, AOT has consistency, in that it doesn't depend on runtime data. This can obviously leave some performance on the table if a JIT ends up optimizing for the data that happens to be used in a given session, but also means performance usually doesn't (accidentally) depend on global state.

    > Instead, i have watched plenty of patches to LLVM go by to try to improve it's heuristics (oh god, there's that evil word they used above!) for removing allocations for rust

    I recall some that just told LLVM the names of the symbols Rust uses instead of malloc/free, so that it could do its standard escape analysis and removal of completely pointless allocations (which is rarely useful for Rust, IME), but didn't actually touch the rest of LLVM.

    In any case, rustc (and the current understanding of Rust as a language) doesn't insert allocations itself, because there's no reason to, unlike JS, where the allocations are essentially semantically required. That is to say, Javascript's language model is so flexible that many things have to allocate by default (as it's the only way to get the required shared-mutation behaviour), whereas in Rust, heap allocation is implemented as a standard library construct (it's not necessary for the language: e.g. libcore is a bare-metal subset of the standard library that works with no dependencies) and people won't manually put things on the heap unless they have to.

    > Maybe it would also surprise the author to learn that there are JITs that beat the pants off LLVM AOT for dynamic languages like javascript (they just don't happen to be integrated into web browsers).

    Are you saying that a JIT for JS beats LLVM for... JS? I don't think that's particularly surprising (and probably especially not for someone who seems to have a lot of history with writing JS engines/Spidermonkey, like the author), given things like webkit's experience: https://webkit.org/blog/5852/introducing-the-b3-jit-compiler... .

IncRnd 8 years ago

> This is a factor of 4 improvement!

This is a common mistake. It should read, "This is a factor of 3 improvement!"

x+x+x+x is an improvement over x of 3x not of 4x. The improvement factor is 3.

  • recursive 8 years ago

    It's 3x faster than, but 4x as fast as.
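
    A worked example with made-up numbers:

      old time = 400 ms, new time = 100 ms
      "4x as fast":     400 / 100         = 4
      "3x faster than": (400 - 100) / 100 = 3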

  • mraleph 8 years ago

    I am not a native speaker, so I was always under the impression that "factor of K larger" means something was X and became K * X (e.g. the multiplicative factor was 1 and became K). Can I read somewhere about the correct usage?

    • IncRnd 8 years ago

      www.wikihow.com/Calculate-Percentage-Increase

      The general equation for an increased quantity from A to B is

        increase = B - A
      
      So a percentage increase is

        increase * 100
      
      not

        (increase + A) * 100
      • mraleph 8 years ago

        But I am not talking about the percentage increase; I am talking about an increase "by a factor of X", that is, X = B/A.

        See: https://ell.stackexchange.com/a/52747

        Maybe my problem is that I am missing "by".

        • IncRnd 8 years ago

          You are exactly talking about percentages.

          B/A is the percentage of the new value relative to the old.

          (B-A)/A is the percentage of the increase of the value.

          This is algebra not English.

          1) You start out with one rock.

          2) You add another rock, getting a pile of two rocks.

          3) The rock pile size is now 200% of the original size. But, the increase of the rock pile size is 100%.

          4) According to the rules of algebra, 100% = 1. In this case 100% = 1 rock, since the unit under discussion is rocks.

          • neurotrace 8 years ago

            They absolutely are not talking about percentages. They're talking about factors. A factor is a term in a multiplication. In standard US English, increasing something by a factor means to multiply it by that number. Even the GMAT uses that phrase in that way. You absolutely can split hairs on this and the phrasing can be confusing but it is absolutely understood that "increased by a factor of 4" means to multiply by 4. If one was talking in terms of percentages, they would say something like "increased by 300%".

            • IncRnd 8 years ago

              > increasing something by a factor means to multiply it by that number

              That's absolutely wrong... When you multiply something you scale it by a factor. When you add to something you increase it.

              1) 4A is 400% of A.

              2) 4A is an increase of 300%.

              3) 4A is an increase of 3A.

              4) 4A is an increase of a factor of 3.

              In the most charitable interpretation you are confusing the terms of exponential growth with linear scaling and misapplying from one to the other. A growth factor is not the same as an increase in a factor.

      • IncRnd 8 years ago

        I forgot the division in that post, and I can't edit it. See the post below. Sorry.
