Optimising Common Lisp to try and beat Java and Rust on phone encoding 2/2

renato.athaydes.com

78 points by mark254 4 years ago · 25 comments

geospeck 4 years ago

There are a lot of interesting comments in the Lisp subreddit regarding the second part of the blog: https://www.reddit.com/r/lisp/comments/q5f7u5/revenge_of_lis...

rob74 4 years ago

In case anyone else is wondering what the author means by "phone encoding": it's an algorithm trying to map telephone numbers to words (via the letters usually printed on telephone keypads). Would have been better to call it "phone number encoding" IMHO...

Cryptonic 4 years ago

The Rust code is by far not optimized. For example, while loading the dictionary, why create a Vec and return it instead of operating on a max-word-size array and reusing it? Also, why not write everything at the end? I'm not a Rust professional either, but maybe get a review by one before benchmarking against something else.
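
Roughly the kind of reuse I mean for the dictionary loading (just a sketch under my own assumptions, with a made-up callback, not the article's actual code):

    use std::fs::File;
    use std::io::{self, BufRead, BufReader};

    // Sketch of the "reuse one buffer" idea: read every dictionary line into
    // the same String instead of allocating a fresh one per word.
    fn load_dictionary(path: &str, mut process_word: impl FnMut(&str)) -> io::Result<()> {
        let mut reader = BufReader::new(File::open(path)?);
        let mut line = String::new();
        loop {
            line.clear(); // reuse the same allocation on every iteration
            if reader.read_line(&mut line)? == 0 {
                break; // EOF
            }
            process_word(line.trim_end()); // drop the trailing newline
        }
        Ok(())
    }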

  • chrismorgan 4 years ago

    Another point where it’s doing something that to me as a Rust expert is obviously inferior: it’s using Unicode-aware string stuff although anything non-ASCII will either be ignored (if non-alphabetic) or panic (if alphabetic). It’d certainly be better to treat the input throughout the program as a sequence of bytes rather than as UTF-8.
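
    A rough sketch of what I mean by treating the input as bytes (hypothetical code assuming pure-ASCII input, not what's in the repo):

        use std::fs;
        use std::io;

        // Read the whole file as raw bytes and split on b'\n': no UTF-8 decoding
        // or validation ever happens, and each line is handled as a plain &[u8].
        fn for_each_line(path: &str, mut f: impl FnMut(&[u8])) -> io::Result<()> {
            let data = fs::read(path)?; // Vec<u8>
            for line in data.split(|&b| b == b'\n') {
                if !line.is_empty() {
                    f(line);
                }
            }
            Ok(())
        }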

    This type of thing reminds me of the three articles ending in https://fitzgeraldnick.com/2018/02/26/speed-without-wizardry... (which has links to the first two parts of the saga), where one guy rewrote stuff in Rust for performance, another demonstrated how it was possible to make the JavaScript version faster than the Rust by some algorithm changes and by various painful and fragile tricks requiring detailed knowledge of the runtime environment, and finally the first guy applied the applicable parts of that back to the Rust, after which it handily beat the JavaScript again while also being more consistent and dependable.

    • nerdponx 4 years ago

      It's worth distinguishing between algorithmic optimizations, optimizations that generally take advantage of the language standard/runtime, and optimizations that are highly specific for one machine/platform/implementation. It's also worth keeping track of relative programmer effort to optimize.

      I think most people are interested in moderately-optimized benchmarks, i.e. moderate effort expended relative to baseline implementation effort.

      That is, people are interested in getting the most performance out of the least amount of effort.

      Obviously some people want and need to care about extreme peak optimization. But if you are writing benchmarks for a wide audience, that probably should not be your priority.

    • mst 4 years ago

      The author was pretty explicit in the article that the rust implementation was suboptimal.

      I'm sure if you submitted a better implementation he'd be happy to add it in.

      • dwohnitmok 4 years ago

        Eh kind of... The author was talking about a current iteration of the Rust code being suboptimal, not the current one, which the author believed was well-optimized.

        > The objective is to get the fastest implementation possible, and having optimised Java and Rust implementations to compare against is a motivator to keep going until there really isn’t anything else that can be tried!

        > Do you think Common Lisp can run as fast as well-optimised Rust? Read on if you want to find out.

        • mst 4 years ago

          That was based on all of the Rust feedback the author actually got from Rust people who talked to him, as opposed to people who complained in HN comments.

          Hence if you want it to get even better you should provide feedback to the author, and while I absolutely respect your right to decide you can't be bothered, I don't think it's his fault that he's using code that was well-optimized according to the rustaceans who -did- talk to him.

          • dwohnitmok 4 years ago

            Oh I'm sure there's a better way to give the author feedback than sniping in HN comments.

            I'm reacting to your comment here:

            > The author was pretty explicit in the article that the rust implementation was suboptimal.

            This is not the way the article portrays it.

            • mst 4 years ago

              Perhaps you missed the part where he said you'd need to rewrite it to use a trie to get a properly optimised one?

              • dwohnitmok 4 years ago

                That's only vis-a-vis the Main.java Java implementation, which differs at the algorithm level, not just the implementation level. The author is not referring to the CL implementation (note that the author explicitly calls out Main2.java, CL, and the Rust versions as being the most similar and the ones worth benchmarking against each other). The Main.java one is more of a curio than anything else, which the author threw in because it was the first implementation they wrote (the article acknowledges it has a different algo and is therefore not directly comparable to the other two). This is not what the author is referring to when they say "well-optimised" (note that the article explicitly excludes Main.java from a lot of its valid comparisons, e.g. "Because the Rust code implements a similar algorithm to Java's Main2, NOT Main, we should not conclude that Java can beat Rust in speed!")

                The main comparison of the article is Main2.java, main.rs, and main.lisp, as the author both calls out in the article and the attached GitHub repo (as is apparent in the author's choice of optimizations; if it was a comparison of algos, then the CL and Rust versions would be rewritten to use tries as well).

                The point is that the author did not call out the current Rust iteration as explicitly suboptimal relative to CL. The closest the article comes to calling the Rust iteration suboptimal is

                > However, the Java and Rust implementations were, as CL’s, written without much thought given to performance, after all the description of the original study which introduced the problem asked participants to focus on correctness and readability, not performance.

                which is referring to the previous iteration of the code, not the current one.

                (It's also evident from Cryptonic and chrismorgan's comments they are talking about implementation-level concerns, not algo-level ones such as Main2.java vs the other implementations)

                • brabel 4 years ago

                  There are two or three branches with different versions of the Rust code, and the author is using the fastest one. What you believe will make the Rust code faster won't, trust me.

                  If you think I'm wrong, could you please submit a PR and link here?

                  @Cryptonic's suggestions are laughable. Try using arrays as HashMap keys in Rust :D nope, won't even compile, let alone be fast. There was a way smarter attempt here to do something *based on* arrays: https://github.com/renatoathaydes/prechelt-phone-number-enco...

                  The DigitBytes struct is needed because just using plain arrays (I guess they mean slices, as arrays are obviously wrong) would be incredibly slow - hashing would have to consider the whole array every time instead of just the relevant bytes, making it far slower than `Vec`. The DigitBytes approach is indeed fast, but still slower than `Vec`.
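
                  To be concrete about the idea (a hypothetical sketch of the approach as I described it, not the actual DigitBytes from that branch): a fixed buffer plus an explicit length, with Hash and Eq written over only the bytes in use:

                      use std::hash::{Hash, Hasher};

                      // Hypothetical sketch: only the first `len` bytes take part in hashing
                      // and equality, so map lookups don't pay for the unused tail of the array.
                      #[derive(Clone, Copy)]
                      struct DigitKey {
                          buf: [u8; 50], // 50 is an arbitrary "max phone number length" assumption
                          len: usize,
                      }

                      impl DigitKey {
                          fn as_slice(&self) -> &[u8] {
                              &self.buf[..self.len]
                          }
                      }

                      impl PartialEq for DigitKey {
                          fn eq(&self, other: &Self) -> bool {
                              self.as_slice() == other.as_slice()
                          }
                      }

                      impl Eq for DigitKey {}

                      impl Hash for DigitKey {
                          fn hash<H: Hasher>(&self, state: &mut H) {
                              self.as_slice().hash(state);
                          }
                      }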

                  The other suggestion: print everything at the end?? Do we even know what the objective is here? It's not to finish first, but to show the results to the user as soon as possible. It's like people don't even read the problem statement and still think it's OK to criticize... also, Rust is using buffered IO... ALSO, the benchmark only prints a single line at the end for the last two runs, essentially doing "print it all at the end" anyway.
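
                  And by "buffered IO" I mean something along these lines (a minimal sketch with placeholder output, not the repo's code): results go through a BufWriter around locked stdout, so they stream out as the buffer fills instead of being held until the very end.

                      use std::io::{self, BufWriter, Write};

                      fn main() -> io::Result<()> {
                          let stdout = io::stdout();
                          // One lock, few syscalls - but output still flows out as the buffer
                          // fills rather than being collected and printed only at program exit.
                          let mut out = BufWriter::new(stdout.lock());
                          for solution in ["<number>: <words>", "<number>: <other words>"] {
                              writeln!(out, "{}", solution)?;
                          }
                          out.flush()?; // write out whatever is still sitting in the buffer
                          Ok(())
                      }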

                  • dwohnitmok 4 years ago

                    I think you're missing the point of my comments. I don't know enough of either CL or Rust to comment on the performance (I have less than ~1000 lines of experience in each and have only built the smallest of toy projects in either language).

                    This is my original sentiment:

                    https://news.ycombinator.com/item?id=28842546

                    mst stated that the author explicitly called out the Rust code as suboptimal. That is a misreading of the original text, which was referring to a previous iteration. The author (and you) clearly does not think the current Rust code is suboptimal.

                    • brabel 4 years ago

                      > The author (and you) clearly does not think the current Rust code is suboptimal.

                      Sorry if it looked like I was being harsh on you particularly, I was not... I was being harsh on people making comments like "this Rust code is very slow" without showing their faster code, making suggestions that would almost certainly be slower or not even compile and other similar things I observed in this thread.

                      Never believe anyone saying "this could be much faster" without showing their code so people can actually check it.

                      I will believe the OP's code is slow when I see a faster implementation, and so far I haven't seen one.

                • mst 4 years ago

                  Fair enough. I stand by the claim that, as far as I can tell, he incorporated every implementation optimisation he got as feedback, but you're dead right that I didn't represent his wording correctly.

        • dwohnitmok 4 years ago

          Whoops I meant "previous iteration"

    • brabel 4 years ago

      > it’s using Unicode-aware string stuff

      Rust uses UTF-8 internally for Strings, so it's very efficient to parse a file into a String and then use slices to go through it... this is probably the best you can get, as parsing ASCII input as UTF-8 is very cheap (the high bit is always zero in ASCII, so the Unicode decoder only needs to check that this is the case for every byte; it's not doing some kind of complicated computation to decode)...

      If you use bytes for everything, you will make the whole code much harder to follow and it still won't run faster.

      Check for yourself: https://github.com/renatoathaydes/prechelt-phone-number-enco...

      • chrismorgan 4 years ago

        The code will be somewhat faster (I don’t care to predict how much) from removing the variable-width character encoding in favour of bytewise access. Yes, pure ASCII stuff has some fast paths in string access, but they’re still decidedly slower than the fixed-width encoding that is [u8]. Using strings also gives the incorrect impression that it can cope with non-ASCII.

        The code will be easier to follow if you use bytes throughout, because currently it’s a mixture of bytewise and charwise operations, so that you need to think a little about whether you’re dealing with char or u8 in each place (and half of them are even mislabelled); and there are suitable alternative ASCII methods for every place that uses charwise methods (e.g. char.is_digit(10) → u8.is_ascii_digit()) so that no extra burden is added. In the end the only place slightly complicated by it is printing the solutions, but more code will have been decomplicated—hotter path code, too—so that it’s easily worth it.
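
        For instance (hypothetical one-liners, not lines taken from the repo), every charwise call in use has a direct ASCII sibling on u8:

            // Charwise calls go through Unicode-aware logic; the u8 versions below
            // do the same job once the input is treated as ASCII bytes.
            fn demo(c: char, b: u8) {
                let _ = c.is_digit(10);          // charwise
                let _ = b.is_ascii_digit();      // bytewise equivalent

                let _ = c.is_alphabetic();       // charwise
                let _ = b.is_ascii_alphabetic(); // bytewise equivalent

                let _ = c.to_ascii_lowercase();  // char -> char
                let _ = b.to_ascii_lowercase();  // u8 -> u8, same ASCII behaviour
            }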

        I don't know where the code you're citing came from; it's newer than what's on the master branch, but its changes include some pretty bad stuff, like DIGITS: using &str for something that is always a single-character ASCII digit, accessed by already having had the digit as a u8 and turning it back into a string prematurely. Admittedly the optimiser is going to fix a fair bit of that badness, but not all.

        • brabel 4 years ago

          There are a few pull requests that claim to make the code faster, but you can run the benchmarks and see that none of them actually did. Why not try to improve the code I posted above, make the apparently small changes you want to make, and check whether it's faster or not?

          I've tried a few myself and I am almost sure your hints will not work.

          > some pretty bad stuff like DIGITS, using &str for something that is always a single-character ASCII digit, accessed by already having had the digit as a u8 and turning it back into a string prematurely.

          The end result needs to be printing the strings so I don't see how you can work around that. Can you at least post your code doing that in a way that won't totally destroy the performance gains you may have obtained elsewhere?

  • brabel 4 years ago

    Commenting without going to the trouble of showing that your code is faster is cheap.

    Your suggestions would make the code much slower.

    There may be ways to make it a bit faster, but not with your silly suggestions.

hajile 4 years ago

Those are pretty impressive results for what wasn't a huge amount of changes (mostly just adding some types).

tonetheman 4 years ago

And here come the Rust fanboys telling us the correct way to write the code so it will be faster than anything ever written, safer than anything ever written, and better than any programming language ever written.

  • chrismorgan 4 years ago

    Refer to my other comment here and the cited articles for a fair rebuttal: Rust lets you get equivalent or better performance (than Common Lisp or Java, in this instance) without significant special effort or deep knowledge of the environment, while being much more predictable; and if you do apply deeper knowledge of the language, then it’ll pull well ahead.

  • Tanjreeve 4 years ago

    If there are easy, out-of-the-box ways to write things that a normal dev would use without pushing the language to its limits, then it seems a bit unfair to ignore them. If I declared Python to be the world's most performant concurrent language by hand-wiring Cython and plumbing the deepest depths of the language, and then completely ignored the out-of-the-box constructs in other languages, that would be a bit misleading too.

  • Zababa 4 years ago

    As a Rust fanboy, Rust's advantage is that I wouldn't be afraid of dropping to its level, while I definitely wouldn't feel comfortable with C++ or C. Once the program is written, it's the usual cycle of optimizations: benchmark, flamegraph, cachegrind, etc.
