Rust is now overall faster than C in benchmarks
benchmarksgame-team.pages.debian.netApart from those benchmark games a lot of real world C is a lot less performant than people think it might be. I spent a fair amount of time reviewing C code in the last 5 years - and things that pop up in nearly every review are costly string operations. Linear counts due to the use of null terminated strings and extra allocations for substrings to attach null terminators, or just deep copies because ownership can’t be determined are far more common than exceptional. This happens because null terminated strings feel like idiomatic C to most people.
Rust avoids those from the start by making slices idiomatic.
Another thing I commonly see is the usage of suboptimal containers (like arrays with linear search) - just because it’s there’s no better alternative at hand (standard library doesn’t offer ones and dependency management is messy). Which also makes it less surprising that code in higher level languages might perform better.
The reason that C programs often perform well is that it’s so incredibly hard to do anything at all in C (especially something reliable) that one can usually only do the simplest thing possible and this typically means simple data structures, simple algorithms and arrays. In many ways, modern CPUs are particularly designed to run the machine code generated by C compilers on typical C code like this. Pointers and memory allocation are often painful or fickle so use of pointer-heavy data structures like linked lists is quite uncommon.
The reason that C programs often don’t perform as well as an equivalent rust program[1] is that it’s so incredibly hard to do anything at all in C (especially something reliable) that one can usually only do the simplest thing possible and this typically means simple data structures, simple algorithms and arrays
[1] suppose the programs are independently created at the same time by programmers of equal ability and are idiomatic in their languages. If a C program is rewritten in rust, it is likely to be faster, but that would likely also be true if it were rewritten in C.
> The reason that C programs often don’t perform as well as an equivalent rust program[1] is that it’s so incredibly hard to do anything at all in C (especially something reliable) that one can usually only do the simplest thing possible and this typically means simple data structures, simple algorithms and arrays
Brian Cantrill talks[1] about exactly this: in his C version he was using AVL trees because they are easy to implement, while his Rust version was using B-trees just because he could. And as a result, his naive first attempt in Rust outperformed his long-optimized C code.
So he's comparing the AVL tree he wrote himself to a B-tree optimized to death by someone else?
Out of interest though, does he mention somewhere which actual implementation he used?
Yes, and I suppose that's some of the point: a performant library ecosystem is one hell of a feature.
It sure must be a nice feature - for fast results. Counterpoint, though. I barely used Rust, but when I wanted to play with the Xi editor I got to see the bad places that buying into such an ecosystem can get you to - places formerly pioneered by NPM. I had to download 100s of packages, (IIRC?) there were some build problems, all for something where I don't really see the value of noteworthy dependencies.
If you really need a Btree (like, if you want to make a fair benchmark for a presentation) then you'll absolutely find a reasonable implementation in C. After all, why would you need dependency management for something that should have 0 dependencies?
As an example of library data structures implemented in C - if you want, check out my Red-black tree (not a Btree): https://github.com/jstimpfle/rb3ptr/blob/master/rb3ptr.h . It's really easy to integrate into your project. I think the API (found in rb3ptr.h) is totally usable, and it might also be faster than what you can get with safe Rust - unless you can easily use intrinsically linked data structures in safe Rust.
>The reason that C programs often don’t perform as well as an equivalent rust program[1] is that it’s so incredibly hard to do anything at all in C (especially something reliable) that one can usually only do the simplest thing possible and this typically means simple data structures, simple algorithms and arrays
Reliability is very often the name of the game with C, and part of the reason you might see it written in such a simplistic fashion. In embedded systems, we often follow very strict code convention that has strict requirements on how a C program is to be written. This includes everything from how memory is to be allocated, the maximum number of local variables, the maximum number of the arguments, various limits on a function, constraints to handle errors, etc. We do this because it minimizes potential hazards especially when working with limited memory and system resources.
C is an unforgiving language in that mistakes can occur silently and on mission critical hardware these mistakes can cost more than just your project milestones. These programs are very specialized and often have complex algorithms associated with them. Most of the systems I've worked with include various kinds of feedback control systems, and accompanying algorithms. If you're familiar with control systems, you'll know these algorithms are certainly not trivial by any means.
The challenge with C is more as a developer your C code needs to be perfect. Bugs just aren't an option like they are in other environments. Once a specialized piece of hardware ships it needs to work as intended under a myriad of conditions without powering off for the next 30 years (not always the case but more often than you might think). Sometimes this kind of software is going to be put a position it may have never been designed for. The best we can do is try to add various check/support mechanisms both in software and in hardware, and maintain as safe and hazard free software as possible.
In terms of performance, again coming from an embedded environment, there is almost always a spec we are aiming for. It needs to do X tasks in N time for example. Of course high level design questions like whether to poll or wait for interrupt, or how to broker data from shared resources is decided long before you get to the question of how should I, or if I even should, compare two strings. It is nevertheless advised to go with a trivial solution if the ends satisfy the needs.
Is C time-consuming to write? Is C a "hard" language? These are subjective and depend on the nature of the project. I would certainly never use C with only libc to write a webserver that's going to be serving SaaS Co's backend API.
The irony is that most C developers rebelled against Pascal and similar languages for being programming with a straitjacket, and in the end need to comply with MISRA-C and similar security enforcement regulations to ship anything worth using when human lives are at risk.
And even MISRA C isn't enough to assure the absence of undefined behaviour.
C is a tricky beast to tame.
Not such a big deal in embedded, where the compiler and hardware is always the same, which means that the behavior is still predictable even if undefined
This is not true. Undefined behaviour can mean that
``` bool b; if (b) printf("1 "); if (!b) printf("2 "); ```
might print `1 2 `.
Right on. Unless the compiler documentation promises a consistent handling of the various triggers of undefined behaviour, it is under no obligation to always generate code that handles UB consistently.
To my knowledge there is no C compiler that promises to always handle all forms of UB consistently. The closest you could get would probably be something like Valgrind.
At least for this specific case Microsoft and Google are fixing it by always initializing the variables.
It's not like that is hard to fix, and modern compilers will flag uninitialized variables.
Use your brain and use your tools. C is not really that hard for a large number of problem sets.
Yet, CVE database is full of well known companies that apparently lack such developers.
If their salaries cannot find them, where are they?
Do we have a candidate to prove their secure reports as being wrong?
> It's not like that is hard to fix
In an embedded context, such bugs may be extremely costly to fix, if it's even possible.
> modern compilers will flag uninitialized variables
Reading uninitialized variables is one of the more easily prevented forms of undefined behaviour. As pjmlp points out, undefined behaviour in C/C++ programs is one of the major sources of security vulnerabilities in today's software.
Fascinating. I would’ve thought most commercial C programs would have heavily used linked lists, hashes (dictionary), and binary search trees all over the place.
I assume most C++ programs heavily use more of these advanced data structures, correct?
Heavy use of linked lists is one reason for why commercial C code often is slow. While there are some exceptions linked lists are slower than arrays for most things. Linked lists are easy to implement in C and therefore used a lot.
You can't really compare C with a C++ stdlib or boost, or Qt. Well, some organizations surely have their own repo of debugged, tested, and optimized stuff but I guess a lot of them roll everything on their own again and again - and in C++ there is a rich ecosystem, at least for basic data structures.
The other thing that C++ and Rust have for collections and algorithms is that they follow a standard pattern.
Because of that, it is usually pretty easy to substitute an optimized algorithm or a container for another one with minimal code changes. In C, because there is no standard for how to do containers and algorithms, it is likely that every library does something a little different in terms of conventions making it harder to swap in implementations.
> I would’ve thought most commercial C programs would have heavily used linked lists, hashes (dictionary), and binary search trees all over the place.
Every large scale C code base I have worked on has had these things, and they were used as you say all over the place. If I were to encounter a code base that didn't have these things I would wonder why it was written in C in the first place.
Conversely, I have only seen arrays being used, and never linked lists/hash maps in the few embedded code bases I have worked on.
And I have run across, and implemented, both linked lists, and arrays, in embedded systems. The implementation depended on what made the most sense for the particular problem being addressed.
Mostly agree with your comment but linear search through arrays of size less than a few hundred will typically beat more sophisticated structures such as red-black trees or hashtables. This is due to prefetching and avoidance of unpredictable pointer traversals. Asymptotic complexity is only that: asymptotic.
In many programs in many domains the sizes of these data structures will rarely exceed this limit.
It might be fast, but it'll load CPU caches with that data and it'll evict another useful data. Which means that while this particular code will be fast or at least not very slow, some other code will be slow because its data have to be fetched again.
I have no idea whether that matters or even easy to measure...
> I have no idea whether that matters or even easy to measure...
It is reasonably easy to measure, and the GP is about right. I've measured a crossover point of around a few hundred items too. (Though I'm sure it'll vary depending on use case and whatnot.)
I made a rope data structure a few years ago in C. Its a fancy string data structure which supports inserts and deletes of characters at arbitrary offsets. (Designed for text editors). The implementation uses a skip list (which performs similarly to a b-tree). At every node we store an array of characters. To insert or delete, we traverse the structure to find the node at the requested offset, then (usually) memmove a bunch of characters at that node.
Q: How large should that per-node array be? A small number would put more burden on the skip list structure and the allocator, and incur more cache misses. A large number will be linearly slower because of all the time spent in memmove.
Benchmarking shows the ideal number is in the ballpark of 100-200, depending on CPU and some specifics of the benchmark itself. Cache misses are extremely expensive. Storing only a single character at each node (like the SGI C++ rope structure does) makes it run several times slower. (!!)
Code: https://github.com/josephg/librope
This is the constant to change if you want to experiment yourself:
https://github.com/josephg/librope/blob/81e1938e45561b0856d4...
In my opinion, hash tables, btrees and the like in the standard library should probably swap to flat lists internally when the number of items in the collection is small. I'm surprised more libraries don't do that.
> In my opinion, hash tables, btrees and the like in the standard library should probably swap to flat lists internally when the number of items in the collection is small. I'm surprised more libraries don't do that.
If I recall correctly, the STL provides guarantees that prevents it from taking advantage of flat lists. I think some containers (not arrays) guarantee that they don't move the address of whatever they're containing, even if you insert or remove elements. Even if they switched to flat lists, it would be a flat list of pointers, with all the incurred overhead.
Likewise, I believe I have heard of small string optimisation being impossible with std::string for similar reasons.
> Likewise, I believe I have heard of small string optimisation being impossible with std::string for similar reasons.
not impossible, SSO is implemented to some extent in most mainstream c++ compilers.
I thought that was impossible because it breaks references when moving strings.
I don't see why this would happen. what problems are you envisioning? moves should leave the original object in a valid but unspecified state. that is, a move should not break a reference to the original object itself, but there are no guarantees as to what data it contains afterwards. in the case of a dynamically allocated string, the moved-from object would probably be empty. with SSO, a move is necessarily a copy, so the moved-from string could either be empty or just contain stale data.
are you maybe thinking of iterators? you have to assume iterators are invalid after a move.
No, I think the requirement is that a reference to the actual string buffer must be kept valid when the string is moved. And that property breaks with SSO data, which is not placed in the dynamically allocated buffer - it is placed directly in the object struct (it's an optimization after all), which can't be moved.
EDIT: seems I was wrong, and SSO is allowed for std::string. A similar optimization is illegal for std::vector, though, for the reasons I gave above.
I seem to recall some of this related to absl's flat_hash_map and co. They can't be stdlib compatible because there are api concerns that the std implementations provide (but that in practice no one uses) that result in real world performance being left on the table.
Specifically, the bucket interface: https://en.cppreference.com/w/cpp/container/unordered_map#Bu...
Yes, the address of any element of an std::map is valid until that element is erased.
Thank you, I tried the rope, and I ran a benchmark that creates a rope of length 1G by repeatedly inserting 1023 bytes at random positions. Some notes:
When I changed the hardcoded node size from 136 to 1024 bytes, time went down from 3.6 secs to 2.6 secs on my laptop. It kind of plateaued at 1024 bytes. I didn't do more extensive testing.
What's the rationale for the choice of 136? A cache line is usually 64, so the CPU will always end up loading a multiple of that in any case.
When I did a rope implementation myself (I don't know how it compares in terms of performance) I think I ended up with nodes of 4096 or 8192 bytes size. That was based on a Red-black tree. When I load ~1Gig of data, even with a node size of 4096, there will still be 2^18 nodes, meaning each access requires sth. like 17 child traversals, which seems a lot to me. So I can't see myself going down to 200 bytes or less.
Oh interesting! What CPU was that on? 136 the optimal number I found benchmarking on an old intel based laptop I was using a few years ago. I wonder if modern CPUs (with much larger caches) have changed the ideal array size.
This was on a Sandybridge Laptop from 2011: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz. If I read right, it has 256KB and 3MB L1 and L2 caches
Here's an article about how NSArray on MacOS does something similar:
https://ridiculousfish.com/blog/posts/array.html
In the sense of swapping the underlying implementation as the number of items in the collection increases.
Conversely, more sophisticated algorithms, and particularly monomorphization, may bloat the code size, causing icache, L2/L3 cache, and TLB misses. (Also slows compilation.) I think this is under-appreciated because it doesn't show in microbenchmarks. (I wish I knew of an easy way to measure how much cache pressure you're causing from a microbenchmark.)
I suspect for this reason it'd be better in Rust to use a Go-like hash map implementation [1] that keeps all the key/value information (size, Hash and Eq implementations) in a vtable-like form rather than be monomorphized, except in really hot inner loops where the specialization is worth it. There was an interesting article and discussion on reddit relating to this [2] where someone made a toy type-erased map (though not as nice as Go's) to measure the difference in compilation times. Maybe some day I'll make a more production-ready attempt...
[1] https://dave.cheney.net/2018/05/29/how-the-go-runtime-implem...
[2] https://www.reddit.com/r/rust/comments/g1skz4/an_experiment_...
Thank you for mentioning monomorphization - I wasn't aware of this concept.
Would you maybe if there's any source discussing how this is solved in different languages and performance consequences of it?
Not really answering your questions, but regarding monomorphization, you can see a comparison of code size and compilation speed between printf, std::format and C++ iostream. The last one is monomophization, I believe.
I'm not aware of a high-quality source discussing this. I'd be interested in seeing one. Off the top of my head:
* In many languages, monomorphization is impractical. Eg, in C it requires macros or external code generators. Java and Go don't even have macros. So it's rarely done. Instead, stuff uses type erasure, dynamic dispatch, more heap allocations, and tables like the ones described in that dave.cheney.net link. Maybe some JITs or even some ahead-of-time compilers do monomorphization internally in some cases, but I'm speculating rather than speaking from knowledge.
* In C++ and Rust, monomorphization is easy. You don't have to do it, but it's easy, so many people do. The containers in the standard library are monomorphized. This code can perform very well in any particular case, but in aggregate it's large enough that I have doubts about whether it's always worth it for the whole program.
You’re stretching there imho.
L1 and L2 are per core. A cacheline is 64 bytes and per-core cache size is on the order of 32kb and 256kb for L1/L2.
Reading data from L1 is on the order of 1 nanosecond (or less) and RAM on the order of 50 nanoseconds.
If you’re scanning an array and load a dozen cachelines that’s almost certainly preferable to several cache-misses (and lines).
Memory access is very often an application’s bottleneck. The answer is almost always more arrays and fewer pointers.
> The answer is almost always more arrays and fewer pointers.
The number of people who dismiss the lowly array is way too high. Arrays are fast. Keep your data in the cache and they're 10x or more faster than normal, and plain flat arrays are almost always faster than literally any OOP nonsense. By "almost always" I mean that I've never once encountered a situation where flat arrays weren't the fastest solution attempted, and I haven't seen everything, so I can't claim they're always fastest.
People really don't understand how fast their computers really are, thanks to developers not caring how fast their code is.
The CPU will load less cache data with linear searches. This is because you will have less cache misses. Less cache misses == less loading from memory. With pointer-heavy data structures, you load more from memory, and most of the stuff you do load is useless.
A binary tree might require you to visit only O(log2(N)) of your data, while a linear array on average requires you to visit one half. How does that correspond to less loading from memory?
Linear arrays are often faster, not because they require fewer memory loads, but because the cache has intelligence built in (the prefetcher) that loads some memory in advance even before the code requests it. That intelligence works way better with linear array scans than with pointer chasing. (I'm not sure if these loads will count as cache misses or not).
You need to look at the coefficients. Memory queries pull in 64 bytes of contiguous memory at a time (actually more with speculative prefetching). That extra data is used in a linear scan, but is mostly wasted when doing random access. You also have other bottlenecks with binary search, e.g. branch misprediction.
Yes, for large enough N, a BST easily beats linear search.
> That extra data is used in a linear scan, but is mostly wasted when doing random access
You are shifting goalposts. What the parent poster said is that a linear scan will read more memory than a binary search, therefore it will cause more churn in the cache (assuming same cache policies, which I don't know is a safe assumption), therefore linear scan may actually be less cache friendly - given that you're not interested in the data in any other way besides for the scan - by way of putting other parts of the program whose data was pushed out of the cache at a disadvantage.
Implied was also that linear scan might actually result in a slower program, even if the scan itself might be faster than a binary search.
What you replied is "The CPU will load less cache data with linear searches", which I assume to be false, although I will be happy to learn that it's actually the case (for example because the CPU has a clever cache eviction policy, executing linear scans by streaming in memory into a small part of the cache instead of thrashing the whole cache).
Let's say we have an array of 200 4-byte ints. Let's do a linear array scan and on average will touch 100 of them (or about 7 cache lines). Now let's say we have each integer in its own linked node in a BST in a totally random memory location. We can expect to need about 7 steps as well (2^8 = 256 > 200), which is 7 cache lines. So about 200 is already a lower bound where BST is more efficient in terms of visited cache lines, even for a pessimistic inefficient data structure like this 4-byte integer example.
This is why Rust uses B-trees, which combine linear search and trees at optimal ratios.
> null terminated strings feel like idiomatic C to most people
Doesn't that mean that null terminated strings is idiomatic C? That is, my understanding of the term idiomatic is that it is defined by whatever is most natural to users of a language regardless of whether it is the most performant.
One of my long standing complaints about modern programming is how rarely people read code. We don't encourage it in school, and in the workplace most people only read code written by their coworkers.
It would be the equivalent of teaching people to write books without encouraging them to read anything.
To break myself of the habit I started reading some well regarded programs for fun. And oh boy, have I learned a lot from doing so. One of my first discoveries was this beauty in the Redis source code:
https://github.com/redis/redis/blob/3.0/src/sds.h
The idea is to have a string struct that stores its length and content. But the pointer passed around is a pointer to the (null terminated) contents field in the struct. The string is efficient for internal calls (the length can be queried by subtracting from the pointer). But the pointer is also an idiomatic null-terminated C string pointer, compatible with the standard library and everything else. (typedef char *sds;)
Dovecot is also a gem to read if you're looking for inspiration. The way it manages memory pools is delightful - and I'm sure much more performant than idiomatic rust. (That is, without reaching for arena allocator crates and alternate implementations of Box and Vec).
They're somewhat old now, but I'd recommend checking out The Architecture of Open Source Applications series of books.
My go to troubleshooting method for open source tools I use is to just read the code to figure out what it’s doing (if I can’t quickly figure it out from the documentation). I’ve certainly learnt a lot from doing it, and I’ve even written patches for tools written in all sorts of languages that I’d never used before. The first time I ever wrote Rust and Go code, I was patching bugs in open source projects that I needed to have fixed.
COM on Windows also promoted BSTR as a language-neutral type, which was a pointer to a null-terminated UTF-16 string (compatible with Win32 entry points, at least the Unicode versions) preceded in memory by its length.
This feels like a wrapper around an equivalent of the Linux container_of construct (https://radek.io/2012/11/10/magical-container_of-macro/). I have no idea whether there's further prior art in that respect.
What makes you think arenas are considered unidiomatic in Rust? They’re there to be used when appropriate!
They seem unidiomatic because you have to fight the standard library to use them. Using an arena means abandoning String, Vec, Box, std::collections, and so on.
I have no problem doing that if I need to. But it feels like I'm fighting against the grain of rust more than I'd like.
Per object custom allocators are coming soon! I believe Vec already has an implementation on nightly.
This is one of the things that GATs will solve: it will be possible to have a pointer trait. It's a known language limitation, so you are trying to do the right thing.
This topic is of great interest to me, do you know if there is any related official or community documentation on using arena allocation (and the problems you mentioned) in Rust? I've only found https://doc.rust-lang.org/1.1.0/arena/index.html
Crates like https://crates.io/crates/typed-arena and https://github.com/fitzgen/bumpalo are the way you do this in today’s Rust, but what he’s referring to is that types like String manage their own allocations and aren’t yet parameterizable by an allocator. So they’re not super easy to use together.
In my experience most of the time you need arenas you’re using your own data structure anyway, but YMMV.
> In my experience most of the time you need arenas you’re using your own data structure anyway, but YMMV.
That makes sense for video games. Recently I was goofing with cyrus-imap. I wanted to parse the emails out of an mbox file into JSON (JMAP). Parsing an email with cyrus currently does about 5-10k calls to malloc, but the objects are all extremely short lived - they just have to live long enough to parse and then convert to JSON. This is a perfect case for a bump allocator - I'd love to allocate all the parsed email fields into an arena and then clear the whole thing when we move on to the next message.
Yes, Cyrus uses a ton of its own internal structs for emails, and they're littered with strings and vectors. (Eg for email headers, lists of email recipients, plain text / HTML message content, etc).
Looks like bumpalo will do the job, since it implements its own Box, Vector and String. I understand why, but it seems jarring that I'd need to replace the data types in order to change out the allocator like this. I'm definitely keen for GAT landing if it means bumpalo and friends don't need to reinvent the world to be able to change the allocation strategy.
Edit: Oooh Vec::new_in is in nightly! Exciting! https://doc.rust-lang.org/beta/std/vec/struct.Vec.html#metho...
Thanks for linking those projects and for "yet", which led me to the pages below
https://rust-lang.github.io/rfcs/1398-kinds-of-allocators.ht...
That is from 1.1.0, which is quite old.
do you have any recommended entry points for reading dovecot's code? I opened it up on github and it's a very large library! thanks for any help you can provide.
Nope not really! It is large, but you don't need to understand the big picture. Explore!
I'd either start at a query entry point, or find the email delivery path or something. Or skim through the files and see if anything catches your eye. Or invent a question for yourself - how does X work (for some X), and see if you can find that part of the code.
Wherever you start, explore around a bit then go deep. See if you can understand for yourself how some small, interesting part works (by tracing out the various structs and function calls). Understanding how the whole program fits together is a separate skill - but don't worry too much about it.
Personally I fell in love with the code in src/lib/memarea and mempool. The way dovecot handles memory pools is super clever. There's also lots of src/lib-X directories, and they're reasonably self contained if you want to skim that list and find something more focussed.
If you haven't read much code before and dovecot feels too big and scary, start with something smaller. I can also recommend reading the sourcecode for git or the code for redis. Oh, and if you do, do yourself a favour and checkout one of the earlier versions of those projects. The lines added early in a project's life are usually more succinct and important compared to lines added later on. So earlier versions are usually better reads. When I read redis, I think I read the code for redis 2.0 or something.
Null terminated strings are often called cstrings. They’re beyond idiomatic; they’re part of the C standard library.
Also Unix. Syscalls return null-terminated strings all over the place. So any language running on Unix has to deal with null-terminated strings.
I know because I run a userland that uses length-prefixed strings as far as possible: https://github.com/akkartik/mu
Not just the standard library, but the core language. They are what you get if you write a string literal in your code.
Yes but if we are being pedantic no. Of course the language does still greatly lean into the null terminated string concept.
You get a null terminated char array but the length is actually available since arrays (as long as they haven't decayed into pointers) can still share their length at compile time.
True. I was skipping over that, but the fact that functions manipulating null terminated strings are part of the standard library is certainly a reason in itself to consider them idiomatic.
They're agreeing with you.
Yes, I understand :)
Also lack of generics can make it slow, e.g. qsort() requires a function call for each comparison. So C++'s std::sort() can be significantly faster on an array of integers.
It's more to do with the fact that std::sort's definition is visible to the compiler and qsort() is not. Put qsort() code in stdlib.h, make it static and write a static intcmp() and you'll see the compiler inline that no problem.
I've done this a few times using http://www.corpit.ru/mjt/qsort.html
Sure you can hard-code intcmp into qsort but then it would only work for arrays of ints.
You could do some macro magic instead of templates e.g. `DEFINE_QSORT(int, intcmp)` which could stamp out `qsort_int` but that's not a part of the stdlib.
C++ arguably gets this right since sort<int> and sort<string> will be separate functions, although templates are of course a footgun. And of course duping the logic for std::sort<T> for a bunch of different T impls increases the binary size.
The poster you are replying to didn't suggest hardcoding intcmp into qsort - just making it so the implmentation of qsort is available to the compiler when the comparison function is known (i.e. just like with C++).
When this is done, the compiler can inline qsort, and replace the indirect function call with an inlined version of intcmp, and then things are equivalent.
Only if inlining qsort is best. Sometimes it is sometimes it isn't, based on complex rules that I trust the compiler to know.
I think mh7 did not mean to hard-code intcmp into qsort. The idea is to move the definition of qsort directly into the stdlib.h header file. That way, the compiler can see the definition of qsort and intcmp at the same time.
In that case, the compiler could make a specialized qsort using intcmp automatically.
I'm skeptical. I doubt that the whole of qsort gets inlined (it's big), and the best the compiler can do is to clone qsort to const-propagate the function pointer for it to get inlined.
In my similar experiments with std::sort with a function pointer only gcc does this with -O3.
I benchmarked it several times in the past and couldn't replicate std::sort being faster (GCC with high optimization settings). Anyway both are slow. If you need fast sort you need an implementation without any function calls (no recursive calls) and both the pivot choice and the chunk size at which insert sort kicks in optimized to your data and hardware. My experience is that you can beat built in sort by 2x to 3x.
How would you perform this optimization? If it’s the same data getting sorted, why not put it in an ordered data structure?
Andrei Alexanrescu has a talk on doing this - he calls them metaparameters e.g. where a hybrid sort chooses to change algorithm.
One library I have exploits the fact that D templates are embarrassingly better than C++'s, so you can actually benchmark a template against it's parameters in a clean manner without overhead - that could be anything from a size_t parameter for a sort or a datastructure for example.
This made-up (pointless) benchmark measures how insert a number of cpuid instructions into the loop of a summing function affects it's runtime. My library writes the code from your specification as above to generate the instantiations and loop to measure the performance. As you might guess, the answer is a lot (CPUID is slow and serializing).enum cpuidRange = iota(1, 10).map!(ctfeRepeater).array; @TemplateBenchmark!(0, cpuidRange) @FunctionBenchmark!("Measure", iota(1, 10), (_) => [1, 2, 3, 4])(meas) static int sum(string asmLine)(inout int[] input) { int tmp; foreach (i; input) { tmp += i; mixin("asm { ", asmLine, ";}"); } return tmp; }edit: https://github.com/maxhaton/chimpfella - I haven't bothered to add pmc support yet
I love qsort! It's easy to use.
When I once tested std::sort against qsort (sorting 4-byte integer) I measure a 2x difference. So yes, definitely non-trivial, but it won't get much worse than that.
Have you ever seen a program that was slow because of a slow sorting routine?
If you ever need a fast sort (~ never) then the last thing you should do is use std::sort anyway. You should figure out what your data looks like and hand roll an implementation. For example, a radix sort is often possible to use, easy to implement, and much faster than std::sort.
> real world C is a lot less performant than people think it might be
What is meant here? Fast? Reliable? Secure? Memory efficient? Power efficient? Easy to write? Easy to maintain? Quick to compile? Easy to debug?
I have no illusions about the reliability/security/correctness of real-world C code, especially since it's usually not compiled with a memory-safe compiler or run with memory-safe libraries and runtime environments (though it's often sandboxed to limit the damage.) It's relatively easy to introduce memory errors which are not detected by the compiler or runtime.
Certainly many algorithms and data structures (in C and other languages) exhibit tradeoffs including things like speed vs. memory use vs. code size vs. complexity, etc..
But C compilers are pretty fast, which I really like. Then there are/were environments like Turbo Pascal or Think C, which seem to have been amazingly compact while offering a rapid edit-compile-debug cycle as well as decent runtime speed and code size.
I assume, since the submissions says is talking about something "faster in benchmarks", that he means "fast".
Yeah, I think you're right.
"Less performant" seems like a less clear (so to speak) way of saying "slower" (or maybe "less efficient" or simply "worse" in some instances.)
Right. As an industry, we need to get away from the simplistic "manual=fast" model. Logic does not automatically get faster when you write it in C or C++. It frequently gets slower, since your one-off "lean and mean" native code clumsily and slowly does the job that a managed runtime has been tuned for decades to do.
D uses slices for strings, which also gives D a big performance boost over C whenever strings are in play.
Walter, can you elaborate on this, are D's slices functionally equivalent to C++'s string_view? I.e. no copying as long no 'ASCIIZ-dependent' code is involved?
D strings are a simply a pointer/length pair. Substrings can be extracted with no alloc/copying necessary. The length can be determined without loading the string into the cache and scanning it.
Those two features make for big speed improvements.
It is funny when I think of the hundreds of thousands of lines of code I have shipped in my long career (all embedded) and how little of it ever had to handle a string.
I agree, C is a really poor choice for string or other human readable content handling.
But if you have to read-shift-mask-write bytes to hardware control registers, it is a pretty easy language to use, and faster than assembly language in 90% of the cases, with modern compilers. That last note was less true in the mid 1980s, but by the mid-1990s, compiler optimization had gotten to be quite good.
Cellular modem code that runs on DSPs is written in C. Maybe someday that will be written in Rust. We'll see.
Forgive me for asking a stupid question: what does it mean when something is “idiomatic” in a programming language context? Is it just the best or recommended way to do something? Or is it something that’s supported by the language? Or something else?
Idiomatic in a programming language means pretty much what it means in any other language: how would someone fluent in the language express it? In a programming context, that may also correspond to the most performant or otherwise "best" expression, but sometimes it's just a commonly used style or phrase.
A programming language idiom is a pattern or technique that you expect your readers to know, even if it isn't obvious when encountered for the first time. Here's an idiom:
We all instantly know this means loop 10 times, from 0 through 9. But on first read it takes a little work to figure out what's happening.for (int i=0; i < 10; i++)I was struggling with that too initially - replace the word with 'standard' or 'natural' and it retains the same meaning.
Slow code bottlenecked by string operations is probably just bad, though - no matter which way you code your strings. Strings are not what computers like to do. Computers like integers and subscripting operations. Strings can make sense for inter-process scenarios (for example, file paths are usually the right thing and you don't want to reference files with inodes).
There is a culture of bad C code from the 90s - especially C code written in OOP-y ways where that was never necessary. Like, calling malloc() + free() for every little thing instead of more structured memory management. A lot of that code is written in C not with a efficiency or elegance mindset but simply because C was the language that you wrote programs in.
> I spent a fair amount of time reviewing C code in the last 5 years - and things that pop up in nearly every review are costly string operations.
Do not tie computational operations on string operations in C.
Pick one of many string manipulation libraries to do any serious work with them.
Once LLVM fixes some bugs with `noalias`, at which point Rust will begin using it again in more circumstances [1], I'd expect to see Rust get even faster in these benchmarks, given that the Rust compiler knows much more about which pointers do/do-not alias than most other programming languages [2] and the myriad optimizations this knowledge allows.
[1] https://github.com/rust-lang/rust/issues/54878#issuecomment-...
How often does benchmark code have a function that takes two pointers that could potentially alias each other? If it's as rare as I think it is, it might not have that much of an impact on Rust's position in the benchmarks game.
Still, real world performance will probably benefit from this fix so it's a positive change regardless.
I've been fairly convinced for a while that once Rust matures (which is probably fairly close to "now", but I've held this opinion for years) that it's going to have a performance advantage in real code that's going to be hard to capture in benchmarks, because it's easy in a small benchmark to be very careful and ensure that you don't have aliasing, avoid extra copies, etc.
Where I expect Rust to really shine performance-wise is at the larger scale of real code, where it affords code that copies less often because the programmer isn't sure in this particular function whether or not they own this so they just have to take a copy, or the compiler can't work out aliasing, etc. Ensuring at scale that you don't take extra copies, or have an aliasing problem, or that you don't have to take copies of things just to "be sure" in multithreading situations, is hard, and drains a lot of performance.
I agree with this. Benchmark code differs from real code in in that it approximates the performance ceiling for a language implementation; it's not "ordinary code" or even "somewhat optimized" but usually the most optimal code one can conceive of with little respect paid to competing concerns, like maintainability. Rust aspires to make idiomatic, maintainable code almost as performant as benchmark code by way of zero-cost abstractions; probably more so than any other language, including C and C++, and all the while keeping the ceiling high relative to other languages.
Unfortunately, I'm still of the opinion that "idiomatic Rust" is quite a lot harder to write than idiomatic Go or C# or etc, and many applications absolutely index on developer velocity and performance and quality are "nice-to-haves". Many companies are making money hand over fist with Python and JavaScript, and Go and C# are quite a lot more performant and typesafe than those languages already; Rust is better still in these areas, but returns diminish. If C# and Go preclude 95% of the errors found in Python and JS, it's not worth trading tens or hundreds of percents of developer velocity for that extra 4% or so improvement in quality (a Go developer could recoup a bit more of that 4% by writing tests in the time they save over Rust).
Of course, this is all subjective, and I've met smart people who argue with a straight face that Rust is better for developer velocity than languages like Go, so YMMV.
> If C# and Go preclude 95% of the errors found in Python and JS
Well Go and C#[1] still suffer from the billion dollar mistake (null pointers), which represents at least 1/3 of errors I've witnessed in JavaScript code, so I'd say they at best removes 70% of errors. And there's also logic errors, for which neither Go's or C#'s type system helps either, so maybe we're at 50% error reductions with Go and C# compared to JS and Python. I don't even think that Rust, with its really helpfull type system (ADT + Affine types) achieves 95% error reductions though.
[1]: for C#, at least last time I used it, which was 2011
> for C#, at least last time I used it, which was 2011
Since then, C# added nullable reference types:
https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...
In projects with complete NRT coverage, it's extremely rare I ever get a null ref, as the compiler will warn/error me if I'm using a possibly null value in an unsafe way.
Thanks. I suspected it since they added them to TypeScript, and I'm glad they did!
Newtypes are also pretty great for detecting logic errors. (You can use them in Go and C# but they're not idiomatic and usually not zero-cost so people usually don't.)
Newtypes are pretty idiomatic in Go (e.g., https://golang.org/src/os/types.go#L35), and they're so inexpensive that cost isn't a reason to forego them 99% of the time.
Sorry, my mistake, thanks for the correction.
Besides the sibling answer about Go, when you are in .NET world there are also F# and C++/CLI to chose from, so C# doesn't stand alone (leaving VB.NET out as it appeals other kind of developers).
The optimal point on the tradeoff between developer velocity and performance/correctness depends a lot on the domain, in particular:
-- How much usage do you expect to have? The more usage, the more important performance and correctness are.
-- How critical is your application? The more critical it is, the more important correctness is.
Also, the argument that you can spend developer time to increase correctness just by writing more tests works up to a point, but then it doesn't, because of diminishing returns. With Rust you can eliminate certain entire classes of bugs which tests will never reach.
> The optimal point on the tradeoff between developer velocity and performance/correctness depends a lot on the domain
Agreed. This is what I was alluding to by "many applications absolutely index on developer velocity and performance". Note that it's even a bit more nuanced--within an application there are bits that are more sensitive than others. For example, the UI widgets are typically much less sensitive than the security, data privacy, and data integrity bits. Even still, I haven't noticed a frenzied rewriting of these systems from Go/C#/Java/etc into Rust, and I'm willing to bet that there's a fair amount of this kind of code implemented in JavaScript or Python.
> How much usage do you expect to have? The more usage, the more important performance and correctness are.
True, but this is a very low-value concern. Notably, hugely popular apps like Reddit can break altogether on a nearly daily basis (never mind more minor bugs, like some UI widget breaking) and they're still content to write in a completely dynamic language.
> How critical is your application? The more critical it is, the more important correctness is.
I think this is a much more important driver than usage, but relatively little software is so critical. As previously mentioned, there's lots of software that governs privacy and security systems that isn't being frenzily rewritten in Rust. Even OpenSSL, an application in which correctness is far more important than velocity, isn't considering a rewrite to Rust despite that it's written in a language with worse static verifiability than the Go/C#/Java tier. This is probably the kind of application that I would want to see in Rust--it's very sensitive to performance and correctness; which is to say, I think OpenSSL is very near the boundary at which writing in Rust makes economic sense (something that is very stable and very sensitive to correctness and performance). It's almost certainly not your web services.
> Also, the argument that you can spend developer time to increase correctness just by writing more tests works up to a point, but then it doesn't, because of diminishing returns. With Rust you can eliminate certain entire classes of bugs which tests will never reach.
Yeah, sorry, my brain was thinking one thing and my fingers typed another. I should have said "writing additional tests would recoup a bit less than the additional 4% that Rust would get you"; instead I typed "more". I missed the edit window, so I can't update my post.
I think we mostly agree.
> Notably, hugely popular apps like Reddit can break altogether on a nearly daily basis (never mind more minor bugs, like some UI widget breaking) and they're still content to write in a completely dynamic language.
Yes, but I wonder if people are sometimes being irrational here. You pay for those breaks out of the devops budget. Moving breakage to earlier in the developer cycle typically reduces the cost of fixing it.
Regarding OpenSSL, there are a few reasons that isn't going to be rewritten in Rust anytime soon. The developers probably don't know Rust. OpenSSL runs in lots of places that don't have a Rust toolchain. Even if you rewrite it in Rust you're stuck with a bad API so why bother. In general there are more reasons against rewrite code in Rust than against writing new code in Rust ... and more reasons against writing new code in Rust that interfaces with old code than writing new greenfield code in Rust.
> You pay for those breaks out of the devops budget.
Only for organizations where "devops" means "ops". If your organization practices continuous deployment, then getting a bug fix out is no big deal (merge your PR to master and make sure the deployment doesn't fail); certainly no devops time is required.
> Moving breakage to earlier in the developer cycle typically reduces the cost of fixing it.
This axiom originates in the days of waterfall development and annual software deployments shipped to customers on shrink-wrapped physical media. It's not nearly as costly as our dated CompSci or SoftEng educations make it out to be.
Your average web service isn't going to benefit on balance from moving from Go to Rust because any gains in finding/preventing bugs sooner are going to be dwarfed by the slowed pace of development.
Of course, someone will point out that data races of the sort that Rust precludes are really hard to debug, which is true, but they're also very rare because request handling rarely requires parallel access to local resources (mostly things like database connections, where the locking is handled by the database library). The total cost of dealing with these for your average web application is still going to be dwarfed by the smaller tax Rust imposes on every change you make.
Note that Rust has improved significantly, both by accepting more correct programs (laxer rules such as non-lexical lifetimes) and by improving tooling (rust analyzer). I'm not sure if they'll ever close the gap completely such that using Go/C#/etc are relegated to certain niche use cases, but I think it will continue to steal more and more of the application space that leans toward performance- or correctness-sensitive.
I don't think the value proposition of Rust over C#, Java, and Go is that it eliminates more errors. I think the advantage is it eliminates the same number of errors and performs better/has less overhead.
I agree with your post generally though.
Lots of people argue that Rust improves correctness by way of sum types (including no implicit nil/null), affine types (addressing data races and so on), and (compared to Go) generics. I completely buy this argument.
Those data races are only addressed for in-memory accesses across threads.
If you are accessing data via some mechanism of IPC across processes, e.g. database access to same data rows without proper transactions, accessing shared files without locks, GPU memory data,.., affine types won't save you.
Yes, that is correct.
Maybe but there's reasons to doubt this. For starters, it's not obvious that Rust copies less. Sometimes you copy in Rust because it's easier than threading lifetimes throughout; sometimes you copy because you are defending against an imaginary threat (consider std::env::var or std::env::args).
One potentially important difference is integer indexes vs pointers. Compared to C, Rust uses int indexes less often within a function (iterators, not for loops) but more often between functions: in C you might store a pointer somewhere, while in Rust it's easier to store an index into a collection. But a ton of work has been poured into LLVM's induction variable analysis so this is swimming upstream.
The aliasing could be really important; there's a ton of pessimizations for e.g. uint8 pointers. Part of the maturing is fully specifying Rust's UB and aliasing model.
One big advantage when writing average code is that while Rust emphasizes generics and makes them easy to write, C++ makes it so difficult that you try and avoid it at all costs. Big, complex projects are going to probably use polymorphism all over the place in my experience which is not really captured by benchmarks.
However, one advantage for C and C++ for such projects is that they make it far more pleasant to do bulk allocations. With C, you just need to group your mallocs and cast the memory and C++ offers placement new where you pass in the memory explicitly. With Rust, you need to reach into unsafe Rust (rightfully so), but unsafe Rust is very unpleasant to write.
I don't agree with the first paragraph at all. C++ templates are more powerful than Rust generics, and also easier to write, because they are not inherently type-checked. In practice templates are embraced and used extensively, not "avoided at all costs" (like C++ exceptions.)
>C++ templates are more powerful than Rust generics, and also easier to write
They are definitely not easier to write. The last time I checked, the error reporting were still so intractable that would be hard to take this argument in good faith. Maybe it improved?
>In practice templates are embraced and used extensively, not "avoided at all costs" (like C++ exceptions.)
I think many would agree that this is largely because the package management system for C++ is so awful that header-only libraries are popular since you just need to copy paste a single file into your project.
C++ is a large language and there are some wildly different coding standards. Some do avoid templates (but you're right that GP overstated this). Why? Templates introduce a lot of code bloat because there is little opportunity for eliding copies of the same code at link time. Aside from bloating the global offset table in position independent executables (if the GOT doesn't fit in cache, perf drops), the compile time hit is also a non trivial issue. But this is a really exceptional scenario where you are shaving the monolith and need to get the ELF executable down (e.g. server side ball-of-mud, small embedded systems).
I agree that the aliasing control and other features of Rust enable all kinds of interesting optimizations that currently aren't being done, and 'restrict' is only part of it. For example we also know that the data behind a shared reference is really immutable (apart from UnsafeCell etc).
Unfortunately we may need an entirely new generation of compiler infrastructure to exploit these opportunities because LLVM is really a C/C++ compiler at heart and may not be happy about taking big changes that can't benefit C/C++.
Honestly, the biggest thing that attracts me to Rust is memory consumption. Most of my work is IO bound, usually on IO latency and not throughput, so the ability to pack more instances per container/vm/server/whatever would have a significant effect on my final bill per measurable unit.
The problem is that it’s “rare” and not “impossible.” Compilers have to be logically sound, not probabilistic when it comes to defined behaviors.
Benchmark hackers can just sprinkle `restrict` all over the place, too. Real world C code doesn't get restrict by default and often isn't fully annotated, so the benchmarks may artificially hide the real-world difference.
> How often does benchmark code have a function that takes two pointers that could potentially alias each other? If it's as rare as I think it is, it might not have that much of an impact on Rust's position in the benchmarks game.
Most C++ STL algorithms take iterators, that in many cases are just pointers in benchmarks because these often only deal with arrays. Hell most C standard APIs (e.g. memcpy) take multiple pointers.
I doubt there's any performance to be gained that way, but if so, the C implementation can just use `restrict` to the same effect.
Have you ever used restrict in anger?
I've done it when we really needed that performance for an inner loop(particle system). It can be a real bastard to keep the non-alias constraint held constant in a large, multi-person codebase and the error cases are really gnarly to chase down.
Compare that to Rust which has this knowledge built in since it naturally falls out of the ownership model.
Isn't every pointer `restrict` in Fortran which was often cited as why it still outperformed C in many cases for a very long time?
I agree with that, no aliasing in C is an ugly kludge that bitches about perfectly fine code that benefit benefit from it. And it's hard insure that it actually works in code that does. Worse the failure is completely silent.
Is there something special about the noalias keyword that it really brings out the most unprofessional pig language in developers? https://www.lysator.liu.se/c/dmr-on-noalias.html
When people use the unsafe keyword are they always taking into account aliasing?
At least you can audit only those places though.
Yeah, `unsafe` is the developer signing a contract to uphold all the guarantees that safe rustc provided (at least from the perspective of the public interface, the invariants can be broken in private code).
That's why Rust developers don't take kindly to unnecessary unsafe usage, because the audit surface area (and interaction complexity) increases.
But this is the crux. What sorts of aliasing are permitted in unsafe Rust? I think we kind of don't know; that's what the "stacked borrow" effort is trying to answer.
We know pretty well for the vast majority of patterns whether it violates the aliasing rules or not. There may be some corner cases that are not yet decided on, but if you just consider those as UB for now, you are perfectly fine.
Rust takes const with pointers very seriously. It is undefined behavior to mutate anything non-mut (pointer or reference) unless it is through an UnsafeCell. While you can break compiler guarantees through pointers in Rust, dereferencing this pointers must still obey the above guarantee (basically think of it as you need to go back to references to actually use the aliased pointers which would be undefined due to the aliasing).
Therefore, the optimizer can always assume mutations don't alias. Even UnsafeCell isn't allowed to break the alias rules, it just provides flexibility going from immutable to mutable.
What you describe about 'const' is just about const correctness. That's important but different than aliasing.
For example consider Slice::swap [1]. This does consider the possibility that the pointers alias, and correctly, because they might point at the same object.
https://doc.rust-lang.org/src/core/slice/mod.rs.html#541-553
Thanks for posting that link. I found the following issue because of it: https://github.com/rust-lang/rust/issues/80682
I mean obviously there are people who write incorrect unsafe code, but it is certainly something people take seriously.
"Just use 'restrict'" isn't as easy as it sounds. You need to only use 'restrict' where you are 100% sure pointers can't alias. Wherever you get this wrong you have introduced a subtle bug that a) only shows up in optimized code b) may or may not show up at all depending on compiler version and target architecture and c) only shows up when two pointers actually alias at runtime, which may be rare, and may be intermittent ... in other words these bugs will be hell to debug. So in fact in a large codebase it will be a lot of work to figure out where you can safely put 'restrict' and you will probably introduce some very nasty bugs. Most people aren't going to be willing to do this.
Rust's problem with aliasing in LLVM is caused exactly by limited usefulness of C's restrict.
LLVM implements only coarse per-function aliasing information needed by C, and doesn't properly preserve fine-grained aliasing information that Rust can provide.
Note that Rust's issue was that LLVM is buggy with restrict (because it isn't exercised frequently).
That said, there is a series of patches in-flight to get actual working full restrict support: https://reviews.llvm.org/D69542
I thought the problem was there is simply a bug in the implementation because restrict isn't used frequently enough to have exposed the bug earlier.
> I doubt there's any performance to be gained that way
And how do you know this? Have you actually measured the difference?
My CPU-heavy program (which takes ~30 seconds to run on a 32-core Threadripper) gains ~15% extra performance if I turn it on.
Sure, if you put the burden of proving correctness of restrict on the developer.
That's like saying C programmers could just write memory safe code if they felt like this would help, so clearly memory safety is not important.
Looking at the reverse-complement code, it appears that the Rust and C implementations are using different algorithms:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
On a quick inspection:
- The Rust code is about twice as long.
- The Rust code has CPU feature detection and SSE intrinsics, while the C code is more idiomatic.
- The lookup table is larger in the Rust code.
I recall an anecdote about how Haskell actually outperformed C on various tree benchmarks because it was using a better implementation. At some point, the C programmers got fed up with the airs of superiority from Haskell programmers, ported the Haskell implementation, and reclaimed their position.
I wouldn't be surprised if there's something similar happening here.
That kind of tit-for-tat in benchmarks seems like it's counter to the goal of benchmarks: what kind of performance could I expect to see using technology $FOO? Crucially, that question depends on how someone will realistically implement $FOO.
I like PyPy as an example: on the surface, implementing a Python runtime in Python and expecting performance gains seems crazy. PyPy manages to outperform CPython because although a C implementation should theoretically be faster, realistically the increased expressiveness of Python lets the PyPy devs opt into optimizations the CPython devs find out of reach.
I don't know C or Rust well enough to comment on these specific scenarios, but if two technologies can be fast, and one makes that speed accessible while the other encourages implementors to leave performance on the table, that's much more useful information to me than seeing a back-and-forth where folks compete to implement optimizations I'll never make in my own projects.
> PyPy manages to outperform CPython because although a C implementation should theoretically be faster, realistically the increased expressiveness of Python lets the PyPy devs opt into optimizations the CPython devs find out of reach.
Pypy outperforms cpython for the simple reason that pypy is a jit where cpython is a basic bytecode interpreter. Anything else is icing on the cake.
The only reason python seems slow as an implementation language is because it was traditionally slow; the reason it was traditionally slow is that cpython, the only major implementation, is slow. Common lisp, for instance, is similarly dynamic (moreso, in fact), also is also generally natively compiled by itself, and is quite performant.
Common Lisp (hereafter Lisp for concision) is differently dynamic, not strictly more dynamic.
One quick example: standard-class objects in Lisp are mostly more dynamic than python objects[1] but structure-class objects are quite static, and compilers will use this to their advantage.
On top of this, functions in python are in fact objects; in python you can do this just fine:
There are obviously other ways to accomplish this in lisp, but the fact that someone might do something like this can inhibit optimizations.def foo(): foo.bar = 3 def bar(): print(foo.bar)If you read the Lisp specification you will see dozens of places where they allow for opting out of dynamism in ways that help the compiler optimize. Type declarations are one and the fact that a function defined in a single file is considered to have a fixed definition for the scope of the file is another.
1: It's not quite as easy to add new slots to Lisp standard-class instances on the fly as it is to python class instances, but it's doable, and Lisp, of course, supports class redefinition in a way that few other languages (including Python) do.
PyPy has a JIT because using Python as the implementation language made it simple to build a runtime generator instead of a single runtime implementation, so RPython (the underlying PyPy tech) can generate a JIT and non-JIT runtime from the same source code. Conversely, the CPython team doesn't have the resources to make a JIT runtime the bespoke way that's more typical for these projects.
https://morepypy.blogspot.com/2011/04/tutorial-part-2-adding...
CPython isn't a JIT because its goals as a portable reference implementation would be much harder to achieve with a JIT.
To put it another way, there's a reason PyPy trails CPython language features on the scale of literal years. This is not explained by the affordances of the language they're written in! They're just projects with very different requirements which, on some platforms, happen to run a lot of similar-looking code.
> there's a reason PyPy trails CPython language features on the scale of literal years.
the reason is under-funding, and the lack of developers to the cause as a consequence, not the portability complexities of jit.
You raise a good point. I've always been fascinating by PyPy's performance, personally -- anecdotally, I've achieved ~10x speedups from just running a script with `pypy` instead of `python`. I always attributed that to better performance of the JIT, but I could be wrong.
I have nothing against Rust personally, but it's ultimately not an apples-to-apples comparison if they're not implementing the same algorithm, or even if they're not using the same mechanisms (e.g. Rust explicitly using SSE intrinsics, which are certainly available to C in just as idiomatic a fashion).
comparing apples to apples is pointless, they are both apples.
and you can always compare “equivalent” algorithms as some languages may not be able to efficiently express the same algorithm as another. i know what you are after, but trying to have some benchmark that is “fair” according to some spec that is important to your needs will just be seen as pointless to others. benchmark game at least lets us see what the performance ceiling is of various languages, and anytime you object to a language having a worse implementation you may go fix it.
i think what anyone with a clue understands is that C C++ and Rust all have roughly equivalent performance ceilings. Nobody really thinks Rust is faster than C now.
> Nobody really thinks Rust is faster than C now.
The submission title would beg to differ.
(I know, it explicitly calls out "benchmarks" as the context.)
I think languages like Rust or Swift have significant advantages around safety over C/C++, while not sacrificing much in terms of performance. But if one language's benchmark contributors are willing to put in more effort than another's to eke out additional performance, then you're going to see skewed results in favor of whichever has the more fervent evangelists or whichever language has more to prove.
If the goal is to compare performance of two languages which can express the same optimization in exactly the same way, and only one uses it, then the benchmarks fail in that respect.
go fix the c implementation then.
no single person can maintain optimally implemented solutions over dozens of languages. its up to we the people to help out. you want perfectly equivalent comparisons, make it happen
I don't want perfectly equivalent comparisons. I want people to stop citing this set of imperfect ones as if they mean anything.
Yeah. At a fundamental level these programs are running on the same hardware. Unless you’ve an ultra simplistic language (basic or logo), one could implement identical algorithms and methodologies and techniques and get the same performance.
I admit I'm destroying the metaphor here but there are a lot of useful comparisons to be made between different cultivars of apple and those comparisons have implications on their applications.
That's the whole point of the metaphor - comparing apples to apples is good, apples to oranges is bad.
I'm comparing apples to oranges every time I'm trying to decide which fruit to buy at the supermarket though.
and its a bad metaphor. there are many reasonable comparisons to be made between any two things even when both are not fruits
Technically speaking, there are limits and you cannot compare incomparable. You can find distinctions between two phenomena, a.k.a. inherent traits that make them different-by-definition from one another, but you cannot compare (liken, collate, contrast) them in any meaningful way. Like, what's the outcome of comparing "soft" with "warm"? And hence, comparing apples to apples are not pointless, quite contrary - they are comparable items and it makes total sense to compare them in the context of superiority by desirable value axes.
> Like, what's the outcome of comparing "soft" with "warm"?
You can have "soft" or you can have "warm" — which do you want?
You can have an apple or you can have an orange — which do you want?
> That kind of tit-for-tat in benchmarks seems like it's counter to the goal of benchmarks
Yep. There are good reasons why this set of benchmarks is called The Computer Language Benchmarks Game.
> I like PyPy as an example: on the surface, implementing a Python runtime in Python and expecting performance gains seems crazy.
Not if it includes a JIT compiler, as PyPy does. I don't know much about PyPy, but if most of the time is spent in JITted machine code, the fact that it's written in Python may not affect performance much.
Kind of, PyPy is a metacircular JIT compiler, it uses RPython a Python subset.
realistically the increased expressiveness of Python lets the PyPy devs opt into optimizations the CPython devs find out of reach
Not really: It's the compilation strategy (that whole meta-tracing JIT compiler thing, compared to a simple bytecode interpreter) that makes the difference, not the surface syntax of the implementation language - which is actually not 'real' Python, but a restricted, less dynamic subset known as RPython.
Also note that the CPython is deliberately kept 'dumb'.
There was a long period where a fairly unknown theorem proving language ATS (1) was beating C in many test cases on the benchmark game (2) the benchmarks were removed though (3). I expect many languages could be made to win with sufficient effort.
1. http://www.ats-lang.org/ 2. http://web.archive.org/web/20121218042116/http://shootout.al... 3. https://stackoverflow.com/questions/26958969/why-was-the-ats...
ATS is not really a theorem proving language, it's almost a superset of C with a very long list of type system and language features that lets you write anything from C code with no safety to very high level recursive functional code with lifetime tracking which will translate to efficient C code, if the transformations are proved to be correct. It's a weird beast, but I'm not surprised it outperformed some C implementations, because it is basically C with a lot more features.
> ATS is not really a theorem proving language
What is "real theorem-proving language" in this context? ATS has ATS/LF that is designed for writing formal proofs in the language.
http://ats-lang.sourceforge.net/DOCUMENT/INT2PROGINATS/HTML/...
> I wouldn't be surprised if there's something similar happening here.
In this case it seems like benchmark code is allowed to use intrinsics, which can degenerate into a situation where a benchmark in language X is more "glorified x86 Assembly code" than actual code in language X.
This is not very useful for comparing languages IMO. Especially since all of Rust, C, C++ can use this strategy and become almost identical in both code and performance.
So three languages with identical performance ceilings end up with essentially identical performance results. sounds like a job well done.
if you want to know something more than that, like how performance tends to end up after inputs of identical effort and talent you will need to recruit a lot of people and money to run really large scale experiments and when you are done people on HN will just endlessly find nits to pick with your results whenever they don’t agree with their preconceived notions
It is always like that. There is no way that X is faster than C. It most definitely has to do with algorithm. What I can believe is Fortran being faster than C in some circumstances, but that is about it. Can you guys please stop this nonsense? No, your equivalent Rust code is not faster, sorry.
The Haskell code on some pathological examples of these implementations that I have seen has so many unsafe construct, strictness annotations, inlining annotations and so forth that it's practically C in a different syntax.
It is not idiomatic Haskell at all and loses all of the touted benefits.
Is there a separate benchmark that only accepts idiomatic code?
Is there a GHC compiler switch that ensures only "idiomatic code" is compiled?
One can, with sufficient effort, essentially write C code in Haskell using various unsafe primitives. We would argue that this is not true to the spirit and goals of Haskell, and we have attempted in this paper to remain within the space of “reasonably idiomatic” Haskell. However, we have made abundant use of strictness annotations, explicit strictness, and unboxed vectors. We have, more controversially perhaps, used unsafe array subscripting in places. Are our choices reasonable?
http://www.cs.ru.nl/P.Achten/IFL2013/symposium_proceedings_I...
It shows the performance of that variant of Haskell.
But when using that variant all the correctness guarantees that Haskell is known for are no longer on the table.
Just like C is pimped up BCPL, a language designed to bootstrap the CPL compiler, 10 years younger than ESPOL/NEWP, 10 years younger than Jovial, 6 years younger than PL/I, among other possible examples from the 60's.
Also C was known for not being a fast language on 80's home hardware, ruled by Assembly.
So low level coding for OS isn't coding like C and is just the way for mechanical sympathy, regardless of the language.
It's exactly the same thing. I've been seeing this tossed around for years now, and one of these days it'll make me grumpy enough to fix the benchmark.
Yes, it seems everyone is always trying to beat C, but really, no one can.
Well, junior Assembly developers had it pretty easy on 80's hardware.
Also C++ and Fortran don't have to spend much effort to achieve that.
To be fair, if there's a language that has a fair chance it's Rust.
Looked at the git repo https://salsa.debian.org/benchmarksgame-team/benchmarksgame and it seems there's only one committer. So if all scripts are written by the same guy, then his biases are all over the place, and it needs more code diversity before we can arrive to any conclusion.
Am I missing something?, is there a list somewhere of the people who have contributed to this?
Click on any of the benchmark programs and it will have the contributors' names at the beginning.
Thanks!
> The Rust code has CPU feature detection and SSE intrinsics, while the C code is more idiomatic.
These benchmarks are almost always either useless or a scam - you either end up writing rewriting the same implementation n times or you don't utilize the capabilities of the language, either way you're not really measuring much of anything intrinsic to the language itself - Rust and C both have the same backends and if you really care about performance you're going to take it to the max anyway, so inference by the compiler isn't that important.
Nothing stops someone from copying and submitting other implementation's algorithm. There are multiple implementations of each benchmark for every language:
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
• https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It's possible that someone has already submitted both algorithms for both languages, and different approaches won for language-specific reasons.
These are all the C versions I can find:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
None of them have SSE intrinsics or are quite as long as the Rust version.
I find it doubtful that SSE intrinsics wouldn't help the C version, if they are indeed helping the Rust version. This seems fairly easy to check as the Rust version has a non-SSE fallback code path - I'd do it myself but am not able to at the moment.
> Nothing stops someone
Well, it looks like the submission process[0] and the maintainer[1] do, actually.
[0]: https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov... [1]: https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov...
> "…but they don't accept Gmail addresses."
"Registration with @gmail.com addresses is blocked due to abuse, please register with another address. You can change it later, once signed up and verified."
However, apparently, you can register using a Google, Bitbucket or GitLab.com sign-in.
When someone on proggit writes — "Yeah it's really hard to submit code. Not only do you have to sign up for their site but they don't accept Gmail addresses." — they may just be repeating something they read on proggit.
It's only 6 weeks since that Rust revcomp #1 program was contributed — November 23, 2020.
I'm the author of that C program. You are correct that the programs are using somewhat different algorithms (and as an author for many other submissions on the site, I can say this is true for many of the benchmarks on the site).
In addition to what you've already mentioned, if I'm not mistaken, the programs also take different multi-threading approaches. One of the tasks in this benchmark involves reversing three very large strings. My C program took a fairly simple approach and does that reversal process for each string serially but processes multiple strings in parallel. The Rust program on the other hand does the opposite and does the reversal process for each string in a parallel manner but only processes strings one at a time. The algorithm the Rust program uses is more complicated which results in a considerably larger program size that also is a bit less flexible regarding the input (it assumes each line of input is 60 characters whereas my program will work with lines of any length) but it results in better CPU utilization, less clock time, and less memory use. I've been meaning to submit a faster C program using this faster algorithm but have simply been too busy with other things.
I have seen this happening so often: C/C++/Rust often end up using CPU-specific features, and the code starts looking more and more like assembly code, and less like idiomatic high-level language code. Basically, comparisons of programs written in all the other languages against these three become meaningless. And in turn, hurts benchmarksgame as a resource for comparing languages.
If I had to write a performant library at work, I too might rely on CPU-specific assembly wrappers in my code. But IMO, such code has no place in a general-purpose cross-language benchmark site.
> If I had to write a performant library at work, I too might rely on CPU-specific assembly wrappers in my code. But IMO, such code has no place in a general-purpose cross-language benchmark site.
My guess is that other people would take your "performant library at work" premise as justification for including that code.
What does "idiomatic" C even mean? It's a high level assembler and as such should not limit the creativity of programmers using it.
C code that pretends it does not need to care about it's platform is not idiomatic, it's just suboptimal.
By "idiomatic C" I meant any of the following:
- Code that most C books/courses would teach you how to write
- Portable C code (arguably portability is one of C's biggest successes!)
- Code that you'd expect to find in the K&R book
One of C's biggest marketing successes.
All high level languages are portable, including some older than C.
It helps when one drags the OS the language was born on, along via parallel standards like POSIX.
> All high level languages are portable, including some older than C.
Theoretically portable, maybe. But C is actually portable in practice - you can compile C code for the vast majority of CPUs and OSs.
Devil’s advocate: compared to assembly, C is portable, but compared to a lot of high-level languages, C isn’t very portable.
Every “implementation defined” feature (https://en.wikipedia.org/wiki/Unspecified_behavior#Implement...) creates two different languages (examples are sizes of basic numeric types, signedness of the char type, and evaluation order of arguments)
Also, the standard library is so limited that, historically, you needed lots of stuff that wasn’t standardized between systems (https://en.wikipedia.org/wiki/GNU_Autotools: “It can be difficult to make a software program portable: the C compiler differs from system to system; certain library functions are missing on some systems; header files may have different names.”)
But yes, bringing up C on a system is a lot easier than doing the same for, to pick another extreme, Ada, but that’s because only half the job is done. Autotools does a bit of the rest, but still requires help from program writers.
> compared to a lot of high-level languages, C isn’t very portable.
Right, with C, you may have to fix a few bugs to get a program working properly in a new system due to the reasons you mention.
But with many other languages, you'd have to port a compiler and the standard library first, which is probably harder :)
Depends on the language, luckily other languages are equality blessed in portability as other people have spent similar resources making their toolchains available elsewhere.
Yeah, but isn't due to the language, rather to the amount of people that have spent money and resources to implement compilers for all those CPUs and OSs.
Original C was created after UNIX already existed, used for UNIX's first rewrite and until UNIX source code left Bell Labs, it only worked on PDP-11 computers.
That's not the history as I'm familiar with it. What I read - and remember, which may be faulty - is that the first attempt to write Unix failed because what they wanted to achieve was too complex to implement easily. Adding structs to C allowed them to overcome this bottleneck and led to the first production versions of C.
Here is a refresher, with some key quotes.
http://cm.bell-labs.co/who/dmr/chist.html
> Thompson was faced with a hardware environment cramped and spartan even for the time: the DEC PDP-7 on which he started in 1968 was a machine with 8K 18-bit words of memory and no software useful to him. While wanting to use a higher-level language, he wrote the original Unix system in PDP-7 assembler. At the start, he did not even program on the PDP-7 itself, but instead used a set of macros for the GEMAP assembler on a GE-635 machine. A postprocessor generated a paper tape readable by the PDP-7.
> Although we entertained occasional thoughts about implementing one of the major languages of the time like Fortran, PL/I, or Algol 68, such a project seemed hopelessly large for our resources: much simpler and smaller tools were called for. All these languages influenced our work, but it was more fun to do things on our own.
> By 1970, the Unix project had shown enough promise that we were able to acquire the new DEC PDP-11. The processor was among the first of its line delivered by DEC, and three months passed before its disk arrived. Making B programs run on it using the threaded technique required only writing the code fragments for the operators, and a simple assembler which I coded in B; soon, dc became the first interesting program to be tested, before any operating system, on our PDP-11. Almost as rapidly, still waiting for the disk, Thompson recoded the Unix kernel and some basic commands in PDP-11 assembly language. Of the 24K bytes of memory on the machine, the earliest PDP-11 Unix system used 12K bytes for the operating system, a tiny space for user programs, and the remainder as a RAM disk. This version was only for testing, not for real work; the machine marked time by enumerating closed knight's tours on chess boards of various sizes. Once its disk appeared, we quickly migrated to it after transliterating assembly-language commands to the PDP-11 dialect, and porting those already in B.
> By 1971, our miniature computer center was beginning to have users. We all wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs. Among the more notable results of this period was Steve Johnson's first version of the yacc parser-generator [Johnson 79a].
> By early 1973, the essentials of modern C were complete. The language and compiler were strong enough to permit us to rewrite the Unix kernel for the PDP-11 in C during the summer of that year. (Thompson had made a brief attempt to produce a system coded in an early version of C—before structures—in 1972, but gave up the effort.)
What failed was the initial port attempt to C, not UNIX running on PDP-7.
While I am quite critic of C, I take the effort to learn and criticise properly, because regardless of my opinion it isn't going away on my lifetime and bills need to be paid.
> " (Thompson had made a brief attempt to produce a system coded in an early version of C—before structures—in 1972, but gave up the effort.)"
Yes, but that is the key bit right, the fact that C didn't have structures yet is what caused the effort to be aborted, and once those were added they succeeded.
Except my point was
> Original C was created after UNIX already existed, used for UNIX's first rewrite and until UNIX source code left Bell Labs, it only worked on PDP-11 computers.
Regardless of failed porting effort, the original UNIX, implemented in Assembly and running on the PDP-7, was already being used by a small community.
"Portable" - yes. But will likely require the effort of porting due to platform specific sections.
Practically all non-trivial C codebases have platform specific segments. They can be either isolated to their own libraries or wrapped around tons of preprocessor switches.
C code is portable in the sense that it's not too much work to get a C compiler to work on a new hardware platform. On the other hand, compiling a large real-world C codebase on the said platform will probably require porting, and the effort may or may not be trivial.
cpu feature detection is way easier in rust. its just built into the core libs.
Doesn’t gcc have a CPU feature branching system? You use it by attaching attributes to functions that say what instructions are required. Granted, it’s not as “elegant” as Rust, but it does exist.
And?
These are the kind of things that make working in one language different from another.
I'd actually like to see the SSE version in C so I can compare the two implementations and how much grief you have to go through.
The fastest n-body program is written in very idiomatic rust. https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
In https://old.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov... /u/Saefroch writes:
Some years ago I submitted an n-body implementation that used the crunchy crate to unroll the inner loop. This was rejected as being non-idiomatic and obscuring what compilers are/aren't capable of. Rust is currently leading the benchmark because someone added the flag -C llvm-args='-unroll-threshold=500' which achieves the same effect. Why one of these is acceptable and the other isn't is beyond me, and all of this makes the whole project very discouraging.
n-body in C compiled by clang runs just as fast as Rust apparently:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It's not entirely surprising that a carefully-optimized C program using explicit SSE intrinsics, plus a fancy trick involving a low-precision square root instruction fixed up with two iterations of Newton's method, would be fast. :-)
What impresses me is that the Rust version didn't do any of that stuff, just wrote very boring, straightforward code -- and got the same speed anyway. Some impressive compilation there!
Someone wrote a 6 part of blog post [0] about porting that nbody benchmark from C to Rust. They went from a straight line by line port using unsafe rust using the same SSE based design to clean no-unsafe and no SSE rust code that was faster then the original C code with hand optimized SSE.
It is a great example of how the Rust compiler can auto-vectorize code.
I would be shocked if those two programs performed the same. C and Rust are using the same compiler backend, better aliasing information probably isn't going to make that big of a difference.
Are you sure the rust performance data isn't for one of the other implementations that use the same crufty tricks as the C version?
e.g.: https://benchmarksgame-team.pages.debian.net/benchmarksgame/..., https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It's pretty startling, but yes, that non-crufty version is "Rust #8" that tops the n-body performance lists:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
It looks like the autovectorizer did a really good job on this one.
It also looks like this version is using an algorithm that none of the others use: it precomputes the distance pairs for all bodies. Some of the others precompute the vectors between the pairs, but that's not the expensive part of computing the distance.
As an aside: this made me notice this n-body simulation only has 5 bodies! This is a pretty strange case that makes these O(n^2) optimizations practical. The n-body simulations I've been familiar with in the past had thousands of bodies, where this approach probably isn't a good idea.
Good point! It would be interesting to find out where the Rust version gets most of its speed from.
The Rust compiler can auto-vectorize loop code.
This blog post shows how to write simple idiomatic Rust code that will allow the compiler to auto-vectorize:
But, C and C++ compilers also can autovectorize quite well. I had some SSE & AVX algorithms where the compiler ended up doing the same or lightly better job in C++ for instance.
So maybe it's just the C benchmark being a cargo cult.
Yes, it could be mostly LLVM doing the heavy lifting here, for all we know.
The code for the C-Clang version is terrifying, compared to the Rust version. Which one would you rather maintain?
Well, given the comments at the top:
// Contributed by Mark C. Lewis.
// Modified slightly by Chad Whipkey.
// Converted from Java to C++ and added SSE support by Branimir Maksimovic.
// Converted from C++ to C by Alexey Medvedchikov.
// Modified by Jeremy Zerfas.
It sounds like no-one bothered to actually write a from-scratch version of many of these things :)
From the page: "… a pretty solid study on the boredom of performance-oriented software engineers grouped by programming language." I find this both funny and consider it true to some degree. There's nothing like a good old friendly arms race for the benefit of all (languages and its users, in this case) involved!
I was wondering if perhaps this was actually measuring a difference between LLVM and GCC, but they also provide a set of benchmarks of C Clang vs C GCC (1) and Clang is generally slower in those test. Although there is some correlation between the ones Clang wins in C And Rust.
1. https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Rust can be faster than C because in general C compilers have to assume that pointers to memory locations can overlap (unless you mark them __restrict). Rust forbids aliasing pointers. This opens up a whole world of optimizations in the Rust compiler. Broadly speaking this is why Rust can genuinely be faster than C. Same is true in FORTRAN, for what it's worth.
Why does the Rust compiler not optimize code assuming that two mutable references cannot alias? —https://stackoverflow.com/q/57259126/155423
Follow along here for re-enabling: https://github.com/rust-lang/rust/issues/54878
For what it's worth I did say "can be" not "is" because I wasn't sure of the current state of this feature. I was just passing on the theory.
There are certainly other potential reasons, for instance constant expressions and generics. And, of course, the prohibited undefined behavior makes other optimizations possible, and potentially better.
> C compilers have to assume that pointers to memory locations can overlap, unless you mark them __restrict...
What I don't fully understand is: "GCC has the option -fstrict-aliasing which enables aliasing optimizations globally and expects you to ensure that nothing gets illegally aliased. This optimization is enabled for -O2 and -O3 I believe." (source: https://stackoverflow.com/a/7298596)
Doesn't this mean that C++ programs compiled in release mode behave as if all pointers are marked with __restrict?
restrict and strict aliasing have to do with the same general concept, but aren't the same. They both have to do with allowing the compiler to optimize around assuming that writes to one pointer won't be visible while reading from another. As a concrete example, can the following branches be merged?
Enabling strict aliasing is effectively an assertion that pointers of incompatible types will never point to the same data, so a write to y will never touch *x. restrict is an assertion to the compiler on that specific pointer that no other pointer aliases to it.void foo(/*restrict*/ bool* x, int* y) { if (*x) { printf("foo\n"); *y = 0; } if (*x) { printf("bar\n"); } }OK thanks, indeed Clang is able to generate better assembly using __restrict__. And -O3 generates the same assembly as -O3 -fstrict-aliasing (which is not as good as __restrict__).
I wish there was a C/C++ compiler flag for treating all pointers as __restrict__. However I guess that C/C++ standard libraries wouldn't work with this compiler option (and therefore this compiler option wouldn't be useful in practice).
What's interesting to note though is that I tried marking pointers __restrict__ in the performance critical sections in 2 of my C++ projects and the assembly generated by Clang was identical in all cases!
So while it is true that by default Rust has a theoretical performance advantage (compared to C/C++) because it forbids aliasing pointers I wonder (doubt) whether this will cause Rust binaries to generally run faster than C/C++ binaries.
On the other hand Rust puts security first, so there are lots of array bounds checks in Rust programs (and not all of these bounds checks can be eliminated). Personally I think this feature deteriorates the performance of Rust programs much more than you gain by forbidding aliasing pointers.
> so there are lots of array bounds checks in Rust programs
Depends on how those programs were written. Iterators should avoid bounds checking for example.
Likely most C code, particularly data structure code, would break if compiled with a setting that treats all pointers as restrict.
I think C and C++ have enough problems with accidental undefined behavior already without making aliased pointers into UB.
Not for char. Compiler always assumes non-restrict for char pointers and arrays, which is important to remember if you're ever operating on a RGB or YCbCr matrix or something.
Huh.
Does that also hold for a "uint8_t"--which is often just a renamed unsigned char rather than being a genuine type of its own?
not according to the standard, but in practice yes, currently. there are arguments that it should be considered not for optimization reasons, but there is likely too much existing code relying on it to change behavior. (see related llvm, gcc bugs)
Yes
Well you are saying that even in C you can use the restrict keyword to tell the compiler that 2 memory locations can overlap. Of course is in the hand of the programmer to tell the compiler to do so.
I don't think there is a fair comparison between Rust and C: C is just an higher level assembler, if the programmer knows what he's doing he can use the hardware 100% of its potential. That is the reason why C is still used in all the embedded applications where you have ridiculous low power microcontrollers and you must squeeze out the best performance.
That is the difference between C and Rust to me: for each fast Rust program you are guaranteed that you can write an equivalent performant program in C (or assembly). Worst case scenario you use inline assembly in C and you get that.
Thus the contrary cannot be true for Rust: if I give you a heavily optimized C program not always you can produce an equivalent version in Rust.
Also not always these optimizations are what you want. In C you can choose the level of optimizations, and most of the time, at least on the program that I write, I choose a low level of optimization. The reason is that a lot of time performance is not the only thing that matters, but it maybe matters most the stability of the code (and a code compiler with optimizations is more likely to contain bugs) or the ability to debug (and thus the readability of the assembly output of the compiler).
Rust gives out an horrible assembly code, that is impossible to debug, or to check for correctness. You just have to hope that the compiler doesn't contains bugs. For the same reason Rust is the ideal language to write viruses, since it's difficult to reverse engineer.
> Well you are saying that even in C you can use the restrict keyword to tell the compiler that 2 memory locations can overlap. Of course is in the hand of the programmer to tell the compiler to do so.
It says that they can't overlap, but yes, you can get the compiler to optimize based on this if you provide the aliasing information and remember to keep it accurate. You probably won't do that, though, for anything except the most performance-critical of inner loops. A compiler that can infer more about aliasing can provide more optimization, safely, in the >99% of code that doesn't have explicit aliasing annotations, and that's probably worth some decent speedups in practice.
There are two main things you might be talking about when you call a programming language fast:
1. Given some really performance-critical code and enough time to hand-optimize it, how close can you get the compiler's output to the optimal machine code?
2. If you write code normally, not making any effort to micro-optimize, how fast will it usually be?
Both of these matter. #1 matters when you've got some bottlenecks that you really care about, and #2 matters when you've got a flatter profile -- and both situations are common.
Another illustrative example of the #2 type of performance with Rust is the default collections compared to the ones in C++. Rust's default HashMap is faster than C++'s std::unordered_map because of some ill-advised API constraints in the C++ STL. You can get similar performance by using a custom C++ hash table implementation, and in fact Rust's HashMap is a port of a C++ hash table that Google wrote for that purpose, but most people probably won't bother.
So, a semantic question: if you can get the same speed in one language as you can in another, but in practice usually don't, is one language faster than the other?
> but most people probably won't bother.
Most people who don't bother with that probably also don't bother with the performance of their hashmap.
People think C is an high level Assembler, that was only true on PDP-11, 8 and 16 bit CPUs.
Also in C you cannot retrofit restrict in existing programs, specially they depend on existing binary libraries.
> Thus the contrary cannot be true for Rust: if I give you a heavily optimized C program not always you can produce an equivalent version in Rust.
Do you have some evidence here? Or something more specific?
(The rest of your comment is opinions, which you can of course have, but does not match my personal experience, FWIW.)
There's a translator for C99 to (unsafe) Rust: https://github.com/immunant/c2rust
So probably you can give me any C program and I'll be able to give you an equivalent Rust program. It'll probably perform about the same, too.
> That is the difference between C and Rust to me: for each fast Rust program you are guaranteed that you can write an equivalent performant program in C (or assembly). Worst case scenario you use inline assembly in C and you get that.
Rust also allows inlining assembly.
> Thus the contrary cannot be true for Rust: if I give you a heavily optimized C program not always you can produce an equivalent version in Rust.
This is incorrect for the reason above. Worst case scenario you just inline assembly in Rust to reproduce exactly the same performance of any arbitrarily optimized C program.
I think you'd be hard pressed to find more than a handful of usual C programmers, even in embedded, who know what the __restrict keyword does, let alone are rigorous in its application.
I've always wondered if the reason why Rust keeps running into llvm bugs around restrict is that it is used so sparingly, and the semantics being unclear enough, that these codepaths just aren't exercised as often.
Besides the previous answer, it isn't supported by ISO C++ (although it does exist as extension), because the majority of the WG thinks it will go the same way as register and inline in the long run.
> For the same reason Rust is the ideal language to write viruses, since it's difficult to reverse engineer.
Eh, not really.
In theory one day Rust will but LLVM doesn't support anything beyond the function level for noalias since that's all that's needed to support __restrict in C. Even that isn't used by Rust most of the time since they have found several bugs in it.
The Rust vs C Clang comparison [1] has Rust winning on every benchmark except pidigits, and that's only by .01 seconds. Memory usage and binary size is competitive as well.
[1]: https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Which is not at all surprising. Rust has much larger compilation unit and knows more about what can read/write a particular piece of memory. This allows some occasions for optimization where C compiler must be conservative.
An example of simpler version of this is Fortran that can be faster for numerical loads due to the fact that Fortran disallows aliasing of function arguments. C on the other hand, must pay the price of having to be conservative with how it treats arguments just in case they overlap.
Wouldn't the C compiler be allowed to make similar assumptions with -flto or -fwhole-program?
Yeah I can understand why. Though I still prefer C in some ways simply because of its minimalism in the language, while still allowing for the kinds of things you want to be able to do if you wanna push your code to the limits. It is a bit scary sometimes writing in C though and sometimes I get a bit annoyed at how some things work or not work in it. I personally am hoping Zig can soon fill this minimalism trait in langs for me. C will probably always be the standard though and I do think more programmers should learn C better to become better programmers.
Rust is a good lang though. I am glad something else is pushing up there for the top spots. And more competition in performance is a good thing.
You don't always have to pick your sides. I just want to be able to write good code and be happy writing it :)
You'll be happy to know I implemented the n-body benchmarks game in zig 0.6.0 and it absolutely thrashed the rust version. Submitted it, but they don't take PLs not on the board currently.
But how much? Do you have an idea why?
Honestly I have no idea. You can try it yourself, or put it through godbolt:
https://gist.github.com/ityonemo/f34e0d3b4891f481863980a3142...
Note it could also be platform dependent. I'll give it a whirl on my own (new) machine.
Do you think that it may be caused by use of comptime? I have not looked at the benchmark implementations, but that kind of struck me as a reason
Well, I ran it again and now it's 10% slower than rust (the best rust impls have gotten better since I last tried); I think there's a simd optimization I'm missing, I'm not doing simd operations in the "pairs" phase
Just took a quick peek at the binary-trees C code. Why using openmp while the others don't? why use recursive functions? The C implementation is not correctly optimized. People just paid more attention to Rust and other languages. Rust can be as fast as C, but "faster" is really misleading. BTW, I use clang all the time since it's better than GCC.
It seems that someone simply tacked openmp on without checking if it actually increases performance in this case - which is unusual, because I remember a course in college, part of which was a project aiming to teach the student that openmp is not a silver bullet and should be used only when it actually helps.
I submitted the C binary-trees program. I did check to see if OpenMP increased the performance and it most certainly does. Just looking at the CPU and clock time usage on https://benchmarksgame-team.pages.debian.net/benchmarksgame/... you can see that the clock time used is about one third of the CPU time that was used.
How do you know that this is not one of those cases when openmp actually helps?
You're probably referring to the C program I submitted.
OpenMP is only available for C, C++, and Fortran so that is why most programs won't use it. However most of the other programming languages have their own ways for doing multi-threading and many of the programs for this benchmark do make use of them.
The rules at https://benchmarksgame-team.pages.debian.net/benchmarksgame/... request that submitters use the same algorithms as existing programs and I try to follow that rule. I believe all the binary-trees programs are using recursive functions so naturally I did the same to avoid breaking the rule about using different algorithms.
I'm having a hard time making sense of this page. Why is this comparing fastest implementation with slowest implementation? Why is the metric busy time/least busy? Why is C++ so much better than C?
Speaking generally and about no specific program, you should expect C++ to be faster than C. C++ has more ways for the programmer to communicate with the compiler.
At the same time, a lot of those mechanics involve additional overhead (e.g. vtable lookups for dynamic dispatch per inheritance).
But yes, some of these do provide hints for the compiler, e.g. constexpr, ownership semantics per unique_ptr. There's nothing stopping a human from writing equivalent C, so my suspicion is that the performance gap is primarily due to the benchmark implementation.
While nothing prevents you from writing C that will generate the equivalent code, it requires several times more lines of C than the equivalent C++. At which point, it becomes an economics discussion. C is usually a choice when pure performance is secondary to other considerations like portability.
Most C++ code I see has few vtables, and what vtables exist are mostly removed by the compiler. Java-style inheritance hierarchies are not idiomatic in C++. The code gen for C++ looks a lot like hyper-optimized C code, but without having to write hyper-optimized code. C++ naturally provides much more information about intent to the compiler than C does, and modern compilers are excellent at using that information to provide nearly optimal code gen.
> Most C++ code I see has few vtables, and what vtables exist are mostly removed by the compiler. Java-style inheritance hierarchies are not idiomatic in C++.
FWIW I feel like this is a fairly recent (good!) development. Up through TR/TR1 I think at least half the C++ code I saw was pretty heavy on the "Java envy" and not at all concerned about the cost of dynamic dispatch or memory allocation. Avoiding vtables was derided as part of "C with classes" - today that usually refers more to templates and exceptions but STL implementations were not mature enough and too much code not exception-safe for those to be universally viable back then. Something I heard more than once was that if you had a destructor it should always be virtual just in case someone wanted to subclass it later.
Quite natural given that Java was created to be attractive to 90's C++ developers, which carried their development practices into Java.
Just a small, correction, you mean 90's style C++ GUI frameworks inheritance are no longer idiomatic in modern C++.
Naturally we have to ignore that wxWidgets, Gtkmm, MFC, ATL, Qt, COM, DirectX, IO Kit, DriverKit, Skia are still around.
> you mean 90's style C++ GUI frameworks inheritance are no longer idiomatic in modern C++
I mean, as much of a Qt user I am, that's definitely the case. All the libs you listed are still around, but no one writes new libraries in that style (sometimes at a performance cost, e.g. using a ton of std:: function instead of virtuals)
Right,
https://microsoft.github.io/microsoft-ui-xaml/
https://developer.apple.com/documentation/driverkit
https://www.khronos.org/registry/SYCL/
https://developer.nvidia.com/optix
https://docs.unrealengine.com/en-US/ProgrammingAndScripting/...
I could keep adding a couple more, but I think you got the point by now.
Probably fairer to say that the virtual inheritance architecture is idiomatic to GUI libraries rather than any particular language? You even mention GTKmm, a C++ wrapper around a C library that leverages inheritance.
I can also pull iostreams, CORBA, COM/DCOM and more recently UWP cards if you like.
Being around long enough, the meme writing Java in C++ just rubs me the wrong way for something that was already common several years before Gosling even thought of Oak.
As if C++, alongside Smalltalk, and all those enterprise object modelling methodologies and books, weren't to blame for the widespread of that way of programming.
When Java came into the scene, those C++ and Smalltalk developers (and enterprise architects) migrated to it, and kept doing what were already considered "best practices" by then.
You've got it backwards. For the same amount of polymorphism in the design of a program C++ is likely to be faster because a C++ compiler can often devirtualize calls but a C compiler faced with a home-grown vtable (the struct of function pointers that every large C program eventually uses) will never be able to do so.
Why bother to write the vtable in the first place?
My experience is that C++ that's written for performance will often prefer the use of templates over inheritance since the cost is then paid upfront by the compiler. What's stopping a C programmer from hand-coding template instantiations (via macro or otherwise)?
Sure you are welcome to reimplement C++ in C macros. When your time is worthless anything is possible.
> When your time is worthless anything is possible.
I kinda want this on a mug or something.
An unrelated rant about benchmarksgame. Has anyone noticed that the Python implementation of regex beats a lot of the Rust and C implementations? That’s because it uses the PCRE2 library (written in C) which it assumes is installed on the OS. Benchmarks are always artificial but this seems like a step too far: the benchmark hardly says anything about Python and is dependent on the OS environment having the right dependences.
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
I wrote that program in the hope that it would better illustrate why some of the benchmarks on the site aren't very good since for some benchmarks the program performance is highly dependent on the libraries being used and not the programming language implementation itself. I know at least one person opened an issue regarding this on the site issue tracker at https://salsa.debian.org/benchmarksgame-team/benchmarksgame/... and I also believe I recall one of the Rust regex crate developers mentioning this too.
I know that using PCRE2 in a Python program isn't typical but the Benchmarks Game does allow using other libraries and many of the previously submitted programs have been doing this for a long time already. The pidigits benchmark is the other benchmark that is heavily dependent on the libraries being used as is illustrated by their being a nearly ten way tie for second place by a bunch of programs that all use GMP.
Sure, I understand your concern. However, this is also very representative of python. Python programs are very much about using libraries for the heavy lifting. Python has never been a "distributable single binary" language, and has always required certain OS dependencies, or installable "wheels" (binary dependencies)
Let's talk about speed when we are implementing the same algorithm and optimizations, please. If $1 was donated to cure cancer every time a developer games a comparison like this, there would be no more cancer.
I sort of doubt that, billions of dollars have been spent on cancer research.
Rust seems to be using parallelism better. In one benchmark though (fasta), C gcc is using all 4 cpus and Rust only two, and still wins.
(Looking at just C gcc vs Rust) https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
The C version uses OpenMP, while the Rust version doesn't.
I tried running the C code with 2 and 4 threads, wall-time and CPU time don't change much in either case which is strange (this is cygwin with gcc 10.2.0):
2 threads:
4 threads:$ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run && time ./fasta.gcc-2.gcc_run 25000000 | md5sum fd55b9e8011c781131046b6dd87511e1 *- real 0m0.724s user 0m1.468s sys 0m0.108s$ /usr/bin/gcc -pipe -Wall -O3 -fomit-frame-pointer -march=ivybridge -fopenmp fasta.c -o fasta.gcc-2.gcc_run && time ./fasta.gcc-2.gcc_run 25000000 | md5sum fd55b9e8011c781131046b6dd87511e1 *- real 0m0.670s user 0m1.514s sys 0m0.046s
Nodejs is incredible fast for a interpreted language. It is only ~4 times slower than Rust and only a bit slower than Go or Java (compiled GC languages) in the benchmarks. Compare that with Python 3, also interpreted but ~30 times slower than Rust.
I know that Python 3 can do some runtime stuff that Nodejs can't, but I wonder whether that's worth so much performance. Maybe if the answer is that you would include C modules in Python if you need the speed, but I don't know if that's a good answer to the problem.
Node uses V8, which does JIT compilation, while CPython is a straight bytecode interpreter. A better point of comparison would be PyPy
Are there any CPython vs PyPy benchmarks?
Okay according to their results: https://pybenchmarks.org/u64q/chartbox.php?s=eNptUtttAzEMW0l... PyPy seems to be twice as fast as CPython in these benchmarks. While this is indeed a noticeable achievement, it is still much slower than Node.
> I know that Python 3 can do some runtime stuff that Nodejs can't
Examples? This is too interesting to just throw in then hand wave away.
Honestly, I don't know. I think I read it somewhere, as explanation for why python is so slow. It made sense to me, that it must be language features. I hope there's someone on HN who can explain it.
One of Rust's performance advantages is the compiler's ability to unambiguously determine memory aliasing. Aliasing is why many numeric kernels are written in Fortran, a much older language that also enforces strict aliasing as it simply doesn't allow overlapping references.
There are probably others as well, but this is the advantage I'm familiar with. C's ambiguity makes it harder to achieve some optimizations that can really matter on modern CPUs.
LLVM's support for this is bugged, so rust does not currently take (edit: full) advantage of it.
Rust does not take full advantage of it; that is, &T will still get noalias, it's &mut T that's currently disabled.
The tracking bug is https://github.com/rust-lang/rust/issues/54878
Here is an example which, while trivial, demonstrates some of the optimization issues you can run into in C and C++ but not in Rust, even with noalias currently only applying to &T.
C++: https://godbolt.org/z/dTPsbh Rust: https://rust.godbolt.org/z/Mdx87h
In C and C++, compilers are allowed to infer "noalias" based on pointer types; two pointers of different type are not allowed to alias. This is known as type-based alias analysis. But char * is given special treatment, because of its use as a generic pointer type; so if one of your arguments is char * or even signed char * , that disables any optimizations which were relying on type-based alias analysis.
This provides both for performance and undefined-behavior footguns. If you ever try to use some pointer type other than char * or signed char * to refer to data of another type, you may inadvertently cause type-based alias analysis to kick in, causing invalid optimizations and miscompilation. On the other hand, if you have a function which takes both a char * and another pointer, the compiler may not apply optimizations that it otherwise could because char * is allowed to alias anything.
In Rust, there is no such undefined behavior footgun. Because of the LLVM bug, noalias isn't applied to &mut pointers so there are still some cases which could be better optimized, though it sounds like there is some progress being made on the LLVM front so it should be fixed at some point, and there are already places where the compiler can do better optimizations with better safety due to the stronger semantics of &T.
Ah, my mistake. Thank you for the correction.
It's all good; most people don't draw the distinction. I myself didn't know that &T was also noalias for years.
I really hope this gets fixed at some point, just because I'd like to see how much faster rust can get. In particular I wonder how much it would impact the compiler itself since it is also written in rust.
Does not fully take advantage of it.
Unfortunately that's only on paper due to LLVM issues.
Some of the rust versions calls C libraries for its heavy lifting (gmp, pcre) so I wouldn't take this too seriously.
As soon as the libraries are RiiR, then the Rust compiler can optimize across those library calls.
That's not good enough. Rust already has a pure-Rust regex library. (I'm its author.) It is the only non-PCRE regex engine to appear in the first 20 results of the regex-redux benchmark. (The 21st I believe is currently RE2.) When using the regex crate, you would not materially benefit from optimizations across library calls, nor is it the difference maker here. Highly optimized regex engines depend more on internal inlining. Take a look at the object code for a program compiled with PCRE2 or the regex crate. You'll find huge functions internal to the regex library where inlining has been forced to reduce overhead. Those things are never going to be inlined across library boundaries.
> Those things are never going to be inlined across library boundaries.
What prevents this?
I trust you on on this, but where is the remaining work? Language semantics, compiler, third choice?
It's prevented by good sense. The functions are likely multiple KB in size. Inlining them would seriously bloat the binary and would be unlikely to help due to how much work most regex engines do on each search.
The remaining work _on this particular benchmark_ is the regex algorithm itself. I'm on mobile so I can't do a deep dive, but I haven't yet figured out how to easily improve on this particular case. It has to do with the fact that the benchmark has a high match count and the finite automata approach in the regex crate has a bit higher overhead than the typical backtracking solution used in PCRE2 (which is also JIT'd in this case).
It's not the language, compiler, inlining or any other such thing. It's algorithms.
But this is one single benchmark. Before regex-redux there was regex-dna, and Rust's regex crate was #1 there. Why? Same reason. Algorithms.
You can't judge regex performance by a single benchmark. Two won't do it. Not even ten. It's one of the many problems with the Benchmark Game. This would be fine if everyone was circumspect and careful with their interpretation of the data provided, but they aren't. And the Benchmark Game doesn't really do much to alleviate this other than some perfunctory blurbs in some corners of the web site.
With that said, running a benchmark is hard work. It's easy to criticize.
> … doesn't really do much to alleviate this … It's easy to criticize.
Indeed.
Especially easy if it's just fault-finding without any suggestion as to what might be done to "alleviate this" :-)
You can do better than that, but I suppose I don't expect more than an insubstantial pithy quip from you.
Add analysis to each benchmark. Require submissions to come with analysis. Or more minimally, make the existing disclaimers on the web site more discoverable through one of any number of means, up to taste.
> Add analysis to each benchmark.
Perhaps you'd like to take on that task?
> Require submissions to come with analysis.
Which raises the barrier for program contributors and presumably would require me to judge whether their analysis was acceptable? Me? Really? ;-)
> disclaimers on the web site more discoverable
The problem is that no one wants to "discover" disclaimers.
We want to see something that supports whatever it is we already believe — and we're very good at ignoring anything else.
You asked me what could be done. I answered. I'm sure you have your reasons why you don't do more. But you asked.
> The problem is that no one wants to "discover" disclaimers.
Again repeating something I already said. It is indeed a problem. There are ways to address it. I personally think you do the absolute minimum.
> It is indeed a problem. There are ways to address it.
Please make a specific suggestion.
I did. You just didn't like their costs. Which is reasonable. They do have costs. I just happen to think they are worth it.
For analysis, it increases review time and increases the burden on contributors. The extent to which any one person can review all analyses isn't clear to me, and they would likely need to rely on contributors to self regulate them. But I would expect the maintainers to review them for some minimal level of quality. Being more specific than this would require a more thorough proposal. Given my time constraints and how much I dislike interacting with you, it's not something I can do right now.
As for increasing contributor barriers, yes, adding an analysis requirement would do that. However, I think you could compensate for that partially by reducing existing barriers. I've submitted code to your game before, and I'm unlikely to ever do it again. One reason is that interacting with you was unpleasant for me. But I found the interface difficult to navigate (that has since changed) and the rules difficult to discover. I think barriers could be reduced by switching to github (one of the reasons I use github) with more automated checks on submissions and better documented rules. (At the time when I worked on a submission it was unclear for example how to submit a Rust program that used external crates.)
Invariably though, providing an analysis is still more work. IMO, I think it would add rnough value to be worth it. I imagine many amalyses would build on top of others, similar to how code submissions do that today. As can be seen in any thread about the Benchmark Game, there are tons of comments that misunderstand things or make wild guesses. (Followed up invariably a day later with coy quips from you.) Some of those comments are from people that are unlikely to be helped by my analysis idea. But a lot aren't. I think a lot of people would benefit from an explanation of the performance characteristics of the program.
I generally try not to publish benchmarks without analysis. Here's one example.[1] I care a lot about it and I think it has immense benefits in terms of educating others. But reasonable people can disagree. You might have different values than me, which influences whether my suggestions are applicable to you or not.
As for the disclaimer, I don't really see the point of being more specific. There isn't much to this one? A simple suggestion might be to put it at the top of each page or even in the description of each benchmark. I don't know how much this would help things. Maybe people would still just ignore it.
People regularly misinterpret benchmarks and especially those in your Benchmark Game. From what I can tell, you as the maintainer have done very little to address that. Maybe you don't care. Maybe you think it isn't your problem to solve. Maybe you think it can't be solved or even improved, and therefore it isn't worth doing anything. I don't know. We might just disagree, which is fine.
Invariably, you can just respond and tell me to go do it. I think that's a cheap response, but I know you're fond of it. I've been thinking a lot about it and considering starting my own benchmark. But it will likely be a few years before I'll get to it. It is a trying daunting amount of work.
> I think a lot of people would benefit from an explanation of the performance characteristics of the program.
For sure!
And that's way beyond the modest aims of the benchmarks game.
> People regularly misinterpret benchmarks … From what I can tell, you as the maintainer have done very little to address that.
Seems to me that misinterpretation is not something that is effectively addressed by website content; it's something that is effectively addressed by responding to what people say when they discuss benchmarks in forums like HN and proggit and … one person at a time.
> … you can just respond and tell me to go do it. I think that's a cheap response…
It isn't intended as a brush-off.
otoh as you show on proggit, it's when you "Just sit down and actually try to plan out what you would do." that you start to understand what is involved.
otoh I'd like to see others do all that stuff the benchmarks game does not do, in whatever way they choose https://pybenchmarks.org/
> I did. … As for the disclaimer, I don't really see the point of being more specific.
I really was asking for a specific suggestion for "disclaimers on the web site more discoverable".
> A simple suggestion might be to put it at the top of each page or even in the description of each benchmark. I don't know how much this would help things. Maybe people would still just ignore it.
https://www.reddit.com/r/rust/comments/kpqmrh/rust_is_now_ov...
People see what they are looking for.
Looking at the progress of Rust/WinRT, and C++ renaissance thanks to GPGPU computing and mobile OSes stack, that is still a couple of decades away.
And then there are the whole LLVM and GCC based eco-systems.
When a Rust program is faster than the matching C program, it is utterly nonsensical to attribute its speed to C.
It is gratifying to see C++ identified, here, as the hands-down fastest implementation language, but odd to see Rust performance still compared, in the headline, to C, as if that were the goal. The headline should say that Rust speed is approaching C++ speed. In principle, Rust speed should someday exceed C++'s, in some cases, because it leaves behind some boat anchors C++ must retain for backward compatibility. In particular, if compiler optimizers could act on what they have been told about language semantics, they should be able to do optimizations they could not do on C++ code. As it is, Rust relies on optimizers coded for C and C++.
Rust may never reliably beat C++, because Rust has itself committed to details that interfere with optimization, principally in its standard library: The C++ Standard Library offers more knobs for tuning, while the Rust libraries are simpler to use.
Some of the reasons that C++ is so reliably faster than C are subtle and, to some, surprising. As noted elsewhere in this thread, the compiler knows more about what the language is doing, and can act on that knowledge, but C and C++ compilers share their optimizer, so that makes less difference than one might guess. Mainly, the C++ code does many things you might do in C, but in exactly one way that the optimizer can recognize easily.
The biggest reason why C++ is so much faster than C is that C++ can capture optimizations in libraries and reliably deliver those optimizations to library users. The result is that people who write C++ libraries pay a great deal of attention to performance because it pays, and that attention directly benefits all users of the library, including other libraries.
Because you can capture more semantics in C++ libraries, C++ programs naturally use better algorithms than C programs can. C programs much more frequently use pointer-chasing data structures because those are easier to put into a C library, or quicker to open-code, where the corresponding C++ program will use a library that has been hand-tuned for performance without giving up anything else.
Rust gets to claim many of the same benefits, because it is also more expressive than C, and Rust libraries are often as carefully tuned. Rust is not as expressive as C++, yet, and has many, many fewer people working to produce optimal libraries, but it is doing well with what it has.
Your claims contradict my experience.
One example: C re-write of a popular chess engine Stockfish (written in C++) is significantly faster (10%-30% depending on who measures it at which time and on what hardware). This is a piece of code which was already heavily optimized for many years as speed is critical for the engine's performance. One guy (although very talented one) re-wrote it in C and got the speed gains. Another guy (also very talented one) re-wrote it in assembly and got even bigger speed gains (see CFish and asmFish projects)
It looks to me that you are comparing badly written C to decently written C++ and claiming speed gains. I use C for my own software(solver for a popular game) and I wouldn't dream of using stuff like "pointer chasing structures". If you need to use stuff like hash tables in your performance critical code you have to code them yourself anyway as using some generic hash and memory handling algorithm is a recipe for being slow.
Typically any rewrite of a piece of code will result in speed gains because you can take all of the knowledge that resulted in the previous version and use that in writing the next one. To properly discount for this you'd have to do two rewrites: one in the original language and one in the new one.
Well, that would be true if it was the same team doing the re-write and/or the original code was set in stone. Instead it was one guy. Stockfish is still heavily worked on and the developers have access to CFish code. The reason they are not porting performance improvements is exactly that it wouldn't be idiomatic C++ (even though the whole project uses only minimal set of C++ features). One of his explicitly stated goals was to explore possible optimization issues of C versus C++ compilers. That is to see how much typical even light C++ abstractions cost in comparison to more close to the metal style.
This is implausible on its face.
If your new C program is faster than your old C++ program, then you may simply rename files from ".c" to ".cc". Then, you have two C++ programs, one faster than the other.
If your custom hash table really is faster than any library you find, congratulations! Recoding its interfaces, without giving up any performance, you can make it available for use in other C++ programs. As one may deduce, this has already happened, and has produced the better hash table libraries you appear not to know about. Available C++ hash table libraries have benefit of overwhelmingly more optimization attention than could be afforded on behalf of a single program, and they deliver that performance to all dependent programs.
Every programming project is an exercise in practical economics: your strictly-limited available attention goes where you choose. The greater productivity of coding in a more powerful language frees up attention that may then be allocated to areas that would otherwise suffer neglect. Whatever amount of attention you devote to making code in a poor language work at all, you may spend a fraction of coding in a better language, and the balance on other beneficial uses, such as better performance.
That the new program is faster than the old program reliably demonstrates that the old program was not so well optimized as you suggest. (Your comment elsewhere, that "the whole project uses only [a] minimal set of C++ features" reveals perhaps more than you intended.) One may surmise that the old program's authors spent more of their limited attention on its effectiveness at playing chess than the latter program's author needed to.
That the third, assembly-language program is faster still demonstrates that the previous programs left performance on the table. My experience is that it is not hard to double the speed of a typical program just by paying attention to cache and pipeline effects. Most likely, the asm coder just happens to know more about those effects, knowledge that could as well have been applied to the others. Most programs seem fast enough exactly until another, faster one comes along; then they are instantly slow.
>>If your new C program is faster than your old C++ program, then you may simply rename files from ".c" to ".cc". Then, you have two C++ programs, one faster than the other.
While C++ isn't an exact superset of C it's close enough but it's not the point. The debate is about how the languages are used. Otherwise you could always say "yeah, use inline assembly and don't use any abstractions with the exception of structs, pointers and arrays".
>>Available C++ hash table libraries have benefit of overwhelmingly more optimization attention than could be afforded on behalf of a single program, and they deliver that performance to all dependent programs.
They are also general purpose. When you have something specific to optimize you will always beat general solutions.
>>Every programming project is an exercise in practical economics: your strictly-limited available attention goes where you choose. The greater productivity of coding in a more powerful language frees up attention that may then be allocated to areas that would otherwise suffer neglect.
Sure. The debate is about what is faster though and then C++ is not faster than C but C is often faster than even slightly idiomatic C++ and much faster than C++ with all the modern features used frequently. You will get stuff done faster in a higher level language of course but that besides the point of debate which language is faster where performance matters.
>>Whatever amount of attention you devote to making code in a poor language work at all, you may spend a fraction of coding in a better language, and the balance on other beneficial uses, such as better performance.
I don't agree C is a poor language. It's quite a common view as well. A lot of people who are good at low level stuff prefer C to C++ because the language is simple, easy to read and easy to reason about. C is poor at some things, it's fantastic for others. I personally love the language and it's my choice for many weekend projects.
>>That the new program is faster than the old program reliably demonstrates that the old program was not so well optimized as you suggest. (Your comment elsewhere, that "the whole project uses only [a] minimal set of C++ features" reveals perhaps more than you intended.) One may surmise that the old program's authors spent more of their limited attention on its effectiveness at playing chess than the latter program's author needed to.
Stockfish is a big decade+ old and still heavily developed community project with tens of programmers contributing to it. A lot of attention was given to it and to optimizing specific parts of it. Still, it's written in C++ in usual (although still very minimal) style and there is cost to it.
>>That the third, assembly-language program is faster still demonstrates that the previous programs left performance on the table.
Well, my experience is that people good at assembly run circles around very good C/C++ programmers. I would go as far as to say that it's hard to take anyone who doesn't realize it seriously. There is just so many things you can't do in C/C++ even with today very smart compilers.
>>My experience is that it is not hard to double the speed of a typical program just by paying attention to cache and pipeline effects. Most likely, the asm coder just happens to know more about those effects, knowledge that could as well have been applied to the others. Most programs seem fast enough exactly until another, faster one comes along; then they are instantly slow.
You just don't get it. Stockfish is a program which depends on being fast. Making it fast is the number one priority. A lot of very smart people spent hundreds of hours optimizing small parts of it like the best way to generate chess moves in as few CPU cycles as possible, designing the hash table to minimize cache misses etc.
If you can make Stockfish 2x faster you would be considered the prophet of programming and your statues would be raised in cities around the world. Seriously, it's not some average random code which you can make faster applying concept you happen to see on front page of Hacker News. We are talking about the code where performance matters. All the people working on such code know a lot about all the concepts you mentioned because they apply them day to day.
Anyway, CFish, asmFish and Stockfish are all open source projects. You can find them on Github. You can ask the authors why they think their stuff is faster if you are curious about it. I mean maybe it's worthwhile to understand why someone chooses to do all the work to re-write a popular community project to C or asm. It's a lot of difficult work. They might know something you don't.
I have measured abstraction overhead in C++ for 20+ years. It was significant in the early 2000s. It is zero today.
That does not mean C++ programs are necessarily fast: it has always been easier to write slow programs, in any language. It doesn't mean C programs must be slow: making a fast C program just takes a great deal more work, and time, than a fast program in modern C++, so the extra time taken has been wasted.
You are always free to waste your own time. Wasting your employer's time is often less defensible. In the fintech world, suggesting C as a way to speed up a C++ program would get you laughed out of the room, at best.
The common experience noted in the original page is that C++ programs are faster than C programs. We know why.
That the chess program was easy to transcribe to C demonstrates that it was not good C++ code -- which you also revealed yourself. Making a faster C program while transcribing an almost-C C++ program is no big trick.
I routinely speed up programs by 2x by changing only a few lines. A 10-20% improvement in a whole rewrite is a great waste of time. Translating to asm, moreso. I would call a 10-20% improvement negligible, and crediting it to C delusional.
I agree, however from my experiences with lifetime checkers in VC++, it will still take a couple of more years to make it work properly and hardware memory tagging as being pushed by Oracle, Microsoft, Apple, Google and ARM is still a couple of years away to be deployed everywhere.
The only conclusion I have got about this web site is how much some programmers like to write benchmark code in Rust.
Interestingly, C++ seems to be the fastest overall.
After the rust blossom storm I didn't track cpp implementation evolutions.. did cpp compiler perf/libs increased or was is simply faster and still is the same ?
Was faster and is the same, checking isn't free. Not that Rust is slow. C++ is unsafe by default whereas in Rust it is opt-in (lexically scoped). The Rust implementations don't appear to use `unsafe`.
Checking can often be free, because it’s done at compile time, rather than at runtime.
Shouldnt they be about the same? (At least until the LLVM immutability optimizations happen)
I suspect surprising factors are in play.
The ceilings should be similar, but there is an argument that there are cases where idiomatic Rust might outperform idiomatic C or vice versa.
For instance, in real-world usage, you might do a bit more redundant copying in C in places you would avoid it with the borrow checker in Rust.
Conversely, enforcing RAII in Rust might cost performance in some scenarios relative to C.
I'd look at the algorithms here. Looking a bit fishy with sometimes quite serious differences...
The reasons I use C versus comparable alternatives are not limited to speed. For example, the size of the compiler toolchain, the speed of compilation and the size of the resulting executables are all factors I have to consider. I do lots of work on systems with limited resources. How does Rust compare on those points versus, say, GCC.
I mean, you'll get drastically different numbers with all of those examples if you actually ask for options that produce small binaries. That the default flags don't try to make things small (across any of these toolchains) means this isn't really a fair comparison.
The smallest "hello world" binary rustc has ever produced was 137 bytes. https://github.com/tormol/tiny-rust-executable
If that's possible, why should not binaries be small (without too many downsides, eg execution speed) by default?
Because everything is a tradeoff. Getting the binary to be smaller means taking more compile time, because compilers have to do work to reduce the size. It is much faster to produce larger binaries.
Additionally, most compilers produce something that's useful for development and debugging by default, and that means including extra stuff for that purpose.
A big part of why binaries are "big" is dynamic linking and external symbols.
That Rust hello world is just hand-crafted to never be linkable to anything else at runtime and invoke a system call with a buffer. This isn't really how we'd like to do most things.
Interestingly enough, I would say that dynamic linking makes binaries smaller, that code no longer lives in the binary, but another place instead.
I'm pretty sure that for the classic "Hello World" example that's not the case, because we can simply use the linux write system call.
For example, an assembly program that just calls write and then exits is much smaller than even an unlinked version of it that uses printf, because the ELF binary doesn't need to contain information for the linker.
Yes, on the very very very low end, this is the case, but generally, in more real programs, it's not.
(The program I linked is an example of exactly what you're talking about)
Sure. But there's a lot of overhead that comes with it, too.
Just like C compilers, it is a matter of use case, and compiling for size has tradeoffs regarding execution speed.
It’s all depends on developer. I have talent to write really bad code in any language and make code as slow as I want. I can write code in ruby that would outperform code in C. The benchmarks like this don’t give you full picture of language possibilities. On top of everything we also need account for compilers and compiler optimization. Nothing against rust. I write ton of code using rust, and advocating at my work. But this looks like a PR move to me.
Is it just me or has that 'benchmarks game' site been growing less navigable over time? It use to be easy to compare benchmarks across several languages. If that capability still exists somewhere it's buried and I'm not interested in puzzling it out. There are no side bars or menus or anything helpful.
It's a bit hard to find, but the page comparing languages is found via the main page -> "Which are fastest?"
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
1) The parent post linked to charts which "compare benchmarks across several languages".
2) The home page (click the banner) has links which compare benchmarks across two language implementations.
3) Each of those comparison pages has links which compare all the programs, for all the language implementations, for each benchmark.
I am not sure I can buy such a comparison. Someone smarter than me already argued about test implementations. Someone else also put compilers and interpreters into prospective. Of course language expressiveness can gauge in but, IMHO, comparing the same sort algorithm or the same hash table implementation (or n-queens algo) could make much more sense especially with comparable compilers.
If Rust implementation is father than C's, kudos goes to the compiler, not to the language
The Rust language specifically gives more information and thus more optimization opportunities to the compiler. Plus, when comparing Rust code compiled with rustc and C code compiled with Clang, both are using LLVM as the compiler backend, so what primarily comes into play is how expressive each language is, and how idiomatic it is to write code that will be optimized by the compiler.
Both C and Rust are capable of just inlining optimal assembly, so comparing "pure speed potential" is pointless.
OK. So you say that implementations are the same within the respective language capabilities and that the Rust compiler frontend is inherently better than C's thanks to the language expressiveness? Still sounds weird to me, unless the test programs use very different approaches, like parallel programming...
Searched the page for "Rust" which returned nothing... and the box labels are awkward. Why not label them conventionally?
You can see the data in https://benchmarksgame-team.pages.debian.net/benchmarksgame/... but it has no graphic comparison.
> Why not label them conventionally?
Space.
Wondering around the site I found this particular benchmark interesting
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Why is julia using 250x the memory? (and it`s still fast)
It's including compilation time and memory too, which I would consider a major flaw in the benchmarks game.
If I wrap things in a function and run it a second time in the repl, this is what I get:
So, the second time there's no compilation overhead, just like in Rust, C, C++.$ JULIA_LLVM_ARGS="-unroll-threshold=500" ~/julia-b00e9f0bac/bin/julia -O3 --check-bounds=no julia> include("nbody.jl") run (generic function with 1 method) julia> @time run(50000000) -0.169075164 -0.169059907 2.044478 seconds (294.25 k allocations: 12.599 MiB, 8.89% compilation time) julia> @time run(50000000) -0.169075164 -0.169059907 1.867936 seconds (12 allocations: 1.750 KiB)Julia is not allowed to use its compiler (PackageCompiler.jl) but Rust and C are. It's really silly and imposed by the author of the benchmarks for no clear reason.
Was the Formula One pilot Schumacher as good as he is just because of his Ferrari? Obviously not, but he probably wouldn't have performed as well with a Ford Focus.
If you are willing to spend the time to perform at the highest level, Rust can bring you there.
It struck me a while ago that the most powerful feature of rust is the strong contracts libraries can and must express. This allows people with much deeper knowledge than me to make awesome stuff I can depend on.
There are 2 charts on the page, both using the same title: "How many times slower?"
I don't get those. What's the difference?
The different charts show different programming language implementations.
My experience using ripgrep and fd is that this is also true in real-world programs. :-)
... but not as fast as C++/g++ ?
Yet...
Define 'c'.
C++ still seems to be the fastest.
Apples and oranges. How about Rust vs C++?
Thanks, I'd rather not use it. My time is valuable as well.
Two charts with different languages. Unclear what is being measured. Is this a humor article?
This site has a page where they explain what they're doing - https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
Basically, it's toy programs written in each of these languages. They measure how fast each one is executed. This methodology does have it's limitations, which that page is upfront about.
How do I downvote a thread in HN, I can just upvote it?
Also is there also a way to filter out posts with the word "Rust" in it so I don't see them?
The only option is to Hide it.
Because Rust does alloca for all locals, and this if course faster. Everyone else avoids it for security reasons. Just search the Rust bugtracker for stack overflows.
You are confusing the llvm instruction “alloca” with the C feature “alloca.” llvm will use its alloca instruction for C and C++ local variables the same way as Rust does.
You are correct that the C feature is often banned. Rust doesn’t even support it at all.
No, you are still confusing it. llvm alloca reserves a stack allocation slot. The number of slots is unchecked that's why rust has so many stack overflows. Esp with varargs. Check your issue tracker or source code.
Rust doesn't support varargs. It also supports stack probes on x86, guard pages, all that stuff. It is runtime checked because it is not possible to compile-time check.
I know it's a meme, but it really does seem like most C or C++ code would be better off transitioning to Rust at some point. That includes the entire Linux kernel, web browsers, entire OS's...
Ah, rust. A language which combines the flexibility of assembly language with the power of assembly language.
I think benchmarking C vs C++ vs Rust must only really be useful for researchers. They’re all making a similar tradeoff for performance: forcing you to consider how you use memory. Does anyone work in a field where the performance difference between these specific three platforms matters? I’m genuinely curious. Edit: also, if you could explain briefly why and what makes particular choices out of the three unsuitable, that would be awesome too.
It matters for real-world software development, though the reason may not be intuitive. In theory, for any particular bit of software, you can write code in any of these three languages that has nearly identical performance. In practice, the complexity of expressing equivalent performance can vary considerably depending on what you are trying to do.
There are finite limits to the complexity cost developers are willing to pay for performance. Because the cost in each of these three languages is different to express some thing, sometimes there will be a threshold where in one or more of these languages most developers will choose a less optimal design. This manifests as practical performance differences in real software even though in theory they are equally expressive with enough effort.
This comes with another tradeoff. Efficient expressiveness relative to software performance comes at a cost of language complexity. C is a simple language that has enough efficient expressiveness for simple software architectures. C++ is at the extreme opposite; you can do mind-boggling magic with its metaprogramming facilities that can express almost anything optimally but god help you if you are trying to learn how to do this yourself. Rust sits in the middle; much more capable than C, not as expressive as C++.
This suggests the appropriate language is partly a function of the investment a developer is willing to make in learning a language and what they need to do with it. Today, I use modern C++ even though it is a very (unnecessarily) complex language and write very complex software. Once you pay the steep price of learning C++ well, you acutely feel the limitations of what other languages can't express easily when it comes to high performance software design.
I used to write a lot of C. It still has critical niches but not for high-performance code in 2021, giving up far too much expressiveness. C++17/20 is incredibly powerful but few developers really learn how to wield that power, though usage of it has been growing rapidly in industry as scale and efficiency have become more important. Rust is in many ways an heir apparent to C and/or Java, for different reasons. Or at least, that is how I loosely categorize them in my head, having a moderate amount of contact with all three. They all have use cases where you probably wouldn't want to use the others.
> In practice, the complexity of expressing equivalent performance can vary considerably depending on what you are trying to do.
Yep. Rust's safety means that in some cases, you can be more aggressive because you know the compiler has your back. And in other cases, it makes harder things tractable. The example of Stylo is instructive; Mozilla tried multiple times to pull the architecture off in C++, but couldn't manage to do it until Rust.
This is a fantastic comment and I think it gives me a much better understanding of the tradeoffs here. It seems like which is the best choice depends a lot on what you're trying to express and how well the tool fits the problem you're trying to solve, and a lot less on what the absolute capabilities of a given language are.
I do some graphics stuff. As soon as you get to "this chunk of code needs to be run for every pixel of every 4k frame at 60fps", suddenly the number of clock cycles and registers matters... Some of my platforms don't have GPU's, so it really is squeezing everything possible out of the language and compiler...
I do audio stuff and it is the same there. DSP is easy until it has to be in real time and there can only be minimal latency...
I'm curious, what platforms support 4K output but don't have any kind of GPU?
AWS... When you're cheap and don't want to pay for a GPU. Or a bunch of other cloud providers that don't support GPU's at all.
I think what's most interesting about your comment is the assumption that the three are in the same ballpark. You're not wrong, but it just reminds me of how far we've come. That is, the key is not "which of these three can eke out the last tiny ounce of things," but that Rust has successfully landed across that gap you see in the graph. That it's "(C/C++/Rust) vs everything else" is in of itself an interesting result. You can see some skepticism of the premise elsewhere in this thread, even.
>You can see some skepticism of the premise elsewhere in this thread, even.
Heh, well. The downvotes are clearly trying to tell me something, though I'm not sure what. I can guess. I assumed that this was something widely agreed upon at this point but clearly I assumed incorrectly.
I think ATS is in that category too but it was removed from the benchmarks game at some point. It's also vastly more esoteric and complicated than Rust is from my limited knowledge. Perhaps someday Zig will get there as well. I wouldn't have expected that this niche would ever see so much exploration, as C and C++ were the only game in town for so long.
> It's also vastly more esoteric and complicated than Rust is from my limited knowledge.
I doubt that ATS is more complicated than Rust at on-par feature comparison. It gets more complex only when you reach for a larger set of static constraints you can express in ATS, that are unavailable in Rust: things like dependent types and complete formal proofs that you don't need a separate language for. It feels esoteric only because those features in such low level languages are extremely rare. Also, consider the amount of effort you need to spend on learning a new toolchain (Rust), whereas ATS seems like a plugin/extra compilation step on top of your regular and familiar C toolchain.
My impression was that its facilities for type- and memory-safe pointer usage were fairly sophisticated and unlike anything in more popular languages, which is what I was going off of. I must admit I haven't written any ATS, just done a cursory skim of the documentation and watched a Strange Loop talk a few years back.
Yeah I mean, to be clear I believe you are right and am pretty surprised you were downvoted. It happens I guess.
Graphics software.
Yeah, databases
Could you elaborate a little? I’d be interested in this answer
All three languages are different enough that depending on the language you use it fundamentally alters your database architecture to better fit the mechanics of the language. Databases have a very high levels of internal complexity naturally, so there is a strong incentive to align the design with what the language can express without adding substantially more complexity.
I work on database engines and the impact of language choice on the design, architecture, and implementation of database engines is very evident. In the specific case of database engines, these differences in design can have a very large performance impact.
It's mostly an exercise that's useful for Rust: Internally, to prove that "it works"* and externally, to make a credible name for Rust.
(*) of course part of the output is also looking at the Rust code and evaluating style and `unsafe`-wise what the price for winning was.