C++ vs. OCaml: Ray tracer comparison (2005)
ffconsultancy.com

Kinda missing the point of C++ by using a linked list.
If you're going to spend the time to implement an algorithm in C++, at least take the time to make it cache-aware. Otherwise it's not really surprising that the performance is the same when you're doing the exact same thing.
Would be curious to see what the performance looks like with a tightly packed scene and static dispatch on a discrete number of primitives instead of using virtual functions.
I was curious whether using a vector instead of a list would have any effect even though the size is so small, and the answer seems to be no (as might be expected).
It seems like the listed OCaml program has a default level of 9, whereas the C++ one uses 6?
Times using original listing (minus print stmts):

    $ time ./ray-cpp
    real 0m3.369s  user 0m3.347s  sys 0m0.010s
    $ time ./ray-ml
    real 0m4.710s  user 0m4.703s  sys 0m0.003s

Time after changing the 9 in the OCaml program to a six:

    $ time ./ray-ml
    real 0m4.032s  user 0m4.026s  sys 0m0.003s

Seems like C++ has a good edge. The above was run on a 64-bit machine.

List is just one part of what makes the C++ impl sub-optimal. All the scenes are allocated on the heap, so you're actually bouncing around memory twice (once for the next linked-list entry, a second time for the actual heap allocation).
Lastly, the virtual call again has a chance to cache miss (although probably only once, seeing as we've just got a few impls).
In a small example like this there's a good chance that most of the heap allocations are close together but put this approach into production and you can very easily see fragmentation start driving up your cache misses.
(FWIW I'd also store radius*radius on creation, use a BSP/kd-tree and all sorts of other tricks that really help in these types of computations).
I did a quick and dirty test with levels = 14.
    # program            time   # change
    ./ray-ml:            21.46  # Original OCaml
    ./ray-cpp:           17.63  # Original C++
    ./ray-cpp-no-dtor:   13.86  # Removed all cleanup; OCaml doesn't do it
    ./ray-cpp-2:          8.90  # Changed list to vector
    ./ray-cpp-3:          7.89  # Removed virtual functions
    ./ray-cpp-4:          5.64  # Preallocated large array for all Groups

Solid stuff, glad to see someone testing my assumptions :).
If there's one thing I could teach new CS grads, it's how much your memory access patterns (and cache) really effing matter. Everything else is almost noise in comparison.
I'm pretty curious about this: have memory access patterns and cache awareness become more of a big deal in the past decade or so, or has it only recently reached my awareness? I studied CS in the early 2000s and while I was aware of various trade-offs between contiguous and linked data structures, it's only in the last few years that the importance of cache (and non-"system" languages' widespread flouting of its importance) has really bubbled up in my consciousness.
It's certainly likely that I just didn't take the right kinds of classes, or that I did but just missed or failed to really internalize this, but I'm curious if something else is going on. Has the bottleneck increasingly become memory rather than CPU? Have we more thoroughly exhausted other areas of optimization?
I was thinking about this while watching a talk[0] given by someone from Google about data structures in the STL, which decried that the standardized specification for `std::unordered_map` more or less forces an implementation to use buckets instead of the more cache-friendly technique of probing. I'm sure that the people who standardized it were very competent, which makes me think that, at least at the time, buckets were thought to be the better implementation.
[0]: https://www.youtube.com/watch?v=fHNmRkzxHWs
Edit: Added video link.
> Has the bottleneck increasingly become memory rather than CPU?
Yes.
Besides the basic memory latency vs. CPU clock speed issue, there's also:
- CPU caches have gotten bigger, so the payoff of making a program cache-friendly is larger
- CPUs have gained SIMD vector ALUs and increased superscalar behavior, so the CPU is capable of processing more data per clock cycle
- A cultural shift happened after the freewheeling Moore's Law era ended in the mid-2000s, and people started examining performance more closely
http://gec.di.uminho.pt/discip/minf/ac0102/1000gap_proc-mem_...
Thanks, this is what I had suspected, but it seems to be left unstated in everything I've read.
> Type inference removes the need to specify types explicitly over and over.
These days, with C++11's auto, this is improved a lot. It's not full type inference, but it eliminates most of the boilerplate, and arguably what's left is what you really want to keep for documentation purposes.
> The object-oriented representation of a trivial sum type as an inheritance hierarchy of classes is needlessly inefficient.
This use case strikes me as a particularly bad example of sum types: The author wants to represent different kinds of scene members, like spheres and groups, as a sum type. Using a sum type means that anyone wanting to add a new kind of object has to add it to the sum type and handle it everywhere, which will quickly become problematic. Using object-oriented design allows new object types to be introduced without updating everything.
Basically the sum type looks nicer here only because it's a toy example and we're not considering how it will evolve.
That said, I do think sum types are really useful in some use cases, especially in compilers, and I'm sad C++ doesn't have them. I tend to end up defining ASTs using Cap'n Proto as a convenient way to declare a tagged union, even though I have no intent to serialize the structures.
> No need to write constructors for tuples, records and variant types in OCaml.
To be fair, for an all-public struct (as in the example), you don't need constructors in C++ either.
Once you have private members, then of course you need a constructor.
> Native lists in OCaml.
C++11 initializer lists make it possible to write list/vector/array classes that basically "feel native".
> Higher-order functions instead of loops in OCaml.
C++11 lambdas, etc. However, in a lot of cases, loops are easier to read.
I've been really happy with C++ lambdas lately.
Would be nice to see Rust's Option<T> and Result<T,E> ported over to C++ since I find that error handling extremely elegant compared to most C++ approaches.
Already done (kinda). There is no match exhaustion, but C++ is getting an option type. (Part of the library fundamentals TS). http://en.cppreference.com/w/cpp/experimental/optional
Option has been around for a while as boost::optional or now std::experimental::optional. Result is coming and has a reference implementation as boost::expected. There's also a type-erased option type, "any".
They're quite nice, especially with a few trivial additions for ergonomics.
Yup, except you lost me as soon as you said "boost". I enjoy my fast compile speeds very much. :)
I work on a project that's a mix of F# and C++ and my code tends to break down very similarly to this.
The actual algorithmic lines are very similar in size and readability but when it comes to defining your data structures, F#/OCaml clearly wins.
The author of the website (Jon Harrop) is a very good writer, and though his books are expensive I'd recommend them to anyone learning OCaml or F#.
Specifically, F#/OCaml for Scientists is awesome (it comes in variants for both languages), as is F# for Numerics.
Yeah, C/C++ tends to have a memory-oriented way of defining data structures. Sometimes this is great, but most of the time there's a lot of mental translation and work.
The benefit of this comparison is not that the languages have nearly identical performance, but that OCaml can be briefer while accomplishing the same task, with 47% fewer lines of code.
Saying nothing about comparing LOC in an imperative/OO language to a functional one, does the brevity actually help a reader's understanding of the code at all? It seems to me that a lot of the comparisons call out descriptions of explicit actions in C++ where OCaml does the same action implicitly.
That seems like a language trade-off more than a feature.
Most people just consider the raw execution time to measure performance. In long-term applications that really matters.
In usual applications however most people don't realize that the development time should also be considered. In that sense a program written in a few lines of Python could actually outperform a super optimized C++ program because it usually requires more time and effort to code the same thing in C++.
From article:
> With two cross-overs, there is little difference between the performance of the OCaml and C++ implementations. Note that the C++ was compiled with CPU-specific optimizations whereas the OCaml binary will run on any 486 compatible.
Ok, so the author is suggesting numerical C++ compiler-generated SSE2+ code is comparable to OCaml targeting the 486, and thus with only the very slow stack-based x87 FPU available? SSE beats x87 hands down even without vectorization enabled, using a single float/double (scalar) per XMM vector register! That should make a significant difference in any ray tracer worth 2 cents.
I didn't bother reading the source code, but I think it's very likely he didn't write C++ code like you should for a performance sensitive numerical application.
Although maybe the code for scene graph, bounding box and hit test was bad? FPU performance starts to matter once you actually find the ray intersecting object...
Anyways, usually if you do use C++, you do so because of specific performance requirements. Not to have the shortest or cleanest possible implementation.
The author doesn't seem to know enough about what he's talking about. For example, he claims that the performance of passing a struct Vec { double x, y, z; } by value would be very poor, which is trivially not true, and he needlessly wrote constructors for this and similar POD structs, even explicitly claiming they are needed in C++ (and not in OCaml, implying the superiority of the latter on the conciseness criterion).
The compiler would have inferred the same constructor code?
You can omit the OO in C++ and achieve a similar code size. If that is a reasonable metric in the first place, is a question the article doesn't address.
I'm surprised the performance of the two implementations is so close. Aren't floating-point numbers in OCaml usually boxed? Is the compiler smart enough to optimize away the boxes in this situation?
The floats in the structs are unboxed. Float arrays and all-float structs have a special representation[1] in the compiler so they are stored unboxed. I believe the developers of OCaml have expressed some regrets about this, but in this particular case it's likely to be a win.
Within functions, the OCaml compiler can assign floats to registers, so those would also be unboxed at least for the duration of the function.
[1] https://realworldocaml.org/v1/en/html/memory-representation-...
One can do a ray tracer in O(1) at runtime in C++ ;)
https://github.com/phresnel/metatrace
Performance 100%!
OCaml is a really clean language though!
Amazingly, there is no easy way to define a type in C++ which can be "one of these or one of these", e.g. a sphere or a group in this case.
The author may have assumptions I'm missing here, but the standard way to have a "one of these or one of these" thing is to have two classes inherit from a given abstract base class and use a pointer to the base class.
If one wants the device specifically as a type, one could wrap the base-pointer in a class.
Is there something I'm missing?
https://programmers.stackexchange.com/questions/231092/is-pa... points out some differences.
Some of the things that make me miss OCaml variants/sum types when I have to use C++ subclassing/dynamic dispatch instead:

* very lightweight syntax both for creating them and matching on them (and lightweight representation; variants without arguments are just ints under the hood)

* compile-time checks on exhaustiveness in pattern matches (so no unexpected NULLs!)
Note that OCaml also allows open/polymorphic variants for the cases where you want to allow expanding the list of options – here the compiler will only check that e.g. a function that matches on only `A|`B won't be sent a `C, although it can be sent a member of a type that contains only `A's. This is typically used for public interfaces where you want to let your implementation expand later (e.g. your current function accepts `UTF8|`UTF16 but you might later decide to accept `WTF8 as well).
> The author may have assumptions I'm missing here but the standard way to have a "one of these or one of these" thing is to have two classes inherent from a given abstract base class and use a pointer to the base class.
I think this is what the author describes in the sentence that follows the one you quoted. He also calls it "a common (ab)use of object oriented programming."
So it's reasonable to define data types in this way in OCaml, but it's abusing OO to do so in C++? Given that classes are data types, that seems... less than consistent.
I would love to see this updated using C++14.
Totally. I've been mulling over the idea of porting it. Would be interesting to see how the performance compares vs the older style.
Between C++14 and some template usage, I'd imagine the LOC could be reduced a lot.
And now's a perfect time to reflect on how the two languages evolved during the past decade. C++ obviously moved forward quite a bit -- cleaning up the loop constructs in particular.
OCaml? Uh... there's F# for realists, Scheme for optimists, and Haskell for pedants. If you're actually writing OCaml web apps in OWebl I'd sure love to know why. Amazon released a native C++ SDK for AWS if you're into pain. If you're using "OPAM" to install the OCaml AWS client... wow.
Let's see: opam as you mention. Loads of new libraries[0]. Lots of small evolutions in the language in OCaml 4 [1] [2]. Lots of work on optimization in ocamlopt. Aarch64, POWER7 & POWER8, replacement ARMv7 backends.
OCaml is totally a practical language for writing real world applications. As I usually point out in these threads, Red Hat pay me to write OCaml programs[3] that are used by thousands of customers and many more free software users worldwide. The advantages (over other languages, not C++) are: easy linking to C libraries, building a native binary with no extra dependencies, and speed. Advantages over C++: safety, robustness, and compact source code size.
[0] https://opam.ocaml.org/packages/
[1] https://ocaml.org/meetings/ocaml/2014/OCaml2014-Leroy-slides...
Looks like solid work all around. But if we could weigh it relative to the advancements in C++ the last decade, it wouldn't even show up on the scale.
Red Hat could pay someone to do exactly what you're doing in many other languages with the exact same results. The only difference would be that the language in question would show up on the StackOverflow list of "most loved" or "most wanted" languages. Rust shows up in 50th place with less than 0.2% share and OCaml doesn't even make that list (TIOBE 9/2015).
Sounds argumentative, but is a genuine question: why have you chosen OCaml for this? What is so remarkable about it for this use case that 50+ more popular languages wouldn't do it?
Obviously there are downsides, so what is the upside?
Did you read my reply? To quote: "easy linking to C libraries, and builds a native binary with no extra dependencies, speed ... safety, robustness, compact source code size" There are certainly many more popular languages, but few with that combination of advantages. The project started in 2009, so you should discount any languages which were not mature before that date.
There are a few people at Red Hat and outside who contribute. To my knowledge none had prior experience with OCaml, but it's pretty easy to pick up.
Even in 2015, I can only think of D, Go and Haskell as viable alternatives.
You write OCaml for Red Hat programs that customers use? Or you write OCaml for Red Hat customers' programs? Either way, care to share any examples?
See rwmj's reference [3].
Scheme and OCaml don't have that much in common really (unless you mean derivatives like typed racket, but even then).
OWebl, in spite of its shiny website, is not what I would call "popular". You want to look at opium[1] or ocsigen/eliom [2].
Don't know what you have against OPAM, it's the best language-specific dependency manager I know.
1: https://github.com/rgrinberg/opium 2: http://ocsigen.org/eliom/
My problem with OPAM is that there's no Windows port, which makes developing OCaml programs on Windows difficult at best. In theory, we can compile all of the dependencies by hand, but in practice, the Oasis build scripts refuse to work on Windows for most packages that I've tried.
Personally, I would love to use OCaml for my company's projects, but the lack of true cross platform tooling makes it impossible at the moment.
That's a fair criticism. I think there is also an ongoing effort to make OPAM work on Windows, so this may get better in the future.
My only problem with OPAM from the precisely eight seconds I've spent looking at it: 1) unsigned packages 2) not named something goofy like "OPamL." Serious missed opportunity there, guys.
I think the daily quota of puns had been reached on that day. As for signing packages, I think it's being worked on, I remember reading something about it not that long ago.
Hum, OWebl doesn't exist. The author made some buzz because he's decent at marketing (and much better at it than the rest of the OCaml community, clearly), but it doesn't mean it's a thing.
There are other (good) web frameworks in OCaml (ocsigen, opium).
Yes, please open up the code, notice how the implementation of HTTP handling is approximately 30 lines long, and ask yourself if you would actually want to use that. ;)