Land ahoy: leaving the Sea of Nodes
v8.dev

As a former member of the V8 team I found this very interesting. (I was pre-Sea-of-Nodes and mostly not working on the compiler.)
Key quotes:
most instructions end up being on the effect or control chain, which imposes a strict order on the operations, and completely defeats the purpose of Sea of Nodes.
and
It’s hard to estimate the exact impact of this cache unfriendliness on memory. Still, now that we have our new CFG compiler, we can compare the number of cache misses between the two: Sea of Nodes suffers on average from about 3 times more L1 dcache misses compared to our new CFG IR, and up to 7 times more in some phases. We estimate that this costs up to 5% of compile time, although this number is a bit handwavy. Still, keep in mind that in a JIT compiler, compiling fast is essential.
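To make the first quote concrete, here is a minimal Python sketch (my own toy model, not V8 code) of a sea-of-nodes graph: pure nodes float freely, but every side-effecting node carries an effect edge to the previous effectful node, and walking that chain recovers a strict order for them.

```python
# Toy sketch (not V8's data structures): pure nodes have no effect edge
# and can be scheduled anywhere; effectful nodes are threaded on an
# effect chain, which forces a total order on them.

class Node:
    def __init__(self, op, inputs=(), effect=None):
        self.op = op            # e.g. "Const", "Add", "Load", "Store"
        self.inputs = inputs    # data (value) dependencies
        self.effect = effect    # previous node on the effect chain, or None

start = Node("Start")
a = Node("Const 1")                      # pure: floats freely
b = Node("Const 2")                      # pure: floats freely
add = Node("Add", (a, b))                # pure: scheduler may place it anywhere
ld = Node("Load x", effect=start)        # effectful: ordered after Start
st = Node("Store x", (add,), effect=ld)  # effectful: ordered after the Load

def effect_chain(node):
    """Walk the effect edges back to Start: this is the forced order."""
    chain = []
    while node is not None:
        chain.append(node.op)
        node = node.effect
    return chain[::-1]

print(effect_chain(st))  # ['Start', 'Load x', 'Store x']
```

If most operations (loads, stores, calls, anything observable) end up on this chain, the "sea" degenerates into a linked list plus a few floating arithmetic nodes, which is the article's point: the flexibility SoN pays for is barely used.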
I'd love to hear how the new optimizing compiler (Turboshaft) differs from the old one (Crankshaft). It's true that nobody managed to make Crankshaft work with try-catch, which was a huge gap in its coverage.
Thanks :)
Turbofan differed from Crankshaft in two key points IMO: 1) the IR changed to SoN instead of CFG (at least for part of the pipeline; the backend remained CFG), and 2) the whole pipeline was redesigned, in particular with high-level nodes (JS level), mid-level nodes ("simplified" nodes) and low-level nodes ("machine" nodes) (all of these are SoN), followed by the CFG nodes, which we just call "Instructions" and which are closer to actual machine operations. If I'm not mistaken, Crankshaft didn't have this deep(ish) pipeline with different IR levels.
With Turboshaft, we're keeping Turbofan's pipeline (while allowing a bit more flexibility in mixing high- and low-level nodes), and "just" using a CFG instead. So basically, the current pipeline is something like:
- in Turbofan (but we're currently moving it to Maglev): graph building, feedback-based optimizations, load elimination, escape analysis, typing, representation selection
- in Turboshaft: lots of lowerings, peephole optimizations, store-store elimination, allocation folding, load elimination again, some simple escape analysis, then instruction selection
- then we join back the old Turbofan pipeline and do register allocation and code generation on a CFG-based IR (and this was always like this in Turbofan).
The Turboshaft IR is very compact and is based on copying the whole graph during each phase (which can only be efficient if the IR is compact, and only makes sense if enough nodes are updated in the process, which is the case in the backend). In the frontend, we want a bit more flexibility and the ability to update the IR in place, so we'll use the Maglev IR, which is also CFG-based; it's not as compact, but it's easy to mutate the graph in place.
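As an illustration of the copy-per-phase approach (a hypothetical sketch, not the actual Turboshaft layout): if operations live contiguously in a flat array and refer to their inputs by index, a "phase" is just one sequential sweep that appends possibly-rewritten operations to a fresh array while keeping an old-index to new-index map.

```python
# Hypothetical sketch of a compact, index-based IR (not V8's actual
# data structures): ops are stored contiguously and reference inputs
# by index, so a lowering phase copies the whole graph linearly.

from dataclasses import dataclass

@dataclass
class Op:
    kind: str
    inputs: tuple  # indices of input ops in the same array

def lower(graph):
    """Copy the graph, rewriting multiply-by-2 into a left shift."""
    new_graph, index_map = [], {}
    for old_idx, op in enumerate(graph):
        new_inputs = tuple(index_map[i] for i in op.inputs)
        if op.kind == "MulBy2":
            new_op = Op("ShiftLeft1", new_inputs)  # peephole rewrite
        else:
            new_op = Op(op.kind, new_inputs)
        index_map[old_idx] = len(new_graph)
        new_graph.append(new_op)
    return new_graph

graph = [Op("Param", ()), Op("MulBy2", (0,)), Op("Return", (1,))]
lowered = lower(graph)
print([op.kind for op in lowered])  # ['Param', 'ShiftLeft1', 'Return']
```

The appeal is cache friendliness: each phase is one forward pass over a contiguous array, with no pointer chasing, which is presumably where the L1 dcache numbers quoted above come from.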
I hope that this is useful information; I can ask some colleagues from the Crankshaft era to chime in otherwise :)
Darius
Thanks. I'm curious about the internal representation. Is Turboshaft using literal SSA names for sources and destinations, or are destinations implicit somehow?
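To make the question concrete, here is a toy sketch of the two options I have in mind (my own illustration, not V8 code): explicit SSA names store a destination per instruction, while an implicit scheme uses an instruction's own position as the name of its result.

```python
# Toy illustration of explicit vs. implicit SSA destinations (not V8 code).

# Option 1: explicit names -- each instruction carries a destination.
explicit = [
    ("v0", "Const", 1),
    ("v1", "Const", 2),
    ("v2", "Add", "v0", "v1"),
]

# Option 2: implicit destinations -- an instruction's index in the
# list *is* the name of the value it produces.
implicit = [
    ("Const", 1),   # value 0
    ("Const", 2),   # value 1
    ("Add", 0, 1),  # value 2 = value 0 + value 1
]

def eval_implicit(instrs):
    """Evaluate the implicit-destination form: values[i] is instr i's result."""
    values = []
    for instr in instrs:
        if instr[0] == "Const":
            values.append(instr[1])
        elif instr[0] == "Add":
            values.append(values[instr[1]] + values[instr[2]])
    return values

print(eval_implicit(implicit)[-1])  # 3
```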
Comment on this from Cliff Click: https://github.com/SeaOfNodes/Simple/blob/main/ASimpleReply....