Project Valhalla, Simple as it can be, but not simpler
The shift in Valhalla's direction reminds me of the challenges I faced when experimenting with its early prototypes. I distinctly remember grappling with the introduction of Q-types and v-bytecodes and trying to square them with my existing understanding of the JVM. It's interesting to see the design evolving towards expressing struct-like values as normal Java classes. From my own tinkering with Valhalla, the convergence towards a more streamlined approach without giving up performance seems promising. I'm curious to see how these changes play out in real-world applications, especially given the nuances of Java's memory model and optimization techniques.
From the start I was wondering whether the JVM could solve this with less cruft at the bytecode level. Of course, that doesn't mean I was aware of the challenges; it was just an intuition.
I do think they must have had similar intuitions in the beginning, then identified big challenges and concluded that overcoming them in a more streamlined way would require some set of features or optimizations. It seems, though, that these emerged in isolation, at least until they crossed the threshold of being able to say "there's a way".
Seeing how long Valhalla has been in development, I'm curious how things went the way they did. The article mentions hindsight but that alone doesn't explain the change in direction.
There is always the cheap way out: generate garbage bytecode and leave it to the HotSpot team to clean up the mess with intrinsics. Sort of like how the Vector API is implemented with objects, but obviously no one wants allocations in the middle of tight vector loops, so HotSpot just treats them as a special case.
This was not so much a "change in direction" as having a guiding model that assumes the worst case. By now, enough evidence has emerged that the same performance can be achieved with object descriptors.
Discussion on the mailing list: https://mail.openjdk.org/pipermail/valhalla-spec-experts/202...
Well I suppose that if you treat value types like a variation of an object reference rather than a variation of an object, then it would make sense that they cannot be mutated (since neither can object references but only their targets), with the added wrinkle of allowing such mutation during value constructors. To me this still feels less general than C or Go structs, which can have individual fields mutated in place (and even take the address of stack variables and pass pointers around, though that's unsafe in C and may require heap/GC allocation in Go unless escape analysis proves otherwise).
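A rough sketch of this "variation of a reference" model using today's records (the closest current analogue to a Valhalla value class; the `Point` type and its `withX` helper are made-up examples): you never mutate a field in place, you construct a replacement value.

```java
public class ValueSemanticsDemo {
    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = p.withX(10);   // p is untouched; q is a new value
        System.out.println(p);   // Point[x=1, y=2]
        System.out.println(q);   // Point[x=10, y=2]
    }
}

// Records are immutable, so "mutation" is expressed by building
// a new value rather than assigning to a field.
record Point(int x, int y) {
    Point withX(int newX) { return new Point(newX, y); }
}
```

Contrast this with a C or Go struct, where `p.x = 10` writes through to the original storage.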
I haven't read the article fully, but can anyone please explain why value types seem so much more complicated in the JVM than in .NET, where they have existed for a long time?
Value types were in the CLR (the _Common_ Language Runtime) from day one, as the runtime supported (variations of) C, C++, etc. IIRC, proper generics didn't appear in version 1 of C#, but since the foundation was there, generics could be added later in the form of new collection classes that could be properly parameterized.
Java generics took a shortcut by reusing the old classes and just layering generics on in the language while erasing all useful type information from the runtime, and this is what's biting them now that practice has shown flat memory layouts to be highly beneficial for performance, as CPU speeds have outstripped memory latencies.
A List<int> in C# will be 2 objects: the List object and the underlying int[] array of sequential numbers. A Java List<Integer> will be 2+N objects: the List object, the Object[] array, and N boxed Integer objects; and if the N Integer objects are scattered in memory, then traversing the integers will be far more expensive due to memory latencies.
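A minimal sketch of the layout difference described above. The object counts are the interesting part; actual sizes and allocation behavior vary by JVM (e.g. small values hit the Integer cache).

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    public static void main(String[] args) {
        int[] flat = new int[1000];              // 1 object: header + 1000 ints inline

        List<Integer> boxed = new ArrayList<>(); // List object + Object[] inside it
        for (int i = 0; i < 1000; i++) {
            boxed.add(i);                        // each add may allocate an Integer box
        }

        // Both sum to the same result; the boxed version chases a pointer
        // per element, the flat array reads sequential memory.
        long a = 0, b = 0;
        for (int v : flat) a += v;
        for (int v : boxed) b += v;              // auto-unboxing on each read
        System.out.println(a + " " + b);         // prints "0 499500"
    }
}
```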
Generics were also already on an experimental branch by the time .NET 1.0 was released.
Microsoft decided not to delay the release waiting for them to get ready.
Don Syme of F# fame has a couple of blog posts with the history of generics in .NET, as he was part of the original design team.
Specializing List<int> to an int[] has more to do with generics than with value types. The most important semantic property of a value type is that it has no identity; that is, modifying it is not observable from another thread that was holding the same value previously.
This alone lets one do things like freely copy/share/modify them, that directly allows for flattening.
In theory you're right; in practice, however, the details of the JVM instruction set and the existing generics system throw wrenches into it.
The JVM has instructions like iload, aload, dload, etc. (integer, reference, double) to load values from stack slots; these simple instructions were probably chosen to support the direct hardware execution that was attempted for various embedded Java scenarios. When generics were introduced, they were unwilling to extend the instruction set because of these type-specific instructions, so they decided that everything generic would be an Object.
The CLR (.NET/C#) equivalent of iload, aload, etc. is ldloc, a single instruction whose type depends on the stack slot; generic classes "just" need to specialize the types of their slots to produce a specialized class, whether the type argument is a primitive, a reference, or a value type.
What they're doing now, according to the article, is making aload and the a-family of instructions behave more like their CLR counterparts by adding slot flags, analysis, and optimizations to detect "value cases", so List<int> would be List<Integer(notnull)> and then optimized back down to List<int>.
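For concreteness, here is what the type-specific load instructions look like in practice. The javap output in the comments is illustrative; exact slot numbers and constant-pool indices depend on the compiler.

```java
public class LoadsDemo {
    public static void main(String[] args) {
        int i = 42;          // local slot 1 (slot 0 is args)
        String s = "hi";     // slot 2
        double d = 3.0;      // slots 3-4 (doubles occupy two slots)

        // `javap -c LoadsDemo` shows reads of these locals as
        // distinct, type-specific instructions:
        //   iload_1   // int       -> operand stack
        //   aload_2   // reference -> operand stack
        //   dload_3   // double    -> operand stack
        System.out.println(i + s + d);  // prints "42hi3.0"
    }
}
```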
It's basically a hack upon the hack because they want to preserve backwards compat to the previous hack. If they can get the engineering sorted then sure, I'm mostly glad I won't have to maintain any code related to all this machinery.
And I'm quite curious how the JVM-language ecosystem feels about this. Maybe they're open to it if it solves backward compat under the hood for them too, but I wouldn't be surprised if in some cases we see pre-JVM 25 versions of JVM languages in addition to post-JVM 25 versions.
In C#, value types and reference types are two separate kinds of types: value types are passed by copy (like C structs), and generic `where` clauses introduce coloring (struct constraints [1]).
In Java, value types are immutable, so the VM can pass them by reference or by copy; as a user you don't see the difference. They are also backward compatible (per the article): you don't need to recompile user code when you replace an existing class with a value class.
C# has had value types since the beginning, so its primitive types are value types, while in Java it seems there will be 3 kinds of types instead of 2.
[1] https://learn.microsoft.com/en-us/dotnet/csharp/programming-...
Besides what everyone else is describing, people forget that JVM was designed for Java only, while Common Language Runtime was designed as a polyglot runtime, including support for C and C++, and to this day C++ is officially supported, even if only on Windows (as Visual C++ is the only production compiler with a CLR backend).
I’d say that it’s backwards compatibility.
My understanding is that moving from 1.1 to 2.0 (that introduced reified generics) required work from developers of libraries.
It was also done in relative infancy of the ecosystem.
But I'll trust some C# old-timer to tell me how it actually was, as I only remember that some tools I used insisted on having an older .NET runtime installed.
Value types (structs) were in C# 1.0, and they’re used e.g. in native code interop.
Java generics use erasure, and they are backwards compatible with non-generic code. You can still write `var l = new ArrayList();` in the latest Java versions; you'll get a compiler warning, but the code will compile and run just as well as code using `ArrayList<Object>` would. C# uses reified generics (which are faster, saner, and more expressive), and its standard collections exist in two namespaces (System.Collections vs. System.Collections.Generic). If you need to work with legacy code that uses the non-generic types, System.Collections.Generic.List<T> implements System.Collections.IList (but the code needs to be smart enough to demand the IList interface instead of the concrete System.Collections.ArrayList implementation).
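The backward-compatibility point in a runnable sketch: raw types still compile (with "unchecked" warnings, suppressed here) and interoperate freely with generic code.

```java
import java.util.ArrayList;
import java.util.List;

public class RawTypesDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static void main(String[] args) {
        List raw = new ArrayList();      // pre-generics style, still legal
        raw.add("hello");                // unchecked warning without the annotation

        List<String> typed = raw;        // raw and generic views interoperate
        System.out.println(typed.get(0).length());  // prints "5"
    }
}
```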
> reified generics (which are faster, saner, and more expressive)
I wouldn't go as far as to claim that; e.g., erasure is often claimed to be the reason the JVM has a blooming language ecosystem while the CLR does not.
Generics have language-level semantics and they may decide to do it differently, in which case erasure gives better results.
The way generics are implemented is definitely not the reason the CLR has a nearly non-existent language ecosystem. The real reason is that .NET Framework was for years a Microsoft/Windows-only thing, and is still perceived that way despite .NET being cross-platform now. Yes, Mono has existed since 2005, but why would anyone invest time in writing a whole new programming language for a platform that was Windows-only until 2016? This, despite the technical facilities that allow for multi-language implementations on the CLR. All the languages that target the CLR are Microsoft-developed: C#, VB.NET, F#, and C++/CLI, the last one still being Windows-only. Even then, VB.NET and C++/CLI exist because Microsoft internally needed to support old code for a bunch of already-existing projects anyway.
Also, generics in the CLR aren't mandatory: you can implement a language without buying into the CLR's way of doing generics. For instance, in C++/CLI you can mix and match templates with CLR generics, but it's in no way mandatory; you can still write C++/CLI code using C++'s native template system: https://learn.microsoft.com/en-us/cpp/extensions/generics-an...
Which languages that are built on top of the JVM would have significant issues if generics weren’t implemented via erasure? I’m pretty sure Scala would be happier with reified generics. I think the CLR might not be as popular a target because of the Microsoft ties (and the main implementation being Windows-only and closed-source for most of the CLR’s existence).
Scala actually was implemented for the CLR, but was later dropped. Here is what its creator said: https://news.ycombinator.com/item?id=14179881
A short way to describe this: you have to throw away all high-level type information in order to execute code on real machines, so the choice becomes when to throw it away.
CLR languages throw it away in the runtime, JVM languages throw some away in bytecode, and Haskell throws it all away in the compiler.
A non-obvious difficulty is that at the bytecode level, constructor invocations are "special", i.e., a separate bytecode instruction has to be used to call them. Therefore, introducing separate instructions for value types would have required recompilation (or rewriting at startup) of user code that calls constructors of value classes. This would have been an ongoing migration issue and a potential source of bugs, not just because of Java core legacy like the primitive wrapper types. Fortunately, it turns out that the JVM can be adapted to smoothly hide the difference behind the scenes.
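To illustrate why constructor calls are "special": javac emits a three-instruction pattern rather than a single call instruction. The disassembly in the comments is illustrative (constant-pool indices vary by compiler); `Pair` is a made-up example type.

```java
public class CtorDemo {
    static final class Pair {
        final int a, b;
        Pair(int a, int b) { this.a = a; this.b = b; }
    }

    public static void main(String[] args) {
        Pair p = new Pair(1, 2);
        // `javap -c` shows roughly:
        //   new           #7   // class CtorDemo$Pair (allocate, still uninitialized)
        //   dup
        //   iconst_1
        //   iconst_2
        //   invokespecial #9   // Method CtorDemo$Pair."<init>":(II)V
        System.out.println(p.a + p.b);  // prints "3"
    }
}
```

Any scheme that gave value classes their own construction instructions would have to rewrite every occurrence of this pattern in existing class files.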
First, .NET structs are flattenable, but they're not value types in Java's sense, which are immutable and so pose further problems. But the two main challenges for adding any big new feature to Java are forward compatibility and simplicity.
1. Forward compatibility: While backward compatibility means that old code continues to run and so is easy to do (add the new feature "on the side"), forward compatibility, or migration compatibility, means making it easy for old code to take advantage of new features with little or no change.
2. Simplicity: For every feature (added to help, say, efficiency) that makes an advanced developer happy, you risk scaring away ten less-advanced developers. Complex languages are also rarely taught as first languages, which makes it very hard for them to reach, or remain at, the very top of the most popular languages. Eventually every language faces requirements that demand a new feature that makes the language more complex, but if great care is not taken to control the added complexity as much as possible (or even give up on the feature as not being worth the cost in complexity), the language eventually faces a threat to its popularity.
While Java has always cared about these two, .NET not so much (maybe they're right not to, but in any event that's a real philosophical/cultural difference between the two platforms). For example, to make simple blocking enjoy the scalability benefits of async/await you need to change a lot of things; that wasn't acceptable for us. And MS has always been a fan of rather complex languages (with the exception of VB); as early as the late eighties MS was "the C++" company.
Now, to be more specific, one of the big challenges of value types is object initialisation, which is what John's article is primarily about. If you create an array of a value type, the elements must be initialised to some value which isn't null. How do you express that in a way that is relatively forward compatible, simple, but also efficient? Furthermore, as John points out, even for value classes that don't admit a "zero" default value, how do you express the constructor for an immutable type that can be flattened? It's easy to do this for reference records because reference types are always "published" to readers with an atomic write of the pointer, but value types (which, again, are immutable) need to initialise their fields atomically even when flattened.
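The initialization problem in miniature: today the JVM hands out arrays pre-filled with a default value, which exists for every primitive and (as null) for every reference type.

```java
public class DefaultsDemo {
    public static void main(String[] args) {
        int[] ints = new int[3];
        String[] refs = new String[3];
        System.out.println(ints[0]);  // prints "0"    -- the primitive default
        System.out.println(refs[0]);  // prints "null" -- the reference default
        // For a flattened, null-free value class there is no obvious
        // analogue: some value types have no sensible "zero" instance.
    }
}
```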
BTW, generic erasure is not a big challenge. While value types will eventually (in a later phase) require specialized generics (because you want, say, an ArrayList of a complex-number value type to internally use a flattened array), adding that is not a huge problem for forward compatibility and simplicity, because value types are invariant, i.e. they cannot subclass or be subclassed, so the complexity and forward-compatibility issues that reification brings to variant types aren't as bad (the problem with reifying generics that can be variant is that you need to bake a particular language's variance strategy into the runtime itself). Erased generics have so far helped Java much more than they've hurt it, because we keep seeing their forward-compatibility benefits over and over (including for future features we're thinking about). They also help simplicity, because they make the runtime an attractive compilation target for complex languages that may draw those who prefer such languages while keeping them on the platform, at the same time reducing the pressure on Java to add complex features that could threaten its popularity among the majority who prefer simpler languages.
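The erasure being discussed is directly observable: at runtime there is a single ArrayList class, whatever the type arguments were at the use sites.

```java
import java.util.ArrayList;

public class ErasureDemo {
    public static void main(String[] args) {
        Object a = new ArrayList<String>();
        Object b = new ArrayList<Integer>();
        System.out.println(a.getClass() == b.getClass());  // prints "true"
        System.out.println(a.getClass().getName());        // prints "java.util.ArrayList"
    }
}
```

A specialized-generics JVM would instead give the two instantiations distinct runtime representations, at least for value-type arguments.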
IMO it's worth keeping in mind that an "actually immutable" guarantee is pretty hard to enforce for value types without being willing to set aside performance, especially for interop.
At any point, if you make the raw bytes accessible to someone, even if it's just to copy them somewhere, it becomes possible to mutate an "immutable" value, and users will eventually take advantage of that, if only because they came up with a really good reason why they need to (for example, initializing readonly values in a deserialization function).
I don't think .NET could have ever adopted true immutability for structs, because the rules would be so easy to break. There are "readonly" fields, but they're at best a guardrail to keep you from falling off: there are trivial ways to bypass those protections.
I do think immutability can be a valuable property though so it's cool to see the Java folks doing the hard work to execute on it.
> IMO it's worth keeping in mind that an "actually immutable" guarantee is pretty hard to enforce for value types without being willing to set aside performance, especially for interop.
Well, there's the issue of tearing for values larger than 64 bits, which users will need to accept to flatten those types (although it only occurs in the face of a race, i.e. a bug in the program, anyway), but Java's native interop is quite careful. As a general rule, Java code can manipulate "native" memory but doesn't hand out pointers to Java objects to native code (don't forget, we like our moving collectors).
Also, only a small minority of Java programs call native code outside the JDK itself, and the new FFM API, which replaces JNI and is generally safer, separates safe and unsafe parts: https://openjdk.org/jeps/442. The unsafe parts require an explicit opt-in.
The goal, which we're quickly approaching, is that (1) no "integrity" rule can be broken without an explicit opt-in (what we call "integrity by default"), even through native interop, and (2) only a small minority of programs (~1%) will need to opt out of integrity: https://openjdk.org/jeps/8305968
> as early as the late eighties MS was "the C++" company.
It always struck me as ironic that they get perceived that way, given that C++ was born at Bell Labs alongside C and UNIX, CORBA was born on UNIX, IBM and Apple pushed C++ just as hard, and Borland has always had better C++ tooling (to this day, even as Embarcadero) than Microsoft has managed during the last 30 years.
Now, it is certainly true that WinDev is pretty much a C++ shop to the detriment of everything else, including .NET.
I understand it's in jest, but I don't think Einstein's version "blew it". The simplified (and definitely catchier) aphorism fails to explain what the criterion for "enough" or "not enough" is.
A little bit off topic, but here is another challenging "Project Valhalla":
https://www.youtube.com/watch?v=XL2zzFaybdE&ab_channel=EduMa...