Ahead-of-Time Compilation
This is great news, though it initially only supports Linux x86-64 and is decades late for Java desktop apps (and not having non-blocking I/O until Java 1.4 was shameful for a language explicitly targeting a pervasively networked ecosystem).
In their "tiered mode", they put sampling instrumentation into the native code, and if they detect a hotspot, regenerate fully instrumented native code from bytecode using the C1 (fast) JIT, which then allows the C2 JIT to do its full optimizations on the code as if AoT were not involved.
Since the invention of tracing JITs, I've often wondered why languages don't package together a compact serialized SSA form such as LLVM bitcode or SafeTSA along with storing functions as lists of pointers to space-optimized compilations of extended basic blocks (straight-line code), similar to how some Forth compilers generate threaded code. A threaded-code dispatcher over these straight-line segments of native code would have minimal overhead, and when a simple SIGPROF lightweight sampler detected a hotspot, a tracing version of the dispatcher could collect a trace, and then generate native code from the visited traces using the stored SSA for the basic blocks.
In this way, they'd have a light-weight tracing JIT for re-optimizing native code.
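A minimal sketch of that dispatch loop, with plain Java lambdas standing in for the compiled straight-line segments and a counter standing in for the SIGPROF sampler (all names here are hypothetical, chosen just for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadedDispatch {
    // Each "block" stands in for a space-optimized compilation of an
    // extended basic block: it does its work and returns the index of
    // the next block to run (-1 to halt).
    interface Block { int run(int[] state); }

    static final List<Integer> trace = new ArrayList<>();

    // The threaded-code dispatcher. Once the execution counter passes
    // hotThreshold (our stand-in for a sampler detecting a hotspot), it
    // starts recording the visited block sequence; a real system would
    // hand that trace plus the stored SSA to a trace compiler.
    static int run(Block[] blocks, int[] state, int hotThreshold) {
        int pc = 0, executed = 0;
        while (pc >= 0) {
            if (++executed > hotThreshold) trace.add(pc); // "tracing mode"
            pc = blocks[pc].run(state);
        }
        return state[0];
    }

    public static void main(String[] args) {
        // A tiny counted loop: state[0] = sum, state[1] = counter.
        Block[] blocks = {
            s -> { s[1] = 5; return 1; },             // init
            s -> { s[0] += s[1]; s[1]--; return 2; }, // loop body
            s -> s[1] > 0 ? 1 : -1                    // loop test
        };
        int result = run(blocks, new int[2], 4);
        System.out.println(result); // prints 15 (5+4+3+2+1)
        System.out.println(trace);  // block indices visited after warm-up
    }
}
```

The recorded trace is exactly the hot loop's block sequence, which is what a trace compiler would stitch into a single straight-line compilation unit.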
You might be interested in looking at Semantic Dictionary Encoding [1]. It was Professor Michael Franz's PhD thesis work. Franz was Andreas Gal's advisor on his thesis on trace trees.
SDE didn't propose starting with SSA, but could easily work with an SSA representation. SDE basically functions as a compression mechanism for a semantic IR that builds a dictionary on compression/decompression reminiscent of LZW. So instead of storing straight bytecode, you store a compact higher-level representation, which could very well be SSA, that is structured so you can generate code while "decompressing" it, and reuse generated code fragments as "templates" for later fragments.
An implementation was built in Oberon: a compact tree representation (you could do a DAG with some adjustments) that mirrors your code-generation order, and it was e.g. used to support PPC and M68k from the same "binaries" in MacOberon. The way it was structured makes retaining arbitrary higher-level structure of the programs very straightforward.
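To make the dictionary-building idea concrete (this is plain LZW, not SDE itself - SDE's dictionary holds code templates rather than strings - but the mechanism is the same): both sides grow identical dictionaries as a side effect of processing, so nothing beyond the code stream is ever transmitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LzwSketch {
    // Compress: the dictionary of longer sequences is built while
    // scanning the input, just as SDE builds its dictionary of code
    // fragments while encoding a program.
    static List<Integer> compress(String input) {
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put("" + (char) i, i);
        List<Integer> out = new ArrayList<>();
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dict.containsKey(wc)) {
                w = wc;
            } else {
                out.add(dict.get(w));
                dict.put(wc, dict.size()); // new dictionary entry
                w = "" + c;
            }
        }
        if (!w.isEmpty()) out.add(dict.get(w));
        return out;
    }

    // Decompress: rebuilds the exact same dictionary while expanding.
    // In SDE terms, this is where already-"decompressed" (generated)
    // fragments get reused as templates for later ones.
    static String decompress(List<Integer> codes) {
        List<String> dict = new ArrayList<>();
        for (int i = 0; i < 256; i++) dict.add("" + (char) i);
        String w = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(w);
        for (int i = 1; i < codes.size(); i++) {
            int k = codes.get(i);
            String entry = k < dict.size() ? dict.get(k) : w + w.charAt(0);
            out.append(entry);
            dict.add(w + entry.charAt(0));
            w = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "abababab";
        System.out.println(decompress(compress(s)).equals(s)); // prints true
    }
}
```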
I keep wanting to do something with SDE, but life keeps intervening... I see it as a huge shame that more work didn't go into exploring that alternative to straight-up bytecode, but it basically had way too little head start on Java, and I believe Franz moved to Java for his subsequent research on code generation.
[1] https://en.wikipedia.org/wiki/Semantic_dictionary_encoding
Yes, I'm familiar with SDE, but thanks for mentioning it. The SafeTSA I mentioned was one of Michael Franz's later contributions to the field. SafeTSA was an SSA representation capable of expressing all of the security and other semantic constraints of the Java language. Michael Franz's group took the Jikes RVM (then known as Jalapeño) and added a second front-end to the JIT that could read SafeTSA, so they could test the performance of programs running from Java bytecode and from SafeTSA in the same process. SafeTSA not only took less time to compile to native code than bytecode did, but the resulting native code also ran faster.
Interesting. Do you have a link to the thesis? The link on wikipedia seems broken, and Franz's homepage doesn't seem to contain a link.
The name of the paper is "Code-Generation On-the-Fly: A Key to Portable Software" just search for it.
LLVM bitcode is still architecture specific - for example, whether the code is 64 bit or 32 bit will result in different bitcode paths.
You may be interested to look further into Eclipse OMR, which is a generic VM toolkit used by IBM for many of their runtimes (including J9). The Testarossa JIT support landed last week, and although it doesn't support a bitcode form directly, there are optimisations that can be used to separate the static parts of a class from the dynamic parts, to facilitate loading. There is an IL for the JIT and interpreter to use.
Thanks for the pointer!
I (and others) have noted that for more than a decade, it seems that Java would have been better off under IBM than under Sun/Oracle (SWT vs. Swing/AWT, jikes vs. javac, Jalapeno/JikesRVM vs. not much interesting research until Graal, etc.) It's really a shame IBM didn't buy up Sun's Java intellectual property at fire sale prices.
This could be taken even further, if the IR can hold about effects and purity, etc you could potentially optimize across libraries and binaries.
> decades late for Java desktop apps
Commercial JDKs always offered AOT compilation, the problem is that people nowadays apparently don't buy compilers anymore unless forced to do so (e.g. embedded, consoles...).
Those are priced for people who already made a big investment in writing their application in Java and now realise they need features not present in javac. If you're just starting out, it can very well make more sense to use Microsoft Visual C++, which costs less than a commercial Java compiler and comes with an IDE that's light years ahead of anything available to Java developers.
Desktop Java also had many other problems, which can be summarised as "the JVM is its own OS". You can't write an application in Java that has a native look and feel. Or at least you couldn't for the first several significant years of its life and even now I don't think there's a good story for writing a simple native application. Meanwhile you could grab wxWidgets or Qt (and there goes your budget for a java compiler) and have a native-looking cross-platform application. Which very few did, because back then Mac OSX didn't exist, Apple were on their death bed and "Linux Desktop Environment" was even more of a joke than it is today.
So yeah, it didn't make any bit of sense to develop Java desktop apps given that you already had a large pool of proficient C++ developers, the only platform you cared about was Windows and Java GUI libraries insisted on reinventing their own look and feel. Oh and you could always just buy Delphi if you didn't want to suffer C++ (again, for a fraction of the price of a commercial Java compiler).
Nowadays people wrap a bunch of javascript in an electron instance, but this only happened after the web took off and nobody really looks at native desktop apps much. If this AOT work can give us fully contained native executables that we can distribute without having the user install Java and with significantly better performance than nodejs, maybe Java on the desktop can still happen.
That's not correct. The first UI toolkit Java had was AWT and it mapped through to native widgets. AWT was not very successful because it tried to be cross platform rather than a direct mapping of the Windows UI toolkit, which was significantly more advanced in that era than its competitors MacOS Classic and - most problematically - UNIX workstations, which had truly miserable UI toolkits. So AWT was limited to the lowest common denominator and trying to abstract UI libraries didn't work very well, the abstraction was leaky.
So for the first few years of Java's existence developers were given native UI, and said no, actually, we don't care if we have a native look and feel or not - for the kinds of line-of-business apps they were writing a powerful and consistent toolkit was more important than one that looked the right shade of grey. Hence, Swing.
Nowadays if you want to write a small, pure native Java app with native widgets you can do it with SWT and Avian. There's an example here:
https://readytalk.github.io/avian/

It demos all the features available in SWT with a 1MB download that's fully self-contained. You still have the problem of leaky abstractions, and SWT apps don't look entirely normal as some more complex widgets still need to be custom, but it's another attempt at AWT's approach that works significantly better now that MacOS and Linux have closed the gap with what Windows could do, so you can have a richer abstraction.

> If you're just starting out, it can very well make more sense to use Microsoft Visual C++, which costs less than a commercial Java compiler and comes with an IDE that's light years ahead of anything available to Java developers.
Sorry but you are way wrong.
I do consulting in the Java, .NET and C++ ecosystems, and started using C++ back in 1993, when the C++ ARM was the only reference for a possible future standard.
The only C++ IDE that could match the Java ones for many years was C++ Builder.
Visual C++ has only now started to match C++ Builder, with C++/CX + XAML for WinRT applications.
And while Visual C++ debugger and code navigation are quite good, they still don't rival Java IDEs or even their own .NET experience, without installing something like Visual Assist or ReSharper C++.
> You can't write an application in Java that has a native look and feel. Or at least you couldn't for the first several significant years of its life and even now I don't think there's a good story for writing a simple native application.
Sure you can, but developers seem not to like to read books, so they just write crappy Java desktop applications without learning how to use Swing.
https://www.amazon.de/dp/B004Y4UTHM/ref=dp-kindle-redirect?_...
> Meanwhile you could grab wxWidgets or Qt (and there goes your budget for a java compiler) and have a native-looking cross-platform application. Which very few did, because back then Mac OSX didn't exist, Apple were on their death bed and "Linux Desktop Environment" was even more of a joke than it is today.
We were targeting UNIX with Motif++ back in those days.
Regarding Windows, OWL and later VCL were way better than anything that Microsoft produced for C++. Even XAML was initially targeted to .NET.
As for Apple, we were mainly using Metrowerks with PowerPlant.
> If this AOT work can give us fully contained native executables that we can distribute without having the user install Java and with significantly better performance than nodejs, maybe Java on the desktop can still happen.
There are many applications that people aren't aware are actually compiled with Excelsior JET.
As I said, this generation doesn't pay for compilers.
There has also been gcj for a long time, but default toolchains matter a lot.
I seem to recall that GCJ had a lot of limitations back in the day - you couldn't use the same standard library, it didn't support newer language features, etc. Wouldn't surprise me if even now GCJ had poor integration with IDEs and the other essential tools needed to make Java livable.
GCJ was abandoned in 2009, as the majority of its devs left to work on either Eclipse's compiler or the early days of OpenJDK.
It is still available on gcc, because of its unit tests. Some gcc code paths are only used by gcj.
I can't see anywhere in the linked issue that indicates AOT compilation is coming to Java 9, or even coming at all. The issue demonstrates nothing more than an intent to bring it to OpenJDK, and the issue seems to be very nascent? It was only created a fortnight ago.
In case the title is changed:
AOT compilation is coming to Java 9 (java.net)
18 points by hittaruki 37 minutes ago

The person who created this ticket is an Oracle employee, not some random Joe, so it seems like a reasonable guess that it's something Oracle is planning.
They are planning it, and there were already several talks, but the roadmap was Java 10 or later.
There was a talk about it last year: https://www.youtube.com/watch?v=Xybzyv8qbOc

The project seems to have gone slower than I expected, perhaps because Chris Thalinger moved to Twitter.
> The extra step of recompiling code at Tier 3 is necessary since the overhead of full profiling is too high to be used for all methods, especially for a module such as java.base. For user applications it might make sense to allow AOT compilations with Tier 3-equivalent profiling, but this will not be supported for JDK 9.
This implies it will be in Java 9 (in a limited fashion).
Slightly off topic but if you are interested in how HotSpot compiles to native code I gave a presentation at JavaOne:
http://alblue.bandlem.com/2016/09/javaone-hotspot.html
The presentation wasn't recorded but there is a video recorded from a DocklandsLJC event which is on InfoQ:
https://www.infoq.com/presentations/hotspot-memory-data-stru...
I'm not familiar enough with compilers, but why would an ahead of time compiler perform worse than a just in time compiler in a static language? I think I'd understand if it was a dynamic language, because you can't know the types for sure until you start running the program, but are similar issues present for Java?
Yes, Java code relies a lot on devirtualization to get good performance. This is because the language makes every method virtual by default. The flip side is that this makes coding styles that use interfaces heavily (arguably good for testing, interoperability, decoupling, etc.) effectively "free" compared to ones using concrete classes.
Most of the JIT optimizations amount to "it's been called like this in the past; assume it'll always be & deoptimize (at a penalty) if it isn't."
That potentially includes the fully resolved types of objects (ie devirtualization), branch prediction (stronger than the CPU can do; for instance, if a value is only used inside a branch that's never taken, don't bother mutating it), data sizes (this "array" is only ever size 2, store it in registers), dead code elimination (keeps the compiled code small), and a whole bunch more fun stuff.
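For illustration, here's the shape of a call site HotSpot typically devirtualizes (a sketch - the devirtualization itself happens inside the VM and isn't observable from the Java source; class names here are made up):

```java
public class Devirt {
    interface Shape { double area(); }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    // s.area() is a virtual (interface) call site. While Circle is the
    // only Shape implementation the JIT has ever observed here, it can
    // inline Circle.area() directly behind a cheap type guard; loading
    // and using a second implementation later triggers deoptimization
    // and recompilation of this method.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area(); // monomorphic in practice
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(1.0);
        System.out.printf("%.2f%n", total(shapes)); // ~1000 * pi
    }
}
```

An AOT compiler without profile data has to assume any Shape could flow into `total`, which is exactly why closed-world or profile-guided information matters so much for Java.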
> assume it'll always be & deoptimize (at a penalty) if it isn't
Stuff like this makes me nervous. Performance is already a complex topic, and stuff like this makes it even more complex. Unnecessarily so. If we were talking about a very high-level programming language (say, Prolog), you could argue that the expressiveness benefits outweigh the cost of the runtime system's complexity. But Java isn't even as expressive as C++, let alone Prolog.
> fully resolved types of objects (ie devirtualization)
C++ (and similar languages: D, Rust, etc.) and MLton (a Standard ML implementation) have been using monomorphization for ages, which is a compile-time analogue of devirtualization. Moreover, monomorphization has important advantages over devirtualization:
(0) It's completely predictable. You don't need to guess when it will happen. It happens iff the concrete type (and its relevant vtables, if necessary) can be determined at compile-time: https://blog.rust-lang.org/2015/05/11/traits.html
(1) It's always a sound optimization, so it doesn't have to be undone at runtime under any circumstances.
(2) It's relatively simple to implement. In fact, a compiler front-end can completely monomorphize a program before handing it over to the back-end for target code generation.
> if a value is only used inside a branch that's never taken, don't bother mutating it)
The best way to handle unreachable branches is to avoid creating them in the first place. With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.
> data sizes (this "array" is only ever size 2, store it in registers)
C and similar languages natively handle statically sized arrays, so there's no need for runtime profiling and analysis just to determine that an array will always have size 2.
ML does something even better: you just use tuples (in this case, pairs), which reflect your intent much better than using arrays whose size has to be tested or guessed.
---
What I take away from this is that the JVM's supposedly “fancy” optimizations exist primarily to work around the Java language's lack of amenability to static analysis.
Hence why we have Projects Valhalla and Panama.
However bringing those features to Java and the JVM specification, while keeping existing code running, is a big engineering effort, which is why they are targeting Java 10+.
I would have liked it if, back in those days, Gosling and his team had taken inspiration from Modula-3, Oberon, Component Pascal and many others.
But better late than never, and regarding the Java 10 time frame, I am happy they aren't doing a Python-style break-the-world thing.
I think you're ignoring the fact that there is lots of extra information available at runtime that isn't available from static analysis of languages even if they're very amenable to that, and that static analysis can actually give you worse information.
For example, a call site could be statically analysed to be bimorphic, but then sometimes when you run it the second type is never actually used and the call site can be made monomorphic.
The same thing applies to branches - they're both possible to use, but often when you run it with real data you only actually use one of them.
So I don't think these optimisations are primarily to work around amenability to static analysis - they achieve something different and actually more powerful.
I can even give you a real-world example of where static analysis is actually what causes unpredictable performance. There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by half. In the implementation of Ruby that I work on, we can see the static references to instance variables in the source code but don't try to do anything based on this - we purely use a hidden class system and let the true number of instance variables emerge dynamically at runtime, and we don't have the same performance drop when you start to set extra variables dynamically (http://chrisseaton.com/rubytruffle/pppj14-om/pppj14-om.pdf).
I think it's quite a neat philosophy to apply - don't assume anything statically and let the real characteristics of the program's data and control flow emerge at runtime.
> There is an implementation of the Ruby language called Rubinius that statically looks at the instance variables in a class that are visible in the source code, and optimises the objects for that many instance variables. If you start to set extra variables dynamically, and so upset this static analysis, performance drops by a half.
You can implement an unsound static analysis for any language, and this is in fact what Rubinius is doing: it's making potentially wrong conclusions about what instance variables will exist in objects. However, unsound analyses are outright harmful:
(0) Undoing unsound “optimizations” costs even more performance than was supposed to be gained by optimizing your program. (As you found out the hard way yourself.)
(1) They lie to you about what your code means! If this isn't bad enough, I don't know what else could be.
Unfortunately, a sound static analysis of Ruby code wouldn't be able to tell you much, precisely because Ruby allows you to subvert everything at runtime.
> As you found out the hard way yourself.
That makes it sound like I implemented it - I didn't - I implemented the alternative mechanism which doesn't have the same problem.
> With proper use of algebraic data types and pattern matching, unreachable branches can be kept to a minimum, or even outright eliminated in many cases.
Those would still result in branches. Speculative optimizations can turn the common case into branch-free code that uses traps in the uncommon cases to bail out. But since those things are often data-dependent, you can't just bake them in at compile time.
One thing JIT compilers are good at is specializing methods for common (runtime) types; e.g. you have a method operating on Iterable but it turns out most of the time you get a List as input; you can generate code that bypasses the method lookups for .size etc. The question of whether this pays off is usually only answerable at runtime.
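A hand-written analogue of that specialization, with the type guard made explicit (a JIT emits the equivalent guard and fast path automatically, based on the types actually observed at the call site; this is just a sketch of the idea):

```java
import java.util.Arrays;
import java.util.List;

public class Specialize {
    // General contract: count non-empty strings in any Iterable.
    static long count(Iterable<String> items) {
        if (items instanceof List) {                  // type guard
            @SuppressWarnings("unchecked")
            List<String> list = (List<String>) items;
            long n = 0;
            for (int i = 0; i < list.size(); i++)     // indexed fast path,
                if (!list.get(i).isEmpty()) n++;      // no iterator object
            return n;
        }
        long n = 0;                                   // generic slow path
        for (String s : items)
            if (!s.isEmpty()) n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("a", "", "b"))); // prints 2
    }
}
```

The JIT's advantage over writing this by hand is that it only pays for the guard at call sites where List actually shows up, and it can throw the specialization away if the observed types change.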
Actually that's not a thing most JIT compilers are good at, as such runtimes are hard to build. For the mainstream ones, you only have Java and the major Javascript runtimes.
A JIT can use runtime information for optimizations, but you pay a cost in startup time. Being worse or better depends on the workloads and implementations.
AOT can as well. PGO.
Yes, but that locks you in to optimising for whatever you covered with your profiling. If the character of your data changes, it'll take a recompile to change how the application performs, while a JIT can potentially choose to deoptimise/reoptimise.
I'm all for "as static as possible" toolchains, but there are optimization opportunities you simply won't have with AOT, PGO or not. E.g. consider something trivial: a program doing certain image operations that depend on dimensions passed in on the command line. A JIT could optimise the inner loops for the actual dimensions. AOT, even with PGO, would be totally unable to deal with this without causing a massive explosion in code size.
Hence why .NET also has MPGO, Managed Profile Guided Optimizations for NGEN.
https://msdn.microsoft.com/en-us/library/hh873180(v=vs.110)....
In theory yes, but sadly I have never seen that promise realized in any consistent fashion. Same goes for Java's escape analysis. Although the principle is sound, I think the engineering required to make it robust is horrendously difficult. In a very narrow window of variation it works, but should you step out of that zone it fails pretty badly. I think it will take many more years; till then, PGO and metaprogramming it is.
Regarding escape analysis, beware that the OpenJDK is the worst of them all.
IBM J9, Graal are much better at it, and I bet Aonix, Azul and other JDKs targeting high performance deployment scenarios are even better.
Speculative optimisations yield about a 20% improvement in Java and more for higher level languages like Scala (and for Ruby it's off the charts). HotSpot isn't perfect by any means and C2's EA is not strong enough, but it's on track to be replaced with Graal (which is a part of what AOT is all about - you don't want your VM compiling itself at the same time as compiling your app).
PGO in C++ can yield quite significant speedups, however the difficulty of integrating it into a development workflow means that in practice it's hardly ever used. One of Java's accomplishments is that it brought PGO to the masses by making it entirely built in and automatic.
There are a set of optimisations that you can only do at runtime that you can't do with AOT. Anything that depends on data is something that you can't reliably do, such as eliding null checks if you can prove this cannot happen based on the data passed in. There are also cases where you can have multiple subclasses of a type such as a normal and a debug subclass, or multiple drivers for different back ends such as MongoDB or Cassandra, only one of which is used at runtime but you cannot know ahead of time which is selected (for example, it's based on an environment variable or system property).
The point is that while AOT can do a set of optimisations, including whole module analysis, there are a set which are only available at runtime.
Almost all languages can benefit from profile guided optimisations, including very static languages like C++
Why was Java ever JIT'd rather than natively compiled anyway? I hate to stick my neck out and even ask this but I never understood why you'd want to JIT or interpret when you can just natively compile to a binary. It seems like Go has gone "back" to the future on this one and in general their toolchain approach to me looked like the way.
I always got the sense the world is waiting for a statically typed Python that compiles to native code with Go's CPU performance. I suppose Nim might fit that bill but a shame it doesn't have compatibility with Python's or even the extent of a language like Go's libraries. And if possible, an imperative language that interfaces with OTP.
And that said, I can see why Erlang/Elixir wouldn't make as much sense or even work with native AOT compilation due to its feature set (thinking of stuff like hot code reloading). But I've never grasped why Java or Python were better off with JITs or interpreters than AOT compilation. It seems like a type system such as Go's is simple enough and allows for good gains in both CPU performance and memory usage. Add in the fact that you don't need to install anything and there's less to think about in deploying, and it seems to be a no-brainer. Please feel free to fill me in on this or where I went wrong.
For several of Java's early use cases, being able to deploy a single file that could be run on any Java-supported platform was very important.
Also, file size mattered, and stack-based bytecode is often the most compact representation.
This is true, and small memory sizes and limited bandwidth were design assumptions valid in 1992.
Because Java was originally designed for set-top TV boxes and appliances, where it's kind of a big deal that you don't need to know or care what OS or processor each appliance is using internally.
https://web.archive.org/web/20050420081440/http://java.sun.c...
When the appliance market didn't pan out, they went for web browsers and Java applets. Bytecodes were a feature because browsers didn't execute native code, and because they allowed for sandboxing to limit the attack surface.
Even when Java became more popular on the server than in the browser, the "write once, run everywhere" was considered a major feature: The same bytecode could be distributed everywhere; no need to maintain a heap of different build environments for different CPU architecture and OS combinations.
I'd say the appliance market did pan out actually. BluRay players all contain an embedded JVM, as do many other kinds of set top box, as do of course all Android smart TVs.
Abstracting the CPU has worked out pretty well for the Java platform. Look at how easy the 64 bit transition was for the Java world vs the C++ world. Visual Studio is still not a 64 bit app and yet Java IDEs hardly even noticed the change. The transition on Linux was just a disaster zone, every distro came up with their own way of handling the incompatible flavours of each binary.
In addition, a simple JIT compiled instruction set makes on the fly code generation a lot easier in many cases and it's a common feature of Java frameworks. For instance the java.lang.reflect.Proxy feature is one I was using just the other day and it works by generating and loading bytecode at runtime. On the fly code generation is considered a black art for native apps and certainly extremely non portable, but is relatively commonplace and approachable in Java.
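For example, a minimal use of java.lang.reflect.Proxy, which causes the JDK to generate and load a brand-new class at runtime that implements the given interface and routes every call through the handler:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Greeter { String greet(String name); }

    static Greeter newGreeter() {
        // Intercepts every method invoked on the generated class.
        InvocationHandler handler = (proxy, method, args) ->
            "hello, " + args[0] + " (via " + method.getName() + ")";
        // The JDK synthesizes bytecode for a Greeter implementation
        // here and loads it on the fly.
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[]{Greeter.class},
                handler);
    }

    public static void main(String[] args) {
        System.out.println(newGreeter().greet("world")); // prints hello, world (via greet)
    }
}
```

This is exactly the sort of runtime code generation that a pure ahead-of-time model has to special-case or forbid.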
Plenty of other platforms do support bytecodes, JIT and AOT on the same toolchain.
So they could keep the WORA story and still offer AOT as an option, which actually most commercial JDKs do.
Just Sun was against providing it at all on Java SE, but they actually supported it on Java Embedded.
I was commenting on "why did they design Java that way in the first place", as opposed to (say) Go.
I agree that once the primary use of Java moved outside the browser, there was no particular reason to not give the option of AOT too. I'm not sure why Sun was so adamantly opposed to the idea.
If I recall correctly, Sun really wanted to stick with JIT on Java Embedded too, they just couldn't get it to run fast enough on embedded hardware. For desktop and servers, they considered bytecode interpretation and JIT "fast enough".
Sure and actually that is where mobile OSes are moving.
We now have bitcode on iDevices, DEX on Android and MSIL/MDIL on WinRT.
Still, both iDevices and Windows Store take, what I consider the best approach, to do AOT on the store for each supported target.
As Google found out, using AOT on the device doesn't scale. I just don't get why they went back to an overly complicated architecture of Interpreter/JIT/PGO → AOT, instead of following the same path as the competition and serve freshly baked AOT binaries.
Religion.
Talking about AOT compilation at Sun was taboo, and I remember seeing a few forum discussions from former employees disclosing this.
Deployment (and fever dreams of 'mobile code'). It's also worth remembering that Java was designed and implemented at a time when the landscape was significantly less x86-centric and Sun was one of the companies on the not-x86 side.
The landscape is still not really x86-centric, is it?
Java is old. It's seen a lot of CPU architectures come and go over the years. When it started out x86, SPARC and POWER, were important. Then it saw a mass migration from x86 to amd64 on the desktop and server side, and an explosion in the importance of ARM in mobiles (several flavours).
Along the way it's seen lots of smaller proprietary architectures come and go too, like the exotic DSP-oriented processors found in BluRay players and pre-smartphone phones and like the Azul Vega architecture that was specifically designed for executing business Java.
And don't forget that even amd64 is not a homogeneous architecture. It adds new CPU instructions pretty regularly and thus can be seen as a long line of compatible but different CPU architectures. Java apps transparently get support for all of them on the fly, without having to recompile the world. You see the benefit when you realise the size of Maven Central ... there are JARs out there that are still useful and good even a decade after they were compiled, yet they still get optimised to full speed using the latest CPU instructions no matter what kind of computer you use.
I remember HotSpot being promoted as faster than GCC -O2. This sort of over-promising was good for presentations to the Schmidt and McNealy types.
My understanding is that GCC has a more diverse arsenal of optimizations that it can apply to code while hotspot has the advantage that it can profile at runtime and apply speculative optimizations based on those profiles and bail out later if things change. In principle it can even optimize code that never reaches steady state as long as the transient states last long enough.
What costs Java performance these days is not the quality of the JIT compilers or even the garbage collectors. It's the object layout, which is not very cache-friendly. There is lots of pointer-chasing going on, since there are no arrays-of-structs.
Valhalla[0] promises to improve the data layout issue at some point in the future while graal may allow compiler writers to cram some more optimizations into the jits.
Indeed. Another issue is that Java's semantics are too rigidly defined. Compilers for many other languages have a lot more room in deciding order of evaluation, vectorization, etc.
Java tries to provide semi-sane behavior even in the presence of data races. Carefully constructed code can give you benign behavior even under racy data accesses.
If you relaxed some compiler constraints, e.g. allowed it to re-read local variables that had been discarded due to register pressure, then those benign races would suddenly turn into non-benign ones.
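The canonical benign-race pattern being referred to, modeled on String.hashCode's hash caching (a sketch; the class and method names are made up):

```java
public class RacyCache {
    private final int[] data;
    private int hash; // deliberately not volatile

    RacyCache(int[] data) { this.data = data; }

    // Several threads may race to compute and store the hash, but any
    // thread that reads a non-zero value reads a correct one: the int
    // write is atomic and the value is a pure function of final state.
    int cachedHash() {
        int h = hash;          // read the shared field exactly once
        if (h == 0) {
            for (int x : data) h = 31 * h + x;
            hash = h;          // racy publish; any winner is correct
        }
        return h;              // must return the local copy, not `hash`
    }

    public static void main(String[] args) {
        RacyCache c = new RacyCache(new int[]{1, 2, 3});
        System.out.println(c.cachedHash() == c.cachedHash()); // prints true
    }
}
```

If the compiler were allowed to spill `h` and re-read the `hash` field at the return, a thread could pass the zero check on one value and then return a freshly observed 0, which is exactly the re-read hazard described above.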
Assuming this comes in Java 9, and compilation of code other than `java.base` is possible, will this make Java a more solid competitor to Go? I guess it partly depends on how much they optimize the compiled binary size. Go does a really good job at static compilation, so it will be tough to compete.
In benchmarks I've seen, Java already smokes Go on any benchmark except the second or so it takes the JVM to start up.
I'm not aiming this just at you, but I think many people (node.js users in particular come to mind) don't realise just how good the JVM is, performance-wise. I'm not a great fan of Java the language, but the JVM is top class.
HotSpot can run hello world in about 50msec, not one second. A lot of people's views of JVM startup time are hopelessly out of date.
The primary thing people seem to like about Go is that it produces single native binaries. You can do that with Java too (I gave an example of Avian further up the thread), but people don't tend to bother because distributing a single JAR is not much harder and avoids any assumptions about what OS the recipient might have. Go users seem invariably to be writing programs for their own use and Go doesn't really "do" shared libraries, so they don't ever encounter the problem of distributing a binary of the wrong flavour because they don't distribute binaries at all.
By the way, in Java 8 there's a tool that produces Mac, Linux and Windows standalone packages and installers that don't depend on any system JVM. I've used it to distribute software successfully, although I had to make my own online update system for it. In Java 9 it's being extended quite a bit with the new "jlink" tool that does something similar to static linking ... the output of jlink is either a directory that's a standalone JRE image optimised and stripped to have only the modules your app needs, or you can combine it with the other tool to get a MacOS DMG (with an icon, code signing etc), Windows MSI/EXE (ditto), or a Linux DEB/RPM/tarball.
This isn't a single file at runtime of course, it's a single directory, but basically any complex native app will have data files and some sort of package too so that's not a big deal.
Sorry, my Java startup time is also very out of date it seems. Thanks for the deeper detail.
Java has had AOT compilation for ages; here is one example.
Most commercial JDKs do support AOT compilation to native code, and together with the Java libraries and ecosystem, that definitely makes it more than a solid competitor to Go.
The problem is that the free AOT compilers were never much of a match for the ones in commercial JDKs, and in this day and age most developers don't pay for compilers unless forced to do so.
So Java AOT compilers are usually only used by enterprise companies.
Yeah I'm aware of Excelsior, but it would be cool to see AOT in OpenJDK.
Java AOT compiles JVM bytecode to native code during startup of the JVM, which is different from Go compiling source to native code and distributing platform-specific binaries. So in this case the Java binary size remains the same as before: .jar or .war files.
For Go, .go -> native
For Java, .java -> .class -> package .jar -> AOT native
For the Go part I might be wrong; I don't work with Go professionally.
No, not during startup: AOT can happen at any time chosen by the developer or user. There's a command-line tool that triggers it, and I believe the plan is to integrate it with the "jlink" tool that produces standalone, app-specific JRE images. So you can produce a native installer for each platform.
My understanding is that JVM bytecode is compiled to native before startup of the JVM, that is why it is called AOT ;)
That is an incorrect assumption. The AOT step will include dead-code elimination. In that way, the Java approach is more sophisticated, combining platform-independent bytecode with platform-specific machine code.
No, it's not. Compilers such as the one in IBM's J9 have a switch setting to use AOT.
Go still has the feature of lightweight threads (goroutines) by default and great communication and synchronization primitives (channels, select) between them. For Java you still have to choose either real threads or one of dozens of not-necessarily-compatible async I/O frameworks (Netty, Grizzly, ...). How much that really matters depends on the type of your application. There's also emulation of Go's concurrency through Quasar, but again it's not as first-class as Go's features.
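For the narrow case of handing a value between two threads, the JDK's own `java.util.concurrent` gets reasonably close to an unbuffered Go channel without any framework. A minimal sketch (class and method names are mine): `SynchronousQueue` blocks the producer until a consumer takes the element, which is the rendezvous semantics of `ch <- v` / `<-ch`. What it lacks is Go's `select` over several channels and the cheapness of goroutines, which is what the frameworks and Quasar try to supply.

```java
import java.util.concurrent.SynchronousQueue;

public class ChannelSketch {
    // Hand one message from a producer thread to the caller through an
    // unbuffered rendezvous, like an unbuffered channel in Go.
    static String rendezvous() throws InterruptedException {
        SynchronousQueue<String> ch = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                ch.put("hello");    // blocks until the consumer takes it
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        String msg = ch.take();     // rendezvous point with the producer
        producer.join();
        return msg;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(rendezvous());
    }
}
```

The cost difference is real, though: each producer here is a full OS thread, where Go multiplexes many goroutines over few threads.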
Java also has green threads.
The JVM specification doesn't require OS threads, and in the early days all JVMs only had green threads.
Nowadays you can still get such JVMs, just not the OpenJDK.
Scala and Akka blow goroutines out of the water imo.
> Infrequently-used Java methods might never be compiled at all, potentially incurring a performance penalty due to repeated interpreted invocations.
That sort of makes no sense. How can you incur a real performance hit if the uncompiled method is rarely called?
I would assume if the method is rarely called, but very complex or time-consuming when it is called.
Of course, I'm not an expert on JVMs, so I wouldn't know whether their analysis is synchronous or asynchronous or a mix of both.
That's great! Won't it make the compiled executable platform specific?
You're intended to use it like this: either you distribute JARs and the recipient triggers the AOT compilation if they want it, or you distribute a "jlinked" JRE image that's inherently OS specific because it includes a bundled JVM. It's also possible that a future Java module format will allow the AOT compiled code images to come along for the ride next to the classfiles.
There are other JVMs that already do AOT compilation. This just brings an AOT compilation option to Java. It also looks like the contained file will include both the compiled version and the normal bytecode, so that the code can be recompiled if/when necessary.
I believe that's the whole point.
From what I could gather, this is the process one would follow to get native code:
.java -> javac -> .class (still cross-platform bytecode) -> jaotc -> .so native code
the generated .so file will be platform specific, but it seems like the class files will still be needed as usual
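Based on the pipeline above and the jaotc usage shown in JEP 295, the command sequence looks roughly like this. It requires a Linux x64 JDK 9 build that ships jaotc, and the class and library names are my own choice:

```shell
# Compile to cross-platform bytecode as usual
javac HelloWorld.java

# Ahead-of-time compile the class into a platform-specific shared library
jaotc --output libHelloWorld.so HelloWorld.class

# Run with the AOT library; the .class file is still needed alongside it
java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```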
Write once, run anywhere... you've compiled it to.
IKVM, the Java VM for .NET, has converted Java bytecode to .NET for a while now, and crossgen/ngen can be used for .NET AOT (Mono also has AOT).
How does this interact with classloading?
My general impression is that the design of classloaders is pretty actively hostile to making JVM startup fast.
AOT only applies to classes that have been AOT-compiled and are not transformed at runtime. Everything else will either still need JITing, or could potentially throw errors if pure AOT is desired.
AOT and JIT are not mutually exclusive. From the proposal itself:
> AOT libraries can be compiled in two modes:
> Non-tiered AOT compiled code behaves similarly to statically compiled C++ code in that no profiling information is collected and no JIT recompilations will happen.
> Tiered AOT compiled code does collect profiling information. The profiling done is the same as the simple profiling done by C1 methods compiled at Tier 2. If AOT methods hit the AOT invocation thresholds these methods are being recompiled by C1 at Tier 3 first in order to gather full profiling information. This is required for C2 JIT recompilations to be able to produce optimal code and reach peak application performance.
Yes, it is. On the other hand, gcj already did it years ago. I guess some features like dynamic class loading are just not supported.
Is there anything this adds over Scala-native, which seems to be much further ahead already?
I think this could be a great complement to Scala-native. Right now, the project contributors have to spend effort translating the essential Java libraries that would allow Scala-native to be successful. This could really ease that job for them. It could potentially make all the Java code ever written available to Scala-native.
The other thing it adds is the backing of a giant, like Oracle, which can bring stability and peace of mind to some people, when deciding whether to adopt the technology or not.
I think this assumes that Oracle
- can ship something in time
- and that it will be generally available for developers (looking at how hard Oracle pushes their Java department to invent commercial features they can sell, I'm not sure about that)
Looking at it, I assume that this will go the way of GWT ... not starting from "how can we make Java a good citizen in this new ecosystem?", but "here we have 100% of Java, the JDK and the JVM ... how can we compile this with full fidelity into X?".
Not sure why this isn't a transparent feature implemented via caching.
HotSpot's JIT-compiled code tends to be pretty specialized based on runtime profiling information, which may not necessarily be similar between different runs even if the class itself hasn't changed, or (in an extreme case) even if none of the code has.
Some other JVMs (at least Azul's Zing) try to solve this by caching profiling information to speed up code generation.
I believe the ReadyNow technology used by Zing records which methods are compiled at which level, then triggers compilation of those methods at startup. So you effectively use profiling information from the previous run to inform the next run of what the final target state is, allowing warm-up times to be dramatically reduced.
It's explained in the talk "Java goes AOT":
https://www.youtube.com/watch?v=Xybzyv8qbOc
Basically they thought it'd de-opt too much. I'm not totally sure it's the case but they'd be the experts on that.
it's a gcj comeback?
resurrect me when it's there https://github.com/search?p=3&q=jaotc&type=Code