Three Pillars of Reproducible Builds
fossa.com
One of the most fun non-determinism bugs I have worked on was the result of using an associative container with the key type being a pointer (like a std::map<void*, int> or similar), and then iterating over this container.
Since the order and values of dynamically allocated pointers are non-deterministic, this resulted in diverging behaviour at some point.
Better be sure that all your tools used during the build don't do this kind of thing as well.
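To make the pointer-keyed container hazard concrete, here is a minimal Python sketch of an analogous problem and one way around it. It is not the original C++ code. In CPython, an object without a custom `__hash__` is hashed by identity, which derives from its memory address, so the iteration order of a set of such objects can differ between runs. The `Symbol` class and `emit_order` helper are hypothetical names used only for illustration.

```python
class Symbol:
    """Hypothetical build-time object; hashed by identity (address) by default."""

    def __init__(self, name):
        self.name = name


def emit_order(symbols):
    """Return names in an order that does not depend on memory addresses.

    Iterating the set directly would follow hash order, which for these
    objects derives from id() and may change from run to run. Sorting by a
    stable key (the name) removes that dependency.
    """
    return [s.name for s in sorted(symbols, key=lambda s: s.name)]


if __name__ == "__main__":
    symbols = {Symbol(n) for n in ["main", "init", "cleanup"]}
    print(emit_order(symbols))
```

The same idea carries over to the std::map case: key the container on something stable (an ID or a name) rather than on the pointer, or sort by a stable key before iterating.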
With ASLR off, the order and value should be identical between runs on the same malloc implementation, as stochastic allocators are not in common use.
Not when multi-threading is involved, I would think. That or timing-dependent code making allocations.
That's true.
These three aren't enough; you also need to avoid storing build timestamps, hostnames, and timezones, make sure anything that gets sorted or listed comes out in a fixed order, and more.
Some of that is mentioned, e.g.
> Build steps that use system time to generate timestamps.
> Builds that change behavior based on currently set environment variables but don’t commit environment variable configurations.
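For the timestamp case, one common convention is the `SOURCE_DATE_EPOCH` environment variable specified by the reproducible-builds.org project. Below is a rough sketch of how a build step might honor it. The `build_timestamp` function name and the fallback to the current time are my own assumptions, not something from the article.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp to embed in build output, as an ISO 8601 UTC string.

    If SOURCE_DATE_EPOCH is set, use it so repeated builds of the same source
    embed the same value; otherwise fall back to the current time (assumed
    behaviour for this sketch).
    """
    environ = os.environ if environ is None else environ
    epoch = environ.get("SOURCE_DATE_EPOCH")
    seconds = int(epoch) if epoch is not None else int(time.time())
    # Always format in UTC so the build machine's timezone cannot leak in.
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Passing the environment in as a parameter also makes the dependency explicit, which is in the spirit of the second quoted point about environment variables.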
On the JVM, Maven doesn’t make this particularly easy.
It’s possible to try to store dependencies locally instead of sharing them in a global m2 repository, but it’s difficult to stop Maven from adding the current time to jars or wars…
It’s as if all the default settings are the opposite of what they should be for reproducible builds.
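To show what "not adding the current time" means at the archive level, here is a small language-neutral sketch in Python (jars and wars are ZIP files underneath). This is not how Maven does it, and `build_archive` and `FIXED_DATE_TIME` are hypothetical names. The idea is just to write entries in sorted order with a fixed timestamp and fixed permissions.

```python
import io
import zipfile

# Fixed timestamp for every entry; 1980-01-01 is the earliest date ZIP can store.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def build_archive(files):
    """Build a ZIP archive deterministically.

    `files` maps archive paths to bytes. Entries are written in sorted order
    with a fixed timestamp and fixed permissions, so the same input yields
    the same bytes on the same platform and Python version.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path in sorted(files):
            info = zipfile.ZipInfo(path, date_time=FIXED_DATE_TIME)
            # Store uncompressed to sidestep differences between compressor versions.
            info.compress_type = zipfile.ZIP_STORED
            # Fixed Unix permissions (rw-r--r--) instead of whatever the build host has.
            info.external_attr = 0o644 << 16
            archive.writestr(info, files[path])
    return buffer.getvalue()
```

Two calls with the same files, in any insertion order, should then produce byte-identical output, which is the property you want to be able to check with a plain checksum.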
Any idea if there is a project to try to improve things with Maven or with another JVM tool? (Gradle, sbt, etc.)
If you have an option to containerize the app, Jib may be what you are looking for. Plugs into Maven, and the same source/content always generates the same image - https://github.com/GoogleContainerTools/jib
And this is the best explanation of Jib [1], but it’s hard to find via Google. It’s how all builds for every ecosystem should work IMO.
> Any idea if there is a project to try to improve things with Maven or with another JVM tool? (Gradle, sbt, etc.)
We've found SBT to be less reproducible than Maven. In particular, its "configuration file" (build.sbt) is actually executable Scala code (and highly imperative too, e.g. appending to mutable dependency lists). I've seen projects which choose different dependencies based on env var settings, string matches, etc.
I've also seen projects which add pre/post steps to a test suite, for spinning up and tearing down a mock database (the dynamodb-local SBT plugin). The crazy part is that SBT only becomes aware of the plugin when it's about to execute the test suite; hence it doesn't appear in any dependency lists, so we can't automatically fetch it ahead of time. By the way, that plugin itself works by downloading and running a "latest.zip" file from an AWS URL...
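The usual defence against a moving "latest.zip" is to pin the expected content hash and refuse anything else. A minimal sketch of that check, assuming the artifact bytes have already been fetched by some other means (`verify_artifact` and `ChecksumMismatch` are hypothetical names, not part of SBT or the plugin):

```python
import hashlib


class ChecksumMismatch(Exception):
    """Raised when a fetched artifact does not match its pinned digest."""


def verify_artifact(data, expected_sha256):
    """Return `data` unchanged if its SHA-256 matches the pinned hex digest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ChecksumMismatch(f"expected {expected_sha256}, got {actual}")
    return data
```

With the digest committed next to the build definition, an upstream change to the file fails the build loudly instead of silently changing what gets run.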
Huawei just published a paper (Towards Build Verifiability for Java-based Systems[0]) on trying to get the JVM ecosystem reproducible. It looks like it's early days, but I'm paying attention.
https://reproducible-builds.org/docs/jvm/ Which links to https://maven.apache.org/guides/mini/guide-reproducible-buil...
Haven't tried this myself as I don't particularly like Maven. It should be possible, though.
How can you discuss this w/o mentioning Nix (or the likes)?
I guess any stubs the compiler adds will also have to be reproducible, big whoop.