Unity builds lurked into the Firefox Build System

serge-sans-paille.github.io

65 points by sylvestre 3 years ago · 63 comments

Dwedit 3 years ago

Note that this is not referring to the Game Engine Unity. It's just referring to #including .cpp files.

  • jb1991 3 years ago

    Indeed, the title almost makes no sense grammatically in its current form, a consequence of the word "how" being removed. It would have been obvious it was not the game engine when the word "unity" appeared as the second, lowercase word.

10000truths 3 years ago

With the advent of LTO, unity builds are mostly a band-aid for poor management of header files. The Linux kernel project was able to net a ~40% reduction in compilation CPU-time just by pruning the contents of some key header files [1].

It really boils down to two rules:

1. Don't declare anything in header files that is only used in one compilation unit. Internal structs and functions should be declared and defined in source files, and internal linkage used wherever possible. gcc and clang's -fvisibility=hidden is useful here.

2. The more frequently a header file is included (whether transitively or directly), the more it should be split up. If a "common" or "utility" header file is included in 10000 source files, then any struct, function, etc. that you add to that file will have to be parsed 10000 times by the compiler every time you build from scratch, even if only 10 source files actually use the struct/function that you added. gcc and clang's -H flag is useful here.

[1] https://lore.kernel.org/lkml/YdIfz+LMewetSaEB@gmail.com/

  • pm215 3 years ago

    I think "just" is perhaps not the right word for something that took a senior dev over a year and more than 2000 commits just to get to an RFC patchset that doesn't compile for all architectures... Tremendous work, but it clearly wasn't easy or a matter of "follow these simple rules".

  • flohofwoe 3 years ago

    > unity builds are mostly a band-aid for poor management of header files

    That's what it was always about (improving build times); better optimization is just a welcome side effect. But header hygiene is hard because the problem will creep back into the code base over time.

    > The Linux kernel project was able to net a ~40% reduction in compilation CPU-time

    Linux is a C codebase. Header hygiene is much easier in C, because C headers usually only contain interface declarations (usually at most a few hundred lines of function prototypes and struct declarations), while C++ headers often need to include implementation code inside template functions, or are plastered with inline functions (which in turn means more dependencies to include in the header). And even if the user headers are reasonably 'clean', they still often need to include C++ stdlib headers which then indirectly introduce the same problem.

    For instance your point (2) only makes sense if this header doesn't need to include any of the C++ stdlib headers, which will add tens of thousands of lines of code to each compilation unit. For such cases you might actually make the problem worse by splitting big headers into many smaller ones.

    PS: the most effective, but also most radical and controversial solution is also a very simple one: don't include headers in headers.

omoikane 3 years ago

> This generally leads to faster compilation time in part because it aggregates the cost of parsing the same headers over and over.

But this also reduces the opportunity to parallelize compilation across multiple files because they have been concatenated into fewer build units, and each unit now requires more memory to deal with the non-header parts. For some build systems and repositories, this actually increases build time.

  • simplotek 3 years ago

    > But this also reduces the opportunity to parallelize compilation across multiple files because they have been concatenated into fewer build units (...)

    Irrelevant. There is always significant overhead in handling multiple translation units, and unity builds simply eliminate that overhead.

    > and each unit now requires more memory to deal with the non-header parts.

    And that's perfectly ok. You can control how large unity builds are at the component level.

    > For some build systems and repositories, this actually increases build time.

    You're creating hypothetical problems where there are none.

    In the meantime, you're completely missing the main risk of unity builds: increasing the risk of introducing problems associated with internal linkage.

    • tomjakubowski 3 years ago

      unity builds do often have worse performance than separate compilation for "incremental rebuilds" during development. that all depends on how the code is split up and how bad of a factor linking is.

      as in the article, it's best to support both

  • flohofwoe 3 years ago

    You also need to consider that (at least in C++), your own code is just a very small snippet dangling off the end of a very large included stdlib code block, and that's for each source file which needs to include any C++ stdlib header.

    For instance, just including <vector> in a C++ source file adds nearly 20kloc of code to the compilation unit:

    https://www.godbolt.org/z/56ncqEqYs

    If your project has 100 source files, each with 100 lines of code but each file includes the <vector> header (assuming this resolves to 20kloc), you will compile around 2mloc overall (100 * 20100 = 2010000).

    If the same project code is in a single 10kloc source file which includes <vector>, you're only compiling 30kloc overall (100 * 100 + 20000 = 30000).

    In such a case (which isn't even all that theoretical), you are just wasting a lot of energy keeping all your CPU cores busy compiling <vector> a hundred times over, versus compiling <vector> once on a single core ;)

  • stephc_int13 3 years ago

    On very large projects you can always cut them into several libraries, and compile them on different cores. Quite easy to do in practice.

    • cpeterso 3 years ago

      I believe Firefox builds only unify files within the same directory, with a maximum of roughly a dozen cpp files per unit. So there is still plenty of build parallelism across directories.

  • rsaxvc 3 years ago

    Not necessarily - I've been prototyping a fork of tcc that does both. It's multi-threaded rather than multiprocess.

stephc_int13 3 years ago

I used Unity builds for my projects basically forever, at some point I discovered the practice had a name and some debates around it.

It is a simple thing to do, and the gains are substantial: faster and simpler builds, with less maintenance, especially across different platforms.

For big projects I simply cut them into several libraries.

I've seen some incredulous reactions, mostly from young coders, and I know that makefile-based incremental builds should in theory be faster, but in practice I never found that to be true.

andybak 3 years ago

"Lurked into"?

You can lurk but surely you can't lurk into something?

  • bragr 3 years ago

    I assume the author is not a native speaker based on some of the odd grammar and phrasing in the post. It doesn't really detract from the work.

    Edit: They appear to be french: http://serge.liyun.free.fr/serge/

    • loufe 3 years ago

      The word "lurk" doesn't exactly exist in French. "se rôder" fits in some cases, but another translation "se cacher" (to hide) fits others. I'd write "X crept into Y" as "X s'est glissé dans Y", but that has a connotation more so as an accidental short-term mistake. I don't know how I'd express the idea concisely in French. Also hard to tell exactly how he wanted to convey it, something between "crept", "were hidden", "were lurking" probably? As I've discovered the hard way, there is not always an analogous term for the same fundamental idea/concept between two given languages; mastering the nuances of these differences is important for proper fluency. I probably make errors like this frequently writing/speaking French.

    • okeuro49 3 years ago

Y_Y 3 years ago

This used to be mandatory for nvcc/CUDA, if you had multiple source files (not just headers) you had to #include all of them in your main file. It made me very uncomfortable.

thinkling 3 years ago

I’ve been out of C/C++ development for a long time but seem to remember that precompiled headers were a thing back in the day. That approach didn’t have the name space issues pointed out here. Why are precompiled headers not used anymore?

  • pjmlp 3 years ago

    As far as I am aware, they never worked that great on UNIX compilers, as no big effort was ever spent improving them.

    About 20 years ago, on UNIX workloads we used to speed up compilation via ClearMake, a kind of distributed code cache that would plug into the compilers; however, it was part of the ClearCase SCM product.

    On Windows, with Microsoft and Borland (nowadays Embarcadero), they work quite alright.

    Also, modules will fix that: as per VC++ reports, importing the whole standard library (import std, as per C++23) takes a fraction of the time of just including iostream.

  • bgmeister 3 years ago

    They are still used in some places. But they have some downsides:

    Precompiled headers don't play nicely with distributed compilation or shared build caches (which are perhaps the fastest way to build large C++ codebases). So while they can work well for local builds, they exclude the use of (IMO) better build-time optimisations.

    They also require maintenance over time: if you precompile a bad set of headers it can make your compile times worse.

  • maccard 3 years ago

    They're very much alive and well on MSVC. Our work projects use both unity builds _and_ precompiled headers.

  • flohofwoe 3 years ago

    In a project that already has good header hygiene, precompiled headers don't help much to speed up builds. They're just a bandaid when the situation is already completely out of control.

mastax 3 years ago

Interesting. I've been aware of this technique for years because of the SQLite Amalgamation, but that was always sold as a way to simplify distribution and perhaps improve performance of the binary. I hadn't considered it as a build speed optimization, though that seems somewhat obvious in hindsight.

  • simplotek 3 years ago

    > I hadn't considered it as a build speed optimization, though that seems somewhat obvious in hindsight.

    Some build systems like cmake already support unity builds, as this is a popular strategy to speed up builds.

    Nevertheless, if speed is the main concern then it's preferable to just use a build cache like ccache, and modularize a project appropriately.
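
For reference, CMake's built-in unity-build support (available since CMake 3.16) is driven by two cache variables; a sketch, with the batch size chosen arbitrarily:

```shell
# CMake >= 3.16 can batch sources into unity translation units itself,
# with no source changes required:
cmake -DCMAKE_UNITY_BUILD=ON \
      -DCMAKE_UNITY_BUILD_BATCH_SIZE=16 \
      -B build
cmake --build build
```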

    • maccard 3 years ago

      Why not both?

      Also, does ccache work with MSVC?

      • simplotek 3 years ago

        > Also, does ccache work with MSVC?

        Technically it works, but it requires some work. You need to pass off ccache's executable as the target compiler, and you need to configure the settings in all vsproj files to allow calls to the compiler to be cacheable, like disabling compilation batching.

        Using cmake to generate make/ninja projects and using compilers other than msvc is far simpler and more straightforward: set two cmake vars and you're done.
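
The "two cmake vars" are presumably the compiler-launcher variables; a sketch:

```shell
# Route every compiler invocation through ccache via CMake's
# launcher variables:
cmake -G Ninja \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache \
      -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -B build
```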

leni536 3 years ago

Unity builds mean that you can no longer use internal linkage safely, and that's not something I like to give up. It forces the codebase to follow a certain style that I don't like. Hopefully modules will give the advantage of unity builds without this downside.

Scubabear68 3 years ago

Headers and C style macros are probably the most unfortunate aspects of C (and by extension, C++).

So many hacks in compilers to try to work around this. A shame there is no language level fix for this nonsense.

Really wish there could be a "C++--" that would improve on C in areas like this, and avoid all the incredible nonsense of C++. And no, not Rust or Go.

  • pxeger1 3 years ago

    Have you tried Zig? I think it fits those criteria, and is known for its good build system, although AIUI it is quite a large language compared to C

  • flohofwoe 3 years ago

    > Headers and C style macros are probably the most unfortunate aspects of C (and by extension, C++).

    Headers only became a massive problem in C++ because of templates and the unfortunate introduction of the inline keyword (which then unfortunately also slipped into C99, truly the biggest blunder of the C committee next to VLAs).

    Typical C headers (including the C stdlib headers) are at most a few hundred lines of function prototypes and struct declarations.

    Typical C++ headers on the other hand (including the C++ stdlib headers) contain a mix of declarations and implementation code in template and inline functions, and will pull tens of thousands of lines of code into each compilation unit.

    This is also the reason why typical C projects compile orders of magnitude faster than typical C++ projects with a comparable line count and number of source files.

  • pphysch 3 years ago

    Headers (in new code) will hopefully become optional due to modules. That would be such a big boost to the language.

    C Macros are pretty much considered code smell in C++, right?

  • zyedidia 3 years ago

    I think Hare (https://harelang.org/) might fit the bill: it retains the minimalism and simplicity of C, but fixes issues like this (and others). Unfortunately I don't think it's ready for real use yet, but I am keeping an eye on it.

StellarScience 3 years ago

We leverage many third party C++ libraries with complex templates, concepts, and constexpr expressions that seem to require lots of CPU to compile. We've found unity builds to be almost 3X faster, so we make it the default for both developer and CI jobs.

But we keep a separate CI job that checks the non-unity build, so developers have to add the right #include statements and can't accidentally reference file-scoped functions from other files. While working on a given library or project, developers often disable the unity build for just that project to reduce incremental build times. It seems to offer the benefits of both approaches.

Precompiled headers don't give nearly the same speedup. We're excited for C++ modules of course, but we're trying to temper any expectations that modules will improve build speed.

firstlink 3 years ago

The compilation-unit-per-file model (and in fact the whole concept of linking) are a legacy incremental build solution for C which somehow metastasized into fundamental requirements of building software on current OSes. It is an atrocity and should be disavowed by all developers.

robalni 3 years ago

I always use unity builds for all my projects now. That combined with using tcc as compiler (for C code) makes builds really fast. Another nice feature of unity builds is that I don't need to declare functions twice and keep the declarations synced. It's also nice to only have one place to find information about a function; people often put comments in header files that you can miss if you go to the definition.

All of those things combined make C programming more enjoyable.

  • simplotek 3 years ago

    > Another nice feature of unity builds is that I don't need to declare functions twice and keep the declarations synced.

    What exactly leads you to have multiple declarations, and thus creates the need to "keep [multiple] declarations synced"?

    • robalni 3 years ago

      I mean if you use multiple translation units and header files, you need to have a copy of the function declaration in that header file to be able to call it from other translation units.

dundarious 3 years ago

Why can’t static analyzers analyze the main cpp that #include-s the actual code? I don’t understand that point.

And what were the resulting effects on build times?

  • monocasa 3 years ago

    They can. Static analyzers do, however, get confused by a .cpp that isn't the top-level file of a compilation unit.

tyleo 3 years ago

Lots of game studios use Unity builds like this. It saves a massive amount of time. Last I heard it also improves Incredibuild performance, which is another popular tool for decreasing build times.

cpeterso 3 years ago

Another benefit not mentioned is optimization. The compiler may be able to inline more function calls when function definitions and callers are in the same unified compilation unit.

xcdzvyn 3 years ago

Link is 404: https://web.archive.org/web/20230505055736/https://serge-san...

zdimension 3 years ago

The post was renamed, the URL changed accordingly: https://serge-sans-paille.github.io/pythran-stories/how-unit...

(HTTP 301 on the old URL would have been appreciated)

kccqzy 3 years ago

To avoid some of these issues, it can be helpful to require that all files in a project, including header files, be compilable on their own. It doesn't get rid of all the problems (you can still depend on transitive includes without explicitly including them) but it enforces a minimal amount of code hygiene.
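
One way to enforce that rule, sketched as a hypothetical CI step that compiles each header standalone (paths and flags are assumptions):

```shell
# Compile each header by itself so a header that silently relies on
# its includer's other #includes fails fast.
for h in include/*.h; do
  g++ -std=c++17 -fsyntax-only -x c++ "$h" || echo "not self-contained: $h"
done
```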

zX41ZdbW 3 years ago

Tried Unity builds recently for ClickHouse, but without success: https://github.com/ClickHouse/ClickHouse/pull/18952#issuecom...

nine_k 3 years ago

It's another reminder how <expletives omitted> C++ is, but for a long time nothing better existed.

Grum9 3 years ago

Been using unity builds forever. The trick is to also have a standard build and try to compile it every week or so to catch anything that might have been missed, like a source file that is missing a header and won't compile alone because it got the header through the unity build ordering.

vinyl7 3 years ago

Compilation units are a relic of a time when computers only had a few KB of memory. At this point computers are fast enough and have enough memory to compile the whole thing in one go faster than whatever gains doing change detection and linking will have.

  • vitno 3 years ago

    This is just deeply untrue. Do you really think everybody working on compilers and linkers is deeply ignorant? I can easily saturate my 64 GB RAM home setup during a compile.

  • dagmx 3 years ago

    While everyone else is (rightfully) correcting you, I am curious what sort of codebases you’re working with?

    Are you working on large compiled software? Any game, rendering engine or large application benefits from compilation units in my experience.

    Some of my libraries that I work with take upwards of an hour for a fresh compile. Having sane compilation units cuts down subsequent iteration to minutes instead.

  • dblohm7 3 years ago

    Yeah, no. To this day Firefox developers building Gecko need a beefy desktop machine to be able to do it in a reasonable amount of time. I could do a clean build in 6 minutes with a ThreadRipper whose cores were all pegged, but forget doing the same in under an hour on a laptop.

    And that was with unified builds enabled.

    • glandium 3 years ago

      And more importantly, on that same machine the build would take more than ten minutes without unified builds.

  • regnerba 3 years ago

    Hahaha tell that to my Unreal Engine build times.

    A brand new 64-core AMD Epyc machine will take over an hour to compile it. Good times.

    • dagmx 3 years ago

      I’d really like to see a comparison someday between Epics weird C# based build system and something like CMake+Ninja.

      I suspect there’s compilation optimizations to be made, but I don’t think it would save more than 30% here and there.

      • maccard 3 years ago

        > I suspect there’s compilation optimizations to be made

        There definitely are. I've spent a lot of time with UBT, and a "reasonable" amount of time with cmake and friends. UBT isn't quite the same as CMake + Ninja. UBT does "adaptive" unity builds, globbing, and a couple of other things.

        > but I don’t think it would save more than 30% here and there.

        Agreed. The clean build with UBT is painfully slow compared to CMake + Ninja, but the full builds themselves are pretty good, and I'd bet that there's probably less low-hanging fruit there.

        I did a good chunk of work on improving compile times in Unreal, and there is definitely just low hanging fruit in the engine for improving compile times. Some changes to how UHT works around forward declares would also make a significant difference too.

        • dagmx 3 years ago

          The big issue, in addition to speed, I had with UBT was how difficult it was to debug when it did the wrong thing. Often this was when having to adopt new Xcode versions, where CMake gave a lot of escape hatches to adapt it whereas UBT required spelunking.

          At some points, there’s multiple layers of historic cruft that just seem arcane.

          Last year, epic released a video where an engineer went through it and even they hit points where they said: “I have no idea what this area of code does”

          • maccard 3 years ago

            No disagreements there. Spelunking is a great word for it, but spelunking is a requirement for most "deep" Unreal Engine development. On the other hand, it's incredibly empowering to switch your IDE to build UnrealBuildTool, put "My project Development Win64" as the arguments, and be able to debug the build tool then and there to see what it's actually doing.

            • dagmx 3 years ago

              That’s true. I should give it a go again now that Rider is available. It’s been a huge QoL improvement in the rest of my Unreal/Unity development work.

      • regnerba 3 years ago

        I would as well! It's honestly a bit beyond me; the Unreal build tools run deep, so I imagine it would take some effort.

  • smabie 3 years ago

    Why do clean builds of my code take like 30m then?
