Bazel Fawlty

danmux.com

75 points by elrodeo 8 years ago · 51 comments

venantius 8 years ago

I have worked extensively with Bazel in the context of migrating a very large Java/Scala codebase from Gradle to Bazel.

My impression is that it is a first-class build system _specifically for Java and C++_. There are specific properties of the compilation and packaging ecosystem around those languages and runtimes that make them uniquely in need of tools like Bazel. This is not true for Go, where build speed and large codebases were a consideration up-front (especially with the new build caching features that shipped in Go 1.10), and are almost completely irrelevant in the context of "interpreted" languages like Python, Ruby or JavaScript.

There are plenty of other languages and runtimes that stand to benefit from a tool like Bazel, but just because Bazel is a fantastic tool in some contexts doesn't mean it brings that much to the table in others.

  • cromwellian 8 years ago

    Actually it is very relevant for JavaScript at Google, given Closure Compiler. Most of our JS apps are type checked, code split and optimized and Bazel is how we structure the module dependencies. The Bazel build rules define independent logical modules, and build processes prune unused dependencies, and calculate a dependency graph which is used to organize an optimal module structure, this is then used by Closure Compiler’s CrossModuleCodeMotion to further reduce size by moving methods and properties down to later modules.

    Granted, JS isn’t incrementally built, but the dependency graph pruning (removing unused files from the build that aren’t imported or referenced in JS code) is parallelizable.
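
    As a sketch of how that module structure gets expressed, a BUILD file using the open-source rules_closure might look roughly like this (target names and paths are illustrative, not from the comment above):

    ```
    load("@io_bazel_rules_closure//closure:defs.bzl",
         "closure_js_library", "closure_js_binary")

    # Each closure_js_library is one logical module; unused deps can be
    # pruned from the graph before Closure Compiler optimizes the bundle.
    closure_js_library(
        name = "checkout",
        srcs = glob(["*.js"]),
        deps = ["//js/common:cart"],  # hypothetical sibling module
    )

    closure_js_binary(
        name = "app",
        deps = [":checkout"],
    )
    ```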

    • venantius 8 years ago

      Yeah, JavaScript is a bit of an odd one out here, and I can obviously see the value in terms of providing a minimal build (in a way that's of less import to Ruby/Python users, say).

      That said, it all depends what benefits you want to get out of Bazel. For us, incremental builds were the key thing we cared about - minimizing JAR size was important, but we already had a decently good handle on that.

  • letientai299 8 years ago

    I can speak for Java only. For most of my Java projects, which use Spring Boot and lots of libraries, Gradle (previously Maven) works just fine. If there's any build issue, assuming it's not a compilation issue, I just delete the `build/` and `.gradle` folders, then run `./gradlew clean build` again.

    I've taken a hard look at Bazel, given its claims of being fast and correct, but the complicated setup and documentation just put me off.

    Why should we consider bazel over gradle or maven for Java, or any other JVM languages?

    • venantius 8 years ago

      Your comment makes it sound like you don't run your build process in a continuous integration environment.

      • letientai299 8 years ago

        I do. In fact, I'm working on several Spring Boot services that have CI/CD configured on Gitlab. The workflow from push through build, test, releasing a docker image and deploying is fully automated on Gitlab CI.

    • foota 8 years ago

      I don't really view bazel as being that complicated to set up, what makes you think that?

  • danmux 8 years ago

    Spot on.

elteto 8 years ago

It seems like Bazel's support for Go is still not first class and a lot of the issues mentioned here come from that.

But it strikes me as odd that they are figuring all of this out _now_, as opposed to a year ago when they first switched. Why did they switch to bazel in the first place? Did they do a trade study/test drive?

All of the problems mentioned (more complex build files and workflow, increased build times, lacking support for Go) you can find out right away as soon as you do a test drive. Unlike C++ or Java, it seems like the native go build tooling was always superior to what Bazel could offer, and they knew that. Why switch then?

In my experience using Bazel in a large and complex C++ codebase, it has been nothing short of amazing compared to the CMake horror show that we had... We did have problems with it, though: mostly lacking documentation in some areas, like building toolchain files, or the fact that support for Windows only got really good a couple of releases ago. We identified all of these in the initial migration effort and none of them was by itself a showstopper.

  • danmux 8 years ago

    To give a tool a fair chance I think it has to be used daily to really grok the pleasure / pain. We were aware of some of the problems, but perhaps overly confident about managing them, and underestimated the frustration and lack of agency. Possibly we let it embed too far before we decided we have tried as hard as could be expected. Around 6 months in, once a number of us formally proposed that it be reverted, we took the bold decision that we should persevere no further. Having worked in C codebases in the past - I can imagine it is almost a miraculous improvement there.

Sir_Cmpwn 8 years ago

We've been using Bazel at work, where we have a monorepo with lots of different languages at play (Python, Go, Java, Scala, JavaScript, CoffeeScript, TypeScript, Protobuf, just to name the major players) and I like that it makes all of them use a consistent interface for configuring and running builds. It helps that we have a specific team handling most of the Bazel work - and it costs about 10% of all of that team's time.

That's about the only thing I like about it, though. It's the most complicated build system I've ever used and I still don't fully grok it after 7 months working in this position.

  • merb 8 years ago

    > where we have a monorepo with lots of different languages

    isn't that the only reason to use something like bazel? I considered using it for our go, java, scala, typescript and c# project, but I would never have used it for go only.

zellyn 8 years ago

If your entire codebase is Go, you really shouldn't be using Bazel/Pants/Buck/Please. Go's built-in build system is perfectly adequate for just Go, with light tooling for generated code (eg. protobufs).

It's when you have multiple languages and want to be able to build/test only what has changed that you need a Blaze-alike.

With regards to Gazelle: is it intended to be run every build, on every file? Or is it intended to be hooked into your editor, and run on save to update your BUILD files as you go? At Google, I used the latter kind of tool, and it worked just fine. If it missed something, you'd notice when the CI system failed to build/test your change.
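
A rough sketch of the "run Gazelle as part of the build/CI" setup, using bazel_gazelle (the Go import prefix here is made up for illustration):

```
# BUILD.bazel at the repository root
load("@bazel_gazelle//:def.bzl", "gazelle")

# gazelle:prefix github.com/example/monorepo
gazelle(name = "gazelle")
```

`bazel run //:gazelle` then regenerates the BUILD files; that command can be wired into an editor on-save hook, or run in CI followed by a check that the tree is still clean.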

  • danmux 8 years ago

    Regarding Gazelle, yes, the CI would break early if it detected a non-porcelain (dirty) repo after running Gazelle itself. I think others may have configured their editors, but for me the occasional 30+s run time on file save (and, as mentioned, 3-4 minutes in some workflows), plus the already creaking VSCode plugins, was too much to bear.

EtDybNuvCu 8 years ago

Have you tried Nix? It's like Bazel, but without incremental builds. The boons are pretty great.

I'm extremely disappointed, as a career SRE, to see "hope is not a strategy" trampled in the mud. I understand that perhaps the author has succumbed to epistemic helplessness and given up on having computers work without optimism or belief, but computers do not operate on any ruleset which includes hope, and we should not encourage magical thinking.

  • danmux 8 years ago

    Not trampling, just holding at arm's length, and recognising the reality of the economics of most projects. Unless you live in a completely deterministic world, devoid of human fallibility, or are perhaps omnipotent, or simply have unlimited time or resources, at some point you are going to have to admit you just don't know, and you are managing the percentages. Delusional overconfidence is closer to magical thinking than recognising the reality that complex systems will fail in surprising ways, and that it costs ever-increasing resources to reduce the risks with ever-diminishing returns, until you are forced to stop. The ruleset the computer abides by probably represents a fraction of the factors affecting success. If you feel you have never got to a point where there is an element of faith involved in your choice, then I'm envious.

    • EtDybNuvCu 8 years ago

      Thank you for your thoughtful reply; you deserve a serious answer.

      I accept that, occasionally, my systems will fail. My data will be lost. No recovery will be possible, and the damage will be permanent and lasting. I do not hope to succeed in those times, but expect to be scarred. I will fail.

      Given that I will fail, I'd like to understand how I fail. I'd like to understand why I fail. I'd like to measure how often I fail, how short I fall of success, and the root causes of my failure. I'd like to know when failure is about to happen or is happening.

      By writing down what I do, I know how to fail. I can write down what I do when I don't fail, too. I can write while I do, and I can read what I wrote to do it again. I can let somebody else read and do what I have done.

      I can expect to fail sometimes. I can expect to not fail sometimes. I expect failures based on causes, not based on self-blame. I expect to not fail most of the time, and only fail at certain times when something has caused me to fail.

      I can fail less in the future. My actions today can change how I fail in the future. I can plan to fail, or intentionally fail, or sometimes fail less. I act intentionally.

      I haven't failed in a while. The last time I failed, I looked at why I failed and I did what was necessary to try to recover and fail less.

      This is how SRE works. This isn't overconfidence; this is fault-tolerance. It's not easy, but it works.

      To respond to your point about faith, I have plenty of faith, just not hope that my faith will be able to prevent me from failing.

      And finally, try Nix sometime. It's pretty cool.

      • danmux 8 years ago

        Good points well made. I hope I'll be able to prioritise trying Nix at some point.

  • ris 8 years ago

    > Have you tried Nix? It's like Bazel, but without incremental builds.

    Well, depending on how granular you expect your incremental builds to be, it can indeed have "incremental builds". Any build step that produces a derivation that ends up in the nix store will be lazily evaluated and not regenerated if its input parameters haven't changed. You could go down the route of outputting each compiled object file as a derivation...

    • benley 8 years ago

      > You could go down the route of outputting each compiled object file as a derivation...

      You could, but I believe you would find that Nix becomes pretty inefficient in that scenario. There is a surprising amount of overhead involved in setup/teardown of network namespaces and the other various sandbox features, and that cost is incurred for each individual derivation. It's a reasonable tradeoff for Nix when used as a package-level build sandbox, but (assuming my understanding is correct and still current) for Nix to work well as a file-level incremental build system it would require some strategic changes.

      Here's the github issue with a bunch of related discussion and details: https://github.com/NixOS/nix/issues/179

      • ris 8 years ago

        Yes it's all interesting stuff.

        But, one way of looking at it, that's just ("just") an implementation issue. No ideas fundamental to Nix would have to be changed to find a faster way of implementing derivation isolation, hopefully any solution to problems like this could be transparent to people who had written build systems in Nix and not require significant rewrites...?

        Even so, incremental builds wouldn't be an all-or-nothing scenario. A project could be divided up into a number of separately buildable partitions, each built as a separate derivation. Deciding how granular to make your incremental..ism would just be a tradeoff of overhead vs. amount of rebuilding saved.

philip1209 8 years ago

When we used Bazel at Staffjoy, we had two main issues:

(1) NPM dependency management was hard. We ended up committing built dist files, like index.html

(2) We ended up duplicating dependencies between Bazel and Glide because tools like goimports and linters could not read the Bazel remote dependencies system.

We open-sourced the repo with Bazel when we shut down -> http://github.com/staffjoy/v2

Aissen 8 years ago

It added a dependency on the JVM. (I probably have an overly painful past with the JVM in production.) Suffice it to say we had to bump up the RAM on the build machines.

This alone is a reason not to use Bazel. It's designed with the enterprise in mind, where JVMs are common and sharing a common build environment across a team is easy (or at least, the setup time isn't an issue).

For a community project, this type of thing wouldn't fly. There's a reason Meson is winning the latest build-systems war: it only depends on python3 & ninja.

  • mkobit 8 years ago

    > There's a reason Meson is winning the latest build-systems war:

    What do you mean by winning? I haven't heard of Meson until the last few weeks, and I haven't seen it consumed by any open source projects on GitHub, although my search biases more towards JVM things anyways.

    > it only depends on python3 & ninja

    I don't see how depending on both Python3 and Ninja is any fewer dependencies than the JVM. Python3 isn't really available by default on most distributions (maybe that is changing soon), so there is still the overhead of installing it, right? Public CI systems like Travis CI and Circle CI have images that make having a JDK easy.

    • Aissen 8 years ago

      See https://youtu.be/gHdTzdXkhRY?t=8m4s for a list of high-profile projects that have moved or are moving to meson.

      While my parent comment is over-simplifying, the talk goes into more details on the strengths of Meson.

      Also, I'm not saying there's no open-source community around JVM-based projects. Just that adding it as a dependency is a very expensive decision to make. Python3 is pre-installed in major distros (e.g. in Fedora/Ubuntu/Arch), and the binary package for ninja is about 300K installed here. Python3 is about 10M; openjdk is about 100M. Then the runtime requirements the article talks about add up.

      • mkobit 8 years ago

        I wasn't aware that Python3 was pre-installed on major distros; I don't believe it was by default on my Ubuntu 16.04. It is possible I removed it, though. It does look like 18.04 is going to bundle Python 3 [1]. I agree with you that adding the JVM is an expensive decision, and the runtime requirements can make using it in real projects a big pain.

        Thanks for the video, I'll check it out. It is exciting (and I would also say a bit worrying) to see a lot of competing tools in this area.

        [1] https://wiki.ubuntu.com/Python/Python36Transition

        • jkaplowitz 8 years ago

          2018 and 2019 is when you'll start to see all the distros with an enterprise support lifecycle (RHEL, Debian, Ubuntu LTS, SLES, probably others I'm forgetting) start to move more heavily to Python 3, since Python 2's EOL is in 2020. As your link notes, Ubuntu and Debian are trying to make their next long-term supported stable releases use 3.x as the default Python.

          Looking to the following iteration: any distro releasing in 2020 with Python 2.7 as default and a support lifecycle greater than 6-9 months, or releasing after 2020 regardless of lifecycle, will be irresponsible. I doubt any of the major ones will overlook this, not even those which target hobbyists instead of enterprises.

          (Disclaimer: While I am a Debian developer, I have no personal involvement in this transition for Debian or any other distro.)

ggambetta 8 years ago

> Hope is a strategy [...] At some point we have to hope and assume. For example, eventually we hope the compiler authors did a good job with the next version we are about to use, or we assume that the kernel fix was good.

Or you can do a very slow, controlled rollout of the new version and see what happens. With all the systems I worked with while at Google, both as a SRE and a SWE, whenever we had a new version to release, we'd update one task in one cluster, let it run for a day or two, check the logs and the metrics... if it was OK, update one job in one cluster, then repeat the process with an entire cluster (the designated canary cluster), and only then release to the rest of the clusters. If any of these went wrong, we'd rollback or patch, depending on the severity. We rarely missed our weekly releases.

I'm sure the same applies to a new compiler version or a new kernel fix. You don't need to assume anything.

  • danmux 8 years ago

    hmm, yes, this was always going to be a contentious point, and highly subjective. I think the main issue here is that in large part the impact was also very human, which is much harder to measure. I think it is fair to say that our build system at the time was more rigid, and fully testing both flows was not within our grasp. I discussed this point about 'hope' with a colleague, and I think I agree with his conclusion: "Google actually designs some of the chips it uses! And runs its own power stations - I think if anyone could legitimately say "Hope is not a Strategy" it might be them."

sarabande 8 years ago

Thanks for the comments on Bazel -- I am evaluating a proper build system for us right now for a data science codebase almost entirely in Python, with a lot of scientific computing dependencies. Bazel doesn't yet seem to have first-class support for Python, so that's out, and I'm running into a lot of problems with Twitter Pants, although their Slack channel is really helpful.

Did you ever try using Pants or Buck, or a more barebones approach like Ninja?

P.S.: An explanation of the pun on "Basil Fawlty" (as a non-Brit, I thought, what on earth is that?) in the article might be helpful.

erain 8 years ago

For your docker container needs: "Bazel should in theory be able to decide if a docker image should be rebuilt"

You can check out Bazel's docker rules: https://github.com/bazelbuild/rules_docker

Also, Go already has a very good build system built in, and Bazel really shines when:

- you have a complex codebase with multiple languages: one can build a tool in one language and use it as a tool for another language
- you simply have a really large codebase
- you can work on the path //service1/my/service while your colleague works on //service2/their/service, and only the paths that changed need to be rebuilt
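
To make that path-based isolation concrete, a hedged sketch with rules_go (all names are illustrative):

```
# service1/my/service/BUILD.bazel
load("@io_bazel_rules_go//go:def.bzl", "go_binary", "go_library")

go_library(
    name = "service_lib",
    srcs = ["main.go"],
    importpath = "github.com/example/monorepo/service1/my/service",
    deps = ["//common/logging"],  # hypothetical shared library
)

go_binary(
    name = "service",
    embed = [":service_lib"],
)
```

Only changes under //service1/my/service or its declared deps invalidate this target, which is what lets two teams build and test independently inside one repo.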

  • leg100 8 years ago

    It eschews docker and Dockerfiles (and their non-determinism), builds the layers itself, and uses a python lib [1] to push and pull container images.

    In order to ensure reproducibility/determinism, however, Bazel doesn't have an equivalent of the RUN instruction. You have to use other Bazel rules to fetch and produce artefacts and add them to an image, and there aren't always rules for what it is you want to do (I have a spot of bother with installing pip packages, for which there is apparently alpha support).

    This is, I think, the thing with Bazel: it has to re-invent everything to ensure this reproducibility and hermetic seal, because it doesn't trust existing build tools to do this, and quite rightly so if this is what you're seeking. But I suspect it's going to be painful doing things outside the mainstream supported stuff.

    [1] https://github.com/google/containerregistry
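
    To illustrate the difference from a Dockerfile, a hedged sketch of a rules_docker target (attributes abbreviated; names illustrative): there is no RUN instruction, so every layer's contents must come from other Bazel rules.

    ```
    load("@io_bazel_rules_docker//container:container.bzl", "container_image")

    container_image(
        name = "service_image",
        base = "@distroless_base//image",   # fetched by digest, not a mutable tag
        files = ["//service1/my/service"],  # artefacts built by other Bazel rules
        entrypoint = ["/service"],
    )
    ```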

    • cyphar 8 years ago

      > It eschews docker, and Dockerfiles and their non-determinism, and builds the layers itself [...] But I suspect it's going to be painful doing things outside the mainstream supported stuff.

      This should not matter at all. The OCI specification provides an interoperable image format that doesn't care who built the image or how. Docker still doesn't support OCI[1], but tools exist to do conversions[2]. To shamelessly plug my own projects: you can use umoci[3] as a completely unprivileged user to construct images (without using containers, if you don't want to), and the resulting images are still usable with Docker (once you convert them from OCI images). This is how we build the official openSUSE images, and it doesn't affect any user of the images that we chose not to use Docker or Dockerfiles to build them.

      [1]: https://github.com/moby/moby/pull/33355 [2]: https://github.com/projectatomic/skopeo [3]: https://github.com/openSUSE/umoci

      • leg100 8 years ago

        > This should not matter at all. The OCI specification provides an interoperable image format that doesn't care who built the image or how

        The OCI doesn't, but the build tool does care about how the image was built, because of reproducibility etc. But build tools will be able to leverage the kind of stuff you're doing, rather than doing it themselves (like Bazel does).

        - umoci looks pretty cool, this is how things should have been done in the first place.

    • erain 8 years ago

      disclaimer: I am one of the rules_docker contributors :D

      You are right about the pain that there is no `RUN` equivalent (yet), and I think someone is working on that.

      Regarding producing the docker image itself: given Open Container Initiative[1] and its container image spec[2], it is a good thing that multiple tools can produce exchangeable container images and bazel is one of them with its own reproducible properties.

      [1] https://www.opencontainers.org/ [2] https://github.com/opencontainers/image-spec/blob/v0.2.0/ser...

pvg 8 years ago

What's the context for this - as in, the 'we' in question and the thing they were building with bazel? And are they really posting from March 16, 2018?

  • danmux 8 years ago

    I probably should have made the context clearer earlier in the post - later on I mention :

    "To set the scene, our codebase is mainly Go, including vendor directories, and a considerable amount of data compiled in, we have 2.5M lines of code, a full build from a clean clone on one of our Jenkins slaves takes 1 minute.

    We had between 8 and 20 engineers working on that codebase. Importantly we all develop on Mac, but run in production in linux."

  • ndh2 8 years ago

    Well, one section is suspiciously titled Back to the Future.

meanmrmustard92 8 years ago

After Python and Bazel, I fully expect a programming framework named 'Wanda' to crop up somewhere. Cleese would be so confused.

kemenaran 8 years ago

Interesting to have feedback on Bazel usage outside of Google. Bazel looks like a great tool, but it's good to see the community starting to explore its sweet spots and drawbacks.
