EngFlow Customers & Friends Meetup
In April, Ulf and I, and a few of our EngFlow team members, traveled to San Francisco to visit and work with our customers. In the spirit of one of EngFlow’s LEAP values (Loyalty – making customers and employees better), we decided to host a workshop aimed at helping developers write better custom Bazel rules. With the help of our friends at Mux (and their high tech/high vibe office space), we were able to extend the audience to up to 40 engineers, and invited our customers and friends in the SF Bazel community. This is how the First Quarterly EngFlow Customers & Friends Meetup was born!
I had asked the attendees what they would like to learn about, and one of the suggestions was “non-Googleable information about writing custom Bazel rules”. Given that Ulf, EngFlow co-founder & CTO, spent 11 years building and leading Bazel development at Google, this was a fun challenge that he accepted, and the talk came together in just a few days. You can watch a recording of it here.
The talk focused on an overview of Bazel’s language-independent rules & actions model, how to write rules, what isn’t possible yet, what patterns to avoid, and several advanced topics.
- Rules, macros, and functions all share the same syntax, but they mean different things to Bazel.
- Function: called while evaluating a build file
- Macro: a function that can call other functions and rules
- Rule: has a special evaluation model and creates a node in the build graph
- Custom rules
- Before writing your own rules, check bazel.build/rules and awesomebazel.com – the rules you are looking for likely already exist!
- Phase model for rules: loading, analysis, execution
- The phase model is accurate as a conceptual model, but Bazel does not run the phases strictly one after another. It can interleave them: for example, it can load one build file while analyzing another in parallel, and it can even execute rule actions at the same time.
- Actions are not automatically executed; which actions are executed depends on the top-level targets and the selected output groups. An action whose outputs are not needed, directly or transitively, for the requested files (the depset of the selected output groups) won’t get executed.
- A provider is passed along by the rules in the analysis phase. It’s an object that carries information (e.g. compiler, classpath, coverage), represented as a struct with key-value pairs, and is defined by the rule’s author. If you have multiple rules that need to interact with each other (e.g. a Kotlin rule that depends on a Java rule), then they need to agree on a provider and on the semantics of that provider. For JVM languages the provider could carry a classpath, used to compile Kotlin, Java or Scala code. Providers are the mechanism that lets rules interoperate.
- Cquery is a Bazel query command that’s more precise than Bazel Query, because it is based on the results of the analysis phase. However, this means the analysis phase has to run first. Analysis is also platform dependent, and you can only query one platform at a time.
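The note above about which actions get executed can be illustrated with a small model. The following is a minimal Python sketch, not Bazel code and not how Bazel is implemented: the `Action` class, the `actions_to_execute` function, and all file names are hypothetical and exist only for illustration. It assumes every output file is produced by at most one action and walks backwards from the requested files to find the actions that must run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for a registered action: reads inputs, writes outputs."""
    name: str
    inputs: tuple
    outputs: tuple


def actions_to_execute(actions, requested_outputs):
    """Return the names of the actions needed for requested_outputs, dependencies first."""
    # Map each generated file to the action that produces it.
    producer = {out: action for action in actions for out in action.outputs}
    ordered = []
    seen = set()

    def visit(path):
        action = producer.get(path)
        if action is None or action.name in seen:
            return  # a source file, or an action that is already scheduled
        seen.add(action.name)
        for inp in action.inputs:
            visit(inp)  # schedule whatever produces this input first
        ordered.append(action.name)

    for path in requested_outputs:
        visit(path)
    return ordered


# Illustrative action graph: "docs" is registered but nothing requests its output.
ACTIONS = [
    Action("compile", ("lib.c",), ("lib.o",)),
    Action("archive", ("lib.o",), ("lib.a",)),
    Action("docs", ("lib.c",), ("lib.html",)),
]

print(actions_to_execute(ACTIONS, ["lib.a"]))  # ['compile', 'archive']
```

In this model, requesting only `lib.a` never schedules the `docs` action, even though it is registered. That is the behavior described above for outputs that are not part of the requested files.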
After the workshop we had a few breakouts on language-specific topics with informative and honest discussions over tacos and beers! Here is what we learned from the community about the best practices and challenges with various Bazel rules.
C++ and Go
Companies working on autonomous driving software, like Tesla, Aurora and Blue River Technology, deeply care about C++ build performance, so we discussed a few interesting opportunities.
- The include scanning feature, exposed in Bazel by EngFlow engineer Benjamin Peterson, provides up to a 70% reduction in unnecessary include files, which in our experience can lead to up to a 2x performance boost. Here are the key takeaways from that discussion:
- The best practice is to have fine-grained libraries, with one .cc file and its associated .h file per cc_library.
- Sometimes you end up with coarse-grained cc_library rules, which results in more header files than necessary being passed to each compile action, and therefore in slower builds.
- In those cases, a better trade-off may be to enable include scanning, which will scan the code and extract only the relevant list of .h files, so that you end up with fewer header files per action. This makes actions smaller and builds faster.
- It comes with some trade-offs. It may not be as impactful on a local build, and the more significant benefit is generally expected with remote caching and remote execution.
- In practice, the benefit is sometimes larger for local builds and sometimes for remote ones. Keith, Envoy CI lead, enabled include scanning for Envoy right after the meetup and got some early results: locally it improved build performance by ~9% on an M1 machine and saved ~22 GB (~25%) of header setup. However, it didn’t result in savings with remote execution.
- Some developers enforce reproducibility by running Bazel inside a Docker container, and asked how that performs with remote execution. One of our customers mentioned that EngFlow handles this automatically by running remote actions inside Docker containers.
- For deeply nested header includes, Clang modules may help. Bazel supports module compilation, but not out of the box; you need to provide a custom toolchain where you specify which compiler flags are required to enable module compilation.
- We were surprised to hear that many companies are using Gazelle for automatic build file generation, beyond Go and especially for C++ projects.
- As there is no built-in Gazelle extension for C++, everyone is developing their own. To some degree this is necessary, because there are few strong conventions for include paths across C++ code bases, so any tool that works everywhere may require a lot of configuration. OpenCV and ffmpeg were mentioned as particularly challenging dependencies.
- Bazel favors paths relative to the repository root directory, and Gazelle will work best with that, too.
- Gazelle’s performance is a bit slow, but there is room for improvement in Gazelle itself (e.g. parallelization, caching).
- Autogazelle is one of the most frequently used tools to watch for file system changes and update build files automatically.
- Android rules are currently built into Bazel, which results in companies forking Bazel to make changes to their Android rules. Google is migrating the Android rules to Starlark, and they will be available in the rules_android repo.
- As a result, there is a significant demand for Google to review and merge PRs related to Android rules from contributors across a wide range of companies, including Snap, Lyft and Square.
- After the meetup we followed up with Google, and are collaborating on bringing the community feedback and improving the contributor experience.
- Ulf shared performance results from our new dynamic scheduler, which is based on a model of cluster load and automatically adjusts action scheduling across the cluster.
- With engineers from companies like Tesla and Adobe at the table, we discussed the challenges of using various Node.js and TypeScript rules, as well as energy-efficient houses!
- One of the attendees successfully migrated part of their company’s TypeScript repo to Bazel with remote execution, saw a significant performance improvement, and documented their process for other companies to learn from.
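Returning to the include scanning discussion in the C++ notes above, the core idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Bazel’s implementation: it only follows quoted `#include "..."` lines with repository-relative paths, and it ignores macros, conditional compilation, angle-bracket includes, and include search paths. The `scan_includes` function and the file names are hypothetical.

```python
import re

# Matches lines such as:  #include "app/util.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def scan_includes(entry, files):
    """Return the set of headers transitively included by `entry`.

    `files` maps a repository-relative path to the text of that file.
    """
    needed = set()
    stack = [entry]
    while stack:
        current = stack.pop()
        for header in INCLUDE_RE.findall(files.get(current, "")):
            if header in files and header not in needed:
                needed.add(header)
                stack.append(header)  # follow nested includes
    return needed


# Illustrative coarse-grained library: three headers declared, two actually used.
FILES = {
    "app/main.cc": '#include "app/util.h"\nint main() { return 0; }\n',
    "app/util.h": '#include "app/types.h"\n',
    "app/types.h": "// no includes\n",
    "app/unused.h": '#include "app/types.h"\n',
}

declared = {path for path in FILES if path.endswith(".h")}
needed = scan_includes("app/main.cc", FILES)

print(sorted(needed))             # ['app/types.h', 'app/util.h']
print(sorted(declared - needed))  # ['app/unused.h']
```

Here the compile step for `app/main.cc` would need two of the three declared headers. Dropping the rest is what makes each action smaller in the coarse-grained case described above.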
Java and Scala
- Engineers from Sony, Databricks, EngFlow and other companies discussed being generally happy with building Java with Bazel.
- Combining Kotlin and Java in a project may present problems with compilers resolving cyclic dependencies within a target, so several engineers recommend using either Kotlin or Java, but not both, in any given target.
- Our customers commented on seeing a significant performance improvement from persistent workers, which were designed and implemented by Ulf. This is critical for achieving good performance with Java, Scala and Kotlin projects, especially with remote execution.
We stayed way past our scheduled time, with people reconnecting with their peers and friends. And, in what is now a new EngFlow tradition for these packed customer visits, we ended up having a second dinner!
The workshop covered several advanced topics, and attendees expressed an interest in learning more about them:
- How to not write rules (performance, memory use)
- Configurations, Platforms & Toolchains
- Test rules (test log, test.xml, undeclared outputs, test warnings, test infrastructure failures, runfiles, sharding + runs per test, coverage)
- Persistent workers (when to use them, how to write them, proto vs. json api, multiplex workers, remote persistent workers)
This means we already have a great reason to meet again. We hope to see you at the Second Quarterly EngFlow Customers & Friends Meetup. Location and time to be confirmed based on your input.