Wednesday, September 15, 2021

Small Project Build Systems

Summary: Forward build systems might work better for small projects.

Yesterday's post talked about how Shake is a good medium sized build system - but what about smaller projects? Is the Shake model right for them? Shake can be considered a backwards build system. Each rule says how to produce a file, given some input files (which are dependencies) and an action. Almost all build systems (e.g. Make, Buck, Bazel, CMake, SCons, Ninja) fit this model, which is analysed in the Build Systems a la Carte paper. While this model works, it has two disadvantages:

  • You have to explicitly list dependencies, or infer them from include files etc. That means either dependencies are insufficient (you probably forgot some), or they are excessive (you added some you don't need). Usually both.
  • You have to think backwards. When you ask someone how to build an executable from a C file, no one talks about linking first, but to program a build system you have to.

The alternative to a backwards build system is a forwards build system, of which Memoize was the first. You just write out the commands in order, and dependency tracing figures out if they have changed. To compile a C program it can be as simple as:

gcc -c util.c
gcc -c main.c
gcc -o main main.o util.o

That build script is incredibly simple - so simple it could also be treated as a shell script.

A few years ago I wrote such a system, called Rattle, and wrote a paper about it at OOPSLA 2020 with my co-authors Sarah Spall and Sam Tobin-Hochstadt. Sarah gave a talk about Rattle at OOPSLA, and I gave a talk at Build Meetup 2021. We were able to compile projects like NodeJS faster than the NodeJS build system (which uses Make), showing the idea might be feasible.

If forward build systems are so great, why do I think they are most suitable for small projects? There are four reasons, the first three of which have mitigations, but the final one sets a limit on the size at which forward build systems are suitable.

  1. Forward build systems rely on tracing which files are dependencies of a command. Doing that quickly in a cross-platform manner is a nightmare. There are tricks like hooking system calls etc, but it presents a significant engineering hurdle, especially on MacOS, which makes this task harder with every release.
  2. Forward build systems are immature. The earliest examples no longer work. Rattle is a relatively practical research system - it could evolve into a production system - but it's not there yet. And compared to the alternatives, Rattle is probably one of the closest to production, in large part because it builds off a lot of common infrastructure from Shake which is much more mature.
  3. Forward build systems lack parallelism, since if you want to express parallelism, you need to think about dependencies once more, and it's easy to go wrong. Rattle mostly solves the parallelism by automatically inferring when it is safe to parallelise, which is how we were able to remain competitive with Make.

And finally, the biggest issue is that forward build systems are not compositional, while backward build systems are. If you want to write a 1 million build rule system, in a backwards system, each rule looks like any other. Whereas in a forward build system, assuming you need to give an order, writing down that order in a compositional way is hard - in fact, whenever I've tried it, you start expressing the dependencies between entries and end up with a backwards build system.

Happily, people are continuing to research forward build system. Rattle adds parallelism, Stroll removes the requirement for an order, Fac allows some dependencies and infers the remaining ones, LaForge finds greater incrementality. Perhaps all those ideas can be combined, along with a lot of engineering, to produce a practical forward build system.

Rattle has shown a well engineered forward build system would be feasible for small projects. It's unclear how much larger the concept might be able to scale, probably never to millions of files, but for small projects it might provide a significantly lower effort path to writing build systems.


Steven Shaw said...

What I like very much about forward build systems is that you don't have to have special build rules. I'm thinking of Bazel here. It seems to have the property of needing to emulate how the native tooling does builds. There's Gazelle, which tries to paper over this by generating the Bazel rules. However, there's the problem that the native tooling keeps changing/improving and all the Gazelle generators (or Bazel rules) need to be updated accordingly. This is a _huge_ maintenance problem which leaves a lot of infidelities.

If I understand correctly, forward build systems could accelerate builds that keep using their native tooling. So, C programs could keep using Make or CMake and Haskell programs could keep using Cabal or Stack or `ghc --make`. This seems ideal.

So, I'm super disappointed to hear that this might not scale to large software systems (life's too short to be maintaining build rules). What can be done to help?

Neil Mitchell said...

It's just a hypothesis that it might not scale to larger software systems. I think the biggest issue with forward build systems is no one uses them in real life, so we haven't got the practical experience we need to figure out how they impact users. Users might have a very different opinion on their weaknesses and strengths.

Steven Shaw said...

Yeah, perhaps it's too good to be true :). It would be great to have some large-scale trials.

Anonymous said...