Thursday, September 16, 2021

Huge Project Build Systems

Summary: Shake won't scale to millions of files, this post says what would be required to make it do so.

While Shake has compiled projects with hundreds of thousands of files, it's never scaled out to millions of files, and it would be unlikely to work well at that size. The most popular build systems that operate at that scale are Buck (from Facebook) and Bazel (from Google). In this post I go through the changes that would need to be made to make Shake scale.

The first issue is covered in my previous post, that Shake doesn't know if you change the build rules themselves. As you scale up, it becomes much more important that if you change the rules, everything is properly tracked. As the number of people involved in a project increases, the rate at which the build system changes will also increase. Both Buck and Bazel solve this problem using a deterministic Python-based configuration language called Starlark. If Shake stopped being a Haskell DSL, I'd argue that it stops being Shake and becomes something different, so it's unclear what could be done there.

The next issue is that every time Shake is invoked, it checks the modification time of every file, and then walks the entire dependency graph. That works fine at 10K files, but as you move to 1M files, it takes too long. The solution is two-fold, first be informed which files have changed using notification APIs (e.g. the Watchman tool), and then use reverse dependencies to only explore the portion of the graph that has changed. Happily, Pepe already has a patch adding reverse dependencies to Shake, so that isn't too infeasible.

The final issue is that Shake was designed as a single-machine build system, not for sharing results between users. When I first wrote Shake, I didn't have access to servers, and AWS was brand new. Now, over a decade later, servers are easy to obtain and large scale build systems need to share results, so that if one user builds a file, no one else needs to. Within the realm of multi-user build systems, there are two basic operations - sharing results and offloading commands.

Shake, with it's new cloud features, is able to share results between users using a shared drive. It works, and big companies are using it for real, but I'd consider it fairly experimental. For execution, Shake is unable to run actions remotely, so can't make use of something like Bazel's remote execution API. Since dependencies are specified at the rule level, and remote execution operates at the command level, there is a bit of a mismatch, and it's unclear what that might look like in Shake.

While Shake won't work at huge scales, it is still quite an effective build system at quite large scales. But, given the limitations, I imagine it will never get to the scale of Buck/Bazel. At the same time, Buck/Bazel lack dynamic dependencies, which makes them unable to express rules such as Haskell effectively.

Happily, I am involved with a new build system, the next generation of Buck. I joined Facebook two years ago, and since that time have been focused on this project. It's written in Rust, configured with Starlark (I've spent a lot of time working on an open-source Starlark interpreter in Rust), and should work at huge scales. It's not yet open source, but it will be - we are targeting early next year.

I think Shake is still a build system with a lot to offer, and continue to maintain and improve it. For people who want to scale beyond the range of Shake, I'd definitely recommend using the next generation of Buck, once it is available.

Wednesday, September 15, 2021

Small Project Build Systems

Summary: Forward build systems might work better for small projects.

Yesterday's post talked about how Shake is a good medium sized build system - but what about smaller projects? Is the Shake model right for them? Shake can be considered a backwards build system. Each rule says how to produce a file, given some input files (which are dependencies) and an action. Almost all build systems (e.g. Make, Buck, Bazel, CMake, SCons, Ninja) fit this model, which is analysed in the Build Systems a la Carte paper. While this model works, it has two disadvantages:

  • You have to explicitly list dependencies, or infer them from include files etc. That means either dependencies are insufficient (you probably forgot some), or they are excessive (you added some you don't need). Usually both.
  • You have to think backwards. When you ask someone how to build an executable from a C file, no one talks about linking first, but to program a build system you have to.

The alternative to a backwards build system is a forwards build system, of which Memoize was the first. You just write out the commands in order, and dependency tracing figures out if they have changed. To compile a C program it can be as simple as:

gcc -c util.c
gcc -c main.c
gcc -o main main.o util.o

That build script is incredibly simple - so simple it could also be treated as a shell script.

A few years ago I wrote such a system, called Rattle, and wrote a paper about it at OOPSLA 2020 with my co-authors Sarah Spall and Sam Tobin-Hochstadt. Sarah gave a talk about Rattle at OOPSLA, and I gave a talk at Build Meetup 2021. We were able to compile projects like NodeJS faster than the NodeJS build system (which uses Make), showing the idea might be feasible.

If forward build systems are so great, why do I think they are most suitable for small projects? There are four reasons, the first three of which have mitigations, but the final one sets a limit on the size at which forward build systems are suitable.

  1. Forward build systems rely on tracing which files are dependencies of a command. Doing that quickly in a cross-platform manner is a nightmare. There are tricks like hooking system calls etc, but it presents a significant engineering hurdle, especially on MacOS, which makes this task harder with every release.
  2. Forward build systems are immature. The earliest examples no longer work. Rattle is a relatively practical research system - it could evolve into a production system - but it's not there yet. And compared to the alternatives, Rattle is probably one of the closest to production, in large part because it builds off a lot of common infrastructure from Shake which is much more mature.
  3. Forward build systems lack parallelism, since if you want to express parallelism, you need to think about dependencies once more, and it's easy to go wrong. Rattle mostly solves the parallelism by automatically inferring when it is safe to parallelise, which is how we were able to remain competitive with Make.

And finally, the biggest issue is that forward build systems are not compositional, while backward build systems are. If you want to write a 1 million build rule system, in a backwards system, each rule looks like any other. Whereas in a forward build system, assuming you need to give an order, writing down that order in a compositional way is hard - in fact, whenever I've tried it, you start expressing the dependencies between entries and end up with a backwards build system.

Happily, people are continuing to research forward build system. Rattle adds parallelism, Stroll removes the requirement for an order, Fac allows some dependencies and infers the remaining ones, LaForge finds greater incrementality. Perhaps all those ideas can be combined, along with a lot of engineering, to produce a practical forward build system.

Rattle has shown a well engineered forward build system would be feasible for small projects. It's unclear how much larger the concept might be able to scale, probably never to millions of files, but for small projects it might provide a significantly lower effort path to writing build systems.

Tuesday, September 14, 2021

Reflecting on the Shake Build System

Summary: As a medium-sized build system, Shake has some good bits and some bad bits.

I first developed the Shake build system at Standard Chartered in 2008, rewriting an open source version in my spare time in 2011. I wrote a paper on Shake for ICFP 2012 and then clarified some of the details in a JFP 2020 paper. Looking back, over a decade later, this post discusses what went well and what could be improved.

The first thing to note is that Shake is a medium sized build system. If you have either 1 source file or 1 million source files, Shake probably isn't a good fit. In this post I'm going to go through how Shake does as a medium-sized build system, and two other posts reflect on what I think a small build system or huge build system might look like.

The most important thing Shake got right was adding monadic/dynamic dependencies. Most build systems start with a static graph, and then, realising that can't express the real world, start hacking in an unprincipled manner. The resulting system becomes a bunch of special cases. Shake embraced dynamic dependencies. That makes some things harder (no static cycle detection, less obvious parallelism, must store dependency edges), but all those make Shake itself harder to write, while dynamic dependencies make Shake easier to use. I hope that eventually all build systems gain dynamic dependencies.

In addition to dynamic dependencies, Shake has early cut-off, meaning files that rebuild but don't change don't invalidate rules that depend upon them. This feature is something increasingly becoming standard in build systems, which is great to see.

Shake is written as a Haskell DSL, which means users of Shake are writing a Haskell program that happens to heavily leverage the Shake library. That choice was a double-edged sword. There are some significant advantages:

  • I didn't need to invent a special purpose language. That means I get to reuse existing tooling, existing learning materials, and existing libraries.
  • Since Shake is just a library, it can be documented with the normal tools like Haddock.
  • Users can extend Shake using Haskell, and publish libraries building on top of Shake.
  • The modelling of monadic dependencies in Haskell is pretty elegant, given a dedicated syntax for expressing monadic computations (the do keyword).
  • Projects like Haskell Language Server can build on top of Shake in fairly fundamental ways. See our recent IFL 2020 paper for the benefits that brought.

But there are also some downsides:

  • Most significantly, the audience of Shake is somewhat limited by the fact that Shake users probably have to learn some Haskell. While the user manual aims to teach enough Haskell to write Shake without really knowing Haskell, it's still a barrier.
  • Haskell has some significant weaknesses, e.g. it has two competing package managers, countless distribution mechanisms, and none of these things are consistent for long periods of time. Haskell has a poor on-ramp, and thus so does Shake.

The choice of an embedded DSL for Shake also leads to the issue that Shake doesn't know when a rule has changed, since a rule is opaque Haskell code. As a consequence, if you modify a command line in a .hs file, Shake is unaware and won't rebuild the necessary files. There are a bunch of techniques for dealing with this limitation (see the Shake functions shakeVersion, versioned), but none are pleasant, and it remains an easy mistake to make. A potential way out is to build a system which reads configuration files not in Haskell and interprets them, which I gave a talk about, and I've seen deployed in practice. But it's something where each user ends up rolling their own.

Another limitation is that Shake is (deliberately) quite low-level. It gives you a way to depend on a file, and a way to run a command line. It doesn't give you a way to express a C++ library. The hope from the beginning was that Shake would be language neutral, and that libraries would arise that built on top of Shake providing access to standard libraries. If you were writing a Python/C++/Ruby build script, you'd simply import those libraries, mix them together, and have a working build system. There are libraries that have gone in that direction, the libraries shake-language-c and shake-cpp provide C++ rules, avr-shake lets you work with AVR Crosspack. Unfortunately, there aren't enough libraries to just plug together a build system. I think a fundamental problem is that it's not immediately obvious how such libraries would compose, and without that composability, it's hard to build the libraries that would rely on composability.

Finally, the most surprising aspect about developing Shake is that a large part of the effort has gone into writing an ergonomic and cross-platform process executor. The result is found at Development.Shake.Command, and can be used outside Shake, letting users write:

cmd "gcc -c" [src]

This example invokes gcc, ensuring that src is properly escaped if it has spaces or other special characters. A significant amount of the engineering work in Shake has gone into that facility, when it's totally orthogonal to the build system itself.

In the next two parts of this series, I'll go through why I think Shake isn't a great build system for tiny projects (and what might be), followed by why Shake isn't great for absolutely huge projects (and what would need to be fixed).

Sunday, January 17, 2021

Recording video

Summary: Use OBS, Camo and Audacity.

I recently needed to record a presentation which had slides and my face combined, using a Mac. Based on suggestions from friends and searching the web, I came up with a recipe that worked reasonably well. I'm writing this down to both share that recipe, and so I can reuse the recipe next time.

Slide design: I used a slide template which had a vertical rectangle hole at the bottom left so I could overlay a picture of my video. It took a while to find a slide design that looked plausible, and make sure callouts/quotes etc didn't overlap into this area.

Camera: The best camera you have is probably the one on your phone. To hook up my iPhone to my Mac I used a £20 lightning to USB-C cable (next day shipping from Apple) along with the software Camo. I found Camo delightfully easy to use. I paid £5 per month to disable the logo and because I wanted to try out the portrait mode to blur my background - but that mode kept randomly blurring and unblurring things in the background, so I didn't use it. Camo is useful, but I record videos infrequently, and £5/month is way too steep. I'm not a fan of software subscriptions, so I'll remember to cancel Camo. Because it is subscription based, and subscribing/cancelling is a hassle, I'll probably just suck up the logo next time.

Composition: To put it all together I used OBS Studio. The lack of an undo feature is a bit annoying (click carefully), but otherwise everything was pretty smooth. I put my slide deck (in Keynote) on one monitor, and then had OBS grab the slide contents from it. I didn't use presentation mode in Keynote as that takes over all the screen, so I just used the slide editing view, with OBS cropping to the slide contents. One annoyance of slide editing view is that spelling mistakes (and variable names etc.) have red dotted underlines, so I had to go through every slide and make sure the spellings were ignored. Grabbing the video from Camo into OBS was very easy.

Camera angle: To get the best camera angle I used a lighting plus phone stand (which contains an impressive array of stands, clips, extensions etc) I'd already bought to position the camera right in front of me. Unfortunately, putting the camera right in front of me made it hard to see the screen, which is what I use to present from. It was awkward, and I had to make a real effort to ensure I kept looking into the camera - using my reflection on the back of the shiny iPhone to make sure I kept in the right position. Even then, watching the video after, you can see my eyes dart to the screen to read the next slide. There must be something better out there - or maybe it's only a problem if you're thinking about it and most people won't notice.

Recording: For actual recording there are two approaches - record perfectly in one take (which may take many tries, or accepting a lower quality) or repeatedly record each section and edit it together after. I decided to go for a single take, which meant that if a few slides through I stumbled then I restarted. Looking at my output directory, I see 15 real takes, with a combined total of about an hour runtime, for a 20 minute talk. I did two complete run throughs, one before I noticed that spelling mistakes were underlined in dotted red.

Conversion to MP4: OBS records files as .mkv, so I used VLC to preview them. When I was happy with the result, I converted the file to .mp4 using the OBS feature "Remux recordings".

Audio post processing: After listening to the audio, there was a clear background hum, I suspect from the fan of the laptop. I removed that using Audacity. Getting Audacity to open a .mp4 file was a bit of an uphill struggle, following this guide. I then cleaned up the audio using this guide, saved it as .wav, and reintegrated it with the video using ffmpeg and this guide. I was amazed and impressed how well Audacity was able to clean up the audio with no manual adjustment.

Sharing: I shared the resulting video via DropBox. However, when sharing via DropBox I noticed that the audio quality was significantly degraded in the DropBox preview on the iOS app. Be sure to download the file to assess whether the audio quality is adequate (it was fine when downloaded).

Sunday, November 15, 2020

Data types for build system dependencies

Summary: Monadic and early cut-off? Use a sequence of sets.

In the Build Systems a la Carte paper we talk about the expressive power of various types of build systems. We deliberately simplify away parallelism and implementation concerns, but those details matter. In this post I'm going to discuss some of those details, specifically the representation of dependencies.

Applicative build systems

In an applicative build system like Make, all dependencies for a target are known before you start executing the associated action. That means the dependencies have no ordering, so are best represented as a set. However, because they can be calculated from the target, they don't usually need to be stored separately. The dependencies can also be evaluated in parallel. To build a target you evaluate the dependencies to values, then evaluate the action.

Early cut-off is when an action is skipped because none of its dependencies have changed value, even if some dependencies might have required recomputing. This optimisation can be incredibly important for build systems with generated code - potentially seconds vs hours of build time. To obtain early cut-off in applicative systems, after evaluating the dependencies you compare them to the previous results, and only run the action if there were changes.

Monadic build systems

In monadic build systems like Shake, the representation of dependencies is more complex. If you have an alternative mechanism of detecting whether a rule is dirty (e.g. reverse dependencies) you don't need to record the dependencies at all. If the key is dirty, you start executing the action, and that will request the dependencies it needs. The action can then suspend, calculate the dependencies, and continue.

If you want early cut-off in a monadic build system, you need to rerun the dependencies in advance, and if they all have the same result, skip rerunning the action. Importantly, you probably want to rerun the dependencies in the same order that the action originally requested them -- otherwise you might pay a severe and unnecessary time penalty. As an example, let's consider an action:

opt <- need "is_optimised"
object <- if opt then need "foo.optimised" else need "foo.unoptimised"
link object

This rule is monadic, as whether you need the optimised or unoptimised dependency depends on the result of calculating some is_optimised property. If on the first run is_optimised is True, then we build foo.optimised. On the second run, if is_optimised is False, it is important we don't build foo.optimised as that might take a seriously long time and be entirely redundant. Therefore, it's important when checking for early cut-off we build in the order that the previous action requested the dependencies, and stop on the first difference we encounter.

(If you have unlimited resources, e.g. remote execution, it might be profitable to evaluate everything in parallel - but we're assuming that isn't the case here.)

Provided a rule performs identically between runs (i.e. is deterministic and hasn't been changed), everything that we request to check for early cut-off will still be needed for real, and we won't have wasted any work. For all these reasons, it is important to store dependencies as a sequence (e.g. a list/vector).

Monadic build systems plus parallelism

Applicative build systems naturally request all their dependencies in parallel, but monadic build systems are naturally one dependency at a time. To regain parallelism, in build systems like Shake the primitive dependency requesting mechanism takes a set of dependencies that are computed in parallel. While requesting dependencies individually or in bulk gives the same result, in bulk gives significantly more parallelism. (In Shake we use lists to track correspondence between requests and results, but it's morally a set.)

As we saw previously, it is still important that for early cut-off you reproduce the dependencies much like they were in the action. That means you request dependencies in the order they were requested, and when they were requested in bulk, they are also checked in bulk. Now we have a sequence of sets to represent dependencies, where the elements of the sets can be checked in parallel, but the sequence must be checked in order.

Monadic build systems plus explicit parallelism

What if we add an explicit parallelism operator to a monadic build system, something like parallel :: [Action a] -> IO [a] to run arbitrary actions in parallel (which is what Shake provides). Now, instead of a sequence of sets, we have a tree of parallelism. As before it's important when replaying that the dependencies are requested in order, but also that as much is requested in parallel as possible.

What Shake does

Shake is a monadic build system with early cut-off, parallelism and explicit parallelism. When building up dependencies it uses a tree representation. The full data type is:

data Traces
    = None
    | One Trace
    | Sequence Traces Traces
    | Parallel [Traces]

Sequenced dependencies are represented with Sequence and the traces captured by parallelism use Parallel. Importantly, constructing Traces values is nicely O(1) in all cases. (Shake v0.19.1 used a different representation and repeatedly normalised it, which could have awful time complexity - potentially O(2^n) in pathological cases.)

While these traces store complete information, actually evaluating that trace when checking for rebuilds would be complicated. Instead, we flatten that representation to [[Trace]] for writing to the Shake database. The outer list is a sequence, the inner list is morally a set. We have the invariant that no Trace value will occur multiple times, since if you depend on something once, and then again, the second dependency was irrelevant. To flatten Parallel computations we take the first required dependency in each parallel action, merge them together, and then repeat for the subsequent actions. If you run code like:

parallel [
    need ["a"] >> parallel [need ["b"], need ["c"]]
    need ["d"]
]

It will get flattened to appear as though you wrote need ["a","d"] >> need ["b","c"]. When checking, it will delay the evaluation of b and c until after d completes, even though that is unnecessary. But simplifying traces at the cost of marginally less rebuild parallelism for those who use explicit parallelism (which is not many) seems like the right trade-off for Shake.

Conclusion

Applicative build systems should use sets for their dependencies. Monadic build systems should use sets, but if they support early cut-off, should use sequences of sets.

Monday, November 09, 2020

Turing Incomplete Languages

Summary: Some languages ban recursion to ensure programs "terminate". That's technically true, but usually irrelevant.

In my career there have been three instances where I've worked on a programming language that went through the evolution:

  1. Ban recursion and unbounded loops. Proclaim the language is "Turing incomplete" and that all programs terminate.
  2. Declare that Turing incomplete programs are simpler. Have non-technical people conflate terminate quickly with terminate eventually.
  3. Realise lacking recursion makes things incredibly clunky to express, turning simple problems into brain teasers.
  4. Add recursion.
  5. Realise that the everything is better.

Before I left university, this process would have sounded ridiculous. In fact, even after these steps happened twice I was convinced it was the kind of thing that would never happen again. Now I've got three instances, it seems worth writing a blog post so for case number four I have something to refer to.

A language without recursion or unbounded loops

First, let's consider a small simple statement-orientated first-order programming language. How might we write a non-terminating program? There are two easy ways. Firstly, write a loop - while (true) {}. Second, write recursion, void f() { f() }. We can ban both of those, leaving only bounded iteration of the form for x in xs { .. } or similar. Now the language is Turing incomplete and all programs terminate.

The lack of recursion makes programs harder to write, but we can always use an explicit stack with unbounded loops.

The lack of unbounded loops isn't a problem provided we have an upper bound on how many steps our program might take. For example, we know QuickSort has worst-case complexity O(n^2), so if we can write for x in range(0, n^2) { .. } then we'll have enough steps in our program such that we never reach the bound.

But what if our programming language doesn't even provide a range function? We can synthesise it by realising that in a linear amount of code we can produce exponentially large values. As an example:

double xs = xs ++ xs -- Double the length of a list
repeated x = double (double (double (double (double (double (double (double (double (double [x])))))))))

The function repeated 1 makes 10 calls to double, and creates a list of length 2^10 (1024). A mere 263 more calls to double and we'll have a list long enough to contain each atom in the universe. With some tweaks we can cause doubling to stop at a given bound, and generate numbers in sequence, giving us range to any bound we pick.

We now have a menu of three techniques that lets us write almost any program we want to do so:

  1. We can encoding recursion using an explicit stack.
  2. We can change unbounded loops into loops with a conservative upper bound.
  3. We can generate structures of exponential size with a linear amount of code.

The consequences

Firstly, we still don't have a Turing complete language. The code will terminate. But there is no guarantee on how long it will take to terminate. Programs that take a million years to finish technically terminate, but probably can't be run on an actual computer. For most of the domains I've seen Turing incompleteness raised, a runtime of seconds would be desirable. Turing incompleteness doesn't help at all.

Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read. While there are three general purpose techniques to encode the logic, there are usually other considerations that cause each instance to be solved differently. I've written tree traversals, sorts and parsers in such restricted languages - the result is always a lot of comments and at least one level of unnecessary indirection.

Finally, code written in such a complex style often performs significantly worse. Consider QuickSort - the standard implementation takes O(n^2) time worst case, but O(n log n) time average case, and O(log n) space (for the stack). If you take the approach of building an O(n^2) list before you start to encode a while loop, you end up with O(n^2) space and time. Moreover, while in normal QuickSort the time complexity is counting the number of cheap comparisons, in an encoded version the time complexity relates to allocations, which can be much more expensive as a constant factor.

The solution

Most languages with the standard complement of if/for etc which are Turing incomplete do not gain any benefit from this restriction. One exception is in domains where you are proving properties or doing analysis, as two examples:

  1. Dependently typed languages such as Idris, which typically have much more sophisticated termination checkers than just banning recursion and unbounded loops.
  2. Resource bounded languages such as Hume, which allow better analysis and implementation techniques by restricting how expressive the language is.

Such languages tend to be a rarity in industry. In all the Turing incomplete programming languages I've experienced, recursion was later added, programs were simplified, and programming in the language became easier.

While most languages I've worked on made this evolution in private, one language, DAML from Digital Asset, did so in public. In 2016 they wrote:

DAML was intentionally designed not to be Turing-complete. While Turing-complete languages can model any business domain, what they gain in flexibility they lose in analysability.

Whereas in 2020 their user manual says:

If there is no explicit iterator, you can use recursion. Let’s try to write a function that reverses a list, for example.

Note that while I used to work at Digital Asset, these posts both predate and postdate my time there.

Tuesday, September 22, 2020

Don't use Ghcide anymore (directly)

Summary: I recommend people use the Haskell Language Server IDE.

Just over a year ago, I recommended people looking for a Haskell IDE experience to give Ghcide a try. A few months later the Haskell IDE Engine and Ghcide teams agreed to work together on Haskell Language Server - using Ghcide as a library as the core, with the plugins/installer experience from the Haskell IDE Engine (by that stage we were already both using the same Haskell setup and LSP libraries). At that time Alan Zimmerman said to me:

"We will have succeeded in joining forces when you (Neil) start recommending people use Haskell Language Server."

I'm delighted to say that time has come. For the last few months I've been both using and recommending Haskell Language Server for all Haskell IDE users. Moreover, for VS Code users, I recommend simply installing the Haskell extension which downloads the right version automatically. The experience of Haskell Language Server is better than either the Haskell IDE Engine or Ghcide individually, and is improving rapidly. The teams have merged seamlessly, and can now be regarded as a single team, producing one IDE experience.

There's still lots of work to be done. And for those people developing the IDE, Ghcide remains an important part of the puzzle - but it's now a developer-orientated piece rather than a user-orientated piece. Users should follow the README at Haskell Language Server and report bugs against Haskell Language Server.