Sunday, November 15, 2020

Data types for build system dependencies

Summary: Monadic and early cut-off? Use a sequence of sets.

In the Build Systems a la Carte paper we talk about the expressive power of various types of build systems. We deliberately simplify away parallelism and implementation concerns, but those details matter. In this post I'm going to discuss some of those details, specifically the representation of dependencies.

Applicative build systems

In an applicative build system like Make, all dependencies for a target are known before you start executing the associated action. That means the dependencies have no ordering, so are best represented as a set. However, because they can be calculated from the target, they don't usually need to be stored separately. The dependencies can also be evaluated in parallel. To build a target you evaluate the dependencies to values, then evaluate the action.

Early cut-off is when an action is skipped because none of its dependencies have changed value, even if some dependencies might have required recomputing. This optimisation can be incredibly important for build systems with generated code - potentially seconds vs hours of build time. To obtain early cut-off in applicative systems, after evaluating the dependencies you compare them to the previous results, and only run the action if there were changes.

Monadic build systems

In monadic build systems like Shake, the representation of dependencies is more complex. If you have an alternative mechanism of detecting whether a rule is dirty (e.g. reverse dependencies) you don't need to record the dependencies at all. If the key is dirty, you start executing the action, and that will request the dependencies it needs. The action can then suspend, calculate the dependencies, and continue.

If you want early cut-off in a monadic build system, you need to rerun the dependencies in advance, and if they all have the same result, skip rerunning the action. Importantly, you probably want to rerun the dependencies in the same order that the action originally requested them -- otherwise you might pay a severe and unnecessary time penalty. As an example, let's consider an action:

opt <- need "is_optimised"
object <- if opt then need "foo.optimised" else need "foo.unoptimised"
link object

This rule is monadic, as whether you need the optimised or unoptimised dependency depends on the result of calculating some is_optimised property. If on the first run is_optimised is True, then we build foo.optimised. On the second run, if is_optimised is False, it is important we don't build foo.optimised as that might take a seriously long time and be entirely redundant. Therefore, it's important when checking for early cut-off we build in the order that the previous action requested the dependencies, and stop on the first difference we encounter.

(If you have unlimited resources, e.g. remote execution, it might be profitable to evaluate everything in parallel - but we're assuming that isn't the case here.)

Provided a rule performs identically between runs (i.e. is deterministic and hasn't been changed), everything that we request to check for early cut-off will still be needed for real, and we won't have wasted any work. For all these reasons, it is important to store dependencies as a sequence (e.g. a list/vector).

Monadic build systems plus parallelism

Applicative build systems naturally request all their dependencies in parallel, but monadic build systems are naturally one dependency at a time. To regain parallelism, in build systems like Shake the primitive dependency requesting mechanism takes a set of dependencies that are computed in parallel. While requesting dependencies individually or in bulk gives the same result, in bulk gives significantly more parallelism. (In Shake we use lists to track correspondence between requests and results, but it's morally a set.)

As we saw previously, it is still important that for early cut-off you reproduce the dependencies much like they were in the action. That means you request dependencies in the order they were requested, and when they were requested in bulk, they are also checked in bulk. Now we have a sequence of sets to represent dependencies, where the elements of the sets can be checked in parallel, but the sequence must be checked in order.

Monadic build systems plus explicit parallelism

What if we add an explicit parallelism operator to a monadic build system, something like parallel :: [Action a] -> IO [a] to run arbitrary actions in parallel (which is what Shake provides). Now, instead of a sequence of sets, we have a tree of parallelism. As before it's important when replaying that the dependencies are requested in order, but also that as much is requested in parallel as possible.

What Shake does

Shake is a monadic build system with early cut-off, parallelism and explicit parallelism. When building up dependencies it uses a tree representation. The full data type is:

data Traces
    = None
    | One Trace
    | Sequence Traces Traces
    | Parallel [Traces]

Sequenced dependencies are represented with Sequence and the traces captured by parallelism use Parallel. Importantly, constructing Traces values is nicely O(1) in all cases. (Shake v0.19.1 used a different representation and repeatedly normalised it, which could have awful time complexity - potentially O(2^n) in pathological cases.)

While these traces store complete information, actually evaluating that trace when checking for rebuilds would be complicated. Instead, we flatten that representation to [[Trace]] for writing to the Shake database. The outer list is a sequence, the inner list is morally a set. We have the invariant that no Trace value will occur multiple times, since if you depend on something once, and then again, the second dependency was irrelevant. To flatten Parallel computations we take the first required dependency in each parallel action, merge them together, and then repeat for the subsequent actions. If you run code like:

parallel [
    need ["a"] >> parallel [need ["b"], need ["c"]]
    need ["d"]
]

It will get flattened to appear as though you wrote need ["a","d"] >> need ["b","c"]. When checking, it will delay the evaluation of b and c until after d completes, even though that is unnecessary. But simplifying traces at the cost of marginally less rebuild parallelism for those who use explicit parallelism (which is not many) seems like the right trade-off for Shake.

Conclusion

Applicative build systems should use sets for their dependencies. Monadic build systems should use sets, but if they support early cut-off, should use sequences of sets.

Monday, November 09, 2020

Turing Incomplete Languages

Summary: Some languages ban recursion to ensure programs "terminate". That's technically true, but usually irrelevant.

In my career there have been three instances where I've worked on a programming language that went through the evolution:

  1. Ban recursion and unbounded loops. Proclaim the language is "Turing incomplete" and that all programs terminate.
  2. Declare that Turing incomplete programs are simpler. Have non-technical people conflate terminate quickly with terminate eventually.
  3. Realise lacking recursion makes things incredibly clunky to express, turning simple problems into brain teasers.
  4. Add recursion.
  5. Realise that the everything is better.

Before I left university, this process would have sounded ridiculous. In fact, even after these steps happened twice I was convinced it was the kind of thing that would never happen again. Now I've got three instances, it seems worth writing a blog post so for case number four I have something to refer to.

A language without recursion or unbounded loops

First, let's consider a small simple statement-orientated first-order programming language. How might we write a non-terminating program? There are two easy ways. Firstly, write a loop - while (true) {}. Second, write recursion, void f() { f() }. We can ban both of those, leaving only bounded iteration of the form for x in xs { .. } or similar. Now the language is Turing incomplete and all programs terminate.

The lack of recursion makes programs harder to write, but we can always use an explicit stack with unbounded loops.

The lack of unbounded loops isn't a problem provided we have an upper bound on how many steps our program might take. For example, we know QuickSort has worst-case complexity O(n^2), so if we can write for x in range(0, n^2) { .. } then we'll have enough steps in our program such that we never reach the bound.

But what if our programming language doesn't even provide a range function? We can synthesise it by realising that in a linear amount of code we can produce exponentially large values. As an example:

double xs = xs ++ xs -- Double the length of a list
repeated x = double (double (double (double (double (double (double (double (double (double [x])))))))))

The function repeated 1 makes 10 calls to double, and creates a list of length 2^10 (1024). A mere 263 more calls to double and we'll have a list long enough to contain each atom in the universe. With some tweaks we can cause doubling to stop at a given bound, and generate numbers in sequence, giving us range to any bound we pick.

We now have a menu of three techniques that lets us write almost any program we want to do so:

  1. We can encoding recursion using an explicit stack.
  2. We can change unbounded loops into loops with a conservative upper bound.
  3. We can generate structures of exponential size with a linear amount of code.

The consequences

Firstly, we still don't have a Turing complete language. The code will terminate. But there is no guarantee on how long it will take to terminate. Programs that take a million years to finish technically terminate, but probably can't be run on an actual computer. For most of the domains I've seen Turing incompleteness raised, a runtime of seconds would be desirable. Turing incompleteness doesn't help at all.

Secondly, after encoding the program in a tortured mess of logic puzzles, the code is much harder to read. While there are three general purpose techniques to encode the logic, there are usually other considerations that cause each instance to be solved differently. I've written tree traversals, sorts and parsers in such restricted languages - the result is always a lot of comments and at least one level of unnecessary indirection.

Finally, code written in such a complex style often performs significantly worse. Consider QuickSort - the standard implementation takes O(n^2) time worst case, but O(n log n) time average case, and O(log n) space (for the stack). If you take the approach of building an O(n^2) list before you start to encode a while loop, you end up with O(n^2) space and time. Moreover, while in normal QuickSort the time complexity is counting the number of cheap comparisons, in an encoded version the time complexity relates to allocations, which can be much more expensive as a constant factor.

The solution

Most languages with the standard complement of if/for etc which are Turing incomplete do not gain any benefit from this restriction. One exception is in domains where you are proving properties or doing analysis, as two examples:

  1. Dependently typed languages such as Idris, which typically have much more sophisticated termination checkers than just banning recursion and unbounded loops.
  2. Resource bounded languages such as Hume, which allow better analysis and implementation techniques by restricting how expressive the language is.

Such languages tend to be a rarity in industry. In all the Turing incomplete programming languages I've experienced, recursion was later added, programs were simplified, and programming in the language became easier.

While most languages I've worked on made this evolution in private, one language, DAML from Digital Asset, did so in public. In 2016 they wrote:

DAML was intentionally designed not to be Turing-complete. While Turing-complete languages can model any business domain, what they gain in flexibility they lose in analysability.

Whereas in 2020 their user manual says:

If there is no explicit iterator, you can use recursion. Let’s try to write a function that reverses a list, for example.

Note that while I used to work at Digital Asset, these posts both predate and postdate my time there.