## Sunday, June 29, 2014

### Optimisation with Continuations

Summary: Continuations are confusing. Here we solve a simple problem (that is at the heart of the Shake build system) using continuations.

Imagine we are given two `IO a` computations, and want to run them both to completion, returning the first `a` value as soon as it is produced (let's ignore exceptions). Writing that in Haskell isn't too hard:

``````parallel :: IO a -> IO a -> IO a
parallel t1 t2 = do
once <- newOnce
var <- newEmptyMVar
forkIO \$ t1 >>= once . putMVar var
forkIO \$ t2 >>= once . putMVar var
readMVar var
``````

We create an empty variable `var` with `newEmptyMVar`, fire off two threads with `forkIO` to run the computations which write their results to `var`, and finish by reading as soon as a value is available with `readMVar`. We use a utility `newOnce` to ensure that only one of the threads calls `putMVar`, defined as:

``````newOnce :: IO (IO () -> IO ())
newOnce = do
run <- newMVar True
return \$ \act -> do
b <- modifyMVar run \$ \b -> return (False, b)
when b act
``````

Calling `newOnce` produces a function that given an action will either run it (the first time) or ignore it (every time after). Using `newOnce` we only call `putMVar` for the first thread to complete.

This solution works, and Shake does something roughly equivalent (but much more complex) in it's main scheduler. However, this solution has a drawback - it uses two additional threads. Can we use only one additional thread?

For the problem above, running the computations to completion without retrying, you can't avoid two additional threads. To use only one additional thread and run in parallel you must run one of the operations on the calling thread - but if whatever you run on the additional thread finishes first, there's no way to move the other computation off the the calling thread and return immediately. However, we can define:

``````type C a = (a -> IO ()) -> IO ()
``````

Comparing `IO a` to `C a`, instead of returning an `a`, we get given a function to pass the `a` to (known as a continuation). We still "give back" the `a`, but not as a return value, instead we pass it onwards to a function. We assume that the continuation is called exactly once. We can define `parallel` on `C`:

``````parallel :: C a -> C a -> C a
parallel t1 t2 k = do
once <- newOnce
forkIO \$ t1 (once . k)
t2 (once . k)
``````

This definition takes the two computations to run (`t1` and `t2`), plus the continuation `k`. We fork a separate thread to run `t1`, but run `t2` on the calling thread, using only one additional thread. While the `parallel` function won't return until after `t2` completes, subsequent processing using the `a` value will continue as soon as either finishes.

Looking at the transformers package, we see Control.Monad.Trans.Cont contains `ContT`, which is defined as:

``````newtype ContT r m a = ContT {runContT :: (a -> m r) -> m r}
``````

If we use `r` for `()` and `IO` for `m` then we get the same type as `C`. We can redefine `C` as:

``````type C a = ContT () IO a
``````

The changes to `parallel` just involve wrapping with `ContT` and unwrapping with `runContT`:

``````parallel :: C a -> C a -> C a
parallel t1 t2 = ContT \$ \k -> do
once <- newOnce
forkIO \$ runContT t1 (once . k)
runContT t2 (once . k)
``````

Now we've defined our `parallel` function in terms of `C`, it is useful to convert between `C` and `IO`:

``````toC :: IO a -> C a
toC = liftIO

fromC :: C a -> IO a
fromC c = do
var <- newEmptyMVar
forkIO \$ runContT c \$ putMVar var
readMVar var
``````

The `toC` function is already defined by `ContT` as `liftIO`. The `fromC` function needs to change from calling a callback on any thread, to returning a value on this thread, which we can do with a `forkIO` and `MVar`. Given `parallel` on `IO` takes two additional threads, and `parallel` on `C` takes only one, it's not too surprising that converting `IO` to `C` requires an additional thread.

Aren't threads cheap?

Threads in Haskell are very cheap, and many people won't care about one additional thread. However, each thread comes with a stack, which takes memory. The stack starts off small (1Kb) and grows/shrinks in 32Kb chunks, but if it ever exceeds 1Kb, it never goes below 32Kb. For certain tasks (e.g. Shake build rules) often some operation will take a little over 1Kb in stack. Since each active rule (started but not finished) needs to maintain a stack, and for huge build systems there can be 30K active rules, you can get over 1Gb of stack memory. While stacks and threads are cheap, they aren't free.

The plan for Shake

Shake currently has one thread per active rule, and blocks that thread until all dependencies have rebuilt. The plan is to switch to continuations and only have one thread per rule executing in parallel. This change will not require any code changes to Shake-based build systems, hopefully just reduce memory usage. Until then, huge build systems may wish to pass `+RTS -kc8K`, which can save several 100Mb of memory.

## Sunday, June 22, 2014

### Announcing ghc-make

Summary: I've released `ghc-make`, which is an alternative to `ghc --make`.

I've just released v0.2 of `ghc-make` (on Hackage, on Github). This package provides an alternative to `ghc --make` which supports parallel compilation of modules and runs faster when nothing needs compiling. To unpack that:

• Parallel compilation: Call `ghc-make -j4` and your program will build by running up to four `ghc -c` programs simultaneously. You usually need at parallel factor of 2x-3x to match `ghc --make` on a single core, since `ghc --make` does a lot of caching that is unavailable to `ghc-make`. If you use `-j1`, or omit a `-j` flag, the compilation will be based on `ghc --make` and should take the same time to compile.
• Faster when nothing needs rebuilding: If `ghc --make` is slow when there is nothing to rebuild, and most of your executions do no rebuilding, `ghc-make` will make things go faster. On Windows I have one project where `ghc --make` takes 23 seconds and `ghc-make` takes 0.2 seconds (more than 100x faster). Particularly useful for scripts that do `ghc --make Main && ./Main`.

See the README for full details.

How do I use it?

Install `ghc-make` (`cabal update && cabal install ghc-make`). Then replace your calls to `ghc my -arguments` with `ghc-make my -arguments`. Almost all arguments and flags supported by `ghc` are supported by `ghc-make` - it is intended as a drop-in replacement. Let me know about any bugs on the bug tracker.

To use `ghc-make` with Cabal, try `cabal build --with-ghc=ghc-make --ghc-options=-j4`. (This technique is due to the `ghc-parmake` project, which also does parallel `ghc --make` compiles.)

How is it implemented?

This program uses the Shake library for dependency tracking and `ghc --make` for building. The actual `ghc-make` project itself only contains 4 modules, and the largest of those is the test suite.

To pass options to the underlying Shake build system prefix them with `--shake`, for example `--shake--report=-` will write a profile report to stdout and `--shake--help` will list the available Shake options.

## Tuesday, June 03, 2014

### Shake file hashes/digests

Summary: Shake can now be configured to check file hashes/digests instead of modification times, which is great if you frequently switch git branches.

Build systems run actions on files, skipping the actions if the files have not changed. An important part of that process involves determining if a file has changed. The Make build system uses modification times to impose an ordering on files, but more modern build systems tend to use the modification time as a proxy for the file contents, where any change indicates the contents have changed (e.g. Shake, Ninja). The alternative approach is to compute a hash/digest of the file contents (e.g. SCons, Redo). As of version 0.13, Shake supports both methods, along with three combinations of them - in this post I'll go through the alternatives, and their advantages/disadvantages.

Modification times rely on the file-system updating a timestamp whenever the file contents are written. Modification time is cheap to query. Saving a file afresh will cause the modification time to change, even if the contents do not - as a result `touch` causes rebuilds. Unfortunately, working with git branches sometimes modifies a file but leaves it with the same contents, which can result in unnecessary rebuilds (see the bottom of this post for one problematic git workflow).

File digests are computed from the file contents, and accurately reflect if the file contents have changed. There is a remote risk that the file will change without its digest changing, but unless your build system users are actively hostile attackers, that is unlikely. The disadvantage of digests is that they are expensive to compute, requiring a full scan of the file. In particular, after every rule finishes it must scan the file it just built, and on startup the build system must scan all the files. Scanning all the files can cause empty rebuilds to take minutes. When using digests, Shake also records file sizes, since if a file size changes, we know the digest will not match - making most changed digests cheap to detect.

Modification time and file digests combines the two methods so that a file only rebuilds if both the modification time and digest have changed. The advantage is that for files that have not changed the modification time will cheaply detect that, without ever computing the file hash. If the file has changed modification time, then a digest check may save an expensive rebuild, but even if it doesn't, the cost is likely to be small compared to rerunning the rule.

Modification time and file digests on inputs takes the previous method, but only computes digests for input files. Generated files (e.g. compiled binaries) tend to be large (expensive to compute digests) and not edited (rarely end up the same), so a poor candidate for digests. The file size check means this restriction is unlikely to make a difference when checking all files, but may have some limited impact when building.

Modification time or file digests combines the two methods so that a file rebuilds if either modification time or file digest have changed. I can't think of a sensible reason for using this setting, but maybe someone else can?

Suggestions for Shake users

All these options can be set with the `shakeChange` field of `shakeOptions`, or using command line flags such as `--digest` or `--digest-and-input`. Switching between some change modes will cause all files to rebuild, so I recommend finding a suitable mode and sticking to it.

• If you can't trust the modification times to change, use `ChangeDigest`.
• If you are using git and multiple branches, use `ChangeModtimeAndDigestInput`.
• If you have generated files that rewrite themselves but do not change, I recommend using `writeFileChanged` when generating the file, but otherwise use `ChangeModtimeAndDigest`.
• Otherwise, I currently recommend using `ChangeModtime`, but some users may prefer `ChangeModtimeAndDigest`.

Appendix: The git anti-build-system workflow

Certain common git workflows change files from the current version, to an old version, then back again - causing modification-time checks to run redundant rebuilds. As an example, imagine we have two branches `foo` and `bar`, based on remote branches `origin/foo` and `origin/bar`, both of which themselves are regularly synced to a common `origin/master` branch. The difference between `origin/foo` and `origin/bar` is likely to be small. To switch from an up-to-date `bar` to an up-to-date `foo` we can run `git checkout foo && git pull`. These commands switch to an out-of-date `foo`, then update it. As a result, any file that has changed since we last updated `foo` will change to an old version, then change to a new version, likely the same as it was before we started. This workflow requires build systems to support file digests.