Wednesday, September 20, 2017

Shake 0.16 - revised rule definitions

Summary: I've just released shake v0.16. A lot has changed, but it's probably only visible if you have defined your own rules or oracles.

Shake-0.16 is now out, 8 months since the last release, and with a lot of improvements. For full details read the changelog, but in this post I'm going to go through a few of the things that might have the biggest impact on users.

Rule types redefined

Since the first version of Shake there has been a Rule key value type class defining all rule types - for instance the file rule type has key of filename and value of modification time. With version 0.16 the type class is gone, rules are harder to write, but offer higher performance and more customisation. For people using the builtin rule types, you'll see those advantages, and in the future see additional features that weren't previously possible. For people defining custom rule types, those will require rewriting - read the docs and if things get tough, ask on StackOverflow.

The one place many users might encounter the changes are that oracle rules now require a type instance defining between the key and value types. For example, if defining an oracle for the CompilerVersion given the CompilerName, you would have to add:

type instance RuleResult CompilerName = CompilerVersion

As a result of this type instance the previously problematic askOracle can now infer the result type, removing possible sources of error and simplifying callers.

The redefining of rule types represents most of the work in this release.

Add cmd_

The cmd_ function is not much code, but I suspect will turn out to be remarkably useful. The cmd function in Shake is variadic (can take multiple arguments) and polymorphic in the return type (you can run it in multiple monads with multiple results). However, because of the overloading, if you didn't use the result of cmd it couldn't be resolved, leading to ugly code such as () <- cmd args. With cmd_ the result is constrained to be m (), so cmd_ args can be used.

Rework Skip/Rebuild

Since the beginning Shake has tried to mirror the make command line flags. In terms of flags to selectively control rebuilding, make is based entirely on ordered comparison of timestamps, and flags such as --assume-new don't make a lot of sense for Shake. In this release Shake stops trying to pretend to be make, removing the old flags (that never worked properly) and adding --skip (don't build something even if it is otherwise required) and --build (build something regardless). Both these flags can take file patterns, e.g, --build=**/*.o to rebuild all object files. I don't think these flags are finished with, but it's certainly less of a mess than before.

Sunday, September 17, 2017

Existential Serialisation

Summary: Using static pointers you can perform binary serialisation of existentials.

Many moons ago I asked how to write a Binary instance for a type including an existential, such as:

data Foo = forall a . (Typeable a, Binary a) => Foo a

Here we have a constructor Foo which contains a value. We don't statically know the type of the contained value, but we do know it has the type classes Typeable (so we can at runtime switch on its type) and Binary (so we can serialise it). But how can we deserialise it? We can store the relevant TypeRep when serialising, but when deserialising there is no mechanism to map from TypeRep to a Binary instance.

In Shake, I needed to serialise existentials, as described in the S4.1 of the original paper. My solution was to build a global mapping table, storing pairs of TypeRep and Binary instances for the types I knew were relevant. This solution works, but cannot deserialise anything that has not already been added to the global table, which required certain functions to live in weird places to ensure that they were called before deserialisation. Effective, but ugly.

Recently Abhiroop Sarkar suggested using the relatively new static pointers extension. This extension lets you turn top-level bindings with no arguments into a StaticPtr which can then be serialised/deserialsed, even between different instances of a process. To take advantage of this feature, we can redefine Foo as:

data Foo = forall a . (StaticFoo a, Binary a) => Foo a

class StaticFoo a where
    staticFoo :: a -> StaticPtr (Get Foo)

The approach is to switch from serialising the TypeRep (from which we try to look up Get Foo), to serialising the Get Foo directly. We can write a Binary Foo instance by defining put:

put :: Foo -> Put
put (Foo x) = do
    put $ staticKey $ staticFoo x
    put x

Here we simply grab a StaticPtr (Get Foo) which can deserialise the object, then use staticKey to turn it into something that can be serialised itself. Next, we write out the payload. To reverse this process we define get:

get :: Get Foo
get = do
    ptr <- get
    case unsafePerformIO (unsafeLookupStaticPtr ptr) of
        Just value -> deRefStaticPtr value :: Get Foo
        Nothing -> error "Binary Foo: unknown static pointer"

We first get the staticKey, use unsafeLookupStaticPtr to turn it into a StaticPtr (Get Foo) followed by deRefStaticPtr to turn it into a Get Foo. The unsafe prefix on these functions is justified - bugs while developing this code resulted in segfaults.

The final piece of the puzzle is defining StaticFoo instances for the various types we might want to serialise. As an example for String:

instance StaticFoo String where
    staticFoo _ = static (Foo <$> (get :: Get String))

We perform the get, wrap a Foo around it, and then turn it into a StaticPtr. All other types follow the same pattern, replacing String with Int (for example). The expression passed to static must have no free variables, including type variables, so we cannot define an instance for a, or even an instance for [a] - it must be [Char] and [Int] separately.

A complete code sample and test case is available here.

This approach works, and importantly allows extra constraints on the existential. The two disadvantages are: 1) that static isn't very flexible or easy to abstract over, resulting in a lot of StaticFoo boilerplate; 2) the static pointer is not guaranteed to be valid if the program changes in any way.

Will Shake be moving over to this approach? No. The next version of Shake has undergone an extensive rewrite, and in the process, moved away from needing this feature. A problem I had for 8 years has been solved, just as I no longer need the solution!