Monday, April 30, 2007

Phantom Types for Real Problems

As part of my work on Hoogle 4 I've been bashing my deferred binary library quite a lot. I encountered one bug which I hadn't considered, and was able to come up with a solution, using phantom types. I've read about phantom types before, but never actually put them into practice!

The idea of the deferred binary library is that writing instances should be as short as possible - they don't need to be massively high performance, and I wanted to see if I could do it with minimal boilerplate. Each instance needs to be able to both save and load a value - which means it needs to do some sort of pattern-match (to see which value it has), and have the constructor available (to create a new value). The shortest scheme I was able to come up with is:

data Foo = Foo String Bool | Bar Foo

instance BinaryDefer Foo where
bothDefer = defer
[\ ~(Foo a b) -> unit Foo << a << b
,\ ~(Bar a) -> unit Bar << a

The idea is that to save a value, you step through the list trying to match each constructor, using Control.Exception.catch to see if the value does not match. Once you have matched, you save an index, and the components. To load a value, you take the index and pick the correct construction. This time, you deliberately do not mention any of the fields, and by remaining lazy on the fields, the pattern match is not invoked. This solution took quite a lot of thought to arrive at, but does minimize the size of the instance declarations.

The instance relies crucially on the laziness/strictness of the functions - the load function must not examine any of its arguments (or you will pattern-match fail), the save function must examine at least one argument (to provoke pattern-match failure). After writing a description such as this, I am sure some users will have spotted the issue!

data Bool = False | True

instance BinaryDefer Bool where
bothDefer = defer
[\ ~(False) -> unit False
,\ ~(True) -> unit True

The problem here is 0-arity constructors, the load code works just fine, but the save code has no fields it can demand to ensure the correct constructor has been reached. I encountered this bug in Hoogle 4 - it is subtle, and in a pile of instance rules, these 0-arity cases can easily be overlooked. The solution is to introduce a new operator, (<<!), whose only purpose is to strictly demand something:

instance BinaryDefer Bool where
bothDefer = defer
[\ ~(x@False) -> unit False <<! x
,\ ~(x@True) -> unit True <<! x

This (<<!) operator is only required for 0-arity constructors, but is not harmful in other cases. Now the correct fix has been decided upon, we can fix the Hoogle instances. Unfortunately, we can still write the incorrect case, and will get no warning - a bug just waiting to happen. The solution to this is to use phantom types.

The whole point of a phantom type is that it doesn't actually reflect real data, but exists only at the type level, artificially restricting type signatures to ensure some particular property. In our case, the property we want to ensure is that "unit" is never used on its own, but either with (<<), (<<~) or (<<!). We can achieve this by giving unit a different type, even though in reality all operate on the same type of data:

data PendingNone
data PendingSome

unit :: a -> Pending a PendingNone
(<<) :: BinaryDefer a => Pending (a -> b) p -> a -> Pending b PendingSome
(<<~) :: BinaryDefer a => Pending (a -> b) p -> a -> Pending b PendingSome
(<<!) :: Pending a p -> a -> Pending a PendingSome

Now the type of unit is PendingNone, but the type of all other operations is PendingSome. By demanding that the list "defer" accepts is all PendingSome, we are able to raise a type error if the (<<!) is missed.

By using phantom types, a easy to make error can be guarded against statically. Plus, its not very hard to do!


Stefan O'Rear said...

Somehow I think it would be simpler and more practical to abandon
these types and pattern matching fu, accept 4x as much boilerplate,
and let Derive clean up the mess :)

Neil Mitchell said...

Stefan: When I came up with these instances, there was no such thing as Derive :)

Even so, I'm not sure Derive will cut it - the important bit is that you can modify the instance to make certain fields lazy:

bothDefer = defer [\ ~(Foo a b) -> unit Foo <<~ a << b]

Note the use of <<~ instead of <<, this allows the a field to be lazy instead of strict.

However, you can wrap a field in a Defer type, which has the same effect, so maybe it isn't worth the complication.