Wednesday, January 31, 2007

Writing code by types

In the past few days I've found that while I'm writing the code, the type checker is the one deciding what the code should be. To take an example, I was recently writing some SYB code using gfoldl:

> gfoldl :: (forall d b . Data d => c (d -> b) -> d -> c b) -> (forall g . g -> c g) -> a -> c a

I decided what c had to be:

> newtype C x a = C ([x], [x] -> (a, [x]))

(where "C x" was the instantiated type of c)

From there I didn't have much of a clue what I was doing, but once you've got that far, there is virtually only one type-correct program that can possibly work! Finding what that program was took a while, but from this point on you can start getting the type checker to write the code for you.

The first step is to write the basic definition, in such a way that Haskell will give you error messages if you don't get it right:

> myexpr :: Data a => a -> C x a
> myexpr = gfoldl cont base
>
> cont = undefined
> base = undefined

At this point the Haskell type checker can already check that you haven't made a mistake anywhere else. Once all is good you can start trying to "fill in the blanks". The first blank to tackle is base:

> base :: a -> C x a
> base = undefined

Note that no code has been written, but you've checked that the type is valid. A :type on base would have given you the same information, assuming base was at the top level of the program, but writing the signature down is a useful check.
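For example, a quick check in GHCi at this point (a hypothetical session, assuming the module above loads) would show:

*Main> :type base
base :: a -> C x a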

Next you can start filling in the undefined bits:

> base :: a -> C x a
> base a = C ([], \y -> (a,y))

By working this way it is relatively easy to build up a correct program, at each stage using small steps and regular type feedback to check you are going in the right direction.
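To see the whole technique end to end on something smaller, here is a minimal self-contained sketch of my own (not the code above): the same gfoldl trick, with a much simpler c that just counts the immediate children of a value:

> import Data.Data
>
> -- The type parameter is phantom; it exists only to satisfy gfoldl.
> newtype Count a = Count Int
>
> -- Count the immediate children of any Data value,
> -- e.g. children (Just 'x') == 1, children [1,2,3] == 2.
> children :: Data a => a -> Int
> children x = n
>   where Count n = gfoldl (\(Count i) _ -> Count (i + 1)) (\_ -> Count 0) x

Here too the rank-2 type of gfoldl pins down almost everything: once you decide c is Count, the type checker leaves you virtually no choice in the two lambdas.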

The second time I used this technique was when replacing one data structure with another, with some similar operations and some completely different ones. It wasn't as simple as changing the imports, but thanks to regular feedback from the type checker this went from being an exercise in thought to just going around mopping up the errors.

Programming like this is very powerful, giving lots of hints about what the programmer needs to do next, but it would really benefit from some intelligent development tools. Epigram has some similar ideas built into its development environment, but Haskell doesn't.

Sunday, January 14, 2007

Debugging and crying

I'm now back from Hac 07, where I was sharing a room with Pepe, the GHCi Debugger Summer of Code guy. I got to hear his presentation on the debugger he has completed, and got to see it "in action". He made some interesting comments - that Hat provides a much richer framework, and that when he originally tried to get Hat working he even went as far as replacing his OS (!) - but with no luck. He demonstrated his debugger, and the next day, in front of about 10 people, I demonstrated Hat.

GHCi Debugger Demonstration

The GHCi debugger is much newer, and not officially released yet. Pepe started off with the demonstration, giving a few examples, but at various points I asked "what if?" questions, all of which he went and did. The debugger worked flawlessly, and produced useful information. It didn't produce as much information as Hat (particularly stack traces were missing), but it was solid and reliable. I only had one major gripe with it - that :quit was used for continuing from a breakpoint - but the next day a patch went in fixing that.

Hat Debugger Demonstration

In return, I tried to go through a pre-scripted demonstration of Hat. However, since first scripting it I had installed GHC 6.6 on the machine, and the compilation took about 100 times longer than I remembered - no idea why. When I finally had a Hat file I was able to view it in Visual Hat. I then tweaked the file as people asked various questions - unfortunately that didn't work so well. I added a head call, to invoke a pattern match failure. Indeed, it did invoke a pattern match failure - in Hat itself! I tried some other things; files kept appearing in the wrong places, and stack traces came out not quite as you'd expect. Everything went wrong.

Some of these issues may well be in the hat-make that I wrote recently and hackily, but others were more fundamental issues in Hat itself.

Conclusion

While Hat is without a doubt more powerful, it loses on reliability. Despite porting Hat to Windows, and using Hugs full-time, GHCi will be my debugger of choice. I don't use Hat because when I've got a bug to fix, I know getting through Hat will take at least an hour, and won't always be possible (Data.Map, for example). The GHCi debugger has maybe 10% of Hat's power, but 100% reliability looks like it will make up for this.

Friday, January 12, 2007

Hac 07: Deferred binary

As part of Hac 07 I have written a Deferred binary library. It's not released yet, since I'll need to change some bits, but I have managed to write a pickle/unpickle combinator which is pretty minimal. It can save to a file and load from a file in binary.

The first step is to take your data structure:

> data Computer = Laptop {weight :: Int, wireless :: Bool} | Desktop {software :: [String]}

Then you need to write your combinators:

> instance BinaryDefer Computer where
>     bothDefer = serial
>         [\ ~(Laptop a b) -> unit Laptop << a << b
>         ,\ ~(Desktop a) -> unit Desktop << a]

The library then provides:

> putDefer :: BinaryDefer a => Handle -> a -> IO ()
> getDefer :: BinaryDefer a => Handle -> IO a

This gives you all the power to save and load your data structure. No need to manually assign tags, no need to write get/put versions, just one set of combinators. Needless to say, this was pretty tricky to set up!
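As a usage sketch (hedged: the library is unreleased, so only putDefer and getDefer above are from it; the rest is plain System.IO):

> import System.IO
>
> main :: IO ()
> main = do
>     -- write a value out in binary
>     h <- openBinaryFile "computer.bin" WriteMode
>     putDefer h Laptop{weight = 3, wireless = True}
>     hClose h
>     -- and read it back
>     h' <- openBinaryFile "computer.bin" ReadMode
>     c <- getDefer h' :: IO Computer
>     hClose h'
>     print (weight c)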

But this is all stuff that belongs in a binary library, so what is a deferred binary library? Well perhaps there are a lot of software packages on a desktop, and usually the program doesn't care about them. Maybe 90% of the file will be the software packages, so 90% of the time will be wasted in reading the file. Of course, sometimes the software is queried, so it still has to be available. Using deferred binary, you can change the combinator to say which fields are loaded in a deferred manner:

> \ ~(Desktop a) -> unit Desktop <<~ a

That extra ~ is the single change. Now the file format adapts to store the software list in a different place and it is read only when needed. Much more efficient, yet still the same simple underlying interface.

This is the binary library that Hoogle 4 will be using!

Hac 07: Searchable haddock index files

To see what I've been doing at Hac 07, go to: http://www-users.cs.york.ac.uk/~ndm/temp/haddock/doc-index.html

I have written a searchable index file for Haddock, which adds very little to the file size, and has search-as-you-type features. It doesn't store a separate table of index names, and it searches for prefixes only, in O(log n).
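The shipped search is JavaScript embedded in the generated page, but the algorithm is just a binary search for the first index entry carrying the query as a prefix. A rough Haskell sketch of the idea (my illustration, not the actual code):

import Data.Array
import Data.List (isPrefixOf)

-- Index of the first entry >= the query, by binary search: O(log n).
lowerBound :: String -> Array Int String -> Int
lowerBound p names = go lo (hi + 1)
  where
    (lo, hi) = bounds names
    go l h | l >= h        = l
           | names ! m < p = go (m + 1) h
           | otherwise     = go l m
      where m = (l + h) `div` 2

-- The first entry with the given prefix, if any, e.g.
-- findPrefix "ma" (listArray (0,2) ["foldr","map","mapM"]) == Just "map"
findPrefix :: String -> Array Int String -> Maybe String
findPrefix p names
    | i <= snd (bounds names) && p `isPrefixOf` (names ! i) = Just (names ! i)
    | otherwise = Nothing
    where i = lowerBound p names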

There was a version that did full substring searching in O(n), but unfortunately IE is too slow to do this for the base library. If you want that you are still going to have to use Hoogle.

These patches have been bundled up and sent off to Simon Marlow, so hopefully they'll be appearing in a Haddock near you soon!

Thursday, January 04, 2007

Does Haskell need macros?

Haskell has higher-order functions, and lazy evaluation, but not macros. Does Haskell need macros? For the purpose of this post, I'm talking about macros in the "little piece of code that expands out" sense, rather than the conditional compilation that macros quite often get used for in C++.

If Haskell needs macros then there must be some benefit to adding them. The reason a lot of languages need macros is that they have strict evaluation. Consider trying to write a function that selects one of two arguments based on a boolean condition - some sort of if function:

iff a b c = if a then b else c

You can write this in Haskell. You can also write it in VB (where it is pre-defined as IIf), but in VB it doesn't do what you might expect: being strict, VB evaluates both branches regardless of the condition. Haskell's lazy evaluation is a big win here, opening up entirely new opportunities.
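To see the difference laziness makes: only the chosen branch of iff is ever demanded, so the following is perfectly safe in Haskell, where a strict language would crash:

safe :: Int
safe = iff True 1 (error "never evaluated")   -- evaluates to 1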

So what are the reasons to add macros?

Speed

This is just a silly argument. First off, a pure function can always be inlined away, so performance is not an issue. A compiler might choose not to inline a function, but you can usually persuade it that it should with pragmas etc. And finally, there is the real possibility that you (the author of the code) don't have a clue what should be inlined, and are simply trading away clarity for premature optimisation.

Generate code

One thing macros can do that functions can't is generate definitions; functions can only generate expressions (essentially a function is an expression with some pre-defined structure). You can't define a Haskell function that takes some data and generates classes, instances etc. For example, imagine that you want to define lots of constructors for binary operations. Instead of:

data Expr = Add Expr Expr
          | Mul Expr Expr
          ...

Here a macro might let you write something like:

data Expr = binary(Add) | binary(Mul) ...

Of course, this is something you really can't do in Haskell as it stands. However, most of the ways you might use a macro like this are covered by Haskell. There are type synonyms, which may actually let you do the above trick with GADTs (I haven't tried, but it's certainly feasible they could work like that, even if they currently don't). Classes have defaults, which covers another situation where this functionality might otherwise be required.

For most purposes generating code with macros is not about doing clever stuff, but about cutting out repetitive stuff. Haskell has (by design) cut out most of the repetitive tasks, so no longer requires macros. For truly repetitive tasks, Template Haskell can be used, or things like DrIFT and Scrap Your Boilerplate. All of these techniques reduce the number of places where macros might otherwise be required.
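For the Expr example above, the Template Haskell route looks roughly like this (a hedged sketch against the modern template-haskell API, which postdates this post; binaryOps is my own name):

{-# LANGUAGE TemplateHaskell #-}
module Gen where

import Language.Haskell.TH

-- Splicing $(binaryOps "Expr" ["Add", "Mul"]) from another module
-- generates: data Expr = Add Expr Expr | Mul Expr Expr
binaryOps :: String -> [String] -> Q [Dec]
binaryOps tyName cons = return [DataD [] ty [] Nothing ctors []]
    where
        ty    = mkName tyName
        field = (Bang NoSourceUnpackedness NoSourceStrictness, ConT ty)
        ctors = [NormalC (mkName c) [field, field] | c <- cons]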

LHS of a pattern match

One place where expression-like syntax is present, but functions are not allowed, is the left-hand side of a pattern match. For example:

f 3.14159265358979 = 2.71828182845905

Of course, it's silly to write this using hardcoded values - what if either of these constants changes?

f pi = exp 1

Unfortunately, without macros this is not allowed in Haskell, since pi is a function (a zero-argument one, but still a function), and function calls cannot appear in a pattern. You'd instead have to write:

f x | pi == x = exp 1

Not too painful, but what if you are trying to match an LHS with a free variable in it:

f (Neg (Neg (Neg a))) = Pos $ Pos $ Pos a

Here you are matching a deeply nested constructor, and on the left you have to explicitly bracket the terms. On the right, however, you are able to use the function ($), whose precedence removes the need for brackets. One thing you can't do, however, is:

f (Neg $ Neg $ Neg a) = Pos $ Pos $ Pos a

If ($) were a macro, however, this would be valid.

Haskell has no feature here that can reduce the redundancy in the pattern match.

But while macros would be useful here, how far would standard C-style macros take us? Let's consider defining a useful auxiliary:

apply 0 f x = x
apply n f x = f (apply (n-1) f x)

Now we can rewrite this as:

f (Neg (Neg (Neg a))) = apply 3 Pos a

But we really can't write:

f (apply 3 Neg a) = apply 3 Pos a

However, it's entirely possible to come up with a consistent semantics for this, so maybe one day this might be something people decide is useful.

For the moment the lack of macros on the LHS is occasionally annoying, but this is generally not a feature that anyone is crying out for. The LHS is rarely sufficiently complex for macros to show great benefit.

Type checking with rank-2 types

While standard Haskell (mainly - ignoring MR and seq, for those who care) has the fold and unfold law, i.e. you can inline a function and then contract it back, once you introduce rank-2 types you lose this property. Unfold still holds, but fold doesn't. This leads to lots of typing problems for things like runST, which presents rank-2 types to the user.
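The best-known symptom of this, at the time of writing: you can apply runST directly, but the meaning-preserving rewrite through ($) fails to type check, because ($) would have to be instantiated at the rank-2 type runST requires:

import Control.Monad.ST
import Data.STRef

ok :: Int
ok = runST (do r <- newSTRef 1; modifySTRef r (+1); readSTRef r)

-- The line below is the same program folded through ($), but it was
-- rejected by the type checker when this was written:
--
-- bad = runST $ do r <- newSTRef 1; modifySTRef r (+1); readSTRef r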

While macros would paper over the cracks here, the real solution probably lies in the type system, which I believe GHC has in various flavours of impredicative polymorphism. The other solution is not to use rank-2 types, something I have been happily avoiding for years now.

Conclusion

Despite macros being useful in languages like Lisp, in Haskell their usefulness is probably minimal. Lazy evaluation is often a substitute for macros, and where other languages require macros to define domain-specific languages, Haskell solves these problems naturally.

Macros are quite a painful cost to a language: they reduce the assumptions that people can make, and require people to play hunt-the-macro. Haskell really doesn't need them!