Monday, April 23, 2007

Boilerplate considered harmful

At the moment I'm working on a boilerplate removal library for Haskell, which is faster (runtime), shorter (code), more type safe and requires fewer extensions than Scrap Your Boilerplate (SYB). However, since I haven't finished and released a stable version, I can't really recommend that people use it. I started working on it because I was unaware of SYB at the time. Last week I also introduced SYB to someone who had done quite a bit of Haskell programming but had not stumbled across it. As a result, I think it needs a bit more attention - SYB is one of the strengths of Haskell!


Before saying how great SYB is, it's important to point out the things that make it not so great:

  • Only implemented in GHC - sorry to the poor Hugs users
  • Requires rank-2 types, which means it's not actually Haskell 98
  • Occasionally the rank-2-ness infects the code you write, with unfortunate error messages (although this is not that common)

A data structure

Before showing some operations, I'm going to first introduce a data structure on which we can imagine operations are performed. I don't like the example from the SYB benchmark - it feels like an XML file (as is the stated intention), which means that the logic behind it is a bit disturbed. So instead I'll pick a data type that models a small imperative programming language:

{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Generics

data Expr = Var String | Lit Int | Call String [Expr] deriving (Data, Typeable)
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt] deriving (Data,Typeable)

We define the data types as normal, adding deriving for Data and Typeable - the two key SYB type classes. We also add an import and a flag, just to get the GHC machinery working for the derivings.


So let's imagine you have to get a list of all literals. In SYB this is easy:

extractLits :: Data a => a -> [Int]
extractLits = everything (++) ([] `mkQ` f)
  where f (Lit x) = [x] ; f _ = []

Wow, easy! This function will operate on anything which has a Data instance, so you can run it on an Expr, Stmt, [Stmt], [Either Stmt Expr] - the choice is yours. For the purposes of a short introduction, I'd recommend treating all the bits except the "f" as just something you write - read the full SYB paper to get all the details of what everything can be used for.
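To make the "anything which has a Data instance" claim concrete, here is a minimal sketch that runs the same query on a Stmt and on an [Either Stmt Expr]. It assumes the syb package is installed, and it uses the DeriveDataTypeable pragma in place of -fglasgow-exts, which is my assumption for newer GHC versions. The names sampleStmt, litsInStmt and litsInMixed, and the function names "lt" and "add" inside the sample program, are made up for illustration.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module ExtractLitsSketch where

import Data.Generics  -- provided by the syb package

data Expr = Var String | Lit Int | Call String [Expr]
  deriving (Show, Data, Typeable)
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt]
  deriving (Show, Data, Typeable)

-- The query from the post: collect every literal, left to right.
extractLits :: Data a => a -> [Int]
extractLits = everything (++) ([] `mkQ` f)
  where f (Lit x) = [x]
        f _       = []

-- Hypothetical sample program: while (lt(x, 10)) { x = add(x, 1) }
sampleStmt :: Stmt
sampleStmt = While (Call "lt" [Var "x", Lit 10])
                   [Assign "x" (Call "add" [Var "x", Lit 1])]

-- Running the query on a single statement.
litsInStmt :: [Int]
litsInStmt = extractLits sampleStmt                          -- [10,1]

-- Running the very same function on a list of Either values.
litsInMixed :: [Int]
litsInMixed = extractLits [Left sampleStmt, Right (Lit 3)]   -- [10,1,3]
```

Loading this in GHCi and evaluating litsInStmt or litsInMixed shows that no extra code was needed to handle the list or the Either wrapper.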


Now let's negate all the literals:

negateLits :: Data a => a -> a
negateLits = everywhere (mkT f)
  where f (Lit x) = Lit (negate x) ; f x = x

Again, it's pretty easy. And once again, consider everything apart from the "f" as just something you write.
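A small sketch of the transformation in use, under the same assumptions as before: the syb package is available and DeriveDataTypeable stands in for -fglasgow-exts on newer GHC versions. The values sampleExpr and negatedExpr are hypothetical names introduced only for this example, and Eq is derived purely so the result can be compared.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module NegateLitsSketch where

import Data.Generics  -- provided by the syb package

data Expr = Var String | Lit Int | Call String [Expr]
  deriving (Show, Eq, Data, Typeable)
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt]
  deriving (Show, Eq, Data, Typeable)

-- The transformation from the post: negate every literal, wherever it occurs.
negateLits :: Data a => a -> a
negateLits = everywhere (mkT f)
  where f (Lit x) = Lit (negate x)
        f x       = x

-- Hypothetical sample expression: add(x, 1, -2)
sampleExpr :: Expr
sampleExpr = Call "add" [Var "x", Lit 1, Lit (-2)]

-- Each literal is visited once, so it is negated exactly once.
negatedExpr :: Expr
negatedExpr = negateLits sampleExpr
-- Call "add" [Var "x",Lit (-1),Lit 2]
```

The variable and the call name pass through untouched because mkT only applies f to values of type Expr and leaves everything else alone.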

The gains in code reduction that can be made with SYB are quite substantial, and by removing the boilerplate you get code which can be reused more easily. Boilerplate code is bad, and should be removed where possible.
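For comparison, here is roughly what the literal query looks like when written by hand, without SYB. This is only a sketch of the boilerplate being avoided, not code from the post; the names litsExpr, litsStmt and handSample are my own. Note that it needs one function per type and one clause per constructor, and it still does not cover [Stmt] or [Either Stmt Expr] without more code.

```haskell
module HandWrittenSketch where

data Expr = Var String | Lit Int | Call String [Expr]
  deriving Show
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt]
  deriving Show

-- Hand-written traversal of expressions: every constructor needs a clause.
litsExpr :: Expr -> [Int]
litsExpr (Var _)     = []
litsExpr (Lit x)     = [x]
litsExpr (Call _ xs) = concatMap litsExpr xs

-- A second traversal for statements, which must call the first one explicitly.
litsStmt :: Stmt -> [Int]
litsStmt (While e ss)  = litsExpr e ++ concatMap litsStmt ss
litsStmt (Assign _ e)  = litsExpr e
litsStmt (Sequence ss) = concatMap litsStmt ss

-- Hypothetical sample program: while (lt(x, 10)) { x = add(x, 1) }
handSample :: Stmt
handSample = While (Call "lt" [Var "x", Lit 10])
                   [Assign "x" (Call "add" [Var "x", Lit 1])]
```

Adding a new constructor to either type means revisiting every such function, which is exactly the maintenance cost the generic version sidesteps.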


Twan van Laarhoven said...

You have mixed up everywhere and everything. everywhere applies a transformation, everything collects the results of a query.

Anonymous said...

If you think that's "easy", why not use the real thing (Stratego)? You lose type-inference, but you don't need those ugly mk constructors.

Anonymous said...

Yes, those things are really cool. Have you ever looked into Attribute Grammars? It's not quite the same, but still related. See: UUAGC and also Wouter Swierstra's excellent article in the Monad Reader: Why Attribute Grammars Matter.

Neil Mitchell said...

twan: for some reason I don't quite understand, I always make that mistake. Just a brain typo.

anon: I think that's easy, but I do think there could be easier things - fewer mk's and also type inference. That is what I am developing.

chris: I've seen them, but never in the context of boilerplate - but I'll check them out.

Anonymous said...

Just a few notes. (i) Hugs *does* support rank-2 polymorphism, but I guess it does not implement the type class Data; perhaps it should. Also there is a small chance that Hugs' approach to rank-2 is not exactly identical and would trigger issues with SYB. (ii) If you generally prefer a GP setup in which no rank-2 types surface in the program, you could use Strafunski. (iii) In particular, have a look at the "polymorphic symphony" paper that uses first-class polymorphism (as opposed to rank-2) and opaque generic function types to set up a library of generic function combinators. Because it uses data types for generic function types, you get full type inference. (iv) You can find an older JLAP paper on my web site developing a simple type system for a Stratego-like formal setup.


Neil Mitchell said...

Ralf: (i) I suspect that Hugs has enough rank-2 support to work with SYB - adding Data/Typeable deriving would be harder though. It is a shame that this has not been done.

Thanks for the other pointers, I'll have a read up!