Monday, July 09, 2007

Equational Reasoning in Haskell

Update: You can actually further optimise the example in this post, which is done in my IFL 2007 Supero paper, available from my website.

Haskell is great, as it's a pure language. This means that functions don't have side effects, so can be reordered, rearranged, inlined etc. very easily. All this means that you can do reasoning in Haskell in a similar manner to that which is done in mathematics. Most Haskell programmers will exploit the reasoning properties very often, refactoring their code. Today I used equational reasoning to solve a problem I was having, which makes a nice (and simple!) introduction.

This work was all in the context of the Supero project, aiming to optimise a word counting program. I found that in the resultant code, some characters were getting tested for being a space twice (using isSpace). An additional test, and then a branch on a value which is already known, harms performance. There are two possible reasons for this duplicated test: either the input code does the test twice; or the optimiser looses some sharing. After a quick look around I decided that the latter was not occuring, so went to look at the source of words

words :: String -> [String]
words string = case dropWhile isSpace string of
[] -> []
s -> word : words rest
where (word, rest) = break isSpace s

If you are using Hugs, a simple :f words will show you this code, which I've renamed and reformatted slightly.

The key "eureka" moment is to see that s has all it's leading spaces dropped. Therefore, if s has any characters, the first must be a non-space. We then retest the first character to see if it is a space in break. This redundant test is unnecessary, but how to remove it? In Haskell equational reasoning can be used to remove the test, and serves as a proof that the code still has the same functionality.

The first step is to instantiate s in the case alternative. This is safe as the branch above has already examined the constructor, so we do not loose laziness.

words string = case dropWhile isSpace s of
[] -> []
s:ss -> word : words rest
where (word, rest) = break isSpace (s:ss)

Now we know that s is not a space, specifically that not (isSpace s) is true. From now on we will work only within the second case alternative. We now need the definition of break and span to proceed further:

span, break :: (a -> Bool) -> [a] -> ([a],[a])
span p [] = ([],[])
span p xs@(x:xs')
| p x = (x:ys, zs)
| otherwise = ([],xs)
where (ys,zs) = span p xs'
break p = span (not . p)

Now we can inline break.

s:ss -> word : words rest
where (word, rest) = span (not . isSpace) (s:ss)

Now looking at span we can see that we match the xs@(x:xs') branch. Furthermore, we know from earlier that not (isSpace s) is True so we will take the first alternative. This lets us rewrite the expression as:

s:ss -> word : words rest
where (word, rest) = (s:ys, zs)
(ys,zs) = span (not . isSpace) ss

Now we are nearly there. We can replace word with s:ys and rest with zs using the first where binding:

s:ss -> (s:ys) : words zs
where (ys,zs) = span (not . isSpace) ss

Now we can rename ys to word and zs to rest.

s:ss -> (s:word) : words rest
where (word,rest) = span (not . isSpace) ss

And finally, to get us back to something very much like our original, we can fold back the span. Just as we were able to replace the left hand side of break with the right hand side, we can do the same in the other direction:

s:ss -> (s:word) : words rest
where (word,rest) = break isSpace ss

Now putting this all back into the original definition:

words :: String -> [String]
words string = case dropWhile isSpace string of
[] -> []
s:ss -> (s:word) : words rest
where (word,rest) = break isSpace ss

We now have a more efficient version of words which at every stage kept the same behaviour. I have emailed the Haskell libraries mailing list and hopefully this optimisation will be able to be applied to the standard libraries.

In reality, when I did this transformation I skipped the inlining of break and relied on what I knew about its behaviour. If a student was to do this transformation in an exam, more detail would probably be required going from (not . isSpace) in span. Either way, as your experience grows these transformations become more natural, and are a fundamental part of working with Haskell.

No comments: