Thursday, February 05, 2015

Refactoring with Equational Reasoning

Summary: Haskell is great for refactoring, thanks to being able to reason about and transform programs with confidence.

I think one of Haskell's strengths as a practical language is that it's easy to refactor, and more importantly, easy to refactor safety. Programs in the real world often accumulate technical debt - code that is shaped more by its history than its current purpose. Refactoring is one way to address that technical debt, making the code simpler, but not changing any meaningful behaviour.

When refactoring, you need to think of which alternative forms of code are equivalent but better. In C++ I've removed unused variables, only to find they were RAII variables, and their mere presence had a semantic effect. In R I've removed redundant if expressions, only to find the apparently pure condition had the effect of coercing a variable and changing its type. In Haskell, it's equally possible to make refactorings that at first glance appear safe but aren't - however, in practice, it happens a lot less. I think there are a few reasons for that:

  • Haskell is pure and evaluation strategy is largely unobservable - moving a statement "before" or "after" another lexically is usually safe.
  • Refactorings that do go wrong, for example variables that accidentally get moved out of scope or types which are no longer as constrained, usually result in compiler errors.
  • The Haskell community cares about semantics and laws. The Monad laws are satisfied by almost all monads, flagrantly breaking those laws is rare.
  • Functions like unsafePerformIO, which could harm refactoring, are almost always used behind a suitable pure abstraction.

Note that these reasons are due to both the language, and the conventions of the Haskell community. (Despite these factors, there are a few features that can trip up refactorings, e.g. exceptions, record wildcards, space-leaks.)

To take a very concrete example, today I was faced with the code:

f = fromMaybe (not b) . select
if f v == b then opt1 else opt2

At one point the function f was used lots, had a sensible name and nicely abstracted some properties. Now f is used once, the semantics are captured elsewhere, and the code is just unclear. We can refactor this statement, focusing on the condition:

f v == b
-- inline f
(fromMaybe (not b) . select) v == b
-- remove brackets and inline (.)
fromMaybe (not b) (select v) == b
-- expand to a case statement
(case select v of Nothing -> not b; Just x -> x) == b
-- push the == down
case select v of Nothing -> not b == b; Just x -> x == b
-- simplify not b == b
case select v of Nothing -> False; Just x -> x == b
-- collapse back up
select v == Just b

And now substitute back in:

if select v == Just b then opt1 else opt2

Our code is now much simpler and more direct. Thanks to the guarantees I expect of Haskell programs, I also have a high degree of confidence this code really is equivalent - even if it isn't obvious just looking at beginning and end.


Franklin Chen said...

Are there currently interactive IDEs that operate as assistants for this sort of refactoring?

Neil Mitchell said...

Franklin: HaRE and Hermit are meant to do things a bit like this. I did my transforms in an IDE that gave me immediate type-checking (a bit like ghcid does), but did all the steps manually. I'm not sure I'd have used a tool even if one was available - it would have been more constraining. Of course, there are plenty of refactorings that would benefit from a tool, probably anything which involved updating in more than one localised place.