Monday, March 16, 2020

The <- pure pattern

Summary: Sometimes <- pure makes a lot of sense, avoiding some common bugs.

In Haskell, in a monadic do block, you can use either <- to bind monadic values, or let to bind pure values. You can also use pure or return to wrap a value in the monad, meaning the following are mostly equivalent:

let x = myExpression
x <- pure myExpression

The one place they aren't fully equivalent is when myExpression contains x within it, for example:

let x = x + 1
x <- pure (x + 1)

With the let formulation x is defined in terms of itself, so you get an infinite loop as soon as x is evaluated, whereas with the <- pure pattern you take the previously defined x and add 1 to it. To solve the infinite loop, the usual solution with let is to rename the variable on the left, e.g.:

let x2 = x + 1

And now make sure you use x2 everywhere from now on. However, x remains in scope, with a more convenient name, and the same type, but probably shouldn't be used. Given a sequence of such bindings, you often end up with:

let x2 = x + 1
let x3 = x2 + 1
let x4 = x3 + 1

Given a large number of unchecked indices that must be strictly incrementing, bugs usually creep in, especially when refactoring. The unused variable warning will sometimes catch mistakes, but not if a variable is legitimately used twice and one of those uses is incorrect.
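As a minimal sketch of how such a slip can go unnoticed (the function names numberedCorrect and numberedBuggy are hypothetical, chosen only for illustration), consider two versions of the same do block. In the second, one index is off by one, yet every variable is still used, so the unused variable warning stays silent:

```haskell
-- Intended behaviour: collect the value after each of three increments.
numberedCorrect :: Int -> IO [Int]
numberedCorrect x = do
    let x2 = x + 1
    let x3 = x2 + 1
    let x4 = x3 + 1
    pure [x2, x3, x4]

-- After a careless edit: x4 is computed from x2 instead of x3.
-- x2, x3 and x4 are all still used, so nothing is reported as unused.
numberedBuggy :: Int -> IO [Int]
numberedBuggy x = do
    let x2 = x + 1
    let x3 = x2 + 1
    let x4 = x2 + 1   -- bug: should be x3 + 1
    pure [x2, x3, x4]
```

Loaded into GHCi, numberedCorrect 0 yields [1,2,3] while numberedBuggy 0 yields [1,2,2], and both type check.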

Given the potential errors, when a variable x is morally "changing" in a way that the old x is no longer useful, I find it much simpler to write:

x <- pure myExpression

The compiler now statically ensures we haven't fallen into the traps of an infinite loop (which is obvious but frustrating to track down) or using the wrong data (which is much harder to track down, and often very subtly wrong).
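A small sketch of the pattern applied to a chain of updates, assuming a hypothetical function threeSteps in IO. Each binding shadows the previous x, so the old value cannot be picked up by accident further down:

```haskell
-- Each line rebinds x; the x on the right is always the previous binding,
-- because <- is not recursive in an ordinary do block.
threeSteps :: Int -> IO Int
threeSteps x = do
    x <- pure (x + 1)   -- right-hand x is the function argument
    x <- pure (x + 1)   -- right-hand x is the binding from the line above
    x <- pure (x + 1)
    pure x
```

Here threeSteps 0 returns 3. Compiling with -Wall reports name-shadowing warnings for each rebinding, but the code is accepted.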

What I really want: What I actually think Haskell should have done is make let non-recursive, with a special letrec keyword for recursive bindings (leaving where recursive by default). This distinction is present in GHC Core, and would make let much safer.

What HLint does: HLint is very aware of the <- pure pattern, but also aware that a lot of beginners should be guided towards let. If any variable is defined more than once on the LHS of an <- then it leaves the do alone, otherwise it will suggest let for those where it fits.

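Warnings: In the presence of mdo or do rec both formulations might end up being the same. If the left-hand side is a refutable pattern that fails to match, you change between error (with let, raised when the bound variables are used) and fail (with <-), which might be quite different. Let bindings might be generalised to a polymorphic type, whereas a variable bound with <- is always monomorphic. This pattern gives a warning about shadowed variables with -Wall (from -Wname-shadowing).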
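The refutable pattern difference can be sketched in the Maybe monad, where fail is Nothing. The names viaBind and viaLet are hypothetical, and the let version is expected to draw an incomplete-pattern warning under -Wall:

```haskell
-- With <- pure, a failed match calls fail, which is Nothing in Maybe.
viaBind :: Maybe Int -> Maybe Int
viaBind m = do
    Just y <- pure m
    pure (y + 1)

-- With let, the pattern is matched lazily: the do block still succeeds,
-- and the failure only surfaces as an exception if y is ever forced.
viaLet :: Maybe Int -> Maybe Int
viaLet m = do
    let Just y = m
    pure (y + 1)
```

viaBind Nothing evaluates to Nothing, whereas viaLet Nothing evaluates to a Just whose contents throw a pattern-match exception when evaluated. On a matching input such as Just 1, both return Just 2.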


Joseph C. Sible said...

Re "The one place they aren't fully equivalent is when myExpression contains x within it", there's a second place too: if "x" is a refutable pattern and it doesn't match, the "let =" way will put bottoms in whatever variables were in the pattern, whereas the "<- pure" way will call "fail" instead.

Also, it's worth noting that your technique will cause a compiler error if you use it within "mdo" or "rec" from the RecursiveDo extension.

Justin said...

One main distinction between the two that might be worth noting is polymorphism. The let binding is allowed to stay polymorphic (and will actually be inferred as such, unless MonoLocalBinds is on). The version with <- will necessarily be monomorphic. This behavior can be useful in both ways, as the implicit monomorphization can help with sharing.

Ben Franksen said...

Unfortunately with -Wall ghc gives you a shadowing warning for "x <- pure (x + 1)". The only solution I found is to use a name that starts with an underscore e.g. "_x".

Neil Mitchell said...

Justin: Good point!

Ben: I disable that GHC warning - I find it harmful rather than helpful, and indeed in the process of GHC itself becoming -Wall clean one of the changes introduced a bug much like described above.

Neil Mitchell said...

Thanks for all the points, I've added a warnings section to the bottom of the blog listing them.

Noah said...

`-Wno-name-shadowing` is one of the first things I add after `-Wall -Werror -Wextra`

Vaibhav said...

Thanks for writing this up! I first heard about this technique from a presentation which referenced and it's much better to be able to link to a blog post.

Steven Shaw said...

I agree that it would be nice to have a non-recursive let.

Rather than ignoring all name-shadowing (as some represent bugs), it would be nice to have a way to indicate when you want a new binding to make the previous binding go out of scope. I don't have a proposal but it would be nice.

Ganesh Sittampalam said...

You also can't reveal existential variables with let, but you can with <- pure (since it's actually a case statement).