## Monday, April 29, 2019

### foldr under the hood

Summary: The `foldr` function seems simple, but is actually very complex, with lots of layers. This post dives through the layers.

The `foldr` function takes a list and replaces all `:` (cons) and `[]` (nil) values with functions and a final value. It's available in the Haskell `Prelude` and described on Wikipedia. As some examples:

``````sum = foldr (+) 0
map f = foldr (\x xs -> f x : xs) []
``````

But the simple `foldr` described on Wikipedia is many steps away from the one in the Haskell `Prelude`. In this post we'll peel back the layers, learning why `foldr` is a lot more complicated under the hood.

Layer 1: Wikipedia definition

The definition on Wikipedia is:

``````foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)
``````

This recursive definition directly describes what `foldr` does. Given a list `[1,2,3]` we get `f 1 (f 2 (f 3 z))`.

Layer 2: Static argument transformation

The problem with this definition is that it is recursive, and GHC doesn't like to inline recursive functions, which prevents a lot of optimisation. Taking a look at `sum`, it's a real shame that operations like `(+)` are passed as opaque higher-order functions, rather than specialised to the machine instruction `ADD`. To solve that problem, GHC defines `foldr` as:

``````foldr f z = go
where go []     = z
go (x:xs) = f x (go xs)
``````

The arguments `f` and `z` are constant in all sucessive calls, so they are lifted out with a manually applied static argument transformation.

Now the function `foldr` is no longer recursive (it merely has a `where` that is recursive), so `foldr` can be inlined, and now `+` can meet up with `go` and everything can be nicely optimised.

Layer 3: Inline later

We now have `foldr` that can be inlined. However, inlining `foldr` is not always a good idea. In particular, GHC has an optimisation called list fusion based on the idea that combinations of `foldr` and `build` can be merged, sometimes known as short-cut deforestation. The basic idea is that if we see `foldr` applied to `build` we can get rid of both (see this post for details). We remove `foldr` using the GHC rewrite rule:

``````{-# RULES "my foldr/build" forall g k z. foldr k z (build g) = g k z #-}
``````

The most interesting thing about this rule (for this post at least!) is that it matches `foldr` by name. Once we've inlined `foldr` we have thrown away the name, and the rule can't fire anymore. Since this rule gives significant speedups, we really want it to fire, so GHC adds an extra pragma to `foldr`:

``````{-# INLINE  foldr #-}
``````

This `INLINE` pragma says don't try and inline `foldr` until the final stage of the compiler, but in that final stage, be very keen to inline it.

Layer 4: More polymorphism

However, the `foldr` function in the `Prelude` is not the one from `GHC.List`, but actually a more general one that works for anything `Foldable`. Why limit yourself to folding over lists, when you can fold over other types like `Set`. So now `foldr` is generailsed from `[]` to `t` with:

``````foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
``````

Where `foldr` on `[]` is `GHC.List.foldr`.

Layer 5: A default implementation

But `foldr` is actually in the type class `Foldable`, not just defined on the outside. Users defining `Foldable` can define only `foldr` and have all the other methods defined for them. But they can equally define only `foldMap`, and have an implicit version of `foldr` defined as:

``````foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo . f) t) z
``````

Where `Endo` is defined as:

``````newtype Endo = Endo {appEndo :: a -> a}

instance Monoid (Endo a) where
mempty = Endo id
Endo a <> Endo b = Endo (a . b)
``````

The function `foldMap f` is equivalent to `mconcat . map f`, so given a list `[1,2,3]` the steps are:

• First apply `map (Endo . f)` to each element to get `[Endo (f 1), Endo (f 2), Endo (f 3)]`.
• Next apply `mconcat` to the list to get `Endo (f 1) <> Endo (f 2) <> Endo (f 3)`.
• Inline all the `<>` definitions to get `Endo (f 1 . f 2 . f 3)`.
• Apply the `appEndo` at the beginning and `z` at the end for `(f 1 . f 2 . f 3) z`.
• Inline all the `.` to give `f 1 (f 2 (f 3 z))`, which is what we had at layer 1.

Layer 6: Optimising the default implementation

The real default implementation of `foldr` is:

``````foldr f z t = appEndo (foldMap (Endo #. f) t) z
``````

Note that the `.` after `Endo` has become `#.`. Let's first explain why it's correct, then why it might be beneficial. The definition of `#.` is:

``````(#.) :: Coercible b c => (b -> c) -> (a -> b) -> (a -> c)
(#.) _ = coerce
``````

Note that it has the same type as `.` (plus a `Coercible` constraint), but ignores it's first argument entirely. The `coerce` function transforms a value of type `a` into a value of type `b` with zero runtime cost, provided they have the same underlying representation. Since `Endo` is a `newtype`, that means `Endo (f 1)` and `f 1` are implemented identically in the runtime, so `coerce` switches representation "for free". Note that the first argument to `#.` only serves to pin down the types, so if we'd passed an interesting function as the first argument it would have been ignored.

Of course, in normal circumstances, a `newtype` is free anyway, with no runtime cost. However, in this case we don't have a `newtype`, but a function application with a `newtype`. You can see the gory details in GHC ticket 7542, but at one point this impeeded other optimisations.

I tried a simulated version of `foldr` and found that if GHC can't tell that `[]` is the `Foldable` the code looks pretty bad, but if it can, at `-O1` and above, the two implementations are 100% equivalent (to the point that common subexpression elimination makes them actually the same). It's possible this final layer is a vestigial optimisation, or perhaps it's still important in some circumstances. George said...

Excellent! Helps bridge the gap between textbooks and high performance implementation without which Haskell is a non-starter Anonymous said...

typo: equialent -> equivalent

Neil Mitchell said...

Anon: Thanks! Fixed.