Monday, April 29, 2019

foldr under the hood

Summary: The foldr function seems simple, but is actually very complex, with lots of layers. This post dives through the layers.

The foldr function takes a list and replaces all : (cons) and [] (nil) values with functions and a final value. It's available in the Haskell Prelude and described on Wikipedia. As some examples:

sum = foldr (+) 0
map f = foldr (\x xs -> f x : xs) []

But the simple foldr described on Wikipedia is many steps away from the one in the Haskell Prelude. In this post we'll peel back the layers, learning why foldr is a lot more complicated under the hood.

Layer 1: Wikipedia definition

The definition on Wikipedia is:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

This recursive definition directly describes what foldr does. Given a list [1,2,3] we get f 1 (f 2 (f 3 z)).

Layer 2: Static argument transformation

The problem with this definition is that it is recursive, and GHC doesn't like to inline recursive functions, which prevents a lot of optimisation. Taking a look at sum, it's a real shame that operations like (+) are passed as opaque higher-order functions, rather than specialised to the machine instruction ADD. To solve that problem, GHC defines foldr as:

foldr f z = go
    where go []     = z
          go (x:xs) = f x (go xs)

The arguments f and z are constant in all sucessive calls, so they are lifted out with a manually applied static argument transformation.

Now the function foldr is no longer recursive (it merely has a where that is recursive), so foldr can be inlined, and now + can meet up with go and everything can be nicely optimised.

Layer 3: Inline later

We now have foldr that can be inlined. However, inlining foldr is not always a good idea. In particular, GHC has an optimisation called list fusion based on the idea that combinations of foldr and build can be merged, sometimes known as short-cut deforestation. The basic idea is that if we see foldr applied to build we can get rid of both (see this post for details). We remove foldr using the GHC rewrite rule:

{-# RULES "my foldr/build" forall g k z. foldr k z (build g) = g k z #-}

The most interesting thing about this rule (for this post at least!) is that it matches foldr by name. Once we've inlined foldr we have thrown away the name, and the rule can't fire anymore. Since this rule gives significant speedups, we really want it to fire, so GHC adds an extra pragma to foldr:

{-# INLINE [0] foldr #-}

This INLINE pragma says don't try and inline foldr until the final stage of the compiler, but in that final stage, be very keen to inline it.

Layer 4: More polymorphism

However, the foldr function in the Prelude is not the one from GHC.List, but actually a more general one that works for anything Foldable. Why limit yourself to folding over lists, when you can fold over other types like Set. So now foldr is generailsed from [] to t with:

foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

Where foldr on [] is GHC.List.foldr.

Layer 5: A default implementation

But foldr is actually in the type class Foldable, not just defined on the outside. Users defining Foldable can define only foldr and have all the other methods defined for them. But they can equally define only foldMap, and have an implicit version of foldr defined as:

foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo . f) t) z

Where Endo is defined as:

newtype Endo = Endo {appEndo :: a -> a}

instance Monoid (Endo a) where
    mempty = Endo id
    Endo a <> Endo b = Endo (a . b)

The function foldMap f is equivalent to mconcat . map f, so given a list [1,2,3] the steps are:

  • First apply map (Endo . f) to each element to get [Endo (f 1), Endo (f 2), Endo (f 3)].
  • Next apply mconcat to the list to get Endo (f 1) <> Endo (f 2) <> Endo (f 3).
  • Inline all the <> definitions to get Endo (f 1 . f 2 . f 3).
  • Apply the appEndo at the beginning and z at the end for (f 1 . f 2 . f 3) z.
  • Inline all the . to give f 1 (f 2 (f 3 z)), which is what we had at layer 1.

Layer 6: Optimising the default implementation

The real default implementation of foldr is:

foldr f z t = appEndo (foldMap (Endo #. f) t) z

Note that the . after Endo has become #.. Let's first explain why it's correct, then why it might be beneficial. The definition of #. is:

(#.) :: Coercible b c => (b -> c) -> (a -> b) -> (a -> c)
(#.) _ = coerce

Note that it has the same type as . (plus a Coercible constraint), but ignores it's first argument entirely. The coerce function transforms a value of type a into a value of type b with zero runtime cost, provided they have the same underlying representation. Since Endo is a newtype, that means Endo (f 1) and f 1 are implemented identically in the runtime, so coerce switches representation "for free". Note that the first argument to #. only serves to pin down the types, so if we'd passed an interesting function as the first argument it would have been ignored.

Of course, in normal circumstances, a newtype is free anyway, with no runtime cost. However, in this case we don't have a newtype, but a function application with a newtype. You can see the gory details in GHC ticket 7542, but at one point this impeeded other optimisations.

I tried a simulated version of foldr and found that if GHC can't tell that [] is the Foldable the code looks pretty bad, but if it can, at -O1 and above, the two implementations are 100% equialent (to the point that common subexpression elimination makes them actually the same). It's possible this final layer is a vestigial optimisation, or perhaps it's still important in some circumstances.

1 comment:

George said...

Excellent! Helps bridge the gap between textbooks and high performance implementation without which Haskell is a non-starter