Neil Mitchell's Blog (Haskell etc): September 2011

Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.

An easy way to improve performance is to call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:

Example 1


f x y = sqrt x + y

result = f 1 2 + f 1 4

Example 2


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

result = let f_1 = f 1 in f_1 2 + f_1 4

Question

In each example, how many times is sqrt executed to compute result? (Assume no advanced optimisations - these often break down on larger examples.)

Answer

In Example 1 we execute sqrt twice, while in Example 2 we execute sqrt once. To go from Example 1 to Example 2 we need to make two changes:

Step 1: Rewrite f to compute sqrt after one argument instead of two.

Step 2: Rewrite result to share the result of f with one argument.

Performing either rewrite alone will still result in sqrt being executed twice.

Step 1: Rewriting f

Let's take a look at the original definition of f:


f x y = sqrt x + y

Rewriting this function in English, we can describe it as:


given x and y, compute sqrt x + y

But the computation of sqrt x does not depend on y. If the computation of sqrt x is expensive, and if we know the function will often be called with the same x for many different values of y, it is better to describe it as:


given x, compute sqrt x, then given y, add that value to y

The Haskell syntax for this description is:


f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y

Which would usually be written in the equivalent declaration form as:


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

Step 2: Using the rewritten f

If we look at the definition of result:


result = f 1 2 + f 1 4

We see that the subexpression f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:


result = let f_1 = f 1 in f_1 2 + f_1 4

With the original definition of f, commoning up f 1 would have had no performance benefit - after f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of f, the value f_1 will create the computation of sqrt 1, which will be performed only once when executed by f_1 2 and f_1 4.

The Optimisation

This optimisation technique can be described as:

Step 1: Rewrite the function to perform some computation before all arguments are supplied.

Step 2: Share the partially applied function.

Crucially the function in Step 1 must take it's arguments in an order that allows computation to be performed incrementally.

A Practical Example

In previous versions of Hoogle, the function I wrote to resolve type synonyms (e.g. type String = [Char]) was:


resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig

Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:


resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x

This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:


data SynonymTable

buildSynonymTable :: [Synonym] -> SynonymTable

resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig

The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.

Conclusion

I think all Haskell programmers benefit from understand how the interaction of lambda and let affect sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public API's that rely on partial application to get acceptable performance - it's too hard to discover.

Neil Mitchell's Blog (Haskell etc)

Tuesday, September 06, 2011

Faster WPF ComboBoxes

Sunday, September 04, 2011

Sharing in Haskell