Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.
An easy way to improve performance is to
call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:
Example 1
f x y = sqrt x + y
result = f 1 2 + f 1 4
Example 2
f x = let sqrt_x = sqrt x in \y -> sqrt_x + y
result = let f_1 = f 1 in f_1 2 + f_1 4
Question
In each example, how many times is
sqrt executed to compute
result? (Assume no advanced optimisations - these often break down on larger examples.)
Answer
In Example 1 we execute
sqrt twice, while in Example 2 we execute
sqrt once. To go from Example 1 to Example 2 we need to make two changes:
- Step 1: Rewrite f to compute sqrt after one argument instead of two.
- Step 2: Rewrite result to share the result of f with one argument.
Performing either rewrite alone will still result in
sqrt being executed twice.
Step 1: Rewriting f
Let's take a look at the original definition of
f:
f x y = sqrt x + y
Rewriting this function in English, we can describe it as:
given x and y, compute sqrt x + y
But the computation of
sqrt x does not depend on
y. If the computation of
sqrt x is expensive, and if we know the function will often be called with the same
x for many different values of
y, it is better to describe it as:
given x, compute sqrt x, then given y, add that value to y
The Haskell syntax for this description is:
f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y
Which would usually be written in the equivalent declaration form as:
f x = let sqrt_x = sqrt x in \y -> sqrt_x + y
Step 2: Using the rewritten f
If we look at the definition of
result:
result = f 1 2 + f 1 4
We see that the subexpression
f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:
result = let f_1 = f 1 in f_1 2 + f_1 4
With the original definition of
f, commoning up
f 1 would have had no performance benefit - after
f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of
f, the value
f_1 will create the computation of
sqrt 1, which will be performed only once when executed by
f_1 2 and
f_1 4.
The Optimisation
This optimisation technique can be described as:
- Step 1: Rewrite the function to perform some computation before all arguments are supplied.
- Step 2: Share the partially applied function.
Crucially the function in Step 1 must take it's arguments in an order that allows computation to be performed incrementally.
A Practical Example
In previous versions of
Hoogle, the function I wrote to resolve type synonyms (e.g.
type String = [Char]) was:
resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig
Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:
resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x
This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that
resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:
data SynonymTable
buildSynonymTable :: [Synonym] -> SynonymTable
resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig
The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.
Conclusion
I think all Haskell programmers benefit from understand how the interaction of lambda and let affect sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public API's that rely on partial application to get acceptable performance - it's too hard to discover.