Sunday, September 04, 2011

Sharing in Haskell

Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.

An easy way to improve performance is to call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:

Example 1

f x y = sqrt x + y
result = f 1 2 + f 1 4

Example 2

f x = let sqrt_x = sqrt x in \y -> sqrt_x + y
result = let f_1 = f 1 in f_1 2 + f_1 4


In each example, how many times is sqrt executed to compute result? (Assume no advanced optimisations - these often break down on larger examples.)


In Example 1 we execute sqrt twice, while in Example 2 we execute sqrt once. To go from Example 1 to Example 2 we need to make two changes:

  • Step 1: Rewrite f to compute sqrt after one argument instead of two.

  • Step 2: Rewrite result to share the result of f with one argument.

Performing either rewrite alone will still result in sqrt being executed twice.

Step 1: Rewriting f

Let's take a look at the original definition of f:

f x y = sqrt x + y

Rewriting this function in English, we can describe it as:

given x and y, compute sqrt x + y

But the computation of sqrt x does not depend on y. If the computation of sqrt x is expensive, and if we know the function will often be called with the same x for many different values of y, it is better to describe it as:

given x, compute sqrt x, then given y, add that value to y

The Haskell syntax for this description is:

f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y

Which would usually be written in the equivalent declaration form as:

f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

Step 2: Using the rewritten f

If we look at the definition of result:

result = f 1 2 + f 1 4

We see that the subexpression f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:

result = let f_1 = f 1 in f_1 2 + f_1 4

With the original definition of f, commoning up f 1 would have had no performance benefit - after f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of f, the value f_1 will create the computation of sqrt 1, which will be performed only once when executed by f_1 2 and f_1 4.

The Optimisation

This optimisation technique can be described as:

  • Step 1: Rewrite the function to perform some computation before all arguments are supplied.

  • Step 2: Share the partially applied function.

Crucially the function in Step 1 must take it's arguments in an order that allows computation to be performed incrementally.

A Practical Example

In previous versions of Hoogle, the function I wrote to resolve type synonyms (e.g. type String = [Char]) was:

resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig

Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:

resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x

This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:

data SynonymTable
buildSynonymTable :: [Synonym] -> SynonymTable
resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig

The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.


I think all Haskell programmers benefit from understand how the interaction of lambda and let affect sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public API's that rely on partial application to get acceptable performance - it's too hard to discover.


Syzygies2 said...

C programmers coming to Haskell often clutch their tennis rackets so hard that their fingers bleed, when they should instead be visualizing the tournament as a whole. Do you really want them to know that these optimizations exist?

Like that old saw about how we all have gills in the womb, do programmers born cleanly into a functional world still have to relive a C phase? I love your blog posts, but is this your C phase?

Seriously, I'm old enough to remember compilers that would write to a register the value that was already there. My work-arounds on an early open-source project managed to break other compilers. The correct response is instead to write clean code and outlive the era.

Haskell's strength is how fast it runs in one's brain, not how fast it runs on current hardware. These code transformations trade performance the wrong way. Wait a few years, and GHC will manage these optimizations just fine, even on large examples.

Mitch Skinner said...

I thought you were big on supercompilation, so I'm a little surprised to see you advocating manual transformations. Or is the suggestion in this post informed by your experience with the difficulties of getting the compiler to do the right thing?

Question: on your hoogle example, did you have to do the manual transformation in order to get the sharing? GHC wouldn't recognize it otherwise?

Norm 2782 said...

Why do we need to introduce a lambda in f, instead of just defining it as

f x y = let sqrt_x = sqrt x in sqrt_x + y

Type-wise, they are equivalent. Doesn't GHC just share the partially applied function?

jberryman said...

Like Norm, I'm also wondering why the compiler can't optimize 'f', or why it can't automatically rewrite pure functions in that cascading let / lambda form.

Paul L said...

Example 1 looks very familiar, did you take it from Sinot's paper "Complete Laziness: A Natural Semantics"?
Lazy evaluation is all about sharing.

george said...

If the f 1 2 and f 1 4 happen in parallel as diferent threads, I think there is a small window in GHC, during which both will try to evaluate the sqrt_x thunk and could possibly evaluate it twice, no? (page 4 in Haskell on a Shared Memory Multiprocessor)

Neil Mitchell said...

I believe it is the programmers job to express the algorithmic complexity (O(n) vs O(n^2)) and the compilers job to reduce the constant associated with that algorithm. Supercompilation and GHC can sometimes reduce the complexity, but I don't think you should ever rely on it. (Note that the example, for simplicity, was a single reduction of sqrt - but the same principle holds if you compare [f 1 x | x <- [1..n]] to let f_1 = f 1 in [f_1 x | x <- [1..n]], which is an n to 1 reduction in the sqrt operations.)

Syzygies2: Don't worry, I'm still all about high-level programming :). But I consider this the equivalent of things like connecting to a database to get a list of months in an inner loop - you should always lift those operations out of the loop in C, and you should equally do it in Haskell.

Mitch: For the Hoogle example I wrote it that way because I wanted to express where the sharing existed - it was a statement of semantics rather than optimisation - "I only want this to happen once". The fact that I only want it to happen once is for performance concerns is a separate issue.

Norm/jberryman: GHC might make them equivalent, but it certainly doesn't have to, and I'd say that if you rely on GHC you've got a bug. For example, it's unlikely to at -O0, so if you ever try and run your program through a debugger it will become take forever.

george: You're right :) - one of the trade-offs the GHC developers made - but you could argue they introduced a bug compared to the Haskell spec (if the spec documented sharing, which it doesn't, but by convention everyone agrees what it mostly would be).

Singpolyma said...

Does this also work with "where" clauses? ie:

f x = (\y -> sqrt_x + y)
sqrt_x = sqrt x

or even:

f x = (sqrt_x+)
sqrt_x = sqrt x

Neil Mitchell said...

Singpolyma: Yes, your functions are both exactly equivalent to:

f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

Anonymous said...

This problem but me hard recently. I made a simple refactor to a program that changed it from O(1) to O(n^2).
(Simplifying, so I might mess up the details.):


n = 5

pns = p [1..n*n]

f x = g pns x

main = print $ iterate f x0 !! 10

When n is constant, f computes pns once. But then I changed

pns n = p [1..n*n]

and took n as input in main.

Bam, no more sharing.

BTW, the way I discovered this was that I noticed my Debug.Trace.trace calls in f were printing output n^2 times as I changed n.

I can imagine this is hard to detect at compile time, since it isn't obvious whether g will call pns many times. But a runtime profiler should be possible, by checking whether a function is being called repeatedly with Equal but not same (um, thunk?)