For the last few days I have been thinking about how to write a low-level program optimiser, based on the ideas from

Supero. Supero works at the level of a lazy Core expressions, but actual hardware works on a sequence of strict instructions. The possible idea is to translate the lazy expressions to strict sequences, then borrow the ideas from supercompilation once more. In particular I have been looking at the

GRIN approach, which defines such a set of instructions.

The GRIN work is very clever, and has many ideas that I would like to reuse. However, the one aspect that gave me slight concern is the complexity. A GRIN program requires the use of several analysis passes, and many many transformation rules. While this approach is perfectly acceptable, one of the goals of the Supero work is to make the optimisation process simpler -- comprising of a few simple but powerful rules.

I will first explain how strictness works, then how my speculative approach works. Readers who already know about unboxing are encouraged to skip to the speculative section.

**Strictness**When doing low-level compilation, one of the most important stages is strictness analysis, and the associated unboxing. To take the example of the

`factorial` function in Haskell:

factorial :: Int -> Int

factorial n = if n > 0 then n * factorial (n-1) else 1

Here it is easy to see that the

`factorial` function always evaluates

`n`. We can also use our knowledge of the definition of

`Int`:

data Int = Int# I#

Where

`I#` is an actual machine integer (possibly stored in a register), and

`Int#` is a lazy box surrounding it. Since we know that

`factorial` will always unwrap our

`n`, we can pass the

`n` around without the

`Int#` box. I have made all the conversions from

`I#` to

`Int` explicit using an

`Int#`, but have left all the unboxings implicit. The operators

`>#` etc. are simply unboxed and strict variants of the standard operators.

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1

Also, since we know

`factorial` is strict in its first argument, we can evaluate the first argument to the recursive call strictly. Applying all these optimisations can now write:

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (n# -# 1) else 1

We have removed the explicit boxing in the recursive call, and work entirely with unboxed integers. Now

`factorial` is entirely strict. We can even write a wrapper around our strict version, to provide a lazy interface matching the original.

factorial :: Int -> Int

factorial n = factorial# n#

I have used

`n#` to denote the unboxing of

`n`. Now

`factorial` looks like it did before, but operates much faster, on unboxed integers.

**Speculative**I would like to not include a strictness analyser in my optimiser, or if it is included, have it be the result of a series of transformations -- without explicit "stop and analyse" then "use the results" stages. As part of my thoughts on this, I was trying to consider how to optimise

`factorial` without invoking the strictness analyser.

The speculative transformation I have defined first generates

`factorial#` - I have left out the details of

*why* it decides to.

factorial :: Int -> Int

factorial n = if n > 0 then n * factorial (n-1) else 1

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1

This step is entirely safe - we have defined

`factorial#`, but we have not written a wrapper that invokes it, even in the recursive case. The

`factorial#` function is equivalent to

`factorial` if the initial argument was evaluated. We have transformed

`factorial#` using only local knowledge, at the point. We can also transform

`factorial`, replacing any uses of

`n` which are guaranteed to come after

`n` is evaluated, with

`(Int# n#)`. This transformation is merely reusing the knowledge we have gained unwrapping

`n`:

factorial n = if n > 0 then Int# n# * factorial (Int# n# - 1) else 1

Now we promote any primitive operations on only unboxed values. Given

`(-)`, it is cheaper to evalute the subtraction than to store a lazy thunk to the function.

factorial n = if n > 0 then Int# n# * factorial (Int# (n# -# 1)) else 1

factorial# n# = if n# ># 0 then n# *# factorial (Int# (n# -# 1)) else 1

We can now use our knowledge that if we know an argument to a function is already evaluated, we can call the strict variant (this corresponds closely to

constructor specialisation):

factorial n = if n > 0 then n# *# factorial# (n# -# 1) else 1

factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1

We can also replace the

`*` in

`factorial` with

`*#` as we know we will have to evaluate the result of a function. Now we have ended up with a fast inner loop, operating only on unboxed integers. We have not required strictness information to make any transformation.

One way of viewing the difference between strictness and this transformation is the flow of information. In strictness, the caller is informed that a particular argument will be evaluated. In speculative, the callee informs the caller that an argument has already been evaluated. These two concepts are not the same, and while they overlap, there are instances where they differ considerably.

**Strict vs. Speculative**Consider the following example:

strict :: Int -> Int

strict x = x `seq` lazy x (x-1) (x+1)

lazy :: Int -> Int -> Int -> Int

lazy a b c = if a == 0 then b else c

Here the

`lazy` function is strict in

`a`, but not either of

`b` or

`c`. A strictness analyser would generate a variant of

`lazy` with only the first argument unboxed. In contrast the speculative variant will determine that

`x-1` and

`x+1` should be evaluated, and pass unboxed values in all arguments of

`lazy`, even though

`lazy` may not evaluate

`b` or

`c`.

To see this behaviour in GHC, it helps to make

`lazy` recursive:

module Temp where

strict :: Int -> Int

strict x = x `seq` lazy x (x+1) (x-1)

lazy :: Int -> Int -> Int -> Int

lazy a b c = if a == 0 then lazy b b b else c

Now run with the options

`ghc Temp.hs -c -O2 -ddump-simpl`, and you will see the

`lazy` variant has type

`lazy :: Int# -> Int -> Int -> Int`.

These thoughts are still very preliminary, and there are a number of unanswered questions:

- What is the overlap between strict and speculative?

- Can both variants be combined? (almost certainly yes)

- Is speculative really simpler?

- Is speculative sufficient?

- What are the performance benefits of speculative?