For the last few days I have been thinking about how to write a low-level program optimiser, based on the ideas from
Supero. Supero works at the level of a lazy Core expressions, but actual hardware works on a sequence of strict instructions. The possible idea is to translate the lazy expressions to strict sequences, then borrow the ideas from supercompilation once more. In particular I have been looking at the
GRIN approach, which defines such a set of instructions.
The GRIN work is very clever, and has many ideas that I would like to reuse. However, the one aspect that gave me slight concern is the complexity. A GRIN program requires the use of several analysis passes, and many many transformation rules. While this approach is perfectly acceptable, one of the goals of the Supero work is to make the optimisation process simpler -- comprising of a few simple but powerful rules.
I will first explain how strictness works, then how my speculative approach works. Readers who already know about unboxing are encouraged to skip to the speculative section.
StrictnessWhen doing low-level compilation, one of the most important stages is strictness analysis, and the associated unboxing. To take the example of the
factorial function in Haskell:
factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1
Here it is easy to see that the
factorial function always evaluates
n. We can also use our knowledge of the definition of
Int:
data Int = Int# I#
Where
I# is an actual machine integer (possibly stored in a register), and
Int# is a lazy box surrounding it. Since we know that
factorial will always unwrap our
n, we can pass the
n around without the
Int# box. I have made all the conversions from
I# to
Int explicit using an
Int#, but have left all the unboxings implicit. The operators
># etc. are simply unboxed and strict variants of the standard operators.
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1
Also, since we know
factorial is strict in its first argument, we can evaluate the first argument to the recursive call strictly. Applying all these optimisations can now write:
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (n# -# 1) else 1
We have removed the explicit boxing in the recursive call, and work entirely with unboxed integers. Now
factorial is entirely strict. We can even write a wrapper around our strict version, to provide a lazy interface matching the original.
factorial :: Int -> Int
factorial n = factorial# n#
I have used
n# to denote the unboxing of
n. Now
factorial looks like it did before, but operates much faster, on unboxed integers.
SpeculativeI would like to not include a strictness analyser in my optimiser, or if it is included, have it be the result of a series of transformations -- without explicit "stop and analyse" then "use the results" stages. As part of my thoughts on this, I was trying to consider how to optimise
factorial without invoking the strictness analyser.
The speculative transformation I have defined first generates
factorial# - I have left out the details of
why it decides to.
factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1
This step is entirely safe - we have defined
factorial#, but we have not written a wrapper that invokes it, even in the recursive case. The
factorial# function is equivalent to
factorial if the initial argument was evaluated. We have transformed
factorial# using only local knowledge, at the point. We can also transform
factorial, replacing any uses of
n which are guaranteed to come after
n is evaluated, with
(Int# n#). This transformation is merely reusing the knowledge we have gained unwrapping
n:
factorial n = if n > 0 then Int# n# * factorial (Int# n# - 1) else 1
Now we promote any primitive operations on only unboxed values. Given
(-), it is cheaper to evalute the subtraction than to store a lazy thunk to the function.
factorial n = if n > 0 then Int# n# * factorial (Int# (n# -# 1)) else 1
factorial# n# = if n# ># 0 then n# *# factorial (Int# (n# -# 1)) else 1
We can now use our knowledge that if we know an argument to a function is already evaluated, we can call the strict variant (this corresponds closely to
constructor specialisation):
factorial n = if n > 0 then n# *# factorial# (n# -# 1) else 1
factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1
We can also replace the
* in
factorial with
*# as we know we will have to evaluate the result of a function. Now we have ended up with a fast inner loop, operating only on unboxed integers. We have not required strictness information to make any transformation.
One way of viewing the difference between strictness and this transformation is the flow of information. In strictness, the caller is informed that a particular argument will be evaluated. In speculative, the callee informs the caller that an argument has already been evaluated. These two concepts are not the same, and while they overlap, there are instances where they differ considerably.
Strict vs. SpeculativeConsider the following example:
strict :: Int -> Int
strict x = x `seq` lazy x (x-1) (x+1)
lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then b else c
Here the
lazy function is strict in
a, but not either of
b or
c. A strictness analyser would generate a variant of
lazy with only the first argument unboxed. In contrast the speculative variant will determine that
x-1 and
x+1 should be evaluated, and pass unboxed values in all arguments of
lazy, even though
lazy may not evaluate
b or
c.
To see this behaviour in GHC, it helps to make
lazy recursive:
module Temp where
strict :: Int -> Int
strict x = x `seq` lazy x (x+1) (x-1)
lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then lazy b b b else c
Now run with the options
ghc Temp.hs -c -O2 -ddump-simpl, and you will see the
lazy variant has type
lazy :: Int# -> Int -> Int -> Int.
These thoughts are still very preliminary, and there are a number of unanswered questions:
- What is the overlap between strict and speculative?
- Can both variants be combined? (almost certainly yes)
- Is speculative really simpler?
- Is speculative sufficient?
- What are the performance benefits of speculative?