The GRIN work is very clever, and has many ideas that I would like to reuse. However, the one aspect that gave me slight concern is the complexity. A GRIN program requires the use of several analysis passes, and many many transformation rules. While this approach is perfectly acceptable, one of the goals of the Supero work is to make the optimisation process simpler -- comprising of a few simple but powerful rules.
I will first explain how strictness works, then how my speculative approach works. Readers who already know about unboxing are encouraged to skip to the speculative section.
Strictness
When doing low-level compilation, one of the most important stages is strictness analysis, and the associated unboxing. To take the example of the factorial function in Haskell:
factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1
Here it is easy to see that the factorial function always evaluates n. We can also use our knowledge of the definition of Int:
data Int = Int# I#
Where I# is an actual machine integer (possibly stored in a register), and Int# is a lazy box surrounding it. Since we know that factorial will always unwrap our n, we can pass the n around without the Int# box. I have made all the conversions from I# to Int explicit using an Int#, but have left all the unboxings implicit. The operators ># etc. are simply unboxed and strict variants of the standard operators.
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1
Also, since we know factorial is strict in its first argument, we can evaluate the first argument to the recursive call strictly. Applying all these optimisations can now write:
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (n# -# 1) else 1
We have removed the explicit boxing in the recursive call, and work entirely with unboxed integers. Now factorial is entirely strict. We can even write a wrapper around our strict version, to provide a lazy interface matching the original.
factorial :: Int -> Int
factorial n = factorial# n#
I have used n# to denote the unboxing of n. Now factorial looks like it did before, but operates much faster, on unboxed integers.
Speculative
I would like to not include a strictness analyser in my optimiser, or if it is included, have it be the result of a series of transformations -- without explicit "stop and analyse" then "use the results" stages. As part of my thoughts on this, I was trying to consider how to optimise factorial without invoking the strictness analyser.
The speculative transformation I have defined first generates factorial# - I have left out the details of why it decides to.
factorial :: Int -> Int
factorial n = if n > 0 then n * factorial (n-1) else 1
factorial# :: Int# -> Int
factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1
This step is entirely safe - we have defined factorial#, but we have not written a wrapper that invokes it, even in the recursive case. The factorial# function is equivalent to factorial if the initial argument was evaluated. We have transformed factorial# using only local knowledge, at the point. We can also transform factorial, replacing any uses of n which are guaranteed to come after n is evaluated, with (Int# n#). This transformation is merely reusing the knowledge we have gained unwrapping n:
factorial n = if n > 0 then Int# n# * factorial (Int# n# - 1) else 1
Now we promote any primitive operations on only unboxed values. Given (-), it is cheaper to evalute the subtraction than to store a lazy thunk to the function.
factorial n = if n > 0 then Int# n# * factorial (Int# (n# -# 1)) else 1
factorial# n# = if n# ># 0 then n# *# factorial (Int# (n# -# 1)) else 1
We can now use our knowledge that if we know an argument to a function is already evaluated, we can call the strict variant (this corresponds closely to constructor specialisation):
factorial n = if n > 0 then n# *# factorial# (n# -# 1) else 1
factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1
We can also replace the * in factorial with *# as we know we will have to evaluate the result of a function. Now we have ended up with a fast inner loop, operating only on unboxed integers. We have not required strictness information to make any transformation.
One way of viewing the difference between strictness and this transformation is the flow of information. In strictness, the caller is informed that a particular argument will be evaluated. In speculative, the callee informs the caller that an argument has already been evaluated. These two concepts are not the same, and while they overlap, there are instances where they differ considerably.
Strict vs. Speculative
Consider the following example:
strict :: Int -> Int
strict x = x `seq` lazy x (x-1) (x+1)
lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then b else c
Here the lazy function is strict in a, but not either of b or c. A strictness analyser would generate a variant of lazy with only the first argument unboxed. In contrast the speculative variant will determine that x-1 and x+1 should be evaluated, and pass unboxed values in all arguments of lazy, even though lazy may not evaluate b or c.
To see this behaviour in GHC, it helps to make lazy recursive:
module Temp where
strict :: Int -> Int
strict x = x `seq` lazy x (x+1) (x-1)
lazy :: Int -> Int -> Int -> Int
lazy a b c = if a == 0 then lazy b b b else c
Now run with the options ghc Temp.hs -c -O2 -ddump-simpl, and you will see the lazy variant has type lazy :: Int# -> Int -> Int -> Int.
These thoughts are still very preliminary, and there are a number of unanswered questions:
- What is the overlap between strict and speculative?
- Can both variants be combined? (almost certainly yes)
- Is speculative really simpler?
- Is speculative sufficient?
- What are the performance benefits of speculative?
The "speculative" way was the only thing the the early versions of the LML compiler had. It takes you part of the way, but not all the way.
ReplyDeleteTry factorial with an accumulator. The tricky part is to realize that it is strict in the accumulator.
augustss: Very interesting to see this has been done before. factorial with an accumulator still works, assuming:
ReplyDeletefac n = fac' 0 n
fac' a n = if n > 0 then fac' (a*n) (n-1) else a
Here the 0 will be recognised as already evaluated, so the accumulator will be unboxed in fac'. The second recursive call to fac' will then call a variant with both arguments unboxed. Using _ as unboxed and # as boxed:
fac_ n = fac'#_ a n
fac'#_ a n = ... fac'## ...
fac'## a n = ... fac'## ...
Now we have a tight unboxed loop.
Neil, afaics almost all of your factorial# functions have type-errors in them. The types signature says that they should return a boxed Int. However, in the recursive case you use unboxed multiplication. Furthermore you use unboxed multiplication on unboxed integers coming from the factorial function.
ReplyDeleteAlso, in some functions you refer to n# without it being bound. The following pattern should solve that problem: n@(Int# n#)
Sorry for being such a nit pick but I think that fixing these errors would make it easier to follow your ideas.
Josef: The return types are rather curious - since all functions must evaluate to WHNF before returning, there actually isn't any difference between returning Int and Int# at this level. I have stated that conversions from Int to Int# are implicit, to kind of pass over this issue without going into it too much.
ReplyDeleteThe reason I also avoided explicitly mentioning how n# is constructed is because it must be due to a case on n, and the code gets rather messy very quickly. You can't do it at the start, because that requires strictness information - you actually have to use the case n of Int# n#, then all subsequent n's can be rewritten, but its tricky.
But what if I want to use fac', so you can't guarantee that the accumulator will always be an already evaluated number?
ReplyDeleteaugustss: If you use a non-evaluated accumulator, then the laziness will leak all the way through every recursive step, which would be a real shame, and perform a lot worse. I can't think of any way of getting that back without strictness analysis.
ReplyDeleteOf course, if the program gains this information from supercompilation, the chances of the accumulator being non-strict are relatively low.