The GRIN work is very clever, and has many ideas that I would like to reuse. However, the one aspect that gave me slight concern is the complexity. A GRIN program requires the use of several analysis passes, and many many transformation rules. While this approach is perfectly acceptable, one of the goals of the Supero work is to make the optimisation process simpler -- comprising of a few simple but powerful rules.

I will first explain how strictness works, then how my speculative approach works. Readers who already know about unboxing are encouraged to skip to the speculative section.

**Strictness**

When doing low-level compilation, one of the most important stages is strictness analysis, and the associated unboxing. To take the example of the

`factorial`function in Haskell:

factorial :: Int -> Int

factorial n = if n > 0 then n * factorial (n-1) else 1

Here it is easy to see that the

`factorial`function always evaluates

`n`. We can also use our knowledge of the definition of

`Int`:

data Int = Int# I#

Where

`I#`is an actual machine integer (possibly stored in a register), and

`Int#`is a lazy box surrounding it. Since we know that

`factorial`will always unwrap our

`n`, we can pass the

`n`around without the

`Int#`box. I have made all the conversions from

`I#`to

`Int`explicit using an

`Int#`, but have left all the unboxings implicit. The operators

`>#`etc. are simply unboxed and strict variants of the standard operators.

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1

Also, since we know

`factorial`is strict in its first argument, we can evaluate the first argument to the recursive call strictly. Applying all these optimisations can now write:

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (n# -# 1) else 1

We have removed the explicit boxing in the recursive call, and work entirely with unboxed integers. Now

`factorial`is entirely strict. We can even write a wrapper around our strict version, to provide a lazy interface matching the original.

factorial :: Int -> Int

factorial n = factorial# n#

I have used

`n#`to denote the unboxing of

`n`. Now

`factorial`looks like it did before, but operates much faster, on unboxed integers.

**Speculative**

I would like to not include a strictness analyser in my optimiser, or if it is included, have it be the result of a series of transformations -- without explicit "stop and analyse" then "use the results" stages. As part of my thoughts on this, I was trying to consider how to optimise

`factorial`without invoking the strictness analyser.

The speculative transformation I have defined first generates

`factorial#`- I have left out the details of

*why*it decides to.

factorial :: Int -> Int

factorial n = if n > 0 then n * factorial (n-1) else 1

factorial# :: Int# -> Int

factorial# n# = if n# ># 0 then n# *# factorial (Int# n# - 1) else 1

This step is entirely safe - we have defined

`factorial#`, but we have not written a wrapper that invokes it, even in the recursive case. The

`factorial#`function is equivalent to

`factorial`if the initial argument was evaluated. We have transformed

`factorial#`using only local knowledge, at the point. We can also transform

`factorial`, replacing any uses of

`n`which are guaranteed to come after

`n`is evaluated, with

`(Int# n#)`. This transformation is merely reusing the knowledge we have gained unwrapping

`n`:

factorial n = if n > 0 then Int# n# * factorial (Int# n# - 1) else 1

Now we promote any primitive operations on only unboxed values. Given

`(-)`, it is cheaper to evalute the subtraction than to store a lazy thunk to the function.

factorial n = if n > 0 then Int# n# * factorial (Int# (n# -# 1)) else 1

factorial# n# = if n# ># 0 then n# *# factorial (Int# (n# -# 1)) else 1

We can now use our knowledge that if we know an argument to a function is already evaluated, we can call the strict variant (this corresponds closely to constructor specialisation):

factorial n = if n > 0 then n# *# factorial# (n# -# 1) else 1

factorial# n# = if n# ># 0 then n# *# factorial# (n# -# 1) else 1

We can also replace the

`*`in

`factorial`with

`*#`as we know we will have to evaluate the result of a function. Now we have ended up with a fast inner loop, operating only on unboxed integers. We have not required strictness information to make any transformation.

One way of viewing the difference between strictness and this transformation is the flow of information. In strictness, the caller is informed that a particular argument will be evaluated. In speculative, the callee informs the caller that an argument has already been evaluated. These two concepts are not the same, and while they overlap, there are instances where they differ considerably.

**Strict vs. Speculative**

Consider the following example:

strict :: Int -> Int

strict x = x `seq` lazy x (x-1) (x+1)

lazy :: Int -> Int -> Int -> Int

lazy a b c = if a == 0 then b else c

Here the

`lazy`function is strict in

`a`, but not either of

`b`or

`c`. A strictness analyser would generate a variant of

`lazy`with only the first argument unboxed. In contrast the speculative variant will determine that

`x-1`and

`x+1`should be evaluated, and pass unboxed values in all arguments of

`lazy`, even though

`lazy`may not evaluate

`b`or

`c`.

To see this behaviour in GHC, it helps to make

`lazy`recursive:

module Temp where

strict :: Int -> Int

strict x = x `seq` lazy x (x+1) (x-1)

lazy :: Int -> Int -> Int -> Int

lazy a b c = if a == 0 then lazy b b b else c

Now run with the options

`ghc Temp.hs -c -O2 -ddump-simpl`, and you will see the

`lazy`variant has type

`lazy :: Int# -> Int -> Int -> Int`.

These thoughts are still very preliminary, and there are a number of unanswered questions:

- What is the overlap between strict and speculative?
- Can both variants be combined? (almost certainly yes)
- Is speculative really simpler?
- Is speculative sufficient?
- What are the performance benefits of speculative?

## 6 comments:

The "speculative" way was the only thing the the early versions of the LML compiler had. It takes you part of the way, but not all the way.

Try factorial with an accumulator. The tricky part is to realize that it is strict in the accumulator.

augustss: Very interesting to see this has been done before. factorial with an accumulator still works, assuming:

fac n = fac' 0 n

fac' a n = if n > 0 then fac' (a*n) (n-1) else a

Here the 0 will be recognised as already evaluated, so the accumulator will be unboxed in fac'. The second recursive call to fac' will then call a variant with both arguments unboxed. Using _ as unboxed and # as boxed:

fac_ n = fac'#_ a n

fac'#_ a n = ... fac'## ...

fac'## a n = ... fac'## ...

Now we have a tight unboxed loop.

Neil, afaics almost all of your factorial# functions have type-errors in them. The types signature says that they should return a boxed Int. However, in the recursive case you use unboxed multiplication. Furthermore you use unboxed multiplication on unboxed integers coming from the factorial function.

Also, in some functions you refer to n# without it being bound. The following pattern should solve that problem: n@(Int# n#)

Sorry for being such a nit pick but I think that fixing these errors would make it easier to follow your ideas.

Josef: The return types are rather curious - since all functions must evaluate to WHNF before returning, there actually isn't any difference between returning Int and Int# at this level. I have stated that conversions from Int to Int# are implicit, to kind of pass over this issue without going into it too much.

The reason I also avoided explicitly mentioning how n# is constructed is because it must be due to a case on n, and the code gets rather messy very quickly. You can't do it at the start, because that requires strictness information - you actually have to use the case n of Int# n#, then all subsequent n's can be rewritten, but its tricky.

But what if I want to use fac', so you can't guarantee that the accumulator will always be an already evaluated number?

augustss: If you use a non-evaluated accumulator, then the laziness will leak all the way through every recursive step, which would be a real shame, and perform a lot worse. I can't think of any way of getting that back without strictness analysis.

Of course, if the program gains this information from supercompilation, the chances of the accumulator being non-strict are relatively low.

Post a Comment