Friday, November 30, 2007

Nofib benchmarks

I've been working right through the night for the IFL post-publication deadline, trying to get benchmarks for most of the imaginary nofib suite using Supero. It's now almost 9am, and I'm off to work once more, so here are the final raw results I'm going to be using in the IFL paper.

For these numbers, lower is better. The fastest optimiser gets a 1.00, all the others are scaled relative to it. The most red value is the slowest. ghc2 is GHC 6.8.1 with -O2. supero-* is running through the Yhc front end, then through Supero, then through GHC 6.8.1 -O2. none corresponds to the supero optimiser doing no inlining or optimisation, essentially giving GHC the whole program in one go. The other supero-* results are all based on Positive Supercompilation with homeomorphic embedding as the termination criteria and a given generalisation function. whistle is no generalisation criteria, msg is the most specific generalisation, and general is my newly created generalisation function. All the details will be in the paper I'm submitting later today.

I know some places where these results have fallen a bit flat of what they should, but even so, I'm very happy with this!



Josef said...

Interesting results! But given that you used such a powerful technique as supercompilation I would have hoped for even better results. And I think you could get even better results if you use different benchmarks. I've also used nofib for benchmarking but in my experience these programs are not very high level and they have an oldish feel to them. For instance, none of them use monad transformers (at least afaicr). It would be interesting to see what results you would get from a program which is highly parameterized or uses highly generic libraries like monad transformers, syb or something along those lines. That will give some indication of what kind of improvements we might expect when writing very high level programs. The hope is of course that the optimizer will make the high level programs as efficient as the low level ones which does the same thing, thereby eliminating any redundancy we've introduced when abstracting the program.

Neil Mitchell said...

josef: If I could choose the benchmarks, I'd get absolutely wonderful results :-) I realise the microbenchmark style problems aren't that representative, so hopefully larger programs will offer more benefit. Nofib is a bit weird, while none have monad transformers, loads have CAF's and quite a lot have circular programming - things which are quite rare in real programs.

More benchmarks are to come!