I've been working on optimising Haskell with the Supero project for a few months now. I've had some good success with small benchmarks, but scaling up to larger examples is always a challenge. Even a small benchmark, such as those from the imaginary section of nobench, can pull in nearly 1Mb of code from libraries. Debugging through 100,000 lines of low-level transformed Haskell is not pleasant!
Anyway, I now have some results on some of the nobench programs, so I thought I'd share them. As with all benchmarks, there are a whole pile of disclaimers:
- The benchmarks are unmodified from nobench, unlike last time I have not replaced key functions (i.e. getContents) with alternative versions.
- The Supero programs are optimised by Supero, then fed back into GHC - GHC's excellent native code generation is benefiting both programs.
- All the results are compared to GHC 6.6.1 -O2, which lacks SpecConstr, and is likely to give a benefit to the GHC benchmark.
- The benchmarks were generally chosen because they did not read an input file, and didn't output too much - I wanted this to be a test of computation, not IO. The benchmarks are just the ones I have tackled so far.
- The results are not stable. Yesterday two of the benchmarks showed substantially better results for Supero, and three didn't go through at all. Things are changing fast - there is no guarantee that any final release of Supero will match these numbers.
With those disclaimers out of the way, on to the graph:
All numbers are given relative to GHC taking 100%, i.e. all the benchmarks are at least 10% faster - and one is nearly three times faster.
It is actually quite an achievement to draw with GHC in these benchmarks. GHC treats class dictionaries in a special manner (I think), and optimises them out specially. GHC also performs a reasonable amount of foldr/build fusion in several of the tests. Since these transformations rely on names and special knowledge, when compiling Yhc generated Haskell they can no longer be applied - so Supero must perform all these tricks to draw with GHC.
I am sure there is further scope for improvement. Supero is a moving target - yesterday the primes benchmark went 50% faster, but now it is only 12% faster. Three days ago I rewrote the whole optimiser. It will be a while before large scale Haskell programs can be optimised, but it is certainly the final goal.
Surely your graph should be labelled "Time relative to GHC"? Speed is the reciprocal of time.
ReplyDeleteGwenhwyfaer, you are indeed correct. Imagine that change in your head, as I don't have the Excel file on this machine :)
ReplyDeleteThose ratios look refreshingly different from the corresponding jhc/GHC ratios!
ReplyDeleteGood work! Will this stuff ever make it into GHC or is it YHC only? I'd love to have the whole "package" in a working industrial strength compiler. Nice to get some speed without having to write code that's lower level than C, I hope it works out!
ReplyDeletesebastian: I'm using Yhc as a test bed, but its not in the standard code of Yhc. Once all the issues have been worked out and polished perfectly, then is the time to think about making it a GHC transformation - but should be feasible if the results make it worthwhile.
ReplyDelete