
Monday, March 10, 2008

Sorting At Speed

Sorting is currently a hot topic within the Haskell community. Christopher brought it up in a recent thread on the mailing list, and this weekend I ended up spending several hours looking at sort routines.

I was browsing through the Yhc standard libraries, as one does on the weekend, and was drawn to Yhc's sort function. It had some undesirable characteristics for one of the projects I was working on, so I wondered if other Haskell systems used different implementations. I checked GHC, and discovered that its sort was different. In general, when Yhc and GHC have different implementations of a standard library function, the GHC one is better tuned for performance. I decided to replace the Yhc sort function with the GHC one, but before doing so, thought a quick performance test was in order. So I came up with something simple:


test = do
    src <- readFile "Sort.hs"
    print $ ordered $ sort $ sort $ reverse $ sort src

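To make that fragment self-contained (neither definition appears in the snippet above): sort comes from Data.List, and ordered is a small helper checking that a list is in non-decreasing order. A minimal version:


import Data.List (sort)

-- True if the list is in non-decreasing order;
-- laziness makes this safe on the empty list too
ordered :: Ord a => [a] -> Bool
ordered xs = and $ zipWith (<=) xs (tail xs)
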

The file "Sort.hs" was simply the source code to the program. The code sorts the contents of this file, then reverses the result, sorts it, and sorts it once more. This means that the program performs one sort over semi-random data, one over reverse-ordered data and one over ordered data. These are some fairly standard cases that should be checked. This test is not a comprehensive benchmark, but it is a nice quick indicator.

I ran the Yhc sort function against the GHC version, and was shocked to find that the Yhc code was twice as fast. I ran the benchmark under Yhc, GHC and Hugs (using reduction count in Hugs), and in all cases the performance was doubled. I was not expecting this result!

The code for the GHC sort is:


mergesort :: (a -> a -> Ordering) -> [a] -> [a]
mergesort cmp = mergesort' cmp . map (:[])

mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a]
mergesort' cmp [] = []
mergesort' cmp [xs] = xs
mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)

merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
merge_pairs cmp [] = []
merge_pairs cmp [xs] = [xs]
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys)
    = case x `cmp` y of
        GT -> y : merge cmp (x:xs) ys
        _  -> x : merge cmp xs (y:ys)

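As a quick check of how the merging proceeds, here is a GHCi session with the above definitions in scope (outputs worked through by hand):


> map (:[]) "sort"
["s","o","r","t"]
> merge_pairs compare ["s","o","r","t"]
["os","rt"]
> mergesort compare "sort"
"orst"
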

The function works by splitting the list into one element lists, resulting in each basic list being ordered. These lists are then merged in pairs until a single list is left. For example, given the input "sort":


"s" "o" "r" "t"
"os" "rt"
"orst"


We first split each character into its own list, then merge adjacent pairs. This code corresponds to the standard merge sort. But instead of making each initial list a single element, we could use sequences of increasing elements, for example using the risers function:


risers :: Ord a => [a] -> [[a]]
risers [] = []
risers [x] = [[x]]
risers (x:y:etc) = if x <= y then (x:s):ss else [x]:(s:ss)
    where (s:ss) = risers (y:etc)


Now, if we apply risers "sort" we get ["s","ort"]. We can now follow the same merge procedure as before:


"s" "ort"
"orst"


Instead of doing 3 merges, we have done only 1. Given the input "abcd" the effect would have been even more dramatic. We can refine this scheme further, by detecting both ascending and descending chains of elements in the initial list. This technique is used by Yhc, and is based on code originally written by Lennart Augustsson. Knowing the original source of the code, my shock at the performance benefits offered by the Yhc version has decreased substantially.
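
To make the refinement concrete, here is a minimal sketch (the name sortRisers is mine) that reuses mergesort' from the GHC code above, but seeds it with the ascending runs found by risers rather than with singleton lists. The real Yhc version also detects descending chains, which I haven't reproduced here:


-- Seed the merging with ascending runs instead of singleton lists
-- (a sketch only; Yhc's actual code also handles descending runs)
sortRisers :: Ord a => [a] -> [a]
sortRisers = mergesort' compare . risers
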

The GHC sort function should probably be replaced by the one from the Yhc libraries. This would offer increased performance, especially in the case of ordered or reverse-ordered lists. The difference in asymptotic complexity (on already-sorted input the risers-based sort does O(n) comparisons, while starting from singleton lists still requires O(n log n)) means that there must exist some value of n such that sort [1..n] runs faster in Yhc than GHC (assuming sufficient stack/heap for both). I wonder whether Int32 is capable of expressing such a value...

A side note: I have been playing with the risers function for several years. I've used it as an example of pattern match checking, both specialised to Int and on a general Ord class. I've used it for supercompiling. It's appeared in blog posts, a TFP paper (Mitchell+Runciman 2007), a Haskell Workshop paper (Xu 2006) etc. The example was originally suggested to me from a functional programming exam paper, but only today at lunch did I discover its true origins. The risers function had originally been the first step in a merge sort!

Wednesday, May 16, 2007

13% faster than GHC*

* As usual, I'm going to qualify this with "on one particular benchmark, on one particular machine, with one particular set of following wind".

I've been working on optimisation techniques for the last week, as one chapter of my PhD is likely to be on that topic. The approach I'm using is to take Yhc.Core, optimise it at the Core level, then spit out GHC Haskell which can then be compiled. The optimisation technique I've been working on can best be described as "deforestation on steroids" - it's entirely automatic (no notion of special named functions like in foldr/build etc) and isn't too expensive.

The particular benchmark I've been working on is "print . length =<< getContents" or "wc -c". In order to make it a fair comparison, I require that all implementations must use the C getchar function at their heart for getting the next character - not allowing any buffering or block reads. This is actually a substantial benefit for C, since in Haskell things like clever buffering can be done automatically (and GHC does them!) - but it does result in a test which is purely on computational crunch.

Writing this benchmark in C is quite easy:


#include "stdio.h"

int main()
{
    int i = 0;
    while (getchar() != EOF)
        i++;
    printf("%i\n", i);
    return 0;
}


Writing in Haskell, forcing the use of getchar, is a little bit more tricky:


{-# OPTIONS_GHC -fffi #-}

import Foreign.C.Types
import System.IO
import System.IO.Unsafe

foreign import ccall unsafe "stdio.h getchar" getchar :: IO CInt

main = print . length =<< getContents2

getContents2 :: IO String
getContents2 = do
    c <- getchar
    if c == (-1) then return [] else do
        cs <- unsafeInterleaveIO getContents2
        return (toEnum (fromIntegral c) : cs)


I then benchmarked the C on -O3, versus the Haskell going through my optimiser, and going through GHC on -O2. The benchmark was counting the characters in a 35Mb file, run repeatedly, as usual.

The actual numbers are C=4.830, Supero=5.064, GHC=5.705. This makes Supero 5% slower than C, and GHC 18% slower than C. The reason that Supero is faster is because it's managed to deforest the length/getContents pair. The inner loop in Supero allocates no data at all, compared to the GHC one which does. Supero allocates 11,092 bytes in the heap, GHC allocates 1,104,515,888.
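
For intuition, this is the shape of first-order loop such a deforestation aims to produce - a strict counter and no intermediate list. This is a hand-written sketch using the getchar import from above, not Supero's actual output:


mainFused :: IO ()
mainFused = loop 0
    where
        -- seq keeps the counter strict, so no thunks accumulate
        loop :: Int -> IO ()
        loop n = n `seq` do
            c <- getchar
            if c == (-1) then print n else loop (n + 1)
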

This benchmark is not a particularly good one - the task is simple, the computation is minimal and most expense is spent inside the getchar function. It is however a good place to start, and shows that Haskell can get very close to C speeds - even using standard type String = [Char] processing. Supero has made global optimisations that GHC was unable to, and GHC does not have any techniques with which it could achieve the same results. GHC has then taken the Supero output and automatically identified strictness, performed unboxing/inlining, and done lots of other useful low-level optimisations.

I'm going to move on to "wc -l" next, and hopefully onto the nobench suite. The techniques used in Supero are all general, and I hope that as the complexity of the problem increases, the payoff will do so too.

Wednesday, March 28, 2007

let vs letrec

I've been working hard on Catch for the last few months, and will hopefully have something to release to the world very soon - a few people have alpha copies already. As part of that I've been playing with Yhc Core lots and lots. At one stage I introduced a bug into Yhc, by accident, due to the difference between let and letrec, so I thought I'd describe my mistake for the whole world to see.

Haskell has a keyword called let, but some functional languages have both let and letrec - Haskell's form (although called let) is actually letrec. What's the difference? Consider the fragment of code:

let x = 1 in (let x = x in x)

If we are dealing with let, the answer is 1. If we are dealing with letrec, the answer is a black hole of non-termination - _|_. The rule is that in a letrec you introduce the names you are defining before computing their values (so the x = x refers to itself), but with let you only introduce the names once you get to in (so the x = 1 is still in scope).
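
In Haskell itself you only get the letrec behaviour, but a non-recursive let is easy to simulate with a lambda, since a non-recursive "let x = e1 in e2" means the same as "(\x -> e2) e1". A small sketch of both readings:


-- Haskell's let is letrec: the inner x = x refers to itself,
-- so evaluating this loops forever (it is _|_)
recursive :: Int
recursive = let x = 1 in (let x = x in x)

-- Simulating the non-recursive let with a lambda: the argument x
-- is still the outer binding, so this evaluates to 1
nonRecursive :: Int
nonRecursive = let x = 1 in ((\x -> x) x)
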

Often people define lets in a non-recursive way, but occasionally the recursion is fundamental:

repeat x = let res = x : res in res

Here repeat is being defined with letrec - the res on the LHS is used on the RHS. This enables only one (:) cell to ever be created, saving memory.
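
Compare the alternative definition without a recursive let, which allocates a fresh cell for every element demanded:


-- Each step builds a new (:) cell; the result is an infinite
-- list in the heap rather than a single cyclic cell
repeat2 :: a -> [a]
repeat2 x = x : repeat2 x


Both compute the same list, but only the letrec version is cyclic in memory.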

Yhc.Core has let, which is recursive - unlike GHC Core which has both let and letrec. Yhc.Core also provides a method "removeRecursiveLet" which removes all recursive lets, replacing them with equivalent code.

So now we come to the bug. I recently changed Yhc.Core so that when desugaring FatBar/Failure (from complex pattern matches) we introduce let bindings, instead of inlining the expression multiple times. This can lead to a substantial reduction in code size for bigger programs. The basic idea is that before, when a Failure was spotted, we replaced it with an expression; now we replace it with a variable which we bind to the expression. If there are multiple failures with the same expression, we gain with less code.

Given the function:

binAdd False as [] = as
binAdd False [] (b:bs) = b:bs

This desugared to:

Num.binAdd v248 v249 v250 =
    let v_fail = error "Pattern match failure"
    in case v248 of
        Prelude.False ->
            let v_fail = case v249 of
                    Prelude.[] -> case v250 of
                        (Prelude.:) v251 v252 -> (Prelude.:) v251 v252
                        _ -> v_fail
                    _ -> v_fail
            in case v250 of
                Prelude.[] -> v249
                _ -> v_fail
        _ -> v_fail

As you might be able to see, there are two v_fails here which are part of a letrec, when I really was hoping for a let: the fallthroughs inside the inner v_fail refer to the inner binding itself, rather than to the outer error binding. Adding a simple numbering scheme to the v_fails made each refer to a different binding, and the problem was solved.
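
With numbering (the names below are illustrative, not Yhc's exact output), the inner binding's fallthroughs refer back to the outer error, restoring the intended let behaviour:

Num.binAdd v248 v249 v250 =
    let v_fail1 = error "Pattern match failure"
    in case v248 of
        Prelude.False ->
            let v_fail2 = case v249 of
                    Prelude.[] -> case v250 of
                        (Prelude.:) v251 v252 -> (Prelude.:) v251 v252
                        _ -> v_fail1
                    _ -> v_fail1
            in case v250 of
                Prelude.[] -> v249
                _ -> v_fail2
        _ -> v_fail1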

Saturday, December 30, 2006

End of year update

Since most people are doing a year summary, I thought I should do one. Then I remembered that I can barely remember what I did yesterday (for example today I went in to work only to discover it was a Saturday...). So instead I'm going to try and predict what I'll do in 2007, with the proviso that it's a complete pile of fiction, but some of it might actually happen.

PhD + Catch

2007 should be the year I complete my PhD. It's certainly the last year I get any funding for my PhD. With aiming to complete my PhD, I also want to release a really good version of Catch - the Haskell pattern match checker. I've been working away on this for over 2 years now, so would like something I can point at. I'm hoping Catch will be out by March, but I realise I said "a few months" at least 9 months ago now - it turns out the Halting Problem is quite hard...

Hoogle 4

I think Hoogle 4 has been knocking around for about 8 months now. I'd actually like to get this finished, and have it become the main version of the website. It mainly brings a complete polishing of Hoogle. Hoogle seems to be a well used tool, and hopefully Hoogle 4 will just make it better at everything and much more reliable.

Yhc

One project which I really want to polish off and put a release out of is Yhc. Yhc has lots of really exciting features, and is evolving pretty fast. The Yhc Javascript stuff is particularly interesting - the demos just keep getting better and better. The Yhc.Core side of things is looking really impressive. My optimisations stuff is starting to look really good. If I ever finish WinHaskell, Yhc might become a very nice choice for the default compiler.

Parsing

I really really want to finally implement my parser. Unfortunately, just taking a look at the things above this makes that rather unlikely. If you add in distractions like Hat, WinHaskell, WinHugs etc then it looks a lot less likely. As such I'm going to make the target to formalise my state machine algorithms and write a state machine library in Haskell which I can use as the basis for my parser.

I've tried to keep this list as short as possible, to actually give me a chance of doing some of it. We'll see if that happens.

Thursday, September 21, 2006

DriFT

I tried out DriFT recently, and was most impressed. It has the feel of a program that hasn't had much love recently, but that's OK - it's still a very useful program.

Firstly, there is no easy way to compile it on Windows - it's not hard, it's just not obvious either. For reference the steps are 1) cd src, 2) ren Version.hs.in Version.hs, 3) ghc --make Drift.hs. For those Windows users who don't want to do that, I've shoved up a binary on my Windows distribution page. It's certainly not hard, but it's not as easy as Cabal based thingies.

Once you've done that, "drift -r File.hs" produces the goods, in a very straightforward way. What I wanted was a deriving Binary, with a loadBinary and a saveBinary interface. DriFT offers two separate binary output modes, Binary and GhcBinary - I wanted to use it in Hugs, but a bit of experimentation showed that GhcBinary has a nicer output, so I went with that. Once that's done I wanted to combine it with the Binary library to do the actual serialisation - there is one in the repo but it seems to have not been given the attention it needs, so it was easier to write my own. See this file in the Yhc repo for how I did it - the answer is not very nicely, but quite workable.
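
As a small example of the input side (from memory, so treat the annotation syntax as an assumption rather than gospel): you mark a data type with a derive comment and run DriFT over the file, which then emits the instance for you:


-- hypothetical example type; the {-! derive !-} comment tells
-- DriFT which instances to generate
data Expr = Var String
          | App Expr Expr
    {-! derive : Binary !-}
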

With all this done, Yhc now spits out Core in binary, but more importantly my PhD program now has binary caches - changing some operations from 30 seconds to 2 seconds, which is nice :)