Monday, March 10, 2008

Sorting At Speed

Sorting is currently a hot topic within the the Haskell community. Christopher brought it up in a recent thread on the mailing list, and this weekend I ended up spending several hours looking at sort routines.

I was browsing through the Yhc standard libraries, as one does on the weekend, and was drawn to Yhc's sort function. It had some undesirable characteristics for one of the projects I was working on, so I wondered if other Haskell systems used different implementations. I checked GHC, and discovered it their sort was different. In general, when Yhc and GHC have different implementations of a standard library function, the GHC one is better tuned for performance. I decided to replace the Yhc sort function with the GHC one, but before doing so, thought a quick performance test was in order. So I came up with something simple:

test = do
src <- readFile "Sort.hs"
print $ ordered $ sort $ sort $ reverse $ sort src

The file "sort.txt" was simply the source code to the program. The code sorts the contents of this file, then reverses it, sorts it and sorts it again. This means that the program performs one sort over semi-random data, one over reverse-ordered data and one over ordered-data. These are some fairly standard cases that should be checked. This test is not a comprehensive benchmark, but a nice quick indicator.

I ran the Yhc sort function against the GHC version, and was shocked to find that the Yhc code was twice as fast. I ran the benchmark under Yhc, GHC and Hugs (using reduction count in Hugs), and in all cases the performance was doubled. I was not expecting this result!

The code for the GHC sort is:

mergesort :: (a -> a -> Ordering) -> [a] -> [a]
mergesort cmp = mergesort' cmp . map (:[])

mergesort' :: (a -> a -> Ordering) -> [[a]] -> [a]
mergesort' cmp [] = []
mergesort' cmp [xs] = xs
mergesort' cmp xss = mergesort' cmp (merge_pairs cmp xss)

merge_pairs :: (a -> a -> Ordering) -> [[a]] -> [[a]]
merge_pairs cmp [] = []
merge_pairs cmp [xs] = [xs]
merge_pairs cmp (xs:ys:xss) = merge cmp xs ys : merge_pairs cmp xss

merge :: (a -> a -> Ordering) -> [a] -> [a] -> [a]
merge cmp [] ys = ys
merge cmp xs [] = xs
merge cmp (x:xs) (y:ys)
= case x `cmp` y of
GT -> y : merge cmp (x:xs) ys
_ -> x : merge cmp xs (y:ys)

The function works by splitting the list into one element lists, resulting in each basic list being ordered. These lists are then merged in pairs until a single list is left. For example, given the input "sort":

"s" "o" "r" "t"
"os" "rt"

We first split each character into its own list, then merge adjacent pairs. This code corresponds to the standard merge sort. But instead of making each initial list a single element, we could use sequences of increasing elements, for example using the risers function:

risers :: Ord a => [a] -> [[a]]
risers [] = []
risers [x] = [[x]]
risers (x:y:etc) = if x <= y then (x:s):ss else [x]:(s:ss)
where (s:ss) = risers (y:etc)

Now, if we apply risers "sort" we get ["s","ort"]. We can now follow the same merge procedure as before:

"s" "ort"

Instead of doing 3 merges, we have done only 1. Given the input "abcd" the effect would have been even more dramatic. We can refine this scheme further, by detecting both ascending and descending chains of elements in the initial list. This technique is used by Yhc, and is based on code originally written by Lennart Augustsson. Knowing the original source of the code, my shock at the performance benefits offered by the Yhc version have decreased substantially.

The GHC sort function should probably be replaced by the one from the Yhc libraries. This would offer increased performance, especially in the case of ordered or reverse-ordered lists. The asymptotic complexity of the two sorts means that there must exist some value of n such that sort [1..n] runs faster in Yhc than GHC (assuming sufficient stack/heap for both). I wonder whether Int32 is capable of expressing such a value...

A side note: I have been playing with the risers function for several years. I've used it as an example of pattern match checking, both specialised to Int and on a general Ord class. I've used it for supercompiling. It's appeared in blog posts, a TFP paper (Mitchell+Runciman 2007), a Haskell Workshop paper (Xu 2006) etc. I originally had this example suggested from a functional programming exam paper, but only today at lunch did I discover its true origins. The risers function had originally been the first-step in a merge sort!


Anonymous said...

In summary:

* Out of 3 sorts in your test, 2 use sorted inputs;
* Yhc version has a special case for sorted lists;
* Yhc version is faster for that test.

No so surprising, is it?
What about performance for non-sorted inputs?

Neil Mitchell said...

Anon: Don't think competition, think exploration of the design space with interesting ideas. It isn't Yhc vs GHC, it's a big group hug where everyone benefits.

My benchmark was a simple one, as a first-cut basic indicator. It wasn't designed to show Yhc being faster, it was designed to convince me (and only me) that I could drop the Yhc sorting routine. The fact that Yhc has a special case for sorted lists is very surprising!

My initial indicators for performance on a non-sorted input is that its ~20% faster, but those numbers are so vague as to be an indicator only, not a result.

Sammy Larbi said...

What is the computational complexity of Yhc? Could you just add the presorted list tests to GHC?

Is this a general sort? If so, why put so much weight on the special cases of presorted lists?

Neil Mitchell said...

Sami: Yhc is O(n) for ordered lists, and O(n log n) otherwise. For lists with various ordered chunks, Yhc is somewhere in between. GHC is O(n log n) always.

The Yhc sort is nearly identical to GHC in the merge sort bit, its just the splitting up bit where Yhc wins. You could certainly add the splitting up bit to GHC, which is what I suggest is done. Effectively, this is "use the Yhc algorithm".

It is a general sort, and it is faster in the general case as well, by a constant factor. In real world sorting, it turns out that often the input is either sorted or reverse sorted - hence it is worth considering these cases. They don't deserve to be given equal weight to sorting a random list (as my test does), but then my test was just an indicator, not a result.

Neil Mitchell said...

Sammy: apologies for misspelling your name - I guess I got the trailing i out of Larbi! Having the comment box at the top of the page, and your comment at the bottom certainly didn't help (bad UI in blogger).

Anonymous said...

Consider using microbench ( for doing this sort of test in the future -- it's exactly the sort of thing I wrote it for!

Mikael said...

An additional note - I use risers - more or less as defined here - for algebraic topology. :)

Sammy Larbi said...

I guess then really Yhc is nlogn + n for unsorted data? (with the understanding that the extra +n becomes less relevant for large enough data sets)

So the question is if the default sort should be a faster for the special case of sorting presorted lists at the expense of being a bit slower for most sorts.

I would choose the nlogn regardless, under the assumption most data will not be presorted. But, that assumption could be wrong of course, depending on your usage.

PS: No worries about the name. I knew what you meant. =)

Neil Mitchell said...

Sammy: The original GHC version initially calls map (:[]), so they are both (a*n*log(n) + b*n + c). [Significant simplification, but that's the general idea]

In the case of a sorted list, for Yhc, a=0, b=1 c=1, i.e. very quick sort. GHC would be a=1 b=1 c=1

In the case of an unsorted list, Yhc a half of GHC's, but b about twice of GHC's. In that case, it probably comes down to the minor constant factors floating around. i.e. with Yhc's sort (or Lennart's sort, as would be a better name) you get faster performance for sorted and reverse sorted, and for unsorted you get roughly the same, perhaps slightly better. It's like a whole bottle of win :-)

Vesa Karvonen said...

The basic idea is rather well known. In the book ML for the Working Programmer, Paulson attributes it to Richard O'Keefe: A smooth applicative merge sort. Research paper 182, Department of AI, Edinburgh University, 1982.