Comments on Neil Mitchell's Blog (Haskell etc): Sorting At Speed

2008-03-31T11:58:00.000+01:00

The basic idea is rather well known. In the book ML for the Working Programmer, Paulson attributes it to Richard O'Keefe: "A smooth applicative merge sort", Research Paper 182, Department of AI, Edinburgh University, 1982.

2008-03-28T11:45:00.000+00:00

Sammy: The original GHC version initially calls map (:[]), so they are both (a*n*log(n) + b*n + c). [Significant simplification, but that's the general idea]
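To make the `map (:[])` remark concrete, here is a minimal sketch of a bottom-up merge sort that starts from singleton lists. It is written for illustration and is not copied from GHC's library source; the names `merge`, `mergePairs`, `mergeAll` and `sortSingletons` are assumptions of this sketch.

```haskell
-- Illustrative sketch only, not GHC's actual library code.

-- Merge two sorted lists, taking from the left list on ties (stable).
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

-- Merge neighbouring lists pairwise, halving the number of lists.
mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (a:b:rest) = merge a b : mergePairs rest
mergePairs xs         = xs

-- Repeat pairwise merging until a single sorted list remains.
mergeAll :: Ord a => [[a]] -> [a]
mergeAll []  = []
mergeAll [x] = x
mergeAll xs  = mergeAll (mergePairs xs)

-- Start from n singleton lists, so every input pays the full merge cost.
sortSingletons :: Ord a => [a] -> [a]
sortSingletons = mergeAll . map (:[])
```

Because the starting point is always n one-element lists, the n*log(n) term is present even when the input is already sorted.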

In the case of a sorted list, Yhc has a=0, b=1, c=1, i.e. a very quick sort. GHC would be a=1, b=1, c=1.

In the case of an unsorted list, Yhc's a is half of GHC's, but its b is about twice GHC's. In that case, it probably comes down to the minor constant factors floating around, i.e. with Yhc's sort (or Lennart's sort, as would be a better name) you get faster performance for sorted and reverse sorted input, and for unsorted input you get roughly the same, perhaps slightly better. It's like a whole bottle of win :-)

2008-03-26T13:23:00.000+00:00

I guess then really Yhc is nlogn + n for unsorted data? (with the understanding that the extra +n becomes less relevant for large enough data sets)

So the question is whether the default sort should be faster for the special case of sorting presorted lists at the expense of being a bit slower for most sorts.

I would choose the nlogn regardless, under the assumption most data will not be presorted. But, that assumption could be wrong of course, depending on your usage.

PS: No worries about the name. I knew what you meant. =)

2008-03-11T07:59:00.000+00:00

An additional note - I use risers - more or less as defined here - for algebraic topology. :)
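For readers who have not met the term, `risers` is usually taken to mean a function that splits a list into its maximal non-decreasing runs. The following is one possible definition, a sketch that may differ in detail from the one in the post:

```haskell
-- Illustrative sketch: split a list into maximal non-decreasing runs.
risers :: Ord a => [a] -> [[a]]
risers = foldr step []
  where
    -- Extend the current run when x does not exceed the run's first element.
    step x ((y:ys):rs) | x <= y = (x:y:ys) : rs
    -- Otherwise x starts a new run of its own.
    step x rs = [x] : rs
```

For example, `risers [1,2,3,2,5]` evaluates to `[[1,2,3],[2,5]]`.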

2008-03-10T21:40:00.000+00:00

Consider using microbench (http://hackage.haskell.org/cgi-bin/hackage-scripts/package/microbench-0.1) for doing this sort of test in the future -- it's exactly the sort of thing I wrote it for!

2008-03-10T21:15:00.000+00:00

Sammy: apologies for misspelling your name - I guess I got the trailing i out of Larbi! Having the comment box at the top of the page, and your comment at the bottom certainly didn't help (bad UI in blogger).

2008-03-10T21:13:00.000+00:00

Sami: Yhc is O(n) for ordered lists, and O(n log n) otherwise. For lists with various ordered chunks, Yhc is somewhere in between. GHC is O(n log n) always.
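One way to see the difference is to count comparisons. The sketch below is a simplified model, not the real Yhc or GHC code: it instruments a pairwise merge with a comparison counter and compares two starting points, singleton lists versus ascending runs. All names are hypothetical, and only ascending runs are detected, to keep the example short.

```haskell
-- Illustrative model only; all names are assumptions of this sketch.

-- Merge two sorted lists, also returning the number of comparisons made.
mergeC :: Ord a => [a] -> [a] -> ([a], Int)
mergeC [] ys = (ys, 0)
mergeC xs [] = (xs, 0)
mergeC (x:xs) (y:ys)
  | x <= y    = let (r, n) = mergeC xs (y:ys) in (x : r, n + 1)
  | otherwise = let (r, n) = mergeC (x:xs) ys in (y : r, n + 1)

-- One pass of pairwise merging, summing the comparison counts.
mergePairsC :: Ord a => [[a]] -> ([[a]], Int)
mergePairsC (a:b:rest) =
  let (m, n1)  = mergeC a b
      (ms, n2) = mergePairsC rest
  in (m : ms, n1 + n2)
mergePairsC xs = (xs, 0)

-- Merge passes repeated until one list is left.
mergeAllC :: Ord a => [[a]] -> ([a], Int)
mergeAllC []  = ([], 0)
mergeAllC [x] = (x, 0)
mergeAllC xs  =
  let (ys, n1) = mergePairsC xs
      (r, n2)  = mergeAllC ys
  in (r, n1 + n2)

-- Split into non-decreasing runs: one comparison per adjacent pair.
runsC :: Ord a => [a] -> ([[a]], Int)
runsC = foldr step ([], 0)
  where
    step x ((y:ys):rs, n)
      | x <= y    = ((x:y:ys) : rs, n + 1)
      | otherwise = ([x] : (y:ys) : rs, n + 1)
    step x (rs, n) = ([x] : rs, n)

-- Total comparisons when starting from detected runs.
comparisonsRuns :: Ord a => [a] -> Int
comparisonsRuns xs =
  let (rs, n1) = runsC xs
      (_, n2)  = mergeAllC rs
  in n1 + n2

-- Total comparisons when starting from singleton lists.
comparisonsSingletons :: Ord a => [a] -> Int
comparisonsSingletons = snd . mergeAllC . map (:[])
```

On the sorted input `[1..8]`, `comparisonsRuns` gives 7 (a single linear scan and no merging), while `comparisonsSingletons` gives 12. The gap widens as the list grows, which is the O(n) versus O(n log n) difference for ordered input.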

The Yhc sort is nearly identical to GHC in the merge sort bit, it's just the splitting up bit where Yhc wins. You could certainly add the splitting up bit to GHC, which is what I suggest is done. Effectively, this is "use the Yhc algorithm".
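As a rough illustration of what "the splitting up bit" can look like, here is a sketch of a merge sort that first splits the input into already-ordered runs, reversing strictly descending runs, and then merges the runs pairwise. This is not the Yhc source; the names `runs`, `ascending`, `descending` and `runSort` are assumptions of this sketch.

```haskell
-- Illustrative sketch only, not the actual Yhc implementation.

-- Split the input into sorted runs. Strictly descending runs are
-- reversed as they are collected, so every run comes out ascending.
runs :: Ord a => [a] -> [[a]]
runs (a:b:xs)
  | a > b     = descending b [a] xs
  | otherwise = ascending b [a] xs
runs [x] = [[x]]
runs []  = []

-- Collect a strictly descending run; consing reverses it for free.
descending :: Ord a => a -> [a] -> [a] -> [[a]]
descending a acc (b:bs)
  | a > b = descending b (a:acc) bs
descending a acc rest = (a:acc) : runs rest

-- Collect a non-decreasing run in reverse, then restore its order.
ascending :: Ord a => a -> [a] -> [a] -> [[a]]
ascending a acc (b:bs)
  | a <= b = ascending b (a:acc) bs
ascending a acc rest = reverse (a:acc) : runs rest

-- The merge phase is the ordinary pairwise merge.
merge :: Ord a => [a] -> [a] -> [a]
merge [] ys = ys
merge xs [] = xs
merge (x:xs) (y:ys)
  | x <= y    = x : merge xs (y:ys)
  | otherwise = y : merge (x:xs) ys

mergePairs :: Ord a => [[a]] -> [[a]]
mergePairs (a:b:rest) = merge a b : mergePairs rest
mergePairs xs         = xs

mergeAll :: Ord a => [[a]] -> [a]
mergeAll []  = []
mergeAll [x] = x
mergeAll xs  = mergeAll (mergePairs xs)

-- Sorted or reverse-sorted input yields one run, so no merging is needed.
runSort :: Ord a => [a] -> [a]
runSort = mergeAll . runs
```

With this split, `runs [1,2,3,4]` and `runs [4,3,2,1]` both produce the single run `[[1,2,3,4]]`, which is why sorted and reverse-sorted inputs finish in linear time.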

It is a general sort, and it is faster in the general case as well, by a constant factor. In real world sorting, it turns out that often the input is either sorted or reverse sorted - hence it is worth considering these cases. They don't deserve to be given equal weight to sorting a random list (as my test does), but then my test was just an indicator, not a result.

2008-03-10T21:07:00.000+00:00

What is the computational complexity of Yhc? Could you just add the presorted list tests to GHC?

Is this a general sort? If so, why put so much weight on the special cases of presorted lists?

2008-03-10T18:08:00.000+00:00

Anon: Don't think competition, think exploration of the design space with interesting ideas. It isn't Yhc vs GHC, it's a big group hug where everyone benefits.

My benchmark was a simple one, as a first-cut basic indicator. It wasn't designed to show Yhc being faster, it was designed to convince me (and only me) that I could drop the Yhc sorting routine. The fact that Yhc has a special case for sorted lists is very surprising!

My initial indicators for performance on a non-sorted input are that it's ~20% faster, but those numbers are so vague as to be an indicator only, not a result.

2008-03-10T17:59:00.000+00:00

In summary:

* Out of 3 sorts in your test, 2 use sorted inputs;
* Yhc version has a special case for sorted lists;
* Yhc version is faster for that test.

Not so surprising, is it?
What about performance for non-sorted inputs?