On IRC today someone asked for a function that computes the mode of a list. This is the same as finding the most common element in a list. One simple solution is a nice display of composition of smaller abstraction functions:
mostCommon :: Ord a => [a] -> a
mostCommon = head . maximumBy (comparing length) . group . sort
And a single example:
; mostCommon "Haskell"
'l'
To read the function mostCommon, it is best to read from right to help:
- sort: put the elements in order, i.e. "Haeklls".
- group: group consecutive elemnts, i.e. ["H","a","e","k","ll","s"]. It would be entirely reasonable to have a groupSet function that did grouping of non-consecutive elements, and I often use such a function.
- maximumBy (comparing length): find the maximum value, looking only at the number of elements in each list, i.e. "ll".
- head: take a representative value from the list, i.e. 'l'.
If asked to write the same function in Javascript (say), I would have started with a Hash table that looped counting the number of occurrences of each letter, then done a loop through this to find the maximum. The code would have been longer, and more prone to bugs.
It is easy to see that by combining functions that have already been written, tested and documented, it is possible to come up with very succinct operations.
9 comments:
Um ... you should add that the fact that many other languages don't have this feature (or, at least, lambda forms) renders them un-usable, when one is well-Haskellised.
These nifty features are what should carry a warning label: `ADDICTIVE! >8<'
Nice :) I like these kind of functions too! To be complete, here's a definition of comparing:
comparing f l r = compare (f l) (f r)
lecamarade: that's for others to learn once they've been seduced by Haskell :)
chris: Thanks for including that. comparing is in GHC 6.6.1 in Data.Ord, but not in 6.4.
I made a small test in Javascript. Once you have the functions defined, you can also call it rather nicely (although it's not nearly as cool as the haskell version).
You can call it like this:
alert("haskell".toArray().sort().group().maximumByLength()[0]);
mostCommon.html
I'm not sure that I understand why this feature should be considered so "addictive". In fact, I'm not sure why it is even in the language at all. The code above can be trivially expressed without the . operator simply by specifying a name for an argument. In fact, if maximumBy took its arguments in the opposite order, the above composition would not even have been possible. Perhaps I'm missing some critical functionality that this provides, but it seems to be one more unneeded way to express the same thing without imparting much additional clarity.
@anonymous :
> In fact, if maximumBy took its
> arguments in the opposite order,
> the above composition would not
> even have been possible.
then you'd do :
flip maximumBy
@anonymous :
> I'm not sure that I understand why
> this feature should be considered so
> "addictive".
It's not just the composition, but the currying that makes this so nice. A brand new function that does what you want, built out of smaller pieces of polymorphic, general purpose functions.
I think the "maximumBy (comparing length)" bit is the sweetest of the bunch. maximumBy lets you hand in your own "value" function, appropriate for the list you wish to find a maximum in. "comparing" is itself a general purpose function that takes a list and compares all elements together, by whatever criteria you choose to give.
Finally, notice the function will also work over any list type that can be "ordered".
Sure, without composition the function definition has more paranetheses, but the other features at play - currying, polymorphism, and even "point-free" style - conspire to give a very readable and concise definition of the function "mostCommon".
groupSet wouldn't really be in the lazy-evaluation spirit, I think. It would have to digest the whole list before emitting any output. "sort" has no choice but groupBy works even on infinite lists, which is cool.
I'd have used
last . transpose . group . sort
instead. It handles the empty list and multiple mode cases better (by providing a list of results instead of a single one).
Post a Comment