Friday, November 23, 2018

Counting the Cost of Colons in Haskell

Summary: Haskell uses :: as the type operator. That was a mistake that costs us over 1 million characters of source code.

Haskell uses :: for type annotations, e.g. (1 :: Int). Most other FP languages with types use :, including Scala, OCaml, Agda, Idris and Elm. Haskell uses : for list cons, so you can write:

myList = 1:2:[] :: [Int]

Whereas in other languages you write:

myList = 1::2::[] : [Int]

Moreover, the reason Haskell preferred :: was the belief that if the cons operator was :: then people would quite naturally insert spaces around it, giving:

myList = 1 :: 2 :: [] : [Int]

The final program is now noticeably longer than the original. Back when Haskell was first designed, I imagine programs were mainly list manipulation, whereas now plenty of libraries work mainly at the type level. That raises the question: would Hackage be shorter or longer if we used : for types?

Method

I downloaded the latest version of every Hackage package. For each .hs file, I excluded comments and strings, then counted the number of instances of : and ::, also noting whether there were spaces around the :. The code I used can be found here.
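
As a rough illustration of the counting step (a minimal sketch, not the code linked above), the function below scans text that is assumed to have had comments and strings removed already. It splits out maximal runs of operator characters, so that operators such as :+: or := are not miscounted, and tallies the runs that are exactly :: or :. The names Counts, isSymbolChar and countColons are hypothetical, only ASCII operator characters are considered, and the start and end of the input are treated as if they were spaces.

```haskell
import Data.Char (isSpace)

-- Tally of what the scan found (hypothetical type, for illustration only).
data Counts = Counts
  { doubleColons :: Int  -- occurrences of ::
  , singleColons :: Int  -- occurrences of :
  , twoSpaces    :: Int  -- single colons with a space on both sides
  , oneSpace     :: Int  -- single colons with a space on exactly one side
  } deriving (Eq, Show)

-- ASCII characters that can form part of a Haskell operator.
isSymbolChar :: Char -> Bool
isSymbolChar c = c `elem` "!#$%&*+./<=>?@\\^|-~:"

-- Assumes comments and string literals were stripped beforehand.
countColons :: String -> Counts
countColons = go ' ' (Counts 0 0 0 0)
  where
    -- prev is the character just before the current position;
    -- the start of the input is treated as a space (an assumption).
    go _ acc [] = acc
    go prev acc s@(c:cs)
      | isSymbolChar c =
          let (op, rest) = span isSymbolChar s
              -- the end of the input is also treated as a space
              next = case rest of { (n:_) -> n; [] -> ' ' }
          in go (last op) (tally prev op next acc) rest
      | otherwise = go c acc cs

    -- Only whole operators equal to :: or : are counted.
    tally prev op next acc = case op of
      "::" -> acc { doubleColons = doubleColons acc + 1 }
      ":"  -> let n = length (filter isSpace [prev, next])
              in acc { singleColons = singleColons acc + 1
                     , twoSpaces    = twoSpaces acc + fromEnum (n == 2)
                     , oneSpace     = oneSpace acc + fromEnum (n == 1) }
      _    -> acc
```

For example, countColons "myList = 1:2:[] :: [Int]" finds one :: and two unspaced : under these assumptions.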

Results

  • Instances of :: = 1,991,631
  • Instances of : = 265,880 (of which 79,109 were surrounded by two spaces, and 26,931 had one space)

Discussion

Assuming we didn't add/remove any spaces, switching the symbols would have saved 1,725,751 characters. If we assume that everyone would have written :: with spaces, that saving drops to 1,379,140 characters. These numbers are fairly small, representing about 0.17% of the total 993,793,866 characters on Hackage.
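
The arithmetic can be reproduced from the figures in the Results section. The sketch below assumes one particular model for the "with spaces" case, which is a reconstruction rather than something stated in the post: a cons with no surrounding spaces gains two characters, a cons with one space gains one, and a cons that already has two spaces is unchanged. All names are hypothetical.

```haskell
-- Figures copied from the Results section.
typeSigs, conses, consTwoSpaces, consOneSpace :: Int
typeSigs      = 1991631  -- instances of ::
conses        = 265880   -- instances of :
consTwoSpaces = 79109    -- : with a space on both sides
consOneSpace  = 26931    -- : with a space on one side

-- Swapping the symbols: every :: shrinks by one character
-- and every : grows by one.
savingNoSpaces :: Int
savingNoSpaces = typeSigs - conses            -- 1725751

-- Assumed model: every cons ends up with a space on both sides.
extraSpaces :: Int
extraSpaces = 2 * consNoSpaces + consOneSpace -- 346611
  where consNoSpaces = conses - consTwoSpaces - consOneSpace

savingWithSpaces :: Int
savingWithSpaces = savingNoSpaces - extraSpaces -- 1379140
```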

Conclusion

The Haskell creators were wrong in their assumptions - the type operator should have been :.

Update: the first version of this post had numbers that were too low due to a few bugs, now fixed.

5 comments:

Daniel Bergey said...

Thank you for this empirical confirmation of my long-held belief. I only regret that in posting now, you may have missed the chance for a 1 April Haskell Prime proposal.

Neil Mitchell said...

Daniel: This feature is so great that I'm not sure we can wait til 1st April for the proposal.

Michał said...

I put this post in my Hall of Fame of parodies.
Especially liked the "0.17%" note ;-).

atravers said...

...and has been duly noted at https://wiki.haskell.org/Nitpicks.

to Mr David Turner;

Could you please enlighten the rest of us, and future generations, as to why you used a single colon for the cons data constructor in Miranda(R) (along with - IIRC - SASL and KRC)?

I'm assuming that choice was one crucial factor in Haskell now having the same syntax.

Thank you.

Carl said...

Thank you for writing this