Wednesday, December 12, 2018

GHC: From Bug to Merge

Summary: The story of a bug, from report, proposal, patch, revisions, to merge.

A while ago I found a bug in GHC. Like all good bugs, this one starts with playing around, in this case seeing if it was possible to eliminate String from a base-like library. There are two interesting extensions:

  • OverloadedStrings changes each string literal to be wrapped in fromString, so "hello" in the source code becomes fromString "hello" when desugared.
  • RebindableSyntax means that when GHC sees fromString or fail (or lots of other functions) it uses the version in scope, rather than the version in the base library.

Taking a look at the code:

foo x = do
    Just y <- x
    return y

In GHC 8.6 it desugars to something equivalent to:

foo x = x >>= \v ->
    case v of
        Just y -> return y
        _ -> fail ("pattern match error" :: String)

In particular, it does not insert a fromString around the generated "pattern match error". If you are using the built-in fail, that's fine, since the built-in fail wants a String. But if we define a custom fail function which doesn't use String, but which takes the output of fromString, then we can get type errors with the definitions:

newtype Text = Text String
fail (Text x) = error x
fromString = Text

The solution is fairly simple - desugar adding a fromString - which is what GHC 8.8 will do. This post is the story of getting that change integrated.

Raise a GHC ticket

The first step after finding a bug is to raise a GHC ticket, which I did as Trac 15645, also offering to fix it. After discussing a bit with the GHC developers, we came the conclusion that the essential property of the bug is:

Overloaded strings should imply even generated strings are overloaded, if they are passed to user-controlled functions.

I think (and still think) that this ticket is a pure bug fix, partly because changing it does not imply changing the user manual in any way. However, GHC developers quite reasonably disagreed, and suggested I go through the GHC proposal process.

Raise a GHC proposal

The GHC Proposal process involves creating a proposal, having people discuss it, and then passing it via a committee. At this point I got Shayne Fletcher involved, who actually ran with most of the steps from here onwards. We raised GHC Proposal 168, setting out the case why the change should be made. We @ mentioned people on GitHub, and posted to Twitter to solicit opposing views, but only 4 people made any comment, and none of them disagreed. We submitted to the committee after the discussion was verifiably dead (not that it was ever super alive), and had the proposal accepted.

Write the code

After having the proposal accepted the hard work of writing the code began as Phab D5251. The code itself wasn't simple, but neither was it very complex. An initial version was reviewed with lots of good suggestions from Simon Peyton Jones. Shayne made some revisions, including adding a regression test, and the patch was ready. Sometime later the code was merged to GHC master.

Timeline

The whole process took a long time - 14 Sep to 12 Dec (fortunately all 2018), so nearly 3 months. That timeline was significantly extended because GHC is in the process of changing hosting for their code, which both makes it harder to merge stuff and involves the people who would normally be merging stuff doing other work. However, I thought it instructive to show where the time went.

  • 14 Sep - 14 Sep: Open bug and discuss it
  • 14 Sep - 17 Sep: Writing the proposal
  • 17 Sep - 20 Sep: Discussing the proposal
  • 20 Sep - 24 Sep: Confirming the discussion had died down
  • 24 Sep - 21 Oct: Waiting for the committee decision
  • 21 Oct - 22 Oct: Opening the Phab ticket
  • 22 Oct - 25 Oct: Discussion on the code review
  • 25 Oct - 29 Oct: Addressing review comments
  • 29 Oct - 31 Oct: Additional review and discussion
  • 1 Nov - 26 Nov: Waiting for people to give a final approval
  • 26 Nov - 11 Dec: Waiting to merge in

I think the interesting thing is that of the 3 months, over 2 months was spent waiting for confirmations of a mechanical nature. However, every time real in-depth thoughtful discussion was required (code review, discussing the bug) it happened almost immediately.

Friday, November 23, 2018

Counting the Cost of Colons in Haskell

Summary: Haskell uses :: as the type operator. That was a mistake that costs us over 1 million characters of source code.

Haskell uses :: for type annotations, e.g. (1 :: Int). Most other FP languages with types use :, including Scala, OCaml, Agda, Idris and Elm. Haskell uses : for list cons, so you can write:

myList = 1:2:[] :: [Int]

Whereas in other languages you write:

myList = 1::2::[] : [Int]

Moreover, the reason Haskell preferred :: was the belief that if the cons operator was :: then people would quite naturally insert spaces around it, giving:

myList = 1 :: 2 :: [] : [Int]

The final program is now noticeably longer than the original. Back when people first invented Haskell, I imagine they were mainly list manipulation operations, whereas now plenty of libraries work mainly at the type level. That raises the question - would Hackage be shorter or longer if we used : for types?

Method

I downloaded the latest version of every Hackage package. For each .hs file, I excluded comments and strings, then counted the number of instances of : and ::, also noting whether there were spaces around the :. The code I used can be found here.

Results

  • Instances of :: = 1,991,631
  • Instances of : = 265,880 (of which 79,109 were surrounded by two spaces, and 26,931 had one space)

Discussion

Assuming we didn't add/remove any spaces, switching the symbols would have saved 1,725,751 characters. If we assume that everyone would have written :: with spaces, that saving drops to 137,9140 characters. These numbers are fairly small, representing about 0.17% of the total 993,793,866 characters on Hackage.

Conclusion

The Haskell creators were wrong in their assumptions - type should have been :.

Update: the first version of this post had numbers that were too low due to a few bugs, now fixed.

Thursday, November 22, 2018

Downloading all of Hackage

Summary: I wanted to download the latest version of every package in Hackage. Here's a script and explanation of how to do it.

Imagine you want the latest version of every package on Hackage. I found two tools that mirror every version of every package:

  • Using hackage-mirror you can do hackage-mirror --from="http://hackage.haskell.org" --to="C:/hackage-mirror". But this project is long deprecated and doesn't actually work anymore.
  • Using hackage-mirror-tool you might be able to do it, but it requires a new Cabal, isn't on Hackage, doesn't seem to work on Windows and doesn't say whether it downloads to disk or not.

Given it's a fairly simple problem, after investigating these options for an hour, I decided to cut my losses and write a script myself. Writing the script took a lot less than an hour, and I even wrote this blog post while the download was running. The complete script is at the bottom of this post, but I thought it might be instructive to explain how I went about developing it.

Step 0: Set up my working environment

I created a file named Download.hs where I was writing the source code, used ghcid Download.hs in a VS Code terminal to get fast error feedback using Ghcid, and opened another terminal to execute runhaskell Download.hs for testing.

Step 1: Find where a download link is

You can download a package from Hackage at http://hackage.haskell.org/package/shake/shake-0.17.tar.gz. You can also use https, but for my purposes and bulk downloading I figured http was fine. I hunted around to find a link which didn't contain the version number (as then I wouldn't have to compute the version number), but failed.

Step 2: Find a list of package versions

Looking at the cabal tool I found the cabal list --simple command, which prints a big list of packages in the form:

foo 1.0
foo 2.1
bar 1.0

For each package on Hackage I get all versions sequentially, with the highest version number last. I can execute this command using systemOutput_ "cabal list --simple" (where systemOutput_ comes from the extra library).

Step 3: Generate the list of URLs

Now I have the data as a big string I want to convert it into a list of URL's. The full pipeline is:

map (toUrl . last) . groupOn fst .  map word1 . lines

Reading from right to left, I split the output into a list of lines with lines, then split each line on its first space (using word1 from the extra library). I then use groupOn fst so that I get consecutive runs of each package (no points for guessing where groupOn comes from). For each list of versions for a package I take the last (since I know that's the highest one) and transform it into the URL using:

let toUrl (name, ver) = "http://hackage.haskell.org/package/" ++ name ++ "/" ++ name ++ "-" ++ ver ++ ".tar.gz"

Step 4: Download the URLs

I could make multiple calls to wget, but that's very slow, so instead I write them to a file and make a single call:

writeFile "_urls.txt" $ unlines urls
system_ "wget --input-file=_urls.txt"

I use the name _urls.txt so I can spot that special file in amongst all the .tar.gz files this command produces.

Step 5: Putting it all together

The complete script is:

import Data.List.Extra
import System.Process.Extra

main :: IO ()
main = do
    let toUrl (name, ver) = "http://hackage.haskell.org/package/" ++ name ++ "/" ++ name ++ "-" ++ ver ++ ".tar.gz"
    urls <- map (toUrl . last) . groupOn fst .  map word1 . lines <$> systemOutput_ "cabal list --simple"
    writeFile "_urls.txt" $ unlines urls
    system_ "wget --input-file=_urls.txt"

After waiting 46 minutes I had 13,258 packages weighing in at 861Mb.

Update: In the comments Janek Stolarek suggested the simpler alternative of cabal list --simple | cut -d' ' -f1 | sort | uniq | xargs cabal get (I had missed the existence of cabal get).



Sunday, October 21, 2018

Announcing Profiterole - GHC Profile Viewer

Summary: profiterole reformats GHC profile reports so they are easier to read.

Do you often work with GHC time profiling reports? Do you find them excessively long and hard to navigate? Profiterole reads standard GHC .prof files and generates both textual and HTML reports which are typically more than 10x smaller. As an example compare HLint profile input to HLint Profiterole output.

Usage

To run, first install (cabal update && cabal install profiterole), generate a GHC profile the normal way, then run:

profiterole myprogram.prof

Profiterole will generate myprogram.profiterole.txt and myprogram.profiterole.html - both contain the same information, but the HTML has hyperlinks. There are three columns of numbers:

  • TOT is the total time spent in any item under this code, what GHC calls inherited time.
  • INH is the total time spent in the items that Profiterole did not move out to the top level.
  • IND is the individual time, just like GHC profiles.

For large programs, using +RTS -P (instead of the common -p) will give more accurate results.

How it works

Profiterole aims to make the profile shorter by combining common subtrees and lifting them to the root - e.g. if you call parseFile from 7 places in the code, instead of having 7 pieces of parseFile profiling, Profiterole will give you one. With only 1 place containing parseFile, it's easier to optimise parseFile, and it's easier to read the code calling it without getting lost in the internals.

How to profile

Given profile data, different ways of looking at it reveal different insights, and the ones discovered by Profiterole have definitely had value. I tend to use:

  • I first use Profiteur to get an overall sense of where the time is going visually. Profiteur lets me orientate myself, but tends to be a little difficult to drill into the details and repeat experiments.
  • I then use Profiterole to see if there were any repeated pieces of code Profiteur missed, and then dig into the details using Profiterole.
  • Only if I'm really going into the details do I go to the GHC .prof output - it's pretty rare.

Tuesday, October 16, 2018

Announcing Shake 0.17

I'm delighted to announce Shake 0.17. As always, the full changelog is on GitHub, but I'd like to highlight three areas that have seen most attention.

Error Messages

Error messages now support GHC's HasCallStack feature, giving code locations in error messages. As an example, let's define rules for both *.txt and overlap.*, then try and build overlap.txt. With Shake 0.17 we get the far more informative error:

Error when running Shake build system:
at Example.hs:50:46-55:
* Depends on: overlap.txt
* Raised the exception:
Build system error - key matches multiple rules:
Key type:       FileQ
Key value:      overlap.txt
Rules matched:  2
Rule 1:         "overlap.*" %> at Example::21:94-106:
Rule 2:         ".txt" %> at Example::24:94-106:
Modify your rules so only one can produce the above key

We can see where the dependency was introduced (line 50), where the rules were defined (lines 21 and 24), and what their patterns were.

The Database module

The new module Development.Shake.Database provides operations for working with open Shake databases - meaning you can now open the database, run some actions against, and shut it. Unlike before, you can now run against an open database repeatedly, and query the resulting database for live or erroneous files. When combined with the new feature that /dev/null for shakeFiles results in no on-disk representation of Shake, you can create an in-memory database, execute it many times, then throw it away. These features aren't targetted at build systems, but allow reuse of Shake in other domains.

If you are using the Database module, and have a way to observe changes interactively, the deprioritize function may be of use, to cause Shake to build some unimportant rules last.

This work was supported by Digital Asset.

Changes to Builtin Rules

Most users shouldn't need to define their own types of rules, but for those who do, the biggest improvement is probably the better documentation in Development.Shake.Rule, complete with a worked example. At the same time, the builtin rules have changed slightly in incompatible ways - the docs explain the new state. These changes are intended to set the stage for Cloud Shake, following the pattern described in Build Systems a la Carte. I hope that a forthcoming release of Shake will provide an actual implementation of Cloud Shake.

Tuesday, October 02, 2018

Full-time Haskell jobs in Z├╝rich/New York, at Digital Asset

Summary: We're hiring 3 Haskell programmers and a few other roles too.

I am currently working at Digital Asset, working on our DAML programming language. We're seeking 3 additional Haskell programmers to join, 2 in New York and 1 in Zurich (remote work is not currently an option). There are also a ton of other jobs on our website, including Formal Methods and nix + some Haskell Build Engineering (also available at a more junior level).

What we do

We have built DAML, the Digital Asset Modelling Language, which is the centerpiece of our distributed ledger technology. DAML is a contract language that consists of a strongly-typed purely functional core extended with domain specific constructs to express the flow of rights and obligations underlying today's multi-party business processes. Application Developers using DAML and our distributed ledger technology are supported by the DAML SDK. It provides a type-safe integration of DAML with existing technology like Java, Scala, XML and SQL, and contains DAML Studio, which provides a modern IDE experience to develop, test, and analyse DAML programs.

Working on the Language Engineering team with Digital Asset involves partnering with people around the world (we have centers in New York, Zurich and Sydney), working with exciting new technology, where many of the answers haven't yet been figured out, producing solutions for clients, such as replacing the settlement and clearing platform of the Australian Stock Exchange (ASX), and making sure the end result has the quality required for robust usage. It's challenging work, but the impact could be huge.

What we require

We're looking for the best functional programmers out there, with a strong bias towards Haskell. If not Haskell, then Scala is useful, as other teams in the company write Scala. However, we've hired ML/F# programmers too, with good results. In particular we want:
  • Experienced functional programmer. Either some open-source libraries (Hackage/GitHub) or commercial experience.
  • Writes good, clean, effective code.
  • Existing compiler development experience is useful, if it's with GHC then even better.
We do not require any existing blockchain/DLT/finance knowledge.

How to apply

To apply, email neil.mitchell AT digitalasset.com with a copy of your CV. If you have any questions, email me.
The best way to assess technical ability is to look at code people have written. If you have any packages on Hackage or things on GitHub, please point me at the best projects. If your best code is not publicly available, please describe the Haskell projects you've been involved in.

Thursday, September 13, 2018

Review of Apple Watch (series 2)

Summary: I like mine.

I've worn a watch since I was 10, starting with a Casio F-91W, replacing it with an identical model about seven times over the years. Last year the strap broke (it's almost always the strap that breaks), and I made the decision to buy an Apple Watch. I'm very happy with my Apple Watch (series 2, 38mm).

What I use it for

The main things I use my watch for are:

  • Alarms and timers, often using Siri to set them. If you treat the voice interface like a bad command line it works well - if you say "Add an alarm at 4pm" and you had an alarm at 4pm yesterday, it just shows you the alarm but doesn't enable it. Instead you have to say "Set alarm at 4pm". If you say "Set alarm at 4:10pm" the odds of getting an alarm at 10pm are quite high.
  • Receiving notifications, email, texts, Slack messages, phone calls. When you lift your arm you get a quick preview of the message, which is enough to decide whether it's important (take out the phone), or can happily be ignored.
  • Sleep tracking, via the Sleepwatch app. It's an awesome app that tracks your sleep showing trends.
  • Paying for things via the ApplePay. Nowadays I'm on the verge of not shopping at places that don't take ApplePay. It's more secure than contactless, works everywhere contactless works, has a higher limit, and is incredibly convenient. It can also swipe through multiple credit cards. Easy and seamless.

What I wish was better

There are other features the watch offers that I was hoping to use, but for various reasons haven't worked out.

  • I am terrible at navigation, and wanted to use Google Maps on my watch, particularly while cycling. Unfortunately, there is no Google Maps app for the watch, and the Apple Maps one is even less useful than normal Apple Maps. There is a recurrent bug where it loses the map display and just displays a checkered background - which can be fixed by some complex steps including wiping the watch and reinstalling. Part of the reason for buying this watch was for navigation, so I hope this gets resolved eventually.
  • I wanted to quickly and easily track train departures on the move. Unfortunately the National Rail train time app is useless - it omits the time the train is leaving, merely saying "On time" or not. As a consequence you have to memorise the timetable, plus believe it has refreshed and isn't just showing stale information. All in all, impressively close, and totally useless.
  • The actual display of the watch is adequate - it takes a noticeable pause to display the watch face (sub-second), but compared to my Casio, it's not always available. I like actual seconds on my watch, which limits me to exactly one digital watch face. I can't believe "knowing the precise time" is such a niche feature on a watch.
  • You can't hide apps from the watch that you don't use, which means my watch face has 50 odd apps, of which I use maybe 10. Removing most of the apps would make navigation much easier.
  • The watch slows down over time, so after a couple of months it starts to lag. A reboot fixes that.
  • The straps you can buy for the watch are fantastically overpriced. The default one is OK, but my wrist in between two holes, so it's usually either a bit loose or a bit tight.
  • Exercise features haven't been much use to me, but I'd blame that on me rather than the watch...

Conclusions

The positives are enough to make it worth me having an Apple Watch, and inevitably replacing when the battery life gets too bad (for the moment, it runs about 30hrs on a charge, which is fine).