Sunday, February 17, 2019

Quadratic "deriving Generic" Compile Times

Summary: For large data types, deriving Generic can take a long time to compile.

I was building GHC using Hadrian, and part of that process involves compiling Cabal multiple times - once to build Hadrian itself, and then once per GHC stage thereafter (typically 3 stages). Compiling Cabal takes quite a long time, but one thing I spotted was that compiling the LicenseId file alone was taking 30 seconds! Given that it seemed to be on the critical path, and usually gets compiled 4 times, that seemed worth investigating.

Looking at the file, it defines an enumeration with 354 license types in it. Pulling apart the file, I found that with only the data type and deriving Generic it was taking about 9 seconds without optimisation or 22 seconds with optimisation, using GHC 8.6.3. I then wrote a test program that generated and compiled instances of the program:

{-# LANGUAGE DeriveGeneric #-}
module Sample where
import GHC.Generics
data Sample = Ctor1 | Ctor2 | ...
   deriving Generic

The resulting graph shows that compilation time is quadratic in the number of constructors:

This phenomenon could either be because the deriving Generic itself is quadratic, or more likely that the output generated by deriving Generic provokes quadratic behaviour in some subsequent part of the compiler. I'm going to report this as a GHC bug.

Update: There's already GHC bug 5642 which explains that it might just be a fundamental property of how Generic instances look.

For Cabal the fix might be somewhat simpler - the deriving Generic is only used to define a Binary instance, which can be written in 2 lines for an Enum anyway - discussion on the issue tracker. Hopefully this change will speed up compiling Cabal, and thus speed up compiling GHC.

As I watched the GHC output go by I noticed that both Parser and DynFlags in GHC itself were also very slow, with the latter taking nearly 2 minutes. I'm sure there's more scope for optimising GHC builds, so Andrey Mokhov and I have proposed a Google Summer of Code project to tackle other low-hanging fruit.

The quick program I wrote for generating the data in this blog post is below:

import Control.Monad
import Data.List
import System.Time.Extra
import System.Process.Extra
import Numeric.Extra

main = do
    forM_ [1..] $ \n -> do
        writeFile "Sample.hs" $ unlines $
            ["{-# LANGUAGE DeriveGeneric #-}"
            ,"module Sample where"
            ,"import GHC.Generics"
            ,"data Sample = " ++ intercalate " | " ["Ctor" ++ show i | i <- [1..n]]
            ,"   deriving Generic"
        xs <- forM [0..2] $ \o ->
            fmap fst $ duration $ systemOutput_ $ "ghc -fforce-recomp Sample.hs -O" ++ show o
        putStrLn $ unwords $ show n : map (showDP 2) xs

Tuesday, February 05, 2019

Announcing ghc-lib

On behalf of Digital Asset I'm delighted to announce ghc-lib, a repackaging of the GHC API to allow it to be used on different GHC versions. The GHC API allows you use the GHC compiler as a library, so you can parse, analyze and compile Haskell code.

The GHC API comes pre-installed with GHC, and is tied to that GHC version - if you are using GHC 8.6.3, you get version 8.6.3 of the API, and can't change it. The ghc-lib package solves that problem, letting you mix and match versions of the GHC compiler and GHC API. Why might you want that?

  • Imagine you are writing a tool to work with several versions of the GHC compiler. The GHC API changes significantly between each version, so doing this would require writing a lot of C preprocessor code to support it. An alternative is to use one version of ghc-lib which works across multiple versions of GHC.
  • Imagine you are modifying the GHC API or want features from GHC HEAD. With ghc-lib you can depend on the revised GHC API, without upgrading the compiler used to build everything, speeding up iteration.

While ghc-lib provides the full GHC API, it doesn't contain a runtime system, nor does it create a package database. That means you can't run code produced by ghc-lib (no runtime), and compiling off-the-shelf code is very hard (no package database containing the base library). What you can do:

The package ghc-lib is released on Hackage, and can be used like any normal package, e.g. cabal install ghc-lib. Since ghc-lib conflicts perfectly with the GHC API and template-haskell, you may wish to ghc-pkg hide ghc-lib and use the language extension PackageImports to do import "ghc-lib" GHC. There will be two release streams within the ghc-lib name:

  • Version 8.8.1 will be the version of ghc-lib produced against the released GHC 8.8.1, once GHC 8.8.1 is released. There is no release against GHC 8.6.3 because we had to make changes to GHC to enable ghc-lib, which were only upstreamed in the last few months.
  • Version 0.20190204 is the version of ghc-lib using GHC HEAD on the date 2019-02-04.

We've been developing and using ghc-lib internally at Digital Asset for the last six months. The use of ghc-lib has enabled us to track GHC HEAD for certain projects, and develop improvements to GHC itself, and then integrate them without requiring us to rebuild all Haskell source code on every step. Smoothing out that development loop has been a massive productivity boon to us.

While this is Digital Asset's first open source project in a while, we have been making lots of contributions behind the scenes - it's no coincidence several of my recent posts involve my colleague Shayne. In particular, our engineers have been behind several GHC proposals.

While I'm announcing the project, much of the work has been done by Shayne Fletcher and Jussi Maki, but neither of them have active blogs just yet!

Friday, January 25, 2019

HLint Unused Extension Hints

Summary: HLint detects unused extensions in LANGUAGE pragmas, including over 17,000 on Hackage.

HLint has detected unused LANGUAGE pragmas for a while - where you enable an extension (e.g. {-# LANGUAGE EmptyDataDecls #-}) but don't use it. HLint v2.1.13 includes some improvements from Yair and myself making these hints even more powerful. As a result, I thought it worth showing some examples of what HLint can do in this area. I started by running HLint on all of Hackage, which found 17,718 "Unused LANGUAGE pragma" hints, including the examples in this post.

Detecting unused extensions

For extensions that show up as syntax (e.g. EmptyDataDecls, ViewPatterns etc), HLint has rules saying which constructs require which extensions. For extensions that aren't syntax directed (e.g. AllowAmbiguousTypes or IncoherentInstances), HLint can't detect whether they are used or not. In all, HLint has rules for detecting 36 different unused extensions. Taking a look at some examples from Hackage:

abcBridge-0.15\src\Data\ABC\AIG.hs:18:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE EmptyDataDecls #-}
Perhaps you should remove it.

mallard-\lib\Database\Mallard\Validation.hs:4:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE TemplateHaskell #-}
Perhaps you should remove it.

scholdoc-texmath-\src\Text\TeXMath\Writers\TeX.hs:1:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE GeneralizedNewtypeDeriving, ViewPatterns, GADTs #-}
  {-# LANGUAGE GeneralizedNewtypeDeriving, GADTs #-}

As we can see, HLint can spot entirely redundant extension declarations, and also prune those that are partly redundant.

Duplicate extensions

Sometimes extension are simply duplicated, and HLint detects these, either between two separate pragmas, or within a single pragma.

ghcjs-base-stub-\src\GHCJS\Marshal\Pure.hs:3:1: Warning: Use fewer LANGUAGE pragmas
  {-# LANGUAGE DefaultSignatures #-}
  {-# LANGUAGE DefaultSignatures #-}
  {-# LANGUAGE DefaultSignatures #-}

abstract-deque-tests-0.3\Data\Concurrent\Deque\Tests.hs:1:1: Warning: Use fewer LANGUAGE pragmas
  {-# LANGUAGE BangPatterns, RankNTypes, CPP, BangPatterns #-}
  {-# LANGUAGE BangPatterns, RankNTypes, CPP #-}

Implied extensions

The new feature for v2.1.13 is that extension are detected as redundant if they are implied by other extensions. For example, if you have PolyKinds defined then that implies KindSignatures. HLint now features a list of such implications, which it uses to detect redundant extensions.

AERN-RnToRm-\src\Data\Number\ER\RnToRm\UnitDom\Base.hs:1:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE MultiParamTypeClasses #-}
Perhaps you should remove it.
Note: Extension MultiParamTypeClasses is implied by FunctionalDependencies

attoparsec-\Data\Attoparsec\ByteString\Char8.hs:1:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,
    TypeSynonymInstances, GADTs #-}
  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,
    GADTs #-}
Note: Extension TypeSynonymInstances is implied by FlexibleInstances

Redundant extensions that imply non-redundant extensions

Sometimes there is an extension that you can tell is unused (e.g. RecordWildCards), which implies an extension that is either being used or can't be detected (e.g. DisambiguateRecordFields). In such cases HLint gives a note that the implied extension might now need to be provided explicitly, although usually it won't be necessary. As examples:

gogol-maps-engine-0.3.0\gen\Network\Google\Resource\MapsEngine\Projects\List.hs:7:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE RecordWildCards #-}
Perhaps you should remove it.
Note: may require `{-# LANGUAGE DisambiguateRecordFields #-}` adding to the top of the file

manifolds-\Data\Function\Affine.hs:14:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE FunctionalDependencies #-}
Perhaps you should remove it.
Note: may require `{-# LANGUAGE MultiParamTypeClasses #-}` adding to the top of the file

Being wrong

Finally, sometimes HLint gets it a bit wrong. As an example:

shake-0.17.1\src\Development\Shake\Internal\FileInfo.hs:1:1: Warning: Unused LANGUAGE pragma
  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP,
    ForeignFunctionInterface #-}
  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP

Here HLint has detected that ForeignFunctionInterface is not used, but in fact it is, although only under one branch of an #ifdef. To fix the hint we can put the extension itself under CPP, adjust the CPP definitions given to HLint, or ignore the hint.

Tuesday, January 22, 2019

Release delays with Stackage

Summary: There are two steps that delay new versions of packages in Stackage.

I aim to get the latest version of my software out to as many people as quickly as possible. Older versions have bugs, new versions have new features - that's why I release new versions. Unfortunately there are two steps in Stackage that slow down this process.

Taking an example, HLint depends on haskell-src-exts, and is tightly coupled, so (to a first approximation) every 0.1 bump to haskell-src-exts requires changing HLint. There are also lots of other packages that depend on haskell-src-exts. This situation leads to two delays in getting HLint to Stackage users, both of which are on display in bug 4214:

Issue 1: Reluctance to remove packages

Stackage has a policy that if a new package (e.g. haskell-src-exts) is released which breaks your package (e.g. haskell-src-meta) you have an unspecified amount of time to release an update. My experience is either packages are updated quickly (all upgrades on that ticket happened within 12 days) or the package maintainers never reply (46 days later no other maintainer has even left a comment).

It used to be the case that there were hard time limits (maximum one month), but my experience was those were never enforced. Unfortunately this lag can cause a significant delay until Stackage Nightly picks up an upgrade. It seems like a more mechanical rule (e.g. after 2 weeks with no update, or 6 weeks total) might keep the process ticking faster. I appreciate it's hard to break people's work, which is why making it come down to human judgement seems to lengthen the process significantly.

Delay imposed: up to 2 months, and sometimes requires chasing.

Issue 2: Existence of Stackage LTS

While the first issue is very much a trade off, the second one is (in my view) just a bad design of Stackage, as I've said before. There is Stackage Nightly which has the latest code. There is Stackage LTS which has older and therefore buggier code, up to 2-3 months older. Having two options is fine, but the stack tool and documentation direct people towards LTS as a preference. LTS is useful if you view the act of upgrading between 0.0.1 versions as low risk (which it isn't) or you find it easier to fix multiple distinct breaking changes when they are overlapped (which it isn't). Unfortunately Stackage LTS users won't get a new version of HLint until a new Stackage LTS version is created, even after it gets merged. On the plus side, this process happens automatically without any intervention by package authors.

Delay imposed: 2-3 months.

PS. While I criticise Stackage, that's because I want to make it better, since it is a very useful distribution channel for many people, and I'm grateful for the work the Stackage maintainers do to keep the process ticking along.

Thursday, January 17, 2019

Ignoring HLint

Summary: HLint now has more ways to ignore hints you don't like.

HLint makes suggestions about how to improve your Haskell code. But not everyone likes all the suggestions all the time, so HLint comes with ways of ignoring those annoying hints, and HLint 2.1.11 provides even more mechanisms. Without further ado, let's take a quick tour - full details are in the HLint README.

Method 1: the --default flag

To ignore all hints your code currently generates run hlint as normal, but passing the --default flag, which will generate a config file with all hints that fire set to ignore. Typically, when approaching a new code base to run HLint on, I start by doing:

hlint . --default > .hlint.yaml

After that, it's easy to remove ignored hints from .hlint.yaml one by one and fix the code.

Method 2: Add -ignore directives

In the .hlint.yaml file you can write:

- ignore: {name: Eta reduce}

This directive ignores the named hint, and is what --default generates. There are also more refined ways of ignoring a hint in certain modules, or ignoring all hints in certain modules (see the README).

Method 3: Add a {- comment -}

Method 3 actually has 3 sub-methods, you can write any of:

  • {-# ANN module "HLint: ignore Eta reduce" #-}
  • {-# HLINT ignore "Eta reduce" #-}
  • {- HLINT ignore "Eta reduce" -}

For ANN pragmas it is important to put them after any import statements. If you have the OverloadedStrings extension enabled you will need to give an explicit type to the annotation, e.g. {-# ANN module ("HLint: ignore Eta reduce" :: String) #-}. The ANN pragmas can also increase compile times or cause more recompilation than otherwise required, since they are evaluated by TemplateHaskell.

For {-# HLINT #-} pragmas GHC may give a warning about an unrecognised pragma, which can be supressed with -Wno-unrecognised-pragmas.

For {- HLINT -} comments they are likely to be treated as comments in syntax highlighting, which can lead to them being overlooked.

My current preference is {- HLINT -}, but I think GHC should just special case {-# HLINT #-} and then in a few GHC releases we could use that. Unfortunately, other people disagree with me, so {- HLINT -} is the best we have.

Method 4: Using the C Pre Processor

hlint defines the __HLINT__ preprocessor definition (with value 1), so problematic definitions (including those that don't parse) can be hidden with:

#ifndef __HLINT__
foo = ( -- HLint would fail to parse this

Wednesday, January 09, 2019

GHC: From Bug to Merge (2)

Summary: The story of another bug, from report, patch, revisions, to merge.

I recently posted the story of a GHC bug that took 3 months to fix, which isn't great. I hoped that the recent infrastructure work to move GHC to GitLab would speed that up in future. Fortunately, I got to test that theory shortly after.

When experimenting with RebindableSyntax and MonadFailDesugaring I kept getting the error:

The failable pattern ‘Just x’
    is used together with -XRebindableSyntax. If this is intentional,
    compile with -Wno-missing-monadfail-instances.

It's annoying the warning is on by default, but nevermind, let's add -Wno-missing-monadfail-instances to silence the compiler. But alas, no flags could make the warning go away. Looking at the code, it's clear why:

; if | rebindableSyntax && (desugarFlag || missingWarning)
        -> warnRebindableClash pat

If you have RebindableSyntax and MonadFailDesugaring turned on, the value of the warning flag (missingWarning) is ignored. Boolean logic is fiddly, but replacing || with && seems to do the right thing.

Raise a PR

At this point I got Shayne Fletcher involved, who actually ran with most of the steps from here onwards. Given the change is small and the original code was obviously untested, we decided to raise a GitHub PR, skipping the Trac ticket and GHC Proposal steps.

A few days later GHC GitLab became available, so we closed the first PR and opened a GitLab PR.

Fix the PR

As with the previous bug to merge story, the immediate feedback was "please add a test suite entry", which we did.

Thanks to the better integration with CI etc, the PR clearly passed the tests and got merged shortly thereafter.


24 Dec - raise GitHub PR
27 Dec - raise GitLab PR
27 Dec - request for changes
28 Dec - code updated
29 Dec - merged

This bug was small enough to skip the bug tracker and proposal process, but even ignoring those steps, the speed was fantastic even over the holiday period. Hopefully this speed is the new normal!

Wednesday, December 12, 2018

GHC: From Bug to Merge

Summary: The story of a bug, from report, proposal, patch, revisions, to merge.

A while ago I found a bug in GHC. Like all good bugs, this one starts with playing around, in this case seeing if it was possible to eliminate String from a base-like library. There are two interesting extensions:

  • OverloadedStrings changes each string literal to be wrapped in fromString, so "hello" in the source code becomes fromString "hello" when desugared.
  • RebindableSyntax means that when GHC sees fromString or fail (or lots of other functions) it uses the version in scope, rather than the version in the base library.

Taking a look at the code:

foo x = do
    Just y <- x
    return y

In GHC 8.6 it desugars to something equivalent to:

foo x = x >>= \v ->
    case v of
        Just y -> return y
        _ -> fail ("pattern match error" :: String)

In particular, it does not insert a fromString around the generated "pattern match error". If you are using the built-in fail, that's fine, since the built-in fail wants a String. But if we define a custom fail function which doesn't use String, but which takes the output of fromString, then we can get type errors with the definitions:

newtype Text = Text String
fail (Text x) = error x
fromString = Text

The solution is fairly simple - desugar adding a fromString - which is what GHC 8.8 will do. This post is the story of getting that change integrated.

Raise a GHC ticket

The first step after finding a bug is to raise a GHC ticket, which I did as Trac 15645, also offering to fix it. After discussing a bit with the GHC developers, we came the conclusion that the essential property of the bug is:

Overloaded strings should imply even generated strings are overloaded, if they are passed to user-controlled functions.

I think (and still think) that this ticket is a pure bug fix, partly because changing it does not imply changing the user manual in any way. However, GHC developers quite reasonably disagreed, and suggested I go through the GHC proposal process.

Raise a GHC proposal

The GHC Proposal process involves creating a proposal, having people discuss it, and then passing it via a committee. At this point I got Shayne Fletcher involved, who actually ran with most of the steps from here onwards. We raised GHC Proposal 168, setting out the case why the change should be made. We @ mentioned people on GitHub, and posted to Twitter to solicit opposing views, but only 4 people made any comment, and none of them disagreed. We submitted to the committee after the discussion was verifiably dead (not that it was ever super alive), and had the proposal accepted.

Write the code

After having the proposal accepted the hard work of writing the code began as Phab D5251. The code itself wasn't simple, but neither was it very complex. An initial version was reviewed with lots of good suggestions from Simon Peyton Jones. Shayne made some revisions, including adding a regression test, and the patch was ready. Sometime later the code was merged to GHC master.


The whole process took a long time - 14 Sep to 12 Dec (fortunately all 2018), so nearly 3 months. That timeline was significantly extended because GHC is in the process of changing hosting for their code, which both makes it harder to merge stuff and involves the people who would normally be merging stuff doing other work. However, I thought it instructive to show where the time went.

  • 14 Sep - 14 Sep: Open bug and discuss it
  • 14 Sep - 17 Sep: Writing the proposal
  • 17 Sep - 20 Sep: Discussing the proposal
  • 20 Sep - 24 Sep: Confirming the discussion had died down
  • 24 Sep - 21 Oct: Waiting for the committee decision
  • 21 Oct - 22 Oct: Opening the Phab ticket
  • 22 Oct - 25 Oct: Discussion on the code review
  • 25 Oct - 29 Oct: Addressing review comments
  • 29 Oct - 31 Oct: Additional review and discussion
  • 1 Nov - 26 Nov: Waiting for people to give a final approval
  • 26 Nov - 11 Dec: Waiting to merge in

I think the interesting thing is that of the 3 months, over 2 months was spent waiting for confirmations of a mechanical nature. However, every time real in-depth thoughtful discussion was required (code review, discussing the bug) it happened almost immediately.