Tuesday, June 18, 2019

The One PR Per Day Rule

Summary: The rough rule I use for teams I'm on is make at least one PR per day.

One of the principles I've used quite successfully in a number of teams I've been involved with is:

Make at least one Pull Request per day

This principle nicely captures a number of development practices I consider important.

  • Most things should be reflected in code. If you spend a day coding, improving documentation, writing tests etc. there is a natural reflection in the code. If you spend a day helping someone through some problems, that probably indicates there is better documentation to be written. If you spend a day doing dev-ops, that should probably be reflected with Terraform files or similar. Not everything that matters produces code (e.g. organising an office party, immigration paperwork, attending a conference), but most things do.

  • Work incrementally. If a piece of code takes more than one day, it's a good idea to split it into smaller pieces that can land incrementally. It's always possible that after a few days work you'll realise your overarching idea wasn't great, but if you've polished up some libraries and added tests along the way, that still produced value.

  • Work with autonomy. I'm a big fan of giving developers as much autonomy as possible - discuss the broad goals and then let them figure out the details. However, with such freedom, it's much easier for things to go off in the wrong direction. Seeing incremental pieces of code every day gives a fairly good direction indicator, and allows problems to surface before a massive time investment.

  • Write reviewable code. If you have 20K lines in one big blob, there's no realistic way to review it. By splitting code into smaller, manageable, independent units it's much easier to review. More importantly, the reviewer should be able to say "No, that's not a good idea" - doing that to a days work is sad, doing it to a whole months work is brutal.

  • Foster collaboration. In a rapidly moving project, it's important that everyone is benefiting from other peoples incremental improvements, as otherwise everyone solves the same problems. By getting the code merged every day it's much easier for different people to contribute to an area of the code base, avoiding the problem of others staying away from a piece of code that someone else is working on.

  • Get feedback. If the end user is able to test the results every day that's even better, as it means they can be involved in the feedback loop - potentially refining what they actually want.

The "rule" isn't really a rule, it's more a statement of culture and principles, but one I have found concise and simple to explain. While I like this as a statement of culture, I do not measure it, as that would create all the wrong incentives.

Monday, June 17, 2019

Shake from 10,000ft

Summary: A very high-level view of the engineering aspects of Shake.

The theory behind Shake is now well documented in the Build Systems a la Carte paper, but the engineering design of the system is not. This post is a high-level overview of Shake, from 10,000ft (the types are the types I'm thinking of in my mind - read the source code for the ground truth).

Data Storage

At it's heart, the Shake "database" (not really a database) stores a mapping from keys (k) to values (v). That mapping is stored on disk as a list of (k,v) pairs. When Shake computes a new value for k, it appends it to the end of the file. When Shake starts, it loads the list, keeping only the last value for any given k, and producing a Map k v. On disk, each entry is prefixed by its size, so if the file is truncated (e.g. machine shutdown), any trailing incomplete data can be discarded.

When operating in memory, Shake uses the data type Map k (Status v), with the definitions:

data Result v = Result
    {result :: v -- ^ the result associated with the Key
    {built :: Step -- ^ when it was actually run
    ,changed :: Step -- ^ when it last changed
    ,depends :: [[Id]] -- ^ dependencies
    }

data Status v
    = Loaded (Result v)
    | Running (Either SomeException (Result v)) -> IO ()))
    | Error SomeException
    | Ready (Result v)

Data is loaded in the Loaded state. When someone demands a key it moves to Running - anyone who subsequently demands it will be added to the callback. After the run completes it becomes either Error or Ready.

Execution Model

Shake runs values in the Action monad, which is a combination of some global state (e.g. settings), per-rule state (e.g. dependencies), continuation monad with IO underlying everything. The execution model of Shake is that every Action computation is either blocked waiting for a Running to complete, or queued/executing with the thread pool. The thread pool has a list of things to do and runs them in a given priority order, respecting parallelism constraints. Most threads start up, do a bit of work, block on a Running and leave it for another item in the thread pool to continue them.

To pause a thread we use continuations, meaning the most important operation on Action (which isn't available to users!) is:

captureRAW :: ((Either SomeException a -> IO ()) -> IO ()) -> Action a

This function stops an Action in its tracks, resuming it when the continuation is called. One invariant of Shake, which is (sadly!) not tracked by the type system is that every continuation passed to captureRAW must be called exactly once.

The symmetry between Running and captureRAW is deliberate, and convenient.

To kick start the thread pool, the user specifies things to run with action - a Shake program completes when all those initial action calls have completed.

Rules

The final piece of the puzzle is what Shake actually does to build a key. The core of Shake is abstract over the k and v, but Shake ships with an outer layer of around ten additional rule types -- the most important of which is files. Taking an idealised (and inefficient) file rule, we can think of it as mapping from file paths to file contents. To provide such a rule, we first declare the type mapping:

type instance RuleResult FilePath = String

And then declare the rule. The rule says how to take the key (the filename), whether its dependencies have changed, the old value (the old contents), and produce a new value (the current contents). In addition, the rule must say if the contents have changed in a meaningful way, which causes anyone who depended on them to rebuild.

Shake programs typically call apply which builds a list of keys in parallel, moving all the keys to Loaded (or at least one to Error) before continuing.

Hidden Complexity

There's a number of pieces I haven't mentioned but which hide quite a lot of complexity:

  • Shake operates on any k/v pair, but serialising arbitrary values is hard, so Shake needs to build mapping and translation tables to make that work.
  • Many rules are defined in terms of pattern matches - e.g. **/*.c - that matching logic is tricky.
  • Many rules ultimately call command line programs, so a flexible command line execution API is required.
  • The rules that Shake ships with are highly optimised and have to operate in a variety of circumstances, e.g. with --skip flags etc, so have a lot of cases in them.
  • Shake goes to a lot of effort to make binary serialisation fast, as otherwise that turns into a bottleneck.
  • Exceptions, parallelism and continuations aren't natural bedfellows - the combination requires some care and attention.
  • There are lots of utility functions, UI concerns, profiling features etc.
  • There are lots of tests. Shake is 17K lines of code, of which 4.5K lines is tests.

The Picture Version

Stepping back, the picture diagram looks like:

Thursday, June 13, 2019

HLint's path to the GHC parser

Summary: HLint is going to switch to the GHC parser over the next few months. The plan is below.

For some time, HLint has been accumulating a list of those files which are valid GHC Haskell but don't parse with haskell-src-exts. The list of differences keeps growing. While I have appreciated all the maintainers of haskell-src-exts, there have been a fair few of them recently, and the project has felt like it was in maintenance mode, rather than a vibrant project.

To solve this problem, I decided to switch to the GHC parser. However, the GHC parse tree changes significantly with each version of GHC, and HLint needs to support more than one version of GHC. The solution was ghc-lib - a decoupling of the GHC API, turning it into a reusable library. As of now, the latest haskell-src-exts maintainer has recommended people move to ghc-lib.

The plan for HLint is tracked in a GitHub issue. The first step was to switch so all files are parsed with both haskell-src-exts and ghc-lib - with a failure if either parser fails - that step has been completed and released (with much work from Shayne Fletcher, who is my partner in crime for this transition).

The next step was to abstract over the Language.Haskell.HLint3 API to produce a version that didn't fundamentally rely on the haskell-src-exts data types. That has led to the Language.Haskell.HLint4 API which makes things like parsed modules abstract, and removes functions that Aelve Codesearch showed weren't being used in practice (e.g. functions for approximate Scope resolution).

The next release will ship with a 0.1 breaking-change bump and HLint3 reexporting what is currently HLint4. If you think the HLint4 API does not include necessary functions, please let me know ASAP. After that release, we'll start changing hints one by one to use the GHC parse tree. Once that is complete, we will drop the dependency on haskell-src-exts and the project will be complete.

For command line users of HLint you should notice greater compatibility with GHC, but relatively little else.

Monday, May 20, 2019

Hoogle XSS Vulnerability

Summary: Hoogle 5.0.17.6 and below have an XSS vulnerability, fixed in later versions.

On Friday afternoon I got an email from Alexander Gugel with the subject line "Non-persistent XSS vulnerability on hoogle.haskell.org" - never a good thing to get. He had found that Hoogle was echoing the user search string back into the page, meaning that if you searched for %27"><marquee style you could make all the results scroll past in a disturbingly hypnotic manner. Oh dear!

Step 1: Fix the website

The first concern was to fix the website. While there aren't any cookies stored by Hoogle, and there are no logon forms or similar, the Project Zero blog has taught me that given the tiniest chink, everything can be broken. Fortunately, Alex emailed me using the email address on my webpage, described the problem, and provided a 3 line diff that escaped all the problematic variables. I applied this fix and pushed a new version to hoogle.haskell.org.

Step 2: Use the type system

Like any good Haskeller, my first thought on encountering a bug is to use the type system to prevent it by construction. The problem boils down to taking user input and splicing it into an HTML page. My initial fix was to introduce a type Taint:

newtype Taint a = Taint a

escapeUntaint :: Taint String -> String
escapeUntaint (Taint x) = escapeHTML x

The idea is that instead of the query parameters to the web page being String's that can be carelessly spliced into the output, they were Taint String values whose only real unwrapping function involves escaping any HTML they may contain. Furthermore, Taint can have instances for Monad etc, meaning you can work on tainted values, but the result will always remain tainted.

Using this approach uncovered no additional problems, but gave me much more confidence there weren't any I just hadn't found.

Step 3: Make a release

At this point I made a release of Hoogle 5.0.17.7. This version has no known XSS issues with it.

Step 4: Switch to blaze-html

While Taint is an effective tool for some domains, the real problem for Hoogle was that I was building up HTML values using String - making it way too easy to create invalid HTML, and providing an easy attack vector. The next change was to switch to blaze-html, which uses strong typing to ensure the HTML is always valid. Instead of having to call escapeHTML to turn bad String into good String, I instead used H.string to turn bad String into good Markup. For the rare case where there genuinely was String that contained HTML for good reasons I used H.preEscapedString, making the "don't escape" explicit and longer, and the "do escape" the default - a much safer default.

Step 5: Use Content Security Policy headers

There are a whole suite of headers that can be returned by the server to opt in to additional checking, known as CSP headers. These headers can ban inline script, detect XSS attacks, avoid confusion with MIME types, avoid http downgrade attacks and more. Thanks to Gary Verhaegen many of these are now applied to Hoogle, meaning that even if my code is wrong, the chances of it causing any damange (even just hypnotic scrolling) are much reduced.

Step 6: Relax

Hoogle 5.0.17.8 has all the security fixes listed and is deployed to hoogle.haskell.org. Hopefully no more security issues for a while!

Many thanks to Alexander Gugel for the responsible disclosure, and to Gary Verhaegen for his work on CSP headers.

Tuesday, May 14, 2019

Shake with Applicative Parallelism

Summary: Shake now does that Applicative trick from Haxl.

In Shake 0.17.9 and below, need xs >> need ys builds xs in parallel, then afterwards builds ys in parallel. The same is true of need xs *> need ys, where *> is the applicative equivalent of >>. From Shake 0.18 onwards both versions run everything in parallel. Hopefully that makes some Shake-based build systems go faster.

What change is being made?

If you make two calls to apply without any IO, monadic-bind or state operations in between then they will be executed as though you had made a single call to apply. As examples, need, askOracle and getDirectoryFiles are all calls to apply under the hood, so can be merged. However, note that the invariants are somewhat subtle. Something as simple as:

myNeed xs = do putNormal "Needing here"; need xs

Will not be merged with a preceeding need - the function putNormal queries the state (what is the verbosity level), does IO and contains a monadic bind.

Why are you making this change?

I am making the change for two reasons: 1) people have kept asking for it since Haxl does it; 2) the Hadrian build probably benefits from it. The downsides are relatively low (more complexity inside Shake, slightly slower Action operations) but the benfits are potentially large.

Why didn't you make this change sooner?

My previous reasoning for not making the change was:

Shake could follow the Haxl approach, but does not, mainly because they are targeting different problems. In Haxl, the operations are typically read-only, and any single step is likely to involve lots of operations. In contrast, with Shake the operations definitely change the file system, and there are typically only one or two per rule. Consequently, Shake opts for an explicit approach, rather than allow users to use *> (and then inevitably add a comment because its an unusual thing to do).

I stand by that comment - explicit grouping of need or explicit use of parallel is often better - all it takes is a sneaky >>= and the parallelism disappears. But if this change improves some build times, it's hard to argue strongly against.

Will it break any build systems?

Potentially, but unlikely, and those it will break were already on thin ice. As some examples:

  • If a depends on some state change from b (e.g. creating a directory), but doesn't have a dependency on it, then need [a] >> need [b] might have worked, while need [a,b] might not. The correct solution is for a to depend on b, if it does in fact depend on b, or at the very least use orderOnly.
  • If you use getDirectoryFiles on generated files (something the documentation says is a bad idea) then if merged with the thing that generates the files you will get incoherent results. The solution is to avoid using getDirectoryFiles on generated files.

Thanks to Pepe Iborra for encouraging, testing and troubleshooting this change.

Monday, April 29, 2019

foldr under the hood

Summary: The foldr function seems simple, but is actually very complex, with lots of layers. This post dives through the layers.

The foldr function takes a list and replaces all : (cons) and [] (nil) values with functions and a final value. It's available in the Haskell Prelude and described on Wikipedia. As some examples:

sum = foldr (+) 0
map f = foldr (\x xs -> f x : xs) []

But the simple foldr described on Wikipedia is many steps away from the one in the Haskell Prelude. In this post we'll peel back the layers, learning why foldr is a lot more complicated under the hood.

Layer 1: Wikipedia definition

The definition on Wikipedia is:

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = f x (foldr f z xs)

This recursive definition directly describes what foldr does. Given a list [1,2,3] we get f 1 (f 2 (f 3 z)).

Layer 2: Static argument transformation

The problem with this definition is that it is recursive, and GHC doesn't like to inline recursive functions, which prevents a lot of optimisation. Taking a look at sum, it's a real shame that operations like (+) are passed as opaque higher-order functions, rather than specialised to the machine instruction ADD. To solve that problem, GHC defines foldr as:

foldr f z = go
    where go []     = z
          go (x:xs) = f x (go xs)

The arguments f and z are constant in all sucessive calls, so they are lifted out with a manually applied static argument transformation.

Now the function foldr is no longer recursive (it merely has a where that is recursive), so foldr can be inlined, and now + can meet up with go and everything can be nicely optimised.

Layer 3: Inline later

We now have foldr that can be inlined. However, inlining foldr is not always a good idea. In particular, GHC has an optimisation called list fusion based on the idea that combinations of foldr and build can be merged, sometimes known as short-cut deforestation. The basic idea is that if we see foldr applied to build we can get rid of both (see this post for details). We remove foldr using the GHC rewrite rule:

{-# RULES "my foldr/build" forall g k z. foldr k z (build g) = g k z #-}

The most interesting thing about this rule (for this post at least!) is that it matches foldr by name. Once we've inlined foldr we have thrown away the name, and the rule can't fire anymore. Since this rule gives significant speedups, we really want it to fire, so GHC adds an extra pragma to foldr:

{-# INLINE [0] foldr #-}

This INLINE pragma says don't try and inline foldr until the final stage of the compiler, but in that final stage, be very keen to inline it.

Layer 4: More polymorphism

However, the foldr function in the Prelude is not the one from GHC.List, but actually a more general one that works for anything Foldable. Why limit yourself to folding over lists, when you can fold over other types like Set. So now foldr is generailsed from [] to t with:

foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b

Where foldr on [] is GHC.List.foldr.

Layer 5: A default implementation

But foldr is actually in the type class Foldable, not just defined on the outside. Users defining Foldable can define only foldr and have all the other methods defined for them. But they can equally define only foldMap, and have an implicit version of foldr defined as:

foldr :: (a -> b -> b) -> b -> t a -> b
foldr f z t = appEndo (foldMap (Endo . f) t) z

Where Endo is defined as:

newtype Endo = Endo {appEndo :: a -> a}

instance Monoid (Endo a) where
    mempty = Endo id
    Endo a <> Endo b = Endo (a . b)

The function foldMap f is equivalent to mconcat . map f, so given a list [1,2,3] the steps are:

  • First apply map (Endo . f) to each element to get [Endo (f 1), Endo (f 2), Endo (f 3)].
  • Next apply mconcat to the list to get Endo (f 1) <> Endo (f 2) <> Endo (f 3).
  • Inline all the <> definitions to get Endo (f 1 . f 2 . f 3).
  • Apply the appEndo at the beginning and z at the end for (f 1 . f 2 . f 3) z.
  • Inline all the . to give f 1 (f 2 (f 3 z)), which is what we had at layer 1.

Layer 6: Optimising the default implementation

The real default implementation of foldr is:

foldr f z t = appEndo (foldMap (Endo #. f) t) z

Note that the . after Endo has become #.. Let's first explain why it's correct, then why it might be beneficial. The definition of #. is:

(#.) :: Coercible b c => (b -> c) -> (a -> b) -> (a -> c)
(#.) _ = coerce

Note that it has the same type as . (plus a Coercible constraint), but ignores it's first argument entirely. The coerce function transforms a value of type a into a value of type b with zero runtime cost, provided they have the same underlying representation. Since Endo is a newtype, that means Endo (f 1) and f 1 are implemented identically in the runtime, so coerce switches representation "for free". Note that the first argument to #. only serves to pin down the types, so if we'd passed an interesting function as the first argument it would have been ignored.

Of course, in normal circumstances, a newtype is free anyway, with no runtime cost. However, in this case we don't have a newtype, but a function application with a newtype. You can see the gory details in GHC ticket 7542, but at one point this impeeded other optimisations.

I tried a simulated version of foldr and found that if GHC can't tell that [] is the Foldable the code looks pretty bad, but if it can, at -O1 and above, the two implementations are 100% equialent (to the point that common subexpression elimination makes them actually the same). It's possible this final layer is a vestigial optimisation, or perhaps it's still important in some circumstances.

Tuesday, April 16, 2019

Code Statistics and Measuring Contributions

Summary: The only way to understand a code base is to ask someone who works on it.

This weekend a relative asked me how can we tell who wrote the code behind the black hole image, and was interested in the stats available on GitHub. There are lots of available stats, but almost every stat can be misleading in some circumstances. The only people who have the context to interpret the stats are those who work on the project, hence my default approach to assessing a project is to ask someone who works on it, with the understanding that they may look at relevant stats on GitHub or similar. In this post lets go through some of the reasons that a simplistic interpretation of the stats is often wrong.

These remarks all apply whether you're trying to assign credit for a photo, trying to do performance reviews for an employee or trying to manage a software project.

What to measure

There are broadly two ways to measure activity on the code in a repo. The first is additions/deletions of lines of code, where a modified line of code is usually measured as an addition and deletion. The second is number of commits or pull requests, which measures how many batches of changes are made. The problem with the latter is that different people have different styles - some produce big batches, some tiny batches - a factor of 10 between different developers is not unusual. There are also policy reasons that commits may be misleading - some projects squash multiple commits down to one when merging. The number of lines of code gives a better measure of what has changed, but it's merely better, not good - the rest of this post assumes people are focusing on number of lines of code changed.

All code is equal

Treating number of lines changed as the contribution assumes that every line is equally hard - but that's far from the case. At a previous company I worked on code that ranged from the internals of a compiler, to intricate C++ libraries, to Haskell GUI's. I estimate that I could produce 100x the volume of Haskell GUI's compared to C++ libraries. Other colleagues worked only only on the compiler, or only on GUIs - vastly changing how much code they produced per hour.

Similarly, each line of code is not equally important. Last week I wrote a few 100 lines of code. Of those, nearly all were done on Monday, and the remainder of the week involved a single line that is ferociously difficult with lots of obscure side conditions (libraries and link order...). That one line is super valuable, but simplistic measuring suggests I napped all Tuesday and Wednesday.

Code is attributed properly

Developers typically have user handles or email addresses that are used for code contributions. I currently have at least two handles, and in the past when we did stats on a $WORK project there were 6 different email addresses that I claimed ownership of. As a consequence, my work shows up under lots of different names, and counting it can be difficult. The longer a project runs, the more chance of developers changing identity.

The person who changed code did the work

A big part of software engineering is making old code obsolete. I was recently involved in deleting many thousands of lines that was no longer necessary. With a small team, we created a new project, implemented it, converted 90% of the uses over to the new code, and then stopped. Separately, someone else did the last 10% of the conversion, and then was able to delete a huge number of lines of code. There was definitely work in deleting the final bit of code, but the "labour" involved in that final deletion was mostly carried out months ago by others.

Similarly, when copying a new project in (often called vendoring) there is a big commit to add a lot of new code that was originally written by others, but which gets attributed to a named individual.

All code is in one place

Often projects will incorporate library code. For example, the official contribution of Niklas Broberg to HLint is 8 lines. However, he's called out explicitly in the README as someone whose contributions were essential to the project. In this case, because he wrote a library called haskell-src-exts without which HLint could not exist, and then continued to improve it for the benefit of HLint for many years.

Furthermore, projects like HLint rely on a compiler, libraries, operating system, and even a version control system. Usually these get overlooked when giving credit since they are relatively old and shared between many projects - but they are an essential part of getting things to work.

More code is better

The only purpose of code is to do a thing - whatever that thing might be. In all other ways, code is a liability - it has to be read, tested, compiled etc. Given the choice between 10 lines or 1,000,000 lines of code, I would always prefer 10 lines if they did the same thing. A smarter programmer who can do more with less lines of code is better. The famous quote attributed to Bill Gates is still true many decades later:

Measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Code is the essence

Measuring code suggests that code is the thing that matters. The code certainly does matter, but the code is just a representation of an underlying algorithm. The code follows a high-level design. Often much more significant contributions are made by picking the right design, algorithm, approach etc.

Code is all that matters

In a large project there is code, but the code doesn't exist in a vacuum. There are other code-adjacent tasks to be performed - discussions, mentoring, teaching, reporting issues, buying computers etc. Many of these are never reflected in the code, yet if omitted, the code wouldn't happen, or would happen slower.

Sunday, April 07, 2019

Code Review: Approve with Suggestions

Summary: Code review is not a yes/no decision - mostly I say yes with suggestions.

As I wrote previously, I didn't used to be a fan of code review for $WORK code, but now I am. After I review some code there are three responses I might give:

  • "Request changes" - this code has some fatal flaw. Maybe I saw a race condition. Maybe there's insufficient testing. Maybe it's just a bad idea. Please fix it or convince me I'm wrong and I'll review again.
  • "Approved" - this code is great. Let's merge it. If the CI has already passed, I'll probably merge it now myself.
  • "Approved with suggestions" - this code is fine, I'm happy for it to be merged, but I thought of a few ways to make it better.

I think I use "Approved with suggestions" about 80% of the time. To use this status I think the code is correct, readable, and will have no negative effects - I'm happy for it to be merged. At the same time, I can think of a few ways to improve it - e.g. using some utility function, simplifying things a bit, making the documentation clearer. If the original author disagrees with me, I'm not going to bother arguing. I'm happy for these commits to be in a follow up PR, or pushed on top of this PR, whatever suits them.

What's different about approved with suggestions, at least for trusted individuals (e.g. colleagues), is that if they make these tweaks I have no real interest in rereviewing. I'm happy for my approval to remain sticky and for them to seek out rereview only if they think they need it. Importantly, this guideline is consistent with my reasons for reviewing. After the first review, if the code doesn't change meaningfully, a rereview offers none of the benefits that make me want to review in the first place.

Since the programming language DAML that I work on is now open source I can point at a concrete example using a pull request to our GHC fork. Here Shayne added a function:

qualifyDesugar :: (String -> OccName) -> String -> RdrName
qualifyDesugar occName =
  (mkRdrQual $ mkModuleName "DA.Internal.Desugar") . occName

It matches the stated intention, but it seems to do a bit too much - it turns a String into an OccName using a supplied function, when it could have just taken the OccName directly. Simpler, better, more maintainable. So I suggested:

qualifyDesugar :: OccName -> RdrName
qualifyDesugar = mkRdrQual $ mkModuleName "DA.Internal.Desugar"

A nice improvement. However, Shayne is responsible enough to make such simple tweaks that it didn't require another review. A typical case of Approve with suggestions.

Monday, March 18, 2019

GHC Rebuild Times - Shake profiling

Summary: GHC rebuild times are slow. Using the Shake profiler and Hadrian we can find out why.

I just checked out GHC, and using Hadrian (the Shake-based build system), built GHC on Windows using:

hadrian\build.stack.bat -j --flavour=quickest --integer-simple --configure --profile

Namely use stack to install dependencies and build Hadrian itself, then compile as quick as I can get it, on all CPUs (8 on my machine), run configure for me and a profile report.html output. After compiling Hadrian, 40m54s later I had a built GHC. That's not a quick process! Why did it take so long? If I bought a better machine would it go faster? How might we optimise GHC? These questions and more can be answered with Shake profiling.

Shake has had profiling for years, but in the recently-released Shake 0.17.7 I've overhauled it. The profile report is generated as a web page, and the generated output in the new version is smaller (2x smaller), loads faster (100x or more) and is more intuitive (not really a numeric thing). In the rest of this post I'll pepper some screenshots from the Shake profiler without thoughts about what it could mean. I've also uploaded the profile so you can play around with it:

Hadrian Profile

Summary Page

The first page you see when opening the report is the summary.

This page gives us some basic stats. There was 1 run of the build system. It ran 3,645 traced actions (e.g. command line calls or other expensive actions) and there were 15,809 rules run (where a rule is something with dependency information - somewhere between one third to two thirds of those are likely to be files in typical build systems).

Turning to performance, the entire build, on one CPU would take 2h26m. The build on my 8 CPU machine took 40m45s, with on average 3.58 commands going at once (quite a bit less than the 8 I wanted). The critical path is about 37m14s, so that's the lower bound with infinite CPUs, so buying a machine with more CPUs won't really help (faster CPUs and a faster disk probably would though).

OK, so I'm now unhappy that GHC doesn't execute enough in parallel. So let's see what it does manage to do in parallel by switching to the Command Plot.

Command Plot

We now see a graph of what was executing at each point during the build. We see spikes in a hideous light blue for GHC, showing that when GHC gets going, it can get near to the 8 CPUs we requested. However, we see lots of periods of time with only 1 task executing. In most cases these are either sh at the start (which I happen to know is configure), or cabal-configure (which is more obviously configure). However, there are also Haskell blips where we get down to 1 CPU. I'm now increasingly convinced that the biggest problem Hadrian has (performance wise) is lack of parallelism. To confirm that, let's switch to the Parallelizability tab.

Parallelizability

This next tab predicts how long it will take to build Hadrian at various different thread counts. The green line is if there were no dependencies, the blue line is with the dependencies we have, and the yellow line is the difference. As we can see, at 8 CPU's the difference is about 16m - I could have had my GHC a whole 16m faster if we could parallelise GHC more. At the same time, it looks like the sweet spot for compiling GHC is currently around 6 CPUs - more doesn't make a huge amount of difference. How sad. To investigate let's jump to the Rules tab.

Rules

Now we've moved on from pretty graphs to tables of rules. The most interesting columns for performance work are Time (how long something took), ETime (how long it took if you only pay for the fraction of the computer you are using) and WTime (how long you were the only thing running for). The first is most relevant if you want to take less CPU, the second two if you aren't hitting the parallelism you are hoping for. Since we aren't hitting the parallelism, we can sort by WTime.

For WTime, if we eliminated that step, the total build would improve by that amount of time. Looking at the first two entries, which are the initial configure and then configure of the base library, we see a total of 8m38s. If we could get rid of configure, or speed it up, or interleave it with other operations, we could save almost 10 minutes off a GHC build. Using the search bar we can figure out how much configure costs us in total.

Now we have used the search bar to filter to only rules that run the command cabal-configure or sh, and we've named them all in the group configure (so it sums them up for us). We see we spend 15m18s configuring, and would go about 10m faster if we didn't - so it's expensive, and serialises the build a lot. I hate configure.

Slow Stage0 Compilation

Ignoring configure, the next slow things are building the stage0 compiler, so let's focus in on that.

While we can use the search bar for entering JavasScript expressions, we can equally just enter substrings. Let's delve into /compiler/ and sort by Time. We quickly see a bunch of stage0 and stage1 compiles, with HsInstances.o and DynFlags.o right at the top. Those files take a really long time to compile, and end up serialising the build quite a bit. However, it's a bit odd that we see stage0, but a lot less of stage1. Let's compare the two stages:

Now we're using a regular expression to pull out the .o compiles in compiler, and group them by their stage. We can see that both involve 1,527 compiles, but that stage0 takes 43m, while stage1 is under 18m. Why? Either we're using different sets of flags (e.g. compiling stage0 with higher optimisations or warnings), or GHC HEAD (the output of stage0 which we use to compile stage1) is significantly faster than GHC 8.6.3 (which I used to compile stage0). I've no idea, but tracking down the difference could save at least 7 minutes on the wall clock time of building GHC.

Conclusion

Compiling GHC is slow, but the biggest problem is it doesn't parallelise well. Using Shake profiling we've found that configure and stage0 are the culprits. There's lots to be done, and hopefully a Summer of Code project too.

Sunday, February 17, 2019

Quadratic "deriving Generic" Compile Times

Summary: For large data types, deriving Generic can take a long time to compile.

I was building GHC using Hadrian, and part of that process involves compiling Cabal multiple times - once to build Hadrian itself, and then once per GHC stage thereafter (typically 3 stages). Compiling Cabal takes quite a long time, but one thing I spotted was that compiling the LicenseId file alone was taking 30 seconds! Given that it seemed to be on the critical path, and usually gets compiled 4 times, that seemed worth investigating.

Looking at the file, it defines an enumeration with 354 license types in it. Pulling apart the file, I found that with only the data type and deriving Generic it was taking about 9 seconds without optimisation or 22 seconds with optimisation, using GHC 8.6.3. I then wrote a test program that generated and compiled instances of the program:

{-# LANGUAGE DeriveGeneric #-}
module Sample where
import GHC.Generics
data Sample = Ctor1 | Ctor2 | ...
   deriving Generic

The resulting graph shows that compilation time is quadratic in the number of constructors:


This phenomenon could either be because the deriving Generic itself is quadratic, or more likely that the output generated by deriving Generic provokes quadratic behaviour in some subsequent part of the compiler. I'm going to report this as a GHC bug.

Update: There's already GHC bug 5642 which explains that it might just be a fundamental property of how Generic instances look.

For Cabal the fix might be somewhat simpler - the deriving Generic is only used to define a Binary instance, which can be written in 2 lines for an Enum anyway - discussion on the issue tracker. Hopefully this change will speed up compiling Cabal, and thus speed up compiling GHC.

As I watched the GHC output go by I noticed that both Parser and DynFlags in GHC itself were also very slow, with the latter taking nearly 2 minutes. I'm sure there's more scope for optimising GHC builds, so Andrey Mokhov and I have proposed a Google Summer of Code project to tackle other low-hanging fruit.

The quick program I wrote for generating the data in this blog post is below:

import Control.Monad
import Data.List
import System.Time.Extra
import System.Process.Extra
import Numeric.Extra

main = do
    forM_ [1..] $ \n -> do
        writeFile "Sample.hs" $ unlines $
            ["{-# LANGUAGE DeriveGeneric #-}"
            ,"module Sample where"
            ,"import GHC.Generics"
            ,"data Sample = " ++ intercalate " | " ["Ctor" ++ show i | i <- [1..n]]
            ,"   deriving Generic"
            ]
        xs <- forM [0..2] $ \o ->
            fmap fst $ duration $ systemOutput_ $ "ghc -fforce-recomp Sample.hs -O" ++ show o
        putStrLn $ unwords $ show n : map (showDP 2) xs

Tuesday, February 05, 2019

Announcing ghc-lib

On behalf of Digital Asset I'm delighted to announce ghc-lib, a repackaging of the GHC API to allow it to be used on different GHC versions. The GHC API allows you use the GHC compiler as a library, so you can parse, analyze and compile Haskell code.

The GHC API comes pre-installed with GHC, and is tied to that GHC version - if you are using GHC 8.6.3, you get version 8.6.3 of the API, and can't change it. The ghc-lib package solves that problem, letting you mix and match versions of the GHC compiler and GHC API. Why might you want that?

  • Imagine you are writing a tool to work with several versions of the GHC compiler. The GHC API changes significantly between each version, so doing this would require writing a lot of C preprocessor code to support it. An alternative is to use one version of ghc-lib which works across multiple versions of GHC.
  • Imagine you are modifying the GHC API or want features from GHC HEAD. With ghc-lib you can depend on the revised GHC API, without upgrading the compiler used to build everything, speeding up iteration.

While ghc-lib provides the full GHC API, it doesn't contain a runtime system, nor does it create a package database. That means you can't run code produced by ghc-lib (no runtime), and compiling off-the-shelf code is very hard (no package database containing the base library). What you can do:

The package ghc-lib is released on Hackage, and can be used like any normal package, e.g. cabal install ghc-lib. Since ghc-lib conflicts perfectly with the GHC API and template-haskell, you may wish to ghc-pkg hide ghc-lib and use the language extension PackageImports to do import "ghc-lib" GHC. There will be two release streams within the ghc-lib name:

  • Version 8.8.1 will be the version of ghc-lib produced against the released GHC 8.8.1, once GHC 8.8.1 is released. There is no release against GHC 8.6.3 because we had to make changes to GHC to enable ghc-lib, which were only upstreamed in the last few months.
  • Version 0.20190204 is the version of ghc-lib using GHC HEAD on the date 2019-02-04.

We've been developing and using ghc-lib internally at Digital Asset for the last six months. The use of ghc-lib has enabled us to track GHC HEAD for certain projects, and develop improvements to GHC itself, and then integrate them without requiring us to rebuild all Haskell source code on every step. Smoothing out that development loop has been a massive productivity boon to us.

While this is Digital Asset's first open source project in a while, we have been making lots of contributions behind the scenes - it's no coincidence several of my recent posts involve my colleague Shayne. In particular, our engineers have been behind several GHC proposals.

While I'm announcing the project, much of the work has been done by Shayne Fletcher and Jussi Maki, but neither of them have active blogs just yet!

Friday, January 25, 2019

HLint Unused Extension Hints

Summary: HLint detects unused extensions in LANGUAGE pragmas, including over 17,000 on Hackage.

HLint has detected unused LANGUAGE pragmas for a while - where you enable an extension (e.g. {-# LANGUAGE EmptyDataDecls #-}) but don't use it. HLint v2.1.13 includes some improvements from Yair and myself making these hints even more powerful. As a result, I thought it worth showing some examples of what HLint can do in this area. I started by running HLint on all of Hackage, which found 17,718 "Unused LANGUAGE pragma" hints, including the examples in this post.

Detecting unused extensions

For extensions that show up as syntax (e.g. EmptyDataDecls, ViewPatterns etc), HLint has rules saying which constructs require which extensions. For extensions that aren't syntax directed (e.g. AllowAmbiguousTypes or IncoherentInstances), HLint can't detect whether they are used or not. In all, HLint has rules for detecting 36 different unused extensions. Taking a look at some examples from Hackage:

abcBridge-0.15\src\Data\ABC\AIG.hs:18:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE EmptyDataDecls #-}
Perhaps you should remove it.

mallard-0.6.1.1\lib\Database\Mallard\Validation.hs:4:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE TemplateHaskell #-}
Perhaps you should remove it.

scholdoc-texmath-0.1.0.1\src\Text\TeXMath\Writers\TeX.hs:1:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE GeneralizedNewtypeDeriving, ViewPatterns, GADTs #-}
Perhaps:
  {-# LANGUAGE GeneralizedNewtypeDeriving, GADTs #-}

As we can see, HLint can spot entirely redundant extension declarations, and also prune those that are partly redundant.

Duplicate extensions

Sometimes extension are simply duplicated, and HLint detects these, either between two separate pragmas, or within a single pragma.

ghcjs-base-stub-0.2.0.0\src\GHCJS\Marshal\Pure.hs:3:1: Warning: Use fewer LANGUAGE pragmas
Found:
  {-# LANGUAGE DefaultSignatures #-}
  {-# LANGUAGE DefaultSignatures #-}
Perhaps:
  {-# LANGUAGE DefaultSignatures #-}

abstract-deque-tests-0.3\Data\Concurrent\Deque\Tests.hs:1:1: Warning: Use fewer LANGUAGE pragmas
Found:
  {-# LANGUAGE BangPatterns, RankNTypes, CPP, BangPatterns #-}
Perhaps:
  {-# LANGUAGE BangPatterns, RankNTypes, CPP #-}

Implied extensions

The new feature for v2.1.13 is that extension are detected as redundant if they are implied by other extensions. For example, if you have PolyKinds defined then that implies KindSignatures. HLint now features a list of such implications, which it uses to detect redundant extensions.

AERN-RnToRm-0.5.0.1\src\Data\Number\ER\RnToRm\UnitDom\Base.hs:1:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE MultiParamTypeClasses #-}
Perhaps you should remove it.
Note: Extension MultiParamTypeClasses is implied by FunctionalDependencies

attoparsec-0.13.2.2\Data\Attoparsec\ByteString\Char8.hs:1:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,
    TypeSynonymInstances, GADTs #-}
Perhaps:
  {-# LANGUAGE BangPatterns, CPP, FlexibleInstances, TypeFamilies,
    GADTs #-}
Note: Extension TypeSynonymInstances is implied by FlexibleInstances

Redundant extensions that imply non-redundant extensions

Sometimes there is an extension that you can tell is unused (e.g. RecordWildCards), which implies an extension that is either being used or can't be detected (e.g. DisambiguateRecordFields). In such cases HLint gives a note that the implied extension might now need to be provided explicitly, although usually it won't be necessary. As examples:

gogol-maps-engine-0.3.0\gen\Network\Google\Resource\MapsEngine\Projects\List.hs:7:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE RecordWildCards #-}
Perhaps you should remove it.
Note: may require `{-# LANGUAGE DisambiguateRecordFields #-}` adding to the top of the file

manifolds-0.5.0.1\Data\Function\Affine.hs:14:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE FunctionalDependencies #-}
Perhaps you should remove it.
Note: may require `{-# LANGUAGE MultiParamTypeClasses #-}` adding to the top of the file

Being wrong

Finally, sometimes HLint gets it a bit wrong. As an example:

shake-0.17.1\src\Development\Shake\Internal\FileInfo.hs:1:1: Warning: Unused LANGUAGE pragma
Found:
  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP,
    ForeignFunctionInterface #-}
Perhaps:
  {-# LANGUAGE GeneralizedNewtypeDeriving, DeriveDataTypeable, CPP
    #-}

Here HLint has detected that ForeignFunctionInterface is not used, but in fact it is, although only under one branch of an #ifdef. To fix the hint we can put the extension itself under CPP, adjust the CPP definitions given to HLint, or ignore the hint.

Tuesday, January 22, 2019

Release delays with Stackage

Summary: There are two steps that delay new versions of packages in Stackage.

I aim to get the latest version of my software out to as many people as quickly as possible. Older versions have bugs, new versions have new features - that's why I release new versions. Unfortunately there are two steps in Stackage that slow down this process.

Taking an example, HLint depends on haskell-src-exts, and is tightly coupled, so (to a first approximation) every 0.1 bump to haskell-src-exts requires changing HLint. There are also lots of other packages that depend on haskell-src-exts. This situation leads to two delays in getting HLint to Stackage users, both of which are on display in bug 4214:

Issue 1: Reluctance to remove packages

Stackage has a policy that if a new package (e.g. haskell-src-exts) is released which breaks your package (e.g. haskell-src-meta) you have an unspecified amount of time to release an update. My experience is either packages are updated quickly (all upgrades on that ticket happened within 12 days) or the package maintainers never reply (46 days later no other maintainer has even left a comment).

It used to be the case that there were hard time limits (maximum one month), but my experience was those were never enforced. Unfortunately this lag can cause a significant delay until Stackage Nightly picks up an upgrade. It seems like a more mechanical rule (e.g. after 2 weeks with no update, or 6 weeks total) might keep the process ticking faster. I appreciate it's hard to break people's work, which is why making it come down to human judgement seems to lengthen the process significantly.

Delay imposed: up to 2 months, and sometimes requires chasing.

Issue 2: Existence of Stackage LTS

While the first issue is very much a trade off, the second one is (in my view) just a bad design of Stackage, as I've said before. There is Stackage Nightly which has the latest code. There is Stackage LTS which has older and therefore buggier code, up to 2-3 months older. Having two options is fine, but the stack tool and documentation direct people towards LTS as a preference. LTS is useful if you view the act of upgrading between 0.0.1 versions as low risk (which it isn't) or you find it easier to fix multiple distinct breaking changes when they are overlapped (which it isn't). Unfortunately Stackage LTS users won't get a new version of HLint until a new Stackage LTS version is created, even after it gets merged. On the plus side, this process happens automatically without any intervention by package authors.

Delay imposed: 2-3 months.

PS. While I criticise Stackage, that's because I want to make it better, since it is a very useful distribution channel for many people, and I'm grateful for the work the Stackage maintainers do to keep the process ticking along.

Thursday, January 17, 2019

Ignoring HLint

Summary: HLint now has more ways to ignore hints you don't like.

HLint makes suggestions about how to improve your Haskell code. But not everyone likes all the suggestions all the time, so HLint comes with ways of ignoring those annoying hints, and HLint 2.1.11 provides even more mechanisms. Without further ado, let's take a quick tour - full details are in the HLint README.

Method 1: the --default flag

To ignore all hints your code currently generates run hlint as normal, but passing the --default flag, which will generate a config file with all hints that fire set to ignore. Typically, when approaching a new code base to run HLint on, I start by doing:

hlint . --default > .hlint.yaml

After that, it's easy to remove ignored hints from .hlint.yaml one by one and fix the code.

Method 2: Add -ignore directives

In the .hlint.yaml file you can write:

- ignore: {name: Eta reduce}

This directive ignores the named hint, and is what --default generates. There are also more refined ways of ignoring a hint in certain modules, or ignoring all hints in certain modules (see the README).

Method 3: Add a {- comment -}

Method 3 actually has 3 sub-methods, you can write any of:

  • {-# ANN module "HLint: ignore Eta reduce" #-}
  • {-# HLINT ignore "Eta reduce" #-}
  • {- HLINT ignore "Eta reduce" -}

For ANN pragmas it is important to put them after any import statements. If you have the OverloadedStrings extension enabled you will need to give an explicit type to the annotation, e.g. {-# ANN module ("HLint: ignore Eta reduce" :: String) #-}. The ANN pragmas can also increase compile times or cause more recompilation than otherwise required, since they are evaluated by TemplateHaskell.

For {-# HLINT #-} pragmas GHC may give a warning about an unrecognised pragma, which can be supressed with -Wno-unrecognised-pragmas.

For {- HLINT -} comments they are likely to be treated as comments in syntax highlighting, which can lead to them being overlooked.

My current preference is {- HLINT -}, but I think GHC should just special case {-# HLINT #-} and then in a few GHC releases we could use that. Unfortunately, other people disagree with me, so {- HLINT -} is the best we have.

Method 4: Using the C Pre Processor

hlint defines the __HLINT__ preprocessor definition (with value 1), so problematic definitions (including those that don't parse) can be hidden with:

#ifndef __HLINT__
foo = ( -- HLint would fail to parse this
#endif

Wednesday, January 09, 2019

GHC: From Bug to Merge (2)

Summary: The story of another bug, from report, patch, revisions, to merge.

I recently posted the story of a GHC bug that took 3 months to fix, which isn't great. I hoped that the recent infrastructure work to move GHC to GitLab would speed that up in future. Fortunately, I got to test that theory shortly after.

When experimenting with RebindableSyntax and MonadFailDesugaring I kept getting the error:

The failable pattern ‘Just x’
    is used together with -XRebindableSyntax. If this is intentional,
    compile with -Wno-missing-monadfail-instances.

It's annoying the warning is on by default, but nevermind, let's add -Wno-missing-monadfail-instances to silence the compiler. But alas, no flags could make the warning go away. Looking at the code, it's clear why:

; if | rebindableSyntax && (desugarFlag || missingWarning)
        -> warnRebindableClash pat

If you have RebindableSyntax and MonadFailDesugaring turned on, the value of the warning flag (missingWarning) is ignored. Boolean logic is fiddly, but replacing || with && seems to do the right thing.

Raise a PR

At this point I got Shayne Fletcher involved, who actually ran with most of the steps from here onwards. Given the change is small and the original code was obviously untested, we decided to raise a GitHub PR, skipping the Trac ticket and GHC Proposal steps.

A few days later GHC GitLab became available, so we closed the first PR and opened a GitLab PR.

Fix the PR

As with the previous bug to merge story, the immediate feedback was "please add a test suite entry", which we did.

Thanks to the better integration with CI etc, the PR clearly passed the tests and got merged shortly thereafter.

Timeline

24 Dec - raise GitHub PR
27 Dec - raise GitLab PR
27 Dec - request for changes
28 Dec - code updated
29 Dec - merged

This bug was small enough to skip the bug tracker and proposal process, but even ignoring those steps, the speed was fantastic even over the holiday period. Hopefully this speed is the new normal!