Monday, March 16, 2020

The <- pure pattern

Summary: Sometimes <- pure makes a lot of sense, avoiding some common bugs.

In Haskell, in a monadic do block, you can use either <- to bind monadic values, or let to bind pure values. You can also use pure or return to wrap a value with the monad, meaning the following are mostly equivalent:

let x = myExpression
x <- pure myExpression

The one place they aren't fully equivalent is when myExpression contains x within it, for example:

let x = x + 1
x <- pure (x + 1)

With the let formulation you get an infinite loop when x is evaluated, since the definition refers to itself, whereas with the <- pure pattern you take the previously defined x and add 1 to it. To avoid the infinite loop, the usual solution with let is to rename the variable on the left, e.g.:

let x2 = x + 1

And now make sure you use x2 everywhere from now on. However, x remains in scope, with a more convenient name, and the same type, but probably shouldn't be used. Given a sequence of such bindings, you often end up with:

let x2 = x + 1
let x3 = x2 + 1
let x4 = x3 + 1
...

Given a large number of unchecked indices that must be strictly incrementing, bugs usually creep in, especially when refactoring. The unused-variable warning will sometimes catch mistakes, but not if a variable is legitimately used twice while one of those uses is incorrect.

Given the potential errors, when a variable x is morally "changing" in a way that the old x is no longer useful, I find it much simpler to write:

x <- pure myExpression

The compiler now statically ensures we haven't fallen into the traps of an infinite loop (which is obvious and frustrating to track down) or using the wrong data (which is much harder to track down, and often very subtly wrong).
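
As a minimal sketch (the file-processing steps are made up for illustration), here's the pattern in a real do block, where each stage supersedes the last:

import Data.Char (toLower)
import Data.List (isPrefixOf)

processFile :: FilePath -> IO ()
processFile path = do
    x <- readFile path
    -- each binding shadows the previous x, so the stale value
    -- can't be used by accident, and there's no infinite loop
    x <- pure (unlines (filter (not . isPrefixOf "--") (lines x)))
    x <- pure (map toLower x)
    putStrLn x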

What I really want: What I actually think Haskell should have done is make let non-recursive, and have a special letrec keyword for recursive bindings (leaving where recursive by default). This distinction is present in GHC Core, and would mean let was much safer.

What HLint does: HLint is very aware of the <- pure pattern, but also aware that a lot of beginners should be guided towards let. If any variable is defined more than once on the LHS of an <- then it leaves the do alone, otherwise it will suggest let for those where it fits.

Warnings: In the presence of mdo or do rec both formulations might end up being the same. If the left is a refutable pattern you change between error and fail, which might be quite different. Let bindings might be generalised. This pattern gives a warning about shadowed variables with -Wall.

Monday, March 02, 2020

How to get a Haskell job

Summary: There are four things I recommend to get a Haskell job. Applies to most technologies.

I was recently emailed by someone who asked for advice on what they could do to get a Haskell job in the future. Rather than share my reply only with them, I thought I'd cc the world via my blog. I'd give the same advice if asked about how to get a job focusing on any technology, just changing the examples. While the pieces of advice explain how they can be used to get a job, I believe they are all useful in their own right too!

Write Haskell

The most important thing for getting a job in Haskell is being fluent in Haskell, which can only be achieved by writing real Haskell programs/libraries. Solving small exercises or challenges will help a bit, but there are some limitations/solutions/approaches that you only learn when trying to do something for real. For beginners at Haskell, I recommend taking whatever you are interested in outside of Haskell, and writing a library about that. It can be image codecs, statistics, lasers, poker - whatever. There's probably something you know a lot about, which most people don't, which gives you a good starting point. That library will convince future employers you know how to write good code, and in the best case, you'll find an employer ends up using your library. Hiring people whose work you already use is an easy decision. When I've hired programmers in the past, I treat their CV as a pointer to their GitHub account.

Meet Haskellers

In most cities there are a bunch of either Haskell or functional programming meetups. Usually Meetup will have them, but a Google search can find them too. If there's nothing near you, try a more global event like ZuriHac or the Haskell Implementors Workshop. These events give you an idea of how other Haskell programmers think, and there are always people who are currently employed to write Haskell, who might offer you a job. Some of these Haskellers will even become friends who you collaborate with over decades.

Write words

As you are learning, write down what you are learning, what you are thinking, the hurdles you overcome and the thoughts you have. In many cases, no one will listen, but the mere act of writing down the words serves as a record of what you are learning. In some cases, you'll find an audience, and that audience will give you credibility (which isn't real credibility, but the world is a funny place) and contacts which can be useful in getting you a job. When I started, I wrote on my blog, but now Twitter or Medium might be better. Maybe it should be Twitch streams or SnapChat messages - I've no idea. Do whatever works for you. When I got my first Haskell job, I had colleagues who didn't know who I was, but had already been reading my blog.

Read news

It's important to have a rough idea of what's happening in the Haskell world. For Haskell, you might read Planet Haskell, some mailing lists, follow some people on Twitter, or read academic papers like those at ICFP. Nowadays there is so much information it's impossible to keep on top of it all, and if you try, you'll end up neglecting the other pieces of advice. Reading news won't directly get you a job, but it will expose you to a variety of techniques that keep you learning more.

Monday, January 27, 2020

One Haskell IDE to rule them all

Summary: The Haskell IDE Engine and Ghcide teams are joining forces on a single IDE.

This weekend many of the Haskell IDE Engine (HIE) and Ghcide developers met up for the Bristol Hackathon. Writing an IDE is a lot of work, and the number of contributors is finite, so combining forces has always seemed like a good idea. We now have a plan to combine our efforts, taking the best of each, and putting them together. Taking a look at the best features from each:

  • HIE provides a lot of plugins which extend the IDE: a choice of three formatters, LiquidHaskell, HLint, Hoogle and more. There are lots, and they are quite mature.
  • HIE has great build scripts that build the IDE for lots of different compilers and configurations.
  • HIE has gained lots of arcane knowledge about LSP and various clients, with an understanding of how best to respond in terms of latency/liveness.
  • HIE has driven a lot of improvements in the GHC API.
  • HIE has pioneered a lot of code that Ghcide subsequently reused, e.g. completions and hover.
  • Ghcide uses a Shake graph-based approach to manage the IDE state, allowing a simpler programming model.
  • Ghcide breaks the GHC monad apart, making it easier to do tricks like reusing .hi files and multi-component builds with a single GHC session.
  • Both projects use the same set of underlying libraries - haskell-lsp, lsp-test and hie-bios.

Putting these together, we've decided that the best way forward is to create a new project at haskell/ide which combines the best of both. That project will be a source of plugins and a complete distribution of an IDE. Under the hood, it will use Ghcide as a library, making use of the core graph and logic. The new IDE project will take over plugins and build system from HIE. There are some things in Ghcide that will be separated into plugins, e.g. the code actions. There are some ideas in HIE that will be folded into Ghcide, e.g. liveness, plugin wrappers and LSP quirks. Together, we hope to create a world class IDE experience for Haskell.

In the short term, we don't recommend anyone switch to this new IDE, as we're still putting the pieces together - continue using whatever you were using before. If you're interested in helping out, we're tracking some of the major issues in this ticket and the IDE and Ghcide repos.

Thanks to everyone who has contributed their time to both projects! The current state is a consequence of everyone's time and open collaboration. The spirit of friendship and mutual assistance typifies what is best about the Haskell community.

By Alan Zimmerman, Neil Mitchell, Moritz Kiefer and everyone at the Bristol Hackathon

Friday, October 18, 2019

Improving Rebindable Syntax

Summary: Rebindable syntax is powerful, but sometimes too flexible. I had some ideas on how to improve it.

In Haskell, when you write 1, GHC turns that into GHC.Num.fromInteger 1, knowing that the binding is GHC.Num.fromInteger :: Num a => Integer -> a. If you want to use a different fromInteger you can turn on the RebindableSyntax extension, which uses whichever fromInteger is in scope. While I was working at Digital Asset on DAML, we built a GHC-based compiler with a different standard library. That standard library eliminates the Char type, has a packed Text instead of String = [Char], doesn't have overloaded numeric literals, renames Monad to Action, and makes other changes. To get that working, we leveraged RebindableSyntax along with a module DA.Internal.RebindableSyntax which is automatically imported into every module unqualified (via a GHC source plugin).
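
As a minimal sketch of the mechanism (the Age type is made up for illustration):

{-# LANGUAGE RebindableSyntax #-}
import Prelude hiding (fromInteger)

newtype Age = Age Int deriving Show

-- our own fromInteger, which RebindableSyntax picks up for literals
fromInteger :: Integer -> Age
fromInteger = Age . fromIntegral

someAge :: Age
someAge = 1  -- desugars to: fromInteger 1, i.e. Age 1

main :: IO ()
main = print someAge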

With RebindableSyntax you can get a long way building your own base library, but there were two unpleasant parts:

  • When using RebindableSyntax, if the user writes let fromInteger = undefined in 1 then they use the fromInteger they defined in the let, shadowing the global one. For users who turn on RebindableSyntax deliberately, that's what they want. However, if you want to replace the base library and make it feel "just as good", then you'd rather fromInteger always referred to one specific library you point at.
  • In the process of building fresh base libraries, we had to follow all the (pretty complex!) layering choices that GHC has made about how the modules and packages form a directed acyclic graph. There are some modules where using integer literals would cause a module cycle to appear. The fact that a number of fully qualified names are hardcoded in GHC makes for a fairly tight coupling with the base libraries, which would be better avoided.

I had an idea to solve that, but it's not fully fleshed out, and as of now, it's not a problem I still suffer from. However, I thought it worth dumping my (potentially unimplementable, certainly incomplete) thoughts out for the world, in case someone wants to pick them up.

My idea to solve the problem was to add a flag to GHC such as -fbuiltins=base.Builtins which would specify where to get all builtins. You could expect the base library to gain a module Builtins which reexported everything like fromInteger. With that extension, using RebindableSyntax is then saying "whatever is in scope", and using -fbuiltins is saying "qualify everything with this name" - they start to become fairly similar ideas (a sketch of such a Builtins module follows the list below). I see that as having a few benefits:

  1. It becomes easier for someone to write a principled standard library which doesn't have String = [Char], or whatever other choice is desired, in a way that provides a coherent experience. One example is DAML, but another is the foundation library, which uses RebindableSyntax in its example programs.
  2. The lowest level GHC base libraries can be restructured to use RebindableSyntax as a way to more easily manage the dependencies between them in the base libraries themselves, rather than a cross-cutting concern with the compiler and base libraries. (This benefit might be possible even today with what we already have. Some people might strongly disagree that it's a benefit.)
  3. Things like which integer library to use can become a library concern, rather than requiring compiler changes.
  4. Currently the code path for RebindableSyntax is always quite different from the normal syntax path, so sometimes it's not quite right and needs patching. Funnelling everything through one mechanism would remove that divergence.
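
To make the idea concrete, here's a hypothetical sketch of what a base.Builtins module might look like (no such module exists today; note that base has no ifThenElse, which is exactly the kind of gap this proposal would surface):

module Builtins
    ( fromInteger    -- integer literals
    , fromRational   -- fractional literals
    , fromString     -- string literals under OverloadedStrings
    , (>>=), (>>), return, fail  -- do notation
    , ifThenElse     -- if/then/else
    ) where

import Prelude (fromInteger, fromRational, (>>=), (>>), return)
import Data.String (fromString)
import Control.Monad.Fail (fail)
import Data.Bool (Bool)

-- base doesn't provide ifThenElse, so a builtins module defines one
ifThenElse :: Bool -> a -> a -> a
ifThenElse c t e = if c then t else e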

The main obvious disadvantage (beyond potentially the whole thing not being feasible) is that it would cause the compiler to slow down, as currently these types are hard-wired into the compiler.

Sunday, October 13, 2019

Monads as Graphs

Summary: You can describe type classes like monads by the graphs they allow.

In the Build Systems a la Carte paper we described build systems in terms of the type class their dependencies could take. This post takes the opposite viewpoint - trying to describe type classes (e.g. Functor, Applicative, Monad) by the graphs they permit.

Functor

The Functor class has one operation: given Functor m, we have fmap :: (a -> b) -> m a -> m b. Consequently, if we want to end up with an m b, we need to start with an m a and apply fmap to it, and we can apply fmap repeatedly. The kind of graph this produces looks like:

We've used circles for the values m a/m b etc and lines to represent the fmap that connects them. Functor supplies no operations to "merge" two circles, so our dependencies form a linear tree. Thinking as a build system, this represents Docker, where base images can be extended to form new images (ignoring the newer multi-stage builds).
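
A tiny example of such a linear chain (using Maybe as an arbitrary Functor):

step1, step2, step3 :: Maybe Int
step1 = pure 1
step2 = fmap (+1) step1  -- one parent: step1
step3 = fmap (*2) step2  -- one parent: step2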

Applicative

The Applicative class has two fundamental operations - pure :: a -> m a (which we ignore because it's pretty simple) and liftA2 :: (a -> b -> c) -> m a -> m b -> m c (most people think of <*> as the other fundamental operation, but liftA2 is equivalent in power). Thinking from a graph perspective, we now have the ability to create a graph node that points at two children, and uses the function argument to liftA2 to merge them. Since Applicative is a superset of Functor, we still have the ability to point at one child if we want. Children can also be pointed at by multiple parents, which just corresponds to reusing a value. We can visualise that with:

The structure of an Applicative graph can be calculated before any values on the graph have been calculated, which can be more efficient for tasks like parsing or build systems. When viewed as a build system, this represents build systems like Make (ignoring dependencies on generated Makefiles) or Buck, where all dependencies are given up front.
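
We can demonstrate that static structure with a small sketch (the Deps type is made up): an Applicative that merely records which "files" a computation would need, without computing any values.

import Control.Applicative (liftA2)

newtype Deps a = Deps [FilePath]  -- a is phantom: no values, just structure

instance Functor Deps where
    fmap _ (Deps ds) = Deps ds

instance Applicative Deps where
    pure _ = Deps []
    Deps fs <*> Deps xs = Deps (fs ++ xs)

need :: FilePath -> Deps String
need file = Deps [file]

build :: Deps String
build = liftA2 (++) (need "a.txt") (need "b.txt")

dependencies :: Deps a -> [FilePath]
dependencies (Deps ds) = ds
-- dependencies build == ["a.txt","b.txt"], known before running anything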

Selective

The next type class we look at is Selective, which can be characterised by the operation ifS :: m Bool -> m a -> m a -> m a. From a graph perspective, Selective interrogates the value of the first node, and then selects either the second or third node. We can visualise that as:

We use two arrows with arrow heads to indicate that we must point at one of the nodes, but don't know which. Unlike before, we don't know exactly what the final graph structure will be until we have computed the value on the first node of ifS. However, we can statically over-approximate the graph by assuming both branches will be taken. In build system terms, this graph corresponds to something like Dune.
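
Continuing the Deps sketch from the Applicative section, and assuming the selective package's Control.Selective is available, an instance that over-approximates by recording both branches is one line:

import Control.Selective (Selective (..), ifS)

instance Selective Deps where
    select (Deps xs) (Deps ys) = Deps (xs ++ ys)  -- keep both sides

choice :: Deps String
choice = ifS (Deps ["flag"]) (need "a.txt") (need "b.txt")
-- dependencies choice mentions "flag", "a.txt" and "b.txt":
-- both branches are conservatively recorded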

Monad

The final type class is Monad which can be characterised with the operation (>>=) :: m a -> (a -> m b) -> m b. From a graph perspective, Monad interrogates the value of the first node, and then does whatever it likes to produce a second node. It can point at some existing node, or create a brand new node using the information from the first. We can visualise that as:

The use of an arrow pointing nowhere seems a bit odd, but it represents the unlimited options that Monad provides. Before, we always knew all the possible structures of the graph in advance; now we can't know anything beyond a monad node at all. As a build system, this graph represents a system like Shake.
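
No such static Deps instance is possible for Monad, because the rest of the graph depends on actual values. A minimal sketch of that dynamism (the file names are made up):

dynamicBuild :: IO String
dynamicBuild = do
    config <- readFile "config.txt"
    -- the next dependency is computed from config's contents,
    -- so the full graph can't be known before running
    readFile (head (lines config))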

Monday, July 01, 2019

Thoughts for a Haskell IDE

Summary: We have been working on pieces for a Haskell IDE at Digital Asset.

At Digital Asset, we wrote the DAML programming language. The compiler builds on GHC, and one of the important tools for using DAML is an IDE. You can try the DAML IDE online or download it. Since we wrote the DAML IDE in Haskell, and DAML uses GHC under the hood, it's possible to take the work we did for the DAML IDE and turn it into pieces for a Haskell IDE. In the rest of this post I'll outline what we wrote, and how I think it can make a full Haskell IDE.

What has Digital Asset written?

We have written a Haskell library hie-core, which serves as the "core" of an IDE. It maintains state about which files are open. It generates diagnostics. It runs the parser and type checker. It doesn't figure out how to load your package, and it doesn't have integrations with things like HLint etc. In my view, it should never gain such features - it's deliberately a small core of an IDE, which can be extended with additional rules and handlers after-the-fact.

On the technical side, at the heart of the IDE is a key-value store, where keys are pairs of file names and stages (e.g. TypeCheck) and values are dependent on the stage. We use the Shake build system in memory-only mode to record dependencies between phases. As an example of a rule:

define $ \TypeCheck file -> do
    pm <- use_ GetParsedModule file       -- the parse tree
    deps <- use_ GetDependencies file     -- this file's direct dependencies
    tms <- uses_ TypeCheck (transitiveModuleDeps deps)  -- type check all transitive imports
    packageState <- use_ GhcSession ""    -- the shared GHC session
    opt <- getIdeOptions
    liftIO $ Compile.typecheckModule opt packageState tms pm

To type check a file, we get the parse tree, the transitive dependencies, a GHC session, and then call a typecheckModule helper function. If any of these dependencies change (e.g. the source file changes) the relevant pieces will be rerun.

Building on top of Shake wasn't our first choice - we initially explored two painful dead ends. While Shake isn't perfect for what we want, it's about 90% of the way there, and having robust parallelism and many years of solid engineering is worth some minor compromises in a few places. Having all the features of Shake available has also been exceptionally helpful, allowing us to try out new things quickly.

What else is required for an IDE?

My hope is that hie-core can become the core of a future IDE - but what else is required?

  • Something to load up a GHC session with the right packages and dependencies in scope. For DAML, we have a custom controlled environment so it's very easy, but real Haskell needs a better solution. My hope is that hie-bios becomes the solution, since I think it has a great underlying design.
  • Some plugins to add features, such as the as-yet-unwritten hie-hlint and hie-ormolu. Since we add lots of features on top of hie-core to make the DAML IDE, we have a good story for extensions in hie-core. Importantly, because Shake is designed to be extensible, these extensions can integrate with the full dependency graph.
  • Something to talk Language Server Protocol (LSP) to communicate with editors, for which we use the existing haskell-lsp.
  • An extension for your editor. We provide a VS Code extension in hie-core, but it's a fairly boilerplate LSP implementation, and people have got it working for Emacs already.
  • Something to put it all together into a coherent project, generate it, distribute it etc. A project such as haskell-ide-engine might be the perfect place to bring everything together.

Can I try it now?

Yes - instructions here. I've been using hie-core as my primary Haskell development environment since ZuriHac two weeks ago, and I like it a lot. However, beware:

  • The IDE doesn't load all the relevant files, only the ones you have open.
  • Integration with things like stack doesn't work very well - I've been using hie-bios in "Direct" mode - giving it the flags to start ghci myself. See my integrations for shake and hlint.
  • Features like hs-boot files and Template Haskell need more work to be fully supported, although a bit of Template Haskell has been observed to work.

These issues are being discussed on the hie-bios issue tracker.

Hypothetical FAQ

Q: Is something like FRP better than Shake for describing dependencies? A: I think it's clear that an IDE should use some dependency/incremental computation/parallel rebuilding approach. Shake offers one of those, and is well tested, exception safe, performant etc. The mapping from Shake to what we really want is confined to a single module, so feel free to experiment with alternatives.

Q: Who has contributed? A: Many, many people have contributed pieces, including the whole team at Digital Asset, in particular Tim Williams, David Millar-Durant, Neil Mitchell and Moritz Kiefer.

Q: What is the relationship to haskell-ide-engine? A: My hope is that this piece can slot into the other great things that have been done to make IDE tooling better, specifically haskell-ide-engine. This post is intended to start that discussion.

Tuesday, June 18, 2019

The One PR Per Day Rule

Summary: The rough rule I use for teams I'm on is make at least one PR per day.

One of the principles I've used quite successfully in a number of teams I've been involved with is:

Make at least one Pull Request per day

This principle nicely captures a number of development practices I consider important.

  • Most things should be reflected in code. If you spend a day coding, improving documentation, writing tests etc. there is a natural reflection in the code. If you spend a day helping someone through some problems, that probably indicates there is better documentation to be written. If you spend a day doing dev-ops, that should probably be reflected with Terraform files or similar. Not everything that matters produces code (e.g. organising an office party, immigration paperwork, attending a conference), but most things do.

  • Work incrementally. If a piece of code takes more than one day, it's a good idea to split it into smaller pieces that can land incrementally. It's always possible that after a few days' work you'll realise your overarching idea wasn't great, but if you've polished up some libraries and added tests along the way, that still produced value.

  • Work with autonomy. I'm a big fan of giving developers as much autonomy as possible - discuss the broad goals and then let them figure out the details. However, with such freedom, it's much easier for things to go off in the wrong direction. Seeing incremental pieces of code every day gives a fairly good direction indicator, and allows problems to surface before a massive time investment.

  • Write reviewable code. If you have 20K lines in one big blob, there's no realistic way to review it. By splitting code into smaller, manageable, independent units it's much easier to review. More importantly, the reviewer should be able to say "No, that's not a good idea" - doing that to a day's work is sad, doing it to a whole month's work is brutal.

  • Foster collaboration. In a rapidly moving project, it's important that everyone benefits from other people's incremental improvements, as otherwise everyone solves the same problems. By getting the code merged every day it's much easier for different people to contribute to an area of the code base, avoiding the problem of others staying away from a piece of code that someone else is working on.

  • Get feedback. If the end user is able to test the results every day that's even better, as it means they can be involved in the feedback loop - potentially refining what they actually want.

The "rule" isn't really a rule, it's more a statement of culture and principles, but one I have found concise and simple to explain. While I like this as a statement of culture, I do not measure it, as that would create all the wrong incentives.