Friday, October 18, 2019

Improving Rebindable Syntax

Summary: Rebindable syntax is powerful, but sometimes too flexible. I had some ideas on how to improve it.

In Haskell, when you write 1, GHC turns that into GHC.Num.fromInteger 1, knowing that the binding is GHC.Num.fromInteger :: Num a => Integer -> a. If you want to use a different fromInteger, you can turn on the RebindableSyntax extension, which uses whichever fromInteger is in scope. While I was working at Digital Asset on DAML, we built a GHC-based compiler with a different standard library. That standard library eliminates the Char type, has a packed Text instead of String = [Char], doesn't have overloaded numeric literals, renames Monad to Action, and makes other changes. To get that working, we leveraged RebindableSyntax along with a module DA.Internal.RebindableSyntax, which is automatically imported unqualified into every module (via a GHC source plugin).
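As a minimal sketch of that mechanism (the Tagged type is a hypothetical wrapper invented for illustration, not something from DAML), the module below defines its own fromInteger and lets RebindableSyntax pick it up for every integer literal:

```haskell
{-# LANGUAGE RebindableSyntax #-}

-- RebindableSyntax implies NoImplicitPrelude, so Prelude is imported
-- explicitly, hiding the fromInteger we want to replace.
import Prelude hiding (fromInteger)

-- Hypothetical wrapper type, used only to make the rebinding visible.
newtype Tagged = Tagged Integer deriving (Show)

-- Integer literals in this module desugar to calls to this function,
-- because it is the fromInteger that is in scope.
fromInteger :: Integer -> Tagged
fromInteger = Tagged

-- The literal 42 becomes `fromInteger 42`, so it has type Tagged.
example :: Tagged
example = 42

main :: IO ()
main = print example
```

Note that the literal no longer needs a Num instance at all: its type is whatever the in-scope fromInteger returns.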

With RebindableSyntax you can get a long way building your own base library, but there were two unpleasant parts:

  • When using RebindableSyntax, if the user writes let fromInteger = undefined in 1 then they use the fromInteger they defined in the let, shadowing the global one. For users who turn on RebindableSyntax deliberately, that's what they want. However, if you want to replace the base library and make it feel "just as good", then you'd rather that fromInteger always came from some specific library you point at.
  • In the process of building fresh base libraries, we had to follow all the (pretty complex!) layering choices that GHC has made about how the modules and packages form a directed acyclic graph. There are some modules where using integer literals would cause a module cycle to appear. The fact that a number of fully qualified names are hardcoded in GHC makes for a fairly tight coupling with the base libraries, which would be better avoided.
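The shadowing problem in the first point can be seen in a small sketch. The local definition below is a made-up example (it returns a String so that it needs no numeric literal of its own), but it shows a let-bound fromInteger capturing the literal 1:

```haskell
{-# LANGUAGE RebindableSyntax #-}

-- RebindableSyntax implies NoImplicitPrelude, so import Prelude explicitly.
import Prelude

-- The literal 1 desugars to `fromInteger 1`, and the nearest fromInteger
-- in scope is the let-bound one, not the one imported from Prelude.
surprise :: String
surprise = let fromInteger _ = "not a number" in 1

main :: IO ()
main = putStrLn surprise
```

A user of a replacement base library could trigger this by accident, simply by naming a local variable fromInteger.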

I had an idea to solve that, but it's not fully fleshed out, and as of now, it's not a problem I still suffer from. However, I thought it worth dumping my (potentially unimplementable, certainly incomplete) thoughts out for the world, in case someone wants to pick them up.

My idea to solve the problem was to add a flag to GHC such as -fbuiltins=base.Builtins which would specify where to get all builtins. You could expect the base library to gain a module Builtins which reexported everything like fromInteger. With that flag, using RebindableSyntax is saying "whatever is in scope", and using -fbuiltins is saying "qualify everything with this name" - they start to become fairly similar ideas. I see that as having a few benefits:

  1. It becomes easier for someone to write a principled standard library which doesn't have String = [Char], or whatever choice wants making, in a way that provides a coherent experience. One example is DAML, but another is the foundation library, which uses RebindableSyntax in its example programs.
  2. The lowest level GHC base libraries can be restructured to use RebindableSyntax as a way to more easily manage the dependencies between them in the base libraries themselves, rather than a cross-cutting concern with the compiler and base libraries. (This benefit might be possible even today with what we already have. Some people might strongly disagree that it's a benefit.)
  3. Things like which integer library to use can become a library concern, rather than requiring compiler changes.
  4. Currently the code path for RebindableSyntax is always quite different from the normal syntax path. As a result, sometimes it's not quite right and needs patching. If builtins were always looked up by name, the two paths could become more similar.
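To make the proposal a little more concrete, here is a sketch of what a Builtins module might look like. It is entirely hypothetical: GHC has no -fbuiltins flag and base has no Builtins module, and the export list is only my guess at a starting point, based on names that RebindableSyntax already looks up.

```haskell
-- Hypothetical sketch: neither this module nor the -fbuiltins flag
-- exists in GHC or base today.
module Builtins
  ( fromInteger   -- integer literals
  , fromRational  -- fractional literals
  , negate        -- prefix minus
  , (==)          -- literal patterns
  , (>>=)         -- do notation
  , (>>)          -- do notation
  ) where

-- Re-export the existing Prelude definitions. An alternative standard
-- library would export its own replacements under the same names.
import Prelude (fromInteger, fromRational, negate, (==), (>>=), (>>))
```

Under the proposal, a literal such as 1 would always mean Builtins.fromInteger 1, regardless of what the user has in scope.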

The main obvious disadvantage (beyond potentially the whole thing not being feasible) is that it would cause the compiler to slow down, as currently these types are hard-wired into the compiler.

Sunday, October 13, 2019

Monads as Graphs

Summary: You can describe type classes like monads by the graphs they allow.

In the Build Systems a la Carte paper we described build systems in terms of the type class their dependencies could take. This post takes the other viewpoint - trying to describe type classes (e.g. Functor, Applicative, Monad) by the graphs they permit.

Functor

The Functor class has one operation: given Functor m, we have fmap :: (a -> b) -> m a -> m b. Consequently, if we want to end up with an m b, we need to start with an m a and apply fmap to it, and can repeatedly apply multiple fmap calls. The kind of graph that produces looks like:

We've used circles for the values m a, m b, etc. and lines to represent the fmap that connects them. Functor supplies no operations to "merge" two circles, so our dependencies form a linear tree. Thinking as a build system, this represents Docker, where base images can be extended to form new images (ignoring the newer multi-stage builds).
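A small sketch of such a chain, using a hypothetical Image type as a stand-in for a container image (the type and the layer names are invented for illustration):

```haskell
-- Hypothetical stand-in for a container image: a wrapper around its contents.
newtype Image a = Image a deriving (Show)

instance Functor Image where
  fmap f (Image x) = Image (f x)

-- The root of the chain.
base :: Image [String]
base = Image ["ubuntu"]

-- Each fmap extends exactly one existing image; there is no way to
-- combine two images into one using fmap alone.
withCompiler :: Image [String]
withCompiler = fmap (++ ["ghc"]) base

withApp :: Image [String]
withApp = fmap (++ ["my-app"]) withCompiler

main :: IO ()
main = print withApp
```

Running it prints Image ["ubuntu","ghc","my-app"], the result of following the single path from base to withApp.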

Applicative

The Applicative class has two fundamental operations - pure :: a -> m a (which we ignore because it's pretty simple) and liftA2 :: (a -> b -> c) -> m a -> m b -> m c (most people think of <*> as the other fundamental operation, but liftA2 is equivalent in power). Thinking from a graph perspective, we now have the ability to create a graph node that points at two children, and uses the function argument to liftA2 to merge them. Since every Applicative is also a Functor, we still have the ability to point at one child if we want. Children can also be pointed at by multiple parents, which just corresponds to reusing a value. We can visualise that with:

The structure of an Applicative graph can be calculated before any values on the graph have been calculated, which can be more efficient for tasks like parsing or build systems. When viewed as a build system, this represents build systems like Make (ignoring dependencies on generated Makefiles) or Buck, where all dependencies are given up front.
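To illustrate computing the structure before any values, here is a sketch with a hypothetical Task type (my own simplification, not the definition from the paper) that records the names of its inputs alongside the function that computes its result:

```haskell
import Control.Applicative (liftA2)

-- Hypothetical task type: the inputs it reads, plus how to compute a
-- result once those inputs can be looked up.
data Task a = Task [String] ((String -> Int) -> a)

instance Functor Task where
  fmap f (Task deps run) = Task deps (f . run)

instance Applicative Task where
  pure x = Task [] (const x)
  Task d1 runF <*> Task d2 runX =
    Task (d1 ++ d2) (\env -> runF env (runX env))

-- A leaf node that reads one named input.
input :: String -> Task Int
input name = Task [name] (\env -> env name)

-- A node with two children, merged by the function given to liftA2.
total :: Task Int
total = liftA2 (+) (input "a") (input "b")

-- The dependencies are available without running anything.
dependencies :: Task a -> [String]
dependencies (Task deps _) = deps

run :: (String -> Int) -> Task a -> a
run env (Task _ f) = f env

main :: IO ()
main = do
  print (dependencies total)
  print (run (\name -> if name == "a" then 1 else 2) total)
```

The call to dependencies never touches the environment, which is the property that lets a tool like Make plan the whole build up front.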

Selective

The next type class we look at is Selective, which can be characterised by the operation ifS :: m Bool -> m a -> m a -> m a. From a graph perspective, Selective interrogates the value of the first node, and then selects either the second or third node. We can visualise that as:

We use two arrows with arrow heads to indicate that we must point at one of the nodes, but don't know which. Unlike before, we don't know exactly what the final graph structure will be until we have computed the value on the first node of ifS. However, we can statically over-approximate the graph by assuming both branches will be taken. In build system terms, this graph corresponds to something like Dune.
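The over-approximation can be sketched with the same kind of hypothetical Task type. The Selective class below is a simplified local stand-in with ifS as its only method; the class in the selective package is built on a different primitive, select.

```haskell
-- Simplified local stand-in for the Selective class (an assumption made
-- to keep this example self-contained).
class Applicative f => Selective f where
  ifS :: f Bool -> f a -> f a -> f a

-- Hypothetical task type: the inputs it might read, plus how to compute
-- a result once inputs can be looked up.
data Task a = Task [String] ((String -> Int) -> a)

instance Functor Task where
  fmap f (Task deps run) = Task deps (f . run)

instance Applicative Task where
  pure x = Task [] (const x)
  Task d1 runF <*> Task d2 runX =
    Task (d1 ++ d2) (\env -> runF env (runX env))

instance Selective Task where
  -- Statically we record all three sets of dependencies, but when run
  -- only the chosen branch is evaluated.
  ifS (Task dc cond) (Task dt onTrue) (Task df onFalse) =
    Task (dc ++ dt ++ df)
         (\env -> if cond env then onTrue env else onFalse env)

input :: String -> Task Int
input name = Task [name] (\env -> env name)

choice :: Task Int
choice = ifS (fmap (> 0) (input "flag")) (input "fast") (input "slow")

dependencies :: Task a -> [String]
dependencies (Task deps _) = deps

run :: (String -> Int) -> Task a -> a
run env (Task _ f) = f env

-- An environment in which "slow" is not available at all.
env :: String -> Int
env "flag" = 1
env "fast" = 10
env name = error ("input not available: " ++ name)

main :: IO ()
main = do
  print (dependencies choice)
  print (run env choice)
```

The static dependency list mentions "slow", yet running the task succeeds even though "slow" is missing, because only the selected branch is ever looked up.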

Monad

The final type class is Monad which can be characterised with the operation (>>=) :: m a -> (a -> m b) -> m b. From a graph perspective, Monad interrogates the value of the first node, and then does whatever it likes to produce a second node. It can point at some existing node, or create a brand new node using the information from the first. We can visualise that as:

The use of an arrow pointing nowhere seems a bit odd, but it represents the unlimited options that the Monad provides. Previously, we always knew all the possible structures of the graph in advance. Now we can't know anything about the graph beyond a Monad node. As a build system, this graph represents a system like Shake.
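As a sketch of why nothing can be known in advance, the hypothetical Task type below (again my own simplification, with invented file names) can only report its dependencies as a by-product of running, because the second dependency is named by the contents of the first:

```haskell
-- Hypothetical task type: running it against a file lookup yields the
-- files it actually read, together with its result.
newtype Task a = Task { runTask :: (String -> String) -> ([String], a) }

instance Functor Task where
  fmap f (Task g) = Task (\env -> let (deps, a) = g env in (deps, f a))

instance Applicative Task where
  pure x = Task (\_ -> ([], x))
  Task f <*> Task x = Task $ \env ->
    let (d1, h) = f env
        (d2, a) = x env
    in (d1 ++ d2, h a)

instance Monad Task where
  Task g >>= k = Task $ \env ->
    let (d1, a) = g env
        -- The next task is only known once the value `a` is available.
        (d2, b) = runTask (k a) env
    in (d1 ++ d2, b)

-- Read one file, recording it as a dependency.
need :: String -> Task String
need name = Task (\env -> ([name], env name))

-- Which file is read second depends on the contents of "config".
build :: Task String
build = need "config" >>= \target -> need target

files :: String -> String
files "config" = "main.c"
files "main.c" = "int main(void) { return 0; }"
files name = error ("no such file: " ++ name)

main :: IO ()
main = do
  let (deps, contents) = runTask build files
  print deps
  putStrLn contents
```

Unlike the Applicative and Selective cases, there is no dependencies function that works without an environment: the function on the right of >>= has to be applied to a real value before its dependencies exist.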