Monday, December 11, 2006

bhc: Basic Haskell Compiler

I have been thinking Compilery thoughts for a while now, so I thought I'd jot them down. I am starting the "bhc" project here and now! The intention is that bhc will be a concept, and some thoughts, for at least another year - definately not anything as concrete as code!

The current Haskell compilery things are:

  • Hugs - fast to compile, slow to run
  • GHC - slow to compile, fast to run
  • Yhc/nhc - medium at both
  • Jhc - aiming for super slow to compile, super fast to run
  • ...


What I would like to do is create a new front end for Yhc, taking the Yhc.Core and Yhc.ByteCode bits as they stand, and replacing the front end. I'd also like to base the entire code on libraries, rather than compiler passes, and make everything entirely reusable. I also want to aim for simplicity, elegance and readability.

So, first things first. Haskell is really the only language to implement this in. After that you have a few stages until you get to the Yhc.Core language:

  • Make (Cabal should do this)
  • Parsing
  • Name resolution
  • Type checking
  • Dictionary desugaring
  • Desugaring


There is only one way to write a parser, using my parsing system (not finished, barely even started). There is only one way to write a type checker, using my type checker generator (not started at all, barely even specified, not even a link, definately not proven!). Name resolution isn't that hard. Dictionary desugaring should use the special method of Catch (same as Jhc, I believe). Desugaring is trivial with Play classes (warning, if you follow that link, not finished!), I also want to have invertable desugaring, for analysis purposes. The parsing and typechecking would be standalone libaries.

Two of these things need to be written first, but thats part of the fun :)

Note that type checking does all the typey stuff, dictionary desugaring uses the types. Nothing else uses types, and in fact, I think this compiler should be untyped. (I know no one will agree with me, all I can say is that I think I'm right, but realise everyone thinks I'm wrong)

The next big decision is file formats: I would like to have a .bho (basic haskell object) file which corresponds to a single module, and a .bhl (basic haskell library) which is a whole library, and a .bhx (basic haskell executable) which is a whole program. Inside a .bho you would have (all items are optional):

  • Full original source code to the module
  • Desugarred source code, in Yhc.Core format
  • Bytecode, in Yhc.ByteCode format


A .bhl would have those three things, but linked together within a library. A .bhx would have all of that, including all the libaries, linked in as one.

I would also write an optimser, for whole program analysis, which took a .bhx, and produced an equivalent one. Probably also a compiler to C code, for super-fast execution.

So what in this compiler would be new?

  • Focus on libraries, Yhc.Core, Yhc.Parse, Yhc.ByteCode, Yhc.TypeCheck
  • Invertable desugaring
  • Extensive use of the Play class
  • Better library story (one library, one file)
  • Standalone crossplatform executables
  • Fast whole program analysis
  • Brand new parsing system
  • Untyped Core language (compared to other optimising compilers)
  • Simple


So, who wants to have some fun?

4 comments:

Robin said...

Hi Neil,

In terms of related work, make sure you read up on EHC and especially Ruler first. See:

http://www.cs.uu.nl/wiki/Atze/WebHome

I haven't looked at EHC's codebase at all really yet (but I'm planning to dive in and fix my first EHC bug this week), so I don't know whether EHC has any resemblance to what you're after. Ruler might, though - it generates code from type rules.

Robin said...

Hi Neil,

In terms of related work, make sure you read up on EHC and especially Ruler first. See:

http://www.cs.uu.nl/wiki/Atze/WebHome

I haven't looked at EHC's codebase at all really yet (but I'm planning to dive in and fix my first EHC bug this week), so I don't know whether EHC has any resemblance to what you're after. Ruler might, though - it generates code from type rules.

Anonymous said...

hi Neil,
I see the rationale! (is that weird?)
we want .NET and Java bytecode!
(now that Java is GPL, there will hardly be any Linux distro without it from now on)
regards,
Imam

Anonymous said...

Hi Robin, thanks for the tip. I've seen Ruler before, it might be what I was aiming for.

Hi Imam, yes, definately JVM and .NET bytecode back ends. I think I can propably keep the .NET and JVM stuff from Yhc already in this new compiler.