Yesterday I was asked a question about functional programming from someone in my department. This person is in the real time systems group, and their programming language is Ada (although they are not really a programmer).
The question I was asked was about cache locality, but that question was just a red herring inspired by a complete misunderstanding about what functional programming was. The impression they had was that it was just Ada without procedures. So I tried to explain "in a nutshell" what functional programming was. It's not an easy task!
I didn't go on about things like purity and referential transparency - these concepts might mean something to an experienced software engineer (they can see how it would make nicer interfaces) or a mathematician (more like maths), but to someone who does timing analysis, they don't.
So I wanted to try and get across a few of the core concepts:
1) No assignment (technically no destructive assignment, but that is the only type they know). The example they wanted me to show them "in a functional style" was:
x := 1;
x := x + 1;
In a functional style there are a couple of ways to do this:
x = 1
y = x + 1
Of course, order is irrelevant
y = x + 1
x = 1
Neither really captures the idea they were looking for. However, in a purely functional style we don't write silly stuff, so thanks to purity we can just write:
y = 2
The great thing is that we can know that the answer is 2, because beta reduction is valid.
So after (probably unsuccessfully) dismissing destructive assignment, I moved on:
2) No sequencing, no "one thing then the other". This was again quite hard to get across. I gave the example of quicksort, showing that sequence was not necessary.
I then challenged them to write quicksort in Ada, which is really quite hard to do. Obviously to do a quicksort only on Integer takes loads of lines before you even get to the sorting bit. Since I had been teaching generics earlier, this showed how superior Haskell is for generics.
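For contrast, the textbook Haskell version fits in three lines (a sketch - the famous demo quicksort, not an in-place sort; I'll call it sort since sorting comes up again below):

sort :: Ord a => [a] -> [a]
sort [] = []
sort (x:xs) = sort [y | y <- xs, y < x] ++ [x] ++ sort [y | y <- xs, y >= x]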
I think I was quite a bit less successful at explaining this one. The only way you can show sequence is not necessary is by writing an OS, a compiler, a web server etc in Haskell - all of which have been done. Unfortunately I only had a small piece of paper so couldn't write any of those.
3) Then we got back to the question of timing analysis - if you spend all day doing timing analysis you start to believe that everything else is less important. I got asked if my PhD in "software engineering" was just writing programs. I took great pains to explain that programming language research is not writing an interface to a database in Java.
Anyway, if you want to spend all day doing timing analysis, Haskell is probably not the right language to pick :) So at this point I decided to screw with their mind:
I asked them to predict how long this would take:
sort "neil"
They guessed O(n log n), I said O(1). If you never need the result of sort, it never gets executed - take the example:
if null (sort "neil") then 1 else 0
The idea of lazy evaluation was starting to mess with their head, so they decided to try and rephrase the question. They are interested in the worst case time, so: can you treat it as the time to fully evaluate all the components?
The answer is a definite no!
[1..]
The above expression takes an infinite amount of time to fully evaluate. But now consider:
[1..] !! 3
Therefore under their interpretation, this takes an infinite amount of time, but it doesn't. It's much quicker than that.
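For example, the following terminates immediately, despite the infinite list:

main :: IO ()
main = print ([1..] !! 3)  -- prints 4, after evaluating only four cells of the list

Only as much of the list as the result demands is ever built.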
I don't think they understand many of the concepts of functional programming, or how to use it in practice, but I think they do now realise it is not just "another programming language" - it's an entirely separate entity.
Tuesday, February 13, 2007
A missing Haskell feature?
Haskell 98 + Hierarchical modules is pretty much feature complete as far as I'm concerned. I realise that most people would demand to add higher ranked types and multi-parameter type classes, but the only extension I'm crying out for is pattern guards.
However, there is one operation that I think would be useful - pattern equality.
Consider the data type:
> type RegExp = [Atom]
> data Atom = Atom String | Star String
Now imagine you want to check that all atoms in a regular expression are Star's. Easy enough in Haskell:
> all isStar r
Unfortunately the definition of isStar - something that is conceptually very simple - is rather verbose:
> isStar (Star _) = True
> isStar _ = False
The reason is that in Haskell pattern matching is used to control program flow, not as an expression. One extension I often find myself wanting is some kind of "pattern match equality". Stealing the ~= operator for the moment, imagine:
> all (~= Star _) r
The ~= operator can be seen as pattern equality. It can be given a very simple semantics:
> x ~= y === case x of {y -> True; _ -> False}
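Concretely, applying that desugaring to the earlier example gives the expansion below - legal Haskell today, just noisier than it deserves to be:

> all (\v -> case v of {Star _ -> True; _ -> False}) r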
Of course, now you've got pattern matching in there, it's natural to allow fuller pattern matching:
> all (~= Star x | null x) r
Note that the original pattern match desugaring still allows this to be valid.
What do people think of this suggestion? Do people think it would be generally useful?
Disclaimer: This is not a carefully thought out feature proposal. Stealing ~= for syntax is completely unacceptable, something like <- would have to be chosen, if the grammar rules would permit it.
Wednesday, February 07, 2007
Logical implication in Haskell
Logical implication is a common enough operator, usually written "a => b" for a implies b. Haskell doesn't feature a => operator; it would be a syntax error, because that symbol is reserved for class constraints. Both QuickCheck and SmallCheck feature incompatible ==> operators for implication.
But Haskell does provide a boolean implication operator, namely (<=)! Yes, by writing the implication symbol backwards, you get standard forward implication!
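A quick truth-table check, for the doubtful:

implies :: Bool -> Bool -> Bool
implies = (<=)

main :: IO ()
main = mapM_ print [(a, b, a `implies` b) | a <- [False, True], b <- [False, True]]
-- (False,False,True)
-- (False,True,True)
-- (True,False,False)
-- (True,True,True)

This works because Bool's Ord instance puts False below True, which is exactly the implication ordering.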
Sunday, February 04, 2007
Haskell and Set Theory
As part of my PhD, one thing I've been doing recently is specifying and working with set theory operations. Haskell is the perfect language for this, because in most cases the set theory can be implemented almost directly. Let's take a few examples to show how neat Haskell is at this.
I've done all the set theory in LaTeX - I guess most people who know set theory probably know LaTeX as well! I've also used xs/ys for sets, and x/y for elements from the sets. Although this isn't the way most mathematics would have been written, it does actually make it a bit more readable with some explicit typing :)
Subset
Subset is defined as:
xs \subset ys = \forall x \in xs, x \in ys
subset xs ys = all (\x -> x `elem` ys) xs
subset xs ys = all (`elem` ys) xs
The first version is the direct translation. We can of course make Haskell a bit closer to set theory with:
forAll = flip all
Now we have:
subset xs ys = forAll xs (`elem` ys)
That is a pretty direct translation of the mathematics.
Cross-product
One operation I wanted to specify recently was:
merge xs ys = { c | \exists x \in xs, \exists y \in ys, c = x <> y}
(where <> is an operation I had defined myself)
This time you can use Haskell's list comprehensions to obtain a translation:
merge xs ys = [x <> y | x <- xs, y <- ys]
In this case the Haskell is actually shorter than the corresponding mathematics.
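Both definitions drop into a self-contained module (a sketch; since <> was an operation of my own, here I pass the combining function in as a parameter):

subset :: Eq a => [a] -> [a] -> Bool
subset xs ys = all (`elem` ys) xs

merge :: (a -> b -> c) -> [a] -> [b] -> [c]
merge f xs ys = [f x y | x <- xs, y <- ys]

main :: IO ()
main = do
    print (subset [1,2] [3,2,1])  -- True
    print (merge (,) "ab" [1,2])  -- [('a',1),('a',2),('b',1),('b',2)]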
Conclusion
I could give more examples of where the mathematics and the Haskell overlap closely, but they are almost too easy.
Almost every piece of set theory that I've tried to define in Haskell comes out looking very much like the original. This closeness allows me to work in mathematics, and then translate things to Haskell easily without worrying about having made mistakes in the translation.
The facilities of Haskell that make the set theory translation so direct include:
- Higher order functions (i.e. all, forAll)
- Sections (`elem` xs) being equal to \x -> elem x xs
- Infix operators such as + and `elem`, along with priorities
- Beautiful syntax
- Lazy evaluation can make some operators more efficient (but is not incredibly important)
Wednesday, January 31, 2007
Writing code by types
In the past few days I've found that while I'm writing the code, the type checker is the one deciding what the code should be. To take an example, I was recently writing some SYB code using gfoldl:
> gfoldl :: (forall d b . Data d => c (d -> b) -> d -> c b) -> (forall g . g -> c g) -> a -> c a
I decided what c had to be:
> newtype C x a = C ([x], [x] -> (a, [x]))
(where "C x" was the instantiated type of c)
From there I didn't have much of a clue what I was doing, but by the time you've got that far, there is virtually only one type-correct program that can possibly work! Finding what that program was took a while, but once you have got this far you can start getting the type checker to write the code for you.
The first step is to write the basic definition, in such a way that Haskell will give you error messages if you don't get it right:
> myexpr :: Data a => a -> C x a
> myexpr = gfoldl cont base
>
> cont = undefined
> base = undefined
At this point the Haskell type checker can already check that you haven't made a mistake anywhere else. Once all is good you can start trying to "fill in the blanks". The first step is to tackle base:
> base :: a -> C x a
> base = undefined
Note that no code has been written, but you've checked that the type is valid. A :type on base would have given you the same information, assuming base was at the top level of the program, but this step is useful to check.
Next you can start filling in the undefined bits:
> base :: a -> C x a
> base a = C ([], \y -> (a,y))
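The same game continues with cont. Before writing any body, I pin down what I believe its instantiated type must be (my reading of gfoldl's first argument; still just a stub):

> cont :: Data d => C x (d -> b) -> d -> C x b
> cont = undefined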
By working this way it is relatively easy to build up a correct program, at each stage using small steps and regular type feedback to check you are going in the right direction.
The second time I was using this technique was when I was replacing one data structure with another, with some similar operations and some completely different ones. It wasn't as simple as changing the imports, but thanks to regular feedback from the type checker this went from being an exercise in thought to just going around mopping up the errors.
Programming like this is very powerful, giving lots of hints about what the user needs to do next, but would really benefit from some intelligent development tools. Epigram has some similar ideas built into the development environment, but Haskell doesn't.
Sunday, January 14, 2007
Debugging and crying
I'm now back from Hac 07, and I was sharing a room with Pepe, the GHCi Debugger Summer Of Code guy. I got to hear his presentation on the debugger he has completed, and got to see it "in action". He made some interesting comments - that Hat provides a much richer framework, and that when he originally tried to get Hat working he even went as far as replacing his OS (!) - but with no luck. He demonstrated his debugger, and the next day, in front of about 10 people I demonstrated Hat.
GHCi Debugger Demonstration
The GHCi debugger is much newer, and not officially released yet. Pepe started off with the demonstration, giving a few examples, but at various points I asked "what if?" questions, all of which he went and did. The debugger worked flawlessly, and produced useful information. It didn't produce as much information as Hat (particularly stack traces were missing), but it was solid and reliable. I only had one major gripe with it - that :quit was used for continuing from a breakpoint - but the next day a patch went in fixing that.
Hat Debugger Demonstration
In return, I tried to go through a pre-scripted demonstration of Hat. However, since first scripting it I had installed GHC 6.6 on the machine, and the compilation took about 100 times longer than I remembered - no idea why. When I finally had a Hat file I was able to view it in Visual Hat. I then tweaked the file as people asked various questions - unfortunately that didn't work so well. Various things went wrong. I added a head call, to invoke a pattern match failure. Indeed, it did invoke a pattern match failure - in Hat itself! I tried some other stuff; files kept appearing in the wrong places. Stack traces were coming out not quite as you'd expect them. Everything went wrong.
Some of these issues may well be in the hat-make that I wrote recently and hackily, but others were more fundamental issues in Hat itself.
Conclusion
While Hat is without a doubt more powerful, it loses on reliability. Despite porting Hat to Windows, and using Hugs full time, GHCi will be my debugger of choice. I don't use Hat because when I've got a bug to fix, I know getting through Hat will take at least an hour, and won't always be possible (Data.Map for example). The GHCi debugger is maybe 10% of Hat in terms of power, but 100% reliability looks like it will make up for this.
Friday, January 12, 2007
Hac 07: Deferred binary
As part of Hac 07 I have written a Deferred binary library. It's not released yet, since I'll need to change some bits, but I have managed to write a pickle/unpickle combinator which is pretty minimal. It can save to a file and load from a file in binary.
The first step is to take your data structure:
> data Computer = Laptop {weight :: Int, wireless :: Bool} | Desktop {software :: [String]}
Then you need to write your combinators:
instance BinaryDefer Computer where
    bothDefer = serial
        [\ ~(Laptop a b) -> unit Laptop << a << b
        ,\ ~(Desktop a) -> unit Desktop << a]
The library then provides:
putDefer :: BinaryDefer a => Handle -> a -> IO ()
getDefer :: BinaryDefer a => Handle -> IO a
This gives you all the power to save and load your data structure. No need to manually assign tags, no need to write get/put versions, just one set of combinators. Needless to say, this was pretty tricky to set up!
But this is all stuff that belongs in a binary library, so what is a deferred binary library? Well perhaps there are a lot of software packages on a desktop, and usually the program doesn't care about them. Maybe 90% of the file will be the software packages, so 90% of the time will be wasted in reading the file. Of course, sometimes the software is queried, so it still has to be available. Using deferred binary, you can change the combinator to say which fields are loaded in a deferred manner:
\ ~(Desktop a) -> unit Desktop <<~ a
That extra ~ is the single change. Now the file format adapts to store the software list in a different place and it is read only when needed. Much more efficient, yet still the same simple underlying interface.
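The library isn't released, so this is only how I imagine a round trip would look, using the putDefer/getDefer signatures above (an untested sketch):

import System.IO

main :: IO ()
main = do
    h <- openBinaryFile "computer.bin" WriteMode
    putDefer h (Desktop {software = ["darcs", "ghc"]})
    hClose h
    h2 <- openBinaryFile "computer.bin" ReadMode
    Desktop xs <- getDefer h2
    hClose h2
    print xs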
This is the binary library that Hoogle 4 will be using!
Hac 07: Searchable haddock index files
To see what I've been doing at Hac 07, go to: http://www-users.cs.york.ac.uk/~ndm/temp/haddock/doc-index.html
I have written a searchable index file for haddock, which adds very little file size, and has search-as-you-type features. It doesn't store a separate table of index names. It searches for prefixes only, in O(log n).
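The idea behind the O(log n) prefix search, sketched in Haskell (the real search runs in JavaScript over the shipped index): binary search the sorted index for the first entry greater than or equal to the query, then test whether the query is its prefix.

import Data.Array
import Data.List (isPrefixOf)

firstWithPrefix :: Array Int String -> String -> Maybe String
firstWithPrefix arr q = check (go lo (hi + 1))
    where
        (lo, hi) = bounds arr
        -- leftmost index whose entry is >= q (or hi+1 if there is none)
        go l h | l >= h = l
               | arr ! m < q = go (m + 1) h
               | otherwise = go l m
            where m = (l + h) `div` 2
        check i | i <= hi && q `isPrefixOf` (arr ! i) = Just (arr ! i)
                | otherwise = Nothing

For example, firstWithPrefix (listArray (0,2) ["map","mapM","zip"]) "map" returns Just "map".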
There was a version that did full substring searching in O(n), but unfortunately IE is too slow to do this for the base library. If you want that you are still going to have to use Hoogle.
These patches have been bundled up and sent off to Simon Marlow, so hopefully they'll be appearing in a Haddock near you soon!
Thursday, January 04, 2007
Does Haskell need macros?
Haskell has higher-order functions, and lazy evaluation, but not macros. Does Haskell need macros? For the purpose of this, I'm talking about macro in the "a little piece of code that expands out" sense, rather than the conditional compilation that macros quite often get used for in C++.
If Haskell needs macros then there must be some benefit to adding them. The reason that a lot of languages need macros is because they have strict evaluation. Consider trying to write a function that selects one of two arguments, based on the first condition variable - some sort of if:
iff a b c = if a then b else c
You can write this in Haskell. You can also write this in VB (and it is pre-defined), but it doesn't do what you might expect in VB. Haskell's lazy evaluation is a big win here, opening up entirely new opportunities.
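To see the difference, this is perfectly safe in Haskell, because the untaken branch is never evaluated:

iff :: Bool -> a -> a -> a
iff a b c = if a then b else c

main :: IO ()
main = print (iff True 1 (error "never evaluated"))

In a strict language both arguments get evaluated before iff is even entered, so the error would fire - hence the need for a macro.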
So what are the reasons to add macros?
Speed
This is just a silly argument. First off, a pure function can always be inlined away, so performance is not an issue. A compiler might choose not to inline a function, but you can usually persuade it that it should with pragmas etc. And finally, there is the real possibility that you (the author of the code) don't have a clue what should be inlined and are actually throwing away speed for premature optimisation.
Generate code
One thing that macros can do is generate definitions; functions can only be used to generate expressions (essentially a function is an expression with some pre-defined structure). You can't define a Haskell function that takes some data, and generates classes/instances etc. For example, imagine that you want to define lots of constructors for binary operations. Instead of:
data Expr = Add Expr Expr
| Mul Expr Expr
...
Here a macro might let you write something like:
data Expr = binary(Add) | binary(Mul) ...
Of course, this is something you really can't do with Haskell as it stands. However most of the ways you might use a macro like this are covered by Haskell. There are type synonyms, which may actually let you do the above trick with GADT's (I haven't tried, but it's certainly feasible they could work like that, even if they currently don't). Classes have defaults, which is another situation where otherwise this functionality might be required.
For most purposes generating code with macros is not about doing clever stuff, but about cutting out repetitive stuff. Haskell has (by design) cut out most of the repetitive tasks, so no longer requires macros. For truly repetitive tasks, Template Haskell can be used, or things like DrIFT and Scrap Your Boilerplate. All of these techniques reduce the number of places where macros might otherwise be required.
LHS of a pattern match
One place where expressions are present, but functions are not allowed is on the left hand side of a pattern match. For example:
f 3.14159265358979 = 2.71828182845905
Of course, it's silly to write this using hardcoded values, what if either of these constants change?
f pi = exp 1
Unfortunately without macros, this is not allowed in Haskell since pi is a function (an arity-zero function, but still a function). You'd instead have to write:
f x | pi == x = exp 1
Not too painful, but what if you are trying to match a LHS with a free variable in it:
f (Neg (Neg (Neg a))) = Pos $ Pos $ Pos a
Here you are matching a deeply nested constructor, and on the left have to explicitly bracket the terms. However on the right you are able to use the function ($) which due to precedence removes the need for brackets. One thing you can't do however is:
f (Neg $ Neg $ Neg a) = Pos $ Pos $ Pos a
If ($) was a macro, this would be valid however.
Haskell has no feature here that can reduce the redundancy in the pattern match.
But while macros would be useful here, how far would standard C style macros take us? Let's consider defining a useful auxiliary:
apply 0 f x = x
apply n f x = f (apply (n-1) f x)
Now we can rewrite this as:
f (Neg (Neg (Neg a))) = apply 3 Pos a
But we really can't write:
f (apply 3 Neg a) = apply 3 Pos a
However, it's entirely possible to come up with a consistent semantics for this, so maybe one day this might be something people decide is useful.
For the moment the lack of macros on the LHS is occasionally annoying, but generally is not a feature that anyone is crying out for. The LHS is rarely sufficiently complex for macros to show great benefit.
Type checking with rank-2 types
While standard Haskell (mainly, ignoring MR and seq - for those who care) has the fold and unfold law, i.e. you can inline a function and then contract it back, once you introduce rank-2 types you lose this property. Unfold still holds, but fold doesn't. This leads to lots of typing problems for things like runST which present rank-2 types to the user.
While macros would paper over the cracks here, the real solution probably lies in the type system, which I believe GHC has in various flavours of impredicative polymorphism. The other solution is not to use rank-2 types, something I have been happily avoiding for years now.
Conclusion
Despite macros being useful in languages like Lisp, in Haskell their usefulness is probably minimal. Lazy evaluation is often a substitute for macros, and where other languages require macros to define domain specific languages, Haskell solves these problems naturally.
Macros are quite a painful cost to a language, they reduce the assumptions that people can make and require people to play hunt the macro. Haskell really doesn't need them!
Saturday, December 30, 2006
End of year update
Since most people are doing a year summary, I thought I should do one. Then I remembered that I can barely remember what I did yesterday (for example today I went in to work only to discover it was a Saturday...). So instead I'm going to try and predict what I'll do in 2007, with the proviso that it's a complete pile of fiction, but some of it might actually happen.
PhD + Catch
2007 should be the year I complete my PhD. It's certainly the last year I get any funding for it. Alongside completing my PhD, I also want to release a really good version of Catch - the Haskell pattern match checker. I've been working away on this for over 2 years now, so would like something I can point at. I'm hoping Catch will be out by March, but I realise I said "a few months" at least 9 months ago now - it turns out the Halting Problem is quite hard...
Hoogle 4
I think Hoogle 4 has been knocking around for about 8 months now. I'd actually like to get this finished, and have it be the main version of the website. This brings a complete polishing of Hoogle. Hoogle seems to be a well used tool, and hopefully Hoogle 4 will just make it better at everything and much more reliable.
Yhc
One project which I really want to polish off and put a release out of is Yhc. Yhc has lots of really exciting features, and is evolving pretty fast. The Yhc Javascript stuff is particularly interesting - the demos just keep getting better and better. The Yhc.Core side of things is looking really impressive. My optimisations stuff is starting to look really good. If I ever finish WinHaskell, Yhc might become a very nice choice for the default compiler.
Parsing
I really really want to finally implement my parser. Unfortunately, just taking a look at the things above this makes that rather unlikely. If you add in distractions like Hat, WinHaskell, WinHugs etc then it looks a lot less likely. As such I'm going to make the target to formalise my state machine algorithms and write a state machine library in Haskell which I can use as the basis for my parser.
I've tried to keep this list as short as possible, to actually give me a chance of doing some of it. We'll see if that happens.
Saturday, December 23, 2006
Hoogle Progress, HsMan features
About 2 months ago Frederik Eaton posted "HsMan" to the ghc list, a program that with one command line could look up the documentation for a function. Now Hoogle 4 has that feature :)
$ hoogle +filepath combine /info /doc
System.FilePath.Version_0_09.combine :: FilePath -> FilePath -> FilePath
Combine two paths, if the right path isAbsolute, then it returns the second.
Posix: combine "/" "test" == "/test"
Posix: combine "home" "bob" == "home/bob"
Windows: combine "home" "bob" == "home\\bob"
http://www-users.cs.york.ac.uk/~ndm/projects/filepath/System-FilePath-Version_0_09.html#v%3Acombine
What the above says is: search the filepath package for the name combine, and for the result that ranks best, display the haddock entry for it (/info) and a link to further documentation (/doc).
It was a reasonable amount of work to add, but there is a good reason for adding this feature, which will become clear when the Hoogle 4 web version is released :)
Friday, December 22, 2006
Hoogle 4 progress
I'm slowly working through the things I want for Hoogle 4, and it's coming along quite nicely. A quick tour of the features I've already implemented:
Easier generation of Hoogle Databases
First create a Hoogle file using Haddock and Cabal:
> runhaskell Setup haddock --hoogle
Then convert it to a Hoogle binary database:
> hoogle /convert=base.txt
This will produce base.hoo, an optimised Hoogle database.
Faster text searches
Most searches are text searches. They used to linearly scan through the data; now they build a nice efficient index and hop directly to the relevant bit. This makes text searches over 100 times faster, and means they scale linearly with the number of results and the size of the search text, rather than with the size of the whole database as before.
As a result of having faster searches, I can now do multi-word searches, searching for two different words, and then matching between them. i.e. you can now search for "concat map" (or "map concat") and get concatMap returned.
Proper API
There is now a proper Hoogle API, which the Hoogle command line tool and web tool call out to. Hopefully this will let people layer new things on top of Hoogle more easily.
Multiple package searching
You can now search within multiple packages at a time. For example "drop +base +filepath" will search both the base and filepath packages, as though they were one.
Restricting on module name
As requested you can now restrict searches by module name, for example "Prelude.map" and "map +Prelude" will both do a search for the map function that resides in the Prelude.
Proper parsing
The parser is written in Parsec now - much more reliable, and with much better error messages! Hopefully this should make it easier to extend Hoogle in the future as well.
The Future
I broke quite a lot with the complete revamp of the code, so the idea is to fix up what I broke and try and push a new release out relatively soon.
Thursday, December 21, 2006
Will produce patches for PhD
I've just got back to my home, and am doing darcs pulls against all the repos around to get my work up to date. I noticed Yhc has over 1000 patches, and then I pulled my primary PhD repo - and found I have 1626 patches! I have at least 100 more in a separate repo I use for one chunk of the code, and I was working in CVS for over a year before moving to darcs.
The Darcs repo for my PhD at home seemed dead, so I did a fresh darcs get, no checkpoints. It takes a while...
Friday, December 15, 2006
Advertising Haskell
A lot of posts on the mailing list at the moment seem to be about how we can promote Haskell to a greater audience. The idea of this post is to figure out why we might want to promote Haskell to a greater audience, which I think will give us a clue where to go. If Haskell was more popular then:
- More Haskell jobs would emerge
- People would find it easier to introduce Haskell in a work place
- Less C in the world
- More reliable software
- Elegance and beauty
Of course, there are also downsides to Haskell being more popular:
- Bigger communities aren't always as friendly
- Commercialisation will loose some benefits of Haskell (open source etc.)
- The volume of beginners will outweigh the experienced professionals
- Managers might push Haskell for entirely unsuitable things
So, the question: do we want Haskell to be more popular? At the moment I quite like the fact that when programming I can be much faster than other people, and have far fewer errors. It would however be nice if I could get a job doing Haskell at some point, rather than becoming a C monkey once more, which will be doubly painful after experiencing so much Haskell.
My personal view on how Haskell can be made more popular:
- Remove features, generalise features, do not add features. I don't understand/know rank-n types, Template Haskell, GADT's, impredicative polymorphism, arrows... Beginners don't want to know either! Getting the features right is good, which may mean adding features, but not necessarily.
- Continue to be polite and helpful
- Promote Haskell as a programming course
- Promote Haskell to experienced programmers looking for something new
- DO NOT promote Haskell in industry; if you force this on to people, they'll just discover we are not ready for it yet!
In my opinion, we do plenty of advertisements already. There is not nearly enough coding going on; Cabal could benefit from extra contributors, I know Yhc could, etc. When the tools are ready the users will come, and then if they don't come naturally, we can prod them. Until that time the steady flow of new users is plenty.
Monday, December 11, 2006
bhc: Basic Haskell Compiler
I have been thinking Compilery thoughts for a while now, so I thought I'd jot them down. I am starting the "bhc" project here and now! The intention is that bhc will be a concept, and some thoughts, for at least another year - definitely not anything as concrete as code!
The current Haskell compilery things are:
- Hugs - fast to compile, slow to run
- GHC - slow to compile, fast to run
- Yhc/nhc - medium at both
- Jhc - aiming for super slow to compile, super fast to run
- ...
What I would like to do is create a new front end for Yhc, taking the Yhc.Core and Yhc.ByteCode bits as they stand, and replacing the front end. I'd also like to base the entire code on libraries, rather than compiler passes, and make everything entirely reusable. I also want to aim for simplicity, elegance and readability.
So, first things first. Haskell is really the only language to implement this in. After that you have a few stages until you get to the Yhc.Core language:
- Make (Cabal should do this)
- Parsing
- Name resolution
- Type checking
- Dictionary desugaring
- Desugaring
There is only one way to write a parser, using my parsing system (not finished, barely even started). There is only one way to write a type checker, using my type checker generator (not started at all, barely even specified, not even a link, definitely not proven!). Name resolution isn't that hard. Dictionary desugaring should use the special method of Catch (same as Jhc, I believe). Desugaring is trivial with Play classes (warning, if you follow that link, not finished!); I also want to have invertible desugaring, for analysis purposes. The parsing and typechecking would be standalone libraries.
Two of these things need to be written first, but thats part of the fun :)
Note that type checking does all the typey stuff, dictionary desugaring uses the types. Nothing else uses types, and in fact, I think this compiler should be untyped. (I know no one will agree with me, all I can say is that I think I'm right, but realise everyone thinks I'm wrong)
The next big decision is file formats: I would like to have a .bho (basic haskell object) file which corresponds to a single module, and a .bhl (basic haskell library) which is a whole library, and a .bhx (basic haskell executable) which is a whole program. Inside a .bho you would have (all items are optional):
- Full original source code to the module
- Desugared source code, in Yhc.Core format
- Bytecode, in Yhc.ByteCode format
A .bhl would have those three things, but linked together within a library. A .bhx would have all of that, including all the libaries, linked in as one.
I would also write an optimiser, for whole program analysis, which took a .bhx, and produced an equivalent one. Probably also a compiler to C code, for super-fast execution.
So what in this compiler would be new?
- Focus on libraries, Yhc.Core, Yhc.Parse, Yhc.ByteCode, Yhc.TypeCheck
- Invertible desugaring
- Extensive use of the Play class
- Better library story (one library, one file)
- Standalone crossplatform executables
- Fast whole program analysis
- Brand new parsing system
- Untyped Core language (compared to other optimising compilers)
- Simple
So, who wants to have some fun?
Tuesday, December 05, 2006
Generalised Haskell
In the past few days I've been working on a generalised version of Haskell, designed for program optimisation and transformation. I've also implemented the start of a Haskell optimiser, but in about 100 lines of Haskell, using this generalised form of Haskell.
Generalised Haskell is very much like a normal Haskell Core language, with fewer restrictions in pattern matching. In normal Haskell you would write "map" as:
> map f [] = []
> map f (x:xs) = f x : map f xs
Which would be compiled into:
> map f x = case x of
>     [] -> []
>     (x:xs) -> f x : map f xs
The rule for pattern matching is that you take the first rule that matches. Generalised Haskell relaxes this slightly to take any one rule that matches, but not necessarily in order. The second relaxation is that when pattern matching on the left hand side, functions can appear as patterns.
For example, you can now define:
> map id x = x
> map f x = case x of
>     [] -> []
>     (x:xs) -> f x : map f xs
Note that the "map id" case looks very much like GHC's Rules feature. The advantage for optimisation is that many activities can be encoded using these pattern matches - arity raising, specialisation, deforestation.
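For instance - my own illustrative example, in the same style - deforestation of composed maps becomes just one more clause, with a function application in pattern position:

> map f (map g x) = map (f . g) x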
I am still working on a tool that uses this technique for optimisation, but initial results look quite promising, while the code remains impressively short.
Friday, November 17, 2006
HsInstaller
I've given this message and idea to a few people already, via email and IRC, so I thought maybe I should share it slightly wider and get feedback from people.
I am intending to start a project called HsInstaller, to produce Windows installers for:
Hat, Yhc, WinHugs and any Cabal program (definitely Hoogle, hopefully Happy, Alex, Haddock etc.), maybe GHC and Gtk2Hs if that's possible. This project does NOT hope to create installers for libraries, just standalone executable programs. A wrapper can be produced for libraries that just calls the underlying Cabal infrastructure much more easily.
At the moment the Haskell community has several different installers: the C based one I wrote for WinHugs that generates a .exe, the GHC one that generates a .msi, and the Gtk2Hs one that uses InnoSetup. There are advantages and disadvantages to all approaches, but since the world is going steadily towards .msi's that might be the best format to standardise on. Of course, it might not be the best format - discussion is welcome!
Unfortunately the installer code used by GHC is not publicly available. One consequence of this is that GHC does not have snapshot installers for Windows, and in fact a GHC release is not prepackaged for Windows - that is something done later.
So, my general thoughts are: try and see if a copy of the GHC installer can be obtained (this is apparently likely in the future); collect all the installers that everyone uses; then sit down with everyone who wants an installer and figure out what we have, and what we want. Once that's done we can start implementing something that everyone can reuse easily.
Anyone have any thoughts? I have about 4 projects that need installers (Hat, Yhc, WinHugs, Hoogle), but it would be nice if everyone could be satisfied by the results.
Monday, November 13, 2006
Dr Haskell progress
I've been working on Dr Haskell for a little bit, and it's now quite a bit cleverer. For those who don't know, Dr Haskell tries to make helpful suggestions about your code. For example, if you use "concat (map f x)", it will suggest you use "concatMap f x". It can also do other clever things, for example:
not (a == b) ===> use /=
if x `elem` xs then True else x == 1 ===> use an or
if bracket then "(" else "" ===> ['(' | bracket]
The idea is to introduce beginners to functions or patterns of Haskell they might not have been aware of, without requiring much thought or learning up front - the tool only mentions a function once they have already written an instance of it. I think of this as a nice complement to Hoogle, since Hoogle caters for the case where beginners have already identified where a common pattern may lie.
Of course, the biggest weakness in Dr Haskell was that it matched at the expression level only. A beginner might write:
takeheads [] = []
takeheads (x:xs) = head x : takeheads xs
(And indeed they do - I've seen that 4 times already today!) An experienced programmer would recognise that as a map, and now, so will Dr Haskell! It will report "specialised version of map used as takeheads, line number n".
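In other words, the definition can be replaced with the following (my illustration of the suggested rewrite, not Dr Haskell's literal output):

takeheads :: [[a]] -> [a]
takeheads xs = map head xs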
Some other things that Dr Haskell can now spot:
sum :: [Int] -> Int
sum [] = 0
sum (x:xs) = x + sum xs
(it's a foldr, as written above - although I realise foldl' is a better choice for sum)
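For reference, the foldr it corresponds to would be something like this sketch:

sum :: [Int] -> Int
sum = foldr (+) 0

The next example uses an accumulator: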
reverse xs = rev [] xs
where
rev acc [] = acc
rev acc (x:xs) = rev (x:acc) xs
(rev is a foldl)
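Written directly as a foldl, that would be roughly (again my sketch):

reverse :: [a] -> [a]
reverse xs = foldl (flip (:)) [] xs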
Dr Haskell can now spot all of these, and has the power to spot lots more patterns using recursion as well as expression matching. I think this could be really helpful for beginners, if anyone has any interesting ideas about this please let me know. I hope to package up a release in the next few days.
Sunday, November 12, 2006
Library idea: the Safe library
Often Haskell programmers end up with "error, head []" - a totally useless error message!
In developing Catch I have tens of thousands of lines of code, so tracking down such an error would be a real pain. To combat this I have:
headNote :: String -> [a] -> a
headNote err [] = error $ "headNote, " ++ err
headNote _ (x:_) = x
I also have fromJustNote, lookupJust (fromJust . lookup), assertNote etc. - loads of useful safe functions - plus headMaybe :: [a] -> Maybe a. Perhaps this could be useful as a library - the safe library?
I would imagine lots of functions from the various Prelude/List libraries, some with Note versions taking a parameter to give a better error message, some with Maybe versions to return a Maybe on failure.
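As a sketch of the two flavours, here are illustrative definitions of a Note and a Maybe variant (a real library might choose different names):

fromJustNote :: String -> Maybe a -> a
fromJustNote err Nothing = error $ "fromJustNote, " ++ err
fromJustNote _ (Just x) = x

headMaybe :: [a] -> Maybe a
headMaybe [] = Nothing
headMaybe (x:_) = Just x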
Anyone think this would be useful? It's technically trivial to implement, but I don't have enough time to manage this, so perhaps someone else would like to take this one on?
Friday, November 10, 2006
System.FilePath, automated testing
I just released System.FilePath, a library for manipulating FilePaths on both Windows and Posix. The problem with a library like this is that there are lots of corner cases and lots of weird situations, and everything needs to be tested twice with different semantics. Obviously this requires a test infrastructure different from most other libraries!
I started off with a separate file for writing properties, but quickly found that the properties were the best documentation for a function. It was also a pain to maintain two different places describing the logic of the code - the code itself and the tests. The obvious idea then is to combine the code, documentation and testing into one. Because I am using Haddock that turned out to be quite easy to do - any line beginning with "-- > " is a test. Haddock sees this as monospace formatting, Haskell as a comment, and my test generator can find the tests pretty easily.
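For example, documentation and test sit together like this (an illustrative snippet - the naive body is mine, and the real function handles many corner cases):

-- | Combine two paths, e.g.
--
-- > combine "/directory" "file.ext" == "/directory/file.ext"
combine :: FilePath -> FilePath -> FilePath
combine a b = a ++ "/" ++ b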
Within the System.FilePath repo I have a DOS Batch file driver (test.bat) which uses a separate Haskell program (GenTests.hs) to create a test script and run it.
Within the test listing there are some tests which I refer to as constant, and some as properties. The tests with no free variables are constant - GenTests recognises them and outputs them directly; in one execution they either pass or fail. The properties are just standard QuickCheck properties, with the restriction that every multi-letter identifier not in a certain known set is a function in the library, and every variable x..z is a FilePath (hence using a custom FilePath generator).
The main complication in testing from System.FilePath is the fact that every property corresponds to two different tests - one on the Posix implementation, one on the Windows implementation. The translator automatically does this duplication, unless either Posix: or Windows: is given at the start of the test, in which case the test is only executed on the appropriate version.
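For example (illustrative test lines, not copied from the real test suite):

-- > Windows: pathSeparator == '\\'
-- > Posix:   pathSeparator == '/'
-- > isRelative x == not (isAbsolute x)

The first two run only on the named implementation; the last is duplicated and run on both.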
For QuickCheck testing I defined a FilePath to be a 25 character string, from the following set of characters "?|./:\\abcd 123;_". The idea of this set of characters is to include every character that any aspect of the library treats differently, along with a small selection of "normal" letters/numbers.
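A generator along these lines can be written as follows - my sketch using a newer QuickCheck API, not necessarily how the original code did it:

import Test.QuickCheck

-- a newtype so FilePath can be given a custom Arbitrary instance
newtype QFilePath = QFilePath FilePath deriving Show

instance Arbitrary QFilePath where
    -- 25 characters drawn from the set the library treats specially
    arbitrary = QFilePath <$> vectorOf 25 (elements "?|./:\\abcd 123;_")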
There was one modification I had to make to QuickCheck: by default QuickCheck returns successfully, printing success or failure to the console. Unfortunately if an automatic program is executing over 200 tests, these messages can get obscured in the general noise - this happened more than once. To combat this I defined a QuickCheck wrapper which calls error on failure. Ideally the signature of quickCheck should be changed to :: .. -> IO Bool to detect these situations and allow the driver script to fail more obviously.
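With later versions of QuickCheck such a wrapper needs no modification to the library itself; a sketch (the post predates this API):

import Control.Monad (unless)
import Test.QuickCheck
import Test.QuickCheck.Test (isSuccess)

-- run a property, and die noisily rather than just printing a message
quickCheckOrDie :: Testable prop => prop -> IO ()
quickCheckOrDie prop = do
    result <- quickCheckResult prop
    unless (isSuccess result) $ error "QuickCheck property failed"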
Without QuickCheck I don't think it would have been possible to write the FilePath library - it caught too many obscure bugs that manual testing would never have found. In addition, QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out the properties disagreed with me. The thing QuickCheck helped with more than anything, though, was refactoring: despite a massive number of the functions all depending on each other, QuickCheck let me change the behaviour of one function in some obscure case and check that no other function was relying on it.
The only criticism that can be levelled at my use of QuickCheck is that failing examples are not minimal, in fact they are exactly 25 characters long. I hope that at some point soon I can make use of SmallCheck (once it has a darcs repo and a .cabal file) to do testing alongside QuickCheck to get a greater depth of coverage.
All the test scripts I have written are available in the darcs repo, under the BSD3. If anyone can make use of them, I'd be happy to have someone take them forward!