I haven't done much on Hoogle 4 in quite a while, and since I keep telling people that various bugs will be "fixed in Hoogle 4", it seems only polite to fix those bugs in the near future. I decided that this evening I would focus on Hoogle 4, since I was slowly going insane from writing academic papers. I'm now just about to go to bed, so I thought I would summarise what I did.
One of the things that has been slowing me down on Hoogle is the fact that I rarely get a chance to work on it, and when I do I have to refamiliarise myself with the code base. Each time I do this, I forget something, forget what I was working on, and generally end up slowing down development. It also means that working on Hoogle for "an odd half hour" is not practical - it takes at least a couple of hours to get up to speed. To solve this, I've drawn some diagrams of the module structure and the main Hoogle database. They may not mean much to anyone else, but I hope they are enough to keep my mind fresh on where things go.
Documenting the module structure also gives a chance to reflect on where that structure is a bit unclear, and to modify it as necessary. Previously DataBase and Item depended on each other - worked around with tricks like giving Item an extra field of a parametrised type, i.e.:
data Item a = Item {name :: String, typeSig :: TypeSig, other :: a}
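For illustration, here is a minimal self-contained sketch of how that trick avoided the import cycle - note that the DataBase-side names below are invented for this sketch, not Hoogle's:

```haskell
-- Item module: knows nothing about DataBase; the extra field is abstract.
data TypeSig = TypeSig String   -- stand-in definition for this sketch

data Item a = Item {name :: String, typeSig :: TypeSig, other :: a}

-- DataBase module: instantiates the parameter, so no cyclic import is needed.
data DBExtra = DBExtra {docOffset :: Int}   -- hypothetical extra data

type DBItem = Item DBExtra

main :: IO ()
main = putStrLn (name (Item "map" (TypeSig "(a -> b) -> [a] -> [b]") (DBExtra 0)))
```

The price of the trick is that every use of Item carries the type parameter around, which is exactly the kind of noise the new structure avoids.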
Fortunately now all that is gone, and the structure is a lot more logical.
The other thing I've been doing is moving to the Deferred Binary library. Part of the reason for Hoogle 4 becoming bogged down is the creation of a central database that is loaded on demand, and contains tries etc. to perform efficient searches. Unfortunately, this loading-on-demand aspect had infected the entire code, resulting in a big IO monad mess. It was slightly more pleasant than coding in C, but all the low-level concerns had seeped through, requiring careful thought, slow progress and frequent mistakes.
To take one example, the Hoogle.DataBase.Texts module, which is responsible for performing all textual searches and storing the lazily-loaded trie. Compare the old version and the current version. The code has shrunk from 186 lines to 65 - of which only about 30 are real code. The IO Monad is nowhere to be seen - everything is pure. The really low-level functions such as sizeTree and layoutTree have gone entirely.
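To give a flavour of the difference, here is an invented before/after sketch - a Data.Map stands in for the real trie, and none of these names are the actual Hoogle API:

```haskell
import qualified Data.Map as Map

type Trie = Map.Map String [String]   -- stand-in for the real search trie

-- Old style: the database is loaded in IO, so every search is monadic
-- and the IO infects all the callers.
searchIO :: IO Trie -> String -> IO [String]
searchIO load key = do
    t <- load
    return (Map.findWithDefault [] key t)

-- New style: deferred deserialisation means the structure can be handed
-- around as an ordinary value, and searching is a pure function.
search :: Trie -> String -> [String]
search t key = Map.findWithDefault [] key t

main :: IO ()
main = print (search (Map.fromList [("map", ["Prelude.map"])]) "map")
```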
I'm not sure when I will next be able to get to Hoogle 4, but next time I hope to start working on it faster, and that the cleaner code makes for happier coding!
Note: I have started displaying adverts on this blog because it was only a button click away (very impressive integration by Google), it seemed like a fun thing to try, and the adverts in Gmail are sometimes quite helpful. I have deliberately chosen not to show too many adverts, placed them off to the side where they won't interfere with reading, and tried to keep it minimal - but if they are annoying please let me know. I am going to donate whatever money (if any) comes from this to charity - but please don't take that as an incentive to artificially click on adverts you aren't interested in.
Wednesday, April 25, 2007
Monday, April 23, 2007
Boilerplate considered harmful
At the moment I'm working on a boilerplate removal library for Haskell, which is faster (runtime), shorter (code), more type safe and requires fewer extensions than Scrap Your Boilerplate (SYB). However, since I haven't finished and released a stable version, I can't really recommend people use it yet. The reason I started working on this is that I was unaware of SYB when I started. Last week I also introduced someone to SYB who had done quite a bit of Haskell programming, but had never stumbled across it. As a result, I think SYB needs a bit more attention - it is one of the strengths of Haskell!
Disadvantages
Before saying how great SYB is, it's important to point out the things that make it not so great:
- Only implemented in GHC - sorry to the poor Hugs users
- Requires rank-2 types, which means its not actually Haskell 98
- Occasionally the rank-2-ness infects the code you write, with unfortunate error messages (although this is not that common)
A data structure
Before showing some operations, I'm going to first introduce a data structure on which we can imagine the operations being performed. I don't like the example from the SYB benchmark - it feels like an XML file (as is the stated intention), which means the logic behind it is a bit obscured. So instead I'll pick a data type modelled on an imperative programming language:
{-# OPTIONS_GHC -fglasgow-exts #-}
import Data.Generics
data Expr = Var String | Lit Int | Call String [Expr] deriving (Data, Typeable)
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt] deriving (Data,Typeable)
We define the data type as normal, adding deriving for Data and Typeable - the two key SYB types. We also add an import and a flag, just to get the GHC machinery working for the derivings.
Queries
So let's imagine you have to get a list of all literals. In SYB this is easy:
extractLits :: Data a => a -> [Int]
extractLits = everything (++) ([] `mkQ` f)
where f (Lit x) = [x] ; f _ = []
Wow, easy! This function will operate on anything which has a Data instance, so you can run it on an Expr, Stmt, [Stmt], [Either Stmt Expr] - the choice is yours. For the purposes of a short introduction, I'd recommend treating all the bits except the "f" as just something you write - read the full SYB paper to get all the details of what everything can be used for.
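For example, as a self-contained file (using the DeriveDataTypeable pragma, which replaces the blanket -fglasgow-exts flag in modern GHC):

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics

data Expr = Var String | Lit Int | Call String [Expr]
            deriving (Data, Typeable)
data Stmt = While Expr [Stmt] | Assign String Expr | Sequence [Stmt]
            deriving (Data, Typeable)

-- Collect every Int literal, wherever it sits in the tree.
extractLits :: Data a => a -> [Int]
extractLits = everything (++) ([] `mkQ` f)
    where f (Lit x) = [x] ; f _ = []

main :: IO ()
main = print (extractLits (While (Lit 1) [Assign "x" (Call "f" [Lit 2, Lit 3])]))
```

This prints [1,2,3] - the literals are gathered from the whole tree, no matter which types they are buried inside.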
Traversals
Now let's negate all the literals:
negateLits :: Data a => a -> a
negateLits = everywhere (mkT f)
where f (Lit x) = Lit (negate x) ; f x = x
Again, it's pretty easy. And once again, consider everything apart from the "f" as just something you write.
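And again as a self-contained file - a Show instance is derived purely so the result can be printed:

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
import Data.Generics

data Expr = Var String | Lit Int | Call String [Expr]
            deriving (Data, Typeable, Show)

-- Rewrite every Lit in place, leaving everything else untouched.
negateLits :: Data a => a -> a
negateLits = everywhere (mkT f)
    where f (Lit x) = Lit (negate x) ; f x = x

main :: IO ()
main = print (negateLits (Call "f" [Lit 1, Var "y"]))
```

Running this prints Call "f" [Lit (-1),Var "y"] - only the literal changed.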
The gains in code reduction that can be made with SYB are quite substantial, and by removing the boilerplate you get code which can be reused more easily. Boilerplate code is bad, and should be removed wherever possible.
Thursday, April 19, 2007
Visual Basic, the story
Today on the Haskell mailing list, someone confused VBScript and VBA. It's not really surprising - the "Visual Basic" or "VB" name has been attached to at least 3 entirely different programming languages! This is an attempt to clear up the confusion. My first programming language (after .bat) was Visual Basic, and I loved it. I've worked on Excel/Word programs using VB, written applications for money in VB, and lots more besides - it's a great language. Often people criticize VB for being "under powered" or a "beginners language" - those people are wrong - but that's not something I'm going to cover here :-) [Technical note: I may have got some of this wrong, I was a user of the language - not an implementer]
Visual Basic 3
We'll start the story at Visual Basic 3, because that's the first version I ever used. My Sixth Form would copy it to floppy disk if you brought in 3, which is basically school-sponsored piracy. Anyway, this was an imperative language, with built-in support for graphical interfaces.
You can spot a VB 3 program a mile away - they look clunky, and alas VB made it really easy to change the background colour of a form. That ability made it a hotbed for garish colours - does your in-house application have pastel shades of red? If so, it's either a C programmer who went to a lot of work, or more likely, a VB programmer without any sense of style. I've seen these apps running on point-of-sale terminals in some shops.
Visual Basic 5/6
I'm going to lump these two versions into one - there were some changes, but not many. They continued the same trend as VB 3, but moved towards being an object-oriented language. This language had very strong COM interoperation - all VB objects were COM objects, and all COM objects could be used in VB. The COM ability made it ideal for many things, and opened up lots of easy-to-use libraries. All windows became objects, and some of the clunky bits in VB 3 got deprecated in favour of newer methods, but most programs still worked exactly as before.
Alas VB 6 still allows the background colours of a form to be set, but if you don't (please don't!) the program looks very Windows native - much more than Java or even C#.
Visual Basic for Applications (VBA)
At the time of VB 5, Microsoft moved over to using VB as the language of scripting in Office, replacing WordScript etc. The VBA language is VB, but rather than have the VB libraries, it has the Office libraries. The languages are identical in general use.
VBScript
Microsoft also decided they needed a JavaScript (or JScript as they call it, or ECMAScript as dull people call it) competitor. They decided to repurpose VB, changing it slightly to give it more of a scripting feel. If you use IE, you can run VBScript in the browser. If you like email viruses, you can run the .vbs files you get in your inbox. If you have active scripting installed, you can run .vbs files at the command line.
But please, don't use VBScript. JavaScript is a much nicer language, and works everywhere VBScript does and in more places besides. The language VBScript should quietly go away and die. There is one exception: ASP only provides one method (uploading a binary file) for VBScript but not JavaScript. As a result, I do have a website written in VBScript which I have to maintain.
VB.NET
And finally we get to the newest edition, or ¬VB.NET as I will refer to it. When changing to C# with .NET, Microsoft decided that shunning all the VB developers would be a bad idea, so instead they shunned them and wrote a syntax wrapper round C#. The ¬VB.NET language is a perfectly fine language, but it's not VB. Things like late binding have been removed, the syntax has been made more regular, the object facilities have totally changed, etc. If you have a VB project of any mild complexity, you'll not get very far with the automatic converter.
If you are a VB developer looking to move to the new .NET framework, I'd look at both ¬VB.NET and C# before deciding which one to pick. The differences aren't that major, but at the time I was learning (.NET 1.0 beta), ¬VB.NET was clearly the unloved younger brother of C# - the examples were all in C#, and it was clearly the favoured language. Now both languages have evolved so they are not as similar, but neither is worthy of the VB name.
Me and VB
VB was my first language, and my "first choice" language for years. After learning C I initially poured scorn on VB for not being a "real language"; then I realised I was an idiot, and that VB is a lot better for most application programming than C - and you can always interface to a C .dll if you must. Now Haskell and C# (for GUIs) have become my standard languages - I haven't started up VB 6 for quite a while. I've got a few programs written in VB 6 that I technically maintain, but none that I've had any reports on in ages, and none that I can imagine working on. I've got at least 50 programs in my VB directory, but I've forgotten quite where I put my VB directory - it's certainly not been opened in a while.
Even though I may not use standalone VB 6 anymore, I still maintain and actively work on some VBA solutions for accountants in Excel, and a couple of website templating systems using VBA in Word. I also develop and maintain a corporate website using VBScript, which needs new features added at least once a year. VB is not yet dead, but it's definitely going that way.
Tuesday, April 17, 2007
Coding Nirvana
Today I experienced the perfection of coding, and I never want to go back to the old hacking method! My FilePath library was nearly ready for release, but just before it went out I got a bug report.
The FilePath library is based on the idea that all functions must satisfy a list of properties, which are documented in the interface. I use QuickCheck to automatically test that these properties hold. This leads to the great situation where there are only two classes of bug:
- A property is wrong
- A property is missing
Once the properties have been decided upon, and having left a night to see if anyone had any other thoughts on the properties, the coding can begin. This then makes the process of coding much less hacky, and much more precise:
- Write out the properties
- Write out code to implement them
- If the tests fail, you have a counter example - go back to 2
- Done
By the time you get to step 4, the code meets the properties. You have already run thousands of tests (more than 10,000 in FilePath) and can have a fairly strong degree of confidence in the work you have produced. The bug reporter has already agreed with the changes you've made, even before you've made them.
I have never before written a library so focused on properties, but it has had amazing advantages. The advantage of having correct and reliable code may be obvious - but in practice the amount of confidence in the code surprised me. By far the bigger benefit has been collaboration - the highly declarative nature of properties makes them well suited to "group thought".
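To make the style concrete, here is one property in the FilePath spirit - a plain function to Bool is all QuickCheck needs (run it with quickCheck); I believe this law matches the documented behaviour of splitExtension, but treat it as a sketch rather than a quote from the library's interface:

```haskell
import System.FilePath (splitExtension)

-- Splitting off the extension and concatenating the pieces back
-- together should give the original path, for every input.
prop_splitExtension :: FilePath -> Bool
prop_splitExtension x = uncurry (++) (splitExtension x) == x

main :: IO ()
main = print (all prop_splitExtension ["", "file", "file.txt", "dir.d/file.tar.gz"])
```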
Monday, April 16, 2007
No Termination Checking
Originally my PhD had an element in it on Termination Checking, and since someone asked on IRC how that bit was going, I thought I'd let everyone know that my PhD now doesn't include termination checking - there are several reasons why:
Related Work
Not long before I started working on the termination thing, this piece of work came out: Automated Termination Analysis for Haskell: From Term Rewriting to Programming Languages. That does a really good job. The authors behind that paper have been working on term rewriting termination for a really long time, and have built up a very powerful framework. Their encoding of Haskell into their framework means they have a very powerful tool - based on years of experience. I don't think I could hope to come anywhere near that.
Less Motivation
I want the tool I produce to solve a practical problem. I want it to be something that people can use on a daily basis. In Haskell termination is rarely a problem, and if it is, usually the problem occurs in all circumstances - not just the corner cases (of course there are exceptions, but this is my overall impression). A lot of termination issues relate to variables being bound unexpectedly, lazy evaluation and black holes - and these will occur on any sufficient test input. Once non-termination has occurred, there is no point in running a static checker - use a dynamic tool such as Hat-Nonterm or Black-Hat - both specially designed to pinpoint the cause. These factors mean that my motivation to work on a termination checker is less - I don't see it as fulfilling a massive practical need in Haskell. (I know some will disagree, but this is the reason for me not doing it, not for others not doing it - motivation is a personal thing)
Hard to Prove
If you have a static checker, there are two main reasons for using it - to obtain a proof of safety and to find real bugs. It is most useful when the issues that are spotted can be worked around in some way, so that even if they were not a valid problem, afterwards you can obtain a statically checked version that you can guarantee will terminate for all inputs. Pattern-match checking has this property: by completing the patterns or performing extra (potentially redundant) checks, you can be sure all patterns are safe. From my understanding, termination checking does not - to introduce a decreasing variable into an algorithm that does not already have one is a lot of work, and potentially requires a rethink of the entire algorithm. For this reason, even with the help of a willing programmer, I see termination checking as something that will obtain less successful results than I would like.
Thursday, April 12, 2007
SOC: GuiHaskell
The Google Summer of Code allocations are now out, and GuiHaskell got one of the slots from Haskell.org. There were quite a few students applying for GuiHaskell - I thought the students who applied were very good, and any of them would have done a wonderful job. In the end Asumu Takikawa was chosen, and I wish him well with the project.
The link with his application is available here.
The basic idea as I see the project is to start from the existing GuiHaskell base and evolve it so that by the end of the summer it is a very solid alternative to WinHugs, but with several key advantages:
- Written in Haskell using Gtk2Hs, not C with the Win32 API
- Cross platform: Windows, Linux and Mac at least
- Cross compiler: Hugs, GHC and GHCi to begin with
- Integration with Haddock/Cabal/Hoogle etc.
The initial aim is not to make it an IDE, but to do everything an IDE might do apart from having an integrated editor. Depending on the progress, this may be something that is tackled this summer, or perhaps not.
Exploding Patterns in FilePath
[ICFP referees please stop reading now]
After what feels like forever, my FilePath library has finally hit 1.0. It took over 250 email messages debating FilePath design choices from Haskell 1.0 onwards, but finally we were able to come to an agreement about what the library should look like. I'd like to thank everyone who participated in the discussion, used the library, and encouraged me to keep pushing it.
I've previously blogged about my QuickCheck testing of the FilePath library, something which has been essential to coming up with a coherent design - and which led to various decisions being made for me. Now I'd like to talk about the pattern-match story. For the purposes of this post, I'm going to use the term "exploding patterns" to mean pattern-match errors that can be provoked by appropriately chosen input.
After doing all the QuickCheck tests, you might think I was quite confident of avoiding exploding patterns, but I was not. The QuickCheck tests all work with lists of 25 characters - perhaps the empty list crashes the function? Perhaps only some really rare input? After all that loose random testing, the time came for some serious proof!
Taking a look at the code, I have used 5 incomplete functions (fromJust, init, head, last, tail) and one explicit non-exhaustive pattern match. Can any of these explode? Turning the code over to my pattern-match checker, it quickly (2 seconds, 2Mb later) informs me that it can prove none of the incomplete functions crash, but it can't prove the explicit pattern match. So let's take a look:
f x | isWindows = ...
    | isPosix = ...
Oh dear, it won't crash currently (isPosix = not isWindows), but you can certainly see that these patterns aren't going to stay complete if we add another operating system. Quick as a flash, we replace "isPosix" with "otherwise", and run the checker again. Yay, no exploding patterns remain.
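A hypothetical reconstruction of that fix - isWindows and the branch bodies below are stand-ins, not the real FilePath code:

```haskell
isWindows, isPosix :: Bool
isWindows = False
isPosix = not isWindows

-- Before: completeness relies on the invariant isPosix = not isWindows,
-- which a checker cannot see from the guards alone.
sepBefore :: Char
sepBefore | isWindows = '\\'
          | isPosix   = '/'

-- After: the final guard is otherwise, so the match is trivially complete.
sepAfter :: Char
sepAfter | isWindows = '\\'
         | otherwise = '/'

main :: IO ()
main = print (sepBefore == sepAfter)
```

Both versions behave identically today; only the second stays complete if a third operating system ever appears.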
So there we have it: in under 5 minutes, I was able to prove my entire library to be safe. I can still use all the fun functions like head, but in a safe manner. It turns out the library was always safe, but with one minor modification to improve the code, I can be sure that things won't go wrong - the released 1.0 has this modification. I have the guarantees of safety, without much work at all. Not only that, but as the library evolves, I'm only 2 seconds away (literally!) from pattern-match safety - compared to my QuickCheck test, which takes around a minute, this is a very cheap verification.
So how come that within an hour of releasing 1.0 someone had already replied with a bug report about a crashing pattern? The answer is simple, Setup.hs in Cabal has code in it which could (at the time of release, but not anymore) cause a pattern explosion. I didn't write, or validate, the Setup.hs script - but it just goes to show that pattern-match errors can easily slip through!
After what feels like forever, finally my FilePath library has hit 1.0. It took over 250 email messages debating FilePath design choices from Haskell 1.0 onwards, but finally we were able to come to an agreement about what the libary should look like. I'd like to thank everyone who participated in the discussion, used the library, and encouraged me to keep pushing it.
I've previously blogged about my QuickCheck testing of the FilePath library, something which has been essential to coming up with a coherent design - and which lead to various decisions being made for me. Now I'd like to talk about the pattern match story. For the purposes of this post, I'm going to term "exploding patterns" to mean pattern-match errors that can be provoked by appropriately chosen input.
After doing all the QuickCheck tests, you might think I was quite confident of avoiding exploding patterns, but I was not. The QuickCheck tests all work with lists of 25 characters - perhaps the empty list crashes the function? Perhaps only some really rare input? After all that loose random testing, the time came for some serious proof!
Taking a look at the code, I have used 5 incomplete functions (fromJust, init, head, last, tail) and one explicit non-exhaustive pattern-match. Can any of these explode? Turning the code over to my pattern-match checker, it quickly (2 seconds, 2Mb later) informs me that it can prove none of the incomplete functions crash, but it can't prove the explicit pattern match. So lets take a look:
f x | isWindows = ...
    | isPosix = ...
Oh dear, it won't crash currently (isPosix = not isWindows), but you can certainly see that these patterns aren't going to stay complete if we add another operating system. Quick as a flash, we replace "isPosix" with "otherwise", and run the checker again. Yay, no exploding patterns remain.
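As a sketch (these definitions are hypothetical stand-ins, not the real library code), the before and after look like:

```haskell
-- Hypothetical stand-in for the real platform test:
isWindows :: Bool
isWindows = False

-- Before: complete only while isPosix = not isWindows happens to hold.
sepBefore :: Char
sepBefore | isWindows     = '\\'
          | not isWindows = '/'

-- After: 'otherwise' makes the final alternative provably exhaustive.
sepAfter :: Char
sepAfter | isWindows = '\\'
         | otherwise = '/'

main :: IO ()
main = print (sepBefore, sepAfter)
```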
So there we have it: in under 5 minutes, I was able to prove my entire library safe. I can still use all the fun functions like head, but in a safe manner. It turns out the library was always safe, but with one minor modification to improve the code, I can be sure that things won't go wrong - the released 1.0 has this modification. I have the guarantees of safety, without much work at all. Not only that, but as the library evolves, I'm only 2 seconds away (literally!) from pattern-match safety; compared to my QuickCheck tests, which take around a minute, this is very cheap verification.
So how come that within an hour of releasing 1.0 someone had already replied with a bug report about a crashing pattern? The answer is simple: Setup.hs in Cabal has code in it which could (at the time of release, but not anymore) cause a pattern explosion. I didn't write, or validate, the Setup.hs script - but it just goes to show that pattern-match errors can easily slip through!
Wednesday, April 11, 2007
Redesigned Website
I've redesigned my main academic website: http://www-users.cs.york.ac.uk/~ndm/
Why did I decide to redesign my web page? Because I realised that:
- I was being put off releasing code because adding it to my website was a pain
- I wanted tags
- The number of projects had exceeded the vertical space in some browsers
- I wanted better cross-site linking - for example all downloads should appear in one place
The site was previously written in PHP as that is the language supported on the server, but I don't know PHP, and from what I do know, I hate it. The solution was obviously to rewrite the site using Haskell - but how?
I initially looked towards a web framework, but given I can't install anything on the server, that makes it a bit hard. The next choice was to write a preprocessor that takes the source code and generates the end result HTML. This was what I decided to do.
I now write each page in HTML, but in a modified form of HTML. For example, the code for the Catch page is at: http://www-users.cs.york.ac.uk/~ndm/catch/catch.html - do view source to see the original. The start is a list of properties, which include tagging information, slides/papers etc and names. The rest of the document is simply the content.
The preprocessor then gives an appropriate beginning and end to each document, fills in some magic tags I have defined, and outputs a complete document. All slides/tags etc are collated and placed in central files - for example http://www-users.cs.york.ac.uk/~ndm/downloads/
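A minimal sketch of the magic-tag idea (the tag syntax and names here are made up, not the ones the site actually uses):

```haskell
import Data.List (isPrefixOf)

-- Hypothetical: substitute %TAG% markers in a page for their values.
fillTags :: [(String, String)] -> String -> String
fillTags tags = go
  where
    go [] = []
    go s@(c:cs) =
      case [ (val, drop (length key) s)
           | (key, val) <- tags, key `isPrefixOf` s ] of
        (val, rest):_ -> val ++ go rest
        []            -> c : go cs

main :: IO ()
main = putStrLn (fillTags [("%TITLE%", "Catch")] "<h1>%TITLE%</h1>")
```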
The code weighs in at 350 lines, and can be found at http://www-users.cs.york.ac.uk/~ndm/Main.hs - it's overly long and could do with some proper abstraction being applied - I just hacked it together as required. However, it's fairly simple and generates a nice result without too much effort. Adding a new page is now as simple as creating a new file, and all the indexes are automatically maintained.
I am not sure if this approach will suit everyone, or even if it will suit anyone but me, but I think this is a very nice way to create a site which is easily extendable and very consistent, while keeping the overhead on each page down.
I am sure there is a common tool underlying this, just waiting to get out, but unless anyone else expresses an interest in this approach, the tool will remain buried in my specific needs.
Friday, April 06, 2007
A beautiful proof
I got told a rather beautiful proof the other day, and wanted to write it down.
Question: Given any two irrational numbers x and y, does x^y ever evaluate to a rational number?
Answer: Yes.
Proof: Given the definitions:
r = 2^0.5 (square root of 2, known to be irrational)
z = r^r
Either:
- z is rational. Set x=r, y=r and you have a proof.
- z is irrational. Set x=z, y=r. x^y = z^r = (r^r)^r = r^(r*r) = r^2 = 2, which is rational.
This proof does not provide a particular value of x and y, but merely shows there must be one.
Friday, March 30, 2007
Eq but not Ord
I recently wondered, which data types in the Haskell library have instances for Eq, but not for Ord? Of course, everything with Ord is obligated to provide Eq, but how many provide only Eq?
The answer is 82:
Action, Body, BuildInfo, Callconv, Clause, Compiler, CompilerFlavor, Con, ConfigFlags, Constr, ConstrRep, Counts, DataRep, Dec, Errno, Exception, Executable, Exp, Extension, Fixity, FixityDirection, Foreign, FunDep, Guard, Handle, HandlePosn, HsAlt, HsAssoc, HsBangType, HsConDecl, HsDecl, HsExp, HsExportSpec, HsFieldUpdate, HsGuardedAlt, HsGuardedAlts, HsGuardedRhs, HsImportDecl, HsImportSpec, HsLiteral, HsMatch, HsPat, HsPatField, HsQualType, HsRhs, HsStmt, HsType, IOArray, IOErrorType, IOException, IORef, Lexeme, Library, License, Lit, LocalBuildInfo, Match, MVar, Node, Orient, PackageDescription, PackageIdentifier, Pat, Range, Safety, SockAddr, Socket, SocketStatus, StableName, StablePtr, STArray, State, Stmt, STRef, Strict, Tree, TVar, TyCon, Type, TypeRep, URI, URIAuth
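IORef, from the list above, illustrates why: its equality is reference identity, and there is no meaningful total order on references, so an Ord instance cannot sensibly exist:

```haskell
import Data.IORef

main :: IO ()
main = do
  a <- newIORef (0 :: Int)
  b <- newIORef (0 :: Int)
  -- (==) on IORef compares identity, not contents, so a /= b even
  -- though both hold 0; "compare a b" would have no sensible meaning.
  print (a == a, a == b)
```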
Wednesday, March 28, 2007
let vs letrec
I've been working hard on Catch for the last few months, and will hopefully have something to release to the world very soon - a few people have alpha copies already. As part of that I've been playing with Yhc Core lots and lots. At one stage I introduced a bug into Yhc, by accident, due to the difference between let and letrec, so I thought I'd describe my mistake for the whole world to see.
Haskell has a keyword called let, but some functional languages have both let and letrec - Haskell's form (although called let) is actually letrec. What's the difference? Consider the fragment of code:
let x = 1 in (let x = x in x)
If we are dealing with let, the answer is 1. If we are dealing with letrec, the answer is a black hole of non-termination - _|_. The rule is that in a letrec you introduce the names you are defining before computing their values (so the x = x refers to itself), but with let you only introduce the names once you get to in (so the x = 1 is still in scope).
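To see Haskell's letrec behaviour without actually looping, rename the inner binder; the non-recursive reading then becomes expressible:

```haskell
main :: IO ()
main =
  -- Haskell's let is letrec, so the inner x below refers to itself,
  -- and forcing it would loop forever:
  --   let x = 1 in (let x = x in x)
  -- Renaming the inner binder recovers the plain-let answer:
  print (let x = 1 in (let y = x in y))
```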
Often people use let in a non-recursive way, but occasionally the recursion is fundamental:
repeat x = let res = x : res in res
Here repeat is being defined with letrec - the res on the LHS is used on the RHS. This enables only one (:) cell to ever be created, saving memory.
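The sharing is easy to observe: however long a prefix you demand, the single self-referential cons cell supplies it all:

```haskell
-- repeat defined via a recursive let: res is one (:) cell
-- whose tail points back at itself.
repeat' :: a -> [a]
repeat' x = let res = x : res in res

main :: IO ()
main = putStrLn (take 5 (repeat' 'a'))
```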
Yhc.Core has let, which is recursive - unlike GHC Core which has both let and letrec. Yhc.Core also provides a method "removeRecursiveLet" which removes all recursive lets, replacing them with equivalent code.
So now we come to the bug: I recently changed Yhc.Core so that when desugaring FatBar/Failure (from complex pattern matches) we introduce let bindings, instead of inlining the expression multiple times. This can lead to a substantial reduction in code size for bigger programs. The basic idea is that previously, when a Failure was spotted, we replaced it with an expression; now we replace it with a variable which we bind to the expression. If there are multiple failures with the same expression, we end up with less code.
Given the function:
binAdd False as [] = as
binAdd False [] (b:bs) = b:bs
This desugared to:
Num.binAdd v248 v249 v250 =
    let v_fail = error "Pattern match failure"
    in case v248 of
        Prelude.False ->
            let v_fail = case v249 of
                             Prelude.[] -> case v250 of
                                 (Prelude.:) v251 v252 -> (Prelude.:) v251 v252
                                 _ -> v_fail
                             _ -> v_fail
            in case v250 of
                Prelude.[] -> v249
                _ -> v_fail
        _ -> v_fail
As you might be able to see, there are two v_fail's here which are part of a letrec, when I really wanted a let. Adding a simple numbering scheme to the v_fail's made them refer to different bindings, and the problem was solved.
Wednesday, March 21, 2007
HsExcel
I was reading JFP this weekend, and read the paper "Spreadsheet functional programming" by David Wakeling. As a Windows user who has done a lot of work with Excel, I thought it was great. People before have either tried to pervert Excel into a functional language, or write a new spreadsheet - and given that Open Office spreadsheet sucks totally - this seems to be a hard challenge.
Despite being a lovely concept, I thought the implementation described missed the mark in a few places. These are not academic reasons, but reasons of practicality for a production system and integration with the underlying Excel. I fully expect that the reasons stem from missing a deep understanding of Excel - something few academics would be likely to master. However, my Dad is a Chartered Accountant, and I've been developing systems based around Excel for over 10 years, so can fill in the Excel half.
Spreadsheet Functional Programming
Before describing how I would implement the ideas in the above paper, it's best to recap the paper a bit, for those people who can't be bothered to read it. (Note: the above paper is very accessible, even a novice programmer should be able to get something from it - so skip this summary and read the paper!)
The basic idea is that Excel should be able to call Haskell, so:
=Haskell("1+2")
should call off to Haskell and evaluate 1+2, hopefully returning 3. Naturally that sort of thing could be done in Excel much more conveniently, so the strength should be in definition of proper Haskell functions within Excel:
f :: Int -> Int -> Int
f a b = (a + b) `div` 2
=Haskell("f 1 2")
(imagine a more complex example for f, I'm too lazy to think of one, but the paper has one)
Obviously, =Haskell should go in the Cell you want the result to appear in, but what about the definition of f? This paper chooses to place definitions of f in comments, a neat solution that keeps them in the document, but out of the small cells, which would be inappropriate.
This is extended by allowing users to write references such as A1 in their Haskell strings. Unfortunately this shows up a weakness - now dragging formula around does not automatically update their references as it normally does in Excel, since the reference is hidden from Excel inside a string.
The final point to mention is that the evaluation is done with Hugs, by writing out a static file and running it - something I believe could end up being too slow for practical use. Whether it is or not isn't something I can comment on, having not tried it in real life.
HsExcel
Reading the paper, there are lots of things which are done spot on, and seem obvious in retrospect. The best ideas always seem good in retrospect, so I consider this a very good thing. Unfortunately I see 3 issues:
- Writing code in comments looks highly unpleasant
- Haskell is in strings, which enforces separation between Haskell and Excel
- Not fast enough
I have come up with a design which I believe addresses all of these issues, which I have named HsExcel. I have not implemented it, so cannot guarantee it would really work, but I suspect it would not be that much of a challenge. I have no free time currently, but it is on my todo list (somewhere).
The first issue I shall tackle is writing code in comments. A much nicer solution would be to add a tab to Excel which contained Haskell code in a nice big box, perhaps also allowing Haskell code in comments if people found it preferable for small snippets. This can be done using Excel dialog panes with a bit of VB magic.
Next is the issue of Haskell in strings, instead of =Haskell("f 1 2") I would write:
=hs("f",1,2)
Here Excel will evaluate 1 and 2, and pass them to the Haskell function f. If the 1 was instead A1, then it would behave like a normal A1 in Excel. Perhaps allowing =Haskell as well would be nice, but I think the common case would be =hs. This also allows better flexibility: Excel and Haskell functions can be nested and intermingled. I also think this will make cell dependency updating work (an issue not covered in the paper).
Once this has been done, Haskell could operate on any Excel values, including Cell Ranges etc. Unfortunately the paper is silent on the issue of marshaling Excel data to Haskell, but I suspect it could be done relatively easily.
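As a hedged sketch of what that marshaling might look like (the Value type and the textual protocol are assumptions of mine, not the paper's):

```haskell
-- Hypothetical: Excel cell contents arriving as strings from VBA.
data Value = Number Double | Text String | Boolean Bool
  deriving (Eq, Show)

parseValue :: String -> Value
parseValue s
  | s == "TRUE"  = Boolean True
  | s == "FALSE" = Boolean False
  | [(d, "")] <- (reads s :: [(Double, String)]) = Number d
  | otherwise    = Text s

main :: IO ()
main = mapM_ (print . parseValue) ["3.5", "TRUE", "hello"]
```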
One nice feature to add would be that =hs_f(1,2) works, just like an Excel function. I'm not sure if VBA is permitted to add primitive functions to Excel, or just addons. I'm also not sure if addons can dynamically change the available functions. If this was the case, then all entries in the Haskell source could become first-class Excel functions.
The final question is the issue of speed. Currently the implementation runs the interpreter from scratch each time, writing out fresh values for the changed cells. I think spawning an interpreter once and talking to it over a pipe for each request would give higher performance. Once this solution was adopted, either GHCi or Hugs could be used, potentially allowing for a further speed up. The author comments that this is hard to do using VB's Shell command, and indeed it is. However, it is possible to prod standard Win32 system calls from VB, albeit with a great deal of work, which could get this working properly.
Conclusion
The paper presented a lot of good ideas, and I think a bit of user interface polish could bring them to the fore. HsExcel could be a useful tool - I certainly have projects I could use it for. I suspect that it would be under a week's work for anyone already fluent in Haskell and (more importantly) Excel/VBA. Any volunteers?
Saturday, March 10, 2007
Describing Haskell to an Ada programmer
Yesterday I was asked a question about functional programming from someone in my department. This person is in the real time systems group, and their programming language is Ada (although they are not really a programmer).
The question I was asked was about cache locality, but that question was just a red herring inspired by a complete misunderstanding about what functional programming was. The impression they had was that it was just Ada without procedures. So I tried to explain "in a nutshell" what functional programming was. It's not an easy task!
I didn't go on about things like purity and referential transparency - these concepts might mean something to an experienced software engineer (they can see how it would make nicer interfaces) or a mathematician (more like maths), but to someone who does timing analysis, they don't.
So I wanted to try and get across a few of the core concepts:
1) No assignment (technically no destructive assignment, but that is the only type they know). The example they wanted me to show them "in a functional style" was:
x := 1;
x := x + 1;
In a functional style there are a couple of ways to do this:
x = 1
y = x + 1
Of course, order is irrelevant
y = x + 1
x = 1
Neither really captures the idea they were looking for. However, in a purely functional style we don't write silly stuff, so thanks to the purity we can just write:
y = 2
The great thing is that we can know that the answer is 2, because beta reduction is valid.
So after (probably failing) to dismiss destructive assignment, I moved on:
2) No sequencing, no "one thing then the other". This was again quite hard to get across. I gave the example of quicksort, showing that sequence was not necessary.
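For reference, the quicksort I had in mind is the usual short Haskell version:

```haskell
-- Quicksort as two list comprehensions - no sequencing required.
qsort :: Ord a => [a] -> [a]
qsort []     = []
qsort (x:xs) = qsort [a | a <- xs, a < x]
                 ++ [x]
                 ++ qsort [a | a <- xs, a >= x]

main :: IO ()
main = putStrLn (qsort "neil")
```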
I then challenged them to write quicksort in Ada, which is really quite hard to do. Obviously to do a quicksort only on Integer takes loads of lines before you even get to the sorting bit. Since I had been teaching generics earlier, this showed how superior Haskell is for generics.
I think I was quite a bit less successful at explaining this one. The only way you can show sequence is not necessary is by writing an OS, a compiler, a web server etc in Haskell - all of which have been done. Unfortunately I only had a small piece of paper so couldn't write any of those.
3) Then we got back to the question of timing analysis - if you spend all day doing timing analysis you start to believe that everything else is less important. I got asked if my PhD in "software engineering" was just writing programs. I took great pains to explain that programming language research is not writing an interface to a database in Java.
Anyway, if you want to spend all day doing timing analysis, Haskell is probably not the right language to pick :) So at this point I decided to screw with their mind:
I asked them to predict how long this would take:
sort "neil"
They guessed O(n log n), I said O(1). If you never need the result of sort, it never gets executed - take the example:
if null (sort "neil") then 1 else 0
The idea of lazy evaluation was starting to mess with their head, so they decided to try and rephrase the question: since they are interested in the worst-case time, can you treat it as the time to fully evaluate all the components?
The answer is a definite no!
[1..]
The above expression takes an infinite amount of time to fully evaluate.
[1..] !! 3
Therefore under their interpretation, this takes an infinite amount of time, but it doesn't. It's much quicker than that.
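Both claims are easy to demonstrate; laziness evaluates only what the surrounding context demands:

```haskell
import Data.List (sort)

main :: IO ()
main = do
  -- null forces only the first constructor of the result,
  -- never the full sorted order.
  print (null (sort "neil"))
  -- An infinite list is fine when only a finite prefix is demanded.
  print (([1 ..] :: [Integer]) !! 3)
```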
I don't think they understand many of the concepts of functional programming, or how to use it in practice, but I think they do now realise it is not just "another programming language" - it's an entirely separate entity.
Tuesday, February 13, 2007
A missing Haskell feature?
Haskell 98 + Hierarchical modules is pretty much feature complete as far as I'm concerned. I realise that most people would demand to add higher ranked types and multi-parameter type classes, but the only extension I'm crying out for is pattern guards.
However, there is one operation that I think would be useful - pattern equality.
Consider the data type:
> type RegExp = [Atom]
> data Atom = Atom String | Star String
Now imagine you want to check that all atoms in a regular expression are Star's. Easy enough in Haskell:
> all isStar r
Unfortunately the definition of isStar - something that is conceptually very simple - is rather verbose:
> isStar (Star _) = True
> isStar _ = False
The reason is that in Haskell pattern matching is used to control program flow, not as an expression. One extension I often find myself wanting is some kind of "pattern match equality". Stealing the ~= operator for the moment, imagine:
> all (~= Star _) r
The ~= operator can be seen as pattern equality. It can be given a very simple semantics:
> x ~= y === case x of {y -> True; _ -> False}
Of course, now you've got pattern matching in there, it's natural to allow fuller pattern matching:
> all (~= Star x | null x) r
Note that the original pattern match desugaring still allows this to be valid.
What do people think of this suggestion? Do people think it would be generally useful?
Disclaimer: This is not a carefully thought out feature proposal. Stealing ~= for syntax is completely unacceptable, something like <- would have to be chosen, if the grammar rules would permit it.
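For comparison, the proposed desugaring can be written out by hand today - a lambda wrapping a case expression gives exactly the behaviour of the hypothetical `all (~= Star _) r`, just more verbosely (the data type is repeated from above):

```haskell
type RegExp = [Atom]
data Atom = Atom String | Star String

-- What "all (~= Star _) r" would desugar to in today's Haskell.
allStars :: RegExp -> Bool
allStars = all (\x -> case x of Star _ -> True; _ -> False)

main :: IO ()
main = do
    print (allStars [Star "a", Star "b"])   -- True
    print (allStars [Star "a", Atom "b"])   -- False
```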
Wednesday, February 07, 2007
Logical implication in Haskell
Logical implication is a common enough operator, usually written "a => b" for a implies b. Haskell doesn't feature a => operator, it would be a syntax error because that symbol is reserved for class constraints. Both QuickCheck and SmallCheck feature incompatible ==> operators for implication.
But Haskell does provide a boolean implication operator, namely (<=)! Yes, by writing the implication symbol backwards, you get standard forward implication!
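A quick truth table confirms it; `implies` is just a readability alias I've introduced here:

```haskell
-- (<=) on Bool is implication because False < True: the only failing
-- case of implication, True implies False, is exactly where (<=) is False.
implies :: Bool -> Bool -> Bool
implies = (<=)

main :: IO ()
main = mapM_ print
    [ (a, b, a `implies` b) | a <- [False, True], b <- [False, True] ]
```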
Sunday, February 04, 2007
Haskell and Set Theory
As part of my PhD, one thing I've been doing recently is specifying and working with set theory operations. Haskell is the perfect language for this, because in most cases the set theory can be implemented almost directly. Let's take a few examples to show how neat Haskell is at this.
I've done all the set theory in LaTeX - I guess most people who know set theory probably know LaTeX as well! I've also used xs/ys for sets, and x/y for elements from the sets. Although this isn't the way most mathematics would have been written, it does actually make it a bit more readable with some explicit typing :)
Subset
Subset is defined as:
xs \subset ys = \forall x \in xs, x \in ys
subset xs ys = all (\x -> x `elem` ys) xs
subset xs ys = all (`elem` ys) xs
The first version is the direct translation. We can of course make Haskell a bit closer to set theory with:
forAll = flip all
Now we have:
subset xs ys = forAll xs (`elem` ys)
That is a pretty direct translation of the mathematics.
Cross-product
One operation I wanted to specify recently was:
merge xs ys = { c | \exists x \in xs, \exists y \in ys, c = x <> y}
(where <> is an operation I had defined myself)
This time you can use Haskell's list comprehensions to obtain a translation:
merge xs ys = [x <> y | x <- xs, y <- ys]
In this case the Haskell is actually shorter than the corresponding mathematics.
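Both translations run unchanged as a program; since <> was the author's own operator, ++ stands in for it here:

```haskell
-- subset and merge exactly as derived above, with ++ standing in
-- for the custom <> operator.
subset :: Eq a => [a] -> [a] -> Bool
subset xs ys = all (`elem` ys) xs

merge :: [[a]] -> [[a]] -> [[a]]
merge xs ys = [x ++ y | x <- xs, y <- ys]

main :: IO ()
main = do
    print (subset "ei" "neil")          -- True
    print (merge ["a","b"] ["x","y"])   -- ["ax","ay","bx","by"]
```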
Conclusion
I could give more examples of where the mathematics and the Haskell overlap closely, but they are almost too easy.
Almost every piece of set theory that I've tried to define in Haskell comes out looking very much like the original. This closeness allows me to work in mathematics, and then translate things to Haskell easily without worrying about having made mistakes in the translation.
The facilities of Haskell that make the set theory translation so direct include:
- Higher order functions (i.e. all, forAll)
- Sections (`elem` xs) being equal to \x -> elem x xs
- Infix operators such as + and `elem`, along with priorities
- Beautiful syntax
- Lazy evaluation can make some operators more efficient (but is not incredibly important)
Wednesday, January 31, 2007
Writing code by types
In the past few days I've found that while I'm writing the code, the type checker is the one deciding what the code should be. To take an example, I was recently writing some SYB code using gfoldl:
> gfoldl :: (forall a b . Data a => c (a -> b) -> a -> c b) -> (forall g . g -> c g) -> a -> c a
I decided what c had to be:
> newtype C x a = C ([x], [x] -> (a, [x]))
(where "C x" was the instantiated type of c)
From there I didn't have much of a clue what I was doing, but by the time you've got that far, there is virtually only one type-correct program that can possibly work! Finding what that program was took a while, but once you have got this far you can start getting the type checker to write the code for you.
The first step is to write the basic definition, in such a way that Haskell will give you error messages if you don't get it right:
> myexpr :: a -> C x a
> myexpr = gfoldl cont base
>
> cont = undefined
> base = undefined
At this point the Haskell type checker can already check that you haven't made a mistake anywhere else. Once all is good you can start trying to "fill in the blanks". The first step is to tackle base:
> base :: a -> C x a
> base = undefined
Note that no code has been written, but you've checked that the type is valid. A :type on base would have given you the same information, assuming base was at the top level of the program, but this step is useful to check.
Next you can start filling in the undefined bits:
> base :: a -> C x a
> base a = C ([], \y -> (a,y))
By working this way it is relatively easy to build up a correct program, at each stage using small steps and regular type feedback to check you are going in the right direction.
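The finished base piece can be checked in isolation; runC below is a small helper I have added purely to observe the result, not part of the original code:

```haskell
-- The C type and base exactly as derived above; runC is only for testing.
newtype C x a = C ([x], [x] -> (a, [x]))

base :: a -> C x a
base a = C ([], \y -> (a, y))

runC :: C x a -> [x] -> (a, [x])
runC (C (_, f)) = f

main :: IO ()
main = print (fst (runC (base 'q') "leftover"))   -- 'q'
```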
The second time I used this technique was when I was replacing one data structure with another, with some similar operations and some completely different ones. It wasn't as simple as changing the imports, but thanks to regular feedback from the type checker this went from being an exercise in thought to just going around mopping up the errors.
Programming like this is very powerful, giving lots of hints about what the user needs to do next, but would really benefit from some intelligent development tools. Epigram has some similar ideas built into the development environment, but Haskell doesn't.
Sunday, January 14, 2007
Debugging and crying
I'm now back from Hac 07, and I was sharing a room with Pepe, the GHCi Debugger Summer Of Code guy. I got to hear his presentation on the debugger he has completed, and got to see it "in action". He made some interesting comments - that Hat provides a much richer framework, and that when he originally tried to get Hat working he even went as far as replacing his OS (!) to try and get Hat working - but with no luck. He demonstrated his debugger, and the next day, in front of about 10 people I demonstrated Hat.
GHCi Debugger Demonstration
The GHCi debugger is much newer, and not officially released yet. Pepe started off with the demonstration, giving a few examples, but at various points I asked "what if?" questions, all of which he went and did. The debugger worked flawlessly, and produced useful information. It didn't produce as much information as Hat (particularly stack traces were missing), but it was solid and reliable. I only had one major gripe with it - that :quit was used for continuing from a breakpoint - but the next day a patch went in fixing that.
Hat Debugger Demonstration
In return, I tried to go through a pre-scripted demonstration of Hat. However, since first scripting it I had installed GHC 6.6 on the machine, so the compilation took about 100 times longer than I remembered - no idea why. When I finally had a Hat file I was able to view it in Visual Hat. I then tweaked the file as people asked various questions - unfortunately that didn't work so well. Various things went wrong. I added a head call, to invoke a pattern match failure. Indeed, it did invoke a pattern match failure - in Hat itself! I tried some other stuff, files kept appearing in the wrong places. Stack traces were coming out not quite as you'd expect them. Everything went wrong.
Some of these issues may well be in the hat-make that I wrote recently and hackily, but others were more fundamental issues in Hat itself.
Conclusion
While Hat is without a doubt more powerful, it loses on reliability. Despite porting Hat to Windows, and using Hugs full-time, GHCi will be my debugger of choice. I don't use Hat because when I've got a bug to fix, I know getting through Hat will take at least an hour, and won't always be possible (Data.Map for example). The GHCi debugger is maybe 10% of Hat in terms of power, but 100% reliability looks like it will make up for this.
Friday, January 12, 2007
Hac 07: Deferred binary
As part of Hac 07 I have written a Deferred binary library. It's not released yet, since I'll need to change some bits, but I have managed to write a pickle/unpickle combinator which is pretty minimal. It can save to a file and load from a file in binary.
The first step is to take your data structure:
> data Computer = Laptop {weight :: Int, wireless :: Bool} | Desktop {software :: [String]}
Then you need to write your combinators:
instance BinaryDefer Computer where
    bothDefer = serial
        [\ ~(Laptop a b) -> unit Laptop << a << b
        ,\ ~(Desktop a) -> unit Desktop << a]
The library then gives you save and load operations:
putDefer :: BinaryDefer a => Handle -> a -> IO ()
getDefer :: BinaryDefer a => Handle -> IO a
This gives you all the power to save and load your data structure. No need to manually assign tags, no need to write get/put versions, just one set of combinators. Needless to say, this was pretty tricky to set up!
But this is all stuff that belongs in a binary library, so what is a deferred binary library? Well perhaps there are a lot of software packages on a desktop, and usually the program doesn't care about them. Maybe 90% of the file will be the software packages, so 90% of the time will be wasted in reading the file. Of course, sometimes the software is queried, so it still has to be available. Using deferred binary, you can change the combinator to say which fields are loaded in a deferred manner:
\ ~(Desktop a) -> unit Desktop <<~ a
That extra ~ is the single change. Now the file format adapts to store the software list in a different place and it is read only when needed. Much more efficient, yet still the same simple underlying interface.
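The effect of deferral can be imitated in plain Haskell with lazy IO - this is only a conceptual sketch of the idea, not how the library actually works (it stores an offset inside the one database file instead of using a second file):

```haskell
import System.IO.Unsafe (unsafeInterleaveIO)

-- Conceptual sketch: the expensive field is not read until demanded.
loadSoftware :: FilePath -> IO [String]
loadSoftware path = unsafeInterleaveIO (fmap lines (readFile path))

main :: IO ()
main = do
    writeFile "software.txt" "firefox\nemacs\nghc"
    sw <- loadSoftware "software.txt"   -- returns immediately, file untouched
    putStrLn "database opened"          -- cheap queries never pay for the list
    print (length sw)                   -- only now is the file actually read
```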
This is the binary library that Hoogle 4 will be using!
Hac 07: Searchable haddock index files
To see what I've been doing at Hac 07, go to: http://www-users.cs.york.ac.uk/~ndm/temp/haddock/doc-index.html
I have written a searchable index file for haddock, which adds very little file size, and has search-as-you-type features. It doesn't store a separate table of index names. It searches for prefixes only, in O(log n).
There was a version that did full substring searching in O(n), but unfortunately IE is too slow to do this for the base library. If you want that you are still going to have to use Hoogle.
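The O(log n) prefix search is easy to sketch (in Haskell here, rather than the JavaScript actually shipped with the patch): binary-search a sorted index for the first entry not less than the query, then test whether the query is a prefix of it.

```haskell
import Data.Array
import Data.List (isPrefixOf)

-- Sketch of search-as-you-type over a sorted index in O(log n).
firstMatch :: Array Int String -> String -> Maybe String
firstMatch arr q = go lo (hi + 1)
  where
    (lo, hi) = bounds arr
    -- Invariant: the answer, if any, is the first entry in [l, h) >= q.
    go l h
      | l >= h      = check l
      | arr ! m < q = go (m + 1) h
      | otherwise   = go l m
      where m = (l + h) `div` 2
    check i
      | inRange (bounds arr) i, q `isPrefixOf` (arr ! i) = Just (arr ! i)
      | otherwise = Nothing

main :: IO ()
main = do
    let idx = listArray (0, 3) ["foldr", "map", "mapM", "zip"]
    print (firstMatch idx "map")   -- Just "map"
    print (firstMatch idx "qq")    -- Nothing
```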
These patches have been bundled up and sent off to Simon Marlow, so hopefully they'll be appearing in a Haddock near you soon!
Thursday, January 04, 2007
Does Haskell need macros?
Haskell has higher-order functions, and lazy evaluation, but not macros. Does Haskell need macros? For the purpose of this, I'm talking about macro in the "a little piece of code that expands out" sense, rather than the conditional compilation that macros quite often get used for in C++.
If Haskell needs macros then there must be some benefit to adding them. The reason that a lot of languages need macros is because they have strict evaluation. Consider trying to write a function that selects one of two arguments, based on the first condition variable - some sort of if:
iff a b c = if a then b else c
You can write this in Haskell. You can also write this in VB (and it is pre-defined), but it doesn't do what you might expect in VB. Haskell's lazy evaluation is a big win here, opening up entirely new opportunities.
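Because both branches are passed by need, the untaken branch is never evaluated - something a strict language can only get with a macro or a built-in special form:

```haskell
-- A user-defined if: no macro needed thanks to lazy evaluation.
iff :: Bool -> a -> a -> a
iff a b c = if a then b else c

main :: IO ()
main = print (iff True (1 :: Int) (error "never evaluated"))   -- prints 1
```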
So what are the reasons to add macros?
Speed
This is just a silly argument. First off, a pure function can always be inlined away, so performance is not an issue. A compiler might choose not to inline a function, but you can usually persuade it that it should with pragmas etc. And finally, there is the real possibility that you (the author of the code) don't have a clue what should be inlined, and are actually throwing away speed on premature optimisation.
Generate code
One thing that macros can do is generate definitions; functions can only be used to generate expressions (essentially a function is an expression with some pre-defined structure). You can't define a Haskell function that takes some data and generates classes/instances etc. For example, imagine that you want to define lots of constructors for binary operations. Instead of:
data Expr = Add Expr Expr
| Mul Expr Expr
...
Here a macro might let you write something like:
data Expr = binary(Add) | binary(Mul) ...
Of course, this is something you really can't do with Haskell as it stands. However most of the ways you might use a macro like this are covered by Haskell. There are type synonyms, which may actually let you do the above trick with GADTs (I haven't tried, but it's certainly feasible they could work like that, even if they currently don't). Classes have defaults, which is another situation where otherwise this functionality might be required.
For most purposes generating code with macros is not about doing clever stuff, but about cutting out repetitive stuff. Haskell has (by design) cut out most of the repetitive tasks, so no longer requires macros. For truly repetitive tasks, Template Haskell can be used, or things like Drift and Scrap Your Boilerplate. All of these techniques reduce the number of places where macros might otherwise be required.
LHS of a pattern match
One place where expressions are present, but functions are not allowed is on the left hand side of a pattern match. For example:
f 3.14159265358979 = 2.71828182845905
Of course, it's silly to write this using hardcoded values, what if either of these constants change?
f pi = exp 1
Unfortunately without macros, this is not allowed in Haskell since pi is a function (a constant arity function, but still a function). You'd instead have to write:
f x | pi == x = exp 1
Not too painful, but what if you are trying to match a LHS with a free variable in it:
f (Neg (Neg (Neg a))) = Pos $ Pos $ Pos a
Here you are matching a deeply nested constructor, and on the right have to explicitly bracket the terms. However on the right you are able to use the function ($) which due to precedence issues removes the need for brackets. One thing you can't do however is:
f (Neg $ Neg $ Neg a) = Pos $ Pos $ Pos a
If ($) were a macro, however, this would be valid.
Haskell has no feature here that can reduce the redundancy in the pattern match.
But while macros would be useful here, how far would standard C style macros take us? Let's consider defining a useful auxiliary:
apply 0 f x = x
apply n f x = f (apply (n-1) f x)
Now we can rewrite this as:
f (Neg (Neg (Neg a))) = apply 3 Pos a
But we really can't write:
f (apply 3 Neg a) = apply 3 Pos x
However, its entirely possible to come up with a consistent semantics for this, so maybe one day this might be something people decide is useful.
For the moment the lack of macros on the LHS is occasionally annoying, but generally is not a feature that anyone is crying out for. The LHS is rarely sufficiently complex for macros to show great benefit.
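On the right-hand side the auxiliary works today; it is only the left-hand side use that would need macro-style expansion. A runnable version (the Term type and its constructors are invented here for illustration):

```haskell
data Term = Pos Term | Neg Term | Var String deriving Show

-- apply n f x composes f with itself n times.
apply :: Int -> (a -> a) -> a -> a
apply 0 _ x = x
apply n f x = f (apply (n - 1) f x)

-- Legal today: apply on the RHS only; the LHS must spell out the nesting.
f :: Term -> Term
f (Neg (Neg (Neg a))) = apply 3 Pos a
f t = t

main :: IO ()
main = print (f (Neg (Neg (Neg (Var "a")))))   -- Pos (Pos (Pos (Var "a")))
```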
Type checking with rank-2 types
While standard Haskell (mainly - ignoring MR and seq, for those who care) has the fold and unfold law, i.e. you can inline a function and then contract it back, once you introduce rank-2 types you lose this property. Unfold still holds, but fold doesn't. This leads to lots of typing problems for things like runST which present rank-2 types to the user.
While macros would paper over the cracks here, the real solution probably lies in the type system, which I believe GHC has in various flavours of impredicative polymorphism. The other solution is not to use rank-2 types, something I have been happily avoiding for years now.
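The classic illustration is runST itself: direct application type-checks fine, but abstracting the application through a helper (the fold step) is rejected without extra type-system support:

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad.ST
import Data.STRef

-- runST :: (forall s. ST s a) -> a   (a rank-2 type)
ok :: Int
ok = runST (do r <- newSTRef 41; modifySTRef r (+ 1); readSTRef r)

-- Unfolding is fine, but re-folding through a helper breaks inference:
--   app f x = f x
--   bad = app runST (newSTRef 1 >>= readSTRef)   -- rejected
-- because app would need an impredicative instantiation.

main :: IO ()
main = print ok   -- 42
```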
Conclusion
Despite macros being useful in languages like Lisp, in Haskell their usefulness is probably minimal. Lazy evaluation is often a substitute for macros, and where other languages require macros to define domain specific languages, Haskell solves these problems naturally.
Macros are quite a painful cost to a language, they reduce the assumptions that people can make and require people to play hunt the macro. Haskell really doesn't need them!
Saturday, December 30, 2006
End of year update
Since most people are doing a year summary, I thought I should do one. Then I remembered that I can barely remember what I did yesterday (for example today I went in to work only to discover it was a Saturday...). So instead I'm going to try and predict what I'll do in 2007, with the proviso that it's a complete pile of fiction, but some of it might actually happen.
PhD + Catch
2007 should be the year I complete my PhD. It's certainly the last year I get any funding for my PhD. Along with completing my PhD, I also want to release a really good version of Catch - the Haskell pattern match checker. I've been working away on this for over 2 years now, so would like something I can point at. I'm hoping Catch will be out by March, but I realise I said "a few months" at least 9 months ago now - it turns out the Halting Problem is quite hard...
Hoogle 4
I think Hoogle 4 has been knocking around for about 8 months now. I'd actually like to get this finished, and make it the main version of the website. This brings a complete polishing of Hoogle. Hoogle seems to be a well used tool, and hopefully Hoogle 4 will just make it better at everything and much more reliable.
Yhc
One project which I really want to polish off and put a release out of is Yhc. Yhc has lots of really exciting features, and is evolving pretty fast. The Yhc Javascript stuff is particularly interesting - the demos just keep getting better and better. The Yhc.Core side of things is looking really impressive. My optimisations stuff is starting to look really good. If I ever finish WinHaskell, Yhc might become a very nice choice for the default compiler.
Parsing
I really really want to finally implement my parser. Unfortunately, just taking a look at the things above this makes that rather unlikely. If you add in distractions like Hat, WinHaskell, WinHugs etc then it looks a lot less likely. As such I'm going to make the target to formalise my state machine algorithms and write a state machine library in Haskell which I can use as the basis for my parser.
I've tried to keep this list as short as possible, to actually give me a chance of doing some of it. We'll see if that happens.
Saturday, December 23, 2006
Hoogle Progress, HsMan features
About 2 months ago Frederik Eaton posted "HsMan" to the ghc list, a program that, with one command line, could look up the documentation for a function. Now Hoogle 4 has that feature :)
$ hoogle +filepath combine /info /doc
System.FilePath.Version_0_09.combine :: FilePath -> FilePath -> FilePath
Combine two paths, if the right path isAbsolute, then it returns the second.
Posix: combine "/" "test" == "/test"
Posix: combine "home" "bob" == "home/bob"
Windows: combine "home" "bob" == "home\\bob"
http://www-users.cs.york.ac.uk/~ndm/projects/filepath/System-FilePath-Version_0_09.html#v%3Acombine
What the above says is: search the filepath module for the name combine, and, for the best-ranked result, display the Haddock entry for it (/info) and a link to further documentation (/doc).
It was a reasonable amount of work to add, but there is a good reason for adding this feature, which will become clear when the Hoogle 4 web version is released :)
Friday, December 22, 2006
Hoogle 4 progress
I'm slowly working through the things I want for Hoogle 4, and it's coming along quite nicely. A quick tour of the features I've already implemented:
Easier generation of Hoogle Databases
First create a Hoogle file using Haddock and Cabal:
> runhaskell Setup haddock --hoogle
Then convert it to a Hoogle binary database:
> hoogle /convert=base.txt
This will produce base.hoo, an optimised Hoogle database.
Faster text searches
Most searches are text searches. They used to linearly scan through the data; now they build a nice efficient index and hop directly to the relevant bit. This makes text searches over 100 times faster, and means they scale with the number of results and the size of the search text, rather than with the size of the database as previously.
As a result of having faster searches, I can now do multi-word searches - searching for two different words, then matching between them. For example, you can now search for "concat map" (or "map concat") and get concatMap returned.
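As an illustration of the idea (a hypothetical sketch using a sorted Data.Map of name suffixes; Hoogle's real database uses a trie, so treat every name and detail here as invented for illustration):

```haskell
import Data.Char (toLower)
import Data.List (isPrefixOf, nub, tails)
import qualified Data.Map as Map

-- Hypothetical index-backed text search: map every suffix of every
-- lower-cased name to its entries, so a lookup jumps straight to a
-- contiguous range of keys instead of scanning the whole database.
type Index = Map.Map String [String]

buildIndex :: [String] -> Index
buildIndex names = Map.fromListWith (++)
    [ (suf, [n]) | n <- names, suf <- init (tails (map toLower n)) ]

-- All names containing the given substring: in a sorted map, the
-- suffixes starting with the key form one contiguous block.
search :: String -> Index -> [String]
search key = nub . concat . Map.elems
           . Map.takeWhileAntitone (k `isPrefixOf`)
           . Map.dropWhileAntitone (< k)
  where k = map toLower key

-- Multi-word search intersects the per-word results, so "concat map"
-- (or "map concat") finds concatMap.
searchAll :: [String] -> Index -> [String]
searchAll ws ix = foldr1 intersect' [search w ix | w <- ws]
  where intersect' xs ys = [x | x <- xs, x `elem` ys]
```

Because the keys are sorted, each word's lookup is a range scan rather than a pass over everything, which is where the speedup comes from.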
Proper API
There is now a proper Hoogle API, which the Hoogle command line tool and web tool call out to. This will hopefully let people layer new things on top of Hoogle more easily.
Multiple package searching
You can now search within multiple packages at a time. For example "drop +base +filepath" will search both the base and filepath modules, as though they were one.
Restricting on module name
As requested you can now restrict searches by module name, for example "Prelude.map" and "map +Prelude" will both do a search for the map function that resides in the Prelude.
Proper parsing
The parser is written in Parsec now - much more reliable, and with much better error messages. Hopefully this should make it easier to extend Hoogle in the future as well.
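As a sketch of the style (a toy grammar I invented for illustration - not Hoogle's real one - where bare words are search terms and "+word" restricts the search):

```haskell
import Text.ParserCombinators.Parsec

-- A toy query type for the sketch: search terms plus "+" restrictions.
data Query = Query { terms :: [String], restricts :: [String] }
    deriving (Eq, Show)

-- One token: either a "+restriction" or a plain search word.
part :: Parser (Either String String)
part = (char '+' >> fmap Right word) <|> fmap Left word
  where word = many1 (noneOf " +")

-- A whole query: tokens separated by spaces, consuming all input.
queryP :: Parser Query
queryP = do
    xs <- part `sepEndBy` many1 space
    eof
    return $ Query [t | Left t <- xs] [r | Right r <- xs]

parseQuery :: String -> Maybe Query
parseQuery = either (const Nothing) Just . parse queryP "query"
```

One benefit of combinator parsing here is that the grammar and the data structure live side by side, which makes extending the query language a local change.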
The Future
I broke quite a lot with the complete revamp of the code, so the idea is to fix up what I broke and try and push a new release out relatively soon.
Thursday, December 21, 2006
Will produce patches for PhD
I've just got back to my home, and am doing darcs pulls against all the repos around to get my work up to date. I noticed Yhc has over 1000 patches, and then I pulled my primary PhD repo - and found I have 1626 patches! I have at least 100 more in a separate repo I use for one chunk of the code, and I was working in CVS for over a year before moving to darcs.
The Darcs repo for my PhD at home seemed dead, so I did a fresh darcs get, no checkpoints. It takes a while...
Friday, December 15, 2006
Advertising Haskell
A lot of posts on the mailing list at the moment seem to be about how we can promote Haskell to a greater audience. The idea of this post is to figure out why we might want to promote Haskell to a greater audience, which I think will give us a clue where to go. If Haskell was more popular then:
- More Haskell jobs would emerge
- People would find it easier to introduce Haskell in a work place
- Less C in the world
- More reliable software
- Elegance and beauty
Of course, there are also downsides to Haskell being more popular:
- Bigger communities aren't always as friendly
- Commercialisation will loose some benefits of Haskell (open source etc.)
- The volume of beginners will outweigh the experienced professionals
- Managers might push Haskell for entirely unsuitable things
So, the question: do we want Haskell to be more popular? At the moment I quite like the fact that when programming I can be much faster than other people, and have far fewer errors. It would however be nice if I could get a job doing Haskell at some point, rather than becoming a C monkey once more, which will be doubly painful after experiencing so much Haskell.
My personal view on how Haskell can be made more popular:
- Remove features, generalise features, do not add features. I don't understand/know rank-n types, Template Haskell, GADTs, impredicative polymorphism, arrows... Beginners don't want to know either! Getting the features right is good, which may mean adding features, but not necessarily.
- Continue to be polite and helpful
- Promote Haskell as a programming course
- Promote Haskell to experienced programmers looking for something new
- DO NOT promote Haskell in industry, if you force this on to people, they'll just discover we are not ready for this yet!
In my opinion, we do plenty of advertisements already. There is not nearly enough coding going on - Cabal could benefit from extra contributors, I know Yhc could, etc. When the tools are ready the users will come, and then if they don't come naturally, we can prod them. Until that time the steady flow of new users is plenty.
Monday, December 11, 2006
bhc: Basic Haskell Compiler
I have been thinking compilery thoughts for a while now, so I thought I'd jot them down. I am starting the "bhc" project here and now! The intention is that bhc will be a concept, and some thoughts, for at least another year - definitely not anything as concrete as code!
The current Haskell compilery things are:
- Hugs - fast to compile, slow to run
- GHC - slow to compile, fast to run
- Yhc/nhc - medium at both
- Jhc - aiming for super slow to compile, super fast to run
- ...
What I would like to do is create a new front end for Yhc, taking the Yhc.Core and Yhc.ByteCode bits as they stand, and replacing the front end. I'd also like to base the entire code on libraries, rather than compiler passes, and make everything entirely reusable. I also want to aim for simplicity, elegance and readability.
So, first things first. Haskell is really the only language to implement this in. After that you have a few stages until you get to the Yhc.Core language:
- Make (Cabal should do this)
- Parsing
- Name resolution
- Type checking
- Dictionary desugaring
- Desugaring
There is only one way to write a parser, using my parsing system (not finished, barely even started). There is only one way to write a type checker, using my type checker generator (not started at all, barely even specified, not even a link, definitely not proven!). Name resolution isn't that hard. Dictionary desugaring should use the special method of Catch (same as Jhc, I believe). Desugaring is trivial with Play classes (warning, if you follow that link, not finished!); I also want to have invertible desugaring, for analysis purposes. The parsing and typechecking would be standalone libraries.
Two of these things need to be written first, but that's part of the fun :)
Note that type checking does all the typey stuff, and dictionary desugaring uses the types. Nothing else uses types - and in fact, I think this compiler should be untyped. (I know no one will agree with me; all I can say is that I think I'm right, but I realise everyone thinks I'm wrong.)
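To make that concrete, an untyped Core might look something like this (a hypothetical sketch, not the actual Yhc.Core definition - every constructor name here is my invention):

```haskell
import Data.List (nub, (\\))

-- A hypothetical untyped Core: after dictionary desugaring no types
-- remain, only these shapes.
data Core
    = Var String                           -- variable
    | Con String [Core]                    -- saturated constructor
    | App Core [Core]                      -- application
    | Lam String Core                      -- lambda
    | Let String Core Core                 -- non-recursive let
    | Case Core [(String, [String], Core)] -- scrutinee, (constructor, binders, body)
    deriving (Eq, Show)

-- Analyses stay simple without types, e.g. free variables:
freeVars :: Core -> [String]
freeVars (Var x)     = [x]
freeVars (Con _ xs)  = nub (concatMap freeVars xs)
freeVars (App f xs)  = nub (concatMap freeVars (f:xs))
freeVars (Lam x e)   = freeVars e \\ [x]
freeVars (Let x e b) = nub (freeVars e ++ (freeVars b \\ [x]))
freeVars (Case e as) = nub (freeVars e ++ concat [freeVars b \\ vs | (_, vs, b) <- as])

-- "not", fully desugared: no type information left anywhere.
notCore :: Core
notCore = Lam "x" $ Case (Var "x")
    [ ("True",  [], Con "False" [])
    , ("False", [], Con "True"  []) ]
```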
The next big decision is file formats: I would like to have a .bho (basic haskell object) file which corresponds to a single module, and a .bhl (basic haskell library) which is a whole library, and a .bhx (basic haskell executable) which is a whole program. Inside a .bho you would have (all items are optional):
- Full original source code to the module
- Desugarred source code, in Yhc.Core format
- Bytecode, in Yhc.ByteCode format
A .bhl would have those three things, but linked together within a library. A .bhx would have all of that, including all the libraries, linked in as one.
I would also write an optimiser, for whole program analysis, which took a .bhx and produced an equivalent one. Probably also a compiler to C code, for super-fast execution.
So what in this compiler would be new?
- Focus on libraries, Yhc.Core, Yhc.Parse, Yhc.ByteCode, Yhc.TypeCheck
- Invertible desugaring
- Extensive use of the Play class
- Better library story (one library, one file)
- Standalone crossplatform executables
- Fast whole program analysis
- Brand new parsing system
- Untyped Core language (compared to other optimising compilers)
- Simple
So, who wants to have some fun?
Tuesday, December 05, 2006
Generalised Haskell
In the past few days I've been working on a generalised version of Haskell, designed for program optimisation and transformation. I've also implemented the start of a Haskell optimiser, in about 100 lines of Haskell, using this generalised form of Haskell.
Generalised Haskell is very much like a normal Haskell Core language, with less restrictions in pattern matching. In normal Haskell you would write "map" as:
> map f [] = []
> map f (x:xs) = f x : map f xs
Which would be compiled into:
> map f x = case x of
>     [] -> []
>     (x:xs) -> f x : map f xs
The rule for pattern matching is that you take the first rule that matches. Generalised Haskell relaxes this slightly to take any one rule that matches, but not necessarily in order. The second relaxation is that when pattern matching on the left hand side, functions can appear as patterns.
For example, you can now define:
> map id x = x
> map f x = case x of
>     [] -> []
>     (x:xs) -> f x : map f xs
Note that the "map id" case looks very much like GHC's Rules feature. The advantage for optimisation is that many activities can be encoded using these pattern matches - arity raising, specialisation, deforestation.
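For comparison, the "map id" equation expressed in GHC's Rules syntax (my transcription, with a quick check that the rewrite is semantics-preserving):

```haskell
-- The "map id" clause corresponds to a GHC rewrite rule. GHC applies
-- such rules during optimisation; Generalised Haskell instead treats
-- the equation as an ordinary clause that may fire whenever it matches.
{-# RULES "map/id" map id = id #-}

-- Either way the rewrite is sound, because both sides agree on every input:
mapIdSound :: Bool
mapIdSound = map id [1, 2, 3 :: Int] == [1, 2, 3]
```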
I am still working on a tool that uses this technique for optimisation, but initial results look quite promising, while the code remains impressively short.
Friday, November 17, 2006
HsInstaller
I've given this message and idea to a few people already, via email and IRC, so I thought maybe I should share it slightly wider and get feedback from people.
I am intending to start a project called HsInstaller, to produce Windows installers for:
Hat, Yhc, WinHugs and any Cabal program (definitely Hoogle, hopefully Happy, Alex, Haddock etc.), maybe GHC and Gtk2Hs if that's possible. This project does NOT hope to create installers for libraries, just standalone executable programs. A wrapper can be produced for libraries that just calls the underlying Cabal infrastructure much more easily.
At the moment the Haskell community has several different installers: the C based one I wrote for WinHugs that generates a .exe, the GHC one that generates a .msi, and the Gtk2Hs one that uses InnoSetup. There are advantages and disadvantages to all approaches, but since the world is going steadily towards .msi's that might be the best format to standardise on. Of course, it might not be the best format, discussion is welcome!
Unfortunately the installer code used by GHC is not publicly available; one consequence of this is that GHC does not have snapshot installers for Windows, and in fact a GHC release is not prepackaged for Windows - this is something done later.
So, my general thoughts are to try and see if a copy of the GHC installer can be obtained (this is apparently likely in the future). Collect all the installers that everyone uses, sit down with everyone who wants an installer and figure out what we have, and what we want. Once that's done we can start implementing something that everyone can reuse easily.
Anyone have any thoughts? I have about 4 projects that need installers (Hat, Yhc, WinHugs, Hoogle), but it would be nice if everyone could be satisfied by the results.
Monday, November 13, 2006
Dr Haskell progress
I've been working on Dr Haskell for a little bit, and it's now quite a bit cleverer. For those who don't know, Dr Haskell tries to make helpful suggestions about your code. For example, if you use "concat (map f x)", it will suggest you use "concatMap f x". It can also do other clever things, for example:
not (a == b) ===> use /=
if x `elem` xs then True else x == 1 ===> use an or
if bracket then "(" else "" ===> ['(' | bracket]
The idea is to introduce beginners to new functions or patterns of Haskell that they might not have been aware of, without requiring too much thought or learning, until you have identified such an instance. I think of this as a nice complement to Hoogle, since Hoogle caters for the case when beginners have already identified where a common pattern may lie.
Of course, the biggest weakness in Dr Haskell is that it matches at the expression level only, a beginner might write:
takeheads [] = []
takeheads (x:xs) = head x : takeheads xs
(And indeed they do, I've seen that 4 times already today!) An experienced programmer would recognise that as a map, and now, so will Dr Haskell! It will report "specialised version of map used as takeheads, line number n".
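Side by side, the beginner's definition from above and the version Dr Haskell is hinting at:

```haskell
-- The beginner's recursive definition, copied from above:
takeheads :: [[a]] -> [a]
takeheads [] = []
takeheads (x:xs) = head x : takeheads xs

-- What Dr Haskell suggests: the same function is just a map.
takeheads' :: [[a]] -> [a]
takeheads' = map head
```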
Some other things that Dr Haskell can now spot:
sum :: [Int] -> Int
sum [] = 0
sum (x:xs) = x + sum xs
(it's a foldr, as written above - although I realise foldl' is a better choice for sum)
reverse xs = rev [] xs
where
rev acc [] = acc
rev acc (x:xs) = rev (x:acc) xs
(rev is a foldl)
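For reference, the fold forms that the two recursions above reduce to (standard rewrites, not Dr Haskell's actual output):

```haskell
-- sum written as the foldr its recursion pattern matches
-- (foldl' would be the better choice in practice):
sum' :: [Int] -> Int
sum' = foldr (+) 0

-- rev threads an accumulator, which is exactly a foldl:
reverse' :: [a] -> [a]
reverse' = foldl (flip (:)) []
```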
Dr Haskell can now spot all of these, and has the power to spot lots more patterns using recursion as well as expression matching. I think this could be really helpful for beginners, if anyone has any interesting ideas about this please let me know. I hope to package up a release in the next few days.
Sunday, November 12, 2006
Library idea: the Safe library
Often Haskell programmers end up with "error, head []" - a totally useless error message!
Because in developing Catch I have tens of thousands of lines of code, that would be a real pain. To combat this I have:
headNote :: String -> [a] -> a
headNote err [] = error $ "headNote, " ++ err
headNote err (x:xs) = x
I also have fromJustNote, lookupJust (fromJust . lookup), assertNote etc - loads of useful safe functions. I also have headMaybe :: [a] -> Maybe a. Perhaps this could be useful as a library - the safe library?
I would imagine lots of functions from the various Prelude/List libraries, some with Note versions taking a parameter to give a better error message, some with Maybe versions to return a Maybe on failure.
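For illustration, obvious sketches of the companion functions mentioned above (the names are from the post; the bodies are my guesses at their implementation):

```haskell
-- head returning Maybe rather than crashing on []:
headMaybe :: [a] -> Maybe a
headMaybe []    = Nothing
headMaybe (x:_) = Just x

-- fromJust with an error message, in the style of headNote:
fromJustNote :: String -> Maybe a -> a
fromJustNote err Nothing  = error $ "fromJustNote, " ++ err
fromJustNote _   (Just x) = x

-- fromJust . lookup, failing with a useful message:
lookupJust :: Eq k => k -> [(k, v)] -> v
lookupJust key = fromJustNote "lookupJust" . lookup key

-- assert with an error message:
assertNote :: String -> Bool -> a -> a
assertNote err False _ = error $ "assertNote, " ++ err
assertNote _   True  x = x
```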
Anyone think this would be useful? It's technically trivial to implement, but I don't have enough time to manage this, so perhaps someone else would like to take this one on?
Friday, November 10, 2006
System.FilePath, automated testing
I just released System.FilePath, a library for manipulating FilePaths on both Windows and Posix. The problem with a library like this is that there are lots of corner cases, lots of weird situations, and everything needs to be tested twice with different semantics. Obviously this requires a test infrastructure different from most other libraries!
I started off with a separate file for writing properties, but quickly found that the properties were the best type of documentation for a function. It was also a pain to keep two different places which identify what the logic of the code is - i.e. the code and the tests. The obvious idea then is to combine the code, documentation and testing into one. Because I am using Haddock that turned out to be quite easy to do - any line beginning with "-- > " is a test. Haddock sees this as monospace formatting, Haskell as comment, and my test generator can find the tests pretty easily.
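To illustrate the convention, here is how such a function might look (a simplified, Posix-only stand-in I wrote for this example, not the library's real code):

```haskell
-- | Join two paths with a separator. The "-- > " lines below are seen by
-- Haddock as monospace documentation, by Haskell as comments, and by the
-- test generator as tests.
--
-- > combine "home" "bob" == "home/bob"
-- > combine "/" "test" == "/test"
combine :: FilePath -> FilePath -> FilePath
combine a b
    | null a        = b
    | last a == '/' = a ++ b
    | otherwise     = a ++ "/" ++ b
```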
Within the System.FilePath repo I have a DOS Batch file driver (test.bat) which uses a separate Haskell program (GenTests.hs) to create a test script and run it.
Within the test listing there are some tests which I refer to as constant, and some as properties. Some of the tests have no free variables; these are constant - GenTests recognises them and outputs them directly, and in one execution they either pass or fail. The properties are just standard QuickCheck properties, with the restriction that every multi-letter keyword not in a certain known set is a function in the library, and every variable x..z is a FilePath (hence using a custom FilePath generator).
The main complication in testing from System.FilePath is the fact that every property corresponds to two different tests - one on the Posix implementation, one on the Windows implementation. The translator automatically does this duplication, unless either Posix: or Windows: is given at the start of the test, in which case the test is only executed on the appropriate version.
For QuickCheck testing I defined a FilePath to be a 25 character string, from the following set of characters "?|./:\\abcd 123;_". The idea of this set of characters is to include every character that any aspect of the library treats differently, along with a small selection of "normal" letters/numbers.
There was one modification I had to make to QuickCheck: by default QuickCheck returns successfully, outputting success/failure to the console. Unfortunately, if an automatic program is executing over 200 tests, then these messages can get obscured in the general noise - this happened more than once. To combat this I defined a new QuickCheck wrapper which calls error on failure. Ideally the signature of quickCheck should be changed to :: .. -> IO Bool to detect these situations and allow the driver script to fail more obviously.
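Such a wrapper might look like this against today's QuickCheck API (quickCheckWithResult and isSuccess; neither existed in the 2006 QuickCheck, so this is a sketch of the idea rather than the code I used):

```haskell
import Control.Monad (unless)
import Test.QuickCheck (Testable, Args(..), quickCheckWithResult, stdArgs)
import Test.QuickCheck.Test (isSuccess)

-- Run a property and call error on failure, so a driver script running
-- hundreds of tests cannot miss a failing one in the console noise.
quickCheckErr :: Testable prop => prop -> IO ()
quickCheckErr p = do
    result <- quickCheckWithResult stdArgs{chatty = False} p
    unless (isSuccess result) $ error "QuickCheck property failed"
```

Turning the printed verdict into a crash makes the failure visible to whatever is driving the test run, which is the point the post is making about quickCheck's IO () signature.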
Without QuickCheck I don't think it would have been possible to write the FilePath library - it caught too many obscure bugs that manual testing would never have found. In addition, QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out that the properties disagreed with me. The one thing QuickCheck helped with more than anything though was refactoring - despite a massive number of the functions all depending on each other, QuickCheck allows me to change the behaviour of one function in some obscure case and check that no other function was relying on that.
The only criticism that can be levelled at my use of QuickCheck is that failing examples are not minimal, in fact they are exactly 25 characters long. I hope that at some point soon I can make use of SmallCheck (once it has a darcs repo and a .cabal file) to do testing alongside QuickCheck to get a greater depth of coverage.
All the test scripts I have written are available in the darcs repo, under the BSD3. If anyone can make use of them, I'd be happy to have someone take them forward!
I started off with a separate file for writing properties, but quickly found that the properties were the best type of documentation for a function. It was also a pain to keep two different places which identify what the logic of the code is - i.e. the code and the tests. The obvious idea then is to combine the code, documentation and testing into one. Because I am using Haddock that turned out to be quite easy to do - any line beginning with "-- > " is a test. Haddock sees this as monospace formatting, Haskell as comment, and my test generator can find the tests pretty easily.
Within the System.FilePath repo I have a DOS Batch file driver (test.bat) which uses a separate Haskell program (GenTests.hs) to create a test script and run it.
Within the test listing there are some tests which I refer to as constant, and some as properties. Some of the tests have no free variables, these are constant - the GenTests recognises them and outputs them directly - in one execution they either pass or fail. The properties are just standard QuickCheck properties, with the restriction that every multi-letter keyword not in a certain known set is a function in the library, and every variable x..z is a FilePath (hence using a custom FilePath generator).
The main complication in testing from System.FilePath is the fact that every property corresponds to two different tests - one on the Posix implementation, one on the Windows implementation. The translator automatically does this duplication, unless either Posix: or Windows: is given at the start of the test, in which case the test is only executed on the appropriate version.
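The duplication rule can be sketched like this - note that `expand` and its types are my stand-ins for whatever GenTests.hs really uses internally:

```haskell
import Data.List (stripPrefix)

-- Turn one property line into (platform, property) test pairs.
-- A "Posix: " or "Windows: " prefix restricts the test to that
-- platform; otherwise the property is duplicated for both.
expand :: String -> [(String, String)]
expand p = case (stripPrefix "Posix: " p, stripPrefix "Windows: " p) of
    (Just q, _) -> [("Posix", q)]
    (_, Just q) -> [("Windows", q)]
    _           -> [("Posix", p), ("Windows", p)]
```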
For QuickCheck testing I defined a FilePath to be a 25-character string drawn from the character set "?|./:\\abcd 123;_". The idea of this set is to include every character that any aspect of the library treats differently, along with a small selection of "normal" letters and numbers.
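To make the idea concrete, here is a dependency-free sketch of such a generator. The real tests use QuickCheck's own generator machinery; the little linear congruential generator below (with illustrative constants) is just so the example runs standalone:

```haskell
-- The character set from above: everything the library treats
-- specially, plus a few "normal" letters and numbers.
testChars :: String
testChars = "?|./:\\abcd 123;_"

-- One step of a simple linear congruential generator.
nextSeed :: Int -> Int
nextSeed s = (1664525 * s + 1013904223) `mod` (2 ^ 32)

-- A deterministic 25-character pseudo-FilePath from a seed.
testPath :: Int -> FilePath
testPath seed = take 25 [pick s | s <- tail (iterate nextSeed seed)]
    where pick s = testChars !! (s `mod` length testChars)
```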
There was one modification I had to make to QuickCheck: by default, quickCheck returns successfully, printing success or failure to the console. Unfortunately, if an automated program is executing over 200 tests, these messages can get lost in the general noise - this happened more than once. To combat this I defined a QuickCheck wrapper which calls error on failure. Ideally the signature of quickCheck would be changed to :: .. -> IO Bool, so these situations can be detected and the driver script can fail more obviously.
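The behaviour of the wrapper can be sketched without QuickCheck at all - the real wrapper calls quickCheck, but this dependency-free version shows the point: run a property over a batch of inputs and call error on the first failure, so a driver running hundreds of tests cannot miss one in the noise.

```haskell
-- Hedged sketch: 'checkOrDie' is my own name, not the real wrapper.
-- It checks a property against each input and aborts loudly on the
-- first counterexample, rather than just printing a message.
checkOrDie :: Show a => String -> (a -> Bool) -> [a] -> IO ()
checkOrDie name prop inputs =
    case filter (not . prop) inputs of
        []      -> putStrLn (name ++ ": passed " ++ show (length inputs) ++ " tests")
        (x : _) -> error (name ++ ": FAILED on " ++ show x)
```

A driver script then stops at the first broken property with a non-zero exit code, instead of burying a failure message in hundreds of lines of output.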
Thursday, October 26, 2006
Hat+Windows, 1 happy user!
I just got an email:
"I've got hat-observe working on my code, which makes me very happy."
This is a happy Hat user, using my Windows port. As far as I know, this is the first Hat user on Windows with a real project that they actually want to debug - not just test code.
This makes me very happy too :)
Saturday, October 21, 2006
30% faster than GHC!
I have been working on a back end to my analysis framework Catch for a while. I do lots of transformations as part of Catch, some of which speed up the code, so hooking up a back end seems sensible. Using this back end, I can be 30% faster than GHC.
Before I show any pretty graphs, there are a few big and important disclaimers:
- I use GHC as a back end
- This means that GHC optimises my code as well!
- Benchmarked against GHC 6.4.2, using -O2
- Only tested on one single file! Prime numbers, from the nofib suite.
- All experiments run on a P4, 3GHz
- All experiments run 5 times, and the lowest number recorded
- I do whole program analysis
And so on to some pretty graphs of *HC vs GHC on the prime number benchmark:
[Graphs omitted: *HC vs GHC timings on the primes benchmark]
These numbers show pretty much a one-third increase in speed.
One of my future tasks is to hook this directly up to a code generator, and hopefully the speed will increase even further - at the moment I have to add things to make the output valid Haskell, which slows down the generated code. A custom back end would help with this, plus I have other techniques for speeding up the generated code given some of the "knowledge" accumulated by Catch. I am reasonably confident that GHC is not doing too much of the heavy lifting when compiling my code, as compiling without any optimisation does not penalise my output too much.
How do I get fast code?
I take the Yhc compiler and generate Yhc.Core. I take all the Core generated for every part of the program and splat it together, including the Prelude etc. I then run some analysis passes over this code, including making the code completely first order, a little bit of deforestation and a touch of partial evaluation. Finally I output Haskell; however, my Core language is not a subset of Haskell, so some additional things need to be handled.
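To give a flavour of the simpler passes involved - on a toy expression type, not the real Yhc.Core representation, which is much richer - a touch of partial evaluation can be as small as constant folding:

```haskell
-- Toy expression type standing in for Yhc.Core.
data Expr = Num Int | Var String | Add Expr Expr
    deriving (Eq, Show)

-- Fold away additions whose operands reduce to known constants,
-- leaving anything involving a variable untouched.
partialEval :: Expr -> Expr
partialEval (Add a b) =
    case (partialEval a, partialEval b) of
        (Num x, Num y) -> Num (x + y)
        (a', b')       -> Add a' b'
partialEval e = e
```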
Will this only work for Primes?
Hopefully not! In theory the back end is general purpose, and should work for anything. In practice, I'm still working on it, and not everything is finished yet.
Whole program analysis? That's slow!
Not really. I develop Catch in Hugs; it takes around 10 seconds to compile Primes there, using an entirely unoptimised pipeline - I even use association lists for function name lookup - and still the performance is not too bad. Only a very small number of the steps I perform are whole program, and the ones that are only get done once, in a linear fashion. It probably won't scale to "a whole compiler", but it can certainly handle 1000-line programs with no issue.
What's next?
Stop getting distracted by developing a compiler, and get back to my PhD!
Labels: supero
Thursday, September 21, 2006
DriFT
I tried out DriFT recently, and was most impressed. It has the feel of a program that hasn't had much love recently, but that's OK - it's still a very useful program.
Firstly, there is no easy way to compile it on Windows - it's not hard, it's just not obvious either. For reference, the steps are: 1) cd src, 2) ren Version.hs.in Version.hs, 3) ghc --make Drift.hs. For those Windows users who don't want to do that, I've put a binary up on my Windows distribution page. It's certainly not hard, but it's not as easy as Cabal-based packages.
Once you've done that, "drift -r File.hs" produces the goods in a very straightforward way. What I wanted was a derived Binary instance, with a loadBinary and saveBinary interface. DriFT offers two separate binary output modes, Binary and GhcBinary - I wanted to use it in Hugs, but a bit of experimentation showed that GhcBinary has nicer output, so I went with that. Once that's done, I wanted to combine it with the Binary library to do the actual serialisation - there is one in the repo, but it seems not to have been given the attention it needs, so it was easier to write my own. See this file in the Yhc repo for how I did it - the answer is not very nicely, but it's quite workable.
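The shape of a hand-rolled layer like this is roughly as follows. This is a sketch with my own names - putBin/getBin are not DriFT's or the Binary library's real interface, and real code would write bytes rather than a String:

```haskell
-- A GhcBinary-flavoured scheme: each constructor is written as a tag,
-- followed by its fields. String stands in for a byte stream here.
class Bin a where
    putBin :: a -> String
    getBin :: String -> (a, String)   -- value plus unconsumed input

instance Bin Bool where
    putBin False = "0"
    putBin True  = "1"
    getBin (c : rest) = (c == '1', rest)
    getBin []         = error "getBin: unexpected end of input"

-- Lists get a tag per cell: 0 terminates, 1 prefixes an element.
instance Bin a => Bin [a] where
    putBin []       = "0"
    putBin (x : xs) = "1" ++ putBin x ++ putBin xs
    getBin ('0' : rest) = ([], rest)
    getBin ('1' : rest) = let (x,  r1) = getBin rest
                              (xs, r2) = getBin r1
                          in (x : xs, r2)
    getBin _            = error "getBin: bad list tag"
```

DriFT's job is to generate instances of this shape mechanically for each data type, which is exactly the boring bit you don't want to write by hand for something as large as Yhc.Core.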
With all this done, Yhc now spits out Core in binary, but more importantly my PhD program now has binary caches - changing some operations from 30 seconds to 2 seconds, which is nice :)
Labels: yhc
Wednesday, September 06, 2006
Hoogle mailing list
There is now a hoogle mailing list: http://www.haskell.org/mailman/listinfo/hoogle
So instead of emailing me individually, or talking to me on IRC, instead you can just email that list and hopefully others can give useful feedback as well.
Friday, September 01, 2006
Over 65,000 Hoogle Searches!
I just checked the Hoogle logs, and found that a staggering 67,112 searches have been performed.
18,495 of these searches were for distinct terms, and people searched for "map" 4,036 times. That's quite a lot of searching!
I think those logs go back to about October, which makes this just under a year's data. I have no idea what the breakdown over time is, but Hoogle 4 will have better logging, and should be able to tell me.
Update: dons tells me there have been 3849 lambdabot @hoogle searches
Wednesday, August 23, 2006
Parsec and Hoogle
For the last few days I've been rewriting the parsing in Hoogle to use Parsec. As an end result, the parser is a lot more powerful, more maintainable and more extensible - on the downside, it's longer and more complex.
The thing that most impressed me about Parsec was its compositionality. In Hoogle there are type signatures both on database lines (map :: (a -> b) -> [a] -> [b]) and in user queries, which might contain a type signature amongst other junk. Thanks to Parsec I can use the same type signature parser in both, with extensions for the relevant bits. I couldn't really do this with a traditional yacc/bison/happy parser generator. It's also great for the fuzzy nature of user searches - you don't want to raise a parse error if there is any sensible interpretation of what the user wrote.
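The compositionality point can be shown with a tiny, dependency-free sketch. Hoogle itself uses Parsec; the hand-rolled combinators below are stand-ins with the same flavour - one `word` parser reused both by a strict "database line" parser and a looser "user query" parser:

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.List (stripPrefix)

-- A parser returns the value parsed plus the remaining input.
type Parser a = String -> Maybe (a, String)

-- One identifier, e.g. "map".
word :: Parser String
word s = case span isAlphaNum s of
    ("", _)   -> Nothing
    (w, rest) -> Just (w, rest)

-- A database line: "name :: signature" (signature kept as raw text).
dbLine :: Parser (String, String)
dbLine s = do
    (name, r1) <- word s
    sig        <- stripPrefix " :: " r1
    Just ((name, sig), "")

-- A user query: one or more space-separated words, reusing 'word'.
query :: Parser [String]
query s = do
    (w, rest) <- word s
    case query (dropWhile isSpace rest) of
        Just (ws, r) -> Just (w : ws, r)
        Nothing      -> Just ([w], rest)
```

Because the parsers are ordinary values, sharing `word` between the two contexts is trivial - precisely what a monolithic grammar file makes awkward.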
Thanks to this rewrite, I now get a few query goodies that I always wanted but could never properly parse. Included in these are multiple words ("concat map") and names with type signatures ("map :: [a] -> [a]"), and the search parser now checks for command line options, which stops bugs with things like "->" being misinterpreted.
Thursday, August 17, 2006
Neil vs Cabal
Today I've been trying to get into Cabal, since it seems a pretty cool thing, and it looks like the way forward.
As I've been doing this, trying to compile various projects using Cabal, it turns out that I spent all day encountering bugs! I've hit things that seem a bit curious in loads of programs, sent off patches, reported bugs, asked for clarification etc. Hopefully this will be fixed at some point soon, after enough people have bashed through it.
In particular, to help get this going, I'm going to try to keep the HEAD versions of various projects compiling regularly with Cabal on Windows - and then probably distribute the binaries as part of my Haskell on Windows drive.
I'm also trying to get Hoogle working properly with Cabal, since that's probably going to be the future way of building it.
Which brings me to a final question about the Hoogle licence: what should it be? Currently Hoogle is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. Nothing else in the Haskell world is, so it's not particularly sensible that Hoogle is. My basic thoughts are GPL vs BSD. What do people think, one way or the other?
Wednesday, August 16, 2006
Hoogle 4 plans
After spending about the last 3 months replying to most people's Hoogle comments with "that will be fixed in Hoogle 4", it's about time I actually implemented Hoogle 4. Just to give people an idea of where I'm going, I thought I'd summarise what Hoogle 4 means to me :)
First off, I abandoned .ths after talking with Niklas, who does Haskell-source-exts and HSP, and will be using his stuff. It has the advantages of tag safety and a better syntax, and lacks a few bits, but being written by someone else saves me a bit of work. It's also well supported, something that's essential!
Anyway, the plans for Hoogle 4 fall into a few areas:
No bugs: type classes, type aliases, higher kinded type classes - all these things confuse Hoogle. Either they are bugs, or they come close enough to count as bugs. These will all be fixed.
Help the user: searching for ThreadID doesn't work (it's ThreadId), searching for Just a doesn't work, searching for Maybe -> Int doesn't work - all just fail silently. I want to tell people when their search is a bit dubious, and fix it for them.
Do what users keep asking for: often users search for multiple words, like "map concat"; this currently gives them very confusing results (the query is interpreted as the type "m a").
Be a database: I want to give more database-like features - looking up Just will give you the functions that use it.
Faster: I want to make text searching hundreds of times faster - not so that the results come back faster, but so that...
Packages: I want to be able to search packages other than the default ones, such as Gtk2hs (which you can already search with Hoogle), and lots, lots more. This requires the speed boost.
AJAX: I have a few good AJAX ideas for making searching just a little bit quicker.
Lots to do, and it will probably be an entire rewrite (again...), but this is hopefully going to be the version that sticks around for a very long time and comes out of beta.
Labels: hoogle
Sunday, August 13, 2006
.ths (Textual Haskell Source)
I've been doing some work on Hoogle 4 over the last week while I've been away from a computer. Lots of cool new ideas, some paper code, and other goodies - it will probably be a few weeks before I start to crank out implementations and improvements get seen on the main Hoogle site, and perhaps 2 or 3 months before Hoogle 4 starts to take shape.
Anyway, one thing Hoogle needs to do is output a web page, and at the moment it does that by reading in text files and writing them out. To produce a typical search page it emits a standard prefix, a top bit, the answers and a suffix. Only the answers are generated on the fly; the others are included verbatim. Of course, this means the HTML lives in 4 places, and the reusability is poor (files are chunks of text, not functions). The pages also need small tweaks based on dynamic data - for example, the title of the page is the search query. To accommodate this I had to add $ replacement, so $ becomes the search query. Messy, and not very general.
So to answer all this, I devised .ths - Textual Haskell Source. Currently you have .hs (source code is the main thing) and .lhs (comments are the main thing), and now you have .ths (text is the main thing). Let's start with an example:
> showHtml :: String -> String
> showHtml search = pure
[html]
[head]
[title]<% search %> - Hoogle[/title]
[/head]
Note that in this example I am escaping the code (with > marks), and the text is the main part. I also have <% code %>, which is substituted inline.
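My rough reconstruction of what the translator could emit for the example above - the exact generated code is a guess, but the idea is that literal text becomes string pieces and each <% code %> splice becomes an ordinary Haskell expression in between:

```haskell
-- Hypothetical translation of the .ths example: plain text turns into
-- string literals, and <% search %> is spliced in as an expression.
showHtml :: String -> String
showHtml search = concat
    [ "[html]\n[head]\n[title]"
    , search
    , " - Hoogle[/title]\n[/head]\n"
    ]
```

Because the result is just a Haskell function, composing pages becomes ordinary function application rather than textual inclusion.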
I can also do more advanced things (naturally)
> showList :: FilePath -> [Bool] -> IO String
> showList filename items = do
> src <- readFile filename
The length of <% filename %> is <% length src %>
And the booleans are
<%for i = items %> <%if i %>ON<%else%>off<%end if%><%end for%>
I have all these bits implemented, and hope to make a release in a few days. I kind of have to release, because the current darcs version of Hoogle will be using them in a few days anyway.
And of course, since all this stuff is Haskell, it's easier to compose, call as functions, etc.
Thoughts or comments?
Friday, July 07, 2006
Haskell debugging with Hat on Windows
For the last couple of days I've been trying to get Hat working on Windows. I now have over half the tools working on Windows, and have a bundle ready for Windows users to install: http://www.cs.york.ac.uk/fp/temp/hat-win32-05_jul_2006.zip
How to install: Extract the contents of the .zip file into a folder, preserving directory structure. Add the folder containing hat-make to your %PATH% variable - this is 100% required, even if you give the explicit path to hat-make when you use it. Make sure ghc is available on your system and has been added to the %PATH%.
How to use: cd to the directory containing your Haskell source
hat-make Main.hs
main
hat-stack Main.hat
hat-observe Main.hat
I am also working on a graphical user interface for these tools, using Gtk2Hs - a screenshot is here. It's not ready yet, but hopefully soon.
If any Windows users try this out and find that it either works or does not, please let me know.