Saturday, January 26, 2008

Safety and Optimisation: joinMaybes'

In a recent blog post by Conal, he introduced the joinMaybes' function, defined as:

filterMP' :: MonadPlus m => (a -> Bool) -> m a -> m a
filterMP' p = (>>= f)
f a | p a = return a
| otherwise = mzero

joinMaybes' :: MonadPlus m => m (Maybe a) -> m a
joinMaybes' = liftM fromJust . filterMP' isJust

He laments that the use of isJust and fromJust mean that his code will run slower (having two Just tests), and that an automated checker such as Catch won't be able to check it successfully. Fortunately, Catch can check the code perfectly, and Supero can optimise the code perfectly. As such, this simple definition is perfectly fine from all points of view. I'm going to go through the checking with Catch in some detail, and if anyone wants, I'll post another article on the optimisation with Supero.

Checking With Catch

To simplify things, I'm going to work only in the [] monad, so here is a new variant of the code:

filterMP' :: (a -> Bool) -> [a] -> [a]
filterMP' p = concatMap f
f a | p a = [a]
| otherwise = []

joinMaybes' :: MonadPlus m => m (Maybe a) -> m a
joinMaybes' = map fromJust . filterMP' isJust

Catch would remove the dictionaries before starting, so would accept the original code unmodified. The first thing Catch wold do is reduce this fragment to first-order. The end translation would be:

filterMP_isJust x = concatMap_f x
concatMap_f [] = []
concatMap_f (x:xs) = (if isJust x then [x] else []) ++ concatMap_f xs

joinMaybes x = map_fromJust (filterMP_isJust x)
map_fromJust [] = []
map_fromJust (Just x:xs) = x : map_fromJust xs
map_fromJust (Nothing:xs) = error "Pattern match error"

I've also taken the liberty of inlining the otherwise, and used pattern matching rather than case expressions. Catch will take care of those details for us, but the code is a little easier to follow without them in. Now Catch can begin the checking process.

Catch first decideds that if map_fromJust is passed a list matching the pattern (Nothing:_), it will crash, and annotates the precondition of map_fromJust as being either the input list is [] or (Just _:_). It then spots the recursive call within map_fromJust, and determines that the revised precondition should be that the input list is a list, or any length, whose elements are all Just constructed (we call this condition P).

Having determined the precondition on map_fromJust, it uses that within joinMaybes. Catch transforms the condition P, trying to find the precondition on filterMP_isJust to ensure the postcondition P holds. By examining each branch, Catch determines that under all circumstances the postcondition will hold, therefore the precondition is just true. Given that filterMP_isJust always satisfies the precondition of map_fromJust, it is clear that joinMaybes never crashes.

Catch can generate the above proof automatically, showing the above function is safe.

Friday, January 25, 2008

Functional Flow Control

In normal programming languages, there are many keywords for flow control, such as for, while etc. These flow control keywords encode a particular pattern of iteration, such as looping over a range (in the case of for) or continuing until some condition holds (while). Imperative programming languages continue to add more iteration keywords: both C#/Java have introduced some form of for-each; Python/C# have yield; Ada has many variants.

Haskell doesn't have these iteration keywords, but instead relies on recursion. This choice, when coupled with a few other Haskell ingredients, makes it much more powerful. Take for example the task of looping over a sequence, adding 1 to each element:

// In C
for (int i = 0; i < n; i++)

-- In Haskell
incList (x:xs) = 1+x : foo xs
incList [] = []

I've used C mutating an array, and Haskell allocating a new list, simply because that would be the natural thing to do in each language. However, the great thing about higher-order functions is that we can now go back and abstract the flow control in Haskell, giving us:

incList = map (+1)

The above function maps over a list, incrementing each element. People have identified a common pattern (iterating over a list) and rather than baking it into the language with a keyword such as iterate-over-list, a library function can provide the operation. It is very important that map is not special in any way, and can simply be defined as:

map f (x:xs) = f x : map f xs
map f [] = []

The great advantage is that rather than being restricted to a limited range of flow control operators that someone somewhere decided upon, we can add new ones. Let's take another example, that of summing a list:

-- in C
int total = 0;
for (int i = 0; i < n; i++)
total += list[i];

-- in Haskell
sum [] = 0
sum (x:xs) = x + sum xs

-- or using the built in foldl
sum = foldl (+) 0

In Haskell there is a standard library function foldl which iterates over a list using an accumulator, managing updates to the accumulator for you, and setting an initial value. In C there is no such operator, so the more general purpose for is used.

But these examples are very common, so C's for keyword has provided most most of the control flow. However, sometimes you need more exotic flow control, which the authors of the language did not think of including. Take the example of computing a fixed point of a function f on the value x:

int x;
while (1) {
int x2 = f(x);
if (x == x2) break;
x = x2;

fix f x = if x == x2 then x else fix f x2
where x2 = f x

Here the Haskell version shows its power, instead of having defined a particular instance for a particular f and a particular type of value x, in Haskell we have basically defined fix as a new form of flow control.

In C we were still able to define something, but it was much harder. Now consider the following example that I was working on yesterday. I have an algorithm which has 4 stages, lambda, simplify, inline, specialise. Each stage must be run in turn, but if any stage changes something, then we restart from the beginning. For example, we apply lambda, then simplify - if something changes we restart at lambda. We only finish once all the stages have been run without changing anything. In Haskell this code is simple:

fixList orig x = f orig x
f [] x = x
f (a:as) x = if x == x2 then f as x else f orig x2
where x2 = a x

fixList [lambda,simplify,inline,specialise]

I define a new function called fixList, which provides an abstraction of flow control. The actual operations have been well isolated from this structure. I considered how to express this in structured C and drew a complete blank. My best guess is:

int x, x2;

x = x2;

x2 = lambda(x) ; if (x != x2) goto begin;
x2 = simplify(x) ; if (x != x2) goto begin;
x2 = inline(x) ; if (x != x2) goto begin;
x2 = specialise(x); if (x != x2) goto begin;

It might be possible to change some of this flow control into a macro, but I can think of no clean abstraction. Haskell is a great language for building abstractions, flow-control is just one highly useful instance.

PS: C Issues

Some of the code in this article isn't quite as nice as I'd wanted it to be, and isn't really a fair comparison. The array processing code in C relies on having defined n to be the length of the list, and having that tracked separately. The fixed point definition on C works over int to get a nice equality test, but that is merely a limitation of the language not having a standardized way to do equality. The C code could use function pointers, but in reality inline takes an extra argument so is used as a closure in Haskell - and besides, that would hardly be the standard way of coding C.

Tuesday, January 15, 2008

darcs Feature Request

I use darcs all the time. I have 15 repos listed on my website, and at least an additional 10 that haven't made that list yet. The main reason I use version control is for syncronising files between the various computers I use (at least 4 on a regular basis). Of course, having the version control features available as well is an added bonus. I used to use CVS, and its Windows graphical companion, Tortoise CVS, which is a fantastic tool. However, as the Haskell community has migrated to darcs it has become more convenient to standardise on one single tool.

All my darcs repo's are available behind ssh (for pushing to) and behind http (for pulling from). So for example, to pull from the filepath library, I use the http accessible version:

darcs pull

And to push my changes I do:

darcs push

The reason for pulling using http is that this operation doesn't require a public key to be active, doesn't fail if SSH is blocked (which it is in at least one of the locations I regularly visit), and is much quicker.

One feature of darcs is that it saves the last repo address used in a prefs file, so after typing the long darcs pull command the first time, in future I can just type:

darcs pull

Much handier! Unfortunately, if I then do a long-winded push command, it overrides the default repo, and future pull's would go from the ssh path. There is a solution, instead type:

darcs push --no-set-default

Ouch! Now that command line is getting a bit of a mouthful. As a result, I pasted this line into a file push.bat, and committed that to the repo. Now I can do the simple darcs pull and a push is as short as:


But this isn't ideal. It's a bit sad that one of the core Hsakell libraries contains a batch script which is platform specific and only works for people with shared keys on my Haskell account (which is now about 10 people...). I'd love to remove this script, which brings me on to...

Feature Request: have darcs allow a different default for push and pull commands.

One way of engineering this would be for darcs to store them entirely separately. Another option would for pull to use the last repo url, and push to use the last repo that wasn't http (since you can't usually push over http). There may be other options that will achieve the same effect.

Friday, January 11, 2008

British Telecom and Evil Sales Tactics

Not a Haskell post, just a rant about how evil British Telecom (BT) are.

When I got home this Christmas I was welcomed with a router sitting near the computer, and asked to "install it". This perplexed me - when I left my parents left I had already set up broadband, with a wireless router, and it was all working perfectly. I chose to go with BT for my parents, because BT should be the simplest choice - they already have a BT bill, and a BT line, so adding a broadband should be simple. Also, if anything goes wrong, then it has to be a problem with BT.

The package I signed my parents up to came with a really rubbish little USB modem, which I replaced with a nice wireless router with 4 wired ports from Ebuyer. It also had a 20Gb/month limit, with warnings if we went over the limit. Based on talking to my brother (who still lives at home, and is the largest internet user), I suspect we transfer around 15Gb/month.

However, while I was away at Uni, someone from BT phoned up the house. Unfortunately, my Dad answered. My Dad is not qualified to deal with computers. If he wants to "visit" a website he writes down the address on a post-it note and places it near the computer, to be printed out. The person on the end of the phone told my Dad that in we could upgrade our package, making it 4 times faster and £6/month cheaper, and not changing anything else. He agreed, and entered into an 18 month contract with this new plan.

As part of this new plan, we got a wireless router, which is currently sitting in a cupboard somewhere (its inferior to the £40 one from ebuyer in every way). We also got upgraded from 2Mb/s to 8Mb/s, a "significant increase in speed". However, in practice, that makes absolutely no difference. I tried explaining this to people at BT, and they resolutely claimed it was 4 times faster. The real kick in the teeth though is that this new package is limited to 8Gb/month, and any additional usage is charged. This is particularly dangerous: if for example a trojan ends up on the machine, it could run up expensive bills. An 8Gb limit is most definitely not "the same" as our original plan, which was chosen based on what we actually needed.

Over Christmas I had the fun job of trying to sort this out. I rang up BT several times, spent a lot of time listening to bad music and got repeatedly transferred between departments. If the largest telecom company in Britain can't run a call centre, who can? If someone lies to you while selling broadband do you want complaints, broadband, or sales? The answer turned out to be broadband technical support. In the end, after explaining what happened, I got told that they would review the sales call and check if the person selling the contract had lied.

A week later, they phoned back and admitted that we had been completely missold the package. We are now back on the 20Gb/month quota, with a £3/month cost reduction. I had originally chosen to go with BT to keep things "simple". Had I realised that actually trying to get through to anyone sensible was a nightmare, and that they would phone up my parents and lie to them, I would have chosen differently! It's too much effort to change now, but I strongly recommend that anyone else chooses an alternative provider. BT is not cheapest, does not provide the best service, and is not the simplest. There is absolutely no reason to go with BT.

My personal choice of internet provider is Eclipse. If you do choose to go with them, and use the referral code "NMITCHELL" I get some money off my bill, but I strongly recommend them even if you don't use that code. Compared to my parents, I pay half the cost. I have a limit of 20Gb/month, but only counting between 6pm and midnight - sensibly encouraging people to perform massive downloads at off-peak times. If I go over the quota, I get a few warnings, then they start throttling the service, never charging me money or cutting me off. Of course, at such a large quota active only at certain times, it would be a challenge to exceed it. The email support is fantastic, and the web interface is excellent - while setting up an account you get hourly updates on each aspect. I was able to transfer my broadband from one house to another, on a laptop I borrowed for 2 minutes, while in Germany. It isn't the absolute cheapest, but for only a few extra pounds per month, you get a superb service.

Sunday, January 06, 2008

Freedom of Information

The entire academic community is based on the idea of taking previous people's work, and building on it. "Standing on the shoulders of Giants" is the goal. Unfortunately, this can only happen when the giants that have gone before give free access to their shoulders. If a piece of academic research is not made freely available, did it really achieve its goal? Did it build on human knowledge? No.

I am currently working on homeomorphic embedding. I'll write more about it in future, but the basic idea is that the relationship a \lhd b is a type of ordering (read \lhd as a left-pointing triangle with a bar under it). This ordering happens to be particularly good at ensuring termination. All is good.

Unfortunately a simple test of this relation takes exponential time, and is therefore infeasible. But good news - Jonathan Stillman determined how to reduce the running time down to O(size(a) . size(b) . arity), a perfectly reasonable time bound. The algorithm is contained in his thesis, "Computational problems in equational theorem proving", 1989. To obtain his thesis, the only way I can find is this link. To actually view the thesis, and the very important algorithm contained within, will cost a minimum of $46. Of course, its entirely possible it will rely on some side condition I can't guarantee, or not meet my needs for some other reason, but I won't know until after the money is gone.

I want to take the information, acknowledge it, and use it. Today I want to build a highly-optimising compiler for Haskell, tomorrow I might put the information to another use. After I've used the information, I will make my new discoveries and ideas available for free download on my website. Hopefully someone will be able to build on my ideas in new ways, and the cycle can continue. Hopefully one day all the information generated will be available for free on the web, then we truely can build on the work of those who have gone before.

Update: Matthew Danish (mrd on IRC) very kindly scanned and sent me the tech report "On The Complexity Of Homeomorphic Embeddings" by Stillman, which contains exactly the information I want. I am incredibly greatful to Matthew.

Wednesday, January 02, 2008

Associative Lists

Normal lists are indexed by position, an associative list is one which is indexed by some other key. Take for example my fruit basket at the moment: I have 7 apples and 3 bananas. In Haskell the obvious way of writing an associative list is:

fruit :: [(String,Int)]
fruit = [("apple",7),

I can then ask questions about my fruit basket:

-- How many apples do I have?
> lookup "apple" fruit
Just 7
> lookup "orange" fruit

-- How much fruit do I have?
> sum (map snd fruit)

-- What fruit do I have most of?
> fst $ maximumBy (comparing snd) fruit

Since I am storing associative lists as lists of tuples, to search through the list for a key requires O(n) time. By using the Data.Map module, this can be reduced to O(log n).

import qualified Data.Map as Map
fruitMap :: Map.Map String Int
fruitMap = fromList fruit

> lookup "apple" fruitMap
Just 7

Often when writing a large associative list, it can get a bit tiresome with all the (,) tuples. To alleviate this, I tend to write my associative lists as:

fruit = let (*) = (,) in
["apple" * 7
,"banana" * 3

I redefine the (*) operator to be tuple, to remove the extra bracketing. I like the use of * as it has a nice cross-product feel, which goes well with the idea of tuples. I also align the commas at the start of the list, which I think looks more attractive.

Disclaimer: When I started writing this article I had 7 apples, now I only have 6. I also threw away the 3 bananas as they had started to go bad.