Neil Mitchell's Blog (Haskell etc): 2011

Wednesday, December 14, 2011

Enabling Reply To All in Outlook - version 2

I previously described how to enable Reply To All in Outlook. Since then I've modified the VBA script so it works on both Office 2003 and 2007, gives better error messages, automatically replaces the disabled Reply To All buttons, allows editing messages (even if that feature has been disabled) and strips messages of inline images. Installation instructions and links to the code are in the user manual:

http://community.haskell.org/~ndm/darcs/office/ReplyToAll.htm

I don't want to duplicate the installation instructions here, as I will likely revise them over time. However, at the bottom of the manual is a plea to Outlook administrators not to disable Reply To All in the first place. I don't think arguments against disabling essential email functionality can be made often enough, so here is my plea:

Plea to Outlook Administrators

Please do not disable essential email functionality. With the workarounds linked to above, the attempt is futile, but remains deeply inconvenient. Consider the situation where Alice sends an email to Bob, Charlie and Dave asking for some financial details of Mega Corp. Bob has the details on a post-it note on his desk and quickly replies to everyone with the information. But in a world without Reply To All...

Bob replies just to Alice. Charlie, being the helpful soul he is, decides to search through the filing cabinet for the information. 15 minutes later Charlie finds the details and emails Alice, only to be told "Thanks, Bob answered this 15 minutes ago". Charlie realises he just wasted 15 minutes of his life, and goes to get a cookie to make himself feel better.

Bob replies just to Alice. At the end of the week Dave is reviewing his emails and realises that Alice still hasn't got a reply, but before he potentially wastes 15 minutes, he drops Alice an email - "Do you still want those details?". Alice replies "No". Dave concludes that he's a clever bunny for not going to the filing cabinet, and decides a one week latency when replying to Alice is just common sense.

Bob replies, but realising the potential cost of replying just to Alice, also replies to Charlie and Dave. Bob spent a minute retyping the recipient list, and wonders "Why can't Outlook have a button that does this for me?".

Bob replies, also including Charles and Dave. Woops! That should have been Charlie, not Charles. As a best case scenario, Charles gets annoyed with a useless email, but at worst Charles brings down Mega Corp with the sensitive information gleaned from the email. Charlie still ends up going to get a cookie.

Removing Reply To All increases the volume of email required, and increases the risk of email accidents. I've heard only two arguments against Reply To All, both of which are wrong:

When Bob replies to everyone, the financial information about Mega Corp gets sent to Charlie, but what if Charlie shouldn't have access to that information? Of course, if Charlie shouldn't have access to that information then Alice was at fault for sending the request to Charlie in the first place. Bob should also check when sending sensitive information, but most emails are not sensitive (it is a poor default), checking the senders should not require retyping the senders (it is a poor user interface) and by default mailing lists are not fully expanded (so he won't be able to tell the full list of senders anyway).

When Zorg, the owner of Mega Corp, sends a Christmas email to all his employees, what if one of them hits Reply To All? That wastes a lot of company time, and should be prevented - but not on the client side. It is possible to restrict the number of recipients to an email, Zorg could send out the email with everyone in the BCC field, or IT could set up a mailing list which does not permit replies.

Saturday, November 05, 2011

Abstract Generics with Uniplate

Summary: The new version of Uniplate has several wrappers for working with abstract data types, such as Map from the containers package. These wrappers let you transform abstract data types without breaking their invariants.

Abstract data types, such as Map from the containers package, hide their internal structure so they can maintain invariants necessary for their correct operation. Generic programming, such as the Data class used by SYB, allows programs to generically traverse data types, without detailed knowledge of their structure. The two are somewhat incompatible - if Map allows manipulating it's internal structure, the invariants could be broken causing Map to perform incorrectly.

Using the Data class, a Map String Int claims it has no constructors, but if you operate on it with gfoldl is behaves as though it were [(String,Int)]. When Uniplate traverses the Map, such as when searching for an Int, it first analyses the available constructors to see if a Map String Int can possibly contain an Int. Since Map has no constructors, it concludes that Map cannot contain an Int, and Uniplate fails to operate on the contained Int's.

For people who use the Data-style Uniplate module, using the new version of Uniplate you can now correctly operate over Map String Int, provided you use the newtype Map wrapper from Data.Generics.Uniplate.Data.Instances. When you transform over Bool (which does not touch the Map) it will ignore the Map and take O(1). When you transform over Int it will reconstruct the Map in O(n), and if you transform String or Char it will reconstruct the Map in O(n log n). Regardless of what operations you do, it will work efficiently and correctly. As an example:


$ import qualified Data.Map as Map
$ import Data.Char
$ import Data.Generics.Uniplate.Data
$ import Data.Generics.Uniplate.Data.Instances
$ fromMap $ transformBi toUpper $ toMap $ Map.fromList [("haskell",12),("test",18)]
fromList [("HASKELL",12),("TEST",18)]

There are two approaches for dealing with the Map problem in Uniplate. For users of the Direct-style Uniplate module, there is a function plateProject, which has been available for some time. For users of the Data-style Uniplate module, or for users of the SYB package, there is a new module Data.Generics.Uniplate.Data.Instances in uniplate-1.6.4 (released today) which provides three types with special Data instances (Hide, Trigger, Invariant). Using these three types we can construct wrappers providing Data instances for abstract types, and Uniplate includes wrappers for several of the types in the containers package (Map, Set, IntMap, IntSet).

plateProject

The plateProject function helps you define Direct Uniplate instances for abstract types. Instead of defining how to examine the data type, you instead define how to transform the data type into one you can examine, and how to transform it back. As an example:


instance Biplate (Map.Map [Char] Int) Char where
    biplate = plateProject Map.toList Map.fromList

If the types ensure that any operations will not change the keys we can optimise and use the fromDistictAscList function to reconstruct the Map:


instance Biplate (Map.Map [Char] Int) Int where
    biplate = plateProject Map.toAscList Map.fromDistinctAscList

Hide

The Hide data type is useful for wrapping values that you want to ignore with Uniplate. The type is defined as:


data Hide a = Hide {fromHide :: a}

But has a Data instance that pretends it is defined using the extension EmptyDataDecls as:


data Hide a

As an example of avoiding particular values, you can write:


transformBi (+1) (1, 2, Hide 3, Just 4) == (2, 3, Hide 3, Just 5)

As a result of having no constructors, any calls to the methods toConstr or gunfold will raise an error.

Trigger

The Trigger data type is useful for determining when a value was constructed with the Data methods. It is defined as:


data Trigger a = Trigger {trigger :: Bool, fromTrigger :: a}

But the Data instance pretends that it is defined as:


data Trigger a = Trigger a

However, whenever gfoldl or gunfold constructs a new value, it will have the trigger field set to True. The trigger information is useful to indicate whether any invariants have been broken, and thus need fixing. As an example:


data SortedList a = SortedList (Trigger [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Trigger False $ sort xs
fromSortedList (SortedList (Trigger t xs)) = if t then sort xs else xs

This data type represents a sorted list. When constructed the items are initially sorted, but operations such as gmapT could break that invariant. The Trigger type is used to detect when the Data operations have been performed, and resort the list.

The Trigger type is often used in conjunction with Invariant, which fixes the invariants.

Invariant

The Invariant data type is useful for ensuring that an invariant is always applied to a data type. It is defined as:


data Invariant a = Invariant {invariant :: a -> a, fromInvariant :: a}

But the Data instance pretends that it is defined as:


data Invariant a = Invariant a

Whenever a gfoldl constructs a new value, it will have the function in the invariant field applied to it. As an example:


data SortedList a = SortedList (Invariant [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Invariant sort (sort xs)
fromSortedList (SortedList (Invariant _ xs)) = xs

Any time an operation such as gmapT is applied to the data type, the invariant function is applied to the result. The fromSortedList function can then rely on this invariant.

The gunfold method is partially implemented - all constructed values will have an undefined value for all fields, regardless of which function is passed to fromConstrB. If you only use fromConstr (as Uniplate does) then the gunfold method is sufficient.

Map

Using the Hide, Trigger and Invariant types, we can define a wrapper for the containers Map type as:


newtype Map k v = Map (Invariant (Trigger [k], Trigger [v], Hide (Map.Map k v)))
    deriving (Data, Typeable)

The Map type is defined as an Invariant of three components - the keys, the values, and the underlying Map. We use Invariant to ensure that the keys/values/map always remain in sync. We use Trigger on the keys and values to ensure that whenever the keys or values change we rebuild the Map, but if they don't, we reuse the previous value. The function to extract a containers Map from the wrapper requires only simple pattern matching:


fromMap (Map (Invariant _ (_,_,Hide x))) = x

The function to wrap a containers Map is slightly harder, as we need to come up with an invariant restoring function:


toMap :: Ord k => Map.Map k v -> Map k v
toMap x = Map $ Invariant inv $ create x
    where
        create x = (Trigger False ks, Trigger False vs, Hide x)
            where (ks,vs) = unzip $ Map.toAscList x

        inv (ks,vs,x)
            | trigger ks = create $ Map.fromList $ zip (fromTrigger ks) (fromTrigger vs)
            | trigger vs = create $ Map.fromDistinctAscList $ zip (fromTrigger ks) (fromTrigger vs)
            | otherwise = (ks,vs,x)

The create function creates a value from a Map, getting the correct keys and values. The inv function looks at the triggers on the keys/values. If the keys trigger has been tripped, then we reconstruct the Map using fromList. If the values trigger has been tripped, but they keys trigger has not, we can use fromDistinctAscList, reducing the complexity of constructing the Map. If nothing has changed we can reuse the previous value.

The end result is that all Uniplate (or SYB) traversals over Map result in a valid value, which has had all appropriate transformations applied.

Tuesday, November 01, 2011

Haskell Implementors Workshop 2011

The videos from the Haskell Implementors Workshop 2011 are now online at YouTube. I'm currently uploading them to Vimeo, but that will take another week (I ran out of free quota).

Many thanks to Kento EMOTO who did all the editing and conversion, and to all the speakers for the great talks.

Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particular with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

However, to make a function that is compatible with R, we have to follow two rules:

Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)
The result must be IO ()

Obeying these restrictions, we need to use the type:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...

Instead of passing in the list xs, we now pass in:

n, the length of the list xs
xs, the elements of the list
result, a space to put the result

We can implement sumRootsR by using the functions available in the Foreign module:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

Putting everything together, we end up with the Haskell file:

-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

We also need a C stub file. The one described in the GHC users guide works well:

// StartEnd.c
#include <Rts.h>

void HsStart()
{
int argc = 1;
char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

// Initialize Haskell runtime
char** args = argv;
hs_init(&argc, &args);
}

void HsEnd()
{
hs_exit();
}

We can now compile our library with the commands:

ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o

This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:

dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")

We can now invoke our function:

input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result

This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:

sumRoots <- function(input)
{
return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}

Now we can write:

sumRoots(c(12,444.34))

And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.

Update: See the comments below from Alex Davis for how to do such things on newer versions of Mac OS.

Saturday, October 01, 2011

Haskell Implementors Workshop 2011 slides

I've just uploaded the slides for all the talks given at the Haskell Implementors Workshop 2011 - just click on "Slides" next to any of the talks. Many thanks to all the speakers for very interesting talks, and for sending me their slides.

There will be video, hosted on either YouTube or Vimeo, at some point in the future. I've got all the copyright permission forms, but I don't yet have the video footage. I'm currently working to get a copy of the video sent from Japan.

Template Haskell fights with Generic Programming

Summary: The InfixE construction in Template Haskell fits poorly with generic programming, because its type does not capture all its restrictions. This mismatch can result in confusing bugs, but there is a simple workaround.

I have often said that anyone manipulating abstract syntax trees, without using some form of generic programming, is doing it wrong. Recently I have been manipulating Template Haskell syntax trees using Uniplate, my preferred generic programming library. Consider the problem of replacing all instances of delete with deleteBy (==) - this task can be done with Template Haskell:


{-# LANGUAGE TemplateHaskell #-}
module IHateDelete where
import Data.List
import Language.Haskell.TH
import Data.Generics.Uniplate.Data

iHateDelete :: Q [Dec] -> Q [Dec]
iHateDelete = fmap (transformBi f)
    where
        f :: Exp -> Exp
        f (VarE x) | x == 'delete = VarE 'deleteBy `AppE` VarE '(==)
        f x = x

We can test this function with:


{-# LANGUAGE TemplateHaskell #-}
import IHateDelete
import Data.List

$(iHateDelete
    [d|
        mapDelete x = map (delete x)
        myElem x xs = length (delete x xs) /= length xs
    |])

To see the result of running iHateDelete pass the flag -ddump-splices. As far as we can tell, our iHateDelete function works perfectly. But wait - let's try another example:


$(iHateDelete
    [d|
        myDelete x xs = x `delete` xs
    |])

In GHC 6.12, we get a GHC panic. In GHC 7.2 we get the error message:


Operator application with a non-variable operator: deleteBy (==)
(Probably resulting from a Template Haskell splice)

(I would find this message far clearer if it said "Infix application..." rather than "Operation application...")

The body of myDelete starts out as:


InfixE (Just (VarE 'x)) (VarE 'delete) (Just (VarE' xs))

After our transformation, this becomes:


InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) ('(==))) (Just (VarE' xs))

Or, written in Haskell syntax:


x `deleteBy (==)` xs

This expression is not valid Haskell, and causes an error when spliced back in (when inserted back into the Haskell code).

The Problem

The underlying problem is called out in the Template Haskell Exp documentation:


InfixE (Maybe Exp) Exp (Maybe Exp)
    -- ^ It's a bit gruesome to use an Exp as the operator, but how else can we distinguish constructors from non-constructors?
    --   Maybe there should be a var-or-con type? Or maybe we should leave it to the String itself?

The operator in InfixE has a type which permits any expression, but has the restriction that when spliced back in the expression must only be a VarE or ConE. This restriction poses a problem for generic programming, where values are treated in a uniform manner. Sadly, both of the suggested fixes would also cause problems for generic programming. Perhaps the true fix is to let Haskell have arbitrary expressions for infix operators? Or perhaps Template Haskell should translate InfixE to AppE if the operator is incompatible with Haskell?

The Workaround

As a workaround, you can translate away all InfixE expressions that have a complex middle expression. I use the following function:


fixupInfix :: [Dec] -> [Dec]
fixupInfix = transformBi f
    where
        bad VarE{} = False
        bad ConE{} = False
        bad _ = True

        f (InfixE a b c) | bad b = case (a,c) of
            (Nothing, Nothing) -> b
            (Just a , Nothing) -> b `AppE` a
            (Nothing, Just c ) -> VarE 'flip `AppE` b `AppE` c
            (Just a , Just c ) -> b `AppE` a `AppE` c
        f x = x

The original iHateDelete can then be modified to become:


iHateDelete = fmap (fixupInfix . transformBi f)

Tuesday, September 06, 2011

Faster WPF ComboBoxes

If you create a WPF ComboBox with 2000 items it will take about 2 seconds to drop down, and about 1 second to retract back. But you can make all operations complete instantly if the items are simple (such as strings). If you are writing XAML, add:


<ComboBox>
  <ComboBox.ItemsPanel>
    <ItemsPanelTemplate>
      <VirtualizingStackPanel />
    </ItemsPanelTemplate>
  </ComboBox.ItemsPanel>

  ...
</ComboBox>

If you are writing C#, add:


comboBox.ItemsPanel = new ItemsPanelTemplate(new FrameworkElementFactory(typeof(VirtualizingStackPanel)));

Thanks to Henry Hahn for the XAML code I started from. Reading the C# version, I am reminded of the Hammer Factory joke.

Sunday, September 04, 2011

Sharing in Haskell

Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.

An easy way to improve performance is to call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:

Example 1


f x y = sqrt x + y

result = f 1 2 + f 1 4

Example 2


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

result = let f_1 = f 1 in f_1 2 + f_1 4

Question

In each example, how many times is sqrt executed to compute result? (Assume no advanced optimisations - these often break down on larger examples.)

Answer

In Example 1 we execute sqrt twice, while in Example 2 we execute sqrt once. To go from Example 1 to Example 2 we need to make two changes:

Step 1: Rewrite f to compute sqrt after one argument instead of two.

Step 2: Rewrite result to share the result of f with one argument.

Performing either rewrite alone will still result in sqrt being executed twice.

Step 1: Rewriting f

Let's take a look at the original definition of f:


f x y = sqrt x + y

Rewriting this function in English, we can describe it as:


given x and y, compute sqrt x + y

But the computation of sqrt x does not depend on y. If the computation of sqrt x is expensive, and if we know the function will often be called with the same x for many different values of y, it is better to describe it as:


given x, compute sqrt x, then given y, add that value to y

The Haskell syntax for this description is:


f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y

Which would usually be written in the equivalent declaration form as:


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y

Step 2: Using the rewritten f

If we look at the definition of result:


result = f 1 2 + f 1 4

We see that the subexpression f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:


result = let f_1 = f 1 in f_1 2 + f_1 4

With the original definition of f, commoning up f 1 would have had no performance benefit - after f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of f, the value f_1 will create the computation of sqrt 1, which will be performed only once when executed by f_1 2 and f_1 4.

The Optimisation

This optimisation technique can be described as:

Step 1: Rewrite the function to perform some computation before all arguments are supplied.

Step 2: Share the partially applied function.

Crucially the function in Step 1 must take it's arguments in an order that allows computation to be performed incrementally.

A Practical Example

In previous versions of Hoogle, the function I wrote to resolve type synonyms (e.g. type String = [Char]) was:


resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig

Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:


resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x

This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:


data SynonymTable

buildSynonymTable :: [Synonym] -> SynonymTable

resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig

The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.

Conclusion

I think all Haskell programmers benefit from understand how the interaction of lambda and let affect sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public API's that rely on partial application to get acceptable performance - it's too hard to discover.

Friday, August 19, 2011

IFL 2011 submission

Summary: Submit to IFL 2011. It's a great breeding ground for ideas.

The IFL 2011 submission deadline has been extended until 31st August, and, in a move I thoroughly agree with, the notification of acceptance is now within 24 hours of submission!

IFL (and TFP) do not work in the same way as ICFP/Haskell Symposium. You submit a paper, describing the work you've done (something functional programming related) and then present the paper. After the conference, you resubmit the paper, and the result is reviewed in greater depth. There is a great opportunity to learn from the experience, and produce a second paper that is far better.

I submitted my original supercompilation work to IFL 2007. You can compare the paper I submitted before the conference with the paper I submitted afterwards. Note that the original paper doesn't even mention the word supercompilation! At the conference I talked to many people, learnt a lot about supercompilation, program optimisation and termination orderings (and lots of other interesting topics). Using this experience, and building on the enthusiasm people had expressed, I was able to produce a much better second paper. My work on supercompilation went into my thesis, and lead to a subsequent ICFP paper.

I think that IFL/TFP are great venues for to present your work, and in presenting and discussing it, to understand more about the work - making it better in the process. You have just under 2 weeks to make the IFL 2011 deadline of 31st August.

Sunday, July 17, 2011

Submit to the Haskell Implementors Workshop

Are you going to ICFP 2011 in Japan? Do you work with Haskell? You should consider offering a talk for the Haskell Implementors Workshop! You have until this coming Friday (22nd July 2011) to offer a talk, by emailing an abstract (less than 200 words) to benl -AT- cse.unsw.edu.au - for full details see the Call for Talks. Talks are 20 minutes long, plus 10 minutes for questions.

What topics are suitable?

If you have been doing or thinking stuff that would interest a bunch of Haskell implementors in a pub, you probably have something suitable for a talk. The aim of the workshop is to provide an "opportunity to bat around ideas, share experiences, and ask for feedback from fellow experts". Unlike ICFP or the Haskell Symposium, there are no published proceedings. If you have some work that isn't quite ready for a more formal setting, this workshop is a useful venue to seek feedback. If you have an idea which seems a bit crazy, it's probably also very interesting!

What if I have less to say?

In addition to the 30 minute talks, the Haskell Implementors Workshop will also have 2-10 minute lightening talks, organised on the day. Attend the workshop, and sign up on the day to give a brief talk.

Saturday, June 18, 2011

Changing the Windows titlebar gradient colors

Summary: This post describes how to change the gradient colors of the titlebar of a single window on Windows XP, using C#.

On Windows XP the titlebar of most windows have a gradient coloring:

For various reasons, I wanted to override the titlebar gradient colors on a single window. The SetSysColors function can change the colors, but that applies to all windows and immediately forces all windows to repaint, which is slow and very flickery. You can paint the titlebar your self (like Chrome does), but then you need to draw and handle min/max buttons etc. After some experimentation, using the unofficial SetSysColorsTemp function, I was able to change the gradient for a single window. As an example, here are some screenshots, overriding the left color with green, then overriding both the colors with green/blue and orange/blue:

Disclaimer: This code uses an undocumented function, and while it seems to work robustly on my copy of Windows XP, it unlikely to work on future versions of Windows. Generally, it is a bad idea to override user preferences - for example the titlebar text may not be visible with overridden gradient colors.

All the necessary code is available at the bottom of this post, and can be used by anyone, for any purpose. The crucial function is SetSysColorsTemp, which takes an array of colors and an array of brushes created from those colors. If you call SetSysColorsTemp passing a new array of colors/brushes they will override the global system colors until a reboot, without forcing a repaint. Passing null for both arrays will restore the colors to how they were before. To make the color change only for the current window we hook into WndProc, and detect when the colors are about to be used in a paint operation. We set the system colors before, call base.WndProc which paints the titlebar (using our modified system colors), then put the colors back.

There are two known issues:

1) If you change the colors after the titlebar has drawn, it will not use the colors until the titlebar next repaints. I suspect this problem is solvable.
2) As you can see in the demo screenshots, the area near the min/max buttons does not usually get recolored. If you only change the left gradient color this doesn't matter.


public class GradientForm : Form
{
    // Win32 API calls, often based on those from pinvoke.net
    [DllImport("gdi32.dll")] static extern bool DeleteObject(int hObject);
    [DllImport("user32.dll")] static extern int SetSysColorsTemp(int[] lpaElements, int [] lpaRgbValues, int cElements);
    [DllImport("gdi32.dll")] static extern int CreateSolidBrush(int crColor);
    [DllImport("user32.dll")] static extern int GetSysColorBrush(int nIndex);
    [DllImport("user32.dll")] static extern int GetSysColor(int nIndex);
    [DllImport("user32.dll")] private static extern IntPtr GetForegroundWindow();

    // Magic constants
    const int SlotLeft = 2;
    const int SlotRight = 27;
    const int SlotCount = 28; // Math.Max(SlotLeft, SlotRight) + 1;

    // The colors/brushes to use
    int[] Colors = new int[SlotCount];
    int[] Brushes = new int[SlotCount];

    // The colors the user wants to use
    Color titleBarLeft, titleBarRight;
    public Color TitleBarLeft{get{return titleBarLeft;} set{titleBarLeft=value; UpdateBrush(SlotLeft, value);}}
    public Color TitleBarRight{get{return titleBarRight;} set{titleBarRight=value; UpdateBrush(SlotRight, value);}}

    void CreateBrushes()
    {
        for (int i = 0; i < SlotCount; i++)
        {
            Colors[i] = GetSysColor(i);
            Brushes[i] = GetSysColorBrush(i);
        }
        titleBarLeft = ColorTranslator.FromWin32(Colors[SlotLeft]);
        titleBarRight = ColorTranslator.FromWin32(Colors[SlotRight]);
    }

    void UpdateBrush(int Slot, Color c)
    {
        DeleteObject(Brushes[Slot]);
        Colors[Slot] = ColorTranslator.ToWin32(c);
        Brushes[Slot] = CreateSolidBrush(Colors[Slot]);
    }

    void DestroyBrushes()
    {
        DeleteObject(Brushes[SlotLeft]);
        DeleteObject(Brushes[SlotRight]);           
    }

    // Hook up to the Window

    public GradientForm()
    {
        CreateBrushes();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) DestroyBrushes();
        base.Dispose(disposing);
    }

    protected override void WndProc(ref System.Windows.Forms.Message m) 
    {
        const int WM_NCPAINT = 0x85; 
        const int WM_NCACTIVATE = 0x86;

        if ((m.Msg == WM_NCACTIVATE && m.WParam.ToInt32() != 0) ||
            (m.Msg == WM_NCPAINT && GetForegroundWindow() == this.Handle))
        {

            int k = SetSysColorsTemp(Colors, Brushes, Colors.Length);
            base.WndProc(ref m); 
            SetSysColorsTemp(null, null, k);
        }
        else
            base.WndProc(ref m); 
    }
}

Tuesday, June 07, 2011

INLINE pragmas in the Safe library

Summary: The Safe library has lots of small functions, none of which have INLINE pragmas on them. The lack of INLINE pragmas probably has no effect on the optimisation.

I was recently asked a question about the Safe library:

I was wondering why you didn't use any INLINE pragmas in the Safe library. I'm not a Haskell expert yet, but I've noticed that in many other libraries most one-liners are annotated by INLINE pragmas, so is there any reason you didn't add them to Safe as well?

When compiling with optimisations, GHC tries to inline functions/values that are either "small enough" or have an INLINE pragma. These functions will be inlined within their module, and will also be placed in their module's interface file, for inlining into other modules. The advantage of inlining is avoiding the function call overhead, and possibly exposing further optimisations. The disadvantage is that the code size may grow, which may result in poor utilisation of the processor's instruction cache.

The Safe library contains wrappers for lots of partial Prelude and Data.List functions (i.e. head), with versions that don't fail, or fail with more debugging information. Using the tail function as an example, the library supplies four additional variants:

tail :: [a] -> [a], crashes on tail []

tailNote :: String -> [a] -> [a], takes an extra argument which supplements the error message

tailDef :: [a] -> [a] -> [a], takes a default to return instead of errors

tailMay :: [a] -> Maybe [a], wraps the result in a Maybe

tailSafe :: [a] -> [a], returns some sensible default if possible, [] in the case of tail

As the questioner correctly noted, there are no INLINE pragmas in the library. Taking the example of tailSafe, it is defined as:


tailSafe :: [a] -> [a]
tailSafe = liftSafe tail null

liftSafe :: (a -> a) -> (a -> Bool) -> (a -> a)
liftSafe func test val = if test val then val else func val

Given this indirection, and the lack of an INLINE pragma, is calling tailSafe as cheap as you might hope? We can answer this question by looking at the interface file at the standard optimisation level, by running ghc -ddump-hi Safe.hs -O -c. Looking at the definition attached to tailSafe, we see:


tailSafe = \val -> case val of
    [] -> []
    ds1:ds2 -> ds2

The tailSafe function has been optimised as much as it can be, and will be inlined into other modules. Adding an INLINE pragma would have no effect at standard optimisation. When compiling without optimisation, even with an INLINE pragma, the function is not included in the module's interface. Adding the INLINE pragma to tailSafe is of no benefit.

Let us now look at all functions in the Safe module. When compiled without optimisation (-O0) no functions are placed in the interface file. At all higher levels of optimisation (-O1 and -O2) the interface files are identical. Looking through the functions, only 6 functions are not included in the interface files:

abort

The abort function is defined as:


abort = error

Neither adding an INLINE pragma or eta expanding (adding an additional argument) includes the function in the interface file. I suspect that GHC decides no further optimisation is possible, so doesn't ever inline abort.

atMay and $watMay

The function atMay is recursive, and thus cannot be inlined even when it's definition is available, so GHC sensibly omits it from the interface file. The function $watMay is the worker/wrapper definition of atMay, which is also recursive.

readNote and readMay

The functions readNote and readMay both follow exactly the same pattern, so I describe only readNote. The function readNote is quite big, and GHC has split it into two top-level functions - producing both readNote and readNote1. The function readNote is included in the interface file, but readNote1 is not. The result is that GHC will be able to partially inline the original definition of readNote - however, the read functions tend to be rather complex, so GHC is unlikely to benefit from the additional inlining it has managed to acheive.

atNote

The atNote function is bigger than the inlining threshold, so does not appear in the interface file unless an INLINE pragma is given. Since the inner loop of atNote is recursive, it is unlikely that inlining would give any noticable benefit, so GHC's default behaviour is appropriate.

Conclusion

Having analysed the interface files, I do not think it is worth adding INLINE pragmas to the Safe library - GHC does an excellent job without my intervention. My advice is to only add INLINE pragmas after either profiling, or analysing the resulting Core program.

Sunday, May 22, 2011

Hoogle talk from TFP 2011 [PDF]

Last week I went to TFP 2011, and gave a talk on Hoogle entitled "Finding Functions from Types". The slides are now available online. These slides give some information about how the type searching works in Hoogle, and I intend to write further details in the future.

Sunday, May 08, 2011

CmdArgs is not dangerous

Summary: CmdArgs can be used purely, and even if you choose to use it in an impure manner, you don't need to worry.

As a result of my blog post yesterday, a few people have said they have been put off using CmdArgs. There are three reasons why you shouldn't be put off using CmdArgs, even though it has some impure code within it.

1: You can use CmdArgs entirely purely

You do not need to use the impure code at all. The module System.Console.CmdArgs.Implict provides two ways to write annotated records. The first way is impure, and uses cmdArgs and &=. The second way is pure, and uses cmdArgs_ and +=. Both ways have exactly the same power. For example, you can write either of:


sample = cmdArgs $
    Sample{hello = def &= help "World argument" &= opt "world"}
    &= summary "Sample v1"

sample = cmdArgs_ $
    record Sample{} [hello := def += help "World argument" += opt "world"]
    += summary "Sample v1"

The first definition is impure. The second is pure. Both are equivalent. I prefer the syntax of the first version, but the second version is not much longer or uglier. If the impurities scare you, just switch. The Implicit module documentation describes four simple rules for converting between the two methods.

2: You do not need to use annotated records at all

Notice that the above module is called Implicit, CmdArgs also features an Explicit version where you create a data type representing your command line arguments (which is entirely pure). For example:


arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
     [flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
     ,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
     ,flagHelpSimple (("help",""):)]
     where upd msg x v = Right $ (msg,x):v

Here you construct modes and flags explicitly, and pass functions which describe how to update the command line state. Everything in the Implicit module maps down to the Explicit module. In addition, you can use the GetOpt compatibility layer, which also maps down to the Explicit parser.

Having written command line parsers with annotated records, I'd never go back to writing them out in full. However, if you want to map your own command line parser description down to the Explicit version you can get help messages and parsing for free.

3: I fight the optimiser so you don't have to

Even if you choose to use the impure implicit version of CmdArgs, you don't need to do anything, even at high optimisation levels. I have an extensive test suite, and will continue to ensure CmdArgs programs work properly - I rely on it for several of my programs. While I find the implicit impure nicer to work with, I am still working on making a pure version with the same syntax, developing alternative methods of describing the annotations, developing quasi-quoters etc.

Saturday, May 07, 2011

CmdArgs - Fighting the GHC Optimiser

Summary: Everyone using GHC 7 should upgrade to CmdArgs 0.7. GHC's optimiser got better, so CmdArgs optimiser avoidance code had to get better too.

Update: Has this post scared you off CmdArgs? Read why CmdArgs is not dangerous.

CmdArgs is a Haskell library for concisely specifying the command line arguments, see this post for an introduction. I have just released versions 0.6.10 and 0.7 of CmdArgs, which are a strongly recommended upgrade, particular for GHC 7 users. (The 0.6.10 is so anyone who specified 0.6.* in their Cabal file gets an automatic upgrade, and 0.7 is so people can write 0.7.* to require the new version.)

Why CmdArgs is impure

CmdArgs works by annotating fields to determine what command line argument parsing you want. Consider the data type:


data Opts = Opts {file :: FilePath}

By default this will create a command line which would respond to myprogram --file=foo. If instead we want to use myprogram foo we need to attach the annotation args to the file field. We can do this in CmdArgs with:


cmdArgs $ Opts {file = "" &= args}

However, CmdArgs does not have to be impure - you can equally use the pure variant of CmdArgs and write:


cmdArgs_ $ record Opts{} [file := "" += args]

Sadly, this code is a little more ugly. I prefer the impure version, but everything is supported in both versions.

I am still experimenting with other ways of writing an annotated record, weighing the trade-offs between purity, safety and the syntax required. I have been experimenting with pure variants tonight, and hope in a future release to make CmdArgs pure by default.

GHC vs CmdArgs

Because CmdArgs uses unsafe and untracked side effects, GHC's optimiser can manipulate the program in ways that change the semantics. A standard example where GHC's optimiser can harm CmdArgs is:


Opts2 {file1 = "" &= typFile, file2 = "" &= typFile}

Here the subexpression "" &= typFile is duplicated, and if GHC spots this duplication, it can use common sub-expression eliminate to transform the program to:


let x = "" &= typFile in Opts2 {file1 = x, file2 = x}

Unfortunately, because CmdArgs is impure, this program attaches the annotation to file1, but not file2.

This optimisation problem happens in practice, and can be eliminated by writing {-# OPTIONS_GHC -fno-cse #-} in the source file defining the annotations. However, it is burdensome to require all users of CmdArgs to add pragmas to their code, so I investigated how to reduce the problem.

Beating GHC 6.10.4

The key function used to beat GHC's optimiser is &=. It is defined as:


(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn x y

In order to stop CSE, I use two tricks. Firstly, I mark &= as INLINE, so that it's definition ends up in the annotations - allowing me to try and modify the expression so it doesn't become suitable for CSE. For GHC 6.10.4 I then made up increasingly random expressions, with increasingly random pragmas, until the problem went away. The end solution was:


{-# INLINE (&=) #-}
(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn (id_ x) (id_ y)

{-# NOINLINE const_ #-}
const_ :: a -> b -> b
const_ f x = x

{-# INLINE id_ #-}
id_ :: a -> a
id_ x = const_ (\() -> ()) x

Beating GHC 7.0.1

Unfortunately, after upgrading to GHC 7.0.1, the problem happened again. I asked for help, and then started researching GHC's CSE optimiser (using the Hoogle support for GHC I added last weekend - a happy coincidence). The information I found is summarised here. Using this information I was able to construct a much more targeted solution (which I can actually understand!):


{-# INLINE (&=) #-}
(&=) x y = addAnn x (unique x)

{-# INLINE unique #-}
unique x = case unit of () -> x
    where unit = reverse "" `seq` ()

As before, we INLINE the &=, so it gets into the annotations. Now we want to make all annotations appear different, even though they are the same. We use unique which is equivalent to id, but when wrapped around an expression causes all instances to appear different under CSE. The unit binding has the value (), but in a way that GHC can't reduce, so the case does not get eliminated. GHC does not CSE case expressions, so all annotations are safe.

How GHC will probably defeat CmdArgs next

There are three avenues GHC could explore to defeat CmdArgs:

Reorder the optimisation phases. Performing CSE before inlining would defeat everything, as any tricks in &= would be ignored.

Better CSE. Looking inside case would defeat my scheme.

Reduce unit to (). This reduction could be done in a number of ways:
- Constructor specialisation of reverse should manage to reduce reverse to "", which then seq can evaluate, and then eliminate the case. I'm a little surprised this isn't already happening, but I'm sure it will one day.
- Supercompilation can inline recursive functions, and inlining reverse would eliminate the case.
- If GHC could determine reverse "" was total it could eliminate the seq without knowing it's value. This is somewhat tricky for reverse as it isn't total for infinite lists.

Of these optimisations, I consider the reduction of unit to be most likely, but also the easiest to counteract.

How CmdArgs will win

The only safe way for CmdArgs to win is to rewrite the library to be pure. I am working on various annotation schemes and hope to have something available shortly.

Sunday, May 01, 2011

Searching GHC with Hoogle

Summary: Hoogle can now search the GHC source code. There are also lots of small improvements in the latest version.

A few weeks ago Ranjit Jhala asked me for help getting Hoogle working on the GHC documentation. As a result of this conversation, I've now released Hoogle 4.2.3, and upgraded the Hoogle web tool.

For GHC developers

You can search the GHC documentation using the standard Hoogle website, for example: llvm +ghc

To search within a package simply write +package in your search query. The ghc package on Hoogle includes all the internals for GHC.

If you want to search using the console, you can install Hoogle and generate the GHC package database with:


cabal update
cabal install hoogle
hoogle data default ghc

You can now perform searches with:


hoogle +ghc llvm

For all Hoogle users

The new release of Hoogle contains a number of small enhancements:

The web server has been upgraded to Warp. I'll write a blog post shortly on the move to Warp - but generally it's been a very positive step.

Some of the snippets of documentation have been fixed, where the markup was interpreted wrongly.

There is only an expand button next to the documentation if there is more information to expand.

Some iPad integration, so you can now add it to your home page with a nice icon.

Work on a deployment script to automate uploading a new version to the web server, which will allow for more frequent updates (until now it took over 2 hours to deploy a new version).

Updates as some web resources moved around, particularly the Haskell Platform cabal file.

The theory behind Hoogle

I'll be talking about the theory behind type searching in Hoogle at Trends in Functional Programming 2011 in Madrid in a few weeks time. It's not too late to register.

Friday, April 29, 2011

Darcs for my Wife

Summary: Using the version control system Darcs is simple. This document describes the commands I use in my workflow.

Darcs is my favourite version control system because of its simple user interface. I am able to do everything I need with a handful of simple commands. I work with darcs as a single user, without branches, occasionally accepting small external contributions. This guide is designed to provide a sufficient reference for a someone who isn't a computer expert to continue using Darcs after their friendly comp-sci has set up the SSH bits for them (e.g. my wife).

A Darcs repository (or repo) is a set of changes (sometimes referred to as patches). There are three places changes may live:

Remote Repo, such as on community.haskell.org.

Local Repo, on your computer. You may have many local repos all using the same remote repo, especially if you use multiple computers.

Local Changes, modifications you have made to your local repo, by editing files.

The way changes move between locations is described by:

Basic Commands

I use four commands daily, shown in red on the above diagram, which are:

darcs pull - copy changes from the remote repo to your local repo. Use pull at the beginning of the day, and during the day if you are sharing a remote repo with other people.

darcs whatsnew - see what local changes you have made. A useful flag is --summary, which lists only the names of files which have changed.

darcs record - after you have decided to keep your changes, use record to store them in your local repo.

darcs push --no-set-default ndm@code.haskell.org:/srv/code/hoogle - at the end of the day use push to move your changes to the remote repo. This command is the most tricky to use, you need SSH access to the server and need to use the file path of the repo, not the URL exposed by the website. I recommend writing a .bat file to automate this command, or using the neil push command if appropriate.

In addition to the four commands above, there are two additional basic commands that are used more rarely:

darcs add filename - tell darcs that you have created a new file that should be kept in the repo. If you forget to add a file, darcs will not store it for you.

darcs mv oldname newname - rename a file stored in darcs. Use this command instead of directly renaming the file using a file explorer.

Creating Repos

Usually I work with existing repos, but it's useful to know how to create new repos:

darcs init - create an empty remote repo, with no changes in it. This command is usually run on the server.

darcs get http://code.haskell.org/hoogle - create a new local repo based on a remote repo.

Advanced Commands

These commands are useful to correct mistakes, but they aren't essential - you can do everything you need without them.

darcs revert - undo local changes that have not yet been recorded.

darcs unrecord - undo a previous record command. Do not unrecord if you have pushed the changes, or have done substantial work since the initial record. Instead, manually create a new change undoing the previous record, and record that.

Collaboration Commands

If you are accepting occasional contributions from other people, you will need the following commands:

darcs send -o filename.patch - make a file containing all the changes in your local repo, but not in the remote repo. This file can be emailed to other people.

darcs apply filename.patch - apply a set of changes sent to you by email.

Conclusion

Effective use of darcs requires only a few simple commands, with only a few command line switches. This guide includes all the commands I ever use. Darcs rarely surprises me, and I think it is a suitable version control system for people who are familiar with the command line, but aren't computer experts.

Sunday, March 20, 2011

Experience Report: Functional Programming through Deep Time

My wife has just completed a draft of the experience report she's intending to submit to ICFP 2011. It's called Functional Programming through Deep Time:

This experience report describes how Haskell was used to model the beginnings of complex life on Earth. My work combines ecological modeling in Haskell with statistical analysis in R, to answer some long standing paleontological questions. For my work, I found that neither Haskell nor R was suffcient - statistical analysis in Haskell is overly burdensome, while R lacks the structure to express complex algorithms in a maintainable manner. The reaction from my colleagues has ranged from indifferent to excited - but I have yet to tempt any of them over to the pure side!

I initially persuaded my wife to switch to Haskell, but since then, I have had little involvement with her code. If you have any feedback for her (ideally before Wednesday!) please leave it in the comments.

Sunday, March 13, 2011

Hoogle for your language (i.e. F#, Scala, ML, Clean...)

Summary: If you offer to help, I'll make Hoogle search your statically typed functional language.

Hoogle is a search engine for Haskell functions, that allows you to search by either name, or by type. But very little of Hoogle is actually Haskell specific - most is applicable to any language with a Hindley-Milner based type system.

Recently I have been asked by several people what they can do to allow Hoogle to search their preferred language. There are four steps to integrating a language with Hoogle, detailed below. If you are interested in helping please email me - I already have volunteers for both F# and Scala, but additional volunteers for other languages are welcome.

To allow searching a language from Hoogle, there are four steps:

1) A volunteer needs to generate some Hoogle input files containing details of the modules/functions/packages etc. to be searched. These files should be plain text, but can be in a language specific format - i.e. ML syntax for type signatures. For a rough idea of how these files could look see this example - for Haskell I get these files from Hackage. The code to generate these input files can be written in any language, and can live outside Hoogle.

2) Someone needs to write a parser that converts these language specific inputs into internal Hoogle representations. The equivalent code for Haskell is in the Hoogle repo. If a volunteer writes this code, I'll happily use it. If I have to write this code then that's OK, although I might take a bit longer. This code needs to be written in Haskell and live inside Hoogle.

At this stage, Hoogle will be able to search the new language. The remaining stages will just make the experience more pleasant.

3) Someone needs to write a query parser for the language, inside Hoogle. I may do this, as I'm intending to rewrite the Haskell query parser anyway, and I could probably find some savings by doing them together. This code needs to be written in Haskell and live inside Hoogle.

4) A volunteer would be useful to keep the function definitions up to date, generate new definitions, and ensure they get uploaded.

Email me if you want to volunteer!

Sunday, February 13, 2011

Corner Cases And Zero (and how WinForms gets it wrong)

Summary: Zero is a perfectly good number and functions should deal with it sensibly. In WinForms, both the Bitmap object and the DrawToBitmap function fail on zero, which is wrong. Functional programming (and recursion) make it harder to get the corner cases wrong.

Lots of programming is about reusing existing functions/objects. Many types have natural corner cases, e.g. an empty string, the number zero, and an array with zero elements. If the functions you reuse don't deal sensibly with corner cases your functions are likely to contain bugs, or be more verbose in working around other peoples bugs.

C#/WinForms has bugs with zero

Let's write a function in C#/WinForms which given a Control (something that can be displayed) produces a Bitmap of how it will be drawn:


public Bitmap Draw(Control c)
{
    Bitmap bmp = new Bitmap(c.Width, c.Height);
    c.DrawToBitmap(bmp, new Rectangle(0, 0, c.Width, c.Height));
    return bmp;
}

Our function Draw makes use of the existing WinForms function DrawToBitmap:


void Control.DrawToBitmap(Bitmap bitmap, Rectangle bounds)

The function DrawToBitmap draws the control into bitmap at the position specified by bounds. This function is useful, but impure (it mutates the bitmap argument), and somewhat fiddly (bounds have to satisfy various invariants with respect to the bitmap and the control). Our Draw function only handles the common case where you want the entire bitmap, but is pure and simpler. (Our Draw function can be renamed DrawToBitmap and added as an extension method of Control, making it quite convenient to use.)

Unfortunately our Draw function has a bug, due to the incorrect handling of zero in the functions we rely on. Let's consider a control with width 0, and height 10. First we crash with the exception "Parameter is not valid." when executing:


new Bitmap(0, 10);

Unfortunately the .NET Bitmap type doesn't allow bitmaps which don't contain any pixels. This limitation probably comes from the CreateBitmap Win32 API function, which doesn't allow empty bitmaps. The result is that our function cannot return a 0x10 bitmap, meaning that lots of nice properties (e.g. the resulting bitmap will be the same size as the control) are necessarily violated. We can patch around the limitations of Bitmap by writing:


Bitmap bmp = new Bitmap(Math.Max(1, c.Width), Math.Max(1, c.Height));

This change is horrid, but it's the best we can do within the limitations of the .NET Bitmap type. We run again and now get the exception "targetbounds" when executing:


c.DrawToBitmap(bmp, new Rectangle(0, 0, 0, 10));

Unfortunately DrawToBitmap throws an exception when either the width or height of the bounds is zero. We have to add another workaround to avoid calling DrawToBitmap in these cases (or at this stage perhaps just add an if at the top which returns early if either dimension is 0). The Bitmap limitation is annoying but somewhat understandable - it is driven by legacy code. However, DrawToBitmap could easily have been modified to accept 0 width or height and simply avoid doing anything, which would be the only sensible behaviour at this corner case.

The problem with bugs in corner cases is that they propagate. Bitmap has a limitation, so everything which uses Bitmap inherits this limitation. The DrawToControl function has a bug, so everything built on top of it has a bug (or needs to include a workaround). The documentation for Bitmap and DrawToControl doesn't mention that they fail at corner cases, which is unfortunate.

Induction for Corner Cases

One of the advantages of functional programming is that defining functions recursively forces you to consider corner cases. Consider the Haskell function replicate, which takes a number and a value, and repeats the value that number of times. To define the function it is natural to use recursion over the number. This scheme leads to the definition:


replicate 0 x = []
replicate n x = x : replicate (n-1) x

The function replicate 0 'x' returns []. To get the corner case wrong would have required additional effort. As a result, most Haskell functions work the way you would expect in corner cases - and consequently functions built from them also work sensibly in corner cases. When programming in Haskell my code is less likely to fail in corner cases, and more likely to work first time.

Exercise

As a final thought exercise, consider the following function:


orAnd :: [[Bool]] -> Bool

Where orAnd [[a,b],[c,d]] returns True if either both a and b are True, or if both c and d are True. What should [] return? What should [[]] return? If you write this function recursively (or on top of other recursive functions such as or/and) there will be a natural answer. Writing the function imperatively makes it hard to ensure the corner cases are correct.

Sunday, January 23, 2011

Hoogle Embed

Summary: Hoogle Embed lets you include a small interactive Hoogle search box on your web page.

I have just released Hoogle 4.2, which adds the feature Hoogle Embed, letting you embed a small Hoogle powered search box on any web page. For an example, visit the Hoogle page on my website, and try typing "database" in the search box on the right. You should see:

As you type, the search box will perform Hoogle searches on the Hoogle API, and display the results. Selecting a result will visit the associated documentation. Pressing the Search button will perform the search at the Hoogle website.

Hoogle Embed has been tested in Chrome, Firefox and IE. ~~Using IE~~ Using IE 7 or below you may not see results unless the page being displayed is on the same server as the Hoogle instance (i.e. haskell.org), due to restrictions on cross domain AJAX requests. This limitation can probably be overcome with additional work, if people are interested.

Hoogle Embed degrades gracefully if the browser does not support Javascript, leaving just the search box.

Configuration options allow you to automatically add a prefix or suffix to the users search, for example adding +hoogle to search only the Hoogle API.

This feature works with either a custom Hoogle instance, or the standard version on haskell.org.

Using Hoogle Embed in your web page

To include Hoogle Embed on a web page, simply add the following piece of HTML:


<script type="text/javascript" src="http://haskell.org/hoogle/datadir/resources/jquery-1.4.2.js"></script>
<script type="text/javascript" src="http://haskell.org/hoogle/datadir/resources/hoogle.js"></script>
<form action="http://haskell.org/hoogle/" method="get">
  <input type="text"   name="hoogle" id="hoogle" accesskey="1" />
  <input type="hidden" name="prefix" value="+base" />
  <input type="submit" value="Search" />
</form>

To use a different Hoogle server change the action field of the form. To specify a prefix/suffix for all searches add an input field with the name prefix/suffix. For example, the above snippet only searches the base pacakge. By eliminating the prefix line it will search using the default Hoogle settings (the Haskell platform).

The Hoogle Embed feature is usable on any web page, but I think would be particularly effective on pages such as the Hackage page for a package, or for any Haddock documentation (perhaps when using a flag such as --hoogle-embed). I encourage anyone who is interested to submit patches to the relevant projects.

Hoogle Manual

I am currently considering the issue of documentation, and would welcome other peoples thoughts. Currently the Hoogle manual is hosted on the Haskell Wiki, but is somewhat out of date. For all other packages, I tend to write an HTML manual stored in the darcs repo, such as for hlint. There are advantages to both formats - the wiki can be easily edited by many people, but the darcs manual can be updated simultaneously with the code and is available offline (most Hoogle work is done on a train without internet access, so this issue is very relevant). My current thought is to remove the wiki page and move it's contents into darcs.

Edit: Fixed the Javascript links.

Edit 2: Hoogle Embed now works cross domain in IE 8 and above.

Sunday, January 16, 2011

Hoogle At 1.7 Million Searches

Summary: I detail some of the changes in today's Hoogle release; I outline future plans for Hoogle; I plot some statistics, noting that Hoogle has been used for 1.7 million searches.

New Features

Hoogle is a Haskell API search engine, which I've been working on since 2004. Today I updated the web version at http://haskell.org/hoogle, and uploaded a new version to Hackage. Since my last release I've been working on several new features:

Instant Search - in the top right corner you will see a link entitled "Instant is off". Click that link to turn instant search on, and then searches will be performed as you type. This feature is still experimental, but I already rely on it for my searches.

Visual Refresh - I've modified the style and layout, trying to improve the general feel, especially when used in conjunction with instant search. I added links to make the package filtering options more accessible and I collapse identical results available from different modules.

Database Update - thanks to Ian Lynagh and Ross Paterson I've been able to update all the Hoogle databases, including for the base package. In the future I hope to keep Hoogle continuously updated.

Future Plans

There are three big improvements I plan to make to Hoogle:

Better Data - Hoogle relies on data from Haddock, uploaded to Hackage, by Cabal. This release improves the data, but there are still further improvements to be made. There are several bugs in Haddock, and data for the base library is not yet available on Hackage. If Hoogle could integrate more closely with Cabal then Hoogle could search users local packages. Many of these problems require coordination between several projects, and any offers of help would be welcome. Some bugs: #80, #11, #339, #60, #183.

Better Search Syntax - the search syntax in Hoogle is acceptable, but isn't that close to other search engines, doesn't always mesh well with instant search and has a number of bugs. I intend to overhaul the search syntax, hopefully improving the feel of Hoogle. Some bugs: #34, #61, #398, #130.

Improved Database/Searching - type search needs to execute faster and give better results. By executing faster Hoogle will be able to search the whole of Hackage at once. Hoogle has gone through four entirely different iterations of type search already, and I have a design for the fifth version. Some bugs: #79, #324, #30, #336, #381.

Statistics

Hoogle logs the date and contents of each search, but stores no personally identifiable information. These statistics all relate to the number of searches made, not including blank searches or suggestions offered by the Firefox search plugin. In the time between adding a logging facility, and making it log the date (2009-Apr-24), there were 631930 searches. Since then there have been at least 1012414 searches, not taking into account about a month where logging was disabled. Generally, searches range between 1000 and 2500 a day.