Sunday, January 08, 2012

Pascal's Triangle in Haskell

Summary: I'm continually amazed how concise and powerful Haskell is, compared to mainstream languages. This post describes how to write Pascal's Triangle, and gives some of the advantages of using Haskell.

Often, when programming in Haskell, I feel like I'm cheating. As an example, I recently came across this article by William Shields, which suggests that prospective interview candidates be given simple programming tasks like generating Pascal's Triangle. William gives examples in Python, including some of the answers a typical candidate might give.

Pascal's Triangle in Haskell

William describes Pascal's Triangle as:

"The root element is 1. Every other element is the sum of the one or two above it (diagonally left and diagonally right)."

As a Haskell programmer, the obvious technique to use is induction. The first row of the triangle is [1], and each row can be computed from the previous row by adding the row shifted left, and the row shifted right:


next xs = zipWith (+) ([0] ++ xs) (xs ++ [0])
pascal = iterate next [1]


Here, we define next to take one row and produce the next row. We then use the iterate function to repeatedly apply next starting at the root element. The solution is short, and follows from the definition.

Laziness for Modularity

William originally posed three questions:


  • Print out the triangle to a specific row: print $ take 100 pascal

  • Return a given row of the triangle: pascal !! 50

  • Return a given element (by row and index) of the triangle: pascal !! 10 !! 5



Thanks to laziness, we can concisely answer all these questions in terms of the original pascal definition. In contrast, using a language such as Python, the best solution (Dynamic Programming from the original article) can only perform the first task.

Interview problems

The original article was not about the choice of programming language, but about choosing suitable questions for interviewing programmers. I agree with William that Pascal's Triangle is a suitable problem - it isn't a trick puzzle, it isn't an API knowledge quiz - it's about understanding how to program. Given how much easier the problem is to solve in Haskell, I wonder if using Haskell in a job interview should be considered cheating? ;-)

Hiding Win32 Windows

Summary: This post describes how to hide windows on the Windows operating system, by using the Win32 API from Haskell.

Imagine that whenever your computer restarts, it pops up a message box:



If you interact with this window, your computer will be destroyed. You can't kill the process, or dismiss the window harmlessly. (This scenario isn't hypothetical...) The solution is to hide the window, so it still exists but is out of the way of misplaced clicks. To hide the window, we first find it's OS handle, then we call some Win32 API functions.

Find the Window

To find the handle of the window, we use Spy++. Spy++ comes with Visual Studio, and is bundled in the Express edition (the free version) from 2010 onwards. Start Spy++, got to Search, Find Window, then use the finder tool to select the window in question. Check that the caption of the window matches what Spy++ reports:



The important information is the handle: 0004061E.

Hide the Window

To hide the window you need a programming language capable of making Win32 API calls. In the past I have used Word VBA as the host language, but Haskell is probably easier. Start GHCi, and type:


$ import System.Win32
$ import Graphics.Win32
$ showWindow (castUINTToPtr 0x0004061E) sW_HIDE


Replace 0x0004061E on the final line with 0xyour-handle. The final line should cause the window to be hidden, saving your computer from destruction.

Thanks: Thanks to Roman Leshchinskiy for reminding me that there were better solutions than just trying not to click the window. Thanks to the Win32 Haskell developers - the Win32 binding was a lot of work, which not many people ever see.

Wednesday, December 14, 2011

Enabling Reply To All in Outlook - version 2

I previously described how to enable Reply To All in Outlook. Since then I've modified the VBA script so it works on both Office 2003 and 2007, gives better error messages, automatically replaces the disabled Reply To All buttons, allows editing messages (even if that feature has been disabled) and strips messages of inline images. Installation instructions and links to the code are in the user manual:

http://community.haskell.org/~ndm/darcs/office/ReplyToAll.htm

I don't want to duplicate the installation instructions here, as I will likely revise them over time. However, at the bottom of the manual is a plea to Outlook administrators not to disable Reply To All in the first place. I don't think arguments against disabling essential email functionality can be made often enough, so here is my plea:

Plea to Outlook Administrators



Please do not disable essential email functionality. With the workarounds linked to above, the attempt is futile, but remains deeply inconvenient. Consider the situation where Alice sends an email to Bob, Charlie and Dave asking for some financial details of Mega Corp. Bob has the details on a post-it note on his desk and quickly replies to everyone with the information. But in a world without Reply To All...


  • Bob replies just to Alice. Charlie, being the helpful soul he is, decides to search through the filing cabinet for the information. 15 minutes later Charlie finds the details and emails Alice, only to be told "Thanks, Bob answered this 15 minutes ago". Charlie realises he just wasted 15 minutes of his life, and goes to get a cookie to make himself feel better.

  • Bob replies just to Alice. At the end of the week Dave is reviewing his emails and realises that Alice still hasn't got a reply, but before he potentially wastes 15 minutes, he drops Alice an email - "Do you still want those details?". Alice replies "No". Dave concludes that he's a clever bunny for not going to the filing cabinet, and decides a one week latency when replying to Alice is just common sense.

  • Bob replies, but realising the potential cost of replying just to Alice, also replies to Charlie and Dave. Bob spent a minute retyping the recipient list, and wonders "Why can't Outlook have a button that does this for me?".

  • Bob replies, also including Charles and Dave. Woops! That should have been Charlie, not Charles. As a best case scenario, Charles gets annoyed with a useless email, but at worst Charles brings down Mega Corp with the sensitive information gleaned from the email. Charlie still ends up going to get a cookie.



Removing Reply To All increases the volume of email required, and increases the risk of email accidents. I've heard only two arguments against Reply To All, both of which are wrong:


  • When Bob replies to everyone, the financial information about Mega Corp gets sent to Charlie, but what if Charlie shouldn't have access to that information? Of course, if Charlie shouldn't have access to that information then Alice was at fault for sending the request to Charlie in the first place. Bob should also check when sending sensitive information, but most emails are not sensitive (it is a poor default), checking the senders should not require retyping the senders (it is a poor user interface) and by default mailing lists are not fully expanded (so he won't be able to tell the full list of senders anyway).

  • When Zorg, the owner of Mega Corp, sends a Christmas email to all his employees, what if one of them hits Reply To All? That wastes a lot of company time, and should be prevented - but not on the client side. It is possible to restrict the number of recipients to an email, Zorg could send out the email with everyone in the BCC field, or IT could set up a mailing list which does not permit replies.

Saturday, November 05, 2011

Abstract Generics with Uniplate

Summary: The new version of Uniplate has several wrappers for working with abstract data types, such as Map from the containers package. These wrappers let you transform abstract data types without breaking their invariants.

Abstract data types, such as Map from the containers package, hide their internal structure so they can maintain invariants necessary for their correct operation. Generic programming, such as the Data class used by SYB, allows programs to generically traverse data types, without detailed knowledge of their structure. The two are somewhat incompatible - if Map allows manipulating it's internal structure, the invariants could be broken causing Map to perform incorrectly.

Using the Data class, a Map String Int claims it has no constructors, but if you operate on it with gfoldl is behaves as though it were [(String,Int)]. When Uniplate traverses the Map, such as when searching for an Int, it first analyses the available constructors to see if a Map String Int can possibly contain an Int. Since Map has no constructors, it concludes that Map cannot contain an Int, and Uniplate fails to operate on the contained Int's.

For people who use the Data-style Uniplate module, using the new version of Uniplate you can now correctly operate over Map String Int, provided you use the newtype Map wrapper from Data.Generics.Uniplate.Data.Instances. When you transform over Bool (which does not touch the Map) it will ignore the Map and take O(1). When you transform over Int it will reconstruct the Map in O(n), and if you transform String or Char it will reconstruct the Map in O(n log n). Regardless of what operations you do, it will work efficiently and correctly. As an example:


$ import qualified Data.Map as Map
$ import Data.Char
$ import Data.Generics.Uniplate.Data
$ import Data.Generics.Uniplate.Data.Instances
$ fromMap $ transformBi toUpper $ toMap $ Map.fromList [("haskell",12),("test",18)]
fromList [("HASKELL",12),("TEST",18)]


There are two approaches for dealing with the Map problem in Uniplate. For users of the Direct-style Uniplate module, there is a function plateProject, which has been available for some time. For users of the Data-style Uniplate module, or for users of the SYB package, there is a new module Data.Generics.Uniplate.Data.Instances in uniplate-1.6.4 (released today) which provides three types with special Data instances (Hide, Trigger, Invariant). Using these three types we can construct wrappers providing Data instances for abstract types, and Uniplate includes wrappers for several of the types in the containers package (Map, Set, IntMap, IntSet).

plateProject

The plateProject function helps you define Direct Uniplate instances for abstract types. Instead of defining how to examine the data type, you instead define how to transform the data type into one you can examine, and how to transform it back. As an example:


instance Biplate (Map.Map [Char] Int) Char where
biplate = plateProject Map.toList Map.fromList


If the types ensure that any operations will not change the keys we can optimise and use the fromDistictAscList function to reconstruct the Map:


instance Biplate (Map.Map [Char] Int) Int where
biplate = plateProject Map.toAscList Map.fromDistinctAscList


Hide

The Hide data type is useful for wrapping values that you want to ignore with Uniplate. The type is defined as:


data Hide a = Hide {fromHide :: a}


But has a Data instance that pretends it is defined using the extension EmptyDataDecls as:


data Hide a


As an example of avoiding particular values, you can write:


transformBi (+1) (1, 2, Hide 3, Just 4) == (2, 3, Hide 3, Just 5)


As a result of having no constructors, any calls to the methods toConstr or gunfold will raise an error.

Trigger

The Trigger data type is useful for determining when a value was constructed with the Data methods. It is defined as:


data Trigger a = Trigger {trigger :: Bool, fromTrigger :: a}


But the Data instance pretends that it is defined as:


data Trigger a = Trigger a


However, whenever gfoldl or gunfold constructs a new value, it will have the trigger field set to True. The trigger information is useful to indicate whether any invariants have been broken, and thus need fixing. As an example:


data SortedList a = SortedList (Trigger [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Trigger False $ sort xs
fromSortedList (SortedList (Trigger t xs)) = if t then sort xs else xs


This data type represents a sorted list. When constructed the items are initially sorted, but operations such as gmapT could break that invariant. The Trigger type is used to detect when the Data operations have been performed, and resort the list.

The Trigger type is often used in conjunction with Invariant, which fixes the invariants.

Invariant

The Invariant data type is useful for ensuring that an invariant is always applied to a data type. It is defined as:


data Invariant a = Invariant {invariant :: a -> a, fromInvariant :: a}


But the Data instance pretends that it is defined as:


data Invariant a = Invariant a


Whenever a gfoldl constructs a new value, it will have the function in the invariant field applied to it. As an example:


data SortedList a = SortedList (Invariant [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Invariant sort (sort xs)
fromSortedList (SortedList (Invariant _ xs)) = xs


Any time an operation such as gmapT is applied to the data type, the invariant function is applied to the result. The fromSortedList function can then rely on this invariant.

The gunfold method is partially implemented - all constructed values will have an undefined value for all fields, regardless of which function is passed to fromConstrB. If you only use fromConstr (as Uniplate does) then the gunfold method is sufficient.

Map

Using the Hide, Trigger and Invariant types, we can define a wrapper for the containers Map type as:


newtype Map k v = Map (Invariant (Trigger [k], Trigger [v], Hide (Map.Map k v)))
deriving (Data, Typeable)


The Map type is defined as an Invariant of three components - the keys, the values, and the underlying Map. We use Invariant to ensure that the keys/values/map always remain in sync. We use Trigger on the keys and values to ensure that whenever the keys or values change we rebuild the Map, but if they don't, we reuse the previous value. The function to extract a containers Map from the wrapper requires only simple pattern matching:


fromMap (Map (Invariant _ (_,_,Hide x))) = x


The function to wrap a containers Map is slightly harder, as we need to come up with an invariant restoring function:


toMap :: Ord k => Map.Map k v -> Map k v
toMap x = Map $ Invariant inv $ create x
where
create x = (Trigger False ks, Trigger False vs, Hide x)
where (ks,vs) = unzip $ Map.toAscList x

inv (ks,vs,x)
| trigger ks = create $ Map.fromList $ zip (fromTrigger ks) (fromTrigger vs)
| trigger vs = create $ Map.fromDistinctAscList $ zip (fromTrigger ks) (fromTrigger vs)
| otherwise = (ks,vs,x)


The create function creates a value from a Map, getting the correct keys and values. The inv function looks at the triggers on the keys/values. If the keys trigger has been tripped, then we reconstruct the Map using fromList. If the values trigger has been tripped, but they keys trigger has not, we can use fromDistinctAscList, reducing the complexity of constructing the Map. If nothing has changed we can reuse the previous value.

The end result is that all Uniplate (or SYB) traversals over Map result in a valid value, which has had all appropriate transformations applied.

Tuesday, November 01, 2011

Haskell Implementors Workshop 2011

The videos from the Haskell Implementors Workshop 2011 are now online at YouTube. I'm currently uploading them to Vimeo, but that will take another week (I ran out of free quota).

Many thanks to Kento EMOTO who did all the editing and conversion, and to all the speakers for the great talks.

Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particular with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:


sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


However, to make a function that is compatible with R, we have to follow two rules:


  • Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)

  • The result must be IO ()



Obeying these restrictions, we need to use the type:


sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...


Instead of passing in the list xs, we now pass in:


  • n, the length of the list xs

  • xs, the elements of the list

  • result, a space to put the result



We can implement sumRootsR by using the functions available in the Foreign module:


sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs


This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:


foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()


Putting everything together, we end up with the Haskell file:


-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


We also need a C stub file. The one described in the GHC users guide works well:


// StartEnd.c
#include <Rts.h>

void HsStart()
{
int argc = 1;
char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

// Initialize Haskell runtime
char** args = argv;
hs_init(&argc, &args);
}

void HsEnd()
{
hs_exit();
}


We can now compile our library with the commands:


ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o


This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:


dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")


We can now invoke our function:


input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result


This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:


sumRoots <- function(input)
{
return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}


Now we can write:


sumRoots(c(12,444.34))


And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.

Saturday, October 01, 2011

Haskell Implementors Workshop 2011 slides

I've just uploaded the slides for all the talks given at the Haskell Implementors Workshop 2011 - just click on "Slides" next to any of the talks. Many thanks to all the speakers for very interesting talks, and for sending me their slides.

There will be video, hosted on either YouTube or Vimeo, at some point in the future. I've got all the copyright permission forms, but I don't yet have the video footage. I'm currently working to get a copy of the video sent from Japan.

Template Haskell fights with Generic Programming

Summary: The InfixE construction in Template Haskell fits poorly with generic programming, because its type does not capture all its restrictions. This mismatch can result in confusing bugs, but there is a simple workaround.

I have often said that anyone manipulating abstract syntax trees, without using some form of generic programming, is doing it wrong. Recently I have been manipulating Template Haskell syntax trees using Uniplate, my preferred generic programming library. Consider the problem of replacing all instances of delete with deleteBy (==) - this task can be done with Template Haskell:


{-# LANGUAGE TemplateHaskell #-}
module IHateDelete where
import Data.List
import Language.Haskell.TH
import Data.Generics.Uniplate.Data

iHateDelete :: Q [Dec] -> Q [Dec]
iHateDelete = fmap (transformBi f)
where
f :: Exp -> Exp
f (VarE x) | x == 'delete = VarE 'deleteBy `AppE` VarE '(==)
f x = x


We can test this function with:


{-# LANGUAGE TemplateHaskell #-}
import IHateDelete
import Data.List

$(iHateDelete
[d|
mapDelete x = map (delete x)
myElem x xs = length (delete x xs) /= length xs
|])


To see the result of running iHateDelete pass the flag -ddump-splices. As far as we can tell, our iHateDelete function works perfectly. But wait - let's try another example:


$(iHateDelete
[d|
myDelete x xs = x `delete` xs
|])


In GHC 6.12, we get a GHC panic. In GHC 7.2 we get the error message:


Operator application with a non-variable operator: deleteBy (==)
(Probably resulting from a Template Haskell splice)


(I would find this message far clearer if it said "Infix application..." rather than "Operation application...")

The body of myDelete starts out as:


InfixE (Just (VarE 'x)) (VarE 'delete) (Just (VarE' xs))


After our transformation, this becomes:


InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) ('(==))) (Just (VarE' xs))


Or, written in Haskell syntax:


x `deleteBy (==)` xs


This expression is not valid Haskell, and causes an error when spliced back in (when inserted back into the Haskell code).

The Problem

The underlying problem is called out in the Template Haskell Exp documentation:


InfixE (Maybe Exp) Exp (Maybe Exp)
-- ^ It's a bit gruesome to use an Exp as the operator, but how else can we distinguish constructors from non-constructors?
-- Maybe there should be a var-or-con type? Or maybe we should leave it to the String itself?


The operator in InfixE has a type which permits any expression, but has the restriction that when spliced back in the expression must only be a VarE or ConE. This restriction poses a problem for generic programming, where values are treated in a uniform manner. Sadly, both of the suggested fixes would also cause problems for generic programming. Perhaps the true fix is to let Haskell have arbitrary expressions for infix operators? Or perhaps Template Haskell should translate InfixE to AppE if the operator is incompatible with Haskell?

The Workaround

As a workaround, you can translate away all InfixE expressions that have a complex middle expression. I use the following function:


fixupInfix :: [Dec] -> [Dec]
fixupInfix = transformBi f
where
bad VarE{} = False
bad ConE{} = False
bad _ = True

f (InfixE a b c) | bad b = case (a,c) of
(Nothing, Nothing) -> b
(Just a , Nothing) -> b `AppE` a
(Nothing, Just c ) -> VarE 'flip `AppE` b `AppE` c
(Just a , Just c ) -> b `AppE` a `AppE` c
f x = x


The original iHateDelete can then be modified to become:


iHateDelete = fmap (fixupInfix . transformBi f)


.

Tuesday, September 06, 2011

Faster WPF ComboBoxes

If you create a WPF ComboBox with 2000 items it will take about 2 seconds to drop down, and about 1 second to retract back. But you can make all operations complete instantly if the items are simple (such as strings). If you are writing XAML, add:


<ComboBox>
<ComboBox.ItemsPanel>
<ItemsPanelTemplate>
<VirtualizingStackPanel />
</ItemsPanelTemplate>
</ComboBox.ItemsPanel>

...
</ComboBox>


If you are writing C#, add:


comboBox.ItemsPanel = new ItemsPanelTemplate(new FrameworkElementFactory(typeof(VirtualizingStackPanel)));


Thanks to Henry Hahn for the XAML code I started from. Reading the C# version, I am reminded of the Hammer Factory joke.