Monday, June 04, 2012

The Flavours of MVar

Update: These functions have been cleaned up, improved with respect to error conditions and async exceptions, and put in the extra package.

The MVar is a flexible and powerful locking primitive, used extensively in Haskell. An MVar is like a box which is empty (has zero elements inside) or full (has one element inside). You block when trying to take from an empty MVar or put to a full MVar. On top of MVars, lots of interesting concurrent programs can be written. However, with such a flexible mechanism, there is scope for confusion. Every MVar can block on either a take or a put, but for any individual MVar it is likely you expect it to block on only one of those operations. In my programs I usually restrict my MVars to one of three flavours, each of which is described below.
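
As a minimal sketch of that blocking behaviour, a take on an empty MVar waits until another thread performs a put:

import Control.Concurrent

main :: IO ()
main = do
    box <- newEmptyMVar              -- empty: any take will block
    forkIO $ do
        threadDelay 100000           -- pretend to do some work
        putMVar box "hello"          -- fill the box, waking the take below
    msg <- takeMVar box              -- blocks until the put above happens
    putStrLn msg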

Lock


The Lock guarantees single-threaded access, typically to some system resource.

type Lock = MVar ()

newLock :: IO Lock
newLock = newMVar ()

withLock :: Lock -> IO a -> IO a
withLock x = withMVar x . const

And as an example:

lock <- newLock
let output = withLock lock . putStrLn
forkIO $ do ...; output "hello"
forkIO $ do ...; output "world"

Here we are creating a lock to ensure that when writing output our messages do not get interleaved. This use of MVar never blocks on a put. It is permissible, but rare, that a withLock contains a withLock inside it - but if so, watch out for deadlocks.
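
As a sketch of that deadlock risk: MVars are not re-entrant, so nesting withLock on the same lock blocks forever, because the inner take waits for the () that the outer call has already removed:

selfDeadlock :: Lock -> IO ()
selfDeadlock lock = withLock lock $ do
    putStrLn "outer critical section"
    withLock lock $                    -- blocks forever: same lock, already held
        putStrLn "never reached"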

Var


The Var operates on a mutable variable in a thread-safe way.

type Var a = MVar a

newVar :: a -> IO (Var a)
newVar = newMVar

modifyVar :: Var a -> (a -> IO (a, b)) -> IO b
modifyVar = modifyMVar

modifyVar_ :: Var a -> (a -> IO a) -> IO ()
modifyVar_ = modifyMVar_

readVar :: Var a -> IO a
readVar = readMVar

And as an example:

hits <- newVar 0
forkIO $ do ...; modifyVar_ hits (+1); ...
i <- readVar hits
print ("HITS",i)

Here we have a variable which we modify atomically, so modifications are not interleaved. This use of MVar never blocks on a put. No modifyVar operation should ever block, and they should always complete in a reasonable timeframe. A Var should not be used to protect some external resource, only the variable contained within. Information from a readVar should not be subsequently inserted back into the Var.
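
To illustrate that final rule, here is a hypothetical counter written both ways. The first version reads the value and later writes a modified copy back, so two interleaved threads can lose an update; the second performs the read-modify-write atomically inside modifyVar_:

-- Wrong: the value read may be stale by the time it is written back
incrementBad :: Var Int -> IO ()
incrementBad hits = do
    i <- readVar hits
    modifyVar_ hits $ \_ -> return (i + 1)

-- Right: the increment happens atomically inside modifyVar_
incrementGood :: Var Int -> IO ()
incrementGood hits = modifyVar_ hits $ \i -> return (i + 1)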

Barrier


A barrier starts with no value, is written to once, and read one or more times.

type Barrier a = MVar a

newBarrier :: IO (Barrier a)
newBarrier = newEmptyMVar

signalBarrier :: Barrier a -> a -> IO ()
signalBarrier = putMVar

waitBarrier :: Barrier a -> IO a
waitBarrier = readMVar

And as an example:

bar <- newBarrier
forkIO $ do ...; val <- ...; signalBarrier bar val
print =<< waitBarrier bar

Here we create a barrier which will contain some computed value. A thread is forked to fill the barrier, while the main thread waits for it to complete. A barrier has similarities to a future or promise from other languages, has been known as an IVar in other Haskell work, and in some ways is like a manually managed thunk. It is an error to signal a barrier more than once and a deadlock to never signal it. Since the barrier goes from empty to full, it never blocks on a put, unless you incorrectly call signal more than once.
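
Combined with forkIO, a Barrier gives something close to a simple future. A minimal sketch (ignoring exceptions, and not one of the wrappers defined above):

-- Start an action in the background, returning a Barrier that will
-- eventually hold its result.
future :: IO a -> IO (Barrier a)
future act = do
    b <- newBarrier
    forkIO $ act >>= signalBarrier b
    return b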

Combining MVar Flavours - Once


The previous three MVar wrappers are the flavours of MVar which I use regularly. These can be combined into higher-level abstractions specific to certain situations. I now give two examples, intended to show how to combine these primitives.

The once function takes an action, and returns a new action. If the action is never called the argument action will never be executed, but if it is called more than once, it will only be executed once. We can write this function as:

once :: IO a -> IO (IO a)
once act = do
    var :: Var (Maybe (Barrier a)) <- newVar Nothing
    return $ join $ modifyVar var $ \v -> case v of
        Nothing -> do b <- newBarrier; return (Just b, do x <- act; signalBarrier b x; return x)
        Just b -> return (Just b, waitBarrier b)

Here we create a variable to store the result, whose state is either Nothing (we have not yet started computing) or Just a barrier (we have started computing, use this barrier to get the result out). I have found 'join $ modifyVar' is a common idiom, used to defer a blocking action (often waitBarrier) until after a modifyVar has completed, ensuring we preserve our invariant of not blocking inside a modifyVar. When running the resulting action, if the variable is a Nothing we create a new barrier, store it, and then start an action (after leaving the modifyVar) to compute the result, signal the barrier and return. If we already have a barrier, we just wait for this barrier.
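
A brief usage sketch (getConfig is a hypothetical action): however many times the returned action is run, and from however many threads, the wrapped action executes at most once and every caller sees the same result:

main :: IO ()
main = do
    getConfig <- once $ do putStrLn "loading config"; return 42
    a <- getConfig      -- prints "loading config" and returns 42
    b <- getConfig      -- returns 42 without rerunning the action
    print (a + b)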

[Note that you can implement once in terms of MVar directly, using only one MVar, but that violates the simple rules of the restricted MVars - rightly so, you have to use the MVar empty state to mean both atomic access to shared state, and to mean computation in progress.]

Combining MVar Flavours - Queue


As another practical example of using these restricted MVars, let us consider a special kind of queue. Messages arrive individually, but are collected in bulk. When someone tries to retrieve messages, if any are waiting they are sent immediately. If there are no messages, the read blocks until either a message arrives or until a new reader arrives, in which case the old reader is sent away with nothing. This can be implemented as:

type Queue a = Var (Either [a] (Barrier [a]))

arrive :: Queue a -> a -> IO ()
arrive q x = modifyVar_ q $ \q -> case q of
    Left xs -> return $ Left $ xs ++ [x]
    Right b -> do signalBarrier b [x]; return $ Left []

collect :: Queue a -> IO [a]
collect q = join $ modifyVar q $ \q -> case q of
    Left xs@(_:_) -> return (Left [], return xs)
    _ -> do
        case q of Right b -> signalBarrier b []; _ -> return ()
        b <- newBarrier
        return (Right b, waitBarrier b)

The type of Queue tells us most of what we need to know about the invariants - Queue has a mutable state, which is either Left (zero or more messages waiting) or a Right (someone waiting to collect messages). If we had used MVar instead of both Var and Barrier, the invariant and design would be far less clear. With these invariants clearly stated, the code just follows directly.
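
One piece left implicit above is how to construct a Queue. It starts with no messages and no waiting reader, so (as an assumption consistent with the stated invariant) a constructor and a small usage sketch would be:

newQueue :: IO (Queue a)
newQueue = newVar (Left [])

example :: IO ()
example = do
    q <- newQueue
    forkIO $ mapM_ (arrive q) ["a","b","c"]
    msgs <- collect q       -- blocks until at least one message has arrived
    print msgs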

Creating New Flavours


I find the three MVar wrappers (Lock, Var, Barrier) much easier to understand since the rules are simpler, making maintenance easier. I have also found that most projects benefit from higher-level abstractions in some places. As an example, I defined Queue in one recent project, and Shake defines a Resource type, on top of which the resources feature is implemented. Concurrency is hard, but robust abstractions split the complexity, and thus simplify the programs.

Sunday, June 03, 2012

Hoogle Update

Summary: I just updated the Hoogle website. It looks nicer on the iPhone and the source is on GitHub.

The Website

The Hoogle website is (as always) at http://haskell.org/hoogle. I've just uploaded a fresh Hackage index (currently a manual operation, but one I'm intending to automate imminently). I've also made a number of improvements if you are using Hoogle over the iPhone - to simulate the iPhone experience click here.

The Source

The source code has moved to GitHub: https://github.com/ndmitchell/hoogle. While darcs is a much nicer version control system than Git, GitHub offers a lot of nice features, so I'm using Hoogle as an experiment. I've been promised that projects on GitHub get lots of contributions, so now I wait!

I'm leaving the bug tracker on Google Code for the moment, and am considering where the Hoogle manual should live, but a GitHub wiki site is currently looking likely.

Monday, March 26, 2012

Finding Two unordered-containers Bugs

Summary: Over the past few weeks I found two bugs in unordered-containers. I strongly recommend everyone upgrades to unordered-containers-0.2.1.0 or later, which fixes these bugs.

Both Shake and Uniplate use the unordered-containers library, which provides fast Map and Set types. Both my libraries had bugs reported against them which turned out to be introduced by unordered-containers-0.2.0.0. As soon as these bugs were identified Johan released a new version (0.2.1.0) fixing the bugs, and both my packages now work as expected.

Bug 1: Shake breaks because delete fails

Just over two weeks ago Evan Laforge reported he'd started seeing "thread blocked indefinitely in an MVar operation" if there was a compile error. Shake goes to a lot of effort to clean up nicely, so something was going wrong. Since Evan's project compiles on Linux, and I only have access to Windows machines, I tried replicating it on one of my projects. I perturbed the project in several ways and did manage to replicate the error once, but the same perturbation next time made it succeed - something was clearly non-deterministic (which is the norm in a highly concurrent program such as Shake).

I'd been wanting to write a random build system generator and tester for a while, to thoroughly test Shake, and used this bug as motivation. I wrote the tester, and usually within 5 random runs (about 60 seconds of churning) the error was reproduced on my desktop. However, the error wouldn't reproduce on my laptop even after an hour (because I was on an older version of unordered-containers, I now know), and as I was off travelling I had to put the bug away to look into on my return.

On my return I investigated further. Figuring out the command line to run the random test took about an hour (note to self - write these things down). Once I had that, it still reproduced reliably in about 60 seconds. I cut out the random parts of the test that I didn't think were contributing to the bug (tests for building successfully, minimal rebuilding etc) and cut the reproduction time down to about 5 seconds. I then started augmenting the code with print statements. The cleanup code for exceptions is almost exclusively within the thread pool implementation (see Development.Shake.Pool if you are reading the source). In particular, the code that cleans up after an exception is:


t <- myThreadId
mapM_ killThread $ Set.toList $ Set.delete t $ threads s
signalBarrier done $ Just e


This code finds all threads in the thread pool, with the exception of this thread (the Set.delete t), and kills them. After that has finished, it signals to the thread that first called the thread pool that everything has finished with an exception.

Adding trace statements at every step showed that the exception started being cleaned up, a handful of threads were killed, but not all the threads were killed and the barrier was never signaled. An initial guess was that my thread was being killed, and thus that Set.delete had failed to remove t. Since I had debugged a separate unordered-containers bug a week before (bug 2 started after and finished before this one), I went to the unordered-containers commit list and immediately found a bug fixed against delete. I upgraded, and ran my random tester for 3 hours without failure.

Bug 2: Uniplate breaks because filter fails

The Uniplate bug was reported just over a week ago, while I was travelling, by Aleksey Khudyakov. This bug report included a relatively small test case demonstrating a clear bug, along with the vital piece of information that the test case worked with unordered-containers-0.1, but not 0.2. With that information the question became whether Uniplate was using some non-guaranteed behaviour (such as assuming the result from Set.toList is ordered), or whether unordered-containers had a bug. The supplied test case was very sensitive - it involved three modules, and simply changing any module name caused the bug to disappear. The code was also exercising a very sensitive and complex part of Uniplate. After 12 hours of cooperative debugging between myself, Aleksey and Johan, I came up with the reduced test case:


let broken = Set.filter (`elem` good) $ Set.fromList $ good ++ bad
working = Set.fromList $ Set.toList broken
print $ Set.member (head good) broken
print $ Set.member (head good) working


This printed False/True, when it should print True/True. Of course, the choice of the good and bad lists is critical, and was based on the Hashable instance for TypeRep, which uses the fully qualified type name, and thus changes if you rename the modules.

Once I had provided Johan with the reduced test case, it was fixed the same day.

Sunday, March 04, 2012

NSIS: Windows installers through an EDSL

Summary: I've just released the NSIS library on Hackage. It's useful for building Windows installers, but is also a call-by-name embedded two-stage programming language.

I've just released a new library on Hackage, named NSIS. It's a library for building Windows installers (see an example at the top of the documentation), based on NSIS (the Nullsoft Scriptable Install System). The basic idea of NSIS is that you write a program which defines your installer - it can search for files, create registry keys, create groups of files etc. But, at its heart, it is a full programming language, merely optimised to the job of writing installers. The original NSIS system has an excellent backend, producing small but featureful installers which run well on a variety of Windows platforms. Unfortunately, the front-end programming language is probably best described as assembly code with gentle influences from scripting languages. My NSIS library defines an embedded Haskell language, of the style promoted by Lennart Augustsson, which can produce scripts to be fed into the original NSIS compiler.

Why might you want to use the original NSIS backend?

I've tried several installer generators over the past few years, and even written my own from scratch. Of all these, I prefer NSIS mainly for two reasons:


  • It's a full programming language. The installer is capable of expressing any program, which means that as you need to do custom things in the installer, you can. For example, you can prohibit installing into the Windows directory. Equally, you can calculate the 1000th prime number. You are never artificially limited by the installer.

  • The installers are slick and robust. As the NSIS documentation says, "Being a user's first experience with your product, a stable and reliable installer is an important component of successful software". NSIS consistently delivers a smooth end-user experience, provided you select the MUI2 (Modern User Interface 2) settings.



Why might you want to use my frontend?

There are many advantages to my frontend. If your script is simple it is likely to look relatively similar in either system. If your script is complex it could end up 100 lines shorter and far clearer in my system. However, there are three primary advantages.

MUI2 by default

If you are writing a simple NSIS installer, the biggest difference is that my system will use the MUI2 settings by default. As NSIS has evolved there are now three separate ways to define your user interface, using the Classic builtin commands, using MUI1 or using MUI2. Of these, MUI2 is by far the nicest and should always be used, but the Classic interface is built in by default, and MUI2 is defined on top using macros. To specify that you want to insert a particular page into the installer you can write either of:


page Directory # classic NSIS
!insertmacro MUI_PAGE_DIRECTORY # MUI2


However, with my front end, you simply write:


page Directory


Flow control

While NSIS installer scripts are very powerful, they aren't easy for script authors. Taking the example from earlier, let's warn before installing into the Windows directory or the System directory:


StrCmp $WINDIR $INSTDIR bad 0
StrCmp $SYSDIR $INSTDIR bad 0
Goto skip
bad:
MessageBox MB_OK|MB_ICONEXCLAMATION "Warning: installing into the Windows directory"
skip:


Shockingly, in 1995 someone wrote a scripting language with only goto, and hasn't added any flow control since then. In my frontend, you can write:


iff_ ("$INSTDIR" %== "$WINDIR" %|| "$INSTDIR" %== "$SYSDIR") $
    alert "Warning: installing into the Windows directory"


All control flow can be accomplished with structured programming.

Variables

The original NSIS system has global mutable variables, 16 register variables and a stack - it directly mirrors assembly programming. In my system, variables are properly scoped and named. The difference for complicated functions is significant. As an example, let us define substring replacement in the original NSIS script:


Push "Hello World"
Push "Hello"
Push "Goodbye"
Call StrRep
Pop $R0
MessageBox MB_OK $R0

; Taken from http://nsis.sourceforge.net/Replace_Sub_String_(macro)
Function StrRep
;Written by dirtydingus 2003-02-20 04:30:09
Exch $R4 ; $R4 = Replacement String
Exch
Exch $R3 ; $R3 = String to replace (needle)
Exch 2
Exch $R1 ; $R1 = String to do replacement in (haystack)
Push $R2 ; Replaced haystack
Push $R5 ; Len (needle)
Push $R6 ; len (haystack)
Push $R7 ; Scratch reg
StrCpy $R2 ""
StrLen $R5 $R3
StrLen $R6 $R1
loop:
StrCpy $R7 $R1 $R5
StrCmp $R7 $R3 found
StrCpy $R7 $R1 1 ; - optimization can be removed if U know len needle=1
StrCpy $R2 "$R2$R7"
StrCpy $R1 $R1 $R6 1
StrCmp $R1 "" done loop
found:
StrCpy $R2 "$R2$R4"
StrCpy $R1 $R1 $R6 $R5
StrCmp $R1 "" done loop
done:
StrCpy $R3 $R2
Pop $R7
Pop $R6
Pop $R5
Pop $R2
Pop $R1
Pop $R4
Exch $R3
FunctionEnd


Alternatively, with my frontend, you can write:


alert $ strReplace "Hello" "Goodbye" "Hello World"

strReplace :: Exp String -> Exp String -> Exp String -> Exp String
strReplace from to str = do
    from <- constant_ from
    to <- constant_ to
    rest <- mutable_ str
    res <- mutable_ ""
    while (rest %/= "") $ do
        iff (from `strIsPrefixOf` rest)
            (do
                res @= res & to
                rest @= strDrop (strLength from) rest)
            (do
                res @= res & strTake 1 rest
                rest @= strDrop 1 rest)
    res


The difference is immense - the first is hard to follow without plenty of paper. The second is a fairly obvious imperative algorithm for string replacement (and is included in the package for reuse). Note that even though we're using a functional host language, our embedded language is very much imperative.

What technologies went into the frontend?

I really enjoyed writing this installer frontend. I got to dust off many techniques from compiler design that I haven't used for a while. Below is a flavour of some of the techniques I used:


  • Phantom types - all expressions have a phantom type, indicating what type the expression returns. In truth, all NSIS types are strings, but the phantom types let us make conversions explicit and catch user errors (a small illustrative sketch follows at the end of this list).

  • Call-by-name - the programming language I've implemented is call-by-name, which seems to be a particularly convenient choice for embedded languages.

  • Optimisation - since the target is a programming language, I wrote eight optimisation passes, which I iterate over. The result is that the string replacement function above becomes 21 lines of compiled code, while the original hand-coded version is 32 lines. There's no real need for the optimisation pass (the code is rarely a bottleneck or a significant size cost), but writing optimisers is fun.

  • Generics - the front end and optimisation passes are heavily based on generic programming, in particular using Uniplate.

  • Two-stage programming - the first program runs and generates a second program that is then rerun. It is possible to do computation at either level, and most errors are only caught at one or other of those levels, not both.

  • Goto programming - while I provide concepts such as iff and while, the target language is exclusively goto based, so these structured constructs are translated down to gotos.

  • Overloaded literals - I use overloaded strings and numbers extensively. Haskell doesn't permit overloaded booleans, but if it did I'd use those too.
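
As a rough illustration of the phantom-type point (a sketch only - these are not the library's actual types or names): all values share one untyped representation, and the type parameter exists purely at compile time to make conversions explicit and catch user errors:

-- One untyped syntax tree underneath; the phantom parameter 't' records
-- what the expression is supposed to evaluate to.
data Value = Lit String | Op String [Value]

newtype Exp t = Exp Value

strToInt :: Exp String -> Exp Int
strToInt (Exp v) = Exp v             -- explicit conversion, no runtime cost

strEq :: Exp String -> Exp String -> Exp Bool
strEq (Exp a) (Exp b) = Exp (Op "StrCmp" [a, b])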



If any of these techniques are particularly interesting, please comment below, and I'll expand on that area in a future post.

Tuesday, February 28, 2012

Four Interesting Build Tools

Summary: This post describes four interesting build systems, namely Redo, Ninja, Tup and Fabricate.

While developing Shake I investigated many existing build tools. Build tools can be divided into two categories -- those which target single-language projects with fixed rules (e.g. ocamlbuild, ghc --make, Visual Studio projects), and those which allow user specified rules (e.g. make and Shake). Focusing on the second category, the de facto standard is make, but there are many make competitors (notably Ant, CMake, Jam, Scons and Waf). Most of these tools read a list of rules, generate a dependency graph, then execute commands while traversing that graph.

Since the number of build tools is vast, I will restrict my discussion to four build tools which take different approaches (Redo, Ninja, Tup and Fabricate). Interestingly, one thing all four systems have in common is that they require a database of build data, in addition to the rules and the file system. Unlike Shake, all these build systems are limited to files.

Redo

The Redo build system has a similar dependency theory to Shake. Rules are run starting at the target. A rule may call redo-ifchange (similar to need) to ensure that this rule is repeated if any of the file arguments change. A rule can build either a specific named file, or a set of files ending with a particular extension.

While Redo has similarities to Shake, the practical implementation is significantly different. Instead of a single rule store, Redo stores each rule in a separate file, and the script language is simply shell script. The advantage of separate files is that Redo is able to depend on the actual rule used to build a result, meaning that build system changes are properly tracked. However, separating build rules makes it harder to reason about the build system, and eliminates many potential uses of abstraction. Redo does not work on Windows and has no support for multiple outputs.

Ninja

The Ninja build system is designed as a two-stage build system -- users specify their build rules in a high-level manner, which is then translated to a set of Ninja build rules. As a result, the Ninja build system is not designed to be general purpose and configuration choices are expected to be resolved by the first level. The Ninja target language supports three dependency features beyond make. Firstly, a rule can depend on the list of files contained in another file, allowing additional dependencies at build time. Secondly, the command line for each rule is tracked, resulting in a rebuild if the rule itself changes. Thirdly, a rule can generate multiple outputs, which are properly tracked.

Tup

The Tup build system is designed as an incremental build system. Tup has a similar dependency structure to make, but a significantly different implementation. Instead of scanning all dependencies, it expects the operating system to supply a list of changed files, avoiding the overhead of checking which files have changed. For large build systems the result can be a significant speed improvement if there are only a few files to rebuild. We believe a similar implementation strategy could be applied to Shake.

Another difference from make is the treatment of dead build results. If a rule to build foo is deleted from the rule list, then Tup automatically deletes the file foo. The problem of dead build results is serious, resulting in builds succeeding that should have failed, and that will fail as soon as a clean build is performed (to reduce this risk, we suggest an overnight build which starts from scratch). However, it is often useful to have build modes which generate skeleton files which are then modified by the user -- deleting these files would be most unwelcome. It would be easy to add support for deleting dead build results to Shake, but we choose not to.

Fabricate

The key innovation in the Fabricate build system is that dependencies do not need to be stated explicitly. A build system is a Python program, which primarily executes system commands in order. While executing the commands, Fabricate uses system tracing (strace on Linux) to record which files are accessed. In future runs, if the same system command is reached but none of the files it used have changed, the command is skipped. The resulting build systems are simple, and avoid the difficulties of correctly specifying dependencies.

There are two inherent difficulties for build systems without explicit dependencies. Firstly, the system tracing mechanisms on different platforms are varied, and on Windows are somewhat fragile. Secondly, parallelism cannot be inferred automatically -- Fabricate requires explicit grouping annotations to use parallelism.

Saturday, February 11, 2012

Shake: A Better Make

Summary: I have just released Shake, a library that allows you to write build systems in Haskell (think Make, but much better).

At the Haskell Implementors Workshop 2010 I described a Haskell build system named Shake (video here), hoping an implementation would be available "soon". Better late than never, I am delighted to announce that Shake is now available on Hackage. This version is a from-scratch reimplementation, based on an improved underlying theory, all completed in my spare time. Several users have already experimented with this version, so it is reasonably robust.

A Simple Example

As a simple example of a Shake build system, let us build the file result.tar from the files listed by result.txt:


import Development.Shake
import Development.Shake.FilePath

main = shake shakeOptions $ do
    want ["result.tar"]
    "*.tar" *> \out -> do
        contents <- readFileLines $ replaceExtension out "txt"
        need contents
        system' "tar" $ ["-cf",out] ++ contents


We start by importing the modules defining both Shake and routines for manipulating FilePath values. We define main to call shake with the default shakeOptions. As the second argument to shake, we provide a set of rules. There are two common forms of rules, want to specify target files, and *> to define a rule which builds a file pattern. We use want to require that after the build completes the file result.tar should be ready.

The *.tar rule describes how to build files with the extension .tar, including result.tar. We readFileLines on result.txt, after changing the .tar extension to .txt. We read each line into the variable contents -- being a list of the files that should go into result.tar. Next, we depend (need) all the files in contents. If any of these files change, the rule will be repeated. Finally we call the tar program. If either result.txt changes, or any of the files listed by result.txt change, then result.tar will be rebuilt.

If you are interested in writing a build system using Shake, I suggest you read the documentation. One thing noted in the documentation is that if ghc --make or cabal is capable of building your project, use that instead. Custom build systems are necessary for many complex projects, but many projects are not complex.

As an example of how to do this task in Make, see this StackOverflow question. I'll cover the differences and similarities with other build tools in a forthcoming blog post.

Sunday, January 08, 2012

Pascal's Triangle in Haskell

Summary: I'm continually amazed how concise and powerful Haskell is, compared to mainstream languages. This post describes how to write Pascal's Triangle, and gives some of the advantages of using Haskell.

Often, when programming in Haskell, I feel like I'm cheating. As an example, I recently came across this article by William Shields, which suggests that prospective interview candidates be given simple programming tasks like generating Pascal's Triangle. William gives examples in Python, including some of the answers a typical candidate might give.

Pascal's Triangle in Haskell

William describes Pascal's Triangle as:

"The root element is 1. Every other element is the sum of the one or two above it (diagonally left and diagonally right)."

As a Haskell programmer, the obvious technique to use is induction. The first row of the triangle is [1], and each row can be computed from the previous row by adding the row shifted left, and the row shifted right:


next xs = zipWith (+) ([0] ++ xs) (xs ++ [0])
pascal = iterate next [1]


Here, we define next to take one row and produce the next row. We then use the iterate function to repeatedly apply next starting at the root element. The solution is short, and follows from the definition.

Laziness for Modularity

William originally posed three questions:


  • Print out the triangle to a specific row: print $ take 100 pascal

  • Return a given row of the triangle: pascal !! 50

  • Return a given element (by row and index) of the triangle: pascal !! 10 !! 5



Thanks to laziness, we can concisely answer all these questions in terms of the original pascal definition. In contrast, using a language such as Python, the best solution (Dynamic Programming from the original article) can only perform the first task.

Interview problems

The original article was not about the choice of programming language, but about choosing suitable questions for interviewing programmers. I agree with William that Pascal's Triangle is a suitable problem - it isn't a trick puzzle, it isn't an API knowledge quiz - it's about understanding how to program. Given how much easier the problem is to solve in Haskell, I wonder if using Haskell in a job interview should be considered cheating? ;-)

Hiding Win32 Windows

Summary: This post describes how to hide windows on the Windows operating system, by using the Win32 API from Haskell.

Imagine that whenever your computer restarts, it pops up a message box:



If you interact with this window, your computer will be destroyed. You can't kill the process, or dismiss the window harmlessly. (This scenario isn't hypothetical...) The solution is to hide the window, so it still exists but is out of the way of misplaced clicks. To hide the window, we first find its OS handle, then we call some Win32 API functions.

Find the Window

To find the handle of the window, we use Spy++. Spy++ comes with Visual Studio, and is bundled in the Express edition (the free version) from 2010 onwards. Start Spy++, go to Search, Find Window, then use the finder tool to select the window in question. Check that the caption of the window matches what Spy++ reports:



The important information is the handle: 0004061E.

Hide the Window

To hide the window you need a programming language capable of making Win32 API calls. In the past I have used Word VBA as the host language, but Haskell is probably easier. Start GHCi, and type:


$ import System.Win32
$ import Graphics.Win32
$ showWindow (castUINTToPtr 0x0004061E) sW_HIDE


Replace 0x0004061E on the final line with 0xyour-handle. The final line should cause the window to be hidden, saving your computer from destruction.

Thanks: Thanks to Roman Leshchinskiy for reminding me that there were better solutions than just trying not to click the window. Thanks to the Win32 Haskell developers - the Win32 binding was a lot of work, which not many people ever see.

Wednesday, December 14, 2011

Enabling Reply To All in Outlook - version 2

I previously described how to enable Reply To All in Outlook. Since then I've modified the VBA script so it works on both Office 2003 and 2007, gives better error messages, automatically replaces the disabled Reply To All buttons, allows editing messages (even if that feature has been disabled) and strips messages of inline images. Installation instructions and links to the code are in the user manual:

http://community.haskell.org/~ndm/darcs/office/ReplyToAll.htm

I don't want to duplicate the installation instructions here, as I will likely revise them over time. However, at the bottom of the manual is a plea to Outlook administrators not to disable Reply To All in the first place. I don't think arguments against disabling essential email functionality can be made often enough, so here is my plea:

Plea to Outlook Administrators



Please do not disable essential email functionality. With the workarounds linked to above, the attempt is futile, but remains deeply inconvenient. Consider the situation where Alice sends an email to Bob, Charlie and Dave asking for some financial details of Mega Corp. Bob has the details on a post-it note on his desk and quickly replies to everyone with the information. But in a world without Reply To All...


  • Bob replies just to Alice. Charlie, being the helpful soul he is, decides to search through the filing cabinet for the information. 15 minutes later Charlie finds the details and emails Alice, only to be told "Thanks, Bob answered this 15 minutes ago". Charlie realises he just wasted 15 minutes of his life, and goes to get a cookie to make himself feel better.

  • Bob replies just to Alice. At the end of the week Dave is reviewing his emails and realises that Alice still hasn't got a reply, but before he potentially wastes 15 minutes, he drops Alice an email - "Do you still want those details?". Alice replies "No". Dave concludes that he's a clever bunny for not going to the filing cabinet, and decides a one week latency when replying to Alice is just common sense.

  • Bob replies, but realising the potential cost of replying just to Alice, also replies to Charlie and Dave. Bob spent a minute retyping the recipient list, and wonders "Why can't Outlook have a button that does this for me?".

  • Bob replies, also including Charles and Dave. Woops! That should have been Charlie, not Charles. As a best case scenario, Charles gets annoyed with a useless email, but at worst Charles brings down Mega Corp with the sensitive information gleaned from the email. Charlie still ends up going to get a cookie.



Removing Reply To All increases the volume of email required, and increases the risk of email accidents. I've heard only two arguments against Reply To All, both of which are wrong:


  • When Bob replies to everyone, the financial information about Mega Corp gets sent to Charlie, but what if Charlie shouldn't have access to that information? Of course, if Charlie shouldn't have access to that information then Alice was at fault for sending the request to Charlie in the first place. Bob should also check when sending sensitive information, but most emails are not sensitive (it is a poor default), checking the recipients should not require retyping the recipients (it is a poor user interface) and by default mailing lists are not fully expanded (so he won't be able to tell the full list of recipients anyway).

  • When Zorg, the owner of Mega Corp, sends a Christmas email to all his employees, what if one of them hits Reply To All? That wastes a lot of company time, and should be prevented - but not on the client side. It is possible to restrict the number of recipients to an email, Zorg could send out the email with everyone in the BCC field, or IT could set up a mailing list which does not permit replies.

Saturday, November 05, 2011

Abstract Generics with Uniplate

Summary: The new version of Uniplate has several wrappers for working with abstract data types, such as Map from the containers package. These wrappers let you transform abstract data types without breaking their invariants.

Abstract data types, such as Map from the containers package, hide their internal structure so they can maintain invariants necessary for their correct operation. Generic programming, such as the Data class used by SYB, allows programs to generically traverse data types, without detailed knowledge of their structure. The two are somewhat incompatible - if Map allows manipulating its internal structure, the invariants could be broken, causing Map to perform incorrectly.

Using the Data class, a Map String Int claims it has no constructors, but if you operate on it with gfoldl it behaves as though it were [(String,Int)]. When Uniplate traverses the Map, such as when searching for an Int, it first analyses the available constructors to see if a Map String Int can possibly contain an Int. Since Map has no constructors, it concludes that Map cannot contain an Int, and Uniplate fails to operate on the contained Ints.

For people who use the Data-style Uniplate module, the new version of Uniplate lets you operate correctly over Map String Int, provided you use the newtype Map wrapper from Data.Generics.Uniplate.Data.Instances. When you transform over Bool (which does not touch the Map) it will ignore the Map and take O(1). When you transform over Int it will reconstruct the Map in O(n), and if you transform over String or Char it will reconstruct the Map in O(n log n). Regardless of what operations you do, it will work efficiently and correctly. As an example:


$ import qualified Data.Map as Map
$ import Data.Char
$ import Data.Generics.Uniplate.Data
$ import Data.Generics.Uniplate.Data.Instances
$ fromMap $ transformBi toUpper $ toMap $ Map.fromList [("haskell",12),("test",18)]
fromList [("HASKELL",12),("TEST",18)]


There are two approaches for dealing with the Map problem in Uniplate. For users of the Direct-style Uniplate module, there is a function plateProject, which has been available for some time. For users of the Data-style Uniplate module, or for users of the SYB package, there is a new module Data.Generics.Uniplate.Data.Instances in uniplate-1.6.4 (released today) which provides three types with special Data instances (Hide, Trigger, Invariant). Using these three types we can construct wrappers providing Data instances for abstract types, and Uniplate includes wrappers for several of the types in the containers package (Map, Set, IntMap, IntSet).

plateProject

The plateProject function helps you define Direct Uniplate instances for abstract types. Instead of defining how to examine the data type, you instead define how to transform the data type into one you can examine, and how to transform it back. As an example:


instance Biplate (Map.Map [Char] Int) Char where
    biplate = plateProject Map.toList Map.fromList


If the types ensure that any operations will not change the keys, we can optimise and use the fromDistinctAscList function to reconstruct the Map:


instance Biplate (Map.Map [Char] Int) Int where
    biplate = plateProject Map.toAscList Map.fromDistinctAscList


Hide

The Hide data type is useful for wrapping values that you want to ignore with Uniplate. The type is defined as:


data Hide a = Hide {fromHide :: a}


But has a Data instance that pretends it is defined using the extension EmptyDataDecls as:


data Hide a


As an example of avoiding particular values, you can write:


transformBi (+1) (1, 2, Hide 3, Just 4) == (2, 3, Hide 3, Just 5)


As a result of having no constructors, any calls to the methods toConstr or gunfold will raise an error.

Trigger

The Trigger data type is useful for determining when a value was constructed with the Data methods. It is defined as:


data Trigger a = Trigger {trigger :: Bool, fromTrigger :: a}


But the Data instance pretends that it is defined as:


data Trigger a = Trigger a


However, whenever gfoldl or gunfold constructs a new value, it will have the trigger field set to True. The trigger information is useful to indicate whether any invariants have been broken, and thus need fixing. As an example:


data SortedList a = SortedList (Trigger [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Trigger False $ sort xs
fromSortedList (SortedList (Trigger t xs)) = if t then sort xs else xs


This data type represents a sorted list. When constructed the items are initially sorted, but operations such as gmapT could break that invariant. The Trigger type is used to detect when the Data operations have been performed, and resort the list.
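
For example (a sketch assuming the definitions above, plus transformBi from Data.Generics.Uniplate.Data and sort from Data.List), a generic transformation that disturbs the ordering trips the trigger, so fromSortedList restores the invariant:

-- transformBi rebuilds the list via gfoldl, setting the trigger to True,
-- so fromSortedList sorts again, giving [-3,-2,-1]
example :: [Int]
example = fromSortedList $ transformBi (negate :: Int -> Int) $ toSortedList [1,2,3 :: Int]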

The Trigger type is often used in conjunction with Invariant, which fixes the invariants.

Invariant

The Invariant data type is useful for ensuring that an invariant is always applied to a data type. It is defined as:


data Invariant a = Invariant {invariant :: a -> a, fromInvariant :: a}


But the Data instance pretends that it is defined as:


data Invariant a = Invariant a


Whenever gfoldl constructs a new value, it will have the function in the invariant field applied to it. As an example:


data SortedList a = SortedList (Invariant [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Invariant sort (sort xs)
fromSortedList (SortedList (Invariant _ xs)) = xs


Any time an operation such as gmapT is applied to the data type, the invariant function is applied to the result. The fromSortedList function can then rely on this invariant.

The gunfold method is partially implemented - all constructed values will have an undefined value for all fields, regardless of which function is passed to fromConstrB. If you only use fromConstr (as Uniplate does) then the gunfold method is sufficient.

Map

Using the Hide, Trigger and Invariant types, we can define a wrapper for the containers Map type as:


newtype Map k v = Map (Invariant (Trigger [k], Trigger [v], Hide (Map.Map k v)))
    deriving (Data, Typeable)


The Map type is defined as an Invariant of three components - the keys, the values, and the underlying Map. We use Invariant to ensure that the keys/values/map always remain in sync. We use Trigger on the keys and values to ensure that whenever the keys or values change we rebuild the Map, but if they don't, we reuse the previous value. The function to extract a containers Map from the wrapper requires only simple pattern matching:


fromMap (Map (Invariant _ (_,_,Hide x))) = x


The function to wrap a containers Map is slightly harder, as we need to come up with an invariant restoring function:


toMap :: Ord k => Map.Map k v -> Map k v
toMap x = Map $ Invariant inv $ create x
    where
        create x = (Trigger False ks, Trigger False vs, Hide x)
            where (ks,vs) = unzip $ Map.toAscList x

        inv (ks,vs,x)
            | trigger ks = create $ Map.fromList $ zip (fromTrigger ks) (fromTrigger vs)
            | trigger vs = create $ Map.fromDistinctAscList $ zip (fromTrigger ks) (fromTrigger vs)
            | otherwise = (ks,vs,x)


The create function creates a value from a Map, getting the correct keys and values. The inv function looks at the triggers on the keys/values. If the keys trigger has been tripped, then we reconstruct the Map using fromList. If the values trigger has been tripped, but the keys trigger has not, we can use fromDistinctAscList, reducing the complexity of constructing the Map. If nothing has changed we can reuse the previous value.

The end result is that all Uniplate (or SYB) traversals over Map result in a valid value, which has had all appropriate transformations applied.

Tuesday, November 01, 2011

Haskell Implementors Workshop 2011

The videos from the Haskell Implementors Workshop 2011 are now online at YouTube. I'm currently uploading them to Vimeo, but that will take another week (I ran out of free quota).

Many thanks to Kento EMOTO who did all the editing and conversion, and to all the speakers for the great talks.

Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particularly with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


However, to make a function that is compatible with R, we have to follow two rules:

  • Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)
  • The result must be IO ()


Obeying these restrictions, we need to use the type:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...


Instead of passing in the list xs, we now pass in:

  • n, the length of the list xs
  • xs, the elements of the list
  • result, a space to put the result


We can implement sumRootsR by using the functions available in the Foreign module:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
    n <- peek n
    xs <- peekArray n xs
    poke result $ sumRoots xs


This function first reads the value of n, then reads the n elements of the pointer array xs into a list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()


Putting everything together, we end up with the Haskell file:

-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
    n <- peek n
    xs <- peekArray n xs
    poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


We also need a C stub file. The one described in the GHC users guide works well:

// StartEnd.c
#include <Rts.h>

void HsStart()
{
    int argc = 1;
    char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

    // Initialize Haskell runtime
    char** args = argv;
    hs_init(&argc, &args);
}

void HsEnd()
{
    hs_exit();
}


We can now compile our library with the commands:

ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o


This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:

dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")


We can now invoke our function:

input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result


This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:

sumRoots <- function(input)
{
    return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}

Now we can write:

sumRoots(c(12,444.34))

And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.

Update: See the comments below from Alex Davis for how to do such things on newer versions of Mac OS.

Saturday, October 01, 2011

Haskell Implementors Workshop 2011 slides

I've just uploaded the slides for all the talks given at the Haskell Implementors Workshop 2011 - just click on "Slides" next to any of the talks. Many thanks to all the speakers for very interesting talks, and for sending me their slides.

There will be video, hosted on either YouTube or Vimeo, at some point in the future. I've got all the copyright permission forms, but I don't yet have the video footage. I'm currently working to get a copy of the video sent from Japan.

Template Haskell fights with Generic Programming

Summary: The InfixE construction in Template Haskell fits poorly with generic programming, because its type does not capture all its restrictions. This mismatch can result in confusing bugs, but there is a simple workaround.

I have often said that anyone manipulating abstract syntax trees, without using some form of generic programming, is doing it wrong. Recently I have been manipulating Template Haskell syntax trees using Uniplate, my preferred generic programming library. Consider the problem of replacing all instances of delete with deleteBy (==) - this task can be done with Template Haskell:


{-# LANGUAGE TemplateHaskell #-}
module IHateDelete where
import Data.List
import Language.Haskell.TH
import Data.Generics.Uniplate.Data

iHateDelete :: Q [Dec] -> Q [Dec]
iHateDelete = fmap (transformBi f)
    where
        f :: Exp -> Exp
        f (VarE x) | x == 'delete = VarE 'deleteBy `AppE` VarE '(==)
        f x = x


We can test this function with:


{-# LANGUAGE TemplateHaskell #-}
import IHateDelete
import Data.List

$(iHateDelete
    [d|
        mapDelete x = map (delete x)
        myElem x xs = length (delete x xs) /= length xs
    |])


To see the result of running iHateDelete pass the flag -ddump-splices. As far as we can tell, our iHateDelete function works perfectly. But wait - let's try another example:


$(iHateDelete
    [d|
        myDelete x xs = x `delete` xs
    |])


In GHC 6.12, we get a GHC panic. In GHC 7.2 we get the error message:


Operator application with a non-variable operator: deleteBy (==)
(Probably resulting from a Template Haskell splice)


(I would find this message far clearer if it said "Infix application..." rather than "Operator application...")

The body of myDelete starts out as:


InfixE (Just (VarE 'x)) (VarE 'delete) (Just (VarE 'xs))


After our transformation, this becomes:


InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) (VarE '(==))) (Just (VarE 'xs))


Or, written in Haskell syntax:


x `deleteBy (==)` xs


This expression is not valid Haskell, and causes an error when spliced back in (when inserted back into the Haskell code).

The Problem

The underlying problem is called out in the Template Haskell Exp documentation:


InfixE (Maybe Exp) Exp (Maybe Exp)
-- ^ It's a bit gruesome to use an Exp as the operator, but how else can we distinguish constructors from non-constructors?
-- Maybe there should be a var-or-con type? Or maybe we should leave it to the String itself?


The operator in InfixE has a type which permits any expression, but has the restriction that when spliced back in the expression must only be a VarE or ConE. This restriction poses a problem for generic programming, where values are treated in a uniform manner. Sadly, both of the suggested fixes would also cause problems for generic programming. Perhaps the true fix is to let Haskell have arbitrary expressions for infix operators? Or perhaps Template Haskell should translate InfixE to AppE if the operator is incompatible with Haskell?

The Workaround

As a workaround, you can translate away all InfixE expressions that have a complex middle expression. I use the following function:


fixupInfix :: [Dec] -> [Dec]
fixupInfix = transformBi f
    where
        bad VarE{} = False
        bad ConE{} = False
        bad _ = True

        f (InfixE a b c) | bad b = case (a,c) of
            (Nothing, Nothing) -> b
            (Just a , Nothing) -> b `AppE` a
            (Nothing, Just c ) -> VarE 'flip `AppE` b `AppE` c
            (Just a , Just c ) -> b `AppE` a `AppE` c
        f x = x


The original iHateDelete can then be modified to become:


iHateDelete = fmap (fixupInfix . transformBi f)



Tuesday, September 06, 2011

Faster WPF ComboBoxes

If you create a WPF ComboBox with 2000 items it will take about 2 seconds to drop down, and about 1 second to retract back. But you can make all operations complete instantly if the items are simple (such as strings). If you are writing XAML, add:


<ComboBox>
    <ComboBox.ItemsPanel>
        <ItemsPanelTemplate>
            <VirtualizingStackPanel />
        </ItemsPanelTemplate>
    </ComboBox.ItemsPanel>

    ...
</ComboBox>


If you are writing C#, add:


comboBox.ItemsPanel = new ItemsPanelTemplate(new FrameworkElementFactory(typeof(VirtualizingStackPanel)));


Thanks to Henry Hahn for the XAML code I started from. Reading the C# version, I am reminded of the Hammer Factory joke.

Sunday, September 04, 2011

Sharing in Haskell

Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.

An easy way to improve performance is to call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:

Example 1


f x y = sqrt x + y
result = f 1 2 + f 1 4


Example 2


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y
result = let f_1 = f 1 in f_1 2 + f_1 4


Question

In each example, how many times is sqrt executed to compute result? (Assume no advanced optimisations - these often break down on larger examples.)

Answer

In Example 1 we execute sqrt twice, while in Example 2 we execute sqrt once. To go from Example 1 to Example 2 we need to make two changes:


  • Step 1: Rewrite f to compute sqrt after one argument instead of two.

  • Step 2: Rewrite result to share the result of f with one argument.



Performing either rewrite alone will still result in sqrt being executed twice.

Step 1: Rewriting f

Let's take a look at the original definition of f:


f x y = sqrt x + y


Rewriting this function in English, we can describe it as:


given x and y, compute sqrt x + y


But the computation of sqrt x does not depend on y. If the computation of sqrt x is expensive, and if we know the function will often be called with the same x for many different values of y, it is better to describe it as:


given x, compute sqrt x, then given y, add that value to y


The Haskell syntax for this description is:


f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y


Which would usually be written in the equivalent declaration form as:


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y


Step 2: Using the rewritten f

If we look at the definition of result:


result = f 1 2 + f 1 4


We see that the subexpression f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:


result = let f_1 = f 1 in f_1 2 + f_1 4


With the original definition of f, commoning up f 1 would have had no performance benefit - after f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of f, the value f_1 will create the computation of sqrt 1, which will be performed only once when executed by f_1 2 and f_1 4.

The Optimisation

This optimisation technique can be described as:


  • Step 1: Rewrite the function to perform some computation before all arguments are supplied.

  • Step 2: Share the partially applied function.



Crucially the function in Step 1 must take its arguments in an order that allows computation to be performed incrementally.
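
For instance, once f has been rewritten, a single partial application can be shared across a whole map, paying the cost of sqrt only once - a sketch:

result2 = sum (map (f 1) [1..1000])
-- With the rewritten f, sqrt 1 is computed once and shared by all 1000
-- applications; with the original two-argument f it is computed 1000 times.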

A Practical Example

In previous versions of Hoogle, the function I wrote to resolve type synonyms (e.g. type String = [Char]) was:


resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig


Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:


resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x


This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:


data SynonymTable
buildSynonymTable :: [Synonym] -> SynonymTable
resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig


The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.

Conclusion

I think all Haskell programmers benefit from understanding how the interaction of lambda and let affects sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public APIs that rely on partial application to get acceptable performance - it's too hard to discover.

Friday, August 19, 2011

IFL 2011 submission

Summary: Submit to IFL 2011. It's a great breeding ground for ideas.

The IFL 2011 submission deadline has been extended until 31st August, and, in a move I thoroughly agree with, the notification of acceptance is now within 24 hours of submission!

IFL (and TFP) do not work in the same way as ICFP/Haskell Symposium. You submit a paper, describing the work you've done (something functional programming related) and then present the paper. After the conference, you resubmit the paper, and the result is reviewed in greater depth. There is a great opportunity to learn from the experience, and produce a second paper that is far better.

I submitted my original supercompilation work to IFL 2007. You can compare the paper I submitted before the conference with the paper I submitted afterwards. Note that the original paper doesn't even mention the word supercompilation! At the conference I talked to many people, learnt a lot about supercompilation, program optimisation and termination orderings (and lots of other interesting topics). Using this experience, and building on the enthusiasm people had expressed, I was able to produce a much better second paper. My work on supercompilation went into my thesis, and led to a subsequent ICFP paper.

I think that IFL/TFP are great venues at which to present your work and, in presenting and discussing it, to understand more about the work - making it better in the process. You have just under 2 weeks to make the IFL 2011 deadline of 31st August.

Sunday, July 17, 2011

Submit to the Haskell Implementors Workshop

Are you going to ICFP 2011 in Japan? Do you work with Haskell? You should consider offering a talk for the Haskell Implementors Workshop! You have until this coming Friday (22nd July 2011) to offer a talk, by emailing an abstract (less than 200 words) to benl -AT- cse.unsw.edu.au - for full details see the Call for Talks. Talks are 20 minutes long, plus 10 minutes for questions.

What topics are suitable?

If you have been doing or thinking stuff that would interest a bunch of Haskell implementors in a pub, you probably have something suitable for a talk. The aim of the workshop is to provide an "opportunity to bat around ideas, share experiences, and ask for feedback from fellow experts". Unlike ICFP or the Haskell Symposium, there are no published proceedings. If you have some work that isn't quite ready for a more formal setting, this workshop is a useful venue to seek feedback. If you have an idea which seems a bit crazy, it's probably also very interesting!

What if I have less to say?

In addition to the 30 minute talks, the Haskell Implementors Workshop will also have 2-10 minute lightning talks, organised on the day. Attend the workshop, and sign up on the day to give a brief talk.

Saturday, June 18, 2011

Changing the Windows titlebar gradient colors

Summary: This post describes how to change the gradient colors of the titlebar of a single window on Windows XP, using C#.

On Windows XP the titlebar of most windows has a gradient coloring:



For various reasons, I wanted to override the titlebar gradient colors on a single window. The SetSysColors function can change the colors, but that applies to all windows and immediately forces all windows to repaint, which is slow and very flickery. You can paint the titlebar yourself (like Chrome does), but then you need to draw and handle min/max buttons etc. After some experimentation, using the unofficial SetSysColorsTemp function, I was able to change the gradient for a single window. As an example, here are some screenshots, overriding the left color with green, then overriding both colors with green/blue and orange/blue:



Disclaimer: This code uses an undocumented function, and while it seems to work robustly on my copy of Windows XP, it is unlikely to work on future versions of Windows. Generally, it is a bad idea to override user preferences - for example the titlebar text may not be visible with overridden gradient colors.

All the necessary code is available at the bottom of this post, and can be used by anyone, for any purpose. The crucial function is SetSysColorsTemp, which takes an array of colors and an array of brushes created from those colors. If you call SetSysColorsTemp passing a new array of colors/brushes they will override the global system colors until a reboot, without forcing a repaint. Passing null for both arrays will restore the colors to how they were before. To make the color change apply only to the current window we hook into WndProc, and detect when the colors are about to be used in a paint operation. We set the system colors beforehand, call base.WndProc (which paints the titlebar using our modified system colors), then put the original colors back.

There are two known issues:

1) If you change the colors after the titlebar has been drawn, the new colors will not be used until the titlebar next repaints. I suspect this problem is solvable.
2) As you can see in the demo screenshots, the area near the min/max buttons does not usually get recolored. If you only change the left gradient color this doesn't matter.



using System;
using System.Drawing;
using System.Runtime.InteropServices;
using System.Windows.Forms;

public class GradientForm : Form
{
// Win32 API calls, often based on those from pinvoke.net
[DllImport("gdi32.dll")] static extern bool DeleteObject(int hObject);
[DllImport("user32.dll")] static extern int SetSysColorsTemp(int[] lpaElements, int [] lpaRgbValues, int cElements);
[DllImport("gdi32.dll")] static extern int CreateSolidBrush(int crColor);
[DllImport("user32.dll")] static extern int GetSysColorBrush(int nIndex);
[DllImport("user32.dll")] static extern int GetSysColor(int nIndex);
[DllImport("user32.dll")] private static extern IntPtr GetForegroundWindow();

// Magic constants
const int SlotLeft = 2;
const int SlotRight = 27;
const int SlotCount = 28; // Math.Max(SlotLeft, SlotRight) + 1;

// The colors/brushes to use
int[] Colors = new int[SlotCount];
int[] Brushes = new int[SlotCount];

// The colors the user wants to use
Color titleBarLeft, titleBarRight;
public Color TitleBarLeft{get{return titleBarLeft;} set{titleBarLeft=value; UpdateBrush(SlotLeft, value);}}
public Color TitleBarRight{get{return titleBarRight;} set{titleBarRight=value; UpdateBrush(SlotRight, value);}}

void CreateBrushes()
{
for (int i = 0; i < SlotCount; i++)
{
Colors[i] = GetSysColor(i);
Brushes[i] = GetSysColorBrush(i);
}
titleBarLeft = ColorTranslator.FromWin32(Colors[SlotLeft]);
titleBarRight = ColorTranslator.FromWin32(Colors[SlotRight]);
}

void UpdateBrush(int Slot, Color c)
{
DeleteObject(Brushes[Slot]);
Colors[Slot] = ColorTranslator.ToWin32(c);
Brushes[Slot] = CreateSolidBrush(Colors[Slot]);
}

void DestroyBrushes()
{
DeleteObject(Brushes[SlotLeft]);
DeleteObject(Brushes[SlotRight]);
}

// Hook up to the Window

public GradientForm()
{
CreateBrushes();
}

protected override void Dispose(bool disposing)
{
if (disposing) DestroyBrushes();
base.Dispose(disposing);
}

protected override void WndProc(ref System.Windows.Forms.Message m)
{
const int WM_NCPAINT = 0x85;
const int WM_NCACTIVATE = 0x86;

if ((m.Msg == WM_NCACTIVATE && m.WParam.ToInt32() != 0) ||
(m.Msg == WM_NCPAINT && GetForegroundWindow() == this.Handle))
{

int k = SetSysColorsTemp(Colors, Brushes, Colors.Length);
base.WndProc(ref m);
SetSysColorsTemp(null, null, k);
}
else
base.WndProc(ref m);
}
}

Tuesday, June 07, 2011

INLINE pragmas in the Safe library

Summary: The Safe library has lots of small functions, none of which have INLINE pragmas on them. The lack of INLINE pragmas probably has no effect on the optimisation.

I was recently asked a question about the Safe library:


I was wondering why you didn't use any INLINE pragmas in the Safe library. I'm not a Haskell expert yet, but I've noticed that in many other libraries most one-liners are annotated by INLINE pragmas, so is there any reason you didn't add them to Safe as well?


When compiling with optimisations, GHC tries to inline functions/values that are either "small enough" or have an INLINE pragma. These functions will be inlined within their module, and will also be placed in their module's interface file, for inlining into other modules. The advantage of inlining is avoiding the function call overhead, and possibly exposing further optimisations. The disadvantage is that the code size may grow, which may result in poor utilisation of the processor's instruction cache.
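
For readers who have not seen one, an INLINE pragma is attached to a binding like this (a generic sketch, not code from the Safe library):


module Example (double) where

double :: Int -> Int
double x = x + x
{-# INLINE double #-}
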

The Safe library contains wrappers for lots of partial Prelude and Data.List functions (e.g. head), with versions that don't fail, or fail with more debugging information. Using the tail function as an example, the library supplies four additional variants (a short usage sketch follows the list):


  • tail :: [a] -> [a], crashes on tail []

  • tailNote :: String -> [a] -> [a], takes an extra argument which supplements the error message

  • tailDef :: [a] -> [a] -> [a], takes a default to return instead of errors

  • tailMay :: [a] -> Maybe [a], wraps the result in a Maybe

  • tailSafe :: [a] -> [a], returns some sensible default if possible, [] in the case of tail
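
A quick usage sketch of these variants (my own example, relying only on the signatures above):


import Safe

main :: IO ()
main = do
    print (tailMay ([] :: [Int]))              -- Nothing
    print (tailMay [1,2,3 :: Int])             -- Just [2,3]
    print (tailDef [0] ([] :: [Int]))          -- the supplied default: [0]
    print (tailSafe ([] :: [Int]))             -- []
    print (tailNote "in main" [1,2,3 :: Int])  -- [2,3], and a better error on []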



As the questioner correctly noted, there are no INLINE pragmas in the library. Taking the example of tailSafe, it is defined as:


tailSafe :: [a] -> [a]
tailSafe = liftSafe tail null

liftSafe :: (a -> a) -> (a -> Bool) -> (a -> a)
liftSafe func test val = if test val then val else func val


Given this indirection, and the lack of an INLINE pragma, is calling tailSafe as cheap as you might hope? We can answer this question by looking at the interface file at the standard optimisation level, by running ghc -ddump-hi Safe.hs -O -c. Looking at the definition attached to tailSafe, we see:


tailSafe = \val -> case val of
[] -> []
ds1:ds2 -> ds2


The tailSafe function has been optimised as much as it can be, and will be inlined into other modules. Adding an INLINE pragma would have no effect at standard optimisation. When compiling without optimisation, even with an INLINE pragma, the function is not included in the module's interface. Adding the INLINE pragma to tailSafe is of no benefit.

Let us now look at all functions in the Safe module. When compiled without optimisation (-O0) no functions are placed in the interface file. At all higher levels of optimisation (-O1 and -O2) the interface files are identical. Looking through the functions, only 6 functions are not included in the interface files:

abort

The abort function is defined as:


abort = error


Neither adding an INLINE pragma nor eta expanding (adding an additional argument) includes the function in the interface file. I suspect that GHC decides no further optimisation is possible, so doesn't ever inline abort.
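
For concreteness, the eta-expanded attempt with a pragma looks roughly like this (my reconstruction, not the code that ships in Safe):


{-# INLINE abort #-}
abort :: String -> a
abort x = error x    -- eta expanded: an explicit argument, rather than abort = error
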

atMay and $watMay

The function atMay is recursive, and thus cannot be inlined even when its definition is available, so GHC sensibly omits it from the interface file. The function $watMay is the worker/wrapper definition of atMay, which is also recursive.

readNote and readMay

The functions readNote and readMay both follow exactly the same pattern, so I describe only readNote. The function readNote is quite big, and GHC has split it into two top-level functions - producing both readNote and readNote1. The function readNote is included in the interface file, but readNote1 is not. The result is that GHC will be able to partially inline the original definition of readNote - however, the read functions tend to be rather complex, so GHC is unlikely to benefit from the additional inlining it has managed to achieve.

atNote

The atNote function is bigger than the inlining threshold, so does not appear in the interface file unless an INLINE pragma is given. Since the inner loop of atNote is recursive, it is unlikely that inlining would give any noticeable benefit, so GHC's default behaviour is appropriate.

Conclusion

Having analysed the interface files, I do not think it is worth adding INLINE pragmas to the Safe library - GHC does an excellent job without my intervention. My advice is to only add INLINE pragmas after either profiling, or analysing the resulting Core program.

Sunday, May 22, 2011

Hoogle talk from TFP 2011 [PDF]

Last week I went to TFP 2011, and gave a talk on Hoogle entitled "Finding Functions from Types". The slides are now available online. These slides give some information about how the type searching works in Hoogle, and I intend to write further details in the future.

Sunday, May 08, 2011

CmdArgs is not dangerous

Summary: CmdArgs can be used purely, and even if you choose to use it in an impure manner, you don't need to worry.

As a result of my blog post yesterday, a few people have said they have been put off using CmdArgs. There are three reasons why you shouldn't be put off using CmdArgs, even though it has some impure code within it.

1: You can use CmdArgs entirely purely

You do not need to use the impure code at all. The module System.Console.CmdArgs.Implicit provides two ways to write annotated records. The first way is impure, and uses cmdArgs and &=. The second way is pure, and uses cmdArgs_ and +=. Both ways have exactly the same power. For example, you can write either of:


sample = cmdArgs $
Sample{hello = def &= help "World argument" &= opt "world"}
&= summary "Sample v1"

sample = cmdArgs_ $
record Sample{} [hello := def += help "World argument" += opt "world"]
+= summary "Sample v1"


The first definition is impure. The second is pure. Both are equivalent. I prefer the syntax of the first version, but the second version is not much longer or uglier. If the impurities scare you, just switch. The Implicit module documentation describes four simple rules for converting between the two methods.

2: You do not need to use annotated records at all

Notice that the above module is called Implicit. CmdArgs also features an Explicit module, where you create a data type representing your command line arguments (and which is entirely pure). For example:


arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
[flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
,flagHelpSimple (("help",""):)]
where upd msg x v = Right $ (msg,x):v


Here you construct modes and flags explicitly, and pass functions which describe how to update the command line state. Everything in the Implicit module maps down to the Explicit module. In addition, you can use the GetOpt compatibility layer, which also maps down to the Explicit parser.
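
To actually run the explicit parser, a minimal sketch (assuming processArgs from System.Console.CmdArgs.Explicit, which parses the program's real command line against the arguments mode defined above) might be:


main :: IO ()
main = do
    opts <- processArgs arguments   -- parse the real command line
    print opts                      -- opts :: [(String,String)]
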

Having written command line parsers with annotated records, I'd never go back to writing them out in full. However, if you want to map your own command line parser description down to the Explicit version you can get help messages and parsing for free.

3: I fight the optimiser so you don't have to

Even if you choose to use the impure implicit version of CmdArgs, you don't need to do anything special, even at high optimisation levels. I have an extensive test suite, and will continue to ensure CmdArgs programs work properly - I rely on it for several of my programs. While I find the impure implicit version nicer to work with, I am still working on making a pure version with the same syntax, developing alternative methods of describing the annotations, developing quasi-quoters etc.

Saturday, May 07, 2011

CmdArgs - Fighting the GHC Optimiser

Summary: Everyone using GHC 7 should upgrade to CmdArgs 0.7. GHC's optimiser got better, so CmdArgs optimiser avoidance code had to get better too.

Update: Has this post scared you off CmdArgs? Read why CmdArgs is not dangerous.

CmdArgs is a Haskell library for concisely specifying command line arguments, see this post for an introduction. I have just released versions 0.6.10 and 0.7 of CmdArgs, which are a strongly recommended upgrade, particularly for GHC 7 users. (The 0.6.10 is so anyone who specified 0.6.* in their Cabal file gets an automatic upgrade, and 0.7 is so people can write 0.7.* to require the new version.)

Why CmdArgs is impure

CmdArgs works by annotating fields to determine what command line argument parsing you want. Consider the data type:


data Opts = Opts {file :: FilePath}


By default this will create a command line which would respond to myprogram --file=foo. If instead we want to use myprogram foo we need to attach the annotation args to the file field. We can do this in CmdArgs with:


cmdArgs $ Opts {file = "" &= args}


However, CmdArgs does not have to be impure - you can equally use the pure variant of CmdArgs and write:


cmdArgs_ $ record Opts{} [file := "" += args]


Sadly, this code is a little more ugly. I prefer the impure version, but everything is supported in both versions.

I am still experimenting with other ways of writing an annotated record, weighing the trade-offs between purity, safety and the syntax required. I have been experimenting with pure variants tonight, and hope in a future release to make CmdArgs pure by default.

GHC vs CmdArgs

Because CmdArgs uses unsafe and untracked side effects, GHC's optimiser can manipulate the program in ways that change the semantics. A standard example where GHC's optimiser can harm CmdArgs is:


Opts2 {file1 = "" &= typFile, file2 = "" &= typFile}


Here the subexpression "" &= typFile is duplicated, and if GHC spots this duplication, it can use common sub-expression elimination to transform the program to:


let x = "" &= typFile in Opts2 {file1 = x, file2 = x}


Unfortunately, because CmdArgs is impure, this program attaches the annotation to file1, but not file2.

This optimisation problem happens in practice, and can be eliminated by writing {-# OPTIONS_GHC -fno-cse #-} in the source file defining the annotations. However, it is burdensome to require all users of CmdArgs to add pragmas to their code, so I investigated how to reduce the problem.
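
For illustration, the pragma sits at the top of the module that defines the annotated record; a minimal sketch (the module and record names are just examples) might be:


{-# LANGUAGE DeriveDataTypeable #-}
{-# OPTIONS_GHC -fno-cse #-}
module Main where

import System.Console.CmdArgs

data Opts2 = Opts2 {file1 :: FilePath, file2 :: FilePath}
    deriving (Show, Data, Typeable)

main :: IO ()
main = print =<< cmdArgs Opts2 {file1 = "" &= typFile, file2 = "" &= typFile}
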

Beating GHC 6.10.4

The key function used to beat GHC's optimiser is &=. It is defined as:


(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn x y


In order to stop CSE, I use two tricks. Firstly, I mark &= as INLINE, so that its definition ends up in the annotations - allowing me to try and modify the expression so it doesn't become suitable for CSE. For GHC 6.10.4 I then made up increasingly random expressions, with increasingly random pragmas, until the problem went away. The end solution was:


{-# INLINE (&=) #-}
(&=) :: (Data val, Data ann) => val -> ann -> val
(&=) x y = addAnn (id_ x) (id_ y)

{-# NOINLINE const_ #-}
const_ :: a -> b -> b
const_ f x = x

{-# INLINE id_ #-}
id_ :: a -> a
id_ x = const_ (\() -> ()) x


Beating GHC 7.0.1

Unfortunately, after upgrading to GHC 7.0.1, the problem happened again. I asked for help, and then started researching GHC's CSE optimiser (using the Hoogle support for GHC I added last weekend - a happy coincidence). The information I found is summarised here. Using this information I was able to construct a much more targeted solution (which I can actually understand!):


{-# INLINE (&=) #-}
(&=) x y = addAnn x (unique x)

{-# INLINE unique #-}
unique x = case unit of () -> x
where unit = reverse "" `seq` ()


As before, we INLINE the &=, so it gets into the annotations. Now we want to make all annotations appear different, even though they are the same. We use unique which is equivalent to id, but when wrapped around an expression causes all instances to appear different under CSE. The unit binding has the value (), but in a way that GHC can't reduce, so the case does not get eliminated. GHC does not CSE case expressions, so all annotations are safe.

How GHC will probably defeat CmdArgs next

There are three avenues GHC could explore to defeat CmdArgs:


  • Reorder the optimisation phases. Performing CSE before inlining would defeat everything, as any tricks in &= would be ignored.

  • Better CSE. Looking inside case would defeat my scheme.

  • Reduce unit to (). This reduction could be done in a number of ways:

    • Constructor specialisation of reverse should manage to reduce reverse "" to "", which seq can then evaluate, allowing the case to be eliminated. I'm a little surprised this isn't already happening, but I'm sure it will one day.

    • Supercompilation can inline recursive functions, and inlining reverse would eliminate the case.

    • If GHC could determine reverse "" was total it could eliminate the seq without knowing its value. This is somewhat tricky for reverse as it isn't total for infinite lists.





Of these optimisations, I consider the reduction of unit to be most likely, but also the easiest to counteract.

How CmdArgs will win

The only safe way for CmdArgs to win is to rewrite the library to be pure. I am working on various annotation schemes and hope to have something available shortly.

Sunday, May 01, 2011

Searching GHC with Hoogle

Summary: Hoogle can now search the GHC source code. There are also lots of small improvements in the latest version.

A few weeks ago Ranjit Jhala asked me for help getting Hoogle working on the GHC documentation. As a result of this conversation, I've now released Hoogle 4.2.3, and upgraded the Hoogle web tool.

For GHC developers

You can search the GHC documentation using the standard Hoogle website, for example: llvm +ghc

To search within a package simply write +package in your search query. The ghc package on Hoogle includes all the internals for GHC.

If you want to search using the console, you can install Hoogle and generate the GHC package database with:


cabal update
cabal install hoogle
hoogle data default ghc


You can now perform searches with:


hoogle +ghc llvm


For all Hoogle users

The new release of Hoogle contains a number of small enhancements:


  • The web server has been upgraded to Warp. I'll write a blog post shortly on the move to Warp - but generally it's been a very positive step.

  • Some of the snippets of documentation have been fixed, where the markup was interpreted wrongly.

  • The expand button next to the documentation now only appears if there is more information to expand.

  • Some iPad integration, so you can now add it to your home screen with a nice icon.

  • Work on a deployment script to automate uploading a new version to the web server, which will allow for more frequent updates (until now it took over 2 hours to deploy a new version).

  • Updates as some web resources moved around, particularly the Haskell Platform cabal file.



The theory behind Hoogle

I'll be talking about the theory behind type searching in Hoogle at Trends in Functional Programming 2011 in Madrid in a few weeks time. It's not too late to register.

Friday, April 29, 2011

Darcs for my Wife

Summary: Using the version control system Darcs is simple. This document describes the commands I use in my workflow.

Darcs is my favourite version control system because of its simple user interface. I am able to do everything I need with a handful of simple commands. I work with darcs as a single user, without branches, occasionally accepting small external contributions. This guide is designed to provide a sufficient reference for someone who isn't a computer expert to continue using Darcs after their friendly comp-sci has set up the SSH bits for them (e.g. my wife).

A Darcs repository (or repo) is a set of changes (sometimes referred to as patches). There are three places changes may live:


  • Remote Repo, such as on community.haskell.org.

  • Local Repo, on your computer. You may have many local repos all using the same remote repo, especially if you use multiple computers.

  • Local Changes, modifications you have made to your local repo, by editing files.



The way changes move between locations is described by:



Basic Commands

I use four commands daily, shown in red on the above diagram (a typical session is sketched after the list), which are:


  • darcs pull - copy changes from the remote repo to your local repo. Use pull at the beginning of the day, and during the day if you are sharing a remote repo with other people.

  • darcs whatsnew - see what local changes you have made. A useful flag is --summary, which lists only the names of files which have changed.

  • darcs record - after you have decided to keep your changes, use record to store them in your local repo.

  • darcs push --no-set-default ndm@code.haskell.org:/srv/code/hoogle - at the end of the day use push to move your changes to the remote repo. This command is the trickiest to use: you need SSH access to the server, and you need to use the file path of the repo, not the URL exposed by the website. I recommend writing a .bat file to automate this command, or using the neil push command if appropriate.
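
For example, a typical day against an existing repo looks like this (pull in the morning, edit files during the day, then review, record and push; the push path is the same example as above):


darcs pull
darcs whatsnew --summary
darcs record
darcs push --no-set-default ndm@code.haskell.org:/srv/code/hoogle
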



In addition to the four commands above, there are two additional basic commands that are used more rarely:


  • darcs add filename - tell darcs that you have created a new file that should be kept in the repo. If you forget to add a file, darcs will not store it for you.

  • darcs mv oldname newname - rename a file stored in darcs. Use this command instead of directly renaming the file using a file explorer.



Creating Repos

Usually I work with existing repos, but it's useful to know how to create new repos:


  • darcs init - create an empty remote repo, with no changes in it. This command is usually run on the server.

  • darcs get http://code.haskell.org/hoogle - create a new local repo based on a remote repo.



Advanced Commands

These commands are useful to correct mistakes, but they aren't essential - you can do everything you need without them.


  • darcs revert - undo local changes that have not yet been recorded.

  • darcs unrecord - undo a previous record command. Do not unrecord if you have pushed the changes, or have done substantial work since the initial record. Instead, manually create a new change undoing the previous record, and record that.



Collaboration Commands

If you are accepting occasional contributions from other people, you will need the following commands:


  • darcs send -o filename.patch - make a file containing all the changes in your local repo, but not in the remote repo. This file can be emailed to other people.

  • darcs apply filename.patch - apply a set of changes sent to you by email.



Conclusion

Effective use of darcs requires only a few simple commands, with only a few command line switches. This guide includes all the commands I ever use. Darcs rarely surprises me, and I think it is a suitable version control system for people who are familiar with the command line, but aren't computer experts.