Neil Mitchell's Blog (Haskell etc): 2009

Monday, November 23, 2009

Haskell DLL's on Windows

The current section of the GHC manual on creating DLL's on Windows is fairly confusing to read, and has some bugs (e.g. 3605). Since I got tripped up by the current documentation, I offered to rewrite sections 11.6.2 and 11.6.3 (merging them in the process). Creating Windows DLL's with GHC is surprisingly easy, and my revised manual section includes an example which can be called from both Microsoft Word (using VBA) and C++. I've pasted the revised manual section as the rest of this blog post. I'll shortly be submitting it to the GHC team, so any feedback is welcome.

11.6.2. Making DLLs to be called from other languages

This section describes how to create DLLs to be called from other languages, such as Visual Basic or C++. This is a special case of Section 8.2.1.2, "Making a Haskell library that can be called from foreign code"; we'll deal with the DLL-specific issues that arise below. Here's an example:

Use foreign export declarations to export the Haskell functions you want to call from the outside. For example:


-- Adder.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module Adder where

adder :: Int -> Int -> IO Int  -- gratuitous use of IO
adder x y = return (x+y)

foreign export stdcall adder :: Int -> Int -> IO Int

Add some helper code that starts up and shuts down the Haskell RTS:


// StartEnd.c
#include <Rts.h>

extern void __stginit_Adder(void);

void HsStart()
{
   int argc = 1;
   char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

   // Initialize Haskell runtime
   char** args = argv;
   hs_init(&argc, &args);

   // Tell Haskell about all root modules
   hs_add_root(__stginit_Adder);
}

void HsEnd()
{
   hs_exit();
}

Here, Adder is the name of the root module in the module tree (as mentioned above, there must be a single root module, and hence a single module tree in the DLL). Compile everything up:


$ ghc -c Adder.hs
$ ghc -c StartEnd.c
$ ghc -shared -o Adder.dll Adder.o Adder_stub.o StartEnd.o

Now the file Adder.dll can be used from other programming languages. Before calling any functions in Adder it is necessary to call HsStart, and at the very end call HsEnd.

NOTE: It may appear tempting to use DllMain to call hs_init/hs_exit, but this won’t work (particularly if you compile with -threaded).

11.6.2.1. Using from VBA

An example of using Adder.dll from VBA is:


Private Declare Function Adder Lib "Adder.dll" Alias "adder@8" _
      (ByVal x As Long, ByVal y As Long) As Long

Private Declare Sub HsStart Lib "Adder.dll" ()
Private Declare Sub HsEnd Lib "Adder.dll" ()

Private Sub Document_Close()
HsEnd
End Sub

Private Sub Document_Open()
HsStart
End Sub

Public Sub Test()
MsgBox "12 + 5 = " & Adder(12, 5)
End Sub

This example uses the Document_Open/Close functions of Microsoft Word, but provided HsStart is called before the first function, and HsEnd after the last, then it will work fine.

11.6.2.2. Using from C++

An example of using Adder.dll from C++ is:


// Tester.cpp
#include "HsFFI.h"
#include "Adder_stub.h"
#include <stdio.h>

extern "C" {
    void HsStart();
    void HsEnd();
}

int main()
{
    HsStart();
    // can now safely call functions from the DLL
    printf("12 + 5 = %i\n", adder(12,5));
    HsEnd();
    return 0;
}

This can be compiled and run with:


$ ghc -o tester Tester.cpp Adder.dll.a
$ tester
12 + 5 = 17

Please give feedback in the comments.

Monday, November 16, 2009

Reviewing View Patterns

View Patterns are an interesting extension to the pattern matching capabilities of Haskell, implemented in GHC 6.10 and above. After using view patterns in real world programs, including HLint, I've come to like them. I use view patterns in 10 of the 27 modules in HLint.

View Pattern Overview

My intuitive understanding of view patterns is given in my Approaches and Applications of Inductive Programming 2009 paper, which describes the view pattern translation as:


f (sort -> min:ascending) = ...
    ==
f v_1 | min:ascending <- sort v_1 = ...
    ==
f v_1 | case v_2 of _:_ -> True ; _ -> False = ...
    where  v_2 = sort v_1 ; min:ascending = v_2

The view pattern on the first line sorts the list elements, then binds the lowest element to min and the remaining elements to ascending. If there are no elements in the list then the pattern will not match. This can be translated to a pattern guard, which can then be translated to a case expression. This translation does not preserve the scoping behaviour of the variables, but is sufficient for all my uses of view patterns. It is important to note that the translation from view patterns to pattern guards is fairly simple, and mainly eliminates one redundant intermediate variable. However, the translation from pattern guards to case expressions and guards is substantially harder.
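To make the first step of the translation concrete, here is a minimal sketch of the same match written once as a view pattern and once as a pattern guard. The function names smallestVP and smallestPG are invented for this illustration, and the result is wrapped in Maybe only so that both versions are total.

```haskell
{-# LANGUAGE ViewPatterns #-}
import Data.List (sort)

-- View pattern version: sort the argument, then match on the result.
smallestVP :: [Int] -> Maybe Int
smallestVP (sort -> lowest:_) = Just lowest
smallestVP _ = Nothing

-- Pattern guard version: the same match, with an explicit intermediate variable.
smallestPG :: [Int] -> Maybe Int
smallestPG v_1 | lowest:_ <- sort v_1 = Just lowest
smallestPG _ = Nothing
```

Both definitions fall through to the second equation when the list is empty, which is the "pattern will not match" case described above.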

How I Use View Patterns

My uses of view patterns seem to fall into a few distinct categories. Here are some example code snippets (mainly from HLint), along with explanation.

1) Complex/Nested Matching


uglyEta (fromParen -> App f (fromParen -> App g x)) (fromParen -> App h y) = g == h

Every operation/match pair in a pattern guard requires a separate pattern guard, whereas view patterns can be nested very naturally. Here the abstract syntax tree for expressions has brackets, and the fromParen function unwraps any brackets to find the interesting term inside. View patterns allow us to perform nested matches, which would have required three separate pattern guards.
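As a rough sketch of the comparison, the following uses a tiny made-up expression type (HLint's real syntax tree is far larger) and writes the match both ways. The Exp type, the catch-all equations and the name uglyEtaPG are assumptions for illustration only.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical miniature expression type, not HLint's real one.
data Exp = Var String | App Exp Exp | Paren Exp
    deriving (Eq, Show)

-- Strip any number of enclosing brackets.
fromParen :: Exp -> Exp
fromParen (Paren x) = fromParen x
fromParen x = x

-- Nested view patterns, following the shape of the snippet above.
uglyEta :: Exp -> Exp -> Bool
uglyEta (fromParen -> App _ (fromParen -> App g _)) (fromParen -> App h _) = g == h
uglyEta _ _ = False

-- The same match with pattern guards: one guard per unwrapping step.
uglyEtaPG :: Exp -> Exp -> Bool
uglyEtaPG a b
    | App _ inner <- fromParen a
    , App g _ <- fromParen inner
    , App h _ <- fromParen b
    = g == h
uglyEtaPG _ _ = False
```

The guard version needs the extra name inner to thread the first result into the second match, which is exactly the intermediate variable the nested view pattern avoids.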

2) Matching on a Different Structure


isAppend (view -> App2 op _ _) = op ~= "++"

The expression tree I use in HLint has lots of expressions which apply a function to two arguments - for example App (App (Var f) x) y and InfixOp x f y. I have a type class View that maps expressions into the data type data App2 = NoApp2 | App2 String Exp Exp, allowing easy matching on a range of expressions.
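A minimal sketch of this arrangement is below. The Exp type is a cut-down stand-in, the instance is a guess at the obvious mapping, and plain == replaces HLint's own ~= operator, so treat the details as assumptions rather than HLint's actual code.

```haskell
{-# LANGUAGE ViewPatterns, MultiParamTypeClasses #-}

-- Hypothetical cut-down expression type.
data Exp = Var String | App Exp Exp | InfixOp Exp String Exp
    deriving (Eq, Show)

data App2 = NoApp2 | App2 String Exp Exp
    deriving (Eq, Show)

class View a b where
    view :: a -> b

-- Both ways of applying a function to two arguments map to App2.
instance View Exp App2 where
    view (App (App (Var f) x) y) = App2 f x y
    view (InfixOp x f y) = App2 f x y
    view _ = NoApp2

-- The type signature fixes the source type; the App2 pattern fixes the target.
isAppend :: Exp -> Bool
isAppend (view -> App2 op _ _) = op == "++"
isAppend _ = False
```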

3) Safe Normalisation


dismogrify (simplify -> x) = .... x ....

While working with Yhc Core for the Catch and Supero tools I often wanted to process a syntax tree after simplifying it. If you name the original tree x, and the simplified tree y, then it's an easy (and type-safe) mistake to use x instead of y. To avoid this I wrote:


dismogrify bad_x = .... x ....
   where x = simplify bad_x

Using bad_x in the expression makes the mistake easy for a human to spot. Using a view pattern makes the mistake impossible.
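Here is a small self-contained sketch of the idea. The simplify function is a placeholder that just drops zeros from a list, and countNonZero is an invented name; neither comes from Catch or Supero.

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical normalisation step standing in for a real simplifier:
-- it drops the zeros from a list.
simplify :: [Int] -> [Int]
simplify = filter (/= 0)

-- The unsimplified argument is never named, so it cannot be used by mistake.
countNonZero :: [Int] -> Int
countNonZero (simplify -> x) = length x
```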

4) Mapping


classify (Ident (getRank -> x)) = ...

Sometimes I want to take a variable in one domain, and work with it in another. In the above example getRank converts a String to a Rank enumeration. Within the classify function I only wish to work with the rank as an enumeration, so it's convenient to never bind the string. This pattern is similar to safe normalisation, but its purpose isn't safety - just making things a little neater.
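A sketch of the mapping use, with every type and the ranking rule invented for the purpose (the real Rank and getRank are not shown in this post):

```haskell
{-# LANGUAGE ViewPatterns #-}

-- Hypothetical types and ranking rule, invented for this sketch.
data Rank = Low | High
    deriving (Eq, Show)

newtype Token = Ident String

getRank :: String -> Rank
getRank name = if length name > 8 then High else Low

-- The String inside Ident is never bound; only its Rank is.
classify :: Token -> String
classify (Ident (getRank -> x)) = case x of
    High -> "long identifier"
    Low -> "short identifier"
```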

5) Abstraction

The view pattern example in the GHC manual is all about abstraction. I have mainly used view patterns in programs which don't use abstract data types, just algebraic data types which are intended to be manipulated directly. I don't think there are many data types which are both abstract and have a structural view, so I suspect this use will be less common (Data.Sequence is the only type that comes to mind).

Improvements I Suggest

I think there are three improvements that should be made to the view patterns in GHC 6.10.4:

1) Warnings

In GHC 6.10 all view patterns are incorrectly considered overlapping (see bug #2395), so all users of view patterns need to supply -fno-warn-overlapping-patterns. This problem has been fixed in GHC 6.12, which is great news.

2) Scoping

The current scoping behaviour seems undesirable:


apply (f -> y) = ...
    where f = ...

Here the f in the view pattern isn't the f bound at the where. I suggest that the lhs of the -> can use variables from the where, in a similar manner to pattern guards. (It's possible this suggestion is misguided, as the scoping rules can be quite subtle.)
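For comparison, this is a minimal sketch of the pattern guard form, where the where-bound f is in scope because where bindings scope over guards. The body of f and the arithmetic are placeholders.

```haskell
-- Pattern guard version: f comes from the where clause.
apply :: Int -> Int
apply x
    | y <- f x = y + 1
    where f = (* 2)
```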

3) Implicit Patterns

The original view patterns wiki document asks what should become of (-> ...), and proposes it become (view -> ...). I like this idea as HLint already contains 12 instances of (view -> ...). The only question is which view should be used? I think there are two possible answers:

a) The view currently in scope

If the desugaring is simply to view, then people can select their imports appropriately to choose their view function. This proposal is similar to the rebindable syntax already supported, but in this case may be a legitimate default, due to several possible view interpretations. If one day everyone starts using Data.View.view, then the default could be switched. As an example (in combination with proposal 2) we could have:


uglyEta (-> App f (-> App g x)) (-> App h y) = g == h
    where view = fromParen

b) Data.View.view

In HLint I have used:


class View a b where
    view :: a -> b

I haven't needed any functional dependencies, as the matching always constrains the types sufficiently. I have mapped one source type (i.e. Exp) to several matching types (App2 and App1), but never mapped multiple source types onto one matching type. If I was to add a dependency it should be that b uniquely determines a, as usually b will have a pattern on the RHS which will constrain b already.

I think my preference is for using Data.View.view, primarily because all other Haskell syntax is bound to a fixed name, rather than using the name currently in scope. However, my opinions on functional dependencies should be taken with skepticism - I'm not normally a user of functional dependencies.

4) Rejected Suggestions

I do not support the idea of implicit view patterns without some leading syntax (see bug 3583) - view patterns are nice, but I don't think they are important enough to be first-class, like they are in F# (note that F# interoperates with OO languages, so first-class view patterns are much more reasonable there).

I also do not support the idea of implicit Maybe in view patterns - Maybe should not be special, and this suggestion doesn't seem to fit with the rest of Haskell.

Conclusion

View patterns are a nice enhancement to pattern guards, increasing their compositionality and reducing the need for redundant intermediate variables. I could live without view patterns, but I don't think I should have to - the design is good, and they fit in nicely with the language. As for pattern guards, I consider them an essential part of the Haskell language that really makes a substantial difference to some pieces of code that would otherwise be rather ugly.

Edit: Fix as per Christophe's comment.

Saturday, September 12, 2009

How I Use HLint

HLint is a tool for automatically suggesting improvements to your Haskell code. This post describes how I use HLint, and provides some background on its development. Before reading this article, if you are an active Haskell programmer who has not yet tried out HLint, I suggest you perform the following steps:


cabal update && cabal install hlint
cd your-current-project
hlint . --report
# open report.html in your web browser

The original purpose of HLint was to help teach beginners. When helping with the functional programming course at York, I used to wander round the students, looking at their code, and suggesting improvements. After three years helping with the same course, I found myself regularly suggesting the same improvements. For example, the pattern if a then True else b came up a lot, which can be written more succinctly as a || b. Of course, having turned myself into a pattern recognition tool, the obvious step was to automate myself - and HLint is the result.
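The equivalence behind that particular hint is easy to check exhaustively, since there are only four input combinations. A small sketch (the names verbose, concise and agree are made up here):

```haskell
-- The form students often wrote.
verbose :: Bool -> Bool -> Bool
verbose a b = if a then True else b

-- The more succinct form.
concise :: Bool -> Bool -> Bool
concise a b = a || b

-- True when the two forms agree on every possible input.
agree :: Bool
agree = and [verbose a b == concise a b | a <- [False, True], b <- [False, True]]
```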

I am no longer at a University, and so the way I use HLint has changed. Often on the Haskell Cafe mailing list people ask for code reviews - intermediate level Haskellers trying to gain knowledge from those around them. The suggestions resulting from a code review are often split into two categories. There are small-scale suggestions about things such as using a better library function, and large-scale suggestions about what the structure of the program should be. Often it is useful to tackle the small-scale issues, tidying and polishing what is already there, before investigating any large-scale issues. Unfortunately reviewers are often short of time, so they may not get round to making large-scale suggestions. The hope is that HLint can automate much of the small-scale suggestions, allowing clever people to use their time more effectively on the more complex problems.

Another reason to use HLint is one of developer pride. Some developers do not react well to criticism, and take comments about their code in a very personal way. Worse still, if you declare that some small syntactic pattern is the "wrong way to do it", then you can inadvertently end up just pointing out the failings. In contrast, if HLint is run first, then the human suggestions are typically deeper, and are design trade-offs that can be debated.

HLint is not designed as a tool to fix existing code, but more as a tool to promote learning, thus pre-emptively fixing future code. I do not intend people to slavishly apply the hints given by HLint - each hint should be carefully considered. For example, the darcs project uses HLint, but has decided that they are not interested in eta reduction hints, so have used HLint's ignoring facility.

One use of HLint is to provide an easy mechanism to start participating in an open source project. One of the largest hurdles in project participation is writing your first patch. Many projects have different conventions and requirements, plus there is usually a large code base that needs to be learnt. A good first step might be to run HLint over the code. While many of the hints suggested by HLint might be design decisions, or minor issues, there are likely to be a few more unambiguous improvements. As a simple example, taking the xmonad code base and applying HLint shows that the import Data.Maybe statements in XMonad\Core.hs could be combined. This would be a perfect first patch for a budding xmonad developer.

HLint can be used in many ways, but my two golden rules for HLint usage are:

Do not blindly apply the output of HLint

Never review code that hasn't had HLint applied

Tuesday, June 16, 2009

Draft paper on Derive, comments wanted

It's been a long time since I last blogged (about 3 months). Since then I've had a paper on Firstify accepted in to the Haskell Symposium (I'll post the final version to my website shortly). I've also been writing a paper on Derive to go with my invited talk at Approaches and Applications of Inductive Programming (co-located with ICFP this year). I have to submit a final version by the 22nd of June (6 days' time), but any comments on this draft would be gratefully received - either add them as comments to this post or send an email to ndmitchell AT gmail DOT com.

Download link: http://community.haskell.org/~ndm/temp/derive_draft.pdf

Title: Deriving a DSL from One Example

Abstract: Given an appropriate domain specific language (DSL), it is possible to describe the relationship between Haskell data types and many generic functions, typically type class instances. While describing the relationship is possible, it is not always an easy task. There is an alternative -- simply give one example output for a carefully chosen input, and have the relationship derived.

When deriving a relationship from only one example, it is important that the derived relationship is the intended one. We identify general restrictions on the DSL, and on the provided example, to ensure a level of predictability. We then apply these restrictions in practice, to derive the relationship between Haskell data types and generic functions. We have used our scheme in the Derive tool, where over 60% of type classes are derived from a single example.

Home page: http://community.haskell.org/~ndm/derive/

Darcs repo: http://community.haskell.org/~ndm/darcs/derive

The work presented in this paper will become the basis of Derive 2.0. Many thanks for any comments!

Saturday, March 21, 2009

Concise Generic Queries

A few weeks ago David Miani asked how to write concise queries over a data type. The answer is certainly generic programming, a technique that I feel is underused in the Haskell community. I suggested David look at Uniplate, but he found greater success with SYB. Sean Leather gave a solution using EMGM. One of the advantages of Uniplate is conciseness, so I decided to tackle the same problem and compare.

A full description of the task, including data type definitions, is at Sean's blog. From a data type representing structured files (tables, headings, paragraphs) find a heading with a particular name then within that heading find a paragraph starting with "Description". The rest of this post contains solutions using Uniplate, EMGM (taken from Sean) and SYB (from David). The SYB solution is slightly different from the EMGM or Uniplate solutions, but they all do roughly the same generic operations. It is entirely possible that the EMGM/SYB solutions could be improved, but that is a job for other people.

Uniplate Solution

The Uniplate solution is:


projDesc :: String -> OrgFileP -> Maybe String
projDesc name p = listToMaybe [y |
 OrgHeadingP _ x ys <- universeBi p, name == x,
 ParagraphP y <- universeBi ys, "Description" `isPrefixOf` y]

The code can be read as:

Line 1: Type signature, given a name and a file, return the paragraph if you find one

Line 3: Find a heading with the right name

Line 4: Find a paragraph below that heading, whose name starts with "Description"

Line 2: Pick the paragraph

I find this code to be a clear, concise and simple description of the problem. The thought process to come up with the solution was as follows: You want to search, or perform a query. The first question is whether this is a deep (all nodes) or shallow (just the children) query - David doesn't say but the example seems to imply deep. If it's deep use universeBi. Operations are combined with a list comprehension that finds an element, checks it has the necessary properties (the name), then performs more operations. The result is the code you see above.
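To show the shape of the query without any library, here is a sketch over a made-up document type, with a hand-written universe function standing in for Uniplate's deep traversal. The Node type and this universe are assumptions for illustration; the real data types are on Sean's blog and the real traversal comes from Uniplate.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (listToMaybe)

-- Hypothetical document tree, much simpler than the real types.
data Node = Heading String [Node] | Paragraph String
    deriving Show

-- All nodes in a tree, root first: a hand-rolled deep traversal.
universe :: Node -> [Node]
universe n@(Heading _ kids) = n : concatMap universe kids
universe n = [n]

-- Find a heading with the right name, then a paragraph below it
-- whose text starts with "Description".
projDesc :: String -> [Node] -> Maybe String
projDesc name file = listToMaybe
    [ y
    | Heading x ys <- concatMap universe file, name == x
    , Paragraph y <- concatMap universe ys, "Description" `isPrefixOf` y ]
```

Nodes that fail a pattern in the list comprehension are silently skipped, which is what keeps the query so short.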

EMGM Solution

Sean's solution can be found at his blog:


projDesc :: String -> OrgFileP -> Maybe String
projDesc name file = do
  hdg <- G.firstr (headings name file)
  para <- firstPara hdg
  if para =~ "Description" then return para else Nothing

headings :: String -> OrgFileP -> [OrgHeadingP]
headings name = filter check . G.collect
  where
    check (OrgHeadingP _ possible _) = name == possible

firstPara :: OrgHeadingP -> Maybe String
firstPara hdg = paraStr =<< G.firstr (G.collect hdg)
  where
    paraStr (ParagraphP str) = Just str
    paraStr _                = Nothing

This solution isn't bad, but is more verbose than the Uniplate solution. Perhaps it could be rewritten with list comprehensions? It seems that G.collect is similar to universeBi - although I am not sure.

SYB Solution

David's SYB solution can be found here along with another solution using different combinators.


eitherOr :: Either a b -> Either a b -> Either a b
eitherOr x@(Right _) _ = x
eitherOr  _ y  = y

getP14Desc :: OrgElement -> Either ErrString String
getP14Desc org = everything eitherOr (Left descError `mkQ` findDesc) =<<
                 everything eitherOr (Left findError `mkQ` findP14) org
    where
      findP14 h@(Heading {headingName=name})
          | name == "Project14" = Right h
      findP14 _ = Left findError

      findDesc (Paragraph {paragraphText=text})
          | text =~ "Description" = Right text
      findDesc _ = Left findError

      descError = "Couldn't find description for project"
      findError = "Couldn't find project."

Summary

The relative merits of each solution are highly subjective, but I believe the Uniplate solution is concise. The Uniplate solution is a simple translation of the problem, without any clever steps, so hopefully other users (who didn't write the library!) will be able to achieve similar results. The Uniplate solution required only one function from the Uniplate library, so has a small learning curve. Even if you don't choose Uniplate, generic programming techniques are very useful, and can make your code concise and robust.

Monday, March 09, 2009

Website move

Today I spotted that I could no longer push to my darcs repos hosted at York University. A little more checking showed that my home page had also been removed - I guess that's what happens when you are no longer a student there (although a warning email before would have been nice...). So I am pleased to announce my new website address:

http://community.haskell.org/~ndm/

Thanks to the wondrous Haskell community for providing all the resources I needed to move my website in haste, with no human intervention. Expect my darcs repos to move somewhere shortly too.

I have now submitted the final bound copies of my thesis, and have uploaded a copy to my website (I had uploaded it to York, but didn't get chance to announce it!). I should say a great thank you to everyone who helped with my work/thesis, in particular Colin Runciman for supervising me for six years, and Detlef Plump and Simon Peyton Jones for examining me and really helping improve the final document with their comments.

The thesis has four content chapters, corresponding to Uniplate, Supero, Firstify and Catch. I have submitted a paper to ICFP 09 which expands/clarifies the Firstify work, which I'll upload as a draft shortly. For the other chapters, the version in the thesis is an improvement on the version in any papers I've published.

Sunday, February 22, 2009

Hoogle package search

Recently on the Haskell mailing list there has been some discussion of which packages Hoogle searches by default. One person remarked that it was unfortunate that the network package isn't searched by default. There are lots of packages on Hackage, and Hoogle needs to decide how to cope with so much choice. There are a number of questions that I need to answer in Hoogle:

What packages should Hoogle search by default? All of hackage? The base libraries? Only the packages a user has installed? Only packages that make it in to the Haskell Platform?

What groups of packages should Hoogle have available? Each package individually? All packages which compile on Windows? All packages by a certain author? All packages whose minor version number is even?

What UI should Hoogle show? Should there be checkboxes for each OS's packages? Should there be a checkbox for each compiler/version? Should there be no UI but some documentation?

And these questions present a number of trade-offs:

The packages have to be divided under sensible and clear lines - I don't want to (and shouldn't) arbitrate divisions like "good" or "popular".

The more packages you search, the less relevant the results will be.

The fewer packages you search, the more chance that you miss something.

The more UI that is added the more confusing things get.

My development time for Hoogle derives Bounded, Finite and increasingly also derives Small.

Thoughts and suggestions are very welcome. I've set up a wiki page to track people's thoughts, please make your view and arguments known: http://haskell.org/haskellwiki/Hoogle/Packages.

(As an aside, I recently found that dolphin friendly tuna is actually really harmful to the environment, far more harmful than dolphin unfriendly tuna. Read more here.)

Tuesday, February 03, 2009

Monomorphism and Defaulting

Haskell has some ugly corners - not many, but a few. One that many people consider exceptionally ugly is the monomorphism restriction. In this post I'm going to discuss three related issues - Constant Applicative Forms (CAFs), the monomorphism restriction and defaulting. But before we start, let's take a simple example.

Computing Pi

Haskell already provides the pi function which represents the value of pi, but let's assume it didn't. Taking a quick look at Wikipedia we can see that one way of computing Pi is the Gregory-Leibniz series. We can calculate pi as:

pi = (4/1) + (-4/3) + (4/5) + (-4/7) + (4/9) + (-4/11) ...

So let's write that as a Haskell program:


pie = sum $ take 1000000 $ zipWith (/) (iterate negate 4) [1,3..]

Here the constant 1000000 gives the accuracy of our approach; increasing this value will give a higher precision. As it currently stands, the Haskell library says pi = 3.14159265358979 and our program says pie = 3.14159165358977. Thirteen matching digits should be sufficient for most uses of pi :-)

CAFs

The disadvantage of our pie function is that (under Hugs) it takes about 4 seconds to evaluate. If we are performing lots of calculations with pi, calculating pie each time will be a serious problem. CAFs are the solution!

A CAF is a top-level constant, which doesn't take any arguments, and will be computed at most once per program execution. As a slight subtlety, if the constant has class constraints on it (i.e. is Num a => a, instead of a) then it isn't a CAF because the class constraints act like implicit arguments. Our pie function above doesn't take any arguments, so is a CAF.

Defaulting

While pie doesn't have any class constraints, the right-hand side of pie does! Take a look in Hugs:


Main> :t sum $ take 1000000 $ zipWith (/) (iterate negate 4) [1,3..]
:: (Enum a, Fractional a) => a

Main> :t pie
:: Double

The right-hand side works for any Enum and Fractional type, for example Float, but pie is restricted to Double. The reason is the defaulting mechanism in Haskell - if a type can't be nailed down precisely, but is one of a handful of built-in classes, then it will default to a particular type. This feature is handy for working at an interactive environment, but can sometimes be a little unexpected.
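As a sketch of how a type annotation picks the type instead of defaulting, the series can be written once with its class constraints and then instantiated at two types. The names series, pieDouble and pieFloat are invented for this example, and the term count is kept small so it evaluates quickly.

```haskell
-- The series with its class constraints left in place.
-- The Int argument is the number of terms to sum.
series :: (Enum a, Fractional a) => Int -> a
series n = sum $ take n $ zipWith (/) (iterate negate 4) [1,3..]

-- An explicit annotation chooses the type; no defaulting is involved.
pieDouble :: Double
pieDouble = series 1000

pieFloat :: Float
pieFloat = series 1000
```

Both pieDouble and pieFloat are top-level constants without class constraints, so each is a CAF in the sense described earlier.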

Monomorphism restriction

Without defaulting the compiler would infer the type of pie as ::(Enum a, Fractional a) => a. However, such a definition would be rejected by the monomorphism restriction. The monomorphism restriction states that a function with no explicit arguments, but with class constraints, must be given a type annotation. This rejects functions like:


snub = sort . nub

To fix the problem there are two solutions:


snub i_hate_the_evil_mr = (sort . nub) i_hate_the_evil_mr

snub :: Ord a => [a] -> [a]
snub = sort . nub

For a function like pie only the second approach is applicable. The addition of dummy arguments to avoid the monomorphism restriction is sufficiently common that the HLint tool never suggests eta-reduction if the argument is named mr.

Conclusion

So why was the monomorphism restriction first introduced? For a function with no explicit arguments, the programmer might think they had written a CAF, but class constraints may substantially degrade the performance. Defaulting reduces the number of cases where the monomorphism restriction would otherwise bite, but it is still useful to be aware of the ugly corners.

There are proposals afoot to remove the monomorphism restriction and to increase the power of the default mechanism - hopefully both will be included in to Haskell'.

Tuesday, January 27, 2009

Small scripts with Haskell

Normally I give blog posts detailing the fun, interesting or advanced stuff I do with Haskell. But that isn't a real representation of my programming life! Most of the time I am doing small scripts that do little tasks, so I thought I'd describe one of those. This post is written as Literate Haskell, which means you can save the whole contents as a .lhs file and run it in GHCi or Hugs.

The task I had to complete was to take a directory of files, and for each file foo.txt generate the files foo_m1.txt to foo_m3.txt, where each file is a block of lines from the original delimited by a blank line. For example, given the file with the lines ["","1","1","","2","","3"], the numbers "1" would go in foo_m1.txt etc.

This blog post isn't how I actually wrote the original script - I didn't use literate Haskell (since I find it ugly), I didn't give explicit import lists (since they are needlessly verbose), I didn't give type signatures (but I should have) and I didn't split the IO and non-IO as well (but again, I should have). It is intended as a guide to the simple things you can easily do with Haskell. Now on to the code...


> import System.FilePath(takeExtension, dropExtension, (<.>), (</>))
> import System.Directory(getDirectoryContents)
> import Data.Char(isSpace)
> import Control.Monad

First, let's import some useful modules. To find more about a particular function just use Hoogle and search for it, but a quick summary:


takeExtension "foo.txt" = ".txt"
dropExtension "foo.txt" = "foo"
"foo" <.> "txt" = "foo.txt"
"bar" </> "foo.txt" = "bar/foo.txt"
getDirectoryContents "C:\Windows" = running "dir C:\Windows" at the command prompt
isSpace ' ' = True

Every Haskell program starts with a main function, which is an IO action. For this program, we are going to keep all the IO in main, and only use other pure functions. With most file processing applications it's best to read files from one directory, and write them to another. That way, if anything goes wrong, it's usually easy to recover. In this case we read from "data" and write to "res".


> main :: IO ()
> main = do
>     files <- getDirectoryContents "data"
>     forM_ files $ \file -> when (takeExtension file == ".txt") $ do
>         src <- readFile $ "data" </> file
>         forM_ (zip [1..] (splitFile src)) $ \(i,x) ->
>              writeFile ("res" </> dropExtension file ++ "_m" ++ show i <.> "txt") x

Or in some kind of pseudo-code:


main =
    set files to be the list of files in the directory "data"
    for each file in files which has the extension ".txt"
    {
        set src to be the result of reading the file
        for each numbered result of splitFile
        {
            write out the value from splitFile to the location "res/file_m#.txt"
            where # is the 1-based index into the list of results
        }
    }

We can now move on to the pure bits left over. We want a function splitFile that takes a file, and splits it in to three chunks for each of the blocks in the file. When processing text, often there will be stray blank lines, and the term "blank lines" will also apply to lines consisting only of spaces. The code is below:


> splitFile :: String -> [String]
> splitFile xs = map (tabify . unlines) [s1,s2,s3]
>     where
>         xs2 = dropWhile null $ map (dropWhile isSpace) $ lines xs
>         (s1,_:rest) = break null xs2
>         (s2,_:s3) = break null $ dropWhile null rest

And now presented more as a list of steps:

split the text in to lines

for each line drop all the leading spaces from it

drop all the leading blank lines

break on the first empty line, the bits before are chunk 1

drop all leading blank lines for the rest

break on the first empty line in the rest, before is chunk 2, after is chunk 3

for each of the chunks, put the lines back together, then tabify them
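The steps above are specific to files with exactly three blocks. As a rough alternative sketch (not the script I actually wrote), the same break-on-blank-line idea can be applied repeatedly to handle any number of blocks. The name blocks is invented here, and unlike splitFile it keeps leading spaces on each line and does not tabify.

```haskell
import Data.Char (isSpace)

-- Split text into blocks of consecutive non-blank lines.
-- A line counts as blank if it holds only whitespace.
blocks :: String -> [[String]]
blocks = go . lines
  where
    go ls = case break blank (dropWhile blank ls) of
        ([], _) -> []            -- nothing but blank lines left
        (b, rest) -> b : go rest -- one block found, carry on with the rest
    blank = all isSpace
```

Because it never pattern matches on a fixed number of chunks, this version cannot fail on a file with fewer than three blocks.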

The tabify requirement was added after. The person decided that all continuous runs of spaces should be converted to tabs, so the file could better be loaded in to a spreadsheet. Easy enough to add, just a simple bit of recursive programming:


> tabify (' ':xs) = '\t' : tabify (dropWhile (== ' ') xs)
> tabify (x:xs) = x : tabify xs
> tabify [] = []

And again in English:

if you encounter a space, drop it and all successive spaces, and write out a tab

otherwise just continue onwards
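The same behaviour can also be sketched without explicit recursion, by grouping equal adjacent characters and replacing each run of spaces with a single tab. This is an alternative formulation for comparison; the name tabify' is invented here.

```haskell
import Data.List (group)

-- Replace every run of one or more spaces with a single tab.
tabify' :: String -> String
tabify' = concatMap squash . group
  where
    squash (' ':_) = "\t" -- a run of spaces becomes one tab
    squash run = run      -- any other run is kept unchanged
```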

Haskell is a great language for writing short scripts, and as the libraries improve it just keeps getting better.

Sunday, January 18, 2009

FsCheck changes

Kurt Schelfthout has just released FsCheck 0.4, a tool similar to QuickCheck but for F#. While working at my internship for Credit Suisse I spent a little bit of time modifying FsCheck to include automatic generators (so you don't have to describe how to generate arbitrary values) and failure shrinking (so the counter-examples are smaller). Both these changes have now been incorporated in to the main FsCheck tool. It is really nice to see the work being contributed back, and that big companies are taking the time to get the necessary legal clearance etc.

I find shrinking to be a particularly potent feature. In one real-world task I struggled to debug a test failure for 8 hours, before shrinking was available. Attacking the same example with FsCheck and shrinking made the reason for the test failure immediately obvious.
