Neil Mitchell's Blog (Haskell etc): 2013

Thursday, December 12, 2013

Progress Reporting in Shake

Summary: Shake can predict you how long your build will take. The prediction makes use of stateful applicative streams.

The Shake build system can predict how long your build will take. As an example, on Windows 7 I see:

The progress message is displayed in the titlebar of the console window (on Windows, Linux and Mac), for example 3m12s (82%) to indicate that the build is 82% complete and is predicted to take a further 3 minutes and 12 seconds. If you are running Windows 7 or higher and place the shake-progress utility somewhere on your %PATH% then the progress will also be displayed in the taskbar progress indicator.

Progress reporting is great for end users of Shake based build systems. They may not care that Shake is fast (since they probably have no frame of comparison for your particular build), or that it is robust, or that it is fully featured etc. But everyone wants to know if they should grab a coffee or just engage in some quick sword-fighting.

Limitations

Predicting the future is tricky. Shake is a very dynamic build system, so unlike Make, while running it can both discover new things to do, and decide that things it previously intended to do are unnecessary. It also runs rules in parallel, and each rule takes a different amount of time. All these factors make the progress a "best guess", rather than a statement of fact. In particular:

The first run (or after deleting the Shake database) will have worse predictions, as Shake has no idea how long each rule takes to execute.
In the first few seconds of a build, the predicted progress may vary more dramatically, as Shake is still determining how much is left to do and how fast things are going.
If you have a small number of long executing rules (e.g. taking minutes) then progress changes will be very granular, as no attempt is made at predicting in-progress rules.

These limitations aside, the feature has changed the way I work, and is valuable to many people. One user remarked that "showing the estimated build time remaining in the terminal window's title is genius".

Turning the feature on

Using Shake 0.10.2 or above (from March 2012), you can pass the command line flag --progress or the set the ShakeOptions field shakeProgress to progressSimple. Using the shakeProgress field you can customise how the progress messages are calculated and displayed, for example you can generate annotations for the Team City continuous-integration tool.

Collecting the data

The primary goal for this feature was to have zero overhead for people not using it, and to not complicate the internals of the build system. I achieved that goal by making progress generation take the internal Shake state every 5 seconds (configurable, using shakeProgress) and generate summary statistics from the state. The result is the Progress data type (to follow the rest of this post, you do not need to understand the actual fields below):

data Progress = Progress
    {isFailure :: Maybe String -- ^ Starts out 'Nothing', becomes 'Just' a target name if a rule fails.
    ,countSkipped :: Int -- ^ Number of rules which were required, but were already in a valid state.
    ,countBuilt :: Int -- ^ Number of rules which were have been built in this run.
    ,countUnknown :: Int -- ^ Number of rules which have been built previously, but are not yet known to be required.
    ,countTodo :: Int -- ^ Number of rules which are currently required (ignoring dependencies that do not change), but not built.
    ,timeSkipped :: Double -- ^ Time spent building 'countSkipped' rules in previous runs.
    ,timeBuilt :: Double -- ^ Time spent building 'countBuilt' rules.
    ,timeUnknown :: Double -- ^ Time spent building 'countUnknown' rules in previous runs.
    ,timeTodo :: (Double,Int) -- ^ Time spent building 'countTodo' rules in previous runs, plus the number which have no known time (have never been built before).
    }

This structure counts the number of rules in each state, and where available, provides execution times. It reflects the information easily computable by the internals of Shake.

Computing the progress

To compute the progress I first spent a long time measuring real builds, and the Progress values, and trying to detect some correlation. The results were disappointing, and I got nowhere - the progress jumps around too much (I tried Kalman filters etc, but they didn't help). I then thought hard and decided that the best predictor of completion time is:

(number-of-rules-to-build * time-to-build-each-rule) / work-rate

So you take:

The number of things you have to do.
The time you expect each one to take.
How fast you are running rules, which you would expect to be roughly the -j parallelism argument.

This measure does not attempt to predict if rules are skipped or discovered, but these are unpredictable events, so ignoring them is sensible (and likely why the data-driven approach fell down). These measures can be computed by:

number-of-rules-to-build is known, and stored as countTodo.
time-to-build-each-rule is influenced by timeTodo, comprising the time to build some of the rules, and the number of rules that have never been built before. We can therefore compute the time-to-build-each-rule from the times we have, and predicted-time-to-build for the times we are missing.
predicted-time-to-build can be computed by taking the time things took to build this run (timeBuilt) and dividing by the number of things that rebuilt (countBuilt). Given the available data, there are a number of plausible alternatives for calculating predicted-time-to-build.
work-rate is roughly the parallelism flag, but for laptops, may be skewed by things like whether the machine was using battery power. We compute the work rate by tracking how long we have been building, and how much timeBuilt time we have managed to execute.

Using these definitions, given the Progress structure and a Double for the number of seconds the build has been running, we can define:

progressMessage :: Progress -> Double -> String

Stateful progress

The formulae above give a reasonable approximation of progress time, but they have a number of weaknesses:

At the beginning we have completed no work, so timeBuilt is zero, and therefore work-rate is zero.
If we are running a long-running rule (e.g. 15s) then the work-rate keeps going down, as timeBuilt only increases at the end, but the time building keeps going up.
Builds often go in phases, meaning that unknown rules at the beginning are typically fast code generators, in the middle are typically compilers, and at the end are slow linkers. The average of all rules is often too small by the end.

Many of these weaknesses are due to taking the current state and computing the progress, without incorporating some information from previous progress calculations. We need to redefine our progressMessage function as:

progressMessage :: ProgressState -> Progress -> (ProgressState, String)

Now we take some ProgressState (which can include the time spent building), and in computing the message, we produce some new state, which will be used to produce the next message. For example, ProgressState can include some decaying value of predicted-time-to-build, where the new value is a weighted average between the old value and the new value, allowing new values to dominate.

Streamed progress

My first stateful progress predictor made use of an explicit state, but it's hard to thread around, and doesn't feel very Haskelly. It also made minor local edits to the definitions require global state modifications, and significantly hampered tweaking the definitions. The solution was to introduce an abstraction. I played with a number of variants, but eventually settled on:

newtype Stream i a = Stream {runStream :: i -> (a, Stream i a)}

A Stream can be supplied with an input value of type i and produces an output value a and a new Stream. In our case, we will have a value of type Stream Progress String, and provide new Progress values to get out progress messages. Any stateful information can be captured in the closure. (This definition is unrelated to other Stream definitions in Haskell, which are usually just infinite lists.)

I define four primitive operations on Stream, namely:

pure from the Applicative class, which produces a new Stream which, regardless of the input, always produces the same output.
<*> from the Applicative class, which applies a Stream of functions to a Stream of values to produce a Stream of results.
idStream :: Stream i i which is the stream which always echoes its input as its output. This primitive is the only way to get at the input.
foldStream :: (a -> b -> a) -> a -> Stream i b -> Stream i a, which is like scanl but for streams, keeping an accumulating a value. This primitive is the only one to keep any state between successive stream outputs.

On top of these primitives I define a number of useful auxiliaries. As an example, oldStream produces a stream where each value is a pair of the previous value and the new value. We can define oldStream as:

oldStream :: a -> Stream i a -> Stream i (a,a)
oldStream old = foldStream (\(_,old) new -> (old,new)) (old,old)

Using these primitives we can build up complex definitions such as:

decay :: Double -> Stream i Double -> Stream i Double -> Stream i Double
decay f a b = foldStream step 0 $ (,) <$> oldStream 0 a <*> oldStream 0 b
    where step r ((a,a'),(b,b')) = ((r*b) + f*(a'-a)) / (b + f*(b'-b))

Here we are computing a / b, but using f to decay the old value, where 1 is no decay. This definition maintains a lot of state between steps of the stream, but thanks to the Stream abstraction, we can focus on the mathematics, instead of being lost in a sea of state.

Using these functions we can start to calculate the progress, for example:

unknownRuleTime = decay 10 (timeBuild <$> progress) (fromInt . countBuild <$> progress)

Here unknownRuleTime is the time spent building, divided by the number of rules that have been built, decaying at a fairly rapid rate, meaning more recent rules will have a greater impact on unknownRuleTime.

The entire progress generator takes 37 lines, and has some of the highest comment densities in Shake. The complete Stream abstraction and progress prediction code can be found in the git repo. I would be surprised if someone has not already discovered Stream, given it a name, and investigated its properties - any pointers most welcome.

Update: I have renamed Stream to Mealy, following the machines package.

Tuesday, November 19, 2013

The oldest Shake bug - async exceptions are hard

Summary: Shake could corrupt the build database under rare circumstances. Last week I finally figured out why, and fixed it.

The Shake build system records build metadata as it runs, using several techniques to ensure the metadata is not corrupted, even if the build process is killed. Over the past few years I've received a small number of reports where Shake ended up with corrupted metadata, causing a complete rebuild. These reports usually involved a build error, and often a laptop, but I was never able to reproduce the bug. Last week I finally understood the cause, and the bug is now fixed in shake-0.10.9. In this post I explain how Shake records its metadata, what the bug was and how I fixed it.

The Shake metadata

Shake records build metadata in a file, as a sequence of entries. Each entry represents one firing of a build rule, and contains the length of the entry, followed by the key of that rule (e.g. a filename) and the values produced by that rule (e.g. a modification time and list of dependencies).

The metadata file is designed to be robust even if the build is forcibly killed, using the length of the entry to detect whether the entry was written out in full, or if the program aborted while writing the entry. The key property I rely on is that all entries are complete and valid, apart from the last which may be only partially written.

The Bug

I was writing to the file with:

withLock lock $ LBS.hPut handle entry

This code was called from multiple threads, so to ensure writes were not interleaved, I used withLock. In Shake, if any build rule fails, I abort the build by killing all worker threads. However, during the process of aborting, one thread could take the lock, write out some prefix of its entry, then be killed. Another thread could follow the same pattern, resulting in two half-entries and a corrupted metadata file.

For this problem to arise you need to abort a thread that is in the middle of writing to the file, then wait sufficiently long to give a second thread the chance to start writing before it too is killed. In practice I believe it requires an error to be raised and two rules to produce entries almost simultaneously, and for the first thread to be killed to be the one that took the lock, and for the threads to be killed somewhat slowly. In days of random testing once every 20 seconds I never managed to reproduce such a delicate arrangement.

The Fix

The fix is straightforward, just make a single thread responsible for all writes to the file:

chan <- newChan
forkIO $ forever $ LBS.hPut h =<< readChan
let write x = do
    evaluate $ LBS.length x
    writeChan x

Here the chan keeps a list of things to write out, a single thread reads from chan and writes to the file, and any thread can call write. It is important to evaluate before writing to chan so that any exceptions caused by binary encoding are raised by the thread that caused them, and errors get the correct stack trace (Shake adds nice stack traces to all error messages).

In practice, the code is complicated by the possibility of exceptions, cleaning up on shutdown and a desire to flush the file periodically. The real code in Development.Shake.Storage is:

flushThread :: Maybe Double -> Handle -> ((LBS.ByteString -> IO ()) -> IO a) -> IO a
flushThread flush h act = do
    chan <- newChan -- operations to perform on the file
    kick <- newEmptyMVar -- kicked whenever something is written
    died <- newBarrier -- has the writing thread finished

    flusher <- case flush of
        Nothing -> return Nothing
        Just flush -> fmap Just $ forkIO $ forever $ do
            takeMVar kick
            threadDelay $ ceiling $ flush * 1000000
            tryTakeMVar kick
            writeChan chan $ hFlush h >> return True

    root <- myThreadId
    writer <- forkIO $ handle (\(e :: SomeException) -> signalBarrier died () >> throwTo root e) $
        -- only one thread ever writes, ensuring only the final write can be torn
        whileM $ join $ readChan chan

    (act $ \s -> do
            evaluate $ LBS.length s -- ensure exceptions occur on this thread
            writeChan chan $ LBS.hPut h s >> tryPutMVar kick () >> return True)
        `finally` do
            maybe (return ()) killThread flusher
            writeChan chan $ signalBarrier died () >> return False
            waitBarrier died

This function takes the flush interval (in seconds, or Nothing to never flush), and the file handle, and an action to run which requires the write function. It's pretty complex code, which is why it has such a high density of comments (compared to my usual code).

Saturday, November 16, 2013

ACM Article - Leaking Space

I wrote an article for the Communcations of the ACM which is now available on the ACM Queue, entitled Leaking Space, all about space leaks. It's got plenty of Haskell, but is intended as a general article on space leaks, and even has an example in Javascript. The introduction features both printed encyclopedia's and postage stamps - both things my son might only learn about on Wikipedia. The abstract:

A space leak occurs when a computer program uses more memory than necessary. In contrast to memory leaks, where the leaked memory is never released, the memory consumed by a space leak is released, but later than expected. This article presents example space leaks and how to spot and eliminate them.

Read Leaking Space for free, or if you have an ACM subscription, also here.

Thanks to Andy Gill for requesting the article and for suggestions to improve it, and to the team at ACM for typesetting/proofing etc.

Tuesday, November 12, 2013

NSIS plugins and LZMA solid

Summary: If you use NSIS plugins along with LZMA solid compression, make sure you reference the plugins first.

The NSIS Windows Installer supports plugins, which are dlls added to the installer which are called at install time. It also supports LZMA solid compression, which means that all files are compressed as a single continuous run for better compression. These two features can interact poorly. As an example:

SetCompressor /SOLID lzma
Page instfiles "" createInstFiles
Section dummy
    SetOutPath "$INSTDIR"
    File "*.exe"
SectionEnd
Function createInstFiles
    w7tbp::Start
FunctionEnd

This installer is LZMA solid compressed and installs all .exe files into the installation directory. It also uses the Taskbar progress plugin to provide progress information in the taskbar.

Unfortunately, if the .exe files are large (say 1Gb) the installer will visibly freeze for around 40 seconds when you start to install. When the createInstFiles function is run, it first extracts the 3Kb file w7tbp.dll and loads it. NSIS adds files to the archive in the order they are discovered, meaning that since w7tbp.dll is first mentioned at the end of the file it is placed at the end of the archive. Solid compression requires decompressing everything before a file to extract it, and thus accessing the last file requires a lot of disk access and CPU time.

The workaround is to move all addin calls before all sections, so they go at the front of the archive, and can be extracted quickly. However, sections define names which are only accessible later on in the file, so not all functions can be moved before sections. Therefore, as a general technique, you can write a dummy function that is never called but which references each plugin, and put that function before any sections.

My NSIS wrapper library automatically uses the workaround.

Monday, November 04, 2013

NSIS Haskell wrapper now supports plugins

Summary: I've just released NSIS-0.2.3 which supports NSIS plugins.

My NSIS library is a Haskell EDSL to make it easier to write NSIS Windows Installers. I've just released a new version which has support for NSIS plugins. These plugins can provide extra functions to an NSIS installer, for example base64 encode/decode or displaying Windows 7 taskbar progress indications.

My NSIS library provides a low-level wrapper to interact with plugins, which can be wrapped to produce type-safe easy-to-use plugins. As an example, here are two NSIS plugins I have wrapped:

Example 1: base64 encode/decode

The base64 plugin provides two methods, encrypt and decrypt, both with the same pattern. In NSIS we can write:

StrLen $1 $2
Base64::Encrypt $1 $2
Pop $3

Using the Haskell library this becomes:

encrypt :: Exp String -> Exp String
encrypt x = share x $ \x -> do
    plugin "Base64" "Encrypt" [exp_ x, exp_ $ strLength x]
    pop

We have given the encrypt function a proper type, and compute the strLength automatically. The only complication is that since we are using a call-by-name language, and are using x twice, we must share it to ensure it is computed only once. We can now write:

alert $ "Hello encoded is " & encrypt "Hello"

The module Development.NSIS.Plugins.Base64 provides this function, along with decrypt.

Example 2: Windows 7 taskbar progress

The Taskbar progress plugin provides a progress bar inside the taskbar on Windows 7 and above. In NSIS we can write:

!define MUI_PAGE_CUSTOMFUNCTION_SHOW ShowInstFiles
!insertmacro MUI_PAGE_INSTFILES
Function ShowInstFiles
  w7tbp::Start
FunctionEnd

The !define must occur before the appropriate !insertmacro, so you are restricted to where this code can go. Using the Haskell library this becomes:

taskbar :: Action ()
taskbar = onPageShow InstFiles $ plugin "w7tbp" "Start" []

Now the taskbar function can be called anywhere, and it will automatically move the macros to the right place. The module Development.NSIS.Plugins.Taskbar provides this function.

Contributions welcome

NSIS is hosted on github, and I welcome contributions. Some things you could help with:

I currently wrap most of the standard NSIS functions, but not all of them. I welcome any additional wrappers. I have been wrapping functions by need, but eventually would like to have everything wrapped.
The functions are mostly intended to be understood in conjunction with the NSIS documentation. I welcome any enhanced documented or work to make the documentation standalone, so people don't need to look at the underlying NSIS docs.
I currently wrap 2 plugins - one I needed and one that made a good demo. I welcome wrappers for all plugins.

I have written this library to address my needs. I would welcome bug reports or pull requests, in particular if you can write the installer you need in NSIS but not in this library.

Saturday, October 12, 2013

Haskell type graphs with Uniplate and Haskell-src-exts

Summary: Generating graphs from Haskell modules is easy, using Haskell-src-exts and Uniplate.

In my recent Uniplate talk I showed a graph of which Haskell-src-exts types contain Exp at any level. Several people in the audience asked for the code, so I've tidied it up and will walk through it in this post. Before we start, the graph we want to generate is:

Each node is a type, and an edge from a to b means at least one constructor of type a contains a value of type b. I only include types which eventually reach Exp. To calculate this graph, I used Haskell-src-exts and Uniplate. The main function is:

import Language.Haskell.Exts          -- haskell-src-exts
import Data.Generics.Uniplate.Data    -- uniplate
import Data.List

main :: IO ()
main = writeFile "graph.dot" . graph . interesting "Exp" . reach =<< getModule

We can read main from right to left:

We first call getModule to get the module we are interested in.
Next we call reach to compute which types are directly contained within which types.
We use interesting to pick out only those types which can reach Exp.
We convert to a DOT file using graph.
And finally we write the file out to Graph.dot.
Afterwards we can manually run dot -Tpng graph.dot > graph.png to generate the image above.

Now let's go through each of the functions in turn.

The getModule function

getModule :: IO Module
getModule = do
    src <- readFile "../../haskell-src-exts/src/Language/Haskell/Exts/Syntax.hs"
    return $ fromParseResult $ parseModule $ unlines $ filter (not . bad) $ lines src
    where bad x = head (words x ++ [""]) `elem` words "#if #else #endif deriving #ifdef"

We read the module from a known location on disk and parse it. The only complication is that the Haskell-src-exts AST module uses the C pre processor (CPP). By filtering out all lines beginning with CPP directives, and also deriving, we end up with a module with the same type structure, but which can be parsed with Haskell-src-exts. An alternative approach would be to use something like cpphs to properly interpret the CPP directives.

The reach function

reach :: Module -> [(String, [String])]
reach m = [ (prettyPrint name, nub [prettyPrint x | TyCon x <- universeBi ctors])
          | DataDecl _ _ _ name _ ctors _ <- universeBi m]

This function gets a list of type name and contained types for all data declarations in the module. We use the universeBi function from Uniplate to find everything of the relevant type, a list comprehension pattern to filter it down and grab the parts we want, and then prettyPrint to convert things to String. This function is the only one that uses Uniplate.

The interesting function

interesting :: String -> [(String, [String])] -> [(String, [String])]
interesting target xs = [(a,b `intersect` keep) | (a,b) <- xs, a `elem` keep]
    where keep = f [target] xs
          f want xs = if null new then want else f (map fst new ++ want) rest
              where (new,rest) = partition (not . null . intersect want . snd) xs

This function removes all types which don't at some point include the target, which is Exp in our case. We first compute keep, which is the interesting types, then restrict the input to only refer to things in keep. To compute keep we start by using the target as the thing we want. We then repeatedly find any types that include anything we want. Once there is nothing new that we want, we finish. This function is the only one that performs thoughtful computation, the others are just shuffling data between formats.

The graph function

graph :: [(String, [String])] -> String
graph xs = unlines $ ["digraph g {"] ++ [from ++ " -> " ++ t ++ ";" | (from,to) <- xs, t <- to] ++ ["}"]

This function converts the types into GraphViz format, which is a list of edges in a directed graph.

License

This code is throwaway code I wrote to generate one slide, and didn't even bother checking into a version control system. If it's useful to you, please use it for whatever purpose you want. Ignoring blank lines, optional type signatures and imports, it only takes 12 lines.

Wednesday, October 02, 2013

Upcoming talks

I'll be talking at two events this month - hope to see some people there!

9 Oct 2013 - Haskell eXchange 2013

Everyone should use a Generics library - writing HLint with Uniplate

A generics library allows programmers to express only the interesting part of certain tasks, avoiding lots of boilerplate and making the code significantly shorter and more maintainable. Haskell is awash with excellent generics libraries, including Scrap Your Boilerplate, Generics for the Masses, Generic Deriving and Uniplate, yet many developers will never have used any of them. This talk explains how to use the Uniplate generics library, using examples derived from the Haskell Lint tool (HLint), which makes extensive use of Uniplate.

23 Oct 2013 - Runcifest

Colin's Industrial Influence (joint with Malcolm Wallace)

Colin has dabbled in many areas of functional programming, from XML processing to data visualisation, from type searching to generic transformations. While most of Colin's contributions have been made from inside a University, their impact has been felt in industry. In this talk we take a whirlwind tour through some of the projects that have influenced us and our work.

Thursday, September 12, 2013

Repeated Word Detection with Haskell

Summary: I wrote a simple program to detect repeated words in a file. This post gives a walk-through of how I developed it.

My wife is currently writing her PhD thesis, and in a recent draft her supervisor spotted a few instances where she had repeated repeated a word by accident (as I just did with repeated). While her LaTeX editor of choice highlights spelling mistakes, it does not spot repeated words. Therefore, I offered to write a quick script to spot repeated words for her. I used Haskell because it is a great choice for quick one-off scripts.

The problem can roughly be described as "for each file, find words that repeat". My first approach with any Haskell program is to decompose the problem, and this problem naturally breaks itself into "(for each file) (find words) (that repeat)". I need a function to iterate through a list of files doing IO and calling the other functions (main), a function to split an input file into words (worder) and a function to spot repeats (dupes).

Iteration 1

To get started, I wrote simple versions of each function:

main :: IO ()
main = print . dupes . worder =<< readFile "thesis.tex"

worder :: String -> [String]
worder = words

dupes :: [String] -> [String]
dupes = map head . filter ((> 1) . length) . group

The main function is restricted to a single static file (namely thesis.tex), which it reads in, splits it into words, finds the duplicates, and prints that information. The function to split into words just uses the standard words function which splits on whitespace boundaries. The function dupes is the most interesting - it uses group to create lists of equal adjacent words, filter to find any group of more than 1 adjacent word, then map head to take only the first element from each group to report as the word at fault.

Iteration 2

The first iteration only prints out the words that have been duplicated. It would be much more civilised to also print the line number of the duplicated word, so my wife can quickly find the problem. The solution is to refine the type passed between worder and dupe to include the line number alongside each word. Instead of passing [String] we pass [(Int,String)].

worder :: String -> [(Int, String)]
worder whole = [(i, word) | (i, line) <- zip [1..] $ lines whole, word <- words line]

dupes :: [(Int,String)] -> [(Int,String)]
dupes = map head . filter ((> 1) . length) . groupBy ((==) `on` snd)

For worder we first split into lines and use zip [1..] to assign line numbers, then split each line into words. The changes to dupes are fairly minor - when grouping we use groupBy and consider two words to be adjacent looking only at the word part, not the line number. We are now printing out line numbers, making the error easy to find.

Iteration 3

The type of dupes is more specific than we need, so we can generalise it. Thinking about what dupes should do, we are really getting in a list of pairs of some information, and a value to check for repetition on. Therefore, we can write:

dupes :: Eq v => [(k, v)] -> [(k, v)]
dupes = map head . filter ((> 1) . length) . groupBy ((==) `on` snd)

Note that the code has not changed, merely the signature. With the new signature we can also be sure we are not inadvertently comparing on the line number, since we have no Eq context for k.

Iteration 4

Looking at some sample documents, it became clear our worder implementation is insufficient, in particular:

It is case sensitive, while our repeated words might be at the start of a sentence, e.g. "In in this chapter"
It keeps punctuation, while our repeated words might be followed by a comma, e.g. "as shown in this chapter chapter, we have"
It finds non-alphabetic words, e.g. "The position is at coordinate 1 1 3."

Fixing the first problem is simple - just map toLower over the string at some point. Fixing the others requires more thought, as we still want punctuation and non-alphabetic characters to separate words, so it cannot simply be discarded. There are two approaches to the problem - one is to change the splitting procedure (which effectively involves designing a finite state machine for when to split), the other is to process the input to make it suitable for words. While the first is more likely to produce something maintainable and adaptable, the second is often quicker to implement at first. For this program, I chose the second approach.

I realised that if we convert to lowercase, then replace all non-alphabetic non-space characters with " 1 2 ", we meet all the criteria above. Punctuation separates words, as does " 1 2 ". By replacing with a sequence of two distinct words we ensure that repeated punctuation does not flag as a spurious repeated word. By choosing characters that are themselves replaced by the sequence, we ensure we do not make a repeated word with the word before/after the replacement.

worder :: String -> [(Int, String)]
worder whole = [(i, word) | (i, line) <- zip [1..] $ lines $ f whole, word <- words line]
    where f = concatMap (\x -> if isAlpha x || isSpace x then [x] else " 1 2 ") . map toLower

We define a local function f to make the changes to the input string. To perform Char to String replacements on a string we use concatMap with the replacement described above. We could have fused the two iterations over the string, but keeping them separate makes it slightly clearer.

Iteration 5

The final part of the spec we have ignored until now is "for each file", which we can implement as:

main :: IO ()
main = do
    files <- getArgs
    bad <- fmap concat $ forM files $ \file -> do
        src <- readFile file
        return $ map ((,) file) $ dupes $ worder src
    putStr $ unlines $ if null bad then ["Success"] else
        [file ++ ":" ++ show line ++ ": " ++ word | (file,(line,word)) <- bad] ++
        ["FAILURES (" ++ show (length bad) ++ ")"]

This function gets a list files from the command line arguments, reads each file, applies worder and dupes to find repeated words, then tacks on the filename to the errors. Once we have accumulated all errors in bad we check how many there were, if none we say Success, otherwise we list the failures.

Conclusion

This code is not intended to demonstrate best practice, or clever ideas, more practical use of some simple Haskell to solve a simple problem. Running this checker on my wife's thesis found three instances of repeated words, all of which have now been fixed.

Wednesday, August 14, 2013

Destroying Performance with Strictness

Summary: New versions of unordered-containers cause a massive performance regression in uniplate (10x slower is typical), which I have fixed in uniplate-1.6.11.

I've just uploaded uniplate-1.6.11, which fixes a severe performance regression introduced by unordered-containers-0.2.3.0 and above. As an example, the time to run HLint on the HLint source code (my standard benchmark) jumped from 1.7 seconds to 18.6 seconds, more than ten times slower. I strongly recommend anyone using Uniplate/HLint to upgrade.

The problem is caused by the lookupDefault function from the Data.HashMap.Strict module:

lookupDefault :: (Eq k, Hashable k) => v -> k -> HashMap k v -> v
lookupDefault def k mp = ...

This function looks up k in mp, but if k is not found, returns def. There has been discussion over whether def should be evaluated even if k is present in mp, and since unordered-containers-0.2.3.0 lookupDefault always evaluates def. There are many legitimate discussions on semantics in the Haskell community, but I do not consider this discussion to be one of them - lookupDefault should not pointlessly evaluate def. As John Hughes elegantly argues in "Why Functional Programming Matters", laziness lets you write functions that composable properly. I have spent several days debugging two problems caused by the excessive strictness in lookupDefault:

Problem 1: Error Defaults

uniplate-1.6.9 contained the following code:

lookupDefault (error "Uniplate internal error: Report to Neil Mitchell, couldn't grab in follower") x mp

Uniplate uses a very complex and highly tuned algorithm to determine which values can be recursively contained by a type. The algorithm relies on subtle invariants, in particular that x must be a member of mp at this particular point. If that fails for certain data types, I want users to submit a bug report, rather than wonder if they called Uniplate incorrectly. The simplest way to achieve that goal is using a default value of error that is only evaluated if x is not present in mp. Unfortunately, by making lookupDefault strict, this error message was always triggered. People have argued that passing error as the default is only to work around the absence of stack traces in GHC - an argument disproven by the example above. The error message provides information that would not be provided by a stack trace - the error is not merely an error, but an error that is my fault. I fixed this bug in uniplate-1.6.10 by switching to:

fromMaybe (error "Uniplate internal error: Report to Neil Mitchell, couldn't grab in follower") $ lookup x mp

Problem 2: Expensive Defaults

uniplate-1.6.10 contained the following code:

lookupDefault (hit ! x) x mp

In this example I look up x in mp, but if that fails, look up x in hit. By forcing the default to be evaluated this code always performs the lookup in hit, resulting in a slowdown of more than 10 times in HLint, and more than 20 times in Uniplate microbenchmarks. I fixed this bug in February by changing all instances of lookupDefault to use fromMaybe after finding out lookupDefault was incorrect, but was unaware of the significant performance impact until earlier today.

Counterarguments

There are various arguments for making lookupDefault strict in the default argument:

"You are using Data.HashMap.Strict" - the Strict module for a data structure should provide a data structure containing strict values, not a module of functions which are unnecessarily strict - values which are never stored in the data structure should never be evaluated. As another example consider the Control.Monad.State.Strict module, which provides a strict state monad. In that module, the state is strict, but functions like return are not needlessly strict.
"We don't support using undefined values" - I expect all Haskell libraries to work with undefined values where sensible, not be corrupted in the presence of exceptions and be multithread safe. These are some of the attributes that are essential to allow libraries to be used predictably and reused in situations the authors did not consider.
"Strictness is faster" - a strict lookupDefault may occasionally shave nanoseconds off a program, but can make a program arbitrarily slower, and in the case of HLint makes a commonly used program 10 times slower.
"It is what people expect" - I didn't expect lookupDefault to be strict, despite being one of a handful of Haskell programmers to program in a strict variant of Haskell all day long.

Tuesday, July 30, 2013

Standard Chartered are hiring

Summary: Haskell programmer willing to relocate to Singapore? Come work for Standard Chartered!

Standard Chartered, the bank where I work, is hiring again. We're looking for a programmer in Singapore to use Haskell full time (or more specifically, the Mu Haskell dialect), with the results used by many people, often the very next day. We write compilers, libraries, applications, web servers, DSLs and much more besides. All of the work is for use in house, and is usually geared towards finance, but no finance background is required. Standard Chartered has been using Haskell for over five years, and has hired lots of well-known Haskell programmers, including Lennart Augustsson, Ravi Nanavati, Malcolm Wallace, Roman Leshchinskiy, Don Stewart and Andy Adams-Moran.

To apply, send a CV to neil.mitchell AT sc DOT com, and make sure the CV includes links to anything you've written on programming (Twitter, StackOverflow, blogs, academic papers) and links to any open-source software you have written (GitHub, Hackage). We are ideally looking for:

An expert Haskell programmer - someone who can quickly write top-quality code. We have a large existing Haskell code base, which lots of people are continually developing, so experience with larger Haskell projects would be a plus.
Someone who can proactively talk to our users, understand their problems, find out what they are asking for, then deliver what they really want. We make releases nightly and rely on constant feedback.
Someone who takes responsibility for their code and keeps it in good shape. That includes testing it, refining it over time and documenting the surprising bits. We can only iterate rapidly by keeping the code clean and testable, or we would end up breaking things.
Someone who is prepared to work weekdays 9am-6pm in an office in Singapore (telecommuting is not currently an option). The work we do directly impacts many people on very short time scales, which is demanding, but also rewarding.

Update: The application process has now closed.

Saturday, June 22, 2013

Building LLVM using Shake

Summary: You can now build LLVM using Shake, and a rebuild with nothing to do goes massively faster than make (0.8s vs 199s) and fractionally faster than Ninja (0.8s vs 0.9s).

As of Shake 0.10.4 the shake tool can execute Ninja build files. LLVM can be built with CMake, and CMake can generate a Ninja build file, so you can compile LLVM with Shake. I've included the full steps I followed at the end of this post.

The main thing I wanted to test was how fast a rebuild with nothing to do was using Shake vs Ninja, as Ninja prides itself on having "a focus on speed". When compiling LLVM on Windows with GCC, a nothing to do build using make takes 199s, Shake takes 0.8s and Ninja takes 0.9s. The CMake generator does not use one of the latest Ninja build features (the deps keyword), but if it did, Shake would be about 0.1s faster and Ninja would be at least 0.1s faster.

Full builds with Shake and Ninja both take about the same time, but with anything higher than 2 CPUs the linker phase ends up contending heavily and the machine thrashes the disk, making robust measurements impossible. The solution would be to use finite resources on the linkers, something that needs implementing in the CMake Ninja generator, and would then allow more CPUs to be used.

Other than speed, why would you use Shake to compile LLVM?

If you build with --report the file report.html will be generated. Open that report file and you can see numerous details about the build - how good the parallel utilisation was, what changed to cause what to rebuild, summary statistics, a dependency graph and more. See the Help page in any generated report for more details.
If you build with --progress the console titlebar will display a predicted completion time, how many seconds until your build completes. The predicted time will be fairly inaccurate the first time round, but future runs are influenced by recorded timings, and can produce useful guesses.
If your CPU has a preference for functional languages it will make the registers happier.

Existing Ninja users may also be interested in a guide to running Ninja builds with Shake, which gives a few more details on using Shake like Ninja.

Compiling LLVM with Shake

These instructions are how I compiled LLVM with Shake, on Windows, with GCC. I didn't run into any significant problems, but there were two minor niggles I had to work though (both listed below). I compiled LLVM with make, then Ninja, then Shake, to check each phase as I went - but only the final Shake compile is actually necessary.

Install Shake with cabal update && cabal install shake --global, if you are new to Haskell package installation, see here.
Get LLVM and compile it with make, I followed the instructions at http://bencode.net/clangonwindows, which has disappeared in the last few days (I have emailed the web master to see where it went).
Install Ninja.
Run CMake over LLVM like this, configuring with -G Ninja.
To build with Ninja I had to edit build.ninja line 17697 to delete lib/clang/3.4/lib/windows/libclang_rt.i386.a, which won't build on my system and isn't built at all by the make system - I suspect this is a tip/mingw issue. At this stage you can compile LLVM with Ninja.
Type touch tools/clang/lib/Basic/CMakeFiles/clang_revision_tag to create a dummy file. There is a Ninja rule to create such a file, but the rule is wrong since it doesn't actually produce the file, and Shake's sanity checking spots that.
Run shake -j2 in the build directory. Come back later and you will have a build.
Run shake -j2 again to enjoy the fast nothing to do build.

Sunday, May 26, 2013

Three types of build-system dependency

Summary: There are three types of dependencies you might want to express in a build system, all of which are supported by Shake.

A build system, at its heart, is a system which runs commands in an order satisfying user-specified dependencies. But what kind of dependencies can be expressed? This post describes three different types of dependency, only one of which is available in Make, but all of which are available in both Shake and Ninja.

Feature 1: Static dependencies (available in every build system)

The most basic form of dependency is a static dependency, where a rule produces an output from some inputs:

-- In Make --
result.tar : file1 file2
    tar -cf result.tar file1 file2

-- In Shake --
"result.tar" *> \out -> do
    let deps = ["file1","file2"]
    need deps
    cmd "tar -cf" [out] deps

This rule says that the file result.tar depends on the inputs file1 and file2, and provides a command to build result.tar. Whenever file1 or file2 change, the command will be run, and result.tar will be built.

Static dependencies occur in almost every build rule, and are supported by all build tools, including Make and Shake.

Feature 2: Dynamic dependencies (available in Shake, Ninja, Redo and tup)

A more advanced dependency is where the list of dependencies itself depends on the results of previous dependencies. Imagine we want to build result.tar from the list of files stored in list.txt. The dependencies of result.tar cannot be specified statically, but depend on the contents of list.txt. In Shake we can write:

"result.tar" *> \out -> do
    need ["list.txt"]
    contents <- readFileLines "list.txt"
    need contents
    cmd "tar -cf" [out] contents

This rule describes how to build result.tar. We depend on (need) the file list.txt. We read each line from list.txt into the variable contents - being a list of the files that should go into result.tar. Next, we depend on all the files in contents, and finally call the tar program. If either list.txt changes, or any of the files listed by list.txt change, then result.tar will be rebuilt.

This feature is necessary in almost every build system, yet is shockingly lacking from most build tools - I am only aware of it being available in Shake, Ninja, Redo and tup. As a common example, in Make you might write:

result.o : result.c result_header1.h result_header2.h
    gcc ...

The file result.o depends on both the C source file result.c and all headers that file includes. But listing the headers both in result.c with #include directives, and in the Makefile, is a brittle form of duplication. A better approach is for the build system to run gcc -M result.c and extract the includes from there. In Shake we can write:

"result.o" *> \out -> do
    let src = "result.c"
    Stdout stdout <- cmd "gcc -MM" [src]
    need $ src : drop 2 (words stdout)
    cmd "gcc -o" [out] "-c" [src]

My experience is that about a quarter of rules require some kind of additional dependency based on previous dependencies. While you can hack round some of the issues in Make, and people have become disturbingly adept at doing so, the result often only approximates the dependencies - building either too much or too little.

Feature 3: Multiple outputs from one rule (available in Shake and Ninja)

The final feature is producing multiple outputs from one command, and is used far more rarely (perhaps one or two rules in a complex build system) - but when needed, is essential. Some programs, such as GHC, can produce two outputs with one command - compiling Foo.hs produces both Foo.o and Foo.hi. As a first approximation, the .o file depends on the entire contents of the source file, while the .hi file depends only on the type signatures. A single ghc invocation needs to do all the work to produce both, but often the .hi file will be left unchanged. In Shake we can write:

["Foo.hi","Foo.o"] *>> \_ -> do
    need ["Foo.hs"]
    cmd "gcc -c Foo.hs"

While it is often possible to construct a series of dependencies to approximate a single rule producing multiple outputs, it only works in some cases, and is brittle. The only build systems I am aware of which support multiple outputs are Shake and Ninja.

Essential features

My standard advice when people ask about writing a build system is "don't". If some existing build system (e.g. ghc --make or Cabal) is capable of building your project, use that instead. Custom build systems are necessary for many complex projects, but many projects are not complex. If you have decided your project is complex, you should use a build tool that can express complex dependencies, both for writing the initial system and to provide the flexibility to make the inevitable changes required.

Looking only at dependency features, I would consider it unwise to start a complex build system using a tool other than Shake or Ninja, or perhaps either Redo or tup (if you accept the absence of multiple outputs from one rule).

Weak dependency specification in build tools, particularly Make, has left its mark on many programs. I recently talked to some OCaml hackers complaining that their tools were not "Make friendly" because they produced multiple output files. I wonder what lengths other tools have gone to in order to cope with weak dependency specification...

Update: The relative power of tup was reported as a comment, and it appears to have the necessary power, but I haven't yet checked. Following further research into Ninja I suspect it may not be as powerful as originally stated and may not have Feature 2, but am not yet sure. Make gained support for multiple outputs after this post was published.

Wednesday, April 17, 2013

Buffer smashing in NSIS

Summary: I've identified two fixed-size buffer errors in the NSIS compiler generator, one of which is a buffer overflow leading to a segfault. My nsis library works round them to some extent.

My nsis Haskell library provides a layer over the NSIS installer generator. NSIS scripts can be viewed as assembly code for a virtual machine with 16 general purpose registers, the ability to define new registers, and a stack - where all locations store strings and instructions are register-to-register. My nsis library abstracts over these details to provide something with more traditional flow control (e.g. while, if), compound expressions and type safety, using NSIS as an assembler.

A high-level language tends to tickle areas of an assembler that a human would not. In particular, NSIS has two fixed-buffer size errors that nsis has triggered.

Bug 1: String literals >= 4096 bytes cause segfaults

If you write a string literal in the NSIS source which is 4096 characters or longer, the NSIS generator segfaults. As an example (ignoring lots of NSIS boilerplate):

Var foo
StrCpy $foo "XXX...XXX"

If the string has 4095 X characters it works. As soon as you have 4096 X characters or more the NSIS generator segfaults. My guess is that the NSIS lexer has a 4096 character buffer that is overflowed.

As a workaround, you can do:

Var foo
StrCpy $foo1 "XXX...XXX"
StrCpy $foo2 "XXX...XXX"
StrCpy $foo $foo1$foo2

As long as both $foo1 and $foo2 are less than 4096 characters, you can combine them to produce $foo without error (as far as I can tell).

Bug 2: fileWrite truncates its output at 1023 bytes

When writing a file, all FileWrite calls are truncated to 1023 bytes. As an example:

FileOpen $h "output.txt" w
FileWrite $h "XXX...XXX"

If there are more than 1023 X characters, only the first 1023 will be written. My guess is that FileWrite has a 1024 character buffer for output.

As a workaround, you can write the file in smaller chunks, using multiple FileWrite instructions.

Workarounds in nsis-0.2.2

Manipulating long strings in NSIS is not that common. The example that caused me to look at buffer sizes was writing out a configuration file line-by-line, for example:

writeFileLines "$INSTDIR/config.ini"
    ["[config1]"
    ,"InstallDir=$INSTDIR"
    ,...
    ]

In nsis-0.2.1 writeFileLines was defined as:

writeFileLines a b = writeFile' a $ strUnlines b

This function merges all lines together then writes them to the file in one go, which truncates if the whole output is longer than 1023 characters. In addition, the nsis optimiser will often perform strUnlines at compile time, so the NSIS assembler gets a single literal, potentially exceeding 4095 characters. To avoid both these problems, in nsis-0.2.2 I have defined:

writeFileLines a b = withFile' ModeWrite a $ \hdl ->
    forM_ b $ \s -> fileWrite hdl $ s & "\r\n"

This revised definition writes the lines one by one. If any line is longer than 1023 characters it will still be truncated, but that is less likely than before. I will be reporting these issues to the NSIS team, so hopefully they can be fixed at source.

It would be possible to apply the workarounds directly in the nsis library, but I have not yet done so. Using long strings in an installer is rare, so hopefully the problem will not impact anyone. In addition, while investigating the file truncation bug I found the string literal bug, so I wonder what other buffer bugs might lurk in NSIS.

Tuesday, February 26, 2013

Shake Links

Summary: Shake is now hosted on GitHub and there is a mailing list.

I've just converted Shake over to Git, hosted on GitHub, and created a mailing list. The full set of Shake related links is now:

Download the Haskell package from Hackage and install it using Cabal.
Documentation, including examples and a list of all the available functions.

Blog posts detailing ongoing development work.
Academic paper on the underlying principles of Shake.
Video of a talk given at ICFP 2012.

Mailing list for any questions/thoughts on Shake.

Questions may also be asked on StackOverflow with the tag shake-build-system.
Bugs may also be reported on the GitHub issue tracker, but if you aren't sure, use the mailing list.

Source code in a git repo, stored at GitHub.

If you want to talk to me about Shake you can now write an email to the mailing list, create a question on StackOverflow, or open a GitHub issue (or still contact me by personal email). I'm available on any means of communication, so pick what suits you best - I'm expecting most users will start with a StackOverflow question if it's a "how do I..." question, or the mailing list if it's more of a discussion/suggestion.

I'm very interested in hearing tales of people who have successfully used Shake, so if Shake works for you, perhaps write it up on a blog, or just send an email to the mailing list. I'm also interested in what doesn't work, what was difficult, what documentation was lacking etc.

Why the sudden Shake activism? Some readers may have noticed a sudden surge in Shake related activity. The reason is that I was reminded of the problems with bad build systems, after using only shake/ghc for many years. I had to work with a team for several hours with a build system written in SCons. The "null build" (with nothing to do) took 38 seconds. The build was structured as three separate SCons build systems, one which compiled a code generator, one which ran the code generator, then another that compiled the result. The build could only run single threaded because it made use of the current directory. A project that was not particularly complex was suffering at the hands of their build system. Some of these are limitations in SCons, some are weaknesses in their use of SCons, but the build system was hindering them, not helping them. I wasted a lot of my time waiting for rebuilds that should have taken seconds. Developers deserve better.

Monday, February 25, 2013

Chasing a Space Leak in Shake

Summary: Shake v0.3 had a serious space leak that went undiagnosed for over a year, this post describes how I tracked it down.

Introduction to Space Leaks

One of the downsides of a lazy language is the possibility of space leaks. A space leak is when your program uses more memory than it should, typically because lazy evaluation is holding on to things that if evaluated would disappear. As an example, let's build up a set:

myset = Set.delete dead $ Set.fromList [alive, dead]

Consider the value myset. We first create a set with alive and dead, then delete dead. Assuming the values dead and alive are used nowhere else in the program, which values does the garbage collector consider to be alive?

In a strict language only alive would be considered alive. But in a lazy language, as we have not yet evaluated myset, both alive and dead are kept alive. You can think of myset being represented in memory as Thunk (Set.delete dead) (Value (Set.fromList [alive,dead])), instead of just Value (Set.fromList [alive]). You are storing the list of operations that will be applied to the set, rather than the result of those operations. These operations reference dead, so we cannot garbage collect it. Once we have identified the problem, we can apply strictness operations to force the value, for example:

evaluate myset

The evaluate function forces evaluation of the first part of myset. If our set data type is strict, that will force evaluation of the entire myset value. After evaluate there will be no references to dead, and it will be garbage collected. In other situations bang patterns, strict fields or seq can be used for eliminating space leaks.

Finding the Space Leak in Shake

The space leak in Shake was found over a period of a few weeks, ultimately with me waking up at 4am on New Years Day with knowledge of where an exclamation mark should be placed. However, I will present the account somewhat more methodically...

Step 1: Admitting you have a problem

The first step to fixing a space leak is the realisation that a space leak exists. My suspicion is that many (most?) large Haskell programs have space leaks, but they often go unnoticed. My first report of "memory issues" with Shake was a few months before I tracked it down. Shake is a library that makes use of user supplied rules, and in this example, the user supplied rules were far from trivial. Complicating the problem is the fact that Shake stores a list of all rules it has run in memory, which naturally grows over time. What finally convinced me that there was a problem was when several clean builds failed by exhausting memory, taking greater than 2Gb on a 32bit system.

Step 2: Seeing the problem with your own eyes

After suspecting a space leak, the first thing to do is measure memory usage, which I did using Process Explorer. The graph to the left shows a clear increase in memory. The memory doesn't grow all at once, but in steps - typical of a garbage collected language. The time between memory growth is not evenly spaced, which is reasonable for a program that runs different user rules all of which take different amounts of time. I reran the program several times, and while memory always increased, the shape of the graph varied quite considerably - as expected when running a non-deterministic program such as Shake. While I still didn't have any real information on what caused the memory leak, I could at least observe the memory leak myself.

Step 3: Hit it with your tools

Before thinking too hard, it is worth applying whatever tools are to hand. In the case of space leaks, the GHC manual describes how to use the heap profiling tools. My standard pattern is to compile with -rtsopts -prof -auto-all -caf-all, run with +RTS -h and view with hp2ps -c. Since the test case was huge (over an hour), I always terminated it early with Ctrl-C, which produces an invalid .hp file. Fortunately, you can delete everything after the last END_SAMPLE to recover a profile that hp2ps can understand.

The results on the right show steadily increasing memory allocated to PINNED values, of type ARR_WORDS, which I suspect are bytestrings. However, the total memory usage as reported by GHC and Process Explorer far exceeded that shown in the profile. It seemed like there was something missing from the profile. I upgraded to the latest GHC version but memory still seemed to be missing. The tools were not giving me any insight - perhaps due to the high degree of parallelism or other complexities in the build system.

Step 4: Simplify

The next step was to produce a simpler example - one I could run to completion in a feasible amount of time. My standard approach is to take the full test case and remove things, checking the problem has not gone away, until nothing more can be removed - leaving a minimal test case. There are two reasons why that was infeasible in this instance: 1) With a program which naturally consumes memory over time, it is not clear if the space leak has disappeared or merely become smaller as things are removed. 2) The first reduction steps would have each taken over an hour.

After deciding reduction was infeasible, I decided to try and produce a test case from scratch (which is usually a terrible idea). The original example used many different types of build rule, in many different ways - it was entirely possible that only one particular variant led to the memory leak. I decided to start with a small example, then (if necessary) try adding features until the space leak returned. Since I had been meaning to produce a benchmark for Shake at some point, I figured I could write a benchmark test which would hopefully show the bug, and even if not, be useful elsewhere.

For benchmarking, generating a random build tree is not so useful, so I attempted to define a regular shaped but interesting dependency pattern. The pattern I settled on was parameterised by breadth and depth, where every node a depth n depended on every node at depth n+1, to a limit of the given depth, with a given number of nodes at each level given by breadth. To accurately benchmark I used the file rules, as these are by far the most common.

I found that with breadth=1000 depth=1000 I was able to exhaust the memory. In order to try and simplify the test case I tried passing flags to the test case to turn off certain features and try and make the problem easier to investigate. I was able to turn off multithreading (-j1), profile reports (--no-report), always build from scratch (--clean) and to build rules in a deterministic order (--deterministic). The final command was:

$ shake benchmark breadth=1000 depth=1000 --deterministic -j1 --clean --no-report

This command works using a released version of Shake if you install the test program (install with --flags=testprog).

With the knowledge that 1000x1000 exceeded available memory I start reducing the numbers so the benchmark would complete in a reasonable timeframe (< 15s), but use an unreasonable amount of memory (> 500Mb). I found that 100x100 gave a reasonable balance.

Step 5: Hit it with your tools (harder)

With a simple example, I once again turned to the heap profiling tools available in GHC. Using +RTS -h I still saw memory unaccounted for, as before. The two obvious guesses are that GHC knows about the memory, but has decided to not show it in the heap profile, or that GHC does not know about the memory (for example, allocated on a different heap). Using +RTS -s I saw that GHC was aware of the additional memory, suggesting GHC had at least some knowledge of the memory. With nothing else to try, I ran through the list of heap profiling flags trying each in turn.

The magic flags turned out to be -xt -hy, producing the graph on the left. The profile shows STACK takes up the majority of the space, and starts to give memory usage in about the right ballpark. I concluded that the space leak must include stack values.

Step 6: Hypothesize and test

A stack in GHC is created by a thread, and a thread is typically created by forkIO. There are three reasons stacks could take too much memory - too large, too many, too long:

The stacks might be becoming too large. I added -K1K to limit all threads to 1Kb of stack. The small test case ran to completion, without any threads exceeding their stack, so the stacks were not growing too large.
There might be too many threads active in the program. I added logging of every call to forkIO, and every time a thread finished. I found I was churning through 1000's of threads, but at most 60 were alive at any one time.
The threads might be keeping their stacks alive after they had finished for too long. Having eliminated the other possibilities, this seemed likely.

Looking for places threads were referenced, and thus potentially kept alive, I found a set I was building up containing ThreadIds. The set is used so that if one Shake thread throws an exception all other threads are killed immediately. When a thread started it was added to the set. When a thread finished it was removed. Under most circumstances the set was never evaluated, but simply thrown away when the build finished. This situation correponds to the example at the start of this post, and was causing a space leak of ThreadId.

I guessed that if the ThreadId is kept alive then the stack is also. As a result, leaking a small amount of memory for a thread identifier was amplified by also leaking the stack for that thread. GHC HQ could certainly modify the runtime system so that the ThreadId did not keep the stack alive, but holding on to completed ThreadIds is rarely a good idea, so I imagine it is not worth it. For more details, including a profiling showing that leaking ThreadIds does leak stacks, see this blog post on the GHC scheduler.

Step 7: Fix the problem

Having identified the likely culprit, I simply needed to force evaluation of the set at each step. The set was held within a larger data structure:

data S = S {threads :: Set.HashSet ThreadId, ...}

To always force the set, I added an exclamation mark, turning it into a strict record:

data S = S {threads :: !(Set.HashSet ThreadId), ...}

Since I had worked for days to find somewhere to insert a single character, I also added a comment, just in case I ever thought about removing the exclamation mark.

Step 8: Check the problem went away

To check the problem had gone away I reran my reduced test case, producing the graph on the right, and showing far less memory usage in Process Explorer. I then reran the full build system and saw a typical reduction in memory of around 1Gb - a most welcome improvement!

Shake-0.4 and later contain that important exclamation mark, so an upgrade is strongly advised.

Does your Haskell program need an additional exclamation mark? Here are some you can use: !!!!!!!!!!!!!!

Sunday, February 17, 2013

Finite Resources in Shake

Summary: Management of finite resources is an important part of any modern build system, only properly available in Shake and Ninja.

I've just released Shake 0.9, a build system library, with a few bug fixes and a bunch of new features (the change log has a complete list). This release contains an incompatible change which makes the Resource feature easier to use, so I thought I'd describe the motivation and use of Resources in Shake. A full upgrade guide is at the bottom of this post.

What are Resources for?

When you run -j10 (shakeThreads=10) you are asking the build system to limit computation so it uses no more than ten CPU resources at a time. The CPU is certainly a precious resource, but there are other resource limitations a build system may need to obey:

Some APIs are global in nature, if you run two programs that access the Excel API at the same time things start to fail.
Many people have large numbers of CPUs, but only one slow rotating hard drive. If you run ten hard-drive thrashing linkers simultaneously the computer is likely to grind to a halt.
Some proprietary software requires licenses, a fixed number of which can be purchased and managed using a license manager. As an example, the Kansas Lava team only have access to 48 licenses for modelsim.

Resources in other build systems

I know of two approaches used by other build systems to obey resource constraints:

Limit the number of CPUs to hit your target - for example, the Lava build system could cap the number of CPUs to the number of licenses. People with 24 CPUs might ask the build system to use only 8, so the linkers do not make their machines unusable (and even then, a link heavy rebuild may still harm interactive performance). This solution wastes CPU resources, leaving CPUs that could be building your code idling.
Add locks to suspend jobs that are competing for the shared resource. For example any rule using Excel could take the Excel lock, either a mutex/MVar in some build systems, or creating a file to serve as the lock in make based build systems. Locking can be made to work, but is tricky if you have to fake locks using the file system, and still squanders CPU resources - instead of blocking the CPU should be running another rule.

The one exception is the Ninja build system which has a concept of "pools", which properly model finite resources.

Resources in Shake

In Shake the Resource type represents a finite resource, which multiple build rules can use. Resource values are created with newResource and used with withResource. As an example, only one set of calls to the Excel API can occur at one time, therefore Excel is a finite resource of quantity 1. You can write:

shake shakeOptions{shakeThreads=2} $ do
    want ["a.xls","b.xls"]
    excel <- newResource "Excel" 1
    "*.xls" *> \out ->
        withResource excel 1 $
            system' "excel" [out,...]

Now we will never run two copies of excel simultaneously. Moreover, it will never block waiting for excel if there are other rules that could be run.

For most programming languages the compiler is CPU bound but the linker is disk bound. Running 8 linkers will often cause an 8 CPU system to grid to a halt. We can limit ourselves to 4 linkers with:

disk <- newResource "Disk" 4
want [show i <.> "exe" | i <- [1..100]]
    "*.exe" *> \out ->
        withResource disk 1 $
            system' "ld" ["-o",out,...]
    "*.o" *> \out ->
        system' "cl" ["-o",out,...]

Now we can use 7 or 8 CPUs while still leaving the computer responsive enough to browse the web.

Software licenses are another finite resource and can be managed in the same way. For a complete example see the Kansas Lava test program, which uses Shake.

Porting from Shake 0.8

In Shake 0.9 the newResource function has been renamed to newResourceIO - rename newResource to newResourceIO everywhere and your code will work again.

However, you may have noticed that newResourceIO (as it is now called) forces you to create the resource before calling the shake function, meaning that often the creation and use of the resource are far apart. I have introduced a function newResource which runs in the Rules monad, allowing you to create a resource and then use it nearby. Moving the creation and use of resources closer together makes it much easier to check your resource constraints are met.

The only other breaking change is that shakeVersion has become a String rather than an Int, allowing you to store more precise information about the version (for example, your build system might want to encode the GHC version and the version of the build system in the string).

Updated 18 Feb 2013: Ninja also supports finite resources.

Thursday, February 07, 2013

A Nofib build system using Shake

Last February I had a few discussions with David Terei about replacing the Nofib benchmark suite build system with something based on Shake. Naturally, like all Make to Shake conversions, I thought this was an excellent idea. In further discussions with Simon Peyton Jones and Simon Marlow I refined the build system to match the useful features of the old Make based system.

I have now put the build system online: Shake build system for Nofib.

Unfortunately, I don't think much has happened with the build system in the last year. I invite people to play with it and use it for whatever purpose they want. The rest of this post is divided into two sections - the first section is how to use the build system (for people running nofib) the second half is how it works and how it was written (for people writing build systems with Shake).

Running the nofib suite

Grab the nofib suite, take the above file and put it in the root directory. You can then run:

$ runhaskell Nofib -j2 imaginary --way="-O2 -fvia-C" --run
... build related output ...
Build completed
Running imaginary/bernouilli...501ms
Running imaginary/digits-of-e1...513ms
... more tests ...
Running imaginary/wheel-sieve2...238ms
Running imaginary/x2n1...27ms

The -j2 is the parallelism to use when building (not when testing), the --way is which flags to pass to the compiler (so it should be trivial to experiment LLVM vs not), the "imaginary" is which section to run, but naming an individual test such as "x2n1" or just leaving it blank for all also works. The --run says run the tests afterwards, defaulting to norm, but --run=fast/--run=slow also works.

The system uses whatever GHC is on your path first, and stores the output under that GHC version, but if you want two GHC's built with differing bits, you can pass --compiler=ghc-with-a-tweak to do the build under a separate directory.

A few more examples, assuming the build system is compiled as runner, with the nofib command and the new equivalent:

Run a single test:

$ cd nofib/imaginary/exp3_8
$ make

runner exp3_8 --run

Run a test with some extra GHC flags:

$ cd nofib/imaginary/exp3_8
$ make EXTRA_HC_OPTS=-fllvm

runner exp3_8 --way="-O1 -fllvm" --run

Run a test with a different GHC:

$ cd nofib/imaginary/exp3_8
$ make HC=ghc-7.0.4

runner exp3_8 --compiler=ghc-7.0.4 --run

Just build a test, don't run it:

$ make NoFibRuns=0

runner

Writing the build system

This build system is well commented, so I encourage people to read the code. It makes use of CmdArgs for command line parsing (lines 60-114) combined with Shake for dependency based tracking (lines 187-281). The total system is 358 lines long, and the rest is mostly pieces for running tests.

When writing the build system I had no idea what the old Makefile system did - I read what docs I could find, took a brief stab and reading the Makefile, but failed to even run it on the laptop I was working on. Therefore, I wrote the new system from scratch, and then looked back to see what the Makefiles did that the new system did not.

As is common is Make based build systems, the Makefile stored a combination of build rules (how to compile files, line 187), static confirmation data (which targets to build, line 31) and dynamic configuration data (what inputs to give when testing, line 289). The build rules and static configuration data can be easily moved into the new build system, either as function rules or top-level constants. The dynamic configuration can be moved over as top-level constants, but that means the build system requires modifying more than it should, and it is harder to track the dependencies on this data. Instead I followed an approach I've used in several build systems, by writing a converter which sucks the dynamic data out of the Makefile and produces a much simpler config file. The converter is in the build system, so the config files are generated from the Makefiles, and updated appropriately. To actually run the tests, I query the config files. The hope is that in future everyone will decide to delete the Makefiles, and the config files can be checked in as source files, and the translation aspect can be deleted.

When writing the build system I ran into a GHC bug that when two linkers ran simultaneously they failed. To work round this bug I created a resource of a linker, with quantity 1 (line 152). Whenever I call the GHC linker I acquire this resource (line 215). Now Shake will run the rest of the build system with maximum parallelism, but ensure no linkers ever run in parallel, working round the bug.

Thursday, January 31, 2013

Shake 0.8 is out

Summary: Shake 0.8 is out, ask any questions on StackOverflow

Shake is a build system I have been working on sporadically for the last four years, think of it as a better alternative to writing Makefiles. In the past few weeks I've released versions 0.6, 0.7 and 0.8.

Questions about Shake

Unlike many of my other libraries, Shake invites user questions. It's a complex tool with lots of power to wield, and lots of aspects that emerge from the API, rather than being obvious from it. Therefore, I encourage anyone with any questions about Shake to ask them against the shake-build-system StackOverflow tag (thanks to Lennart Augustsson for creating the tag, as my reputation is too low). I've already asked one question, but I'm sure there are lots of others - "how does Shake/Make compare to monad/arrow?", "why did the Oracle change type?", "how would I define a rule that downloads from the web?". The more questions the easier it will be for future Shake users to find the information they need.

API Change: getDirectoryFiles

There is only one real breaking API change in the above series of versions, getDirectoryFiles now takes a list of FilePatterns. So you can write:

getDirectoryFiles "Configuration" ["*.xml","*.json"]

to find all XML and JSON files in your Configuration directory. The reason for this change is to introduce a new and more powerful matching capability, so you can also write:

getDirectoryFiles "Configuration" ["//*.xml","//*.json"]

And that will find all XML and JSON files anywhere under the Configuration directory. Shake tries hard to issue the minimum number of directory traversals, so searching for a list of patterns results in fewer file system queries than searching for each pattern individually.

Sunday, January 06, 2013

Shake 0.5 is out

Summary: Shake 0.5 is out, a strongly recommended upgrade.

Shake is a build system I have been working on sporadically for the last four years, think of it as a better alternative to writing Makefiles. In the past week I've released version 0.4 followed by version 0.5. I strongly recommend all users upgrade to the latest version of Shake as it fixes a massive space leak compared to versions 0.2 and 0.3, I have measured memory savings of over 1Gb in some build systems (I intend to write a post on the space leak later). There are two notable API changes for people upgrading from version 0.3.

Change 1: The oracle is now strongly typed

Shake has a notion of oracle rules, which store information about the system, for example which GHC version you are running. The intent is to allow users to track this extra information, so if you upgrade GHC, the build system will automatically rebuild all Haskell files, but leave the C files alone, without requiring users to perform a wipe.

An Oracle is essentially a question/answer pair. In shake-0.3 both questions and answers were of type [String], the documentation attempted to justify this abomination by saying:

"This type is a compromise. Questions will often be the singleton list, but allowing a list of strings allows hierarchical schemes such as ghc-pkg shake, ghc-pkg base etc. The answers are often singleton lists, but sometimes are used as sets - for example the list of packages returned by ghc-pkg."

Shake required you to impose a global namespace on questions, and to encode results in an impoverished type. No more! The oracle now allows arbitrary question/answer types, namespacing is automatic since the type definitions act as unique questions, and the answer can be any type required.

The API for working with the oracle has changed, but the necessary modifications should be fairly localised. As an example conversion:

addOracle ["ghc"] $ return ["7.2.1"]
askOracle ["ghc"]

Would become:

newtype GhcVersion = GhcVersion () deriving (Show,Typeable,Eq,Hashable,Binary,NFData)
addOracle $ \GhcVersion{} -> return "7.2.1"
askOracleWith (GhcVersion ()) ""

There is a little more boilerplate with the new version, but all the problems caused by [String] are gone, and in larger projects it can lead to a significant reduction in complexity and cross-module namespace issues.

Change 2: validStored becomes storedValue

When defining your own rules (a rare occurrence) previously you had to supply a definition:

validStored :: key -> value -> IO Bool

Given a key, look up that value on disk and check it matches the value you were given. This API is very much what Shake needs to do at runtime - it needs to check that the value it has is still valid. Almost all definitions followed the form:

validStored key value = do
    v <- lookupMyKey key
    case v of
        Nothing -> return False -- not stored, so doesn't match
        Just v -> return $ v == value

In the new version of Shake, you would instead supply:

storedValue :: key -> IO (Maybe value)
storedValue = lookupMyKey

This revised definition simplifies most rules, still has the same power a old system (because you can define a custom Eq instance for your value type), and has allowed the addition of the AssumeClean feature to Shake (more about that in a future post, or read the Shake Haddock documentation).