Wednesday, December 10, 2014

Beta testing the Windows Minimal GHC Installer

Summary: There is now a minimal Windows GHC installer that can install the network library.

Michael Snoyman and I are seeking beta testers for our new Windows GHC installer. The installer is very new, and we're keen to get feedback.

Update: If you want to install to Program Files (the default installation path), download the file, right click and hit "Run as administrator". This will be automatic in the next version.

One of the standard problems when upgrading Haskell libraries on Windows is that sometimes you have to upgrade a package which requires a configure script - typically the network library. When that happens, you have to take special measures, and even then, sometimes the binary will fail at runtime (this happens on about half my machines).

Our installer solves that problem by bundling GHC, Cabal and MSYS in one installer. You can install the network library out of the box without any special configuration and it just works.
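
For example, once the installer has finished, getting network should be as simple as (a hypothetical session, exact output omitted):

cabal update
cabal install network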

Our installer does not provide all the packages included with the Haskell Platform, but it does provide an environment where you can easily install those packages. This minimal installer might be more suitable for developers.

Why might I want this installer?

The installer is much easier to get started with than the plain GHC distribution: it has a real installer, sets up the %PATH%, includes Cabal and can build the network library.

The installer provides fewer libraries than the Haskell Platform installer, but includes MSYS, so you can build everything in the Platform. The lack of additional packages and the close correspondence with GHC (there is one installer for each GHC version) might make it more suitable for library developers. Once GHC 7.10.1 is released, we hope to make a GHC Minimal Installer available within days.

The installer also gives more precise control over what is added to the %PATH%. You can add all the binaries it provides, or just a single binary, minghc-7.8.3, which when run at the command prompt temporarily adds all the installed binaries to the %PATH%. Using the switcher, you can easily switch between GHC versions.
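
As a sketch of the switcher in use (a hypothetical command prompt session):

minghc-7.8.3
ghc --version

After the first command, the second should report GHC 7.8.3, without 7.8.3 being permanently added to the %PATH%.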

Wednesday, November 19, 2014

Announcing Shake 0.14

Summary: Shake 0.14 is now out. The *> operator became %>. If you are on Windows the FilePath operations work differently.

I'm pleased to announce Shake 0.14, which has one set of incompatible changes, one set of deprecations and some new features.

The *> operator and friends are now deprecated

The *> operator has been the Shake operator for many years, but in GHC 7.10 the *> operator is exported by the Prelude, so it causes name clashes. To fix that, I've added %> as an alias for *>. I expect to keep exporting *> for the foreseeable future, but you should use %> going forward, and will likely have to switch when you move to GHC 7.10. All the Shake documentation has been updated. The &*> and |*> operators have been renamed &%> and |%> to remain consistent.
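
To make the rename concrete, here is a minimal sketch of a build rule using the new operator (the rule itself is made up for illustration):

import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions $
    -- was: "*.o" *> \out -> do ...
    "*.o" %> \out -> do
        let src = out -<.> "c"
        need [src]
        cmd "gcc -c" [src] "-o" [out]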

Development.Shake.FilePath now exports System.FilePath

Previously the module Development.Shake.FilePath mostly exported System.FilePath.Posix (even on Windows), along with some additional functions, but modified normalise to remove /../ and made </> always call normalise. The reason was that if you pass non-normalised FilePath values to need, Shake can end up with two names for one on-disk file and everything goes wrong, so it tried to avoid creating non-normalised paths in the first place.

As of 0.14 I now fully normalise the values inside need, so there is no requirement for the arguments to need to be normalised already. This change allows the simplification of directly exporting System.FilePath and only adding to it. If you are not using Windows, the changes are:

  • normalise doesn't eliminate /../, but normaliseEx does.
  • </> no longer calls normalise. It turned out that calling normalise is pretty harmful for FilePattern values, leading to fairly easy-to-make but hard-to-debug issues; see the sketch below.
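
A rough illustration of the difference (all these equations hold, assuming non-Windows separators):

normalise   "foo/../bar" == "foo/../bar"  -- /../ survives
normaliseEx "foo/../bar" == "bar"         -- /../ is eliminated
"foo" </> "../bar"       == "foo/../bar"  -- no normalise applied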

If you are using Windows, you'll notice all the operations now use \ instead of /, and properly cope with Windows-specific aspects like drives. The function toStandard (convert all separators to /) might be required in a few places. The reasons for this change are discussed in bug #193.

New features

This release has lots of new features since 0.13. You can see a complete list in the change log, but the most user-visible ones are:

  • Add getConfigKeys to Development.Shake.Config.
  • Add withTempFile and withTempDir for the Action monad (withTempFile is sketched after this list).
  • Add -j to run with one thread per processor.
  • Make |%> matching with simple files much faster.
  • Use far fewer threads, with correspondingly less stack usage.
  • Add copyFileChanged.
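
To give a flavour of withTempFile (the temporary file is deleted once the Action completes), a sketch where generate-data is a made-up command:

import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $
    "output.txt" %> \out -> withTempFile $ \tmp -> do
        () <- cmd "generate-data --out" [tmp]  -- hypothetical generator command
        copyFileChanged tmp out                -- copy only if the contents changed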

Plans

I'm hoping the next release will be 1.0!

Thursday, November 13, 2014

Operators on Hackage

Summary: I wrote a script to list all operators on Hackage, and which packages they are used by.

In GHC 7.10 the *> operator will be moving into the Prelude, which means the Shake library will have to find an alternative operator (discussion on the mailing list). In order to pick a sensible operator, I wanted to list all operators in all Hackage packages so I could be aware of clashes.

Note that the operators exported by a package are more than just those defined by it, e.g. Shake exports the Eq class, so == is counted as being exported by Shake. However, in most cases, operators exported by a package are defined by that package.

Producing the file

First I downloaded the Hoogle databases from Hackage, and extracted them to a directory named hoogle. I then ran:

ghc --make Operators.hs && operators hoogle operators.txt

And uploaded operators.txt above. The code for Operators.hs is:

import Control.Exception.Extra
import Control.Monad
import Data.List.Extra
import System.Directory.Extra
import System.Environment
import System.FilePath
import System.IO.Extra

main = do
    [dir,out] <- getArgs
    -- recursively find every extracted Hoogle database file
    files <- listFilesRecursive dir
    xs <- forM files $ \file -> do
        -- try UTF8 first, then the local encoding, then give up
        src <- readFileUTF8' file `catch_` \_ -> readFile' file `catch_` \_ -> return ""
        -- a line starting with '(' declares an operator, e.g. (&&) :: Bool -> Bool -> Bool
        return [("(" ++ takeWhile (/= ')') x ++ ")", takeBaseName file) | '(':x <- lines src]
    -- collate to [(operator, [package])], one output line per operator
    writeFileUTF8 out $ unlines [unwords $ a : nub b | (a,b) <- groupSort $ concat xs]

This code relies on the normal packages distributed with GHC, plus the extra package.

Code explanation

The script is pretty simple. I first get two arguments, which are where to find the extracted files and where to write the result. I then use listFilesRecursive to recursively find all extracted files, and forM to loop over them. For each file I read it in (trying first UTF8, then the local encoding, then giving up). For each line I look for ( as the first character, and form a list of [(operator-name, package)] pairs.

After producing the list, I use groupSort to produce [(operator-name, [package])], then writeFileUTF8 to produce the output. Running the script takes just over a minute on my ancient computer.
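
For reference, groupSort (from Data.List.Extra) collates a list of pairs by their first component:

groupSort [("bar",1),("foo",2),("bar",3)] == [("bar",[1,3]),("foo",[2])]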

Writing the code

Writing the code to produce the operator list took about 15 minutes, and I made some notes as I was going.

  • I started by loading up ghcid for the file with the command line ghcid -t -c "ghci Operators.hs". Now every save immediately resulted in a list of warnings/errors; I never bothered opening the file in ghci, I just compiled it to test.
  • I started by inserting take 20 files so I could debug the script faster and could manually check the output was plausible.
  • At first I wrote takeBaseName src rather than takeBaseName file. That produced a lot of totally incorrect output, woops.
  • At first I used readFile to suck in the data and putStr to print it to the console. That corrupted Unicode operators, so I switched to readFileUTF8' and writeFileUTF8.
  • After switching to writeFileUTF8 I found a rather serious bug in the extra library, which I fixed and added tests for, then made a new release.
  • After trying to search through the results, I added ( and ) around each operator to make it easier to search for the operators I cared about.

User Exercise

To calculate the stats of the most exported operator and the package with the most operators I wrote two lines of code - how would you write such code? Hint: both my lines involved maximumBy.
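
For those who want to check their answer afterwards, here is the shape of one possible solution (spoilers; it assumes the [(operator, [package])] association list built by the script above, and the helper names are my own):

import Data.List (maximumBy)
import Data.List.Extra (groupSort)
import Data.Ord (comparing)

-- the operator exported by the most packages
mostExported :: [(String, [String])] -> (String, [String])
mostExported = maximumBy (comparing (length . snd))

-- the package exporting the most operators: invert the association, then regroup
mostOperators :: [(String, [String])] -> (String, [String])
mostOperators xs = maximumBy (comparing (length . snd)) $ groupSort [(p, op) | (op, ps) <- xs, p <- ps]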

Tuesday, November 11, 2014

Upper bounds or not?

Summary: I thought through the issue of upper bounds on Haskell package dependencies, and it turns out I don't agree with anyone :-)

There is currently a debate about whether Haskell packages should have upper bounds for their dependencies or not. Concretely, given mypackage and dependency-1.0.2, should I write dependency >= 1 (no upper bounds) or dependency >= 1 && < 1.1 (PVP/Package Versioning Policy upper bounds)? I came to the conclusion that the bounds should be dependency >= 1, but that Hackage should automatically add an upper bound of dependency <= 1.0.2.
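
In .cabal file terms, the options look like this (dependency is a placeholder name):

-- no upper bound:
build-depends: dependency >= 1

-- PVP upper bound:
build-depends: dependency >= 1 && < 1.1

-- my conclusion: an exact upper bound, added automatically by Hackage
build-depends: dependency >= 1 && <= 1.0.2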

Rock vs Hard Place

The reason the debate has continued so long is because both choices are unpleasant:

  • Don't add upper bounds, and have packages break for your users because they are no longer compatible.
  • Add PVP upper bounds, and have reasonable install plans rejected and users needlessly downgraded to old versions of packages. If one package requires a minimum version of above n, and another requires a maximum below n, they can't be combined. The PVP allows adding new functions, so even if all your dependencies follow the PVP, the code might still fail to compile.
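
To make the rejected-install-plan problem concrete (all package names hypothetical): if package-a requires foo >= 1.2 because it needs a function added in foo-1.2, while package-b requires foo >= 1 && < 1.2 because its PVP bound was written before foo-1.2 existed, then any project depending on both package-a and package-b has no valid install plan, even if package-b would compile fine against foo-1.2.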

I believe there are two relevant factors in choosing which scheme to follow.

Factor 1: How long will it take to update the .cabal file?

Let us assume that the .cabal file can be updated in minutes. If there are excessively restrictive bounds for a few minutes it doesn't matter - the code will be out of date, but only by a few minutes, and it is unlikely any other package will require the latest version within that window.

As the .cabal file takes longer to update, the problems with restrictive bounds become worse. For abandoned projects, the restrictive upper bounds make them unusable. For actively maintained projects with many dependencies, bounds bumps can be required weekly, and a two week vacation can break actively maintained code.

Factor 2: How likely is the dependency upgrade to break?

If upgrading a dependency breaks the package, then upper bounds are a good idea. In general it is impossible to predict whether a dependency upgrade will break a package or not, but my experience is that most packages usually work fine. For some projects, there are stated compatibility ranges, e.g. Snap declares that any API will be supported for two 0.1 releases. For other projects, some dependencies are so tightly-coupled that every 0.1 increment will almost certainly fail to compile, e.g. the HLint dependency on haskell-src-exts.

The fact that these two variable factors are used to arrive at a binary decision is likely the reason the Haskell community has yet to reach a conclusion.

My Answer

My current preference is to normally omit upper bounds. I do that because:

  • For projects I use heavily, e.g. haskell-src-exts, I have fairly regular communication with the maintainers, so am not surprised by releases.
  • For most projects I depend on only a fraction of the API, e.g. wai, and most changes are irrelevant to me.
  • Michael Snoyman and the excellent Stackage alert me to broken upgrades quickly, so I can respond when things go wrong.
  • I maintain quite a few projects, and the administrative overhead of uploading new versions, testing, waiting for continuous-integration results etc would cut down on real coding time. (While the Hackage facility to edit the metadata would be quicker, I think that tweaking fundamentals of my package, but skipping the revision control and continuous integration, seems misguided.)
  • The PVP is a heuristic, but usually the upper bound is too tight, and occasionally the upper bound is too loose. Relying on the PVP to provide bounds is no silver bullet.

On the negative side, occasionally my packages no longer compile for my users (very rarely, and for short periods of time, but it has happened). Of course, I don't like that at all, so do include upper bounds for things like haskell-src-exts.

The Right Answer

I want my packages to use versions of dependencies such that:

  • All the features I require are present.
  • There are no future changes that stop my code from compiling or passing its test suite.

I can achieve the first objective by specifying a lower bound, which I do. There is no way to predict the future, so no way I can restrict the upper bound perfectly in advance. The right answer must involve:

  • On every dependency upgrade, Hackage (or some agent of Hackage) must try to compile and test my package. Many Haskell packages are already tested using Travis CI, so reusing those tests seems a good way to gauge success.
  • If the compile and tests pass, then the bounds can be increased to the version just tested.
  • If the compile or tests fail, then the bounds must be tightened to exclude the new version, and the author needs to be informed.
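
A sketch of that loop in Haskell, where every type and function is hypothetical - it exists only to pin the idea down:

type Package = String
type Version = String

compileAndTest  :: Package -> Version -> IO Bool  -- e.g. rerun the package's Travis CI tests
compileAndTest  _ _ = return True                 -- stubbed for the sketch
raiseUpperBound, excludeVersion, notifyAuthor :: Package -> Version -> IO ()
raiseUpperBound _ _ = return ()
excludeVersion  _ _ = return ()
notifyAuthor    _ _ = return ()

onDependencyUpgrade :: Package -> Version -> IO ()
onDependencyUpgrade pkg new = do
    ok <- compileAndTest pkg new
    if ok
        then raiseUpperBound pkg new    -- bounds extend to the version just tested
        else do excludeVersion pkg new  -- bounds tighten to exclude the new version
                notifyAuthor pkg new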

With this infrastructure, the time a dependency is too tight is small, and the chance of breakage is unknown, meaning that Hackage packages should have exact upper bounds - much tighter than PVP upper bounds.

Caveats: I am unsure whether such regularly changing metadata should be incorporated into the .cabal file or not. I realise the above setup requires quite a lot of Hackage infrastructure, but will buy anyone who sorts it out some beer.

Sunday, November 02, 2014

November talks

Summary: I'll be talking at CodeMesh 2014 on 5th November and FP Days on 20th November.

I am giving two talks in London this month:

CodeMesh 2014 - Gluing Things Together with Haskell, 5th Nov

I'll be talking about how you can use Haskell as the glue language in a project, instead of something like Bash. In particular, I'll cover Shake, NSIS and Bake. The abstract reads:

A large software project is more than just the code that goes into a release, in particular you need lots of glue code to put everything together - including build systems, test harnesses, installer generators etc. While the choice of language for the project is often a carefully considered decision, more often than not the glue code consists of shell scripts and Makefiles. But just as functional programming provides a better way to write the project, it also provides a better way to write the glue code. This talk covers some of the technologies and approaches we use at Standard Chartered to glue together the quant library. In particular, we'll focus on the build system where we replaced 10,000 lines of Makefiles with 1,000 lines of Haskell which builds the project twice as fast. We'll also look at how to test programs using Haskell, how to replace ancillary shell scripts with Haskell, and how to use Haskell to generate installers.

FP Days 2014 - Building stuff with Shake, 20th Nov

I'll be giving a tutorial on building stuff with Shake. It's going to be less sales pitch, more how you structure a build system, and how you use Shake effectively. The abstract reads:

Build systems are a key part of any large software project, relied upon by both developers and release processes. It's important that the build system is understandable, reliable and fast. This talk introduces the Shake build system which is intended to help meet those goals. Users of Shake write a Haskell program which makes heavy use of the Shake library, while still allowing the full power of Haskell to be used. The Shake library provides powerful dependency features along with useful extras (profiling, debugging, command line handling). This tutorial aims to help you learn how to think about writing build systems, and how to make those thoughts concrete in Shake.

Sunday, October 19, 2014

HLint now spots bad unsafePerformIO

Summary: I've just released a new version of HLint that can spot an unsafePerformIO which should have NOINLINE but doesn't.

I've just released HLint v1.9.10, a tool to suggest improvements to Haskell code. I don't usually bother with release announcements of HLint, as each version just fixes a few things and adds a few hints; it's all in the changelog (plus there have now been 102 releases). The latest release attempts to make using unsafePerformIO a little bit safer. A common idiom for top-level mutable state in Haskell is:

myGlobalVar :: IORef Int
myGlobalVar = unsafePerformIO (newIORef 17)

That is, define a top-level CAF (function with no arguments) and bind it to unsafePerformIO of some created mutable state. But the definition above is unsafe. GHC might decide myGlobalVar is cheap and inline it into several uses, duplicating the variable so that some functions update different versions. Running this code through the latest version of HLint we see:

Sample.hs:2:1: Error: Missing NOINLINE pragma
Found:
  myGlobalVar = unsafePerformIO (newIORef 17)
Why not:
  {-# NOINLINE myGlobalVar #-}
  myGlobalVar = unsafePerformIO (newIORef 17)

HLint has spotted the problem, and suggested the correct fix.
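
For completeness, the safe idiom as a full module (the value 17 and main are just for demonstration):

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

{-# NOINLINE myGlobalVar #-}
myGlobalVar :: IORef Int
myGlobalVar = unsafePerformIO (newIORef 17)

main :: IO ()
main = do
    modifyIORef myGlobalVar (+1)
    print =<< readIORef myGlobalVar  -- prints 18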

Trying it for real

Let's take the package slave-thread-0.1.0 and run HLint on it. Slave thread is a new package that helps you ensure you don't end up with ghost threads or exceptions being thrown but missed - a useful package. Running HLint we see:

Sample.hs:19:1: Error: Missing NOINLINE pragma
Found:
  slaves = unsafePerformIO $ Multimap.newIO
Why not:
  {-# NOINLINE slaves #-}
  slaves = unsafePerformIO $ Multimap.newIO

Sample.hs:20:3: Warning: Redundant $
Found:
  unsafePerformIO $ Multimap.newIO
Why not:
  unsafePerformIO Multimap.newIO

Sample.hs:25:1: Error: Eta reduce
Found:
  fork main = forkFinally (return ()) main
Why not:
  fork = forkFinally (return ())

Sample.hs:55:28: Warning: Redundant $
Found:
  PartialHandler.totalizeRethrowingTo_ masterThread $ mempty
Why not:
  PartialHandler.totalizeRethrowingTo_ masterThread mempty

Sample.hs:72:5: Error: Use unless
Found:
  if null then return () else retry
Why not:
  Control.Monad.unless null retry

HLint has managed to spot the missing NOINLINE pragma, and also a handful of tiny cleanups that might make the code a little more readable. Fortunately (and independent of HLint), the NOINLINE pragma was added in slave-thread-0.1.1, so the library no longer suffers from that bug.

Friday, October 17, 2014

Fixing Haddock docs on Hackage

Summary: A few weeks ago Hackage stopped generating docs. I have a script that generates the docs, and also fixes some Haddock bugs.

Update: The Haddock documentation generators are running once again, but may be somewhat behind for a little while.

A few weeks ago Hackage stopped generating documentation, so if you look at recently uploaded packages they tend to either lack docs, or have very alert maintainers who did a manual upload. I've packaged up my solution, which also fixes some pretty annoying Haddock bugs. Given that I can now get docs faster and better with my script, I'll probably keep using it even after Haddock on Hackage gets restarted.

How to use it

  • You are likely to get better results if you always install the packages you use with documentation.
  • Ensure you have tar, curl, cp and cabal on your $PATH.
  • cabal update && cabal install neil
  • Make a release, don't touch any other code, then make sure you are in the project directory.
  • neil docs --username=YourHackageUsername
  • Type in your Hackage password at the prompt.

And like that, your docs are online. To see an example of something that was generated with this process, look at Shake.

What I fixed

I automated the process using scripts originally taken from the lens library, supplemented with suggestions from Reddit. I then do a number of manual fixups.

  • Haddock now makes cross-module links default to types when it doesn't know what the target is. Concretely, if I write 'Development.Shake.need' in Haddock it generates a link to #t:need, which doesn't exist, when it should be #v:need - I fix that.
  • Update: fixed in newer versions of Haddock, as per this ticket.
  • On Windows, if you use CPP and multiline bird-tick (>) Haddock blocks, you get a blank line between each line. I fix that.
  • If you follow some of the simpler scripts, links outside your package won't work (or at least, they didn't for me). I fix that.

The neil tool

The neil tool is my personal set of handy Haskell scripts. I make all my releases with it (neil sdist), and do lots of checks that my packages conform to my rules (neil check). I also use it for driving my Travis instances. It's in fairly regular flux. Until now, I've always kept it in Darcs/Git and never released it - it's personal stuff tuned to how I work.

You might also notice that neil provides a library. Don't use that, I intend to delete it in a few weeks. Update: library removed.