Thursday, April 12, 2007

Exploding Patterns in FilePath

[ICFP referees please stop reading now]

After what feels like forever, finally my FilePath library has hit 1.0. It took over 250 email messages debating FilePath design choices from Haskell 1.0 onwards, but finally we were able to come to an agreement about what the libary should look like. I'd like to thank everyone who participated in the discussion, used the library, and encouraged me to keep pushing it.

I've previously blogged about my QuickCheck testing of the FilePath library, something which has been essential to coming up with a coherent design - and which lead to various decisions being made for me. Now I'd like to talk about the pattern match story. For the purposes of this post, I'm going to term "exploding patterns" to mean pattern-match errors that can be provoked by appropriately chosen input.

After doing all the QuickCheck tests, you might think I was quite confident of avoiding exploding patterns, but I was not. The QuickCheck tests all work with lists of 25 characters - perhaps the empty list crashes the function? Perhaps only some really rare input? After all that loose random testing, the time came for some serious proof!

Taking a look at the code, I have used 5 incomplete functions (fromJust, init, head, last, tail) and one explicit non-exhaustive pattern-match. Can any of these explode? Turning the code over to my pattern-match checker, it quickly (2 seconds, 2Mb later) informs me that it can prove none of the incomplete functions crash, but it can't prove the explicit pattern match. So lets take a look:

f x | isWindows =
| isPosix = ....

Oh dear, it won't crash currently (isPosix = not isWindows), but you can certainly see that these patterns aren't going to stay complete if we add another operating system. Quick as a flash, we replace "isPosix" with "otherwise", and run the checker again. Yay, no exploding patterns remain.

So there we have it, in under 5 minutes, I was able to proove my entire library to be safe. I can still use all the fun functions like head, but in a safe manner. It turns out the library was always safe, but one minor modification to improve the code, and I can be sure that things won't go wrong - the released 1.0 has this modification. I have the guarantees of safety, without much work at all. Not only that, but as the library evolves, I'm only 2 seconds away (literally!) from pattern-match safety, compared to my QuickCheck test which takes around a minute, this is a very cheap verification.

So how come that within an hour of releasing 1.0 someone had already replied with a bug report about a crashing pattern? The answer is simple, Setup.hs in Cabal has code in it which could (at the time of release, but not anymore) cause a pattern explosion. I didn't write, or validate, the Setup.hs script - but it just goes to show that pattern-match errors can easily slip through!

5 comments:

Richard said...

Isn't it more "future-proof" to add an otherwise case at the end, which raises an error, rather than replace the isPosix one?

Neil Mitchell said...

Richard: In this particular case, no. The function compares equality of FilePath's, and the Windows one makes the special action of mapping toLower over the string first. The safe default option is to not do this, which is also what Posix does. So any added systems will be safe if they take the Posix route.

richard said...

ah, I see. That does sound safe enough... although I'm noting that on my macs HFS+ filenames are not case sensitive, either... since it's really more of a filesystem behavior than a platform behavior, I guess it's hard to choose something that's always right.

Regardless, this looks like a nice library, and I'll be making use of it soon! Thanks.

Neil Mitchell said...

And on Windows a drive can mount NFS drives as well, which are case sensitive. You are indeed right that its a drive property, but with symlinks and no real way to query drives, its impossible to get right. Some sensible guessing is the only alternative.

Phil Armstrong said...

To be even more confusing, HFS+ filesystems can be case sensitive: it's a per filesystem setting. You're not allowed to make the system partition case sensitive though, as it would break the OS.