Neil Mitchell's Blog (Haskell etc): filepath

Showing posts with label filepath. Show all posts

Monday, October 26, 2015

FilePaths are subtle, symlinks are hard

Summary: When thinking about the filepath .., remember symlinks, or you will trip yourself up.

As the maintainer of the Haskell filepath package, one common path-related mistake I see is the assumption that filepaths have the invariant:

/bob/home/../cookies == /bob/cookies

I can see the conceptual appeal - go down one directory, go up one directory, end up where you started. Unfortunately, it's not true. Consider the case where home is a symlink to tony/home. Now, applying the symlink leaves us with the case:

/tony/home/../cookies == /bob/cookies

And, assuming /tony/home is itself not a symlink, that reduces to:

/tony/cookies == /bob/cookies

This is clearly incorrect (assuming no symlinks), so the original invariant was also incorrect, and cannot be relied upon in general. The subtle bit is that descending into a directory might move somewhere else, so it's not an operation that can be undone with ... Each step of the path is interpreted based on where it ends up, not based on the path it took to the current point.

While this property isn't true in general, there are many special cases where it is reasonable to assume. For example, the shake package contains a normaliseEx function that normalises under this assumption, but nothing in the standard filepath package assumes it.

The full example

/
   [DIR]  bob
   [DIR]  tony
/bob
   [LINK] home -> /tony/home
   [FILE] cookies 
/tony
   [DIR]  home
/tony/home
   [FILE] cookies

Thursday, April 12, 2007

Exploding Patterns in FilePath

[ICFP referees please stop reading now]

After what feels like forever, finally my FilePath library has hit 1.0. It took over 250 email messages debating FilePath design choices from Haskell 1.0 onwards, but finally we were able to come to an agreement about what the libary should look like. I'd like to thank everyone who participated in the discussion, used the library, and encouraged me to keep pushing it.

I've previously blogged about my QuickCheck testing of the FilePath library, something which has been essential to coming up with a coherent design - and which lead to various decisions being made for me. Now I'd like to talk about the pattern match story. For the purposes of this post, I'm going to term "exploding patterns" to mean pattern-match errors that can be provoked by appropriately chosen input.

After doing all the QuickCheck tests, you might think I was quite confident of avoiding exploding patterns, but I was not. The QuickCheck tests all work with lists of 25 characters - perhaps the empty list crashes the function? Perhaps only some really rare input? After all that loose random testing, the time came for some serious proof!

Taking a look at the code, I have used 5 incomplete functions (fromJust, init, head, last, tail) and one explicit non-exhaustive pattern-match. Can any of these explode? Turning the code over to my pattern-match checker, it quickly (2 seconds, 2Mb later) informs me that it can prove none of the incomplete functions crash, but it can't prove the explicit pattern match. So lets take a look:

f x | isWindows =
| isPosix = ....

Oh dear, it won't crash currently (isPosix = not isWindows), but you can certainly see that these patterns aren't going to stay complete if we add another operating system. Quick as a flash, we replace "isPosix" with "otherwise", and run the checker again. Yay, no exploding patterns remain.

So there we have it, in under 5 minutes, I was able to proove my entire library to be safe. I can still use all the fun functions like head, but in a safe manner. It turns out the library was always safe, but one minor modification to improve the code, and I can be sure that things won't go wrong - the released 1.0 has this modification. I have the guarantees of safety, without much work at all. Not only that, but as the library evolves, I'm only 2 seconds away (literally!) from pattern-match safety, compared to my QuickCheck test which takes around a minute, this is a very cheap verification.

So how come that within an hour of releasing 1.0 someone had already replied with a bug report about a crashing pattern? The answer is simple, Setup.hs in Cabal has code in it which could (at the time of release, but not anymore) cause a pattern explosion. I didn't write, or validate, the Setup.hs script - but it just goes to show that pattern-match errors can easily slip through!

Friday, November 10, 2006

System.FilePath, automated testing

I just released System.FilePath, a library for manipulating FilePath's on both Windows and Posix. The problem with a library like this is that there are lots of corner cases, lots of weird situations and everything needs to be tested twice with different semantics. Obviously this requires a test infrastructure different from most other libraries!

I started off with a separate file for writing properties, but quickly found that the properties were the best type of documentation for a function. It was also a pain to keep two different places which identify what the logic of the code is - i.e. the code and the tests. The obvious idea then is to combine the code, documentation and testing into one. Because I am using Haddock that turned out to be quite easy to do - any line beginning with "-- > " is a test. Haddock sees this as monospace formatting, Haskell as comment, and my test generator can find the tests pretty easily.

Within the System.FilePath repo I have a DOS Batch file driver (test.bat) which uses a separate Haskell program (GenTests.hs) to create a test script and run it.

Within the test listing there are some tests which I refer to as constant, and some as properties. Some of the tests have no free variables, these are constant - the GenTests recognises them and outputs them directly - in one execution they either pass or fail. The properties are just standard QuickCheck properties, with the restriction that every multi-letter keyword not in a certain known set is a function in the library, and every variable x..z is a FilePath (hence using a custom FilePath generator).

The main complication in testing from System.FilePath is the fact that every property corresponds to two different tests - one on the Posix implementation, one on the Windows implementation. The translator automatically does this duplication, unless either Posix: or Windows: is given at the start of the test, in which case the test is only executed on the appropriate version.

For QuickCheck testing I defined a FilePath to be a 25 character string, from the following set of characters "?|./:\\abcd 123;_". The idea of this set of characters is to include every character that any aspect of the library treats differently, along with a small selection of "normal" letters/numbers.

There was one modification I had to make to QuickCheck, by default QuickCheck returns successfully outputting success/failure to the console. Unfortunately if an automatic program is executing over 200 tests, then these messages can get obscured in the general noise - this happened more than once. To combat this I defined a new QuickCheck wrapper which calls error on failure. Ideally the signature of quickCheck should be changed to :: .. -> IO Bool to detect these situations and allow the driver script to fail more obviously.

Without QuickCheck I don't think it would have been possible to write the FilePath library - it caught too many obscure bugs that manual testing would never have found. In addition, QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out that the properties disagreed with me. The one thing QuickCheck helped with more than anything though was refactoring - despite a massive number of the functions all depending on each other, QuickCheck allows me to change the behaviour of one function in some obscure case and check that no other function was relying on that.

The only criticism that can be levelled at my use of QuickCheck is that failing examples are not minimal, in fact they are exactly 25 characters long. I hope that at some point soon I can make use of SmallCheck (once it has a darcs repo and a .cabal file) to do testing alongside QuickCheck to get a greater depth of coverage.

All the test scripts I have written are available in the darcs repo, under the BSD3. If anyone can make use of them, I'd be happy to have someone take them forward!

Neil Mitchell's Blog (Haskell etc)