Friday, November 10, 2006

System.FilePath, automated testing

I just released System.FilePath, a library for manipulating FilePaths on both Windows and Posix. The problem with a library like this is that there are lots of corner cases and weird situations, and everything needs to be tested twice with different semantics. Obviously this requires a test infrastructure different from most other libraries!

I started off with a separate file for writing properties, but quickly found that the properties were the best type of documentation for a function. It was also a pain to maintain the logic of the code in two different places - i.e. the code and the tests. The obvious idea then is to combine the code, documentation and testing into one. Because I am using Haddock that turned out to be quite easy to do - any line beginning with "-- > " is a test. Haddock sees this as monospace formatting, Haskell sees it as a comment, and my test generator can find the tests pretty easily.
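As a rough illustration of how little work the extraction step needs, here is a minimal sketch (not the actual GenTests.hs code) that pulls every "-- > " line out of a source file. The names extractTests and sampleSource, and the two sample properties, are made up for this example.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (mapMaybe)

-- | Return the text of every test embedded in a Haskell source file.
-- A test is any line that, after leading spaces, starts with "-- > ".
extractTests :: String -> [String]
extractTests = mapMaybe (stripPrefix "-- > " . dropWhile (== ' ')) . lines

-- Hypothetical input: a documented function with two embedded tests.
sampleSource :: String
sampleSource = unlines
  [ "-- | Is the path a relative path?"
  , "--"
  , "-- > isRelative \"foo\" == True"
  , "-- > isRelative x == not (isAbsolute x)"
  , "isRelative :: FilePath -> Bool"
  ]
```

Running extractTests over sampleSource yields just the two test lines, with the "-- > " prefix removed; the ordinary Haddock comment lines and the type signature are ignored.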

Within the System.FilePath repo I have a DOS Batch file driver (test.bat) which uses a separate Haskell program (GenTests.hs) to create a test script and run it.

Within the test listing there are some tests which I refer to as constant, and some as properties. Tests with no free variables are constant - GenTests recognises them and outputs them directly, and in one execution they either pass or fail. The properties are just standard QuickCheck properties, with the restriction that every multi-letter keyword not in a certain known set is a function in the library, and every variable x..z is a FilePath (hence using a custom FilePath generator).
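The constant/property split can be sketched roughly as follows. This is an assumption about how such a classifier might look, not the real GenTests logic: it treats the identifiers x, y and z as free variables and skips anything inside string literals. All names here (identifiers, freeVariables, TestKind, classify) are hypothetical.

```haskell
import Data.Char (isAlpha, isAlphaNum)
import Data.List (nub)

-- | Identifier-like tokens of a test, ignoring string literals.
identifiers :: String -> [String]
identifiers [] = []
identifiers ('"':cs) = identifiers (skipString cs)
identifiers (c:cs)
  | isAlpha c = let (name, rest) = span isIdentChar (c:cs)
                in name : identifiers rest
  | otherwise = identifiers cs
  where isIdentChar ch = isAlphaNum ch || ch == '_' || ch == '\''

-- | Drop the remainder of a string literal, honouring backslash escapes.
skipString :: String -> String
skipString ('\\':_:cs) = skipString cs
skipString ('"':cs)    = cs
skipString (_:cs)      = skipString cs
skipString []          = []

-- | The variables x..z that occur in a test, without duplicates.
freeVariables :: String -> [String]
freeVariables = nub . filter (`elem` ["x", "y", "z"]) . identifiers

data TestKind = Constant | Property [String]
  deriving (Eq, Show)

-- | A test with no free variables is constant; otherwise it is a property.
classify :: String -> TestKind
classify test = case freeVariables test of
  [] -> Constant
  vs -> Property vs
```

With this sketch, a test such as `isRelative "x" == True` is classified as Constant (the x is inside a string literal), while `isRelative x == not (isAbsolute x)` becomes Property ["x"].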

The main complication in testing System.FilePath is the fact that every property corresponds to two different tests - one on the Posix implementation, one on the Windows implementation. The translator automatically does this duplication, unless either Posix: or Windows: is given at the start of the test, in which case the test is only executed on the appropriate version.
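The duplication rule is simple enough to show in a few lines. The following is a sketch under the assumption that the prefix is written exactly as Posix: or Windows: at the very start of the test; Platform and targets are hypothetical names, not taken from the real translator.

```haskell
import Data.List (stripPrefix)

data Platform = Posix | Windows
  deriving (Eq, Show)

-- | Decide which implementations a test should run against.
-- A "Posix:" or "Windows:" prefix restricts the test to that platform;
-- an unprefixed test is duplicated for both.
targets :: String -> [(Platform, String)]
targets test
  | Just rest <- stripPrefix "Posix:" test   = [(Posix, trim rest)]
  | Just rest <- stripPrefix "Windows:" test = [(Windows, trim rest)]
  | otherwise                                = [(Posix, test), (Windows, test)]
  where trim = dropWhile (== ' ')
```

An unprefixed test therefore produces two entries, one per platform, while a prefixed test produces exactly one with the prefix stripped.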

For QuickCheck testing I defined a FilePath to be a 25 character string, from the following set of characters "?|./:\\abcd 123;_". The idea of this set of characters is to include every character that any aspect of the library treats differently, along with a small selection of "normal" letters/numbers.
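A generator along these lines can be written with the combinators in current versions of QuickCheck. This is a minimal sketch, not the original 2006 code: the newtype QFilePath is a hypothetical wrapper, and vectorOf and elements are the QuickCheck 2 names.

```haskell
import Test.QuickCheck

-- | Hypothetical wrapper so the custom generator does not clash with
-- the default Arbitrary instance for String.
newtype QFilePath = QFilePath FilePath
  deriving (Eq, Show)

-- | Characters the library treats specially, plus a few "normal" ones.
filePathChars :: String
filePathChars = "?|./:\\abcd 123;_"

instance Arbitrary QFilePath where
  -- Always exactly 25 characters, each drawn uniformly from the set.
  arbitrary = QFilePath <$> vectorOf 25 (elements filePathChars)
```

A property then takes QFilePath arguments and unwraps them, so every generated path has the fixed length and alphabet described above.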

There was one modification I had to make to QuickCheck. By default quickCheck always returns successfully, reporting success or failure only on the console. Unfortunately, if an automatic program is executing over 200 tests, then these messages can get obscured in the general noise - this happened more than once. To combat this I defined a new QuickCheck wrapper which calls error on failure. Ideally the signature of quickCheck should be changed to :: .. -> IO Bool to detect these situations and allow the driver script to fail more obviously.
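For comparison, later versions of QuickCheck (the 2.x series) expose the test outcome directly, so a wrapper of this kind can be sketched without modifying the library. The name quickCheckOrDie is made up for this example; quickCheckResult and isSuccess are the QuickCheck 2 functions, which did not exist in the version described here.

```haskell
import Control.Monad (unless)
import Test.QuickCheck

-- | Run a property and abort with an error if it does not pass,
-- so a driver script cannot miss the failure in the console noise.
quickCheckOrDie :: Testable prop => prop -> IO ()
quickCheckOrDie prop = do
  result <- quickCheckResult prop
  unless (isSuccess result) $
    error "QuickCheck property failed"
```

Calling error makes the whole test program terminate with a non-zero exit code, which a batch file driver can detect.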

Without QuickCheck I don't think it would have been possible to write the FilePath library - it caught too many obscure bugs that manual testing would never have found. In addition, QuickCheck forced me to think about the properties of the library more closely - I changed some design decisions after it turned out that the properties disagreed with me. The one thing QuickCheck helped with more than anything though was refactoring - despite a massive number of the functions all depending on each other, QuickCheck allows me to change the behaviour of one function in some obscure case and check that no other function was relying on the old behaviour.

The only criticism that can be levelled at my use of QuickCheck is that failing examples are not minimal - in fact they are always exactly 25 characters long. I hope that at some point soon I can make use of SmallCheck (once it has a darcs repo and a .cabal file) to do testing alongside QuickCheck to get a greater depth of coverage.

All the test scripts I have written are available in the darcs repo, under the BSD3 license. If anyone can make use of them, I'd be happy to have someone take them forward!


Thomas Schilling said...

Are you sure the problem with the too large test cases is not related to your generator? You define the generator, so you can always control the size of the generated test data.

Neil Mitchell said...

Oh, it's ENTIRELY due to my generator - I have it hard coded to produce 25 character long FilePaths. It just happens in QuickCheck that controlling a random distribution of length is relatively hard - it's a bit too easy to miss some things.

Thomas Schilling said...

Ah. Now that I re-read your post I found it. Yes, it's a bit hard to get good coverage of different lengths and forms. But I'd say the problem lies within the range of possibilities, for which the default of 100 tests is too few. Maybe just increasing the number of test cases will help.

Neil Mitchell said...

That's a really good idea! I think I can change the tests to about 500 or 1000 without altering the amount of time it takes significantly. (why didn't I think of that before...)

Anonymous said...

I just stumbled across a more sensible solution to your problem: in the local version at Chalmers, the Arbitrary class has one more function, shrink, which does exactly what you want: if QC finds a failing result it tries to shrink it as much as possible, as long as the error persists.

our local QC version (module Chalmers.QuickCheck)
example using shrink

Neil Mitchell said...

Yes, if shrink was standard in QuickCheck (and I think it should be) then I'd definitely use that! I had heard of shrink before, but it's not that time consuming to minimize an example by hand (maybe 5 minutes), so the payoff isn't there unless shrink is a standard thing.