Saturday, March 24, 2018

Adding Package Lower-bounds

Summary: I hacked Cabal so I could spot where I was missing package lower bounds. The approach has lots of limitations, but I did find one missing lower bound in HLint.

Cabal lets you constrain your dependencies with both upper bounds and lower bounds (for when you are using a feature not available in older versions). While there has been plenty of debate and focus on upper bounds, it feels like lower bounds have been somewhat neglected. As an experiment I decided to modify cabal to prefer older versions of packages, then tried to compile a few of my packages. The approach seems sound, but would require a fair bit of work to be generally usable.

Hacking Cabal

By default Cabal prefers to choose packages that are already installed and have the highest bound possible. The code to control that is in cabal-install/Distribution/Solver/Modular/Preference.hs and reads:

-- Prefer packages with higher version numbers over packages with
-- lower version numbers.
latest :: [Ver] -> POption -> Weight
latest sortedVersions opt =
    let l = length sortedVersions
        index = fromMaybe l $ L.findIndex (<= version opt) sortedVersions
    in  fromIntegral index / fromIntegral l

To change that to prefer lower versions I simply replaced the final expression with fromIntegral (l - index) / fromIntegral l. I also removed the section about giving preferences to currently installed versions, since I wanted the lowest bound to be chosen regardless.

So I didn't mess up my standard copy of Cabal I changed the .cabal file to call the executable kabal.

Testing Kabal on Extra

To test the approach, I used my extra library, and ran kabal new-build all. I used new-build to avoid poluting my global package database with these franken-packages, and all to build all targets. That failed with:

Failed to build Win32-2.2.2.0.
In file included from dist\build\Graphics\Win32\Window_hsc_make.c:1:0:
Window.hsc: In function 'main':
Window.hsc:189:16: error: 'GWL_USERDATA' undeclared (first use in this function)
C:\ghc\ghc-8.2.2/lib/template-hsc.h:38:10: note: in definition of macro 'hsc_const'
     if ((x) < 0)                                      \
          ^

So it seems that Win32-2.2.2.0 claims to work with GHC 8.2, but probably doesn't (unfortunately it's not on the Linux-based Hackage Matrix). We can work around that problem by constraining Win32 to the version that is already installed with --constraint=Win32==2.5.4.1. With that, we can successfully build extra. For bonus points we can also use --enable-test, checking the test suite has correct lower bounds, which also works.

Testing Kabal on Shake

For Shake we start with:

kabal new-build all --constraint=Win32==2.5.4.1 --enable-test

That worked perfectly - either I had sufficient lower bounds, or this approach doesn't do what I hoped...

Testing Kabal on HLint

Trying HLint with our standard recipe we get:

Failed to build ansi-terminal-0.6.2.
[3 of 6] Compiling System.Console.ANSI.Windows.Foreign ( System\Console\ANSI\Windows\Foreign.hs, dist\build\System\Console\ANSI\Windows\Foreign.o )
System\Console\ANSI\Windows\Foreign.hs:90:20: error:
    Ambiguous occurrence `SHORT'
    It could refer to either `System.Win32.Types.SHORT',
                             imported from `System.Win32.Types' at System\Console\ANSI\Windows\Foreign.hs:41:1-25
                          or `System.Console.ANSI.Windows.Foreign.SHORT',
                             defined at System\Console\ANSI\Windows\Foreign.hs:59:1

So it seems ansi-terminal-0.6.2 and Win32-2.5.4.1 don't cooperate. Let's fix that by restricting ansi-terminal==0.7 with another constraint. Now we get:

Preprocessing library for cmdargs-0.10.2..
Building library for cmdargs-0.10.2..
[ 1 of 25] Compiling Data.Generics.Any ( Data\Generics\Any.hs, dist\build\Data\Generics\Any.o )

Data\Generics\Any.hs:65:17: error:
    Variable not in scope: tyConString :: TyCon -> String
   |
65 | typeShellFull = tyConString . typeRepTyCon . typeOf
   |                 ^^^^^^^^^^^

Oh dear, now it's the fault of cmdargs, which is one of my packages! Checking the Hackage Matrix for cmdargs we see:

Namely that 0.10.2 to 0.10.9 don't compile with GHC 8.2. We solve that by going to the maintainers corner and editing the .cabal file of released versions to produce a revision with better bounds - replacing base == 4.* with base >= 4 && < 4.10. Finding the translation from GHC version 8.2 to base version 4.10 involved consulting the magic page of mappings.

After waiting 15 minutes for the package tarballs to update, then doing cabal update, I got to a real error in HLint:

src\Hint\Duplicate.hs:44:37: error:
    * Could not deduce (Default (String, String, SrcSpan))
        arising from a use of `duplicateOrdered'

Looking at the data-default library I see that the Default instance for triples was only introduced in version 0.3. Adding the bounds data-default >= 0.3 to the hlint.cabal dependencies fixes the issue, allowing HLint to compile cleanly.

Next, looking at the commit log, I noticed that I'd recently added a lower bound on the yaml package. I wondered if I removed that bound then it could be detected?

Resolving dependencies...
Error:
    Dependency on unbuildable library from yaml
    In the stanza 'library'
    In the inplace package 'hlint-2.1'

Alas not - Cabal says the library is unbuildable - I don't really know what that means.

Testing Kabal on Ghcid

Trying Ghcid with our standard recipe and accumulated constraints we get:

Preprocessing library for Win32-notify-0.2..
Building library for Win32-notify-0.2..
[1 of 2] Compiling System.Win32.FileNotify ( dist\build\System\Win32\FileNotify.hs, dist\build\System\Win32\FileNotify.o )

src\System\Win32\FileNotify.hsc:29:9: error:
    Ambiguous occurrence `fILE_LIST_DIRECTORY'

So it seems Win32-notify-0.2 and Win32-2.5.4.1 don't cooperate. With that discovery I had used up all the time I was willing to spend and stopped the experiment.

Conclusions

By modifying Cabal to select for older packages I was able to find and fix a lower bound. However, because all my dependencies aren't lower-bound safe, it became a somewhat manual process. To be practically useful the prinple of correct lower-bounds needs adopting widely. Some notes:

  • The Hackage Matrix provides a large amount of actionable intelligence - a great experience. However, fixing the issues it discovers (actually adding the bounds) is frustratingly manual, requiring lots of clicks and edits in a textbox.
  • Using cabal new-build caused each directory to gain a .ghc.environment.x86_64-mingw32-8.2.2 file, which silently recongfigured ghc and ghci in those directories so they stopped working as I expected. Not a pleasant experience!
  • I ran my tests on Windows, and most of the dependencies with incorrect bounds were Windows-specific issues. Maybe Linux would have had less lower-bound issues?
  • I used a pretty recent GHC, which excludes a lot of older versions of packages because they don't work on newer GHC versions - picking the oldest-supported GHC would probably have found more bounds.
  • Are lower bounds actually useful? If you ignore which packages are globally installed (which both stack and cabal new-build effectively do) then the only reason to be constrained to an older version is by upper bounds - in which case solving excessive upper-bounds is likely to give more actual benefit.
  • I'm not the first person to think of constraining cabal to use older versions - e.g. Cabal bug 2876 from 2015.
  • The Trustee tool can infer minimum bounds, but it's Linux only so doesn't work for me. It is probably better for people who want to do their own bound checking.
  • Compiling kabal required a bit of trial and error, I eventually settled on compiling each dependent Cabal package in turn into the global package database, which wasn't ideal, but did work.

Friday, March 09, 2018

Safe Library with better Stack Traces

Summary: The safe library now provides error messages with short and informative stack traces on errors.

The Haskell base library contains a number of functions that crash under certain circumstances, e.g. tail []. The safe library attempts to tame these functions, providing tailDef (which uses a default value on []) and tailNote (which gives you a chance to provide some extra information if there is a failure). Since GHC 7.10 there has been support for adding stack traces to exceptions, where if any function with an appropriate annotation calls error it will include some stack trace.

Just over a year ago the safe library introduced a Partial constraint in Safe.Partial to declare that a function is partial, which also provides the annotations for error. The signature of tailNote became:

tailNote :: Partial => String -> [a] -> [a]

This signature has two benefits - it is visible in the documentation and it provides the stack trace if something goes wrong. If you typed tailNote "uhoh" [] in ghci you got:

Prelude Safe> tailNote "uhoh" []
*** Exception: Safe.tailNote [], uhoh
CallStack (from HasCallStack):
  error, called at .\Safe\Util.hs:23:44 in safe-0.3.16-9YcgrXj17kg79mfNx7tCoF:Safe.Util
  fromNoteModule, called at Safe.hs:65:12 in safe-0.3.16-9YcgrXj17kg79mfNx7tCoF:Safe
  fromNote, called at Safe.hs:108:17 in safe-0.3.16-9YcgrXj17kg79mfNx7tCoF:Safe
  tailNote, called at <interactive>:5:1 in interactive:Ghci1

Great - we can see the final line says we were on line 5 of the interactive and ran tailNote. Useful, but with the new version of safe it's even better:

*Main> tailNote "uhoh" []
*** Exception: Safe.tailNote [], uhoh
CallStack (from HasCallStack):
  tailNote, called at <interactive>:1:1 in interactive:Ghci1

We still get the interesting final line, but all the internal details of safe, e.g. the fact that tailNote calls fromNote have disappeared.

To get the stack traces just add Partial to any function you believe to be partial - it's easy. If you are happy to stick with GHC 8.0 and above you can use HasCallStack from GHC.Stack without depending on safe. I am slowly adding annotations to my packages, for example the extra library has Partial annotations.

Supressing the internal functions was a lot more work. I think it's worth it for a package that is all about nice handling of errors, but I probably won't bother in any of my other packages. The change to the code was going from:

tailNote note = fromNote note "tailNote []" . tailMay

To:

tailNote note x = withFrozenCallStack $ fromNote note "tailNote []" $ tailMay x

The main change is we've added withFrozenCallStack, which freezes the call stack at this point and stops new entries from being accumulated inside the bowels of our library. The other change was to eta-expand the definition by adding x on both sides, so that withFrozenCallStack gets to block the actual error, and not merely a function that later produces an error.

Partial constraints are very powerful, and I hope in time they are adopted universally throughout the Haskell ecosystem. One day I hope the Prelude.tail function will also have a Partial constraint.