Thursday, February 28, 2008

Adding data files using Cabal

Cabal is the standard method of packaging Haskell programs and libraries for release. One problem I've encountered more than once is that adding data files to a Cabal-built project is not as easy as it could be. I'm not entirely sure why - having just added data file support to Hoogle, it wasn't excessively painful, but I still came out of the experience feeling slightly bruised. To help others (and my future self), I thought I'd write down the details while they are still freshly spinning round my head.

Let's assume we start with an existing Cabal project, with an associated .cabal file. In the root directory of the project we have readme.txt and data.txt. The readme file contains a basic introduction for the user, and the data file contains some data that the program needs to access at runtime.

We first modify the .cabal file to add the following lines in the top section:

Extra-Source-Files: readme.txt
Data-Files: data.txt

The Extra-Source-Files field tells Cabal to put the files in the release tarball, but nothing more - for a readme this behaviour is perfect. The Data-Files field tells Cabal that the listed files contain data which the program will want to access at runtime. Data files include things like big tables, the Hoogle function search database, graphics and level data for games, UI description files for GUIs, etc.

Now that we have added the data file to Cabal's control, Cabal will automatically manage it for us. It will be added to the source tarball, and will be installed somewhere appropriate on the user's system, following operating system guidelines. The only question is where Cabal has put the file. To answer this, Cabal generates a Paths_hoogle module (change the project name as appropriate) which it links in with the program. The Paths module provides the function:

getDataFileName :: FilePath -> IO FilePath

At runtime, to find the data file, we can simply call getDataFileName "data.txt", and Cabal will tell us where the data file resides.
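
As a rough illustration of that call, here is a minimal sketch of reading the data file at startup. The helper name loadDataFile is hypothetical, and the getDataFileName defined here is only a stand-in so the sketch compiles on its own; in a real Cabal build you would delete the stand-in and write import Paths_hoogle (getDataFileName) instead.

```haskell
module Main where

import System.Directory (doesFileExist)

-- Stand-in for the Cabal-generated function (an assumption for this sketch).
-- In a Cabal build, remove this definition and import it from Paths_hoogle.
getDataFileName :: FilePath -> IO FilePath
getDataFileName = return

-- Hypothetical helper: resolve a data file name to its location and read
-- it, returning Nothing if no file exists at the resolved path.
loadDataFile :: FilePath -> IO (Maybe String)
loadDataFile name = do
    path <- getDataFileName name
    exists <- doesFileExist path
    if exists
        then Just <$> readFile path
        else return Nothing

main :: IO ()
main = do
    result <- loadDataFile "data.txt"
    case result of
        Nothing -> putStrLn "data.txt could not be found"
        Just contents ->
            putStrLn ("Read " ++ show (length (lines contents)) ++ " lines from data.txt")
```

Checking that the file exists before reading it is a choice made for this sketch, not something Cabal requires; it simply gives a clearer message than an exception when the program is run from the wrong directory.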

The above method works well after a program has been installed, but is harder to work with while developing a program. To alleviate this problem, we can add our own Paths module to the program, for example:

module Paths_hoogle where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = return

Place this module alongside all the other modules. While developing the program our hand-created Paths module will be invoked, which says the data is always in the current directory. When doing a Cabal build, Cabal will choose its custom generated Paths module over ours, and we get the benefits of Cabal managing our data.

Cabal's support for data files, and extra source files, is very useful. It doesn't take much work to make use of the provided facilities, and it will help to ensure that users of your program on all operating systems get the style of installation they were expecting.


Anonymous said...

One thing missing is the possibility of adding a data directory recursively.

For a game, you will have plenty of data files, and adding them all is error-prone.

Anonymous said...

Once again your blog proves to be an excellent source of information.


Rickard Lindberg said...

I could not get my own Paths module to work for local development. But what I found worked was to set an environment variable before running the program:

tracker_datadir=. ./dist/build/tracker/tracker

Thought it might be useful to others ending up here.

Anonymous said...

thanks, very useful! please allow me to add another hint.

i found that in my project, cabal picks Paths_bla.hs from the local path (perhaps because i import things differently from hoogle and ghc grabs it even though cabal is unaware of it).

to keep this from happening, i moved Paths_bla.hs from ./ to ./.ghci-only/, and added the following line to my local .ghci:

:set -i.ghci-only/