Wednesday, August 27, 2008

Hoogle Database Generation

Brief Announcement: A new release of the Hoogle command line is out, including bug fixes and additional features. Upgrading is recommended.

Two interesting features of Hoogle 4 are working with multiple function databases (from multiple packages) and running your own web server. Neither feature is fully developed yet, and their usage may change, but both can be used with care. This post covers how to generate your own databases, and how the web version databases are generated. Tomorrow I'm going to post on how to run your own Hoogle web server, but you'll need to generate your databases first! I'm going to walk through all the steps to create a database from the filepath library as an example.

Hoogle Databases

A Hoogle database is a set of searchable items, supporting both text and type searching, stored in a file with a ".hoo" extension. A database may include the definitions from one package, or from multiple packages. Typically the installed Hoogle databases would include one database for each package (e.g. base.hoo, filepath.hoo), a default database (default.hoo) comprising all the standard search items, and any number of custom databases (e.g. all.hoo) which comprise different combinations of the other databases.

When using Hoogle, adding +name will include the given database in the search list, and -name will exclude the given package from the search. By default, Hoogle will use default.hoo, but if any +name commands are given then those databases will be used instead.
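
The selection rule above can be sketched in a few lines of shell. This is only an illustration of the rule as described in this post, not Hoogle's own code: select_databases is a hypothetical helper, and the "search:"/"exclude:" output format is invented for the example.

```shell
#!/bin/sh
# Sketch of the database selection rule described in the post.
# NOT Hoogle's implementation; select_databases is a hypothetical helper.
select_databases() {
  included=""
  excluded=""
  for arg in "$@"; do
    case "$arg" in
      --*) ;;                                    # long options are not database selectors
      +?*) included="$included ${arg#+}.hoo" ;;  # +name: search name.hoo
      -?*) excluded="$excluded ${arg#-}" ;;      # -name: exclude that package
    esac
  done
  # With no +name flags at all, fall back to the default database.
  [ -n "$included" ] || included=" default.hoo"
  echo "search:${included}"
  [ -z "$excluded" ] || echo "exclude:${excluded}"
}
```

For example, select_databases +filepath +base prints "search: filepath.hoo base.hoo", while calling it with no +name arguments prints "search: default.hoo".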

Hoogle looks for databases in the current directory, in the data directory specified by Cabal, and in any --include directories passed at the command line.

Step 1: Creating a Textbase

A Textbase is a textual representation of a function database. To generate a textbase you need to install the darcs version of Haddock, then use runhaskell Setup haddock --hoogle on your package. For filepath, this will create the file dist/doc/html/filepath/filepath.txt, which is a textbase.
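
As a rough sketch, this step could be wrapped in a small script. The helper names textbase_path and make_textbase are hypothetical, and the output path simply generalises the filepath example above, which is an assumption for other packages.

```shell
#!/bin/sh
# textbase_path and make_textbase are hypothetical helper names.

# Path layout taken from the filepath example in the post;
# assumed (not verified) to be the same for other packages.
textbase_path() {
  printf 'dist/doc/html/%s/%s.txt\n' "$1" "$1"
}

# Run from the package's source directory, with a Haddock that
# supports --hoogle already installed.
make_textbase() {
  out="$(textbase_path "$1")"
  runhaskell Setup haddock --hoogle || return 1
  if [ -f "$out" ]; then
    echo "textbase written to $out"
  else
    echo "expected textbase not found at $out" >&2
    return 1
  fi
}
```

Calling make_textbase filepath from the filepath source tree would run Haddock and then check that dist/doc/html/filepath/filepath.txt exists.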

Step 2: Converting a Textbase to a Database

To convert a textbase to a database use the command hoogle --convert=filepath.txt in the appropriate folder. If a package depends on any other packages, then adding +package will allow Hoogle to use the dependencies to generate a more accurate database. In the case of filepath, which depends on base, we use hoogle --convert=filepath.txt +base. This command requires base.hoo to be present.

Adding the dependencies is not strictly necessary, but will allow Hoogle to generate a more accurate database. For example, the base package defines type String = [Char]; without the +base flag, this type synonym would not be known to Hoogle.

We now have filepath.hoo, which can be used as a search database.
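
The conversion command can be assembled mechanically from a package name and its dependencies. The sketch below only prints the command, using the flags quoted in this post; convert_command and check_deps are hypothetical helpers, and check_deps looks only in the current directory, which is a simplification of Hoogle's real search path.

```shell
#!/bin/sh
# convert_command and check_deps are hypothetical helper names.

# Print the conversion command for a package and its dependencies.
# Usage: convert_command PACKAGE [DEPENDENCY...]
convert_command() {
  pkg="$1"
  shift
  cmd="hoogle --convert=$pkg.txt"
  for dep in "$@"; do
    cmd="$cmd +$dep"
  done
  echo "$cmd"
}

# Check that each dependency's .hoo file exists in the current directory.
# Simplification: Hoogle also searches other directories.
check_deps() {
  for dep in "$@"; do
    if [ ! -f "$dep.hoo" ]; then
      echo "missing $dep.hoo" >&2
      return 1
    fi
  done
}
```

Here convert_command filepath base prints "hoogle --convert=filepath.txt +base", which can be run once check_deps base succeeds in the folder containing filepath.txt.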

Step 3: Combining Databases

To generate a database comprising both filepath and base, type hoogle --output=default.hoo --combine=filepath.hoo --combine=base.hoo. By combining databases you allow easy access to common groups of packages, and searching all these packages at once becomes faster than listing each database separately.
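
For more than a couple of databases, the combine command gets long, so it may help to build it from a list. This sketch only prints the command, using the --output and --combine flags shown above; combine_command is a hypothetical helper name.

```shell
#!/bin/sh
# combine_command is a hypothetical helper name.

# Print the command that combines several databases into one.
# Usage: combine_command OUTPUT.hoo INPUT.hoo [INPUT.hoo...]
combine_command() {
  out="$1"
  shift
  cmd="hoogle --output=$out"
  for db in "$@"; do
    cmd="$cmd --combine=$db"
  done
  echo "$cmd"
}
```

For example, combine_command default.hoo filepath.hoo base.hoo prints "hoogle --output=default.hoo --combine=filepath.hoo --combine=base.hoo".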

Web Version Databases

The web version uses the Hackage tarballs to generate documentation for most of its databases, but also has three custom databases:


  • base - the base package is just too weird, and isn't even on Hackage. A darcs version and some tweaking is required.

  • keyword - the keyword database is a list of the keywords in Haskell, and is taken from the web page on the wiki.

  • hackage - the hackage database is a list of all the packages on Hackage, indexed only by the package name.



All the code for generating the web version databases is found in data/generate in the Hoogle darcs repo at http://code.haskell.org/hoogle.

Future Improvements

There are two database related tasks that still need to be done: Cabal integration and indexing all of Hackage.

Bug 80: In the future I would like Hoogle databases to be generated by Cabal automatically on installing a package. Unfortunately, I don't have the time to implement such a feature currently, and even if I did implement it, I'm unlikely to ever use it. If anyone wants to work on this, please get in contact. This is mainly a project working with Cabal.

Bug 79: The other work is to index all the packages on Hackage. The problem here is generating the textbases; once they have been created the rest is fairly simple. However, to run Haddock 2 over a package requires that the package builds, and that all the dependencies are present. Unfortunately my machine is not powerful enough to cope with the number of packages on Hackage. Hopefully at some point the machinery that builds Haddock documentation for Hackage will also generate textbases. In the meantime, if someone wants to take on the task of generating textbases for Hackage, please get in contact.

Bug Tracker

I'm not working on Hoogle full-time anymore, so I am using my bug tracker to keep track of outstanding issues. In order to interact more effectively with my bug tracker, you might want to read this guide. It describes how to vote for bugs, among other things.

2 comments:

Jim said...

Thanks for explaining this! I can generate .hoo files for the packages now, but is it possible for me to generate a new base for my installed ghc-6.12.3? Thanks.

Neil Mitchell said...

Jim: Unfortunately not, generating packages for base is always a large amount of work, and usually requires some manual tweaking. I hope to get new packages for the new base done in the next few weeks.