Tuesday, January 27, 2015

Hoogle 5 is coming

Summary: I'm working on Hoogle 5. If you like unfinished software, you can try it.

For the last month I've been working on the next version of Hoogle, version 5. It isn't finished, and it isn't the "official" version of Hoogle yet, but it's online, you can try it out, and anyone who wants to hack on Hoogle (or things like hoogle-index) should probably take a look now.

How do I try it?

The purpose of this blog post isn't to solicit beta testers (it's not ready for that); I'm mostly reaching out to Hoogle developers. That said, I know some people will want to try it, so it's online at hoogle.haskell.org. Beware, it isn't finished, and haskell.org/hoogle remains the "official" version to use (but hey, use whichever you want, or both, or neither).

What's new for users of the website?

  • It isn't finished. In particular, type search hasn't been implemented. But there are also lots of other pieces in all the corners that don't work properly.
  • It searches all the packages on Stackage by default.
  • There is a drop-down to allow you to restrict focus to only a single package, or an author, or a tag.

What's new for everyone else?

  • There is no library, it's not on Hackage and the command line tool isn't designed for users. These things will be coming over time.
  • It's hosted on a haskell.org virtual machine, served through Warp rather than as a CGI program. Thanks to the Haskell Infrastructure team for all their help. As a result, I'm free to experiment with Hoogle without accidentally making the Haskell homepage say "moo".
  • Generating the database for all of Stackage (excluding download time) takes about 40s and uses < 1GB of memory. The databases are built directly out of the .tar.gz files, without unpacking them.
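
As a rough illustration of reading a package's files straight out of a .tar.gz without unpacking it to disk, here is a minimal sketch. It is not Hoogle's code (Hoogle is written in Haskell); it uses Python's standard tarfile module purely to show the idea, and the archive contents and the helper names (iter_tar_gz_members, build_demo_archive, collect_cabal_files) are made up for the example.

```python
import io
import tarfile


def iter_tar_gz_members(data: bytes):
    """Yield (name, contents) for each regular file in a gzipped tarball.

    The archive is read from memory; nothing is extracted to disk.
    """
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for member in archive:
            if member.isfile():
                handle = archive.extractfile(member)
                yield member.name, handle.read()


def build_demo_archive() -> bytes:
    """Build a tiny in-memory .tar.gz so the example is self-contained.

    The file names and contents are hypothetical stand-ins for a package.
    """
    files = [
        ("demo-1.0/demo.cabal", "name: demo\n"),
        ("demo-1.0/src/Demo.hs", "module Demo where\n"),
    ]
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, text in files:
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def collect_cabal_files(data: bytes) -> dict:
    """Pick out only the .cabal files, decoded as text."""
    return {
        name: contents.decode("utf-8")
        for name, contents in iter_tar_gz_members(data)
        if name.endswith(".cabal")
    }


if __name__ == "__main__":
    for name, contents in iter_tar_gz_members(build_demo_archive()):
        print(name, len(contents))
```

The point of the sketch is only that each file can be consumed as the archive is read, so no temporary directory of unpacked packages is needed.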

What's next?

Update: the hogle code is now on the master branch of the hoogle repo, and the hoogle v4 code is on branch hoogle4.

Hoogle 5 is a complete rewrite of Hoogle, stealing bits as they were useful. At the moment Hoogle 4 is hosted at https://github.com/ndmitchell/hoogle while version 5 is at https://github.com/ndmitchell/hogle. The next step is to rename and move github repos so they are sensible once again. My best guess is that I should rename hoogle to hoogle4, then rename hogle to hoogle and move the issue tickets. I'm open to other suggestions.

Once that's resolved, I need to continue fixing and improving Hoogle so all the things that aren't finished become finished. If you are interested in helping, I recommend you create a github ticket describing what you want to do and we can take it from there.


Daniel Swe said...

Maybe a silly question, but why not make a Hoogle 5 branch instead of a completely new repo? I think that would be less confusing for people and in turn get more people to contribute. Also, I would love to learn the absolute basics of how the search algorithm works.

Neil Mitchell said...

Daniel, that's the other obvious alternative. The only issue is that hoogle4 then becomes a branch in the existing repo: it's harder to get back at the old code, it isn't as easily searchable/viewable on GitHub, and I need two checkouts of the same repo at different branches. Basically, it's leaning more on git and less on GitHub, and my git-fu is relatively weak. But if the consensus is that's a better solution, I'll happily go for it (there are certainly clear advantages as well).

Jaseem Abid said...

Neil, using a different repository for multiple versions is not a new idea, and is used by projects like ZMQ. It has its benefits, but I think it's a bad idea. The extra git stuff to learn is not much, and genuinely those are the best parts of git. You could rename the current master to hoogle4 and call the current hoogle5 branch master.

Keeping everything in one place makes things really simple in the long run. You could just tag the bugs with `4` or `5` on GitHub to distinguish things rather than switch between two issue trackers. Backporting fixes from 5 to 4 will be just a cherry-pick. You get all the version history. People have one place to look for all code, new and historical.

I'm more than willing to put in the effort to make this transition easy for you if you need any help at all.

Neil Mitchell said...

Daniel: I didn't really mention much about how the search algorithm works. There are the text algorithms and the type algorithms. The text ones are relatively standard, ranging from sorted lists to an FM-index, and Hoogle 5 is likely just to be brute force. The type search has gone through 4 iterations, and the best information is probably here - http://community.haskell.org/~ndm/downloads/slides-hoogle_finding_functions_from_types-16_may_2011.pdf. Hoogle 5 will probably have radically different type search again, and I'll blog that when I see it.

Jaseem, thanks for your comments. I don't think there will be any cherry picking (they share almost no code), but having everything in one repo does seem to be the favored solution, so I'll go that way. Thanks for the offer of help, but I have a few colleagues who are Git experts and have helped me previously, so I'll try them first - taking a look, I might even be able to do it on my own :)

Neil Mitchell said...

I have now put the hoogle v5 code on the master branch of the hoogle repo, and put the hoogle v4 code on the hoogle4 branch, as most people suggested.

Daniel Swe said...

Thanks Neil. I have downloaded the slides and will take a look with a cup of Joe.