For the last few days I've been rewritting the parsing in Hoogle to use parsec. As an end result, the parser is a lot more powerful, and more maintainable, and extendable - on the downside its longer and more complex.
The thing that most impressed me about parsec was its compositionality. In Hoogle there are type signatures, which are both on database lines (map :: (a -> b) -> [a] -> [b]), and there are user queries which might have a type signature in them, amongst other junk. Thanks to parsec I can use the same type signature parser in both of them, with extensions for the relevant bits. I couldn't really do this with a traditional yacc/bison/happy parser generator. Its also great for the fuzzy nature of user searches - you don't want to parse error if there is any sensible user interpretation of what they wrote.
Thanks to this rewrite, I now get a few query goodies that I always wanted but could never properly parse. Included in this are multiple words "concat map", names and type signatures "map :: [a] -> [a]" and the search parser now checks for command line options, which stops the bugs with things like "->" being misinterpretted.
Wednesday, August 23, 2006
Thursday, August 17, 2006
Neil vs Cabal
Today I've been trying to get into cabal, since it seems a pretty cool thing, and it looks like the way forward.
As I've been doing this, trying to compile various projects using Cabal, it turns out that I spent all day encountering bugs! I've hit things that seem a bit curious in loads of programs, sent off patches, reported bugs, asked for clarification etc. Hopefully this will be fixed at some point soon, after enough people have bashed through it.
In particular, to try and get this going better, I'm going to try and keep the HEAD versions of various projects compiled regularly with Cabal on Windows - and then probably distribute the binaries as part of my Haskell on Windows drive.
I'm also trying to get Hoogle working properly with Cabal, since thats going to be the future way of building it, probably.
Which brings me on to a final question about the Hoogle license, what should it be? Currently Hoogle is licensed under Creative Commons Attribution-NonCommercial-ShareAlike License. Nothing else in the Haskell world is, so its not particularly sensible that Hoogle is. My basic thoughts are GPL vs BSD. What do people think one way or the other?
As I've been doing this, trying to compile various projects using Cabal, it turns out that I spent all day encountering bugs! I've hit things that seem a bit curious in loads of programs, sent off patches, reported bugs, asked for clarification etc. Hopefully this will be fixed at some point soon, after enough people have bashed through it.
In particular, to try and get this going better, I'm going to try and keep the HEAD versions of various projects compiled regularly with Cabal on Windows - and then probably distribute the binaries as part of my Haskell on Windows drive.
I'm also trying to get Hoogle working properly with Cabal, since thats going to be the future way of building it, probably.
Which brings me on to a final question about the Hoogle license, what should it be? Currently Hoogle is licensed under Creative Commons Attribution-NonCommercial-ShareAlike License. Nothing else in the Haskell world is, so its not particularly sensible that Hoogle is. My basic thoughts are GPL vs BSD. What do people think one way or the other?
Wednesday, August 16, 2006
Hoogle 4 plans
After spending about the last 3 months replying to most people's Hoogle comments with "that will be fixed in Hoogle 4", its about time I actually implement Hoogle 4. Just to give people an idea of where I'm going, I thought I'd summarise what Hoogle 4 means to me :)
First off, I abandoned .ths after talking with Niklas who does Haskell-source-exts and HSP, and will be using his stuff. Its got advantages of tag safety and a better syntax, lacks a few bits, but being written by someone else saves me a bit of work. Its also well supported, something thats essential!
Anyway, the plans for Hoogle 4 fall into a few areas:
No bugs: type classes, type aliases, higher kinded type classes - all these things confuse Hoogle. Either they are bugs, or they come close enough to count as them. These will all be fixed.
Help the user: searching for ThreadID doesn't work (its ThreadId), searching for Just a doesn't work, searching for Maybe -> Int doesn't work, all just fail silently. I want to tell people when they are searching a bit dubiously and fix it for them.
Do what users keep asking for: often users do searches for multiple words, "map concat", this currently gives them very confusing results (the type "m a" is equivalent).
Be a database: I want to give more database like features, lookup Just will give you the functions that use it.
Faster: I want to make text searching 100's of times faster, which isn't so the results come back faster but so that...
Packages: I want to be able to search packages other than the default ones, such as Gtk2hs (which you can already Hoogle search), and lots lots more. Which requires a speed boost.
AJAX: I have a few good AJAX ideas for making searching just a little bit quicker.
Lots to do, and will probably be an entire rewrite (again...), but this is hopefully going to be the version that sticks arond for a very long time and comes out of Beta.
First off, I abandoned .ths after talking with Niklas who does Haskell-source-exts and HSP, and will be using his stuff. Its got advantages of tag safety and a better syntax, lacks a few bits, but being written by someone else saves me a bit of work. Its also well supported, something thats essential!
Anyway, the plans for Hoogle 4 fall into a few areas:
No bugs: type classes, type aliases, higher kinded type classes - all these things confuse Hoogle. Either they are bugs, or they come close enough to count as them. These will all be fixed.
Help the user: searching for ThreadID doesn't work (its ThreadId), searching for Just a doesn't work, searching for Maybe -> Int doesn't work, all just fail silently. I want to tell people when they are searching a bit dubiously and fix it for them.
Do what users keep asking for: often users do searches for multiple words, "map concat", this currently gives them very confusing results (the type "m a" is equivalent).
Be a database: I want to give more database like features, lookup Just will give you the functions that use it.
Faster: I want to make text searching 100's of times faster, which isn't so the results come back faster but so that...
Packages: I want to be able to search packages other than the default ones, such as Gtk2hs (which you can already Hoogle search), and lots lots more. Which requires a speed boost.
AJAX: I have a few good AJAX ideas for making searching just a little bit quicker.
Lots to do, and will probably be an entire rewrite (again...), but this is hopefully going to be the version that sticks arond for a very long time and comes out of Beta.
Labels:
hoogle
Sunday, August 13, 2006
.ths (Textual Haskell Source)
I've been doing some work on Hoogle 4 over the last week while I've been away from a computer. Lots of cool new ideas, some paper code, and other goodies - will probably be a few weeks before I start to crank out implementations and improvements get seen on the main Hoogle site, and perhaps 2 or 3 months before Hoogle 4 starts to take shape.
Anyway, one thing Hoogle needs to do is to output a web page, and at the moment it does that by reading in text files, and writing them out. To do a typical search page it shoves out a standard prefix, a top bit, the answers and a suffix. Only the answers are generated on the fly, the others are included in. Of course, this means that the HTML is in 4 places, and the reusability is poor (files are chunks of text, not functions). The pages also have small tweaks to them based on dynamic data - for example the title of the page is the search query. To accomodate this I had to add $ replacement, so $ becomes the search query. Messy, and not very general.
So to answer all this, I devised .ths - Textual Haskell Source. Currently you have .hs (source code is the main thing), .lhs (comments are the main thing) and now you have .ths (text is the main thing). Lets start with an example:
> showHtml :: String -> String
> showHtml search = pure
[html]
[head]
[title]<% search %> - Hoogle[/title]
[/head]
Note that in this example I am escaping the code (with > ticks), and the text is just the main bit. I also have <% code %> which is substituted inline.
I can also do more advanced things (naturally)
> showList :: FilePath -> [Bool] -> IO String
> showList filename items =
> src <- readFile filename
The length of <% filename %> is <% length src %>
And the booleans are
<%for i = items %> <%if i %>ON<%else%>off<%end if%><%end for%>
I have all these bits implemented, and hope to make a release in a few days. I kind of have to release, because the current darcs version of Hoogle will be using them in a few days anyway.
And of course, since all this stuff is Haskell, its easier to compose, call as functions, etc.
Thoughts or comments?
Anyway, one thing Hoogle needs to do is to output a web page, and at the moment it does that by reading in text files, and writing them out. To do a typical search page it shoves out a standard prefix, a top bit, the answers and a suffix. Only the answers are generated on the fly, the others are included in. Of course, this means that the HTML is in 4 places, and the reusability is poor (files are chunks of text, not functions). The pages also have small tweaks to them based on dynamic data - for example the title of the page is the search query. To accomodate this I had to add $ replacement, so $ becomes the search query. Messy, and not very general.
So to answer all this, I devised .ths - Textual Haskell Source. Currently you have .hs (source code is the main thing), .lhs (comments are the main thing) and now you have .ths (text is the main thing). Lets start with an example:
> showHtml :: String -> String
> showHtml search = pure
[html]
[head]
[title]<% search %> - Hoogle[/title]
[/head]
Note that in this example I am escaping the code (with > ticks), and the text is just the main bit. I also have <% code %> which is substituted inline.
I can also do more advanced things (naturally)
> showList :: FilePath -> [Bool] -> IO String
> showList filename items =
> src <- readFile filename
The length of <% filename %> is <% length src %>
And the booleans are
<%for i = items %> <%if i %>ON<%else%>off<%end if%><%end for%>
I have all these bits implemented, and hope to make a release in a few days. I kind of have to release, because the current darcs version of Hoogle will be using them in a few days anyway.
And of course, since all this stuff is Haskell, its easier to compose, call as functions, etc.
Thoughts or comments?
Subscribe to:
Posts (Atom)