This week I was going to tackle type searching, but then realised I'm going to spend 6 hours on a train on Friday (hence the weekly update on Thursday), so I can spend that time productively, tackling type search on paper. So instead of type search, I worked on a few other pieces, some of which make type search easier:
Haddock Database Generation: More patches to get better output from Haddock. The code now handles class methods properly, and deals with some FFI bits.
Lazy Name Searching: Searching for a name is now fairly lazy. Hoogle can return a prefix of the results without doing the computation needed to calculate all of them. This work is useful in its own right, but is also necessary for type searching, and can be reused there.
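To illustrate why laziness helps here, the following is a minimal sketch and not Hoogle's actual code. The names `Item`, `searchName` and `firstResults` are hypothetical, and the database is assumed to be a plain list of names with no ranking.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical: a database entry is just a qualified name.
type Item = String

-- Produce matches lazily. The result list is built on demand, so a
-- consumer that only wants the first few results never forces a scan
-- of the rest of the database.
searchName :: String -> [Item] -> [Item]
searchName query = filter (isInfixOf (lower query) . lower)
  where lower = map toLower

-- Take a prefix of the results. Only as much of the database as is
-- needed to find n matches is examined.
firstResults :: Int -> String -> [Item] -> [Item]
firstResults n query = take n . searchName query
```

Because the results are an ordinary lazy list, `firstResults` even terminates on an unbounded database, provided enough matches exist.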
Hoogle --info: The biggest feature added this week is the --info flag. When this flag is given, Hoogle picks the first result and gives more details, including any Haddock documentation associated with the function. For example:
$ hoogle +tagsoup openurl --info
Text.HTML.Download openURL :: String -> IO String
This function opens a URL on the internet. Any http:// prefix is ignored.
> openURL "www.haskell.org/haskellwiki/Haskell"
Known Limitations:
* Only HTTP on port 80
* Outputs the HTTP Headers as well
* Does not work with all servers
It is hoped that a more reliable version of this function will be placed in a new HTTP library at some point!
Next week: Type searching! See last week for a description of what I hope to achieve.
User visible changes: The --info flag now exists.
Thursday, June 26, 2008
Sunday, June 22, 2008
GSoC Hoogle: Week 4
This week I've stayed in one place, and had lots of opportunity to get on with Hoogle. I've done a number of different things this week:
More on Haddock databases: I fixed a number of issues with the Haddock generated Hoogle information. These patches have been submitted back to Haddock.
Binary Defer library: I merged the binary defer library into the Hoogle sources, and modified it substantially. Some of the modifications were thanks to suggestions from the Haskell community, particularly David Roundy. The library is now more robust, and is being used as a solid foundation to build the rest of Hoogle on top of.
Text Searching: You can now search for words, even multiple words, and the search will be performed. The text searching uses efficient data structures, scales excellently, and returns better results first.
Suggestions: These improvements were detailed earlier in the week.
Next week: Type searching. I have various ideas on how to go about this, but it is the most tricky part of the whole project. I hope to come up with the perfect solution by the end of the week, but if not, will come up with something good enough for Hoogle 4 then revise it after the Summer is over (it could easily suck in a whole Summer of time if I am not careful!). Much of the low-level infrastructure is already present, so it is just the search algorithm.
User visible changes: Text searching works. A session with Hoogle as it currently stands:
> cabal haddock --hoogle
-- generates tagsoup.txt
> hoogle --convert=tagsoup.txt
Generating Hoogle database
Written tagsoup.hoo
> hoogle +tagsoup is open --color
Text.HTML.TagSoup.Type isTagOpen :: Tag -> Bool
Text.HTML.TagSoup.Type isTagOpenName :: String -> Tag -> Bool
Wednesday, June 18, 2008
Hoogle 4 New Features
I'm still developing Hoogle 4, and there are many things that don't work (such as searching for types and the web version). However, it's starting to come together, and I'm beginning to implement new features that aren't in Hoogle 3. Today I've implemented two useful features.
Multi Word Text Search
In Hoogle 3, if you entered "is just" it would be treated as a type search, exactly the same as "m a". Now, it will search for "is" and search for "just" and intersect the results. This seems to be something that people often try, so hopefully this will make Hoogle more intuitive.
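The search-then-intersect idea can be sketched as follows. This is an illustration under simplifying assumptions, not the Hoogle 4 implementation: the database is a plain list of names, matching is a case-insensitive substring test, and `searchWord` and `searchWords` are hypothetical names.

```haskell
import Data.Char (toLower)
import Data.List (isInfixOf)

-- Hypothetical stand-in for the database: a plain list of names.
type Name = String

-- All names containing the given word, ignoring case.
searchWord :: [Name] -> String -> [Name]
searchWord db w = filter (isInfixOf (lower w) . lower) db
  where lower = map toLower

-- Search for each word separately, then intersect the result sets.
-- An empty query matches everything; database order is preserved.
searchWords :: [Name] -> String -> [Name]
searchWords db query = foldr keep db (words query)
  where keep w acc = filter (`elem` searchWord db w) acc
```

With a database of `isJust`, `isNothing` and `fromJust`, the query "is just" keeps only `isJust`, since it is the only name found by both word searches.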
Intelligent Suggestions
Hoogle 3 tries to give suggestions; for example, if I search for "a -> maybe a" it will helpfully suggest "a -> Maybe a". Unfortunately it's not that clever: if your search term contains a type variable (starting with a lower-case letter) that is more than one letter long, it will suggest you try the capitalised version. For example, "(fst,snd) -> snd" will suggest "(Fst,Snd) -> Snd", which isn't very helpful.
The new mechanism uses knowledge about the types, arities and constructors present in the Hoogle database. Some examples:
"Just a -> a" ===> "Maybe a -> a"
"a -> Maybe" ===> "a -> Maybe b"
"a -> MayBe a" ===> "a -> Maybe a"
"a -> maybe a" ===> "a -> Maybe a"
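One way such suggestions could be computed is sketched below. It is an assumption-laden illustration rather than Hoogle's real mechanism: the `Type` AST, the `knownTypes` and `knownCtors` tables, and the functions `resolve`, `repair` and `suggest` are all hypothetical, and tuples and other syntax are left out.

```haskell
import Data.Char (toLower)
import Data.List (find, mapAccumL)

-- A tiny type AST: an applied name (type or variable) or a function.
data Type = App String [Type] | Fun Type Type
  deriving (Eq, Show)

-- Hypothetical database knowledge: type names with their arities, and
-- data constructors with the type they construct.
knownTypes :: [(String, Int)]
knownTypes = [("Maybe", 1), ("Bool", 0), ("Either", 2)]

knownCtors :: [(String, String)]
knownCtors = [("Just", "Maybe"), ("Nothing", "Maybe"), ("True", "Bool")]

-- Resolve a name to a known type: an exact match, a data constructor
-- used in type position, or a multi-letter name differing only in case.
resolve :: String -> Maybe (String, Int)
resolve name
  | Just ar <- lookup name knownTypes = Just (name, ar)
  | Just ty <- lookup name knownCtors = fmap ((,) ty) (lookup ty knownTypes)
  | length name > 1 = find (\(t, _) -> lower t == lower name) knownTypes
  | otherwise = Nothing
  where lower = map toLower

-- Rewrite a type, threading a supply of unused variable names that are
-- consumed when a known type has too few arguments.
repair :: [String] -> Type -> ([String], Type)
repair vs (Fun a b) =
  let (vs1, a') = repair vs a
      (vs2, b') = repair vs1 b
  in (vs2, Fun a' b')
repair vs (App name args) =
  let (vs1, args') = mapAccumL repair vs args
  in case resolve name of
       Nothing -> (vs1, App name args')
       Just (ty, arity) ->
         let (new, vs2) = splitAt (arity - length args') vs1
         in (vs2, App ty (args' ++ [App v [] | v <- new]))

-- All names mentioned in a type.
names :: Type -> [String]
names (Fun a b) = names a ++ names b
names (App n args) = n : concatMap names args

-- Suggest a corrected type, using only single-letter variables that do
-- not already occur in the query as padding.
suggest :: Type -> Type
suggest t = snd (repair unused t)
  where unused = [[c] | c <- ['a' .. 'z'], [c] `notElem` names t]
```

Under these tables the four examples above come out as listed, while an unknown multi-letter variable such as `fst` is left alone instead of being capitalised.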
Sunday, June 15, 2008
darcs over FTP
I'm currently unable to access SSH, and suspect this situation will persist for most of the Summer. Most of my darcs repos are behind SSH, so this presents a problem. I've been looking for a way to work with darcs over FTP, and have managed to get it going on Windows. The following are instructions for (1) me when I forget them and (2) any Windows users who want to follow the same path. If you are a Linux user, then similar information is available from this blog post.
Step 1: Install Sitecopy
Go to http://dennisbareis.com/freew32.htm and download and install SITECPY.
Add "C:\Program Files\SITECOPY" to your path.
Set a %HOME% environment variable to "C:\Home".
Open up a command line and type:
c:\> mkdir home
c:\> cd home
c:\home> mkdir .sitecopy
c:\home> echo . > .sitecopyrc
Step 2: Prepare the FTP site
Go to the FTP site, and create a directory. In my particular example, I created darcs/hoogle so I could mirror the Hoogle repo.
Step 3: Configure Sitecopy
Edit the file "c:\home\.sitecopyrc" to contain:
site hoogle
server ftp.york.ac.uk
username ndm500
local C:\Neil\hoogle
remote web/darcs/hoogle
port 21
Obviously, substitute in your own details.
Step 4: Initialise Sitecopy
Type:
sitecopy --init hoogle
darcs push
Now to do a darcs push, you can type:
sitecopy --update hoogle
The first copy will take a long time, but subsequent copies should be a lot faster.
darcs pull
After all this, you can either pull using FTP, or if your FTP is also a web site, you can pull over http. For example:
darcs get http://www-users.york.ac.uk/~ndm500/darcs/hoogle/
GSoC Hoogle: Week 3
This week I've travelled a further 600 miles by train, but am now starting to get settled for the Summer, and down to work on Hoogle.
My main focus this week has been getting Haddock to generate Hoogle databases. For Haddock 0.8 I added in a --hoogle flag to generate Hoogle databases, and a similar --hoogle flag to Cabal. Unfortunately, for Haddock 2.0, the feature was removed as most of the code got rewritten. Now I've added the feature back, making extensive use of the GHC API to reduce the amount of custom pretty-printing required, and to support more Haskell features. The code has been added to the development Haddock branch, and will be present in the next release.
Most of the challenge was working with the GHC API. It's certainly a powerful body of code, but suffers from being inconsistent in various places and poorly documented. I mainly worked with the code using :i to view the API. I got bitten by various problems such as the Outputable module exporting useful functions such as mkUserStyle :: QueryQualifies -> Depth -> PprStyle, but not exporting any functions that can create a Depth value, and therefore not actually being usable. If Hoogle and Haddock could be used over the GHC API, it would substantially improve the development experience!
I've also worked more on defining the database format. I am about to start work on the implementation today. I've also added a few more command line flags, but mainly as placeholders.
Next week: Database creation and text searches (looking back I see some similarity to last week!)
User visible changes: haddock --hoogle now works.
Monday, June 09, 2008
GSoC Hoogle: Week 2
This week I submitted my PhD thesis, emptied my entire rented house of furniture, spent £96 on petrol, drove (or was driven) 400 miles, travelled a similar distance by train, have been to the north of Scotland and am currently working on a borrowed Mac in London. Needless to say, it's been rather busy - but now all the excitement is over and I should be able to focus properly on Hoogle.
In the last week I've been focusing on the database, the store of all the function names and type signatures, so a very critical piece of information. I want to support fast searching, which doesn't slow down as the number of known functions increases - a nasty property of the current version. For text searching, the trie data structure has this nice property, and can deal with searching for substrings. For fuzzy type searching, things are a lot more complex. However, I think I have an algorithm which is fast (few operations), accurate (gives better matches), scalable (independent of the number of functions in the database) and lazy (returns the best results first). The idea is to have a graph of function results, and then navigate this graph to find the best match.
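A minimal sketch of a trie that supports substring search is shown below. It is not Hoogle's database format: the `Trie` type and the functions `insert`, `lookupPrefix`, `collect` and `fromNames` are hypothetical, and substring search is obtained by the standard trick of indexing every suffix of each name. It assumes the `containers` package that ships with GHC.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.List (tails)

-- A trie node: the values whose key ends here, plus one child per character.
data Trie a = Trie (Set.Set a) (Map.Map Char (Trie a))

empty :: Trie a
empty = Trie Set.empty Map.empty

-- Insert a value under a key, creating nodes along the path as needed.
insert :: Ord a => String -> a -> Trie a -> Trie a
insert [] v (Trie vs kids) = Trie (Set.insert v vs) kids
insert (c:cs) v (Trie vs kids) = Trie vs (Map.insert c (insert cs v child) kids)
  where child = Map.findWithDefault empty c kids

-- All values stored at or below a node.
collect :: Ord a => Trie a -> Set.Set a
collect (Trie vs kids) = Set.unions (vs : map collect (Map.elems kids))

-- Follow the query one character at a time, then gather everything
-- below. Reaching the node depends on the query length, not on how
-- many names the trie holds.
lookupPrefix :: Ord a => String -> Trie a -> Set.Set a
lookupPrefix [] t = collect t
lookupPrefix (c:cs) (Trie _ kids) =
  maybe Set.empty (lookupPrefix cs) (Map.lookup c kids)

-- Index every suffix of each name, so that any substring of a name is
-- a prefix of one of its indexed suffixes.
fromNames :: [String] -> Trie String
fromNames = foldr add empty
  where add name t = foldr (\suffix -> insert suffix name) t (tails name)
```

For example, `lookupPrefix "Just" (fromNames ["isJust", "fromJust", "map"])` finds both `isJust` and `fromJust`. The cost of this simple version is space, since each name is stored once per suffix.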
Most of the database work has been theoretical, but I have done some coding. In particular, I have started on the database creation code, and polished the flag argument interaction code some more. Part of the development required the Derive tool, and in doing this work I noticed a few deficiencies. In particular, if you run Windows and run derive over a UNIX line-ending file, the tool will generate a Windows line-ending file. This problem, and a few others, are now fixed.
Next week: Database creation and searching. I want text searches to work by the end of the week.
User visible changes: The --help flag prints out information on the arguments.
PS. I was looking forward to seeing some blog posts from the other Haskell summer of code students on the Haskell Planet. If any Haskell GSoC student does have a blog, please ask for it to be included!
Sunday, June 01, 2008
GSoC Hoogle: Week 1
I started my Google Summer of Code project on Hoogle at the beginning of this week. In my initial application I promised to make my weekly updates via blog, so here is the first week's report:
I've only done about half a week's work on Hoogle this week, because I'm handing in my PhD thesis early next week, and because I'm moving house on Wednesday. I spent 14 hours on Saturday moving furniture, and many more hours than that on my thesis! I should be fully devoted to GSoC by the middle of next week.
Despite all the distractions, I did manage to start work on Hoogle. I created a new project for Hoogle at the community.haskell.org site, and an associated darcs repo at http://code.haskell.org/hoogle. I've done a number of things on Hoogle:
- Improved the developer documentation in some places
- Reorganised the repo, moving away dead files
- Worked on command line flags, parsing them etc.
- Added a framework for running regression tests
- Organised the command line/CGI division
I've started work from the front, and am intending to first flesh out an API and command line client, then move on to the web front end. The biggest change from the current implementation of Hoogle will be that there is one shared binary, which will be able to function in a number of modes. These modes will include running as a web server, as a command line version, as an interactive (Hugs/GHCi style) program, documentation location etc. This will allow easier installation, and let everyone host their own web-based Hoogle without much effort.
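The one-binary, many-modes design could be sketched roughly as below. This is an illustration only, not Hoogle's actual argument handling: the `Mode` type and `selectMode` are hypothetical, and only flags mentioned in these posts (`--help` and `--test`) are recognised. Web server and interactive modes would be further constructors, selected by flags that the posts do not name.

```haskell
-- Hypothetical modes of the single shared binary.
data Mode
  = Help            -- print information on the arguments
  | Test            -- run the regression tests
  | Search [String] -- command line search for the given words
  deriving (Eq, Show)

-- Choose a mode from the raw command line arguments. Any other
-- "--" flags are ignored in this sketch.
selectMode :: [String] -> Mode
selectMode args
  | "--help" `elem` args = Help
  | "--test" `elem` args = Test
  | otherwise = Search (filter (not . isFlag) args)
  where isFlag a = take 2 a == "--"
```

A real `main` would call `selectMode` on the result of `getArgs` and dispatch on the constructor, so every front end shares the same parsing and database code.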
Next week: I hope to move towards the command line client and central Hoogle database structure. I also hope to chat to the Haddock 2 people, and try and get some integration similar to Haddock 1's --hoogle flag.
User visible changes: Hoogle 4 as it currently stands is unable to run searches, although hoogle --test will run some regression tests.