Friday, February 29, 2008

Hoogle 3 Security Bug

I recently released Hoogle 3.1, in response to a security bug spotted by Tillmann Rendel. Checking back through my records, I found that Botje on #haskell had previously spotted the same issue, but at the time I hadn't noticed it was a security bug. The security implications of the bug are very low, and it could not be used to cause any real harm.

The Bug

The bug is that in the Hoogle web interface, user supplied data may end up being shown to the user without escaping. For example, searching for the number 1 results in an error message, which says "Parse Error: Unexpected character '1'". Unfortunately, in Hoogle 3, if that search string had been "1<u>2<u>", then the result would have been "Parse Error: Unexpected character '12'" - i.e. the number 2 would be underlined. If you try this same example in Hoogle 3.1, you do not get any formatting, and see the entered tags.

The bug could be provoked in several places:


  • The error message, on a parse error.

  • The input box, after a search had been entered.

  • As the string listed as what the user searched for.



To perturb the input box would require entering a quote character ("), and to perturb the other instances would require an opening angle bracket (<).

The Severity

I am fairly sure the severity of this bug is "incredibly low". As a result of entering a malicious query, the attacker could cause the page displayed to contain whatever was desired. However, the Hoogle online interface has no privileges beyond that of a normal web page, and so can't actually do anything evil. The bug does not permit any supplied code to be executed on the server.

There is only one malicious use I could think of: browser redirects. Sometimes evil companies will send out spam mail, with links such as "click here to go to example.com and order Viagra". One anti-spam measure is to reject all emails linking to a particular domain name. By crafting a URL, it was possible for a link to Hoogle to redirect to another domain, thus appearing that the initial link was to a trusted website. The spam recipient still goes to the original page, but it may defeat their spam filters.

Checking the server logs for Hoogle shows that no one ever actually exploited the flaw to perform a redirect, or even to insert a <script> tag - the first step to any such exploit.

The Fix

I had to make two fixes to the code. I use Haskell Source Extensions to generate most of the HTML shown in Hoogle. As part of that, I have a ToXML class that automatically converts values to an XML representation, which is then rendered. The ToXML instance for String did not escape special HTML characters, now it does. I wrote the ToXML instances, instead of relying on those supplied in the associated HSP, and thus introduced the bug.

The only other code that generates HTML uses a formatted string type, which can represent hyperlinks and various formatting, and can be rendered as either console escape characters or as HTML. Since this part of the code was written before moving to Haskell Source Extensions, it generates raw strings. This generating code was also patched to escape certain characters.

As a result of using libraries and abstractions, it wasn't necessary to fix each of the security flaws one by one, but to fix the interface to the library. In doing so, I have much more confidence that all the security flaws have been tackled once and for all, and that they will not reoccur.

Is Haskell Insecure?

Enhanced security is one of the many advantages that Haskell offers. It is not possible to overrun a buffer and conduct stack smashing attacks on a Haskell program. Passing query strings will not overwrite global variables, and escaping cannot cause user code to be executed on the server. However, when Haskell code generates HTML, it is not immune from code injection attacks on the client side.

In the beginning Hoogle did not use any HTML generation libraries. As I have slowly moved towards Haskell Source Extensions, I have benefited from better guarantees about well-formed HTML. By creating appropriate abstractions, and dealing with concerns like escaping at the right level, and enforcing these decisions with appropriate types, the number of places to introduce a security bug is lowered. Hopefully Hoogle will not fall victim to such a security problem in future.

Thursday, February 28, 2008

Adding data files using Cabal

Cabal is the standard method of packaging Haskell programs and libraries for release. One problem I've encountered more than once is that adding data files to a Cabal built project is not as easy as it could be. I'm not entirely sure why - having just added data file support to Hoogle, it wasn't excessively painful, but I still came out of the experience feeling slightly bruised. To help others (and my future self), I thought I'd write down the details while they are still freshly spinning round my head.

Let's assume we start with an existing Cabal project, with an associated .cabal file. In the root directory of the project we have readme.txt and data.txt. The readme file contains a basic introduction to the user, and the data file contains some data that the program needs to access at runtime.

We first modify the .cabal file to add the following lines in the top section:


Extra-Source-Files:
readme.txt
Data-Files:
data.txt


The Extra-Source-Files tells Cabal to put the files in the release tarball, but nothing more - for a readme this behaviour is perfect. The Data-Files section tells Cabal that the following files contain data which the program will want to access at runtime. Data files include things like big tables, the hoogle function search database, graphics/game data files for games, UI description files for GUI's, etc.

Now we have added the data file to Cabal's control, Cabal will automatically manage it for us. It will be added to the source tarball, and will be installed somewhere appropriate on the users system, following operating system guidelines. The only question is where Cabal has put the file. To figure this out, Cabal generates a Paths_hoogle module (change the project name as appropriate) which it links in with the program. The Paths module provides the function:


getDataFileName :: FilePath -> IO FilePath


At runtime, to find the data file, we can simply call getDataFileName "data.txt", and Cabal will tell us where the data file resides.

The above method works well after a program has been installed, but is harder to work with while developing a program. To alleviate these problems, we can add our own Paths module to the program, for example:


module Paths_hoogle where

getDataFileName :: FilePath -> IO FilePath
getDataFileName = return


Place this module alongside all the other modules. While developing the program our hand-created Paths module will be invoked, which says the data is always in the current directory. When doing a Cabal build, Cabal will choose its custom generated Paths module over ours, and we get the benefits of Cabal managing our data.

Cabal's support for data files, and extra source files, is very useful. It doesn't take much work to make use of the provided facilities, and it will help to ensure that users of your program on all operating systems get the style of installation they were expecting.