Saturday, March 21, 2009

Concise Generic Queries

A few weeks ago David Miani asked how to write concise queries over a data type. The answer is certainly generic programming, a technique that I feel is underused in the Haskell community. I suggested David look at Uniplate, but he found greater success with SYB. Sean Leather gave a solution using EMGM. One of the advantages of Uniplate is conciseness, so I decided to tackle the same problem and compare.

A full description of the task, including data type definitions, is at Sean's blog. From a data type representing structured files (tables, headings, paragraphs) find a heading with a particular name then within that heading find a paragraph starting with "Description". The rest of this post contains solutions using Uniplate, EMGM (taken from Sean) and SYB (from David). The SYB solution is slightly different from the EMGM or Uniplate solutions, but they all do roughly the same generic operations. It is entirely possible that the EMGM/SYB solutions could be improved, but that is a job for other people.

Uniplate Solution

The Uniplate solution is:


projDesc :: String -> OrgFileP -> Maybe String
projDesc name p = listToMaybe [y |
OrgHeadingP _ x ys <- universeBi p, name == x,
ParagraphP y <- universeBi ys, "Description" `isPrefixOf` y]


The code can be read as:


  • Line 1: Type signature, given a name and a file, return the paragraph if you find one

  • Line 3: Find a heading with the right name

  • Line 4: Find a paragraph below that heading, whose name starts with "Description"

  • Line 2: Pick the paragraph



I find this code to be a clear, concise and simple description of the problem. The thought process to come up with the solution was as follows: You want to search, or perform a query. The first question is whether this is a deep (all nodes) or shallow (just the children) query - David doesn't say but the example seems to imply deep. If it's deep use universeBi. Operations are combined with a list comprehension that finds an element, check it has the necessary properties (the name), then performs more operations. The result is the code you see above.

EMGM Solution

Sean's solution can be found at his blog:


projDesc :: String -> OrgFileP -> Maybe String
projDesc name file = do
hdg <- G.firstr (headings name file)
para <- firstPara hdg
if para =~ "Description" then return para else Nothing

headings :: String -> OrgFileP -> [OrgHeadingP]
headings name = filter check . G.collect
where
check (OrgHeadingP _ possible _) = name == possible

firstPara :: OrgHeadingP -> Maybe String
firstPara hdg = paraStr =<< G.firstr (G.collect hdg)
where
paraStr (ParagraphP str) = Just str
paraStr _ = Nothing


This solution isn't bad, but is more verbose than the Uniplate solution. Perhaps it could be rewritten with list comprehensions? It seems that G.collect is similar to universeBi - although I am not sure.

SYB Solution

David's SYB solution can be found here along with another solution using different combinators.


eitherOr :: Either a b -> Either a b -> Either a b
eitherOr x@(Right _) _ = x
eitherOr _ y = y

getP14Desc :: OrgElement -> Either ErrString String
getP14Desc org = everything eitherOr (Left descError `mkQ` findDesc) =<<
everything eitherOr (Left findError `mkQ` findP14) org
where
findP14 h@(Heading {headingName=name})
| name == "Project14" = Right h
findP14 _ = Left findError

findDesc (Paragraph {paragraphText=text})
| text =~ "Description" = Right text
findDesc _ = Left findError

descError = "Couldn't find description for project"
findError = "Couldn't find project."


Summary

The relative merits of each solution are highly subjective, but I believe the Uniplate solution is concise. The Uniplate solution is a simple translation of the problem, without any clever steps, so hopefully other users (who didn't write the library!) will be able to achieve similar results. The Uniplate solution required only one function from the Uniplate library, so has a small learning curve. Even if you don't choose Uniplate, generic programming techniques are very useful, and can make your code concise and robust.

Monday, March 09, 2009

Website move

Today I spotted that I could no longer push to my darcs repos hosted at York University. A little more checking showed that my home page had also been removed - I guess that's what happens when you are no longer a student there (although a warning email before would have been nice...). So I am pleased to announce my new website address:

http://community.haskell.org/~ndm/

Thanks to the wondrous Haskell community for providing all the resources I needed to move my website with no human intervention at haste. Expect my darcs repos to move somewhere shortly too.

I have now submitted the final bound copies of my thesis, and have uploaded a copy to my website (I had uploaded it to York, but didn't get chance to announce it!). I should say a great thank you to everyone who helped with my work/thesis, in particular Colin Runciman for supervising me for six years, and Detlef Plump and Simon Peyton Jones for examining me and really helping improve the final document with their comments.

The thesis has four content chapters, corresponding to Uniplate, Supero, Firstify and Catch. I have submitted a paper to ICFP 09 which expands/clarifies the Firstify work, which I'll upload as a draft shortly. For the other chapters, the version in the thesis is an improvement on the version in any papers I've published.