Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particular with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:


sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


However, to make a function that is compatible with R, we have to follow two rules:


  • Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)

  • The result must be IO ()



Obeying these restrictions, we need to use the type:


sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...


Instead of passing in the list xs, we now pass in:


  • n, the length of the list xs

  • xs, the elements of the list

  • result, a space to put the result



We can implement sumRootsR by using the functions available in the Foreign module:


sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs


This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:


foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()


Putting everything together, we end up with the Haskell file:


-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


We also need a C stub file. The one described in the GHC users guide works well:


// StartEnd.c
#include <Rts.h>

void HsStart()
{
int argc = 1;
char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

// Initialize Haskell runtime
char** args = argv;
hs_init(&argc, &args);
}

void HsEnd()
{
hs_exit();
}


We can now compile our library with the commands:


ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o


This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:


dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")


We can now invoke our function:


input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result


This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:


sumRoots <- function(input)
{
return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}


Now we can write:


sumRoots(c(12,444.34))


And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.

Saturday, October 01, 2011

Haskell Implementors Workshop 2011 slides

I've just uploaded the slides for all the talks given at the Haskell Implementors Workshop 2011 - just click on "Slides" next to any of the talks. Many thanks to all the speakers for very interesting talks, and for sending me their slides.

There will be video, hosted on either YouTube or Vimeo, at some point in the future. I've got all the copyright permission forms, but I don't yet have the video footage. I'm currently working to get a copy of the video sent from Japan.

Template Haskell fights with Generic Programming

Summary: The InfixE construction in Template Haskell fits poorly with generic programming, because its type does not capture all its restrictions. This mismatch can result in confusing bugs, but there is a simple workaround.

I have often said that anyone manipulating abstract syntax trees, without using some form of generic programming, is doing it wrong. Recently I have been manipulating Template Haskell syntax trees using Uniplate, my preferred generic programming library. Consider the problem of replacing all instances of delete with deleteBy (==) - this task can be done with Template Haskell:


{-# LANGUAGE TemplateHaskell #-}
module IHateDelete where
import Data.List
import Language.Haskell.TH
import Data.Generics.Uniplate.Data

iHateDelete :: Q [Dec] -> Q [Dec]
iHateDelete = fmap (transformBi f)
where
f :: Exp -> Exp
f (VarE x) | x == 'delete = VarE 'deleteBy `AppE` VarE '(==)
f x = x


We can test this function with:


{-# LANGUAGE TemplateHaskell #-}
import IHateDelete
import Data.List

$(iHateDelete
[d|
mapDelete x = map (delete x)
myElem x xs = length (delete x xs) /= length xs
|])


To see the result of running iHateDelete pass the flag -ddump-splices. As far as we can tell, our iHateDelete function works perfectly. But wait - let's try another example:


$(iHateDelete
[d|
myDelete x xs = x `delete` xs
|])


In GHC 6.12, we get a GHC panic. In GHC 7.2 we get the error message:


Operator application with a non-variable operator: deleteBy (==)
(Probably resulting from a Template Haskell splice)


(I would find this message far clearer if it said "Infix application..." rather than "Operation application...")

The body of myDelete starts out as:


InfixE (Just (VarE 'x)) (VarE 'delete) (Just (VarE' xs))


After our transformation, this becomes:


InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) ('(==))) (Just (VarE' xs))


Or, written in Haskell syntax:


x `deleteBy (==)` xs


This expression is not valid Haskell, and causes an error when spliced back in (when inserted back into the Haskell code).

The Problem

The underlying problem is called out in the Template Haskell Exp documentation:


InfixE (Maybe Exp) Exp (Maybe Exp)
-- ^ It's a bit gruesome to use an Exp as the operator, but how else can we distinguish constructors from non-constructors?
-- Maybe there should be a var-or-con type? Or maybe we should leave it to the String itself?


The operator in InfixE has a type which permits any expression, but has the restriction that when spliced back in the expression must only be a VarE or ConE. This restriction poses a problem for generic programming, where values are treated in a uniform manner. Sadly, both of the suggested fixes would also cause problems for generic programming. Perhaps the true fix is to let Haskell have arbitrary expressions for infix operators? Or perhaps Template Haskell should translate InfixE to AppE if the operator is incompatible with Haskell?

The Workaround

As a workaround, you can translate away all InfixE expressions that have a complex middle expression. I use the following function:


fixupInfix :: [Dec] -> [Dec]
fixupInfix = transformBi f
where
bad VarE{} = False
bad ConE{} = False
bad _ = True

f (InfixE a b c) | bad b = case (a,c) of
(Nothing, Nothing) -> b
(Just a , Nothing) -> b `AppE` a
(Nothing, Just c ) -> VarE 'flip `AppE` b `AppE` c
(Just a , Just c ) -> b `AppE` a `AppE` c
f x = x


The original iHateDelete can then be modified to become:


iHateDelete = fmap (fixupInfix . transformBi f)


.