Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particular with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

However, to make a function that is compatible with R, we have to follow two rules:

  • Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)

  • The result must be IO ()

Obeying these restrictions, we need to use the type:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...

Instead of passing in the list xs, we now pass in:

  • n, the length of the list xs

  • xs, the elements of the list

  • result, a space to put the result

We can implement sumRootsR by using the functions available in the Foreign module:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

Putting everything together, we end up with the Haskell file:

-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

We also need a C stub file. The one described in the GHC users guide works well:

// StartEnd.c
#include <Rts.h>

void HsStart()
int argc = 1;
char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

// Initialize Haskell runtime
char** args = argv;
hs_init(&argc, &args);

void HsEnd()

We can now compile our library with the commands:

ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o

This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:

dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library

We can now invoke our function:

input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result

This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:

sumRoots <- function(input)
return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)

Now we can write:


And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.


winterkoninkje said...

Sweet. Now we just need to write a nice tool for autogenerating all the glue, since FFI glue is usually the worst part for beginners. So what sorts of datatypes does R export other than CInt, CDouble, CString, and arrays a la (CInt,Ptr a) ? How does R's GC work in case we wanted to allocate arrays on the Haskell side?

Dominik said...

thx for the post, the last time I tried that I didn't get it to work but I'll try again with your explanation.

R also calls an initialization routine after dyn.load() which you can have call HsStart
(see: Then you could avoid explicitly calling HsStart from R. Doesnt make much difference though,

@winterkoninkje: .C() only exposes basic C datatypes, like double*, int*, char* and so on. To my knowledge you cant allocate objects on the C- or Haskell-side that way, you usually allocate all output objects in R code and pass them along to .C(). There's two other functions, .Call() and .External() that pass you real R objects and where you can also allocate R objects in C/Haskell code, but I think you have to use R API functions to create them and have R's gc handle these objects. I found that usually .C() is good enough and much easier to use.

Jesse said...

"...not languages that are particularly easy for beginners"

Really, Haskell falls into that category as well. The suggestion (for C/Fortran) probably comes from the fact that there is no wrapper code indirection necessary in that case.

The ability to use Haskell instead is still a pretty nice point, however.

scott said...

This seems interesting, but the Haskell you have to write to be compatible with R doesn't look much like idiomatic Haskell.

What's the point of using Haskell when you have to use IO to do something that shouldn't require it?

Audun said...

Instead of the manual loop over the array:
xs <- forM [0..n-1] $ \i ->
peekElemOff xs i

You could do the simpler:
xs <- peekArray n xs

Where peekArray is from Foreign.Marshal.Array.

Neil Mitchell said...

Dominik: Thanks for the tip on HsStart - good idea.

Audun: Thanks for the tip, I've updated the post - that's a much simpler solution.

scott/Jesse: The wrapping code is not idiomatic Haskell, but that's a few lines. The idea is you write a small wrapper Haskell/R wrapper (sumRootsR), and then can write lots of idiomatic Haskell that it calls.

winterkoninkje: I don't intend to do anything else on this work, but if you do anything my wife may be interested!

Dominik said...

BTW to compile on FreeBSD I had to use (and probably something similar on Linux):

ghc -c -fPIC StartEnd.c
ghc -c -fPIC SumRoots.hs
ghc -shared -dynamic -fPIC -o SumRoots.o SumRoots_stub.o StartEnd.o -lHSrts-ghc7.0.3 -optl-Wl,-rpath,/usr/local/lib/ghc-7.0.3/

I found that on http://www.well- . I wonder though if you still have to explicitly link against the right RTS.
For starters making a wrapper for .C would be cool; maybe one can steal some ideas from the inline and Rcpp R packages. Think I'll try that some time.

Neil Mitchell said...

Dominik: Useful information about the FreeBSD command line. I think if you were on GHC 7.2 you wouldn't need to include "SumRoots_stub.o"

James said...

Do you what would be involved in going the other way around (ie calling R code from Haskell code)?

Neil Mitchell said...

No idea I'm afraid, but the GHC manual does say how to call Haskell from Excel, and the same pattern might work, if R can bind to C dlls.