R is a very popular language for statistics, particularly among biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC User's Guide) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.
Create an R-compatible Haskell library
In normal Haskell, we would define the function to add up the square roots of a list as:
    sumRoots :: [Double] -> Double
    sumRoots xs = sum (map sqrt xs)
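As a quick sanity check of the pure definition, here is a small worked example. The name `example` is purely illustrative, and perfect squares are used so the total is easy to verify by hand.

```haskell
-- The pure function from the text.
sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

-- Hypothetical example value: sqrt 4 + sqrt 9 + sqrt 16 = 2 + 3 + 4.
example :: Double
example = sumRoots [4, 9, 16]
```

Evaluating `example` in GHCi should give 9.0.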
However, to make a function that is compatible with R, we have to follow two rules:
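- Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (Strictly speaking, CInt and CDouble are the correct types. Double and CDouble have the same representation, but Int matches CInt only where Haskell's Int is 32 bits, as with 32-bit GHC on Windows. On 64-bit GHC, Int is 64 bits while R passes integers as 32-bit C int, so use Ptr CInt for the length there. The examples here keep Int for simplicity.)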
- The result must be IO ()
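Whether Int and CInt really are interchangeable depends on the platform and the GHC build. A minimal sketch of how one might check, using `sizeOf` from Foreign.Storable; the names `intMatchesCInt`, `doubleMatchesCDouble` and `report` are hypothetical helpers, not part of any library.

```haskell
import Foreign.C.Types (CDouble, CInt)
import Foreign.Storable (sizeOf)

-- Hypothetical helper: does Haskell's Int occupy as many bytes as C's int?
intMatchesCInt :: Bool
intMatchesCInt = sizeOf (0 :: Int) == sizeOf (0 :: CInt)

-- Hypothetical helper: the same comparison for Double and C's double.
doubleMatchesCDouble :: Bool
doubleMatchesCDouble = sizeOf (0 :: Double) == sizeOf (0 :: CDouble)

-- Human-readable summary of both comparisons.
report :: String
report = unlines
    [ "Int    vs CInt:    " ++ show intMatchesCInt
    , "Double vs CDouble: " ++ show doubleMatchesCDouble
    ]
```

Running `putStr report` in GHCi prints both comparisons. If the first one is False, declaring the length argument as Ptr CInt is the safer choice, since R hands integer vectors to `.C` as C int.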
Obeying these restrictions, we need to use the type:
    sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
    sumRootsR n xs result = ...
Instead of passing in the list xs, we now pass in:
- n, the length of the list xs
- xs, the elements of the list
- result, a space to put the result
We can implement sumRootsR by using the functions available in the Foreign module:
    sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
    sumRootsR n xs result = do
        n <- peek n
        xs <- peekArray n xs
        poke result $ sumRoots xs
This function first reads the value of n, then for each index in 0..n-1 reads the corresponding element from the array that xs points to and collects the elements into a list. We then call the original sumRoots and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots) and the wrapping in another (sumRootsR). We can then export this function with the definition:
    foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
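Going back to the element-by-element read inside sumRootsR: that work is done by peekArray. A minimal sketch of the same idea written out by hand may make it more concrete. The names `readElems` and `roundTrip` are hypothetical and are not part of the Foreign module.

```haskell
import Foreign

-- Hypothetical helper: read n elements starting at ptr, one index at a
-- time, mirroring the "for each index in 0..n-1" description.
readElems :: Storable a => Int -> Ptr a -> IO [a]
readElems n ptr = mapM (peekElemOff ptr) [0 .. n - 1]

-- Hypothetical round trip: copy a list into temporary C memory with
-- withArrayLen, then read it back through the raw pointer.
roundTrip :: [Double] -> IO [Double]
roundTrip xs = withArrayLen xs readElems
```

In practice peekArray is the better choice; the sketch only spells out what happens behind it.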
Putting everything together, we end up with the Haskell file:
    -- SumRoots.hs
    {-# LANGUAGE ForeignFunctionInterface #-}
    module SumRoots where

    import Foreign

    foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

    sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
    sumRootsR n xs result = do
        n <- peek n
        xs <- peekArray n xs
        poke result $ sumRoots xs

    sumRoots :: [Double] -> Double
    sumRoots xs = sum (map sqrt xs)
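The rules above mention CInt and CDouble as the pedantic choice. For platforms where Int and C's int differ in size, here is a sketch of how the wrapper might look with the explicit C types. The name `sumRootsC` is hypothetical, and the `foreign export` line is left out so the snippet also loads in GHCi; an export would carry the same type signature.

```haskell
import Foreign
import Foreign.C.Types (CDouble, CInt)

-- The pure function from the text, unchanged.
sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

-- Hypothetical variant of sumRootsR that uses the explicit C types.
sumRootsC :: Ptr CInt -> Ptr CDouble -> Ptr CDouble -> IO ()
sumRootsC nPtr xsPtr resultPtr = do
    n  <- peek nPtr                          -- length, stored as a C int
    xs <- peekArray (fromIntegral n) xsPtr   -- n values stored as C doubles
    -- Convert to Double for the pure logic, then back to CDouble.
    poke resultPtr (realToFrac (sumRoots (map realToFrac xs)))
```

The logic in sumRoots stays untouched; only the marshalling layer changes, which is the reason for keeping the two functions separate.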
We also need a C stub file. The one described in the GHC User's Guide works well:
    // StartEnd.c
    #include <Rts.h>

    void HsStart()
    {
        int argc = 1;
        char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

        // Initialize Haskell runtime
        char** args = argv;
        hs_init(&argc, &args);
    }

    void HsEnd()
    {
        hs_exit();
    }
We can now compile our library with the commands:
    ghc -c SumRoots.hs
    ghc -c StartEnd.c
    ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o
This creates the library SumRoots.dll.
Calling Haskell from R
At the R command prompt, we can load the library with:
    dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
    .C("HsStart")
We can now invoke our function:
    input = c(9,3.5,5.58,64.1,12.54)
    .C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result
This prints out the answer 18.78046.
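The same call can be rehearsed entirely in Haskell, which is handy for testing the wrapper before building the DLL. The sketch below assumes a hypothetical helper `callLikeR` that allocates the three arguments the way the `.C` call does. It exercises only the Haskell side and does not go through R or the compiled library.

```haskell
import Foreign

-- Pure logic and wrapper as in SumRoots.hs (local names changed to avoid shadowing).
sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR nPtr xsPtr resultPtr = do
    n  <- peek nPtr
    xs <- peekArray n xsPtr
    poke resultPtr (sumRoots xs)

-- Hypothetical stand-in for R's .C call: marshal the arguments into
-- temporary C memory, run the wrapper, and read the result back.
callLikeR :: [Double] -> IO Double
callLikeR input =
    withArrayLen input $ \len xsPtr ->   -- xs = as.double(input)
    with len $ \nPtr ->                  -- n = as.integer(length(input))
    alloca $ \resultPtr -> do            -- result = as.double(0)
        poke resultPtr 0
        sumRootsR nPtr xsPtr resultPtr
        peek resultPtr
```

Calling `callLikeR [9, 3.5, 5.58, 64.1, 12.54]` in GHCi should agree with the R result to the digits R displays.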
We can make this function easier to use on the R side by writing a wrapper, for example:
    sumRoots <- function(input) {
        return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
    }
Now we can write:
    sumRoots(c(12,444.34))
And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.
Update: See the comments below from Alex Davis for how to do such things on newer versions of Mac OS.