R is a very popular language for statistics, particular with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.
Create an R-compatible Haskell library
In normal Haskell, we would define the function to add up the square roots of a list as:
sumRoots :: [Double] -> Double sumRoots xs = sum (map sqrt xs)
However, to make a function that is compatible with R, we have to follow two rules:
- Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)
- The result must be IO ()
Obeying these restrictions, we need to use the type:
sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO () sumRootsR n xs result = ...
Instead of passing in the list xs, we now pass in:
- n, the length of the list xs
- xs, the elements of the list
- result, a space to put the result
We can implement sumRootsR by using the functions available in the Foreign module:
sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO () sumRootsR n xs result = do n <- peek n xs <- peekArray n xs poke result $ sumRoots xs
This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:
foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
Putting everything together, we end up with the Haskell file:
-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where
import Foreign
foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
n <- peek n
xs <- peekArray n xs
poke result $ sumRoots xs
sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)
We also need a C stub file. The one described in the GHC users guide works well:
// StartEnd.c
#include <Rts.h>
void HsStart()
{
int argc = 1;
char* argv[] = {"ghcDll", NULL}; // argv must end with NULL
// Initialize Haskell runtime
char** args = argv;
hs_init(&argc, &args);
}
void HsEnd()
{
hs_exit();
}
We can now compile our library with the commands:
ghc -c SumRoots.hs ghc -c StartEnd.c ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o
This creates the library SumRoots.dll.
Calling Haskell from R
At the R command prompt, we can load the library with:
dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")
We can now invoke our function:
input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result
This prints out the answer 18.78046.
We can make this function easier to use on the R side by writing a wrapper, for example:
sumRoots <- function(input)
{
return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}
Now we can write:
sumRoots(c(12,444.34))And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.
Update: See the comments below from Alex Davis for how to do such things on newer versions of Mac OS.