Home > R > user2013: The Rcpp tutorial

user2013: The Rcpp tutorial

I’m at user 2013, and this morning I attended Hadley Wickham and Romain Francois’s tutorial on the Rcpp package for calling C++ code from R. I’ve spent the last eight years avoiding C++ afer having nightmares about obscure pointer bugs, so I went into the room slightly skeptical about this package.

I think the most important takeaway from the tutorial was a clear sense of when and why you might want to use C++.

The main selling point for using C++ with R is, in Hadley’s words, that R is optimised for making programmers efficient whereas C++ is made for making machine efficient, so the langauages are complimentary. That is, most of the time the slow part of doing statistics is you. Occasionally however, the slow part will be running your code, and in those instances C++ is better than R.

In order to write fast R code, it needs to be vectorised, and that often means using different functions to a scalar version. A classic example is using ifelse instead of separate if and else blocks, or using pmax instead of max.

Knowing how to vectorise R code thus requires quite a large vocabulary of functions. In C++ there is no vectorisation – you just write a for loop.

There are three things in particular that C++ does much faster than R: the above mentioned looping, resizing vectors and calling functions. (For the last point, Hadley quoted an overhead of 2ns to call a function in C++ versus 200ns in R.)

This means that C++ is useful for the following restricted use cases:

  1. When vectorisation is difficult or impossible. This is common when one element of a vector depends upon previous elements. MCMC is a classic example.
  2. When you are changing the size of a vector in a loop. Run length encoding was the example given.
  3. When you need to make millions of function calls. Recursive functions and some optimisation and simulation problems fit this category.

Typically C++ can give you an order of magnitude or two speed up over an R equivalent, but this is wildly problem-dependent and many of the built-in functions call C code which will run at the same speed (more or less) as a C++ version. It’s also important to consider how often the code will be run. Even if you have a thousand-fold speedup, if the running time of the R function is 0.01s, then you need to run it 60000 times just to get back the 10 minutes it took you to rewrite it in C++.

Anyway, using Rcpp makes it surprisingly simple to call C++ code. You need to install Rtools under windows, and of course the Rcpp package.

install.packages(c("installr", "Rcpp"))
library(installr)
install.Rtools()

Check that Rcpp is working by seeing if the following expression returns 2.

library(Rcpp)
evalCpp("1 + 1")

Then you can create a C++ function using the cppFunction function. Here’s a reimplementation of the any function. (Although it doesn’t deal with missing values.)

cppFunction('
  bool Any(LogicalVector x)
  {
    for(int i = 0; i < x.size(); ++i)
    {
      if(x[i])
      {
        return true;
      }
    }
    return false;
  }
')

Notice that in C++ you must be explicit about the types of variable that are passed into and returned from a function. How to wrie C++ is beyond the scope of the post, so I’ll say no more.

You can now call the Any function like this.

Any(runif(10) > 0.5) #returns TRUE
Any(runif(10) > 1.5) #returns FALSE
About these ads
Tags: , , ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 217 other followers

%d bloggers like this: