Home > R > user2013: The caret tutorial

user2013: The caret tutorial

This afternoon I went to Max Kuhn’s tutorial on his caret package. caret stands for classification and regression (something beginning with e) trees. It provides a consistent interface to nearly 150 different models in R, in much the same way as the plyr package provides a consistent interface to the apply functions.

The basic usage of caret is to split your data into training and test sets.

my_data <- split(my_data, runif(nrow(my_data)) > p) #for some value of p
names(my_data) <- c("training", "testing")

Then call train on your training set.

training_model <- train(
  response ~ ., 
  data   = my_data$training,
  method = "a type of model!")

Then predict it using predict.

predictions <- predict(training_model, my_data$testing)

So the basic usage is very simple. The devil is of course in the statistical details. You still have to choose (at least one) type of model, and there are many options for how those models should be fit.

Max suggested that a good strategy for modelling is to begin with a powerful black box method (boosting, random forests or support vector machines) since they can usually provide excellent fits. The next step is to use a simple, understandable model (some form of regression perhaps) and see how much predictive power you are losing.

I suspect that in order to get the full benefit of caret, I’ll need to read Max’s book: Applied Predictive Modeling.

  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: