Adding metadata to variables
There are only really two ways to preserve your statistical analyses. You either save the variables that you create, or you save the code that you used to create them. In general the latter is much preferred, because at some point you'll realise that your model was wrong, or your dataset has changed, and you need to re-run your analysis. If you only stored your variables, then you are now stuck rewriting your code in order to create new versions, which is really not fun. On the other hand, if you saved your code, all you have to do is tweak it and run it.
Occasionally, though, just keeping the code and rerunning an analysis isn't practical, most obviously when it takes a long time. If your model takes more than ten minutes to run, it can be really useful to save its variables as well as the source code.
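As a minimal sketch of saving a variable alongside the code, base R's saveRDS and readRDS write a single object to disk and read it back unchanged. Here a quick lm fit on the built-in cars data stands in for a slow model, and the temporary file path is just for illustration.

```r
# A quick linear model stands in for a slow, expensive fit (illustrative only)
model <- lm(dist ~ speed, data = cars)

# Save the fitted object to disk alongside your code...
path <- tempfile(fileext = ".rds")
saveRDS(model, path)

# ...and later load it back without re-running the fit
restored <- readRDS(path)
coef(restored)
```

Unlike save and load, readRDS returns the object rather than restoring it under its old name, so you choose what to call it when you load it.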
The problem with saving variables is that when you come back and load them six months later, it isn't always obvious what they are or where they came from. With code, we solve this by using comments to jog our memory, so it would be nice to have an equivalent for variables. In fact, in R, such a facility exists with the – you guessed it – comment function.
library(lattice) # provides the barley dataset
comment(barley) <- "Immer's barley data, 1934. The data from the Morris site may have the wrong years."
comment(barley)
The comment function simply stores the string as an attribute of the variable, with one special rule: unlike other attributes, the comment is not displayed when the variable is printed. Other common attributes that you may be familiar with are names for vectors and lists, and dim and dimnames for matrices.
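To see the printing rule in action, here is a small sketch on a toy vector (the values and the comment text are made up for illustration): print leaves the comment out, while comment and attributes still find it.

```r
v <- c(3.1, 2.7, 4.0)
comment(v) <- "Toy measurements; units unknown."

print(v)       # the comment is not displayed
comment(v)     # retrieves the comment string
attributes(v)  # the comment is stored as an ordinary attribute
```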
You can list all the attributes of a variable (names and values) with the attributes function, and get and set individual attributes with attr.
x <- c(apple = 1, banana = 2)
attr(x, "type") <- "fruit"
attributes(x)
attr(x, "names") # same as names(x)
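One thing worth knowing before relying on custom attributes: not every operation carries them along. According to R's documentation for [, subsetting (except by an empty index) drops all attributes except names, dim and dimnames, so in this sketch the "type" attribute does not survive.

```r
x <- c(apple = 1, banana = 2)
attr(x, "type") <- "fruit"

y <- x[1]        # subset the vector
attributes(y)    # only $names remains; "type" has been dropped
attr(y, "type")  # NULL
```

Metadata attached this way is therefore best kept on the object you actually save, rather than expected to flow through later manipulations.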
Attributes are really great for storing contextual metadata about a variable. For starters, when you come back to your saved workspace after those six months, you might want to know who created the variable and when. To get this facility, we need an enhanced version of assign.
# Return the current user's login name from the environment
get_user <- function() {
  env <- if(.Platform$OS.type == "windows") "USERNAME" else "USER"
  unname(Sys.getenv(env))
}

# Like assign(), but also records who created the variable and when,
# plus any extra attributes passed via ... (these should be named)
assign_with_metadata <- function(x, value, ..., pos = parent.frame(), inherits = FALSE) {
  attr(value, "creator") <- get_user()
  attr(value, "time_created") <- Sys.time()
  more_attr <- list(...)
  attr_names <- names(more_attr)
  for(i in seq_along(more_attr)) {
    attr(value, attr_names[i]) <- more_attr[[i]]
  }
  assign(x, value, pos = pos, inherits = inherits)
}

assign_with_metadata("x", 1:3, monkey = "chimp")
Notice the ... argument, which allows you to add arbitrary attributes to the variable.
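For a one-off value, base R's structure function can attach several attributes in a single call. This is a sketch of an alternative for quick cases, not a replacement for assign_with_metadata; the creator name "analyst" is a hard-coded placeholder rather than a looked-up user.

```r
# Attach metadata in one call; "analyst" is a placeholder user name
z <- structure(
  1:3,
  creator = "analyst",
  time_created = Sys.time(),
  monkey = "chimp"
)
names(attributes(z))
```

Unlike assign_with_metadata, this does not fill in the user or time for you, which is exactly the repetition the function above avoids.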
While this is great, and solves the problem, typing assign_with_metadata is way too clunky. It would be much easier if we could just use <- to assign variables and get the metadata for free.
We could override <- itself, but that is likely to lead to slowness and errors. Since we don't want to store metadata for every variable (just the important ones), it is better to define our own operators to do so.
# Local assignment with metadata: x %<-% value
`%<-%` <- function(x, value) {
  xname <- deparse(substitute(x)) # the variable name as a string
  pos <- parent.frame()           # the caller's environment
  assign_with_metadata(xname, value, pos = pos)
}

# Global assignment with metadata: always writes to the global environment
`%<<-%` <- function(x, value) {
  xname <- deparse(substitute(x))
  pos <- globalenv()
  assign_with_metadata(xname, value, pos = pos)
}

m %<-% "foo" # local assignment with metadata
f <- function() {
  n %<<-% "bar" # global assignment with metadata
}
f()
With these functions, if you want to save your variables for later, simply swap <- for %<-%.
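One caveat when making that swap: user-defined %op% operators bind more tightly than arithmetic operators such as + and *, so a right-hand side containing them needs parentheses (a range such as 1:3 is fine, because : binds more tightly still). The sketch below uses a stripped-down, hypothetical %<-% that only does the assignment, so that it runs on its own.

```r
# Hypothetical minimal version of %<-%: assigns without adding metadata
`%<-%` <- function(x, value) {
  assign(deparse(substitute(x)), value, envir = parent.frame())
}

a %<-% 1 + 2    # parsed as (a %<-% 1) + 2, so a is 1
a
b %<-% (1 + 2)  # parentheses make the whole expression the value, so b is 3
b
```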
Licensing

The non-code parts of 4D Pie Charts by Richard Cotton are licensed under a Creative Commons Attribution-NoDerivs 2.0 UK: England & Wales License. The code parts of the blog are licensed under the WTFPL v2.0.