Adding metadata to variables

6th January, 2012

There are only really two ways to preserve your statistical analyses. You either save the variables that you create, or you save the code that you used to create them. In general the latter is much preferred because at some point you’ll realise that your model was wrong, or your dataset has changed, and you need to re-run your analysis. If you only stored your variables then you are now stuck rewriting your code in order to create new versions, which is really not fun. On the other hand, if you saved your code, all you have to do is tweak it and run it.

Occasionally though, just keeping the code and rerunning an analysis isn’t practical. The most obvious case is when the analysis takes a long time to run. If your model takes more than ten minutes to run, it can be really useful to save its variables as well as the source code.
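As a minimal sketch of saving a slow result alongside its code, here is one way to do it with saveRDS and readRDS. The choice of these two functions, the lm fit on the built-in mtcars data (standing in for a slow model) and the temporary file path are all assumptions made for illustration; the post itself does not say how the variables were saved.

```r
# Stand-in for a slow model fit; lm() on a built-in dataset is used
# purely for illustration.
fit <- lm(mpg ~ wt, data = mtcars)

# Write the object to disk. A temporary file is used here; in practice
# this would be a path inside your project.
model_file <- tempfile(fileext = ".rds")
saveRDS(fit, model_file)

# Later, possibly in a fresh session, read it back without refitting.
fit_restored <- readRDS(model_file)
all.equal(coef(fit), coef(fit_restored))
```

Unlike save and load, readRDS returns the object rather than recreating it under its old name, so you choose the variable name when you read it back.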

The problem with saving variables is that when you come back and load them six months later, it isn’t always obvious what they are or where they came from. With code, we solve this by using comments to jog our memory, so it would be nice to have an equivalent for variables. In fact, in R, such a facility exists with the – you guessed it – comment function.

library(lattice)
comment(barley) <- "Immer's barley data, 1934.  The data from the Morris site may have the wrong years."
comment(barley)

The comment function simply stores the string as an attribute of the variable, with some special rules on printing. Other common attributes that you may be familiar with are names for vectors and lists, and dim and dimnames for matrices.
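A small sketch of what "stored as an attribute, with special printing rules" means in practice. The vector x and its comment text are made up for illustration.

```r
x <- 1:3
comment(x) <- "Three small integers, used only for illustration."

print(x)             # the comment is not shown when the vector is printed
attr(x, "comment")   # but it is stored as an ordinary attribute
identical(comment(x), attr(x, "comment"))
```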

You can find the names of all the attributes of a variable with the attributes function, and get and set individual attributes with attr.

x <- c(apple = 1, banana = 2)
attr(x, "type") <- "fruit"
attributes(x)
attr(x, "names") #same as names(x)

Attributes are really great for storing contextual metadata about a variable. For starters, when you come back to your saved workspace after those six months you might want to know who created the variable and when. To get this facility, we need an enhanced version of assign.

get_user <- function()
{
  # The login name lives in a different environment variable on Windows
  env <- if(.Platform$OS.type == "windows") "USERNAME" else "USER"
  unname(Sys.getenv(env))
}

assign_with_metadata <- function(x, value, ..., pos = parent.frame(), inherits = FALSE)
{
  # Record who created the variable and when
  attr(value, "creator") <- get_user()
  attr(value, "time_created") <- Sys.time()
  # Any further named arguments become extra attributes
  more_attr <- list(...)
  attr_names <- names(more_attr)
  for(i in seq_along(more_attr))
  {
    attr(value, attr_names[i]) <- more_attr[[i]]
  }
  assign(x, value, pos = pos, inherits = inherits)
}

assign_with_metadata("x", 1:3, monkey = "chimp")

Notice the ... that allows you to add arbitrary attributes to the variable.

While this is great, and solves the problem, typing assign_with_metadata is way too clunky. It would be much easier if we could just use <- to assign variables and get the metadata for free.

Actually, overriding <- itself is going to lead to slowness and likely errors. Since we don’t want to store metadata for every variable (just the important ones), it is better to define our own operators to do so.

`%<-%` <- function(x, value)
{
  xname <- deparse(substitute(x))
  pos <- parent.frame()
  assign_with_metadata(xname, value, pos = pos)
}

`%<<-%` <- function(x, value) 
{
  xname <- deparse(substitute(x))
  pos <- globalenv()
  assign_with_metadata(xname, value, pos = pos)
}

m %<-% "foo"    #local assignment with metadata
f <- function()
{
  n %<<-% "bar" #global assignment with metadata
}
f()

With these functions, if you want to save your variables for later, simply swap <- for %<-%.
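When you come back later, the metadata can be read with attr like any other attribute. The sketch below builds a value with the same kind of attributes by hand (the creator name "some_user" is made up), so that it runs without the functions above. It also shows one caveat worth knowing: subsetting drops most attributes, so the metadata stays with the original variable rather than following pieces of it.

```r
# A value carrying creator/time metadata, built by hand for illustration.
m <- structure(
  c(a = 1, b = 2, c = 3),
  creator = "some_user",
  time_created = Sys.time()
)

attr(m, "creator")
format(attr(m, "time_created"), "%Y-%m-%d")

# The attributes survive a round trip through saveRDS()/readRDS() ...
f <- tempfile(fileext = ".rds")
saveRDS(m, f)
m2 <- readRDS(f)
identical(attr(m2, "creator"), attr(m, "creator"))

# ... but subsetting keeps only the names here, so the metadata is lost.
attributes(m[1:2])
```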

Non-standard assignment with getSymbols

21st April, 2011

I recently came across a rather interesting investment blog, Timely Portfolio. I have a certain soft spot for that sort of thing, because using my data analysis skills to make a fortune is casually on my to-do list.

This blog makes regular use of a function getSymbols in the quantmod package. The power and simplicity of the function is fantastic: with one short line of code, you can retrieve historical data on any stock, bond or index that you fancy. It does have one oddity though. In R, we are all used to assigning values to variables with <-.

x <- mean(1:5)

Not for getSymbols this behaviour though. It uses a bizarre assignment procedure whereby the return value is assigned to a variable with the same name as the Symbols parameter, into an environment of your choice (the global environment by default). For example, getSymbols("DGS10",src="FRED") creates a variable named DGS10.
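To make the mechanism concrete, here is a toy function that assigns in the same style. It is only a sketch: toy_get_symbol is a hypothetical stand-in rather than quantmod code, it downloads nothing, and the values are made up.

```r
# Hypothetical stand-in for getSymbols: no data is downloaded.
toy_get_symbol <- function(symbol, env = globalenv())
{
  value <- data.frame(day = 1:3, price = c(10, 11, 12))  # made-up values
  assign(symbol, value, envir = env)  # create a variable named after the argument
  invisible(symbol)                   # hand back the name, not the data
}

demo_env <- new.env()
toy_get_symbol("DGS10", env = demo_env)
exists("DGS10", envir = demo_env, inherits = FALSE)
```

If you would rather have ordinary assignment, the quantmod documentation describes an auto.assign argument that makes getSymbols return the data for a single symbol instead; check the help page for your installed version.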

When retrieving many symbols, this can get a little clunky. Here’s a snippet from a recent post.

getSymbols("DGS10",src="FRED") #load 10yTreasury
getSymbols("DFII10",src="FRED") #load 10yTIP for real return
getSymbols("DTWEXB",src="FRED") #load US dollar
getSymbols("SP500",src="FRED") #load SP500

I see lots of code repetition, which means that it is a prime opportunity for some refactoring. These four lines can be condensed by passing a vector of symbol names directly to getSymbols; my original version wrapped the call in lapply, which turns out to be unnecessary (EDIT CREDIT: thanks Owe!).

symbol_names <- c("DGS10", "DFII10", "DTWEXB", "SP500")
#lapply(symbol_names, getSymbols, src = "FRED")  #see Owe's comment
getSymbols(symbol_names, src = "FRED")

Unfortunately, the non-standard assignment means that instead of having a nice list of each of our datasets, we have four separate variables. To fix this, we must create our own environment to store the results, then convert that to a list.

symbol_env <- new.env()
#lapply(symbol_names, getSymbols, src="FRED", env = symbol_env) 
getSymbols(symbol_names, src = "FRED", env = symbol_env)
list_of_symbols <- as.list(symbol_env)

Understanding environments is quite an advanced topic and a full explanation is beyond the scope of this post. In this case however, the idea is very simple. We need somewhere out of the way to store all the variables that getSymbols creates. This storage place is the environment symbol_env, which can be thought of as a list with special variable scoping rules. Since environments and lists are such similar constructs, we can convert from one to the other with as.list. (list2env works in the other direction.)
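The conversions can be tried without downloading anything. In this sketch, toy_env and its placeholder values are assumptions standing in for symbol_env and the real datasets. One thing to watch: as.list on an environment makes no promise about element order, so mget is used here when the order of symbol_names matters.

```r
symbol_names <- c("DGS10", "DFII10")
toy_env <- new.env()
assign("DGS10", 1:3, envir = toy_env)    # placeholder values, not market data
assign("DFII10", 4:6, envir = toy_env)

as_list <- as.list(toy_env)                     # element order is not guaranteed
ordered <- mget(symbol_names, envir = toy_env)  # elements in the order requested
names(ordered)

round_trip <- list2env(ordered)                 # back to a new environment
ls(round_trip)
```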
