It’s Friday night, and I recently discovered that adding a dash of Cointreau somewhat enhances a gin and tonic. Consequently, I’m drunk-blogging. Will try not to make too many tpyos.
In the first part of this series, I discussed some standard debugging techniques, like the use of the
browser function. One of the limitations of
browser is that it isn't much help if the problem is somewhere higher up the call-stack. That is, the case when you passed a dodgy argument to another function, which maybe passed it on to something else before your code fell over.
If you have an error, but you don't know where it is, then a helpful thing to see is your call stack, and the variables that you've put in it. The function I'm going to show you uses
sys.calls to retrieve the call stack, and
sys.frames to retrieve the environment for each function you called. Next it calls
ls.str in each of those environments to tell you what variables you've created. Finally, there are some methods to print your output more clearly. After all, the whole point of this function is to give you quick insight into what you've just done.
When I was trying to think of a suitable name for this function, I toyed with variations on the word "stack". Eventually, I figured that it should be named for the word most commonly used when things go wrong. And so, I present to you, the
bugger <- function(...)
# ... contains arguments passed to ls.str. See that function for a description.
stack_length <- sys.nframe()
call_stack <- sys.calls()[-stack_length] # -stack_length is so we don't include this function
frames <- sys.frames()[-stack_length]
ls_stack <- lapply(frames, function(e) ls.str(envir = e, ...))
call_names <- make.names(lapply(call_stack,
function(cl) as.character(cl)), unique = TRUE)
names(call_stack) <- call_names
names(ls_stack) <- call_names
class(call_stack) <- "call_stack"
class(ls_stack) <- "ls_stack"
list(call_stack = call_stack, ls_stack = ls_stack)
The call to
make.names is there to ensure that each items in
ls_stack have a unique name, and thus can be referenced more easily. To finish, these two components get S3-style overloads of the print function.
print.call_stack <- function(x)
n <- length(x)
if(n < 1) cat("Call stack is empty\n")
calls <- sapply(x, function(x) as.character(as.expression(x)))
cat(paste(seq_len(n), calls, sep = ": "), sep = "\n")
print.ls_stack <- function(x, ...)
n <- length(x)
if(n < 1) cat("Call stack is empty\n")
titles <- paste(seq_len(n), names(x), sep = ": ")
for(i in seq_len(n))
And finally, here's an example to put the code to use
bar <- function(xx, ...)
foo <- function(x, y, z)
another_variable <- 999
bar(x, y, letters[1:5])
foo(1:10, "monkey", list(runif(3), zzz = "zzz"))
This yields the result
1: foo(1:10, "monkey", list(runif(3), zzz = "zzz"))
2: bar(x, y, letters[1:5])
another_variable : num 999
x : int [1:10] 1 2 3 4 5 6 7 8 9 10
y : chr "monkey"
z : List of 2
$ : num [1:3] 0.154 0.974 0.189
$ zzz: chr "zzz"2: bar
xx : int [1:10] 1 2 3 4 5 6 7 8 9 10
One thing to note is that
ls.str doesn't pick up the
... arguments. It is possible to retrieve the call to those dots from each environment, but it isn't always possible to evaluate them. Playing with R's evaluation model in this way is deep magic, so accessing them involves excessive use of
eval. I'm going to postpone talking about them for an occasion that didn't involve gin and cointreau.
By number of questions asked, R passed MATLAB for the first time on Stack Overflow today. Thus it seems an appropriate time to write my first R-based post.
This post concerns what to do when your R-code goes pear shaped. Back in June there were a couple of very good videos on R debugging that came out of an R meetup in New York. Jay Emerson talked about basic debugging functions like
browser and Harlan Harris talked about more advanced techniques like
debug. These R meetups sound like a great idea but I suspect that we don't have the critical mass of R users here in Buxton, UK. I digress ...
There are two obvious cases where you need to debug things. If you are lucky, you have the case of an error being thrown. It sounds like it should be the worst case, but at least you get to know where the problem is occurring; the more difficult situation is simply getting the wrong answer.
When I get an error, then
traceback() is my usual instinctive response to find the location of the error. If the
ggplot2 package is loaded, then
tr() provides a way to save a few keystrokes. This function isn't infallible though, and seems to have particular trouble with code in
try blocks. To see this, compare
throw_error <- function() stop("!!!")
throw_error_in_try <- function() try(stop("!!!"))
traceback() #Same as before; new error did not register
In the 'hard' case, where you have a wrong answer rather than an error, or where
traceback has let you down, you'll have to step through your code to hunt down the problem. This entails using the
debug function, or it's graphical equivalent
mtrace (from the
debug package). I don't want to spend time on those functions here (another post perhaps), so if you're desperate to know how they work I recommend Harlan's video tutorial.
After I've found the function that is causing the problem, my next step is usually to stick a call to
browser in my code and rerun it. This lets me explore my environment and test out alternative code. The following example is just a wrapper for
sum. The nesting gives us a call stack to look at.
first <- function(...) second(...)
second <- function(...) sum(...)
first(2, 4, "6")
The call to traceback tells us where the error is.
1: first(2, 4, "6")
Let's suppose that the error wasn't as obvious as this. (That's the character input, for those of you asleep at the back.) The
traceback output has shown us that the error occurred in the function
second. Technically, it occurred in
sum, but the contents of that are C code rather than R code and
traceback is smart enough to know it is of no use to us. As I described above, the next step is to call
browser, just before the problem occurs.
second <- function(...)
Now when we rerun the code, execution halts at the
browser() line, and we get the chance to dig about into what is going on. In this case, since all the arguments are contained in the ellipsis, we can see everything with
list(...). The more common circumstance is to have at least some named arguments in your function call. In those cases
ls.str() is the quickest way to see what is going on.
The real strength of the
browser function is that if the problem wasn't obvious now, we could go on to execute additional code from within the function environment. Nevertheless, there are some limitations with the approach. The first issue is that we need to be able to get at the source code. There are two main cases where we can't do this: firstly, when external code is called and secondly, when the code is tucked away in a package. The external case is sadly quite insoluble. I don't know of any easy ways to debug C or Fortran code from R. For debugging in packages, if you have an error being thrown, setting
options(error = recover) is an excellent alternative to adding
browser statements. If there is no error thrown, then
trace are the best solutions, but they are beyond the scope of this post.
Another issue is that the problem with our code may not be caused in the same function that the error is thrown in. It could occur higher up the call stack. In that sort of situation, you need a way to see what is happening in all the functions that you've called. And that is what I'm going to talk about in part two of this post.