Posts Tagged ‘debugging’

Oh (de)bugger! Part II

4th September, 2010 Leave a comment

It’s Friday night, and I recently discovered that adding a dash of Cointreau somewhat enhances a gin and tonic.  Consequently, I’m drunk-blogging.  Will try not to make too many tpyos.

In the first part of this series, I discussed some standard debugging techniques, like the use of the browser function.  One of the limitations of browser is that it isn’t much help if the problem is somewhere higher up the call-stack.  That is, the case when you passed a dodgy argument to another function, which maybe passed it on to something else before your code fell over.

If you have an error, but you don’t know where it is, then a helpful thing to see is your call stack, and the variables that you’ve put in it.  The function I’m going to show you uses sys.calls to retrieve the call stack, and sys.frames to retrieve the environment for each function you called.  Next it calls ls.str in each of those environments to tell you what variables you’ve created.  Finally, there are some methods to print your output more clearly.  After all, the whole point of this function is to give you quick insight into what you’ve just done.

When I was trying to think of a suitable name for this function, I toyed with variations on the word “stack”.  Eventually, I figured that it should be named for the word most commonly used when things go wrong.  And so, I present to you, the bugger function.

bugger <- function(...)
# ... contains arguments passed to ls.str.  See that function for a description.
+++stack_length <- sys.nframe()
+++call_stack <- sys.calls()[-stack_length] # -stack_length is so we don't include this function
+++frames <- sys.frames()[-stack_length]
+++ls_stack <- lapply(frames, function(e) ls.str(envir = e, ...))
+++call_names <- make.names(lapply(call_stack,
++++++function(cl) as.character(cl)[1]), unique = TRUE)
+++names(call_stack) <- call_names
+++names(ls_stack) <- call_names
+++class(call_stack) <- "call_stack"
+++class(ls_stack) <- "ls_stack"
+++list(call_stack = call_stack, ls_stack = ls_stack)

The call to make.names is there to ensure that each items in call_stack and ls_stack have a unique name, and thus can be referenced more easily.   To finish, these two components get S3-style overloads of the print function.

print.call_stack <- function(x)
+++n <- length(x)
+++if(n < 1) cat("Call stack is empty\n")
+++calls <- sapply(x, function(x) as.character(as.expression(x)))
+++cat(paste(seq_len(n), calls, sep = ": "), sep = "\n")

print.ls_stack <- function(x, ...)
+++n <- length(x)
+++if(n < 1) cat("Call stack is empty\n")
+++titles <- paste(seq_len(n), names(x), sep = ": ")
+++for(i in seq_len(n))
++++++cat(titles[i], "\n")
++++++noquote(print(x[[i]], ...))

And finally, here’s an example to put the code to use

bar <- function(xx, ...)

foo <- function(x, y, z)
+++another_variable <- 999
+++bar(x, y, letters[1:5])

foo(1:10, "monkey", list(runif(3), zzz = "zzz"))

This yields the result

1: foo(1:10, "monkey", list(runif(3), zzz = "zzz"))
2: bar(x, y, letters[1:5])

1: foo
another_variable : num 999
x : int [1:10] 1 2 3 4 5 6 7 8 9 10
y : chr "monkey"
z : List of 2
+$ : num [1:3] 0.154 0.974 0.189
+$ zzz: chr "zzz"2: bar
xx : int [1:10] 1 2 3 4 5 6 7 8 9 10

One thing to note is that ls.str doesn’t pick up the ... arguments.  It is possible to retrieve the call to those dots from each environment, but it isn’t always possible to evaluate them.  Playing with R’s evaluation model in this way is deep magic, so accessing them involves excessive use of substitute and eval.  I’m going to postpone talking about them for an occasion that didn’t involve gin and cointreau.

Tags: ,

Oh (de)bugger!

26th August, 2010 1 comment

By number of questions asked, R passed MATLAB for the first time on Stack Overflow today. Thus it seems an appropriate time to write my first R-based post.

This post concerns what to  do when your R-code goes pear shaped. Back in June there were a couple of very good videos on R debugging that came out of an R meetup in New York. Jay Emerson talked about basic debugging functions like print and browser and Harlan Harris talked about more advanced techniques like trace and debug. These R meetups sound like a great idea but I suspect that we don’t have the critical mass of R users here in Buxton, UK. I digress …

There are two obvious cases where you need to debug things. If you are lucky, you have the case of an error being thrown. It sounds like it should be the worst case, but at least you get to know where the problem is occurring; the more difficult situation is simply getting the wrong answer.

When I get an error, then traceback() is my usual instinctive response to find the location of the error. If the ggplot2 package is loaded, then tr() provides a way to save a few keystrokes. This function isn’t infallible though, and seems to have particular trouble with code in try blocks. To see this, compare

throw_error <- function() stop("!!!")


throw_error_in_try <- function() try(stop("!!!"))
traceback() #Same as before; new error did not register

In the ‘hard’ case, where you have a wrong answer rather than an error, or where traceback has let you down, you’ll have to step through your code to hunt down the problem. This entails using the debug function, or it’s graphical equivalent mtrace (from the debug package). I don’t want to spend time on those functions here (another post perhaps), so if you’re desperate to know how they work I recommend Harlan’s video tutorial.

After I’ve found the function that is causing the problem, my next step is usually to stick a call to browser in my code and rerun it. This lets me explore my environment and test out alternative code. The following example is just a wrapper for sum. The nesting gives us a call stack to look at.

first <- function(...) second(...)
second <- function(...) sum(...)
first(2, 4, "6")

The call to traceback tells us where the error is.

2: second(...)
1: first(2, 4, "6")

Let’s suppose that the error wasn’t as obvious as this. (That’s the character input, for those of you asleep at the back.) The traceback output has shown us that the error occurred in the function second. Technically, it occurred in sum, but the contents of that are C code rather than R code and traceback is smart enough to know it is of no use to us. As I described above, the next step is to call browser, just before the problem occurs.

second <- function(...)

Now when we rerun the code, execution halts at the browser() line, and we get the chance to dig about into what is going on. In this case, since all the arguments are contained in the ellipsis, we can see everything with list(...). The more common circumstance is to have at least some named arguments in your function call. In those cases ls.str() is the quickest way to see what is going on. list(...) returns

[1] 2

[1] 4

[1] “6”

The real strength of the browser function is that if the problem wasn’t obvious now, we could go on to execute additional code from within the function environment. Nevertheless, there are some limitations with the approach. The first issue is that we need to be able to get at the source code. There are two main cases where we can’t do this: firstly, when external code is called and secondly, when the code is tucked away in a package. The external case is sadly quite insoluble. I don’t know of any easy ways to debug C or Fortran code from R. For debugging in packages, if you have an error being thrown, setting options(error = recover) is an excellent alternative to adding browser statements. If there is no error thrown, then debug and trace are the best solutions, but they are beyond the scope of this post.

Another issue is that the problem with our code may not be caused in the same function that the error is thrown in. It could occur higher up the call stack. In that sort of situation, you need a way to see what is happening in all the functions that you’ve called. And that is what I’m going to talk about in part two of this post.

Tags: ,