Posts Tagged ‘packages’

Automatically convert RUnit tests to testthat tests

12th May, 2014 1 comment

There’s a new version of my assertive package, for sanity-checking code, on its way to CRAN. The release has been delayed a while, since my previous attempt at an upload met with an error that was only generated on the CRAN machine, but not on my own. The problem lay with some code designed to autorun the RUnit tests for the package. After fiddling for a while and getting nowhere, I decided it was time to make the switch to testthat.

I’ve been a long-time RUnit user, since the syntax is near-identical to every other xUnit variant in every programming language. So switching between RUnit and MATLAB xUnit or NUnit requires no thinking. testthat has a couple of important advantages over RUnit though.

test_package makes it easy to, ahem, test your package. A page of code for finding and nicely displaying bad tests has been reduced to test_package("assertive").

Secondly, testing for warnings is much cleaner. In RUnit, you have to use convoluted mechanisms like:

test.sqrt.negative_numbers.throws_a_warning <- function()
  old_ops <- options(warn = 2)

The testthat equivalent is more readable:

  "sqrt throws a warning for negative number inputs",

Thirdly, testthat caches tests, so you spend less time waiting for tests that you know are fine to rerun.

These benefits mean that I’ve been meaning to switch packages for a while. The big problem was that the assertive package contains over 300 unit tests. At about a minute or so to update each tests, that was five hours of tedious work that I couldn’t be bothered to do. Instead, I spent two days making a package that automatically converts RUnit tests to testthat tests. Not exactly a time saving, but it was more fun.

It isn’t on CRAN yet, but you can get it from github.


The package contains functions to convert tests on an individual/file/package basis.

convert_test takes an RUnit test function and returns a call to test_that. Here’s an example for the sqrt function.

test.sqrt.3.returns_1.732 <- function()
  x <- 3
  expected <- 1.73205080756888
  checkEquals(sqrt(x), expected)
## test_that("test.sqrt.3.returns_1.732", {
##   x <- 3
##   expected <- 1.73205080756888
##   expect_equal(expected, sqrt(x))
## })

convert_test works with more complicated test functions. You can have multiple checks, nested inside if blocks or loops if you really want.

test.some_complicated_nonsense.returns_an_appropriate_testthat_test <- function()
  x <- 6:10
  for(i in 1:5)
    if(i %% 2 == 0)
      checkTrue(all(x > i), msg = "i divisible by 2") 
      if(i == 4)
        checkIdentical(4, i, msg = "i = 4")
      } else
        while(i > 0) 
          checkIdentical(2, i, msg = "i = 2")
## test_that("test.some_complicated_nonsense.returns_an_appropriate_testthat_test", 
## {
##   x <- 6:10
##   for (i in 1:5) {
##     if (i%%2 == 0) {
##       expect_true(all(x > i), info = "i divisible by 2")
##       if (i == 4) {
##         expect_identical(i, 4, info = "i = 4")
##       }
##       else {
##         while (i > 0) {
##           expect_identical(i, 2, info = "i = 2")
##         }
##         repeat {
##           expect_error(stop("!!!"))
##           break
##         }
##       }
##     }
##   }
## })

Of course, the main use for this is converting whole files or packages at at time, so runittotestthat contains convert_test_file and convert_package_tests for this purpose. By default (so you don’t overwrite your RUnit tests by mistake), they write their output to the console, but you can also write the resulting testthat tests to a file. converting all 300 of assertive’s tests was as easy as

  test_file_regexp = "^test", 
  testthat_files = paste("new-", runit_files)

In that line of code, the runit_files variable is a special name that refers to the names of the files that contain you RUnit tests. It means that the output testthat file names can be be based upon original input names.

Although runittotestthat works fine on all my tests, automatic code editing is a tricky task, so there may be some weird edge cases that I’ve missed. Please download the package and play with it, and let me know if you find any bugs.

Introducing the pathological package for manipulating paths, files and directories

28th April, 2014 7 comments

I was recently hunting for a function that will strip the extension from a file – changing foo.png to foo, and so forth. I was knitting a report, and wanted to replace the file extension of the input with the extension of the the output file. (knitr handles this automatically in most cases but I had some custom logic in there that meant I had to work things manually.)

Finding file extensions is such a common task that I figured that someone must have written a function to solve the problem already. A quick search using findFn("file extension") from the sos package revealed a few thousand hits. There’s a lot of noise in there, but I found a few promising candidates.

There’s removeExt in the limma package (you can find it on Bioconductor), strip_extension in Kmisc, remove_file_extension which has identical copies in both and gdalUtils, and extension in the raster.

To save you the time and effort, I’ve tried them all, and unfortunately they all suck.

At a bare minimum, a file extension stripper needs to be vectorized, deal with different file extensions within that vector, deal with multiple levels of extension (for things like “tar.gz” files), and with filenames with dots in the name other than the extension, and with missing values, and with directories. OK, that’s quite a few things but I’m picky.

Since all the existing options failed, I’ve made my own function. In fact, I went overboard and created a package of path manipulation utilities, the pathological package. It isn’t on CRAN yet, but you can install it via:


It’s been a while since I’ve used MATLAB, but I have fond recollections of its fileparts function that splits a path up into the directory, filename and extension.

The pathological equivalent is to decompose a path, which returns a character matrix data.frame with three columns.

x <- c(
  "somedir/foo.tgz",         # single extension
  "another dir\\bar.tar.gz", # double extension
  "baz",                     # no extension
  "quux. quuux.tbz2",        # single ext, dots in filename
  R.home(),                  # a dir
  "~",                       # another dir
  "~/quuuux.tar.xz",         # a file in a dir
  "",                        # empty 
  ".",                       # current dir
  "..",                      # parent dir
  NA_character_              # missing
(decomposed <- decompose_path(x))
##                          dirname                      filename      extension
## somedir/foo.tgz         "d:/workspace/somedir"       "foo"         "tgz"    
## another dir\\bar.tar.gz "d:/workspace/another dir"   "bar"         "tar.gz" 
## baz                     "d:/workspace"               "baz"         ""       
## quux. quuux.tbz2        "d:/workspace"               "quux. quuux" "tbz2"   
## C:/PROGRA~1/R/R-31~1.0  "C:/Program Files/R/R-3.1.0" ""            ""       
## ~                       "C:/Users/richie/Documents"  ""            ""       
## ~/quuuux.tar.xz         "C:/Users/richie/Documents"  "quuuux"      "tar.xz" 
## ""                           ""            ""       
## .                       "d:/workspace"               ""            ""       
## ..                      "d:/"                        ""            ""       
## <NA>                    NA                           NA            NA       
## attr(,"class")
## [1] "decomposed_path" "matrix" 

There are some shortcut functions to get at different parts of the filename:

##         somedir/foo.tgz another dir\\bar.tar.gz                     baz 
##                   "tgz"                "tar.gz"                      "" 
##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~ 
##                  "tbz2"                      ""                      "" 
##         ~/quuuux.tar.xz                                               . 
##                "tar.xz"                      ""                      "" 
##                      ..                    <NA> 
##                      ""                      NA 
##  [1] "d:/workspace/somedir/foo"         "d:/workspace/another dir/bar"    
##  [3] "d:/workspace/baz"                 "d:/workspace/quux. quuux"        
##  [5] "C:/Program Files/R/R-3.1.0"       "C:/Users/richie/Documents"       
##  [7] "C:/Users/richie/Documents/quuuux" "/"                               
##  [9] "d:/workspace"                     "d:/"                             
## [11] NA 

strip_extension(x, include_dir = FALSE)
##         somedir/foo.tgz another dir\\bar.tar.gz                     baz 
##                   "foo"                   "bar"                   "baz" 
##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~ 
##           "quux. quuux"                      ""                      "" 
##         ~/quuuux.tar.xz                                               . 
##                "quuuux"                      ""                      "" 
##                      ..                    <NA> 
##                      ""                      NA 

You can also get your original file location (in a standardised form) using

##  [1] "d:/workspace/somedir/foo.tgz"           
##  [2] "d:/workspace/another dir/bar.tar.gz"    
##  [3] "d:/workspace/baz"                       
##  [4] "d:/workspace/quux. quuux.tbz2"          
##  [5] "C:/Program Files/R/R-3.1.0"             
##  [6] "C:/Users/richie/Documents"              
##  [7] "C:/Users/richie/Documents/quuuux.tar.xz"
##  [8] "/"                                      
##  [9] "d:/workspace"                           
## [10] "d:/"                                    
## [11] NA 

The package also contains a few other path utilities. The standardisation I mentioned comes from standardise_path (standardize_path also available for Americans), and there’s a dir_copy function for copying directories.

It’s brand new, so after I’ve complained about other people’s code, I’m sure karma will ensure that you’ll find a bug or two, but I hope you find it useful.

Be assertive!

30th May, 2012 15 comments

assert_package_is_awesome("assertive") returns TRUE.

assertive, my new package for writing robust code, is now on CRAN. It consists of lots of is functions for checking variables, and corresponding assert functions that throw an error if the condition doesn’t hold. For example, is_a_number checks that the input is numeric and scalar.

is_a_number(1)     #TRUE
is_a_number("a")   #FALSE
is_a_number(1:10)  #FALSE

In the last two cases, the return value of FALSE has an attribute “cause” that indicates the cause of failure. When “a” is the input, the cause is “"a" is not of type 'numeric'.“, whereas for 1:10, the cause is “1:10 does not have length one.“. You can get or set the cause attribute with the cause function.

m <- lm(uptake ~ 1, CO2)
ok <- is_empty_model(m)
if(!ok) cause(ok)

The assert functions call an is function, and if the result is FALSE, they throw an error; otherwise they do nothing.

assert_is_a_number(1)   #OK
assert_is_a_number("a") #Throws an error

There are also some has functions, primarily for checking the presence of attributes.

has_names(c(foo = 1, bar = 4, baz = 9))
has_dims(matrix(1:12, nrow = 3))

Some functions apply to properties of vectors. In this case, the assert functions can check that all the values conform to the condition, or any of the values conform.

x <- -2:2
is_positive(x)              #The last two are TRUE
assert_any_are_positive(x)  #OK
assert_all_are_positive(x)  #Error

“Why would you want to use these functions?”, you may be asking. The dynamic typing and extreme flexibility of R means that it is very easy to have variables that are the wrong format. This is particularly true when you are dealing with user input. So while you know that the sales totals passed to your function should be a vector of non-negative numbers, or that the regular expression should be a single string rather than a character vector, your user may not. You need to check for these invalid conditions, and return an error message that the user can understand. assertive makes it easy to do all this.

Since this is the first public release of assertive, it hasn’t been widely tested. I’ve written a moderately comprehensive unit-test suite, but there are likely to be a few minor bugs here and there. In particular, I suspect there may be one or two typos in the documentation. Please give the package a try, and let me know if you find any errors, or if you want any other functions adding.


Get every new post delivered to your Inbox.

Join 204 other followers