One of the bits of feedback that I got from the useR conference was that my
assertive package, for run-time testing of code, was too big. Since then I’ve been working to modularise it, and the results are now on CRAN.
Now, rather than one big package, there are fifteen
assertive.* packages for specific pieces of functionality. For example,
assertive.numbers contains functionality for checking numeric vectors,
assertive.datetimes contains functionality for checking dates and times, and
assertive.properties contains functionality for checking names, attributes, and lengths.
This finer-grained approach means that if you want to develop a package with assertions, you can choose only the parts that you need, allowing for a much smaller footprint.
The pathological package, which depends upon
assertive, gives you an example of how to do this.
The assertive package itself now contains no functions of its own – it merely imports and re-exports the functions from the other 15 packages. This means that if you are working with
assertive interactively, you can still simply type
library(assertive) and have access to all the functionality.
“Assertion” is computer-science jargon for a run-time check on your code. In R, this typically means function argument checks (“did they pass a numeric vector rather than a character vector into your function?”), and data quality checks (“does the date-of-birth column contain values in the past?”).
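As a minimal sketch of those two kinds of check, here is an example that uses only base R's stopifnot rather than any of the assertion packages. The function name mean_age and its arguments are hypothetical, chosen purely for illustration.

```r
# Hypothetical function, for illustration only
mean_age <- function(date_of_birth, today = Sys.Date()) {
  # Argument check: did the caller pass dates rather than, say, strings?
  stopifnot(inherits(date_of_birth, "Date"))
  # Data quality check: every date of birth should be in the past
  stopifnot(all(date_of_birth < today))
  mean(as.numeric(today - date_of_birth)) / 365.25
}

dob <- as.Date(c("1980-01-01", "1990-06-15"))
age <- mean_age(dob, today = as.Date("2020-01-01"))

# Passing a character vector fails the argument check
bad_type <- tryCatch(mean_age("1980-01-01"), error = function(e) "error")
```

The error messages that stopifnot produces are fairly terse, which is one of the gaps that dedicated assertion packages try to fill.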
The four packages
R currently has four packages for assertions:
assertive, which is mine;
assertthat, by Hadley Wickham;
assertr, by Tony Fischetti; and
ensurer, by Stefan Bache.
Having four packages feels like too many; we’re duplicating effort, and it makes package choice too hard for users. I didn’t know about the existence of
ensurer until a couple of days ago, but the useR conference has helped bring these rivals to my attention. I’ve chatted with the authors of the other three packages to see if we can streamline things a little.
Hadley said that
assertthat isn’t a high priority for him – dplyr, ggplot2 and tidyr (among many others) are more important – so he’s not going to develop it further. Since
assertthat is mostly a subset of
assertive anyway, this shouldn’t be a problem. I’ll take a look at how easy it is to provide an
assertthat API, so existing users can have a direct replacement.
Tony said that the focus of
assertr is predominantly data checking. It only works with data frames, and has a more limited remit than
assertive. He plans to change the backend to be built on top of
assertive. That is,
assertr will be an
assertive extension that makes it easy to apply assertions to multiple columns in data frames.
Stefan has stated that he prefers to keep
ensurer separate, since it has a different philosophical stance to
assertive, and I agree.
ensurer is optimised for being lightweight and elegant;
assertive is optimised for clarity of user code and clarity of error messages (at a cost of some bulk).
So overall, we’re down from four distinct assertion packages to two groups (ensurer and
assertive/assertr). This feels sensible. It’s the optimum number for minimizing duplication while still having some competition to spur development onwards.
The assertive development plan
ensurer has one feature in particular that I definitely want to include in
assertive: you can create type-safe functions.
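To make the idea concrete, here is a minimal base R sketch of a wrapper that checks a function's return value. It is not ensurer's API. The helper name type_safe and its arguments are hypothetical.

```r
# Hypothetical helper: wrap f so that its result must satisfy is_valid
type_safe <- function(f, is_valid) {
  function(...) {
    result <- f(...)
    if (!isTRUE(is_valid(result))) {
      stop("the return value failed its type check", call. = FALSE)
    }
    result
  }
}

# A version of sqrt that must return a single number
safe_sqrt <- type_safe(sqrt, function(x) is.numeric(x) && length(x) == 1L)

root <- safe_sqrt(16)  # passes the check
outcome <- tryCatch(safe_sqrt(c(1, 4)), error = function(e) conditionMessage(e))
```

The wrapped function behaves like the original when the check passes, and fails loudly at the point where the wrong type would otherwise leak out.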
The question of bulk has also been playing on my mind for a while. It isn’t huge by any means – the tar.gz file for the package is 836kB – but the number of functions can make it a little difficult for new users to find their way around. A couple of years ago when I was working with a lot of customer data, I included functions for checking things like the validity of UK postcodes. These are things that I’m unlikely to use at all in my current job, so it seems superfluous to have them. That means that I’d like to make
assertive more modular. The core things should be available in an
assertive.base package, with specialist assertions in additional packages.
I also want to make it easier for other package developers to include their own assertions in their packages. This will require a bit of rethinking about how the existing assertion engine works, and what internal bits I need to expose.
One bit of feedback I got from the attendees at my tutorial this week was that for simulation usage (where you call the same function millions of times), assertions can slow down the code too much. So a way to turn off the assertions (but keep them there for debugging purposes) would be useful.
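One way this could work is a global option that turns every assertion into a no-op. The sketch below assumes that design. The option name myassertions.enabled and the function names are hypothetical and are not part of any released package.

```r
# Hypothetical option name; assertions are on unless it is set to FALSE
assertions_enabled <- function() {
  isTRUE(getOption("myassertions.enabled", default = TRUE))
}

# Hypothetical assertion that becomes a no-op when the option is FALSE
check_all_positive <- function(x) {
  if (!assertions_enabled()) {
    return(invisible(x))
  }
  if (!isTRUE(all(x > 0))) {
    stop("x contains values that are not positive", call. = FALSE)
  }
  invisible(x)
}

simulate_once <- function(x) {
  check_all_positive(x)
  sum(log(x))
}

old <- options(myassertions.enabled = FALSE)       # fast mode for the simulation
fast <- suppressWarnings(simulate_once(c(-1, 1)))  # no error; log(-1) is NaN
options(old)                                       # restore checking
checked <- tryCatch(simulate_once(c(-1, 1)), error = function(e) "error")
```

The checks stay in the source for debugging, and a single call to options decides whether they run.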
The top feature request, however, was for pipe compatibility. Stefan’s
magrittr package has rocketed in popularity (I’m a huge fan), so this definitely needs implementing. It should be a small fix, so I should have it included soon.
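Pipe compatibility mostly comes down to an assertion handing its input back when the check passes. Here is a minimal sketch, assuming R 4.1 or later for the base |> pipe. The function check_no_missing is hypothetical.

```r
# Hypothetical assertion: errors on missing values, otherwise returns its input
check_no_missing <- function(x) {
  if (anyNA(x)) {
    stop("x contains missing values", call. = FALSE)
  }
  invisible(x)
}

# The assertion sits in the middle of the pipeline without interrupting it
total <- c(1, 4, 9) |>
  check_no_missing() |>
  sqrt() |>
  sum()

failed <- tryCatch(
  c(1, NA, 9) |> check_no_missing() |> sum(),
  error = function(e) conditionMessage(e)
)
```

A function written this way should chain the same way with magrittr's %>% operator.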
There are some other small fixes like better NA handling and a better error message for
is_in_range that I plan to make soon.
The final (rather non-trivial) feature I want to add to assertive is support for error messages in multiple languages. The infrastructure is in place for translations (it currently supports both the languages that I know: British English and American English); I just need some people who can speak other languages to do the translations. If you are interested in translating, drop me an email or let me know in the comments.
Over the last week or two I’ve been pushing all my packages to CRAN.
pathological (for working with file paths),
runittotestthat (for converting RUnit tests to testthat tests), and
regex (for building regular expressions in a human readable way) all make their CRAN debuts.
assertive, for run-time testing of your code, has more checks for the state of your R setup
(r_has_png_capability, and many more), checks for the state of your variables
(are_same_length, etc.), and utilities.
sig (for checking that your function signatures are sensible) now works with primitive functions too.
learningr (to accompany the book) has a reference URL fix but is otherwise the same.
I encourage you to take a look at some or all of them, and give me feedback.
assertive, my new package for writing robust code, is now on CRAN. It consists of lots of
is functions for checking variables, and corresponding
assert functions that throw an error if the condition doesn’t hold. For example,
is_a_number checks that the input is numeric and scalar.
is_a_number(1)    #TRUE
is_a_number("a")  #FALSE
is_a_number(1:10) #FALSE
In the last two cases, the return value of FALSE has an attribute,
cause, that indicates the cause of failure. When "a" is the input, the cause is
“"a" is not of type 'numeric'.”, whereas for
1:10, the cause is
“1:10 does not have length one.”. You can get or set the cause attribute with the
cause function.
m <- lm(uptake ~ 1, CO2)
ok <- is_empty_model(m)
if(!ok) cause(ok)
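To show the mechanism, here is a base R sketch of an is function that attaches a cause attribute to FALSE. It is an illustrative re-implementation and not the package's code. The names false_because and is_single_number are hypothetical.

```r
# Hypothetical helper: FALSE with a "cause" attribute attached
false_because <- function(reason) {
  result <- FALSE
  attr(result, "cause") <- reason
  result
}

# Hypothetical is function: TRUE, or FALSE plus the reason for failure
is_single_number <- function(x) {
  if (!is.numeric(x)) {
    return(false_because("x is not of type 'numeric'."))
  }
  if (length(x) != 1L) {
    return(false_because("x does not have length one."))
  }
  TRUE
}

ok <- is_single_number(1:10)
if (!ok) attr(ok, "cause")
```

Because the result is still a logical value, it can be used directly in an if condition, with the explanation carried along as metadata.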
assert functions call an
is function, and if the result is FALSE, they throw an error; otherwise they do nothing.
assert_is_a_number(1)   #OK
assert_is_a_number("a") #Throws an error
There are also some
has functions, primarily for checking the presence of attributes.
has_names(c(foo = 1, bar = 4, baz = 9))
has_dims(matrix(1:12, nrow = 3))
Some functions apply to properties of vectors. In this case, the
assert functions can check that all the values conform to the condition, or any of the values conform.
x <- -2:2
is_positive(x)             #The last two are TRUE
assert_any_are_positive(x) #OK
assert_all_are_positive(x) #Error
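The any/all distinction can be sketched in base R as follows. The functions is_pos, check_any and check_all are hypothetical stand-ins and are not the package's implementation.

```r
# Hypothetical vectorised predicate: one TRUE/FALSE per element
is_pos <- function(x) x > 0

# Passes if at least one element satisfies the predicate
check_any <- function(x, predicate) {
  if (!any(predicate(x))) {
    stop("no values passed the check", call. = FALSE)
  }
  invisible(x)
}

# Passes only if every element satisfies the predicate
check_all <- function(x, predicate) {
  ok <- predicate(x)
  if (!all(ok)) {
    stop(sum(!ok), " of ", length(ok), " values failed the check", call. = FALSE)
  }
  invisible(x)
}

x <- -2:2
flags <- is_pos(x)
check_any(x, is_pos)  # passes silently
msg <- tryCatch(check_all(x, is_pos), error = function(e) conditionMessage(e))
```

Reporting how many values failed, as check_all does here, is one way to make the error message more useful than a bare failure.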
“Why would you want to use these functions?”, you may be asking. The dynamic typing and extreme flexibility of R mean that it is very easy to have variables that are in the wrong format. This is particularly true when you are dealing with user input. So while you know that the sales totals passed to your function should be a vector of non-negative numbers, or that the regular expression should be a single string rather than a character vector, your user may not. You need to check for these invalid conditions, and return an error message that the user can understand. assertive makes it easy to do all this.
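For comparison, this is roughly what those two checks look like when written by hand in base R. The function total_sales_for and its arguments are hypothetical.

```r
# Hypothetical function: sum the sales whose names match a regular expression
total_sales_for <- function(sales_totals, pattern) {
  # Sales totals should be a vector of non-negative numbers
  if (!is.numeric(sales_totals) || anyNA(sales_totals) || any(sales_totals < 0)) {
    stop("sales_totals must be a vector of non-negative numbers.", call. = FALSE)
  }
  # The regular expression should be a single string
  if (!is.character(pattern) || length(pattern) != 1L) {
    stop("pattern must be a single string.", call. = FALSE)
  }
  sum(sales_totals[grepl(pattern, names(sales_totals))])
}

sales <- c(north = 10, south = 25, northwest = 100)
north_total <- total_sales_for(sales, "^north")

bad_pattern <- tryCatch(
  total_sales_for(sales, c("^north", "^south")),
  error = function(e) conditionMessage(e)
)
```

Each hand-written check needs its own condition and its own message, which is the repetition that a library of ready-made assertions removes.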
Since this is the first public release of assertive, it hasn’t been widely tested. I’ve written a moderately comprehensive unit-test suite, but there are likely to be a few minor bugs here and there. In particular, I suspect there may be one or two typos in the documentation. Please give the package a try, and let me know if you find any errors, or if you want any other functions adding.