
The state of assertions in R

“Assertion” is computer-science jargon for a run-time check on your code. In R, this typically means function argument checks (“did they pass a numeric vector rather than a character vector into your function?”) and data quality checks (“does the date-of-birth column contain values in the past?”).
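As a rough illustration of both kinds of check, here is a minimal sketch using only base R's stopifnot. The function geometric_mean and the people data frame are made up for the example; none of the packages discussed below is needed to run it.

```r
# Argument check: fail early unless the input is a numeric vector
# of positive values.
geometric_mean <- function(x) {
  stopifnot(is.numeric(x), all(x > 0, na.rm = TRUE))
  exp(mean(log(x), na.rm = TRUE))
}

# Data quality check: every date of birth should be in the past.
# `people` is made-up data for illustration.
people <- data.frame(
  name = c("Ann", "Bob"),
  date_of_birth = as.Date(c("1980-02-29", "1975-11-05"))
)
stopifnot(all(people$date_of_birth < Sys.Date()))

geometric_mean(c(1, 4, 16))
# geometric_mean("a") stops with an error, because "a" is not numeric
```

The dedicated packages mostly improve on this pattern with more readable function names and more informative error messages.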

The four packages

R currently has four packages for assertions: assertive, which is mine; assertthat by Hadley Wickham; assertr by Tony Fischetti; and ensurer by Stefan Bache.

Having four packages feels like too many; we’re duplicating effort, and it makes package choice too hard for users. I didn’t know about the existence of assertr or ensurer until a couple of days ago, but the useR conference has helped bring these rivals to my attention. I’ve chatted with the authors of the other three packages to see if we can streamline things a little.

Hadley said that assertthat isn’t a high priority for him – dplyr, ggplot2 and tidyr (among many others) are more important – so he’s not going to develop it further. Since assertthat is mostly a subset of assertive anyway, this shouldn’t be a problem. I’ll take a look at how easy it is to provide an assertthat API, so existing users can have a direct replacement.

Tony said that the focus of assertr is predominantly data checking. It only works with data frames, and has a more limited remit than assertive. He plans to change the backend to be built on top of assertive. That is, assertr will be an assertive extension that makes it easy to apply assertions to multiple columns in data frames.

Stefan has stated that he prefers to keep ensurer separate, since it has a different philosophical stance to assertive, and I agree. ensurer is optimised for being lightweight and elegant; assertive is optimised for clarity of user code and clarity of error messages (at a cost of some bulk).

So overall, we’re down from four distinct assertion packages to two groups (assertive/assertr and ensurer). This feels sensible. It’s the optimum number for minimizing duplication while still having some competition to spur development onwards.

The assertive development plan

ensurer has one feature in particular that I definitely want to include in assertive: you can create type-safe functions.
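To give an idea of what a type-safe function means here, the following is a minimal base R sketch, not ensurer's actual API. The helper type_safe is a hypothetical name: it wraps a function so that the return value has to pass a check before it is handed back to the caller.

```r
# Hypothetical helper (not from ensurer or assertive): wrap `f` so that
# its return value must satisfy `check` before it is returned.
type_safe <- function(f, check) {
  force(f)
  force(check)
  function(...) {
    result <- f(...)
    if (!isTRUE(check(result))) {
      stop("the return value failed the type check", call. = FALSE)
    }
    result
  }
}

# sqrt returns a numeric vector, so this wrapper passes values through.
safe_sqrt <- type_safe(sqrt, is.numeric)
safe_sqrt(16)

# paste returns a character vector, so this wrapper always stops.
bad_paste <- type_safe(paste, is.numeric)
# bad_paste("a", "b") stops with an error
```

The caller can then rely on the type of the result without checking it again.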

The question of assertive’s bulk has also been playing on my mind for a while. The package isn’t huge by any means – the tar.gz file is 836kB – but the number of functions can make it a little difficult for new users to find their way around. A couple of years ago when I was working with a lot of customer data, I included functions for checking things like the validity of UK postcodes. These are things that I’m unlikely to use at all in my current job, so it seems superfluous to have them. That means that I’d like to make assertive more modular. The core things should be available in an assertive.base package, with specialist assertions in additional packages.

I also want to make it easier for other package developers to include their own assertions in their packages. This will require a bit of rethinking about how the existing assertion engine works, and what internal bits I need to expose.

One bit of feedback I got from the attendees at my tutorial this week was that for simulation usage (where you call the same function millions of times), assertions can slow down the code too much. So a way to turn off the assertions (but keep them there for debugging purposes) would be useful.
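One possible way to do this, sketched in base R, is to guard every check with a global option. The option name myassert.enabled and the helper check_all are hypothetical, and this is not how assertive currently works. Because R evaluates function arguments lazily, the condition is never computed when the checks are switched off, so the cost of a disabled assertion is close to that of one function call.

```r
# Hypothetical option name; checks are on unless it is set to FALSE.
assertions_enabled <- function() {
  isTRUE(getOption("myassert.enabled", default = TRUE))
}

# Hypothetical assertion: `condition` is only evaluated when checks are on,
# because && short-circuits before the lazy argument is touched.
check_all <- function(condition, msg = "assertion failed") {
  if (assertions_enabled() && !all(condition)) {
    stop(msg, call. = FALSE)
  }
  invisible(TRUE)
}

simulate_step <- function(x) {
  check_all(x >= 0, "x must be non-negative")
  sqrt(x)
}

options(myassert.enabled = FALSE)  # skip the checks inside a long simulation
simulate_step(4)
options(myassert.enabled = TRUE)   # switch them back on for debugging
# simulate_step(-1) now stops with "x must be non-negative"
```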

The top feature request, however, was for pipe compatibility. Stefan’s magrittr package has rocketed in popularity (I’m a huge fan), so this definitely needs implementing. It should be a small fix, so I should have it included soon.
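Pipe compatibility mostly comes down to the assertion returning its input when the check passes, so the value can flow on to the next step. Here is a minimal sketch with a hypothetical check_is_numeric function (not an assertive function). The pipe is only used if magrittr happens to be installed; otherwise the equivalent nested call is run.

```r
# Hypothetical assertion: check the input, then return it unchanged
# so that it can sit in the middle of a pipeline.
check_is_numeric <- function(x) {
  if (!is.numeric(x)) {
    stop("x is not numeric; it has class ", class(x)[1], call. = FALSE)
  }
  invisible(x)
}

if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  roots <- c(1, 4, 9) %>% check_is_numeric() %>% sqrt()
} else {
  # The same computation written as a nested call.
  roots <- sqrt(check_is_numeric(c(1, 4, 9)))
}
roots
```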

There are some other small fixes like better NA handling and a better error message for is_in_range that I plan to make soon.

The final (rather non-trivial) feature I want to add to assertive is support for error messages in multiple languages. The infrastructure is in place for translations (it currently supports both the languages that I know: British English and American English); I just need some people who can speak other languages to do the translations. If you are interested in translating, drop me an email or let me know in the comments.

  1. 4th July, 2015 at 12:13 pm

    Hi Richie, was good to meet you at useR and thanks for the nice tutorial.

    To trim down the number of functions, how about something like an assert_special() function with argument “type” that could then be e.g. c(“UK_Postcode”, “UK_tel”, “US_Zip”, …) so users can easily remember the function name, then they just have to look at the help file to check which option they need.

    • 5th July, 2015 at 22:53 pm

      Yeah, there are a few cases where it is easy to replace multiple functions with a single function that has more arguments. For example, I could replace is_windows, is_mac, is_linux, etc., with is_os(“myos”). I’ve mostly been reluctant to do that because you lose autocompletion, and I’m a lousy typist. If you think a smaller package is worth the extra typing effort, I’d be happy to consider it.

  2. Denes
    4th July, 2015 at 12:58 pm

    Hi, you should also consider the ‘checkmate’ package (http://cran.r-project.org/web/packages/checkmate/index.html & https://github.com/mllg/checkmate). IMHO this is the best package available for fast and concise check of function arguments.

    • 5th July, 2015 at 22:48 pm

      Hmm. I had at least two conversations with Bernd Bischl this week (one of the checkmate authors), and somehow I still didn’t hear about his package. Thanks for letting me know.

  3. 6th July, 2015 at 12:13 pm

    Hi Richie, I think this is great. Do you have any plans to extend the assertion functionality towards ideas of Design by Contract (https://en.wikipedia.org/wiki/Design_by_contract)? It is probably not 100% comparable, but assertions at the beginning of a function could be seen as pre-conditions and assertions at the end of a function would most likely be post-conditions. Class invariants are probably more difficult to cover.
    Best regards, Peter

