One of the bits of feedback that I got from the useR conference was that my
assertive package, for run-time testing of code, was too big. Since then I’ve been working to modularise it, and the results are now on CRAN.
Now, rather than one big package, there are fifteen
assertive.* packages for specific pieces of functionality. For example,
assertive.numbers contains functionality for checking numeric vectors,
assertive.datetimes contains functionality for checking dates and times, and
assertive.properties contains functionality for checking names, attributes, and lengths.
This finer-grained approach means that if you want to develop a package with assertions, you can choose only the parts that you need, allowing for a much smaller footprint.
The pathological package, which depends upon
assertive, gives you an example of how to do this.
The assertive package itself now contains no functions of its own – it merely imports and re-exports the functions from the other fifteen packages. This means that if you are working with
assertive interactively, you can still simply type
library(assertive) and have access to all the functionality.
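As a rough sketch of what picking only the parts you need might look like, here is a small function that calls two of the component packages directly. The helper name geometric_mean is made up for this illustration, and the example assumes that assert_is_numeric lives in assertive.types and assert_all_are_positive in assertive.numbers.

```r
# geometric_mean is a hypothetical helper, made up for this illustration.
geometric_mean <- function(x) {
  # Fail early, with a readable message, if the input is unsuitable.
  assertive.types::assert_is_numeric(x)
  assertive.numbers::assert_all_are_positive(x)
  exp(mean(log(x)))
}

# Only run the demo if the two component packages are installed.
have_pkgs <- requireNamespace("assertive.types", quietly = TRUE) &&
  requireNamespace("assertive.numbers", quietly = TRUE)
if (have_pkgs) {
  print(geometric_mean(c(1, 10, 100)))
  # Non-positive input should be rejected by the assertion.
  bad <- try(geometric_mean(c(1, -10)), silent = TRUE)
  print(inherits(bad, "try-error"))
}
```

A package built this way would only need to list those two components as dependencies, rather than all of assertive.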
Over the last week or two I’ve been pushing all my packages to CRAN.
pathological (for working with file paths),
runittotestthat (for converting RUnit tests to testthat tests), and
regex (for building regular expressions in a human-readable way) all make their CRAN debuts.
assertive (for run-time testing of your code) has more checks for the state of your R setup (r_has_png_capability, and many more), checks for the state of your variables (are_same_length, etc.), and utilities.
sig (for checking that your function signatures are sensible) now works with primitive functions too.
learningr (to accompany the book) has a reference URL fix but is otherwise the same.
I encourage you to take a look at some or all of them, and give me feedback.
I was recently hunting for a function that will strip the extension from a filename, leaving just foo, and so forth. I was knitting a report, and wanted to replace the file extension of the input file with that of the output file. (knitr handles this automatically in most cases, but I had some custom logic in there that meant I had to work things out manually.)
Finding file extensions is such a common task that I figured that someone must have written a function to solve the problem already. A quick search using
findFn("file extension") from the
sos package revealed a few thousand hits. There’s a lot of noise in there, but I found a few promising candidates.
removeExt in the
limma package (you can find it on Bioconductor),
remove_file_extension which has identical copies in both
extension in the
To save you the time and effort, I’ve tried them all, and unfortunately they all suck.
At a bare minimum, a file extension stripper needs to be vectorized, deal with different file extensions within that vector, deal with multiple levels of extension (for things like “tar.gz” files), and with filenames with dots in the name other than the extension, and with missing values, and with directories. OK, that’s quite a few things but I’m picky.
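To make that wish list concrete, here is a rough base R sketch of a stripper that tries to tick those boxes. It is not how pathological does it. The helper name strip_ext_sketch is hypothetical, and it assumes that an extension is a trailing run of dot-plus-alphanumeric chunks, which is a guess rather than a rule.

```r
# strip_ext_sketch is a hypothetical helper; it is NOT the pathological code.
# Assumption: an extension is a trailing run of ".alnum" chunks that follows
# at least one character other than a dot or a path separator.
strip_ext_sketch <- function(x) {
  x <- as.character(x)
  # Work out which inputs are existing directories, skipping missing values.
  is_dir <- rep(FALSE, length(x))
  ok <- !is.na(x)
  is_dir[ok] <- dir.exists(x[ok])
  # Lazy prefix, then every trailing ".ext" chunk is dropped (NA stays NA).
  stripped <- sub(
    "^(.*?[^./\\\\])(?:\\.[[:alnum:]]+)+$", "\\1", x, perl = TRUE
  )
  # Existing directories are returned untouched.
  stripped[is_dir] <- x[is_dir]
  stripped
}

strip_ext_sketch(c("foo.tgz", "bar.tar.gz", "baz", "quux. quuux.tbz2", NA))
```

Only directories that actually exist on disk are recognised, and a filename such as foo.bar.txt would lose both trailing chunks, so this heuristic is far from bulletproof.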
Since all the existing options failed, I’ve made my own function. In fact, I went overboard and created a package of path manipulation utilities, the
pathological package. It isn’t on CRAN yet, but you can install it via:
It’s been a while since I’ve used MATLAB, but I have fond recollections of its
fileparts function that splits a path up into the directory, filename and extension.
The pathological equivalent is decompose_path, which returns a
data.frame with three columns: dirname, filename, and extension.
    library(pathological)
    x <- c(
      "somedir/foo.tgz",         # single extension
      "another dir\\bar.tar.gz", # double extension
      "baz",                     # no extension
      "quux. quuux.tbz2",        # single ext, dots in filename
      R.home(),                  # a dir
      "~",                       # another dir
      "~/quuuux.tar.xz",         # a file in a dir
      "",                        # empty
      ".",                       # current dir
      "..",                      # parent dir
      NA_character_              # missing
    )
    (decomposed <- decompose_path(x))
    ##                         dirname                      filename      extension
    ## somedir/foo.tgz         "d:/workspace/somedir"       "foo"         "tgz"
    ## another dir\\bar.tar.gz "d:/workspace/another dir"   "bar"         "tar.gz"
    ## baz                     "d:/workspace"               "baz"         ""
    ## quux. quuux.tbz2        "d:/workspace"               "quux. quuux" "tbz2"
    ## C:/PROGRA~1/R/R-31~1.0  "C:/Program Files/R/R-3.1.0" ""            ""
    ## ~                       "C:/Users/richie/Documents"  ""            ""
    ## ~/quuuux.tar.xz         "C:/Users/richie/Documents"  "quuuux"      "tar.xz"
    ##                         ""                           ""            ""
    ## .                       "d:/workspace"               ""            ""
    ## ..                      "d:/"                        ""            ""
    ## <NA>                    NA                           NA            NA
    ## attr(,"class")
    ## [1] "decomposed_path" "matrix"
There are some shortcut functions to get at different parts of the filename:
    get_extension(x)
    ##         somedir/foo.tgz another dir\\bar.tar.gz                     baz
    ##                   "tgz"                "tar.gz"                      ""
    ##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~
    ##                  "tbz2"                      ""                      ""
    ##         ~/quuuux.tar.xz                                               .
    ##                "tar.xz"                      ""                      ""
    ##                      ..                    <NA>
    ##                      ""                      NA

    strip_extension(x)
    ##  [1] "d:/workspace/somedir/foo"         "d:/workspace/another dir/bar"
    ##  [3] "d:/workspace/baz"                 "d:/workspace/quux. quuux"
    ##  [5] "C:/Program Files/R/R-3.1.0"       "C:/Users/richie/Documents"
    ##  [7] "C:/Users/richie/Documents/quuuux" "/"
    ##  [9] "d:/workspace"                     "d:/"
    ## [11] NA

    strip_extension(x, include_dir = FALSE)
    ##         somedir/foo.tgz another dir\\bar.tar.gz                     baz
    ##                   "foo"                   "bar"                   "baz"
    ##        quux. quuux.tbz2  C:/PROGRA~1/R/R-31~1.0                       ~
    ##           "quux. quuux"                      ""                      ""
    ##         ~/quuuux.tar.xz                                               .
    ##                "quuuux"                      ""                      ""
    ##                      ..                    <NA>
    ##                      ""                      NA
You can also get your original file location (in a standardised form) using recompose_path:
    recompose_path(decomposed)
    ##  [1] "d:/workspace/somedir/foo.tgz"
    ##  [2] "d:/workspace/another dir/bar.tar.gz"
    ##  [3] "d:/workspace/baz"
    ##  [4] "d:/workspace/quux. quuux.tbz2"
    ##  [5] "C:/Program Files/R/R-3.1.0"
    ##  [6] "C:/Users/richie/Documents"
    ##  [7] "C:/Users/richie/Documents/quuuux.tar.xz"
    ##  [8] "/"
    ##  [9] "d:/workspace"
    ## [10] "d:/"
    ## [11] NA
The package also contains a few other path utilities. The standardisation I mentioned comes from
standardise_path (standardize_path is also available for Americans), and there’s a
dir_copy function for copying directories.
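For a feel of what path standardisation involves, here is a minimal base R sketch built on normalizePath. The helper name standardise_sketch is hypothetical, and this only approximates the idea; the behaviour of the pathological functions may well differ, for instance in how empty strings are treated.

```r
# standardise_sketch is a hypothetical helper built on base R only;
# it is an approximation, not the pathological implementation.
standardise_sketch <- function(x) {
  x <- as.character(x)
  out <- x
  ok <- !is.na(x)
  # Expand "~", ".", and "..", and use forward slashes on every platform.
  # mustWork = FALSE means paths that do not exist are allowed through.
  out[ok] <- normalizePath(x[ok], winslash = "/", mustWork = FALSE)
  out
}

standardise_sketch(c(".", "~", NA))
```

Missing values are passed through untouched, which mirrors the NA handling shown in the output above.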
It’s brand new, so after I’ve complained about other people’s code, I’m sure karma will ensure that you’ll find a bug or two, but I hope you find it useful.