Really useful bits of code that are missing from R
There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere.
Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.
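geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
  # ... is forwarded to mean() only; forwarding it to log() as well could
  # change the base of the logarithm and break the exp() back-transform
  exp(mean(log(x), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE)
{
  # sd() accepts no arguments beyond x and na.rm, so nothing else is forwarded
  exp(sd(log(x), na.rm = na.rm))
}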
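As a quick sanity check, here is a minimal sketch with made-up data that is evenly spaced on the log scale, so both answers are easy to verify by hand. The two functions are restated compactly so the snippet runs on its own; the variable names are just for illustration.

```r
# Compact restatements of the two functions so this snippet is self-contained.
geomean <- function(x, na.rm = FALSE, trim = 0) exp(mean(log(x), na.rm = na.rm, trim = trim))
geosd <- function(x, na.rm = FALSE) exp(sd(log(x), na.rm = na.rm))

x <- c(1, 10, 100)   # illustrative data: logs are 0, log(10), 2 * log(10)

gm <- geomean(x)     # approximately 10, the middle value on the log scale
gs <- geosd(x)       # approximately 10 too, because sd(log(x)) equals log(10)

# Missing values are dropped only if you ask for it, as with mean() and sd().
gm_na <- geomean(c(x, NA), na.rm = TRUE)
```

Note that the geometric standard deviation is a multiplicative factor, not an amount in the units of x.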
A drop option for nlevels. Sure, your factor has 99 levels, but how many of them actually crop up in your dataset?
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])
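A small sketch of how this behaves, using an invented factor with two unused levels. The definition is repeated so the snippet stands alone; note that it masks base::nlevels in whatever environment you define it in.

```r
# Same one-liner as above, repeated so this snippet is self-contained.
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])

# Illustrative factor: four declared levels, only two of which occur.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c", "d"))

n_all  <- nlevels(f)               # counts every declared level
n_used <- nlevels(f, drop = TRUE)  # counts only the levels present in the data
```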
A way of converting factors to numbers that is quicker than as.numeric(as.character(my_factor)) and easier to remember than the method suggested in the FAQ on R.
factor2numeric <- function(f)
{
  if(!is.factor(f)) stop("the input must be a factor")
  as.numeric(levels(f))[as.integer(f)]
}
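To show why the conversion needs care at all, here is a minimal sketch with a made-up factor. Calling as.numeric directly on a factor gives the internal integer codes rather than the numbers in the labels.

```r
# Definition repeated so this snippet is self-contained.
factor2numeric <- function(f)
{
  if(!is.factor(f)) stop("the input must be a factor")
  as.numeric(levels(f))[as.integer(f)]
}

f <- factor(c("10", "2.5", "10", "100"))   # illustrative data

codes  <- as.numeric(f)       # internal level codes, not the original values
values <- factor2numeric(f)   # the numbers stored in the labels
slow   <- as.numeric(as.character(f))  # the long-winded equivalent
```

The shortcut converts each distinct level once and then indexes into that result, rather than converting every element.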
A “not in” operator. Not many people know the precedence rules well enough to know that !x %in% y means !(x %in% y) rather than (!x) %in% y, but x %!in% y should be clear to all.
"%!in%" <- function(x, y) !(x %in% y)
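For anyone unsure about the precedence, this sketch compares the two readings on some invented vectors. The input is chosen so that the two parses give different answers.

```r
# Definition repeated so this snippet is self-contained.
"%!in%" <- function(x, y) !(x %in% y)

x <- c(0, 1, 2)   # illustrative values
y <- c(1, 5)

as_parsed <- !x %in% y     # R reads this as !(x %in% y)
explicit  <- !(x %in% y)
other     <- (!x) %in% y   # negates x first, then looks the logicals up in y
readable  <- x %!in% y
```

Here as_parsed, explicit and readable agree, while other differs in its last element.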
I’m sure there are loads more snippets like this that would be useful to have; please contribute your own in the comments.
EDIT:
Thanks for all your suggestions. I had another idea while drifting off to sleep last night. The error message thrown by stopifnot is a little clunky, which means that I end up with lots of instances of if(!some_condition) stop("A nicer error message"). The factor2numeric function above is typical. If stopifnot allowed for custom error messages, I’d be much more inclined to use it.
stopifnot <- function (..., errmsg = NULL)
{
  n <- length(ll <- list(...))
  if (n == 0L)
    return(invisible())
  mc <- match.call()
  for (i in 1L:n) if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) &&
                        all(r))) {
    ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
    if (length(ch) > 1L)
      ch <- paste(ch[1L], "....")
    if(is.null(errmsg)) errmsg <- paste(ch, " is not ", if (length(r) > 1L)
      "all ", "TRUE", sep = "")
    stop(errmsg, call. = FALSE)
  }
  invisible()
}
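Here is a rough sketch of how the modified version might be used. The function is repeated so the snippet runs on its own (it masks base::stopifnot where it is defined), and the tryCatch wrappers are only there to capture the messages for inspection.

```r
# Modified stopifnot from above, repeated so this snippet is self-contained.
stopifnot <- function (..., errmsg = NULL)
{
  n <- length(ll <- list(...))
  if (n == 0L)
    return(invisible())
  mc <- match.call()
  for (i in 1L:n) if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) &&
                        all(r))) {
    ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
    if (length(ch) > 1L)
      ch <- paste(ch[1L], "....")
    if(is.null(errmsg)) errmsg <- paste(ch, " is not ", if (length(r) > 1L)
      "all ", "TRUE", sep = "")
    stop(errmsg, call. = FALSE)
  }
  invisible()
}

# With a custom message: the error carries the text we supplied.
custom_msg <- tryCatch(
  { stopifnot(is.factor(1:3), errmsg = "the input must be a factor"); "no error" },
  error = function(e) conditionMessage(e)
)

# Without one: falls back to the usual "<expression> is not TRUE" wording.
default_msg <- tryCatch(
  { stopifnot(1 > 2); "no error" },
  error = function(e) conditionMessage(e)
)
```

One limitation of this design is that a single errmsg is shared by every condition in the call, so it works best with one condition per call.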
ANOTHER EDIT:
Once you start thinking about this, it’s really easy to keep coming up with ideas. Checking to see if an object is scalar is easy – it just has to have length 1.
is.scalar <- function(x) length(x) == 1

A little bit easier and quicker is:
as.numeric(levels(f))[f]
Every time I use:
as.numeric(as.character(my_factor))
I think I must be doing something wrong.
Nice post. For `nlevels`, consider the new function `droplevels` in R, so we could have something like:
nlevels <- function(x, drop = FALSE, ...) {
  if(drop)
    base::nlevels(droplevels(x, ...))
  else
    base::nlevels(x)
}
The geomean function won’t cover you if you have negative values; that can get trickier, and it is a problem I run into sometimes.
http://www.buzzardsbay.org/geomean.htm#negative_values
If you input negative numbers to geomean then it returns NaN, which I think is the correct behaviour. You can persuade it to give you a numeric answer by converting to complex numbers, e.g., geomean(as.complex(-1:-5)).
Nice topic!
Here is one I recently found useful:
labels.hclust <- function(object, ...) as.character(object$labels)
I wonder where we are supposed to propose such functions…
Nice snippet. To get things into R, your choices are probably
1. Post on R-devel and hope someone likes the idea. I get the feeling that you have to regularly contribute there before feature requests will be taken seriously.
2. Pick a member of R-core who may be interested and contact them directly. You’ll either benefit from the personal touch or be added to their email block list.
For the “not in” operator I use code based directly on “%in%”:
match(x, table, nomatch = 0L) == 0L
Hola, Reading this website is a real pleasure, thanks !
There are many things I dislike about SAS and Visual Basic, but one thing I do like is the presence of a string concatenation operator: || in the former, + in the latter. Something similar in R might be
`%+%` <- function(a, b) paste(a, b, sep="")
which would let you do
"hello " %+% "world"
"var" %+% 1:3 # var1 var2 var3
Excellent idea. It’s a shame that + doesn’t dispatch S3 methods on plain character vectors; otherwise we could define `+.character` to save bothering with the % signs.
^ That should be & and not + for the VB concatenation operator