## Really useful bits of code that are missing from R

There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere.

Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.

    geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
      exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
    }

    geosd <- function(x, na.rm = FALSE, ...) {
      exp(sd(log(x, ...), na.rm = na.rm, ...))
    }
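As a quick sanity check, here is a minimal sketch of the two functions in use. The definitions are repeated so the snippet runs on its own, and the sample vector `x` is made up for illustration.

```r
# Definitions repeated from above so this runs on its own.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
  exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...) {
  exp(sd(log(x, ...), na.rm = na.rm, ...))
}

# Made-up data: three values evenly spaced on the log scale.
x <- c(1, 10, 100)

geomean(x)                       # approximately 10, versus mean(x) = 37
geosd(x)                         # approximately 10 (a multiplicative spread)
geomean(c(x, NA), na.rm = TRUE)  # the NA is dropped before averaging
```

Because the data are evenly spaced on the log scale, the geometric mean lands on the middle value, which the arithmetic mean does not.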

A drop option for `nlevels`. Sure your factor has 99 levels, but how many of them actually crop up in your dataset?

    nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])
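To see the difference, here is a small sketch with a made-up factor `f` that declares three levels but only uses two. The wrapper is repeated so the snippet is self-contained.

```r
# Wrapper repeated from above; it masks base::nlevels in this session.
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])

# Made-up factor: level "c" is declared but never observed.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

nlevels(f)               # 3: counts every declared level
nlevels(f, drop = TRUE)  # 2: counts only the levels that occur
```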

A way of converting factors to numbers that is quicker than `as.numeric(as.character(my_factor))` and easier to remember than the method suggested in the FAQ on R.

    factor2numeric <- function(f) {
      if (!is.factor(f)) stop("the input must be a factor")
      as.numeric(levels(f))[as.integer(f)]
    }
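The reason a helper is needed at all is that calling `as.numeric` directly on a factor returns the internal integer codes rather than the numbers written on the labels. A minimal sketch, with a made-up factor `f`:

```r
# Function repeated from above so this runs on its own.
factor2numeric <- function(f) {
  if (!is.factor(f)) stop("the input must be a factor")
  as.numeric(levels(f))[as.integer(f)]
}

# Made-up factor whose labels happen to be numbers.
f <- factor(c("10", "20", "10", "5"))

as.numeric(f)      # 1 2 1 3: the internal codes, not the values
factor2numeric(f)  # 10 20 10 5
```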

A “not in” operator. Not many people know the precedence rules well enough to know that `!x %in% y` means `!(x %in% y)` rather than `(!x) %in% y`, but `x %!in% y` should be clear to all.

    "%!in%" <- function(x, y) !(x %in% y)

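The precedence point can be checked directly. This is a small sketch with made-up vectors `x` and `y`, with the operator repeated so the snippet runs on its own.

```r
# Operator repeated from above so this runs on its own.
"%!in%" <- function(x, y) !(x %in% y)

x <- c(1, 5)
y <- 1:3

x %!in% y                          # FALSE TRUE
identical(!x %in% y, !(x %in% y))  # TRUE: %in% binds tighter than !
(!x) %in% y                        # FALSE FALSE: a different question entirely
```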
I’m sure there are loads more snippets like this that would be useful to have; please contribute your own in the comments.

EDIT:

Thanks for all your suggestions. I had another idea while drifting off to sleep last night. The error message thrown by `stopifnot` is a little clunky, which means that I end up with lots of instances of `if(!some_condition) stop("A nicer error message")`. The `factor2numeric` function above is typical. If `stopifnot` allowed for custom error messages, I’d be much more inclined to use it.

    stopifnot <- function(..., errmsg = NULL) {
      n <- length(ll <- list(...))
      if (n == 0L) return(invisible())
      mc <- match.call()
      for (i in 1L:n) {
        if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && all(r))) {
          ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
          if (length(ch) > 1L) ch <- paste(ch[1L], "....")
          if (is.null(errmsg)) {
            errmsg <- paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", sep = "")
          }
          stop(errmsg, call. = FALSE)
        }
      }
      invisible()
    }
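For illustration, here is a sketch of how the `errmsg` argument would be used. The function is repeated so the snippet runs on its own, the value of `x` is made up, and `tryCatch` is only there to capture the message text instead of halting.

```r
# Function repeated from above so this runs on its own.
# Note that it masks base::stopifnot for the rest of the session.
stopifnot <- function(..., errmsg = NULL) {
  n <- length(ll <- list(...))
  if (n == 0L) return(invisible())
  mc <- match.call()
  for (i in 1L:n) {
    if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && all(r))) {
      ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
      if (length(ch) > 1L) ch <- paste(ch[1L], "....")
      if (is.null(errmsg)) {
        errmsg <- paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", sep = "")
      }
      stop(errmsg, call. = FALSE)
    }
  }
  invisible()
}

x <- -1  # made-up value that fails the check

# Capture the error messages rather than stopping the script.
default_msg <- tryCatch(stopifnot(x > 0), error = conditionMessage)
custom_msg <- tryCatch(
  stopifnot(x > 0, errmsg = "x must be positive"),
  error = conditionMessage
)

default_msg  # "x > 0 is not TRUE"
custom_msg   # "x must be positive"
```

One thing to be aware of with this design is that a single `errmsg` applies to every condition passed in the same call, so it reads best with one condition per call.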

ANOTHER EDIT:

Once you start thinking about this, it’s really easy to keep coming up with ideas. Checking to see if an object is scalar is easy – it just has to have length 1.

    is.scalar <- function(x) length(x) == 1
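A few made-up inputs show what this definition accepts. Note that it only looks at length, so a one-element list also counts as scalar here.

```r
# Function repeated from above so this runs on its own.
is.scalar <- function(x) length(x) == 1

is.scalar(42)         # TRUE
is.scalar("a")        # TRUE
is.scalar(1:3)        # FALSE
is.scalar(NULL)       # FALSE: length 0
is.scalar(list(1:3))  # TRUE: a list holding a single element
```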

A little bit easier and quicker is:

    as.numeric(levels(f))[f]

Every time I use `as.numeric(as.character(my_factor))` I think I must be doing something wrong.

Nice post. For `nlevels`, consider the new function `droplevels` in R, so we could have something like:

    nlevels <- function(x, drop = FALSE, ...) {
      if (drop) {
        base::nlevels(droplevels(x, ...))
      } else {
        base::nlevels(x)
      }
    }

The `geomean` function won’t cover you if you have negative values. That can get trickier, and it is a problem I run into sometimes.

http://www.buzzardsbay.org/geomean.htm#negative_values

If you input negative numbers to `geomean` then it returns `NaN`, which I think is the correct behaviour. You can persuade it to give you a numeric answer by converting to complex numbers, e.g., `geomean(as.complex(-1:-5))`.
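Both behaviours are easy to check. A minimal sketch, with `geomean` repeated so it runs on its own:

```r
# Function repeated from the post so this runs on its own.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
  exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}

# log() of a negative real number is NaN (with a warning), so the result is NaN.
suppressWarnings(geomean(-1:-5))

# Complex logarithms are defined for negative inputs, so this returns a
# complex number whose real part is minus the geometric mean of 1:5.
z <- geomean(as.complex(-1:-5))
Re(z)
-geomean(1:5)
```

Whether a negative "geometric mean" is meaningful for your data is a separate question; this only shows what the function does.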

Nice topic!

Here is one I recently found useful:

    labels.hclust <- function(object, ...) as.character(object$labels)

I wonder where we are supposed to propose such functions…

Nice snippet. To get things into R, your choices are probably:

1. Post on R-devel and hope someone likes the idea. I get the feeling that you have to regularly contribute there before feature requests will be taken seriously.

2. Pick a member of R-core who may be interested and contact them directly. You’ll either benefit from the personal touch or be added to their email block list.

For a “not in” operator, I use code based directly on `%in%`:

    match(x, table, nomatch = 0L) == 0L
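This mirrors how `%in%` itself is defined in base R (as `match(x, table, nomatch = 0L) > 0L`), so the two approaches should agree. A quick sketch with made-up vectors:

```r
# Made-up inputs, including a missing value.
x <- c("a", "z", NA)
table <- c("a", "b")

match(x, table, nomatch = 0L) == 0L  # FALSE TRUE TRUE
!(x %in% table)                      # the same result
```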

Hello, reading this website is a real pleasure, thanks!

There are many things I dislike about SAS and Visual Basic, but one thing I do like is the presence of a string concatenation operator: `||` in the former, `+` in the latter. Something similar in R might be

    `%+%` <- function(a, b) paste(a, b, sep="")

which would let you do

    "hello " %+% "world"
    "var" %+% 1:3 # var1 var2 var3

Excellent idea. It’s a shame that `+` doesn’t dispatch S3 methods on plain character vectors, because then we could define `+.character` to save bothering with the % signs.

^ That should be & and not + for the VB concatenation operator