Really useful bits of code that are missing from R
There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere.
Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
  exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...) {
  exp(sd(log(x, ...), na.rm = na.rm, ...))
}
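As a quick sanity check, here is a minimal sketch of both functions in use. The definitions are repeated so the block runs on its own, and the sample vector is made up for illustration.

```r
# Definitions repeated from the post so that this block runs on its own.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
  exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}
geosd <- function(x, na.rm = FALSE, ...) {
  exp(sd(log(x, ...), na.rm = na.rm, ...))
}

# Illustrative data: three values evenly spaced on the log scale.
x <- c(1, 10, 100)

geomean(x)  # 10, up to floating-point error
geosd(x)    # also 10, because sd(log(x)) equals log(10) here

# Missing values are dropped only when asked for.
geomean(c(x, NA))                # NA
geomean(c(x, NA), na.rm = TRUE)  # 10, up to floating-point error
```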
A drop option for nlevels. Sure, your factor has 99 levels, but how many of them actually crop up in your dataset?
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])
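For instance, with a made-up factor that declares all 26 letters as levels but only uses three of them, the two counts differ. This is only a sketch; note that the definition masks base::nlevels for the rest of the session.

```r
# Definition repeated from the post; it masks base::nlevels in this session.
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])

# Illustrative factor: 26 possible levels, only 3 of which occur.
f <- factor(c("a", "b", "a", "c"), levels = letters)

nlevels(f)               # 26: every declared level
nlevels(f, drop = TRUE)  # 3: only the levels present in the data
```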
A way of converting factors to numbers that is quicker than as.numeric(as.character(my_factor)) and easier to remember than the method suggested in the FAQ on R.
factor2numeric <- function(f) {
  if (!is.factor(f)) stop("the input must be a factor")
  as.numeric(levels(f))[as.integer(f)]
}
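To see why a helper is worth having, here is a small sketch with an invented factor. Calling as.numeric directly on a factor returns the internal level codes rather than the values the labels spell out.

```r
# Definition repeated from the post so that this block runs on its own.
factor2numeric <- function(f) {
  if (!is.factor(f)) stop("the input must be a factor")
  as.numeric(levels(f))[as.integer(f)]
}

# Illustrative factor whose labels happen to be numbers.
f <- factor(c("10", "2.5", "10", "300"))

as.numeric(f)      # 1 2 1 3: the internal level codes, not the values
factor2numeric(f)  # 10.0 2.5 10.0 300.0
```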
A “not in” operator. Not many people know the precedence rules well enough to know that !x %in% y means !(x %in% y) rather than (!x) %in% y, but x %!in% y should be clear to all.
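The precedence point is easy to check at the console. A minimal sketch, using two made-up vectors:

```r
# Illustrative vectors; only the value 5 appears in both.
x <- c(0, 5, 9)
y <- c(5, 6, 7)

!x %in% y    # TRUE FALSE TRUE: parsed as !(x %in% y)
(!x) %in% y  # FALSE FALSE FALSE: !x is c(TRUE, FALSE, FALSE), and none of those is in y

identical(!x %in% y, !(x %in% y))  # TRUE
```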
"%!in%" <- function(x, y) !(x %in% y)
I’m sure there are loads more snippets like this that would be useful to have; please contribute your own in the comments.
EDIT:
Thanks for all your suggestions. I had another idea while drifting off to sleep last night. The error message thrown by stopifnot is a little clunky, which means that I end up with lots of instances of if(!some_condition) stop("A nicer error message"). The factor2numeric function above is typical. If stopifnot allowed for custom error messages, I’d be much more inclined to use it.
stopifnot <- function(..., errmsg = NULL) {
  n <- length(ll <- list(...))
  if (n == 0L) return(invisible())
  mc <- match.call()
  for (i in 1L:n) {
    if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && all(r))) {
      ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
      if (length(ch) > 1L) ch <- paste(ch[1L], "....")
      if (is.null(errmsg)) {
        errmsg <- paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", sep = "")
      }
      stop(errmsg, call. = FALSE)
    }
  }
  invisible()
}
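Here is a sketch of how the two behaviours compare. The definition is repeated so the block runs on its own, and error_message is a hypothetical helper, introduced only for this example, that returns the error text instead of halting.

```r
# Definition repeated from the post; it masks base::stopifnot in this session.
stopifnot <- function(..., errmsg = NULL) {
  n <- length(ll <- list(...))
  if (n == 0L) return(invisible())
  mc <- match.call()
  for (i in 1L:n) {
    if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && all(r))) {
      ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
      if (length(ch) > 1L) ch <- paste(ch[1L], "....")
      if (is.null(errmsg)) {
        errmsg <- paste(ch, " is not ", if (length(r) > 1L) "all ", "TRUE", sep = "")
      }
      stop(errmsg, call. = FALSE)
    }
  }
  invisible()
}

# Hypothetical helper for illustration: capture the message rather than stop.
error_message <- function(expr) tryCatch(expr, error = conditionMessage)

error_message(stopifnot(1 > 2))
# "1 > 2 is not TRUE"
error_message(stopifnot(1 > 2, errmsg = "A nicer error message"))
# "A nicer error message"
```

With this version, the check inside factor2numeric could be written as stopifnot(is.factor(f), errmsg = "the input must be a factor").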
ANOTHER EDIT:
Once you start thinking about this, it’s really easy to keep coming up with ideas. Checking to see if an object is scalar is easy – it just has to have length 1.
is.scalar <- function(x) length(x) == 1
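A few illustrative calls, as a sketch. The last two are worth knowing about: length counts elements, so a single string or a one-element list also passes.

```r
# Definition repeated from the post so that this block runs on its own.
is.scalar <- function(x) length(x) == 1

is.scalar(42)         # TRUE
is.scalar(c(1, 2))    # FALSE
is.scalar(NULL)       # FALSE: length 0
is.scalar("hello")    # TRUE: one string, however many characters it has
is.scalar(list(1:3))  # TRUE: a list with one element also has length 1
```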
A little bit easier and quicker is:
as.numeric(levels(f))[f]
Every time I use:
as.numeric(as.character(my_factor))
I think I must be doing something wrong.
Nice post. For `nlevels`, consider the new function `droplevels` in R, so we could have something like:
nlevels <- function(x, drop = FALSE, ...) {
  if (drop)
    base::nlevels(droplevels(x, ...))
  else
    base::nlevels(x)
}
The geomean function won’t cover you if you have negative values. That case can get trickier, and it is a problem I run into sometimes.
http://www.buzzardsbay.org/geomean.htm#negative_values
If you input negative numbers to geomean then it returns NaN, which I think is the correct behaviour. You can persuade it to give you a numeric answer by converting to complex numbers, e.g., geomean(as.complex(-1:-5)).
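Both behaviours can be sketched as follows. The definition is repeated so the block runs on its own, and the warning from log is suppressed only to keep the output tidy.

```r
# Definition repeated from the post so that this block runs on its own.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) {
  exp(mean(log(x, ...), na.rm = na.rm, trim = trim, ...))
}

# log() of a negative real number is NaN (with a warning), so the result is NaN.
suppressWarnings(geomean(-1:-5))  # NaN

# With complex input log() is defined, so a complex result comes back.
z <- geomean(as.complex(-1:-5))
Mod(z)  # matches geomean(1:5), the geometric mean of the absolute values
```

Whether the complex result is meaningful for your data is a separate question; the modulus at least agrees with the geometric mean of the magnitudes.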
Nice topic!
Here is one I recently found useful:
labels.hclust <- function(object, ...) as.character(object$labels)
I wonder where we are supposed to propose such functions…
Nice snippet. To get things into R, your choices are probably:
1. Post on R-devel and hope someone likes the idea. I get the feeling that you have to regularly contribute there before feature requests will be taken seriously.
2. Pick a member of R-core who may be interested and contact them directly. You’ll either benefit from the personal touch or be added to their email block list.
For the “not in” operator I use code based directly on “%in%”:
match(x, table, nomatch = 0L) == 0L
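Wrapped up as an operator, that expression might look like the sketch below. The name %notin% is hypothetical, chosen only for this example, and the vectors are made up.

```r
# Hypothetical operator name, used here only for illustration.
"%notin%" <- function(x, table) match(x, table, nomatch = 0L) == 0L

# Illustrative vectors; only the value 5 appears in both.
x <- c(0, 5, 9)
y <- c(5, 6, 7)

x %notin% y                          # TRUE FALSE TRUE
identical(x %notin% y, !(x %in% y))  # TRUE
```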
Hello, reading this website is a real pleasure, thanks!
There are many things I dislike about SAS and Visual Basic, but one thing I do like is the presence of a string concatenation operator: || in the former, + in the latter. Something similar in R might be
`%+%` <- function(a, b) paste(a, b, sep="")
which would let you do
"hello " %+% "world"
"var" %+% 1:3 # var1 var2 var3
Excellent idea. It’s a shame that + doesn’t dispatch S3 methods for plain character vectors, because otherwise we could define `+.character` to save bothering with the % signs.
^ That should be & and not + for the VB concatenation operator