Really useful bits of code that are missing from R

There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere.

Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.

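geomean <- function(x, na.rm = FALSE, trim = 0, ...)
{
   # Mean on the natural log scale, back-transformed with exp().
   # Extra arguments go to mean() only; a non-default log base would not match exp().
   exp(mean(log(x), na.rm = na.rm, trim = trim, ...))
}

geosd <- function(x, na.rm = FALSE) 
{
   # sd() accepts only x and na.rm, so there is nothing to pass through ...
   exp(sd(log(x), na.rm = na.rm))
}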
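
As a quick sanity check, here is a minimal sketch using simplified versions of the two functions (natural logs only, no trimming) on made-up data. The vector x is purely illustrative; it is evenly spaced on the log scale so the answers are easy to verify by hand.

```r
# Simplified versions of the functions above, repeated so this runs on its own
geomean <- function(x, na.rm = FALSE) exp(mean(log(x), na.rm = na.rm))
geosd   <- function(x, na.rm = FALSE) exp(sd(log(x), na.rm = na.rm))

x <- c(1, 10, 100)   # illustrative data, evenly spaced on the log scale

geomean(x)           # 10, up to floating point error
geosd(x)             # 10 as well: the logs are spaced log(10) apart
mean(x)              # 37, the arithmetic mean, pulled up by the largest value
```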

A drop option for nlevels. Sure your factor has 99 levels, but how many of them actually crop up in your dataset?

nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])
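
To see the difference, here is a small sketch with a made-up factor that declares three levels but only uses two of them. Note that the definition masks base::nlevels in the current environment.

```r
# The definition above, repeated so this runs on its own
nlevels <- function(x, drop = FALSE) base::nlevels(x[, drop = drop])

# Illustrative factor: three levels declared, only two observed
f <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

nlevels(f)                # 3, counts every declared level
nlevels(f, drop = TRUE)   # 2, counts only levels present in the data
```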

A way of converting factors to numbers that is quicker than as.numeric(as.character(my_factor)) and easier to remember than the method suggested in the FAQ on R.

factor2numeric <- function(f)
{
   if(!is.factor(f)) stop("the input must be a factor")
   as.numeric(levels(f))[as.integer(f)]
}
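
The reason a helper is needed at all is that calling as.numeric directly on a factor returns the internal level codes rather than the values. A short sketch with made-up data:

```r
# The definition above, repeated so this runs on its own
factor2numeric <- function(f)
{
   if(!is.factor(f)) stop("the input must be a factor")
   as.numeric(levels(f))[as.integer(f)]
}

f <- factor(c("10", "2.5", "10", "7"))   # illustrative data

as.numeric(f)        # 1 2 1 3: the internal level codes, not the values
factor2numeric(f)    # 10.0 2.5 10.0 7.0: the values themselves
```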

A “not in” operator. Not many people know the precedence rules well enough to know that !x %in% y means !(x %in% y) rather than (!x) %in% y, but x %!in% y should be clear to all.

"%!in%" <- function(x, y) !(x %in% y)

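The precedence point is easiest to see side by side. The vectors below are made up for illustration.

```r
# The definition above, repeated so this runs on its own
"%!in%" <- function(x, y) !(x %in% y)

x <- c(0, 2, 5)   # illustrative data
y <- c(2, 3)

!x %in% y      # TRUE FALSE TRUE, parsed as !(x %in% y)
(!x) %in% y    # FALSE FALSE FALSE, since !x is c(TRUE, FALSE, FALSE)
x %!in% y      # TRUE FALSE TRUE, same as the first line
```
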
I’m sure there are loads more snippets like this that would be useful to have; please contribute your own in the comments.

EDIT:
Thanks for all your suggestions. I had another idea while drifting off to sleep last night. The error message thrown by stopifnot is a little clunky, which means that I end up with lots of instances of if(!some_condition) stop("A nicer error message"). The factor2numeric function above is typical. If stopifnot allowed for custom error messages, I’d be much more inclined to use it.

stopifnot <- function (..., errmsg = NULL) 
{
    n <- length(ll <- list(...))
    if (n == 0L) 
        return(invisible())
    mc <- match.call()
    for (i in 1L:n) if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && 
        all(r))) {
        ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
        if (length(ch) > 1L) 
            ch <- paste(ch[1L], "....")
        if(is.null(errmsg)) errmsg <- paste(ch, " is not ", if (length(r) > 1L) 
            "all ", "TRUE", sep = "")
        stop(errmsg, call. = FALSE)
    }
    invisible()
}
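
Here is a minimal sketch of the two behaviours. It repeats the definition so that it runs on its own, and uses tryCatch with conditionMessage to capture the error text instead of halting. The failing condition and the custom message are made up for illustration.

```r
# The modified stopifnot from above, repeated so this runs on its own
stopifnot <- function (..., errmsg = NULL) 
{
    n <- length(ll <- list(...))
    if (n == 0L) 
        return(invisible())
    mc <- match.call()
    for (i in 1L:n) if (!(is.logical(r <- ll[[i]]) && !any(is.na(r)) && 
        all(r))) {
        ch <- deparse(mc[[i + 1]], width.cutoff = 60L)
        if (length(ch) > 1L) 
            ch <- paste(ch[1L], "....")
        if(is.null(errmsg)) errmsg <- paste(ch, " is not ", if (length(r) > 1L) 
            "all ", "TRUE", sep = "")
        stop(errmsg, call. = FALSE)
    }
    invisible()
}

# Without errmsg, the message is built from the failing expression
msg_default <- tryCatch(stopifnot(1 + 1 == 3), error = conditionMessage)
msg_default   # "1 + 1 == 3 is not TRUE"

# With errmsg, the custom text is used instead
msg_custom <- tryCatch(
    stopifnot(1 + 1 == 3, errmsg = "Arithmetic is broken"),
    error = conditionMessage
)
msg_custom    # "Arithmetic is broken"
```

One design consequence worth noting: a single errmsg applies to whichever condition fails first, so it works best with one condition per call.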

ANOTHER EDIT:
Once you start thinking about this, it’s really easy to keep coming up with ideas. Checking to see if an object is scalar is easy – it just has to have length 1.

is.scalar <- function(x) length(x) == 1
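
A few illustrative calls show what "length 1" does and does not cover:

```r
# The definition above, repeated so this runs on its own
is.scalar <- function(x) length(x) == 1

is.scalar(42)          # TRUE
is.scalar(1:3)         # FALSE
is.scalar(NULL)        # FALSE, length 0
is.scalar(list(1:3))   # TRUE: a list holding one element has length 1
is.scalar("hello")     # TRUE: one string, however many characters
```
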
  1. Wojtek Sobala
    10th January, 2011 at 15:49 pm

    A little bit easier and quicker is:
    as.numeric(levels(f))[f]

  2. 10th January, 2011 at 16:36 pm

    Every time I use: as.numeric(as.character(my_factor)) I think I must be doing something wrong.

  3. Gavin Simpson
    10th January, 2011 at 16:45 pm

    Nice post. For `nlevels`, consider the new function `droplevels` in R, so we could have something like :

    nlevels <- function(x, drop = FALSE, ...) {
        # Drop unused levels first when asked; extra arguments go to droplevels()
        if(drop)
            base::nlevels(droplevels(x, ...))
        else
            base::nlevels(x)
    }

  4. troy
    10th January, 2011 at 17:03 pm

    The geomean function won’t cover you if you have negative values, that can get trickier and is a problem I run into sometimes.

    http://www.buzzardsbay.org/geomean.htm#negative_values

    • 10th January, 2011 at 18:28 pm

      If you input negative numbers to geomean then it returns NaN, which I think is the correct behaviour. You can persuade it to give you a numeric answer by converting to complex numbers, e.g., geomean(as.complex(-1:-5)).

  5. 10th January, 2011 at 19:03 pm

    Nice topic!

    Here is one I recently found useful:

    labels.hclust <- function(object, ...) as.character(object$labels)

    I wonder where we are supposed to propose such functions…

    • 11th January, 2011 at 9:56 am

      Nice snippet. To get things into R, your choices are probably
      1. Post on R-devel and hope someone likes the idea. I get the feeling that you have to regularly contribute there before feature requests will be taken seriously.
      2. Pick a member of R-core who may be interested and contact them directly. You’ll either benefit from the personal touch or be added to their email block list.

  6. Marek
    10th January, 2011 at 22:06 pm

    For the “not in” operator I use code based directly on “%in%”:

    match(x, table, nomatch = 0L) == 0L

  7. 12th January, 2011 at 13:00 pm

    Hola, Reading this website is a real pleasure, thanks !

  8. Hong Ooi
    18th January, 2011 at 3:22 am

    There are many things I dislike about SAS and Visual Basic, but one thing I do like is the presence of a string concatenation operator: || in the former, + in the latter. Something similar in R might be

    `%+%` <- function(a, b) paste(a, b, sep="")

    which would let you do

    "hello " %+% "world"
    "var" %+% 1:3 # var1 var2 var3

    • 18th January, 2011 at 17:51 pm

      Excellent idea. It’s a shame that + isn’t an S3 method because then we could define `+.character` to save bothering with the % signs.

  9. Hong Ooi
    18th January, 2011 at 3:26 am

    ^ That should be & and not + for the VB concatenation operator
