R code style guide

Precedence of rules

  1. When extending existing code, follow the style of that code.
  2. Where there are no existing style rules, use this guide.
  3. If this guide doesn’t cover something, try another guide: the R internal coding standards, the Bioconductor coding standards, Google’s R style guide, or the guides by Hadley Wickham, Colin Gillespie, or Henrik Bengtsson (basic and Aroma).

Variable and function names

  1. Variables, in general, should be given meaningful and pronounceable names.
  2. Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.

#Good:
mean_height <- mean(height)
#Bad:
foo <- 1:10     #not meaningful

  1. Variable names, including function names, should be lower_under_cased, except
  2. Variables representing constants should be UPPER_UNDER_CASED, and
  3. Functions using S3 method dispatch must take the form generic_name.class, and
  4. Column and row names for data frames, matrices, etc., should be lower.dotted.cased.

#Good:
body_data <- data.frame(
  weight.in.kg = c(65, 83, 99),
  height.in.m  = c(1.56, 1.79, 1.86)
)
PI_SQUARED <- pi^2
mean.my_class <- function(x, ...)
{
  # some implementation
}
#Bad:
firstdayofweek <- 'Monday'   # wrong case: should be first_day_of_week

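A minimal sketch of the generic_name.class rule in action, assuming a made-up class called my_class and a hypothetical constructor new_my_class(). Because the method is named mean.my_class, calling the mean() generic on a my_class object dispatches to it. S3 dispatch relies on that dot, which is one reason to keep dots out of ordinary function names.

```r
# Hypothetical constructor for an illustrative S3 class, "my_class"
new_my_class <- function(values)
{
  structure(list(values = values), class = "my_class")
}

# S3 method for the mean() generic, named generic_name.class
mean.my_class <- function(x, ...)
{
  mean(x$values, ...)
}

heights <- new_my_class(c(1.56, 1.79, 1.86))
mean(heights)    # mean() dispatches to mean.my_class()
```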
  1. Negated boolean names should be avoided.

#Good:
if(is_found) #...
#Bad:
if(!is_not_found) #...

  1. The prefix n should be used for variables representing the number of objects (e.g., n_items).

Code quality

  1. All functions intended to be called directly by a user should include input validation and, where appropriate, default values for inputs.
  2. All functions intended to be called directly by a user should have help documentation explaining usage.

#Good:
calculate_area_of_rectangle <- function(width, height)
{
  if(any(width < 0) || any(height < 0)) stop("all inputs must be non-negative")
  width * height
}

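The example above validates its inputs but does not show a default value or help documentation. One possible sketch covering both, assuming the code is documented with roxygen2 comments (the guide does not prescribe a documentation tool) and using height = width as a purely illustrative default:

```r
#' Area of a rectangle
#'
#' @param width  Numeric vector of non-negative widths.
#' @param height Numeric vector of non-negative heights. Defaults to width
#'   (an illustrative choice), giving the area of a square.
#' @return Numeric vector of areas.
#' @examples
#' calculate_area_of_rectangle(2, 3)
calculate_area_of_rectangle <- function(width, height = width)
{
  # Check types first, so that comparisons below are meaningful
  if(!is.numeric(width) || !is.numeric(height))
  {
    stop("width and height must be numeric")
  }
  # Missing values would make the || test below fail with an unclear error
  if(anyNA(width) || anyNA(height))
  {
    stop("width and height must not contain missing values")
  }
  if(any(width < 0) || any(height < 0))
  {
    stop("all inputs must be non-negative")
  }
  width * height
}
```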
Code reuse

  1. Functions should be used instead of scripts, where appropriate.
  2. Lots of small functions should be used instead of a few big functions.
  3. Packages should be used instead of individual functions, where appropriate.

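As a rough illustration of preferring small functions, here is a sketch with two hypothetical single-purpose helpers, cm_to_m() and calculate_bmi() (names and task invented for this example). Each can be checked on its own, and together they replace what might otherwise be a block of script code.

```r
# Hypothetical helper: convert heights from centimetres to metres
cm_to_m <- function(height_in_cm)
{
  height_in_cm / 100
}

# Hypothetical helper: body mass index from weight (kg) and height (m)
calculate_bmi <- function(weight_in_kg, height_in_m)
{
  weight_in_kg / height_in_m^2
}

# The small functions compose into the bigger calculation
bmi <- calculate_bmi(c(65, 83, 99), cm_to_m(c(156, 179, 186)))
```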
Syntax
Assignment

  1. <- should be used for assignment (not =).

#Good:
mean_height <- mean(height)
#Bad:
mean_height = mean(height)

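One practical reason for the rule: inside a function call, = names an argument, whereas <- performs an assignment. The sketch below assumes that no variable called my_values exists beforehand.

```r
# Inside a call, = is argument naming, not assignment
sum(my_values = 1:10)   # 55; my_values is only an argument label here
exists("my_values")     # FALSE (assuming it was not defined beforehand)

# <- inside a call assigns in the calling environment as a side effect
sum(my_values <- 1:10)  # 55
exists("my_values")     # TRUE
```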
White space

  1. There should be a space before and after all binary operators (<-, ==, +, *, &&, etc.), except
  2. The colon operator, :, and the exponentiation operator, ^.
  3. There should be a space after a comma, but not before.
  4. There should be no white space immediately inside brackets: (), [] or {}.
  5. Extra spacing is permissible if it improves alignment.
  6. Nested items should be given on their own line and indented, where this improves clarity.
  7. Tabs should be made of space characters, not a tab character.
  8. Tabs should be 2 spaces wide. (I’ve changed my mind about this: since you can break lines whenever you like, highly indented code is possible and useful, and this in turn requires a smaller tab size.)

#Good:
y <- x + 1
z <- m[1:2, ]
list(
  x       = 1:10,        # extra spaces align equals signs
  data    = my_data,
  sublist = list(
    a = 123,
    b = list(
      zzz = "structure is easier to determine when you indent nested items"
    )
  )
)
#Bad:
y<-x+1                 # no spaces
z <- m[ 1:2,]          # spaces inside brackets, none after comma

Braces

  1. Braces are not compulsory, but are recommended.
  2. Braces for code blocks should appear aligned, on their own lines.

#Good:
if(runif(1) > 0.5)
{
  cat("more than a half")
}
#Bad:
if(runif(1) > 0.5) {         #misaligned brace
  cat("more than a half")
}

  1. Code should be vectorised, where possible, in preference to loops.
  2. Short loops for code that cannot be vectorised should use an apply-like function.
  3. Outputs from loops should be initialised before the loop.
  4. switch statements for character values should include an “otherwise” statement where possible.

#Good:
mean(height)                  #mean is vectorised
tapply(height, gender, mean)  #apply-like functions are the next best thing
#Bad:
total <- 0
count <- 0
for(i in 1:length(height))
{
  total <- total + height[i]
  count <- count + 1
}
mean_height <- total / count
#Slower and vastly more effort than it should be

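Rules 3 and 4 have no example above, so here is a short sketch of each, with invented names. The vector fib is preallocated before a loop whose iterations depend on earlier results (so it cannot simply be vectorised). The hypothetical describe_size() relies on the fact that, for a character value, switch() returns its final unnamed argument when no name matches; that argument plays the role of the “otherwise” statement.

```r
# Rule 3: initialise (preallocate) the output before the loop
n_terms <- 10
fib <- numeric(n_terms)      # vector of zeros, filled in below
fib[1:2] <- 1
for(i in 3:n_terms)          # each term depends on the previous two
{
  fib[i] <- fib[i - 1] + fib[i - 2]
}

# Rule 4: give switch() an "otherwise" value for unmatched strings
describe_size <- function(size)
{
  switch(
    size,
    small  = "less than 10",
    medium = "10 to 100",
    large  = "more than 100",
    "unknown size"           # final unnamed argument: the default
  )
}
describe_size("medium")      # "10 to 100"
describe_size("huge")        # "unknown size"
```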
  1. Line lengths should be less than 80 characters. Longer lines should be split at a graceful point (e.g., after a comma or operator).
  2. There should only be one command on any line, except for assignment/display combinations.

x <- 1:10; x          # assignment and display is okay on one line

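An alternative that keeps to one statement per line: at the top level, wrapping an assignment in parentheses makes R print the assigned value, giving the same effect as the assign-and-display pattern.

```r
x <- 1:10; x    # assign, then display (allowed by the rule above)
(x <- 1:10)     # same effect in a single statement
```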
Floating point numbers

  1. Floating point values should always include a digit before the decimal point.

#Good:
0.5
#Bad:
.5    #less readable

Boolean values

  1. TRUE and FALSE should be used for boolean values (not T and F).

#Good:
c(TRUE, FALSE)
#Bad:
c(T, F)    #less readable

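The reason goes beyond readability: TRUE and FALSE are reserved words, but T and F are ordinary variables defined in the base package, so any code can mask them. A quick demonstration:

```r
T <- 0          # legal: creates a variable T that masks base's T
isTRUE(T)       # FALSE, because T is now the number 0
isTRUE(TRUE)    # TRUE; the reserved word TRUE cannot be reassigned
rm(T)           # remove the masking variable
isTRUE(T)       # TRUE again
```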
Comments

  1. 20th August, 2011 at 1:47 am

    One way to keep the “one statement per line” paradigm, yet avoid splitting the “x <- 1:10; x” pattern across two lines, is “(x <- 1:10)”. I don’t know whether you think that’s worse, but I figured it might be worth raising as a possibility when the two are in conflict.

    • 22nd August, 2011 at 11:15 am

      That’s a cute trick. I like it.

      I think I’ll keep the assign-and-display-on-one-line rule as it is to save having to rewrite old code, but I agree that your technique is prettier (and less typing for long variable names).
