R code style guide

Precedence of rules

  1. When extending existing code, follow the style of that code.
  2. Where there are no existing style rules, use this guide.
  3. If this guide doesn’t cover it, try another guide (R internal coding standards, Bioconductor coding standards, Google’s, Hadley Wickham’s standard and Stat 405, Colin Gillespie’s, Henrik Bengtsson’s basic and Aroma, Paul E Johnson’s).

Variable and function names

  1. Variables, in general, should be given meaningful and pronounceable names in the problem domain.
  2. If the variable has a unit, that unit should be included as a suffix in the name.
      #Good:
      height_cm <- rnorm(1000, 170, 5)  #includes unit
      is_tall <- height_cm > 200        #in problem domain
      #Bad:
      foo <- 1:10                       #not meaningful
      height <- rnorm(1000, 170, 5)     #no unit
      tall_flag <- height_cm > 200      #'flag' refers to var type not problem domain
      1. Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.
      2. Variables that are summary statistics of other variables should take the form stat_of_original_variable, except
      3. The prefix n may be used instead of count_of for variables representing the number of objects (e.g., n_items).
      #Good:
      for(i in seq_along(x))               #short names are OK for indexers, math vars
      {
        #loop content here
      }
      mean_of_height_cm <- mean(height_cm) #takes form stat_of_var; includes unit
      #Bad:
      height_cm_mean <- mean(height_cm)    #wrong form
      1. Variable names, including function names, should be lower_under_cased, except
      2. Variables representing constants should be UPPER_UNDER_CASED, and
      3. Functions using S3 method dispatch must take the form generic_name.class, and
      4. Column and row names for data frames, matrices, etc., should be lower.dotted.cased.
      #Good:
      body_data <- data.frame(
        weight.kg = c(65, 83, 99),           #lower dotted case; unit included
        height.m  = c(1.56, 1.79, 1.86)
      )
      PI_SQUARED <- pi^2
      mean.my_class <- function(x) {}        #some implementation
      #Bad:
      firstdayofweek <- 'Monday'   # wrong case
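To illustrate the generic_name.class form, here is a minimal sketch of S3 method dispatch. The class name my_class and the constructor new_my_class are hypothetical, chosen only for this example.

```r
# Hypothetical constructor: wraps a numeric vector in an object of class "my_class".
new_my_class <- function(x)
{
  structure(list(values = x), class = "my_class")
}

# S3 method for the existing generic mean(). The name must take the form
# generic_name.class, otherwise dispatch cannot find it.
mean.my_class <- function(x, ...)
{
  mean(x$values, ...)
}

height_cm <- new_my_class(c(156, 179, 186))
mean_of_height_cm <- mean(height_cm)  #dispatches to mean.my_class
```

This is the one place where a dot inside a function name is required rather than a matter of style.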
      1. Negated boolean names should be avoided.
      #Good:
      if(is_found) #...
      #Bad:
      if(!is_not_found) #...
      1. Functions of opposite concepts should have matching opposite prefixes.
      #Bad:
      readRDS()
      saveRDS()          #read and save don't match!

      Code quality

      1. All functions intended to be called directly by a user should include input validation and, where appropriate, default values for inputs.
      2. All functions intended to be called directly by a user should have help documentation explaining usage.
      calculate_area_of_rectangle <- function(width, height)
      {
        if(any(width < 0) || any(height < 0))
        {
          stop("all inputs must be non-negative")
        }
        width * height
      }
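The example above covers validation only. A fuller sketch of both rules might look like the following; the default values of 1, the extra type check, and the roxygen2-style #' comments are assumptions for illustration, since the guide does not name a documentation tool.

```r
#' Calculate the area of a rectangle
#'
#' @param width Numeric vector of non-negative widths. Defaults to 1 (assumed).
#' @param height Numeric vector of non-negative heights. Defaults to 1 (assumed).
#' @return Numeric vector of areas.
calculate_area_of_rectangle <- function(width = 1, height = 1)
{
  if(!is.numeric(width) || !is.numeric(height))
  {
    stop("all inputs must be numeric")
  }
  # na.rm = TRUE stops a missing value from turning the condition into NA
  if(any(width < 0, na.rm = TRUE) || any(height < 0, na.rm = TRUE))
  {
    stop("all inputs must be non-negative")
  }
  width * height
}

area_of_rectangle <- calculate_area_of_rectangle(2, 3)
area_with_default <- calculate_area_of_rectangle(width = 4)  #height uses its default
```

Checking the type before the range means that a character input fails with a clear message instead of an obscure comparison result.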

      Code reuse

      1. Functions should be used instead of scripts, where appropriate.
      2. Lots of small functions should be used rather than a few big functions.
      3. Packages should be used instead of individual functions, where appropriate.


      Assignment

      1. <- should be used for assignment (not =).
      #Good:
      mean_height <- mean(height)
      #Bad:
      mean_height = mean(height)
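One practical difference between the two operators appears inside function calls, where = names an argument and <- performs an assignment. A short sketch (the variable names are illustrative):

```r
# Inside a call, = matches the argument called x and creates no variable.
mean_of_named <- mean(x = c(156, 179, 186))
is_x_defined <- exists("x", inherits = FALSE)  #FALSE in a fresh session

# Inside a call, <- assigns in the calling environment, then passes the value on.
mean_of_height_cm <- mean(height_cm <- c(156, 179, 186))
is_height_cm_defined <- exists("height_cm")    #height_cm exists as a side effect
```

Keeping <- for assignment and = for argument names makes it obvious at a glance which of the two is intended.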

      White space

      1. There should be a space before and after all binary operators (<-, ==, +, *, &&, etc.), except
      2. The colon operator, :, and the exponentiation operator, ^.
      3. There should be a space after a comma, but not before.
      4. There should be no white space immediately inside brackets, (), [], {}.
      5. Extra spacing is permissible if it improves alignment.
      6. In particular, equals signs for named arguments in the same function call should be aligned.
      7. Nested items should be given on their own line and indented, where this improves clarity.
      8. Tabs should be made of space characters, not a tab character.
      9. Tabs should be 2 spaces wide. (I’ve changed my mind about this; the rule used to say 3. Since you can break lines whenever you like, highly indented code is possible and useful; this in turn requires a smaller tab size.)
      10. Where function calls have been split over multiple lines, the closing bracket appears on its own line, aligned with the start of the function name.
      #Good:
      y <- x + 1
      z <- m[1:2, ]
      some_function(           #placeholder name for any function call
        x       = 1:10,        # extra spaces align equals signs
        data    = my_data,
        sublist = list(
          a = 123,
          b = list(
            zzz = "structure is easier to determine when you indent nested items"
          )                    #closing bracket gets its own line
        )                      #and again
      )                        #and again
      #Bad:
      y<-x+1                   # no spaces
      z <- y[ 1:2,]            # spaces inside brackets, none after comma
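Spacing around operators is not purely cosmetic. Without spaces, x<-3 is parsed as an assignment rather than as a comparison with -3, as this small sketch shows:

```r
x <- 5
is_below_minus_three <- x < -3   #comparison: x is unchanged
x<-3                             #parsed as assignment: x is overwritten with 3
```

After the last line runs, x holds 3, which is almost certainly not what someone writing a comparison intended.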


      Braces

      1. Braces are not compulsory, but are recommended.
      2. Braces for code blocks should appear aligned, on their own lines.
      #Good:
      if(runif(1) > 0.5)
      {
        cat("more than a half")
      }
      #Bad:
      if(runif(1) > 0.5) {         #misaligned brace
        cat("more than a half")
      }
      1. Code should be vectorised, where possible, in preference to loops.
      2. Short loops for code that cannot be vectorised should use an apply-like function.
      3. Outputs from loops should be initialised before the loop.
      4. switch statements for character values should include an “otherwise” statement where possible.
      #Good:
      mean(height)                  #mean is vectorised
      tapply(height, gender, mean)  #apply-like functions are the next best thing
      #Bad:
      total <- 0
      count <- 0
      for(i in 1:length(height))
      {
        total <- total + height[i]
        count <- count + 1
      }
      mean_height <- total / count
      #Slower and vastly more effort than it should be
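The example above illustrates the first rule only. The remaining three can be sketched as follows; the data and the function describe_unit are hypothetical, and the anonymous function passed to vapply stands in for code that genuinely cannot be vectorised.

```r
height_cm <- c(156, 179, 186)

# Rule 2: vapply is an apply-like function that also checks the return type.
height_cm_squared <- vapply(height_cm, function(h) h^2, numeric(1))

# Rule 3: allocate the output before the loop instead of growing it.
# seq_along is safer than 1:length(x), which gives c(1, 0) for empty input.
height_m <- numeric(length(height_cm))
for(i in seq_along(height_cm))
{
  height_m[i] <- height_cm[i] / 100
}

# Rule 4: the final unnamed argument to switch acts as the "otherwise" case.
describe_unit <- function(unit)
{
  switch(
    unit,
    cm = "centimetres",
    m  = "metres",
    stop("unknown unit: ", unit)
  )
}
```

Without the final stop() call, switch would silently return NULL for an unrecognised unit.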
      1. Line lengths should be less than 80 characters. Longer lines should be split at a graceful point (e.g., after a comma or operator).
      2. There should only be one command on any line, except for assignment/display combinations.
      3. Even for assignment/display combinations, brackets are preferred to multiple statements.
      x <- 1:10; x          #assignment and display is okay on one line...
      (x <- 1:10)           #but using brackets is preferred for this task

      Floating point numbers

      1. Floating point values should always include a digit before the decimal point.
      #Good:
      0.5
      #Bad:
      .5    #less readable

      Boolean values

      1. TRUE and FALSE should be used for boolean values (not T and F).
      #Good:
      c(TRUE, FALSE)
      #Bad:
      c(T, F)    #less readable
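A further reason to prefer TRUE and FALSE is reliability: T and F are ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words. A minimal sketch (variable names are illustrative):

```r
T <- 0                             #legal: T is an ordinary variable
with_short_names <- c(T, F)        #no longer logical: T is now the number 0
with_full_names  <- c(TRUE, FALSE) #always logical

# Assigning to TRUE is an error, so it can never be masked like this.
attempt_to_overwrite <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)

rm(T)                              #remove the masking variable again
```

Because T is also a tempting name for things like a time or temperature variable, this kind of masking can happen by accident.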
  1. 20th August, 2011 at 1:47 am

    One way to keep the ‘one statement per line’ paradigm yet avoid breaking the "x <- 1:10; x" pattern across two lines is "(x <- 1:10)". I don't know if perhaps you think that's worse, but I figured it might be worth raising it as a possibility when the two are in conflict.

    • 22nd August, 2011 at 11:15 am

      That’s a cute trick. I like it.

      I think I’ll keep the assign-and-display-on-one-line rule as it is to save having to rewrite old code, but I agree that your technique is prettier (and less typing for long variable names).

  2. A
    19th July, 2012 at 22:37 pm

    Rule # 34, not using T and F is not a matter of readability, it is a matter of reliability. T and F are very common names for variables, and they are not reserved names in the language.

  3. 3rd February, 2013 at 16:41 pm

    So, I’m new to coding in general, and R specifically. Rule #17 is interesting to me. All the code in R that I’ve seen uses <- for assignment. I'm taking a course on Coursera where the teacher is using = for assignment. My question is, why do you think #17 is essential? I assume you have a reason based on experience; so, I'm interested in why using that assignment is important.

    Thanks for your time.

    • 3rd February, 2013 at 17:23 pm

      As with many style points, it doesn’t matter much which style you use, as long as you pick one and stick to it consistently.

      <- is much more common because it’s been around longer (and you need it for compatibility with very old versions of S-Plus). This means that if you contribute to an existing package, you’ll probably have to use this operator.

      Proponents of = point out that it’s less typing and you don’t need spaces to avoid ambiguous cases like x<-3.

      See this stackoverflow question for more discussion on the matter.

  4. 21st July, 2013 at 21:45 pm

    Great style guide!

    I really like the "<-" operator because it maps better onto the operation you are actually doing. For example, when writing "x <- 10" you say that you want 10 to go into x, which is exactly what is happening, while when you write "x = 10" you say you want x and 10 to be equal, which is not what is happening, right? If it actually was that last thing that was happening it would be possible to write "10 = x", but it isn't, because "=", the equality sign, for historical reasons is used as the assignment operator and not the equality operator. So "<-" = less confusing 🙂
