R code style guide
Precedence of rules
- When extending existing code, follow the style of that code.
- Where there are no existing style rules, use this guide.
- If this guide doesn’t cover it, try another guide (R internal coding standards, Bioconductor coding standards, Google’s, Hadley Wickham’s, Colin Gillespie’s, Henrik Bengtsson’s basic and Aroma).
Variable and function names
- Variables, in general, should be given meaningful and pronounceable names.
- Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.
#Good:
mean_height <- mean(height)
#Bad:
foo <- 1:10 #not meaningful
- Variable names, including function names, should be lower_under_cased, except
  - Variables representing constants should be UPPER_UNDER_CASED, and
  - Functions using S3 method dispatch must take the form generic_name.class, and
  - Column and row names for data frames, matrices, etc., should be lower.dotted.cased.
#Good:
body_data <- data.frame(
  weight.in.kg = c(65, 83, 99),
  height.in.m  = c(1.56, 1.79, 1.86)
)
PI_SQUARED <- pi^2
mean.my_class <- function(x, ...)
{
  #some implementation
}
#Bad:
firstdayofweek <- 'Monday' # wrong case
- Negated boolean names should be avoided.
#Good:
if(is_found) #...
#Bad:
if(!is_not_found) #...
- The prefix n should be used for variables representing the number of objects (e.g., n_items).
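The naming rules above can be seen working together in a minimal sketch. The class name my_class and the values element are hypothetical and only serve the illustration; the heights are in metres.

```r
# 'my_class' and its 'values' element are made up for this example.
heights <- structure(
  list(values = c(1.56, 1.79, 1.86)),
  class = "my_class"
)

# S3 method name takes the form generic_name.class
mean.my_class <- function(x, ...)
{
  mean(x$values, ...)
}

n_values <- length(heights$values)  # 'n' prefix for a count
mean_height <- mean(heights)        # dispatches to mean.my_class
```

Because mean is an S3 generic, calling mean(heights) finds mean.my_class through the class attribute; no explicit registration is needed for a method defined at the top level of a script.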
Code quality
- All functions intended to be called directly by a user should include input validation and, where appropriate, default values for inputs.
- All functions intended to be called directly by a user should have help documentation explaining usage.
#Good:
calculate_area_of_rectangle <- function(width, height)
{
if(any(width < 0) || any(height < 0)) stop("all inputs must be non-negative")
width * height
}
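The example above shows input validation only. A possible extension, sketched below, adds a default value and help documentation written as roxygen2-style comments. The default of height = width (so that a single argument gives a square) and the extra numeric check are assumptions made for illustration, not part of the original guide.

```r
#' Calculate the area of a rectangle
#'
#' @param width Numeric vector of non-negative widths.
#' @param height Numeric vector of non-negative heights. Defaults to
#'   \code{width}, which gives the area of a square (an assumed default).
#' @return Numeric vector of areas.
#' @examples
#' calculate_area_of_rectangle(2, 3)
calculate_area_of_rectangle <- function(width, height = width)
{
  if(!is.numeric(width) || !is.numeric(height))
  {
    stop("all inputs must be numeric")
  }
  if(any(width < 0) || any(height < 0))
  {
    stop("all inputs must be non-negative")
  }
  width * height
}

area_of_rectangle <- calculate_area_of_rectangle(2, 3)
area_of_square <- calculate_area_of_rectangle(4)
```

The roxygen2 comments only become a help page once the function lives in a package and the documentation is generated; in a plain script they still serve as structured comments.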
Code reuse
- Functions should be used instead of scripts, where appropriate.
- Lots of small functions should be used instead of a few big functions.
- Packages should be used instead of individual functions, where appropriate.
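As a rough illustration of preferring several small functions to one long script, the sketch below splits a body mass index calculation into two steps. The function names to_metres and calculate_bmi are hypothetical, chosen for this example only.

```r
# Hypothetical helper names, used only for illustration.
to_metres <- function(height_in_cm)
{
  height_in_cm / 100
}

calculate_bmi <- function(weight_in_kg, height_in_m)
{
  weight_in_kg / height_in_m^2
}

# Each step can be tested and reused on its own.
bmi <- calculate_bmi(c(65, 83, 99), to_metres(c(156, 179, 186)))
```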
Syntax
Assignment
- <- should be used for assignment (not =).
#Good:
mean_height <- mean(height)
#Bad:
mean_height = mean(height)
White space
- There should be a space before and after all binary operators (<-, ==, +, *, &&, etc.), except
  - The colon operator, :, and the exponentiation operator, ^.
- There should be a space after a comma, but not before.
- There should be no white space immediately inside brackets, (), [], {}.
- Extra spacing is permissible if it improves alignment.
- Nested items should be given on their own line and indented, where this improves clarity.
- Tabs should be made of space characters, not a tab character.
- Tabs should be 2 spaces wide. (I’ve changed my mind about this; it was previously 3. Since you can break lines whenever you like, highly indented code is possible and useful; this in turn requires a smaller tab size.)
#Good:
y <- x + 1;
z <- m[1:2, ];
list(
  x       = 1:10, # extra spaces align equals signs
  data    = my_data,
  sublist = list(
    a = 123,
    b = list(
      zzz = "structure is easier to determine when you indent nested items"
    )
  )
)
#Bad:
y<-x+1; # no spaces
z <- y[ 1:2,]; # spaces inside brackets, none after comma
Braces
- Braces are not compulsory, but are recommended.
- Braces for code blocks should appear aligned, on their own lines.
#Good:
if(runif(1) > 0.5)
{
  cat("more than a half")
}
#Bad:
if(runif(1) > 0.5) { #misaligned brace
  cat("more than a half")
}
- Code should be vectorised, where possible, in preference to loops.
- Short loops for code that cannot be vectorised should use an apply-like function.
- Outputs from loops should be initialised before the loop.
- switch statements for character values should include an “otherwise” statement where possible.
#Good:
mean(height) #mean is vectorised
tapply(height, gender, mean) #apply-like functions are the next best thing
#Bad:
total <- 0
count <- 0
for(i in 1:length(height))
{
  total <- total + height[i]
  count <- count + 1
}
mean_height <- total / count
#Slower and vastly more effort than it should be
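The examples above cover vectorisation only. The sketch below illustrates the remaining rules: an apply-like function for a short loop, an output initialised before a loop that cannot be vectorised, and a switch with an “otherwise” value. The recurrence and the describe_day function are made up for illustration.

```r
# Short loop over list elements, written with an apply-like function.
group_means <- vapply(list(c(1, 2, 3), c(4, 5, 6)), mean, numeric(1))

# Each value depends on the previous one, so a loop is reasonable here.
# The output is initialised to its full length before the loop starts.
n_steps <- 5  # assumed to be at least 2
population <- numeric(n_steps)
population[1] <- 0.5
for(i in 2:n_steps)
{
  population[i] <- 3 * population[i - 1] * (1 - population[i - 1])
}

# The final unnamed argument is the "otherwise" value.
describe_day <- function(day)
{
  switch(
    day,
    Saturday = ,
    Sunday   = "weekend",
    Monday   = "start of the week",
    "some other weekday"
  )
}
```

Without the final unnamed argument, switch silently returns NULL for an unmatched string, which tends to surface as a confusing error further along.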
- Line lengths should be less than 80 characters. Longer lines should be split at a graceful point (e.g., after a comma or operator).
- There should only be one command on any line, except for assignment/display combinations.
x <- 1:10; x # assignment and display is okay on one line
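A short sketch of splitting long lines at a graceful point follows; the variable names are made up for illustration. In R the split needs to come after the operator rather than before it, since a line that already forms a complete expression is evaluated on its own.

```r
unit_price <- 2.5
n_items <- 4
delivery_charge <- 3

# Split after the operator, so R knows the expression continues.
total_cost <- unit_price * n_items +
  delivery_charge

# Split after a comma inside a call.
summary_text <- paste(
  "Total cost for", n_items, "items:",
  format(total_cost, nsmall = 2)
)
```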
Floating point numbers
- Floating point values should always include a digit before the decimal point.
#Good:
0.5
#Bad:
.5 #less readable
Boolean values
- TRUE and FALSE should be used for boolean values (not T and F).
#Good:
c(TRUE, FALSE)
#Bad:
c(T, F) #less readable
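Beyond readability, there is a practical reason for this rule: T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words. The sketch below demonstrates this; the variable names are made up for illustration.

```r
T <- FALSE              # legal: T is an ordinary variable, not a keyword
is_t_true <- isTRUE(T)  # FALSE here, which would surprise most readers
rm(T)                   # remove the override, so T is found in base again

# TRUE is a reserved word, so TRUE <- FALSE would be an error.
is_true_true <- isTRUE(TRUE)
```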

One way to keep the ‘one statement per line’ paradigm without splitting the “x <- 1:10; x” pattern across two lines is “(x <- 1:10)”. I don’t know whether you think that’s worse, but I figured it was worth raising as a possibility when the two are in conflict.
That’s a cute trick. I like it.
I think I’ll keep the assign-and-display-on-one-line rule as it is to save having to rewrite old code, but I agree that your technique is prettier (and less typing for long variable names).