R code style guide
Precedence of rules
- When extending existing code, follow the style of that code.
- Where there are no existing style rules, use this guide.
- If this guide doesn’t cover it, try another guide (R internal coding standards, Bioconductor coding standards, Google’s, Hadley Wickham’s, Colin Gillespie’s, Henrik Bengtsson’s basic and Aroma).
Variable and function names
- Variables, in general, should be given meaningful and pronounceable names.
- Index variables with short scope may be given short names, e.g., i, j, k. Likewise, mathematical variables with short scope may be given an appropriate short name such as x, y, z.
#Good:
mean_height <- mean(height)
#Bad:
foo <- 1:10 #not meaningful
- Variable names, including function names, should be lower_under_cased, except
  - Variables representing constants should be UPPER_UNDER_CASED, and
  - Functions using S3 method dispatch must take the form generic_name.class, and
  - Column and row names for data frames, matrices, etc., should be lower.dotted.cased.
#Good:
body_data <- data.frame(
  weight.in.kg = c(65, 83, 99),
  height.in.cm = c(156, 179, 186)
)
PI_SQUARED <- pi^2
mean.my_class <- function(x, ...)
{
  #some implementation
}
#Bad:
firstdayofweek <- 'Monday' # wrong case
- Negated boolean names should be avoided.
#Good:
if(is_found) #...
#Bad:
if(!is_not_found) #...
- The prefix n should be used for variables representing the number of objects (e.g., n_items).
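The naming rules above can be seen working together in one small sketch. The class name "duration" and every identifier below are hypothetical, chosen only for illustration.

```r
# Constant: UPPER_UNDER_CASED
SECONDS_PER_MINUTE <- 60

# Constructor for a hypothetical S3 class called "duration"
new_duration <- function(minutes)
{
  structure(list(minutes = minutes), class = "duration")
}

# S3 method in generic_name.class form: format() for class "duration"
format.duration <- function(x, ...)
{
  paste(x$minutes * SECONDS_PER_MINUTE, "seconds")
}

durations <- list(new_duration(1), new_duration(2.5))

# The n prefix marks a count of objects
n_durations <- length(durations)

first_label <- format(durations[[1]])
```

Here format(durations[[1]]) dispatches to format.duration because of the generic_name.class naming, which is why that form is compulsory rather than a matter of taste.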
Code quality
- All functions intended to be called directly by a user should include input validation and, where appropriate, default values for inputs.
- All functions intended to be called directly by a user should have help documentation explaining usage.
#Good:
calculate_area_of_rectangle <- function(width, height)
{
  if(any(width < 0) || any(height < 0)) stop("all inputs must be non-negative")
  width * height
}
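The example above covers input validation only. A fuller sketch might also include a default value and help documentation. The version below assumes roxygen2-style comments for the documentation; the default of height = width and the extra numeric check are illustrative assumptions, not part of the original example.

```r
#' Calculate the area of a rectangle
#'
#' @param width Numeric vector of non-negative widths.
#' @param height Numeric vector of non-negative heights. Defaults to
#'   \code{width}, which gives the area of a square (an assumed default).
#' @return Numeric vector of areas.
calculate_area_of_rectangle <- function(width, height = width)
{
  if(!is.numeric(width) || !is.numeric(height))
  {
    stop("all inputs must be numeric")
  }
  # na.rm = TRUE so that missing values propagate instead of breaking the check
  if(any(width < 0, na.rm = TRUE) || any(height < 0, na.rm = TRUE))
  {
    stop("all inputs must be non-negative")
  }
  width * height
}

rectangle_area <- calculate_area_of_rectangle(2, 3)
square_area <- calculate_area_of_rectangle(4) # height defaults to width
```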
Code Reuse
- Functions should be used instead of scripts, where appropriate.
- Lots of small functions should be used instead of a few big functions.
- Packages should be used instead of individual functions, where appropriate.
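As a rough illustration of preferring several small functions to one big one, the sketch below splits a cleaning step and a scaling step into separate functions and then composes them. The function names and data are hypothetical.

```r
# Each helper does one job and can be tested on its own
remove_missing <- function(x)
{
  x[!is.na(x)]
}

standardise <- function(x)
{
  (x - mean(x)) / sd(x)
}

# The larger task is just a composition of the small pieces
clean_and_standardise <- function(x)
{
  standardise(remove_missing(x))
}

heights <- c(156, NA, 179, 186)
standardised_heights <- clean_and_standardise(heights)
```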
Syntax
Assignment
- <- should be used for assignment (not =).
#Good:
mean_height <- mean(height)
#Bad:
mean_height = mean(height)
White space
- There should be a space before and after all binary operators (<-, ==, +, *, &&, etc.), except
  - The colon operator, :, and the exponentiation operator, ^.
- There should be a space after a comma, but not before.
- There should be no white space immediately inside brackets, (), [], {}.
- Extra spacing is permissible if it improves alignment.
- Nested items should be given on their own line and indented, where this improves clarity.
- Tabs should be made of space characters, not a tab character.
- Tabs should be 2 spaces wide. (I’ve changed my mind about this. Since you can break lines whenever you like, highly indented code is possible and useful; this in turn requires a smaller tab size.)
#Good:
y <- x + 1;
z <- m[1:2, ];
list(
  x       = 1:10, # extra spaces align equals signs
  data    = my_data,
  sublist = list(
    a = 123,
    b = list(
      zzz = "structure is easier to determine when you indent nested items"
    )
  )
)
#Bad:
y<-x+1; # no spaces
z <- y[ 1:2,]; # spaces inside brackets, none after comma
Braces
- Braces are not compulsory, but are recommended.
- Braces for code blocks should appear aligned, on their own lines.
#Good:
if(runif(1) > 0.5)
{
  cat("more than a half")
}
#Bad:
if(runif(1) > 0.5) { #misaligned brace
  cat("more than a half")
}
- Code should be vectorised, where possible, in preference to loops.
- Short loops for code that cannot be vectorised should use an apply-like function.
- Outputs from loops should be initialised before the loop.
- switch statements for character values should include an “otherwise” statement where possible.
#Good:
mean(height) #mean is vectorised
tapply(height, gender, mean) #apply-like functions are the next best thing
#Bad:
total <- 0
count <- 0
for(i in 1:length(height))
{
  total <- total + height[i]
  count <- count + 1
}
mean_height <- total / count
#Slower and vastly more effort than it should be
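The examples above do not show the rules on initialising loop outputs or on switch statements, so here is a minimal sketch of both. The unit lookup, the recurrence and all names are hypothetical; the recurrence was picked only because each step depends on the previous one, which makes it awkward to vectorise.

```r
# switch on a character value, with a final unnamed "otherwise" branch
metres_per_unit <- function(unit)
{
  switch(
    unit,
    cm = 0.01,
    m  = 1,
    km = 1000,
    stop("unknown unit: ", unit) # reached when no name matches
  )
}

# An apply-like function in place of a short loop
conversion_factors <- vapply(c("cm", "km"), metres_per_unit, numeric(1))

# Output initialised to its full length before the loop starts
n_steps <- 10
population <- numeric(n_steps)
population[1] <- 0.5
for(i in seq_len(n_steps - 1))
{
  population[i + 1] <- 3.2 * population[i] * (1 - population[i])
}
```

Without the final stop() branch, switch would silently return NULL (invisibly) for an unmatched string, which tends to surface as a confusing error much later.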
- Line lengths should be less than 80 characters. Longer lines should be split at a graceful point (e.g., after a comma or operator).
- There should only be one command on any line, except for assignment/display combinations.
x <- 1:10; x # assignment and display is okay on one line
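When splitting a long line at an operator, where the break falls matters in R: the break should come after the operator, because a line that is already a complete expression ends the statement. A minimal sketch with made-up variable names:

```r
# Split after the operator: the first line is incomplete, so R reads on
long_total <- 1 + 2 +
  3 + 4

# Split before the operator: the first line is already complete, so the
# second line is evaluated as a separate statement and its value is not
# stored in short_total
short_total <- 1 + 2
  + 3 + 4
```

After running this, long_total is 10 but short_total is only 3.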
Floating point numbers
- Floating point values should always include a digit before the decimal point.
#Good:
0.5
#Bad:
.5 #less readable
Boolean values
- TRUE and FALSE should be used for boolean values (not T and F).
#Good:
c(TRUE, FALSE)
#Bad:
c(T, F) #less readable
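Besides readability, there is a practical difference: T and F are ordinary variables that can be reassigned, whereas TRUE and FALSE are reserved words. The sketch below demonstrates this; the variable names are illustrative.

```r
# T is just a variable, so this assignment is legal
T <- 0
t_is_true <- isTRUE(T) # FALSE once T has been overwritten

# Remove the local copy so that T refers to the base value again
rm(T)

# TRUE is a reserved word, so the same assignment fails
true_assignment <- tryCatch(
  eval(parse(text = "TRUE <- 0")),
  error = function(e) "error"
)
```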

One way to keep the ‘one statement per line’ paradigm, yet avoid splitting the “x <- 1:10; x” pattern across two lines, is “(x <- 1:10)”. I don't know whether you think that's worse, but I figured it was worth raising as a possibility when the two are in conflict.
That’s a cute trick. I like it.
I think I’ll keep the assign-and-display-on-one-line rule as it is to save having to rewrite old code, but I agree that your technique is prettier (and less typing for long variable names).
Rule #34, not using T and F, is not a matter of readability; it is a matter of reliability. T and F are very common names for variables, and they are not reserved names in the language.
So, I’m new to coding in general, and R specifically. Rule #17 is interesting to me. All the code in R that I’ve seen uses <- for assignment. I'm taking a course on Coursera where the teacher is using = for assignment. My question is, why do you think #17 is essential? I assume you have a reason based on experience; so, I'm interested in why using that assignment is important.
Thanks for your time.
As with many style points, it doesn’t matter much which style you use, as long as you pick one and stick to it consistently.
<- is much more common because it’s been around longer (and you need it for compatibility with very old versions of S-Plus). This means that if you contribute to an existing package, you’ll probably have to use this operator.
Proponents of = point out that it’s less typing and you don’t need spaces to avoid ambiguous cases like x<-3.
See this stackoverflow question for more discussion on the matter.
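For anyone unsure what the ambiguity looks like in practice, here is a small sketch (variable names are illustrative). Without spaces, R reads x<-3 as an assignment, even if a comparison with -3 was intended.

```r
x <- 5

# With spaces, this is a comparison of x with -3
is_below <- x < -3

# Without spaces, this is parsed as an assignment, so x becomes 3
x<-3
```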