R code style guide
Precedence of rules
- When extending existing code, follow the style of that code.
- Where there are no existing style rules, use this guide.
- If this guide doesn’t cover it, try another guide (R internal coding standards, Bioconductor coding standards, Google’s, Hadley Wickham’s standard and Stat 405, Colin Gillespie’s, Henrik Bengtsson’s basic and Aroma, Paul E Johnson’s).
Variable and function names
- Variables, in general, should be given meaningful and pronounceable names in the problem domain.
- If the variable has a unit, that unit should be included as a suffix in the name.
- Index variables with short scope may be given short names, e.g., `i`, `j`, `k`. Likewise, mathematical variables with short scope may be given an appropriate short name such as `x`, `y`, `z`.
- Variables that are summary statistics of other variables should take the form `stat_of_original_variable`, except
  - The prefix `n` may be used instead of `count_of` for variables representing the number of objects (e.g., `n_items`).
- Variable names, including function names, should be `lower_under_cased`, except
  - Variables representing constants should be `UPPER_UNDER_CASED`, and
  - Functions using S3 method dispatch must take the form `generic_name.class`, and
  - Column and row names for data frames, matrices, etc., should be `lower.dotted.cased`.
- Negated boolean names should be avoided.
- Functions of opposite concepts should have matching opposite prefixes.
- All functions intended to be called directly by a user should include input validation and, where appropriate, default values for inputs.
- All functions intended to be called directly by a user should have help documentation explaining usage.
- Functions should be used instead of scripts, where appropriate.
- Lots of small functions should be used rather than a few big functions.
- Packages should be used instead of individual functions, where appropriate.
- `<-` should be used for assignment (not `=`).
- There should be a space before and after all binary operators (`<-`, `==`, `+`, `*`, `&&`, etc.), except
  - The colon operator, `:`, and the exponentiation operator, `^`.
- There should be a space after a comma, but not before.
- There should be no white space immediately inside brackets, `()`, `[]`, `{}`.
- Extra spacing is permissible if it improves alignment.
- In particular, equals signs for named arguments in the same function call should be aligned.
- Nested items should be given on their own line and indented, where this improves clarity.
- Tabs should be made of space characters, not a tab character.
- Tabs should be 2 spaces wide. (I’ve changed my mind about this; it used to be 3. Since you can break lines whenever you like, highly indented code is possible and useful; this in turn requires a smaller tab size.)
- Where function calls have been split over multiple lines, the closing bracket appears on its own line, aligned with the start of the function name.
- Braces are not compulsory, but are recommended.
- Braces for code blocks should appear aligned, on their own lines.
- Code should be vectorised, where possible, in preference to loops.
- Short loops for code that cannot be vectorised should use an `apply`-like function.
- Outputs from loops should be initialised before the loop.
- `switch` statements for character values should include an “otherwise” statement where possible.
- Line lengths should be less than 80 characters. Longer lines should be split at a graceful point (e.g., after a comma or operator).
- There should only be one command on any line, except for assignment/display combinations.
- Even for assignment/display combinations, brackets are preferred to multiple statements.
- Floating point values should always include a digit before the decimal point.
- `TRUE` and `FALSE` should be used for boolean values (not `T` and `F`).
    #Good:
    height_cm <- rnorm(1000, 170, 5)  #includes unit
    is_tall <- height_cm > 200        #in problem domain

    #Bad:
    foo <- 1:10                       #not meaningful
    height <- rnorm(1000, 170, 5)     #no unit
    tall_flag <- height_cm > 200      #'flag' refers to var type not problem domain

    #Good:
    for(i in seq_along(x))  #short names are OK for indexers, math vars
    {
      #loop content here
    }
    mean_of_height_cm <- mean(height_cm)  #takes form stat_of_var; includes unit

    #Bad:
    height_cm_mean <- mean(height_cm)  #wrong form

    #Good:
    body_data <- data.frame(
      weight.kg = c(65, 83, 99),       #lower dotted case; unit included
      height.m  = c(1.56, 1.79, 1.86)  #values in metres, to match the unit
    )
    PI_SQUARED <- pi^2
    mean.my_class <- function(x) {}  #some implementation

    #Bad:
    firstdayofweek <- 'Monday'  #wrong case

    #Good:
    if(is_found) #...

    #Bad:
    if(!is_not_found) #...

    #Good:
    read_my_data()
    write_my_data()
    #or
    load_my_data()
    save_my_data()

    #Bad:
    readRDS()
    saveRDS()  #read and save don't match!
Code quality
    #Good:
    calculate_area_of_rectangle <- function(width, height)
    {
      if(any(width < 0) || any(height < 0))
      {
        stop("all inputs must be non-negative")
      }
      width * height
    }
Code Reuse
Syntax
Assignment
    #Good:
    mean_height <- mean(height)

    #Bad:
    mean_height = mean(height)
White space
    #Good:
    y <- x + 1
    z <- m[1:2, ]
    list(
      x       = 1:10,     #extra spaces align equals signs
      data    = my_data,
      sublist = list(
        a = 123,
        b = list(
          zzz = "structure is easier to determine when you indent nested items"
        )   #closing bracket gets its own line
      )     #and again
    )       #and again

    #Bad:
    y<-x+1          #no spaces
    z <- y[ 1:2,]   #spaces inside brackets, none after comma
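One practical reason for putting spaces around operators: without them, `x<-3` is parsed as an assignment, even if a comparison with -3 was intended. A minimal sketch (the variable names are illustrative only):

```r
x <- 5

# With spaces, the intent is unambiguous: compare x with -3.
is_below_minus_three <- x < -3

# Without spaces, the parser reads `<-` as the assignment operator,
# so this line overwrites x with 3 instead of comparing.
x<-3
```

After running this, `is_below_minus_three` holds the result of the comparison made while `x` was 5, and `x` itself has been silently replaced.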
Braces
    #Good:
    if(runif(1) > 0.5)
    {
      cat("more than a half")
    }

    #Bad:
    if(runif(1) > 0.5) {  #misaligned brace
      cat("more than a half")
    }

    #Good:
    mean(height)                  #mean is vectorised
    tapply(height, gender, mean)  #apply-like functions are the next best thing

    #Bad:
    total <- 0
    count <- 0
    for(i in 1:length(height))
    {
      total <- total + height[i]
      count <- count + 1
    }
    mean_height <- total / count  #Slower and vastly more effort than it should be
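For loops that genuinely cannot be vectorised, the rule about initialising outputs before the loop might look like the sketch below. The example is hypothetical: `n_steps`, `GROWTH_RATE` and `population` are made-up names, and the recurrence was chosen only because each value depends on the previous one.

```r
# Hypothetical recurrence: each value depends on the one before it,
# so the calculation cannot be vectorised directly.
n_steps <- 10
GROWTH_RATE <- 3.5
population <- numeric(n_steps)  # output initialised to its full length
population[1] <- 0.5
for(i in seq_len(n_steps)[-1])
{
  population[i] <- GROWTH_RATE * population[i - 1] * (1 - population[i - 1])
}
```

Growing the vector inside the loop with `c()` would also work, but it forces R to copy the output on every iteration.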
    x <- 1:10; x  #assignment and display is okay on one line...
    (x <- 1:10)   #but using brackets is preferred for this task
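For the rule on `switch` statements, an “otherwise” branch is simply a final unnamed argument. A minimal sketch, assuming a hypothetical helper called `convert_height_to_cm` (not part of any package):

```r
# Hypothetical helper: convert a height to centimetres.
convert_height_to_cm <- function(height, unit = "cm")
{
  switch(
    unit,
    cm = height,
    m  = height * 100,
    mm = height / 10,
    stop("unknown unit: ", unit)  # the "otherwise" branch
  )
}

convert_height_to_cm(2, "m")
```

Without the final branch, an unmatched character value makes `switch` return `NULL` invisibly, which tends to surface as a confusing error much later.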
Floating point numbers
    #Good:
    0.5

    #Bad:
    .5  #less readable
Boolean values
    #Good:
    c(TRUE, FALSE)

    #Bad:
    c(T, F)  #less readable
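Beyond readability, there is also a reliability argument: `TRUE` and `FALSE` are reserved words, whereas `T` and `F` are ordinary variables that can be overwritten. The sketch below illustrates this; `is_flag_set`, `result` and `assignment_failed` are names invented for the example.

```r
# T and F are just variables that the base package binds to TRUE and FALSE.
T <- 0  # perfectly legal, e.g. for a temperature or a time
is_flag_set <- isTRUE(T)  # T no longer means TRUE in this scope

# By contrast, TRUE is a reserved word; trying to assign to it is an error.
result <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
assignment_failed <- inherits(result, "try-error")

rm(T)  # remove the masking variable so T refers to the base value again
```

Any code that relied on `T` after the first assignment would quietly get 0 instead of `TRUE`.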
One way to keep the "one statement per line" paradigm yet avoid breaking the "x <- 1:10; x" pattern across two lines is "(x <- 1:10)". I don't know if perhaps you think that's worse, but I figured it might be worth raising as a possibility when the two are in conflict.
That’s a cute trick. I like it.
I think I’ll keep the assign-and-display-on-one-line rule as it is to save having to rewrite old code, but I agree that your technique is prettier (and less typing for long variable names).
Rule #34, not using T and F, is not a matter of readability; it is a matter of reliability. T and F are very common names for variables, and they are not reserved names in the language.
So, I’m new to coding in general, and R specifically. Rule #17 is interesting to me. All the code in R that I’ve seen uses <- for assignment. I'm taking a course on Coursera where the teacher is using = for assignment. My question is, why do you think #17 is essential? I assume you have a reason based on experience; so, I'm interested in why using that assignment is important.
Thanks for your time.
As with many style points, it doesn’t matter much which style you use, as long as you pick one and stick to it consistently.
`<-` is much more common because it’s been around longer (and you need it for compatibility with very old versions of S-Plus). This means that if you contribute to an existing package, you’ll probably have to use this operator.
Proponents of `=` point out that it’s less typing and you don’t need spaces to avoid ambiguous cases like `x<-3`.
See this stackoverflow question for more discussion on the matter.
Great style guide!
I really like the "<-" operator because it maps better onto the operation you are actually doing. For example, when writing "x <- 10" you say that you want 10 to go into x, which is exactly what is happening, while when you write "x = 10" you say you want x and 10 to be equal, which is not what is happening, right? If that last thing really were what was happening, it would be possible to write "10 = x", but it isn't, because "=", the equality sign, is for historical reasons used as the assignment operator and not the equality operator. So "<-" = less confusing 🙂