## Thoughts on R’s Terrible, Horrible, No Good, Very Bad Documentation

A couple of days ago Pete Werner had a rant about the state of R’s documentation. A lot of it was misguided, but it had some legitimate complaints, and the fact that people can perceive R’s documentation as being bad (whether accurate or not) is important in itself.

The exponential growth in R’s popularity means that a large proportion of its users are beginners. The demographic also increasingly includes people who don’t come from a traditional statistics or data analysis background; I work with biologists and chemists to whom R is a secondary skill after their lab work.

All this means that I think it’s important for the R community to have an honest discussion about how it can make itself accessible for beginners, and dispel the notion that it is hard to learn.

## Function help pages

Pete’s big complaint was that the `?` function help pages only act as a reference; they aren’t useful for finding functions if you don’t know what you want already. This is pretty dumb; every programming language worth using has a similar function-level reference system, and they are never used for finding new functions. I happen to think that R’s function-level references are, on the whole, pretty good. The fact that you can’t get a package submitted to CRAN without dozens of checks on the documentation means that all functions have at least their usage documented, and most have some description and examples too.

## Searching for functions

His complaint that it is hard to find a function when you don’t know the name carries a little more weight.

He gives the example of trying to find a function to create an identity matrix. `??identity` returns many irrelevant things, but he spots the function `identity`. Of course, `identity` isn’t what he wants; it just returns its input, so Pete gives up.

I agree that `identity` should have a “see also” link to tell people to use `diag` instead if they want an identity matrix. After reading Pete’s post I filed a bug report, and 3 hours later, Martin Maechler made the change to R’s documentation. All fixed.
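To make the difference between the two functions concrete, here is a minimal sketch (the variable names are just for illustration):

```r
# identity() simply hands back whatever it is given.
same <- identity(3)

# diag() with a single whole number n builds an n-by-n identity matrix.
I3 <- diag(3)
print(I3)

# Multiplying by the identity matrix leaves another matrix unchanged.
m <- matrix(1:9, nrow = 3)
unchanged <- I3 %*% m
print(all(unchanged == m))
```

So `identity(3)` is just the number 3, while `diag(3)` is the 3-by-3 matrix that Pete was actually looking for.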

While R’s function-level documentation is fairly mature, there is definitely more scope for linking between pages to point people in the right direction. If you think that there is a missing link between help pages, write a comment, and I’ll file a new bug with the collated suggestions. Similarly, there are a few functions that could use better examples. Feel free to comment about those.

The failure of `??identity` to find an identity matrix function is unfortunately typical. `??"identity matrix"` would have been a fairer search, and it gets rid of most of the rubbish, but still doesn’t find `diag`.

In general, I find that `??` isn’t great, unless I’m searching for a fairly obscure term. I also don’t see an easy fix for that. Fortunately, there’s an alternative. I use Rseek as my first choice tool for finding new functions. In this case, the first result for a search for “identity matrix” is a blog post entitled “How do I Create the Identity Matrix in R?”, which gives the right answer.

When I teach R to beginners, Rseek gets mentioned in lesson one. It is absolutely fundamental to R usage. So I don’t believe that finding the right function to use is a big problem in R either, except to new users who don’t know about Rseek.

The thing is, there’s a way to fix that. Rseek, as far as I know, is entirely run by Sasha Goodman right now. If he gets hit by a bus, several million R users are going to be stuck. This is a big vulnerability to R, and I think it’s time that Rseek became an official R project.

I should also mention that R has other built-in ways of finding functions beyond `??`, and as Pete linked to, Pat Burns’ guide to them is excellent.
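As a rough illustration of two of those built-in tools (not taken from Pat Burns’ guide; the search terms are my own choice), `apropos` matches against the names of objects on the search path, and `help.search` is the function that `??` calls behind the scenes:

```r
# apropos() does a regular-expression search over object names,
# so it only helps if you can guess part of the function's name.
diag_like <- apropos("diag")
print(diag_like)

# help.search() is what `??` runs. Restricting it to a single package
# is one way to cut down on irrelevant hits.
hits <- help.search("diagonal", package = "base")
print(head(hits$matches))
```

The catch is the same as before: both searches only work if you already know a word that appears in the function name or its help page.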

## Concept-level documentation

Pete’s final complaint was that there is a lack of concept-level documentation. That is, how do you string several functions together to achieve a task?

Actually, there is a lot of concept-level documentation around; it just comes in many forms, and you have to learn what those forms are.

`demo()` brings up a list of demonstrations of how to do particular tasks. This command appears in the startup text when R loads, so there is no excuse for not knowing about it. There are only 16 of them though, so I think that these are worth revisiting for expansion.

`browseVignettes()` brings up a list of vignettes. These are short documents on a particular task. Many packages have them, and it is a good idea to read them when you start using a new package.

The base-R packages, other than `grid` and `Matrix`, aren’t well represented with vignettes. Much of the content that would have gone into vignettes appears in the manual Introduction to R, but there is definite room for improvement. For example, a vignette on subsetting or basic plotting might stave off a few questions to the r-help mailing list.
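If you would rather see what is available from the console than in a browser, something like the following sketch should work. It relies on `demo()` and `vignette()` returning a listing object with a `results` matrix when called without a topic; the choice of the `graphics` and `grid` packages here is just an example, and what you see depends on your installation.

```r
# Called with no topic, demo() returns a listing whose `results`
# matrix has one row per available demo.
graphics_demos <- demo(package = "graphics")$results
print(graphics_demos[, c("Item", "Title")])

# vignette() with no topic gives the same kind of listing for vignettes.
grid_vignettes <- vignette(package = "grid")$results
print(grid_vignettes[, "Item"])
```

To actually run or open one, pass its name: `demo("graphics", package = "graphics")` or `vignette("grid", package = "grid")`.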

Another point to remember is that R-core only consists of 20 people (and I’m not sure how many of those are still actively working on R), so much of the how-to documentation has been created by the users. There are a ridiculous number of free resources available; just take a look at the Stack Overflow R Tag Info page.

## tl;dr

- R’s function-level documentation is mostly very good. There are a few “see also”s missing, and some of the examples could be improved.
- The built-in facilities to find a function aren’t usually as successful as searching on Rseek. I think Rseek ought to be an official R project.
- Concept-level documentation is covered by demos and vignettes, though I think there should be a few more of these in base-R.

Update: Andrie de Vries tweeted me to say that Google has gotten better at returning R-related content, so searching for `[r] "identity matrix"` returns what you want, and in fact `r "identity matrix"` does too.

## Improving base-R examples

Earlier today I saw the hundred bazillionth question about how to use the `paste` function. My initial response was “take a look at `example(paste)` to see how it works”.

Then I looked at `example(paste)`, and it turns out that it’s not very good at all. There isn’t even an example of how to use the `collapse` argument. Considering that `paste` is one of the first functions that beginners come across, as well as being a little bit tricky (getting to understand the difference between the `sep` and `collapse` arguments takes a bit of thinking about when you are new), this seems like a big oversight.
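For anyone still puzzling over it, here is a small sketch of the `sep` versus `collapse` distinction. The names are made up for illustration, and this is not the text of any bug report.

```r
first <- c("Ada", "Grace")
last  <- c("Lovelace", "Hopper")

# sep goes between the arguments, element by element,
# so the result has one string per pair of inputs.
full_names <- paste(first, last, sep = " ")
print(full_names)   # "Ada Lovelace" "Grace Hopper"

# collapse squashes the elements of the result into a single string.
one_string <- paste(first, collapse = ", ")
print(one_string)   # "Ada, Grace"

# Using both: sep is applied first, then collapse joins the results.
both <- paste(first, last, sep = " ", collapse = "; ")
print(both)         # "Ada Lovelace; Grace Hopper"
```

A handy way to remember it: `sep` works across the arguments, `collapse` works along the result.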

I’ve submitted this as a bug, with a suggested improvement to the examples. Fingers crossed that R-core will accept the update, or something like it.

It got me thinking though, how many other base functions could do with better examples? I had a quick look at some common functions that beginners seem to get confused by, and the following all have fairly bad example sections:

- In base: `browser`, `get`, `seq`
- In stats: `formula`, `lm`, `runif`, `t.test`
- In graphics: `plot`
- In utils: `download.file`, `read.table`

If you have half an hour spare, have a go at writing a better example page for one of these functions, or any other function in the base distribution, then submit it to the bug tracker. (If you aren’t sure that your examples are good enough, or you need advice, try posting what you have on r-devel before submitting a bug report. Dealing with bug reports takes up valuable R-core time, so you need to be sure of quality first.)
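To give a flavour of what a beginner-friendly example section might contain, here is a rough sketch for `seq`. It is only my guess at what new users trip over, not an official or submitted example.

```r
# Count from 2 to 10 in steps of 2.
evens <- seq(2, 10, by = 2)
print(evens)      # 2 4 6 8 10

# Ask for a fixed number of evenly spaced values instead of a step size.
quarters <- seq(0, 1, length.out = 5)
print(quarters)   # 0.00 0.25 0.50 0.75 1.00

# seq_along() gives the indices of a vector, which is handy in loops.
x <- c("a", "b", "c")
idx <- seq_along(x)
print(idx)        # 1 2 3

# A common pitfall: 1:length(x) goes wrong for an empty vector,
# whereas seq_along() correctly returns nothing to loop over.
empty <- character(0)
bad  <- 1:length(empty)
good <- seq_along(empty)
print(bad)        # 1 0
print(good)       # integer(0)
```

Short, commented cases like these, covering one argument at a time plus one known pitfall, are the sort of thing I would hope to see in a revised example page.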

This seems like a really easy way to make R more accessible for beginners.