We wrote the code and we wrote the tests, but we didn’t explicitly specify how the code can be used. Take for example the calculateAutoCorrelation function. From its name we can infer it should return some kind of autocorrelation coefficient. But how exactly is this correlation calculated? What kind of lag do we take into account? Those questions are easier to answer if we add some metadata to the functions we created.
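As a rough illustration of the kind of metadata meant here, the sketch below documents a hypothetical version of calculateAutoCorrelation. The signature, the lag argument and the use of stats::acf are assumptions made for this example, not the project's actual implementation.

```r
#' Calculate the autocorrelation of a series at a given lag
#'
#' Hypothetical sketch: the real function may have a different signature.
#'
#' @param x Numeric vector or time series.
#' @param lag Non-negative integer. Number of time steps the series is shifted by.
#' @return Numeric. The autocorrelation coefficient at the requested lag.
calculateAutoCorrelation <- function(x, lag = 1) {
  # acf() returns coefficients for lags 0..lag.max, so lag k sits at index k + 1
  coefficients <- stats::acf(x, lag.max = lag, plot = FALSE)$acf
  as.numeric(coefficients[lag + 1])
}

# nottem is the built-in monthly temperature series, so lag 12 is one year
calculateAutoCorrelation(nottem, lag = 12)
```

With the lag spelled out as a documented parameter, a reader no longer has to guess which shift the coefficient refers to.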
In this post I won’t go into documentation in general but will focus specifically on how to document functions in R. So READMEs and vignettes won’t be covered here (although vignettes will come up when talking about literate programming).
Yet again we use what’s suggested by devtools, so we’ll stick to roxygen2. This package helps with combining code and documentation in the same place. I didn’t explore any alternatives.
As an example I’ll show how the plotAvgTemperaturesByYear function in visualize.R is documented.
Comments consist of two parts: the introduction and the tags. Everything before the first tag is considered the introduction. The introduction consists of a title, a description and, optionally, details.
The title and description are required so you can’t document a title without also documenting a description.
This is how it’s put into practice:
#' Plot of average temperatures by year
#'
#' \code{plotAvgTemperaturesByYear} plots the average temperature by year in a line graph.
plotAvgTemperaturesByYear <- function(avg_temperatures,
                                      place = "Nottingham") {
  ...
As you can see I don’t use any explicit tags saying this is a title, this is a description, … By default the first paragraph is the title, the second paragraph is the description and any following paragraphs are details. Paragraphs are separated by a line that contains only #'.
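To make the three parts of the introduction concrete, here is a minimal sketch with a title, a description and a details paragraph. The function celsiusToFahrenheit is a hypothetical example and is not part of the project.

```r
#' Convert Celsius to Fahrenheit
#'
#' Converts a numeric vector of temperatures from degrees Celsius to degrees
#' Fahrenheit.
#'
#' This third paragraph ends up in the details section. The conversion is
#' vectorised, so a whole column of temperatures can be passed at once.
celsiusToFahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

celsiusToFahrenheit(c(0, 100))
```

Note that the description spans two comment lines yet still counts as one paragraph, because only a line with nothing but #' starts a new one.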
You have to specify the function name with \code{}, but this seems kind of superfluous since you’re already documenting a function so by definition you know its name.
Without adding any specific tags we can already create the documentation by typing the following in the console:
devtools::document()
This creates a .Rd file in a man folder. You don’t actually need to know how this file works since you never edit it by hand: you change it indirectly by editing the roxygen comments that sit right next to the function.
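If you are curious what such a file contains, the sketch below writes a hand-made approximation of an .Rd file to a temporary location and renders it with base R's tools package. The Rd content is an assumption about roughly what roxygen2 would generate for the comment above, not its exact output.

```r
# Hand-written approximation of a generated .Rd file (not roxygen2's exact output)
rd_lines <- c(
  "\\name{plotAvgTemperaturesByYear}",
  "\\alias{plotAvgTemperaturesByYear}",
  "\\title{Plot of average temperatures by year}",
  "\\usage{plotAvgTemperaturesByYear(avg_temperatures, place = \"Nottingham\")}",
  "\\description{Plots the average temperature by year in a line graph.}"
)

# Write to a temporary file so the real man folder is left untouched
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Parse the file and render it as the plain-text help page
parsed <- tools::parse_Rd(rd_file)
tools::Rd2txt(parsed)
```

The rendered text is close to what the help viewer shows, which is why editing the roxygen comment is enough in day-to-day work.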
Now if you type ?plotAvgTemperaturesByYear in the console you get a small summary of what the function does. This works exactly the same as it does for built-in functions like sum, mean, …
We didn’t have to specify it explicitly but roxygen2 adds some information about usage and what package the function belongs to.
The function is still poorly documented of course. In this post I won’t go over all the existing tags because these are already well documented. When documenting functions you can get by with just three tags:
@param (to document input)
@return (to document output)
@examples (self-explanatory)
Let’s make things a bit more precise:
...
#' @param avg_temperatures Data.frame. Has years and their corresponding average temperature.
#' @param place String. It's assumed this function is called with the nottem built-in dataset.
#' This dataset records the temperatures in Nottingham. Nottingham is the default argument.
#' You can override this argument if the need arises.
#' @return ggplot object
#' @examples
#' avg_temperatures <- data.frame(year = c(1920, 1921, 1922, 1923),
#'                                avgTemperature = c(5, 10, 8, 9))
#' plotAvgTemperaturesByYear(avg_temperatures)
#' @export
plotAvgTemperaturesByYear <- ...
Important note: the @export tag is essential. Without it the function is not added to the namespace and devtools::check() will fail. The reason is that examples are more than just text: they’re actually executed each time you run devtools::check() to make sure they’re at least executable (it’s still up to you to make sure they’re relevant).
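A small sketch of how these steps could be chained is shown below. The helper refreshAndCheck is hypothetical and only wraps existing devtools functions; it assumes devtools is installed and that pkg points at the root of the package.

```r
# Hypothetical helper: regenerate the docs, run the examples, then check.
# Assumes devtools is installed and `pkg` is the package root directory.
refreshAndCheck <- function(pkg = ".") {
  # Regenerates man/*.Rd (and NAMESPACE when it is managed by roxygen2)
  devtools::document(pkg)
  # Executes the @examples blocks on their own, without a full check
  devtools::run_examples(pkg)
  # Full package check, which runs the examples again among other things
  devtools::check(pkg)
}
```

Running the examples on their own first tends to surface a missing @export or a broken example faster than waiting for the full check to finish.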
Running devtools::document() again regenerates the help page, which now also lists the arguments, the return value and the examples.
You can always document more but I tend to keep it to a minimum. Everything you document can also get out of date again so it’s good to strike the right balance.
Another benefit of using roxygen comments is that we now get one warning fewer when running devtools::check().
Documenting using roxygen2 is quite easy. All you need to know is a few basic tags, and the devtools::document() function takes care of the rest. For this reproducible analysis example I haven’t gone through the effort of documenting every function. The functions are basic and are already covered by tests, so in a way they’re self-documenting. However, if the project were more complex I’d certainly document every function as a best practice.