R style guide for ZOL851

This is derived from Hadley Wickham's R style guide, under CC license (thanks Hadley!).

Hadley's is based on Google's R style guide.

Why is style important?

Good coding style is like using correct punctuation when writing: you can manage without it, but it sure makes things easier to read. As with punctuation, there are many possible variations. What matters is consistency. After this class you are free to develop your own style, but these rules will make sure everybody starts off in the same place.

Writing code for communication is important because in this class your code will have one author (you!) but multiple readers:

  • You, when you're writing it.
  • Me, Charles (and anyone else), when grading it.
  • Your group members, during projects.
  • Future you, when you look at this code in six months' time.

Read the code grading rubric thoroughly so that you don't lose marks for silly mistakes. Following these rules will be hard at first, but you'll get used to it pretty quickly.

Notation and naming

File names

File names should end in .r and be meaningful.

When handing in an electronic copy of your assignment code, put your files in a directory called firstname-lastname-hw-x then upload a zip of that directory.

Good
drosophila-longevity.r
ian-dworkin-hw-1.r
Bad
foo.r
my-homework.R

Variable Identifiers

Variable names should be lowercase. Use _ to separate words within a name. Generally, variable names should be nouns. Strive for concise but meaningful names (this is not easy!). Whenever possible, include the object class in the name (lm for a linear model, data for a data frame, etc.). Names of model objects should follow the same convention and make it clear that the object is a model.

Good
drosophila_longevity_data
model_longevity_matrix
drosophila_longevity_lm_1
Bad
first_day_of_the_month
DayOne
dayone
djm1

Constant Identifiers

Constant names should be upper case, and constants should be used sparingly. Use _ to separate words within a name, as with variable identifiers. Other than using upper case, the same rules apply to constant identifiers as to variable identifiers.
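As a minimal sketch of the constant convention, the names and values below are hypothetical and chosen only for illustration. The constants are upper case with _ between words, while the variables that use them stay lowercase.

```r
# Hypothetical constants (not part of the course material).
SECONDS_PER_DAY <- 24 * 60 * 60
MAX_BOOTSTRAP_REPLICATES <- 1000

# Ordinary variables stay lowercase, even when they are built from constants.
longevity_days <- c(41, 57, 63)
longevity_seconds <- longevity_days * SECONDS_PER_DAY
```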

Function Identifiers

Function names should be CamelCase: capitalize the first letter of each word to separate words within a name. Generally, function names should be verbs, if possible. Strive for concise but meaningful names (this is not easy!).

Good
CoefficientVariationCalculator
BootstrapLinearModels
FindMode
Bad
find_mode
find.mode
Function1
djm1

Syntax

Spacing

Place spaces around all binary operators (=, +, -, <-, etc.). Do not place a space before a comma, but always place one after a comma.

Good
average <- mean(feet / 12 + inches, na.rm = TRUE)
Bad
average<-mean(feet/12+inches,na.rm=TRUE)

Place a space before left parentheses, except in a function call.

Good
if (debug)
plot(x, y)
Bad
if(debug)
plot (x, y)

Extra spacing (i.e., more than one space in a row) is okay if it improves alignment of equals signs or arrows (<-).

 
list(
  x = call_this_long_function(a, b), 
  y = a * e / d ^ f)
 
list(
  total = a + b + c, 
  mean  = (a + b + c) / n)

Do not place spaces around code in parentheses or square brackets. (Except if there's a trailing comma: always place a space after a comma.)

Good
if (debug)
diamonds[5, ]
Bad
if ( debug ) # No spaces around debug
x[1,] # Needs a space after the comma
x[1 ,] # Space goes after, not before

Curly braces

An opening curly brace should never go on its own line and should always be followed by a new line; a closing curly brace should always go on its own line, unless followed by else.

Always indent the code inside the curly braces.

Good
 
 
if (y < 0 && debug) {
  message("Y is negative")
}
 
if (y == 0) {
  log(x)
} else {
  y ^ x
}
Bad
 
 
if (y < 0 && debug)
message("Y is negative")
 
if (y == 0) {
  log(x)
} 
else {
  y ^ x
}
 
 
if (y < 0 && debug) message("Y is negative")

Indentation

When indenting your code, use two spaces. Never use tabs or mix tabs and spaces.

Line length

Keep your lines shorter than 80 characters. This is about what fits comfortably on a printed page at a reasonable font size. If you find you are running out of room, this is probably an indication that you should encapsulate some of the work in a separate function.
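One way to read the advice about encapsulating work is sketched below. The function StandardError and the data are hypothetical. Writing the same calculation inline would repeat longevity_days[!is.na(longevity_days)] twice on one line and run well past 80 characters.

```r
# Hypothetical helper, written only to illustrate the line-length advice.
StandardError <- function(x) {
  # Computes the standard error of the mean, ignoring missing values.
  #
  # Args:
  #   x: Numeric vector; NA values are dropped before the calculation.
  #
  # Returns:
  #   The standard deviation of x divided by the square root of its length.
  x <- x[!is.na(x)]
  return(sd(x) / sqrt(length(x)))
}

longevity_days <- c(41, 57, NA, 63, 48)
longevity_se <- StandardError(longevity_days)
```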

Assignment

Use <-, not =, for assignment.

Good
x <- 5
Bad
x = 5

Organization

Commenting guidelines

Comment your code. Entire commented lines should begin with # and one space. Comments should explain the why, not the what.

Short comments can also be placed after code preceded by two spaces, #, and then one space.

# Create histogram of frequency of campaigns by pct budget spent.
hist(df$pctSpent,
     breaks = "scott",  # method for choosing number of buckets
     main   = "Histogram: fraction budget spent by campaignid",
     xlab   = "Fraction of budget spent",
     ylab   = "Frequency (count of campaignids)")

In general, it is also important to use commented lines of - and = to break up your files into scannable chunks.
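A short sketch of what such separator lines might look like. The data are simulated and all names are hypothetical. Here = marks the top of the file and - marks each chunk, but the exact choice is an assumption, not a course rule.

```r
# ======================================================================
# Drosophila longevity analysis (hypothetical example)
# ======================================================================

# ----------------------------------------------------------------------
# Simulate data
# ----------------------------------------------------------------------
set.seed(851)
longevity_data <- data.frame(
  diet      = rep(c("control", "restricted"), each = 10),
  longevity = c(rnorm(10, mean = 50, sd = 5),
                rnorm(10, mean = 60, sd = 5)))

# ----------------------------------------------------------------------
# Fit model
# ----------------------------------------------------------------------
longevity_lm_1 <- lm(longevity ~ diet, data = longevity_data)
```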

General Layout and Ordering

If everyone uses the same general ordering, we'll be able to read and understand each other's scripts faster and more easily.

  1. Author comment
  2. File description comment, including purpose of program, inputs, and outputs
  3. source() and library() statements
  4. Function definitions
  5. Executed statements, if applicable (e.g., print, plot)
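The ordering above could be laid out roughly as follows. This is a skeleton under assumptions: the author line is a placeholder, and the function and data are hypothetical.

```r
# Author: Firstname Lastname (placeholder)
#
# Description: Summarizes longevity data (hypothetical example).
#   Inputs:  none (the data are typed in below)
#   Outputs: prints the coefficient of variation to the console

# source() and library() statements
library(stats)  # attached by default; shown only to mark where this goes

# Function definitions
CalculateCoefficientVariation <- function(x) {
  # Computes the coefficient of variation of a numeric vector.
  #
  # Args:
  #   x: Numeric vector with at least two values and no missing values.
  #
  # Returns:
  #   The standard deviation of x divided by its mean.
  return(sd(x) / mean(x))
}

# Executed statements
longevity_days <- c(41, 57, 63, 48, 52)
longevity_cv <- CalculateCoefficientVariation(longevity_days)
print(longevity_cv)
```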

Function definitions, function calls and source files

  • Function Definitions and Calls
  • Function definitions should first list arguments without default values, followed by those with default values.

    In both function definitions and function calls, multiple arguments per line are allowed; line breaks are only allowed between assignments.


    Good
    PredictCTR <- function(query, property, numDays,
                           showPlot = TRUE)
    Bad
    PredictCTR <- function(query, property, numDays, showPlot =
                           TRUE)
    

  • Function Documentation
  • Functions should contain a comments section immediately below the function definition line. These comments should consist of a one-sentence description of the function; a list of the function's arguments, denoted by Args:, with a description of each (including the data type); and a description of the return value, denoted by Returns:. The comments should be descriptive enough that a caller can use the function without reading any of the function's code.

  • Example Function
    CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
      # Computes the sample covariance between two vectors.
      #
      # Args:
      #   x: One of two vectors whose sample covariance is to be calculated.
      #   y: The other vector. x and y must have the same length, greater than one,
      #      with no missing values.
      #   verbose: If TRUE, prints sample covariance; if not, not. Default is TRUE.
      #
      # Returns:
      #   The sample covariance between x and y.
      n <- length(x)
      # Error handling
      if (n <= 1 || n != length(y)) {
        stop("Arguments x and y have invalid lengths: ",
             length(x), " and ", length(y), ".")
      }
      if (anyNA(x) || anyNA(y)) {
        stop("Arguments x and y must not have missing values.")
      }
      covariance <- var(x, y)
      if (verbose) {
        cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
      }
      return(covariance)
    }
     
    

    As an alternative, you can use R's comment() function (which I use a lot) to attach the documentation to the function itself, so that you (or anyone else) using the function can see it directly.

    comment(CalculateSampleCovariance) <- c(
      "Computes the sample covariance between two vectors.",
      "Args:",
      "  x: One of two vectors whose sample covariance is to be calculated.",
      "  y: The other vector. x and y must have the same length, greater than",
      "     one, with no missing values.",
      "  verbose: If TRUE, prints sample covariance; if not, not.",
      "           Default is TRUE.")

    This lets you (or anyone else) see the comments directly by calling str(FunctionName) or comment(FunctionName).
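A small sketch of the comment() approach on a shorter function. FindMode and its body are hypothetical, written only to show how the comment is set and read back.

```r
# Hypothetical function used only to illustrate comment().
FindMode <- function(x) {
  counts <- table(x)
  return(names(counts)[which.max(counts)])
}

# Store the documentation as a character vector on the function itself.
comment(FindMode) <- c("Returns the most frequent value of x as a string.",
                       "Args:",
                       "  x: A vector of values.")

comment(FindMode)  # retrieves the stored text
str(FindMode)      # should also list the comment attribute
```

Note that printing the function by typing its name does not show the comment; it has to be requested with one of the two calls above.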

Language

  • Attach
  • The possibilities for creating errors when using attach() are numerous. Avoid it.

  • source()
  • Placing all of the functions that you use for this course in a common file is a good idea. This way you can reuse them easily by calling source(SourceFileName). It also results in much cleaner code, because the function definitions are kept separate from the code that calls them.
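Both points can be sketched together. The function file here is written to a temporary location purely so the example is self-contained; in practice it would be a file you maintain. All names and data are hypothetical.

```r
# Write a hypothetical function file to a temporary location; normally
# this file would already exist alongside your scripts.
source_file_name <- tempfile(fileext = ".r")
writeLines(c(
  "CalculateCoefficientVariation <- function(x) {",
  "  return(sd(x) / mean(x))",
  "}"), source_file_name)

# Load the function definitions, keeping them out of the analysis script.
source(source_file_name)

longevity_data <- data.frame(diet      = c("control", "restricted"),
                             longevity = c(48, 61))

# Instead of attach(longevity_data), name the data frame explicitly ...
longevity_cv <- CalculateCoefficientVariation(longevity_data$longevity)

# ... or limit the column lookup to a single expression using with().
longevity_mean <- with(longevity_data, mean(longevity))
```

Both alternatives make it obvious which data frame a column comes from, which is the ambiguity that attach() tends to create.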

Parting Words

Use common sense and BE CONSISTENT.

If you are editing code, take a few minutes to look at the code around you and determine its style. If others use spaces around their if clauses, you should, too. If their comments have little boxes of stars around them, make your comments have little boxes of stars around them, too.

The point of having style guidelines is to have a common vocabulary of coding so people can concentrate on what you are saying, rather than on how you are saying it. We present global style rules here so people know the vocabulary. But local style is also important. If code you add to a file looks drastically different from the existing code around it, the discontinuity will throw readers out of their rhythm when they go to read it. Try to avoid this. OK, enough writing about writing code; the code itself is much more interesting. Have fun!

References

http://www.maths.lth.se/help/R/RCC/ - R Coding Conventions
http://ess.r-project.org/ - For emacs users. This runs R in your emacs and has an emacs mode.