
Lecture 2: Style and Function

Super-Advanced R Style Guide

This style guide is specific to FISH 512, Super-Advanced R, and will be used throughout the course. It borrows heavily from Hadley Wickham’s Advanced R textbook and Google’s R Style Guide. The goal is to agree on a common style up front so that the code we write is clear and understandable to all members of the group. Everyone has preferred styles and will have to sacrifice some of these preferences to work with others.

Hadley Wickham’s Advanced R textbook: http://adv-r.had.co.nz/
Google’s R Style Guide: https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml

1. File Names
File names should end in .R, be meaningful, and use underscores to separate words.
GOOD: predict_ad_revenue.R, data_output.R
BAD: foo.R

2. Object Identifiers
Object names should use lowercase letters, with periods separating words.
GOOD: variable.name, avg.clicks
BAD: VariableName, avg_Clicks

3. Function Identifiers
Function names should use verbs and be all lowercase, with words separated by underscores.
GOOD: function_name, calculate_avg_clicks
BAD: Func, Avg_Clicks

4. Spacing
Place spaces around all binary operators, after the “if” keyword, and after commas.
GOOD: average <- mean(feet / 12 + inches, na.rm = TRUE)
BAD: average<- mean(feet/12+inches, na.rm=TRUE)

5. Curly Braces
An opening brace should never go on its own line. A closing brace should go on its own line unless it is followed by else.
GOOD:
    if (is.null(y.lim)) {
      y.lim <- c(0, 0.06)
    }

    if (condition) {
      one or more lines
    } else {
      one or more lines
    }
BAD:
    if (is.null(y.lim)) y.lim <- c(0, 0.06)

    if (condition) {
      one or more lines
    }
    else {
      one or more lines
    }

6. Functions
Function documentation should be descriptive enough that the user can use and understand the function without reading the code. This includes a one-sentence description of the function, a list of the function arguments denoted by “Args:” with a description of each argument and its data type, and a description of the returned objects denoted by “Returns:”.
We will use this type of function documentation when we use Roxygen to develop packages later in the course.

    CalculateSampleCovariance <- function(x, y, verbose = TRUE) {
      # Computes the sample covariance between two vectors.
      #
      # Args:
      #   x: One of two vectors whose sample covariance is to be calculated.
      #   y: The other vector. x and y must have the same length, greater than
      #      one, with no missing values.
      #   verbose: If TRUE, prints sample covariance; if not, not. Default is
      #            TRUE.
      #
      # Returns:
      #   The sample covariance between x and y.
      n <- length(x)
      covariance <- var(x, y)
      if (verbose) {
        cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
      }
      return(covariance)
    }

Structuring Projects

Well-structured code makes your life easier. A good layout protects the integrity of the data, keeps the project portable, and makes the project easy to pick up after a break. The basic structure should be:

    C://project/
    C://project/R/          -- Contains function files, no code that runs
    C://project/data/       -- Data are treated as read only. Usually .csv files
    C://project/figs/       -- Contains generated figures
    C://project/output/     -- Contains simulation output, processed datasets
    C://project/analysis.R  -- Script that calls functions, reads data, creates figures, and writes output

RStudio has an easy-to-use project feature.

Additional sources:
http://www.rstudio.com/ide/docs/using/projects
http://nicercode.github.io/blog/2013-04-05-projects/
http://carlboettiger.info/2012/05/06/research-workflow.html

Functional Programming

R is, at its heart, a functional programming language: it provides many tools for creating and manipulating functions. Anything you can do with vectors, you can do with functions. This includes assigning them to variables, storing them in lists, and passing them as arguments to other functions. Functions don’t even have to be named or stored. Functions remove redundancy and duplication in your code.
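As a minimal sketch of the points above, the snippet below assigns a function to a variable, stores functions in a list, passes a function as an argument, and uses an anonymous function. All names and values here (compute_range, apply_summary, lengths.cm) are hypothetical and chosen only for illustration.

```r
# Assign a function to a variable (hypothetical helper).
compute_range <- function(x) max(x) - min(x)

# Store functions in a named list.
summaries <- list(avg = mean, spread = compute_range)

# Pass a function as an argument to another function.
apply_summary <- function(f, x) f(x)

lengths.cm <- c(12, 15, 9, 20)  # made-up measurements

range.cm <- apply_summary(compute_range, lengths.cm)  # 20 - 9 = 11
avg.cm <- summaries$avg(lengths.cm)                   # 56 / 4 = 14

# Use an anonymous function without naming or storing it.
squares <- sapply(1:3, function(i) i^2)               # 1 4 9
```

Because summaries is an ordinary list, adding another summary later only means adding one more element rather than repeating code.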
The motivation behind functional programming is to start with small, easy-to-understand chunks of code and combine them into more complex analyses. Repetition in code invites inconsistencies and makes the code difficult to change. The “do not repeat yourself” (DRY) principle states that “every piece of knowledge must have a single, unambiguous, authoritative representation within a system.” When you find yourself writing for loops or repeating lines of code, consider using the apply() functions instead.

Functionals: apply()

Functionals are functions that take another function as an argument. apply() takes an R function and applies it to the rows or columns of a matrix (more generally, to the margins of an array). It is an extremely powerful tool that can simplify your code. In general, apply() takes the form:

    apply(X, MARGIN, FUN, ...)
      X: an array (including a matrix)
      MARGIN: 1 = rows, 2 = columns
      FUN: an R function (you can define your own)
      ...: additional arguments passed to FUN

There are multiple flavors of apply(), including lapply(), sapply(), and tapply().

lapply() is useful when dealing with data frames. It applies the function to each component of a list (note that a data frame is a list of equal-length columns), so there is no MARGIN argument. The output is always a list.

    lapply(X, FUN, ...)
      X: a list
      FUN: an R function (you can define your own)
      ...: additional arguments passed to FUN

sapply() is a wrapper around lapply() that returns a vector, matrix, or array when the result can be simplified.

    sapply(X, FUN, ..., simplify = TRUE)
      X: a list
      FUN: an R function (you can define your own)
      ...: additional arguments passed to FUN
      simplify: should the result be simplified to a vector, matrix, or array?

tapply() applies a function to groups of values, where the groups are specified by INDEX. The packages plyr and dplyr do similar work to tapply() but can be faster and have easier syntax.
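The four functions named above can be sketched on small, made-up data. The catch matrix, the fish data frame, and the species factor are hypothetical values invented for this illustration, not course data.

```r
# Hypothetical catch data: rows are sites, columns are years.
catch <- matrix(c(10, 20, 30,
                  40, 50, 60),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("site.a", "site.b"),
                                c("y2012", "y2013", "y2014")))

# apply(): MARGIN = 1 works across rows, MARGIN = 2 down columns.
site.totals <- apply(catch, 1, sum)   # site.a = 60, site.b = 150
year.means <- apply(catch, 2, mean)   # 25, 35, 45

# lapply(): a data frame is a list of columns; the result is always a list.
fish <- data.frame(length.cm = c(30, 42, 55, 61),
                   weight.kg = c(0.4, 1.1, 2.3, 3.0))
col.max.list <- lapply(fish, max)

# sapply(): the same call, simplified to a named numeric vector.
col.max <- sapply(fish, max)

# tapply(): apply FUN to the values of X within each level of INDEX.
species <- factor(c("cod", "cod", "hake", "hake"))
mean.length <- tapply(fish$length.cm, species, mean)  # cod = 36, hake = 58
```

Note that lapply() and sapply() are given the same arguments here; only the shape of the returned object differs.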
    tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
      X: a vector of values
      INDEX: a list of one or more factors, each the same length as X
      FUN: an R function (you can define your own)
      ...: additional arguments passed to FUN
      simplify: should the result be simplified to a vector, matrix, or array?

dplyr will be covered later in the course. Its syntax is simple and intuitive, and it is very fast: it can apply functions to datasets with millions of rows very quickly. Users of plyr will appreciate the large reduction in computation time. dplyr is awesome.

Additional References:
http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm
http://adv-r.had.co.nz/Functionals.html
http://blog.rstudio.org/2014/01/17/introducing-dplyr/

Closures

Closures are functions written by other functions. Their name comes from the fact that they enclose the environment of the parent function and can access all its variables. The concepts of scope and environment will be explained in next week’s lecture. In R, almost every function is a closure. The exceptions are primitive functions, like sum or cumsum, which call C code directly.

Debugging

R has a few somewhat clunky and inelegant debugging tools, such as traceback() and browser(). Trevor wrote a debugging lecture for FISH 554, Advanced R Programming, which is available on the course website, and Sean Anderson has written a blog post outlining these debugging tools. However, RStudio now has a line-by-line debugging tool that probably renders these methods obsolete.

RStudio Debugging: http://www.rstudio.com/ide/docs/debugging/overview
Sean’s blog post: http://seananderson.ca/2013/08/23/debugging-r.html
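Returning to the Closures section above, a short sketch may make the idea concrete. The function names (make_power, make_counter, and the functions they return) are hypothetical; the pattern, a function that returns another function which remembers the parent's variables, is the point.

```r
# A function that writes functions: the returned function is a closure
# that remembers the value of `exponent` from the parent call.
make_power <- function(exponent) {
  force(exponent)  # evaluate the argument now rather than lazily
  function(x) {
    x^exponent
  }
}

square <- make_power(2)
cube <- make_power(3)

# The enclosed variable lives in the closure's environment.
enclosed.exponent <- environment(square)$exponent

# Closures can also keep state between calls via the parent environment.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1  # modify `count` in the enclosing environment
    count
  }
}

count_calls <- make_counter()
count_calls()
second.call <- count_calls()

# Primitive functions such as sum are the exception: they are not closures.
sum.is.primitive <- is.primitive(sum)
square.is.primitive <- is.primitive(square)
```

Here square and cube share the same code but enclose different values of exponent, which is why they give different results for the same input.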
