Lecture 2: Style and Function
Super-Advanced R Style Guide
This style guide is specific to FISH 512, Super-Advanced R, and will be used throughout the
course. It borrows heavily from Hadley Wickham’s Advanced R textbook and Google’s R
Style Guide.
The goal is to agree on a common style up front in order to write code that is clear and
understandable to all members of the group. Everyone has preferred styles and will have to
sacrifice these preferences to work with others.
Hadley Wickham’s Advanced R textbook
http://adv-r.had.co.nz/
Google’s R Style Guide
https://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
1. File Names
Files should end in .R, be meaningful, and use underscores to separate words.
GOOD:
predict_ad_revenue.R, data_output.R
BAD:
foo.R
2. Object Identifiers
Objects should use lowercase letters and periods to separate words.
GOOD:
variable.name, avg.clicks
BAD:
VariableName, avg_Clicks
3. Function Identifiers
Identifiers should use verbs and be all lowercase, with words separated by underscores.
GOOD:
function_name, calculate_avg_clicks
BAD:
Func, Avg_Clicks
4. Spacing
Spaces should go around all binary operators (including = in function calls), after “if”
(before the opening parenthesis), and after commas.
GOOD:
average <- mean(feet / 12 + inches, na.rm = TRUE)
BAD:
average<-mean(feet/12+inches,na.rm=TRUE)
5. Curly Braces
An opening brace should never go on its own line. A closing brace should go on its own line
unless it is followed by else.
GOOD:
if (is.null(y.lim)) {
  y.lim <- c(0, 0.06)
}
if (condition) {
  one or more lines
} else {
  one or more lines
}
BAD:
if (is.null(y.lim)) y.lim <- c(0, 0.06)
if (condition) {
  one or more lines
}
else {
  one or more lines
}
6. Functions
Function documentation should be descriptive enough that the user can use and
understand the function without reading the code. This includes a one-sentence description
of the function, a list of function arguments denoted by “Args:” with a description of each
argument and its data type, and a list of the return objects. We will use this type of function
documentation when we use Roxygen to develop packages later in the course.
calculate_sample_covariance <- function(x, y, verbose = TRUE) {
  # Computes the sample covariance between two vectors.
  #
  # Args:
  #   x: One of two vectors whose sample covariance is to be calculated.
  #   y: The other vector. x and y must have the same length, greater than
  #      one, with no missing values.
  #   verbose: If TRUE, prints sample covariance; if not, not. Default is
  #            TRUE.
  #
  # Returns:
  #   The sample covariance between x and y.
  n <- length(x)
  covariance <- var(x, y)
  if (verbose) {
    cat("Covariance = ", round(covariance, 4), ".\n", sep = "")
  }
  return(covariance)
}
Structuring Projects
Well-structured code makes your life easier. A good layout protects the integrity of the data,
keeps the project portable, and makes the project easy to pick up after a break. The basic
structure should be:
C://project/
C://project/R/ -- Contains function files, no code that runs
C://project/data/ -- Data are treated as read-only. Usually .csv files
C://project/figs/ -- Contains generated figures
C://project/output/ -- Contains simulation output, processed datasets
C://project/analysis.R -- Script that calls functions, reads data, creates figures, and writes output
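As a rough illustration of how the top-level script might tie this layout together, here is a minimal sketch. Everything specific in it is an assumption made for the example: a temporary directory stands in for the project root, and the file names (summarize_catch.R, catch.csv), the function summarize_catch(), and the data are hypothetical.

```r
# Stand-in project root; in a real project this is the project folder itself.
root <- file.path(tempdir(), "project")
dirs <- file.path(root, c("R", "data", "figs", "output"))
invisible(lapply(dirs, dir.create, recursive = TRUE, showWarnings = FALSE))

# Stand-ins for files that would already exist in a real project (hypothetical).
writeLines(
  "summarize_catch <- function(d) aggregate(catch ~ year, data = d, FUN = sum)",
  file.path(root, "R", "summarize_catch.R")
)
write.csv(data.frame(year = c(2001, 2001, 2002), catch = c(5, 7, 4)),
          file.path(root, "data", "catch.csv"), row.names = FALSE)

# --- What the analysis script itself might contain ---
# 1. Load function definitions (files in R/ define functions but run nothing).
source(file.path(root, "R", "summarize_catch.R"))

# 2. Read the raw data; it is never modified on disk.
catch.raw <- read.csv(file.path(root, "data", "catch.csv"))

# 3. Process the data and write the result to output/.
catch.by.year <- summarize_catch(catch.raw)
write.csv(catch.by.year, file.path(root, "output", "catch_by_year.csv"),
          row.names = FALSE)

# 4. Save a figure to figs/.
pdf(file.path(root, "figs", "catch_by_year.pdf"))
plot(catch.by.year$year, catch.by.year$catch, type = "b",
     xlab = "Year", ylab = "Catch")
invisible(dev.off())
```

Because the script only reads from data/ and only writes to figs/ and output/, everything generated can be deleted and rebuilt by rerunning it.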
RStudio has an easy-to-use project feature.
Additional sources:
http://www.rstudio.com/ide/docs/using/projects
http://nicercode.github.io/blog/2013-04-05-projects/
http://carlboettiger.info/2012/05/06/research-workflow.html
Functional Programming
R is, at its heart, a functional programming language: it provides many tools for the
creation and manipulation of functions. Anything you can do with vectors, you can do with
functions. This includes assigning them to variables, storing them in lists, and passing them
as arguments to other functions. Functions don’t even have to be named or stored.
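The points above can be sketched in a few lines. The names used here (square, apply_twice, summaries, clicks) are made up for the illustration and do not come from the course materials.

```r
# Assign a function to a variable.
square <- function(x) x^2

# Store functions in a list.
summaries <- list(mean = mean, median = median)

# Pass a function as an argument to another function.
apply_twice <- function(f, x) f(f(x))
twice.squared <- apply_twice(square, 3)  # square(square(3))

# Use an anonymous function without ever naming or storing it.
eleven <- (function(x) x + 1)(10)

# Call every stored function on the same data.
clicks <- c(2, 4, 4, 10)
results <- lapply(summaries, function(f) f(clicks))
```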
Functions remove redundancy and duplication in your code. Repetition lets inconsistencies
creep in and makes code difficult to change. The motivation behind functional programming
is to start with small, easy-to-understand chunks of code and combine them into more
complex analyses.
The “do not repeat yourself” or DRY principle is one approach that states, “every piece of
knowledge must have a single, unambiguous, authoritative representation within a
system.”
When you find yourself writing for loops or repeating lines of code, consider using the
apply() family of functions.
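As a small sketch of the DRY idea, suppose a data frame codes missing values as -99. The data frame, the -99 convention, and the helper fix_missing() are assumptions made for this example.

```r
# Hypothetical data frame in which -99 codes a missing value.
df <- data.frame(a = c(1, -99, 3), b = c(-99, 5, 6))

# Repetitive version: the rule "-99 means missing" is written once per column,
# so a typo in any one copy is easy to miss.
# df$a[df$a == -99] <- NA
# df$b[df$b == -99] <- NA

# DRY version: the rule lives in exactly one place.
fix_missing <- function(x, missing.code = -99) {
  x[x == missing.code] <- NA
  x
}

# Apply the rule to every column; df[] keeps the result a data frame.
df[] <- lapply(df, fix_missing)
```

If the missing-value code ever changes, only the default of missing.code needs to be updated.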
Functionals: apply()
Functionals are functions that take another function as an argument. apply() takes any
function in R and applies it to the rows or columns of a matrix (more generally, to the
margins of an array). apply() is an extremely powerful tool that can simplify your code. In
general, apply() takes the form:
apply(X, MARGIN, FUN, ...)
X: an array, such as a matrix
MARGIN: 1 = rows, 2 = columns
FUN: an R function (you can define your own)
...: additional arguments passed to FUN
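A minimal sketch of apply() on a small matrix follows; the matrix counts and its values are invented for the example.

```r
# Hypothetical 2 x 3 matrix of counts (filled column by column):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
counts <- matrix(1:6, nrow = 2)

row.totals <- apply(counts, 1, sum)   # MARGIN = 1: one value per row
col.means <- apply(counts, 2, mean)   # MARGIN = 2: one value per column

# Arguments after FUN are passed on to FUN, here na.rm = TRUE for sum().
counts.na <- counts
counts.na[1, 1] <- NA
row.totals.na <- apply(counts.na, 1, sum, na.rm = TRUE)
```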
There are several other flavors of apply(), including lapply(), sapply(), and
tapply().
lapply() is useful when dealing with data frames. lapply() applies the function to each
component of a list (a data frame is a list of equal-length columns, so the function is applied
to each column), and as a result there is no MARGIN argument. The output is always a list.
lapply(X, FUN, ...)
X: a list
FUN: an R function (you can define your own)
...: additional arguments passed to FUN
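For instance, lapply() can summarize every column of a data frame in one call. The data frame fish and its columns are hypothetical.

```r
# Hypothetical measurements; each column is one component of the list.
fish <- data.frame(length = c(10, 12, 14), weight = c(1, 2, 3))

# FUN is applied to each column; the result is a named list.
col.means <- lapply(fish, mean)

# Extra arguments are passed through to FUN.
col.means.narm <- lapply(fish, mean, na.rm = TRUE)

# FUN can also be an anonymous function defined on the spot.
col.ranges <- lapply(fish, function(x) max(x) - min(x))
```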
sapply() is a wrapper of lapply() that can return a vector, matrix, or array if
appropriate.
sapply(X, FUN, ..., simplify = TRUE)
X: a list or vector
FUN: an R function (you can define your own)
...: additional arguments passed to FUN
simplify: should the result be simplified to a vector, matrix, or array?
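The sketch below shows how the shape of the sapply() result depends on what FUN returns. The data frame fish is hypothetical.

```r
# Hypothetical measurements.
fish <- data.frame(length = c(10, 12, 14), weight = c(1, 2, 3))

# FUN returns one number per column, so the result simplifies to a named vector.
col.means <- sapply(fish, mean)

# FUN returns two numbers per column (min and max), so the result is a matrix
# with one column per data frame column.
col.ranges <- sapply(fish, range)

# With simplify = FALSE the result stays a list, as with lapply().
col.means.list <- sapply(fish, mean, simplify = FALSE)
```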
tapply() applies a function to categories, specified by INDEX. The packages plyr and
dplyr are similar to tapply() but can be faster and have easier syntax.
tapply(X, INDEX, FUN = NULL, ..., simplify = TRUE)
X: a vector (an atomic object)
INDEX: a factor, or a list of one or more factors, each the same length as X
FUN: an R function (you can define your own)
...: additional arguments passed to FUN
simplify: should the result be simplified to a vector, matrix, or array?
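A minimal sketch of tapply() computing a group mean follows; the vectors lengths.cm and species are made up for the example.

```r
# Hypothetical fish lengths and the species each measurement belongs to.
lengths.cm <- c(10, 12, 30, 34, 11)
species <- factor(c("herring", "herring", "cod", "cod", "herring"))

# Split lengths.cm by species and apply mean() to each group.
mean.by.species <- tapply(lengths.cm, species, mean)

# The result is named by factor level, so groups can be looked up by name.
cod.mean <- unname(mean.by.species["cod"])
herring.mean <- unname(mean.by.species["herring"])
```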
dplyr will be covered later in the course. dplyr has really simple and intuitive syntax
and it’s super, super fast. It can apply functions to datasets with millions of rows in
milliseconds. Users of plyr will appreciate the huge reduction in computation time.
dplyr is awesome.
Additional References:
http://www.ats.ucla.edu/stat/r/library/advanced_function_r.htm
http://adv-r.had.co.nz/Functionals.html
http://blog.rstudio.org/2014/01/17/introducing-dplyr/
Closures
Closures are functions written by other functions. Their name comes from the fact that they
enclose the environment of the parent function and can access all its variables. The concept
of scope and environment will be explained in next week’s lecture. In R, almost every
function is a closure. The exceptions are primitive functions like sum or cumsum, which
call C code directly.
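A minimal sketch of two closures follows. The function names make_power() and make_counter() are chosen for this illustration and are not from the course materials.

```r
# A parent function that writes and returns a new function.
make_power <- function(exponent) {
  force(exponent)  # evaluate the argument now so the closure captures its value
  function(x) x^exponent
}

square <- make_power(2)
cube <- make_power(3)
sixteen <- square(4)  # uses the exponent enclosed when square was created
eight <- cube(2)

# A closure can also keep state between calls by modifying a variable
# in its enclosing environment with <<-.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first <- counter()
second <- counter()

# Ordinary R functions are closures; primitives such as sum are not.
square.type <- typeof(square)
sum.is.primitive <- is.primitive(sum)
```

Each call to make_power() or make_counter() creates a fresh enclosing environment, so square and cube do not share an exponent, and a second counter would start again from zero.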
Debugging
R has a few somewhat clunky and inelegant debugging tools like traceback() and
browser(). Trevor wrote a debugging lecture for FISH 554, Advanced R Programming,
which is available on the course website, and Sean Anderson has written a blog post
outlining these debugging tools. However, RStudio now has a line-by-line debugging tool
that probably renders these methods obsolete.
RStudio Debugging:
http://www.rstudio.com/ide/docs/debugging/overview
Sean’s blog post:
http://seananderson.ca/2013/08/23/debugging-r.html