Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Writing Functions in R We have already used several existing R functions in previous handouts. For example, consider the Grades dataset. Once the data frame has been attached in R, we can call the mean() function as follows to compute the average of the Final Exam scores. > mean(Final) [1] 69.1791 We can also call the median() function. > median(Final) [1] 72 Before we discussing writing new functions in R, let’s examine an existing function more closely. For more insight into the coding behind the median() function, consider the following R commands and output. > median function (x, na.rm = FALSE) UseMethod("median") <bytecode: 0x000000000b265d20> <environment: namespace:stats> > methods(median) [1] median.default > median.default function (x, na.rm = FALSE) { if (is.factor(x) || is.data.frame(x)) stop("need numeric data") if (length(names(x))) names(x) <- NULL if (na.rm) x <- x[!is.na(x)] else if (any(is.na(x))) return(x[FALSE][NA]) n <- length(x) if (n == 0L) return(x[FALSE][NA]) half <- (n + 1L)%/%2L if (n%%2L == 1L) sort(x, partial = half)[half] else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L]) } 1 When determining how to use this function, note that this function contains two arguments: x and na.rm. For more information on these arguments, enter the following at the prompt. > help(median) Description Compute the sample median. Usage median(x, na.rm = FALSE) Arguments x an object for which a method has been defined, or a numeric vector containing the values whose median is to be computed. na.rm a logical value indicating whether NA values should be stripped before the computation proceeds. Notice that by default, the na.rm argument is set to FALSE. This means that the following arguments both return the same result: > median(Final) [1] 72 > median(Final,na.rm=FALSE) [1] 72 The previous example gives us some insight into how functions are programmed with arguments and called in R. You can learn a lot about R programming by examining the code behind functions that were written by others, and before you know it, you’ll be writing your own. The remainder of this handout provides an introduction to the basics of writing your own functions in R. WRITING NEW FUNCTIONS IN R One of the advantages of using a scriptable language like R is the ability to write your own functions (or to modify existing functions). An R programmer can define their own functions using the function() and return() functions. Description These functions provide the base mechanisms for defining new functions in the R language. 2 Usage function( arglist ) expr return(value) Arguments arglist Empty or one or more name or name=expression terms. expr An expression. value An expression. Example 1: Creating a function to add two numbers Note that the sum() function already exists in R and can be used to add two numbers; for illustrative purposes, however, we will write our own simple function to do this. add_two <- function(a,b){ a+b } To call the function, enter the following at the prompt. > add_two(3,9) [1] 12 Example 2: Creating a function to compute multiple quantities Next, suppose you want to write a function in R to both find the difference between two values and the ratio of those two values. You may start with the following. f1 <- function(a,b){ a-b a/b } > f1(.5,.25) [1] 2 Note that the function returns the result of only the last of the computations. To report all of the results, you can use the return statement as shown below. f1 <- function(a,b){ return(c(a-b,a/b)) } 3 > f1(.5,.25) [1] 0.25 2.00 Finally, note that it may be more useful to assign names to the values calculated and to return a more fle xible data type (such as a list object) to provide more information about the calculations that have been performed. The following programming statements return the results in a list. f1 <- function(a,b){ Result1 = a-b Result2 = a/b list(Difference=Result1,Ratio=Result2) } > f1(.5,.25) $Difference [1] 0.25 $Ratio [1] 2 Example 3: Computing the mean of a vector of numbers Once again, note that the mean() function already exists in R and can be used to find the average of a vector of numbers; for illustrative purposes, however, we will write our own simple function to do this. Suppose we want to name our function average. We can check to make sure that this is not already a keyword in R (we wouldn’t want to overwrite an existing function!). > ?average No documentation for ‘average’ in specified packages and libraries: you could try ‘??average’ Next, we can write the function as follows. average <- function(x){ s = sum(x) avg = s/length(x) return (avg) } Note that this function can now be applied to calculate the average of the final exam scores from the Grades data frame. > attach(Grades) > average(Final) [1] 69.1791 4 Example 4: Computing the standard error of the mean and dealing with missing values Recall that the standard error of a sample mean is computed as follows: s n , where s is the sample standard deviation and n is the sample size. The R base package does not contain a function to compute this standard error, so let’s write our own function named sem(). sem <- function(x){ n = length(x) se = sd(x)/sqrt(n) return (se) } > sem(Final) [1] 1.756686 Next, import the data file Grades_missing. Note that in one case, a student did not take the final exam. > Grades_missing[1:3,c("FirstName", "LastName", "Final")] FirstName LastName Final 1 Aaron Albrecht 63 2 Aaron Alen 51 3 Abbey Antoff NA What impact will this have on using our sem() function? > sem(Final) [1] NA Clearly, our function does not work with missing data. How do we fix it? First, let’s remove the old function from our workspace. > rm(sem) Now, we can update our function. sem <- function(x){ n = length(na.omit(x)) se = sd(x,na.rm=TRUE)/sqrt(n) return (se) } > sem(Final) [1] 1.76148 5 Example 5: Modifying the summary() function Recall that the summary() function can be used to calculate several summary statistics for the final exam scores. > detach(Grades_missing) > attach(Grades) > summary(Final) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.00 60.25 72.00 69.18 83.00 100.00 What if you would also like to see the standard deviation, sample size, and standard error of the mean? You can write your own function to add to the output provided by the summary() function. Here, we will name this revised function num.summary. num.summary <- function(x){ original = summary(x) s = sd(x,na.rm=TRUE) n=length(na.omit(x)) se=s/sqrt(n) list(c(original,"Std Dev" = s, "Sample Size" = n, "SE of mean" = se)) } > num.summary(Final) [[1]] Min. 1st Qu. Median Mean 3rd Qu. Max. Std Dev Sample Size SE of mean .000000 60.250000 72.000000 69.180000 83.000000 100.000000 20.335106 134.000000 1.756686 Finally, note that you could change the layout of the output, if desired. num.summary <- function(x){ Min = summary(x)[1] Q1 = summary(x)[2] Q2 = summary(x)[3] Q3 = summary(x)[5] Max = summary(x)[6] Mean = summary(x)[4] s = sd(x,na.rm=TRUE) n=length(na.omit(x)) se=s/sqrt(n) cat(paste("Min: ", Min)) cat("\n") cat(paste("Q1: ", Q1)) cat("\n") cat(paste("Median: ", Q2)) cat("\n") cat(paste("Q3: ", Q3)) cat("\n") cat(paste("Max: ", Max)) 6 cat("\n") cat(paste("Mean: ", Mean)) cat("\n") cat(paste("Standard Deviation: ", s)) cat("\n") cat(paste("Sample Size: ", n)) cat("\n") cat(paste("Standard Error: ", se)) } > num.summary(Final) Min: 0 Q1: 60.25 Median: 72 Q3: 83 Max: 100 Mean: 69.18 Standard Deviation: 20.3351064067899 Sample Size: 134 Standard Error: 1.7566856355663 Example 6: Modifying the plot() function Suppose you want the plot() function to use a smaller gray-filled circle as the default plotting symbol instead of an open circle. You can modify the existing plot()function as follows. Note that you can pass an unspecified number of parameters to a function using the … notation; however, when using the … notation, be sure to carefully consider the order of the arguments in the list. my.plot <- function(..., pch.new=21, bg.new='gray', cex.new=.75 ) { plot(..., pch=pch.new, bg=bg.new, cex=cex.new ) } > my.plot(Exam1,Final) 7 > plot(Exam1,Final) Tasks: 1. Recall that the formula for a sphere is 4⁄3πr3. Write a function named sphere.volume that returns the volume of a sphere when given the radius r as a parameter. Then, call the function to return the volume of a sphere with a radius of 5. Call the function again to return the volume of a sphere with a radius of 10. 2. Write a function named my.plot which calls the plot() function but uses a blue open circle as the default plotting symbol, instead. Use your function to obtain a scatterplot of Exam 1 vs. Exam 2 scores from the Grades data set. 3. Recall that the syntax for the by() function. Usage by(data, INDICES, FUN, ..., simplify = TRUE) Arguments data an R object, normally a data frame, possibly a matrix. INDICES a factor or a list of factors, each of length nrow(data). FUN a function to be applied to (usually data-frame) subsets of data. ... further arguments to FUN. Suppose you always mess up the order of the arguments because you think it makes more sense to list the arguments in this order, instead: (1) the function to be applied, (2) the data frame (or 8 subset of a data frame) to which you want to apply the function, and (3) the BY factor. Write a function named my.by() which allows you to enter the arguments in the order you desire. Then, import the NYC_trees data set. Use your my.by() function to compute the mean Foliage Density for each Condition. 9