Download Writing Functions in R

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Types of artificial neural networks wikipedia , lookup

Transcript
Writing Functions in R
We have already used several existing R functions in previous handouts. For example, consider the
Grades dataset. Once the data frame has been attached in R, we can call the mean() function as follows
to compute the average of the Final Exam scores.
> mean(Final)
[1] 69.1791
We can also call the median() function.
> median(Final)
[1] 72
Before we discussing writing new functions in R, let’s examine an existing function more closely. For
more insight into the coding behind the median() function, consider the following R commands and
output.
> median
function (x, na.rm = FALSE)
UseMethod("median")
<bytecode: 0x000000000b265d20>
<environment: namespace:stats>
> methods(median)
[1] median.default
> median.default
function (x, na.rm = FALSE)
{
if (is.factor(x) || is.data.frame(x))
stop("need numeric data")
if (length(names(x)))
names(x) <- NULL
if (na.rm)
x <- x[!is.na(x)]
else if (any(is.na(x)))
return(x[FALSE][NA])
n <- length(x)
if (n == 0L)
return(x[FALSE][NA])
half <- (n + 1L)%/%2L
if (n%%2L == 1L)
sort(x, partial = half)[half]
else mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
}
1
When determining how to use this function, note that this function contains two arguments: x and
na.rm. For more information on these arguments, enter the following at the prompt.
> help(median)
Description
Compute the sample median.
Usage
median(x, na.rm = FALSE)
Arguments
x
an object for which a method has been defined, or a numeric vector containing the values whose
median is to be computed.
na.rm a logical value indicating whether NA values should be stripped before the computation proceeds.
Notice that by default, the na.rm argument is set to FALSE. This means that the following arguments
both return the same result:
> median(Final)
[1] 72
> median(Final,na.rm=FALSE)
[1] 72
The previous example gives us some insight into how functions are programmed with arguments and
called in R. You can learn a lot about R programming by examining the code behind functions that were
written by others, and before you know it, you’ll be writing your own. The remainder of this handout
provides an introduction to the basics of writing your own functions in R.
WRITING NEW FUNCTIONS IN R
One of the advantages of using a scriptable language like R is the ability to write your own functions (or
to modify existing functions). An R programmer can define their own functions using the function() and
return() functions.
Description
These functions provide the base mechanisms for defining new functions in the R language.
2
Usage
function( arglist ) expr
return(value)
Arguments
arglist Empty or one or more name or name=expression terms.
expr
An expression.
value An expression.
Example 1: Creating a function to add two numbers
Note that the sum() function already exists in R and can be used to add two numbers; for illustrative
purposes, however, we will write our own simple function to do this.
add_two <- function(a,b){
a+b
}
To call the function, enter the following at the prompt.
> add_two(3,9)
[1] 12
Example 2: Creating a function to compute multiple quantities
Next, suppose you want to write a function in R to both find the difference between two values and the
ratio of those two values. You may start with the following.
f1 <- function(a,b){
a-b
a/b
}
> f1(.5,.25)
[1] 2
Note that the function returns the result of only the last of the computations. To report all of the
results, you can use the return statement as shown below.
f1 <- function(a,b){
return(c(a-b,a/b))
}
3
> f1(.5,.25)
[1] 0.25 2.00
Finally, note that it may be more useful to assign names to the values calculated and to return a more fle
xible data type (such as a list object) to provide more information about the calculations that have been
performed. The following programming statements return the results in a list.
f1 <- function(a,b){
Result1 = a-b
Result2 = a/b
list(Difference=Result1,Ratio=Result2)
}
> f1(.5,.25)
$Difference
[1] 0.25
$Ratio
[1] 2
Example 3: Computing the mean of a vector of numbers
Once again, note that the mean() function already exists in R and can be used to find the average of a
vector of numbers; for illustrative purposes, however, we will write our own simple function to do this.
Suppose we want to name our function average. We can check to make sure that this is not already a
keyword in R (we wouldn’t want to overwrite an existing function!).
> ?average
No documentation for ‘average’ in specified packages and libraries:
you could try ‘??average’
Next, we can write the function as follows.
average <- function(x){
s = sum(x)
avg = s/length(x)
return (avg)
}
Note that this function can now be applied to calculate the average of the final exam scores from the
Grades data frame.
> attach(Grades)
> average(Final)
[1] 69.1791
4
Example 4: Computing the standard error of the mean and dealing with missing values
Recall that the standard error of a sample mean is computed as follows: s
n , where s is the sample
standard deviation and n is the sample size. The R base package does not contain a function to compute
this standard error, so let’s write our own function named sem().
sem <- function(x){
n = length(x)
se = sd(x)/sqrt(n)
return (se)
}
> sem(Final)
[1] 1.756686
Next, import the data file Grades_missing. Note that in one case, a student did not take the final exam.
> Grades_missing[1:3,c("FirstName", "LastName", "Final")]
FirstName LastName Final
1 Aaron Albrecht 63
2 Aaron Alen 51
3 Abbey Antoff NA
What impact will this have on using our sem() function?
> sem(Final)
[1] NA
Clearly, our function does not work with missing data. How do we fix it? First, let’s remove the old
function from our workspace.
> rm(sem)
Now, we can update our function.
sem <- function(x){
n = length(na.omit(x))
se = sd(x,na.rm=TRUE)/sqrt(n)
return (se)
}
> sem(Final)
[1] 1.76148
5
Example 5: Modifying the summary() function
Recall that the summary() function can be used to calculate several summary statistics for the final exam
scores.
> detach(Grades_missing)
> attach(Grades)
> summary(Final)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 60.25 72.00 69.18 83.00 100.00
What if you would also like to see the standard deviation, sample size, and standard error of the mean?
You can write your own function to add to the output provided by the summary() function. Here, we will
name this revised function num.summary.
num.summary <- function(x){
original = summary(x)
s = sd(x,na.rm=TRUE)
n=length(na.omit(x))
se=s/sqrt(n)
list(c(original,"Std Dev" = s, "Sample Size" = n, "SE of mean" = se))
}
> num.summary(Final)
[[1]]
Min. 1st Qu. Median Mean 3rd Qu. Max. Std Dev Sample Size SE of mean
.000000 60.250000 72.000000 69.180000 83.000000 100.000000 20.335106 134.000000 1.756686
Finally, note that you could change the layout of the output, if desired.
num.summary <- function(x){
Min = summary(x)[1]
Q1 = summary(x)[2]
Q2 = summary(x)[3]
Q3 = summary(x)[5]
Max = summary(x)[6]
Mean = summary(x)[4]
s = sd(x,na.rm=TRUE)
n=length(na.omit(x))
se=s/sqrt(n)
cat(paste("Min: ", Min))
cat("\n")
cat(paste("Q1: ", Q1))
cat("\n")
cat(paste("Median: ", Q2))
cat("\n")
cat(paste("Q3: ", Q3))
cat("\n")
cat(paste("Max: ", Max))
6
cat("\n")
cat(paste("Mean: ", Mean))
cat("\n")
cat(paste("Standard Deviation: ", s))
cat("\n")
cat(paste("Sample Size: ", n))
cat("\n")
cat(paste("Standard Error: ", se))
}
> num.summary(Final)
Min: 0
Q1: 60.25
Median: 72
Q3: 83
Max: 100
Mean: 69.18
Standard Deviation: 20.3351064067899
Sample Size: 134
Standard Error: 1.7566856355663
Example 6: Modifying the plot() function
Suppose you want the plot() function to use a smaller gray-filled circle as the default plotting symbol
instead of an open circle. You can modify the existing plot()function as follows. Note that you can pass
an unspecified number of parameters to a function using the … notation; however, when using the …
notation, be sure to carefully consider the order of the arguments in the list.
my.plot <- function(..., pch.new=21, bg.new='gray', cex.new=.75 ) {
plot(..., pch=pch.new, bg=bg.new, cex=cex.new )
}
> my.plot(Exam1,Final)
7
> plot(Exam1,Final)
Tasks:
1. Recall that the formula for a sphere is 4⁄3πr3. Write a function named sphere.volume that returns
the volume of a sphere when given the radius r as a parameter. Then, call the function to return
the volume of a sphere with a radius of 5. Call the function again to return the volume of a
sphere with a radius of 10.
2. Write a function named my.plot which calls the plot() function but uses a blue open circle as the
default plotting symbol, instead. Use your function to obtain a scatterplot of Exam 1 vs. Exam 2
scores from the Grades data set.
3. Recall that the syntax for the by() function.
Usage
by(data, INDICES, FUN, ..., simplify = TRUE)
Arguments
data
an R object, normally a data frame, possibly a matrix.
INDICES a factor or a list of factors, each of length nrow(data).
FUN
a function to be applied to (usually data-frame) subsets of data.
...
further arguments to FUN.
Suppose you always mess up the order of the arguments because you think it makes more sense
to list the arguments in this order, instead: (1) the function to be applied, (2) the data frame (or
8
subset of a data frame) to which you want to apply the function, and (3) the BY factor. Write a
function named my.by() which allows you to enter the arguments in the order you desire. Then,
import the NYC_trees data set. Use your my.by() function to compute the mean Foliage Density
for each Condition.
9