Introduction to Programming in R
Department of Statistical Sciences and Operations Research
Computation Seminar Series
Speaker: Edward Boone
Email: [email protected]
What is R?
R is a free, open-source statistical programming language based on the S language, which was developed at Bell Labs.
The language is very powerful for writing programs.
Many statistical functions are already built in.
Contributed packages extend its functionality to cutting-edge research methods.
Because R is a programming language, tasks are completed by writing code.
Getting Started
Where to get R?
Go to www.r-project.org
Downloads: CRAN
Set your mirror: any mirror in the USA is fine.
Select Windows 95 or later.
Select base.
Select R-2.4.1-win32.exe
The other downloads are for developers who wish to change the source code.
Getting Started

The R GUI?
Getting Started


Opening a script.
This gives you a script window.
Getting Started
Submitting a program:
Use the Submit Selection button, or
right-click and choose run selection.
Getting Started
Basic assignment and operations.
Arithmetic operations:
+, -, *, /, ^ are the standard arithmetic operators.
Matrix arithmetic:
* is element-wise multiplication.
%*% is matrix multiplication.
Assignment:
To assign a value to a variable, use <- (for example, x <- 5).
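The difference between * and %*% is easiest to see on two small matrices. This is a minimal sketch; the names A, B and x are chosen here for illustration only.

```r
# Assignment uses <-
x <- 2^3 + 10 / 4                      # 8 + 2.5

A <- matrix(c(1, 2, 3, 4), nrow = 2)   # filled column by column: rows (1, 3) and (2, 4)
B <- matrix(c(5, 6, 7, 8), nrow = 2)   # rows (5, 7) and (6, 8)

# Element-wise product: each entry of A times the matching entry of B
elementwise <- A * B

# Matrix product: rows of A times columns of B
matprod <- A %*% B

print(elementwise)
print(matprod)
```

Both products are 2 x 2 here, but they contain different numbers; mixing them up is a common source of silent errors.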
Getting Started
How to use help in R?
R has a very good built-in help system.
If you know which function you want help with, type ? followed by the function name.
Example: ?hist
If you don’t know which function to use, use help.search("topic").
Example: help.search("histogram")
Importing Data
How do we get data into R?
Remember, there is no point and click…
First make sure your data is in an easy-to-read format such as CSV (Comma Separated Values).
Use this code, replacing path with the location of your file:
D <- read.table("path", sep = ",", header = TRUE)
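Because "path" above is a placeholder, the sketch below first writes a small CSV to a temporary file and then reads it back. The column names and values are made up for illustration. read.csv() is a shortcut for read.table() with sep = "," and header = TRUE as its defaults.

```r
# Write a small, made-up CSV file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("Gender,Height",
             "M,70",
             "F,64",
             "M,72"), path)

# sep = "," splits fields on commas; header = TRUE takes column names from the first line
D <- read.table(path, sep = ",", header = TRUE)

# read.csv() uses sep = "," and header = TRUE by default
D2 <- read.csv(path)

print(D)
```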
Working with data.
Accessing columns.
D now holds our data, but it is not displayed unless you ask for it (type D, or head(D) for the first few rows).
To select a column, use D$column, where column is the column name.
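A minimal sketch of looking at a data frame and pulling out one column. The data frame here is built in code and stands in for data read from a file; its columns are hypothetical.

```r
# A small data frame standing in for data read from a file
D <- data.frame(Gender = c("M", "F", "M"),
                Height = c(70, 64, 72))

head(D)          # print the first rows to see the data
names(D)         # list the column names
D$Height         # select one column as a vector
mean(D$Height)   # a column can be passed straight to a function
```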
Working with data.
Subsetting data.
Use a logical (comparison) operator to do this.
==, !=, >, <, <=, >= are the comparison operators (!= means "not equal").
Note that the "equals" operator is two = signs.
Example:
D[D$Gender == "M", ]
This returns the rows of D where Gender is "M".
Remember R is case sensitive: "m" does not match "M".
This code does not modify the original dataset.
D.M <- D[D$Gender == "M", ] stores the matching rows in a new dataset, D.M.
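The sketch below shows case sensitivity, the != operator, and combining conditions with & (and). The data frame and its values are invented for illustration.

```r
D <- data.frame(Gender = c("M", "F", "M", "m"),
                Height = c(70, 64, 72, 68))

# Rows where Gender is exactly "M"; the lowercase "m" row is not matched
D.M <- D[D$Gender == "M", ]

# != means "not equal"; & combines conditions (use | for "or")
D.notM  <- D[D$Gender != "M", ]
D.tallM <- D[D$Gender == "M" & D$Height > 71, ]

nrow(D)   # the original data frame is unchanged
```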
Creating a Vector

To create a vector use the c() function
b <- c(3,1,0.3,0.1)
This creates the column vector
$$b = \begin{pmatrix} 3 \\ 1 \\ 0.3 \\ 0.1 \end{pmatrix}$$
Random Number Generation

Random number generation is important
in simulations as well as some model
fitting techniques.

Consider:
X1 <- rnorm(100,5,2)
This generates a vector of 100 normal
random variables with mean 5 and
standard deviation 2.
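Random draws change on every run. Calling set.seed() first makes a simulation reproducible; the seed value 2007 below is arbitrary. This is a minimal sketch showing that the sample mean and standard deviation land near, but not exactly on, 5 and 2.

```r
set.seed(2007)                  # fix the seed so results can be reproduced
X1 <- rnorm(100, 5, 2)          # 100 draws, mean 5, standard deviation 2

length(X1)
mean(X1)   # close to 5, but not exactly 5
sd(X1)     # close to 2

# Resetting the same seed reproduces exactly the same draws
set.seed(2007)
again <- rnorm(100, 5, 2)
identical(X1, again)
```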
Random Number Generation

Generate two more vectors:
X2 <- rnorm(100,15,3)
X3 <- rnorm(100,22,5)

This gives us two more vectors of normally distributed values, with means 15 and 22 and standard deviations 3 and 5.
Determining the Size of a Vector

Use the length function.
n1 <- length(X1)

Use this only for vectors. On a matrix, length() returns the total number of elements (rows × columns); use nrow(), ncol() or dim() instead.
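A short sketch of why length() is misleading on a matrix. The 3 x 4 matrix M is a made-up example.

```r
v <- c(3, 1, 0.3, 0.1)
M <- matrix(1:12, nrow = 3)   # a 3 x 4 matrix

length(v)   # number of elements in the vector
length(M)   # total number of entries, not the number of rows
nrow(M)     # number of rows
ncol(M)     # number of columns
dim(M)      # rows and columns together
```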
Creating a Vector of Repeated Values


Often we need a vector of ones (for example, as the intercept column of a regression).
Use the rep() function.
ones <- rep(1,n1)

This creates a vector of ones of length
n1.
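rep() can also repeat a whole vector or each of its elements. This sketch uses the times and each arguments of rep(); the variable names twice and each2 are illustrative.

```r
n1 <- 5
ones <- rep(1, n1)                   # a vector of n1 ones

twice <- rep(c(1, 2), times = 3)     # repeat the whole vector three times
each2 <- rep(c(1, 2), each = 2)      # repeat each element twice

ones
twice
each2
```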
Creating a Matrix from Vectors

Use the cbind() function.

X <- cbind(ones,X1,X2,X3)

This binds the column vectors together into
a matrix.
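A minimal sketch of cbind() on short vectors, assuming the same kind of simulated columns as above but only five rows so the result is easy to inspect. cbind() takes the column names from the variable names; rbind() is the row-wise counterpart.

```r
set.seed(1)
X1 <- rnorm(5, 5, 2)
X2 <- rnorm(5, 15, 3)
ones <- rep(1, length(X1))

X <- cbind(ones, X1, X2)   # each vector becomes one column
dim(X)                     # 5 rows, 3 columns
colnames(X)                # names come from the variable names

R <- rbind(1:3, 4:6)       # rbind() stacks vectors as rows instead
R
```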
Create a Regression Relationship

Using our randomly generated data, create a regression relationship:
$$Y = X\beta + \varepsilon, \quad \varepsilon \sim N(0, I)$$
where the true coefficient vector $\beta$ is b.
Use the code:
Y <- X%*%b + rnorm(100,0,1)
Estimate a Regression Model

Find the normal equations
$$X'X\hat{\beta} = X'Y$$

Use the code
XtX <- t(X)%*%X
XtY <- t(X)%*%Y
Solve the normal equations

To estimate the regression parameters, solve the normal equations:
$$\hat{\beta} = (X'X)^{-1}X'Y$$

Use the following code.
bhat <- solve(XtX)%*%XtY

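Check it against R's built-in lm() function. lm() adds the intercept itself, so the ones column is not listed; its coefficients should match bhat.
bhat
lm(Y ~ X1 + X2 + X3)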
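The steps so far can be collected into one script. This sketch adds set.seed() (the seed is arbitrary) and uses drop() to turn X %*% b into a plain vector, so lm() fits a single response. As an alternative to solve(XtX) %*% XtY, it also calls solve(XtX, XtY), which solves the linear system directly without forming the inverse; the two should agree here.

```r
set.seed(42)
X1 <- rnorm(100, 5, 2)
X2 <- rnorm(100, 15, 3)
X3 <- rnorm(100, 22, 5)
ones <- rep(1, length(X1))
X <- cbind(ones, X1, X2, X3)

b <- c(3, 1, 0.3, 0.1)                     # true coefficients
Y <- drop(X %*% b) + rnorm(100, 0, 1)      # Y = X b + error, error ~ N(0, 1)

# Normal equations X'X bhat = X'Y
XtX <- t(X) %*% X
XtY <- t(X) %*% Y

bhat_inv <- solve(XtX) %*% XtY             # invert X'X, as on the slide
bhat     <- solve(XtX, XtY)                # solve the system directly

fit <- lm(Y ~ X1 + X2 + X3)                # lm() adds the intercept column itself
cbind(bhat, coef(fit))
```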
Create a Regression Function

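Use function() to define a function.
reg1 <- function(Y,X){
  # least-squares estimates from the normal equations
  res <- solve(t(X)%*%X)%*%t(X)%*%Y
  return(res)
}

Don’t forget to return the result. (Without return(), R returns the value of the last expression evaluated.)
The code inside the braces is the body of the function.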
Try the function
Use the data already created.
reg1(Y,X)
Add to the function

Use the list() function to return more than one result. Each result becomes a component of the list that the function returns.
reg2 <- function(Y,X){
  coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
  resid <- Y - X%*%coeff
  # residual degrees of freedom: n minus the number of coefficients (intercept included)
  mse <- t(resid)%*%resid/(length(Y)-length(coeff))
  res <- list(coeff,resid,mse)
  return(res)
}
Try the function
Use the data already created.
reg2(Y,X)
Add names to the function properties

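The names() function lets you name the components of the list, so each one can be extracted with $ (for example, reg3(Y,X)$coeff).
reg3 <- function(Y,X){
  coeff <- solve(t(X)%*%X)%*%t(X)%*%Y
  resid <- Y - X%*%coeff
  # residual degrees of freedom: n minus the number of coefficients (intercept included)
  mse <- t(resid)%*%resid/(length(Y)-length(coeff))
  res <- list(coeff,resid,mse)
  names(res) <- c('coeff','residuals','mse')
  return(res)
}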
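A sketch of using the named components, with a check on the variance estimate. It defines reg3 with the denominator n - p, where p = length(coeff) already counts the intercept, and compares mse with the squared residual standard error reported by summary(lm()), which uses the same degrees of freedom. The simulated data and seed are illustrative.

```r
reg3 <- function(Y, X){
  coeff <- solve(t(X) %*% X) %*% t(X) %*% Y
  resid <- Y - X %*% coeff
  mse <- t(resid) %*% resid / (length(Y) - length(coeff))   # n - p
  res <- list(coeff, resid, mse)
  names(res) <- c('coeff', 'residuals', 'mse')
  return(res)
}

set.seed(42)
X1 <- rnorm(100, 5, 2)
X2 <- rnorm(100, 15, 3)
X3 <- rnorm(100, 22, 5)
X <- cbind(1, X1, X2, X3)
Y <- drop(X %*% c(3, 1, 0.3, 0.1)) + rnorm(100, 0, 1)

out <- reg3(Y, X)
out$coeff          # named components are extracted with $
out$mse            # estimate of the error variance

# lm() reports the residual standard error sigma; sigma^2 should equal mse
fit <- lm(Y ~ X1 + X2 + X3)
summary(fit)$sigma^2
```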
Programming Goal: PRESS

Computing PRESS lets us demonstrate basic programming constructs in an application:
Matrix operations
Creating functions
Loops
Data subsetting and storage
Programming Goal: PRESS

PRESS is the prediction sum of squares of a regression model. It is computed via:
$$\mathrm{PRESS} = \sum_{i=1}^{n} \left(y_i - \hat{y}_{(-i)}\right)^2$$
where $\hat{y}_{(-i)}$ is the predicted value of $y_i$ using a model fit with all of the data except observation $i$.
Loops

To construct a for loop use the following structure
for(i in 1:n){
Operations…
}
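A minimal filled-in for loop, with storage allocated before the loop as the PRESS function does. The variable names are illustrative. seq_len(n) is shown as a safer alternative to 1:n, because 1:0 counts down to c(1, 0) instead of producing an empty sequence.

```r
n <- 5
squares <- rep(0, n)          # allocate storage for the results first
for(i in 1:n){
  squares[i] <- i^2           # the body runs once for each i = 1, ..., n
}
squares

# 1:0 is c(1, 0), but seq_len(0) is empty, so a loop over it runs zero times
length(1:0)
length(seq_len(0))
```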
PRESS
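PRESS <- function(Y,X){
  Y <- as.vector(Y)                  # works whether Y is a vector or an n x 1 matrix
  n1 <- length(Y)
  ind1 <- 1:n1
  presshold <- rep(0,n1)
  for(i in 1:n1){
    X1 <- X[ind1 != i,]              # all rows except observation i
    Y1 <- Y[ind1 != i]
    coef1 <- reg3(Y1,X1)$coeff       # fit the model without observation i
    X2 <- X[ind1 == i, , drop=FALSE] # row i, kept as a 1 x p matrix
    Y2 <- Y[ind1 == i]
    Yp <- X2%*%coef1                 # predicted value for observation i
    presshold[i] <- (Y2 - Yp)^2
  }
  res <- sum(presshold)              # PRESS sums the squared prediction errors
  return(res)
}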
Try the function
Use the data already created.
PRESS(Y,X)
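One way to check a PRESS implementation: for least squares, the leave-one-out residual equals the ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the leverage (diagonal of the hat matrix). This sketch computes PRESS both by an explicit leave-one-out loop and by that shortcut, using residuals() and hatvalues() on an lm() fit; the simulated data are illustrative.

```r
set.seed(42)
n <- 100
X1 <- rnorm(n, 5, 2)
X2 <- rnorm(n, 15, 3)
X3 <- rnorm(n, 22, 5)
X <- cbind(1, X1, X2, X3)
Y <- drop(X %*% c(3, 1, 0.3, 0.1)) + rnorm(n, 0, 1)

# Leave-one-out loop, the same idea as the PRESS function
press_loop <- 0
for(i in 1:n){
  coef_i <- solve(t(X[-i, ]) %*% X[-i, ], t(X[-i, ]) %*% Y[-i])  # fit without row i
  pred_i <- sum(X[i, ] * coef_i)                                   # predict row i
  press_loop <- press_loop + (Y[i] - pred_i)^2
}

# Shortcut from a single fit: leave-one-out residual = e_i / (1 - h_ii)
fit <- lm(Y ~ X1 + X2 + X3)
press_hat <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

c(press_loop, press_hat)
```

The shortcut needs only one model fit, which matters when n is large; the loop is still the clearer illustration of what PRESS measures.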
If…then constructs

To apply an if…then rule to every element of a vector, use the ifelse() function.
ifelse(condition, value if TRUE, value if FALSE)
Example
X1 <- runif(15,0,1)
X2 <- ifelse(X1<.5,1,0)
cbind(X1,X2)

Did it work?
If…then constructs

For a single TRUE/FALSE condition rather than a vector, use the if(){}else{} construct.
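A minimal sketch of if(){}else{} on a single value, next to ifelse() on a vector. The values and labels are illustrative. At the top level of a script, else must sit on the same line as the closing brace of the if block.

```r
x <- 0.7
if(x < 0.5){
  label <- "low"
} else {
  label <- "high"
}
label

# if() needs a single TRUE/FALSE; for a whole vector use ifelse()
v <- c(0.2, 0.7)
labels <- ifelse(v < 0.5, "low", "high")
labels
```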
Source Files


Source files allow you to store all of your functions in a single file and make them all available with one command.
To load a self-created library, use:
source("path")

On Windows, each \ in the path must be written as \\ (forward slashes / also work).
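A self-contained sketch of the source() workflow: it writes reg1 into a temporary .R file and then loads it. In practice the file would live at a fixed location of your choosing; the path in the final comment is hypothetical.

```r
# Write a small function library to a temporary file
lib_path <- tempfile(fileext = ".R")
writeLines(c(
  "reg1 <- function(Y, X){",
  "  res <- solve(t(X) %*% X) %*% t(X) %*% Y",
  "  return(res)",
  "}"
), lib_path)

# source() runs every line of the file, so reg1 is now defined
source(lib_path)
exists("reg1")

# A fixed Windows location could be written as "C:/Rwork/myfunctions.R" (hypothetical path)
```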
Writing to a file

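To write to a file, use the write.table() function.
write.table(dataset, "path", sep = ",", row.names = FALSE)

This will produce a comma-separated values (CSV) file. Column names are written by default (col.names = TRUE); unlike read.table(), write.table() has no header argument.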
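A round-trip sketch: write a small data frame with write.table() and read it back with read.csv(). The data frame and the temporary path are illustrative. row.names = FALSE keeps R's row numbers out of the file, so the file reads back as the same two columns.

```r
D <- data.frame(Gender = c("M", "F", "M"),
                Height = c(70L, 64L, 72L))
out_path <- tempfile(fileext = ".csv")

# sep = "," gives a CSV; column names are written by default
write.table(D, out_path, sep = ",", row.names = FALSE)

readLines(out_path)                                      # inspect the raw file
D_back <- read.csv(out_path, stringsAsFactors = FALSE)   # read it back in
D_back
```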
Linear Algebra Extras
For eigenvalues and eigenvectors, use the eigen() function.
This returns an object that contains both the eigenvalues ($values) and the eigenvectors ($vectors).
Example:
eigen(XtX)
$values
[1] 77901.567997  1375.036486   456.253787     1.847225

$vectors
            [,1]        [,2]        [,3]        [,4]
[1,] -0.03534617 -0.02144023 -0.02084214  0.99892771
[2,] -0.18185911 -0.21669130 -0.95864785 -0.03108754
[3,] -0.54097195 -0.79158676  0.28253402 -0.03023690
[4,] -0.82038239  0.57094272  0.02710023 -0.01620879
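A small sketch of working with the eigen() result: extract the components with $ and check $Av = \lambda v$ for the first eigenpair. The 2 x 2 symmetric matrix A is made up; its eigenvalues are $(5 \pm \sqrt{5})/2$. For a symmetric matrix, eigen() returns the eigenvalues in decreasing order, with each eigenvector in a column of $vectors.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2)   # a small symmetric matrix
e <- eigen(A)

e$values       # eigenvalues, largest first
e$vectors      # eigenvectors, one per column

# Check A v = lambda v for the first eigenpair
v1 <- e$vectors[, 1]
A %*% v1 - e$values[1] * v1      # should be numerically zero
```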
Summary
R is a programming environment with many standard programming structures already included.
Creating functions is easy.
There is no official technical support.
Users can create their own library of functions.
Summary

All of the R code and files can be found at:
www.people.vcu.edu/~elboone2/CSS.htm