Stochastic Models
Introduction to R
Walt Pohl
Universität Zürich
Department of Business Administration
February 28, 2013
What is R?
R is a freely available, general-purpose statistical package,
developed by a team of volunteers on the Internet.
It is widely used among statisticians, and new
statistical techniques are frequently implemented in R first.
It is less widely used by economists, who tend to prefer
commercial statistical packages or Matlab.
Walt Pohl (UZH QBA)
Stochastic Models
February 28, 2013
R versus Excel
R has many more probability and statistical
functions, built in or available in free packages.
R is command-driven. You enter a sequence of
commands to manipulate your data.
While everything in Excel is organized in cells, R has
many different data types: vectors, arrays, and
objects. You can also define your own.
Normally you will create a “.R” command file that
is separate from your data.
Note: Excel also has a separate command language –
VBA.
R versus Matlab
The real target audience for Matlab is engineers.
Matlab has many features useful for engineers but
not useful for us.
The target application for R is statistics. R has
many more statistical functions than Matlab.
Matlab started as a package for manipulating
matrices, and added other features later.
Non-matrix-based operations are awkward.
R was designed for general-purpose programming
from the beginning.
R versus Other Statistics Programs
R is free.
R is more command-driven and less GUI-driven.
R is very close to S-Plus.
R supports as broad an array of operations as any
other statistics program.
R’s programming language is better-designed than
most of its competitors.
Since different packages are written by different
volunteers, R is not as uniform as some other
systems.
Important URLs
R home page – http://www.r-project.org/
Closest R mirror site – http://stat.ethz.ch/CRAN/
R tutorial – http://cran.r-project.org/doc/manuals/R-intro.html
Monte Carlo Simulation in R
R has many built-in probability distributions. For each
supported distribution XXX, R comes with four functions:
dXXX – density function
pXXX – cumulative distribution function
qXXX – quantile function (inverse of the CDF)
rXXX – random draws
XXX = unif, norm, chisq, t, etc.
Example: For the normal distribution, we have dnorm,
pnorm, qnorm, rnorm.
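A short sketch of the four normal-distribution functions, followed by a tiny Monte Carlo estimate of a tail probability. The seed, the sample size of 100,000, and the threshold 2 are arbitrary choices for illustration; the simulated estimate will be close to, but not exactly equal to, the exact value from pnorm.

```r
# The four functions for the standard normal distribution (mean 0, sd 1)
d0   <- dnorm(0)       # density at 0, equal to 1/sqrt(2*pi), about 0.3989
p196 <- pnorm(1.96)    # P(Z <= 1.96), about 0.975
q975 <- qnorm(0.975)   # quantile function, the inverse of pnorm: about 1.96
set.seed(42)           # make the random draws reproducible
z <- rnorm(100000)     # 100,000 independent standard normal draws

# Monte Carlo estimate of P(Z > 2): the fraction of draws above 2
mc_estimate <- mean(z > 2)
exact <- 1 - pnorm(2)  # about 0.0228
print(c(monte_carlo = mc_estimate, exact = exact))
```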
Vectors in R
For us, the basic R datatype is a vector of numbers.
The c function creates vectors:
Example: If you type c(1, 3, 4.5), R returns the vector
(1, 3, 4.5).
You can assign vectors to variables using the <-
operator:
x <- c(1, 3, 4.5)
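A minimal sketch of creating and assigning vectors. The names y and s are purely illustrative; the sketch also shows that c() concatenates existing vectors and that the colon operator builds integer sequences.

```r
x <- c(1, 3, 4.5)   # a numeric vector of length 3
y <- c(x, 10)       # c() also concatenates vectors: 1, 3, 4.5, 10
s <- 1:5            # the colon operator builds a sequence: 1 2 3 4 5
print(x)
print(y)
print(length(y))    # 4
```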
Vectors in R, cont’d
You can get the value of individual entries by using the []
operator.
x[3] will return 4.5.
You can also get subvectors by using ranges.
x[1:2] will return the vector c(1, 3).
The length function lets you refer to the end of the vector in a
range:
x[2:length(x)] will return the vector c(3, 4.5).
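The indexing rules above, as a runnable sketch. Two extras not on the slide are included: a negative index drops an element, and [] can also appear on the left of <- to change an entry.

```r
x <- c(1, 3, 4.5)
third         <- x[3]            # 4.5 (R indexes from 1, not 0)
first_two     <- x[1:2]          # c(1, 3)
rest          <- x[2:length(x)]  # c(3, 4.5)
without_first <- x[-1]           # negative index drops element 1: c(3, 4.5)
x[2] <- 7                        # replace the second entry
print(x)                         # 1.0 7.0 4.5
```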
Operations on Vectors
Where possible, any operation on vectors will be applied
elementwise.
So if x and y are two vectors of the same length, then z <- x * y is the
vector with z[i] = x[i] * y[i].
Likewise, log(x) is the vector whose i-th entry is
log(x[i]), etc.
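A small sketch of elementwise arithmetic. It also shows that a single number is recycled across a vector, and that * is not a dot product; if you want one, sum() of the elementwise product gives it.

```r
x <- c(1, 2, 3)
y <- c(4, 5, 6)
z   <- x * y        # elementwise product: 4 10 18
w   <- x + 10       # the single number 10 is recycled: 11 12 13
l   <- log(x)       # natural log of each entry
dot <- sum(x * y)   # dot product: 4 + 10 + 18 = 32
print(z)
print(dot)
```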
Sample Statistics
R has built-in functions for the usual sample statistics:
mean(x) – mean of vector x
var(x) – sample variance of vector x
sd(x) – sample standard deviation of vector x
quantile(x, q) – the q-th quantile of vector x, for a probability q between 0 and 1 (q = 0.5 gives the median).
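A sketch on a small made-up vector, chosen so the results are easy to check by hand. Note that var and sd divide by n - 1, and that quantile uses R's default interpolation rule (type 7), so other software may report slightly different quartiles.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)     # made-up data for illustration
m   <- mean(x)                      # 40 / 8 = 5
v   <- var(x)                       # squared deviations sum to 32, so 32 / 7
s   <- sd(x)                        # sqrt(32 / 7)
med <- quantile(x, 0.5)             # median: (4 + 5) / 2 = 4.5
qs  <- quantile(x, c(0.25, 0.75))   # several quantiles at once: 4 and 5.5
print(c(mean = m, var = v, sd = s))
print(qs)
```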
Reading Data
The easiest way to import data into R is through CSV
files. Excel can export files in this format.
The function read.csv reads a CSV file.
Example: apple <- read.csv("apple.csv") imports the
file named "apple.csv" into the variable apple. The data
is returned in the form of a data frame.
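Since apple.csv is not available here, this sketch writes a tiny hypothetical CSV with the same column names to a temporary file and reads it back with read.csv. The numbers and the DATE format are made up for illustration only.

```r
# Hypothetical stand-in for apple.csv; the values are invented, not real data
path <- tempfile(fileext = ".csv")
writeLines(c("DATE,RET,VWRETD,rf",
             "20120131,0.127,0.050,0.0000",
             "20120229,0.188,0.043,0.0000"), path)

df <- read.csv(path)   # the first line becomes the column names (header = TRUE)
str(df)                # a data frame with 2 rows and 4 columns
print(nrow(df))
```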
Data frames
A data frame is a named list of equal-length vectors, one per column. In the case of
"apple.csv", we get four entries on the list:
DATE – end date of month.
RET – monthly return on Apple stock.
VWRETD – monthly return on CRSP
value-weighted index.
rf – monthly risk-free rate.
You access an individual column (a vector) by using $. Example: apple$RET.
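A sketch of a data frame built by hand with hypothetical numbers (not real Apple data). It shows $ access, applying a vector function to a column, adding a column, and that a data frame really is a list underneath.

```r
returns <- data.frame(
  RET    = c(0.05, -0.02, 0.03),   # hypothetical monthly returns
  VWRETD = c(0.02, -0.01, 0.01)    # hypothetical market returns
)
print(returns$RET)                  # the RET column as an ordinary vector
print(mean(returns$RET))            # vector functions work on columns: about 0.02
returns$excess <- returns$RET - returns$VWRETD   # add a new column
print(names(returns))               # "RET" "VWRETD" "excess"
```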
Regression
R has a very easy-to-use interface for regression: the lm
function. For example, to fit the CAPM for Apple, we
would use
lm(RET ~ VWRETD, data=apple)
The first argument uses the tilde operator to indicate
that we want to regress RET on VWRETD.
The second argument indicates that the data come
from the apple data frame.
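Because the Apple data are not included here, this sketch simulates a hypothetical data frame with the same column names. The true intercept 0.01 and slope 1.3 are arbitrary choices; lm should recover them only approximately, since the simulated returns contain noise.

```r
set.seed(1)
n <- 120                                           # hypothetical 10 years of months
VWRETD <- rnorm(n, mean = 0.008, sd = 0.045)       # simulated market returns
RET <- 0.01 + 1.3 * VWRETD + rnorm(n, sd = 0.06)   # true alpha 0.01, beta 1.3
sim <- data.frame(RET = RET, VWRETD = VWRETD)

fit <- lm(RET ~ VWRETD, data = sim)
print(coef(fit))   # estimated intercept (alpha) and slope (beta)
```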
Regression cont’d
Printing the result of lm shows only the coefficients. To get more
detail, including standard errors and t statistics, use
summary(lm(RET ~ VWRETD, data=apple))
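A sketch, again on simulated hypothetical data, of pulling individual numbers out of the summary. coef() applied to a summary returns the coefficient table as a matrix, so a single t statistic can be read off by row and column name.

```r
set.seed(1)
VWRETD <- rnorm(120, mean = 0.008, sd = 0.045)        # simulated, not real data
RET <- 0.01 + 1.3 * VWRETD + rnorm(120, sd = 0.06)
fit <- lm(RET ~ VWRETD, data = data.frame(RET, VWRETD))

s <- summary(fit)
print(s)                         # estimates, std. errors, t stats, p-values, R-squared
ct <- coef(s)                    # coefficient table as a matrix
print(ct["VWRETD", "t value"])   # t statistic for the slope
print(s$r.squared)
```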
Built-In Mathematical Functions
R has various built-in mathematical functions:
exp(x) – $e^x$.
log(x) – natural logarithm, $\log x$. (Use log(x, b) for $\log_b x$.)
x^y – $x^y$.
sqrt(x) – $\sqrt{x}$.
Note that these all work on vectors: exp(c(1, 2)) gives you
c(2.718282, 7.389056).
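A quick sketch of these functions, including a logarithm with an explicit base. The values were chosen so the answers are easy to check by hand.

```r
e1   <- exp(1)              # e, about 2.718282
l2   <- log(exp(2))         # the natural log undoes exp: 2
l10  <- log(100, 10)        # base-10 logarithm: 2
lb2  <- log(8, base = 2)    # base-2 logarithm: 3
pw   <- 2^10                # 1024
rts  <- sqrt(c(4, 9, 16))   # works elementwise on vectors: 2 3 4
print(c(e1, l2, l10, lb2, pw))
print(rts)
```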
Special Mathematical Values
Floating-point arithmetic supports some special values:
1/0 = Inf.
-1/0 = -Inf.
0/0 = NaN ("not a number").
Mathematical operations are defined for these special
values. For example, Inf + Inf = Inf, and Inf - Inf = NaN.
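A sketch of how these values arise and how to test for them. A comparison involving NaN gives NA rather than TRUE or FALSE, which is why R provides is.nan, is.infinite, and is.finite.

```r
x <- c(1, -1, 0) / 0       # Inf -Inf NaN
print(x)
print(is.infinite(x))      # TRUE  TRUE FALSE
print(is.nan(x))           # FALSE FALSE  TRUE
print(is.finite(x))        # FALSE FALSE FALSE
print(Inf - Inf)           # NaN
print(exp(-Inf))           # 0
print(log(0))              # -Inf
print(NaN == NaN)          # NA: comparisons with NaN are neither TRUE nor FALSE
```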
Defining Your Own Functions
You can define a function by using R’s function
command:
f <- function(x) x^2
This creates a function that squares its argument, and
assigns it to the variable f. Calling f(2) in R will return 4.
Functions can take vector arguments. So f(c(1, 2)) will
return c(1, 4).
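A sketch extending the slide's example with a multi-statement body and a default argument. The helper name excess_return is hypothetical, made up for this illustration; the value of the last expression in the braces is what the function returns.

```r
f <- function(x) x^2
print(f(2))          # 4
print(f(c(1, 2)))    # 1 4

# Hypothetical helper: subtract a risk-free rate, which defaults to 0
excess_return <- function(ret, rf = 0) {
  out <- ret - rf    # vectorized subtraction
  out                # last expression is the return value
}
print(excess_return(c(0.05, 0.02), rf = 0.01))   # about 0.04 0.01
print(excess_return(0.05))                       # uses rf = 0: 0.05
```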
Matrices
R also supports matrices. Use matrix(0, nrow=m,
ncol=n) to create an m-by-n matrix of zeros. For example
g <- matrix(0, nrow = 3, ncol = 4)
To access the element in the i-th row and j-th column,
use [] with two indices. For example
g[1, 2] <- 3
assigns 3 to $g_{1,2}$.
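A sketch of basic matrix work. Besides the slide's zero matrix and element assignment, it shows selecting a whole row, that matrix() fills column by column by default, and that true matrix multiplication uses %*% (plain * is elementwise).

```r
g <- matrix(0, nrow = 3, ncol = 4)   # 3-by-4 matrix of zeros
g[1, 2] <- 3                         # row 1, column 2
print(dim(g))                        # 3 4
print(g[1, ])                        # whole first row: 0 3 0 0

m <- matrix(1:6, nrow = 2)           # filled column by column: rows 1 3 5 and 2 4 6
mm <- m %*% t(m)                     # 2x3 times 3x2 gives a 2x2 matrix
print(mm)                            # 35 44 / 44 56
```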
Logical Operations
R has the following basic logical operations.
==: equal
!=: not equal
<, >: less than, greater than
<=, >=: less than or equal, greater than or equal
They evaluate to TRUE or FALSE.
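A sketch of the comparison operators. It also uses &, |, and ! ("and", "or", "not") to combine results; these go slightly beyond the slide but are standard R operators.

```r
a <- 3 == 3              # TRUE
b <- 3 != 4              # TRUE
c1 <- 2 < 1              # FALSE
d <- 5 >= 5              # TRUE
both   <- (2 < 3) & (3 < 4)   # "and": TRUE
either <- (2 > 3) | (3 < 4)   # "or": TRUE
negated <- !TRUE              # "not": FALSE
print(c(a, b, c1, d, both, either, negated))
```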
Logical Operations on Vectors
Logical operations work on vector arguments, and return
a vector of TRUE or FALSE values.
Example: 1:10 > 5.
You can use the functions any or all to see if any or all of
the entries in the vector are TRUE.
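A sketch of logical vectors with any and all. Because TRUE counts as 1 and FALSE as 0 in arithmetic, sum counts the TRUE entries and mean gives their fraction; the data in the last line are made up.

```r
v <- 1:10 > 5                 # FALSE five times, then TRUE five times
print(v)
print(any(v))                 # TRUE: at least one entry is TRUE
print(all(v))                 # FALSE: not every entry is TRUE
print(sum(v))                 # 5 entries are TRUE
frac <- mean(c(2, 7, 9) > 5)  # fraction of entries above 5: 2/3
print(frac)
```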
Control Structures
R supports the standard control structures found in most
programming languages:
Branching: if
Definite iteration: for
Indefinite iteration: while
Control Structures: If
A statement like "if (test) code1 else code2" executes
code1 if the test is TRUE, and code2 if the test is FALSE.
(The "else code2" part can be omitted, in which case nothing happens when the test is FALSE.)
Example: if (0 == 0) print("is zero") else print("is not zero")
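A sketch of a multi-branch if inside a function; the name sign_label is hypothetical. The test in if must be a single TRUE or FALSE (a longer condition is an error in recent versions of R), so for a whole vector the sketch uses ifelse instead.

```r
# Hypothetical helper: classify a single number
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {      # keep "} else" on the same line as the closing brace
    "negative"
  } else {
    "zero"
  }
}
print(sign_label(3))     # "positive"
print(sign_label(0))     # "zero"

# ifelse works elementwise on a vector test
labels <- ifelse(c(-2, 0, 5) > 0, "positive", "not positive")
print(labels)
```

Placing "} else" on one line matters when such code is typed at the top level: R treats a line ending in "}" as a complete statement, so an else starting the next line would be a syntax error.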
Control Structures: For
For allows you to do something a fixed number of times:
Example: for (i in 1:10) print(i);
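A sketch of a for loop that accumulates a running total, plus a loop over the elements of an arbitrary vector (the returns are made up). For a simple sum like this, the vectorized sum(1:10) gives the same result.

```r
total <- 0
for (i in 1:10) {
  total <- total + i     # running sum 1 + 2 + ... + 10
}
print(total)             # 55

# The loop variable can run over any vector, not just 1:n
for (r in c(0.05, -0.02)) print(r * 100)   # 5 then -2
```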
Control Structures: While
While allows you to repeat something as long as a condition
stays TRUE. (If the condition never becomes FALSE, the loop runs forever.)
Example:
i <- 10
while (i > 0) {
  print(i)
  i <- i - 1
}
(Notice the use of braces here. This is because the body
of the while loop contains multiple statements.)
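A sketch of a case where while fits better than for, because the number of iterations is not known in advance: counting the years until an investment doubles at a hypothetical 5% annual interest rate.

```r
balance <- 1
years <- 0
while (balance < 2) {      # keep going as long as the balance is below double
  balance <- balance * 1.05
  years <- years + 1
}
print(years)               # 15, since 1.05^14 < 2 <= 1.05^15
```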
Writing Fast R Code
Vectorized operations in R are much faster than explicit loops.
Example: x <- (1:1000)^2
is faster than
x <- rep(0, 1000)  # create a vector of 1000 zeros
for (i in 1:1000) {
  x[i] <- i^2
}
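A sketch that times both versions with system.time and checks they give identical results. The size 1e6 is an arbitrary choice to make the gap visible; the exact timings depend on the machine and R version, so none are quoted here.

```r
n <- 1e6
t_vec <- system.time(x_vec <- (1:n)^2)   # vectorized version
t_loop <- system.time({
  x_loop <- rep(0, n)                    # preallocate a vector of zeros
  for (i in 1:n) x_loop[i] <- i^2        # explicit loop
})
print(identical(x_vec, x_loop))          # same result either way
print(c(vectorized = t_vec[["elapsed"]], loop = t_loop[["elapsed"]]))
```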