January 31 - Welcome to Mathematics 243
Each class period, you will receive a one-page “outline” such as this one that outlines the topics of the
day and also includes the homework assignment. For today, first a note from your instructor:
I will not be able to be in class on Monday, January 31, or Tuesday, February 1. I am serving
on a panel at the National Science Foundation to review grant proposals. Normally, I
do not cancel classes during the regular semester – your time is precious. However this
particular program at the NSF has been so important to Calvin in providing funds for
curricular improvement that I thought it important that Calvin contribute to doing the
work of evaluating grant proposals. The NSF makes funding decisions almost entirely on
the basis of peer review. So I hope that you will excuse my absence (and read my policy
on your skipping class!) and I hope that you will be ready to begin class on Thursday,
February 3. So that you aren’t bored until then, there is a homework assignment listed
below. Homework is due Friday, February 4. I can answer any questions that arise about
the homework on Thursday.
Michael Stob
Handouts that you should have in this packet
1. A one-page sheet, “syllabus” with crucial data about the course on the front and a calendar on
the reverse
2. A one-page sheet entitled “Important Information”
3. The first three sections of simpleR, the notes for the statistical computer program, R, that we will
use this semester (I have provided the first few sections but you can download the whole manual
from the internet so I will not copy any more of it.)
Homework, Due Friday, February 4
1. Read the “syllabus” and the “important information.” Be sure to note any questions that you
have about these policies so that you can ask them in class on Thursday.
2. Find the course website: www.calvin.edu/∼stob/M243. Explore the website. In particular,
read the information on the website about the computer package R.
3. Read sections 1.1 and 1.2 of the textbook. Pay particular attention to the definitions of terms that
are in boldface in the text. Do problems 1.4,6,14,15,16. Note that problems are numbered
consecutively within chapters, so problem 1.4 is problem number 4 of Chapter 1. It happens
to be on page 20. All data for problems are available on the CD that came with your textbook.
(Stem-and-leaf plots are a tool suited to hand analysis but histograms should normally be drawn
with computer software. For this assignment, you may draw the histograms by hand but we will
soon learn how to read homework data into R and draw the histograms with the software.)
4. Read the sections 1 and 2 of simpleR. Do problems 2.1–2.6. You will need to use R to do this.
Either find R in a computer lab or download it to your computer. You will want to download
it if at all possible since you will be using R for most assignments in this class. (If you wish to
install it on your computer but you have only a slow internet connection, I will give you a CD
with a copy to install on Thursday. Currently, the Macintosh lab in North Hall Basement and the
Engineering labs have R installed.)
February 3 – Data
1. Statistics is the science of data.
(a) Three activities:
i. collecting data
ii. analyzing data
iii. making inferences from data
(b) Data are numbers in context.
(c) A variable is a function defined on a set of objects (usually numerical)
(d) Datasets consist of objects and variables.
2. Sampling from a population.
(a) A population is a (well-defined) set of objects.
(b) A sample is a subset of a population.
(c) A simple random sample of size k from a population of size n is a sample chosen by a
procedure for which any subset of size k has the same chance to be the sample chosen.
(d) Making inferences about the population from the sample.
(e) The R command sample(1:100,15,replace=F) chooses a random sample of size 15 from the set
of numbers {1, 2, . . . , 99, 100} (without replacement). (See the sketch below.)
3. Questions about the course.
4. Questions about homework.
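A minimal R sketch of the sampling command from item 2(e); the seed is an illustrative choice, not part of the outline.

# Draw a simple random sample of size 15 from the population {1, ..., 100}.
set.seed(1)                        # illustrative seed, for reproducibility only
sample(1:100, 15, replace = FALSE)
# Every subset of size 15 has the same chance of being the sample chosen,
# which is exactly the definition of a simple random sample.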
Homework, Due Tuesday, February 8
1. Read the supplementary notes Sections 1 and 2.
2. Read Devore and Farnum, Section 4.2, pages 161–166.
3. Do problems 1.1,2.1-4 of the Supplementary Notes.
February 4 – Experiments
1. Experiments (versus observation)
2. Independent (treatment, factor) variables, dependent (response) variable
3. Statistics is about describing and explaining variation
4. Replication
5. Experimental error (variation when independent variable is fixed)
6. Randomization
7. Blocking
8. The DOE mantra:
Block what you know, randomize what you don’t.
Homework, Due Tuesday, February 8
1. Read Devore and Farnum, Section 4.3.
2. In a clinical test of cancer treatments, there is usually a control group and a treatment
group. A very simple example of such a study can be found at
http://www.stat.ucla.edu/cases/breast cancer/. Answer the questions that appear
at that website.
3. The main Calvin basketball court has two baskets with very different visual backgrounds.
Some people claim that it is more difficult to make free throws at the South basket than at
the North basket. Describe a design of an experiment to test this hypothesis. Think about
the roles of replication, randomization and blocking in this particular experiment.
February 7 – Distributions
1. Today all data are from continuous variables
2. Relative frequency histograms (area corresponds to proportion of data)
3. Distributions
4. Interpretation of distributions - approximation, model
5. The exponential family: f(x) = λ e^(−λx), x ≥ 0
6. The exponential distribution is sometimes used as a model for lifetime data, waiting times
7. R commands of note
(a) dexp(c,lambda) gives the value of the density at c
(b) pexp(c,lambda) gives the value of ∫_0^c λ e^(−λx) dx
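A short R check of these two commands; the rate λ = 2 and cutoff c = 1 are made-up values for illustration.

# Exponential density and its integral, with lambda = 2 and c = 1.
lambda <- 2
dexp(1, rate = lambda)      # density f(1) = lambda * exp(-lambda * 1)
pexp(1, rate = lambda)      # the integral of f from 0 to 1
# Confirm pexp numerically by integrating the density:
integrate(function(x) lambda * exp(-lambda * x), lower = 0, upper = 1)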
Homework, Due Friday, February 11
1. Read Devore and Farnum, Section 1.3
2. Do problems 19,20,22,23,24 in Section 1.3 of Devore and Farnum.
February 8 – More Continuous Distributions
1. Today all data are from continuous variables
2. The normal distribution

f(x) = (1/√(2πσ²)) e^(−((x − µ)/σ)²/2),   −∞ < x < ∞
(a) Qualitative properties: unimodal, symmetric, “bell-shaped”
(b) Sample applications
(c) R - dnorm(x,mu,sigma), pnorm(x,mu,sigma), qnorm(p,mu,sigma) (see the sketch below)
(d) The standard normal (Z) distribution
3. The uniform distribution dunif(x,a,b)
4. The Weibull distribution dweibull(x,alpha,beta)
5. The beta distribution dbeta(x,alpha,beta)
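A brief R illustration of the normal commands in item 2(c); the values µ = 10 and σ = 2 are invented for the example.

# Normal distribution with mean 10 and standard deviation 2.
dnorm(12, mean = 10, sd = 2)       # density at x = 12
pnorm(12, mean = 10, sd = 2)       # P(X <= 12)
qnorm(0.975, mean = 10, sd = 2)    # the x with P(X <= x) = 0.975
pnorm(1) - pnorm(-1)               # standard normal: P(-1 <= Z <= 1), about 0.68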
Homework, Due Friday, February 11
1. Read Devore and Farnum, Section 1.4 and “gaze” at Section 1.5.
2. Do problems 32,34,38,40 of Devore and Farnum Section 1.4 and problem 50 of Section 1.5
February 10 – Discrete Distributions
1. Discrete distributions.
2. Mass functions.
3. The Poisson distribution. (R pois)
4. The hypergeometric distribution. (R hyper) This is a three-parameter distribution with
parameters called m, n, k. The distribution arises from sampling k elements without
replacement from a population with m objects of one kind and n of the other. For each x,
the mass function p(x) is the proportion of samples that have exactly x objects of the first
kind. The mass function is

p(x) = (m choose x)(n choose k − x) / (m + n choose k),   x = 0, 1, . . . , min(k, m)

where (a choose b) = a!/(b!(a − b)!).
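A small R check of this mass function; the numbers m = 60, n = 40, k = 10 are illustrative (they also match the homework problem below).

# Hypergeometric: x objects of the first kind in a sample of k = 10,
# drawn without replacement from m = 60 of one kind and n = 40 of the other.
m <- 60; n <- 40; k <- 10
dhyper(6, m, n, k)                                   # p(6) from R
choose(m, 6) * choose(n, k - 6) / choose(m + n, k)   # p(6) from the formula
sum(dhyper(6:10, m, n, k))                           # P(at least 6 of the first kind)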
Homework
Read pp 29-30 and pp 51–53. Do the following problems:
1. Problems 56 of Section 1.6.
2. Suppose that a simple random sample of size 10 is chosen from a population of size 100.
Suppose 60 members of the population are female. How often will such a sample include
at least 6 females? (The hypergeometric distribution will be useful here.)
February 11 – Measures of center
1. mean (notation: x̄ is the mean of observations x1 , . . . , xn )
2. median (notation: x̃)
3. the median is resistant to the effect of outliers while the mean is not
4. trimmed means (notation: x̄p )
5. mean of a distribution (notation: µ)
6. median of a distribution (notation: µ̃)
7. mean and median of important distributions:

Distribution     parameters   mean           median
Exponential      λ            1/λ            ln 2/λ
Normal           µ, σ         µ              µ
Uniform          a, b         (a + b)/2      (a + b)/2
Weibull          α, β
Beta             α, β         α/(α + β)
Poisson          λ            λ
Hypergeometric   m, n, k      (km)/(m + n)
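A quick R simulation that checks the exponential row of this table; the rate λ = 1 and the number of draws are arbitrary.

# Exponential with rate 1: mean should be 1/lambda = 1, median ln(2)/lambda.
set.seed(1)
x <- rexp(100000, rate = 1)
mean(x)      # close to 1
median(x)    # close to log(2), about 0.693
log(2)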
Homework
1. Read Section 2.1.
2. You do not have to do the problem about the hypergeometric distribution included on the
outline of February 10.
3. Do problems 2.2,4,6,8 of Devore and Farnum.
4. The mean and median of the exponential distribution.
(a) Verify the entries for mean and median in the above table for the exponential distribution.
(b) According to the above table, which number is the greater of the median and the
mean of the exponential distribution? Give an explanation in terms of the shape of
the exponential distribution as to why this should be so.
5. Important properties of the mean.
(a) Show that the mean of x1, x2, . . . , xn is the unique number c such that Σ_{i=1}^n (xi − c) = 0.
(b) Show that the mean of x1, . . . , xn is the value of c that minimizes Σ_{i=1}^n (xi − c)².
February 14 – Boxplots and percentiles
1. statistics is concerned with describing and explaining variation
2. quartiles and interquartile range (IQR)
3. five number summary
4. boxplots (box-and-whiskers)
5. percentiles
6. quartiles and percentiles of continuous distributions
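A minimal R sketch of these summaries; the data vector is invented for the example.

# Five-number summary, quartiles, IQR, and a boxplot of a made-up sample.
x <- c(2, 4, 4, 5, 7, 8, 9, 11, 12, 25)
fivenum(x)      # min, lower hinge, median, upper hinge, max
quantile(x)     # 0%, 25%, 50%, 75%, 100% percentiles
IQR(x)          # interquartile range
boxplot(x)      # the value 25 shows up as a potential outlier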
Homework, Due Friday, February 18
1. Read Section 2.3 of Devore and Farnum.
2. Do problems 2.32,34,36,38,42
3. Suppose a variable has an exponential distribution with parameter λ = 1.
(a) What are the lower quartile, upper quartile, and IQR of the distribution?
(b) What percentage of values of this variable would be classified as outliers if outliers are
defined as in the construction of the boxplot?
4. Suppose that x1 , . . . , xn are observations of a variable X and observations yi are defined
by yi = αxi + β where α and β are constants. How does the boxplot of the yi compare to
that of the xi ?
February 15 – Measures of spread
1. (“sample”) variance of a set of observations (Notation: s2 )
2. (“sample”) standard deviation (Notation: s)
3. Why n − 1 instead of n? (See the note at the end of this outline.)
4. important (obvious) identity (which we use with c = x̄ and c = µ):

Σ_{i=1}^n (xi − c)² = Σ_{i=1}^n xi² − 2c Σ_{i=1}^n xi + nc²
5. Use c = x̄:

Sxx = Σ_{i=1}^n (xi − x̄)² = . . . = Σ_{i=1}^n xi² − n x̄² = Σ_{i=1}^n xi² − (Σ_{i=1}^n xi)²/n
6. variance and standard deviation of distributions (Notation: σ², σ)

Distribution     mean          variance
Normal           µ             σ²
Exponential      1/λ           1/λ²
Beta             α/(α + β)     αβ/((α + β + 1)(α + β)²)
Uniform          (a + b)/2     (b − a)²/12
Poisson          λ             λ
Hypergeometric   km/(m + n)    kmn(m + n − k)/((m + n)²(m + n − 1))
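The note promised in item 3, as a short R check: R's var() divides by n − 1, which a direct computation confirms. The data vector is arbitrary.

# var() uses the n - 1 denominator (the "sample" variance).
x <- c(3, 7, 7, 19)
n <- length(x)
sum((x - mean(x))^2) / (n - 1)    # matches var(x)
var(x)
sd(x)                             # the square root of var(x)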
Homework
1. Read Section 2.2 of Devore and Farnum.
2. Do problems 2.15,16,22,24,30
3. Show that “the average of the squares is greater than the square of the average” unless the
data are constant. Formally, let yi = xi² for each i. Then show ȳ > x̄² unless x1 = . . . = xn.
(Hint: the above expression for Sxx can be employed to great effect here.)
February 17 – Randomness
1. Random “experiments.” Essential features:
(a) more than one possible outcome,
(b) the experiment is repeatable under (more or less) identical conditions,
(c) the outcome that will obtain under any given repetition is uncertain.
2. Two canonical examples related to data collection.
(a) A random sample of size m is to be chosen from a finite population of size n,
(b) A number n of units are to be assigned at random to k experimental treatments.
3. Crucial terminology. Fix an experiment E.
Definition 1. An outcome is one of the possible (atomic) results of the experiment.
Definition 2. The sample space is the set of possible outcomes.
Definition 3. An event is any subset of the sample space.
4. Example: for the first canonical example. Suppose that a random sample of size 30 is taken from the
population of all Calvin senior students. Any specific collection of seniors of size 30 is an outcome. (There
are boatloads of these.) Some example events include: the set of samples that have exactly 15 males and
15 females; the set of samples that have exactly 10 students with GPA greater than 3.00; the set of samples
that have 30 varsity basketball players (there aren’t too many of these!).
5. A probability function is a rule that assigns to each event A of the experiment a real number, P (A) such
that 0 ≤ P (A) ≤ 1.
6. The probability P(A) is supposed to “predict” the long-run relative frequency of the occurrence of the event
A if the experiment is repeated many times. In other words:

P(A) ≈ (number of times A occurs) / (number of times experiment is repeated)
with the approximation improving as the number of times the experiment is repeated increases.
7. Important:
Probability statements are about what may occur in the long run and are statements, before
the fact, about what the result of the process of doing the experiment might be. They are not
statements about what has already occurred. What has already occurred is certain!
Homework, Due Tuesday, February 22
1. Read Devore and Farnum, Section 5.1.
2. Do problems 5.1,2,3,4,5 of Devore and Farnum.
3. Suppose that two numbers are to be chosen at random (without replacement) from the numbers 1, . . . , 10.
(a) Make a systematic list of all the outcomes of this random experiment. How many are there?
(b) Let A be the event such that the sum of the two numbers chosen is even. List the outcomes in A.
(c) For the event A in part (b), what is a reasonable number to define P (A) to be? Why?
(d) Now suppose that the two numbers are to be chosen at random with replacement. How many outcomes
are there (you need not list them)? What is a reasonable number to assign P(A) in this case?
February 18 – Probability
1. Goal: given an experiment E, assign probabilities P (A) to all events A so that P (A) is the “limiting relative
frequency” of the event if the experiment is repeated.
2. Helpful language of events. Use the language of sets.
(a) A ∪ B (or A or B) is the event that occurs if any outcome in A or B (or both) occurs
(b) A ∩ B or (A and B) is the event that occurs if any outcome that is in both A and B occurs
(c) A′ (or the complement of A) is the event that occurs if an outcome that is not in A occurs
3. Axioms for probability theory
Axiom 1. For every event A, 0 ≤ P (A) ≤ 1.
Axiom 2. If S is the sample space, P (S) = 1.
Axiom 3. If A1, A2, . . . are mutually exclusive events (i.e., Ai ∩ Aj = ∅ for all i ≠ j), then
P(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
4. Simple case: if the sample space has finitely many outcomes o1 , . . . , ok that are equally likely, then assign
P ({oi }) = 1/k for each outcome oi .
5. Example: throw two dice once.
6. Example of simple case: sample m objects without replacement from a set of n objects. The number of
equally likely outcomes is (n choose m).
7. An exhaustive (and exhausting) analysis of the example: sample 5 stones from a population consisting of
34 yellow, 25 blue, 4 green (63 total).
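A hedged R sketch of the stones example using choose() and dhyper(); counting yellow stones and lumping the other colors together is one convenient way to slice it.

# Sample 5 stones from 63 total (34 yellow, 25 blue, 4 green).
choose(63, 5)        # the number of equally likely outcomes
# Probability of exactly 2 yellow stones, treating blue and green
# together as the 29 "non-yellow" stones:
dhyper(2, 34, 29, 5)
choose(34, 2) * choose(29, 3) / choose(63, 5)   # the same value by counting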
Homework, Due Tuesday, February 22
Read Devore and Farnum Section 5.2.
1. Assume that A and B are events and that P (A), P (B) and P (A ∩ B) are given. Find formulas for the
following events in terms of these three numbers (hint: use the third axiom above, drawing a Venn diagram
may be helpful):
(a) Exactly one of A or B
(b) neither A nor B
(c) at least one of A or B
(d) A and not B.
2. Consider the case of the 63 stones but the experiment of choosing 10 stones instead of 5.
(a) How many different equally likely outcomes are there in the sample space?
(b) For each number k = 0, 1, . . . , 10, what is the probability of choosing k yellow stones in the sample of
10?
(c) In the case of sampling 5 stones from 63 stones, how many different outcomes are there if we replace
each stone before selecting the next?
February 21 – Conditional Probability
1. Conditional probability. (The case of partial information.)
2. Notation: P (A|B) read “probability of event A given B”
3. Definition: P(A|B) = P(A ∩ B)/P(B)
4. examples: two dice, the random senior
5. Be careful: P (A|B) is not necessarily equal to P (B|A).
6. Independence.
(a) the intuitive content.
(b) the formal definition: A and B are independent if P (A ∩ B) = P (A)P (B).
7. The difference between sampling with and without replacement amounts to independence.
Homework, Due Friday, 25
1. Read Devore and Farnum Section 5.3.
2. Do problems 5.18,19,20,22 in Devore and Farnum.
February 22 – More on conditional probability
1. Example: AIDS tests are “99.5% accurate.”
2. Compound experiments.
3. Repeated trials of an experiment with two possible outcomes: “success” and “failure.”
(a) The binomial mass function: n independent trials; p probability of “success.” The probability
of k successes, p(k), is given by

p(k) = (n choose k) p^k (1 − p)^(n−k),   0 ≤ k ≤ n
(b) R commands: dbinom, pbinom
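A short R check of the binomial mass function; n = 10 trials with p = 0.7 are illustrative values (0.7 matches the free-throw problem below).

# Binomial: 10 independent trials, success probability 0.7.
dbinom(7, size = 10, prob = 0.7)    # P(exactly 7 successes)
choose(10, 7) * 0.7^7 * 0.3^3       # the same value from the formula
pbinom(7, size = 10, prob = 0.7)    # P(at most 7 successes)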
Homework, Due Friday, 25
Read Devore and Farnum Section 1.6, pages 48–51.
1. Do problems 1.52 and 1.55 (note that these are in Chapter 1!)
2. Consider the following experiment. There are three cards: one is white on both sides, one is
red on both sides, and one is white on one side and red on the other. The experiment consists
of choosing a card at random and laying it on the table with a randomly chosen side face up
(i.e., each card is equally likely and each side of the card chosen is equally likely). What is
the probability that, given that a red side is facing up, the face-down side is also red?
3. In basketball, a “one-and-one” is executed as follows. A player shoots a free throw. If the
player makes the free throw, one point is awarded and the player is allowed to shoot one more
free throw for an additional point. If a player misses the first free throw, then the player does
not shoot the second one. Thus a player can score either 0,1, or 2 points on a one-and-one.
Suppose that a player has a probability of 0.7 of making any free throw.
(a) What is the probability that the player scores 0 points? 1 point? 2 points?
(b) In general, if the player is allowed to continue shooting until she misses, what is the
probability that the player scores k points for any k?
February 24 – Random variables and distributions
1. Given an experiment E with sample space S, a random variable is a (real-valued) function
defined on S. Use uppercase letters for random variables and lowercase letters for their values.
2. Examples of random variables.
3. Events correspond to possible sets of values of X.
4. A random variable has a distribution given by a density function (or a mass function).
5. Probability distributions as models for probabilities.
6. Many examples.
7. Interpretation of mean and variance of a probability distribution.
Homework, Due Friday, March 4
1. Read Devore and Farnum, Section 5.4, 212-217.
2. You might find it useful to obtain (from the website) and read Section 6 of SimpleR.
3. Do problems 5.30, 5.32, 5.35.
February 25 – Properties of Distributions of Random Variables
1. Using R with distributions. (Generating random numbers: rnorm)
2. Suppose that the random variable X has a certain distribution (given by a density function
f(x) or a mass function p(x)). Then the mean of the distribution, µ, is often denoted by µX
and is computed by

µX = Σ_x x p(x)   or   µX = ∫_{−∞}^∞ x f(x) dx
3. Interpretation: theoretical long-range average of X as the experiment is repeated a lot.
4. Making new random variables from old: given a real-valued function g, define a new random
variable Y = g(X) by applying g to the value of X on any outcome.
5. Our favorite functions: X², X^n, e^X, log X.
6. We could, in principle, compute a mass function or density function for g(X) given one for X.
7. Examples:
8. Computing µY where Y = g(X):

µ_{g(X)} = Σ_x g(x) p(x)   or   µ_{g(X)} = ∫_{−∞}^∞ g(x) f(x) dx

9. Note that σ²X is simply a special case where Y = (X − µ)².
10. Alternate useful notation for µg(X) is E(g(X)) which is read “Expected value of g(X)”
11. Properties:
(a) E(cX) = cE(X) for any constant c.
(b) E(X + a) = E(X) + a for any constant a.
Homework, Due Friday, March 4
1. Use R to generate 100 random samples of size 100 from an exponential distribution with
parameter 1. Find the mean of each of the 100 samples. Compute a five-number summary
of the set of the 100 means.
2. Suppose that X is a normal random variable with µ = 0 and σ 2 = 1. Let Y = eX .
(a) Compute P(0 ≤ Y ≤ 1). (Hint: First find c such that Y ≤ 1 if and only if X < c.)
(b) Compute P (0 ≤ Y ≤ 3).
3. Here’s a silly experiment for you. It has two steps: first, a four-sided die (with the sides
labelled 1,2,3,4) is thrown and the number c that appears is noted. Then c coins are thrown
and the number X of heads is recorded.
(a) What is the expected number of coins thrown?
(b) What is the mass function of X?
(c) What is the mean of X?
(d) In what way are your answers to parts (a) and (c) consistent?
February 28 – Distributions of two random variables
1. Experiments with more than one associated random variable. Examples.
2. Two discrete variables:
(a) Example: throw 2 dice and record smaller number S and larger number L.
(b) Two discrete variables X and Y have a joint mass function p(x, y).
(c) From the joint mass function we can compute individual mass functions, means, etc.
(d) The special case of independence - p(x, y) = pX (x)pY (y).
3. Two continuous variables:
(a) Example: Math SAT score M and Verbal SAT score V .
(b) Two continuous variables X and Y have a joint density function f (x, y).
(c) From the joint density function we can compute individual density functions, means, etc.
(d) The special case of independence - f (x, y) = fX (x)fY (y).
(e) Another special case, the bivariate normal distribution.
4. Extend all of the above to more variables and to the mixture of discrete and continuous
variables.
Homework, Due Friday, March 4
1. Read Devore and Farnum pages 218–220 and pages 146–149.
2. Do problem 3.41.
3. Suppose that X and Y are continuous random variables with f (x, y) = 2, 0 ≤ x ≤ y ≤ 1.
Find P (1/2 ≤ X). (Hint: Draw a picture.)
March 3 — Sampling Distributions
1. The two examples for the day:
(a) How many raisins in a box?
(b) Pick a senior at random and report her GPA.
2. The setting for the remainder of the course:
(a) We have a random experiment with associated random variable X. We call X the
population random variable.
(b) The distribution of X is not completely known. Often we make assumptions about its
shape (e.g., X has a normal distribution)
(c) We repeat the experiment n times, independently. The random variable for the ith
repetition is called Xi.
(d) The variables X1 , X2 , . . . , Xn are independent and identically distributed (i.i.d.). Such
a collection of random variables is called a random sample from the population X.
(e) The values of these variables x1 , x2 , . . . , xn are the data
(f) We want to use the data to make inferences about X.
(g) We compute statistics: a statistic is a function of the sample. For example X̄ is a
function of the n random variables X1 , . . . , Xn . Other examples S 2 , S, IQR, X̃.
(h) A statistic also has a distribution that is also not completely known (but it is known in
terms of the distribution of X).
3. $64,000 question.
What does the value of the statistic tell us about the population random variable?
4. For a while, we will work on the very simple (but very important) question:
What does X̄ tell us about µX ?
Homework, Due Tuesday, March 8
1. You do not need to do the homework problem that is on the outline from Monday.
2. Read Devore and Farnum, Section 5.5.
March 4 — Sampling Distributions
1. The setting for the remainder of the course:
(a) We have a random experiment with associated random variable X. We call X the population
random variable.
(b) The distribution of X is not completely known. Often we make assumptions about its shape (e.g.,
X has a normal distribution)
(c) We repeat the experiment n times, independently. The random variable for the ith repetition is
called Xi .
(d) The variables X1 , X2 , . . . , Xn are independent and identically distributed (i.i.d.). Such a collection
of random variables is called a random sample from the population X.
(e) The values of these variables x1 , x2 , . . . , xn are the data
(f) We want to use the data to make inferences about X.
(g) We compute statistics: a statistic is a function of the sample. For example X̄ is a function of
the n random variables X1 , . . . , Xn . Other examples S 2 , S, IQR, X̃.
(h) A statistic also has a distribution that is also not completely known (but it is known in terms of
the distribution of X).
2. $64,000 question.
What does the value of the statistic tell us about the population random variable?
3. For a while, we will work on the very simple (but very important) question:
What does X̄ tell us about µX ?
4. Key fact: as the sample size gets larger, the sample mean, X̄, gets better as an approximation to µX.
5. Empirical evidence.
Homework, Due Tuesday, March 8
1. Read Devore and Farnum, Section 5.5, again.
2. One problem but many parts. We are going to investigate the sampling distribution of the sample
mean of the beta distribution. Assume that X is a random variable that has a beta distribution with
α = 3 and β = 5. (Normally, of course, we do not know the distribution of X. That’s the whole
problem. But we can explore what happens when we know what the answer is to help us understand
what to do when we don’t know what the answer is.) The mean of this beta distribution, µX = 3/8.
(In general the mean of a beta distribution is α/(α + β).
(a) What is the IQR of this distribution? (Remember, using qbeta will help.)
(b) Generate 1000 random numbers from this beta distribution. Each of these 1000 numbers could
be considered a random sample of size 1. Save these 1000 numbers in a vector called one. Give a
five-number summary of these 1000 numbers. What is the IQR of these 1000 numbers? Compare
this number to your answer in part (a). (See the sketch after this problem for one way to start.)
(c) Now generate 1000 random samples of size 2 from this distribution and compute the sample mean
of each of those samples of size 2. Save these sample means in a vector called two. Give a
five-number summary of these 1000 sample means. What is the IQR of these 1000 numbers?
(d) Now generate 1000 random samples of size 5 from this distribution and again compute the sample
mean of each of those samples of size 5. Save these sample means in a vector called five. Again,
give a five-number summary of these 1000 sample means and compute the IQR.
(e) Finally, repeat the process for samples of size 10. (Call the vector ten.)
(f) Do a boxplot of the vectors one, two, five, ten on the same plot.
(g) On your boxplot, sketch what you think the boxplot would look like if we took samples of size 20.
Defend your sketch.
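A hedged R sketch of parts (b) and (c), as promised above; replicate() is one convenient way to do the repetition, not the only one.

# Part (b): 1000 random samples of size 1 from Beta(3, 5).
set.seed(1)
one <- rbeta(1000, 3, 5)
fivenum(one); IQR(one)
# Part (c): 1000 sample means of samples of size 2.
two <- replicate(1000, mean(rbeta(2, 3, 5)))
fivenum(two); IQR(two)
# Sizes 5 and 10 follow the same pattern; then: boxplot(one, two, five, ten)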
March 7 — Sampling Distribution of X̄
1. Setting:
(a) X is a random variable with unknown distribution.
(b) X1 , . . . , Xn is a random sample from X (independent and identically distributed random variables.)
(c) X̄ = (X1 + · · · + Xn)/n, the sample mean, is an estimate of µX.
2. Important facts about the distribution of X̄.
Theorem 1. If X is any distribution and X̄ is the sample mean of a sample of size n, then
(a) µX̄ = µX (sometimes written E(X̄) = E(X))
(b) σ²X̄ = σ²X/n (sometimes written Var(X̄) = Var(X)/n)
Theorem 2. If X is a normal distribution then the distribution of X̄ is also normal.
Theorem 3 (The Central Limit Theorem). If X is a distribution and X̄n is the sample mean of a
sample of size n, then the distribution of the standardized sample mean (X̄n − µX)/(σX/√n)
approaches the standard normal distribution as n goes to infinity.
3. The Central Limit Theorem says that if the sample size is “large enough”, then the distribution of the
sample mean is approximately normal. (See the simulation sketch below.)
4. The binomial distribution and its relation to the sample mean.
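The simulation sketch promised in item 3; the exponential population and the sample size n = 30 are arbitrary choices for illustration.

# Sample means of n = 30 draws from a skewed distribution look roughly normal.
set.seed(1)
means <- replicate(2000, mean(rexp(30, rate = 1)))
hist(means, breaks = 40)    # approximately bell-shaped, centered near 1
mean(means)                 # near mu = 1
sd(means)                   # near sigma/sqrt(n) = 1/sqrt(30)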
Homework, Due Friday, March 11
1. Read Devore and Farnum, Section 5.6.
2. Do problems 5.46,48,49,50,52
March 8 - Estimators
1. Setting:
(a) X is a random variable with unknown distribution.
(b) X1 , . . . , Xn is a random sample from X (independent and identically distributed random
variables.)
2. Problem. Want to estimate a parameter θ of the distribution of X.
3. Estimators and estimates.
Definition 1. An estimator of θ is a statistic θ̂ used to estimate θ.
Definition 2. An estimate of θ is the value of the estimator for a given sample.
Examples of estimators: X̄ is an estimator of µ, S² is an estimator of σ², S is an estimator
of σ.
4. Properties of estimators.
(a) unbiased:
Definition 3. An estimator θ̂ is unbiased if µθ̂ = θ.
Examples: X̄ is an unbiased estimator of µ and S² is an unbiased estimator of σ², but S
is a biased estimator of σ.
(b) small variance: would also like estimator to have small variance.
Important example: X̄ is the minimum variance unbiased estimator of µ!
(c) consistent: (estimator gets better as n gets larger)
Definition 4. An estimator θ̂n for θ that depends on the sample size n is consistent if
for every ε > 0, lim_{n→∞} P(|θ̂n − θ| > ε) = 0.
5. Important example - X̄ in disguise in the binomial distribution. If Y is a binomial random
variable with parameters n (known) and π (unknown), estimate π by y/n. Y/n has mean π
(is unbiased) and variance π(1 − π)/n.
Homework, Due Friday, March 11
1. Read Devore and Farnum, Section 7.1
2. Do problems 7.2,3,4,6 of Devore and Farnum.
March 10 - An introduction to confidence intervals
1. Setting:
(a) X is a random variable with unknown distribution.
(b) X1 , . . . , Xn is a random sample from X (independent and identically distributed random
variables.)
2. Key fact used: for large n, X̄ has a distribution that has mean µ, variance σ²/n, and that
is approximately normal. Therefore the following random variable is approximately normal
with mean 0 and standard deviation 1:

Z = (X̄ − µ)/(σ/√n)

3. Using Z and algebra we have

P(X̄ − 1.96 σ/√n < µ < X̄ + 1.96 σ/√n) ≈ .95

(The symbol ≈ means approximately equal and is there because of the central limit theorem. If
X is normal, then this probability statement is exact.)
4. But σ is not known. If n is large, use S to approximate σ. Then

P(X̄ − 1.96 S/√n < µ < X̄ + 1.96 S/√n) ≈ .95
There are now two approximations here, one using the CLT and the other using S to approximate σ.
5. The interval [x̄ − 1.96 s/√n, x̄ + 1.96 s/√n] is called a 95% confidence interval for µ. Important:
our confidence is not in the interval but in the procedure for producing the interval. Approximately
95% of the 95% confidence intervals that we produce will successfully capture µ.
6. Other confidence intervals: other percentages; one sided.
7. Work to do: What’s with all these approximations?
Homework, Due Tuesday, March 22
1. Read Devore and Farnum, Section 7.2
2. Do problems 7.8,9,10,12,14 of Devore and Farnum.
March 21 - Confidence intervals using t-distribution
1. Setting:
(a) X is a random variable with unknown distribution.
(b) X1 , . . . , Xn is a random sample from X (independent and identically distributed random
variables.)
2. Review. An approximate confidence interval for µ if n is large is given by

X̄ ± z* S/√n

where z* is chosen based on the level of confidence desired (for 95% use 1.96). The approximation
is due to the CLT (if X is not normal) and to approximating σ by s.
3. Assume that X is normal. Then the distribution of (X̄ − µ)/(S/√n) is known exactly: it is a
t-distribution with parameter n − 1.
4. The t-distribution has one parameter k called the degrees of freedom. The distribution is
unimodal and symmetric with mean 0 and variance k/(k − 2) for k > 2.
5. An exact confidence interval for µ is given by

X̄ ± t* S/√n

where t* is chosen based on the level of confidence desired; e.g., for a 95% confidence interval,
t* = qt(.975,n-1).
6. For X not normal but n reasonably large, the t-distribution can be used to construct confidence
intervals (and probably should be, instead of the Z distribution used above). The t-distribution is
robust with respect to violation of the normality hypothesis.
7. The R command t.test computes confidence intervals using the t-distribution.
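A minimal R sketch of t.test for a one-sample confidence interval; the data vector and the 90% level are illustrative.

# 90% confidence interval for the mean of a small sample.
x <- c(4.1, 5.3, 4.8, 5.0, 4.4, 5.6, 4.9)
t.test(x, conf.level = 0.90)$conf.int
# The same interval by hand: x-bar +/- t* s/sqrt(n)
n <- length(x)
mean(x) + c(-1, 1) * qt(0.95, n - 1) * sd(x) / sqrt(n)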
Homework, Due Thursday, March 24, note due date
1. Read Devore and Farnum, Section 7.4, pages 313–316.
2. The problems of March 10 are due Thursday instead of Tuesday.
3. Do problems 7.36,38,39,40a
4. For the iris data, write 90% confidence intervals for the mean of the sepal length of each of
the three species. Does it seem likely that the true means for the three species are actually
equal?
March 22 - Confidence intervals using t-distribution, two samples
1. Setting:
(a) X1 , X2 are normal random variables with unknown distribution and means µ1 and µ2 .
(b) X11 , . . . , X1n is a random sample from X1 and X21 , . . . , X2m is a random sample from
X2 and all variables are independent.
(c) Want to make inferences about µ1 − µ2 .
(d) Usual application: two treatments or treatment and control group. Example: random
dot stereograms.
2. Aside.
Theorem 1. If Y and Z are independent, then the random variables Y ± Z have means
µY ± µZ and variance σ²Y + σ²Z.
Application: X̄1 − X̄2 has mean µ1 − µ2 and variance σ1²/n + σ2²/m.
3. Crucial t-distribution fact: the random variable

((X̄1 − X̄2) − (µ1 − µ2)) / √(S1²/n + S2²/m)
has a distribution that is approximately t with degrees of freedom df given by a nasty formula
(see book page 322). The approximation is best when the variables are approximately normal,
the variances of X1 and X2 are approximately equal, and the sample sizes are approximately
equal. (The biggest deviations occur when the sample sizes are small and/or the variances
are quite different.)
4. Conclusion: (X̄1 − X̄2) ± t* √(S1²/n + S2²/m) gives a confidence interval for µ1 − µ2.
5. The R command t.test(x1,x2,...) computes these confidence intervals. (See the sketch at the end of this outline.)
6. New Setting:
(a) X1 , X2 are normal random variables with unknown distribution and means µ1 and µ2 .
(b) X11, . . . , X1n is a random sample from X1 and X21, . . . , X2n is a random sample from X2, but X1i and X2i are dependent.
(c) Want to make inferences about µ1 − µ2 .
(d) Usual application: Two treatments with a blocking variable so that X1i and X2i are
not independent but rather represent the two treatments applied for a fixed value of a
blocking variable.
7. Consider the random variable Y = X1 − X2 . This random variable has mean µ1 − µ2 .
Consider Yi = X1i − X2i as a random sample from a population with mean µ1 − µ2 and
unknown variance. Use one sample t-distribution.
8. Example: picking stocks by throwing darts.
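The sketch promised in item 5; the vectors are invented stand-ins for two treatment groups and for paired measurements.

# Two independent samples: CI for mu1 - mu2 (df from the nasty formula).
x1 <- c(12, 15, 11, 14, 13, 16)
x2 <- c(10, 12, 9, 13, 11)
t.test(x1, x2)$conf.int
# Paired setting: a one-sample t interval applied to the differences.
before <- c(8.1, 7.4, 9.0, 6.8)
after  <- c(7.6, 7.0, 8.7, 6.9)
t.test(before, after, paired = TRUE)$conf.int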
Homework, Due Tuesday, March 29
1. Read Devore and Farnum, Section 7.5.
2. Do problems 7.49,7.50,7.52a.
3. Refer to the “darts” data introduced in class (and available from the data page of the class
website). Write a confidence interval useful for determining whether the PROS rate of return
was better than that of the Dow Jones Industrial Index (column DJIA of the data).
March 16 — Confidence intervals for proportions
1. Suppose x is the number of successes in n trials of a Bernoulli process with probability
of success π. Then x has the binomial distribution with parameters n and π.
2. π̂ = p = x/n is an unbiased estimator for π.
3. The distribution of p has mean π and variance π(1 − π)/n.
4. For large n, the central limit theorem then implies that the following random variable
is approximately normal with mean 0 and variance 1:

(p − π)/√(π(1 − π)/n)

so a 95% confidence interval can be found by solving

−1.96 < (p − π)/√(π(1 − π)/n) < 1.96

for an interval of form a(p) < π < b(p).
5. Three different confidence intervals based on this:
(a) Sloppy - replace π in the denominator by p (some books do this).
(b) R (prop.test): solve the inequality directly for an interval of form a(p) < π < b(p).
(Use the quadratic formula.)
(c) R (with continuity correction, the prop.test default): solve the inequality after
recognizing that x is really between x − 1/2 and x + 1/2.
6. Exact confidence intervals are also available in R (binom.test). These are called exact
but are not really exact. They do have the property, however, that a 95% confidence
interval is at least a 95% confidence interval. (Both commands appear in the sketch after this outline.)
7. Confidence intervals for differences in proportions can also be computed by the same
approximations (R prop.test).
8. Determining sample sizes to ensure given confidence levels. (See Gallup poll.)
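The sketch promised in item 6; the counts (54 successes in 100 trials) are invented for the illustration.

# Approximate CI for a proportion, with the default continuity correction:
prop.test(54, 100)$conf.int
# Without the continuity correction:
prop.test(54, 100, correct = FALSE)$conf.int
# The "exact" interval:
binom.test(54, 100)$conf.int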
Homework
1. Read Devore and Farnum, Section 7.3, pages 303–306.
2. Do problems 7.22,24,26,28.
March 28 — The bootstrap
1. In estimating a parameter θ we have the following two questions:
Question 1 What estimator θ̂ should we use?
Question 2 How accurate is it?
The answer to the second question could include an estimate of the variance of θ̂ and,
preferably, information about the distribution of θ̂.
2. We have seen that for the mean µ we have good answers to the two questions. But for
other parameters (e.g., population median µ̃) or for small sample sizes we don’t always
have answers.
3. If we had many samples (so a sample of the estimator) we could estimate the variance
of the estimator by the sample variance. But we only have one sample.
Big idea. Use the sample that we have to generate lots of samples.
4. A bootstrap method for finding 95% confidence intervals (sketched in R after this outline). Suppose that
x1 , . . . , xn is a fixed random sample with the underlying distribution unknown. Let
θ̂(x1 , . . . , xn ) be an estimator of θ.
(a) For a large B choose B samples of size n from x1 , . . . , xn with replacement.
Denote the ith such sample by xi1 , . . . , xin .
(b) Compute the B values of the estimator θ̂i = θ̂(xi1 , . . . , xin ).
(c) Compute the .025- and .975-quantiles l, u of the numbers θ̂1 , . . . θ̂B .
(d) The confidence interval for θ is (l, u).
5. Evidence suggests that the bootstrap works in many situations.
6. The theory is that the sample x1, . . . , xn gives us an approximation of the density
function of the unknown random variable.
The empirical density function determined by the sample is the density f(x) such that
f(xi) = 1/n for all i. Bootstrap samples are simply samples from the distribution that
has the empirical density function for its density.
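A compact R sketch of the bootstrap method of item 4, using the sample median as the estimator; B and the data are arbitrary choices.

# Percentile-bootstrap 95% confidence interval for the population median.
set.seed(1)
x <- rexp(40, rate = 1)      # stands in for the one observed sample
B <- 2000
boot.medians <- replicate(B, median(sample(x, length(x), replace = TRUE)))
quantile(boot.medians, c(0.025, 0.975))    # the interval (l, u)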
Homework
1. Read Devore and Farnum, Section 7.6, pages 334–337.
2. Do problems 7.56,79.
March 29 — Linear Models
1. Setting: bivariate data: (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ).
2. Problem: fit a function y = f (x) to the data
3. Special case: fit a linear function y = a + bx.
4. Predicted value for a fixed a, b: ŷi = a + bxi .
5. Residuals: ei = yi − ŷi .
6. Linear regression: Choose a, b to minimize Σ_{i=1}^n e_i².
7. Notation:

Sxx = Σ_{i=1}^n (xi − x̄)²
Syy = Σ_{i=1}^n (yi − ȳ)²
Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ)
SSTOT = Syy
SSResid = Σ_{i=1}^n e_i²
SSRegress = Σ_{i=1}^n (ŷi − ȳ)²
8. The following choices minimize SSResid:

b = Sxy/Sxx,   a = ȳ − bx̄
Homework
Read Devore and Farnum, Section 3.3, pages 114–117.
A dataset containing some statistics on baseball teams competing in the 2003 American League
baseball season can be found on the data page of the course website. Suppose that you want to
predict the number of runs scored (R) by a team just from knowing how many home runs (HR)
the team has.
1. Write R as a linear function of HR. (See the sketch at the end of this assignment.)
2. Report SSResid .
3. Plot the residuals as a function of HR.
4. Compute the predicted values for each of the teams. Make some comments on the fit. (For
example, are there any values not particularly well-fit? Do you have any explanations for
that?)
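The sketch promised above; the file name al2003.csv and the column names R and HR are assumptions based on the problem statement, so adjust them to match the actual dataset.

# Fit runs scored as a linear function of home runs (names assumed).
al2003 <- read.csv("al2003.csv")    # hypothetical file from the data page
fit <- lm(R ~ HR, data = al2003)
coef(fit)                           # a and b in R = a + b*HR
sum(resid(fit)^2)                   # SSResid
plot(al2003$HR, resid(fit))         # residuals plotted against HR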
April Fool’s Day – Linear Regression
1. (a) Given: bivariate data: (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ).
(b) Problem: fit a line y = a + bx to the data.
2. Solution: Choose a and b to minimize the sum of the squared residuals. That is, minimize

SSResid = Σ_{i=1}^n (yi − (a + bxi))² = Σ_{i=1}^n e_i²

(Minimize by computing ∂/∂a and ∂/∂b, setting them equal to 0, and solving for a and b.) Result:

b = Sxy/Sxx = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / Σ_{i=1}^n (xi − x̄)²,   a = ȳ − bx̄
3. Why minimize SSResid as opposed to some other function of the ei ? Hold that thought.
4. Notation:

Sxx = Σ_{i=1}^n (xi − x̄)²    s²x = Sxx/(n − 1)
Syy = Σ_{i=1}^n (yi − ȳ)²    s²y = Syy/(n − 1)
Sxy = Σ_{i=1}^n (xi − x̄)(yi − ȳ)
SSTOT = Syy
SSRegress = Σ_{i=1}^n (ŷi − ȳ)²
5. The following relationship is not obvious (think algebra)
SSTOT = SSRegress + SSResid
Interpreting this equation: we want to explain the variation measured by SSTOT . Our explanation is
measured by SSRegress . The unexplained variation is SSResid .
6. The coefficient of determination, denoted by R², measures the explained variation:

R² = SSRegress/SSTOT

Note that 0 ≤ R² ≤ 1. Higher values represent a better model. R² is usually reported as a percentage
and often appears in expressions such as “x explains 56% of the variation in y.”
7. The correlation coefficient r is defined by

r = Sxy/√(Sxx·Syy)

Note that r² = R². Thus −1 ≤ r ≤ 1.
8. The regression equation can be rewritten as

(y − ȳ)/sy = r (x − x̄)/sx
Homework
Read Devore and Farnum, Section 3.3.
1. Suppose that we wish to fit a linear model without a constant: i.e., y = bx. Write down an expression
for the value of b that minimizes the sums of squares of residuals in this case. (Hint: there is only one
variable here, b, so this is a straightforward Mathematics 161 max-min problem.) R will compute b in
this case as well with the command lm(y∼x-1). In this expression, 1 stands for the constant term and
-1 therefore means leave it out. Alternatively we can write lm(y∼x+0).
2. Refer to the same baseball dataset as the last homework, the 2003 American League Baseball season.
Can we predict the number of wins (W ) that a team will have from the number of runs (R) that the
team scores?
(a) Write W as a linear function of R.
(b) A better model takes into account the runs that a team’s opponent has scored as well. Write
W − L as a function of R − OR (here L is losses and OR is opponents runs scored). Compare
this model with the previous one by comparing the values of R2 (not runs squared but coefficient
of determination!).
(c) Why might it make sense from the meaning of the variables W − L and R − OR to use a linear
model without a constant term as in problem 1? Write W − L as a linear function of R − OR
without a constant term.
(d) Compare the results of parts (b) and (c) as to the goodness of the model. It is best, when
comparing a model without a constant term to one with a constant term, to compare the residuals
rather than to compare the values of R2 .
April 4 – The linear model
1. Review:
(a) Given: bivariate data: (x1 , y1 ), (x2 , y2 ), . . . , (xn , yn ).
(b) Problem: fit a line y = a + bx to the data.
(c) Solution: Choose a and b to minimize the sum of the squared residuals. That is,
minimize

SSResid = Σ_{i=1}^n (yi − (a + bxi))² = Σ_{i=1}^n e_i²
(d) A measure of fit, R²:

R² = SSRegress/SSTot

where SSTot = Syy = Σ_{i=1}^n (yi − ȳ)² and SSRegress = Σ_{i=1}^n (ŷi − ȳ)².
2. A statistical model.

y = α + βx + ε

where
(a) ε is a random variable with mean 0 and variance σ²
(b) α, β, σ² are (unknown) parameters
(c) Additionally, when we want to write confidence intervals, ε is normal
3. The data (x1, y1), . . . , (xn, yn) are assumed to arise by choosing a random sample
ε1, . . . , εn. The random variables ε1, . . . , εn are assumed to be independent and all
have the same distribution as ε. (Note that this really makes yi a random variable but
xi is not.) We can write

µ_{Y|x} = α + βx    σ²_{Y|x} = σ²
4. This model is just a generalization of one that we have seen before: we could view a
random sample y1, . . . , yn as arising from the model y = µ + ε.
Homework
Read Devore and Farnum, Section 11.1, pages 488–492. No problems!
April 5 – The linear model
1. The standard statistical linear model.
yi = α + βxi + ei
where
(a) ei is a random variable with mean 0 and variance σ² (should use uppercase E;
some authors use ε)
(b) α, β, σ 2 are (unknown) parameters
(c) the random variables e1 , . . . , en are independent
(d) To write confidence intervals we will later assume that e is normal
2. Note that these assumptions imply that yi is a random variable (so we should write
Yi) but xi is not. We have

µ_{Yi} = α + βxi    σ²_{Yi} = σ²
3. This model is just a generalization of one that we have seen before: we could view a
random sample y1 , . . . , yn as arising from the model
Yi = µ + ei
4. Estimating parameters. The least squares estimates of α and β are good estimates.
There is also a natural estimator of σ² (but note the n − 2).

β̂ = Sxy/Sxx,   α̂ = ȳ − β̂x̄,   σ̂² = s²e = SSResid/(n − 2)
5. These estimators are each unbiased.
6. Variances:

σ²_β̂ = σ²/Sxx = σ² / Σ_{i=1}^n (xi − x̄)²,   σ²_α̂ = σ² (1/n + x̄²/Sxx)
7. α̂ and β̂ are the “best” estimators in a large class of estimators. For the following
Theorem, we do not need to assume that errors have a normal distribution (or even
that they have the same distribution). We need only assume that they have mean 0,
the same variance σ 2 and that they are independent.
Theorem 1 (Gauss-Markov) The estimators α̂ and β̂ are linear functions of the
yi . Among all unbiased estimators of α and β that are linear in the yi , α̂ and β̂ have
minimum variance. (We say α̂ and β̂ are BLUE - best, linear, unbiased estimators.)
Homework
Read Devore and Farnum, Section 11.1, Section 3.4 (pages 129,130)
1. If y is (approximately) linearly related to x then x is (approximately) linearly related
to y. Use the state data to write Life.Exp as a function of Murder. Likewise write
Murder as a linear function of Life.Exp.
(a) Plot the data and both lines on one plot. (You can use R to do this but it might
require a bit of thought. Else use R to plot the data and one line and add the
other line in by hand. You know how to sketch lines given their equations.)
(b) Why aren’t the lines the same? (Obviously if the data were on a straight line, the
lines should be the same.)
2. Sections 11.1 and 3.4 show how to transform data given by (xi, yi) to (x′i, y′i) if the
hypothesized relationship of x and y is a particular nonlinear relationship. Show how
to define new variables x′, y′ to transform the following nonlinear relationships to linear
relationships.
(a) y = a/(b + cx)
(b) y = ae−bx
(c) y = abx
(d) y = x/(a + bx)
(e) y = 1/(1 + ebx )
3. Sometimes the experimenter has control over the points xi in the random experiment.
If the experimenter can choose the points xi, she might very well choose these points
to minimize the variance of the estimators α̂ and β̂. Suppose that the experiment will
collect 10 observations (xi, yi) and can set the values of xi to be any real number in
the interval [−5, 5]. (Hint: you might want to recall that Sxx = Σ_{i=1}^n xi² − n x̄².)
(a) How should an experimenter choose the ten values xi to minimize the variance of
β̂?
(b) There are other considerations in choosing the values of xi . Why might the choice
in part (a) not be a very good one?
April 7 – Estimating parameters in the linear model
1. The standard statistical linear model.
yi = α + βxi + ei
where
(a) ei is a random variable with mean 0 and variance σ² (should use uppercase E; some authors
use ε)
(b) α, β, σ 2 are (unknown) parameters
(c) the random variables e1 , . . . , en are independent
(d) To write confidence intervals we will later assume that e is normal
2. Estimating parameters.
(a) Estimators.

β̂ = Sxy/Sxx,   α̂ = ȳ − β̂x̄,   σ̂² = s²e = SSResid/(n − 2)

Important note: from now on we will use the book's convention and write a for α̂ and
b for β̂. (Should really use uppercase since these are estimators, and then use lowercase for
estimates.)
(b) These estimators are each unbiased.
(c) Variances:

σ²_b = σ²/Sxx = σ² / Σ_{i=1}^n (xi − x̄)²,   σ²_a = σ² (1/n + x̄²/Sxx)
3. a and b are the “best” estimators in a large class of estimators.
Theorem 1 (Gauss-Markov) The estimators a and b are linear functions of the yi . Among
all unbiased estimators of α and β that are linear in the yi , a and b have minimum variance.
(We say a and b are BLUE - best, linear, unbiased estimators.)
4. Writing confidence intervals for β.
(a) Assume that εi is normally distributed.
(b) Then b is normally distributed.
(c) Estimate σ² by s²e.
(d) Estimate σ²_b by s²_b = s²e/Sxx. sb is called the standard error of the estimate of β.
(e) Key fact: the following statistic has a t-distribution with n − 2 degrees of freedom:

(b − β)/sb

(f) Confidence interval. Let t* be the appropriate critical value for the desired confidence level
and n − 2 degrees of freedom. The following is a confidence interval for β:

(b − t*sb, b + t*sb)
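A minimal R sketch of this interval via lm() and confint(); the simulated data are purely illustrative.

# Simulate a line with noise, then get a 95% confidence interval for the slope.
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20, sd = 1)
fit <- lm(y ~ x)
summary(fit)$coefficients          # b and its standard error sb
confint(fit, "x", level = 0.95)    # matches b +/- t* sb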
Homework
Read Devore and Farnum, Section 11.2, pages 501-502. No problems.
April 8 – Confidence intervals for parameters
1. The standard statistical linear model.
yi = α + βxi + ei
where
(a) ei is a random variable with mean 0 and variance σ 2
(b) α, β, σ 2 are (unknown) parameters
(c) the random variables e1 , . . . , en are independent
(d) ei has a normal distribution (assumption made to construct confidence intervals)
2. Estimators

Parameter   Estimate                    Standard Deviation         Estimate of Standard Deviation
σ           se = √(SSResid/(n − 2))
β           b = Sxy/Sxx                 σ/√Sxx                     sb = se/√Sxx
α           a = ȳ − bx̄                  σ√(1/n + x̄²/Sxx)           sa = se√(1/n + x̄²/Sxx)
α + βx      ŷ = a + bx                  σ√(1/n + (x − x̄)²/Sxx)     sŷ = se√(1/n + (x − x̄)²/Sxx)
3. Confidence Intervals. Let t* be the appropriate critical value for df = n − 2. The following
are confidence intervals.

Parameter   Interval
α           a ± t*sa
β           b ± t*sb
α + βx      (a + bx) ± t*sŷ
4. Predicting a single y. For a fixed x* we predict y* = α + βx* + e* by ŷ* = a + bx*. Then
ŷ* − y* has mean 0 and variance σ²(1 + 1/n + (x* − x̄)²/Sxx). Thus a “prediction interval”
for y* is

ŷ* ± t* se √(1 + 1/n + (x* − x̄)²/Sxx)
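A short R sketch contrasting the confidence interval for the mean response with the wider prediction interval; the data are simulated for illustration.

# Confidence interval for alpha + beta*x versus prediction interval for a new y.
set.seed(1)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)
fit <- lm(y ~ x)
new <- data.frame(x = 10)
predict(fit, new, interval = "confidence", level = 0.95)
predict(fit, new, interval = "prediction", level = 0.95)   # always wider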
Homework
Read Devore and Farnum, Section 11.3.
1. Do problems 11.18,22.
2. Return to the state data. Again use murder rate to predict life expectancy.
(a) Estimate β. Write a sentence that expresses the relationship between murder rate
and life expectancy using this estimate.
(b) Estimate α. What is a 90% confidence interval for α? Write a sentence interpreting
α.
(c) Suppose that the murder rate is 9.0. Write a 95% confidence interval for the mean
life expectancy for such states.
(d) Suppose that the murder rate is 4.0. Write a 90% prediction interval for a “new”
state that has murder rate 4.0
April 11- Don’t fit a line if a line doesn’t fit
1. Computing multiple confidence intervals may increase the chance of error.
2. Our estimates a and b for α and β are not independent. We can generate a 95% confidence
ellipsoid for α and β. (Using the ellipse command of the ellipse package in R).
3. Three potential sources of problems in using the linear model
(a) the standard, linear statistical model is not right (linearity, homoscedasticity, independence)
(b) the distributional assumption (normality) is wrong
(c) the data are “bad”
4. Examples of line-fitting
5. One important diagnostic tool: plot the residuals (versus x or y)
6. Influential data. (Use of R commands dfbeta and dfbetas. dfbeta returns the amount the
estimates of α and β would change if an individual observation is left out of the regression.
dfbetas returns the same value but the units are in number of standard deviations).
Homework
Read Devore and Farnum, Section 3.3, pages 123–125.
1. Do problem 3.26 of Devore and Farnum. The data can be found on the CD or in the built-in R
dataset data(anscombe).
2. Suppose that a researcher generates 95% confidence intervals for 10 different parameters as the
result of the analysis of a set of data. Suppose that the random variables generating these confidence
intervals are not necessarily independent. What confidence level should the researcher use to
ensure that the probability that all confidence intervals contain their respective parameters is at
least 95%?
3. Use the built-in state dataset again. (Recall that you retrieve the data by data(state) and then
s=data.frame(state.x77). The variable s will now be a data.frame with the desired data.)
(a) Write life expectancy as a linear function of illiteracy rate.
(b) Plot the residuals. Are there any features of the residual plot that indicate a violation of
the linear model assumptions?
(c) Which observation has the most influence over the coefficients in the regression equation?
4. A built-in dataset obtained by data(women) gives the average weight for women of each height.
(a) Write weight as a linear function of height.
(b) Plot the residuals. Are there any features of the residual plot that indicate a violation of
the linear model assumptions?
(c) Can you find a transformation of the data that fits a linear model better?
April 12 - Checking normality, other lines
1. Quantile-quantile plots. A visual check for normality. (Use qqnorm, or plot on a linear regression fit.)
In regression, the standardized residuals have a standard normal distribution if the hypothesis of
normality is true.
2. Resistant regression. The standard choice is to minimize the sum of squared residuals:

Σ_{i=1}^n e_i² = Σ_{i=1}^n (yi − ŷi)²

That is, with f(e) = e², the standard choice is to minimize Σ_{i=1}^n f(ei). Other choices for f are
possible.
(a) f (e) = |e| is called “least absolute deviations” (LAD) regression.
(b) The following is called Huber's method. It is essentially a compromise between least
squares and LAD. (rlm in the MASS package implements this in R.) For an appropriate
choice of c, use the following function:

f(e) = e²/2 if |e| ≤ c,   f(e) = c|e| − c²/2 otherwise
(c) Another choice is to use f (e) = e2 for the smallest residuals but f (e) = 0 for the
others. This is analogous to the trimmed mean and is called “least trimmed squares.”
(ltsreg in the MASS package in R implements this)
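A hedged R sketch of the three fits on the same data; the planted outlier is there just to show how the resistant fits react.

# Compare least squares, Huber (rlm), and least trimmed squares (ltsreg).
library(MASS)
set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)
y[30] <- 40                   # plant one gross outlier
coef(lm(y ~ x))               # least squares: pulled by the outlier
coef(rlm(y ~ x))              # Huber's method: more resistant
coef(ltsreg(y ~ x))           # least trimmed squares: most resistant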
Homework
Read Devore and Farnum, Section 2.4 and Section 3.4, pages 133,134.
1. Continuing problem 3 of the last homework assignment,
(a) Plot life expectancy against illiteracy rate.
(b) Add to the plot the line produced from the regression.
(c) Add to the plot the line produced by rlm.
(d) Add to the plot the line produced by the lts regression.
(e) Comment on any significant differences that you see in the plots.
2. Continuing problem 3 on the last assignment, does the qq-plot of the standardized residuals
indicate any departure from normality?
April 12 - Multiple linear regression
1. Setting:
(a) k independent variables x1 , . . . , xk . One dependent variable y.
(b) n data points: (x11 , . . . , xk1 , y1 ), . . . , (x1i , . . . , xki , yi ), . . . , (x1n , . . . , xkn , yn ).
(c) Goal: fit a linear function y ≈ b0 + b1 x1 + · · · + bk xk .
(d) Geometry: fit a k-dimensional plane to the n data points in (k + 1)-dimensional space.
(e) Remark: we write b0 instead of a so we can talk about the coefficients as b’s. Note
that b0 can be viewed as similar to the bi ’s if we think of a new variable x0 where
x0i = 1 for all i.
2. Least squares solution: Choose b0, . . . , bk to minimize the sum of squared residuals:

SSResid = Σ_{i=1}^n (yi − ŷi)²   where ŷi = b0 + b1x1i + · · · + bk xki
3. “Interpretation” of coefficients: a change in xj of 1 unit means a change in y of βj units,
all other variables xl being held constant. (But in applications, holding the other variables
constant might not be realistic.)
4. Use of R command lm to find the numbers b0 , . . . , bk . (Example.)
5. R² has the same meaning: percentage of variation “explained” by the regression:

R² = SSRegress/SSTotal = 1 − SSResid/SSTotal
6. Generality of linear model:
(a) Can use functions of variables as new variables. Example: polynomial regression
(new variables x², x³, . . . ).
(b) Variables can interact. Example: interaction terms (new variable x = xj·xl). Interpretation
of interaction terms.
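A brief R sketch of these modeling devices with lm(); the data frame d and its columns are invented for illustration.

# Multiple regression, a polynomial term, and an interaction term.
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50))
d$y <- 1 + 2 * d$x1 - d$x2 + 0.5 * d$x1 * d$x2 + rnorm(50, sd = 0.2)
lm(y ~ x1 + x2, data = d)         # plane: b0 + b1*x1 + b2*x2
lm(y ~ x1 + I(x1^2), data = d)    # polynomial regression in x1
lm(y ~ x1 * x2, data = d)         # adds the interaction term x1:x2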
Homework
Read Devore and Farnum Section 3.5 and 3.4, pages 132–33.
1. Do problems 3.32,34,35,36.
April 15 - The standard linear statistical model
1. Setting:
(a) k independent variables x1 , . . . , xk . One dependent variable y.
(b) n data points: (x11 , . . . , xk1 , y1 ), . . . , (x1i , . . . , xki , yi ), . . . , (x1n , . . . , xkn , yn ).
(c) Goal: fit a linear function y ≈ b0 + b1 x1 + · · · + bk xk .
(d) Least squares solution: Choose b0, . . . , bk to minimize the sum of squared residuals:

SSResid = Σ_{i=1}^n (yi − ŷi)²   where ŷi = b0 + b1x1i + · · · + bk xki
2. The standard linear statistical model (lm in R)
(a) yi = β0 + β1 x1i + · · · + βk xki + ei (So β0 + β1 x∗1 + · · · + βk x∗k is the mean value of y
for a fixed tuple (x∗1 , . . . , x∗k ) of independent variables.)
(b) The errors, ei have mean 0, variance σ 2 , and are independent.
(c) Additionally, when confidence intervals are desired, the errors have a normal distribution.
3. The coefficients bj of the least squares solution are unbiased estimators of the parameters
βj and have minimum variance among all unbiased estimators that are linear in the yi.
4. An unbiased estimator of σ² is SSResid/(n − (k + 1)). Thus we define se, the standard
error, by

se = √(SSResid/(n − (k + 1)))
5. How many variables should one use? Some issues.
(a) Adding a variable will always increase R² since it will decrease SSResid.
(b) If a variable doesn't increase R² “very much” it might not be useful in the model.
We will want to quantify this.
6. Side remark, inclusion of categorical variables xj in the model by xj = 0, 1. Be careful in
interpreting the coefficient βj !
Homework
Read Devore and Farnum Section 11.4
1. Do problems 11.26,27,28.
Patriot’s Day – Multiple Linear Regression, Inferences
1. Setting:
(a) k independent variables x1 , . . . , xk . One dependent variable y.
(b) n data points: (x11 , . . . , xk1 , y1 ), . . . , (x1i , . . . , xki , yi ), . . . , (x1n , . . . , xkn , yn ).
2. The standard linear statistical model (lm in R)
(a) yi = β0 + β1 x1i + · · · + βk xki + ei (So β0 + β1 x∗1 + · · · + βk x∗k is the mean value of y for a fixed tuple
(x∗1 , . . . , x∗k ) of independent variables.)
(b) The errors, ei have mean 0, variance σ 2 , and are independent.
(c) The random variables ei have normal distributions.
3. Confidence intervals for the parameters βj .
(a) Statistical software gives estimates bj for βj and estimates sbj of the standard deviation of bj .
(b) Then (b − β)/sb has a t-distribution with n − (k + 1) degrees of freedom.
(c) A confidence interval for any β is b ± t∗ sb where t∗ is the critical value for a t-distribution with
n − (k + 1) degrees of freedom.
(d) Similarly, we can get confidence intervals and prediction intervals for fixed (x∗1 , . . . , x∗k ).
(e) Cautions: the declining confidence in multiple confidence intervals; the interpretation of a confidence
interval in the presence of other coefficients
4. An alternate way to interpret the output - Testing the “hypothesis” that β = 0.
(a) If β = 0, t = b/sb has a t-distribution with n − (k + 1) degrees of freedom.
(b) The probability that a random variable with a t-distribution is at least as extreme (in absolute
value) as the observed t is called the p-value of the statistic.
(c) The p-value is a measure of how “surprising” it is that a t-value this extreme occurred. If the p-value is
very small, we doubt that β = 0. Otherwise, we think that β = 0 is consistent with the data.
(d) Computing p-values is directly related to computing confidence intervals, and a p-value doesn’t provide
as much information as a confidence interval.
5. Adjusted R2 . R2 will increase as we add more variables to the model. To correct for this, there is a statistic
called adjusted R2 :
adj-R2 = 1 − (SSResid/(n − (k + 1))) / (SSTotal/(n − 1))
Homework
Read Devore and Farnum Section 11.5, pages 525–527, 530–532.
1. In the data section of the course website is a datafile consisting of the scores of 32 students of Mathematics
222 on three tests and a final exam. In this question we investigate using the test scores to predict the final
exam score. (After all, if the test scores do a good enough job, I wouldn’t have to grade the final exam!)
(a) Write a linear function Exam = b0 + b1 Test1 + b2 Test2 + b3 Test3 that can be used to predict the final
exam score from the three test scores.
(b) Write a 95% confidence interval for the parameter β1 in the model corresponding to our estimate b1 .
(c) Do the p-values for the coefficients lead you to suspect that one or more of the βi are not very useful
in the model? Explain.
(d) For each of the three independent variables, fit a linear function that does not include that variable.
Compare the values of adj-R2 for each of those models to each other and to the full model. Which model
would you use to predict exam scores and why?
(e) If a student scores 85 on each test, what is the predicted exam score? What is a 90% confidence
interval for that prediction?
April 19 — Choosing model variables
1. The standard linear statistical model (lm in R)
(a) yi = β0 + β1 x1i + · · · + βk xki + ei
(b) The errors, ei have mean 0, variance σ 2 , and are independent.
(c) The random variables ei have normal distributions.
2. Problem: How many variables (of the k) should we keep? There is a tradeoff - more variables
“explains” more variation but makes a more complicated model.
3. Some solutions suggested by yesterday:
(a) Eliminate any variable with a “large” p-value (in other words, the evidence in favor of β ≠ 0 is
not strong)
(b) Find the collection of variables with largest adjusted-R2 .
4. One approach.
(a) Stepwise regression, add or subtract one variable at a time based on a criterion.
(b) Want to minimize some (increasing) function of SSResid and k. (As k or SSResid increases, the
model becomes less desirable.)
(c) Candidates for function to minimize (there are theoretical reasons for these)
AIC = n ln(SSResid/n) + 2(k + 1)
BIC = n ln(SSResid/n) + (k + 1) ln n
(d) step in R does both AIC (default) and BIC stepwise regression.
(e) Stepwise regression may not find the best model since it does not look at every subset of
variables.
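A sketch of stepwise selection in R (fit is a full lm model; dat is the hypothetical data frame):
step(fit)                       # adds/drops one variable at a time to minimize AIC
step(fit, k = log(nrow(dat)))   # setting k = ln n gives the BIC criterion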
Homework
Read Devore and Farnum Section 11.6, pages 538–545.
1. The data page at the course website has a dataset that includes statistics on five years of major-league baseball. We desire to take certain statistics computed on a per-game basis and use these
to predict the number of runs-per-game (RG). The version of the dataset that is useful in this
regard is the “per game” version. The variables are RG (runs), X1BG (singles), X2BG (doubles),
X3BG (triples), HRG (home runs), SOG (strikeouts), SBG (stolen bases), CSG (caught stealing).
(a) Use all of the other variables in a linear function to predict RG.
(b) Based on the t-values in the preceding analysis, which variables are obvious candidates for
removal from the model? Refit the model without these variables. Compute adjusted R2
for the two models.
(c) Employ stepwise regression in R (step). Does step give the same result as the analysis in
part (b)?
(d) Using the model from (c), what variable should be the next to remove? Refit the model
without that variable and compare adjusted R2 of this model to the models of (b) and (c).
(e) If you know something about baseball, do the coefficients in the model of (c) “make sense?”
April 22 — Hypothesis Testing
1. Hypothesis testing is an alternate way to present inferences from data about the parameters of the underlying statistical model. Hypothesis testing was once used more heavily than it is now; in most
circumstances, there are better (more informative) inference methods such as confidence intervals.
2. Hypotheses. A hypothesis proposes a possible state of affairs with respect to the distribution from which
the data comes. Examples:
(a) A hypothesis stating a fixed value of a parameter: µ = 0.
(b) A hypothesis stating a range of values of a parameter: µ ≥ 3.
(c) A hypothesis about the nature of the distribution itself: X has a normal distribution.
3. The hypotheses of an experiment.
(a) Null Hypothesis. The null hypothesis, usually denoted H0 , is generally a hypothesis that the data
analysis is intended to investigate. It is usually thought of as the “default” or “status quo” hypothesis
that we will accept unless the data gives us substantial evidence against it.
(b) Alternate Hypothesis. The alternate hypothesis, usually denoted H1 or Ha , is the hypothesis that
we are wanting to put forward as true if we have sufficient evidence against the null hypothesis.
(c) Possible Decisions. On the basis of the data we will either reject H0 or fail to reject H0 (in favor
of H1 ).
(d) Asymmetry. Note that H0 and H1 are not treated equally. The idea is that H0 is the default and
only if we are reasonably sure that H0 is false do we reject it in favor of H1 . H0 is “innocent until
proven guilty” and this metaphor from the criminal justice system is good to keep in mind.
4. How do we decide?
(a) We construct a “test” statistic T : that is, a number computed from the data that we will use to
assess the likelihood that H0 is true.
(b) We announce a critical region, a set of possible values of the test statistic T that would cause us to
reject H0 .
(c) We choose the critical region so that
i. values of the test statistic in the critical region tend to support H1 over H0 and
ii. the size of the critical region is such that the probability of rejecting H0 if it is true is small.
5. Errors. There are two types of errors with extraordinarily obvious but uninteresting and uninformative
names:
(a) Type I error. We reject H0 even though it is true. The probability of a type I error is denoted by α.
(b) Type II error. We do not reject H0 even though it is false. The probability of a type II error is
denoted by β.
Both errors are bad but the asymmetry of the hypotheses suggests that Type I errors are worse than
Type II errors. Thus, we usually want to construct our test statistic and critical region so that α is small.
This requires knowing the distribution of the test statistic T if H0 is true. This in turn restricts the kinds
of hypotheses that we can choose for H0 .
6. p-value. After the data are collected, we report the decision (reject or not) and, usually, the p-value, which
is the probability that, if H0 is true, a value of the statistic at least as extreme in favor of H1 would occur.
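For instance, the two-sided p-value for a t statistic can be computed directly (the numbers are hypothetical):
tstat <- 2.3                # observed value of the test statistic
degf  <- 14                 # degrees of freedom
2 * pt(-abs(tstat), degf)   # probability of a value at least this extreme under H0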
Homework, Due Monday, April 25, a short assignment but a short due-date
1. Read Devore and Farnum Section 8.1.
2. Do problems 8.2–8.
April 22 — Hypothesis Testing – M&M’s
1. Advertised distribution of M&M colors:
   brown   red   orange   yellow   blue   green
    13%    13%    20%      14%     24%    16%
2. The multinomial distribution. k + 1 parameters: n, π1 , . . . , πk such that π1 + · · · + πk = 1. Interpretation:
X1 , . . . , Xk have a multinomial distribution if there are n independent trials of an experiment which can
result in one of k distinct outcomes and Xi is the number of trials that result in the ith outcome.
3. Want to test the hypothesis
H0 : π1 = .13, π2 = .14, π3 = .13, π4 = .24, π5 = .20, π6 = .16
or in general the hypothesis
H0 : π1 = π1,0 , . . . , πk = πk,0
4. Development of a test statistic. Given the result of the experiment x1 , . . . , xk , we want a statistic T that
measures deviation of the result from what would be expected if H0 is true.
5. Candidate: T = Σ_{i=1}^{k} (xi − nπi,0 )²/(nπi,0 ), the sum over outcomes of (observed − expected)²/expected.
6. Decision: reject H0 if T is too large.
7. If the null hypothesis is true then the distribution of T is approximately a distribution that is called the
χ2 (chi-squared) distribution. It has one parameter, k − 1, called the degrees of freedom.
8. Decision: reject H0 if T exceeds the appropriate critical value of the χ2 distribution with k − 1 degrees of freedom.
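This test is built into R; a sketch for the M&M hypothesis (the observed counts are hypothetical):
counts <- c(30, 28, 45, 32, 50, 35)   # observed counts for the six colors
chisq.test(counts, p = c(.13, .14, .13, .24, .20, .16))   # H0 proportions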
Homework, Due Monday, May 2,
1. Read Devore and Farnum 8.3, pages 373-376.
2. Do problems 8.42–44.
May 2 – Analysis of Variance
1. The setting for (one-way) analysis of variance (ANOVA).
(a) a dependent (response) variable (usually a continuous random variable)
(b) an independent (treatment, factor) variable that is categorical with several (at least
two) categories called levels
(c) Want to know whether the different levels explain (some of) the variation in the
dependent variable
2. Related to what we have done before:
(a) difference in means between two different populations (the two-sample t-test): the case of
two levels
(b) linear regression
3. Examples.
4. Useful graphical representations (side-by-side boxplots, stripcharts)
5. Explaining variation: pictures.
6. Formal setting:
(a) k (normal) populations with means µ1 , . . . , µk respectively and common variance σ 2 .
(b) data xij is the j th observation of the ith population (1 ≤ i ≤ k, 1 ≤ j ≤ ni ). The
data are independent observations.
(c) sample mean and standard deviation of ith level are x̄i and si .
(d) average of all the observations is x̄¯.
7. The crucial role of randomization.
Homework, Due Friday, May 6,
1. Read Devore and Farnum 9.1, pages 401–404.
The next three weeks
1. There will be no further homework assignments. Some problems will be assigned on current
material but these problems will form part of the take-home portion of the exam.
2. There will be an in-class portion to the final exam. It will be limited to questions taken
from the first three tests (the in-class portions thereof) although I reserve the right to
restate, modify, combine or otherwise improve those questions. Additional information
will be forthcoming.
3. The take-home component will be distributed on Monday, May 9, and will be due on Thursday, May 19, at 5:00 PM.
May 2 – Analysis of Variance
1. Formal setting:
(a) k (normal) populations with means µ1 , . . . , µk respectively and common variance σ 2 .
(b) data xij is the j th observation of the ith population (1 ≤ i ≤ k, 1 ≤ j ≤ ni ). The
data are independent observations.
(c) sample mean and standard deviation of ith level are x̄i and si .
(d) average of all the observations is x̄¯.
2. Sums of squares:
SSTotal = Σ_{i=1}^{k} Σ_{j=1}^{ni} (xij − x̄¯)² = (n − 1)s²    (1)
SSError = Σ_{i=1}^{k} Σ_{j=1}^{ni} (xij − x̄i )² = Σ_{i=1}^{k} (ni − 1)si²    (2)
SSTreatment = Σ_{i=1}^{k} Σ_{j=1}^{ni} (x̄i − x̄¯)² = Σ_{i=1}^{k} ni (x̄i − x̄¯)²    (3)
3. The fundamental equation (same as in linear regression but read “error” for “residual”
and “Treatment” for “regression”):
SSTotal = SSTreatment + SSError
4. Comparing SSTreatment to SSError. Define
MSE = SSError/(n − k) ,   MSTr = SSTreatment/(k − 1) ,   F = MSTr/MSE
5. If H0 : µ1 = · · · = µk is true, then F has an F -distribution with parameters (k − 1) and
(n − k). Reject H0 if F is too large.
6. The role of randomization. Without assuming anything in particular about the underlying distributions, if the xij all have the same distribution (i.e., there is no difference in
treatment effect by level), then the F statistic has a distribution that is approximately an
F distribution.
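A sketch of the one-way analysis in R (data frame dat with response x and factor level are hypothetical names):
fit <- aov(x ~ level, data = dat)   # one-way ANOVA
summary(fit)                        # table of SS, MS, the F statistic, and its p-value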
Homework, Due Friday, May 6,
1. Read Devore and Farnum 9.1, pages 403-406, and 9.2.
May 5 – Confidence intervals
1. Formal setting:
(a) k (normal) populations with means µ1 , . . . , µk respectively and common variance σ 2 .
(b) data xij is the j th observation of the ith population (1 ≤ i ≤ k, 1 ≤ j ≤ ni ). The
data are independent observations.
(c) sample mean and standard deviation of ith level are x̄i and si .
(d) average of all the observations is x̄¯.
2. The problem of constructing confidence intervals - multiple comparisons.
3. Basic facts:
(a) x̄i − x̄j has a normal distribution with mean µi − µj and variance σ²/ni + σ²/nj .
(b) se² = MSE is an unbiased estimator of σ².
4. Approach I: Bonferroni. If we want a family of k confidence intervals at 100(1 − α)%,
construct 100(1 − α/k)% confidence intervals. Guaranteed but conservative.
5. Approach II: Tukey Honest Significant Differences intervals. (Who was Tukey? What is
Honest?) Develop intervals based on the distribution of the range of a set of normally
distributed numbers. Intervals for µi − µj have the form
(x̄i − x̄j ) ± q∗ √( (MSE/2)(1/ni + 1/nj ) )
where q∗ is a critical value that depends on the studentized range distribution (two
parameters, k and n − k).
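Both approaches are available in R; a sketch (fit is the aov object from the one-way analysis, names hypothetical):
TukeyHSD(fit)   # simultaneous intervals for every difference of level means
pairwise.t.test(dat$x, dat$level, p.adjust.method = "bonferroni")   # Bonferroni-adjusted pairwise tests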
Homework, Due Tuesday, May 10
1. Read Devore and Farnum Section 9.3, pages 416–420.
2. This is Problem 1 on the take-home final exam. Twenty-four rats (as identical as rats are)
were each fed one of four diets (imaginatively labeled A,B,C,D). The time (in seconds) for
their blood to coagulate was recorded. The data are available from the course webpage.
(a) Is it reasonable to believe that diet had no effect on blood coagulation? Explain.
(b) Identify which, if any, pairs of diets seem to cause different blood coagulation times.
(c) There are only six cases per diet. This is a very small number with which to check the
assumptions underlying the techniques that you used to answer the first two parts.
Nevertheless, do you see anything in the data that gives you any concerns about these
assumptions?
May 6 – Two-way ANOVA
1. Problem
(a) a dependent (response) variable, x, which is continuous
(b) two different independent categorical variables (treatment and block, or two treatments)
(c) want to know whether either or both categorical variables have an effect on the mean value of x
2. Data: xijk with i ≤ a, j ≤ b, k ≤ nij . If all nij are equal, we call the experiment “balanced” and
use r (for replications) to denote nij .
3. Model: (not in book)
xijk = µ + αi + βj + eijk ,   Σ_{i=1}^{a} αi = 0 ,   Σ_{j=1}^{b} βj = 0
where the errors, eijk are independent, normally distributed, with mean 0 and variance σ 2 .
4. Estimates (using different notation than that of the book, dots refer to variable to be averaged
over)
parameter   estimate
µ           x̄...
αi          x̄i.. − x̄...
βj          x̄.j. − x̄...
σ²          MSE
5. Sum of squares analysis:
SSTotal = Σ_{ijk} (xijk − x̄... )²
SSA = Σ_{ijk} (x̄i.. − x̄... )²
SSB = Σ_{ijk} (x̄.j. − x̄... )²
SSE = Σ_{ijk} (xijk − (x̄i.. − x̄... + x̄.j. − x̄... + x̄... ))²
6. Mean squares:
MSE = SSE/(abr − a − b + 1) ,   MSA = SSA/(a − 1) ,   MSB = SSB/(b − 1)
7. F-tests: If H0 : α1 = · · · = αa = 0 is true then F = MSA/MSE has an F -distribution with the
appropriate number of degrees of freedom.
8. TukeyHSD works.
Homework, Due Tuesday, May 10
1. Read Devore and Farnum Section 9.4.
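A sketch of the additive two-way model above in R (data frame dat with response x and factors A and B are hypothetical names):
fit2 <- aov(x ~ A + B, data = dat)   # one term per factor, no interaction
summary(fit2)                        # F-tests for the A and B effects
TukeyHSD(fit2)                       # simultaneous intervals by factor level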
May 9 – Two-way ANOVA with interaction
1. Problem
(a) a dependent (response) variable, x, which is continous
(b) two different independent categorical variables (treatment and block, or two treatment)
(c) want to know if either or both categorical variables has an effect on mean value of x
2. Data: xijk with i ≤ a, j ≤ b, k ≤ r. Must have r > 1.
3. Model: xijk = µ + αi + βj + γij + eijk with the usual assumptions about eijk .
4. Sum of squares analysis:
SSTotal = Σ_{ijk} (xijk − x̄... )²
SSA = Σ_{ijk} (x̄i.. − x̄... )²
SSB = Σ_{ijk} (x̄.j. − x̄... )²
SSAB = Σ_{ijk} (x̄ij. − (x̄i.. − x̄... + x̄.j. − x̄... + x̄... ))²
SSE = Σ_{ijk} (xijk − x̄ij. )²
5. Mean squares:
MSE = SSE/(ab(r − 1)) ,   MSA = SSA/(a − 1) ,   MSB = SSB/(b − 1) ,   MSAB = SSAB/((a − 1)(b − 1))
6. TukeyHSD works.
Homework
1. Read Devore and Farnum, Section 10.2
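A sketch of the model with interaction (same hypothetical names as before):
fitI <- aov(x ~ A * B, data = dat)   # expands to A + B + A:B
summary(fitI)                        # adds an F-test for the interaction term
TukeyHSD(fitI)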
R Reference Card
by Tom Short, EPRI PEAC, [email protected] 2004-11-07
Granted to the public domain. See www.Rpad.org for the source and latest
version. Includes material from R for Beginners by Emmanuel Paradis (with
permission).
Getting help
Most R functions have online documentation.
help(topic) documentation on topic
?topic id.
help.search("topic") search the help system
apropos("topic") the names of all objects in the search list matching
the regular expression ”topic”
help.start() start the HTML version of help
str(a) display the internal *str*ucture of an R object
summary(a) gives a “summary” of a, usually a statistical summary but it is
generic meaning it has different operations for different classes of a
ls() show objects in the search path; specify pat="pat" to search on a
pattern
ls.str() str() for each variable in the search path
dir() show files in the current directory
methods(a) shows S3 methods of a
methods(class=class(a)) lists all the methods to handle objects of
class a
Input and output
load() load the datasets written with save
data(x) loads specified data sets
library(x) load add-on packages
read.table(file) reads a file in table format and creates a data frame
from it; the default separator sep="" is any whitespace; use
header=TRUE to read the first line as a header of column names; use
as.is=TRUE to prevent character vectors from being converted to factors;
use comment.char="" to prevent "#" from being interpreted as a comment;
use skip=n to skip n lines before reading data; see the help for options
on row naming, NA treatment, and others
read.csv("filename",header=TRUE) id. but with defaults set for reading
comma-delimited files
read.delim("filename",header=TRUE) id. but with defaults set for reading
tab-delimited files
read.fwf(file,widths,header=FALSE,sep="",as.is=FALSE) read a table of
fixed width formatted data into a 'data.frame'; widths is an integer
vector, giving the widths of the fixed-width fields
save(file,...) saves the specified objects (...) in the XDR
platform-independent binary format
save.image(file) saves all objects
cat(..., file="", sep=" ") prints the arguments after coercing to
character; sep is the character separator between arguments
print(a, ...) prints its arguments; generic, meaning it can have
different methods for different objects
format(x,...) format an R object for pretty printing
write.table(x,file="",row.names=TRUE,col.names=TRUE,sep=" ") prints x
after converting to a data frame; if quote is TRUE, character or factor
columns are surrounded by quotes ("); sep is the field separator; eol is
the end-of-line separator; na is the string for missing values; use
col.names=NA to add a blank column header to get the column headers
aligned correctly for spreadsheet input
sink(file) output to file, until sink()
Most of the I/O functions have a file argument. This can often be a
character string naming a file or a connection. file="" means the
standard input or output. Connections can include files, pipes, zipped
files, and R variables.
On Windows, the file connection can also be used with
description = "clipboard". To read a table copied from Excel, use
x <- read.delim("clipboard")
To write a table to the clipboard for Excel, use
write.table(x,"clipboard",sep="\t",col.names=NA)
For database interaction, see packages RODBC, DBI, RMySQL, RPgSQL, and
ROracle. See packages XML, hdf5, netCDF for reading other file formats.
Data creation
c(...) generic function to combine arguments with the default forming a
vector; with recursive=TRUE descends through lists combining all
elements into one vector
from:to generates a sequence; ":" has operator priority; 1:4 + 1 is "2,3,4,5"
seq(from,to) generates a sequence; by= specifies increment; length=
specifies desired length
seq(along=x) generates 1, 2, ..., length(along); useful for for loops
rep(x,times) replicate x times; use each= to repeat "each" element of
x each times; rep(c(1,2,3),2) is 1 2 3 1 2 3;
rep(c(1,2,3),each=2) is 1 1 2 2 3 3
data.frame(...) create a data frame of the named or unnamed arguments;
data.frame(v=1:4,ch=c("a","B","c","d"),n=10); shorter vectors are
recycled to the length of the longest
list(...) create a list of the named or unnamed arguments;
list(a=c(1,2),b="hi",c=3i)
array(x,dim=) array with data x; specify dimensions like dim=c(3,4,2);
elements of x recycle if x is not long enough
matrix(x,nrow=,ncol=) matrix; elements of x recycle
factor(x,levels=) encodes a vector x as a factor
gl(n,k,length=n*k,labels=1:n) generate levels (factors) by specifying
the pattern of their levels; k is the number of levels, and n is the
number of replications
expand.grid() a data frame from all combinations of the supplied vectors
or factors
rbind(...) combine arguments by rows for matrices, data frames, and others
cbind(...) id. by columns
Slicing and extracting data
Indexing vectors
x[n] nth element
x[-n] all but the nth element
x[1:n] first n elements
x[-(1:n)] elements from n+1 to the end
x[c(1,4,2)] specific elements
x["name"] element named "name"
x[x > 3] all elements greater than 3
x[x > 3 & x < 5] all elements between 3 and 5
x[x %in% c("a","and","the")] elements in the given set
Indexing lists
x[n] list with elements n
x[[n]] nth element of the list
x[["name"]] element of the list named "name"
x$name id.
Indexing matrices
x[i,j] element at row i, column j
x[i,] row i
x[,j] column j
x[,c(1,3)] columns 1 and 3
x["name",] row named "name"
Indexing data frames (matrix indexing plus the following)
x[["name"]] column named "name"
x$name id.
Variable conversion
as.array(x), as.data.frame(x), as.numeric(x),
as.logical(x), as.complex(x), as.character(x),
... convert type; for a complete list, use methods(as)
Variable information
is.na(x), is.null(x), is.array(x), is.data.frame(x),
is.numeric(x), is.complex(x), is.character(x),
... test for type; for a complete list, use methods(is)
length(x) number of elements in x
dim(x) Retrieve or set the dimension of an object; dim(x) <- c(3,2)
dimnames(x) Retrieve or set the dimension names of an object
nrow(x) number of rows; NROW(x) is the same but treats a vector as a one-row matrix
ncol(x) and NCOL(x) id. for columns
class(x) get or set the class of x; class(x) <- "myclass"
unclass(x) remove the class attribute of x
attr(x,which) get or set the attribute which of x
attributes(obj) get or set the list of attributes of obj
Data selection and manipulation
which.max(x) returns the index of the greatest element of x
which.min(x) returns the index of the smallest element of x
rev(x) reverses the elements of x
sort(x) sorts the elements of x in increasing order; to sort in decreasing
order: rev(sort(x))
cut(x,breaks) divides x into intervals (factors); breaks is the number
of cut intervals or a vector of cut points
match(x, y) returns a vector of the same length as x with the elements
of x which are in y (NA otherwise)
which(x == a) returns a vector of the indices of x if the comparison operation is true (TRUE), in this example the values of i for which x[i]
== a (the argument of this function must be a variable of mode logical)
choose(n, k) computes the combinations of k events among n repetitions
= n!/[(n − k)!k!]
na.omit(x) suppresses the observations with missing data (NA) (suppresses the corresponding line if x is a matrix or a data frame)
na.fail(x) returns an error message if x contains at least one NA
unique(x) if x is a vector or a data frame, returns a similar object but with
the duplicate elements suppressed
table(x) returns a table with the numbers of the different values of x
(typically for integers or factors)
subset(x, ...) returns a selection of x with respect to criteria (...,
typically comparisons: x$V1 < 10); if x is a data frame, the option
select gives the variables to be kept or dropped using a minus sign
sample(x, size) resample randomly and without replacement size elements in the vector x;
the option replace = TRUE allows resampling with replacement
prop.table(x,margin=) table entries as fraction of marginal table
Math
sin,cos,tan,asin,acos,atan,atan2,log,log10,exp
max(x) maximum of the elements of x
min(x) minimum of the elements of x
range(x) id. then c(min(x), max(x))
sum(x) sum of the elements of x
diff(x) lagged and iterated differences of vector x
prod(x) product of the elements of x
mean(x) mean of the elements of x
median(x) median of the elements of x
quantile(x,probs=) sample quantiles corresponding to the given probabilities (defaults to 0,.25,.5,.75,1)
weighted.mean(x, w) mean of x with weights w
rank(x) ranks of the elements of x
var(x) or cov(x) variance of the elements of x (calculated on n − 1); if x is
a matrix or a data frame, the variance-covariance matrix is calculated
sd(x) standard deviation of x
cor(x) correlation matrix of x if it is a matrix or a data frame (1 if x is a
vector)
var(x, y) or cov(x, y) covariance between x and y, or between the
columns of x and those of y if they are matrices or data frames
cor(x, y) linear correlation between x and y, or correlation matrix if they
are matrices or data frames
round(x, n) rounds the elements of x to n decimals
log(x, base) computes the logarithm of x with base base
scale(x) if x is a matrix, centers and reduces the data; to center only use
the option center=FALSE, to reduce only scale=FALSE (by default
center=TRUE, scale=TRUE)
pmin(x,y,...) a vector whose ith element is the minimum of x[i],
y[i], . . .
pmax(x,y,...) id. for the maximum
cumsum(x) a vector whose ith element is the sum from x[1] to x[i]
cumprod(x) id. for the product
cummin(x) id. for the minimum
cummax(x) id. for the maximum
union(x,y), intersect(x,y), setdiff(x,y), setequal(x,y),
is.element(el,set) “set” functions
Re(x) real part of a complex number
Im(x) imaginary part
Mod(x) modulus; abs(x) is the same
Arg(x) angle in radians of the complex number
Conj(x) complex conjugate
convolve(x,y) compute the several kinds of convolutions of two sequences
fft(x) Fast Fourier Transform of an array
mvfft(x) FFT of each column of a matrix
filter(x,filter) applies linear filtering to a univariate time series or
to each series separately of a multivariate time series
Many math functions have a logical parameter na.rm=FALSE to specify missing data (NA) removal.
Matrices
t(x) transpose
diag(x) diagonal
%*% matrix multiplication
solve(a,b) solves a %*% x = b for x
solve(a) matrix inverse of a
rowsum(x) sum of rows for a matrix-like object; rowSums(x) is a faster
version
colsum(x), colSums(x) id. for columns
rowMeans(x) fast version of row means
colMeans(x) id. for columns
Advanced data processing
apply(X,INDEX,FUN=) a vector or array or list of values obtained by
applying a function FUN to margins (INDEX) of X
lapply(X,FUN) apply FUN to each element of the list X
tapply(X,INDEX,FUN=) apply FUN to each cell of a ragged array given
by X with indexes INDEX
by(data,INDEX,FUN) apply FUN to data frame data subsetted by INDEX
merge(a,b) merge two data frames by common columns or row names
xtabs(a˜b,data=x) a contingency table from cross-classifying factors
aggregate(x,by,FUN) splits the data frame x into subsets, computes
summary statistics for each, and returns the result in a convenient
form; by is a list of grouping elements, each as long as the variables
in x
stack(x, ...) transform data available as separate columns in a data
frame or list into a single column
unstack(x, ...) inverse of stack()
reshape(x, ...) reshapes a data frame between ’wide’ format with
repeated measurements in separate columns of the same record and
’long’ format with the repeated measurements in separate records;
use direction="wide" or direction="long"
Strings
paste(...) concatenate vectors after converting to character; sep= is the
string to separate terms (a single space is the default); collapse= is
an optional string to separate “collapsed” results
substr(x,start,stop) substrings in a character vector; can also assign, as substr(x, start, stop) <- value
strsplit(x,split) split x according to the substring split
grep(pattern,x) searches for matches to pattern within x; see ?regex
gsub(pattern,replacement,x) replacement of matches determined
by regular expression matching; sub() is the same but only replaces
the first occurrence.
tolower(x) convert to lowercase
toupper(x) convert to uppercase
match(x,table) a vector of the positions of first matches for the elements
of x among table
x %in% table id. but returns a logical vector
pmatch(x,table) partial matches for the elements of x among table
nchar(x) number of characters
Dates and Times
The class Date has dates without times. POSIXct has dates and times, including time zones. Comparisons (e.g. >), seq(), and difftime() are useful.
Date also allows + and −. ?DateTimeClasses gives more information. See
also package chron.
as.Date(s) and as.POSIXct(s) convert to the respective class;
format(dt) converts to a string representation. The default string
format is “2001-02-21”. These accept a second argument to specify a
format for conversion. Some common formats are:
%a, %A Abbreviated and full weekday name.
%b, %B Abbreviated and full month name.
%d Day of the month (01–31).
%H Hours (00–23).
%I Hours (01–12).
%j Day of year (001–366).
%m Month (01–12).
%M Minute (00–59).
%p AM/PM indicator.
%S Second as decimal number (00–61).
%U Week (00–53); the first Sunday as day 1 of week 1.
%w Weekday (0–6, Sunday is 0).
%W Week (00–53); the first Monday as day 1 of week 1.
%y Year without century (00–99). Don’t use.
%Y Year with century.
%z (output only.) Offset from Greenwich; -0800 is 8 hours west of Greenwich.
%Z (output only.) Time zone as a character string (empty if not available).
Where leading zeros are shown they will be used on output but are optional
on input. See ?strftime.
Plotting
plot(x) plot of the values of x (on the y-axis) ordered on the x-axis
plot(x, y) bivariate plot of x (on the x-axis) and y (on the y-axis)
hist(x) histogram of the frequencies of x
barplot(x) histogram of the values of x; use horiz=TRUE for horizontal
bars
dotchart(x) if x is a data frame, plots a Cleveland dot plot (stacked plots
line-by-line and column-by-column)
pie(x) circular pie-chart
boxplot(x) “box-and-whiskers” plot
sunflowerplot(x, y) id. as plot() but points with similar coordinates are drawn as flowers whose petal number represents the number of points
stripplot(x) plot of the values of x on a line (an alternative to
boxplot() for small sample sizes)
coplot(x˜y | z) bivariate plot of x and y for each value or interval of
values of z
interaction.plot (f1, f2, y) if f1 and f2 are factors, plots the
means of y (on the y-axis) with respect to the values of f1 (on the
x-axis) and of f2 (different curves); the option fun allows choosing
the summary statistic of y (by default fun=mean)
matplot(x,y) bivariate plot of the first column of x vs. the first one of y,
the second one of x vs. the second one of y, etc.
fourfoldplot(x) visualizes, with quarters of circles, the association between two dichotomous variables for different populations (x must
be an array with dim=c(2, 2, k), or a matrix with dim=c(2, 2) if
k = 1)
assocplot(x) Cohen–Friendly graph showing the deviations from independence of rows and columns in a two dimensional contingency table
mosaicplot(x) ‘mosaic’ graph of the residuals from a log-linear regression of a contingency table
pairs(x) if x is a matrix or a data frame, draws all possible bivariate plots
between the columns of x
plot.ts(x) if x is an object of class "ts", plot of x with respect to time, x
may be multivariate but the series must have the same frequency and
dates
ts.plot(x) id. but if x is multivariate the series may have different dates
and must have the same frequency
qqnorm(x) quantiles of x with respect to the values expected under a normal law
qqplot(x, y) quantiles of y with respect to the quantiles of x
contour(x, y, z) contour plot (data are interpolated to draw the
curves), x and y must be vectors and z must be a matrix so that
dim(z)=c(length(x), length(y)) (x and y may be omitted)
filled.contour(x, y, z) id. but the areas between the contours are
coloured, and a legend of the colours is drawn as well
image(x, y, z) id. but with colours (actual data are plotted)
persp(x, y, z) id. but in perspective (actual data are plotted)
stars(x) if x is a matrix or a data frame, draws a graph with segments or a
star where each row of x is represented by a star and the columns are
the lengths of the segments
symbols(x, y, ...) draws, at the coordinates given by x and y, symbols (circles, squares, rectangles, stars, thermometers or “boxplots”)
whose sizes, colours, . . . are specified by supplementary arguments
termplot(mod.obj) plot of the (partial) effects of a regression model
(mod.obj)
The following parameters are common to many plotting functions:
add=FALSE if TRUE superposes the plot on the previous one (if it exists)
axes=TRUE if FALSE does not draw the axes and the box
type="p" specifies the type of plot, "p": points, "l": lines, "b": points
connected by lines, "o": id. but the lines are over the points, "h":
vertical lines, "s": steps, the data are represented by the top of the
vertical lines, "S": id. but the data are represented by the bottom of
the vertical lines
xlim=, ylim= specifies the lower and upper limits of the axes, for example with xlim=c(1, 10) or xlim=range(x)
xlab=, ylab= annotates the axes, must be variables of mode character
main= main title, must be a variable of mode character
sub= sub-title (written in a smaller font)
Low-level plotting commands
points(x, y) adds points (the option type= can be used)
lines(x, y) id. but with lines
text(x, y, labels, ...) adds text given by labels at coordinates (x,y); a typical use is: plot(x, y, type="n"); text(x, y,
names)
mtext(text, side=3, line=0, ...) adds text given by text in
the margin specified by side (see axis() below); line specifies the
line from the plotting area
segments(x0, y0, x1, y1) draws lines from points (x0,y0) to points
(x1,y1)
arrows(x0, y0, x1, y1, angle= 30, code=2) id. with arrows
at points (x0,y0) if code=2, at points (x1,y1) if code=1, or both if
code=3; angle controls the angle from the shaft of the arrow to the
edge of the arrow head
abline(a,b) draws a line of slope b and intercept a
abline(h=y) draws a horizontal line at ordinate y
abline(v=x) draws a vertical line at abscissa x
abline(lm.obj) draws the regression line given by lm.obj
rect(x1, y1, x2, y2) draws a rectangle whose left, right, bottom, and
top limits are x1, x2, y1, and y2, respectively
polygon(x, y) draws a polygon linking the points with coordinates given
by x and y
legend(x, y, legend) adds the legend at the point (x,y) with the symbols given by legend
title() adds a title and optionally a sub-title
axis(side, vect) adds an axis at the bottom (side=1), on the left (2),
at the top (3), or on the right (4); vect (optional) gives the abscissa (or
ordinates) where tick-marks are drawn
rug(x) draws the data x on the x-axis as small vertical lines
locator(n, type="n", ...) returns the coordinates (x, y) after the
user has clicked n times on the plot with the mouse; also draws symbols (type="p") or lines (type="l") with respect to optional graphic
parameters (...); by default nothing is drawn (type="n")
Graphical parameters
These can be set globally with par(...); many can be passed as parameters
to plotting commands.
adj controls text justification (0 left-justified, 0.5 centred, 1 right-justified)
bg specifies the colour of the background (ex. : bg="red", bg="blue", . . .
the list of the 657 available colours is displayed with colors())
bty controls the type of box drawn around the plot, allowed values are: "o",
"l", "7", "c", "u" ou "]" (the box looks like the corresponding character); if bty="n" the box is not drawn
cex a value controlling the size of texts and symbols with respect to the default; the following parameters have the same control for numbers on
the axes, cex.axis, the axis labels, cex.lab, the title, cex.main,
and the sub-title, cex.sub
col controls the color of symbols and lines; use color names: "red", "blue"
see colors() or as "#RRGGBB"; see rgb(), hsv(), gray(), and
rainbow(); as for cex there are: col.axis, col.lab, col.main,
col.sub
font an integer which controls the style of text (1: normal, 2: italics, 3:
bold, 4: bold italics); as for cex there are: font.axis, font.lab,
font.main, font.sub
las an integer which controls the orientation of the axis labels (0: parallel to
the axes, 1: horizontal, 2: perpendicular to the axes, 3: vertical)
lty controls the type of lines, can be an integer or string (1: "solid",
2: "dashed", 3: "dotted", 4: "dotdash", 5: "longdash", 6:
"twodash", or a string of up to eight characters (between "0" and
"9") which specifies alternatively the length, in points or pixels, of
the drawn elements and the blanks; for example lty="44" will have
the same effect as lty=2
lwd a numeric which controls the width of lines, default 1
mar a vector of 4 numeric values which control the space between the axes
and the border of the graph of the form c(bottom, left, top,
right), the default values are c(5.1, 4.1, 4.1, 2.1)
mfcol a vector of the form c(nr,nc) which partitions the graphic window
as a matrix of nr lines and nc columns, the plots are then drawn in
columns
mfrow id. but the plots are drawn by row
pch controls the type of symbol, either an integer between 1 and 25, or any
single character within ""
[chart of the 25 numbered pch plotting symbols omitted]
ps an integer which controls the size in points of texts and symbols
pty a character which specifies the type of the plotting region, "s": square,
"m": maximal
tck a value which specifies the length of tick-marks on the axes as a fraction
of the smallest of the width or height of the plot; if tck=1 a grid is
drawn
tcl a value which specifies the length of tick-marks on the axes as a fraction
of the height of a line of text (by default tcl=-0.5)
xaxt if xaxt="n" the x-axis is set but not drawn (useful in conjunction with
axis(side=1, ...))
yaxt if yaxt="n" the y-axis is set but not drawn (useful in conjunction with
axis(side=2, ...))
Lattice (Trellis) graphics
xyplot(y˜x) bivariate plots (with many functionalities)
barchart(y˜x) histogram of the values of y with respect to those of x
dotplot(y˜x) Cleveland dot plot (stacked plots line-by-line and column-by-column)
densityplot(˜x) density functions plot
histogram(˜x) histogram of the frequencies of x
bwplot(y˜x) “box-and-whiskers” plot
qqmath(˜x) quantiles of x with respect to the values expected under a theoretical distribution
stripplot(y˜x) single dimension plot, x must be numeric, y may be a
factor
qq(y˜x) quantiles to compare two distributions, x must be numeric, y may
be numeric, character, or factor but must have two ‘levels’
splom(˜x) matrix of bivariate plots
parallel(˜x) parallel coordinates plot
levelplot(z˜x*y|g1*g2) coloured plot of the values of z at the coordinates given by x and y (x, y and z are all of the same length)
wireframe(z˜x*y|g1*g2) 3d surface plot
cloud(z˜x*y|g1*g2) 3d scatter plot
In the normal Lattice formula, y˜x|g1*g2 has combinations of optional conditioning variables g1 and g2 plotted on separate panels. Lattice functions
take many of the same arguments as base graphics plus also data= the data
frame for the formula variables and subset= for subsetting. Use panel=
to define a custom panel function (see apropos("panel") and ?llines).
Lattice functions return an object of class trellis and have to be print-ed to
produce the graph. Use print(xyplot(...)) inside functions where automatic printing doesn’t work. Use lattice.theme and lset to change Lattice
defaults.
Optimization and model fitting
optim(par, fn, method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B",
"SANN")) general-purpose optimization;
par is initial values, fn is function to optimize (normally minimize)
nlm(f,p) minimize function f using a Newton-type algorithm with starting
values p
lm(formula) fit linear models; formula is typically of the form
response ˜ termA + termB + ...; use I(x*y) + I(xˆ2) for terms made of
nonlinear components
glm(formula,family=) fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of
the error distribution; family is a description of the error distribution
and link function to be used in the model; see ?family
nls(formula) nonlinear least-squares estimates of the nonlinear model
parameters
approx(x,y=) linearly interpolate given data points; x can be an xy plotting structure
spline(x,y=) cubic spline interpolation
loess(formula) fit a polynomial surface using local fitting
Many of the formula-based modeling functions have several common arguments: data= the data frame for the formula variables, subset= a subset of
variables used in the fit, na.action= action for missing values: "na.fail",
"na.omit", or a function. The following generics often apply to model fitting
functions:
predict(fit,...) predictions from fit based on input data
df.residual(fit) returns the number of residual degrees of freedom
coef(fit) returns the estimated coefficients (sometimes with their
standard-errors)
residuals(fit) returns the residuals
deviance(fit) returns the deviance
fitted(fit) returns the fitted values
logLik(fit) computes the logarithm of the likelihood and the number of
parameters
AIC(fit) computes the Akaike information criterion or AIC
Statistics
aov(formula) analysis of variance model
anova(fit,...) analysis of variance (or deviance) tables for one or more
fitted model objects
density(x) kernel density estimates of x
binom.test(), pairwise.t.test(), power.t.test(), prop.test(),
t.test(), ... use help.search("test")
Distributions
rnorm(n, mean=0, sd=1) Gaussian (normal)
rexp(n, rate=1) exponential
rgamma(n, shape, scale=1) gamma
rpois(n, lambda) Poisson
rweibull(n, shape, scale=1) Weibull
rcauchy(n, location=0, scale=1) Cauchy
rbeta(n, shape1, shape2) beta
rt(n, df) ‘Student’ (t)
rf(n, df1, df2) Fisher–Snedecor (F)
rchisq(n, df) Pearson (χ2)
rbinom(n, size, prob) binomial
rgeom(n, prob) geometric
rhyper(nn, m, n, k) hypergeometric
rlogis(n, location=0, scale=1) logistic
rlnorm(n, meanlog=0, sdlog=1) lognormal
rnbinom(n, size, prob) negative binomial
runif(n, min=0, max=1) uniform
rwilcox(nn, m, n), rsignrank(nn, n) Wilcoxon’s statistics
All these functions can be used by replacing the letter r with d, p or q to
get, respectively, the probability density (dfunc(x, ...)), the cumulative
probability density (pfunc(x, ...)), and the value of quantile (qfunc(p,
...), with 0 < p < 1).
Programming
function( arglist ) expr function definition
return(value)
if(cond) expr
if(cond) cons.expr else alt.expr
for(var in seq) expr
while(cond) expr
repeat expr
break
next
Use braces {} around statements
ifelse(test, yes, no) a value with the same shape as test filled
with elements from either yes or no
do.call(funname, args) executes a function call from the name of
the function and a list of arguments to be passed to it
Continuous distributions in R
Distribution   parameters   R suffix
exponential    λ            exp
normal         µ, σ         norm
uniform        a, b         unif
Weibull        α, β         weibull
beta           α, β         beta
1. Computing density functions. Use dexp, dnorm, etc.
dexp(2,3) gives the value of the density of the exponential at x = 2 with λ = 3. dexp(x,3)
computes the values of density for all values in vector x.
2. Computing proportions. Use pexp, pnorm, etc.
pexp(2,3) gives the integral ∫_{0}^{2} f (x) dx where f is the density of the exponential with λ = 3.
(This is the proportion of values of the variable for x ≤ 2.)
3. Computing percentiles. Use qexp, qnorm, etc.
qexp(.7,3) gives the number q such that ∫_{0}^{q} f (x) dx = .7 where f (x) is the density of the
exponential with parameter 3.
4. Generating random numbers. Use rexp, rnorm, etc.
rexp(10,3) gives 10 random numbers from an exponential distribution with λ = 3. That
is, the random numbers are expected to occur according to the relative frequencies given
by the exponential distribution.
You can also use a vector for the first argument in each of these commands.
Solution to Problem 2.30, page 78
The grader found considerable confusion on this problem and finally gave up on reading it. Here is
a solution, with some generality added. You are given a distribution (in this case an exponential
distribution with parameter λ) and are asked to find the proportion of the population that exceeds the
mean by more than 1 standard deviation. If f (x) is the density for the distribution, that means that
you are to compute
∫_{µ+σ}^{∞} f (x) dx
To compute this integral, we need to know µ and σ. For the distribution in question, µ = 1/λ. We need
only compute σ. Now σ² for any distribution is given by
σ² = ∫_{−∞}^{∞} (x − µ)² f (x) dx
which by the hint in the problem is equal to
σ² = ∫_{−∞}^{∞} x² f (x) dx − µ²
In the special case of this problem, then, we need to compute
σ² = ∫_{0}^{∞} x² λe^(−λx) dx − 1/λ²
since µ = 1/λ. The integral, after integrating by parts twice, evaluates to 2/λ² so σ² = 1/λ². Thus
σ = 1/λ. Therefore the answer to the problem is
∫_{2/λ}^{∞} λe^(−λx) dx
This is a dead easy integral to evaluate. Its value is e^(−2).
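As a check, the same proportion can be computed in R with pexp; the answer does not depend on λ:
1 - pexp(2/3, 3)   # lambda = 3, so mu + sigma = 2/3; proportion beyond it
exp(-2)            # agrees: about 0.1353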
Study Sheet for Test 1 — Tuesday, March 1
1. The test will cover all material from the course through that of Tuesday, February 22
(including the homework assigned on that day).
2. The textbook sections covered include 1.1–1.6, 2.1–2.3, 4.3, 5.1–5.3. On occasion we
covered topics not in the textbook. And we left out a few things that were in the
textbook. The test also covers the supplementary notes Sections 1 and 2.
3. The daily outlines are the best guide to what we covered. You should be familiar with
all the terminology listed on those sheets.
4. You will not be expected to know the exact formulas for the density functions for
important distributions. But you will be expected to know about distributions in
general (what is a density function, mean, variance, etc.) And you will be expected
to know how a particular distribution (like the normal or exponential) is shaped and
what a particular distribution might be a model for. You should be able to compute
relative proportions, means, and variances given a (fairly simple) density function.
5. I’ll supply a copy of the table on the front cover of the book. Know how to use it to
answer questions about arbitrary normal distributions.
6. I won’t ask you to compute statistics or draw things like histograms or boxplots on the
fly. But you should know how they are computed and drawn. In particular, I might
show you R output and ask for interpretation.
7. To repeat the policy on missed tests, you may miss this test for any reason you deem
appropriate. But there are no makeup tests. If you miss the test or score higher on
the final exam than on this test, your final exam score will replace the first test score.
8. There are two sections of this course so the same exam will be given at 9 AM and 1:30
PM. You may take it at either time. Discussing the test with another student after
you see it but before 2:30 PM is a serious breach of trust and is punishable by death.
9. You may use calculators to calculate things. But you may not use the calculator to
retrieve stored information such as definitions or the like.