ANOVA
Dr. Frank Wood, [email protected]
Linear Regression Models, Lecture 6
ANOVA
• ANOVA is nothing new; it is a way of organizing the pieces of linear regression so that inference follows simple recipes.
• We will return to ANOVA when discussing multiple regression and other types of linear statistical models.
Partitioning Total Sum of Squares
• “The ANOVA approach is based on the
partitioning of sums of squares and degrees
of freedom associated with the response
variable Y”
• We start with the observed deviations of Yi
around the observed mean Ȳ
Yi − Ȳ
Partitioning of Total Deviations
[Figure: partitioning of the total deviation Yi − Ȳ into a component around the fitted regression line (SSE) and a component of the fitted values around the mean (SSR); labels: SSTO, SSE, SSR]
Measure of Total Variation
• The measure of total variation is denoted by
SSTO = \sum_i (Y_i - \bar{Y})^2
• SSTO stands for total sum of squares
• If all Yi's are the same, SSTO = 0
• The greater the variation of the Yi's, the greater SSTO
Variation after predictor effect
• The measure of variation of the Yi’s that is still
present when the predictor variable X is taken
into account is the sum of the squared
deviations
SSE = \sum_i (Y_i - \hat{Y}_i)^2
• SSE denotes error sum of squares
Regression Sum of Squares
• The difference between SSTO and SSE is
SSR
SSR = \sum_i (\hat{Y}_i - \bar{Y})^2
• SSR stands for regression sum of squares
Partitioning of Sum of Squares
Y_i - \bar{Y} = (\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)
where Yi − Ȳ is the total deviation, Ŷi − Ȳ is the deviation of the fitted regression value around the mean, and Yi − Ŷi is the deviation around the fitted regression line.
Remarkable Property
• The sums of the squared deviations satisfy the same partitioning:
\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2
or
SSTO = SSR + SSE
• Proof: see the next slide.
Remarkable Property
\sum_i (Y_i - \bar{Y})^2 = \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2
• Proof:
\sum_i (Y_i - \bar{Y})^2 = \sum_i [(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)]^2
= \sum_i [(\hat{Y}_i - \bar{Y})^2 + (Y_i - \hat{Y}_i)^2 + 2(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)]
= \sum_i (\hat{Y}_i - \bar{Y})^2 + \sum_i (Y_i - \hat{Y}_i)^2 + 2 \sum_i (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)
but
\sum_i (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) = \sum_i \hat{Y}_i (Y_i - \hat{Y}_i) - \bar{Y} \sum_i (Y_i - \hat{Y}_i) = 0
by properties previously demonstrated.
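The partition is easy to check numerically. Below is a minimal Python sketch (not part of the original slides; the simulated data and variable names are illustrative assumptions) that fits a least-squares line with numpy and verifies SSTO = SSR + SSE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative simulated data (assumption): Y = 2 + 3*X + noise
n = 50
X = rng.uniform(0, 10, size=n)
Y = 2.0 + 3.0 * X + rng.normal(0, 2.0, size=n)

# Least-squares estimates b0, b1 and fitted values
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

# Sums of squares
SSTO = np.sum((Y - Y.mean()) ** 2)      # total sum of squares
SSE = np.sum((Y - Y_hat) ** 2)          # error sum of squares
SSR = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares

# The partition SSTO = SSR + SSE holds up to floating-point error
assert np.isclose(SSTO, SSR + SSE)
print(SSTO, SSR + SSE)
```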
Remember: Lecture 3
• The ith residual is defined to be
ei = Yi − Ŷi
• The sum of the residuals is zero:
\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i)
= \sum_i Y_i - n b_0 - b_1 \sum_i X_i
= 0
by the first normal equation.
Remember: Lecture 3
• The sum of the weighted residuals is zero when the residual in the ith trial is weighted by the fitted value of the response variable for the ith trial:
\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i
= b_0 \sum_i e_i + b_1 \sum_i X_i e_i
= 0
by the previous properties.
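Both recap properties can be confirmed numerically; a short sketch with simulated data (an illustration added here, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data (assumption)
n = 20
X = rng.uniform(0, 10, size=n)
Y = 3.0 - 0.5 * X + rng.normal(0, 1.0, size=n)

# Least-squares fit and residuals
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X
e = Y - Y_hat

print(np.sum(e))          # ~ 0: sum of residuals (first normal equation)
print(np.sum(Y_hat * e))  # ~ 0: sum of fitted-value-weighted residuals
```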
Breakdown of Degrees of Freedom
• SSTO
– 1 linear constraint due to the calculation and
inclusion of the mean
• n-1 degrees of freedom
• SSE
– 2 linear constraints arising from the estimation of β0 and β1
• n-2 degrees of freedom
• SSR
– Two degrees of freedom from the regression parameters; one is lost due to a linear constraint
• 1 degree of freedom
Mean Squares
• A sum of squares divided by its associated
degrees of freedom is called a mean square
– The regression mean square is
MSR = \frac{SSR}{1} = SSR
– The error mean square is
MSE = \frac{SSE}{n - 2}
ANOVA table for simple lin. regression
Source of Variation | SS | df | MS | E{MS}
Regression | SSR = \sum_i (\hat{Y}_i - \bar{Y})^2 | 1 | MSR = SSR/1 | \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2
Error | SSE = \sum_i (Y_i - \hat{Y}_i)^2 | n − 2 | MSE = SSE/(n − 2) | \sigma^2
Total | SSTO = \sum_i (Y_i - \bar{Y})^2 | n − 1 | |
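The entries of this table can be computed directly from the data; here is a hedged sketch (the simulated data and names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data (assumption)
n = 30
X = rng.uniform(0, 10, size=n)
Y = 1.0 + 0.5 * X + rng.normal(0, 1.0, size=n)

# Fit simple linear regression by least squares
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

# ANOVA table quantities
SSR = np.sum((Y_hat - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSTO = SSR + SSE
MSR, MSE = SSR / 1, SSE / (n - 2)

print(f"Regression: SS = {SSR:8.3f}  df = {1:3d}  MS = {MSR:8.3f}")
print(f"Error:      SS = {SSE:8.3f}  df = {n - 2:3d}  MS = {MSE:8.3f}")
print(f"Total:      SS = {SSTO:8.3f}  df = {n - 1:3d}")
```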
E{MSE} = σ²
• We know from earlier lectures that
– SSE/σ² ~ χ²(n−2)
• That means that E{SSE/σ²} = n − 2
• And thus that E{SSE/(n−2)} = E{MSE} = σ²
E{MSR} = σ² + β1² Σ(Xi − X̄)²
• To begin, we take an alternative but
equivalent form for SSR
SSR = b_1^2 \sum_i (X_i - \bar{X})^2
• And note that, by definition of variance we
can write
\sigma^2\{b_1\} = E\{b_1^2\} - (E\{b_1\})^2
E{MSR} = σ² + β1² Σ(Xi − X̄)²
• But we know that b1 is an unbiased estimator of β1, so E{b1} = β1
• We also know (from previous lectures) that
\sigma^2\{b_1\} = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2}
• So we can rearrange terms and plug in:
\sigma^2\{b_1\} = E\{b_1^2\} - (E\{b_1\})^2
E\{b_1^2\} = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2} + \beta_1^2
E{MSR} = σ² + β1² Σ(Xi − X̄)²
• From the previous slide
E\{b_1^2\} = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2} + \beta_1^2
• Which brings us to this result
E\{MSR\} = E\{SSR/1\} = E\{b_1^2\} \sum_i (X_i - \bar{X})^2 = \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2
Comments and Intuition
• The mean of the sampling distribution of MSE is σ² regardless of whether X and Y are linearly related (i.e. whether β1 = 0)
• The mean of the sampling distribution of MSR is also σ² when β1 = 0.
– When β1 = 0, MSR and MSE tend to be of the same order of magnitude
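These two expectations can be checked by simulation; a small Monte Carlo sketch (the design, parameter values, and replication count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Fixed design and true parameters (illustrative assumptions)
n, beta0, beta1, sigma = 25, 1.0, 0.4, 2.0
X = np.linspace(0, 10, n)
Sxx = np.sum((X - X.mean()) ** 2)

msr_vals, mse_vals = [], []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0, sigma, size=n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    Y_hat = b0 + b1 * X
    msr_vals.append(np.sum((Y_hat - Y.mean()) ** 2) / 1)
    mse_vals.append(np.sum((Y - Y_hat) ** 2) / (n - 2))

# Averages should be close to sigma^2 and sigma^2 + beta1^2 * Sxx
print(np.mean(mse_vals), sigma ** 2)
print(np.mean(msr_vals), sigma ** 2 + beta1 ** 2 * Sxx)
```

Setting beta1 = 0 in the sketch makes both averages settle near σ², matching the intuition above.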
F Test of β1 = 0 vs. β1 ≠ 0
• ANOVA provides a battery of useful tests. For example, ANOVA provides an easy test of the two-sided hypotheses
– H0: β1 = 0
– Ha: β1 ≠ 0
• Test statistic from before:
t^* = \frac{b_1 - 0}{s\{b_1\}}
• ANOVA test statistic:
F^* = \frac{MSR}{MSE}
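In simple linear regression the two statistics are directly related: F* = (t*)². A sketch verifying this on simulated data (the data are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative data (assumption)
n = 40
X = rng.uniform(0, 5, size=n)
Y = 0.5 + 1.2 * X + rng.normal(0, 1.5, size=n)

Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

MSE = np.sum((Y - Y_hat) ** 2) / (n - 2)
MSR = np.sum((Y_hat - Y.mean()) ** 2) / 1

s_b1 = np.sqrt(MSE / Sxx)   # estimated standard error of b1
t_star = (b1 - 0) / s_b1    # t statistic for H0: beta1 = 0
F_star = MSR / MSE          # ANOVA F statistic

# For simple linear regression, F* equals (t*)^2
print(F_star, t_star ** 2)
```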
Sampling distribution of F*
• The sampling distribution of F* when H0 (β1 = 0) holds can be derived starting from Cochran's theorem
• Cochran's theorem
– If all n observations Yi come from the same normal distribution with mean μ and variance σ², and SSTO is decomposed into k sums of squares SSr, each with degrees of freedom dfr, then the SSr/σ² terms are independent χ² variables with dfr degrees of freedom if
\sum_{r=1}^{k} df_r = n - 1
The F Test
• We have decomposed SSTO into two sums of squares, SSR and SSE, and their degrees of freedom are additive; hence, by Cochran's theorem:
– If β1 = 0, so that all Yi have the same mean μ = β0 and the same variance σ², then SSE/σ² and SSR/σ² are independent χ² variables
F* Test Statistic
• F* can be written as follows:
F^* = \frac{MSR}{MSE} = \frac{SSR/\sigma^2}{1} \bigg/ \frac{SSE/\sigma^2}{n-2}
• But by Cochran's theorem, when H0 holds we have
F^* \sim \frac{\chi^2(1)/1}{\chi^2(n-2)/(n-2)}
F Distribution
• The F distribution is the distribution of the ratio of two independent χ² random variables, each divided by its degrees of freedom.
• The test statistic F* therefore follows the distribution
– F* ~ F(1, n−2)
Hypothesis Test Decision Rule
• Since F* is distributed as F(1,n-2) when H0
holds, the decision rule to follow when the risk
of a Type I error is to be controlled at α is:
– If F* ≤ F(1-α; 1, n-2), conclude H0
– If F* > F(1-α; 1, n-2) conclude Ha
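A minimal sketch of this decision rule using scipy's F distribution (the values of F*, n, and α below are made-up illustrations):

```python
from scipy.stats import f

# Illustrative values (assumptions, not from the lecture)
F_star, n, alpha = 7.3, 30, 0.05

# Critical value F(1 - alpha; 1, n - 2)
f_crit = f.ppf(1 - alpha, dfn=1, dfd=n - 2)

if F_star <= f_crit:
    print(f"F* = {F_star:.2f} <= F(1-alpha; 1, n-2) = {f_crit:.2f}: conclude H0")
else:
    print(f"F* = {F_star:.2f} >  F(1-alpha; 1, n-2) = {f_crit:.2f}: conclude Ha")
```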
F distribution
• PDF, CDF, Inverse CDF of F distribution
[Figure: three panels plotting the F(1,100) distribution against X: "F Density Function" (fpdf(X,1,100)), "F Cumulative Density Function" (fcdf(X,1,100)), and "F Inverse CDF".]
• Note that MSR/MSE must be large in order to reject the null hypothesis.
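The three panels can be regenerated along the following lines; a sketch assuming scipy and matplotlib, with the F(1, 100) degrees of freedom taken from the original axis labels:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import f

dfn, dfd = 1, 100                 # degrees of freedom from the figure labels
x = np.linspace(0.05, 10, 500)    # support for the PDF/CDF panels
p = np.linspace(0.0, 0.99, 500)   # probabilities for the inverse-CDF panel

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].plot(x, f.pdf(x, dfn, dfd))
axes[0].set_title("F Density Function")
axes[1].plot(x, f.cdf(x, dfn, dfd))
axes[1].set_title("F Cumulative Distribution Function")
axes[2].plot(p, f.ppf(p, dfn, dfd))
axes[2].set_title("F Inverse CDF")
plt.show()
```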
Partitioning of Total Deviations
• Does this make sense? When is MSR/MSE
big?
[Figure: the same partitioning diagram as before, with the SSTO, SSE, and SSR components marked]
General Linear Test
• The test of β1 = 0 versus β1 ≠ 0 is but a single example of a general test for linear statistical models.
• The general linear test has three parts
– Full Model
– Reduced Model
– Test Statistic
Full Model Fit
• The standard full simple linear regression
model is first fit to the data
Yi = β0 + β1 Xi + εi
• Using this model the error sum of squares is
obtained
SSE(F) = \sum_i [Y_i - (b_0 + b_1 X_i)]^2 = \sum_i (Y_i - \hat{Y}_i)^2 = SSE
Fit Reduced Model
• For instance, so far we have considered
– H0: β1 = 0
– Ha: β1 ≠ 0
• The model when H0 holds is called the reduced or restricted model. Here this results in β1 = 0
Yi = β0 + εi
• The SSE for the reduced model is obtained
SSE(R) = \sum_i (Y_i - b_0)^2 = \sum_i (Y_i - \bar{Y})^2 = SSTO
Test Statistic
• The idea is to compare the two error sums of
squares SSE(F) and SSE(R).
• Because F has more parameters than R
– SSE(F) ≤ SSE(R) always
• The relevant test statistic is
F^* = \frac{[SSE(R) - SSE(F)] / (df_R - df_F)}{SSE(F) / df_F}
which follows the F distribution when H0 holds.
• dfR and dfF are the degrees of freedom associated with the reduced and full model error sums of squares, respectively
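For the simple linear regression case, the general linear test reduces to comparing the full model (intercept and slope) with the intercept-only reduced model. A sketch on simulated data (the data are illustrative assumptions):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(5)

# Illustrative data (assumption)
n = 35
X = rng.uniform(0, 8, size=n)
Y = 2.0 + 0.7 * X + rng.normal(0, 1.0, size=n)

# Full model: Y = b0 + b1*X
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
SSE_F, df_F = np.sum((Y - (b0 + b1 * X)) ** 2), n - 2

# Reduced model: Y = b0 only, so the fit is the sample mean and SSE(R) = SSTO
SSE_R, df_R = np.sum((Y - Y.mean()) ** 2), n - 1

# General linear test statistic and its p-value
F_star = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)
p_value = f.sf(F_star, df_R - df_F, df_F)
print(F_star, p_value)
```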
R²
• SSTO measures the variation in the
observations Yi when X is not considered
• SSE measures the variation in the Yi after a
predictor variable X is employed
• A natural measure of the effect of X in
reducing variation in Y is to express the
reduction in variation (SSTO-SSE = SSR) as
a proportion of the total variation
R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}
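A short numerical check that the two expressions for R² agree (data simulated as an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(6)

# Illustrative data (assumption)
n = 60
X = rng.uniform(0, 10, size=n)
Y = 1.0 + 2.0 * X + rng.normal(0, 3.0, size=n)

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

SSTO = np.sum((Y - Y.mean()) ** 2)
SSE = np.sum((Y - Y_hat) ** 2)
SSR = SSTO - SSE

# Both forms of the coefficient of determination coincide
print(SSR / SSTO, 1 - SSE / SSTO)
```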