The Standard Error of the Slope

11.1 Variation in the Slope from Sample to Sample
Objectives
• To learn that the regression equation $\hat{y} = b_0 + b_1 x$ must sometimes be thought of as an estimate of a true underlying linear model, $y = \beta_0 + \beta_1 x + \varepsilon$.
• To understand that the slope of a regression line fitted from sample data will vary from sample to sample.
• To learn that having a wider spread in the values of x and a smaller spread in the values of y (for each x) decreases the variability of the slope $b_1$.
• To learn that the variability in the values of y from a sample is measured in terms of the difference between y and its predicted value $\hat{y}$, and so is equivalent to the variability in the residuals.
Linear Models
In Chapter 3 we learned about the following regression line equation which describes a linear
relationship:
$\hat{y} = b_0 + b_1 x$
If this equation is based on a sample from the entire population, then the values of $b_0$ and $b_1$ are simply estimates of the true population parameters $\beta_0$ and $\beta_1$.
The standard model for linear regression is:
response = prediction from true regression line + random deviation
$y = (\beta_0 + \beta_1 x) + \varepsilon$
$y = \mu_y + \varepsilon$
where $\beta_0$ = intercept for the true regression line
$\beta_1$ = slope for the true regression line
$\varepsilon$ = random error
$\mu_y$ = mean of the values of y for the given value of the explanatory variable x (lies on the true regression line)
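The model above can be illustrated with a small simulation. The sketch below draws several samples from one fixed "true" line and fits a least-squares line to each, so the fitted slope can be seen to change from sample to sample. The parameter values, the x-values, and the use of normally distributed random deviations are all assumptions made for illustration; none of them come from the text.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical "true" parameters, chosen only for illustration.
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
xs = [float(x) for x in range(1, 11)]

rng = random.Random(0)  # fixed seed so the run is repeatable
slopes = []
for sample in range(5):
    # response = prediction from true regression line + random deviation
    # (normal deviations are an assumption of this sketch)
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    b0, b1 = fit_line(xs, ys)
    slopes.append(b1)
    print(f"sample {sample + 1}: y-hat = {b0:.3f} + {b1:.3f} x")
```

Each printed line is an estimate of the same underlying line; the differences between the fitted slopes are the sample-to-sample variation this section is about.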
The values of y in a sample can be thought of as:
observed response = fitted response + residual
y  ˆy  residual
y  ( b0  b1 x )  residual
Activity 11.1a: How Fast Do Kids Grow?
Variability in x and y
“Above” a single fixed value of x there are many different values of y. The distribution of these values is called the conditional distribution of y given x. Each conditional distribution has a mean, $\mu_y$, and a measure of variability, $\sigma$. The mean of each conditional distribution lies on the theoretical line, and the variability is assumed to be the same for each value of x. This implies that $\sigma$ measures the variability of all values of y about the true regression line. This fact is used to estimate $\sigma$ from your data with s, essentially the standard deviation of the residuals:
$$s = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n-2}} = \sqrt{\frac{SSE}{n-2}}$$
[Figure: conditional distribution of y given x, with mean = $\mu_y$ and standard deviation = $\sigma$]
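As a rough illustration of how s is obtained from the residuals, the sketch below fits a least-squares line, splits each observed y into a fitted value plus a residual, and computes the square root of SSE divided by n - 2. The five data points and the helper names `fit_line` and `residual_sd` are hypothetical and exist only for this example.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residual_sd(xs, ys):
    """Return (residuals, SSE, s), where s = sqrt(SSE / (n - 2))."""
    b0, b1 = fit_line(xs, ys)
    # residual = observed response - fitted response
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(r ** 2 for r in residuals)
    s = math.sqrt(sse / (len(xs) - 2))
    return residuals, sse, s


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

residuals, sse, s = residual_sd(xs, ys)
print("residuals:", [round(r, 3) for r in residuals])
print(f"SSE = {sse:.4f}, s = {s:.4f}")
```

The divisor is n - 2 rather than n - 1 because two quantities, the intercept and the slope, were estimated from the data before the residuals were computed.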
The Standard Error of the Slope
The distribution of the different possible values of $b_1$, taken over repeated samples, is called the sampling distribution of the slope. The standard error (the estimated standard deviation) of this sampling distribution is:
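$$s_{b_1} = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}} = \frac{\sqrt{\dfrac{\sum_i (y_i - \hat{y}_i)^2}{n-2}}}{\sqrt{\sum_i (x_i - \bar{x})^2}} = \frac{\text{standard deviation of the residuals}}{\sqrt{n-1} \times \text{standard deviation of the } x\text{-values}}$$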
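The last form of the formula follows from the usual definition of the sample standard deviation of the x-values. The symbol $s_x$ below is introduced here for that standard deviation and does not appear in the original text.

```latex
s_x = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}}
\quad\Longrightarrow\quad
\sqrt{\sum_i (x_i - \bar{x})^2} = \sqrt{n-1}\; s_x ,
\qquad\text{so}\qquad
s_{b_1} = \frac{s}{\sqrt{\sum_i (x_i - \bar{x})^2}}
        = \frac{s}{\sqrt{n-1}\; s_x}.
```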
As you can see from the formula, the variability in the slope of the regression line from sample to sample depends on the following:
• The sample size, n. Larger sample sizes result in less variability in the slope.
• The spread in the values of x. A wider spread in x results in regression lines with less variability in their slope.
• The variability of y at each fixed value of x ($\sigma \approx s$). More variability in each conditional distribution of y means more variability in the slope $b_1$.
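The three factors above can be checked numerically. The sketch below evaluates the standard deviation of the slope for a few sets of x-values, using the true $\sigma$ in place of its estimate s so that the comparison is deterministic. The function name `slope_sd` and all the numbers are assumptions chosen for illustration.

```python
import math


def slope_sd(xs, sigma):
    """Standard deviation of b1: sigma / sqrt(sum of (x - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma / math.sqrt(sxx)


# Hypothetical designs, for illustration only.
baseline = [float(x) for x in range(1, 11)]  # n = 10, x from 1 to 10
larger_n = baseline * 2                      # n = 20, same spread in x
wider_x = [2.0 * x for x in baseline]        # n = 10, x from 2 to 20

cases = {
    "baseline (n = 10, sigma = 2)": slope_sd(baseline, 2.0),
    "larger sample (n = 20)": slope_sd(larger_n, 2.0),
    "wider spread in x": slope_sd(wider_x, 2.0),
    "less variability in y (sigma = 1)": slope_sd(baseline, 1.0),
}
for label, sd in cases.items():
    print(f"{label}: {sd:.4f}")
```

Each of the three changes makes the result smaller than the baseline, in line with the list above.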
Example: According to Leonardo da Vinci, a person’s arm span and height are about equal. The table
below gives height and arm span measurements for a sample of 15 high-school students.
Arm Span (cm)    Height (cm)
168              170.5
172              170
101              107
161              159
166              166
174              175
153.5            158
95               95.5
129              132.5
169              165
175              179
154              149
142              143
156.5            158
164              161

a) What is the theoretical regression line, $\mu_y = \beta_0 + \beta_1 x$, that Leonardo is proposing for this situation? Use arm span as the explanatory variable.
b) Find the least squares regression line, $\hat{y} = b_0 + b_1 x$, for these data. Interpret the slope. Compare the estimated slope and intercept to the theoretical slope and intercept in part a. Are they close?
c) Calculate the random deviation, $\varepsilon$, for each student, using the theoretical regression line. Plot these random deviations against the arm spans and comment on the pattern.
d) For each student, calculate the residual from the estimated regression line. Plot the residuals against the arm spans and comment on the pattern. Is the pattern similar to that for the random deviations?
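One way to carry out the calculations for this example is sketched below. It assumes the 15 measurements pair up as (arm span, height) in the order they are listed, which is a reading of the table rather than something the text confirms. The script computes the least-squares line, the standard deviation of the residuals, and the standard error of the slope in both forms of the formula; the variable names are chosen for this sketch.

```python
import math
import statistics

# Assumed pairing: values taken in listed order as (arm span, height).
arm_span = [168, 172, 101, 161, 166, 174, 153.5, 95,
            129, 169, 175, 154, 142, 156.5, 164]
height = [170.5, 170, 107, 159, 166, 175, 158, 95.5,
          132.5, 165, 179, 149, 143, 158, 161]

n = len(arm_span)
x_bar = statistics.mean(arm_span)
y_bar = statistics.mean(height)

sxx = sum((x - x_bar) ** 2 for x in arm_span)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(arm_span, height))

b1 = sxy / sxx           # estimated slope
b0 = y_bar - b1 * x_bar  # estimated intercept

# Residuals from the estimated line, then s = sqrt(SSE / (n - 2)).
residuals = [y - (b0 + b1 * x) for x, y in zip(arm_span, height)]
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Standard error of the slope, in two equivalent forms.
se_b1 = s / math.sqrt(sxx)
se_b1_alt = s / (math.sqrt(n - 1) * statistics.stdev(arm_span))

print(f"y-hat = {b0:.3f} + {b1:.4f} x")
print(f"s = {s:.3f}")
print(f"standard error of slope = {se_b1:.4f} (alternate form: {se_b1_alt:.4f})")
```

The two standard-error values should agree up to rounding, since `statistics.stdev` divides by n - 1.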