Chapter 14
Inference on the Least-Squares Regression Model and Multiple Regression
Copyright © 2013, 2010 and 2007 Pearson Education, Inc.
Requirement 1 for Inference on the
Least-Squares Regression Model
For any particular value of the explanatory variable x,
the mean of the corresponding responses in the
population depends linearly on x. That is,
$\mu_{y|x} = \beta_1 x + \beta_0$
for some numbers β0 and β1, where μy|x represents the
population mean response when the value of the
explanatory variable is x.
Requirement 2 for Inference on the
Least-Squares Regression Model
The response variables are normally distributed with
mean $\mu_{y|x} = \beta_1 x + \beta_0$ and standard deviation σ.
“In Other Words”
When doing inference on the least-squares
regression model, we require that (1) for any value of
the explanatory variable, x, the mean of the
response variable, y, depends on the value of x
through a linear equation, and (2) the response
variable, y, is normally distributed with a
constant standard deviation, σ. The mean
increases/decreases at a constant rate depending
on the slope, while the standard deviation
remains constant.
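The two requirements can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the parameter values `BETA0`, `BETA1`, `SIGMA` and the helper name `simulate_responses` are assumptions chosen for illustration. At each fixed x, the sample mean of the simulated responses should land near β1x + β0, and the sample standard deviation should stay near σ regardless of x.

```python
import random
import statistics

def simulate_responses(x, beta0, beta1, sigma, size, rng):
    """Draw `size` responses at a fixed x from Normal(beta1*x + beta0, sigma)."""
    mean_at_x = beta1 * x + beta0
    return [rng.gauss(mean_at_x, sigma) for _ in range(size)]

# Hypothetical population parameters (assumed for illustration only).
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.5

rng = random.Random(14)  # fixed seed so the run is repeatable
summary = {}
for x in (10, 20, 30):
    ys = simulate_responses(x, BETA0, BETA1, SIGMA, 20000, rng)
    # (sample mean, sample standard deviation) of the responses at this x
    summary[x] = (statistics.mean(ys), statistics.stdev(ys))
    print(x, round(summary[x][0], 2), round(summary[x][1], 2))
```

The printed means should change by about `BETA1` per unit of x, while the printed standard deviations should all be close to `SIGMA`.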
A large value of σ, the population standard
deviation, indicates that the data are widely
dispersed about the regression line, and a small
value of σ indicates that the data lie fairly close to
the regression line.
The least-squares regression model is given by
$y_i = \beta_1 x_i + \beta_0 + \varepsilon_i$
where
yi is the value of the response variable for the ith
individual
β0 and β1 are the parameters to be estimated based on
sample data
xi is the value of the explanatory variable for the ith
individual
εi is a random error term with mean 0 and variance
$\sigma_{\varepsilon_i}^2 = \sigma^2$; the error terms are independent.
i = 1, …, n, where n is the sample size (number of
ordered pairs in the data set)
The standard error of the estimate, se, is found
using the formula
$s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\dfrac{\sum \text{residuals}^2}{n - 2}}$
CAUTION!
Be sure to divide by n – 2 when computing the
standard error of the estimate.
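A minimal sketch of the computation in Python follows. The data set is made up for illustration, and the function names are assumptions. The slope and intercept come from the usual least-squares formulas, which this section takes as given.

```python
import math

def fit_least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b1*x + b0."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def standard_error_of_estimate(xs, ys):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    b0, b1 = fit_least_squares(xs, ys)
    sse = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))  # divide by n - 2, not n or n - 1

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_least_squares(xs, ys)
s_e = standard_error_of_estimate(xs, ys)
print(round(b1, 2), round(b0, 2), round(s_e, 4))
```

Two degrees of freedom are used up by estimating β0 and β1, which is why the divisor is n – 2.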
Hypothesis Test Regarding the
Slope Coefficient, β1
To test whether two quantitative variables are
linearly related, we use the following steps
provided that
1. the sample is obtained using random sampling.
2. the residuals are normally distributed with
constant error variance.
Step 1: Determine the null and alternative
hypotheses. The hypotheses can be
structured in one of three ways:
Two-Tailed      Left-Tailed     Right-Tailed
H0: β1 = 0      H0: β1 = 0      H0: β1 = 0
H1: β1 ≠ 0      H1: β1 < 0      H1: β1 > 0
Step 2: Select a level of significance, α, depending
on the seriousness of making a Type I
error.
Classical Approach
Step 3: Compute the test statistic
$t_0 = \dfrac{b_1 - \beta_1}{s_{b_1}} = \dfrac{b_1}{s_{b_1}}$
which follows Student’s t-distribution with
n – 2 degrees of freedom. Remember,
when computing the test statistic, we
assume the null hypothesis to be true. So,
we assume that β1 = 0. Use Table VI to
determine the critical value using n – 2
degrees of freedom.
Classical Approach
Step 4: Compare the critical value with the test
statistic.
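The comparison in Step 4 depends on which alternative hypothesis was chosen in Step 1. The sketch below uses the usual rejection regions for a t-test; the function name `reject_null` and the sample value 2.5 for the test statistic are assumptions made for illustration. The critical value is passed in as a positive number read from the t-table.

```python
def reject_null(t0, t_crit, tail):
    """Classical decision rule for H0: beta1 = 0.

    t_crit is the positive critical value with n - 2 degrees of freedom:
    t_{alpha/2} for a two-tailed test, t_alpha for a one-tailed test.
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if tail == "two":
        return abs(t0) > t_crit   # reject if t0 < -t_crit or t0 > t_crit
    if tail == "left":
        return t0 < -t_crit       # reject only for large negative t0
    if tail == "right":
        return t0 > t_crit        # reject only for large positive t0
    raise ValueError("tail must be 'two', 'left', or 'right'")

# Hypothetical test statistic, compared against a critical value of 2.228.
print(reject_null(2.5, 2.228, "two"))
```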
P-Value Approach
By Hand Step 3: Compute the test statistic
$t_0 = \dfrac{b_1 - \beta_1}{s_{b_1}} = \dfrac{b_1}{s_{b_1}}$
which follows Student’s t-distribution with
n – 2 degrees of freedom. Use Table VI to
approximate the P-value.
P-Value Approach
Step 4: If the P-value < α, reject the null
hypothesis.
CAUTION!
Before testing H0: β1 = 0, be sure to draw a
residual plot to verify that a linear model is
appropriate.
Parallel Example 5: Testing for a Linear Relation
Test the claim that there is a linear relation between drill
depth and drill time at the α = 0.05 level of significance
using the drilling data.
Solution
Verify the requirements:
• We assume that the experiment was randomized so
that the data can be assumed to represent a random
sample.
• In Parallel Example 4 we confirmed that the residuals
were normally distributed by constructing a normal
probability plot.
• To verify the requirement of constant error variance,
we plot the residuals against the explanatory variable,
drill depth.
There is no discernible pattern in the residual plot.
Solution
Step 1: We want to determine whether a linear relation
exists between drill depth and drill time without
regard to the sign of the slope. This is a two-tailed
test with
H0: β1 = 0 versus H1: β1 ≠ 0
Step 2: The level of significance is α = 0.05.
Step 3: Using technology, we obtained an estimate of β1 in
Parallel Example 2, b1= 0.0116. To determine the
standard deviation of b1, we compute $\sum (x_i - \bar{x})^2$.
The calculations are on the next slide.
Solution
Step 3, cont’d: We have
$s_{b_1} = \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}} = \dfrac{0.5197}{\sqrt{30006.25}} = 0.0030$
The test statistic is
$t_0 = \dfrac{b_1}{s_{b_1}} = \dfrac{0.0116}{0.003} = 3.867$
Solution: Classical Approach
Step 3, cont’d: Since this is a two-tailed test, we
determine the critical t-values at the α = 0.05
level of significance with n – 2 = 12 – 2 = 10
degrees of freedom to be –t0.025 = –2.228 and
t0.025 = 2.228.
Step 4: Since the value of the test statistic, 3.867, is
greater than 2.228, we reject the null
hypothesis.
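The arithmetic in Steps 3 and 4 can be checked with a few lines of Python. This is only a sketch using the summary values quoted in the example; the variable names are assumptions. It also shows that rounding the standard deviation of b1 to 0.0030 before dividing, as the slides do, changes the test statistic only in the fourth decimal place.

```python
import math

# Summary values quoted in the drilling example.
b1 = 0.0116      # slope estimate
s_e = 0.5197     # standard error of the estimate
sxx = 30006.25   # sum of (x_i - x_bar)^2
t_crit = 2.228   # t_{0.025} with 10 degrees of freedom

s_b1 = s_e / math.sqrt(sxx)        # standard deviation of b1
t0_unrounded = b1 / s_b1           # test statistic with full precision
t0_slides = b1 / round(s_b1, 4)    # slides round s_b1 to 0.0030 first

reject = abs(t0_unrounded) > t_crit  # two-tailed decision rule
print(round(s_b1, 4), round(t0_slides, 3), round(t0_unrounded, 3), reject)
```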
Solution: P-Value Approach
Step 3: Since this is a two-tailed test, the P-value is the
sum of the area under the t-distribution with 12 – 2
= 10 degrees of freedom to the left of –t0 = –3.867
and to the right of t0 = 3.867. Using Table VI we
find that with 10 degrees of freedom, the value
3.867 is between 3.581 and 4.144 corresponding to
right-tail areas of 0.0025 and 0.001, respectively.
Thus, the P-value is between 0.002 and 0.005.
Step 4: Since the P-value is less than the level of
significance, 0.05, we reject the null hypothesis.
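The table lookup only brackets the P-value. As a rough check, the tail area can also be approximated numerically. The sketch below is one possible approach, not the method used in the slides: it integrates the density of Student's t-distribution with Simpson's rule, using only the Python standard library. All function names are assumptions.

```python
import math

def t_density(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail_area(t, df, steps=2000):
    """P(T > t), via Simpson's rule on [0, |t|] and symmetry about 0."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    area = total * h / 3  # area under the density between 0 and |t|
    return 0.5 - area if t >= 0 else 0.5 + area

def p_value(t0, df, tail="two"):
    """P-value for the observed statistic t0 under H0."""
    if tail == "two":
        return 2 * upper_tail_area(abs(t0), df)
    if tail == "right":
        return upper_tail_area(t0, df)
    if tail == "left":
        return upper_tail_area(-t0, df)  # P(T < t0) = P(T > -t0)
    raise ValueError("tail must be 'two', 'left', or 'right'")

p = p_value(3.867, 10)  # drilling example: t0 = 3.867, 10 degrees of freedom
print(round(p, 4))
```

The result should fall inside the interval obtained from the table, between 0.002 and 0.005.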
Solution
Step 5: There is sufficient evidence at the α = 0.05
level of significance to conclude that a linear
relation exists between drill depth and drill
time.
Confidence Intervals for the Slope of the
Regression Line
A (1 – α)•100% confidence interval for the slope of the
true regression line, β1, is given by the following
formulas:
Lower bound: $b_1 - t_{\alpha/2} \cdot \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$
Upper bound: $b_1 + t_{\alpha/2} \cdot \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$
Here, tα/2 is computed using n – 2 degrees of freedom.
Note: The confidence interval formula for β1
can be computed only if the data are randomly
obtained, the residuals are normally
distributed, and there is constant error
variance.
Parallel Example 7: Constructing a Confidence Interval for
the Slope of the True Regression Line
Construct a 95% confidence interval for the slope of the
least-squares regression line for the drilling example.
Solution
The requirements for the usage of the confidence
interval formula were verified in previous
examples.
We also determined
• b1 = 0.0116
• $s_{b_1} = 0.0030$
in previous examples.
Solution
Since t0.025 = 2.228 for 10 degrees of freedom, we
have
Lower bound = 0.0116 – 2.228 • 0.003 = 0.0049
Upper bound = 0.0116 + 2.228 • 0.003 = 0.0183.
We are 95% confident that the mean increase in
the time it takes to drill 5 feet for each additional
foot of depth at which the drilling begins is
between 0.005 and 0.018 minutes.
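The same bounds can be reproduced from the summary statistics. This is a minimal sketch; the function name `slope_confidence_interval` is an assumption, and the critical value is taken from the t-table rather than computed.

```python
import math

def slope_confidence_interval(b1, s_e, sxx, t_crit):
    """Return (lower, upper) = b1 -/+ t_crit * s_e / sqrt(sxx)."""
    margin = t_crit * s_e / math.sqrt(sxx)
    return b1 - margin, b1 + margin

# Drilling example: b1, s_e, sum of (x_i - x_bar)^2, and t_{0.025} (10 df).
lower, upper = slope_confidence_interval(0.0116, 0.5197, 30006.25, 2.228)
print(round(lower, 4), round(upper, 4))
```

Because the interval does not contain 0, it agrees with the two-tailed test at α = 0.05, which rejected H0: β1 = 0.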
Section 14.2
Confidence and Prediction Intervals
Confidence intervals for a mean response are
intervals constructed about the predicted value of
y, at a given level of x, that are used to measure
the accuracy of the mean response of all the
individuals in the population.
Prediction intervals for an individual response
are intervals constructed about the predicted
value of y that are used to measure the accuracy
of a single individual’s predicted value.
Confidence Interval for the Mean Response of y, ŷ
A (1 – α)•100% confidence interval for ŷ, the mean
response of y for a specified value of x, is given by
Lower bound: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Upper bound: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{\dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where x* is the given value of the explanatory variable, n is
the number of observations, and tα/2 is the critical value
with n – 2 degrees of freedom.
Parallel Example 1: Constructing a Confidence Interval for a
Mean Response
Construct a 95% confidence interval about the predicted
mean time to drill 5 feet for all drillings started at a
depth of 110 feet.
Solution
The least squares regression line is $\hat{y} = 0.0116x + 5.5273$.
To find the predicted mean time to drill 5 feet for all
drillings started at 110 feet, let x* = 110 in the regression
equation and obtain $\hat{y} = 6.8033$.
Recall:
• $s_e = 0.5197$
• $\bar{x} = 126.25$
• $\sum (x_i - \bar{x})^2 = 30006.25$
• t0.025 = 2.228 for 10 degrees of freedom
Solution
Therefore,
Lower bound:
$6.8033 - 2.228 \cdot 0.5197 \sqrt{\dfrac{1}{12} + \dfrac{(110 - 126.25)^2}{30006.25}} = 6.45$
Upper bound:
$6.8033 + 2.228 \cdot 0.5197 \sqrt{\dfrac{1}{12} + \dfrac{(110 - 126.25)^2}{30006.25}} = 7.15$
Solution
We are 95% confident that the mean time to drill 5
feet for all drillings started at a depth of 110 feet is
between 6.45 and 7.15 minutes.
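A short Python sketch can reproduce this interval from the quantities recalled in the example. The function name `mean_response_interval` and its parameter names are assumptions made for illustration.

```python
import math

def mean_response_interval(y_hat, x_star, x_bar, sxx, n, s_e, t_crit):
    """Confidence interval for the mean response at x_star."""
    margin = t_crit * s_e * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Drilling example: predicted mean time at a depth of 110 feet.
y_hat = 0.0116 * 110 + 5.5273
lower, upper = mean_response_interval(
    y_hat, x_star=110, x_bar=126.25, sxx=30006.25, n=12, s_e=0.5197, t_crit=2.228
)
print(round(y_hat, 4), round(lower, 2), round(upper, 2))
```

The margin grows as x* moves away from the sample mean of x, so the interval is narrowest at x* = 126.25 for these data.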
Prediction Interval for an Individual Response about ŷ
A (1 – α)•100% prediction interval for ŷ, the individual response of
y, is given by
Lower bound: $\hat{y} - t_{\alpha/2} \cdot s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
Upper bound: $\hat{y} + t_{\alpha/2} \cdot s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$
where x* is the given value of the explanatory variable, n is
the number of observations, and tα/2 is the critical value with
n – 2 degrees of freedom.
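One way to see where the extra 1 under the square root comes from is to compare the two variances involved. This is a sketch under the model assumptions stated earlier (independent errors with common variance σ²). The notation $y_{\text{new}}$, for a new individual response observed at x*, is introduced here for illustration only.

```latex
\begin{aligned}
\operatorname{Var}(\hat{y}) &= \sigma^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right] \\
\operatorname{Var}(y_{\text{new}} - \hat{y}) &= \sigma^2 + \operatorname{Var}(\hat{y})
  = \sigma^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}\right]
\end{aligned}
```

Replacing σ with s_e and taking square roots gives the standard errors used in the confidence interval for the mean response and in the prediction interval, respectively. The additional σ² is the variability of a single response about its mean, which is why the prediction interval is always the wider of the two.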
Parallel Example 2: Constructing a Prediction Interval for an
Individual Response
Construct a 95% prediction interval about the predicted
time to drill 5 feet for a single drilling started at a depth
of 110 feet.
Solution
The least squares regression line is
$\hat{y} = 0.0116x + 5.5273$. To find the predicted
mean time to drill 5 feet for all drillings started at
110 feet, let x* = 110 in the regression equation and
obtain $\hat{y} = 6.8033$.
Recall:
• $s_e = 0.5197$
• $\bar{x} = 126.25$
• $\sum (x_i - \bar{x})^2 = 30006.25$
• t0.025 = 2.228 for 10 degrees of freedom
Solution
Therefore,
Lower bound:
$6.8033 - 2.228 \cdot 0.5197 \sqrt{1 + \dfrac{1}{12} + \dfrac{(110 - 126.25)^2}{30006.25}} = 5.59$
Upper bound:
$6.8033 + 2.228 \cdot 0.5197 \sqrt{1 + \dfrac{1}{12} + \dfrac{(110 - 126.25)^2}{30006.25}} = 8.01$
Solution
We are 95% confident that the time to drill 5
feet for a random drilling started at a depth of 110
feet is between 5.59 and 8.01 minutes.
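To compare the two kinds of interval at the same depth, the sketch below computes both from the summary statistics of the drilling example. The function name `interval_about_prediction` and the `individual` flag are assumptions made for illustration. The flag adds the extra 1 under the square root that distinguishes a prediction interval from a confidence interval for the mean response.

```python
import math

def interval_about_prediction(y_hat, x_star, x_bar, sxx, n, s_e, t_crit, individual):
    """Interval about y_hat: prediction interval if individual, else mean-response CI."""
    extra = 1.0 if individual else 0.0  # extra variance of a single response
    margin = t_crit * s_e * math.sqrt(extra + 1 / n + (x_star - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat + margin

# Summary values from the drilling example at a depth of 110 feet.
drill = dict(y_hat=6.8033, x_star=110, x_bar=126.25, sxx=30006.25,
             n=12, s_e=0.5197, t_crit=2.228)

pred_lower, pred_upper = interval_about_prediction(**drill, individual=True)
mean_lower, mean_upper = interval_about_prediction(**drill, individual=False)
print(round(pred_lower, 2), round(pred_upper, 2))
print(round(mean_lower, 2), round(mean_upper, 2))
```

Both intervals are centered at the same predicted value, but the prediction interval is noticeably wider because a single drilling varies about the mean drilling time.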