Chapter 16: Inference for Regression
Copyright © 2012 Pearson Education. All rights reserved.
Inference for Regression
In Chapter 6, we modeled relationships by fitting a straight line to a sample of ordered pairs. The Nambé Mills regression line is
Price = 4.871 + 4.200 × Time
Now we want to know: how useful is this model?
16.1 The Population and the Sample
The Nambé Mills sample is based on 59 observations.
But we know observations vary from sample to sample. So we
imagine a true line that summarizes the relationship between x
and y for the entire population,
μy = β0 + β1x,
where μy is the population mean of y at a given value of x.
We write µy instead of y because the regression line assumes
that the means of the y values for each value of x fall exactly on
the line.
16.1 The Population and the Sample
For a given value x:
• Most, if not all, of the y values obtained from a particular sample will not lie on the line.
• The sampled y values will be distributed about μy.
• We can account for the difference between an individual y and μy by adding an error term, ε: y = β0 + β1x + ε
16.1 The Population and the Sample
Regression Inference
• Collect a sample and estimate the population β's by finding a regression line (Chapter 6):
ŷ = b0 + b1x,
where b0 estimates β0 and b1 estimates β1.
• The residuals e = y − ŷ are the sample-based versions of ε.
• Account for the uncertainties in β0 and β1 by making confidence intervals, as we've done for means and proportions.
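To make this workflow concrete, here is a minimal Python sketch (hypothetical x and y arrays, scipy assumed available) that fits the sample line ŷ = b0 + b1x and computes the residuals e = y − ŷ:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of (x, y) pairs, stand-ins for data such as Time and Price.
x = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0])
y = np.array([38.0, 62.0, 79.0, 104.0, 121.0, 148.0, 160.0])

# Fit the least-squares line: b0 estimates beta0, b1 estimates beta1.
fit = stats.linregress(x, y)
b0, b1 = fit.intercept, fit.slope

# The residuals are the sample-based versions of the model errors (epsilon).
y_hat = b0 + b1 * x
residuals = y - y_hat

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("residuals:", np.round(residuals, 2))
```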
16.2 Assumptions and Conditions
The inference methods of Chapter 16 are based on these
assumptions (check these assumptions in this order):
1. Linearity Assumption
2. Independence Assumption
3. Equal Variance Assumption
4. Normal Population Assumption
16.2 Assumptions and Conditions
The inference methods of Chapter 16 are based on these
assumptions (check these assumptions in this order):
1. Linearity Assumption – This condition is satisfied if the
scatterplot of x and y looks straight.
2. Independence Assumption – Look for randomization in the
sample or the experiment. Also check the residual plot for
lack of patterns.
16.2 Assumptions and Conditions
3. Equal Variance Assumption – Check the Equal Spread
Condition, which means the variability of y should be about
the same for all values of x.
4. Normal Population Assumption – Assume the errors
around the idealized regression line at each value of x follow
a Normal model. Check if the residuals satisfy the Nearly
Normal Condition.
16.2 Assumptions and Conditions
Summary of Assumptions and Conditions
1. Make a scatterplot of the data to check for linearity.
(Linearity Assumption)
2. Fit a regression and find the residuals, e, and predicted
values ŷ.
3. Plot the residuals against time (if appropriate) and check
for evidence of patterns (Independence Assumption).
4. Make a scatterplot of the residuals against x or the
predicted values. This plot should not exhibit a “fan” or
“cone” shape. (Equal Variance Assumption)
16.2 Assumptions and Conditions
Testing the Assumptions, continued
5. Make a histogram and Normal probability plot of the
residuals (Normal Population Assumption)
[Figure: histogram and Normal probability plot of residuals; data from Nambé Mills (Chapter 8)]
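A sketch of how checks 1-5 might be scripted in Python (matplotlib and scipy assumed; regression_diagnostics is an illustrative helper, not something from the text):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def regression_diagnostics(x, y):
    """Draw the standard plots used to check the regression assumptions."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    fit = stats.linregress(x, y)
    predicted = fit.intercept + fit.slope * x
    e = y - predicted                      # residuals

    fig, ax = plt.subplots(2, 2, figsize=(9, 7))
    ax[0, 0].scatter(x, y)                 # 1. Linearity: does the plot look straight?
    ax[0, 0].set_title("y vs. x")
    ax[0, 1].scatter(predicted, e)         # 3-4. Independence / Equal Spread: no pattern, no fan shape
    ax[0, 1].axhline(0, color="gray")
    ax[0, 1].set_title("Residuals vs. predicted values")
    ax[1, 0].hist(e, bins=10)              # 5. Nearly Normal: roughly unimodal and symmetric?
    ax[1, 0].set_title("Histogram of residuals")
    stats.probplot(e, plot=ax[1, 1])       # 5. Normal probability plot of the residuals
    plt.tight_layout()
    plt.show()
```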
16.3 The Standard Error of the Slope
For any one sample, we expect b1 to be close to, but not equal to, the model slope β1. From sample to sample b1 varies, and the standard error of the slope measures this variability of b1 about the true slope β1.
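The chapter summary gives the formula SE(b1) = se / (sx · √(n − 1)). A small Python sketch of that computation (slope_standard_error is a hypothetical helper name, and the data are illustrative):

```python
import numpy as np

def slope_standard_error(x, y):
    """SE(b1) = s_e / (s_x * sqrt(n - 1)), the formula used in this chapter."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    e = y - (b0 + b1 * x)                      # residuals
    s_e = np.sqrt(np.sum(e ** 2) / (n - 2))    # standard deviation of the residuals
    s_x = np.std(x, ddof=1)                    # standard deviation of x
    return s_e / (s_x * np.sqrt(n - 1))

# Illustrative data: more spread in x and a larger n both shrink SE(b1).
print(slope_standard_error([1, 2, 3, 4, 5, 6], [2.0, 4.1, 5.9, 8.2, 9.8, 12.1]))
```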
16.3 The Standard Error of the Slope
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? (Hint: compare the se's.)
16.3 The Standard Error of the Slope
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? (Hint: compare the sx's.)
16.3 The Standard Error of the Slope
Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population? (Hint: compare the n's.)
16.4 A Test for the Regression Slope
The usual null hypothesis about the slope is that it’s equal to 0.
Why? A slope of zero says that y doesn’t tend to change linearly
when x changes. In other words, if the slope equals zero, there
is no linear association between the two variables.
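A sketch of how this test is usually carried out (hypothetical data; scipy's linregress reports the slope, its standard error, and the two-sided P-value for H0: β1 = 0):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)      # hypothetical predictor
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9])   # hypothetical response

fit = stats.linregress(x, y)
t_ratio = fit.slope / fit.stderr          # t = (b1 - 0) / SE(b1), with n - 2 df
print(f"b1 = {fit.slope:.3f}, SE(b1) = {fit.stderr:.3f}")
print(f"t = {t_ratio:.2f}, two-sided P-value = {fit.pvalue:.4g}")
```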
16.4 A Test for the Regression Slope
Example: Soap
A soap manufacturer tested a standard bar of soap to see how long it would last. A test subject showered with the soap each day for 15 days and recorded the weight (in grams) remaining. Conditions were met so a linear regression gave the following:

Dependent variable is: Weight
R squared = 99.5%
s = 2.949

Variable     Coefficient   SE(Coeff)   t-ratio   P-value
Intercept    123.141       1.382        89.1     <0.0001
Day           -5.57476     0.1068      -52.2     <0.0001
What is the standard deviation of the residuals?
What is the standard error of b1?
What are the hypotheses for the regression slope?
At α = 0.05, what is the conclusion?
16.4 A Test for the Regression Slope
Example: Soap (continued)
What is the standard deviation of the residuals? se = 2.949
What is the standard error of b1? SE(b1) = 0.1068
16.4 A Test for the Regression Slope
Example: Soap (continued)
What are the hypotheses for the regression slope?
H0: β1 = 0 vs. HA: β1 ≠ 0
At α = 0.05, what is the conclusion? Since the P-value is small (<0.0001), reject the null hypothesis. There is strong evidence of a linear relationship between Weight and Day.
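As a quick check on the regression output, the t-ratio and P-value for Day can be reproduced from the coefficient, its standard error, and n − 2 = 13 degrees of freedom (a Python sketch):

```python
from scipy import stats

b1, se_b1, n = -5.57476, 0.1068, 15
t = (b1 - 0) / se_b1                       # t-ratio for H0: beta1 = 0
p = 2 * stats.t.sf(abs(t), df=n - 2)       # two-sided P-value with 13 df
print(f"t = {t:.1f}, P-value = {p:.2e}")   # t is about -52.2; P is far below 0.0001
```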
16.4 A Test for the Regression Slope
Example: Soap (continued)
Find a 95% confidence interval for the slope.
Interpret the 95% confidence interval for the slope.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion?
16.4 A Test for the Regression Slope
Example: Soap (continued)
Find a 95% confidence interval for the slope.
b1 ± t* × SE(b1) = -5.57476 ± (2.160)(0.1068) = (-5.805, -5.344)
Interpret the 95% confidence interval for the slope. We can be 95% confident that the weight of the soap decreases by between 5.34 and 5.81 grams per day.
At α = 0.05, is the confidence interval consistent with the hypothesis test conclusion? Yes; the interval does not contain zero, so we reject the null hypothesis.
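A sketch reproducing this interval in Python (the critical value t* ≈ 2.160 comes from the t-distribution with n − 2 = 13 degrees of freedom):

```python
from scipy import stats

b1, se_b1, n = -5.57476, 0.1068, 15
t_star = stats.t.ppf(0.975, df=n - 2)              # about 2.160 for 13 df
lower, upper = b1 - t_star * se_b1, b1 + t_star * se_b1
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")   # about (-5.805, -5.344)
```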
16.5 A Hypothesis Test for Correlation
What if we want to test whether the correlation between
x and y is 0?
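The chapter summary gives the test statistic t = r · √[(n − 2)/(1 − r²)], which has n − 2 degrees of freedom when the true correlation is zero. A Python sketch with a hypothetical r and n:

```python
import math
from scipy import stats

def correlation_t_test(r, n):
    """Two-sided test of H0: true correlation = 0, via t = r * sqrt((n - 2) / (1 - r**2))."""
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p

t, p = correlation_t_test(r=0.45, n=30)    # hypothetical sample correlation and size
print(f"t = {t:.2f}, P-value = {p:.4f}")
```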
16.6 Standard Errors for Predicted Values
SE becomes larger the further xν gets from x̄. That is, the confidence interval broadens as you move away from x̄.
16.6 Standard Errors for Predicted Values
SE, and hence the confidence interval, becomes smaller with increasing n. SE, and hence the confidence interval, is larger for samples with more spread around the line (that is, when se is larger).
16.6 Standard Errors for Predicted Values
Because of the extra se² term, the confidence interval for individual values is broader than the interval for the predicted mean value.
16.7 Using Confidence and Prediction Intervals
Confidence interval for a mean:
ŷν ± t*_{n−2} · √[ SE²(b1) · (xν − x̄)² + se²/n ]
The result μ̂ν = 4.55 ± 0.15 at 95% (at xν = 10.1) means "We are 95% confident that the mean value of y is between 4.40 and 4.70 when x = 10.1."
16.7 Using Confidence and Prediction Intervals
Prediction interval for an individual value:
ŷν ± t*_{n−2} · √[ SE²(b1) · (xν − x̄)² + se²/n + se² ]
The result ŷν = 4.55 ± 0.60 at 95% (at xν = 10.1) means "We are 95% confident that a single measurement of y will be between 3.95 and 5.15 when x = 10.1."
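The two margins of error differ only by the extra se² under the square root. A Python sketch encoding both formulas (mean_interval and prediction_interval are illustrative names; the inputs are the regression summaries used in this section):

```python
import math
from scipy import stats

def mean_interval(y_hat, se_b1, s_e, n, x_nu, x_bar, level=0.95):
    """Confidence interval for the mean response at x_nu."""
    t_star = stats.t.ppf((1 + level) / 2, df=n - 2)
    se_mu = math.sqrt(se_b1 ** 2 * (x_nu - x_bar) ** 2 + s_e ** 2 / n)
    return y_hat - t_star * se_mu, y_hat + t_star * se_mu

def prediction_interval(y_hat, se_b1, s_e, n, x_nu, x_bar, level=0.95):
    """Prediction interval for a single new y at x_nu (note the extra s_e**2 term)."""
    t_star = stats.t.ppf((1 + level) / 2, df=n - 2)
    se_y = math.sqrt(se_b1 ** 2 * (x_nu - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)
    return y_hat - t_star * se_y, y_hat + t_star * se_y
```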
16.7 Using Confidence and Prediction Intervals
Example: External Hard Disks
A study of external disk drives reveals a linear relationship between the Capacity (in GB) and the Price (in $). Regression resulted in the following:
Price = 18.64 + 0.104 × Capacity,  se = 17.95,  and SE(b1) = 0.0051
Find the predicted Price of a 1000 GB hard drive.
Find the 95% confidence interval for the mean Price of all 1000 GB
hard drives.
Find the 95% prediction interval for the Price of one 1000 GB hard
drive.
16.7 Using Confidence and Prediction Intervals
Example: External Hard Disks (continued)
Find the predicted Price of a 1000 GB hard drive.
Price = 18.64 + 0.104(1000) = $122.64
16.7 Using Confidence and Prediction Intervals
Example: External Hard Disks (continued)
Find the 95% confidence interval for the mean Price of all 1000 GB
hard drives.
ŷν ± t*_{n−2} · √[ SE²(b1) · (xν − x̄)² + se²/n ]
= $122.64 ± 2.571 × √[ (0.0051)² × (1000 − 1110)² + (17.95)²/7 ]
= $122.64 ± $17.50 = ($105.14, $140.14)
16.7 Using Confidence and Prediction Intervals
Example: External Hard Disks (continued)
Find the 95% prediction interval for the Price of one 1000 GB hard drive.
ŷν ± t*_{n−2} · √[ SE²(b1) · (xν − x̄)² + se²/n + se² ]
= $122.64 ± 2.571 × √[ (0.0051)² × (1000 − 1110)² + (17.95)²/7 + (17.95)² ]
= $122.64 ± $49.36 = ($73.28, $172.00)
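Plugging the hard-drive summaries into these formulas reproduces the ±$17.50 margin for the mean and shows how much wider the interval for a single drive is (a self-contained Python sketch):

```python
import math
from scipy import stats

# Regression summaries from the hard-drive example above.
y_hat, se_b1, s_e, n, x_nu, x_bar = 122.64, 0.0051, 17.95, 7, 1000, 1110
t_star = stats.t.ppf(0.975, df=n - 2)                 # about 2.571 for 5 df

se_mean = math.sqrt(se_b1 ** 2 * (x_nu - x_bar) ** 2 + s_e ** 2 / n)
se_single = math.sqrt(se_b1 ** 2 * (x_nu - x_bar) ** 2 + s_e ** 2 / n + s_e ** 2)

print(f"mean Price:   {y_hat:.2f} +/- {t_star * se_mean:.2f}")    # about +/- 17.50
print(f"single Price: {y_hat:.2f} +/- {t_star * se_single:.2f}")  # about +/- 49.4
```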
What Can Go Wrong?
• Don't fit a linear regression to data that aren't straight.
• Watch out for changing spread.
• Watch out for non-Normal errors. Check the histogram and the Normal probability plot.
• Watch out for extrapolation. It is always dangerous to predict for x-values that lie far away from the center of the data.
• Watch out for high-influence points and unusual observations.
• Watch out for one-tailed tests. Most software packages perform only two-tailed tests. Adjust your P-values accordingly.
What Have We Learned?
• Apply your understanding of inference for means using Student's t to inference about regression coefficients.
• Know the Assumptions and Conditions for inference about regression coefficients and how to check them, in this order:
  1. Linearity
  2. Independence
  3. Equal Variance
  4. Normality
What Have We Learned?
• Know the components of the standard error of the slope coefficient:
  - The standard deviation of the residuals, se = √[ Σ(y − ŷ)² / (n − 2) ]
  - The standard deviation of x, sx = √[ Σ(x − x̄)² / (n − 1) ]
  - The sample size, n
What Have We Learned?
• Be able to find and interpret the standard error of the slope:
  SE(b1) = se / (sx · √(n − 1))
  - The standard error of the slope is the estimated standard deviation of the sampling distribution of the slope.
What Have We Learned?
• State and test the standard null hypothesis on the slope.
  - H0: β1 = 0. This would mean that x and y are not linearly related.
  - We test this null hypothesis using the t-statistic t = (b1 − 0) / SE(b1).
What Have We Learned?
• Know how to use a t-test to test whether the true correlation is zero:
  t = r · √[ (n − 2) / (1 − r²) ]
• Construct and interpret a confidence interval for the predicted mean value corresponding to a specified value, xν:
  ŷν ± t*_{n−2} · SE(μ̂ν), where SE(μ̂ν) = √[ SE²(b1) · (xν − x̄)² + se²/n ]
What Have We Learned?
• Construct and interpret a confidence interval for an individual predicted value corresponding to a specified value, xν:
  ŷν ± t*_{n−2} · SE(ŷν), where SE(ŷν) = √[ SE²(b1) · (xν − x̄)² + se²/n + se² ]