Chapter 12.4: Estimation and Prediction for a New Value of x
Instructor: Dr. Arnab Maity
So far we have learned
• What the simple linear regression model is (y = β0 + β1 x + e)
• How to interpret model parameters
• How to estimate model parameters (least squares)
• Inference for slope β1 (t-test and CI, ANOVA)
In this chapter, we will learn how to predict the value of y when we are given a value of x.
We will also learn how to do inference about the predicted value.
Example: Corrosion of steel reinforcing bars is the most important durability problem for
reinforced concrete structures. Carbonation of concrete results from a chemical reaction that
lowers the pH value by enough to initiate corrosion of the rebar. Representative data on x =
carbonation depth (mm) and y = strength (MPa) for a sample of core specimens taken from
a particular building follows (read from a plot in the article “The Carbonation of Concrete
Structures in the Tropical Environment of Singapore,” Magazine of Concrete Res., 1996:
293-300). Data are provided in Example 12.13.
Simple linear regression results:
Dependent Variable: strength
Independent Variable: carbonation_depth
strength = 27.182936 - 0.29756123 carbonation_depth
Sample size: 18
R (correlation coefficient) = -0.87497382
R-sq = 0.76557918
Estimate of error standard deviation: 2.864026
Parameter estimates:
Parameter   Estimate   Std. Err.   DF   95% L. Limit   95% U. Limit
Intercept   27.183     1.651       16   23.682         30.684
Slope       -0.298     0.041       16   -0.385         -0.210
Question: For a given value of the covariate x = 37, what is the expected value of y?
Recall that the “true” regression model is y = β0 + xβ1 + e, where E(e) = 0. Hence we have
E(y) = β0 + xβ1 .
For a given value of the covariate x = 37, we use the formula above and see
E(y|x = 37) = β0 + 37 · β1 .
As before, we replace the unknown parameters with their corresponding least squares estimates β̂0 and β̂1 and obtain
Ê(y|x = 37) = β̂0 + 37 · β̂1 .
In this example, using the unrounded estimates from the output above, we have Ê(y|x = 37) = 27.182936 + 37(−0.29756123) ≈ 16.173.
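The arithmetic can be checked with a short Python sketch. The two coefficients are the unrounded estimates from the regression output; the function name `estimated_mean` is only illustrative.

```python
# Least squares estimates copied from the regression output.
BETA0_HAT = 27.182936    # intercept
BETA1_HAT = -0.29756123  # slope

def estimated_mean(x_star, b0=BETA0_HAT, b1=BETA1_HAT):
    """Plug-in estimate of E(y | x = x_star) = b0 + b1 * x_star."""
    return b0 + b1 * x_star

mean_at_37 = estimated_mean(37)
print(round(mean_at_37, 3))  # 16.173
```

Using the rounded coefficients 27.183 and −0.298 instead gives a slightly different value, so it is safer to carry the full precision through the calculation.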
Question: We have just obtained a point estimate. This does not tell us how precisely the
mean has been estimated. Can we construct a confidence interval?
Estimation of expected value of y for a given value of x = x∗
For a given x = x∗ , we want to estimate µy|x∗ = E(y|x∗ ) = β0 + x∗ β1 .
We can estimate µy|x∗ by µ̂y|x∗ = β̂0 + x∗ β̂1
• The estimator µ̂y|x∗ is a random variable as it will take different values based on different
samples.
• The mean value of µ̂y|x∗ is E(µ̂y|x∗ ) = β0 + x∗ β1 (= µy|x∗ ). Therefore it is an unbiased
estimator for µy|x∗ .
• The variance of µ̂y|x∗ is
$$V(\hat{\mu}_{y|x^*}) = \sigma^2 \left[ \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right],$$
where $S_{XX} = \sum_i (x_i - \bar{x})^2$.
• The standard error of µ̂y|x∗ is
$$SE(\hat{\mu}_{y|x^*}) = \sqrt{ \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right] }.$$
• µ̂y|x∗ has a normal distribution.
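As a rough check of the standard error formula, the sketch below evaluates it at x∗ = 37 for the corrosion data. Two things here are assumptions rather than part of the notes: SXX is not reported directly, so it is recovered from the standard identity SE(β̂1) = σ̂/√SXX, and the unrounded slope standard error (0.041164172) and the mean carbonation depth (36.611111) are taken from the StatCrunch output for the same data set. The function name `se_mean` is illustrative.

```python
import math

# Values copied from the StatCrunch output for the corrosion data.
N = 18                   # sample size
SIGMA_HAT = 2.864026     # estimate of error standard deviation
SE_SLOPE = 0.041164172   # standard error of the slope
X_BAR = 36.611111        # mean carbonation depth

# Assumption: SXX recovered from SE(slope) = sigma_hat / sqrt(SXX).
SXX = (SIGMA_HAT / SE_SLOPE) ** 2

def se_mean(x_star):
    """Standard error of the estimated mean of y at x = x_star."""
    return SIGMA_HAT * math.sqrt(1 / N + (x_star - X_BAR) ** 2 / SXX)

se_at_37 = se_mean(37)
print(round(se_at_37, 3))
```

The result should agree, up to rounding, with the value s.e.(Pred. y) that StatCrunch reports for x = 37.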
Looking at the standard error of µ̂y|x∗ , we see that
• A large error variance σ² results in less accurate estimation (large SE) of µy|x∗ .
• The estimator µ̂y|x∗ is more precise (smaller variance) when x∗ is near x̄ than when
x∗ is far from x̄.
Inferences concerning µy|x∗
• The variable
$$T = \frac{\hat{\mu}_{y|x^*} - \mu_{y|x^*}}{SE(\hat{\mu}_{y|x^*})}$$
has a t distribution with n − 2 degrees of freedom.
• A 100(1 − α)% confidence interval for µy|x∗ can be constructed as
µ̂y|x∗ ± tα/2,n−2 SE(µ̂y|x∗ ).
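A minimal sketch of this interval for the corrosion data at x∗ = 37 follows. The point estimate and standard error are the values StatCrunch reports for this example, and the critical value is the rounded t0.025,16 = 2.12 (n − 2 = 16 degrees of freedom); the helper name `confidence_interval` is illustrative.

```python
# Values reported by StatCrunch for x* = 37 in the corrosion example.
MEAN_HAT = 16.173171   # estimated mean of y at x* = 37
SE_MEAN = 0.67524719   # standard error of that estimate
T_CRIT = 2.12          # rounded t critical value, t_{0.025,16}

def confidence_interval(estimate, se, t_crit):
    """Return (lower, upper) limits: estimate -/+ t_crit * se."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width

ci_low, ci_high = confidence_interval(MEAN_HAT, SE_MEAN, T_CRIT)
print(round(ci_low, 2), round(ci_high, 2))  # 14.74 17.6
```

Because the critical value is rounded to 2.12, the limits match the software output only to about two decimal places.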
In practice, we can compute such estimates and construct such confidence intervals for a
grid of x values. As a result we obtain a “point-wise” confidence band for the entire
regression line. For the corrosion study data, we see the results below.
Predicted values:
X value   Pred. Y     s.e.(Pred. y)   95% C.I. for mean        95% P.I. for new
37        16.173171   0.67524719      (14.741711, 17.604631)   (9.9352421, 22.411099)
From the confidence band (green lines around the regression line), we see that the interval
is narrower near the center of the x values (i.e., closer to x̄) than near the boundaries.
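The narrowing of the band near x̄ can also be seen numerically. The sketch below evaluates the 95% confidence interval on a small grid of x values. The grid is hypothetical and chosen only for illustration (it is not checked against the observed range of carbonation depths), SXX is recovered from the identity SE(β̂1) = σ̂/√SXX, which is an assumption not stated in the notes, and t0.025,16 is rounded to 2.12.

```python
import math

# Values copied from the StatCrunch output for the corrosion data.
N = 18
SIGMA_HAT = 2.864026
SE_SLOPE = 0.041164172
X_BAR = 36.611111
BETA0_HAT = 27.182936
BETA1_HAT = -0.29756123
T_CRIT = 2.12  # rounded t_{0.025,16}

# Assumption: SXX recovered from SE(slope) = sigma_hat / sqrt(SXX).
SXX = (SIGMA_HAT / SE_SLOPE) ** 2

def confidence_band(x_star):
    """95% confidence limits for the mean of y at x = x_star."""
    fit = BETA0_HAT + BETA1_HAT * x_star
    se = SIGMA_HAT * math.sqrt(1 / N + (x_star - X_BAR) ** 2 / SXX)
    return fit - T_CRIT * se, fit + T_CRIT * se

# Hypothetical grid of x values, including x-bar itself.
GRID = [20.0, 30.0, X_BAR, 45.0, 55.0]
widths = {}
for x in GRID:
    low, high = confidence_band(x)
    widths[x] = high - low
    print(f"x = {x:6.2f}  band = ({low:6.2f}, {high:6.2f})  width = {widths[x]:5.2f}")
```

The printed widths are smallest at x̄ and grow as x moves away from it in either direction.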
In many cases, we are interested in obtaining an interval of plausible values of y associated with some future observation when the predictor variable has value x = x∗ . Notice
that now we are not estimating the mean of y given a value x∗ . Rather, we are trying to
predict a single value of y when x = x∗ .
Such a value is called a prediction of y for x = x∗ . An interval of such plausible values of y
is called a prediction interval.
Prediction of y for a given value of x = x∗
For a given x = x∗ , we can predict a future single value by ŷ = β̂0 + x∗ β̂1 .
• The variance of the prediction error is
$$\sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right],$$
where $S_{XX} = \sum_i (x_i - \bar{x})^2$.
• A 100(1 − α)% prediction interval for a future y when x = x∗ is
$$\hat{y} \pm t_{\alpha/2,\,n-2} \sqrt{ \hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}} \right] }.$$
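A minimal sketch of the prediction interval for the corrosion data at x∗ = 37 follows. It relies on the fact that σ̂²[1 + 1/n + (x∗ − x̄)²/SXX] equals σ̂² plus the squared standard error of the estimated mean, so the values reported by StatCrunch can be reused directly. The critical value is the rounded t0.025,16 = 2.12, and the variable names are illustrative.

```python
import math

# Values reported by StatCrunch for the corrosion example at x* = 37.
Y_HAT = 16.173171      # predicted value of y
SE_MEAN = 0.67524719   # standard error of the estimated mean
SIGMA_HAT = 2.864026   # estimate of error standard deviation
T_CRIT = 2.12          # rounded t critical value, t_{0.025,16}

# sigma_hat^2 * [1 + 1/n + (x* - xbar)^2 / SXX] = sigma_hat^2 + SE_MEAN^2
se_pred = math.sqrt(SIGMA_HAT ** 2 + SE_MEAN ** 2)

pi_low = Y_HAT - T_CRIT * se_pred
pi_high = Y_HAT + T_CRIT * se_pred
print(round(pi_low, 2), round(pi_high, 2))
```

The limits should agree with the 95% P.I. in the software output to about two decimal places; the small difference comes from rounding the critical value.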
We see that
• The estimated mean of y and the predicted value of y when x = x∗ are the
same (both are β̂0 + x∗ β̂1 ).
• The variability in the prediction is larger than the variability of the estimation of mean.
This results in the prediction interval being wider than the confidence interval.
• As with the confidence interval for the mean, we get better prediction accuracy (less variability) when x∗ is close to x̄.
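The extra "1" in the prediction variance can be traced to the error of the future observation itself. The following is a sketch of the reasoning, assuming the future error e_new has variance σ² and is independent of the sample used to fit the line (the usual model assumption, not spelled out in these notes); the subscript "new" is notation introduced here for illustration.

```latex
\begin{aligned}
y_{\text{new}} - \hat{y}
  &= (\beta_0 + x^*\beta_1 + e_{\text{new}}) - (\hat{\beta}_0 + x^*\hat{\beta}_1), \\
V(y_{\text{new}} - \hat{y})
  &= V(e_{\text{new}}) + V(\hat{\mu}_{y|x^*}) \\
  &= \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}\right]
   = \sigma^2\left[1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{XX}}\right].
\end{aligned}
```

The first term does not shrink as n grows, which is why a prediction interval stays wide even when the mean is estimated very precisely.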
We show the prediction results for corrosion data below.
1. Recall the Arsenic data example discussed in the previous lecture. We saw data on
x = pH and y = arsenic removed (%) by a particular process. Data for this example are
shown in the book (Example 12.2). Here is the StatCrunch output.
Summary statistics:
Column            n    Mean        Variance    Std. dev.   Std. err.
pH                18   8.4833333   1.0159647   1.0079507   0.23757627
Arsenic removed   18   37.277778   365.74183   19.124378   4.5076591
Simple linear regression results:
Dependent Variable: Arsenic removed
Independent Variable: pH
Arsenic removed = 190.26829 - 18.034245 pH
Sample size: 18
R (correlation coefficient) = -0.95049529
R-sq = 0.9034413
Estimate of error standard deviation: 6.1255839
Parameter estimates:
Parameter   Estimate     Std. Err.   DF   95% L. Limit   95% U. Limit
Intercept   190.26829    12.587118   16   163.58479      216.95179
Slope       -18.034245   1.4739533   16   -21.158887     -14.909604
(a) Estimate the mean arsenic removal percentage when pH = 8.5, and construct a
95% confidence interval. Hint: t0.025,16 = 2.12.
(b) Predict the arsenic removal percentage that would be observed in a future water
sample with pH = 7.5, and construct a 95% prediction interval.
(c) Would you recommend predicting arsenic removal percentage for a pH of 6.5?
Explain.
2. Carbonation of concrete results from a chemical reaction that lowers the pH value by
enough to initiate corrosion of steel reinforcing bars. Representative data on x = carbonation depth (mm) and y = strength (MPa) for a sample of 18 core specimens
were taken (“The Carbonation of Concrete Structures in the Tropical Environment of
Singapore,” Magazine of Concrete Res., 1996: 293-300). Here is the StatCrunch
output.
Summary statistics:
Column              n    Mean
Carbonation depth   18   36.611111
Strength            18   16.288889
Simple linear regression results:
Dependent Variable: Strength
Independent Variable: Carbonation depth
Strength = 27.182936 - 0.29756123 Carbonation depth
Sample size: 18
R (correlation coefficient) = -0.87497382
R-sq = 0.76557918
Estimate of error standard deviation: 2.864026
Parameter estimates:
Parameter   Estimate      Std. Err.
Intercept   27.182936     1.6513481
Slope       -0.29756123   0.041164172
(a) Estimate the mean strength for all core specimens having a carbonation depth of
45 and construct a 95% confidence interval.
(b) Predict the strength of a single specimen having a carbonation depth of 45, and
construct a 95% prediction interval.