Outline
Introduction to Confidence Intervals
CIs for a Mean and a Proportion
CIs for the Regression Parameters
The issue of Precision
Prediction Intervals
Lesson 8
Chapter 7: Confidence and Prediction Intervals
Michael Akritas
Department of Statistics
The Pennsylvania State University
1. Introduction to Confidence Intervals
2. CIs for a Mean and a Proportion
3. CIs for the Regression Parameters
4. The issue of Precision
5. Prediction Intervals
Bounding the Error of Estimation
By the CLT, if n is large, most estimators, $\hat\theta$, are (at least
approximately) normally distributed, with mean equal to the
true value, θ, of the parameter they estimate.
Thus,
$$\hat\theta \overset{\cdot}{\sim} N\left(\theta, \sigma_{\hat\theta}^2\right).$$
The above fact provides probabilistic bounds on the size
of the estimation error:
$$|\hat\theta - \theta| \le 1.96\,\sigma_{\hat\theta}$$
holds 95% of the time.
From Error Bounds to Confidence Intervals
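The probabilistic error bound can be rewritten as
$$\hat\theta - 1.96\,\sigma_{\hat\theta} \le \theta \le \hat\theta + 1.96\,\sigma_{\hat\theta},$$
i.e., an interval of plausible values for θ, with degree of
plausibility approximately 95%.
Such intervals are called confidence intervals (CI).
In general, the 100(1 − α)% CI is of the form
$$\hat\theta - z_{\alpha/2}\,\sigma_{\hat\theta} \le \theta \le \hat\theta + z_{\alpha/2}\,\sigma_{\hat\theta}, \quad\text{or}\quad \hat\theta \pm z_{\alpha/2}\,\sigma_{\hat\theta}.$$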
Z Intervals
100(1 − α)% CIs that use percentiles of the standard
normal distribution, zα/2 , as above, are called Z intervals.
Z intervals for the mean require known variance, and
either the assumption of normality or n ≥ 30. Typically, the
variance is not known.
Z intervals will be primarily used for proportions.
The T Distribution and T Intervals
When sampling from normal populations, an estimator $\hat\theta$ of
some parameter θ often satisfies, for all sample sizes n,
$$\frac{\hat\theta - \theta}{\hat\sigma_{\hat\theta}} \sim T_\nu,$$
where $\hat\sigma_{\hat\theta}$ is the estimated standard error,
and $T_\nu$ stands for the T distribution with ν degrees of freedom.
A T distribution is symmetric and its pdf tends to that of the
standard normal as ν tends to infinity.
The 100(1 − α/2)th percentile of the T distribution with ν
degrees of freedom will be denoted by tν,α/2 .
Figure: PDF and Percentile of a T Distribution. (The p.d.f. of the t distribution with ν degrees of freedom; the area to the right of $t_{\nu,\alpha}$ equals α.)
As the DF ν gets large, tν,α/2 approaches zα/2 .
For example, for ν = 9, 19, 60 and 120, tν,0.05 is:
1.833, 1.729, 1.671, 1.658,
respectively, while z0.05 = 1.645.
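The convergence of the t percentiles to the normal percentile can be checked numerically. The following is a minimal R sketch using the base functions qt and qnorm; the variable names are illustrative only.

```r
# Upper 5% points t_{nu,0.05}, i.e. the 95th percentiles of the t distribution
df <- c(9, 19, 60, 120)
t_pct <- qt(0.95, df = df)

# Limiting normal percentile z_{0.05}
z_pct <- qnorm(0.95)

round(t_pct, 3)
round(z_pct, 3)
```

Larger values of df can be appended to the vector to see the t percentiles move closer to the normal one.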
Plots of N(0, 1) and T densities in R
http://stat.psu.edu/~mga/401/fig/ComparTdensit.pdf
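The relation $(\hat\theta - \theta)/\hat\sigma_{\hat\theta} \sim t_\nu$, which also holds approximately when
sampling non-normal populations provided n ≥ 30, leads
to the following 1 − α bound on the error of estimation of θ:
$$|\hat\theta - \theta| \le t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}.$$
This error bound leads to the following (1 − α)100% CI for θ:
$$\left(\hat\theta - t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta},\; \hat\theta + t_{\nu,\alpha/2}\,\hat\sigma_{\hat\theta}\right). \tag{2.1}$$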
T intervals will be used for the mean, as well as for the
regression parameters in the linear regression model.
Read Section 7.2 CI Semantics: The Meaning of “Confidence”
Figure: 50 CIs for p. (Axes: CI count, 0 to 50, and end points of CIs, 0.2 to 0.8.)
T CIs for the Mean: Proposition
Let $X_1, \ldots, X_n$ be a simple random sample from a population with mean µ
and variance σ², both unknown. Then
$$\frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1} \tag{3.1}$$
holds exactly for any n if the population is normal, and holds
approximately for non-normal populations provided n ≥ 30.
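The Proposition yields the 1 − α error bound
$$|\bar X - \mu| \le t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}},$$
which leads to the following (1 − α)100% CI for the mean:
$$\left(\bar X - t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}},\; \bar X + t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}}\right).$$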
Example
The mean weight loss of n = 16 grinding balls after a certain
length of time in mill slurry is 3.42g with S = 0.68g. Construct a
99% CI for the true mean weight loss.
Solution. Because n < 30 we must assume that the (statistical)
population of the mean weight loss is normal. (In a real life
application, the normality assumption should be verified by the
histogram or the Q-Q plot of the data.) For α = 0.01 and
n − 1 = 15 DF, Table A.4 gives tn−1,α/2 = t15,0.005 = 2.947.
Thus the desired 99% CI for µ is
$$3.42 \pm 2.947\,(0.68/\sqrt{16}), \quad\text{or}\quad 2.92 < \mu < 3.92.$$
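The arithmetic of this example can be reproduced from the summary statistics alone. This is a small R sketch in which qt stands in for the Table A.4 lookup; the variable names are chosen here for illustration.

```r
# Summary statistics reported in the example
n    <- 16      # number of grinding balls
xbar <- 3.42    # sample mean weight loss (g)
s    <- 0.68    # sample standard deviation (g)
conf <- 0.99    # confidence level

alpha  <- 1 - conf
t_crit <- qt(1 - alpha / 2, df = n - 1)   # t_{15,0.005}, about 2.947
margin <- t_crit * s / sqrt(n)            # error bound
ci     <- c(lower = xbar - margin, upper = xbar + margin)

round(ci, 2)
```

Changing conf to 0.95 or 0.90 gives the corresponding narrower intervals.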
R command for the t-interval for the mean
With data in x, the commands
"lm(x ~ 1)" and "confint(lm(x ~ 1))"
will return $\bar X$ and the pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt{n}$, which is
the 95% CI for µ.
For the 90% CI of µ use
"confint(lm(x ~ 1), level=0.9)"
(*) The pair of values $\bar X \pm t_{n-1,0.025}\,S/\sqrt{n}$ can also be obtained
as
"mean(x) + c(-1, 1)*qt(0.975, df=length(x)-1)*sd(x)/sqrt(length(x))"
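To see that the two routes agree, here is a short R sketch. The data vector x is hypothetical and serves only to make the commands runnable.

```r
# Hypothetical data, used only to illustrate the commands
x <- c(3.1, 2.8, 4.0, 3.6, 3.3, 2.9, 3.8, 3.5)

# Intercept-only model: the fitted intercept equals mean(x)
fit   <- lm(x ~ 1)
ci_lm <- confint(fit)          # 95% CI by default

# The same interval computed directly from the t-interval formula
n <- length(x)
ci_manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)

ci_lm
ci_manual
confint(fit, level = 0.9)      # 90% CI
```

The lm route is convenient because the same confint call carries over unchanged to the regression setting.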
Z CIs for Proportions
CIs for p, however, are slightly different due to:
1. We are typically given only $T = X_1 + \cdots + X_n$, or $\bar X = \hat p$.
2. σ² is estimated by $\hat p(1 - \hat p)$.
3. We use the percentiles from the normal distribution, using
the approximate result
$$\hat p \overset{\cdot}{\sim} N\left(p,\; \hat p(1 - \hat p)/n\right),$$
which holds if $n\hat p \ge 8$ and $n(1 - \hat p) \ge 8$, i.e. at least eight 1s
and at least eight 0s.
The above approximate distribution of $\hat p$ leads to the
(approximate) (1 − α)100% CI
$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}}.$$
Example
A Gallup Survey estimated the proportion of adults across the
country who drink beer, wine, or hard liquor, at least
occasionally. Of the 1516 adults interviewed, 985 said they
drank. Find a 95% confidence interval for the proportion, p, of
all Americans who drink.
Solution: Here α = 0.05, and $z_{0.025} = 1.96$. Thus
$$\frac{985}{1516} \pm 1.96\sqrt{\frac{0.65 \times 0.35}{1516}} = 0.65 \pm 0.024.$$
QUESTION: An interpretation of the above CI is that the
probability is 0.95 that the true proportion of adults who drink
lies in the interval you obtained. True or False?
R command for z-intervals for a proportion
With T being the number of "successes" in n trials, set
"phat=T/n" and use the command
"phat + c(-1, 1)*qnorm(0.975)*sqrt(phat*(1-phat)/n)"
to obtain the 95% CI for p, i.e. the pair of values
$$\hat p \pm z_{0.025}\sqrt{\frac{\hat p(1 - \hat p)}{n}}.$$
To obtain 90% or other CIs, adjust the 0.975 in the above
command accordingly.
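Applied to the Gallup survey counts from the earlier example (985 of 1516), the command can be sketched as follows. The name successes is used here in place of T, since T is also R's shorthand for TRUE; that renaming is a choice made for this sketch.

```r
# Counts from the Gallup example
successes <- 985     # number of adults who said they drink
n         <- 1516    # number of adults interviewed
phat      <- successes / n

# The normal approximation requires at least eight 1s and at least eight 0s
stopifnot(n * phat >= 8, n * (1 - phat) >= 8)

# 95% z interval for p
margin <- qnorm(0.975) * sqrt(phat * (1 - phat) / n)
ci     <- phat + c(-1, 1) * margin

round(c(phat = phat, margin = margin), 3)
round(ci, 3)
```

For a 90% interval, replace 0.975 by 0.95 in the qnorm call.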
T CIs for the Slope of a Regression Line: Proposition
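Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be iid satisfying $E(Y_i | X_i = x) = \alpha_1 + \beta_1 x$,
and $\mathrm{Var}(Y_i | X_i = x) = \sigma_\varepsilon^2$, same for all x. Then,
$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S^2}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}}, \quad\text{where}$$
$$S^2 = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1 \sum_{i=1}^n Y_i - \hat\beta_1 \sum_{i=1}^n X_i Y_i\right]$$
is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\beta_1}$ is also
denoted by $S_{\hat\beta_1}$.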
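We saw that under the normality assumption,
$$\frac{\hat\beta_1 - \beta_1}{\hat\sigma_{\hat\beta_1}} \sim t_{n-2}.$$
This leads to the 100(1 − α)% error bound
$$|\hat\beta_1 - \beta_1| < t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},$$
and corresponding 100(1 − α)% CI for β1 of:
$$\left(\hat\beta_1 - t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1},\; \hat\beta_1 + t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1}\right).$$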
Example (Y = propagation of stress wave, X = tensile strength)
In this study, n = 14, $\sum_i X_i = 890$, $\sum_i X_i^2 = 67{,}182$, $\sum_i Y_i = 37.6$,
$\sum_i Y_i^2 = 103.54$ and $\sum_i X_i Y_i = 2234.30$. Let Y1
denote an observation made at X1 = 30, and Y2 denote an
observation at X2 = 35. Construct a 95% CI for E(Y1 − Y2).
Solution. Note that E(Y1 − Y2) = −5β1. We will first construct a
95% CI for β1. We have: $\hat\beta_1 = -0.0147209$, $\hat\alpha_1 = 3.6209072$,
and $S^2 = 0.02187$. Thus,
$$\hat\sigma_{\hat\beta_1} = \sqrt{\frac{S^2}{\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2}} = \sqrt{\frac{0.02187}{67{,}182 - \frac{1}{14}\,890^2}} = 0.001414,$$
Example (Continued)
so that the 95% CI for β1 is
$$\hat\beta_1 \pm t_{12,0.025}\,\hat\sigma_{\hat\beta_1} = -0.0147209 \pm 2.179 \times 0.001414 = -0.0147209 \pm 0.00308 = (-0.0178, -0.01164).$$
The 95% CI for −5β1 now follows easily:
$$-5\hat\beta_1 \pm 5\,t_{n-2,\alpha/2}\,\hat\sigma_{\hat\beta_1} = 5(0.0147209) \pm 5 \times 2.179 \times 0.001414.$$
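The last two steps of this example can be sketched in R. The slope estimate and its standard error are entered exactly as reported above (they are taken as given here, not recomputed from the sums), and qt replaces the table lookup.

```r
# Values reported in the example (taken as given, not recomputed here)
n       <- 14
beta1   <- -0.0147209   # estimated slope
se_beta <- 0.001414     # estimated standard error of the slope

t_crit  <- qt(0.975, df = n - 2)                 # t_{12,0.025}, about 2.179
ci_beta <- beta1 + c(-1, 1) * t_crit * se_beta   # 95% CI for beta_1

# E(Y1 - Y2) = -5 * beta_1, so the estimate and the margin are both scaled by 5
est_diff <- -5 * beta1
ci_diff  <- est_diff + c(-1, 1) * 5 * t_crit * se_beta

ci_beta
ci_diff
```

Multiplying by −5 reverses the order of the end points, which is why ci_diff is rebuilt from the scaled estimate and margin rather than by multiplying ci_beta directly.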
T CIs for the Regression Line
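Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be iid satisfying $E(Y_i | X_i = x) = \alpha_1 + \beta_1 x$,
and $\mathrm{Var}(Y_i | X_i = x) = \sigma^2$, same for all x. Then,
$$\hat\sigma_{\hat\mu_{Y|X=x}} = S\sqrt{\frac{1}{n} + \frac{n(x - \bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}, \quad\text{where}$$
$$S^2 = \frac{1}{n-2}\left[\sum_{i=1}^n Y_i^2 - \hat\alpha_1 \sum_{i=1}^n Y_i - \hat\beta_1 \sum_{i=1}^n X_i Y_i\right]$$
is the estimator of the intrinsic variability. NOTE: $\hat\sigma_{\hat\mu_{Y|X=x}}$ is also
denoted by $S_{\hat\mu_{Y|X=x}}$.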
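We saw that under the normality assumption,
$$\frac{\hat\mu_{Y|X=x} - \mu_{Y|X=x}}{\hat\sigma_{\hat\mu_{Y|X=x}}} \sim t_{n-2}.$$
This leads to the 100(1 − α)% error bound
$$|\hat\mu_{Y|X=x} - \mu_{Y|X=x}| < t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}},$$
and corresponding 100(1 − α)% CI for $\mu_{Y|X=x}$ of:
$$\left(\hat\mu_{Y|X=x} - t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}},\; \hat\mu_{Y|X=x} + t_{n-2,\alpha/2}\,\hat\sigma_{\hat\mu_{Y|X=x}}\right).$$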
The term $n(x - \bar X)^2$ in the expression for $\hat\sigma_{\hat\mu_{Y|X=x}}$ means that
confidence intervals for $\mu_{Y|X=x}$ get wider as x gets farther away
from $\bar X$.
Figure: Confidence Intervals for $\mu_{Y|X=x}$ Get Wider Away from $\bar X$
Estimation of $\mu_{Y|X=x}$ for $x < X_{(1)}$ or $x > X_{(n)}$, i.e. outside the range of the observed X values, is NOT
recommended.
Example
n = 11 data points yield $\sum X_i = 292.90$, $\sum Y_i = 69.03$,
$\sum X_i^2 = 8141.75$, $\sum X_i Y_i = 1890.2$, $\sum Y_i^2 = 442.1903$,
$\hat\mu_{Y|X=x} = 2.22494 + 0.152119x$, and S = 0.3444. Construct 95%
CIs for $\mu_{Y|X=26.627}$ and $\mu_{Y|X=25}$. [Note that $\bar X = 26.627$.]
Solution: First,
$$S_{\hat\mu_{Y|X=x}} = 0.3444\sqrt{\frac{1}{11} + \frac{11(x - 26.627)^2}{11(8141.75) - (292.9)^2}},$$
so that $S_{\hat\mu_{Y|X=26.627}} = 0.1038$, and $S_{\hat\mu_{Y|X=25}} = 0.1082$. Thus,
$$\hat\mu_{Y|X=25} \pm t_{9,0.025}\,0.1082 = 6.028 \pm 0.245 \quad (\text{CI at } X = 25),$$
$$\hat\mu_{Y|X=26.627} \pm t_{9,0.025}\,0.1038 = 6.275 \pm 0.235 \quad (\text{CI at } X = 26.627).$$
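The two intervals can be reproduced from the summary statistics alone. The R sketch below evaluates the standard error formula at both x values; the variable names are illustrative, and $\bar X$ is computed as 292.90/11 rather than entered as the rounded 26.627.

```r
# Summary statistics from the example
n   <- 11
sx  <- 292.90     # sum of X_i
sxx <- 8141.75    # sum of X_i^2
s   <- 0.3444     # S, estimate of the intrinsic standard deviation
a1  <- 2.22494    # fitted intercept
b1  <- 0.152119   # fitted slope

x0   <- c(25, 26.627)   # points at which the mean response is estimated
xbar <- sx / n

fit    <- a1 + b1 * x0
se     <- s * sqrt(1 / n + n * (x0 - xbar)^2 / (n * sxx - sx^2))
margin <- qt(0.975, df = n - 2) * se

data.frame(x = x0, fit = fit, se = se,
           lower = fit - margin, upper = fit + margin)
```

The standard error is smallest at the sample mean of the X values and grows as x0 moves away from it, which is the widening effect described earlier.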
R commands for CIs in regression
CIs for the intercept and slope:
"confint(lm(y ~ x))" (or "confint(lm(y ~ x), level=0.95)") gives
95% CIs for both α1 and β1.
"confint(lm(y ~ x), level=0.90)" gives 90% CIs for both α1
and β1.
"confint(lm(y ~ x), parm='x', level=0.90)" gives the 90% CI only
for β1.
"confint(lm(y ~ x), parm='(Intercept)', level=0.90)" gives the
90% CI only for α1.
CIs for µY|X(x) at, e.g., x = 5.5: "newx=data.frame(x=5.5)",
"predict(lm(y ~ x), newx, interval='confidence', level=0.9)"
Use "newx=data.frame(x=c(4.5,5.5))" above for multiple
CIs.
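Put together, the commands above might be run as follows. The x and y vectors are hypothetical and exist only so the sketch is self-contained.

```r
# Hypothetical data, for illustration only
x <- c(4.1, 4.6, 5.0, 5.3, 5.9, 6.2, 6.8, 7.1)
y <- c(2.0, 2.4, 2.3, 2.9, 3.1, 3.0, 3.6, 3.9)

fit <- lm(y ~ x)

# CIs for the regression parameters
confint(fit)                                       # 95% CIs for intercept and slope
confint(fit, parm = "x", level = 0.90)             # 90% CI for the slope only
confint(fit, parm = "(Intercept)", level = 0.90)   # 90% CI for the intercept only

# 90% CIs for the mean response at x = 4.5 and x = 5.5
newx    <- data.frame(x = c(4.5, 5.5))
ci_mean <- predict(fit, newx, interval = "confidence", level = 0.9)
ci_mean
```

Fitting the model once and reusing fit avoids refitting inside every call; the column name in newx has to match the predictor name used in the model formula.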
Generalities
Precision in estimation is quantified by the size of the
probabilistic error bound, or by the length of the CI.
Error bounds are of the form
$$|\bar X - \mu| \le t_{n-1,\alpha/2}\,\frac{S}{\sqrt{n}} \quad (\text{unknown } \sigma;\ \text{normal case, or } n \ge 30),$$
$$|\hat p - p| \le z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}} \quad (n\hat p \ge 8,\ n(1 - \hat p) \ge 8).$$
Thus error bounds depend on n and on α since, e.g.,
$$z_{0.05} = 1.645 < z_{0.025} = 1.96 < z_{0.005} = 2.575.$$
In improving precision, we do not want to adjust α; the remaining option is to increase n.
The Ideal Case: σ known
To construct a (1 − α)100% CI having a prescribed length
of L, the sample size n is found by solving the equation
$$2 z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} = L.$$
The solution is:
$$n = \left(2 z_{\alpha/2}\,\frac{\sigma}{L}\right)^2.$$
If the solution is not an integer (as is typically the case), the
number is rounded up. Rounding up guarantees that the
prescribed precision objective will be more than met.
The Ideal Case: An Example
Example
The time to response (in milliseconds) to an editing command
with a new operating system is normally distributed with an
unknown mean µ and σ = 25. We want a 95% CI for µ of length
L = 10 milliseconds. What sample size n should be used?
Solution. For a 95% CI, α/2 = 0.025 and $z_{0.025} = 1.96$. Thus
$$n = \left(2 \cdot (1.96)\,\frac{25}{10}\right)^2 = 96.04,$$
which is rounded up to n = 97.
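The sample size calculation can be wrapped in a small helper. This is a sketch only: n_for_mean is a hypothetical function name, and z is passed explicitly so that the table value 1.96 used above is reproduced.

```r
# Sample size for a z interval of total length L when sigma is known.
# z is the normal percentile z_{alpha/2}, supplied by the caller.
n_for_mean <- function(sigma, L, z) {
  ceiling((2 * z * sigma / L)^2)   # round the solution up
}

raw <- (2 * 1.96 * 25 / 10)^2                      # unrounded solution
n   <- n_for_mean(sigma = 25, L = 10, z = 1.96)    # rounded up

c(raw = raw, n = n)
```

Because n grows with the square of 1/L, halving the desired length roughly quadruples the required sample size.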
The Realistic Case: σ unknown
Sample size determination must rely on a preliminary
approximation, $S_{prl}$, of σ. Two common methods are:
1. If the range of population values is known, use
$$S_{prl} = \frac{\text{range}}{3.5}, \quad\text{or}\quad S_{prl} = \frac{\text{range}}{4}.$$
This approximation is inspired by the standard deviation of
a U(a, b) random variable, which is σ = (b − a)/3.464.
2. Use the standard deviation, $S_{prl}$, of a preliminary sample.
This is somewhat cumbersome because it requires some
trial-and-error iterations.
Sample Size Determination for Estimating p
Equating the length of the (1 − α)100% CI for p to L and
solving for n gives:
$$n = \frac{4 z_{\alpha/2}^2\,\hat p(1 - \hat p)}{L^2}.$$
Round up.
Two commonly used methods for obtaining a preliminary
approximation, $\hat p_{prl}$, are:
1. Obtain $\hat p_{prl}$ either from a small pilot sample or from expert
opinion, and use it in the above formula.
2. Replace $\hat p(1 - \hat p)$ in the formula by 0.25. This gives
$$n = z_{\alpha/2}^2 / L^2.$$
Round up.
Example
A preliminary sample gave p̂prl = 0.91. How large should n be
to estimate the probability of interest to within 0.01 with 95%
confidence?
Solution. “To within 0.01” is another way of saying that the 95%
bound on the error of estimation should be 0.01, or the desired
CI should have a width of 0.02. Since we have preliminary
information, we use the first formula:
$$n = \frac{4(1.96)^2(0.91)(0.09)}{(0.02)^2} = 3146.27.$$
This is rounded up to 3147.
Example
A new method of pre-coating fittings used in oil, brake and
other fluid systems in heavy-duty trucks is being studied. How
large n is needed to estimate the proportion of fittings that leak
to within .02 with 90% confidence? (No prior info available).
Solution. Here we have no preliminary information about p.
Thus, we apply the second formula and we obtain
$$n = z_{\alpha/2}^2 / L^2 = (1.645)^2/(0.04)^2 = 1691.27.$$
This is rounded up to 1692.
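Both sample size examples for p can be covered by one helper. The sketch below is an assumption-laden illustration: n_for_proportion is a hypothetical function name, and setting the default p_prl to 0.5 makes p(1 − p) equal to 0.25, which reproduces the second formula.

```r
# Sample size for a z interval for p of total length L.
# z is the normal percentile z_{alpha/2}; p_prl is the preliminary
# approximation of p (0.5 gives the conservative 0.25 bound).
n_for_proportion <- function(L, z, p_prl = 0.5) {
  ceiling(4 * z^2 * p_prl * (1 - p_prl) / L^2)
}

# Pilot estimate 0.91, within 0.01 (length 0.02), 95% confidence
n_pilot <- n_for_proportion(L = 0.02, z = 1.96, p_prl = 0.91)

# No prior information, within 0.02 (length 0.04), 90% confidence
n_noprior <- n_for_proportion(L = 0.04, z = 1.645)

c(n_pilot = n_pilot, n_noprior = n_noprior)
```

The table values 1.96 and 1.645 are passed in to match the hand calculations. Using qnorm instead gives a slightly smaller percentile, and when the unrounded solution is close to an integer, as in the second example, the rounded answer can differ by one.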
Prediction refers to estimating an observation. It is related
to estimating the mean, but prediction intervals (PIs) are
different from CIs. For example:
1. Predicting the fat content of the hot dog you are about to
eat is related to estimating the mean fat content of hot dogs.
But the PI is different from the CI.
2. Predicting the failure time of your resistor from its
resistance is related to estimating the mean failure time of
all resistors having the same resistance. But the PI is
different from the CI.
In the first example, there is no explanatory variable.
The second example involves a regression context.
We begin with the case of no explanatory variable.
Prediction Based on a Univariate Sample
To emphasize the difference between PIs and CIs,
suppose that the amount of fat in a randomly selected hot
dog is N(20, 9). Thus there are no unknown parameters to
be estimated, and no need to construct a CI.
Still the amount of fat, X , in the hot dog which one is about
to eat is unknown, simply because it is a random variable.
According to well-accepted criteria, the best point-predictor
of a normal random variable with mean µ, is µ.
A (1 − α)100% PI is an interval that contains the r.v. with
probability 1 − α. Namely: µ ± zα/2 σ.
In the hot dog example, X ∼ N(20, 9), so the best point
predictor of X is 20 and a 95% PI is 20 ± (1.96)3.
Typically, µ, σ are unknown and are estimated from a
sample $X_1, \ldots, X_n$ by $\bar X$, S, respectively.
Then, the best point predictor of a future observation is $\bar X$.
The PI, however, must now take into account the variability
of $\bar X$, S as estimators of µ, σ.
Assuming normality, the (1 − α)100% PI for the next X is:
$$\left(\bar X - t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}},\; \bar X + t_{n-1,\alpha/2}\,S\sqrt{1 + \frac{1}{n}}\right).$$
The variability of $\bar X$ is accounted for by the $\frac{1}{n}$, and the
variability of S is accounted for by the use of the
t-percentiles.
Example
The fat content measurements from a sample of size n = 10 hot dogs,
gave sample mean and sample standard deviation of X = 21.9, and
S = 4.134. Give a 95% PI for the fat content, X , of the next hot dog
to be sampled.
Solution: Assuming that the fat content of a randomly selected hot
dog has the normal distribution, the best point predictor of X is
X = 21.9 and the 95% PI is
$$\bar X \pm t_{9,0.025}\,S\sqrt{1 + \frac{1}{n}} = (12.09, 31.71).$$
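The hot dog interval can be reproduced from the summary statistics. The following R sketch also computes the 95% CI for the mean from the same numbers, purely for comparison; the variable names are illustrative.

```r
# Summary statistics from the example
n    <- 10
xbar <- 21.9     # sample mean fat content
s    <- 4.134    # sample standard deviation

t_crit <- qt(0.975, df = n - 1)   # t_{9,0.025}

# 95% prediction interval for the next observation
pi95 <- xbar + c(-1, 1) * t_crit * s * sqrt(1 + 1 / n)

# 95% confidence interval for the mean, for comparison
ci95 <- xbar + c(-1, 1) * t_crit * s / sqrt(n)

round(pi95, 2)
round(ci95, 2)
```

The only difference between the two formulas is the extra 1 under the square root, which accounts for the variability of the new observation itself.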
PIs for the Normal Simple Linear Regression Model
Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be n observations that follow the
normal simple linear regression model, i.e.
$$Y_i | X_i = x_i \sim N(\alpha_1 + \beta_1 x_i, \sigma^2).$$
The point predictor for a future observation Y made at
X = x is $\hat\mu_{Y|X=x} = \hat\alpha_1 + \hat\beta_1 x$.
The 100(1 − α)% PI is
$$\hat\mu_{Y|X=x} \pm t_{n-2,\alpha/2}\,S\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar X)^2}{n\sum X_i^2 - \left(\sum X_i\right)^2}}.$$
Example
Consider again the study where n = 11, $\sum X_i = 292.90$,
$\sum Y_i = 69.03$, $\sum X_i^2 = 8141.75$, $\sum X_i Y_i = 1890.200$,
$\sum Y_i^2 = 442.1903$, $\hat\mu_{Y|X} = 2.22494 + 0.152119X$, and
S = 0.3444. Construct a 95% PI for a future observation made at
X = 25.
Solution. The point predictor is $\hat\mu_{Y|X=25} = 6.028$, and the 95% PI at
X = 25 is 6.028 ± 0.8165, as obtained from the formula
$$\hat\mu_{Y|X=25} \pm t_{9,0.025}\,(0.3444)\sqrt{1 + \frac{1}{11} + \frac{11(1.627)^2}{11\sum X_i^2 - \left(\sum X_i\right)^2}}.$$
The 95% CI for $\mu_{Y|X=25}$ was found to be 6.028 ± 0.245. This
demonstrates that PIs are wider than CIs.
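The comparison between the PI and the CI at X = 25 can be sketched in R from the summary statistics. Variable names are illustrative, and qt is used in place of the table value, so the PI margin comes out near 0.817 and agrees with the 0.8165 above only up to rounding.

```r
# Summary statistics from the example
n   <- 11
sx  <- 292.90     # sum of X_i
sxx <- 8141.75    # sum of X_i^2
s   <- 0.3444     # S
a1  <- 2.22494    # fitted intercept
b1  <- 0.152119   # fitted slope
x0  <- 25         # X value of the future observation

xbar   <- sx / n
q      <- n * (x0 - xbar)^2 / (n * sxx - sx^2)   # distance term shared by both formulas
t_crit <- qt(0.975, df = n - 2)
fit    <- a1 + b1 * x0

margin_ci <- t_crit * s * sqrt(1 / n + q)       # CI for the mean response
margin_pi <- t_crit * s * sqrt(1 + 1 / n + q)   # PI for a new observation

ci95 <- fit + c(-1, 1) * margin_ci
pi95 <- fit + c(-1, 1) * margin_pi

rbind(CI = ci95, PI = pi95)
```

With the raw data available, the same intervals could be requested from predict with interval = "prediction" instead of "confidence".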