Basic Properties of Confidence Intervals
The basic concepts and properties of confidence intervals
(CIs) are most easily introduced by first focusing on a
simple, albeit somewhat unrealistic, problem situation.
Suppose that the parameter of interest is a population
mean μ and that
1. The population distribution is normal
2. The value of the population standard deviation σ is
known
Normality of the population distribution is often a
reasonable assumption.
However, if the value of μ is unknown, it is typically
implausible that the value of σ would be available
(knowledge of a population’s center typically precedes
information concerning spread).
The actual sample observations x1, x2, …, xn are assumed
to be the result of a random sample X1, …, Xn from a
normal distribution with mean value μ and standard
deviation σ.
Irrespective of the sample size n, the sample mean X̄ is
normally distributed with expected value μ and standard
deviation σ/√n.
Standardizing X̄ by first subtracting its expected value
and then dividing by its standard deviation yields the
standard normal variable
Z = (X̄ – μ)/(σ/√n)        (7.1)
Because the area under the standard normal curve
between –1.96 and 1.96 is .95,
P(–1.96 < (X̄ – μ)/(σ/√n) < 1.96) = .95        (7.2)
Now let’s manipulate the inequalities inside the
parentheses in (7.2) so that they appear in the equivalent
form l < μ < u, where the endpoints l and u involve X̄ and
σ/√n. This is achieved through the following sequence of
operations, each yielding inequalities equivalent to the
original ones.
1. Multiply through by σ/√n:
   –1.96·σ/√n < X̄ – μ < 1.96·σ/√n
2. Subtract X̄ from each term:
   –X̄ – 1.96·σ/√n < –μ < –X̄ + 1.96·σ/√n
3. Multiply through by –1 to eliminate the minus sign in front
   of μ (which reverses the direction of each inequality):
   X̄ + 1.96·σ/√n > μ > X̄ – 1.96·σ/√n
that is,
P(X̄ – 1.96·σ/√n < μ < X̄ + 1.96·σ/√n) = .95        (7.3)
This CI can be expressed either as
(x̄ – 1.96·σ/√n, x̄ + 1.96·σ/√n)
or as
x̄ – 1.96·σ/√n < μ < x̄ + 1.96·σ/√n
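As a minimal numeric sketch of this 95% interval: the inputs x̄ = 80.0, σ = 2.0, and n = 31 below are assumptions chosen to reproduce the interval (79.3, 80.7) that appears later in this section; they are not stated in this excerpt.

```python
import math

def ci_95_known_sigma(xbar, sigma, n):
    """95% CI for a normal mean when sigma is known: xbar +/- 1.96*sigma/sqrt(n)."""
    half_width = 1.96 * sigma / math.sqrt(n)
    return (xbar - half_width, xbar + half_width)

# Hypothetical inputs: xbar = 80.0, sigma = 2.0, n = 31
lo, hi = ci_95_known_sigma(80.0, 2.0, 31)
print(round(lo, 1), round(hi, 1))  # → 79.3 80.7
```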
Interpreting a Confidence Level
But by substituting x̄ = 80.0 for X̄, all randomness
disappears; the interval (79.3, 80.7) is not a random
interval, and μ is a constant (unfortunately unknown to us).
It is therefore incorrect to write the statement
P(μ lies in (79.3, 80.7)) = .95.
A correct interpretation of “95% confidence” relies on the
long-run relative frequency interpretation of probability: To
say that an event A has probability .95 is to say that if the
experiment on which A is defined is performed over and
over again, in the long run A will occur 95% of the time.
This is illustrated in Figure 7.3, where the vertical line cuts
the measurement axis at the true (but unknown) value of μ.
One hundred 95% CIs (asterisks identify intervals that do not include μ).
Figure 7.3
Notice that 7 of the 100 intervals shown fail to contain μ.
In the long run, only 5% of the intervals so constructed
would fail to contain μ.
According to this interpretation, the confidence level 95% is
not so much a statement about any particular interval such
as (79.3, 80.7).
Instead it pertains to what would happen if a very large
number of like intervals were to be constructed using the
same CI formula.
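This long-run behavior can be illustrated by simulation (a sketch; the true μ, σ, sample size, and number of replications below are arbitrary choices, not from the text):

```python
import math
import random

random.seed(1)  # for reproducibility
mu, sigma, n = 100.0, 5.0, 25   # hypothetical "true" population and sample size
reps = 10_000
covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    half = 1.96 * sigma / math.sqrt(n)
    if xbar - half < mu < xbar + half:
        covered += 1
print(covered / reps)  # close to .95, as the interpretation predicts
```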
Although this may seem unsatisfactory, the root of the
difficulty lies with our interpretation of probability—it applies
to a long sequence of replications of an experiment rather
than just a single replication.
There is another approach to the construction and
interpretation of CIs that uses the notion of subjective
probability and Bayes’ theorem, but the technical details
are beyond the scope of this text; the book by DeGroot
et al. is a good source.
Other Levels of Confidence
As Figure 7.4 shows, a probability of 1 – α is achieved by
using zα/2 in place of 1.96.
P(–zα/2 ≤ Z < zα/2) = 1 – α
Figure 7.4
Definition
A 100(1 – α)% confidence interval for the mean μ of a
normal population when the value of σ is known is given by
(x̄ – zα/2·σ/√n, x̄ + zα/2·σ/√n)        (7.5)
or, equivalently, by x̄ ± zα/2·σ/√n.
The formula (7.5) for the CI can also be expressed in words
as point estimate of μ ± (z critical value) (standard error of
the mean).
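For levels other than 95%, zα/2 can be computed from the standard normal quantile function; Python's standard library provides it (a sketch):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the central area under the z curve equals `confidence`."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))  # 1.645, 1.96, 2.576
```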
Confidence Level, Precision, and Sample Size
Why settle for a confidence level of 95% when a level of
99% is achievable? Because the price paid for the higher
confidence level is a wider interval.
Since the 95% interval extends 1.96·σ/√n to each side of
x̄, the width of the interval is 2(1.96)·σ/√n = 3.92·σ/√n.
Similarly, the width of the 99% interval is
2(2.58)·σ/√n = 5.16·σ/√n.
That is, we have more confidence in the 99% interval
precisely because it is wider. The higher the desired degree
of confidence, the wider the resulting interval will be.
If we think of the width of the interval as specifying its
precision or accuracy, then the confidence level (or
reliability) of the interval is inversely related to its precision.
A highly reliable interval estimate may be imprecise in that
the endpoints of the interval may be far apart, whereas a
precise interval may entail relatively low reliability.
Thus it cannot be said unequivocally that a 99% interval is
to be preferred to a 95% interval; the gain in reliability
entails a loss in precision.
A general formula for the sample size n necessary to
ensure an interval width w is obtained from equating w to
2·zα/2·σ/√n and solving for n.
The sample size necessary for the CI (7.5) to have a width
w is
n = (2·zα/2·σ/w)²
The smaller the desired width w, the larger n must be. In
addition, n is an increasing function of σ (more population
variability necessitates a larger sample size) and of
the confidence level 100(1 – α)% (as α decreases, zα/2
increases).
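The sample-size formula can be coded directly (a sketch; σ = 2.0 and w = 1.0 are made-up values, and the result is rounded up because n must be an integer):

```python
import math
from statistics import NormalDist

def sample_size(sigma, width, confidence=0.95):
    """Smallest n for which the CI (7.5) has width at most `width`: n = (2*z*sigma/w)^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

print(sample_size(sigma=2.0, width=1.0))  # → 62
```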
The half-width 1.96·σ/√n of the 95% CI is sometimes called
the bound on the error of estimation associated with a
95% confidence level.
That is, with 95% confidence, the point estimate x̄ will be
no farther than this from μ.
Before obtaining data, an investigator may wish to
determine a sample size for which a particular value of the
bound is achieved.
Large-Sample Confidence Intervals for a Population Mean and Proportion
Earlier we came across the CI for μ, which assumed
that the population distribution is normal with the value of σ
known.
We now present a large-sample CI whose validity does not
require these assumptions. After showing how the
argument leading to this interval generalizes to yield other
large-sample intervals, we focus on an interval for a
population proportion p.
A Large-Sample Interval for μ
Let X1, X2, . . . , Xn be a random sample from a population
having a mean μ and standard deviation σ. Provided that n
is large, the Central Limit Theorem (CLT) implies that X̄ has
approximately a normal distribution whatever the nature of
the population distribution.
It then follows that Z = (X̄ – μ)/(σ/√n) has approximately a
standard normal distribution, so that
P(–zα/2 < (X̄ – μ)/(σ/√n) < zα/2) ≈ 1 – α
An argument parallel to the one given earlier then yields
x̄ ± zα/2·σ/√n
as a large-sample CI for μ with a confidence level of
approximately 100(1 – α)%. That is, when n is large, the CI
for μ given previously remains valid whatever the
population distribution, provided that the qualifier
“approximately” is inserted in front of the confidence level.
A practical difficulty with this development is that
computation of the CI requires the value of σ, which will
rarely be known. Consider the standardized variable
Z = (X̄ – μ)/(S/√n), in which the sample standard deviation
S has replaced σ.
Previously, there was randomness only in the numerator of
Z by virtue of X̄. In the new standardized variable, both X̄
and S vary in value from one sample to another. So it
might seem that the distribution of the new variable should
be more spread out than the z curve to reflect the extra
variation in the denominator. This is indeed true when n is
small.
However, for large n the substitution of S for σ adds little
extra variability, so this variable also has approximately a
standard normal distribution. Manipulation of the variable in
a probability statement, as in the case of known σ, gives a
general large-sample CI for μ.
Proposition
If n is sufficiently large, the standardized variable
Z = (X̄ – μ)/(S/√n)
has approximately a standard normal distribution. This
implies that
x̄ ± zα/2·s/√n        (7.8)
is a large-sample confidence interval for μ with
confidence level approximately 100(1 – α)%. This formula
is valid regardless of the shape of the population
distribution.
In words, the CI (7.8) is
point estimate of μ ± (z critical value) (estimated standard error of the mean).
Generally speaking, n > 40 will be sufficient to justify the
use of this interval.
This is somewhat more conservative than the rule of thumb
for the CLT because of the additional variability introduced
by using S in place of σ.
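A sketch of interval (7.8) applied to data (the sample here is simulated purely for illustration; any n > 40 observations would do):

```python
import math
import random
from statistics import NormalDist, mean, stdev

def large_sample_ci(data, confidence=0.95):
    """x-bar ± z_{alpha/2} * s/sqrt(n): valid for large n whatever the population shape."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half = z * s / math.sqrt(n)
    return (xbar - half, xbar + half)

random.seed(7)
data = [random.gauss(50, 8) for _ in range(60)]  # hypothetical sample, n = 60
lo, hi = large_sample_ci(data)
print(lo, hi)  # an interval that should land near the true mean of 50
```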
A General Large-Sample Confidence Interval
The large-sample intervals
x̄ ± zα/2·σ/√n   and   x̄ ± zα/2·s/√n
are special cases of a general large-sample CI for a
parameter θ.
Suppose that θ̂ is an estimator satisfying the following
properties:
(1) It has approximately a normal distribution;
(2) it is (at least approximately) unbiased; and
(3) an expression for σθ̂, the standard deviation of θ̂, is
available.
For example, in the case θ = μ, θ̂ = X̄ is an unbiased
estimator whose distribution is approximately normal when
n is large and σθ̂ = σ/√n. Standardizing yields the
rv Z = (θ̂ – θ)/σθ̂, which has approximately a standard
normal distribution. This justifies the probability statement
P(–zα/2 < (θ̂ – θ)/σθ̂ < zα/2) ≈ 1 – α        (7.9)
Suppose first that σθ̂ does not involve any unknown
parameters (e.g., known σ in the case θ = μ).
Then replacing each < in (7.9) by = results in
θ = θ̂ ± zα/2·σθ̂, so the lower and upper confidence limits
are θ̂ – zα/2·σθ̂ and θ̂ + zα/2·σθ̂, respectively.
Now suppose that σθ̂ does not involve θ but does involve at
least one other unknown parameter. Let sθ̂ be the estimate
of σθ̂ obtained by using estimates in place of the unknown
parameters (e.g., s/√n estimates σ/√n).
Under general conditions (essentially that sθ̂ be close to σθ̂
for most samples), a valid CI is θ̂ ± zα/2·sθ̂. The
large-sample interval x̄ ± zα/2·s/√n is an example.
A Confidence Interval for a Population Proportion
Let p denote the proportion of “successes” in a population,
where success identifies an individual or object that has a
specified property (e.g., individuals who graduated from
college, computers that do not need warranty service, etc.).
A random sample of n individuals is to be selected, and X is
the number of successes in the sample. Provided that n is
small compared to the population size, X can be regarded
as a binomial rv with E(X) = np and σX = √(np(1 – p)).
Furthermore, if both np ≥ 10 and nq ≥ 10 (q = 1 – p), X has
approximately a normal distribution.
The natural estimator of p is p̂ = X/n, the sample fraction of
successes. Since p̂ is just X multiplied by the constant 1/n,
p̂ also has approximately a normal distribution. As we know,
E(p̂) = p (unbiasedness) and σp̂ = √(pq/n).
The standard deviation σp̂ involves the unknown parameter
p. Standardizing p̂ by subtracting p and dividing by σp̂ then
implies that
P(–zα/2 < (p̂ – p)/√(pq/n) < zα/2) ≈ 1 – α
If the sample size n is very large, then z²/(2n) is generally
quite negligible (small) compared to p̂, and z²/n is quite
negligible compared to 1, from which p̃ ≈ p̂. In this case
z²/(4n²) is also negligible compared to p̂q̂/n (n² is a much
larger divisor than is n); as a result, the dominant term in
the ± expression is zα/2·√(p̂q̂/n), and the score interval is
approximately
p̂ ± zα/2·√(p̂q̂/n)        (7.11)
This latter interval has the general form θ̂ ± zα/2·sθ̂ of a
large-sample interval suggested in the last subsection.
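Interval (7.11) in code (a sketch; the counts x = 340 and n = 500 are hypothetical, chosen so that np̂ and nq̂ are both well above 10):

```python
import math
from statistics import NormalDist

def prop_ci(x, n, confidence=0.95):
    """Approximate large-sample CI (7.11) for p: p-hat ± z*sqrt(p-hat*q-hat/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    phat = x / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return (phat - half, phat + half)

lo, hi = prop_ci(340, 500)  # hypothetical: 340 successes in 500 trials
print(round(lo, 3), round(hi, 3))  # → 0.639 0.721
```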
Intervals Based on a Normal Population Distribution
The CI for μ presented in the earlier section is valid provided
that n is large. The resulting interval can be used whatever
the nature of the population distribution. The CLT cannot be
invoked, however, when n is small.
In this case, one way to proceed is to make a specific
assumption about the form of the population distribution
and then derive a CI tailored to that assumption.
For example, we could develop a CI for μ when the
population is described by a gamma distribution, another
interval for the case of a Weibull distribution, and so on.
Statisticians have indeed carried out this program for a
number of different distributional families. Because the
normal distribution is more frequently appropriate as a
population model than is any other type of distribution, we
will focus here on a CI for this situation.
Assumption
The population of interest is normal, so that X1, … , Xn
constitutes a random sample from a normal distribution
with both μ and σ unknown.
The key result underlying the interval in the earlier section
was that for large n, the rv Z = (X̄ – μ)/(S/√n) has
approximately a standard normal distribution.
When n is small, S is no longer likely to be close to σ, so
the variability in the distribution of Z arises from
randomness in both the numerator and the denominator.
This implies that the probability distribution of
(X̄ – μ)/(S/√n)
will be more spread out than the standard normal
distribution.
The result on which inferences are based introduces a new
family of probability distributions called t distributions.
Theorem
When X̄ is the mean of a random sample of size n from a
normal distribution with mean μ, the rv
T = (X̄ – μ)/(S/√n)        (7.13)
has a probability distribution called a t distribution with n – 1
degrees of freedom (df).
Properties of t Distributions
Before applying this theorem, a discussion of properties of t
distributions is in order. Although the variable of interest is
still (X̄ – μ)/(S/√n), we now denote it by T to emphasize
that it does not have a standard normal distribution when n
is small.
We know that a normal distribution is governed by two
parameters; each different choice of μ in combination with
σ gives a particular normal distribution.
Any particular t distribution results from specifying the value
of a single parameter, called the number of degrees of
freedom, abbreviated df.
We’ll denote this parameter by the Greek letter ν. Possible
values of ν are the positive integers 1, 2, 3, … . So there is a
t distribution with 1 df, another with 2 df, yet another with 3
df, and so on.
For any fixed value of ν, the density function that specifies
the associated t curve is even more complicated than the
normal density function.
Fortunately, we need concern ourselves only with several
of the more important features of these curves.
Properties of t Distributions
Let t denote the t distribution with  df.
1. Each t curve is bell-shaped and centered at 0.
2. Each t curve is more spread out than the standard
normal (z) curve.
3. As  increases, the spread of the corresponding t curve
decreases.
4. As  , the sequence of t curves approaches the
standard normal curve (so the z curve is often called the
t curve with df = ).
Figure 7.7 illustrates several of these properties for
selected values of ν.
t and z curves
Figure 7.7
The number of df for T in (7.13) is n – 1 because, although
S is based on the n deviations X1 – X̄, …, Xn – X̄, the fact
that Σ(Xi – X̄) = 0 implies that only n – 1 of these are “freely
determined.”
The number of df for a t variable is the number of freely
determined deviations on which the estimated standard
deviation in the denominator of T is based.
The use of t distributions in making inferences requires
notation for capturing t-curve tail areas analogous to zα for
the z curve. You might think that tα would do the trick.
However, the desired value depends not only on the tail
area captured but also on df.
Notation
Let t, = the number on the measurement axis for which
the area under the t curve with  df to the right of t, is ;
t, is called a t critical value.
For example, t.05,6 is the t critical value that captures an
upper-tail area of .05 under the t curve with 6 df. The
general notation is illustrated in Figure 7.8.
Illustration of a t critical value
Figure 7.8
Because t curves are symmetric about zero, –tα,ν captures
lower-tail area α. Appendix Table A.5 gives tα,ν for selected
values of α and ν.
This table also appears inside the back cover. The columns
of the table correspond to different values of α. To obtain
t.05,15, go to the α = .05 column, look down to the ν = 15 row,
and read t.05,15 = 1.753.
Similarly, t.05,22 = 1.717 (.05 column, ν = 22 row), and
t.01,22 = 2.508.
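The tabled values can be reproduced numerically. The sketch below uses only the standard library: math.gamma for the t density, a trapezoidal rule for the tail area, and bisection for the critical value. (With scipy available, scipy.stats.t.ppf(1 - alpha, nu) gives the same numbers in one call.)

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu df."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

def upper_tail_area(t, nu, hi=30.0, steps=5000):
    """Area under the t_nu curve to the right of t (trapezoidal rule;
    the tail beyond `hi` is negligible for the df used here)."""
    h = (hi - t) / steps
    total = 0.5 * (t_pdf(t, nu) + t_pdf(hi, nu))
    for i in range(1, steps):
        total += t_pdf(t + i * h, nu)
    return total * h

def t_critical(alpha, nu):
    """t_{alpha,nu}: the point whose upper-tail area is alpha, found by bisection."""
    lo, hi = 0.0, 30.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail_area(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 15), 3))  # 1.753, matching Table A.5
print(round(t_critical(0.05, 22), 3))  # 1.717
```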
The values of tα,ν exhibit regular behavior as we move
across a row or down a column. For fixed ν, tα,ν increases
as α decreases, since we must move farther to the right of
zero to capture area α in the tail.
For fixed α, as ν is increased (i.e., as we look down any
particular column of the t table) the value of tα,ν decreases.
This is because a larger value of ν implies a t distribution
with smaller spread, so it is not necessary to go so far from
zero to capture tail area α.
Furthermore, tα,ν decreases more slowly as ν increases.
Consequently, the table values are shown in increments of
2 between 30 df and 40 df and then jump to ν = 50, 60, 120,
and finally ∞.
Because t∞ is the standard normal curve, the familiar zα
values appear in the last row of the table. The rule of thumb
suggested earlier for use of the large-sample CI (if n > 40)
comes from the approximate equality of the standard
normal and t distributions for ν ≥ 40.
The One-Sample t Confidence Interval
The standardized variable T has a t distribution with n – 1
df, and the area under the corresponding t density curve
between –tα/2,n–1 and tα/2,n–1 is 1 – α (area α/2 lies in each
tail), so
P(–tα/2,n–1 < T < tα/2,n–1) = 1 – α        (7.14)
Expression (7.14) differs from expressions in previous
sections in that T and tα/2,n–1 are used in place of Z and zα/2,
but it can be manipulated in the same manner to obtain a
confidence interval for μ.
Proposition
Let x̄ and s be the sample mean and sample standard
deviation computed from the results of a random sample
from a normal population with mean μ. Then a
100(1 – α)% confidence interval for μ is
(x̄ – tα/2,n–1·s/√n, x̄ + tα/2,n–1·s/√n)        (7.15)
or, more compactly, x̄ ± tα/2,n–1·s/√n.
An upper confidence bound for μ is x̄ + tα,n–1·s/√n,
and replacing + by – in this latter expression gives a lower
confidence bound for μ, both with confidence level
100(1 – α)%.
A Prediction Interval for a Single Future Value
Proposition
A prediction interval (PI) for a single observation to be
selected from a normal population distribution is
x̄ ± tα/2,n–1·s·√(1 + 1/n)        (7.16)
The prediction level is 100(1 – α)%. A lower prediction
bound results from replacing tα/2 by tα and discarding the
+ part of (7.16); a similar modification gives an upper
prediction bound.
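Interval (7.16) differs from the t CI only by the factor √(1 + 1/n), so the PI is always wider. A sketch under the same kind of assumptions (made-up sample; t.025,15 = 2.131 from Table A.5):

```python
import math
from statistics import mean, stdev

def prediction_interval(data, t_crit):
    """PI (7.16) for a single future observation: x-bar ± t * s * sqrt(1 + 1/n)."""
    n = len(data)
    xbar, s = mean(data), stdev(data)
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return (xbar - half, xbar + half)

data = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 9.7,
        10.3, 10.6, 9.6, 10.2, 10.1, 9.9, 10.4, 10.0]  # hypothetical, n = 16
lo, hi = prediction_interval(data, t_crit=2.131)       # t_{.025,15}
print(round(lo, 2), round(hi, 2))  # much wider than the CI for the mean
```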
We skip Sec 7.4