Instruction: Confidence Interval Estimates
In this lecture, we begin making statistical inferences; that is, we begin forming
conclusions about a population based on a sample. In particular, we will use the sample mean,
\( \bar{X} \), to estimate the population mean, µ.
If a single sample statistic is used to estimate a population parameter, it is called a point
estimate. Of course, we would not expect a single sample's mean to equal the population mean
exactly. Statistical methods have been developed that build an interval of values around a point
estimate in such a way that it is highly likely that the population parameter falls within the
interval. We call these intervals confidence interval estimates.
A point estimate is a single sample statistic used to estimate a
population parameter.
A confidence interval estimate is an interval of values for which the
probability that a population parameter is within that interval is
significant.
Note the modifier "significant" in the definition of confidence interval estimate. What counts as
significant is up to the statistician performing the estimation. Thus, the level of significance or
confidence is an arbitrarily chosen probability value. Usually, the level of "significance", called
the confidence level, is some constant c close to one, such as 0.90, 0.95, or 0.99. In any case,
\( Z_c \) is the number such that the area under the standard normal curve falling between
\( -Z_c \) and \( Z_c \) is equal to c. Since area under the standard normal curve corresponds to
probability, we have the desired probability for the chosen confidence level.
If c is the confidence level and \( \bar{X} \) is the point estimate, then
\[ P\left( -Z_c \le Z_{\bar{X}} \le Z_c \right) = c. \]
The value \( Z_c \) is called the critical value for the confidence level of c.
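As a rough illustration of how a critical value can be found without a printed table, the sketch below uses Python's standard-library `statistics.NormalDist`. The function name `critical_value` is a hypothetical helper, not part of the lecture. It relies on the fact that the two tails outside the interval share the remaining area 1 − c equally, so the cumulative area to the left of the critical value is (c + 1)/2.

```python
from statistics import NormalDist


def critical_value(c: float) -> float:
    """Return Z_c: the area under the standard normal curve between -Z_c and Z_c is c.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < c < 1.0:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Area left of Z_c = c + lower tail = c + (1 - c) / 2 = (c + 1) / 2.
    return NormalDist().inv_cdf((c + 1) / 2)


# The three confidence levels mentioned in the text.
for c in (0.90, 0.95, 0.99):
    print(c, round(critical_value(c), 3))
# prints:
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```

Higher confidence levels give larger critical values, which in turn give wider intervals.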
Consider the probability associated with the confidence level c:
\[ P\left( -Z_c \le Z_{\bar{X}} \le Z_c \right) = c. \]
This probability is associated with the Z-score of a single sample statistic. Assume that the
single sample statistic is a mean. Then, we know by the Central Limit Theorem that \( \bar{X} \)
has a distribution that is approximately normal with mean µ and standard deviation
\( \sigma/\sqrt{n} \). Thus a single sample mean can be converted to a standard Z-score as below.
\[ Z_{\bar{X}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \]
Substituting for \( Z_{\bar{X}} \) in the probability for c gives
\[ P\left( -Z_c \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_c \right) = c. \]
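Multiplying all parts of the inequality above by \( -\sigma/\sqrt{n} \), which reverses the
direction of the inequalities, gives
\[ P\left( -Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu - \bar{X} \le Z_c \cdot \frac{\sigma}{\sqrt{n}} \right) = c. \]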
Adding \( \bar{X} \) to all parts of the inequality gives
\[ P\left( \bar{X} - Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_c \cdot \frac{\sigma}{\sqrt{n}} \right) = c. \]
The inequality itself is called the confidence interval for a population mean with known standard
deviation.
The confidence interval for µ with known σ is
\[ \bar{X} - Z_c \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_c \cdot \frac{\sigma}{\sqrt{n}} \]
where \( Z_c \), the critical value for the confidence level of c, is the value
corresponding to the cumulative area of \( \frac{c+1}{2} \) from the standardized normal
distribution.
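One way to see what the probability c in this interval means is to simulate repeated sampling and count how often the interval captures µ. The sketch below is only an illustration under stated assumptions: the population is taken to be normal, and µ = 50, σ = 10, n = 25, the 2,000 trials, and the function name `coverage` are all hypothetical choices, not values from the lecture.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, c, trials, seed=0):
    """Fraction of simulated samples whose known-sigma interval contains mu.

    Hypothetical helper; assumes a normal population with mean mu and
    standard deviation sigma.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_c = NormalDist().inv_cdf((c + 1) / 2)
    # With sigma known, the half-width is the same for every sample.
    margin = z_c * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials


rate = coverage(mu=50.0, sigma=10.0, n=25, c=0.95, trials=2000)
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The printed fraction should land near 0.95. Each individual interval either contains µ or does not; the confidence level describes the long-run share of intervals that do.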
Let's calculate the confidence interval estimate for the population mean with known
population standard deviation. Consider the case of a large national investment firm whose
board wants to know the average amount invested by a client during the previous five years.
Statisticians working for the firm take a random sample of 400 client files for the five-year
period. The mean amount invested for this sample of 400 clients equals $5,250 and the known
standard deviation for all the investments during the five years is $800.00. With this given
information, we can construct a 0.95 confidence interval for µ.
Our arbitrary confidence level, c, is 0.95. Thus, the critical value, \( Z_c \), is the value
corresponding to the cumulative area of \( \frac{0.95 + 1}{2} = 0.975 \), which is 1.96 as shown
on a table like the table below.
Cumulative area A to the left of Z (rows: Z to one decimal place; columns: second decimal place of Z)

  Z     0.00    0.01    0.02    0.03    0.04    0.05    0.06
 1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686
 1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750
 2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803
Now, the confidence interval can be calculated as below.
\[ \$5{,}250 - 1.96 \cdot \frac{\$800}{\sqrt{400}} \le \mu \le \$5{,}250 + 1.96 \cdot \frac{\$800}{\sqrt{400}} \]
\[ \$5{,}250 - \$78.40 \le \mu \le \$5{,}250 + \$78.40 \]
\[ \$5{,}171.60 \le \mu \le \$5{,}328.40 \]
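The interval from $5,171.60 to $5,328.40 is the 0.95 confidence interval for µ. The firm's board
can be 95% confident that the mean amount invested by clients during the previous five years
lies somewhere between $5,171.60 and $5,328.40. That is, if many samples of 400 files were drawn
and an interval were built from each in the same way, about 95% of those intervals would contain µ.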
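The arithmetic of the investment example can be checked with a short script. This is a minimal sketch using only the figures given in the example; the variable names are illustrative, and the critical value is computed with the standard library rather than read from the table, so it is 1.95996... rather than the rounded 1.96.

```python
import math
from statistics import NormalDist

xbar = 5250.0   # sample mean in dollars (from the example)
sigma = 800.0   # known population standard deviation in dollars
n = 400         # sample size
c = 0.95        # confidence level

z_c = NormalDist().inv_cdf((c + 1) / 2)   # critical value, about 1.96
margin = z_c * sigma / math.sqrt(n)       # about 78.40
lower, upper = xbar - margin, xbar + margin

print(f"{lower:.2f} <= mu <= {upper:.2f}")
# prints: 5171.60 <= mu <= 5328.40
```

To the nearest cent the result matches the hand calculation, so rounding the critical value to 1.96 does not change the reported interval here.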
The astute reader will note that the above example is not very practical because it is not
often that the population mean is unknown but the population standard deviation is known. When
the population standard deviation is unknown, we construct confidence intervals using a
distribution that approximates the standardized normal distribution, called the Student's t
distribution. This distribution changes slightly for varying degrees of freedom, a value related
to the sample size.* For any number of degrees of freedom, the Student's t distribution resembles
the standardized normal distribution, more closely as the degrees of freedom increase, and we
can use it to construct a confidence interval for a population mean with unknown standard deviation.
The confidence interval for µ with unknown σ is
\[ \bar{X} - t_{n-1} \cdot \frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1} \cdot \frac{S}{\sqrt{n}} \]
where \( t_{n-1} \), the critical value for the confidence level of c, is the value
corresponding to the upper-tail area of \( \frac{1-c}{2} \) from the t distribution with n − 1
degrees of freedom.
* The degrees of freedom correspond to the number of values in the sample that can vary, that is,
"be free": once the sample mean is fixed, all but the last of the n values can be chosen freely,
and the last is then determined.
Let's calculate a confidence interval estimate for the population mean with unknown
population standard deviation. Consider the case of Atlas International, a company that
manufactures forklifts. The company has a new assembly line and the managers are interested in
knowing the mean lift capacity for each forklift. For obvious expense reasons, the number of
forklifts available for study is limited. Statisticians working for the company select twelve
forklifts from a trial run of the new assembly line. The mean lift capacity for these twelve
forklifts equals three tons and the sample standard deviation is 0.25 tons. With this given
information, we can construct a 0.95 confidence interval for µ.
Our arbitrary confidence level, c, is 0.95. The degrees of freedom are 12 − 1 = 11. Thus,
the critical value, \( t_{11} \), is the value corresponding to the upper-tail area of
\( \frac{1 - 0.95}{2} = 0.025 \), which is 2.201 as shown on a table like the table below.
Critical values t by degrees of freedom (rows) and upper-tail area (columns)

 degrees of
  freedom     0.25    0.10    0.05    0.025   0.01    0.005
    10       0.6998  1.3722  1.8125  2.2281  2.7638  3.1693
    11       0.6974  1.3634  1.7959  2.2010  2.7181  3.1058
    12       0.6955  1.3562  1.7823  2.1788  2.6810  3.0545
Now, the confidence interval can be calculated as below.
\[ 3 - 2.201 \cdot \frac{0.25}{\sqrt{12}} \le \mu \le 3 + 2.201 \cdot \frac{0.25}{\sqrt{12}} \]
\[ 3 - 0.159 \le \mu \le 3 + 0.159 \]
\[ 2.841 \le \mu \le 3.159 \]
The interval from 2.841 tons to 3.159 tons is the 0.95 confidence interval for µ. The assembly
line managers can be 95% confident that the mean lift capacity for the forklifts produced by
the new assembly line lies somewhere between 2.841 tons and 3.159 tons.
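The forklift calculation can be reproduced in a few lines. Python's standard library has no inverse for the t distribution, so this sketch simply copies the upper-tail 0.025 critical values from the t table in the example. The dictionary `T_025` and the function `t_interval` are hypothetical names introduced for illustration.

```python
import math

# Upper-tail 0.025 critical values copied from the example's t table
# (degrees of freedom -> t). Only the three tabulated rows are included.
T_025 = {10: 2.2281, 11: 2.2010, 12: 2.1788}


def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown.

    xbar: sample mean, s: sample standard deviation, n: sample size,
    t_crit: critical value for n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin


n = 12
lower, upper = t_interval(xbar=3.0, s=0.25, n=n, t_crit=T_025[n - 1])
print(f"{lower:.3f} <= mu <= {upper:.3f}")
# prints: 2.841 <= mu <= 3.159
```

With only twelve observations the t critical value 2.201 is noticeably larger than the normal value 1.96, so the interval is wider than a normal-based one would be.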
The previous two examples find confidence intervals for µ , once with σ known, once
with σ unknown. Naturally, confidence intervals can be constructed for other parameters as
well. In particular, we can construct confidence intervals for the population proportion, π , using
the interval detailed in the box below.
The confidence interval for π is
\[ p - Z_c \cdot \sqrt{\frac{p(1-p)}{n}} \le \pi \le p + Z_c \cdot \sqrt{\frac{p(1-p)}{n}} \]
where p is the sample proportion and \( Z_c \), the critical value for the confidence level of c,
is the value corresponding to the cumulative area of \( \frac{c+1}{2} \) from the standardized
normal distribution.
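The lecture gives no worked example for a proportion, so the sketch below applies the boxed formula to made-up data: 40 "successes" in a sample of 200 at a 0.95 confidence level. These numbers and the function name `proportion_interval` are hypothetical and chosen only to illustrate the calculation.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, c):
    """Return (p, lower, upper) for the population proportion.

    Hypothetical helper: p is the sample proportion successes / n, and the
    interval is p -/+ Z_c * sqrt(p * (1 - p) / n).
    """
    p = successes / n
    z_c = NormalDist().inv_cdf((c + 1) / 2)
    margin = z_c * math.sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# Made-up data for illustration only.
p, lower, upper = proportion_interval(successes=40, n=200, c=0.95)
print(f"p = {p:.3f}, {lower:.3f} <= pi <= {upper:.3f}")
# prints: p = 0.200, 0.145 <= pi <= 0.255
```

The point estimate is the sample proportion p itself, and the interval is centered on it.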
Assignment 7
Problems
For the following problems, assume that S ≈ σ as long as the sample size is at least thirty.
#1
A random sample of 40 cups of coffee dispensed from an automatic vending machine showed
that the amount of coffee the machine gave was \( \bar{X} = 7.1 \) ounces with standard deviation
S = 0.3 ounces. Find a 90% confidence interval for the population mean of the amount of
coffee dispensed by the machine.
#2
A hospital's chief inspector wants to estimate the average number of days a patient stays in the
mental health ward. A random sample of 100 patients shows the average stay to be 5.2 days
with standard deviation of 1.9 days. Find a 0.90 confidence interval for the mean number of
days a patient stays in the ward.
#3
In wine making, acidity of the grape is a crucial factor. A pH range of 3.1 to 3.6 is considered
very acceptable. A random sample of twelve bunches of ripe grapes was taken from a particular
vineyard. For each bunch of grapes the acidity as measured by pH level was found to be:
3.2  3.5  3.7  3.3  3.4  3.6  3.6  3.1  3.5  3.2  3.1  3.4
Find a 99% confidence interval for the mean acidity of the entire harvest of grapes from the
vineyard of interest.
#4
A random sample of 100 felony trials in San Diego shows the mean waiting time between
arrest and trial is 173 days with standard deviation 28 days. Find a 0.99 confidence interval for
the mean number of days between arrest and trial.
#5
An anthropologist is studying a large pre-historic communal dwelling in northern Arizona. A
random sample of 127 individual family dwellings showed signs that nineteen belonged to the
Sun Clan. Let π be the probability that a dwelling selected at random was a dwelling of a Sun
Clan member. Find a point estimate for π and a confidence interval estimate for π.