Chapter 9
Estimation
Using a Single Sample
(Confidence Intervals!)
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Branches of Statistics
•Descriptive statistics – what we’ve
done so far.
•Inferential statistics – what we start
today!
Using values obtained from a
sample (statistics) to predict
values for a population
(parameters)
Confidence intervals
Hypothesis testing
Point Estimation
A point estimate of a
population characteristic is a
single number that is based
on sample data and
represents a plausible value
of the characteristic.
Examples of Point Estimates
The percentage of orange Reese’s Pieces in a random
sample of 25.
The average length of the Jellyblubbers in a random sample
of 25.
The median size (diameter) of a random sample of 40 apples.
The standard deviation of the ages of a random sample of
125 college students.
The variance of the Algebra II grades of a random sample of
200 Algebra II students.
Examples of Point
Estimates - Continued
A sample of 200 students at a large
university is selected to estimate the
proportion of students who wear contact lenses.
In this sample 47 wore contact lenses.
Let p = the true proportion of all students at
this university who wear contact lenses.
Consider “success” being a student who
wears contact lenses.
What is the point estimate for p?
Example
The statistic
$\hat{p} = \dfrac{\text{number of successes in the sample}}{n}$
is a reasonable choice for a formula to obtain a point
estimate for p.
Such a point estimate is $\hat{p} = \dfrac{47}{200} = 0.235$
Example
A sample of weights of 34 male freshman
students was obtained.
185
202
197
188
166
148
161
139
214
170
231
180
174
177
283
207
176
194
175
170
184
180
184
176
202
151
189
167
179
178
176
168
177
155
If one wanted to estimate the true mean weight of all
male freshman students, one might use the
sample mean as a point estimate for the true
mean.
sample mean = $\bar{x} = 182.44$
Example – Same Data!
After looking at a histogram and boxplot of the
data (below) you might notice that the data
seems reasonably symmetric with an outlier,
so you might use either the sample median or
a sample trimmed mean as a point estimate.
5% trimmed mean = 180.07
sample median = $\dfrac{177 + 178}{2} = 177.5$
[Figure: histogram and boxplot of the weights, horizontal axis from 140 to 260]
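The three point estimates for the weight data can be checked with a short Python sketch. The helper `trimmed_mean` is a hypothetical name, and the trimming rule is an assumption: 5% of 34 is 1.7, which is rounded here to 2 observations dropped from each end, the choice that reproduces the 180.07 quoted on the slide. Other software may round differently.

```python
from statistics import mean, median

# Weights of the 34 male freshman students listed in the example.
weights = [
    185, 202, 197, 188, 166, 148, 161, 139, 214, 170, 231, 180,
    174, 177, 283, 207, 176, 194, 175, 170, 184, 180, 184, 176,
    202, 151, 189, 167, 179, 178, 176, 168, 177, 155,
]


def trimmed_mean(values, trim_count):
    """Mean after dropping the trim_count smallest and largest values."""
    ordered = sorted(values)
    kept = ordered[trim_count:len(ordered) - trim_count]
    return mean(kept)


sample_mean = mean(weights)
sample_median = median(weights)

# Assumption: 5% of 34 = 1.7 observations, rounded to 2 per tail.
trim_count = round(0.05 * len(weights))
trimmed = trimmed_mean(weights, trim_count)

print(f"sample mean     = {sample_mean:.2f}")
print(f"sample median   = {sample_median:.1f}")
print(f"5% trimmed mean = {trimmed:.2f}")
```

The outlier (283) pulls the sample mean above both the median and the trimmed mean, which is why the slide suggests the more resistant estimates.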
Bias
A statistic with mean value equal to the
value of the population characteristic being
estimated is said to be an unbiased
statistic. A statistic that is not unbiased is
said to be biased.
[Figure: the original distribution, the sampling distribution of an unbiased statistic, and the sampling distribution of a biased statistic]
Bias
Another way to think of bias is this.
When its sampling distribution is symmetric, an unbiased
statistic gives an estimate that is too high the same
proportion of the time that it gives an estimate that is too low!
[Figure: the original distribution, the sampling distribution of an unbiased statistic, and the sampling distribution of a biased statistic]
What Makes a “Good”
Point Estimate?
Given a choice between several unbiased
statistics that could be used for estimating a
population characteristic, the best statistic to
use is the one with the smallest standard
deviation.
Unbiased sampling
distribution with the
smallest standard
deviation, the Best
choice.
Point Estimates - Summary
Unbiased v. Biased
Small standard error is
good.
What is standard
error? The standard
deviation of the
sampling distribution of
sample statistics.
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$
Confidence Intervals
A point estimate by itself is of limited value in
estimating a parameter. Because of
sampling variability we know a point
estimate varies from sample to sample and is
seldom exactly equal to the actual parameter.
Confidence Intervals
So…instead, we find a range of values
that we can say with some degree of
certainty contains the parameter.
A confidence interval for a
population characteristic (parameter)
is an interval of plausible values for the
characteristic. It is constructed so that,
with a chosen degree of confidence,
the value of the characteristic will be
captured inside the interval.
A Better Way!
Confidence Intervals
An interval estimate with an associated
measure of precision.
• I am 95% confident that the true proportion of U.S.
adults who believe that affirmative action programs
should continue is between .499 and .561.
• I am 93% confident that the true mean number of
students per 3rd hour class at MHS is 25 ± 4.
The “± 4” is called the bound on the error.
More Examples
I am 99% confident that the true mean annual
radiation exposure for Diablo Canyon Nuclear
Power Plant Unit 2 workers is between .412 and
.550 rem.
I am 90% confident that in 1993 the true mean
salary for married men who received MBAs in the
late 70s and who were the sole source of family
income was between $121,406.03 and
$127,613.97.
Figure out the bound on the error for each of these.
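One way to work this exercise: when an interval is reported by its endpoints, the bound on the error is half the interval's width and the statistic sits at the midpoint. The sketch below uses hypothetical helper names (`bound_on_error`, `center`) and the endpoints quoted in the two examples above.

```python
def bound_on_error(lower, upper):
    """Half the interval width: distance from the center to either endpoint."""
    return (upper - lower) / 2


def center(lower, upper):
    """Midpoint of the interval, i.e. the statistic it was built around."""
    return (lower + upper) / 2


# Endpoints quoted in the two examples above.
radiation = (0.412, 0.550)          # rem
salary = (121406.03, 127613.97)     # dollars

for name, (lo, hi) in {"radiation": radiation, "salary": salary}.items():
    print(f"{name}: {center(lo, hi):.3f} ± {bound_on_error(lo, hi):.3f}")
```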
Statistic ± Bound on the Error
Getting the statistic is easy.
How do we get the “bound on the error”?
• We’ll call it “error” for short.
• Formula:
Critical value × standard error
For p-hat that means: $z^* \sqrt{\dfrac{p(1-p)}{n}}$
where z* is based on the “confidence
level” (how certain you want to be).
Steps to Creating a Confidence
Interval for Estimating p
To create a confidence interval with a 95% level of
confidence, we take 95% of the area under the normal
curve, right out of the center!
Next, calculate the z-scores that define the
boundaries of this area. These are the critical values.
[Figure: normal curve for p̂ centered at the actual value of π, with the middle 95% shaded between the critical values z*1 and z*2]
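The Concept!
Only one of the point estimates possible is
actually correct. But, if we add or subtract
this much (z* times the standard error) from every value of
p-hat possible, then all of the resulting intervals
$\hat{p} \pm z^* \sqrt{\dfrac{p(1-p)}{n}}$
created by the p-hats in the shaded region will
contain the actual value of π.
[Figure: normal curve for p̂ centered at the actual value of π, with the middle 95% shaded between z*1 and z*2 and sample intervals drawn around individual p̂ values]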
Continuing the Steps
Look up the z-scores that are the boundaries for the
middle 95% of the normal curve.
They are just additive inverses of each other.
This is the critical value for a 95% confidence interval.
Find the standard error for p-hat. The critical value times the
standard error gives the actual distance from π to the boundaries.
[Figure: normal curve for p̂ centered at the actual value of π, with the middle 95% shaded between z*1 and z*2]
But Wait a Minute!!!
We don’t know the value of pi. So use p-hat.
And how do we know this is normal?
• Requirements for creating a z-confidence
interval for pi.
P-hat must come from a random
sample.
The sample size must be large enough
for n(p-hat) ≥ 10 and n(1 – p-hat) ≥ 10
(This allows us to say that p-hat has an
approximately normal distribution and
allows us to use p-hat to estimate pi.)
The sample must be less than 5% (or
10%) of the population.
If these requirements are met then we can
proceed.
Large-sample
Confidence Interval for a
Population Proportion
$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
[Figure: normal curve centered at p̂ with the middle 95% shaded between z1 and z2]
Confidence Level
The confidence level associated with a
confidence interval estimate is the success
rate of the method used to construct the
interval.
Even though it is written as
a percentage, the
confidence level is not the
probability that one particular computed interval contains the parameter!
So Now…what does confidence mean in
each of these cases?
I am 99% confident that the true proportion of
MHS students who are “middle children” is
between .412 and .550.
I am 90% confident that in 1993 the true
proportion of married men who received MBAs in
the late 70s and who were the sole source of
family income was between .12 and .185.
What were the p-hats in each of these cases?
Reese’s Pieces
Our class did an M&M lab. In past years
we have done similar labs with Reese’s
Pieces. Either way results suggest that
even though sample values vary
depending on which sample you happen
to pick, there seems to be a pattern to the
variation. We need more samples to
investigate this pattern more thoroughly,
however. Since it is time-consuming (and
possibly fattening) to literally sample
candies, we will use the TI-83 calculator
to simulate the process.
Reese’s Pieces
To perform these simulations we need to
suppose that we know the actual value of
the parameter. Let us suppose that 45%
of the population is orange.
Use TI-83 calculator drawing 500 samples of
25 candies each. (Pretend that this is really
500 students, each taking 25 candies and
counting the number of orange ones.)
randBin(25, .45, 500)→L1
L1/25→L2
(This will take time and battery power.) Then look at a display of
the sample proportions of orange obtained. And sketch.
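For readers without a TI-83, the same simulation can be sketched in Python. This is an illustrative stand-in for the calculator commands, not part of the original lab: the function name and the seed are hypothetical, and each candy is treated as orange independently with probability .45.

```python
import random
from statistics import mean, stdev


def simulate_sample_proportions(n_candies=25, p_orange=0.45,
                                n_samples=500, seed=1):
    """Mimic randBin(25, .45, 500)→L1 followed by L1/25→L2."""
    rng = random.Random(seed)  # seed is arbitrary; fixed only for repeatability
    proportions = []
    for _ in range(n_samples):
        # Count the orange candies in one sample of n_candies.
        orange = sum(1 for _ in range(n_candies) if rng.random() < p_orange)
        proportions.append(orange / n_candies)
    return proportions


proportions = simulate_sample_proportions()
print("mean of sample proportions:", round(mean(proportions), 3))
print("std dev of sample proportions:", round(stdev(proportions), 3))
```

The mean should land near .45 and the standard deviation near the theoretical value $\sqrt{0.45(0.55)/25} \approx 0.0995$, though both vary from run to run when the seed changes.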
Reese’s Pieces
Record the mean and standard deviation of
these sample proportions.
Roughly speaking, are there more sample
proportions close to the population proportion
(which we said to be .45) than there are far
from it?
Phone Home!*
Reese’s Pieces
Let us quantify the previous question. Use the TI-83
calculator to count how many of the 500 sample
proportions are within ± .10 of .45 (i.e. between .35
and .55). Then repeat for within ± .20 and for within ±
.30.
SortA(L2)
Record the results:
                       Number of the 500     Percentage of these
                       sample proportions    sample proportions
within ± .10 of .45
within ± .20 of .45
within ± .30 of .45
*E.T. reference
Reese’s Pieces
Suppose that each of the 500 imaginary
students was to estimate the population
proportion of orange candies by going a
distance of .20 on either side of her/his
sample proportion. What percentage of
the 500 students would capture the actual
population proportion (.45) within this
interval?
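The question can be explored with a small Python sketch. A student's interval (sample proportion ± distance) captures .45 exactly when the sample proportion is within that distance of .45, so the capture rate is the same count as in the previous table. The names `sample_proportion` and `capture_rate` and the seed are hypothetical; the simulation assumes the population proportion really is .45.

```python
import random

TRUE_P = 0.45  # assumed population proportion of orange candies


def sample_proportion(rng, n_candies=25, p_orange=TRUE_P):
    """Proportion of orange candies in one simulated sample."""
    orange = sum(1 for _ in range(n_candies) if rng.random() < p_orange)
    return orange / n_candies


def capture_rate(proportions, distance, true_p=TRUE_P):
    """Fraction of intervals p_hat ± distance that contain true_p."""
    captured = sum(1 for p_hat in proportions
                   if abs(p_hat - true_p) <= distance)
    return captured / len(proportions)


rng = random.Random(2)  # arbitrary seed, fixed only for repeatability
proportions = [sample_proportion(rng) for _ in range(500)]

rates = {d: capture_rate(proportions, d) for d in (0.10, 0.20, 0.30)}
for distance, rate in rates.items():
    print(f"within ± {distance:.2f}: {rate:.1%} of the 500 students")
```

With samples of 25, going ± .20 captures the true proportion for roughly 95% of the simulated students, which is the idea behind a 95% confidence interval.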
Reese’s Pieces
Forgetting that you actually (think you) know the
population proportion of orange candies to be .45,
suppose that you were one of these 500 imaginary
students. Would you have any way of knowing
definitively whether your sample proportion was
within .20 of the population proportion? Would you
be reasonably “confident” that your sample proportion
was within .20 of the population proportion?
Explain why.
The 95% Confidence Interval
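When n is large, a 95% confidence
interval for p is
$\left( \hat{p} - 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \right)$
where $\hat{p}$ is the sample proportion.
The endpoints of the interval are often
abbreviated by
$\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
where - gives the lower endpoint and + the
upper endpoint.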
Example
For a project, a student randomly
sampled 182 other students at a large
university to determine if the majority of
students were in favor of a proposal to
build a field house. He found that 75 were
in favor of the proposal.
p = the true proportion of students that favor the proposal.
n = 182, x = 75
$\hat{p} = \dfrac{75}{182} = 0.4121$
Requirements:
1. It is given to be a random sample.
2. $n\hat{p} = 182(0.4121) = 75 \ge 10$ and
$n(1-\hat{p}) = 182(0.5879) = 107 \ge 10$
3. It is reasonable to assume that 182 students is less than
5% of the students attending a large
university (this requires at least 182/.05 = 3640 students).
4. I will create a 95% z-confidence interval for p.
$\hat{p} \pm 1.96\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} = 0.4121 \pm 1.96\sqrt{\dfrac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151$
I am 95% confident that the true proportion of students
that favor the proposal is between 0.341 and 0.484.
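The calculation above can be reproduced with a short Python sketch. The function name `proportion_ci` is hypothetical; it checks only the sample-size requirement, since randomness of the sample and the 5% condition cannot be verified from the counts alone.

```python
from math import sqrt


def proportion_ci(successes, n, z_star=1.96):
    """Large-sample z confidence interval for a population proportion."""
    p_hat = successes / n
    # Requirement 2: at least 10 successes and 10 failures in the sample.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the large-sample z interval")
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    bound = z_star * standard_error
    return p_hat - bound, p_hat + bound


# Field house example: 75 of 182 students in favor, 95% confidence.
lower, upper = proportion_ci(75, 182)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies below 0.5, the data do not support the claim that a majority of students favor the proposal.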
Confidence Interval for pi on the
TI-84
So…The General Confidence
Interval Formula for a population
proportion
The general formula for a confidence
interval for a population proportion p
when
1. $\hat{p}$ is the sample proportion from a
random sample,
2. the sample size n is large
($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$), and
3. n < .05N (the sample is less than 5% of a population of size N)
is given by
$\hat{p} \pm (z\text{ critical value})\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Finding a z Critical Value
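Finding a z critical value for a 98%
confidence interval.
A central area of 0.98 leaves 0.01 in each tail, so the
cumulative area to the left of the critical value is 0.9900.
Looking up the cumulative area of 0.9900 in the
body of the table we find z = 2.33
How would we do this on the calculator?
[Figure: standard normal curve with the critical value 2.33 marked]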
Some Common Critical Values
Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29
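These critical values can also be computed rather than looked up. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the function name `z_critical` is a hypothetical helper. It splits the area outside the confidence level equally between the two tails and asks for the z-score with the remaining cumulative area to its left.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """z-score bounding the central `confidence_level` of the standard normal."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999):
    print(f"{level:.1%}  z* = {z_critical(level):.3f}")
```

Rounded to two decimals (three for the 90% level), the results agree with the tabulated values.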
Terminology Review
The standard error of a statistic is the
estimated standard deviation of the statistic.
For sample proportions, the standard deviation is
$\sqrt{\dfrac{p(1-p)}{n}}$
where p is the population proportion.
This means that the standard error of the sample
proportion is
$\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
Review Terminology
The bound on error of estimation, B,
associated with a 95% confidence interval is
(1.96)·(standard error of the statistic).
The bound on error of estimation, B, associated
with a confidence interval is
(z critical value)·(standard error of the statistic).
Sample Size
The sample size required to estimate a
population proportion p to within an amount
B with 95% confidence is
$n = p(1-p)\left(\dfrac{1.96}{B}\right)^2$
The value of p may be estimated by prior
information. If no prior information is available,
use p = 0.5 in the formula to obtain a
conservatively large value for n.
Generally one rounds the result up to the nearest integer.
Sample Size Calculation
Example
A TV executive would like to find a 95%
confidence interval estimate within 0.03 for
the proportion of all households that watch
NYPD Blue regularly. How large a sample is
needed if a prior estimate for p was 0.15?
We have B = 0.03 and the prior estimate of p = 0.15
$n = p(1-p)\left(\dfrac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\dfrac{1.96}{0.03}\right)^2 = 544.2$
A sample of 545 or more would be needed.
Sample Size Calculation Example revisited
Suppose a TV executive would like to find a 95%
confidence interval estimate within 0.03 for the
proportion of all households that watch NYPD
Blue regularly. How large a sample is needed if
we have no reasonable prior estimate for p?
We have B = 0.03 and should use p = 0.5 in
the formula.
$n = p(1-p)\left(\dfrac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\dfrac{1.96}{0.03}\right)^2 = 1067.1$
The required sample size is now 1068.
Notice that a reasonable ballpark estimate for p
can lower the needed sample size.
Another Example
A college professor wants to estimate the
proportion of students at a large university who
favor building a field house with a 99%
confidence interval accurate to within 0.02. If one of
his students performed a preliminary study and
estimated p to be 0.412, how large a sample
should he take?
We have B = 0.02, a prior estimate p = 0.412 and we
should use the z critical value 2.58 (for a 99%
confidence interval)
$n = p(1-p)\left(\dfrac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\dfrac{2.58}{0.02}\right)^2 = 4031.4$
The required sample size is 4032.
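The three sample-size calculations above follow the same pattern, which a short Python sketch makes explicit. The function name `required_sample_size` is hypothetical; it defaults to the conservative p = 0.5 when no prior estimate is supplied and always rounds up.

```python
from math import ceil


def required_sample_size(bound, p_guess=0.5, z_star=1.96):
    """Smallest n with z* · sqrt(p(1-p)/n) <= bound, using a prior guess for p."""
    return ceil(p_guess * (1 - p_guess) * (z_star / bound) ** 2)


# NYPD Blue example with a prior estimate of 0.15.
print(required_sample_size(0.03, p_guess=0.15))                # 545
# Same example with no prior estimate (conservative p = 0.5).
print(required_sample_size(0.03))                              # 1068
# Field house example at 99% confidence (z* = 2.58).
print(required_sample_size(0.02, p_guess=0.412, z_star=2.58))  # 4032
```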
One-Sample z Confidence
Interval for μ
If
1. $\bar{x}$ is the sample mean from a random
sample,
2. The sample size n is large (generally
n ≥ 30), and
3. σ, the population standard deviation, is
known then the general formula for a
confidence interval for a population mean μ
is given by
$\bar{x} \pm (z\text{ critical value})\dfrac{\sigma}{\sqrt{n}}$
One-Sample z Confidence
Interval for μ
If n is small (generally n < 30) but it is
reasonable to believe that the distribution of
values in the population is normal, a confidence
interval for μ (when σ is known) is given by the same formula:
$\bar{x} \pm (z\text{ critical value})\dfrac{\sigma}{\sqrt{n}}$
Notice that this formula works when σ is known and
either
1. n is large (generally n ≥ 30) or
2. The population distribution is normal (any
sample size).
Example
A certain filling machine has a true
population standard deviation σ = 0.228
ounces when used to fill catsup bottles. A
random sample of 36 “6 ounce” bottles of
catsup was selected from the output from
this machine and the sample mean was
6.018 ounces.
Find a 90% confidence interval estimate for the
true mean fill of catsup from this machine.
Example I (continued)
$\bar{x} = 6.018,\ \sigma = 0.228,\ n = 36$
The z critical value is 1.645
$\bar{x} \pm (z\text{ critical value})\dfrac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645\,\dfrac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$
90% Confidence Interval
(5.955, 6.081)
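The same interval can be computed with a minimal Python sketch. The function name `z_interval_for_mean` is hypothetical, and the critical value 1.645 is taken from the table of common critical values rather than computed.

```python
from math import sqrt


def z_interval_for_mean(x_bar, sigma, n, z_star):
    """One-sample z confidence interval for a mean when sigma is known."""
    bound = z_star * sigma / sqrt(n)
    return x_bar - bound, x_bar + bound


# Catsup example: x-bar = 6.018, sigma = 0.228, n = 36, 90% confidence.
lower, upper = z_interval_for_mean(6.018, 0.228, 36, z_star=1.645)
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```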
Unknown σ
[All Size Samples]
W. S. Gosset, an English statistician who worked at the
Guinness brewery in Dublin, developed the techniques and derived the Student’s
t distributions that describe the behavior of
$\dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
t Distributions
If X is a normally distributed random variable, the statistic
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
has a “t” distribution with n - 1 degrees of freedom.
t Distributions
[Figure: comparison of normal and t distributions, showing t curves for df = 2, 5, 10, and 25 alongside the standard normal curve, horizontal axis from -4 to 4]
t Distributions
Notice: As df increases, t distributions
approach the standard normal
distribution.
Since each t distribution would require a
table similar to the standard normal table,
we usually only create a table of critical
values for the t distributions.
t critical values
Central area captured:    0.80    0.90    0.95    0.98    0.99   0.998   0.999
Confidence level:          80%     90%     95%     98%     99%   99.8%   99.9%
Degrees of freedom
  1                       3.08    6.31   12.71   31.82   63.66  318.29  636.58
  2                       1.89    2.92    4.30    6.96    9.92   22.33   31.60
  3                       1.64    2.35    3.18    4.54    5.84   10.21   12.92
  4                       1.53    2.13    2.78    3.75    4.60    7.17    8.61
  5                       1.48    2.02    2.57    3.36    4.03    5.89    6.87
  6                       1.44    1.94    2.45    3.14    3.71    5.21    5.96
  7                       1.41    1.89    2.36    3.00    3.50    4.79    5.41
  8                       1.40    1.86    2.31    2.90    3.36    4.50    5.04
  9                       1.38    1.83    2.26    2.82    3.25    4.30    4.78
 10                       1.37    1.81    2.23    2.76    3.17    4.14    4.59
 11                       1.36    1.80    2.20    2.72    3.11    4.02    4.44
 12                       1.36    1.78    2.18    2.68    3.05    3.93    4.32
 13                       1.35    1.77    2.16    2.65    3.01    3.85    4.22
 14                       1.35    1.76    2.14    2.62    2.98    3.79    4.14
 15                       1.34    1.75    2.13    2.60    2.95    3.73    4.07
 16                       1.34    1.75    2.12    2.58    2.92    3.69    4.01
 17                       1.33    1.74    2.11    2.57    2.90    3.65    3.97
 18                       1.33    1.73    2.10    2.55    2.88    3.61    3.92
 19                       1.33    1.73    2.09    2.54    2.86    3.58    3.88
 20                       1.33    1.72    2.09    2.53    2.85    3.55    3.85
 21                       1.32    1.72    2.08    2.52    2.83    3.53    3.82
 22                       1.32    1.72    2.07    2.51    2.82    3.50    3.79
 23                       1.32    1.71    2.07    2.50    2.81    3.48    3.77
 24                       1.32    1.71    2.06    2.49    2.80    3.47    3.75
 25                       1.32    1.71    2.06    2.49    2.79    3.45    3.73
 26                       1.31    1.71    2.06    2.48    2.78    3.43    3.71
 27                       1.31    1.70    2.05    2.47    2.77    3.42    3.69
 28                       1.31    1.70    2.05    2.47    2.76    3.41    3.67
 29                       1.31    1.70    2.05    2.46    2.76    3.40    3.66
 30                       1.31    1.70    2.04    2.46    2.75    3.39    3.65
 40                       1.30    1.68    2.02    2.42    2.70    3.31    3.55
 60                       1.30    1.67    2.00    2.39    2.66    3.23    3.46
120                       1.29    1.66    1.98    2.36    2.62    3.16    3.37
z critical values         1.28   1.645    1.96    2.33    2.58    3.09    3.29
One-Sample t Procedures
Suppose that an SRS of size n is drawn from a
population having unknown mean μ. The general
confidence limits are
$\bar{x} \pm (t\text{ critical value})\dfrac{s}{\sqrt{n}}$
and the general confidence interval for μ is
$\left( \bar{x} - (t\text{ critical value})\dfrac{s}{\sqrt{n}},\ \bar{x} + (t\text{ critical value})\dfrac{s}{\sqrt{n}} \right)$
Confidence Interval Example
Ten randomly selected shut-ins were each
asked to list how many hours of television
they watched per week. The results are
82
66
90
84
75
88
80
94
110
91
Find a 90% confidence interval estimate for
the true mean number of hours of
television watched per week by shut-ins.
Confidence Interval Example
Calculating the sample mean and standard
deviation we have n = 10, $\bar{x} = 86$, s = 11.842
We find the critical t value of 1.833 by looking on the
t table in the row corresponding to df = 9, in the
column with bottom label 90%. The
confidence interval for μ is
$\bar{x} \pm t^* \dfrac{s}{\sqrt{n}} = 86 \pm (1.833)\dfrac{11.842}{\sqrt{10}} = 86 \pm 6.86$
(79.14, 92.86)
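A short Python sketch reproduces this interval from the raw data. Python's standard library has no t quantile function, so the critical value 1.833 (df = 9, 90% confidence) is entered by hand as quoted in the example; the function name `t_interval_for_mean` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly hours of television reported by the ten shut-ins.
hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]


def t_interval_for_mean(data, t_star):
    """One-sample t confidence interval; t_star must match df = len(data) - 1."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    bound = t_star * s / sqrt(n)
    return x_bar - bound, x_bar + bound


# t critical value for df = 9 and 90% confidence, taken from the example.
lower, upper = t_interval_for_mean(hours, t_star=1.833)
print(f"x-bar = {mean(hours)}, s = {stdev(hours):.3f}")
print(f"90% CI: ({lower:.2f}, {upper:.2f})")
```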
Confidence Interval Example
To calculate the confidence interval, we had
to make the assumption that weekly viewing
times are normally
distributed. Consider the normal plot of the
10 data points produced with Minitab that is
given on the next slide.
Confidence Interval Example
Notice that the normal plot looks reasonably
linear so it is reasonable to assume that the
number of hours of television watched per week
by shut-ins is normally distributed.
[Figure: Minitab normal probability plot, Probability (.001 to .999) against Hours (70 to 110)]
Anderson-Darling Normality Test
A-Squared: 0.226
P-Value: 0.753
Average: 86
StDev: 11.8415
N: 10
Typically, if the p-value is more than 0.05, we do not reject
the assumption that the distribution is normal.