Chapter 9
Estimation Using a Single
Sample
Point Estimation
A point estimate of a population
characteristic is a single number that is
based on sample data and represents a
plausible value of the characteristic.
Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.
Example
A sample of 200 students at a large university is
selected to estimate the proportion of students that
wear contact lenses. In this sample 47 wear contact
lenses.
Let p = the true proportion of all students at this
university that wear contact lenses. Consider a
“success” to be a student who wears contact lenses.
The statistic
$\hat{p} = \frac{\text{number of successes in the sample}}{n}$
is a reasonable choice for a formula to obtain a point
estimate for p.
Such a point estimate is $\hat{p} = \frac{47}{200} = 0.235$.
Example
A sample of weights of 34 male freshman students was
obtained.
185  170  188  184  161  151  170
179  174  176  207  155  175  197
180  148  202  214  167  180  178
283  177  194  202  184  166  176
139  189  231  177  168  176
To estimate the true mean weight of all male
freshman students, you might use the sample mean as a
point estimate.
sample mean $= \bar{x} = 182.44$
Example
After looking at a histogram and boxplot of the data
you might notice that the data seem reasonably
symmetric with an outlier, so you might use either the
sample median or a sample trimmed mean as a point
estimate.
[Figure: histogram and boxplot of the weights, horizontal axis from 140 to 260]
5% trimmed mean (calculated with Minitab) = 180.07
sample median $= \frac{177 + 178}{2} = 177.5$
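The three point estimates for the weight data can be checked with a short Python sketch. The helper `trimmed_mean` is a hypothetical name, and it assumes the trimming rule attributed to Minitab (drop the nearest whole number to 5% of n from each end); that assumption is consistent with the 180.07 reported on the slide.

```python
from statistics import mean, median

# Weights of the 34 male freshman students from the example.
weights = [
    185, 170, 188, 184, 161, 151, 170, 179, 174, 176, 207, 155,
    175, 197, 180, 148, 202, 214, 167, 180, 178, 283, 177, 194,
    202, 184, 166, 176, 139, 189, 231, 177, 168, 176,
]

def trimmed_mean(values, proportion):
    """Mean after dropping round(proportion * n) values from each end.

    Assumption: the count trimmed from each end is rounded to the
    nearest integer (0.05 * 34 = 1.7, so 2 values per end here).
    """
    ordered = sorted(values)
    k = round(proportion * len(ordered))
    return mean(ordered[k:len(ordered) - k])

print(round(mean(weights), 2))                # sample mean
print(median(weights))                        # sample median
print(round(trimmed_mean(weights, 0.05), 2))  # 5% trimmed mean
```

The trimmed mean and the median sit below the sample mean because both reduce the influence of the outlier at 283.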
Bias
A statistic with mean value equal to the
value of the population characteristic
being estimated is said to be an
unbiased statistic. A statistic that is not
unbiased is said to be biased.
Criteria
Given a choice between several unbiased
statistics that could be used for estimating
a population characteristic, the best
statistic to use is the one with the smallest
standard deviation.
Large-sample Confidence Interval
for a Population Proportion
A confidence interval for a population
characteristic is an interval of plausible
values for the characteristic. It is
constructed so that, with a chosen degree
of confidence, the value of the
characteristic will be captured inside the
interval.
Confidence Level
The confidence level associated with a
confidence interval estimate is the
success rate of the method used to
construct the interval.
Recall
For the sampling distribution of $\hat{p}$,
$\mu_{\hat{p}} = p$ and $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$,
and for large* n the sampling distribution of $\hat{p}$ is approximately
normal.
Specifically, when n is large*, the statistic
$\hat{p}$ has a sampling distribution that is
approximately normal with mean p and
standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
* np ≥ 10 and n(1-p) ≥ 10
Some considerations
Approximately 95% of all large samples will
result in a value of $\hat{p}$ that is within
$1.96\,\sigma_{\hat{p}} = 1.96\sqrt{\frac{p(1-p)}{n}}$
of the true population proportion p.
Some considerations
Equivalently, this means that for 95% of all
possible samples, p will be in the interval
$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}}$ to $\hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$.
Since p is unknown and n is large, we estimate
$\sqrt{\frac{p(1-p)}{n}}$ with $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
This interval can be used as long as
$n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
The 95% Confidence Interval
When n is large, a 95% confidence interval for p
is
$\left( \hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right)$
The endpoints of the interval are often
abbreviated by
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where - gives the lower endpoint and + the
upper endpoint.
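The idea that the confidence level is the success rate of the method can be illustrated by simulation. The sketch below is illustrative only: the true proportion 0.3, the sample size 200, the number of trials, and the function names are all assumptions chosen for the demonstration, not values from the slides.

```python
import random
from math import sqrt

def interval_95(successes, n, z=1.96):
    """Large-sample 95% confidence interval for a proportion."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated samples whose interval captures true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval_95(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

# Hypothetical population: true p = 0.3, samples of size 200.
print(coverage(true_p=0.3, n=200, trials=2000))
```

The printed fraction should land close to 0.95, though it varies with the seed and with how well the normal approximation holds for the chosen p and n.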
Example
For a project, a student randomly
sampled 182 other students at a large
university to determine if the majority of
students were in favor of a proposal to
build a field house. He found that 75 were
in favor of the proposal.
Let p = the true proportion of students that
favor the proposal.
Example - continued
$\hat{p} = \frac{75}{182} = 0.4121$
Since $n\hat{p} = 182(0.4121) = 75 > 10$ and
$n(1-\hat{p}) = 182(0.5879) = 107 > 10$, we can use
the formulas given on the previous slide to
find a 95% confidence interval for p.
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4121 \pm 1.96\sqrt{\frac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151$
The 95% confidence interval for p is
(0.341, 0.484).
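The field house calculation can be reproduced with a minimal Python sketch. The function name `proportion_ci` and its error handling are illustrative choices, not part of the original slides; the large-sample check mirrors the condition used in the example.

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    # Large-sample condition from the slides, checked with p_hat.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the normal approximation")
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 75 of 182 sampled students favored the proposal.
low, high = proportion_ci(75, 182)
print(f"({low:.3f}, {high:.3f})")
```

Because the whole interval lies below 0.5, the sample gives no support to the claim that a majority of students favor the proposal.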
The General Confidence Interval
The general formula for a confidence interval for
a population proportion p when
1. $\hat{p}$ is the sample proportion from a random
sample, and
2. The sample size n is large ($n\hat{p} \ge 10$ and
$n(1-\hat{p}) \ge 10$) is
$\hat{p} \pm (z\ \text{critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Finding a z Critical Value
Finding a z critical value for a 98%
confidence interval.
A central area of 0.98 leaves 0.01 in each tail, so the
z critical value has cumulative area 0.9900 to its left.
Looking up the cumulative area of 0.9900
in the body of the table we find z = 2.33.
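Instead of a table lookup, the same critical value can be computed from the inverse of the standard normal cumulative distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an illustrative choice.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that captures the given central area under the standard normal curve."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```

For 98% confidence the function returns a value that rounds to 2.33, matching the table lookup.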
Some Common Critical Values
Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29
Terminology
The standard error of a statistic is the
estimated standard deviation of the
statistic.
For sample proportions, the standard
deviation is $\sqrt{\frac{p(1-p)}{n}}$.
This means that the standard error of the
sample proportion is $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
Terminology
The bound on error of estimation, B,
associated with a 95% confidence
interval is
(1.96)(standard error of the statistic).
The bound on error of estimation, B,
associated with a confidence interval is
(z critical value)·(standard error of the statistic).
Sample Size
The sample size required to estimate a
population proportion p to within an
amount B with 95% confidence is
$n = p(1-p)\left(\frac{1.96}{B}\right)^2$
The value of p may be estimated by prior
information. If no prior information is
available, use p = 0.5 in the formula to
obtain a conservatively large value for n.
Generally one rounds the result up to the nearest integer.
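The sample size formula follows from setting the bound on error of estimation equal to B and solving for n. A short derivation, assuming the 95% critical value 1.96 and the standard deviation of the sample proportion given earlier:

```latex
\begin{align*}
B   &= 1.96\sqrt{\frac{p(1-p)}{n}} \\
B^2 &= (1.96)^2\,\frac{p(1-p)}{n} \\
n   &= p(1-p)\left(\frac{1.96}{B}\right)^2
\end{align*}
```

Since p(1-p) is largest when p = 0.5, using p = 0.5 gives the largest (most conservative) value of n.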
Sample Size Calculation Example
Suppose a TV executive would like to find a
95% confidence interval estimate within
0.03 for the proportion of all households
that watch NYPD Blue regularly. How
large a sample is needed if a prior
estimate for p was 0.15?
We have B = 0.03 and the prior estimate
of p = 0.15.
$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2$
A sample of 545 or more would be needed.
Sample Size Calculation Example revisited
Suppose a TV executive would like to find a
95% confidence interval estimate within 0.03
for the proportion of all households that watch
NYPD Blue regularly. How large a sample is
needed if we have no reasonable prior
estimate for p?
We have B = 0.03 and should use p = 0.5 in
the formula.
$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1$
The required sample size is now 1068.
Notice that a reasonable ballpark estimate for p
can lower the needed sample size.
Another Example
A college professor wants to estimate the
proportion of students at a large university
who favor building a field house with a 99%
confidence interval accurate to within 0.02. If one of
his students performed a preliminary study
and estimated p to be 0.412, how large a
sample should he take?
We have B = 0.02, a prior estimate p = 0.412,
and we should use the z critical value 2.58 (for
a 99% confidence interval).
$n = p(1-p)\left(\frac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4$
The required sample size is 4032.
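The three sample size calculations above can be sketched with a single function. The name `sample_size` and its default arguments are illustrative assumptions; the function applies the formula from the slides and rounds up.

```python
from math import ceil

def sample_size(p, bound, z=1.96):
    """Smallest n that estimates a proportion to within `bound`.

    p is a prior estimate of the proportion (use 0.5 if none is
    available); z is the critical value for the confidence level.
    """
    return ceil(p * (1 - p) * (z / bound) ** 2)

print(sample_size(0.15, 0.03))           # TV example with prior estimate
print(sample_size(0.5, 0.03))            # TV example with no prior estimate
print(sample_size(0.412, 0.02, z=2.58))  # field house example, 99% confidence
```

Rounding up with `ceil` reflects the rule that the result is rounded up so that the bound B is not exceeded.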
One-Sample z Confidence Interval for μ
The general formula for a confidence interval for
a population mean μ when
1. $\bar{x}$ is the sample mean from a random
sample,
2. The sample size n is large (generally n ≥ 30),
and
3. σ, the population standard deviation, is
known is
$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}}$
One-Sample z Confidence Interval for μ
If n is small (generally n < 30) but it is reasonable
to believe that the distribution of values in the
population is normal, a confidence interval for μ
(when σ is known) is
$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}}$
Notice that this formula works when σ is known
and either
1. n is large (generally n ≥ 30) or
2. The population distribution is normal (any
sample size).
Example
A certain filling machine has a true population
standard deviation σ = 0.228 ounces when
used to fill catsup bottles. A random sample of
36 “6 ounce” bottles of catsup was selected
from the output of this machine and the
sample mean was $\bar{x} = 6.018$ ounces.
Find a 90% confidence interval estimate for the
true mean fill of catsup from this machine.
Example I (continued)
$\bar{x} = 6.018,\ \sigma = 0.228,\ n = 36$
The z critical value is 1.645.
$\bar{x} \pm (z\ \text{critical value})\frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645\,\frac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$
90% Confidence Interval
(5.955, 6.081)
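The catsup calculation can be reproduced with a minimal sketch. The function name `z_interval_mean` is an illustrative choice, and the critical value 1.645 is taken from the table of common critical values rather than computed.

```python
from math import sqrt

def z_interval_mean(x_bar, sigma, n, z):
    """Confidence interval for a mean when sigma is known."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# 36 bottles, sample mean 6.018 oz, known sigma 0.228 oz, 90% confidence.
low, high = z_interval_mean(6.018, 0.228, 36, z=1.645)
print(f"({low:.3f}, {high:.3f})")
```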
Unknown σ - Small Size Samples
[All Size Samples]
W. S. Gosset, an English statistician working at the
Guinness brewery in Dublin, developed the techniques and
derived the Student’s t distributions that
describe the behavior of $\frac{\bar{x} - \mu}{s/\sqrt{n}}$.
t Distributions
If X is a normally distributed random
variable, the statistic
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
follows a t distribution with df = n-1
(degrees of freedom).
t Distributions
The statistic $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$ is fairly robust,
and the results are reasonable for moderate sample
sizes (15 and up) if the distribution of x is reasonably centrally
weighted. It is also quite reasonable for large sample
sizes for distributional patterns (of x) that are not
extremely skewed.
t distribution
[Figure: Comparison of normal and t distributions. Density curves for t with df = 2, 5, 10, and 25 and for the standard normal, horizontal axis from -4 to 4]
t Distributions Continued
Notice: As df increases, t distributions
approach the standard normal distribution.
Since each t distribution would require a table
similar to the standard normal table, we usually
only create a table of critical values for the t
distributions.
Central area captured:   0.80    0.90    0.95    0.98    0.99    0.998   0.999
Degrees of freedom
  1                      3.08    6.31   12.71   31.82   63.66  318.29  636.58
  2                      1.89    2.92    4.30    6.96    9.92   22.33   31.60
  3                      1.64    2.35    3.18    4.54    5.84   10.21   12.92
  4                      1.53    2.13    2.78    3.75    4.60    7.17    8.61
  5                      1.48    2.02    2.57    3.36    4.03    5.89    6.87
  6                      1.44    1.94    2.45    3.14    3.71    5.21    5.96
  7                      1.41    1.89    2.36    3.00    3.50    4.79    5.41
  8                      1.40    1.86    2.31    2.90    3.36    4.50    5.04
  9                      1.38    1.83    2.26    2.82    3.25    4.30    4.78
 10                      1.37    1.81    2.23    2.76    3.17    4.14    4.59
 11                      1.36    1.80    2.20    2.72    3.11    4.02    4.44
 12                      1.36    1.78    2.18    2.68    3.05    3.93    4.32
 13                      1.35    1.77    2.16    2.65    3.01    3.85    4.22
 14                      1.35    1.76    2.14    2.62    2.98    3.79    4.14
 15                      1.34    1.75    2.13    2.60    2.95    3.73    4.07
 16                      1.34    1.75    2.12    2.58    2.92    3.69    4.01
 17                      1.33    1.74    2.11    2.57    2.90    3.65    3.97
 18                      1.33    1.73    2.10    2.55    2.88    3.61    3.92
 19                      1.33    1.73    2.09    2.54    2.86    3.58    3.88
 20                      1.33    1.72    2.09    2.53    2.85    3.55    3.85
 21                      1.32    1.72    2.08    2.52    2.83    3.53    3.82
 22                      1.32    1.72    2.07    2.51    2.82    3.50    3.79
 23                      1.32    1.71    2.07    2.50    2.81    3.48    3.77
 24                      1.32    1.71    2.06    2.49    2.80    3.47    3.75
 25                      1.32    1.71    2.06    2.49    2.79    3.45    3.73
 26                      1.31    1.71    2.06    2.48    2.78    3.43    3.71
 27                      1.31    1.70    2.05    2.47    2.77    3.42    3.69
 28                      1.31    1.70    2.05    2.47    2.76    3.41    3.67
 29                      1.31    1.70    2.05    2.46    2.76    3.40    3.66
 30                      1.31    1.70    2.04    2.46    2.75    3.39    3.65
 40                      1.30    1.68    2.02    2.42    2.70    3.31    3.55
 60                      1.30    1.67    2.00    2.39    2.66    3.23    3.46
120                      1.29    1.66    1.98    2.36    2.62    3.16    3.37
z critical values        1.28    1.645   1.96    2.33    2.58    3.09    3.29
Confidence level:        80%     90%     95%     98%     99%     99.8%   99.9%
One-Sample t Procedures
Suppose that an SRS of size n is drawn from a
population having unknown mean μ. The general
confidence limits are
$\bar{x} \pm (t\ \text{critical value})\frac{s}{\sqrt{n}}$
and the general confidence interval for μ is
$\left( \bar{x} - (t\ \text{critical value})\frac{s}{\sqrt{n}},\ \bar{x} + (t\ \text{critical value})\frac{s}{\sqrt{n}} \right)$
Confidence Interval Example
Ten randomly selected shut-ins were each
asked to list how many hours of television
they watched per week. The results are
82  66  90  84  75  88  80  94  110  91
Find a 90% confidence interval estimate for
the true mean number of hours of television
watched per week by shut-ins.
Confidence Interval Example Continued
Calculating the sample mean and standard
deviation we have $n = 10$, $\bar{x} = 86$, and $s = 11.842$.
We find the critical t value of 1.833 by looking on
the t table in the row corresponding to df = 9, in
the column with bottom label 90%. The
confidence interval for μ is
$\bar{x} \pm t^{*}\frac{s}{\sqrt{n}} = 86 \pm (1.833)\frac{11.842}{\sqrt{10}} = 86 \pm 6.86$
(79.14, 92.86)
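The same interval can be recomputed from the raw data with a short sketch. The Python standard library has no t quantile function, so the critical value 1.833 is hard-coded from the slide; the constant name is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly television hours reported by the ten shut-ins.
hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]

# t critical value for df = 9 and 90% confidence, taken from the slide.
T_CRITICAL_DF9_90 = 1.833

n = len(hours)
x_bar = mean(hours)
s = stdev(hours)          # sample standard deviation (divides by n - 1)
margin = T_CRITICAL_DF9_90 * s / sqrt(n)

print(f"mean = {x_bar}, s = {s:.3f}")
print(f"({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```

For a different sample size or confidence level, the constant would need to be replaced with the matching entry from the t table.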
Confidence Interval Example Continued
To calculate the confidence interval, we
had to assume that weekly viewing times
are normally distributed. Consider the
following normal plot of the 10 data points.
Confidence Interval Example Continued
Notice that the normal plot looks reasonably
linear so it is reasonable to assume that the
number of hours of television watched per
week by shut-ins is normally distributed.
[Figure: Normal Probability Plot of Hours (horizontal axis, 70 to 110) against Probability (vertical axis, .001 to .999)]
The output comes from Minitab. Typically, if the
p-value is more than 0.05, we assume that
the distribution is normal.
Anderson-Darling Normality Test
A-Squared: 0.226
P-Value: 0.753
Average: 86
StDev: 11.8415
N: 10