Chapter 9
Estimation Using a Single Sample
© 2008 Brooks/Cole, a division of Thomson Learning, Inc.
9.1 Point Estimation
A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.
Example
A sample of 200 students at a large university is selected to estimate the proportion of students who wear contact lenses. In this sample, 47 wore contact lenses.
Let p = the true proportion of all students at this university who wear contact lenses, and define a "success" as a student who wears contact lenses.
The statistic
$\hat{p} = \frac{\text{number of successes in the sample}}{n}$
is a reasonable formula for obtaining a point estimate of p. Such a point estimate is
$\hat{p} = \frac{47}{200} = 0.235$
Example
A sample of the weights of 34 male freshman students was obtained:
185, 202, 197, 188, 166, 148, 161, 139, 214, 170, 231, 180,
174, 177, 283, 207, 176, 194, 175, 170, 184, 180, 184, 176,
202, 151, 189, 167, 179, 178, 176, 168, 177, 155
To estimate the true mean weight of all male freshman students, you might use the sample mean as a point estimate:
sample mean $= \bar{x} = 182.44$
Example
After looking at a histogram and boxplot of the data (below), you might notice that the data are reasonably symmetric but include an outlier, so you might use either the sample median or a sample trimmed mean as a point estimate.
[Figure: histogram and boxplot of the 34 weights, horizontal axis from 140 to 260]
sample median $= \frac{177 + 178}{2} = 177.5$
5% trimmed mean $= 180.07$ (calculated using Minitab)
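The three point estimates for the weight data can be reproduced with Python's standard `statistics` module, as a minimal sketch. The trimming rule used here (drop round(0.05·n) values from each end, which is 2 values when n = 34) is an assumption, chosen because it reproduces the Minitab figure of 180.07; software that truncates the trim count instead would report a slightly different trimmed mean.

```python
import statistics

# The 34 weights listed in the example (units are not stated in the source).
weights = [185, 202, 197, 188, 166, 148, 161, 139, 214, 170, 231, 180,
           174, 177, 283, 207, 176, 194, 175, 170, 184, 180, 184, 176,
           202, 151, 189, 167, 179, 178, 176, 168, 177, 155]

def trimmed_mean(data, proportion=0.05):
    """Mean after dropping round(proportion * n) values from each end.

    Assumption: the trim count is rounded to the nearest integer
    (0.05 * 34 = 1.7 -> 2 per end), which matches the Minitab value above.
    """
    values = sorted(data)
    k = round(proportion * len(values))
    return statistics.mean(values[k:len(values) - k])

mean = statistics.mean(weights)
median = statistics.median(weights)  # average of the 17th and 18th sorted values
trimmed = trimmed_mean(weights, 0.05)
print(f"mean = {mean:.2f}, median = {median}, 5% trimmed mean = {trimmed:.2f}")
# -> mean = 182.44, median = 177.5, 5% trimmed mean = 180.07
```

Both the median and the trimmed mean are pulled far less than the mean by the single large value 283.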
Bias
A statistic whose sampling distribution has a mean equal to the value of the population characteristic being estimated is said to be an unbiased statistic. A statistic that is not unbiased is said to be biased.
[Figure: original distribution, sampling distribution of an unbiased statistic, and sampling distribution of a biased statistic]
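A small simulation can make the definition concrete. This sketch is not from the original slides: it assumes a hypothetical normal population (mean 50, standard deviation 10), draws many samples of size 5, and averages each statistic over all samples to approximate the mean of its sampling distribution. The sample mean averages close to μ = 50 (unbiased). For contrast it uses a standard textbook example of a biased statistic, the variance computed by dividing by n, which averages close to (n − 1)/n · σ² = 80 rather than σ² = 100; dividing by n − 1 instead removes the bias.

```python
import random

# Hypothetical population and settings (assumptions, not from the slides).
MU, SIGMA, N, REPS = 50.0, 10.0, 5, 10_000
rng = random.Random(1)  # fixed seed for a reproducible run

avg_mean = avg_var_n = avg_var_n1 = 0.0
for _ in range(REPS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    ss = sum((x - x_bar) ** 2 for x in sample)  # sum of squared deviations
    avg_mean += x_bar / REPS
    avg_var_n += (ss / N) / REPS          # divide by n: biased for sigma^2
    avg_var_n1 += (ss / (N - 1)) / REPS   # divide by n - 1: unbiased for sigma^2

print(f"average x-bar:            {avg_mean:.2f}   (mu = {MU})")
print(f"average variance (n):     {avg_var_n:.1f}   (sigma^2 = {SIGMA ** 2})")
print(f"average variance (n - 1): {avg_var_n1:.1f}")
```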
Criteria
Given a choice between several unbiased statistics that could be used to estimate a population characteristic, the best statistic to use is the one with the smallest standard deviation.
[Figure: several unbiased sampling distributions; the one with the smallest standard deviation is the best choice]
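To see the criterion in action, this sketch (an illustration, not from the slides) assumes a hypothetical standard normal population and compares two statistics that are both unbiased for its center: the sample mean and the sample median. Both sampling distributions are centered near 0, but for samples of size 25 the mean's standard deviation is about 0.20 while the median's is roughly 0.25, so by this criterion the sample mean is the better choice for normal data.

```python
import random
import statistics

# Hypothetical setting (assumption): standard normal population, samples of size 25.
N, REPS = 25, 5_000
rng = random.Random(2)  # fixed seed for a reproducible run

means, medians = [], []
for _ in range(REPS):
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Both statistics are unbiased for the center (0) of this symmetric population...
center_mean, center_median = statistics.fmean(means), statistics.fmean(medians)
# ...but their sampling distributions have different spreads.
sd_mean, sd_median = statistics.pstdev(means), statistics.pstdev(medians)
print(f"centers: mean {center_mean:.3f}, median {center_median:.3f}")
print(f"SDs:     mean {sd_mean:.3f}, median {sd_median:.3f}")
```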
9.2 Large-Sample Confidence Interval for a Population Proportion
A confidence interval for a population characteristic is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval.
Confidence Level
The confidence level associated with a confidence interval estimate is the success rate of the method used to construct the interval.
Recall
For the sampling distribution of $\hat{p}$,
$\mu_{\hat{p}} = p, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
and for large* n, the sampling distribution of $\hat{p}$ is approximately normal.
Specifically, when n is large*, the statistic $\hat{p}$ has a sampling distribution that is approximately normal with mean p and standard deviation $\sqrt{\frac{p(1-p)}{n}}$.
* $np \ge 10$ and $n(1-p) \ge 10$
Some considerations
Approximately 95% of all large samples will result in a value of $\hat{p}$ that is within
$1.96\,\sigma_{\hat{p}} = 1.96\sqrt{\frac{p(1-p)}{n}}$
of the true population proportion p.
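The facts about the sampling distribution of $\hat{p}$ can be checked by simulation. This sketch assumes a hypothetical true proportion p = 0.3 and samples of size n = 200 (so np = 60 and n(1 − p) = 140, both at least 10). The average of the simulated $\hat{p}$ values should be close to p, their standard deviation close to $\sqrt{p(1-p)/n} \approx 0.0324$, and roughly 95% of them should fall within 1.96σ of p; because $\hat{p}$ can only take the values k/n, the fraction may land slightly below 95%.

```python
import math
import random
import statistics

# Hypothetical settings (assumptions): true p = 0.3, sample size n = 200.
P, N, REPS = 0.3, 200, 5_000
rng = random.Random(3)  # fixed seed for a reproducible run

sigma = math.sqrt(P * (1 - P) / N)  # standard deviation of p-hat from the formula
p_hats = []
for _ in range(REPS):
    successes = sum(rng.random() < P for _ in range(N))  # successes in one sample
    p_hats.append(successes / N)

mean_p_hat = statistics.fmean(p_hats)
sd_p_hat = statistics.pstdev(p_hats)
within = sum(abs(ph - P) <= 1.96 * sigma for ph in p_hats) / REPS
print(f"mean of p-hat: {mean_p_hat:.4f}  (p = {P})")
print(f"SD of p-hat:   {sd_p_hat:.4f}  (formula: {sigma:.4f})")
print(f"fraction within 1.96 sigma of p: {within:.3f}")
```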
Some considerations
Equivalently ($\hat{p}$ is within 1.96σ of p exactly when p is within 1.96σ of $\hat{p}$), for 95% of all possible samples, p will be in the interval
$\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}}$ to $\hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}$
Since p is unknown and n is large, we estimate $\sqrt{\frac{p(1-p)}{n}}$ with $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$.
This interval can be used as long as $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$.
The 95% Confidence Interval
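When n is large, a 95% confidence interval for p is
$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
The endpoints of the interval are often abbreviated by
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where the minus sign gives the lower endpoint and the plus sign gives the upper endpoint.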
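Because the confidence level is the success rate of the method, it can be checked by simulation. The sketch below assumes a hypothetical population with true proportion p = 0.4 and samples of size 200, builds the large-sample 95% interval from each simulated sample, and counts how often the interval captures p. The long-run fraction should be close to 0.95, although this simple interval can run a little below its nominal level.

```python
import math
import random

def ci_95(successes, n):
    """Large-sample 95% interval: p-hat +/- 1.96 * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical settings (assumptions): true p = 0.4, n = 200, 5,000 repetitions.
P, N, REPS = 0.4, 200, 5_000
rng = random.Random(4)  # fixed seed for a reproducible run

captured = 0
for _ in range(REPS):
    successes = sum(rng.random() < P for _ in range(N))
    low, high = ci_95(successes, N)
    captured += low <= P <= high  # True counts as 1

coverage = captured / REPS
print(f"fraction of intervals that captured p: {coverage:.3f}")
```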
Example
For a project, a student randomly sampled 182 other students at a large university to determine whether a majority of students were in favor of a proposal to build a field house. He found that 75 were in favor of the proposal.
Let p = the true proportion of students who favor the proposal.
Example - continued
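$\hat{p} = \frac{75}{182} = 0.4121$
Since $n\hat{p} = 182(0.4121) = 75 > 10$ and $n(1-\hat{p}) = 182(0.5879) = 107 > 10$, we can use the formula given earlier to find a 95% confidence interval for p:
$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.4121 \pm 1.96\sqrt{\frac{0.4121(0.5879)}{182}} = 0.4121 \pm 0.07151$
The 95% confidence interval for p is (0.341, 0.484). Because the entire interval lies below 0.5, the data do not support the claim that a majority of students favor the proposal.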
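The arithmetic for this example, including the large-sample conditions, can be checked with a few lines of Python (a minimal sketch using only the standard library):

```python
import math

successes, n = 75, 182
p_hat = successes / n

# Large-sample conditions: n * p-hat >= 10 and n * (1 - p-hat) >= 10.
if n * p_hat < 10 or n * (1 - p_hat) < 10:
    raise ValueError("sample too small for the large-sample interval")

margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin
print(f"p-hat = {p_hat:.4f}, margin = {margin:.5f}, CI = ({low:.3f}, {high:.3f})")
# -> p-hat = 0.4121, margin = 0.07151, CI = (0.341, 0.484)
```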
9.3 The General Confidence Interval
The general formula for a confidence interval for a population proportion p when
1. $\hat{p}$ is the sample proportion from a random sample, and
2. the sample size n is large ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$)
is given by
$\hat{p} \pm (z \text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
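The general formula can be wrapped in a small function that takes the z critical value as an argument and checks the large-sample conditions first. The function name `proportion_ci` is hypothetical, and the z values 1.645, 1.96, and 2.58 are taken from the table of common critical values in this chapter. Applied to the field-house data (75 of 182), the interval widens as the confidence level increases.

```python
import math

def proportion_ci(successes, n, z):
    """Return (lower, upper) for p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p_hat = successes / n
    # The interval is only appropriate when both counts are at least 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need n*p-hat >= 10 and n*(1 - p-hat) >= 10")
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Field-house data at three confidence levels (z values from the critical-value table).
intervals = {level: proportion_ci(75, 182, z)
             for level, z in [("90%", 1.645), ("95%", 1.96), ("99%", 2.58)]}
for level, (low, high) in intervals.items():
    print(f"{level}: ({low:.3f}, {high:.3f})")
```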
Finding a z Critical Value
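To find the z critical value for a 98% confidence interval: the central area is 0.98, so each tail has area 0.01 and the cumulative area to the left of the critical value is 0.99. Looking up the cumulative area of 0.9900 in the body of the standard normal table, we find z = 2.33.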
Some Common Critical Values
Confidence level    z critical value
80%                 1.28
90%                 1.645
95%                 1.96
98%                 2.33
99%                 2.58
99.8%               3.09
99.9%               3.29
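Instead of reading the standard normal table, the critical values can be computed from the inverse cumulative distribution function. This sketch uses `statistics.NormalDist.inv_cdf` from Python's standard library (Python 3.8 or later): for confidence level C, each tail has area (1 − C)/2, so the critical value is the point with cumulative area (1 + C)/2. Rounded to two decimals (three for 90%), the results agree with the table.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z value that captures the central `confidence` area of the standard normal curve."""
    # Each tail has area (1 - confidence) / 2, so the area to the left of z is (1 + confidence) / 2.
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in [0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999]:
    print(f"{level:.1%} -> {z_critical(level):.3f}")
```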
Terminology
The standard error of a statistic is the estimated standard deviation of the statistic.
For sample proportions, the standard deviation is
$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$
Since p is unknown, it is replaced by $\hat{p}$, so the standard error of the sample proportion is
$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Terminology
The bound on error of estimation, B, associated with a 95% confidence interval is
(1.96)·(standard error of the statistic).
More generally, the bound on error of estimation, B, associated with a confidence interval at any confidence level is
(z critical value)·(standard error of the statistic).
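As a short illustration, the standard error and the bound on error can be computed for the contact-lens sample from Section 9.1 (47 successes in n = 200). The 95% and 99% bounds use the z critical values 1.96 and 2.58 from the table of common critical values; the variable names are illustrative.

```python
import math

# Contact-lens sample from Section 9.1: 47 successes out of n = 200.
successes, n = 47, 200
p_hat = successes / n  # 0.235

standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated SD of p-hat
bound_95 = 1.96 * standard_error  # B for a 95% confidence interval
bound_99 = 2.58 * standard_error  # B for a 99% confidence interval
print(f"SE = {standard_error:.4f}, B (95%) = {bound_95:.4f}, B (99%) = {bound_99:.4f}")
```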
Sample Size
The sample size required to estimate a population proportion p to within an amount B with 95% confidence is
$n = p(1-p)\left(\frac{1.96}{B}\right)^2$
This follows from setting $B = 1.96\sqrt{\frac{p(1-p)}{n}}$ and solving for n.
The value of p may be estimated from prior information. If no prior information is available, use p = 0.5 in the formula to obtain a conservatively large value for n (p(1 − p) is largest when p = 0.5).
Generally, one rounds the result up to the next integer.
Sample Size Calculation
Example
A TV executive would like a 95% confidence interval estimate, accurate to within 0.03, for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if a prior estimate for p is 0.15?
We have B = 0.03 and the prior estimate p = 0.15, so
$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.15)(0.85)\left(\frac{1.96}{0.03}\right)^2 = 544.2$
A sample of 545 or more would be needed.
Sample Size Calculation Example revisited
Suppose a TV executive would like a 95% confidence interval estimate, accurate to within 0.03, for the proportion of all households that watch NYPD Blue regularly. How large a sample is needed if we have no reasonable prior estimate for p?
We have B = 0.03 and should use p = 0.5 in the formula:
$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = (0.5)(0.5)\left(\frac{1.96}{0.03}\right)^2 = 1067.1$
The required sample size is now 1068.
Notice that a reasonable ballpark estimate for p can lower the needed sample size.
Another Example
A college professor wants to estimate, with a 99% confidence interval accurate to within 0.02, the proportion of students at a large university who favor building a field house. If one of his students performed a preliminary study and estimated p to be 0.412, how large a sample should he take?
We have B = 0.02, a prior estimate p = 0.412, and the z critical value 2.58 (for a 99% confidence interval), so
$n = p(1-p)\left(\frac{2.58}{B}\right)^2 = (0.412)(0.588)\left(\frac{2.58}{0.02}\right)^2 = 4031.4$
The required sample size is 4032.
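The three sample-size calculations above can be reproduced with a small helper. The function name `sample_size` is hypothetical; it evaluates $n = p(1-p)(z/B)^2$ with z = 1.96 and p = 0.5 as defaults (p = 0.5 being the conservative choice when no prior estimate is available) and rounds up with `math.ceil`.

```python
import math

def sample_size(bound, z=1.96, p=0.5):
    """n = p(1 - p) * (z / B)^2, rounded up to the next whole number."""
    return math.ceil(p * (1 - p) * (z / bound) ** 2)

print(sample_size(0.03, p=0.15))           # prior estimate 0.15         -> 545
print(sample_size(0.03))                   # no prior estimate (p = 0.5) -> 1068
print(sample_size(0.02, z=2.58, p=0.412))  # 99% confidence, prior 0.412 -> 4032
```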
One-Sample z Confidence Interval for μ
If
1. $\bar{x}$ is the sample mean from a random sample,
2. the sample size n is large (generally n ≥ 30), and
3. σ, the population standard deviation, is known,
then the general formula for a confidence interval for a population mean μ is given by
$\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}}$
One-Sample z Confidence Interval for μ
If n is small (generally n < 30) but it is reasonable to believe that the distribution of values in the population is normal, a confidence interval for μ (when σ is known) is
$\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}}$
Notice that this formula works when σ is known and either
1. n is large (generally n ≥ 30), or
2. the population distribution is normal (any sample size).
Example
A certain filling machine has a true population standard deviation σ = 0.228 ounces when used to fill catsup bottles. A random sample of 36 "6 ounce" bottles of catsup was selected from the output of this machine, and the sample mean was 6.018 ounces.
Find a 90% confidence interval estimate for the true mean fill of catsup from this machine.
Example (continued)
$\bar{x} = 6.018, \quad \sigma = 0.228, \quad n = 36$
The z critical value is 1.645.
$\bar{x} \pm (z \text{ critical value})\frac{\sigma}{\sqrt{n}} = 6.018 \pm 1.645\,\frac{0.228}{\sqrt{36}} = 6.018 \pm 0.063$
90% confidence interval: (5.955, 6.081)
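The catsup calculation can be reproduced as follows; this is a minimal sketch assuming, as the example does, that σ is known. The helper name `z_interval` is illustrative.

```python
import math

def z_interval(x_bar, sigma, n, z):
    """x-bar +/- z * sigma / sqrt(n); assumes the population SD sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(6.018, 0.228, 36, 1.645)  # 90% interval for the catsup machine
print(f"({low:.3f}, {high:.3f})")  # -> (5.955, 6.081)
```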
Unknown σ: Small Samples [All Sample Sizes]
The English statistician W. S. Gosset, who published under the pen name "Student", developed the techniques and derived the Student's t distributions that describe the behavior of
$\frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
t Distributions
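If X is a normally distributed random variable with mean $\mu_0$, and $\bar{x}$ and s are the mean and standard deviation of a random sample of n values of X, the statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
follows a t distribution with df = n − 1 (degrees of freedom).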
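A simulation shows why the t distribution, rather than the standard normal, is needed when σ is replaced by s. This sketch assumes a hypothetical normal population with μ0 = 100 and σ = 15 and small samples of size n = 5 (df = 4). The simulated t statistics fall beyond ±1.96 noticeably more than 5% of the time, while about 5% fall beyond ±2.78, the df = 4 entry in the 95% column of the t table.

```python
import math
import random

# Hypothetical settings (assumptions): normal population, mu0 = 100, sigma = 15, n = 5.
MU0, SIGMA, N, REPS = 100.0, 15.0, 5, 10_000
rng = random.Random(5)  # fixed seed for a reproducible run

t_values = []
for _ in range(REPS):
    sample = [rng.gauss(MU0, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (N - 1))  # sample SD
    t_values.append((x_bar - MU0) / (s / math.sqrt(N)))

beyond_z = sum(abs(t) > 1.96 for t in t_values) / REPS  # normal critical value
beyond_t = sum(abs(t) > 2.78 for t in t_values) / REPS  # t critical value, df = 4
print(f"fraction beyond 1.96: {beyond_z:.3f}   fraction beyond 2.78: {beyond_t:.3f}")
```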
t Distributions
This statistic $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ is fairly robust: the results are reasonable for moderate sample sizes (15 and up) if the distribution of x is reasonably centrally weighted. It is also quite reasonable for large sample sizes when the distribution of x is not extremely skewed.
t Distributions
Comparison of normal and t distributions
[Figure: density curves of t distributions with df = 2, 5, 10, and 25 and the standard normal curve, plotted from −4 to 4]
t Distributions
Notice: As the degrees of freedom increase, the t distributions approach the standard normal distribution.
Since each t distribution would require its own table similar to the standard normal table, we usually create only a table of critical values for the t distributions.
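The approach to the normal curve can be seen numerically by computing 95% t critical values for increasing df. Python's standard library has no t distribution, so this sketch integrates the t density with Simpson's rule and inverts the result by bisection; it is an illustration only, and in practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used. Its results should agree with the t table to two decimals (for example 2.26 for df = 9) and move toward z = 1.96 as df grows.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule integral of the t density on [0, x]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Bisection for the t value capturing the central `confidence` area (assumes it is below 100)."""
    target = (1 + confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_95 = NormalDist().inv_cdf(0.975)
for df in [2, 5, 10, 25, 120]:
    print(f"df = {df:3d}: t = {t_critical(0.95, df):.3f}   (z = {z_95:.3f})")
```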
t Critical Values
Columns: central area captured 0.80, 0.90, 0.95, 0.98, 0.99, 0.998, 0.999 (confidence levels 80%, 90%, 95%, 98%, 99%, 99.8%, 99.9%). Rows: degrees of freedom (df); the last row, z, gives the z critical values.

df       80%    90%    95%    98%    99%  99.8%  99.9%
  1     3.08   6.31  12.71  31.82  63.66 318.29 636.58
  2     1.89   2.92   4.30   6.96   9.92  22.33  31.60
  3     1.64   2.35   3.18   4.54   5.84  10.21  12.92
  4     1.53   2.13   2.78   3.75   4.60   7.17   8.61
  5     1.48   2.02   2.57   3.36   4.03   5.89   6.87
  6     1.44   1.94   2.45   3.14   3.71   5.21   5.96
  7     1.41   1.89   2.36   3.00   3.50   4.79   5.41
  8     1.40   1.86   2.31   2.90   3.36   4.50   5.04
  9     1.38   1.83   2.26   2.82   3.25   4.30   4.78
 10     1.37   1.81   2.23   2.76   3.17   4.14   4.59
 11     1.36   1.80   2.20   2.72   3.11   4.02   4.44
 12     1.36   1.78   2.18   2.68   3.05   3.93   4.32
 13     1.35   1.77   2.16   2.65   3.01   3.85   4.22
 14     1.35   1.76   2.14   2.62   2.98   3.79   4.14
 15     1.34   1.75   2.13   2.60   2.95   3.73   4.07
 16     1.34   1.75   2.12   2.58   2.92   3.69   4.01
 17     1.33   1.74   2.11   2.57   2.90   3.65   3.97
 18     1.33   1.73   2.10   2.55   2.88   3.61   3.92
 19     1.33   1.73   2.09   2.54   2.86   3.58   3.88
 20     1.33   1.72   2.09   2.53   2.85   3.55   3.85
 21     1.32   1.72   2.08   2.52   2.83   3.53   3.82
 22     1.32   1.72   2.07   2.51   2.82   3.50   3.79
 23     1.32   1.71   2.07   2.50   2.81   3.48   3.77
 24     1.32   1.71   2.06   2.49   2.80   3.47   3.75
 25     1.32   1.71   2.06   2.49   2.79   3.45   3.73
 26     1.31   1.71   2.06   2.48   2.78   3.43   3.71
 27     1.31   1.70   2.05   2.47   2.77   3.42   3.69
 28     1.31   1.70   2.05   2.47   2.76   3.41   3.67
 29     1.31   1.70   2.05   2.46   2.76   3.40   3.66
 30     1.31   1.70   2.04   2.46   2.75   3.39   3.65
 40     1.30   1.68   2.02   2.42   2.70   3.31   3.55
 60     1.30   1.67   2.00   2.39   2.66   3.23   3.46
120     1.29   1.66   1.98   2.36   2.62   3.16   3.37
  z     1.28  1.645   1.96   2.33   2.58   3.09   3.29
One-Sample t Procedures
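Suppose that a simple random sample (SRS) of size n is drawn from a population having unknown mean μ. The general confidence limits are
$\bar{x} \pm (t \text{ critical value})\frac{s}{\sqrt{n}}$
and the general confidence interval for μ is
$\left(\bar{x} - (t \text{ critical value})\frac{s}{\sqrt{n}},\ \bar{x} + (t \text{ critical value})\frac{s}{\sqrt{n}}\right)$
where the t critical value is based on df = n − 1.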
Confidence Interval Example
Ten randomly selected shut-ins were each asked to list how many hours of television they watched per week. The results are
82, 66, 90, 84, 75, 88, 80, 94, 110, 91
Find a 90% confidence interval estimate for the true mean number of hours of television watched per week by shut-ins.
Confidence Interval Example
Calculating the sample mean and standard deviation, we have n = 10, $\bar{x} = 86$, and s = 11.842.
We find the critical t value by looking in the t table in the row corresponding to df = 9, in the column labeled 90%; the table entry 1.83 is 1.833 to three decimal places, which is the value used here. Computing the confidence interval for μ:
$\bar{x} \pm t^*\frac{s}{\sqrt{n}} = 86 \pm (1.833)\frac{11.842}{\sqrt{10}} = 86 \pm 6.86$
giving the interval (79.14, 92.86).
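The shut-in calculation can be checked with Python's `statistics` module (a minimal sketch). The t critical value 1.833 is copied from the example rather than computed, so the code depends only on the standard library.

```python
import math
import statistics

hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]
n = len(hours)
x_bar = statistics.mean(hours)  # 86
s = statistics.stdev(hours)     # sample SD (divides by n - 1)
t_star = 1.833                  # t critical value for df = n - 1 = 9, 90% confidence

margin = t_star * s / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin
print(f"s = {s:.3f}, CI = ({low:.2f}, {high:.2f})")
# -> s = 11.842, CI = (79.14, 92.86)
```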
Confidence Interval Example
To calculate the confidence interval, we had to assume that the weekly viewing times are normally distributed. Consider the normal probability plot of the 10 data points, produced with Minitab, given on the next slide.
Confidence Interval Example
Notice that the normal plot looks reasonably linear, so it is reasonable to assume that the number of hours of television watched per week by shut-ins is normally distributed.
[Figure: normal probability plot of Hours produced with Minitab; probability axis from .001 to .999, Hours axis from 70 to 110]
Anderson-Darling Normality Test: A-Squared = 0.226, P-Value = 0.753 (Average = 86, StDev = 11.8415, N = 10)
Typically, if the p-value is more than 0.05, we treat the assumption of normality as reasonable: a large p-value means the test found no significant evidence that the distribution is not normal.
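The A-Squared value reported by Minitab can be reproduced from its definition, $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$, where $x_{(i)}$ are the sorted data and F is the normal CDF with the sample mean and standard deviation plugged in. The p-value function below is an assumption: it applies the small-sample adjustment $A^{*2} = A^2(1 + 0.75/n + 2.25/n^2)$ and the piecewise approximation commonly attributed to D'Agostino and Stephens, which gives about 0.75 here but is not necessarily Minitab's exact method.

```python
import math
from statistics import NormalDist, fmean, stdev

hours = [82, 66, 90, 84, 75, 88, 80, 94, 110, 91]

def anderson_darling_normal(data):
    """A-squared statistic for normality, using the sample mean and SD as parameters."""
    x = sorted(data)
    n = len(x)
    dist = NormalDist(fmean(x), stdev(x))
    f = [dist.cdf(v) for v in x]
    # 0-based form of (2i - 1)[ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    total = sum((2 * i + 1) * (math.log(f[i]) + math.log(1 - f[n - 1 - i])) for i in range(n))
    return -n - total / n

def ad_p_value(a2, n):
    """Approximate p-value (assumed D'Agostino-Stephens piecewise formula)."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if a < 0.2:
        return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a ** 2)
    if a < 0.34:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a ** 2)
    if a < 0.6:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a ** 2)
    return math.exp(1.2937 - 5.709 * a + 0.0186 * a ** 2)

a2 = anderson_darling_normal(hours)
p_value = ad_p_value(a2, len(hours))
print(f"A-squared = {a2:.3f}, approximate p-value = {p_value:.3f}")
```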