Chapter 8
Sampling Distributions and Estimation (Part 1)
• Sampling Variation
• Estimators and Sampling Distributions
• Sample Mean and the Central Limit Theorem
• Confidence Interval for a Mean (μ) with Known σ
• Confidence Interval for a Mean (μ) with Unknown σ
• Confidence Interval for a Proportion (π)
McGraw-Hill/Irwin
Copyright © 2009 by The McGraw-Hill Companies, Inc.
Sampling Variation
• Sample statistic – a random variable whose value depends on which population items happen to be included in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
• This sampling variation can easily be illustrated.
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• The sample means (x̄ᵢ) tend to be close to the population mean (μ = 520.78).
• The dot plots show that the sample means have much less variation than the individual sample items.
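The pattern in the dot plots can be reproduced with a short simulation. This is a sketch under stated assumptions: the population is a hypothetical normal population of GMAT-like scores using the slide's mean of 520.78 and an assumed standard deviation of 86, which the text does not give.

```python
import random
import statistics

# Hypothetical population: the mean 520.78 is from the slide; sd = 86 is an assumption.
random.seed(1)
population = [random.gauss(520.78, 86) for _ in range(10_000)]

# Eight random samples of size n = 5, as in the slide's example.
samples = [random.sample(population, 5) for _ in range(8)]
means = [statistics.mean(s) for s in samples]
items = [x for s in samples for x in s]

print(f"sd of the 40 individual items: {statistics.stdev(items):.1f}")
print(f"sd of the 8 sample means:      {statistics.stdev(means):.1f}")
```

Because each mean averages five items, its spread should be roughly 1/√5 of the spread of the individual items.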
Estimators and Sampling Distributions
Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a population parameter.
• Estimate – the value of the estimator in a particular sample.
• Population parameters are represented by Greek letters and the corresponding statistics by Roman letters.
Examples of Estimators
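As a minimal sketch with hypothetical data (the numbers below are invented for illustration), the three most common estimators can be computed with Python's standard `statistics` module: x̄ estimates μ, s estimates σ, and p estimates π.

```python
import statistics

# Hypothetical sample of five GMAT scores (illustrative numbers, not from the text).
scores = [490, 580, 440, 560, 615]
# Hypothetical binary sample: 1 = item has the attribute, 0 = it does not.
flags = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

x_bar = statistics.mean(scores)   # estimate of the population mean mu
s = statistics.stdev(scores)      # estimate of sigma (uses the n - 1 divisor)
p = sum(flags) / len(flags)       # estimate of the population proportion pi

print(f"x-bar = {x_bar}, s = {s:.2f}, p = {p}")
```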
Sampling Distributions
• The sampling distribution of an estimator is the probability distribution of all possible values the statistic may assume when a random sample of size n is taken.
• An estimator is a random variable since samples vary.
• Sampling error is the difference between an estimate and the parameter it estimates:
$$\text{Sampling error} = \hat{\theta} - \theta$$
Bias
• Bias is the difference between the expected value of the estimator and the true parameter:
$$\text{Bias} = E(\hat{\theta}) - \theta$$
• An estimator is unbiased if $E(\hat{\theta}) = \theta$.
• On average, an unbiased estimator neither overstates nor understates the true parameter.
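Bias can be computed exactly when every possible sample can be listed. The sketch below assumes the small population {0, 1, 2, 3} with samples of size n = 2 drawn with replacement, and compares two variance estimators: dividing by n - 1 (`statistics.variance`) and dividing by n (`statistics.pvariance`, deliberately applied to a sample here). Averaging each estimator over all 16 equally likely samples gives its expected value.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [0, 1, 2, 3]
sigma2 = pvariance(population)  # true parameter sigma^2

# All 16 equally likely samples of size n = 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Expected value of each estimator = its average over all possible samples.
e_unbiased = mean(variance(s) for s in samples)  # divisor n - 1
e_biased = mean(pvariance(s) for s in samples)   # divisor n

print("bias with n - 1 divisor:", e_unbiased - sigma2)
print("bias with n divisor:    ", e_biased - sigma2)
```

The n - 1 divisor has zero bias, while the n divisor systematically understates σ² = 1.25.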
• Sampling error is random whereas bias is systematic.
Figure 8.4
• An unbiased estimator avoids systematic error.
Efficiency
• Efficiency refers to the variance of the estimator’s sampling distribution.
• A more efficient estimator has smaller variance.
Figure 8.5
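Efficiency can be compared by simulation. A hedged sketch, assuming a standard normal population and samples of n = 25 (choices not taken from the text): both the sample mean and the sample median are centered on μ, but the median's sampling distribution has the larger variance, so the mean is the more efficient estimator here.

```python
import random
import statistics

# Assumed setup: standard normal population, samples of n = 25.
random.seed(42)
n, reps = 25, 4000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))

var_mean = statistics.variance(means)      # theory: sigma^2 / n = 0.04
var_median = statistics.variance(medians)
print(f"variance of sample means:   {var_mean:.4f}")
print(f"variance of sample medians: {var_median:.4f}")
print(f"relative efficiency (median/mean): {var_median / var_mean:.2f}")
```

For large normal samples the ratio approaches π/2 ≈ 1.57.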
Consistency
• A consistent estimator converges toward the parameter being estimated as the sample size increases.
Figure 8.6
Sample Mean and the Central Limit Theorem
Central Limit Theorem (CLT) for a Mean
• If a random sample of size n is drawn from a population with mean μ and standard deviation σ, the distribution of the sample mean x̄ approaches a normal distribution with mean μ and standard deviation $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ as the sample size increases.
• If the population is normal, the distribution of the sample mean is normal regardless of sample size.
• If the population is exactly normal, then the sample mean follows a normal distribution.
• As the sample size n increases, the distribution of sample means narrows in on the population mean μ.
• If the sample is large enough, the sample means will have approximately a normal distribution even if the population is not normal.
Illustrations of the Central Limit Theorem
• Symmetric population
• Skewed population
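The skewed-population case can be checked by simulation. A hedged sketch, assuming an exponential population with mean 1 and standard deviation 1 (a strongly right-skewed choice not taken from the text): as n grows, the sample means stay centered near μ, their standard deviation tracks σ/√n, and their skewness shrinks toward 0, the value for a normal distribution.

```python
import math
import random
import statistics

def skewness(data):
    """Moment coefficient of skewness: the average cubed z-score."""
    m = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

# Assumed skewed population: exponential with mean 1 and standard deviation 1.
random.seed(7)
sigma = 1.0
results = {}
for n in (1, 5, 30):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(5000)]
    results[n] = (statistics.fmean(means), statistics.stdev(means), skewness(means))
    m, sd, sk = results[n]
    print(f"n = {n:>2}: mean = {m:.3f}, sd = {sd:.3f} "
          f"(CLT: {sigma / math.sqrt(n):.3f}), skewness = {sk:.2f}")
```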
Example – Bottle Filling: Variation in X
Sample Size and Standard Error
• The standard error declines as n increases, but at a decreasing rate.
• Make the interval $\mu \pm z\,\sigma/\sqrt{n}$ small by increasing n.
• The distribution of sample means collapses onto the true population mean μ as n increases.
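The diminishing rate follows from the √n in the denominator. A minimal sketch with a hypothetical σ = 10:

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (1, 4, 16, 64, 256):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.3f}")
```

Each quadrupling of n only halves the standard error, so going from n = 64 to n = 256 buys far less precision than going from n = 1 to n = 4.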
Illustration: All Possible Samples from a Uniform Population
• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
• The population parameters are μ = 1.5 and σ = 1.118.
• Consider all 4 × 4 = 16 possible samples of size n = 2, drawn with replacement, along with their means.
• The population is uniform, yet the distribution of all possible sample means has a peaked triangular shape.
• The CLT’s predictions for the mean and standard error are
$$\mu_{\bar{x}} = \mu = 1.5 \quad\text{and}\quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{1.118}{\sqrt{2}} = 0.7905$$
• The mean of means is
$$\bar{\bar{x}} = \frac{1(0.0) + 2(0.5) + 3(1.0) + 4(1.5) + 3(2.0) + 2(2.5) + 1(3.0)}{16} = 1.5$$
• The standard deviation of the means is
$$\sigma_{\bar{x}} = \sqrt{\frac{\sum_{j=1}^{16} (\bar{x}_j - 1.5)^2}{16}} = \sqrt{\frac{10}{16}} = 0.7906$$
which agrees with the CLT prediction (the last digit differs only because σ was rounded to 1.118).
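The whole illustration can be reproduced by enumerating all 16 samples. This sketch uses only the standard library; `pstdev` is used because the 16 equally likely sample means form the entire sampling distribution rather than a sample from it.

```python
from collections import Counter
from itertools import product
from math import sqrt
from statistics import fmean, pstdev

population = [0, 1, 2, 3]
n = 2
sample_means = [fmean(s) for s in product(population, repeat=n)]

print("frequency of each sample mean:", sorted(Counter(sample_means).items()))
print("mean of means:", fmean(sample_means))
print("sd of means:", round(pstdev(sample_means), 4))
print("CLT prediction:", round(pstdev(population) / sqrt(n), 4))
```

The frequencies 1, 2, 3, 4, 3, 2, 1 are the weights in the mean-of-means formula and produce the triangular shape.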
Confidence Interval for a Mean (μ) with Known σ
What is a Confidence Interval?
• A sample mean x̄ is a point estimate of the population mean μ.
• A confidence interval for the mean is a range $\mu_{\text{lower}} < \mu < \mu_{\text{upper}}$.
• The confidence level is the probability that a confidence interval constructed in this way contains the true population mean.
• The confidence level (usually expressed as a %) is the central area under the curve of the sampling distribution.
• The confidence interval for μ with known σ is:
$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}}$$
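A minimal Python sketch of this formula, with z taken from the standard normal distribution via `statistics.NormalDist`; the inputs x̄ = 50, σ = 10, and n = 25 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu with known sigma: x_bar +/- z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical inputs, chosen only for illustration.
lower, upper = z_interval(x_bar=50.0, sigma=10.0, n=25)
print(f"{lower:.2f} < mu < {upper:.2f}")
```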
Choosing a Confidence Level
• A higher confidence level leads to a wider confidence interval.
• Greater confidence implies loss of precision.
• 95% confidence is most often used.
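To see the tradeoff numerically, this sketch holds a hypothetical standard error σ/√n = 2.0 fixed and varies only the confidence level:

```python
from statistics import NormalDist

std_error = 2.0  # hypothetical value of sigma / sqrt(n), held fixed for comparison
widths = {}
for confidence in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    widths[confidence] = 2 * z * std_error  # full width of x_bar +/- z * std_error
    print(f"{confidence:.0%} confidence: z = {z:.3f}, width = {widths[confidence]:.2f}")
```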
Interpretation
• A confidence interval either does or does not contain μ.
• The confidence level quantifies the risk.
• Out of 100 confidence intervals constructed at 95% confidence, approximately 95 would contain μ, while approximately 5 would not.
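The "95 out of 100" interpretation can be checked by simulation. A sketch under assumed values (a normal population with μ = 100 and σ = 15, samples of n = 20, none of which come from the text): build 1,000 intervals x̄ ± 1.96σ/√n and count how many contain μ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

# Assumed setup: normal population with mu = 100 and sigma = 15; samples of n = 20.
random.seed(2009)
mu, sigma, n = 100.0, 15.0, 20
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / sqrt(n)

trials = 1000
hits = 0
for _ in range(trials):
    x_bar = fmean(random.gauss(mu, sigma) for _ in range(n))
    if x_bar - half_width < mu < x_bar + half_width:
        hits += 1
print(f"{hits} of {trials} intervals contain mu ({hits / trials:.1%})")
```

Any single interval either contains μ or not; only the long-run proportion is near 95%.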
Is σ Ever Known?
• Yes, but not very often.
• In quality control applications with ongoing manufacturing processes, σ is often assumed to stay the same over time.
• In this case, confidence intervals are used to construct control charts to track the mean of a process over time.
Confidence Interval for a Mean (μ) with Unknown σ
Student’s t Distribution
• Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• The confidence interval for μ (unknown σ) is
$$\bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}}$$
or, equivalently, $\bar{x} \pm t\,s/\sqrt{n}$.
• t distributions are symmetric and bell-shaped like the standard normal distribution, but with heavier tails.
• The t distribution depends on the size of the sample.
Figure 8.11
Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the value of the t statistic.
• Degrees of freedom tell how many observations are used to calculate s, less the number of intermediate estimates used in the calculation:
$$\nu = n - 1$$
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom are ν = 31 – 1 = 30.
• What would the t-value be for a 90% confidence interval?
For ν = 30, $t_{0.05} = 1.697$, while the corresponding z-value is 1.645.
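Python's standard library provides `statistics.NormalDist` for z-values but no Student's t distribution (with SciPy installed, `scipy.stats.t.ppf` would be the usual choice). As a self-contained stand-in for Appendix D, this sketch integrates the t density with Simpson's rule and finds critical values by bisection; the helper names `t_cdf` and `t_critical` are hypothetical.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the t density on [0, x]."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 + total * h / 3

def t_critical(upper_tail, df):
    """t value with probability upper_tail to its right, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # too much area remains in the tail, so move right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.95)  # z-value for a 90% two-tailed interval
for df in (4, 9, 19, 30, 100):
    print(f"d.f. = {df:>3}: t = {t_critical(0.05, df):.3f}  vs  z = {z:.3f}")
```

The t-values shrink toward z as the degrees of freedom grow but stay above it.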
Example: GMAT Scores Again
• Here are the GMAT scores from 20 applicants to an MBA program:
Figure 8.13
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants.
x̄ = 510, s = 73.77
• Since σ is unknown, use the Student’s t for the confidence interval with ν = 20 – 1 = 19 d.f.
• First find $t_{0.05}$ from Appendix D.
• For a 90% confidence interval, use Appendix D to find $t_{0.05} = 1.729$.
• The 90% confidence interval is:
$$\bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}}$$
$$510 - 1.729\frac{73.77}{\sqrt{20}} < \mu < 510 + 1.729\frac{73.77}{\sqrt{20}}$$
$$510 - 28.52 < \mu < 510 + 28.52$$
• We are 90% confident that the true mean GMAT score is within the interval 481.48 < μ < 538.52.
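The arithmetic can be checked in a few lines. This sketch uses only the summary statistics given above and the tabled value t = 1.729; it does not recompute x̄ or s from the raw scores in Figure 8.13.

```python
from math import sqrt

# Summary statistics from the GMAT example; t = 1.729 is the Appendix D value for 19 d.f.
x_bar, s, n = 510, 73.77, 20
t = 1.729

half_width = t * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"{lower:.2f} < mu < {upper:.2f}")
```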
Confidence Interval Width
• Confidence interval width reflects
  - the sample size,
  - the confidence level, and
  - the standard deviation.
• To obtain a narrower interval and more precision,
  - increase the sample size or
  - lower the confidence level (e.g., from 90% to 80% confidence).
A “Good” Sample
• Here are five different samples of 25 births from a population of N = 4,409 births and their 95% CIs.
• An examination of the samples shows that sample 5 has an outlier.
Figure 8.15
• The outlier is a warning that the resulting confidence interval may not be trustworthy.
• In this case, a larger sample size is needed.
Using Appendix D
• Beyond ν = 50, Appendix D shows ν in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower ν.
• This is a conservative procedure since it causes the interval to be slightly wider.
• For d.f. above 150, use the z-value.
Using Excel
• Use Excel’s function =TINV(probability, d.f.) to obtain a two-tailed value of t. Here, “probability” is 1 minus the confidence level. For example, =TINV(0.10, 19) returns 1.729, the value used in the GMAT example.
Figure 8.17
Using MegaStat
• MegaStat gives you a choice of z or t and does all calculations for you.
Figure 8.18
Using MINITAB
Figure 8.19
• MINITAB also gives confidence intervals for the median and standard deviation.
Confidence Interval for a Proportion (π)
• A proportion is a mean of data whose only values are 0 or 1.
• The Central Limit Theorem (CLT) states that the distribution of a sample proportion p = x/n approaches a normal distribution with mean π and standard deviation
$$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$$
• p = x/n is a consistent estimator of π.
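Because a proportion is the mean of 0/1 data, the sample proportion can be computed as an ordinary mean. A minimal sketch with hypothetical data, plus a helper (a hypothetical name, `proportion_se`) for σ_p:

```python
from math import sqrt
from statistics import fmean

def proportion_se(pi, n):
    """Standard error of a sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

# Hypothetical 0/1 data: 1 = item has the attribute of interest.
data = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
p = fmean(data)  # the mean of 0/1 data is the sample proportion x / n
print("p =", p)
print("sigma_p if pi = .20 and n = 20:", round(proportion_se(0.20, 20), 4))
```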
Illustration: Internet Hotel Reservations
• Management of the Pan-Asian Hotel System tracks the percent of hotel reservations made over the Internet.
• The binary data are:
  1 = Reservation is made over the Internet
  0 = Reservation is not made over the Internet
• After the data were collected, it was determined that the proportion of Internet reservations is π = .20.
• Here are five random samples of n = 20. Each p is a point estimate of π.
• Notice the sampling variation in the value of p.
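Sampling variation in p can be imitated by simulation. This sketch takes π = .20 from the text but generates its own random samples of n = 20, so its p values will not match the slide's five samples.

```python
import random

# pi = .20 is from the text; the samples are simulated, not the slide's five samples.
random.seed(20)
pi, n = 0.20, 20
p_values = []
for k in range(1, 6):
    sample = [1 if random.random() < pi else 0 for _ in range(n)]  # 1 = Internet reservation
    p_values.append(sum(sample) / n)
    print(f"sample {k}: x = {sum(sample):>2}, p = {p_values[-1]:.2f}")
```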
Applying the CLT
• The distribution of a sample proportion p = x/n is symmetric if π = .50 and, regardless of π, approaches symmetry as n increases.
• As n increases, the statistic p = x/n more closely resembles a continuous random variable.
• As n increases, the distribution becomes more symmetric and bell shaped.
• As n increases, the range of the sample proportion p = x/n narrows.
• The sampling variation can be reduced by increasing the sample size n.
When is it Safe to Assume Normality?
• Rule of Thumb: The sample proportion p = x/n may be assumed to be normal if both nπ > 10 and n(1 – π) > 10.
• Sample size to assume normality: Table 8.9
Standard Error of the Proportion
• The standard error of the proportion $\sigma_p$ depends on π, as well as n.
• It is largest when π is near .50 and smaller when π is near 0 or 1.
• The formula for the standard error is symmetric: π and 1 – π give the same value of $\sigma_p$.
Figure 8.22
• Enlarging n reduces the standard error $\sigma_p$ but at a diminishing rate.
Figure 8.23
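Both properties of σ_p can be seen in a small table. A sketch of the same formula; the fixed n = 100 in the first loop is an arbitrary choice for illustration:

```python
from math import sqrt

def proportion_se(pi, n):
    """Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

# Symmetric in pi and largest at pi = .50 (n fixed at a hypothetical 100).
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"pi = {pi:.1f}: sigma_p = {proportion_se(pi, 100):.4f}")

# Diminishing returns in n (pi = .50): each quadrupling of n only halves sigma_p.
for n in (25, 100, 400, 1600):
    print(f"n = {n:>4}: sigma_p = {proportion_se(0.5, n):.4f}")
```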
Confidence Interval for π
• The confidence interval for π is
$$p \pm z\sqrt{\frac{\pi(1-\pi)}{n}}$$
where z is based on the desired confidence.
• Since π is unknown, the confidence interval for π using p = x/n (assuming a large sample) is
$$p \pm z\sqrt{\frac{p(1-p)}{n}}$$
• z can be chosen for any confidence level. For example, z = 1.645 for 90%, z = 1.960 for 95%, and z = 2.576 for 99% confidence.
Example: Auditing
• A sample of 75 retail in-store purchases showed that 24 were paid in cash. What is p?
p = x/n = 24/75 = .32
• Is p normally distributed?
np = (75)(.32) = 24
n(1 – p) = (75)(.68) = 51
Both are > 10, so we may assume normality.
• The 95% confidence interval for the proportion of retail in-store purchases that are paid in cash is:
$$p \pm z\sqrt{\frac{p(1-p)}{n}} = .32 \pm 1.96\sqrt{\frac{.32(1-.32)}{75}} = .32 \pm .106$$
$$.214 < \pi < .426$$
• We are 95% confident that this interval contains the true population proportion.
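The auditing interval can be reproduced in Python. In this sketch, `proportion_interval` is a hypothetical helper that applies the rule of thumb with p in place of π, as the example does, and then computes p ± z√(p(1 − p)/n).

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, confidence=0.95):
    """Large-sample interval p +/- z * sqrt(p * (1 - p) / n) for a proportion."""
    p = x / n
    if n * p <= 10 or n * (1 - p) <= 10:
        raise ValueError("rule of thumb np > 10 and n(1 - p) > 10 is not met")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = proportion_interval(24, 75)  # auditing example: 24 cash purchases out of 75
print(f"{low:.3f} < pi < {high:.3f}")
```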
Narrowing the Interval
• The width of the confidence interval for π depends on
  - the sample size
  - the confidence level
  - the sample proportion p
• To obtain a narrower interval (i.e., more precision) either
  - increase the sample size or
  - reduce the confidence level
Using Excel and MegaStat
• To find a confidence interval for a proportion in Excel, use (for example)
=0.15-NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
=0.15+NORMSINV(.95)*SQRT(0.15*(1-0.15)/200)
These give the lower and upper limits of a 90% confidence interval for p = .15 and n = 200, since NORMSINV(.95) = 1.645.
• In MegaStat, enter p and n to obtain the confidence interval for a proportion.
Figure 8.23
• MegaStat always assumes normality.
• If the sample is small, the distribution of p may not be well approximated by the normal.
• Confidence limits around p can be constructed by using the binomial distribution.
Figure 8.24
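The text does not say which binomial method produces Figure 8.24. One common choice is the Clopper-Pearson ("exact") interval, sketched below with the standard library only: each limit is the value of π at which a binomial tail probability equals α/2, found by bisection. The inputs x = 3 and n = 10 are hypothetical.

```python
from math import comb

def binom_cdf(k, n, pi):
    """P(X <= k) for X ~ Binomial(n, pi)."""
    return sum(comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))

def bisect_increasing(f, lo=0.0, hi=1.0, iterations=60):
    """Root of an increasing function f on [lo, hi], found by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, confidence=0.95):
    """Clopper-Pearson (exact binomial) confidence limits for pi."""
    alpha = 1 - confidence
    # Lower limit: the pi at which P(X >= x) = alpha / 2 (P(X >= x) rises with pi).
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda pi: (1 - binom_cdf(x - 1, n, pi)) - alpha / 2)
    # Upper limit: the pi at which P(X <= x) = alpha / 2 (P(X <= x) falls with pi).
    upper = 1.0 if x == n else bisect_increasing(
        lambda pi: alpha / 2 - binom_cdf(x, n, pi))
    return lower, upper

low, high = exact_interval(3, 10)  # hypothetical small sample: 3 successes in 10 trials
print(f"{low:.4f} < pi < {high:.4f}")
```

Unlike the normal-based interval, these limits never fall below 0 or above 1, and they remain usable when np or n(1 − p) is small.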
Polls and Margin of Error
• In polls and surveys, the half-width of the confidence interval when p = .50 is called the margin of error.
• Margins of error for a 95% confidence interval are usually computed assuming p = .50.
• Each reduction in the margin of error requires a disproportionately larger sample size.
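Margins of error at p = .50 follow from z√(.25/n). This sketch computes a few for 95% confidence and also inverts the formula to find the sample size for a target margin (±.03 here, an arbitrary example):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # 95% confidence
margins = {}
for n in (100, 200, 400, 800, 1600):
    margins[n] = z * math.sqrt(0.50 * (1 - 0.50) / n)  # p = .50 gives the largest margin
    print(f"n = {n:>4}: margin of error = ±{margins[n]:.3f}")

# Solving E = z * sqrt(.25 / n) for n gives the sample size for a target margin E.
target = 0.03
n_needed = math.ceil((z / target) ** 2 * 0.25)
print("n needed for ±0.03:", n_needed)
```

Halving the margin requires four times the sample size.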
Rule of Three
• If in n independent trials no events occur, the upper 95% confidence bound is approximately 3/n.
Very Quick Rule
• A Very Quick Rule (VQR) for a 95% confidence interval when p is near .50 is $p \pm 1/\sqrt{n}$.
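Both shortcuts can be compared with the quantities they approximate. A sketch: the Rule of Three approximates the exact one-sided bound 1 − 0.05^(1/n), and the VQR approximates 1.96√(.25/n) = 0.98/√n.

```python
import math
from statistics import NormalDist

# Rule of Three: with zero events in n trials, the exact one-sided 95% upper bound
# solves (1 - pi)^n = 0.05, i.e., pi = 1 - 0.05 ** (1 / n), which is close to 3 / n.
for trials in (30, 100, 1000):
    exact = 1 - 0.05 ** (1 / trials)
    print(f"n = {trials:>4}: 3/n = {3 / trials:.5f}, exact bound = {exact:.5f}")

# VQR: at p = .50, z * sqrt(.25 / n) is about 0.98 / sqrt(n), so 1 / sqrt(n) is slightly wider.
z = NormalDist().inv_cdf(0.975)
for n in (100, 400, 1000):
    print(f"n = {n:>4}: VQR ±{1 / math.sqrt(n):.4f} vs normal ±{z * math.sqrt(0.25 / n):.4f}")
```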
Applied Statistics in Business & Economics
End of Chapter 8A