Week 6
October 6-10
Four Mini-Lectures
QMM 510
Fall 2014
ML 6.1
Chapter Contents
8.1 Sampling Variation
8.2 Estimators and Sampling Errors
8.3 Sample Mean and the Central Limit Theorem
8.4 Confidence Interval for a Mean (μ) with Known σ
8.5 Confidence Interval for a Mean (μ) with Unknown σ
8.6 Confidence Interval for a Proportion (π)
8.7 Estimating from Finite Populations
8.8 Sample Size Determination for a Mean
8.9 Sample Size Determination for a Proportion
8.10 Confidence Interval for a Population Variance, σ² (Optional)
So many topics, so little time …
Chapter 8
Sampling Distributions
Learning Objectives
LO8-1: Define sampling error, parameter, and estimator.
LO8-2: Explain the desirable properties of estimators.
LO8-3: State the Central Limit Theorem for a mean.
LO8-4: Explain how sample size affects the standard error.
Sampling Variation
• Sample statistic – a random variable whose value depends on which population items are included in the random sample.
• Depending on the sample size, the sample statistic could either represent the population well or differ greatly from the population.
• This sampling variation can be illustrated. Here are 100 individual items drawn from a population. When n = 1, the histogram of the sampled items resembles the population, but not exactly.
Example: GMAT Scores
• Consider eight random samples of size n = 5 from a large population of GMAT scores for MBA applicants.
• The sample items vary, but the means tend to be close to the population mean (μ = 520.78).
• Sample dot plots show that the sample means have much less variation than the individual sample items.
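The GMAT example can be imitated with a small simulation. This is only a sketch: the population below is synthetic, the mean 520.78 comes from the slide, and the population standard deviation of 85 and the random seed are assumptions made purely for illustration.

```python
import random
import statistics

# Synthetic stand-in for the GMAT population. The mean 520.78 is from the slide;
# the standard deviation (85) and the seed are assumptions for illustration.
rng = random.Random(510)
population = [rng.gauss(520.78, 85) for _ in range(10_000)]

# Eight random samples of size n = 5, as in the example.
samples = [rng.sample(population, 5) for _ in range(8)]
sample_means = [statistics.mean(s) for s in samples]

# Compare the spread of the individual items with the spread of the means.
all_items = [x for s in samples for x in s]
sd_items = statistics.stdev(all_items)
sd_means = statistics.stdev(sample_means)

print("sample means:", [round(m, 1) for m in sample_means])
print(f"std. dev. of individual items: {sd_items:.1f}")
print(f"std. dev. of sample means:     {sd_means:.1f}")
```

The individual items should spread much more widely than the eight sample means, which is the pattern the dot plots describe.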
Estimators and Sampling Distributions
Some Terminology
• Estimator – a statistic derived from a sample to infer the value of a population parameter.
• Estimate – the value of the estimator in a particular sample.
• A population parameter is usually represented by a Greek letter and the corresponding statistic by a Roman letter.
Examples of Estimators
Sampling Distributions
The sampling distribution of an estimator is the probability distribution of all possible
values the statistic may assume when a random sample of size n is taken.
Note: An estimator is a random
variable since samples vary.
• Sampling error is the difference between an estimate and the corresponding population parameter. For example, if we use the sample mean as an estimate for the population mean, then the sampling error is x̄ – μ.
• Bias is the difference between the expected value of the estimator and the true parameter. Example for the mean: Bias = E(X̄) – μ.
• An estimator is unbiased if its expected value is the parameter being estimated. The sample mean is an unbiased estimator of the population mean since E(X̄) = μ.
• On average, an unbiased estimator neither overstates nor understates the true parameter.
Unbiased
A desirable property for an estimator is for it to be unbiased.
Efficiency
• Efficiency refers to the variance of the estimator’s sampling distribution.
• A more efficient estimator has smaller variance.
Figure 8.6
Consistency
A consistent estimator converges toward the parameter being
estimated as the sample size increases.
Figure 8.6
Central Limit Theorem
The Central Limit Theorem is a powerful result that allows us to
approximate the shape of the sampling distribution of the sample
mean even when we don’t know what the population looks like.
If the population is exactly normal, then
the sample mean follows a normal
distribution.
As the sample size n increases, the
distribution of sample means narrows
in on the population mean µ.
If the sample is large enough, the sample means will have
approximately a normal distribution even if your population is
not normal.
Illustrations of Central Limit Theorem
Using the uniform and a right-skewed distribution.
Applying The Central Limit Theorem
The Central Limit Theorem permits us to define an interval within which the
sample means are expected to fall. As long as the sample size n is large enough,
we can use the normal distribution regardless of the population shape (or any n if
the population is normal to begin with).
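As a rough check of this idea, the sketch below draws many samples from a right-skewed population and counts how often the sample mean lands inside μ ± 1.96·σ/√n. The exponential population, the sample size n = 30, the number of trials, and the seed are all assumptions chosen for illustration, not values from the slides.

```python
import math
import random

rng = random.Random(8)

# Assumed right-skewed population: exponential with mean 1 (so sigma is also 1).
mu, sigma = 1.0, 1.0
n = 30          # sample size, chosen for illustration
trials = 5_000  # number of simulated samples

# Interval in which about 95% of sample means are expected to fall.
half_width = 1.96 * sigma / math.sqrt(n)
lower, upper = mu - half_width, mu + half_width

# Draw the samples and record each sample mean.
means = [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]
inside = sum(lower <= m <= upper for m in means) / trials

print(f"expected interval: [{lower:.3f}, {upper:.3f}]")
print(f"share of sample means inside: {inside:.3f}")
```

Even though the population is far from normal, the share of sample means inside the interval should come out close to 0.95.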
Sample Size and Standard Error
The sample means tend to fall within a narrower interval as n increases. The key is the standard error of the mean:
σ_x̄ = σ / √n
For example, relative to n = 1, the standard error is halved when n = 4. To halve it again requires n = 16, and to halve it again requires n = 64. To halve the standard error, you must quadruple the sample size (the law of diminishing returns).
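The halving pattern is easy to verify numerically. In this small sketch the value σ = 100 is a hypothetical population standard deviation, and `standard_error` is just a helper name introduced here.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 100.0  # hypothetical population standard deviation

# Each quadrupling of n cuts the standard error in half.
for n in (1, 4, 16, 64):
    print(f"n = {n:>2}  standard error = {standard_error(sigma, n):.2f}")
```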
Illustration: All Possible Samples from a Uniform Population
• Consider a discrete uniform population consisting of the integers {0, 1, 2, 3}.
• The population parameters are: μ = 1.5, σ = 1.118.
• The population is uniform, yet the distribution of all possible sample means of size 2 has a peaked triangular shape.
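The triangular shape can be reproduced by listing every possible sample. The sketch below assumes sampling with replacement and treats order as significant, which gives 4 × 4 = 16 samples of size 2; that sampling scheme is an assumption, since the transcript does not state it.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import math

population = [0, 1, 2, 3]
N = len(population)

# Population parameters (sigma uses the population formula, dividing by N).
mu = Fraction(sum(population), N)
sigma = math.sqrt(sum((x - mu) ** 2 for x in population) / N)
print(f"mu = {float(mu)}, sigma = {sigma:.3f}")

# All ordered samples of size 2, drawn with replacement (an assumption).
samples = list(product(population, repeat=2))

# Tally how many samples produce each possible sample mean.
mean_counts = Counter(Fraction(a + b, 2) for a, b in samples)
for m in sorted(mean_counts):
    bar = "*" * mean_counts[m]
    print(f"mean = {float(m):.1f}  count = {mean_counts[m]}  {bar}")
```

The counts rise from 1 to 4 and fall back to 1, peaking at the population mean of 1.5.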
Illustration: 100 Samples from a Uniform Population
The population is uniform, yet the histogram of sample means has a peaked triangular shape starting with n = 2. By n = 8, the histogram appears normal.
Illustration: 100 Samples from a Skewed Population
The population is skewed, yet the histogram of sample means starts to have a normal shape starting with n = 4. By n = 16, the histogram appears arguably normal.
Confidence Interval for a Mean (μ) with Known σ
ML 6.2
What Is a Confidence Interval?
• The confidence interval for μ with known σ is:
x̄ ± z_{α/2} · σ / √n
z-values for commonly used confidence levels: 90% uses z = 1.645, 95% uses z = 1.960, 99% uses z = 2.576.
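The z-values for any confidence level can be computed rather than looked up. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper names `z_critical` and `ci_known_sigma` and the example inputs (x̄ = 50, σ = 10, n = 25) are hypothetical and not taken from the slides.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed z-value: the point leaving alpha/2 in the upper tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_known_sigma(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    margin = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")

# Hypothetical inputs, for illustration only.
low, high = ci_known_sigma(xbar=50.0, sigma=10.0, n=25, confidence=0.95)
print(f"95% interval: [{low:.2f}, {high:.2f}]")
```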
Example: Bottle Fill
… but usually we do NOT know σ
Choosing a Confidence Level
• A higher confidence level leads to a wider confidence interval.
• Greater confidence implies loss of precision (i.e., greater margin of error).
• 95% confidence is most often used.
Confidence Intervals for Example 8.2
Interpretation
• A confidence interval either does or does not contain μ.
• The confidence level quantifies the risk.
• When constructing 95% confidence intervals, about 95 out of 100 intervals are expected to contain μ, while about 5 are expected not to (for example, sample 14 below).
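This interpretation can be illustrated by building 100 intervals from simulated samples and counting how many capture μ. The normal population with μ = 100 and σ = 15, the sample size n = 25, and the seed are hypothetical choices made for this sketch.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(14)

# Hypothetical normal population with known sigma.
mu, sigma, n = 100.0, 15.0, 25

# Margin of error for a 95% interval with known sigma.
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / math.sqrt(n)

def interval_contains_mu():
    """Draw one sample, build its 95% interval, and check whether it covers mu."""
    xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    return xbar - margin <= mu <= xbar + margin

hits = sum(interval_contains_mu() for _ in range(100))
print(f"{hits} of 100 intervals contain mu")
```

The count varies from run to run with a different seed, but it should stay near 95. Any single interval either contains μ or it does not.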
When Can We Assume Normality?
• If σ is known and the population is normal, then we can safely use the formula to compute the confidence interval.
• If σ is known and we do not know whether the population is normal, a common rule of thumb is that n ≥ 30 is sufficient to use the formula as long as the distribution is approximately symmetric with no outliers.
• Larger n may be needed to assume normality if you are sampling from a strongly skewed population or one with outliers.
Confidence Interval for a Mean (μ) with Unknown σ
ML 6.3
… and usually we do NOT know σ …
Student’s t Distribution
Use the Student’s t distribution instead of the normal distribution when the population is normal but the standard deviation σ is unknown and the sample size is small.
• t distributions are symmetric and shaped like the standard normal distribution.
• The t distribution is dependent on the size of the sample.
Figure 8.11: Comparison of Normal and Student’s t
Degrees of Freedom
• Degrees of freedom (d.f.) is a parameter based on the sample size that is used to determine the t distribution.
• The d.f. for the t distribution in this case is given by d.f. = n – 1.
• As n increases, the t distribution approaches the shape of the normal distribution.
• For a given confidence level, t is always larger than z, so a confidence interval based on t is always wider than if z were used.
Comparison of z and t
• For very small samples, t-values differ substantially from the normal.
• As degrees of freedom increase, the t-values approach the normal z-values.
• For example, for n = 31, the degrees of freedom would be d.f. = 31 – 1 = 30.
• So for a 90 percent confidence interval, we would use t = 1.697, which is slightly larger than z = 1.645.
Note: the z and t distributions are almost the same for d.f. = 30.
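The convergence of t toward z can be tabulated without a printed table. The slides rely on Appendix D and Excel; the sketch below instead approximates t critical values with the standard library only, integrating the t density by Simpson's rule and inverting by bisection. That numerical approach, the helper names, and the list of d.f. values are assumptions of this example.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-tailed critical value, found by bisection on the cdf."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.95)  # z for 90% confidence
for df in (4, 9, 19, 30, 60):
    print(f"d.f. = {df:>2}  t = {t_critical(0.90, df):.3f}  z = {z:.3f}")
```

For d.f. = 30 this should agree with the slide's t = 1.697, and every t-value stays above z = 1.645.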
Example: GMAT Scores Again
Figure 8.13
• Construct a 90% confidence interval for the mean GMAT score of all MBA applicants, given n = 20, x̄ = 510, and s = 73.77.
• Since σ is unknown, use the Student’s t for the confidence interval with d.f. = 20 – 1 = 19.
• Find t_{α/2} = t_{.05} = 1.729 from Appendix D.
• For a 90% confidence interval, use Appendix D to find t_{.05} = 1.729 with d.f. = 19.
• The interval is x̄ ± t_{α/2} · s / √n = 510 ± 1.729 × 73.77 / √20 = 510 ± 28.52.
• We are 90 percent confident that the true mean GMAT score is within the interval [481.48, 538.52].
Note: We could also use Excel, MINITAB, etc. to obtain t_{.05} values as well as to construct confidence intervals. In Excel: =T.INV.2T(0.1,19) = 1.729
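The arithmetic behind this interval can be reproduced in a few lines. The inputs (n = 20, x̄ = 510, s = 73.77, t = 1.729) are the values given in the example; the critical value is typed in from the table rather than computed, and the variable names are only illustrative.

```python
import math

# Values from the GMAT example.
n = 20
xbar = 510.0
s = 73.77
t_crit = 1.729  # t for 90% confidence with d.f. = 19, taken from the table

# Margin of error: t * s / sqrt(n)
standard_error = s / math.sqrt(n)
margin = t_crit * standard_error

lower, upper = xbar - margin, xbar + margin
print(f"standard error = {standard_error:.2f}")
print(f"margin of error = {margin:.2f}")
print(f"90% confidence interval: [{lower:.2f}, {upper:.2f}]")
```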
Confidence Interval Width
• Confidence interval width reflects
  - the sample size,
  - the confidence level, and
  - the standard deviation.
• To obtain a narrower interval and more precision
  - increase the sample size, or
  - lower the confidence level (e.g., from 90% to 80% confidence).
There is no free lunch!
Using Appendix D
• Beyond d.f. = 50, Appendix D shows d.f. in steps of 5 or 10.
• If the table does not give the exact degrees of freedom, use the t-value for the next lower degrees of freedom.
• This is a conservative procedure since it causes the interval to be slightly wider.
• A conservative statistician may use the t distribution for confidence intervals when σ is unknown because using z would underestimate the margin of error.
Confidence Interval for a Population Variance, σ²
ML 6.4
Chi-Square Distribution
• If the population is normal, then the statistic (n – 1)s²/σ² follows the chi-square distribution (χ²) with degrees of freedom d.f. = n – 1.
• Lower (χ²_L) and upper (χ²_U) tail percentiles for the chi-square distribution can be found using Appendix E.
Note: The chi-square distribution is skewed right, but less so for larger d.f.
Confidence Interval
• Using the sample variance s², the confidence interval is
(n – 1)s² / χ²_U ≤ σ² ≤ (n – 1)s² / χ²_L
• To obtain a confidence interval for the standard deviation σ, just take the square root of the interval bounds.
• You can use Appendix E to find critical chi-square values, or get them from Excel (here for d.f. = 39 and 95% confidence):
=CHISQ.INV(0.025,39) = 23.65
=CHISQ.INV(0.975,39) = 58.12
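With those two critical values in hand, the interval for the variance is a short calculation. In the sketch below the chi-square values are the Excel results above (d.f. = 39, so n = 40); the sample variance s² = 25 is a hypothetical number, since the transcript does not give one, and `variance_ci` is a helper name introduced here.

```python
import math

def variance_ci(s2, n, chi2_lower, chi2_upper):
    """Interval for sigma^2: (n-1)s^2/chi2_upper to (n-1)s^2/chi2_lower."""
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Critical values from the Excel formulas above (d.f. = 39, 95% confidence).
chi2_L, chi2_U = 23.65, 58.12

# Hypothetical sample variance, for illustration only.
n, s2 = 40, 25.0

var_low, var_high = variance_ci(s2, n, chi2_L, chi2_U)
print(f"95% interval for the variance: [{var_low:.2f}, {var_high:.2f}]")

# Square roots of the bounds give the interval for the standard deviation.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)
print(f"95% interval for the std. dev.: [{sd_low:.2f}, {sd_high:.2f}]")
```

Note that the upper chi-square value goes in the denominator of the lower bound, and the interval is not symmetric around s² because the chi-square distribution is skewed.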
Bottom Line:
• Estimating a variance is easy.
• But you don’t see it very often.
• Maybe because the chi-square distribution is less familiar?
• Maybe because we usually care more about the mean?