Confidence Interval for a Population Mean: Normal (z) Statistic
Estimation Process
[Figure: a random sample is drawn from a population whose mean, µ, is unknown. The sample mean is x̄ = 50, which leads to the statement: "I am 95% confident that µ is between 40 and 60."]
Confidence Interval
According to the Central Limit Theorem, the sampling
distribution of the sample mean is approximately normal
for large samples. Let us calculate the interval estimator:
$$\bar{x} \pm 1.96\,\sigma_{\bar{x}} = \bar{x} \pm \frac{1.96\,\sigma}{\sqrt{n}}$$
That is, we form an interval from 1.96 standard
deviations below the sample mean to 1.96 standard
deviations above the mean. Prior to drawing the sample,
what are the chances that this interval will enclose µ,
the population mean?
Confidence Interval
If sample measurements yield a value of x̄ that falls
between the two boundaries located 1.96σx̄ on either side of µ,
then the interval x̄ ± 1.96σx̄ will contain µ.
The area under the normal curve between these two boundaries is
approximately .95. Thus, the probability that a randomly selected
interval will contain µ is approximately .95.
95% Confidence Level
If our confidence level is 95%, then in the long run, 95% of
our confidence intervals will contain µ and 5% will not.
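The long-run interpretation can be checked with a small simulation. The sketch below is illustrative only: the population values (µ = 50, σ = 10), the sample size n = 30, the number of repetitions, and the function name `coverage` are assumptions chosen for the example, not values from the slides.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=30, trials=4000, level=0.95, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for level = 0.95
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

print(coverage())   # close to 0.95; the exact value depends on the seed
```

Each simulated interval either contains µ or does not; it is the proportion over many repetitions that settles near the confidence level.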
For a confidence level of 95%, the area in the two tails is
.05. To choose a different confidence level we increase or
decrease the area (call it α) assigned to the tails. If we place
α/2 in each tail and zα/2 is the corresponding z-value, the
confidence interval with coefficient (1 – α) is
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}.$$
Conditions Required for a Valid Large-Sample Confidence Interval for µ
1. A random sample is selected from the target
population.
2. The sample size n is large (as a rule of thumb, n ≥ 30). By the
Central Limit Theorem, this condition ensures
that the sampling distribution of x̄ is approximately
normal. Also, for large n, s will be a good estimator
of σ.
Large-Sample 100(1 – α)% Confidence Interval for µ
$$\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}} = \bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
where zα/2 is the z-value with an area α/2 to its right
in the standard normal distribution. The parameter σ is
the standard deviation of the sampled population, and n
is the sample size.
Note: When σ is unknown and n is large (n ≥ 30), the
confidence interval is approximately equal to
$$\bar{x} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
where s is the sample standard deviation.
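As a minimal sketch of the large-sample formula, the function below computes the interval using only Python's standard library. The function name `z_interval` and the example inputs (x̄ = 50, standard deviation 12, n = 36) are hypothetical and chosen purely for illustration.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sd, n, level=0.95):
    """Large-sample confidence interval for mu: xbar ± z_{alpha/2} * sd / sqrt(n).

    sd is sigma when it is known, or the sample standard deviation s
    when sigma is unknown and n is large.
    """
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z-value with area alpha/2 to its right
    margin = z * sd / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical inputs, for illustration only
low, high = z_interval(xbar=50.0, sd=12.0, n=36, level=0.95)
print(round(low, 2), round(high, 2))
```

Passing `level=0.90` or `level=0.99` changes only the z-value, which is how the tail area α controls the interval.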
Thinking Challenge
You’re a Q/C inspector for Gallo.
The σ for 2-liter bottles is .05
liters. A random sample of 100
bottles showed x̄ = 1.99 liters.
What is the 90% confidence
interval estimate of the true
mean amount in 2-liter bottles?
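Confidence Interval Solution
$$\bar{x} - z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}}$$
$$1.99 - 1.645\cdot\frac{.05}{\sqrt{100}} \le \mu \le 1.99 + 1.645\cdot\frac{.05}{\sqrt{100}}$$
$$1.982 \le \mu \le 1.998$$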
We can be 90% confident that the mean amount in 2-liter bottles is
between 1.982 and 1.998 liters. Our confidence is derived from the fact that
90% of the intervals formed in repeated applications of this procedure
would contain µ.
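The arithmetic in this solution can be reproduced in a few lines. This is only a check of the numbers above; the variable names are illustrative, and the z-value is taken from Python's `statistics.NormalDist` rather than from a table.

```python
import math
from statistics import NormalDist

sigma, n, xbar = 0.05, 100, 1.99          # values from the Thinking Challenge
z = NormalDist().inv_cdf(0.95)            # z_{.05}, about 1.645, for 90% confidence
margin = z * sigma / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"{lower:.3f} <= mu <= {upper:.3f}")   # 1.982 <= mu <= 1.998
```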
Exercise
• A random sample of 70 observations from a normally
distributed population possesses a sample mean equal
to 26.2 and a sample standard deviation equal to 4.1.
• A) Find an approximate 95% confidence interval for µ.
• B) What do you mean when you say that a confidence
level is 95%?
• C) Find an approximate 99% confidence interval for µ.
• D) What happens to the width of a confidence interval
as the value of the confidence coefficient is increased
while the sample size is held fixed?
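One way to check answers to parts A, C, and D numerically is sketched below. It assumes the large-sample z interval with s substituted for σ (reasonable here because n = 70); the helper name `interval` and the extra 90% level are additions for illustration.

```python
import math
from statistics import NormalDist

n, xbar, s = 70, 26.2, 4.1                 # values from the exercise
std_error = s / math.sqrt(n)               # s stands in for sigma because n is large

def interval(level):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return xbar - z * std_error, xbar + z * std_error

widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = interval(level)
    widths[level] = high - low
    print(f"{level:.0%}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

With the sample size held fixed, only the z-value changes from one level to the next, so the printed widths grow as the confidence coefficient increases.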
Confidence Interval for a Population Mean: Student’s t-Statistic
Small sample size problem for
inference about µ
• The use of a small sample in making inferences
about µ presents two problems when we
attempt to use the standard normal z as a test
statistic.
Problem 1
• The shape of the sampling distribution of the sample
mean now depends on the shape of the population
sampled.
• We can no longer assume that the sampling
distribution of sample mean is approximately
normal because the central limit theorem ensures
normality only for samples that are sufficiently large.
Solution to Problem 1
• We know that if our sample comes from a population
with a normal distribution, the sampling distribution of
the sample mean will be normal regardless of the sample
size.
Problem 2
• The population standard deviation σ is almost
always unknown. For small samples, the sample
standard deviation s provides a poor approximation
for σ.
Solution to Problem 2
(Small Sample with σ Unknown)
Instead of using the standard normal statistic
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
use the t-statistic
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
in which the sample standard deviation, s, replaces the
population standard deviation, σ.
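To make the substitution of s for σ concrete, here is a minimal sketch that computes the t-statistic from raw data. The function name `t_statistic`, the five data values, and the hypothesized mean of 10.0 are all hypothetical and serve only as an illustration.

```python
import math
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    xbar = mean(sample)
    s = stdev(sample)                 # divides by n - 1
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical small sample and hypothesized mean, for illustration only
data = [9.8, 10.2, 10.4, 9.9, 10.1]
t = t_statistic(data, mu0=10.0)
print(round(t, 3))
```

Because s is itself computed from the sample, it varies from sample to sample, which is one source of the extra variability in t compared with z.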
Conditions Required for a Valid Small-Sample Confidence Interval for µ
• A random sample is selected from the target
population
• The population has a relative frequency
distribution that is approximately normal.
Small Sample with σ known
Use the standard normal statistic
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
Student’s t-Statistic
The t-statistic has a sampling distribution very much like
that of the z-statistic: mound-shaped, symmetric, with
mean 0.
The primary difference
between the sampling
distributions of t and z
is that the t-statistic is
more variable than the
z-statistic.
Degrees of Freedom
The actual amount of variability in the sampling
distribution of t depends on the sample size n. A
convenient way of expressing this dependence is to say
that the t-statistic has (n – 1) degrees of freedom (df).
Student’s t Distribution
[Figure: the standard normal curve compared with t curves for df = 13 and df = 5. All three are bell-shaped and symmetric about 0, but the t curves have "fatter" tails.]
The smaller the number of degrees of freedom for the t-statistic, the more
variable its sampling distribution will be.
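The "fatter tails" can be seen numerically by comparing density heights far from 0. The sketch below writes out the standard density formula for Student's t using `math.gamma`; the helper name `t_pdf` and the evaluation point x = 3 are choices made for this illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

x = 3.0                                   # a point well out in the tail
normal_density = NormalDist().pdf(x)
for df in (5, 13):
    print(f"t (df = {df}): {t_pdf(x, df):.4f}")
print(f"standard normal: {normal_density:.4f}")
```

At this point the density is largest for df = 5, smaller for df = 13, and smallest for the standard normal, matching the ordering of the curves described above.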
• We have a random sample of 15 cars of the same model. Assume
that the gas mileage for the population is normally distributed with a
standard deviation of 5.2 miles per gallon.
• A) Identify the bounds for a 90% confidence interval for the mean
given a sample mean of 22.8 miles per gallon.
• B) The car manufacturer of this particular model claims that the
average gas mileage is 26 miles per gallon. Discuss the validity of this
claim using the 90% confidence interval calculated in A.
• C) Let a and b represent the lower and upper boundaries of the 90%
confidence interval for the mean of the population. Is it correct to
conclude that there is a 90% probability that the true population mean
lies between a and b?
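A possible numerical check for parts A and B is sketched below. It assumes the z-statistic applies: although n = 15 is small, the population is stated to be normal and σ is given, which is the "small sample with σ known" case. Variable names are illustrative.

```python
import math
from statistics import NormalDist

n, xbar, sigma = 15, 22.8, 5.2            # values from the exercise
claimed_mean = 26.0                       # manufacturer's claim, part B

z = NormalDist().inv_cdf(0.95)            # 90% confidence leaves .05 in each tail
margin = z * sigma / math.sqrt(n)
a, b = xbar - margin, xbar + margin
print(f"90% CI: ({a:.2f}, {b:.2f})")
print("claim inside interval:", a <= claimed_mean <= b)
```

For part C, note that once a and b are computed they are fixed numbers; the 90% refers to the long-run success rate of the procedure, not to a probability statement about this one interval.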
Thinking Challenge
• We have a random sample of customer order totals
with an average of $78.25 and a population standard
deviation of $22.50.
• A) Calculate a 90% confidence interval for the mean
given a sample size of 40 orders.
• B) Calculate a 90% confidence interval for the mean
given a sample size of 75 orders.
• C) Explain the difference in the 90% confidence
intervals calculated in A and B.
• Calculate the minimum sample size needed for
a 90% confidence interval for the mean with a $5
margin of error.
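The effect of sample size on the interval, and the minimum sample size, can be explored with the sketch below. The sample-size step assumes the usual rearrangement of the margin of error: requiring zα/2 · σ/√n ≤ E gives n ≥ (zα/2 · σ / E)², rounded up. The symbol E (target margin of error) and the names `margin` and `n_min` are introduced here for illustration.

```python
import math
from statistics import NormalDist

xbar, sigma = 78.25, 22.5                 # values from the Thinking Challenge
z = NormalDist().inv_cdf(0.95)            # about 1.645 for 90% confidence

def margin(n):
    return z * sigma / math.sqrt(n)

for n in (40, 75):
    m = margin(n)
    print(f"n = {n}: ({xbar - m:.2f}, {xbar + m:.2f}), margin {m:.2f}")

# Smallest n whose margin of error is at most $5: n >= (z * sigma / E)^2
target = 5.0
n_min = math.ceil((z * sigma / target) ** 2)
print("minimum n:", n_min)
```

The margin shrinks in proportion to 1/√n, so the interval for 75 orders is narrower than the one for 40 orders even though the confidence level is the same.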