9.2 Hypothesis Tests for Population Means
LEARNING GOAL
Understand and interpret one- and two-tailed hypothesis
tests for claims made about population means, and learn to
recognize and avoid common errors (type I and type II
errors) in hypothesis tests.
Copyright © 2009 Pearson Education, Inc.
One-Tailed Hypothesis Tests
Consider the following hypothetical situation. Columbia
College advertises that the mean starting salary of its
graduates is $39,000. An independent organization suspects
that this claim is exaggerated and decides to conduct a
hypothesis test to seek evidence to support its suspicion.
The null and alternative hypotheses are
H0: μ = $39,000
Ha: μ < $39,000
Because the alternative hypothesis has a “less than” form, we
are dealing with a left-tailed hypothesis test.
They select a random sample of 100 recent graduates from
the college. The mean salary of the graduates in the sample
turns out to be x̄ = $37,000.
From the four-step hypothesis test process on page 376:
Step 1: The mean starting salary has been identified as the
population parameter of interest, and the null and alternative
hypotheses have been stated.
Step 2: The sample has been drawn and measured to
determine its sample size and sample mean.
The Sampling Distribution
Step 3 of the hypothesis test process is finding the likelihood of
observing a sample mean as extreme as the one found, under the
assumption that the null hypothesis is true.
For the Columbia College example, the question becomes
this: How likely are we to select a sample (of size n = 100)
with a mean of $37,000 or less when the mean for the whole
population is $39,000?
To answer this question, we need a crucial observation based
on our work with sampling distributions in Chapter 8: The
observed sample mean (x̄ = $37,000) is just one point in a
distribution of sample means.
Moreover, assuming the null hypothesis is true, this
distribution of sample means is approximately normal with a
mean of μ = $39,000.
Figure 9.1 Graphical interpretation of the hypothesis test for the Columbia
College example. If the null hypothesis is true, then the population mean is μ =
$39,000. In that case, if we took many samples from the population, the
distribution of sample means would be approximately normal with a mean of
$39,000. Under this assumption, the distance of a single sample mean from the
population mean allows us to decide whether to reject or not reject the null
hypothesis.
Finding the Standard Score
Because the sampling distribution is a (nearly) normal
distribution, the standard score of the sample mean represents
a quantitative measure of the distance between the sample
mean and the claimed population mean.
Recall from Section 5.2 that the standard score (or z-score) of
a data value in a normal distribution is the number of standard
deviations that it lies above or below the mean of the
distribution.
From the Central Limit Theorem (Section 5.3), the standard
deviation of a distribution of sample means is σ/√n, where σ is
the population standard deviation and n is the sample size.
Putting these ideas together, we find the following formula for
the standard score of a sample mean:
\[ z = \frac{\text{sample mean} - \text{population mean}}{\text{standard deviation of sampling distribution}} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
The only remaining problem is that we generally do not know the
population standard deviation, σ. For now, let’s assume that we
can approximate the population standard deviation with the
sample standard deviation, s, and that s = $6,150 for the 100
salaries in the sample.
(In Section 10.1 we’ll discuss a better way to proceed when σ
is not known.) In that case, we set σ = $6,150 and the
standard deviation for the distribution of sample means is
\[ \frac{\sigma}{\sqrt{n}} = \frac{\$6{,}150}{\sqrt{100}} = \$615 \]
Using this value in the previous equation tells us that the
standard score for the sample mean of x̄ = $37,000 in a
sampling distribution with a population mean of μ = $39,000
is
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{\$37{,}000 - \$39{,}000}{\$615} = -3.25 \]
In other words, the sample mean of x̄ = $37,000 lies 3.25
standard deviations below the mean of the sampling
distribution.
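The arithmetic above can be checked with a short Python sketch. The function name standard_score is our own choice, not something from the text, and the call uses σ = $6,150 only because the text assumes the sample standard deviation approximates the population value.

```python
import math

def standard_score(sample_mean, claimed_mean, sigma, n):
    """Return z = (sample mean - claimed mean) / (sigma / sqrt(n))."""
    # Standard deviation of the distribution of sample means
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - claimed_mean) / standard_error

# Columbia College example: sample mean 37,000, claimed mean 39,000,
# sigma approximated by s = 6,150 (an assumption made in the text), n = 100
z = standard_score(37_000, 39_000, 6_150, 100)
print(round(z, 2))  # -3.25
```

A negative result means the sample mean lies below the claimed population mean, as expected for evidence in a left-tailed test.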
Computing the Standard Score for the Sample
Mean in a Hypothesis Test
When we draw a random sample for a hypothesis test, we
can consider it to be one of many possible samples in the
sampling distribution. Given the sample size (n), the
sample mean ( x̄ ), the population standard deviation (σ),
and the claimed population mean (μ), we make the
following computations:
\[ \text{standard deviation for the distribution of sample means} = \frac{\sigma}{\sqrt{n}} \]
\[ \text{standard score for the sample mean, } z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
Note: In reality, it is rare that we know the population
standard deviation σ; see Section 10.1 about how to deal
with such cases.
Critical Values for Statistical Significance
Recall that the hypothesis test is significant at the 0.05 level if
the probability of finding a result as extreme as the one actually
observed is 0.05 or less (assuming the null hypothesis is true).
For a left-tailed test, then, we are looking for a standard score
that is at or below the 5th percentile of the sampling distribution.
From Appendix A, the 5th percentile has a standard score of z =
-1.645.
Therefore, a left-tailed hypothesis test is significant at the 0.05
level if the standard score of the sample mean is less than or
equal to z = -1.645.
This standard score represents the critical value for
significance at the 0.05 level in a left-tailed hypothesis test.
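Instead of reading Appendix A, the critical values can be computed from the inverse of the standard normal cumulative distribution. The sketch below uses Python's standard-library statistics.NormalDist; the helper name left_tailed_critical_value is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def left_tailed_critical_value(significance_level):
    """Standard score with `significance_level` of the area to its left."""
    return standard_normal.inv_cdf(significance_level)

print(round(left_tailed_critical_value(0.05), 3))  # -1.645
print(round(left_tailed_critical_value(0.01), 2))  # -2.33
```

For a right-tailed test the critical values are the same numbers with the sign reversed, because the normal curve is symmetric.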
Decisions Based on Statistical Significance
for One-Tailed Hypothesis Tests
We decide whether to reject or not reject the null
hypothesis by comparing the standard score (z) for a
sample mean to critical values for significance at a
given level. Table 9.1 (next slide) summarizes the
decisions for one-tailed hypothesis tests at the 0.05
and 0.01 levels of significance.
TIME OUT TO THINK
Suppose that, in right-tailed tests, one study finds a
sample mean with z = 3 and another study finds a sample
mean with z = 10. Both are significant at the 0.01 level,
but which result provides stronger evidence for rejecting
the null hypothesis? Explain.
EXAMPLE 1 Columbia College Test
Significance
Assuming that the null hypothesis is true and the mean starting
salary for Columbia College graduates is $39,000, is it
statistically significant to find a sample in which the mean is
only $37,000? Based on your answer, should we reject or not
reject the null hypothesis?
Solution: Recall that the hypothesis test is left-tailed (because
the alternative hypothesis has the “less than” form of μ <
$39,000) and we already found that a sample mean of x̄ =
$37,000 has a standard score of -3.25.
This result is significant at the 0.05 level because the standard
score is less than the critical value of z = -1.645.
In fact, it is also significant at the 0.01 level, because the
standard score is less than -2.33.
We therefore have strong reason to reject the null hypothesis and
conclude that Columbia College officials did indeed exaggerate
the mean starting salary of its graduates.
Finding the P-Value
Recall that the P-value is the probability of finding a sample
mean as extreme as the one found, under the assumption that
the null hypothesis is true.
For a left-tailed test, the probability of finding a sample mean
less than or equal to some particular value is simply the area
under the curve to the left of the sample mean.
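As a rough illustration, the left-tail area can be computed directly rather than looked up. The sketch below applies it to the Columbia College standard score of -3.25 found earlier; the function name left_tailed_p_value is our own, and the normal-curve model is the same approximation the text relies on.

```python
from statistics import NormalDist

def left_tailed_p_value(z):
    """Area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)

# Columbia College example: z = -3.25
p_value = left_tailed_p_value(-3.25)
print(round(p_value, 4))
```

The result is well below 0.01, which agrees with the conclusion that the sample mean is significant at both the 0.05 and 0.01 levels.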
Figure 9.4 The P-value for one-tailed hypothesis tests corresponds to an area
under the sampling distribution curve. Once we compute the standard score for a
particular sample mean, we can find the corresponding area from standard score
tables like those in Appendix A. (Note: Appendix A lists areas to the left of each
standard score; therefore, for right-tailed tests, finding the area to the right of the
sample mean requires subtracting the value given in the table from 1.)
Summary of One-Tailed Tests for Population
Means
To summarize:
• Because we are dealing with population means, the null
hypothesis has the form μ = claimed value. To decide whether
to reject or not reject the null hypothesis, we must determine
whether a sample as extreme as the one found in the hypothesis
test is likely or unlikely to occur if the null hypothesis is true.
• We determine this likelihood from the standard score (z) of the
sample mean, which we compute from the formula
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
where n is the sample size, x̄ is the sample mean, μ is the
population mean claimed by the null hypothesis, and σ is the
population standard deviation. (In this section, we generally
approximate σ with the sample standard deviation, s; a better
approach is described in Section 10.1.)
• We can then assess the standard score in two ways:
1. We can assess its level of statistical significance by
comparing it to the critical values given in Table 9.1 (page
383).
2. We can determine its P-value with standard score tables
like those in Appendix A. For a left-tailed test, the P-value
is the area under the normal curve to the left of the standard
score; for a right-tailed test, it is the area under the normal
curve to the right of the standard score.
• If the result is statistically significant at the chosen level
(usually either the 0.05 or the 0.01 significance level), we
reject the null hypothesis. If it is not statistically significant,
we do not reject the null hypothesis.
EXAMPLE 2 Mean Rental Car Mileage (Revisited)
Recall the case of the rental car fleet owner (Example 5 of
Section 9.1) who suspects that the mean annual mileage of his
cars is greater than the national mean of 12,000 miles. He selects
a random sample of n = 225 cars and calculates the sample
mean to be x̄ = 12,375 miles and the sample standard deviation
to be s = 2,415 miles. Determine the level of statistical
significance and P-value for this hypothesis test, and interpret
your findings.
Solution: To determine the significance and P-value we must
first calculate the standard score for the sample mean of x̄ =
12,375 miles. We are given the claimed population mean (μ =
12,000 miles) and the sample size (n = 225); we do not know
the population standard deviation, σ, but we will assume it is the
same as the sample standard deviation and set σ = 2,415 miles.
We find
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{12{,}375 - 12{,}000}{2{,}415/\sqrt{225}} = 2.33 \]
This standard score is greater than the critical value of z = 1.645
for significance at the 0.05 level and equal to the value of z =
2.33 for significance at the 0.01 level, giving us strong reason to
reject the null hypothesis and conclude that the rental car fleet
average really is greater than the national average.
We can find the P-value from Appendix A, which shows that
the area to the left of a standard score of z = 2.33 is 0.9901;
because this is a right-tailed test, the probability is the area to
the right (see Figure 9.4b in slide 19), which is 1 – 0.9901 =
0.0099. Therefore, the P-value is 0.0099, which is very close to
0.01.
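The rental car calculation can be reproduced in a few lines of Python. This is a sketch under the same assumption the example makes (σ is taken to equal s = 2,415 miles); the variable names are our own. Because the code does not round z before finding the area, the P-value may differ from the table-based figure in the last decimal place.

```python
import math
from statistics import NormalDist

# Example 2 values; sigma is assumed equal to the sample standard deviation
n, sample_mean, claimed_mean, sigma = 225, 12_375, 12_000, 2_415

z = (sample_mean - claimed_mean) / (sigma / math.sqrt(n))

# Right-tailed test: the P-value is the area to the RIGHT of z,
# so subtract the left-tail area from 1.
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2))        # 2.33
print(round(p_value, 4))  # 0.0099
```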
Two-Tailed Tests
The same basic ideas apply to two-tailed hypothesis tests in
which the alternative hypothesis has the “not equal to” form of
Ha: μ ≠ claimed value.
For two-tailed tests a value “as extreme as the one actually
found” can lie either on the left or on the right of the sampling
distribution (Figure 9.5 in the next slide). A probability of 0.05,
or 5%, therefore corresponds to standard scores either in
the first 2.5% of the sampling distribution on the left or in the
last 2.5% on the right.
From Appendix A, the 2.5th percentile corresponds to a standard
score of -1.96 and the 97.5th percentile corresponds to a standard
score of 1.96. These standard scores become the critical values
for two-tailed tests to be significant at the 0.05 level.
Figure 9.5 For two-tailed tests, the critical values for significance at the 0.05
level correspond to the 2.5th and 97.5th percentiles (as opposed to the 5th or
95th percentiles for one-tailed tests).
Two-Tailed Test (Ha: μ ≠ claimed value)
Statistical significance: A two-tailed test is significant at
the 0.05 level if the standard score of the sample mean is
at or below a critical value of -1.96 or at or above a critical
value of 1.96. For significance at the 0.01 level, the critical
values are -2.575 and 2.575.
P-values: To find the P-value for a sample mean in a two-tailed
test, first use the standard score of the sample
mean to find the P-value assuming the test is one-tailed;
then double this value to find the P-value for the two-tailed test.
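The two-tailed critical values come from splitting the significance level evenly between the two tails. A minimal sketch, with a hypothetical helper name, is shown below. Note that the computed 0.01-level value rounds to 2.576; the 2.575 quoted above is the figure commonly read from printed tables.

```python
from statistics import NormalDist

def two_tailed_critical_values(significance_level):
    """Return (lower, upper) critical z values with half the level in each tail."""
    upper = NormalDist().inv_cdf(1 - significance_level / 2)
    return -upper, upper

low, high = two_tailed_critical_values(0.05)
print(round(low, 2), round(high, 2))  # -1.96 1.96

low_01, high_01 = two_tailed_critical_values(0.01)
print(round(high_01, 3))  # 2.576
```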
An example:
Consider a drug company that seeks to be sure that its “500-milligram” aspirin tablets really contain 500 milligrams of
aspirin.
The null hypothesis says that the population mean of the aspirin
content is 500 milligrams:
H0: μ = 500 milligrams
The drug company is interested in the possibility that the mean
weight is either less than or greater than 500 milligrams.
Because the company is interested in possibilities on both sides
of the claimed mean of 500 milligrams, we have a two-tailed test
in which the alternative hypothesis is
Ha: μ ≠ 500 milligrams
Suppose the company selects a random sample of n = 100 tablets
and finds that they have a mean weight of x̄ = 501.5 milligrams;
further suppose that the population standard deviation is σ = 7.0
milligrams.
Then the standard score (z) for this sample mean is
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{501.5 - 500}{7/\sqrt{100}} = 2.14 \]
This standard score is above the critical value of 1.96 for a two-tailed test, so it is significant at the 0.05 level. (It is not
significant at the 0.01 level, because it is not above the critical
value of 2.575.)
This gives us good reason to reject the null hypothesis and
conclude that the mean weight of the “500-milligram” aspirin
tablets is different from 500 milligrams.
We find the P-value by using Appendix A, which shows that the
area to the left of a standard score of z = 2.14 is 0.9838.
Therefore, if this were a one-tailed test, the P-value would be the
area to the right, or 1 – 0.9838 = 0.0162. But because this is a
two-tailed test, we double this number to find that the P-value is
2 × 0.0162 = 0.0324.
In other words, if the null hypothesis is true, the probability of
drawing a sample as extreme as the one found is 0.0324.
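The doubling rule can be sketched in Python for the aspirin example. The function name two_tailed_p_value is our own. The code keeps the unrounded standard score (about 2.143), so its result is slightly smaller than the 0.0324 obtained from the table entry for z = 2.14.

```python
import math
from statistics import NormalDist

def two_tailed_p_value(sample_mean, claimed_mean, sigma, n):
    """Double the one-tailed area beyond |z| to get the two-tailed P-value."""
    z = (sample_mean - claimed_mean) / (sigma / math.sqrt(n))
    one_tailed = 1 - NormalDist().cdf(abs(z))
    return 2 * one_tailed

# Aspirin example: sample mean 501.5 mg, claimed mean 500 mg, sigma = 7.0 mg, n = 100
p_value = two_tailed_p_value(501.5, 500, 7.0, 100)
print(round(p_value, 3))  # 0.032
```

Taking the absolute value of z means the same function works whether the sample mean falls above or below the claimed mean.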
Common Errors in Hypothesis Testing
Two common types of error may affect the conclusions.
Recall the legal analogy at the end of Section 9.1, in which
the null hypothesis is H0: The defendant is innocent.
One type of error occurs if we conclude that the defendant is
guilty when, in reality, he or she is innocent.
The other type of error occurs if we find the defendant not
guilty when he or she actually is guilty. In this case, we have
wrongly failed to reject the null hypothesis.
Type I and Type II errors
• An error in which H0 is wrongly rejected is called a
type I error.
• An error in which we wrongly fail to reject H0 is
called a type II error.
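One way to get a feel for type I errors is to simulate many samples from a population in which the null hypothesis really is true and count how often a two-tailed test at the 0.05 level rejects it anyway. The sketch below is illustrative only: it assumes a normally distributed population, borrows μ = 500 and σ = 7.0 from the aspirin example, and uses a function name and a fixed random seed of our own choosing.

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(trials=4000, n=100, mu=500.0, sigma=7.0,
                        significance_level=0.05, seed=0):
    """Fraction of simulated samples that wrongly reject a TRUE null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    critical = NormalDist().inv_cdf(1 - significance_level / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mean = mu) holds
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / (sigma / math.sqrt(n))
        if abs(z) >= critical:
            rejections += 1  # H0 rejected although it is true: a type I error
    return rejections / trials

rate = type_one_error_rate()
print(rate)  # expected to be close to the 0.05 significance level
```

In this setup the long-run rate of type I errors matches the chosen significance level, which is why a stricter level such as 0.01 makes wrongly rejecting H0 less likely.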
Bias in Choosing Hypotheses
Consider a situation in which a factory is investigated for
releasing pollutants into a nearby stream at a level that may
(or may not) exceed the maximum level allowed by the
government.
The logical choice for the null hypothesis is that the actual
mean level of pollutants equals the maximum allowed
level:
H0: mean level of pollutants = maximum level allowed
However, this leaves us with two reasonable choices for the
alternative hypothesis, each of which introduces some bias
into the ultimate conclusions.
If factory officials conducted the test, they would be inclined
to claim that the mean level of pollutants is less than the
maximum allowed level.
Ha: mean level of pollutants < maximum level allowed
With this alternative hypothesis, the two possible outcomes
are
• Reject H0, in which case we conclude that the mean level of
pollutants is less than the allowed maximum. This outcome
would please the factory officials.
• Not reject H0, in which case the test is inconclusive.
Now suppose an environmental group conducts the test.
Because the group suspects that the factory is violating the
government standards, its choice for the alternative
hypothesis would be that the mean level of pollutants is
greater than the maximum allowed level, or
Ha: mean level of pollutants > maximum level allowed
With this choice of alternative hypothesis, rejection of H0
implies that the factory has violated the standards, while not
rejecting H0 is again inconclusive.
In other words, choosing the right-tailed (“greater than”) test
creates a situation in which the factory may be found in
violation but cannot be proved to be in compliance.