Download Section 9-1 Fundamentals of Hypothesis Testing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
Transcript
9.1 Fundamentals of Hypothesis
Testing
LEARNING GOAL
Understand the goal of hypothesis testing and the basic
structure of a hypothesis test, including how to set up the
null and alternative hypotheses, how to determine the
possible outcomes of a hypothesis test, and how to decide
between these possible outcomes.
Copyright © 2009 Pearson Education, Inc.
A company called ProCare Industries, Ltd. once claimed
that its product, called Gender Choice, could increase a
woman’s chance of giving birth to a baby girl.
How could we test whether the Gender Choice claim is true?
One way would be to study a random sample of, say, 100
babies born to women who used the Gender Choice product.
If the product does not work, we would expect about half of
these babies to be girls. If it does work, we would expect
significantly more than half of the babies to be girls.
The key question, then, is what constitutes “significantly
more.”
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 2
In statistics, we answer such questions through hypothesis
testing. Consider a sample in which there are 64 girls among
100 babies born to women who used the product.
• We begin by assuming that Gender Choice does not work;
that is, it does not increase the percentage of girls. If this is
true, then we should expect about 50% girls among the
population of all births to women using the product. That is,
the proportion of girls is equal to 0.50.
• We now use our sample (with 64 girls among 100 births) to
test the above assumption. We conduct this test by
calculating the likelihood of drawing a random sample of
100 births in which 64% (or more) of the babies are girls
(from a population in which the overall proportion of girls
is only 50%).
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 3
• If we find that a random sample of births is fairly likely to
have 64% (or more) girls, then we do not have evidence that
Gender Choice works.
However, if we find that a random sample of births is
unlikely to have at least 64% girls, then we conclude that
the result in the Gender Choice sample is probably due to
something other than chance—meaning that the product
may be effective.
Note, however, that even in this case we will not have
proved that the product is effective, as there could still be
other explanations for the result (such as that we happened
to choose an unusual sample or that there were confounding
variables we did not account for).
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 4
TIME OUT TO THINK
Suppose Gender Choice is effective and you select a new
random sample of 1,000 babies born to women who used
the product. Based on the first sample (with 64 girls
among 100 births), how many baby girls would you
expect in the new sample? Next, suppose Gender Choice
is not effective; how many baby girls would you expect in
a random sample of 1,000 babies born to women who
used the product?
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 5
Formulating the Hypotheses
Definitions
A hypothesis is a claim about a population parameter,
such as a population proportion (p) or population mean
(m).
A hypothesis test is a standard procedure for testing a
claim about a population parameter.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 6
There are always at least two hypotheses in any
hypothesis test.
In the Gender Choice case, the two hypotheses
are essentially:
(1) that Gender Choice works in raising the
population proportion of girls from the 50%
that we would normally expect and
(2) that Gender Choice does not work in raising
the population proportion of girls.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 7
• The null hypothesis is the claim that Gender Choice
does not work, in which case the population proportion
of girls born to women who use the product should be
50%, or 0.50.
Using p to represent the population proportion, we
write this null hypothesis as
H0 (null hypothesis): p = 0.50
• The alternative hypothesis is the claim that Gender
Choice does work, in which case the population
proportion of girls born to women who use the product
should be greater than 0.50.
We write this alternative hypothesis as
Ha (alternative hypothesis): p > 0.50
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 8
In this chapter, the null hypothesis will always include the
condition of equality, just as the null hypothesis for the
Gender Choice example is the equality p = 0.50.
However, the alternative hypothesis for the Gender Choice
case (p > 0.50) is only one of three general forms of
alternative hypotheses that we will encounter in this
chapter:
population parameter < claimed value
population parameter > claimed value
population parameter ≠ claimed value
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 9
It is useful to give names to the three different types of
alternative hypotheses.
The first form (“less than”) leads to what is called a lefttailed hypothesis test, because it requires testing whether
the population parameter lies to the left (lower values) of
the claimed value.
Similarly, the second form (“greater than”) leads to a righttailed hypothesis test, because it requires testing whether
the population parameter lies to the right (higher values) of
the claimed value.
The third form (“not equal to”) leads to a two-tailed
hypothesis test, because it requires testing whether the
population parameter lies significantly far to either side of
the claimed value.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 10
Null and Alternative Hypotheses
The null hypothesis, or H0, is the starting assumption
for a hypothesis test. For the types of hypothesis tests
in this chapter, the null hypothesis always claims a
specific value for a population parameter and therefore
takes the form of an equality:
H0 (null hypothesis):
population parameter = claimed value
The alternative hypothesis, or Ha, is a claim that the
population parameter has a value that differs from the
value claimed in the null hypothesis. It may take one of
the following forms:
(left-tailed) Ha: population parameter < claimed value
(right-tailed) Ha: population parameter > claimed value
(two-tailed) Ha: population parameter ≠ claimed value
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 11
EXAMPLE 1 Identifying Hypotheses
In each case, identify the population parameter about which a
claim is made, state the null and alternative hypotheses for a
hypothesis test, and indicate whether the hypothesis test will be
left-tailed, right-tailed, or two-tailed.
a. The manufacturer of a new model of hybrid car advertises
that the mean fuel consumption is equal to 62 miles per
gallon on the highway. A consumer group claims that the
mean is less than 62 miles per gallon.
Solution:
a. The population parameter about which a claim is made is a
population mean (m)—the mean gas mileage of all new
hybrid cars of this model. The null hypothesis must state
that this population mean is equal to some specific value.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 12
EXAMPLE 1 Identifying Hypotheses
Solution:
a. (cont.) We therefore recognize the null hypothesis as the
advertised claim that the mean mileage for the cars is 62
miles per gallon.
The alternative hypothesis is the consumer group’s claim
that the true gas mileage is less than advertised.
To summarize:
H0 (null hypothesis): m = 62 miles per gallon
Ha (alternative hypothesis): m < 62 miles per gallon
Because the alternative hypothesis has a “less than” form,
the hypothesis test will be left-tailed.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 13
EXAMPLE 1 Identifying Hypotheses
In each case, identify the population parameter about which a
claim is made, state the null and alternative hypotheses for a
hypothesis test, and indicate whether the hypothesis test will be
left-tailed, right-tailed, or two-tailed.
b. The Ohio Department of Health claims that the average stay
in Ohio hospitals after childbirth is greater than the national
mean of 2.0 days.
Solution:
b. The population is all women in Ohio who have recently
given birth, and the population parameter is their mean (m)
stay after childbirth in Ohio hospitals. The null hypothesis
must be an equality, so in this case it is the claim that the
mean hospital stay equals the national average of 2.0 days.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 14
EXAMPLE 1 Identifying Hypotheses
Solution:
b. (cont.) The alternative hypothesis is the Health Department’s
claim that the mean stay in Ohio is greater than this national
average.
To summarize:
H0 (null hypothesis): m = 2.0 days
Ha (alternative hypothesis): m > 2.0 days
Because the alternative hypothesis has a “greater than” form,
the hypothesis test will be right-tailed.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 15
EXAMPLE 1 Identifying Hypotheses
In each case, identify the population parameter about which a
claim is made, state the null and alternative hypotheses for a
hypothesis test, and indicate whether the hypothesis test will be
left-tailed, right-tailed, or two-tailed.
c. A wildlife biologist working in the African savanna claims
that the actual proportion of female zebras in the region is
different from the accepted proportion of 50%.
Solution:
c. In this case, the claim is about a population proportion
(p)— the proportion of female zebras among the zebra
population of the region. The accepted population
proportion is p = 0.5, which becomes the null hypothesis.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 16
EXAMPLE 1 Identifying Hypotheses
Solution:
c. (cont.) The wildlife biologist claims that the actual population
proportion is different from the value in the null hypothesis.
Because “different” can be either greater or less than, the
alternative hypothesis has a “not equal to” form:
H0 (null hypothesis): p = 0.5
Ha (alternative hypothesis): p ≠ 0.5
The “not equal to” form of this alternative hypothesis leads to
a two-tailed test.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 17
Possible Outcomes of a Hypothesis Test
Two Possible Outcomes of a Hypothesis Test
There are two possible outcomes to a hypothesis test:
1. Reject the null hypothesis, H0, in which case we have
evidence in support of the alternative hypothesis.
2. Not reject the null hypothesis, H0, in which case we do
not have enough evidence to support the alternative
hypothesis.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 18
Notice that “accepting the null hypothesis” is not a
possible outcome, because the null hypothesis is always
the starting assumption.
The hypothesis test may not give us reason to reject this
starting assumption, but it cannot by itself give us reason
to conclude that the starting assumption is true.
The fact that there are only two possible outcomes makes
it extremely important that the null and alternative
hypotheses be chosen in an unbiased way.
In particular, both hypotheses must always be formulated
before a sample is drawn from the population for testing.
Otherwise, data from the sample might inappropriately
bias the selection of the hypotheses to test.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 19
TIME OUT TO THINK
The idea that a hypothesis test cannot lead us to accept
the null hypothesis is an example of the old dictum that
“absence of evidence is not evidence of absence.” As an
illustration of this idea, explain why it would be easy in
principle to prove that some legendary animal (such as
Bigfoot or the Loch Ness Monster) exists, but it is nearly
impossible to prove that it does not exist.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 20
EXAMPLE 2 Hypothesis Test Outcomes
For each of the three cases from Example 1 (slides 12-17),
describe the possible outcomes of a hypothesis test and how we
would interpret these outcomes.
Solution:
a. Recall that the null hypothesis is the advertised claim that the
mean mileage for the new cars is m = 62 miles per gallon. The
alternative hypothesis is the consumer group’s claim that the
true gas mileage is less than advertised, or m < 62 miles per
gallon.
The possible outcomes are
• Reject the null hypothesis of m = 62 miles per gallon, in
which case we have evidence in support of the consumer
group’s claim that the mileage is less than advertised.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 21
EXAMPLE 2 Hypothesis Test Outcomes
Solution: (cont.)
• Do not reject the null hypothesis, in which case we lack
evidence to support the consumer group’s claim.
Note, however, that this option does not imply that the
advertised claim is true.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 22
EXAMPLE 2 Hypothesis Test Outcomes
For each of the three cases from Example 1 (slides 12-17),
describe the possible outcomes of a hypothesis test and how we
would interpret these outcomes.
Solution: (cont.)
b. The null hypothesis is that the mean hospital stay is the
national average of 2.0 days. The alternative hypothesis is the
Health Department’s claim that the mean stay in Ohio is
greater than the national average.
The possible outcomes are
• Reject the null hypothesis of m = 2.0 days, in which case we
have evidence in support of the Health Department’s
claim that the mean stay in Ohio is greater than the
national average.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 23
EXAMPLE 2 Hypothesis Test Outcomes
Solution: (cont.)
• Do not reject the null hypothesis, in which case we lack
evidence to support the Health Department’s claim.
Note, however, that this option does not imply that the
Ohio average stay is actually equal to the national average
stay of 2.0 days.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 24
EXAMPLE 2 Hypothesis Test Outcomes
For each of the three cases from Example 1 (slides 12-17),
describe the possible outcomes of a hypothesis test and how we
would interpret these outcomes.
Solution: (cont.)
c. The null hypothesis is that the proportion of female zebras is
the accepted population proportion of 50% (p = 0.5). The
alternative hypothesis is the biologist’s claim that the accepted
value is wrong, meaning that the actual proportion of female
zebras is not 50% (it may be either more or less than 50%).
The possible outcomes are
• Reject the null hypothesis of p = 0.5, in which case we have
evidence in support of the biologist’s claim that the
accepted value is wrong.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 25
EXAMPLE 2 Hypothesis Test Outcomes
Solution: (cont.)
• Do not reject the null hypothesis, in which case we lack
evidence to support the biologist’s claim.
Note, however, that this does not imply that the accepted
value is correct.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 26
Drawing a Conclusion from a Hypothesis
Test
We will look at two ways to make the decision about rejecting
or not rejecting the null hypothesis.
Statistical Significance
If the probability of a particular result is 0.05 or less, we say
that the result is statistically significant at the 0.05 level; if the
probability is 0.01 or less, the result is statistically significant
at the 0.01 level.
The 0.01 level therefore means stronger significance than the
0.05 level.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 27
Hypothesis Test Decisions Based on Levels of
Statistical Significance
We decide the outcome of a hypothesis test by comparing
the actual sample result (mean or proportion) to the result
expected if the null hypothesis is true. We must choose a
significance level for the decision.
•
If the chance of a sample result at least as extreme as
the observed result is less than 1 in 100 (or 0.01), then
the test is statistically significant at the 0.01 level and
offers strong evidence for rejecting the null hypothesis.
•
If the chance of a sample result at least as extreme as
the observed result is less than 1 in 20 (or 0.05), then
the test is statistically significant at the 0.05 level and
offers moderate evidence for rejecting the null
hypothesis.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 28
Hypothesis Test Decisions Based on Levels of
Statistical Significance (cont.)
•
If the chance of a sample result at least as extreme as
the observed result is greater than the chosen level of
significance (0.05 or 0.01), then we do not reject the
null hypothesis.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 29
EXAMPLE 3 Statistical Significance in
Hypothesis Testing
Consider the Gender Choice example in which the sample size is
n = 100 and the sample proportion is pp̂ = 0.64. Using techniques
that we will discuss later in this chapter, it is possible to
calculate the probability of randomly choosing such a sample (or
a more extreme sample with pp̂ > 0.64) under the assumption that
the null hypothesis (p = 0.50) is true; the result is that the
probability is 0.0026. Based on this result, should you reject or
not reject the null hypothesis?
Solution: The probability of 0.0026 means that if the null
hypothesis is true (meaning that the true population proportion
is 50%), the probability of drawing a random sample with a
sample proportion of at least pp̂ = 0.64 is just slightly under 3 in
1,000.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 30
EXAMPLE 3 Statistical Significance in
Hypothesis Testing
Solution: (cont.)
Because this probability is less than 1 in 100, this result is
statistically significant at the 0.01 level. It therefore gives us good
reason to reject the null hypothesis, which means it provides
support for the alternative hypothesis that Gender Choice raises
the proportion of girls to more than 50%.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 31
Drawing a Conclusion from a Hypothesis
Test
Statistical Significance
P-Values
P-value is short for probability value; notice the capital P,
used to avoid confusion with the lowercase p that stands for a
population proportion.
We will discuss the actual calculation of P-values in Sections
9.2 and 9.3; here, we focus only on their interpretation.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 32
Hypothesis Test Decisions Based on P-Values
The P-value (probability value) for a hypothesis test of a
claim about a population parameter is the probability of
selecting a sample at least as extreme as the observed
sample, assuming that the null hypothesis is true:
• A small P-value (such as less than or equal to 0.05)
indicates that the sample result is unlikely, and
therefore provides reason to reject the null hypothesis.
• A large P-value (such as greater than 0.05) indicates
that the sample result could easily occur by chance, so
we cannot reject the null hypothesis
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 33
EXAMPLE 4 Fair Coin?
You suspect that a coin may have a bias toward landing tails
more often than heads, and decide to test this suspicion by
tossing the coin 100 times. The result is that you get 40 heads
(and 60 tails). A calculation (not shown here) indicates that the
probability of getting 40 or fewer heads in 100 tosses with a fair
coin is 0.0228. Find the P-value and level of statistical
significance for your result. Should you conclude that the coin is
biased against heads?
Solution: The null hypothesis is that the coin is fair, in which
case the proportion of heads should be about 50% (H0: p =
0.50). The alternative hypothesis is your suspicion that the coin
is biased against heads, in which case the proportion of heads
would be less than 50% (Ha: p < 0.50).
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 34
EXAMPLE 4 Fair Coin?
Solution: (cont.)
Your 100 coin tosses represent a random sample of size n = 100,
and the result of 40 heads is the sample proportion (p
p̂ = 0.40) for
the hypothesis test. The P-value for the test is the probability of
getting a sample proportion at least as extreme as the one you
found (p
p̂ ≤ 0.40), assuming that the coin is fair and the population
proportion of heads is 0.5.
The given probability for this occurrence is 0.0228, which is the
P-value for the test. Because this P-value is smaller than 0.05, the
result is statistically significant at the 0.05 level. Because it is not
smaller than 0.01, the result is not significant at the 0.01 level.
Statistical significance at the 0.05 level gives you moderate reason
to reject the null hypothesis and conclude that the coin is biased
against heads.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 35
Putting It All Together
The Hypothesis Test Process
Step 1. Formulate the null and alternative hypotheses,
each of which must make a claim about a
population parameter, such as a population mean
(μ) or a population proportion (p); be sure this is
done before drawing a sample or collecting data.
Based on the form of the alternative hypothesis,
decide whether you will need a left-, right-, or twotailed hypothesis test.
Step 2. Draw a sample from the population and measure
the sample statistics, including the sample size (n)
and the relevant sample statistic, such as the
p̂
x̄ or sample proportion (p).
sample mean (x)
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 36
The Hypothesis Test Process (cont.)
Step 3. Determine the likelihood of observing a sample statistic
(mean or proportion) at least as extreme as the one you
found under the assumption that the null hypothesis is
true. The precise probability of such an observation is the
P-value (probability value) for your sample result.
Step 4. Decide whether to reject or not reject the null
hypothesis, based on your chosen level of significance
(usually 0.05 or 0.01, but other significance levels are
sometimes used).
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 37
Again, be sure to avoid confusion among the three different
uses of the letter p:
• A lowercase p represents a population proportion—that is,
the true proportion among a complete population.
• A lowercase pp̂ (“p-hat”) represents a sample proportion—that
is, the proportion found in a sample drawn from a population.
• An uppercase P stands for probability in a P-value.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 38
EXAMPLE 5 Mean Rental Car Mileage
In the United States, the average car is driven about 12,000
miles each year. The owner of a large rental car company
suspects that for his fleet, the mean distance is greater than
12,000 miles each year. He selects a random sample of n = 225
cars from his fleet and finds that the mean annual mileage for
this sample is xx̄ = 12,375 miles.
A calculation shows that if you assume the fleet mean is the
national mean of 12,000 miles, then the probability of selecting a
225-car sample with a mean annual mileage of at least 12,375
miles is 0.01. Based on these data, describe the process of
conducting a hypothesis test and drawing a conclusion.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 39
EXAMPLE 5 Mean Rental Car Mileage
Solution: We follow the four steps listed in slides 36 and 37:
Step 1. The population parameter of interest is a population
mean (μ)—the mean annual mileage for the population of
all cars in the rental car fleet. The null hypothesis is that
the population mean is the national average of 12,000
annual miles per car. The alternative hypothesis is the
owner’s claim that the population mean for his fleet is
greater than this national average. That is
H0: m = 12,000 miles
Ha: m > 12,000 miles
Because the alternative hypothesis has a “greater than”
form, the hypothesis test will be right-tailed.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 40
EXAMPLE 5 Mean Rental Car Mileage
Solution: (cont.)
Step 2. This step asks that we select the sample and measure the
sample statistics; here, we are given the sample size of n
= 225 and the sample mean of xx̄ = 12,375 miles.
Step 3. This step asks us to determine the likelihood that, under
the assumption that the fleet average is really 12,000
miles (the null hypothesis), we would by chance select a
sample with a mean of at least 12,375 miles.
Here, we don’t need to do the calculation, because we
are told that the probability is 0.01. (The calculation
method is given in Section 9.2.) This probability is the
P-value; it tells us that if the null hypothesis were true,
there would be only a 0.01 probability of selecting a
sample as extreme as the one observed.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 41
EXAMPLE 5 Mean Rental Car Mileage
Solution: (cont.)
Step 4. The P-value of 0.01 tells us that the result is significant
at the 0.01 level.
We therefore reject the null hypothesis and conclude that
the test provides strong support for the alternative
hypothesis, implying that the mean annual mileage for
the rental car fleet is greater than the national average of
12,000 miles.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 42
A Legal Analogy of Hypothesis Testing
In American courts of law, the fundamental principle is that a
defendant is presumed innocent until proven guilty. Because
the starting assumption is innocence, this represents the null
hypothesis:
H0: The defendant is innocent.
Ha: The defendant is guilty.
The job of the prosecutor is to present evidence so compelling
that the jury is persuaded to reject the null hypothesis and find
the defendant guilty. If the prosecutor does not make a
sufficiently compelling case, then the jury will not reject H0
and the defendant will be found “not guilty.”
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 43
TIME OUT TO THINK
Consider two situations. In one, you are a juror in a case
in which the defendant could be fined a maximum of
$2,000. In the other, you are a juror in a case in which the
defendant could receive the death penalty. Compare the
significance levels you would use in the two situations. In
each situation, what are the consequences of wrongly
rejecting the null hypothesis?.
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 44
The End
Copyright © 2009 Pearson Education, Inc.
Slide 9.1- 45