5.2 Tests of Significance
Example 5.7. Diet colas use artificial sweeteners to avoid sugar. Colas with artificial sweeteners gradually lose their sweetness over time. Manufacturers therefore test new colas for loss of sweetness before marketing them. Trained tasters sip the cola along with drinks of standard sweetness and score the cola on a “sweetness score” of 1 to 10. The cola is then stored for a month at high temperature to imitate the effect of four months’ storage at room temperature. After a month, each taster scores the stored cola. This is a matched pairs experiment. Our data are the differences (score before storage minus score after storage) in the tasters’ scores. The bigger these differences, the bigger the loss of sweetness. Here are the sweetness losses for a new cola, as measured by 10 trained tasters:
2.0   0.4   0.7   2.0   −0.4   2.2   −1.3   1.2   1.1   2.3
Most are positive. That is, most tasters found a loss of sweetness. But the losses are small, and two tasters (the negative scores) thought the cola gained sweetness. Are these data good evidence that the cola lost sweetness in storage?
The Reasoning of a Significance Test
Note. The average sweetness loss for our cola is given by the sample mean,
$$\bar{x} = \frac{2.0 + 0.4 + \cdots + 2.3}{10} = 1.02.$$
That’s not a large loss. Ten different tasters would almost surely give a different result. Maybe it’s just chance that produced this result. A test of significance asks: “Does the sample result x̄ = 1.02 reflect a real loss of sweetness?” or “Could we easily get the outcome x̄ = 1.02 just by chance?”
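As a quick check of the arithmetic, the sample mean can be recomputed in a few lines of Python (our choice of language for these sketches; the notes themselves contain no code). The data are the ten sweetness losses listed in Example 5.7.

```python
# Sweetness losses (score before storage minus score after) for the 10 tasters
losses = [2.0, 0.4, 0.7, 2.0, -0.4, 2.2, -1.3, 1.2, 1.1, 2.3]

x_bar = sum(losses) / len(losses)           # sample mean
n_positive = sum(1 for d in losses if d > 0)  # tasters who found a loss

print(round(x_bar, 2), n_positive)  # 1.02 8
```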
Note. Next, state the null hypothesis. The null hypothesis says that there is no effect or no change in the population. If the null hypothesis is true, the sample result is just chance at work. Here, the null hypothesis says that the cola does not lose sweetness (no change). We can write that in terms of the mean sweetness loss µ in the population as H0 : µ = 0. We write H0, read “H-nought,” to indicate the null hypothesis. The effect we suspect is true, the alternative to “no effect” or “no change,” is described by the alternative hypothesis. We suspect that the cola does lose sweetness. In terms of the mean sweetness loss µ, the alternative hypothesis is Ha : µ > 0.
Note. The reasoning of a significance test goes like this.
• Suppose for the sake of argument that the null hypothesis is true, that on the average there is no loss of sweetness.
• Is the sample outcome x̄ = 1.02 surprisingly large under that supposition? If it is, that’s evidence against H0 and in favor of Ha.
To answer the question, we use our knowledge of how the sample mean x̄ would vary in repeated samples if H0 really were true. That’s the sampling distribution of x̄ once again.
Note. From long experience we also know that the standard deviation for all individual tasters is σ = 1. (It is not realistic to suppose that we know the population standard deviation σ. We will eliminate this assumption in the next chapter.) The sampling distribution of x̄ from 10 tasters is then normal with mean µ = 0 and standard deviation σ/√n = 1/√10 = .316. We can judge whether any observed x̄ is surprising by locating it on this distribution. Figure 5.8 (and TM-86) shows the sampling distribution with the observed values of x̄ for two types of cola.
• One cola had x̄ = .3 for a sample of 10 tasters. It is clear from Figure 5.8 (TM-86) that an average x̄ this large could easily occur just by chance when the population mean is µ = 0. That 10 tasters find x̄ = .3 is not evidence of a sweetness loss.
• The taste test for our cola produced x̄ = 1.02. That’s way out on the normal curve in Figure 5.8 (TM-86), so far out that an observed value this large would almost never occur just by chance if the true µ were 0. This observed value is good evidence that in fact the true µ is greater than 0, that is, that the cola lost sweetness. The manufacturer must reformulate the cola and try again.
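A minimal sketch, assuming σ = 1 is known as stated above, of how far each observed mean lies from µ = 0, measured in standard deviations of the sampling distribution of x̄ (the variable names are ours):

```python
from math import sqrt

sigma = 1.0           # SD for individual tasters, assumed known as in the notes
n = 10                # number of tasters
se = sigma / sqrt(n)  # SD of the sampling distribution of x-bar, about .316

# Distance of each observed mean from mu = 0, in units of se
z_scores = {x_bar: (x_bar - 0) / se for x_bar in (0.3, 1.02)}
print({x: round(z, 2) for x, z in z_scores.items()})  # {0.3: 0.95, 1.02: 3.23}
```

An x̄ less than one standard deviation above 0 is unremarkable; one more than three standard deviations out is the "way out on the normal curve" case described above.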
Note. Look again at Figure 5.8 (TM-86). If the alternative hypothesis is true, there is a sweetness loss and we expect the mean loss x̄ found by the tasters to be positive. The farther out x̄ is in the positive direction, the more convinced we are that the population mean µ is not zero but positive. We measure the strength of the evidence against H0 by the probability under the normal curve in Figure 5.8 (TM-86) to the right of the observed x̄. This probability is called the P-value. It is the probability of a result at least as far out as the result we actually got. The lower this probability, the more surprising our result, and the stronger the evidence against the null hypothesis.
Note. Notice:
• For one new cola, our 10 tasters gave x̄ = .3. Figure 5.9 (and TM-87) shows the P-value for this outcome. It is the probability to the right of 0.3, which is about 0.17, or 17%.
• Our cola showed a larger sweetness loss, x̄ = 1.02. The probability of a result this large or larger is only 0.0006.
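Both P-values can be reproduced with Python's standard-library `statistics.NormalDist` in place of a printed table. This is a sketch: because nothing is rounded along the way, the digits can differ slightly from table-based values. The helper name `p_value_upper` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

se = 1 / sqrt(10)                       # sigma / sqrt(n) with sigma = 1, n = 10
null_dist = NormalDist(mu=0, sigma=se)  # sampling distribution of x-bar if H0 is true

def p_value_upper(x_bar: float) -> float:
    """Area to the right of the observed mean under the H0 sampling distribution."""
    return 1 - null_dist.cdf(x_bar)

print(round(p_value_upper(0.3), 2))   # about 0.17
print(round(p_value_upper(1.02), 4))  # about 0.0006
```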
Note. Small P-values are evidence against H0, because they say that the observed result is unlikely to occur just by chance. Large P-values fail to give evidence against H0. A P-value of 0.05 is a common rule of thumb: a result with a small P-value, say less than 0.05, is called statistically significant. That’s just a way of saying that chance alone would rarely produce so extreme a result.
Outline of a Test
Note. Here is the reasoning of a significance test in outline form:
1. Describe the effect you are searching for in terms of a population parameter like the mean µ.
2. The null hypothesis is the statement that this effect is not present in the population.
3. From the data, calculate a statistic like x̄ that estimates the parameter.
4. The P-value says how unlikely a result at least as extreme as the one we observed would be if the null hypothesis were true. Results with small P-values would rarely occur if the null hypothesis were true. We call such results statistically significant.
More Detail: Stating Hypotheses
Definition. The statement being tested in a test of significance is
called the null hypothesis. The test of significance is designed to assess
the strength of the evidence against the null hypothesis. Usually the
null hypothesis is a statement of “no effect” or “no difference.”
Note. The first step in a test of significance is to state a claim that we
will try to find evidence against. The alternative hypothesis Ha is the
claim about the population that we are trying to find evidence for.
Note. In Example 5.7, we were seeking evidence of a loss in sweetness. The null hypothesis says “no loss” on the average in a large population of tasters. The alternative hypothesis says “there is a loss.” So the hypotheses are H0 : µ = 0 and Ha : µ > 0. This alternative hypothesis is one-sided because we are interested only in deviations from the null hypothesis in one direction.
Definition. If no direction of difference is mentioned in a problem, and the null hypothesis is H0 : µ = 0, then the alternative hypothesis is two-sided: Ha : µ ≠ 0.
More Detail: P-Values and Statistical Significance
Note. A significance test uses data in the form of a test statistic. The test statistic is usually based on a statistic that estimates the parameter that appears in the hypotheses.
Definition. The probability, computed assuming that H0 is true, that the test statistic would take a value as extreme as or more extreme than that actually observed is called the P-value of the test. The smaller the P-value is, the stronger is the evidence against H0 provided by the data.
Example 5.9. In Example 5.7 the observations are an SRS of size n = 10 from a normal population with σ = 1. The observed mean sweetness loss for one cola was x̄ = .3. The P-value for testing H0 : µ = 0 against Ha : µ > 0 is therefore P(x̄ ≥ .3), calculated assuming that H0 is true. When H0 is true, x̄ has the normal distribution with mean 0 and standard deviation σ/√n = 1/√10 = .316. Find the P-value by a normal probability calculation. Start by drawing a picture that shows the P-value as an area under a normal curve. Figure 5.10 (and TM-88) is the picture for this example. Then standardize x̄ to get a standard normal Z and use Table A (see TM-139, TM-140):
$$P(\bar{x} \ge .3) = P\left(\frac{\bar{x} - 0}{.316} \ge \frac{.3 - 0}{.316}\right) = P(Z \ge .95) = 1 - .8289 = .1711$$
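To mimic the hand calculation, including rounding z to two decimals before the table lookup, here is a small sketch that uses `statistics.NormalDist().cdf` as a stand-in for Table A:

```python
from statistics import NormalDist

se = 0.316                    # sigma / sqrt(n) = 1 / sqrt(10), rounded as in the notes
z = round((0.3 - 0) / se, 2)  # standardize x-bar and round to two decimals, like Table A
p = 1 - NormalDist().cdf(z)   # area to the right of z under the standard normal curve

print(z, round(p, 4))  # 0.95 0.1711
```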
Note. We can compare the P-value with a fixed value that we regard
as decisive. This amounts to announcing in advance how much evidence
against H0 we will insist on. The decisive value of P is called the
significance level. We write it as α, the Greek letter alpha. If we
choose α = .05, we are requiring that the data give evidence against
H0 so strong that it would happen no more than 5% of the time when
H0 is true.
Definition. If the P-value is as small as or smaller than α, we say that the data are statistically significant at level α.
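Comparing a P-value with a fixed α is a one-line rule. The sketch below uses a hypothetical helper `significant` and the two P-values from the cola example:

```python
def significant(p_value: float, alpha: float = 0.05) -> bool:
    """True when the data are statistically significant at level alpha (P <= alpha)."""
    return p_value <= alpha

print(significant(0.1711))  # False: x-bar = .3 is not significant at the 5% level
print(significant(0.0006))  # True: x-bar = 1.02 is significant at the 5% level
```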
Tests for a Population Mean
Note. We have an SRS of size n drawn from a normal population with unknown mean µ. We want to test the hypothesis that µ has a specified value. Call the specified value µ0. The null hypothesis is H0 : µ = µ0. The test is based on the sample mean x̄. Because normal calculations require standardized variables, we will use as our test statistic the standardized sample mean
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
This z test statistic has the standard normal distribution when H0 is true. If the alternative is one-sided on the high side, Ha : µ > µ0, then the P-value is the probability that a standard normal variable Z takes a value at least as large as the observed z. That is, P = P(Z ≥ z).
Example 5.10. Suppose that the z test statistic for a two-sided test
is z = 1.7. The two-sided P-value is the probability that Z ≤ −1.7 or
Z ≥ 1.7. Figure 5.11 (and TM-89) shows this probability as areas under
the standard normal curve. Because the standard normal distribution
is symmetric, we can calculate this probability by finding P (Z ≥ 1.7)
and doubling it:
P (Z ≤ −1.7 or Z ≥ 1.7) = 2P (Z ≥ 1.7) = 2(1 − .9554) = .0892.
We would make exactly the same calculation if we observed z = −1.7.
It is the absolute value |z| that matters, not whether z is positive or
negative.
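The doubling step relies only on the symmetry of the standard normal curve. A sketch with a hypothetical `two_sided_p` helper, using `NormalDist` instead of Table A (so the last digit can differ from the table-based .0892):

```python
from statistics import NormalDist

def two_sided_p(z: float) -> float:
    """P(|Z| >= |z|) for standard normal Z, computed as 2 * P(Z >= |z|)."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.7), 4))   # 0.0891
print(round(two_sided_p(-1.7), 4))  # same value: only |z| matters
```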
Definition. To test the hypothesis H0 : µ = µ0 based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the z test statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
In terms of a variable Z having the standard normal distribution, the P-value for a test of H0 against
Ha : µ > µ0 is P(Z ≥ z)
Ha : µ < µ0 is P(Z ≤ z)
Ha : µ ≠ µ0 is 2P(Z ≥ |z|).
These P-values are exact if the population distribution is normal and are approximately correct for large n in other cases.
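The definition can be collected into one function. This is a sketch, not a library API: the name `z_test` and the labels `"greater"`, `"less"` and `"two-sided"` for the three alternatives are our own, and σ is assumed known as throughout this chapter.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar: float, mu0: float, sigma: float, n: int,
           alternative: str) -> tuple[float, float]:
    """One-sample z test with known sigma; returns (z, P-value).

    alternative: "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0)
    or "two-sided" (Ha: mu != mu0).
    """
    z = (x_bar - mu0) / (sigma / sqrt(n))  # standardized sample mean
    Z = NormalDist()                       # standard normal distribution
    if alternative == "greater":
        p = 1 - Z.cdf(z)                   # P(Z >= z)
    elif alternative == "less":
        p = Z.cdf(z)                       # P(Z <= z)
    elif alternative == "two-sided":
        p = 2 * (1 - Z.cdf(abs(z)))        # 2 P(Z >= |z|)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p

print(z_test(0.3, 0, 1, 10, "greater"))  # Example 5.9: z near .95, P near .17
```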
Example 5.11. The National Center for Health Statistics reports
that the mean systolic blood pressure for males 35 to 44 years of age is
128 and the standard deviation in this population is 15. The medical
director of a large company looks at the medical records of 72 executives
in this age group and finds that the mean systolic blood pressure in this
sample is x = 126.07. Is this evidence that the company’s executives
have a different mean blood pressure from the general population? As
usual in this chapter, we make the unrealistic assumption that we know
the population standard deviation. Assume that executives have the
same σ = 15 as the general population of middle-aged males.
Step 1: Hypotheses. The null hypothesis is “no difference” from the national mean µ0 = 128. The alternative is two-sided, because the medical director did not have a particular direction in mind before examining the data. So H0 : µ = 128 and Ha : µ ≠ 128.
Step 2: Test Statistic. The z test statistic is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{126.07 - 128}{15/\sqrt{72}} = -1.09.$$
Step 3: P-Value. You should still draw a picture to help find the P-value, but now you can sketch the standard normal curve with the observed value of z. Figure 5.12 (and TM-90) shows that the P-value is the probability that a standard normal variable Z takes a value at least 1.09 away from zero. From Table A (and TM-139, TM-140) we find that this probability is
$$P = 2P(Z \ge 1.09) = 2(1 - .8621) = .2758.$$
Conclusion: More than 27% of the time, an SRS of size 72 from the
general male population would have a mean blood pressure at least as
far from 128 as that of the executive sample. The observed x = 126.07
is therefore not good evidence that executives differ from other men.
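The three steps of Example 5.11 can be sketched in code. This version uses the unrounded z, so the P-value lands near .275 rather than exactly on the table-based .2758.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 128, 15, 72, 126.07  # values from Example 5.11

z = (x_bar - mu0) / (sigma / sqrt(n))   # Step 2: test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))  # Step 3: two-sided P-value
print(round(z, 2), round(p, 3))         # z about -1.09, P about 0.27 to 0.28
```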
Tests with Fixed Significance Level
Example 5.13. In Example 5.12, we examined whether the mean NAEP quantitative score of young men is less than 275. The hypotheses are H0 : µ = 275 and Ha : µ < 275. The z statistic takes the value z = −1.45. Is the evidence against H0 statistically significant at the 5% level? To determine significance, we need only compare the observed z = −1.45 with the 5% critical value z* = 1.645 from Table C (and TM-142). Because the alternative is one-sided on the low side, the comparison is with −z* = −1.645. Because z = −1.45 is not farther from 0 than −1.645, it is not significant at level α = .05.
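Instead of reading z* from Table C, it can be computed as the upper α point of the standard normal distribution with `NormalDist().inv_cdf`. A sketch for Example 5.13:

```python
from statistics import NormalDist

alpha = 0.05
z_star = NormalDist().inv_cdf(1 - alpha)  # upper alpha critical value
z = -1.45                                 # observed statistic from Example 5.13

# Ha: mu < 275 is one-sided on the low side, so reject H0 only if z <= -z_star
reject = z <= -z_star
print(round(z_star, 3), reject)  # 1.645 False
```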
Definition. To test the hypothesis H0 : µ = µ0 based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the z test statistic
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}.$$
Reject H0 at significance level α against a one-sided alternative
Ha : µ > µ0 if z ≥ z*
Ha : µ < µ0 if z ≤ −z*
where z* is the upper α critical value from Table C (and TM-142).
Reject H0 at significance level α against a two-sided alternative
Ha : µ ≠ µ0 if |z| ≥ z*
where z* is the upper α/2 critical value from Table C (TM-142).
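The rejection rules can be written as a single decision function. The name `reject_h0` and the alternative labels are hypothetical, and the critical values come from `NormalDist().inv_cdf` rather than Table C:

```python
from statistics import NormalDist

def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """Fixed-level z test decision for "greater", "less" or "two-sided" alternatives."""
    Z = NormalDist()  # standard normal distribution
    if alternative == "greater":
        return z >= Z.inv_cdf(1 - alpha)           # z >= z*, z* = upper alpha value
    if alternative == "less":
        return z <= -Z.inv_cdf(1 - alpha)          # z <= -z*
    if alternative == "two-sided":
        return abs(z) >= Z.inv_cdf(1 - alpha / 2)  # |z| >= z*, z* = upper alpha/2 value
    raise ValueError(f"unknown alternative: {alternative!r}")
```

For example, `reject_h0(-1.45, 0.05, "less")` returns `False`, matching Example 5.13. Note that z = 1.7 is significant at α = .05 against a one-sided alternative but not against a two-sided one.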
Example 5.14. The analytical laboratory of Example 5.4 is asked to evaluate the claim that the concentration of the active ingredient in a specimen is 0.86%. The lab makes 3 repeated analyses of the specimen. The mean result is x̄ = .8404. The true concentration is the mean µ of the population of all analyses of the specimen. The standard deviation of the analysis process is known to be σ = .0068. Is there significant evidence at the 1% level that µ ≠ .86?
Step 1: Hypotheses. The hypotheses are H0 : µ = .86 and Ha : µ ≠ .86.
Step 2: Test Statistic. The z statistic is
$$z = \frac{.8404 - .86}{.0068/\sqrt{3}} = -4.99.$$
Step 3: Significance. Because the alternative is two-sided, we compare |z| = 4.99 with the α/2 = .005 critical value from Table C (and TM-142). This critical value is z* = 2.576. Figure 5.15 (and TM-93) illustrates the values of z that are statistically significant. Because |z| > 2.576, we reject the null hypothesis and conclude (at the 1% significance level) that the concentration is not as claimed.
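Example 5.14 as a short sketch, with σ treated as known and the critical value computed rather than looked up:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 0.86, 0.0068, 3, 0.8404  # values from Example 5.14
alpha = 0.01

z = (x_bar - mu0) / (sigma / sqrt(n))         # test statistic
z_star = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 = .005 critical value
print(round(z, 2), round(z_star, 3), abs(z) >= z_star)  # -4.99 2.576 True
```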
Note. The P-value is the smallest level α at which the data are significant. Knowing the P-value allows us to assess significance at any level.
Tests from Confidence Intervals
Note. A level α two-sided significance test rejects a hypothesis H0 : µ = µ0 exactly when the value µ0 falls outside a level 1 − α confidence interval for µ.
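A sketch of the same decision for Example 5.14 reached through a confidence interval, assuming the usual known-σ interval x̄ ± z*σ/√n with z* the upper α/2 critical value:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 0.86, 0.0068, 3, 0.8404  # values from Example 5.14
alpha = 0.01

z_star = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a 99% interval
margin = z_star * sigma / sqrt(n)
low, high = x_bar - margin, x_bar + margin    # level 1 - alpha confidence interval

print(round(low, 4), round(high, 4))  # about 0.8303 0.8505
print(not (low <= mu0 <= high))       # True: .86 is outside, so reject H0 at alpha = .01
```

The interval and the two-sided test agree here, as the note says they must: µ0 = .86 lies outside the 99% interval, and |z| = 4.99 exceeds z* = 2.576.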