12.9 LEVEL OF SIGNIFICANCE AND HYPOTHESIS TESTING POWER
Level of Significance
Our approach to hypothesis testing thus far has been to compute the
strength of statistical evidence for rejecting the null hypothesis based on
the observed data, such as the observed $\bar{X}$ in the case of hypothesis testing
concerning the population mean. The value of this computed probability is
called the observed level of significance, or in some courses and statistical
settings the p-value. The observed level of significance is the probability,
assuming the null hypothesis is true, that a randomly obtained sample
statistic is farther away from the null hypothesis value than the observed
sample statistic is. We saw in the previous section that the exact form of this
computation depends on the exact form of the alternative hypothesis.
Another common approach in standard statistical practice does not use
an observed level of significance obtained from the observed data. This
approach specifies a level of significance prior to the collection of data, often
the gold standard of 0.05, but sometimes the "platinum" standard of 0.01,
and occasionally the "silver" standard of 0.1. It is easiest to illustrate this
new approach with an example. Consider (again) the body temperature
problem:
$$H_0: \mu = 98.6 \quad \text{versus} \quad H_1: \mu > 98.6$$
under an assumption of population normality and a sample size of $n = 36$.
Under our new approach we decide, prior to data collection, that we want
the probability of a type I error (the error of rejecting the null hypothesis
when it is true) to be 0.05. That is, we will reject the null hypothesis if the
observed $\bar{X}$ satisfies $\bar{X} > C$, where we determine $C$ by requiring that
$$p(\bar{X} > C) = 0.05$$
under the assumption that the null hypothesis is true. Thus, before collecting
any data, we have decided on a region of the line, namely $\bar{X} > C$; after
collecting the data, we will reject the null hypothesis in favor of the
alternative hypothesis if the observed $\bar{X}$ falls in this region. The choice
of $C$ is determined by our choice of 0.05 as the level of significance. The
interval of $\bar{X}$ values given by $\bar{X} > C$ is called the rejection region
or the critical region.
Let’s see how to compute $C$ and thus specify the critical region. To make
things simple, suppose we know from past experience with temperature data
that the theoretical standard deviation is $SD = 0.5$. Then $\bar{X}$ has standard
deviation $SD/\sqrt{n} = 0.5/\sqrt{36} = 0.5/6$, and the condition
$p(\bar{X} > C) = 0.05$ determines the boundary point $C$ of the critical region.
Standardizing, and assuming the null hypothesis to be true, we have
$$0.05 = p(\bar{X} > C) = p\left(z > \frac{C - 98.6}{0.5/6}\right)$$
But we know from Table E that $p(z > 1.645) = 0.05$. Hence we set
$$1.645 = \frac{C - 98.6}{0.5/6}$$
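(We have used the fact that $p(z > a) = p(z > b)$ implies that $a = b$.) Solving
for $C$ yields $C = 98.6 + 1.645 \times (0.5/6) \approx 98.737$, which we round to
$C = 98.74$. Thus our critical or rejection region is the set of all values of
$\bar{X}$ that satisfy $\bar{X} > 98.74$.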
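The computation of $C$ can be sketched in Python, with the standard library's `statistics.NormalDist` standing in for Table E. The helper name `critical_value` is illustrative and not from the text. The sketch keeps the example's assumptions of a normal population with known standard deviation.

```python
from statistics import NormalDist

def critical_value(mu0, sd, n, alpha):
    """Boundary C of the one-sided critical region X-bar > C for H1: mu > mu0.

    Assumes a normal population with known standard deviation sd.
    """
    se = sd / n ** 0.5                          # SD of X-bar; 0.5/6 in the example
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # p(z > z_alpha) = alpha; about 1.645 for 0.05
    return mu0 + z_alpha * se                   # solve z_alpha = (C - mu0) / se for C

C = critical_value(mu0=98.6, sd=0.5, n=36, alpha=0.05)
print(round(C, 2))  # 98.74
```

Changing `alpha` to 0.01 (the "platinum" standard) moves the boundary out to about 98.79, so stronger evidence is needed before the null hypothesis is rejected.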
When we actually do the random experiment, any value of $\bar{X}$ that
exceeds 98.74 provides us with statistically significant evidence (at level
0.05) that we should reject the null hypothesis in favor of the alternative
hypothesis that $\mu > 98.6$. For example, if we observe $\bar{X} = 98.8$, we reject
the null hypothesis. But if $\bar{X} = 98.7$, we do not have enough evidence to
reject the null hypothesis and in this sense accept the null hypothesis.
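One way to see what "level 0.05" means is to simulate the random experiment many times with the null hypothesis true and count how often $\bar{X}$ lands in the critical region. This Monte Carlo sketch is an illustration, not part of the text's method. It assumes a normal population with $\mu = 98.6$ and $SD = 0.5$ and uses the unrounded boundary $C \approx 98.737$. The function name `rejection_rate` and the seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

MU0, SD, N, ALPHA = 98.6, 0.5, 36, 0.05
C = MU0 + NormalDist().inv_cdf(1 - ALPHA) * SD / N ** 0.5   # unrounded, about 98.737

def rejection_rate(mu_true, trials=20_000, seed=1):
    """Estimate p(X-bar > C) by simulating `trials` experiments of size N.

    Assumes a normal population with mean mu_true and standard deviation SD.
    """
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_true, SD) for _ in range(N)]
        if fmean(sample) > C:          # the sample mean falls in the critical region
            rejections += 1
    return rejections / trials

print(rejection_rate(MU0))   # close to 0.05: the null hypothesis is true here
```

With the rounded boundary 98.74 the true rate would be slightly below 0.05 (about 0.047). Calling `rejection_rate(98.8)` instead estimates the probability of rejecting when $\mu = 98.8$, which is the idea behind the power of a test.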
As a practicing statistician, you should be prepared to take either
an observed level of significance approach, as we have stressed prior to
this section, or a predetermined level of significance approach,
as described in this section. Of course, these two choices exist for a null
hypothesis about any population parameter, and not just the population
mean.
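For this one-sided test the two approaches always agree: the observed level of significance is below 0.05 exactly when $\bar{X}$ exceeds $C$. The sketch below checks this for the two observed means used above. It assumes the same normal model with known $SD = 0.5$, and the helper name `p_value` is illustrative.

```python
from statistics import NormalDist

MU0, SD, N, ALPHA = 98.6, 0.5, 36, 0.05
SE = SD / N ** 0.5                                  # SD of X-bar under H0
C = MU0 + NormalDist().inv_cdf(1 - ALPHA) * SE      # critical value, about 98.737

def p_value(xbar):
    """Observed level of significance for H1: mu > MU0, i.e. p(X-bar > xbar) under H0."""
    return 1 - NormalDist(MU0, SE).cdf(xbar)

for xbar in (98.7, 98.8):
    decision = "reject H0" if xbar > C else "do not reject H0"
    print(xbar, round(p_value(xbar), 4), decision)
```

For $\bar{X} = 98.7$ the observed level of significance is about 0.115, which is above 0.05, and $\bar{X}$ lies outside the critical region. For $\bar{X} = 98.8$ it is about 0.008, and $\bar{X}$ lies inside. Both approaches reach the decisions given in the text.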
Power of a Test
Once we have used a specified level of significance to determine a critical
region, we can ask ourselves how powerful the test is, in the sense of how
probable it is to reject the null hypothesis for a particular specified value of
the parameter of interest for which the alternative hypothesis holds. Clearly,
for a test to be described as powerful, it must have a high probability of
rejecting the null hypothesis for values of the parameter of interest that the
person doing the statistical study considers to be far from the null hypothesis
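value. Note that such a high probability amounts to a low probability of a
type II error, which is defined as accepting a false null hypothesis. For any
particular alternative value of the parameter, the power equals 1 minus the
probability of a type II error at that value.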
Let’s again consider the above example, where the critical region was
found to be $\bar{X} > 98.74$. Suppose the medical researcher designing the
experiment wants to know if there will be a high probability of rejecting the null
hypothesis if in fact $\mu = 98.8$ is the true population mean, a particular
value that would mean the alternative hypothesis is true. Thus we need to
find p(reject the null hypothesis) under the assumption that $\mu = 98.8$. But
we reject the null hypothesis if and only if $\bar{X} > 98.74$. Hence,
$$p(\text{reject the null hypothesis}) = p(\bar{X} > 98.74)$$
To solve this we must standardize to produce a z statistic obeying the normal
distribution of Table E. Here the population standard deviation is $SD = 0.5$,
the sample size is 36, and, most importantly, $\mu = 98.8$. Thus, standardizing,
we obtain
$$p(\bar{X} > 98.74) = p\left(z > \frac{98.74 - 98.8}{0.5/6}\right) = p(z > -0.72) = 0.7642 \approx 0.76$$
Note that we have centered at the true 98.8° temperature and not at the
hypothesized 98.6° temperature. The probability of 0.76 tells us that if
the true $\mu$ is 98.8, we will arrive at the correct decision to reject the null
hypothesis about 76% of the time. In this precise numerical sense, the test has
power of about 0.76 against the particular alternative $\mu = 98.8$.
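The power computation above can be reproduced with `statistics.NormalDist`. The helper `power` is a hypothetical name, and the sketch keeps the text's assumptions of a normal population with known $SD = 0.5$ and $n = 36$. Passing the rounded boundary 98.74 reproduces the text's 0.7642. The loop uses the unrounded boundary to trace a few points of the power curve.

```python
from statistics import NormalDist

def power(c, mu_true, sd, n):
    """p(X-bar > c) when the true population mean is mu_true.

    Assumes a normal population with known standard deviation sd.
    """
    se = sd / n ** 0.5
    # Center at the true mean mu_true, not at the hypothesized 98.6.
    return 1 - NormalDist(mu_true, se).cdf(c)

print(round(power(98.74, 98.8, 0.5, 36), 4))   # 0.7642, matching the rounded C in the text

c_exact = 98.6 + NormalDist().inv_cdf(0.95) * 0.5 / 36 ** 0.5   # unrounded C
for mu in (98.7, 98.8, 98.9):
    print(mu, round(power(c_exact, mu, 0.5, 36), 3))
```

With the unrounded boundary $C \approx 98.737$, the power at $\mu = 98.8$ is slightly higher, about 0.77. Power grows as the true mean moves farther above 98.6. At $\mu = 98.6$ itself the same function returns 0.05, the level of significance.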
In science, government, and industry, one of the most common problems
is that the sample size is limited because of time or money constraints and is
too small to yield the desired power to be able to reject the null hypothesis
for values the person doing the study really cares about. For example, if the
current medical treatment produces a cure 60% of the time, then we would
want a hypothesis testing study of a new treatment to tell us, with very high
probability, that the new treatment is better if its true cure rate is 70% (an
alternative hypothesis value) rather than the null hypothesis rate of 60%.
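The following rough sketch shows how sample size limits power in the cure-rate example. It assumes a one-sided test of $H_0: p = 0.60$ versus $H_1: p > 0.60$ at level 0.05 and uses the normal approximation to the distribution of $\hat{p}$. These modeling choices and the helper name `proportion_test_power` are assumptions made for illustration and are not part of the text.

```python
from statistics import NormalDist

def proportion_test_power(p0, p1, n, alpha=0.05):
    """Approximate power of the one-sided test H0: p = p0 vs H1: p > p0 at true rate p1.

    Uses the normal approximation to the sampling distribution of p-hat.
    """
    se0 = (p0 * (1 - p0) / n) ** 0.5                 # SD of p-hat if H0 is true
    se1 = (p1 * (1 - p1) / n) ** 0.5                 # SD of p-hat if the true rate is p1
    c = p0 + NormalDist().inv_cdf(1 - alpha) * se0   # reject H0 when p-hat > c
    return 1 - NormalDist(p1, se1).cdf(c)            # p(p-hat > c) when the true rate is p1

for n in (25, 50, 100, 200, 400):
    print(n, round(proportion_test_power(0.60, 0.70, n), 2))
```

Under these assumptions, the power at a 70% cure rate is only about one in four with 25 patients and roughly two-thirds with 100. It takes a few hundred patients to push it above 0.95.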
SECTION 12.9 EXERCISES
1. Find the critical region for the test of hypothesis in Exercise 1 of Section 12.8 for a level of
significance of 0.05.
2. Find the critical region for the test of hypothesis in Exercise 2 of Section 12.8 for a level of
significance of 0.05.
3. Find the power for the test of hypothesis in
Exercise 1 above for $\mu = 24$.
4. Find the power for the test of hypothesis in
Exercise 2 above for $\mu = 201$.
5. A candidate commissions a poll, wishing to
reject the null hypothesis that the proportion
of the population who support him is $p = 1/2$.
The poll will have n ⳱ 400 people in it.
a. What is the alternative hypothesis?
b. Find $C$ so that $p(\hat{p} \ge C) = .05$ when the
null hypothesis is true.
c. If $\hat{p} = .47$, do you accept or reject the null
hypothesis?
d. If $\hat{p} = .53$, do you accept or reject the null
hypothesis?
e. If $\hat{p} = .61$, do you accept or reject the null
hypothesis? (Use level of significance =
.01.)
6. A sample of $n = 100$ boxes of cereal was obtained to test whether the population mean
weight of the contents is 16 ounces. The alternative is that the population mean weight
is larger than 16 ounces. (The manufacturer
does not want to put more cereal in the boxes
than necessary.)
The test is to reject the null hypothesis if
the sample mean $\bar{X}$ exceeds $c = 16.2$. Assume
the population standard deviation is 0.5.
a. What is $p(\bar{X} \ge c)$ when the null hypothesis
is true?
b. Suppose the population mean is $\mu = 16.1$.
What is $p(\bar{X} \ge c)$?
c. Suppose the population mean is $\mu = 16.2$.
What is $p(\bar{X} \ge c)$?
d. Suppose the population mean is $\mu = 16.3$.
What is $p(\bar{X} \ge c)$?
e. If the population mean is $\mu = 16.1$, is there
a high probability one would reject the null
hypothesis? What if the population mean
is $\mu = 16.3$?