Hypothesis Testing
10.2 Tests for Proportions
• Y = 17, then throw away the torque converter.
Let p denote the proportion of defectives produced by the machine. Before the installation of
the torque converter p was 0.10. Then we installed the torque converter. Did p change? Did it
go up or down? We use statistics to decide. Our method is to observe data and construct a 95%
confidence interval for p,
p̂ ± z_{α/2} √( p̂(1 − p̂) / n ).    (10.2.1)
If the confidence interval is
• [0.01, 0.05], then we are 95% confident that 0.01 ≤ p ≤ 0.05, so there is evidence that
the torque converter is helping.
• [0.15, 0.19], then we are 95% confident that 0.15 ≤ p ≤ 0.19, so there is evidence that
the torque converter is hurting.
• [0.07, 0.11], then there is not enough evidence to conclude that the torque converter is
doing anything at all, positive or negative.
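Interval (10.2.1) is straightforward to compute directly. Below is a minimal Python sketch (the text's own code is R; the function name and the sample counts here are illustrative assumptions, not from the text):

```python
from math import sqrt
from statistics import NormalDist

def prop_ci(x, n, conf=0.95):
    # Large-sample confidence interval for a proportion, as in (10.2.1):
    # phat +/- z_{alpha/2} * sqrt(phat * (1 - phat) / n)
    phat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half

# Hypothetical data: 9 defectives observed in 100 parts
lo, hi = prop_ci(9, 100)
print(round(lo, 4), round(hi, 4))  # 0.0339 0.1461
```

Since this (hypothetical) interval covers p = 0.10, such data would give no evidence that the torque converter changed anything, matching the third bullet above.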
10.2.1 Terminology
The null hypothesis H0 is a “nothing” hypothesis, whose interpretation could be that nothing has changed, there is no difference, there is nothing special taking place, etc. In Example 10.1 the null hypothesis would be H0 : p = 0.10. The alternative hypothesis H1 is the hypothesis that something has changed; in this case, H1 : p ≠ 0.10. Our goal is to statistically test the hypothesis H0 : p = 0.10 versus the alternative H1 : p ≠ 0.10. Our procedure will be:
1. Go out and collect some data, in particular, a simple random sample of observations from the machine.
2. Suppose that H0 is true and construct a 100(1 − α)% confidence interval for p.
3. If the confidence interval does not cover p = 0.10, then we reject H0 . Otherwise, we fail to reject H0 .
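The three steps above can be sketched in a few lines of Python (a hedged illustration; the function name and the sample counts are hypothetical, not from the text):

```python
from math import sqrt
from statistics import NormalDist

def ci_test(x, n, p0, alpha=0.05):
    # Step 2: assume H0 and build a 100(1 - alpha)% confidence interval for p
    phat = x / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(phat * (1 - phat) / n)
    lo, hi = phat - half, phat + half
    # Step 3: reject H0: p = p0 iff the interval does not cover p0
    return "reject H0" if not (lo <= p0 <= hi) else "fail to reject H0"

# Step 1 (hypothetical sample): 20 defectives in 100 parts, testing H0: p = 0.10
print(ci_test(20, 100, 0.10))  # reject H0
```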
Remark 10.2. Every time we make a decision it is possible to be wrong, and there are two
possible mistakes that we could make. We have committed a
Type I Error if we reject H0 when in fact H0 is true. This would be akin to convicting an
innocent person for a crime (s)he did not commit.
Type II Error if we fail to reject H0 when in fact H1 is true. This is analogous to a guilty
person escaping conviction.
Type I Errors are usually considered worse², and we design our statistical procedures to control the probability of making such a mistake. We define the

significance level of the test = ℙ(Type I Error) = α.    (10.2.2)

We want α to be small, which conventionally means, say, α = 0.05, α = 0.01, or α = 0.005 (but could mean anything, in principle).

² There is no mathematical difference between the errors, however. The bottom line is that we choose one type of error to control with an iron fist, and we try to minimize the probability of making the other type. That being said, null hypotheses are often designed to correspond to the “simpler” model, so it is often easier to analyze (and thereby control) the probabilities associated with Type I Errors.
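That α really is the probability of a Type I Error can be checked by simulation. The sketch below (a hypothetical setup, in Python rather than the text's R) repeatedly samples from a population where H0 is true and records how often the level-0.05 interval test rejects:

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)
p0, n, alpha = 0.10, 1000, 0.05   # H0: p = 0.10 is true in this simulation
z = NormalDist().inv_cdf(1 - alpha / 2)

reps, rejections = 1000, 0
for _ in range(reps):
    x = sum(random.random() < p0 for _ in range(n))  # sample with p = p0
    phat = x / n
    half = z * sqrt(phat * (1 - phat) / n)
    if not (phat - half <= p0 <= phat + half):
        rejections += 1  # H0 was true, so this is a Type I Error

print(rejections / reps)  # should be close to alpha = 0.05
```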
• The rejection region (also known as the critical region) for the test is the set of sample
values which would result in the rejection of H0 . For Example 10.1, the rejection region
would be all possible samples that result in a 95% confidence interval that does not cover
p = 0.10.
• The above example with H1 : p , 0.10 is called a two-sided test. Many times we are
interested in a one-sided test, which would look like H1 : p < 0.10 or H1 : p > 0.10.
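The distinction matters for where the critical region sits: two tails of area α/2 each, or one tail of area α. As a short Python sketch (mirroring the text's qnorm calls):

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist().inv_cdf

# Two-sided H1: p != 0.10 -- reject when |Z| > z_{alpha/2}
two_sided = z(1 - alpha / 2)   # about 1.96

# One-sided H1: p < 0.10 -- reject when Z < -z_{alpha}
lower = -z(1 - alpha)          # about -1.64

# One-sided H1: p > 0.10 -- reject when Z > z_{alpha}
upper = z(1 - alpha)           # about 1.64

print(round(two_sided, 2), round(lower, 2), round(upper, 2))
```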
We are ready for tests of hypotheses for one proportion.
Table here.
Don’t forget the assumptions.
Example 10.3. Find
1. The null and alternative hypotheses
2. Check your assumptions.
3. Define a critical region with an α = 0.05 significance level.
4. Calculate the value of the test statistic and state your conclusion.
Example 10.4. Suppose p = the proportion of students who are admitted to the graduate
school of the University of California at Berkeley, and suppose that a public relations officer
boasts that UCB has historically had a 40% acceptance rate for its graduate school. Consider
the data stored in the table UCBAdmissions from 1973. Assuming these observations constituted a simple random sample, are they consistent with the officer’s claim, or do they provide
evidence that the acceptance rate was significantly less than 40%? Use an α = 0.01 significance
level.
Our null hypothesis in this problem is H0 : p = 0.4 and the alternative hypothesis is
H1 : p < 0.4. We reject the null hypothesis if p̂ is too small, that is, if
(p̂ − 0.4) / √( 0.4(1 − 0.4)/n ) < −z_α ,    (10.2.3)
where α = 0.01 and −z0.01 is
> -qnorm(0.99)
[1] -2.326348
Our only remaining task is to find the value of the test statistic and see where it falls relative
to the critical value. We can find the number of people admitted and not admitted to the UCB
graduate school with the following.
> A <- as.data.frame(UCBAdmissions)
> head(A)
     Admit Gender Dept Freq
1 Admitted   Male    A  512
2 Rejected   Male    A  313
3 Admitted Female    A   89
4 Rejected Female    A   19
5 Admitted   Male    B  353
6 Rejected   Male    B  207
> xtabs(Freq ~ Admit, data = A)
Admit
Admitted Rejected
    1755     2771
Now we calculate the value of the test statistic.
> phat <- 1755/(1755 + 2771)
> (phat - 0.4)/sqrt(0.4 * 0.6/(1755 + 2771))
[1] -1.680919
Our test statistic is not less than −2.33, so it does not fall into the critical region. Therefore, we fail to reject the null hypothesis that the true proportion of students admitted to graduate school is 40%, and we say that the observed data are consistent with the officer’s claim at the α = 0.01 significance level.
Example 10.5. We are going to do Example 10.4 all over again. Everything will be exactly
the same except for one change. Suppose we choose significance level α = 0.05 instead of
α = 0.01. Are the 1973 data consistent with the officer’s claim?
Our null and alternative hypotheses are the same. Our observed test statistic is the same: it
was approximately −1.68. But notice that our critical value has changed: α = 0.05 and −z0.05
is
> -qnorm(0.95)
[1] -1.644854
Our test statistic is less than −1.64 so it now falls into the critical region! We now reject
the null hypothesis and conclude that the 1973 data provide evidence that the true proportion of
students admitted to the graduate school of UCB in 1973 was significantly less than 40%. The
data are not consistent with the officer’s claim at the α = 0.05 significance level.
What is going on, here? If we choose α = 0.05 then we reject the null hypothesis, but
if we choose α = 0.01 then we fail to reject the null hypothesis. Our final conclusion seems
to depend on our selection of the significance level. This is bad; for a particular test, we
never know whether our conclusion would have been different if we had chosen a different
significance level.
Or do we?
Clearly, for some significance levels we reject, and for some significance levels we do not. Where is the boundary? That is, what is the significance level at which we would reject for any larger significance level and fail to reject for any smaller one? This boundary value has a special name: it is called the p-value of the test.
Definition 10.6. The p-value, or observed significance level, of a hypothesis test is the probability, when the null hypothesis is true, of obtaining the observed value of the test statistic (such as p̂) or values more extreme – meaning, in the direction of the alternative hypothesis³.

³ Bickel and Doksum [7] state the definition particularly well: the p-value is “the smallest level of significance α at which an experimenter using [the test statistic] T would reject [H0 ] on the basis of the observed [sample] outcome x”.
Example 10.7. Calculate the p-value for the test in Examples 10.4 and 10.5.
The p-value for this test is the probability of obtaining a z-score equal to our observed test
statistic (which had z-score ≈ −1.680919) or more extreme, which in this example is less than
the observed test statistic. In other words, we want to know the area under a standard normal
curve on the interval (−∞, −1.680919]. We can get this easily with
> pnorm(-1.680919)
[1] 0.04638932
We see that the p-value is strictly between the significance levels α = 0.01 and α = 0.05.
This makes sense: it has to be bigger than α = 0.01 (otherwise we would have rejected H0 in
Example 10.4) and it must also be smaller than α = 0.05 (otherwise we would not have rejected
H0 in Example 10.5). Indeed, p-values are a characteristic indicator of whether or not we would
have rejected at assorted significance levels, and for this reason a statistician will often skip the
calculation of critical regions and critical values entirely. If (s)he knows the p-value, then (s)he
knows immediately whether or not (s)he would have rejected at any given significance level.
Thus, another way to phrase our significance test procedure is: we will reject H0 at the
α-level of significance if the p-value is less than α.
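This rephrased procedure is one comparison in code. Here is a Python sketch using the UCB numbers from Examples 10.4 and 10.5 (the text's pnorm becomes NormalDist().cdf here):

```python
from math import sqrt
from statistics import NormalDist

# UCB data from the text: 1755 admitted, 2771 rejected
admitted, rejected = 1755, 2771
n = admitted + rejected
phat = admitted / n
z = (phat - 0.4) / sqrt(0.4 * 0.6 / n)  # about -1.6809, as in the text

p_value = NormalDist().cdf(z)           # lower tail, since H1: p < 0.4
print(round(p_value, 4))                # about 0.0464

for alpha in (0.01, 0.05):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(alpha, decision)
```

As expected, one p-value settles both examples at once: we fail to reject at α = 0.01 and reject at α = 0.05.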
Remark 10.8. If we have two populations with proportions p1 and p2 then we can test the null
hypothesis H0 : p1 = p2 .
Table Here.
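The two-sample test of Remark 10.8 can likewise be sketched. The standard large-sample statistic pools the two samples to estimate the common proportion under H0 : p1 = p2 (the counts below are hypothetical, chosen only to illustrate):

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x1, n1, x2, n2):
    # Large-sample z statistic for H0: p1 = p2, using the pooled estimate
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided H1: p1 != p2
    return z, p_value

# Hypothetical: 40/200 defectives before vs 25/250 after a process change
z, p = two_prop_z(40, 200, 25, 250)
print(round(z, 3), round(p, 4))
```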