4 Hypothesis testing
In sections 2 and 3 we considered the problem of estimating a single parameter of interest,
θ.
In this section we consider the related problem of testing whether or not θ equals a
particular value of interest, or lies in a particular range of values of interest.
Estimation and hypothesis testing can be thought of as two related (dual) aspects of the
inference problem, as we shall see later.
4.1 Types of hypothesis and types of error
Suppose X1 , X2 , . . . , Xn are an independent random sample from a probability density
function fX (x|θ).
Instead of estimating θ, we now wish to use the sample to test hypotheses about θ.
Definition 4.1.1: Simple and composite hypotheses
We define a hypothesis to be an assertion or conjecture about θ. If the hypothesis completely specifies the distribution of X, it is called a simple hypothesis. Otherwise it is
called a composite hypothesis.
Example 4.1.1
Suppose we take an independent random sample X1 , X2 , . . . , Xn from a random variable
X ∼ N(µ, σ 2 ).
Consider the following hypotheses. Which are simple and which are composite?
(i) H1 : µ = 100, σ = 15;
(ii) H2 : µ > 100, σ = 15;
(iii) H3 : µ > 100, σ = µ/10;
(iv) H4 : µ = 100;
(v) H5 : σ = 15;
(vi) H6 : µ < 100.
Solution:
Comparing two hypotheses
Usually in hypothesis testing we compare two hypotheses. The first, called the null hypothesis, is
H0 : θ ∈ ω
and the second, the alternative hypothesis, is
H1 : θ ∈ ω̄,
where ω ⊂ S, ω ∪ ω̄ = S, ω ∩ ω̄ = ∅ and S is the set of all possible values for the parameter θ of the distribution of the random variable X.
Example 4.1.2
We are interested in whether a new method of sealing light bulbs increases the average
lifetime of the bulbs.
Here, if θ is the mean lifetime of the bulbs sealed by the new method, and we know the
mean lifetime of standard bulbs is 140 hours, our hypothesis test will be a test of
H0 : θ = 140
versus
H1 : θ > 140.
Now suppose we assume that the lifetime X of a new bulb follows an Exponential distribution, i.e. X ∼ Exp(1/θ).
Which of H0 and H1 is simple and which is composite?
What are the sets S, ω and ω̄ which define this hypothesis test?
Solution:
Definition 4.1.2: Acceptance region and rejection region
Let A be the sample space of X, i.e. the set of all possible values of a random sample of
size n from X. A test procedure divides A into subsets A0 and A1 (with A0 ∪ A1 = A,
A0 ∩ A1 = ∅) such that if
X ∈ A0 , we accept H0
and if
X ∈ A1 , we reject H0 and accept H1 .
A0 is called the acceptance region and A1 the rejection region of the test.
Definition 4.1.3: Type I error and type II error
When performing a test we may make the correct decision, or one of two possible errors:
(i) Type I error: reject H0 when it is true;
(ii) Type II error: accept H0 when it is false.
The type I error is usually regarded as the more serious mistake. The probabilities of
making type I and type II errors are usually denoted by α(θ) and β(θ) respectively.
Example 4.1.3
Now returning to the lightbulbs sealed by the new method in Example 4.1.2, suppose that
once again we wish to test:
H0 : θ = 140
versus
H1 : θ > 140,
and we collect some data consisting of ten measurements of lifetimes x1 , . . . , x10 .
Suppose we choose to accept H0 if the sample mean x̄ satisfies x̄ < 150, and to reject H0
(and hence accept H1 ) if x̄ ≥ 150.
What are the sample space, the acceptance region and the rejection region for this test?
What are the Type I and Type II errors in this specific case?
Solution:
In Sections 4.2 to 4.6 we will develop the ideas of hypothesis testing by studying the most important cases.
4.2 Inference for a single Normal sample
For this section we will assume that X1 , X2 , . . . , Xn is an i.i.d. random sample from a
N(µ, σ 2 ) distribution.
For the time being, we assume σ 2 is known, i.e. a constant.
Moreover, a particular value µ = µ0 for the population mean has been suggested by
previous work or ideas. In this case the null hypothesis is denoted by
H0 : µ = µ 0 .
There are a variety of options for the alternative hypothesis. Commonly used alternative
hypotheses are:
(A) H1 : µ = µ1 > µ0 (µ1 fixed constant);
(B) H1 : µ = µ1 < µ0 (µ1 fixed constant);
(C) H1 : µ > µ0 ;
(D) H1 : µ < µ0 ;
(E) H1 : µ ≠ µ0 .
Example 4.2.1
Suppose the marks for a particular test are believed to follow a N(µ, 100) distribution,
and the null hypothesis is H0 : µ = 50.
In which category (A) - (E) are each of the following alternative hypotheses:
1. H1 : µ < 50;
2. H1 : µ = 57;
3. H1 : µ ≠ 50?
Solution:
Alternative (E) is the most commonly used, and the easiest to justify in most real–life
situations. All the others require knowledge which it is usually unrealistic to assume we have.
The null and alternative hypotheses are treated in the following way: we adopt the null
hypothesis unless there is evidence against it.
The test statistic we choose to use for a single Normal sample is X̄, the sample mean.
It makes sense to test a hypothesis about the population mean µ using the sample mean
X̄, but more than this, we know the distribution of X̄ under the null hypothesis, which
is crucial.
If H0 is true, X1 , . . . , Xn are i.i.d. N(µ0 , σ 2 ) random variables, and so
X̄ ∼ N(µ0 , σ²/n)   ⇒   Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1).
We now need to decide for which values of the test statistic we will reject H0 . These
values will comprise the rejection region A1 .
We reject H0 in cases
(A) or (C): if Z is sufficiently far into the right-hand tail;
(B) or (D): if Z is sufficiently far into the left-hand tail;
(E): if Z is sufficiently far into either tail.
In case (E) the rejection region is split between the tails of the distribution, giving a two-tailed test. The other cases are one-tailed tests.
If P (Type I error)=α, the test is said to have significance level α. Commonly used significance levels are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Once the significance level is
chosen, the rejection region is precisely determined.
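To illustrate, here is a minimal R sketch of this one-sample z-test. The data vector and the values of µ0, σ and the significance level are hypothetical placeholders, not taken from any example in these notes.

# One-sample z-test (variance known): a minimal sketch with hypothetical values
x     <- c(103, 98, 110, 105, 95, 107, 101, 99)   # hypothetical sample
mu0   <- 100                                      # hypothesised mean under H0
sigma <- 15                                       # known standard deviation
n     <- length(x)
z     <- (mean(x) - mu0) / (sigma / sqrt(n))
# two-sided test (case (E)) at the 5% level: reject H0 if |z| > 1.96
abs(z) > qnorm(0.975)
# corresponding two-sided p-value
2 * pnorm(-abs(z))

For the one-tailed alternatives (A)–(D) the comparison would instead use qnorm(0.95) and the appropriate single tail.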
Example 4.2.2
For α = 0.05, calculate the rejection regions (in terms of z) for each category of alternative
hypothesis (A) - (E).
Solution:
Example 4.2.3
The widths (mm) of 64 beetles chosen from a particular locality were measured and the
sample mean was found to be
x̄ = 24.8.
Previous extensive measurements of beetles of the same species had shown the widths to
be Normally distributed with mean 23 mm and variance 16 mm².
Test at the 5% level whether or not the beetles from the chosen locality have a different
mean width from the main population, assuming that they have the same variance.
Solution:
4.2 cont. A single Normal sample with unknown variance σ 2
Now we consider hypothesis tests about µ where X1 , X2 , . . . , Xn is an i.i.d. random sample
from a N(µ, σ 2 ) distribution, and σ 2 is unknown.
This is usually more realistic than assuming we know σ 2 , but it is also a more complex
problem. We have to estimate µ in the presence of the nuisance parameter σ 2 .
The solution is to replace σ 2 with a suitable estimate; here we use the sample variance
S 2.
Example 4.2.4
Cola makers test new recipes for loss of sweetness during storage. For one particular
recipe, ten trained tasters rate the sweetness before and after, enabling us to calculate
the change (sweetness after storage minus sweetness before storage), as follows:
Before   8.0   7.6   8.1   8.2   6.8   7.9   8.0   9.2   7.1   7.0
After    6.0   7.2   7.4   6.2   7.2   5.7   9.3   7.9   6.0   4.7
Change  -2.0  -0.4  -0.7  -2.0   0.4  -2.2   1.3  -1.3  -1.1  -2.3
Is there evidence that in general, the storage causes the cola to lose sweetness?
Solution:
Solution/cont.
When we knew σ², we used the test statistic
Z = (X̄ − µ0)/(σ/√n),
which we know has a N(0, 1) distribution.
Now we are estimating σ² using S², so our test statistic becomes
T = (X̄ − µ0)/(S/√n),
and this has a slightly different distribution, called the Student t distribution, or just the
t distribution . . .
Definition 4.2.1: The Student t distribution
If Z ∼ N(0, 1) and U ∼ χ²n are independent random variables then
Tn = Z / √(U/n)
has a Student t-distribution on n degrees of freedom. The distribution is denoted by tn.
Example 4.2.5
Sketch the t-distribution with
(a) 1;
(b) 5;
(c) 100
degrees of freedom.
Solution:
[Three density plots, each showing the pdf against t over the range −4 to 4.]
Figure 2: the t1 , t5 and t100 distributions
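The plotting commands used to produce Figure 2 are not shown in the notes; the following R sketch is one way to reproduce plots like it.

# Density curves of the t distribution with 1, 5 and 100 degrees of freedom (cf. Figure 2)
par(mfrow = c(1, 3))                     # three panels side by side
for (df in c(1, 5, 100)) {
  curve(dt(x, df = df), from = -4, to = 4,
        xlab = paste0("t", df), ylab = "pdf",
        main = paste("t distribution,", df, "df"))
}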
The t-distribution with n degrees of freedom has a p.d.f. which is symmetric and bell-shaped, like the Normal, but with somewhat thicker tails.
Smaller values of n give thicker tails; larger values of n make the tn distribution more like the Normal distribution.
All we have to be able to do is to use statistical tables or R to look up the appropriate
tail probability, since the distribution of our test statistic is given by:
Tn−1 = (X̄ − µ0)/(S/√n) ∼ tn−1 .
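In R the whole calculation is available through t.test(); a minimal sketch using the sweetness changes from Example 4.2.4 (the one-sided alternative matches the test set out in Example 4.2.6):

# One-sample t-test on the sweetness changes (Examples 4.2.4 and 4.2.6)
change <- c(-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.3, -1.1, -2.3)
t.test(change, mu = 0, alternative = "less")       # H0: mu = 0 vs H1: mu < 0
# the same statistic by hand:
tstat <- (mean(change) - 0) / (sd(change) / sqrt(length(change)))
pt(tstat, df = length(change) - 1)                 # one-sided p-value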
Example 4.2.6
For the cola example in 4.2.4 the test statistic was t = −2.697, and the sample size was
n = 10.
Carry out the test of
H0 : µ = 0 (no loss in sweetness);
against
H1 : µ < 0 (some loss in sweetness).
Solution:
4.3 Hypothesis test for two Normal means: two–sample t–test
Now suppose we have two samples (x1 , x2 , . . . , xn1 ) and (xn1 +1 , xn1 +2 , . . . , xn1 +n2 ), i.e.
samples of sizes n1 and n2 from two different populations. We are interested in whether
the two population means are equal.
Assuming that the data are sampled from Normally distributed populations with equal
variance, σ 2 , in each population, then if we want to test
H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2
where µ1 and µ2 are the means of each population, we can perform a t-test with test
statistic given by . . .
t = (x̄1 − x̄2) / ( s √(1/n1 + 1/n2) ) ,   where   s = √( [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) ),
and where x̄1 , x̄2 , s1 and s2 are the sample means and standard deviations from each population.
Here s = √(s²) is the pooled estimate of the common standard deviation σ.
If the null hypothesis is true, then the test statistic comes from a t–distribution on
n1 + n2 − 2 degrees of freedom, so we use the tables for tn1 +n2 −2 to carry out the test.
This test is called the two–sample t–test.
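A minimal R sketch of the two-sample t-test follows; the two data vectors are hypothetical, and t.test() with var.equal = TRUE carries out exactly this pooled-variance test.

# Pooled two-sample t-test: a minimal sketch with hypothetical samples
x <- c(3250, 3600, 3410, 3380, 3500, 3290)
y <- c(2750, 2900, 2810, 2640, 2880)
t.test(x, y, var.equal = TRUE)                 # H0: mu1 = mu2 vs H1: mu1 != mu2
# the pooled standard deviation and test statistic by hand:
n1 <- length(x); n2 <- length(y)
s  <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
(mean(x) - mean(y)) / (s * sqrt(1/n1 + 1/n2))  # compare with t on n1 + n2 - 2 df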
Example 4.3.1
Consider the lifetime of two brands of light bulbs. For a random sample of n1 = 12 bulbs
of one brand the mean bulb life is x̄1 = 3,400 hours with a sample standard deviation of
s1 = 240 hours.
For the second brand of bulbs the mean bulb life for a sample of n2 = 8 bulbs is x̄2 = 2,800
hours with s2 = 210 hours.
We assume that the distribution of bulb life is approximately Normal, and that the standard
deviations of the two populations are equal. Test
H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2
using a two-sample t-test at the 1% level.
Solution:
4.4 Two Normal populations: testing the assumption of equal
variances
In Section 4.3 we had to make the assumption that our two Normal populations had equal
variance σ 2 . Here we see how we can carry out a hypothesis test to check this assumption!
We denote the two population variances by σ1² and σ2². We wish to test
H0 : σ1² = σ2² versus H1 : σ1² ≠ σ2².
Notice that these hypotheses don’t make any assumptions about the values of µ1 and µ2 .
If the null hypothesis is true, then the ratio of sample variances
S1² / S2²
will have a distribution called the F-distribution, on n1 − 1 and n2 − 1 degrees of freedom.
Definition 4.4.1: The F distribution
If U and V are independent chi-square random variables such that U ∼ χ²r and V ∼ χ²s,
then
F = (U/r) / (V/s)
has an F distribution on r and s degrees of freedom. The distribution is denoted by Fr,s.
Note that the F distribution is characterized by two separate measures of degrees of
freedom: r corresponds to the numerator and s corresponds to the denominator. Printed
F tables are available, and of course we can always use R (except in an exam!).
Note that it follows immediately that the reciprocal ratio of sample variances
S2² / S1²
will have an F distribution on n2 − 1 and n1 − 1 degrees of freedom.
In practice, we carry out the hypothesis test for equal variances as follows. We will only
consider the case of the two–sided alternative (“not equal”), giving rise to a two–tailed
test. In this case it is sensible to reject H0 if either s1²/s2² or s2²/s1² is large. We form our
test statistic as
F = max( s1²/s2² , s2²/s1² ),
and compare this with Fr,s tables, where if s1² > s2² we set r = n1 − 1 and s = n2 − 1,
while if s2² > s1² we set r = n2 − 1 and s = n1 − 1. To account for the fact that under H0
these two outcomes could happen with equal probability, the significance level of the test
is *double* the upper tail probability of the F distribution (obtained from tables or R).
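In R, var.test() performs this F-test; a minimal sketch with hypothetical samples x and y is given below. (var.test reports the ratio s1²/s2² rather than the "max" form above, but the resulting two-sided p-value is essentially the same.)

# F-test for equal variances: a minimal sketch with hypothetical samples
set.seed(1)
x <- rnorm(12, mean = 0, sd = 2)
y <- rnorm(8,  mean = 0, sd = 2)
var.test(x, y)                                  # H0: sigma1^2 = sigma2^2
# the 'max' form described above, with the doubled upper-tail probability:
Fstat <- max(var(x) / var(y), var(y) / var(x))
r <- if (var(x) > var(y)) length(x) - 1 else length(y) - 1
s <- if (var(x) > var(y)) length(y) - 1 else length(x) - 1
2 * (1 - pf(Fstat, r, s))                       # two-tailed p-value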
Example 4.4.1
For the data in Example 4.3.1, test the assumption that the standard deviations of the
two populations are equal.
Solution:
4.5 Inference for a single Binomial proportion (r not small!)
Here we consider the situation where we have a single observation x from a Binomial
random variable X ∼ Bin(r, θ), and we are interested in testing hypotheses about θ.
Note that x can be viewed as the number of successes from r independent trials, each
with success probability θ. In this section we consider the case where r is not small, i.e.
r > 20.
We will test H0 : θ = θ0 against an alternative from one of the categories (A) to (E)
above.
Example 4.5.1
UK survey of sexual behaviour: in 2004/05, 11% of UK residents aged 16–49 claimed to
have had more than one sexual partner.
Suppose that in 2008-09, a random sample of 600 UK residents in the 16–49 age–group
shows that 83 had more than one sexual partner.
Is this evidence for an increase in the population proportion having more than one sexual
partner?
Formulate this problem as a hypothesis test.
Solution:
We need to derive a test statistic whose distribution we can evaluate conditional on H0
being true.
We use the Normal approximation to the Binomial distribution.
I.e. if
X ∼ Bin(r, θ),
with r > 20, then to a reasonable approximation
X ∼ N[rθ, rθ(1 − θ)].
(Note that the approximation involves rounding the outcome of a Normal random variable
to the nearest integer! See below.)
Now suppose the null hypothesis H0 is true, i.e. θ = θ0 .
Then the Normal approximation implies
X ∼ N[rθ0 , rθ0 (1 − θ0 )],
and hence the test statistic
Z = (X − rθ0) / √( rθ0 (1 − θ0) )
has a N(0, 1) distribution.
This means we can carry out a one–sample z–test exactly as we did in Section 4.2.
N.B. because of the rounding issue, it makes sense to replace x in the test statistic by
x − 0.5 when x > rθ0 , and by x + 0.5 when x < rθ0 . This is called a continuity correction.
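A minimal R sketch of this z-test, using the survey figures already given in Example 4.5.1 (83 successes out of r = 600, with θ0 = 0.11); prop.test() gives an essentially equivalent chi-square based version.

# z-test for a single Binomial proportion, with continuity correction
x <- 83; r <- 600; theta0 <- 0.11
xc <- if (x > r * theta0) x - 0.5 else x + 0.5        # continuity correction
z  <- (xc - r * theta0) / sqrt(r * theta0 * (1 - theta0))
1 - pnorm(z)                                          # one-sided p-value for H1: theta > theta0
# built-in alternative:
prop.test(x, r, p = theta0, alternative = "greater", correct = TRUE)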
Example 4.5.2
For the sexual behaviour data in Example 4.5.1 we have r = 600, we have observed x = 83,
and we want to test
H0 : θ = 0.11
against
H1 : θ > 0.11.
Carry out the hypothesis test.
Solution:
Notes on significance levels and p–values
1. If you are not told what level of significance to use, a sensible procedure is to test at
the 5% level. If not significant then stop, otherwise test at the 1% level. If not significant
then stop, otherwise test at the 0.1% level.
2. If you have access to the p-value, e.g. from Normal tables, or from R (see Exercises
4B Questions 1 and 2) then you immediately have the result of a hypothesis test at any
given significance level.
E.g. in Example 4.5.2 immediately above, we had p = 0.0158. It follows immediately
that our test is significant at 5% but not at 1%, because 0.05 > p > 0.01.
4.6 Inference for two Binomial proportions (samples not small!)
Example 4.6.1
Consider a survey of employment carried out separately in Northern England and Scotland, among people who had left school six months earlier. Suppose we obtain the following data:
                  Scotland   Northern England   Total
Unemployed
Employed
In general we have two independent samples of size n1 and n2 , with each observation
classified as success or failure:
             Sample 1   Sample 2   Total
Success      O11        O12        R1 = O11 + O12
Failure      O21        O22        R2 = O21 + O22
             n1         n2         n = n1 + n2
Assuming all observations are independent, and that the success probability is constant
within each sample, we have two Binomial samples. Suppose that the true probabilities
of success are θ1 and θ2 . We wish to test
H0 : θ1 = θ2
versus
H1 : θ1 ≠ θ2 .
As always with a hypothesis test, we need to find a test statistic whose distribution is
known when H0 is true.
Now if H0 is true, then θ1 = θ2 = θ, say.
The combined samples give the number of successes in n1 + n2 trials, in each of which
there is a probability θ of a success. So we may estimate θ by θ̂ = R1 /n, where
R1 = O11 + O12 (total for first row)
n = n1 + n2 (grand total).
Hence, under H0 , the expected numbers of successes and failures in each of the samples are
E11 = n1 R1 / n ;   E21 = n1 R2 / n ;   E12 = n2 R1 / n ;   E22 = n2 R2 / n ,
where
R2 = O21 + O22 (total for second row).
To measure how closely the expected values match the observed values we calculate the
test statistic
X² = Σ (Oij − Eij)² / Eij ,   where the sum is over i = 1, 2 and j = 1, 2.
Under H0 , X² has an asymptotic distribution which is a χ²1 distribution (a “chi–square
distribution with 1 degree of freedom”).
Definition 4.6.1: The chi–square distribution χ²n
If Z1 , . . . , Zn are independent N(0, 1) random variables, then
X² = Z1² + Z2² + · · · + Zn²
has a chi–square distribution on n degrees of freedom. The distribution is denoted by χ²n .
If H0 is true, the observed values should be close to the expected values, and so X² will
be small. Hence we reject H0 if X² is large enough, using Tables (or R).
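In R the whole calculation can be sketched as below; the counts are hypothetical (the employment table above is left blank here), and chisq.test() with correct = FALSE reproduces the X² statistic just defined.

# Chi-square test for two Binomial proportions (2 x 2 table): hypothetical counts
O <- matrix(c(30, 20,     # successes in Sample 1, Sample 2
              70, 80),    # failures  in Sample 1, Sample 2
            nrow = 2, byrow = TRUE,
            dimnames = list(c("Success", "Failure"), c("Sample 1", "Sample 2")))
chisq.test(O, correct = FALSE)            # X^2 compared with chi-square on 1 df
chisq.test(O, correct = FALSE)$expected   # the expected counts E_ij under H0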
Example 4.6.2
Consider the data in Example 4.6.1:
Test
H0 : the unemployment rates are equal
against
H1 : the unemployment rates are not equal.
Solution:
Notes
1. The method we just described for 2 × 2 tables also works for r × c tables, that is
tables with r rows and c columns. The test statistic is given by
X² = Σ (Oij − Eij)² / Eij ,   where the sum runs over i = 1, . . . , r and j = 1, . . . , c,
and this is compared with a chi-square distribution with (r − 1) × (c − 1) degrees of
freedom, i.e. χ2(r−1)(c−1) .
2. Since deviation from what is expected under H0 always corresponds to higher values of X², chi–square tests for 2 proportions (and for r × c contingency tables) are ***always*** 1–tailed, and always use the upper tail of the chi–square distribution!!!
4.7 The relationship between hypothesis tests and confidence intervals
Every hypothesis test we carry out has a corresponding confidence interval associated
with it!
Example 4.7.1
For the beetle widths given in Example 4.2.3, calculate a 95% confidence interval for the
population mean µ.
Solution:
Looking back at that example, we can deduce immediately that 23 also lies outside the
99% confidence interval, and the 99.9% confidence interval. (Exercise: check this!)
The general rule is:
The 100(1 − α)% confidence interval consists precisely of all those values which would not
be rejected at the 100α% significance level.
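A small R sketch of this duality for the beetle data of Example 4.2.3 (n = 64, x̄ = 24.8, known σ² = 16):

# Duality between the two-sided z-test and the confidence interval (Example 4.2.3 data)
xbar <- 24.8; sigma <- 4; n <- 64; mu0 <- 23
ci95 <- xbar + c(-1, 1) * qnorm(0.975) * sigma / sqrt(n)   # 95% confidence interval for mu
ci95
# mu0 is rejected at the 5% level precisely when it lies outside this interval:
mu0 < ci95[1] || mu0 > ci95[2]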
4.8 Hypothesis tests: size and power function
Hypothesis tests can be described in terms of their size and power.
Definition 4.8.1: the size of a hypothesis test
Consider a particular hypothesis test on a single parameter θ. We define the size of the
test to be
sup_{θ ∈ ω} Pr(reject H0).
Note that for a simple null hypothesis, this is just the probability we reject H0 if it’s true,
i.e. the probability of a Type I error.
For a composite null hypothesis, it is the supremum of this rejection probability over all
the values of θ for which the null hypothesis holds.
Definition 4.8.2: the power function for a hypothesis test
The power K(θ) is the probability of rejecting H0 , considered as a function of θ.
A plot of the power function is helpful in determining how good our test is at rejecting
the null hypothesis when it is false.
Informally, the power of a test is often used to refer to the probability that it will reject
the null hypothesis when it is false. However from our different categories of alternative
hypothesis (A) - (E), this only makes real sense for (A) and (B), i.e. when we are
comparing two simple hypotheses.
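As an illustration, a minimal R sketch of a power function, using the same set-up as Example 4.8.1 below (reject H0 when X̄ > 17, with σ² = 36 and n = 4):

# Power function K(mu) for the test that rejects H0 when the sample mean exceeds c
power_fn <- function(mu, c, sigma, n) {
  1 - pnorm(c, mean = mu, sd = sigma / sqrt(n))   # P(Xbar > c) when the true mean is mu
}
curve(power_fn(x, c = 17, sigma = 6, n = 4), from = 10, to = 20,
      xlab = expression(mu), ylab = expression(K(mu)))
power_fn(10, c = 17, sigma = 6, n = 4)            # the size, since H0: mu = 10 is simple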
Example 4.8.1
Suppose X1 , . . . , X4 is a random sample from X ∼ N(µ, 36), and we wish to test
H0 : µ = 10
against
H1 : µ > 10.
Note that this is a one–tailed alternative. Now suppose we base our rejection region on
the value of X̄; specifically we construct it as A1 = {X : X̄ > 17}.
(a) Plot the power function for this test in the range 10 ≤ µ ≤ 20.
(b) What is the size of this test?
(c) What would be the power of the test if the alternative was, in fact, H1 : µ = 22?
Solution:
Solution: (cont.)
Choice of rejection region
In Example 4.8.1 we found that our rejection region gave a test with desirable properties:
a ‘standard’ size of 1%, and a well-defined power function.
So how can we design such a test ourselves? Fortunately there is a very useful theorem
which helps us to define an ‘optimal’ rejection region...
The Neyman-Pearson Lemma
Suppose we have a random sample x1 , x2 , . . . , xn from a random variable X with density
fX (x|θ), and we wish to test H0 : θ = θ0 against the simple alternative H1 : θ = θ1 .
Consider the Likelihood Ratio defined as
Λ(x) = L(θ0 | x) / L(θ1 | x) .
Suppose we define a test by rejecting H0 in favour of H1 if Λ(x) is small enough. Specifically, suppose we choose a cut–off point η such that Pr(Λ(x) ≤ η|H0) = α.
Then the test based on the rejection region A1 = {x : Λ(x) ≤ η} is the most powerful test
of size α.
Now suppose we have a composite alternative hypothesis H1 : θ ∈ Θ1 . If the test is the
most powerful for all θ1 ∈ Θ1 , then it is said to be the uniformly most powerful (UMP)
test for alternatives in the set Θ1 .
Notes
1. Informally, the Neyman-Pearson Lemma says that if we base our test on the value
of the likelihood ratio, then we get the best possible test (in the sense of being the
most powerful).
2. Note that if we need to define a rejection region in terms of
Λ(x) = L(θ0 | x) / L(θ1 | x) ,
it is often easier to work with
log[Λ(x)] = log[L(θ0 |x)] − log[L(θ1 |x)].
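As an illustration (this particular setting is not one of the worked examples in the notes), here is a minimal R sketch of the log-likelihood-ratio statistic for the simple-vs-simple test H0 : µ = µ0 against H1 : µ = µ1 in a Normal model with known σ:

# Log-likelihood ratio for H0: mu = mu0 versus H1: mu = mu1 (sigma known)
loglik <- function(mu, x, sigma) sum(dnorm(x, mean = mu, sd = sigma, log = TRUE))
log_LR <- function(x, mu0, mu1, sigma) loglik(mu0, x, sigma) - loglik(mu1, x, sigma)
x <- c(10.2, 11.5, 9.8, 12.1, 10.7)     # hypothetical data
log_LR(x, mu0 = 10, mu1 = 12, sigma = 2)
# H0 is rejected when log Lambda(x) falls below a cut-off chosen to give size alpha

For µ1 > µ0 this log-ratio is a decreasing function of x̄, so rejecting H0 for small Λ(x) is the same as rejecting for large x̄, which recovers the one-tailed z-test of Section 4.2.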
Example 4.8.2
Suppose X1 , X2 , . . . , Xn is a random sample from a N(µ, σ 2 ) distribution where µ is known,
and we wish to test
H0 : σ² = σ0²
versus
H1 : σ² = σ1² ,
where σ0² < σ1² .
Find an appropriate test statistic on which to base a rejection region.
Solution:
4.9 Small sample methods
In this section we consider statistical inference (estimation and hypothesis testing) in
situations where the sample size is small.
The crucial change from large–sample methods is that we can no longer rely on the
asymptotic distribution of either the maximum likelihood estimator, or the test statistic,
in a hypothesis test.
In fact the cases for one and two Normal means have already been dealt with, because
the adjustments made to deal with unknown variance (t–tests!) work for arbitrarily small
samples.
The cases which need special treatment are the cases of (a) inference on one Binomial
proportion, and (b) the comparison of two Binomial proportions . . .
4.9.1 Inference for a single Binomial proportion (r is small!)
Suppose we have a single observation x from a Binomial random variable X ∼ Bin(r, θ),
and we want to test hypotheses about θ.
This is the same kind of problem as we considered in Section 4.5, but this time we assume
that the number of trials r is small, i.e. ≤ 20. The crucial difference is that the Normal
approximation is now too poor to use, and we should use the Binomial distribution directly (using Tables or R).
The fact that we are now working with a genuinely discrete distribution leads to a complication: we cannot carry out a test precisely for any specified significance level; we have
to use the nearest approximate significance level.
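In R, binom.test() carries out the exact test directly; a minimal sketch using the figures of Example 4.9.1 below (12 successes out of r = 20, θ0 = 0.8), and assuming the alternative of interest is θ < 0.8:

# Exact Binomial test when r is small (figures from Example 4.9.1)
x <- 12; r <- 20; theta0 <- 0.8
binom.test(x, r, p = theta0, alternative = "less")   # exact one-sided test
pbinom(x, size = r, prob = theta0)                   # the same p-value, P(X <= x) under H0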
Example 4.9.1
A leading cat–food manufacturer has a slogan which could be interpreted as follows:
“80% of cats prefer our product.”
In an experiment to test this, 20 cats are each given the choice between the product in
question, Brand W, and the leading market competitor, Brand X.
Result: 12 cats go for Brand W, and 8 cats go for Brand X.
Is this evidence against Brand W’s claim?
Solution:
4.9.2 Inference for two Binomial proportions (small samples!)
Here we consider the same kind of problem as in Section 4.6, i.e. two Binomial proportions,
with the data arranged in a 2 × 2 contingency table.
Here we consider the case when one or both samples are small.
Example 4.9.2
A small study into the dieting habits of teenagers is undertaken, to investigate whether
or not the proportions of males and females who diet are equal.
Suppose the population proportions of males and females who are dieting at any one time
are denoted by θM and θF respectively.
We wish to test:
H0 : θM = θF
against
H1 : θM ≠ θF .
A random sample of 12 boys and 12 girls is selected, and we ascertain whether each
individual is currently on a diet.
Data:
Table 4.1
              boys   girls   Total
dieting
not dieting
Total
It certainly appears that in the population, girls are more likely to be dieting, since in
our sample:
9 out of 12 girls are dieting;
1 out of 12 boys is dieting.
The question is:
“How significant are these results?”
In other words, how much evidence do we have against H0 : θM = θF ?
The way we answer this is that we assume the row totals and the column totals are fixed at
the observed values. We then assume that H0 is true (as ever!) and we ask, how unlikely
is the result we have observed?
In other words:
If we were to choose 10 of the teenagers at random, what is the probability
that 9 of them would be among the 12 girls, and only 1 from among the 12
boys?
The p–value for this test will be the probability of all outcomes which are as extreme as
this one, or more so . . .
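The construction just described (conditioning on the row and column totals) is exactly what Fisher's exact test does, and R's fisher.test() automates it. A minimal sketch using the counts given above (1 of 12 boys and 9 of 12 girls dieting):

# Fisher's exact test for the 2 x 2 dieting table
diet <- matrix(c(1, 9,      # dieting:     boys, girls
                 11, 3),    # not dieting: boys, girls
               nrow = 2, byrow = TRUE,
               dimnames = list(c("dieting", "not dieting"), c("boys", "girls")))
fisher.test(diet)                    # two-sided exact test of H0: thetaM = thetaF
# the probability of exactly this table, given the margins, is hypergeometric:
dhyper(9, m = 12, n = 12, k = 10)    # P(9 of the 10 dieters are girls)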
Solution:
We introduce the notation:
              boys   girls   Total
dieting
not dieting
Total
Table 4.2
              boys   girls   Total
dieting
not dieting
Total
Review of Section 4
In this section we have:
1. Introduced the principles of hypothesis testing.
2. Seen how to carry out hypothesis tests in some specific cases when the sample size
n is reasonably large:
(a) the mean of a single Normal population (variance known and unknown);
(b) the means of two Normal populations (variances assumed equal);
(c) the variances of two Normal populations;
(d) the success probability for a single Binomial proportion;
(e) the success probabilities for two Binomial proportions;
3. Introduced three new probability distributions needed to carry out the tests: the
Student t distribution, the F distribution and the chi–square distribution.
4. Learned how to use Statistical Tables to carry out the tests at specific significance
levels.
5. Learned how to use R to do some of these tests, and to interpret the precise p–value
obtained.
6. Understood the relationship between hypothesis tests and confidence intervals.
7. Considered the properties of hypothesis tests, namely size and power.
8. Seen how we may construct the most powerful tests using the Neyman-Pearson
Lemma.
9. Seen how to carry out hypothesis tests relating to the success probability of a single
Binomial distribution when the number of trials is small (≤ 20);
10. Seen how to carry out hypothesis tests to compare two Binomial proportions when
the sample sizes in a 2 × 2 contingency table are small. [Note that for the cases of
one Normal mean and two Normal means, the methods we developed in Section 3
(z-tests and t-tests) already work for arbitrarily small samples.]