4 Hypothesis testing

In Sections 2 and 3 we considered the problem of estimating a single parameter of interest, θ. In this section we consider the related problem of testing whether or not θ equals a particular value of interest, or lies in a particular range of values of interest. Estimation and hypothesis testing can be thought of as two related (dual) aspects of the inference problem, as we shall see later.

4.1 Types of hypothesis and types of error

Suppose X1, X2, . . . , Xn are an independent random sample from a probability density function fX(x|θ). Instead of estimating θ, we now wish to use the sample to test hypotheses about θ.

Definition 4.1.1: Simple and composite hypotheses
We define a hypothesis to be an assertion or conjecture about θ. If the hypothesis completely specifies the distribution of X, it is called a simple hypothesis. Otherwise it is called a composite hypothesis.

Example 4.1.1
Suppose we take an independent random sample X1, X2, . . . , Xn from a random variable X ∼ N(µ, σ²). Consider the following hypotheses. Which are simple and which are composite?
(i) H1: µ = 100, σ = 15;
(ii) H2: µ > 100, σ = 15;
(iii) H3: µ > 100, σ = µ/10;
(iv) H4: µ = 100;
(v) H5: σ = 15;
(vi) H6: µ < 100.
Solution:

Comparing two hypotheses
Usually in hypothesis testing we compare two hypotheses: the first, called the null hypothesis, is H0: θ ∈ ω, and the second, the alternative hypothesis, is H1: θ ∈ ω̄, where ω ⊂ S, ω ∪ ω̄ = S, ω ∩ ω̄ = ∅, and S is the set of all possible values for the parameter θ of the distribution of the random variable X.

Example 4.1.2
We are interested in whether a new method of sealing light bulbs increases the average lifetime of the bulbs. Here, if θ is the mean lifetime of the bulbs sealed by the new method, and we know the mean lifetime of standard bulbs is 140 hours, our hypothesis test will be a test of H0: θ = 140 versus H1: θ > 140.
Now suppose we assume that the lifetime X of a new bulb follows an Exponential distribution, i.e. X ∼ Exp(1/θ). Which of H0 and H1 is simple and which is composite? What are the sets S, ω and ω̄ which define this hypothesis test?
Solution:

Definition 4.1.2: Acceptance region and rejection region
Let A be the sample space of X, i.e. the set of all possible values of a random sample of size n from X. A test procedure divides A into subsets A0 and A1 (with A0 ∪ A1 = A, A0 ∩ A1 = ∅) such that if X ∈ A0 we accept H0, and if X ∈ A1 we reject H0 and accept H1. A0 is called the acceptance region and A1 the rejection region of the test.

Definition 4.1.3: Type I error and type II error
When performing a test we may make the correct decision, or one of two possible errors:
(i) Type I error: reject H0 when it is true;
(ii) Type II error: accept H0 when it is false.
The Type I error is usually regarded as the more serious mistake. The probabilities of making Type I and Type II errors are usually denoted by α(θ) and β(θ) respectively.

Example 4.1.3
Returning to the light bulbs sealed by the new method in Example 4.1.2, suppose that once again we wish to test H0: θ = 140 versus H1: θ > 140, and we collect some data consisting of ten measurements of lifetimes x1, . . . , x10. Suppose we choose to accept H0 if the sample mean x̄ satisfies x̄ < 150, and to reject H0 (and hence accept H1) if x̄ ≥ 150. What are the sample space, the acceptance region and the rejection region for this test? What are the Type I and Type II errors in this specific case?
Solution:

In Sections 4.2 to 4.6 we will develop the ideas of hypothesis testing by studying the main important cases.

4.2 Inference for a single Normal sample

For this section we will assume that X1, X2, . . . , Xn is an i.i.d. random sample from a N(µ, σ²) distribution. For the time being, we assume σ² is known, i.e. a constant.
Moreover, a particular value µ = µ0 for the population mean has been suggested by previous work or ideas. In this case the null hypothesis is denoted by H0: µ = µ0. There are a variety of options for the alternative hypothesis. Commonly used alternative hypotheses are:

(A) H1: µ = µ1 > µ0 (µ1 a fixed constant)
(B) H1: µ = µ1 < µ0 (µ1 a fixed constant)
(C) H1: µ > µ0
(D) H1: µ < µ0
(E) H1: µ ≠ µ0.

Example 4.2.1
Suppose the marks for a particular test are believed to follow a N(µ, 100) distribution, and the null hypothesis is H0: µ = 50. In which category (A)–(E) are each of the following alternative hypotheses?
1. H1: µ < 50;
2. H1: µ = 57;
3. H1: µ ≠ 50?
Solution:

Alternative (E) is the most commonly used, and the easiest to justify in most real-life situations. All the others assume some knowledge which it is usually unrealistic to assume. The null and alternative hypotheses are treated in the following way: we adopt the null hypothesis unless there is evidence against it.

The test statistic we choose to use for a single Normal sample is X̄, the sample mean. It makes sense to test a hypothesis about the population mean µ using the sample mean X̄, but more than this, we know the distribution of X̄ under the null hypothesis, which is crucial. If H0 is true, X1, . . . , Xn are i.i.d. N(µ0, σ²) random variables, and so

X̄ ∼ N(µ0, σ²/n)  ⇒  Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1).

We now need to decide for which values of the test statistic we will reject H0. These values will comprise the rejection region A1. We reject H0:
in cases (A) or (C): if Z is sufficiently far into the right-hand tail;
in cases (B) or (D): if Z is sufficiently far into the left-hand tail;
in case (E): if Z is sufficiently far into either tail.

In case (E) the rejection region is split between the tails of the distribution, giving a two-tailed test. The other cases are one-tailed tests. If P(Type I error) = α, the test is said to have significance level α.
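The cutoffs defining these rejection regions come from Normal quantiles. As a quick aside, they can be computed in software; the notes use R and tables, so the Python/scipy sketch below is an assumed toolchain rather than part of the course:

```python
from scipy.stats import norm

alpha = 0.05

# One-tailed tests of types (A)/(C): reject when Z > z_{1-alpha}.
z_upper = norm.ppf(1 - alpha)       # approx 1.645

# One-tailed tests of types (B)/(D): reject when Z < z_alpha = -z_{1-alpha}.
z_lower = norm.ppf(alpha)           # approx -1.645

# Two-tailed test (E): reject when |Z| > z_{1-alpha/2}.
z_two = norm.ppf(1 - alpha / 2)     # approx 1.960
```

By symmetry of the N(0, 1) density the left-tail cutoff is just the negative of the right-tail one, which is why tables only print upper quantiles.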
Commonly used significance levels are 0.05 (5%), 0.01 (1%) and 0.001 (0.1%). Once the significance level is chosen, the rejection region is precisely determined.

Example 4.2.2
For α = 0.05, calculate the rejection regions (in terms of z) for each category of alternative hypothesis (A)–(E).
Solution:

Example 4.2.3
The widths (mm) of 64 beetles chosen from a particular locality were measured and the sample mean was found to be x̄ = 24.8. Previous extensive measurements of beetles of the same species had shown the widths to be Normally distributed with mean 23 mm and variance 16 mm². Test at the 5% level whether or not the beetles from the chosen locality have a different mean width from the main population, assuming that they have the same variance.
Solution:

4.2 cont. A single Normal sample with unknown variance σ²

Now we consider hypothesis tests about µ where X1, X2, . . . , Xn is an i.i.d. random sample from a N(µ, σ²) distribution, and σ² is unknown. This is usually more realistic than assuming we know σ², but it is also a more complex problem: we have to estimate µ in the presence of the nuisance parameter σ². The solution is to replace σ² with a suitable estimate; here we use the sample variance S².

Example 4.2.4
Cola makers test new recipes for loss of sweetness during storage. For one particular recipe, ten trained tasters rate the sweetness before and after storage, enabling us to calculate the change (sweetness after storage minus sweetness before storage), as follows:

Before   8.0   7.6   8.1   8.2   6.8   7.9   8.0   9.2   7.1   7.0
After    6.0   7.2   7.4   6.2   7.2   5.7   9.3   7.9   6.0   4.7
Change  -2.0  -0.4  -0.7  -2.0   0.4  -2.2   1.3  -1.3  -1.1  -2.3

Is there evidence that, in general, the storage causes the cola to lose sweetness?
Solution:

When we knew σ², we used the test statistic

Z = (X̄ − µ0)/(σ/√n),

which we know has a N(0, 1) distribution.
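As a numerical aside, the z-test of Example 4.2.3 above can be reproduced in software. The sketch below uses Python with scipy (an assumed toolchain; the notes themselves use R and tables):

```python
import math
from scipy.stats import norm

# Example 4.2.3 (beetles): n = 64, sample mean 24.8, H0: mu = 23,
# known variance 16 mm^2, so sigma = 4 and the standard error is 4/8 = 0.5.
n, xbar, mu0, sigma = 64, 24.8, 23.0, 4.0
z = (xbar - mu0) / (sigma / math.sqrt(n))   # = 3.6

# Alternative of type (E), mu != 23, so the test is two-tailed.
p_two_tailed = 2 * norm.sf(abs(z))

print(z, p_two_tailed)
```

With z = 3.6 the two-tailed p-value is far below 0.05, so H0 is rejected at the 5% level, in line with the conclusion the example is driving at.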
Now we are estimating σ² using S², so our test statistic becomes

T = (X̄ − µ0)/(S/√n),

and this has a slightly different distribution, called the Student t distribution, or just the t distribution . . .

Definition 4.2.1: The Student t distribution
If Z ∼ N(0, 1) and U ∼ χ²n are independent random variables, then

Tn = Z/√(U/n)

has a Student t-distribution on n degrees of freedom. The distribution is denoted by tn.

Example 4.2.5
Sketch the t-distribution with (a) 1; (b) 5; (c) 100 degrees of freedom.
Solution:

[Figure 2: the t1, t5 and t100 distributions]

The t-distribution with n degrees of freedom has a p.d.f. which is symmetric and bell-shaped, like the Normal, but with somewhat thicker tails. Smaller values of n correspond to thicker tails; larger values of n make the tn distribution more like the Normal distribution. All we have to be able to do is use statistical tables or R to look up the appropriate tail probability, since the distribution of our test statistic is given by:

Tn−1 = (X̄ − µ0)/(S/√n) ∼ tn−1.

Example 4.2.6
For the cola example in 4.2.4 the test statistic was t = −2.697, and the sample size was n = 10. Carry out the test of H0: µ = 0 (no loss in sweetness) against H1: µ < 0 (some loss in sweetness).
Solution:

4.3 Hypothesis test for two Normal means: two-sample t-test

Now suppose we have two samples (x1, x2, . . . , xn1) and (xn1+1, xn1+2, . . . , xn1+n2), i.e. samples of sizes n1 and n2 from two different populations. We are interested in whether the two population means are equal.
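Before developing the two-sample test, the one-sample t-test of Examples 4.2.4 and 4.2.6 can be checked numerically. As before, the Python/scipy sketch is an assumed toolchain (the course uses R):

```python
import math
import statistics
from scipy.stats import t as t_dist

# Example 4.2.4 (cola): the ten observed changes (after minus before).
changes = [-2.0, -0.4, -0.7, -2.0, 0.4, -2.2, 1.3, -1.3, -1.1, -2.3]
n = len(changes)

xbar = statistics.mean(changes)          # -1.03
s = statistics.stdev(changes)            # sample standard deviation
t_stat = xbar / (s / math.sqrt(n))       # H0: mu = 0

# The notes quote t = -2.697; direct computation on these changes gives a
# value close to -2.72, a small rounding discrepancy either way.
p_one_tailed = t_dist.cdf(t_stat, df=n - 1)   # H1: mu < 0, so left tail

print(t_stat, p_one_tailed)
```

The one-tailed p-value lies between 0.01 and 0.025, so the conclusion at the 5% level (reject H0: some loss of sweetness) does not depend on which rounding of t is used.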
Assuming that the data are sampled from Normally distributed populations with equal variance σ² in each population, then if we want to test H0: µ1 = µ2 versus H1: µ1 ≠ µ2, where µ1 and µ2 are the means of each population, we can perform a t-test with test statistic given by

t = (x̄1 − x̄2) / ( s √(1/n1 + 1/n2) ),  where  s = √( [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) ),

and x̄1, x̄2, s1 and s2 are the sample means and standard deviations from each population. Here s = √s² is the pooled estimate of the common standard deviation σ. If the null hypothesis is true, then the test statistic comes from a t-distribution on n1 + n2 − 2 degrees of freedom, so we use the tables for tn1+n2−2 to carry out the test. This test is called the two-sample t-test.

Example 4.3.1
Consider the lifetime of two brands of light bulbs. For a random sample of n1 = 12 bulbs of one brand the mean bulb life is x̄1 = 3,400 hours with a sample standard deviation of s1 = 240 hours. For the second brand of bulbs the mean bulb life for a sample of n2 = 8 bulbs is x̄2 = 2,800 hours with s2 = 210 hours. We assume that the distribution of bulb life is approximately Normal, and the standard deviations of the two populations are assumed to be equal. Test H0: µ1 = µ2 versus H1: µ1 ≠ µ2 using a two-sample t-test at the 1% level.
Solution:

4.4 Two Normal populations: testing the assumption of equal variances

In Section 4.3 we had to make the assumption that our two Normal populations had equal variance σ². Here we see how we can carry out a hypothesis test to check this assumption. We denote the two population variances by σ1² and σ2². We wish to test H0: σ1² = σ2² versus H1: σ1² ≠ σ2². Notice that these hypotheses don't make any assumptions about the values of µ1 and µ2. If the null hypothesis is true, then the ratio of sample variances S1²/S2² will have a distribution called the F-distribution, on n1 − 1 and n2 − 1 degrees of freedom.
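(An aside before we look at the F distribution in detail: the two-sample t-test of Example 4.3.1 can be sketched numerically from its summary statistics. Python/scipy assumed, as in the earlier asides.)

```python
import math
from scipy.stats import t as t_dist

# Example 4.3.1: n1 = 12, xbar1 = 3400, s1 = 240; n2 = 8, xbar2 = 2800, s2 = 210.
n1, xbar1, s1 = 12, 3400.0, 240.0
n2, xbar2, s2 = 8, 2800.0, 210.0

# Pooled variance: s^2 = [(n1-1)s1^2 + (n2-1)s2^2] / (n1 + n2 - 2).
s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
s_pooled = math.sqrt(s2_pooled)

t_stat = (xbar1 - xbar2) / (s_pooled * math.sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_two_tailed = 2 * t_dist.sf(abs(t_stat), df)

print(t_stat, df, p_two_tailed)
```

The statistic comes out around 5.7 on 18 degrees of freedom, far beyond the 1% two-tailed critical value of t18, so H0: µ1 = µ2 is rejected at the 1% level.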
Definition 4.4.1: The F distribution
If U and V are independent chi-square random variables such that U ∼ χ²r and V ∼ χ²s, then

F = (U/r) / (V/s)

has an F distribution on r and s degrees of freedom. The distribution is denoted by Fr,s.

Note that the F distribution is characterized by two separate measures of degrees of freedom: r corresponds to the numerator and s corresponds to the denominator. Printed F tables are available, and of course we can always use R (except in an exam!). Note that it follows immediately that the reciprocal ratio of sample variances S2²/S1² will have an F distribution on n2 − 1 and n1 − 1 degrees of freedom.

In practice, we carry out the hypothesis test for equal variances as follows. We will only consider the case of the two-sided alternative ("not equal"), giving rise to a two-tailed test. In this case it is sensible to reject H0 if either s1²/s2² or s2²/s1² is large. We form our test statistic as

F = max( s1²/s2² , s2²/s1² ),

and compare this with Fr,s tables, where if s1² > s2² we set r = n1 − 1 and s = n2 − 1, while if s2² > s1² we set r = n2 − 1 and s = n1 − 1. To account for the fact that under H0 these two outcomes could happen with equal probability, the significance level of the test is *double* the upper-tail probability of the F distribution (obtained from tables or R).

Example 4.4.1
For the data in Example 4.3.1, test the assumption that the standard deviations of the two populations are equal.
Solution:

4.5 Inference for a single Binomial proportion (r not small!)

Here we consider the situation where we have a single observation x from a Binomial random variable X ∼ Bin(r, θ), and we are interested in testing hypotheses about θ. Note that x can be viewed as the number of successes from r independent trials, each with success probability θ. In this section we consider the case where r is not small, i.e. r > 20.
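(Looking back at Section 4.4 for a moment: the variance-ratio test of Example 4.4.1 can also be checked numerically, again under the assumption that Python/scipy is available.)

```python
from scipy.stats import f as f_dist

# Example 4.4.1 uses the summaries of Example 4.3.1:
# s1 = 240 with n1 = 12, and s2 = 210 with n2 = 8.
n1, s1 = 12, 240.0
n2, s2 = 8, 210.0

# Larger sample variance in the numerator: here s1^2 > s2^2, so
# F = s1^2 / s2^2 on r = n1 - 1 and s = n2 - 1 degrees of freedom.
F = s1**2 / s2**2
r, s_df = n1 - 1, n2 - 1

# Two-sided test: double the upper-tail probability.
p = 2 * f_dist.sf(F, r, s_df)

print(F, p)
```

F comes out near 1.3, well inside the body of the F11,7 distribution, so there is no evidence against the equal-variances assumption used by the two-sample t-test.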
We will test H0: θ = θ0 against an alternative from one of the categories (A) to (E) above.

Example 4.5.1
UK survey of sexual behaviour: in 2004/05, 11% of UK residents aged 16–49 claimed to have had more than one sexual partner. Suppose that in 2008–09, a random sample of 600 UK residents in the 16–49 age-group shows that 83 had more than one sexual partner. Is this evidence for an increase in the population proportion having more than one sexual partner? Formulate this problem as a hypothesis test.
Solution:

We need to derive a test statistic whose distribution we can evaluate conditional on H0 being true. We use the Normal approximation to the Binomial distribution: if X ∼ Bin(r, θ), with r > 20, then to a reasonable approximation X ∼ N[rθ, rθ(1 − θ)]. (Note that the approximation involves rounding the outcome of a Normal random variable to the nearest integer! See below.)

Now suppose the null hypothesis H0 is true, i.e. θ = θ0. Then the Normal approximation implies X ∼ N[rθ0, rθ0(1 − θ0)], and hence the test statistic

Z = (X − rθ0) / √( rθ0(1 − θ0) )

has a N(0, 1) distribution. This means we can carry out a one-sample z-test exactly as we did in Section 4.2. N.B. because of the rounding issue, it makes sense to replace x in the test statistic by x − 0.5 when x > rθ0, and by x + 0.5 when x < rθ0. This is called a continuity correction.

Example 4.5.2
For the sexual behaviour data in Example 4.5.1 we have r = 600, we have observed x = 83, and we want to test H0: θ = 0.11 against H1: θ > 0.11. Carry out the hypothesis test.
Solution:

Notes on significance levels and p-values
1. If you are not told what level of significance to use, a sensible procedure is to test at the 5% level. If not significant then stop; otherwise test at the 1% level. If not significant then stop; otherwise test at the 0.1% level.
2. If you have access to the p-value, e.g.
from Normal tables, or from R (see Exercises 4B, Questions 1 and 2), then you immediately have the result of a hypothesis test at any given significance level. E.g. in Example 4.5.2 immediately above, we had p = 0.0158. It follows immediately that our test is significant at 5% but not at 1%, because 0.05 > p > 0.01.

4.6 Inference for two Binomial proportions (samples not small!)

Example 4.6.1
Consider a survey of employment carried out separately in Northern England and Scotland, among people who had left school six months earlier. Suppose we obtain the following data:

                    Unemployed   Employed   Total
Scotland
Northern England
Total

In general we have two independent samples of sizes n1 and n2, with each observation classified as success or failure:

            Sample 1   Sample 2   Total
Success     O11        O12        R1 = O11 + O12
Failure     O21        O22        R2 = O21 + O22
Total       n1         n2         n = n1 + n2

Assuming all observations are independent, and that the success probability is constant within each sample, we have two Binomial samples. Suppose that the true probabilities of success are θ1 and θ2. We wish to test H0: θ1 = θ2 versus H1: θ1 ≠ θ2.

As always with a hypothesis test, we need to find a test statistic whose distribution is known when H0 is true. Now if H0 is true, then θ1 = θ2 = θ, say. The combined samples give the number of successes in n1 + n2 trials, in each of which there is a probability θ of a success. So we may estimate θ by θ̂ = R1/n, where

R1 = O11 + O12 (total for first row),  n = n1 + n2 (grand total).

Hence, under H0, the expected numbers of successes in the two samples are

E11 = n1 R1 / n,  E12 = n2 R1 / n,

and the expected numbers of failures are

E21 = n1 R2 / n,  E22 = n2 R2 / n,

where R2 = O21 + O22 (total for second row). To measure how closely the expected values match the observed values we calculate the test statistic

X² = Σi Σj (Oij − Eij)² / Eij   (summing over i = 1, 2 and j = 1, 2).

Under H0, X² has an asymptotic distribution which is a χ²1 distribution (a "chi-square distribution with 1 degree of freedom").
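As a numerical aside covering both of the proportion tests above: the sketch below first reproduces the continuity-corrected z-test of Example 4.5.2, then works the 2 × 2 X² statistic. Since the counts in Example 4.6.1 are left blank for lecture, the contingency-table numbers here are made up purely for illustration; Python/scipy is again an assumed toolchain.

```python
import math
from scipy.stats import norm, chi2, chi2_contingency

# --- Example 4.5.2: r = 600, x = 83, H0: theta = 0.11, H1: theta > 0.11.
r, x, theta0 = 600, 83, 0.11
sd = math.sqrt(r * theta0 * (1 - theta0))
z = (x - 0.5 - r * theta0) / sd     # x > r*theta0, so use x - 0.5
p_z = norm.sf(z)                    # one-tailed, alternative of type (C)

# --- A 2x2 chi-square test as in Section 4.6, with HYPOTHETICAL counts
# (not the lecture data): rows = unemployed/employed, cols = Scotland/N. England.
obs = [[30, 40],
       [170, 160]]

# Manual X^2: E_ij = (row total)(column total)/n for each cell.
rows = [sum(row) for row in obs]
cols = [obs[0][j] + obs[1][j] for j in range(2)]
n = sum(rows)
x2 = sum((obs[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
         for i in range(2) for j in range(2))
p_x2 = chi2.sf(x2, df=1)            # always the upper tail of chi-square(1)

# Cross-check with scipy; correction=False gives the plain X^2 above.
x2_scipy, p_scipy, dof, expected = chi2_contingency(obs, correction=False)

print(p_z, x2, p_x2)
```

The z-test gives p close to the 0.0158 quoted in the notes (significant at 5%, not at 1%), while with these invented counts X² ≈ 1.73 and the chi-square test would not reject equal proportions.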
Definition 4.6.1: The chi-square distribution χ²n
If Z1, . . . , Zn are independent N(0, 1) random variables, then

X² = Z1² + · · · + Zn²

has a chi-square distribution on n degrees of freedom. The distribution is denoted by χ²n.

If H0 is true, the observed values should be close to the expected values, and so X² will be small. Hence we reject H0 if X² is large enough, using Tables (or R).

Example 4.6.2
Consider the data in Example 4.6.1. Test H0: the unemployment rates are equal, against H1: the unemployment rates are not equal.
Solution:

Notes
1. The method we just described for 2 × 2 tables also works for r × c tables, that is, tables with r rows and c columns. The test statistic is given by

X² = Σi Σj (Oij − Eij)² / Eij   (summing over i = 1, . . . , r and j = 1, . . . , c),

and this is compared with a chi-square distribution with (r − 1) × (c − 1) degrees of freedom, i.e. χ²(r−1)(c−1).
2. Since deviation from what is expected under H0 always corresponds to higher values of X², chi-square tests for two proportions (and for r × c contingency tables) are *always* one-tailed, and always use the upper tail of the chi-square distribution!

4.7 The relationship between hypothesis tests and confidence intervals

Every hypothesis test we carry out has a corresponding confidence interval associated with it!

Example 4.7.1
For the beetle widths given in Example 4.2.3, calculate a 95% confidence interval for the population mean µ.
Solution:

Looking back at that example, we can deduce immediately that 23 also lies outside the 99% confidence interval, and the 99.9% confidence interval. (Exercise: check this!)

The general rule is: the 100(1 − α)% confidence interval consists precisely of all those values which would not be rejected at the 100α% significance level.

4.8 Hypothesis tests: size and power function

Hypothesis tests can be described in terms of their size and power.
Definition 4.8.1: the size of a hypothesis test
Consider a particular hypothesis test on a single parameter θ. We define the size of the test to be

sup over θ ∈ ω of Pr(reject H0).

Note that for a simple null hypothesis, this is just the probability that we reject H0 if it is true, i.e. the probability of a Type I error. For a composite null hypothesis, it is the supremum of this rejection probability over all the values of θ for which the null hypothesis holds.

Definition 4.8.2: the power function for a hypothesis test
The power K(θ) is the probability of rejecting H0, considered as a function of θ.

A plot of the power function is helpful in determining how good our test is at rejecting the null hypothesis when it is false. Informally, the power of a test is often used to refer to the probability that it will reject the null hypothesis when it is false. However, from our different categories of alternative hypothesis (A)–(E), this only makes real sense for (A) and (B), i.e. when we are comparing two simple hypotheses.

Example 4.8.1
Suppose X1, . . . , X4 is a random sample from X ∼ N(µ, 36), and we wish to test H0: µ = 10 against H1: µ > 10. Note that this is a one-tailed alternative. Now suppose we base our rejection region on the value of X̄; specifically we construct it as A1 = {X : X̄ > 17}.
(a) Plot the power function for this test in the range 10 ≤ µ ≤ 20.
(b) What is the size of this test?
(c) What would be the power of the test if the alternative was, in fact, H1: µ = 22?
Solution:

Choice of rejection region
In Example 4.8.1 we found that our rejection region gave a test with desirable properties: a 'standard' size of 1%, and a well-defined power function. So how can we design such a test ourselves? Fortunately there is a very useful theorem which helps us to define an 'optimal' rejection region...

The Neyman-Pearson Lemma
Suppose we have a random sample x1, x2, . . .
, xn from a random variable X with density fX(x|θ), and we wish to test H0: θ = θ0 against the simple alternative H1: θ = θ1. Consider the likelihood ratio defined as

Λ(x) = L(θ0|x) / L(θ1|x).

Suppose we define a test by rejecting H0 in favour of H1 if Λ(x) is small enough. Specifically, suppose we choose a cut-off point η such that Pr(Λ(x) ≤ η | H0) = α. Then the test based on the rejection region A1 = {x : Λ(x) ≤ η} is the most powerful test of size α.

Now suppose we have a composite alternative hypothesis H1: θ ∈ Θ1. If the test is the most powerful for all θ1 ∈ Θ1, then it is said to be the uniformly most powerful (UMP) test for alternatives in the set Θ1.

Notes
1. Informally, the Neyman-Pearson Lemma says that if we base our test on the value of the likelihood ratio, then we get the best possible test (in the sense of being the most powerful).
2. If we need to define a rejection region in terms of Λ(x) = L(θ0|x)/L(θ1|x), it is often easier to work with

log[Λ(x)] = log[L(θ0|x)] − log[L(θ1|x)].

Example 4.8.2
Suppose X1, X2, . . . , Xn is a random sample from a N(µ, σ²) distribution where µ is known, and we wish to test H0: σ² = σ0² versus H1: σ² = σ1², where σ0² < σ1². Find an appropriate test statistic on which to base a rejection region.
Solution:

4.9 Small sample methods

In this section we consider statistical inference (estimation and hypothesis testing) in situations where the sample size is small. The crucial change from large-sample methods is that we can no longer rely on the asymptotic distribution of either the maximum likelihood estimator or the test statistic in a hypothesis test. In fact the cases of one and two Normal means have already been dealt with, because the adjustments made to deal with unknown variance (t-tests!) work for arbitrarily small samples.
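(Looking back at Section 4.8: the size and power calculations of Example 4.8.1 can be reproduced numerically, again under the assumption that Python/scipy is available.)

```python
from scipy.stats import norm

# Example 4.8.1: X1..X4 ~ N(mu, 36), so Xbar ~ N(mu, 36/4 = 9), sd 3.
# The rejection region is A1 = {Xbar > 17}, testing H0: mu = 10.
SD_XBAR = 3.0

def power(mu):
    """K(mu) = P(Xbar > 17) when the true mean is mu."""
    return norm.sf(17.0, loc=mu, scale=SD_XBAR)

# H0 is simple, so the size is just the rejection probability at mu = 10.
size = power(10.0)
power_at_22 = power(22.0)   # part (c): power against H1: mu = 22

print(size, power_at_22)
```

The size comes out just under 0.01, the 'standard' 1% size mentioned in the discussion of Example 4.8.1, and the power against µ = 22 is around 0.95: the test almost always detects a shift that large.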
The cases which need special treatment are (a) inference on one Binomial proportion, and (b) the comparison of two Binomial proportions . . .

4.9.1 Inference for a single Binomial proportion (r is small!)

Suppose we have a single observation x from a Binomial random variable X ∼ Bin(r, θ), and we want to test hypotheses about θ. This is the same kind of problem as we considered in Section 4.5, but this time we assume that the number of trials r is small, i.e. r ≤ 20. The crucial difference is that the Normal approximation is now too poor to use, and we should use the Binomial distribution directly (using Tables or R). The fact that we are now working with a genuinely discrete distribution leads to a complication: we cannot carry out a test precisely for any specified significance level; we have to use the nearest approximate significance level.

Example 4.9.1
A leading cat-food manufacturer has a slogan which could be interpreted as follows: "80% of cats prefer our product." In an experiment to test this, 20 cats are each given the choice between the product in question, Brand W, and the leading market competitor, Brand X. Result: 12 cats go for Brand W, and 8 cats go for Brand X. Is this evidence against Brand W's claim?
Solution:

4.9.2 Inference for two Binomial proportions (small samples!)

Here we consider the same kind of problem as in Section 4.6, i.e. two Binomial proportions, with the data arranged in a 2 × 2 contingency table, but now in the case when one or both samples are small.

Example 4.9.2
A small study into the dieting habits of teenagers is undertaken, to investigate whether or not the proportions of males and females who diet are equal. Suppose the population proportions of males and females who are dieting at any one time are denoted by θM and θF respectively. We wish to test H0: θM = θF against H1: θM ≠ θF.
A random sample of 12 boys and 12 girls is selected, and we ascertain whether each individual is currently on a diet.

Data: Table 4.1
         dieting   not dieting   Total
boys
girls
Total

It certainly appears that in the population, girls are more likely to be dieting, since in our sample 9 out of 12 girls are dieting, but only 1 out of 12 boys. The question is: how significant are these results? In other words, how much evidence do we have against H0: θM = θF?

The way we answer this is that we assume the row totals and the column totals are fixed at the observed values. We then assume that H0 is true (as ever!) and ask: how unlikely is the result we have observed? In other words, if we were to choose 10 of the teenagers at random, what is the probability that 9 of them would be among the 12 girls, and only 1 from among the 12 boys? The p-value for this test will be the probability of all outcomes which are as extreme as this one, or more so . . .
Solution:

We introduce the notation:
         dieting   not dieting   Total
boys
girls
Total

Table 4.2
         dieting   not dieting   Total
boys
girls
Total

Review of Section 4

In this section we have:
1. Introduced the principles of hypothesis testing.
2. Seen how to carry out hypothesis tests in some specific cases when the sample size n is reasonably large:
(a) the mean of a single Normal population (variance known and unknown);
(b) the means of two Normal populations (variances assumed equal);
(c) the variances of two Normal populations;
(d) the success probability for a single Binomial proportion;
(e) the success probabilities for two Binomial proportions.
3. Introduced three new probability distributions needed to carry out the tests: the Student t distribution, the F distribution and the chi-square distribution.
4. Learned how to use Statistical Tables to carry out the tests at specific significance levels.
5.
Learned how to use R to do some of these tests, and to interpret the precise p-value obtained.
6. Understood the relationship between hypothesis tests and confidence intervals.
7. Considered the properties of hypothesis tests, namely size and power.
8. Seen how we may construct the most powerful tests using the Neyman-Pearson Lemma.
9. Seen how to carry out hypothesis tests relating to the success probability of a single Binomial distribution when the number of trials is small (≤ 20).
10. Seen how to carry out hypothesis tests to compare two Binomial proportions when the sample sizes in a 2 × 2 contingency table are small.

[Note that for the cases of one Normal mean and two Normal means, the methods we developed in Section 3 (z-tests and t-tests) already work for arbitrarily small samples.]
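As a closing numerical aside, the two small-sample tests of Section 4.9 can both be checked in software. The notes use Tables and R; the sketch below uses Python with scipy (an assumed toolchain). The cat-food counts come from Example 4.9.1, and the dieting counts (9 of 12 girls, 1 of 12 boys) from Example 4.9.2; reading the cat-food question as the one-sided alternative H1: θ < 0.8 is one plausible interpretation of "evidence against the claim".

```python
from scipy.stats import binom, fisher_exact

# --- Example 4.9.1: x = 12 successes in r = 20 trials, H0: theta = 0.8
# against H1: theta < 0.8. Use the Binomial distribution directly:
# the p-value is P(X <= 12) for X ~ Bin(20, 0.8).
p_catfood = binom.cdf(12, 20, 0.8)

# --- Example 4.9.2: 2x2 table with rows boys/girls, columns dieting/not.
# Conditioning on the fixed margins gives Fisher's exact (hypergeometric) test.
table = [[1, 11],   # boys:  1 dieting, 11 not
         [9, 3]]    # girls: 9 dieting,  3 not
odds_ratio, p_diet = fisher_exact(table, alternative="two-sided")

print(p_catfood, p_diet)
```

With these numbers the cat-food p-value is about 0.03 (evidence against the 80% claim at the 5% level, though not at 1%), and Fisher's exact test gives a two-sided p-value well below 0.01, matching the conditional "choose 10 teenagers at random" argument described in Example 4.9.2.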