Chapter 9
Hypothesis Testing

9.1 Introduction

In statistics, a hypothesis is a statement about a parameter. Consider a statistical model $\{P_\theta \mid \theta \in \Theta\}$. The goal of hypothesis testing is to decide, based on a sample from a population, which of two complementary hypotheses is true. These are called the null hypothesis and the alternative hypothesis, denoted $H_0$ and $H_1$ respectively. The null hypothesis is $H_0 : \theta \in \Theta_0$ and the alternative is $H_1 : \theta \in \Theta_0^c$, where $\Theta_0 \subset \Theta$ is a strict subset of the parameter space.

Definition 9.1. A hypothesis testing procedure, or hypothesis test, is a rule that specifies

1. for which sample values the hypothesis $H_0$ is not rejected;
2. for which sample values $H_0$ is rejected and $H_1$ is accepted as true.

The subset of $\mathcal{X}$ (the sample space) for which $H_0$ is rejected is called the rejection region or critical region. Typically, for a random sample $X_1, \ldots, X_n$, a hypothesis test is specified in terms of a test statistic $W(X_1, \ldots, X_n) = W(\mathbf{X})$, a function of the sample.

9.2 Methods of Finding Tests

9.2.1 Likelihood Ratio Test

Definition 9.2 (Likelihood Ratio Test Statistic). Let $L(\theta; \mathbf{x})$ denote the likelihood function for parameter $\theta$. The likelihood ratio test statistic for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$ is
\[ \lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta; \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta; \mathbf{x})}. \]
A likelihood ratio test (LRT) is any test that has a rejection region of the form
\[ R_{\mathrm{crit}} = \{\mathbf{x} \mid \lambda(\mathbf{x}) < c\}, \qquad 0 \le c \le 1. \]

Example 9.1 (Normal LRT). Let $X_1, \ldots, X_n$ be a random sample from a $N(\theta, 1)$ population. Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta \neq \theta_0$. The LRT statistic is
\[ \lambda(\mathbf{x}) = \frac{L(\theta_0; \mathbf{x})}{L(\bar{x}; \mathbf{x})}, \]
since the denominator is maximised (by definition) at $\hat{\theta}_{ML} = \bar{x}$. Hence
\[ \lambda(\mathbf{x}) = \exp\left\{ -\frac{1}{2}\left( \sum_{i=1}^n (x_i - \theta_0)^2 - \sum_{i=1}^n (x_i - \bar{x})^2 \right) \right\} = \exp\left\{ -\frac{n}{2}(\bar{x} - \theta_0)^2 \right\}. \]
An LRT is a test that rejects $H_0$ for small values of $\lambda(\mathbf{x})$. The rejection region $\{\mathbf{x} \mid \lambda(\mathbf{x}) \le c\}$ can be written as
\[ \left\{ \mathbf{x} : |\bar{x} - \theta_0| \ge \sqrt{-\frac{2}{n} \log c} \right\}. \]
Clearly, the rejection region obtained from the LRT statistic has a complete description in terms of the simpler statistic $|\bar{X} - \theta_0|$.

Example 9.2 (Exponential LRT). Let $X_1, \ldots, X_n$ be a random sample from a shifted exponential population with p.d.f.
\[ p(x; \theta) = \begin{cases} e^{-(x - \theta)} & x \ge \theta \\ 0 & x < \theta, \end{cases} \]
where $\Theta = \mathbb{R}$. The likelihood function is
\[ L(\theta; \mathbf{x}) = \begin{cases} e^{-\sum_{j=1}^n x_j + n\theta} & \theta \le x_{(1)} \\ 0 & \theta > x_{(1)}, \end{cases} \]
where $x_{(1)} = \min_j x_j$ is the lowest order statistic. Consider testing $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$. Clearly $L(\theta; \mathbf{x})$ is an increasing function of $\theta$ for $\theta \le x_{(1)}$. Thus, the LRT statistic is
\[ \lambda(\mathbf{x}) = \begin{cases} 1 & x_{(1)} \le \theta_0 \\ e^{-n(x_{(1)} - \theta_0)} & x_{(1)} > \theta_0. \end{cases} \]
An LRT rejects $H_0$ if $\lambda(\mathbf{x}) \le c$ for some specified $c$. The rejection region is therefore
\[ \left\{ \mathbf{x} : x_{(1)} \ge \theta_0 - \frac{\log c}{n} \right\}. \]
It depends on the sample only through the sufficient statistic $X_{(1)}$.

Theorem 9.3. If $T(\mathbf{X})$ is a sufficient statistic for $\theta$ and $\lambda^*(t)$ and $\lambda(\mathbf{x})$ are the LRT statistics based on $T$ and $\mathbf{X}$ respectively, then $\lambda^*(T(\mathbf{x})) = \lambda(\mathbf{x})$ for every $\mathbf{x} \in \mathcal{X}$.

Proof. From the factorisation theorem, the probability mass (or density) function of $\mathbf{X}$ may be written as $p(\mathbf{x}; \theta) = g(T(\mathbf{x}); \theta) h(\mathbf{x})$, where $g(t; \theta)$ is the p.m.f. or p.d.f. of $T$ and $h(\mathbf{x})$ does not depend on $\theta$. It follows that
\[ \lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta; \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta; \mathbf{x})} = \frac{\sup_{\theta \in \Theta_0} g(T(\mathbf{x}); \theta)}{\sup_{\theta \in \Theta} g(T(\mathbf{x}); \theta)} = \frac{\sup_{\theta \in \Theta_0} L^*(\theta; T(\mathbf{x}))}{\sup_{\theta \in \Theta} L^*(\theta; T(\mathbf{x}))} = \lambda^*(T(\mathbf{x})). \]

Likelihood ratio tests are also useful in situations where there are nuisance parameters, that is, parameters that are present in the model but not of direct interest. Their presence can lead to different tests than would be obtained if they were known.

Example 9.3 (Normal LRT with unknown variance). Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ population, where it is required to test $H_0 : \mu \le \mu_0$ versus $H_1 : \mu > \mu_0$ for a specified $\mu_0$ and where $\sigma^2$ is unknown. The LRT statistic is
\[ \lambda(\mathbf{x}) = \frac{\sup_{(\mu, \sigma^2) : \mu \le \mu_0} L(\mu, \sigma^2; \mathbf{x})}{L(\hat{\mu}_{ML}, \hat{\sigma}^2_{ML}; \mathbf{x})}. \]
A test based on $\lambda(\mathbf{x})$ is equivalent to a test based on the $t$-statistic
\[ T(\mathbf{X}) = \frac{\sqrt{n}(\bar{X} - \mu_0)}{S} \sim t_{n-1} \quad (\text{when } \mu = \mu_0), \qquad S^2 = \frac{1}{n-1} \sum_{j=1}^n (X_j - \bar{X})^2. \]
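As a quick numerical aside, the $t$-statistic $T(\mathbf{x}) = \sqrt{n}(\bar{x} - \mu_0)/s$ can be computed directly and compared with the statistic produced by `scipy.stats.ttest_1samp` (a sketch; the data below are invented purely for illustration):

```python
import numpy as np
from scipy import stats

# Invented sample; mu0 is the boundary value of H0: mu <= mu0.
x = np.array([5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0])
mu0 = 5.0

n = len(x)
s = x.std(ddof=1)                       # S, with the 1/(n-1) normalisation
t_stat = np.sqrt(n) * (x.mean() - mu0) / s

# scipy's one-sided one-sample t-test computes the same statistic.
res = stats.ttest_1samp(x, mu0, alternative='greater')
assert np.isclose(t_stat, res.statistic)

# Reject H0 at level alpha when T(x) exceeds the upper-alpha t_{n-1} quantile.
alpha = 0.05
reject = t_stat > stats.t.ppf(1 - alpha, df=n - 1)
```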
This is seen as follows. If $\mu_0 \ge \bar{x}$, then $\lambda(\mathbf{x}) = 1$, since the maximising pair $(\tilde{\mu}, \tilde{\sigma})$ for the numerator is the same as that for the denominator. If $\mu_0 < \bar{x}$, consider
\[ L(\mu, \sigma; \mathbf{x}) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{j=1}^n (x_j - \mu)^2 \right\}. \]
Now note that $\sum_{j=1}^n (x_j - \mu)^2 = \sum_{j=1}^n x_j^2 - 2n\bar{x}\mu + n\mu^2$, and that $\mu^2 - 2\mu\bar{x}$ is decreasing in $\mu$ for $\mu < \bar{x}$, so that the maximiser of the numerator over $\mu \le \mu_0$ is $\mu_0$. The $\tilde{\sigma}$ that maximises the numerator then satisfies
\[ \tilde{\sigma}^2 := \frac{1}{n} \sum_{j=1}^n (x_j - \mu_0)^2. \]
It follows that for $\mu_0 < \bar{x}$,
\[ \lambda(\mathbf{x}) = \exp\left\{ \frac{n}{2} \log \frac{\hat{\sigma}^2_{ML}}{\tilde{\sigma}^2} \right\}, \]
so that
\[ \lambda(\mathbf{x})^{2/n} = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sum_{j=1}^n (x_j - \mu_0)^2} = \frac{\sum_{j=1}^n (x_j - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2 + n(\bar{x} - \mu_0)^2} = \frac{1}{1 + \frac{n(\bar{x} - \mu_0)^2}{(n-1)s^2}}, \]
where $s^2 = \frac{1}{n-1} \sum_{j=1}^n (x_j - \bar{x})^2$. It follows that
\[ \lambda(\mathbf{x}) \le c \iff T(\mathbf{x})^2 := \frac{n(\bar{x} - \mu_0)^2}{s^2} \ge (n-1)\left( c^{-2/n} - 1 \right) = k^2, \]
where the relation between $k$ and $c$ is given by this formula. Since $H_0$ is only rejected when $\mu_0 < \bar{x}$, that is, when $T(\mathbf{x}) > 0$, it follows that
\[ R_{\mathrm{crit}} = \{\mathbf{x} : T(\mathbf{x}) \ge k\}. \]

9.3 Evaluating Tests

There are two possible errors when testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$.

• Type I error: $\theta \in \Theta_0$, but the hypothesis test incorrectly rejects $H_0$.
• Type II error: $\theta \in \Theta_0^c$, but the hypothesis test fails to reject $H_0$.

Suppose $R$ denotes the rejection region for a test. Then
\[ \mathbb{P}(\text{type I error}) = P_\theta(\mathbf{X} \in R), \quad \theta \in \Theta_0, \qquad \mathbb{P}(\text{type II error}) = P_\theta(\mathbf{X} \in R^c), \quad \theta \in \Theta_0^c. \]
The rejection region $R$ is chosen so that the probability of type I error is not greater than a value $\alpha$ specified in advance, known as the significance level. For a specified significance level, the power of the test is the probability that it rejects $H_0$ when $H_0$ is false.

Definition 9.4 (Power Function). The power function of a hypothesis test with rejection region $R$ is the function $\beta : \Theta \to [0, 1]$ defined by $\beta(\theta) = P_\theta(\mathbf{X} \in R)$.

Definition 9.5 (Size $\alpha$, level $\alpha$ test).
For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if
\[ \sup_{\theta \in \Theta_0} \beta(\theta) = \alpha, \]
and a level $\alpha$ test if
\[ \sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha. \]

Example 9.4 (Normal Power Function). Let $X_1, \ldots, X_n$ be a random sample from a $N(\theta, \sigma^2)$ population, where $\sigma^2$ is known. An LRT of $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$ is a test that rejects $H_0$ if
\[ \frac{\bar{X} - \theta_0}{\sigma/\sqrt{n}} > c, \qquad c \in \mathbb{R}_+. \]
The power function of the test is
\[ \beta(\theta) = P_\theta\left( \frac{\bar{X} - \theta_0}{\sigma/\sqrt{n}} > c \right) = P_\theta\left( \frac{\bar{X} - \theta}{\sigma/\sqrt{n}} > c + \frac{\theta_0 - \theta}{\sigma/\sqrt{n}} \right) = 1 - \Phi\left( c - \frac{\theta - \theta_0}{\sigma/\sqrt{n}} \right), \]
where $\Phi(x) = \mathbb{P}(Z \le x)$, $Z \sim N(0, 1)$.

Suppose a significance level $\alpha = 0.1$ is required and, in addition, a power of $0.8$ is required if $\theta \ge \theta_0 + \sigma$ ($\theta$ more than one standard deviation above $\theta_0$). Which values of $c$ and $n$ should be chosen to meet this? The requirements will be met if $\beta(\theta_0) = 0.1$ and $\beta(\theta_0 + \sigma) = 0.8$:
\[ 0.1 = \beta(\theta_0) = 1 - \Phi(c) \;\Rightarrow\; c = z_{0.1} \simeq 1.28, \]
\[ 0.8 = \beta(\theta_0 + \sigma) = 1 - \Phi(c - \sqrt{n}) \;\Rightarrow\; \Phi(\sqrt{n} - c) = 0.8 \;\Rightarrow\; \sqrt{n} - c = z_{0.2} \simeq 0.84 \;\Rightarrow\; n = (1.28 + 0.84)^2 = 4.49. \]
Since $n$ must be an integer, choose $n = 5$.

9.4 Most Powerful Tests

Definition 9.6. Let $\mathcal{C}$ be a class of tests for testing $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta_0^c$. A test in class $\mathcal{C}$ with power function $\beta(\theta)$ is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'$ that is a power function of a test in class $\mathcal{C}$.

The requirements of this definition are so strong that UMP tests do not exist in many realistic problems. When they do exist, they are very useful. The following theorem characterises the UMP level $\alpha$ tests in the situation where the null and alternative hypotheses each consist of only one probability distribution ($H_0$ and $H_1$ are simple hypotheses).

Theorem 9.7 (Neyman-Pearson Lemma). Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$, where the p.d.f. or p.m.f.
corresponding to $\theta_i$ is $p(\cdot; \theta_i)$, $i = 0, 1$, using a test with rejection region $R$ that satisfies
\[ \mathbf{x} \in R \;\text{ if }\; p(\mathbf{x}; \theta_1) > k\, p(\mathbf{x}; \theta_0) \qquad \text{and} \qquad \mathbf{x} \in R^c \;\text{ if }\; p(\mathbf{x}; \theta_1) < k\, p(\mathbf{x}; \theta_0) \tag{9.1} \]
for some $k \ge 0$ (both inequalities are strict), and $\alpha = P_{\theta_0}(\mathbf{X} \in R)$. Then:

1. (Sufficiency) Any test that satisfies these criteria is a UMP level $\alpha$ test.
2. (Necessity) If there exists a test satisfying these two criteria with $k > 0$, then every UMP level $\alpha$ test is a size $\alpha$ test, and every UMP level $\alpha$ test satisfies Equation (9.1) except perhaps on a set $A$ satisfying $P_{\theta_0}(\mathbf{X} \in A) = P_{\theta_1}(\mathbf{X} \in A) = 0$.

The theorem has the following corollary.

Corollary 9.8. Suppose that $T(\mathbf{X})$ is a sufficient statistic for $\theta$ and $g(t; \theta_i)$ is the p.d.f. or p.m.f. of $T$ corresponding to $\theta_i$, $i = 0, 1$. Then any test based on $T$ with rejection region $S$ is a UMP level $\alpha$ test if it satisfies
\[ t \in S \;\text{ if }\; g(t; \theta_1) > k\, g(t; \theta_0) \qquad \text{and} \qquad t \in S^c \;\text{ if }\; g(t; \theta_1) < k\, g(t; \theta_0) \tag{9.2} \]
for some $k \ge 0$, where $\alpha = P_{\theta_0}(T \in S)$.

Proof. (Exercise: use the theorem and the factorisation theorem.)

Before giving the proof of the Neyman-Pearson lemma, the following examples may be instructive.

Example 9.5 (UMP Binomial test). Let $X \sim \mathrm{Binomial}(2, \theta)$. Consider testing $H_0 : \theta = \frac{1}{2}$ versus $H_1 : \theta = \frac{3}{4}$. The probability ratios are:
\[ \frac{p_X(0 \mid \theta = \frac{3}{4})}{p_X(0 \mid \theta = \frac{1}{2})} = \frac{1}{4}, \qquad \frac{p_X(1 \mid \theta = \frac{3}{4})}{p_X(1 \mid \theta = \frac{1}{2})} = \frac{3}{4}, \qquad \frac{p_X(2 \mid \theta = \frac{3}{4})}{p_X(2 \mid \theta = \frac{1}{2})} = \frac{9}{4}. \]
If we choose $\frac{3}{4} < k < \frac{9}{4}$, the Neyman-Pearson lemma says that $R = \{2\}$ (reject $H_0$ for $x = 2$) is the UMP level $\alpha = p_X(2 \mid \theta = \frac{1}{2}) = \frac{1}{4}$ test. If we choose $\frac{1}{4} < k < \frac{3}{4}$, then the Neyman-Pearson lemma says that $R = \{1, 2\}$ (reject $H_0$ for $x = 1$ or $2$) is the UMP level $\alpha = \mathbb{P}(X \in \{1, 2\} \mid \theta = \frac{1}{2}) = \frac{3}{4}$ test (probability $0.75$ of wrongly rejecting $H_0$). Choosing $k < \frac{1}{4}$ gives $R = \{0, 1, 2\}$, the UMP level $\alpha = 1$ test, while $k > \frac{9}{4}$ gives $R = \emptyset$, the UMP level $\alpha = 0$ test. If $k = \frac{3}{4}$, then Equation (9.1) says we must reject $H_0$ for the sample point $x = 2$ and accept for $x = 0$, but leaves $x = 1$ undetermined.
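The probability ratios and the resulting rejection region for a given $k$ are easy to verify numerically (a sketch using `scipy.stats.binom`; the value of $k$ below is one arbitrary choice from the interval $(3/4, 9/4)$):

```python
from scipy import stats

# X ~ Binomial(2, theta); H0: theta = 1/2 versus H1: theta = 3/4.
ratios = {x: stats.binom.pmf(x, 2, 0.75) / stats.binom.pmf(x, 2, 0.5)
          for x in (0, 1, 2)}   # {0: 1/4, 1: 3/4, 2: 9/4}

# For 3/4 < k < 9/4 the Neyman-Pearson region is R = {2}, with size
# alpha = P(X = 2 | theta = 1/2) = 1/4.
k = 1.5
R = [x for x in (0, 1, 2) if ratios[x] > k]
alpha = sum(stats.binom.pmf(x, 2, 0.5) for x in R)
```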
If we do not reject $H_0$ for $x = 1$, we get the UMP level $\alpha = \frac{1}{4}$ test as above. If we reject for $x = 1$, we get the UMP level $\alpha = \frac{3}{4}$ test as above.

Example 9.6 (UMP Normal test). Let $X_1, \ldots, X_n$ be a random sample from a $N(\theta, \sigma^2)$ population, $\sigma^2$ known. The sample mean $\bar{X}$ is a sufficient statistic for $\theta$. Consider testing $H_0 : \theta = \theta_0$ versus $H_1 : \theta = \theta_1$, where $\theta_0 > \theta_1$. Let $g$ denote the density function of $\bar{X}$. Then
\[ g(\bar{x}; \theta_1) > k\, g(\bar{x}; \theta_0) \iff \bar{x} < \frac{\frac{2\sigma^2}{n} \log k - \theta_0^2 + \theta_1^2}{2(\theta_1 - \theta_0)}. \]
The test with rejection region $R = \{\bar{x} < c\}$ is the UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(\bar{X} < c)$. If a particular $\alpha$ is specified, then the UMP test rejects $H_0$ if
\[ \bar{X} < c = \theta_0 - \frac{\sigma}{\sqrt{n}} z_\alpha. \]

Proof of the Neyman-Pearson Lemma. For notational convenience, the proof is given for continuous variables; the proof for discrete variables is similar. First, since $\Theta_0$ contains only one point, any test satisfying $P_{\theta_0}(\mathbf{X} \in R) = \alpha$ is a size $\alpha$ test and hence a level $\alpha$ test. Let $R$ be the region defined by Equation (9.1) and $\mathbb{1}_R$ the indicator function of this region. Let $R'$ be any other region such that $P_{\theta_0}(\mathbf{X} \in R') \le \alpha$ (the critical region of a level $\alpha$ test). Let $\beta(\theta)$ and $\beta'(\theta)$ be the power functions corresponding to the rejection regions $R$ and $R'$ respectively. Because $0 \le \mathbb{1}_{R'}(\mathbf{x}) \le 1$, it follows that
\[ (\mathbb{1}_R(\mathbf{x}) - \mathbb{1}_{R'}(\mathbf{x}))\,(p(\mathbf{x}; \theta_1) - k\, p(\mathbf{x}; \theta_0)) \ge 0 \quad \forall \mathbf{x} \in \mathcal{X}, \]
since $\mathbb{1}_R(\mathbf{x}) - \mathbb{1}_{R'}(\mathbf{x}) \ge 0$ when $p(\mathbf{x}; \theta_1) > k\, p(\mathbf{x}; \theta_0)$ and $\mathbb{1}_R(\mathbf{x}) - \mathbb{1}_{R'}(\mathbf{x}) \le 0$ when $p(\mathbf{x}; \theta_1) < k\, p(\mathbf{x}; \theta_0)$. It follows that
\[ 0 \le \int (\mathbb{1}_R(\mathbf{x}) - \mathbb{1}_{R'}(\mathbf{x}))\,(p(\mathbf{x}; \theta_1) - k\, p(\mathbf{x}; \theta_0))\, d\mathbf{x} = \beta(\theta_1) - \beta'(\theta_1) - k\,(\beta(\theta_0) - \beta'(\theta_0)). \]
For the sufficiency statement, note that since the test with region $R'$ is a level $\alpha$ test and the test with region $R$ is a size $\alpha$ test, $\beta(\theta_0) - \beta'(\theta_0) = \alpha - \beta'(\theta_0) \ge 0$. It follows that
\[ 0 \le (\beta(\theta_1) - \beta'(\theta_1)) - k\,(\beta(\theta_0) - \beta'(\theta_0)) \le \beta(\theta_1) - \beta'(\theta_1), \]
so the test with critical region $R$ is the MP test. To prove the necessity statement, let $R'$ be the rejection region of any MP level $\alpha$ test.
The test satisfying Equation (9.1) is also an MP level $\alpha$ test, and therefore $\beta(\theta_1) = \beta'(\theta_1)$. It follows that $0 \le -k\,(\beta(\theta_0) - \beta'(\theta_0))$, so $\beta'(\theta_0) \ge \beta(\theta_0) = \alpha$; since $R'$ is the critical region of a level $\alpha$ test, $\beta'(\theta_0) \le \alpha$, hence $\beta'(\theta_0) = \alpha$ and $R'$ is the critical region of a size $\alpha$ test. It now follows that
\[ \int (\mathbb{1}_R(\mathbf{x}) - \mathbb{1}_{R'}(\mathbf{x}))\,(p(\mathbf{x}; \theta_1) - k\, p(\mathbf{x}; \theta_0))\, d\mathbf{x} = 0, \]
and since the integrand is non-negative, it is therefore zero almost everywhere. It follows that $\mathbb{1}_{R'}$ is the indicator of the region from Equation (9.1), except possibly on a set $A$ of $p(\cdot; \theta_i)$ measure $0$ for $i = 0, 1$. The theorem is proved.

9.5 Monotone Likelihood Ratio

The monotone likelihood ratio property, together with some further conditions, gives situations that admit a UMP test.

Definition 9.9. A family of p.d.f.s (probability density functions) or p.m.f.s (probability mass functions) $g(t; \theta)$ for a univariate random variable $T$ with real-valued parameter $\theta$ has monotone likelihood ratio (MLR) if, for every $\theta_2 > \theta_1$, $\frac{g(t; \theta_2)}{g(t; \theta_1)}$ is a monotone (non-increasing or non-decreasing) function of $t$ on $\{t : g(t; \theta_1) + g(t; \theta_2) > 0\}$, where $\frac{c}{0}$ is defined as $+\infty$ for $c \neq 0$.

Note that any density of the form
\[ g(t; \theta) = h(t)\, c(\theta) \exp\{w(\theta)\, t\} \]
satisfies MLR if $w(\theta)$ is a monotone function.

Theorem 9.10 (Karlin-Rubin). Consider testing $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of p.d.f.s (or p.m.f.s) of $T$ satisfies MLR. Then for any $t_0$, the test that rejects $H_0$ if and only if $T > t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T > t_0)$.

Proof. Let $\beta(\theta) = P_\theta(T > t_0)$ be the power function of the test. Fix $\theta' > \theta_0$ and consider the test $H_0' : \theta = \theta_0$ versus $H_1' : \theta = \theta'$. Since the family of p.d.f.s or p.m.f.s of $T$ satisfies the MLR property, $\beta(\theta)$ is non-decreasing. It follows that $\sup_{\theta \le \theta_0} \beta(\theta) = \beta(\theta_0) = \alpha$, hence this is a level $\alpha$ test. Let
\[ k' = \inf_{t \in \mathcal{T}} \frac{g(t; \theta')}{g(t; \theta_0)}, \qquad \mathcal{T} = \{t : t > t_0 \;\text{ and }\; g(t; \theta') + g(t; \theta_0) > 0\}. \]
Then
\[ \{T > t_0\} \iff \frac{g(T; \theta')}{g(T; \theta_0)} > k'. \]
It now follows from Corollary 9.8 that the power function $\beta^*$ of any other level $\alpha$ test of $H_0'$ satisfies $\beta(\theta') \ge \beta^*(\theta')$. Since $\beta^*(\theta_0) \le \sup_{\theta \in \Theta_0} \beta^*(\theta) \le \alpha$ for any level $\alpha$ test of $H_0$, it follows that $\beta(\theta') \ge \beta^*(\theta')$ for any level $\alpha$ test of $H_0$. Since $\theta' > \theta_0$ was arbitrary, the test is a UMP level $\alpha$ test.

Example 9.7 (Normal Random Sample, Variance Known, Testing the Mean). If $X_1, \ldots, X_n$ is a $N(\mu, \sigma^2)$ random sample, where $\sigma^2$ is known, then $T := \bar{X}$ is a sufficient statistic for $\mu$. Its density is $N(\mu, \frac{\sigma^2}{n})$ and the likelihood ratio is
\[ \frac{g(t; \mu_2)}{g(t; \mu_1)} = \exp\left\{ \frac{n}{2\sigma^2} (\mu_1 + \mu_2 - 2t)(\mu_1 - \mu_2) \right\}, \]
which clearly satisfies the MLR property. Consider the test $H_0 : \mu \ge \mu_0$ versus $H_1 : \mu < \mu_0$, with
\[ R_{\mathrm{crit}} = \left( -\infty,\; \mu_0 - \frac{\sigma z_\alpha}{\sqrt{n}} \right). \]
From the preceding result (applied with the inequalities reversed, since the alternative here is $\mu < \mu_0$), this is a UMP level $\alpha$ test. The power function is
\[ \beta(\mu) = P_\mu\left( \bar{X} < \mu_0 - \frac{\sigma z_\alpha}{\sqrt{n}} \right) = \Phi\left( \frac{\sqrt{n}(\mu_0 - \mu)}{\sigma} - z_\alpha \right), \]
which is clearly decreasing as a function of $\mu$, so that
\[ \beta(\mu_0) = \sup_{\mu \ge \mu_0} \beta(\mu) = \alpha. \]

The following simple example illustrates that a UMP test does not, in general, exist. Two tests are given which satisfy the necessary conditions for UMP, but they do not satisfy the sufficient conditions.

Example 9.8 (Two-sided normal, variance known). Let $X \sim N(\mu, \sigma^2)$, where $\sigma^2$ is known. Consider the test $H_0 : \mu = \mu_0$ versus $H_1 : \mu \neq \mu_0$. For a specified value of $\alpha$, a level $\alpha$ test is any test that satisfies $P_{\mu_0}(\text{reject } H_0) \le \alpha$. Consider the test: reject $H_0$ if $x < \mu_0 - \sigma z_\alpha$, where $x$ is the observed value of $X$. This test has the greatest power in the region $\{\mu : \mu < \mu_0\}$. By the necessity part of the Neyman-Pearson lemma, any other level $\alpha$ test with the same power at some $\mu < \mu_0$ has the same rejection region, up to a set of both $P_\mu$ and $P_{\mu_0}$ measure zero. Let $\beta$ denote the power function of this test. Now consider the test: reject $H_0$ if $x > \mu_0 + \sigma z_\alpha$. This is also a level $\alpha$ test. Let $\beta^*$ denote its power function; then it is clear that $\beta^*(\mu) > \beta(\mu)$ for all $\mu > \mu_0$.
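A numerical check of this claim (a sketch, with $\mu_0 = 0$, $\sigma = 1$ and $\alpha = 0.05$ chosen purely for illustration): each one-sided test attains level $\alpha$ at $\mu_0$, yet each is more powerful than the other on one side of $\mu_0$.

```python
import numpy as np
from scipy import stats

mu0, sigma, alpha = 0.0, 1.0, 0.05
z = stats.norm.ppf(1 - alpha)          # z_alpha, the upper-alpha quantile

def beta_left(mu):                     # power of: reject if x < mu0 - sigma*z
    return stats.norm.cdf((mu0 - mu) / sigma - z)

def beta_right(mu):                    # power of: reject if x > mu0 + sigma*z
    return stats.norm.cdf((mu - mu0) / sigma - z)

# Both tests have size alpha, but each dominates the other on one side of mu0,
# so neither is uniformly most powerful against H1: mu != mu0.
assert np.isclose(beta_left(mu0), alpha) and np.isclose(beta_right(mu0), alpha)
assert all(beta_left(m) > beta_right(m) for m in (-2.0, -1.0))
assert all(beta_right(m) > beta_left(m) for m in (1.0, 2.0))
```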
These two tests both satisfy the necessary (but not the sufficient) conditions to be UMP tests, and neither dominates the other; therefore no UMP test exists. To obtain sensible results, an additional requirement has to be added to the class of tests under consideration: tests should be unbiased.

Definition 9.11 (Unbiased Test). A statistical test of size (level) $\alpha$, $0 < \alpha < 1$, for testing $H_0 : \theta \in \Theta_0 \subset \Theta$ against the alternative $H_1 : \theta \in \Theta \setminus \Theta_0$ whose power function satisfies
\[ \begin{cases} \beta(\theta) \le \alpha & \theta \in \Theta_0 \\ \beta(\theta) \ge \alpha & \theta \in \Theta \setminus \Theta_0 \end{cases} \]
is said to be unbiased.

Exercise. In the example of $X \sim N(\mu, \sigma^2)$ where $\sigma^2$ is known, show that the test with critical region
\[ R_{\mathrm{crit}} = (-\infty, \mu_0 - \sigma z_{\alpha/2}] \cup [\mu_0 + \sigma z_{\alpha/2}, +\infty) \]
is UMP among the class of unbiased tests.

9.6 Union-Intersection and Intersection-Union Tests

A union-intersection test (UIT) is one where the null hypothesis is of the form
\[ H_0 : \theta \in \widetilde{\Theta} := \bigcap_{\gamma \in \Gamma} \Theta_\gamma \quad \text{versus} \quad H_1 : \theta \not\in \widetilde{\Theta}, \]
where $\Gamma$ is an indexing set and $\Theta_\gamma \subset \Theta$ for each $\gamma \in \Gamma$. Suppose that a test is available for each of the problems
\[ H_{0\gamma} : \theta \in \Theta_\gamma \quad \text{versus} \quad H_{1\gamma} : \theta \in \Theta \setminus \Theta_\gamma, \]
and suppose the rejection region for this test is $R_\gamma$. The rejection region for the test as a whole is $\bigcup_{\gamma \in \Gamma} R_\gamma$; that is, the null hypothesis is rejected if it is rejected for any one of the tests. Similarly, an intersection-union test (IUT) has a null hypothesis of the form
\[ H_0 : \theta \in \widetilde{\Theta} = \bigcup_{\gamma \in \Gamma} \Theta_\gamma \quad \text{versus} \quad H_1 : \theta \not\in \widetilde{\Theta}, \]
and, if the rejection region for test $\gamma$ is $R_\gamma$, then the rejection region for the test as a whole is $R := \bigcap_{\gamma \in \Gamma} R_\gamma$.

Let $\lambda_\gamma(\mathbf{x})$ be the LRT statistic for testing $H_{0\gamma} : \theta \in \Theta_\gamma$ versus $H_{1\gamma} : \theta \not\in \Theta_\gamma$, and consider the UIT of $H_0 : \theta \in \widetilde{\Theta} := \bigcap_{\gamma \in \Gamma} \Theta_\gamma$ versus $H_1 : \theta \not\in \widetilde{\Theta}$. The following result gives the relationship between the overall LRT and the UIT based on the $\lambda_\gamma(\mathbf{x})$.

Theorem 9.12. Consider testing $H_0 : \theta \in \widetilde{\Theta} := \bigcap_{\gamma \in \Gamma} \Theta_\gamma$ versus $H_1 : \theta \in \Theta \setminus \widetilde{\Theta}$, and let $\lambda(\mathbf{x})$ denote the LRT statistic for this test. Let $\lambda_\gamma(\mathbf{x})$ be the LRT statistic for $H_{0\gamma} : \theta \in \Theta_\gamma$ versus $H_{1\gamma} : \theta \in \Theta \setminus \Theta_\gamma$. Let
\[ T(\mathbf{x}) = \inf_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{x}). \]
Consider two tests: the first with rejection region $\{\mathbf{x} : T(\mathbf{x}) < c\}$ and the second with rejection region $\{\mathbf{x} : \lambda(\mathbf{x}) < c\}$, for a given $c$. Then:

1. $T(\mathbf{x}) \ge \lambda(\mathbf{x})$ for each $\mathbf{x}$;
2. if $\beta_T$ and $\beta_\lambda$ are the power functions of the UIT and LRT tests respectively, then $\beta_T(\theta) \le \beta_\lambda(\theta)$ for each $\theta \in \Theta$;
3. if the LRT is a level $\alpha$ test, then the UIT is a level $\alpha$ test.

Proof. From the definition of the LRT statistic for $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta \setminus \Theta_0$, namely $\lambda(\mathbf{x}) = \frac{\sup_{\theta \in \Theta_0} L(\theta; \mathbf{x})}{\sup_{\theta \in \Theta} L(\theta; \mathbf{x})}$, it is clear that $\lambda_\gamma(\mathbf{x}) \ge \lambda(\mathbf{x})$ for each $\gamma \in \Gamma$, since $\widetilde{\Theta} \subseteq \Theta_\gamma$ for each $\gamma \in \Gamma$. It follows that $T(\mathbf{x}) := \inf_{\gamma \in \Gamma} \lambda_\gamma(\mathbf{x}) \ge \lambda(\mathbf{x})$, proving the first part. The second part follows from the first, since $\{\mathbf{x} : T(\mathbf{x}) < c\} \subseteq \{\mathbf{x} : \lambda(\mathbf{x}) < c\}$. The third part follows directly from the second.

The following result for IUT tests is straightforward.

Theorem 9.13. Consider the test of $H_0 : \theta \in \widetilde{\Theta}$ versus $H_1 : \theta \in \Theta \setminus \widetilde{\Theta}$, where $\widetilde{\Theta} = \bigcup_{\gamma \in \Gamma} \Theta_\gamma$. Let $\alpha_\gamma$ be the size of the test of $H_{0\gamma} : \theta \in \Theta_\gamma$ versus $H_{1\gamma} : \theta \in \Theta \setminus \Theta_\gamma$, let $R_\gamma$ be the rejection region of this test, and let $R = \bigcap_{\gamma \in \Gamma} R_\gamma$ be the rejection region for $H_0$ versus $H_1$. Then the IUT with rejection region $R$ is a level $\alpha = \sup_{\gamma \in \Gamma} \alpha_\gamma$ test.

Proof. For any $\theta \in \widetilde{\Theta}$, there is some $\gamma \in \Gamma$ with $\theta \in \Theta_\gamma$, so that
\[ P_\theta(\mathbf{X} \in R) \le P_\theta(\mathbf{X} \in R_\gamma) \le \alpha_\gamma \le \alpha. \]

The following theorem gives conditions under which the size of the IUT is exactly $\alpha$.

Theorem 9.14. Consider the test of $H_0 : \theta \in \bigcup_{j=1}^k \Theta_j$, where $k$ is a finite positive integer, and let $R_j$ denote the rejection region of a level $\alpha$ test of $H_{0j} : \theta \in \Theta_j$. Suppose that there is an $i \in \{1, \ldots, k\}$ for which there is a sequence $(\theta_l \in \Theta_i : l = 1, 2, \ldots)$ such that

1. $\lim_{l \to +\infty} P_{\theta_l}(\mathbf{X} \in R_i) = \alpha$, and
2. for each $j \in \{1, \ldots, k\} \setminus \{i\}$, $\lim_{l \to +\infty} P_{\theta_l}(\mathbf{X} \in R_j) = 1$.

Then the IUT with rejection region $R = \bigcap_{j=1}^k R_j$ is a size $\alpha$ test.

Proof. By the previous result, $R$ gives a level $\alpha$ test:
\[ \sup_{\theta \in \widetilde{\Theta}} P_\theta(\mathbf{X} \in R) \le \alpha. \]
Furthermore,
\[ \sup_{\theta \in \widetilde{\Theta}} P_\theta(\mathbf{X} \in R) \ge \lim_{l \to +\infty} P_{\theta_l}(\mathbf{X} \in R) = \lim_{l \to +\infty} P_{\theta_l}\left( \mathbf{X} \in \bigcap_{j=1}^k R_j \right) = 1 - \lim_{l \to +\infty} P_{\theta_l}\left( \mathbf{X} \in \bigcup_{j=1}^k R_j^c \right) \ge 1 - \lim_{l \to +\infty} \sum_{j=1}^k P_{\theta_l}(\mathbf{X} \in R_j^c) = 1 - (1 - \alpha) = \alpha, \]
and hence equality holds.

9.7 p-Values

A p-value is a statistic that quantifies the evidence against the null hypothesis: for an observed value $\mathbf{x} \in \mathcal{X}$, small values of $p(\mathbf{x})$ indicate that, if the null hypothesis were true, an outcome at least as extreme as the one observed would be improbable. The precise definition is:

Definition 9.15 (p-value). A p-value $p(\mathbf{X})$ for a test of $H_0 : \theta \in \Theta_0$ is a test statistic $p : \mathcal{X} \to [0, 1]$ such that, for each $\theta \in \Theta_0$ and $\alpha \in [0, 1]$,
\[ P_\theta(p(\mathbf{X}) \le \alpha) \le \alpha. \]

A level $\alpha$ test may be constructed from a valid p-value; the test of $H_0 : \theta \in \Theta_0$ with rejection region $R_{\mathrm{crit}} = \{\mathbf{x} : p(\mathbf{x}) \le \alpha\}$ is a level $\alpha$ test. The following theorem gives the most common way of defining p-values.

Theorem 9.16. Let $W(\mathbf{X})$ be a test statistic for $H_0 : \theta \in \Theta_0$ versus $H_1 : \theta \in \Theta \setminus \Theta_0$, and suppose that the rejection region for $H_0$ takes the form $R_{\mathrm{crit}} = \{\mathbf{x} : W(\mathbf{x}) > c\}$ for some $c \in \mathbb{R}$. Define
\[ p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_\theta(W(\mathbf{X}) \ge W(\mathbf{x})). \]
Then $p(\mathbf{X})$ is a valid p-value.

Proof. Set $p_\theta(\mathbf{x}) := P_\theta(W(\mathbf{X}) \ge W(\mathbf{x}))$, so that $p(\mathbf{x}) = \sup_{\theta \in \Theta_0} p_\theta(\mathbf{x})$. Let $F_\theta(w) = P_\theta(W(\mathbf{X}) < w)$, so that $F_\theta(W(\mathbf{x})) = 1 - p_\theta(\mathbf{x})$. Since $F_\theta$ is non-decreasing, the random variable $F_\theta(W(\mathbf{X}))$ is stochastically no larger than a uniform variable on $[0, 1]$, so that $P_\theta(F_\theta(W(\mathbf{X})) \ge 1 - \alpha) \le \alpha$. Hence for $\theta \in \Theta_0$,
\[ P_\theta(p_\theta(\mathbf{X}) \le \alpha) = P_\theta(1 - F_\theta(W(\mathbf{X})) \le \alpha) = P_\theta(F_\theta(W(\mathbf{X})) \ge 1 - \alpha) \le \alpha. \]
Since $p(\mathbf{x}) \ge p_\theta(\mathbf{x})$, it follows that $P_\theta(p(\mathbf{X}) \le \alpha) \le P_\theta(p_\theta(\mathbf{X}) \le \alpha) \le \alpha$.

Example 9.9 (Normal: one-sided test of the mean). Let $\mathbf{X} = (X_1, \ldots, X_n)$ be a $N(\mu, \sigma^2)$ random sample. Consider the test of $H_0 : \mu \le \mu_0$ versus $H_1 : \mu > \mu_0$. The LRT rejects $H_0$ for large values of
\[ W(\mathbf{X}) = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}, \qquad S^2 = \frac{1}{n-1} \sum_{j=1}^n (X_j - \bar{X})^2. \]
Here, for $\mu \le \mu_0$,
\[ P_{(\mu, \sigma)}(W(\mathbf{X}) \ge W(\mathbf{x})) = P_{(\mu, \sigma)}\left( \frac{\bar{X} - \mu}{S/\sqrt{n}} \ge W(\mathbf{x}) + \frac{\mu_0 - \mu}{S/\sqrt{n}} \right) \le P(T \ge W(\mathbf{x})) = P_{(\mu_0, \sigma)}(W(\mathbf{X}) \ge W(\mathbf{x})), \]
where $T \sim t_{n-1}$.
Hence
\[ p(\mathbf{x}) = P(T \ge W(\mathbf{x})) = P\left( T \ge \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \right), \]
where $\mathbf{x} = (x_1, \ldots, x_n)$ denotes the observed random sample, $\bar{x} = \frac{1}{n} \sum_{j=1}^n x_j$ and $s^2 = \frac{1}{n-1} \sum_{j=1}^n (x_j - \bar{x})^2$.

p-values by conditioning on a sufficient statistic

Another way to define a p-value is to condition on a sufficient statistic $S(\mathbf{X})$. Let $W$ denote a test statistic for a test with critical region $R_{\mathrm{crit}} = \{\mathbf{x} : W(\mathbf{x}) \ge c\}$ for some $c$. For $\mathbf{x} \in \mathcal{X}$, let
\[ p(\mathbf{x} \mid s) = P(W(\mathbf{X}) \ge W(\mathbf{x}) \mid S(\mathbf{X}) = s). \]
By the definition of 'sufficient statistic', this conditional distribution does not depend on the parameter. This is particularly useful for discrete variables; if $p(\cdot \mid s)$ is a valid p-value for each $s$, then for any $\theta \in \Theta$,
\[ P_\theta(p(\mathbf{X}) \le \alpha) = \sum_s P(p(\mathbf{X}) \le \alpha \mid S(\mathbf{X}) = s)\, P_\theta(S = s) \le \alpha, \]
so the overall statistic is a valid p-value.

Example 9.10 (Fisher's Exact Test). Let $S_1$ and $S_2$ be two independent random variables, $S_1 \sim \mathrm{Binomial}(n_1, p_1)$ and $S_2 \sim \mathrm{Binomial}(n_2, p_2)$. Consider the test of $H_0 : p_1 = p_2$ versus $H_1 : p_1 > p_2$. The joint probability mass function under $H_0 : p_1 = p_2 = p$ is
\[ p_{S_1, S_2}(s_1, s_2; p) = \binom{n_1}{s_1} \binom{n_2}{s_2} p^{s_1 + s_2} (1 - p)^{(n_1 + n_2) - (s_1 + s_2)}, \]
so that $S = S_1 + S_2$ is a sufficient statistic under $H_0$. Conditional on $S$, the statistic $S_1$ may be used as the test statistic, since $S_2$ then gives no further information. The conditional distribution of $S_1$ given $S = s$ is hypergeometric:
\[ P(S_1 = k \mid S = s) = \frac{\binom{n_1}{k} \binom{n_2}{s - k}}{\binom{n_1 + n_2}{s}}. \]
Let $s_1$ denote the observed value of $S_1$. The test rejects $H_0$ for large values of $S_1$, so the conditional p-value is
\[ p((s_1, s_2)) = \sum_{j = s_1}^{n_1} P(S_1 = j \mid S = s). \]
The test defined by this p-value is known as Fisher's Exact Test.

9.8 Interval Estimator by Inverting a Test Statistic

There is a correspondence between hypothesis testing and interval estimation: every interval estimator corresponds to a test statistic and vice versa. This is the subject of the following theorem.

Theorem 9.17. For each $\theta_0 \in \Theta$, let $R(\theta_0)$ denote the critical region of a level $\alpha$ test of $H_0 : \theta = \theta_0$.
For each $\mathbf{x} \in \mathcal{X}$, let $C(\mathbf{x}) = \{\theta_0 : \mathbf{x} \not\in R(\theta_0)\}$. Then the set $C(\mathbf{X})$ is a $1 - \alpha$ confidence set. Conversely, let $C(\mathbf{X})$ be a $1 - \alpha$ confidence set and, for any $\theta_0 \in \Theta$, define $R(\theta_0) = \{\mathbf{x} : \theta_0 \not\in C(\mathbf{x})\}$. Then $R(\theta_0)$ is the critical region of a level $\alpha$ test of $H_0 : \theta = \theta_0$.

Proof. For the first part, for each $\theta \in \Theta$,
\[ \alpha \ge P_\theta(\mathbf{X} \in R(\theta)) = 1 - P_\theta(\theta \in C(\mathbf{X})) \;\Rightarrow\; P_\theta(\theta \in C(\mathbf{X})) \ge 1 - \alpha, \]
and hence $C(\mathbf{X})$ is a $1 - \alpha$ confidence set. For the second part, for all $\theta \in \Theta$,
\[ P_\theta(\mathbf{X} \in R(\theta)) = P_\theta(\theta \not\in C(\mathbf{X})) \le \alpha, \]
so that this is a level $\alpha$ test.

Example 9.11 (Inverting an LRT). Let $X_1, \ldots, X_n$ be an $\mathrm{Exp}(\lambda)$ random sample. Construct a confidence interval for $\lambda$ by inverting the LRT.

Solution. The sample space is $\mathcal{X} = \mathbb{R}_+^n$. Let $\mathbf{x} = (x_1, \ldots, x_n) \in \mathcal{X}$ denote an outcome and $\bar{x} = \frac{1}{n} \sum_{j=1}^n x_j$. For the test $H_0 : \lambda = \lambda_0$ versus $H_1 : \lambda \neq \lambda_0$, the LRT statistic is
\[ \lambda(\mathbf{x}) = \frac{\lambda_0^n e^{-\lambda_0 \sum_{j=1}^n x_j}}{\sup_\lambda \lambda^n e^{-\lambda \sum_{j=1}^n x_j}} = (\lambda_0 \bar{x})^n e^{n - n\lambda_0 \bar{x}}, \]
since the supremum in the denominator is attained at $\hat{\lambda} = 1/\bar{x}$. For a fixed $\lambda_0$, the critical region is
\[ R(\lambda_0) = \left\{ \mathbf{x} : \lambda_0 \bar{x}\, e^{-\lambda_0 \bar{x}} < k^* \right\}, \]
where $k^*$ is a constant chosen to satisfy $P_{\lambda_0}(\mathbf{X} \in R(\lambda_0)) = \alpha$. The inversion method gives a confidence set
\[ C(\mathbf{x}) = \left\{ \lambda : \lambda \bar{x}\, e^{-\lambda \bar{x}} \ge k^* \right\}. \]
The set depends on $\mathbf{x}$ only through $\bar{x}$:
\[ C(\mathbf{x}) = \left[ \frac{a}{\bar{x}}, \frac{b}{\bar{x}} \right], \qquad a e^{-a} = b e^{-b} = k^*. \]
By taking logarithms, it is clear that the equation $y e^{-y} = k^*$ has no solutions for $k^* > e^{-1}$, one solution ($y = 1$) for $k^* = e^{-1}$, and two solutions for $k^* \in (0, e^{-1})$.

Recall that $2\lambda \sum_{j=1}^n X_j \sim \chi^2_{2n}$. Let $F$ be the c.d.f. of the $\chi^2_{2n}$ distribution; then
\[ 1 - \alpha = P_\lambda\left( 2na \le 2\lambda \sum_{j=1}^n X_j \le 2nb \right) = F(2nb) - F(2na), \]
and solutions may be obtained numerically by finding $a$ and $b$ which satisfy
\[ \begin{cases} F(2nb) - F(2na) = 1 - \alpha \\ a e^{-a} = b e^{-b}. \end{cases} \]
This does not give an equal-tailed confidence interval; the exact symmetric (equal-tailed) confidence interval for $\lambda$ is
\[ \left[ \frac{k_{2n, 1 - \alpha/2}}{2n\bar{X}},\; \frac{k_{2n, \alpha/2}}{2n\bar{X}} \right], \]
where $k_{2n, \beta}$ is the value such that $1 - F(k_{2n, \beta}) = \beta$.
The LRT interval has the advantage over the equal-tailed interval that the parameter values it contains are those with the largest likelihood ratios. On the other hand, except in a few particular examples it cannot be computed explicitly and requires numerical approximation.