U.C. Berkeley — Stat 135: Concepts of Statistics
Professor: Noureddine El Karoui
Homework 5, due March 15, 2007

Solutions for Homework 5
Problems from Chapter 9: 12, 13, 24, 30, 32, 36, 43, 57

Problem 9.12

We know the MLE is θ̂ = 1/X̄ in this situation. Writing Σᵢ Xᵢ = nX̄, the generalized likelihood ratio statistic is

Λ = [∏ᵢ₌₁ⁿ θ₀ exp(−θ₀Xᵢ)] / [∏ᵢ₌₁ⁿ (1/X̄) exp(−Xᵢ/X̄)]
  = θ₀^n exp(−θ₀nX̄) / [(1/X̄)^n exp(−n)]
  = (e θ₀ X̄ exp(−θ₀X̄))^n.

Since n, e, and θ₀ are all positive, Λ is small precisely when X̄ exp(−θ₀X̄) is small. Since the generalized likelihood ratio test rejects when Λ < c₀, this is equivalent to rejecting when X̄ exp(−θ₀X̄) ≤ c.

Problem 9.13

Recall that in this problem θ₀ = 1, so the results of 9.12 apply with θ₀ replaced by 1 everywhere.

a Define f(x) = x exp(−x). Based on 9.12, our test rejects H₀ when f(X̄) is small. From Figure 1 it is clear that this happens precisely when X̄ < x₀(c) or X̄ > x₁(c). To show this a bit more analytically, observe that f′(x) = exp(−x)(1 − x), which is positive for x ∈ (0, 1) and negative for x > 1. Thus f is strictly increasing on (0, 1) and strictly decreasing on (1, ∞). Further, f(0) = 0 and f(x) → 0 as x → ∞. This, plus the continuity of f, implies that for any c ∈ (0, f(1)), there are exactly two solutions to f(x) = c, and that {x : f(x) ≤ c} is of the desired form. (See Figure 1: there are two solutions because the line y = c cuts the graph of f exactly twice when c < 1/e; note that 1/e = f(1) is the maximum of f.)

[Figure 1 (Question 13a): the graph of f(x) = x exp(−x) for 0 ≤ x ≤ 6, with the horizontal line y = c = .1 cutting it at x₀ and x₁.]

b In the Neyman–Pearson framework, the rejection region should be such that P₀(reject H₀) = α. From part a we see that choosing a rejection region for Λ is equivalent to choosing a rejection region of the form [0, c] for f(X̄). We should now choose this c to give the correct probability; here α = .05.
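As a quick numerical check of part a (a sketch in Python, though the computations elsewhere in these solutions use R), the two cut points x₀(c) and x₁(c) in Figure 1 can be found by bisection on the two monotone branches of f:

```python
import math

def f(x):
    """f(x) = x * exp(-x), the test-statistic function from Problem 9.13a."""
    return x * math.exp(-x)

def bisect(lo, hi, target, increasing, tol=1e-12):
    """Solve f(x) = target on [lo, hi], where f is monotone on that interval."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

c = 0.1  # the cutoff level drawn in Figure 1
# f increases on (0, 1) and decreases on (1, oo), with maximum f(1) = 1/e,
# so f(x) = c has exactly one root on each branch when 0 < c < 1/e.
x0 = bisect(0.0, 1.0, c, increasing=True)    # root on the increasing branch
x1 = bisect(1.0, 50.0, c, increasing=False)  # root on the decreasing branch
print(x0, x1)
```

For c = .1 this gives x₀ ≈ 0.11 and x₁ ≈ 3.58, and any 0 < c < 1/e ≈ 0.368 yields exactly two roots, matching the argument above.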
c On page 147 we see that for independent X ∼ Γ(α₁, λ) and Y ∼ Γ(α₂, λ), the sum X + Y is distributed as Γ(α₁ + α₂, λ). It is also true that, for a positive constant κ, κX ∼ Γ(α₁, λ/κ). (Apply Proposition B on page 60 with g(x) = κx.) Observe also that X ∼ Exp(1) ⟺ X ∼ Γ(1, 1). Thus under H₀, Σᵢ Xᵢ ∼ Γ(n, 1) and X̄ ∼ Γ(n, n). Let F be the CDF of Γ(n, n). Then for any c ∈ (0, 1/e), we can numerically solve f(x) = c to get x₀(c) and x₁(c), and

α(c) = P₀(X̄ exp(−X̄) ≤ c) = P₀(X̄ ∈ [0, x₀(c)] ∪ [x₁(c), ∞)) = F(x₀(c)) + 1 − F(x₁(c)).

To obtain a specific α, we would now need to have the computer try a range of values for c until α(c) came out close to the desired value.

d We could repeatedly generate sets of n = 10 independent Exp(1) variables, and for each such set compute Wᵢ ≡ X̄ᵢ exp(−X̄ᵢ). Given B such Wᵢ, let i* = ⌊αB⌋. Then the order statistic W₍ᵢ*₎ (basically our estimate of the α quantile of the test statistic) provides a good approximation to c, particularly for very large B.

Problem 9.24

a In this case, we can write out the likelihood under the null hypothesis explicitly:

L(1/2) = (n choose X) (1/2)^X (1/2)^(n−X) = (n choose X) (1/2)^n.

Furthermore, we know that the maximum likelihood estimator of p for the binomial distribution is X/n. This means we can write the generalized likelihood ratio as

Λ = [(n choose X) (1/2)^n] / [(n choose X) (X/n)^X (1 − X/n)^(n−X)] = (1/2)^n n^n / [X^X (n − X)^(n−X)] = (n/2)^n / [X^X (n − X)^(n−X)].

b First, let g(x) = 1/(x^x (n − x)^(n−x)). Notice that this is the non-constant part of the generalized likelihood ratio. Notice also that g(x) = g(n − x) (try plugging in n − x to see why this is true), which means that g is symmetric about n/2. Letting y = x − n/2, we have g(n/2 + y) = g(n/2 − y). Without loss of generality, we will consider g(n/2 + y) for y ≥ 0; an equivalent argument holds for g(n/2 − y). Call h(y) = log(g(n/2 + y)). We want to show that h is non-increasing.
This will mean that as y gets bigger, h(y) gets smaller. This is what we want to show since, as y gets bigger, so does x − n/2, and as h gets smaller, so does our likelihood ratio statistic. Since a symmetric argument holds for g(n/2 − y) (which is non-increasing in y for y ≥ 0), we know that if h is indeed non-increasing for y ≥ 0, then as |y| gets big our likelihood ratio gets small, which corresponds to rejecting the null hypothesis. To show that h(y) is non-increasing, we see first that

h(y) = log(g(n/2 + y)) = −(n/2 + y) log(n/2 + y) − (n/2 − y) log(n/2 − y),

and therefore that

h′(y) = log(n/2 − y) − log(n/2 + y) ≤ 0

since y ≥ 0 and log is an increasing function. This means that h(y) is a non-increasing function of y, as we wanted to show.

Note that this test statistic and rejection region make intuitive sense. Under the null hypothesis E(X) = n/2, so by the weak law of large numbers, the farther X deviates from n/2, the more intuitive "evidence" we have against the null.

c To find the significance level for a test with rejection region |X − n/2| > k, we want to compute

α = P₀(|X − n/2| > k) = P₀(X − n/2 < −k or X − n/2 > k) = P₀(X < n/2 − k) + P₀(X > n/2 + k).

The last equality holds because the events {X < n/2 − k} and {X > n/2 + k} are disjoint. Since our null distribution is Bin(n, 1/2), we can compute these probabilities explicitly for any values of n and k.

d Given that n = 10 and k = 2, our null distribution is Bin(10, 1/2) and we want to calculate

α = P₀(X < 3 or X > 7) = P₀(X ∈ {0, 1, 2, 8, 9, 10}).

We can compute this using a binomial table or R to get α = 112/1024 ≈ .109.

e In this case, our null distribution is Bin(100, 1/2) and we want to find

α = P₀(X < 40 or X > 60) = P₀(X < 40) + P₀(X > 60).

Here we have E(X) = 50 and Var(X) = np(1 − p) = 25.
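For parts d and e, the exact tail probabilities can be computed directly from the Bin(n, 1/2) pmf. A minimal Python sketch (illustrative; the course computations used R):

```python
from math import comb

def binom_pmf(n, k, p=0.5):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Part d: n = 10, reject when X < 3 or X > 7,
# i.e. when X is in {0, 1, 2, 8, 9, 10}.
alpha_d = sum(binom_pmf(10, k) for k in (0, 1, 2, 8, 9, 10))

# Part e: n = 100, reject when X < 40 or X > 60 (exact tails,
# for comparison with the normal approximation used below).
alpha_e = sum(binom_pmf(100, k) for k in range(40)) \
        + sum(binom_pmf(100, k) for k in range(61, 101))

print(alpha_d, alpha_e)
```

The exact level in part e comes out somewhat below the normal-approximation value 0.0455, mainly because the plain CLT approximation ignores the continuity correction.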
We can use the normal approximation (recall that we can view X as X = Σᵢ₌₁ⁿ Yᵢ, where the Yᵢ are i.i.d. Bernoulli(p), so the central limit theorem applies when n is large), writing

P₀(X < 40) = P₀((X − 50)/5 < (40 − 50)/5) ≈ Φ(−2) = 0.0228.

By symmetry, we also have P₀(X > 60) ≈ 0.0228, and our significance level is therefore approximately 0.0455.

Problem 9.30

Part A We reject the null when T > c, where c is determined by the significance level α:

Pr₀(reject null) = Pr₀(T > c) = 1 − F₀(c) = α.

Suppose we observe T = t₀ and we want to find the p-value. As explained in Section 9.2.1, the p-value is the probability under the null of getting something as or more extreme than what was observed. More formally:

1. Assume T′ is a new independent copy of T.
2. The p-value is then Pr₀(T′ ≥ t₀).

Now,

Pr₀(T′ ≥ t₀) = 1 − F₀(t₀) + P(T′ = t₀) = 1 − F₀(t₀),

where the last step follows from the assumption that T is continuous. This implies that the p-value is given by V = 1 − F(T), writing F for the null CDF F₀.

Part B First note that if X is uniform on (0, 1), then 1 − X is as well. So if we can conclude that F(T) is uniform on (0, 1), we will be done. We would like to argue as in Proposition C of Section 2.3, but F⁻¹ might not be well defined. In general, F(x) may have jumps and may be flat in some regions, in which case the definition of F⁻¹ is unclear. For this problem T is a continuous random variable, so we do not have to worry about jumps. However, if the density of T is zero on some interval and positive on both sides, then the CDF has a flat region where an inverse is not well defined. Define

F⁻¹(p) = min{x : F(x) ≥ p}.

We note the following properties:

1. If F is strictly increasing at p, then this coincides with the regular definition.
2. F⁻¹ is non-decreasing.
3. F(F⁻¹(z)) = z for all z ∈ (0, 1).

Let A = {t : f_T(t) > 0} and let B = {ω : T(ω) ∈ A}. Then P(B) = 1, so

Pr(F(T) ≤ z) = Pr(F⁻¹(F(T)) ≤ F⁻¹(z)) = Pr({F⁻¹(F(T)) ≤ F⁻¹(z)} ∩ B).

The first equality follows by property 2 above, and the second follows since P(B) = 1. Now, if ω ∈ B, then by property 1 above F⁻¹(F(T(ω))) = T(ω). So

Pr({F⁻¹(F(T)) ≤ F⁻¹(z)} ∩ B) = Pr({T ≤ F⁻¹(z)} ∩ B) = Pr(T ≤ F⁻¹(z)) = F(F⁻¹(z)) = z,

where the last equality follows by property 3.

Extra comment (not part of the problem): for those of you who have taken analysis, the definition of the quantile function is F⁻¹(p) = inf{x : F(x) ≥ p}.

Part C Since the p-value is uniform on (0, 1), this probability is .9.

Part D This follows since Pr₀(V < α) = Pr(U < α) = α, where U is uniform on (0, 1).

Problem 9.32

Part A The likelihood ratio is

f_A(x)/f_B(x) = exp(−(x − 100)²/(2·25²) + (x − 125)²/(2·25²)) = exp(4.5) · exp(−x/25).

If we observe x = 120, then we get exp(4.5 − 4.8) = exp(−.3) ≈ .74.

Part B If our priors are equal, then the ratio of posteriors is simply the likelihood ratio (see page 330). So if β is the posterior probability of alternative B, then we know that

(1 − β)/β = .74,

which implies that

β = 1/(1 + .74) ≈ .57.

Part C This is the probability under A that X > 125. But under A, X = 100 + 25Z in distribution, where Z is standard normal. Therefore

α = Pr_A(X > 125) = Pr(100 + 25Z > 125) = Pr(Z > 1) = 1 − Φ(1) = .1587.

Part D This is the probability under B that X > 125. But under B, X = 125 + 25Z in distribution, where Z is standard normal. Therefore

Power = Pr_B(X > 125) = Pr(125 + 25Z > 125) = Pr(Z > 0) = .5.

Part E This is the probability under A that X > 120. But under A, X = 100 + 25Z in distribution, where Z is standard normal. Therefore

α = Pr_A(X > 120) = Pr(100 + 25Z > 120) = Pr(Z > .8) = 1 − Φ(.8) = .2119.

Problem 9.36

Let Xᵢ be the number of suicides during month i. The model is that the Xᵢ are cell counts of a multinomial with n = 23,480 total suicides and cell probabilities pᵢ.
We are testing H₀: pᵢ = nᵢ/365 for all i versus H_A: pᵢ ≠ nᵢ/365 for some i, where nᵢ is the number of days in month i. Under the null distribution, the expected number of suicides in month i is nᵢ × 23,480/365. Construct the following table; the last column gives the signed contribution ±(O − E)²/E, carrying the sign of O − E, and the Total row gives the unsigned sum Σ(O − E)²/E.

Month   Days   Observed   Expected   ±(O − E)²/E
Jan.     31      1867       1994        −8.11
Feb.     28      1789       1801        −0.0827
Mar.     31      1944       1994        −1.26
Apr.     30      2094       1930       +14.0
May      31      2097       1994        +5.30
Jun.     30      1981       1930        +1.36
Jul.     31      1887       1994        −5.76
Aug.     31      2024       1994        +0.446
Sep.     30      1928       1930        −0.00180
Oct.     31      2032       1994        +0.717
Nov.     30      1978       1930        +1.20
Dec.     31      1859       1994        −9.17
Total   365     23480      23480        47.4

The bottom-right entry of the table is the observed χ² statistic. Its null distribution is approximately chi-square with 11 degrees of freedom (11 free parameters under the alternative, and 0 estimated parameters under the null). The p-value is practically 0. A hanging chi-gram, a plot of the Pearson residuals (O − E)/√E, accompanies the table.

[Figure: hanging chi-gram of the Pearson residuals (O − E)/√E by month, Jan. through Dec., on a vertical scale from −3 to 3.]

There is an unusually high number of suicides in April, and unusually low numbers in January and December. If there is a seasonal trend, it appears to reverse around July.

Problem 9.43

Part A From Problem 24, we reject the null hypothesis when |X − n/2| is large. We use the normal approximation

(X − n/2)/√(n/4) ≈ Z,

where Z is standard normal. Now,

Pr₀(|X − n/2| > c) ≈ Pr(|Z| > c/√(n/4)) = Φ(−c/√(n/4)) + 1 − Φ(c/√(n/4)).

To get the p-value for the observation X = 9207, we plug in c = 9207 − 17950/2 = 232 above. Then c/√(n/4) = 3.46, so Φ(c/√(n/4)) = .9997, which implies that our p-value is 2(1 − .9997) = .0006. The model is extremely doubtful.

Parts B and C Here there are 17950/5 = 3590 trials, and each trial produces a number of heads between 0 and 5. The numerator of the generalized likelihood ratio statistic will be the likelihood evaluated at either p = 1/2 or p = p̂_MLE = 9207/17950 = .5129.
From these probabilities we will get the expected counts for the cells. If we assume that all 5 coins have probability p of heads, then the cell probabilities are

(1 − p)⁵, 5p(1 − p)⁴, 10p²(1 − p)³, 10p³(1 − p)², 5p⁴(1 − p), p⁵.

We find our expected counts as follows, in R:

> N = 3590
> p = .5
> EXP1 = N * c( (1-p)^5, 5*p*(1-p)^4, 10*p^2*(1-p)^3, 10*p^3*(1-p)^2, 5*p^4*(1-p), p^5 )
> p = 9207/17950
> EXP2 = N * c( (1-p)^5, 5*p*(1-p)^4, 10*p^2*(1-p)^3, 10*p^3*(1-p)^2, 5*p^4*(1-p), p^5 )
> EXP1
[1] 112.1875 560.9375 1121.8750 1121.8750 560.9375 112.1875
> EXP2
[1] 98.4180 518.2058 1091.4150 1149.3375 605.1670 127.4568

And the resulting chi-square statistics are

> OBS = c(100, 524, 1080, 1126, 655, 105)
> sum((OBS-EXP1)^2/EXP1)
[1] 21.56813
> sum((OBS-EXP2)^2/EXP2)
[1] 8.743703

In the first case the dimension of the null is 0, while in the second it is 1, so under the null these statistics are approximately chi-square with 5 and 4 degrees of freedom, respectively. Consulting chi-square tables, the first result is very unlikely under the null (p < .001 on 5 degrees of freedom), while the second is borderline (p ≈ .07 on 4 degrees of freedom). The observed and expected counts are summarized below.

# Heads   Frequency   Expected (p = 1/2)   Expected (p̂ = .5129)
   0         100           112.19                 98.42
   1         524           560.94                518.21
   2        1080          1121.88               1091.42
   3        1126          1121.88               1149.34
   4         655           560.94                605.17
   5         105           112.19                127.46
Total       3590          3590.00               3590.00

Problem 9.57

The tails of the Cauchy distribution decay less rapidly than those of the normal distribution, i.e., it has heavy tails. We can see this by comparing their density functions. The normal distribution has density function

φ(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²)).

Its tails decay like exp(−x²/(2σ²)). However, the tails of the Cauchy density, 1/[π(1 + x²)], decay algebraically, like x⁻². (Exponential-type decay is much more rapid than algebraic decay.) This means that the upper quantiles of the Cauchy are greater than those of the normal distribution, and similarly the lower quantiles of the Cauchy are smaller than those of the normal distribution.
This is manifested in the normal probability plot by a systematic departure from a straight line at the ends, curving below on the left and above on the right.
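The quantile comparison can be made concrete with a short Python sketch (illustrative, not part of the original solution). The standard Cauchy quantile function has the closed form F⁻¹(p) = tan(π(p − 1/2)), and the standard normal quantile is available from Python's statistics.NormalDist:

```python
import math
from statistics import NormalDist

def cauchy_quantile(p):
    """Quantile function of the standard Cauchy: F^{-1}(p) = tan(pi*(p - 1/2))."""
    return math.tan(math.pi * (p - 0.5))

norm = NormalDist()  # standard normal

# Compare upper quantiles: the Cauchy's grow much faster, which is
# what bends a normal probability plot of Cauchy data below the
# line on the left and above it on the right.
for p in (0.90, 0.975, 0.999):
    print(p, norm.inv_cdf(p), cauchy_quantile(p))
```

At p = .999 the Cauchy quantile is already in the hundreds while the normal quantile is near 3.1; by symmetry the lower quantiles diverge in the same way, which is exactly the tail behavior the normal probability plot picks up.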