Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
19. Sample correlation coefficient Lehmann §5.4; Ferguson §8 Suppose that (X1 , Y1 ), (X2 , Y2 ), . . . are iid vectors with E Xi4 < ∞ and E Yi4 < ∞. For the sake of simplicity, we will assume without loss of generality that E Xi = E Yi = 0 (alternatively, we could base all of the following derivations on the centered versions of the random variables). We wish to find the asymptotic distribution of the sample correlation r = sxy /(sx sy ), where if we let Pn mx Xi Pi=1 n my Pni=1 Yi2 mxx = 1 Xi , Pi=1 n n 2 myy Y Pni=1 i mxy i=1 Xi Yi (37) then s2x = mxx − m2x , s2y = myy − m2y , and sxy = mxy − mx my . (38) Notice that we have suppressed the n in the notation above in order to keep things slightly simpler. According to the central limit theorem, 0 mx Cov (X1 , X1 ) ··· Cov (X1 , X1 Y1 ) Cov (Y1 , X1 ) · · · Cov (Y , X Y ) 1 1 1 √ my 02 L n mxx − σx → N5 0, . (39) .. . .. .. . . myy σy2 Cov (X1 Y1 , X1 ) · · · Cov (X1 Y1 , X1 Y1 ) mxy σxy Let Σ denote the covariance matrix in expression (39). Define a function g : R5 → R3 such that g applied to the vector of moments in equation (37) yields the vector (s2x , s2y , sxy ) as defined in expression (38). Then a b −2a 0 1 0 0 0 −2b 0 1 0 . ġ c = d −b −a 0 0 1 e Therefore, if we let 0 0 0 0 2 2 Σ∗ = ġ σx2 Σġ σx2 σy σy σxy σxy t Cov (X12 , Y12 ) Cov (X12 , X1 Y1 ) Cov (X12 , X12 ) = Cov (Y12 , X12 ) Cov (Y12 , Y12 ) Cov (Y12 , X1 Y1 ) , 2 2 Cov (X1 Y1 , X1 ) Cov (X1 Y1 , Y1 ) Cov (X1 Y1 , X1 Y1 ) then by the delta method, 2 2 σx sx √ L n s2y − σy2 → N3 (0, Σ∗ ). sxy σxy √ √ √ √ Finally, define the function h(a, b, c) = c/ ab so that h(s2x , s2y , sxy ) = r. Then ḣ(a, b, c) = 12 (−c/ a3 b, −c/ ab3 , 2/ ab), so that 2 σx −σxy −σxy 1 −ρ −ρ 1 = . (40) ḣ σy2 = , , , , 2σx3 σy 2σx σy3 σx σy 2σx2 2σy2 σx σy σxy 52 Therefore, if A denotes the 1 × 3 matrix in equation (40), using the delta method once again yields √ L n(r − ρ) → N (0, AΣ∗ At ). Consider the special case of bivariate normal (Xi , Yi ). In this case, we may derive 2σx4 2ρ2 σx2 σy2 2ρσx3 σy . 2σy2 2ρσx σy3 Σ∗ = 2ρ2 σx2 σy2 2ρσx3 σy 2ρσx σy3 (1 + ρ2 )σx2 σy2 (41) In this case, AΣ∗ At = (1 − ρ2 )2 , which implies that √ L n(r − ρ) → N {0, (1 − ρ2 )2 }. (42) In the normal case, we may derive a variance-stabilizing transformation. According to equation (42), we should find a function f (x) satisfying f 0 (x) = (1 − x2 )−1 . Since 1 1 1 = + , 1 − x2 2(1 − x) 2(1 + x) which is easy to integrate, we obtain f (x) = 1 1+x log . 2 1−x This is called Fisher’s transformation; we conclude that √ 1 1+r 1 1+ρ L n log − log → N (0, 1). 2 1−r 2 1−ρ Problems Problem 19.1 Verify expressions (41) and (42). Problem 19.2 Assume (X1 , Y1 ), . . . , (Xn , Yn ) are iid from some bivariate normal distribution. Let ρ denote the population correlation coefficient and r the sample correlation coefficient. (a) Describe a test of H0 : ρ = 0 against H1 : ρ 6= 0 based on the fact that √ L n[f (r) − f (ρ)] → N (0, 1), where f (x) is Fisher’s transformation f (x) = (1/2) log[(1 + x)/(1 − x)]. Use α = .05. (b) Based on 5000 repetitions each, estimate the actual level for this test in the case when E (Xi ) = E (Yi ) = 0, Var (Xi ) = Var (Yi ) = 1, and n ∈ {3, 5, 10, 20}. Problem 19.3 Suppose that X and Y are jointly distributed such that X and Y are Bernoulli (1/2) random variables with P (XY = 1) = θ for θ ∈ (0, 1/2). Let (X1 , Y1 ), (X2 , Y2 ), . . . be iid with (Xi , Yi ) distributed as (X, Y ). √ (a) Find the asymptotic distribution of n (X n , Y n ) − (1/2, 1/2) . (b) If rn is the √ sample correlation coefficient for a sample of size n, find the asymptotic distribution of n(rn − ρ). 53 (c) Find a variance stabilizing transformation for rn . (d) Based on your answer to part (c), construct a 95% confidence interval for θ. (e) For each combination of n ∈ {5, 20} and θ ∈ {.05, .25, .45}, estimate the true coverage probability of the confidence interval in part (d) by simulating 5000 samples and the corresponding confidence intervals. Hint: To generate a sample of (X, Y ), first simulate the X’s from their marginal distribution, then simulate the Y ’s according to the conditional distribution of Y given X. To obtain this conditional distribution, simply find P (Y = 1 | X = 1) and P (Y = 1 | X = 0) using the definition of conditional probability. 54