19. Sample correlation coefficient
Lehmann §5.4; Ferguson §8
Suppose that $(X_1, Y_1), (X_2, Y_2), \ldots$ are iid vectors with $E X_i^4 < \infty$ and $E Y_i^4 < \infty$. For the sake of simplicity, we will assume without loss of generality that $E X_i = E Y_i = 0$ (alternatively, we could base all of the following derivations on the centered versions of the random variables).
We wish to find the asymptotic distribution of the sample correlation $r = s_{xy}/(s_x s_y)$, where if we let
$$
\begin{pmatrix} m_x \\ m_y \\ m_{xx} \\ m_{yy} \\ m_{xy} \end{pmatrix}
= \frac{1}{n}
\begin{pmatrix} \sum_{i=1}^n X_i \\ \sum_{i=1}^n Y_i \\ \sum_{i=1}^n X_i^2 \\ \sum_{i=1}^n Y_i^2 \\ \sum_{i=1}^n X_i Y_i \end{pmatrix},
\tag{37}
$$
then
$$
s_x^2 = m_{xx} - m_x^2, \qquad s_y^2 = m_{yy} - m_y^2, \qquad \text{and} \qquad s_{xy} = m_{xy} - m_x m_y.
\tag{38}
$$
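As a concrete illustration, here is a minimal computational sketch of (37) and (38) (assuming NumPy; the simulated array `xy` is just a hypothetical stand-in for data):

```python
import numpy as np

# Hypothetical data: n iid pairs (X_i, Y_i); any (n, 2) array would do.
rng = np.random.default_rng(0)
xy = rng.standard_normal((1000, 2))
x, y = xy[:, 0], xy[:, 1]

# The five sample moments of equation (37).
m_x, m_y = x.mean(), y.mean()
m_xx, m_yy, m_xy = (x * x).mean(), (y * y).mean(), (x * y).mean()

# Equation (38): sample variances and covariance (no n/(n - 1) correction).
s2_x = m_xx - m_x ** 2
s2_y = m_yy - m_y ** 2
s_xy = m_xy - m_x * m_y

r = s_xy / np.sqrt(s2_x * s2_y)  # sample correlation coefficient
print(r)
```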
Notice that we have suppressed the $n$ in the notation above in order to keep things slightly simpler. According to the central limit theorem,
$$
\sqrt{n}\left[
\begin{pmatrix} m_x \\ m_y \\ m_{xx} \\ m_{yy} \\ m_{xy} \end{pmatrix}
-
\begin{pmatrix} 0 \\ 0 \\ \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}
\right]
\stackrel{L}{\to}
N_5\left(0,
\begin{pmatrix}
\operatorname{Cov}(X_1, X_1) & \cdots & \operatorname{Cov}(X_1, X_1 Y_1) \\
\operatorname{Cov}(Y_1, X_1) & \cdots & \operatorname{Cov}(Y_1, X_1 Y_1) \\
\vdots & \ddots & \vdots \\
\operatorname{Cov}(X_1 Y_1, X_1) & \cdots & \operatorname{Cov}(X_1 Y_1, X_1 Y_1)
\end{pmatrix}
\right).
\tag{39}
$$
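Since we have assumed $E X_i = E Y_i = 0$, the entries of this covariance matrix are central moments of $(X_1, Y_1)$; for instance,
$$
\operatorname{Cov}(X_1, X_1) = \sigma_x^2, \qquad \operatorname{Cov}(X_1, X_1 Y_1) = E(X_1^2 Y_1), \qquad \operatorname{Cov}(X_1^2, X_1^2) = E X_1^4 - \sigma_x^4,
$$
so the moment conditions $E X_i^4 < \infty$ and $E Y_i^4 < \infty$ ensure that every entry is finite.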
Let $\Sigma$ denote the covariance matrix in expression (39). Define a function $g: \mathbb{R}^5 \to \mathbb{R}^3$ such that $g$ applied to the vector of moments in equation (37) yields the vector $(s_x^2, s_y^2, s_{xy})$ as defined in expression (38). Then
$$
\dot{g}\begin{pmatrix} a \\ b \\ c \\ d \\ e \end{pmatrix}
= \begin{pmatrix}
-2a & 0 & 1 & 0 & 0 \\
0 & -2b & 0 & 1 & 0 \\
-b & -a & 0 & 0 & 1
\end{pmatrix}.
$$
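For concreteness, one explicit choice of $g$ consistent with (37) and (38) is
$$
g(a, b, c, d, e) = \left(c - a^2,\; d - b^2,\; e - ab\right),
$$
since then $g(m_x, m_y, m_{xx}, m_{yy}, m_{xy}) = (s_x^2, s_y^2, s_{xy})$, and differentiating coordinate by coordinate gives the $3 \times 5$ Jacobian displayed above.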
Therefore, if we let
$$
\Sigma^* = \dot{g}\begin{pmatrix} 0 \\ 0 \\ \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}
\Sigma\,
\dot{g}\begin{pmatrix} 0 \\ 0 \\ \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}^t
= \begin{pmatrix}
\operatorname{Cov}(X_1^2, X_1^2) & \operatorname{Cov}(X_1^2, Y_1^2) & \operatorname{Cov}(X_1^2, X_1 Y_1) \\
\operatorname{Cov}(Y_1^2, X_1^2) & \operatorname{Cov}(Y_1^2, Y_1^2) & \operatorname{Cov}(Y_1^2, X_1 Y_1) \\
\operatorname{Cov}(X_1 Y_1, X_1^2) & \operatorname{Cov}(X_1 Y_1, Y_1^2) & \operatorname{Cov}(X_1 Y_1, X_1 Y_1)
\end{pmatrix},
$$
then by the delta method,
$$
\sqrt{n}\left[
\begin{pmatrix} s_x^2 \\ s_y^2 \\ s_{xy} \end{pmatrix}
-
\begin{pmatrix} \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}
\right]
\stackrel{L}{\to} N_3(0, \Sigma^*).
$$
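The second equality in the definition of $\Sigma^*$ holds because evaluating $\dot{g}$ at a point whose first two coordinates are zero gives
$$
\dot{g}\begin{pmatrix} 0 \\ 0 \\ \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}
= \begin{pmatrix}
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix},
$$
so $\dot{g}\,\Sigma\,\dot{g}^t$ simply extracts the lower-right $3 \times 3$ block of $\Sigma$, namely the covariance matrix of $(X_1^2, Y_1^2, X_1 Y_1)$.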
Finally, define the function $h(a, b, c) = c/\sqrt{ab}$ so that $h(s_x^2, s_y^2, s_{xy}) = r$. Then $\dot{h}(a, b, c) = \frac{1}{2}\left(-c/\sqrt{a^3 b},\; -c/\sqrt{ab^3},\; 2/\sqrt{ab}\right)$, so that
$$
\dot{h}\begin{pmatrix} \sigma_x^2 \\ \sigma_y^2 \\ \sigma_{xy} \end{pmatrix}
= \left(\frac{-\sigma_{xy}}{2\sigma_x^3 \sigma_y},\; \frac{-\sigma_{xy}}{2\sigma_x \sigma_y^3},\; \frac{1}{\sigma_x \sigma_y}\right)
= \left(\frac{-\rho}{2\sigma_x^2},\; \frac{-\rho}{2\sigma_y^2},\; \frac{1}{\sigma_x \sigma_y}\right).
\tag{40}
$$
Therefore, if $A$ denotes the $1 \times 3$ matrix in equation (40), using the delta method once again yields
$$
\sqrt{n}(r - \rho) \stackrel{L}{\to} N(0, A \Sigma^* A^t).
$$
Consider the special case of bivariate normal $(X_i, Y_i)$. In this case, we may derive
$$
\Sigma^* = \begin{pmatrix}
2\sigma_x^4 & 2\rho^2 \sigma_x^2 \sigma_y^2 & 2\rho \sigma_x^3 \sigma_y \\
2\rho^2 \sigma_x^2 \sigma_y^2 & 2\sigma_y^4 & 2\rho \sigma_x \sigma_y^3 \\
2\rho \sigma_x^3 \sigma_y & 2\rho \sigma_x \sigma_y^3 & (1 + \rho^2) \sigma_x^2 \sigma_y^2
\end{pmatrix}.
\tag{41}
$$
In this case, $A \Sigma^* A^t = (1 - \rho^2)^2$, which implies that
$$
\sqrt{n}(r - \rho) \stackrel{L}{\to} N\{0, (1 - \rho^2)^2\}.
\tag{42}
$$
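As a quick numerical sanity check of (42), the following sketch (a minimal simulation, assuming NumPy; the choices $n = 200$, $\rho = 0.5$, and 5000 replications are arbitrary) compares the empirical variance of $\sqrt{n}(r - \rho)$ with $(1 - \rho^2)^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho, reps = 200, 0.5, 5000
cov = np.array([[1.0, rho], [rho, 1.0]])  # bivariate normal, unit variances

stats = np.empty(reps)
for k in range(reps):
    xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    r = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    stats[k] = np.sqrt(n) * (r - rho)

print(stats.var())           # empirical variance of sqrt(n)(r - rho)
print((1 - rho ** 2) ** 2)   # asymptotic variance from (42): 0.5625
```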
In the normal case, we may derive a variance-stabilizing transformation. According to equation (42), we should find a function $f(x)$ satisfying $f'(x) = (1 - x^2)^{-1}$. Since
$$
\frac{1}{1 - x^2} = \frac{1}{2(1 - x)} + \frac{1}{2(1 + x)},
$$
which is easy to integrate, we obtain
$$
f(x) = \frac{1}{2} \log \frac{1 + x}{1 - x}.
$$
This is called Fisher’s transformation; we conclude that
$$
\sqrt{n}\left(\frac{1}{2} \log \frac{1 + r}{1 - r} - \frac{1}{2} \log \frac{1 + \rho}{1 - \rho}\right) \stackrel{L}{\to} N(0, 1).
$$
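Because the limiting variance is free of $\rho$, Fisher's transformation yields a simple large-sample confidence interval for $\rho$. Here is a minimal sketch (assuming NumPy and SciPy; the helper name `fisher_ci` and its inputs are illustrative only), following the $\sqrt{n}$ normalization displayed above:

```python
import numpy as np
from scipy.stats import norm

def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for rho via Fisher's transformation."""
    z = 0.5 * np.log((1 + r) / (1 - r))                 # f(r)
    half_width = norm.ppf(0.5 + level / 2) / np.sqrt(n)
    return np.tanh(z - half_width), np.tanh(z + half_width)  # tanh inverts f

print(fisher_ci(r=0.6, n=50))
```

A common finite-sample refinement replaces $1/\sqrt{n}$ by $1/\sqrt{n - 3}$, but the asymptotic argument above does not distinguish between the two.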
Problems
Problem 19.1 Verify expressions (41) and (42).
Problem 19.2 Assume (X1 , Y1 ), . . . , (Xn , Yn ) are iid from some bivariate normal distribution. Let
ρ denote the population correlation coefficient and r the sample correlation coefficient.
(a) Describe a test of $H_0: \rho = 0$ against $H_1: \rho \neq 0$ based on the fact that
$$
\sqrt{n}\,[f(r) - f(\rho)] \stackrel{L}{\to} N(0, 1),
$$
where $f(x)$ is Fisher's transformation $f(x) = \frac{1}{2} \log[(1 + x)/(1 - x)]$. Use $\alpha = .05$.
(b) Based on 5000 repetitions each, estimate the actual level for this test in the case when $E(X_i) = E(Y_i) = 0$, $\operatorname{Var}(X_i) = \operatorname{Var}(Y_i) = 1$, and $n \in \{3, 5, 10, 20\}$.
Problem 19.3 Suppose that X and Y are jointly distributed such that X and Y are Bernoulli
(1/2) random variables with P (XY = 1) = θ for θ ∈ (0, 1/2). Let (X1 , Y1 ), (X2 , Y2 ), . . . be iid
with (Xi , Yi ) distributed as (X, Y ).
(a) Find the asymptotic distribution of $\sqrt{n}\left[(\overline{X}_n, \overline{Y}_n) - (1/2, 1/2)\right]$.
(b) If $r_n$ is the sample correlation coefficient for a sample of size $n$, find the asymptotic distribution of $\sqrt{n}(r_n - \rho)$.
(c) Find a variance stabilizing transformation for rn .
(d) Based on your answer to part (c), construct a 95% confidence interval for θ.
(e) For each combination of n ∈ {5, 20} and θ ∈ {.05, .25, .45}, estimate the true coverage probability of the confidence interval in part (d) by simulating 5000 samples and the corresponding
confidence intervals.
Hint: To generate a sample of (X, Y ), first simulate the X’s from their marginal distribution,
then simulate the Y ’s according to the conditional distribution of Y given X. To obtain this
conditional distribution, simply find P (Y = 1 | X = 1) and P (Y = 1 | X = 0) using the
definition of conditional probability.
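A minimal sketch of this recipe (assuming NumPy; the conditional probabilities `p1` $= P(Y = 1 \mid X = 1)$ and `p0` $= P(Y = 1 \mid X = 0)$ are left as placeholders for the values the hint asks you to derive):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_xy(n, p1, p0):
    """Draw n pairs (X, Y): X ~ Bernoulli(1/2), then Y | X with
    P(Y=1 | X=1) = p1 and P(Y=1 | X=0) = p0."""
    x = rng.integers(0, 2, size=n)   # X ~ Bernoulli(1/2)
    u = rng.random(n)
    y = np.where(x == 1, u < p1, u < p0).astype(int)
    return x, y

# p1, p0 = ...  # fill in using the definition of conditional probability
# x, y = sample_xy(20, p1, p0)
```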