Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
19. Sample correlation coefficient
Lehmann §5.4; Ferguson §8
Suppose that (X1 , Y1 ), (X2 , Y2 ), . . . are iid vectors with E Xi4 < ∞ and E Yi4 < ∞. For the sake of simplicity,
we will assume without loss of generality that E Xi = E Yi = 0 (alternatively, we could base all of the
following derivations on the centered versions of the random variables).
We wish to find the asymptotic distribution of the sample correlation r = sxy /(sx sy ), where if we let
Pn
mx
Xi
Pi=1
n
my
Pni=1 Yi2
mxx = 1
Xi
,
Pi=1
n
n
2
myy
Y
Pni=1 i
mxy
i=1 Xi Yi
(37)
then
s2x = mxx − m2x , s2y = myy − m2y , and sxy = mxy − mx my .
(38)
Notice that we have suppressed the n in the notation above in order to keep things slightly simpler. According
to the central limit theorem,
0
mx
Cov (X1 , X1 )
···
Cov (X1 , X1 Y1 )
Cov (Y1 , X1 )
·
·
·
Cov
(Y
,
X
Y
)
1
1
1
√ my 02 L
n mxx − σx → N5 0,
.
(39)
..
.
..
..
.
.
myy σy2
Cov (X1 Y1 , X1 ) · · · Cov (X1 Y1 , X1 Y1 )
mxy
σxy
Let Σ denote the covariance matrix in expression (39). Define a function g : R5 → R3 such that g applied
to the vector of moments in equation (37) yields the vector (s2x , s2y , sxy ) as defined in expression (38). Then
a
b
−2a
0
1 0 0
0
−2b 0 1 0 .
ġ
c =
d
−b
−a 0 0 1
e
Therefore, if we let
0
0
0
0
2
2
Σ∗ = ġ
σx2 Σġ σx2
σy
σy
σxy
σxy
t
Cov (X12 , Y12 )
Cov (X12 , X1 Y1 )
Cov (X12 , X12 )
= Cov (Y12 , X12 )
Cov (Y12 , Y12 )
Cov (Y12 , X1 Y1 ) ,
2
2
Cov (X1 Y1 , X1 ) Cov (X1 Y1 , Y1 ) Cov (X1 Y1 , X1 Y1 )
then by the delta method,
2 2
σx
sx
√
L
n s2y − σy2 → N3 (0, Σ∗ ).
sxy
σxy
√
√
√
√
Finally, define the function h(a, b, c) = c/ ab so that h(s2x , s2y , sxy ) = r. Then ḣ(a, b, c) = 12 (−c/ a3 b, −c/ ab3 , 2/ ab),
so that
2
σx
−σxy −σxy
1
−ρ −ρ
1
=
.
(40)
ḣ σy2 =
,
,
,
,
2σx3 σy 2σx σy3 σx σy
2σx2 2σy2 σx σy
σxy
52
Therefore, if A denotes the 1 × 3 matrix in equation (40), using the delta method once again yields
√
L
n(r − ρ) → N (0, AΣ∗ At ).
Consider the special case of bivariate normal (Xi , Yi ). In this case, we may derive
2σx4
2ρ2 σx2 σy2
2ρσx3 σy
.
2σy2
2ρσx σy3
Σ∗ = 2ρ2 σx2 σy2
2ρσx3 σy
2ρσx σy3 (1 + ρ2 )σx2 σy2
(41)
In this case, AΣ∗ At = (1 − ρ2 )2 , which implies that
√
L
n(r − ρ) → N {0, (1 − ρ2 )2 }.
(42)
In the normal case, we may derive a variance-stabilizing transformation. According to equation (42), we
should find a function f (x) satisfying f 0 (x) = (1 − x2 )−1 . Since
1
1
1
=
+
,
1 − x2
2(1 − x) 2(1 + x)
which is easy to integrate, we obtain
f (x) =
1
1+x
log
.
2
1−x
This is called Fisher’s transformation; we conclude that
√
1
1+r 1
1+ρ L
n
log
− log
→ N (0, 1).
2
1−r 2
1−ρ
Problems
Problem 19.1 Verify expressions (41) and (42).
Problem 19.2 Assume (X1 , Y1 ), . . . , (Xn , Yn ) are iid from some bivariate normal distribution. Let
ρ denote the population correlation coefficient and r the sample correlation coefficient.
(a) Describe a test of H0 : ρ = 0 against H1 : ρ 6= 0 based on the fact that
√
L
n[f (r) − f (ρ)] → N (0, 1),
where f (x) is Fisher’s transformation f (x) = (1/2) log[(1 + x)/(1 − x)]. Use α = .05.
(b) Based on 5000 repetitions each, estimate the actual level for this test in the case when
E (Xi ) = E (Yi ) = 0, Var (Xi ) = Var (Yi ) = 1, and n ∈ {3, 5, 10, 20}.
Problem 19.3 Suppose that X and Y are jointly distributed such that X and Y are Bernoulli
(1/2) random variables with P (XY = 1) = θ for θ ∈ (0, 1/2). Let (X1 , Y1 ), (X2 , Y2 ), . . . be iid
with (Xi , Yi ) distributed as (X, Y ).
√ (a) Find the asymptotic distribution of n (X n , Y n ) − (1/2, 1/2) .
(b) If rn is the
√ sample correlation coefficient for a sample of size n, find the asymptotic
distribution of n(rn − ρ).
53
(c) Find a variance stabilizing transformation for rn .
(d) Based on your answer to part (c), construct a 95% confidence interval for θ.
(e) For each combination of n ∈ {5, 20} and θ ∈ {.05, .25, .45}, estimate the true coverage probability of the confidence interval in part (d) by simulating 5000 samples and the corresponding
confidence intervals.
Hint: To generate a sample of (X, Y ), first simulate the X’s from their marginal distribution,
then simulate the Y ’s according to the conditional distribution of Y given X. To obtain this
conditional distribution, simply find P (Y = 1 | X = 1) and P (Y = 1 | X = 0) using the
definition of conditional probability.
54