Siegel’s formula via Stein’s identities
Jun S. Liu
Department of Statistics
Harvard University

Address correspondence to: Jun S. Liu, Department of Statistics, Harvard University, Cambridge, MA 02138.
Abstract
Inspired by a surprising formula in Siegel (1993), we find it convenient to compute covariances,
even for order statistics, by using Stein's (1972) identities. Generalizations of Siegel's formula
to other order statistics as well as to other distributions are obtained along these lines.
KEY WORDS: Exponential family; multivariate Stein’s identity; covariance with order
statistics.
1 Siegel's result and Stein's identities
In a recent paper, Siegel (1993) found a remarkable formula for the covariance of X1 with
the minimum of the multivariate normal vector (X1, . . . , Xn). The theorem is stated as follows.

Theorem [Siegel] Let (X1, . . . , Xn) be multivariate normal, such that the variables are distinct,
with arbitrary mean and variance structure. Then

cov[X1, min(X1, . . . , Xn)] = Σ_{i=1}^{n} cov(X1, Xi) Pr[Xi = min(X1, . . . , Xn)].
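The formula is easy to check numerically; the following minimal simulation sketch (the mean vector, covariance matrix, and sample size are arbitrary choices) compares the two sides:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([0.5, -1.0, 0.3])            # arbitrary mean vector
    A = rng.normal(size=(3, 3))
    Sigma = A @ A.T + np.eye(3)                # arbitrary positive-definite covariance
    X = rng.multivariate_normal(mu, Sigma, size=500_000)

    m = X.min(axis=1)                          # min(X1, ..., Xn)
    lhs = np.cov(X[:, 0], m)[0, 1]
    # Pr[Xi = min] is estimated by how often coordinate i attains the minimum
    rhs = sum(Sigma[0, i] * np.mean(X.argmin(axis=1) == i) for i in range(3))
    print(lhs, rhs)                            # the two sides agree up to Monte Carlo error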
On the other hand, Stein (1972) observes that for Z ∼ N(µ, σ²) and any function f such that
E|f′(Z)| < ∞,

E[(Z − µ)f(Z)] = σ² E[f′(Z)].   (1)

Similar identities hold for many other distributions. For example, if Z ∼ Poisson(λ), then

E[(Z − λ)f(Z)] = λ E[f(Z + 1) − f(Z)].
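Both identities are easy to verify numerically; a minimal sketch with the arbitrary test function f(z) = sin z and arbitrary parameter values:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, lam, n = 0.7, 1.3, 2.5, 1_000_000   # arbitrary parameters

    z = rng.normal(mu, sigma, n)                   # normal identity: E[(Z-mu)f(Z)] = sigma^2 E[f'(Z)]
    print(np.mean((z - mu) * np.sin(z)), sigma**2 * np.mean(np.cos(z)))

    k = rng.poisson(lam, n)                        # Poisson identity: E[(Z-lam)f(Z)] = lam E[f(Z+1)-f(Z)]
    print(np.mean((k - lam) * np.sin(k)), lam * np.mean(np.sin(k + 1) - np.sin(k)))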
If Z ∼ t_ν(0, σ²), i.e., Z = Y/√(W/ν), where Y ∼ N(0, σ²) and W ∼ χ²_ν, then

E[Zf(Z)] = √(ν/(ν−2)) σ² E[f′(Z*)],

where Z* ∼ t_{ν−2}(0, σ²). The above identities are proved directly by integration by parts. To
illustrate, we show the last identity when σ = 1:
E[Zf(Z)] = ∫_{−∞}^{∞} z f(z) · Γ[(ν+1)/2] / (√(νπ) Γ(ν/2)) · (1 + z²/ν)^{−(ν+1)/2} dz

= Γ[(ν+1)/2] / (√(νπ) Γ(ν/2)) · ν/(ν−1) · ∫_{−∞}^{∞} f(z) [−(1 + z²/ν)^{−(ν−1)/2}]′ dz

= √(ν/(ν−2)) ∫_{−∞}^{∞} f′(z) · Γ[(ν−1)/2] / (√((ν−2)π) Γ(ν/2 − 1)) · (1 + z²/(ν−2))^{−(ν−1)/2} dz

= √(ν/(ν−2)) E[f′(Z*)],

where the third equality is obtained by integration by parts (the boundary terms vanish).
There is also a multivariate version of Stein’s identity as follows.
Lemma 1 Let X = (X1, . . . , Xn) be multivariate normally distributed with arbitrary mean vector µ
and covariance matrix Σ. For any function h(x1, . . . , xn) such that ∂h/∂xi exists almost everywhere
and E|∂h(X)/∂xi| < ∞, i = 1, . . . , n, write ∇h(X) = (∂h(X)/∂x1, . . . , ∂h(X)/∂xn)^T. Then the
following identity holds:

cov[X, h(X)] = Σ E[∇h(X)].   (2)

Specifically,

cov[X1, h(X1, . . . , Xn)] = Σ_{i=1}^{n} cov(X1, Xi) E[∂h(X1, . . . , Xn)/∂xi].   (3)
Proof: Let Z = (Z1, . . . , Zn), where the Zi are i.i.d. N(0, 1) random variables. It is straightforward to show by (1) that for any g(Z), differentiable almost everywhere, cov[Zi, g(Z)] =
E[∂g(Z)/∂zi]. Hence

cov[Z, g(Z)] = E[∇g(Z)].

See Stein (1981) for a more elaborate proof. Now for the random vector X we can write
X = Σ^{1/2} Z + µ and set g(Z) = h(Σ^{1/2} Z + µ). Hence the left-hand side of (2) is

cov[X, h(X)] = cov[Σ^{1/2} Z, g(Z)] = Σ^{1/2} E[∇g(Z)].

Since ∇g(Z) = Σ^{1/2} ∇h(X), identity (2) follows immediately. □
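Identity (2) can also be checked by simulation; the minimal sketch below uses the arbitrary smooth test function h(x) = sin(x1) + x2 x3, whose gradient is (cos(x1), x3, x2):

    import numpy as np

    rng = np.random.default_rng(3)
    mu = np.array([1.0, 0.0, -0.5])                # arbitrary mean
    A = rng.normal(size=(3, 3))
    Sigma = A @ A.T + np.eye(3)                    # arbitrary positive-definite covariance
    X = rng.multivariate_normal(mu, Sigma, size=500_000)

    h = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]
    grad = np.column_stack([np.cos(X[:, 0]), X[:, 2], X[:, 1]])
    lhs = np.array([np.cov(X[:, i], h)[0, 1] for i in range(3)])   # cov[X, h(X)]
    print(lhs, Sigma @ grad.mean(axis=0))          # componentwise agreement up to Monte Carlo error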
REMARK: A version of this lemma appears in Stein (1981) with Σ equal to the identity matrix.
We have not, however, found an explicit statement of formula (2) in the literature, although the
form is believed to be well known in the fields of sliced inverse regression and decision theory.
Siegel's (1993) method can also be applied to prove identity (2), but it is less direct.
We find that Siegel's theorem is closely related to Stein's identities and can be generalized to
covariances of X1 with other order statistics and other functionals, and to other distributions.
2 A simple proof of the generalized Siegel's formula
Let fu(z) = min(z, u). Then fu is piecewise linear in z. Hence for Z ∼ N(µ, σ²), Stein's
identity can be applied to get

cov[Z, fu(Z)] = σ² E[fu′(Z)] = σ² E(I{Z<u}) = σ² Pr[Z = min(Z, u)],
which easily leads to Lemmas 2 and 3 of Siegel (1993). Siegel (1993) then uses further properties
of the normal distribution to complete the proof of his theorem. We find, however, that it is more
convenient to use the multivariate version of Stein's identity directly to prove the following
generalized Siegel's formula.
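As a quick numerical illustration of the univariate relation above, the following minimal simulation sketch (all parameter values are arbitrary) reproduces cov[Z, fu(Z)] = σ² Pr[Z < u]:

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, u = 0.2, 1.5, 1.0                   # arbitrary values
    z = rng.normal(mu, sigma, 1_000_000)
    lhs = np.cov(z, np.minimum(z, u))[0, 1]        # cov[Z, f_u(Z)]
    print(lhs, sigma**2 * np.mean(z < u))          # equals sigma^2 Pr[Z < u] up to Monte Carlo error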
Theorem 1 Let (X1, . . . , Xn) be multivariate normal, such that the variables are distinct, with
arbitrary mean and variance structure. Let X(i) be the ith largest among X1, . . . , Xn. Then

cov[X1, X(i)] = Σ_{j=1}^{n} cov(X1, Xj) Pr(Xj = X(i)).
Proof: Define h(x1, . . . , xn) = x(i). Then h is piecewise linear and therefore almost everywhere
differentiable. Note that ∂h(x1, . . . , xn)/∂xj equals one if xj = x(i) and zero if xj ≠ x(i).
Hence the result follows by applying identity (3). □
REMARK: When X1, . . . , Xn are i.i.d., Carl Morris and I found the following ancillarity argument
to prove the above result. Since X(i) − X̄ is independent of X̄, cov(X̄, X(i)) = cov(X̄, X̄) =
var(X1)/n. On the other hand, since cov(Xj, X(i)) = cov(Xk, X(i)) for any j and k because of the
i.i.d. assumption, we obtain that cov(X1, X(i)) = var(X1)/n.
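Theorem 1, and the i.i.d. special case in the remark, can both be checked by simulation; in this minimal sketch the dimension, rank, and covariance matrix are arbitrary choices, and X(i) is taken to be the ith smallest (any fixed rank convention works):

    import numpy as np

    rng = np.random.default_rng(4)
    n, i = 4, 2                                    # arbitrary dimension and rank (ith smallest)
    A = rng.normal(size=(n, n))
    Sigma = A @ A.T + np.eye(n)
    X = rng.multivariate_normal(np.zeros(n), Sigma, size=500_000)

    xi = np.sort(X, axis=1)[:, i - 1]              # X_(i)
    rank_idx = np.argsort(X, axis=1)[:, i - 1]     # which coordinate attains rank i
    lhs = np.cov(X[:, 0], xi)[0, 1]
    rhs = sum(Sigma[0, j] * np.mean(rank_idx == j) for j in range(n))
    print(lhs, rhs)                                # Theorem 1, up to Monte Carlo error

    Z = rng.normal(size=(500_000, n))              # i.i.d. case: covariance should equal var(X1)/n = 1/n
    print(np.cov(Z[:, 0], np.sort(Z, axis=1)[:, i - 1])[0, 1], 1 / n)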
For a multivariate normal random vector, the covariance of X1 with min(|X1|, . . . , |Xn|) can
also be computed similarly. Define h(x1, . . . , xn) = min(|x1|, . . . , |xn|); then ∂h(x)/∂xj is equal to
1 if 0 < xj < u, to −1 if −u < xj < 0, and to 0 if |xj| > u, where u = min(|Xk|, k ≠ j). Hence Stein's
identity (3) leads to the following corollary.
Corollary 1 Let Y = min(|X1|, . . . , |Xn|), where (X1, . . . , Xn) is multivariate normal with arbitrary mean and covariance structure. Then

cov(X1, Y) = Σ_{i=1}^{n} cov(X1, Xi) [Pr(|Xi| = Y, Xi > 0) − Pr(|Xi| = Y, Xi < 0)].
When X1 , . . . , Xn have mean zero, the above covariance equals zero.
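A minimal simulation sketch of Corollary 1 (the means below are arbitrary, chosen nonzero so that the covariance itself is nonzero):

    import numpy as np

    rng = np.random.default_rng(5)
    mu = np.array([0.4, -0.2, 0.8])                # arbitrary nonzero means
    A = rng.normal(size=(3, 3))
    Sigma = A @ A.T + np.eye(3)
    X = rng.multivariate_normal(mu, Sigma, size=500_000)

    Y = np.abs(X).min(axis=1)
    j = np.abs(X).argmin(axis=1)                   # coordinate attaining the minimum in absolute value
    lhs = np.cov(X[:, 0], Y)[0, 1]
    rhs = sum(Sigma[0, i] * (np.mean((j == i) & (X[:, i] > 0)) -
                             np.mean((j == i) & (X[:, i] < 0))) for i in range(3))
    print(lhs, rhs)                                # agreement up to Monte Carlo error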
3 Generalizations to other distributions
Stein’s identities for other distributions can also be used to obtain similar results.
Corollary 2 Let X1 ∼ Poisson(λ1) be independent of X2, . . . , Xn. Then

cov[X1, min(X1, . . . , Xn)] = λ1 Pr[X1 < Xj for all j ≠ 1].
Proof: Define fu(x) the same way as before; then by Stein's identity for the Poisson distribution,

cov[X1, fu(X1)] = λ1 E[fu(X1 + 1) − fu(X1)] = λ1 Pr(X1 < u).
Hence, conditionally, with u = min(X2, . . . , Xn),

cov[X1, fu(X1) | X2, . . . , Xn] = λ1 Pr[X1 < min(X2, . . . , Xn) | X2, . . . , Xn].

Since cov(U, V) = E[cov(U, V | W)] + cov[E(U | W), E(V | W)], and X1 is independent of Xj, j ≠ 1, the
conclusion follows. □
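A minimal simulation sketch of Corollary 2; X2 and X3 below are arbitrary variables independent of X1 (taken Poisson here only for convenience):

    import numpy as np

    rng = np.random.default_rng(6)
    lam1, n = 3.0, 1_000_000                       # arbitrary rate and sample size
    x1 = rng.poisson(lam1, n)
    x2 = rng.poisson(5.0, n)                       # any variables independent of X1 will do
    x3 = rng.poisson(4.0, n)

    m = np.minimum(x1, np.minimum(x2, x3))
    lhs = np.cov(x1, m)[0, 1]
    rhs = lam1 * np.mean(x1 < np.minimum(x2, x3))  # lam1 * Pr[X1 < Xj for all j != 1]
    print(lhs, rhs)                                # agreement up to Monte Carlo error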
Corollary 3 Let X1 ∼ t_ν(0, σ²) be independent of X2, . . . , Xn. Then

cov[X1, min(X1, . . . , Xn)] = √(ν/(ν−2)) σ² Pr[X1* < min(X2, . . . , Xn)],

where X1* ∼ t_{ν−2}(0, σ²). Let X(i) be the ith order statistic; then, more generally,

cov[X1, X(i)] = √(ν/(ν−2)) σ² Pr[X(i−1) < X1* < X(i+1)].
Proof: Use Stein's identity for the t-distribution, conditioning on X2, . . . , Xn as in the proof of
Corollary 2. □
More generally, Morris (1983) provides Stein-type identities (Theorem 5.2) for natural exponential families with quadratic variance functions (NEF-QVF), which can be used to extend the
above arguments to this broader class of distributions without much difficulty.
To illustrate, we compute cov[Y1, min(Y1, . . . , Yn)], where the Yi ∼ Exponential(θi) are independent, with E(Yi) = 1/θi and var(Yi) = 1/θi². Stein's identity takes the form

cov[Y1, f(Y1)] = (1/θ1) E[Y1 f′(Y1)].

Let fu(x) = min(x, u); then cov[Y1, fu(Y1)] = (1/θ1) E(Y1 I{Y1<u}). Since U = min(Y2, . . . , Yn) ∼
Exponential(θ2 + · · · + θn),

cov[Y1, min(Y1, . . . , Yn)] = (1/θ1) E[E(Y1 I{Y1<U} | U)] = (1/θ1) E[Y1 Pr(U > Y1 | Y1)] = 1/(θ1 + · · · + θn)².
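The computation is easy to confirm by simulation; the rates in this minimal sketch are arbitrary:

    import numpy as np

    rng = np.random.default_rng(7)
    theta = np.array([0.5, 1.0, 2.0])              # arbitrary rates
    Y = rng.exponential(1 / theta, size=(1_000_000, 3))   # numpy parameterizes by scale = 1/rate
    print(np.cov(Y[:, 0], Y.min(axis=1))[0, 1], 1 / theta.sum()**2)   # agreement up to Monte Carlo error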
REMARK: First, it is noted that this covariance does not depend on θ1 at all; a more general
version is cov[Yi, min(Y1, . . . , Yn)] = var[min(Y1, . . . , Yn)]. Second, Professor Carl Morris
provided another ancillarity argument for the result: consider a location-shift family of exponential
distributions with known scale parameters θ1, . . . , θn. Then Y(1) = min(Y1, . . . , Yn) is complete
sufficient for the location parameter. Hence, by Basu's theorem, Y1 − Y(1) is independent of Y(1),
and therefore cov(Y1, Y(1)) = cov(Y1 − Y(1), Y(1)) + var(Y(1)) = var(Y(1)).
Acknowledgement: The author thanks Professors Hal Stern and Carl Morris for bringing up
the problem and for very stimulating discussions. Professor Morris also suggested the computations
for the exponential distribution.
REFERENCES

Morris, C. N. (1983). Natural exponential families with quadratic variance functions: statistical
theory. Ann. Statist. 11, 515-529.

Siegel, A. F. (1993). A surprising covariance involving the minimum of multivariate normal variables. J. Amer. Statist. Assoc. 88, 77-80.

Stein, C. M. (1972). A bound for the error in the normal approximation to the distribution of a
sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Probab. 2,
583-602.

Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist.
9, 1135-1151.