Siegel's formula via Stein's identities

Jun S. Liu
Department of Statistics, Harvard University
(Address correspondence to: Jun S. Liu, Department of Statistics, Harvard University, Cambridge, MA 02138.)

Abstract

Inspired by a surprising formula in Siegel (1993), we find it convenient to compute covariances, even those involving order statistics, by using Stein (1972)'s identities. Generalizations of Siegel's formula to other order statistics, as well as to other distributions, are obtained along this line.

KEY WORDS: Exponential family; multivariate Stein's identity; covariance with order statistics.

1 Siegel's result and Stein's identities

In a recent paper, Siegel (1993) found a remarkable formula for the covariance of $X_1$ with the minimum of the multivariate normal vector $(X_1, \dots, X_n)$. His theorem is as follows.

Theorem [Siegel] Let $(X_1, \dots, X_n)$ be multivariate normal, such that the variables are distinct, with arbitrary mean and covariance structure. Then
$$\mathrm{cov}[X_1, \min(X_1, \dots, X_n)] = \sum_{i=1}^n \mathrm{cov}(X_1, X_i)\,\Pr[X_i = \min(X_1, \dots, X_n)].$$

On the other hand, Stein (1972) observes that for $Z \sim N(\mu, \sigma^2)$ and any function $f$ such that $E|f'(Z)| < \infty$,
$$E[(Z - \mu) f(Z)] = \sigma^2 E[f'(Z)]. \qquad (1)$$

Similar identities hold for many other distributions. For example, if $Z \sim \mathrm{Poisson}(\lambda)$, then
$$E[(Z - \lambda) f(Z)] = \lambda\, E[f(Z+1) - f(Z)].$$

If $Z \sim t_\nu(0, \sigma^2)$, i.e., $Z = Y/\sqrt{W/\nu}$, where $Y \sim N(0, \sigma^2)$ and $W \sim \chi^2_\nu$, then
$$E[Z f(Z)] = \frac{\nu}{\nu - 2}\,\sigma^2\, E[f'(Z^*)], \quad \text{where } Z^* \sim t_{\nu-2}\!\left(0, \frac{\nu \sigma^2}{\nu - 2}\right).$$

The above identities are proved directly by integration by parts. To illustrate, we show the last identity when $\sigma = 1$. Since
$$\frac{d}{dz}\left(1 + \frac{z^2}{\nu}\right)^{-(\nu-1)/2} = -\frac{\nu - 1}{\nu}\, z \left(1 + \frac{z^2}{\nu}\right)^{-(\nu+1)/2},$$
integration by parts gives
$$E[Z f(Z)] = \int_{-\infty}^{\infty} z f(z)\, \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \left(1 + \frac{z^2}{\nu}\right)^{-(\nu+1)/2} dz$$
$$= \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)} \cdot \frac{\nu}{\nu - 1} \int_{-\infty}^{\infty} f'(z) \left(1 + \frac{z^2}{\nu}\right)^{-(\nu-1)/2} dz$$
$$= \frac{\nu}{\nu - 2} \int_{-\infty}^{\infty} f'(z)\, \frac{\Gamma[(\nu-1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2 - 1)} \left(1 + \frac{z^2}{\nu}\right)^{-(\nu-1)/2} dz$$
$$= \frac{\nu}{\nu - 2}\, E[f'(Z^*)],$$
where the third equality uses $\Gamma[(\nu+1)/2] = \frac{\nu-1}{2}\Gamma[(\nu-1)/2]$ and $\Gamma(\nu/2) = \frac{\nu-2}{2}\Gamma(\nu/2 - 1)$, and the last integrand is exactly the density of $Z^* \sim t_{\nu-2}(0, \nu/(\nu-2))$. (Taking $f(z) = z$ recovers $E[Z^2] = \nu/(\nu-2)$, as it should.)

There is also a multivariate version of Stein's identity, as follows.

Lemma 1 Let $X = (X_1, \dots, X_n)$ be multivariate normally distributed with arbitrary mean vector $\mu$ and covariance matrix $\Sigma$. For any function $h(x_1, \dots, x_n)$ such that $\partial h/\partial x_i$ exists almost everywhere and $E|\frac{\partial}{\partial x_i} h(X)| < \infty$, $i = 1, \dots, n$, write $\nabla h(X) = \left(\frac{\partial h(X)}{\partial x_1}, \dots, \frac{\partial h(X)}{\partial x_n}\right)^T$. Then the following identity holds:
$$\mathrm{cov}[X, h(X)] = \Sigma\, E[\nabla h(X)]. \qquad (2)$$
In particular,
$$\mathrm{cov}[X_1, h(X_1, \dots, X_n)] = \sum_{i=1}^n \mathrm{cov}(X_1, X_i)\, E\left[\frac{\partial}{\partial x_i} h(X_1, \dots, X_n)\right]. \qquad (3)$$

Proof: Let $Z = (Z_1, \dots, Z_n)$, where the $Z_i$ are i.i.d. $N(0,1)$ random variables. It is straightforward to show by (1) that for any $g(Z)$, differentiable almost everywhere, $\mathrm{cov}[Z_i, g(Z)] = E[\partial g(Z)/\partial z_i]$. Hence $\mathrm{cov}[Z, g(Z)] = E[\nabla g(Z)]$; see also Stein (1981) for a more elaborate proof. Now for the random vector $X$ we can write $X = \Sigma^{1/2} Z + \mu$ and $g(Z) = h(\Sigma^{1/2} Z + \mu)$. Hence the left-hand side of (2) is
$$\mathrm{cov}[X, h(X)] = \mathrm{cov}[\Sigma^{1/2} Z, g(Z)] = \Sigma^{1/2} E[\nabla g(Z)].$$
Since $\nabla g(Z) = \Sigma^{1/2} \nabla h(X)$ by the chain rule, the identity (2) follows immediately. □

REMARK: A version of this lemma appears in Stein (1981) with $\Sigma$ being the identity matrix. We have not succeeded, however, in finding an explicit statement of formula (2) in the literature, although the form is believed to be well known in the fields of sliced inverse regression and decision theory. Siegel (1993)'s method can also be applied to prove identity (2), but it is less direct.

We find that Siegel's theorem is closely related to Stein's identities and can be generalized to covariances of $X_1$ with other order statistics, with other functionals, and for other distributions.
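The univariate identities above are easy to check numerically. The following Monte Carlo sketch is an illustration added here, not part of the original paper; it assumes Python with numpy, and the test function $f = \tanh$ and all parameter values are arbitrary choices. Each print statement shows a pair of numbers that should agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000
f = np.tanh                                  # illustrative smooth test function
fp = lambda z: 1.0 / np.cosh(z) ** 2         # its derivative

# Normal identity (1): E[(Z - mu) f(Z)] = sigma^2 E[f'(Z)]
mu, sigma = 1.0, 2.0
Z = rng.normal(mu, sigma, N)
print(np.mean((Z - mu) * f(Z)), sigma**2 * np.mean(fp(Z)))

# Poisson identity: E[(Z - lam) f(Z)] = lam E[f(Z + 1) - f(Z)]
lam = 3.0
Zp = rng.poisson(lam, N)
print(np.mean((Zp - lam) * f(Zp)), lam * np.mean(f(Zp + 1) - f(Zp)))

# t identity (scale notation as in the text): for Z ~ t_nu(0, sigma^2),
# E[Z f(Z)] = nu/(nu-2) sigma^2 E[f'(Z*)], Z* ~ t_{nu-2}(0, nu sigma^2/(nu-2))
nu, sigma = 7, 1.5
Zt = sigma * rng.standard_t(nu, N)
Zs = np.sqrt(nu / (nu - 2)) * sigma * rng.standard_t(nu - 2, N)
print(np.mean(Zt * f(Zt)), nu / (nu - 2) * sigma**2 * np.mean(fp(Zs)))
```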
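Lemma 1 can be checked the same way. This sketch, again an added illustration rather than part of the paper, compares both sides of (2) for an arbitrary positive-definite $\Sigma$ and a smooth $h$ whose gradient is written out in closed form; the specific $h$, $\mu$, and $\Sigma$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 1_000_000
mu = np.array([0.5, -1.0, 2.0])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)         # arbitrary positive-definite covariance

X = rng.multivariate_normal(mu, Sigma, size=N)

# h(x) = sin(x1) x2 + x3^2, with its gradient computed explicitly
h = np.sin(X[:, 0]) * X[:, 1] + X[:, 2] ** 2
grad = np.column_stack([np.cos(X[:, 0]) * X[:, 1],
                        np.sin(X[:, 0]),
                        2.0 * X[:, 2]])

lhs = np.array([np.mean((X[:, i] - mu[i]) * h) for i in range(n)])  # cov[X, h(X)]
rhs = Sigma @ grad.mean(axis=0)                                     # Sigma E[grad h(X)]
print(lhs)
print(rhs)
```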
2 A simple proof of the generalized Siegel's formula

Let $f_u(z) = \min(z, u)$. Then $f_u$ is piecewise linear in $z$. Hence for $Z \sim N(\mu, \sigma^2)$, Stein's identity can be applied to get
$$\mathrm{cov}[Z, f_u(Z)] = \sigma^2 E[f_u'(Z)] = \sigma^2 E(I_{\{Z < u\}}) = \sigma^2 \Pr[Z = \min(Z, u)],$$
which easily leads to Lemmas 2 and 3 of Siegel (1993). Siegel (1993) then uses some further properties of the normal distribution to complete the proof of his theorem. We find, however, that it is more convenient to use the multivariate version of Stein's identity directly to prove the following generalized Siegel's formula.

Theorem 1 Let $(X_1, \dots, X_n)$ be multivariate normal, such that the variables are distinct, with arbitrary mean and covariance structure. Let $X_{(i)}$ be the $i$th largest among $X_1, \dots, X_n$. Then
$$\mathrm{cov}[X_1, X_{(i)}] = \sum_{j=1}^n \mathrm{cov}(X_1, X_j)\, \Pr(X_j = X_{(i)}).$$

Proof: Define $h(x_1, \dots, x_n) = x_{(i)}$. Then $h$ is piecewise linear and therefore almost everywhere differentiable. Note also that $\partial h(x_1, \dots, x_n)/\partial x_j$ is one if $x_j = x_{(i)}$ and zero if $x_j \ne x_{(i)}$. Applying identity (3) gives the result. □

REMARK: When $X_1, \dots, X_n$ are i.i.d., Carl Morris and I found the following ancillarity argument for the above result. Since $X_{(i)} - \bar X$ is independent of $\bar X$,
$$\mathrm{cov}(\bar X, X_{(i)}) = \mathrm{cov}(\bar X, \bar X) = \mathrm{var}(X_1)/n.$$
On the other hand, since $\mathrm{cov}(X_j, X_{(i)}) = \mathrm{cov}(X_k, X_{(i)})$ for any $j$ and $k$ by the i.i.d. assumption, we obtain $\mathrm{cov}(X_1, X_{(i)}) = \mathrm{var}(X_1)/n$.

For a multivariate normal random vector, the covariance of $X_1$ with $\min(|X_1|, \dots, |X_n|)$ can also be computed similarly. Define $h(x_1, \dots, x_n) = \min(|x_1|, \dots, |x_n|)$; then $\partial h(x)/\partial x_j$ equals $1$ if $0 < x_j < u$; $-1$ if $-u < x_j < 0$; and $0$ if $|x_j| > u$, where $u = \min_{k \ne j} |x_k|$. Hence Stein's identity (3) leads to the following corollary.

Corollary 1 Let $Y = \min(|X_1|, \dots, |X_n|)$, where $(X_1, \dots, X_n)$ is multivariate normal with arbitrary mean and covariance structure. Then
$$\mathrm{cov}(X_1, Y) = \sum_{i=1}^n \mathrm{cov}(X_1, X_i)\,[\Pr(|X_i| = Y, X_i > 0) - \Pr(|X_i| = Y, X_i < 0)].$$
When $X_1, \dots, X_n$ have mean zero, this covariance equals zero.

3 Generalizations to other distributions

Stein's identities for other distributions can be used to obtain similar results.

Corollary 2 Let $X_1 \sim \mathrm{Poisson}(\lambda_1)$ be independent of the integer-valued random variables $X_2, \dots, X_n$. Then
$$\mathrm{cov}[X_1, \min(X_1, \dots, X_n)] = \lambda_1 \Pr[X_1 < X_j \text{ for all } j \ne 1].$$

Proof: Define $f_u(x)$ as before. By Stein's identity for the Poisson distribution, for integer $u$,
$$\mathrm{cov}[X_1, f_u(X_1)] = \lambda_1 E[f_u(X_1 + 1) - f_u(X_1)] = \lambda_1 \Pr(X_1 < u).$$
Hence, conditionally,
$$\mathrm{cov}[X_1, f_u(X_1) \mid X_2, \dots, X_n] = \lambda_1 \Pr[X_1 < \min(X_2, \dots, X_n) \mid X_2, \dots, X_n].$$
Since $\mathrm{cov}(U, V) = E[\mathrm{cov}(U, V \mid W)] + \mathrm{cov}[E(U \mid W), E(V \mid W)]$ and $X_1$ is independent of $X_j$, $j \ne 1$, the conclusion follows. □

Corollary 3 Let $X_1 \sim t_\nu(0, \sigma^2)$ be independent of $X_2, \dots, X_n$. Then
$$\mathrm{cov}[X_1, \min(X_1, \dots, X_n)] = \frac{\nu}{\nu - 2}\,\sigma^2\, \Pr[X_1^* < \min(X_2, \dots, X_n)],$$
where $X_1^* \sim t_{\nu-2}(0, \nu\sigma^2/(\nu-2))$. Letting $X_{(i)}$ denote the $i$th order statistic, more generally,
$$\mathrm{cov}[X_1, X_{(i)}] = \frac{\nu}{\nu - 2}\,\sigma^2\, \Pr[X_{(i-1)} < X_1^* < X_{(i+1)}].$$

Proof: Use Stein's identity for the $t$-distribution. □
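Before moving on, two numerical sanity checks may be helpful. The first, an illustration added here rather than part of the paper, estimates both sides of Theorem 1 for a correlated normal vector; it assumes numpy, and the choices of $\mu$, $\Sigma$, and the order-statistic index are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 4, 1_000_000
mu = np.array([0.3, -0.2, 0.1, 0.5])
A = rng.normal(size=(n, n))
Sigma = A @ A.T + np.eye(n)              # arbitrary covariance structure

X = rng.multivariate_normal(mu, Sigma, size=N)
i = 1                                    # check one order statistic, say i = 1
Xi = np.sort(X, axis=1)[:, i]            # X_(i) in each sample

lhs = np.mean((X[:, 0] - mu[0]) * Xi)    # cov[X_1, X_(i)]
match = (X == Xi[:, None])               # which coordinate attains X_(i)
rhs = sum(Sigma[0, j] * np.mean(match[:, j]) for j in range(n))
print(lhs, rhs)
```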
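Corollaries 2 and 3 can be checked similarly. In the sketch below (again illustrative, assuming numpy, with arbitrary parameter values), the Poisson case takes $X_2, \dots, X_n$ Poisson as well, so that $\min(X_2, \dots, X_n)$ is integer-valued as the proof requires; the $t$ case draws $X_1^*$ with the scale given in Corollary 3.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2_000_000

# Corollary 2: X1 ~ Poisson(lam1), independent integer-valued X2, X3
lam = np.array([3.0, 4.0, 5.0])
X = rng.poisson(lam, size=(N, 3))
m = X.min(axis=1)
lhs = np.mean((X[:, 0] - lam[0]) * m)                  # cov[X1, min]
rhs = lam[0] * np.mean(X[:, 0] < X[:, 1:].min(axis=1))
print(lhs, rhs)

# Corollary 3: X1 ~ t_nu(0, sigma^2) independent of normal X2, X3
nu, sigma = 8, 1.0
X1 = sigma * rng.standard_t(nu, N)
rest = rng.normal(1.0, 2.0, size=(N, 2)).min(axis=1)
m = np.minimum(X1, rest)
X1s = np.sqrt(nu / (nu - 2)) * sigma * rng.standard_t(nu - 2, N)
lhs = np.mean(X1 * m) - np.mean(X1) * np.mean(m)       # cov[X1, min]
rhs = nu / (nu - 2) * sigma**2 * np.mean(X1s < rest)
print(lhs, rhs)
```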
More generally, Morris (1983) provides Stein-type identities (Theorem 5.2) for natural exponential families with quadratic variance functions (NEF-QVF), which can be used to extend the above arguments to this broader class of distributions without much difficulty. To illustrate, we compute $\mathrm{cov}[Y_1, \min(Y_1, \dots, Y_n)]$, where the $Y_i \sim \mathrm{Exponential}(\theta_i)$ are independent, with $E(Y_i) = 1/\theta_i$ and $\mathrm{var}(Y_i) = 1/\theta_i^2$. Stein's identity takes the form
$$\mathrm{cov}[Y_1, f(Y_1)] = \frac{1}{\theta_1}\, E[Y_1 f'(Y_1)].$$
Let $f_u(x) = \min(x, u)$; then $\mathrm{cov}[Y_1, f_u(Y_1)] = (1/\theta_1) E(Y_1 I_{\{Y_1 < u\}})$. Since $U = \min(Y_2, \dots, Y_n) \sim \mathrm{Exponential}(\theta_2 + \cdots + \theta_n)$,
$$\mathrm{cov}[Y_1, \min(Y_1, \dots, Y_n)] = \frac{1}{\theta_1} E[E(Y_1 I_{\{Y_1 < U\}} \mid U)] = \frac{1}{\theta_1} E[Y_1 \Pr(U > Y_1 \mid Y_1)] = \frac{1}{(\theta_1 + \cdots + \theta_n)^2}.$$

REMARK: First, note that this covariance does not depend on $\theta_1$ at all; a more general version is
$$\mathrm{cov}[Y_i, \min(Y_1, \dots, Y_n)] = \mathrm{var}[\min(Y_1, \dots, Y_n)].$$
Second, Professor Carl Morris provided another ancillarity argument for this result: consider a location-shift family of exponential distributions with known scale parameters $\theta_1, \dots, \theta_n$. Then $Y_{(1)} = \min(Y_1, \dots, Y_n)$ is complete and sufficient for the location parameter. Hence, by Basu's theorem, $Y_1 - Y_{(1)}$ is independent of $Y_{(1)}$, and the result follows.

Acknowledgement: The author thanks Professors Hal Stern and Carl Morris for bringing up the problem and for very stimulating discussions. Professor Morris also suggested the computations for the exponential distribution.

REFERENCES

Morris, C. N. (1983). Natural exponential families with quadratic variance functions: statistical theory. Ann. Statist., 11, 515-529.

Siegel, A. F. (1993). A surprising covariance involving the minimum of multivariate normal variables. J. Amer. Statist. Assoc., 88, 77-80.

Stein, C. M. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proc. Sixth Berkeley Symp. Math. Statist. Probab., 2, 583-602.

Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist., 9, 1135-1151.