Teor. Ĭmovir. ta Matem. Statyst.
Vip. 77, 2007, pp. 145–153
SOME FINITE SAMPLE PROPERTIES OF NEGATIVELY
DEPENDENT RANDOM VARIABLES
UDC 519.21
ALESSIO FARCOMENI
Abstract. We discuss some finite sample properties of vectors of negatively dependent random variables. We extend some inequalities widely used for independent
random variables to the case of negatively dependent random variables, and some
basic tools like the symmetrization lemma.
A sequence of random variables is said to be negatively (positively) dependent if

P( ⋂_{i=1}^n {Xi ≤ zi} ) ≤ (≥) ∏_{i=1}^n P(Xi ≤ zi)

and

P( ⋂_{i=1}^n {Xi > zi} ) ≤ (≥) ∏_{i=1}^n P(Xi > zi),
for zi ∈ R, i = 1, . . . , n. Negative dependence is implied, for instance, by negative association. A sequence of random variables is said to be negatively (positively) associated if, for all coordinate-wise non-decreasing functions g1 and g2 (acting, in the negatively associated case, on disjoint subsets of the coordinates),

Cov[g1(X1, . . . , Xn), g2(X1, . . . , Xn)] ≤ (≥) 0,

whenever the covariance exists. Positive association was introduced in [3], and negative association in [4]. Negative association implies negative dependence. Moreover, it is straightforward to prove that under either condition E[∏_i Xi] ≤ (≥) ∏_i E[Xi] (for non-negative random variables). Any subset of a set of negatively associated or negatively dependent random variables is still negatively associated or negatively dependent, and any non-decreasing function of negatively (positively) associated random variables is still negatively (positively) associated. Some further basic properties are given, for instance, in [7], and [2] reviews concepts of negative dependence.
We provide now a brief list of the most important cases of negatively associated random variables: multivariate normal random variables with non-positive (non-negative) correlations are negatively (positively) associated. Independent random variables are both positively and negatively associated. Multinomial, multivariate hypergeometric, and Dirichlet random variables are always negatively associated. For other examples, refer for instance to [4].
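For a quick numerical illustration of these examples (a minimal Monte Carlo sketch with arbitrary parameters, not drawn from the references): multinomial counts are negatively associated, hence negatively dependent, and any two of them are non-positively correlated.

    import numpy as np

    # Monte Carlo sketch: check one orthant inequality and the pairwise covariance
    # for two components of a multinomial vector (arbitrary parameters).
    rng = np.random.default_rng(0)
    counts = rng.multinomial(20, [0.3, 0.3, 0.4], size=200_000)
    X1, X2 = counts[:, 0], counts[:, 1]

    z1, z2 = 6, 6
    joint = np.mean((X1 <= z1) & (X2 <= z2))
    prod = np.mean(X1 <= z1) * np.mean(X2 <= z2)
    print(f"P(X1<=z1, X2<=z2) ~ {joint:.4f} <= {prod:.4f} ~ P(X1<=z1) P(X2<=z2)")
    print(f"sample Cov(X1, X2) ~ {np.cov(X1, X2)[0, 1]:.3f} (exact value: -20*0.3*0.3 = -1.8)")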
In this paper we show some properties of negatively dependent random variables. The
main goal is to extend tools and inequalities widely used for independent random variables. In Section 1 we provide the symmetrization lemma under arbitrary dependence.
In Section 2 we provide exponential tail inequalities and a symmetrization argument
for negatively dependent random variables. First, the well-known Hoeffding inequality is proved for negatively dependent random variables; then we provide an extension of the bounded difference inequality. Finally, we show a symmetrization argument.

2000 Mathematics Subject Classification. Primary 60E15, 47N30.
Key words and phrases. Negative dependence, association, Hoeffding inequality, exponential tail inequality, bounded difference inequality, empirical distribution, symmetrization lemma.
1. Tools
We begin by proving an extension of the symmetrization lemma under arbitrary dependence.
Definition 1 (Separability). Let (Y (u), u ∈ U) be a family of random variables on a
probability space (Ω, F , P). The family is called separable if there exists a countable set
U0 ⊆ U and a set E ∈ F such that
(1) P(E) = 1,
(2) for any ω ∈ E and for any u ∈ U there exists a sequence (uj, j ≥ 1) in U0 such that Y(uj, ω) → Y(u, ω) as j → ∞.
Lemma 1 (Symmetrization Lemma). Let (Y(u), u ∈ U) be a family of separable random variables, and (Y′(u), u ∈ U) an independent copy of (Y(u), u ∈ U) with the same joint distribution for any u1, . . . , un (that is, with the same dependency structure). Let P(|Y(u)| > ε/2) ≤ 1/2 for any u ∈ U. Then

P( sup_u |Y(u)| > ε ) ≤ 2 P( sup_u |Y(u) − Y′(u)| > ε/2 )

for any ε > 0.
Proof. If (Y(u), u ∈ U) is separable, then so is (Y′(u), u ∈ U). Moreover, there exists a countable set U0 ⊆ U such that sup_{u∈U} |Y(u)| = sup_{u∈U0} |Y(u)|. Let ui be the i-th element of U0. Let A1 = {|Y(u1)| > ε}, and

Ai = { |Y(u1)| ≤ ε, . . . , |Y(ui−1)| ≤ ε, |Y(ui)| > ε }

for i ≥ 2. Note that if |Y(ui)| > ε and |Y′(ui)| ≤ ε/2 then |Y(ui) − Y′(ui)| > ε/2. We have

(1/2) P( sup_{u∈U} |Y(u)| > ε ) = Σ_i (1/2) P(Ai)
≤ Σ_i P(Ai) P(|Y′(ui)| ≤ ε/2)
= Σ_i P(Ai, |Y′(ui)| ≤ ε/2)
≤ Σ_i P(Ai, |Y(ui) − Y′(ui)| > ε/2)
≤ Σ_i P(Ai, sup_{u∈U0} |Y(u) − Y′(u)| > ε/2)
≤ P( sup_{u∈U} |Y(u) − Y′(u)| > ε/2 ).
2. Exponential tail inequalities
2.1. Hoeffding inequality. The key step in proving the Hoeffding inequality for negatively dependent random variables is the conclusion of Lemma 2. Note that it can hold also under different assumptions. It is straightforward to see, for instance, that Lemma 2 is true for a vector of binary random variables whenever the covariance between any two of them is non-positive.
Lemma 2. Suppose X1, . . . , Xn is a vector of negatively dependent random variables. Then E(exp{t Σ_i Xi}) ≤ ∏_i E(exp{tXi}) for any t > 0.
Proof. This is a straightforward generalization of Lemma 1 in [6], stemming from the fact that if X1, . . . , Xn is a vector of negatively dependent random variables, then e^{tX1}, . . . , e^{tXn} is also negatively dependent.
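To give a flavour of the argument (a two-variable sketch; the general induction is carried out in [6]): for negatively dependent X1, X2 and t > 0 the pair (e^{tX1}, e^{tX2}) is again negatively dependent, and Hoeffding's covariance identity gives

Cov(e^{tX1}, e^{tX2}) = ∫∫ [ P(e^{tX1} ≤ u, e^{tX2} ≤ v) − P(e^{tX1} ≤ u) P(e^{tX2} ≤ v) ] du dv ≤ 0,

the integrand being non-positive by negative dependence; hence

E[e^{t(X1+X2)}] = E[e^{tX1}] E[e^{tX2}] + Cov(e^{tX1}, e^{tX2}) ≤ E[e^{tX1}] E[e^{tX2}].

In the bounded setting of Theorem 1 below all the expectations involved are finite.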
Theorem 1 (Hoeffding inequality). Let X1, . . . , Xn be a sequence of negatively dependent random variables. Let P(ai < Xi < bi) = 1 and E(Xi) = 0. Let Sn = Σ_i (bi − ai)²/8. Let ε > 0. Then, for any t > 0,

(1)    P( Σ_i Xi ≥ ε ) ≤ e^{−tε + t² Sn}.
Proof. By the Markov inequality and by Lemma 2,

P( Σ_i Xi ≥ ε ) = P( t Σ_i Xi ≥ tε ) = P( e^{t Σ_i Xi} ≥ e^{tε} ) ≤ e^{−tε} E[ e^{t Σ_i Xi} ] ≤ e^{−tε} ∏_i E[ e^{tXi} ].

The key difference between this proof and the one for independent random variables is in the last step, where Lemma 2 replaces with an inequality the equality that holds under independence. The rest of the proof is analogous to the proof of the Hoeffding inequality for independent random variables.
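For later reference, the standard optimization over t (not spelled out above) is immediate: since (1) holds for every t > 0 and the exponent −tε + t²Sn is minimized at t = ε/(2Sn), one obtains

P( Σ_i Xi ≥ ε ) ≤ exp{ −ε²/(4Sn) } = exp{ −2ε² / Σ_i (bi − ai)² }.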
It is interesting to note that, defining Sn = Σ_i E[Xi²], [1] shows an inequality (eq. 2.8 there) analogous to (1), for t close enough to zero, for unbounded negatively dependent random variables, under some additional assumptions on the higher order moments.
2.2. Bounded difference inequality. A generalization of the Hoeffding inequality is given by the bounded difference inequality, often used to convert bounds on the expected value into exponential tail inequalities. The bounded difference inequality for independent random variables was first derived in [5].
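A standard example of a function satisfying the bounded difference condition of Theorem 2 below (an illustration not taken from the paper) is the Kolmogorov–Smirnov statistic

g(x1, . . . , xn) = sup_z | n^{−1} Σ_i 1{xi ≤ z} − F(z) |:

changing a single coordinate changes at most one indicator in the average, so condition (2) holds with ci = 1/n; since Σ_i ci² = 1/n, the resulting tail bound is 2 exp{−2nt²}.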
Theorem 2 (Bounded difference inequality). Suppose g(·) satisfies the bounded difference assumption

(2)    sup_{x1,...,xn, x′i ∈ A} | g(x1, . . . , xn) − g(x1, . . . , xi−1, x′i, xi+1, . . . , xn) | ≤ ci,

for 1 ≤ i ≤ n, and any set A. Suppose X1, . . . , Xn is a vector of negatively dependent random variables. Then, for all t > 0,

P( |g(X1, . . . , Xn) − E[g(X1, . . . , Xn)]| ≥ t ) ≤ 2 exp{ −2t² / Σ_{i=1}^n ci² }.
Proof. We will prove that

(3)    P( g(X1, . . . , Xn) − E[g(X1, . . . , Xn)] ≥ t ) ≤ exp{ −2t² / Σ_{i=1}^n ci² }.

Similarly it can be proved that

P( E[g(X1, . . . , Xn)] − g(X1, . . . , Xn) ≥ t ) ≤ exp{ −2t² / Σ_{i=1}^n ci² }.

Combining these two results yields the claim.
The hypothesis of negative dependence is needed by the application of Theorem 1 in the following straightforward extension: let V and Z be such that E[V | Z] = 0 and, for some h(·) and c > 0, h(Z) ≤ V ≤ h(Z) + c. Then, for all s > 0,

(4)    E[ e^{sV} | Z ] ≤ e^{s²c²/8}.

Denote now V = g(X1, . . . , Xn) − E[g(X1, . . . , Xn)] and

Hi(X1, . . . , Xi) = E[ g(X1, . . . , Xn) | X1, . . . , Xi ],

and for any i define

Vi = Hi(X1, . . . , Xi) − Hi−1(X1, . . . , Xi−1).

Let Fi(x) = P(Xi < x | X1, . . . , Xi−1). Clearly, V = Σ_i Vi and

(5)    Hi−1(X1, . . . , Xi−1) = ∫ Hi(X1, . . . , Xi−1, x) Fi(dx).
Define moreover

Wi = sup_u Hi(X1, . . . , Xi−1, u) − Hi−1(X1, . . . , Xi−1)

and

Zi = inf_u Hi(X1, . . . , Xi−1, u) − Hi−1(X1, . . . , Xi−1).

Clearly, P(Zi ≤ Vi ≤ Wi) = 1 and

(6)    Wi − Zi = sup_u sup_v [ Hi(X1, . . . , Xi−1, u) − Hi(X1, . . . , Xi−1, v) ] ≤ ci

by the bounded difference assumption. Therefore, by (4), for any i,

(7)    E[ e^{sVi} | X1, . . . , Xi−1 ] ≤ e^{s²ci²/8}.
Finally, by the Chernoff bound, for any s > 0,

P( g(X1, . . . , Xn) − E[g(X1, . . . , Xn)] ≥ t ) ≤ e^{−st} E[ exp{ s Σ_{i=1}^n Vi } ]
= e^{−st} E[ exp{ s Σ_{i=1}^{n−1} Vi } E[ e^{sVn} | X1, . . . , Xn−1 ] ]
≤ e^{s²cn²/8} e^{−st} E[ exp{ s Σ_{i=1}^{n−1} Vi } ]
≤ e^{−st} exp{ s² Σ_i ci²/8 },

by repeating the same argument n times. Choosing s = 4t / Σ_i ci² yields inequality (3).

2.3. Inequalities for the empirical measure. We now provide some inequalities for
the empirical measure of negatively dependent random variables. In what follows we
define the empirical measure of a vector of random variables X1, . . . , Xn as

μn(A) = (1/n) Σ_i 1{Xi ∈ A},

while the empirical distribution considers a restriction to the class of sets A = (−∞, z] and will be denoted by F̂(z) = n^{−1} Σ_{i=1}^n 1{Xi ≤ z} for z ∈ R.
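In code, both quantities are plain sample averages of indicators (a minimal sketch, assuming the data are stored in a numpy array and the set A is given as a predicate):

    import numpy as np

    def empirical_measure(x, A):
        """mu_n(A): fraction of the observations x that fall in the set A (a predicate)."""
        return np.mean(A(x))

    def empirical_cdf(x, z):
        """F_hat(z) = n^{-1} * sum_i 1{x_i <= z}: the empirical measure of (-inf, z]."""
        return np.mean(x <= z)

    x = np.array([0.2, 1.5, -0.3, 0.7])
    print(empirical_measure(x, lambda v: np.abs(v) < 1))  # 0.75
    print(empirical_cdf(x, 0.5))                          # 0.5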
We will now restrict attention to a class of sets such that

P(Xi ∈ A, Xj ∈ A) ≤ P(Xi ∈ A) P(Xj ∈ A)

for any A in the class and any i ≠ j, without explicitly requiring negative dependence. Such a class is given, for instance, by the sets of the form (−∞, z], for z real, under (pairwise) negative dependence of the random variables.
Theorem 3. Let X1, . . . , Xn be a set of identically distributed random variables, and X′1, . . . , X′n be an independent copy of X1, . . . , Xn. Let μ(A) = P(Xi ∈ A),

μn(A) = (1/n) Σ_i 1{Xi ∈ A}

and μ′n(A) = n^{−1} Σ_i 1{X′i ∈ A}. Suppose there exists a class of sets A such that

P(Xi ∈ Ai, Xj ∈ Aj) ≤ P(Xi ∈ Ai) P(Xj ∈ Aj)

for Ai, Aj ∈ A and i ≠ j, and that (μn(A), A ∈ A) is separable. We have that for any ε > 0 and n ≥ 2/ε²

(8)    P( sup_{A∈A} |μn(A) − μ(A)| > ε ) ≤ 2 P( sup_{A∈A} |μn(A) − μ′n(A)| > ε/2 ).

Furthermore,

(9)    E[ sup_{A∈A} |μn(A) − μ(A)| ] ≤ E[ sup_{A∈A} |μn(A) − μ′n(A)| ].
Proof. Let X′1, . . . , X′n be an independent copy of X1, . . . , Xn: a vector of random variables independent of the first one but with the same dependence structure. Let μ′n(A) = n^{−1} Σ_i 1{X′i ∈ A} and recall that under (pairwise) negative dependence

(10)    P(Xi ≤ xi, Xj ≤ xj) ≤ P(Xi ≤ xi) P(Xj ≤ xj).

Our hypothesis is slightly more general, since we do not assume negative dependence but only that

(11)    P(Xi ∈ Ai, Xj ∈ Aj) ≤ P(Xi ∈ Ai) P(Xj ∈ Aj).

Negative dependence and sets of the form (−∞, z] suffice for assumption (11). The main steps of the proof are as follows: it is easy to see that under the assumptions

(12)    V[μn(A) − μ(A)] ≤ μ(A)(1 − μ(A)) / n.

In fact, we have that E[μn(A) − μ(A)] = 0, which implies V[μn(A) − μ(A)] = E[(μn(A) − μ(A))²].
We have

E[(μn(A) − μ(A))²] = E[ (1/n²) Σ_{i,j} 1{Xi ∈ A} 1{Xj ∈ A} ] − μ²(A)
= (1/n²) Σ_{i,j} E[ 1{Xi ∈ A} 1{Xj ∈ A} ] − μ²(A)
= (1/n²) Σ_{i,j} P(Xi ∈ A, Xj ∈ A) − μ²(A)
= (1/n²) [ Σ_{i≠j} P(Xi ∈ A, Xj ∈ A) + Σ_{i=1}^n P(Xi ∈ A) − n² μ²(A) ]
≤ (1/n²) [ Σ_{i≠j} P(Xi ∈ A) P(Xj ∈ A) + n μ(A) − n² μ²(A) ],

where we used inequality (11) in the last step. The last expression is easily seen to be equal to μ(A)(1 − μ(A))/n, as desired; hence (12) is true.
We can now apply the Chebyshev inequality to the random variable μn(A) − μ(A), together with inequality (12), to obtain

P( |μn(A) − μ(A)| > ε/2 ) ≤ (4/ε²) V[μn(A) − μ(A)]
≤ (4/ε²) μ(A)(1 − μ(A)) / n
≤ 1/(nε²) ≤ 1/2    for all n ≥ 2/ε².

We can then apply Lemma 1, since we have separability. Hence, for n ≥ 2/ε²,

(13)    P( sup_{A∈A} |μn(A) − μ(A)| > ε ) ≤ 2 P( sup_{A∈A} |μn(A) − μ′n(A)| > ε/2 ).
To see the second inequality, note that μ(A) = E[μ′n(A) | X1, . . . , Xn]; by the Jensen inequality and the law of iterated expectation we get

E[ sup_{A∈A} |μn(A) − μ(A)| ] ≤ E[ sup_{A∈A} E[ |μn(A) − μ′n(A)| | X1, . . . , Xn ] ]
≤ E[ sup_{A∈A} |μn(A) − μ′n(A)| ].

If we restrict to n = 2, we can provide a further randomization argument:
Theorem 4. Let X1 and X2 be negatively dependent random variables, and X′1 and X′2 be an independent copy. Let σ1 and σ2 be independent sign variables, such that

P(σi = 1) = P(σi = −1) = 1/2.

Let μ(A) = P(Xi ∈ A), μ2(A) = (1/2) Σ_i 1{Xi ∈ A} and μ′2(A) = (1/2) Σ_i 1{X′i ∈ A}. Suppose there exists a class of sets A such that P(X1 ∈ A, X2 ∈ A) ≤ P(X1 ∈ A) P(X2 ∈ A) for A ∈ A, and that (μ2(A), A ∈ A) is separable. We have that for any A ∈ A and any ε > 0

P( |μ2(A) − μ′2(A)| > ε/2 ) ≤ P( |(1/2) Σ_i σi(1{Xi ∈ A} − 1{X′i ∈ A})| > ε/2 ).
To prove Theorem 4 we need a preparatory lemma, whose results may be of interest
per se when working with negatively dependent random variables.
Lemma 3. Suppose for any A ∈ A that P(X2 ∈ A, X1 ∈ A) ≤ P(X2 ∈ A) P(X1 ∈ A). Then

P(X2 ∈ A | X1 ∈ A) ≤ P(X2 ∈ A),

while if at least one of the two marginal probabilities is smaller than 1,

P(X2 ∈ A | X1 ∉ A) ≥ P(X2 ∈ A).

Moreover,

P(X2 ∈ A, X1 ∉ A) ≥ P(X2 ∈ A) P(X1 ∉ A),

and finally

P(X2 ∉ A, X1 ∉ A) ≤ P(X2 ∉ A) P(X1 ∉ A).
Proof. The first inequality follows from the definition of conditional probability. To see the second one we can apply the Bayes theorem and the first inequality to get

P(X2 ∈ A | X1 ∉ A) = P(X1 ∉ A | X2 ∈ A) P(X2 ∈ A) / P(X1 ∉ A)
= (1 − P(X1 ∈ A | X2 ∈ A)) P(X2 ∈ A) / P(X1 ∉ A)
≥ (1 − P(X1 ∈ A)) P(X2 ∈ A) / P(X1 ∉ A)
= P(X2 ∈ A),

assuming without loss of generality that P(X1 ∈ A) < 1. The third inequality follows from

P(X2 ∈ A, X1 ∉ A) = P(X1 ∉ A | X2 ∈ A) P(X2 ∈ A)
= (1 − P(X1 ∈ A | X2 ∈ A)) P(X2 ∈ A)
≥ (1 − P(X1 ∈ A)) P(X2 ∈ A) = P(X1 ∉ A) P(X2 ∈ A).
The last inequality follows from

P(X2 ∉ A, X1 ∉ A) = (1 − P(X2 ∈ A | X1 ∉ A)) P(X1 ∉ A)
≤ (1 − P(X2 ∈ A)) P(X1 ∉ A) = P(X2 ∉ A) P(X1 ∉ A).
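The four inequalities can be checked exactly on a toy example (an illustration with an assumed toy distribution, not from the paper: two draws without replacement from {1, . . . , 5}, which satisfy the hypothesis, and A = {x ≤ 3}):

    from itertools import permutations

    pairs = list(permutations(range(1, 6), 2))    # two draws without replacement, 20 ordered pairs
    p = 1.0 / len(pairs)                          # each ordered pair is equally likely
    in_A = lambda x: x <= 3                       # the set A = {x <= 3}

    pA1 = sum(p for x1, _ in pairs if in_A(x1))   # P(X1 in A) = 0.6
    pA2 = sum(p for _, x2 in pairs if in_A(x2))   # P(X2 in A) = 0.6
    p_both = sum(p for x1, x2 in pairs if in_A(x1) and in_A(x2))
    p_2_not1 = sum(p for x1, x2 in pairs if in_A(x2) and not in_A(x1))
    p_none = sum(p for x1, x2 in pairs if not in_A(x1) and not in_A(x2))

    assert p_both <= pA1 * pA2                    # hypothesis of Lemma 3
    assert p_both / pA1 <= pA2                    # P(X2 in A | X1 in A) <= P(X2 in A)
    assert p_2_not1 / (1 - pA1) >= pA2            # P(X2 in A | X1 not in A) >= P(X2 in A)
    assert p_2_not1 >= pA2 * (1 - pA1)            # third inequality
    assert p_none <= (1 - pA2) * (1 - pA1)        # fourth inequality
    print("all four conclusions of Lemma 3 hold in this example")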
We can now prove Theorem 4.
Proof. Let now X′1 and X′2 be an independent copy of X1 and X2. By Theorem 3 we claim that

P( |μ2(A) − μ(A)| > ε ) ≤ 2 P( |μ2(A) − μ′2(A)| > ε/2 ),

since that is true if one takes the supremum over A ∈ A inside. Let now σ1 and σ2 be sign variables. We will see that a randomization argument can be used here. A possibility for further work is to try and extend the randomization argument to arbitrary n. We have now that

(14)    P( |μ2(A) − μ′2(A)| > ε/2 ) ≤ P( |(1/2) Σ_i σi(1{Xi ∈ A} − 1{X′i ∈ A})| > ε/2 ).

If the random variables are independent, equality follows from identical distribution after randomization. Under dependence, identical distribution is not guaranteed any more, as is the case here.
To see (14), let

Yi = 1{Xi ∈ A} − 1{X′i ∈ A}.

We now show that P(Y2 = 1, Y1 = 1) = P(Y2 = −1, Y1 = −1):

P(Y2 = 1, Y1 = 1) = P(Y2 = −1, Y1 = −1)
⇐⇒ P(Y2 = 1 | Y1 = 1) = P(Y2 = −1 | Y1 = −1)
⇐⇒ P(X2 ∈ A, X′2 ∉ A | X1 ∈ A, X′1 ∉ A) = P(X2 ∉ A, X′2 ∈ A | X1 ∉ A, X′1 ∈ A)
⇐⇒ P(X2 ∈ A | X1 ∈ A) P(X′2 ∉ A | X′1 ∉ A) = P(X2 ∉ A | X1 ∉ A) P(X′2 ∈ A | X′1 ∈ A),

and the last equality is true since we took X′1 and X′2 to be a copy of X1 and X2 with the same dependency structure. With the same strategy it can be seen that, for k ∈ {−1, 0, 1},

P(Y2 = k, Y1 = k) = P(Y2 = −k, Y1 = −k),
P(Y2 = k, Y1 = −k) = P(Y2 = −k, Y1 = k),

and finally that

P(Y2 = k, Y1 = 0) = P(Y2 = −k, Y1 = 0).
On the other hand,

(15)    P(Y2 = 1, Y1 = 1) ≤ P(Y2 = −1, Y1 = 1).

We can in fact apply the results of Lemma 3 to show that

P(Y2 = 1, Y1 = 1) = P(X2 ∈ A, X′2 ∉ A, X1 ∈ A, X′1 ∉ A)
= P(X2 ∈ A, X1 ∈ A) P(X′2 ∉ A, X′1 ∉ A)
≤ P(X2 ∈ A) P(X2 ∉ A) P(X1 ∈ A) P(X1 ∉ A)

and

P(Y2 = 1, Y1 = −1) = P(X2 ∈ A, X′2 ∉ A, X1 ∉ A, X′1 ∈ A)
= P(X2 ∈ A, X1 ∉ A) P(X′2 ∉ A, X′1 ∈ A)
≥ P(X2 ∈ A) P(X2 ∉ A) P(X1 ∉ A) P(X1 ∈ A).

By identical distribution of Xi and X′i, (15) follows.
With a few explicit calculations, the results on the joint distribution of (Y1, Y2) can be used to compute the distribution of the empirical measure:

P( |μ2(A) − μ′2(A)| = 1 ) = P(Y1 + Y2 = 2) + P(Y1 + Y2 = −2) = 2 P(Y1 = 1, Y2 = 1).

On the other hand,

P( |Σ_i σi(1{Xi ∈ A} − 1{X′i ∈ A})| = 2 ) = 2 P(σ1Y1 = 1, σ2Y2 = 1)
= 2 [ P(σ1 = σ2) P(Y1 = 1, Y2 = 1) + P(σ1 ≠ σ2) P(Y1 = 1, Y2 = −1) ]
≥ 2 P(Y1 = 1, Y2 = 1).

Moreover,

P( |μ2(A) − μ′2(A)| = 0 ) = P(Y1 = 0, Y2 = 0) + 2 P(Y1 = 1, Y2 = −1),

while

P( |σ1Y1 + σ2Y2| = 0 ) = P(Y1 = 0, Y2 = 0) + 2 P(σ1 = σ2) P(Y1 = 1, Y2 = −1) + 2 P(σ1 ≠ σ2) P(Y1 = 1, Y2 = 1)
≤ P(Y1 = 0, Y2 = 0) + 2 P(Y1 = 1, Y2 = −1).

These results are combined, considering separately the cases ε/2 ≥ 1, 1/2 ≤ ε/2 < 1 and ε/2 < 1/2, to see that

P( |μ2(A) − μ′2(A)| > ε/2 ) ≤ P( |(1/2) Σ_i σi(1{Xi ∈ A} − 1{X′i ∈ A})| > ε/2 ),

that is, inequality (14).
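Inequality (14) is easy to verify by exact enumeration on a toy example (an illustration under assumed toy distributions, not taken from the paper: (X1, X2) are two draws without replacement from {1, . . . , 4}, hence negatively dependent, (X′1, X′2) is an independent copy, A = {x ≤ 2} and ε = 0.9):

    from itertools import permutations, product

    pairs = list(permutations(range(1, 5), 2))      # 12 equally likely ordered pairs
    signs = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # independent fair signs
    in_A = lambda x: x <= 2
    eps = 0.9

    lhs = rhs = 0.0
    w = 1.0 / (len(pairs) ** 2 * len(signs))        # weight of each configuration
    for (x1, x2), (xp1, xp2), (s1, s2) in product(pairs, pairs, signs):
        y1 = in_A(x1) - in_A(xp1)                   # Y1 = 1{X1 in A} - 1{X1' in A}
        y2 = in_A(x2) - in_A(xp2)
        lhs += w * (abs(y1 + y2) / 2 > eps / 2)     # |mu_2(A) - mu_2'(A)| > eps/2
        rhs += w * (abs(s1 * y1 + s2 * y2) / 2 > eps / 2)

    print(f"P(|mu_2 - mu_2'| > eps/2) = {lhs:.4f} <= {rhs:.4f} = randomized probability")

Here lhs evaluates to 0.5 and rhs to about 0.583, consistently with (14).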
3. Discussion
We proved that negatively dependent random variables enjoy certain special properties of independent random variables, in particular in terms of the Hoeffding and bounded difference inequalities and of the possibility of applying symmetrization. These tools pave
the road to inequalities for the empirical distribution of negatively dependent random
variables.
Acknowledgements. The author is grateful to Prof. Enzo Orsingher for advice and encouragement, and to a referee for a clarifying review and for pointing out the reference [1].
References
1. M. D. Amini and A. Bozorgnia, Complete convergence for negatively dependent random sequences, Journal of Applied Mathematics and Stochastic Analysis 16 (2003), 121–126.
2. H. W. Block, T. H. Savits, and M. Shaked, Some concepts of negative dependence, The Annals
of Probability 10 (1982), 765–772.
3. J. D. Esary, F. Proschan, and D. W. Walkup, Association of random variables, with applications, The Annals of Mathematical Statistics 38 (1967), 1466–1474.
4. K. Joag-Dev and F. Proschan, Negative association of random variables, with applications, The Annals of Statistics 11 (1983), 286–295.
5. C. McDiarmid, On the method of bounded differences, Surveys in Combinatorics, Cambridge
University Press, 1989, pp. 148–188.
6. A. Volodin, On the Kolmogorov exponential inequality for negatively dependent random variables, Pakistan Journal of Statistics 18 (2002), 249–253.
7. Y. L. Tong, Probability Inequalities in Multivariate Distributions, Academic Press, 1980.
University of Rome “La Sapienza”, Piazzale Aldo Moro 5, 00185 Roma, Italy
E-mail address: [email protected]
Received 10/08/2006