Stochastic Order Induced by a Measurable Preorder
by
David C. Nachman
Department of Finance
J. Mack Robinson College of Business
Georgia State University
Atlanta, Georgia 30303-3083
Phone: 404-651-1696
Fax: 404-651-2630
E-mail: [email protected]
November, 2005
Abstract. Kamae et al. [8, Theorem 1] present a general characterization of the partial ordering of probability measures induced by a closed partial ordering on the underlying Polish state space. A preorder is a reflexive and transitive, but not necessarily antisymmetric, relation. This paper presents a similar characterization of the preordering of probability measures induced by a measurable preordering on the underlying Polish space. We then apply this result to obtain a characterization of stochastic majorization, the preorder induced by the widely applied majorization preorder on Euclidean space. We show that a multifunction associated with this preorder is compact and convex valued and continuous, and hence satisfies the hypotheses of our characterization. The continuity properties of the majorization ordering and the induced stochastic majorization ordering have not been widely recognized and are of interest in their own right.
Kamae et al. [8, Theorem 1] present a general characterization of the partial ordering of probability measures induced by a closed partial ordering on the underlying Polish state space. A preorder is a reflexive and transitive, but not necessarily antisymmetric, relation, and so is a weaker kind of relation than a partial order. The intention here is to present a similar characterization of the preordering of probability measures induced by a measurable preordering on the underlying Polish space. Marshall and Olkin [9, Ch. 17.C] already hint that Kamae et al. [8, Proposition 1] is “. . . a much more general version involving general preorders.” Specifically, we exploit the properties of the preorder and theorems of Strassen [11] and Himmelberg and Van Vleck [6] to provide a characterization of the induced stochastic preorder in the style of Kamae et al.
We then apply this result to obtain a characterization of stochastic majorization, Marshall and Olkin [9, Ch. 11], the preorder induced by the widely applied majorization preorder on Euclidean space. We show that a multifunction associated with this preorder is compact and convex valued and continuous, and hence satisfies the hypotheses of our characterization. The continuity properties of the majorization ordering and the induced stochastic majorization ordering have not been widely recognized and are of interest in their own right.
A Borel space $S$ is a Borel subset of a Polish space, that is, of a complete separable metric space. A multifunction $F$ from $S$ to a set $T$ is a function with domain $S$ whose value $F(s)$ is a nonempty subset of $T$ for each $s \in S$. It is Borel measurable if $F^{-1}(B) = \{s \in S : F(s) \cap B \neq \emptyset\}$ is a Borel set in $S$ for each closed subset $B$ of $T$.
Let $R$ be a Polish space. Throughout, $R$ is assumed to be endowed with a preorder, denoted by $\preceq$, a reflexive and transitive relation on $R$. For each $x \in R$, let $\gamma(x) = \{y \in R : y \preceq x\}$. By reflexivity, $x \in \gamma(x)$, and by transitivity, $y \in \gamma(x)$ implies that $\gamma(y) \subset \gamma(x)$. Thus $\gamma$ is a multifunction from $R$ to $R$. We say the preorder $\preceq$ is a measurable preorder if the multifunction $\gamma$ is Borel measurable. We assume that $\gamma$ is compact valued and Borel measurable. For compact valued multifunctions, Borel measurability has many equivalent definitions. See [7, Theorem 3].
Let $P(R)$, $P(R \times R)$, denote the set of probability measures on $R$, respectively on $R \times R = R^2$, endowed with the topology of weak convergence. In this case, $P(R)$ and $P(R \times R)$ are both Polish spaces (in the Prohorov metric, [2, Theorem 6.8]). For $\mu \in P(R)$ denote by $\mathrm{supp}(\mu)$ the support of $\mu$, the smallest closed subset of $R$ with $\mu$ measure one, [3, p. 18]. For each $x \in R$, let $\Gamma(x) = \{\mu \in P(R) : \mathrm{supp}(\mu) \subset \gamma(x)\}$ and let $\delta_x \in P(R)$ be the probability measure such that $\mathrm{supp}(\delta_x) = \{x\}$.
Theorem 1. For each $x \in R$, $\delta_x \in \Gamma(x)$, and $y \in \gamma(x)$ implies that $\Gamma(y) \subset \Gamma(x)$. $\Gamma$ is a compact and convex valued Borel measurable multifunction from $R$ to $P(R)$.
Proof. The first statement follows from the same properties of $\gamma$. Thus $\Gamma$ is a multifunction. For each $x \in R$, $\Gamma(x)$ is convex, closed and tight. By Prohorov’s theorem, [2, Theorem 5.1], $\Gamma(x)$ is compact. That $\Gamma$ is Borel measurable follows from Himmelberg and Van Vleck [6, Theorem 3 (ii)].
Let $\mathcal{F}$ denote the collection of real valued Borel measurable functions on $R$ that are increasing (nondecreasing) in the preorder $\preceq$ on $R$, i.e., for $f : R \to \mathbb{R}$, $\mathbb{R}$ the real line, $f$ Borel measurable, $f \in \mathcal{F}$ if and only if for $x, y \in R$ and $x \preceq y$, $f(x) \le f(y)$.
We extend the preorder $\preceq$ to $P(R)$ as follows. For $\mu, \nu \in P(R)$, say that $\nu$ is larger in this extended preorder than $\mu$, and write $\mu \preceq \nu$, if and only if
$$\int f \, d\mu \le \int f \, d\nu$$
for every $f \in \mathcal{F}$ for which both these integrals exist. Then our extension of $\preceq$ to $P(R)$ is truly an extension since for $x, y \in R$, $x \preceq y$ in $R$ if and only if $\delta_x \preceq \delta_y$ in $P(R)$.
Intuitively, $\mu \preceq \nu$ if $\mu$ puts more weight on elements that are less extreme in the relation $\preceq$ on $R$ than does $\nu$. This intuition is formalized by the characterization of $\preceq$ on $P(R)$ given below (Theorem 2).
This definition of $\preceq$ on $P(R)$ is typical in the literature on stochastic orderings ([8] and [9, 17.A.3]), but there are many others. See [10, Chs. 1, 4]. Many definitions are given in terms of $R$ valued random quantities, say $X$ and $Y$. For example, one version of stochastic majorization of interest here is the relation $E_1$ in [9, Ch. 11]. This relation, denoted $\preceq_{E_1}$, is stated as $X \preceq_{E_1} Y$, $Y$ stochastically majorizes $X$ in the sense of $E_1$, if $E[f(X)] \le E[f(Y)]$ for all $f \in \mathcal{F}$ for which these expectations exist. In [9, Ch. 11], $\mathcal{F}$ is the cone of Borel measurable Schur convex functions. It is easy to see that this is equivalent to the above definition, since these expectations are given by integration with respect to the distributions in $R$ of these random quantities and, given these distributions, there are $R$ valued random quantities with these distributions.
There is another definition of stochastic majorization, which Marshall and Olkin [9, pp. 282-283] call $P_1$, that implies $E_1$ and appears ostensibly to be stronger than $E_1$. There $X \preceq_{P_1} Y$ if $f(X) \le_{st} f(Y)$ for all $f \in \mathcal{F}$, where $\le_{st}$ has the typical meaning of stochastically larger, [10, 1.A]. Clearly $P_1$ implies $E_1$ since stochastically larger random variables have larger expectations. It turns out that in this particular case we also have $E_1$ implies $P_1$. See the argument in [9, top of p. 283]. We will use this argument to show one part of the characterization of the relation $\preceq$ in $P(R)$ defined above. Here the orders $\preceq_{E_1}$ and $\preceq_{P_1}$ are the orderings as defined above, but in the abstract setting of this paper.
Let $\mathcal{B}$ denote the Borel subsets of $R$. A Markov kernel on $R$ is a map $m : R \times \mathcal{B} \to [0,1]$ such that for each set $B \in \mathcal{B}$ the map $x \mapsto m(x, B)$ is Borel measurable and for $x \in R$ fixed $m(x) = m(x, \cdot) \in P(R)$. For such a Markov kernel $m$ and a probability measure $\nu \in P(R)$ denote by $\nu \otimes m$ the element of $P(R^2)$ defined by
$$(\nu \otimes m)(A \times B) = \int_A m(x, B) \, \nu(dx)$$
for measurable rectangles, $A, B \in \mathcal{B}$. We say that the first marginal of $\nu \otimes m$ is $\nu$ and denote the second marginal $\nu m$. Finally, we say that a set $B \subset R$ is increasing if its indicator function belongs to $\mathcal{F}$ (necessarily, then $B \in \mathcal{B}$). These designations are borrowed from [8, pp. 899-900].
The following is the desired characterization of $\preceq$ on $P(R)$ and fleshes out the intuition given above.
Theorem 2. For $\mu, \nu \in P(R)$ the following are equivalent:
(i) $\mu \preceq \nu$;
(ii) There exists a Markov kernel $m$ on $R$ such that $\mu = \nu m$ and $m(x) \in \Gamma(x)$, $\nu$ almost every $x \in R$;
(iii) There exists a probability measure $\lambda \in P(R^2)$ with $\lambda(K) = 1$ with first marginal $\nu$ and second marginal $\mu$;
(iv) There exists a real valued random variable $Z$ and two measurable functions $f, g : \mathbb{R} \to R$ with $f \preceq g$ (i.e., $f(t) \preceq g(t)$, $t \in \mathbb{R}$) such that the distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$;
(v) There exist $R$ valued random variables $Y$ and $X$ such that $X \preceq_{P_1} Y$ and the distribution of $X$ is $\mu$ and the distribution of $Y$ is $\nu$;
(vi) $\mu(B) \le \nu(B)$ for every increasing set $B \in \mathcal{B}$.
Proof: The key equivalence is (i) and (ii). The rest follow easily. Let $\mu, \nu \in P(R)$ and assume that (i) holds. For every bounded continuous function $z : R \to \mathbb{R}$ define $h(x, z) = \sup\{\int z \, d\lambda : \lambda \in \Gamma(x)\}$. By Theorem 1 and [7, Theorem 2], $h(\cdot, z)$ is Borel measurable in $x$, and for each $x \in R$, $z(x) \le h(x, z) \le \sup z$ since $\delta_x \in \Gamma(x)$ and $\int z \, d\delta_x = z(x)$. Thus $h(\cdot, z)$ is bounded as well, so all integrals below exist. Finally, $h(\cdot, z) \in \mathcal{F}$, for if $x, y \in R$ and $x \preceq y$ then by Theorem 1, $\Gamma(x) \subset \Gamma(y)$, and hence $h(x, z) \le h(y, z)$. It follows that
$$\int z \, d\mu \le \int h(x, z) \, \mu(dx) \le \int h(x, z) \, \nu(dx),$$
the last inequality from (i). Condition (ii) then follows from [11, Theorem 3].
Assume (ii) and let $\lambda = \nu \otimes m$. Since $\gamma$ is Borel measurable, its graph $K$ is a Borel subset of $R^2$, [7, Theorem 3]. Then $\lambda(K) = \int m(x, K_x) \, \nu(dx) = 1$, since the $x$-section $K_x = \gamma(x)$ for every $x$. This gives (iii). Therefore assume (iii). The construction in [8, Theorem 1(iii)] goes through here as well and this gives (iv). Assuming (iv) let $X = f(Z)$ and $Y = g(Z)$. Then clearly $X \preceq_{E_1} Y$, and (v) follows from the fact that $X \preceq_{E_1} Y$ implies $X \preceq_{P_1} Y$, since $\mathcal{F}$ satisfies [9, (3) p. 283]. Therefore assume (v). If $B \in \mathcal{B}$ and the indicator $I_B \in \mathcal{F}$, then $\mu(B) = E[I_B(X)] \le E[I_B(Y)] = \nu(B)$, which is (vi), where the inequality follows from the fact that $I_B(X) \le_{st} I_B(Y)$.
It remains to show that (vi) implies (i). Assume (vi). For $f \in \mathcal{F}$, $I_{\{x \in R : f(x) > t\}} \in \mathcal{F}$ for all real $t$. It follows from (vi) that
$$\mu\{x \in R : f(x) > t\} \le \nu\{x \in R : f(x) > t\}.$$
Since $f^+ \in \mathcal{F}$,
$$\int f^+ \, d\mu = \int_0^\infty \mu\{x \in R : f^+(x) > t\} \, dt \le \int_0^\infty \nu\{x \in R : f^+(x) > t\} \, dt = \int f^+ \, d\nu$$
(the equalities here hold by [1, (4), p. 223]). Also $f \in \mathcal{F}$ implies that $-f^- \in \mathcal{F}$, and we get that
$$\nu\{x \in R : f^-(x) > t\} \le \mu\{x \in R : f^-(x) > t\}.$$
As a result,
$$\int f^- \, d\nu = \int_0^\infty \nu\{x \in R : f^-(x) > t\} \, dt \le \int_0^\infty \mu\{x \in R : f^-(x) > t\} \, dt = \int f^- \, d\mu$$
(the equalities here again by [1, (4), p. 223]). If both the integrals $\int f \, d\mu$, $\int f \, d\nu$ exist, it follows that $\int f \, d\mu \le \int f \, d\nu$, which is (i).
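The tail-integral identity invoked twice in the last step can be checked by hand on a finitely supported measure. The following Python sketch is only an illustration under stated assumptions: the measure `mu`, the function `f`, and all helper names are hypothetical choices made for this example and do not appear in the paper.

```python
from fractions import Fraction as Fr

def integral(g, mu):
    """Integral of g against a finitely supported measure mu (dict: point -> mass)."""
    return sum(w * g(x) for x, w in mu.items())

def tail(g, mu, t):
    """The mass mu{x : g(x) > t}."""
    return sum(w for x, w in mu.items() if g(x) > t)

def layer_cake(g, mu):
    """Integral over t >= 0 of mu{g > t}, assuming g is nonnegative.

    On a finite support the map t -> mu{g > t} is a step function that is
    constant between consecutive values of g, so the integral is a finite sum.
    """
    levels = sorted({Fr(0)} | {Fr(g(x)) for x in mu})
    return sum((hi - lo) * tail(g, mu, lo) for lo, hi in zip(levels, levels[1:]))

def positive_part(f):
    return lambda x: max(f(x), 0)

def negative_part(f):
    return lambda x: max(-f(x), 0)

# Hypothetical example data: uniform mass on four real points, f the identity.
mu = {-2: Fr(1, 4), 0: Fr(1, 4), 1: Fr(1, 4), 3: Fr(1, 4)}
f = lambda x: x

plus = (integral(positive_part(f), mu), layer_cake(positive_part(f), mu))
minus = (integral(negative_part(f), mu), layer_cake(negative_part(f), mu))
print(plus, minus)
```

In this example both computations of the integral of the positive part give 1 and both computations of the integral of the negative part give 1/2, so the integral of `f` is their difference, 1/2.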
The crucial implication (i) implies (ii) in Theorem 2 relies on Theorem 3 of Strassen [11]. In obtaining essentially the same implication Kamae et al. [8, Theorem 1, (i) implies (ii)] and [9, 17.B.1] use Theorem 11 of Strassen [11], which requires that the graph $K$ of the multifunction $\gamma$ be closed. Our assumption of measurability of $\gamma$ yields the weaker condition that $K$ is a measurable set. Of course if $K$ is closed, as it is in the majorization application to follow, then for the measure $\lambda$ in Theorem 2(iii), $\mathrm{supp}(\lambda) \subset K$.
All that is required to use Theorem 3 of [11] is that the preorder be sufficiently regular to give Borel measurability in $x$ of the function $h(x, z)$, defined above in the first paragraph of the proof of Theorem 2. Some measurability of the multifunction $\gamma$ seems essential to this result, but weaker conditions than compact valuedness of $\gamma$ may give the result. See for example [5, Proposition 3, p. 60]. Theorem 2 of [7], however, is very handy.
Transitivity of the preorder gives monotonicity of $h(\cdot, z)$ in the preorder. Reflexivity of the preorder gives $\delta_x \in \Gamma(x)$, and $\Gamma(x)$ is convex whether $\gamma(x)$ is or not. This convexity is essential to apply [11, Theorem 3], but it comes at no cost.
The result Theorem 2(ii) formalizes our intuition expressed above that $\mu \preceq \nu$ if $\mu$ puts more weight on elements that are less extreme in the relation $\preceq$ on $R$ than does $\nu$. The Markov kernel $m$ of Theorem 2(ii) is such that for $\nu$ almost every $x$, $m(x, \gamma(x)) = 1$. In this sense $m$ shifts weight of $\nu$ to elements less extreme in the relation $\preceq$ on $R$. Borrowing from the language suggested in [8, top of p. 900], the kernel $m$ might be termed “downward.”
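A small finite example may make the "downward" kernel concrete. The Python sketch below is an illustration, not the paper's construction: it assumes a three-point space with the usual order on integers as the preorder, and the measure `nu`, the kernel `kernel`, and all function names are hypothetical. It pushes `nu` through a kernel supported on lower elements and checks the inequality of Theorem 2(vi) on every increasing set.

```python
from fractions import Fraction as Fr
from itertools import combinations

points = [0, 1, 2]

def below(y, x):
    """The preorder y <= x; here simply the usual order on integers (an assumption)."""
    return y <= x

# Hypothetical larger measure nu and a kernel that only moves mass downward.
nu = {0: Fr(1, 4), 1: Fr(1, 4), 2: Fr(1, 2)}
kernel = {
    0: {0: Fr(1)},
    1: {0: Fr(1, 2), 1: Fr(1, 2)},
    2: {1: Fr(1, 2), 2: Fr(1, 2)},
}

def push_forward(nu, kernel):
    """Second marginal: mu(y) = sum over x of nu(x) * m(x, {y})."""
    mu = {p: Fr(0) for p in points}
    for x, w in nu.items():
        for y, p in kernel[x].items():
            mu[y] += w * p
    return mu

def is_downward(kernel):
    """Each m(x, .) puts all of its mass on elements below x."""
    return all(below(y, x) for x, row in kernel.items()
               for y, p in row.items() if p > 0)

def increasing_sets():
    """Subsets B with: x in B and x below y implies y in B."""
    for r in range(len(points) + 1):
        for subset in combinations(points, r):
            s = set(subset)
            if all(y in s for x in s for y in points if below(x, y)):
                yield s

mu = push_forward(nu, kernel)
dominated = all(sum(mu[p] for p in b) <= sum(nu[p] for p in b)
                for b in increasing_sets())
print(mu, dominated)
```

Here `mu` assigns masses 3/8, 3/8 and 1/4 to the points 0, 1 and 2, and `dominated` is true, in line with the equivalence of (ii) and (vi) in this toy setting.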
Let us now consider the application to majorization. Let $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ be n-tuples of real numbers and let $(x_{[1]}, \ldots, x_{[n]})$ and $(y_{[1]}, \ldots, y_{[n]})$ denote the vectors $x$ and $y$ with coordinates rearranged in decreasing order, i.e., $x_{[1]} \ge \cdots \ge x_{[n]}$ and $y_{[1]} \ge \cdots \ge y_{[n]}$. The vector $x$ is majorized by the vector $y$ (or $y$ majorizes $x$), written $x \prec y$, if for each $k = 1, \ldots, n$,
$$\sum_{i=1}^{k} x_{[i]} \le \sum_{i=1}^{k} y_{[i]},$$
with equality holding for $k = n$, [9, A.1, p. 7].
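The definition translates directly into a partial-sum test. The following Python function is a minimal sketch of that test; the name `is_majorized` is a hypothetical choice, and the inputs are assumed to be integers or `Fraction` values so that the equality of total sums is checked exactly.

```python
from fractions import Fraction
from itertools import accumulate

def is_majorized(x, y):
    """Return True if x is majorized by y, per the partial-sum definition."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Rearrange coordinates in decreasing order.
    xs = sorted((Fraction(v) for v in x), reverse=True)
    ys = sorted((Fraction(v) for v in y), reverse=True)
    # Partial sums of the k largest coordinates, k = 1, ..., n.
    px = list(accumulate(xs))
    py = list(accumulate(ys))
    if not px:
        return True
    # Inequality for every k, with equality required at k = n.
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))

print(is_majorized([2, 2, 2], [1, 2, 3]))  # True
print(is_majorized([1, 2, 3], [2, 2, 2]))  # False
print(is_majorized([3, 1, 2], [1, 2, 3]))  # True
print(is_majorized([1, 1], [1, 2]))        # False
```

The third call shows that a permutation of a vector is majorized by the vector itself, and the last call fails only because the total sums differ.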
In words, $x$ is majorized by $y$ if the components of $x$ are more evenly spread out than the components of $y$, or the components of $y$ are more concentrated than the components of $x$. This intuition is reinforced by noting the following. Let $e = (1, \ldots, 1)$, the n-tuple whose coordinates are all equal to one. Then for a vector $x$ the inner product $x \cdot e$ is the sum of the components of $x$. Let $\bar{x} = (x \cdot e / n, \ldots, x \cdot e / n)$ and let $x^k = (0, \ldots, x \cdot e, 0, \ldots, 0)$, where $x \cdot e$ appears in the $k$th component. The vectors $\bar{x}$, $x$, and $x^k$ all have the same total sum of components, but the components of $\bar{x}$ are more evenly spread out than those of $x$. Clearly $x^k$ concentrates this sum in one component. In this sense, $\bar{x}$ is the most evenly spread of this sum of components and $x^k$ is the most concentrated of this sum. Indeed, we have that $\bar{x} \prec x \prec x^k$, $k = 1, \ldots, n$, where the second relation requires the components of $x$ to be nonnegative.
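The two extreme vectors can be built and compared with a short script. This is a sketch with hypothetical helper names (`evenly_spread`, `concentrated`, `is_majorized`) and integer example data. The last check suggests that the comparison with the concentrated vector relies on nonnegative components.

```python
from fractions import Fraction as Fr
from itertools import accumulate

def is_majorized(x, y):
    """Partial-sum test for x majorized by y (equal-length, nonempty vectors)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))

def evenly_spread(x):
    """The vector whose coordinates all equal the average of x."""
    return [Fr(sum(x), len(x))] * len(x)

def concentrated(x, k):
    """The vector with the total of x in position k (0-based) and zeros elsewhere."""
    v = [0] * len(x)
    v[k] = sum(x)
    return v

x = [1, 2, 6]
print(is_majorized(evenly_spread(x), x))                        # True
print(all(is_majorized(x, concentrated(x, k)) for k in range(3)))  # True

# With a negative component the upper comparison can fail.
w = [5, -1]
print(is_majorized(evenly_spread(w), w))      # True
print(is_majorized(w, concentrated(w, 0)))    # False
```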
We note that the majorization relation is reflexive and transitive (established below in Lemma 5) and hence is a preordering. It is not a partial ordering, however, since it is not antisymmetric. Indeed it is symmetric in that $\gamma(x) = \gamma(x')$, where $x'$ is any permutation of $x$.
Let $\mathbb{R}^n$ denote n-dimensional Euclidean space. All topological properties in the sequel will be with respect to the usual metric on $\mathbb{R}^n$. Let $\Pi$ denote the set of $n \times n$ permutation matrices and let $\mathcal{D}$ denote the set of $n \times n$ doubly stochastic matrices. Then $M \in \Pi$ if and only if there is exactly one entry equal to one in each row and each column of $M$ and all other entries are zero. Similarly, $M \in \mathcal{D}$ if and only if the entries in $M$ are nonnegative and each row and each column sums to one.
Theorem 3. For $x, y \in \mathbb{R}^n$, the following are equivalent:
(i) $x \prec y$;
(ii) $x = yD$, some $D \in \mathcal{D}$;
(iii) $x = y \left( \sum_i \alpha_i \Pi_i \right)$, for some $\alpha_i \ge 0$, $\sum_i \alpha_i = 1$, and some $\Pi_i \in \Pi$.
Proof: The equivalence of (i) and (ii) is due to Hardy, Littlewood and Pólya. See [9, Theorem 2.B.2]. The equivalence of (ii) and (iii) is due to Birkhoff. See [9, Theorem 2.A.2].
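One direction of Theorem 3 is easy to try numerically: a convex combination of permutation matrices is doubly stochastic, and multiplying a row vector by it produces a majorized vector. The Python sketch below uses exact rational arithmetic; the vector `y`, the weights, the chosen permutations, and all helper names are assumptions made for illustration.

```python
from fractions import Fraction as Fr
from itertools import accumulate

def permutation_matrix(perm):
    """Matrix with a one in row i, column perm[i], and zeros elsewhere."""
    n = len(perm)
    return [[Fr(int(perm[i] == j)) for j in range(n)] for i in range(n)]

def convex_combination(weights, matrices):
    """Entrywise weighted sum of square matrices of the same size."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices))
             for j in range(n)] for i in range(n)]

def is_doubly_stochastic(d):
    """Nonnegative entries, every row and every column summing to one."""
    n = len(d)
    return (all(e >= 0 for row in d for e in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))

def row_times_matrix(y, d):
    """The row vector y multiplied on the right by the matrix d."""
    n = len(y)
    return [sum(y[i] * d[i][j] for i in range(n)) for j in range(n)]

def is_majorized(x, y):
    """Partial-sum test for x majorized by y (equal-length, nonempty vectors)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))

# Hypothetical example data.
y = [1, 2, 6]
weights = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
perms = [(0, 1, 2), (1, 0, 2), (2, 1, 0)]

D = convex_combination(weights, [permutation_matrix(p) for p in perms])
x = row_times_matrix(y, D)
print(is_doubly_stochastic(D), x, is_majorized(x, y))
```

For this data `x` equals (5/2, 7/4, 19/4), which has the same total as `y` and passes the partial-sum test against it.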
For each $x \in \mathbb{R}^n$ let $\gamma(x) = \{y \in \mathbb{R}^n : y \prec x\}$, the set of n-tuples that are majorized by $x$. For a picture of this set in the case n = 3 see [9, Figure 3, p. 9]. Let $K = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : y \in \gamma(x)\}$, the graph of the multifunction $\gamma$. In the following, the notions of upper and lower semicontinuity in [6] are the same as the notions of upper and lower hemi-continuity in [5], which gives convenient characterizations of these notions in terms of sequences in $\mathbb{R}^n$. We will use the terminology of [6], but refer to these convenient characterizations in [5]. A multifunction is continuous if it is both upper and lower semicontinuous. The following are properties of $\gamma$.
Theorem 4. $\gamma$ is a compact convex valued continuous multifunction in $\mathbb{R}^n$. Consequently $K$ is closed in $\mathbb{R}^{2n}$.
Proof: Clearly for each $x \in \mathbb{R}^n$, $x \in \gamma(x)$, so $\gamma$ is a multifunction. $\gamma(x)$ is convex, by Theorem 3(ii) (convex combinations of doubly stochastic matrices are doubly stochastic) and compact since, by Theorem 3(iii), it is the convex polyhedron generated by the finite number of permutations of $x$.
Suppose $x \in \mathbb{R}^n$ and $\{x_k\} \subset \mathbb{R}^n$ with $x = \lim_k x_k$. If $y \in \gamma(x)$, by Theorem 3(ii), $y = xD$, some $D \in \mathcal{D}$. Then again by Theorem 3(ii), $y_k = x_k D \in \gamma(x_k)$ and $y = \lim_k y_k$, so $\gamma$ is lower semicontinuous, [5, Theorem 2, p. 27].
Let $y_k = x_k D_k \in \gamma(x_k)$ with $D_k \in \mathcal{D}$ arbitrary. By Birkhoff’s theorem $\mathcal{D}$ is the convex polyhedron generated by the permutation matrices and hence is compact in $\mathbb{R}^{n^2}$. Thus there is a subsequence of the $D_k$ that converges to an element $D \in \mathcal{D}$. For this subsequence indexed by $k'$, $\lim_{k'} y_{k'} = xD \in \gamma(x)$. Thus $\gamma$ is upper semicontinuous, [5, Theorem 1, p. 24]. The closure of $K$ then follows by the same result.
As obvious as these properties of $\gamma$ are, except for convexity of $\gamma(x)$, they appear nowhere prominently in the literature on majorization to this author’s knowledge. Eaton and Perlman [4] do exploit the upper semicontinuity of $\gamma$ in the proof of their Lemma 4.1. The following result establishes the transitivity of majorization and is needed later for the characterization of stochastic majorization.
Lemma 5. For $x, y \in \mathbb{R}^n$, if $y \in \gamma(x)$, then $\gamma(y) \subset \gamma(x)$.
Proof: For $x, y \in \mathbb{R}^n$, suppose $y \in \gamma(x)$ and $z \in \gamma(y)$. Then by Theorem 3(ii) $z = y\hat{D}$ and $y = xD$, some $D, \hat{D} \in \mathcal{D}$. But then $z = xD\hat{D}$ and $D\hat{D} \in \mathcal{D}$, [9, 2.A.3, p. 20]. Again by Theorem 3(ii), $z \in \gamma(x)$.
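The transitivity argument rests on the product of two doubly stochastic matrices being doubly stochastic. A brief Python sketch of that step follows; the matrices `D` and `D_hat`, the vector `x`, and the helper names are hypothetical example choices.

```python
from fractions import Fraction as Fr
from itertools import accumulate

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def row_times_matrix(v, d):
    """The row vector v multiplied on the right by the matrix d."""
    n = len(v)
    return [sum(v[i] * d[i][j] for i in range(n)) for j in range(n)]

def is_doubly_stochastic(d):
    """Nonnegative entries, every row and every column summing to one."""
    n = len(d)
    return (all(e >= 0 for row in d for e in row)
            and all(sum(row) == 1 for row in d)
            and all(sum(d[i][j] for i in range(n)) == 1 for j in range(n)))

def is_majorized(x, y):
    """Partial-sum test for x majorized by y (equal-length, nonempty vectors)."""
    px = list(accumulate(sorted(x, reverse=True)))
    py = list(accumulate(sorted(y, reverse=True)))
    return px[-1] == py[-1] and all(a <= b for a, b in zip(px, py))

half = Fr(1, 2)
D = [[half, half, 0], [half, half, 0], [0, 0, 1]]      # averages coordinates 1 and 2
D_hat = [[1, 0, 0], [0, half, half], [0, half, half]]  # averages coordinates 2 and 3

x = [1, 2, 6]
y = row_times_matrix(x, D)       # y is majorized by x
z = row_times_matrix(y, D_hat)   # z is majorized by y
product = mat_mul(D, D_hat)

print(is_doubly_stochastic(product))
print(z == row_times_matrix(x, product), is_majorized(z, x))
```

In this example `z` equals (3/2, 15/4, 15/4) and is obtained from `x` through the single doubly stochastic matrix `product`, matching the chain of equalities in the proof.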
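As in the abstract case, for each $x \in \mathbb{R}^n$, let $\Gamma(x) = \{\mu \in P(\mathbb{R}^n) : \mathrm{supp}(\mu) \subset \gamma(x)\}$. Let $\mathrm{Gr}(\Gamma) = \{(x, \mu) \in \mathbb{R}^n \times P(\mathbb{R}^n) : \mu \in \Gamma(x)\}$, the graph of the multifunction $\Gamma$.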
Theorem 6. $\Gamma$ is a compact and convex valued continuous multifunction from $\mathbb{R}^n$ to $P(\mathbb{R}^n)$. The graph $\mathrm{Gr}(\Gamma)$ is closed in $\mathbb{R}^n \times P(\mathbb{R}^n)$.
Proof: For $x \in \mathbb{R}^n$, $\delta_x \in \Gamma(x)$. This result is then Theorem 1 above, but specialized to the example here. By [6, Theorem 3(i)], $\Gamma$ inherits the continuity of $\gamma$ established in Theorem 4. The closure of $\mathrm{Gr}(\Gamma)$ follows from [5, Theorem 1, p. 24].
The functions $f : \mathbb{R}^n \to \mathbb{R}$ that are increasing (non-decreasing) in the majorization relation are called Schur-convex. See [9, Ch. 1.D, Ch. 3] for the origins of this terminology and the characterizations of this class of functions. Denote by $SC$ the class of Borel measurable Schur-convex functions. The measurability requirement is a restriction, [9, 3.C.4, p. 70]. We can extend the relation $\prec$ in $\mathbb{R}^n$ to a relation in $P(\mathbb{R}^n)$ as we did above by taking $\mathcal{F} = SC$. This relation $\prec$ in $P(\mathbb{R}^n)$ is the version of stochastic majorization $E_1$ studied in [9, Ch. 11]. Here the relations $\prec_{E_1}$ and $\prec_{P_1}$ are the relations in [9, Ch. 11]. Let $\mathcal{B}^n$ denote the Borel sets in $\mathbb{R}^n$ and call $B \in \mathcal{B}^n$ Schur convex if $I_B \in SC$. The following characterization of $\prec$ in $P(\mathbb{R}^n)$ results.
Theorem 7. For $\mu, \nu \in P(\mathbb{R}^n)$ the following are equivalent:
(i) $\mu \prec \nu$;
(ii) There exists a Markov kernel $m$ on $\mathbb{R}^n$ such that $\mu = \nu m$ and $m(x) \in \Gamma(x)$, $\nu$ almost every $x \in \mathbb{R}^n$;
(iii) There exists a probability measure $\lambda \in P(\mathbb{R}^{2n})$ with $\mathrm{supp}(\lambda) \subset K$ with first marginal $\nu$ and second marginal $\mu$;
(iv) There exists a real valued random variable $Z$ and two measurable functions $f, g : \mathbb{R} \to \mathbb{R}^n$ with $f \prec g$ (i.e., $f(t) \prec g(t)$, $t \in \mathbb{R}$) such that the distribution of $f(Z)$ is $\mu$ and the distribution of $g(Z)$ is $\nu$;
(v) There exist $\mathbb{R}^n$ valued random variables $Y$ and $X$ such that $X \prec_{P_1} Y$ and the distribution of $X$ is $\mu$ and the distribution of $Y$ is $\nu$;
(vi) $\mu(B) \le \nu(B)$ for every Schur convex set $B \in \mathcal{B}^n$.
Proof: This result follows from Theorem 2 by noting that by Theorem 4, $\gamma$ is compact valued and the graph $K$ is closed and hence is a Borel set, which by [7, Theorem 3] implies that the multifunction $\gamma$ is Borel measurable.
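To see the characterization at work on a toy case, one can couple a discrete random vector $Y$ with $X = YD$ for a fixed doubly stochastic $D$, so that $X$ is majorized by $Y$ pointwise, and then compare expectations of a few Schur-convex functions. The Python sketch below does this; the distribution `law_Y`, the matrix `D`, and the three test functions are assumptions chosen for illustration, not objects from the paper.

```python
from fractions import Fraction as Fr

half = Fr(1, 2)
# Hypothetical doubly stochastic matrix: averages the first two coordinates.
D = [[half, half, 0], [half, half, 0], [0, 0, 1]]

def row_times_matrix(y, d):
    """The row vector y multiplied on the right by d, returned as a tuple."""
    n = len(y)
    return tuple(sum(y[i] * d[i][j] for i in range(n)) for j in range(n))

def push(law, d):
    """Distribution of X = Y d when Y has the finite distribution `law`."""
    law_x = {}
    for y, p in law.items():
        x = row_times_matrix(y, d)
        law_x[x] = law_x.get(x, 0) + p
    return law_x

def expectation(f, law):
    return sum(p * f(v) for v, p in law.items())

# Hypothetical distribution of Y on three points of R^3.
law_Y = {(1, 2, 6): Fr(1, 2), (0, 4, 4): Fr(1, 3), (3, 3, 3): Fr(1, 6)}
law_X = push(law_Y, D)

# Three standard Schur-convex functions.
schur_convex = {
    "sum of squares": lambda v: sum(c * c for c in v),
    "largest coordinate": max,
    "negative of smallest coordinate": lambda v: -min(v),
}

for name, f in schur_convex.items():
    print(name, expectation(f, law_X), expectation(f, law_Y))
```

For each of the three functions the expectation under the distribution of `X` does not exceed the expectation under the distribution of `Y`, as condition (i) requires when a coupling of the kind in (iii) or (iv) exists.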
References
1. P. Billingsley, “Convergence of Probability Measures,” Wiley, New York, 1968.
2. P. Billingsley, “Convergence of Probability Measures,” 2nd ed., Wiley, New York, 1999.
3. J. L. Doob, “Measure Theory,” Springer-Verlag, New York, 1994.
4. M. L. Eaton and M. D. Perlman, Reflection groups, generalized Schur functions, and
the geometry of majorization, Annals of Probability, 5 (1977), 829-860.
5. W. Hildenbrand, “Core and Equilibria of a Large Economy,” Princeton University
Press, Princeton, 1974.
6. C. J. Himmelberg and F. S. Van Vleck, Multifunctions with values in a space of
probability measures, Journal of Mathematical Analysis and Applications, 50 (1975),
108-112.
7. C. J. Himmelberg, T. Parthasarathy, and F. S. Van Vleck, Optimal plans for dynamic
programming problems, Mathematics of Operations Research, 1 (1976), 390-394.
8. T. Kamae, U. Krengel, and G. L. O’Brien, Stochastic inequalities on partially ordered
spaces, Annals of Probability, 5 (1977), 899-912.
9. A. W. Marshall and I. Olkin, “Inequalities: Theory of Majorization and Its
Applications,” Academic Press, New York, 1979.
10. M. Shaked and J. G. Shanthikumar, “Stochastic Orders and Their Applications,”
Academic Press, New York, 1994.
11. V. Strassen, The existence of probability measures with given marginals, Annals of
Mathematical Statistics, 36 (1965), 423-439.