Sankhyā : The Indian Journal of Statistics
2010, Volume 72-A, Part 1, pp. 170-190
© 2010, Indian Statistical Institute
Limit Theorems for Monotone Markov Processes
Rabi Bhattacharya
The University of Arizona, Tucson, USA
Mukul Majumdar
Cornell University, Ithaca, USA
Nigar Hashimzade
University of Reading, Reading, UK
Abstract
This article considers the convergence to steady states of Markov processes generated by the action of successive i.i.d. monotone maps on a subset S of a Euclidean space. Without requiring irreducibility or Harris recurrence, a "splitting" condition guarantees the existence of a unique invariant probability as well as an exponential rate of convergence to it in an appropriate metric. For a special class of Harris recurrent processes on [0, ∞) of interest in economics, environmental studies and queuing theory, criteria are derived for polynomial and exponential rates of convergence to equilibrium in total variation distance. Central limit theorems follow as consequences.
AMS (2000) subject classification. Primary 60F05, 60J05.
Keywords and phrases. Markov processes, coupling, monotone i.i.d. maps,
polynomial convergence rates.
1 Introduction

In this paper we present some limit theorems for Markov processes defined by

Xn+1 = αn+1(Xn)   (n = 0, 1, . . .)   (1.1)
where {αn : n ≥ 1} is a sequence of i.i.d. random monotone maps on a suitable subset S of Rk . Such processes have been of particular interest in
developing models of dynamic systems subject to exogenous random shocks
in many disciplines. Often, with specific assumptions on S (for example, an interval in R, a closed subset of Rk, ...), and the maps αn (for example, continuity, concavity, ...), it has been possible to derive strong results on the
asymptotic behavior of Xn and throw light on questions of long-standing
importance, such as estimating the long run expected value of per capita output or capital stock, or the probability of extreme scarcity of a renewable resource. Since the literature exploring (1.1) is already vast and growing,
before proceeding to the formal analysis, we touch upon a few issues and
applications to economic growth and resource management.
First, the most fundamental theme is to identify conditions that guarantee the existence of an invariant distribution (a stochastic steady state), its uniqueness, and its stability. When, irrespective of the initial state, the
distribution of Xn converges to the invariant distribution π, it is of interest
to estimate the speed of convergence. Insights into all these questions are
gained when the process (1.1) satisfies a “splitting” condition. In various
contexts, the process (1.1) is interpreted as a purely descriptive model. As
an instance, Xn is a non-negative k-vector of stocks of all the commodities in an economy and its evolution, when the law of motion is a random monotone function, is captured by (1.1); or Xn denotes the list of k interacting groups of a population or k interacting species. See Ellner (1984) or Bhattacharya and Majumdar (2007), pp. 262-274, and the references cited there.
One may also start with a discounted dynamic programming model (Blackwell, 1965), impose appropriate restrictions on the state and action spaces, and obtain a stationary optimal policy function (as in Maitra, 1968). This policy function, together with the law of motion, gives rise to the process (1.1)
portraying the evolution of optimal states. In many optimization problems,
monotonicity of an optimal policy function is obtained by assuming a supermodularity or concavity property of the reward function (see Ross, 1983 and
Majumdar, Mitra and Nyarko, 1989). It should perhaps be noted that the continuity of a policy function may be more problematic, even in a deterministic model of intertemporal optimization; an example of discontinuity (in which the production function is S-shaped and the return function is linear) was given in Majumdar and Mitra (1983). Undoubtedly, the process (1.1)
has become one of the most useful frameworks to explore some of the basic
questions of the theory of intertemporal resource allocation in economics.
We should emphasize that there are interesting Markov processes, arising,
for example, in the study of i.i.d. iterates of quadratic maps (Bhattacharya
and Rao, 1993, Bhattacharya and Majumdar, 2004), or in the study of interaction of growth and cyclical forces (Bhattacharya and Majumdar, 2007,
pp. 267–273) that enter into an invariant subset of S on which all the maps
turn out to be monotone. The results on processes (1.1) have been decisive
in the analysis of the long run behavior of such dynamical systems.
Another class of models allows for regeneration or replenishment of a
resource (such as groundwater). We sketch a simple example of the management of such a resource to provide a motivation for the analysis in Section
4. Let Xn ≥ 0 be the stock of a resource at the end of period n (assuming
X0 = x ≥ 0), and let {Rn+1 : n ≥ 0} be a sequence of i.i.d. non-negative
real-valued random variables representing random flows of input (for example, rainfalls). Let c > 0 be a given parameter: it is interpreted as a target
level of consumption. If, at the end of period n + 1, the planner observes
Xn + Rn+1 > c, he withdraws from the reservoir the amount c, leaving
Xn+1 = Xn + (Rn+1 − c) as the stock. If Xn + Rn+1 ≤ c then the entire
available stock is withdrawn, leaving Xn+1 = 0. Thus, the evolution of stock
is described by the process:
Xn+1 = (Xn + Zn+1)^+ ≡ max(Xn + Zn+1, 0)  for n ≥ 0,
where Zn+1 ≡ Rn+1 − c.
The policy of "constant harvesting" or "constant yield" is easy to describe and implement. The process arose in queuing theory and has been extensively studied subsequently. It is a monotone Markov process with S = R+. Assume that EZ1 < 0 (and, to avoid trivialities, P(Z1 > 0) > 0). Then
there is a unique invariant distribution π, and one is interested in the nature
of convergence to π, as well as in estimating the probability π({0}).
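The constant-harvest recursion above is straightforward to simulate. The following sketch is my own illustration, not part of the paper: the exponential rainfall with mean 1 and the target c = 1.5 are illustrative assumptions, chosen so that EZ1 = −0.5 < 0 and P(Z1 > 0) > 0; the long-run fraction of periods with an empty reservoir then estimates π({0}).

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_stock(x0, c, n_steps, rng):
    """Iterate X_{n+1} = max(X_n + R_{n+1} - c, 0) under the
    constant-harvest policy, with exponential(mean 1) rainfall."""
    x, path = x0, np.empty(n_steps + 1)
    path[0] = x0
    rain = rng.exponential(1.0, n_steps)
    for n in range(n_steps):
        x = max(x + rain[n] - c, 0.0)
        path[n + 1] = x
    return path

# c = 1.5 gives E[Z_1] = E[R] - c = -0.5 < 0 and P(Z_1 > 0) > 0,
# so a unique invariant distribution pi exists.
path = simulate_stock(x0=5.0, c=1.5, n_steps=200_000, rng=rng)

# Long-run fraction of periods with an empty reservoir estimates pi({0}).
print("estimated pi({0}) ~", np.mean(path[1_000:] == 0.0))
```

Under these assumptions the chain visits 0 infinitely often, and by the ergodic theorem the time average converges to π({0}), a number strictly between 0 and 1 here.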
Here is a brief summary of the main results of the article. In Section 2, we
introduce Markov processes generated by i.i.d. monotone maps on a subset
S of Rk, which may include both increasing and decreasing maps. Assuming a "splitting" condition, Theorem 2.1 generalizes earlier results of Dubins
and Freedman (1966), Bhattacharya and Lee (1988) and Bhattacharya and
Majumdar (1999, 2007) on the existence of a unique invariant probability
π, and the exponential convergence to π in an appropriate metric. In Section 3, Theorem 3.1, we elaborate on an earlier central limit theorem due to Bhattacharya and Lee (1988) for increasing processes on S. Section 4 concerns monotone increasing Markov processes on S = [0, ∞) and especially
the class of Lindley processes. For these Harris-recurrent processes, coupling
arguments, large deviation type estimates and non-uniform error bounds in
the classical central limit theorems provide known criteria, due to Lund and
Tweedie (1996), for exponential rates of convergence to the invariant probability in total variation distance (Theorems 4.1 and 4.2), and new criteria for
polynomial rates of convergence (Theorems 4.3 and 4.4) and central limit
theorems (Theorem 4.5).
2 Markov processes generated by iterations of i.i.d. monotone maps
Let (S, S) be a measurable space, and {αn : n ≥ 1} a sequence of i.i.d. random maps on S (into S), defined on a probability space (Ω, F, P). Then for any S-valued random variable X0 independent of {αn : n ≥ 1}, the process Xn (n = 0, 1, . . .) defined by

Xn+1 = αn+1 Xn   (n = 0, 1, . . .)   (2.1)

is a Markov process on S with a transition probability p(x, dy) given by

p(x, B) = P(α1 x ∈ B),   (x ∈ S, B ∈ S),

assuming x → p(x, B) is measurable ∀B ∈ S.
Conversely, if S is a standard Borel space (i.e., a Borel subset of a Polish
space), and B (S) is the Borel sigmafield of S, then given any transition
probability p (x, dy) on (S, S = B (S)), one can construct a sequence of i.i.d.
maps {αn : n ≥ 1} on a probability space (Ω, F, P ) such that the process
(2.1) is Markov with transition probability p (x, dy). (See Blumenthal and
Corson, 1972, Kifer, 1986, p. 8, or Bhattacharya and Waymire, 2009, p.228.)
It will be useful to write Xn (x) for the process (2.1), when X0 ≡ x.
In this article we consider only those Markov processes on a measurable subset S of Rk that are generated by i.i.d. monotone maps. For x = (x^1, . . . , x^k), y = (y^1, . . . , y^k) ∈ Rk, write x ≤ y, or y ≥ x, if x^i ≤ y^i ∀i, and x < y if x ≤ y with a strict inequality x^i < y^i for some i. A measurable function f on S into S, or into R, is increasing (decreasing) if f(x) ≤ f(y) (respectively, f(y) ≤ f(x)) ∀x ≤ y. The map f is monotone if it is either increasing or decreasing.
To construct such a Markov process with αn taking values in a set Γ of measurable monotone functions γ = (γ^1, . . . , γ^k) on S, consider an appropriate sigmafield C on Γ and a probability measure Q on (Γ, C). Assume that (γ, x) → γx = (γ^1(x), γ^2(x), . . . , γ^k(x)) is measurable with respect to the product sigmafield C ⊗ B(S) on Γ × S and the Borel sigmafield B(S) on S. On a probability space (Ω, F, P), let {αn : n ≥ 1} be an i.i.d. sequence of maps with common distribution Q. Then define the Markov process {Xn : n = 0, 1, . . .} as described earlier (see (2.1), and note that p(x, B) = Q({γ ∈ Γ : γ(x) ∈ B})). We will make the following assumptions.
(A1): S is either a closed subset of Rk , or a Borel subset of Rk which can
be made homeomorphic to a closed subset of Rk by means of a strictly
increasing continuous map f on S into Rk : x < y =⇒ f (x) < f (y).
Note that every rectangle S = I1 × · · · × Ik satisfies (A1) if the Ij's are arbitrary subintervals of R: (a, b), [a, b), and (a, b] are made homeomorphic to (−∞, ∞), [0, ∞), (−∞, 0], respectively, by means of strictly increasing continuous maps.
Next assume the following splitting condition. For its statement, write γ_1^N for the composition

γ_1^N = γ_N · · · γ_1   ∀ γ = (γ_1, . . . , γ_N) ∈ Γ^N.

(A2): There exist x0 ∈ S, an integer N ≥ 1, and sets F_i belonging to the product sigmafield C^{⊗N} on Γ^N (i = 1, 2), such that

(i) δ_i ≡ Q^N(F_i) > 0 for i = 1, 2;
(ii) γ_1^N x ≤ x0 ∀x ∈ S, if γ ≡ (γ_1, . . . , γ_N) ∈ F_1, and γ_1^N x ≥ x0 ∀x ∈ S, if γ ∈ F_2.
Finally, we will make the following rather innocuous assumption.

(A3): Consider the sets H_+ = {γ ∈ Γ^N : γ_1^N is increasing}, H_− = Γ^N \ H_+, F_{i+} = H_+ ∩ F_i, F_{i−} = H_− ∩ F_i (i = 1, 2). Then F_{i+} (and F_{i−}) ∈ C^{⊗N}.
On the space P(S) of all probability measures on (S, B(S)) define, for each a > 0, the metric

d_a(µ, ν) = sup_{g ∈ G_a} |∫ g dµ − ∫ g dν|,

where G_a is the class of all Borel measurable monotone functions g on S into [0, a]. It is easy to verify that (1) d_a = a d_1 and (2) the metric d_a remains the same if G_a is restricted to Borel measurable increasing functions on S into [0, a]. We will make use of the following result of Chakraborty and Rao (1998). The second part of the lemma is proved in Bhattacharya and Majumdar (2007), pp. 287-288.
Lemma 2.1. Under the assumption (A1), (i) (P (S) , da ) is a complete
metric space, and (ii) convergence in da implies weak convergence.
We now state the main result of this section, which improves upon earlier results of Bhattacharya and Lee (1988), Bhattacharya and Majumdar
The latter in turn were generalizations of the seminal result of Dubins and
Freedman (1966). Also see Yahav (1975).
Let p^{(n)}(x, dy) denote the n-step transition probability of the Markov process (2.1), i.e., the distribution of Xn when X0 ≡ x. Let T^{*n} be the operator on P(S) defined by T^{*n}µ = ∫ p^{(n)}(x, ·) µ(dx). That is, T^{*n}µ is the distribution of Xn when X0 has distribution µ.
Theorem 2.1. Assume (A1)-(A3) hold. Then there exists a unique invariant probability π for the Markov process (2.1), and, for every µ ∈ P(S),

d_1(T^{*n} µ, π) ≤ (1 − δ)^{[n/N]}   (n ≥ 1),   (2.2)

where δ = min{δ_1, δ_2}, and [n/N] is the integer part of n/N.
Proof. We will provide a sketch of the proof, whose details appeared in Bhattacharya and Majumdar (2010). For an arbitrary increasing g ∈ G_1, define

h_{i+}(x) = ∫_{F_{i+} \ (F_{i+} ∩ F_j)} g(γ_1^N x) Q^N(dγ),
h_{i−}(x) = ∫_{F_{i−} \ (F_{i−} ∩ F_j)} [1 − g(γ_1^N x)] Q^N(dγ)   (i, j = 1, 2; i ≠ j),
h_{3+}(x) = ∫_{H_+ ∩ (F_1 ∪ F_2)^c} g(γ_1^N x) Q^N(dγ),
h_{3−}(x) = ∫_{H_− ∩ (F_1 ∪ F_2)^c} [1 − g(γ_1^N x)] Q^N(dγ),
h_4(x) = ∫_{F_1 ∩ F_2} g(γ_1^N x) Q^N(dγ) = g(x0) Q^N(F_1 ∩ F_2).

The functions h_{i±} are increasing (i = 1, 2, 3), and h_4 is constant. Denote by a_{i±} the oscillation (sup minus inf) of h_{i±}; then, after subtraction of a constant, h_{i±} belongs to G_{a_{i±}}, and one checks, using (A2), that Σ_{i=1}^{3} (a_{i+} + a_{i−}) ≤ 1 − δ. Hence, for µ, ν ∈ P(S),

|∫ g dT^{*N}(µ) − ∫ g dT^{*N}(ν)|
  ≤ Σ_{i=1}^{3} |∫ h_{i+}(x) µ(dx) − ∫ h_{i+}(x) ν(dx)| + Σ_{i=1}^{3} |∫ h_{i−}(x) µ(dx) − ∫ h_{i−}(x) ν(dx)|
  ≤ Σ_{i=1}^{3} (a_{i+} + a_{i−}) d_1(µ, ν) ≤ (1 − δ) d_1(µ, ν).

Hence T^{*N} is a uniformly strict contraction on the complete metric space (P(S), d_1), and the contraction mapping theorem (see, e.g., Bhattacharya and Majumdar, 2001, pp. 6, 288–290) gives

d_1(T^{*n} µ, T^{*n} ν) ≤ (1 − δ)^{[n/N]} d_1(µ, ν),   (µ, ν) ∈ P(S).

In particular, (2.2) holds, since d_1(µ, ν) ≤ 1. □
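A simple numerical illustration of Theorem 2.1 (my own, not from the paper) uses the i.i.d. increasing maps x → (x + θ)/2 on S = [0, 1] with θ Bernoulli(1/2), a Dubins–Freedman type example: with N = 2 and x0 = 1/2, the pair θ = (0, 0) sends all of S below 1/2 and θ = (1, 1) sends all of S above 1/2, so (A2) holds with δ_1 = δ_2 = 1/4. The invariant probability is Uniform[0, 1], and since the indicator of [0, t] is a monotone function in G_1, the Kolmogorov distance below is dominated by d_1:

```python
import numpy as np

rng = np.random.default_rng(1)

def iterate_monotone(x0, n_iter, n_rep, rng):
    """n_rep independent chains X_{n+1} = (X_n + theta_{n+1}) / 2,
    theta i.i.d. Bernoulli(1/2): i.i.d. increasing maps on [0, 1]."""
    x = np.full(n_rep, x0, dtype=float)
    for _ in range(n_iter):
        x = (x + rng.integers(0, 2, size=n_rep)) / 2.0
    return x

# Whatever the initial state, the law of X_n is close to the invariant
# law, here Uniform[0, 1] (X_infinity = sum_k theta_k 2^{-k} in binary).
for x0 in (0.0, 1.0):
    xs = np.sort(iterate_monotone(x0, n_iter=40, n_rep=20_000, rng=rng))
    ks = np.max(np.abs(xs - np.arange(1, xs.size + 1) / xs.size))
    print(f"x0={x0}: approx. Kolmogorov distance to Uniform[0,1] = {ks:.4f}")
```

The distance observed for either initial state is of the Monte Carlo order 1/sqrt(n_rep), consistent with the exponential bound (2.2) having taken full effect after 40 iterations.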
Remark 2.1. The main significance of Theorem 2.1 lies in the fact that it applies to a class of Markov processes which may not be Harris recurrent, or irreducible (at least with respect to any discernible measure). One may also derive central limit theorems for certain classes of functions of the Markov process, which are useful, e.g., in estimating the invariant probability π. (See Bhattacharya and Lee, 1988 and Bhattacharya and Majumdar, 2007, Chapter 5.)
Remark 2.2. By looking at the proof of Theorem 2.1, one may notice
that the Euclidean structure of S is not used very much. On a topological
(metric) space S with a partial order, the main ingredients of the proof are
assumption (A2) and Lemma 2.1. See Hopenhayn and Prescott (1992) and
Diaconis and Freedman (1999) for applications to economics and physics,
respectively, on such S.
Remark 2.3. On S = [0, 1] the splitting condition (A2) is necessary for the existence of a nondegenerate unique invariant probability, if the αn are increasing and continuous. See, e.g., Bhattacharya and Majumdar (2007), p. 280. The
processes on S = [0, ∞) considered in Section 4, however, do not in general
satisfy (A2), but have unique invariant probabilities.
3 Central Limit Theorem for monotone Markov processes
In this section we review and explore central limit theorems for Markov
processes (2.1). In particular, we give a proof of Theorem 3.1 below (due to
Bhattacharya and Lee, 1988), clarifying parts of the original proof.
Let T^m denote the m-step transition operator on L^2(π):

T^m f(x) = ∫ f(y) p^{(m)}(x, dy),   f ∈ L^2(π).
We will write T for T 1 and simply call it the transition operator.
For Theorem 3.1 and Lemma 3.1 we restrict attention to i.i.d. increasing
maps {αn : n ≥ 1} with values in a function space Γ of (measurable) increasing maps on S into S. First, we refer to the following result in Bhattacharya
and Lee (1988).
Lemma 3.1. Let {αn : n ≥ 1} be i.i.d. increasing maps generating the Markov process (2.1). Assume (A1), (A2). Then for every function f which may be expressed as f = f_1 − f_2, with f_i increasing and f_i ∈ L^2(π) (i = 1, 2), f − f̄ belongs to the range of T − I on L^2(π), where f̄ = ∫ f dπ and I is the identity operator.
Proof. The proof of the lemma is based on the following estimate (see Bhattacharya and Lee, 1988, relations (3.7), (3.8)):

‖T^N f_i − f̄_i‖_2 ≤ c ‖f_i − f̄_i‖_2,   c = (1 − (1 − δ)/2)^{1/2},   (3.1)

which holds for i = 1, 2. Here ‖·‖_2 is the usual norm in L^2(π). Since T^N f_i is increasing, as f_i is, iterating (3.1) one obtains

‖T^n f_i − f̄_i‖_2 ≤ c^{[n/N]} ‖f_i − f̄_i‖_2   ∀n ≥ 1, (i = 1, 2).

From this one obtains

‖T^n f − f̄‖_2 ≤ 2 c^{[n/N]} ‖f − f̄‖_2,   Σ_{n=0}^{∞} ‖T^n f − f̄‖_2 < ∞.

Now write

g = − Σ_{n=0}^{∞} T^n (f − f̄).

Then (T − I) g = f − f̄, completing the proof of the lemma. □
Theorem 3.1 (Bhattacharya and Lee, 1988). Suppose {αn : n ≥ 1} is an
i.i.d. increasing sequence and the assumptions (A1), (A2) in Section 2 hold.
Then the CLT (3.2) holds for all f = f1 − f2 , fi increasing, fi ∈ L2 (π).
Proof. From Lemma 3.1, the martingale CLT yields the following result, as originally proved by Gordin and Lifsic (1978); also see Bhattacharya and Waymire (2009), pp. 511-513:

n^{−1/2} Σ_{j=1}^{n} (f(X_j) − f̄) →^L N(0, σ^2)  as n → ∞,   (3.2)

if X0 has the invariant distribution π. Here →^L denotes convergence in law, and N(0, σ^2) is the Normal distribution with mean zero and variance

σ^2 = ∫ g^2 dπ − ∫ (Tg)^2 dπ.   (3.3)
Our main task is to show that (3.2) holds, whatever the (initial) distribution µ of X0. For this, first let f be increasing (on S into R), f ∈ L^2(π), and denote by {X′_n : n ≥ 0}, {X_n : n ≥ 0} the processes (2.1) with X′_0 having distribution µ and X0 having distribution π, respectively. Write

S′_{m,q} = n^{−1/2} Σ_{j=m}^{q} (f(X′_j) − f̄),
S_{m,q} = n^{−1/2} Σ_{j=m}^{q} (f(X_j) − f̄),   0 ≤ m ≤ q ≤ n.   (3.4)

Then S′_{0,n} = S′_{0,n_0−1} + S′_{n_0,n}, and, for every n_0, S′_{0,n_0−1} → 0 a.s. as n → ∞. The same holds for S_{0,n_0−1}. Now, for any given r ∈ R,

P(S′_{n_0,n} > r) = E h_{n−n_0}(X′_{n_0}) = ∫ h_{n−n_0}(y) (T^{*n_0} µ)(dy),   (3.5)

where

h_j(y) = P(S_{0,j}(y) > r),   (3.6)

defining {X_n(y) : n ≥ 0} by (2.1) with X0 ≡ y, and

S_{m,q}(y) = n^{−1/2} Σ_{j=m}^{q} (f(X_j(y)) − f̄),   0 ≤ m ≤ q ≤ n.   (3.7)
Then h_j is increasing, h_j ∈ G_1, so that, by Theorem 2.1,

sup_{n>n_0} |∫ h_{n−n_0}(y) (T^{*n_0} µ)(dy) − ∫ h_{n−n_0}(y) π(dy)| → 0  as n_0 → ∞.   (3.8)
Fix ε > 0. In view of (3.5)–(3.8), there exists n_0(ε) such that the left side of (3.8) is less than ε/4 if n_0 = n_0(ε), and hence for all r

|P(S′_{n_0,n} > r) − P(S_{n_0,n} > r)| < ε/4   ∀n > n_0 = n_0(ε).

In view of (3.2) (i.e., the CLT for S_{0,n}), there exists n(ε) such that

|P(S_{n_0,n} > r) − (1 − Φ_{σ^2}(r))| < ε/4   ∀n ≥ n(ε) > n_0 = n_0(ε), ∀r,

where Φ_{σ^2} is the cumulative distribution function of N(0, σ^2). Hence, for all r,

|P(S′_{n_0,n} > r) − (1 − Φ_{σ^2}(r))| < 2ε/4   ∀n ≥ n(ε) > n_0 = n_0(ε).   (3.9)

Finally, for each δ > 0, one has

P(S′_{n_0,n} > r + δ) − P(|S′_{0,n_0−1}| ≥ δ) ≤ P(S′_{0,n} > r) ≤ P(S′_{n_0,n} > r − δ) + P(|S′_{0,n_0−1}| ≥ δ).

Choose δ = δ(ε) such that |Φ_{σ^2}(r ± δ) − Φ_{σ^2}(r)| < ε/4 for all r, and choose n_1(ε) ≥ n(ε) such that P(|S′_{0,n_0−1}| ≥ δ(ε)) < ε/4, ∀n ≥ n_1(ε). Since the estimate (3.9) is uniform in r (as the function (3.6) belongs to G_1, whatever be r), one gets

|P(S′_{0,n} > r) − (1 − Φ_{σ^2}(r))| < 2ε/4 + ε/4 + ε/4 = ε,   ∀n ≥ n_1(ε).
This concludes the proof of Theorem 3.1 for f increasing, f ∈ L^2(π). For f = f_1 − f_2, with f_1, f_2 increasing, consider the (joint) distribution of (S′^{(1)}_{0,n}, S′^{(2)}_{0,n}) and of (S^{(1)}_{0,n}, S^{(2)}_{0,n}), where the superscript (i) indicates that f_i is used in place of f in (3.4) and subsequently. For arbitrary r_1, r_2 ∈ R, one now considers P(S′^{(1)}_{0,n} > r_1, S′^{(2)}_{0,n} > r_2) and P(S^{(1)}_{0,n} > r_1, S^{(2)}_{0,n} > r_2), compared with the help of the increasing function

y → h_{j,1,2}(y) = P(S^{(1)}_{0,j}(y) > r_1, S^{(2)}_{0,j}(y) > r_2),

with S^{(i)}_{0,j}(y) as in (3.7) with f replaced by f^{(i)}. The asymptotic distribution of (S^{(1)}_{0,n}, S^{(2)}_{0,n}) (under initial distribution π) is shown to be a two-dimensional Normal, again using the martingale CLT for Markov processes applied to linear combinations of S^{(1)}_{0,n}, S^{(2)}_{0,n}. The two-dimensional CLT for (S′^{(1)}_{0,n}, S′^{(2)}_{0,n}) (under an arbitrary initial distribution µ) now follows as in the one-dimensional case. From this follows immediately the CLT for S′^{(1)}_{0,n} − S′^{(2)}_{0,n}. □
Remark 3.1. We conjecture that the conclusion of Theorem 3.1 holds
under the hypothesis of Theorem 2.1.
Remark 3.2. For Harris recurrent processes, such as those considered in Section 4, it is generally true that the CLT (3.2) holds, whatever the initial distribution (see Bhattacharya, 1982, Theorem 2.6, for a precise statement).
However, for processes which are not Harris recurrent or irreducible, such
convergence for arbitrary initial distributions may not be true. For examples
of many processes for which Theorem 3.1 holds, but which are not Harris
recurrent, see Bhattacharya and Rao (1993), Bhattacharya and Majumdar
(1999, 2001, 2004), and Chapters 3, 4 of Bhattacharya and Majumdar (2007).
Remark 3.3. Theorem 3.1 may be strengthened to its functional form,
providing convergence in distribution of the scaled partial sums process in
(3.2) to the Wiener measure. See Bhattacharya and Lee (1988).
4 Monotone Markov processes with nonnegativity constraints
As explained in the Introduction, nonnegativity and monotonicity constraints arise naturally in economics, queueing theory, and environmental studies. In the present section we study certain Markov processes on
S = [0, ∞) which are monotone (increasing) in the sense that the transition probability p (x, dy) is stochastically ordered: a transition probability
p (x, dy) on an interval J is stochastically larger than p (x′ , dy) if x′ < x; in
other words,

F_x(y) ≤ F_{x′}(y)   ∀y ∈ J, if x′ < x,   (4.1)
where Fx is the distribution function of p (x, ·). The following lemma shows
that such Markov processes may be generated by iterations of i.i.d. increasing
maps.
Lemma 4.1. A monotone Markov process on an interval J, satisfying
(4.1), may be represented as (2.1), with {αn : n ≥ 1} i.i.d. increasing maps
on J.
Proof. By a strictly increasing continuous map one may map J onto the unit interval I = (0, 1), or (0, 1], or [0, 1), or [0, 1], depending on J. So we may take the state space to be I. Let U be a random variable with the uniform distribution on I. Then define the random map α on I by αx = F_x^{−1}(U), where F_x^{−1}(u) = inf{y ∈ R : F_x(y) > u}, u ∈ I (see Bhattacharya and Waymire, 2009, p. 228). Using (4.1), it is simple to check that α is increasing. □

For a more general treatment see Lindvall (1992), pp. 132–136.
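To make the quantile construction in the proof concrete, here is a small check of my own (not from the paper) with the stochastically ordered family F_x(y) = y^{1+x} on [0, 1], an illustrative choice whose quantile function F_x^{−1}(u) = u^{1/(1+x)} is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stochastically ordered family on [0, 1]: F_x(y) = y^{1+x}.  For
# x' < x and y in (0, 1), F_x(y) <= F_{x'}(y), i.e. p(x, dy) is
# stochastically larger than p(x', dy), as in (4.1).
def quantile(x, u):
    """F_x^{-1}(u), the quantile function of p(x, .)."""
    return u ** (1.0 / (1.0 + x))

# A single uniform draw U defines one random map alpha(x) = F_x^{-1}(U);
# the stochastic ordering makes alpha increasing in x.
u = rng.uniform()
xs = np.linspace(0.0, 1.0, 101)
alpha = quantile(xs, u)
assert np.all(np.diff(alpha) >= 0.0)
print("alpha(0) =", alpha[0], "<= alpha(1) =", alpha[-1])
```

Repeating this for every draw of U yields the i.i.d. increasing maps {αn} of Lemma 4.1 for this family.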
For the rest of this section, we take S = [0, ∞), and assume that {0} is a recurrent set,

P(Xn(x) = 0 for some n ≥ 1) = 1  ∀x ∈ [0, ∞),  or  P(τ0 < ∞) = 1,   (4.2)

where τ0 = inf{n ≥ 1 : Xn = 0} is the first time (> 0) to reach 0. Also, assume

p^{(n)}(0, [x, ∞)) > 0  for some n = n(x),  ∀x > 0.   (4.3)
The process is then Harris recurrent, or ϕ-recurrent, with respect to the Dirac measure ϕ at 0: ϕ({0}) = 1 (see, e.g., Meyn and Tweedie, 1993, p. 200). For two processes Xn(x), Xn(x′), n ≥ 0, given by (2.1), with X0 = x and X0 = x′, respectively, a (strong) coupling occurs with probability one, at time τ0(x) if x′ ≤ x, and at time τ0(x′) if x < x′, where

τ0(z) = inf{n ≥ 1 : Xn(z) = 0},   z ∈ [0, ∞).   (4.4)
That is, Xn (x) = Xn (x′ ) ∀n ≥ τ0 (x), if x′ ≤ x, or ∀n ≥ τ0 (x′ ), if x < x′ .
By a conditioning argument, it follows that such a coupling occurs for two
processes (2.1) with initial random variables X0 , X0′ which are independent,
and independent of {αn : n ≥ 1}, at the time

τ = τ0(X0) on {X0′ ≤ X0},   τ = τ0(X0′) on {X0 < X0′}.   (4.5)

From standard theory, the Markov process has an invariant probability, say π, necessarily unique, if and only if

E τ0(0) < ∞.   (4.6)
Also, letting X0 have distribution µ and X0′ have distribution π, one then has

d_tv(T^{*n} µ, π) ≤ P(τ > n) → 0  as n → ∞,  ∀µ ∈ P(S),   (4.7)

where d_tv is the total variation distance:

d_tv(µ, ν) = sup{|µ(B) − ν(B)| : B ∈ B([0, ∞))}.
A useful criterion for an exponential rate of convergence in (4.7), due to
Lund and Tweedie (1996), is the following.
Define

V_x(c) = E exp{c τ0(x)},   c^* = sup{c : V_x(c) < ∞ ∀x}.

Theorem 4.1 (Lund and Tweedie, 1996). Suppose c^* > 0. Then for all x ∈ [0, ∞), and for every 0 < c < c^*,

d_tv(p^{(n)}(x, ·), π) = o(exp{−cn})  as n → ∞.
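The coupling bound (4.7) is easy to probe by simulation. The sketch below is an illustration of mine, not from the paper; the choices Z ~ N(−0.5, 1) and the starting point x = 3 are illustrative. Since any chain started at x′ ≤ x agrees with the chain from x from the time τ0(x) onward, the tail of τ0(x) upper-bounds the total variation distance:

```python
import numpy as np

rng = np.random.default_rng(3)

def tau0(x, z):
    """First time n >= 1 at which the Lindley chain from x hits 0; by
    monotonicity, chains from any x' <= x have coupled with it by then."""
    for n, zn in enumerate(z, start=1):
        x = max(x + zn, 0.0)
        if x == 0.0:
            return n
    return len(z) + 1  # censored: not absorbed within the horizon

n_rep, horizon = 5_000, 2_000
taus = np.array([tau0(3.0, rng.normal(-0.5, 1.0, horizon))
                 for _ in range(n_rep)])

# P(tau_0(x) > n) upper-bounds d_tv(p^{(n)}(x, .), pi) as in (4.7).
tails = {n: float(np.mean(taus > n)) for n in (10, 50, 200)}
print("estimated P(tau_0 > n):", tails)
```

Because this Z has a finite moment generating function, the estimated tail decays geometrically, in line with Theorem 4.1.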
For an application to a process of special interest to us, sometimes referred to as the Lindley process, let {Zn : n ≥ 1} be an i.i.d. sequence, and let

Xn+1 = max{0, Xn + Zn+1}   (n ≥ 0),   (4.8)

where X0 is an arbitrary nonnegative random variable independent of the sequence {Zn : n ≥ 1}. Many authors have looked at this process (see, e.g., Lindley, 1952, Spitzer, 1956, Feller, 1971, pp. 194–200, Lund and Tweedie, 1996, Bhattacharya and Majumdar, 2007, pp. 336–338). From Spitzer (1956), it follows that this process is ergodic (i.e., it has an invariant probability π, necessarily unique) if and only if Σ_{n=1}^{∞} n^{−1} P(S_n > 0) < ∞, where S_n := Z1 + · · · + Zn. We will assume the stronger condition (see, e.g., Bhattacharya and Majumdar, 2007, pp. 237, 238): EZ1 < 0, and, to avoid trivialities, it is also assumed that P(Z1 > 0) > 0. Note that the αn (n ≥ 1) defining the process (4.8) are given by

αn x = max{0, x + Zn},   x ∈ [0, ∞)  (n ≥ 1).
To estimate V_x(c) in this case, observe that

P(τ0(x) > n) ≤ P(x + S_n > 0) = P(e^{dS_n} > e^{−dx}) ≤ e^{dx} M^n(d),   M(d) := E e^{dZ1},   (4.9)
assuming that the moment generating function M(d) is finite for some d > 0. Since M(0) = 1 and M′(0) = EZ1 < 0, the quantity M^* = inf{M(d) : d > 0} is less than 1. Let d^* be the point where this minimum is attained. Then let d = d^* in (4.9). Following Lund and Tweedie (1996), use the estimate

V_x(c) = Σ_{n=1}^{∞} e^{cn} P(τ0(x) = n) ≤ Σ_{n=0}^{∞} e^{c(n+1)} P(τ0(x) > n) ≤ e^{c + d^* x} (1 − e^c M^*)^{−1} < ∞

for all c such that e^c M^* < 1. Now let c_* = ln(1/M^*). Then c^* ≥ c_*, and one arrives at a slight extension of a result of Lund and Tweedie (1996).
Theorem 4.2. Under the above assumptions on Zn, one has

d_tv(p^{(n)}(x, dy), π) = o(exp{−cn})   ∀c < ln(1/M^*).
Example 4.1. (Two-sided exponential distribution). Let Z1 have the density

g(x) = (ab/(a + b)) e^{−bx},  x > 0;   g(x) = (ab/(a + b)) e^{ax},  x ≤ 0.

Assume 0 < a < b. Then the invariant distribution π has an atom at 0, and a density π(x) on (0, ∞) given by (see, e.g., Feller, 1971, Example VI.8.b)

π({0}) = 1 − a/b,   π(x) = ((b − a)/b) a e^{−(b−a)x},  x > 0.

In this example,

M(d) = E e^{dZ1} = (a/(a + b)) · 1/(1 − d/b) + (b/(a + b)) · 1/(1 + d/a)   (d < b),

d^* = (b − a)/2,   M^* = 4ab/(a + b)^2.
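The closed forms above can be verified numerically. The following sketch (my own check, with the illustrative values a = 1, b = 3) minimizes M(d) on a fine grid and compares against d^* = (b − a)/2 and M^* = 4ab/(a + b)^2:

```python
import numpy as np

a, b = 1.0, 3.0   # illustrative values with 0 < a < b

def M(d):
    """M(d) = E e^{d Z_1} for the two-sided exponential density."""
    return (a / (a + b)) / (1.0 - d / b) + (b / (a + b)) / (1.0 + d / a)

# Grid search over the interval where M is finite, (-a, b).
grid = np.linspace(-a + 1e-3, b - 1e-3, 200_001)
d_star = grid[np.argmin(M(grid))]

assert abs(d_star - (b - a) / 2.0) < 1e-3              # d* = (b - a)/2
assert abs(M(d_star) - 4 * a * b / (a + b) ** 2) < 1e-6  # M* = 4ab/(a+b)^2
print("d* ~", d_star, ", M* ~", M(d_star))
```

For a = 1, b = 3 this gives d^* = 1 and M^* = 3/4, so Theorem 4.2 yields a rate o(e^{−cn}) for every c < ln(4/3).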
Example 4.2. (Shifted exponential distribution). Here Z1 has the density, for some c > 0, θ > 0, θ < c,

g(x) = (1/θ) e^{−(x+c)/θ}  for x ≥ −c;   g(x) = 0  for x < −c,

so that

M(d) = e^{−cd}/(1 − θd),   d < 1/θ,
M^* = M((c − θ)/(cθ)) = (c/θ) e^{−c/θ + 1}.

One can check that the invariant distribution π has a density π(x) on (0, ∞) given by

π(x) = β^{−1} exp{−(x + c)/β},   x > 0,

where β > θ solves

1 − θ/β = exp{−c/β}.

The point mass at 0 is, of course, given by π({0}) = 1 − ∫_0^∞ π(x) dx.
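The fixed-point equation for β has no closed form, but it is easily solved numerically. The sketch below (with the illustrative parameters θ = 1, c = 2 of my own choosing) uses bisection on φ(β) = 1 − θ/β − e^{−c/β}, which is negative just above θ and positive for large β whenever θ < c:

```python
import math

theta, c = 1.0, 2.0   # illustrative, with 0 < theta < c

def phi(beta):
    """phi(beta) = 1 - theta/beta - exp(-c/beta); its root beta > theta
    determines the invariant density pi(x) = beta^{-1} e^{-(x+c)/beta}."""
    return 1.0 - theta / beta - math.exp(-c / beta)

lo, hi = theta + 1e-9, 100.0
assert phi(lo) < 0.0 < phi(hi)   # sign change brackets the root
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if phi(mid) < 0.0 else (lo, mid)
beta = 0.5 * (lo + hi)

assert beta > theta and abs(phi(beta)) < 1e-12
print("beta ~", beta)
```

With β in hand, π({0}) follows by integrating the density as in the text.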
The following example was considered by Iams and Majumdar (2010).
Example 4.3. (Normal with negative mean). Let Z1 have the Normal distribution N(µ, σ^2) with mean µ < 0 and variance σ^2 > 0. Then

M(d) = e^{dµ + σ^2 d^2/2},   d^* = −µ/σ^2,   M^* = e^{−µ^2/(2σ^2)}.
Remark 4.1. In general, one has π({0}) = (E_0 τ0)^{−1} > 0 for the process in (4.8) if E_0 τ0 < ∞. In case Z1 has an absolutely continuous distribution (with respect to Lebesgue measure on [0, ∞)), the invariant probability π has a density on (0, ∞), in addition to the point mass at 0.
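The identity π({0}) = (E_0 τ0)^{−1} can be illustrated by Monte Carlo. The check below is my own, using the Normal increments of Example 4.3 with the illustrative choice µ = −1, σ = 1: the long-run fraction of time at 0 along one long path should match the reciprocal of the mean return time to 0 estimated from fresh excursions.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = -1.0, 1.0   # illustrative: Z ~ N(-1, 1), so EZ < 0

# Estimator 1: long-run fraction of time at 0 (approximates pi({0})).
z = rng.normal(mu, sigma, 400_000)
x, hits = 0.0, 0
for zt in z:
    x = max(x + zt, 0.0)
    hits += (x == 0.0)
frac_zero = hits / z.size

# Estimator 2: mean return time to 0 over independent excursions from 0.
def tau0(rng, cap=100_000):
    x, n = 0.0, 0
    while n < cap:
        n += 1
        x = max(x + rng.normal(mu, sigma), 0.0)
        if x == 0.0:
            return n
    return cap

mean_tau = np.mean([tau0(rng) for _ in range(20_000)])
print(f"fraction at 0 ~ {frac_zero:.3f};  1/E_0 tau_0 ~ {1.0 / mean_tau:.3f}")
```

The two estimates agree up to Monte Carlo error, as predicted by the renewal-reward identity behind Remark 4.1.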
We now turn to those cases when τ0 (z) in (4.4) does not have a finite
moment generating function Vx (c) for any c > 0. This is the case when Z1
has a relatively ‘fat tail’, often observed in certain types of data in economics
and finance. In such cases one may still hope to get polynomially decaying
convergence rates to equilibrium. First, we consider general monotone increasing processes on [0, ∞) satisfying (4.2), (4.3), (4.6). It is convenient to
denote probabilities and expectations under an initial state x as Px and Ex ,
and Pµ , Eµ for the corresponding quantities when the initial distribution is
µ. (Thus P_x and E_x are really P_{δx}, E_{δx}.) Then (see, e.g., Bhattacharya and Majumdar, 2007, p. 214, relations C9.27-28),

∫ f dπ = E_0 [Σ_{m=0}^{τ0−1} f(Xm)] / E_0 τ0,

for all π-integrable (or nonnegative measurable) f on [0, ∞). For α > 0, writing h(x) ≡ E_x τ0^α, one gets

E_π τ0^α = ∫ E_x τ0^α π(dx) = ∫ h(x) π(dx) = E_0 [Σ_{m=0}^{τ0−1} h(Xm)] / E_0 τ0.

Note that h(Xm) = E_{Xm} τ0^α = E((τ0 ∘ Xm^+)^α | F_m), where Xm^+ is the after-m process: (Xm^+)_n ≡ X_{m+n} (n ≥ 0), and F_m is the sigmafield generated by {Xj : 0 ≤ j ≤ m}. Then
E_0 Σ_{m=0}^{τ0−1} h(Xm) = E_0 Σ_{m=0}^{∞} h(Xm) 1_{{m<τ0}}
  = Σ_{m=0}^{∞} E_0 E[(τ0 ∘ Xm^+)^α | F_m] 1_{{m<τ0}}
  = Σ_{m=0}^{∞} E_0 E[(τ0 ∘ Xm^+)^α 1_{{m<τ0}} | F_m]   (since {m < τ0} ∈ F_m)
  = Σ_{m=0}^{∞} E_0 (τ0 ∘ Xm^+)^α 1_{{m<τ0}} = E_0 Σ_{m=0}^{τ0−1} (τ0 ∘ Xm^+)^α.

Noticing that τ0 ∘ Xm^+ = τ0 − m on {m < τ0}, one then has

E_0 Σ_{m=0}^{τ0−1} h(Xm) = E_0 Σ_{m=0}^{τ0−1} (τ0 − m)^α = E_0 Σ_{m=1}^{τ0} m^α ≤ E_0 τ0^{α+1}.

Hence

E_π τ0^α ≤ E_0 τ0^{α+1} / E_0 τ0.   (4.10)
Theorem 4.3. In addition to (4.2), (4.3), assume E_0 τ0^{α+1} < ∞ for some α > 0. Then, for every µ such that E_µ τ0^α < ∞, one has

d_tv(T^{*n} µ, π) = o(n^{−α}),  as n → ∞.   (4.11)

In particular, (4.11) holds for µ = δ_x for all x ∈ [0, ∞).

Proof. Let the coupling time τ be expressed by (4.5), with X0, X0′ having distributions µ and π, respectively. Then

d_tv(T^{*n} µ, π) ≤ P(τ > n) ≤ n^{−α} ∫_{{τ>n}} τ^α dP.
It suffices then to prove that E τ^α < ∞. For this write

E τ^α = ∫_{{X0′ ≤ X0}} τ^α dP + ∫_{{X0′ > X0}} τ^α dP
  = E[τ0^α(X0) 1_{{X0′ ≤ X0}}] + E[τ0^α(X0′) 1_{{X0′ > X0}}] ≤ E_µ τ0^α + E_π τ0^α < ∞,

in view of (4.10) and the assumption E_µ τ0^α < ∞. For the second part, take µ = δ_x and note that E_0 τ0^{α+1} < ∞, together with (4.3), implies that there exist arbitrarily large x′ such that E_{x′} τ0^{α+1} ≡ E(τ0^{α+1}(x′)) < ∞. But τ0(x) ≤ τ0(x′) ∀x ≤ x′. Hence E_x τ0^{α+1} < ∞ ∀x ≥ 0. □
We now turn to the Lindley process (4.8), and assume

µ1 ≡ EZ1 < 0,   P(Z1 > 0) > 0,   ρ_s ≡ E |(Z1 − µ1)/σ|^s < ∞,   (4.12)

where s ≥ 3 is an integer and σ^2 = E(Z1 − µ1)^2. Denote by Φ the standard Normal distribution function. By the non-uniform error estimate in the CLT (see Bhattacharya and Ranga Rao, 1976, Corollary 17.7, p. 172), one has

P_0(τ0 > n) ≤ P(S_n > 0) = P((S_n − nµ1)/(σ√n) > −√n µ1/σ) = P((S_n − nµ1)/(σ√n) > √n |µ1|/σ)
  ≤ 1 − Φ(√n |µ1|/σ) + c′/(√n (1 + √n |µ1|/σ)^s) ≤ c″ (|µ1|/σ)^{−s} n^{−(s+1)/2},

where c′, c″ are constants which depend only on ρ_s and s.

A standard summation by parts and elementary estimation lead to, for 0 < α < (s − 1)/2,

E_0 τ0^{α+1} = Σ_{n=1}^{∞} n^{α+1} P_0(τ0 = n) ≤ (α + 1) 2^α Σ_{n=1}^{∞} n^α P_0(τ0 > n) < ∞.   (4.13)
Using Theorem 4.3, we have now proved the first part of Theorem 4.4 below.
Theorem 4.4. Assume (4.12), with s ≥ 3, for the Lindley process (4.8). Then, for every x ∈ [0, ∞),

d_tv(p^{(n)}(x, ·), π) = o(n^{−α})   for all α < (s − 1)/2.

For an initial distribution µ, one has

d_tv(T^{*n} µ, π) = o(n^{−α})   for all α < (s − 1)/2,

provided ∫_{[0,∞)} x^{(s+1)/2} µ(dx) < ∞.
Proof. In order to prove the second part of the theorem, again use
Corollary 17.7, p. 172, in Bhattacharya and Ranga Rao (1976), to get
Px (τ0 > n) ≤ P (x + Sn > 0)
√ |µ1 |
Sn − nµ1
x
√
= P
>− √ + n
σ n
σ n
σ
√ |µ1 |
x
c′
≤ 1−Φ − √ + n
+√ √ |µ1 | s .(4.14)
x
σ
σ n
n 1 + − σ√
+
n σ n
√
√
x
Note that − σ√
+ n |µσ1 | > n |µ2σ1 | if x ≤
n
n|µ1 |
2 ,
and, therefore, (4.14) yields
Px (τ0 > n) ≤ c′′′ n−(s+1)/2 ∀x ≤
n |µ1 |
,
2
where c′′′ is a constant depending only on |µ1 |, σ, ρs and s. Also, trivially,
Px (τ0 > n) ≤
2x
n |µ1 |
∀x >
.
n |µ1 |
2
Therefore,
Pµ (τ0 > n) ≤ c′′′ n−(s+1)/2 +
′′′ −(s+1)/2
≤ c n
+
1
n |µ1 |
Z
2xµ (dx)
{x>n|µ1 /2|}
1
n |µ1 | (n |µ1 /2|)
s−1
2
Z
x
s+1
2
µ (dx)
{x>n|µ1 /2|}
= O n−(s+1)/2 .
The series $\sum_{n=1}^{\infty} n^{\alpha}P_\mu(\tau_0 > n)$ then converges to a finite limit if $(s+1)/2 - \alpha > 1$, i.e., $\alpha < (s-1)/2$. This concludes the proof of Theorem 4.4. □
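The route through $P_x(\tau_0 > n)$ reflects a coupling argument: run two copies of the process with the same innovations, one from $x > 0$ and one from $0$. By monotonicity the upper copy dominates, and both sit at 0 from the moment the upper one first returns there, so the coalescence time is $\tau_0(x)$ and its tail dominates the total variation distance between the two laws. A minimal sketch, again assuming the standard Lindley recursion for (4.8) and an illustrative Gaussian $Z$:

```python
import random

# Coupling sketch: with common innovations the chain from x0 dominates the
# chain from 0, and X_n(x0) = 0 forces X_n(0) = 0 as well.  Hence the
# probability that the upper chain has not yet hit 0 by time n, i.e.
# P_x0(tau_0 > n), bounds d_tv(p^(n)(x0, .), p^(n)(0, .)).
def coalescence_tail(x0, n, trials=20000, seed=3):
    rng = random.Random(seed)
    not_coupled = 0
    for _ in range(trials):
        upper = x0                       # chain started at x0 > 0
        coupled = False
        for _ in range(n):
            upper = max(upper + rng.gauss(-0.5, 1.0), 0.0)
            if upper == 0.0:             # both chains are now at 0
                coupled = True
                break
        if not coupled:
            not_coupled += 1
    return not_coupled / trials          # estimates P_x0(tau_0 > n)

# The coupling bound shrinks as the horizon n grows.
print(coalescence_tail(2.0, 10), coalescence_tail(2.0, 40))
```

This is the mechanism behind Theorem 4.3 as used above; the moment condition on $\mu$ in Theorem 4.4 simply controls how large the starting point $x$ can be.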
For proving central limit theorems for general monotone (increasing)
Markov processes on $[0,\infty)$, under the hypotheses (4.2), (4.3), (4.6), and
some appropriate moment assumptions, a good way to proceed is to establish a CLT for the normalized sum
$$\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\left(\sum_{m=\tau_0^{(j)}}^{\tau_0^{(j+1)}-1}\left[f(X_m) - \int f\,d\pi\right]\right) \longrightarrow N\left(0,\sigma^2\right), \tag{4.15}$$
where $\tau_0^{(j)}$ is the $j$-th return time to 0: $\tau_0^{(0)} = 0$, $\tau_0^{(j)} = \inf\left\{n > \tau_0^{(j-1)} : X_n = 0\right\}$,
$j \ge 1$, $\tau_0^{(1)} = \tau_0$. The classical CLT holds for (4.15), as $n \to \infty$, if
$$E_0\left(\sum_{m=0}^{\tau_0 - 1}\left[f(X_m) - \int f\,d\pi\right]\right)^2 < \infty. \tag{4.16}$$
From this one may derive the desired result (see, e.g., Bhattacharya and
Majumdar, 2007, Theorem 10.2, pp. 187–188),
$$\frac{1}{\sqrt{N}}\sum_{i=0}^{N}\left[f(X_i) - \int f\,d\pi\right] \longrightarrow N\left(0,\delta^2\right) \quad \text{as } N \to \infty, \tag{4.17}$$
where $\delta^2$ is given by
$$\delta^2 = \left(E_0\tau_0\right)^{-1}E_0\left[\sum_{m=0}^{\tau_0 - 1}\left(f(X_m) - \int f\,d\pi\right)\right]^2 \equiv \left(E_0\tau_0\right)^{-1}\sigma^2. \tag{4.18}$$
Taking this route, it easily follows that the CLT holds whatever be the initial
distribution µ. Also see Bhattacharya (1982), Theorem 2.6.
Our final result is now an easy consequence of (4.13) and the above
sufficient condition (4.16) for the CLT (4.17).
Theorem 4.5. For the process (4.8), assume (4.12) holds for some s > 3.
Then for all bounded measurable f on [0, ∞), the CLT (4.17) holds, whatever
the initial distribution.
Proof. To verify (4.16), write $c = \sup\left\{\left|f(x) - \int f\,d\pi\right| : x \in [0,\infty)\right\}$. Then
$$E_0\left(\sum_{m=0}^{\tau_0 - 1}\left[f(X_m) - \int f\,d\pi\right]\right)^2 \le E_0\tau_0^2\,c^2 < \infty,$$
provided $s > 3$, as shown by (4.13). □
Remark 4.2. The technique of representing the sum in (4.18) as a sum
of i.i.d. block sums such as (4.15) allows one not only to apply the classical
CLT, but also to immediately extend Theorem 4.5 to a functional central
limit theorem (see, e.g., Billingsley, 1968, pp. 68–73, or Bhattacharya and
Waymire, 2009, pp. 99–101).
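The regeneration scheme behind (4.15)–(4.18) translates directly into a simulation-based estimator of $\delta^2$: cut a simulated path into i.i.d. blocks between successive returns to 0, center each block sum, and divide the second moment by the mean block length. A minimal sketch, assuming the standard Lindley recursion $X_{n+1} = \max(X_n + Z_{n+1}, 0)$ for (4.8); the bounded $f$ and the Gaussian law of $Z$ are illustrative choices, not from the paper:

```python
import random

# Estimate delta^2 of (4.18) by cutting a simulated Lindley path into i.i.d.
# regeneration blocks between returns to 0, as in the representation (4.15).
def delta2_estimate(n_steps=200000, seed=1):
    rng = random.Random(seed)
    f = lambda x: min(x, 1.0)               # a bounded test function
    x = 0.0
    block_sums, block_lens = [], []
    cur_sum, cur_len = 0.0, 0
    total_f = 0.0
    for _ in range(n_steps):
        x = max(x + rng.gauss(-0.5, 1.0), 0.0)
        cur_sum += f(x)
        cur_len += 1
        total_f += f(x)
        if x == 0.0:                        # a return to 0 closes the block
            block_sums.append(cur_sum)
            block_lens.append(cur_len)
            cur_sum, cur_len = 0.0, 0
    f_pi = total_f / n_steps                # estimates  integral of f d(pi)
    centered = [s - L * f_pi for s, L in zip(block_sums, block_lens)]
    sigma2 = sum(c * c for c in centered) / len(centered)  # E_0[block]^2
    mean_len = sum(block_lens) / len(block_lens)           # estimates E_0 tau_0
    return sigma2 / mean_len                # delta^2 = sigma^2 / E_0 tau_0

print(delta2_estimate())
```

The discarded partial block at the end of the path introduces a small bias that vanishes as the path length grows; this is the usual price of the regeneration estimator.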
References
Bhattacharya, R.N. (1982). On the functional central limit theorem and the law of the
iterated logarithm for Markov processes. Z. Wahrsch. Verw. Gebiete, 60, 185–201.
Bhattacharya, R.N. and Lee, O. (1988). Asymptotics of a class of Markov processes
which are not in general irreducible. Ann. Probab., 16, 1333–47 (correction (1997):
Ann. Probab., 25, 1541–43).
Bhattacharya, R.N. and Majumdar, M. (1999). On a theorem of Dubins and Freedman. J. Theoret. Probab., 12, 1067–1087.
Bhattacharya, R.N. and Majumdar, M. (2001). On a class of random dynamical
systems: theory and applications. J. Econom. Theory, 96, 208–229.
Bhattacharya, R.N. and Majumdar, M. (2004). Stability in distribution of randomly
perturbed quadratic maps as Markov processes. Ann. Appl. Probab., 14, 1802–1809.
Bhattacharya, R.N. and Majumdar, M. (2007). Random Dynamical Systems: Theory
and Applications. Cambridge University Press, Cambridge.
Bhattacharya, R.N. and Majumdar, M. (2010). Random iterates of monotone maps.
Rev. Econ. Des., 14, 185–192.
Bhattacharya, R.N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. John Wiley and Sons, New York.
Bhattacharya, R.N. and Rao, B.V. (1993). Random iteration of two quadratic maps.
In Stochastic Processes: A Festschrift in Honour of Gopinath Kallianpur, (Cambanis, S., Ghosh, J.K., Karandikar, R.L. and Sen, P.K., eds.). Springer-Verlag, New
York, 13–22.
Bhattacharya, R.N. and Waymire, E.C. (2009). Stochastic Processes with Applications. SIAM Classics in Applied Mathematics, 61. SIAM, Philadelphia.
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Blackwell, D. (1965). Discounted dynamic programming. Ann. Math. Statist., 36,
226–235.
Blumenthal, R.M. and Corson, H. (1972). On continuous collections of measures.
In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and
Probability, 2, (L.M. Le Cam, J. Neyman and E.L. Scott, eds.). Univ. California
Press, Berkeley, 33–40.
Chakraborty, S. and Rao, B.V. (1998). Completeness of Bhattacharya metric on the
space of probabilities. Statist. Probab. Lett., 36, 321–326.
Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev., 41,
45–76.
Dubins, L.E. and Freedman, D.A. (1966). Invariant probabilities for certain Markov
processes. Ann. Math. Statist., 37, 837–868.
Ellner, S. (1984). Asymptotic behavior of some stochastic difference equation population models. J. Math. Biol., 19, 169–200.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. 2.
Second Edition. John Wiley and Sons, New York.
Gordin, M.I. and Lifsic, B.A. (1978). The central limit theorem for stationary Markov
processes (English translation). Soviet Math. Dokl., 19, 392–394.
Hopenhayn, H.A. and Prescott, E.C. (1992). Stochastic monotonicity and stationary
distributions for dynamic economies. Econometrica, 60, 1387–1406.
190
Rabi Bhattacharya, Mukul Majumdar and Nigar Hashimzade
Iams, S. and Majumdar, M. (2010). Stochastic equilibrium: concepts and computations
for Lindley processes. Internat. J. of Econom. Theory, 6, 47–56.
Kifer, Y. (1986). Ergodic Theory of Random Transformations. Birkhauser, Boston.
Lindley, D.V. (1952). The theory of queues with a single server. Math. Proc. Cambridge
Philos. Soc., 48, 277.
Lindvall, T. (1992). Lectures on the Coupling Method. John Wiley and Sons, New
York.
Lund, R.B. and Tweedie, R.L. (1996). Geometric convergence rates for stochastically
ordered Markov chains. Math. Oper. Res., 21, 182–194.
Maitra, A. (1968). Discounted dynamic programming on compact metric spaces.
Sankhyā, Ser. A, 27, 241–248.
Majumdar, M. and Mitra, T. (1983). Dynamic optimization with non-convex technology: The case of a linear objective function. Rev. Econom. Stud., 50, 143–151.
Majumdar, M., Mitra, T. and Nyarko, Y. (1989). Dynamic optimization under
uncertainty: non-convex feasible set. In Joan Robinson and Modern Economic
Theory, (G. Feiwel et al., eds.). Macmillan, New York, 545–590.
Meyn, S.P. and Tweedie, R.L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, New York.
Ross, S.M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press,
New York.
Spitzer, F. (1956). A combinatorial lemma and its application to probability theory.
Trans. Amer. Math. Soc., 82, 323–339.
Yahav, J.A. (1975). On a fixed point theorem and its stochastic equivalent. J. Appl.
Probab., 12, 605–611.
Rabi Bhattacharya
Department of Mathematics
The University of Arizona
Tucson, AZ 85721, USA
E-mail: [email protected]

Mukul Majumdar
Department of Economics
Cornell University
Ithaca, NY 14853, USA
E-mail: [email protected]

Nigar Hashimzade
School of Economics
University of Reading
Reading, Berkshire RG6 6AA
United Kingdom
E-mail: [email protected]

Paper received October 2009; revised January 2010.