Weak convergence of probability measures
These additional notes contain a short overview of the most important results on weak
convergence of probability measures. Many more details and results as well as proofs can
be found in the (German) lecture notes “Wahrscheinlichkeitstheorie”.
1. Weak convergence of probability measures on metric spaces
In the sequel, (S, d) is a metric space with Borel σ-field S = B(S). Let µ and µn , n ∈ IN ,
be probability measures on (S, S). How can one define convergence of (µn )n∈IN to µ?
Possible ideas could be
a) lim_{n→∞} µn(A) = µ(A) for all A ∈ S.
b) ‖µn − µ‖ := sup_{A∈S} |µn(A) − µ(A)| −→ 0 for n → ∞ (this is the so-called convergence in variation).
In general, both notions are too restrictive for our purposes:
– For µn := δ_{1/n} and µ := δ_0, neither a) nor b) is satisfied, because A := {0} has µn(A) ≡ 0, but µ(A) = 1.
– If µn is a standardised binomial distribution with parameters n, p and µ is the standard normal distribution N(0, 1), there is a countable set A ∈ B(IR) with µn(A) ≡ 1, but of course µ(A) = 0; a numerical illustration follows below.
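The binomial example can be made concrete numerically. The following Python sketch is an added illustration (not part of the original notes; the helper names and the test function h are ad hoc choices): the standardised Binomial(n, 1/2) and N(0, 1) are mutually singular, yet the integrals of a bounded continuous h against them converge to each other.

```python
# Added illustration: integrals of a bounded continuous h against the standardised
# Binomial(n, 1/2) approach the integral against N(0,1), even though the two
# measures live on disjoint sets.
import numpy as np
from scipy import stats, integrate

def integral_standardised_binomial(h, n, p=0.5):
    """Integral of h against the law of (S_n - n*p)/sqrt(n*p*(1-p)), S_n ~ Binomial(n, p)."""
    k = np.arange(n + 1)
    atoms = (k - n * p) / np.sqrt(n * p * (1 - p))   # support points (a countable set)
    weights = stats.binom.pmf(k, n, p)               # their probabilities, summing to 1
    return float(np.sum(h(atoms) * weights))

def integral_standard_normal(h):
    """Integral of h against N(0,1), by numerical quadrature."""
    val, _ = integrate.quad(lambda x: h(x) * stats.norm.pdf(x), -np.inf, np.inf)
    return val

h = lambda x: 1.0 / (1.0 + x ** 2)                   # bounded and continuous on IR
for n in (10, 100, 1000):
    print(n, integral_standardised_binomial(h, n))
print("limit:", integral_standard_normal(h))         # the values above approach this
```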
(1.1) Definition. We say that (µn)n∈IN converges weakly to µ and write µn =⇒ µ as n → ∞ if
lim_{n→∞} ∫ h dµn = ∫ h dµ   for all h ∈ Cb(S),
where Cb (S) denotes the space of all bounded continuous functions h : S → IR.
(1.2) Theorem (Portmanteau Theorem): The following statements are equivalent:
1) µn =⇒ µ as n → ∞.
2) lim_{n→∞} ∫ h dµn = ∫ h dµ for all uniformly continuous h ∈ Cb(S).
3) lim sup_{n→∞} µn(F) ≤ µ(F) for all closed F ⊆ S.
4) lim inf_{n→∞} µn(G) ≥ µ(G) for all open G ⊆ S.
5) lim_{n→∞} µn(A) = µ(A) for all µ-boundaryless A ∈ S, i.e. A ∈ S with µ(Ā \ A◦) = 0, where Ā is the closure and A◦ the interior of A.
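Condition 5) can be checked numerically in the binomial example from above; the sketch below is an added illustration (the helper name is ad hoc). Half-lines A = (−∞, a] are µ-boundaryless for µ = N(0, 1), since their boundary {a} is a µ-null set, so µn(A) must converge to µ(A) = Φ(a).

```python
# Added illustration of Portmanteau condition 5): for the standardised Binomial(n, 1/2)
# and the half-line A = (-inf, a], whose boundary {a} is an N(0,1)-null set,
# mu_n(A) converges to Phi(a).
import numpy as np
from scipy import stats

def mu_n_half_line(a, n, p=0.5):
    """mu_n((-inf, a]) for mu_n = law of (S_n - n*p)/sqrt(n*p*(1-p)), S_n ~ Binomial(n, p)."""
    # (S_n - n*p)/sigma <= a  is the same event as  S_n <= n*p + a*sigma
    sigma = np.sqrt(n * p * (1 - p))
    return stats.binom.cdf(np.floor(n * p + a * sigma), n, p)

for a in (-1.0, 0.3, 2.0):
    values = [mu_n_half_line(a, n) for n in (10, 100, 10000)]
    print(a, values, "->", stats.norm.cdf(a))        # should approach Phi(a)
```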
If one thinks of µn , µ as the distributions of S-valued random variables Xn , X, one often
uses instead of weak convergence of µn to µ the terminology that the Xn converge to X
in distribution. More precisely, suppose that we have for any n ∈ IN a probability space
(Ωn , Fn , Pn ) and a measurable mapping Xn : Ωn → S, i.e. an S-valued random variable,
and also a probability space (Ω, F, P ) and a measurable mapping X : Ω → S. (This is
always possible if we only specify the distributions µn , µ, but not the mappings: We can
take Ωn := Ω := S, Fn := F := S, Pn := µn , P := µ and Xn := X := Id : S → S.)
The key point here is that all the Xn and X have the same range S. The distributions
µn := Pn ◦ Xn^{−1} of Xn under Pn and µ := P ◦ X^{−1}, the distribution of X under P, are
then probability measures on (S, S). We then say that (Xn )n∈IN converges in distribution
to X and write Xn =⇒ X if µn =⇒ µ as n → ∞. Other notations sometimes used are Xn −d→ X, Xn −L→ X, L(Xn) =⇒ L(X), and very explicitly L(Xn | Pn) =⇒ L(X | P).
2. Tightness and Prohorov’s theorem
In this section, (S, d) is a metric space with Borel σ-field S = B(S), and M1 (S) is the set
of all probability measures on (S, S). Convergence of a sequence (µn )n∈IN in M1 (S) to µ
then means that µn =⇒ µ as n → ∞, i.e. lim_{n→∞} ∫ h dµn = ∫ h dµ for all h ∈ Cb(S).
(2.1) Remark. The topology on M1 (S) associated to weak convergence has as a neighbourhood basis the sets of the form
U_{ε,h1,...,hn}(µ) := { ν ∈ M1(S) : |∫ hi dν − ∫ hi dµ| < ε for i = 1, . . . , n }
with ε > 0, n ∈ IN, hi ∈ Cb(S).
(Recall that a neighbourhood basis U of a point x is a system of neighbourhoods of x
such that each neighbourhood of x contains some U ∈ U.)
(2.2) Definition. A set M ⊆ M1 (S) is called relatively sequentially compact if each
sequence in M contains a weakly convergent subsequence.
The goal of this section is a characterisation of relatively sequentially compact subsets of
M1 (S).
(2.3) Remark. On M1(S), we have the above topology corresponding to weak convergence and hence also a notion of compactness: M ⊆ M1(S) is called compact if every covering of M by open sets contains a finite subcover of M. In general, for topological spaces one has neither “compact =⇒ sequentially compact” (!) nor “sequentially compact =⇒ compact”. However, the two notions are equivalent for metric spaces. So it is of interest whether the topology of weak convergence is metrisable, i.e. whether there exists a metric ϱ on M1(S) such that ϱ(µn, µ) −→ 0 for n → ∞ if and only if µn =⇒ µ as n → ∞. This is possible if S is separable.
(2.4) Example. Take S = IR and µn = δ_{xn} for a sequence (xn)n∈IN in IR. If lim_{n→∞} xn = +∞, one cannot hope in general to find a weakly convergent subsequence of (µn); for xn = n, we have for example ∫ h dµn = h(n), and this need not converge simultaneously for all h ∈ Cb(IR) to some limit; one can for instance look at h(x) = sin x. But if the sequence (xn) is bounded, it has a convergent subsequence x_{nk} −→ x for k → ∞, and then we obviously have µ_{nk} =⇒ δx as k → ∞.
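A small numerical companion to Example (2.4), added here as an illustration (the chosen sequences are ad hoc): for xn = n the test integrals ∫ sin dµn = sin(n) keep oscillating, while for a bounded sequence a convergent subsequence of point masses appears.

```python
# Added sketch for Example (2.4): mu_n = delta_{x_n}.
import numpy as np

ns = np.arange(1, 21)

# Case x_n = n: the integral of h(x) = sin x against mu_n equals sin(n),
# which does not converge as n -> infinity.
print(np.sin(ns))

# Case of a bounded sequence, e.g. x_n = (-1)^n * (1 + 1/n): the even-indexed
# subsequence x_2, x_4, ... converges to 1, so delta_{x_{n_k}} => delta_1 along it,
# because h(x_{n_k}) -> h(1) for every continuous h.
x = (-1.0) ** ns * (1.0 + 1.0 / ns)
print(x[1::2])
```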
The above example shows that if we want to obtain relative sequential compactness, we
need to impose some kind of boundedness condition; the mass of µn should not be allowed
to wander away as n → ∞.
(2.5) Definition. M ⊆ M1 (S) is called tight if for every ε > 0, there exists a compact
set K ⊆ S with µ(K) ≥ 1 − ε for all µ ∈ M.
The main result of this section is now
(2.6) Theorem (Prohorov): Consider M ⊆ M1 (S).
1) If M is tight, then M is relatively sequentially compact.
2) Suppose that S is complete and separable. If M is relatively sequentially compact, then
M is also tight.
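As an added illustration of Definition (2.5) (with an ad hoc family, not from the notes), consider M = {N(m, 1) : |m| ≤ 1} on S = IR: tightness only requires a single compact interval K = [−r, r] carrying mass at least 1 − ε under every member, and the worst case is attained at the extreme means m = ±1.

```python
# Added sketch: the family M = { N(m, 1) : |m| <= 1 } on S = IR is tight.
import numpy as np
from scipy import stats

def worst_case_mass(r):
    """Smallest mass that a member of the family gives to K = [-r, r].

    By symmetry and monotonicity in m, the minimum over |m| <= 1 is attained at m = 1.
    """
    return stats.norm.cdf(r, loc=1.0) - stats.norm.cdf(-r, loc=1.0)

eps = 1e-3
r = 1.0
while worst_case_mass(r) < 1 - eps:   # enlarge K until it carries mass >= 1 - eps
    r += 0.5
print(r, worst_case_mass(r))          # one compact K = [-r, r] works for every mu in M
```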
3. Weak convergence on S = C[0, 1]
In this section, we take as S = C[0, 1] the space of continuous functions x : [0, 1] → IR with the sup-norm ‖x‖ := sup_{0≤t≤1} |x(t)| and the corresponding metric d(x, y) := ‖x − y‖.
Then S is a Banach space and separable, because [0, 1] is compact; see Dieudonné (1969),
“Foundations of Modern Analysis”, 7.4.4. The Borel σ-field S = B(S) = σ(Cb (S)) is also
generated by the system Z of all cylinder sets
Z = {x ∈ S | x(tj ) ∈ Aj , j = 1, . . . , n}
with n ∈ IN , 0 ≤ t1 < t2 < · · · < tn ≤ 1 and A1 , . . . , An ∈ B(IR), i.e.
(3.1)    B(S) = σ(Z).
Moreover, Z is clearly closed under taking intersections.
A probability measure µ on C[0, 1], or more precisely on (S, B(S)), corresponds to a
real-valued stochastic process X = (Xt )0≤t≤1 with continuous trajectories: If we have
such an X defined on (Ω, F, P ), we obtain µ on C[0, 1] as the image of P under X,
i.e. as the distribution of X under P ; conversely, if we have µ on C[0, 1], we can take
Ω := S = C[0, 1], F := S = B(C[0, 1]), P := µ, and the coordinate process X with
Xt (ω) := ω(t), 0 ≤ t ≤ 1. Moreover, µ is uniquely determined by its finite-dimensional
marginal distributions
µ^(J) := µ ◦ πJ^{−1},   J ⊆ [0, 1] finite, on IR^J,
where πJ : S → IR^{|J|}, x ↦ πJ(x) := (x(tj))_{tj∈J} are the canonical projections.
Because all the πJ : S → IR^{|J|} are continuous, µn =⇒ µ on C[0, 1] implies the weak convergence of all finite-dimensional marginal distributions, i.e. µn^(J) =⇒ µ^(J) on IR^{|J|} for all finite J ⊆ [0, 1]. Indeed, if g ∈ Cb(IR^{|J|}), we have g ◦ πJ ∈ Cb(S), and the transformation theorem yields
∫_{IR^{|J|}} g dµn^(J) = ∫_{IR^{|J|}} g d(µn ◦ πJ^{−1}) = ∫_S (g ◦ πJ) dµn −→ ∫_S (g ◦ πJ) dµ = ∫_{IR^{|J|}} g d(µ ◦ πJ^{−1}) = ∫_{IR^{|J|}} g dµ^(J).
However, the converse is not true, as shown by the following simple counterexample.
(3.2) Example. Take x ≡ 0, µ = δx and µn = δ_{xn} with the xn piecewise linear, xn(0) = 0, xn(1/n) = 1 and xn(t) = 0 for t ≥ 2/n [−→ picture!]. Then the xn clearly converge to x pointwise, but not uniformly.
For J = {t1, . . . , tm} ⊆ [0, 1], we have
µn ◦ πJ^{−1} =⇒ µ ◦ πJ^{−1} as n → ∞
due to pointwise convergence, because
∫_{IR^{|J|}} g d(µn ◦ πJ^{−1}) = ∫_S (g ◦ πJ) dµn = g((xn(tj))_{j=1,...,m}) −→ g((x(tj))_{j=1,...,m}) = ∫_{IR^{|J|}} g d(µ ◦ πJ^{−1})
for every g ∈ Cb(IR^{|J|}), by the pointwise convergence xn → x and the continuity of g.
But (µn)n∈IN does not converge weakly to µ, because the xn do not converge to x uniformly; for example, the function h(x) := min(‖x‖, 1) is in Cb(C[0, 1]), and
∫ h dµn = h(xn) ≡ 1,   but   ∫ h dµ = h(x) = h(0) = 0.
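A small numerical companion to Example (3.2), added as an illustration (the grid and function names are ad hoc): the finite-dimensional values of the tent functions xn tend to 0, while their sup-norm stays equal to 1.

```python
# Added sketch of the tent functions x_n from Example (3.2).
import numpy as np

def x_n(t, n):
    """Piecewise linear: 0 at t = 0, 1 at t = 1/n, 0 again at t = 2/n, and 0 afterwards."""
    t = np.asarray(t, dtype=float)
    return np.clip(np.minimum(n * t, 2.0 - n * t), 0.0, 1.0)

t = np.linspace(0.0, 1.0, 10001)
for n in (5, 50, 500):
    fd_values = x_n([0.1, 0.5, 0.9], n)   # finite-dimensional projections -> (0, 0, 0)
    sup_norm = x_n(t, n).max()            # stays equal to 1 for every n
    print(n, fd_values, sup_norm)
# Hence h(x_n) = min(||x_n||, 1) = 1 for all n, while h(0) = 0: no weak convergence.
```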
If we now think in general of µn and µ as the distributions of continuous stochastic
processes X^n and X, respectively, the next theorem is the key result on convergence in
distribution of continuous stochastic processes.
(3.3) Theorem. For probability measures (µn )n∈IN , µ on (C[0, 1], B(C[0, 1])), the following
are equivalent:
1) µn =⇒ µ as n → ∞.
2) All finite-dimensional marginal distributions of the µn converge weakly to the corresponding finite-dimensional marginal distributions of µ, and the sequence (µn )n∈IN is
tight.
To analyse the tightness of a given set M ⊆ M1 (C[0, 1]), we need a description of the
(relatively) compact subsets of C[0, 1]. For that purpose, define for x ∈ C[0, 1] and δ > 0
the modulus of continuity of x as
wδ(x) := sup{ |x(t) − x(s)| : s, t ∈ [0, 1] with |t − s| ≤ δ }.
Then we have
lim_{δ↘0} wδ(x) = 0   for every x ∈ C[0, 1],
because each x is uniformly continuous on [0, 1]. Moreover, x ↦ wδ(x) is continuous (as a mapping from C[0, 1] to IR for fixed δ > 0), because |wδ(x) − wδ(y)| ≤ 2‖x − y‖ (as can be checked easily), and δ ↦ wδ(x) is clearly increasing for every fixed x.
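The following added sketch (grid-based, names ad hoc) evaluates a discretised wδ for a fixed continuous path and shows it decreasing as δ ↘ 0.

```python
# Added sketch: a discretised modulus of continuity w_delta on a grid of [0, 1].
import numpy as np

def modulus_of_continuity(path, times, delta):
    """sup of |x(t) - x(s)| over grid points with |t - s| <= delta."""
    w = 0.0
    for i in range(len(times)):
        close = np.abs(times - times[i]) <= delta        # grid points within delta of times[i]
        w = max(w, float(np.max(np.abs(path[close] - path[i]))))
    return w

times = np.linspace(0.0, 1.0, 2001)
x = np.sqrt(times)                                        # a fixed continuous path on [0, 1]
for delta in (0.1, 0.01, 0.001):
    print(delta, modulus_of_continuity(x, times, delta))  # decreases towards 0 as delta -> 0
```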
(3.4) Proposition (Arzelà–Ascoli): A set A ⊆ C[0, 1] is relatively compact in C[0, 1] if and only if
1) A is uniformly bounded, i.e. sup_{x∈A} ‖x‖ < ∞,
and
2) A is uniformly (over its elements x) uniformly continuous, meaning that
lim_{δ↘0} sup_{x∈A} wδ(x) = 0.
(3.5) Remark. If one already has 2), one can also replace 1) by
1′) A is uniformly bounded at 0, i.e. sup_{x∈A} |x(0)| < ∞.
(3.6) Proposition. Fix M ⊆ M1(C[0, 1]) and denote by µ0 := µ({x(0) ∈ · }) = µ ◦ π_{{0}}^{−1} for µ ∈ M the distribution of “coordinate 0”. Then M is tight if and only if we have both that {µ0 | µ ∈ M} is tight on IR and that
(3.7)    lim_{δ↘0} sup_{µ∈M} µ({x | wδ(x) ≥ η}) = 0   for all η > 0,
i.e. if wδ(·) converges to 0 in µ-probability (µ-stochastically), uniformly over M, as δ → 0.
(3.8) Remark. If M is a sequence (µn)n∈IN, then instead of (3.7) in Proposition (3.6), it is already sufficient if we have
(3.9)    lim_{δ↘0} lim sup_{n→∞} µn({x | wδ(x) ≥ η}) = 0   for all η > 0.
Remark. In Proposition (3.6), the family {µ0 | µ ∈ M} is for example trivially tight if
µ({x | x(0) = 0}) = 1
for all µ ∈ M.
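To close the circle with Example (3.2): criterion (3.9) fails there, in line with the lack of weak convergence. The sketch below is an added illustration; the closed-form wδ(xn) = min(nδ, 1) is elementary to check for the tent functions.

```python
# Added sketch: criterion (3.9) fails for the tent functions x_n of Example (3.2).
# For those paths, w_delta(x_n) = min(n * delta, 1), so for every fixed delta > 0
# and eta <= 1 we get mu_n({w_delta >= eta}) = 1 for all n >= 1/delta.

def w_delta_tent(n, delta):
    """Modulus of continuity of the tent function x_n (slope n up, then slope -n down)."""
    return min(n * delta, 1.0)

eta = 0.5
for delta in (0.1, 0.01, 0.001):
    tail = [float(w_delta_tent(n, delta) >= eta) for n in (10, 100, 10000)]
    print(delta, tail)   # the limsup over n is 1 for every delta, so (3.9) does not hold
```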