ANDREW TULLOCH

ADVANCED PROBABILITY

TRINITY COLLEGE, THE UNIVERSITY OF CAMBRIDGE
Contents

1 Conditional Expectation
1.1 Discrete Case
1.2 Existence and Uniqueness
1.3 Conditional Jensen's Inequalities
1.4 Product Measures and Fubini's Theorem
1.5 Examples of Conditional Expectation
1.6 Notation for Example Sheet 1
2 Discrete Time Martingales
2.1 Optional Stopping
2.2 Hitting Probabilities for a Simple Symmetric Random Walk
2.3 Martingale Convergence Theorem
2.4 Uniform Integrability
2.5 Backwards Martingales
2.6 Applications of Martingales
2.6.1 Martingale proof of the Radon-Nikodym theorem
3 Stochastic Processes in Continuous Time
4 Bibliography
1 Conditional Expectation
Let (Ω, F , P) be a probability space. Ω is a set, F is a σ-algebra on
Ω, and P is a probability measure on (Ω, F ).
Definition 1.1. F is a σ-algebra on Ω if it satisfies
• ∅, Ω ∈ F ,
• A ∈ F ⇒ Ac ∈ F ,
• if ( An )n≥0 is a collection of sets in F , then ∪n An ∈ F .
Definition 1.2. P is a probability measure on (Ω, F ) if
• P : F → [0, 1] is a set function,
• P(∅) = 0, P(Ω) = 1,
• if ( An )n≥0 is a collection of pairwise disjoint sets in F , then P(∪n An ) = ∑n P( An ).
Definition 1.3. The Borel σ-algebra B(R) is the σ-algebra generated by the open sets of R. Call O the collection of open subsets of R; then
B(R) = ∩{ξ : ξ is a σ-algebra containing O}. (1.1)
Definition 1.4. If A is a collection of subsets of Ω, then we write σ (A) = ∩{ξ : ξ a σ-algebra containing A}.
Definition 1.5. X is a random variable on (Ω, F ) if X : Ω → R is a function with the property that X−1 (V ) ∈ F for all open sets V in R.
Exercise 1.6. If X is a random variable then { B ⊆ R, X −1 ( B) ∈ F } is a
σ-algebra and contains B(R).
If ( Xi , i ∈ I ) is a collection of random variables, then we write σ ( Xi , i ∈ I ) = σ ({ω ∈ Ω : Xi (ω ) ∈ B}, i ∈ I, B ∈ B(R)); it is the smallest σ-algebra that makes all the Xi measurable.
Definition 1.7. First we define expectation for positive simple random variables:
E( ∑_{i=1}^n ci 1( Ai ) ) = ∑_{i=1}^n ci P( Ai ), (1.2)
with ci positive constants and Ai ∈ F .
We can extend this to any positive random variable X ≥ 0 by
approximation X as the limit of piecewise constant functions.
For a general X, we write X = X+ − X− with X+ = max( X, 0) and X− = max(− X, 0). If at least one of E( X+ ) or E( X− ) is finite, then we define E( X ) = E( X+ ) − E( X− ).
We call X integrable if E(| X |) < ∞.
Definition 1.8. Let A, B ∈ F with P( B) > 0. Then
P( A | B) = P( A ∩ B)/P( B),
E[ X | B] = E( X 1( B))/P( B).
Goal: we want to define E( X |G), a random variable measurable with respect to the σ-algebra G .
1.1 Discrete Case
Suppose G is countably generated: ( Bi )i∈N is a collection of pairwise disjoint sets in F with ∪i Bi = Ω, and G = σ ( Bi , i ∈ N). It is easy to check that G = {∪ j∈ J Bj : J ⊆ N}.
Let X be an integrable random variable, and set
X′ = E( X |G) = ∑_{i∈N} E( X | Bi ) I( Bi ),
with the convention E( X | Bi ) = 0 whenever P( Bi ) = 0. Then:
(i) X′ is G -measurable (check).
(ii) E(| X′ |) ≤ E(| X |), (1.3)
and so X′ is integrable.
(iii) For all G ∈ G ,
E( X I( G )) = E( X′ I( G )) (1.4)
(check).
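On a finite sample space the discrete construction above can be checked exactly. The following sketch is illustrative only (the sample space, partition, and function names are my own choices, not from the notes); it builds X′ as the piecewise-constant function ∑ E( X | Bi ) I( Bi ) and verifies properties (ii) and (iii):

```python
from fractions import Fraction

# Finite sample space with uniform probability; a partition (B_i) generates G.
omega = list(range(12))
P = {w: Fraction(1, 12) for w in omega}
blocks = [range(0, 4), range(4, 8), range(8, 12)]  # pairwise disjoint, union = Omega
X = lambda w: w * w  # an integrable random variable

def E(f, event=None):
    """E(f 1(event)) with respect to P; event=None means all of Omega."""
    ev = set(omega) if event is None else set(event)
    return sum(f(w) * P[w] for w in omega if w in ev)

def cond_exp(w):
    """X' = sum_i E(X | B_i) I(B_i): constant on each block of the partition."""
    for B in blocks:
        if w in B:
            return E(X, B) / E(lambda _: 1, B)  # E(X 1(B)) / P(B)
    raise ValueError("w not covered by the partition")

# Property (iii): E(X I(G)) = E(X' I(G)) for G in sigma(B_i), e.g. G = B_0 ∪ B_2.
G = set(blocks[0]) | set(blocks[2])
assert E(X, G) == E(cond_exp, G)
# Property (ii): E|X'| <= E|X| (here everything is non-negative, so exact Fractions).
assert E(cond_exp) <= E(X)
```

The `Fraction` arithmetic makes the identities exact rather than approximate.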
1.2 Existence and Uniqueness
Definition 1.9. A ∈ F , A happens almost surely (a.s.) if P( A) = 1.
Theorem 1.10 (Monotone Convergence Theorem). If Xn ≥ 0 is a sequence of random variables and Xn ↑ X a.s. as n → ∞, then
E( Xn ) ↑ E( X ) (1.5)
as n → ∞.
Theorem 1.11 (Dominated Convergence Theorem). If ( Xn ) is a sequence of random variables such that | Xn | ≤ Y for Y an integrable random variable, and Xn → X a.s., then E( Xn ) → E( X ).
Definition 1.12. For p ∈ [1, ∞) and f a measurable function,
‖ f ‖p = E[| f |p ]^{1/p}, (1.6)
‖ f ‖∞ = inf{λ : | f | ≤ λ a.e.}. (1.7)
Definition 1.13.
L p = L p (Ω, F , P) = { f : k f k p < ∞}
Formally, L p is the collection of equivalence classes where two functions are equivalent if they are equal a.e. We will just represent an
element of L p by a function, but remember that equality is a.e.
Theorem 1.14. The space ( L2 , ‖ · ‖2 ) is a Hilbert space with ⟨U, V⟩ = E(UV ). Suppose H is a closed subspace; then for all f ∈ L2 there exists a unique g ∈ H such that ‖ f − g‖2 = inf{‖ f − h‖2 : h ∈ H} and ⟨ f − g, h⟩ = 0 for all h ∈ H. We call g the orthogonal projection of f onto H.
Theorem 1.15. Let (Ω, F , P) be the underlying probability space, let X be an integrable random variable, and let G ⊂ F be a sub-σ-algebra. Then there exists a random variable Y such that
(i) Y is G -measurable,
(ii) Y is integrable and, for all A ∈ G ,
E( X I( A)) = E(Y I( A)). (1.8)
Moreover, if Y′ also satisfies these properties, then Y = Y′ a.s.
Remark 1.16. Y is called a version of the conditional expectation of X given G and we write Y = E( X |G); when G = σ ( Z ) we also write Y = E( X | Z ).
Remark 1.17. Condition (ii) could be replaced by the following: for all G -measurable, bounded random variables Z,
E( XZ ) = E(YZ ). (1.9)
Proof. Uniqueness: let Y′ satisfy (i) and (ii). Consider A = {Y′ − Y > 0}, which is G -measurable. From (ii),
E((Y′ − Y )I( A)) = E( X I( A)) − E( X I( A)) = 0,
and hence P(Y′ − Y > 0) = 0, which implies Y′ ≤ Y a.s. Similarly, Y′ ≥ Y a.s.
Existence: complete the following three steps.
(i) Suppose X ∈ L2 (Ω, F , P), a Hilbert space with ⟨U, V⟩ = E(UV ). The space L2 (Ω, G , P) is a closed subspace: if Xn → X in L2 , then Xn → X in probability, so there exists a subsequence Xnk → X a.s., and X′ = lim sup Xnk is G -measurable with X = X′ a.s. (1.10)
We can write
L2 (Ω, F , P) = L2 (Ω, G , P) ⊕ L2 (Ω, G , P)⊥ , X = Y + Z.
Set E( X |G) = Y; Y is G -measurable, and for A ∈ G ,
E( X I( A)) = E(Y I( A)) + E( Z I( A)) = E(Y I( A)),
since E( Z I( A)) = 0 ( I( A) ∈ L2 (Ω, G , P) and Z ⊥ L2 (Ω, G , P)).
(ii) If X ≥ 0 then Y ≥ 0 a.s. Consider A = {Y < 0}, then
0 ≤ E( XI( A)) = E(YI( A)) ≤ 0
(1.11)
Thus P( A) = 0 ⇒ Y ≥ 0 a.s.
Let X ≥ 0. Set 0 ≤ Xn = min( X, n) ≤ n, so Xn ∈ L2 for all n. Write Yn = E( Xn |G); then Yn ≥ 0 a.s. and Yn is increasing a.s. Set Y = lim sup Yn . So Y is G -measurable. We will show Y = E( X |G)
a.s. For all A ∈ G , we need to check E( XI( A)) = E(YI( A)) . We
know that E( Xn I( A)) = E(Yn I( A)), and Yn ↑ Y a.s. Thus, by
monotone convergence theorem, E( XI( A)) = E(YI( A)).
If X is integrable, setting A = Ω, we have Y is integrable.
(iii) X is a general random variable, not necessarily in L2 or ≥ 0. Then we have X = X+ − X− . We define E( X |G) = E( X+ |G) − E( X− |G). This satisfies (i) and (ii).
Remark 1.18. If X ≥ 0, we can always define Y = E( X |G) a.s. The
integrability condition of Y may not be satisfied.
Definition 1.19. Let G0 , G1 , . . . be sub-σ-algebras of F . They are called independent if for all distinct indices i1 , . . . , in and all Aik ∈ Gik ,
P( Ai1 ∩ · · · ∩ Ain ) = ∏_{k=1}^n P( Aik ). (1.12)
Theorem 1.20. (i) If X ≥ 0, then E( X |G) ≥ 0 a.s.
(ii) E(E( X |G)) = E( X ) (take A = Ω).
(iii) If X is G -measurable, then E( X |G) = X a.s.
(iv) If X is independent of G , then E( X |G) = E( X ).
Theorem 1.21 (Fatou's lemma). If Xn ≥ 0 for all n, then
E(lim inf Xn ) ≤ lim inf E( Xn ). (1.13)
Theorem 1.22 (Conditional Monotone Convergence). Let Xn ≥ 0,
Xn ↑ X a.s. Then
E( Xn |G) ↑ E( X |G) a.s.
(1.14)
Proof. Set Yn = E( Xn |G). Then Yn ≥ 0 and Yn is increasing a.s. Set Y = lim sup Yn ; then Y is G -measurable. For A ∈ G , monotone convergence applied to both sides of E( Xn I( A)) = E(Yn I( A)) gives E( X I( A)) = E(Y I( A)), so Y = E( X |G) a.s.
Theorem 1.23 (Conditional Fatou's Lemma). If Xn ≥ 0 for all n, then
E(lim inf Xn |G) ≤ lim inf E( Xn |G) a.s. (1.15)
Proof. Let X denote the limit inferior of the Xn . For every natural
number k define pointwise the random variable Yk = infn≥k Xn . Then
the sequence Y1 , Y2 , . . . is increasing and converges pointwise to X.
For k ≤ n, we have Yk ≤ Xn , so that
E(Yk |G) ≤ E( Xn |G) a.s
(1.16)
by the monotonicity of conditional expectation, hence
E(Yk |G) ≤ inf E( Xn |G) a.s.
n≥k
(1.17)
because the countable union of the exceptional sets of probability
zero is again a null set. Using the definition of X, its representation
as pointwise limit of the Yk , the monotone convergence theorem for
conditional expectations, the last inequality, and the definition of the
limit inferior, it follows that almost surely
E(lim inf_{n→∞} Xn |G) = E( X |G) (1.18)
= E( lim_{k→∞} Yk |G) (1.19)
= lim_{k→∞} E(Yk |G) (1.20)
≤ lim_{k→∞} inf_{n≥k} E( Xn |G) (1.21)
= lim inf_{n→∞} E( Xn |G). (1.22)
Theorem 1.24 (Conditional dominated convergence). TODO
1.3 Conditional Jensen's Inequalities
Let φ be a convex function and X an integrable random variable such that φ( X ) is integrable or φ is non-negative. Suppose G ⊂ F is a σ-algebra. Then
E(φ( X )|G) ≥ φ(E( X |G)) (1.23)
almost surely. In particular, if 1 ≤ p < ∞, then
‖E( X |G)‖p ≤ ‖ X ‖p . (1.24)
Proof. Every convex function can be written as φ( x ) = sup_{i∈N} ( ai x + bi ) with ai , bi ∈ R. Then for every i,
E(φ( X )|G) ≥ ai E( X |G) + bi ,
so
E(φ( X )|G) ≥ sup_{i∈N} ( ai E( X |G) + bi ) = φ(E( X |G)).
The second part follows from
‖E( X |G)‖p^p = E(|E( X |G)|^p ) ≤ E(E(| X |^p |G)) = E(| X |^p ) = ‖ X ‖p^p . (1.25)
Proposition 1.25 (Tower Property). Let X ∈ L1 and let H ⊂ G ⊂ F be sub-σ-algebras. Then
E(E( X |G) |H) = E( X |H)
(1.26)
almost surely.
Proof. Clearly E( X |H) is H-measurable. Let A ∈ H. Then
E(E( X |H) I( A)) = E( XI( A)) = E(E( X |G) I( A))
(1.27)
Proposition 1.26. Let X ∈ L1 , G ⊂ F be sub-σ-algebras. Suppose that Y
is bounded, G -measurable. Then
E( XY |G) = YE( X |G)
(1.28)
almost surely.
Proof. Clearly YE( X |G) is G -measurable. Let A ∈ G . Then
E(YE( X |G) I( A)) = E(E( X |G) (Y I( A))) = E( XY I( A)), (1.29)
since Y I( A) is G -measurable and bounded.
Definition 1.27. A collection A of subsets of Ω is called a π-system if
for all A, B ∈ A, then A ∩ B ∈ A.
Proposition 1.28 (Uniqueness of extension). Suppose that ξ is a σ-algebra on E. Let µ1 , µ2 be two measures on ( E, ξ ) that agree on a π-system generating ξ, with µ1 ( E) = µ2 ( E) < ∞. Then µ1 = µ2 everywhere on ξ.
Theorem 1.29. Let X ∈ L1 , G , H ⊂ F two sub-σ-algebras. If σ ( X, G) is
independent of H, then
E( X |σ (G , H)) = E( X |G)
almost surely.
(1.30)
Proof. Take A ∈ G , B ∈ H.
E(E( X |G) I( A) I( B)) = P( B) E(E( X |G) I( A))
= P( B) E( XI( A))
= E( XI( A) I( B))
= E(E( X |σ(G , H)) I( A ∩ B))
Assume X ≥ 0; the general case follows by writing X = X+ − X− . For F ∈ σ (G , H), define
µ( F ) = E(E( X |G) I( F )), ν( F ) = E(E( X |σ (G , H)) I( F )).
Set A = { A ∩ B : A ∈ G , B ∈ H}; then A is a π-system, and the computation above shows that µ and ν agree on A. These are two measures that agree on the π-system A, and µ(Ω) = E(E( X |G)) = E( X ) = ν(Ω) < ∞, since X is integrable. Note that A generates σ (G , H). So, by the uniqueness of extension theorem, µ and ν agree everywhere on σ (G , H).
Remark 1.30. If we only had X independent of H and G independent of H, the conclusion can fail. For example, consider coin tosses X, Y, independent and taking values 0, 1 with probability 1/2 each, and Z = I( X = Y ).
1.4 Product Measures and Fubini's Theorem
Definition 1.31. A measure space ( E, ξ, µ) is called σ-finite if there
exists sets (Sn )n with ∪Sn = E and µ(Sn ) < ∞ for all n.
Let ( E1 , ξ 1 , µ1 ) and ( E2 , ξ 2 , µ2 ) be two σ-finite measure spaces,
with A = { A1 × A2 : A1 ∈ ξ 1 , A2 ∈ ξ 2 } a π-system of subsets of
E = E1 × E2 . Define ξ = ξ 1 ⊗ ξ 2 = σ ( A).
Definition 1.32 (Product measure). Let ( E1 , ξ 1 , µ1 ) and ( E2 , ξ 2 , µ2 ) be
two σ-finite measure spaces. Then there exists a unique measure µ on
( E, ξ ) (µ = µ1 ⊗ µ2 ) satisfying
µ ( A1 × A2 ) = µ1 ( A1 ) µ2 ( A2 )
for all A1 ∈ ξ 1 , A2 ∈ ξ 2 .
(1.31)
Theorem 1.33 (Fubini's Theorem). Let ( E1 , ξ 1 , µ1 ) and ( E2 , ξ 2 , µ2 ) be σ-finite measure spaces, and let f ≥ 0 be ξ-measurable. Then
µ( f ) = ∫_{E1} ( ∫_{E2} f ( x1 , x2 ) µ2 (dx2 ) ) µ1 (dx1 ). (1.32)
If f is integrable, then x2 ↦ f ( x1 , x2 ) is µ2 -integrable for µ1 -almost all x1 . Moreover, x1 ↦ ∫_{E2} f ( x1 , x2 ) µ2 (dx2 ) is µ1 -integrable, and µ( f ) is given by (1.32).
1.5 Examples of Conditional Expectation
Definition 1.34. A random vector ( X1 , X2 , . . . , Xn ) ∈ Rn is called a
Gaussian random vector if and only if for all a1 , . . . , an ∈ R,
a 1 X1 + · · · + a n X n
(1.33)
is a Gaussian random variable.
( Xt )t≥0 is called a Gaussian process if for all 0 ≤ t1 ≤ t2 ≤ · · · ≤
tn , the vector Xt1 , . . . , Xtn is a Gaussian random vector.
Example 1.35 (Gaussian case). Let ( X, Y ) be a Gaussian vector in R2 . We want to calculate
E( X |Y ) = E( X |σ (Y )) = X′ (1.34)
where X′ = f (Y ) with f a Borel function. Let us try f an affine function, X′ = aY + b, with a, b ∈ R to be determined. Note that E( X ) = E( X′ ) and E(( X − X′ )Y ) = 0, i.e. Cov( X − X′ , Y ) = 0, by the properties of conditional expectation. Then we have
aE(Y ) + b = E( X ), Cov( X, Y ) = a Var(Y ). (1.35)
TODO - continue inference
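Solving (1.35) gives a = Cov( X, Y )/ Var(Y ) and b = E( X ) − aE(Y ). A Monte Carlo sketch checking these formulas (illustrative, not from the notes; the sampled pair with X = 2Y + noise is an assumed example, so the true conditional expectation is E( X |Y ) = 2Y):

```python
import random

random.seed(0)

# Sample a Gaussian vector (X, Y): Y ~ N(0,1), X = 2*Y + noise, so Cov(X,Y) = 2, Var(Y) = 1.
n = 200_000
ys = [random.gauss(0.0, 1.0) for _ in range(n)]
xs = [2.0 * y + random.gauss(0.0, 0.5) for y in ys]

mx = sum(xs) / n
my = sum(ys) / n
var_y = sum((y - my) ** 2 for y in ys) / n
cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Solving E(X) = a E(Y) + b and Cov(X, Y) = a Var(Y):
a = cov_xy / var_y
b = mx - a * my

# For this construction E(X | Y) = 2Y, so a should be near 2 and b near 0.
assert abs(a - 2.0) < 0.05
assert abs(b) < 0.05
```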
1.6 Notation for Example Sheet 1
(i) G ∨ H = σ ( G, H ).
(ii) Let X, Y be two random variables taking values in R with joint
density f X,Y ( x, y) and h : R → R be a Borel function such that
h( X ) is integrable. We want to calculate
E(h( X )|Y ) = E(h( X )|σ (Y )). (1.36)
Let g be bounded and measurable. Then
E(h( X ) g(Y )) = ∫∫ h( x ) g(y) f X,Y ( x, y) dx dy (1.37)
= ∫∫ h( x ) g(y) ( f X,Y ( x, y)/ f Y (y)) f Y (y) dx dy (1.38)
= ∫ ( ∫ h( x ) ( f X,Y ( x, y)/ f Y (y)) dx ) g(y) f Y (y) dy, (1.39)
with the convention 0/0 = 0.
Set φ(y) = ∫ h( x ) ( f X,Y ( x, y)/ f Y (y)) dx if f Y (y) > 0, and 0 otherwise. Then we have
E(h( X )|Y ) = φ(Y ) (1.40)
almost surely, and
E(h( X )|Y ) = ∫ h( x ) ν(Y, dx ) (1.41)
with ν(y, dx ) = ( f X,Y ( x, y)/ f Y (y)) I( f Y (y) > 0) dx = f X |Y ( x |y) dx.
ν(y, dx ) is called the conditional distribution of X given Y = y and
f X |Y ( x |y) is the conditional density of X given Y = y.
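The formula for φ can be checked numerically on a case with a known answer. In the sketch below (illustrative only; the densities and the midpoint-rule grid are my own choices) X and Y are independent on [0, 1], so E(h( X )|Y ) should be the constant E(h( X )):

```python
# Midpoint-rule check of phi(y) = ∫ h(x) f_{X,Y}(x, y)/f_Y(y) dx on a case where the
# answer is known: X, Y independent on [0,1] with f_X(x) = 2x, f_Y(y) = 1, h(x) = x.
# Independence gives E(h(X) | Y) = E(h(X)) = ∫ x · 2x dx = 2/3 for every y.
N = 2000
grid = [(i + 0.5) / N for i in range(N)]  # midpoints for numerical integration

def f_joint(x, y):
    return 2.0 * x * 1.0  # f_{X,Y}(x, y) = f_X(x) f_Y(y) by independence

def phi(y, h=lambda x: x):
    f_Y = sum(f_joint(x, y) for x in grid) / N          # marginal density of Y at y
    num = sum(h(x) * f_joint(x, y) for x in grid) / N   # ∫ h(x) f_{X,Y}(x, y) dx
    return num / f_Y if f_Y > 0 else 0.0                # convention 0/0 = 0

for y in (0.1, 0.5, 0.9):
    assert abs(phi(y) - 2.0 / 3.0) < 1e-3
```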
2 Discrete Time Martingales
Let (Ω, F , P) be a probability space and ( E, ξ ) a measurable space.
Usually E = R, Rd , C. For us, E = R. A sequence X = ( Xn )n≥0 of
random variables taking values in E is called a stochastic process.
A filtration is an increasing family (Fn )n≥0 of sub-σ-algebras of F , so Fn ⊆ Fn+1 .
Intuitively, Fn is the information available to us at time n. To every stochastic process X we associate a filtration called the natural
filtration
(FnX )n≥0 , FnX = σ( Xk , k ≤ n)
(2.1)
A stochastic process X is called adapted to (Fn )n≥0 if Xn is Fn measurable for all n.
A stochastic process X is called integrable if Xn is integrable for all
n.
Definition 2.1. An adapted integrable process ( Xn )n≥0 taking values
in R is called a
(i) martingale if E( Xn |Fm ) = Xm for all n ≥ m.
(ii) super-martingale if E( Xn |Fm ) ≤ Xm for all n ≥ m.
(iii) sub-martingale if E( Xn |Fm ) ≥ Xm for all n ≥ m.
Remark 2.2. A (sub,super)-martingale with respect to a filtration Fn is
also a (sub, super)-martingale with respect to the natural filtration of Xn (by
the tower property)
Example 2.3. Suppose (ξ i ) are iid random variables with E(ξ i ) = 0. Set Xn = ∑_{i=1}^n ξ i . Then ( Xn ) is a martingale.
Example 2.4. As above, but let (ξ i ) be iid with E(ξ i ) = 1. Then Xn = ∏_{i=1}^n ξ i is a martingale.
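Both examples can be checked exhaustively on a finite tree of outcomes. The sketch below is illustrative (not part of the notes; the specific distributions ±1 and {0, 2} are my own choices): it verifies the one-step martingale property E( Xn+1 |Fn ) = Xn by averaging over the two equally likely continuations of each path.

```python
from itertools import product

# Exhaustively verify the martingale property of X_n = xi_1 + ... + xi_n for
# iid xi_i = ±1 with probability 1/2 each: conditionally on the first n tosses,
# the average of X_{n+1} over the two continuations equals X_n.
n = 6
for prefix in product([-1, 1], repeat=n):
    X_n = sum(prefix)
    cont = [X_n + xi for xi in (-1, 1)]   # the two continuations of the path
    assert sum(cont) / 2 == X_n           # E(X_{n+1} | F_n) = X_n

# Same check for the product martingale with E(xi) = 1, e.g. xi in {0, 2}:
for prefix in product([0, 2], repeat=n):
    M_n = 1
    for xi in prefix:
        M_n *= xi
    assert (M_n * 0 + M_n * 2) / 2 == M_n  # E(M_{n+1} | F_n) = M_n
```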
Definition 2.5. A random variable T : Ω → Z+ ∪ {∞} is called a stopping time if { T ≤ n} ∈ Fn for all n. Equivalently, { T = n} ∈ Fn for all n.
Example 2.6. (i) Constant times are trivial stopping times.
(ii) A ∈ B(R). Define TA = inf{n ≥ 0| Xn ∈ A}, with inf ∅ = ∞. Then
TA is a stopping time.
Proposition 2.7. Let S, T, ( Tn ) be stopping times on the filtered probability
space (Ω, F , (Fn ), P). Then S ∧ T, S ∨ T, infn Tn , lim infn Tn , lim supn Tn
are stopping times.
Notation. For T a stopping time, XT (ω ) = XT (ω ) (ω ). The stopped process X T is defined by XtT = XT ∧t . Also
F T = { A ∈ F : A ∩ { T ≤ t} ∈ F t for all t}.
Proposition 2.8. Let (Ω, F , (Fn ), P) be a filtered probability space and let X = ( Xn )n≥0 be adapted.
(i) S ≤ T, stopping times, then FS ⊆ F T
(ii) XT I( T < ∞) is F T -measurable.
(iii) T a stopping time, then X T is adapted
(iv) If X is integrable, then X T is integrable.
Proof of (ii). Let A ∈ ξ. We need to show that { XT I( T < ∞) ∈ A} ∈ F T :
{ XT I( T < ∞) ∈ A} ∩ { T ≤ t} = ∪s≤t ({ T = s} ∩ { Xs ∈ A}) ∈ Ft , (2.2)
since { T = s} ∈ Fs ⊆ Ft and { Xs ∈ A} ∈ Fs ⊆ Ft .
2.1 Optional Stopping
Theorem 2.9. Let X be a martingale. If T is a stopping time, then X T is also a martingale. In particular, E( XT ∧t ) = E( X0 ) for all t.
Proof. By the tower property, it is sufficient to check that E( XT ∧t |Ft−1 ) = XT ∧(t−1) :
E( XT ∧t |Ft−1 ) = E( ∑_{s=0}^{t−1} Xs I( T = s) |Ft−1 ) + E( Xt I( T > t − 1) |Ft−1 )
= ∑_{s=0}^{t−1} Xs I( T = s) + I( T > t − 1) Xt−1 = XT ∧(t−1) ,
since Xs I( T = s) is Fs ⊆ Ft−1 -measurable and { T > t − 1} ∈ Ft−1 . Since X T is a martingale, E( XT ∧t ) = E( X0 ).
Theorem 2.10. Let X be a martingale.
(i) If T is a stopping time, then X T is also a martingale, so in particular
E( XT ∧t ) = E( X0 ). (2.3)
(ii) If S ≤ T are bounded stopping times, then E( XT |FS ) = XS almost surely.
Proof. Let S ≤ T ≤ n. Then
XT = ( XT − XT −1 ) + ( XT −1 − XT −2 ) + · · · + ( XS+1 − XS ) + XS = XS + ∑_{k=0}^{n−1} ( Xk+1 − Xk )I(S ≤ k < T ).
Let A ∈ FS . Then
E( XT I( A)) = E( XS I( A)) + ∑_{k=0}^{n−1} E(( Xk+1 − Xk )I(S ≤ k < T ) I( A)) (2.4)
= E( XS I( A)), (2.5)
since {S ≤ k < T } ∩ A ∈ Fk and E( Xk+1 − Xk |Fk ) = 0.
Remark 2.11. The optional stopping theorem also holds for super/sub-martingales, with the corresponding inequalities in the statement.
Example 2.12. Suppose that (ξ i )i are iid random variables with
P(ξ i = 1) = P(ξ i = −1) = 1/2. (2.6)
Set X0 = 0, Xn = ∑_{i=1}^n ξ i . This is a simple symmetric random walk on Z. Let T = inf{n ≥ 0 : Xn = 1}. Then P( T < ∞) = 1, but T is not bounded.
Proposition 2.13. If X is a positive supermartingale and T is a stopping
time which is finite almost surely (P( T < ∞) = 1), then
E ( X T ) ≤ E ( X0 )
(2.7)
Proof. By Fatou's lemma and optional stopping for supermartingales,
E( XT ) = E(lim inf_{t→∞} Xt∧T ) ≤ lim inf_{t→∞} E( Xt∧T ) ≤ E( X0 ). (2.8)

2.2 Hitting Probabilities for a Simple Symmetric Random Walk
Let (ξ i ) be iid, ±1 equally likely. Let X0 = 0, Xn = ∑_{i=1}^n ξ i . For all x ∈ Z let
Tx = inf{n ≥ 0 : Xn = x }, (2.9)
which is a stopping time. We want to compute the hitting probabilities P( T−a < Tb ) for a, b > 0. Set T = T−a ∧ Tb . Since | XT ∧t | ≤ max( a, b), optional stopping together with dominated convergence gives E( XT ) = E( X0 ) = 0 once we know T < ∞ a.s. Hence
E( XT ) = − a P( T−a < Tb ) + b P( Tb < T−a ) = 0, (2.10)
and thus we obtain
P( T−a < Tb ) = b/( a + b). (2.11)
It remains to check that T < ∞ a.s.; in fact E( T ) < ∞. We have P(ξ 1 = 1, . . . , ξ a+b = 1) = 2^{−( a+b)} , so each block of a + b consecutive steps takes the walk out of (− a, b) with probability at least 2^{−( a+b)} ; hence the number of blocks before T is dominated by a geometric random variable and E( T ) < ∞.
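The identity P( T−a < Tb ) = b/( a + b) can be checked by simulation. A Monte Carlo sketch (illustrative, not part of the notes; the parameters a = 2, b = 3 and the function name are my own choices):

```python
import random

random.seed(1)

def hit_minus_a_first(a, b):
    """Run one simple symmetric random walk from 0 until it hits -a or b."""
    x = 0
    while -a < x < b:
        x += random.choice((-1, 1))
    return x == -a

a, b, n = 2, 3, 40_000
est = sum(hit_minus_a_first(a, b) for _ in range(n)) / n
assert abs(est - b / (a + b)) < 0.02   # P(T_{-a} < T_b) = b/(a+b) = 0.6
```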
2.3 Martingale Convergence Theorem
Theorem 2.14. Let X = ( Xn )n≥0 be a (super-)martingale bounded in L1 , that is, supn≥0 E(| Xn |) < ∞. Then Xn converges as n → ∞ almost surely towards an a.s. finite limit X∞ ∈ L1 (F∞ ), where F∞ = σ (Fn , n ≥ 0). To prove it we will use Doob's trick, which counts up-crossings of intervals with rational endpoints.
Corollary 2.15. Let X be a positive supermartingale. Then it converges to
an almost surely finite limit as n → ∞.
Proof.
E(| Xn |) = E( Xn ) ≤ E( X0 ) < ∞
(2.12)
Proof. Let x = ( xn )n be a sequence of real numbers, and let a < b be two real numbers. Let T0 ( x ) = 0 and inductively for k ≥ 0,
Sk+1 ( x ) = inf{n ≥ Tk ( x ) : xn ≤ a}, Tk+1 ( x ) = inf{n ≥ Sk+1 ( x ) : xn ≥ b}, (2.13)
with the usual convention that inf ∅ = ∞.
Define Nn ([ a, b], x ) = sup{k ≥ 0 : Tk ( x ) ≤ n}, the number of up-crossings of the interval [ a, b] by the sequence x by time n. As
n → ∞, we have
Nn ([ a, b], x ) ↑ N ([ a, b], x ) = sup{k ≥ 0 : Tk ( x ) < ∞},
(2.14)
the total number of up-crossings of the interval [ a, b].
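The definitions of Sk , Tk and Nn translate directly into code. A small sketch (illustrative; the function name and the hand-checked test sequences are my own) counting completed up-crossings of [ a, b] by a finite sequence:

```python
def upcrossings(x, a, b):
    """N([a, b], x): completed up-crossings of [a, b] by the finite sequence x."""
    n, count, i = len(x), 0, 0
    while True:
        while i < n and x[i] > a:   # S_{k+1}: first time at or below a
            i += 1
        while i < n and x[i] < b:   # T_{k+1}: first subsequent time at or above b
            i += 1
        if i >= n:                  # ran out of terms: inf of the empty set is infinity
            return count
        count += 1                  # one more up-crossing completed

# x goes 0 -> 2 -> 0 -> 2 -> 0: two completed up-crossings of [0.5, 1.5].
assert upcrossings([0, 2, 0, 2, 0], 0.5, 1.5) == 2
# Never drops to a, so no up-crossing ever starts.
assert upcrossings([3, 4, 5], 0.5, 1.5) == 0
```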
Lemma 2.16. A sequence of reals x = ( xn )n converges in R̄ = R ∪ {±∞} if and only if N ([ a, b], x ) < ∞ for all rationals a < b.
Proof. Assume x converges. If for some a < b we had N ([ a, b], x ) = ∞, then lim infn xn ≤ a < b ≤ lim supn xn , which is a contradiction.
Conversely, suppose that x does not converge. Then lim infn xn < lim supn xn , and taking rationals a < b between these two numbers gives N ([ a, b], x ) = ∞, as required.
Theorem 2.17 (Doob's up-crossing inequality). Let X be a supermartingale and let a < b be two real numbers. Then for all n ≥ 0,
(b − a)E( Nn ([ a, b], X )) ≤ E(( Xn − a)− ). (2.15)
Proof. For all k,
XTk − XSk ≥ b − a. (2.16)

2.4 Uniform Integrability
Theorem 2.18. Suppose X ∈ L1 . Then the collection of random variables
{E( X |G) : G ⊆ F a sub-σ-algebra} (2.17)
is uniformly integrable.
Proof. Since X ∈ L1 , for all ε > 0 there exists δ > 0 such that if A ∈ F and P( A) < δ, then E(| X |I( A)) ≤ ε.
Set Y = E( X |G). Then E(|Y |) ≤ E(| X |). Choose λ < ∞ such that E(| X |) ≤ λδ. Then
P(|Y | ≥ λ) ≤ E(|Y |)/λ ≤ δ (2.18)
by Markov's inequality. Then
E(|Y |I(|Y | ≥ λ)) ≤ E(E(| X | |G) I(|Y | ≥ λ)) (2.19)
= E(| X |I(|Y | ≥ λ)) (2.20)
≤ ε. (2.21)
Definition 2.19. A process X = ( Xn )n≥0 is called a uniformly integrable martingale if it is a martingale and the collection ( Xn ) is
uniformly integrable.
Theorem 2.20. Let X be a martingale. Then the following are equivalent.
(i) X is a uniformly integrable martingale.
(ii) X converges almost surely and in L1 to a limit X∞ as n → ∞.
(iii) There exists a random variable Z ∈ L1 such that Xn = E( Z |Fn ) almost
surely for all n ≥ 0.
Theorem 2.21 (Chapter 13 of Williams). Let Xn , X ∈ L1 for all n ≥ 0 and suppose that Xn → X a.s. as n → ∞. Then Xn converges to X in L1 if and only if ( Xn ) is uniformly integrable.
Proof. We proceed as follows.
(i ) ⇒ (ii ) Since X is uniformly integrable, it is bounded in L1 , and by the martingale convergence theorem Xn converges almost surely to a finite limit X∞ . Theorem 2.21 then gives L1 convergence.
(ii ) ⇒ (iii ) Set Z = X∞ . We need to show that Xn = E( Z |Fn ) almost surely for all n ≥ 0. For all m ≥ n, by the martingale property we have
‖ Xn − E( X∞ |Fn )‖1 = ‖E( Xm − X∞ |Fn )‖1 ≤ ‖ Xm − X∞ ‖1 → 0 (2.22)
as m → ∞.
(iii ) ⇒ (i ) E( Z |Fn ) is a martingale by the tower property of conditional
expectation. Uniform integrability follows from Theorem 2.18.
Remark 2.22. If X is UI, then X∞ = E( Z |F∞ ) a.s., where F∞ = σ (Fn , n ≥ 0).
Remark 2.23. If X is a super/sub-martingale UI, then it converges almost
surely and in L1 to a finite limit X∞ with E( X∞ |Fn ) (≥)(≤) Xn almost
surely.
Example 2.24. Let X1 , X2 , . . . be iid random variables with P( X = 0) = P( X = 2) = 1/2. Set Yn = X1 · · · Xn . Then (Yn ) is a martingale.
As E(Yn ) = 1 for all n, (Yn ) is bounded in L1 , and it converges almost surely to 0. But E(Yn ) = 1 for all n, and hence it does not converge in L1 .
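This example can be checked exactly by enumerating all 2^n outcomes; a sketch (illustrative, not from the notes), verifying that E(Yn ) = 1 for every n while P(Yn > 0) = 2^{−n}, so Yn → 0 a.s. without L1 convergence:

```python
from fractions import Fraction
from itertools import product

# Y_n = X_1 ... X_n with iid X_i in {0, 2}, each with probability 1/2.
for n in range(1, 10):
    total = Fraction(0)
    nonzero = 0
    for path in product([0, 2], repeat=n):
        y = 1
        for x in path:
            y *= x
        total += Fraction(y, 2 ** n)   # each path has probability 2^{-n}
        nonzero += (y > 0)             # only the all-2 path survives
    assert total == 1                                        # E(Y_n) = 1
    assert Fraction(nonzero, 2 ** n) == Fraction(1, 2 ** n)  # P(Y_n > 0) = 2^{-n}
```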
If X is a UI martingale and T is a stopping time, then we can unambiguously define
XT = ∑_{n=0}^{∞} Xn I( T = n) + X∞ I( T = ∞). (2.23)
Theorem 2.25 (Optional stopping for UI martingales). Let X be a UI
martingale and let S, T be stopping times with S ≤ T. Then
E( XT |FS ) = XS
(2.24)
almost surely.
Proof. We first show that E( X∞ |F T ) = XT almost surely for any stopping time T. First, check that XT ∈ L1 . Since | Xn | ≤ E(| X∞ | |Fn ), we have
E(| XT |) = ∑_{n=0}^{∞} E(| Xn |I( T = n)) + E(| X∞ |I( T = ∞)) (2.25)
≤ ∑_{n∈Z+ ∪{∞}} E(| X∞ |I( T = n)) (2.26)
= E(| X∞ |). (2.27)
Let B ∈ F T . Then
E(I( B) XT ) = ∑_{n∈Z+ ∪{∞}} E(I( B) I( T = n) Xn ) (2.28)
= ∑_{n∈Z+ ∪{∞}} E(I( B) I( T = n) X∞ ) (2.29)
= E(I( B) X∞ ), (2.30)
where for the second equality we used that E( X∞ |Fn ) = Xn almost
surely.
Clearly XT is F T -measurable, and hence E( X∞ |F T ) = XT almost surely. Using the tower property of conditional expectation, we have
for stopping times S ≤ T (as FS ⊆ F T ),
E( XT |FS ) = E(E( X∞ |F T ) |FS )
(2.31)
= E( X∞ |FS )
(2.32)
= XS
(2.33)
almost surely.
2.5 Backwards Martingales
Let · · · ⊆ G−2 ⊆ G−1 ⊆ G0 be a decreasing sequence of sub-σ-algebras of F .
[Fill in proof from lecture notes]
2.6 Applications of Martingales
Theorem 2.26 (Kolmogorov's 0-1 law). Let ( Xi )i≥1 be a sequence of iid random variables. Let Fn = σ ( Xk , k ≥ n) and F∞ = ∩n≥0 Fn . Then F∞ is trivial, that is, every A ∈ F∞ has probability P( A) ∈ {0, 1}.
Proof. Let Gn = σ ( Xk , k ≤ n) and A ∈ F∞ . Since Gn is independent of Fn+1 , we have that
E(I( A) |Gn ) = P( A). (2.34)
The convergence theorem for UI martingales [link to correct theorem] gives that P( A) = E(I( A) |Gn ) converges to E(I( A) |G∞ ) as n → ∞, where G∞ = σ (Gn , n ≥ 0). Since F∞ ⊆ G∞ , we deduce that E(I( A) |G∞ ) = I( A), so P( A) = I( A) almost surely, and therefore P( A) ∈ {0, 1}.
Theorem 2.27 (Strong law of large numbers). Let ( Xi )i≥1 be a sequence of iid random variables in L1 with µ = E( Xi ). Let Sn = ∑_{i=1}^n Xi and S0 = 0. Then Sn /n → µ as n → ∞ almost surely and in L1 .
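The strong law can be illustrated numerically. A minimal Monte Carlo sketch (illustrative, not from the notes; Uniform(0, 2) samples are an assumed example with µ = 1):

```python
import random

random.seed(2)

# S_n / n for iid Uniform(0, 2) samples (mu = 1): the running average settles near mu.
n = 100_000
s = 0.0
for _ in range(n):
    s += random.uniform(0.0, 2.0)
assert abs(s / n - 1.0) < 0.01
```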
Proof.
Theorem 2.28 (Kakutani's product martingale theorem). Let ( Xn )n≥0 be a sequence of independent non-negative random variables of mean 1. Let M0 = 1, Mn = ∏_{i=1}^n Xi for n ∈ N. Then ( Mn )n≥0 is a non-negative martingale and Mn → M∞ a.s. as n → ∞ for some random variable M∞ . Set an = E(√Xn ); then an ∈ (0, 1]. Moreover,
(i) if ∏n an > 0, then Mn → M∞ in L1 and E( M∞ ) = 1;
(ii) if ∏n an = 0, then M∞ = 0 almost surely.
[fill in, this is somewhat involved]
Proof.
fill in
2.6.1 Martingale proof of the Radon-Nikodym theorem
Let P, Q be two probability measures on the measurable space (Ω, F ). Assume that F is countably generated, that is, there exists a collection of sets ( Fn )n∈N such that F = σ ( Fn , n ∈ N). Then the following are equivalent.
(i) P( A) = 0 ⇒ Q( A) = 0 for all A ∈ F . That is, Q is absolutely continuous with respect to P, and we write Q ≪ P.
(ii) For all ε > 0, there exists δ > 0 such that P( A) ≤ δ ⇒ Q( A) ≤ ε.
(iii) There exists a non-negative random variable X such that
Q( A) = EP ( X I( A)) (2.35)
for all A ∈ F .
Proof. (i ) ⇒ (ii ). If (ii ) does not hold, then there exists ε > 0 such that for all n ≥ 1 there exists a set An with P( An ) ≤ 1/n2 and Q( An ) ≥ ε. By Borel-Cantelli, we get that P( An i.o.) = 0. Therefore from (i ) we get that Q( An i.o.) = 0. But
Q( An i.o.) = Q(∩n ∪k≥n Ak ) = lim_{n→∞} Q(∪k≥n Ak ) ≥ ε, (2.36)
which is a contradiction.
(ii ) ⇒ (iii ). Consider the filtration Fn = σ ( Fk , k ≤ n). Let
An = { H1 ∩ · · · ∩ Hn : Hi = Fi or Fic }; (2.37)
then it is easy to see that Fn = σ (An ). Note also that sets in An are disjoint.
continue proof
3 Stochastic Processes in Continuous Time
Our setting is a probability space (Ω, F , P), with time parameter t ∈ J ⊆ R+ = [0, ∞).
Definition 3.1. A filtration on (Ω, F , P) is an increasing collection of σ-algebras (Ft )t∈ J satisfying Fs ⊆ Ft for s ≤ t. A stochastic process in continuous time is a collection of random variables ( Xt )t∈ J on Ω.
4 Bibliography