Introduction to Stochastic Processes
August 30, 2006
Contents
1 Measure space and random variables
2 Integration, Expectation and Independence
3 The art of conditioning
4 Martingales
5 Martingale convergence problems
6 Continuous time processes: the Wiener process or Brownian motion
7 Diffusions and Ito processes
Abstract
These notes have a two-fold use: they contain both the material (albeit slightly re-shuffled) of the course on this topic taught by the authors in Fall 2003, as well as extra notes where we feel that the book ‘Basic Stochastic Processes’ is slightly too ephemeral.
1 Measure space and random variables
Definition 1.1 A probability space is a triple (Ω, F, P) with the following properties:
• The sample space of outcomes Ω is a non-empty set;
• the set of observable events F is a σ-algebra over Ω. This means that F is a collection of
subsets of Ω with the following properties:
i) Ω ∈ F;
ii) B ∈ F ⇒ Ω \ B ∈ F;
iii) if (Bn)n∈N is a sequence of events in F, then ∪∞n=1 Bn ∈ F.
F can be interpreted as the amount of information of Ω that can be observed. The smaller F,
the less information we have of Ω.
• P is a probability measure on (Ω, F), i.e. P : F → [0, 1] with the properties
i) P{Ω} = 1;
ii) for (Bn)n∈N a sequence of mutually disjoint events in F, i.e. Bi ∩ Bj = ∅ for i ≠ j, one has P{∪∞i=1 Bi} = Σ∞i=1 P{Bi} (σ-additivity).
σ-algebras
Problem 1.1 Check that B1, B2, . . . ∈ F implies ∩∞i=1 Bi ∈ F, i.e. the intersection of countably many elements of F belongs to F.
The Borel-σ-algebra B(Rd ) over Ω = Rd is the intersection of all σ-algebras containing the open sets
in Rd . It is the smallest σ-algebra containing all open sets in Rd .
Problem 1.2 Show that all one-point sets {x}, x ∈ R, belong to B(R). Show that Q belongs to B(R).
The σ-algebra σ(A) generated by a subset A ⊆ P(Ω) is the intersection of all σ-algebras containing A:
σ(A) := ∩{B : B is a σ-algebra over Ω with A ⊆ B}.
Then B(R) is the σ-algebra generated by e.g. the open intervals (−a, b), a, b ∈ Q.
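On a finite Ω the generated σ-algebra can even be computed by brute force, by repeatedly closing the generating family under complements and (pairwise, hence finite) unions until nothing new appears. A minimal sketch; the function name and the small examples are ours, not from BSP:

```python
def generate_sigma_algebra(omega, generators):
    """Close a family of subsets of a finite omega under complement and union;
    on a finite space this closure is exactly sigma(A)."""
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        for a in list(sigma):
            for b in list(sigma):
                for new in (omega - a, a | b):   # complement and union
                    if new not in sigma:
                        sigma.add(new)
                        changed = True
    return sigma

# sigma({{1},{2}}) over {1,2,3} is the full power set (8 sets), while
# sigma({{1,2}}) consists of only the 4 sets {}, {1,2}, {3}, and {1,2,3}.
full = generate_sigma_algebra({1, 2, 3}, [{1}, {2}])
small = generate_sigma_algebra({1, 2, 3}, [{1, 2}])
```

This also illustrates Problem 1.3: for a partition of Ω, the generated σ-algebra consists of all unions of partition elements.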
Problem 1.3 Let Ω = Z+ . Suppose that A = {{i} | i ∈ Z+ } is the collection of all one-point sets.
Determine the minimal σ-algebra containing A.
Problem 1.4 Let V ⊂ N. Let V be the class of subsets V for which the ‘(Cesàro) density’
γ(V) = limn→∞ #(V ∩ {1, . . . , n})/n
exists. Give an example of sets V, W ∈ V for which V ∩ W ∉ V. Hence, V is not a σ-algebra.
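The notion of density can be explored numerically. A sketch (names are ours): the even numbers have density 1/2, while for the union of dyadic blocks [2^(2j), 2^(2j+1)) the partial densities oscillate between roughly 1/3 and 2/3, so the limit defining γ fails to exist:

```python
def density_upto(indicator, n):
    """The partial density #(V ∩ {1,...,n}) / n, with V given by an indicator."""
    return sum(1 for k in range(1, n + 1) if indicator(k)) / n

# the even numbers: partial densities converge to 1/2
d_even = density_upto(lambda k: k % 2 == 0, 10_000)

# union of blocks [2^(2j), 2^(2j+1)): k belongs iff floor(log2 k) is even
blocks = lambda k: (k.bit_length() - 1) % 2 == 0
lo = density_upto(blocks, 2 ** 10)   # just after an excluded block: near 1/3
hi = density_upto(blocks, 2 ** 11)   # just after an included block: near 2/3
```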
Problem 1.5 Let Ω = {0, 1}Z+, i.e. Ω = {(ω1, ω2, . . .), ωn ∈ {0, 1}, n = 1, 2, . . .}. Define
F = σ({ω : ωn = k}, n ∈ Z+, k ∈ {0, 1}).
Describe F in words. Show that F contains the following sets: (i) An = {ω : ωi = 0, i > n}; (ii) {ω : Σ∞i=1 ωi < ∞}; (iii) {ω : Σ∞i=1 ωi 2−i < 1/3}; (iv) {ω : limn→∞ Σni=1 ωi/n = 1/2}.
Probability measure
A statement S about points ω ∈ Ω is said to hold almost everywhere (a.e.) if
S = {ω | S(ω) is true} ∈ F, and P{S} = 1.
As an example of a simple probability space, take Ω = {±1}n, F = P(Ω) (power set or collection of all subsets), and P the Laplace measure on Ω, i.e.
P{B} = #(B)/#(Ω).
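This finite probability space is small enough to enumerate directly. A sketch (names are ours), using exact fractions so that P{B} = #(B)/#(Ω) comes out exactly:

```python
from itertools import product
from fractions import Fraction

n = 3
omega = list(product((-1, 1), repeat=n))     # the sample space {±1}^n, here n = 3

def P(event):
    """Laplace measure P{B} = #(B)/#(Omega), with events given as predicates."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

p_majority = P(lambda w: sum(w) > 0)         # more +1's than -1's: 4/8 = 1/2
```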
σ-algebras are complicated objects. It is often easier to work with π-systems.
A collection I of subsets of Ω is called a π-system if it is invariant under intersection: I1, I2 ∈ I ⇒ I1 ∩ I2 ∈ I.
Lemma 1.1 Let µ1 , µ2 be two probability measures on (Ω, σ(I)), such that µ1 = µ2 on I. Then
µ1 = µ2 on σ(I). That is, if two probability measures agree on a π-system, then they agree on the
σ-algebra generated by the π-system.
Problem 1.6 Give a π-system I, such that σ(I) = B([0, 1]).
Let Ω = [0, 1] and F = B([0, 1]) the Borel sets on [0, 1]. The Lebesgue measure P = λ ‘measures’ the length of an interval:
λ{(a, b]} = b − a.
It is not trivial to prove that λ can be extended to a probability measure on ([0, 1], B([0, 1])).
Let (Ω, F, P) be a probability space and let {An }n∈N be a sequence of events (An ∈ F, n = 1, . . .).
Then
lim supn→∞ An := ∩m ∪n≥m An = (An i.o.),
where i.o. = infinitely often. Explanation: x ∈ lim supn An iff x ∈ ∪n≥m An for all m. Thus x ∈ lim supn An iff for all m there exists n ≥ m such that x ∈ An. Similarly
lim infn→∞ An := ∪m ∩n≥m An = (An eventually).
Then x ∈ lim infn An iff there exists m such that x ∈ ∩n≥m An. That is, x ∈ lim infn An iff x belongs to all An except at most finitely many.
Problem 1.7 Prove that lim inf n→∞ An ⊂ lim supn→∞ An .
The notation An ↑ A means: An ⊂ An+1 , n ∈ N, A = ∪n An ; An ↓ A means that An ⊃ An+1 ,
A = ∩n An .
Lemma 1.2 (Monotone convergence of the measure of a set) i) An ↑ A implies P{An} ↑ P{A};
ii) An ↓ A implies P{An} ↓ P{A}.
Problem 1.8 Prove this lemma; see the hint on BSP p. 3. (BSP = Basic Stochastic Processes)
Note that for (ii) it is crucial that we consider probability measures. In the case of a general measure µ, (ii) does not necessarily hold when µ(Ω) = ∞. An example: the Lebesgue measure on R with An = (n, ∞). Then µ(An) = ∞, while A = ∩n An = ∅ and µ(∅) = 0.
Lemma 1.3 (Fatou Lemma for sets) i) P{lim infn→∞ An} ≤ lim infn→∞ P{An};
ii) P{lim supn→∞ An} ≥ lim supn→∞ P{An}.
In this case, (ii) requires finiteness of the measure at play.
Problem 1.9 Give an example where (ii) does not hold.
Proof of the Fatou Lemma. We prove (ii). Let Gm = ∪n≥m An; then Gm ↓ G = lim supn→∞ An (why?). Hence P{Gm} ↓ P{G}. Since P{Gm} ≥ P{An} for n ≥ m, we have that P{Gm} ≥ supn≥m P{An}. Hence
P{G} = ↓limm→∞ P{Gm} ≥ ↓limm→∞ supn≥m P{An} = lim supn→∞ P{An}.
QED
Problem 1.10 Prove statement (i) of the Fatou Lemma.
Lemma 1.4 (First Borel-Cantelli Lemma) Suppose that Σ∞n=1 P{An} < ∞. Then
P{lim supn→∞ An} = P{An i.o.} = 0.
Applications of this lemma come later, after introducing the notion of independence.
Random variables What functions on a probability space (Ω, F, P) are consistent with the σ-algebra F? These are the measurable functions.
Definition 1.2 A map X : Ω → R is called (F-)measurable if X−1(B) ∈ F for all B ∈ B(R).
In other words:
{X ∈ B} := {ω : X(ω) ∈ B} = X −1 (B)
is an observable event for all B ∈ B(R). In the probabilistic context a measurable, real-valued
function is called a random variable. For the present we stick to speaking of measurable functions.
If Ω = Rk and F = B(Rk ), then we call X a Borel-function.
Problem 1.11 Let Ω = [0, 1] and let A ⊊ Ω, A ≠ ∅. Determine the minimal σ-algebra F containing A. Classify all (Ω, F)-measurable functions.
The building blocks of these functions are the elementary functions: let A1, . . . , An ∈ F be disjoint (Aj ∩ Ai = ∅, j ≠ i) and let a1, . . . , an ∈ R. Then
f(ω) = Σni=1 ai 1{Ai}(ω)
is an elementary function or a simple function. Here 1{Ai} is the indicator function of Ai, i.e.
1{Ai}(ω) = 1 if ω ∈ Ai, and 0 otherwise.
Problem 1.12 Show that an elementary function is measurable.
Problem 1.13 Let Ω = R, and F = B(R). Show that the function X : R → R, defined by
X(ω) = 1 if ω ∈ Q, and 0 if ω ∈ R \ Q,
is an elementary function.
In order to show that limits of elementary functions are measurable, we need the following elementary
results on measurability.
Lemma 1.5 i) f−1 preserves set operations:
f−1(∪α Aα) = ∪α f−1(Aα); f−1(Ac) = (f−1(A))c, . . .
ii) If C ⊆ B(R) is a collection of sets generating B(R), that is σ(C) = B(R), then f−1(C) ∈ F for all C ∈ C implies that f is F-measurable.
iii) The function g : Ω → R is measurable if {g ≤ x} = {ω : g(ω) ≤ x} ∈ F for all x ∈ R.
Proof. The proof of (i) is straightforward. For the proof of (ii), let C(B) be the collection of elements B (these are sets!) of B(R) with f−1(B) ∈ F. By (i), C(B) is a σ-algebra; by assumption C(B) contains C, hence C(B) contains B(R). (iii) follows from (ii) when we take C = π(R), the class of intervals of the form (−∞, x].
QED
Problem 1.14 Let Ω = R, F = B(R). Show that f : R → R given by f (x) = cos(x) is measurable.
Measurability is preserved under a number of operations.
Lemma 1.6 (Sums and products are measurable) If f, g are measurable and λ ∈ R, then f + g, f · g and λf are measurable.
Proof (partial). It is sufficient by the previous lemma to check that {f + g > x} ∈ F (why?). Now, f(ω) + g(ω) > x iff f(ω) > x − g(ω). Hence there exists qω ∈ Q such that f(ω) > qω > x − g(ω). It follows that
{f + g > x} = ∪q∈Q ({f > q} ∩ {g > x − q}),
the latter of which is a countable union of elements of F.
QED
Lemma 1.7 (Composition lemma) If f is F-measurable and g a Borel function, then the composition g ◦ f is F-measurable.
In the next lemma, we may allow the limits to take the values ±∞. All results can be extended to this case, but here we restrict to finite limits. This lemma ensures that (monotone) limits of elementary functions are measurable. Most ‘reasonable’ functions fall into this category.
Lemma 1.8 (Measurability of infs, liminfs and lims) Let f1, f2, . . . be a sequence of measurable functions. Then (i) infn fn, supn fn and (ii) lim infn fn, lim supn fn are measurable (provided these limits are finite); moreover, (iii) {ω : limn fn(ω) exists} ∈ F.
Proof. For (i), use {ω : infn fn(ω) ≥ x} = ∩n{ω : fn(ω) ≥ x}. For (ii), let ln(ω) = infm≥n fm(ω). Then ln is measurable by (i). Moreover,
l(ω) := lim infn fn(ω) = ↑limn ln(ω) = supn ln(ω),
and so {l ≤ x} = ∩n{ln ≤ x} ∈ F. For (iii), note that
{limn fn exists} = {lim sup fn < ∞} ∩ {lim inf fn > −∞} ∩ {lim sup fn − lim inf fn = 0}.
QED
Problem 1.15 We did not prove the case of sup and lim sup. How does this follow from the inf and lim inf case?
The uniqueness lemma for measures allows us to deduce results on σ-algebras from results on π-systems for these σ-algebras. There is a similar result for measurable functions. The following theorem allows us to deduce results for general measurable functions from results on indicator functions of elements from a π-system for the σ-algebra at hand! This version is taken from Williams’ book; most versions tend to be formulated as assertions on σ-algebras.
Theorem 1.9 ((Halmos) Monotone class Theorem: elementary version) Let H be a class of bounded functions from a set S into R, satisfying the following conditions:
i) H is a vector space over R (i.e. it is an Abelian group w.r.t. addition of functions, and it is closed under scalar multiplication by real scalars, such that (αβ)f = α(βf), (−1)f = −f and (α + β)f = αf + βf, for f ∈ H, α, β ∈ R);
ii) if fn, n = 1, 2, . . ., is a sequence of non-negative functions in H such that fn ↑ f, with f bounded, then f ∈ H;
iii) the constant function 1 is an element of H.
If H contains the indicator function of every set in a π-system I, then H contains every bounded σ(I)-measurable function.
Coin tossing
Let Ω = {0, 1}N. So, Ω = {(ω1, ω2, . . .), ωn ∈ {0, 1}, n = 1, . . .}. Define
F = σ({ω : ωn = k} : n ∈ N, k ∈ {0, 1}).
Let Xn(ω) be the projection on the n-th co-ordinate: Xn(ω) = ωn. It is the result of the n-th toss. By definition of F, Xn is a random variable. By Lemma 1.6,
Sn = X1 + · · · + Xn = number of ones in n tosses
is a random variable. Next, for x ∈ [0, 1],
{ω : Sn/n → x} = {ω : lim supn Sn/n = x} ∩ {ω : lim infn Sn/n = x} ∈ F
by Lemma 1.8, where Sn/n = (number of ones in n tosses)/(number of tosses). Note that this means that the Strong Law of Large Numbers is a meaningful result!
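The behaviour of Sn/n is easy to simulate. A rough illustration (a sketch, with an arbitrary seed) of the fraction of ones being close to 1/2 for large n:

```python
import random

random.seed(2006)
n = 100_000
S = sum(random.randint(0, 1) for _ in range(n))   # S_n = number of ones in n tosses
fraction_of_ones = S / n                          # S_n / n, typically near 1/2
```

Of course a simulation for one large n proves nothing; the point of the measurability argument above is that the event {Sn/n → 1/2} is even a legitimate member of F.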
Problem 1.16 Define P{ω : ω1 = x1, . . . , ωn = xn} = 2−n, where x1, . . . , xn ∈ {0, 1}. Assume that this can be extended to a probability measure on Ω. Prove the following assertions:
i) E = {ω : Σn ωn < ∞} ∈ F, and P{E} = 0.
ii) The function X(ω) = Σn ωn 2−n is a random variable.
iii) λ(a, b] = P{X ∈ (a, b]} for all intervals (a, b] ⊂ [0, 1].
iv) λ(B) = P{X ∈ B} for all Borel sets B ⊂ [0, 1]. Hence X has the uniform distribution on [0, 1].
σ-algebra generated by a random variable or a collection of these Suppose we have a collection of random variables Xt : Ω → R, t ∈ I, where I is some index set. Then
X = σ(Xt : t ∈ I)
is defined to be the smallest σ-algebra such that each random variable Xt is X-measurable. It follows that X ⊂ F! One can view σ(Xt : t ∈ I) as the information carried by the random variables Xt, t ∈ I. For instance, observing an outcome y = X1(ω), we can only retrieve the set X1−1(y) that ω belongs to, and in general not the precise point ω that produced outcome y. Compared to the σ-algebra F, we lose information by observing the outcome of a random variable, and so the σ-algebras σ(X1), σ(X1, X2), . . . , X are sub-σ-algebras of F. It makes sense that observing more outcomes y1 = X1(ω), y2 = X2(ω), . . . provides us with more information as to the precise point ω that produced these outcomes. This is consistent with the fact that e.g. σ(X1, . . . , Xn) ⊃ σ(X1, . . . , Xn−1): the more outcomes we observe, the finer the generated σ-algebra.
How can we build the σ-algebra X if e.g. the index set I = N? π-systems help us here: let Xn = σ(Xk : k ≤ n); then ∪n Xn is a π-system that generates σ(Xn : n ∈ N).
Problem 1.17 Let Ω = [0, 1], F = B([0, 1]), and
X1(ω) = 1 if ω ≤ 1/5, and 0 if ω > 1/5;
X2(ω) = −1 if ω ≤ 1/2, 0 if 1/2 < ω ≤ 3/4, and 2 if ω > 3/4.
Determine σ(X1), σ(X2) and σ(X1, X2). Describe all σ(X1, X2)-measurable functions.
Problem 1.18 Let Ω = R, F = B(R). For X(ω) = cos(ω), determine σ(X). Is Y defined by
Y (ω) = sin(ω) σ(X)-measurable?
Problem 1.19 Prove that the σ-algebra σ(X) generated by the random variable X is given by
σ(X) = X−1(B) := {{ω | X(ω) ∈ B} : B ∈ B(R)},
and that σ(X) is generated by the π-system
π(X) := {{ω | X(ω) ≤ x} : x ∈ R}.
How can one characterise π-systems generating σ(X1 , . . . , Xn ) and X ? Explain.
Theorem 1.10 Let (Ω, F) be a measurable space. Let Ω1 be another space and f : Ω1 → Ω a function. Let F1 = σ(f−1(A), A ∈ F) = f−1(F) be the σ-algebra generated by the inverse images of A ∈ F under f. Then a function g : Ω1 → R is F1-measurable if and only if there exists an F-measurable function h : Ω → R such that g = h(f).
An application of the above theorem is the Doob-Dynkin lemma.
Lemma 1.11 (in BSP:Doob-Dynkin lemma) Let X : Ω → R be a random variable. Then Y : Ω → R
is σ(X)-measurable if and only if there exists a Borel function f : R → R such that Y = f (X).
The lemma can be proved by first proving it for elementary functions and then extending this to
positive and then to general measurable functions.
Problem 1.20 Show how the Doob-Dynkin lemma follows from Theorem 1.10. Suppose that X is an elementary function. Show the assertion of the lemma by explicitly constructing σ(X) and by subsequently specifying how to choose f.
2 Integration, Expectation and Independence
It is convenient here to assume a general measure µ, i.e. we have a measure space (Ω, F, µ). As a reminder: we say that an event A ∈ F occurs µ-a.s. (almost surely), or µ-a.e. (almost everywhere), if µ(Ac) = 0. In case µ is a probability measure, we can also say that this event occurs with probability 1.
For a non-negative elementary function f = Σni=1 ai 1{Ai}, ai ≥ 0, i = 1, . . . , n, we define
∫ f dµ = Σni=1 ai µ(Ai).
For general positive, measurable functions f, the integral can be defined by
∫ f dµ = limn→∞ ∫ fn dµ,
where fn, n = 1, . . ., is a non-decreasing sequence of elementary functions with fn ↑ f, n → ∞. For example, one can choose
fn(ω) = n if f(ω) > n, and fn(ω) = (i − 1)2−n if (i − 1)2−n < f(ω) ≤ i2−n ≤ n, i = 1, . . . , n2n.
Problem 2.1 These approximating elementary functions fn are σ(f )-measurable. Prove this.
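The dyadic staircase fn above can be written down directly. A sketch (function name ours) implementing the two cases of the definition; the values are non-decreasing in n and converge to f from below:

```python
import math

def f_n(x, n):
    """The elementary approximation from the text: f_n = n where f > n, and
    f_n = (i-1)2^-n on the dyadic level set where (i-1)2^-n < f <= i*2^-n <= n."""
    if x > n:
        return float(n)
    if x <= 0:
        return 0.0
    i = math.ceil(x * 2 ** n)       # the cell (i-1)2^-n < x <= i*2^-n containing x
    return (i - 1) / 2 ** n

# f_n(x) increases to x as n grows:
approximations = [f_n(0.7, n) for n in (1, 2, 5, 10, 20)]
```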
For general measurable f, write f = f+ − f−, f+, f− ≥ 0, where f+ = max(f, 0) and f− = max(−f, 0). Then f is integrable if at least one of ∫ f+ dµ, ∫ f− dµ is finite; if both are finite we call f summable! N.B. this is slightly different from Definition 1.9 of BSP. N.B. this stepwise argument, from elementary functions via positive functions to general functions, is part of a standard proof machine. Later on it will be used for stochastic integrals.
Problem 2.2 Let Ω = (0, 1], F = B(R), µ = λ. Let f = 1{Q∩(0,1]}. Calculate ∫ f dλ.
Problem 2.3 i) Suppose that µ(f ≠ 0) = 0 for some measurable function f (not necessarily non-negative!). Prove that ∫ f dµ = 0.
ii) Let µ(f < 0) = 0 (i.e. f ≥ 0 µ-a.e.). Prove that ∫ f dµ ≥ 0.
iii) Let f be a measurable function with µ(f < 0) = 0. Prove that ∫ f dµ = 0 implies µ(f > 0) = 0, i.e. f ≡ 0 µ-a.e. Give an example of a measure space and a function f with f ≢ 0 and ∫ f dµ = 0.
The next step is to formulate a number of basic convergence theorems giving conditions under which integral and limit may be interchanged. These conditions amount to requiring positivity (non-negative functions always have a well-defined, possibly infinite, integral; there are no problems of subtracting ∞ from ∞) or some well-behaved dominating function.
Theorem 2.1 (Monotone Convergence Theorem) Suppose that 0 ≤ fn ↑ f µ-a.e. (i.e. µ(∪n{fn < 0} ∪ {f < 0} ∪ {fn ↑ f fails}) = 0). Then
limn→∞ ∫ fn dµ = ∫ limn→∞ fn dµ = ∫ f dµ. (2.1)
(Dominated Convergence Theorem) Suppose that fn → f µ-a.e., and |fn| ≤ g µ-a.e. with g a µ-summable function. Then
∫ |fn − f| dµ → 0, n → ∞,
and in particular (2.1) holds.
Lemma 2.2 (Fatou’s Lemma) (BSP, p. 109) If fn ≥ 0 µ-a.e., then
∫ lim infn fn dµ ≤ lim infn ∫ fn dµ.
Proof. Let gn := infk≥n fk. gn is measurable and gn ↑ lim infk fk. Then fk ≥ gn for k ≥ n. Hence ∫ fk dµ ≥ ∫ gn dµ, k ≥ n (see Problem 2.3 (ii)). By monotone convergence, ∫ gn dµ ↑ ∫ lim infk fk dµ, and so
∫ lim infk fk dµ = ↑limn ∫ gn dµ ≤ ↑limn infk≥n ∫ fk dµ = lim infn ∫ fn dµ.
QED
Problem 2.4 There is a limsup version of Fatou’s lemma:
∫ lim supn fn dµ ≥ lim supn ∫ fn dµ.
Provide conditions on the sequence fn, n = 1, . . ., such that this version follows from the above Fatou’s lemma.
Problem 2.5 Let Ω = (0, 1], F = B((0, 1]) and µ = λ, the Lebesgue measure. Let
fn = n 1{(0,1/n]}.
Compute limn fn and limn ∫ fn dλ. Compare this with the statements in the Monotone Convergence Theorem, Dominated Convergence Theorem and Fatou’s Lemma. Which results fail and why?
Problem 2.6 Let fn, n = 1, . . ., and f be measurable functions with the property that ∫ |fn − f| dµ → 0, n → ∞. Does this imply that fn → f, n → ∞, µ-a.e.? Unfortunately not in general: choose Ω = (0, 1], F = B((0, 1]) and
f2n+i = 1{(i·2−n,(i+1)·2−n]}, i = 0, . . . , 2n − 1, n = 0, 1, . . .
Calculate ∫ fk dλ, and investigate whether the limits limk→∞ fk and limk→∞ ∫ fk dλ exist.
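This ‘sliding bump’ sequence can be inspected numerically. A sketch (names ours): the integrals ∫ fk dλ = 2−n shrink to 0, while at any fixed point the sequence keeps returning to the value 1, once in every block {2n, . . . , 2n+1 − 1}:

```python
def bump(k, x):
    """f_k for k = 2^n + i (0 <= i < 2^n): the indicator of (i*2^-n, (i+1)*2^-n]."""
    n = k.bit_length() - 1
    i = k - 2 ** n
    return 1 if i * 2.0 ** -n < x <= (i + 1) * 2.0 ** -n else 0

# integral of f_k = width of the bump = 2^-n -> 0 (L1-convergence to 0), yet at
# x = 0.3 the sequence f_k(0.3) equals 1 once in every dyadic block of indices:
integrals = [2.0 ** -(k.bit_length() - 1) for k in range(1, 17)]
hits = [k for k in range(1, 17) if bump(k, 0.3)]
```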
In order to be able to define conditional expectations later on, we need the following result.
Theorem 2.3 (Radon-Nikodym, BSP p. 28) Let (Ω, F) be given. Suppose that µ is a σ-finite measure, i.e. there are events An, n = 1, . . . ∈ F, with ∪n An = Ω and µ(An) < ∞ for n = 1, . . .. Suppose further that ν is µ-absolutely continuous, i.e. µ(A) = 0 implies ν(A) = 0. Then there exists a measurable function f ≥ 0, which is integrable w.r.t. µ, such that
ν(A) = ∫A f dµ = ∫ f 1{A} dµ.
Notation: f = dν/dµ is called the density or Radon-Nikodym derivative of ν w.r.t. µ.
A consequence of the Theorem, for measurable functions g integrable w.r.t. ν, is that
∫ g dν = ∫ g · f dµ. (2.2)
Back to random variables and probability measures In general, when speaking of random variables, we define these in terms of the outcomes (values X can take) and a probability distribution on the space of outcomes. The underlying probability space (Ω, F, P) is mostly left undefined and its role is hidden.
It can be useful to know a way of constructing an underlying probability space. However, first we
will discuss some notation and concepts for random variables related to integration.
Suppose that (Ω, F, P) is given, as well as the random variable X : Ω → R. Then PX, given by
PX{A} = P{ω : X(ω) ∈ A},
is a probability measure on (R, B(R)) by virtue of the so-called “overplantingsstelling” (Dutch for ‘transfer theorem’, i.e. the image-measure or change-of-variables theorem).
Theorem 2.4 (Overplantingsstelling) Let (Ω, F, µ) be a measure space. Suppose that (Ω′, F′) is a measurable space. Let f : Ω → Ω′ be an F–F′-measurable function in the sense that f−1(A′) ∈ F for all A′ ∈ F′. Then the function
µ′(A′) = µ{f−1(A′)}, A′ ∈ F′,
is a measure on F′. Moreover, for any F′-measurable function g : Ω′ → R, one has
∫Ω g(f) dµ = ∫Ω′ g dµ′,
in the sense that both integrals exist and are equal, whenever at least one of them exists.
Problem 2.7 Prove this theorem.
PX is called the probability distribution of X. Since {(−∞, x]}, x ∈ R, is a π-system generating B(R), the uniqueness lemma 1.1 implies that it is sufficient to specify the values
FX(x) = PX{(−∞, x]} = P{X ≤ x};
FX is called the (probability) distribution function of X.
Problem 2.8 Show that FX has the following properties:
i) FX : R → [0, 1], and FX is non-decreasing;
ii) limx→−∞ FX(x) = 0, limx→∞ FX(x) = 1;
iii) FX is right-continuous.
The function FX provides a nice tool for the construction of random variables with a given distribution function.
Let a function F with properties (i, ii, iii) be given. Then again there is a unique (why?) probability measure p on (R, B(R)) with
p{(−∞, x]} = F(x).
Choose Ω = R, F = B(R) and P = p, and set X(ω) = ω. We have PX = p.
We can also construct X on (Ω, F, P) = ([0, 1], B[0, 1], λ): set
X(ω) = inf{y : F(y) ≥ ω} (= sup{z : F(z) < ω}).
This is called the Skorokhod representation.
Problem 2.9 Show that FX = F.
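The Skorokhod representation is easy to try numerically for a concrete F: since F is non-decreasing, inf{y : F(y) ≥ ω} can be located by bisection. A sketch (names ours), using the Exp(1) distribution function on a bracketing interval chosen by hand:

```python
import math

def skorokhod(omega, F, lo=-50.0, hi=50.0, tol=1e-9):
    """X(omega) = inf{y : F(y) >= omega}, located by bisection (F non-decreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= omega:
            hi = mid        # the infimum is at mid or to its left
        else:
            lo = mid
    return hi

F_exp = lambda y: 1.0 - math.exp(-y) if y > 0 else 0.0   # Exp(1) distribution function
median = skorokhod(0.5, F_exp)                            # should be log 2
```

This is exactly inverse-transform sampling: feeding uniform ω’s into skorokhod produces samples with distribution function F.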
If the probability measure PX is absolutely continuous w.r.t. the Lebesgue measure, then PX has a probability density function fX (w.r.t. the Lebesgue measure) by the Radon-Nikodym theorem, and then we can write
PX{A} = ∫A fX(x) dλ(x).
Whenever fX is Riemann-integrable, this integral is the same as the normal Riemann integral! This applies for instance when the density is a continuous function on an open interval of R. On the other hand, there is a pitfall here: one would expect continuity of FX to imply existence of a density. This is not true: the Cantor set provides a way to construct a counterexample. However, if FX is a continuous function and there is a function f such that FX(x) = ∫(−∞,x] f(u) du, then the density exists and one may choose fX(x) = f(x), as in the usual cases.
If X is P-summable, we say that X has finite expectation (or a finite first moment), given by
E(X) = ∫Ω X(ω) dP(ω).
Using the ‘overplantingsstelling’, we can write it in terms of PX by E(X) = ∫R x dPX(x). If X has a density fX w.r.t. λ, then
E(X) = ∫R x fX(x) dλ(x).
N.B. different authors define the existence of the expectation or of moments differently: some require only integrability in our sense.
If X2 is P-summable, then we call X square integrable. The variance of X is defined by σ2(X) = E(X − E(X))2 (= E(X2) − (E(X))2).
In order to calculate expectations of functions of X, we can use the ‘overplantingsstelling’ in a convenient way. Suppose that g : R → R is Borel-measurable. Then g(X) has a finite expectation if and only if g is summable w.r.t. PX, and we have
Eg(X) = ∫Ω g(X(ω)) dP(ω) = ∫R g(x) dPX(x).
If X has a density fX w.r.t. λ, then
Eg(X) = ∫R g(x) fX(x) dλ(x).
Remark The space of summable functions on (Ω, F, P) is denoted by L1(Ω, F, P), and the space of square integrable functions on (Ω, F, P) by L2(Ω, F, P). Both play important roles: ||X||1 = ∫ |X| dP and ||X||2 = (∫ X2 dP)1/2 act ‘almost as’ norms on these spaces. The problem is that ||X||1,2 = 0 does not imply that X = 0; it only implies that X = 0 P-a.e.! The solution is to define equivalence classes of functions that are P-almost everywhere equal. The resulting quotient spaces are again denoted by L1(Ω, F, P) and L2(Ω, F, P) (now consisting of equivalence classes), and these are complete, normed spaces. In the case of L2(Ω, F, P), the norm comes from the inner product (X, Y) = E(XY), and so the space is a Hilbert space. Note that convergence in these spaces means convergence in the respective norms.
Problem 2.10 Suppose that X takes only countably many values.
i) What type of function is X? PX cannot be absolutely continuous w.r.t. the Lebesgue measure λ on (R, B(R)); why?
ii) Give a formula for E(X) and σ2(X).
iii) Suppose that X ∈ {0, 1, . . .} P-a.s. and suppose that X has a finite expectation. Show the following alternative formula for its expectation:
E(X) = Σ∞n=0 P{X > n}.
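The tail-sum formula in (iii) can be checked numerically for a concrete distribution. A sketch (names ours) for a geometric distribution on {0, 1, . . .}, truncated at a level where the remaining tail is negligible:

```python
p = 0.3                                  # geometric: P{X = k} = (1 - p) p^k, k >= 0
pmf = lambda k: (1 - p) * p ** k
N = 200                                  # truncation level; the tail beyond N is ~p^N

E_direct = sum(k * pmf(k) for k in range(N))                        # sum k P{X = k}
E_tails = sum(sum(pmf(k) for k in range(n + 1, N)) for n in range(N))  # sum P{X > n}
# both equal p / (1 - p) = 3/7 up to the negligible truncation error
```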
N.B. the limit theorems of the previous section can be transferred to a formulation in terms of expectations! When restricting to positive r.v.s, these can be used to yield some useful results. Note that by definition, for a r.v. X we have P{|X| < ∞} = 1 (this is not necessary; the theory holds through if we allow infinite values!). Suppose that {Xn}n∈N is a collection of r.v.s on (Ω, F, P) that are all P-a.e. non-negative.
• One has
E(Σn Xn) = Σn E(Xn), (2.3)
where both sides are either finite or infinite together.
• Σn E(Xn) < ∞ implies that Σn Xn < ∞ a.e., and so Xn → 0, n → ∞, a.s.
Problem 2.11 Prove this. Conjure up a simple example where (2.3) fails when the positivity condition is dropped.
One can write probabilities of sets in terms of expectations:
P{X ∈ A} = PX{A} = E(1{X∈A}),
and similarly
∫A g dPX = ∫R g 1{A} dPX.
We conclude this section with two important inequalities.
Lemma 2.5 (Chebyshev’s inequality) Suppose that X is a random variable. Let φ : R → R+ be a non-decreasing, non-negative function such that E(φ(X)) < ∞. Then for all a > 0 with φ(a) > 0, one has
P{X ≥ a} ≤ E(φ(X))/φ(a).
Proof.
E(φ(X)) = ∫ φ(x) dPX(x) ≥ ∫x≥a φ(x) dPX(x) ≥ φ(a) P{X ≥ a}.
Positivity of φ justifies the first inequality; monotonicity of φ justifies the second.
QED
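For a concrete discrete X, both sides of Chebyshev’s inequality can be computed exactly. A small sketch (names ours) with X uniform on a die and φ(x) = x2, using exact fractions:

```python
from fractions import Fraction

die = range(1, 7)                                 # X uniform on {1,...,6}
E_phi = Fraction(sum(x * x for x in die), 6)      # E(X^2) = 91/6, with phi(x) = x^2
a = 5
lhs = Fraction(sum(1 for x in die if x >= a), 6)  # P{X >= 5} = 1/3
rhs = E_phi / (a * a)                             # E(phi(X)) / phi(a) = 91/150
```

The bound holds (1/3 ≤ 91/150) but is far from tight, which is typical of Chebyshev-type bounds.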
Let Z ∼ N(0, 1), that is, Z has the standard normal distribution with density
fZ(x) = (1/√(2π)) exp{−x2/2}.
We will prove that
P{Z > a} ≤ exp{−a2/2}. (2.4)
Take φ(z) = exp{γz}, γ > 0. Then
E(φ(Z)) = (1/√(2π)) ∫ exp{γz − z2/2} dz = exp{γ2/2} · (1/√(2π)) ∫ exp{−(z − γ)2/2} dz = exp{γ2/2}.
So that, taking γ = a in Chebyshev’s inequality,
P{Z > a} ≤ exp{γ2/2 − γa} = exp{−a2/2}.
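The bound (2.4) can be compared against the exact tail, which is expressible through the complementary error function. A sketch (names ours):

```python
import math

def normal_tail(a):
    """P{Z > a} for standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(a / math.sqrt(2))

# (exact tail, exponential bound) pairs: the bound holds but is not tight
pairs = [(normal_tail(a), math.exp(-a * a / 2)) for a in (0.5, 1.0, 2.0, 3.0)]
```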
As an application, let X1, X2, . . . be N(0, 1) distributed random variables. Let An = {max{X1, . . . , Xn} > √(6 log n)}. Then
P{An} = P{max{X1, . . . , Xn} > √(6 log n)} ≤ n P{X1 > √(6 log n)} ≤ n exp{−6 log n/2} = 1/n2.
Hence
Σ∞n=1 P{An} ≤ Σ∞n=1 1/n2 < ∞.
Applying the first Borel-Cantelli lemma yields that
0 = P{lim supn→∞ An} = P{lim supn→∞ {max{X1, . . . , Xn}/√(6 log n) > 1}}.
This implies that for a.a. ω
max{X1(ω), . . . , Xn(ω)} ≤ √(6 log n), n ≥ n(ω).
A function f : A → R, where A = (a, b) is an open interval of R, is called convex on A if for all x, y ∈ A and all p ∈ [0, 1], one has that
f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y).
Important convex functions on R are f(x) = |x|, x2, exp{αx}.
Lemma 2.6 (Jensen’s Inequality, BSP p. 31) Suppose that f : A → R is convex on A, with A = (a, b). Suppose that X is a summable r.v. with
P{X ∈ A} = 1, E|f(X)| < ∞.
Then
Ef(X) ≥ f(E(X)).
Problem 2.12 Prove this lemma, by successively carrying out the following steps.
i) Show that there exists c ∈ [a, b] such that f is non-increasing on (a, c) and non-decreasing on (c, b), and use this to show continuity of f on A.
ii) Show that for x0 < x1 < x2, x0, x1, x2 ∈ A, one has
(f(x2) − f(x0))/(x2 − x0) ≥ (f(x1) − f(x0))/(x1 − x0),
by suitably expressing x1 as a convex combination of x2 and x0. Show that together with (i) this implies that for each x0 ∈ A there exists a number n(x0) with
f(x) ≥ f(x0) + n(x0)(x − x0), x ∈ A.
iii) Finish the proof of the lemma by taking expectations in the last inequality and selecting a suitable value for x0.
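Jensen’s inequality is easy to verify numerically for a discrete X. A sketch (names ours) with the convex function f = exp and a three-point distribution with E(X) = 0:

```python
import math

xs = [-1.0, 0.0, 2.0]
ps = [0.5, 0.25, 0.25]                               # a probability vector, E(X) = 0
EX = sum(p * x for p, x in zip(ps, xs))
E_fX = sum(p * math.exp(x) for p, x in zip(ps, xs))  # E f(X), f = exp convex
f_EX = math.exp(EX)                                  # f(E X) = exp(0) = 1
# Jensen: E f(X) >= f(E X)
```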
Independence We now have a basic probability space (Ω, F, P).
Independence of σ-algebras Sub-σ-algebras F1, F2, . . . of F are called independent whenever for each sequence of sets A1 ∈ F1, A2 ∈ F2, . . . and each finite set of distinct indices i1 < i2 < · · · < in one has
P{Ai1 ∩ · · · ∩ Ain} = Πnk=1 P{Aik}.
Independence of r.v.s Random variables X1, X2, . . . are independent if the σ-algebras σ(X1), σ(X2), . . . are independent.
Independence of events Events A1, A2, . . . are independent if the σ-algebras A1, A2, . . . are independent, where
Ai = {∅, Ω, Ai, Aci}.
In other words, if the r.v.s 1{A1}, 1{A2}, . . . are independent.
Problem 2.13 Show that for independence of A1, . . ., it is sufficient to check for each finite set of indices i1 < i2 < · · · < in that
P{Ai1 ∩ · · · ∩ Ain} = Πnk=1 P{Aik}.
Remark: independence of r.v.s X1 and X2, say, does not imply that X2 is not σ(X1)-measurable. Construct a trivial example to illustrate this.
Checking independence of σ-algebras and r.v.s is a cumbersome task, but fortunately π-systems lighten (up) life.
Lemma 2.7 Suppose that F1 and F2 are sub-σ-algebras of F. Suppose that there are π-systems I1 and I2 generating F1 and F2:
σ(I1) = F1, σ(I2) = F2.
Then F1 and F2 are independent iff I1 and I2 are independent, in the sense that
P{I1 ∩ I2} = P{I1}P{I2}, I1 ∈ I1, I2 ∈ I2.
Proof. Clearly, independence of the σ-algebras implies independence of the π-systems. So assume independence of the π-systems. The only apparatus we have so far for extending assertions on measures to whole σ-algebras is the uniqueness lemma 1.1.
Let I1 ∈ I1 be given. Then
µ(A) = P{I1 ∩ A}, ν(A) = P{I1}P{A}
are measures on F2 (check this). These two measures agree on the π-system I2. Moreover, µ(Ω) = ν(Ω). By the uniqueness lemma they now agree on the whole of F2. This implies
P{I1 ∩ A} = P{I1}P{A}, A ∈ F2. (2.5)
Since I1 ∈ I1 was arbitrarily chosen, (2.5) holds for all I1 ∈ I1 and A ∈ F2. Now fix A ∈ F2 and define µ(B) = P{B ∩ A}, ν(B) = P{B}P{A}. Again µ and ν agree on I1, with ν(Ω) = µ(Ω), and so by the uniqueness lemma they agree on the whole of F1. This is what we wanted to prove. QED
Example. Suppose that for two random variables X and Y one has
P{X ≤ x, Y ≤ y} = P{X ≤ x}P{Y ≤ y}, x, y ∈ R,
i.e. the π-systems π(X) = {(X ≤ x) : x ∈ R} and π(Y) are independent. These π-systems generate the σ-algebras σ(X) and σ(Y), so that independence of X and Y follows. N.B. the book BSP treats this matter slightly differently: independence of r.v.s is defined slightly differently there.
Problem 2.14 Let X1 , X2 , . . . be independent r.v.s. Show that the σ-algebras σ(X1 , . . . , Xn ) and
σ(Xn+1 , . . . , Xn+l ) are independent.
Of course it is nice to define independence, but can one construct independent r.v.s at all? Recall the construction in Problem 1.16. There we had that X(ω) = Σn ωn 2−n has the uniform distribution on (0, 1].
Problem 2.15 Show that Zn(ω) = ωn, n = 1, . . ., are independent, identically distributed r.v.s, and give their distribution.
It follows easily that also
X1(ω) = ω1 2−1 + ω3 2−2 + ω6 2−3 + ω10 2−4 + · · ·
X2(ω) = ω2 2−1 + ω5 2−2 + ω9 2−3 + ω14 2−4 + · · ·
X3(ω) = ω4 2−1 + ω8 2−2 + ω13 2−3 + ω19 2−4 + · · ·
and so forth, have the uniform distribution on (0, 1]. The different subsequences of the expansion of ω generating the Xi’s are disjoint. It is intuitively clear that the Xi are independent r.v.s with the same uniform distribution on (0, 1]. Let any sequence of distribution functions Fn, n ∈ N, be given. By the Skorokhod representation, one can find r.v.s Yn = gn(Xn) having distribution function Fn. Independence is obviously preserved.
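The index pattern 1, 3, 6, 10, . . . / 2, 5, 9, 14, . . . / 4, 8, 13, 19, . . . is an anti-diagonal enumeration of N. A closed form via triangular numbers (our own reconstruction of the pattern, not from BSP), together with a check that the rows are pairwise disjoint:

```python
def T(j):
    return j * (j + 1) // 2        # triangular numbers 1, 3, 6, 10, ...

def idx(m, k):
    """Position of the k-th binary digit of omega used for X_m."""
    return T(k + m - 1) - (m - 1)

rows = [[idx(m, k) for k in range(1, 6)] for m in range(1, 4)]
flat = [i for row in rows for i in row]
# rows reproduce the three sequences above, and no index is used twice
```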
Problem 2.16 Let X and Y be independent r.v.s. Let g, h : R → R be Borel functions. Show that g(X) and h(Y) are independent.
Let us now consider two r.v.s X and Y on the probability space (Ω, F, P). For each point ω we have the vector function (X(ω), Y(ω)) taking values in R2. This gives rise to distributions on the plane R2, and hence to so-called product measures. We will not discuss this further here, but restrict ourselves to essentially one-dimensional sub-cases.
The first instance is when we consider g(ω) = X(ω)Y(ω).
Lemma 2.8 If X and Y are independent then E(XY ) = E(X)E(Y ), provided the latter expectations
exist (i.e. X and Y are P-summable).
Problem 2.17 Prove this by carrying out the following steps. First show the result for elementary functions, then for positive functions. For the latter, one uses sequences of approximating elementary functions for X and Y; note these should be independent! Then finish the proof.
We now turn to proving a number of results on sequences of random variables. The proofs rely on
assertions derived or stated hitherto. They will be applicable later on to stochastic processes.
There are two results that concern sequences of independent, identically distributed r.v.s X1 , X2 , . . .
on the probability space (Ω, F, P).
For the first lemma, we need the concept of a stopping time (BSP p.54). T : Ω → R ∪ {∞} is a
stopping time for the sequence X1 , X2 , . . ., if {T ≤ n} ∈ σ(X1 , . . . , Xn ). In words: the decision to stop
before or at time n is taken on the basis of the outcomes X1 , X2 , . . . , Xn . Note that we allow T = ∞;
this is the non-stopping decision.
Let Xn = 1 with probability p and Xn = −1 with probability 1 − p: this can be interpreted as the
respective gain and loss a gambler incurs when tossing a biased coin. Then the gambler's gain or loss
after n tosses equals Sn = X1 + · · · + Xn . If the gambler decides to stop after the n-th game whenever his gain
at that time is some number x, then this defines a stopping time.
Lemma 2.9 (Wald's equation) Let X1 , X2 , . . . be a sequence of i.i.d. r.v.s with finite
expectation. Suppose that T < ∞ a.e. and E(T ) < ∞. Then
\[
E\Big(\sum_{i=1}^{T} X_i\Big) = E(X_1)\,E(T).
\]
Proof. Write S_T = \sum_{i=1}^{T} X_i. Assume first that Xi ≥ 0 P-a.e. Now we have (check the validity of all
steps)
\[
E(S_T) = \sum_{n=1}^{\infty} \int_{\{T=n\}} S_T \, dP
= \sum_{n=1}^{\infty} \int_{\{T=n\}} S_n \, dP
= \sum_{n=1}^{\infty} \sum_{k=1}^{n} \int_{\{T=n\}} X_k \, dP
= \sum_{k=1}^{\infty} \sum_{n=k}^{\infty} \int_{\{T=n\}} X_k \, dP
\]
\[
= \sum_{k=1}^{\infty} \int_{\{T \ge k\}} X_k \, dP
= \sum_{k=1}^{\infty} E\big(X_k 1_{\{T \ge k\}}\big)
= \sum_{k=1}^{\infty} E(X_k)\,P\{T \ge k\}
= E(X_1) \sum_{k=1}^{\infty} P\{T \ge k\} = E(X_1)\,E(T).
\]
For the decomposition in the first equality we use that T < ∞ with probability 1. For the step
E(X_k 1_{\{T\ge k\}}) = E(X_k)P\{T \ge k\} we use independence of Xk and 1_{\{T\ge k\}}. To show independence, we use the fact that 1_{\{T\ge k\}} = 1 − 1_{\{T\le k-1\}}. By
definition, 1_{\{T\le k-1\}} is σ(X1 , . . . , Xk−1 )-measurable, hence so is 1_{\{T\ge k\}} = 1 − 1_{\{T\le k-1\}}. It follows that
σ(1_{\{T\ge k\}}) ⊂ σ(X1 , . . . , Xk−1 ). Since σ(Xk ) and σ(X1 , . . . , Xk−1 ) are independent, σ(Xk ) and
σ(1_{\{T\ge k\}}) are independent, i.e. Xk and 1_{\{T\ge k\}} are independent.
Now, for general r.v.s Xn , the assertion follows from the fact that it applies to X1^+ , X2^+ , . . . and X1^- , X2^- , . . ..
Check this.
QED
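Wald's equation lends itself to a quick Monte Carlo sanity check. The sketch below is our own illustration (the step distribution {1, 2} and the threshold are arbitrary choices, not taken from the notes): T is the first time the running sum reaches the threshold, so T is a stopping time, and the two estimates of E(S_T) = E(X1)E(T) should nearly agree.

```python
import random

random.seed(0)

def run_once(threshold=10):
    # One episode: add i.i.d. steps X_i in {1, 2} until the running sum
    # reaches `threshold`; T = number of steps taken is a stopping time,
    # since {T <= n} depends only on X_1, ..., X_n.
    s, t = 0, 0
    while s < threshold:
        s += random.choice((1, 2))
        t += 1
    return s, t

n = 100_000
results = [run_once() for _ in range(n)]
mean_ST = sum(s for s, _ in results) / n
mean_T = sum(t for _, t in results) / n
EX1 = 1.5  # E(X_1) for the fair {1, 2} step

print(mean_ST, EX1 * mean_T)  # two estimates of E(S_T); they nearly agree
```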
Problem 2.18 Suppose that p ≥ 1/2. The gambler intends to stop the first time his total
gain is −1 (i.e. he has 1 less than what he started with), i.e. T = min{n | Sn = −1}.
Assuming that T is finite with probability 1, and has finite expectation, Wald's equation applies.
What contradiction do we get, and what might be wrong with our assumptions on T ? Study the cases
p = 1/2 and p > 1/2 separately.
Many interesting events have probability 0 or 1. The first Borel-Cantelli lemma is an assertion on
a sequence of events whose probabilities form a convergent series. What can we say if this series
diverges? For this, we need an extra condition of independence.

Lemma 2.10 (Second Borel-Cantelli Lemma) Suppose that A1 , A2 , . . . ∈ F are independent
events such that \sum_{n=1}^{\infty} P\{A_n\} = \infty. Then P\{\limsup_{n\to\infty} A_n\} = 1.
Proof. Write
\[
G := \Big(\limsup_{n\to\infty} A_n\Big)^c = \Big(\bigcap_m \bigcup_{n\ge m} A_n\Big)^c = \bigcup_m \bigcap_{n\ge m} A_n^c.
\]
Call B_{r,m} = \bigcap_{n=m}^{r} A_n^c and B_m = \bigcap_{n=m}^{\infty} A_n^c. Then G = \bigcup_m B_m and B_{r,m} ↓ B_m as r → ∞. Hence by monotone
convergence P\{B_{r,m}\} ↓ P\{B_m\}. By independence
\[
P\{B_{r,m}\} = \prod_{n=m}^{r} P\{A_n^c\} = \prod_{n=m}^{r} \big(1 - P\{A_n\}\big)
= \exp\Big\{\sum_{n=m}^{r} \log\big(1 - P\{A_n\}\big)\Big\}
\le \exp\Big\{-\sum_{n=m}^{r} P\{A_n\}\Big\},
\]
where we use that log(1 − x) ≤ −x for x ∈ (0, 1). By taking limits, we obtain
\[
P\{B_m\} \le \lim_{r\to\infty} \exp\Big\{-\sum_{n=m}^{r} P\{A_n\}\Big\} = 0.
\]
Now, P\{G\} \le \sum_{m=1}^{\infty} P\{B_m\} = 0, and so P\{G^c\} = 1, which is what we set out to prove.
QED
As an example, let Xn , n = 1, 2, . . . be a sequence of i.i.d. random variables. Suppose
that the Xn are exponentially distributed with parameter 1, i.e. P\{X_n > x\} = e^{-x}, x ≥ 0. Then
\[
P\{X_n > \alpha \log n\} = n^{-\alpha}, \qquad \alpha > 0.
\]
Applying the two Borel-Cantelli lemmas, we find
\[
P\{X_n > \alpha \log n \text{ i.o.}\} =
\begin{cases}
0, & \alpha > 1 \\
1, & \alpha \le 1.
\end{cases}
\]
Put S = \limsup_{n\to\infty} (X_n / \log n). S is a r.v.! We have
\[
\{\omega : S(\omega) \ge 1\} = \{\omega : \limsup_{n\to\infty} (X_n(\omega)/\log n) \ge 1\}
\supset \{\omega : X_n(\omega) > \log n \text{ i.o.}\}.
\]
Hence, P\{S \ge 1\} = 1. On the other hand,
\[
P\{S > 1 + 2\alpha^{-1}\} \le P\{X_n > (1 + \alpha^{-1}) \log n \text{ i.o.}\} = 0.
\]
We have that \{S > 1\} = \bigcup_{\alpha=1}^{\infty} \{S > 1 + 2\alpha^{-1}\}, hence P\{S > 1\} = 0. As a consequence, S ≡ 1 with
probability 1.
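The statement lim sup Xn / log n = 1 a.s. can be illustrated numerically. The following sketch (our own illustration; the sample size and window are arbitrary choices) draws exponential variables and inspects the largest ratio Xn / log n over a late block of indices, which should sit near 1, though convergence is slow.

```python
import math
import random

random.seed(1)
N = 200_000
# X_n exponential with parameter 1, so P{X_n > c log n} = n^{-c}
xs = [random.expovariate(1.0) for _ in range(N)]

# lim sup_n X_n / log n = 1 a.s., so over a late window of indices the
# largest ratio should sit near 1 (convergence is logarithmically slow,
# so only rough agreement can be expected at this sample size).
window_max = max(xs[n] / math.log(n) for n in range(N // 2, N))
print(window_max)
```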
Problem 2.19 Monkey typing the Bible Suppose that a monkey types a sequence of symbols
at random, one per unit of time. This produces an infinite sequence Xn , n = 1, 2, . . . of i.i.d. r.v.s,
with values in the set of possible symbols on the typewriter. If this is a finite set of symbols, then
we agree that min_x P\{X_1 = x\} =: ε > 0. The monkey lives infinitely long and types incessantly.
Typing the Bible corresponds to typing a particular sequence of, say, N symbols (N is the number of
symbols in the Bible). Let H = { monkey types infinitely many copies of the Bible }.
Use the second Borel-Cantelli lemma to show that P\{H\} = 1. Define suitable Ω, F and P and sets
An .
Problem 2.20 A sometimes convenient characterisation of convergence with probability 1. Let X,
Xn , n = 1, . . . be r.v.s on the same probability space (Ω, F, P). Then Xn → X with probability 1
iff for all ε > 0
\[
\lim_{n\to\infty} P\Big\{\bigcup_{m=n}^{\infty} \big(|X_m - X| > \varepsilon\big)\Big\} = 0,
\]
or equivalently iff for all ε > 0
\[
\lim_{n\to\infty} P\Big\{\bigcap_{m\ge n} \big(|X_m - X| \le \varepsilon\big)\Big\} = 1.
\]
Show this.
Problem 2.21 (Algebraic..) Let s > 1 and define the Riemann zeta function ζ(s) = \sum_{n\in\mathbb{N}} n^{-s}.
Let X, Y be i.i.d. r.v.s with
\[
P\{Y = n\} = P\{X = n\} = \frac{n^{-s}}{\zeta(s)}.
\]
Prove that the events
\[
A_p = \{X \text{ divisible by } p\}, \qquad p \text{ prime},
\]
are independent. Explain Euler's formula
\[
\frac{1}{\zeta(s)} = \prod_{p \text{ prime}} \Big(1 - \frac{1}{p^s}\Big)
\]
probabilistically. Prove that
\[
P\{\text{no square other than 1 divides } X\} = \frac{1}{\zeta(2s)}.
\]
Let H be the highest common factor of X and Y . Prove that
\[
P\{H = n\} = \frac{n^{-2s}}{\zeta(2s)}.
\]
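Euler's formula from Problem 2.21 can be checked numerically by truncating both the series and the product. The sketch below (our own illustration; the truncation point is an arbitrary choice) compares the truncated 1/ζ(2) against the truncated product over primes; both approximate 6/π².

```python
import math

def primes_up_to(n):
    # Simple sieve of Eratosthenes.
    sieve = [True] * (n + 1)
    sieve[0] = sieve[1] = False
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p in range(2, n + 1) if sieve[p]]

s = 2.0
N = 10_000
zeta = sum(k ** -s for k in range(1, N + 1))      # truncated zeta(s)
prod = 1.0
for p in primes_up_to(N):
    prod *= 1 - p ** -s                           # truncated Euler product

# Both sides approximate 1/zeta(2) = 6/pi^2 ≈ 0.6079
print(1 / zeta, prod, 6 / math.pi ** 2)
```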
Problem 2.22 Suppose that Xi denotes the ‘quality’ of the i-th applicant for a job. Applicants are
interviewed in a random order and so one may assume that X1 , . . . are i.i.d. random variables with
the same continuous distribution (i.e. they all have a continuous density). What is the probability
that the i-th candidate is the best so far? Prove that
1
P{Ei } = ,
i
where Ei = {i-th candidate is best so far} = {Xi > Xj , j < i}. Prove that the events E1 , E2 , . . . are
independent. Why would we assume a continuous distribution for the qualities? Suppose that there
are only a limited number N of candidates. Calculate the probability that the i-th candidate is the
best amongst all N candidates.
Problem 2.23 Let X1 , X2 , . . . be i.i.d. r.v.s with the N (0, 1) distribution. Prove that
\[
P\Big\{\limsup_{n\to\infty} \frac{X_n}{\sqrt{2 \log n}} = 1\Big\} = 1.
\]
Use that for x > 0
\[
\frac{1}{x + 1/x}\,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} \le P\{X_1 > x\} \le \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2},
\]
since X1 has the N (0, 1) distribution. The second inequality can be derived from the fact that
\[
\frac{d}{dx}\, e^{-x^2/2} = -x\, e^{-x^2/2}.
\]
We recall one of the versions of the law of large numbers.

Theorem 2.11 (Strong Law of Large Numbers) Let Xn , n = 1, 2, . . . be a sequence of i.i.d.
r.v.s on the probability space (Ω, F, P), with finite expectation. Then \sum_{i=1}^{n} X_i / n \to E(X_1), with
probability 1.

There are elementary proofs for which one needs only results from these pages, but we will not do
that here.
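A simulation of the Strong Law is immediate. The sketch below (our own illustration; the exponential distribution is an arbitrary choice with E(X1) = 1) prints the running sample mean at a few sample sizes; it drifts toward 1.

```python
import random

random.seed(2)

# Running mean of i.i.d. Exponential(1) variables; E(X_1) = 1, and the
# Strong Law says the sample mean converges to 1 with probability 1.
total = 0.0
for n in range(1, 100_001):
    total += random.expovariate(1.0)
    if n in (100, 10_000, 100_000):
        print(n, total / n)
```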
Problem 2.24 Is a fair game fair? Let X1 , X2 , . . . be independent r.v.s with P\{X_n = n^2 - 1\} = 1/n^2 =
1 - P\{X_n = -1\}. Prove that E(Xn ) = 0, but that
\[
\frac{X_1 + \cdots + X_n}{n} \to -1, \quad \text{with probability 1.}
\]
This is counter-intuitive, when bearing in mind the Law of Large Numbers! What would you expect
on the basis of this law?
Problem 2.25 The following is a sometimes simple test of a.s. convergence. Let Xn , n = 1, . . . , X
be r.v.s on the same probability space (Ω, F, P). If for all ε > 0
\[
\sum_n P\{|X_n - X| > \varepsilon\} < \infty,
\]
then Xn → X with probability 1. Hint: use Problem 2.20.
Problem 2.26 You have a lamp working on a battery. As soon as the battery fails, you replace
it with a new one. Batteries have i.i.d. lifetimes, say Xn ≥ 0 is the lifetime of the n-th battery.
Assume the lifetimes to be bounded: Xn ≤ M with probability 1 for some constant M . Let N (t) be
the number of batteries that have failed by time t.
i) Show that in general N (t) is not a stopping time, whereas N (t) + 1 is. Hint: N (t) = n iff
X1 + · · · + Xn ≤ t and X1 + · · · + Xn+1 > t.
ii) Argue that t < E\big(\sum_{i=1}^{N(t)+1} X_i\big) \le t + M. Use Wald's equation to show the elementary renewal
theorem for bounded r.v.s:
\[
\lim_{t\to\infty} \frac{E\{N(t)\}}{t} = \frac{1}{E(X_1)}.
\]
That is: the rate at which batteries fail is exactly 1/(expected lifetime), which is an intuitively
obvious result.
Problem 2.27 A deck of 52 cards is shuffled and the cards are then turned face up, one at a time.
Let Xi equal 1, if the i-th card turned up is an ace, otherwise Xi = 0, i = 1, . . . , 52. Let N denote
the number of cards needed to be turned over until all 4 aces appear. That is, the final ace appears
on the N th card to be turned over.
i) Show that P{Xi = 1} = 4/52.
ii) Is Wald’s equation valid? If not, why not?
3 The art of conditioning
The corresponding chapter in BSP is clear enough; only a few remarks are to be made here. Let us
just give the definition of conditional expectation.
Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation,
i.e. E(|X|) < ∞. Let A ⊂ F be a sub-σ-algebra: say this is our knowledge of the structure of the
space Ω, which is coarser than F, but consistent with it. In fact, let us assume that we cannot
observe X in detail, that is, our knowledge of the space Ω is for instance also coarser than σ(X).
Then we have to 'estimate' X in a way consistent with our knowledge A. It makes sense to replace
X by averages of the values of X over the sets A ∈ A. This gives rise to the following theorem-definition (BSP
Def. 2.3, Def. 2.4, Prop. 2.3).
Theorem 3.1 (Fundamental Theorem and Definition of Kolmogorov 1933)
Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation,
i.e. E(|X|) < ∞. Let A be a sub-σ-algebra of F. Then there exists a random variable Y such that
i) Y is A-measurable;
ii) E(|Y |) < ∞;
iii) for each A ∈ A we have
\[
\int_A Y \, dP = \int_A X \, dP.
\]

If Y' is another r.v. with properties (i, ii, iii), then Y' = Y with probability 1, i.e. P\{Y' = Y\} = 1. We
call Y a version of the conditional expectation E(X|A) of X given A and we write Y = E(X|A)
a.s.
N.B.1 Conditional expectations are random variables!
N.B.2 Suppose we have constructed an A-measurable r.v. Z, with E(|Z|) < ∞, such that (iii) holds
for all A in a π-system generating A. Then (iii) holds for all A ∈ A, and so
Z is a version of the conditional expectation E(X|A).
N.B.3 BSP p.29 lists a number of important properties of conditional expectation. An important
one, on independence, is lacking.
Lemma 3.2 (Independence) Let (Ω, F, P) be a probability space. Suppose that A, G ⊂ F and that
X is a r.v. on (Ω, F, P) with finite expectation. Suppose that A is independent of σ(σ(X), G). Then
\[
E(X|\sigma(\mathcal{G}, \mathcal{A})) = E(X|\mathcal{G}), \quad \text{a.s.} \qquad (3.1)
\]
In particular, choosing G = σ(X), it follows that E(X|A) = E(X), a.s., whenever A and σ(X) are
independent.
Proof. We may assume that X ≥ 0 with probability 1. For A ∈ A and G ∈ G, X1{G} and 1{A} are
independent and so
E(X1{G} 1{A} ) = E(X1{G} )E(1{A} ).
Since Y = E(X|G) a.s. is G-measurable, also Y 1{G} and 1{A} are independent with
E(Y 1{G} 1{A} ) = E(Y 1{G} )E(1{A} ).
Since E(X1_{\{G\}}) = E(E(X1_{\{G\}}|G)) = E(1_{\{G\}} E(X|G)) = E(1_{\{G\}} Y), it follows that
\[
E(X 1_{\{G\cap A\}}) = E(X 1_{\{G\}} 1_{\{A\}}) = E(Y 1_{\{G\}} 1_{\{A\}}) = E(Y 1_{\{G\cap A\}}). \qquad (3.2)
\]
For a set C ∈ F, the functions µ(C) = E(X1_{\{C\}}), ν(C) = E(Y 1_{\{C\}}) define positive, finite measures
on (Ω, F). Note that the sets C = G ∩ A, G ∈ G, A ∈ A, form a π-system for σ(G, A). By (3.2),
µ and ν are equal on this π-system and µ(Ω) = ν(Ω), so they are equal on σ(G, A). Hence Y is a
version of E(X|σ(G, A)).
QED
Many theorems for integrals, i.e. expectations, apply to conditional expectations, even though the
latter are r.v.s and not integrals! We quote some of these.
Properties of conditional expectations without proof (see also BSP p.29) Let the probability space (Ω, F, P) be given. Let X, Xn , n = 1, 2, . . ., be r.v.s on this probability space, with finite
expectation (E|X|, E|Xn | < ∞). Let A be a sub-σ-algebra of F.
conditional monotone convergence If 0 ≤ Xn ↑ X, a.s., then E(Xn |A) ↑ E(X|A) a.s.
conditional Fatou If Xn ≥ 0 a.s., then E(lim inf Xn |A) ≤ lim inf E(Xn |A).
conditional dominated convergence If Xn → X a.s., and |Xn (ω)| ≤ Y (ω), n = 1, 2, . . ., for the
r.v. Y with finite expectation, then E(Xn |A) → E(X|A) a.s.
conditional Jensen If f : R → R is a convex function, and E|f (X)| < ∞, then E(f (X)|A) ≥
f (E(X|A)) a.s.
Problem 3.1 A rather queer example. Let Ω = (0, 1]. Let A be the σ-algebra generated by all
one-point sets {x}, x ∈ (0, 1]. Let P{x} = 0 for all x ∈ (0, 1].
i) Does A contain any intervals? If yes, which ones? What is the relation between A and B(0, 1]?
What values can P{A} take for A ∈ A?
ii) Let X : (0, 1] → R be any r.v. Determine E(X|A). Explain heuristically.
N.B.4 Let X be square integrable. Then the conditional expectation is in fact a least squares estimate, or an orthogonal projection of X onto the space of square integrable functions on (Ω, A, P).
Some terminology: by E(X|Y ), E(X|Y1 , Y2 , . . .) we mean E(X|σ(Y )), E(X|σ(Y1 , Y2 , . . .)), etc.
Problem 3.2 Let X, Y1 , Y2 be r.v.s on (Ω, F, P). Use BSP p.29 to show the following properties.
i) E(Xg(Y1 )|Y1 ) = g(Y1 )E(X|Y1 ), for Borel functions g.
ii) E(E(X|Y1 , Y2 )|Y2 ) = E(X|Y2 ).
Problem 3.3 Let (Ω, F, P) be given and let X be a r.v. Let A1 , A2 , . . . be a measurable partition
of Ω, that is: A1 , A2 , . . . ∈ F with Ai ∩ Aj = ∅ and ∪i Ai = Ω. Let A = σ(A1 , . . .) be the σ-algebra
generated by this partition.
i) Show that there is a version Y of E(X|A) that is constant on each of the Ai , in particular
\[
Y(\omega) = \frac{E(1_{\{A_i\}} X)}{P\{A_i\}}, \qquad \omega \in A_i,
\]
provided that P\{A_i\} > 0. What is the value when P\{A_i\} = 0?
ii) Let Z be any A-measurable r.v. which has distinct values on the Ai , i = 1, . . .. How can you
express E(X|A) in terms of Z?
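The piecewise-constant version of E(X|A) from Problem 3.3(i) can be computed explicitly on a toy example. The following sketch is our own illustration (a fair die with the odd/even partition, not an example from the notes): it builds Y cell by cell and checks the defining property (iii) on the generating sets.

```python
from fractions import Fraction

# A fair die: Omega = {1,...,6} with uniform P, and X(w) = w (the face shown).
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w for w in omega}

# The partition A1 = {odd faces}, A2 = {even faces} generates the sub-sigma-algebra A.
partition = [{1, 3, 5}, {2, 4, 6}]

# Version of E(X|A): on each cell A_i the constant E(1_{A_i} X) / P{A_i}.
Y = {}
for cell in partition:
    p_cell = sum(P[w] for w in cell)
    avg = sum(X[w] * P[w] for w in cell) / p_cell
    for w in cell:
        Y[w] = avg

print(sorted(int(v) for v in set(Y.values())))  # mean odd face and mean even face

# Defining property (iii): integrals of X and Y agree on every A in A;
# it suffices to check the generating cells.
for cell in partition:
    assert sum(X[w] * P[w] for w in cell) == sum(Y[w] * P[w] for w in cell)
```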
This is not explicitly stated in BSP, but it is a very important property. The Doob-Dynkin
lemma implies that there exists some Borel function g such that
\[
E(X|Y) = g(Y), \quad \text{with probability 1.}
\]
In this case we write
\[
E(X|Y = y) := g(y).
\]
A similar assertion holds when Y = (Y1 , Y2 , . . . , Yn ) is a random vector on (Ω, F, P). We can often
calculate this function, and then it is extremely important in computing expectations etc.
Problem 3.4 Suppose that X = g(Y ) for some Borel function g. What is E(X|Y )?
Note that this entails that E(X|Y ) is constant on sets where Y is constant, i.e. on sets of the form
{ω : Y (ω) = y}.
Since E(X|Y ) = g(Y ) is a function of Y a.e., one can write integrals of E(X|Y ) over measurable
sets of Ω as integrals over measurable sets of B w.r.t. the induced probability distribution PY of Y :
\[
\int_A E(X|Y) \, dP = \int_{Y(A)} g(y) \, dP_Y(y) = \int_{Y(A)} E(X|Y = y) \, dP_Y(y).
\]
Problem 3.5 In the case of a discrete r.v. Y we have seen how to calculate E(X|Y ). Specify
E(X|Y = y). Show that
\[
E(X) = \sum_y E(X|Y = y)\, P\{Y = y\}.
\]
Problem 3.6 Suppose that X1 , X2 , . . . is a sequence of i.i.d. r.v.s on the probability space (Ω, F, P) with
finite expectation. Let T be a stopping time for this sequence. Let S_n = \sum_{i=1}^{n} X_i. It is tempting to
say that
\[
E(S_T |T = n) = n\,E(X_1).
\]
This is not correct in general: explain why and give a counter-example.
This conditioning on a value Y = y gives rise to conditioning on events. Say, let A ∈ F. Put
Y = 1_{\{A\}}. Then we define E(X|A) := E(X|Y = 1): this is a number! If E(X|Y ) = g(Y ), then
E(X|A) = g(1).
Problem 3.7 Let A ∈ F, and let B1 , B2 , . . . be a measurable partition of the set A. Show that
\[
E(X|A)\, P\{A\} = \sum_i E(X|B_i)\, P\{B_i\}.
\]
Problem 3.8 Let a probability space (Ω, F, P) be given. Let X and Y be r.v.s on this space with
E|X| < ∞. By definition, E(X|Y ) is σ(Y )-measurable.
Suppose that Y has a density w.r.t. the Lebesgue measure λ, i.e. there is a Borel function fY such
that P\{Y \in B\} = \int_B f_Y(y)\, d\lambda(y). Show that
\[
E(X) = \int E(X|Y = y)\, f_Y(y)\, d\lambda(y).
\]
This is the analogue of the formula for discrete r.v.s Y !
There are now two issues to be addressed. The first is that conditional expectations E(X|Y ) are easily
calculated when Y is a discrete r.v., taking only countably many values. However, when Y has a
more general distribution, it is not that obvious how to do this. A first step in this direction is to
write conditional expectations as expectations of r.v.s.
Conditional probabilities and conditional distribution functions (pdf ) Since probabilities
can be written as expectations, it is clear that one can also condition probabilities. Let A ∈ F and
let Y be a r.v. on (Ω, F). Then
P{A|Y } := E(1{A} |Y ) = pA (Y ),
where pA is a Borel function that depends on A. We call this the conditional probability of A given
Y . Write P\{A|Y = y\} = p_A(y), and we call it the conditional probability of A given Y = y. As in
the foregoing,
\[
P\{A \cap Y^{-1}(B)\} = \int_{Y^{-1}(B)} 1_{\{A\}}(\omega)\, dP(\omega) = \int_{Y^{-1}(B)} P\{A|Y\}(\omega)\, dP(\omega) = \int_B P\{A|Y = y\}\, dP_Y(y),
\]
so we can write this probability in terms of the probability distribution of Y !
Problem 3.9 Calculate P{A|Y = y} when Y is a discrete r.v.
Let X be another r.v. on the same probability space. Then we can apply the above to the set
A = \{X \in B'\}. It is common to write P_{X|Y}(B') = P\{X \in B'|Y\}, P_{X|Y=y}(B') = P\{X \in B'|Y = y\}
for the conditional distribution of X given Y and given Y = y respectively. This implies that
\[
P\{X \in B', Y \in B\} = \int_{Y^{-1}(B)} P\{X \in B'|Y\}\, dP = \int_B P_{X|Y=y}(B')\, dP_Y(y). \qquad (3.3)
\]
It is a theorem that one can choose a so-called regular version of PX|Y =Y (ω) , which is a probability
measure on (R, B) for P-almost all ω ∈ Ω.
Problem 3.10 Argue that PX|Y =y (A) = PX (A) when X and Y are independent r.v.s on the same
probability space (Ω, F, P).
Since PX|Y =y is a probability distribution on (R, B), we can calculate expectations of B-measurable
functions.
Lemma 3.3 Let φ be a Borel function. Then
\[
E(\varphi(X)|Y = y) = \int_{\mathbb{R}} \varphi(x)\, dP_{X|Y=y}(x). \qquad (3.4)
\]
Problem 3.11 Derive this relation, when Y is a discrete r.v.
Proof. Why is this so? Again we apply the stratagem of going from elementary functions, via non-negative functions, to general functions. First, let φ = 1_{\{B\}}, B ∈ B. In this case \varphi(X) = 1_{\{B\}}(X) =
1_{\{X^{-1}(B)\}}. Hence
\[
E(\varphi(X)|Y) = E(1_{\{B\}}(X)|Y) = E(1_{\{X^{-1}(B)\}}|Y) \stackrel{\text{def}}{=} P\{X^{-1}(B)|Y\} = P_{X|Y}(B).
\]
On the other hand,
\[
\int \varphi(x)\, dP_{X|Y=y}(x) = \int_{x\in B} dP_{X|Y=y}(x) = P_{X|Y=y}(B).
\]
General elementary functions φ are linear combinations of indicator functions. The assertion then
follows from the above and the linearity property (BSP p.29, property (1)). For positive functions it
follows by monotone convergence of conditional expectations. Finally we write φ = φ+ − φ− , and
then the result follows again from linearity.
QED
We have reduced the problem of computing conditional expectations, to the problem of computing
conditional probability distributions. Does this help?
Very often, a problem already is formulated in terms of conditional distributions. If this is not the
case, one can do something in the following case.
Say X, Y have a joint probability density fX,Y , with respect to the Lebesgue measure λ2 on (R2 , B 2 ):
\[
P\{X \in B', Y \in B\} = \iint_{x\in B',\, y\in B} f_{X,Y}(x, y)\, d\lambda^2(x, y).
\]
Then f_Y(y) = \int_{\mathbb{R}} f_{X,Y}(x, y)\, d\lambda(x) acts as a probability density of Y .
Define the elementary conditional pdf (= probability density function) of X given Y as
\[
f_{X|Y=y}(x) =
\begin{cases}
\dfrac{f_{X,Y}(x,y)}{f_Y(y)}, & \text{if } f_Y(y) \ne 0 \\
0, & \text{otherwise.}
\end{cases}
\]
Then
\[
P_{X|Y=y}(A) = P\{X \in A|Y = y\} = \int_{x\in A} f_{X|Y=y}(x)\, d\lambda(x),
\qquad
E(\varphi(X)|Y = y) = \int \varphi(x)\, f_{X|Y=y}(x)\, d\lambda(x).
\]
This material is contained in BSP exercise 2.16 and Remark 2.3, and you should be able to do the
derivations with the help of BSP. One can check the validity of all this by verifying the definition of conditional
expectation, rewriting (3.3).
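The elementary conditional pdf recipe can be exercised numerically. The sketch below is our own illustration with the arbitrary joint density f_{X,Y}(x, y) = x + y on the unit square (the closed form for E(X|Y = y) is our own computation for this particular density, to be checked by hand); a crude midpoint rule replaces the integrals.

```python
# Joint density f_{X,Y}(x, y) = x + y on the unit square (an illustrative
# choice, not from the notes).  We evaluate f_Y and E(X|Y=y) by the
# elementary conditional-pdf recipe, using a midpoint rule on [0, 1].

def f_xy(x, y):
    return x + y

m = 2000
mids = [(i + 0.5) / m for i in range(m)]  # midpoints of [0, 1]

def f_y(y):
    # f_Y(y) = integral over x of f_{X,Y}(x, y)
    return sum(f_xy(x, y) for x in mids) / m

def cond_mean(y):
    # E(X|Y=y) = int x f_{X|Y=y}(x) dx = int x f_{X,Y}(x, y) dx / f_Y(y)
    return sum(x * f_xy(x, y) for x in mids) / m / f_y(y)

y = 0.25
exact = (2 + 3 * y) / (3 * (1 + 2 * y))  # closed form for this density
print(cond_mean(y), exact)
```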
Extra observation on conditioning Often, the same random variable appears in the conditioning and as part of the random variable that we take the conditional expectation of. For instance,
E\big(\sum_{i=1}^{T} X_i \,\big|\, T = t\big), where T is a r.v. with positive integer values. Intuitively it is clear that we can
insert the value t for T in the conditioning: E\big(\sum_{i=1}^{T} X_i \,\big|\, T = t\big) = E\big(\sum_{i=1}^{t} X_i \,\big|\, T = t\big). Is this true
generally?
For some cases we do know this already:
i) E(X + f (Y )|Y = y) = E(X|Y = y) + f (y) = E(X + f (y)|Y = y), by linearity of conditional
expectations, for any Borel function f ;
ii) or, E(Xf (Y )|Y = y) = E(X|Y = y)f (y) = E(Xf (y)|Y = y), by “taking out what is known”.
How can one prove this in case of the above example of a random sum?
Let X, Y be given r.v.s on a probability space (Ω, F, P). Let us consider the functions f : R2 → R,
with f B 2 -measurable. A π-system for B 2 is for instance the collection of product sets \{(-\infty, x] \times
(-\infty, y] \mid x, y \in \mathbb{R}\}. The question is whether E(f (X, Y )|Y = y) = E(f (X, y)|Y = y) PY -a.s.
Define H as the collection of bounded Borel functions f : R2 → R with E(f (X, Y )|Y = y) =
E(f (X, y)|Y = y) PY -a.s. It is straightforward to check that H is a monotone class.
Let f = 1_{\{(-\infty,a]\times(-\infty,b]\}}. If f ∈ H for all a, b ∈ R, then H contains all bounded B 2 -measurable
functions by the Monotone Class Theorem. We check that f ∈ H:
\[
E(f(X, Y)|Y) = E(1_{\{X\le a\}} 1_{\{Y\le b\}}|Y) = 1_{\{Y\le b\}}\, E(1_{\{X\le a\}}|Y).
\]
On the other hand,
\[
E(f(X, y)|Y) = E(1_{\{X\le a\}} 1_{\{y\le b\}}|Y) = 1_{\{y\le b\}}\, E(1_{\{X\le a\}}|Y).
\]
On the set Y^{-1}(y) one has 1_{\{y\le b\}} = 1_{\{Y\le b\}} (the first is a constant function on Ω, either 0
everywhere or 1 everywhere). So the result is proved, if we take the same version of E(1_{\{X\le a\}}|Y).
Now, think for yourself how to extend this to unbounded B 2 -measurable functions f .
Another approach is to use joint measures: P\{X \le a, Y \le b\} defines a probability measure PX,Y on
(R2 , B 2 ). We have seen that
\[
P_{X,Y}\{(-\infty, a] \times (-\infty, b]\} = P\{X \le a, Y \le b\} = \int_{y\le b} \int_{x\le a} dP_{X|Y=y}(x)\, dP_Y(y).
\]
Under assumed regularity conditions, for arbitrary Borel sets B 2 ∈ B 2 one gets by standard procedures
\[
P_{X,Y}(B^2) = \int_{y\in B_y} \int_{x : (x,y)\in B^2} dP_{X|Y=y}(x)\, dP_Y(y),
\]
with B_y = \{y \in \mathbb{R} : \exists x \text{ such that } (x, y) \in B^2\}. So we have an identity for measures. Now, by going
through the standard machinery of indicator functions, elementary functions, positive and general
functions, one can show that
\[
\int_\Omega f(X, Y)\, dP = \int_{x,y} f(x, y)\, dP_{X,Y}(x, y)
= \int_y \int_x f(x, y)\, dP_{X|Y=y}(x)\, dP_Y(y) = \int_y E(f(X, y)|Y = y)\, dP_Y(y),
\]
provided that E(f (X, y)|Y = y) is a B-measurable function on R! Can you prove this from the standard
machinery?
Recall that E(f (X, Y )|Y ) = g(Y ) for some Borel function g. Our goal is to prove that one can
take g(y) = E(f (X, y)|Y = y) (presuming we have proved measurability). To avoid confusion of notation,
write h(y) = E(f (X, y)|Y = y). We get
\[
\int_{\omega\in Y^{-1}(B)} h(Y)\, dP = \int_{y\in B} h(y)\, dP_Y(y)
= \int_{y\in B} \int_x f(x, y)\, dP_{X|Y=y}(x)\, dP_Y(y)
= \int_{y\in B,\, x\in\mathbb{R}} f(x, y)\, dP_{X,Y}(x, y)
= \int_{\omega\in Y^{-1}(B)} f(X, Y)\, dP.
\]
It follows that h(Y ) is a version of the conditional expectation E(f (X, Y )|Y ).
In case X and Y are independent, we have a simpler expression, since then P_{X|Y=y}(B) =
P_X(B): h(y) = E(f (X, y)|Y = y) = E_X(f (X, y)), where we take the unconditional expectation w.r.t.
X.
Help variables Sometimes it is convenient to consider 'mixtures' of conditional expectations in the
following sense. Let X, Y, Z be r.v.s on (Ω, F, P). One can then speak of E(X|Y, Z = z). Let g(Y, Z)
be a Borel function that is a.s. equal to E(X|Y, Z). Then E(X|Y, Z = z) = g(Y, z), where Y is left
unspecified.
Since σ(Z, Y ) ⊃ σ(Y ), the Tower property yields that E(E(X|Y, Z)|Y ) = E(X|Y ).
Let us consider E(E(X|Y, Z)|Y ) = E(g(Y, Z)|Y ). We are in the above situation: E(g(Y, Z)|Y = y) =
E(g(y, Z)|Y = y) = \int_z g(y, z)\, dP_{Z|Y=y}(z). Now, if Z and Y are independent, we find
\[
E(g(Y, Z)|Y = y) = \int_z g(y, z)\, dP_Z(z),
\]
so that E(X|Y = y) = \int_z g(y, z)\, dP_Z(z). Hence, if the conditional expectation E(X|Y, Z) = g(Y, Z)
is easy to calculate, this may help to solve the more complicated problem of calculating E(X|Y ).
Problem 3.12 Try to justify all these steps.
This procedure may help to attack BSP exercise 2.6 in a more structured way.
Problem 3.13 Let X = ξ and Y = η from exercise 2.6. Define an appropriate r.v. Z, such that
E(X|Y, Z) can be directly calculated. Compute the desired conditional expectation E(X|Y ).
One can derive many convenient statements about these ‘mixed’ conditional distributions. Let
X, Y1 , . . . , Yn , Z be r.v.s on the same probability space (Ω, F, P).
Problem 3.14 i) Show that
\[
E(X|Z = z) = E(E(X|Y_1, \ldots, Y_n, Z = z)|Z = z).
\]
ii) Suppose that \{Z = z\} \in \sigma(Y_1, \ldots, Y_n). Show that
\[
E(E(X|Y_1, \ldots, Y_n)|Z = z) = E(E(X|Y_1, \ldots, Y_n, Z = z)|Z = z).
\]

Problem 3.15 Let X, Y be independent r.v.s with X \stackrel{d}{=} \exp(\lambda), Y \stackrel{d}{=} \exp(\mu). Show that
\[
\min\{X, Y\} \stackrel{d}{=} \exp\{\lambda + \mu\}.
\]
Problem 3.16 Let X1 , . . . , Xn be i.i.d. r.v.s with the homogeneous (uniform) distribution on (0, 1)
(X_i \stackrel{d}{=} \mathrm{Hom}(0, 1)).
i) Determine the distribution function FZ and density fZ of Z = max(X1 , . . . , Xn ).
ii) Calculate P\{Z \le z|X_1 = x\} and the density f_{Z|X_1=x}(z).
iii) Calculate P\{X_1 \le x|Z = z\} and P\{X_1 \le x|Z\}. Hint: use (ii). Calculate E(X_1|Z).

Problem 3.17 Let U, V be i.i.d. r.v.s, with U, V \stackrel{d}{=} \mathrm{Hom}(0, 1). Let X = min(U, V ) and Y = max(U, V ).
Calculate P\{Y \le y|X\} and calculate E(Y |X).
Problem 3.18 Let X1 , . . . , Xn be i.i.d. r.v.s with continuous distribution function F. Let X =
max\{X_1, \ldots, X_n\} and Y = min\{X_1, \ldots, X_n\}. Prove the following statements.
i)
\[
P\{Y > y|X = x\} = \Big(\frac{F(x) - F(y)}{F(x)}\Big)^{n-1}, \qquad y < x.
\]
ii)
\[
P\{X_k \le x|X = t\} =
\begin{cases}
\frac{n-1}{n} \cdot \frac{F(x)}{F(t)}, & x < t \\
1, & x \ge t.
\end{cases}
\]
iii)
\[
E(X_k|X = t) = \frac{n-1}{n \cdot F(t)} \int_{-\infty}^{t} y \, dF(y) + \frac{t}{n}.
\]
Problem 3.19 Gambler's ruin. A man is saving money to buy a new Jaguar at the cost of N units
of money. He starts with k (1 < k < N ) units and tries to win the remainder by the following
gamble with his bank manager. He tosses a fair coin repeatedly; if it comes up heads the manager
pays him one unit, but if it comes up tails then he pays the bank manager one unit. He plays this
game repeatedly, until one of two events occurs: either he runs out of money and is bankrupted, or
he wins enough to buy the Jaguar. What is the probability that he is ultimately bankrupted?
Let Ak denote the event that he is eventually bankrupted, given an initial capital of k units. Write
pk = P\{A_k\}. Let B be the event that the first toss of the coin shows heads.
Conditioning on B yields a linear relation between pk , pk−1 and pk+1 , for k = 1, . . . , N − 1. This is
a linear difference equation with boundary conditions p0 = 1, pN = 0.
A trick to solve this (and many similar problems) is to look at the differences bk = pk − pk−1 . The
linear difference equation then transforms into a linear relation between bk and bk+1 .
i) Solve it and determine pk .
ii) One can look at the problem from a different point of view. Let T be the first time our man
either is bankrupted or has collected the money for buying the Jaguar. Show that T is a
stopping time. Assume that it is finite with probability 1 and has finite expectation. Use this
to derive the same formula for pk .
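For a fair coin, solving the difference equation of Problem 3.19 gives the classical answer p_k = (N − k)/N; treat this as a claim to check against your own solution of part (i). A simulation sketch (our own illustration; the parameter choices are arbitrary):

```python
import random

random.seed(3)

def ruined(k, N):
    # Play the fair coin game from capital k until 0 (ruin) or N (Jaguar).
    while 0 < k < N:
        k += random.choice((1, -1))
    return k == 0

N, k, trials = 10, 3, 50_000
freq = sum(ruined(k, N) for _ in range(trials)) / trials
print(freq, (N - k) / N)  # empirical ruin frequency vs classical p_k = (N-k)/N
```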
Problem 3.20 Now the man follows another strategy. He starts by betting one unit of money. If
heads comes up, the manager pays him his bet; if tails comes up, he loses his bet to the manager.
Every time he wins, he increases his bet by one, but he will never bet more than his present capital
or the remainder needed to buy the Jaguar. If he loses, he decreases the next bet by one, again with
the condition that he will not bet more than his present capital or the sum needed to buy the
Jaguar. He will always bet at least 1.
Denote by Sn his capital after n bets, S0 = k being his initial capital. Let T again denote the moment
that the man stops betting. Then let us simply model that the man's capital remains the same forever
after.
i) Show that E(Sn+1 |S0 , . . . , Sn ) = Sn .
ii) Show that E(Sn ) = S0 .
iii) Assume that we may conclude that E(ST ) = S0 . Determine now the probability that the man
gets bankrupted. How do both strategies compare?
Problem 3.21 A biased coin is tossed repeatedly. Each time there is a probability p of a head
turning up. Let pn be the probability that an even number of heads has occurred after n tosses
(zero is an even number). Then p0 = 1. Derive an expression for pn in terms of pn−1 and use it to
calculate pn , n = 1, 2, . . ..
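A worked answer for Problem 3.21 can be checked numerically. Both formulas in the sketch below, the recursion p_n = (1 − p)p_{n−1} + p(1 − p_{n−1}) and the closed form p_n = (1 + (1 − 2p)^n)/2, are our own derivation, to be compared with yours:

```python
# An even number of heads after n tosses occurs if there were evenly many
# after n-1 tosses and toss n is a tail, or oddly many and toss n is a head:
#     p_n = (1 - p) p_{n-1} + p (1 - p_{n-1}),   p_0 = 1.
# The closed form p_n = (1 + (1 - 2p)^n) / 2 solves this recursion.

p = 0.3
pn = 1.0  # p_0 = 1: zero heads is an even number
for n in range(1, 11):
    pn = (1 - p) * pn + p * (1 - pn)
    closed = (1 + (1 - 2 * p) ** n) / 2
    assert abs(pn - closed) < 1e-12  # recursion and closed form agree

print(pn)  # p_10
```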
Sequences of r.v.s and some examples
Gambling systems (cf. BSP Ch. 3) A casino offers the following game consisting of n rounds. In
every round t the gambler bets αt ≥ 0. His bet in round t may depend on his knowledge of the game's past.
The outcomes ηt , t = 1, . . . of the game are i.i.d. r.v.s with values in \{-1, 1\} and P\{\eta_t = 1\} = 1/2 =
P\{\eta_t = -1\}. The gambler's capital at time t is therefore X_t = \sum_{i=1}^{t} \alpha_i \eta_i.
A gambling strategy α1 , α2 , . . . is called admissible if αt is σ(η1 , η2 , . . . , ηt−1 )-measurable. In words,
this means that the gambler has no prophetic abilities: his bet at time t depends exclusively on
the observed past history.
Example: \alpha_t = 1_{\{\eta_t > 0\}} ("only bet if you will win") is not admissible.
Problem 3.22 By the distribution of outcomes, one has E(Xt ) = 0. Prove this.
One has that T = \min\{t \mid X_t \le \alpha\} is a stopping time, since \{T \le t\} = \bigcup_{l=0}^{t} \{X_l \le \alpha\} and
\[
\{X_l \le \alpha\} \in \sigma(\eta_1, \ldots, \eta_l) \subset \sigma(\eta_1, \ldots, \eta_t), \qquad l \le t.
\]
Now, \alpha_t = 1_{\{T > t-1\}} = 1_{\{T \ge t\}} \in \sigma(\eta_1, \ldots, \eta_{t-1}) defines an admissible gambling strategy with
\[
X_t = \sum_{j=1}^{t} \alpha_j \eta_j = \sum_{j=1}^{t} 1_{\{T \ge j\}} \eta_j = \sum_{j=1}^{\min\{t,T\}} \eta_j = S_{\min\{t,T\}},
\]
where S_t = \sum_{j=1}^{t} \eta_j. Hence E(S_{\min\{t,T\}}) = 0 if T is a stopping time.
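The identity E(S_{min{t,T}}) = 0 can be checked by simulation. The sketch below (our own illustration; the barrier α = −3 and the horizon are arbitrary choices) averages the stopped walk over many runs:

```python
import random

random.seed(4)

def stopped_sum(t_max=50, alpha=-3):
    # Fair +-1 walk stopped the first time it drops to `alpha`;
    # returns S_{min(t_max, T)} with T = min{t : S_t <= alpha}.
    s = 0
    for _ in range(t_max):
        if s <= alpha:
            break
        s += random.choice((1, -1))
    return s

n = 200_000
avg = sum(stopped_sum() for _ in range(n)) / n
print(avg)  # near 0: the admissible strategy 1_{T >= t} cannot shift the mean
```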
Hedging We have seen that the above gambling strategies cannot modify the expectation: on the
average the gambler wins and loses nothing. Apart from that, which payoffs can one obtain by
gambling?
We discuss a simple model for stock options. Assume that the stock price either increases by 1 or
decreases by 1 every day, with probability 1/2, independently from day to day. Suppose I own αt
units of stock at time t. Then the value of my portfolio increases by αt ηt every day (ηt are defined
as in the gambling section).
Suppose the bank offers the following contract “European option”: at a given time t one has the
choice to buy 1 unit of stock for price C or not to buy it. C is specified in advance. Our pay-off per
unit stock is (St − C)+ . In exchange, the bank receives a deterministic amount E((St − C)+ ).
Can one generate the pay-off by an appropriate gambling strategy? The answer is yes, and in fact
much more is true.
Lemma 3.4 Let Y be a σ(η1 , . . . , ηn )-measurable function. Then there is a gambling strategy
α1 , . . . , αn such that
\[
Y - E(Y) = \sum_{j=1}^{n} \alpha_j \eta_j.
\]
Proof. Write Fn = σ(η1 , . . . , ηn ). Define αj by
αj ηj = E(Y |Fj ) − E(Y |Fj−1 ).
We have to show that αj is Fj−1 -measurable.
Problem 3.23 i) Show that E(αj ηj |Fj−1 ) = 0.
ii) Use this fact to show that E(αj |Fj−1 , ηj = 1) = E(αj |Fj−1 , ηj = −1).
Now αj is Fj−1 -measurable if
\[
\alpha_j = \frac{1}{\eta_j}\, E(\alpha_j \eta_j \mid \mathcal{F}_{j-1}, \sigma(\eta_j))
\]
does not depend on the value of ηj . But this follows from the above.
Problem 3.24 Explain this.
We conclude that αj , j = 1, . . . is a gambling strategy. The result follows by addition.
QED
Problem 3.25 Symmetry Let X1 , . . . be i.i.d. r.v.s with finite expectation. Let Sn = X1 + · · · + Xn .
In general X1 is not σ(Sn )-measurable for n ≥ 2. Explain, and give an example.
Show that with probability 1 we have
\[
E(X_1|S_n) = \cdots = E(X_n|S_n) = \frac{1}{n}\, E(X_1 + \cdots + X_n|S_n) = \frac{1}{n}\, E(S_n|S_n) = \frac{S_n}{n}.
\]

4 Martingales
From now on we will mainly list homework problems. As a basic ‘datum’ we take a filtered space
(Ω, F, {Fn }n , P). Here (Ω, F, P) is a probability space and Fn ⊂ F, n = 1, . . . is a filtration, that is
F1 ⊆ F2 ⊆ F3 ⊆ · · ·.
Define F∞ = σ(∪n Fn ).
Let {Mn } be a supermartingale, adapted to the filtration {Fn }n .
Problem 4.1 Suppose that S and T are stopping times with respect to the filtration. Show that
min(S, T ) = S ∧ T , max(S, T ) = S ∨ T and S + T are stopping times.
Suppose that T is a stopping time that is finite with probability 1.
Then {Mn∧T }n is a supermartingale (provided that E|Mn∧T | < ∞) and hence E(Mn∧T ) ≤ E(M0 ).
Under what conditions is
E(MT) ≤ E(M0)?    (4.1)
Basically one needs a condition ensuring that
E(MT) ≤ lim_{n→∞} E(Mn∧T)    (4.2)
in the supermartingale case, or
E(MT) = lim_{n→∞} E(Mn∧T)
in the martingale case.
in the martingale case. The latter amounts to justifying interchange of limit and expectation.
BSP gives general conditions for this to happen in the form of (Doob’s) Optional Stopping Theorem.
We can also give simpler conditions that often apply and for which (4.2) can be proved in a more
direct manner.
We give another form of the Optional Stopping Theorem.
Theorem 4.1 (Doob’s Optional Stopping Theorem) i) Let {Mn}n be a supermartingale and T an a.s. finite stopping time. One has E|MT| < ∞ and (4.1) in each of the following cases.
1. T is a.s. bounded: T (ω) ≤ N for almost all ω ∈ Ω, for some constant N .
2. Mn (ω) ≤ C for some constant C, for almost all ω, n = 0, 1, . . ..
3. E(T ) < ∞ and |Mn (ω) − Mn−1 (ω)| ≤ C for some constant C, for a.a. ω, n = 1, . . ..
ii) If {Mn }n is a martingale then E(MT ) = E(M0 ) under any of the conditions 1,2 or 3.
iii) Martingale transformation Suppose that {Mn }n is a martingale and T a stopping time satisfying
(i, 3). Let {αn }n be an admissible gambling strategy adapted to {Fn }n (or a previsible process),
such that |αn (ω)| ≤ C2 for a.a. ω, n = 1, . . ., for some constant C2 . Then
E(Σ_{n=1}^T αn(Mn − Mn−1)) = 0,
in other words, on the average, we cannot turn a neutral game into a profitable (or losing) one.
iv) If {Mn }n is a non-negative supermartingale and T is a.s. finite, then (4.1) again applies.
Problem 4.2 i) Prove parts (i,ii,iii) of the above Optional Stopping Theorem.
ii) Prove (iv). Deduce that λP{supn Mn ≥ λ} ≤ E(M0 ).
A problem in applying this theorem is to check that the stopping time is a.s. finite, let alone that it has finite expectation.
There is a simple result, which applies in many cases.
Lemma 4.2 (What always stands a reasonable chance of happening will a.s. happen, sooner rather than later.) Let T be a stopping time on the filtered space (Ω, F, (Fn)n, P). Suppose T has the property that for some N ∈ Z+ and some ε > 0,
P{T ≤ t + N | Ft} > ε,  a.s.,  t = 1, 2, . . .
Then E(T) < ∞; in particular T < ∞ a.s.
Problem 4.3 Prove Lemma 4.2. Hint: using that P{T > kN} = P{T > kN, T > (k − 1)N}, prove by induction that P{T > kN} ≤ (1 − ε)^k.
Monkey typing ABRACADABRA At each of the times 1, 2, 3, . . . a monkey types a capital letter at random. The letters form an i.i.d. sequence of r.v.s, drawn uniformly from the 26 possible capital letters.
Just before each time t = 1, 2, 3, . . ., a new gambler arrives, carrying €1 in his pocket. He bets €1 that the t-th letter will be A. If he loses, he leaves; if he wins he receives 26 times his bet (so that his total capital after his first bet is €26). He then bets all of his €26 on the event that the (t + 1)-th letter will be B. If he loses, he leaves. If he wins, he bets his whole fortune of €26² on the event that the (t + 2)-th letter will be R. And so forth through the whole ABRACADABRA sequence. Let T be the first time by which the monkey has produced the ABRACADABRA sequence. Once this sequence has been produced, gamblers stop arriving and nothing happens anymore.
Problem 4.4 i) Put M0 = 0. Show that the total accumulated gain Mt by the gamblers at time
t, t = 0, 1, 2, . . ., is a martingale (loss is a negative gain).
ii) Show that T is a.s. finite with E(T ) < ∞.
iii) Explain why martingale theory makes it intuitively obvious that
E(T) = 26^11 + 26^4 + 26.
Prove this.
iv) Can you make a guess of the expected time till the monkey has typed 10 successive A’s? Explain
intuitively.
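For (iii), the gambling-team argument values E(T) as the total stake paid in by the gamblers: each prefix of the pattern that is also a suffix contributes 26^m, m its length. The Python sketch below (our own illustration, not from BSP) checks this formula exactly for ABRACADABRA and by simulation for a small pattern over a 2-letter alphabet:

```python
import random

def martingale_value(pattern, k):
    """Sum of k^m over the prefixes of `pattern` (length m) that are also
    suffixes -- the value of E(T) predicted by the gambling argument."""
    n = len(pattern)
    return sum(k ** m for m in range(1, n + 1) if pattern[:m] == pattern[n - m:])

def expected_wait(pattern, alphabet, trials=20000, seed=0):
    """Monte Carlo estimate of E(T), where T is the first time the i.i.d.
    uniform letter stream ends with `pattern`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        buf, t = "", 0
        while not buf.endswith(pattern):
            buf = (buf + rng.choice(alphabet))[-len(pattern):]
            t += 1
        total += t
    return total / trials

# ABRACADABRA: the self-overlapping prefixes are ABRACADABRA, ABRA and A
assert martingale_value("ABRACADABRA", 26) == 26**11 + 26**4 + 26
# a small case that can actually be simulated: 'ABA' over a 2-letter alphabet
assert martingale_value("ABA", 2) == 2**3 + 2
assert abs(expected_wait("ABA", "AB") - 10) < 0.5
```

The same function answers (iv) in spirit: a run of identical letters overlaps itself at every length, which makes its expected waiting time much larger than that of a non-overlapping pattern of the same length.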
Simple and asymmetric random walks Let Xn be a simple or asymmetric random walk on the integers. Then Xn is a martingale whenever p = 1/2, a supermartingale whenever p < 1/2 and a submartingale whenever p > 1/2.
First consider a finite interval (a, b), such that X0 ∈ (a, b). Let T be the first time that Xn leaves
this interval, i.e.
T = min{n | Xn ∉ (a, b)}.
Problem 4.5 i) Show that Xn − n(2p − 1) is a martingale.
ii) Show that T is a.s. finite and has finite expectation.
Let p = 1/2.
iii) Compute P{XT = a} and P{XT = b}, using the martingale from (i).
iv) Compute E(T ). Hint: use one of the ways discussed in BSP or during the lectures, to define a
suitable related martingale.
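The martingale answers for the symmetric case, P{XT = a} = (b − x)/(b − a) and E(T) = (x − a)(b − x), which (iii) and (iv) ask you to derive, can be confirmed by simulation. A short Python sketch (our own illustration):

```python
import random

def ruin_stats(x, a, b, trials=20000, seed=1):
    """Simulate the symmetric walk from x until it leaves (a, b); return
    the empirical probability of exiting at a and the mean exit time."""
    rng = random.Random(seed)
    hits_a, total_t = 0, 0
    for _ in range(trials):
        pos, t = x, 0
        while a < pos < b:
            pos += rng.choice((-1, 1))
            t += 1
        hits_a += (pos == a)
        total_t += t
    return hits_a / trials, total_t / trials

p_a, mean_t = ruin_stats(x=2, a=0, b=5)
# optional stopping predictions: P{X_T = a} = (b-x)/(b-a), E(T) = (x-a)(b-x)
assert abs(p_a - 3 / 5) < 0.02
assert abs(mean_t - 6) < 0.3
```

The first prediction comes from applying optional stopping to the martingale Xn itself, the second from the related martingale Xn² − n.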
In the case that e.g. b = ∞, it should be intuitively obvious that T is a.s. finite whenever p ≤ 1/2, but that it need not be whenever p > 1/2. Let a = 0, X0 = 1, b = ∞; that is, we are interested in the probability that the random walk will hit 0. There are many ways of investigating this. Here we aim to use methods discussed in Ch. 3 of BSP and the notes.
Problem 4.6 i) Use the previous problem to show that T is a.s. finite in the symmetric case. Show
that E(T ) = ∞.
ii) Assume that p < 1/2. Show that T is a.s. finite by using the martingale from (i) of the previous
problem. Hint: show that E(n ∧ T ) ≤ 1/(1 − 2p). Deduce that E(T ) < ∞.
This still leaves the case p > 1/2. A simple technique coming from Markov chain theory and potential
theory helps. We formulate it in a more general context.
Lemma 4.3 Let {Xn}n be a stochastic process with values in Z, adapted to the filtration {Fn}n. Let H be the collection of functions f : Z → R with the following properties: f ≥ 0, and {f(Xn)}n is a supermartingale (adapted to {Fn}n), for any initial position X0 = x. In Markov chain theory such functions are called non-negative superharmonic functions. Let T0 = min{n > 0 | Xn = 0}.
i) Let x > 0 be given. Suppose that P{T0 < ∞ | X0 = x} = 1. Then f(0) ≤ f(x).
ii) Show that the stopping time T for the asymmetric walk with p > 1/2 is infinite with positive probability. Hint: construct a function f with f(0) > f(x), such that f(Xn) is a martingale.
Martingale formulation of Bellman’s optimality principle. Your winnings per unit stake on game n are εn, where the εn are i.i.d. r.v.s with
P{εn = 1} = p = 1 − P{εn = −1},
with p > 1/2. Your bet αn on game n must lie between 0 and Zn−1, your capital at time n − 1. Your object is to maximise your ‘interest rate’ E log(ZN/Z0), where N = length of the game is finite and Z0 is a given constant. Let Fn = σ(ε1, . . . , εn) be your ‘history’ up to time n. Let {αn}n be an admissible strategy.
Problem 4.7 Show that log(Zn) − nα is a supermartingale, with α the entropy given by
α = p log p + (1 − p) log(1 − p) + log 2.
Hence E log(ZN/Z0) ≤ Nα. Show also that for some strategy log(Zn) − nα is a martingale. What is the best strategy?
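As a numerical aside (not part of BSP), the entropy bound is easy to check directly: betting the fixed fraction f of current capital gives one-step expected log-growth g(f) = p log(1 + f) + (1 − p) log(1 − f), which is maximised at f = 2p − 1 (the well-known proportional betting rule) with maximum exactly α. A short Python sketch:

```python
import math

def growth(f, p):
    """One-step expected log-growth when betting the fraction f of capital."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

p = 0.6
alpha = p * math.log(p) + (1 - p) * math.log(1 - p) + math.log(2)
f_star = 2 * p - 1                    # proportional ("Kelly") betting fraction

assert abs(growth(f_star, p) - alpha) < 1e-12   # the bound is attained at f*
for f in (0.05, 0.15, 0.35, 0.5):
    assert growth(f, p) < alpha                 # any other fraction does worse
```

With this strategy log(Zn) − nα is a martingale, which is the content of the last part of the problem.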
5 Martingale convergence problems
Let the filtered probability space (Ω, F, {Fn }n , P) be given. All processes are again processes on this
space, adapted to the filtration {Fn }n .
A summary of the L1 -supermartingale convergence theorem is as follows.
Theorem 5.1 (BSP Thm.4.3, 4.4) Let {Mn }n=0,1,... be a UI supermartingale. Then Mn → M∞
a.s. for some r.v. M∞ , and even Mn → M∞ in L1 , i.e.
E|Mn − M∞| → 0, n → ∞.
If {Mn }n is a martingale, then
Mn = E(M∞ |Fn ),
and so {Mn }n is a Doob type martingale w.r.t. M∞ .
It is useful to quote the following theorem, which extends BSP exercise 4.5.
Theorem 5.2 (Levy’s ‘Upward’ Theorem) Let X be a r.v. with E|X| < ∞. Then Mn = E(X|Fn) is a UI martingale. With M∞ = a.s. limn→∞ Mn, we have
M∞ = E(X|F∞), a.s.,
where F∞ = σ(∪n Fn).
That M∞ = E(X|F∞ ) is by no means trivial. It amounts again to justifying a limit interchange:
limn E(X|Fn ) = E(X|σ(limn Fn )).
Proof. We only have to prove that M∞ = E(X|F∞ ), a.s.
Let Y = E(X|F∞ ), a.s., and suppose that P(Y 6= M∞ ) > 0. We may assume that X ≥ 0, a.s. Define
two measures on (Ω, F∞ ):
µ1 (A) = E(Y 1{A} ),
A ∈ F∞ .
µ2 (A) = E(M∞ 1{A} ),
For B ∈ Fn we have B ∈ F∞ and so
µ1 (B) = E(Y 1{B} )
def of Y
=
E(X1{B} )
def of Mn
=
E(Mn 1{B} )
BSP Thm. 4.4
=
E(M∞ 1{B} ) = µ2 (B).
Hence µ1 and µ2 agree on the π-system ∪n Fn and therefore they agree on F∞ .
Now Y is F∞ -measurable. Take M∞ = lim supn Mn , then F∞ -measurable. Hence F = 1{Y >M∞ } is
F∞ -measurable and so
E((Y − M∞ )1{F } ) = µ1 (F ) − µ2 (F ) = 0.
Since (Y − M∞ )1{F } ≥ 0, it follows that P{F } = 0. Similarly, P{Y < M∞ } = 0.
QED
Theorem 5.3 (Kolmogorov’s 0-1 law) Let X1 , . . . be a sequence of independent r.v.’s. Then
P{A} = 0 or 1 for all A ∈ T , with T the tail-σ-algebra.
Proof. Define Fn = σ(X1 , . . . , Xn ). Let A ∈ T , and let X = 1{A} . By Levy’s upward theorem,
X = E(X|F∞ ) = lim E(X|Fn ),
n
a.s.
Now X is Tn+1 -measurable. Since Tn+1 and Fn are independent, it follows that X is independent
of Fn . And so, E(X|Fn ) = E(X) = P{A}. Consequently, X = P{A}, a.s. The result follows, since
indicator functions take only the value 0 or 1.
QED
There is a nice proof of the strong Law of Large Numbers using Kolmogorov’s 0-1 Law. To this end
we will in fact use so-called ‘reverse martingales’.
Theorem 5.4 (Levy’s Downward Theorem) Let (Ω, F, P) be a probability space. Let {F−n}n=0,... be a non-increasing collection of sub-σ-algebras of F with
F−1 ⊇ F−2 ⊇ · · · ⊇ F−n ⊇ · · · ⊇ F−∞ = ∩n F−n.
Let X be a r.v. with E|X| < ∞, and define M−n = E(X|F−n). Then M−∞ = limn→∞ M−n exists a.s. and in L1. Moreover M−∞ = E(X|F−∞), a.s.
Problem 5.1 Prove the theorem. Use the techniques that were used for Doob’s submartingale
convergence theorem, the L1 -convergence and Levy’s Upward Theorem.
Let X1, . . . be a sequence of i.i.d. r.v.s with finite expectation. Write Sn = Σ_{i=1}^n Xi. Define F−n = σ(Sn, Sn+1, . . .).
Problem 5.2 i) Show that E(X1|F−n) = Sn/n, a.s.
ii) Show that limn→∞ Sn /n exists a.s. and in L1 , and that it equals E(X1 ).
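The conclusion of (ii) is the strong law of large numbers, and it is easy to observe numerically. The short Python sketch below (illustrative only; exponential variables with mean 1 are an arbitrary choice) shows how tightly Sn/n clusters around E(X1) for large n:

```python
import random

def worst_gap(n, trials=100, seed=2):
    """Largest deviation |S_n/n - 1| over independent runs of n i.i.d.
    exponential(1) variables (so E(X_1) = 1)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        s = sum(rng.expovariate(1.0) for _ in range(n))
        worst = max(worst, abs(s / n - 1.0))
    return worst

assert worst_gap(10_000) < 0.06   # S_n/n is already close to 1 for n = 10^4
```

Of course a simulation shows concentration, not almost-sure convergence; the latter is exactly what the downward martingale argument delivers.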
Galton-Watson process- the simplest form of a branching process This is a simple model
for population growth, growth of the number of cells, etc. Suppose that we start with a population
of 1 individual at time 0, i.e. N0 = 1.
The number of t-th generation individuals is denoted by Nt. Individual n from this generation has a number of offspring Zt^n. We assume that Zt^n, n = 1, . . . , Nt, t = 0, . . . are bounded i.i.d. r.v.s, say
P{Zt^n = k} = pk,  k = 0, . . . , K,
for some constant K > 0, and that p0 > 0. Clearly, Nt+1 = Σ_{n=1}^{Nt} Zt^n and µ = E(N1) = Σ_k k pk.
We are interested in the extinction probability of the population as well as the expected time till
extinction. Define T = min{t | Nt = 0}.
Problem 5.3 i) Show that Nt/µ^t is a martingale with respect to an appropriate filtration.
Let now µ < 1, that is, on the average an individual produces less than one child.
ii) Show that Nt → 0 a.s. What does this imply for the extinction time T ?
iii) Show that Mt = α^{Nt} 1{Nt>0} is a contracting supermartingale, i.e. there exist α > 1 and β < 1 such that
E(Mt+1|Ft) ≤ βMt,  t = 1, . . .
iv) Show that this implies that E(T ) < ∞. What is the smallest bound on E(T ) you can get?
The case of a population that remains constant on the average is more complicated. Let TN = min{t | Nt = 0 or Nt ≥ N}. Intuitively it is clear that TN should be a.s. finite. In order to prove this, define the function
f(x) = P{0 < Nt < N for all t ≥ n | Nn = x}.
Problem 5.4 i) Show that f (Nt ) is a supermartingale.
ii) Show that this implies that f ≡ 0, for all values of µ. Hint: consider the value f (x∗ ) where
x∗ = argmax{f (x)}.
iii) Let µ = 1. Use (ii) to show that P{T < ∞} = 1. Is Nt a UI martingale in this case? Explain.
iv) Prove that P{T < ∞} < 1 whenever µ > 1. You can prove this by using arguments that you
have seen before during the course.
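For (iv), a standard branching-process fact (not proved in these notes) is that the extinction probability is the smallest root in [0, 1] of g(s) = s, where g(s) = Σ_k pk s^k is the offspring generating function; for µ > 1 this root is < 1. The Python sketch below (our own illustration, with an arbitrarily chosen supercritical offspring law) compares this root with a direct simulation:

```python
import random

p = {0: 0.25, 1: 0.25, 2: 0.5}     # offspring law; mean 1.25 > 1 (supercritical)

def g(s):
    """Offspring generating function g(s) = sum_k p_k s^k."""
    return sum(pk * s ** k for k, pk in p.items())

# the extinction probability is the smallest fixed point of g;
# iterating g from 0 converges to it
q = 0.0
for _ in range(200):
    q = g(q)

def extinct(rng, cap=400):
    """One Galton-Watson run; stop at 0 (extinct) or at `cap` individuals
    (counted as survival; extinction from there has probability q^cap)."""
    n = 1
    while 0 < n < cap:
        n = sum(rng.choices((0, 1, 2), weights=(0.25, 0.25, 0.5), k=n))
    return n == 0

rng = random.Random(7)
est = sum(extinct(rng) for _ in range(800)) / 800
assert abs(q - 0.5) < 1e-9     # roots of (1/2)s^2 - (3/4)s + 1/4 = 0: 1/2 and 1
assert abs(est - q) < 0.07
```

For this law the fixed-point equation can be solved by hand, which is why the first assertion checks against the exact value 1/2.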
We are still left with the question whether the average time till extinction is finite or not, in the
critical situation µ = 1. The answer is that E(T ) = ∞, for which there seems to exist no probabilistic
proof.
Problem 5.5 Find the simplest proof in the literature of this statement and write it down in your
own words.
6 Continuous time processes: the Wiener process or Brownian motion
Multivariate normal distribution Let X = (X1, X2, . . . , Xk)^T be a k-dimensional random vector. We say that X has a N(µ, Σ) distribution, with µ ∈ R^k and Σ a k × k positive definite matrix, if a^T X has the normal distribution N(a^T µ, a^T Σa) for all a ∈ R^k. The simultaneous distribution of X is then given by the density
f_X(x) = (1/√((2π)^k det(Σ))) exp{−(1/2)(x − µ)^T Σ^{−1}(x − µ)},  x ∈ R^k.
A first consequence is that for a non-singular matrix B, the vector BX has the N(µ*, Σ*) distribution with µ* = Bµ and Σ* = BΣB^T.
The definition implies all information on the components Xi and their covariances cov(Xi, Xj) = E(XiXj) − E(Xi)E(Xj). Putting a = ei, the i-th unit vector, we obtain that Xi has a N(µi, Σii) distribution. Using a = ei + ej, one can deduce that cov(Xi, Xj) = Σij.
By using the density, one can then show that Σij = 0 implies independence of Xi and Xj. This is a special property of normally distributed r.v.s.
Next we define Brownian motion.
Brownian motion or Wiener process The stochastic process W(t), t ∈ R+, defined on the probability space (Ω, F, P) is called a standard Brownian motion (or standard Wiener process) if
i) W(0) = 0 a.s.;
ii) (W(t1), . . . , W(tn)) has a multivariate normal distribution, for all n and all times 0 < t1 < t2 < · · · < tn;
iii) E(W(t)) = 0 for t > 0;
iv) cov(W(s), W(t)) = min(s, t);
v) W(·, ω) is a continuous function for a.a. ω ∈ Ω.
Problem 6.1 Let 0 < t1 < · · · < tn. By assumption (W(t1), . . . , W(tn)) has a multivariate normal distribution. Compute the covariance matrix.
Construction of Brownian motion on [0, 1] For each ω we will define a uniformly convergent
sequence of continuous functions Wl (t, ω), t ∈ [0, 1], l = 0, 1, . . ..
Define W0(0) = 0, and choose W0(1) = ∆0,0 ∼ N(0, 1). Extend W0(t), 0 < t < 1, by linear interpolation: W0(t) = t · W0(1).
Next let ∆1,1 ∼ N(0, 1/4) be drawn independently of ∆0,0. Define W1(0) = 0,
W1(1/2) = (1/2) W0(1) + ∆1,1
and W1(1) = W0(1), and define W1(t), t ≠ 0, 1/2, 1, by linear interpolation. It is easily checked that (W1(1/2), W1(1)) has a multivariate normal distribution with the properties (iii,iv). Indeed
cov(W1(1/2), W1(1)) = cov((1/2)W0(1), W0(1)) = (1/2)σ²(W0(1)) = 1/2 = min(1/2, 1).
Further
σ²(W1(1/2)) = (1/4)σ²(W0(1)) + σ²(∆1,1) = 1/4 + 1/4 = 1/2.
The construction of Wl+1(t) from Wl(t) is as follows. Let ∆l+1,j ∼ N(0, 2^{−(l+2)}), j = 1, . . . , 2^l, be independent and independent of ∆i,j, i ≤ l, j = 1, . . . , 2^{i−1}. Assign Wl+1(0) = 0,
Wl+1((2j − 1)2^{−(l+1)}) = Wl((2j − 1)2^{−(l+1)}) + ∆l+1,j,  j = 1, . . . , 2^l,
and
Wl+1(j2^{−l}) = Wl(j2^{−l}),  j = 1, . . . , 2^l.
For t ≠ j2^{−(l+1)}, j = 0, 1, . . . , 2^{l+1}, we define Wl+1(t) by linear interpolation:
Wl+1(t) = Wl+1(j2^{−(l+1)}) + ((t − j2^{−(l+1)})/2^{−(l+1)}) · (Wl+1((j + 1)2^{−(l+1)}) − Wl+1(j2^{−(l+1)})),  j2^{−(l+1)} < t < (j + 1)2^{−(l+1)}.
Then Wl+1(t), t = j · 2^{−(l+1)}, has the multivariate normal distribution with properties (iii,iv) from the definition of standard Brownian motion.
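The interpolation scheme above is easy to implement. The sketch below (illustrative Python, not from the notes) builds a few dyadic levels — each new level adds an independent midpoint displacement with variance 2^{−(l+2)} — and checks properties (iii)–(iv) at t = 1/2 and t = 1 by Monte Carlo:

```python
import random

def brownian_levels(levels, rng):
    """One sample of the dyadic construction on [0,1]: returns the values
    W(j * 2**-levels), j = 0..2**levels, built by midpoint refinement."""
    w = [0.0, rng.gauss(0.0, 1.0)]        # level 0: W(0) = 0, W(1) ~ N(0,1)
    for l in range(levels):
        new = [w[0]]
        for j in range(len(w) - 1):
            # midpoint = interpolated value + N(0, 2^{-(l+2)}) displacement
            mid = 0.5 * (w[j] + w[j + 1]) + rng.gauss(0.0, 2.0 ** (-(l + 2) / 2.0))
            new.extend([mid, w[j + 1]])
        w = new
    return w

rng = random.Random(11)
paths = [brownian_levels(4, rng) for _ in range(20000)]
n = len(paths)
mid = [path[len(path) // 2] for path in paths]    # W(1/2)
end = [path[-1] for path in paths]                # W(1)
cov = sum(a * b for a, b in zip(mid, end)) / n
var = sum(a * a for a in mid) / n
# property (iv): cov(W(1/2), W(1)) = min(1/2, 1) = 1/2 = var(W(1/2))
assert abs(cov - 0.5) < 0.03
assert abs(var - 0.5) < 0.03
```

Note that `gauss` takes a standard deviation, hence the exponent −(l+2)/2.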
Lemma 6.1 sup_{0≤t≤1} |Wn(t) − Wm(t)| → 0 a.s., n, m → ∞, i.e.
P{ω : sup_{0≤t≤1} |Wn(t, ω) − Wm(t, ω)| does not tend to 0 along some sequence nk,ω, mk,ω, k = 1, 2, . . .} = 0.
Proof. Let Xl,j, j = 1, . . . , 2^{l−1}, l = 1, . . . be a collection of i.i.d. N(0, 1)-distributed r.v.s. Clearly, for l ≥ 1,
sup_{0≤t≤1} |Wl(t) − Wl−1(t)| ≤ max{|∆l,j|, j = 1, . . . , 2^{l−1}} = 2^{−(l+1)/2} · max{|Xl,1|, . . . , |Xl,2^{l−1}|}.
Put
An = {ω : max_{l=1,...,n, j=1,...,2^{l−1}} |Xl,j(ω)| > 2·√(6 log(2^n − 1))}
(there are Σ_{l=1}^n 2^{l−1} = 2^n − 1 i.i.d. N(0, 1)-distributed r.v.s that determine the max). We have seen that P{An} ≤ 1/(2^n − 1)^2, and so the first Borel-Cantelli lemma implies that P{lim sup_{n→∞} An} = 0.
Put A = lim sup_{n→∞} An. Fix any ω ∈ A^c. There exists nω such that
sup_{0≤t≤1} |Wn(t, ω) − Wn−1(t, ω)| ≤ 2^{−(n+1)/2} · 2·√(6 log(2^n − 1)),  n ≥ nω.
Consequently, for m > n ≥ nω,
sup_{0≤t≤1} |Wm(t, ω) − Wn(t, ω)| ≤ Σ_{l=n+1}^m sup_{0≤t≤1} |Wl(t, ω) − Wl−1(t, ω)| ≤ Σ_{l=n+1}^m 2^{−(l+1)/2} · 2·√(6 log(2^l − 1)) → 0,  m, n → ∞.
QED
As a result, for ω ∈ A^c the sequence of continuous functions Wn(·, ω) has a continuous limit W(·, ω). To see that this limit defines a Brownian motion on [0, 1], we still have to do some work.
Let 0 < t1 < t2 < · · · < tn ≤ 1. We have to show that (W(t1), . . . , W(tn)) is a random vector with the desired properties. This is clearly true if all tk are dyadic rationals jk/2^l: at these points W(tk) = Wn(tk) for n ≥ l. Otherwise let tk^m > tk, tk^m → tk, m → ∞, be a sequence of dyadic rationals. Then
(W(t1, ω), . . . , W(tn, ω)) = lim sup_{m→∞} (W(t1^m, ω), . . . , W(tn^m, ω)),  ω ∈ A^c,
by continuity. If A = ∅ then the limsup is measurable. If A ≠ ∅ we need in fact that F be extended with all subsets of 0-probability sets. This is a little beyond the scope of the course.
Now (W(t1^m), . . . , W(tn^m)) converges a.s. to the random vector (W(t1), . . . , W(tn)). The corresponding multivariate normal densities converge to the desired multivariate normal density, hence the corresponding distribution functions converge. One can then show that the limit distribution function is the distribution function of (W(t1), . . . , W(tn)).
Lemma 6.2 Let (Ω, F, P) be a probability space. Let X and Xn , n = 1, 2, . . ., be r.v.s on this
probability space, such that Xn → X a.s. Let Fn (·) = P{Xn ≤ ·} be the distribution function of Xn
and assume that Fn → F for some distribution function F. Then F is the distribution function of X.
Problem 6.2 Prove this lemma - Fatou’s lemma for sets plays a role here.
With probability 1 Brownian motion paths W(t), 0 ≤ t, are nowhere differentiable. In
other words: there exists a set A ∈ F, P{A} = 0, such that W (·, ω) is nowhere differentiable
for all ω ∈ Ac .
Note: we assume that W(·, ω) has continuous paths on R+ for a.a. ω; we have proved this only for a compact time interval. Let
Xnk = max{ |W((k+1)/2^n) − W(k/2^n)|, |W((k+2)/2^n) − W((k+1)/2^n)|, |W((k+3)/2^n) − W((k+2)/2^n)| }.
These differences have the distribution of 2^{−n/2} · W(1), and so
P{Xnk ≤ ε} = (P{|W(1)| ≤ 2^{n/2} ε})^3 ≤ (2 · 2^{n/2} · ε)^3,
since the density of the standard normal distribution is bounded by 1. For Yn = min_{k≤n2^n} Xnk, we have
P{Yn ≤ ε} ≤ n · 2^n (2 · 2^{n/2} · ε)^3.    (6.1)
Problem 6.3 Explain (6.1).
Let
D̄W(t, ω) = lim sup_{h↓0} (W(t + h, ω) − W(t, ω))/h,  D̲W(t, ω) = lim inf_{h↓0} (W(t + h, ω) − W(t, ω))/h.
Let
E = {ω : D̄W(t, ω) and D̲W(t, ω) are both finite for some t}.
It is not clear whether E ∈ F is a measurable set! Choose ω ∈ E. Then there exists K = K(ω) such that
−K < D̲W(t, ω) ≤ D̄W(t, ω) < K,
for some time t = t(ω). Then there exists a constant δ = δ(ω, t, K), such that |W (s)−W (t)| ≤ K|s−t|
for all s ∈ [t, t + δ]. Hence, there exists n0 = n0 (δ, K, t), such that for n > n0
4 · 2^{−n} < δ,  8K < n,  n > t.
Given such n, choose k so that (k − 1)2^{−n} ≤ t < k2^{−n}. It follows that |i · 2^{−n} − t| < δ for i = k, k + 1, k + 2, k + 3, and so
Xnk(ω) ≤ 2K(4 · 2^{−n}) < n · 2^{−n}.    (6.2)
Problem 6.4 Explain (6.2).
Since k − 1 ≤ t · 2^n < n · 2^n, it follows that
Yn (ω) ≤ Xnk (ω) ≤ n · 2−n .
We have thus shown that for ω ∈ E, there exists Nω such that ω ∈ An = {ω : Yn (ω) ≤ n · 2−n } for
n ≥ Nω . So, E ⊂ lim inf n An .
By virtue of (6.1),
P{An } ≤ n · 2n (2 · 2n/2 · n2−n )3 .
Thus P{lim inf n An } ≤ lim inf n→∞ P{An } = 0. By extending F with all sets contained in sets of
probability 0, we obtain that P{E} = 0. This example shows again the necessity of such an extension
procedure!
Markov property and strong Markov property Fix t ≥ 0. Put Ft = σ(W(s), s ≤ t) and F0 = {∅, Ω}.
Now W(t + s) − W(t), s ≥ 0, is independent of Ft. This is the Markov property of Brownian motion. Moreover, it is itself a Brownian motion.
Problem 6.5 Prove these statements.
We may even allow T to be a stopping time.
T is a stopping time if T is a non-negative r.v. on (Ω, F, P) such that
{ω : T(ω) ≤ t} ∈ Ft for all t ≥ 0.
Define FT to be the collection of all sets M ∈ F such that M ∩ {ω : T (ω) ≤ t} ∈ Ft for all t ≥ 0.
Problem 6.6 Deduce that {ω : T (ω) = t} ∈ Ft and that M ∈ FT implies M ∩ {ω : T (ω) = t} ∈ Ft .
Now, let T be a stopping time and put W*(t) = W(T + t) − W(T). Then the strong Markov property holds: W*(t), t ≥ 0, is independent of FT (i.e. σ(W*(t), t ≥ 0) is independent of FT). Moreover, W*(t) is a Brownian motion.
This is true if, for all x1, . . . , xk ∈ R, t1 < · · · < tk, k = 1, . . ., and all M ∈ FT, we have
P{(W*(t1) ≤ x1, . . . , W*(tk) ≤ xk) ∩ M} = P{W*(t1) ≤ x1, . . . , W*(tk) ≤ xk} · P{M}
= P{W(t1) ≤ x1, . . . , W(tk) ≤ xk} · P{M}.    (6.3)
To prove this, first assume that T takes values in a countable set A with probability 1. Since
{ω : W*(t) ≤ x} = ∪_{s∈A} {ω : W(s + t, ω) − W(s, ω) ≤ x, T(ω) = s} ∈ F,
it follows that W*(t) is F-measurable. Moreover,
P{(W*(t1) ≤ x1, . . . , W*(tk) ≤ xk) ∩ M} = Σ_{t∈A} P{(W*(t1) ≤ x1, . . . , W*(tk) ≤ xk) ∩ M ∩ (T = t)}.
If M ∈ FT, then M ∩ (T = t) ∈ Ft. Further, given T = t, (W*(t1), . . . , W*(tk)) has the same distribution as (W(t1 + t) − W(t), . . . , W(tk + t) − W(t)). We obtain
P{(W*(t1) ≤ x1, . . . , W*(tk) ≤ xk) ∩ M}
= Σ_{t∈A} P{(W(t1 + t) − W(t) ≤ x1, . . . , W(tk + t) − W(t) ≤ xk) ∩ M ∩ (T = t)}
= Σ_{t∈A} P{W(t1 + t) − W(t) ≤ x1, . . . , W(tk + t) − W(t) ≤ xk} · P{M ∩ (T = t)}
= P{W(t1 + t) − W(t) ≤ x1, . . . , W(tk + t) − W(t) ≤ xk} · P{M}.
This proves that the first and last terms in (6.3) are equal. To prove equality of the second and last
terms, simply take M = Ω. Consequently, the assertion has been proved for stopping times with a
countable range.
Let T be an arbitrary stopping time. Define
τn = k · 2^{−n} if (k − 1)2^{−n} < T ≤ k · 2^{−n}, k = 1, 2, . . ., and τn = 0 if T = 0.
If k · 2^{−n} ≤ t < (k + 1) · 2^{−n}, then {τn ≤ t} = {T ≤ k · 2^{−n}} ∈ F_{k2^{−n}} ⊂ Ft. It follows that τn is a stopping time with a countable range.
Suppose that M ∈ FT and k · 2^{−n} ≤ t < (k + 1) · 2^{−n}. Then M ∩ {τn ≤ t} = M ∩ {T ≤ k2^{−n}} ∈ F_{k2^{−n}} ⊂ Ft. So FT ⊂ Fτn.
Let W (n) (t, ω) = W (τn (ω) + t, ω) − W (τn (ω), ω) be the displacement process after stopping time τn .
Since M ∈ FT implies M ∈ Fτn , we have by virtue of (6.3)
P{(W (n) (t1 ) ≤ x1 , . . . , W (n) (tk ) ≤ xk ) ∩ M } = P{W (n) (t1 ) ≤ x1 , . . . , W (n) (tk ) ≤ xk }P{M }. (6.4)
However, τn (ω) → T (ω) for all ω and by a.s. continuity of the sample paths, W (n) (t, ω) → W ∗ (t, ω)
for a.a. ω.
To finish the proof, we have to invoke Lemma 6.2.
Problem 6.7 Finish the proof by suitably applying this Lemma.
Curious properties Let Ta be the first time that the Brownian motion process hits the set [a, ∞).
Problem 6.8 i) By conditioning on the event {Ta ≤ t} show that
2P{W (t) ≥ a} = P{Ta ≤ t}.
ii) Use this to show that
P{Ta ≤ t} = (2/√(2π)) ∫_{a/√t}^∞ exp{−y²/2} dy.
Compute the corresponding density fTa(t). Derive that Ta < ∞ a.s., but E(Ta) = ∞. Compute P{max_{0≤s≤t} W(s) ≥ a}.
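Part (i) — the reflection principle — is also easy to check numerically. The sketch below (our own illustration; the discretization slightly underestimates the hitting probability, so the tolerance is loose) compares a simulated P{Ta ≤ 1} with 2(1 − Φ(a)):

```python
import math
import random

def hit_prob(a, t, steps=500, trials=4000, seed=13):
    """Monte Carlo estimate of P{T_a <= t} from a discretized Brownian path."""
    rng = random.Random(seed)
    sd = math.sqrt(t / steps)
    hits = 0
    for _ in range(trials):
        w = 0.0
        for _ in range(steps):
            w += rng.gauss(0.0, sd)
            if w >= a:                 # first passage above level a
                hits += 1
                break
    return hits / trials

a = 1.0
phi = 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))   # Phi(a), i.e. P{W(1) <= a}
theory = 2.0 * (1.0 - phi)                          # reflection principle
assert abs(hit_prob(a, 1.0) - theory) < 0.04
```

The same simulation, without the early `break`, would estimate P{max_{0≤s≤t} W(s) ≥ a}, which the reflection principle identifies with P{Ta ≤ t}.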
How often does Brownian motion hit 0 in a finite time interval?
Problem 6.9 Let ρ(s, t) be the probability that a Brownian motion path has at least one zero in
(s, t).
i) Deduce that
ρ(s, t) = 1 − (2/π) arcsin √(s/t).
ii) Use (i) to show that the position of the last zero before time 1 is distributed over (0, 1) with density π^{−1}(t(1 − t))^{−1/2}.
iii) For each ω let Z(ω) = {t : W (t, ω) = 0} be the set of zeroes of W (·, ω). Show that λ(Z(ω)) = 0
for a.a. ω, in words, the Lebesgue measure of the set of zeroes of W (·, ω) is 0 a.s.
We give some applications of the use of stopping times. The first is the curious phenomenon that one can embed any given distribution law in a Brownian motion. One version of this statement is the so-called Skorokhod embedding, which is minimal in the sense that the stopping time involved has finite expectation. Without this minimality condition, it is an almost trivial statement, as pointed out by Doob.
Problem 6.10 Let F be a distribution function. Determine a function h such that P{h(W (1)) ≤
x} = F(x). Show that τ = min{t > 1|W (t) = h(W (1))} is an a.s. finite stopping time. Show that
W (τ ) has distribution function F and that E(τ ) = ∞.
The stochastic process X(t) = µ · t + W(t), t ≥ 0, is called a Brownian motion with drift µ. Recall that we can associate 3 martingales with Brownian motion: W(t), t ≥ 0, W²(t) − t, t ≥ 0, and exp{cW(t) − c²t/2}, t ≥ 0. With the Brownian motion with drift one can also associate martingales.
Problem 6.11 Show that X(t) − µt, t ≥ 0, and exp{−2µX(t)}, t ≥ 0, are martingales.
Let a < 0 < b and suppose that X(0) = x ∈ (a, b). We are interested in the probability px that a is
hit before b.
Problem 6.12 i) Let T = min{t | X(t) ∈ {a, b}}. Show that T < ∞ with probability 1.
ii) Use the continuous time version (not formulated but evident) of the optional stopping theorem
to compute px .
iii) Show that E(T ) < ∞. Compute E(T ) through a suitable martingale.
7 Diffusions and Ito processes
Let us first give a proof of Theorem 7.4 from BSP.
Theorem 7.1 Let f be a stochastic process belonging to M²_t and let I(s) = ∫_0^s f(u, ω) dW(u, ω), s ≤ t. Then there exists a stochastic process ζ(s), s ≤ t, such that ζ(·, ω) is continuous for a.a. ω and P{I(s) = ζ(s)} = 1 for all s ∈ (0, t].
Proof. Let fn → f be a sequence of approximating random step functions. Clearly, {Is (fn )}0≤s≤t is
a.s. continuous.
Since {Is(fn)}_{0≤s≤t} is a martingale, also {Is(fn) − Is(fm)}_{0≤s≤t} is a martingale. Hence {(Is(fn) − Is(fm))²}_{0≤s≤t} is a submartingale. We may apply Doob’s maximal inequality, yielding that
P{sup_{0≤s≤t} |Is(fn) − Is(fm)| > ε} = P{sup_{0≤s≤t} (Is(fn) − Is(fm))² > ε²}
≤ (1/ε²) E((It(fn) − It(fm))²) = (1/ε²) ||It(fn) − It(fm)||²_{L²} = (1/ε²) ||fn − fm||²_{M²_t} → 0,  n, m → ∞.
It follows that there exists a subsequence {nk}k such that
P{sup_{0≤s≤t} |Is(fnk) − Is(fnk+1)| > 2^{−k}} < 2^{−k}.
We may apply the first Borel-Cantelli Lemma to obtain that for almost all ω there exists an index k(ω) such that
sup_{0≤s≤t} |Is(fnk)(ω) − Is(fnk+1)(ω)| ≤ 2^{−k},  k ≥ k(ω).
Hence the sequence Is(fnk)(ω) converges uniformly on (0, t] for a.a. ω, so the limit Js(ω) = lim_{k→∞} Is(fnk)(ω) is a continuous function on (0, t] for a.a. ω. Now Is(fnk) → Is in L², k → ∞, for s ∈ (0, t]. Hence there is a subsequence converging to Is for a.a. ω. It follows that P{Is = Js} = 1 for s ∈ (0, t].
QED
Problem 7.1 Suppose that X, Xn, n = 1, . . . are r.v.s in L²(Ω, F, P). Assume that Xn → X in L².
i) Show that lim_{n→∞} P{|Xn − X| > ε} = 0, for each ε > 0.
ii) Use this to show that there is a subsequence {nk}k along which there is a.s. convergence, i.e. Xnk → X for a.a. ω.
The proof of the above theorem gives some indications of how to prove this.
Problem 7.2 Brownian bridge Let a, b ∈ R be given. Consider the following 1-dimensional equation:
dY(t) = ((b − Y(t))/(1 − t)) dt + dW(t),  0 ≤ t < 1,  Y(0) = a.
Verify that
Y(t) = a(1 − t) + bt + (1 − t) ∫_0^t dW(s)/(1 − s),  0 ≤ t < 1,
solves the equation and prove that lim_{t→1} Y(t) = b a.s.
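The pinning behaviour lim_{t→1} Y(t) = b is easy to see numerically. Below is a simple Euler discretization (our own sketch, stopping one step before t = 1, where the drift coefficient blows up):

```python
import math
import random

def bridge_path(a, b, n=2000, seed=5):
    """Euler scheme for dY = ((b - Y)/(1 - t)) dt + dW on [0, 1),
    stopping one step before the singularity at t = 1."""
    rng = random.Random(seed)
    dt = 1.0 / n
    y, path = a, [a]
    for k in range(n - 1):
        t = k * dt
        y += (b - y) / (1.0 - t) * dt + rng.gauss(0.0, math.sqrt(dt))
        path.append(y)
    return path

path = bridge_path(a=0.0, b=2.0)
assert path[0] == 0.0
assert abs(path[-1] - 2.0) < 0.2   # the drift pins the path near b as t -> 1
```

Without the drift term the endpoint would be N(a, 1)-distributed; the 1/(1 − t) factor is what forces the path into b.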
So far, we have studied how to construct Ito processes. However, given an arbitrary stochastic differential equation, there is so far no clue as to how to judge whether a solution exists and, if it does, whether it is unique (with prob. 1).
BSP treats the case of a so-called Ito diffusion. Let us give the definition for the n-dimensional case. A time-homogeneous Ito diffusion is a stochastic vector process X(t, ω) = (X1(t, ω), . . . , Xn(t, ω)) on (Ω, F, P) that satisfies a stochastic differential equation of the form
dX(t) = b(X(t))dt + σ(X(t))dW(t),  t ≥ s,  X(s) = x,
where W(t) = (W1(t), . . . , Wd(t)) is a d-dimensional Brownian motion, b : R^n → R^n and σ : R^n → R^{n×d}. We assume that b and σ satisfy a Lipschitz condition: there exists a constant C such that
||b(x) − b(y)|| + ||σ(x) − σ(y)|| ≤ C||x − y||.    (7.1)
Since we have not spoken of the multi-dimensional case, let us briefly spend a few words on it (we need it in the later examples). A d-dimensional Brownian motion is simply the vector process associated with d independent one-dimensional Brownian motions defined on the same space. The SDE then simply stands for
dXi(t) = bi(X(t))dt + σi1(X(t))dW1(t) + · · · + σid(X(t))dWd(t),  i = 1, . . . , n.
We can now set Ft = σ(Wi (s), 0 < s ≤ t, i = 1, . . . , d).
The analog of BSP Theorem 7.7 gives that under the above Lipschitz condition there is an a.s. unique solution, with a.s. continuous paths, of the initial value problem
dX(t) = b(X(t))dt + σ(X(t))dW(t),  0 ≤ t ≤ T,  X(0) = X0,
provided E(||X0||²) < ∞ and X0 is independent of σ(Ft, t > 0). This solution is adapted to the filtration σ(Ft, σ(X0)) and satisfies ∫_0^T Xi²(t) dt < ∞ for i = 1, . . . , n.
Lipschitz conditions are also commonly used in (deterministic) differential equations for guaranteeing existence and uniqueness properties. Next we will list a number of properties of Ito diffusions.
In fact, these properties are inherent to so-called diffusion processes, under technical conditions; one does not need the notion of SDE’s and Ito integrals for arriving at them. However, it appears from the literature that SDE’s are an efficient formalism for deriving existence and uniqueness results for diffusion processes with certain given properties: one constructs a diffusion process with given properties from Brownian motion paths. Some authors regard this as the key of Ito’s contribution to the field of diffusion processes.
The properties described later on rely on Ito’s formalism.
In our case it is better not to digress into the field of diffusion processes, even more so because there are many conflicting definitions of this notion. The best advice for a rigorous treatment is the books by Rogers and Williams (the latter being the author of the martingale book).
From now on, we will only consider Ito diffusions. One can prove that these are strong Markov processes.
The infinitesimal generator A of the process X(t) is defined by
Af(x) = lim_{t↓0} (E(f(X(t))|X(0) = x) − f(x))/t,  x ∈ R^n.
If for a given function f , this limit exists for all x, then we say that f belongs to DA , the domain
of the generator. Let C02 (Rn ) be the set of twice continuously differentiable functions on Rn with
compact support. Then one can prove that Af(x) exists for all x ∈ R^n and
Af(x) = Σ_i bi(x) ∂f/∂xi(x) + (1/2) Σ_{i,j} (σ(x)σ^T(x))_{ij} ∂²f/∂xi∂xj(x),
whenever f ∈ C02 (Rn ). Note that A is a linear operator on C02 (Rn ).
It is obvious that Brownian motion is a time-homogeneous one-dimensional Ito diffusion with infinitesimal parameters b(x) = 0 and σ(x) = (1). The infinitesimal operator A associated with it is given by
Af(x) = (1/2) ∂²f/∂x²(x) =: (1/2)∆f(x).
(∆ stands for the Laplace operator: if f : R^n → R is twice differentiable, then ∆f = Σ_{i=1}^n (∂²/∂xi²)f.)
One can model the graph of Brownian motion by a two-dimensional diffusion as follows: X(t) = (X1(t), X2(t)) with X1(t) = t and X2(t) = W(t).
Problem 7.3 Compute the corresponding infinitesimal generator.
The Ornstein-Uhlenbeck process is the Ito diffusion defined by
dX(t) = −αX(t)dt + σdW(t),
with parameters α and σ.
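As an illustration of an Ito diffusion that is easy to simulate, the Euler–Maruyama sketch below (our own illustration; the parameter values are arbitrary) checks the well-known stationary variance σ²/(2α) of the Ornstein-Uhlenbeck process:

```python
import math
import random

def ou_variance(alpha, sigma, dt=0.01, steps=200_000, seed=6):
    """Euler-Maruyama for dX = -alpha*X dt + sigma dW, X(0) = 0; returns
    the empirical variance of the path after discarding a burn-in."""
    rng = random.Random(seed)
    x, s1, s2, m = 0.0, 0.0, 0.0, 0
    burn = steps // 10
    for k in range(steps):
        x += -alpha * x * dt + sigma * rng.gauss(0.0, math.sqrt(dt))
        if k >= burn:
            s1 += x
            s2 += x * x
            m += 1
    mean = s1 / m
    return s2 / m - mean * mean

v = ou_variance(alpha=1.0, sigma=1.0)
assert abs(v - 0.5) < 0.1      # stationary variance sigma^2 / (2 alpha)
```

The drift constantly pulls the path back to 0, which is why, unlike Brownian motion, the process settles into a stationary regime.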
Problem 7.4 Give the infinitesimal generator.
The infinitesimal generator contains the information on the marginal distributions of an Ito diffusion.
Lemma 7.2 (Dynkin’s Lemma) Let f ∈ C0²(R^n). Suppose that τ is a stopping time with E(τ|X(0) = x) < ∞. Then
E(f(X(τ))|X(0) = x) = f(x) + E(∫_0^τ Af(X(s)) ds | X(0) = x).
The proof of this lemma follows rather straightforwardly from Ito’s formula.
Problem 7.5 Search the literature for a proof of Dynkin’s lemma based on Ito’s formula. Write it
and, if necessary, supply lacking details.
Problem 7.6 Consider the n-dimensional Brownian motion W(t) = (W1(t), . . . , Wn(t)), t ≥ 0. Suppose Brownian motion starts at a point x ∈ R^n. Let R > 0 be given. As the norm on R^n we consider the L²-norm: ||x|| = √(Σ_i xi²).
i) Compute the infinitesimal generator of n-dimensional Brownian motion.
Let ||x|| < R and let τ denote the first exit time of the ball B^n = {y | ||y|| < R}. By a.s. continuity of Brownian motion paths, τ = inf{t > 0 | W(t) ∉ B^n} is equal in distribution to
inf{t > 0 | ||W(t)|| = R}.
ii) Show that P{τ < ∞|X(0) = x} = 1. Define a suitable martingale to compute E(τ ) by virtue of
the optional stopping theorem. Argue that the theorem is applicable and compute the expected
exit time. Hint: problem 6.12 may be helpful here.
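For reference, the martingale ||W (t)||² − nt yields E(τ |X(0) = x) = (R² − ||x||²)/n, which the following simulation sketch reproduces (ours, assuming NumPy; discretisation causes a small upward bias in the exit time):

```python
import numpy as np

def ball_exit_time(x0, R, dt=1e-3, n_paths=20_000, seed=3):
    """Simulate n-dim Brownian motion from x0 until ||W|| >= R; return the exit times."""
    rng = np.random.default_rng(seed)
    n_dim = len(x0)
    w = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    tau = np.zeros(n_paths)
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        w[active] += np.sqrt(dt) * rng.standard_normal((active.sum(), n_dim))
        tau[active] += dt
        active &= (w * w).sum(axis=1) < R * R
    return tau

# n = 2, R = 1, x = 0: expected exit time (R^2 - ||x||^2)/n = 1/2.
tau = ball_exit_time([0.0, 0.0], R=1.0)
print(tau.mean())
```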
Let now ||x|| > R and let τ be the first entrance time of B n . The question is whether τ < ∞ a.s.
and if yes, what is the expectation. The case n = 1 has been solved already (where?), so we assume
n ≥ 2. I do not know how to get the optional stopping theorem to work for answering the above
questions - if you can, please do. Therefore, it seems better to apply Dynkin’s lemma for suitable
functions f . What type of function f would be suitable?
The complement of the closure of the R-ball is unbounded, and so we start in unbounded territory.
Now, we need to use functions that have a compact support. It makes sense to consider the annulus
Ak = {y | R < ||y|| < 2k R}. Choose k large enough so that x ∈ Ak . Denote τk = inf{t > 0 | W (t) ∉ Ak }.
A suitable function f = fn,k should depend on y only through the norm. Further, the integral ∫0τk Afn,k (X(s))ds should be easy to calculate. The best would be if this expression disappears altogether on the annulus, i.e. Afn,k = 0 on Ak . In other words, ∆fn,k = 0 on the annulus, i.e. fn,k is harmonic (on the annulus).
Choose f = fn,k a function in C02 (Rn ), with fn,k (y) = log ||y|| on Ak if n = 2 and fn,k (y) = ||y||2−n on Ak if n > 2.
iii) Show that τk satisfies the conditions of Dynkin’s lemma. Show that fn,k is harmonic on Āk . Compute E(f (X(τk ))|X(0) = x). Derive that
P{τ < ∞|X(0) = x} = 1 if n = 2, and P{τ < ∞|X(0) = x} = (||x||/R)2−n if n > 2.
In case n = 2, show that E(τ |X(0) = x) = ∞. The implication is that Brownian motion in 2 dimensions is null-recurrent and in dimension n ≥ 3 it is transient.
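The hitting probability obtained from the harmonic function ||y||2−n can be checked by simulation on a finite annulus, where for n > 2 the exact answer is (||x||2−n − r_out2−n )/(r_in2−n − r_out2−n ). A sketch for n = 3 (ours, assuming NumPy; helper names are ours):

```python
import numpy as np

def hit_inner_before_outer(x0, r_in, r_out, dt=1e-3, n_paths=10_000, seed=4):
    """3-d Brownian motion from x0: fraction of paths reaching ||y|| <= r_in
    before ||y|| >= r_out."""
    rng = np.random.default_rng(seed)
    w = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))
    hit = np.zeros(n_paths, dtype=bool)
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        w[active] += np.sqrt(dt) * rng.standard_normal((active.sum(), 3))
        r = np.sqrt((w * w).sum(axis=1))
        hit |= active & (r <= r_in)
        active &= (r > r_in) & (r < r_out)
    return hit.mean()

# n = 3, r_in = 1, r_out = 4, ||x|| = 2:
# p = (||x||^-1 - r_out^-1) / (r_in^-1 - r_out^-1) = 1/3.
p_hat = hit_inner_before_outer([2.0, 0.0, 0.0], r_in=1.0, r_out=4.0)
print(p_hat)
```

Letting r_out grow recovers the limit (||x||/R)2−n stated above.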
Now, if we choose the stopping time τ deterministic, i.e. τ ≡ t, then we see that
u(t, x) = E(f (X(t))|X(0) = x)
is differentiable w.r.t. t and
∂u/∂t = E(Af (X(t))|X(0) = x).
It turns out that we can express the right-hand side of the above also in terms of u. This gives rise
to Kolmogorov’s backward equation.
Theorem 7.3 (Kolmogorov’s backward equation) Let f ∈ C02 (Rn ).
i) Define u(t, x) = E(f (X(t))|X(0) = x). Then u(t, ·) ∈ DA (the domain of A) for each t and
∂u/∂t = Au, t > 0, x ∈ Rn (7.2)
u(0, x) = f (x), x ∈ Rn . (7.3)
Interpret the right-hand side of (7.2) as A applied to u as a function of x.
ii) Suppose that w(t, x) is a bounded function solving (7.2) and (7.3), which is continuously differentiable in t, and twice continuously differentiable in x. Then w(t, x) = u(t, x). In particular,
we have the explicit partial differential equation
∂u/∂t = Σi bi ∂u/∂xi + (1/2) Σi,j (σσ T )ij ∂²u/∂xi ∂xj .
This theorem gives a probabilistic solution to the initial value problem (7.2), (7.3). Now suppose
that the Ito diffusion X(t) has a density p(t, x, y) = (∂/∂y)P{X(t) ≤ y|X(0) = x} that is once
continuously differentiable in t and twice continuously differentiable in x. Then it makes sense that
this density itself solves (7.2).
Problem 7.7 Sketch a way how to prove this from Theorem 7.3.
Heat equation Let us now fix X(t) = x + W (t), with x given. Then Kolmogorov’s backward equation (7.2) reduces to
∂u(x, t)/∂t = (1/2) ∂²u(x, t)/∂x²,
which is the heat equation in one dimension. If X(t) is n-dimensional Brownian motion, then we get
∂u/∂t = (1/2)∆u.
We interpret this equation physically in different ways. It may model the time development of
temperature u by heat conduction. On the other hand, microscopic particles suspended in a fluid
or gas perform a very irregular motion, caused by collisions with molecules in thermal motion. One
can then interpret u as the particle density, evolving in time.
From a microscopic point of view, individual particles perform a Brownian motion, which is a stochastic process. The process is an Ito diffusion. From a macroscopic point of view, the particle density
evolves in time according to the heat equation.
The relation between the two is that the density of Brownian motion is a solution to the heat
equation.
Problem 7.8 Check the validity of this statement.
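A direct check: the density of Brownian motion started at x is p(t, x, y) = (2πt)−1/2 exp{−(y − x)²/2t}, and one can verify numerically that it satisfies ∂p/∂t = (1/2)∂²p/∂x². A finite-difference sketch (ours, standard library only):

```python
import math

def p(t, x, y):
    """Transition density of Brownian motion: normal density with mean x, variance t."""
    return math.exp(-(y - x) ** 2 / (2 * t)) / math.sqrt(2 * math.pi * t)

# Central finite differences: check dp/dt = (1/2) d^2p/dx^2 at an arbitrary point.
t, x, y, h = 0.7, 0.3, 1.1, 1e-4
dp_dt = (p(t + h, x, y) - p(t - h, x, y)) / (2 * h)
d2p_dx2 = (p(t, x + h, y) - 2 * p(t, x, y) + p(t, x - h, y)) / h ** 2
print(dp_dt, 0.5 * d2p_dx2)  # the two numbers agree
```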
Now we will check the validity of Theorem 7.3 for this simple model. Consider the initial value problem
∂u/∂t = (1/2)∆u in Rn × R+ ,
u continuous in Rn × [0, ∞),
and u(x1 , . . . , xn , 0) = Φ(x1 , . . . , xn ),
for a given bounded and continuous function Φ : Rn → R.
By Theorem 7.3 this initial value problem has the following solution
u(x, t) = E(Φ(x + W (t))) = (2πt)−n/2 ∫Rn Φ(x + y) exp{−||y||²/2t}dy = ∫Rn Φ(y)φn (x − y, t)dy,
where φn (·, t) denotes the density of the n-dimensional normal distribution with mean 0 and covariance tI, provided that Φ satisfies the condition of that theorem. For continuous functions Φ the statement can be checked directly.
Problem 7.9 Do this by carrying out the following steps.
i) Argue that (x, t) → Φ(x + W (t)) is a continuous and bounded function for a.a. ω ∈ Ω.
ii) Use (i) and a suitable convergence theorem to conclude that (x, t) → E(Φ(x+W (t))) is continuous.
iii) Show that E(Φ(x + W (0))) satisfies the initial conditions.
iv) Finally show that E(Φ(x + W (t))) solves the heat equation. To this end we need to interchange differentiation and integration, so you have to justify that operation.
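For a concrete Φ the solution formula can be tested directly: with Φ = cos one gets u(x, t) = E(cos(x + W (t))) = cos(x)e−t/2 , which indeed satisfies ∂u/∂t = (1/2)∂²u/∂x². A Monte Carlo comparison (ours, assuming NumPy):

```python
import numpy as np

# Phi = cos:  u(x, t) = E cos(x + W(t)) = cos(x) * exp(-t/2), a solution of the
# one-dimensional heat equation u_t = u_xx / 2.
rng = np.random.default_rng(5)
x, t = 0.4, 1.3
w = np.sqrt(t) * rng.standard_normal(1_000_000)
u_mc = np.cos(x + w).mean()
u_exact = np.cos(x) * np.exp(-t / 2)
print(u_mc, u_exact)
```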
We derive another property of the solution u to the initial value problem.
Theorem 7.4 Let s > 0 be fixed. Then u(x + W (t), s − t) is a martingale w.r.t. the filtration (Ft )0≤t≤s , Ft = σ(W (t′ ), 0 ≤ t′ ≤ t).
Problem 7.10 Prove the theorem. First argue that the following relation holds true: u(y, s − t) = E(Φ(y + W (s − t))) = E(Φ(y + W (s) − W (t))|Ft ).
Boundary value problems Let X(t) be an Ito diffusion in one dimension:
dX(t) = b(X(t))dt + σ(X(t))dW (t),
where the functions b and σ satisfy the Lipschitz condition (7.1). Let (a, b) be a given interval, and
let X(0) = x ∈ (a, b).
Put τ = inf{t > 0 | X(t) ∉ (a, b)} and define p = P{X(τ ) = b|X(0) = x}. Suppose that we can find a solution f ∈ C 2 (R) such that
Af = b(x)f ′ (x) + (1/2)σ²(x)f ′′ (x) = 0, x ∈ R.
Problem 7.11 i) Prove that
p = (f (x) − f (a))/(f (b) − f (a)),
provided τ < ∞ a.s.
ii) Now specialise to the case X(t) = x + W (t), t ≥ 0. Prove that
p = (x − a)/(b − a).
iii) Determine p if X(t) = x + bt + σW (t).
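Part ii) can be illustrated by simulation: the fraction of Brownian paths leaving (a, b) through b should be (x − a)/(b − a). A sketch (ours, assuming NumPy; the small discretisation bias is ignored):

```python
import numpy as np

def prob_exit_right(x0, a, b, dt=1e-4, n_paths=20_000, seed=6):
    """Fraction of Brownian paths from x0 that leave the interval (a, b) through b."""
    rng = np.random.default_rng(seed)
    w = np.full(n_paths, x0, dtype=float)
    out_right = np.zeros(n_paths, dtype=bool)
    active = np.ones(n_paths, dtype=bool)
    while active.any():
        w[active] += np.sqrt(dt) * rng.standard_normal(active.sum())
        out_right |= active & (w >= b)
        active &= (w > a) & (w < b)
    return out_right.mean()

# p = (x - a)/(b - a); here (0.3 - 0)/(1 - 0) = 0.3.
p_hat = prob_exit_right(0.3, 0.0, 1.0)
print(p_hat)
```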
Now we are interested in the following boundary value problem: find u ∈ C 2 (R) such that
u′′ (x) = 0, x ∈ (a, b),
u(a) = φ(a),
u(b) = φ(b).
Problem 7.12 Determine a solution to this problem analytically.
We can derive a solution by a stochastic approach.
Problem 7.13 Let X(t) = x + W (t). Show that
u(x) := E(φ(X(τ ))|X(0) = x)
solves the boundary value problem.
Solving a PDE in this way follows a standard pattern. However, in general one needs a detailed study of suitable properties of the function E(φ(X(τ ))|X(0) = x), because in most cases one cannot calculate it explicitly. That involves many technicalities.
Problem 7.14 Solve the following boundary value problem by a stochastic approach: find u ∈ C 2 (R) such that
bu′ (x) + (1/2)σ²u′′ (x) = 0, x ∈ (a, b),
u(a) = φ(a),
u(b) = φ(b).
In the above solutions, time did not play a role. We will next consider the simplest version of a
boundary value problem involving the heat equation.
Back to the heat equation Let D denote the infinite strip:
D = {(t, x) ∈ R² : x < R}.
Let φ be a bounded continuous function on δD = {(t, R) | t ∈ R}. We consider the following boundary value problem: find u ∈ C 1,2 (R × (−∞, R)) such that
∂u/∂t + (1/2) ∂²u/∂x² = 0, (t, x) ∈ D,
lim(s,x)→(t,R), (s,x)∈D u(s, x) = φ(t), (t, R) ∈ δD.
A physical interpretation of this problem is the following: we consider an infinitely long vertical bar,
with upper end point at R. We fix the temperature φ at the upper point of the bar as a function of
time. Now we are interested in how temperature ‘spreads’ over the whole bar, while time is running.
Problem 7.15 i) Define (hint: look at earlier exercises) a 2-dimensional Ito diffusion X(t), with generator
A = ∂/∂t + (1/2) ∂²/∂x².
ii) Let τt,x = inf{s > t | X(s) ∉ D}, given that X(0) = (t, x). Show that
u(t, x) = E(φ(X(τt,x ))|X(0) = (t, x))
is the solution of the boundary value problem. Hint: the distribution of τt,x −t is the distribution
of the hitting time of R for Brownian motion given initial state W (0) = x.
Option pricing We will indicate how to arrive at the simplest form of the Black-Scholes formula for European options. There is an extensive mathematical formalism to define all notions that we use below in a precise manner, but this goes far beyond the scope of this course.
The basis is the following Ito diffusion
dS(t) = µS(t)dt + σS(t)dW (t),
where S(t) is the value of one unit stock. µ and σ 6= 0 are assumed constant. This is a geometric
Brownian motion (see BSP) and it has the solution
S(t) = s0 exp{(µ − σ 2 /2)t + σW (t)}
where S(0) = s0 is given. Of course, dealing in stock is a risky investment because of the diffusion
term σSdW (t). If we assume the interest rate of a bank investment to equal a constant ρ, then a
bank investment is a safe investment.
A European option is the right to buy one unit stock at the expiration time T for a fixed price K. At the expiration time T you will exercise your option when K < S(T ); you will not exercise it when K > S(T ). This means that at time T the value of your ‘warrant’ (the right to buy the stock) is
max(S(T ) − K, 0).
The question is how to calculate the price of the warrant at time t < T . If one assumes a stable market, that is, on average one cannot gain or lose, then the price and the value of a warrant must be equal. Write F (S(t), T − t) for the price of a unit warrant at time t < T . Then F (S, 0) = max(S − K, 0). The aim is to formulate an initial value problem for F (S, T − t), 0 ≤ t ≤ T .
Problem 7.16 Derive an SDE for dF (S, T − t).
Suppose we have the following investment policy: at time t our portfolio consists of 1 unit of warrant with value F (S(t), T − t) and α(t) units of stock, so as to eliminate risk. Now α(t) is assumed Ft = σ(W (s), s ≤ t) measurable. As a consequence the value of our portfolio at time t is
V (t) = F (S(t), T − t) + α(t)S(t)
and we get
dV = dF + αdS.
Problem 7.17 Derive an SDE for V . Determine α such that the dW term (diffusion term) disappears. Conclude that
dV (t) = ((1/2)σ²S² ∂²F /∂S² − ∂F /∂t)dt.
On the other hand, in a stable market, the average value of a portfolio is the same as the value of a safe (bank) investment:
dV = ρV dt.
Problem 7.18 i) Show that combining the above gives rise to the following initial value problem for F :
∂F /∂t = ρS ∂F /∂S + (1/2)S²σ² ∂²F /∂S² − ρF,
F (S, 0) = max(S − K, 0).
ii) Suppose ρ = 0. Of what Ito diffusion would the first equation be the Kolmogorov backward equation (7.2)?
To solve this problem, we need to invoke the Feynman-Kac formula.
Theorem 7.5 Let f ∈ C02 (Rn ) and q ∈ C(Rn ). Assume that q is lower bounded.
i) Put
v(t, x) = E(exp{−∫0t q(X(s))ds}f (X(t))|X(0) = x).
Then
∂v/∂t = Av − qv, t > 0, x ∈ Rn ,
v(0, x) = f (x), x ∈ Rn .
ii) If w(t, x) ∈ C 1,2 (R × Rn ) is bounded on K × Rn for each compact subset K ⊂ R and w is a solution of the above PDE, then w(t, x) = v(t, x).
Problem 7.19 Show that the value of the option at time t = 0 equals
s0 Φ(u) − e−ρT KΦ(u − σ√T ),
where
Φ(u) = (2π)−1/2 ∫−∞u e−x²/2 dx
is the distribution function of a standard normal r.v. and
u = (ln(s0 /K) + (ρ + σ²/2)T )/(σ√T ).
This is the classical Black Scholes formula.
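The formula can be cross-checked by pricing the option directly as a discounted expected payoff under the pricing dynamics with drift ρ. A sketch (ours, assuming NumPy; `erf` from the standard library replaces a normal-cdf table):

```python
import numpy as np
from math import erf, exp, log, sqrt

def norm_cdf(u):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def black_scholes_call(s0, K, sigma, rho, T):
    """Classical Black-Scholes price of a European call."""
    u = (log(s0 / K) + (rho + sigma ** 2 / 2) * T) / (sigma * sqrt(T))
    return s0 * norm_cdf(u) - exp(-rho * T) * K * norm_cdf(u - sigma * sqrt(T))

# Monte Carlo cross-check: S(T) = s0*exp((rho - sigma^2/2)T + sigma*W(T)) under the
# pricing measure; the option value is E(e^{-rho T} max(S(T) - K, 0)).
s0, K, sigma, rho, T = 100.0, 105.0, 0.2, 0.05, 1.0
z = np.random.default_rng(7).standard_normal(2_000_000)
sT = s0 * np.exp((rho - sigma ** 2 / 2) * T + sigma * np.sqrt(T) * z)
mc_price = np.exp(-rho * T) * np.maximum(sT - K, 0.0).mean()
bs_price = black_scholes_call(s0, K, sigma, rho, T)
print(mc_price, bs_price)
```

The two prices agree up to Monte Carlo error, which illustrates the Feynman-Kac representation behind the formula.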