Stochastic Analysis in Financial Mathematics
MA4265/FE5204
Lecture Notes 1
Basic Probability & Measure Theory
Main Reference. Øksendal, B., (2000/03): Stochastic Differential Equations–An
Introduction with Applications. 5th/6th Ed. Springer-Verlag
Auxiliary References.
(i) Baxter, Martin & Rennie, Andrew, (1996): Financial Calculus: An Introduction
to Derivative Pricing, Cambridge University Press.
(ii) Lamberton, D., Lapeyre, B. (1996): Introduction to Stochastic Calculus Applied
to Finance. Chapman & Hall
(iii) Steele, J. Michael (2001): Stochastic Calculus and Financial Applications. New
York: Springer.
Some Examples.
To begin with, let us present 4 examples for a preview of what will be covered in the
course.
1. A model for stochastic growth. Consider the differential equation
dN(t)/dt = a(t) N(t).   (∗)
If a(t) = r + α(t) “noise”, what is the solution of (∗)? (Of course, the term “noise”
needs to be clearly defined before any attempt can be made to solve the differential
equation.)
If α(t) ≡ 0, a(t) = r, then (∗) becomes a simple ordinary differential equation which
can be easily solved. Indeed,
N(t) = N0 e^{rt}.
What if α(t) ≠ 0? More will be said in §2.3.
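The deterministic case can be checked numerically. The sketch below is an illustration, not part of the notes: a forward-Euler discretization of dN/dt = rN (the function name and parameter values are our own) is compared against the exact solution N0 e^{rt}.

```python
import math

def euler_growth(n0: float, r: float, t: float, steps: int) -> float:
    """Forward-Euler approximation of dN/dt = r*N with N(0) = n0."""
    dt = t / steps
    n = n0
    for _ in range(steps):
        n += r * n * dt  # Euler update: N_{k+1} = N_k + r * N_k * dt
    return n

# With a fine grid the scheme approaches the exact solution N0 * e^{r t}.
approx = euler_growth(n0=1.0, r=0.05, t=2.0, steps=100_000)
exact = 1.0 * math.exp(0.05 * 2.0)
```

On this fine grid the two values agree to several decimal places, consistent with the closed-form solution above.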
2. Optimal Stopping. Assume that the value of some asset at time t, X(t), is given
by
dX(t)/dt = (r + α(t) “noise”) X(t),
where r, α are known. If we sell this asset at time t, we get the amount e^{−ρt}[X(t) − a].
(Here, ρ is the so-called “discount rate” and a refers to the tax and/or transaction
cost.) We would like to ask at what time we should decide to sell. Mathematically
speaking, we would like to maximize the following quantity:
E[e^{−ρτ}(X(τ) − a)]
over all “admissible” selling time τ . By “admissible,” we mean that the decision to
sell at or before time t should only depend on the values X(s), s ≤ t, and not on any
future value of X(s).
3. Stochastic Control. Suppose that we have two investment possibilities: a bond
with price dynamics
dX0(t)/dt = r X0(t),
with r a constant; a stock with price dynamics
dX1(t)/dt = (μ + α “noise”) X1(t).
(Note that typically μ > r.) Let U(x) be a “utility function”, say U(x) = x^γ (with
0 < γ < 1). Let u(t) be the fraction of the total wealth invested in the stock at time
t. Denote by Z(t) = Z^{(u)}(t) the corresponding total wealth at time t.
The problem that we would like to deal with is to maximize
E[U(Z^{(u)}(T))]
over all “controls” u = u(t). (This problem was considered by R. Merton in 1971;
we will discuss it in class in due course.)
4. Problems in Mathematical Finance. To give an example, consider a European
call option, which gives the owner of the option the right, but not the obligation, to
buy one share of a specified stock at a specified price K at a specified future time T .
What is the right price for such an option at time t = 0?
In 1973, the Black-Scholes option pricing formula was published, which gives an exact
formula for the right price in the market.
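For reference, the 1973 formula can be written down in a few lines. The sketch below is a standard implementation of the Black-Scholes call price (the parameter names S0, K, r, sigma, T and the sample values are ours, for illustration only); it uses the standard normal cdf built from the error function.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S0: float, K: float, r: float, sigma: float, T: float) -> float:
    """Black-Scholes price at time 0 of a European call, strike K, maturity T."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

# An at-the-money example: S0 = K = 100, r = 5%, sigma = 20%, T = 1 year.
price = black_scholes_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
```

The derivation of this formula from the model assumptions is one of the goals of the course.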
§1. Preliminaries.
To build up and develop the theory of probability, we should begin with a triple,
(Ω, F , IP), with certain proper mathematical structure. Roughly speaking, Ω refers
to the space of all sample points, i.e., all possible outcomes with respect to the
statistical experiment concerned, F refers to the collection of “events” which are
subsets of Ω, and IP refers to the probability measure.
It goes without saying that axioms (or postulates) and rules that govern certain
operations have to be imposed on the triple (Ω, F , IP) so that we may build up a
mathematical system that is rich in structure. Indeed, the first few lectures are more
or less a kind of “language” course, in which “alphabets” (symbols) and “grammatical
rules” (axioms or postulates) are to be defined/mentioned.
Let us first focus on F .
1. σ-Algebra
Definition. A non-empty collection F of subsets of Ω is called a σ-algebra on Ω
if and only if
(i) A ∈ F =⇒ Ac ∈ F , where Ac denotes the complement of A in Ω, i.e., Ac = Ω\A,
(ii) every union of a countable collection of sets in F is again in F . That is, if
A1 , A2 , . . . are in F , then
∪_{n≥1} An ∈ F .
Note.
An immediate consequence is that Ω ∈ F , and hence ∅ ∈ F as well.
The pair (Ω, F ) is called a measurable space. A subset F of Ω is called measurable
(with respect to F ) if F ∈ F . (In probability theory, a measurable set is also called
an event.)
A σ-algebra can be regarded as a mathematical model of information.
2. By a probability measure IP on a measurable space (Ω, F ) we mean a nonnegative set
function defined on F and satisfying IP(Ω) = 1, IP(∅) = 0, i.e.,
IP : F → [0, 1]
with IP(Ω) = 1, IP(∅) = 0, and
IP(∪_{i=1}^∞ Ei) = Σ_{i=1}^∞ IP(Ei)
for any sequence {Ei} of pairwise disjoint measurable sets. (The above property of IP is often
referred to by saying that IP is countably additive (or σ-additive), and hence called
the axiom of countable additivity.)
3. By a probability space (Ω, F , IP) we mean a measurable space (Ω, F ) together with
a probability measure IP defined on F . Ω alone is usually called the sample space,
each of its elements ω is then a sample point, every member of F an event, and IP a
probability measure.
An event F with IP(F ) = 1 is called an almost sure event: F occurs almost
surely, or equivalently, F occurs with probability 1.
4. A Few Observations & Properties.
(a) The above defining condition §1.1.(ii) is equivalent to the following: if {Ai}_{i=1}^∞ is
a sequence of sets in F , then ∩_{i=1}^∞ Ai must again be in F .
(b) The power set of Ω, denoted by 2^Ω, refers to the collection of all subsets of Ω.
The power set of Ω is a σ-algebra; it is sometimes called the total σ-algebra.
Note also that the collection of two sets {∅, Ω} is a σ-algebra, too. It is called the
trivial σ-algebra.
(c) If Fi , i ∈ I, are σ-algebras, then ∩_{i∈I} Fi is a σ-algebra as well. Here I ≠ ∅ is an
arbitrary index set (i.e., possibly uncountable).
(d) Given any collection U of subsets of Ω, there is a smallest σ-algebra σ(U) which
contains U. That is, σ(U) is a σ-algebra containing U and such that if B is any
σ-algebra containing U, then B contains σ(U).
Remark: The smallest σ-algebra containing U, denoted by σ(U), is called the
σ-algebra generated by U.
5. Examples.
(i) Example 1. Let Ω = {1, 2, 3}. Put F = {∅, Ω, {1}, {2, 3}}. By referring to the
definition in item 1, it can be shown that F is a σ-algebra.
Next, put G = {∅, Ω, {2}}. Observe that G is not a σ-algebra as
{2}c = {1, 3} = Ω \ {2}
does not belong to G. How do we find the smallest σ-algebra that contains G?
Obviously, σ(G) has to contain {1, 3}. Now, by going through the definition of a
σ-algebra, one can verify that {∅, Ω, {2}, {1, 3}} is a σ-algebra that contains G, and
hence the smallest σ-algebra containing G. That is,
σ(G) = {∅, Ω, {2}, {1, 3}}.
If H = {{2, 3}}, what is the smallest σ-algebra containing H? (Refer to F .)
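On a small finite Ω, the generated σ-algebra can be computed by brute force: close the collection under complements and unions until nothing new appears (on a finite space, countable unions reduce to finite ones). The following sketch is an illustration of ours, not part of the notes; the function name is made up.

```python
def generated_sigma_algebra(omega: frozenset, gens) -> set:
    """Smallest sigma-algebra on a finite omega containing the generating sets."""
    fam = {frozenset(), frozenset(omega)} | {frozenset(g) for g in gens}
    while True:
        new = {frozenset(omega - a) for a in fam}   # close under complements
        new |= {a | b for a in fam for b in fam}    # close under pairwise unions
        if new <= fam:                              # nothing new: we are done
            return fam
        fam |= new

omega = frozenset({1, 2, 3})
sigma_G = generated_sigma_algebra(omega, [{2}])
# Reproduces sigma(G) = {∅, Ω, {2}, {1,3}} from the example above.
```

Running the same function on the generator {2, 3} answers the question about H.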
(ii) Example 2. Let U be the collection of all open sets in IRn . The smallest σ-algebra σ(U) containing U is called the Borel σ-algebra, denoted B(IRn ).
Remark: A set V ⊂ IRn is said to be open in IRn if for every point v ∈ V , there
exists an r > 0 such that the open ball centered at v with radius r
B(v, r) = {x ∈ IRn : ‖x − v‖ < r}
is contained in V , i.e., B(v, r) ⊂ V . Here ‖x‖ = (x1^2 + · · · + xn^2)^{1/2}, which is the
Euclidean norm of x = (x1 , . . . , xn ) ∈ IRn .
For instance, (−2, 5) is an open set (more precisely, an open interval) in IR; an
open ball as above is also an open set.
6. We have just introduced what a probability space is. In the following, we will define
“random variables” on a given probability space.
An IRn -valued function Y defined on Ω is said to be a (vector-valued) random
variable (r.v. for short) if for every open set V ⊂ IRn we have
Y^{−1}(V) = {ω : Y (ω) ∈ V } ∈ F .
Y is said to be F -measurable if Y is a random variable. For abbreviation, we will
write Y ∈ F .
7. Let X be a r.v. It can be shown that {X^{−1}(V) : V ∈ B(IRn )} is a σ-algebra. Denote
it by σ(X); it is called the σ-algebra generated by X, and it contains all sets of
the form X^{−1}(V), where V ∈ B(IRn ).
Remark: In fact, {Y^{−1}(V) : V ∈ A} is a σ-algebra, provided A is.
8. Doob-Dynkin Lemma. Let X and Y be r.v.’s. Y is measurable with respect to
σ(X) if and only if Y = g(X) for some Borel-measurable function g : IRn → IRn .
9. Distribution. Given X : Ω → IRn , define a set function μX on B(IRn ) as follows:
μX(V) = IP(X^{−1}(V)) = IP{ω : X(ω) ∈ V }.
We call μX the distribution of X.
It can be shown that μX is a probability measure on (IRn , B(IRn )). Thus, for each
random variable X, (IRn , B(IRn ), μX ) is a probability space.
10. Mathematical Expectation. The mathematical expectation of X (abbrev. the
expectation of X, or the expected value of X), denoted E[X], is defined as
E[X] = ∫_Ω X dIP,
which can be shown to equal ∫_{IRn} x dμX(x), provided ∫_Ω |X| dIP < ∞. Otherwise, it is not defined.
The (mathematical) expectation of X is a kind of “average.”
More generally, for a reasonable function f : IRn → IR,
E[f(X)] = ∫_{IRn} f(x) dμX(x).
(To be precise, f must be Borel measurable and such that ∫_{IRn} |f(x)| dμX(x) < ∞.)
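The identity E[f(X)] = ∫ f dμX is what makes plain Monte Carlo work: averaging f over samples drawn from μX approximates the expectation. A quick numerical illustration of ours (NumPy assumed; the choice f(x) = x², X ~ N(0, 1) is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# X ~ N(0, 1) and f(x) = x^2, so E[f(X)] = Var(X) = 1.
samples = rng.standard_normal(1_000_000)
mc_estimate = np.mean(samples**2)   # sample average ≈ ∫ x^2 dμ_X(x) = 1
```

With a million samples the estimate is within about a thousandth of the exact value 1.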
11. Independence. The conditional probability of A given B is defined to be
IP(A | B) = IP(A ∩ B) / IP(B),
provided IP(B) > 0.
We say A and B are independent if IP(A | B) = IP(A).
It can be shown that two events A and B (with IP(B) > 0) are independent if and only if IP(A ∩ B) = IP(A) IP(B).
More generally, let H1 , H2 , . . . be families of sets in F . We say that they are
independent if for any finite collection of events Hi1 , Hi2 , . . ., Hik (where k is a positive
integer, and Hij ∈ Hij for each j ∈ {1, · · · , k}), we have
IP (Hi1 ∩ · · · ∩ Hik ) = IP (Hi1 ) · · · IP (Hik ).
In particular, X and Y are independent r.v.’s if σ(X) and σ(Y ) are independent.
Suppose that X and Y are integrable. If X and Y are independent, then
E[XY] = E[X] E[Y].
12. Conditional Expectation.
Let (Ω, F , IP) be a probability space, and let X be a r.v. with E|X| < ∞. Let H be
another σ-algebra of subsets of Ω with H ⊂ F .
Then there exists a unique r.v. Z(ω) such that
(i) Z is H-measurable,
(ii) for all H ∈ H,
∫_H Z(ω) dIP(ω) = ∫_H X(ω) dIP(ω).
Equivalently, for every h which is bounded and H-measurable,
E[Zh] = ∫_Ω Z(ω) h(ω) dIP(ω) = ∫_Ω X(ω) h(ω) dIP(ω) = E[Xh].
This Z is called the conditional expectation of X w.r.t. H, denoted by E[X | H].
Here, H refers to the known (i.e., given) information. Under such a condition (i.e.,
H is given), we evaluate the (conditional) expectation of X, which is Z = E[X | H].
Note. Sometimes, we call E[X] the unconditional expectation of X.
13. Some Properties of Conditional Expectations.
(i) E[aX + bY | H] = a E[X | H] + b E[Y | H].
(ii) E[E[X | H]] = E[X].
(iii) If X is H-measurable, then E[X | H] = X.
(iv) If X is independent of H, then E[X | H] = E[X].
(v) E[Y · X | H] = Y · E[X | H], if Y is H-measurable.
(vi) Given σ-algebras G ⊂ H ⊂ F , the following “tower property” holds:
E[E[X | H] | G] = E[E[X | G] | H] = E[X | G].
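When H is generated by a partition of a finite Ω, E[X | H] is simply the IP-weighted average of X over each partition cell. A small sketch of ours (the sample space, probabilities and values are made-up illustrations):

```python
def conditional_expectation(prob: dict, x: dict, partition) -> dict:
    """E[X | H] on a finite sample space, where H = sigma(partition).
    prob maps sample points to probabilities, x maps them to values of X.
    The result Z is constant on each cell: the weighted average of X there."""
    z = {}
    for cell in partition:
        p_cell = sum(prob[w] for w in cell)
        avg = sum(prob[w] * x[w] for w in cell) / p_cell
        for w in cell:
            z[w] = avg
    return z

prob = {1: 0.25, 2: 0.25, 3: 0.5}
x = {1: 4.0, 2: 8.0, 3: 2.0}
z = conditional_expectation(prob, x, [{1, 2}, {3}])
# On the cell {1,2}: (0.25*4 + 0.25*8)/0.5 = 6.0; on {3}: 2.0.
```

One can check on this example that property (ii) above holds: averaging Z against IP gives back E[X].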
14. Exercise 1. Let Ω = {1, 2, 3, 4, 5} and U = {{4, 5}, ∅, Ω}.
(a) Is U a σ-algebra? Find the smallest σ-algebra σ(U) containing U.
(b) Let F = {{4}, {3, 5}, {1, 2}, {1, 2, 3, 5}, {1, 2, 4}, {3, 4, 5}, ∅, Ω}. Prove that F is a
σ-algebra.
(c) Let X : Ω → IR be defined by
X(4) = 3,   X(5) = X(3) = 7,   X(1) = X(2) = 0.
Prove that X is F -measurable.
(d) A probability measure IP : F → [0, 1] is given by
IP({1, 2}) = 1/3,   IP({4}) = 1/3,   IP({3, 5}) = 1/3.
Find IP({1, 2, 3, 5}), IP(X = 0), IP(X ≥ 3).
(e) Find
E[X] = ∫_Ω X(ω) dIP(ω) = Σ_{k∈ZZ} k IP[X = k].
§2. Stochastic Processes.
1. A stochastic process is a family {Xt }t∈T of r.v.’s. For each t, Xt : Ω → IRn is a
random variable. For each ω, the map
t → Xt (ω)
is called a path of the process.
Usually, T = [0, ∞) or [0, T ]. One may also regard Xt (ω) as the value of the process
at time t for the experiment ω.
2. Example. Brownian motion in IRn .
It can be written as any of the following: Bt (ω) = B(t, ω) = Bt . One may regard
Bt (ω) as the position of pollen grain ω at time t. Bt is a Gaussian process and
t → Bt (ω) is continuous. (See Appendix II at the end of these notes for a brief
introduction to Brownian motion, and Appendix III for Gaussian processes.)
Basic Properties. Denote by E^x the mathematical expectation when B0 = x.
(i) E^x[Bt] = x.
(ii) E^x[(Bt − x)^2] = nt (where n is the dimension).
(iii) E^x[(Bt − Bs)^2] = n(t − s) if t > s.
(iv) E^x[(B_{tj+1} − B_{tj})(B_{tk+1} − B_{tk})] = 0 if (tj , tj+1) ∩ (tk , tk+1) = ∅.
(v) Express B(t, ω) = (B1 (t, ω), · · · , Bn (t, ω)) ∈ IRn . Then B1 (t, ω), B2 (t, ω), . . .,
Bn (t, ω) are independent 1-dim. Brownian motions.
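Properties (i) and (ii) can be checked by simulation: a standard discretization builds each Brownian path as a cumulative sum of independent N(0, Δt) increments. A sketch of ours (NumPy assumed; path counts and grid are arbitrary), here in dimension n = 1 starting at x = 0:

```python
import numpy as np

rng = np.random.default_rng(1)

n_paths, n_steps, T = 50_000, 100, 1.0
dt = T / n_steps
# Each path: B_t is the running sum of independent N(0, dt) increments, B_0 = 0.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B = np.cumsum(increments, axis=1)

mean_BT = np.mean(B[:, -1])        # ≈ 0     (property (i) with x = 0)
var_BT = np.mean(B[:, -1] ** 2)    # ≈ T = 1 (property (ii) with n = 1)
```

Both sample statistics match the stated values up to Monte Carlo error.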
3. Population Growth Model.
Consider the simple population growth model
dN(t)/dt = a(t) N(t),   N(0) = N0 (constant),
where N (t) is the size of the population at time t, and a(t) is the relative rate of
growth at time t. It might happen that a(t) is not completely known, but subject to
some random environmental effects, so that we have
a(t) = r(t) + α(t) “noise”,
and hence,
dN(t)/dt = (r(t) + α(t) “noise”) N(t),
where the function r(t) is assumed to be nonrandom.
Try to model the “noise” by a stochastic process W (t) (the so-called white noise).
Desired properties are:
(i) If t1 ≠ t2 , then Wt1 and Wt2 are independent.
(ii) Wt is a stationary process, i.e., (W(t1 + h), W(t2 + h), . . ., W(tk + h)) has the same
joint distribution as (W(t1), . . ., W(tk)), for all k, h, tj .
(iii) E[W(t)] = 0.
But, no such measurable process W (·) exists.
Rather than considering the equation
dN(t)/dt = (r + α W(t)) N(t),
we look at the integrated equation
N(t) = N(0) + ∫_0^t r N(s) ds + α ∫_0^t N(s) W(s) ds,
and replace W (s)ds by dV (s) for some unknown process V (·).
From the desired properties of W (·), we get the following properties of V (·): in view
of
Δh V (t) = V (t + h) − V (t) ≈ W (t) · h,
V(t) has independent, stationary increments such that E[Δh V(t)] = 0 (for any
h > 0).
The only such process with continuous paths is Brownian motion.
Conclusion. Interpret the noise equation as the integral equation
N(t) = N(0) + ∫_0^t r N(s) ds + ∫_0^t α N(s) dB(s).
We have to define an integral of the form ∫_0^T f(t, ω) dB(t, ω). This cannot be done
in the classical way, because B(t) does not have finite variation.
4. Exercise 2. Let B(t) be n-dimensional Brownian motion starting at 0.
(a) Prove that
E^0[B(s) · B(t)] = n min(s, t).
(b) If n = 2, find IP0 [Bt ∈ Dρ ], where
Dρ = {(x1 , x2 ) ∈ IR2 : x21 + x22 < ρ2 }.
(Recall that, for an n-dimensional Brownian motion, Bt is an n-tuple random vector,
i.e.,
B(t, ω) = (B1 (t, ω), · · · , Bn (t, ω)) ∈ IRn ,
of which B1 (t, ω), B2 (t, ω), . . ., Bn (t, ω) are independent 1-dim. Brownian motions.)
§3. Construction of the Itô Integral.
(Recall that we start out with step functions when we try to define Riemann integrals.)
1. Basic idea: first define ∫_0^T φ(t, ω) dB(t, ω) for step functions φ, then extend it to
more general integrands.
2. If φ is a step function of the form
φ(t, ω) = Σ_{i=1}^m ei(ω) χ_{[ti, ti+1)}(t),
then define
∫_0^T φ(t, ω) dB(t) := Σ_{i=1}^m ei(ω) [B(ti+1) − B(ti)].
Notation. Here χ_{[ti, ti+1)}(·) is called the indicator function of the interval [ti , ti+1),
which means χ_{[ti, ti+1)}(t) = 1 if t ∈ [ti , ti+1), and 0 otherwise.
3. Example. Consider the following two step functions:
φ1(t, ω) = Σ_i B(ti) χ_{[ti, ti+1)}(t),
φ2(t, ω) = Σ_i B(ti+1) χ_{[ti, ti+1)}(t).
Thus,
∫_0^T φ1(t, ω) dB(t) = Σ_i B(ti) [B(ti+1) − B(ti)],
whose expectation is equal to 0. To see this, simply note first that B(ti) and B(ti+1) −
B(ti) are independent with E[B(ti)] = E[B(ti+1) − B(ti)] = 0. Thus,
E[∫_0^T φ1(t, ω) dB(t)] = E[Σ_i B(ti) [B(ti+1) − B(ti)]]
= Σ_i E[B(ti)] E[B(ti+1) − B(ti)]
= 0.
Now turn to the integral of φ2 , which is given by
∫_0^T φ2(t, ω) dB(t) = Σ_i B(ti+1) [B(ti+1) − B(ti)].
Its expectation can be evaluated:
E[∫_0^T φ2(t, ω) dB(t)] = Σ_i E(B(ti+1) [B(ti+1) − B(ti)])
= Σ_i E([B(ti+1) − B(ti)] [B(ti+1) − B(ti)])
= Σ_i E[(B(ti+1) − B(ti))^2]
= Σ_i (ti+1 − ti) = T.
So, in spite of the fact that both φ1 and φ2 appear to be very reasonable approximations to f(t, ω) = Bt(ω), their integrals are not close to each other at all, no matter
what partition 0 = t0 < t1 < · · · < tm < tm+1 = T is chosen.
This example shows that one cannot accept both φ1 and φ2 as “admissible” integrands. In fact, φ2 is not admissible. We need to impose additional assumptions on
the integrands f(t, ω) in order to have a satisfactory theory for the Itô integral
∫_0^T f(t, ω) dBt .
4. Class of Integrands. Denote by V(0, T) the class of processes f(t, ω) such that
(i) f(t, ω) is (t, ω)-measurable;
(ii) for each t, f(t, ω) is Ft-measurable, where Ft is the σ-algebra generated by Brownian motion up to time t, i.e.,
Ft = σ(Bs , s ≤ t);
(iii) Growth condition:
E[∫_0^T f(t, ω)^2 dt] < ∞.
Then, we can show that there exists a sequence of step functions φk(t, ω) ∈ V(0, T)
such that, as k → ∞,
E[∫_0^T (f(t, ω) − φk(t, ω))^2 dt] → 0.
Furthermore, one can show that ∫_0^T φk dB converges in L2(IP), and this limit is denoted by ∫_0^T f dB. In other words,
E[(∫_0^T f(t, ω) dBt − ∫_0^T φk(t, ω) dBt)^2] → 0
as k → ∞. Hence, we can define the Itô integral of f, ∫_0^T f dB, as follows:
∫_0^T f(t, ω) dBt = lim_{k→∞} ∫_0^T φk(t, ω) dBt .
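As a concrete instance of this limiting procedure, for f(t, ω) = Bt(ω) the left-endpoint step sums converge in L2 to (B_T^2 − T)/2 (a fact proved later in the course; we use it here only as a numerical target). A sketch of ours on a fine grid (NumPy assumed; grid sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

n_paths, n_steps, T = 5_000, 1_000, 1.0
dt = T / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
B = np.hstack([np.zeros((n_paths, 1)), B])
dB = np.diff(B, axis=1)

ito_sum = np.sum(B[:, :-1] * dB, axis=1)   # step-function approximation of the integral
target = 0.5 * (B[:, -1] ** 2 - T)         # claimed L2 limit: (B_T^2 - T)/2

l2_error = np.mean((ito_sum - target) ** 2)   # small, and → 0 as the grid is refined
```

Halving dt roughly halves l2_error, consistent with L2 convergence along refining partitions.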
5. Remarks:
(i) A stochastic process f(t, ω) is called “adapted” w.r.t. Ft if for each t, f(t, ω) is
Ft -measurable. Intuitively, this means that the value of f(t, ω) can be described
in terms of the values of B(s, ω) for s ≤ t.
(ii) Let (Ω, F , IP) be a probability space. A r.v. X belongs to L2(IP) if
E[X^2] = ∫_Ω X^2 dIP < ∞.
We say that Xn → X in L2(IP) if
E[(Xn − X)^2] → 0
as n → ∞. Note that this implies that there is a subsequence Xnk (ω) such that
Xnk (ω) → X(ω) for almost all ω, as k → ∞.
6. Examples.
(i) f(t, ω) = B(t/2, ω) is adapted.
(ii) f(t, ω) = B(t + 1, ω) is not adapted.
(iii) f(t, ω) = ∫_0^t B(s, ω) ds is adapted.
(iv) f(t, ω) = max_{0≤s≤t} B(s, ω) is adapted.
(v) φ1 (t, ω) (mentioned in the example of §3.3) is adapted, whereas φ2 (t, ω) is not.
7. Exercise 3. Decide whether or not the following processes are Ft-adapted:
(i) f(t, ω) = B(√t, ω);
(ii) f(t, ω) = B(t^2, ω);
(iii) f(t, ω) = ∫_0^t B(s, ω) ds.
§4. Properties of Itô Integrals & Itô Formula.
The next topics will be the properties of Itô integrals and how to compute these integrals.
Appendix I: Martingales
§5. Martingales.
1. Definition of Filtration. By a filtration on (Ω, F , IP), we mean a nondecreasing
family {Ft , t ≥ 0} of sub-σ-algebras of F , i.e., for 0 ≤ s ≤ t,
F0 ⊂ Fs ⊂ Ft ⊂ F .
For easy notation, denote by F∞ = σ(∪t Ft ). Obviously, F∞ ⊂ F .
The set-up (Ω, F , IP, {Ft , t ≥ 0}) is then called a filtered space.
2. Definition of Adapted Process. A stochastic process {Xt , t ≥ 0} on (Ω, F , IP) is
said to be adapted to the given filtration {Ft , t ≥ 0} if Xt ∈ Ft , i.e., Xt is measurable
with respect to the σ-algebra Ft , for all t ≥ 0. (A related, slightly stronger notion
is that of an {Ft , t ≥ 0}-progressively measurable process.)
3. Definition of Natural Filtration. If Ft = σ(Xs , 0 ≤ s ≤ t) for each t, it will be
called the natural filtration of the process {Xt , t ≥ 0}.
Obviously, the natural filtration of {Xt , t ≥ 0} is the smallest filtration relative to
which {Xt , t ≥ 0} is adapted.
4. An Intuitive Meaning. The information about the chosen ω that is available to
us at time n consists of the values Zn (ω) for every Fn -measurable r.v. Zn . {Xn } is
adapted if the value Xn (ω) is known to us at time n.
5. Definition. A real-valued stochastic process Xt , t ∈ [0, ∞), adapted to the filtration
(Ft ) is a martingale relative to Ft if
(i) for each t, E|Xt| < ∞, i.e., Xt is integrable,
(ii) for every pair s, t such that s < t, E[Xt | Fs] = Xs a.s.
If E[Xt | Fs] ≥ Xs in the above (ii), then the process Xt , 0 ≤ t < ∞, is called
a submartingale. A process X such that −X is a submartingale is called a supermartingale. Clearly, a process which is both a submartingale and a supermartingale
is a martingale.
Appendix II: Brownian Motions (Wiener Processes)
§6. Introduction.
We begin with a definition of Brownian motion. (For simplicity’s sake, only one-dimensional B.m. is defined here.) Let (Ω, F , IP) be a probability space. Let T =
IR+ = [0, ∞) and let B(IR) denote the Borel σ-algebra of IR.
1. Definition. A real-valued stochastic process {Bt : t ∈ IR+ }, where Bt : Ω → IR,
is a Brownian motion if it has the following properties: for each t > 0, Bt is measurable w.r.t.
F /B(IR), i.e., for each t > 0, Bt is a random variable, and
(a) B0 (ω) = 0, ∀ ω ∈ Ω;
(b) the map t → Bt (ω) is a continuous function of t ∈ IR+ for all ω;
(c) for every t, s ≥ 0, Bt+s − Bt is independent of {Bu : 0 ≤ u ≤ t}, and has a
normal distribution with mean 0 and variance s.
2. Comments & Observations. The conditions (b) and (c) are the essential ones.
For each fixed ω ∈ Ω, the function t → Bt(ω), t ≥ 0, is called a sample path (realization, trajectory) of the Brownian motion associated with ω. By (b), Brownian
paths are continuous.
We may think of ω itself as a real-valued function defined on [0, ∞), i.e.,
ω : [0, ∞) → IR such that ω(t) = Bt (ω).
The condition (b) says that Ω, as the space of all sample paths, is a space of continuous functions. It turns out that we may indeed take Ω = C([0, ∞); IR), the
space of all real-valued continuous functions defined on [0, ∞), which is called the
“canonical” Brownian motion. For a canonical Brownian motion, each ω ∈ Ω is
itself a sample path. Besides having the advantage of being intuitive, this point of
view is useful for the further analysis of measures on C([0, ∞); IR), since this space
is a complete separable metric space (i.e., Polish space).
From (c) one can deduce that, for t0 < t1 < . . . < tn , the random variables Bt0 , Bt1 −
Bt0 , . . . , Btn − Btn−1 are independent, and
IP{Bt+h − Bt ∈ A} = ∫_A (1/√(2πh)) e^{−x²/(2h)} dx.   (6.1)
In other words, Bt+h − Bt is normally distributed with mean 0 and variance h, i.e.,
Bt+h − Bt ∼ N(0, h).
Thus, Bt has independent, normally distributed increments.
As for the condition (a), it says that the starting point of this Brownian motion
is the origin, which is a convenient normalization. We in fact frequently speak of
{x + Bt : t ∈ IR+ } as a Brownian motion started at x. Note that this starting point
x can be a fixed real number, or a random variable independent of B.
One more note on notation is that Bt (ω) may sometimes be written as B(t, ω) and
viewed as a function on [0, ∞) × Ω to IR.
3. Historical Notes. The Brownian motion process, sometimes called the Wiener
process, is a process of tremendous practical and theoretical significance. It originated as
(i) a model of the phenomenon observed by Robert Brown, a Scottish botanist, in the
summer of 1827, that “pollen grains suspended in water perform a continual
swarming motion.” Hence it was named Brownian motion, after Robert Brown. Then,
(ii) in Louis Bachelier’s (1900) work as a model of the stock market. (Bachelier is the
acknowledged father of quantitative methods in finance. In his thesis Théorie de
la spéculation, Bachelier was not only among the first to look at the properties of
B.m., but he also derived an option pricing formula.)
(iii) Of course, it was also used as a model to explain the ceaseless irregular motions
of tiny particles suspended in a fluid. The first explanation of the phenomenon of
Brownian motion was given by Albert Einstein in 1905. He showed that Brownian
motion could be explained by assuming that the immersed particle was continually
being subjected to bombardment by the molecules of the surrounding medium.
(iv) But, Brownian motion is complicated and it is not surprising that it took more than
another decade to get a clear picture of such a stochastic process. The preceding
concise definition of this stochastic process underlying Brownian motion was given
by Norbert Wiener (1894 – 1964), who laid a rigorous mathematical foundation
and gave a proof of its existence in a series of papers originating in 1918. This
explains why it is now also called a Wiener process. In the sequel, we will use
both Brownian motion and Wiener process interchangeably.
(v) It was with the work of P. Samuelson in 1965 that Brownian motion reappeared
and became firmly established as a modeling tool for finance.
Coming back to Bachelier’s (1900) work, we have to point out that his paper was
at first largely ignored by academics for many decades, but now his work stands as
the innovative first step in a mathematical theory of stock markets that has greatly
altered the financial world today.
However, we should also indicate here that B.m. itself is not an adequate stochastic
process for a stock market model. It will be clear later that a standard B.m. has
constant mean, whereas the stock of a company usually grows at some rate, if only
due to inflation. Moreover, it may be too “noisy” (i.e., the variance of the increments
may be bigger than that observed for the stock) or not noisy enough. One may
scale to change the noisiness and artificially introduce a drift. Yet this still won’t
be a good model. Another reason why B.m. is inadequate as a market model is
that it would predict negative stock prices. When Merton, Black and Scholes did their
ground-breaking work on option pricing in the early 1970s, they adopted the geometric
B.m. framework.
4. In the above we have defined Brownian motion without reference to a filtration.
Adding a filtration is a straightforward matter. Brownian motion relative to a
filtered probability space is defined as follows.
Definition. The process {Bt , t ≥ 0} is a Brownian motion with respect to the filtration {Ft }, 0 ≤ t < ∞, if:
(i) it is adapted to {Ft }, i.e., for each t, Bt is measurable with respect to Ft ;
(ii) for all 0 ≤ s, t, Bt+s − Bt is independent of Ft ;
(iii) it is a Brownian motion as defined in item 1.
§7. Brownian Motion as a Martingale
1. Let {Bt : t ≥ 0} be a Brownian motion and define Ft = σ({Bs : s ≤ t}). Then
(Bt , Ft )t≥0 is a martingale.
Let us check that it is indeed a martingale. First, Bt ∈ L1 for all t, because
Bt ∼ N(0, t), and, second, for 0 ≤ s ≤ t,
E[Bt − Bs | Fs] = 0, equivalently, E[Bt | Fs] = Bs ,
since Bt − Bs is independent of Fs .
2. Likewise, since Bt − Bs ∼ N(0, t − s) independently of Fs , we have
E[(Bt − Bs)^2 | Fs] = E[(Bt − Bs)^2] = t − s.
But
E[(Bt − Bs)^2 | Fs] = E[Bt^2 − 2 Bs Bt + Bs^2 | Fs] = E[Bt^2 | Fs] − Bs^2 ,
using properties of conditional expectation, so we have (a.s.) that
E[Bt^2 − t | Fs] = Bs^2 − s,
and we conclude that
(Bt^2 − t, Ft)_{t≥0} is a martingale.   (7.1)
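The martingale property of B_t² − t implies that E[(M_t − M_s) h(B_s)] = 0 for any bounded F_s-measurable h, where M_t = B_t² − t. This can be spot-checked by simulation (a sketch of ours; NumPy assumed, s, t and the choices of h arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

n, s, t = 500_000, 0.5, 1.0
Bs = rng.normal(0.0, np.sqrt(s), size=n)               # B_s ~ N(0, s)
Bt = Bs + rng.normal(0.0, np.sqrt(t - s), size=n)      # B_t = B_s + indep. N(0, t-s)

Ms, Mt = Bs**2 - s, Bt**2 - t
# E[M_t | F_s] = M_s implies E[(M_t - M_s) h(B_s)] = 0 for bounded
# F_s-measurable h; take h = 1 and h = sign(B_s) as spot checks.
check1 = np.mean(Mt - Ms)
check2 = np.mean((Mt - Ms) * np.sign(Bs))
```

Both averages are zero up to Monte Carlo error, as the martingale property demands.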
As a matter of fact, one can even prove the following startling converse to (7.1),
which is known as Paul Lévy’s martingale characterization of Brownian motion.
3. Theorem. (Paul Lévy) Let (Xt)_{t≥0} be a continuous martingale with respect to the
filtration Ft , where
Ft = σ(Xs , s ≤ t),
X0 = 0 a.s., and suppose that Xt^2 − t is a martingale with respect to Ft . Then
Xt , t ≥ 0, is a Brownian motion.
Note that, by a continuous martingale, we mean a martingale such that t → Xt (ω)
is a continuous map for all ω.
4. Exponential Martingale. For a Brownian motion {Bt : t ≥ 0}, some elementary
arguments also show that for any θ ∈ IR (or indeed, for θ ∈ C),
exp(θBt − (1/2) θ^2 t) is a martingale;   (7.2)
all one needs is that for 0 ≤ s ≤ t,
E(exp[θ(Bt − Bs)]) = exp[(1/2) θ^2 (t − s)],
which is just the moment-generating function of a Gaussian distribution. These
exponential martingales are extremely useful in many ways.
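In particular, since the exponential martingale starts at 1, its expectation equals 1 for every t and θ. A quick Monte Carlo check of ours (NumPy assumed; θ and t arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

theta, t, n = 0.7, 1.0, 1_000_000
Bt = rng.normal(0.0, np.sqrt(t), size=n)   # B_t ~ N(0, t)

# E[exp(theta*B_t - theta^2 * t / 2)] should equal 1 for every theta and t,
# because exp(theta*B_t - theta^2 t/2) is a martingale started at 1.
mean_exp = np.mean(np.exp(theta * Bt - 0.5 * theta**2 * t))
```

The sample mean is 1 up to Monte Carlo error, matching the moment-generating-function computation above.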
One small point to note here in connection with the exponential martingales is that
if we define the Hermite polynomials Hn(t, x) by
exp(θx − (1/2) θ^2 t) = Σ_{n≥0} (θ^n / n!) Hn(t, x),
then, for 0 ≤ s ≤ t,
E[exp(θBt − (1/2) θ^2 t) | Fs] = Σ_{n≥0} (θ^n / n!) E[Hn(t, Bt) | Fs]
and, by (7.2),
E[exp(θBt − (1/2) θ^2 t) | Fs] = exp[θBs − (1/2) θ^2 s] = Σ_{n≥0} (θ^n / n!) Hn(s, Bs),
so, by comparing coefficients of θ^n, we deduce that
Hn(t, Bt) is a martingale for each n.
It is easy to check that H1(t, x) = x and H2(t, x) = x^2 − t, so, in particular, (7.2)
implies (7.1); Lévy’s theorem above is essentially the converse to this.
Appendix III: Gaussian Processes
§8. Basic Properties.
1. In complete generality, a (real-valued) process {Xt , t ∈ T } indexed by some set T
is said to be a Gaussian process if, for any t1 , . . . , tn ∈ T , the joint distribution
of (X(t1), . . . , X(tn)) is multivariate normal (Gaussian). As any multidimensional
normal distribution is specified by two parameters, namely, its mean vector and covariance matrix, the (finite-dimensional) distributions of the process X are specified by
the functions
μ(t) = E[X(t)],   σ(s, t) = Cov(Xs , Xt).   (8.1)
(By this, we mean that if we are given μ and σ, we can work out the joint distribution
of (X(t1 ), . . . , X(tn )) for any t1 , . . . , tn ∈ T .)
It is well known that the covariance matrix is nonnegative definite. The fact that the
joint distribution of (X(t1), . . . , X(tn)) is multidimensional normal can be expressed
as follows:
E[exp(i Σ_{j=1}^n θj X_{tj})] = exp(i Σ_{j=1}^n θj μ(tj) − (1/2) Σ_{j,k=1}^n θj σ(tj , tk) θk)
for all θj ∈ IR. (Note that here the θj ’s are parameters.)
In the study of Gaussian processes, one usually assumes that μ ≡ 0, to which the
general case can be reduced by considering the Gaussian process X(t) − μ(t).
2. It is obvious that {B(t), t ≥ 0} is a Gaussian process, with mean 0, and covariance
σ(s, t) = s ∧ t, (s, t ≥ 0).
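The covariance σ(s, t) = s ∧ t can be verified empirically: simulate many paths on a grid and average B_s B_t over the paths. A sketch of ours (NumPy assumed; the grid and the choice of s, t are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

n_paths, n_steps, T = 200_000, 20, 1.0
dt = T / n_steps
B = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)

times = dt * np.arange(1, n_steps + 1)
i, j = 4, 15                           # s = times[4] = 0.25, t = times[15] = 0.8
emp_cov = np.mean(B[:, i] * B[:, j])   # ≈ min(s, t) = 0.25
```

The empirical covariance matches s ∧ t = min(s, t) to Monte Carlo accuracy, in agreement with item 2.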
3. Theorem. Any continuous real-valued Gaussian process, with mean 0 and covariance σ(s, t) = s ∧ t, (s, t ≥ 0) is a Brownian motion.
Just check the definition!
In other words, Brownian motion is the unique Gaussian process having continuous
trajectories, zero means, and covariance function given in the above item 2. For easy
reference, we re-state the above equivalent definition of one dimensional Brownian
motion starting at 0. Let {Bt , t ≥ 0} be a real-valued process with B0 = 0. It is a
standard Brownian motion if it satisfies
(i) Bt is a Gaussian process (i.e., all its finite dimensional distributions are multivariate normal),
(ii) E[Bt] = 0, E[Bt Bs] = s ∧ t = min(s, t),
(iii) with probability one, t → Bt is continuous.
4. This simple fact turns out to be an extremely efficient means of checking when a
given process is a standard Brownian motion.