Stochastic Analysis in Financial Mathematics
MA4265/FE5204 Lecture Notes 1
Basic Probability & Measure Theory

Main Reference. Øksendal, B. (2000/03): Stochastic Differential Equations: An Introduction with Applications. 5th/6th Ed. Springer-Verlag.

Auxiliary References.
(i) Baxter, Martin & Rennie, Andrew (1996): Financial Calculus: An Introduction to Derivative Pricing. Cambridge University Press.
(ii) Lamberton, D., Lapeyre, B. (1996): Introduction to Stochastic Calculus Applied to Finance. Chapman & Hall.
(iii) Steele, J. Michael (2001): Stochastic Calculus and Financial Applications. New York: Springer.

Some Examples. To begin with, let us present four examples as a preview of what will be covered in the course.

1. A model for stochastic growth. Consider the differential equation

dN(t)/dt = a(t) N(t).   (∗)

If a(t) = r + α(t) · "noise", what is the solution of (∗)? (Of course, the term "noise" needs to be clearly defined before any attempt can be made to solve the differential equation.) If α(t) ≡ 0, so that a(t) = r, then (∗) becomes a simple ordinary differential equation which can be solved easily. Indeed, N(t) = N_0 e^{rt}. What if α(t) ≠ 0? More will be said in §2.3.

2. Optimal Stopping. Assume that the value of some asset at time t, X(t), is given by

dX(t)/dt = (r + α(t) · "noise") X(t),

where r, α are known. If we sell this asset at time t, we get the amount e^{−ρt} [X(t) − a]. (Here, ρ is the so-called "discount rate" and a refers to the tax and/or transaction cost.) We would like to ask at what time we should decide to sell. Mathematically speaking, we would like to maximize the quantity

E[ e^{−ρτ} (X(τ) − a) ]

over all "admissible" selling times τ. By "admissible," we mean that the decision to sell at or before time t should depend only on the values X(s), s ≤ t, and not on any future value of X(s).

3. Stochastic Control.
Suppose that we have two investment possibilities: a bond with price dynamics

dX_0(t)/dt = r X_0(t),

with r a constant, and a stock with price dynamics

dX_1(t)/dt = (μ + α · "noise") X_1(t).

(Note that typically μ > r.) Let U(x) be a "utility function", say U(x) = x^γ (with 0 < γ < 1). Let u(t) be the fraction of the total wealth invested in the stock at time t. Denote by Z(t) = Z^{(u)}(t) the corresponding total wealth at time t. The problem that we would like to deal with is to maximize

E[ U(Z^{(u)}(T)) ]

over all "controls" u = u(t). (This problem was considered by R. Merton in 1971. We will discuss it in class in due course.)

4. Problems in Mathematical Finance. To give an example, consider a European call option, which gives the owner of the option the right, but not the obligation, to buy one stock of a specified type at a specified price K at a specified future time T. What is the right price for such an option at time t = 0? In 1973, the Black-Scholes option pricing formula was published, which gives an exact formula for the right price in the market.

§1. Preliminaries.

To build up and develop the theory of probability, we begin with a triple (Ω, F, IP) with certain proper mathematical structure. Roughly speaking, Ω refers to the space of all sample points, i.e., all possible outcomes of the statistical experiment concerned; F refers to the collection of "events", which are subsets of Ω; and IP refers to the probability measure. It goes without saying that axioms (or postulates) and rules that govern certain operations have to be imposed on the triple (Ω, F, IP) so that we may build up a mathematical system that is rich in structure. Indeed, the first few lectures are more or less a kind of "language" course, in which "alphabets" (symbols) and "grammatical rules" (axioms or postulates) are to be defined and discussed. Let us first focus on F.

1. σ-Algebra

Definition.
A non-empty collection F of subsets of Ω is called a σ-algebra on Ω if and only if
(i) A ∈ F =⇒ A^c ∈ F, where A^c denotes the complement of A in Ω, i.e., A^c = Ω \ A;
(ii) every union of a countable collection of sets in F is again in F. That is, if A_1, A_2, ... are in F, then ∪_{n≥1} A_n ∈ F.

Note. An immediate consequence is that Ω ∈ F, and hence ∅ ∈ F as well. The pair (Ω, F) is called a measurable space. A subset F of Ω is called measurable (with respect to F) if F ∈ F. (In probability theory, a measurable set is also called an event.) A σ-algebra can be regarded as a mathematical model of information.

2. By a probability measure IP on a measurable space (Ω, F) we mean a nonnegative set function defined on F, i.e., IP : F → [0, 1], satisfying IP(Ω) = 1, IP(∅) = 0, and

IP( ∪_{i=1}^∞ E_i ) = Σ_{i=1}^∞ IP(E_i)

for any sequence (E_i) of disjoint measurable sets. (The above property of IP is often referred to by saying that IP is countably additive (or σ-additive), and hence called the axiom of countable additivity.)

3. By a probability space (Ω, F, IP) we mean a measurable space (Ω, F) together with a probability measure IP defined on F. Ω alone is usually called the sample space, each of its elements ω is then a sample point, every member of F an event, and IP a probability measure. An event F with IP(F) = 1 is called a sure event, which means that F occurs almost surely, or equivalently, F occurs with probability 1.

4. A Few Observations & Properties.
(a) The defining condition §1.1.(ii) above is equivalent to the following: if (A_i)_{i=1}^∞ is a sequence of sets in F, then ∩_{i=1}^∞ A_i must again be in F.
(b) The power set of Ω, denoted by 2^Ω, refers to the collection of all subsets of Ω. The power set of Ω is a σ-algebra; it is sometimes called the total σ-algebra. Note also that the collection of two sets {∅, Ω} is a σ-algebra, too. It is called the trivial σ-algebra.
(c) If F_i, i ∈ I, are σ-algebras, then ∩_{i∈I} F_i is again a σ-algebra.
Here I ≠ ∅ is an arbitrary index set (i.e., possibly uncountable).
(d) Given any collection U of subsets of Ω, there is a smallest σ-algebra σ(U) which contains U. That is, σ(U) is a σ-algebra containing U, and if B is any σ-algebra containing U, then B contains σ(U).

Remark: The smallest σ-algebra containing U, denoted by σ(U), is called the σ-algebra generated by U.

5. Examples.
(i) Example 1. Let Ω = {1, 2, 3}. Put F = {∅, Ω, {1}, {2, 3}}. By referring to the definition in item 1, it can be shown that F is a σ-algebra. Next, put G = {∅, Ω, {2}}. Observe that G is not a σ-algebra, as {2}^c = Ω \ {2} = {1, 3} does not belong to G. How do we find the smallest σ-algebra that contains G? Obviously, σ(G) has to contain {1, 3}. Now, by going through the definition of a σ-algebra, one can verify that {∅, Ω, {2}, {1, 3}} is a σ-algebra that contains G, and hence the smallest σ-algebra containing G. That is, σ(G) = {∅, Ω, {2}, {1, 3}}. If H = {{2, 3}}, what is the smallest σ-algebra containing H? (Refer to F.)
(ii) Example 2. Let U be the collection of all open sets in IR^n. The smallest σ-algebra σ(U) containing U is called the Borel σ-algebra, denoted B(IR^n).

Remark: A set V ⊂ IR^n is said to be open in IR^n if for every point v ∈ V there exists an r > 0 such that the open ball centered at v with radius r,

B(v, r) = {x ∈ IR^n : ‖x − v‖ < r},

is contained in V, i.e., B(v, r) ⊂ V. Here ‖x‖ = (x_1^2 + · · · + x_n^2)^{1/2} is the Euclidean norm in IR^n for x = (x_1, ..., x_n) ∈ IR^n. For instance, (−2, 5) is an open set (more precisely, an open interval) in IR; an open ball as above is also an open set.

6. We have just introduced what a probability space is. In the following, we will define "random variables" on a given probability space. An IR^n-valued function Y defined on Ω is said to be a (vector-valued) random variable (r.v. for short) if for every open set V ⊂ IR^n we have

Y^{−1}(V) = {ω : Y(ω) ∈ V} ∈ F.
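On a finite sample space, the σ-algebra axioms and the construction of σ(U) can be checked mechanically by closing a collection under complement and union. A small sketch verifying Example 1 (the function names here are ours, purely illustrative):

```python
def is_sigma_algebra(omega, F):
    """Check the defining conditions on a finite sample space: non-empty,
    closed under complement and under union (on a finite Omega, closure
    under pairwise union implies closure under countable union)."""
    omega = frozenset(omega)
    sets = {frozenset(A) for A in F}
    return (len(sets) > 0
            and all(omega - A in sets for A in sets)
            and all(A | B in sets for A in sets for B in sets))

def generated(omega, U):
    """sigma(U): repeatedly close U (plus the empty set and Omega) under
    complement and union until nothing new appears; terminates because
    Omega is finite."""
    omega = frozenset(omega)
    F = {frozenset(), omega} | {frozenset(A) for A in U}
    while True:
        new = ({omega - A for A in F} | {A | B for A in F for B in F}) - F
        if not new:
            return F
        F |= new

omega = {1, 2, 3}
print(is_sigma_algebra(omega, [set(), omega, {1}, {2, 3}]))  # True: F of Example 1
print(is_sigma_algebra(omega, [set(), omega, {2}]))          # False: G lacks {1, 3}
print(sorted(map(sorted, generated(omega, [{2}]))))
# [[], [1, 2, 3], [1, 3], [2]], i.e. sigma(G) = {emptyset, Omega, {2}, {1, 3}}
```

The same `generated` routine answers the question about H = {{2, 3}} as well.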
Y is said to be F-measurable if Y is a random variable. For abbreviation, we will write Y ∈ F.

7. Let X be a r.v. It can be shown that {X^{−1}(V) : V ∈ B(IR^n)} is a σ-algebra. Denote it by σ(X); it is called the σ-algebra generated by X, and it contains all sets of the form X^{−1}(V), where V ∈ B(IR^n).

Remark: In fact, {Y^{−1}(V) : V ∈ A} is a σ-algebra, provided A is.

8. Doob-Dynkin Lemma. Let X and Y be r.v.'s. Y is measurable with respect to σ(X) if and only if Y = g(X) for some Borel-measurable function g : IR^n → IR^n.

9. Distribution. Given X : Ω → IR^n, define a set function μ_X on B(IR^n) as follows:

μ_X(V) = IP(X^{−1}(V)) = IP{ω : X(ω) ∈ V}.

We call μ_X the distribution of X. It can be shown that μ_X is a probability measure on (IR^n, B(IR^n)). Thus, for each random variable X, (IR^n, B(IR^n), μ_X) is a probability space.

10. Mathematical Expectation. The mathematical expectation of X (abbrev. the expectation of X, or the expected value of X), denoted E[X], is defined as

E[X] = ∫_Ω X dIP,

which can be proved to equal ∫_{IR^n} x dμ_X(x), provided ∫_Ω |X| dIP < ∞. Otherwise, it is not defined.

The (mathematical) expectation of X is a kind of "average." More generally, for a reasonable function f : IR^n → IR,

E[f(X)] = ∫_{IR^n} f(x) dμ_X(x).

(To be precise, f must be Borel measurable and such that ∫_{IR^n} |f(x)| dμ_X(x) < ∞.)

11. Independence. The conditional probability of A given B is defined to be

IP(A | B) = IP(A ∩ B) / IP(B),

provided IP(B) > 0. We say A and B are independent if IP(A | B) = IP(A). It can be shown that two events A and B are independent if and only if IP(A ∩ B) = IP(A) IP(B). More generally, let H_1, H_2, ... be families of sets in F. We say that they are independent if for any finitely many events H_{i_1}, H_{i_2}, ..., H_{i_k} (where k is a positive integer and, for each j ∈ {1, ..., k}, H_{i_j} is a member of the family H_{i_j}), we have

IP(H_{i_1} ∩ · · · ∩ H_{i_k}) = IP(H_{i_1}) · · · IP(H_{i_k}).

In particular, X and Y are independent r.v.'s if σ(X) and σ(Y) are independent.
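The product rule for expectations of independent random variables can be seen numerically: for independent draws, the sample mean of XY approaches the product of the sample means. A quick Monte Carlo sketch (the distributions are chosen arbitrarily for illustration):

```python
import random

# X ~ Uniform(0, 1) and Y ~ Uniform(0, 2), drawn independently, so
# E[XY] = E[X] E[Y] = 0.5 * 1.0 = 0.5.
random.seed(6)
M = 100_000
xs = [random.random() for _ in range(M)]
ys = [2 * random.random() for _ in range(M)]
e_xy = sum(x * y for x, y in zip(xs, ys)) / M
e_x, e_y = sum(xs) / M, sum(ys) / M
print(abs(e_xy - e_x * e_y) < 0.01)  # True up to sampling error
```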
Suppose that X and Y are integrable. If X and Y are independent, then

E[XY] = E[X] E[Y].

12. Conditional Expectation. Let (Ω, F, IP) be a probability space, and let X be a r.v. with E|X| < ∞. Let H be another σ-algebra of subsets of Ω, H ⊂ F. Then there exists a unique r.v. Z(ω) such that
(i) Z is H-measurable,
(ii) for all H ∈ H,

∫_H Z(ω) dIP(ω) = ∫_H X(ω) dIP(ω).

Equivalently, for every h which is bounded and H-measurable,

E[Zh] = ∫_Ω Z(ω) h(ω) dIP(ω) = ∫_Ω X(ω) h(ω) dIP(ω) = E[Xh].

This Z is called the conditional expectation of X w.r.t. H, denoted by E[X | H]. Here, H refers to the known (i.e., given) information. Under such a condition (i.e., given H), we evaluate the (conditional) expectation of X, which is Z = E[X | H].

Note. Sometimes, we call E[X] the unconditional expectation of X.

13. Some Properties of Conditional Expectations.
(i) E[aX + bY | H] = a E[X | H] + b E[Y | H].
(ii) E[E[X | H]] = E[X].
(iii) If X is H-measurable, then E[X | H] = X.
(iv) If X is independent of H, then E[X | H] = E[X].
(v) E[Y · X | H] = Y · E[X | H], if Y is H-measurable.
(vi) Given σ-algebras G ⊂ H ⊂ F, the following "tower property" holds:

E[E[X | H] | G] = E[E[X | G] | H] = E[X | G].

14. Exercise 1. Let Ω = {1, 2, 3, 4, 5} and U = {{4, 5}, ∅, Ω}.
(a) Is U a σ-algebra? Find the smallest σ-algebra σ(U) containing U.
(b) Let F = {{4}, {3, 5}, {1, 2}, {1, 2, 3, 5}, {1, 2, 4}, {3, 4, 5}, ∅, Ω}. Prove that F is a σ-algebra.
(c) Let X : Ω → IR be defined by X(4) = 3, X(5) = X(3) = 7, X(1) = X(2) = 0. Prove that X is F-measurable.
(d) A probability measure IP : F → [0, 1] is given by IP({1, 2}) = 1/3, IP({4}) = 1/3, IP({3, 5}) = 1/3. Find IP({1, 2, 3, 5}), IP(X = 0), IP(X ≥ 3).
(e) Find E[X] = ∫_Ω X(ω) dIP(ω) = Σ_{k∈ZZ} k IP[X = k].

§2. Stochastic Processes.

1. A stochastic process is a family {X_t}_{t∈T} of r.v.'s. For each t, X_t : Ω → IR^n is a random variable.
For each ω, the map t → X_t(ω) is called a path of the process. Usually, T = [0, ∞) or [0, T]. One may also regard X_t(ω) as the value of the process at time t for the experiment ω.

2. Example. Brownian motion in IR^n. It can be written as any of the following: B_t(ω) = B(t, ω) = B_t. One may regard B_t(ω) as the position of pollen grain ω at time t. B_t is a Gaussian process and t → B_t(ω) is continuous. (See Appendix II at the end of these notes for a brief introduction to Brownian motion, and Appendix III for Gaussian processes.)

Basic Properties. Denote by E^x the mathematical expectation when B_0 = x.
(i) E^x[B_t] = x.
(ii) E^x[(B_t − x)^2] = nt (where n is the dimension).
(iii) E^x[(B_t − B_s)^2] = n(t − s) if t > s.
(iv) E^x[(B_{t_{j+1}} − B_{t_j})(B_{t_{k+1}} − B_{t_k})] = 0 if (t_j, t_{j+1}) ∩ (t_k, t_{k+1}) = ∅.
(v) Express B(t, ω) = (B_1(t, ω), ..., B_n(t, ω)) ∈ IR^n. Then B_1(t, ω), B_2(t, ω), ..., B_n(t, ω) are independent 1-dim. Brownian motions.

3. Population Growth Model. Consider the simple population growth model

dN(t)/dt = a(t) N(t),   N(0) = N_0 (constant),

where N(t) is the size of the population at time t, and a(t) is the relative rate of growth at time t. It might happen that a(t) is not completely known, but subject to some random environmental effects, so that we have

a(t) = r(t) + α(t) · "noise",

and hence

dN(t)/dt = (r(t) + α(t) · "noise") N(t),

where the function r(t) is assumed to be nonrandom. Try to model the "noise" by a stochastic process W(t) (the so-called white noise). Desired properties are:
(i) If t_1 ≠ t_2, then W_{t_1} and W_{t_2} are independent.
(ii) W_t is a stationary process, i.e., W(t_1 + h), W(t_2 + h), ..., W(t_k + h) have the same joint distribution as W(t_1), ..., W(t_k), for all k, h, t_j.
(iii) E[W(t)] = 0.

But no such measurable process W(·) exists.
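Although white noise does not exist as an ordinary process, the model can still be simulated by interpreting "noise · dt" as a Brownian increment over a small time step, i.e., the Euler-Maruyama discretization of the integral equation discussed below. A sketch (all parameter values are illustrative):

```python
import math
import random

def euler_growth(N0, r, alpha, T=1.0, n=500, rng=random):
    """Euler-Maruyama scheme for dN = r N dt + alpha N dB:
    each step replaces 'noise * dt' by a N(0, dt) Brownian increment."""
    dt = T / n
    N = N0
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))
        N += r * N * dt + alpha * N * dB
    return N

random.seed(1)
M, N0, r, alpha, T = 4000, 100.0, 0.05, 0.2, 1.0
mean_NT = sum(euler_growth(N0, r, alpha, T) for _ in range(M)) / M
# The noise term has mean zero, so E[N(T)] = N0 * e^{rT}, exactly as in
# the deterministic case; the simulation average should be close to it.
print(abs(mean_NT - N0 * math.exp(r * T)) < 1.5)  # True up to sampling error
```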
Rather than considering the equation

dN(t)/dt = (r + α W(t)) N(t),

we look at the integrated equation

N(t) = N(0) + ∫_0^t r N(s) ds + α ∫_0^t N(s) W(s) ds,

and replace W(s) ds by dV(s) for some unknown process V(·). From the desired properties of W(·), we get the following properties of V(·): in view of

Δ_h V(t) = V(t + h) − V(t) ≈ W(t) · h,

V(t) has independent, stationary increments such that E[Δ_h V(t)] = 0 (for any h > 0). The only such process with continuous paths is Brownian motion.

Conclusion. Interpret the noise equation as the integral equation

N(t) = N(0) + ∫_0^t r N(s) ds + ∫_0^t α N(s) dB(s).

We have to define an integral of the form ∫_0^T f(t, ω) dB(t, ω). This cannot be done in the classical way, because B(t) does not have finite variation.

4. Exercise 2. Let B(t) be n-dimensional Brownian motion starting at 0.
(a) Prove that E^0[B(s) · B(t)] = n min(s, t).
(b) If n = 2, find IP^0[B_t ∈ D_ρ], where D_ρ = {(x_1, x_2) ∈ IR^2 : x_1^2 + x_2^2 < ρ^2}.
(Recall that, for an n-dimensional Brownian motion, B_t is an n-tuple random vector, i.e., B(t, ω) = (B_1(t, ω), ..., B_n(t, ω)) ∈ IR^n, of which B_1(t, ω), B_2(t, ω), ..., B_n(t, ω) are independent 1-dim. Brownian motions.)

§3. Construction of the Itô Integral.

(Recall that we start out with step functions when we try to define Riemann integrals.)

1. Basic idea: first define ∫_0^T φ(t, ω) dB(t, ω) for step functions φ, then extend it to more general integrands.

2. If φ is a step function of the form

φ(t, ω) = Σ_{i=1}^m e_i(ω) χ_{[t_i, t_{i+1})}(t),

then define

∫_0^T φ(t, ω) dB(t) := Σ_{i=1}^m e_i(ω) [B(t_{i+1}) − B(t_i)].

Notation. Here χ_{[t_i, t_{i+1})}(·) is called the indicator function of the interval [t_i, t_{i+1}): χ_{[t_i, t_{i+1})}(t) = 1 if t ∈ [t_i, t_{i+1}), and 0 otherwise.

3. Example. Consider the following two step functions:

φ_1(t, ω) = Σ_i B(t_i) χ_{[t_i, t_{i+1})}(t),
φ_2(t, ω) = Σ_i B(t_{i+1}) χ_{[t_i, t_{i+1})}(t).
Thus,

∫_0^T φ_1(t, ω) dB(t) = Σ_i B(t_i) [B(t_{i+1}) − B(t_i)],

whose expectation is equal to 0. To see this, simply note first that B(t_i) and B(t_{i+1}) − B(t_i) are independent, with E[B(t_i)] = E[B(t_{i+1}) − B(t_i)] = 0. Thus,

E[∫_0^T φ_1(t, ω) dB(t)] = Σ_i E[B(t_i) (B(t_{i+1}) − B(t_i))]
                         = Σ_i E[B(t_i)] E[B(t_{i+1}) − B(t_i)]
                         = 0.

Now turn to the integral of φ_2, which is given by

∫_0^T φ_2(t, ω) dB(t) = Σ_i B(t_{i+1}) [B(t_{i+1}) − B(t_i)].

Its expectation can be evaluated:

E[∫_0^T φ_2(t, ω) dB(t)] = Σ_i E[B(t_{i+1}) (B(t_{i+1}) − B(t_i))]
                         = Σ_i E[(B(t_{i+1}) − B(t_i)) (B(t_{i+1}) − B(t_i))]
                         = Σ_i E[(B(t_{i+1}) − B(t_i))^2]
                         = Σ_i (t_{i+1} − t_i) = T.

So, in spite of the fact that both φ_1 and φ_2 appear to be very reasonable approximations to f(t, ω) = B_t(ω), their integrals are not close to each other at all, no matter what partition 0 = t_0 < t_1 < · · · < t_m < t_{m+1} = T is chosen. This example shows that one cannot accept both φ_1 and φ_2 as "admissible" integrands. In fact, φ_2 is not admissible. We need to impose additional assumptions on the integrands f(t, ω) in order to have a satisfactory theory for the Itô integral ∫_0^T f(t, ω) dB_t.

4. Class of Integrands. Denote by V(0, T) the class of processes f(t, ω) such that
(i) f(t, ω) is (t, ω)-measurable;
(ii) for each t, f(t, ω) is F_t-measurable, where F_t is the σ-algebra generated by the Brownian motion up to time t, i.e., F_t = σ(B_s, s ≤ t);
(iii) growth condition: E[∫_0^T f(t, ω)^2 dt] < ∞.

Then we can show that there exists a sequence of step functions φ_k(t, ω) ∈ V(0, T) such that, as k → ∞,

E[∫_0^T (f(t, ω) − φ_k(t, ω))^2 dt] → 0.

Furthermore, one can show that ∫_0^T φ_k dB converges in L^2(IP), and this limit is denoted by ∫_0^T f dB. In other words,

E[( ∫_0^T f(t, ω) dB_t − ∫_0^T φ_k(t, ω) dB_t )^2] → 0 as k → ∞.

Hence, we can define the Itô integral of f, ∫_0^T f dB, as follows:

∫_0^T f(t, ω) dB_t = lim_{k→∞} ∫_0^T φ_k(t, ω) dB_t.

5. Remarks:
(i) A stochastic process f(t, ω) is called "adapted" w.r.t. F_t if, for each t, f(t, ω) is F_t-measurable.
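Returning to the example in §3.3, the two expectations can be reproduced by simulation: the left-endpoint sums average out near 0, the right-endpoint sums near T. A sketch (grid and sample sizes are arbitrary):

```python
import math
import random

def endpoint_sums(T=1.0, m=200, rng=random):
    """One sample of the two sums from Section 3.3: phi_1 evaluates B at
    the left endpoint t_i (adapted, the Ito choice), phi_2 at the right
    endpoint t_{i+1} (not adapted)."""
    dt = T / m
    B = [0.0]
    for _ in range(m):
        B.append(B[-1] + rng.gauss(0.0, math.sqrt(dt)))
    s1 = sum(B[i] * (B[i + 1] - B[i]) for i in range(m))
    s2 = sum(B[i + 1] * (B[i + 1] - B[i]) for i in range(m))
    return s1, s2

random.seed(2)
M, T = 4000, 1.0
pairs = [endpoint_sums(T) for _ in range(M)]
m1 = sum(p[0] for p in pairs) / M
m2 = sum(p[1] for p in pairs) / M
print(abs(m1) < 0.1)      # sample mean of the phi_1 sums: close to 0
print(abs(m2 - T) < 0.1)  # sample mean of the phi_2 sums: close to T
```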
Intuitively, this means that the value of f(t, ω) can be described in terms of the values of B(s, ω) for s ≤ t.
(ii) Let (Ω, F, IP) be a probability space. A r.v. X belongs to L^2(IP) if

E[X^2] = ∫_Ω X^2 dIP < ∞.

We say that X_n → X in L^2(IP) if E[(X_n − X)^2] → 0 as n → ∞. Note that this implies that there is a subsequence X_{n_k}(ω) such that X_{n_k}(ω) → X(ω) for almost all ω, as k → ∞.

6. Examples.
(i) f(t, ω) = B(t/2, ω) is adapted.
(ii) f(t, ω) = B(t + 1, ω) is not adapted.
(iii) f(t, ω) = ∫_0^t B(s, ω) ds is adapted.
(iv) f(t, ω) = max_{0≤s≤t} B(s, ω) is adapted.
(v) φ_1(t, ω) (mentioned in the example of §3.3) is adapted, whereas φ_2(t, ω) is not.

7. Exercise 3. Decide whether or not the following processes are F_t-adapted:
(i) f(t, ω) = B(√t, ω);
(ii) f(t, ω) = B(t^2, ω);
(iii) f(t, ω) = ∫_0^t B(s, ω) ds.

§4. Properties of Itô Integrals & Itô Formula.

The next topics will be: properties of Itô integrals and how to compute these integrals.

Appendix I: Martingales

§5. Martingales.

1. Definition of Filtration. By a filtration on (Ω, F, IP), we mean a nondecreasing family {F_t, t ≥ 0} of sub-σ-algebras of F, i.e., for 0 < s < t, F_0 ⊂ F_s ⊂ F_t ⊂ F. For ease of notation, denote F_∞ = σ(∪_t F_t). Obviously, F_∞ ⊂ F. The set-up (Ω, F, IP, {F_t, t ≥ 0}) is then called a filtered space.

2. Definition of Adapted Process. A stochastic process {X_t, t ≥ 0} on (Ω, F, IP) is said to be adapted to the given filtration {F_t, t ≥ 0} if X_t ∈ F_t, i.e., X_t is measurable with respect to the σ-algebra F_t, for all t ≥ 0. (For processes with suitable path regularity, e.g. right-continuous paths, this is equivalent to {F_t, t ≥ 0}-progressive measurability.)

3. Definition of Natural Filtration. If F_t = σ(X_s, 0 ≤ s ≤ t) for each t, it is called the natural filtration of the process {X_t, t ≥ 0}. Obviously, the natural filtration of {X_t, t ≥ 0} is the smallest filtration relative to which {X_t, t ≥ 0} is adapted.

4. An Intuitive Meaning.
The information about the chosen ω that is available to us at time n consists of the values Z_n(ω) for every F_n-measurable r.v. Z_n. {X_n} is adapted if the value X_n(ω) is known to us at time n.

5. Definition. A real-valued stochastic process X_t, t ∈ [0, ∞), adapted to the filtration (F_t), is a martingale relative to F_t if
(i) for each t, E|X_t| < ∞, i.e., X_t is integrable;
(ii) for every pair s, t with s < t, E[X_t | F_s] = X_s a.s.

If E[X_t | F_s] ≥ X_s in (ii) above, then the process X_t, 0 ≤ t < ∞, is called a submartingale. A process X such that −X is a submartingale is called a supermartingale. Clearly, a process which is both a submartingale and a supermartingale is a martingale.

Appendix II: Brownian Motions (Wiener Processes)

§6. Introduction.

We begin with a definition of Brownian motion. (For simplicity's sake, only one-dimensional B.m. is defined here.) Let (Ω, F, IP) be a probability space. Let T = IR_+ = [0, ∞) and let B(IR) denote the σ-algebra generated by the Borel sets of IR.

1. Definition. A real-valued stochastic process {B_t : t ∈ IR_+}, where B_t : Ω → IR, is a Brownian motion if it has the following properties: for each t > 0, B_t is measurable w.r.t. F/B(IR), i.e., for each t > 0, B_t is a random variable, and
(a) B_0(ω) = 0, ∀ ω ∈ Ω;
(b) the map t → B_t(ω) is a continuous function of t ∈ IR_+ for all ω;
(c) for every t, s ≥ 0, B_{t+s} − B_t is independent of {B_u : 0 ≤ u ≤ t}, and has a normal distribution with mean 0 and variance s.

2. Comments & Observations. The conditions (b) and (c) are the essential ones. For each fixed ω ∈ Ω, the function t → B_t(ω), t ≥ 0, is called a sample path (realization, trajectory) of the Brownian motion associated with ω. By (b), Brownian paths are continuous. We may think of ω itself as a real-valued function defined on [0, ∞), i.e., ω : [0, ∞) → IR such that ω(t) = B_t(ω). Condition (b) then says that Ω, as the space of all sample paths, is a space of continuous functions.
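Conditions (a)-(c) translate directly into a simulation recipe: start at 0 and accumulate independent N(0, dt) increments over a time grid. A minimal sketch:

```python
import math
import random

def brownian_path(T=1.0, n=1000, seed=None):
    """Discretized sample path of 1-d Brownian motion: B_0 = 0
    (condition (a)) and independent N(0, dt) increments (condition (c));
    continuity (condition (b)) is recovered in the limit of fine grids."""
    rng = random.Random(seed)
    dt = T / n
    B = [0.0]
    for _ in range(n):
        B.append(B[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return B

path = brownian_path(T=1.0, n=1000, seed=42)
print(path[0] == 0.0)  # True: the path starts at the origin
print(len(path))       # 1001 grid values covering [0, 1]
```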
It turns out that we may indeed take Ω = C([0, ∞); IR), the space of all real-valued continuous functions defined on [0, ∞); the resulting process is called the "canonical" Brownian motion. For a canonical Brownian motion, each ω ∈ Ω is itself a sample path. Besides having the advantage of being intuitive, this point of view is useful for the further analysis of measures on C([0, ∞); IR), since this space is a complete separable metric space (i.e., a Polish space). From (c) one can deduce that, for t_0 < t_1 < ... < t_n, the random variables

B_{t_0}, B_{t_1} − B_{t_0}, ..., B_{t_n} − B_{t_{n−1}}

are independent, and

IP{B_{t+h} − B_t ∈ A} = ∫_A (1/√(2πh)) e^{−x^2/(2h)} dx.   (6.1)

In other words, B_{t+h} − B_t is normally distributed with mean 0 and variance h, i.e., B_{t+h} − B_t ∼ N(0, h). Thus, B_t has independent, normally distributed increments. As for condition (a), it says that the starting point of this Brownian motion is the origin, which is a convenient normalization. We in fact frequently speak of {x + B_t : t ∈ IR_+} as a Brownian motion started at x. Note that this starting point x can be a fixed real number, or a random variable independent of B. One more note on notation: B_t(ω) may sometimes be written as B(t, ω) and viewed as a function from [0, ∞) × Ω to IR.

3. Historical Notes. The Brownian motion process, sometimes called the Wiener process, is a process of tremendous practical and theoretical significance. It originated (i) as a model of the phenomenon observed by Robert Brown, a Scottish botanist, in the summer of 1827, that "pollen grains suspended in water perform a continual swarming motion." Hence it was named after Robert Brown: Brownian motion. It then appeared (ii) in Louis Bachelier's (1900) work as a model of the stock market. (Bachelier is the acknowledged father of quantitative methods in finance. In his thesis Théorie de la spéculation, Bachelier was not only among the first to look at the properties of B.m., but he also derived an option pricing formula.)
(iii) Of course, it was also used as a model to explain the ceaseless irregular motions of tiny particles suspended in a fluid. The first explanation of the phenomenon of Brownian motion was given by Albert Einstein in 1905. He showed that Brownian motion could be explained by assuming that the immersed particle was continually being subjected to bombardment by the molecules of the surrounding medium.
(iv) But Brownian motion is complicated, and it is not surprising that it took more than another decade to get a clear picture of such a stochastic process. The preceding concise definition of the stochastic process underlying Brownian motion was given by Norbert Wiener (1894–1964), who laid a rigorous mathematical foundation and gave a proof of its existence in a series of papers originating in 1918. This explains why it is now also called a Wiener process. In the sequel, we will use the terms Brownian motion and Wiener process interchangeably.
(v) It was with the work of P. Samuelson in 1965 that Brownian motion reappeared and became firmly established as a modeling tool for finance. Coming back to Bachelier's (1900) work, we have to point out that his paper was at first largely ignored by academics for many decades, but now his work stands as the innovative first step in a mathematical theory of stock markets that has greatly altered the financial world today. However, we should also indicate here that B.m. itself is not an adequate stochastic process for a stock market model. It will be clear later that a standard B.m. has constant mean, whereas the stock of a company usually grows at some rate, if only due to inflation. Moreover, it may be too "noisy" (i.e., the variance of the increments may be bigger than that observed for the stock) or not noisy enough. One may scale to change the noisiness and artificially introduce a drift, yet this still won't be a good model. Another reason why B.m.
is inadequate as a market model is that it would predict negative stock prices. When Merton, Black and Scholes did their ground-breaking work on option pricing in the early 1970s, they adopted the geometric B.m. framework.

4. In the above we have defined Brownian motion without reference to a filtration. Adding a filtration is a straightforward matter. Brownian motion relative to a filtered probability space is defined as follows.

Definition. The process {B_t, t ≥ 0} is a Brownian motion with respect to the filtration {F_t}, 0 ≤ t < ∞, if:
(i) it is adapted to {F_t}, i.e., for each t, B_t is measurable with respect to F_t;
(ii) for all s, t ≥ 0, B_{t+s} − B_t is independent of F_t;
(iii) it is a Brownian motion as defined in item 1.

§7. Brownian Motion as a Martingale

1. Let {B_t : t ≥ 0} be a Brownian motion and define F_t = σ({B_s : s ≤ t}). Then (B_t, F_t)_{t≥0} is a martingale. Let us check that it is indeed a martingale. First, B_t ∈ L^1 for all t, because B_t ∼ N(0, t); second, for 0 ≤ s ≤ t,

E[B_t − B_s | F_s] = 0, equivalently, E[B_t | F_s] = B_s,

since B_t − B_s is independent of F_s.

2. Likewise, since B_t − B_s ∼ N(0, t − s) independently of F_s, we have

E[(B_t − B_s)^2 | F_s] = E[(B_t − B_s)^2] = t − s.

But

E[(B_t − B_s)^2 | F_s] = E[B_t^2 − 2 B_s B_t + B_s^2 | F_s] = E[B_t^2 | F_s] − B_s^2,

using properties of conditional expectation. Hence we have (a.s.)

E[B_t^2 − t | F_s] = B_s^2 − s,

and we conclude that

(B_t^2 − t, F_t)_{t≥0} is a martingale.   (7.1)

As a matter of fact, one can even prove the following startling converse to (7.1), which is known as Paul Lévy's martingale characterization of Brownian motion.

3. Theorem. (Paul Lévy) Let (X_t)_{t≥0} be a continuous martingale with respect to the filtration F_t = σ(X_s, s ≤ t), with X_0 = 0 a.s., and suppose that X_t^2 − t is a martingale with respect to F_t. Then (X_t, t ≥ 0) is a Brownian motion.

Note that, by a continuous martingale, we mean a martingale such that t → X_t(ω) is a continuous map for all ω.

4. Exponential Martingale.
For a Brownian motion {B_t : t ≥ 0}, elementary arguments also show that, for any θ ∈ IR (or indeed, for θ ∈ C),

exp(θ B_t − (1/2) θ^2 t) is a martingale;   (7.2)

all one needs is that, for 0 ≤ s ≤ t,

E(exp[θ(B_t − B_s)]) = exp[(1/2) θ^2 (t − s)],

which is just the moment-generating function of a Gaussian distribution. These exponential martingales are extremely useful in many ways. One small point to note here in connection with the exponential martingales is that if we define the Hermite polynomials H_n(t, x) by

exp(θx − (1/2) θ^2 t) = Σ_{n≥0} (θ^n / n!) H_n(t, x),

then, for 0 ≤ s ≤ t,

Σ_{n≥0} (θ^n / n!) E[H_n(t, B_t) | F_s] = E[exp(θ B_t − (1/2) θ^2 t) | F_s]
                                       = exp[θ B_s − (1/2) θ^2 s]
                                       = Σ_{n≥0} (θ^n / n!) H_n(s, B_s),

so, by comparing coefficients of θ^n, we deduce that H_n(t, B_t) is a martingale for each n. It is easy to check that H_1(t, x) = x and H_2(t, x) = x^2 − t, so, in particular, (7.2) implies (7.1); the above Lévy theorem is essentially the converse to this.

Appendix III: Gaussian Processes

§8. Basic Properties.

1. In complete generality, a (real-valued) process {X_t, t ∈ T} indexed by some set T is said to be a Gaussian process if, for any t_1, ..., t_n ∈ T, the joint distribution of (X(t_1), ..., X(t_n)) is multivariate normal (Gaussian). As any multidimensional normal distribution is specified by two parameters, namely its mean vector and covariance matrix, the (finite-dimensional) distributions of the process X are specified by the functions

μ(t) = E[X(t)],   σ(s, t) = Cov(X_s, X_t).   (8.1)

(By this, we mean that if we are given μ and σ, we can work out the joint distribution of (X(t_1), ..., X(t_n)) for any t_1, ..., t_n ∈ T.) It is well known that the covariance matrix is nonnegative definite. The fact that the joint distribution of (X(t_1), ..., X(t_n)) is multidimensional normal can be expressed as follows:

E[exp( i Σ_{j=1}^n θ_j X_{t_j} )] = exp( i Σ_{j=1}^n θ_j μ(t_j) − (1/2) Σ_{j,k} θ_j σ(t_j, t_k) θ_k )

for all θ_j ∈ IR. (Note that here the θ_j's are parameters.)
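Differentiating the generating function of §7.4 in θ gives the recursion H_{n+1}(t, x) = x H_n(t, x) − n t H_{n−1}(t, x), with H_0 = 1 and H_1(t, x) = x. A sketch that checks H_1, H_2 and the generating-function identity numerically (the recursion is standard; the function name is ours):

```python
import math

def hermite(n, t, x):
    """H_n(t, x) from exp(theta*x - theta^2*t/2) = sum_n theta^n/n! H_n(t, x),
    computed via the recursion H_{n+1} = x H_n - n t H_{n-1}."""
    H = [1.0, x]
    for k in range(1, n):
        H.append(x * H[-1] - k * t * H[-2])
    return H[n]

t, x = 0.5, 1.3
print(hermite(1, t, x) == x)          # True: H_1(t, x) = x
print(hermite(2, t, x) == x * x - t)  # True: H_2(t, x) = x^2 - t, as in (7.1)

# Cross-check against the generating function for a small theta:
theta = 1e-3
series = sum(theta**n / math.factorial(n) * hermite(n, t, x) for n in range(10))
print(abs(series - math.exp(theta * x - 0.5 * theta**2 * t)) < 1e-12)  # True
```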
In the study of Gaussian processes, one usually assumes that μ ≡ 0; the general case can be reduced to this by considering the Gaussian process X(t) − μ(t).

2. It is obvious that {B(t), t ≥ 0} is a Gaussian process, with mean 0 and covariance σ(s, t) = s ∧ t (s, t ≥ 0).

3. Theorem. Any continuous real-valued Gaussian process with mean 0 and covariance σ(s, t) = s ∧ t (s, t ≥ 0) is a Brownian motion. Just check the definition! In other words, Brownian motion is the unique Gaussian process having continuous trajectories, zero mean, and the covariance function given in item 2 above. For easy reference, we re-state the above equivalent definition of one-dimensional Brownian motion starting at 0. Let {B_t, t ≥ 0} be a real-valued process with B_0 = 0. It is a standard Brownian motion if it satisfies
(i) B_t is a Gaussian process (i.e., all its finite-dimensional distributions are multivariate normal);
(ii) E[B_t] = 0, E[B_t B_s] = s ∧ t = min(s, t);
(iii) with probability one, t → B_t is continuous.

4. This simple fact turns out to be an extremely efficient means of checking whether a given process is a standard Brownian motion.
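The covariance identity E[B_s B_t] = s ∧ t in item 3(ii) can be checked empirically using independent increments (B_s and B_t − B_s are independent). A quick Monte Carlo sketch:

```python
import math
import random

# Estimate E[B_s B_t] by sampling B_s ~ N(0, s) and then
# B_t = B_s + (independent N(0, t - s) increment).
random.seed(5)
s, t, M = 0.4, 1.0, 20_000
acc = 0.0
for _ in range(M):
    Bs = random.gauss(0.0, math.sqrt(s))
    Bt = Bs + random.gauss(0.0, math.sqrt(t - s))
    acc += Bs * Bt
cov = acc / M  # the means are zero, so this estimates the covariance
print(abs(cov - min(s, t)) < 0.05)  # True: covariance is min(s, t) = 0.4
```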