Introduction to Stochastic Processes
August 30, 2006

Contents
1 Measure space and random variables
2 Integration, Expectation and Independence
3 The art of conditioning
4 Martingales
5 Martingale convergence problems
6 Continuous time processes: the Wiener process or Brownian motion
7 Diffusions and Ito processes

Abstract
These notes have a two-fold use: they contain both the material (albeit slightly re-shuffled) of the course on this topic taught by the above authors in Fall 2003, as well as extra notes where we feel that the book on 'Basic Stochastic Processes' is slightly too ephemeral.

1 Measure space and random variables

Definition 1.1 A probability space is a triple (Ω, F, P) with the following properties:
• The sample space of outcomes Ω is a non-empty set;
• the set of observable events F is a σ-algebra over Ω. This means that F is a collection of subsets of Ω with the following properties:
i) Ω ∈ F;
ii) B ∈ F ⇒ Ω \ B ∈ F;
iii) if (B_n)_{n∈N} is a sequence of events in F, then ∪_{n=1}^∞ B_n ∈ F.
F can be interpreted as the amount of information of Ω that can be observed. The smaller F, the less information we have of Ω.
• P is a probability measure on (Ω, F), i.e. P : F → [0, 1] with the properties
i) P{Ω} = 1;
ii) for (B_n)_{n∈N} a sequence of mutually disjoint events in F, i.e. B_i ∩ B_j = ∅ for i ≠ j, one has P{∪_{i=1}^∞ B_i} = Σ_{i=1}^∞ P{B_i} (σ-additivity).

σ-algebras

Problem 1.1 Check that B_1, B_2, ... ∈ F implies ∩_{i=1}^∞ B_i ∈ F, i.e. the intersection of countably many elements of F belongs to F.

The Borel σ-algebra B(R^d) over Ω = R^d is the intersection of all σ-algebras containing the open sets in R^d. It is the smallest σ-algebra containing all open sets in R^d.

Problem 1.2 Show that all one-point sets {x}, x ∈ R, belong to B(R). Show that Q belongs to B(R).

The σ-algebra σ(A) generated by a subset A ⊆ P(Ω) is the intersection of all σ-algebras containing A:

σ(A) := ∩ {B : B is a σ-algebra over Ω with A ⊆ B}.
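For a finite Ω, the generated σ-algebra σ(A) can be computed quite literally as a closure: keep adding complements and (here finite) unions until nothing new appears. A small illustrative sketch in Python; the function name is ours, not from the text:

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close a collection of subsets of a finite omega under
    complement and union; for finite omega this closure is
    exactly the generated sigma-algebra."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:              # close under complements
            c = omega - a
            if c not in family:
                family.add(c)
                changed = True
        for a, b in combinations(current, 2):   # close under unions
            u = a | b
            if u not in family:
                family.add(u)
                changed = True
    return family

# sigma-algebra generated by the one-point set {1} on omega = {1, 2, 3}
F = generate_sigma_algebra({1, 2, 3}, [{1}])
print(sorted(sorted(s) for s in F))  # [[], [1], [1, 2, 3], [2, 3]]
```

Note that σ({1}) contains only four sets: the set {2, 3} must be added as the complement of {1}, but no further sets are forced, so e.g. {2} is not σ({1})-observable.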
Then B(R) is the σ-algebra generated by e.g. the open intervals (−a, b), a, b ∈ Q.

Problem 1.3 Let Ω = Z_+. Suppose that A = {{i} | i ∈ Z_+} is the collection of all one-point sets. Determine the minimal σ-algebra containing A.

Problem 1.4 Let V ⊂ N. Let V be the class of subsets for which the '(Cesàro) density'

γ(V) = lim_{n→∞} #(V ∩ {1, ..., n}) / n

exists. Give an example of sets V, W ∈ V for which V ∩ W ∉ V. Hence, V is not a σ-algebra.

Problem 1.5 Let Ω = {0,1}^{Z_+}, i.e. Ω = {(ω_1, ω_2, ...), ω_n ∈ {0,1}, n = 1, 2, ...}. Define F = σ({ω : ω_n = k}, n ∈ Z_+, k ∈ {0,1}). Describe the σ-algebra F in words. Show that F contains the following sets: (i) A_n = {ω : ω_i = 0, i > n}; (ii) {ω : Σ_{i=1}^∞ ω_i < ∞}; (iii) {ω : Σ_{i=1}^∞ ω_i 2^{−i} < 1/3}; (iv) {ω : lim_{n→∞} Σ_{i=1}^n ω_i / n = 1/2}.

Probability measure

A statement S about points ω ∈ Ω is said to hold almost everywhere (a.e.) if S = {ω | S(ω) is true} ∈ F, and P{S} = 1.

As an example of a simple probability space, take Ω = {±1}^n, F = P(Ω) (power set, or collection of all subsets), and P the Laplace measure on Ω, i.e. P{B} = #(B)/#(Ω).

σ-algebras are complicated objects. It is often easier to work with π-systems. A collection I of subsets of Ω is called a π-system if it is invariant under intersection: I_1, I_2 ∈ I ⇒ I_1 ∩ I_2 ∈ I.

Lemma 1.1 Let µ_1, µ_2 be two probability measures on (Ω, σ(I)), such that µ_1 = µ_2 on I. Then µ_1 = µ_2 on σ(I). That is, if two probability measures agree on a π-system, then they agree on the σ-algebra generated by the π-system.

Problem 1.6 Give a π-system I such that σ(I) = B([0,1]).

Let Ω = [0,1] and F = B([0,1]) the Borel sets on [0,1]. The Lebesgue measure P = λ 'measures' the length of an interval: λ{(a,b]} = b − a. It is not trivial to prove that λ can be extended to a probability measure on ([0,1], B([0,1])).

Let (Ω, F, P) be a probability space and let {A_n}_{n∈N} be a sequence of events (A_n ∈ F, n = 1, ...). Then

lim sup_{n→∞} A_n := ∩_m ∪_{n≥m} A_n = (A_n i.o.),

with i.o. = infinitely often. Explanation: x ∈ lim sup_n A_n iff x ∈ ∪_{n≥m} A_n for all m. Then x ∈ lim sup_n A_n iff for all m there exists n ≥ m such that x ∈ A_n. Similarly

lim inf_{n→∞} A_n := ∪_m ∩_{n≥m} A_n = (A_n eventually).

Then x ∈ lim inf_n A_n iff there exists m such that x ∈ ∩_{n≥m} A_n. That is, x ∈ lim inf_n A_n iff x belongs to all A_n except at most finitely many.

Problem 1.7 Prove that lim inf_{n→∞} A_n ⊂ lim sup_{n→∞} A_n.

The notation A_n ↑ A means: A_n ⊂ A_{n+1}, n ∈ N, A = ∪_n A_n; A_n ↓ A means that A_n ⊃ A_{n+1}, A = ∩_n A_n.

Lemma 1.2 (Monotone convergence of the measure of a set)
i) A_n ↑ A implies P{A_n} ↑ P{A};
ii) A_n ↓ A implies P{A_n} ↓ P{A}.

Problem 1.8 Prove this lemma, see the hint on BSP p. 3. (BSP = Basic Stochastic Processes)

Note that for (ii) it is crucial that we consider probability measures. In the case of a general measure µ, (ii) does not necessarily hold when µ(Ω) = ∞. An example: the Lebesgue measure µ on R with A_n = (n, ∞). Then µ(A_n) = ∞, while A = ∩_n A_n = ∅ and µ(∅) = 0.

Lemma 1.3 (Fatou Lemma for sets)
i) P{lim inf_{n→∞} A_n} ≤ lim inf_{n→∞} P{A_n};
ii) P{lim sup_{n→∞} A_n} ≥ lim sup_{n→∞} P{A_n}.

Here too, (ii) requires finiteness of the measure at play.

Problem 1.9 Give an example where (ii) does not hold.

Proof of the Fatou Lemma. We prove (ii). Let G_m = ∪_{n≥m} A_n; then G_m ↓ G = lim sup_{n→∞} A_n (why?). Hence P{G_m} ↓ P{G}. Since P{G_m} ≥ P{A_n} for n ≥ m, we have that P{G_m} ≥ sup_{n≥m} P{A_n}. Hence

P{G} = ↓lim_{m→∞} P{G_m} ≥ ↓lim_{m→∞} sup_{n≥m} P{A_n} = lim sup_{n→∞} P{A_n}. QED

Problem 1.10 Prove statement (i) of the Fatou Lemma.

Lemma 1.4 (First Borel-Cantelli Lemma) Suppose that Σ_{n=1}^∞ P{A_n} < ∞. Then

P{lim sup_{n→∞} A_n} = P{A_n i.o.} = 0.

Applications of this lemma come later, after introducing the notion of independence.

Random variables

Which functions on a probability space (Ω, F, P) are consistent with the σ-algebra F? These are the measurable functions.

Definition 1.2 A map X : Ω → R is called (F-)measurable if X^{−1}(B) ∈ F for all B ∈ B(R).
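Definition 1.2 becomes very concrete when Ω is finite: measurability of X with respect to F is then a finite check over the preimages of all sets of values. A small sketch; the toy example and names are ours:

```python
from itertools import combinations

def is_measurable(omega, F, X):
    """Check that X^{-1}(B) lies in F for every subset B of the
    (finite) range of X; on a finite omega this is Definition 1.2."""
    values = {X(w) for w in omega}
    for r in range(len(values) + 1):
        for B in combinations(values, r):
            pre = frozenset(w for w in omega if X(w) in B)
            if pre not in F:
                return False
    return True

omega = {1, 2, 3, 4}
# a small sigma-algebra with atoms {1, 2} and {3, 4}
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(omega)}

print(is_measurable(omega, F, lambda w: 1 if w <= 2 else 0))  # True
print(is_measurable(omega, F, lambda w: w))                   # False
```

A function constant on the atoms of F is measurable; the identity is not, because e.g. its preimage {1} is not an observable event in this F.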
In other words: {X ∈ B} := {ω : X(ω) ∈ B} = X^{−1}(B) is an observable event for all B ∈ B(R). In the probabilistic context a measurable, real-valued function is called a random variable. For the present we stick to speaking of measurable functions. If Ω = R^k and F = B(R^k), then we call X a Borel function.

Problem 1.11 Let Ω = [0,1] and let A ⊊ Ω, A ≠ ∅. Determine the minimal σ-algebra F containing A. Classify all (Ω, F)-measurable functions.

The building blocks of these functions are the elementary functions: let A_1, ..., A_n ∈ F be disjoint (A_j ∩ A_i = ∅, j ≠ i) and let a_1, ..., a_n ∈ R. Then

f(ω) = Σ_{i=1}^n a_i 1_{A_i}(ω)

is an elementary function or a simple function. Here 1_{A_i} is the indicator function of A_i, i.e. 1_{A_i}(ω) = 1 for ω ∈ A_i, and 0 otherwise.

Problem 1.12 Show that an elementary function is measurable.

Problem 1.13 Let Ω = R, and F = B(R). Show that the function X : R → R, defined by X(ω) = 1 for ω ∈ Q and X(ω) = 0 for ω ∈ R \ Q, is an elementary function.

In order to show that limits of elementary functions are measurable, we need the following elementary results on measurability.

Lemma 1.5
i) f^{−1} preserves set operations: f^{−1}(∪_α A_α) = ∪_α f^{−1}(A_α), f^{−1}(A^c) = (f^{−1}(A))^c, ...
ii) If C ⊆ B(R) is a collection of sets generating B(R), that is σ(C) = B(R), then f^{−1}(C) ∈ F for all C ∈ C implies that f is F-measurable.
iii) The function g : Ω → R is measurable if {g ≤ x} = {ω : g(ω) ≤ x} ∈ F for all x ∈ R.

Proof. The proof of (i) is straightforward. For the proof of (ii), let C(B) be the collection of elements B (these are sets!) of B(R) with f^{−1}(B) ∈ F. By (i), C(B) is a σ-algebra; by assumption C(B) contains C, hence C(B) contains B(R). (iii) follows from (ii) when we take C = π(R), the class of intervals of the form (−∞, x]. QED

Problem 1.14 Let Ω = R, F = B(R). Show that f : R → R given by f(x) = cos(x) is measurable.

Measurability is preserved under a number of operations.
Lemma 1.6 (Sums and products are measurable) If f, g are measurable and λ ∈ R, then f + g, f · g and λf are measurable.

Proof (partial). It is sufficient by the previous lemma to check that {f + g > x} ∈ F (why?). Now, f(ω) + g(ω) > x iff f(ω) > x − g(ω). Hence there exists q_ω ∈ Q such that f(ω) > q_ω > x − g(ω). It follows that

{f + g > x} = ∪_{q∈Q} ({f > q} ∩ {g > x − q}),

the latter of which is a countable union of elements of F. QED

Lemma 1.7 (Composition lemma) If f is F-measurable and g a Borel function, then the composition g ◦ f is F-measurable.

In the next lemma, the limits may in principle take the values ±∞. All results can be extended to this case, but here we restrict to finite limits. This lemma ensures that non-decreasing limits of elementary functions are measurable. Most 'reasonable' functions fall into this category.

Lemma 1.8 (Measurability of infs, liminfs and lims) Let f_1, f_2, ... be a sequence of measurable functions. Then (i) inf_n f_n, sup_n f_n, and (ii) lim inf_n f_n, lim sup_n f_n are measurable (provided these limits are finite); moreover (iii) {ω : lim_n f_n(ω) exists} ∈ F.

Proof. For (i), use {ω : inf_n f_n(ω) ≥ x} = ∩_n {ω : f_n(ω) ≥ x}. For (ii), let l_n(ω) = inf_{m≥n} f_m(ω). Then l_n is measurable by (i). Moreover,

l(ω) := lim inf_n f_n(ω) = ↑lim_n l_n(ω) = sup_n l_n(ω),

and so {l ≤ x} = ∩_n {l_n ≤ x} ∈ F. For (iii), note that

{lim_n f_n exists} = {lim sup_n f_n < ∞} ∩ {lim inf_n f_n > −∞} ∩ {lim sup_n f_n − lim inf_n f_n = 0}. QED

Problem 1.15 We did not prove the case of sup and lim sup. How does this follow from the inf and lim inf case?

The uniqueness lemma for measures allows us to deduce results on σ-algebras from results on π-systems for these σ-algebras. There is a similar result for measurable functions. The following theorem allows us to deduce results for general measurable functions from results on indicator functions of elements from a π-system for the σ-algebra at hand!
This version is taken from Williams' book; most versions tend to be formulated as assertions on σ-algebras.

Theorem 1.9 ((Halmos) Monotone class Theorem: elementary version) Let H be a class of bounded functions from a set S into R, satisfying the following conditions:
i) H is a vector space over R (i.e. it is an Abelian group w.r.t. addition of functions, and it is closed under scalar multiplication by real scalars, such that (αβ)f = α(βf), (−1)f = −f and (α + β)f = αf + βf, for f ∈ H, α, β ∈ R);
ii) if f_n, n = 1, 2, ..., is a sequence of non-negative functions in H such that f_n ↑ f, with f bounded, then f ∈ H;
iii) the constant function is an element of H.
If H contains the indicator function of every set in a π-system I, then H contains every bounded σ(I)-measurable function.

Coin tossing

Let Ω = {0,1}^N. So, Ω = {(ω_1, ω_2, ...), ω_n ∈ {0,1}, n = 1, ...}. Define

F = σ({ω : ω_n = k} : n ∈ N, k ∈ {0,1}).

Let X_n(ω) be the projection on the n-th co-ordinate: X_n(ω) = ω_n. It is the result of the n-th toss. By definition of F, X_n is a random variable. By Lemma 1.6,

S_n = X_1 + · · · + X_n = number of ones in n tosses

is a random variable. Next, for x ∈ [0,1],

{ω : S_n/n = (number of ones in n tosses)/(number of tosses) → x} = {ω : lim sup_n S_n/n = x} ∩ {ω : lim inf_n S_n/n = x} ∈ F

by Lemma 1.8. Note that this means that the Strong Law of Large Numbers is a meaningful result!

Problem 1.16 Define P{ω : ω_1 = x_1, ..., ω_n = x_n} = 1/2^n, where x_1, ..., x_n ∈ {0,1}. Assume that this can be extended to a probability measure on Ω. Prove the following assertions:
i) E = {ω : Σ_n ω_n < ∞} ∈ F, and P{E} = 0.
ii) The function X(ω) = Σ_n ω_n 2^{−n} is a random variable.
iii) λ(a,b] = P{X ∈ (a,b]} for all intervals (a,b] ⊂ [0,1].
iv) λ(B) = P{X ∈ B} for all Borel sets B ⊂ [0,1].
Hence X has the uniform distribution on [0,1].
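Assertion (iii) of Problem 1.16 can be illustrated numerically by truncating the expansion X(ω) = Σ_n ω_n 2^{−n} after finitely many tosses: over all 2^n equally likely outcomes, P{X ∈ (a,b]} equals b − a exactly for dyadic a, b. A small sketch (the truncation level 10 is an arbitrary choice):

```python
from itertools import product

n = 10  # truncate the binary expansion after 10 tosses
# all 2^n outcomes of n fair coin tosses, each carrying probability 2^-n
xs = [sum(w[i] / 2 ** (i + 1) for i in range(n))
      for w in product((0, 1), repeat=n)]

def prob(a, b):
    """P{X in (a, b]} under the fair-coin measure, truncated at n bits."""
    return sum(1 for x in xs if a < x <= b) / len(xs)

print(prob(0.25, 0.75))  # 0.5   = b - a, exactly, since a and b are dyadic
print(prob(0.0, 0.125))  # 0.125 = b - a
```

Since dyadic rationals with denominator up to 2^10 are represented exactly in binary floating point, the comparisons above are exact, and the truncated X is uniform on the grid {k/2^10 : k = 0, ..., 2^10 − 1}.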
σ-algebra generated by a random variable or a collection of these

Suppose we have a collection of random variables X_t : Ω → R, t ∈ I, where I is some index set. Then X = σ(X_t : t ∈ I) is defined to be the smallest σ-algebra such that each random variable X_t is X-measurable. It follows that X ⊂ F! One can view σ(X_t : t ∈ I) as the information carried by the random variables X_t, t ∈ I. For instance, observing an outcome y = X_1(ω), we can only retrieve the set X_1^{−1}(y) that ω belongs to, and in general not the precise point ω that produced outcome y. Compared to the σ-algebra F, we lose information by observing the outcome of a random variable, and so the σ-algebras σ(X_1), σ(X_1, X_2), ..., X are sub-σ-algebras of F. It makes sense that observing more outcomes y_1 = X_1(ω), y_2 = X_2(ω), ..., provides us more information as to the precise point ω that produced these outcomes. This is consistent with the fact that e.g. σ(X_1, ..., X_n) ⊃ σ(X_1, ..., X_{n−1}): the more outcomes we observe, the bigger ('finer') the generated σ-algebra.

How can we build the σ-algebra X if e.g. the index set is I = N? π-systems help us here: let X_n = σ(X_k : k ≤ n); then ∪_n X_n is a π-system that generates σ(X_n : n ∈ N).

Problem 1.17 Let Ω = [0,1], F = B([0,1]), and

X_1(ω) = 1 if ω ≤ 1/5, and 0 if ω > 1/5;
X_2(ω) = −1 if ω ≤ 1/2, 0 if 1/2 < ω ≤ 3/4, and 2 if ω > 3/4.

Determine σ(X_1), σ(X_2) and σ(X_1, X_2). Describe all σ(X_1, X_2)-measurable functions.

Problem 1.18 Let Ω = R, F = B(R). For X(ω) = cos(ω), determine σ(X). Is Y, defined by Y(ω) = sin(ω), σ(X)-measurable?

Problem 1.19 Prove that the σ-algebra σ(X) generated by the random variable X is given by σ(X) = X^{−1}(B) := ({ω | X(ω) ∈ B} : B ∈ B(R)), and that σ(X) is generated by the π-system π(X) := ({ω | X(ω) ≤ x} : x ∈ R). How can one characterise π-systems generating σ(X_1, ..., X_n) and X? Explain.

Theorem 1.10 Let (Ω, F) be a measure space. Let Ω_1 be another space and f : Ω_1 → Ω a function. Let F_1 = σ(f^{−1}(A), A ∈ F) = f^{−1}(F) be the σ-algebra generated by the inverse images of A ∈ F under f. Then a function g : Ω_1 → R is F_1-measurable if and only if there exists an F-measurable function h : Ω → R such that g = h(f).

An application of the above theorem is the Doob-Dynkin lemma.

Lemma 1.11 (in BSP: Doob-Dynkin lemma) Let X : Ω → R be a random variable. Then Y : Ω → R is σ(X)-measurable if and only if there exists a Borel function f : R → R such that Y = f(X).

The lemma can be proved by first proving it for elementary functions and then extending this to positive and then to general measurable functions.

Problem 1.20 Show how the Doob-Dynkin lemma follows from Theorem 1.10. Suppose that X is an elementary function. Show the assertion of the Lemma by explicitly constructing σ(X) and by subsequently specifying how to choose f.

2 Integration, Expectation and Independence

It is convenient here to assume a general measure µ, i.e. we have a measure space (Ω, F, µ). As a reminder: we say that an event A ∈ F occurs µ-a.s. (almost surely), or µ-a.e. (almost everywhere), if µ(A^c) = 0. In case µ is a probability measure, we can also say that this event occurs with probability 1.

For a non-negative elementary function f = Σ_{i=1}^n a_i 1_{A_i}, a_i ≥ 0, i = 1, ..., n, we define

∫ f dµ = Σ_{i=1}^n a_i µ(A_i).

For general positive, measurable functions f, the integral can be defined by

∫ f dµ = lim_{n→∞} ∫ f_n dµ,

where f_n, n = 1, ..., is a non-decreasing sequence of elementary functions with f_n ↑ f, n → ∞. For example, one can choose

f_n(ω) = n if f(ω) > n; f_n(ω) = (i − 1)2^{−n} if (i − 1)2^{−n} < f(ω) ≤ i 2^{−n} ≤ n, i = 1, ..., n2^n.

Problem 2.1 These approximating elementary functions f_n are σ(f)-measurable. Prove this.

For general measurable f, write f = f^+ − f^−, with f^+, f^− ≥ 0: f^+ = max(f, 0), f^− = max(−f, 0). Then f is integrable if at least one of ∫ f^+ dµ, ∫ f^− dµ is finite; if both are finite we call f summable!

N.B. this is slightly different from Definition 1.9 of BSP.

N.B. this stepwise argument, from elementary functions via positive functions to general functions, is part of a standard proof machine. Later on it will be used for stochastic integrals.

Problem 2.2 Let Ω = (0,1], F = B((0,1]), µ = λ. Let f = 1_{Q∩(0,1]}. Calculate ∫ f dλ.

Problem 2.3
i) Suppose that µ(f ≠ 0) = 0 for some measurable function f (not necessarily non-negative!). Prove that ∫ f dµ = 0.
ii) Let µ(f < 0) = 0 (i.e. f ≥ 0 µ-a.e.). Prove that ∫ f dµ ≥ 0.
iii) Let f be a measurable function with µ(f < 0) = 0. Prove that ∫ f dµ = 0 implies µ(f > 0) = 0, i.e. f ≡ 0 µ-a.e. Give an example of a measure space and a function f with f ≢ 0 and ∫ f dµ = 0.

The next step is to formulate a number of basic convergence theorems giving conditions under which integral and limit may be interchanged. These conditions amount to requiring positivity (positive functions are always integrable; there are no problems of subtracting ∞ from ∞) or some well-behaved dominating function.

Theorem 2.1 (Monotone Convergence Theorem) Suppose that 0 ≤ f_n ↑ f µ-a.e. (i.e. µ({f_n < 0 for some n} ∪ {f < 0} ∪ {f_n ↑ f fails}) = 0). Then

lim_{n→∞} ∫ f_n dµ = ∫ lim_{n→∞} f_n dµ = ∫ f dµ. (2.1)

(Dominated Convergence Theorem) Suppose that f_n → f µ-a.e., and |f_n| ≤ g µ-a.e. with g a µ-summable function. Then

∫ |f_n − f| dµ → 0, n → ∞,

and in particular (2.1) holds.

Lemma 2.2 (Fatou's Lemma) (BSP, p. 109) If f_n ≥ 0 µ-a.e., then

∫ lim inf_n f_n dµ ≤ lim inf_n ∫ f_n dµ.

Proof. Let g_n := inf_{k≥n} f_k; g_n is measurable and g_n ↑ lim inf_k f_k. Then f_k ≥ g_n for k ≥ n. Hence ∫ f_k dµ ≥ ∫ g_n dµ, k ≥ n (see Problem 2.3 (iii)). By monotone convergence ∫ g_n dµ ↑ ∫ lim inf_k f_k dµ, and so

∫ lim inf_k f_k dµ = ↑lim_n ∫ g_n dµ ≤ ↑lim_n inf_{k≥n} ∫ f_k dµ = lim inf_n ∫ f_n dµ. QED

Problem 2.4 There is a limsup version of Fatou's lemma:

∫ lim sup_n f_n dµ ≥ lim sup_{n→∞} ∫ f_n dµ.

Provide conditions on the sequence f_n, n = 1, ..., such that this version follows from the above Fatou's lemma.

Problem 2.5 Let Ω = (0,1], F = B((0,1]) and µ = λ, the Lebesgue measure. Let f_n = n 1_{(0,1/n]}. Compute lim_n f_n and lim_n ∫ f_n dλ. Compare this with the statements in the Monotone Convergence Theorem, the Dominated Convergence Theorem and Fatou's Lemma. Which results fail and why?

Problem 2.6 Let f_n, n = 1, ..., and f be measurable functions with the property that ∫ |f_n − f| dµ → 0, n → ∞. Does this imply that f_n → f, n → ∞, µ-a.e.? Unfortunately not in general: choose Ω = (0,1], F = B((0,1]) and

f_{2^n + i} = 1_{(i·2^{−n}, (i+1)·2^{−n}]}, i = 0, ..., 2^n − 1, n = 0, 1, ....

Calculate ∫ f_k dλ, and investigate whether the limits lim_{k→∞} f_k and lim_{k→∞} ∫ f_k dλ exist.

In order to be able to define conditional expectations later on, we need the following result.

Theorem 2.3 (Radon-Nikodym, BSP p. 28) Let (Ω, F) be given. Suppose that µ is a σ-finite measure, i.e. there are events A_n, n = 1, ... ∈ F, with ∪_n A_n = Ω and µ(A_n) < ∞ for n = 1, .... Suppose further that ν is µ-absolutely continuous, i.e. µ(A) = 0 implies ν(A) = 0. Then there exists a measurable function f ≥ 0, integrable w.r.t. µ, such that

ν(A) = ∫_A f dµ = ∫ f 1_{A} dµ.

Notation: f = dν/dµ is called the density or Radon-Nikodym derivative of ν w.r.t. µ. A consequence of the Theorem, for measurable functions g integrable w.r.t. ν, is that

∫ g dν = ∫ g · f dµ. (2.2)

Back to random variables and probability measures

In general, when speaking of random variables, we define these in terms of the outcomes (values X can take) and a probability distribution on the space of outcomes. The underlying probability space (Ω, F, P) is mostly left undefined and its role is hidden. It can be useful to know a way of constructing an underlying probability space. However, first we will discuss some notation and concepts for random variables related to integration. Suppose that (Ω, F, P) is given, as well as the random variable X : Ω → R.
Then P_X, given by P_X{A} = P{ω : X(ω) ∈ A}, is a probability measure on (R, B(R)) by virtue of the so-called 'overplantingsstelling' (Dutch for 'transfer theorem').

Theorem 2.4 (Overplantingsstelling / transfer theorem) Let (Ω, F, µ) be a measure space. Suppose that (Ω', F') is a measurable space. Let f : Ω → Ω' be an F–F'-measurable function in the sense that f^{−1}(A') ∈ F for all A' ∈ F'. Then the function µ'(A') = µ{f^{−1}(A')}, A' ∈ F', is a measure on F'. Moreover, for any F'-measurable function g : Ω' → R, one has

∫_Ω g(f) dµ = ∫_{Ω'} g dµ',

in the sense that both integrals exist and are equal whenever at least one of them exists.

Problem 2.7 Prove this theorem.

P_X is called the probability distribution of X. Since {(−∞, x]}, x ∈ R, is a π-system generating B(R), the uniqueness Lemma 1.1 implies that it is sufficient to specify the values

F_X(x) = P_X{(−∞, x]} = P{X ≤ x},

which is called the (probability) distribution function of X.

Problem 2.8 Show that F_X has the following properties:
i) F_X : R → [0,1], and F_X is non-decreasing;
ii) lim_{x→−∞} F_X(x) = 0, lim_{x→∞} F_X(x) = 1;
iii) F_X is right-continuous.

The function F_X provides a nice tool for the construction of random variables with a given distribution function. Let a function F with properties (i, ii, iii) be given. Then again there is a unique (why?) probability measure p on (R, B(R)) with p{(−∞, x]} = F(x). Choose Ω = R, F = B(R) and P = p, and set X(ω) = ω. We have P_X = p. We can also construct X on (Ω, F, P) = ([0,1], B([0,1]), λ): set

X(ω) = inf{y : F(y) ≥ ω} (= sup{z : F(z) < ω}).

This is called the Skorokhod representation.

Problem 2.9 Show that F_X = F.

If the probability measure P_X is absolutely continuous w.r.t. the Lebesgue measure, then P_X has a probability density function f_X (w.r.t. the Lebesgue measure) by the Radon-Nikodym theorem, and then we can write

P_X{A} = ∫_A f_X(x) dλ(x).

Whenever f_X is Riemann-integrable, this integral is the same as the normal Riemann integral!
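The Skorokhod representation X(ω) = inf{y : F(y) ≥ ω} can be tried out for a concrete F. For the exponential distribution F(y) = 1 − e^{−y}, the infimum has the closed form −log(1 − ω); this worked example is ours, not from the text:

```python
import math
import random

def F(y):
    """Exponential(1) distribution function."""
    return 1.0 - math.exp(-y) if y >= 0 else 0.0

def skorokhod_inverse(omega):
    """X(omega) = inf{y : F(y) >= omega}, omega in (0, 1);
    for the exponential distribution this is -log(1 - omega)."""
    return -math.log(1.0 - omega)

# the representation reproduces the target distribution function
for u in (0.1, 0.5, 0.9, 0.99):
    assert abs(F(skorokhod_inverse(u)) - u) < 1e-12

# plugging in uniforms on [0, 1) yields exponentially distributed samples
random.seed(0)
sample = [skorokhod_inverse(random.random()) for _ in range(100_000)]
print(sum(sample) / len(sample))  # sample mean, close to E(X) = 1
```

The same recipe works for any distribution function F; when F has jumps or flat pieces, the generalized inverse above (rather than a literal F^{−1}) is exactly what makes the construction go through.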
This applies for instance when the density is a continuous function on an open interval of R. On the other hand, there is a pitfall here: one would expect continuity of F_X to imply the existence of a density. This is not true: the Cantor set provides a way to construct a counterexample. However, if F_X is a continuous function and there is a function f such that F_X(x) = ∫_{−∞}^x f(u) du, then the density exists and one may choose f_X(x) = f(x), as in the usual cases.

If X is P-summable, we say that X has finite expectation (or a finite first moment), given by

E(X) = ∫_Ω X(ω) dP(ω).

Using the transfer theorem ('overplantingsstelling'), we can write it in terms of P_X as E(X) = ∫_R x dP_X(x). If X has a density f_X w.r.t. λ, then

E(X) = ∫_R x f_X(x) dλ(x).

N.B. different authors define the existence of the expectation or of moments differently: some require only integrability in our sense.

If X² is P-summable, then we call X square integrable. The variance of X is defined by σ²(X) = E(X − E(X))² (= E(X²) − (E(X))²).

In order to calculate expectations of functions of X, we can use the transfer theorem in a convenient way. Suppose that g : R → R is Borel-measurable. Then g(X) has a finite expectation if and only if g is summable w.r.t. P_X, and we have

E g(X) = ∫_Ω g(X(ω)) dP(ω) = ∫_R g(x) dP_X(x).

If X has a density f_X w.r.t. λ, then

E g(X) = ∫_R g(x) f_X(x) dλ(x).

Remark The space of summable functions on (Ω, F, P) is denoted by 𝓛¹(Ω, F, P), and the space of square integrable functions on (Ω, F, P) by 𝓛²(Ω, F, P). Both play important roles: ||X||_1 = ∫ |X| dP and ||X||_2 = (∫ X² dP)^{1/2} act 'almost as' norms on these spaces. The problem is that ||X||_{1,2} = 0 does not imply that X = 0; it only implies that X = 0 P-a.e.! The solution is to define equivalence classes of functions that are P-almost everywhere equal. The resulting quotient spaces are denoted by L¹(Ω, F, P) and L²(Ω, F, P), and these are complete normed spaces.
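The formula E g(X) = ∫ g(x) f_X(x) dλ(x) lends itself to a quick numerical sanity check. Below, a rough midpoint-rule approximation for the exponential(1) density, where E(X) = 1 and σ²(X) = 1; the helper name, truncation interval and grid size are our arbitrary choices:

```python
import math

def expect(g, f, lo, hi, n=200_000):
    """Midpoint approximation of the integral of g(x) f(x) over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) * f(lo + (i + 0.5) * h)
               for i in range(n)) * h

f_X = lambda x: math.exp(-x)  # exponential(1) density on [0, infinity)

# truncating at 50 is harmless: the neglected tail is of order e^{-50}
EX  = expect(lambda x: x, f_X, 0.0, 50.0)       # E(X)   = 1
EX2 = expect(lambda x: x * x, f_X, 0.0, 50.0)   # E(X^2) = 2
print(round(EX, 4), round(EX2 - EX ** 2, 4))    # 1.0 1.0  (mean, variance)
```

The same `expect` helper computes E g(X) for any Borel g that is summable w.r.t. P_X, which is exactly the content of the transfer-theorem formula above.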
In the case of L²(Ω, F, P), the norm comes from the inner product (X, Y) = E(XY), and so the space is a Hilbert space. Note that convergence in these spaces means convergence in the respective norms.

Problem 2.10 Suppose that X takes only countably many values.
i) What type of function is X? P_X cannot be absolutely continuous w.r.t. the Lebesgue measure λ on (R, B(R)) - why?
ii) Give a formula for E(X) and σ²(X).
iii) Suppose that X ∈ {0, 1, ...} P-a.s. and suppose that X has a finite expectation. Show the following alternative formula for its expectation:

E(X) = Σ_{n=0}^∞ P{X > n}.

N.B. the limit theorems of the previous section can be transferred to a formulation in terms of expectations! When restricting to positive r.v.s, these can be used to yield some useful results. Note that by definition for a r.v. X we have P{|X| < ∞} = 1 (this is not necessary; the theory holds through if we allow infinite values!). Suppose that {X_n}_{n∈N} is a collection of r.v.s on (Ω, F, P) that are all P-a.e. non-negative.

• One has

E(Σ_n X_n) = Σ_n E(X_n), (2.3)

where both sides are either finite or infinite.

• Σ_n E(X_n) < ∞ implies that Σ_n X_n < ∞ a.e., and so X_n → 0, n → ∞, a.s.

Problem 2.11 Prove this. Conjure up a simple example where (2.3) fails when the positivity condition lacks.

One can write probabilities of sets in terms of expectations: P{X ∈ A} = P_X{A} = E{1_{X∈A}}, and similarly

∫_A g dP_X = ∫ g 1_{A} dP_X.

We conclude this section with two important inequalities.

Lemma 2.5 (Chebyshev's inequality) Suppose that X is a random variable. Let φ : R → R_+ be a non-decreasing, non-negative function such that E(φ(X)) < ∞. Then for all a > 0 with φ(a) > 0 one has

P{X ≥ a} ≤ E(φ(X)) / φ(a).

Proof.

E(φ(X)) = ∫ φ(x) dP_X(x) ≥ ∫_{x≥a} φ(x) dP_X(x) ≥ φ(a) P{X ≥ a}.

Positivity of φ justifies the first inequality. QED

Let Z ∼ N(0,1), that is, Z has the standard normal distribution with density

f_Z(x) = (1/√(2π)) exp{−x²/2}.

We will prove that

P{Z > a} ≤ exp{−a²/2}. (2.4)

Take φ(z) = exp{γz}, γ > 0. Then

E(φ(Z)) = (1/√(2π)) ∫ exp{γz − z²/2} dz = exp{γ²/2} · (1/√(2π)) ∫ exp{−(z − γ)²/2} dz = exp{γ²/2}.

So that, taking γ = a,

P{Z > a} ≤ exp{γ²/2 − γa} = exp{−a²/2}.

As an application, let X_1, X_2, ... be N(0,1) distributed random variables. Let A_n = {max{X_1, ..., X_n} > √(6 log n)}. Then

P{A_n} = P{max{X_1, ..., X_n} > √(6 log n)} ≤ n P{X_1 > √(6 log n)} ≤ n exp{−6 log n / 2} = 1/n².

Hence

Σ_{n=1}^∞ P{A_n} ≤ Σ_{n=1}^∞ 1/n² < ∞.

Applying the first Borel-Cantelli lemma yields that

0 = P{lim sup_{n→∞} A_n} = P{lim sup_{n→∞} {max{X_1, ..., X_n} / √(6 log n) > 1}}.

This implies that for a.a. ω

max{X_1(ω), ..., X_n(ω)} ≤ √(6 log n), n ≥ n(ω).

A function f : A → R, where A = (a, b) is an open interval of R, is called convex on A if for all x, y ∈ A and all p ∈ [0,1], one has that

f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y).

Important convex functions on R are f(x) = |x|, x², exp{αx}.

Lemma 2.6 (Jensen's Inequality, BSP p. 31) Suppose that f : A → R is convex on A, with A = (a, b). Suppose that X is a summable r.v. with P{X ∈ A} = 1 and E|f(X)| < ∞. Then E f(X) ≥ f(E(X)).

Problem 2.12 Prove this lemma, by successively carrying out the following steps.
i) Show that there exists c ∈ [a, b], such that f is non-increasing on (a, c) and non-decreasing on (c, b). Use this to show continuity of f on A.
ii) Show that for x_0 < x_1 < x_2, x_0, x_1, x_2 ∈ A, one has

(f(x_2) − f(x_0)) / (x_2 − x_0) ≥ (f(x_1) − f(x_0)) / (x_1 − x_0),

by suitably expressing x_1 as a convex combination of x_2 and x_0. Show that together with (i) this implies that for each x_0 ∈ A there exists a number n(x_0) with

f(x) ≥ f(x_0) + n(x_0)(x − x_0), x ∈ A.

iii) Finish the proof of the lemma by taking expectations in the last inequality and selecting a suitable value for x_0.

Independence

We now have a basic probability space (Ω, F, P).

Independence of σ-algebras Sub-σ-algebras F_1, F_2, ... of F are called independent whenever for each sequence of sets A_1 ∈ F_1, A_2 ∈ F_2, ... and each finite set of distinct indices i_1 < i_2 < · · · < i_n one has

P{A_{i_1} ∩ · · · ∩ A_{i_n}} = Π_{k=1}^n P{A_{i_k}}.

Independence of r.v.s Random variables X_1, X_2, ... are independent if the σ-algebras σ(X_1), σ(X_2), ... are independent.

Independence of events Events A_1, A_2, ... are independent if the σ-algebras A_1, A_2, ... are independent, where A_i = {∅, Ω, A_i, A_i^c}. In other words, if the r.v.s 1_{A_1}, 1_{A_2}, ... are independent.

Problem 2.13 Show that for independence of A_1, A_2, ..., it is sufficient to check for each finite set of indices i_1, i_2, ..., i_n that

P{A_{i_1} ∩ · · · ∩ A_{i_n}} = Π_{k=1}^n P{A_{i_k}}.

Remark: independence of r.v.s X_1 and X_2, say, does not imply that X_2 is not σ(X_1)-measurable. Construct a trivial example to illustrate this.

Checking independence of σ-algebras and r.v.s is a cumbersome task, but fortunately π-systems lighten (up) life.

Lemma 2.7 Suppose that F_1 and F_2 are sub-σ-algebras of F. Suppose that there are π-systems I_1 and I_2 generating F_1 and F_2: σ(I_1) = F_1, σ(I_2) = F_2. Then F_1 and F_2 are independent iff I_1 and I_2 are independent, in that

P{I_1 ∩ I_2} = P{I_1}P{I_2}, I_1 ∈ I_1, I_2 ∈ I_2.

Proof. Clearly, independence of the σ-algebras implies independence of the π-systems. So assume independence of the π-systems. The only apparatus for extending assertions on measures to whole σ-algebras we have so far is the uniqueness Lemma 1.1. Let I_1 ∈ I_1 be given. Then µ(A) = P{I_1 ∩ A} and ν(A) = P{I_1}P{A} are measures on F_2 (check this). These two measures agree on the π-system I_2. Moreover, µ(Ω) = ν(Ω). By the uniqueness lemma they now agree on the whole of F_2. This implies

P{I_1 ∩ A} = P{I_1}P{A}, A ∈ F_2. (2.5)

Since I_1 ∈ I_1 was arbitrarily chosen, (2.5) holds for all I_1 ∈ I_1 and A ∈ F_2. Now fix A ∈ F_2 and define µ(B) = P{B ∩ A}, ν(B) = P{B}P{A}. Again µ and ν agree on I_1, with µ(Ω) = ν(Ω), and so by the uniqueness lemma they agree on the whole of F_1. This is what we wanted to prove. QED

Example. Suppose that for two random variables X and Y one has

P{X ≤ x, Y ≤ y} = P{X ≤ x}P{Y ≤ y}, x, y ∈ R,

i.e. the π-systems π(X) = {(X ≤ x) : x ∈ R} and π(Y) are independent. These π-systems generate the σ-algebras σ(X) and σ(Y), so that independence of X and Y follows. N.B. the book BSP treats this matter slightly differently - independence of r.v.s is slightly differently defined.

Problem 2.14 Let X_1, X_2, ... be independent r.v.s. Show that the σ-algebras σ(X_1, ..., X_n) and σ(X_{n+1}, ..., X_{n+l}) are independent.

Of course it is nice to define independence, but can one construct independent r.v.s at all? Recall the construction in Problem 1.16. There we had that X(ω) = Σ_n ω_n 2^{−n} has the uniform distribution on (0,1].

Problem 2.15 Show that Z_n(ω) = ω_n, n = 1, ..., are independent, identically distributed r.v.s, and give their distribution.

It follows easily that also

X_1(ω) = ω_1 2^{−1} + ω_3 2^{−2} + ω_6 2^{−3} + ω_{10} 2^{−4} + ...,
X_2(ω) = ω_2 2^{−1} + ω_5 2^{−2} + ω_9 2^{−3} + ω_{14} 2^{−4} + ...,
X_3(ω) = ω_4 2^{−1} + ω_8 2^{−2} + ω_{13} 2^{−3} + ω_{19} 2^{−4} + ...,

and so forth, have the uniform distribution on (0,1]. The different subsequences of the expansion of ω generating the X_i's are disjoint. It is intuitively clear that the X_i are independent r.v.s with the same uniform distribution on (0,1].

Let any sequence of distribution functions F_n, n ∈ N, be given. By the Skorokhod representation, one can find r.v.s Y_n = g_n(X_n) having distribution function F_n. Independence is obviously preserved.

Problem 2.16 Let X and Y be independent r.v.s. Let g, h : R → R be Borel functions. Show that g(X) and h(Y) are independent.

Let us now consider two r.v.s X and Y on the probability space (Ω, F, P). For each point ω we have the vector function (X(ω), Y(ω)) taking values in R². This gives rise to distributions on the plane R² and hence to so-called product measures. We will not further discuss this here, but restrict to essentially one-dimensional sub-cases.
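The idea of feeding disjoint subsequences of coin tosses into separate binary expansions can be checked by simulation. The sketch below uses a simpler interleaving (odd-indexed bits for X_1, even-indexed bits for X_2) than the diagonal scheme above, but the principle is the same: disjoint subsequences of one toss sequence give independent uniforms. It then tests the product rule P{X_1 ≤ 1/2, X_2 ≤ 1/2} ≈ P{X_1 ≤ 1/2} P{X_2 ≤ 1/2}:

```python
import random
random.seed(1)

def split_uniforms(n_bits=16):
    """One stream of fair coin tosses; odd-indexed bits build X1,
    even-indexed bits build X2 (disjoint subsequences of tosses)."""
    bits = [random.getrandbits(1) for _ in range(2 * n_bits)]
    x1 = sum(b / 2 ** (k + 1) for k, b in enumerate(bits[0::2]))
    x2 = sum(b / 2 ** (k + 1) for k, b in enumerate(bits[1::2]))
    return x1, x2

pairs = [split_uniforms() for _ in range(50_000)]
p_joint = sum(1 for x1, x2 in pairs if x1 <= 0.5 and x2 <= 0.5) / len(pairs)
p1 = sum(1 for x1, _ in pairs if x1 <= 0.5) / len(pairs)
p2 = sum(1 for _, x2 in pairs if x2 <= 0.5) / len(pairs)
print(p_joint, p1 * p2)  # both close to 0.25
```

The empirical joint probability factorizes (up to Monte Carlo noise), consistent with the independence of the π-systems {(X_i ≤ x) : x ∈ R} discussed above.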
A first instance is when we consider g(ω) = X(ω)Y(ω).

Lemma 2.8 If X and Y are independent then E(XY) = E(X)E(Y), provided the latter expectations exist (i.e. X and Y are P-summable).

Problem 2.17 Prove this by carrying out the following steps. First show the result for elementary functions, then for positive functions. For the latter one uses sequences of approximating elementary functions for X and Y: note that these should be independent! Then finish the proof.

We now turn to proving a number of results on sequences of random variables. The proofs rely on assertions derived or stated hitherto. They will be applicable later on to stochastic processes. There are two results that concern sequences of independent, identically distributed r.v.s X1, X2, . . . on the probability space (Ω, F, P). For the first lemma, we need the concept of a stopping time (BSP p. 54): T : Ω → N ∪ {∞} is a stopping time for the sequence X1, . . ., if {T ≤ n} ∈ σ(X1, . . . , Xn). In words: the decision to stop before or at time n is taken on the basis of the outcomes X1, X2, . . . , Xn. Note that we allow T = ∞; this is the non-stopping decision.

Let Xn = 1 with probability p and Xn = −1 with probability 1 − p: this can be interpreted as the gain, respectively the loss, a gambler incurs when tossing a biased coin. The gambler's gain or loss after n tosses equals Sn = X1 + · · · + Xn. If the gambler decides to stop after the n-th game whenever his gain at that time is some number x, then this defines a stopping time.

Lemma 2.9 (Wald's equation) Let X1, . . . be a sequence of i.i.d. r.v.s with finite expectation. Suppose that T is a stopping time with T < ∞ a.e. and E(T) < ∞. Then E(∑_{i=1}^T Xi) = E(X1)E(T).

Proof. Write ST = ∑_{i=1}^T Xi. Assume first that Xi ≥ 0 P-a.e. Now we have (check the validity of all steps)

E(ST) = ∑_{n=1}^∞ ∫_{T=n} ST dP = ∑_{n=1}^∞ ∫_{T=n} Sn dP = ∑_{n=1}^∞ ∑_{k=1}^n ∫_{T=n} Xk dP
      = ∑_{k=1}^∞ ∑_{n=k}^∞ ∫_{T=n} Xk dP = ∑_{k=1}^∞ ∫_{T≥k} Xk dP = ∑_{k=1}^∞ E(Xk 1{T≥k})
      = ∑_{k=1}^∞ E(Xk)P{T ≥ k} = E(X1) ∑_{k=1}^∞ P{T ≥ k} = E(X1)E(T).

For the second equality we use that T < ∞ with probability 1. For the seventh equality we use independence of Xk and 1{T≥k}. To show this independence, we use the fact that 1{T≥k} = 1 − 1{T≤k−1}. By definition, 1{T≤k−1} is σ(X1, . . . , Xk−1)-measurable, hence σ(1{T≥k}) ⊂ σ(X1, . . . , Xk−1). Since σ(Xk) and σ(X1, . . . , Xk−1) are independent, σ(Xk) and σ(1{T≥k}) are independent, i.e. Xk and 1{T≥k} are independent. Now, for general r.v.s Xn, the assertion follows from the fact that it applies to X1⁺, . . . and X1⁻, . . .. Check this. QED

Problem 2.18 Suppose that p ≥ 1/2. The gambler intends to stop the first time t that his total gain is −1 (i.e. he has 1 less than what he started with), i.e. T = min{n | Sn = −1}. Assuming that T is finite with probability 1 and has finite expectation, Wald's equation applies. What contradiction do we get and what might be wrong with our assumptions on T? Study the cases p = 1/2 and p > 1/2 separately.

Many interesting events have probability 0 or 1. The first Borel–Cantelli lemma is an assertion on a sequence of events whose probabilities form a convergent series. What can we say if this series diverges? For this, we need an extra condition of independence.

Lemma 2.10 (Second Borel–Cantelli Lemma) Suppose that A1, A2, . . . ∈ F are independent events such that ∑_{n=1}^∞ P{An} = ∞. Then P{lim sup_{n→∞} An} = 1.

Proof. Let
G := (lim sup_{n→∞} An)ᶜ = (∩m ∪_{n≥m} An)ᶜ = ∪m ∩_{n≥m} Anᶜ.
Call Br,m = ∩_{n=m}^r Anᶜ and Bm = ∩_{n=m}^∞ Anᶜ. Then G = ∪m Bm and Br,m ↓ Bm as r → ∞. Hence by monotone convergence P{Br,m} ↓ P{Bm}.
By independence,
P{Br,m} = ∏_{n=m}^r P{Anᶜ} = ∏_{n=m}^r (1 − P{An}) = exp{∑_{n=m}^r log(1 − P{An})} ≤ exp{−∑_{n=m}^r P{An}},
where we use that log(1 − x) ≤ −x for x ∈ (0, 1). By taking limits, we obtain
P{Bm} ≤ lim_{r→∞} exp{−∑_{n=m}^r P{An}} = 0.
Now P{G} ≤ ∑_{m=1}^∞ P{Bm} = 0, and so P{Gᶜ} = 1, which is what we set out to prove. QED

As an example, let Xn, n = 1, 2, . . ., be a sequence of i.i.d. random variables. Suppose that the Xn are exponentially distributed with parameter 1, i.e. P{Xn > x} = exp{−x}, x ≥ 0. Then P{Xn > α log n} = n^{−α}, α > 0. Applying the two Borel–Cantelli lemmas, we find
P{Xn > α log n i.o.} = 0 if α > 1, and = 1 if α ≤ 1.
Put S = lim sup_{n→∞}(Xn / log n). S is a r.v.! We have
{ω : S(ω) ≥ 1} = {ω : lim sup_{n→∞}(Xn(ω)/log n) ≥ 1} ⊃ {ω : Xn(ω) > log n i.o.}.
Hence P{S ≥ 1} = 1. On the other hand,
P{S > 1 + 2α^{−1}} ≤ P{Xn > (1 + α^{−1}) log n i.o.} = 0.
We have {S > 1} = ∪_{α=1}^∞ {S > 1 + 2α^{−1}}, hence P{S > 1} = 0. As a consequence, S ≡ 1 with probability 1.

Problem 2.19 Monkey typing the Bible. Suppose that a monkey types a sequence of symbols at random, one per unit of time. This produces an infinite sequence Xn, n = 1, 2, . . ., of i.i.d. r.v.s, with values in the set of possible symbols on the typewriter. If this is a finite set of symbols, then we agree that min_x P{X1 = x} =: ε > 0. The monkey lives infinitely long and types incessantly. Typing the Bible corresponds to typing a particular sequence of, say, N symbols (N is the number of symbols in the Bible). Let H = {monkey types infinitely many copies of the Bible}. Use the second Borel–Cantelli lemma to show that P{H} = 1. Define suitable Ω, F and P and sets An.

Problem 2.20 A sometimes convenient characterisation of convergence with probability 1. Let X, Xn, n = 1, . . ., be r.v.s on the same probability space (Ω, F, P). Then Xn → X with probability 1 iff for all ε > 0
lim_{n→∞} P{∪_{m=n}^∞ (|Xm − X| > ε)} = 0,
or equivalently iff for all ε > 0
lim_{n→∞} P{∩_{m≥n} (|Xm − X| ≤ ε)} = 1.
Show this.

Problem 2.21 (Algebraic...) Let s > 1 and define the Riemann zeta function ζ(s) = ∑_{n∈N} n^{−s}. Let X, Y be i.i.d. r.v.s with
P{Y = n} = P{X = n} = n^{−s}/ζ(s).
Prove that the events Ap = {X divisible by p}, p prime, are independent. Explain Euler's formula
1/ζ(s) = ∏_{p prime} (1 − 1/p^s)
probabilistically. Prove that
P{no square other than 1 divides X} = 1/ζ(2s).
Let H be the highest common factor of X and Y. Prove that
P{H = n} = n^{−2s}/ζ(2s).

Problem 2.22 Suppose that Xi denotes the 'quality' of the i-th applicant for a job. Applicants are interviewed in a random order and so one may assume that X1, . . . are i.i.d. random variables with the same continuous distribution (i.e. they all have a continuous density). What is the probability that the i-th candidate is the best so far? Prove that
P{Ei} = 1/i,
where Ei = {i-th candidate is best so far} = {Xi > Xj, j < i}. Prove that the events E1, E2, . . . are independent. Why would we assume a continuous distribution for the qualities? Suppose that there are only N candidates in total. Calculate the probability that the i-th candidate is the best amongst all N candidates.

Problem 2.23 Let X1, . . . be i.i.d. r.v.s with the N(0, 1) distribution. Prove that
P{lim sup_{n→∞} Xn/√(2 log n) = 1} = 1.
Use that for x > 0
(1/(x + 1/x)) (1/√(2π)) exp{−x²/2} ≤ P{X1 > x} ≤ (1/x) (1/√(2π)) exp{−x²/2},
since X1 has the N(0, 1) distribution. The second inequality can be derived from the fact that
(d/dx) exp{−x²/2} = −x exp{−x²/2}.

We recall one of the versions of the law of large numbers.

Theorem 2.11 (Strong Law of Large Numbers) Let Xn, n = 1, 2, . . ., be a sequence of i.i.d. r.v.s on the probability space (Ω, F, P), with finite expectation. Then ∑_{i=1}^n Xi/n → E(X1) with probability 1.

There are elementary proofs for which one needs only results from these pages, but we will not give one here.

Problem 2.24 Is a fair game fair? Let X1, . . . be independent r.v.s with
P{Xn = n² − 1} = 1/n² = 1 − P{Xn = −1}.
Prove that E(Xn) = 0, but that
(X1 + · · · + Xn)/n → −1 with probability 1.
This is counter-intuitive when bearing in mind the Law of Large Numbers! What would you expect on the basis of this law?

Problem 2.25 The following is a sometimes simple test of a.s. convergence. Let Xn, n = 1, . . ., and X be r.v.s on the same probability space (Ω, F, P). If for all ε > 0
∑_n P{|Xn − X| > ε} < ∞,
then Xn → X with probability 1. Hint: use Problem 2.20.

Problem 2.26 You have a lamp working on a battery. As soon as the battery fails, you replace it with a new one. Batteries have i.i.d. lifetimes, say Xn ≥ 0 is the lifetime of the n-th battery. Assume the lifetimes to be bounded: Xn ≤ M with probability 1 for some constant M. Let N(t) be the number of batteries that have failed by time t.
i) Show that in general N(t) is not a stopping time, whereas N(t) + 1 is. Hint: N(t) = n iff X1 + · · · + Xn ≤ t and X1 + · · · + Xn+1 > t.
ii) Argue that t < E(∑_{i=1}^{N(t)+1} Xi) ≤ t + M. Use Wald's equation to show the elementary renewal theorem for bounded r.v.s:
lim_{t→∞} E{N(t)}/t = 1/E(X1).
That is: the rate at which batteries fail is exactly 1/(expected lifetime), which is an intuitively obvious result.

Problem 2.27 A deck of 52 cards is shuffled and the cards are then turned face up, one at a time. Let Xi equal 1 if the i-th card turned up is an ace, otherwise Xi = 0, i = 1, . . . , 52. Let N denote the number of cards needed to be turned over until all 4 aces appear. That is, the final ace appears on the N-th card to be turned over.
i) Show that P{Xi = 1} = 4/52.
ii) Is Wald's equation valid? If not, why not?

3 The art of conditioning

The corresponding chapter in BSP is clear enough; only a few remarks are to be made here. Let us just give the definition of conditional expectation. Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation, i.e. E(|X|) < ∞.
Let A ⊂ F be a sub-σ-algebra: say this is our knowledge of the structure of the space Ω, which is coarser than F, but consistent with it. In fact, let us assume that we cannot observe X in detail, that is our knowledge of the space Ω is for instance also coarser than σ(X). Then we have to ‘estimate’ X in a consistent way with our knowledge A. It makes sense to replace X by averaging the values X over all sets A ∈ A. This gives rise to follow theorem-definition (SBP Def. 2.3, Def. 2.4, Prop. 2.3) Theorem 3.1 (Fundamental Theorem and Definition of Kolmogorov 1933) Suppose we have a probability space (Ω, F, P). Let X be a random variable with finite expectation, i.e. E(|X|) < ∞. Let A be a sub-σ-algebra of F. Then there exists a random variable Y such that i) Y is A-measurable; ii) E(|Y |) < ∞; iii) for each A ∈ A we have Z Z Y dP = A XdP. A 21 If Y 0 is another r.v. with properties (i,ii,iii), then Y 0 = Y with probability 1, i.e. P{Y 0 = Y } = 1. We call Y a version of the conditional expectation of E(X|A) of X given A and we write Y = E(X|A) a.s. N.B.1 Conditional expectations are random variables! N.B.2 Suppose we have constructed a A-measurable r.v. Z, with E(|Z|) < ∞, such that (iii) holds for all A ∈ π(A), i.e. (iii) holds on a π-system generating A. Then (iii) holds for all A ∈ A, and so Z is a version of the conditional expectation E(X|A). N.B.3 BSP p.29 list a number of important properties of conditional expectation. An important one on independence lacks. t:l-6 Lemma 3.2 (Independence) Let (Ω, F, P) be a probability space. Suppose that A, G ⊂ F and that X is a r.v. on (Ω, F, P) with finite expectation. Suppose that A is independent of σ(σ(X), G). Then E(X|σ(G, A)) = E(X|G), a.s. (3.1) In particular, choosing G = σ(X), it follows that E(X|A) = E(X), a.s., whenever A and σ(X) are independent. Proof. We may assume that X ≥ 0 with probability 1. For A ∈ A and G ∈ G, X1{G} and 1{A} are independent and so E(X1{G} 1{A} ) = E(X1{G} )E(1{A} ). Since Y = E(X|G) a.s. 
is G-measurable, also Y 1{G} and 1{A} are independent with E(Y 1{G} 1{A} ) = E(Y 1{G} )E(1{A} ). Since E(X1{G} ) = E(E(X1{G} |G)) = E(1{G} E(X|G)) = E(1{G} Y ). it follows that E(X1{G∩A} ) = E(X1{G} 1{A} ) = E(Y 1{G} 1{A} ) = E(Y 1{G∩A} ). (3.2) For a set C ∈ F, the functions µ(C) = E(X1{C} ), ν(C) = E(Y 1{C} ) define positive, finite measures on (Ω, F, P). Note that the set C = G ∩ A, G ∈ G, A ∈ A, form a π-system for σ(G, A). By (3.2) µ and ν are equal on this π-system, µ(Ω) = ν(Ω) and so there are equal on σ(G, A). Hence Y is a version of E(X|σ(G, A)). QED Many theorem for integrals, i.e. expectations, apply to conditional expectations. Even though the latter are r.v.s and not integrals! We quote some of these. Properties of conditional expectations without proof (see also BSP p.29) Let the probability space (Ω, F, P) be given. Let X, Xn , n = 1, 2, . . ., be r.v.s on this probability space, with finite expectation (E|X|, E|Xn | < ∞). Let A be a sub-σ-algebra of F. conditional monotone convergence If 0 ≤ Xn ↑ X, a.s., then E(Xn |A) ↑ E(X|A) a.s. conditional Fatou If Xn ≥ 0 a.s. and E(lim inf Xn |A) ≤ lim inf E(Xn |A). conditional dominated convergence If Xn → X a.s., and |Xn (ω)| ≤ Y (ω), n = 1, 2, . . ., for the r.v. Y with finite expectation, then E(Xn |A) → E(X|A) a.s. 22 conditional Jensen If f : R → R is a convex function, and E|f (X)| < ∞, then E(f (X)|A) ≥ f (E(X|A)) a.s. Problem 3.1 A rather queer example. Let Ω = (0, 1]. Let A be the σ-algebra generated by all one-point sets {x}, x ∈ (0, 1]. Let P{x} = 0 for all x ∈ (0, 1]. i) Does A contain any intervals? If yes, which ones? What is the relation between A and B(0, 1]? What values can P{A} take for A ∈ A? ii) Let X : (0, 1] → R be any r.v. Determine E(X|A). Explain heuristically. N.B.4 Let X be square integrable. Then the conditional expectation is in fact a least squares estimate or an orthogonal projection of X onto the space of square integrable functions on (Ω, A, P). 
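The least-squares remark in N.B.4 can be made concrete on a finite sample space. The sketch below is our own illustration (the sample space, the partition and the values of X are arbitrary, not from BSP): for the σ-algebra A generated by a partition, the version of E(X|A) is the block-average function, it satisfies the defining integral property (iii), and no other A-measurable candidate has smaller mean squared distance to X.

```python
import random

random.seed(1)
# Finite sample space Omega = {0, ..., 11} with uniform P (a hypothetical example).
omega = list(range(12))
X = {w: random.uniform(-1, 1) for w in omega}
# A is generated by the partition A1 = {0..3}, A2 = {4..7}, A3 = {8..11}.
partition = [omega[0:4], omega[4:8], omega[8:12]]

# Version of E(X|A): constant on each block, equal to the block average.
Y = {}
for block in partition:
    avg = sum(X[w] for w in block) / len(block)
    for w in block:
        Y[w] = avg

def mse(Z):  # E (X - Z)^2 under the uniform measure
    return sum((X[w] - Z[w]) ** 2 for w in omega) / len(omega)

# Defining property (iii): integrals of X and Y agree on every A in the partition.
for block in partition:
    assert abs(sum(Y[w] for w in block) - sum(X[w] for w in block)) < 1e-9

# Least-squares property: any other A-measurable Z (constant per block) does no better.
for _ in range(100):
    c = [random.uniform(-1, 1) for _ in partition]
    Z = {w: c[i] for i, block in enumerate(partition) for w in block}
    assert mse(Z) >= mse(Y) - 1e-12
print("projection error:", mse(Y))
```

The projection inequality here is exact, not statistical: the block average minimizes the quadratic error on each block separately.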
Some terminology: by E(X|Y ), E(Y1 , Y2 , . . .) we mean E(X|σ(Y )), E(X|σ(Y1 , Y2 , . . .)) etc.etc.etc. Problem 3.2 Let X, Y1 , Y2 be r.v.s on (Ω, F, P). Use BSP p.29 to show the following properties. i) E(Xg(Y1 )|Y1 ) = g(Y1 )E(X|Y1 ), for Borel functions g. ii) E(E(X|Y1 , Y2 )|Y2 ) = E(X|Y2 ). t:opg2-3 Problem 3.3 Let (Ω, F, P) be given and let X be a r.v. Let A1 , A2 , . . . be a measurable partition of Ω, that is: A1 , A2 , . . . ∈ F with Ai ∩ Aj = ∅ and ∪i Ai = Ω. let A = σ(A1 , . . .) be the σ-algebra generated by this partition. i) Show that there is a version Y of E(X|A) that is constant on each of Ai , in particular Y (ω) = E(1{Ai } X) , P{Ai } ω ∈ Ai , provided that P{Ai } > 0. What is the value when P{Ai } = 0? ii) Let Z be any A-measurable r.v. which has distinct values on the Ai , I = 1, . . .. How can you express E(X|A) in term of Z? This is not explicitly stated in BSP, but a very important property The Doob-Dynkin lemma implies that there exists some Borel function g, such that E(X|Y ) = g(Y )!, with probability 1. In this case we write E(X|Y = y) := g(y). A similar assertion holds when Y = (Y1 , Y2 , . . . , Yn ) is a random vector on (Ω, F, P). We can often calculate this and then it is extremely important in computing expectations etc. Problem 3.4 Suppose that X = g(Y ) for some Borel function g. What is E(X|Y )? 23 Note that this entails that E(X|Y ) is constant on sets where Y is constant, i.e. on sets of the form {ω : Y (ω) = y}. Since E(X|Y ) = g(Y ) is a function of Y a.e. , one can write integrals of E(X|Y ) over measurable sets of Ω as integrals over measurable sets of B w.r.t. the induced probability distribution PY of Y : Z Z Z E(X|Y )dP = g(y)dPY (y) = E(X|Y = y)dPY (y). A Y (A) Y (A) Problem 3.5 In the case of a discrete r.v. Y we have seen how to calculate E(X|Y ). Specify E(X|Y = y). Show that X E(X) = E(X|Y = y)P{Y = y}. y Problem 3.6 Suppose that X1 , . . . is a sequence of i.i.d. 
r.v.s on the probability space (Ω, F, P) with P finite expectation. Let T be a stopping time for this sequence. Let Sn = ni=1 Xi . It is tempting to say that E(ST |T = n) = nE(X1 ). This is not correct in general- explain why and give a counter-example. This conditioning on a value Y = y gives rise to conditioning on events. Say, let A ∈ F. Put Y = 1{A} . Then we define E(X|A) := E(X|Y = 1): this is a number!. If E(X|Y ) = g(Y ), then E(X|A) = g(1). Problem 3.7 Let A ∈ F, and let B1 , . . . be a measurable partition of the set A. Show that X E(X|A)P{A} = E(X|Bi )P{Bi }. i Problem 3.8 Let a probability space (Ω, F, P) be given. Let X and Y be r.v.s on this space with E|X| < ∞. By definition, E(X|Y ) is σ(Y )-measurable. Suppose that Y has R a density w.r.t. the Lebesgue meaure λ, i.e. there is a Borel function fY , such that P{Y ∈ B} = B fY (y)dλ(y). Show that Z E(X) = E(X|Y = y)fY (y)dλ(y). y This is the analogon of the formula for discrete r.v.s Y ! There are now two issues to be addressed. first is that conditional expectations E(X|Y ) are easily calculated when Y is a discrete r.v., taking only countably many values. However, when Y has a more general distribution, it is not that obvious how to do this. A first step in this direction is to write conditional expectation as expectations of r.v.s. 24 Conditional probabilities and conditional distribution functions (pdf ) Since probabilities can be written as expectations, it is clear that one can also condition probabilities. Let A ∈ F and let Y be a r.v. on (Ω, F). Then P{A|Y } := E(1{A} |Y ) = pA (Y ), where pA is Borel-function that depends on A. We call this the conditional probability of A given Y . Write P{A|Y = y} = pA (y), and we call it the conditional probability of A given Y = y. As in the foregoing, Z Z Z −1 P{A|Y = y}dPY (y), P{A|Y }(ω)dP(ω) = 1{A} (ω)dP(ω) = P{A ∩ Y (B)} = Y −1 (B) Y −1 (B) B so we can write this probability in terms of the probability distribution of Y ! 
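For a discrete Y the displayed identity can be verified exactly. The example below is our own choice of A and Y (two fair dice, computed in rational arithmetic): it checks P{A ∩ Y⁻¹(B)} = ∑_{y∈B} P{A|Y = y} P{Y = y} for a concrete B.

```python
from fractions import Fraction
from itertools import product

# Two fair dice; a hypothetical concrete case of conditioning on a discrete Y.
omega = list(product(range(1, 7), repeat=2))          # equally likely outcomes
P = Fraction(1, len(omega))
A = {w for w in omega if w[0] + w[1] >= 9}            # event A: sum at least 9
Y = lambda w: w[0]                                    # discrete r.v. Y = first die

def p_A_given(y):
    # p_A(y) = P(A ∩ {Y = y}) / P{Y = y}, the conditional probability of A given Y = y
    level = {w for w in omega if Y(w) == y}
    return Fraction(len(A & level), len(level))

# The identity P{A ∩ Y^{-1}(B)} = sum_{y in B} P{A|Y=y} P{Y=y}, here for B = {4, 5, 6}:
B = {4, 5, 6}
lhs = len({w for w in A if Y(w) in B}) * P
rhs = sum(p_A_given(y) * Fraction(1, 6) for y in B)
print(lhs, rhs)  # the two sides agree exactly
```

Because everything is rational, the equality holds exactly rather than up to simulation error.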
Problem 3.9 Calculate P{A|Y = y} when Y is a discrete r.v. Let X be another r.v. on the same probability space. Then we can apply the above to the set A = {X ∈ B 0 }. it is common to write PX|Y (B 0 ) = P{X ∈ B 0 |Y }, PX|Y =y (B 0 ) = P{X ∈ B 0 |Y = y} as the conditional distribution of X given y and given Y = y respectively. This implies that Z Z 0 0 P{X ∈ B ∩ Y ∈ B} = P{X ∈ B |Y }dP = PX|Y =y (B 0 )dPY (y). (3.3) Y −1 (B) B It is a theorem that one can choose a so-called regular version of PX|Y =Y (ω) , which is a probability measure on (R, B) for P-almost all ω ∈ Ω. Problem 3.10 Argue that PX|Y =y (A) = PX (A) when X and Y are independent r.v.s on the same probability space (Ω, F, P). Since PX|Y =y is a probability distribution on (R, B), we can calculate expectations of B-measurable functions. t:l-8 Lemma 3.3 Let φ be a Borel function. Then Z E(φ(X)|Y = y) = φ(x)dPX|Y =y (x). (3.4) R Problem 3.11 Derive this relation, when Y is a discrete r.v. Proof. Why is this so? Again we apply the strategem of going from elementary functions, via nonnegative functions to general functions. First, let φ = 1{B} , B ∈ B. In this case φ(X) = 1{B} (X) = 1{} X −1 (B). Hence def E(φ(X)|Y ) = E(1{B} (X)|Y ) = E(1{X −1 (B)} |Y ) = P{X −1 (B)|Y } = PX|Y (B). On the other hand Z Z φ(x)dPX|Y =y (x) = x dPX|Y =y (x) = PX|Y =y (B). x∈B 25 General elementary functions φ are linear combinations of indicator functions. The assertion then follows from the above and the linearity property BSP29 property (1). For positive functions it follows by monotone convergence of conditional expectations. Finally we write φ = φ+ − φ− , and then the results follows again from linearity. QED We have reduced the problem of computing conditional expectations, to the problem of computing conditional probability distributions. Does this help? Very often, a problem already is formulated in terms of conditional distributions. If this is not the case, one can do something in the following case. 
Say X, Y have a joint probability density fX,Y , with respect to the Lebesgue measure λ2 on (R2 , B 2 ): Z Z 0 P{X ∈ B , Y ∈ B} = fX,Y (x, y)dλ2 (x, y). x∈B 0 ,y∈B Then fY (y) = R R fX,Y (x, y)dλ(x) acts as a probability density of Y . Define the elementary conditional pdf (=probability density function) of X given Y as ( fX,Y (x,y) if fY (y) 6= 0 fY (y) , fX|Y =y (x) = 0, otherwise. Then Z Z PX|Y =y (A) = P{X ∈ A|Y = y} = fX|Y =y (x)dλ(x), E(φ(X)|Y = y) = φ(x)fX|Y =y (x)dλ(x). x∈A This material is contained in BSP exercise 2.16 and Remark 2.3 and you should be able to do the derivations by help of BSP. One can check the validity of this by checking the definition of conditional expectation by rewriting (3.3). Extra observation on conditioning Often, the same random variable appears in the conditioningPand as part of the random variable that we take the conditional expectation of. For instance, E( Ti=1 Xi |T = t), where T is a r.v. with positive integer values. Intuitively it is clear that we can P P insert the value t for T in the conditioning: E( Ti=1 Xi |T = t) = E( ti=1 Xi |T = t). Is this true generally? For some cases we do know this already: i) E(X + f (Y )|Y = y) = E(X|Y = y) + f (y) = E(X + f (y)|Y = y), by linearity of conditional expectations, for any Borel function f ; ii) or, E(Xf (Y )|Y = y) = E(X|Y = y)f (y) = E(Xf (y)|Y = y), by “taking out what is known”. How can one prove this in case of the above example of a random sum? Let X, Y be given r.v.’s on a probability space (Ω, F, P). Let us consider the functions f : R2 → R, with f B 2 -measurable. A π-system for B 2 is for instance the collection of product sets {(−∞, x] × (−∞, y]|x, y ∈ R}. The question if whether E(f (X, Y )|Y = y) = E(f (X, y)|Y = y) PY -a.s. Define H as the collection of bounded Borel functions f : R2 → R, with E(f (X, Y )|Y = y) = E(f (X, y)|Y = y) PY -a.s. It is straightforward to check that H is a monotone class. 26 Let f = 1{(−∞,a]×(−∞,b])} . 
If f ∈ H for any a, b ∈ R, then H contains all bounded B 2 -measurable functions by the Monotone class Theorem. We check that f ∈ H: E(f (X, Y )|Y ) = E(1{X≤a} 1{Y ≤b} |Y ) = 1{Y ≤b} E(1{X≤a} |Y ). On the other hand E(f (X, y)|Y ) = E(1{X≤a} 1{y≤b} |Y ) = 1{y≤b} E(1{X≤a} |Y ). On the set Y −1 (y) one has 1{y≤b} = 1{Y ≤b} (the first is the constant function on Ω, either 0 everywhere or 1). So the result is proved, if we take the same version E(1{X≤a} |Y ). Now, think yourself how to extend this to unbounded B 2 -measurable functions f . Another approach is to use joint measures: P{X ≤ a, Y ≤ b} defines a probability measure PX,Y on (R2 , B 2 ). We have seen that Z Z dPX|Y =y (x)dPY (y). PX,Y {(−∞, a] × (−∞, b]} = P{X ≤ a, Y ≤ b} = y≤b x≤a Under assumed regularity conditions, for arbitrary Borel sets B 2 ∈ B 2 one gets by standard procedures Z Z dPX|Y =y (x)dPY (y), PX,Y (B 2 ) = y:∈By x:(x,y)∈B 2 with By = {y ∈ R : ∃x such that (x, y) ∈ B 2 }. So we have an identity for measures. Now by going through the standard machinery of indicator functions, elementary functions, positive and general functions, one can show that Z Z f (X, Y )dP = f (x, y)dPX,Y (x, y) ω x,y Z Z Z = f (x, y)dPX|Y =y (x)dPY (y) = E(f (X, y)|Y = y)dPY (y) y x y provided that E(f (X, y)|Y = y) is a B-measurable function on R! Can you prove this from standard machinery? Remind that E(f (X, Y )|Y ) = g(Y ) for some Borel-function g. Our goal to prove is that one can take g(y) = E(f (X, y)|Y = y) (presumably we have proved measurability). Not to confuse notation, write h(y) = E(f (X, y)|Y = y). We get Z Z h(Y )dP = h(y)dPY (y) ω∈Y −1 (B) y∈B Z Z Z = f (x, y)dPX|Y =y dPY (y) = f (x, y)dPX,Y (x, y) = y∈B x y∈B,x∈R Z = f (X, Y )dP. ω∈Y −1 (B) It follows that h(Y ) is a version of the conditional expectation E(f (X, Y )|Y ). 
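The substitution rule just proved, E(f(X, Y)|Y = y) = E(f(X, y)|Y = y), can be checked exactly in a small discrete example of our own devising (a die value Y, a dependent X = Y + coin, and f(x, y) = xy; none of this is from the notes):

```python
from fractions import Fraction
from itertools import product

# Hypothetical discrete example: outcomes (die, coin), Y = die value, X = Y + coin,
# so X and Y are dependent. Take f(x, y) = x * y.
omega = list(product(range(1, 4), range(2)))           # (y, c), all equally likely
Y = lambda w: w[0]
X = lambda w: w[0] + w[1]
f = lambda x, y: x * y

def cond_exp(g, y):
    # E(g | Y = y): average of g over the event {Y = y} (uniform measure)
    ev = [w for w in omega if Y(w) == y]
    return sum(g(w) for w in ev) * Fraction(1, len(ev))

for y in (1, 2, 3):
    lhs = cond_exp(lambda w: f(X(w), Y(w)), y)   # E(f(X, Y) | Y = y)
    rhs = cond_exp(lambda w: f(X(w), y), y)      # E(f(X, y) | Y = y): Y replaced by y
    assert lhs == rhs
    print(y, lhs)
```

On each level set {Y = y} the random argument Y is constant, which is exactly why the substitution is harmless; the code makes that visible.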
In case that X and Y are independent, we have a simpler expression since in this case PX|Y =y (B) = PX (B): h(y) = E(f (X, y)|Y = y) = EX (f (X, y)), where we take the unconditional expectation w.r.t. X. 27 Help variables Sometimes it is convenient to consider ‘mixtures’ of conditional expectation in the following sense. Let X, Y, Z be r.v.s on (Ω, F, P). One can then speak of E(X|Y, Z = z). Let g(Y, Z) is a Borel function that is a.s. equal to E(X|Y, Z). Then E(X|Y, Z = z) = g(Y, z), where Y is left unspecified. Since σ(Z, Y ) ⊃ σ(Y ), the Tower property yields that E(E(X|Y, Z)|Y ) = E(X|Y ). Let us consider E(E(X|Y, Z)|Y ) = E(g(Y, Z)|Y ). We are in the above situation: E(g(Y, Z)|Y = y) = R E(g(y, Z)|Y = y) = z g(y, z)dPZ|y=y (z). Now, if Z and Y are independent, we find Z E(g(Y, Z)|Y = y) = g(y, z)dPZ (z), z R so that E(X|Y = y) = z g(y, z)dPZ (z). Hence, if the conditional expectation E(X|Y, Z) = g(Y, Z) is easy to calculate, this may help to solve the more complicated problem of calculating E(X|Y ). Problem 3.12 Try to justify all these steps. This procudure may help to attack BSP exercise 2.6 in a more structured way. Problem 3.13 Let X = ξ and Y = η from exercise 2.6. Define an appropriate r.v. Z, such that E(X|Y, Z) can be directly calculated. Compute the desired conditional expectation E(X|Y ). One can derive many convenient statements about these ‘mixed’ conditional distributions. Let X, Y1 , . . . , Yn , Z be r.v.s on the same probability space (Ω, F, P). Problem 3.14 i) Show that E(X|Z = z) = E(E(X|Y1 , . . . , Yn , Z = z)|Z = z). ii) Let Z = z ∈ σ(Y1 , . . . , Yn ). Show that E(E(X|Y1 , . . . , Yn )|Z = z) = E(E(X|Y1 , . . . , Yn , Z = z)|Z = z). d d Problem 3.15 Let X, Y be independent r.v.s with X = exp(λ), Y = exp(µ). Show that d min{X, Y } = exp{λ + µ}. Problem 3.16 Let X1 , . . . , Xn be i.i.d. r.v.s, distributed as a homogeneous distribution on (0, 1) d (Xi = Hom(0, 1)). i) Determine the distribution function FZ and density fZ of Z = max(X1 , . . 
. . , Xn).
ii) Calculate P{Z ≤ z|X1 = x} and the density fZ|X1=x(z).
iii) Calculate P{X1 ≤ x|Z = z} and P{X1 ≤ x|Z}. Hint: use (ii). Calculate E(X1|Z).

Problem 3.17 Let U, V be i.i.d. r.v.s, with U, V =d Hom(0, 1). Let X = min(U, V) and Y = max(U, V). Calculate P{Y ≤ y|X} and calculate E(Y|X).

Problem 3.18 Let X1, . . . , Xn be i.i.d. r.v.s with continuous distribution function F. Let X = max{X1, . . . , Xn} and Y = min{X1, . . . , Xn}. Prove the following statements.
i) P{Y > y|X = t} = ((F(t) − F(y))/F(t))^{n−1}, y < t.
ii) P{Xk ≤ x|X = t} = ((n − 1)/n) · F(x)/F(t) for x < t, and = 1 for x ≥ t.
iii) E(Xk|X = t) = ((n − 1)/n) · (1/F(t)) ∫_{−∞}^t y dF(y) + t/n.

Problem 3.19 Gambler's ruin. A man is saving money to buy a new Jaguar at the cost of N units of money. He starts with k (1 < k < N) units and tries to win the remainder by the following gamble with his bank manager. He tosses a fair coin repeatedly; if it comes up heads the manager pays him one unit, but if it comes up tails then he pays the bank manager one unit. He plays this game repeatedly, until one of two events occurs: either he runs out of money and is bankrupted, or he wins enough to buy the Jaguar. What is the probability that he is ultimately bankrupted?
Let Ak denote the event that he is eventually bankrupted, given an initial capital of k units. Write pk = P{Ak}. Let B be the event that the first toss of the coin shows heads. Conditioning on B yields a linear relation between pk, pk−1 and pk+1, for k = 1, . . . , N − 1. This is a linear difference equation with boundary conditions p0 = 1, pN = 0. A trick to solve this (and many similar problems) is to look at the differences bk = pk − pk−1. The linear difference equation then transforms into a linear relation between bk and bk+1.
i) Solve it and determine pk.
ii) One can look at the problem from a different point of view. Say, let T be the first time our man either is bankrupted or has collected the money for buying the Jaguar. Show that T is a stopping time.
Assume that it is finite with probability 1 and has finite expectation. Use this to derive the same formula for pk.

Problem 3.20 Now the man follows another strategy. He starts by betting one unit of money. If heads comes up, the manager pays him his bet; if tails comes up, he loses his bet to the manager. Every time he wins, he increases his bet by one, but he will never bet more than his present capital or the remainder needed to buy the Jaguar. If he loses, he decreases the next bet by one, with again the condition that he will not bet more than his present capital and the sum needed to buy the Jaguar. He will always bet at least 1. Denote by Sn his capital after n bets; S0 = k is his initial capital. Let T again denote the moment that the man stops betting. Then let us simply model that the man's capital remains the same forever after.
i) Show that E(Sn+1|S0, . . . , Sn) = Sn.
ii) Show that E(Sn) = S0.
iii) Assume that we may conclude that E(ST) = S0. Determine now the probability that the man goes bankrupt. How do both strategies compare?

Problem 3.21 A biased coin is tossed repeatedly. Each time there is a probability p of a head turning up. Let pn be the probability that an even number of heads has occurred after n tosses (zero is an even number). Then p0 = 1. Derive an expression for pn in terms of pn−1 and use it to calculate pn, n = 1, 2, . . ..

Sequences of r.v.s and some examples

Gambling systems (cf. BSP Ch. 3) A casino offers the following game consisting of n rounds. In every round t the gambler bets αt ≥ 0. His bet in round t may depend on his knowledge of the game's past. The outcomes ηt, t = 1, . . ., of the game are i.i.d. r.v.s with values in {−1, 1} and P{ηt = 1} = 1/2 = P{ηt = −1}. The gambler's capital at time t is therefore Xt = ∑_{i=1}^t αi ηi. A gambling strategy α1, α2, . . . is called admissible if αt is σ(η1, η2, . . . , ηt−1)-measurable. In words this means that the gambler has no prophetic abilities.
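The role of admissibility is easy to probe by simulation. The rough Monte Carlo sketch below is our own illustration (the particular admissible strategy, the horizon of 20 rounds and the sample size are arbitrary): an admissible strategy leaves the expected capital at 0, while a strategy that peeks at the outcome it is betting on produces a strictly positive average gain.

```python
import random

random.seed(2)
T_ROUNDS, TRIALS = 20, 10000

def final_capital(strategy):
    # One play: in round t the gambler bets strategy(past outcomes), then eta_t is drawn.
    eta, capital = [], 0
    for _ in range(T_ROUNDS):
        bet = strategy(eta)              # admissible: sees eta_1, ..., eta_{t-1} only
        eta.append(random.choice([-1, 1]))
        capital += bet * eta[-1]
    return capital

def admissible(past):                    # an arbitrary admissible strategy
    return 1 if (not past or past[-1] == 1) else 2

avg = sum(final_capital(admissible) for _ in range(TRIALS)) / TRIALS
print(avg)                               # close to 0: the expectation cannot be shifted

def prophetic_gain():
    # "only bet if you will win": bet 1 exactly when the coming eta_t is +1.
    return sum(1 for _ in range(T_ROUNDS) if random.choice([-1, 1]) == 1)

avg_p = sum(prophetic_gain() for _ in range(TRIALS)) / TRIALS
print(avg_p)                             # close to T_ROUNDS / 2 = 10
```

The simulation only illustrates the martingale property E(Xt) = 0 for admissible strategies; it is no substitute for the proof asked for in the problems.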
His bet at time t depends exclusively on observed past history. Example: αt = 1{ηt > 0}, "only bet if you will win", is not admissible.

Problem 3.22 By the distribution of the outcomes, one has E(Xt) = 0. Prove this.

T = min{t | Xt ≤ α} is a stopping time, since {T ≤ t} = ∪_{l=0}^t {Xl ≤ α} and {Xl ≤ α} ∈ σ(η1, . . . , ηl) ⊂ σ(η1, . . . , ηt), l ≤ t. Now, αt = 1{T > t−1} = 1{T ≥ t} ∈ σ(η1, . . . , ηt−1) defines an admissible gambling strategy with
Xt = ∑_{j=1}^t αj ηj = ∑_{j=1}^t 1{T≥j} ηj = ∑_{j=1}^{min{t,T}} ηj = S_{min{t,T}},
where St = ∑_{j=1}^t ηj. Hence E(S_{min{t,T}}) = 0 if T is a stopping time.

Hedging We have seen that the above gambling strategies cannot modify the expectation: on the average the gambler wins and loses nothing. Apart from that, which payoffs can one obtain by gambling? We discuss a simple model for stock options. Assume that the stock price either increases by 1 or decreases by 1 every day, with probability 1/2, independently from day to day. Suppose I own αt units of stock at time t. Then the value of my portfolio increases by αt ηt every day (the ηt are defined as in the gambling section). Suppose the bank offers the following contract, a "European option": at a given time t one has the choice to buy 1 unit of stock for price C or not to buy it. C is specified in advance. Our pay-off per unit of stock is (St − C)⁺. In exchange, the bank receives a deterministic amount E((St − C)⁺). Can one generate the pay-off by an appropriate gambling strategy? The answer is yes, and in fact much more is true.

Lemma 3.4 Let Y be a σ(η1, . . . , ηn)-measurable function. Then there is a gambling strategy α1, . . . , αn such that
Y − E(Y) = ∑_{j=1}^n αj ηj.

Proof. Write Fn = σ(η1, . . . , ηn). Define αj by αj ηj = E(Y|Fj) − E(Y|Fj−1). We have to show that αj is Fj−1-measurable.

Problem 3.23
i) Show that E(αj ηj|Fj−1) = 0.
ii) Use this fact to show that E(αj|Fj−1, ηj = 1) = E(αj|Fj−1, ηj = −1).
Now αj is Fj−1-measurable if
αj = (1/ηj) E(αj ηj | Fj−1, σ(ηj))
does not depend on the value of ηj. But this follows from the above.

Problem 3.24 Explain this.

We conclude that αj, j = 1, . . ., is a gambling strategy. The result follows by addition. QED

Problem 3.25 Symmetry. Let X1, . . . be i.i.d. r.v.s with finite expectation. Let Sn = X1 + · · · + Xn. In general X1 is not σ(Sn)-measurable for n ≥ 2. Explain, and give an example. Show that with probability 1 we have
E(X1|Sn) = · · · = E(Xn|Sn) = (1/n) E(X1 + · · · + Xn|Sn) = (1/n) E(Sn|Sn) = Sn/n.

4 Martingales

From now on we will mainly list homework problems. As a basic 'datum' we take a filtered space (Ω, F, {Fn}n, P). Here (Ω, F, P) is a probability space and Fn ⊂ F, n = 1, . . ., is a filtration, that is F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define F∞ = σ(∪n Fn). Let {Mn} be a supermartingale, adapted to the filtration {Fn}n.

Problem 4.1 Suppose that S and T are stopping times adapted to the filtered space. Show that min(S, T) = S ∧ T, max(S, T) = S ∨ T and S + T are stopping times.

Suppose that T is a stopping time that is finite with probability 1. Then {Mn∧T}n is a supermartingale (provided that E|Mn∧T| < ∞) and hence E(Mn∧T) ≤ E(M0). Under what conditions is
E(MT) ≤ E(M0)?  (4.1)
Basically one needs a condition ensuring that
E(MT) ≤ lim_{n→∞} E(Mn∧T)  (4.2)
in the supermartingale case, or
E(MT) = lim_{n→∞} E(Mn∧T)
in the martingale case. The latter amounts to justifying interchange of limit and expectation. BSP gives general conditions for this to happen in the form of (Doob's) Optional Stopping Theorem. We can also give simpler conditions that often apply and for which (4.2) can be proved in a more direct manner. We give another form of the Optional Stopping Theorem.

Theorem 4.1 (Doob's Optional Stopping Theorem)
i) Let {Mn}n be a supermartingale and T an a.s. finite stopping time. One has E|MT| < ∞ and (4.1) in each of the following cases.
1. T is a.s.
bounded: T(ω) ≤ N for almost all ω ∈ Ω, for some constant N.
2. Mn(ω) ≤ C for some constant C, for almost all ω, n = 0, 1, . . ..
3. E(T) < ∞ and |Mn(ω) − Mn−1(ω)| ≤ C for some constant C, for a.a. ω, n = 1, . . ..
ii) If {Mn}n is a martingale then E(MT) = E(M0) under any of the conditions 1, 2 or 3.
iii) Martingale transformation. Suppose that {Mn}n is a martingale and T a stopping time satisfying (i, 3). Let {αn}n be an admissible gambling strategy adapted to {Fn}n (or a previsible process), such that |αn(ω)| ≤ C2 for a.a. ω, n = 1, . . ., for some constant C2. Then
E(∑_{n=1}^T αn(Mn − Mn−1)) = 0;
in other words, on the average, we cannot turn a neutral game into a profitable (or losing) one.
iv) If {Mn}n is a non-negative supermartingale and T is a.s. finite, then (4.1) again applies.

Problem 4.2
i) Prove parts (i, ii, iii) of the above Optional Stopping Theorem.
ii) Prove (iv). Deduce that λP{supn Mn ≥ λ} ≤ E(M0).

A problem in applying this theorem is to check a.s. finiteness of the stopping time, and even checking that it has finite expectation. There is a simple result which applies in many cases.

Lemma 4.2 What always stands a reasonable chance of happening, will a.s. happen, sooner rather than later. Let T be a stopping time on the filtered space (Ω, F, (Fn)n, P). Suppose T has the property that for some N ∈ Z⁺ and some ε > 0,
P{T ≤ t + N | Ft} > ε, a.s., t = 1, 2, . . ..
Then E(T) < ∞; in particular T < ∞ a.s.

Problem 4.3 Prove Lemma 4.2. Hint: using that P{T > kN} = P{T > kN, T > (k − 1)N}, prove by induction that P{T > kN} ≤ (1 − ε)^k.

Monkey typing ABRACADABRA At each of the times 1, 2, 3, . . ., a monkey types a capital letter at random. The letters form an i.i.d. sequence of r.v.s, uniformly drawn from the 26 possible capital letters. Just before each time t = 1, 2, 3, . . ., a new gambler arrives, carrying €1 in his pocket. He bets €1 that the t-th letter will be A.
If he loses, he leaves; if he wins, he receives 26 times his bet (so that his total capital after his first bet is €26). He then bets all of €26 on the event that the (t + 1)-th letter will be B. If he loses, he leaves. If he wins, he bets his whole fortune of €26² on the event that the (t + 2)-th letter will be R. And so forth through the whole ABRACADABRA sequence. Let T be the first time by which the monkey has produced the ABRACADABRA sequence. Once this sequence has been produced, gamblers stop arriving and nothing happens anymore.

Problem 4.4 i) Put M_0 = 0. Show that the total accumulated gain M_t of the gamblers at time t, t = 0, 1, 2, . . ., is a martingale (loss is a negative gain). ii) Show that T is a.s. finite with E(T) < ∞. iii) Explain why martingale theory makes it intuitively obvious that E(T) = 26^11 + 26^4 + 26. Prove this. iv) Can you make a guess at the expected time till the monkey has typed 10 successive A's? Explain intuitively.

Simple and asymmetric random walks Let X_n be a simple or asymmetric random walk on the integers. Then X_n is a martingale whenever p = 1/2, a supermartingale whenever p < 1/2 and a submartingale whenever p > 1/2. First consider a finite interval (a, b), such that X_0 ∈ (a, b). Let T be the first time that X_n leaves this interval, i.e. T = min{n | X_n ∉ (a, b)}.

Problem 4.5 i) Show that X_n − n(2p − 1) is a martingale. ii) Show that T is a.s. finite and has finite expectation. Let p = 1/2. iii) Compute P{X_T = a} and P{X_T = b}, using the martingale from (i). iv) Compute E(T). Hint: use one of the ways discussed in BSP or during the lectures to define a suitable related martingale.

In the case that e.g. b = ∞, the result should be intuitively obvious: T is a.s. finite whenever p ≤ 1/2, but it is not whenever p > 1/2. Let a = 0, X_0 = 1, b = ∞; that is, we are interested in the probability that the random walk will hit 0. There are many ways of investigating this. Here we aim to use methods discussed in Ch.
3 of BSP and the notes.

Problem 4.6 i) Use the previous problem to show that T is a.s. finite in the symmetric case. Show that E(T) = ∞. ii) Assume that p < 1/2. Show that T is a.s. finite by using the martingale from (i) of the previous problem. Hint: show that E(n ∧ T) ≤ 1/(1 − 2p). Deduce that E(T) < ∞.

This still leaves the case p > 1/2. A simple technique coming from Markov chain theory and potential theory helps. We formulate it in a more general context.

Lemma 4.3 Let {X_n}_n be a stochastic process with values in Z, adapted to the filtration {F_n}_n. Let SH be the collection of functions f : Z+ → R with the following properties: f ≥ 0, and {f(X_n)}_n is a supermartingale (adapted to {F_n}_n) for any initial position X_0 = x. Let T_0 = min{n > 0 | X_n = 0}. In Markov chain theory such functions are called non-negative superharmonic functions.

i) Let x > 0 be given. Suppose that P{T_0 < ∞ | X_0 = x} = 1. Then f(0) ≤ f(x).

ii) Show that the stopping time T for the asymmetric walk with p > 1/2 is infinite with positive probability. Hint: construct a function f with f(0) > f(x), such that f(X_n) is a martingale.

Martingale formulation of Bellman's optimality principle Your winnings per unit stake on game n are ε_n, where the ε_n are i.i.d. r.v.s with P{ε_n = 1} = p = 1 − P{ε_n = −1}, with p > 1/2. Your bet α_n on game n must lie between 0 and Z_{n−1}, your capital at time n − 1. Your object is to maximise your 'interest rate' E log(Z_N / Z_0), where N = length of the game is finite and Z_0 is a given constant. Let F_n = σ(ε_1, . . . , ε_n) be your 'history' up to time n. Let {α_n}_n be an admissible strategy.

Problem 4.7 Show that log(Z_n) − nα is a supermartingale, with α the entropy given by α = p log p + (1 − p) log(1 − p) + log 2. Hence E log(Z_N / Z_0) ≤ Nα. Show also that for some strategy log(Z_n) − nα is a martingale. What is the best strategy?

5 Martingale convergence problems

Let the filtered probability space (Ω, F, {F_n}_n, P) be given.
All processes are again processes on this space, adapted to the filtration {F_n}_n. A summary of the L¹-supermartingale convergence theorem is as follows.

Theorem 5.1 (BSP Thm. 4.3, 4.4) Let {M_n}_{n=0,1,...} be a UI supermartingale. Then M_n → M_∞ a.s. for some r.v. M_∞, and even M_n → M_∞ in L¹, i.e. E|M_n − M_∞| → 0, n → ∞. If {M_n}_n is a martingale, then M_n = E(M_∞ | F_n), and so {M_n}_n is a Doob type martingale w.r.t. M_∞.

It is useful to quote the following theorem, which extends BSP exercise 4.5.

Theorem 5.2 (Levy's 'Upward' Theorem) Let X be a r.v. with E|X| < ∞. Then M_n = E(X | F_n) is a UI martingale. Let M_∞ = lim_{n→∞} M_n a.s.; then M_∞ = E(X | F_∞) a.s., where F_∞ = σ(∪_n F_n).

That M_∞ = E(X | F_∞) is by no means trivial. It amounts again to justifying a limit interchange: lim_n E(X | F_n) = E(X | σ(lim_n F_n)).

Proof. We only have to prove that M_∞ = E(X | F_∞) a.s. Let Y = E(X | F_∞) a.s., and suppose that P{Y ≠ M_∞} > 0. We may assume that X ≥ 0 a.s. Define two measures on (Ω, F_∞):

µ_1(A) = E(Y 1_{A}), µ_2(A) = E(M_∞ 1_{A}), A ∈ F_∞.

For B ∈ F_n we have B ∈ F_∞ and so

µ_1(B) = E(Y 1_{B}) = E(X 1_{B}) (definition of Y) = E(M_n 1_{B}) (definition of M_n) = E(M_∞ 1_{B}) (BSP Thm. 4.4) = µ_2(B).

Hence µ_1 and µ_2 agree on the π-system ∪_n F_n and therefore they agree on F_∞. Now Y is F_∞-measurable. Take M_∞ = lim sup_n M_n; then M_∞ is F_∞-measurable too. Hence F = {Y > M_∞} is F_∞-measurable and so

E((Y − M_∞) 1_{F}) = µ_1(F) − µ_2(F) = 0.

Since (Y − M_∞) 1_{F} ≥ 0, it follows that P{F} = 0. Similarly, P{Y < M_∞} = 0. QED

Theorem 5.3 (Kolmogorov's 0-1 law) Let X_1, . . . be a sequence of independent r.v.s. Then P{A} = 0 or 1 for all A ∈ T, with T the tail-σ-algebra.

Proof. Define F_n = σ(X_1, . . . , X_n). Let A ∈ T, and let X = 1_{A}. By Levy's Upward Theorem,

X = E(X | F_∞) = lim_n E(X | F_n), a.s.

Now X is T_{n+1}-measurable, where T_{n+1} = σ(X_{n+1}, X_{n+2}, . . .). Since T_{n+1} and F_n are independent, it follows that X is independent of F_n. And so E(X | F_n) = E(X) = P{A}. Consequently, X = P{A}, a.s.
The result follows, since indicator functions take only the values 0 and 1. QED

There is a nice proof of the strong Law of Large Numbers using Kolmogorov's 0-1 Law. To this end we will in fact use so-called 'reverse martingales'.

Theorem 5.4 (Levy's Downward Theorem) Let (Ω, F, P) be a probability space. Let {F_{−n}}_{n=0,...} be a non-increasing collection of sub-σ-algebras of F with

F_{−1} ⊇ F_{−2} ⊇ · · · ⊇ F_{−n} ⊇ · · · ⊇ F_{−∞} = ∩_n F_{−n}.

Let X be a r.v. with E|X| < ∞, and define M_{−n} = E(X | F_{−n}). Then M_{−∞} = lim_{n→∞} M_{−n} exists a.s. and in L¹. Moreover M_{−∞} = E(X | F_{−∞}), a.s.

Problem 5.1 Prove the theorem. Use the techniques that were used for Doob's submartingale convergence theorem, the L¹-convergence and Levy's Upward Theorem.

Let X_1, . . . be a sequence of i.i.d. r.v.s with finite expectation. Write S_n = Σ_{i=1}^n X_i. Define F_{−n} = σ(S_n, S_{n+1}, . . .).

Problem 5.2 i) Show that E(X_1 | F_{−n}) = S_n / n, a.s. ii) Show that lim_{n→∞} S_n / n exists a.s. and in L¹, and that it equals E(X_1).

Galton-Watson process - the simplest form of a branching process This is a simple model for population growth, growth of the number of cells, etc. Suppose that we start with a population of 1 individual at time 0, i.e. N_0 = 1. The number of t-th generation individuals is denoted by N_t. Individual n from this generation has an amount of offspring Z_t^n. We assume that Z_t^n, n = 1, . . . , N_t, t = 0, . . ., are bounded i.i.d. r.v.s, say

P{Z_t^n = k} = p_k, k = 0, . . . , K,

for some constant K > 0, and that p_0 > 0. Clearly, N_{t+1} = Σ_{n=1}^{N_t} Z_t^n and µ = E(N_1) = Σ_k k p_k. We are interested in the extinction probability of the population as well as the expected time till extinction. Define T = min{t | N_t = 0}.

Problem 5.3 i) Show that N_t / µ^t is a martingale with respect to an appropriate filtration. Let now µ < 1, that is, on average an individual produces less than one child. ii) Show that N_t → 0 a.s. What does this imply for the extinction time T?
iii) Show that M_t = α^{N_t} 1_{N_t > 0} is a contracting supermartingale for some α > 1, i.e. for some α > 1 there exists β < 1 such that E(M_{t+1} | F_t) ≤ β M_t, t = 1, . . .. iv) Show that this implies that E(T) < ∞. What is the smallest bound on E(T) you can get?

The case of a population that remains constant on average is more complicated. Let T_N = min{t | N_t = 0 or N_t ≥ N}. Intuitively it is clear that T_N should be a.s. finite. In order to prove this, define the function f(x) = P{0 < N_t < N for all t ≥ n | N_n = x}.

Problem 5.4 i) Show that f(N_t) is a supermartingale. ii) Show that this implies that f ≡ 0, for all values of µ. Hint: consider the value f(x*) where x* = argmax{f(x)}. iii) Let µ = 1. Use (ii) to show that P{T < ∞} = 1. Is N_t a UI martingale in this case? Explain. iv) Prove that P{T < ∞} < 1 whenever µ > 1. You can prove this by using arguments that you have seen before during the course.

We are still left with the question whether the average time till extinction is finite or not in the critical situation µ = 1. The answer is that E(T) = ∞, for which there seems to exist no probabilistic proof.

Problem 5.5 Find the simplest proof in the literature of this statement and write it down in your own words.

6 Continuous time processes: the Wiener process or Brownian motion

Multivariate normal distribution Let X = (X_1, X_2, . . . , X_k)^T be a k-dimensional random vector. We say that X has a N(µ, Σ) distribution, with µ ∈ R^k and Σ a k×k positive definite matrix, if a^T X has the normal distribution N(a^T µ, a^T Σ a), for all a ∈ R^k. The simultaneous distribution of X is given by the density

f_X(x) = (1 / √((2π)^k det(Σ))) exp{−(1/2)(x − µ)^T Σ^{−1} (x − µ)}, x ∈ R^k.

A first consequence is that for a non-singular matrix B, the vector BX has the N(µ*, Σ*) distribution with µ* = Bµ and Σ* = BΣB^T. The definition contains all information on the components X_i and their correlations cov(X_i, X_j) = E(X_i X_j) − E(X_i)E(X_j).
Putting a = e_i, the i-th unit vector, we obtain that X_i has the N(µ_i, Σ_ii) distribution. Using a = e_i + e_j, one can deduce that cov(X_i, X_j) = Σ_ij. By using the density, one can then show that Σ_ij = 0 implies independence of X_i and X_j. This is a special property of normally distributed r.v.s.

Next we define Brownian motion.

Brownian motion or Wiener process The stochastic process W(t), t ∈ R+, defined on the probability space (Ω, F, P), is called a standard Brownian motion (or standard Wiener process) if

i) W(0) = 0 a.s.;
ii) (W(t_1), . . . , W(t_n)) has a multivariate normal distribution, for all n and all times 0 < t_1 < t_2 < · · · < t_n;
iii) E(W(t)) = 0 for t > 0;
iv) cov(W(s), W(t)) = min(s, t);
v) W(·, ω) is a continuous function for a.a. ω ∈ Ω.

Problem 6.1 Let 0 < t_1 < · · · < t_n. By assumption (W(t_1), . . . , W(t_n)) has a multivariate normal distribution. Compute the covariance matrix.

Construction of Brownian motion on [0, 1] For each ω we will define a uniformly convergent sequence of continuous functions W_l(t, ω), t ∈ [0, 1], l = 0, 1, . . ..

Define W_0(0) = 0, and choose W_0(1) = ∆_{0,0}, where ∆_{0,0} has the N(0, 1) distribution. Extend W_0(t), 0 < t < 1, by linear interpolation: W_0(t) = t · W_0(1).

Next let ∆_{1,1} have the N(0, 1/4) distribution, drawn independently of ∆_{0,0}. Define W_1(0) = 0,

W_1(1/2) = (1/2) W_0(1) + ∆_{1,1} and W_1(1) = W_0(1),

and define W_1(t), t ≠ 0, 1/2, 1, by linear interpolation. It is easily checked that (W_1(1/2), W_1(1)) has a multivariate normal distribution with the properties (iii, iv). Indeed,

cov(W_1(1/2), W_1(1)) = cov((1/2) W_0(1), W_0(1)) = (1/2) σ²(W_0(1)) = 1/2 = min(1/2, 1).

Further σ²(W_1(1/2)) = (1/4) σ²(W_0(1)) + σ²(∆_{1,1}) = 1/2.

The construction of W_{l+1}(t) from W_l(t) is as follows. Let ∆_{l+1,j}, j = 1, . . . , 2^l, have the N(0, 2^{−(l+2)}) distribution, independent of each other and of ∆_{i,j}, i ≤ l, j = 1, . . . , 2^{i−1}. Assign W_{l+1}(0) = 0,

W_{l+1}((2j − 1) 2^{−(l+1)}) = W_l((2j − 1) 2^{−(l+1)}) + ∆_{l+1,j}, j = 1, . . . , 2^l,

and W_{l+1}(j 2^{−l}) = W_l(j 2^{−l}), j = 1, . . .
, 2^l.

For t ≠ j 2^{−(l+1)}, j = 0, 1, . . . , 2^{l+1}, we define W_{l+1}(t) by linear interpolation between the neighbouring dyadic points:

W_{l+1}(t) = W_{l+1}(j 2^{−(l+1)}) + ((t − j 2^{−(l+1)}) / 2^{−(l+1)}) · (W_{l+1}((j + 1) 2^{−(l+1)}) − W_{l+1}(j 2^{−(l+1)})), j 2^{−(l+1)} < t < (j + 1) 2^{−(l+1)}.

Then W_{l+1}(t), t = j 2^{−(l+1)}, has the multivariate normal distribution with properties (iii, iv) from the definition of standard Brownian motion.

Lemma 6.1 sup_{0≤t≤1} |W_n(t) − W_m(t)| → 0 a.s., n, m → ∞, i.e.

P{ω : sup_{0≤t≤1} |W_n(t, ω) − W_m(t, ω)| ↛ 0 along some sequences n_{k,ω}, m_{k,ω}, k = 1, 2, . . .} = 0.

Proof. Let X_{l,j}, j = 1, . . . , 2^{l−1}, l = 1, . . ., be a collection of i.i.d. N(0, 1)-distributed r.v.s. Clearly, for l ≥ 1,

sup_{0≤t≤1} |W_l(t) − W_{l−1}(t)| ≤ max{|∆_{l,j}|, j = 1, . . . , 2^{l−1}} = (1 / 2^{(l+1)/2}) · max{|X_{l,1}|, . . . , |X_{l,2^{l−1}}|}.

Put

A_n = {ω : |X_{l,j}(ω)| > 2 · √(6 log(2^n − 1)) for some l = 1, . . . , n, j = 1, . . . , 2^{l−1}}

(there are Σ_{l=1}^n 2^{l−1} = 2^n − 1 i.i.d. N(0, 1)-distributed r.v.s that determine the max). We have seen that P{A_n} ≤ 1/(2^n − 1)² and so the first Borel-Cantelli lemma implies that P{lim sup_{n→∞} A_n} = 0. Put A = lim sup_{n→∞} A_n. Fix any ω ∈ A^c. There exists n_ω such that

sup_{0≤t≤1} |W_n(t, ω) − W_{n−1}(t, ω)| ≤ (1 / 2^{(n+1)/2}) · 2 · √(6 log(2^n − 1)), n ≥ n_ω.

Consequently, for m > n,

sup_{0≤t≤1} |W_m(t, ω) − W_n(t, ω)| ≤ Σ_{l=n+1}^m sup_{0≤t≤1} |W_l(t, ω) − W_{l−1}(t, ω)| ≤ Σ_{l=n+1}^m (1 / 2^{(l+1)/2}) · 2 · √(6 log(2^l − 1)) → 0, m, n → ∞. QED

As a result, for ω ∈ A^c the sequence of continuous functions W_n(·, ω) has a continuous limit W(·, ω). To see that this limit defines a Brownian motion on [0, 1], we still have to do some work. Let 0 < t_1 < t_2 < · · · < t_n ≤ 1. We have to show that (W(t_1), . . . , W(t_n)) is a random vector with the desired properties. This is clearly true if all t_k are dyadic rationals j_k / 2^l: at these points W(t_k) = W_n(t_k) for n ≥ l. Otherwise let t_k^m > t_k, t_k^m → t_k, m → ∞, be sequences of dyadic rationals. Then

(W(t_1, ω), . . . , W(t_n, ω)) = lim sup_{m→∞} (W(t_1^m, ω), . . . , W(t_n^m, ω)), ω ∈ A^c,

by continuity. If A = ∅ then the lim sup is measurable.
If A ≠ ∅ we need in fact that F be extended with all subsets of 0-probability sets. This is a little beyond the scope of the course. Now, (W(t_1^m), . . . , W(t_n^m)) converges a.s. to the random vector (W(t_1), . . . , W(t_n)). The corresponding multivariate normal densities converge as well, to the desired multivariate normal density. Hence the corresponding distribution functions converge. One can then show that the limit distribution function is the distribution function of (W(t_1), . . . , W(t_n)).

Lemma 6.2 Let (Ω, F, P) be a probability space. Let X and X_n, n = 1, 2, . . ., be r.v.s on this probability space, such that X_n → X a.s. Let F_n(·) = P{X_n ≤ ·} be the distribution function of X_n and assume that F_n → F for some distribution function F. Then F is the distribution function of X.

Problem 6.2 Prove this lemma - Fatou's lemma for sets plays a role here.

With probability 1, Brownian motion paths W(t), 0 ≤ t, are nowhere differentiable. In other words: there exists a set A ∈ F, P{A} = 0, such that W(·, ω) is nowhere differentiable for all ω ∈ A^c. Note: we assume that W(·, ω) has continuous paths on R+ for a.a. ω; we have proved this only for a compact time interval. Let

X_{n,k} = max{ |W((k+1)/2^n) − W(k/2^n)|, |W((k+2)/2^n) − W((k+1)/2^n)|, |W((k+3)/2^n) − W((k+2)/2^n)| }.

These increments have the distribution of 2^{−n/2} · W(1), and so

P{X_{n,k} ≤ ε} = P³{|W(1)| ≤ 2^{n/2} ε} ≤ (2 · 2^{n/2} · ε)³,

since the density of the standard normal distribution is bounded by 1. For Y_n = min_{k ≤ n 2^n} X_{n,k}, we have

P{Y_n ≤ ε} ≤ n · 2^n (2 · 2^{n/2} · ε)³. (6.1)

Problem 6.3 Explain (6.1).

Let

D̄W(t, ω) = lim sup_{h↓0} (W(t + h, ω) − W(t, ω)) / h, D̲W(t, ω) = lim inf_{h↓0} (W(t + h, ω) − W(t, ω)) / h.

Let E = {ω : D̄W(t, ω) and D̲W(t, ω) are both finite for some t}. It is not clear whether E ∈ F is a measurable set! Choose ω ∈ E. Then there exists K = K(ω) such that −K < D̲W(t, ω) ≤ D̄W(t, ω) < K, for some time t = t(ω).
Then there exists a constant δ = δ(ω, t, K) such that |W(s) − W(t)| ≤ K|s − t| for all s ∈ [t, t + δ]. Hence, there exists n_0 = n_0(δ, K, t) such that for n > n_0:

4 · 2^{−n} < δ, 8K < n, n > t.

Given such n, choose k so that (k − 1) 2^{−n} ≤ t < k 2^{−n}. It follows that |i · 2^{−n} − t| < δ for i = k, k + 1, k + 2, k + 3, and so

X_{n,k}(ω) ≤ 2K(4 · 2^{−n}) < n · 2^{−n}. (6.2)

Problem 6.4 Explain (6.2).

Since k − 1 ≤ t · 2^n < n · 2^n, it follows that Y_n(ω) ≤ X_{n,k}(ω) ≤ n · 2^{−n}. We have thus shown that for ω ∈ E there exists N_ω such that ω ∈ A_n = {ω : Y_n(ω) ≤ n · 2^{−n}} for n ≥ N_ω. So E ⊂ lim inf_n A_n. By virtue of (6.1),

P{A_n} ≤ n · 2^n (2 · 2^{n/2} · n 2^{−n})³.

Thus P{lim inf_n A_n} ≤ lim inf_{n→∞} P{A_n} = 0. By extending F with all sets contained in sets of probability 0, we obtain that P{E} = 0. This example shows again the necessity of such an extension procedure!

Markov property and strong Markov property Fix T ≥ 0. Put F_t = σ(W(s), s ≤ t) and F_0 = {∅, Ω}. Now W(T + t) − W(T), t ≥ 0, is independent of F_T. This is the Markov property of Brownian motion. Moreover, it is again a Brownian motion.

Problem 6.5 Prove these statements.

We may even allow T to be a stopping time. T is a stopping time if T is a non-negative r.v. on (Ω, F, P) such that {ω : T(ω) ≤ t} ∈ F_t. Define F_T to be the collection of all sets M ∈ F such that M ∩ {ω : T(ω) ≤ t} ∈ F_t for all t ≥ 0.

Problem 6.6 Deduce that {ω : T(ω) = t} ∈ F_t and that M ∈ F_T implies M ∩ {ω : T(ω) = t} ∈ F_t.

Now let T be a stopping time and put W*(t) = W(T + t) − W(T). Then the strong Markov property holds: W*(t), t ≥ 0, is independent of F_T (i.e. σ(W*(t), t ≥ 0) is independent of F_T). Moreover, W*(t) is a Brownian motion. This is true if for all x_1, . . . , x_k ∈ R, t_1 < · · · < t_k, k = 1, . . ., and all M ∈ F_T we have

P{(W*(t_1) ≤ x_1, . . . , W*(t_k) ≤ x_k) ∩ M} = P{W*(t_1) ≤ x_1, . . . , W*(t_k) ≤ x_k} · P{M} = P{W(t_1) ≤ x_1, . . . , W(t_k) ≤ x_k} · P{M}.
(6.3)

To prove this, first assume that T takes values in a countable set A with probability 1. Since

{ω : W*(t) ≤ x} = ∪_{s∈A} {ω : W(s + t, ω) − W(s, ω) ≤ x, T(ω) = s} ∈ F,

it follows that W*(t) is F-measurable. Moreover,

P{(W*(t_1) ≤ x_1, . . . , W*(t_k) ≤ x_k) ∩ M} = Σ_{t∈A} P{(W*(t_1) ≤ x_1, . . . , W*(t_k) ≤ x_k) ∩ M ∩ (T = t)}.

If M ∈ F_T, then M ∩ (T = t) ∈ F_t. Further, if T = t, then (W*(t_1), . . . , W*(t_k)) has the same distribution as (W(t_1 + t) − W(t), . . . , W(t_k + t) − W(t)). We obtain

P{(W*(t_1) ≤ x_1, . . . , W*(t_k) ≤ x_k) ∩ M}
= Σ_{t∈A} P{(W(t_1 + t) − W(t) ≤ x_1, . . . , W(t_k + t) − W(t) ≤ x_k) ∩ M ∩ (T = t)}
= Σ_{t∈A} P{W(t_1 + t) − W(t) ≤ x_1, . . . , W(t_k + t) − W(t) ≤ x_k} P{M ∩ (T = t)}
= P{W(t_1 + t) − W(t) ≤ x_1, . . . , W(t_k + t) − W(t) ≤ x_k} P{M}.

This proves that the first and last terms in (6.3) are equal. To prove equality of the second and last terms, simply take M = Ω. Consequently, the assertion has been proved for stopping times with a countable range.

Let T be an arbitrary stopping time. Define

τ_n = k · 2^{−n} if (k − 1) 2^{−n} < T ≤ k · 2^{−n}, k = 1, 2, . . ., and τ_n = 0 if T = 0.

If k · 2^{−n} ≤ t < (k + 1) · 2^{−n}, then {τ_n ≤ t} = {T ≤ k · 2^{−n}} ∈ F_{k 2^{−n}} ⊂ F_t. It follows that τ_n is a stopping time with a countable range. Suppose that M ∈ F_T and k · 2^{−n} ≤ t < (k + 1) · 2^{−n}. Then M ∩ {τ_n ≤ t} = M ∩ {T ≤ k 2^{−n}} ∈ F_{k 2^{−n}} ⊂ F_t. So F_T ⊂ F_{τ_n}. Let W^{(n)}(t, ω) = W(τ_n(ω) + t, ω) − W(τ_n(ω), ω) be the displacement process after the stopping time τ_n. Since M ∈ F_T implies M ∈ F_{τ_n}, we have by virtue of (6.3)

P{(W^{(n)}(t_1) ≤ x_1, . . . , W^{(n)}(t_k) ≤ x_k) ∩ M} = P{W^{(n)}(t_1) ≤ x_1, . . . , W^{(n)}(t_k) ≤ x_k} P{M}. (6.4)

However, τ_n(ω) → T(ω) for all ω and, by a.s. continuity of the sample paths, W^{(n)}(t, ω) → W*(t, ω) for a.a. ω. To finish the proof, we have to invoke Lemma 6.2.

Problem 6.7 Finish the proof by suitably applying this Lemma.
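The dyadic construction of Brownian motion given earlier in this section is easy to carry out numerically. The following is a minimal sketch (the function name and parameters are ours, not from BSP): starting from W_0(0) = 0 and W_0(1) ~ N(0, 1), each pass refines the path at the dyadic midpoints, adding an independent N(0, 2^{−(l+2)}) perturbation to the linear interpolation at level l, exactly as in the construction above.

```python
import random
import math

def brownian_levels(levels=10, seed=3):
    """Dyadic (midpoint) construction of Brownian motion on [0, 1].

    At each level l the value at a new dyadic midpoint is the linear
    interpolation of its neighbours plus an independent N(0, 2^-(l+2))
    perturbation; the values at the coarser dyadic points are kept.
    Returns W at the points j * 2^-levels, j = 0, ..., 2^levels.
    """
    rng = random.Random(seed)
    w = [0.0, rng.gauss(0.0, 1.0)]          # W(0) = 0, W(1) ~ N(0, 1)
    for l in range(levels):
        sd = math.sqrt(2.0 ** -(l + 2))     # std of the midpoint perturbation
        refined = []
        for j in range(len(w) - 1):
            mid = 0.5 * (w[j] + w[j + 1])   # linear interpolation at midpoint
            refined += [w[j], mid + rng.gauss(0.0, sd)]
        refined.append(w[-1])
        w = refined
    return w
```

At the finest level the increments over intervals of length 2^{-levels} are i.i.d. N(0, 2^{-levels}), which is a convenient sanity check on the implementation.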
Curious properties Let T_a be the first time that the Brownian motion process hits the set [a, ∞).

Problem 6.8 i) By conditioning on the event {T_a ≤ t}, show that 2P{W(t) ≥ a} = P{T_a ≤ t}. ii) Use this to show that

P{T_a ≤ t} = (2/√(2π)) ∫_{a/√t}^∞ exp{−y²/2} dy.

Compute the corresponding density f_{T_a}(t). Derive that T_a < ∞ a.s., but E(T_a) = ∞. Compute P{max_{0≤s≤t} W(s) ≥ a}.

How often does Brownian motion hit 0 in a finite time interval?

Problem 6.9 Let ρ(s, t) be the probability that a Brownian motion path has at least one zero in (s, t). i) Deduce that ρ(s, t) = 1 − (2/π) arcsin √(s/t). ii) Use (i) to show that the position of the last zero before time 1 is distributed over (0, 1) with density π^{−1} (t(1 − t))^{−1/2}. iii) For each ω let Z(ω) = {t : W(t, ω) = 0} be the set of zeroes of W(·, ω). Show that λ(Z(ω)) = 0 for a.a. ω; in words, the Lebesgue measure of the set of zeroes of W(·, ω) is 0 a.s.

We give some applications of the use of stopping times. The first one is the curious phenomenon that one can embed any given distribution law in a Brownian motion. One version of this statement is the so-called Skorokhod embedding - which is a minimal construction in the sense that the stopping time involved has finite expectation. Without this minimality condition, it is an almost trivial statement, as pointed out by Doob.

Problem 6.10 Let F be a distribution function. Determine a function h such that P{h(W(1)) ≤ x} = F(x). Show that τ = min{t > 1 | W(t) = h(W(1))} is an a.s. finite stopping time. Show that W(τ) has distribution function F and that E(τ) = ∞.

The stochastic process X(t) = µ · t + W(t), t ≥ 0, is called a Brownian motion with drift µ. Recall that we can associate 3 martingales with Brownian motion: W(t), t ≥ 0; W²(t) − t, t ≥ 0; and exp{cW(t) − c²t/2}, t ≥ 0. With the Brownian motion with drift one can also associate martingales.

Problem 6.11 Show that X(t) − µt, t ≥ 0, and exp{−2µX(t)}, t ≥ 0, are martingales.
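The reflection identity of Problem 6.8 can be checked numerically: simulate discretised Brownian paths and compare the empirical hitting probability P{T_a ≤ t} with the value 2P{W(t) ≥ a}. The sketch below is ours (function names and parameters are our choice); note that monitoring the path only at grid points slightly underestimates the running maximum, so a modest tolerance is appropriate.

```python
import random
import math

def hit_prob_mc(a=1.0, t=1.0, n_steps=500, trials=20_000, seed=4):
    """Monte Carlo estimate of P{T_a <= t} = P{max_{s<=t} W(s) >= a},
    using a Gaussian random-walk discretisation of the Brownian path."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n_steps)   # std of one increment
    hits = 0
    for _ in range(trials):
        w = 0.0
        for _ in range(n_steps):
            w += rng.gauss(0.0, sd)
            if w >= a:            # barrier reached before time t
                hits += 1
                break
    return hits / trials

def reflection_value(a=1.0, t=1.0):
    """The reflection-principle value 2 P{W(t) >= a} = 2(1 - Phi(a/sqrt(t)))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(a / math.sqrt(2.0 * t))))
```

For a = t = 1 the exact value is 2(1 − Φ(1)) ≈ 0.317; the simulation estimate sits a little below it because of the discrete monitoring.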
Let a < 0 < b and suppose that X(0) = x ∈ (a, b). We are interested in the probability p_x that a is hit before b.

Problem 6.12 i) Let T = min{t | X(t) ∈ {a, b}}. Show that T < ∞ with probability 1. ii) Use the continuous time version (not formulated but evident) of the optional stopping theorem to compute p_x. iii) Show that E(T) < ∞. Compute E(T) through a suitable martingale.

7 Diffusions and Ito processes

Let us first give a proof of Theorem 7.4 from BSP.

Theorem 7.1 Let f be a stochastic process belonging to M²_t and let I(t) = ∫_0^t f(s, ω) dW(s, ω). Then there exists a stochastic process ζ(s), s ≤ t, such that ζ(·, ω) is continuous for a.a. ω and P{I(s) = ζ(s)} = 1 for all s ∈ (0, t].

Proof. Let f_n → f be a sequence of approximating random step functions. Clearly, {I_s(f_n)}_{0≤s≤t} is a.s. continuous. Since {I_s(f_n)}_{0≤s≤t} is a martingale, {I_s(f_n) − I_s(f_m)}_{0≤s≤t} is a martingale as well. Hence {(I_s(f_n) − I_s(f_m))²}_{0≤s≤t} is a submartingale. We may apply Doob's maximal inequality, yielding

P{sup_{0≤s≤t} |I_s(f_n) − I_s(f_m)| > ε} = P{sup_{0≤s≤t} |I_s(f_n) − I_s(f_m)|² > ε²} ≤ (1/ε²) E((I_t(f_n) − I_t(f_m))²) = (1/ε²) ||I_t(f_n) − I_t(f_m)||²_{L²} = (1/ε²) ||f_n − f_m||²_{M²_t} → 0, n, m → ∞.

It follows that there exists a subsequence {n_k}_k such that

P{sup_{0≤s≤t} |I_s(f_{n_k}) − I_s(f_{n_{k+1}})| > 2^{−k}} < 2^{−k}.

We may apply the first Borel-Cantelli Lemma to obtain that for almost all ω there exists an index k(ω) such that

sup_{0≤s≤t} |I_s(f_{n_k})(ω) − I_s(f_{n_{k+1}})(ω)| ≤ 2^{−k}, k ≥ k(ω).

Hence the sequence I_s(f_{n_k})(ω) converges uniformly on (0, t] for a.a. ω, and the limit J_s(ω) = lim_{k→∞} I_s(f_{n_k})(ω) is a continuous function on (0, t] for a.a. ω. Now I_s(f_{n_k}) → I_s in L², k → ∞, for s ∈ (0, t]. Hence there is a subsequence converging to I_s for a.a. ω. It follows that P{I_s = J_s} = 1 for s ∈ (0, t]. QED

Problem 7.1 Suppose that X, X_n, n = 1, . . ., are r.v.s in L²(Ω, F, P). Assume that X_n → X in L². i) Show that lim_{n→∞} P{|X_n − X| > ε} = 0, for each ε > 0.
ii) Use this to show that there is a subsequence {n_k}_k along which there is a.s. convergence, i.e. X_{n_k} → X for a.a. ω. The proof of the above theorem gives some indications of how to prove this.

Problem 7.2 Brownian bridge Let a, b ∈ R be given. Consider the following 1-dimensional equation:

dY(t) = ((b − Y(t))/(1 − t)) dt + dW(t), 0 ≤ t < 1, Y(0) = a.

Verify that

Y(t) = a(1 − t) + bt + (1 − t) ∫_0^t dW(s)/(1 − s), 0 ≤ t < 1,

solves the equation and prove that lim_{t→1} Y(t) = b a.s.

So far we have studied how to construct Ito processes. However, given any stochastic differential equation, there has been no clue so far as to how to judge whether there exists a solution and, if it exists, whether it is unique (with prob. 1). BSP treats the case of a so-called Ito diffusion. Let us give the definition for the n-dimensional case. A time-homogeneous Ito diffusion is a stochastic vector process X(t, ω) = (X_1(t, ω), . . . , X_n(t, ω)) on (Ω, F, P) that satisfies a stochastic differential equation of the form

dX(t) = b(X(t)) dt + σ(X(t)) dW(t), t ≥ s, X(s) = x,

where W(t) = (W_1(t), . . . , W_d(t)) is a d-dimensional Brownian motion, b : R^n → R^n and σ : R^n → R^{n×d}. We assume that b and σ satisfy a Lipschitz condition: there exists a constant C such that

||b(x) − b(y)|| + ||σ(x) − σ(y)|| ≤ C ||x − y||. (7.1)

Since we have not spoken of the multi-dimensional case, let us shortly spend a few words on it (we need it in the later examples). A d-dimensional Brownian motion is simply the vector process associated with d independent one-dimensional Brownian motions defined on the same space. The SDE then simply stands for

dX_i(t) = b_i(X(t)) dt + σ_{i1}(X(t)) dW_1(t) + . . . + σ_{id}(X(t)) dW_d(t), i = 1, . . . , n.

We can now set F_t = σ(W_i(s), 0 < s ≤ t, i = 1, . . . , d). The analog of BSP Theorem 7.7 gives that under the above Lipschitz condition there is an a.s. unique solution of the initial value problem with a.s.
continuous paths

dX(t) = b(X(t)) dt + σ(X(t)) dW(t), 0 ≤ t ≤ T, X(0) = X_0,

provided E||X_0||² < ∞ and X_0 is independent of σ(F_t, t > 0). This solution is adapted to the filtration σ(F_t, σ(X_0)) and has ∫_0^T X_i²(t) dt < ∞ for i = 1, . . . , n. Lipschitz conditions are also commonly used in (deterministic) differential equations for guaranteeing existence and uniqueness properties.

Next we will list a number of properties of Ito diffusions. In fact, these properties are inherent to so-called diffusion processes, given technical conditions. One does not need the notion of SDE's and Ito integrals for arriving at these properties. However, it appears from the literature that SDE's are an efficient formalism for deriving existence and uniqueness results for diffusion processes with certain given properties. In fact, it constructs a diffusion process with given properties from Brownian motion paths. Some authors claim this as the key of Ito's contribution to the field of diffusion processes. The properties described later on rely on Ito's formalism. In our case it is better not to depart to the field of diffusion processes, even more so because there are many conflicting definitions of this notion. The best advice for a rigorous treatment are the books by Rogers and Williams (the latter being the author of the martingale book).

From now on we will only consider Ito diffusions. One can prove that these are strong Markov processes. The infinitesimal generator A of the process X(t) is defined by

Af(x) := lim_{t↓0} (E(f(X(t)) | X(0) = x) − f(x)) / t, x ∈ R^n.

If for a given function f this limit exists for all x, then we say that f belongs to D_A, the domain of the generator. Let C²_0(R^n) be the set of twice continuously differentiable functions on R^n with compact support. Then one can prove that Af(x) exists for all x ∈ R^n and

Af(x) = Σ_i b_i(x) ∂f/∂x_i (x) + (1/2) Σ_{i,j} (σ(x)σ^T(x))_{ij} ∂²f/(∂x_i ∂x_j) (x),

whenever f ∈ C²_0(R^n). Note that A is a linear operator on C²_0(R^n).
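The defining limit of the generator can be checked by Monte Carlo in the simplest case, one-dimensional Brownian motion, where Af = (1/2)f''. The following sketch (the function name and parameters are ours) estimates the difference quotient (E f(x + W(t)) − f(x))/t for a small t and can be compared with (1/2)f''(x).

```python
import random
import math

def generator_estimate(f, x, t=0.02, trials=400_000, seed=5):
    """Monte Carlo estimate of the generator of Brownian motion applied
    to f at x, i.e. of (E f(x + W(t)) - f(x)) / t for small t, which
    should be close to (1/2) f''(x)."""
    rng = random.Random(seed)
    sd = math.sqrt(t)  # W(t) ~ N(0, t)
    mean_f = sum(f(x + rng.gauss(0.0, sd)) for _ in range(trials)) / trials
    return (mean_f - f(x)) / t
```

For f = sin one has (1/2)f''(x) = −sin(x)/2, and in fact E sin(x + W(t)) = sin(x) e^{−t/2}, so the bias of the difference quotient for t = 0.02 is of order t and well below the Monte Carlo noise.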
It is obvious that Brownian motion is a time-homogeneous one-dimensional Ito diffusion with infinitesimal parameters b(x) = 0 and σ(x) = (1). The infinitesimal operator A associated with it is given by

Af(x) = (1/2) ∂²f/∂x² (x) =: (1/2) ∆f(x).

(∆ stands for the Laplace operator: if f : R^n → R is twice differentiable, then ∆f = Σ_{i=1}^n (∂²/∂x_i²) f.)

One can model the graph of Brownian motion by a two-dimensional diffusion as follows: X(t) = (X_1(t), X_2(t)) with X_1(t) = t and X_2(t) = W(t).

Problem 7.3 Compute the corresponding infinitesimal generator.

The Ornstein-Uhlenbeck process is the Ito diffusion defined by

dX(t) = −αX(t) dt + σ dW(t),

with α and σ constants.

Problem 7.4 Give the infinitesimal generator.

The infinitesimal generator contains the information on the marginal distributions of an Ito diffusion.

Lemma 7.2 (Dynkin's Lemma) Let f ∈ C²_0(R^n). Suppose that τ is a stopping time with E(τ | X(0) = x) < ∞. Then

E(f(X(τ)) | X(0) = x) = f(x) + E( ∫_0^τ Af(X(s)) ds | X(0) = x ).

The proof of this lemma follows rather straightforwardly from Ito's formula.

Problem 7.5 Search the literature for a proof of Dynkin's lemma based on Ito's formula. Write it down and, if necessary, supply lacking details.

Problem 7.6 Consider the n-dimensional Brownian motion W(t) = (W_1(t), . . . , W_n(t)), t ≥ 0. Suppose Brownian motion starts at a point x ∈ R^n. Let R > 0 be given. As the norm on R^n we consider the L²-norm: ||x|| = √(Σ_i x_i²).

i) Compute the infinitesimal generator of n-dimensional Brownian motion.

Let ||x|| < R and let τ denote the first exit time of the ball B^n = {y | ||y|| < R}. By a.s. continuity of Brownian motion paths, τ = inf{t > 0 | W(t) ∉ B^n} is equal in distribution to inf{t > 0 | ||W(t)|| = R}.

ii) Show that P{τ < ∞ | X(0) = x} = 1. Define a suitable martingale to compute E(τ) by virtue of the optional stopping theorem. Argue that the theorem is applicable and compute the expected exit time. Hint: problem 6.12 may be helpful here.
Let now ||x|| > R and let τ be the first entrance time of B^n. The question is whether τ < ∞ a.s. and, if yes, what its expectation is. The case n = 1 has been solved already (where?), so we assume n ≥ 2. I do not know how to get the optional stopping theorem to work for answering the above questions - if you can, please do. Therefore it seems better to apply Dynkin's lemma for suitable functions f. What type of function f would be suitable? The complement of the closure of the R-ball is unbounded, and so we start in unbounded territory, while we need to use functions that have compact support. It makes sense to consider the annulus A_k = {y | R < ||y|| < 2^k R}. Choose k large enough so that x ∈ A_k. Denote τ_k = inf{t > 0 | W(t) ∉ A_k}. A suitable function f = f_{n,k} should depend on y only through the norm. Further, the integral ∫_0^{τ_k} Af_{n,k}(X(s)) ds should be easy to calculate. The best would be if this expression disappears altogether on the annulus, i.e. Af_{n,k} = 0 on A_k. In other words, ∆f_{n,k} = 0 on the annulus, i.e. f_{n,k} is harmonic (on the annulus). Choose f = f_{n,k} a function in C²_0(R^n), with f_{n,k}(y) = log ||y|| on A_k if n = 2 and f_{n,k}(y) = ||y||^{2−n} on A_k if n > 2.

iii) Show that τ_k satisfies the conditions of Dynkin's lemma. Show that f_{n,k} is harmonic on the closure of A_k. Compute E(f(X(τ_k)) | X(0) = x). Derive that

P{τ < ∞ | X(0) = x} = 1 if n = 2, and (||x||/R)^{2−n} if n > 2.

In case n = 2, show that E(τ | X(0) = x) = ∞. The implication is that Brownian motion in 2 dimensions is null-recurrent and in n ≥ 3 dimensions it is transient.

Now, if we choose the stopping time τ deterministic, i.e. τ ≡ t, then we see that u(t, x) = E(f(X(t)) | X(0) = x) is differentiable w.r.t. t and

∂u/∂t = E(Af(X(t)) | X(0) = x).

It turns out that we can express the right-hand side of the above also in terms of u. This gives rise to Kolmogorov's backward equation.

Theorem 7.3 (Kolmogorov's backward equation) Let f ∈ C²_0(R^n). i) Define u(t, x) = E(f(X(t)) | X(0) = x).
Then u(t, ·) ∈ D_A for each t and

∂u/∂t = Au, t > 0, x ∈ R^n, (7.2)
u(0, x) = f(x), x ∈ R^n. (7.3)

Interpret the right-hand side of (7.2) as A applied to u as a function of x.

ii) Suppose that w(t, x) is a bounded function solving (7.2) and (7.3), which is continuously differentiable in t and twice continuously differentiable in x. Then w(t, x) = u(t, x).

In particular, we have the explicit partial differential equation

∂u/∂t = Σ_i b_i ∂u/∂x_i + (1/2) Σ_{i,j} (σσ^T)_{ij} ∂²u/(∂x_i ∂x_j).

This theorem gives a probabilistic solution to the initial value problem (7.2), (7.3). Now suppose that the Ito diffusion X(t) has a density p(t, x, y) = (∂/∂y) P{X(t) ≤ y | X(0) = x} that is once continuously differentiable in t and twice continuously differentiable in x. Then it makes sense that this density itself solves (7.2).

Problem 7.7 Sketch a way to prove this from Theorem 7.3.

Heat equation Let us now fix X(t) = x + W(t), with x given. Then Kolmogorov's backward equation (7.2) reduces to

∂u(x, t)/∂t = (1/2) ∂²u(x, t)/∂x²,

which is the heat equation in one dimension. If X(t) is n-dimensional Brownian motion, then we get

∂u/∂t = (1/2) ∆u.

We can interpret this equation physically in different ways. It may model the time development of the temperature u by heat conduction. On the other hand, microscopic particles suspended in a fluid or gas perform a very irregular motion, caused by collisions with molecules in thermal motion. One can then interpret u as the particle density, evolving in time. From a microscopic point of view, individual particles perform a Brownian motion, which is a stochastic process; the process is an Ito diffusion. From a macroscopic point of view, the particle density evolves in time according to the heat equation. The relation between the two is that the density of Brownian motion is a solution to the heat equation.

Problem 7.8 Check the validity of this statement.

Now we will check the validity of Theorem 7.3 for this simple model.
Consider the initial value problem
\[
\frac{\partial u}{\partial t} = \frac12 \Delta u \quad \text{in } \mathbb{R}^n \times \mathbb{R}^+,
\]
with $u$ continuous in $\mathbb{R}^n \times \mathbb{R}_0^+$ and $u(x_1, \dots, x_n, 0) = \Phi(x_1, \dots, x_n)$, for a given bounded and continuous function $\Phi : \mathbb{R}^n \to \mathbb{R}$. By Theorem 7.3 this initial value problem has the following solution:
\[
u(x,t) = E(\Phi(x + W(t))) = \int_{\mathbb{R}^n} \frac{\Phi(x+y)}{(2\pi t)^{n/2}} \exp\{-\|y\|^2/2t\}\,dy = \int_{\mathbb{R}^n} \Phi(y)\varphi_n(x-y,t)\,dy,
\]
provided that $\Phi$ satisfies the condition of that theorem. For continuous functions $\Phi$ the statement can be checked directly.

Problem 7.9 Do this by carrying out the following steps.

i) Argue that $(x,t) \to \Phi(x + W(t))$ is a continuous and bounded function for a.a. $\omega \in \Omega$.

ii) Use (i) and a suitable convergence theorem to conclude that $(x,t) \to E(\Phi(x + W(t)))$ is continuous.

iii) Show that $E(\Phi(x + W(0)))$ satisfies the initial conditions.

iv) Finally, show that $E(\Phi(x + W(t)))$ solves the heat equation. To this end we need to interchange differentiation and integration, so you have to justify that operation.

We derive another property of the solution $u$ to the initial value problem.

Theorem 7.4 Let $s > 0$ be fixed. Then $u(x + W(t), s - t)$ is a martingale w.r.t. the filtration $(\mathcal{F}_t)_{0 \le t \le s}$, $\mathcal{F}_t = \sigma(W(t'), 0 \le t' \le t)$.

Problem 7.10 Prove the theorem. First argue that the following relation holds true:
\[
u(y, s-t) = E(\Phi(y + W(s-t))) = E\big(\Phi(y + W(s) - W(t)) \mid \mathcal{F}_t\big).
\]

Boundary value problems

Let $X(t)$ be an Ito diffusion in one dimension:
\[
dX(t) = b(X(t))dt + \sigma(X(t))dW(t),
\]
where the functions $b$ and $\sigma$ satisfy the Lipschitz condition (7.1). Let $(a,b)$ be a given interval, and let $X(0) = x \in (a,b)$. Put $\tau = \inf\{t > 0 \mid X(t) \notin (a,b)\}$ and define $p = P\{X(\tau) = b \mid X(0) = x\}$. Suppose that we can find a solution $f \in C^2(\mathbb{R})$ such that
\[
Af = b(x)f'(x) + \tfrac12 \sigma^2(x) f''(x) = 0, \quad x \in \mathbb{R}.
\]

Problem 7.11

i) Prove that
\[
p = \frac{f(x) - f(a)}{f(b) - f(a)},
\]
provided $\tau < \infty$ a.s.

ii) Now specialise to the case $X(t) = x + W(t)$, $t \ge 0$. Prove that
\[
p = \frac{x-a}{b-a}.
\]

iii) Determine $p$ if $X(t) = x + bt + \sigma W(t)$.
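Part (ii) of Problem 7.11 lends itself to a quick Monte Carlo check. The sketch below (plain Python; step size, sample size and interval are arbitrary choices, and the Euler discretisation introduces a small bias at the boundary) estimates $P\{X(\tau) = b \mid X(0) = x\}$ for $X(t) = x + W(t)$ and compares it with $(x-a)/(b-a)$:

```python
import math, random

random.seed(1)

def exits_right(x, a=0.0, b=1.0, dt=1e-3):
    """Simulate a discretised Brownian path from x until it leaves (a, b);
    return True if it leaves at the right end point b."""
    s = math.sqrt(dt)  # standard deviation of one Gaussian increment
    while a < x < b:
        x += s * random.gauss(0.0, 1.0)
    return x >= b

n = 5000
estimate = sum(exits_right(0.3) for _ in range(n)) / n
print(estimate)  # close to (x - a)/(b - a) = 0.3
```

The agreement reflects the fact that the discretised path is itself a martingale, so the optional-stopping argument behind Problem 7.11(i) applies to it almost verbatim.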
Now we are interested in the following boundary value problem: find $u \in C^2(\mathbb{R})$ such that
\[
\begin{cases} u''(x) = 0, & x \in (a,b),\\ u(a) = \varphi(a),\\ u(b) = \varphi(b). \end{cases}
\]

Problem 7.12 Determine a solution to this problem analytically.

We can derive a solution by a stochastic approach.

Problem 7.13 Let $X(t) = x + W(t)$. Show that $u(x) := E(\varphi(X(\tau)) \mid X(0) = x)$ solves the boundary value problem.

Solving the PDE in this way is typical. However, in general one needs a detailed study of suitable properties of the function $E(\varphi(X(\tau)) \mid X(0) = x)$, because in most cases one cannot calculate it explicitly. That involves many technicalities.

Problem 7.14 Solve the following boundary value problem by a stochastic approach: find $u \in C^2(\mathbb{R})$ such that
\[
\begin{cases} bu'(x) + \tfrac12 \sigma^2 u''(x) = 0, & x \in (a,b),\\ u(a) = \varphi(a),\\ u(b) = \varphi(b). \end{cases}
\]

In the above solutions, time did not play a role. We will next consider the simplest version of a boundary value problem involving the heat equation.

Back to the heat equation

Let $D$ denote the infinite strip $D = \{(t,x) \in \mathbb{R}^2 : x < R\}$. Let $\varphi$ be a bounded continuous function on $\partial D = \{(t,R) \mid t \in \mathbb{R}\}$. We consider the following boundary value problem: find $u \in C^{1,2}(\mathbb{R} \times (-\infty, R))$ such that
\[
\frac{\partial u}{\partial t} + \frac12 \frac{\partial^2 u}{\partial x^2} = 0, \quad (t,x) \in D,
\]
\[
\lim_{(s,x)\to(t,R),\ (s,x)\in D} u(s,x) = \varphi(t), \quad (t,R) \in \partial D.
\]
A physical interpretation of this problem is the following: we consider an infinitely long vertical bar with upper end point at $R$. We fix the temperature $\varphi$ at the upper point of the bar as a function of time. Now we are interested in how temperature 'spreads' over the whole bar while time is running.

Problem 7.15

i) Define (hint: look at earlier exercises) a 2-dimensional Ito diffusion $X(t)$ with generator
\[
A = \frac{\partial}{\partial t} + \frac12 \frac{\partial^2}{\partial x^2}.
\]

ii) Let $\tau_{t,x} = \inf\{s > t \mid X(s) \notin D\}$, given that $X(0) = (t,x)$. Show that $u(t,x) = E(\varphi(X(\tau_{t,x})) \mid X(0) = (t,x))$ is the solution of the boundary value problem. Hint: the distribution of $\tau_{t,x} - t$ is the distribution of the hitting time of $R$ for Brownian motion with initial state $W(0) = x$.
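For the constant-coefficient diffusion of Problem 7.14, a function annihilated by the generator $A = b\,d/dx + \frac12\sigma^2\,d^2/dx^2$ is $f(x) = e^{-2bx/\sigma^2}$, so Problem 7.11(i) predicts the exit probability $(f(x)-f(a))/(f(b)-f(a))$. The Monte Carlo sketch below (plain Python; parameter values, step size and sample size are arbitrary choices, and the drift is written `mu` to avoid clashing with the end point $b$) compares a simulation with this prediction:

```python
import math, random

random.seed(2)

mu, sigma, a, b, x0 = 1.0, 1.0, 0.0, 1.0, 0.5

# f solves mu*f' + (sigma^2/2)*f'' = 0
f = lambda x: math.exp(-2.0 * mu * x / sigma**2)
predicted = (f(x0) - f(a)) / (f(b) - f(a))

def exits_right(x, dt=1e-3):
    """Euler scheme for dX = mu dt + sigma dW, run until X leaves (a, b)."""
    s = sigma * math.sqrt(dt)
    while a < x < b:
        x += mu * dt + s * random.gauss(0.0, 1.0)
    return x >= b

n = 5000
estimate = sum(exits_right(x0) for _ in range(n)) / n
print(predicted, estimate)  # the two values should be close
```

Note how the positive drift pushes the exit probability at $b$ well above the driftless value $(x_0-a)/(b-a) = 0.5$.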
Option pricing

We will indicate how to arrive at the simplest form of the Black-Scholes formula for European options. There is an extensive mathematical formalism to define all notions that we use below in a precise manner, but this goes far beyond the scope of this course.

The basis is the following Ito diffusion:
\[
dS(t) = \mu S(t)dt + \sigma S(t)dW(t),
\]
where $S(t)$ is the value of one unit of stock; $\mu$ and $\sigma \ne 0$ are assumed constant. This is a geometric Brownian motion (see BSP) and it has the solution
\[
S(t) = s_0 \exp\{(\mu - \sigma^2/2)t + \sigma W(t)\},
\]
where $S(0) = s_0$ is given. Of course, dealing in stock is a risky investment because of the diffusion term $\sigma S\,dW(t)$. If we assume the interest rate of a bank investment to equal a constant $\rho$, then a bank investment is a safe investment.

A European option is the right to buy one unit of stock at the expiration time $T$ for a price $K$. At the expiration time $T$ you will exercise your option when $K < S(T)$; you will not exercise it when $K > S(T)$. This means that at time $T$ the value of your 'warrant' (the right to buy the stock) is $\max(S(T) - K, 0)$. The question is how to calculate the price of the warrant at time $t < T$. If one assumes a stable market, that is, one in which on average one cannot gain or lose, then price and value of a warrant must be equal. Write $F(S(t), T-t)$ for the price of a unit warrant at time $t < T$. Then $F(S, 0) = \max(S - K, 0)$. The aim is to formulate an initial value problem for $F(S, T-t)$, $0 \le t \le T$.

Problem 7.16 Derive an SDE for $dF(S, T-t)$.

Suppose we have the following investment policy: at time $t$ our portfolio consists of 1 unit of warrant with value $F(S(t), T-t)$ and $\alpha(t)$ units of stock, chosen so as to eliminate risk. Here $\alpha(t)$ is assumed $\mathcal{F}_t = \sigma(W(s), s \le t)$ measurable. As a consequence, the value of our portfolio at time $t$ is $V(t) = F(S(t), T-t) + \alpha(t)S(t)$ and we get $dV = dF + \alpha\,dS$.

Problem 7.17 Derive an SDE for $V$. Determine $\alpha$ such that the $dW$ term (diffusion term) disappears. Conclude that
\[
dV(t) = \Big(\frac12 \sigma^2 S^2 \frac{\partial^2 F}{\partial S^2} - \frac{\partial F}{\partial t}\Big)dt.
\]
On the other hand, in a stable market the average value of a portfolio grows like the value of a safe (bank) investment:
\[
dV = \rho V\,dt.
\]

Problem 7.18

i) Show that combining the above gives rise to the following initial value problem for $F$:
\[
\frac{\partial F}{\partial t} = \rho S \frac{\partial F}{\partial S} + \frac12 \sigma^2 S^2 \frac{\partial^2 F}{\partial S^2} - \rho F,
\]
\[
F(S, 0) = \max(S - K, 0).
\]

ii) Suppose $\rho = 0$. Of what Ito diffusion would the first equation be the Kolmogorov backward equation (7.2)?

To solve this problem, we need to invoke the Feynman-Kac formula.

Theorem 7.5 Let $f \in C_0^2(\mathbb{R}^n)$ and $q \in C(\mathbb{R}^n)$. Assume that $q$ is bounded from below.

i) Put
\[
v(t,x) = E\Big(\exp\Big\{-\int_0^t q(X(s))\,ds\Big\} f(X(t)) \,\Big|\, X(0) = x\Big).
\]
Then
\[
\begin{cases} \dfrac{\partial v}{\partial t} = Av - qv, & t > 0,\ x \in \mathbb{R}^n,\\ v(0,x) = f(x), & x \in \mathbb{R}^n. \end{cases}
\]

ii) If $w(t,x) \in C^{1,2}(\mathbb{R} \times \mathbb{R}^n)$ is bounded on $K \times \mathbb{R}^n$ for each compact subset $K \subset \mathbb{R}$ and $w$ is a solution of the above PDE, then $w(t,x) = v(t,x)$.

Problem 7.19 Show that the value of the option at time $t = 0$ equals
\[
s_0 \Phi(u) - e^{-\rho T} K \Phi(u - \sigma\sqrt{T}),
\]
where
\[
\Phi(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^u e^{-x^2/2}\,dx
\]
is the distribution function of a standard normal r.v. and
\[
u = \frac{\ln(s_0/K) + (\rho + \sigma^2/2)T}{\sigma\sqrt{T}}.
\]
This is the classical Black-Scholes formula.
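The formula of Problem 7.19 can be checked numerically against a direct Monte Carlo valuation: under the pricing measure (risk-neutral valuation, a standard fact not derived in these notes) the stock grows at rate $\rho$, so the option value is $E(e^{-\rho T}\max(S(T)-K,0))$ with $S(T) = s_0\exp\{(\rho - \sigma^2/2)T + \sigma W(T)\}$. The sketch below is plain Python; the parameter values and sample size are arbitrary choices:

```python
import math, random

random.seed(3)

s0, K, sigma, rho, T = 100.0, 100.0, 0.2, 0.05, 1.0

Phi = lambda u: 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))  # standard normal cdf
u = (math.log(s0 / K) + (rho + sigma**2 / 2.0) * T) / (sigma * math.sqrt(T))
bs_price = s0 * Phi(u) - math.exp(-rho * T) * K * Phi(u - sigma * math.sqrt(T))

# Monte Carlo: sample S(T) directly from the explicit geometric-BM solution
n = 200_000
total = 0.0
for _ in range(n):
    w = math.sqrt(T) * random.gauss(0.0, 1.0)  # W(T) ~ N(0, T)
    sT = s0 * math.exp((rho - sigma**2 / 2.0) * T + sigma * w)
    total += math.exp(-rho * T) * max(sT - K, 0.0)
mc_price = total / n

print(bs_price, mc_price)  # both near 10.45
```

Because the solution of the geometric Brownian motion SDE is explicit, no time discretisation is needed here: one Gaussian draw per path suffices, and the Monte Carlo estimate agrees with the closed-form price up to sampling error.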