Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
ELEMENTS OF PROBABILITY THEORY Elements of Probability Theory • A collection of subsets of a set Ω is called a σ–algebra if it contains Ω and is closed under the operations of taking complements and countable unions of its elements. • A sub-σ–algebra is a collection of subsets of a σ–algebra which satisfies the axioms of a σ–algebra. • A measurable space is a pair (Ω, F) where Ω is a set and F is a σ–algebra of subsets of Ω. • Let (Ω, F) and (E, G) be two measurable spaces. A function X : Ω 7→ E such that the event {ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A} belongs to F for arbitrary A ∈ G is called a measurable function or random variable. Elements of Probability Theory • Let (Ω, F) be a measurable space. A function µ : F 7→ [0, 1] is called a probability measure if µ(∅) = 1, µ(Ω) = 1 and P∞ ∞ µ(∪k=1 Ak ) = k=1 µ(Ak ) for all sequences of pairwise disjoint sets {Ak }∞ k=1 ∈ F. • The triplet (Ω, F, µ) is called a probability space. • Let X be a random variable (measurable function) from (Ω, F, µ) to (E, G). If E is a metric space then we may define expectation with respect to the measure µ by Z E[X] = X(ω) dµ(ω). Ω • More generally, let f : E 7→ R be G–measurable. Then, Z f (X(ω)) dµ(ω). E[f (X)] = Ω Elements of Probability Theory • Let U be a topological space. We will use the notation B(U ) to denote the Borel σ–algebra of U : the smallest σ–algebra containing all open sets of U . Every random variable from a probability space (Ω, F, µ) to a measurable space (E, B(E)) induces a probability measure on E: µX (B) = PX −1 (B) = µ(ω ∈ Ω; X(ω) ∈ B), B ∈ B(E). The measure µX is called the distribution (or sometimes the law) of X. Example 1 Let I denote a subset of the positive integers. A vector ρ0 = {ρ0,i , i ∈ I} is a distribution on I if it has nonnegative P entries and its total mass equals 1: i∈I ρ0,i = 1. Elements of Probability Theory • We can use the distribution of a random variable to compute expectations and probabilities: Z E[f (X)] = f (x) dµX (x) S and Z P[X ∈ G] = dµX (x), G ∈ B(E). G • When E = Rd and we can write dµX (x) = ρ(x) dx, then we refer to ρ(x) as the probability density function (pdf), or density with respect to Lebesque measure for X. • When E = Rd then by Lp (Ω; Rd ), or sometimes Lp (Ω; µ) or even simply Lp (µ), we mean the Banach space of measurable functions on Ω with norm ³ ´1/p kXkLp = E|X|p . Elements of Probability Theory Example 2 i) Consider the random variable X : Ω 7→ R with pdf ¶ µ 2 1 (x − m) γσ,m (x) := (2πσ)− 2 exp − . 2σ Such an X is termed a Gaussian or normal random variable. The mean is Z EX = xγσ,m (x) dx = m R and the variance is Z E(X − m)2 = (x − m)2 γσ,m (x) dx = σ. R Since the mean and variance specify completely a Gaussian random variable on R, the Gaussian is commonly denoted by N (m, σ). The standard normal random variable is N (0, 1). Elements of Probability Theory ii) Let m ∈ Rd and Σ ∈ Rd×d be symmetric and positive definite. The random variable X : Ω 7→ Rd with pdf µ ¶ 1 ¡ ¢ 1 − γΣ,m (x) := (2π)d detΣ 2 exp − hΣ−1 (x − m), (x − m)i 2 is termed a multivariate Gaussian or normal random variable. The mean is E(X) = m and the covariance matrix is ³ ´ E (X − m) ⊗ (X − m) = Σ. Since the mean and covariance matrix completely specify a Gaussian random variable on Rd , the Gaussian is commonly denoted by N (m, Σ). (1) (2) Elements of Probability Theory Example 3 An exponential random variable T : Ω → R+ with rate λ > 0 satisfies P(T > t) = e−λt , ∀t > 0. We write T ∼ exp(λ). The related pdf is fT (t) = Notice that Z n λe−λt , t > 0, 0, ∞ (3) t < 0. 1 tfT (t)dt = ET = λ −∞ Z ∞ (λt)e−λt d(λt) = 0 1 . λ If the times τn = tn+1 − tn are i.i.d random variables with τ0 ∼ exp(λ) then, for t0 = 0, tn = n−1 X k=0 τk Elements of Probability Theory and it is possible to show that e−λt (λt)k . P(0 6 tk 6 t < tk+1 ) = k! (4) Elements of Probability Theory • Assume that E|X| < ∞ and let G be a sub–σ–algebra of F. The conditional expectation of X with respect to G is defined to be the function E[X|G] : Ω 7→ E which is G–measurable and satisfies Z Z E[X|G] dµ = X dµ ∀ G ∈ G. G G • We can define E[f (X)|G] and the conditional probability P[X ∈ F |G] = E[IF (X)|G], where IF is the indicator function of F , in a similar manner. ELEMENTS OF THE THEORY OF STOCHASTIC PROCESSES Definition of a Stochastic Process • Let T be an ordered set. A stochastic process is a collection of random variables X = {Xt ; t ∈ T } where, for each fixed t ∈ T , Xt is a random variable from (Ω, F) to (E, G). • The measurable space {Ω, F} is called the sample space. The space (E, G) is called the state space . • In this course we will take the set T to be [0, +∞). • The state space E will usually be Rd equipped with the σ–algebra of Borel sets. • A stochastic process X may be viewed as a function of both t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or Xt (ω) instead of Xt . For a fixed sample point ω ∈ Ω, the function Xt (ω) : T 7→ E is called a sample path (realization, trajectory) of the process X. Definition of a Stochastic Process • The finite dimensional distributions (fdd) of a stochastic process are the E k –valued random variables (X(t1 ), X(t2 ), . . . , X(tk )) for arbitrary positive integer k and arbitrary times ti ∈ T, i ∈ {1, . . . , k}. • We will say that two processes Xt and Yt are equivalent if they have same finite dimensional distributions. • From experiments or numerical simulations we can only obtain information about the (fdd) of a process. Stationary Processes • A process is called (strictly) stationary if all fdd are invariant under are time translation: for any integer k and times ti ∈ T , the distribution of (X(t1 ), X(t2 ), . . . , X(tk )) is equal to that of (X(s + t1 ), X(s + t2 ), . . . , X(s + tk )) for any s such that s + ti ∈ T for all i ∈ {1, . . . , k}. • Let Xt be a stationary stochastic process with finite second moment (i.e. Xt ∈ L2 ). Stationarity implies that EXt = µ, E((Xt − µ)(Xs − µ)) = C(t − s). The converse is not true. • A stochastic process Xt ∈ L2 is called second-order stationary (or stationary in the wide sense) if the first moment EXt is a constant and the second moment depends only on the difference t − s: EXt = µ, E((Xt − µ)(Xs − µ)) = C(t − s). Stationary Processes • The function C(t) is called the correlation (or covariance) function of Xt . • Let Xt ∈ L2 be a mean zero second order stationary process on R which is mean square continuous, i.e. lim E|Xt − Xs |2 = 0. t→s • Then the correlation function admits the representation Z ∞ C(t) = eitx f (x) dx, t ∈ R. −∞ • the function f (x) is called the spectral density of the process Xt . • In many cases, the experimentally measured quantity is the spectral density (or power spectrum) of the stochastic process. Stationary Processes • Given the correlation function of Xt , and assuming that C(t) ∈ L1 (R), we can calculate the spectral density through its Fourier transform: Z ∞ 1 f (x) = e−itx C(t) dt. 2π −∞ • The correlation function of a second order stationary process enables us to associate a time scale to Xt , the correlation time τcor : Z ∞ Z ∞ 1 τcor = C(τ ) dτ = E(Xτ X0 )/E(X02 ) dτ. C(0) 0 0 • The slower the decay of the correlation function, the larger the correlation time is. We have to assume sufficiently fast decay of correlations so that the correlation time is finite. Stationary Processes Example 4 Consider a second stationary process with correlation function C(t) = C(0)e−γ|t| . The spectral density of this process is Z ∞ 1 f (x) = C(0) e−itx e−γ|t| dt 2π −∞ 1 γ = C(0) . π γ 2 + x2 The correlation time is Z τcor = 0 ∞ e−γt dt = γ −1 . Gaussian Processes • The most important class of stochastic processes is that of Gaussian processes: Definition 5 A Gaussian process is one for which E = Rd and all the finite dimensional distributions are Gaussian. • A Gaussian process x(t) is characterized by its mean m(t) := Ex(t) and the covariance function ³¡ ¢ ¡ ¢´ C(t, s) = E x(t) − m(t) ⊗ x(s) − m(s) . • Thus, the first two moments of a Gaussian process are sufficient for a complete characterization of the process. • A corollary of this is that a second order stationary Gaussian process is also a stationary process. Brownian Motion • The most important continuous time stochastic process is Brownian motion. Brownian motion is a mean zero, continuous (i.e. it has continuous sample paths: for a.e ω ∈ Ω the function Xt is a continuous function of time) process with independent Gaussian increments. • A process Xt has independent increments if for every sequence t0 < t1 ...tn the random variables Xt1 − Xt0 , Xt2 − Xt1 , . . . , Xtn − Xtn−1 are independent. • If, furthermore, for any t1 , t2 and Borel set B ⊂ R P(Xt2 +s − Xt1 +s ∈ B) is independent of s, then the process Xt has stationary independent increments. Brownian Motion Definition 6 i) A one dimensional standard Brownian motion W (t) : R+ → R is a real valued stochastic process with the following properties: (a) W (0) = 0; (b) W (t) is continuous; (c) W (t) has independent increments. (d) For every t > s > 0 W (t) − W (s) has a Gaussian distribution with mean 0 and variance t − s. That is, the density of the random variable W (t) − W (s) is µ ¶ ³ ´− 12 2 x g(x; t, s) = 2π(t − s) exp − ; (5) 2(t − s) Brownian Motion ii) A d–dimensional standard Brownian motion W (t) : R+ → Rd is a collection of d independent one dimensional Brownian motions: W (t) = (W1 (t), . . . , Wd (t)), where Wi (t), i = 1, . . . , d are independent one dimensional Brownian motions. The density of the Gaussian random vector W (t) − W (s) is thus µ ¶ ³ ´−d/2 2 kxk g(x; t, s) = 2π(t − s) exp − . 2(t − s) Brownian motion is sometimes referred to as the Wiener process . Brownian Motion 3 2 1 0 W(t) −1 −2 −3 −4 0 1 2 3 4 t Figure 1: Brownian sample paths 5 Brownian Motion It is possible to prove rigorously the existence of the Wiener process (Brownian motion): Theorem 1 (Wiener) There exists an almost-surely continuous process Wt with independent increments such and W0 = 0, such that for each t > the random variable Wt is N (0, t). Furthermore, Wt is almost surely locally Hölder continuous with exponent α for any α ∈ (0, 12 ). Notice that Brownian paths are not differentiable. Brownian Motion Brownian motion is a Gaussian process. For the d–dimensional Brownian motion, and for I the d × d dimensional identity, we have (see (1) and (2)) EW (t) = 0 and ∀t > 0 ³ ´ E (W (t) − W (s)) ⊗ (W (t) − W (s)) = (t − s)I. Moreover, ³ ´ E W (t) ⊗ W (s) = min(t, s)I. (6) (7) Brownian Motion • From the formula for the Gaussian density g(x, t − s), eqn. (5), we immediately conclude that W (t) − W (s) and W (t + u) − W (s + u) have the same pdf. Consequently, Brownian motion has stationary increments. • Notice, however, that Brownian motion itself is not a stationary process. • Since W (t) = W (t) − W (0), the pdf of W (t) is 1 −x2 /2t e . g(x, t) = √ 2πt • We can easily calculate all moments of the Brownian motion: Z +∞ 2 1 E(xn (t)) = √ xn e−x /2t dx 2πt −∞ n 1.3 . . . (n − 1)tn/2 , n even, = 0, n odd. The Poisson Process • Another fundamental continuous time process is the Poisson process : Definition 7 The Poisson process with intensity λ, denoted by N (t), is an integer-valued, continuous time, stochastic process with independent increments satisfying ¡ ¢k −λ(t−s) e λ(t − s) P[(N (t) − N (s)) = k] = , t > s > 0, k ∈ N. k! • Notice the connection to exponential random variables via (4). • Both Brownian motion and the Poisson process are homogeneous (or time-homogeneous): the increments between successive times s and t depend only on t − s. The Path Space • Let (Ω, F, µ) be a probability space, (E, ρ) a metric space and let T = [0, ∞). Let {Xt } be a stochastic process from (Ω, F, µ) to (E, ρ) with continuous sample paths. • The above means that for every ω ∈ Ω we have that Xt ∈ CE := C([0, ∞); E). • The space of continuous functions CE is called the path space of the stochastic process. • We can put a metric on E as follows: ∞ X ¢ ¡ 1 1 2 1 2 ρE (X , X ) := max min ρ(Xt , Xt ), 1 . n 2 06t6n n=1 • We can then define the Borel sets on CE , using the topology induced by this metric, and {Xt } can be thought of as a random variable on (Ω, F, µ) with state space (CE , B(CE )). The Path Space • The probability measure PXt−1 on (CE , B(CE )) is called the law of {Xt }. • The law of a stochastic process is a probability measure on its path space. Example 8 The space of continuous functions CE is the path space of Brownian motion (the Wiener process). The law of Brownian motion, that is the measure that it induces on C([0, ∞), Rd ), is known as the Wiener measure.