ELEMENTS OF PROBABILITY THEORY
• A collection of subsets of a set Ω is called a σ–algebra if it
contains Ω and is closed under the operations of taking
complements and countable unions of its elements.
• A sub-σ–algebra of a σ–algebra F is a subcollection of F which
itself satisfies the axioms of a σ–algebra.
• A measurable space is a pair (Ω, F) where Ω is a set and F is a
σ–algebra of subsets of Ω.
• Let (Ω, F) and (E, G) be two measurable spaces. A function
X : Ω → E such that the event
{ω ∈ Ω : X(ω) ∈ A} =: {X ∈ A}
belongs to F for arbitrary A ∈ G is called a measurable
function or random variable.
• Let (Ω, F) be a measurable space. A function µ : F → [0, 1] is
called a probability measure if µ(∅) = 0, µ(Ω) = 1 and
µ(∪_{k=1}^∞ A_k) = Σ_{k=1}^∞ µ(A_k) for all sequences of pairwise
disjoint sets {A_k}_{k=1}^∞ ⊂ F.
• The triplet (Ω, F, µ) is called a probability space.
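As a concrete illustration, a finite sample space with the uniform measure satisfies the axioms above. The following Python sketch checks them directly (the die example and the choice of events are illustrative, not from the notes):

```python
from fractions import Fraction
from itertools import chain, combinations

# Toy probability space: Omega = {1,...,6} (a fair die), F = the power set,
# mu = the uniform probability measure.
Omega = frozenset(range(1, 7))

def mu(A):
    """Uniform probability measure on subsets of Omega."""
    return Fraction(len(A), len(Omega))

# The axioms of a probability measure: mu(empty) = 0, mu(Omega) = 1.
assert mu(frozenset()) == 0
assert mu(Omega) == 1

# Additivity over a pairwise disjoint decomposition of Omega.
parts = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
assert all(a.isdisjoint(b) for a, b in combinations(parts, 2))
union = frozenset(chain.from_iterable(parts))
assert mu(union) == sum(mu(p) for p in parts)
```

On a finite Ω the full countable-additivity axiom reduces to finite additivity, which is what the last assertion checks.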
• Let X be a random variable (measurable function) from
(Ω, F, µ) to (E, G). If E is a metric space then we may define
expectation with respect to the measure µ by
E[X] = ∫_Ω X(ω) dµ(ω).
• More generally, let f : E → R be G–measurable. Then,
E[f(X)] = ∫_Ω f(X(ω)) dµ(ω).
• Let U be a topological space. We will use the notation B(U ) to
denote the Borel σ–algebra of U : the smallest σ–algebra
containing all open sets of U . Every random variable from a
probability space (Ω, F, µ) to a measurable space (E, B(E))
induces a probability measure on E:
µ_X(B) = µ(X^{−1}(B)) = µ({ω ∈ Ω : X(ω) ∈ B}),   B ∈ B(E).
The measure µX is called the distribution (or sometimes the
law) of X.
Example 1 Let I denote a subset of the positive integers. A
vector ρ_0 = {ρ_{0,i}, i ∈ I} is a distribution on I if it has nonnegative
entries and its total mass equals 1: Σ_{i∈I} ρ_{0,i} = 1.
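Example 1 can be checked mechanically. In the Python sketch below the index set I and the entries of ρ_0 are illustrative choices, not data from the notes:

```python
from fractions import Fraction

# A vector with nonnegative entries and total mass 1 is a distribution on I.
I = (1, 2, 3, 5)                      # illustrative subset of the positive integers
rho0 = {1: Fraction(1, 2), 2: Fraction(1, 4),
        3: Fraction(1, 8), 5: Fraction(1, 8)}

def is_distribution(rho):
    """Nonnegative entries whose total mass equals 1."""
    return all(p >= 0 for p in rho.values()) and sum(rho.values()) == 1

assert is_distribution(rho0)

# Expectations follow from the distribution: E[f(X)] = sum_i f(i) rho_{0,i}.
def expect(f, rho):
    return sum(f(i) * p for i, p in rho.items())

mean = expect(lambda i: i, rho0)      # here E[X] = 2
```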
• We can use the distribution of a random variable to compute
expectations and probabilities:
E[f(X)] = ∫_E f(x) dµ_X(x)
and
P[X ∈ G] = ∫_G dµ_X(x),   G ∈ B(E).
• When E = R^d and we can write dµ_X(x) = ρ(x) dx, then we
refer to ρ(x) as the probability density function (pdf), or
density with respect to Lebesgue measure, for X.
• When E = R^d then by L^p(Ω; R^d), or sometimes L^p(Ω; µ) or
even simply L^p(µ), we mean the Banach space of measurable
functions on Ω with norm
‖X‖_{L^p} = (E|X|^p)^{1/p}.
Example 2
i) Consider the random variable X : Ω → R with pdf
γ_{σ,m}(x) := (2πσ)^{-1/2} exp(−(x − m)^2/(2σ)).
Such an X is termed a Gaussian or normal random variable.
The mean is
EX = ∫_R x γ_{σ,m}(x) dx = m
and the variance is
E(X − m)^2 = ∫_R (x − m)^2 γ_{σ,m}(x) dx = σ.
Since the mean and variance specify completely a Gaussian
random variable on R, the Gaussian is commonly denoted by
N (m, σ). The standard normal random variable is N (0, 1).
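Note the convention above: in N(m, σ) the second argument is the variance. A short sampling sketch (seed and sample size are arbitrary choices) illustrates this, keeping in mind that numpy's normal sampler is parametrized by the standard deviation √σ:

```python
import numpy as np

# Sample from N(m, sigma) in the notes' convention, where sigma is the
# VARIANCE; np.random.normal expects the standard deviation sqrt(sigma).
rng = np.random.default_rng(0)
m, sigma = 1.0, 4.0
X = rng.normal(loc=m, scale=np.sqrt(sigma), size=200_000)

# The sample mean and sample variance approximate m and sigma.
assert abs(X.mean() - m) < 0.05
assert abs(X.var() - sigma) < 0.1
```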
ii) Let m ∈ Rd and Σ ∈ Rd×d be symmetric and positive definite.
The random variable X : Ω → R^d with pdf
γ_{Σ,m}(x) := ((2π)^d det Σ)^{-1/2} exp(−(1/2)⟨Σ^{−1}(x − m), (x − m)⟩)
is termed a multivariate Gaussian or normal random
variable. The mean is
E(X) = m      (1)
and the covariance matrix is
E((X − m) ⊗ (X − m)) = Σ.      (2)
Since the mean and covariance matrix completely specify a
Gaussian random variable on R^d, the Gaussian is commonly
denoted by N(m, Σ).
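Sampling from N(m, Σ) reduces to the standard normal: if Σ = L Lᵀ is a Cholesky factorization and Z ∼ N(0, I), then m + LZ ∼ N(m, Σ). A Python sketch with illustrative m and Σ:

```python
import numpy as np

# If Sigma = L L^T (Cholesky) and Z ~ N(0, I), then m + L Z ~ N(m, Sigma).
rng = np.random.default_rng(1)
m = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])        # symmetric, positive definite
L = np.linalg.cholesky(Sigma)

Z = rng.standard_normal((100_000, 2))
X = m + Z @ L.T                       # each row is a sample of N(m, Sigma)

# The empirical mean and covariance match (1) and (2) up to sampling error.
assert np.allclose(X.mean(axis=0), m, atol=0.05)
assert np.allclose(np.cov(X.T), Sigma, atol=0.05)
```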
Example 3 An exponential random variable T : Ω → R+ with rate
λ > 0 satisfies
P(T > t) = e^{−λt},   ∀t > 0.
We write T ∼ exp(λ). The related pdf is
f_T(t) = λe^{−λt} for t > 0, and f_T(t) = 0 for t < 0.      (3)
Notice that
ET = ∫_{−∞}^∞ t f_T(t) dt = (1/λ) ∫_0^∞ (λt) e^{−λt} d(λt) = 1/λ.
If the times τ_n = t_{n+1} − t_n are i.i.d. random variables with
τ_0 ∼ exp(λ) then, for t_0 = 0,
t_n = Σ_{k=0}^{n−1} τ_k,
and it is possible to show that
P(0 ≤ t_k ≤ t < t_{k+1}) = e^{−λt} (λt)^k / k!.      (4)
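The identity (4) says that the number of arrivals t_k up to time t is Poisson with parameter λt. A Monte Carlo sketch confirms this (rate, horizon and sample size are illustrative choices):

```python
import math
import numpy as np

# Number of arrivals by time t, with i.i.d. exp(lam) inter-arrival times,
# is Poisson(lam * t): compare the empirical frequency with (4).
rng = np.random.default_rng(2)
lam, t, n_runs = 2.0, 1.5, 100_000

tau = rng.exponential(scale=1.0 / lam, size=(n_runs, 50))  # 50 >> lam * t
arrival_times = np.cumsum(tau, axis=1)                     # t_1, t_2, ...
counts = (arrival_times <= t).sum(axis=1)                  # k with t_k <= t < t_{k+1}

k = 3
empirical = (counts == k).mean()
formula = math.exp(-lam * t) * (lam * t) ** k / math.factorial(k)
assert abs(empirical - formula) < 0.01
```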
• Assume that E|X| < ∞ and let G be a sub–σ–algebra of F.
The conditional expectation of X with respect to G is
defined to be the function E[X|G] : Ω → E which is
G–measurable and satisfies
∫_G E[X|G] dµ = ∫_G X dµ   ∀ G ∈ G.
• We can define E[f(X)|G] and the conditional probability
P[X ∈ F|G] = E[I_F(X)|G], where I_F is the indicator function of
F, in a similar manner.
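On a finite probability space a sub-σ-algebra G is generated by a partition of Ω, and E[X|G] is simply the average of X over each partition cell. A sketch with an illustrative Ω, X and partition:

```python
from fractions import Fraction

# Finite Omega with the uniform measure; the sub-sigma-algebra G is generated
# by a partition, and E[X|G] is constant on each cell (the cell average).
Omega = [0, 1, 2, 3, 4, 5]
X = {w: w * w for w in Omega}            # X(omega) = omega^2
partition = [[0, 1], [2, 3, 4], [5]]     # cells generating G (illustrative)

def cond_exp(X, partition):
    out = {}
    for cell in partition:
        avg = Fraction(sum(X[w] for w in cell), len(cell))
        for w in cell:
            out[w] = avg                 # G-measurable: constant on each cell
    return out

EXG = cond_exp(X, partition)

# Defining property: integrals of E[X|G] and X agree on every generating cell
# (with the uniform measure, integrals are sums up to the factor 1/|Omega|).
for cell in partition:
    assert sum(EXG[w] for w in cell) == sum(X[w] for w in cell)
```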
ELEMENTS OF THE THEORY OF STOCHASTIC
PROCESSES
Definition of a Stochastic Process
• Let T be an ordered set. A stochastic process is a collection
of random variables X = {Xt ; t ∈ T } where, for each fixed
t ∈ T , Xt is a random variable from (Ω, F) to (E, G).
• The measurable space (Ω, F) is called the sample space. The
space (E, G) is called the state space.
• In this course we will take the set T to be [0, +∞).
• The state space E will usually be Rd equipped with the
σ–algebra of Borel sets.
• A stochastic process X may be viewed as a function of both
t ∈ T and ω ∈ Ω. We will sometimes write X(t), X(t, ω) or
X_t(ω) instead of X_t. For a fixed sample point ω ∈ Ω, the
function X_t(ω) : T → E is called a sample path (realization,
trajectory) of the process X.
• The finite dimensional distributions (fdd) of a stochastic
process are the E k –valued random variables
(X(t1 ), X(t2 ), . . . , X(tk )) for arbitrary positive integer k and
arbitrary times ti ∈ T, i ∈ {1, . . . , k}.
• We will say that two processes X_t and Y_t are equivalent if they
have the same finite dimensional distributions.
• From experiments or numerical simulations we can only obtain
information about the fdd of a process.
Stationary Processes
• A process is called (strictly) stationary if all fdd are
invariant under time translations: for any integer k and
times t_i ∈ T, the distribution of (X(t_1), X(t_2), . . . , X(t_k)) is
equal to that of (X(s + t_1), X(s + t_2), . . . , X(s + t_k)) for any s
such that s + t_i ∈ T for all i ∈ {1, . . . , k}.
• Let Xt be a stationary stochastic process with finite second
moment (i.e. Xt ∈ L2 ). Stationarity implies that EXt = µ,
E((Xt − µ)(Xs − µ)) = C(t − s). The converse is not true.
• A stochastic process Xt ∈ L2 is called second-order
stationary (or stationary in the wide sense) if the first
moment EXt is a constant and the second moment depends
only on the difference t − s:
EXt = µ,
E((Xt − µ)(Xs − µ)) = C(t − s).
• The function C(t) is called the correlation (or covariance)
function of Xt .
• Let X_t ∈ L^2 be a mean zero second order stationary process on
R which is mean square continuous, i.e.
lim_{t→s} E|X_t − X_s|^2 = 0.
• Then the correlation function admits the representation
C(t) = ∫_{−∞}^∞ e^{itx} f(x) dx,   t ∈ R.
• The function f(x) is called the spectral density of the process
X_t.
• In many cases, the experimentally measured quantity is the
spectral density (or power spectrum) of the stochastic process.
• Given the correlation function of Xt , and assuming that
C(t) ∈ L1 (R), we can calculate the spectral density through its
Fourier transform:
f(x) = (1/2π) ∫_{−∞}^∞ e^{−itx} C(t) dt.
• The correlation function of a second order stationary process
enables us to associate a time scale to Xt , the correlation
time τcor :
τ_cor = (1/C(0)) ∫_0^∞ C(τ) dτ = ∫_0^∞ E(X_τ X_0)/E(X_0^2) dτ.
• The slower the decay of the correlation function, the larger the
correlation time is. We have to assume sufficiently fast decay of
correlations so that the correlation time is finite.
Example 4 Consider a second order stationary process with
correlation function
C(t) = C(0) e^{−γ|t|}.
The spectral density of this process is
f(x) = (1/2π) C(0) ∫_{−∞}^∞ e^{−itx} e^{−γ|t|} dt = C(0) (1/π) γ/(γ^2 + x^2).
The correlation time is
τ_cor = ∫_0^∞ e^{−γt} dt = γ^{−1}.
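Example 4 can be checked numerically. The sketch below (parameter values and the truncation of the real line are illustrative choices) approximates the Fourier integral for f(x) and the integral defining τ_cor with the trapezoidal rule:

```python
import numpy as np

def trap(y, dx):
    """Trapezoidal rule on a uniform grid with spacing dx."""
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

gamma, C0, x = 0.7, 2.0, 1.3
t = np.linspace(-60.0, 60.0, 400_001)     # truncation of the real line
dt = t[1] - t[0]
C = C0 * np.exp(-gamma * np.abs(t))

# f(x) = (1/2 pi) \int e^{-itx} C(t) dt  vs. the closed form of Example 4.
f_num = trap(np.exp(-1j * t * x) * C, dt).real / (2.0 * np.pi)
f_exact = C0 * gamma / (np.pi * (gamma**2 + x**2))
assert abs(f_num - f_exact) < 1e-6

# tau_cor = (1/C(0)) \int_0^infty C(tau) dtau = 1/gamma.
tpos = np.linspace(0.0, 60.0, 200_001)
tau_cor = trap(np.exp(-gamma * tpos), tpos[1] - tpos[0])
assert abs(tau_cor - 1.0 / gamma) < 1e-6
```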
Gaussian Processes
• The most important class of stochastic processes is that of
Gaussian processes:
Definition 5 A Gaussian process is one for which E = Rd and
all the finite dimensional distributions are Gaussian.
• A Gaussian process x(t) is characterized by its mean
m(t) := Ex(t)
and the covariance function
C(t, s) = E((x(t) − m(t)) ⊗ (x(s) − m(s))).
• Thus, the first two moments of a Gaussian process are
sufficient for a complete characterization of the process.
• A corollary of this is that a second order stationary Gaussian
process is also a stationary process.
Brownian Motion
• The most important continuous time stochastic process is
Brownian motion. Brownian motion is a mean zero,
continuous (i.e. it has continuous sample paths: for a.e. ω ∈ Ω
the function X_t(ω) is a continuous function of time) process with
independent Gaussian increments.
• A process X_t has independent increments if for every
sequence t_0 < t_1 < · · · < t_n the random variables
Xt1 − Xt0 , Xt2 − Xt1 , . . . , Xtn − Xtn−1
are independent.
• If, furthermore, for any t1 , t2 and Borel set B ⊂ R
P(Xt2 +s − Xt1 +s ∈ B)
is independent of s, then the process Xt has stationary
independent increments.
Definition 6 i) A one dimensional standard Brownian motion
W (t) : R+ → R is a real valued stochastic process with the
following properties:
(a) W (0) = 0;
(b) W (t) is continuous;
(c) W (t) has independent increments.
(d) For every t > s ≥ 0, W(t) − W(s) has a Gaussian
distribution with mean 0 and variance t − s. That is, the
density of the random variable W(t) − W(s) is
g(x; t, s) = (2π(t − s))^{-1/2} exp(−x^2/(2(t − s)));      (5)
ii) A d–dimensional standard Brownian motion W (t) : R+ → Rd
is a collection of d independent one dimensional Brownian
motions:
W (t) = (W1 (t), . . . , Wd (t)),
where Wi (t), i = 1, . . . , d are independent one dimensional
Brownian motions. The density of the Gaussian random vector
W(t) − W(s) is thus
g(x; t, s) = (2π(t − s))^{-d/2} exp(−‖x‖^2/(2(t − s))).
Brownian motion is sometimes referred to as the Wiener process.
[Figure 1: Brownian sample paths W(t), plotted for t ∈ [0, 5].]
It is possible to prove rigorously the existence of the Wiener
process (Brownian motion):
Theorem 1 (Wiener) There exists an almost surely continuous
process W_t with independent increments such that W_0 = 0 and,
for each t ≥ 0, the random variable W_t is N(0, t). Furthermore,
W_t is almost surely locally Hölder continuous with exponent α for
any α ∈ (0, 1/2).
Notice that Brownian paths are almost surely nowhere differentiable.
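The increment property gives a direct simulation scheme: on a grid of step Δt, draw independent N(0, Δt) increments and sum them. A sketch (grid resolution and path count are illustrative choices):

```python
import numpy as np

# Simulate standard Brownian paths on [0, T] from independent Gaussian
# increments: W(t_{k+1}) - W(t_k) ~ N(0, dt), with W(0) = 0.
rng = np.random.default_rng(3)
T, n_steps, n_paths = 1.0, 1000, 50_000
dt = T / n_steps

dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)

# W(T) ~ N(0, T): check mean and variance across the ensemble of paths.
assert abs(W[:, -1].mean()) < 0.02
assert abs(W[:, -1].var() - T) < 0.03
```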
Brownian motion is a Gaussian process. For the d–dimensional
Brownian motion, and for I the d × d identity matrix, we have
(see (1) and (2))
EW(t) = 0   ∀t ≥ 0
and
E((W(t) − W(s)) ⊗ (W(t) − W(s))) = (t − s)I.      (6)
Moreover,
E(W(t) ⊗ W(s)) = min(t, s)I.      (7)
• From the formula for the Gaussian density g(x; t, s), eqn. (5),
we immediately conclude that W(t) − W(s) and
W(t + u) − W(s + u) have the same pdf. Consequently,
Brownian motion has stationary increments.
• Notice, however, that Brownian motion itself is not a
stationary process.
• Since W(t) = W(t) − W(0), the pdf of W(t) is
g(x, t) = (2πt)^{-1/2} e^{−x^2/2t}.
• We can easily calculate all the moments of Brownian motion:
E(W^n(t)) = (2πt)^{-1/2} ∫_{−∞}^{+∞} x^n e^{−x^2/2t} dx
= 1 · 3 · · · (n − 1) t^{n/2} for n even, and 0 for n odd.
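The moment formula can be verified without sampling: Gauss–Hermite quadrature integrates polynomials against a Gaussian weight exactly. A sketch (the quadrature order is an arbitrary choice, large enough for the degrees tested):

```python
import math
import numpy as np

def bm_moment(n, t, order=40):
    """E[W^n(t)] for W(t) ~ N(0, t), via Gauss-Hermite quadrature."""
    u, w = np.polynomial.hermite.hermgauss(order)   # nodes/weights for e^{-u^2}
    # Substitute x = sqrt(2 t) u, turning the N(0, t) density into the weight.
    return float(np.sum(w * (np.sqrt(2.0 * t) * u) ** n) / math.sqrt(math.pi))

def moment_formula(n, t):
    """1 * 3 * ... * (n - 1) t^{n/2} for n even, 0 for n odd."""
    return 0.0 if n % 2 else math.prod(range(1, n, 2)) * t ** (n / 2)

for n in range(0, 9):
    assert abs(bm_moment(n, 2.0) - moment_formula(n, 2.0)) < 1e-8
```

Gauss–Hermite quadrature of order 40 is exact for polynomials of degree up to 79, so the agreement here is to floating-point accuracy.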
The Poisson Process
• Another fundamental continuous time process is the Poisson
process:
Definition 7 The Poisson process with intensity λ, denoted by
N (t), is an integer-valued, continuous time, stochastic process
with independent increments satisfying
P[(N(t) − N(s)) = k] = e^{−λ(t−s)} (λ(t − s))^k / k!,   t > s ≥ 0, k ∈ N.
• Notice the connection to exponential random variables via (4).
• Both Brownian motion and the Poisson process are
homogeneous (or time-homogeneous): the increments
between successive times s and t depend only on t − s.
The Path Space
• Let (Ω, F, µ) be a probability space, (E, ρ) a metric space and
let T = [0, ∞). Let {Xt } be a stochastic process from (Ω, F, µ)
to (E, ρ) with continuous sample paths.
• The above means that for every ω ∈ Ω the path t → X_t(ω)
belongs to C_E := C([0, ∞); E).
• The space of continuous functions CE is called the path space
of the stochastic process.
• We can put a metric on C_E as follows:
ρ_E(X^1, X^2) := Σ_{n=1}^∞ (1/2^n) max_{0≤t≤n} min(ρ(X^1_t, X^2_t), 1).
• We can then define the Borel sets on CE , using the topology
induced by this metric, and {Xt } can be thought of as a
random variable on (Ω, F, µ) with state space (CE , B(CE )).
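On discretized paths the metric ρ_E can be evaluated approximately by truncating the sum over n. The sketch below takes E = R with ρ(a, b) = |a − b| and two illustrative paths at constant distance 1/2:

```python
import numpy as np

# Truncated evaluation of the path-space metric
#   rho_E(X1, X2) = sum_{n>=1} 2^{-n} max_{0<=t<=n} min(rho(X1_t, X2_t), 1)
# for E = R, rho(a, b) = |a - b|, on paths sampled on a uniform grid.
def path_metric(X1, X2, t_grid, n_terms=20):
    total = 0.0
    for n in range(1, n_terms + 1):
        window = t_grid <= n
        sup = np.max(np.minimum(np.abs(X1[window] - X2[window]), 1.0))
        total += sup / 2.0**n
    return total

t = np.linspace(0.0, 20.0, 2001)
X1 = np.sin(t)
X2 = np.sin(t) + 0.5       # distance 0.5 at every time, so each sup is 0.5
# sum_{n=1}^{20} 0.5 / 2^n equals 0.5 up to the truncation error 2^{-21}.
assert abs(path_metric(X1, X2, t) - 0.5) < 1e-5
```

The clipping at 1 makes each term bounded, so the series always converges; truncating at n_terms introduces an error of at most 2^{-n_terms}.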
• The probability measure µ ∘ X^{−1} on (C_E, B(C_E)) is called the
law of {X_t}.
• The law of a stochastic process is a probability measure on its
path space.
Example 8 The space of continuous functions CE is the path
space of Brownian motion (the Wiener process). The law of
Brownian motion, that is the measure that it induces on
C([0, ∞), Rd ), is known as the Wiener measure.