Download Lecture 2 - Maths, NUS
Lecture 2
1 Product Spaces and Measures
Before we introduce and study independent random variables, we need some further background in measure theory. More specifically, given two probability spaces (Ω1 , F1 , P1 ) and
(Ω2 , F2 , P2 ), we will construct a product probability measure P1 × P2 on the product space
Ω1 × Ω2 . We will then extend this construction to an infinite product space Ω1 × Ω2 × · · · ,
which will allow us to construct an infinite sequence of independent random variables.
Recall that the Cartesian product Ω1 × Ω2 is defined as {(ω1 , ω2 ) : ω1 ∈ Ω1 , ω2 ∈ Ω2 }.
Given σ-algebra Fi on Ωi , i = 1, 2, we first need to define a suitable σ-algebra on Ω1 × Ω2 .
Definition 1.1 [Product σ-algebra] The product σ-algebra F on Ω1 × Ω2 is the σ-algebra
generated by rectangles of the form A1 × A2 , with Ai ∈ Fi for i = 1, 2.
We can then define the product measure P := P1 × P2 on Ω1 × Ω2 .
Theorem 1.2 [Product Measure] Let (Ωi , Fi , Pi ), i = 1, 2, be two probability spaces. Then
there exists a unique probability measure P on Ω1 × Ω2 equipped with the product σ-algebra F,
such that for all A1 ∈ F1 and A2 ∈ F2 , we have P (A1 × A2 ) = P1 (A1 ) · P2 (A2 ).
Proof. Note that the collection of finite disjoint unions of measurable rectangles A1 × A2 ,
with Ai ∈ Fi , forms an algebra of sets I, and F is the σ-algebra generated by I. Therefore
by the Carathéodory Extension Theorem, it suffices to show that P defined by
P (A1 × A2 ) := P1 (A1 ) · P2 (A2 )
for A1 ∈ F1 , A2 ∈ F2 ,
extends to a countably-additive probability measure on the algebra I.
For any E ∈ I with E being the finite disjoint union of measurable rectangles $A_1^i \times A_2^i$ for
1 ≤ i ≤ k, we can define
\[ P(E) := \sum_{i=1}^{k} P(A_1^i \times A_2^i). \]
It is an easy exercise (which we leave to the reader) to verify that P (E) is well-defined, i.e.,
if E is also the disjoint union of another collection of measurable rectangles $B_1^j \times B_2^j$ for
1 ≤ j ≤ m, then also $P(E) = \sum_{j=1}^{m} P(B_1^j \times B_2^j)$. P defined this way is clearly finitely-additive
on I.
To show that P is countably-additive, let En ∈ I with En ↓ ∅. Let us define the following
sections of En :
En,ω2 := {ω1 ∈ Ω1 : (ω1 , ω2 ) ∈ En }
for ω2 ∈ Ω2 .
The fact that En ∈ I implies that En,ω2 ∈ F1 for each ω2 ∈ Ω2 , P1 (En,ω2 ) is a measurable
function from (Ω2 , F2 ) to ([0, 1], B([0, 1])), and
\[ P(E_n) = \int_{\Omega_2} P_1(E_{n,\omega_2})\, P_2(d\omega_2). \tag{1.1} \]
En ↓ ∅ implies that En,ω2 ↓ ∅ for each ω2 , and hence P1 (En,ω2 ) ↓ 0 by the countable additivity
of P1 . Applying the Bounded Convergence Theorem to the above integral then gives P (En ) ↓ 0
as n → ∞, which establishes the countable additivity of P on I.
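On finite spaces the construction in Theorem 1.2 can be checked by hand. The following is a minimal sketch, assuming two hypothetical finite probability spaces (the dictionaries `P1` and `P2` and the helper names are illustrative, not part of the lecture): it builds the product measure from point masses and confirms the rectangle identity P (A1 × A2 ) = P1 (A1 ) · P2 (A2 ) for one rectangle.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability spaces, given by their point masses.
P1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
P2 = {"a": Fraction(1, 6), "b": Fraction(1, 3), "c": Fraction(1, 2)}

# Product measure on points: P({(w1, w2)}) = P1({w1}) * P2({w2}).
P = {(w1, w2): p1 * p2 for (w1, p1), (w2, p2) in product(P1.items(), P2.items())}

def measure(weights, event):
    """Measure of an event (a set of points) under a dictionary of point masses."""
    return sum((weights[w] for w in event), Fraction(0))

def rectangle(A1, A2):
    """The measurable rectangle A1 x A2 as a set of pairs."""
    return set(product(A1, A2))

A1, A2 = {"H"}, {"a", "c"}
lhs = measure(P, rectangle(A1, A2))        # P(A1 x A2)
rhs = measure(P1, A1) * measure(P2, A2)    # P1(A1) * P2(A2)
total = measure(P, P.keys())               # mass of the whole product space
```

Exact rational arithmetic (`Fraction`) is used so that the two sides can be compared with `==` rather than up to floating-point error.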
Remark 1.3 Given probability spaces (Ωi , Fi , Pi ) with 1 ≤ i ≤ n, the same argument as
above can be used to construct the product probability measure P on the product space
Ω1 × Ω2 × · · · × Ωn , such that the σ-algebra F is generated by measurable rectangles of
the form A1 × · · · × An with Ai ∈ Fi , and P is the unique probability measure on F with
$P(A_1 \times \cdots \times A_n) = \prod_{i=1}^{n} P_i(A_i)$.
We now extend the relation (1.1) between a set E in the algebra I and its sections to a
general set E in the product σ-algebra F.
Corollary 1.4 [A Set and its Sections] Let A ∈ F, the product σ-algebra on Ω1 × Ω2 . Let
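\[ A_{\omega_1} := \{\omega_2 \in \Omega_2 : (\omega_1, \omega_2) \in A\}, \qquad A_{\omega_2} := \{\omega_1 \in \Omega_1 : (\omega_1, \omega_2) \in A\} \]
denote the sections of A. Then P1 (Aω2 ) and P2 (Aω1 ) are measurable, and
\[ P(A) = \int_{\Omega_2} P_1(A_{\omega_2})\, P_2(d\omega_2) = \int_{\Omega_1} P_2(A_{\omega_1})\, P_1(d\omega_1). \tag{1.2} \]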
Proof. The proof follows a standard procedure of showing that all sets in a σ-algebra satisfy
a particular property. Let L denote the collection of sets A ∈ F which satisfy the properties
that P1 (Aω2 ) and P2 (Aω1 ) are measurable and (1.2) holds. First we show that L is a so-called
λ-system, namely that: (i) Ω := Ω1 × Ω2 ∈ L; (ii) If A, B ∈ L and A ⊂ B, then B − A ∈ L;
(iii) If An ∈ L and An ↑ A, then A ∈ L. Then we show that L contains a collection of sets P
forming a π-system, namely that P is closed under intersection. We can then invoke Dynkin’s
π-λ Theorem to conclude that L contains the σ-algebra generated by P:
Theorem 1.5 [Dynkin’s π-λ Theorem] If Λ is a λ-system, and P ⊂ Λ is a π-system, then
Λ contains the σ-algebra generated by P. (See [1]. This is equivalent to the Monotone Class
Theorem.)
If we can find a collection P large enough such that P generates F, then F ⊂ Λ.
In our case, clearly L is a λ-system. We have shown in the proof of Theorem 1.2 that L
contains the algebra I of finite disjoint unions of measurable rectangles A1 × A2 , which is a
π-system. Therefore L contains F, the σ-algebra generated by I.
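Identity (1.2) can be illustrated on a finite product space, where the integrals reduce to finite sums. The sketch below assumes two hypothetical finite probability spaces and a non-rectangular set `A` chosen only for illustration; it computes P (A) directly and through both families of sections.

```python
from fractions import Fraction

# Hypothetical finite probability spaces, given by their point masses.
P1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
P2 = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

# A subset of the product space that is not a rectangle.
A = {(0, 0), (0, 2), (1, 1)}

def section_at_w1(A, w1):
    """A_{w1} = {w2 : (w1, w2) in A}."""
    return {b for (a, b) in A if a == w1}

def section_at_w2(A, w2):
    """A_{w2} = {w1 : (w1, w2) in A}."""
    return {a for (a, b) in A if b == w2}

def measure(weights, event):
    return sum((weights[w] for w in event), Fraction(0))

# P(A) computed from the product point masses.
direct = sum((P1[a] * P2[b] for (a, b) in A), Fraction(0))
# Integral over Omega_2 of P1(A_{w2}) against P2.
via_w2 = sum((measure(P1, section_at_w2(A, w2)) * P2[w2] for w2 in P2), Fraction(0))
# Integral over Omega_1 of P2(A_{w1}) against P1.
via_w1 = sum((measure(P2, section_at_w1(A, w1)) * P1[w1] for w1 in P1), Fraction(0))
```

All three quantities agree, as (1.2) predicts; on a finite space every subset is measurable, so none of the subtleties of Remark 1.6 arise.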
Remark 1.6 There exists A ⊂ Ω1 × Ω2 which is not measurable w.r.t. the product σ-algebra
F, and yet its sections Aω1 and Aω2 are measurable for all ω1 and ω2 . Take Ω1 = Ω2 = [0, 1]
with Borel σ-algebra and Lebesgue measure. Assuming the Continuum Hypothesis so that
[0, 1] has cardinality ℵ1 , one can use a well-ordering ≺ of [0, 1] to construct a set A := {(a, b) ∈
[0, 1]² : a ≺ b} such that [0, 1]\Aω1 is countable for every ω1 ∈ [0, 1], and Aω2 is countable for
every ω2 ∈ [0, 1]. Clearly (1.2) fails in this case and A is not measurable. See [2] for more
details.
We will need the following condition for interchanging double integrals on a product space.
Theorem 1.7 [Fubini’s Theorem] Let f : (Ω1 × Ω2 , F, P ) → (R, B) be measurable. For each ω1 ∈ Ω1 and
ω2 ∈ Ω2 , let $f^{(1)}_{\omega_1} : \Omega_2 \to \mathbb{R}$ and $f^{(2)}_{\omega_2} : \Omega_1 \to \mathbb{R}$ be defined by
\[ f^{(1)}_{\omega_1}(\omega_2) := f^{(2)}_{\omega_2}(\omega_1) := f(\omega_1, \omega_2). \]
Then $f^{(1)}_{\omega_1}$ and $f^{(2)}_{\omega_2}$ are measurable for each ω1 ∈ Ω1 and ω2 ∈ Ω2 . If f is integrable on
(Ω1 × Ω2 , F, P ), then $f^{(1)}_{\omega_1}$ and $f^{(2)}_{\omega_2}$ are integrable for a.s. every ω1 and ω2 respectively. Their
integrals
\[ H^{(1)}(\omega_1) := \int f^{(1)}_{\omega_1}(\omega_2)\, P_2(d\omega_2) \quad \text{and} \quad H^{(2)}(\omega_2) := \int f^{(2)}_{\omega_2}(\omega_1)\, P_1(d\omega_1) \]
are measurable, finite a.s. and integrable w.r.t. P1 and P2 respectively. Finally,
\[ \int f(\omega_1, \omega_2)\, dP = \int H^{(1)}(\omega_1)\, dP_1 = \int H^{(2)}(\omega_2)\, dP_2. \tag{1.3} \]
If f ≥ 0 P -a.s., then (1.3) also holds for measurable f , where all terms in (1.3) are equal to ∞
when f is not integrable (this result is called Tonelli’s Theorem).
Proof. Corollary 1.4 dealt with the special case f = 1A for some A ∈ F. By linearity, we
can extend the conclusion to all simple functions; then by monotone convergence, to all nonnegative functions; and lastly by separating f into its positive part f⁺ := f ∨ 0 and negative
part f⁻ := (−f ) ∨ 0, with f = f⁺ − f⁻ , extend the conclusion to all integrable f .
Exercise 1.8 If f can be both positive and negative and is not integrable, Fubini’s Theorem
may fail. Verify this by constructing a measurable f on [0, 1] × [0, 1] with Borel σ-algebra and
Lebesgue measure, such that the iterated integrals are well-defined and yet unequal, i.e.,
\[ \int_0^1 dx \int_0^1 f(x, y)\, dy \neq \int_0^1 dy \int_0^1 f(x, y)\, dx. \]
2 Infinite Product Spaces and Measures
To construct an infinite sequence of independent random variables, we need to extend the
construction of product spaces and measures to infinite products. Let (Ωn , Fn , Pn ), n ∈ N, be
a sequence of probability spaces. On the infinite product space Ω := Ω1 × Ω2 × · · · , we can
define the product σ-algebra F, which is generated by cylinder sets of the form
R = {(ω1 , ω2 , · · · ) ∈ Ω : ωi1 ∈ Ai1 , ωi2 ∈ Ai2 , · · · , ωik ∈ Aik }
for some k ∈ N and Ai1 ∈ Fi1 , · · · , Aik ∈ Fik . Note that R depends only on the coordinates
ωi1 , . . . , ωik , and the collection of finite disjoint unions of cylinder sets is an algebra I which
generates the product σ-algebra F. For a cylinder set R defined above, we can define its
probability by
P (R) = Pi1 (Ai1 ) × · · · × Pik (Aik ).
We can then extend the definition of P to all sets in I such that P is finitely-additive on I. By
the Carathéodory Extension Theorem, to show that P can be extended to a unique probability
measure on F, it only remains to verify that P is countably-additive on the algebra I.
For this last step, we will apply Kolmogorov’s Extension Theorem, which requires (Ωn , Fn )
to be sufficiently nice, more precisely, to be Borel spaces. A Borel space is a measurable
space which is isomorphic to (B, B) (i.e., there is a measurable bijection between the two
measurable spaces) for some Borel set B ⊂ R, equipped with the Borel σ-algebra. The Borel
spaces we study in probability theory will mostly be complete separable metric spaces (called
Polish spaces if one keeps the topological structure induced by the metric, but not the metric
itself), equipped with the Borel σ-algebra.
Kolmogorov’s Extension Theorem is much more general than what we need for the construction of product measures on a countable product of Borel spaces. First of all, the product
space can be the product of an arbitrary collection of spaces (possibly uncountably many), as
long as each is a Borel space. The product σ-algebra F is still generated by the cylinder sets,
which depend on a finite number of coordinates. Secondly, the measure P on the algebra I
generated by the cylinder sets, which we try to extend to F, doesn’t need to be of product
form. A generic way of defining a finitely additive probability measure on I is to specify a
consistent family of probability measures on finite products of (Ωn )n∈N , called consistent finite-dimensional distributions.
Definition 2.1 [Consistent Finite-Dimensional Distributions] Let I be an arbitrary
index set (e.g., N or R), and let (Ωi , Fi ), i ∈ I, be measurable spaces. A family of probability
measures PJ (indexed by finite J ⊂ I) on the product space $\Omega_J := \prod_{j \in J} \Omega_j$ is called a
consistent family of finite-dimensional distributions (fdd), if for each L ⊂ J ⊂ I finite, $P_L = P_J \circ \pi_{J,L}^{-1}$,
where πJ,L is the projection from ΩJ to ΩL with πJ,L (ωi , i ∈ J) = (ωi , i ∈ L).
Note that given a family of probability spaces (Ωi , Fi , Pi )i∈I , if we take (PJ , J ⊂ I finite) to
be the product measures on ΩJ , then they are consistent.
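The consistency of product measures can be verified directly on finite spaces. The following is a minimal sketch, assuming three hypothetical marginals on {0, 1} (the names `marginals`, `product_measure`, and `pushforward` are illustrative): it pushes the product measure on ΩJ forward under the projection πJ,L and compares the result with the product measure on ΩL.

```python
from fractions import Fraction
from itertools import product

# Hypothetical one-dimensional marginals on {0, 1}, indexed by i = 1, 2, 3.
marginals = {
    1: {0: Fraction(1, 2), 1: Fraction(1, 2)},
    2: {0: Fraction(1, 3), 1: Fraction(2, 3)},
    3: {0: Fraction(1, 5), 1: Fraction(4, 5)},
}

def product_measure(J):
    """P_J: product measure on Omega_J, stored as {coordinate tuple: mass}."""
    J = sorted(J)
    out = {}
    for point in product(*(marginals[j].keys() for j in J)):
        mass = Fraction(1)
        for j, w in zip(J, point):
            mass *= marginals[j][w]
        out[point] = mass
    return out

def pushforward(PJ, J, L):
    """P_J composed with the inverse of pi_{J,L}: marginalise onto coordinates L."""
    J, L = sorted(J), sorted(L)
    keep = [J.index(l) for l in L]  # positions of the L-coordinates inside J
    out = {}
    for point, mass in PJ.items():
        key = tuple(point[k] for k in keep)
        out[key] = out.get(key, Fraction(0)) + mass
    return out

J, L = {1, 2, 3}, {1, 3}
consistent = pushforward(product_measure(J), J, L) == product_measure(L)
```

Summing out the coordinate in J but not in L multiplies each remaining product by the total mass 1 of the dropped marginal, which is why the check succeeds.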
Theorem 2.2 [Kolmogorov’s Extension Theorem] Let I be an arbitrary index set (e.g.,
N or R), and let (Ωi , Fi ), i ∈ I, be Borel spaces. Let (PJ , J ⊂ I finite) be a consistent
family of finite-dimensional distributions. Then there exists a unique probability measure P
on $\Omega := \prod_{i \in I} \Omega_i$, equipped with the product σ-algebra F, such that $P_J = P \circ \pi_J^{-1}$ for all finite
J ⊂ I, where πJ is the projection from Ω to ΩJ with πJ (ωi , i ∈ I) = (ωi , i ∈ J).
The proof can be found for example in [3, Section 14.3], or in [1] for the case I = N and
Ωi = R.
Kolmogorov’s Extension Theorem is useful for constructing the distribution of an indexed
set of (say R-valued) random variables (Xi )i∈I defined on a common probability space, called a
stochastic process. Examples include discrete-time Markov Chains with I = N0 := {0} ∪ N,
continuous-time Markov processes with I = [0, ∞), random fields indexed by R^d , etc. When
I is uncountable, the product σ-algebra F is typically too small (since it only contains events
which depend on a countable number of coordinates), and many events of interest (such as
{Xi ≤ 1 ∀ i ∈ I}) would not be measurable. What we normally do then is to restrict to a
regular subset E ⊂ R^I (e.g., requiring X· : I → R to be almost surely continuous if I = [0, 1]),
so that events of interest become measurable with respect to the product σ-algebra F on R^I ,
restricted to E.
3 Independence
We now introduce the notion of independence. Let (Ω, F, P) be a probability space.
Definition 3.1 [Independence of Events] We say that A1 , A2 ∈ F are independent if
P(A1 ∩ A2 ) = P(A1 ) · P(A2 ). We say that a collection of events (Ai )i∈I , for some index set I,
are independent if for any finite J ⊂ I, $\mathbb{P}(\bigcap_{j \in J} A_j) = \prod_{j \in J} \mathbb{P}(A_j)$.
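For a finite collection of events on a finite probability space, Definition 3.1 amounts to a product identity for every subfamily of size at least two. The following is a minimal sketch of that check, assuming a hypothetical sample space of two fair coin flips; the function name `are_independent` and the example events are illustrative only.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical sample space: two fair coin flips with the uniform measure.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

def are_independent(events):
    """Check P(intersection) = product of probabilities for every subfamily."""
    for r in range(2, len(events) + 1):
        for sub in combinations(events, r):
            inter = set(omega).intersection(*sub)
            expected = Fraction(1)
            for A in sub:
                expected *= prob(A)
            if prob(inter) != expected:
                return False
    return True

first_heads = {w for w in omega if w[0] == "H"}
second_heads = {w for w in omega if w[1] == "H"}
at_least_one_head = {w for w in omega if "H" in w}

independent_pair = are_independent([first_heads, second_heads])
dependent_pair = are_independent([first_heads, at_least_one_head])
```

The first pair is independent, while the second is not, since P(first flip heads and at least one head) = 1/2 differs from 1/2 · 3/4. Checking every subfamily, rather than only pairs or only the full family, is what the phrase "for any finite J ⊂ I" requires.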
Definition 3.2 [Independence of Random Variables] We say that two random variables
Xi : (Ω, F, P) → (Ei , Bi ), i = 1, 2, are independent if for all B1 ∈ B1 and B2 ∈ B2 ,
P(X1 ∈ B1 , X2 ∈ B2 ) = P(X1 ∈ B1 )P(X2 ∈ B2 ).
We say that a collection of random variables Xi : (Ω, F, P) → (Ei , Bi ), i ∈ I for some index
set I, are independent if for any finite J ⊂ I and any Bj ∈ Bj ,
\[ \mathbb{P}(X_j \in B_j \text{ for } j \in J) = \prod_{j \in J} \mathbb{P}(X_j \in B_j). \]
Remark 3.3 Note that the notion of independence of random variables is a generalization
of the independence of events, if we identify each event Ai with the random variable Xi (ω) =
1Ai (ω).
Exercise 3.4 [Pairwise Independence vs Joint Independence] The pairwise independence of a collection of random variables does not imply their joint independence. Construct
an example with three random variables.
Exercise 3.5 Let Xi : (Ω, F, P) → (Ei , Bi ), i ∈ I, be a collection of independent random
variables. Let fi : (Ei , Bi ) → (R, B(R)) be measurable. Prove that (fi (Xi ))i∈I is also a
collection of independent random variables.
Note that the notion of independence of a collection of random variables (Xi )i∈I is really
a statement about the independence of sets of the form Fi := {Xi⁻¹(Bi ) : Bi ∈ Bi }, for i ∈ I.
Namely, (Xi )i∈I are independent by definition if and only if for any finite J ⊂ I, if we pick a
set Aj from each Fj , j ∈ J, then (Aj )j∈J is a collection of independent events.
Exercise 3.6 [σ-algebra Generated by a Random Variable] Let X : (Ω, F, P) → (E, B)
be a random variable. Prove that σ(X) := {X⁻¹(B) : B ∈ B} ⊂ F is a σ-algebra on Ω, which
is called the σ-algebra generated by the random variable X. Also show that σ(X) is the
smallest σ-algebra G on Ω which makes X : (Ω, G) → (E, B) measurable.
Therefore each Fi := {Xi⁻¹(Bi ) : Bi ∈ Bi } is a σ-algebra. This suggests that we can extend
the notion of independence of a collection of random variables to a collection of σ-algebras.
Definition 3.7 [Independence of σ-algebras] We say that a collection of σ-algebras Fi ⊂
F, i ∈ I for some index set I, are independent if for any finite J ⊂ I and any Aj ∈ Fj , the
collection of events (Aj )j∈J are independent. If we do not assume Fi ⊂ F to be σ-algebras,
then this defines the notion of independence for a collection of sets of events.
The independence of a collection of random variables (Xi )i∈I is then equivalent to the independence of the σ-algebras (σ(Xi ))i∈I generated by these random variables.
Exercise 3.8 Use the π-λ Theorem to show that if Gi ⊂ F, i ∈ I, is an independent collection
of sets of events, each being closed under intersection, then σ(Gi ), i ∈ I, is an independent
collection of σ-algebras.
Remark 3.9 The above exercise shows that to prove that a collection of R-valued random
variables (X1 , . . . , Xn ) are independent, it suffices to verify that the events {Xi ≤ ai }, 1 ≤
i ≤ n, are independent for any (a1 , . . . , an ) ∈ Rⁿ. This is because the collection of intervals
{(−∞, a] : a ∈ R} forms a π-system that generates the Borel σ-algebra on R, which in turn
implies that {X⁻¹((−∞, a]) : a ∈ R} is a π-system that generates σ(X) for any R-valued
random variable X.
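For two random variables with finitely many values, the criterion of Remark 3.9 can be compared with the full definition. The sketch below rests on one assumption that should be stated plainly: for finitely supported laws the distribution functions are step functions, so it is enough to test the thresholds (a, b) at the atoms. The joint laws and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def cdf_factorizes(joint):
    """Check P(X <= a, Y <= b) = P(X <= a) P(Y <= b) at all atoms (a, b)."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})

    def mass(pred):
        return sum((p for (x, y), p in joint.items() if pred(x, y)), Fraction(0))

    return all(
        mass(lambda x, y: x <= a and y <= b)
        == mass(lambda x, y: x <= a) * mass(lambda x, y: y <= b)
        for a, b in product(xs, ys)
    )

def law_factorizes(joint):
    """Check P(X = x, Y = y) = P(X = x) P(Y = y) for all points (x, y)."""
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum((p for (u, _), p in joint.items() if u == x), Fraction(0)) for x in xs}
    py = {y: sum((p for (_, v), p in joint.items() if v == y), Fraction(0)) for y in ys}
    return all(joint.get((x, y), Fraction(0)) == px[x] * py[y] for x, y in product(xs, ys))

# A product law (independent) and a law concentrated on the diagonal (dependent).
mx = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}
my = {0: Fraction(1, 4), 1: Fraction(3, 4)}
independent_joint = {(x, y): mx[x] * my[y] for x, y in product(mx, my)}
dependent_joint = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}
```

On both examples the two checks agree, which is what the π-system argument guarantees in general.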
We now study the joint distribution of independent random variables, which will also lead
to a way of constructing independent random variables.
Theorem 3.10 [Joint Distributions of Independent Random Variables] Let Xi , 1 ≤
i ≤ n, be a collection of independent random variables defined on the probability space (Ω, F, P),
and taking values respectively in the measurable space (Ei , Bi ). Then X := (X1 , . . . , Xn ) is a
random variable taking values in the product space E := E1 × E2 × · · · × En , equipped with the
product σ-algebra B. The distribution of X on E, P ◦ X⁻¹, is the product measure of P ◦ Xi⁻¹
on Ei , 1 ≤ i ≤ n. The same conclusion holds for an arbitrary collection of independent
random variables (Xi )i∈I , if each Xi takes its values in a Borel space (Ei , Bi ).
Proof. Since rectangles of the form A := A1 × A2 × · · · × An , with Ai ∈ Bi , generate the
product σ-algebra on E, and $X^{-1}(A) = \bigcap_{i=1}^{n} X_i^{-1}(A_i) \in \mathcal{F}$, it follows that X⁻¹(B) ∈ F for
all B ∈ B, and hence X is measurable.
To identify the measure P ◦ X⁻¹ on (E, B), we first identify its value on measurable
rectangles A as introduced above. By the independence assumption,
\[ (\mathbb{P} \circ X^{-1})(A) = \mathbb{P}(X_i \in A_i,\ 1 \le i \le n) = \prod_{i=1}^{n} \mathbb{P}(X_i \in A_i), \]
which coincides with the product measure of P ◦ Xi⁻¹ on Ei , 1 ≤ i ≤ n. Since measurable
rectangles generate the product σ-algebra B on E, it follows by the uniqueness part of Carathéodory’s Extension
Theorem that P ◦ X⁻¹ must coincide with the product measure.
The case for an arbitrary collection of independent random variables (Xi )i∈I , each taking
its values in a Borel space, follows from Kolmogorov’s Extension Theorem.
Remark 3.11 Theorem 3.10 suggests a general way of constructing independent random
variables. To construct a collection of independent random variables X1 , . . . , Xn with respective distribution Pi on a measurable space (Ei , Bi ), we can just take (Ω, F, P) to be (E, B, P ),
where E := E1 × E2 × · · · × En , B is the product σ-algebra on E, P := P1 × · · · × Pn is
the product measure on (E, B), and let Xi : E → Ei be the coordinate projection map. The
same procedure constructs an arbitrary collection of independent random variables (Xi )i∈I ,
provided that each takes its values in a Borel space.
Exercise 3.12 Let X and Y be two independent R-valued random variables defined on a
probability space (Ω, F, P). Let f, g : R → R be measurable such that E[|f (X)|] < ∞ and
E[|g(Y )|] < ∞. Prove that E[f (X)g(Y )] = E[f (X)]E[g(Y )].
Using Theorem 3.10, we can determine the distribution of the sum of two independent
R-valued random variables X and Y as follows. Let µX and µY denote respectively the
distribution of X and Y on R. Then (X, Y ) has distribution µ := µX × µY on R². The
measure µ ◦ f⁻¹ induced by f (x, y) = x + y is then the distribution of the random variable
X + Y . We will denote µ ◦ f⁻¹ by µX ∗ µY , and call it the convolution of µX and µY . For
any A ∈ B(R), by Fubini, we then have
\[ (\mu_X * \mu_Y)(A) = \iint_{\mathbb{R}^2} 1_{\{x+y \in A\}}\, \mu_X(dx)\, \mu_Y(dy) = \int_{\mathbb{R}} \mu_X(A - y)\, \mu_Y(dy) = \int_{\mathbb{R}} \mu_Y(A - x)\, \mu_X(dx). \]
If µX and µY are absolutely continuous w.r.t. Lebesgue measure with density fX and fY
respectively, then it is easy to see that µX ∗ µY is also absolutely continuous w.r.t. Lebesgue
measure, with density
\[ (f_X * f_Y)(u) = \int_{\mathbb{R}} f_X(u - y)\, f_Y(y)\, dy = \int_{\mathbb{R}} f_Y(u - x)\, f_X(x)\, dx, \]
which is the usual notion of convolution of functions.
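The same formula applies to discrete distributions, where the integrals become sums: (µX ∗ µY )({s}) is the sum over y of µX ({s − y}) µY ({y}). As a minimal sketch, assuming finitely supported laws stored as dictionaries (the function name `convolve` and the fair-die example are illustrative), the distribution of the sum of two independent fair dice can be computed as follows.

```python
from fractions import Fraction

def convolve(mu_x, mu_y):
    """Law of X + Y for independent X ~ mu_x and Y ~ mu_y (finitely supported)."""
    out = {}
    for x, px in mu_x.items():
        for y, py in mu_y.items():
            # Mass px * py of the pair (x, y) under the product measure
            # is transported to the point x + y.
            out[x + y] = out.get(x + y, Fraction(0)) + px * py
    return out

# Hypothetical example: a fair six-sided die.
die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)
```

The double loop is exactly the pushforward of the product measure µX × µY under (x, y) ↦ x + y, so `two_dice` is the law of the total of two independent rolls.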
References
[1] R. Durrett. Probability: Theory and Examples, Duxbury Press.
[2] H. Friedman. A consistent Fubini-Tonelli theorem for nonmeasurable functions. Illinois J.
Math. 24, 390–395, 1980.
[3] A. Klenke. Probability Theory–A Comprehensive Course, Springer-Verlag.