Chapter 3
Continuum Probability
and Sets of Measure Zero
In this chapter, we motivate the use of measure theory as a foundation for
probability. We use the example of random coin tossing to explain why we need to move
past discrete probability theory and to work out what the new foundation (which has yet to be developed) must provide. Presuming that we can indeed create the necessary
theoretical foundation, we show some important consequences that result. This is intended to justify the investment we have to make in rigorous analysis in the following
chapters.
We do not show that the required theoretical foundation exists in this chapter! This is
meant to be a fun and engaging introduction to the thought processes involved in measure-theoretic probability. Moreover, it shows that even a vague idea of measure
makes it possible to state and prove deep results. Enjoy this chapter, as the next
chapter provides all the heavy-going theory and proof a reader could want! After developing rigorous measure theory, we revisit the material in this chapter to verify that
everything discussed is indeed rigorously justified.
3.1 Probability and sets of real numbers
We begin by developing a connection between a probability space with an infinite number
of points and an interval of real numbers. With this correspondence, we can then develop
a systematic method for computing probabilities of events in the probability space by
measuring the sizes of corresponding sets of real numbers. However, it turns out that
perfectly reasonable probability questions correspond to very complicated sets of real
numbers. Thus, we first need to develop a way to measure the size of rather unusual sets
of real numbers.
3.1.1 Bernoulli sequences and the unit interval
Definition 3.1.1. Suppose an experiment has two possible outcomes and the probabilities of
these outcomes are fixed. A finite number of independent trials of the experiment is called a
Bernoulli trial. An infinite sequence of independent trials is called a Bernoulli sequence.
Example 3.1.1. Let the experiment be the toss of a two-sided coin, with heads denoted H
and tails denoted T. An example of a Bernoulli sequence is
H,T,T,H,H,H,H,T,H,T,T,H,H,T,T,T,T,H,T,H,T,T,....
Definition 3.1.2. We define the space of Bernoulli sequences,
B = {all Bernoulli sequences generated by a particular experiment} .
We use H and T to denote the two outcomes.
For simplicity, we mostly treat the case where the outcomes have equal probability of occurring, i.e. corresponding to a “fair coin”. In general, the two results may have different
probabilities.
We show that B can (almost) be represented by the real numbers in (0, 1], which
implies that B is uncountable.
Theorem 3.1.1. If we delete a countable subset of B, we can index the remaining points
using the numbers in (0, 1].
Recall that by index, we mean there is a 1 − 1 correspondence between the two sets.
Proof. We construct a map from (0, 1] to B that fails to be onto by a countable subset.
Any point x ∈ [0, 1] can be written as an expansion in base 2, or binary expansion,
$$x = \sum_{i=1}^{\infty} \frac{a_i}{2^i}, \qquad a_i = 0 \text{ or } 1.$$
Each such expansion corresponds to a Bernoulli sequence. To see this, define the $n$th term
of the Bernoulli sequence to be H when $a_n = 1$ and T when $a_n = 0$.
Example 3.1.2.
H , T , H , H , H , T , T , H , T , T , H , · · · ↔ 0.10111001001 · · · .
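The correspondence in Example 3.1.2 can be checked on a finite prefix with a short Python sketch. The function names `prefix_to_digits` and `partial_sum` are our own illustrative choices, and only a finite prefix of the infinite sequence is handled.

```python
from fractions import Fraction

def prefix_to_digits(tosses):
    """Map a string of 'H'/'T' outcomes to binary digits (H -> 1, T -> 0)."""
    return [1 if t == "H" else 0 for t in tosses]

def partial_sum(tosses):
    """Exact value of the partial sum of a_i / 2^i over the given finite prefix."""
    digits = prefix_to_digits(tosses)
    return sum((Fraction(a, 2 ** i) for i, a in enumerate(digits, start=1)), Fraction(0))

prefix = "HTHHHTTHTTH"
print("0." + "".join(str(a) for a in prefix_to_digits(prefix)))
print(partial_sum(prefix))
```

Exact rational arithmetic is used so that no rounding obscures the binary digits.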
A problem with using real numbers as an index set is the fact that some numbers
do not have a unique binary expansion but we consider two Bernoulli sequences with
different members to be distinct.
Example 3.1.3. $\frac{1}{2} = 0.1000\ldots$ and $\frac{1}{2} = 0.0111\ldots$, but $HTTTT\cdots \neq THHH\cdots$.
Thus, the method above used to generate a Bernoulli sequence does not define a function into B. To avoid this trouble, we adopt the convention that if the real number x has
terminating and non-terminating binary expansions, we use the non-terminating expansion. This is the reason for using (0, 1] instead of [0, 1].
With this convention, the method above defines a 1 − 1 map into B that is not onto
because it does not produce Bernoulli sequences ending in all T’s. We claim that the
set $B_T$ of such Bernoulli sequences is countable. Let $B_{T_k}$ be the finite set of Bernoulli
sequences that have only T’s after the $k$th term. We have
$$B_T = \bigcup_{k=1}^{\infty} B_{T_k}. \qquad (3.1)$$
This implies that $B_T$ is countable and there is a 1 − 1 and onto correspondence between
(0, 1] and $B \setminus B_T$.
Proof Comment 3.1. The decomposition of a countable set as a countable union of finite
sets in (3.1) is a standard measure theory argument.
3.1.2 Initial encounter with measure
Since B is uncountable and BT is countable, we would like to “ignore” BT for all practical purposes and identify B with I = (0, 1]. Likewise, it turns out to be convenient to
measure the size of any finite or countable subset of I as “negligible” compared to the size
of I , which has a number of important ramifications. This is the first motivating example
for devising a way to measure the size of sets of real numbers that applies to complex sets.
Lebesgue developed an approach to measure the sizes of complex sets of real numbers
that is the basis for measure theory. Measure theory can be developed in a very abstract
way that applies to spaces of many different kinds of objects, though we focus on spaces
consisting of real numbers in this book. In that context, it is initially reasonable to think
of measure as a generalization of length in one dimension, and area and volume in higher
dimensions. But, we also caution that measures can have other interpretations. For example, we use measure to quantify probability later on.
To fit common conceptions of measuring the sizes of sets, at a minimum, a measure
µ should satisfy some properties.
Definition 3.1.3 (First Wish List for Measures). A measure µ is a real-valued function
defined on a collection of subsets of a space X called the measurable sets. If A is a measurable
set, µ(A) is the measure of A. At a minimum, the structure must satisfy:
(Non-negativity) µ should be non-negative.
(Closed under finite unions) If $\{A_i\}_{i=1}^{n}$ is a finite collection of disjoint measurable sets,
then $\bigcup_{i=1}^{n} A_i$ is measurable.
(Finite-additivity) If $\{A_i\}_{i=1}^{n}$ is a finite collection of disjoint measurable sets, then
$$\mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} \mu(A_i).$$
Thus, a measure is a non-negative finitely additive set function, just like a probability
function. There should be a connection here.
We pay particular attention to the case of real numbers:
Definition 3.1.4. If the space X is an interval of real numbers and the measurable sets include
intervals for which
µ ((a, b )) = µ ([a, b ]) = µ ((a, b ]) = µ ([a, b )) = b − a,
a, b ∈ X,
we call µ the Lebesgue measure on X and write µ = µL .
Note that this implies that the measure of a set of a single point is zero, i.e.,
µL ({a}) = 0.
3.1.3 Assigning probabilities to events in B
So far, we have identified B with the interval of real numbers I and have introduced the
desirability of a general way to measure the sizes of sets in I and some properties that such
a measure should have. The next step is to assign a system for computing probabilities of
events in B using the measure. For simplicity, we consider the case when T and H occur
with equal probability.
To start from what we know, we first consider the space consisting of a Bernoulli trial
of finite length n. The probability of H as the first outcome in any trial is .5, and likewise
the probability of T as the first outcome in any trial. This can be computed using simple
counting over all possible trials of length n. Unfortunately, we cannot make a counting
argument in the case of B, though intuition suggests that the probabilities are also .5.
Switching to sets of real numbers, if $A_H$ is the event in B consisting of sequences
where H is the first outcome, the corresponding set in I = (0, 1] is
$$I_{A_H} = \{x \in I : x = 0.1a_2a_3\ldots,\ a_i = 0 \text{ or } 1\} = (.5, 1].$$
(Note that the largest number not in $I_{A_H}$ is $0.1000\ldots$ while the largest number in $I_{A_H}$
is $0.1111\ldots$.) We do not include 1/2 because we use non-terminating expansions. Likewise, if $A_T$ is the event where T occurs as the first outcome, then $I_{A_T} = (0, .5]$. We have
$\mu_L(I_{A_H}) = \mu_L(I_{A_T}) = .5$. In this case, based on the fact that $I_{A_H}$ and $I_{A_T}$ have equal
measures, it seems reasonable to assign the probabilities
$$P(A_H) = \mu_L(I_{A_H}) = .5 \quad\text{and}\quad P(A_T) = \mu_L(I_{A_T}) = .5.$$
Next, if we consider the events $A_{HH}, A_{HT}, A_{TH}, A_{TT}$ in B in which the first two outcomes HH, HT, TH, TT are specified, the corresponding intervals are
$$I_{A_{TT}} = (0, .25],\quad I_{A_{TH}} = (.25, .5],\quad I_{A_{HT}} = (.5, .75],\quad I_{A_{HH}} = (.75, 1].$$
Since these intervals have equal length, we assign the probability of .25 to each and to
each corresponding event. We can continue with this argument, considering the events
corresponding to specification of the first three outcomes, then the first four outcomes,
and so on. Considering the events in which the first n outcomes are specified, we obtain
$2^n$ intervals of equal length, and assign equal probability $2^{-n}$ to each interval and thus
each event.
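The passage from a specified run of first outcomes to a subinterval of I can be sketched in Python as follows. This is a minimal illustration under the non-terminating expansion convention used above; the helper name `prefix_interval` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prefix_interval(tosses):
    """Return (lo, hi) describing the half-open interval (lo, hi] of all x in (0, 1]
    whose non-terminating binary expansion begins with the given tosses (H -> 1, T -> 0)."""
    lo = Fraction(0)
    for i, t in enumerate(tosses, start=1):
        if t == "H":
            lo += Fraction(1, 2 ** i)
    return lo, lo + Fraction(1, 2 ** len(tosses))

# The four events in which the first two outcomes are specified.
for pair in product("TH", repeat=2):
    lo, hi = prefix_interval("".join(pair))
    print("".join(pair), (float(lo), float(hi)), "length", hi - lo)
```

Each interval has length $2^{-n}$ for a prefix of length n, which is the probability assigned to the corresponding event.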
In this way, we obtain a sequence of “binary” partitions $T_n$ of I into $2^n$ nonoverlapping
subintervals $I_{n,j}$ of equal length such that $I = \bigcup_{j=1}^{2^n} I_{n,j}$, see Fig. 3.1. We assign equal probabilities to each subinterval in a given partition and to the corresponding
events. Moreover, it appears that any interval $(a, b] \subset I$ can be “approximated” arbitrarily
well by $\bigcup_{I_{n,j} \subset (a,b]} I_{n,j}$ in the sense that the intervals of points not in the approximation,
$(a, b] \setminus \bigcup_{I_{n,j} \subset (a,b]} I_{n,j}$, shrink in size as n increases, see Fig. 3.1.
In view of the Wish List 3.1.3 and the fact that µL (I ) = 1, we extend these observations to a general principle of modeling.
Axiom 3.1.1 (The Measure Theory Model for Probability on B). If A is an event
in B, we let $I_A$ denote the corresponding set of real numbers in (0, 1]. Then, we assign the
probability of A, denoted by P(A), to be $\mu_L(I_A)$.
All of this discussion is terribly vague, since we have not defined $\mu_L$, described the
collection of measurable sets, or quantified the sense of approximation of sets observed
above! But, we verify these ideas are useful in some simple examples below and show that
they lead to stating and proving important theorems in the next couple of sections.
Figure 3.1. Illustration of the sequence of “binary” partitions $T_n$ of I. We illustrate an
approximation of the interval (a, b) by subintervals in $T_5$.
Example 3.1.4. Consider the event A in which H is the $n$th outcome. Then,
$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_{n-1}1a_{n+1}a_{n+2}\ldots,\ a_i = 0 \text{ or } 1\}.$$
Let $s = 0.a_1a_2\ldots a_{n-1}1$, so $I_A$ contains $(s, s + 2^{-n}]$. We can choose $a_1, a_2, \ldots, a_{n-1}$ in $2^{n-1}$
different ways and the resulting intervals are disjoint from one another, so we use finite
additivity to conclude that
$$P(A) = \mu_L(I_A) = 2^{n-1} \cdot \frac{1}{2^n} = \frac{1}{2}.$$
(As a concrete example, consider n = 3. Then, we have the following cases: TTH, THH,
HTH, HHH, corresponding to 4 disjoint intervals of length $1/2^3$, and P(A) = 4/8 = 1/2.)
Example 3.1.5. Let A be the event where exactly i of the first n outcomes are H, so
$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_na_{n+1}\cdots,\ \text{exactly } i \text{ of the first } n \text{ digits are 1 and the digits } a_{n+1}, a_{n+2}, \ldots \text{ are 0 or 1}\}.$$
Choose $a_1, \ldots, a_n$ so exactly i are 1 and set $s = 0.a_1a_2\ldots a_n$. $I_A$ contains $(s, s + 2^{-n}]$. The
intervals corresponding to different choices of $a_1, \ldots, a_n$ are disjoint and there are exactly
$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}$$
such intervals. So
$$P(A) = \mu_L(I_A) = \binom{n}{i} \cdot \frac{1}{2^n}.$$
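The counting in Examples 3.1.4 and 3.1.5 can be confirmed by brute force for small n. The sketch below, with function names of our own choosing, enumerates all $2^n$ digit prefixes, adds up the lengths $2^{-n}$ of the matching dyadic intervals, and compares the result with the closed-form values.

```python
from fractions import Fraction
from itertools import product
from math import comb

def measure_exactly_i_heads(n, i):
    """Total length of the dyadic intervals whose first n digits contain exactly i ones."""
    count = sum(1 for digits in product((0, 1), repeat=n) if sum(digits) == i)
    return Fraction(count, 2 ** n)

def measure_head_at(n):
    """Total length of the dyadic intervals whose n-th digit equals 1 (Example 3.1.4)."""
    count = sum(1 for digits in product((0, 1), repeat=n) if digits[-1] == 1)
    return Fraction(count, 2 ** n)

n, i = 5, 2
print(measure_exactly_i_heads(n, i), Fraction(comb(n, i), 2 ** n))
print(measure_head_at(3))
```

The enumeration is exponential in n, so it is only meant as a sanity check of finite additivity on small cases.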
3.1.4 Recapping the construction of the model
We note that there are actually two modeling steps involved with Axiom 3.1.1:
Step 1 The adoption of the measure formulation for probability, which gives a procedure
for computing probabilities of events;
Step 2 The assignment of specific probabilities to events in B, i.e. $P(A) = \mu_L(I_A)$ for
A ⊂ B.
Step 1 is a proposal for how to carry out stochastic computations in a probability space
with an infinite number of points. This use of measure theory is not entirely free from
controversy and there are alternative proposed frameworks. But it is fair to say that the
proposal of measure theory as a foundation for probability by Kolmogorov stands as one
of the great mathematical achievements of the Twentieth Century. The worthiness of
measure theory as a framework for probability is demonstrated in part by the ability to
state and prove important probabilistic results. We present a couple of examples in the
next two sections and many examples in later chapters.
The assigning of probabilities in Step 2 is subject to perhaps a greater degree of controversy. Partly, this is due to the fact that “randomness” is used to model various situations,
including systems that are truly stochastic in nature and systems whose state is unknown
but not truly stochastic. Even if a system is random, there may be limited information on
the probability values of different events, and when there is information, it is often based
on a finite set of observations. Above, we extrapolated to define P (E) = µL (IE ) working
from a finite set of examples.
We conclude by noting that the model derived in this section can be applied to a
variety of situations.
Example 3.1.6. We can use I as an index set for the points in the space corresponding to the
random throw of a dart onto the interval I, and it can index the time of arrival of a single α
particle during a unit interval of time.
We can also extend these ideas to higher dimensions, e.g., by considering a square dart
board.
3.1.5 Numerical simulation
References
Exercises
3.2 The Weak Law of Large Numbers
Continuing the program of motivating measure theory as a model for probability in B,
we use it to state and prove some important results in probability. Of course, we have not
shown that it is possible to derive measures yet and we have only described properties of
measures under a lot of restrictions. But, we tackle those issues later. In the meantime,
we begin by revisiting the Law of Large Numbers.
Recall that intuition suggests that it should be possible to detect the probabilities of
H and T in B by examining the outcomes of many repetitions of the experiment. In
particular, the number of times that H occurs in a large number of trials should be related
to the probability of H . However, as discussed earlier, a precise statement of this intuition
is difficult to formulate. Assuming the probability of H is p and $S_n$ is the number of H’s
that occur in the first n trials, then if we could show that
$$\lim_{n\to\infty} \frac{S_n}{n} = p,$$
then this would be a mathematical statement expressing the intuition. But such a result
is certainly false. A sequence of experiments could yield outcomes of all H ’s for example.
So, we need to create a careful formulation.
To make things simple, we assume that the probabilities of H and T are both .5.
To state and prove the desired result, we introduce some functions.
Definition 3.2.1. A random variable is a function on the outcomes of an experiment.
The name “random variable” is a rather disconcerting name to assign to a function! Expressing and proving results in probability by using random variables is a supremely important technique.
Definition 3.2.2. For x ∈ I, define the random variable
$$S_n(x) = a_1 + \cdots + a_n, \quad \text{where } x = 0.a_1a_2\cdots a_n\cdots.$$
$S_n$ gives the number of heads in the first n experiments of the Bernoulli sequence corresponding to x.
Definition 3.2.3. Given δ > 0, define
$$I_n = \Big\{x \in I : \Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \delta\Big\}. \qquad (3.2)$$
Roughly speaking, this is the event consisting of outcomes for which there are not approximately the same number of H and T after n trials, where δ quantifies the discrepancy.
We prove
Theorem 3.2.1 (Weak Law of Large Numbers for Bernoulli Sequences). For fixed δ > 0,
$$\mu_L(I_n) \to 0 \text{ as } n \to \infty. \qquad (3.3)$$
An observant reader should be uncomfortable at this conclusion, because In is an apparently complicated set, and we have not yet specified a procedure for computing the
measure µL of complicated sets. Fortunately, during the proof, it becomes apparent that
In is actually a finite collection of nonoverlapping intervals for which µL is defined. By
definition, (3.3) implies that for any fixed δ > 0, given any ε > 0,
$$\mu_L\Big(\Big\{x \in I : \Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \delta\Big\}\Big) < \varepsilon$$
for all sufficiently large n. Identifying $\mu_L$ with P, we see that (3.3) extends the earlier
Law of Large Numbers (2.4) to B.
Figure 3.2. Plots of the first three Rademacher functions.
Remark 3.1. The idea of measuring the size of the set where a function takes a specified range
of values is central to measure theory. However, such a set is not a finite collection of disjoint
intervals in general.
To prove the result, we reformulate it using two new random variables.
Definition 3.2.4. For x ∈ I, we define the $i$th Rademacher function by
$$R_i(x) = 2a_i - 1, \qquad x = 0.a_1a_2\cdots.$$
Equivalently,
$$R_i(x) = \begin{cases} 1, & a_i = 1, \\ -1, & a_i = 0. \end{cases}$$
We plot some of these functions in Fig. 3.2. $R_i$ has a useful interpretation. Suppose we
bet on a sequence of coin tosses such that at each toss, we win 1 dollar if it is heads and lose 1 dollar
if it is tails. Then $R_i(x)$ is the amount won or lost at the $i$th toss in the sequence of tosses
represented by x.
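To make the definition concrete, here is a small Python sketch that evaluates the binary digits and the Rademacher functions at a rational point of I using exact arithmetic. The names `binary_digit`, `rademacher`, and `winnings` are illustrative, and the digit formula relies on the non-terminating expansion convention: x lies in exactly one interval $(k/2^i, (k+1)/2^i]$, and $a_i$ is the parity of k.

```python
from fractions import Fraction
from math import ceil

def binary_digit(x, i):
    """i-th digit of the non-terminating binary expansion of x in (0, 1]."""
    k = ceil(x * 2 ** i) - 1  # index of the dyadic interval (k/2^i, (k+1)/2^i] containing x
    return k % 2

def rademacher(x, i):
    """R_i(x) = 2 a_i - 1: +1 for heads at toss i, -1 for tails."""
    return 2 * binary_digit(x, i) - 1

def winnings(x, n):
    """Net amount won after n tosses: the sum of R_1(x), ..., R_n(x)."""
    return sum(rademacher(x, i) for i in range(1, n + 1))

x = Fraction(3, 4)  # 0.10111... in the non-terminating expansion
print([binary_digit(x, i) for i in range(1, 6)])
print([rademacher(x, i) for i in range(1, 6)], winnings(x, 5))
```

Floating-point inputs would lose digits beyond machine precision, which is why the sketch assumes `Fraction` arguments.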
The next random variable is the following.
Definition 3.2.5. We define $W_n(x) = \sum_{i=1}^{n} R_i(x)$.
Following the interpretation of $R_i$, $W_n$ gives the total amount won or lost after the $n$th
toss in the betting game described above. By the definition of $R_i$,
$$W_n(x) = 2(a_1 + a_2 + \cdots + a_n) - n = 2S_n(x) - n, \qquad x = 0.a_1a_2a_3\cdots.$$
Now, for x ∈ I,
$$\Big|\frac{S_n(x)}{n} - \frac{1}{2}\Big| > \varepsilon \iff |2S_n(x) - n| > 2\varepsilon n,$$
or in other words, if and only if
$$|W_n(x)| > 2\varepsilon n. \qquad (3.4)$$
Figure 3.3. We illustrate a typical set in Chebyshev’s inequality.
Note that since ε is arbitrary, the factor 2 is immaterial.
Definition 3.2.6. We define
$$A_n = \{x \in I : |W_n(x)| > n\varepsilon\}.$$
We can prove Theorem 3.2.1 by showing that
$$\mu_L(A_n) \to 0 \text{ as } n \to \infty. \qquad (3.5)$$
To do this, we use a special version of an important result.
Theorem 3.2.2 (Special Case of Chebyshev’s Inequality). Let f be a non-negative,
piecewise constant function on I and α > 0 be a positive real number. Then,
$$\mu_L(\{x \in I : f(x) > \alpha\}) \le \frac{1}{\alpha}\int_0^1 f(x)\,dx,$$
where the integral is the standard Riemann integral, which is well defined for piecewise constant, nonnegative functions.
We illustrate the theorem in Fig. 3.3.
Proof. [Theorem 3.2.2] Since f is piecewise constant, there is a mesh $0 = x_1 < x_2 < \cdots < x_n = 1$
such that $f(x) = c_i$ for $x_i < x \le x_{i+1}$ for $1 \le i \le n - 1$. Then since f is
nonnegative,
$$\int_0^1 f(x)\,dx = \sum_{i=1}^{n-1} c_i(x_{i+1} - x_i) \ge \sum_{\substack{i=1 \\ c_i > \alpha}}^{n-1} c_i(x_{i+1} - x_i) \ge \alpha \sum_{\substack{i=1 \\ c_i > \alpha}}^{n-1} (x_{i+1} - x_i) = \alpha\,\mu_L(\{x \in I : f(x) > \alpha\}).$$
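As a numerical illustration of the inequality for piecewise constant functions, the following Python sketch compares the measure of the set where f exceeds α with the integral of f divided by α. The mesh, the values, and the function names are made-up examples rather than anything from the text.

```python
from fractions import Fraction

def integral(mesh, values):
    """Riemann integral of the function equal to values[i] on (mesh[i], mesh[i+1]]."""
    return sum((c * (b - a) for a, b, c in zip(mesh, mesh[1:], values)), Fraction(0))

def measure_above(mesh, values, alpha):
    """Total length of the pieces on which the function exceeds alpha."""
    return sum((b - a for a, b, c in zip(mesh, mesh[1:], values) if c > alpha), Fraction(0))

mesh = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]
values = [Fraction(3), Fraction(0), Fraction(1)]  # f = 3, 0, 1 on the three pieces
alpha = Fraction(2)

lhs = measure_above(mesh, values, alpha)
rhs = integral(mesh, values) / alpha
print(lhs, rhs, lhs <= rhs)
```

The check mirrors the proof: only the pieces with $c_i > \alpha$ contribute to the left side, and each contributes at least α times its length to the integral.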
Now we are ready to prove Theorem 3.2.1.
Proof. We can also describe the set $A_n$ as
$$A_n = \{x \in I : W_n^2(x) > n^2\varepsilon^2\},$$
where $W_n^2(x)$ is piecewise constant and non-negative. By Theorem 3.2.2,
$$\mu_L(A_n) \le \frac{1}{n^2\varepsilon^2}\int_0^1 W_n^2(x)\,dx.$$
We compute
$$\int_0^1 W_n^2(x)\,dx = \int_0^1 \Big(\sum_{i=1}^{n} R_i(x)\Big)^2 dx = \sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx + \sum_{\substack{i,j=1 \\ i \ne j}}^{n} \int_0^1 R_i(x)R_j(x)\,dx.$$
The first integral on the right is easy since $R_i^2(x) = 1$ for all x, so
$$\sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx = n.$$
We consider $\int_0^1 R_i(x)R_j(x)\,dx$ when $i \ne j$. Without loss of generality, we assume i < j.
Set J to be the interval
$$J = \Big(\frac{\ell}{2^i}, \frac{\ell+1}{2^i}\Big], \qquad 0 \le \ell < 2^i.$$
$R_i$ is constant on J while $R_j$ alternates in sign across the $2^{j-i}$ subintervals of J of length $2^{-j}$. Because this is an even number of
subintervals of equal length, cancellation implies
$$\int_J R_i(x)R_j(x)\,dx = R_i\big|_J \int_J R_j(x)\,dx = 0.$$
Summing over the $2^i$ intervals J, which partition I, gives
$$\int_0^1 R_i(x)R_j(x)\,dx = 0, \qquad i \ne j.$$
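Thus, $\int_0^1 W_n^2(x)\,dx = n$, and
$$\mu_L(A_n) \le \frac{1}{n^2\varepsilon^2}\,n = \frac{1}{n\varepsilon^2} \;\Rightarrow\; \mu_L(A_n) \to 0 \text{ as } n \to \infty,$$
which establishes (3.5).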
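The two quantities in this proof can be computed exactly for moderate n. The sketch below assumes the fair coin and uses the fact, from Example 3.1.5, that $W_n = 2k - n$ on a set of measure $\binom{n}{k}/2^n$; the function names are our own.

```python
from fractions import Fraction
from math import comb

def second_moment(n):
    """Exact integral over I of W_n(x)^2, summing over the number k of heads."""
    return sum(Fraction(comb(n, k), 2 ** n) * (2 * k - n) ** 2 for k in range(n + 1))

def measure_large_deviation(n, eps):
    """Exact Lebesgue measure of the set where |W_n(x)| > n * eps."""
    return sum((Fraction(comb(n, k), 2 ** n) for k in range(n + 1)
                if abs(2 * k - n) > n * eps), Fraction(0))

eps = Fraction(1, 5)
for n in (10, 100, 1000):
    bound = 1 / (n * eps ** 2)
    print(n, second_moment(n), float(measure_large_deviation(n, eps)), float(bound))
```

In this experiment the exact measure decays much faster than the bound $1/(n\varepsilon^2)$, which is consistent with the bound being sufficient but far from sharp.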
The random variables introduced for this proof can be used to quantify other interesting questions.
Example 3.2.1. Suppose in the betting game above, we start with M dollars. We compute an
expression that yields the probability we lose all the money.
If $A_n$ is the event where we lose the money on the $n$th toss, then the corresponding set of
numbers is
$$I_{A_n} = \{x \in I : W_i(x) > -M \text{ for } i < n \text{ and } W_n(x) = -M\}.$$
The set $I_{A_n}$, determined by where a function has prescribed values, is generally complicated.
The event A of losing all the money, given by
$$I_A = \bigcup_{n=1}^{\infty} I_{A_n},$$
is even more complicated. The probability of A is $\mu_L(I_A)$, once we figure out how that is
computed.
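Although $I_A$ is complicated, each $I_{A_n}$ is a finite union of dyadic intervals, so its measure can be computed exactly by tracking how much of I is still "in the game" after each toss. The Python sketch below is one way to do this for a fair coin; the function name `ruin_measures` and the dictionary-based bookkeeping are our own choices.

```python
from fractions import Fraction

def ruin_measures(M, max_n):
    """Return the exact measures of I_{A_1}, ..., I_{A_max_n}: the sets of x whose
    winnings first reach -M at toss n, for a fair coin."""
    alive = {0: Fraction(1)}  # winnings so far -> measure of the x that never hit -M
    result = []
    for _ in range(max_n):
        next_alive = {}
        ruined = Fraction(0)
        for w, mass in alive.items():
            for step in (1, -1):  # heads wins 1, tails loses 1, each on half the set
                v = w + step
                if v == -M:
                    ruined += mass / 2
                else:
                    next_alive[v] = next_alive.get(v, Fraction(0)) + mass / 2
        alive = next_alive
        result.append(ruined)
    return result

measures = ruin_measures(1, 9)
print(measures)
print(float(sum(ruin_measures(1, 200))))
```

The partial sums give the measure of the finite unions $I_{A_1} \cup \cdots \cup I_{A_n}$ by finite additivity; passing to the full countable union is exactly the step that requires the theory developed later.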
3.2.1 Numerical simulation
References
Exercises
3.3 Sets of measure zero
Theorem 3.2.1 states that the size of the event consisting of Bernoulli sequences for a
fair coin for which the relative frequency of H ’s in the first N trials is larger than a fixed
distance from 1/2 tends to 0 as N → ∞. But, this leaves open the question: For a fair
coin and a “typical” x, does
$$\lim_{n\to\infty} \frac{S_n(x)}{n} = \frac{1}{2}\,? \qquad (3.6)$$
This is an important question from the point of view of numerical simulation, as it is quite
common that we would have only one numerical sequence corresponding to a choice of
x in hand. Can we reliably use the computed example to try to approximate the answer
to statistical questions?
Definition 3.3.1. The set of normal numbers in I is
$$N = \Big\{x \in I : \frac{S_n(x)}{n} \to \frac{1}{2} \text{ as } n \to \infty\Big\}.$$
Another way to state the intuition behind the Law of Large Numbers is that the nonnormal numbers should be atypical in some sense.
Definition 3.3.2. An event in B is atypical if it has probability zero, that is, if the corresponding
set of real numbers has Lebesgue measure 0.
Thus, the intuition behind the Law of Large Numbers is that $N^c$ should have Lebesgue
measure zero.
In this section, we characterize sets with Lebesgue measure zero. We noted above
that the Lebesgue measure of a single point is zero. It follows immediately that finite
collections of points also have Lebesgue measure zero. Infinite collections are apparently
more complicated. For example, I is the uncountable union of single points and does
not have Lebesgue measure zero. Working from the assumptions about measure we have
made so far, we develop a general method for characterizing sets with Lebesgue measure
zero. In doing so, we actually motivate several key aspects of measure theory.
The characterization is based on a fundamentally important concept for metric spaces.
Definition 3.3.3. Given a subset $A \subset \mathbb{R}^n$, a countable cover of A is a countable collection
of sets $\{A_i\}_{i=1}^{\infty}$ in $\mathbb{R}^n$ such that $A \subset \bigcup_{i=1}^{\infty} A_i$. If the sets in a countable cover are open, we call it
an open cover.
We emphasize that the requirement of being countable is important.
Definition 3.3.4. A set A ⊂ R has Lebesgue measure zero if for every ε > 0, there is a
countable cover $\{A_i\}_{i=1}^{\infty}$ of A, where each $A_i$ consists of a finite union of open intervals, such
that
$$\sum_{i=1}^{\infty} \mu_L(A_i) < \varepsilon.$$
We also say that A has measure zero.
Note that because each Ai in the countable cover consists of a finite union of open intervals, their Lebesgue measure is computable. In this way, we sidestep the issue of computing µL (A) directly. This definition also uses (implicitly) another property of Lebesgue
measure:
Definition 3.3.5. If (c, d ) ⊂ (a, b ), then µL ((c, d )) ≤ µL ((a, b )). We say that Lebesgue
measure is monotone.
We could use half-open or closed intervals in the definition instead of open intervals,
but open intervals turn out to be convenient for “compactness” arguments.
Example 3.3.1. We show that a closed interval [a, b] with a ≠ b cannot have measure zero.
If [a, b] is covered by countably many open intervals, we can extract a finite number that
cover [a, b] (a finite subcover) because it is compact. The sum of the lengths of these intervals must
be at least b − a.
We describe some sets of measure zero.
Theorem 3.3.1.
1. A measurable subset of a set of measure zero has measure zero.
2. If $\{A_i\}_{i=1}^{\infty}$ is a countable collection of sets of measure zero, then $\bigcup_{i=1}^{\infty} A_i$ has measure zero.
3. Any finite or countable set of numbers has measure zero.
Item 2. states that a countable union of sets of measure zero is a set of measure zero. In
contrast, uncountable unions of sets of measure zero can have nonzero measure. The
assumption in 1. that the subset of the set of measure zero is measurable is an important
point that we address in later chapters.
Proof.
Result 1. This follows from the definition since any countable cover of the larger set is
also a cover of the smaller set.
Result 2. We choose ε > 0. Since $A_n$ has measure zero, there is a countable collection of
open intervals $B_{n,1}, B_{n,2}, \ldots$ covering $A_n$ with
$$\sum_{i=1}^{\infty} \mu_L(B_{n,i}) \le \frac{\varepsilon}{2^n}.$$
The collection $\{B_{n,i}\}_{n,i=1}^{\infty}$ is countable and covers $\bigcup_{n=1}^{\infty} A_n$. Moreover,
$$\sum_{i,n=1}^{\infty} \mu_L(B_{n,i}) = \sum_{n=1}^{\infty} \Big(\sum_{i=1}^{\infty} \mu_L(B_{n,i})\Big) \le \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon.$$
Note that we use non-negativity to switch the order of summation in this argument.
Result 3. This follows from 2. and the observation that a point has measure zero.
Proof Comment 3.2. This is a classic measure theory argument that the reader should study
until it is familiar.
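The $\varepsilon/2^n$ device in this proof can be tried out on a finite piece of a countable set. The following Python sketch enumerates some rationals in [0, 1] and covers the $n$th one by an open interval of length $\varepsilon/2^n$; the enumeration order, the cutoff on denominators, and the function names are illustrative assumptions.

```python
from fractions import Fraction

def enumerate_rationals(max_denominator):
    """List the distinct rationals p/q in [0, 1] with q <= max_denominator, in a fixed order."""
    seen, ordered = set(), []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                ordered.append(r)
    return ordered

def cover(points, eps):
    """Cover the n-th point (n = 1, 2, ...) by an open interval of length eps / 2**n."""
    eps = Fraction(eps)
    return [(p - eps / 2 ** (n + 1), p + eps / 2 ** (n + 1))
            for n, p in enumerate(points, start=1)]

def total_length(intervals):
    """Sum of the lengths of the covering intervals."""
    return sum((b - a for a, b in intervals), Fraction(0))

points = enumerate_rationals(12)
intervals = cover(points, Fraction(1, 100))
print(len(points), float(total_length(intervals)))
```

However many points are enumerated, the total length stays below ε, because the lengths form part of a geometric series summing to ε.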
An interesting question is whether or not there are any interesting sets of measure
zero. We next show that there are uncountable sets of measure zero. In particular, we
describe the construction of a special example that is used frequently in measure theory.
The set is constructed by an iterative process.
Definition 3.3.6.
Step 1 Beginning with the unit interval $F_0 = [0, 1]$, divide $F_0$ into 3 equal parts and remove
the middle third open interval $\big(\tfrac{1}{3}, \tfrac{2}{3}\big)$ to get
$$F_1 = \Big[0, \frac{1}{3}\Big] \cup \Big[\frac{2}{3}, 1\Big].$$
See Fig. 3.4.
Figure 3.4. The first step in the construction of the Cantor set.
Step 2 Working on $F_1$ next, divide each of its two pieces into equal thirds and remove the
middle open intervals from the divisions to get $F_2$,
$$F_2 = \Big[0, \frac{1}{9}\Big] \cup \Big[\frac{2}{9}, \frac{1}{3}\Big] \cup \Big[\frac{2}{3}, \frac{7}{9}\Big] \cup \Big[\frac{8}{9}, 1\Big].$$
This has $2^2$ closed intervals of length $3^{-2}$, see Fig. 3.5.
Figure 3.5. The second step in the construction of the Cantor set.
Step i Divide each of the $2^{i-1}$ pieces remaining after step i − 1 into equal thirds and remove
the middle piece from each to get $F_i$. $F_i$ has $2^i$ closed intervals of length $3^{-i}$.
End result This procedure yields a sequence of closed sets $\{F_i\}$, where each $F_i$ is a finite union
of $2^i$ closed intervals of length $3^{-i}$.
The Cantor (Middle Third) Set C is defined by
$$C = \bigcap_{i=1}^{\infty} F_i.$$
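The iterative construction is easy to reproduce with exact arithmetic. The Python sketch below builds the closed intervals making up $F_i$ and reports their number and total length; `cantor_step` and `cantor_level` are names chosen for this illustration.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_level(i):
    """Return the closed intervals whose union is F_i, starting from F_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(i):
        intervals = cantor_step(intervals)
    return intervals

for i in range(5):
    level = cantor_level(i)
    print(i, len(level), sum(b - a for a, b in level))
```

The printed totals are $(2/3)^i$, so the sets $F_i$ containing C have total length tending to zero.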
Theorem 3.3.2. Let C be the Cantor set in R. Then,
1. C is closed.
2. Every point in C is a limit of a sequence of points in C.
3. C has measure zero.
4. C is uncountable.
Proof.
Result 1 Exercise.
Result 2 Exercise.
Result 3 C is contained in $F_i$ for any i. Since $F_i$ is a union of disjoint intervals whose
lengths sum to $(2/3)^i$ and, for any ε > 0, $(2/3)^i < \varepsilon$ for all sufficiently large i, C has
measure zero.
Result 4 We show that every point x ∈ C can be represented uniquely by a series of the
form
$$x = \sum_{i=1}^{\infty} \frac{a_i}{3^i},$$
where $a_i = 0$ or 2. This can be recognized as a base 3 expansion. To show uniqueness, if
$$\sum_{i=1}^{\infty} \frac{a_i}{3^i} = \sum_{i=1}^{\infty} \frac{b_i}{3^i}$$
for $a_i, b_i = 0$ or 2, we show that $a_i = b_i$ for all i. Suppose $a_i \ne b_i$ for some i. Let n be the
smallest number with $a_n \ne b_n$, so $|a_n - b_n| = 2$. Since $|a_i - b_i| \le 2$ for all i,
$$0 = \Big|\sum_{i=1}^{\infty} \frac{a_i - b_i}{3^i}\Big| = \Big|\sum_{i=n}^{\infty} \frac{a_i - b_i}{3^i}\Big| \ge \frac{1}{3^n}\Big(|a_n - b_n| - \sum_{i=n+1}^{\infty} \frac{|a_i - b_i|}{3^{i-n}}\Big) \ge \frac{1}{3^n}\Big(2 - \sum_{i=1}^{\infty} \frac{2}{3^i}\Big) = \frac{1}{3^n}.$$
This is a contradiction and so every number in C has a unique base 3 expansion using only the digits 0 and 2.
Now let $\{G_{i,j},\ j = 1, 2, \ldots, 2^{i-1}\}$ be the open “middle third” intervals removed to
obtain $F_i$. Then, a number given by the base 3 expansion $0.b_1b_2b_3\ldots$, $b_k = 0, 1, 2$,
is in $G_{i,j}$ for some j if and only if:
• $b_k = 0$ or 2 for each k < i, because it is in $F_{i-1}$;
• $b_i = 1$, because it is in one of the discarded open intervals at this stage;
• the $b_k$’s for k > i are neither all 0 nor all 2.
It is a good exercise to use a variation of the Cantor diagonal argument to show that C is
uncountable.
To give some idea of the importance of the concept of sets of measure zero, we quote
a beautiful result of Lebesgue that states “if and only if” conditions for a function to be
Riemann integrable. Recall that two aspects of Riemann integration provided significant
impetus to the development of measure theory. First, there was a long search for minimal
conditions on a function that would guarantee the function is Riemann integrable. Second, the Riemann integral has some annoying “flaws”. We provide a theory
for Riemann integration and discuss these issues in Appendix A. Here, we simply quote
one of the most important results.
To explain the idea, we begin with a canonical example. First, a definition.
Definition 3.3.7. A property that holds except on a set of measure zero is said to hold
almost everywhere (a.e.). We say that almost all points in a set have a property if all the
points except those in a set of measure zero have the property.
Now, the example.
Definition 3.3.8. Dirichlet’s function is defined by
$$D(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q}, \\ 0, & \text{if } x \notin \mathbb{Q}. \end{cases}$$
From the definition, D is a bounded function and D(x) = 0 a.e. It is a simple exercise to
show that D is discontinuous at every point in I and therefore D(x) is not continuous
a.e.
We prove the following result in Appendix A.
Theorem 3.3.3 (Lebesgue’s Theorem on Riemann Integration). A bounded function
is Riemann integrable on a closed interval if and only if it is continuous a.e. on the interval.
References
Exercises
3.4 The Strong Law of Large Numbers
We return to analyzing the set of normal numbers N.
Theorem 3.4.1 (Strong Law of Large Numbers for Bernoulli Sequences). $N^c$ is an
uncountable set with Lebesgue measure zero.
Unlike the Weak Law of Large Numbers Theorem 3.2.1, this theorem is a statement that
requires measure theory. This version of the Law of Large Numbers is called strong because Theorem 3.4.1 implies Theorem 3.2.1. This is a consequence of a general result on
different kinds of convergence that we prove later on.
Proof. We first show that $N^c$ is uncountable and contains a “Cantor-like” set. Consider the map f : I → I,
$$f(x) = 0.a_1 1 1 a_2 1 1 a_3 1 1 \ldots,$$
for $x = 0.a_1a_2a_3\ldots$. The map is 1 − 1, so its image is uncountable. Moreover, f(I) is
contained in $N^c$. In fact, if y = f(x), then $S_{3n}(y) \ge 2n$, and
$$\frac{S_{3n}(y)}{3n} \ge \frac{2}{3}.$$
Such y’s clearly violate the Law of Large Numbers. The image set f(I) is Cantor-like in
that it is the countable nested intersection of sets consisting of a finite number of well-separated,
disjoint intervals.
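The effect of the map f on the frequency of ones can be checked on finite digit prefixes. In the Python sketch below, `f_digits` is a hypothetical helper that produces the first 3n digits of f(x) from the first n digits of x.

```python
from itertools import product

def f_digits(digits):
    """Given the first n binary digits a_1..a_n of x, return the first 3n digits of f(x),
    obtained by following each a_k with two ones."""
    out = []
    for a in digits:
        out.extend((a, 1, 1))
    return out

def frequency_of_ones(digits):
    """Fraction of ones among the given digits."""
    return sum(digits) / len(digits)

print(f_digits([0, 1, 0]))
worst = min(frequency_of_ones(f_digits(d)) for d in product((0, 1), repeat=6))
print(worst)
```

Even in the worst case, where every $a_k$ is 0, two out of every three digits of f(x) are ones, so the frequency of ones cannot approach 1/2.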
We cover the complicated set $N^c$ using a countable cover of much simpler sets. Recall
the set $A_n = \{x \in I : |W_n(x)| > \varepsilon n\}$ used in the proof of the Weak Law of Large Numbers.
We use an equivalent definition,
\[
A_n = \{ x \in I : W_n^4(x) > \varepsilon^4 n^4 \}.
\]
By Chebyshev’s Inequality 3.2.2,
\[
\mu_L(A_n) \le \frac{1}{\varepsilon^4 n^4} \int_0^1 W_n^4 \, dx
\le \frac{1}{\varepsilon^4 n^4} \int_0^1 \Big( \sum_{i=1}^{n} R_i \Big)^4 dx.
\]
The integrand yields 5 kinds of terms,
1. $R_i^4$ for $i = 1, \ldots, n$.
2. $R_i^2 R_j^2$ for $i \ne j$.
3. $R_i^2 R_j R_k$ for distinct i, j, k.
4. $R_i^3 R_j$ for $i \ne j$.
5. $R_i R_j R_k R_l$ for distinct i, j, k, l.
Since $R_i^4(x) = 1$ and $R_i^2(x) R_j^2(x) = 1$ for all i, j,
\[
\int_0^1 R_i^4 \, dx = \int_0^1 R_i^2 R_j^2 \, dx = 1.
\]
We show the other terms integrate to zero because of cancellation. Two follow from the
proof of the Weak Law of Large Numbers, using $R_i^2 = 1$:
\[
\int_0^1 R_i^2 R_j R_k \, dx = \int_0^1 R_j R_k \, dx = 0, \qquad i, j, k \text{ distinct},
\]
\[
\int_0^1 R_i^3 R_j \, dx = \int_0^1 R_i R_j \, dx = 0, \qquad i \ne j.
\]
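Finally, assume i < j < k < l, and consider an interval of the form
\[
J = \left( \frac{m}{2^k}, \frac{m+1}{2^k} \right).
\]
$R_i R_j R_k$ is constant on J. However, J splits into $2^{l-k}$ dyadic subintervals of length $2^{-l}$ on which $R_l$ alternates between +1 and −1, so the integral of $R_i R_j R_k R_l$ over J vanishes. Summing over all such J,
\[
\int_0^1 R_i R_j R_k R_l \, dx = 0.
\]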
There are n terms of the first kind of integrand and 3n(n − 1) terms involving the
second kind of integrand, so
\[
\int_0^1 W_n^4(x) \, dx = n + 3n(n - 1) = 3n^2 - 2n \le 3n^2,
\]
and
\[
\mu_L(A_n) \le \frac{3}{n^2 \varepsilon^4}.
\]
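The fourth-moment count and the resulting bound can be checked by brute force for small n. The sketch below assumes that $W_n = R_1 + \cdots + R_n$ and that each sign pattern in $\{-1, +1\}^n$ occupies a dyadic interval of length $2^{-n}$, so integrals over [0, 1] become averages over sign patterns. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def fourth_moment(n):
    """Exact integral of W_n^4 over [0, 1], as an average over sign patterns."""
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return Fraction(total, 2 ** n)

def measure_of_A(n, eps):
    """Exact Lebesgue measure of {x : |W_n(x)| > eps * n} under the same model."""
    count = sum(1 for signs in product((-1, 1), repeat=n)
                if abs(sum(signs)) > eps * n)
    return Fraction(count, 2 ** n)

# The integral of W_n^4 equals 3n^2 - 2n for each n tested.
for n in range(1, 11):
    assert fourth_moment(n) == 3 * n * n - 2 * n

# The measure of A_n stays below the bound 3 / (n^2 eps^4).
for n in range(1, 11):
    for eps in (0.25, 0.5, 0.75):
        assert measure_of_A(n, eps) <= 3 / (n ** 2 * eps ** 4)
```

Enumerating all $2^n$ sign patterns is feasible only for small n, which is enough to confirm the counting argument.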
We cover $N^c$ using a collection of sets of the form $A_n$ for increasing n and decreasing
ε chosen in such a way that the cover has arbitrarily small measure. For a constant C, set
$\varepsilon_n^4 = C n^{-1/2}$, so
\[
\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} = \frac{3}{C} \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}.
\]
The last series converges and the quantity can be made smaller than any δ > 0 by choosing
sufficiently large C. Hence, given δ > 0, there is a sequence $\{\varepsilon_n\}$ with $\varepsilon_n \to 0$ such that
\[
\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} \le \delta.
\]
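To make the choice of C concrete, here is one possible estimate. The integral comparison is a standard bound and is not taken from the text.

```latex
\[
\sum_{n=1}^{\infty} \frac{1}{n^{3/2}} \le 1 + \int_1^{\infty} x^{-3/2} \, dx = 1 + 2 = 3,
\qquad\text{so}\qquad
\frac{3}{C} \sum_{n=1}^{\infty} \frac{1}{n^{3/2}} \le \frac{9}{C} \le \delta
\quad\text{whenever } C \ge \frac{9}{\delta}.
\]
```

With such a C, the corresponding sequence is $\varepsilon_n = C^{1/4} n^{-1/8}$, which tends to zero, although slowly.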
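For each n, set
\[
\tilde{A}_n = \{ x \in I : |W_n(x)| > \varepsilon_n n \}.
\]
Note $\tilde{A}_n$ is a finite union of intervals since $W_n$ is piecewise constant. We have
\[
\mu_L(\tilde{A}_n) \le \frac{3}{\varepsilon_n^4 n^2},
\]
and
\[
\sum_{n=1}^{\infty} \mu_L(\tilde{A}_n) \le \delta.
\]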
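If we show that $N^c \subset \bigcup_{n=1}^{\infty} \tilde{A}_n$, then we are done. This holds if $N \supset \bigcap_{n=1}^{\infty} \tilde{A}_n^c$. If $x \in \bigcap_{n=1}^{\infty} \tilde{A}_n^c$,
then for each n, $|W_n(x)| \le \varepsilon_n n$, or $|W_n(x)|/n \le \varepsilon_n$. Since $\varepsilon_n \to 0$, $|W_n(x)|/n \to 0$, or x ∈ N.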
The proof of Theorem 3.4.1 can be used to draw stronger conclusions. For example, a
normal number has the property that no finite sequence of digits occurs more frequently
than any other finite sequence of digits of the same length.
3.4.1 Numerical simulation
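A minimal sketch of such a simulation follows, assuming Python's pseudorandom generator as a stand-in for fair coin tosses, that is, for the binary digits of a point chosen at random in I. The function name `running_frequency` and the seed are illustrative choices, and $S_n$ is taken to be the number of ones among the first n digits. The script prints the running fraction $S_n/n$ and its distance from 1/2 at a few values of n.

```python
import random

def running_frequency(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses and return S_n / n for n = 1..n_tosses."""
    rng = random.Random(seed)
    ones = 0
    freqs = []
    for n in range(1, n_tosses + 1):
        ones += rng.getrandbits(1)  # next binary digit, 0 or 1
        freqs.append(ones / n)
    return freqs

freqs = running_frequency(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    gap = abs(freqs[n - 1] - 0.5)
    print(f"n = {n:>6}: S_n/n = {freqs[n - 1]:.4f}, |S_n/n - 1/2| = {gap:.4f}")
```

A finite run of pseudorandom digits can only illustrate the behavior described by Theorem 3.4.1. It says nothing about the exceptional set $N^c$, which has measure zero and is effectively never sampled.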
3.5 A second wish list for measure theory
With some informal experience with measure-theoretic ideas, we make a second attempt at
a wish list of desirable properties for a measure theory. We are considering the measure
on $\mathbb{R}^n$ that extends the standard notions of length, area, and volume. If $E \subset \mathbb{R}^n$ for some
n, let µ(E) denote its “measure”.
1. µ should be a non-negative set function from sets in $\mathbb{R}^n$ into the extended reals $\mathbb{R} \cup \{\infty\}$. µ({x}) = 0 for a single point. µ(A) = ∞ should be possible for unbounded
sets.
2. In $\mathbb{R}$, we should have µ([a, b]) = b − a. In $\mathbb{R}^n$, we should have
\[
\mu(Q) = (b_1 - a_1)(b_2 - a_2) \cdots (b_n - a_n),
\]
for generalized rectangles (multi-intervals),
\[
Q = \{ x \in \mathbb{R}^n : a_i \le x_i \le b_i, \ 1 \le i \le n \}.
\]
3. If $\{A_1, A_2, \ldots, A_n\}$ are disjoint sets, then
\[
\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} \mu(A_i).
\]
What about infinite collections? Well, µ({x}) = 0. But in $\mathbb{R}$,
\[
(0, 1) = \bigcup_{x \in (0,1)} \{x\}.
\]
This is a problem because we cannot have
\[
1 = \mu((0, 1)) = \mu\Big( \bigcup_{x \in (0,1)} \{x\} \Big) = \sum_{x \in (0,1)} \mu(\{x\}) = 0.
\]
So, uncountable collections of sets are a problem and we avoid them. What about
countable collections? Countable disjoint collections of sets of measure zero should
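have measure zero. Also,
\[
(0, 1] = \left( \tfrac{1}{2}, 1 \right] \cup \left( \tfrac{1}{3}, \tfrac{1}{2} \right] \cup \left( \tfrac{1}{4}, \tfrac{1}{3} \right] \cup \cdots,
\]
and,
\[
\begin{aligned}
1 = \mu((0, 1]) &= \left( 1 - \tfrac{1}{2} \right) + \left( \tfrac{1}{2} - \tfrac{1}{3} \right) + \left( \tfrac{1}{3} - \tfrac{1}{4} \right) + \cdots \\
&= \mu\left( \left( \tfrac{1}{2}, 1 \right] \right) + \mu\left( \left( \tfrac{1}{3}, \tfrac{1}{2} \right] \right) + \mu\left( \left( \tfrac{1}{4}, \tfrac{1}{3} \right] \right) + \cdots.
\end{aligned}
\]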
So we would like to say that if $\{A_i\}$ is a countable collection of disjoint sets then
\[
\mu\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \sum_{i=1}^{\infty} \mu(A_i).
\]
4. If A ⊂ B are sets, then µ (A) ≤ µ (B), or µ should be “monotone”.
5. For the standard “volume” measure on $\mathbb{R}^n$, if a set A is obtained from another set
B by a rotation, translation, or reflection, then µ(A) = µ(B).
It turns out that we cannot construct a measure that is defined on every subset of $\mathbb{R}^n$ and satisfies
all of these properties.
We have to give up something, so we do not require that the measure be defined on
all subsets of $\mathbb{R}^n$. We settle for a measure defined on a class of subsets.
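The decomposition of (0, 1] used to motivate countable additivity in item 3 can be checked with exact arithmetic. The sketch below assumes that each piece (1/(k+1), 1/k] has measure 1/k − 1/(k+1), as the wish list asks of intervals. The function name `partial_measure` is a hypothetical helper.

```python
from fractions import Fraction

def partial_measure(num_pieces):
    """Sum of the lengths of (1/(k+1), 1/k] for k = 1..num_pieces."""
    return sum(Fraction(1, k) - Fraction(1, k + 1)
               for k in range(1, num_pieces + 1))

for m in (1, 10, 100, 1000):
    total = partial_measure(m)
    # The sum telescopes: the first m pieces cover (1/(m+1), 1].
    assert total == 1 - Fraction(1, m + 1)
    print(m, float(total))
```

The partial sums increase toward 1, which is the value countable additivity would assign to µ((0, 1]).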
References
Exercises