Chapter 3 Continuum Probability and Sets of Measure Zero

In this chapter, we motivate the use of measure theory as a foundation for probability. We use the example of random coin tossing to explain why we need to move past discrete probability theory, and to figure out what the new foundation (which has yet to be developed) must provide. Presuming that we can indeed create the necessary theoretical foundation, we show some important consequences that result. This is intended to justify the investment in rigorous analysis that we make in the following chapters.

We do not show that the required theoretical foundation exists in this chapter! This is meant to be a fun and engaging introduction to the thought processes involved in measure theoretic probability. Moreover, it shows that formulating even a vague idea of measure makes it possible to state and prove deep results. Enjoy this chapter, as the next chapter provides all the heavy-going theory and proof a reader could want! After developing rigorous measure theory, we revisit the material in this chapter to verify that everything discussed is indeed rigorously justified.

3.1 Probability and sets of real numbers

We begin by developing a connection between a probability space with an infinite number of points and an interval of real numbers. With this correspondence, we can then develop a systematic method for computing probabilities of events in the probability space by measuring the sizes of corresponding sets of real numbers. However, it turns out that perfectly reasonable probability questions correspond to very complicated sets of real numbers. Thus, we first need to develop a way to measure the size of rather unusual sets of real numbers.

3.1.1 Bernoulli sequences and the unit interval

Definition 3.1.1. Suppose an experiment has two possible outcomes and the probabilities of these outcomes are fixed.
A finite number of independent trials of the experiment is called a Bernoulli trial. An infinite sequence of independent trials is called a Bernoulli sequence.

Example 3.1.1. Let the experiment be the toss of a two-sided coin, with heads denoted by H and tails denoted by T. An example of a Bernoulli sequence is
H, T, T, H, H, H, H, T, H, T, T, H, H, T, T, T, T, H, T, H, T, T, ....

Definition 3.1.2. We define the space of Bernoulli sequences,
$$B = \{\text{all Bernoulli sequences generated by a particular experiment}\}.$$

We use H and T to denote the two outcomes. For simplicity, we mostly treat the case where the outcomes have equal probability of occurring, i.e., corresponding to a “fair coin”. In general, the two outcomes may have different probabilities. We show that B can (almost) be represented by the real numbers in (0, 1], which implies that B is uncountable.

Theorem 3.1.1. If we delete a countable subset of B, we can index the remaining points using the numbers in (0, 1].

Recall that by index, we mean there is a 1−1 correspondence between the two sets.

Proof. We construct a map from (0, 1] to B that fails to be onto by a countable subset. Any point x ∈ [0, 1] can be written as an expansion in base 2, or binary expansion,
$$x = \sum_{i=1}^{\infty} \frac{a_i}{2^i}, \qquad a_i = 0 \text{ or } 1.$$
Each such expansion corresponds to a Bernoulli sequence. To see this, define the nth term of the Bernoulli sequence to be H when $a_n = 1$ and T when $a_n = 0$.

Example 3.1.2. $H, T, H, H, H, T, T, H, T, T, H, \ldots \leftrightarrow 0.10111001001\ldots$.

A problem with using real numbers as an index set is that some numbers do not have a unique binary expansion, whereas we consider two Bernoulli sequences with different members to be distinct.

Example 3.1.3. $1/2 = 0.1000\ldots$ and $1/2 = 0.0111\ldots$, but $HTTT\cdots \neq THHH\cdots$.

Thus, the method above for generating a Bernoulli sequence does not define a function into B.
To avoid this trouble, we adopt the convention that if the real number x has both a terminating and a non-terminating binary expansion, we use the non-terminating expansion. This is the reason for using (0, 1] instead of [0, 1]: the only binary expansion of 0 is $0.000\ldots$, which terminates. With this convention, the method above defines a 1−1 map into B that is not onto, because it does not produce Bernoulli sequences ending in all T's. We claim that the set $B_T$ of such Bernoulli sequences is countable. Let $B_T^k$ be the set of Bernoulli sequences that have only T's after the kth term; it is finite, with $2^k$ elements. We have
$$B_T = \bigcup_{k=1}^{\infty} B_T^k. \tag{3.1}$$
This implies that $B_T$ is countable, being a countable union of finite sets, and that there is a 1−1 and onto correspondence between (0, 1] and $B \setminus B_T$.

Proof Comment 3.1. The decomposition of a countable set as a countable union of finite sets in (3.1) is a standard measure theory argument.

3.1.2 Initial encounter with measure

Since B is uncountable and $B_T$ is countable, we would like to “ignore” $B_T$ for all practical purposes and identify B with I = (0, 1]. Likewise, it turns out to be convenient to regard the size of any finite or countable subset of I as “negligible” compared to the size of I, which has a number of important ramifications. This is the first motivating example for devising a way to measure the size of sets of real numbers that applies to complex sets.

Lebesgue developed an approach to measuring the sizes of complex sets of real numbers that is the basis for measure theory. Measure theory can be developed in a very abstract way that applies to spaces of many different kinds of objects, though we focus on spaces consisting of real numbers in this book. In that context, it is initially reasonable to think of measure as a generalization of length in one dimension, and of area and volume in higher dimensions. But we also caution that measures can have other interpretations. For example, we use measure to quantify probability later on.
To fit common conceptions of measuring the sizes of sets, a measure µ should satisfy, at a minimum, some basic properties.

Definition 3.1.3 (First Wish List for Measures). A measure µ is a real-valued function defined on a collection of subsets of a space X called the measurable sets. If A is a measurable set, µ(A) is the measure of A. At a minimum, the structure must satisfy:
(Non-negativity) µ is non-negative.
(Closed under finite unions) If $\{A_i\}_{i=1}^{n}$ is a finite collection of disjoint measurable sets, then $\bigcup_{i=1}^{n} A_i$ is measurable.
(Finite additivity) If $\{A_i\}_{i=1}^{n}$ is a finite collection of disjoint measurable sets, then
$$\mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} \mu(A_i).$$

Thus, a measure is a non-negative, finitely additive set function, just like a probability function. There should be a connection here.

We pay particular attention to the case of real numbers:

Definition 3.1.4. If the space X is an interval of real numbers and the measurable sets include intervals for which
$$\mu((a,b)) = \mu([a,b]) = \mu((a,b]) = \mu([a,b)) = b - a, \qquad a \le b,\ a, b \in X,$$
we call µ the Lebesgue measure on X and write $\mu = \mu_L$.

Note that this implies that the measure of a set consisting of a single point is zero: $\mu_L(\{a\}) = \mu_L([a,a]) = a - a = 0$.

3.1.3 Assigning probabilities to events in B

So far, we have identified B with the interval of real numbers I, argued that we need a general way to measure the sizes of sets in I, and listed some properties that such a measure should have. The next step is to devise a system for computing probabilities of events in B using the measure. For simplicity, we consider the case when T and H occur with equal probability.

To start from what we know, we first consider the space of Bernoulli trials of finite length n. The probability of H as the first outcome in any trial is .5, and likewise for T as the first outcome.
This can be computed using simple counting over all possible trials of length n: exactly half of the $2^n$ possible trials begin with H. Unfortunately, we cannot make a counting argument in the case of B, though intuition suggests that the probabilities are also .5.

Switching to sets of real numbers, if $A_H$ is the event in B consisting of sequences where H is the first outcome, the corresponding set in I = (0, 1] is
$$I_{A_H} = \{x \in I : x = 0.1a_2a_3\ldots,\ a_i = 0 \text{ or } 1\} = (.5, 1].$$
(Note that $1/2 = 0.1000\ldots$ is the largest number in I that is not in $I_{A_H}$, while the largest number in $I_{A_H}$ is $1 = 0.111\ldots$.) We do not include 1/2 because we use non-terminating expansions, under which $1/2 = 0.0111\ldots$. Likewise, if $A_T$ is the event where T occurs as the first outcome, then $I_{A_T} = (0, .5]$. We have $\mu_L(I_{A_H}) = \mu_L(I_{A_T}) = .5$. Since $I_{A_H}$ and $I_{A_T}$ have equal measures, it seems reasonable to assign the probabilities
$$P(A_H) = \mu_L(I_{A_H}) = .5, \qquad P(A_T) = \mu_L(I_{A_T}) = .5.$$

Next, if we consider the events $A_{HH}, A_{HT}, A_{TH}, A_{TT}$ in B in which the first two outcomes HH, HT, TH, TT are specified, the corresponding intervals are
$$I_{A_{TT}} = (0, .25], \quad I_{A_{TH}} = (.25, .5], \quad I_{A_{HT}} = (.5, .75], \quad I_{A_{HH}} = (.75, 1].$$
Since these intervals have equal length, we assign the probability .25 to each interval and to each corresponding event. We can continue with this argument, considering the events corresponding to specification of the first three outcomes, then the first four outcomes, and so on. Considering the events in which the first n outcomes are specified, we obtain $2^n$ intervals of equal length, and assign equal probability $2^{-n}$ to each interval and thus to each event. In this way, we obtain a sequence of “binary” partitions $T_n$ of I into $2^n$ nonoverlapping subintervals $I_{n,j}$ of equal length such that $I = \bigcup_{j=1}^{2^n} I_{n,j}$, see Fig. 3.1. We assign equal probabilities to each subinterval in a given partition and to the corresponding events.
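The binary partitions $T_n$ can be tabulated directly. In the following minimal Python sketch, which is our own illustration with hypothetical helper names `prefix_interval` and `binary_partition`, a specified prefix of n outcomes is mapped to its subinterval $(j/2^n, (j+1)/2^n]$. The prefix is read as the binary digits of j, with H = 1 and T = 0, and the sketch reproduces the four intervals listed above for n = 2.

```python
from fractions import Fraction

def prefix_interval(prefix):
    """Endpoints (a, b) of the interval (a, b] in I = (0, 1] whose points index the
    Bernoulli sequences starting with `prefix`, e.g. "HT" -> (1/2, 3/4].
    Illustrative sketch (not from the text)."""
    j = 0
    for outcome in prefix:
        if outcome not in ("H", "T"):
            raise ValueError("outcomes must be 'H' or 'T'")
        j = 2 * j + (1 if outcome == "H" else 0)  # read the prefix as binary digits of j
    width = Fraction(1, 2 ** len(prefix))
    return j * width, (j + 1) * width

def binary_partition(n):
    """The 2**n subintervals of T_n, listed left to right together with their prefixes."""
    prefixes = [""]
    for _ in range(n):
        prefixes = [p + c for p in prefixes for c in "TH"]  # T (digit 0) before H (digit 1)
    return [(p, prefix_interval(p)) for p in prefixes]

for p, (a, b) in binary_partition(2):
    print(p, f"({a}, {b}]", "length", b - a)
```

For n = 2 this prints the intervals (0, 1/4], (1/4, 1/2], (1/2, 3/4], (3/4, 1] for TT, TH, HT, HH, each of length 1/4. Half-open intervals of the form (a, b] are used so that the partition matches the non-terminating convention: the right endpoint $(j+1)/2^n$ has an expansion that starts with the digits of j and continues with all 1's.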
Moreover, it appears that any interval (a, b] ⊂ I can be “approximated” arbitrarily well by $\bigcup_{I_{n,j} \subset (a,b]} I_{n,j}$, in the sense that the intervals of points not in the approximation,
$$(a,b] \setminus \bigcup_{I_{n,j} \subset (a,b]} I_{n,j},$$
shrink in size as n increases, see Fig. 3.1. In view of Wish List 3.1.3 and the fact that $\mu_L(I) = 1$, we extend these observations to a general principle of modeling.

Axiom 3.1.1 (The Measure Theory Model for Probability on B). If A is an event in B, we let $I_A$ denote the corresponding set of real numbers in (0, 1]. Then, we assign the probability of A, denoted by P(A), to be $\mu_L(I_A)$.

All of this discussion is terribly vague, since we have not defined $\mu_L$, described the collection of measurable sets, or quantified the sense of approximation of sets observed above!

Figure 3.1. Illustration of the sequence of “binary” partitions $T_n$ of I. We illustrate an approximation of the interval (a, b) by subintervals in $T_5$.

Nevertheless, we verify that these ideas are useful in some simple examples below, and show that they lead to stating and proving important theorems in the next couple of sections.

Example 3.1.4. Consider the event A in which H is the nth outcome. Then,
$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_{n-1}1a_{n+1}a_{n+2}\ldots,\ a_i = 0 \text{ or } 1\}.$$
For fixed $a_1, \ldots, a_{n-1}$, let $s = 0.a_1a_2\ldots a_{n-1}1$ (a terminating expansion). The points of I whose first n digits are $a_1, \ldots, a_{n-1}, 1$ form the interval $(s, s + 2^{-n}]$. We can choose $a_1, \ldots, a_{n-1}$ in $2^{n-1}$ different ways, and the resulting intervals are disjoint, so $I_A$ is the union of $2^{n-1}$ disjoint intervals of length $2^{-n}$. By finite additivity,
$$P(A) = \mu_L(I_A) = 2^{n-1} \cdot \frac{1}{2^n} = \frac{1}{2}.$$
As a concrete example, consider n = 3. Then we have the cases TTH, THH, HTH, HHH, corresponding to 4 disjoint intervals of length $1/2^3$, and P(A) = 4/8 = 1/2.

Example 3.1.5. Let A be the event where exactly i of the first n outcomes are H, so
$$I_A = \{x \in I : x = 0.a_1a_2\ldots a_na_{n+1}\ldots,\ \text{exactly } i \text{ of the first } n \text{ digits are 1 and the remaining digits are 0 or 1}\}.$$
Choose $a_1, \ldots, a_n$ so that exactly i of them are 1, and set $s = 0.a_1a_2\ldots a_n$. Then $I_A$ contains $(s, s + 2^{-n}]$. The intervals corresponding to different choices of $a_1, \ldots, a_n$ are disjoint, and there are exactly
$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!}$$
such intervals. So
$$P(A) = \mu_L(I_A) = \binom{n}{i} \frac{1}{2^n}.$$

3.1.4 Recapping the construction of the model

We note that there are actually two modeling steps involved with Axiom 3.1.1:
Step 1 The adoption of the measure formulation for probability, which gives a procedure for computing probabilities of events;
Step 2 The assignment of specific probabilities to events in B, i.e., $P(A) = \mu_L(I_A)$ for $A \subset B$.

Step 1 is a proposal for how to carry out stochastic computations in a probability space with an infinite number of points. This use of measure theory is not entirely free from controversy, and alternative frameworks have been proposed. But it is fair to say that Kolmogorov's proposal of measure theory as a foundation for probability stands as one of the great mathematical achievements of the twentieth century. The worthiness of measure theory as a framework for probability is demonstrated in part by the ability to state and prove important probabilistic results. We present a couple of examples in the next two sections and many more in later chapters.

The assignment of probabilities in Step 2 is subject to perhaps a greater degree of controversy. Partly, this is because “randomness” is used to model various situations, including systems that are truly stochastic in nature and systems whose state is unknown but not truly stochastic. Even if a system is random, there may be limited information on the probability values of different events, and when there is information, it is often based on a finite set of observations. Above, we extrapolated to define $P(A) = \mu_L(I_A)$ working from a finite set of examples.
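The finite computations that this extrapolation rests on, such as Examples 3.1.4 and 3.1.5, can be checked by brute-force enumeration. The following minimal Python sketch is our own illustration, and `measure_of_prefix_event` is a hypothetical helper. It uses the observation from the examples that an event depending only on the first n outcomes corresponds to a disjoint union of dyadic intervals of length $2^{-n}$, one per admissible digit string. By finite additivity its measure is therefore a count divided by $2^n$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def measure_of_prefix_event(n, predicate):
    """mu_L of the set of x in (0, 1] whose first n binary digits satisfy `predicate`.

    Illustrative sketch (not from the text): the set is a disjoint union of dyadic
    intervals of length 2**-n, one per admissible digit string, so by finite
    additivity the measure is (number of admissible strings) / 2**n.
    """
    count = sum(1 for digits in product((0, 1), repeat=n) if predicate(digits))
    return Fraction(count, 2 ** n)

# Example 3.1.4 with n = 3: H (digit 1) as the third outcome.
print(measure_of_prefix_event(3, lambda d: d[2] == 1))  # 1/2

# Example 3.1.5: exactly i heads among the first n outcomes, compared with C(n, i) / 2**n.
n, i = 6, 2
print(measure_of_prefix_event(n, lambda d: sum(d) == i), Fraction(comb(n, i), 2 ** n))  # 15/64 15/64
```

Enumeration costs $2^n$ steps, so this is only a sanity check for small n. The closed form $\binom{n}{i}2^{-n}$ is what scales.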
We conclude by noting that the model derived in this section can be applied to a variety of situations.

Example 3.1.6. We can use I as an index set for the points in the space corresponding to the random throw of a dart onto the interval I, and it can index the time of arrival of a single α particle during a unit interval of time. We can also extend these ideas to higher dimensions, e.g., by considering a square dart board.

3.1.5 Numerical simulation
References
Exercises

3.2 The Weak Law of Large Numbers

Continuing the program of motivating measure theory as a model for probability in B, we use it to state and prove some important results in probability. Of course, we have not yet shown that it is possible to construct measures, and we have only described properties of measures under a lot of restrictions. We tackle those issues later. In the meantime, we begin by revisiting the Law of Large Numbers.

Recall that intuition suggests that it should be possible to detect the probabilities of H and T in B by examining the outcomes of many repetitions of the experiment. In particular, the number of times that H occurs in a large number of trials should be related to the probability of H. However, as discussed earlier, a precise statement of this intuition is difficult to formulate. Assuming the probability of H is p and $S_n$ is the number of H's that occur in the first n trials, if we could show that
$$\lim_{n \to \infty} \frac{S_n}{n} = p,$$
then this would be a mathematical statement expressing the intuition. But such a result cannot hold for every sequence: a sequence of experiments could yield outcomes of all H's, for example. So we need a more careful formulation. To make things simple, we assume that the probabilities of H and T are both .5. To state and prove the desired result, we introduce some functions.

Definition 3.2.1. A random variable is a function on the outcomes of an experiment.
The name “random variable” is a rather disconcerting name to assign to a function! Nevertheless, expressing and proving results in probability by using random variables is a supremely important technique.

Definition 3.2.2. For x ∈ I, define the random variable
$$S_n(x) = a_1 + \cdots + a_n, \qquad \text{where } x = 0.a_1a_2\cdots a_n\cdots.$$
$S_n$ gives the number of heads in the first n experiments of the Bernoulli sequence corresponding to x.

Definition 3.2.3. Given δ > 0, define
$$I_n = \left\{x \in I : \left|\frac{S_n(x)}{n} - \frac{1}{2}\right| > \delta\right\}. \tag{3.2}$$
Roughly speaking, this is the event consisting of outcomes for which there are not approximately the same number of H's and T's after n trials, where δ quantifies the discrepancy. We prove

Theorem 3.2.1 (Weak Law of Large Numbers for Bernoulli Sequences). For fixed δ > 0,
$$\mu_L(I_n) \to 0 \quad \text{as } n \to \infty. \tag{3.3}$$

An observant reader should be uncomfortable with this conclusion, because $I_n$ is an apparently complicated set, and we have not yet specified a procedure for computing the measure $\mu_L$ of complicated sets. Fortunately, during the proof, it becomes apparent that $I_n$ is actually a finite collection of nonoverlapping intervals for which $\mu_L$ is defined. By definition, (3.3) implies that for any fixed δ > 0 and any ε > 0,
$$\mu_L\left(\left\{x \in I : \left|\frac{S_n(x)}{n} - \frac{1}{2}\right| > \delta\right\}\right) < \varepsilon$$
for all sufficiently large n. Identifying $\mu_L$ with P, we see that (3.3) extends the earlier Law of Large Numbers (2.4) to B.

Figure 3.2. Plots of the first three Rademacher functions $R_1$, $R_2$, $R_3$.

Remark 3.1. The idea of measuring the size of the set where a function takes a specified range of values is central to measure theory. However, such a set is not a finite collection of disjoint intervals in general.

To prove the result, we reformulate it using two new random variables.

Definition 3.2.4. For x ∈ I, we define the ith Rademacher function by
$$R_i(x) = 2a_i - 1, \qquad x = 0.a_1a_2\cdots.$$
Equivalently,
$$R_i(x) = \begin{cases} 1, & a_i = 1,\\ -1, & a_i = 0. \end{cases}$$
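Because $S_n$ depends only on the first n binary digits, it is constant on each interval of the partition $T_n$. By Example 3.1.5, exactly $\binom{n}{k}$ of those intervals have $S_n = k$. The following minimal Python sketch, an illustration we add rather than part of the text, uses this to compute $\mu_L(I_n)$ exactly and to evaluate $S_n$ and $R_i$ at a point. The helper names are hypothetical. We pass δ as a `Fraction` so that the comparison with $|k/n - 1/2|$ is exact, which is our own design choice. Watching $\mu_L(I_n)$ shrink as n grows gives a numerical feel for Theorem 3.2.1, though of course it proves nothing.

```python
from fractions import Fraction
from math import comb

def binary_digits(x, n):
    """First n digits of the non-terminating binary expansion of x in (0, 1]."""
    x = Fraction(x)
    digits = []
    for _ in range(n):
        a = 1 if x > Fraction(1, 2) else 0
        digits.append(a)
        x = 2 * x - a
    return digits

def S(n, x):
    """S_n(x) = a_1 + ... + a_n, the number of heads in the first n outcomes."""
    return sum(binary_digits(x, n))

def R(i, x):
    """Rademacher function R_i(x) = 2 a_i - 1, for i >= 1."""
    return 2 * binary_digits(x, i)[-1] - 1

def measure_I_n(n, delta):
    """Exact mu_L(I_n), where I_n = {x in I : |S_n(x)/n - 1/2| > delta}.

    Illustrative sketch (not from the text): S_n equals k on exactly comb(n, k)
    intervals of T_n, each of length 2**-n, so I_n is a finite union of them."""
    count = sum(comb(n, k) for k in range(n + 1)
                if abs(Fraction(k, n) - Fraction(1, 2)) > delta)
    return Fraction(count, 2 ** n)

x = Fraction(1, 3)                        # 0.010101... <-> T, H, T, H, ...
print([R(i, x) for i in range(1, 7)])     # [-1, 1, -1, 1, -1, 1]
print(S(6, x))                            # 3
delta = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(measure_I_n(n, delta)))  # decreases toward 0 for these n
```

For n = 10 and δ = 1/10, the set $I_n$ consists of the intervals with $S_{10} \le 3$ or $S_{10} \ge 7$, giving $\mu_L(I_{10}) = 352/1024 = 11/32$.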
We plot some of these functions in Fig. 3.2. $R_i$ has a useful interpretation. Suppose we bet on a sequence of coin tosses such that at each toss, we win $1 if it is heads and lose $1 if it is tails. Then $R_i(x)$ is the amount won or lost at the ith toss in the sequence of tosses represented by x. The next random variable is:

Definition 3.2.5. We define
$$W_n(x) = \sum_{i=1}^{n} R_i(x).$$

Following the interpretation of $R_i$, $W_n$ gives the total amount won or lost after the nth toss in the betting game described above. By the definition of $R_i$,
$$W_n(x) = 2(a_1 + a_2 + \cdots + a_n) - n = 2S_n(x) - n, \qquad x = 0.a_1a_2a_3\cdots.$$
Now, taking δ = ε in (3.2),
$$\left|\frac{S_n(x)}{n} - \frac{1}{2}\right| > \varepsilon \iff |2S_n(x) - n| > 2\varepsilon n,$$
or in other words, if and only if
$$|W_n(x)| > 2\varepsilon n. \tag{3.4}$$
Note that since ε is arbitrary, the factor 2 is immaterial.

Definition 3.2.6. We define
$$A_n = \{x \in I : |W_n(x)| > n\varepsilon\}.$$
We can prove Theorem 3.2.1 by showing that
$$\mu_L(A_n) \to 0 \quad \text{as } n \to \infty. \tag{3.5}$$
To do this, we use a special version of an important result.

Theorem 3.2.2 (Special Case of Chebyshev's Inequality). Let f be a non-negative, piecewise constant function on I and α > 0 be a positive real number. Then,
$$\mu_L(\{x \in I : f(x) > \alpha\}) \le \frac{1}{\alpha} \int_0^1 f(x)\,dx,$$
where the integral is the standard Riemann integral, which is well defined for piecewise constant, non-negative functions. We illustrate the theorem in Fig. 3.3.

Figure 3.3. We illustrate a typical set in Chebyshev's inequality.

Proof. [Theorem 3.2.2] Since f is piecewise constant, there is a mesh $0 = x_1 < x_2 < \cdots < x_n = 1$ such that $f(x) = c_i$ for $x_i < x \le x_{i+1}$, $1 \le i \le n-1$. Then, since f is non-negative,
$$\int_0^1 f(x)\,dx = \sum_{i=1}^{n-1} c_i (x_{i+1} - x_i) \ge \sum_{\substack{i=1\\ c_i > \alpha}}^{n-1} c_i (x_{i+1} - x_i) \ge \alpha \sum_{\substack{i=1\\ c_i > \alpha}}^{n-1} (x_{i+1} - x_i) = \alpha\, \mu_L(\{x \in I : f(x) > \alpha\}).$$

Now we are ready to prove Theorem 3.2.1.

Proof.
We can also describe the set $A_n$ as
$$A_n = \{x \in I : W_n^2(x) > n^2\varepsilon^2\},$$
where $W_n^2(x)$ is piecewise constant and non-negative. By Theorem 3.2.2,
$$\mu_L(A_n) \le \frac{1}{n^2\varepsilon^2} \int_0^1 W_n^2(x)\,dx.$$
We compute
$$\int_0^1 W_n^2(x)\,dx = \int_0^1 \Big(\sum_{i=1}^{n} R_i(x)\Big)^2 dx = \sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx + \sum_{\substack{i,j=1\\ i \neq j}}^{n} \int_0^1 R_i(x)R_j(x)\,dx.$$
The first sum on the right is easy since $R_i^2(x) = 1$ for all x, so
$$\sum_{i=1}^{n} \int_0^1 R_i^2(x)\,dx = n.$$
We consider $\int_0^1 R_i(x)R_j(x)\,dx$ when $i \neq j$. Without loss of generality, we assume i < j. Let J be an interval of the form
$$J = \left(\frac{\ell}{2^i}, \frac{\ell+1}{2^i}\right], \qquad 0 \le \ell < 2^i.$$
$R_i$ is constant on J, while $R_j$ alternates between −1 and +1 on the $2^{j-i}$ subintervals of J of length $2^{-j}$. Because this is an even number of subintervals, the contributions cancel:
$$\int_J R_i(x)R_j(x)\,dx = R_i|_J \int_J R_j(x)\,dx = 0.$$
Summing over the $2^i$ intervals J that make up I, we obtain
$$\int_0^1 R_i(x)R_j(x)\,dx = 0, \qquad i \neq j.$$
Thus, $\int_0^1 W_n^2(x)\,dx = n$, and
$$\mu_L(A_n) \le \frac{n}{n^2\varepsilon^2} = \frac{1}{n\varepsilon^2} \to 0 \quad \text{as } n \to \infty.$$
This proves (3.5), and hence Theorem 3.2.1.

The random variables introduced for this proof can be used to quantify other interesting questions.

Example 3.2.1. Suppose that in the betting game above, we start with M dollars. We compute an expression that yields the probability that we lose all the money. If $A_n$ is the event where we lose the money on the nth toss, then the corresponding set of numbers is
$$I_{A_n} = \{x \in I : W_i(x) > -M \text{ for } i < n \text{ and } W_n(x) = -M\}.$$
The set $I_{A_n}$, determined by where a function has prescribed values, is generally complicated. The event A of losing all the money, given by
$$I_A = \bigcup_{n=1}^{\infty} I_{A_n},$$
is even more complicated. The probability of A is $\mu_L(I_A)$, once we figure out how that is computed.

3.2.1 Numerical simulation
References
Exercises

3.3 Sets of measure zero

Theorem 3.2.1 states that the size of the event consisting of Bernoulli sequences for a fair coin for which the relative frequency of H's in the first n trials differs from 1/2 by more than a fixed amount tends to 0 as n → ∞. But this leaves open the question: for a fair coin and a “typical” x, does
$$\lim_{n \to \infty} \frac{S_n(x)}{n} = \frac{1}{2}? \tag{3.6}$$
This is an important question from the point of view of numerical simulation, as it is quite common to have only one numerical sequence, corresponding to a single choice of x, in hand. Can we reliably use the computed example to approximate the answers to statistical questions?

Definition 3.3.1. The set of normal numbers in I is
$$N = \left\{x \in I : \frac{S_n(x)}{n} \to \frac{1}{2} \text{ as } n \to \infty\right\}.$$

Another way to state the intuition behind the Law of Large Numbers is that non-normal numbers should be atypical in some sense.

Definition 3.3.2. An event in B is atypical if it has probability zero, or equivalently, if the corresponding set of real numbers has Lebesgue measure 0.

Thus, the intuition behind the Law of Large Numbers is that $N^c$ should have Lebesgue measure zero. In this section, we characterize sets with Lebesgue measure zero.

We noted above that the Lebesgue measure of a single point is zero. It follows immediately from finite additivity that finite collections of points also have Lebesgue measure zero. Infinite collections are apparently more complicated. For example, I is the uncountable union of single points and does not have Lebesgue measure zero.

Working from the assumptions about measure we have made so far, we develop a general method for characterizing sets with Lebesgue measure zero. In doing so, we actually motivate several key aspects of measure theory. The characterization is based on a fundamentally important concept for metric spaces.

Definition 3.3.3. Given a subset $A \subset \mathbb{R}^n$, a countable cover of A is a countable collection of sets $\{A_i\}_{i=1}^{\infty}$ in $\mathbb{R}^n$ such that $A \subset \bigcup_{i=1}^{\infty} A_i$. If the sets in a countable cover are open, we call it an open cover.

We emphasize that the requirement of being countable is important.

Definition 3.3.4. A set $A \subset \mathbb{R}$ has Lebesgue measure zero if for every ε > 0, there is a countable cover $\{A_i\}_{i=1}^{\infty}$ of A, where each $A_i$ consists of a finite union of open intervals, such that
$$\sum_{i=1}^{\infty} \mu_L(A_i) < \varepsilon.$$
We also say that A has measure zero.

Note that because each $A_i$ in the countable cover consists of a finite union of open intervals, its Lebesgue measure is computable. In this way, we sidestep the issue of computing $\mu_L(A)$ directly. This definition also uses (implicitly) another property of Lebesgue measure:

Definition 3.3.5. If $(c, d) \subset (a, b)$, then $\mu_L((c, d)) \le \mu_L((a, b))$. We say that Lebesgue measure is monotone.

We could use half-open or closed intervals in the definition instead of open intervals, but open intervals turn out to be convenient for “compactness” arguments.

Example 3.3.1. We show that a closed interval [a, b] with a ≠ b cannot have measure zero. If [a, b] is covered by countably many open intervals, we can extract a finite number that cover [a, b] (a finite subcover), because [a, b] is compact. The sum of the lengths of these intervals must be at least b − a, so no cover has total length less than ε when ε ≤ b − a.

We describe some sets of measure zero.

Theorem 3.3.1.
1. A measurable subset of a set of measure zero has measure zero.
2. If $\{A_i\}_{i=1}^{\infty}$ is a countable collection of sets of measure zero, then $\bigcup_{i=1}^{\infty} A_i$ has measure zero.
3. Any finite or countable set of numbers has measure zero.

Item 2 states that a countable union of sets of measure zero is a set of measure zero. In contrast, uncountable unions of sets of measure zero can have nonzero measure. The assumption in item 1 that the subset of the set of measure zero is measurable is an important point that we address in later chapters.

Proof.
Result 1. This follows from the definition, since any countable cover of the larger set is also a cover of the smaller set.
Result 2. We choose ε > 0. Since $A_n$ has measure zero, there is a countable collection of open intervals $B_{n,1}, B_{n,2}, \ldots$ covering $A_n$ with
$$\sum_{i=1}^{\infty} \mu_L(B_{n,i}) < \frac{\varepsilon}{2^n}.$$
The collection $\{B_{n,i}\}_{n,i=1}^{\infty}$ is countable and covers $\bigcup_{n=1}^{\infty} A_n$. Moreover,
$$\sum_{n,i=1}^{\infty} \mu_L(B_{n,i}) = \sum_{n=1}^{\infty} \sum_{i=1}^{\infty} \mu_L(B_{n,i}) < \sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \varepsilon.$$
Note that we use non-negativity to rearrange the double sum as an iterated sum in this argument.
Result 3. This follows from 2 and the observation that a point has measure zero.

Proof Comment 3.2. This is a classic measure theory argument that the reader should study until it is familiar.

A natural question is whether there are any interesting sets of measure zero. We next show that there are uncountable sets of measure zero. In particular, we describe the construction of a special example that is used frequently in measure theory. The set is constructed by an iterative process.

Definition 3.3.6.
Step 1 Beginning with the unit interval $F_0 = [0, 1]$, divide $F_0$ into 3 equal parts and remove the middle third open interval (1/3, 2/3) to get
$$F_1 = \left[0, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, 1\right].$$
See Fig. 3.4.

Figure 3.4. The first step in the construction of the Cantor set.

Step 2 Working on $F_1$ next, divide each of its two pieces into equal thirds and remove the middle open intervals from the divisions to get $F_2$:
$$F_2 = \left[0, \tfrac{1}{9}\right] \cup \left[\tfrac{2}{9}, \tfrac{1}{3}\right] \cup \left[\tfrac{2}{3}, \tfrac{7}{9}\right] \cup \left[\tfrac{8}{9}, 1\right].$$
This has $2^2$ closed intervals of length $3^{-2}$, see Fig. 3.5.

Figure 3.5. The second step in the construction of the Cantor set.

Step i Divide each of the $2^{i-1}$ pieces remaining after step i − 1 into equal thirds and remove the middle open piece from each to get $F_i$. $F_i$ has $2^i$ closed intervals of length $3^{-i}$.

End result This procedure yields a sequence of closed sets $\{F_i\}$, where each $F_i$ is a finite union of $2^i$ closed intervals of length $3^{-i}$. The Cantor (Middle Third) Set C is defined as
$$C = \bigcap_{i=1}^{\infty} F_i.$$

Theorem 3.3.2. Let C be the Cantor set in $\mathbb{R}$. Then,
1. C is closed.
2. Every point in C is a limit of a sequence of points in C distinct from it.
3. C has measure zero.
4. C is uncountable.

Proof.
Result 1 Exercise.
Result 2 Exercise.
Result 3 C is contained in $F_i$ for any i.
Since $F_i$ is a union of disjoint intervals whose lengths sum to $(2/3)^i$, and for any ε > 0, $(2/3)^i < \varepsilon$ for all sufficiently large i, C has measure zero. (Enlarging each closed interval slightly to an open one increases the total length by an arbitrarily small amount.)
Result 4 We show that every point x ∈ C can be represented uniquely by a series of the form
$$x = \sum_{i=1}^{\infty} \frac{a_i}{3^i}, \qquad a_i = 0 \text{ or } 2.$$
This can be recognized as a base 3 (ternary) expansion. To show uniqueness, suppose
$$\sum_{i=1}^{\infty} \frac{a_i}{3^i} = \sum_{i=1}^{\infty} \frac{b_i}{3^i}$$
for $a_i, b_i = 0$ or 2; we show that $a_i = b_i$ for all i. Suppose $a_i \neq b_i$ for some i. Let n be the smallest index with $a_n \neq b_n$, so $|a_n - b_n| = 2$. Since $|a_i - b_i| \le 2$ for all i,
$$0 = \left|\sum_{i=1}^{\infty} \frac{a_i - b_i}{3^i}\right| = \left|\sum_{i=n}^{\infty} \frac{a_i - b_i}{3^i}\right| \ge \frac{1}{3^n}\left(|a_n - b_n| - \sum_{i=n+1}^{\infty} \frac{|a_i - b_i|}{3^{i-n}}\right) \ge \frac{1}{3^n}\left(2 - \sum_{i=1}^{\infty} \frac{2}{3^i}\right) = \frac{1}{3^n}.$$
This is a contradiction, and so every number in C has a unique base 3 expansion with digits 0 and 2.

Now let $\{G_{i,j} : j = 1, 2, \ldots, 2^{i-1}\}$ be the open “middle third” intervals removed to obtain $F_i$. Then a number given by the base 3 expansion $0.b_1b_2b_3\ldots$, $b_k = 0, 1, 2$, is in $G_{i,j}$ for some j if and only if:
• $b_k = 0$ or 2 for each k < i, because it is in $F_{i-1}$;
• $b_i = 1$, because it is in one of the intervals discarded at this stage;
• the digits $b_k$, k > i, are neither all 0 nor all 2, because the endpoints of $G_{i,j}$ are not removed.
It is a good exercise to use a variation of the Cantor diagonal argument to show that C is uncountable.

To give some idea of the importance of the concept of sets of measure zero, we quote a beautiful result of Lebesgue that states “if and only if” conditions for a function to be Riemann integrable. Recall that two aspects of Riemann integration provided significant impetus to the development of measure theory. First, there was a long search for conditions on a function that are equivalent to its being Riemann integrable. Second, the Riemann integral has some annoying “flaws”. We provide a theory for Riemann integration and discuss these issues in Appendix A. Here, we simply quote one of the most important results.
To explain the idea, we begin with a canonical example. First,

Definition 3.3.7. A property of points that holds except on a set of measure zero is said to hold almost everywhere (a.e.). We say that almost all points in a set have a property if all the points except those in a set of measure zero have the property.

Now, the example.

Definition 3.3.8. Dirichlet's function is defined by
$$D(x) = \begin{cases} 1, & \text{if } x \in \mathbb{Q},\\ 0, & \text{if } x \notin \mathbb{Q}. \end{cases}$$

From the definition, D is a bounded function, and D(x) = 0 a.e. because $\mathbb{Q}$ is countable and therefore has measure zero by Theorem 3.3.1. It is a simple exercise to show that D is discontinuous at every point in I, and therefore D is not continuous a.e. We prove the following result in Appendix A.

Theorem 3.3.3 (Lebesgue's Theorem on Riemann Integration). A bounded function is Riemann integrable on a closed interval if and only if it is continuous a.e. on the interval.

References
Exercises

3.4 The Strong Law of Large Numbers

We return to analyzing the set of normal numbers N.

Theorem 3.4.1 (Strong Law of Large Numbers for Bernoulli Sequences). $N^c$ is an uncountable set with Lebesgue measure zero.

Unlike the Weak Law of Large Numbers, Theorem 3.2.1, this theorem is a statement that requires measure theory. This version of the Law of Large Numbers is called strong because Theorem 3.4.1 implies Theorem 3.2.1. This is a consequence of a general result on different kinds of convergence that we prove later on.

Proof. We first show that $N^c$ is uncountable and contains a “Cantor-like” set. Consider the map f : I → I,
$$f(x) = 0.a_1 11 a_2 11 a_3 11\ldots, \qquad \text{for } x = 0.a_1a_2a_3\ldots.$$
The map is 1−1, so its image is uncountable. Moreover, f(I) is contained in $N^c$. In fact, if y = f(x), then $S_{3n}(y) \ge 2n$, and
$$\frac{S_{3n}(y)}{3n} \ge \frac{2}{3}.$$
Such y's clearly violate the Law of Large Numbers. The image set f(I) is Cantor-like in that it is the intersection of a nested sequence of sets, each consisting of a finite number of well-separated, disjoint intervals.
We cover the complicated set $N^c$ using a countable cover of much simpler sets. Recall the set $A_n = \{x \in I : |W_n(x)| > \varepsilon n\}$ used in the proof of the Weak Law of Large Numbers. We use an equivalent definition,
$$A_n = \{x \in I : W_n^4(x) > \varepsilon^4 n^4\}.$$
By Chebyshev's Inequality, Theorem 3.2.2,
$$\mu_L(A_n) \le \frac{1}{\varepsilon^4 n^4} \int_0^1 W_n^4\,dx = \frac{1}{\varepsilon^4 n^4} \int_0^1 \Big(\sum_{i=1}^{n} R_i\Big)^4 dx.$$
Expanding the integrand yields 5 kinds of terms:
1. $R_i^4$ for i = 1, ..., n.
2. $R_i^2R_j^2$ for i ≠ j.
3. $R_i^2R_jR_k$ for i, j, k distinct.
4. $R_i^3R_j$ for i ≠ j.
5. $R_iR_jR_kR_l$ for i, j, k, l distinct.

Since $R_i^4(x) = 1$ and $R_i^2(x)R_j^2(x) = 1$ for all i, j,
$$\int_0^1 R_i^4\,dx = \int_0^1 R_i^2R_j^2\,dx = 1.$$
We show that the other terms integrate to zero because of cancellation. Two follow from the proof of the Weak Law of Large Numbers, since $R_i^2 = 1$ and $R_i^3 = R_i$:
$$\int_0^1 R_i^2R_jR_k\,dx = \int_0^1 R_jR_k\,dx = 0, \quad j \neq k, \qquad \int_0^1 R_i^3R_j\,dx = \int_0^1 R_iR_j\,dx = 0, \quad i \neq j.$$
Finally, assume i < j < k < l, and consider an interval of the form
$$J = \left(\frac{m}{2^k}, \frac{m+1}{2^k}\right].$$
$R_iR_jR_k$ is constant on J. However, $R_l$ alternates sign on the $2^{l-k}$ subintervals of J of length $2^{-l}$, an even number, so the integral over each such J vanishes, and summing over these intervals gives
$$\int_0^1 R_iR_jR_kR_l\,dx = 0.$$
There are n terms of the first kind and 3n(n − 1) terms of the second kind, so
$$\int_0^1 W_n^4(x)\,dx = n + 3n(n-1) = 3n^2 - 2n \le 3n^2,$$
and
$$\mu_L(A_n) \le \frac{3}{n^2\varepsilon^4}.$$

We cover $N^c$ using a collection of sets of the form $A_n$ with increasing n and decreasing ε, chosen in such a way that the cover has arbitrarily small measure. For a constant C, set $\varepsilon_n^4 = C n^{-1/2}$, so
$$\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} = \frac{3}{C} \sum_{n=1}^{\infty} \frac{1}{n^{3/2}}.$$
The last series converges, and the quantity can be made smaller than any δ > 0 by choosing C sufficiently large. Hence, given δ > 0, there is a sequence $\{\varepsilon_n\}$ with $\varepsilon_n \to 0$ such that
$$\sum_{n=1}^{\infty} \frac{3}{\varepsilon_n^4 n^2} \le \delta.$$
For each n, set
$$\tilde{A}_n = \{x \in I : |W_n(x)| > \varepsilon_n n\}.$$
Note that $\tilde{A}_n$ is a finite union of intervals, since $W_n$ is piecewise constant. We have
$$\mu_L(\tilde{A}_n) \le \frac{3}{\varepsilon_n^4 n^2} \qquad \text{and} \qquad \sum_{n=1}^{\infty} \mu_L(\tilde{A}_n) \le \delta.$$
If we show that $N^c \subset \bigcup_{n=1}^{\infty} \tilde{A}_n$, then we are done.
This holds if $N \supset \bigcap_{n=1}^{\infty} \tilde{A}_n^c$. If $x \in \bigcap_{n=1}^{\infty} \tilde{A}_n^c$, then for each n, $|W_n(x)| \le \varepsilon_n n$, or
$$\frac{|W_n(x)|}{n} \le \varepsilon_n.$$
Since $\varepsilon_n \to 0$, we get $|W_n(x)|/n \to 0$. Because $W_n = 2S_n - n$, this means $S_n(x)/n \to 1/2$, so x ∈ N.

The proof of Theorem 3.4.1 can be used to draw stronger conclusions. For example, it can be extended to show that for almost every x, every finite block of k binary digits occurs with limiting frequency $2^{-k}$, so that no finite sequence of digits occurs more frequently than any other finite sequence of the same length.

3.4.1 Numerical simulation

3.5 A second wish list for measure theory

With some informal experience with measure theory ideas, we make a second attempt at a wish list of desirable properties for a measure theory. We are considering the measure on $\mathbb{R}^n$ that extends the standard notions of length, area, and volume. If $E \subset \mathbb{R}^n$ for some n, let µ(E) denote its “measure”.

1. µ should be a non-negative set function from sets in $\mathbb{R}^n$ into the extended reals $\mathbb{R} \cup \{\infty\}$, with µ({x}) = 0 for a single point. It should be possible to have µ(A) = ∞ for unbounded sets.
2. In $\mathbb{R}$, we should have µ([a, b]) = b − a. In $\mathbb{R}^n$, we should have
$$\mu(Q) = (b_1 - a_1)(b_2 - a_2)\cdots(b_n - a_n)$$
for generalized rectangles (multi-intervals),
$$Q = \{x \in \mathbb{R}^n : a_i \le x_i \le b_i,\ 1 \le i \le n\}.$$
3. If $\{A_1, A_2, \ldots, A_n\}$ are disjoint sets, then
$$\mu(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} \mu(A_i).$$
What about infinite collections? Well, µ({x}) = 0. But in $\mathbb{R}$,
$$(0, 1) = \bigcup_{x \in (0,1)} \{x\}.$$
This is a problem because we cannot have
$$1 = \mu((0,1)) = \mu\Big(\bigcup_{x \in (0,1)} \{x\}\Big) = \sum_{x \in (0,1)} \mu(\{x\}) = 0.$$
So uncountable collections of sets are a problem, and we avoid them. What about countable collections? Countable disjoint collections of sets of measure zero should have measure zero. Also,
$$(0, 1] = \left(\tfrac{1}{2}, 1\right] \cup \left(\tfrac{1}{3}, \tfrac{1}{2}\right] \cup \left(\tfrac{1}{4}, \tfrac{1}{3}\right] \cup \cdots,$$
and
$$1 = \mu((0,1]) = \left(1 - \tfrac{1}{2}\right) + \left(\tfrac{1}{2} - \tfrac{1}{3}\right) + \left(\tfrac{1}{3} - \tfrac{1}{4}\right) + \cdots = \mu\left(\left(\tfrac{1}{2}, 1\right]\right) + \mu\left(\left(\tfrac{1}{3}, \tfrac{1}{2}\right]\right) + \mu\left(\left(\tfrac{1}{4}, \tfrac{1}{3}\right]\right) + \cdots.$$
So we would like to say that if $\{A_i\}$ is a countable collection of disjoint sets, then
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i).$$
4. If A ⊂ B are sets, then µ(A) ≤ µ(B); that is, µ should be “monotone”.
5. For the standard “volume” measure on $\mathbb{R}^n$, if a set A is obtained from another set B by rotation, translation, or reflection maps, then µ(A) = µ(B).

It turns out that we cannot construct a measure that satisfies all of these properties. We have to give up something, so we do not require that the measure be defined on all subsets of $\mathbb{R}^n$. We settle for a measure defined on a class of subsets.

References
Exercises