
PROBABILITY AS A NORMALIZED MEASURE

A. BEJAN†

Abstract. Axiomatic foundations of probability theory, based on the notions of a σ-algebra (σ-field) and of normalized measures on this structure, not only give a formal basis for the development of the theory but also allow one to eliminate some contradictions that might otherwise lead to catastrophic consequences.

“Probability is a degree of certainty, and differs from certainty as a part from a whole.”
J. Bernoulli, Ars Conjectandi (1713)

Date: February 2006, HWU, young researchers' seminar meeting.

1. Elementary Probability Theory

1.1. Space of elementary events. Events. Operations in the class of events.

Ω is the space of elementary events, or sample space. It contains all possible outcomes of the experiment. The result of the experiment, however, is just one of these outcomes. The elements of Ω are also called sample points.

Any subset of Ω, being a set of some possible outcomes, can naturally be called an event: if $A \subseteq \Omega$, then A is an event. We say that the event A has occurred if the outcome of the experiment is contained in A. ∅ is the impossible event and Ω is the certain event.

Events are sets. For any given events, their unions, intersections, complements, differences and symmetric differences are again events:
$A \cup B$, $A \cap B$, $A \setminus B$, $\bar{A} = A^c = \Omega \setminus A$, $A \triangle B$.
(As we shall see later, this is not actually true: not every subset of Ω may be regarded as an event! That is why we need the whole σ-zoo.)

Events A and B are said to be inconsistent (mutually exclusive) if $A \cap B = \emptyset$. If $A \subseteq B \subseteq \Omega$, then B occurs whenever A occurs. One says that B follows from A.

Example 1. Write in set-theoretic terms the event D that exactly two of the events A, B, C occur.
D =
Try the events A = {2, 3, 5}, B = {2, 4, 6}, C = {1, 3, 5} to check your formula. (B and C are the sets of even and odd numbers between 1 and 6; what is A?)

1.2. Discrete sample space. This is nothing but a sample space which is at most countable: $|\Omega| = \operatorname{Card}\Omega \le \aleph_0$, i.e. $\Omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$.

Definition 1.1.
Any function $p : \Omega \to [0, 1]$ which satisfies the condition
$$\sum_{\omega_i \in \Omega} p(\omega_i) = 1$$
is called a probability function. The value $p(\omega_i)$ is called the probability of the elementary event $\omega_i$. The probability of an event $A \subseteq \Omega$ is understood to be the quantity
$$P(A) \stackrel{\text{def}}{=} \sum_{\omega_i \in A} p(\omega_i).$$

Example 2. Let $\Omega = \{0, 1\}^n = \{\omega = (\delta_1, \delta_2, \ldots, \delta_n) \mid \delta_i \in \{0, 1\}\}$. Let $\Sigma_\omega = \delta_1 + \ldots + \delta_n$ and put
$$p(\omega) \stackrel{\text{def}}{=} \gamma^{\Sigma_\omega} (1 - \gamma)^{n - \Sigma_\omega}$$
for some $\gamma \in [0, 1]$. It is a matter of checking that p(ω) defined above is a probability function on Ω. (... no space is left for this exercise, why?)

Example 3. Try to find a probability function on $\Omega = \{0, 1\}^{\aleph_0}$ by assigning probability values to each elementary event ω from Ω. See the thorough discussion of this example in [4], Example 1.10. (And this is a second argument for why we are moving towards the σ-zoo.)

Remark 1. Note that by summation one can assign (define) probabilities only to events that are at most countable! Otherwise the notion of summation is not defined.

1.3. Geometric probability. The notion of geometric probability can be introduced as follows. Suppose the outcomes of the experiment can be put into one-to-one correspondence with the points of some region Ω in $\mathbb{R}^n$, so that the probability for the point-outcome to lie in a part $A \subseteq \Omega$ does not depend on the shape of A or its position in Ω, but depends only on the measure of A, and hence is proportional to this measure:
$$P(\text{“}\cdot\text{”} \in A) = \frac{\mu(A)}{\mu(\Omega)},$$
where µ(·) is just the geometric measure of a region: length, area, volume, etc.

Example 4. Close your eyes, take a needle and mark some point on [0, 1] with it. Suppose that the nib of your needle has no diameter: it is a perfect point. What is the probability that you marked 0.29? At the same time, this event is not impossible: it is one of the elementary events of this experiment!

Example 5.
Pete and John have agreed to meet in the city centre on Friday between 19:00 and 20:00. Each of them has decided to wait for the other only 10 minutes and then, in any case, to leave for a pub. What is the probability that they meet between 19:00 and 20:00? (Answer: 11/36.)

(On Example 4: “A point is that of which there is no part.” Euclid)

... ...

Non-uniform geometric probability ≡ physical notion (density, center of mass).

2. “Non-elementary” Probability Theory

Above we saw that it is not always possible to assign probabilities to sample points. Furthermore, some subsets of Ω may not be measurable. There is a way out: specify the subsets that are going to be considered as events, and then define probability only on this set of events.

2.1. σ-algebra of a sample space.

Definition 2.1. A family $\mathcal{F} \subseteq 2^\Omega$ is called a σ-algebra of Ω if the following conditions are satisfied:
(1) $\Omega \in \mathcal{F}$;
(2) $A \in \mathcal{F} \Rightarrow \bar{A} \in \mathcal{F}$;
(3) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.
These are the axioms. One can check that they suffice for $\mathcal{F}$ to be closed under the other basic set operations (difference, symmetric difference and intersection). (Note again: not all subsets of a geometric region are measurable in this sense; see Example 10.)

Remark 2. The first axiom can be replaced by the requirement that $\mathcal{F}$ is not empty: if $A \in \mathcal{F}$, then $\Omega = A \cup \bar{A} \in \mathcal{F}$.

Property 1. $\emptyset \in \mathcal{F}$.

Property 2. If $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$.
Proof: $\bigcap_{i=1}^{\infty} A_i = \overline{\bigcup_{i=1}^{\infty} \bar{A}_i} \in \mathcal{F}$.

Property 3. If $A, B \in \mathcal{F}$, then $A \setminus B \in \mathcal{F}$.
Proof: $A \setminus B = A \cap \bar{B} \in \mathcal{F}$.

Theorem 2.2. The intersection $\bigcap_{\lambda \in \Lambda} \mathcal{F}_\lambda$ of any (countable or uncountable) family of σ-algebras $\{\mathcal{F}_\lambda\}_{\lambda \in \Lambda}$ over a set Ω is again a σ-algebra over Ω. It is contained in every $\mathcal{F}_\lambda$, and it is the largest σ-algebra with this property.

Definition 2.3. Let $\mathcal{G} \subseteq 2^\Omega$. The intersection of all σ-algebras containing $\mathcal{G}$ (there is at least one, namely $2^\Omega$) is the smallest σ-algebra containing all elements of $\mathcal{G}$. It is called the σ-algebra generated by $\mathcal{G}$ and is denoted by $\sigma(\mathcal{G})$.

Example 6. Let Ω = {1, 2, 3, 4, 5, 6}.
The following families of subsets of Ω are σ-algebras:
(1) $\mathcal{F} = \{\emptyset, \Omega\}$, the so-called trivial σ-algebra;
(2) $\mathcal{F} = \{\emptyset, \Omega, \{1\}, \overline{\{1\}}\}$;
(3) $\mathcal{F} = \{\emptyset, \Omega, A, \bar{A}\}$, where A is some proper subset of Ω;
(4) $\mathcal{F} = \mathcal{P}(\Omega) = 2^\Omega$.
Now find $\sigma(\mathcal{G})$ if $\mathcal{G} = \{\{3, 4\}\}$.

Example 7. Consider the following examples.
(1) Let Ω = [0, 1] and $\mathcal{G} = \{[0, \tfrac{1}{3}], [\tfrac{1}{2}, 1]\}$. What is $\sigma(\mathcal{G})$ then? (Answer: $|\sigma(\mathcal{G})| = 8$.)
(2) Let Ω be the interval (0, 1] and let $\tilde{\mathcal{F}}$ be the class of all sets of the form $(a_0, a_1] \cup (a_2, a_3] \cup \ldots \cup (a_{n-1}, a_n]$, where $0 \le a_0 \le a_1 \le \ldots \le a_n \le 1$. Show that $\tilde{\mathcal{F}}$ is not a σ-algebra.

2.2. Probability as a normalized measure.

Definition 2.4. Let Ω be some set and let $\mathcal{F}$ be a σ-algebra of its subsets. A function $\mu : \mathcal{F} \to \mathbb{R} \cup \{\infty\}$ is called a measure on $(\Omega, \mathcal{F})$ if it satisfies the following conditions:
(1) a measure is a non-negative function: $\mu(A) \ge 0$ for all $A \in \mathcal{F}$;
(2) a measure is a σ-additive function:
$$\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i) \quad \forall A_1, A_2, \ldots \in \mathcal{F} \text{ s.t. } A_i \cap A_j = \emptyset,\ i \ne j.$$

Definition 2.5. Let Ω be some set and let $\mathcal{F}$ be a σ-algebra of its subsets. A measure on $(\Omega, \mathcal{F})$ is called normalized if $\mu(\Omega) = 1$.

Another word for a normalized measure is probability, or probability measure. Definition 2.4 can be rewritten for a probability measure.

Definition 2.6. Let Ω be some set and let $\mathcal{F}$ be a σ-algebra of its subsets. A function $P : \mathcal{F} \to \mathbb{R} \cup \{\infty\}$ is called a probability measure on $(\Omega, \mathcal{F})$ if it satisfies the following conditions:
(1) it is a non-negative function: $P(A) \ge 0$ for all $A \in \mathcal{F}$;
(2) it is a σ-additive function:
$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \quad \forall A_1, A_2, \ldots \in \mathcal{F} \text{ s.t. } A_i \cap A_j = \emptyset,\ i \ne j;$$
(3) $P(\Omega) = 1$.

Definition 2.7. A triple $(\Omega, \mathcal{F}, P)$ is called a probability space if $\mathcal{F}$ is a σ-algebra on Ω and P is a probability measure on $\mathcal{F}$.

Property 4. $P(\emptyset) = 0$.
Property 5. $P\big(\bigcup_{i=1}^{n} A_i\big) = \sum_{i=1}^{n} P(A_i)$ for all $A_1, A_2, \ldots, A_n \in \mathcal{F}$ such that $A_i \cap A_j = \emptyset$, $1 \le i < j \le n$.
Property 6. $P(\bar{A}) = 1 - P(A)$.
Property 7. If $A \subseteq B$, then $P(B \setminus A) = P(B) - P(A)$.
Property 8. $0 \le P(A) \le 1$.
Property 9.
$P(B \cup A) = P(A) + P(B) - P(A \cap B)$.
Property 10. $P(B \cup A) \le P(A) + P(B)$.
Property 11. $P(A_1 \cup A_2 \cup \ldots \cup A_n) \le P(A_1) + P(A_2) + \ldots + P(A_n)$.
Property 12.
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<m} P(A_i \cap A_j \cap A_m) - \ldots + (-1)^{n-1} P(A_1 \cap A_2 \cap \ldots \cap A_n).$$
(Induction works perfectly here, though it takes some time.)

Example 8. A clerk has to put n letters into n envelopes. However, for some reason the letters were put into the envelopes chaotically. What is the probability that at least one letter has been placed into the correct envelope? What is the limit of this probability as n → ∞? (Answer: $1 - \frac{1}{2!} + \frac{1}{3!} - \ldots + (-1)^{n-1}\frac{1}{n!} \to 1 - e^{-1}$.)

Example 9. A and B play a series of games until one of them wins a game (and is declared the winner of the match). The probability that A wins each game is 0.3, and the probability that B wins each game is 0.2 (so with probability 0.5 a game has no winner). Describe a suitable probability space and σ-algebra, and find the probability that A wins the match.

2.3. Borel σ-algebra and Lebesgue measure.

Definition 2.8. The Borel σ-algebra is defined on a topological space $(\Omega, \mathcal{O})$ as $\mathcal{B} = \sigma(\mathcal{O})$. The Borel σ-algebra on $\mathbb{R}$ is $\sigma(\mathcal{C})$, where $\mathcal{C}$ is any of the following classes of sets:
(1) $\mathcal{C} = \{(a, b) \mid a \le b;\ a, b \in \mathbb{R}\}$,
(2) $\mathcal{C} = \{(a, b] \mid a \le b;\ a, b \in \mathbb{R}\}$,
(3) $\mathcal{C} = \{[a, b) \mid a \le b;\ a, b \in \mathbb{R}\}$,
(4) $\mathcal{C} = \{[a, b] \mid a \le b;\ a, b \in \mathbb{R}\}$,
(5) $\mathcal{C} = \{(-\infty, a] \mid a \in \mathbb{R}\}$,
(6) $\mathcal{C} = \{(-\infty, a) \mid a \in \mathbb{R}\}$.
The Borel σ-algebra on $\mathbb{R}$ is denoted by $\mathcal{B}(\mathbb{R})$. Note that all common, usual subsets of $\mathbb{R}$ are in $\mathcal{B}(\mathbb{R})$. In particular, $\mathbb{R}$, any interval, any one-point set, $\mathbb{N}$ and $\mathbb{Q}$ are in $\mathcal{B}(\mathbb{R})$. Question: does $\mathcal{B}(\mathbb{R})$ contain the set of all irrational numbers?

Lemma 2.9. There exists a unique measure λ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ which assigns to each interval its length. This measure is called the Lebesgue measure.

Finally, let us justify the introduction of probability on σ-algebras by considering an example of a subset of a segment whose Lebesgue measure does not exist.

Example 10.
Vitali's set. Consider the unit circle (which is essentially a segment :) and identify its points with angles in [0, 2π). Take some irrational number α. The number nα cannot be an integer for any n ∈ N (why?). Therefore, if we take any point x ∈ [0, 2π), i.e. a point on the circle, and mark all the points obtained by rotating x through the angles 2πnα, n = ±1, ±2, ..., we will never come back to x. Let $K_x$ be the set consisting of x and all these points; $K_x$ is countable for any x ∈ [0, 2π). The circle is then naturally divided into disjoint classes $\{K_x\}$. Take one and only one point from each $K_x$ and form the set $A_0$. Denote by $A_n$ the set of points obtained by rotating the set $A_0$ through the angle 2πnα, n ∈ Z. Then the union $\bigcup_{n=-\infty}^{\infty} A_n$ is nothing but the whole circle [0, 2π). Also, these sets are pairwise disjoint, so the measure of their union is just the sum of their measures.

Suppose that $A_0$ is measurable. Noting that all $A_n$ have the same measure as $A_0$ (the Lebesgue measure is invariant under rotations), we obtain a contradiction:
$$2\pi = \mu\Big(\bigcup_{n=-\infty}^{\infty} A_n\Big) = \sum_{n=-\infty}^{\infty} \mu(A_n) = \sum_{n=-\infty}^{\infty} \mu(A_0) \ne 2\pi,$$
since the last sum equals 0 if $\mu(A_0) = 0$ and ∞ if $\mu(A_0) > 0$. The assumption that the set $A_0$ is measurable leads us to a contradiction. Hence the set $A_0$ is not measurable.

Exercise 1. Find a non-measurable (in the sense of Lebesgue measure) set on R which is unbounded.

Remark 3. One may notice that the family of different classes $K_x$ is uncountable¹. This may make one doubt the existence of the set $A_0$ (recall its definition). Indeed, historically, the term non-measurable set was established in the theory after Vitali proved in 1905 a theorem stating that any Lebesgue measurable set of non-zero measure contains an uncountable subset which is not Lebesgue measurable. Vitali relied heavily on the invariance of the Lebesgue measure under parallel shifts in Euclidean space. Some time later, new constructions were proposed to show the existence of non-measurable sets (F. Bernstein, 1908; S. Ulam, 1930).
However, all the new methods essentially used the so-called axiom of uncountable choice; see [2] for more information. A long series of debates among mathematicians about the nature of the non-measurable sets obtained in this way came to an end in 1970, when R. Solovay proved that it is impossible to prove the existence of non-measurable sets without the axiom of uncountable choice (assuming that the existence of an inaccessible cardinal is consistent).

Acknowledgement. I am thankful to Tom for revising these notes. His observations and our discussion have led to some corrections and improvements. I am responsible for any mistakes, misprints or other errors you may find here.

¹ My attention was drawn to this fact by Thomas Dodd.

You may also find the exposition in [3] useful and interesting.

3. Further Reading

A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin, 1933. English translation (1950): Foundations of the Theory of Probability. Chelsea, New York.

References
1. Evans, L.C. An introduction to stochastic differential equations. Lecture Notes, version 1.2.
2. http://planetmath.org/encyclopedia/AxiomOfChoice.html
3. http://www.stats.uwaterloo.ca/~dlmcleis/s901/
4. http://www.cs.cmu.edu/~chal/Shreve/chap1.ps

† School of Mathematical and Computer Sciences, Heriot-Watt University
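Example 1 can be checked mechanically. The following minimal Python sketch assumes the six-point sample space Ω = {1, ..., 6} suggested by the sets in the example. It compares the direct definition of D (outcomes lying in exactly two of A, B, C) with one candidate set-theoretic formula, $D = (A \cap B \cap \bar{C}) \cup (A \cap \bar{B} \cap C) \cup (\bar{A} \cap B \cap C)$. The function names are illustrative only.

```python
# Sample space and events of Example 1 (Omega = {1, ..., 6} is assumed).
OMEGA = frozenset(range(1, 7))
A = frozenset({2, 3, 5})
B = frozenset({2, 4, 6})
C = frozenset({1, 3, 5})


def complement(event):
    """Complement of an event relative to OMEGA."""
    return OMEGA - event


def exactly_two_direct(a, b, c):
    """Outcomes lying in exactly two of the three events (by counting memberships)."""
    return frozenset(w for w in OMEGA if (w in a) + (w in b) + (w in c) == 2)


def exactly_two_formula(a, b, c):
    """Candidate formula: union of the three 'two in, one out' intersections."""
    return (
        (a & b & complement(c))
        | (a & complement(b) & c)
        | (complement(a) & b & c)
    )


if __name__ == "__main__":
    print(sorted(exactly_two_direct(A, B, C)))  # [2, 3, 5]
    print(exactly_two_direct(A, B, C) == exactly_two_formula(A, B, C))  # True
```

Checking one triple of events does not prove the formula, but counting memberships is a convenient independent reference for testing any candidate expression.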

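Example 2 leaves the normalization check to the reader. The underlying argument is the binomial theorem: summing $\gamma^{\Sigma_\omega}(1-\gamma)^{n-\Sigma_\omega}$ over all ω gives $(\gamma + (1 - \gamma))^n = 1$. A quick numerical sanity check, which is not a proof, enumerates $\{0, 1\}^n$ for small n and sums p(ω) in exact rational arithmetic. The helper names `p` and `total_mass` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def p(omega, gamma):
    """Probability of omega = (delta_1, ..., delta_n) as defined in Example 2."""
    n = len(omega)
    s = sum(omega)  # Sigma_omega: the number of ones in omega
    return gamma ** s * (1 - gamma) ** (n - s)


def total_mass(n, gamma):
    """Sum of p(omega) over the whole sample space {0, 1}^n."""
    return sum(p(omega, gamma) for omega in product((0, 1), repeat=n))


if __name__ == "__main__":
    for n in range(1, 6):
        for gamma in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
            assert total_mass(n, gamma) == 1
    print("p sums to 1 for every tested n and gamma")
```

Exact `Fraction` arithmetic is used so that the comparison with 1 is not affected by floating-point rounding.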

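For Example 5, the geometric-probability formula $P = \mu(A)/\mu(\Omega)$ applies with Ω the unit square of arrival times (in hours after 19:00) and A the strip $|x - y| \le 1/6$. The complement of A consists of two triangles with total area $(5/6)^2$, so $P = 1 - 25/36 = 11/36$. The sketch below assumes, as the geometric model implies, uniform and independent arrival times. It compares the exact value with a Monte Carlo estimate. The function names and the seed are illustrative.

```python
import random
from fractions import Fraction

WAIT = Fraction(1, 6)  # 10 minutes, measured in hours


def meeting_probability_exact(wait=WAIT):
    """Area of {(x, y) in [0, 1]^2 : |x - y| <= wait}; the square has area 1."""
    return 1 - (1 - wait) ** 2


def meeting_probability_mc(trials=200_000, wait=float(WAIT), seed=0):
    """Monte Carlo estimate: sample two uniform arrival times and count meetings."""
    rng = random.Random(seed)
    hits = sum(abs(rng.random() - rng.random()) <= wait for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(meeting_probability_exact())  # 11/36
    print(meeting_probability_mc())     # an estimate close to 11/36
```

The same sampling idea works for any region whose membership can be tested, which is the practical content of the formula $\mu(A)/\mu(\Omega)$.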

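Definition 2.1 and Definition 2.3 can be made concrete on a finite Ω. For a finite Ω, closure under countable unions reduces to closure under pairwise unions, because only finitely many distinct sets exist. The sketch below rests on that assumption and is not meant for infinite spaces. It checks the axioms and computes $\sigma(\mathcal{G})$ by repeatedly adding complements and pairwise unions until nothing new appears. The function names are hypothetical.

```python
from itertools import combinations


def is_sigma_algebra(family, omega):
    """Check Definition 2.1 for a family of subsets of a FINITE set omega.

    For finite omega, closure under countable unions is equivalent to
    closure under pairwise unions, so only pairs are checked.
    """
    family = {frozenset(s) for s in family}
    omega = frozenset(omega)
    if omega not in family:
        return False
    if any(omega - a not in family for a in family):
        return False
    return all(a | b in family for a, b in combinations(family, 2))


def generated_sigma_algebra(generators, omega):
    """Smallest sigma-algebra containing the generators (finite omega only).

    Start from the generators plus omega, then add complements and pairwise
    unions until the family stops growing.
    """
    omega = frozenset(omega)
    family = {frozenset(g) for g in generators} | {omega}
    while True:
        new = {omega - a for a in family}
        new |= {a | b for a, b in combinations(family, 2)}
        if new <= family:
            return family
        family |= new


if __name__ == "__main__":
    omega = range(1, 7)
    print(is_sigma_algebra([set(), set(omega), {1}, {2, 3, 4, 5, 6}], omega))  # True
    sigma = generated_sigma_algebra([{3, 4}], omega)
    print(sorted(sorted(s) for s in sigma))
    # [[], [1, 2, 3, 4, 5, 6], [1, 2, 5, 6], [3, 4]]
```

The closure loop returns the smallest family that contains $\mathcal{G}$ and is closed under complement and union. Every σ-algebra containing $\mathcal{G}$ must contain this family, which matches the "intersection of all σ-algebras" description.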

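The answer $|\sigma(\mathcal{G})| = 8$ in Example 7(1) can be checked by counting atoms. For finitely many generators, $\sigma(\mathcal{G})$ consists of all unions of its atoms. The atoms are the nonempty sets of points sharing the same membership pattern, so $|\sigma(\mathcal{G})| = 2^{\#\text{atoms}}$. For intervals, membership is constant strictly between consecutive endpoints, so it suffices to test the endpoints and the midpoints between them. The sketch encodes an interval as a hypothetical tuple `(left, right, left_closed, right_closed)`. This representation is an assumption made for illustration.

```python
from fractions import Fraction


def interval_contains(interval, x):
    """interval = (left, right, left_closed, right_closed); test whether x lies in it."""
    left, right, left_closed, right_closed = interval
    above_left = left < x or (left_closed and x == left)
    below_right = x < right or (right_closed and x == right)
    return above_left and below_right


def count_atoms(generators, omega):
    """Number of atoms of sigma(generators) for finitely many intervals inside omega.

    Membership in every generator is constant between consecutive endpoints,
    so testing the endpoints and the midpoints between them finds every
    membership pattern that actually occurs.
    """
    points = {omega[0], omega[1]}
    for g in generators:
        points |= {g[0], g[1]}
    pts = sorted(q for q in points if omega[0] <= q <= omega[1])
    candidates = pts + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    patterns = {
        tuple(interval_contains(g, x) for g in generators)
        for x in candidates
        if interval_contains(omega, x)
    }
    return len(patterns)


# Example 7(1): Omega = [0, 1], G = {[0, 1/3], [1/2, 1]}.
OMEGA = (Fraction(0), Fraction(1), True, True)
G1 = (Fraction(0), Fraction(1, 3), True, True)
G2 = (Fraction(1, 2), Fraction(1), True, True)

if __name__ == "__main__":
    atoms = count_atoms([G1, G2], OMEGA)
    print(atoms, 2 ** atoms)  # 3 8
```

The three atoms found here are $[0, \tfrac13]$, $(\tfrac13, \tfrac12)$ and $[\tfrac12, 1]$.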

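Property 12 can be sanity-checked numerically before attempting the induction proof. The sketch below assumes a finite sample space with the uniform probability function of Definition 1.1, so that $P(A) = |A| / |\Omega|$. It draws random events with a fixed seed and compares both sides of the inclusion-exclusion formula in exact arithmetic. All names are illustrative.

```python
import random
from fractions import Fraction
from itertools import combinations


def prob(event, omega):
    """Uniform probability on a finite sample space: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def inclusion_exclusion(events, omega):
    """Right-hand side of Property 12: alternating sum over all k-wise intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)
        for subfamily in combinations(events, k):
            total += sign * prob(frozenset.intersection(*subfamily), omega)
    return total


if __name__ == "__main__":
    rng = random.Random(1)
    omega = frozenset(range(20))
    for _ in range(100):
        events = [frozenset(w for w in omega if rng.random() < 0.4) for _ in range(4)]
        lhs = prob(frozenset.union(*events), omega)
        assert lhs == inclusion_exclusion(events, omega)
    print("Property 12 holds on all sampled examples")
```

Random testing like this only builds confidence. The general statement still needs the induction argument mentioned after Property 12.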

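For Example 8, the alternating sum $\sum_{k=1}^{n} (-1)^{k-1}/k!$ can be compared with a brute-force count over all n! arrangements. The sketch assumes that every arrangement of the letters is equally likely. It computes both quantities exactly for small n and then prints a few values approaching $1 - e^{-1} \approx 0.632$. The function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import permutations


def at_least_one_match_bruteforce(n):
    """Fraction of permutations of range(n) with at least one fixed point."""
    perms = list(permutations(range(n)))
    good = sum(any(p[i] == i for i in range(n)) for p in perms)
    return Fraction(good, len(perms))


def at_least_one_match_formula(n):
    """Inclusion-exclusion: 1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!."""
    return sum(Fraction((-1) ** (k - 1), math.factorial(k)) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 8):
        assert at_least_one_match_bruteforce(n) == at_least_one_match_formula(n)
    for n in (5, 10, 20):
        print(n, float(at_least_one_match_formula(n)))
    print("limit:", 1 - math.exp(-1))
```

The formula is Property 12 applied to the events "letter i is in envelope i", each k-wise intersection having probability (n − k)!/n!. The convergence is fast because the terms decay like 1/k!.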

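Example 9 has one possible model, under the additional assumption that games are independent. The outcomes are the sequences of game results, and the event "A wins the match at game k" has probability $0.5^{k-1} \cdot 0.3$, since a game has no winner with probability 1 − 0.3 − 0.2 = 0.5. Summing this geometric series gives $P(\text{A wins}) = 0.3 / (1 - 0.5) = 0.6$. The sketch below computes the exact value and partial sums, and also simulates the match. The names and the seed are illustrative.

```python
import random
from fractions import Fraction

P_A = Fraction(3, 10)   # A wins a single game
P_B = Fraction(2, 10)   # B wins a single game
P_DRAW = 1 - P_A - P_B  # assumed: the remaining mass, i.e. the game has no winner


def prob_a_wins_exact():
    """Sum over k >= 1 of P_DRAW**(k - 1) * P_A, a geometric series."""
    return P_A / (1 - P_DRAW)


def prob_a_wins_partial(max_games):
    """Probability that A wins the match within the first max_games games."""
    return sum(P_DRAW ** (k - 1) * P_A for k in range(1, max_games + 1))


def simulate(trials=100_000, seed=0):
    """Play independent games until someone wins; return A's empirical win rate."""
    rng = random.Random(seed)
    pa, pab = float(P_A), float(P_A + P_B)
    a_wins = 0
    for _ in range(trials):
        while True:
            u = rng.random()
            if u < pa:       # A wins this game, hence the match
                a_wins += 1
                break
            if u < pab:      # B wins this game, hence the match
                break
    return a_wins / trials


if __name__ == "__main__":
    print(prob_a_wins_exact())  # 3/5
    print(float(prob_a_wins_partial(20)))
    print(simulate())
```

The answer equals $0.3 / (0.3 + 0.2)$. This reflects the fact that, conditioned on a game having a winner, A is that winner with probability 3/5.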