MTHE/STAT 351: 1 – Axioms of Probability
T. Linder
Queen's University
Fall 2016

Sample Spaces and Events

Random experiment: An experiment whose outcome is uncertain (e.g., coin tosses, drawing a card from a deck, tomorrow's temperature in Kingston).

Sample space: The set of all possible outcomes of a random experiment, usually denoted by $S$.

Events: Subsets of the sample space $S$, usually denoted by $A$, $B$, $C$, etc. Notation: $A \subset S$.

Examples:
- Flipping a coin. The possible outcomes are $H$ and $T$, so the sample space is $S = \{H, T\}$.
- Rolling a die. $S = \{1, 2, 3, 4, 5, 6\}$. Let $A = \{1, 3, 5\}$. Then $A \subset S$ is the event that "an odd number is rolled."

Examples:
- Flipping a coin until we get a total of two heads or a total of two tails: $S = \{HH, TT, HTT, HTH, THH, THT\}$. The event "2 flips are needed to stop" is $\{HH, TT\}$, and the event "3 flips are needed to stop" is $\{HTT, HTH, THH, THT\}$.
- A coin is flipped 3 times and the sequence of outcomes is recorded. Then $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$. If instead only the number of heads is recorded, then $S = \{0, 1, 2, 3\}$.

Note: An appropriate sample space for an experiment depends on what is being observed.

Sample spaces need not be finite. Examples:
- Professor L. arrives at his 8:30 am class no later than 8:40. Let the observation be the amount of time (in minutes) he is late. Then $S = \{t : 0 \le t \le 10\} = [0, 10]$, and $A = (3, 10]$ is the event that "Professor L. is more than 3 minutes late."
- Observe the lifetime of a light bulb in hours. Then $S = \{t : 0 \le t < \infty\} = [0, \infty)$, and $B = [0, 1000]$ is the event that the light bulb does not survive past 1000 hours.

Language of probability theory

Probability theory has its own terminology, somewhat different from that of set theory:
- Universal set → Sample space
- Set → Event

Let $E$ and $F$ be subsets of the sample space $S$. If the outcome of a random experiment belongs to $E$, we say that "$E$ occurs." In this language $S$ is the "certain event" and $\emptyset$ is the "impossible event."

How can $E^c$, $E \cup F$, $EF$, and $E - F$ be described in this language?
- $E^c$: $E$ does not occur.
- $E \cup F$: at least one of $E$ or $F$ occurs.
- $E \cap F$ (or $EF$): $E$ and $F$ occur simultaneously.
- $E - F$: $E$ occurs, but $F$ does not occur.

If $E \subset F$, then $F$ occurs whenever $E$ occurs; we say that $E$ implies $F$.

Let $E_1, E_2, \ldots, E_n$ be events. Then $\bigcup_{i=1}^n E_i$ means that at least one of the $E_i$ occurs. Also, $\bigcap_{i=1}^n E_i$ is the event that all $E_i$ occur simultaneously. If $n \ge 2$ and $E_1, \ldots, E_n$ are mutually exclusive, i.e., $E_i \cap E_j = \emptyset$ for $i \ne j$, then $\bigcap_{i=1}^n E_i$ is the impossible event.

Practice: $E$ and $F$ are events of the sample space $S$. The event "exactly one of $E$ and $F$ occurs" can be expressed as $(E - F) \cup (F - E)$. Let's show that this is equal to $(E \cup F) - (E \cap F)$ using the properties of set operations:
$$\begin{aligned}
(E - F) \cup (F - E) &= (EF^c) \cup (FE^c) && \text{(by definition)}\\
&= (E \cup F)\underbrace{(E \cup E^c)}_{S}\underbrace{(F^c \cup F)}_{S}(F^c \cup E^c) && \text{(by distributivity)}\\
&= (E \cup F)(F^c \cup E^c) && \text{(since } AS = A \text{ for all } A \subset S\text{)}\\
&= (E \cup F)(F \cap E)^c && \text{(by De Morgan's law)}\\
&= (E \cup F) - (F \cap E) && \text{(by definition).}
\end{aligned}$$

$E$, $F$, and $G$ are events of the sample space $S$. The event "at least one of $E$, $F$, $G$ occurs" is $E \cup F \cup G$. The event "at most one of $E$, $F$, $G$ occurs" is
$$(E \cup F \cup G)^c \cup \Big[(E \cup F \cup G) - \big((E \cap F) \cup (E \cap G) \cup (F \cap G)\big)\Big],$$
where the first term is the event that none of them occurs and the bracketed term is the event that exactly one of them occurs.

Example: A die is rolled repeatedly until 6 appears.
(a) What is the sample space $S$ of the experiment?
(b) Let $E_n$ be the event "$n$ rolls are necessary to complete the experiment." Describe $E_n$ in terms of the elements of $S$.
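The experiment in this example is easy to simulate, which makes its outcomes concrete. The following is a minimal Python sketch, assuming a fair die modelled with the standard `random` module; the function name `roll_until_six` is hypothetical and introduced only for illustration. Each call produces one outcome of the experiment as a tuple of rolls.

```python
import random


def roll_until_six(rng: random.Random) -> tuple[int, ...]:
    """Simulate one run of the experiment: roll a fair die until 6 appears.

    `roll_until_six` is a hypothetical helper for illustration only.
    The returned tuple lists every roll in order, so its last entry is 6.
    """
    rolls = []
    while True:
        roll = rng.randint(1, 6)  # randint includes both endpoints 1 and 6
        rolls.append(roll)
        if roll == 6:
            return tuple(rolls)


# A seeded generator makes the simulated outcomes reproducible.
rng = random.Random(351)
for _ in range(3):
    outcome = roll_until_six(rng)
    print(outcome, "uses", len(outcome), "rolls")
```

Grouping simulated outcomes by their length is one informal way to build intuition for part (b) before describing the events formally.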
Solution.

(a) The sample space is the collection of all finite-length sequences of integers between 1 and 6 in which 6 appears only in the last position, together with the collection of all infinite sequences in which 6 does not appear. Formally,
$$S = \bigcup_{n=1}^{\infty} \{(a_1, a_2, \ldots, a_{n-1}, 6) : 1 \le a_j \le 5,\ j = 1, \ldots, n-1\} \cup A,$$
where $A = \{(a_1, a_2, \ldots) : 1 \le a_j \le 5,\ j = 1, 2, \ldots\}$.

(b) $E_n$ is the collection of all length-$n$ sequences of integers between 1 and 6 in which 6 appears only in the last position:
$$E_n = \{(a_1, a_2, \ldots, a_{n-1}, 6) : 1 \le a_j \le 5,\ j = 1, \ldots, n-1\}.$$

(c) What is the event $\left(\bigcup_{n=1}^{\infty} E_n\right)^c$?
$$\left(\bigcup_{n=1}^{\infty} E_n\right)^c = \{(a_1, a_2, \ldots) : 1 \le a_j \le 5,\ j = 1, 2, \ldots\} = A.$$
This is the event that 6 never appears.

Axioms of probability

Assume a random experiment is repeatedly performed. For an event $E$ in the sample space, let $n(E)$ be the number of times $E$ occurs in the first $n$ repetitions. One way of defining the probability $P(E)$ of $E$ is as the limit of relative frequencies
$$P(E) = \lim_{n \to \infty} \frac{n(E)}{n}.$$
There are several conceptual problems with this definition:
- We cannot repeat the experiment infinitely many times.
- Why should the limit exist?
- Even if the limit exists, why should it be the same if the entire sequence of repetitions is carried out a second time?

It turns out that it is much easier to get a mathematically consistent theory if we assume that $P(E)$ exists for all events and satisfies certain intuitively desirable axioms.

Definition (Event Space). An event space is a collection of events (subsets) of a sample space $S$ such that
1) $S$ is an event;
2) if $E$ is an event, then $E^c$ is also an event;
3) if $E_1, E_2, E_3, \ldots$ are events, then so is $\bigcup_{n=1}^{\infty} E_n$.

A collection of subsets of $S$ satisfying properties 1, 2, and 3 is called a σ-algebra or σ-field.

Remarks:
- $\emptyset$ is an event since $S^c = \emptyset$.
- The union of a finite collection of events is also an event, since $\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{\infty} E_i$ if we set $E_i = \emptyset$ for $i = n+1, n+2, \ldots$.
- The intersection of finitely or countably many events is also an event. For example, $EF$ is an event since $EF = (E^c \cup F^c)^c$.
- If $S$ is a finite set, then the collection of all subsets of $S$ is an event space.

Definition (Probability Axioms). A real-valued function $P$ on the event space is called a probability function if it satisfies the following:
1) $P(E) \ge 0$ for all events $E$.
2) $P(S) = 1$.
3) If $E_1, E_2, E_3, \ldots$ are mutually exclusive events (i.e., $E_i E_j = \emptyset$ for all $i \ne j$), then
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$

Remark: Property 3 is called countable additivity.

These axioms directly imply the following theorems.

Theorem 1. $P(\emptyset) = 0$.

Proof. Let $E_1 = S$ and $E_i = \emptyset$ for $i = 2, 3, \ldots$. Then the $E_i$ are mutually exclusive and $S = \bigcup_{i=1}^{\infty} E_i$. Thus by Axiom 3,
$$P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(E_i),$$
implying that $\sum_{i=2}^{\infty} P(E_i) = 0$. By Axiom 1, $P(E_i) \ge 0$ for all $i$. Since $E_i = \emptyset$ for all $i \ge 2$, this gives $P(\emptyset) = 0$. ∎

Another consequence of Axiom 3 is

3*. (Finite additivity) If $E_1, E_2, \ldots, E_n$ is a finite collection of mutually exclusive events, then
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i).$$

Proof. In Axiom 3 set $E_i = \emptyset$ for all $i > n$. Then
$$P\left(\bigcup_{i=1}^{n} E_i\right) = P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i) = \sum_{i=1}^{n} P(E_i) + \underbrace{\sum_{i=n+1}^{\infty} P(\emptyset)}_{=0} = \sum_{i=1}^{n} P(E_i).$$ ∎

Important special case: If $E$ and $F$ are mutually exclusive, then $P(E \cup F) = P(E) + P(F)$.

Theorem 2. $P(E^c) = 1 - P(E)$.

Proof. $E$ and $E^c$ are mutually exclusive, so $P(E) + P(E^c) = P(E \cup E^c) = P(S) = 1$. ∎

Corollary. $0 \le P(E) \le 1$ for all events $E$.

Proof. By Axiom 1, $P(E) \ge 0$ and $P(E^c) \ge 0$, so $0 \le P(E) = 1 - P(E^c) \le 1$. ∎

Equally likely outcomes

Example: We call a coin fair if $H$ and $T$ are equally likely in a single toss.
We'll apply the axioms to figure out $P(\{H\})$ and $P(\{T\})$. Since $S = \{H, T\}$ and $\{H\} \cap \{T\} = \emptyset$, we have
$$\begin{aligned}
1 = P(S) &= P(\{H, T\}) = P(\{H\}) + P(\{T\}) && \text{(by Axiom 3)}\\
&= 2P(\{H\}) && \text{(by the equally likely assumption).}
\end{aligned}$$
Thus $P(\{H\}) = P(\{T\}) = 1/2$. Note that this is not the result of a practical experiment, but a consequence of the axioms.

Suppose $S = \{s_1, s_2, \ldots, s_N\}$ is a sample space with $N$ equally likely outcomes: $P(\{s_1\}) = P(\{s_2\}) = \cdots = P(\{s_N\})$. Using the axioms as before,
$$1 = P(S) = P\left(\bigcup_{i=1}^{N} \{s_i\}\right) = \sum_{i=1}^{N} P(\{s_i\}) = N P(\{s_1\}),$$
so
$$P(\{s_i\}) = \frac{1}{N}, \quad i = 1, \ldots, N.$$
Using this, we show that the "equally likely outcomes" assumption imposes a probability function on $S$.

Theorem 3. Let $S$ be a sample space consisting of $N$ equally likely outcomes. Then for all $E \subset S$,
$$P(E) = \frac{|E|}{N},$$
where $|E|$ denotes the number of elements in $E$.

Proof. We showed that $P(\{s\}) = 1/N$ for any $s \in S$. Thus
$$P(E) = P\left(\bigcup_{s \in E} \{s\}\right) = \sum_{s \in E} P(\{s\}) = \sum_{s \in E} \frac{1}{N} = \frac{|E|}{N}.$$ ∎

Example: If two dice are rolled, what is the probability that the sum of the two numbers obtained is 5?

Solution: $S = \{(i, j) : 1 \le i, j \le 6\}$. We assume that the dice are fair, so all 36 outcomes in $S$ are equally likely. If $E$ = "sum is 5," then $E = \{(1, 4), (2, 3), (3, 2), (4, 1)\}$, so $P(E) = \frac{4}{36} = \frac{1}{9}$.

Example: What is the probability of getting exactly 2 heads in 3 flips of a fair coin?

Solution: $S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$ and $E = \{HHT, HTH, THH\}$. Thus $P(E) = \frac{|E|}{8} = \frac{3}{8}$.

Later we'll learn how to calculate the probability of 3 heads in 20 flips.

More basic consequences of the axioms

The following is an example where the outcomes are not equally likely.

Example: A, B, and C are the only competitors in a race. A is twice as likely as B to win, and C is 2/3 as likely as A to win. There are no ties. What are the probabilities of winning for A, B, and C?
Solution: Let $A$ = "A wins," $B$ = "B wins," $C$ = "C wins." Then $S = \{A, B, C\}$ and
$$1 = P(S) = P(A) + P(B) + P(C).$$
We also know that $P(A) = 2P(B)$ and $P(C) = \frac{2}{3}P(A)$. This is a system of 3 linear equations with 3 unknowns. The solution is
$$P(A) = \frac{6}{13}, \quad P(B) = \frac{3}{13}, \quad P(C) = \frac{4}{13}.$$

Theorem 4. If $E \subset F$, then $P(E) \le P(F)$.

Proof. $E \subset F$ implies that $F = E \cup (F - E)$. Also, $E$ and $F - E$ are disjoint. Thus by Axiom 3,
$$P(F) = P(E) + \underbrace{P(F - E)}_{\ge 0 \text{ by Axiom 1}} \ge P(E).$$ ∎

Corollary. If $E \subset F$, then $P(F - E) = P(F) - P(E)$.

Theorem 5. For arbitrary events $E$ and $F$,
$$P(E \cup F) = P(E) + P(F) - P(EF).$$

Proof. We have $E \cup F = E \cup (F - EF)$. Since $E$ and $F - EF$ are disjoint, by Axiom 3
$$P(E \cup F) = P(E) + P(F - EF).$$
But $EF \subset F$, so by the corollary to Theorem 4, $P(F - EF) = P(F) - P(EF)$. Hence
$$P(E \cup F) = P(E) + P(F) - P(EF).$$ ∎

Corollary. For any two events $E$ and $F$,
$$P(E \cup F) \le P(E) + P(F).$$
This inequality is called the union bound.

Example: An integer between 1 and 100 is chosen at random. What is the probability that it is divisible by either 5 or 7?

Solution: Let $E$ = "divisible by 5" and $F$ = "divisible by 7." Then
$$P(E \cup F) = P(E) + P(F) - P(EF),$$
where
$$P(E) = \frac{|E|}{100} = \frac{20}{100}, \qquad P(F) = \frac{|F|}{100} = \frac{14}{100}.$$
Since 5 and 7 are primes, an integer is divisible by both 5 and 7 iff it is divisible by $5 \cdot 7 = 35$. Thus $P(EF) = \frac{|EF|}{100} = \frac{2}{100}$ and
$$P(E \cup F) = \frac{20}{100} + \frac{14}{100} - \frac{2}{100} = 0.32.$$

One can generalize Theorem 5 to $n > 2$ events. For 3 events we have
$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG).$$
This is the so-called inclusion-exclusion principle.
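The worked examples in this part can be cross-checked with exact rational arithmetic. The sketch below (Python, standard library `fractions.Fraction` and `itertools.product` only) enumerates each finite sample space and applies Theorem 3, $P(E) = |E|/N$, under the assumption that all outcomes are equally likely. It also compares Theorem 5 and the three-event inclusion-exclusion formula with direct counting, and solves the race example by substitution. The helper `prob` and the three dice events used for inclusion-exclusion are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product


def prob(event, space):
    """Theorem 3: P(E) = |E| / N when all N outcomes in `space` are equally likely."""
    outcomes = list(space)
    favourable = sum(1 for s in outcomes if event(s))
    return Fraction(favourable, len(outcomes))


# Two fair dice: S = {(i, j) : 1 <= i, j <= 6}.
dice = list(product(range(1, 7), repeat=2))
p_sum5 = prob(lambda s: s[0] + s[1] == 5, dice)  # Fraction(1, 9)

# Three flips of a fair coin, sequence recorded.
flips = list(product("HT", repeat=3))
p_two_heads = prob(lambda s: s.count("H") == 2, flips)  # Fraction(3, 8)

# Integer chosen at random from 1..100: Theorem 5 versus direct counting.
ints = range(1, 101)


def div5(k):
    return k % 5 == 0


def div7(k):
    return k % 7 == 0


p_union = prob(lambda k: div5(k) or div7(k), ints)
p_theorem5 = (prob(div5, ints) + prob(div7, ints)
              - prob(lambda k: div5(k) and div7(k), ints))  # both equal Fraction(8, 25)


# Inclusion-exclusion for three arbitrarily chosen events on the dice space.
def E(s):
    return s[0] % 2 == 0  # first die is even


def F(s):
    return s[0] + s[1] >= 8  # sum is at least 8


def G(s):
    return s[1] == 6  # second die shows 6


lhs = prob(lambda s: E(s) or F(s) or G(s), dice)
rhs = (prob(E, dice) + prob(F, dice) + prob(G, dice)
       - prob(lambda s: E(s) and F(s), dice)
       - prob(lambda s: E(s) and G(s), dice)
       - prob(lambda s: F(s) and G(s), dice)
       + prob(lambda s: E(s) and F(s) and G(s), dice))

# Race example: with P(B) = x we get P(A) = 2x and P(C) = (2/3)(2x), summing to 1.
x = 1 / (1 + 2 + Fraction(2, 3) * 2)
p_a, p_b, p_c = 2 * x, x, Fraction(2, 3) * 2 * x  # 6/13, 3/13, 4/13

print(p_sum5, p_two_heads, p_union, p_theorem5, lhs == rhs, (p_a, p_b, p_c))
```

Brute-force enumeration only works for small finite sample spaces; it is meant as a sanity check of the hand calculations above, not a replacement for them.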
Example: In a hotel with 300 guests, there are 27 guests who smoke cigarettes, 11 who smoke cigars, 8 who smoke pipes, 4 who smoke both cigarettes and cigars, 3 who smoke both cigarettes and pipes, and 3 who smoke both pipes and cigars. Also, there is one guest who smokes all three. How many non-smoking guests are staying in the hotel?

Solution: Let $S$ denote the set of all guests. If a guest is chosen at random, then for all $A \subset S$ the probability that the chosen guest belongs to $A$ is
$$P(A) = \frac{|A|}{N}, \quad \text{where } N = 300.$$
Let $E$ = "guests who smoke cigarettes," $F$ = "guests who smoke cigars," and $G$ = "guests who smoke pipes." Then $E \cup F \cup G$ is the set of guests who smoke, and $N \cdot P(E \cup F \cup G) = |E \cup F \cup G|$ is the number of guests who smoke.

By the inclusion-exclusion formula
$$P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG).$$
Since $P(A) = \frac{|A|}{N}$, multiplying both sides by $N$ gives
$$|E \cup F \cup G| = |E| + |F| + |G| - |EF| - |EG| - |FG| + |EFG|.$$
We are given that $|E| = 27$, $|F| = 11$, $|G| = 8$, $|EF| = 4$, $|EG| = 3$, $|FG| = 3$, and $|EFG| = 1$. Thus
$$|E \cup F \cup G| = 27 + 11 + 8 - 4 - 3 - 3 + 1 = 37.$$
Hence there are $300 - 37 = 263$ non-smokers among the guests.

Continuity of the probability function

Recall that a function $f : \mathbb{R} \to \mathbb{R}$ is continuous on the real line if and only if (iff) for all $x$ and all convergent sequences $\{x_n\}_{n=1}^{\infty}$ such that $\lim_{n\to\infty} x_n = x$, we have
$$\lim_{n\to\infty} f(x_n) = f(x).$$

Probability functions have an analogous continuity property. Call a sequence of events $E_1, E_2, E_3, \ldots$ increasing if
$$E_1 \subset E_2 \subset \cdots \subset E_n \subset \cdots$$
and decreasing if
$$E_1 \supset E_2 \supset \cdots \supset E_n \supset \cdots.$$
If $\{E_n\}$ is increasing, define
$$\lim_{n\to\infty} E_n = \bigcup_{n=1}^{\infty} E_n.$$
For a decreasing sequence $\{E_n\}$, define
$$\lim_{n\to\infty} E_n = \bigcap_{n=1}^{\infty} E_n.$$

Theorem 6 (Continuity of Probability Function). If $\{E_n\}$ is an increasing or decreasing sequence of events, then
$$\lim_{n\to\infty} P(E_n) = P\left(\lim_{n\to\infty} E_n\right).$$

Proof for increasing sequences. Let $A_1 = E_1$ and $A_n = E_n - E_{n-1}$ for $n \ge 2$. (Thus $A_n$ contains all elements of $E_n$ that are not in any of $E_1, \ldots, E_{n-1}$.) The events $A_1, A_2, A_3, \ldots$ are mutually exclusive and
$$\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} E_i \quad \text{and} \quad \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} E_i.$$
Thus
$$\begin{aligned}
P\Big(\lim_{n\to\infty} E_n\Big) &= P\Big(\bigcup_{n=1}^{\infty} E_n\Big) = P\Big(\bigcup_{n=1}^{\infty} A_n\Big) = \sum_{n=1}^{\infty} P(A_n) && \text{(by Axiom 3)}\\
&= \lim_{n\to\infty} \sum_{i=1}^{n} P(A_i) = \lim_{n\to\infty} P\Big(\bigcup_{i=1}^{n} A_i\Big) = \lim_{n\to\infty} P\Big(\underbrace{\bigcup_{i=1}^{n} E_i}_{=E_n}\Big) = \lim_{n\to\infty} P(E_n).
\end{aligned}$$ ∎

Random selection of a point from an interval

We want to build a probability model for "randomly selecting" a point from a bounded interval $[a, b] = \{x : a \le x \le b\}$. For any sub-interval $[\alpha, \beta] \subset [a, b]$, we'll denote the event "the point falls in $[\alpha, \beta]$" also by $[\alpha, \beta]$.

By intuitive reasoning, for $\alpha < \beta$ the probability $P([\alpha, \beta])$ should be proportional to the length of $[\alpha, \beta]$, i.e.,
$$P([\alpha, \beta]) = k(\beta - \alpha)$$
for some $k > 0$. By the axioms,
$$1 = P(S) = P([a, b]) = k(b - a),$$
so $k = \frac{1}{b - a}$. Thus, if $\alpha < \beta$ and $[\alpha, \beta] \subset [a, b]$,
$$P([\alpha, \beta]) = \frac{\beta - \alpha}{b - a}.$$

What is the probability of selecting a given point $x$? Let $x \in (a, b)$ and choose $\varepsilon > 0$ such that $[x - \varepsilon, x + \varepsilon] \subset [a, b]$. Define
$$E_n = [x - \varepsilon/n,\ x + \varepsilon/n].$$
Then $E_1 \supset E_2 \supset E_3 \supset \cdots$ (a decreasing sequence) and
$$\lim_{n\to\infty} E_n = \bigcap_{n=1}^{\infty} E_n = \{x\}.$$
Thus by the continuity of $P$,
$$P(\{x\}) = P\left(\lim_{n\to\infty} E_n\right) = \lim_{n\to\infty} P(E_n) = \lim_{n\to\infty} \frac{2\varepsilon/n}{b - a} = 0.$$
Thus selecting $x$ has zero probability for all $x$.

What is the probability of $(\alpha, \beta) \subset [a, b]$?
$$P([\alpha, \beta]) = P(\{\alpha\} \cup (\alpha, \beta) \cup \{\beta\}) = \underbrace{P(\{\alpha\})}_{=0} + P((\alpha, \beta)) + \underbrace{P(\{\beta\})}_{=0} = P((\alpha, \beta)).$$
Similarly we can show that $P([\alpha, \beta]) = P((\alpha, \beta]) = P([\alpha, \beta))$.

Remarks:
- The fact that $P(\{x\}) = 0$ shows that there are non-empty events with zero probability.
- We have seen that $P((a, b)) = P([a, b]) = 1$. Thus there are events with probability 1 that are not equal to $S$.

Example: A bus arrives at a bus station at a random time between 8:00 and 8:15 am. Its scheduled arrival time is 8:05 am. Let's call the bus almost punctual if it is less than 2 minutes early and less than 5 minutes late. What is the probability that the bus is not almost punctual?

Solution: Let's measure time in minutes. For simplicity, shift the time interval so that the bus arrives at a random time in $[0, 15]$ and the scheduled arrival time is 5. Then "bus almost punctual" $= (3, 10)$. Thus
$$P(\text{not almost punctual}) = 1 - P(\text{almost punctual}) = 1 - P((3, 10)) = 1 - \frac{10 - 3}{15} = \frac{8}{15}.$$

Technical detour

One can ask the following question: in the experiment of randomly selecting a point from an interval, which subsets of the interval $[a, b]$ are events? Recall that the collection of events of a sample space has the following properties:
(a) $S$ is an event;
(b) if $E$ is an event, then $E^c$ is also an event;
(c) if $E_1, E_2, \ldots$ is a sequence of events, then $\bigcup_{n=1}^{\infty} E_n$ is also an event.

If $S$ is a finite or countably infinite set, then the set of events is usually taken to be the collection of all subsets of $S$. (Check that (a), (b), and (c) hold in this case.) If $S$ is an uncountably infinite set, such as the interval $[a, b]$, the choice of events is more tricky.

We have already seen that all (open or closed) subintervals of $[a, b]$ are events. It follows that all finite and countably infinite unions of open, closed, or half-open subintervals of $[a, b]$ are events. We know from real analysis that each open set in $\mathbb{R}$ can be written as a union of countably many open intervals. Thus by (c) all open subsets of $[a, b]$ are events.
Since the complement of a closed set is an open set, it follows from (b) that all closed subsets of $[a, b]$ are events.

The smallest collection of subsets of $[a, b]$ that contains all open sets in $[a, b]$ and satisfies (a), (b), and (c) is called the collection of Borel sets in $[a, b]$. It turns out that the notion of length can be extended to any Borel set $B \subset [a, b]$, and one can define the probability that a point randomly chosen from $[a, b]$ falls into $B$ by
$$P(B) = \frac{\text{length}(B)}{b - a}.$$
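For the special case where $B$ is a finite union of subintervals, length(B) can be computed by sorting and merging the intervals. The Python sketch below assumes that case only (general Borel sets require measure theory, which it does not attempt). The helpers `union_length` and `uniform_prob` are hypothetical names; endpoints are ignored because single points have probability zero, and exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction


def union_length(intervals):
    """Total length of a finite union of intervals given as (left, right) pairs.

    Whether each endpoint is included does not matter, since single points
    have zero length; overlapping pieces are merged so nothing is counted twice.
    """
    merged = []
    for left, right in sorted(intervals):
        if right <= left:
            continue  # empty or single-point piece: zero length
        if merged and left <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], right)  # extend the current block
        else:
            merged.append([left, right])  # start a new disjoint block
    return sum((right - left for left, right in merged), Fraction(0))


def uniform_prob(intervals, a, b):
    """P(B) = length(B) / (b - a) for B a finite union of intervals, clipped to [a, b]."""
    clipped = [(max(left, a), min(right, b)) for left, right in intervals]
    return union_length(clipped) / (b - a)


# Bus example: arrival uniform on [0, 15], "almost punctual" = (3, 10).
p_punctual = uniform_prob([(3, 10)], 0, 15)
print(1 - p_punctual)  # 8/15

# Shrinking intervals around x = 5 with epsilon = 1: P(E_n) = 2 / (15 n).
for n in (1, 10, 1000):
    eps = Fraction(1, n)
    print(n, uniform_prob([(5 - eps, 5 + eps)], 0, 15))
```

The loop mirrors the continuity argument for $P(\{x\}) = 0$: the probabilities of the intervals $E_n$ decrease toward 0 as $n$ grows.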