2. Probability models

In this section, we use previous examples to introduce the concept of a probability model, which is fundamental to statistical methodology.

Example 2.1 (Dice models).
(a) Two 6-sided dice (one black, one white): Write the outcome of a roll of these dice as an ordered pair (W, B), where the first entry is the outcome on the white die and the second entry is the outcome on the black die.
– #{black die shows 1} = #{B = 1} = #{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)} = 6 = #{white die shows 1}.
– #{at least one die shows 1} = #{(1, 1), . . . , (1, 6), (2, 1), . . . , (6, 1)} = 11 ≠ 6 + 6, since adding the two counts above would count the outcome (1, 1) twice.
– #{neither die shows 1} = 25 = 36 − 11.
– #{black and white dice show the same number} = #{B = W} = #{(1, 1), (2, 2), . . . , (6, 6)} = 6.
(b) Two n-sided dice (one black, one white): Suppose n ≥ 4. Then Ω = {(i₁, i₂) : 1 ≤ i₁, i₂ ≤ n} and #Ω = n².
– #F = #{total number of pips equals 4} = #{B + W = 4} = #{(1, 3), (2, 2), (3, 1)} = 3. Then #F/#Ω = 3/n².
– #F = #{number on black die is greater than number on white die} = #{B > W} = (n − 1) + (n − 2) + · · · + 1 + 0 = (n − 1)n/2. In this case,

    #F/#Ω = ((n − 1)n/2)/n² = (n² − n)/(2n²) → 1/2 as n → ∞.

2.1. Set theory. For a random process, both the sample space and the events of interest are sets, and so we can phrase much of our discussion in terms of set theory. A set A is a collection of distinguishable elements. Recall that the sample space is the set of possible outcomes of a random experiment. We denote the sample space by Ω and write ω ∈ Ω to denote that ω is an outcome of the experiment. A set A′ for which every a′ ∈ A′ is also an element of A is called a subset of A, written A′ ⊆ A. In probability, every event is a subset of the sample space. If the outcome ω of the experiment lies in an event E ⊆ Ω, then we say E occurs. Table 1 summarizes the key set-theoretic ideas for this course.

| Event language                | Set language                  | Set notation  | Definition                              |
|-------------------------------|-------------------------------|---------------|-----------------------------------------|
| sample space                  | universal set                 | Ω, S          | collection of distinct objects          |
| event                         | subset of Ω                   | A, B, . . .   | collection of objects in Ω              |
| impossible (null) event       | empty set                     | ∅             | ∅ := {}                                 |
| A does not occur              | A complement                  | Aᶜ            | {x ∈ Ω : x ∉ A}                         |
| A & B both occur              | A intersect B                 | A ∩ B         | {x ∈ Ω : x ∈ A and x ∈ B}               |
| A & B are mutually exclusive  | A & B are disjoint            | A ∩ B = ∅     | A and B have no elements in common      |
| at least one of A & B occurs  | A union B                     | A ∪ B         | {x ∈ Ω : x ∈ A, x ∈ B, or x ∈ A ∩ B}    |
| if A then B                   | A is a subset of B            | A ⊆ B         | x ∈ A implies x ∈ B                     |
| A but not B                   | A minus B, A not B            | A − B, A \ B  | A ∩ Bᶜ                                  |
| exactly one of A & B          | symmetric difference of A & B | A △ B         | (A − B) ∪ (B − A)                       |
| all events                    | power set of Ω                | 2^Ω           | {A : A ⊆ Ω}                             |

Table 1. Definitions of terms from set theory and their corresponding probabilistic interpretations.

Sets satisfy the following rules of operation. Let A, B, C ⊆ Ω.
• Commutative laws: A ∪ B = B ∪ A and A ∩ B = B ∩ A.
• Associative laws: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).
• Distributive laws: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C).
• De Morgan's laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.
Be sure that you know each of the above properties and why they are true. To help intuition, it is sometimes useful to draw a picture; a quick computational check, as in the sketch below, can also help.
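The identities above can be spot-checked on a small finite sample space. The following Python sketch (our own illustration, using only the standard library) builds the two-dice sample space of Example 2.1(a), reproduces its counts, and verifies De Morgan's laws with complements taken relative to Ω.

```python
from itertools import product

# Sample space for one white and one black 6-sided die: outcomes (W, B).
omega = set(product(range(1, 7), repeat=2))
assert len(omega) == 36

# Counts from Example 2.1(a).
black_one = {(w, b) for (w, b) in omega if b == 1}   # {B = 1}
white_one = {(w, b) for (w, b) in omega if w == 1}   # {W = 1}
assert len(black_one) == 6 and len(white_one) == 6
assert len(black_one | white_one) == 11              # 11, not 6 + 6: (1, 1) counted once
assert len(omega - (black_one | white_one)) == 25    # neither die shows 1
assert len({(w, b) for (w, b) in omega if w == b}) == 6  # {B = W}

# De Morgan's laws, with set difference from omega playing the role of complement.
A, B = black_one, white_one
assert omega - (A | B) == (omega - A) & (omega - B)
assert omega - (A & B) == (omega - A) | (omega - B)
print("all set-theoretic checks pass")
```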
2.2. Equally likely outcomes. Suppose Ω is finite and every ω ∈ Ω is equally likely, i.e., occurs with the same frequency. We define a probability distribution on Ω, called the (discrete) uniform distribution, as a function P : 2^Ω → [0, 1], where

(1)    P[A] := #A/#Ω,    A ⊆ Ω.

We interpret (1) as follows. Suppose we repeatedly draw outcomes from the experiment modeled by P and define

    fₙ := #{times A occurs in the first n trials} / n.

Then fₙ → P[A] as n → ∞. This convergence, though intuitive, requires some mathematical tools to prove. We revisit this concept later (Section ??) when we discuss the law of large numbers. A small simulation of this convergence appears after Example 2.2 below.

Example 2.2 (Dice, revisited). As in Example 2.1(a), let Ω := {(i₁, i₂) : i₁, i₂ = 1, . . . , 6}, so that P[A] := #A/36 for any A ⊆ Ω. For j = 2, . . . , 12, define Tⱼ := {sum of i₁ and i₂ is j}. Then

| j     | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   |
|-------|------|------|------|------|------|------|------|------|------|------|------|
| P[Tⱼ] | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36 |

• Let E := {B + W ∈ {7, 11}}. Then E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)} and #E = 8. Therefore, P[E] = 8/36. Furthermore, notice that E = T₇ ∪ T₁₁ and
  P[T₇ ∪ T₁₁] = P[E] = 8/36 = 6/36 + 2/36 = P[T₇] + P[T₁₁].
• Let E = T₂ᶜ = T₃ ∪ · · · ∪ T₁₂. Then #T₂ᶜ = 35 and P[T₂ᶜ] = 35/36 = 1 − 1/36 = 1 − P[T₂].
• P[Ω] = 36/36 = 1.
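To see the frequency interpretation (1) in action, here is a minimal Monte Carlo sketch (an illustration of ours, not part of the notes' formal development). It simulates n rolls of two fair dice and compares the empirical frequency fₙ of the event E = {B + W ∈ {7, 11}} from Example 2.2 with P[E] = 8/36.

```python
import random

def running_frequency(event, n_trials, seed=0):
    """Empirical frequency f_n of `event` over n_trials rolls of two fair dice."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        w, b = rng.randint(1, 6), rng.randint(1, 6)
        if event(w, b):
            hits += 1
    return hits / n_trials

E = lambda w, b: (w + b) in (7, 11)   # the event E = {B + W in {7, 11}}

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9}: f_n = {running_frequency(E, n):.4f}  vs  P[E] = {8/36:.4f}")
```

As n grows, fₙ settles near 8/36 ≈ 0.2222, in line with the law of large numbers mentioned above.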
2.3. Axioms of probability. For every finite sample space Ω, the power set 2^Ω satisfies:
(1) Ω ∈ 2^Ω;
(2) if A, B ∈ 2^Ω, then A ∩ B ∈ 2^Ω; and
(3) if A ∈ 2^Ω, then Aᶜ ∈ 2^Ω.
The power set is an example of a σ-field, which plays a central role in probability theory. Intuitively, a σ-field, e.g., the power set, contains all the events that we are interested in for a given experiment: (1) says that we are interested in something, (2) says that if we are interested in whether A and B occur individually, then we are interested in whether both occur, and (3) says that if we are interested in whether A occurs, then we are interested in whether Aᶜ occurs. We will not study σ-fields in this course, as they are a topic in advanced probability theory; instead, we refer only to the event space, which we denote (Ω, E), where Ω is a sample space and E is a collection of events in Ω.

Definition 2.3 (Probability measure). Given an event space (Ω, E), a probability measure is a function P : E → [0, 1] satisfying
(PM1) P(Ω) = 1,
(PM2) P(A) ≥ 0 for all A ∈ E, and
(PM3) if A₁, A₂, . . . are disjoint events in E, then P(A₁ ∪ A₂ ∪ · · · ) = P(A₁) + P(A₂) + · · · .

Definition 2.4 (Probability model). A probability model consists of a triple (Ω, E, P), where
• (Ω, E) is an event space and
• P : E → [0, 1] is a probability measure.
The collection E consists of all events to which we can assign a probability:
(ES1) Ω ∈ E: something always happens;
(ES2) if E ∈ E, then Eᶜ ∈ E: if we can give E a probability, then we can give Eᶜ a probability;
(ES3) E₁, E₂, . . . ∈ E implies E₁ ∪ E₂ ∪ · · · ∈ E: if we can assign each of E₁, E₂, . . . a probability, then we can assign E₁ ∪ E₂ ∪ · · · a probability.

Proposition 2.5 (Properties of probability measures). The following hold for any probability model (Ω, E, P).
(i) P(∅) = 0.
(ii) If E₁, . . . , Eₙ ∈ E are mutually exclusive, then P[E₁ ∪ · · · ∪ Eₙ] = P[E₁] + · · · + P[Eₙ].
(iii) If E, E′ ∈ E with E ⊆ E′, then P[E′ − E] = P[E′] − P[E]. Hence, if E ⊆ E′, then P[E] ≤ P[E′].
(iv) P[Eᶜ] = 1 − P[E] for all E ∈ E.
(v) (Inclusion–exclusion formula) P[E ∪ E′] = P[E] + P[E′] − P[E ∩ E′] for all E, E′ ∈ E. In particular, P[E ∪ E′] = P[E] + P[E′] only if P[E ∩ E′] = 0.

Proof. (i) Since ∅ = ∅ ∪ ∅ ∪ · · · and the empty set is disjoint from itself, i.e., ∅ ∩ ∅ = ∅, we observe from (PM3) that P[∅] = P[∅] + P[∅] + · · · ; whence, P[∅] = 0.
(ii) We can write the finite union as a countable union: E₁ ∪ · · · ∪ Eₙ = E₁ ∪ · · · ∪ Eₙ ∪ ∅ ∪ ∅ ∪ · · · . Therefore, by (PM3) and part (i), we have
P[E₁ ∪ · · · ∪ Eₙ] = P[E₁] + · · · + P[Eₙ] + P[∅] + P[∅] + · · · = P[E₁] + · · · + P[Eₙ].
(iii) For E ⊆ E′, we can write E′ as a union of the mutually exclusive events E and E′ \ E, i.e., E′ = E ∪ (E′ \ E) with E ∩ (E′ \ E) = ∅. By (ii), we have
P[E′] = P[E] + P[E′ \ E],
and since P[E′ \ E] ≥ 0 by (PM2), P[E] ≤ P[E′].
(iv) Here, we write Ω = E ∪ Eᶜ and combine (PM1) with item (iii).
(v) We can write E ∪ E′ as the disjoint union E ∪ (E′ \ E). The conclusion follows by items (ii) and (iii). □

2.4. Discrete probability models. When an experiment has finitely or countably many outcomes, we specify a discrete probability model. For an at most countable event space (Ω, E), we define a discrete probability model as follows.
• To each ω ∈ Ω, we assign a probability mass p(ω) so that p(ω) ≥ 0 for all ω ∈ Ω and Σ_{ω∈Ω} p(ω) = 1.
• To each E ∈ E, we assign P[E] = Σ_{ω∈E} p(ω).
We call p : Ω → [0, 1] a probability mass function (pmf). Examples of discrete probability models include the coin, dice, lottery, and poker examples we have already discussed. (A small code sketch of this construction appears at the end of this section.)

2.5. Continuous probability models. When an experiment has uncountably many possible outcomes, we use a continuous probability model. For example, consider spinning a needle pivoted at the origin and recording the angle (in radians) between the needle and the x-axis upon stopping. Then Ω = [0, 2π) and we define E := {countable unions of intervals [a, b), 0 ≤ a < b ≤ 2π}. For each E ∈ E written as a disjoint union E = [a₁, b₁) ∪ [a₂, b₂) ∪ · · · , we define P[E] = (1/2π) Σᵢ (bᵢ − aᵢ); the normalization by 2π ensures that P[Ω] = 1.

Remark 2.6. In the above definition, we can prove that P[∪ᵢ Eᵢ] = Σᵢ P[Eᵢ] for countable disjoint unions, but not uncountable unions.

The main difference between discrete and continuous probability models is that discrete probability involves algebra and combinatorics, whereas continuous probability involves calculus. We begin with discrete probability models because they are often more intuitive. We move to continuous probability models in the second half of the course.
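Finally, the discrete construction of Section 2.4 translates almost directly into code. The sketch below is our own illustration (the helper P and the event names are hypothetical, not from the notes): it builds the uniform pmf of Example 2.2 with exact fractions and numerically confirms parts (ii), (iv), and (v) of Proposition 2.5.

```python
from fractions import Fraction
from itertools import product

# Uniform pmf on the two-dice sample space of Example 2.2: p(w, b) = 1/36.
omega = list(product(range(1, 7), repeat=2))
p = {outcome: Fraction(1, 36) for outcome in omega}
assert sum(p.values()) == 1  # pmf sums to 1 over omega

def P(event):
    """P[E] = sum of p(omega) over the outcomes in the event E (a set of outcomes)."""
    return sum(p[outcome] for outcome in event)

T7  = {(w, b) for (w, b) in omega if w + b == 7}    # T_7
T11 = {(w, b) for (w, b) in omega if w + b == 11}   # T_11
F   = {(w, b) for (w, b) in omega if w == 1}        # white die shows 1

# Proposition 2.5(ii): additivity for mutually exclusive events.
assert P(T7 | T11) == P(T7) + P(T11) == Fraction(8, 36)
# Proposition 2.5(iv): complement rule.
assert P(set(omega) - T7) == 1 - P(T7)
# Proposition 2.5(v): inclusion-exclusion for overlapping events (T7 and F share (1, 6)).
assert P(T7 | F) == P(T7) + P(F) - P(T7 & F)
print("all probability checks pass")
```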