HARRY CRANE
2. Probability models
In this section, we use previous examples to introduce the concept of a probability model,
which is fundamental to statistical methodology.
Example 2.1 (Dice models).
(a) two 6-sided dice (one black, one white): Write the
outcome of a roll of these dice as an ordered pair (W, B), where the first entry is the outcome
on the white die and the second entry is the outcome on the black die.
– #{black die shows 1} = #{B = 1} = #{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1)} = 6 =
#{white die shows 1}.
– #{at least one die shows 1} = #{(1, 1), . . . , (1, 6), (2, 1), . . . , (6, 1)} = 11 ≠ 6 + 6.
– #{neither die shows 1} = 25 = 36 − 11.
– #{black and white die show same number} = #{B = W} = #{(1, 1), (2, 2), . . . , (6, 6)} =
6.
(b) two n-sided dice (one black, one white): Suppose n ≥ 4. Then Ω = {(i1 , i2 ) : 1 ≤ i1 , i2 ≤ n} and #Ω = n².
– #F = #{total number of pips equals 4} = #{B + W = 4} = #{(1, 3), (2, 2), (3, 1)} = 3.
Then #F/#Ω = 3/n².
– #F = #{number on black die is greater than number on white die} = #{B > W} = (n −
1) + (n − 2) + · · · + 1 + 0 = (n − 1)n/2. In this case,
#F/#Ω = ((n − 1)n/2)/n² = (1/2) · (n² − n)/n² → 1/2
as n → ∞.
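These counts can be verified by brute force; the following Python sketch enumerates all ordered pairs (the helper names are my own, not from the notes):

```python
# Brute-force check of the counts in Example 2.1(b) for an n-sided die.

def count_black_greater(n):
    """#{B > W}: enumerate all ordered pairs (W, B) and count B > W."""
    return sum(1 for w in range(1, n + 1) for b in range(1, n + 1) if b > w)

def count_total_four(n):
    """#{B + W = 4} by enumeration."""
    return sum(1 for w in range(1, n + 1) for b in range(1, n + 1) if b + w == 4)

for n in [4, 6, 10, 50]:
    assert count_black_greater(n) == (n - 1) * n // 2  # matches (n - 1)n/2
    assert count_total_four(n) == 3                    # {(1,3),(2,2),(3,1)} for n >= 4

# The ratio #F/#Omega = (n - 1)n/(2 n^2) approaches 1/2 as n grows:
ratio = count_black_greater(1000) / 1000**2
```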
2.1. Set theory. For a random process, both the sample space and events of interest are
sets, and so we can phrase much of our discussion in terms of set theory. A set A is a collection
of distinguishable elements. Recall that the sample space is the set of possible outcomes of a
random experiment. We denote the sample space by Ω and write ω ∈ Ω to denote that ω
is an outcome for the experiment. A set A0 for which every a0 ∈ A0 is also an element of A
is called a subset of A, written A0 ⊆ A. In probability, every event is a subset of the sample
space. If the outcome ω of the experiment is in any event E ⊆ Ω, then E occurs. Table 2.1
summarizes the key set-theoretic ideas for this course.
Event language               | Set language                  | Set notation | Definition
Sample space                 | Universal set                 | Ω, S         | collection of distinct objects
Event                        | Subset of Ω                   | A, B, . . .  | collection of objects in Ω
Impossible, null event       | Empty set                     | ∅            | ∅ := {}
A does not occur             | A complement                  | Aᶜ           | {x ∈ Ω : x ∉ A}
A & B both occur             | A intersect B                 | A ∩ B        | {x ∈ Ω : x ∈ A and x ∈ B}
A & B are mutually exclusive | A & B are disjoint            | A ∩ B = ∅    | A ∩ B = ∅
at least one of A & B occurs | A union B                     | A ∪ B        | {x ∈ Ω : x ∈ A or x ∈ B}
if A then B                  | A is a subset of B            | A ⊆ B        | x ∈ A implies x ∈ B
A but not B                  | A minus B, A not B            | A − B, A \ B | A ∩ Bᶜ
exactly one of A & B         | symmetric difference of A & B | A △ B        | (A − B) ∪ (B − A)
All events                   | Power set of Ω                | 2^Ω          | {A : A ⊆ Ω}
Table 2.1. Definitions of terms from set theory and their corresponding probabilistic interpretations.
Sets satisfy the following rules of operation. Let A, B, C ⊆ Ω.
• Commutative laws:
A∪B = B∪A
A∩B = B∩A
• Associative laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C)
(A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distributive laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
(A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
• DeMorgan’s laws:
(A ∪ B)c = Ac ∩ Bc
(A ∩ B)c = Ac ∪ Bc
Be sure that you know each of the above properties and why they are true. To help intuition,
it is sometimes useful to draw a picture.
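A picture helps, and so does a machine check: the identities above can be spot-checked with Python's built-in set type (the particular Ω, A, B, C below are arbitrary choices for illustration):

```python
# Spot-checking the set identities with Python's built-in sets.
Omega = set(range(1, 13))
A, B, C = {1, 2, 3, 4}, {3, 4, 5, 6}, {4, 6, 8, 10}

complement = lambda S: Omega - S  # complement relative to Omega

# Distributive laws
assert (A | B) & C == (A & C) | (B & C)
assert (A & B) | C == (A | C) & (B | C)

# De Morgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# Symmetric difference: exactly one of A and B occurs
assert A ^ B == (A - B) | (B - A)
```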
2.2. Equally likely outcomes. Suppose Ω is finite and every ω ∈ Ω is equally likely, i.e.,
occurs with the same frequency. We define a probability distribution on Ω, called the (discrete)
uniform distribution, as a function P : 2^Ω → [0, 1], where
(1)    P[A] := #A/#Ω,    A ⊆ Ω.
We interpret (1) as follows. Suppose we repeatedly draw outcomes from the experiment
modeled by P and define
fn := #{times A occurs in the first n trials}/n.
Then fn → P[A] as n → ∞. This convergence, though intuitive, requires some mathematical
tools to prove. We revisit this concept later (Section ??) when we discuss the law of large
numbers.
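The convergence fn → P[A] can be illustrated (not proved) by simulation; here is a sketch for the event A = {B = W} on two fair dice, whose uniform-model probability is 6/36 = 1/6 (the seed and trial count are arbitrary choices of mine):

```python
# Simulate repeated rolls of two fair dice and track the running
# frequency of A = {both dice show the same number}, with P[A] = 1/6.
import random

random.seed(0)          # fixed seed so the run is reproducible
n_trials = 200_000
hits = 0
for _ in range(n_trials):
    w, b = random.randint(1, 6), random.randint(1, 6)
    hits += (w == b)
f_n = hits / n_trials   # empirical frequency after n_trials draws

# f_n should be close to 1/6, though it fluctuates from run to run.
```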
Example 2.2 (Dice, revisited). As in Example 2.1(a), let Ω := {(i1 , i2 ) : i1 , i2 = 1, . . . , 6} so that
P[A] := #A/36 for any A ⊆ Ω. For j = 2, . . . , 12, we define Tj := {sum of i1 and i2 is j}. Then
  j   |  2    3    4    5    6    7    8    9    10   11   12
P[Tj] | 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
• Let E := {B + W ∈ {7, 11}}, then E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1), (5, 6), (6, 5)}
and #E = 8. Therefore, P[E] = 8/36. Furthermore, notice that E = T7 ∪ T11 and
P[T7 ∪ T11 ] = P[E] = 8/36 = 6/36 + 2/36 = P[T7 ] + P[T11 ].
• Let E = T2c = T3 ∪ · · · ∪ T12 , then #T2c = 35 and
P[T2c ] = 35/36 = 1 − 1/36 = 1 − P[T2 ].
• P[Ω] = 36/36 = 1.
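All of the probabilities in this example can be recomputed by enumerating the 36 outcomes; a short Python check using exact fractions:

```python
# Recompute the table of P[T_j] and the bullet points of Example 2.2
# by enumerating the 36 equally likely outcomes.
from fractions import Fraction

Omega = [(i1, i2) for i1 in range(1, 7) for i2 in range(1, 7)]
P = lambda A: Fraction(len(A), len(Omega))   # uniform distribution on Omega

T = {j: [(i1, i2) for (i1, i2) in Omega if i1 + i2 == j] for j in range(2, 13)}
assert [P(T[j]) for j in range(2, 13)] == \
    [Fraction(k, 36) for k in (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1)]

# E = {B + W in {7, 11}} is the disjoint union T_7 and T_11:
E = T[7] + T[11]
assert P(E) == P(T[7]) + P(T[11]) == Fraction(8, 36)
assert P(Omega) == 1
```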
2.3. Axioms of probability. For every finite sample space Ω, 2^Ω satisfies
(1) Ω ∈ 2^Ω,
(2) if A, B ∈ 2^Ω then A ∩ B ∈ 2^Ω, and
(3) if A ∈ 2^Ω then Aᶜ ∈ 2^Ω.
The power set is an example of a σ-field, which plays a central role in probability theory.
Intuitively, a σ-field, e.g., the power set, contains all the events that we are interested in
for a given experiment: (1) says that we are interested in something, (2) says that if we are
interested in whether A and B occur individually then we are interested in whether both
occur, and (3) says that if we are interested in whether A occurs then we are interested in
whether Aᶜ occurs. We will not study σ-fields in this course, as they are a topic in advanced
probability theory; instead, we refer only to the event space, which we denote (Ω, E), where
Ω is a sample space and E is a collection of events in Ω.
Definition 2.3 (Probability measure). Given an event space (Ω, E), a probability measure is a
function P : E → [0, 1] satisfying
(PM1) P(Ω) = 1,
(PM2) P(A) ≥ 0 for all A ∈ E, and
(PM3) if A1 , A2 , . . . are disjoint events in E, then P(∪_i Ai) = Σ_i P(Ai).
Definition 2.4 (Probability model). A probability model consists of a triple (Ω, E, P), where
• (Ω, E) is an event space and
• P : E → [0, 1] is a probability measure.
The collection E consists of all events to which we can assign a probability:
(E1) Ω ∈ E: something always happens;
(E2) if E ∈ E then Eᶜ ∈ E: if we can give E a probability, then we can give Eᶜ a probability;
(E3) if E1 , E2 , . . . ∈ E then ∪_i Ei ∈ E: if we can assign each of E1 , E2 , . . . a probability,
then we can assign ∪_i Ei a probability.
Proposition 2.5 (Properties of probability measures). The following hold for any probability
model (Ω, E, P).
(i) P(∅) = 0.
(ii) If E1 , . . . , En ∈ E are mutually exclusive, then P[∪_{i=1}^n Ei] = Σ_{i=1}^n P[Ei].
(iii) If, for E, E′ ∈ E, E ⊆ E′, then P[E′ − E] = P[E′] − P[E]. Hence, if E ⊆ E′, then P[E] ≤ P[E′].
(iv) P[Eᶜ] = 1 − P[E] for all E ∈ E.
(v) (Inclusion-Exclusion formula) P[E ∪ E′] = P[E] + P[E′] − P[E ∩ E′] for all E, E′ ∈ E.
In particular, P[E ∪ E′] = P[E] + P[E′] only if P[E ∩ E′] = 0.
Proof.
(i) Since ∅ = ∪_{i=1}^∞ ∅ and the empty set is disjoint from itself, i.e., ∅ ∩ ∅ = ∅, we
observe P[∅] = Σ_{i=1}^∞ P[∅]; whence, P[∅] = 0.
(ii) We can write the finite union as a countable union by ∪_{i=1}^n Ei = E1 ∪ · · · ∪ En ∪ ∅ ∪ ∅ ∪ · · · .
Therefore, by (PM3) and part (i), we have
P[∪_{i=1}^n Ei] = Σ_{i=1}^n P[Ei] + P[∅] + P[∅] + · · · = Σ_{i=1}^n P[Ei].
(iii) For E ⊆ E′, we can write E′ as a union of the mutually exclusive events E and E′ \ E,
i.e., E′ = E ∪ (E′ \ E) and E ∩ (E′ \ E) = ∅. By (ii), we have
P[E′] = P[E] + P[E′ \ E],
and P[E] ≤ P[E′] by (PM2).
(iv) Here, we write Ω = E ∪ Eᶜ and combine (PM1) with item (iii).
(v) We can write E ∪ E′ as the disjoint union E ∪ (E′ \ E). The conclusion follows by
items (ii) and (iii).
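Parts (iii) through (v) can also be sanity-checked numerically on the two-dice model of Example 2.2; the particular events below are illustrative choices of mine:

```python
# Numeric check of Proposition 2.5 (iii)-(v) on the uniform two-dice model.
from fractions import Fraction

Omega = {(w, b) for w in range(1, 7) for b in range(1, 7)}
P = lambda A: Fraction(len(A), len(Omega))

E  = {(w, b) for (w, b) in Omega if w + b == 7}        # T_7
E2 = {(w, b) for (w, b) in Omega if w + b in (7, 11)}  # T_7 union T_11, so E is a subset of E2

assert E <= E2 and P(E2 - E) == P(E2) - P(E)           # part (iii): monotone difference
assert P(Omega - E) == 1 - P(E)                        # part (iv): complement rule

A = {(w, b) for (w, b) in Omega if w == 1}             # white die shows 1
B = {(w, b) for (w, b) in Omega if b == 1}             # black die shows 1
assert P(A | B) == P(A) + P(B) - P(A & B)              # part (v): 11/36 = 6/36 + 6/36 - 1/36
```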
2.4. Discrete probability models. When an experiment has finitely or countably many
outcomes, we specify a discrete probability model. For an at most countable event space (Ω, E),
we define a discrete probability model as follows.
• To each ω ∈ Ω, we assign a probability mass p(ω) so that p(ω) ≥ 0 for all ω ∈ Ω and
Σ_{ω∈Ω} p(ω) = 1.
• To each E ∈ E, we assign P[E] = Σ_{ω∈E} p(ω).
We call p : Ω → [0, 1] a probability mass function (pmf). Examples of discrete probability
models include the coin, dice, lottery, and poker examples we have already discussed.
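As a sketch of the two bullet points above, here is a small discrete model with a non-uniform pmf: a hypothetical loaded die on which 6 is twice as likely as each other face (the weights are invented for illustration, not taken from the notes):

```python
# A discrete probability model: pmf p on Omega = {1,...,6}, and
# P[E] defined as the sum of p(w) over w in E.
from fractions import Fraction

p = {w: Fraction(1, 7) for w in range(1, 6)}
p[6] = Fraction(2, 7)                  # loaded face: 6 is twice as likely
assert all(v >= 0 for v in p.values()) # p(w) >= 0
assert sum(p.values()) == 1            # masses sum to 1

P = lambda E: sum(p[w] for w in E)     # P[E] = sum of p(w) over w in E

even = {2, 4, 6}
assert P(even) == Fraction(4, 7)       # 1/7 + 1/7 + 2/7
assert P(set(p)) == 1                  # P[Omega] = 1
```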
2.5. Continuous probability models. When an experiment has uncountably many possible
outcomes, we use a continuous probability model. For example, consider spinning a needle
pivoted at the origin and recording the angle (in radians) between the needle and the x-axis
upon stopping. Then Ω = [0, 2π) and we define E := {countable unions of [a, b), a < b}. For
each E ∈ E, we define P[E] = Σ_i (bi − ai)/(2π), where E = [a1 , b1 ) ∪ [a2 , b2 ) ∪ · · · with the
intervals disjoint.
Remark 2.6. In the above definition, we can prove that P[∪_i Ei] = Σ_i P[Ei] for countable unions
of disjoint sets, but not uncountable unions.
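A minimal computational sketch of the needle example: the function below assigns to a finite disjoint union of half-open intervals its total length normalized by 2π, so that P[Ω] = 1 as (PM1) requires (the function name and sample events are my own):

```python
# Sketch of the spinner model: P of a finite disjoint union of
# intervals [a_i, b_i) inside [0, 2*pi), as total length / (2*pi).
import math

def P(intervals):
    """intervals: list of (a, b) pairs, 0 <= a < b <= 2*pi, pairwise disjoint."""
    return sum(b - a for (a, b) in intervals) / (2 * math.pi)

# The needle lands in the first quadrant with probability 1/4:
first_quadrant = [(0.0, math.pi / 2)]
assert abs(P(first_quadrant) - 0.25) < 1e-12

# Additivity over disjoint pieces (here just two halves of the circle):
halves = [(0.0, math.pi), (math.pi, 2 * math.pi)]
assert abs(P(halves) - 1.0) < 1e-12
```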
The main difference between discrete and continuous probability models is that discrete
models involve algebra and combinatorics whereas continuous models involve calculus.
We begin with discrete probability models because they are often more intuitive, and we
move to continuous probability models in the second half of the course.