MTHE/STAT 351: 1 – Axioms of Probability
T. Linder
Queen’s University
Fall 2016

Sample Spaces and Events
Random experiment: An experiment whose outcome is uncertain (e.g., coin tosses, drawing a card from a deck, tomorrow’s temperature in Kingston).
Sample space: The set of all possible outcomes of a random experiment. Usually denoted by $S$.
Events: Subsets of the sample space $S$. Usually denoted by $A$, $B$, $C$, etc. Notation: $A \subset S$.
Examples:
- Flipping a coin. Possible outcomes are $H$ and $T$. Sample space: $S = \{H, T\}$.
- Rolling a die. $S = \{1, 2, 3, 4, 5, 6\}$. Let $A = \{1, 3, 5\}$. Then $A \subset S$ is the event that “an odd number is rolled.”
- Flipping a coin until we get a total of two heads or a total of two tails.
\[ S = \{HH, TT, HTT, HTH, THH, THT\} \]
The event “2 flips are needed to stop” is $\{HH, TT\}$.
The event “3 flips are needed to stop” is $\{HTT, HTH, THH, THT\}$.
- A coin is flipped 3 times and the sequence of outcomes is recorded.
\[ S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\} \]
Suppose the number of heads is recorded. Then
\[ S = \{0, 1, 2, 3\} \]
Note: An appropriate sample space for an experiment depends on what is being observed.

Sample spaces need not be finite.
Examples:
- Professor L. arrives at his 8:30 am class no later than 8:40. Let the observation be the amount of time (in minutes) he is late. We have
\[ S = \{t : 0 \le t \le 10\} = [0, 10]. \]
Then $A = (3, 10]$ is the event that “Professor L. is more than 3 minutes late.”
- Observe the lifetime of a light bulb in hours. We have
\[ S = \{t : 0 \le t < \infty\} = [0, \infty). \]
Then $B = [0, 1000]$ is the event that the light bulb does not survive past 1000 hours.
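The finite sample spaces in the coin examples above are small enough to enumerate. The following is a minimal sketch, assuming outcomes are represented as strings of `H` and `T`; the names `stop_at_two`, `S_stop`, `S_seq`, and `S_heads` are illustrative and do not come from the slides.

```python
from itertools import product


def stop_at_two(prefix=""):
    """List all flip sequences that stop once H or T has appeared twice in total."""
    if prefix.count("H") == 2 or prefix.count("T") == 2:
        return [prefix]
    # Otherwise the experiment continues with one more flip.
    return stop_at_two(prefix + "H") + stop_at_two(prefix + "T")


# Sample space for "flip until two heads or two tails in total".
S_stop = set(stop_at_two())
two_flips = {s for s in S_stop if len(s) == 2}
three_flips = {s for s in S_stop if len(s) == 3}

# Same physical experiment (3 flips), two different observations.
S_seq = {"".join(flips) for flips in product("HT", repeat=3)}
S_heads = {s.count("H") for s in S_seq}

print(sorted(S_stop))
print(sorted(S_seq))
print(sorted(S_heads))
```

The last two sets illustrate the note above: recording the full sequence gives 8 outcomes, while recording only the number of heads gives 4.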
Language of probability theory
Probability theory has its own terminology somewhat different from that of set theory.
Universal set → Sample space
Set → Event
In this language $S$ is the “certain event” and $\emptyset$ is the “impossible event.”
Let $E$ and $F$ be subsets of the sample space $S$. If the outcome of a random experiment belongs to $E$, we say that “$E$ occurs.”
How can $E^c$, $E \cup F$, $EF$, and $E - F$ be described in this language?
$E^c$ → $E$ does not occur
$E \cup F$ → at least one of $E$ or $F$ occurs
$E \cap F$ (or $EF$) → $E$ and $F$ occur simultaneously
$E - F$ → $E$ occurs, but $F$ does not occur

If $E \subset F$, then $F$ occurs whenever $E$ occurs. We say that $E$ implies $F$.
Let $E_1, E_2, \ldots, E_n$ be events. Then
\[ \bigcup_{i=1}^{n} E_i \]
means that at least one of the $E_i$ occurs. Also,
\[ \bigcap_{i=1}^{n} E_i \]
is the event that all $E_i$ occur simultaneously.
If $E_1, E_2, \ldots, E_n$ are mutually exclusive, i.e., $E_i \cap E_j = \emptyset$ for $i \ne j$, then $\bigcap_{i=1}^{n} E_i$ is the impossible event.
Practice:
$E$ and $F$ are events of the sample space $S$. The event “exactly one of $E$ and $F$ occurs” can be expressed as
\[ (E - F) \cup (F - E). \]
Let’s show that this is equal to $(E \cup F) - (E \cap F)$ using the properties of set operations:
\begin{align*}
(E - F) \cup (F - E) &= (E F^c) \cup (F E^c) && \text{(by definition)} \\
&= (E \cup F)\underbrace{(E \cup E^c)}_{S}\underbrace{(F^c \cup F)}_{S}(F^c \cup E^c) && \text{(by distributivity)} \\
&= (E \cup F)(F^c \cup E^c) && \text{(since $AS = A$ for all $A \subset S$)} \\
&= (E \cup F)(F \cap E)^c && \text{(by DeMorgan’s law)} \\
&= (E \cup F) - (F \cap E) && \text{(by definition).}
\end{align*}

$E$, $F$, and $G$ are events of the sample space $S$. The event “at least one of $E$, $F$, $G$ occurs” is
\[ E \cup F \cup G. \]
The event “at most one of $E$, $F$, $G$ occurs” is
\[ \underbrace{(E \cup F \cup G)^c}_{\text{none of them occurs}} \cup \underbrace{\bigl[ (E \cup F \cup G) - ((E \cap F) \cup (E \cap G) \cup (F \cap G)) \bigr]}_{\text{exactly one of them occurs}} \]
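The two practice identities above can be checked by brute force on a small sample space. This is only a sanity check under the assumption of a four-element $S$ (an arbitrary choice), not a substitute for the set-algebra derivation; the helper names are illustrative.

```python
from itertools import combinations

# Small sample space chosen only for illustration.
S = frozenset({1, 2, 3, 4})


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def exactly_one(E, F):
    """(E - F) union (F - E)."""
    return (E - F) | (F - E)


def at_most_one(E, F, G):
    """(E u F u G)^c  union  [(E u F u G) - (EF u EG u FG)]."""
    union = E | F | G
    pairwise = (E & F) | (E & G) | (F & G)
    return (S - union) | (union - pairwise)


all_events = subsets(S)

# Identity derived above: (E - F) u (F - E) = (E u F) - (E n F).
identity_holds = all(
    exactly_one(E, F) == (E | F) - (E & F)
    for E in all_events for F in all_events
)

# "At most one occurs" should contain exactly the outcomes lying in <= 1 of the events.
at_most_one_holds = all(
    at_most_one(E, F, G)
    == frozenset(s for s in S if (s in E) + (s in F) + (s in G) <= 1)
    for E in all_events for F in all_events for G in all_events
)

print(identity_holds, at_most_one_holds)
```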
Example: A die is rolled repeatedly until 6 appears.
(a) What is the sample space $S$ of the experiment?
The sample space is the collection of all finite-length sequences of integer numbers between 1 and 6 such that 6 appears only in the last position, plus the collection of all infinite sequences in which 6 does not appear.
Formally
\[ S = \bigcup_{n=1}^{\infty} \{a_1, a_2, \ldots, a_{n-1}, 6 : 1 \le a_j \le 5,\ j = 1, \ldots, n-1\} \cup A \]
where $A = \{a_1, a_2, \ldots : 1 \le a_j \le 5,\ j = 1, 2, \ldots\}$.

(b) Let $E_n$ be the event “$n$ rolls necessary to complete the experiment.” Describe $E_n$ in terms of the elements of $S$.
$E_n$ is the collection of all length-$n$ sequences of integer numbers between 1 and 6 such that 6 appears only in the last position:
\[ E_n = \{a_1, a_2, \ldots, a_{n-1}, 6 : 1 \le a_j \le 5,\ j = 1, \ldots, n-1\} \]
(c) What is the event $\bigl(\bigcup_{n=1}^{\infty} E_n\bigr)^c$?
\[ \Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr)^c = \{a_1, a_2, \ldots : 1 \le a_j \le 5,\ j = 1, 2, \ldots\} = A. \]
This is the event that 6 never appears.

Axioms of probability
Assume a random experiment is repeatedly performed. For an event $E$ in the sample space, let $n(E)$ be the number of times $E$ occurs in the first $n$ repetitions. One way of defining the probability $P(E)$ of $E$ is the limit of relative frequencies
\[ P(E) = \lim_{n \to \infty} \frac{n(E)}{n} \]
There are several conceptual problems with this definition:
- We cannot repeat the experiment infinitely many times.
- Why should the limit exist?
- Even if the limit exists, why should it be the same if the entire experiment is repeated a second time?
It turns out that it is much easier to get a mathematically consistent theory if we assume that $P(E)$ exists for all events and satisfies certain intuitively desirable axioms.

Definition (Event Space) An event space is a collection of events (subsets) of a sample space $S$ such that
1) $S$ is an event
2) If $E$ is an event, then $E^c$ is also an event
3) If $E_1, E_2, E_3, \ldots$ are events, then so is $\bigcup_{n=1}^{\infty} E_n$.
Remarks:
- $\emptyset$ is an event since $S^c = \emptyset$.
- The union of a finite collection of events is also an event since $\bigcup_{i=1}^{n} E_i = \bigcup_{i=1}^{\infty} E_i$ if we set $E_i = \emptyset$ for $i = n+1, n+2, \ldots$.
- The intersection of finitely or countably many events is also an event. For example, $EF$ is an event since $EF = (E^c \cup F^c)^c$.
- If $S$ is a finite set, then the collection of all subsets of $S$ is an event space.
- A collection of subsets of $S$ satisfying properties 1, 2, and 3 is called a $\sigma$-algebra or $\sigma$-field.
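For a finite sample space, the three event-space properties above can be tested directly. The sketch below assumes a three-element $S$ and uses hypothetical helper names (`power_set`, `is_event_space`). On a finite collection, closure under pairwise unions already implies closure under countable unions, because any sequence of events contains only finitely many distinct sets.

```python
from itertools import combinations


def power_set(S):
    """Collection of all subsets of a finite set S."""
    items = sorted(S)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_event_space(events, S):
    """Check properties 1-3 for a finite collection of subsets of S."""
    S = frozenset(S)
    has_S = S in events                                         # property 1
    closed_complement = all((S - E) in events for E in events)  # property 2
    # Property 3, reduced to pairwise unions (valid for finite collections).
    closed_union = all((E | F) in events for E in events for F in events)
    return has_S and closed_complement and closed_union


S = {1, 2, 3}
events = power_set(S)
trivial = {frozenset(), frozenset(S)}
not_closed = {frozenset(), frozenset({1}), frozenset(S)}  # missing {2, 3}

print(is_event_space(events, S))
print(is_event_space(trivial, S))
print(is_event_space(not_closed, S))
```

The third collection fails because it contains $\{1\}$ but not its complement.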
Definition (Probability Axioms) A real-valued function $P$ on the event space is called a probability function if it satisfies the following:
1) $P(E) \ge 0$ for all events $E$.
2) $P(S) = 1$.
3) If $E_1, E_2, E_3, \ldots$ are mutually exclusive events (i.e., $E_i E_j = \emptyset$ for all $i \ne j$), then
\[ P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i). \]
Remark: Property 3 is called countable additivity.
These axioms directly imply the following theorems.

Theorem 1
$P(\emptyset) = 0$.
Proof Let
\[ E_1 = S, \qquad E_i = \emptyset, \quad i = 2, 3, \ldots \]
Then the $E_i$ are mutually exclusive and $S = \bigcup_{i=1}^{\infty} E_i$. Thus by Axiom 3,
\[ P(S) = \sum_{i=1}^{\infty} P(E_i) = P(S) + \sum_{i=2}^{\infty} P(E_i) \]
implying that
\[ \sum_{i=2}^{\infty} P(E_i) = 0. \]
By Axiom 1, $P(E_i) \ge 0$ for all $i$. Since $E_i = \emptyset$ for all $i \ge 2$, this gives $P(\emptyset) = 0$. □
Another consequence of Axiom 3 is
3*. (Finite additivity) If $E_1, E_2, \ldots, E_n$ is a finite collection of mutually exclusive events, then
\[ P\Bigl(\bigcup_{i=1}^{n} E_i\Bigr) = \sum_{i=1}^{n} P(E_i). \]
Proof In Axiom 3 set $E_i = \emptyset$ for all $i > n$. Then
\begin{align*}
P\Bigl(\bigcup_{i=1}^{n} E_i\Bigr) &= P\Bigl(\bigcup_{i=1}^{\infty} E_i\Bigr) = \sum_{i=1}^{\infty} P(E_i) \\
&= \sum_{i=1}^{n} P(E_i) + \underbrace{\sum_{i=n+1}^{\infty} P(\emptyset)}_{=0}.
\end{align*}
□
Important special case: If $E$ and $F$ are mutually exclusive, then
\[ P(E \cup F) = P(E) + P(F) \]

Theorem 2
$P(E^c) = 1 - P(E)$.
Proof $E$ and $E^c$ are mutually exclusive, so
\[ P(E) + P(E^c) = P(E \cup E^c) = P(S) = 1. \]
□
Corollary $0 \le P(E) \le 1$ for all events $E$.
Proof Since by Axiom 1, $P(E) \ge 0$ and $P(E^c) \ge 0$,
\[ 0 \le P(E) = 1 - \underbrace{P(E^c)}_{\ge 0} \le 1. \]
□
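The consequences above (Theorem 1, finite additivity, Theorem 2) can be seen concretely on a finite sample space where $P$ is built from point masses. The sketch below assumes a loaded six-sided die with made-up masses; the numbers and the names `mass` and `P` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

# Hypothetical point masses for a loaded die (an assumption, not from the slides).
mass = {
    1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
    4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2),
}
S = frozenset(mass)


def P(event):
    """Probability of an event as the sum of the masses of its outcomes."""
    return sum((mass[s] for s in event), Fraction(0))


E = frozenset({1, 3, 5})   # "an odd number is rolled"
F = frozenset({6})         # disjoint from E

print(P(S))                # Axiom 2
print(P(frozenset()))      # Theorem 1
print(P(E | F), P(E) + P(F))   # finite additivity for disjoint E, F
print(P(S - E), 1 - P(E))      # Theorem 2
```

Exact fractions are used so that the equalities hold exactly rather than up to floating-point rounding.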
Equally likely outcomes
Example: We call a coin fair if $H$ and $T$ are equally likely in a single toss. We’ll apply the axioms to figure out $P(\{H\})$ and $P(\{T\})$.
Since $S = \{H, T\}$ and $\{H\} \cap \{T\} = \emptyset$, we have
\begin{align*}
1 &= P(S) \\
&= P(\{H, T\}) \\
&= P(\{H\}) + P(\{T\}) && \text{(by Axiom 3)} \\
&= 2P(\{H\}) && \text{(by the equally likely assumption)}
\end{align*}
Thus $P(\{H\}) = P(\{T\}) = 1/2$.
Note that this is not the result of a practical experiment, but the consequence of the axioms.

Suppose $S = \{s_1, s_2, \ldots, s_N\}$ is a sample space with $N$ equally likely outcomes:
\[ P(\{s_1\}) = P(\{s_2\}) = \cdots = P(\{s_N\}). \]
Using the axioms as before,
\[ 1 = P(S) = P\Bigl(\bigcup_{i=1}^{N} \{s_i\}\Bigr) = \sum_{i=1}^{N} P(\{s_i\}) = N P(\{s_1\}) \]
so
\[ P(\{s_i\}) = \frac{1}{N}, \qquad i = 1, \ldots, N. \]
Using this, we show that the “equally likely outcomes” assumption imposes a probability function on $S$.
Theorem 3
Let $S$ be a sample space consisting of $N$ equally likely outcomes. Then for all $E \subset S$,
\[ P(E) = \frac{|E|}{N} \]
where $|E|$ denotes the number of elements in $E$.
Proof We showed that $P(\{s\}) = 1/N$ for any $s \in S$. Thus
\[ P(E) = P\Bigl(\bigcup_{s \in E} \{s\}\Bigr) = \sum_{s \in E} P(\{s\}) = \sum_{s \in E} \frac{1}{N} = \frac{|E|}{N}. \]
□

Example: If two dice are rolled, what is the probability that the sum of the obtained two numbers is 5?
Solution: $S = \{(i, j) : 1 \le i, j \le 6\}$. We assume that the dice are fair, so all 36 outcomes in $S$ are equally likely. If $E$ = “sum is 5,” then
\[ E = \{(1, 4), (2, 3), (3, 2), (4, 1)\} \]
so $P(E) = \frac{4}{36} = \frac{1}{9}$.
Example: What is the probability of getting exactly 2 heads in 3 flips of a fair coin?
Solution: $S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$ and $E = \{HHT, HTH, THH\}$. Thus
\[ P(E) = \frac{|E|}{8} = \frac{3}{8}. \]
Later we’ll learn how to calculate the probability of 3 heads in 20 flips.
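Theorem 3 reduces both examples above to counting. The following minimal sketch enumerates the two sample spaces and applies $P(E) = |E|/N$; the function name `prob` and the representation of outcomes as tuples and strings are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def prob(event, sample_space):
    """P(E) = |E| / N for a sample space with N equally likely outcomes."""
    return Fraction(len(event), len(sample_space))


# Two fair dice: all ordered pairs (i, j) with 1 <= i, j <= 6.
dice = set(product(range(1, 7), repeat=2))
sum_is_5 = {(i, j) for (i, j) in dice if i + j == 5}

# Three flips of a fair coin: all H/T sequences of length 3.
coins = {"".join(flips) for flips in product("HT", repeat=3)}
two_heads = {s for s in coins if s.count("H") == 2}

print(prob(sum_is_5, dice))
print(prob(two_heads, coins))
```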
The following is an example where the outcomes are not equally likely.
Example: $A$, $B$, and $C$ are the only competitors in a race. $A$ is twice as likely as $B$ to win, and $C$ is 2/3 as likely as $A$ to win. There are no ties. What are the probabilities of winning for $A$, $B$, and $C$?
Solution: Let $A$ = “A wins,” $B$ = “B wins,” $C$ = “C wins.” Then $S = \{A, B, C\}$ and
\[ 1 = P(S) = P(A) + P(B) + P(C). \]
We also know that
\[ P(A) = 2P(B) \quad \text{and} \quad P(C) = \tfrac{2}{3} P(A). \]
This is a system of 3 linear equations with 3 unknowns. The solution is
\[ P(A) = \frac{6}{13}, \qquad P(B) = \frac{3}{13}, \qquad P(C) = \frac{4}{13}. \]

More basic consequences of the axioms
Theorem 4
If $E \subset F$, then
\[ P(E) \le P(F) \]
Proof $E \subset F$ implies that $F = E \cup (F - E)$. Also, $E$ and $F - E$ are disjoint. Thus by Axiom 3,
\[ P(F) = P(E) + \underbrace{P(F - E)}_{\ge 0 \text{ by Axiom 1}} \ge P(E). \]
□
Corollary If $E \subset F$, then
\[ P(F - E) = P(F) - P(E) \]

Theorem 5
For arbitrary events $E$ and $F$
\[ P(E \cup F) = P(E) + P(F) - P(EF) \]
Proof We have $E \cup F = E \cup (F - (EF))$. Since $E$ and $F - (EF)$ are disjoint, by Axiom 3
\[ P(E \cup F) = P(E) + P(F - (EF)). \]
But $EF \subset F$, so by the previous theorem
\[ P(E \cup F) = P(E) + P(F) - P(EF). \]
□
Corollary For any two events $E$ and $F$
\[ P(E \cup F) \le P(E) + P(F) \]
This inequality is called the union bound.

Example: An integer between 1 and 100 is chosen at random. What is the probability that it is divisible by either 5 or 7?
Solution: $E$ = “divisible by 5,” $F$ = “divisible by 7.” Then
\[ P(E \cup F) = P(E) + P(F) - P(EF) \]
where
\[ P(E) = \frac{|E|}{100} = \frac{20}{100}, \qquad P(F) = \frac{|F|}{100} = \frac{14}{100}. \]
Since 5 and 7 are primes, an integer is divisible by both 5 and 7 iff it is divisible by $5 \cdot 7 = 35$. Thus $P(EF) = \frac{|EF|}{100} = \frac{2}{100}$ and
\[ P(E \cup F) = \frac{20}{100} + \frac{14}{100} - \frac{2}{100} = 0.32 \]
One can generalize Theorem 5 to $n > 2$ events. For 3 events we have
\[ P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG) \]
This is the so-called inclusion-exclusion principle.
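Both worked examples above (the race and the divisibility question) can be reproduced with exact arithmetic. The sketch below assumes that “between 1 and 100” includes both endpoints; the variable names are illustrative and not taken from the slides.

```python
from fractions import Fraction

# Race example: P(A) = 2 P(B), P(C) = (2/3) P(A), and the three sum to 1.
# Express each probability as a multiple of P(B), then normalize.
weights = {"A": Fraction(2), "B": Fraction(1), "C": Fraction(2, 3) * 2}
total = sum(weights.values())
race = {name: w / total for name, w in weights.items()}

# Divisibility example: an integer chosen at random from 1..100 (inclusive, assumed).
N = 100
E = {k for k in range(1, N + 1) if k % 5 == 0}   # divisible by 5
F = {k for k in range(1, N + 1) if k % 7 == 0}   # divisible by 7


def P(A):
    """Equally likely outcomes: P(A) = |A| / N."""
    return Fraction(len(A), N)


# Theorem 5: P(E u F) = P(E) + P(F) - P(EF).
p_union = P(E) + P(F) - P(E & F)

print(race)
print(p_union, P(E | F))
print(P(E | F) <= P(E) + P(F))  # union bound
```

Counting the union directly gives the same value as the formula, which is the content of Theorem 5 in this finite setting.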
Example: In a hotel with 300 guests, there are 27 guests who smoke cigarettes, 11 who smoke cigars, 8 who smoke pipes, 4 who smoke both cigarettes and cigars, 3 who smoke both cigarettes and pipes, 3 who smoke both pipes and cigars. Also, there is one guest who smokes all three.
How many non-smoking guests are staying in the hotel?
Solution: Let $S$ denote the set of all guests. If a guest is randomly chosen, then for all $A \subset S$ the probability that $A$ contains this guest is
\[ P(A) = \frac{|A|}{N}, \quad \text{where } N = 300. \]
Let $E$ = “guests who smoke cigarettes,” $F$ = “guests who smoke cigars,” and $G$ = “guests who smoke pipes.” Then $E \cup F \cup G$ is the set of guests who smoke and
\[ N \cdot P(E \cup F \cup G) = |E \cup F \cup G| \]
is the number of guests who smoke.

By the inclusion-exclusion formula
\[ P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(EF) - P(EG) - P(FG) + P(EFG). \]
Since $P(A) = \frac{|A|}{N}$, multiplying both sides by $N$ gives
\[ |E \cup F \cup G| = |E| + |F| + |G| - |EF| - |EG| - |FG| + |EFG|. \]
We are given that $|E| = 27$, $|F| = 11$, $|G| = 8$, $|EF| = 4$, $|EG| = 3$, $|FG| = 3$, and $|EFG| = 1$. Thus
\[ |E \cup F \cup G| = 27 + 11 + 8 - 4 - 3 - 3 + 1 = 37 \]
Hence there are $300 - 37 = 263$ non-smokers among the guests.
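The counting form of inclusion-exclusion used in the hotel example is a one-line formula. A minimal sketch, with a hypothetical function name `union_size_3`:

```python
def union_size_3(e, f, g, ef, eg, fg, efg):
    """|E u F u G| from the sizes of the sets and of their intersections."""
    return e + f + g - ef - eg - fg + efg


guests = 300
smokers = union_size_3(e=27, f=11, g=8, ef=4, eg=3, fg=3, efg=1)
non_smokers = guests - smokers

print(smokers, non_smokers)
```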
Continuity of the probability function
Recall that a function $f : \mathbb{R} \to \mathbb{R}$ is continuous on the real line if and only if (iff) for all $x$ and convergent sequences $\{x_n\}_{n=1}^{\infty}$ such that $\lim_{n \to \infty} x_n = x$, we have
\[ \lim_{n \to \infty} f(x_n) = f(x). \]
Probability functions have an analogous continuity property. Call a sequence of events $E_1, E_2, E_3, \ldots$ increasing if
\[ E_1 \subset E_2 \subset \cdots \subset E_n \subset \cdots \]
and decreasing if
\[ E_1 \supset E_2 \supset \cdots \supset E_n \supset \cdots \]

If $\{E_n\}$ is increasing define
\[ \lim_{n \to \infty} E_n = \bigcup_{n=1}^{\infty} E_n. \]
For a decreasing sequence $\{E_n\}$ define
\[ \lim_{n \to \infty} E_n = \bigcap_{n=1}^{\infty} E_n. \]
Theorem 6 (Continuity of Probability Function)
If $\{E_n\}$ is an increasing or decreasing sequence of events, then
\[ \lim_{n \to \infty} P(E_n) = P(\lim_{n \to \infty} E_n). \]
Proof for increasing sequences Let $A_1 = E_1$ and $A_n = E_n - E_{n-1}$ for $n \ge 2$. (Thus $A_n$ contains all elements of $E_n$ that are not in any of $E_1, \ldots, E_{n-1}$.) The events $A_1, A_2, A_3, \ldots$ are mutually exclusive and
\[ \bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} E_i \quad \text{and} \quad \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} E_i. \]
Thus
\begin{align*}
P(\lim_{n \to \infty} E_n) &= P\Bigl(\bigcup_{n=1}^{\infty} E_n\Bigr) = P\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} P(A_n) && \text{(by Axiom 3)} \\
&= \lim_{n \to \infty} \sum_{i=1}^{n} P(A_i) = \lim_{n \to \infty} P\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \lim_{n \to \infty} P\Bigl(\underbrace{\bigcup_{i=1}^{n} E_i}_{E_n}\Bigr) \\
&= \lim_{n \to \infty} P(E_n).
\end{align*}
□

Random selection of a point from an interval
We want to build a probability model for “randomly selecting” a point from a bounded interval $[a, b] = \{x : a \le x \le b\}$.
For any sub-interval $[\alpha, \beta] \subset [a, b]$, we’ll denote the event “the point falls in $[\alpha, \beta]$” also by $[\alpha, \beta]$.
By intuitive reasoning, for $\alpha < \beta$ the probability $P([\alpha, \beta])$ should be proportional to the length of $[\alpha, \beta]$, i.e.,
\[ P([\alpha, \beta]) = k(\beta - \alpha) \quad \text{for some } k > 0. \]
By the axioms,
\[ 1 = P(S) = P([a, b]) = k(b - a) \]
so $k = \frac{1}{b - a}$. Thus, if $\alpha < \beta$ and $[\alpha, \beta] \subset [a, b]$,
\[ P([\alpha, \beta]) = \frac{\beta - \alpha}{b - a}. \]

What is the probability of selecting a given point $x$?
Let $x \in (a, b)$ and choose $\epsilon > 0$ such that $[x - \epsilon, x + \epsilon] \subset [a, b]$. Define
\[ E_n = [x - \epsilon/n,\ x + \epsilon/n]. \]
Then $E_1 \supset E_2 \supset E_3 \supset \cdots$ (decreasing sequence) and
\[ \lim_{n \to \infty} E_n = \bigcap_{n=1}^{\infty} E_n = \{x\} \]
Thus by the continuity of $P$
\[ P(\{x\}) = P(\lim_{n \to \infty} E_n) = \lim_{n \to \infty} P(E_n) = \lim_{n \to \infty} \frac{2\epsilon/n}{b - a} = 0. \]
Thus selecting $x$ has zero probability for all $x$.

What is the probability of $(\alpha, \beta) \subset [a, b]$?
\begin{align*}
P([\alpha, \beta]) &= P(\{\alpha\} \cup (\alpha, \beta) \cup \{\beta\}) \\
&= \underbrace{P(\{\alpha\})}_{=0} + P((\alpha, \beta)) + \underbrace{P(\{\beta\})}_{=0} \\
&= P((\alpha, \beta)).
\end{align*}
Similarly we can show that $P([\alpha, \beta]) = P((\alpha, \beta]) = P([\alpha, \beta))$.
Remarks:
- The fact that $P(\{x\}) = 0$ shows that there are non-empty events with zero probability.
- We have seen that $P((a, b)) = P([a, b]) = 1$. Thus there are events with probability 1 that are not equal to $S$.
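The interval model above, together with the shrinking sequence $E_n = [x - \epsilon/n, x + \epsilon/n]$, can be tabulated numerically. The sketch below assumes the concrete values $[a, b] = [0, 1]$, $x = 1/2$, and $\epsilon = 1/4$, all of which are arbitrary choices for illustration, as is the function name `p_interval`.

```python
from fractions import Fraction


def p_interval(alpha, beta, a, b):
    """P([alpha, beta]) = (beta - alpha) / (b - a) for a point chosen at random from [a, b]."""
    if not (a <= alpha <= beta <= b):
        raise ValueError("need a <= alpha <= beta <= b")
    return Fraction(beta - alpha) / Fraction(b - a)


# Assumed concrete values, chosen only for illustration.
a, b = 0, 1
x = Fraction(1, 2)
eps = Fraction(1, 4)

# P(E_n) = (2 * eps / n) / (b - a) for the decreasing sequence E_n around x.
probs = [p_interval(x - eps / n, x + eps / n, a, b) for n in (1, 10, 100, 1000)]

for p in probs:
    print(p)
```

The printed values shrink toward 0, in line with the continuity argument giving $P(\{x\}) = 0$. A finite table like this only suggests the limit; the proof is the argument in the text.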
Example: A bus arrives at a bus station at a random time between 8:00 and 8:15 am. Its scheduled arrival time is 8:05 am. Let’s call the bus almost punctual if it is less than 2 minutes early and less than 5 minutes late. What is the probability that the bus is not almost punctual?
Solution: Let’s measure the time in minutes. For simplicity, shift the time interval so that the bus randomly arrives in the interval $[0, 15]$ and the scheduled arrival time is 5. Then “bus almost punctual” = $(3, 10)$. Thus
\begin{align*}
P(\text{not almost punctual}) &= 1 - P(\text{almost punctual}) \\
&= 1 - P((3, 10)) \\
&= 1 - \frac{10 - 3}{15} = \frac{8}{15}.
\end{align*}

Technical detour
One can ask the following question: in the experiment of randomly selecting a point from an interval, what subsets of the interval $[a, b]$ are events?
Recall that the collection of events of a sample space has the property that
(a) $S$ is an event,
(b) if $E$ is an event, then $E^c$ is also an event,
(c) if $E_1, E_2, \ldots$ is a sequence of events, then $\bigcup_{n=1}^{\infty} E_n$ is also an event.
If $S$ is a finite or countably infinite set, then the set of events is usually taken to be the collection of all subsets of $S$. (Check that (a), (b), and (c) hold in this case.)
If $S$ is an uncountably infinite set, such as the interval $[a, b]$, the choice of events is more tricky.
We have already seen that all (open or closed) subintervals of $[a, b]$ are events. It follows that all finite and countably infinite unions of open or closed (or half open, half closed) subintervals of $[a, b]$ are events.
We know from real analysis that each open set in $\mathbb{R}$ can be written as a union of countably many open intervals. Thus by (c) all open subsets of $[a, b]$ are events.
Since the complement of a closed set is an open set, it follows from (b) that all closed subsets of $[a, b]$ are events.
The smallest collection of subsets of $[a, b]$ that contains all open sets in $[a, b]$ and satisfies (a), (b), (c) is called the collection of Borel sets in $[a, b]$.
It turns out that the notion of length can be extended to any Borel set $B \subset [a, b]$, and one can define the probability that a point randomly chosen from $[a, b]$ falls into $B$ by
\[ P(B) = \frac{\text{length}(B)}{b - a} \]
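For the simplest Borel sets, finite unions of intervals, the formula $P(B) = \text{length}(B)/(b - a)$ can be computed directly. The sketch below is restricted to that special case and does not attempt general Borel sets; the function names and the `(left, right)` pair representation are assumptions. Endpoints are ignored because single points have probability zero. The bus example serves as a check, with “not almost punctual” written as $[0, 3] \cup [10, 15]$.

```python
from fractions import Fraction


def length(intervals):
    """Total length of a finite union of intervals given as (left, right) pairs.

    Overlapping pieces are merged first so that nothing is counted twice.
    """
    total = Fraction(0)
    current_left = current_right = None
    for left, right in sorted(intervals):
        if current_right is None or left > current_right:
            # Start a new piece; first add the finished one, if any.
            if current_right is not None:
                total += Fraction(current_right) - Fraction(current_left)
            current_left, current_right = left, right
        else:
            # Overlaps (or touches) the current piece: extend it.
            current_right = max(current_right, right)
    if current_right is not None:
        total += Fraction(current_right) - Fraction(current_left)
    return total


def P(intervals, a, b):
    """P(B) = length(B) / (b - a) for B a finite union of subintervals of [a, b]."""
    return length(intervals) / Fraction(b - a)


# Bus example: arrival time in [0, 15], "almost punctual" = (3, 10).
not_punctual = [(0, 3), (10, 15)]
print(P(not_punctual, 0, 15))
```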