Lectures 5–11
Conditional Probability and Independence
Purpose: Calculate probabilities under restrictions, conditions or partial information on the
random experiment. Break down complex probabilistic analyses into manageable steps.
Example 1 (Roll of Two Fair Dice): What is the long run relative frequency of a sum of
seven given that we rolled at least one six? The relative frequencies of the shaded boxes should
be roughly equal in the long run. Thus the conditional relative frequency of the light shaded boxes
among the shaded boxes should be about 2/11.
[Figure: 6 × 6 grid of all 36 outcomes of the two dice. The 11 outcomes with at least one six are shaded; among these, the two outcomes with sum 7, namely (1,6) and (6,1), are shaded lighter.]

P(sum of 7 | at least one 6) = 2/11 = (2/36)/(11/36)
Think of the original probabilities of the shaded region (adding to 11/36) being prorated so that
the new conditional probabilities of the individual shaded boxes add to one. This is accomplished by
dividing the original shaded box probabilities by this total shaded (conditioning) region probability,
i.e., by 11/36, thus giving us the desired (11/36)/(11/36) = 1, the probability of our new sample
space (shaded region).
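A quick numerical check of the 2/11 answer (not part of the original notes): a minimal Python sketch that estimates the conditional relative frequency by keeping only the rolls that show at least one six.

```python
import random

random.seed(1)
trials = 10**6
with_six = 0       # rolls showing at least one six (the conditioning event)
seven_and_six = 0  # of those, rolls whose sum is seven

for _ in range(trials):
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    if d1 == 6 or d2 == 6:
        with_six += 1
        if d1 + d2 == 7:
            seven_and_six += 1

print(seven_and_six / with_six)  # should be close to 2/11 = 0.1818...
```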
Definition: If P(F) > 0, then

P(E|F) = P(EF)/P(F)

and P(E|F) is undefined when P(F) = 0!
Long Run Frequency Interpretation of P(E|F): By the long run frequency paradigm we would,
in a large number N of repeated experiments, see about a proportion P(F) of the experiments result
in the event F, i.e., we would expect to see event F occur about N · P(F) times, and similarly
event EF about N · P(EF) times. Thus, when we focus on the experiments that result in F (i.e., given
F), the proportion of such restricted experiments that also result in E is approximately

N · P(EF) / (N · P(F)) = P(EF)/P(F) = P(E|F)
Example 2 (Makeup Exam): A student is given a makeup exam and is given one hour to finish
it. The exam is designed so that a fraction x/2 of students would finish it in less than x hours, i.e.,
about half would finish it in the allotted time. Given that the student (when viewed as a random
choice from all students) is still working on the exam after 40 minutes, what is the chance that the
student will use the full hour (finished or not)?
Let X denote the time (in hours) that the student needs to finish the exam. We want
P(X < 1 | X ≥ 40/60) = P(40/60 ≤ X < 1) / P(X ≥ 40/60) = (P(X < 1) − P(X < 40/60)) / (1 − P(X < 40/60)) = (1/2 − 1/3)/(1 − 1/3) = 1/4
X < 1 is shorthand for the event that the student finishes prior to 1 hour, and similarly for the
other usages. The chance that the student will use the full hour (finished or not) is therefore 1 − 1/4 = .75.
Prorated probabilities: Note that in the case of equally likely outcomes it is often easier to
work with the reduced sample space treating the remaining outcomes as equally likely. In general,
outcome probabilities are prorated
P({e}|F) = P({e})/P(F) if e ∈ F, and P({e}|F) = 0 otherwise.
When P ({e}) are all the same, then P ({e})/P (F ) are all the same for all e ∈ F .
Example 3 (Coin Flips): Two fair coins are flipped. All outcomes of the sample space S =
{(H, H), (H, T), (T, H), (T, T)} are equally likely. What is the conditional probability that both
coins are heads given a) that the first coin shows heads and b) at least one of the coins shows
heads?
Let B = {(H, H)}, F = {(H, T), (H, H)} and A = {(H, T), (T, H), (H, H)}; then for a)

P(B|F) = P(BF)/P(F) = P(B)/P(F) = (1/4)/(1/2) = 1/2

while for b) we get

P(B|A) = P(AB)/P(A) = P(B)/P(A) = (1/4)/(3/4) = 1/3
This takes many by surprise, and it is often phrased in terms of the boy/girl problem in a family
with two children. If you are told that at least one of the children is a girl, the answer again is
1/3 for the probability that both are girls. Note however the lengthy but readable discussion in
Example 3m in Section 3.3 of Ross.
Here comes the kicker. Suppose you are told that at least one of the children is a girl who was born on a
Sunday, and assume all 2 × 7 × 2 × 7 = 196 gender/weekday combinations (G1, D1, G2, D2) are
equally likely. What does the answer change to? We will generalize this from days Di to attributes Ai which
can take any of n values. For example, for n = 365 the attribute could be the day of the year.
If I tell you that at least one of the children is a girl with attribute A = k, then the given
information reduces the sample space to

{(G1, k, G2, i), i = 1, . . . , n} ∪ {(G1, i, G2, k), i = 1, . . . , n, i ≠ k} ∪ {(G1, k, B2, i), i = 1, . . . , n} ∪ {(B1, i, G2, k), i = 1, . . . , n}
with altogether n + (n − 1) + n + n = 4n − 1 equally likely outcomes. Of these n + n − 1 = 2n − 1
qualify as having two girls. Thus the conditional chance is
(2n − 1)/(4n − 1) ≈ 1/2 for large n   and   (2n − 1)/(4n − 1) = 1/3 for n = 1
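The (2n − 1)/(4n − 1) answer can be confirmed by brute-force enumeration. The following Python sketch (added here for illustration; the function name and the chosen values of n are not from the notes) counts the qualifying (gender, attribute) combinations directly.

```python
from fractions import Fraction

def p_two_girls_given_girl_with_attribute(n, k=1):
    """Enumerate all 2*n*2*n equally likely (gender, attribute) combinations
    for two children and condition on 'at least one girl with attribute k'."""
    qualifying = 0   # outcomes with at least one girl whose attribute equals k
    two_girls = 0    # of those, outcomes where both children are girls
    for g1 in ("G", "B"):
        for a1 in range(1, n + 1):
            for g2 in ("G", "B"):
                for a2 in range(1, n + 1):
                    if (g1 == "G" and a1 == k) or (g2 == "G" and a2 == k):
                        qualifying += 1
                        if g1 == "G" and g2 == "G":
                            two_girls += 1
    return Fraction(two_girls, qualifying)

for n in (1, 7, 365):
    print(n, p_two_girls_given_girl_with_attribute(n),
          Fraction(2 * n - 1, 4 * n - 1))  # the two columns agree
```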
The product formula: A reformulation of the definition of P (E|F ): P (EF ) = P (E|F )P (F )
Useful in computing P (EF ) when P (E|F ) is more transparent.
The Law of Total Probability: Suppose the sample space S can be represented as the disjoint
union of M events Fi, i = 1, . . . , M, with Fi Fj = ∅ for i ≠ j. Then for any event E we have (given
as (3.4) by Ross, but pulled forward to clean up the next example):
P(E) = P(ES) = P(E(∪_{i=1}^M Fi)) = P(∪_{i=1}^M EFi) = Σ_{i=1}^M P(EFi) = Σ_{i=1}^M P(E|Fi)P(Fi)
Special Case:
E = EF ∪EF c mutually exclusive =⇒ P (E) = P (EF )+P (EF c ) = P (E|F )P (F )+P (E|F c )P (F c )
with the hope that P (F ), P (E|F ) and P (E|F c ) are easier to ascertain than P (E).
Example 4 (Bridge): In the card game bridge 52 cards are dealt with equal chance for all sets
of hands of 13 each to East, West, North and South. Given that North and South have a total of
8 spades, what is the chance that East has 3 of the remaining 5 spades?
Let Fi be the event that specifies the ith distinct deal to North and South, where they have a
total of 8 spades, and where East and West thus have the other 5 spades, but the deal to East and
West is not specified otherwise. How many such disjoint events Fi there are is not important;
say there are M.¹ However, it is easy to realize that each Fi contains the same number C(26,13) of deals
to East and West, i.e., P(F1) = . . . = P(FM) = P(F)/M, where F is the event that North and
South have 8 spades among themselves.

¹ M = C(13,8) · C(39,18) · C(26,13): 8 spades, 18 non-spades, to make two hands of 13 for N and S.
Conditionally, given F , we can view F1 ∪ . . . ∪ FM = F as our reduced sample space with all
deals in it being equally likely. Let E be the event that East gets exactly 3 spades. Then
P(E|Fi) = C(5,3) · C(21,10) / C(26,13) = 0.339
Note that which 5 spades and which 21 non-spades are involved in making the two hands for
East and West depends on Fi , the hands specified for North and South. However, the probability
P (E|Fi ) is always the same. Using the same idea as presented in the law of total probability we
have (see diagram)
P(E|F) = P(EF)/P(F) = P(E(F1 ∪ . . . ∪ FM))/P(F) = Σ_{i=1}^M P(EFi)/P(F) = Σ_{i=1}^M P(E|Fi)P(Fi)/P(F) = P(E|Fi) = .339
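As a sanity check, the hypergeometric count above is easy to evaluate; the short Python sketch below (an illustration, not part of the notes) uses math.comb for the binomial coefficients and also evaluates the footnote expression for M.

```python
from math import comb

# P(E | Fi): East's 13 cards come from the 26 cards (5 spades, 21 non-spades)
# left for East and West; East gets exactly 3 of the 5 spades.
p = comb(5, 3) * comb(21, 10) / comb(26, 13)
print(round(p, 3))  # 0.339

# Size M of the conditioning event broken into the Fi (see footnote above).
M = comb(13, 8) * comb(39, 18) * comb(26, 13)
print(M)
```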
Example 5 (Urn): An urn contains k blue balls and n − k red ones. If we draw the balls out one
by one in random order (all n! orders equally likely), what is the chance that the first ball is blue?
Distinguish the balls by labels 1, . . . , k, k + 1, . . . , n, with first k corresponding to blue. Then
there are k ways to make the first choice so that it is blue, and then (n − 1)! ways to make the
remaining choices. On the other hand there are n! ways to make all n choices, without restrictions.
Thus the desired probability is k(n − 1)!/n! = k/n, intuitively quite evident when we just focus
on the first choice. But the same argument works and gives the same answer when we ask: what
is the chance that the last ball (or the ball in any given position) is blue?
Now suppose the urn contains b blue balls and r red balls. You randomly draw n balls out, one
by one. Given that there are k blue balls among the n drawn, what is the chance that the
first one is blue? It is exactly the same as in the previous setting since the condition reduces the
sample space to equally likely sequences of n balls with k blue ones and n − k red ones, i.e., it is
as though we draw n from an urn with k blue and n − k red balls.
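A small simulation sketch of this claim, with illustrative values of b, r, n and k chosen here (they are not from the notes): conditioning on exactly k blue balls among the n drawn, the relative frequency of "first ball blue" should settle near k/n.

```python
import random

random.seed(2)
b, r, n, k = 6, 8, 5, 2            # illustrative numbers, not from the notes
balls = ["blue"] * b + ["red"] * r

hits = given = 0
for _ in range(200000):
    draw = random.sample(balls, n)  # ordered draw of n balls without replacement
    if draw.count("blue") == k:     # condition: exactly k blue among the n drawn
        given += 1
        if draw[0] == "blue":
            hits += 1

print(hits / given, k / n)          # the two numbers should be close (k/n = 0.4)
```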
Example 6 (An Ace in Each Bridge Hand): When we deal out 4 hands of 13 cards each,
what is the chance that each hand has an ace?
Define the following four events:
E1 = {the ace of spades in any one of the hands}
E2 = {the ace of spades and the ace of heart are in different hands}
E3 = {the aces of spades, hearts and diamonds are in different hands}
E4 = {all four aces are in different hands},
then the desired probability is
P(E1E2E3E4) = P(E1)P(E2|E1)P(E3|E1E2)P(E4|E1E2E3) = (52/52) · (39/51) · (26/50) · (13/49) ≈ .1055
The fractions become clear by viewing the deal as though the aces are dealt one by one as the first
4 cards to the 52 positions (positions 1, . . . , 13 making the first hand, positions 14, . . . , 26 making
the second hand, etc.). With that view 39/51 is the chance that the ace of hearts goes to one of
the 39 positions out of the 51 open positions, where 39 counts the positions in the other hands,
different from the hand where the ace of spades wound up. And so on. This reasoning assumes
(correctly) that this way of dealing out cards makes all such hands equally likely. In a normal deal,
clockwise one card to each player repeatedly from a shuffled deck, we could track which of the 52
slots the 4 aces got dealt into, then which slots the 4 kings got dealt to, and so on. This kind of
tracking should make clear that the same kind of hands are possible either way, all equally likely.
In both cases the shuffled deck determines the random sequence of dealing, with the same result.
Another path involves just simple counting, without conditional probabilities. We can deal the
cards in 52! orders. Assume that the first 13 cards go to player 1, the second 13 to player 2, etc.
The ace of spades can land in any of the 52 deal positions, the ace of hearts has 39 positions left
so that it lands in any other hand, etc. After the 4 ace positions in different hands have been
determined, there are 48! ways to deal out the other cards. Thus
P(E) = (52 · 39 · 26 · 13 · 48!)/52! = (52 · 39 · 26 · 13)/(52 · 51 · 50 · 49)
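Both routes to the answer are easy to check numerically; the sketch below (illustrative, not from the notes) evaluates the product and also estimates the probability by shuffling a deck in which only the four aces are marked.

```python
import random

# Exact value from the product formula above
exact = (52 * 39 * 26 * 13) / (52 * 51 * 50 * 49)

# Simulation: shuffle and split into 4 hands of 13, check one ace per hand
random.seed(3)
deck = ["ace"] * 4 + ["other"] * 48
hits = 0
trials = 200000
for _ in range(trials):
    random.shuffle(deck)
    hands = [deck[13 * j: 13 * (j + 1)] for j in range(4)]
    if all(hand.count("ace") == 1 for hand in hands):
        hits += 1

print(exact, hits / trials)  # both should be near 0.1055
```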
Example 7 (Tree Diagram: Rolling 5 Dice to Get Different Faces): Five dice are rolled at
most twice in order to achieve 5 distinct faces. On the second roll, dice with duplicate faces are
rolled again. What is the chance of the event D of finishing with 5 distinct faces?
First we work out the probabilities for the events E1 , . . . , E5 , where Ei means that we get
exactly i distinct faces on the first roll. We have (recall the birthday problem)
p5 = P(E5) = (6 · 5 · 4 · 3 · 2)/6^5 = 720/6^5 ,    p4 = P(E4) = C(5,2) · 6 · 5 · 4 · 3/6^5 = 3600/6^5

p3 = P(E3) = [C(5,3) · 6 · 5 · 4 + C(5,1) · 6 · 3 · 5 · 4]/6^5 = 3000/6^5

p2 = P(E2) = 450/6^5 ,    p1 = P(E1) = 6/6^5

Note that 720 + 3600 + 3000 + 450 + 6 = 7776 = 6^5, confirming a proper count in the disjoint sets.
We only comment on P(E3), which can come about as triple, single, single, e.g., (4, 1, 3, 4, 4), and
as two doubles and a single, e.g., (2, 4, 6, 6, 4). While the numerator count for the former should
be clear, the count for the latter is obtained by choosing the position for the singleton and filling it
in 6 possible ways, i.e., C(5,1) · 6, and then taking the left most free position as the left most position
in the left most pair and combining that with the 3 positions for the right most position in that
pair. The remaining slots define the other pair. These pairs are then filled with 5 and 4 respective
choices.
The probability can then be obtained by following all branches in the tree diagram below,
leading to the event of interest, i.e., all 5 faces are distinct, and multiplying the probabilities
along each such branch and adding up all these products. The probability at each branch segment
represents the conditional probability of traversing this segment, conditional on having arrived at
the segment from the root node. Again, an application of the law of total probability.
P (D) = P (E5 )P (D|E5 ) + P (E4 )P (D|E4 ) + P (E3 )P (D|E3 ) + P (E2 )P (D|E2 ) + P (E1 )P (D|E1 )
= 720/6^5 + (3600/6^5) · (2/6) + (3000/6^5) · (6/6^2) + (450/6^5) · (24/6^3) + (6/6^5) · (120/6^4) = 88940/6^7 ≈ .3177
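The tree-diagram computation can be checked by simulating the two-stage roll directly. The sketch below (an illustration, not part of the notes) keeps one die per distinct face after the first roll, rerolls the duplicates once, and compares the relative frequency with 88940/6^7.

```python
import random

random.seed(4)
trials = 300000
success = 0
for _ in range(trials):
    first = [random.randint(1, 6) for _ in range(5)]
    kept = list(set(first))                        # keep one die of each distinct face
    reroll = [random.randint(1, 6) for _ in range(5 - len(kept))]
    if len(set(kept + reroll)) == 5:               # all five faces distinct after the reroll
        success += 1

print(success / trials, 88940 / 6**7)  # simulation vs. exact value 0.3177...
```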
Bayes’ Formula
Example 8 (Insurance): In a population 10% of the people are accident prone and the rest are not.
An accident prone person has a 20% chance of having an accident in a given
year whereas for a normal person that chance is 10%. What is the chance that a randomly chosen
person will have an accident during the next year? If F is the event that the chosen person is
accident prone and A is the event that the chosen person will have an accident next year, then
P (A) = P (A|F )P (F ) + P (A|F c )P (F c ) = .2 · .1 + .1 · .9 = .11
If the chosen person had an accident within that year, what is the chance that the person is accident
prone?
P(F|A) = P(AF)/P(A) = P(A|F)P(F)/P(A) = P(A|F)P(F)/[P(A|F)P(F) + P(A|F^c)P(F^c)] = (.2 · .1)/.11 = 2/11
i.e., the chance has almost doubled, from 1/10 to 2/11. This is an instance of Bayes’ formula.
Example 9 (Multiple Choice Tests): What is the chance of the event K that the student
knew the answer if the student answered the question correctly (event C). Assume that there are
m choices and the a priori chance of the student knowing the answer is p. When the student does
not know the answer, it is chosen randomly.
P(K|C) = P(KC)/P(C) = P(C|K)P(K)/[P(C|K)P(K) + P(C|K^c)P(K^c)] = (1 · p)/(1 · p + (1/m)(1 − p)) = m·p/(1 − p + m·p)
Note: P(K|C) increases to 1 as m → ∞.
With p = .5 and m = 4 we get P (K|C) = .8.
Example 10 (Blood Test for Disease): A test is 95% effective on persons with the disease
and has a 1% false alarm rate. Suppose that the prevalence of the disease in the population is
.5%. What is the chance that the person actually has the disease (event D), given that the test is
positive (event E)?
P(D|E) = P(DE)/P(E) = P(E|D)P(D)/[P(E|D)P(D) + P(E|D^c)P(D^c)]   (Bayes' formula)
       = (.95 · .005)/(.95 · .005 + .01 · .995) = 95/294 ≈ .323

With P(E|D) = .99:   P(D|E) = (.99 · .005)/(.99 · .005 + .01 · .995) ≈ .3322
With P(E|D) = 1:     P(D|E) = (1 · .005)/(1 · .005 + .01 · .995) ≈ .33445
The following long run type argument makes the surprising answer more transparent. Out of
1000 people, roughly 995 will have no disease, and about 10 of them will give a false positive E.
About 5 will have the disease and essentially all of them will give a true positive. 5/(10 + 5) ≈ .333.
Such illustrations can counter the possible psychological damage arising from routine tests.
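A small computational companion to this example (not part of the notes): the helper function below evaluates the two-event Bayes formula for the three sensitivities used above and also reproduces the rough 1000-person frequency argument.

```python
def posterior(sens, false_pos, prevalence):
    """P(D | positive test) for a two-event partition (disease / no disease)."""
    num = sens * prevalence
    return num / (num + false_pos * (1 - prevalence))

for sens in (0.95, 0.99, 1.00):
    print(sens, round(posterior(sens, 0.01, 0.005), 4))

# Frequency version: out of 1000 people, about 5 diseased (essentially all test
# positive) and about 0.01 * 995 ≈ 10 false positives.
print(5 / (5 + 0.01 * 995))
```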
General Bayes Formula:
Let F1, F2, F3, . . . , Fn be mutually exclusive events whose union is S. Then

P(E) = Σ_{i=1}^n P(EFi) = Σ_{i=1}^n P(E|Fi)P(Fi)   (law of total probability)

and hence

P(Fj|E) = P(Fj E)/P(E) = P(E|Fj)P(Fj) / Σ_{i=1}^n P(E|Fi)P(Fi)   (Bayes' formula)
So far we have seen it when S was split in two mutually exclusive events, e.g. F and F c .
Example 11 (Three Cards): One card with both sides black (BB), one card with both sides
red (RR) and one card with a red side and a black side (RB). Cards are mixed and one
randomly selected card is randomly flipped and placed on the ground. You don’t see the flip. If
the color facing up is red (Ru ) what is the chance that it is RR?
P(RR|Ru) = P(Ru|RR)P(RR) / [P(Ru|RR)P(RR) + P(Ru|BB)P(BB) + P(Ru|RB)P(RB)] = (1 · 1/3)/(1 · 1/3 + 0 · 1/3 + (1/2) · (1/3)) = 2/3
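A simulation sketch of the three-cards experiment (illustrative, not from the notes): pick a card at random, flip it at random, and condition on a red face showing.

```python
import random

random.seed(5)
cards = [("B", "B"), ("R", "R"), ("R", "B")]   # (side 1, side 2) of the three cards

red_up = rr_and_red_up = 0
for _ in range(300000):
    card = random.choice(cards)
    up, down = random.sample(card, 2)          # random flip: pick which side faces up
    if up == "R":
        red_up += 1
        if card == ("R", "R"):
            rr_and_red_up += 1

print(rr_and_red_up / red_up)  # should be close to 2/3
```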
Independence
Definition of Independence:
Two events E and F are called independent if P (EF ) = P (E)P (F )
otherwise they are called dependent.
Motivate through P (E|F ) = P (E).
This relationship appears one–sided, but it is symmetric, if P (EF ) > 0, i.e., P (F ) > 0, P (E) > 0.
The definition of independence does not require P (F ) > 0.
An event F with P (F ) = 0 is always independent of any other event.
Example 12 (2 Dice): The number on the first die is independent of the number on the second
die. Let F4 be the event that the first die is 4 and S6 the event that the sum is 6 and S7 the event
that the sum is 7. Is F4 independent of S6 (S7 )?
P(F4) = 1/6 ,  P(S6) = 5/36 ,  P(S7) = 1/6 ,  P(F4 S7) = 1/36 = P(F4)P(S7) ,  P(F4 S6) = 1/36 ≠ P(F4)P(S6)

Example 13 (Cards): If we draw a card at random then the event A that the card is an ace is
independent of the event C that the card is a club. However, this breaks down as soon as the
king of diamonds is missing from the deck, but not when all kings are missing.
Theorem: Independence of E, F implies independence of E, F c , of E c , F and of E c , F c .
Example 14 (Independence in 3 Events?): If E is independent of F and also independent of
G, is E then independent of F G? Not necessarily!
In a throw of two dice let: E be the event that the sum is 7, F be the event that first die is a 4
and G be the event that the second die is a 3. Then P(E) = 1/6, P(F) = 1/6, P(G) = 1/6,
P(EF) = 1/36, P(EG) = 1/36, P(FG) = 1/36, but P(EFG) = 1/36 ≠ (1/6)(1/36) = P(E)P(FG). All events
are pairwise independent but E and FG are not.
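The claim is easy to verify by enumerating all 36 outcomes. The Python sketch below (added for illustration; the helper prob is not from the notes) checks the three pairwise product relations and then compares P(EFG) with P(E)P(FG).

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely (die1, die2) pairs

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

E = lambda o: o[0] + o[1] == 7   # sum is 7
F = lambda o: o[0] == 4          # first die is 4
G = lambda o: o[1] == 3          # second die is 3

pE, pF, pG = prob(E), prob(F), prob(G)
print(prob(lambda o: E(o) and F(o)) == pE * pF)   # True: E, F independent
print(prob(lambda o: E(o) and G(o)) == pE * pG)   # True: E, G independent
print(prob(lambda o: F(o) and G(o)) == pF * pG)   # True: F, G independent
print(prob(lambda o: E(o) and F(o) and G(o)),     # 1/36 ...
      pE * prob(lambda o: F(o) and G(o)))         # ... versus 1/216: E not independent of FG
```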
Definition of Independence of 3 Events:
The 3 events E, F , and G are called independent if all the following relations hold:
P (EF G) = P (E)P (F )P (G)
P (EF ) = P (E)P (F ), P (EG) = P (E)P (G), P (F G) = P (F )P (G)
If E, F and G are independent then E is independent of any event formed from F and G, e.g., E
is then independent of F ∪ G, F c ∪ G, F G, F Gc , etc.
Definition (Extension of Independence to Many Events): E1 , E2 , . . ., En are called
independent if
P (Ei1 Ei2 . . . Eik ) = P (Ei1 )P (Ei2 ) · · · P (Eik )
for any subcollection of events Ei1 , Ei2 , . . ., Eik (1 ≤ k ≤ n) taken from E1 , E2 , . . ., En .
Subexperiments, Repeated Experiments: Often an experiment is made up of many
subexperiments (say n) such that the outcome in each subexperiment is not affected by the
outcomes in the other subexperiments. In such a case of "physical independence" we may then
reasonably assume the (probabilistic) independence of the events E1, E2, . . ., En, provided Ei is
completely described by the outcomes in the ith subexperiment.
If Si represents the sample space of the ith subexperiment and ei one of its typical outcomes, then
an outcome e of the full experiment could be described by e = (e1 , e2 , . . . , en ) and its sample
space is:
S = S1 × S2 × · · · × Sn = {(e1 , e2 , . . . , en ) : e1 ∈ S1 , e2 ∈ S2 , . . . , en ∈ Sn } .
If the sample spaces of the subexperiments are all the same and if the probability function defined
on the events of each subexperiment is the same, then the subexperiments are called trials.
Note that by prescribing the probability function for each subexperiment and assuming
independence of the events from these subexperiments, it is possible to construct a probability
function on the events described in terms of the outcomes (e1 , e2 , . . . , en ) of the overall
experiment such that it is consistent with the probability function on the subexperiments.
Assume this without proof.
Example 15 (Finite and Infinite Independent Trials): A finite number n or an infinite
sequence of independent trials is performed. For each trial we distinguish whether a certain event
E occurs or not. If E occurs we call the result a success otherwise we call the result a failure. Let
p = P (E) denote the probability of success in a single trial.
E1 be the event of at least one success in the first n trials.
Ek be the event of exactly k successes in the first n trials.
E∞ be the event that all trials are successes.
Find P (E1 ), P (Ek ) and P (E∞ ).
P(E1) = 1 − P(E1^c) = 1 − (1 − p)^n ,    P(Ek) = C(n,k) p^k (1 − p)^(n−k)

P(E∞) ≤ P(En) = p^n for all n   =⇒   P(E∞) = 0 for p < 1 and P(E∞) = 1 for p = 1
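For concreteness, a minimal sketch (with illustrative n and p, not taken from the notes) that evaluates these formulas; the final line checks that the P(Ek) sum to one.

```python
from math import comb

def p_at_least_one(n, p):
    # P(E1) = 1 - (1 - p)^n
    return 1 - (1 - p) ** n

def p_exactly_k(n, k, p):
    # P(Ek) = C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3                       # illustrative values, not from the notes
print(p_at_least_one(n, p))
print(sum(p_exactly_k(n, k, p) for k in range(n + 1)))  # sanity check: sums to 1
```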
Example 16 (Parallel System): A system is composed of n separate components
(relays, artificial horizon in cockpit) such that the system "works" as long as at least one of the
components works. Such a system is called a parallel system. Suppose that during a given time
period the chance that component i works (functions) is pi , i = 1, 2, . . . , n and assume that the
functioning of a component does not depend on that of any of the other components, i.e. we may
assume probabilistic independence of (failure) events pertaining to separate components. Let E
denote the event that the system functions, i.e. at least one of the components functions during
the given time period. Let Ei denote the event that component i functions. Then
P(E) = 1 − P(E^c) = 1 − P(E1^c E2^c · · · En^c) = 1 − Π_{i=1}^n P(Ei^c) = 1 − Π_{i=1}^n (1 − pi)
thus achieving arbitrarily high reliability (= probability of functioning) through redundancy.
This is how one can achieve, on paper, a probability of failure on the order of 10^−9, by having a
triply redundant system with components of reliability .999 each ⇒ (.001)^3 = 10^−9.
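A minimal sketch of the reliability formula (not part of the notes); the component reliabilities are illustrative.

```python
from math import prod

def parallel_reliability(p_list):
    """P(at least one of n independent components works) = 1 - prod(1 - p_i)."""
    return 1 - prod(1 - p for p in p_list)

print(parallel_reliability([0.999, 0.999, 0.999]))  # 1 - (.001)^3 = 1 - 1e-9
print(parallel_reliability([0.9, 0.8, 0.7]))        # redundancy beats each single component
```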
Example 17 (Infinite Trials, E Before F ): Suppose we perform a potentially infinite number
of independent trials and for each trial we note whether an event E or an event F or neither of
the two events occurs. It is assumed that E and F are mutually exclusive. Their respective
probabilities of occurrence in any given trial are denoted by P (E) and P (F ). What is the
probability that E occurs before F in these trials? Let G denote the event consisting of all those
trial sequences, i.e. outcomes, in which the first E occurs before the first F . Let E1 denote the
event that the first trial results in the event E, let F1 denote the event that the first trial results
in F and N1 the event that neither E nor F occurs in the first trial. Then
P (G) = P (GE1 ) + P (GF1 ) + P (GN1 )
= P (G|E1 )P (E1 ) + P (G|F1 )P (F1 ) + P (G|N1 )P (N1 )
= 1 · P (E) + 0 · P (F ) + P (G)(1 − P (E) − P (F ))
i.e. P(G) = P(E)/(P(E) + P(F)).
Discuss intuition.
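The formula can also be checked by simulating the trials until either E or F occurs; the probabilities pE and pF below are illustrative values, not from the notes.

```python
import random

random.seed(6)
pE, pF = 0.2, 0.3        # illustrative values; any pE, pF with pE + pF <= 1 work

def e_before_f():
    while True:
        u = random.random()
        if u < pE:
            return True            # E happened first
        if u < pE + pF:
            return False           # F happened first
        # otherwise neither occurred in this trial; keep going

trials = 200000
wins = sum(e_before_f() for _ in range(trials))
print(wins / trials, pE / (pE + pF))   # simulation vs. P(E)/(P(E)+P(F)) = 0.4
```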
P ( · |F ) is a Probability:
The Power of the Axiomatic Approach!
For fixed F with P (F ) > 0 the function P (E|F ) is a probability function defined for events
E ⊂ S, i.e. it satisfies the 3 axioms of probability:
1. 0 ≤ P (E|F ) ≤ 1
2. P (S|F ) = 1
3. P(∪_{i=1}^∞ Ei | F) = Σ_{i=1}^∞ P(Ei|F) for mutually exclusive events E1, E2, E3, . . ..
Consequences: All the consequences derived from the original axiom set hold as well for the
conditional probability function Q(E) = P (E|F ), e.g.
Q(E1 ∪ E2 ) = Q(E1 ) + Q(E2 ) − Q(E1 E2 ) or P (E1 ∪ E2 |F ) = P (E1 |F ) + P (E2 |F ) − P (E1 E2 |F )
Also:
Q(E|G) = Q(EG)/Q(G) = P(EG|F)/P(G|F) = [P(EGF)/P(F)] / [P(GF)/P(F)] = P(EGF)/P(GF) = P(E|GF)
and
P (E|F ) = Q(E) = Q(E|G)Q(G) + Q(E|Gc )Q(Gc ) = P (E|F G)P (G|F ) + P (E|F Gc )P (Gc |F )
Conditional Independence: Two events E1 and E2 are conditionally (given F ) independent if
Q(E1 E2 ) = P (E1 E2 |F ) = P (E1 |F )P (E2 |F ) = Q(E1 )Q(E2 ) .
Conditional independence of E1 and E2 does not imply the (unconditional) independence of E1
and E2 . Randomly pick one of two boxes | • • • ◦| and | • • ◦ | and then 2 balls with replacement
from that box.
P(•1 •2 | box i) = P(•1 | box i) P(•2 | box i)   but   P(•1 •2) ≠ P(•1)P(•2):

P(•i) = (1/2) · (3/4) + (1/2) · (2/3) = 17/24   and   P(•1 •2) = (1/2) · (3/4) · (3/4) + (1/2) · (2/3) · (2/3) = 145/288 ≠ (17/24)^2
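A short exact computation of these fractions (added for illustration; the variable names are not from the notes), using Fraction so the 17/24 and 145/288 come out exactly.

```python
from fractions import Fraction

# Box 1 holds 3 black, 1 white; box 2 holds 2 black, 1 white; each box chosen with
# probability 1/2, then two balls are drawn with replacement from the chosen box.
boxes = [Fraction(3, 4), Fraction(2, 3)]                 # P(black on one draw | box i)

p_black = sum(Fraction(1, 2) * b for b in boxes)         # P(a given draw is black)
p_both = sum(Fraction(1, 2) * b * b for b in boxes)      # P(both draws black)

print(p_black, p_both, p_black * p_black)  # 17/24, 145/288, 289/576: not equal, so dependent
```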
Conversely, unconditional independence of E1 and E2 does not imply their conditional
independence given F. Rolling two dice, let E1 = "sum of 7", E2 = "first die is a 6" and F = "at
least one 6."

P(E1E2) = P(E1)P(E2) = 1/36   while   P(E1E2|F) = 1/11 ≠ P(E1|F)P(E2|F) = (2/11) · (6/11)
However, the independence of E1 , E2 and F implies the conditional independence of E1 and E2
given F (exercise).
This notion of conditional independence of two events can easily be generalized to a
corresponding notion of conditional independence of three or more events as was done for the
unconditional case.
Example 18 (Insurance Revisited): Let F be the event that a randomly selected person is
accident-prone with P (F ) = .1, let A1 be the event that this person has an accident in the first
year and A2 be the event that this person has an accident in the second year. An accident-prone
person has chance .2 of having an accident in any given year whereas for a non-accident prone
person that chance is .1. 10% of the population is accident-prone. Finally we assume that A1 and
A2 are conditionally independent given that an accident-prone (or non-accident-prone) person
was selected, i.e. given F (or F c ). What is P (A2 |A1 ) ?
P(A2|A1) = P(A2A1)/P(A1) = [P(A2A1|F)P(F) + P(A2A1|F^c)P(F^c)] / P(A1)
         = [P(A2|F)P(A1|F)P(F) + P(A2|F^c)P(A1|F^c)P(F^c)] / P(A1)
         = (.2 · .2 · .1 + .1 · .1 · .9)/.11 = 13/110 = .1182 ≠ P(A2) = .11
It also shows that conditional independence does not necessarily imply unconditional
independence. Similarly one shows
P(A3^c | A1^c A2^c) = P(A3^c A1^c A2^c) / P(A1^c A2^c) = [P(A3^c A1^c A2^c |F)P(F) + P(A3^c A1^c A2^c |F^c)P(F^c)] / [P(A1^c A2^c |F)P(F) + P(A1^c A2^c |F^c)P(F^c)]
                    = (.8^3 · .1 + .9^3 · .9)/(.8^2 · .1 + .9^2 · .9) = .8919 .
Compare this with P (Ac3 ) = .89 and P (Ac3 |F ) = .8 and P (Ac3 |F c ) = .9.
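Both conditional probabilities are quick to reproduce numerically; the sketch below (an illustration, not from the notes) plugs the given numbers into the two total-probability ratios.

```python
pF = 0.1                       # P(accident prone)
pA_F, pA_Fc = 0.2, 0.1         # yearly accident chance given the type

# P(A2 | A1), using conditional independence given the type
num = pA_F * pA_F * pF + pA_Fc * pA_Fc * (1 - pF)
den = pA_F * pF + pA_Fc * (1 - pF)             # = P(A1) = .11
print(num / den)                               # 13/110 = 0.11818...

# P(no accident in year 3 | no accidents in years 1 and 2)
num = (1 - pA_F) ** 3 * pF + (1 - pA_Fc) ** 3 * (1 - pF)
den = (1 - pA_F) ** 2 * pF + (1 - pA_Fc) ** 2 * (1 - pF)
print(num / den)                               # 0.8919...
```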
The Gambler’s Ruin Problem
Two gamblers, A and B, with respective fortunes of i and N − i units keep flipping a coin
(probability of heads = p). Each time a head turns up A collects one unit from B, each time a
tail turns up B collects one unit from A. The game ends when one of the players goes broke, i.e.
gets ruined. What is the probability of the event E that A is the ultimate winner?
Let Pi = P (E), the subscript i emphasizing the dependence on the fortune of player A.
With H denoting the event of a head on the first toss, note that
Pi = P (E) = P (E|H)P (H) + P (E|H c )P (H c ) = pP (E|H) + (1 − p)P (E|H c ) = pPi+1 + (1 − p)Pi−1
=⇒   pPi + (1 − p)Pi = pPi+1 + (1 − p)Pi−1   or   Pi+1 − Pi = ((1 − p)/p)(Pi − Pi−1)
Using the boundary condition P0 = 0 we get
P2 − P1 = ((1 − p)/p)(P1 − P0) = ((1 − p)/p) P1
P3 − P2 = ((1 − p)/p)(P2 − P1) = ((1 − p)/p)^2 P1
. . .
Pi − Pi−1 = ((1 − p)/p)(Pi−1 − Pi−2) = ((1 − p)/p)^(i−1) P1
. . .
PN − PN−1 = ((1 − p)/p)(PN−1 − PN−2) = ((1 − p)/p)^(N−1) P1
and adding these equations we get

Pi − P1 = P1 [((1 − p)/p) + ((1 − p)/p)^2 + . . . + ((1 − p)/p)^(i−1)]
or

Pi = P1 · [1 − ((1 − p)/p)^i] / [1 − (1 − p)/p]   if (1 − p)/p ≠ 1, i.e., p ≠ 1/2

and

Pi = i P1   if (1 − p)/p = 1, i.e., p = 1/2
Using PN = 1 we get:

P1 = [1 − (1 − p)/p] / [1 − ((1 − p)/p)^N]   if p ≠ 1/2   and   P1 = 1/N   if p = 1/2

and hence

Pi = [1 − ((1 − p)/p)^i] / [1 − ((1 − p)/p)^N]   if p ≠ 1/2   and   Pi = i/N   if p = 1/2 .
If Qi = probability that player B wins ultimately, starting with N − i units, we get by symmetry:

Qi = [1 − (p/(1 − p))^(N−i)] / [1 − (p/(1 − p))^N]   if (1 − p) ≠ 1/2   and   Qi = (N − i)/N   if (1 − p) = 1/2 .
and note Pi + Qi = 1, i.e. the chance that play will go on forever is 0.
See the illustrations at the end of these notes.
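A sketch (not part of the notes) that evaluates the ruin formula and compares it with a direct simulation of the coin-flipping game; the values of i, N and p are illustrative.

```python
import random

def ruin_prob(i, N, p):
    """P(A, starting with i of the N units, eventually ruins B); formula from above."""
    if p == 0.5:
        return i / N
    q = (1 - p) / p
    return (1 - q**i) / (1 - q**N)

def simulate(i, N, p, runs=20000, seed=7):
    random.seed(seed)
    wins = 0
    for _ in range(runs):
        x = i                                   # A's current fortune
        while 0 < x < N:
            x += 1 if random.random() < p else -1
        wins += (x == N)
    return wins / runs

print(ruin_prob(5, 10, 0.5), simulate(5, 10, 0.5))
print(ruin_prob(5, 10, 0.6), simulate(5, 10, 0.6))
```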
Example 19 (Drug Testing): Two drugs are tested on pairs of patients, one drug per patient
in a pair. Drug A has cure probability PA and drug B has cure probability PB . The test for each
pair of patients constitutes a trial and the score SA of A is increased by one each time drug A
results in a cure of that patient. Similarly the score SB of B goes up each time B effects a cure.
The trials are stopped as soon as SA − SB either reaches M or −M , where M is some
predetermined number. If we eliminate all those trials which result in no change of the score
difference SA − SB , then the remaining trials are again independent and the outcome of such a
remaining trial is either that SA − SB increases or decreases by one with probability
P = PA (1 − PB )/(PA (1 − PB ) + PB (1 − PA )) or 1 − P = PB (1 − PA )/(PA (1 − PB ) + PB (1 − PA ))
respectively. This is exactly the gambler’s ruin problem, where both players start out with M
units, i.e. N = 2M .
Note: SA − SB = M iff A had M more wins than B or A wins, if both start with M betting units.
When PA ≠ PB, the probability that drug B comes out ahead ({B > A}) is:

P({B > A}) = 1 − P({A > B}) = 1 − [1 − ((1 − P)/P)^M] / [1 − ((1 − P)/P)^(2M)] = 1/(1 + γ^M)

where γ = P/(1 − P) = PA(1 − PB)/(PB(1 − PA)).
For PA = .6 and PB = .4 and M = 5 we get P ({B > A}) = .017 and for M = 10 we get
P ({B > A}) = .0003.
γ is also called the odds-ratio, of the odds PA /(1 − PA ) over the odds PB /(1 − PB ).
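The two quoted numbers follow directly from the 1/(1 + γ^M) formula; a minimal sketch (illustrative, not from the notes):

```python
def prob_B_ahead(pA, pB, M):
    """P(score difference reaches -M before +M) = 1/(1 + gamma^M)."""
    gamma = (pA * (1 - pB)) / (pB * (1 - pA))   # odds ratio
    return 1 / (1 + gamma**M)

for M in (5, 10):
    print(M, round(prob_B_ahead(0.6, 0.4, M), 4))   # 0.017 and 0.0003 as in the text
```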
[Figures (4 panels): Pi = probability that player A with capital i ruins player B with capital N − i, plotted against p = P(A wins one unit per game) for p between 0.45 and 0.60. The four panels correspond to the ratios r = i/N = 0.05, 0.25, 0.5 and 0.95, each showing curves for N = 100, 200, 500, 1000, 5000 and 10000 with i = rN.]