1 Conditional Probability

1.1 Introduction
Probability Theory is about assigning probabilities to the outcomes of random
events. As we have seen, any assignment of probabilities that satisfies the arithmetic requirements is a valid mathematical probability model. Eventually we will look at ways to judge how well a probability model corresponds to the observed results of a random experiment. Until then, when we need an example of a probability model for a familiar experiment, we will try to choose the probability model that corresponds closely to our experience with that experiment.
An example we have used, and will continue to use, is the rolling of a pair
of fair dice. There are several possible sample spaces for this experiment, but
the most general is
{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
 (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
 (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
 (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
 (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
 (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.
This sample space assumes we can tell the two dice apart, identifying one as the first and the other as the second. The ordered pairs in the sample space give the outcome of the first die and then the second. We will create a probability model based on the assumption that these sample points are equally likely. Thus we have

p((a, b)) = 1/36.
We identify several events in this sample space for reference in future examples.
First there are the events associated with the totals on the two dice. These
events can be used to create their own sample space, and it can be assigned a
probability model that corresponds to the equiprobable model on the ordered
pairs. The result is
Total k            2     3     4     5     6     7     8     9     10    11    12
p(k)              1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
p(k), simplified  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36
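
This table is easy to verify by brute force: enumerate the 36 equally likely ordered pairs and tally the totals. Here is a minimal Python sketch (the variable names are ours, not part of the notes):

    from fractions import Fraction
    from itertools import product

    # The 36 equally likely ordered pairs (first die, second die).
    sample_space = list(product(range(1, 7), repeat=2))
    p = Fraction(1, 36)          # probability of each ordered pair

    # Tally the probability of each total k = a + b.
    totals = {}
    for a, b in sample_space:
        totals[a + b] = totals.get(a + b, 0) + p

    for k in range(2, 13):
        print(k, totals[k])      # e.g. total 7 prints 1/6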
The dice can also come up doubles or not. Those probabilities are

p(Doubles) = 6/36 = 1/6,    p(Not Doubles) = 30/36 = 5/6.
We also want to consider the events that the total comes up odd or comes up even. We again compute these probabilities from the original model:

p(Even) = 18/36 = 1/2,    p(Odd) = 18/36 = 1/2.
1.2 Conditional Probability I
The notion of conditional probability is one of the trickiest in probability theory. Conditional probability can lead to results that at first seem counter-intuitive. For this reason, many probability puzzles, mathematical paradoxes,
magic tricks, and cheating schemes have their origins in conditional probability.
Also most, if not all, successful gambling strategies are based on a complete
analysis of conditional probabilities. This is especially true in card games.
Conditional probability is based on the following premise:
Summary 1 Suppose we have a probability model for a random experiment.
Suppose that one round of the experiment has been conducted, but that the full
result has not yet been revealed. Suppose that it comes to your attention that
a particular event in the sample space has definitely occurred. How does that
knowledge change your view of the probabilities of other events?
It is clear that knowing that a certain event has taken place does give you additional information, and that information should have an effect on your estimates of other probabilities. If you have two events E1 and E2 that are mutually exclusive, and you discover that E1 has definitely occurred, then that excludes the possibility that E2 occurred as well. The event E2 may have a positive probability at the beginning of the experiment, but once you know that E1 has occurred, you should reason that the probability of E2 occurring has dropped to zero.
Consider a specific example. Suppose that two fair dice are rolled, but you
do not know exactly how they came out. You are told that the result did
produce an even total. The sample space for the experiment is
{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
 (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
 (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
 (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
 (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
 (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.
The event of even total is

Even = {(1,1), (1,3), (1,5), (2,2), (2,4), (2,6),
        (3,1), (3,3), (3,5), (4,2), (4,4), (4,6),
        (5,1), (5,3), (5,5), (6,2), (6,4), (6,6)}.
The sample points that remain are still equally likely, so armed with the extra information that it was definitely one of these 18 sample points that occurred, you should adjust the probability of each to 1/18. This is the conditional probability of the sample points:

p((a, b) given Even) = 0      if a + b is odd,
p((a, b) given Even) = 1/18   if a + b is even.

We read this notation as "the probability that (a, b) occurs given that the event Even occurs."
We can also adjust the probabilities of the various totals:

Total k                  2     3     4     5     6     7     8     9     10    11    12
p(Total k given Even)   1/18  0/18  3/18  0/18  5/18  0/18  5/18  0/18  3/18  0/18  1/18
simplified              1/18  0     1/6   0     5/18  0     5/18  0     1/6   0     1/18
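
These conditional values can be checked mechanically by restricting the sample space to the 18 even-total pairs and renormalizing. A short Python sketch along those lines (again, the names are ours):

    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))
    even = [(a, b) for a, b in pairs if (a + b) % 2 == 0]   # the 18 even-total pairs

    cond = Fraction(1, len(even))   # each surviving pair now has probability 1/18

    cond_totals = {}
    for a, b in even:
        cond_totals[a + b] = cond_totals.get(a + b, 0) + cond

    for k in range(2, 13):
        print(k, cond_totals.get(k, Fraction(0)))   # odd totals print 0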
The probabilities for doubles and not doubles adjust in the same way:

p(Doubles given Even) = 6/18 = 1/3,    p(Not Doubles given Even) = 12/18 = 2/3.
Of course we also have

p(Even given Even) = 1,    p(Odd given Even) = 0.

1.3 Conditional Probability II
The official definition of conditional probability comes down to a bit of arithmetic. It depends, however, on the probability of the intersection event E1 ∩ E2.

Definition 2 Let E1 and E2 be two events in the sample space of a random experiment with a set probability model. The conditional probability of E1 given E2 is

p(E1 given E2) = p(E1 ∩ E2) / p(E2).
It is easy to remember this formula if you know that it involves E1 ∩ E2. The formula requires division, and for the quotient to be a probability, the larger number must go in the denominator. Since the intersection event E1 ∩ E2 is part of both of the events E1 and E2, its probability is the smaller one, so p(E1 ∩ E2) goes in the numerator. The denominator is the probability of the event that follows the word "given." The only hard part is to remember that p(E1 given E2) means that we are assuming that E2 has definitely occurred, and we are looking for the probability of E1.
This formula gives results that agree with the probabilities we computed earlier. For example,

p(Total 6) = 5/36,
p(Even) = 1/2,
p(Total 6 ∩ Even) = 5/36,

because coming up with Total 6 automatically means the total came up Even at the same time. Thus

p(Total 6 given Even) = p(Total 6 ∩ Even) / p(Even) = (5/36) / (1/2) = (5/36)(2/1) = 5/18.
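
A quick check of this arithmetic with exact fractions, as a Python sketch:

    from fractions import Fraction

    p_total6_and_even = Fraction(5, 36)   # Total 6 is contained in Even
    p_even = Fraction(1, 2)

    print(p_total6_and_even / p_even)     # 5/18, matching the table above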
Similarly,

p(Doubles) = 1/6,
p(Even) = 1/2,
p(Doubles ∩ Even) = 1/6.

So

p(Doubles given Even) = (1/6) / (1/2) = (1/6)(2/1) = 1/3.

But we see that

p(Even given Total 6) = (5/36) / (5/36) = (5/36)(36/5) = 1,

because if it came up with a total of 6, we know for sure the total is even.
Now notice that if E1 and E2 are mutually exclusive, then E1 ∩ E2 = ∅. Thus

p(E1 given E2) = p(E1 ∩ E2) / p(E2) = 0 / p(E2) = 0,
p(E2 given E1) = p(E1 ∩ E2) / p(E1) = 0 / p(E1) = 0.

Normally p(E1 given E2) and p(E2 given E1) are different, but if the two events are mutually exclusive, they are both zero.
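
For instance, Doubles and Total 7 are mutually exclusive, since a double always produces an even total. A small Python sketch confirming that conditioning either way gives zero (the helper prob is our own):

    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))

    def prob(event):
        # Equiprobable model: each of the 36 pairs has probability 1/36.
        return Fraction(len(event), 36)

    doubles = {(a, b) for a, b in pairs if a == b}
    total7 = {(a, b) for a, b in pairs if a + b == 7}   # disjoint from doubles

    print(prob(doubles & total7) / prob(total7))    # 0
    print(prob(doubles & total7) / prob(doubles))   # 0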
For another example, consider the game of Rock-Paper-Scissors. Assuming
the two players are trying to vary their plays, the game can be considered a
random process. A sample space for this experiment is
{(Rock, Rock), (Rock, Paper), (Rock, Scissors),
 (Paper, Rock), (Paper, Paper), (Paper, Scissors),
 (Scissors, Rock), (Scissors, Paper), (Scissors, Scissors)}.
We build an equiprobable probability model on this sample space: each sample point is assigned a probability of 1/9. There are three possible events to worry about:

First player wins  = {(Rock, Scissors), (Scissors, Paper), (Paper, Rock)}
Second player wins = {(Rock, Paper), (Scissors, Rock), (Paper, Scissors)}
Tie                = {(Rock, Rock), (Scissors, Scissors), (Paper, Paper)}
Then the probabilities under the model are

p(First player wins) = 1/3,
p(Second player wins) = 1/3,
p(Tie) = 1/3.

So according to this model, when we play this game, the first player wins 1/3 of the time, the second player wins 1/3 of the time, and the game ends in a tie 1/3 of the time.
Now suppose we are only interested in who wins the game. That means that we would like to assume that the game reaches a conclusion. Only the event Tie does not determine a winner, so we are interested in the outcome of the game under the condition that it was not a tie. The event that the game did not end in a tie is the complementary event Tie^c. We know

p(Tie^c) = 1 - p(Tie) = 1 - 1/3 = 2/3.
Now if the first player wins, the game did not end in a tie, and similarly for the second player. So

p(Tie^c ∩ (First player wins)) = p(First player wins) = 1/3,
p(Tie^c ∩ (Second player wins)) = p(Second player wins) = 1/3.

Thus

p((First player wins) given Tie^c) = (1/3) / (2/3) = 1/2,
p((Second player wins) given Tie^c) = (1/3) / (2/3) = 1/2.
This is the third distinct way we have analyzed this game and come to the
conclusion that both players have an equal chance to win.
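
The same conclusion falls out of a direct enumeration of the nine equally likely plays. A minimal Python sketch (the encoding of the moves is our own choice):

    from fractions import Fraction

    moves = ["Rock", "Paper", "Scissors"]
    beats = {"Rock": "Scissors", "Paper": "Rock", "Scissors": "Paper"}

    plays = [(a, b) for a in moves for b in moves]   # 9 equally likely plays
    p = Fraction(1, len(plays))

    p_first = sum(p for a, b in plays if beats[a] == b)   # first player wins
    p_tie = sum(p for a, b in plays if a == b)

    print(p_first / (1 - p_tie))   # 1/2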
1.4 Independence
Consider another example of conditional probability involving two fair dice. In this example, we will consider events where F1 is the event that the first die comes up 1, F2 is the event that the first die comes up 2, and so on through F3, F4, F5, and F6. Similarly, S1 is the event that the second die comes up 1, and this goes on through S2, S3, S4, S5, and S6. For example,

F1 = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)},
S3 = {(1,3), (2,3), (3,3), (4,3), (5,3), (6,3)}.
Now the Fk events are mutually exclusive in pairs, and the Sk events are mutually exclusive in pairs as well. However, each Fk event has sample points in common with each Sj event. In fact,

Fk ∩ Sj = {(k, j)}.
We can easily compute probabilities for all these events:

p(Fk) = 6/36 = 1/6       for all k = 1, 2, 3, 4, 5, 6;
p(Sj) = 6/36 = 1/6       for all j = 1, 2, 3, 4, 5, 6;
p(Fk ∩ Sj) = 1/36        for all possible k and j.
From this, we can compute all possible conditional probabilities. First,

p(Fk given Fj) = 0   if k ≠ j,
p(Sk given Sj) = 0   if k ≠ j.

This is because the Fk events are mutually exclusive in pairs, as are the Sk events. Of course, if k = j,

p(Fk given Fk) = 1,
p(Sk given Sk) = 1.
The interesting conditional probabilities are the mixed pairs. We see

p(Fk given Sj) = p(Fk ∩ Sj) / p(Sj) = (1/36) / (1/6) = (1/36)(6/1) = 1/6.

In the other direction,

p(Sk given Fj) = p(Fj ∩ Sk) / p(Fj) = (1/36) / (1/6) = (1/36)(6/1) = 1/6.
Now normally, when we change the order in a conditional probability, the computed probability changes. However, the Fk and Sk events are special in this regard. But they are even more special than that. We also notice that

p(Fk given Sj) = 1/6 = p(Fk),
p(Sk given Fj) = 1/6 = p(Sk).
What this means is that knowing the outcome of the second die in a toss does not change the probability that any particular number occurs on the first die. Similarly, knowing the outcome of the first die in a toss does not change the probability that any particular number occurs on the second die. In particular, the probability that the second die came up 2 is 1/6. If you know for sure that the first die came up with a 5, the probability that the second die came up 2 (given that the first came up 5) is still 1/6. Knowing that the first die came up 5 gives you no additional information about whether the second die came up 2 or not. If the result of the first die gave even the slightest hint about what the second die did, then the probability would have changed.
When we have two events D and E in a sample space with a probability model, we say they are independent events if

p(D given E) = p(D)   and   p(E given D) = p(E).

By our definition of conditional probability, we see that the first condition is the same as saying

p(D) = p(D given E) = p(D ∩ E) / p(E),

and in turn

p(E)p(D) = p(D ∩ E).
The second condition leads to the same place:

p(E) = p(E given D) = p(D ∩ E) / p(D),

and again

p(E)p(D) = p(D ∩ E).
Thus we are led to the following official definition:

Definition 3 When we have two events D and E in a sample space with a probability model, we say they are independent events if

p(D ∩ E) = p(D)p(E).
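
Definition 3 is easy to test mechanically. The sketch below (the helper functions are our own) confirms it for the events F5 and S2 from the dice example, and shows that it fails for the mutually exclusive pair F5 and F6:

    from fractions import Fraction
    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))

    def prob(event):
        return Fraction(len(event), 36)

    def independent(d, e):
        # Definition 3: p(D intersect E) = p(D) p(E).
        return prob(d & e) == prob(d) * prob(e)

    F5 = {(a, b) for a, b in pairs if a == 5}   # first die shows 5
    F6 = {(a, b) for a, b in pairs if a == 6}   # first die shows 6
    S2 = {(a, b) for a, b in pairs if b == 2}   # second die shows 2

    print(independent(F5, S2))   # True: the two dice are independent
    print(independent(F5, F6))   # False: mutually exclusive, not independent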
In our example, we used the equiprobable probability model based on the ordered pair sample space to find that the results of the first die and the results of the second die are mathematically independent. That would mean that, no matter how much two dice bump into each other as they are thrown, when they come to a stop, the end result on one die has no influence on the result on the other. When we throw two dice, the dice do act independently of each other, and neither has any substantial influence on the other in terms of how they come to rest.
If this seems intuitive to you, then that supports the idea that the probability model for dice throwing will correspond to the results we might observe. This supports our choice of the equiprobable probability on the sample space of ordered pairs.
In fact, we could go the other way. We could start with the assumption that on the toss of one die, the outcomes in the sample space {1, 2, 3, 4, 5, 6} are equally likely. So on the toss of one fair die, we would use the probability model

k      1    2    3    4    5    6
p(k)  1/6  1/6  1/6  1/6  1/6  1/6
If we also assume that the tosses of the two dice are independent, and thus all events involving the two different dice are mathematically independent, then we can compute

p((k, j)) = p(Fk ∩ Sj) = p(Fk) p(Sj) = (1/6)(1/6) = 1/36.

This is the same probability we get by assuming the ordered pair sample space is equilikely. So we find that assuming that the six possibilities on the toss of one die are equilikely, and that, in a toss of two dice, the results on each die are independent, is equivalent to assuming that the ordered pair sample space of the two-dice experiment is equilikely. Certainly if two reasonable-sounding assumptions lead to the same probability model, we are on to something.
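
Going the other way can be checked the same way: start from the fair single-die model, multiply the marginals, and confirm that the product model is exactly the equilikely model on ordered pairs. A sketch:

    from fractions import Fraction
    from itertools import product

    one_die = {k: Fraction(1, 6) for k in range(1, 7)}   # fair single-die model

    # Independence: each ordered pair gets the product of the two marginals.
    pair_model = {(k, j): one_die[k] * one_die[j]
                  for k, j in product(range(1, 7), repeat=2)}

    print(all(v == Fraction(1, 36) for v in pair_model.values()))   # True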
Now we have two terms that describe the interactions between events in a random experiment: mutually exclusive and independent. They are totally different, but nonetheless easy to confuse. Two events are mutually exclusive if they cannot both occur at the same time. Before you describe two events as mutually exclusive, you should check and make sure that they cannot both occur. If two events are mutually exclusive, then knowing that one occurred will definitely tell you something about the probability of the other event. Once you know one of a pair of mutually exclusive events has occurred, you know for absolute certain that the other event has not occurred! Mutually exclusive events cannot possibly be independent. If two events are independent, then knowing one occurred cannot give you even the slightest hint about the other. Independent events cannot be mutually exclusive. If two events are independent, then it must be possible for them both to occur at once. Before you describe two events as independent, you should check and make sure that they can both occur at once in one experiment.
Recognizing when two events are either mutually exclusive or independent is important because both of these are very special situations. When we know one of these is the case, we have useful information about the probability of them both occurring and the probability that either one of them occurs.
Conclusion 4 If E and D are mutually exclusive events in a sample space with a probability model, then p(E ∩ D) = 0.

Conclusion 5 If E and D are independent events in a sample space with a probability model, then p(E ∩ D) = p(D)p(E).

Conclusion 6 If E and D are mutually exclusive events in a sample space with a probability model, then p(E ∪ D) = p(D) + p(E).

Conclusion 7 If E and D are independent events in a sample space with a probability model, then p(E ∪ D) = p(D) + p(E) - p(D)p(E).
1.5 Caution
We started this discussion saying that the notion of conditional probability is
one of the trickiest in probability theory. Conditional probability can lead to
results that at first seem counter-intuitive. For this reason, many probability
puzzles, mathematical paradoxes, magic tricks, and cheating schemes have their
origins in conditional probability. Also most, if not all, successful gambling
strategies are based on a complete analysis of conditional probabilities. This
is especially true in card games. Once you see what conditional probability is
about, you can see how it applies to games and strategies.
Suppose you are playing a card game, and the final outcome that will determine whether you win or your opponent does comes down to one card. From the exposed cards, it is clear to all that if the final card is a spade, you will win, but if it is not, you will lose. Further, everyone watching can see that there are 34 cards left in the deck, and those paying close attention would have counted and know that, of the 34 unaccounted-for cards, exactly 8 are spades, 4 are clubs, 11 are diamonds, and 11 are hearts.
We can create a sample space for this random experiment:

{S1, S2, ..., S8, C1, C2, C3, C4, D1, ..., D11, H1, ..., H11}.

Here the S's, C's, D's, and H's are the unaccounted-for cards in the various suits. The event that you will win is

E = {S1, S2, ..., S8}.
It seems reasonable to assume that the sample space is equilikely. That gives a probability model where every card has a probability of 1/34. Thus the probability that you will win is 8/34 = 4/17. This does not look good. However, you are a very attentive card player and the dealer is very sloppy. You have noticed that the next card to be dealt is black. You know for sure that the event

D = {S1, S2, ..., S8, C1, C2, C3, C4}

has occurred.
Now

D ∩ E = {S1, S2, ..., S8},

and so we compute

p(D) = 12/34 = 6/17,
p(D ∩ E) = 8/34 = 4/17.
So you know that your chances of winning are actually

p(E given D) = p(D ∩ E) / p(D) = (4/17) / (6/17) = (4/17)(17/6) = 2/3.

This looks much better. Of course, using this extra information would be highly unethical, but at the same time rather lucrative.
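
The same computation with exact fractions, as a short Python sketch (the variable names are our own):

    from fractions import Fraction

    p_black = Fraction(12, 34)            # 8 spades + 4 clubs among the 34 cards
    p_spade_and_black = Fraction(8, 34)   # every spade is black

    print(p_spade_and_black)              # 4/17, the unconditional chance of winning
    print(p_spade_and_black / p_black)    # 2/3, the chance given a black card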
The aspect of conditional probability that makes it so tricky is that it requires very complete knowledge of exactly what event is known to have occurred. Many puzzles, tricks, and other schemes are based on obscuring the full content of the conditioning event. Consider the following example.
Suppose a friend takes a dime and a penny and flips them both in such a way that you cannot see the results. She then informs you that one of the coins has come up heads. What is the probability that the other is also heads? Well, flipping two coins at once is the same as flipping two coins independently. The results of one coin should not affect the results of the other. Thus the probability that the other coin is heads should be 1/2.
But not so fast. If we set up a sample space for this experiment, we would probably pick

{(H, H), (H, T), (T, H), (T, T)},

where the penny is first and the dime is second. We keep track of the ordered pairs because we saw from our dice example that this should lead to an equilikely sample space. Under this assumption, we assign a probability of 1/4 to each pair. Now we are told that one of the coins is heads. That is the event

E = {(H, H), (H, T), (T, H)}.

We are asked about the case where they are both heads, that is,

D = {(H, H)}.

We notice that

D ∩ E = {(H, H)}.
Computing probabilities, we find

p(E) = 3/4,
p(D ∩ E) = 1/4,

and so

p(D given E) = p(D ∩ E) / p(E) = (1/4) / (3/4) = (1/4)(4/3) = 1/3.

This is much less than 1/2.
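
A short Monte Carlo run makes the 1/3 concrete: among flips where at least one coin shows heads, both are heads only about a third of the time. (A sketch; the sampling loop is our own.)

    import random

    trials = 100_000
    both = at_least_one = 0

    for _ in range(trials):
        penny, dime = random.choice("HT"), random.choice("HT")
        if penny == "H" or dime == "H":
            at_least_one += 1
            if penny == "H" and dime == "H":
                both += 1

    print(both / at_least_one)   # roughly 0.33, not 0.5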
What happened? In our first calculation we fell for our friend's, possibly deliberate, obscuring of the event she told us took place. Our friend said, "One of the coins has come up heads." Notice she did not say, "The penny came up heads" or "The dime came up heads." In our calculation we reasoned that flipping two coins at once is the same as flipping two coins independently. This is correct; however, when we flip two coins independently, we do know which is which. Our sample space {(H, H), (H, T), (T, H), (T, T)} assumes we know which coin is which. We know from experience that that should lead to an equilikely sample space. The information that one of the coins has come up heads is actually less than it seems. Because it does not say whether it was the penny or the dime that came up heads, it eliminates only one of the four equally likely possibilities.
To see how this actually works in practice, suppose you are playing the game repeatedly and keeping score. Your friend agrees to play the exact same game as long as she can. First your friend says, "One of the coins has come up heads," and you make your guess on the other. Next your friend says, "One of the coins has come up heads," and you make your guess on the other. Again, your friend says, "One of the coins has come up heads," and you make your guess on the other. Then your friend says, "One of the coins has come up tails." You have to ask yourself, "Why did she change what she told me?" Assuming your friend is keeping her promise, there is only one reason she has changed her information: she had to change it to be truthful. Your informed guess is that the other coin is also tails, which is right. This means that there is some small hint about the "other" coin in the information "one of the coins has come up heads." This phrasing obscures the full content of the event being described in the game, and that leads the careless to draw the wrong conclusion.
So does that mean that your friend saying "The penny came up heads" changes the game? Yes, if the game always requires that the result of the penny be disclosed. However, if your "friend" is trying to take advantage of you, she will change this particular rule every time you play the game. She will sometimes tell you what the penny did, sometimes the dime. She will sometimes tell you a coin came up heads, other times tails, and she will not wait until she is forced to before she makes the switch. If the rules of the game do not allow this, then the 1/2 and 1/2 probability model works. But if the rules do allow these changes, or your friend chooses to ignore the rules, then the correct probability model is 1/3 and 2/3. Since the rules of the game involve someone's choice, that human intervention has reduced some of the assumed randomness of the play. Your probability analysis and your participation in the game need to take this into account.
Conditional probability is, in practice, a very tricky business. But the mathematics does have one very practical application: any extra condition, when known, can have a large impact on the probabilities of other events. Poker players certainly know this. Good ones are always alert to the slightest hint about what another player knows, and they are as careful as possible not to accidentally give any of their own information away. Even the most skilled gambler can lose a lot of money very quickly by developing an inadvertent "tell" that gives away the strength of their hand. Top players in a losing streak are known to film themselves playing and watch the play over and over, trying to spot any behavior that is giving their thoughts away.
Prepared by: Daniel Madden and Alyssa Keri: May 2009