1 Conditional Probability

1.1 Introduction

Probability theory is about assigning probabilities to the outcomes of random events. As we have seen, any assignment of probabilities that satisfies the arithmetic requirements is a valid mathematical probability model. Eventually we will look at ways to judge how well a probability model corresponds to the observed results of a random experiment. Until then, when we need an example of a probability model for a familiar experiment, we will try to choose the probability model that corresponds closely to our experience with that experiment.

An example we have used, and will continue to use, is the rolling of a pair of fair dice. There are several possible sample spaces for this experiment, but the most general is

{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
 (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
 (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
 (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
 (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
 (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.

This sample space assumes we can tell the two dice apart, identifying one as the first and the other as the second. The ordered pairs in the sample space give the outcome of the first die and then the second die. We will create a probability model based on the assumption that these sample points are equally likely. Thus we have

p((a, b)) = 1/36.

We identify several events in this sample space for reference in future examples. First there are the events associated with the totals on the two dice. These events can be used to create their own sample space, and it can be assigned a probability model that corresponds to the equiprobable model on the ordered pairs. The result is

Total k  |  2     3     4     5     6     7     8     9     10    11    12
p(k)     |  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
reduced  |  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36

The dice can also come up doubles or not.
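All of these numbers come from simple counting over the 36 ordered pairs, so they are easy to check by enumeration. Here is a brief sketch in Python (our own illustrative choice; the notes themselves contain no code) that tallies the totals, the doubles, and the even totals with exact fractions:

```python
from fractions import Fraction

# The 36 equally likely ordered pairs (first die, second die).
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]
p_pair = Fraction(1, 36)  # probability of each ordered pair

# Probability of each total 2..12, by counting the pairs that produce it.
p_total = {k: sum(1 for a, b in outcomes if a + b == k) * p_pair
           for k in range(2, 13)}

# Doubles, and even totals, computed the same way.
p_doubles = sum(1 for a, b in outcomes if a == b) * p_pair
p_even = sum(1 for a, b in outcomes if (a + b) % 2 == 0) * p_pair

print(p_total[7], p_doubles, p_even)   # 1/6 1/6 1/2
```

Because the fractions are exact, the eleven total probabilities sum to exactly 1, as any valid model must.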
Those probabilities are

         Doubles  Not Doubles
p        6/36     30/36
reduced  1/6      5/6

We also want to consider the events that the total comes up odd or comes up even. We again compute these probabilities from the original model:

         Even   Odd
p        18/36  18/36
reduced  1/2    1/2

1.2 Conditional Probability I

The notion of conditional probability is one of the trickiest in probability theory. Conditional probability can lead to results that at first seem counterintuitive. For this reason, many probability puzzles, mathematical paradoxes, magic tricks, and cheating schemes have their origins in conditional probability. Also most, if not all, successful gambling strategies are based on a complete analysis of conditional probabilities. This is especially true in card games. Conditional probability is based on the following premise:

Summary 1 Suppose we have a probability model for a random experiment. Suppose that one round of the experiment has been conducted, but that the full result has not yet been revealed. Suppose that it comes to your attention that a particular event in the sample space has definitely occurred. How does that knowledge change your view of the probabilities of other events?

It is clear that knowing that a certain event has taken place does give you additional information, and that information should have an effect on your estimates of other probabilities. If you have two events E1 and E2 that are mutually exclusive, and you discover that E1 has definitely occurred, then that excludes the possibility that E2 occurred as well. The event E2 may have a positive probability at the beginning of the experiment, but once you know that E1 has occurred, you should reason that the probability of E2 occurring has dropped to zero.

Consider a specific example. Suppose that two fair dice are rolled, but you do not know exactly how they came out. You are told that the result did produce an even total.
The sample space for the experiment is

{(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
 (2,1), (2,2), (2,3), (2,4), (2,5), (2,6),
 (3,1), (3,2), (3,3), (3,4), (3,5), (3,6),
 (4,1), (4,2), (4,3), (4,4), (4,5), (4,6),
 (5,1), (5,2), (5,3), (5,4), (5,5), (5,6),
 (6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}.

The event of an even total is

Even = {(1,1), (1,3), (1,5),
        (2,2), (2,4), (2,6),
        (3,1), (3,3), (3,5),
        (4,2), (4,4), (4,6),
        (5,1), (5,3), (5,5),
        (6,2), (6,4), (6,6)}.

The sample points that remain are still equally likely, so armed with the extra information that it was definitely one of these 18 sample points that occurred, you should adjust the probability of each to 1/18. This is the conditional probability of the sample points:

p((a, b) given Even) = 1/18  if a + b is even,
                       0     if a + b is odd.

We read this notation as "the probability that (a, b) occurs given that the event Even occurs." We can also adjust the probabilities of the various totals:

Total k                |  2     3  4     5  6     7  8     9  10    11  12
p(Total k given Even)  |  1/18  0  3/18  0  5/18  0  5/18  0  3/18  0   1/18
reduced                |  1/18  0  1/6   0  5/18  0  5/18  0  1/6   0   1/18

and the probabilities for doubles and not:

                Doubles  Not Doubles
p( given Even)  6/18     12/18
reduced         1/3      2/3

Of course we also have

                Even  Odd
p( given Even)  1     0

1.3 Conditional Probability II

The official definition of conditional probability depends strictly on a bit of arithmetic. It depends, in particular, on the probability of the intersection event E1 ∩ E2.

Definition 2 Let E1 and E2 be two events in the sample space of a random experiment with a set probability model. The conditional probability of E1 given E2 is

p(E1 given E2) = p(E1 ∩ E2) / p(E2).

It is easy to remember this formula if you know that it involves E1 ∩ E2. First, the formula requires division, and for the result to be a probability, the larger number must go in the denominator.
Since the intersection event E1 ∩ E2 is part of both of the events E1 and E2, its probability is the smaller one. So p(E1 ∩ E2) is in the numerator. The denominator is the probability of the event that follows the word "given." The only hard part is to remember that p(E1 given E2) means that we are assuming that E2 has definitely occurred, and we are looking for the probability of E1.

This formula gives results that agree with the probabilities we computed earlier. For example,

p(Total 6 ∩ Even) = p(Total 6) = 5/36,

because coming up with Total 6 automatically means the total came up Even at the same time. Since p(Even) = 1/2,

p(Total 6 given Even) = p(Total 6 ∩ Even) / p(Even) = (5/36) / (1/2) = 5/18.

Also

p(Doubles) = 1/6,
p(Even) = 1/2,
p(Doubles ∩ Even) = 1/6.

So

p(Doubles given Even) = (1/6) / (1/2) = 1/3.

But we see that

p(Even given Total 6) = (5/36) / (5/36) = 1,

because if it came up with a total of 6, we know for sure the total is even.

Now notice that if E1 and E2 are mutually exclusive, then E1 ∩ E2 = ∅. Thus

p(E1 given E2) = p(E1 ∩ E2) / p(E2) = 0 / p(E2) = 0,
p(E2 given E1) = p(E1 ∩ E2) / p(E1) = 0 / p(E1) = 0.

Normally p(E1 given E2) and p(E2 given E1) are different, but if the two events are mutually exclusive, they are both zero.

For another example, consider the game of Rock-Paper-Scissors. Assuming the two players are trying to vary their plays, the game can be considered a random process. A sample space for this experiment is

{(Rock, Rock), (Rock, Paper), (Rock, Scissors),
 (Paper, Rock), (Paper, Paper), (Paper, Scissors),
 (Scissors, Rock), (Scissors, Paper), (Scissors, Scissors)}.

We build an equiprobable probability model on this sample space; each sample point will be assigned a probability of 1/9.
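Definition 2 is mechanical enough to put straight into code. This short sketch (Python again, purely our illustration) rechecks the three dice computations above by counting sample points:

```python
from fractions import Fraction

# Equiprobable model on the 36 ordered pairs for two fair dice.
outcomes = {(a, b) for a in range(1, 7) for b in range(1, 7)}

def p(event):
    # Probability of an event, i.e. a set of ordered pairs.
    return Fraction(len(event), 36)

def p_given(e1, e2):
    # Definition 2: p(E1 given E2) = p(E1 intersect E2) / p(E2).
    return p(e1 & e2) / p(e2)

even = {(a, b) for a, b in outcomes if (a + b) % 2 == 0}
total6 = {(a, b) for a, b in outcomes if a + b == 6}
doubles = {(a, b) for a, b in outcomes if a == b}

print(p_given(total6, even))   # 5/18
print(p_given(doubles, even))  # 1/3
print(p_given(even, total6))   # 1
```

Note how the division by p(E2) is exactly the renormalization we did by hand: the surviving sample points stay equally likely, but over a smaller sample space.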
There are three possible events to worry about:

First player wins  = {(Rock, Scissors), (Scissors, Paper), (Paper, Rock)}
Second player wins = {(Rock, Paper), (Scissors, Rock), (Paper, Scissors)}
Tie                = {(Rock, Rock), (Scissors, Scissors), (Paper, Paper)}

Then the probabilities under the model are

p(First player wins) = 1/3,
p(Second player wins) = 1/3,
p(Tie) = 1/3.

So according to this model, when we play this game, the first player wins 1/3 of the time, the second player wins 1/3 of the time, and the game ends in a tie 1/3 of the time.

Now suppose we are only interested in who wins the game. That means that we would like to assume that the game reaches a conclusion. Only the event Tie does not determine a winner, so we are interested in the outcome of the game under the condition that it was not a tie. The event that the game did not end in a tie is the complementary event Tie^c. We know

p(Tie^c) = 1 - p(Tie) = 1 - 1/3 = 2/3.

Now if the first player wins, the game did not end in a tie, and similarly for the second player. So

p(Tie^c ∩ (First player wins)) = p(First player wins) = 1/3,
p(Tie^c ∩ (Second player wins)) = p(Second player wins) = 1/3.

Thus

p((First player wins) given Tie^c) = (1/3) / (2/3) = 1/2,
p((Second player wins) given Tie^c) = (1/3) / (2/3) = 1/2.

This is the third distinct way we have analyzed this game and come to the conclusion that both players have an equal chance to win.

1.4 Independence

Consider another example of conditional probability involving two fair dice. In this example, we will consider events where F1 is the event that the first die comes up 1, F2 is the event that the first die comes up 2, and so on through F3, F4, F5, and F6. Similarly, S1 is the event that the second die comes up 1, and this goes on through S2, S3, S4, S5, and S6.
For example,

F1 = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6)},
S3 = {(1,3), (2,3), (3,3), (4,3), (5,3), (6,3)}.

Now the Fk events are mutually exclusive in pairs, and the Sk events are mutually exclusive in pairs also. However, each Fk event has sample points in common with each Sj event. In fact,

Fk ∩ Sj = {(k, j)}.

We can easily compute probabilities for all these events:

p(Fk) = 6/36 = 1/6        for all k = 1, 2, 3, 4, 5, 6;
p(Sj) = 6/36 = 1/6        for all j = 1, 2, 3, 4, 5, 6;
p(Fk ∩ Sj) = 1/36         for all possible k and j.

From this, we can compute all possible conditional probabilities. First,

p(Fk given Fj) = 0  if k ≠ j,
p(Sk given Sj) = 0  if k ≠ j.

This is because the Fk events are mutually exclusive in pairs, as are the Sk events. Of course, if k = j,

p(Fk given Fk) = 1,
p(Sk given Sk) = 1.

The interesting conditional probabilities are the mixed pairs. We see

p(Fk given Sj) = p(Fk ∩ Sj) / p(Sj) = (1/36) / (1/6) = 1/6.

In the other direction,

p(Sk given Fj) = p(Fj ∩ Sk) / p(Fj) = (1/36) / (1/6) = 1/6.

Now normally, when we change the order in a conditional probability, the computed probability changes. However, the Fk and Sk events are special in this regard. But they are even more special than that. We also notice that

p(Fk given Sj) = 1/6 = p(Fk),
p(Sk given Fj) = 1/6 = p(Sk).

What this means is that knowing the outcome of the second die in a toss does not change the probability that any particular number occurs on the first die. Similarly, knowing the outcome of the first die in a toss does not change the probability that any particular number occurs on the second die. In particular, the probability that the second die came up 2 is 1/6. If you know for sure that the first die came up with a 5, the probability that the second die came up 2 (given that the first came up 5) is still 1/6. Knowing that the first die came up 5 gives you no additional information about whether the second die came up 2 or not.
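The claim that p(Fk given Sj) = p(Fk) for every mixed pair can be verified exhaustively: there are only 36 combinations to try. A quick sketch (Python, our illustration):

```python
from fractions import Fraction

outcomes = {(a, b) for a in range(1, 7) for b in range(1, 7)}

def p(event):
    # Equiprobable model on the 36 ordered pairs.
    return Fraction(len(event), 36)

# For every k and j, conditioning on the second die never changes
# the probability that the first die shows k.
for k in range(1, 7):
    F_k = {(a, b) for a, b in outcomes if a == k}   # first die shows k
    for j in range(1, 7):
        S_j = {(a, b) for a, b in outcomes if b == j}   # second die shows j
        assert p(F_k & S_j) / p(S_j) == p(F_k) == Fraction(1, 6)
```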
If the result of the first die gave even the slightest hint about what the second die did, then the probability would have changed.

When we have two events D and E in a sample space with a probability model, we say they are independent events if p(D given E) = p(D) and p(E given D) = p(E). By our definition of conditional probability, we see that the first condition is the same as saying

p(D) = p(D given E) = p(D ∩ E) / p(E),

and in turn

p(E)p(D) = p(D ∩ E).

The second condition leads to the same place:

p(E) = p(E given D) = p(D ∩ E) / p(D),
p(E)p(D) = p(D ∩ E).

Thus we are led to the following official definition:

Definition 3 When we have two events D and E in a sample space with a probability model, we say they are independent events if

p(D ∩ E) = p(D)p(E).

In our example, we used the equiprobable probability model based on the ordered pair sample space to find that the results of the first die and the results of the second die are mathematically independent. That would mean that, no matter how much two dice bump into each other as they are thrown, when they come to a stop, the end result on one die has no influence on the result on the other. When we throw two dice, the dice do act independently of each other, and neither has any substantial influence on the other in terms of how they come to rest. If this seems intuitive to you, then that supports the idea that the probability model for dice throwing will correspond to any results we might observe. This supports our choice of the equiprobable probability on the sample space of ordered pairs.

In fact, we could go the other way. We could start with the assumption that on the toss of one die, the sample space {1, 2, 3, 4, 5, 6} is equally likely.
So on the toss of one fair die, we would use the probability model

k     |  1    2    3    4    5    6
p(k)  |  1/6  1/6  1/6  1/6  1/6  1/6

If we also assume that the tosses of the two dice are independent, and thus all events involving two different dice are mathematically independent, then we can compute

p((k, j)) = p(Fk ∩ Sj) = p(Fk) p(Sj) = (1/6)(1/6) = 1/36.

This is the same probability we get by assuming the ordered pair sample space is equilikely. So we find that assuming that the six possibilities on the toss of one die are equilikely, and that, in a toss of two dice, the results on each die are independent, is equivalent to assuming that the ordered pair sample space of a two-dice experiment is equilikely. Certainly if two reasonable sounding assumptions lead to the same probability model, we are on to something.

Now we have two terms that describe the interactions between events in a random experiment: mutually exclusive and independent. They are totally different, but nonetheless easy to confuse.

Two events are mutually exclusive if they cannot both occur at the same time. Before you describe two events as mutually exclusive, you should check and make sure that they cannot both occur. If two events are mutually exclusive, then knowing that one occurred will definitely tell you something about the probability of the other event. Once you know one of a pair of mutually exclusive events has occurred, you know for absolute certain that the other event has not occurred! Mutually exclusive events cannot possibly be independent.

If two events are independent, then knowing one occurred cannot give you even the slightest hint about the other. Independent events cannot be mutually exclusive. If two events are independent, then it must be possible for them both to occur at once. Before you describe two events as independent, you should check and make sure that they can both occur at once in one experiment.
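The equivalence noted above, that a fair one-die model plus the independence assumption recovers the 1/36 ordered-pair model, can be checked directly by building the product model. A minimal sketch (Python, our illustration):

```python
from fractions import Fraction

# Equiprobable model for a single fair die.
one_die = {k: Fraction(1, 6) for k in range(1, 7)}

# Independence assumption: p((k, j)) = p(Fk) * p(Sj).
two_dice = {(k, j): one_die[k] * one_die[j]
            for k in one_die for j in one_die}

# Every ordered pair receives 1/36, matching the equiprobable
# ordered-pair model, and the probabilities sum to 1.
assert all(prob == Fraction(1, 36) for prob in two_dice.values())
assert sum(two_dice.values()) == 1
```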
Recognizing when two events are either mutually exclusive or independent is important because both of these are very special situations. When we know one of these is the case, we have useful information about the possibility of them both occurring and the possibility that either one of them occurs.

Conclusion 4 If E and D are mutually exclusive events in a sample space with a probability model, then

p(E ∩ D) = 0.

Conclusion 5 If E and D are independent events in a sample space with a probability model, then

p(E ∩ D) = p(D)p(E).

Conclusion 6 If E and D are mutually exclusive events in a sample space with a probability model, then

p(E ∪ D) = p(D) + p(E).

Conclusion 7 If E and D are independent events in a sample space with a probability model, then

p(E ∪ D) = p(D) + p(E) - p(D)p(E).

1.5 Caution

We started this discussion saying that the notion of conditional probability is one of the trickiest in probability theory. Conditional probability can lead to results that at first seem counterintuitive. For this reason, many probability puzzles, mathematical paradoxes, magic tricks, and cheating schemes have their origins in conditional probability. Also most, if not all, successful gambling strategies are based on a complete analysis of conditional probabilities. This is especially true in card games. Once you see what conditional probability is about, you can see how it applies to games and strategies.

Suppose you are playing a card game, and the final outcome that will determine whether you win or your opponent does comes down to one card. From the exposed cards, it is clear to all that, if the final card is a spade, you will win, but if it is not, you will lose. Further, everyone watching would see that there are 34 cards left in the deck. Further, those people paying close attention would have counted and know that, of the 34 unaccounted-for cards, exactly 8 are spades, 4 are clubs, 11 are diamonds, and 11 are hearts.
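Conclusions 4 through 7 can all be spot-checked on the dice events from the previous section. Here F1 and F2 are mutually exclusive, while F1 and S3 are independent (again a sketch in Python, our illustrative choice):

```python
from fractions import Fraction

outcomes = {(a, b) for a in range(1, 7) for b in range(1, 7)}

def p(event):
    # Equiprobable model on the 36 ordered pairs.
    return Fraction(len(event), 36)

F1 = {(a, b) for a, b in outcomes if a == 1}   # first die shows 1
F2 = {(a, b) for a, b in outcomes if a == 2}   # first die shows 2
S3 = {(a, b) for a, b in outcomes if b == 3}   # second die shows 3

# Conclusions 4 and 6: mutually exclusive events.
assert p(F1 & F2) == 0
assert p(F1 | F2) == p(F1) + p(F2)

# Conclusions 5 and 7: independent events.
assert p(F1 & S3) == p(F1) * p(S3)
assert p(F1 | S3) == p(F1) + p(S3) - p(F1) * p(S3)
```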
We can create a sample space for this random experiment:

{S1, S2, ..., S8, C1, C2, C3, C4, D1, ..., D11, H1, ..., H11}.

Here the S's, C's, D's, and H's are the unaccounted-for cards in the various suits. The event that you will win is

E = {S1, S2, ..., S8}.

It seems reasonable to assume that the sample space is equilikely. That gives a probability model where every card has a probability of 1/34. Thus the probability that you will win is 8/34 = 4/17. This does not look good.

However, you are a very attentive card player and the dealer is very sloppy. You have noticed that the next card to be dealt is black. You know for sure that the event

D = {S1, S2, ..., S8, C1, C2, C3, C4}

has occurred. Now

D ∩ E = {S1, S2, ..., S8},

and so we compute

p(D) = 12/34 = 6/17,
p(D ∩ E) = 8/34 = 4/17.

So you know that your chances of winning are actually

p(E given D) = p(D ∩ E) / p(D) = (4/17) / (6/17) = 4/6 = 2/3.

This looks much better. Of course, using this extra information would be highly unethical, but at the same time rather lucrative.

The aspect of conditional probability that makes it so tricky is that it requires a very complete knowledge of exactly what event is known to occur. Many puzzles, tricks, and other schemes are based on obscuring the full content of the conditional event. Consider the following example.

Suppose a friend takes a dime and a penny and flips them both in such a way that you cannot see the results. She then informs you that one of the coins has come up heads. What is the probability that the other is also heads? Well, flipping two coins is the same as flipping two coins independently. The results of one coin should not affect the results of the other. Thus the probability that the other coin is a heads should be 1/2. But not so fast. If we set up a sample space for this experiment, we would probably pick

{(H, H), (H, T), (T, H), (T, T)},

where the penny is first and the dime is second.
We keep track of the ordered pairs because we saw from our dice example that this should lead to an equilikely sample space. Under this assumption, we assign a probability of 1/4 to each pair. Now we are told that one of the coins is a heads. That is the event

E = {(H, H), (H, T), (T, H)}.

We are asked about the case where they are both heads, that is,

D = {(H, H)}.

We notice that

D ∩ E = {(H, H)}.

Computing probabilities, we find

p(E) = 3/4,
p(D ∩ E) = 1/4,
p(D given E) = p(D ∩ E) / p(E) = (1/4) / (3/4) = 1/3.

This is much less than 1/2. What happened?

In our first calculation we fell for our friend's, possibly deliberate, obscuring of the event she told us took place. Our friend said, "One of the coins has come up heads." Notice she did not say, "The penny came up heads" or "The dime came up heads." In our calculation we reasoned that flipping two coins is the same as flipping two coins independently. This is correct; however, when we flip two coins independently, we do know which is which. Our sample space {(H, H), (H, T), (T, H), (T, T)} assumes we know which coin is which. We know from experience that that should lead to an equilikely sample space. The information that one of the coins has come up heads is actually less than it seems. Because it does not say whether it was the penny or the dime that came up heads, it only eliminated one of the four equally likely possibilities.

To see how this actually works in practice, suppose you are playing the game repeatedly and keeping score. Your friend agrees to play the exact same game as long as you both can. First your friend says, "One of the coins has come up heads," and you make your guess on the other. Next your friend says, "One of the coins has come up heads," and you make your guess on the other. Again, your friend says, "One of the coins has come up heads," and you make your guess on the other. Then your friend says, "One of the coins has come up tails." You have to ask yourself, "Why did she change what she told me?"
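The 1/3 in the calculation above can be double-checked directly from the four-point sample space, counting the equally likely pairs that survive the information (Python used as a scratch pad; it is not part of the notes):

```python
from fractions import Fraction

# Ordered pairs (penny, dime), each assigned probability 1/4.
outcomes = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}

E = {o for o in outcomes if "H" in o}   # "one of the coins came up heads"
D = {("H", "H")}                        # both coins are heads

# p(D given E) = p(D intersect E) / p(E); with equally likely
# sample points, that is just a ratio of counts.
p_D_given_E = Fraction(len(D & E), len(E))
print(p_D_given_E)  # 1/3
```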
Assuming your friend is keeping her promise, there is only one reason she has changed her information: she had to change it to be truthful. Your informed guess is that the other coin is also a tails, which is right. This means that there is some small hint about the "other" coin in the information "one of the coins has come up heads." This phrasing obscures the full content of the event being described in the game, and that leads the careless to draw the wrong conclusion.

So does that mean that your friend saying "The penny came up heads" changes the game? Yes, if the game always requires that the result of the penny be disclosed. However, if your "friend" is trying to take advantage of you, they will change this particular rule every time they play the game. They will sometimes tell you what the penny did, sometimes the dime. They will sometimes tell you a coin came up heads, other times tails, and they will not wait until they are forced to before they make the switch. If the rules of the game do not allow this, then the 1/2 and 1/2 probability model works. But if the rules do allow these changes, or your friend chooses to ignore the rules, then the correct probability model is 1/3 and 2/3. Since the rules of the game involve someone's choice, that human intervention has reduced some of the assumed randomness of the play. Your probability analysis and your participation in the game need to take this into account.

Conditional probability is, in practice, very tricky business. But the mathematics does have one very practical application. Any extra condition, when known, can have a large impact on the probabilities of other events. Poker players certainly know this. Good ones are always alert to the slightest hint about what another player knows, and they are as careful as possible not to accidentally give any of their own information away.
Even the most skilled gambler can lose a lot of money very quickly by developing an inadvertent "tell" that gives away the strength of their hand. Top players in a losing streak have been known to film themselves playing and watch the play over and over, trying to spot any behavior that is giving their thoughts away.

Prepared by: Daniel Madden and Alyssa Keri, May 2009