STA111 - Lecture 2: Counting and Conditional Probability

We're going to start calculating probabilities in examples where the sample space (set of possible outcomes) is finite and where it makes sense to compute the probability of an event as "(number of favorable cases)/(number of possible outcomes)". In our first examples, then, finding the probability of an event will be reduced to counting the elements of the sample space that satisfy some condition.

1 Basic Counting

Product rule: Quoting our textbook, "If one thing can be accomplished in n1 different ways and after this a second thing can be accomplished in n2 different ways, ..., and finally a k-th thing can be accomplished in nk different ways, then all k things can be accomplished in the specified order in n1 n2 · · · nk different ways." This sounds more complicated than it actually is. Examples and tree diagrams help.

Examples:

• Suppose your favorite frozen yogurt place has 4 different choices of yogurt and 10 different toppings. There are 4 × 10 = 40 different 1-topping choices.

• Suppose that I have 5 t-shirts and 3 pairs of shorts. In total, I have 5 × 3 = 15 distinct "outfits."

Permutations: Suppose we want to pick k elements from a set that has n elements in a certain order (n ≥ k). By the product rule, there are n(n − 1) · · · (n − k + 1) ways of doing it.

Combinations: Suppose that we still want to pick a subset of k elements from a set with n elements, but order doesn't matter. There are

n(n − 1) · · · (n − k + 1)/k! = n!/(k!(n − k)!) = C(n, k)

possible subsets, where C(n, k) is read "n choose k".

Examples:

• What is the probability of a four of a kind in 5-card poker? We need to compute

P(4 of a kind) = #(favorable hands)/#(5 card hands),

where

#(5 card hands) = C(52, 5),

because we're choosing a subset of 5 cards out of a deck with 52, and order is irrelevant. Finally,

#(favorable hands) = 13 · (52 − 4) = 624,

because we can 1) pick any rank from A-K (13 choices) and take all four cards of that rank, and 2) choose any other card as the fifth (we have 52 minus the 4 that we've taken). If you are curious and/or want extra practice, Wikipedia has a great article on poker probabilities. (A numerical check of this and the next example appears after the exercises below.)

• Suppose we're tossing a fair coin 4 times. What is the probability of obtaining tails exactly once? What is the probability of obtaining all heads? What about the probability of obtaining at least one tail? Let H denote heads and T denote tails. The possible outcomes can be identified with strings of four letters (for example, HHHH if all the outcomes are heads, HTHH if all but the second are heads). The number of possible outcomes is, then, 2^4 = 16, and we can assume they're all equally likely. The probability of obtaining tails just once is 4/16 = 1/4 (the favorable cases are HHHT, HHTH, HTHH, THHH). The probability of obtaining all heads is 1/16 = 0.0625 (HHHH is the only favorable outcome). The probability of obtaining at least one tail is P((all heads)^c). Remember that in Exercise 3 in Lecture 1 you showed that if A is an event, P(A^c) = 1 − P(A). Therefore, the probability of obtaining at least one tail is 1 − 1/16 = 0.9375.

Exercise 1. If you roll a (fair) die 2 times, what is the probability that you obtain the same number twice? If you add up the numbers of the outcomes, what is the probability of obtaining a number strictly greater than 10?

Exercise 2. A bag has 3 green jelly beans and 7 red jelly beans. If you extract 2 jelly beans, what is the probability that the 2 of them are red? Now suppose that you draw 5 jelly beans out of the bag. What is the probability that 3 are red and 2 are green?
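These counts are small enough to verify by brute force. Here is a minimal Python sketch, not part of the original notes, that recomputes the four-of-a-kind probability and enumerates the 16 coin-toss outcomes; it only uses the standard library (math.comb, itertools.product).

```python
from itertools import product
from math import comb

# Four of a kind in 5-card poker:
# pick a rank (13 ways), take all 4 cards of it, then any of the remaining 48 cards.
favorable = 13 * 48
total_hands = comb(52, 5)  # C(52, 5) five-card hands
print(favorable, total_hands, favorable / total_hands)  # 624 2598960 ~0.00024

# Tossing a fair coin 4 times: enumerate all 2^4 = 16 equally likely strings.
outcomes = list(product("HT", repeat=4))
p_one_tail = sum(o.count("T") == 1 for o in outcomes) / len(outcomes)
p_all_heads = sum(o.count("T") == 0 for o in outcomes) / len(outcomes)
p_at_least_one_tail = 1 - p_all_heads
print(p_one_tail, p_all_heads, p_at_least_one_tail)  # 0.25 0.0625 0.9375
```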
2 Conditional Probability

Let A, B be events and let P(B) > 0. Then, we define the probability of A given B as

P(A | B) = P(A ∩ B)/P(B).

The interpretation of P(A | B) is "the probability that A happens given that we know that B has happened". In this context, P(A) is sometimes called the marginal probability of A (in contrast with the conditional probability of A, given B). Conditional probability can be confusing and counterintuitive. The best way to master it is by practicing a lot. As we will see in Lecture, we can see why the definition of conditional probability "works" using Venn diagrams.

Examples:

• This example is taken from a previous iteration of this course. If A = {a person is 6 ft tall} and B = {a person is a basketball player}, P(A | B) is the probability that a person is 6 ft tall given that she is a basketball player. Is P(A | B) greater or less than P(A)?

• Suppose that our friend Bobby throws a die. He knows the outcome, but we don't. He tells us that the result is an even number. What is the probability that it is a 6? Well, our intuition tells us that it should be 1/3, since the only possible outcomes are 2, 4, and 6, and they should be equally likely. On the other hand, the probability that the result is, say, a 5, is 0, because we know that the outcome is an even number. Therefore, P(5) = P(6) = 1/6, but P(5 | B) = 0 and P(6 | B) = 1/3, where B is the information that Bobby gave us. Conditional probability, then, is a formalization of how we update our probabilities when a new piece of information comes in. Let's see that if we apply the mathematical definition above, we get the same results. On the one hand, P(B) = 1/2 (the probability that the outcome is an even number). On the other hand, P(6 ∩ B) = P(6 and even number) = P(6) = 1/6 and P(5 ∩ B) = P(5 and even number) = 0. Then P(6 | B) = (1/6)/(1/2) = 1/3 and P(5 | B) = 0.

• The definition of conditional probability gives us a useful formula for computing probabilities of intersections:

P(A ∩ B) = P(A)P(B | A) = P(B)P(A | B).

These formulas are quite interpretable: "the probability that A and B happen is the probability that A happens times the probability that B happens, given that A has happened". For example, we can compute the probability of drawing two aces from a well-shuffled deck as follows. Let A1 be the event "the first card is an ace" and A2 be the event "the second card is an ace". Then

P(A1 ∩ A2) = P(A1)P(A2 | A1) = (4/52)(3/51) ≈ 0.005,

because if the first card is an ace, we have 51 cards left, 3 of which are aces. We can also find this probability directly using the rule "(number of favorable cases)/(number of possible outcomes)":

P(A1 ∩ A2) = C(4, 2)/C(52, 2) = (4 · 3)/(52 · 51) ≈ 0.005.

Exercise 3. For each of the following events A and B, do you think that P(A | B) is less than, equal to, or greater than P(A)?

1. A: high temperature over 100 degrees, B: summer.
2. A: a card drawn from a well-shuffled deck is greater than 5, B: the card is a club.
3. A: understanding conditional probability, B: having done this exercise.

Come up with 3 more examples of your own.

Exercise 4. Suppose you're drawing cards from a well-shuffled deck and the first two cards you get are aces. What is the probability of having a hand with 4 aces in total after 3 more draws?
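Both worked examples above can also be checked numerically. The sketch below is an addition to the notes, not part of them: it applies the definition of conditional probability to Bobby's die and compares the two ways of computing the two-aces probability, using exact fractions from the standard library.

```python
from fractions import Fraction
from math import comb

# Bobby's die: P(6 | even) computed straight from the definition P(A|B) = P(A and B)/P(B).
die = range(1, 7)
p_even = Fraction(sum(1 for x in die if x % 2 == 0), 6)            # P(B) = 1/2
p_6_and_even = Fraction(sum(1 for x in die if x == 6 and x % 2 == 0), 6)
print(p_6_and_even / p_even)                                       # 1/3

# Two aces: multiplication rule vs. direct counting.
p_chain = Fraction(4, 52) * Fraction(3, 51)
p_count = Fraction(comb(4, 2), comb(52, 2))
print(p_chain, p_count, p_chain == p_count)                        # 1/221 1/221 True
```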
Independence: The events A and B are said to be independent if P(A | B) = P(A), that is, if knowing B doesn't change the probability of A. By the definition of conditional probability, this implies that if A and B are independent, we must have

P(A ∩ B) = P(A)P(B).

Examples:

• Tossing coins or rolling dice repeatedly are good examples of independent events. It is reasonable to assume that the outcome of the first toss doesn't change the probability of the outcome of the subsequent ones (if the coin or die is fair).

• Bobby has 3 bags with red and green jelly beans. The first one has 10 red beans and 10 green beans. The second one has 25 red beans and 5 green beans. Finally, the third one has 25 green beans and 5 red beans. He will select one of the three bags with equal probabilities and will draw a jelly bean. Before you know the bag from which he is drawing the jelly bean, what is the probability that it is red? Suppose that you know that he's going to draw the bean from the first bag; what is the probability that it is a red jelly bean? In both cases, the probability of drawing a red bean is 1/2.

In the first example, independence was a model assumption. In the second, it was something we showed (although there is an implicit independence assumption, too – can you identify it?).
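One way to see why the marginal probability of a red bean is 1/2 is to average P(red | bag) over the three equally likely bags, which is the computation the example relies on implicitly. The sketch below is an illustration, not part of the notes, and assumes the bag contents given above (in particular, that the third bag holds 25 green and 5 red beans).

```python
from fractions import Fraction

# (red, green) counts in Bobby's three bags, each chosen with probability 1/3.
bags = [(10, 10), (25, 5), (5, 25)]

# Marginal probability of red: average P(red | bag) over the equally likely bags.
p_red = sum(Fraction(r, r + g) for r, g in bags) / len(bags)
p_red_given_bag1 = Fraction(bags[0][0], sum(bags[0]))

print(p_red, p_red_given_bag1)  # 1/2 1/2 -> knowing it's bag 1 doesn't change P(red)
```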