DUBLIN INSTITUTE OF TECHNOLOGY
KEVIN STREET, DUBLIN 8

MACHINE LEARNING AT DIT
Probability Based Learning: Introduction to Probability
REVISION QUESTIONS

*** SOLUTIONS ***

Dr. John Kelleher
Dr. Brian Mac Namee

1. Given the joint distribution for X and Y listed in Table 1, calculate the following:

Table 1: Joint Distribution for X and Y

            X = x1    X = x2
  Y = y1     0.02      0.30
  Y = y2     0.14      0.32
  Y = y3     0.10      0.12

(a) $P(X = x_1)$

$P(X = x_1) = 0.02 + 0.14 + 0.10 = 0.26$

(b) $P(Y = y_2)$

$P(Y = y_2) = 0.14 + 0.32 = 0.46$

(c) $P(Y = y_2 \mid X = x_1)$

From the product rule, $P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$, so

$$P(Y = y_2 \mid X = x_1) = \frac{P(Y = y_2 \wedge X = x_1)}{P(X = x_1)} = \frac{0.14}{0.26} \approx 0.538$$

2. Given that $P(a \mid b) = 0.5$, $P(a) = 0.3$, $P(b) = 0.4$, calculate $P(b \mid a)$.

$$P(b \mid a) = \frac{P(a \mid b) \times P(b)}{P(a)} = \frac{0.5 \times 0.4}{0.3} \approx 0.67$$

3. Consider the domain of dealing 5-card poker hands from a standard deck of 52 cards, under the assumption that the dealer is fair.

(a) Given that the number of combinations of $r$ objects that can be selected, without regard to order and without repetition, from $n$ distinct objects is given by the equation $\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$, how many atomic events are there in the joint probability distribution (i.e., how many 5-card hands are there)?

There are

$$\binom{52}{5} = \frac{52!}{47!\,5!} = \frac{52 \times 51 \times 50 \times 49 \times 48}{1 \times 2 \times 3 \times 4 \times 5} = 2{,}598{,}960$$

possible five-card hands.

(b) What is the probability of each atomic event?

By the fair-dealing assumption, each of these is equally likely. Each hand therefore occurs with probability $1/2{,}598{,}960$.

(c) What is the probability of being dealt a royal straight flush (i.e., being dealt a hand containing Ace, King, Queen, Jack and 10 all from the one suit)?

There are four hands that are royal straight flushes (one in each suit).
These events are mutually exclusive, so the probability of a royal flush is just the sum of the probabilities of the atomic events, i.e., $\frac{4}{2{,}598{,}960} = \frac{1}{649{,}740}$.

(d) What is the probability of being dealt a four of a kind (i.e., four kings, or four nines, etc.)?

There are 13 possible kinds and, for each, the fifth card can be one of 48 possible other cards. The total probability is therefore $\frac{13 \times 48}{2{,}598{,}960} = \frac{1}{4{,}165}$.

4. Given the full joint distribution shown in Table 2, calculate the following:

Table 2: Full joint distribution for a dentist visit

                  toothache             ¬toothache
               catch    ¬catch       catch    ¬catch
  cavity       0.108    0.012        0.072    0.008
  ¬cavity      0.016    0.064        0.144    0.576

(a) $P(\mathit{toothache})$

This asks for the probability that $\mathit{Toothache}$ is true.

$P(\mathit{toothache}) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2$

(b) $\mathbf{P}(\mathit{Cavity})$

This asks for the vector of probability values for the random variable $\mathit{Cavity}$. It has two values, which we list in the order $\langle \mathit{true}, \mathit{false} \rangle$. First add up $P(\mathit{cavity}) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2$. Then we have $\mathbf{P}(\mathit{Cavity}) = \langle 0.2, 0.8 \rangle$.

(c) $\mathbf{P}(\mathit{Toothache} \mid \mathit{cavity})$

This asks for the vector of probability values for $\mathit{Toothache}$, given that $\mathit{Cavity}$ is true.

$$\mathbf{P}(\mathit{Toothache} \mid \mathit{cavity}) = \left\langle \frac{0.108 + 0.012}{0.2}, \frac{0.072 + 0.008}{0.2} \right\rangle = \langle 0.6, 0.4 \rangle$$

(d) $\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache} \vee \mathit{catch})$

This asks for the vector of probability values for $\mathit{Cavity}$, given that either $\mathit{Toothache}$ or $\mathit{Catch}$ is true. Recall that $P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$, so

$$\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache} \vee \mathit{catch}) = \left\langle \frac{P(\mathit{cavity} \wedge (\mathit{toothache} \vee \mathit{catch}))}{P(\mathit{toothache} \vee \mathit{catch})}, \frac{P(\neg \mathit{cavity} \wedge (\mathit{toothache} \vee \mathit{catch}))}{P(\mathit{toothache} \vee \mathit{catch})} \right\rangle$$

First compute $P(\mathit{toothache} \vee \mathit{catch}) = 0.108 + 0.012 + 0.016 + 0.064 + 0.072 + 0.144 = 0.416$. Then

$$\mathbf{P}(\mathit{Cavity} \mid \mathit{toothache} \vee \mathit{catch}) = \left\langle \frac{0.108 + 0.012 + 0.072}{0.416}, \frac{0.016 + 0.064 + 0.144}{0.416} \right\rangle = \langle 0.4615, 0.5385 \rangle$$

5. After your yearly checkup, the doctor has bad news and good news.
The bad news is that you tested positive for a serious disease and that the test is 99% accurate (i.e., the probability of testing positive when you do have the disease is 0.99, as is the probability of testing negative when you don't have the disease). The good news is that this is a rare disease, striking only 1 in 10,000 people of your age. Why is it good news that the disease is rare? What are the chances that you actually have the disease?

We are given the following information:

$P(\mathit{test} \mid \mathit{disease}) = 0.99$
$P(\neg \mathit{test} \mid \neg \mathit{disease}) = 0.99$
$P(\mathit{disease}) = 0.0001$

and the observation $\mathit{test}$. What the patient is concerned about is $P(\mathit{disease} \mid \mathit{test})$.

The reason it is a good thing that the disease is rare is that $P(\mathit{disease} \mid \mathit{test})$ is proportional to $P(\mathit{disease})$, so a lower prior for $\mathit{disease}$ means a lower value for $P(\mathit{disease} \mid \mathit{test})$. Roughly speaking, if 10,000 people take the test, we expect 1 to actually have the disease and most likely test positive, while the rest do not have the disease, but 1 percent of them (about 100 people) will test positive anyway, so $P(\mathit{disease} \mid \mathit{test})$ will be about 1 in 100.

More precisely, using Bayes' rule, $P(a \mid b) = \frac{P(b \mid a) P(a)}{P(b)}$, together with $P(\mathit{test} \mid \neg \mathit{disease}) = 1 - 0.99 = 0.01$:

$$P(\mathit{disease} \mid \mathit{test}) = \frac{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease})}{P(\mathit{test})} = \frac{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease})}{P(\mathit{test} \mid \mathit{disease}) P(\mathit{disease}) + P(\mathit{test} \mid \neg \mathit{disease}) P(\neg \mathit{disease})}$$

$$= \frac{0.99 \times 0.0001}{(0.99 \times 0.0001) + (0.01 \times 0.9999)} \approx 0.009804$$

The moral is that when the disease is much rarer than the test's error rate, a positive test result does not mean the disease is likely. A false positive reading remains much more likely.

6. Suppose you are given a bag containing $n$ unbiased coins. You are told that $n - 1$ of these coins are normal, with heads on one side and tails on the other, whereas one is fake, with heads on both sides.

(a) Suppose you reach into the bag, picking out a coin uniformly at random, flip it, and get a head. What is the (conditional) probability that the coin you chose is the fake coin?
There are $n$ ways to pick a coin, and 2 outcomes for each flip (although with the fake coin, the results of the flip are indistinguishable), so there are $2n$ total atomic events, each equally likely. Of those, only 2 pick the fake coin, and $2 + (n - 1)$ result in heads (the two heads results for the fake coin and half of the $2(n - 1)$ flips of the normal coins). So the probability of a fake coin given heads, $P(\mathit{fake} \mid \mathit{heads})$, is $\frac{P(\mathit{heads} \mid \mathit{fake}) P(\mathit{fake})}{P(\mathit{heads})}$, where

$P(\mathit{heads} \mid \mathit{fake}) = 1$
$P(\mathit{fake}) = \frac{1}{n}$
$P(\mathit{heads}) = \frac{\text{total number of head results}}{\text{total atomic events}} = \frac{2 + (n - 1)}{2n}$

Therefore

$$P(\mathit{fake} \mid \mathit{heads}) = \frac{1 \times \frac{1}{n}}{\frac{2 + (n - 1)}{2n}} = \frac{1}{n} \times \frac{2n}{2 + (n - 1)} = \frac{2}{2 + (n - 1)} = \frac{2}{n + 1}$$

More formally, let $\alpha$ be a normalising constant. Then

$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}) = \alpha \, \mathbf{P}(\mathit{heads} \mid \mathit{Fake}) \, \mathbf{P}(\mathit{Fake})$$
$$= \alpha \, \langle P(\mathit{heads} \mid \mathit{fake}), P(\mathit{heads} \mid \neg \mathit{fake}) \rangle \, \langle P(\mathit{fake}), P(\neg \mathit{fake}) \rangle$$
$$= \alpha \, \langle 1.0, 0.5 \rangle \left\langle \frac{1}{n}, \frac{n - 1}{n} \right\rangle = \alpha \left\langle \frac{1}{n}, \frac{n - 1}{2n} \right\rangle$$

where the vectors are multiplied element by element. To compute $\alpha$, sum the unnormalised entries:

$$\frac{1}{n} + \frac{n - 1}{2n} = \frac{2 + (n - 1)}{2n} = \frac{n + 1}{2n}$$

By definition $\alpha \times \frac{n + 1}{2n} = 1$, so $\alpha = \frac{2n}{n + 1}$. Plugging $\alpha = \frac{2n}{n + 1}$ back into our equation:

$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}) = \left\langle \frac{1}{n} \times \frac{2n}{n + 1}, \frac{n - 1}{2n} \times \frac{2n}{n + 1} \right\rangle = \left\langle \frac{2}{n + 1}, \frac{n - 1}{n + 1} \right\rangle$$

(b) Suppose you continue flipping the coin for a total of $k$ times after picking it and see $k$ heads. Now what is the conditional probability that you picked the fake coin?

Now there are $2^k n$ atomic events:

(i) there are $n$ ways to pick a coin;
(ii) there are $k$ flips of the picked coin, and each flip can result in one of 2 outcomes (although with the fake coin, the results of the flip are indistinguishable), so there are $2^k$ possible outcomes for $k$ flips;
(iii) so there are $2^k n$ atomic events in the domain (outcomes of flips $\times$ ways of picking a coin).

Of these $2^k n$ atomic events, $2^k$ pick the fake coin, and $2^k + (n - 1)$ result in $k$ heads.
So the probability of a fake coin given a run of $k$ heads, $P(\mathit{fake} \mid \mathit{heads}^k)$, is $\frac{2^k}{2^k + (n - 1)}$. Note that this approaches 1 as $k$ increases, as expected.

Doing it the formal way:

$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}^k) = \alpha \, \mathbf{P}(\mathit{heads}^k \mid \mathit{Fake}) \, \mathbf{P}(\mathit{Fake}) = \alpha \, \langle 1.0, 0.5^k \rangle \left\langle \frac{1}{n}, \frac{n - 1}{n} \right\rangle = \alpha \left\langle \frac{1}{n}, \frac{n - 1}{2^k n} \right\rangle$$

To compute $\alpha$, sum the unnormalised entries:

$$\frac{1}{n} + \frac{n - 1}{2^k n} = \frac{2^k + (n - 1)}{2^k n}$$

By definition $\alpha = \frac{2^k n}{2^k + (n - 1)}$. Plugging $\alpha$ back into our equation:

$$\mathbf{P}(\mathit{Fake} \mid \mathit{heads}^k) = \left\langle \frac{2^k}{2^k + (n - 1)}, \frac{n - 1}{2^k + (n - 1)} \right\rangle$$

(c) Suppose you wanted to decide whether the chosen coin was fake by flipping it $k$ times. The decision procedure returns FAKE if all $k$ flips come up heads, otherwise it returns NORMAL. What is the (unconditional) probability that this procedure makes an error?

The procedure makes an error if and only if a fair coin is chosen and turns up heads $k$ times in a row. Because the question asks for an unconditional probability, this is the joint probability of the two events:

$$P(\neg \mathit{fake} \wedge \mathit{heads}^k) = P(\mathit{heads}^k \mid \neg \mathit{fake}) \, P(\neg \mathit{fake}) = 0.5^k \times \frac{n - 1}{n} = \frac{n - 1}{2^k n}$$

7. Suppose you are a witness to a nighttime hit-and-run accident involving a taxi in Athens. All taxis in Athens are blue or green. You swear, under oath, that the taxi was blue. Extensive testing shows that, under the dim lighting conditions, discrimination between blue and green is 75% reliable.

(a) Is it possible to calculate the most likely color for the taxi? (Hint: distinguish carefully between the proposition that the taxi is blue and the proposition that it appears blue.)

The relevant aspect of the world can be described by two random variables: $B$ means the taxi was blue, and $LB$ means the taxi looked blue.
The information on the reliability of color identification can be written as

$P(LB \mid B) = 0.75$ and $P(\neg LB \mid \neg B) = 0.75$

We need to know the probability that the taxi was blue, given that it looked blue:

$P(B \mid LB) \propto P(LB \mid B) P(B) \propto 0.75 \, P(B)$
$P(\neg B \mid LB) \propto P(LB \mid \neg B) P(\neg B) \propto 0.25 \, (1 - P(B))$

Thus we cannot decide the probability without some information about the prior probability of blue taxis, $P(B)$. For example, if we knew that all taxis were blue, i.e., $P(B) = 1$, then obviously $P(B \mid LB) = 1$. On the other hand, if we adopt Laplace's Principle of Indifference, which states that propositions can be deemed equally likely in the absence of any differentiating information, then we have $P(B) = 0.5$ and $P(B \mid LB) = 0.75$. Usually we will have some differentiating information, so this principle does not apply.

(b) What about now, given that 9 out of 10 Athenian taxis are green?

Given that 9 out of 10 taxis are green, and assuming the taxi in question is drawn randomly from the taxi population, we have $P(B) = 0.1$. Hence

$P(B \mid LB) \propto 0.75 \times 0.1 \propto 0.075$
$P(\neg B \mid LB) \propto 0.25 \times 0.9 \propto 0.225$

Normalising:

$P(B \mid LB) = \frac{0.075}{0.075 + 0.225} = 0.25$
$P(\neg B \mid LB) = \frac{0.225}{0.075 + 0.225} = 0.75$

So, despite the witness report, the taxi is most likely green.
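
The normalisation step in question 7 can be checked numerically. The following is a minimal Python sketch, not part of the original solutions: the function names `normalise` and `posterior_blue` are hypothetical, and exact fractions are used so that no rounding is involved. It assumes, as the solution does, that the 75% reliability applies equally to blue and green taxis.

```python
from fractions import Fraction


def normalise(unnormalised):
    """Scale a list of non-negative weights so that they sum to 1."""
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def posterior_blue(prior_blue, reliability=Fraction(3, 4)):
    """Return [P(B | LB), P(not B | LB)] for a given prior P(B).

    Assumption: P(LB | B) = reliability and P(LB | not B) = 1 - reliability.
    """
    unnormalised = [
        reliability * prior_blue,              # P(LB | B) P(B)
        (1 - reliability) * (1 - prior_blue),  # P(LB | not B) P(not B)
    ]
    return normalise(unnormalised)


# Part (b): 9 out of 10 taxis are green, so P(B) = 1/10.
print(posterior_blue(Fraction(1, 10)))  # [Fraction(1, 4), Fraction(3, 4)]

# Part (a) under the Principle of Indifference: P(B) = 1/2.
print(posterior_blue(Fraction(1, 2)))   # [Fraction(3, 4), Fraction(1, 4)]
```

The same `normalise` helper plays the role of the constant $\alpha$ used in question 6: it divides each unnormalised entry by their sum.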