Lectures 5–11 Conditional Probability and Independence

Purpose: Calculate probabilities under restrictions, conditions or partial information on the random experiment. Break down complex probabilistic analyses into manageable steps.

Example 1 (Roll of Two Fair Dice): What is the long run relative frequency of a sum of seven, given that we rolled at least one six? The relative frequencies of the shaded boxes should be roughly equal in the long run. Thus the conditional relative frequency of the light shaded boxes among the shaded boxes should be about 2/11.

[Figure: the 6 × 6 grid of outcomes for two dice. The 11 outcomes with at least one six are shaded (1+6=7, 2+6=8, 3+6=9, 4+6=10, 5+6=11, 6+6=12, 6+5=11, 6+4=10, 6+3=9, 6+2=8, 6+1=7); the remaining outcomes are eliminated. The two lightly shaded boxes, (1,6) and (6,1), give a sum of seven.]

    P(sum of 7 | at least one 6) = (2/36) / (11/36) = 2/11

Think of the original probabilities of the shaded region (adding to 11/36) being prorated so that the new conditional probabilities of the individual shaded boxes add to one. This is accomplished by dividing the original shaded box probabilities by this total shaded (conditioning) region probability, i.e., by 11/36, thus giving us the desired (11/36)/(11/36) = 1, the probability of our new sample space (shaded region).

Definition: If P(F) > 0, then

    P(E|F) = P(EF) / P(F),    undefined when P(F) = 0!

Long Run Frequency Interpretation of P(E|F): By the long run frequency paradigm we would, in a large number N of repeated experiments, see about a proportion P(F) of the experiments result in the event F, i.e., we would expect to see event F occur about N · P(F) times, and similarly event EF about N · P(EF) times. Thus, when we focus on experiments that result in F (i.e., given F), the proportion of such restricted experiments that also result in E is approximately

    [N · P(EF)] / [N · P(F)] = P(EF) / P(F) = P(E|F)

Example 2 (Makeup Exam): A student is given a makeup exam and is given one hour to finish it. The exam is designed so that a fraction x/2 of students would finish it in less than x hours, i.e., about half would finish it in the allotted time. Given that the student (when viewed as a random choice from all students) is still working on the exam after 40 minutes, what is the chance that the student will use the full hour (finished or not)?

Let X denote the time (in hours) that the student needs to finish the exam. We first compute

    P(X < 1 | X ≥ 40/60) = P(40/60 ≤ X < 1) / P(X ≥ 40/60)
                         = [P(X < 1) − P(X < 40/60)] / [1 − P(X < 40/60)]
                         = (1/2 − 1/3) / (1 − 1/3) = 1/4

X < 1 is shorthand for the event that the student finishes prior to 1 hour, and similarly for the other usages. The chance that the student will use the full hour (finished or not) is therefore 1 − 1/4 = .75.

Prorated probabilities: Note that in the case of equally likely outcomes it is often easier to work with the reduced sample space, treating the remaining outcomes as equally likely. In general, outcome probabilities are prorated:

    P({e}|F) = P({e}) / P(F)  if e ∈ F,    and    P({e}|F) = 0  otherwise.

When the P({e}) are all the same, then the P({e})/P(F) are all the same for all e ∈ F.
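The long run frequency interpretation of P(E|F) can be checked directly by simulation of Example 1. The following Python sketch (an illustration added here, with an arbitrary seed and trial count) estimates P(sum of 7 | at least one 6) by counting only those repetitions in which the conditioning event occurred; the estimate should be close to 2/11 ≈ 0.1818.

    import random

    random.seed(1)                      # arbitrary seed, for reproducibility only
    trials = 1_000_000
    cond = hits = 0
    for _ in range(trials):
        d1, d2 = random.randint(1, 6), random.randint(1, 6)
        if d1 == 6 or d2 == 6:          # conditioning event F: at least one six
            cond += 1
            if d1 + d2 == 7:            # event E: sum of seven
                hits += 1
    print(hits / cond, 2 / 11)          # conditional relative frequency vs 2/11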
Example 3 (Coin Flips): Two fair coins are flipped. All outcomes of the sample space S = {(H,H), (H,T), (T,H), (T,T)} are equally likely. What is the conditional probability that both coins are heads given a) that the first coin shows heads and b) that at least one of the coins shows heads?

Let B = {(H,H)}, F = {(H,T), (H,H)} and A = {(H,T), (T,H), (H,H)}. Then for a)

    P(B|F) = P(BF)/P(F) = P(B)/P(F) = (1/4)/(1/2) = 1/2

while for b) we get

    P(B|A) = P(AB)/P(A) = P(B)/P(A) = (1/4)/(3/4) = 1/3

This takes many by surprise, and it is often phrased in terms of the boy/girl problem in a family with two children. If you are told that at least one of the children is a girl, the answer again is 1/3 for the probability that both are girls. Note however the lengthy but readable discussion in Example 3m in Section 3.3 of Ross. Here comes the kicker. Suppose you are told that at least one is a girl and she was born on a Sunday, and assume all 2 × 7 × 2 × 7 = 196 gender/weekday combinations (G1, D1, G2, D2) are equally likely. The answer changes to??

We will generalize this from days Di to attributes Ai which can take any of n values. For example, for n = 365 it could be the day in the year. If I tell you that at least one of the children is a girl with attribute A = k, then the given information reduces the sample space to

    {(G1, k, G2, i), i = 1, ..., n} ∪ {(G1, i, G2, k), i = 1, ..., n, i ≠ k}
    ∪ {(G1, k, B2, i), i = 1, ..., n} ∪ {(B1, i, G2, k), i = 1, ..., n}

with altogether n + (n − 1) + n + n = 4n − 1 equally likely outcomes. Of these n + n − 1 = 2n − 1 qualify as having two girls. Thus the conditional chance is

    (2n − 1)/(4n − 1) ≈ 1/2 for large n    and    (2n − 1)/(4n − 1) = 1/3 for n = 1

Lec6 ends

The product formula: A reformulation of the definition of P(E|F):

    P(EF) = P(E|F)P(F)

Useful in computing P(EF) when P(E|F) is more transparent.

The Law of Total Probability: Suppose the sample space S can be represented as the disjoint union of M events Fi, i = 1, ..., M, with FiFj = ∅ for i ≠ j. Then for any event E we have (given as (3.4) by Ross, but pulled forward to clean up the next example):

    P(E) = P(ES) = P(E ∪_{i=1}^M Fi) = P(∪_{i=1}^M EFi) = Σ_{i=1}^M P(EFi) = Σ_{i=1}^M P(E|Fi)P(Fi)

Special Case: E = EF ∪ EF^c, mutually exclusive, ⟹

    P(E) = P(EF) + P(EF^c) = P(E|F)P(F) + P(E|F^c)P(F^c)

with the hope that P(F), P(E|F) and P(E|F^c) are easier to ascertain than P(E).

Example 4 (Bridge): In the card game bridge 52 cards are dealt, with equal chance for all sets of hands of 13 each to East, West, North and South. Given that North and South have a total of 8 spades, what is the chance that East has 3 of the remaining 5 spades?

Let Fi be the event that specifies the ith distinct deal to North and South, where they have a total of 8 spades, and where East and West thus have the other 5 spades, but the deal to East and West is not specified otherwise. How many such disjoint events Fi there are is not important, say there are M (see footnote 1). However, it is easy to realize that each Fi contains the same number \binom{26}{13} of deals to East and West, i.e., P(F1) = ... = P(FM) = P(F)/M, where F is the event that North and South have 8 spades among themselves.

Footnote 1: M = \binom{13}{8}\binom{39}{18}\binom{26}{13} — 8 spades, 18 non-spades, to make two hands of 13 for N and S.

Conditionally, given F, we can view F1 ∪ ... ∪ FM = F as our reduced sample space with all deals in it being equally likely. Let E be the event that East gets exactly 3 spades. Then

    P(E|Fi) = \binom{5}{3}\binom{21}{10} / \binom{26}{13} = 0.339

Note that which 5 spades and which 21 non-spades are involved in making the two hands for East and West depends on Fi, the hands specified for North and South. However, the probability P(E|Fi) is always the same.

Using the same idea as presented in the law of total probability we have (see diagram)

    P(E|F) = P(EF)/P(F) = P(E(F1 ∪ ... ∪ FM))/P(F) = Σ_{i=1}^M P(EFi)/P(F)
           = Σ_{i=1}^M P(E|Fi)P(Fi)/P(F) = P(E|Fi) = .339

since all the P(E|Fi) are equal and Σ_{i=1}^M P(Fi) = P(F).
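Both the bridge probability and the two-girls attribute formula are easy to evaluate numerically. The following Python sketch (added as an illustration; the attribute sizes 1, 7, 365 are just example values) computes them exactly.

    from math import comb

    # Example 4: P(East has exactly 3 of the 5 remaining spades | N-S hold 8 spades)
    p_bridge = comb(5, 3) * comb(21, 10) / comb(26, 13)
    print(round(p_bridge, 4))                 # ~ 0.3391

    # Two-girls puzzle with an n-valued attribute: (2n - 1) / (4n - 1)
    for n in (1, 7, 365):
        print(n, (2 * n - 1) / (4 * n - 1))   # 1/3 for n = 1, approaching 1/2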
Example 5 (Urn): An urn contains k blue balls and n − k red ones. If we draw the balls out one by one in random order (all n! orders equally likely), what is the chance that the first ball is blue? Distinguish the balls by labels 1, ..., k, k + 1, ..., n, with the first k corresponding to blue. Then there are k ways to make the first choice so that it is blue, and then (n − 1)! ways to make the remaining choices. On the other hand there are n! ways to make all n choices, without restrictions. Thus the desired probability is k(n − 1)!/n! = k/n, intuitively quite evident when we just focus on the first choice. The same argument works and gives the same answer when we ask: what is the chance that the last ball (or the ball in any given position) is blue?

Now suppose the urn contains b blue balls and r red balls. You randomly draw n balls out, one by one. Given that there are k blue balls among the n drawn, what is the chance that the first one is blue? It is exactly the same as in the previous setting, since the condition reduces the sample space to equally likely sequences of n balls with k blue ones and n − k red ones, i.e., it is as though we draw n balls from an urn with k blue and n − k red balls.

Example 6 (An Ace in Each Bridge Hand): When we deal out 4 hands of 13 cards each, what is the chance that each hand has an ace? Define the following four events:

    E1 = {the ace of spades is in any one of the hands}
    E2 = {the ace of spades and the ace of hearts are in different hands}
    E3 = {the aces of spades, hearts and diamonds are in different hands}
    E4 = {all four aces are in different hands}

Then the desired probability is

    P(E1 E2 E3 E4) = P(E1)P(E2|E1)P(E3|E1 E2)P(E4|E1 E2 E3) = (52/52) · (39/51) · (26/50) · (13/49) ≈ .1055

The fractions become clear by viewing the deal as though the aces are dealt one by one as the first 4 cards to the 52 positions (positions 1, ..., 13 making the first hand, positions 14, ..., 26 making the second hand, etc.). With that view 39/51 is the chance that the ace of hearts goes to one of the 39 positions out of the 51 open positions, where 39 counts the positions in the other hands, different from the hand where the ace of spades wound up. And so on. This reasoning assumes (correctly) that this way of dealing out cards makes all such hands equally likely. In a normal deal, clockwise one card to each player repeatedly from a shuffled deck, we could track which of the 52 slots the 4 aces got dealt into, then which slots the 4 kings got dealt into, and so on. This kind of tracking should make clear that the same kinds of hands are possible either way, all equally likely. In both cases the shuffled deck determines the random sequence of dealing, with the same result.

Another path involves just simple counting, without conditional probabilities. We can deal the cards in 52! orders. Assume that the first 13 cards go to player 1, the second 13 to player 2, etc. The ace of spades can land in any of the 52 deal positions, the ace of hearts has 39 positions left so that it lands in a different hand, etc. After the 4 ace positions in different hands have been determined, there are 48! ways to deal out the other cards. Thus

    P(E) = (52 · 39 · 26 · 13 · 48!) / 52! = (52 · 39 · 26 · 13) / (52 · 51 · 50 · 49)
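As a numerical cross-check of Example 6 (an illustration added here, with an arbitrary seed and trial count), the exact value can be compared against a small simulation that shuffles a deck with the four aces marked:

    import random

    exact = (52 * 39 * 26 * 13) / (52 * 51 * 50 * 49)
    print(round(exact, 4))                    # ~ 0.1055

    random.seed(2)                            # arbitrary seed
    deck = [1, 1, 1, 1] + [0] * 48            # 1 marks an ace, 0 any other card
    trials, hits = 200_000, 0
    for _ in range(trials):
        random.shuffle(deck)
        # hands are deal positions 0-12, 13-25, 26-38, 39-51
        if all(sum(deck[13 * h:13 * h + 13]) == 1 for h in range(4)):
            hits += 1
    print(hits / trials)                      # should be close to the exact value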
Example 7 (Tree Diagram: Rolling 5 Dice to Get Different Faces): 5 dice are rolled at most twice in order to achieve 5 distinct faces. On the second roll, the dice with duplicate faces are rolled again (one die is kept for each distinct face shown). What is the chance of the event D of finishing with 5 distinct faces?

First we work out the probabilities for the events E1, ..., E5, where Ei means that we get exactly i distinct faces on the first roll. We have (recall the birthday problem)

    p5 = P(E5) = (6 · 5 · 4 · 3 · 2)/6^5 = 720/6^5
    p4 = P(E4) = \binom{5}{2} · 6 · 5 · 4 · 3 / 6^5 = 3600/6^5
    p3 = P(E3) = [\binom{5}{3} · 6 · 5 · 4 + \binom{5}{1} · 6 · 3 · 5 · 4] / 6^5 = 3000/6^5
    p2 = P(E2) = [\binom{5}{1} · 6 · 5 + \binom{5}{2} · 6 · 5] / 6^5 = 450/6^5
    p1 = P(E1) = 6/6^5

Note that 720 + 3600 + 3000 + 450 + 6 = 7776 = 6^5, confirming a proper count in the disjoint sets. We only comment on P(E3), which can come about as triple, single, single, e.g., (4, 1, 3, 4, 4), or as two doubles and a single, e.g., (2, 4, 6, 6, 4). While the numerator count for the former should be clear, the count for the latter is obtained by choosing the position for the singleton and filling it in 6 possible ways, i.e., \binom{5}{1} · 6, then taking the leftmost free position as the leftmost position in the leftmost pair and combining that with the 3 choices for the rightmost position in that pair. The remaining slots define the other pair. These pairs are then filled with 5 and 4 respective face choices.

Lec7 ends

The probability can then be obtained by following all branches in the tree diagram below that lead to the event of interest, i.e., all 5 faces are distinct, multiplying the probabilities along each such branch and adding up all these products. The probability at each branch segment represents the conditional probability of traversing this segment, conditional on having arrived at the segment from the root node. Again, an application of the law of total probability.

[Tree diagram: the root branches to E5, E4, E3, E2, E1 with probabilities p5, ..., p1; from Ei the branch to D carries the conditional probability P(D|Ei) that the 5 − i re-rolled dice all show distinct new faces.]

    P(D) = P(E5)P(D|E5) + P(E4)P(D|E4) + P(E3)P(D|E3) + P(E2)P(D|E2) + P(E1)P(D|E1)
         = 720/6^5 + (3600/6^5)(2/6) + (3000/6^5)(6/6^2) + (450/6^5)(24/6^3) + (6/6^5)(120/6^4)
         = 88940/6^7 ≈ .3177

Bayes’ Formula

Example 8 (Insurance): In a population there are 10% accident prone people and the rest are not accident prone. An accident prone person has a 20% chance of having an accident in a given year, whereas for a normal person that chance is 10%. What is the chance that a randomly chosen person will have an accident during the next year? If F is the event that the chosen person is accident prone and A is the event that the chosen person will have an accident next year then

    P(A) = P(A|F)P(F) + P(A|F^c)P(F^c) = .2 · .1 + .1 · .9 = .11

If the chosen person had an accident within that year, what is the chance that the person is accident prone?

    P(F|A) = P(AF)/P(A) = P(A|F)P(F)/P(A) = P(A|F)P(F) / [P(A|F)P(F) + P(A|F^c)P(F^c)] = (.2 · .1)/.11 = 2/11

i.e., the chance has almost doubled, from 1/10 to 2/11. This is an instance of Bayes’ formula.

Example 9 (Multiple Choice Tests): What is the chance of the event K that the student knew the answer if the student answered the question correctly (event C)? Assume that there are m choices and the a priori chance of the student knowing the answer is p. When the student does not know the answer, it is chosen randomly.

    P(K|C) = P(KC)/P(C) = P(C|K)P(K) / [P(C|K)P(K) + P(C|K^c)P(K^c)] = 1 · p / [1 · p + (1/m)(1 − p)] = m · p / (1 − p + m · p)

Note: P(K|C) increases to 1 as m → ∞. With p = .5 and m = 4 we get P(K|C) = .8.
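Example 9 is easy to tabulate. The short sketch below (an added illustration; the values of m are arbitrary examples) evaluates P(K|C) = mp/(1 − p + mp) for p = .5, showing the value .8 at m = 4 and the increase toward 1 as m grows.

    def p_knew_given_correct(p, m):
        """P(K|C) for prior chance p of knowing the answer and m answer choices."""
        return m * p / (1 - p + m * p)

    for m in (2, 4, 10, 100):
        print(m, round(p_knew_given_correct(0.5, m), 4))   # 0.6667, 0.8, 0.9091, 0.9901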
Example 10 (Blood Test for Disease): A test is 95% effective on persons with the disease and has a 1% false alarm rate. Suppose that the prevalence of the disease in the population is .5%. What is the chance that a person actually has the disease (event D), given that the test is positive (event E)?

    P(D|E) = P(DE)/P(E) = P(E|D)P(D) / [P(E|D)P(D) + P(E|D^c)P(D^c)]
           = (.95 · .005) / (.95 · .005 + .01 · .995) = 95/294 ≈ .323        (Bayes’ formula)

    With P(E|D) = .99:   P(D|E) = (.99 · .005) / (.99 · .005 + .01 · .995) ≈ .3322
    With P(E|D) = 1:     P(D|E) = (1 · .005) / (1 · .005 + .01 · .995) ≈ .33445

The following long run type argument makes the surprising answer more transparent. Out of 1000 people, roughly 995 will have no disease, and about 10 of them will give a false positive E. 5 will have the disease and about all of them will give a true positive. 5/(10 + 5) ≈ .333. Such illustrations can counter the possible psychological damage arising from routine tests.

General Bayes Formula: Let F1, F2, F3, ..., Fn be mutually exclusive events whose union is S. Then

    P(E) = Σ_{i=1}^n P(EFi) = Σ_{i=1}^n P(E|Fi)P(Fi)        (law of total probability)

and hence

    P(Fj|E) = P(Fj E)/P(E) = P(E|Fj)P(Fj) / Σ_{i=1}^n P(E|Fi)P(Fi)        (Bayes’ formula)

So far we have seen it when S was split into two mutually exclusive events, e.g. F and F^c.

Lec8 ends

Example 11 (Three Cards): One card with both sides black (BB), one card with both sides red (RR) and one card with a red side and a black side (RB). The cards are mixed and one randomly selected card is randomly flipped and placed on the ground. You don’t see the flip. If the color facing up is red (Ru), what is the chance that it is RR?

    P(RR|Ru) = P(Ru|RR)P(RR) / [P(Ru|RR)P(RR) + P(Ru|BB)P(BB) + P(Ru|RB)P(RB)]
             = (1 · 1/3) / (1 · 1/3 + 0 · 1/3 + (1/2) · 1/3) = 2/3

Independence

Definition of Independence: Two events E and F are called independent if

    P(EF) = P(E)P(F)

otherwise they are called dependent.

Motivate through P(E|F) = P(E). This relationship appears one-sided, but it is symmetric if P(EF) > 0, i.e., P(F) > 0, P(E) > 0. The definition of independence does not require P(F) > 0. An event F with P(F) = 0 is always independent of any other event.

Example 12 (2 Dice): The number on the first die is independent of the number on the second die. Let F4 be the event that the first die is 4, S6 the event that the sum is 6 and S7 the event that the sum is 7. Is F4 independent of S6 (S7)?

    P(F4) = 1/6, P(S6) = 5/36, P(S7) = 1/6,
    P(F4 S7) = 1/36 = P(F4)P(S7),    P(F4 S6) = 1/36 ≠ P(F4)P(S6)

Example 13 (Cards): If we draw a card at random then the event A that the card is an ace is independent of the event C that the card is a club. However, this breaks down as soon as the king of diamonds is missing from the deck, but not when all kings are missing.

Theorem: Independence of E, F implies independence of E, F^c, of E^c, F and of E^c, F^c.

Example 14 (Independence in 3 Events?): If E is independent of F and also independent of G, is E then independent of FG? Not necessarily! In a throw of two dice let E be the event that the sum is 7, F be the event that the first die is a 4 and G be the event that the second die is a 3. Then

    P(E) = 1/6, P(F) = 1/6, P(G) = 1/6, P(EF) = 1/36, P(EG) = 1/36, P(FG) = 1/36

but

    P(EFG) = 1/36 ≠ (1/6)(1/36) = P(E)P(FG)

All events are pairwise independent but E and FG are not.
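The general Bayes formula translates directly into a small helper. The sketch below (an added illustration; the function name and argument layout are my own choices) recovers the answers of Examples 10 and 11.

    def bayes(prior, likelihood, j):
        """Return P(F_j | E) from the priors P(F_i) and the likelihoods P(E | F_i)."""
        total = sum(p * l for p, l in zip(prior, likelihood))   # law of total probability
        return prior[j] * likelihood[j] / total

    # Example 10 (blood test): F_0 = disease, F_1 = no disease, E = positive test
    print(bayes([0.005, 0.995], [0.95, 0.01], 0))               # ~ 0.323
    # Example 11 (three cards): F_0 = RR, F_1 = BB, F_2 = RB, E = red side up
    print(bayes([1/3, 1/3, 1/3], [1.0, 0.0, 0.5], 0))           # 2/3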
Definition of Independence of 3 Events: The 3 events E, F, and G are called independent if all of the following relations hold:

    P(EFG) = P(E)P(F)P(G)
    P(EF) = P(E)P(F),    P(EG) = P(E)P(G),    P(FG) = P(F)P(G)

If E, F and G are independent then E is independent of any event formed from F and G, e.g., E is then independent of F ∪ G, F^c ∪ G, FG, FG^c, etc.

Definition (Extension of Independence to Many Events): E1, E2, ..., En are called independent if

    P(E_{i1} E_{i2} ... E_{ik}) = P(E_{i1})P(E_{i2}) ··· P(E_{ik})

for any subcollection of events E_{i1}, E_{i2}, ..., E_{ik} (1 ≤ k ≤ n) taken from E1, E2, ..., En.

Subexperiments, Repeated Experiments: Often an experiment is made up of many subexperiments (say n) such that the outcome in each subexperiment is not affected by the outcomes in the other subexperiments. In such a case of "physical independence" we may then reasonably assume the (probabilistic) independence of the events E1, E2, ..., En, provided Ei is completely described by the outcomes in the ith subexperiment. If Si represents the sample space of the ith subexperiment and ei one of its typical outcomes, then an outcome e of the full experiment can be described by e = (e1, e2, ..., en) and its sample space is:

    S = S1 × S2 × ··· × Sn = {(e1, e2, ..., en) : e1 ∈ S1, e2 ∈ S2, ..., en ∈ Sn}.

If the sample spaces of the subexperiments are all the same and if the probability function defined on the events of each subexperiment is the same, then the subexperiments are called trials.

Note that by prescribing the probability function for each subexperiment and assuming independence of the events from these subexperiments, it is possible to construct a probability function on the events described in terms of the outcomes (e1, e2, ..., en) of the overall experiment such that it is consistent with the probability functions on the subexperiments. Assume this without proof.

Lec9 ends

Example 15 (Finite and Infinite Independent Trials): A finite number n or an infinite sequence of independent trials is performed. For each trial we distinguish whether a certain event E occurs or not. If E occurs we call the result a success, otherwise we call the result a failure. Let p = P(E) denote the probability of success in a single trial. Let E1 be the event of at least one success in the first n trials, Ek the event of exactly k successes in the first n trials, and E∞ the event that all trials are successes. Find P(E1), P(Ek) and P(E∞).

    P(E1) = 1 − P(E1^c) = 1 − (1 − p)^n
    P(Ek) = \binom{n}{k} p^k (1 − p)^{n−k}
    P(E∞) ≤ P(En) = p^n for all n  ⟹  P(E∞) = 0 for p < 1 and P(E∞) = 1 for p = 1
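For concreteness, the formulas of Example 15 are easy to evaluate. The short sketch below (an added illustration; n = 10 and p = .3 are arbitrary example values) computes the probability of at least one success and checks that the probabilities of exactly k successes add to one.

    from math import comb

    n, p = 10, 0.3
    p_at_least_one = 1 - (1 - p) ** n
    p_exactly = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    print(round(p_at_least_one, 4))       # 1 - 0.7^10 ~ 0.9718
    print(round(sum(p_exactly), 10))      # the binomial probabilities sum to 1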
Example 16 (Parallel System): A system is composed of n separate components (relays, artificial horizons in a cockpit) such that the system "works" as long as at least one of the components works. Such a system is called a parallel system. Suppose that during a given time period the chance that component i works (functions) is pi, i = 1, 2, ..., n, and assume that the functioning of a component does not depend on that of any of the other components, i.e. we may assume probabilistic independence of (failure) events pertaining to separate components. Let E denote the event that the system functions, i.e. at least one of the components functions during the given time period. Let Ei denote the event that component i functions. Then

    P(E) = 1 − P(E^c) = 1 − P(E1^c E2^c ··· En^c) = 1 − Π_{i=1}^n P(Ei^c) = 1 − Π_{i=1}^n (1 − pi)

thus achieving arbitrarily high reliability (= probability of functioning) through redundancy. This is how one can achieve, on paper, a probability of failure on the order of 10^{−9}, by having a triply redundant system with components of reliability .999 each ⟹ (.001)^3 = 10^{−9}. (Margin note: Air India litigation.)

Example 17 (Infinite Trials, E Before F): Suppose we perform a potentially infinite number of independent trials and for each trial we note whether an event E or an event F or neither of the two events occurs. It is assumed that E and F are mutually exclusive. Their respective probabilities of occurrence in any given trial are denoted by P(E) and P(F). What is the probability that E occurs before F in these trials?

Let G denote the event consisting of all those trial sequences, i.e. outcomes, in which the first E occurs before the first F. Let E1 denote the event that the first trial results in the event E, let F1 denote the event that the first trial results in F and N1 the event that neither E nor F occurs in the first trial. Then

    P(G) = P(GE1) + P(GF1) + P(GN1) = P(G|E1)P(E1) + P(G|F1)P(F1) + P(G|N1)P(N1)
         = 1 · P(E) + 0 · P(F) + P(G)(1 − P(E) − P(F))

i.e.

    P(G) = P(E) / (P(E) + P(F))

Discuss intuition.

P( · |F) is a Probability: The Power of the Axiomatic Approach! For fixed F with P(F) > 0 the function P(E|F) is a probability function defined for events E ⊂ S, i.e. it satisfies the 3 axioms of probability:

1. 0 ≤ P(E|F) ≤ 1
2. P(S|F) = 1
3. P(∪_{i=1}^∞ Ei|F) = Σ_{i=1}^∞ P(Ei|F) for mutually exclusive events E1, E2, E3, ....

Consequences: All the consequences derived from the original axiom set hold as well for the conditional probability function Q(E) = P(E|F), e.g.

    Q(E1 ∪ E2) = Q(E1) + Q(E2) − Q(E1 E2)    or    P(E1 ∪ E2|F) = P(E1|F) + P(E2|F) − P(E1 E2|F)

Also:

    Q(E|G) = Q(EG)/Q(G) = P(EG|F)/P(G|F) = [P(EGF)/P(F)] / [P(GF)/P(F)] = P(EGF)/P(GF) = P(E|GF)

and

    P(E|F) = Q(E) = Q(E|G)Q(G) + Q(E|G^c)Q(G^c) = P(E|FG)P(G|F) + P(E|FG^c)P(G^c|F)

Conditional Independence: Two events E1 and E2 are conditionally (given F) independent if

    Q(E1 E2) = P(E1 E2|F) = P(E1|F)P(E2|F) = Q(E1)Q(E2).

Conditional independence of E1 and E2 does not imply the (unconditional) independence of E1 and E2. Randomly pick one of two boxes, | • • • ◦ | and | • • ◦ |, and then draw 2 balls with replacement from that box. Then

    P(•1 •2 | box i) = P(•1 | box i) P(•2 | box i)    but    P(•1 •2) ≠ P(•1)P(•2)

since

    P(•i) = (1/2)(3/4) + (1/2)(2/3) = 17/24    and    P(•1 •2) = (1/2)(3/4)(3/4) + (1/2)(2/3)(2/3) = 145/288 ≠ (17/24)^2

Conversely, unconditional independence of E1 and E2 does not imply their conditional independence given F. Rolling two dice, let E1 = "sum of 7", E2 = "first die is a 6" and F = "at least one 6."

    P(E1 E2) = P(E1)P(E2) = 1/36    while    P(E1 E2|F) = 1/11 ≠ P(E1|F)P(E2|F) = (2/11)(6/11)

Lec10 ends

However, the independence of E1, E2 and F implies the conditional independence of E1 and E2 given F (exercise). This notion of conditional independence of two events can easily be generalized to a corresponding notion of conditional independence of three or more events, as was done for the unconditional case.
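The two-box calculation above can be verified with exact rational arithmetic. The following sketch (an added illustration) confirms that the draws are conditionally independent given the box but not unconditionally independent.

    from fractions import Fraction

    p_black_in_box = [Fraction(3, 4), Fraction(2, 3)]   # P(black) in box 1 and box 2
    p_box = Fraction(1, 2)                               # each box picked with chance 1/2

    p_black = sum(p_box * b for b in p_black_in_box)         # P(black on a single draw)
    p_both = sum(p_box * b * b for b in p_black_in_box)      # P(both draws black)
    print(p_black, p_both, p_black * p_black)                # 17/24, 145/288, 289/576
    print(p_both == p_black * p_black)                       # False: not independent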
Example 18 (Insurance Revisited): Let F be the event that a randomly selected person is accident-prone with P(F) = .1, let A1 be the event that this person has an accident in the first year and A2 the event that this person has an accident in the second year. An accident-prone person has chance .2 of having an accident in any given year whereas for a non-accident-prone person that chance is .1. 10% of the population is accident-prone. Finally we assume that A1 and A2 are conditionally independent given that an accident-prone (or non-accident-prone) person was selected, i.e. given F (or F^c). What is P(A2|A1)?

    P(A2|A1) = P(A2 A1)/P(A1) = [P(A2 A1|F)P(F) + P(A2 A1|F^c)P(F^c)] / P(A1)
             = [P(A2|F)P(A1|F)P(F) + P(A2|F^c)P(A1|F^c)P(F^c)] / P(A1)
             = (.2 · .2 · .1 + .1 · .1 · .9) / .11 = 13/110 ≈ .1182 ≠ P(A2) = .11

It also shows that conditional independence does not necessarily imply unconditional independence. Similarly one shows

    P(A3^c|A1^c A2^c) = P(A3^c A1^c A2^c) / P(A1^c A2^c)
                      = [P(A3^c A1^c A2^c|F)P(F) + P(A3^c A1^c A2^c|F^c)P(F^c)] / [P(A1^c A2^c|F)P(F) + P(A1^c A2^c|F^c)P(F^c)]
                      = (.8^3 · .1 + .9^3 · .9) / (.8^2 · .1 + .9^2 · .9) = .8919.

Compare this with P(A3^c) = .89 and P(A3^c|F) = .8 and P(A3^c|F^c) = .9.

The Gambler’s Ruin Problem

Two gamblers, A and B, with respective fortunes of i and N − i units keep flipping a coin (probability of heads = p). Each time a head turns up A collects one unit from B, each time a tail turns up B collects one unit from A. The game ends when one of the players goes broke, i.e. gets ruined. What is the probability of the event E that A is the ultimate winner?

Let Pi = P(E), the subscript i emphasizing the dependence on the fortune of player A. With H denoting the event of a head on the first toss, note that

    Pi = P(E) = P(E|H)P(H) + P(E|H^c)P(H^c) = pP(E|H) + (1 − p)P(E|H^c) = pP_{i+1} + (1 − p)P_{i−1}

    ⟹ pPi + (1 − p)Pi = pP_{i+1} + (1 − p)P_{i−1}    or    P_{i+1} − Pi = [(1 − p)/p](Pi − P_{i−1})

Using the boundary condition P0 = 0 we get

    P2 − P1 = [(1 − p)/p](P1 − P0) = [(1 − p)/p] P1
    P3 − P2 = [(1 − p)/p](P2 − P1) = [(1 − p)/p]^2 P1
    ...
    Pi − P_{i−1} = [(1 − p)/p](P_{i−1} − P_{i−2}) = [(1 − p)/p]^{i−1} P1
    ...
    PN − P_{N−1} = [(1 − p)/p](P_{N−1} − P_{N−2}) = [(1 − p)/p]^{N−1} P1

and adding the first i − 1 of these equations we get

    Pi − P1 = P1 { [(1 − p)/p] + [(1 − p)/p]^2 + ... + [(1 − p)/p]^{i−1} }

or

    Pi = P1 [1 − ((1 − p)/p)^i] / [1 − (1 − p)/p]  if (1 − p)/p ≠ 1, i.e. p ≠ 1/2,    and    Pi = iP1  if p = 1/2

Using PN = 1 we get

    P1 = [1 − (1 − p)/p] / [1 − ((1 − p)/p)^N]  if p ≠ 1/2    and    P1 = 1/N  if p = 1/2

and hence

    Pi = [1 − ((1 − p)/p)^i] / [1 − ((1 − p)/p)^N]  if p ≠ 1/2    and    Pi = i/N  if p = 1/2.

If Qi = probability that player B wins ultimately, starting with N − i units, we get by symmetry (B wins a unit with probability 1 − p):

    Qi = [1 − (p/(1 − p))^{N−i}] / [1 − (p/(1 − p))^N]  if 1 − p ≠ 1/2    and    Qi = (N − i)/N  if 1 − p = 1/2,

and note Pi + Qi = 1, i.e. the chance that play will go on forever is 0. See the illustrations on the last 4 pages.
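A direct transcription of the Pi formula (an added illustration; the sample values of i, N and p are arbitrary) can be used to reproduce the curves shown in the illustrations:

    def ruin_win_prob(i, N, p):
        """P_i: probability that A, starting with i of N total units, wins it all."""
        if p == 0.5:
            return i / N
        r = (1 - p) / p
        return (1 - r**i) / (1 - r**N)

    print(ruin_win_prob(5, 10, 0.5))      # fair game, equal fortunes: 0.5
    print(ruin_win_prob(5, 10, 0.6))      # a per-game edge for A helps a lot
    print(ruin_win_prob(50, 100, 0.49))   # ~ 0.12: a small per-game disadvantage hurts badly for large N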
Example 19 (Drug Testing): Two drugs are tested on pairs of patients, one drug per patient in a pair. Drug A has cure probability PA and drug B has cure probability PB. The test for each pair of patients constitutes a trial, and the score SA of A is increased by one each time drug A results in a cure of its patient. Similarly the score SB of B goes up each time B effects a cure. The trials are stopped as soon as SA − SB either reaches M or −M, where M is some predetermined number.

If we eliminate all those trials which result in no change of the score difference SA − SB, then the remaining trials are again independent, and the outcome of such a remaining trial is that SA − SB increases or decreases by one with probability

    P = PA(1 − PB) / [PA(1 − PB) + PB(1 − PA)]    or    1 − P = PB(1 − PA) / [PA(1 − PB) + PB(1 − PA)]

respectively. This is exactly the gambler’s ruin problem, where both players start out with M units, i.e. N = 2M. Note: SA − SB = M iff A had M more wins than B, i.e. A wins if both start with M betting units.

When PA ≠ PB, the probability that drug B comes out ahead ({B > A}) is:

    P({B > A}) = 1 − P({A > B}) = 1 − [1 − ((1 − P)/P)^M] / [1 − ((1 − P)/P)^{2M}] = 1 / (1 + γ^M)

where

    γ = P/(1 − P) = PA(1 − PB) / [PB(1 − PA)].

For PA = .6 and PB = .4 and M = 5 we get P({B > A}) = .017, and for M = 10 we get P({B > A}) = .0003. γ is also called the odds-ratio, of the odds PA/(1 − PA) over the odds PB/(1 − PB).

[Illustrations (4 pages of plots): for each starting-fortune ratio r = i/N ∈ {0.05, 0.25, 0.5, 0.95}, a plot of Pi, the probability that player A with capital i ruins player B with capital N − i, against p = P(A wins one unit per game) over the range .45 to .60, with curves for N = 100, 200, 500, 1000, 5000, 10000.]
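The closing numbers of Example 19 follow directly from the formula. A short sketch (an added illustration) of the reduction and the resulting probability:

    def prob_B_ahead(pA, pB, M):
        """Chance that drug B reaches a lead of M first (gambler's ruin with N = 2M)."""
        P = pA * (1 - pB) / (pA * (1 - pB) + pB * (1 - pA))   # per effective trial, A gains a point
        gamma = P / (1 - P)                                    # the odds-ratio
        return 1 / (1 + gamma**M)

    print(round(prob_B_ahead(0.6, 0.4, 5), 3))     # ~ 0.017
    print(round(prob_B_ahead(0.6, 0.4, 10), 4))    # ~ 0.0003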