Week 3: Conditional probabilities, Bayes' formula, Expected value of a random variable    WEEK 3 page 1

We recall our discussion of 5-card poker hands. Throughout, C(n,k) denotes the binomial coefficient "n choose k".

Example 13:
a) What is the probability of the event A that a 5-card poker hand contains one or more aces? This is the probability of the complement of the event that the hand contains no aces. Since P(no aces) = C(48,5)/C(52,5), we have
   P(A) = P(1 or more aces) = 1 − P(no aces) = [C(52,5) − C(48,5)] / C(52,5),
where the numerator C(52,5) − C(48,5) is the number of hands having 1 or more aces (i.e. not having zero aces). We could also have computed this by noting that the event "1 or more aces" is the union of the 4 disjoint events: exactly 1 ace (hence 4 non-aces chosen from the 48 non-aces), exactly 2 aces (hence 3 non-aces), exactly 3 aces, or exactly 4 aces. By our basic counting principles we then must have
   C(52,5) − C(48,5) = C(4,1)·C(48,4) + C(4,2)·C(48,3) + C(4,3)·C(48,2) + C(4,4)·C(48,1).

b) What is the probability of the event B that we are dealt a full house consisting of 3 aces and 2 kings? The probability is the number of ways to choose 3 aces from the 4 aces in the deck, times the number of ways to choose 2 kings from 4, over the total number of 5-card poker hands, or
   P(B) = C(4,3)·C(4,2) / C(52,5).

c) What is the probability of any full house (3 of one kind, 2 of another)? This is the same as in b) above except there are 13 ways to choose the kind we have 3 of and 12 ways to choose the kind we have 2 of (since it can't be the kind we already picked). So the probability is
   13·12·C(4,3)·C(4,2) / C(52,5).

d) What is the conditional probability P(B|A) of the full house in b) if we are told by the dealer, before he gives us our 5 cards, that our hand has at least 1 ace in it already? This is a conditional probability problem conditioned on the event A that our hand has at least one ace in it. In this case we saw in part a) that the number of such hands with one or more aces is the total number of hands minus the number having no aces, or C(52,5) − C(48,5) hands.
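The counting answers in parts a)–c) are easy to sanity-check numerically; here is a minimal sketch using the standard-library function math.comb (variable names are my own, not from the text):

```python
from math import comb

total = comb(52, 5)  # all 5-card hands

# a) P(at least one ace) = 1 - P(no aces)
p_no_aces = comb(48, 5) / total
p_one_or_more = 1 - p_no_aces

# the same count as a union of the 4 disjoint events "exactly a aces"
union = sum(comb(4, a) * comb(48, 5 - a) for a in range(1, 5))
assert union == comb(52, 5) - comb(48, 5)

# b) full house with 3 aces and 2 kings
p_aces_over_kings = comb(4, 3) * comb(4, 2) / total

# c) any full house: 13 kinds for the triple, 12 remaining kinds for the pair
p_full_house = 13 * 12 * comb(4, 3) * comb(4, 2) / total

print(round(p_one_or_more, 4), round(p_full_house, 6))
```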
These hands constitute the reduced sample space A for the conditional event problem (i.e. the event A that our hand has at least one ace in it). We can assume that our full house is then randomly selected from these hands, with each such hand being equally likely. This yields the number of full houses (aces over kings) found in the numerator of b) above, over the size of the reduced sample space, or
   P(B|A) = P(B∩A)/P(A) = C(4,3)·C(4,2) / [C(52,5) − C(48,5)].
To see the last equality, note that by dividing both the numerator and denominator above by the total number of poker hands C(52,5), this is the same as the answer in b) divided by the answer in a), i.e. it is equal to P(B)/P(A); and since the full house containing 3 aces (the event B) is contained in the event A that at least one ace occurs, we also have B = B∩A. This is how we will define conditional probability in general. That is, we have motivated the following definition.

Definition of conditional probability: for any two events A and B with P(A) non-zero, we define
   P(B|A) = P(B∩A)/P(A).
If we let p(B) = P(B|A), one checks that for fixed A the set function p satisfies the 3 axioms needed for it to be a probability measure. Namely it satisfies
1) 0 ≤ p(B) ≤ 1
2) p(S) = p(A) = P(A)/P(A) = 1 (i.e. since S∩A = A, the sample space S has measure 1)
3) countable additivity for p( ) follows directly from the same property of P( ).
Thus a conditional probability measure really is a probability measure.

The above definition can be re-written as the multiplication rule for conditional probabilities:
   P(A∩B) = P(A)·P(B|A).
For three events (and similarly for more than three) this takes the form
   P(A∩B∩C) = P(A)·P(B|A)·P(C|A∩B),
which can be verified by going back to the definition of conditional probabilities.
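The reduced-sample-space count and the ratio P(B)/P(A) really do agree for this example; a quick numerical check (again with math.comb):

```python
from math import comb

total = comb(52, 5)
p_A = (comb(52, 5) - comb(48, 5)) / total   # at least one ace
p_B = comb(4, 3) * comb(4, 2) / total       # 3 aces and 2 kings, so B is inside A

# counting directly in the reduced sample space of hands with an ace
p_B_given_A = comb(4, 3) * comb(4, 2) / (comb(52, 5) - comb(48, 5))

# ... matches the definition P(B|A) = P(B ∩ A)/P(A) = P(B)/P(A)
assert abs(p_B_given_A - p_B / p_A) < 1e-12
print(p_B_given_A)
```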
If we think of events which are ordered in time, this says that the probability of a sequence of events occurring may be written as a product where at each step of the product we condition the next event on the previous events that are assumed to have already occurred. For the three events above, first A occurs, then B occurs (given A already has), and then C occurs (given A and B have already occurred).

Example 1: To calculate the probability of selecting 3 aces in a row from a randomly shuffled 52-card deck, we have done such problems directly by a counting argument. Namely, the number of ways to do this is the number of ways to choose 3 aces from the 4 in the deck, and to get the probability we divide by the number of ways to select 3 cards from 52. I.e.
   P(select three aces in a row from a randomly shuffled 52-card deck) = C(4,3)/C(52,3) = (4·3·2)/(52·51·50).
We could also obtain the same result using the multiplication rule for conditional probabilities: letting A_i be the event that the i-th card selected is an ace, we have
   P(A_1 ∩ A_2 ∩ A_3) = P(A_1)·P(A_2 | A_1)·P(A_3 | A_1∩A_2) = (4/52)·(3/51)·(2/50).
That is, for the first ace there are 4 ways (aces) to choose an ace out of 52 equally likely cards to select; but having chosen an ace, there are now 3 ways to choose an ace from the remaining 3 aces left in the 51 equally likely cards remaining; and finally 2 ways to choose the third ace out of the remaining 2 aces in the 50 equally likely cards remaining, given that an ace was obtained in each of the previous two selections.

Example 2 (problem 3.70 b) from the text): During the month of May in a certain town, the probability of the event R_{k+1} that day k+1 is rainy, given R_k that day k was, is .80. That is, assuming days are either rainy or sunny (not rainy),
   P(R_{k+1} | R_k) = .80.
This implies P(S_{k+1} | R_k) = .20 is the probability of a sunny day given the previous day was rainy.
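Both computations in Example 1, the counting argument and the chain of conditional probabilities, give the same number; a small check:

```python
from math import comb

# counting argument: C(4,3) / C(52,3)
p_count = comb(4, 3) / comb(52, 3)

# multiplication rule: P(A1) * P(A2 | A1) * P(A3 | A1 ∩ A2)
p_chain = (4 / 52) * (3 / 51) * (2 / 50)

assert abs(p_count - p_chain) < 1e-15
print(p_count)  # 1/5525 as a fraction
```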
We are also told that P(R_{k+1} | S_k) = .60 is the probability that the next day will be rainy given the previous was sunny. What then is the probability that for some 5 consecutive days in May in this town, a rainy day is followed by two more rainy days, then a sunny day, and then a rainy day? We are not told the probability that a particular day is rainy, so we must interpret this as the probability that, given the first day is rainy, the second and third are too, the fourth is sunny, and the fifth is rainy.

By the multiplication rule for conditional probabilities we have
   P(R_5 S_4 R_3 R_2 | R_1) = P(R_5 | S_4 R_3 R_2 R_1) · P(S_4 | R_3 R_2 R_1) · P(R_3 | R_2 R_1) · P(R_2 | R_1).
Now we use the property (the so-called Markov property) that the next event only depends on the previous one and nothing earlier (one says that the Markov chain has a "memory of one") to get
   = P(R_5 | S_4) · P(S_4 | R_3) · P(R_3 | R_2) · P(R_2 | R_1) = .6 · .2 · .8 · .8 = .0768
for our desired conditional probability.

Remark: A Markov chain generalizes this example to the case where there may be more than the two states rainy or sunny. For example, in the finite state space case for a 5-state chain, the chain is described by a 5 by 5 transition matrix of conditional probabilities. For example, the row 1, column 3 entry of the matrix gives the conditional probability of going to state 3 given that the previous state was state 1. If we want to know the probability of going from state 2 to state 5 in 4 time steps, we look at the (2,5) entry of the fourth power of the transition matrix. If one wants to remember the last two states (memory of two), there is a trick one uses. This situation can still be described by a Markov chain where we now enlarge the state space from 5 states to all 25 pairs of states (which we could label as states 1 through 25). Now the transition matrix is a 25 by 25 matrix of 625 transition probabilities.
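The rainy/sunny computation, and the remark about powers of the transition matrix, can be sketched in a few lines. For illustration the 4-step example below uses the two-state rain/sun matrix rather than the 5-state chain of the remark:

```python
# States: 0 = rainy, 1 = sunny.  T[i][j] = P(next day is state j | today is state i).
T = [[0.80, 0.20],
     [0.60, 0.40]]

# Probability of the path R -> R -> R -> S -> R, given day 1 is rainy:
p_path = T[0][0] * T[0][0] * T[0][1] * T[1][0]
assert abs(p_path - 0.0768) < 1e-12

def mat_mult(A, B):
    """Multiply two small square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# n-step transition probabilities come from the n-th power of T;
# here, the (rainy, sunny) entry of the 4th power.
T4 = T
for _ in range(3):
    T4 = mat_mult(T4, T)
print(T4[0][1])  # P(sunny in 4 days | rainy today)
```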
Similarly, if we want to remember the previous three states, our enlarged state space would then consist of all 125 triples of states one through 5 (so 5 cubed or 125 states), etc. To learn more, consider taking Math 632 Introduction to Stochastic (random) Processes.

Independent events: Intuitively, what we mean when we say two events A and B are independent, such as two consecutive flips of a fair coin, is that being told that event B has occurred (the first flip yielded a head, say) should not influence the probability of A occurring (the second flip is a tail, say), or in symbols P(A|B) = P(A). Equivalently, by the definition of conditional probability, this is true if and only if
   P(A∩B) = P(A)·P(B)    ( <-- pairwise independence )
i.e. the probability of the intersection of the two events factors as the product of the probabilities of the individual events. More generally, we say a collection of events is independent if the probability of the intersection of the events in any sub-collection of two or more of the events factors as the product of the probabilities of the individual events:
   P(A_{i_1} ∩ A_{i_2} ∩ ... ∩ A_{i_k}) = P(A_{i_1}) · P(A_{i_2}) · ... · P(A_{i_k}).
It is possible for three events to fail to be independent (not independent = dependent) even when any two of the events are pairwise independent.

Example 3 (three events which are dependent, but which are pairwise independent): For a simple example of this, consider flipping a fair coin twice. Let
   A = the first flip yields a head = {HH, HT},
   B = the second flip yields a head = {TH, HH},
   C = exactly one head occurs in the two flips = {TH, HT}.
Since the intersection of the three events is the empty set, which has probability 0, we have
   P(A∩B∩C) = P(∅) = 0 ≠ P(A)·P(B)·P(C) = 1/8,
since the individual events each have probability P(A) = P(B) = P(C) = 1/2, so their product is 1/8. Since 0 is not equal to 1/8, A, B, and C are not independent events.
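The pairwise-but-not-mutually-independent events of Example 3 can be verified by brute force over the four equally likely outcomes:

```python
from itertools import product

outcomes = [f1 + f2 for f1, f2 in product("HT", repeat=2)]  # HH, HT, TH, TT
A = {o for o in outcomes if o[0] == "H"}           # first flip is a head
B = {o for o in outcomes if o[1] == "H"}           # second flip is a head
C = {o for o in outcomes if o.count("H") == 1}     # exactly one head

def P(event):
    """Probability under the uniform measure on the 4 outcomes."""
    return len(event) / len(outcomes)

# pairwise independence holds for every pair ...
for X, Y in [(A, B), (B, C), (A, C)]:
    assert P(X & Y) == P(X) * P(Y) == 0.25

# ... but triple independence fails: the triple intersection is empty
assert P(A & B & C) == 0.0 != P(A) * P(B) * P(C)   # 0 versus 1/8
print("pairwise independent, but not mutually independent")
```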
But we claim the probability of the intersection of any two of these events is
   P(A∩B) = P(B∩C) = P(A∩C) = 1/4,
so these are pairwise independent events, since clearly P(A∩B) = P(A)·P(B), etc.

Example 4: Consider 5 rolls of a fair (six-sided) die. The probability of rolling a 3 on any particular roll is 1/6, while the probability of not rolling a 3 is 5/6 by the probability of the complement. Find the probability of rolling exactly two 3's in 5 rolls. Using independence, the probability of any particular sequence of 5 rolls, which we view as 5 independent events, two of which involve rolling a 3 and the other three of which involve rolling anything else, is the product of the individual probabilities: 1/6 times 1/6 times 5/6 times 5/6 times 5/6. But there are 5 choose 2 ways that we could have selected the particular two rolls in which the 3 occurred. Thus by the sum rule for probabilities of disjoint events:
   P(two threes in 5 rolls) = C(5,2) · (1/6)^2 · (5/6)^3.
This is an example of a binomial random variable where the probability of success (a 3 is rolled) for each trial (roll) is p = 1/6 and the probability of failure is (1−p) = 5/6. Similar reasoning gives that the probability of exactly k successes in n independent trials, each having success probability p, is the binomial probability
   b(k; n, p) = P(exactly k successes in n independent trials each having success probability p) = C(n,k) · p^k · (1−p)^(n−k).

Example 5: Consider a system where parts 1 and 2 operate independently of one another, each with failure probability .1, but are running in parallel, so that the system diagram looks like

             _____
       _____|  1  |_____
      /     |_____|     \
  ----                   ----
      \      _____      /
       \____|  2  |____/
            |_____|

Letting A = {component 1 operates successfully}, B = {component 2 operates successfully}, then
   P(system succeeds) = P(either 1 or 2 operates) = P(A∪B) = P(A) + P(B) − P(A)·P(B) = .9 + .9 − .9·.9 = .99,
so the combined system is operational 99% of the time.
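The binomial probability b(k; n, p) from Example 4, and the parallel-system reliability of Example 5, can both be computed in a short sketch:

```python
from math import comb

def binom_prob(k, n, p):
    """b(k; n, p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 4: exactly two 3's in 5 rolls of a fair die
p_two_threes = binom_prob(2, 5, 1 / 6)
assert abs(p_two_threes - comb(5, 2) * (1 / 6) ** 2 * (5 / 6) ** 3) < 1e-15

# Example 5: two components in parallel, each failing with probability .1;
# the system fails only if both components fail
p_fail = 0.1
p_parallel = 1 - p_fail**2
assert abs(p_parallel - (0.9 + 0.9 - 0.9 * 0.9)) < 1e-12

print(round(p_two_threes, 4), p_parallel)
```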
A slightly more complicated example involves a similar system like

             _____     _____
       _____|  1  |___|  3  |_____
      /     |_____|   |_____|     \
  ----                             ----
      \      _____                /
       \____|  2  |______________/
            |_____|

where now, with event C = {component 3 operates successfully} independent of A and B,
   P(system succeeds) = P((A∩C) ∪ B) = P(A∩C) + P(B) − P(A∩B∩C) = (.9)(.9) + .9 − (.9)(.9)(.9) = .981,
using independence of the three events and assuming component 3 also fails with probability .1. The success probability is slightly lower than before, since now both components 1 and 3 must work properly for the top series path to work.

Example 6 (Problem 3.75): A tree diagram for conditional probabilities is a useful device. Figure 3.16 of the text, used for exercise 3.75, shows a tree whose first branching is to B (probability .4) or B' (probability .6), and whose second branching from each of these is to A or A'. The given branch labels are P(A|B) = .30 and P(A'|B') = .20; filling in the remaining branches by the law of probability for the complement gives P(B') = .6, P(A'|B) = .70, and P(A|B') = .80. In summary, the filled-in branches are
   B  (.4):  A (.30),  A' (.70)
   B' (.6):  A (.80),  A' (.20)

a) P(A) = P(B)·P(A|B) + P(B')·P(A|B') = (.4)(.3) + (.6)(.8) = .60.

b) P(B|A) = P(A∩B)/P(A) = P(B)·P(A|B)/P(A) = (.4)(.3)/.60 = 1/5 = .20, by part a), using the multiplication rule for conditional probabilities. Note that the original diagram gave us P(A|B), and what was wanted in part b) was to reverse the order of conditioning, that is, to find P(B|A). This is the situation where Bayes' theorem applies. One has a collection of mutually exclusive (disjoint) events which exhaust the sample space. In this case the sample space is a disjoint union of B and B'.

c) Similarly, P(B|A') = P(B)·P(A'|B)/P(A') = (.4)(.7)/.40 = .70.

Part a) is referred to as the rule of total probability (or rule of elimination). It is used to get the denominator in part b). Parts b) and c) are known as Bayes' theorem.
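The tree-diagram computation of Example 6 is just the rule of total probability followed by Bayes' theorem; numerically:

```python
# Givens from the tree of exercise 3.75: P(B), P(A|B), P(A'|B')
p_B = 0.4
p_A_given_B = 0.30
p_notA_given_notB = 0.20

# Complements fill in the remaining branches
p_notB = 1 - p_B                        # .6
p_A_given_notB = 1 - p_notA_given_notB  # .8
p_notA_given_B = 1 - p_A_given_B        # .7

# a) rule of total probability
p_A = p_B * p_A_given_B + p_notB * p_A_given_notB
assert abs(p_A - 0.60) < 1e-12

# b) Bayes' theorem: reverse the order of conditioning
p_B_given_A = p_B * p_A_given_B / p_A
assert abs(p_B_given_A - 0.20) < 1e-12

# c) the same with A' in place of A
p_B_given_notA = p_B * p_notA_given_B / (1 - p_A)
assert abs(p_B_given_notA - 0.70) < 1e-12

print(p_A, p_B_given_A, p_B_given_notA)
```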
To get the rule of total probability, we note that a disjoint partition of the sample space S, such as B ∪ B' = S, also partitions any set A = (A∩B) ∪ (A∩B') as a disjoint union. Since the union is disjoint, the third axiom of probability gives
   P(A) = P(A∩B) + P(A∩B').
We then re-write each probability on the right via the multiplication rule for conditional probabilities (essentially the definition of conditional probability). This gives the rule of total probability, which gives the denominator in Bayes' theorem.

More generally, if E_1 ∪ E_2 ∪ ... ∪ E_k = S is a disjoint union which exhausts the sample space, then Bayes' theorem says
   P(E_l | D) = P(E_l)·P(D|E_l) / Σ_{j=1}^{k} P(E_j)·P(D|E_j).
The probabilities P(E_l), l = 1, ..., k, are called the priors (which reflect our best knowledge before the experiment D). Then we collect some new data and update these to get the posterior probabilities P(E_l | D). Certain types of probability distributions have the property that the prior and posterior both have a similar form, except that certain real-number parameters characteristic of the distribution change in ways which are easy to compute. Then if we get some new data in, we can regard the old posterior as the new prior and use the new data to update to get the new posterior, and so on.

Example 7: An Ace Electronics dealer sells 3 brands of televisions: 50% are the first brand, which are Italian made; 30% are the second brand, which are French made; and 20% are the third brand, which are Swiss made. Each TV comes with a 1-year warranty. We know that 25% of the Italian TVs will require repair work under warranty, 20% of the French will require repairs under warranty, and 10% of the Swiss TVs will need repair under warranty.
a) What is the probability that a randomly selected customer has bought an Italian TV that will need repairs under warranty?
b) What is the probability that a randomly selected customer has a TV that will need repairs under warranty?
c) If the customer returns to the store with a TV that needs repairs under warranty, what is the probability that it is an Italian TV? French? Swiss?

Let A be the event that a randomly selected TV is Italian, B that it is French, C that it is Swiss, and R that it will need repairs under warranty. We are given that
   P(A) = .50,  P(B) = .30,  P(C) = .20,  P(R|A) = .25,  P(R|B) = .20,  P(R|C) = .10.

Part a) asks for P(A∩R) = P(A)·P(R|A) = .50·.25 = .125 = 1/8, by the product rule for conditional probabilities.

Part b) wants P(R), which uses the rule of total probability:
   P(R) = P(R∩A) + P(R∩B) + P(R∩C) = P(A)P(R|A) + P(B)P(R|B) + P(C)P(R|C) = .125 + .060 + .020 = .205.

Part c) asks for P(A|R) = P(A)P(R|A)/P(R) = .125/.205 ≈ .61, using Bayes' theorem; and similarly P(B|R) = .060/.205 ≈ .29 and P(C|R) = .020/.205 ≈ .10, which could also be obtained as
   1 − P(A|R) − P(B|R) = 1 − .61 − .29 = .10.

Example 8: Suppose a certain diagnostic test is 98% accurate both on those who do and those who don't have a disease. If .3% = .003 of the population has cancer, find the probability that a randomly selected tested person has cancer (C), given that the test is positive (+). The given information says
   P(C) = .003,  P(+|C) = .98 = P(−|C'),
from which we deduce that
   P(C') = .997,  P(+|C') = .02 = P(−|C).
Then by Bayes' theorem,
   P(C|+) = P(+|C)·P(C) / [P(+|C)·P(C) + P(+|C')·P(C')] = (.98×.003) / (.98×.003 + .02×.997) ≈ 3/23 ≈ .13.
Thus the posterior probability of having the disease (given a positive test result) is about 130 chances out of 1000, up from the original prior probability of .003 (3 chances in 1000) prior to testing. A positive test result only gives about a 13% chance of having the disease, due to the fact that the chance of having it in the population as a whole is so small.

Expectation of a random variable (also called its expected value or mean value): Consider the following game: we flip a fair coin 5 times.
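Example 8's base-rate effect is worth computing once yourself; a minimal sketch (the function name and parameters are my own labels for the quantities in the example):

```python
def posterior(prior, sens, spec):
    """P(disease | positive test) via Bayes' theorem.
    sens = P(+ | disease), spec = P(- | no disease)."""
    p_pos = sens * prior + (1 - spec) * (1 - prior)  # rule of total probability
    return sens * prior / p_pos

p = posterior(prior=0.003, sens=0.98, spec=0.98)
assert abs(p - 0.98 * 0.003 / (0.98 * 0.003 + 0.02 * 0.997)) < 1e-12
print(round(p, 3))  # about 0.13: even a positive test leaves the disease unlikely
```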
If a head occurs for the first time on the j-th flip, the game pays us winnings W = W(ω) = a_j = 2^j dollars for 1 ≤ j ≤ 5, where ω is a particular outcome of the experiment, that is, a sequence of 5 heads or tails such that the first head occurs on the j-th flip; using independence of flips, this happens with probability
   p_j = P(ω : W(ω) = a_j) = 1/2^j   (= P(W = a_j) for short).
If no heads occur in 5 flips, we'll take j = 6, so that the game pays us the grand prize of W = 2^6 = $64, and this occurs with probability 1/32. Now the 64-dollar question is: how much are you willing to pay to play this game so that on average you will break even?

If we play the game a large number of times n, then by the relative frequency interpretation of probabilities, the number of times s_j that we won amount a_j dollars (successes s_j in winning the j-th amount) is approximately n·p_j (equivalently, the probability p_j ≈ s_j/n is approximately the relative fraction of times we won amount a_j). Thus if we play the game independently many times, winning amount W_k on the k-th time that we played, our long-term average or sample mean winnings is approximately
   (Σ_{k=1}^{n} W_k) / n ≈ (Σ_j a_j·n·p_j) / n,
or, canceling the n's and denoting our average winnings by E[W] (the expected value of the random variable W), we find for the definition of the expectation of any discrete random variable W taking possible values a_j (from a countable set of values) with probabilities p_j:
   E[W] = Σ_j a_j·p_j = Σ_j a_j·P(W = a_j).
In our particular example the sum over j is for 1 ≤ j ≤ 6, and this gives
   E[W] = 2·(1/2) + 4·(1/4) + 8·(1/8) + 16·(1/16) + 32·(1/32) + 64·(1/32) = 1+1+1+1+1+2 = $7.
Thus we should be willing to pay $7 each time we play the game, assuming we play more than 32 times, long enough to win the grand prize, which we expect to happen around 1/32 of the time. The situation is a little different if instead of stopping at the 5th flip we flip the coin 15 times, with grand prize 2^16 dollars.
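The $7 answer can be confirmed by enumerating the six outcomes, and a Monte Carlo simulation (a sketch, not part of the text's argument) shows the long-run average converging to it:

```python
import random

# Exact expectation: first head on flip j pays 2**j (j = 1..5);
# no head in 5 flips pays 2**6 = 64, with probability 1/32.
payoffs = {j: 2**j for j in range(1, 6)}
probs = {j: (1 / 2) ** j for j in range(1, 6)}
payoffs[6], probs[6] = 64, 1 / 32

expected = sum(payoffs[j] * probs[j] for j in payoffs)
assert expected == 7.0

# Monte Carlo check: the sample mean should be near $7 for large n
random.seed(0)

def play():
    """One round of the game: flip up to 5 times, pay on the first head."""
    for j in range(1, 6):
        if random.random() < 0.5:  # heads
            return 2**j
    return 64                      # grand prize

n = 200_000
print(sum(play() for _ in range(n)) / n)
```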
It is easy enough to calculate the expectation in this case using the above formula. But realistically, not all of us would want to wait around on average 2^15 plays, which could take several years, before we see the rare event of winning the grand prize, which occurs with probability 1/2^15. Economists speak of a utility function, which describes how much playing such a game is worth to us personally and which may vary from person to person depending on our tastes in gambling and how much we are willing to risk.

In one of the homework problems (3.90) involving expected values, a company pays some per-unit cost of C dollars and sells the item at a per-unit sales price S. If a fixed number k of items are stocked for the day, then the cost of the k units is a fixed amount k·C. The demand (how many units customers desire to purchase that day) is assumed to be a random variable, where p_j gives the probability that the demand equals j units that day, and the sales revenue resulting from a given demand j is then a random variable X given by
   X = S·j if j ≤ k−1,  else X = S·k if j ≥ k.
(I.e. the actual sales cannot exceed the number in stock.) Then the expected profit is the expected sales revenue minus the fixed cost, or
   E[P] = E[X] − C·k = Σ_{j=0}^{k−1} p_j·S·j + (Σ_{j≥k} p_j)·S·k − C·k.
That is, for demand j less than the number k in stock, our sales revenue for the j units sold is S·j, and this occurs with probability p_j; while if the demand is greater than or equal to the number k in stock, we sell k items for revenue S·k, and this occurs with probability Σ_{j≥k} p_j. Note we could have re-written the fixed cost k·C = k·C·Σ_j p_j (since the sum of the probabilities equals 1), and then the above is equivalent to the expected profit where the profit is the random variable X − C·k, which is the sales revenue random variable minus the fixed cost for k items in stock.
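The expected-profit formula can be turned into a small function. The demand distribution, price, and cost below are made-up numbers for illustration only, not the data of problem 3.90:

```python
def expected_profit(k, S, C, demand_probs):
    """E[profit] for stocking k units at unit cost C and unit price S.
    demand_probs[j] = P(demand == j); sales are capped at the k units in stock,
    so revenue for demand j is S * min(j, k)."""
    expected_sales = sum(p * S * min(j, k) for j, p in demand_probs.items())
    return expected_sales - C * k

# Hypothetical demand distribution (illustration only)
demand = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.25, 4: 0.15}
assert abs(sum(demand.values()) - 1) < 1e-12

for k in range(5):
    print(k, round(expected_profit(k, S=10, C=6, demand_probs=demand), 2))
```

With these made-up numbers, stocking k = 2 maximizes the expected profit: beyond that, the fixed cost of the extra stock outweighs the expected extra sales.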