Appendix: Conditional Probability

Conditional probability is updated probability. Conditioning is used to update a probability when some relevant information becomes available. It is accomplished by updating the sample space.

Let us start by reviewing the set operations. There are two binary set operations: union (denoted by ∪) and intersection (denoted by ∩). The union of two sets, A ∪ B, is the set consisting of all the elements in A or in B (or in both). Let us have an example. If X is the number of dots on the upper side of a rolled die, then the sample space for this random variable is S = {1, 2, 3, 4, 5, 6}. Let A = {2, 4, 6} and B = {5, 6}; then A ∪ B = {2, 4, 5, 6}, not {2, 4, 5, 6, 6}, because an element cannot be listed more than once in a set. The intersection of two sets, A ∩ B, is the set consisting of all the elements common to A and B. In the rolled-die example above, A ∩ B = {6}, not {6, 6}, which is not a set.

There is also a set operator that acts on a single set: the complement. If the set is D, then the complement of D is denoted Dᶜ or D′ and is the set consisting of all the elements that are not in D but are in the sample space. For example, with the rolled die, Aᶜ = {1, 3, 5} and Bᶜ = {1, 2, 3, 4}. Notice that, for any subset D of S (that is, every element of D is an element of S), D ∪ Dᶜ = S and D ∩ Dᶜ = φ (= { }, the empty set). So P(D ∪ Dᶜ) = P(S) = 1 and P(D ∪ Dᶜ) = P(D) + P(Dᶜ), because D ∩ Dᶜ = φ and P(D ∩ Dᶜ) = 0. This gives P(D) + P(Dᶜ) = 1, or P(D) = 1 - P(Dᶜ), which is useful when P(D) is difficult to find but P(Dᶜ) is easy to find. Generally, if one of them is difficult to find, the other is easy to find. By the way, these probability equations are equivalent to P(Dᶜ) = 1 - P(D) as well. Please verify all of this using A and Aᶜ in the rolled-die example. If the die is assumed to be balanced, then each sample point has the equal probability of 1/6 (so that P(A) = 1/2 and P(Aᶜ) = 1/2). Can you do it? Also, repeat the verification with B in the rolled-die example; this time P(B) = 1/3 and P(Bᶜ) = 2/3. Can you verify them?

Generally, P(D ∪ E) = P(D) + P(E) - P(D ∩ E). The subtraction of P(D ∩ E) is required because it is counted twice, once as part of P(D) and once as part of P(E), so one of the two copies must be subtracted to make the total right. See the left diagram below: two sets, D and E, are drawn as circles inside their sample space S, drawn as a rectangle, and the overlapping area of the two circles is D ∩ E. The elements of the sets are not given explicitly (unless needed) but are assumed to exist, so when the circles for two sets overlap, it is assumed that their intersection has at least one element (it is not empty). On the other hand, if there is no common element between two sets (that is, the sets are mutually exclusive), their circles are drawn without overlapping, as in the right diagram. By the way, these diagrams are called Venn diagrams, and they are very commonly used with sets.

[Venn diagrams: on the left, two overlapping circles D and E inside a rectangle S; on the right, two non-overlapping circles D and E inside a rectangle S.]

If the sets D and E are mutually exclusive, then P(D ∪ E) = P(D) + P(E) - P(D ∩ E) becomes P(D ∪ E) = P(D) + P(E), since P(D ∩ E) = P(φ) = 0. By the way, remember that P(A ∩ Aᶜ) = 0 always, because A and Aᶜ are mutually exclusive.
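If you want to check these rules on a computer, here is a small Python sketch of the rolled-die example. It assumes a balanced die, so every sample point has probability 1/6, and verifies the complement rule and the addition rule.

    from fractions import Fraction

    # Sample space for one roll of a balanced die; every point has probability 1/6.
    S = {1, 2, 3, 4, 5, 6}

    def prob(event):
        return Fraction(len(event), len(S))

    A = {2, 4, 6}   # an even number of dots
    B = {5, 6}      # five or six dots

    print(A | B)    # union A ∪ B: {2, 4, 5, 6}
    print(A & B)    # intersection A ∩ B: {6}
    print(S - A)    # complement Aᶜ: {1, 3, 5}

    # Complement rule: P(A) = 1 - P(Aᶜ)
    print(prob(A) == 1 - prob(S - A))                      # True
    # Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
    print(prob(A | B) == prob(A) + prob(B) - prob(A & B))  # True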
Now, let us go back to conditional probabilities.

The conditional probability of D given E is denoted P(D|E) and defined as P(D|E) = P(D ∩ E)/P(E), where E ≠ φ, since you cannot condition on an event that cannot occur and you cannot have 0 in the denominator. Note that P(φ) = 0, and a conditional probability must be a number between 0 and 1, both inclusive, like any other probability. A common mistake is to use P(D ∪ E) in the numerator. However, that would give a value of at least 1 (and greater than 1 whenever D is not contained in E), because D ∪ E contains E and so P(D ∪ E) ≥ P(E), and you know that cannot be right. Conditional or not, a probability is a number between 0 and 1, both inclusive. It is sometimes read as "the conditional probability of D on E"; so P(D ∩ E) is "on" P(E) in the fraction. However, it cannot be "on E" in the fraction, because you cannot divide by a set (division is for numbers).

"The conditional probability of D given E" means "the conditional probability of D given that E has occurred." Suppose we are interested in P(D), say, to make some important decision, and we receive the information that E has occurred. Then P(D) is obsolete and incorrect in light of this information. We should not make important decisions based on an old, incorrect probability, because that leads to incorrect decisions. So we need to update P(D) using the information that E has occurred.

P(D) is a probability with respect to S. In fact, P(D) is the conditional probability of D given S: P(D|S) = P(D ∩ S)/P(S) = P(D)/1 = P(D), since P(S) = 1 and D ∩ S = D (check this with the left diagram above). That is, any probability, including P(D), is a conditional probability on S. Now, S = E ∪ Eᶜ, and we have the information that Eᶜ is no longer possible (since E has occurred). So P(D) = P(D|S) is no longer correct. How do we correct it? Using E as the new sample space, you condition D on E. Actually, you must work with D ∩ E instead of D, since D = (D ∩ E) ∪ (D ∩ Eᶜ) and D ∩ Eᶜ is no longer possible; it behaves like φ because E has occurred. That is, we effectively have D = D ∩ E, and the relevant probability is P(D ∩ E), once we know that E has occurred. All of this is clearer with the left diagram above: since E has occurred, shrink S (the rectangle) to E (the circle), getting rid of Eᶜ, and shrink D to D ∩ E. Consequently, the probability of D is now the probability of D ∩ E with respect to E, no longer with respect to S. So the updated probability of D must be P(D ∩ E)/P(E), not P(D ∩ S)/P(S) any more, since E has occurred. This is what the two opening sentences, "Conditioning is used to update a probability when some relevant information becomes available" and "It is accomplished by updating the sample space," mean.

So, you are using some probability to make decisions. Then a piece of information becomes available telling you that some part of the sample space is no longer possible. The sample space and the probability you have been using for your decision-making are now obsolete and no longer correct. As a result, you update (correct) the probability and the sample space by shrinking the sample space to the part that is still possible. The resulting updated, correct probability is the conditional probability given the remaining part of the original sample space, and that remaining part is the updated, correct sample space.
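Here is a small Python sketch of the shrinking idea, using the rolled-die sample space from above. The particular events, D (an even number of dots) and E (at least four dots), are just illustrative choices: the definition P(D|E) = P(D ∩ E)/P(E) gives the same answer as counting directly inside the shrunken sample space E.

    from fractions import Fraction

    S = {1, 2, 3, 4, 5, 6}   # original sample space for a balanced die

    def prob(event):
        return Fraction(len(event), len(S))

    D = {2, 4, 6}   # illustrative event: an even number of dots
    E = {4, 5, 6}   # information received: at least four dots showed

    # Definition: P(D|E) = P(D ∩ E) / P(E)
    print(prob(D & E) / prob(E))           # 2/3

    # Same answer by shrinking the sample space from S to E and counting inside E
    print(Fraction(len(D & E), len(E)))    # 2/3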
Let us have some examples. You live in a big city where the chance of getting convicted of murder is only 10%, P(C) = 0.10, the chance of getting arrested for murder is 8%, P(A) = 0.08, and the chance of getting arrested and convicted of murder is 7%, P(A ∩ C) = 0.07. You just despise your mother-in-law. After examining these probabilities, you decide that it is worth risking the chance and whack her (oh, it feels so good – she had it coming!!). However, now you are arrested for the murder and sitting in the back of a police car. What is your chance of getting convicted of the murder? It is not 10%. You might think it is only 7%, because that probability is for the event of being arrested and convicted of the murder. However, P(A ∩ C) = 0.07 is based on the entire sample space, including "not getting arrested," Aᶜ. It must be updated by conditioning on A, which gives P(C|A) = P(A ∩ C)/P(A) = 0.07/0.08 = 0.875. Yes, you need a good lawyer.

Conditioning can also be used to obtain a predictive (speculative) probability: a conditional probability of A given that B will occur. That is, a conditional probability can be an updated probability, as in the last example, or a predictive probability. For instance, you have just come into big money by winning a big poker tournament and are wondering how to invest it. Of course, you decide not to invest in crude oil, because the chance of the crude oil price going up 40% within one month is only 0.0001, P(O40) = 0.0001. Also, the chance of the U.S. going to war in the Middle East is 0.00005, P(W) = 0.00005, and the chance of the U.S. going to war and the crude oil price going up 40% within one month is 0.00004, P(W ∩ O40) = 0.00004. Now, you have just talked to your college buddy who works for the Department of Defense, and he has told you that the U.S. is about to go to war in the Middle East. Then the chance of the crude oil price going up 40% is predicted by P(O40|W) = P(W ∩ O40)/P(W) = 0.00004/0.00005 = 0.8. With this predictive probability, you might want to invest your poker tournament winnings in crude oil for a month or so.

Now, by multiplying both sides of P(D|E) = P(D ∩ E)/P(E) by P(E) and swapping the left and right sides, you get P(D ∩ E) = P(E)*P(D|E). Some mathematicians and statisticians define the conditional probability of D given E to be any probability P(D|E) that satisfies this last equation, because it is equivalent to the original definition but does not require any condition such as E ≠ φ (the simpler, the better). Indeed, the last equation appears often in practice.

An example of P(D ∩ E) = P(E)*P(D|E) is the following. You draw two cards from a well-shuffled deck of 52 playing cards without replacement (that is, the first card drawn stays on the table and does not go back into the deck for the second draw). You are interested in the probability that the first card drawn is a black card, B1, and the second card drawn is also a black card, B2. That is, P(B1 ∩ B2) is of interest. This probability is computed as (26/52)*(25/51). The first factor, 26/52 = 1/2, is P(B1), and the second factor, 25/51, is P(B2|B1), since 25 of the remaining 51 cards are black once the first black card is out. That is, P(B1 ∩ B2) = P(B1)*P(B2|B1). By the way, can you find the probability that the first card drawn is a heart and the second card drawn is a red card?

Conditional probabilities are used in Bayes' Theorem and Bayesian statistics. If you are interested in them, you can find information about them at
http://en.wikipedia.org/wiki/Bayes'_theorem
http://en.wikipedia.org/wiki/Category:Bayesian_statistics
http://en.wikipedia.org/wiki/Bayesian_inference
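Here is a small Python sketch of the two-card calculation. The exact value follows from the multiplication rule; the simulation at the end is only a rough numerical check, with the deck reduced to its colors since only "black" versus "red" matters here.

    from fractions import Fraction
    import random

    # Multiplication rule for two cards drawn without replacement:
    # P(B1 ∩ B2) = P(B1) * P(B2|B1)
    p_b1 = Fraction(26, 52)             # 26 black cards among 52
    p_b2_given_b1 = Fraction(25, 51)    # 25 black cards left among 51
    print(p_b1 * p_b2_given_b1)         # 25/102, about 0.245

    # Rough simulation check: deal two cards many times, count double-black deals.
    deck = ["black"] * 26 + ["red"] * 26
    trials = 100_000
    hits = sum(random.sample(deck, 2) == ["black", "black"] for _ in range(trials))
    print(hits / trials)                # close to 25/102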
Independent Events: If the probability of one event remains the same before and after another event happens, then the second event has no effect on the first event (in terms of its probability). Such events are called independent events. That is, A and B are independent events if and only if P(A|B) = P(A) or P(B|A) = P(B). By the way, "if and only if" means a two-way implication.

If A is independent of B, that is, P(A|B) = P(A), then P(A|B) = P(A ∩ B)/P(B) = P(A). That is, P(A ∩ B)/P(B) = P(A), which is equivalent to P(A ∩ B) = P(A)*P(B). So, if P(A|B) = P(A), then P(A ∩ B) = P(A)*P(B). Now, this means P(B|A) = P(A ∩ B)/P(A) = P(B)*P(A)/P(A) = P(B), and we have P(B|A) = P(B) as well. This shows that B is independent of A if and only if A is independent of B. So we have the following: A and B are independent of each other if and only if P(A|B) = P(A) or P(B|A) = P(B). Equivalently, A and B are independent of each other if and only if P(A ∩ B) = P(A)*P(B).

Let us have some examples of independent (and dependent) events. Recall the example of drawing two cards from a well-shuffled deck of 52 cards. If the drawing is performed with replacement (that is, the card drawn first is shuffled back into the deck before the second draw), then the probability of both cards drawn being black is (26/52)*(26/52) = 1/4. That is, P(B1 ∩ B2) = P(B1)*P(B2). So B1 and B2 are independent events when the drawing is conducted with replacement, while they are not independent events when the drawing is conducted without replacement.

If a balanced die is rolled and a fair coin is tossed, then the events concerning the number of dots on the rolled die and the events concerning the side showing up on the tossed coin are independent of each other. For instance, let A be the event that the number of dots on the die is greater than 4 and H be the event that the side up is heads. Then P(A) = 2/6 = 1/3, P(H) = 1/2, and P(A ∩ H) = 2/12 = 1/6. P(A)*P(H) = (1/3)*(1/2) = 1/6 = P(A ∩ H). That is, A and H are independent events. This independence comes from the physical and operational independence of the die and the coin.

In the example with the murder of your mother-in-law, A and C are not independent events, since P(A ∩ C) = 0.07 is different from P(A)*P(C) = 0.08*0.10 = 0.008. In the example with the crude oil price, the two events O40 and W are not independent, because P(O40|W) = 0.8 and P(O40) = 0.0001 are different from each other.

Finally, there are some interesting questions. If A and B are independent, are A and Bᶜ independent? If A and B are independent, are Aᶜ and B independent? If A and B are independent, are Aᶜ and Bᶜ independent? You should be able to find the answers by checking whether the probabilities of the intersections can be factored into the individual probabilities or not. For instance, to answer whether A and Bᶜ are independent, you need to check whether P(A ∩ Bᶜ) is the same as P(A)*P(Bᶜ) or not, using P(A ∩ B) = P(A)*P(B). Venn diagrams are very helpful for this.
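Here is a small Python sketch that checks the die-and-coin example numerically by listing all 12 equally likely (die, coin) pairs; the event labels follow the example above, and the check is simply whether P(A ∩ H) equals P(A)*P(H).

    from fractions import Fraction
    from itertools import product

    # Joint sample space for a balanced die and a fair coin: 12 equally likely pairs.
    S = set(product({1, 2, 3, 4, 5, 6}, {"H", "T"}))

    def prob(event):
        return Fraction(len(event), len(S))

    A = {(d, c) for (d, c) in S if d > 4}      # more than four dots on the die
    H = {(d, c) for (d, c) in S if c == "H"}   # heads on the coin

    # Independence check: P(A ∩ H) should equal P(A) * P(H)
    print(prob(A & H), prob(A) * prob(H))      # 1/6 1/6
    print(prob(A & H) == prob(A) * prob(H))    # True

The same kind of check, with Aᶜ or Hᶜ obtained as S - A or S - H, can be used to explore the three questions above.

© Copyrighted by Michael Greenwich, 08/2012.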