Ismor Fischer, 5/29/2012

3.2 Conditional Probability and Independent Events

Using population-based health studies to estimate probabilities relating potential risk factors to a particular disease, evaluate efficacy of medical diagnostic and screening tests, etc.

Example: Events: A = “lung cancer”, B = “smoker”

[Venn diagram: events A and B within sample space S; the four region probabilities are those in the table below.]

                    Disease Status
Smoker      Lung cancer (A)   No lung cancer (Aᶜ)   Total
Yes (B)          0.12               0.04            0.16
No (Bᶜ)          0.03               0.81            0.84
Total            0.15               0.85            1.00

Probabilities: P(A) = 0.15, P(B) = 0.16, P(A ∩ B) = 0.12

Definition: Conditional Probability of Event A, given Event B (where P(B) ≠ 0)

    P(A | B) = P(A ∩ B) / P(B)

Here, P(A | B) = 0.12 / 0.16 = 0.75 >> 0.15 = P(A).

Comments:

P(B | A) = P(B ∩ A) / P(A) = 0.12 / 0.15 = 0.80, so P(A | B) ≠ P(B | A) in general.

The general formula can be rewritten: P(A ∩ B) = P(A | B) × P(B) ← IMPORTANT

Example: P(Angel barks) = 0.1, P(Brutus barks) = 0.2, P(Angel barks | Brutus barks) = 0.3. Therefore, P(Angel and Brutus bark) = 0.3 × 0.2 = 0.06.

Example: Suppose that two balls are to be randomly drawn, one after another, from a container holding four red balls and two green balls. Under the scenario of sampling without replacement, calculate the probabilities of the events A = “First ball is red”, B = “Second ball is red”, and A ∩ B = “First ball is red AND second ball is red”. (As an exercise, list the 6 × 5 = 30 outcomes in the sample space of this experiment, and use “brute force” to solve this problem.)

[Figure: container holding balls R1, R2, R3, R4, G1, G2]

This type of problem, known as an “urn model”, can be solved with the use of a tree diagram, where each branch of the “tree” represents a specific event, conditioned on a preceding event. The product of the probabilities of all such events along a particular sequence of branches is equal to the corresponding intersection probability, via the previous formula.
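The brute-force exercise suggested above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the ball labels follow the figure, while the helper name `prob` and the use of exact fractions are choices made here.

```python
from fractions import Fraction
from itertools import permutations

# Six labeled balls: four red, two green (labels as in the figure).
balls = ["R1", "R2", "R3", "R4", "G1", "G2"]

# Ordered draws of two distinct balls: 6 * 5 = 30 equally likely outcomes.
outcomes = list(permutations(balls, 2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favorable = sum(1 for first, second in outcomes if event(first, second))
    return Fraction(favorable, len(outcomes))

def is_red(ball):
    return ball.startswith("R")

p_a = prob(lambda first, second: is_red(first))                          # A: first ball red
p_b = prob(lambda first, second: is_red(second))                         # B: second ball red
p_a_and_b = prob(lambda first, second: is_red(first) and is_red(second)) # A and B
p_b_given_a = p_a_and_b / p_a                                            # definition of P(B | A)

print(len(outcomes), p_a, p_b, p_a_and_b, p_b_given_a)
# 30 2/3 2/3 2/5 3/5
```

Counting outcomes directly gives the same conditional probability P(B | A) = 3/5 that one obtains by reasoning about the five balls left after a red first draw.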
In this example, we obtain the following values:

1st draw: P(A) = 4/6
    2nd draw: P(B | A) = 3/5, so P(A ∩ B) = 12/30 (boxed)
    2nd draw: P(Bᶜ | A) = 2/5, so P(A ∩ Bᶜ) = 8/30
1st draw: P(Aᶜ) = 2/6
    2nd draw: P(B | Aᶜ) = 4/5, so P(Aᶜ ∩ B) = 8/30 (boxed)
    2nd draw: P(Bᶜ | Aᶜ) = 1/5, so P(Aᶜ ∩ Bᶜ) = 2/30

We can calculate the probability P(B) by adding the two “boxed” values above, i.e., P(B) = P(A ∩ B) + P(Aᶜ ∩ B) = 12/30 + 8/30 = 20/30, or P(B) = 2/3.

This last formula, which can be written as P(B) = P(B | A) P(A) + P(B | Aᶜ) P(Aᶜ), can be extended to more general situations, where it is known as the Law of Total Probability, and is a useful tool in Bayes’ Theorem (next section).

Suppose event C = “coffee drinker.”

[Venn diagram: events A and C within sample space S; the four region probabilities are those in the table below.]

                         Disease Status
Coffee Drinker   Lung cancer (A)   No lung cancer (Aᶜ)   Total
Yes (C)               0.06               0.34            0.40
No (Cᶜ)               0.09               0.51            0.60
Total                 0.15               0.85            1.00

Probabilities: P(A) = 0.15, P(C) = 0.40, P(A ∩ C) = 0.06

Therefore, P(A | C) = P(A ∩ C) / P(C) = 0.06 / 0.40 = 0.15 = P(A), i.e., the occurrence of event C gives no information about the probability of event A.

Definition: Two events A and B are said to be statistically independent if either:
(1) P(A | B) = P(A), i.e., P(B | A) = P(B), or equivalently,
(2) P(A ∩ B) = P(A) × P(B).

Exercise: Prove that if events B and C are statistically independent, then so are each of the following:
• B and “Not C”
• “Not B” and C
• “Not B” and “Not C”
Hint: Let P(B) = b, P(C) = c, and construct a 2 × 2 probability table.

Summary
A, B disjoint ⇔ If either event occurs, then the other cannot occur: P(A ∩ B) = 0.
A, B independent ⇔ If either event occurs, this gives no information about the other: P(A ∩ B) = P(A) × P(B).

Example: A = “Select a 2” and B = “Select a ♣” are not disjoint events, because A ∩ B = {2♣} ≠ ∅. However, P(A ∩ B) = 1/52 = 1/13 × 1/4 = P(A) × P(B); hence they are independent events.

Can two disjoint events ever be independent? Why?
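The smoker and coffee-drinker tables above can be compared numerically. The sketch below is illustrative only: the function name `summarize` is hypothetical, and the probabilities are read as exact decimals so that criterion (2), P(A ∩ X) = P(A) × P(X), can be tested without rounding error.

```python
from fractions import Fraction

def summarize(p_a_and_x, p_a, p_x):
    """Return P(A | X) and whether P(A ∩ X) = P(A) × P(X) holds exactly."""
    p_a_given_x = p_a_and_x / p_x
    independent = (p_a_and_x == p_a * p_x)
    return p_a_given_x, independent

p_a = Fraction("0.15")  # P(lung cancer), the same in both tables

# Smoker table: P(A ∩ B) = 0.12, P(B) = 0.16
smoker = summarize(Fraction("0.12"), p_a, Fraction("0.16"))

# Coffee table: P(A ∩ C) = 0.06, P(C) = 0.40
coffee = summarize(Fraction("0.06"), p_a, Fraction("0.40"))

print(float(smoker[0]), smoker[1])  # 0.75 False
print(float(coffee[0]), coffee[1])  # 0.15 True
```

With real survey data the product rule would rarely hold exactly, so in practice one would compare the two sides up to sampling error and not test strict equality.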
A VERY IMPORTANT AND USEFUL FACT: It can be shown that for any event A, all of the elementary properties of “probability” P(A) covered in the notes extend to “conditional probability” P(A | B), for any other event B.

For example, since we know that

    P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2)

for any two events A1 and A2, it is also true that

    P(A1 ∪ A2 | B) = P(A1 | B) + P(A2 | B) − P(A1 ∩ A2 | B)

for any other event B. As another example, since we know that P(Aᶜ) = 1 − P(A), it therefore also follows that P(Aᶜ | B) = 1 − P(A | B).

Exercise: Prove these two statements. (Hint: Sketch a Venn diagram.)

HOWEVER, there is one important exception! We know that if A and B are two independent events, then P(A ∩ B) = P(A) P(B). But this does not extend to conditional probabilities! In particular, if C is any other event, then P(A ∩ B | C) ≠ P(A | C) P(B | C) in general. The following example illustrates this, for three events A, B, and C:

[Venn diagram: three overlapping events A, B, and C, with region probabilities .20, .20, .20, .05, .05, .05, .10, and .15]

Exercise: Confirm that P(A ∩ B) = P(A) P(B), but P(A ∩ B | C) ≠ P(A | C) P(B | C).

In other words, two events that may be independent in a general population may not necessarily be independent in a particular subgroup of that population.

More on Conditional Probability and Independent Events

Another example from epidemiology

[Venn diagram 1: S = POPULATION, A = lung cancer, B = obese, with intersection A ∩ B]
[Venn diagram 2: S = POPULATION, A = lung cancer, C = smoker, with intersection A ∩ C]

Suppose that, in a certain study population, we wish to investigate the prevalence of lung cancer (A), and its associations with obesity (B) and cigarette smoking (C), respectively. From the first of the two stylized Venn diagrams above, by comparing the scales drawn, observe that the proportion of the size of the intersection A ∩ B (green) relative to event B (blue + green) is about equal to the proportion of the size of event A (yellow + green) relative to the entire population S.
That is,

    P(A ∩ B) / P(B) = P(A) / P(S).

(As an exercise, verify this equality for the following probabilities: yellow = .09, green = .07, blue = .37, white = .47, to two decimals, before reading on.)

In other words, the probability that a randomly chosen person from the obese subpopulation has lung cancer is equal to the probability that a randomly chosen person from the general population has lung cancer (.16). This equation can be equivalently expressed as P(A | B) = P(A), since the left side is conditional probability by definition, and P(S) = 1 in the denominator of the right side. In this form, the equation clearly conveys the interpretation that knowledge of event B (obesity) yields no information about event A (lung cancer). In this example, lung cancer is equally probable (.16) among the obese as it is among the general population, so knowing that a person is obese is completely unrevealing with respect to having lung cancer. Events A and B that are related in this way are said to be independent. Note that they are not disjoint!

In the second diagram, however, the relative size of A ∩ C (orange) to C (red + orange) is larger than the relative size of A (yellow + orange) to the whole population S, so P(A | C) ≠ P(A), i.e., events A and C are dependent. Here, as is true in general, the probability of lung cancer is indeed influenced by whether a person is randomly selected from among the general population or the smoking subset, where it is much higher. Statistically, lung cancer would be a rare disease in the U.S., if not for cigarettes (although it is on the rise among nonsmokers).

Application: “Are Blood Antibodies Independent?”

An example of conditional probability in human genetics (Adapted from Rick Chappell, Ph.D., UW Dept.
of Biostatistics & Medical Informatics)

Background: The surfaces of human red blood cells (“erythrocytes”) are coated with antigens that are classified into four disjoint blood types: O, A, B, and AB. Each type is associated with blood serum antibodies for the other types, that is,

• Type O blood contains both A and B antibodies. (This makes Type O the “universal donor”, but capable of receiving only Type O.)
• Type A blood contains only B antibodies.
• Type B blood contains only A antibodies.
• Type AB blood contains neither A nor B antibodies. (This makes Type AB the “universal recipient”, but capable of donating only to Type AB.)

In addition, blood is also classified according to the presence (+) or absence (−) of Rh factor (found predominantly in rhesus monkeys, and to varying degree in human populations; it is important in obstetrics). Hence there are eight distinct blood groups corresponding to this joint classification system: O+, O−, A+, A−, B+, B−, AB+, AB−. According to the American Red Cross, the U.S. population has the following blood group relative frequencies:

                  Rh factor
Blood Type      +        −      Totals
O             .384     .077     .461
A             .323     .065     .388
B             .094     .017     .111
AB            .032     .007     .039
Totals        .833     .166     .999

From these values (and from the background information above), we can calculate the following probabilities:

P(A antibodies) = P(Type O or B) = P(O) + P(B) = .461 + .111 = .572
P(B antibodies) = P(Type O or A) = P(O) + P(A) = .461 + .388 = .849
P(B antibodies and Rh+) = P(Type O+ or A+) = P(O+) + P(A+) = .384 + .323 = .707

Using these calculations, we can answer the following.

Question: Is having “A antibodies” independent of having “B antibodies”?

Solution: We must check whether or not P(A and B antibodies) = P(A antibodies) × P(B antibodies), i.e., whether P(Type O) equals .572 × .849, that is, whether .461 equals .486. This indicates near independence of the two events; there does exist a slight dependence.
The dependence would be much stronger if America were composed of two disjoint (i.e., non-interbreeding) groups: Type A (with B antibodies only) and Type B (with A antibodies only), and no Type O (with both A and B antibodies). Since this is evidently not the case, the implication is that either these traits evolved before humans spread out geographically, or they evolved later but the populations became mixed in America.

Question: Is having “B antibodies” independent of “Rh+”?

Solution: We must check whether or not P(B antibodies and Rh+) = P(B antibodies) × P(Rh+), that is, .707 = .849 × .833, which is true (to three decimal places), so we have exact independence of these events. These traits probably predate diversification in humans (and were not differentially selected for since).

Exercises:
• Is having “A antibodies” independent of “Rh+”?
• Find P(A antibodies | B antibodies) and P(B antibodies | A antibodies). Conclusions?
• Is “Blood Type” independent of “Rh factor”? (Do a separate calculation for each blood type: O, A, B, AB, and each Rh factor: +, −.)
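The two worked questions above (A antibodies vs. B antibodies, and B antibodies vs. Rh+) can be reproduced from the blood group table. The following is a minimal sketch, not part of the original notes; the dictionary `freq` and the helper `prob` are names introduced here, and the frequencies are the ones quoted in the table.

```python
from fractions import Fraction

# Relative frequencies by (blood type, Rh factor), copied from the table.
freq = {
    ("O", "+"): Fraction("0.384"), ("O", "-"): Fraction("0.077"),
    ("A", "+"): Fraction("0.323"), ("A", "-"): Fraction("0.065"),
    ("B", "+"): Fraction("0.094"), ("B", "-"): Fraction("0.017"),
    ("AB", "+"): Fraction("0.032"), ("AB", "-"): Fraction("0.007"),
}

def prob(types, rh=("+", "-")):
    """Total relative frequency of the given blood types and Rh factors."""
    return sum(freq[(t, r)] for t in types for r in rh)

# Antibody events expressed through blood types, as in the background list:
# A antibodies <=> Type O or B; B antibodies <=> Type O or A; both <=> Type O.
p_a_anti = prob(["O", "B"])
p_b_anti = prob(["O", "A"])
p_both = prob(["O"])
p_rh_pos = prob(["O", "A", "B", "AB"], rh=("+",))
p_b_anti_rh_pos = prob(["O", "A"], rh=("+",))

# Compare each joint probability with the product of its marginals.
print(float(p_both), round(float(p_a_anti * p_b_anti), 3))           # 0.461 0.486
print(float(p_b_anti_rh_pos), round(float(p_b_anti * p_rh_pos), 3))  # 0.707 0.707
```

The same `prob` helper could be reused for the exercises, for instance by passing `["O", "B"]` with `rh=("+",)` for the first one.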