* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Topic 2 - NCSU Statistics
Survey
Document related concepts
Transcript
ST 521: Statistical Theory I Armin Schwartzman Class topic 2 Conditional Probability and Independence 2 Conditional Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 cont. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Independence. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Independence of many events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 Independence of many events (cont.) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Sequential conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 The Birthday Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 The Birthday Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 Turning around probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 Decomposition Formula (Total Probability) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 (So called) Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Bayes and Screening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 Example: cervical cancer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1 Conditional Probability and Independence 2 / 15 Conditional Probability If P (B) > 0 we define the conditional probability of the event A given B as P (A|B) = P (AB) P (B) Intuitively, conditioning on B means reducing the original sample space S to B, which becomes the new, reduced sample space. All probabilities are computed with respect to B. Notice: P (A|S) = P (AS) = P (A) P (S) Disjoint events: If AB = ∅, then P (A|B) = 0 and P (B|A) = 0. ST 521 – Class topic 2 Fall 2014 – 3 / 15 cont. Conditional probability satisfies the axioms of probability: 1. 2. 3. P (S|B) = 1 P (A|B) ≥ 0 ∞ ∞ If A1 , A2 , . . . are mutually exclusive events, then P i=1 Ai B = i=1 P (Ai |B) and all the other properties: 1. 2. 3. P (∅|B) = 0 P (A|B) ≤ 1 P (Ac |B) = 1 − P (A|B) etc. ST 521 – Class topic 2 Fall 2014 – 4 / 15 2 Independence Two events A and B are said to be independent if P (AB) = P (A)P (B) Why? Because then (1) P (AB) P (A)P (B) = = P (A) P (B) P (B) P (A)P (B) P (AB) = = P (B) P (B|A) = P (A) P (A) P (A|B) = This could have been taken as the definition, but (1) is easier to generalize. If A and B are independent, then so are • Ac and B • A and B c • Ac and B c * Can two disjoint events be independent and viceversa? (HW) ST 521 – Class topic 2 Fall 2014 – 5 / 15 Independence of many events The events A1 , A2 , . . . , An are mutually independent if for every subcollection A i1 , . . . , Aik of size k = 2, . . . , n k k Aij = P (Aij ) P j=1 j=1 Notice that is a very strong condition. But it is necessary to ensure that P (Aj |Ai1 . . . Aik ) = P (Aj ) for every j and every subcollection Ai1 , . . . , Aik that does not include Aj . Pairwise independence is not enough. ST 521 – Class topic 2 Fall 2014 – 6 / 15 3 Independence of many events (cont.) Example: Toss a coin three times. Define the events: • H1 = { The outcome of the first toss is heads } • H2 = { The outcome of the second toss is heads } • H3 = { The outcome of the third toss is heads } Suppose every outcome is equally likely. Then these events independent. Now define the events: • A12 = { The outcome of the first toss equals the second } • A13 = { The outcome of the first toss equals the third } • A23 = { The outcome of the second toss equals the third } These events are pairwise independent but not mutually independent. ST 521 – Class topic 2 Fall 2014 – 7 / 15 Sequential conditioning By the definition of conditional probability: P (AB) = P (A)P (B|A) P (AB) = P (B)P (A|B) This is useful for computing probabilities of sequential events. E.g: What is the probability of dealing two aces in a row? More generally: P (A1 . . . An ) = P (A1 )P (A2 |A1 )P (A3 |A1 A2 ) . . . P (An |A1 . . . An−1 ) ST 521 – Class topic 2 Fall 2014 – 8 / 15 4 The Birthday Problem In a group of n students in a class, what is the probability that at least two have the same birthday? Solution: Suppose we order the n students in an arbitrary order. Let D j be the event that the first j have different birthdays. Then ST 521 – Class topic 2 Fall 2014 – 9 / 15 The Birthday Problem 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 10 20 30 40 n ST 521 – Class topic 2 50 60 70 80 Fall 2014 – 10 / 15 5 Turning around probabilities Also by the definition of conditional probability: P (A|B) = P (B|A)P (A) P (B) (2) This is useful for computing conditional probabilities when the reverse conditioning is easier to compute. E.g.: Prob. that it will rain given that it is thundering vs. prob. that it thundered given that it is raining. (2) is called sometimes Bayes’ rule. It is often used in a context where we want to know the probability that a particular hypothesis is true. We have an a priori belief in whether or not the hypothesis is true, then update that probability by collecting data. E.g.: Suppose a priori boys are equally likely to be born as girls. Say 90% of boys play with trucks. Baby X plays with trucks. What is the probability that Baby X is a boy? ST 521 – Class topic 2 Fall 2014 – 11 / 15 Decomposition Formula (Total Probability) Let {A1 , A2 , . . .} be a partition of S. Then P (B) = n P (Ai )P (B|Ai ) i=1 Proof: ST 521 – Class topic 2 Fall 2014 – 12 / 15 6 (So called) Bayes’ Theorem Let {A1 , A2 , . . .} be a partition of S. Then P (B|Aj )P (Aj ) P (Aj |B) = n i=1 P (B|Ai )P (Ai ) Proof: ST 521 – Class topic 2 Fall 2014 – 13 / 15 Bayes and Screening An important application of Bayes’ theorem is screening. Let D be the disease: D means diseased D means ”no disease” and T be the diagnostic test: T + means a positive test T − means a negative test. Then we have that the positive predictive value P (D|T + ) = ≡ P (D) P (T + |D) P (D) P (T + |D) + P (D) P (T + |D) prevalence × sensitivity prev. × sens. + (1 − prev.) × (1 − specificity) ST 521 – Class topic 2 Fall 2014 – 14 / 15 7 Example: cervical cancer Let D be cervical cancer. P (D) we’ll take to be 1 in 21,000, which is the approximate annual incidence rate in the US (SEER 2002 estimate). P (D) = .00004762 Let us take the sensitivity (P (T + |D)) to be 0.71 and the specificity (1 − P (T + |D)) to be 0.75. Thence the positive predictive value is: 0.00004762 × 0.71 0.00004762 × 0.71 + (1 − 0.00004762) × (1 − 0.75) = 0.000135 P (D|T + ) = That means that for every 1,000,000 positive results, about 135 truly have cervical cancer. But for any particular patient, testing positive increases the probability of having the disease by a factor of 2.8! ST 521 – Class topic 2 Fall 2014 – 15 / 15 8