Probability (Rosner, chapter 3)
KLMED 8004, September 2010
Eirik Skogvoll, Consultant/Professor
• What is probability?
• Basic probability axioms and rules of calculation

Breast cancer (Example 3.1)
• Incidence of breast cancer during the next 5 years for women aged 45 to 54
  – Group A had their first birth before the age of 20 (“early”)
  – Group B had their first birth after the age of 30 (“late”)
• Suppose 4 out of 1000 in group A and 5 out of 1000 in group B develop breast cancer over the next 5 years. Is this a chance finding, or does it represent a genuine increased risk?
• What if the numbers were 40 out of 10 000 and 50 out of 10 000? Still due to chance?

Diagnostic test (Example 3.26)
• Suppose that an automated blood pressure machine classifies 84% of hypertensive patients as hypertensive and 23% of normotensive patients as hypertensive, and that 20% of the general population are hypertensive.
• What are the sensitivity, specificity and positive predictive value of the test?

Probability of male livebirth (Example 3.2)

Number of livebirths   Number of boys   Proportion of boys
                  10                8   0.8
                 100               55   0.55
                1000              525   0.525
               10000             5139   0.5139
              100000            51127   0.51127
             3760358          1927054   0.51247
            17989361          9219202   0.51248
            34832051         17857857   0.51268

Probability (Def 3.1)
• The sample space, S (Norwegian: “utfallsrommet”), is the set of all possible outcomes of an experiment.
• An experiment is repeated $n$ times. The event A occurs $n_A$ times.
• The relative frequency $n_A/n$ approaches a fixed number as the number of experiments (trials) goes towards infinity. This number, $\Pr(A)$, is called the probability of A.
• This definition is termed frequentist.

How to quantify probability
• Empirical estimation: $n_A/n$
• Inference/calculations based on a theoretical/physical model
• “Subjective” probability
“Probability has no universally accepted interpretation.” Chatterjee, S. K. Statistical Thought: A Perspective and History. Oxford University Press, 2003, page 36.
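
The empirical estimate $n_A/n$ can be illustrated with the livebirth counts from Example 3.2. The following is a minimal sketch; the variable and function names are chosen here for illustration and do not come from the slides.

```python
# Livebirth counts copied from the Example 3.2 table in the slides.
livebirths = [10, 100, 1000, 10000, 100000, 3760358, 17989361, 34832051]
boys = [8, 55, 525, 5139, 51127, 1927054, 9219202, 17857857]


def relative_frequency(n_a, n):
    """Empirical estimate n_A / n of Pr(A)."""
    return n_a / n


# One relative frequency per row of the table.
proportions = [relative_frequency(b, n) for b, n in zip(boys, livebirths)]

for n, p in zip(livebirths, proportions):
    print(f"{n:>10d}  {p:.5f}")
```

With few trials the estimate is far from its limiting value (0.8 for n = 10); it settles near 0.512 only once n reaches the millions.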
Example: throw a die
• Probability of a six is 1/6
• Probability of a five or a six is 2/6
• These calculations assume a fair die (equal probability of all outcomes) and certain rules of calculation.

(Very) subjective probability:
“There is hardly any way back, says the UN climate committee. There is a 50 percent chance that polar meltdown is inevitable, an April report claims.”
“The UN climate committee presented their latest report in January. The committee states that there is a 90 percent chance that global warming is caused by human activity.”
http://www.aftenposten.no/nyheter/miljo/article1650116.ece (19.02.2007)

http://weather.yahoo.com/ accessed 31 August 2010 at 1111 hours
Tonight: A steady rain early...then remaining cloudy with a few showers. Low 43F. Winds WNW at 5 to 10 mph. Chance of rain 80%. Rainfall near a quarter of an inch.

Mutually exclusive events (Def 3.2)
• Two events A and B are mutually exclusive (Norwegian: “disjunkte”) if they cannot both happen at the same time.

Example 3.7: Diastolic blood pressure (DBP)
• $A = \{\mathrm{DBP} \ge 90\}$
• $B = \{75 \le \mathrm{DBP} \le 100\}$
• A and B are not mutually exclusive

$A \cup B$ (“A union B”) means that A, or B, or both, occur (Def 3.4).

Example
• $A = \{\mathrm{DBP} \ge 90\}$
• $B = \{75 \le \mathrm{DBP} \le 100\}$
• $A \cup B = \{\mathrm{DBP} \ge 75\}$

$A \cap B$ (“intersection”, Norwegian: “snitt”) means that both A and B occur (Def 3.5).

Example
• $A = \{\mathrm{DBP} \ge 90\}$
• $B = \{75 \le \mathrm{DBP} \le 100\}$
• $A \cap B = \{90 \le \mathrm{DBP} \le 100\}$

Basic rules of probability
Kolmogorov’s axioms (1933, Eq. 3.1)
• The probability of an event, E, always satisfies $0 \le \Pr(E) \le 1$
• If A and B are mutually exclusive, then $\Pr(A \cup B) = \Pr(A) + \Pr(B)$. This also applies to more than 2 events.
• The probability of a certain event is 1: $\Pr(S) = 1$

Example (Rosner, Example 3.6, p. 47), diastolic BP
A: DBP < 90 mmHg (normal). $\Pr(A) = 0.7$
B: 90 ≤ DBP < 95 (“borderline”). $\Pr(B) = 0.1$
C: DBP < 95
$\Pr(C) = \Pr(A \cup B) = \Pr(A) + \Pr(B) = 0.7 + 0.1 = 0.8$, because A and B are mutually exclusive.

$\bar{A}$ (“complement of A”) means that A does not occur.
(Def 3.6) $\Pr(\bar{A}) = 1 - \Pr(A)$

Independent events
• “A and B are independent if Pr(B) is not influenced by whether A has happened or not.”
• Def 3.7: A and B are independent if $\Pr(A \cap B) = \Pr(A)\,\Pr(B)$

Example 3.15: Testing for syphilis
A = {Dr A makes a positive diagnosis}
B = {Dr B makes a positive diagnosis}
Given that $\Pr(A) = 0.1$, $\Pr(B) = 0.17$ and $\Pr(A \cap B) = 0.08$,
then $\Pr(A \cap B) = 0.08 > \Pr(A)\,\Pr(B) = 0.1 \times 0.17 = 0.017$
and the events are dependent (as expected).

The multiplication law of probability (Eq. 3.2)
• If $A_1, \ldots, A_k$ are independent, then $\Pr(A_1 \cap A_2 \cap \cdots \cap A_k) = \Pr(A_1)\,\Pr(A_2) \cdots \Pr(A_k)$

The addition law of probability (Eq. 3.3)
• $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$
Rosner fig. 3.5, p. 52

Example 3.13 and 3.17
$A = \{\text{Mother's DBP} \ge 95\}$, $B = \{\text{Father's DBP} \ge 95\}$
$\Pr(A) = 0.1$, $\Pr(B) = 0.2$
Assume independence. What is the probability of being a “hypertensive family”?
$\Pr(A \cap B) = \Pr(A)\,\Pr(B) = 0.1 \times 0.2 = 0.02$
What is the probability of at least one parent being hypertensive?
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) = 0.1 + 0.2 - 0.02 = 0.28$

Addition theorem for 3 events
Consider three independent events A, B and C
$$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C)$$
[Figure: Venn diagram of A, B and C within S]

Conditional probability (Aalen et al., 2006)
Population: 4 000 000
A = “This person develops cancer within 1 year”: 15 000 new cases
B = “The person is 70-79 years old”: 300 000 persons
$A \cap B$: 4 500 new cancer cases within 1 year among persons aged 70-79
$P(A) = \dfrac{15\,000}{4\,000\,000} = 0.38\%$
$P(B) = \dfrac{300\,000}{4\,000\,000}$
$P(A \mid B) = \dfrac{4500}{300\,000} = 1.5\%$
$P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{4500 / 4\,000\,000}{300\,000 / 4\,000\,000}$

Conditional probability (Def 3.9)
• Conditional probability of B given A: $\Pr(B \mid A) = \Pr(A \cap B) / \Pr(A)$
• We “re-define” the sample space from S to A.

Conditional probability and independence
A and B are independent if and only if (Eq. 3.5)
(1) $\Pr(B \mid A) = \Pr(B)$
Then also $\Pr(B \mid \bar{A}) = \Pr(B)$, and correspondingly for $A \mid B$.
(1) may be used as a definition of independence!

Example 3.20 (cont.
Example 3.15)
$\Pr(B \mid A) = \Pr(B \cap A) / \Pr(A) = 0.08 / 0.1 = 0.8$, whereas $\Pr(B) = 0.17$: the events are dependent.
$\Pr(B \mid \bar{A}) = \Pr(B \cap \bar{A}) / \Pr(\bar{A})$
$\Pr(B) = \Pr(B \cap A) + \Pr(B \cap \bar{A})$, because $B \cap A$ and $B \cap \bar{A}$ are mutually exclusive, so
$\Pr(B \mid \bar{A}) = (\Pr(B) - \Pr(B \cap A)) / \Pr(\bar{A}) = (0.17 - 0.08) / 0.9 = 0.1$

Another look at problem 3.1 +++
A 2 by 2 table of 100 families:

                      Father ill (A2)   Father healthy   Total
Mother ill (A1)                     2                8      10
Mother healthy                      8               82      90
Total                              10               90     100

Note the difference between $(A_1 \cap A_2)$ and $(A_1 \mid A_2)$: $(A_1 \cap A_2)$ is defined on S (the entire sample space), while $(A_1 \mid A_2)$ is defined with $A_2$ as the sample space.

Relative risk
Relative risk (RR) of B given A (Def 3.10):
$$RR = \frac{\Pr(B \mid A)}{\Pr(B \mid \bar{A})}$$
If A and B are independent, RR = 1 (by definition).

Relative risk (Example 3.19)
A = {Positive mammography}
B = {Breast cancer the next 2 years}
$\Pr(B \mid A) = 0.1$, $\Pr(B \mid \bar{A}) = 0.0002$
$$RR = \frac{\Pr(B \mid A)}{\Pr(B \mid \bar{A})} = \frac{0.1}{0.0002} = 500$$

Dependent events (Example 3.14 →)
• $A = \{\text{Mother's DBP} \ge 95\}$, $B = \{\text{First-born child's DBP} \ge 95\}$
• $\Pr(A) = 0.1$, $\Pr(B) = 0.2$
• $\Pr(A \cap B) = 0.05$ (known!)
• $\Pr(A)\,\Pr(B) = 0.1 \times 0.2 = 0.02 \neq \Pr(A \cap B)$
• Thus the events are dependent!
$\Pr(B \mid A) = \Pr(A \cap B) / \Pr(A) = 0.05 / 0.1 = 0.5 \neq \Pr(B)$

Generalized multiplication law of probability (Eq. 3.8)
• From the definition of conditional probability, we have: $\Pr(A \cap B) = \Pr(A)\,\Pr(B \mid A)$
• In general: $\Pr(A_1 \cap A_2 \cap \cdots \cap A_k) = \Pr(A_1)\,\Pr(A_2 \mid A_1)\,\Pr(A_3 \mid A_2 \cap A_1) \cdots \Pr(A_k \mid A_{k-1} \cap \cdots \cap A_2 \cap A_1)$

Total-probability rule (Eq. 3.7)
[Figure: the sample space divided into $A_1, A_2, \ldots, A_k$, with B overlapping them]
For mutually exclusive and exhaustive events $A_1, \ldots, A_k$:
$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i)\,\Pr(A_i)$$

Prevalence
• The prevalence of a disease equals the proportion of the population that is diseased (Def 3.17)
• Example (Aalen, 1998):
  – By 31 December 1995, 21 482 Norwegian women suffered from breast cancer.
  – Total female population: 2 150 000
  – Prevalence: 21 482 / 2 150 000 = 0.010 (≈ 1 %)

Incidence (or incidence rate)
• Incidence is a measure of the number of new cases occurring during some time period (i.e.
a rate)
• Example (Aalen, 1998):
  – During 1995, a total of 2 154 Norwegian women were diagnosed with breast cancer
  – Total female population: 2 150 000
  – Incidence rate: 2 154 cases / (2 150 000 persons × 1 year) = 0.0010 cases per person and year

Prevalence of cataract (Example 3.22)
We wish to determine the total prevalence of cataract in the population ≥ 60 years during the next 5 years. Age-specific prevalence is known.
$A_1$ = {60-64 yrs}, $A_2$ = {65-69 yrs}, $A_3$ = {70-74 yrs}, $A_4$ = {75+ yrs}, B = {cataract within 5 years}
$\Pr(A_1) = 0.45$, $\Pr(A_2) = 0.28$, $\Pr(A_3) = 0.20$, $\Pr(A_4) = 0.07$
$\Pr(B \mid A_1) = 0.024$, $\Pr(B \mid A_2) = 0.046$, $\Pr(B \mid A_3) = 0.088$, $\Pr(B \mid A_4) = 0.153$
$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i)\,\Pr(A_i) = 0.024 \times 0.45 + 0.046 \times 0.28 + 0.088 \times 0.20 + 0.153 \times 0.07 = 0.052$$

Example: Age-adjusted incidence of breast cancer, www.kreftregisteret.no
[Figure: Age-adjusted incidence rate 1954–99 (world std.), breast, females. Rate per 100 000 against year of diagnosis.]

Bayes’ rule, diagnosis and screening
A = {symptom or positive diagnostic test}
B = {disease}
[Figure: Venn diagram of A and B within S]
$P(B)$ = disease prevalence
$P(A \mid B)$ = sensitivity
$P(A \mid \bar{B})$ = “false positive rate”
$P(\bar{A} \mid \bar{B})$ = specificity
$P(\bar{A} \mid \bar{B}) + P(A \mid \bar{B}) = 1$ (why?)
$P(A \mid \bar{B}) = 1 - P(\bar{A} \mid \bar{B}) = 1 - \text{specificity}$
$P(B \mid A) = PPV = PV^{+}$ = positive predictive value
$P(\bar{B} \mid \bar{A}) = NPV = PV^{-}$ = negative predictive value

Diagnosis of breast cancer (Example 3.23)
A = {pos. mammogram}
B = {breast cancer within 2 years}
$\Pr(B \mid \bar{A}) = 0.0002$
$\Pr(\bar{B} \mid \bar{A}) = 1 - 0.0002 = 0.9998$, i.e. $NPV = PV^{-} = 0.9998$
$\Pr(B \mid A) = 0.1 = PPV = PV^{+}$

Bayes’ rule
Definition (Rosner Eq. 3.9): Bayes’ rule/theorem
Combines the expressions of conditional and total probability:
$$PPV = PV^{+} = P(B \mid A) = \frac{\Pr(B \cap A)}{\Pr(A)} = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \bar{B})\,\Pr(\bar{B})}$$
We have found one conditional probability by means of the “opposite” or “inverse” conditional probability!
[Figure: Venn diagram of A, B and $\bar{B}$ within S]

Bayes’ rule
Example (Rosner Example 3.26, p. 61)
Prevalence of hypertension = $\Pr(B)$ = 0.2. The auto-BP machine classifies 84 % of hypertensive patients and 23 % of normotensive patients as hypertensive.
PPV? NPV?
$\Pr(A \mid B) = 0.84$ (sensitivity) and $\Pr(A \mid \bar{B}) = 0.23$ (“false positive rate”), i.e. specificity $= \Pr(\bar{A} \mid \bar{B}) = 1 - 0.23 = 0.77$

From Bayes’ rule we have
$$PV^{+} = \Pr(B \mid A) = \frac{\Pr(A \mid B)\,\Pr(B)}{\Pr(A \mid B)\,\Pr(B) + \Pr(A \mid \bar{B})\,\Pr(\bar{B})} = \frac{\text{sens} \times \text{prevalence}}{\text{sens} \times \text{prevalence} + (1 - \text{spec})(1 - \text{prevalence})}$$
$$= \frac{0.84 \times 0.2}{0.84 \times 0.2 + 0.23 \times 0.8} = \frac{0.168}{0.352} = 0.48$$
and similarly
$$PV^{-} = \Pr(\bar{B} \mid \bar{A}) = \frac{\text{spec} \times (1 - \text{prevalence})}{\text{spec} \times (1 - \text{prevalence}) + (1 - \text{sens}) \times \text{prevalence}} = \frac{0.77 \times 0.8}{0.77 \times 0.8 + 0.16 \times 0.2} = \frac{0.616}{0.648} = 0.95$$

Bayes’ rule. Low prevalence: a paradox?
What if the prevalence is low?
$\Pr(B) = 0.0001$
$P(A \mid B) = 0.84$ (sensitivity)
$P(\bar{A} \mid \bar{B}) = 0.77$ (specificity)
Then
$$PPV = \frac{0.84 \times 0.0001}{0.84 \times 0.0001 + (1 - 0.77)(1 - 0.0001)} = 0.00037$$
$$NPV = \frac{0.77 \times (1 - 0.0001)}{0.77 \times (1 - 0.0001) + (1 - 0.84) \times 0.0001} = 0.99998$$

Bayes’ rule, diagnosis and screening
Traditional 2*2 table

                  Illness +   Illness –   Total
Test result +     a [TP]      b [FP]      a+b
Test result –     c [FN]      d [TN]      c+d
Total             a+c         b+d         a+b+c+d

A = {test positive}, B = {illness}, TP = true positive, FP = false positive, FN = false negative, TN = true negative

Prevalence $= P(B) = \dfrac{a + c}{a + b + c + d}$
Sensitivity $= P(A \mid B) = \dfrac{a}{a + c}$
Specificity $= P(\bar{A} \mid \bar{B}) = \dfrac{d}{b + d}$
PPV $= P(B \mid A) = \dfrac{a}{a + b}$
NPV $= P(\bar{B} \mid \bar{A}) = \dfrac{d}{c + d}$
Accuracy $= \dfrac{a + d}{a + b + c + d}$

Using a 2*2 table requires us to “invent” patients in order to calculate PPV etc.! With Bayes’ rule this information is utilised directly.

Diagnostics/ROC
Rosner tables 3.2 and 3.3, pp. 63-64
Criterion “1+”: all rated 1 to 5 are diagnosed as abnormal. We find all the diseased, but identify none as healthy.
Sensitivity = 1, specificity = 0, ‘false positive rate’ = 1.

Diagnostics/ROC
Criterion “2+”: all rated 2 to 5 are diagnosed as abnormal. We find 48/51 diseased, and identify 33/58 as healthy.
Sensitivity = 0.94, specificity = 0.57, ‘false positive rate’ = 0.43

Diagnostics/ROC
Criterion “3+”: all rated 3 to 5 are diagnosed as abnormal. We find 46/51 diseased, and identify 39/58 as healthy.
Sensitivity = 0.90, specificity = 0.67, ‘false positive rate’ = 0.33

Diagnostics/ROC
Criterion “4+”: all rated 4 and 5 are diagnosed as abnormal. We find 44/51 diseased, and identify 45/58 as healthy.
Sensitivity = 0.86, specificity = 0.78, ‘false positive rate’ = 0.22

Diagnostics/ROC
Criterion “5+”: all rated 5 are diagnosed as abnormal. We find 33/51 diseased, and identify 56/58 as healthy.
Sensitivity = 0.65, specificity = 0.97, ‘false positive rate’ = 0.03

Diagnostics/ROC
Criterion “6+”: all rated > 5 are diagnosed as abnormal (nonsense!). We find no diseased and identify everybody as healthy.
Sensitivity = 0, specificity = 1, ‘false positive rate’ = 0

Diagnostics/ROC (receiver operating characteristic)
The result is summarized as a table (Rosner table 3.3, p. 64):

Criterion   Sensitivity   ‘False pos. rate’
1+          1             1
2+          0.94          0.43
3+          0.90          0.33
4+          0.86          0.22
5+          0.65          0.03
6+          0             0

… and shown as a ROC curve (Rosner fig. 3.7, p. 64), which plots sensitivity against the ‘false positive rate’.
“Cut-off” values may be decided from visual inspection.

Area under the ROC curve
• Summarizes overall diagnostic performance
• Corresponds to the probability that a randomly chosen diseased patient receives a more abnormal rating than a randomly chosen healthy patient
• Equals 1 for a perfect test
• Equals 0.5 for a non-informative test
• Equals 0.89 in the example
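
The area of 0.89 can be checked from the counts given for criteria “1+” to “6+”. The following is a minimal sketch that assumes the trapezoidal rule is used to approximate the area; the slides do not say how the figure was computed, and the function and variable names are hypothetical.

```python
# Group sizes as used in the slides: 51 diseased and 58 healthy patients.
N_DISEASED = 51
N_HEALTHY = 58

# criterion -> (diseased rated abnormal, healthy rated normal), from the slides
counts = {
    "1+": (51, 0),
    "2+": (48, 33),
    "3+": (46, 39),
    "4+": (44, 45),
    "5+": (33, 56),
    "6+": (0, 58),
}


def roc_points(table):
    """Return (false positive rate, sensitivity) pairs sorted by false positive rate."""
    pts = []
    for found_diseased, found_healthy in table.values():
        sensitivity = found_diseased / N_DISEASED
        false_pos_rate = 1 - found_healthy / N_HEALTHY
        pts.append((false_pos_rate, sensitivity))
    return sorted(pts)


def trapezoid_auc(pts):
    """Area under the piecewise-linear curve through the sorted points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(counts)
auc = trapezoid_auc(points)
print(f"AUC = {auc:.2f}")
```

Under this assumption the area comes out at about 0.89, in agreement with the value quoted above.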