Probability (Rosner, chapter 3)
KLMED 8004, September 2010
Eirik Skogvoll, Consultant/ Professor
• What is probability?
• Basic probability axioms and rules of calculation
1
Breast cancer (Example 3.1)
• Incidence of breast cancer during the next 5 years for women aged 45 to 54
– Group A had their first birth before the age of 20 (“early”)
– Group B had their first birth after the age of 30 (“late”)
• Suppose 4 out of 1000 in group A, and 5 out of 1000 in group B, develop breast cancer over the next 5 years.
• Is this a chance finding, or does it represent a genuinely increased risk?
• What if the numbers were 40 out of 10 000 and 50 out of 10 000? Still due to chance?
2
Diagnostic test (Example 3.26)
• Suppose that an automated blood pressure machine classifies 84% of hypertensive patients as hypertensive and 23% of normotensive patients as hypertensive, and that 20% of the general population are hypertensive.
• What are the sensitivity, specificity and positive predictive value of the test?
3
Probability of male livebirth (Example 3.2)
Number of livebirths   Number of boys   Proportion of boys
10                     8                0.8
100                    55               0.55
1000                   525              0.525
10000                  5139             0.5139
100000                 51127            0.51127
3760358                1927054          0.51247
17989361               9219202          0.51248
34832051               17857857         0.51268
4
Probability (Def 3.1)
• The sample space, S (Norwegian: “utfallsrommet”), is the set of all possible outcomes of an experiment.
• An experiment is repeated n times. The event A occurs nA times.
• The relative frequency nA/n approaches a fixed number as the number of experiments (trials) goes towards infinity. This number, Pr(A), is called the probability of A.
• This definition is termed frequentist.
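The frequentist definition can be illustrated with a small simulation. This is only a sketch: the value 0.512 is an assumed "true" probability of a male livebirth, chosen to resemble the livebirth table, and the function name is hypothetical.

```python
import random

def relative_frequency(n_trials, p_event, seed=0):
    """Simulate n_trials independent experiments and return nA / n."""
    rng = random.Random(seed)
    n_a = sum(1 for _ in range(n_trials) if rng.random() < p_event)
    return n_a / n_trials

# Assumed "true" probability, used only to drive the simulation.
P_BOY = 0.512

# The relative frequency should settle near P_BOY as n grows.
for n in (10, 100, 1000, 10000, 100000):
    print(n, relative_frequency(n, P_BOY, seed=n))
```

With few trials the relative frequency can be far from 0.512; with many trials it should stay close to it.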
5
How to quantify probability
• Empirical estimation: nA/n
• Inference/ calculations based on a theoretical/ physical model
• ”Subjective” probability
“Probability has no universally accepted interpretation”
Chatterjee, S. K. Statistical Thought: A Perspective and History. Oxford University Press, 2003. Page 36.
6
Example: throw a die
• Probability of a six is 1/6
• Probability of five or six is 2/6
• These calculations assume a fair die (equal probability of all outcomes) and certain rules of calculation.
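The die calculations can be sketched with exact fractions, assuming a fair die so that every outcome is equally likely. The helper name `probability` is hypothetical.

```python
from fractions import Fraction

# Sample space of one throw of a die; all outcomes are assumed equally likely.
SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}

def probability(event):
    """Pr(event) = favourable outcomes / all outcomes, for equally likely outcomes."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

p_six = probability({6})
p_five_or_six = probability({5, 6})
print(p_six, p_five_or_six)
```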
7
(Very) subjective probability:
“There is hardly any way back, says the UN climate committee. There is a 50 percent chance that polar meltdown is inevitable, an April report claims.
“The UN climate committee presented their latest report in January. The committee states that there is a 90 percent chance that global warming is caused by human activity”
http://www.aftenposten.no/nyheter/miljo/article1650116.ece (19.02.2007)
8
http://weather.yahoo.com/
accessed 31. August 2010 at 1111 hours
Tonight: A steady rain early...then remaining cloudy with a
few showers. Low 43F. Winds WNW at 5 to 10 mph.
Chance of rain 80%. Rainfall near a quarter of an inch.
9
Mutually exclusive events (Def 3.2)
• Two events A and B are mutually exclusive (Norwegian: “disjunkte”) if they cannot both happen at the same time
10
Example 3.7: Diastolic blood pressure (DBP)
• A = {DBP ≥ 90}
• B = {75 ≤ DBP ≤ 100}
• A and B are not mutually exclusive
11
A ∪ B (“A union B”) means that A, or B, or both, occur (Def 3.4).
12
Example
• A = {DBP ≥ 90}
• B = {75 ≤ DBP ≤ 100}
• A ∪ B = {DBP ≥ 75}
13
A ∩ B (“intersection”, Norwegian: “snitt”) means that both A and B occur (Def 3.5).
14
Example
• A = {DBP ≥ 90}
• B = {75 ≤ DBP ≤ 100}
• A ∩ B = {90 ≤ DBP ≤ 100}
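Union and intersection of the two DBP events can be checked with Python sets. This is a sketch under the assumption that DBP is recorded as a whole number between 0 and 200 mmHg; that range is chosen only for illustration.

```python
# Whole-number DBP readings in mmHg; the 0-200 range is an assumption.
DBP_VALUES = range(0, 201)

A = {x for x in DBP_VALUES if x >= 90}         # A = {DBP >= 90}
B = {x for x in DBP_VALUES if 75 <= x <= 100}  # B = {75 <= DBP <= 100}

union = A | B         # A or B, or both
intersection = A & B  # both A and B

print(min(union), min(intersection), max(intersection))
```

Because the intersection is not empty, A and B are not mutually exclusive.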
15
Basic rules of probability
Kolmogorov’s axioms (1933, Eq. 3.1)
• The probability of an event, E, always satisfies: 0 ≤ Pr(E) ≤ 1
• If A and B are mutually exclusive, then Pr(A ∪ B) = Pr(A) + Pr(B). This also applies to more than 2 events.
• The probability of a certain event is 1: Pr(S) = 1
16
Example (Rosner, Example 3.6, p. 47), diastolic BP
A: DBP < 90 mmHg (normal). Pr(A) = 0.7
B: 90 ≤ DBP < 95 (“borderline”). Pr(B) = 0.1
C: DBP < 95
Pr(C) = Pr(A ∪ B) = Pr(A) + Pr(B) = 0.7 + 0.1 = 0.8, because A and B are mutually exclusive
17
Ā ("complement of A") means that A does not occur. (Def 3.6)
Pr(Ā) = 1 - Pr(A)
18
Independent events
• “A and B are independent if Pr(B) is not influenced by whether A has happened or not.”
• Def 3.7: A and B are independent if Pr(A ∩ B) = Pr(A) × Pr(B)
19
Example 3.15
Testing for syphilis
A = {Dr A makes a positive diagnosis}
B  = {Dr B makes a positive diagnosis}
Given that
Pr( A )  0,1
Pr( B  )  0,17
Pr( A  B  )  0, 08
Then
Pr( A  B  )  0, 08 > Pr( A )  Pr( B  )  0,1 0,17  0, 017
and the events are dependent (as expected)
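The independence check in Def 3.7 can be sketched as a small function. The name `are_independent` and the tolerance are assumptions made for illustration; with real data one compares estimates, not exact probabilities.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-9):
    """Return True if Pr(A ∩ B) equals Pr(A) * Pr(B) within a small tolerance."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tol)

# Example 3.15: two doctors diagnosing syphilis.
doctors_independent = are_independent(0.1, 0.17, 0.08)
# A hypothetical pair of events where the product rule holds exactly.
product_rule_holds = are_independent(0.1, 0.2, 0.02)
print(doctors_independent, product_rule_holds)
```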
20
The multiplication law of probability
(Equation 3.2)
• If A1, …, Ak are independent, then
Pr(A1 ∩ A2 ∩ ... ∩ Ak) = Pr(A1) × Pr(A2) × … × Pr(Ak)
21
The addition law of probability (Eq. 3.3)
• Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)
Rosner Fig. 3.5, p. 52
22
Example 3.13 and 3.17
A = {Mother’s DBP ≥ 95}, B = {Father’s DBP ≥ 95}
Pr(A) = 0.1, Pr(B) = 0.2. Assume independence.
What is the probability of being a “hypertensive family”?
Pr(A ∩ B) = Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02
What is the probability of at least one parent being hypertensive?
Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B) = 0.1 + 0.2 - 0.02 = 0.28
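The two calculations for the hypertensive family can be sketched as follows. The function names are hypothetical, and independence of the parents is assumed, as in the example.

```python
def prob_both(p_a, p_b):
    """Multiplication law for two independent events: Pr(A ∩ B) = Pr(A) * Pr(B)."""
    return p_a * p_b

def prob_either(p_a, p_b, p_a_and_b):
    """Addition law: Pr(A ∪ B) = Pr(A) + Pr(B) - Pr(A ∩ B)."""
    return p_a + p_b - p_a_and_b

p_mother, p_father = 0.1, 0.2
p_family = prob_both(p_mother, p_father)
p_at_least_one = prob_either(p_mother, p_father, p_family)
print(round(p_family, 4), round(p_at_least_one, 4))
```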
23
Addition theorem for 3 events
Consider three events A, B and C (the rule holds whether or not they are independent):
Pr(A ∪ B ∪ C) = Pr(A) + Pr(B) + Pr(C) - Pr(A ∩ B) - Pr(A ∩ C) - Pr(B ∩ C) + Pr(A ∩ B ∩ C)
[Figure: Venn diagram of events A, B and C within the sample space S]
24
Conditional probability – Aalen et al. (2006)
Population: 4 000 000
A: new cancer within 1 year: 15 000
B: age 70-79 years: 300 000
A ∩ B: aged 70-79 with new cancer within 1 year: 4 500
A = “This person develops cancer within 1 year”
B = “The person is 70-79 years old”
$$P(A) = \frac{15\,000}{4\,000\,000} \approx 0.38\%, \qquad P(B) = \frac{300\,000}{4\,000\,000}$$
$$P(A \mid B) = \frac{4\,500}{300\,000} = 1.5\%$$
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{4\,500 / 4\,000\,000}{300\,000 / 4\,000\,000}$$
25
Conditional probability - def 3.9
• Conditional probability of B given A:
• We “re-define” the sample space from S to A:
• Pr(B|A) = Pr(A ∩ B)/Pr(A)
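A minimal sketch of the definition, using the cancer counts from the Aalen et al. (2006) example with the roles of the events as on that slide (A = new cancer within 1 year, B = aged 70-79). The function and variable names are hypothetical.

```python
def conditional_probability(n_a_and_b, n_b):
    """P(A|B) from counts: the sample space is restricted to the n_b people in B."""
    return n_a_and_b / n_b

POPULATION = 4_000_000
N_CANCER = 15_000       # A: new cancer within 1 year
N_AGE_70_79 = 300_000   # B: aged 70-79
N_BOTH = 4_500          # A ∩ B

p_a = N_CANCER / POPULATION
p_a_given_b = conditional_probability(N_BOTH, N_AGE_70_79)
# Same result via the definition P(A ∩ B) / P(B).
p_via_definition = (N_BOTH / POPULATION) / (N_AGE_70_79 / POPULATION)
print(p_a, p_a_given_b, p_via_definition)
```

Both routes give the same number because the population size cancels in the ratio.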
26
Conditional probability and independence
A and B are independent if and only if (Eq. 3.5)
(1) Pr(B|A) = Pr(B)
Then also Pr(B|Ā) = Pr(B), and correspondingly for A|B.
(1) may be used as a definition of independence!
27
Example 3.20 (continuing Example 3.15)
Pr(B⁺|A⁺) = Pr(B⁺ ∩ A⁺)/Pr(A⁺) = 0.08/0.1 = 0.8 ≠ Pr(B⁺) = 0.17, so the events are dependent
Pr(B⁺|A⁻) = Pr(B⁺ ∩ A⁻)/Pr(A⁻)
Pr(B⁺) = Pr(B⁺ ∩ A⁺) + Pr(B⁺ ∩ A⁻), because the two events on the right are mutually exclusive, so
Pr(B⁺|A⁻) = (Pr(B⁺) - Pr(B⁺ ∩ A⁺))/Pr(A⁻) = (0.17 - 0.08)/0.9 = 0.1
28
Another look at problem 3.1 +++
A 2 by 2 table of 100 families:
                   Mother ill (A1)   Mother healthy   Total
Father ill (A2)    2                 8                10
Father healthy     8                 82               90
Total              10                90               100
Note the difference between (A1 ∩ A2) and (A1|A2):
(A1 ∩ A2) is defined on S (the entire sample space), while
(A1|A2) is defined with A2 as the sample space
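The difference can be made concrete with the counts from the table, reading it as 2 families in which both parents are ill and 10 families in which the father is ill, out of 100. The variable names are hypothetical.

```python
# Counts from the 2 by 2 table of 100 families (A1 = mother ill, A2 = father ill).
N_FAMILIES = 100
N_BOTH_ILL = 2      # A1 ∩ A2
N_FATHER_ILL = 10   # A2 (row total)

# Joint probability: both ill, relative to the whole sample space S.
p_joint = N_BOTH_ILL / N_FAMILIES
# Conditional probability: mother ill, relative to the families where the father is ill.
p_mother_given_father = N_BOTH_ILL / N_FATHER_ILL
print(p_joint, p_mother_given_father)
```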
29
Relative risk
Relative risk (RR) of B given A (Def 3.10):
$$RR = \frac{\Pr(B \mid A)}{\Pr(B \mid \bar{A})}$$
If A and B are independent, RR = 1 (by definition)
30
Relative risk - Example 3.19
A = {Positive mammography}
B = {Breast cancer the next 2 years}
$$\Pr(B \mid A) = 0.1, \qquad \Pr(B \mid \bar{A}) = 0.0002$$
$$RR = \frac{\Pr(B \mid A)}{\Pr(B \mid \bar{A})} = \frac{0.1}{0.0002} = 500$$
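The relative risk in the mammography example can be sketched as a one-line ratio. The function name and the zero-denominator check are assumptions added for illustration.

```python
def relative_risk(p_b_given_a, p_b_given_not_a):
    """RR = Pr(B|A) / Pr(B|not A)."""
    if p_b_given_not_a == 0:
        raise ValueError("Pr(B|not A) must be positive for RR to be defined")
    return p_b_given_a / p_b_given_not_a

rr_mammography = relative_risk(0.1, 0.0002)
print(round(rr_mammography))
```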
31
Dependent events (Example 3.14 →)
• A = {Mother’s DBP ≥ 95}, B = {First-born child’s DBP ≥ 95}
• Pr(A) = 0.1, Pr(B) = 0.2, Pr(A ∩ B) = 0.05 (known!)
• Pr(A) × Pr(B) = 0.1 × 0.2 = 0.02 ≠ Pr(A ∩ B), thus the events are dependent!
• Pr(B|A) = Pr(A ∩ B)/Pr(A) = 0.05/0.1 = 0.5 ≠ Pr(B)
32
Generalized Multiplication law of probability
(Eq 3.8)
• From the definition of conditional probability, we have: Pr(A ∩ B) = Pr(A) × Pr(B|A)
• In general: Pr(A1 ∩ A2 ∩ ... ∩ Ak) = Pr(A1) × Pr(A2|A1) × Pr(A3|A2 ∩ A1) × … × Pr(Ak|A(k-1) ∩ ... ∩ A2 ∩ A1)
33
Total-Probability Rule (Eq 3.7)
[Figure: the events A1, A2, ..., Ak and the event B]
$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i) \Pr(A_i)$$
34
Prevalence
• The prevalence of a disease equals the proportion of population that is
diseased (def 3.17)
• Example (Aalen, 1998):
– By 31 December 1995, 21 482 Norwegian women suffered from breast cancer.
– Total female population: 2 150 000
– Prevalence: 21 482 / 2 150 000 = 0.010 (≈ 1 %)
35
Incidence (or incidence rate)
• Incidence is a measure of the number of new cases occurring during
some time period (i.e. a rate)
• Example (Aalen, 1998):
– During 1995, a total of 2 154 Norwegian women were diagnosed with breast cancer
– Total female population: 2 150 000
– Incidence rate: 2 154 cases / (2 150 000 persons × 1 year) = 0.0010 cases per person per year
36
Prevalence of cataract - Example 3.22
We wish to determine the total prevalence of cataract in the population ≥ 60 years during the next 5 years. Age-specific prevalence is known.
A1 = {60-64 yrs}, A2 = {65-69 yrs}, A3 = {70-74 yrs}, A4 = {75+ yrs},
B = {cataract within 5 years}
Pr(A1) = 0.45, Pr(A2) = 0.28, Pr(A3) = 0.20, Pr(A4) = 0.07
Pr(B|A1) = 0.024, Pr(B|A2) = 0.046, Pr(B|A3) = 0.088, Pr(B|A4) = 0.153
$$\Pr(B) = \sum_{i=1}^{k} \Pr(B \mid A_i) \Pr(A_i) = 0.024 \times 0.45 + 0.046 \times 0.28 + 0.088 \times 0.20 + 0.153 \times 0.07 = 0.052$$
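The total-probability calculation for the cataract example can be sketched as a weighted sum. The function name and the check that the age-group shares sum to 1 are assumptions added for illustration.

```python
def total_probability(priors, conditionals):
    """Pr(B) = sum over i of Pr(B|Ai) * Pr(Ai), for a partition A1..Ak."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("Pr(Ai) must sum to 1 for a partition")
    return sum(p_b_given_a * p_a for p_a, p_b_given_a in zip(priors, conditionals))

age_group_share = [0.45, 0.28, 0.20, 0.07]        # Pr(A1)..Pr(A4)
cataract_by_group = [0.024, 0.046, 0.088, 0.153]  # Pr(B|A1)..Pr(B|A4)
p_cataract = total_probability(age_group_share, cataract_by_group)
print(round(p_cataract, 3))
```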
37
Example: Age-adjusted incidence of breast cancer, www.kreftregisteret.no
[Figure: Age-adjusted incidence rate 1954–99 (world std.), breast, females. Vertical axis: rate per 100 000 (0 to 80); horizontal axis: year of diagnosis (1954 to 1999).]
38
Bayes’ rule, diagnosis and screening
A = {symptom or positive diagnostic test}
B = {disease}
P(B) = disease prevalence
P(A|B) = sensitivity
P(A|B̄) = "false positive rate"
P(Ā|B̄) = specificity
P(A|B̄) + P(Ā|B̄) = 1 (why?)
⇒ P(A|B̄) = 1 - P(Ā|B̄) = 1 - specificity
P(B|A) = PPV = PV⁺ = positive predictive value
P(B̄|Ā) = NPV = PV⁻ = negative predictive value
[Figure: Venn diagram of A, B and B̄ within the sample space S]
39
Diagnosis of breast cancer (Example 3.23)
A = {pos. mammogram}
B = {breast cancer within 2 years}
Pr(B|Ā) = 0.0002 ⇒ Pr(B̄|Ā) = 1 - 0.0002 = 0.9998
i.e. NPV = PV⁻ = 0.9998
Pr(B|A) = 0.1 = PPV = PV⁺
40
Bayes’ rule
Definition (Rosner Eq. 3.9) Bayes’ rule/ theorem
Combines the expressions of conditional and total probability:
$$PPV = PV^{+} = \Pr(B \mid A) = \frac{\Pr(B \cap A)}{\Pr(A)} = \frac{\Pr(A \mid B) \Pr(B)}{\Pr(A \mid B) \Pr(B) + \Pr(A \mid \bar{B}) \Pr(\bar{B})}$$
We have found one conditional probability by means of the “opposite” or “inverse” conditional probability!
[Figure: Venn diagram of A, B and B̄ within the sample space S]
41
Bayes’ rule
Example (Rosner Example 3.26, p. 61)
Prevalence of hypertension = Pr(B) = 0.2. The auto-BP machine classifies 84 % of hypertensive patients and 23 % of normotensive patients as hypertensive.
PPV? NPV?
Pr(A|B) = 0.84 (sensitivity)
and Pr(A|B̄) = 0.23 ("false positive rate"),
i.e. specificity = Pr(Ā|B̄) = 1 - 0.23 = 0.77
42
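From Bayes' rule we have
$$PV^{+} = \Pr(B \mid A) = \frac{\Pr(A \mid B) \Pr(B)}{\Pr(A \mid B) \Pr(B) + \Pr(A \mid \bar{B}) \Pr(\bar{B})} = \frac{\text{sens} \times \text{prevalence}}{\text{sens} \times \text{prevalence} + (1 - \text{spec}) \times (1 - \text{prevalence})}$$
$$= \frac{0.84 \times 0.2}{0.84 \times 0.2 + 0.23 \times 0.8} = \frac{0.168}{0.352} = 0.48$$
and similarly
$$PV^{-} = \Pr(\bar{B} \mid \bar{A}) = \frac{\text{spec} \times (1 - \text{prevalence})}{\text{spec} \times (1 - \text{prevalence}) + (1 - \text{sens}) \times \text{prevalence}}$$
$$= \frac{0.77 \times 0.8}{0.77 \times 0.8 + 0.16 \times 0.2} = \frac{0.616}{0.648} = 0.95$$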
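The two predictive values can be sketched as a function of sensitivity, specificity and prevalence. The function name is hypothetical; the inputs are those of the blood pressure machine example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) computed with Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

ppv, npv = predictive_values(sensitivity=0.84, specificity=0.77, prevalence=0.2)
print(round(ppv, 2), round(npv, 2))
```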
43
Bayes’ rule.
Low prevalence – a paradox?
What if the prevalence is low?
Pr(B) = 0.0001
P(A|B) = 0.84 (sensitivity)
P(Ā|B̄) = 0.77 (specificity)
Then
$$PPV = \frac{0.84 \times 0.0001}{0.84 \times 0.0001 + (1 - 0.77)(1 - 0.0001)} = 0.00037$$
$$NPV = \frac{0.77 \times (1 - 0.0001)}{0.77 \times (1 - 0.0001) + (1 - 0.84) \times 0.0001} = 0.99998$$
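How strongly the PPV depends on prevalence can be seen by evaluating Bayes' rule over a range of prevalences while holding the test fixed. This is a sketch: the intermediate prevalences 0.01 and 0.001 are arbitrary illustration values, and the function name is hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

SENS, SPEC = 0.84, 0.77
# Same test, decreasing prevalence: the PPV falls towards zero.
for prevalence in (0.2, 0.01, 0.001, 0.0001):
    print(prevalence, round(ppv(SENS, SPEC, prevalence), 5))
```

With a fixed false positive rate, a rare disease means that almost all positive tests come from healthy people.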
44
Bayes’ rule, diagnosis and screening
Traditional 2*2 table
                 Illness +   Illness –   Total
Test result +    a [TP]      b [FP]      a+b
Test result –    c [FN]      d [TN]      c+d
Total            a+c         b+d         a+b+c+d
A = {test positive}, B = {illness}, TP = true positive, FP = false positive,
FN = false negative, TN = true negative
45
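$$\text{Prevalence} = P(B) = \frac{a + c}{a + b + c + d}$$
$$\text{Sensitivity} = P(A \mid B) = \frac{a}{a + c}$$
$$\text{Specificity} = P(\bar{A} \mid \bar{B}) = \frac{d}{b + d}$$
$$PPV = P(B \mid A) = \frac{a}{a + b}$$
$$NPV = P(\bar{B} \mid \bar{A}) = \frac{d}{c + d}$$
$$\text{Accuracy} = \frac{a + d}{a + b + c + d}$$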
46
Using a 2*2 table requires us to “invent” patients in order to calculate PPV etc. …!
With Bayes’ rule this information is utilised directly.
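To show what "inventing" patients means, the sketch below fills a 2*2 table for 1000 hypothetical patients from the blood pressure machine example (prevalence 0.2, sensitivity 0.84, false positive rate 0.23) and reads the measures off the cell counts. The class name and the choice of 1000 patients are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Cell counts of a diagnostic 2 by 2 table."""
    a: int  # true positives (TP)
    b: int  # false positives (FP)
    c: int  # false negatives (FN)
    d: int  # true negatives (TN)

    @property
    def total(self):
        return self.a + self.b + self.c + self.d

    def prevalence(self):
        return (self.a + self.c) / self.total

    def sensitivity(self):
        return self.a / (self.a + self.c)

    def specificity(self):
        return self.d / (self.b + self.d)

    def ppv(self):
        return self.a / (self.a + self.b)

    def npv(self):
        return self.d / (self.c + self.d)

    def accuracy(self):
        return (self.a + self.d) / self.total

# 1000 invented patients: 200 hypertensive (84 % test positive),
# 800 normotensive (23 % test positive).
table = TwoByTwo(a=168, b=184, c=32, d=616)
print(round(table.ppv(), 2), round(table.npv(), 2))
```

The PPV and NPV from the table agree with the values obtained directly from Bayes' rule, whatever number of patients is invented.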
Diagnostics/ ROC
Rosner Tables 3.2 and 3.3, pp. 63–64
Criterion “1+”: all rated 1 to 5 are diagnosed as abnormal.
We find all the diseased, but identify none as healthy.
Sensitivity = 1, specificity = 0, ‘false positive rate’ = 1.
47
Diagnostics/ ROC
Criterion “2+”: all rated 2 to 5 are diagnosed as abnormal.
We find 48/51 diseased, and identify 33/58 as healthy.
Sensitivity = 0.94, specificity = 0.57, ‘false positive rate’ = 0.43
48
Diagnostics/ ROC
Criterion “3+”: all rated 3 to 5 are diagnosed as abnormal.
We find 46/51 diseased, and identify 39/58 as healthy.
Sensitivity = 0.90, specificity = 0.67, ‘false positive rate’ = 0.33
49
Diagnostics/ ROC
Criterion “4+”: all rated 4 and 5 are diagnosed as abnormal.
We find 44/51 diseased, and identify 45/58 as healthy.
Sensitivity = 0.86, specificity = 0.78, ‘false positive rate’ = 0.22
50
Diagnostics/ ROC
Criterion “5+”: all rated 5 are diagnosed as abnormal.
We find 33/51 diseased, and identify 56/58 as healthy.
Sensitivity = 0.65, specificity = 0.97, ‘false positive rate’ = 0.03
51
Diagnostics/ ROC
Criterion “6+”: all rated > 5 are diagnosed as abnormal (nonsense!).
We find no diseased and identify everybody as healthy.
Sensitivity = 0, specificity = 1, ‘false positive rate’ = 0
52
Diagnostics/ ROC
(receiver operating characteristic)
The result is summarized as a table (Rosner Table 3.3, p. 64):
Criterion   Sensitivity   ‘False pos. rate’
1+          1             1
2+          0.94          0.43
3+          0.90          0.33
4+          0.86          0.22
5+          0.65          0.03
6+          0             0
… and shown as a ROC curve (Rosner Fig. 3.7, p. 64). “Cut-off” values may be decided from visual inspection.
53
Area under the ROC curve
• Summarizes overall diagnostic performance
• Corresponds to the probability that a randomly chosen diseased patient is rated as more abnormal than a randomly chosen healthy patient
• Equals 1 for a perfect test
• Equals 0.5 for a non-informative test
• Equals 0.89 in the example
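The area can be approximated from the six (false positive rate, sensitivity) pairs of the criteria “1+” to “6+”. The sketch below assumes the trapezoidal rule, that is, straight lines between the points; the slides do not state how the value for the example was computed, and the function name is hypothetical.

```python
# (false positive rate, sensitivity) for criteria 6+, 5+, 4+, 3+, 2+, 1+.
ROC_POINTS = [(0.0, 0.0), (0.03, 0.65), (0.22, 0.86),
              (0.33, 0.90), (0.43, 0.94), (1.0, 1.0)]

def auc_trapezoid(points):
    """Area under a piecewise-linear curve through points sorted by x."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

auc = auc_trapezoid(ROC_POINTS)
print(round(auc, 2))
```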
54