bolt.mph.ufl.edu

1
DATASET INTRODUCTION
2
Dataset: Urine
From Cleveland Clinic 1981-1984
3
Outcome Variable:
Categorical Variable
• Calcium Oxalate Crystal Presence
• In this analysis, this variable will be our
  • Outcome Variable
  • Response Variable
  • Dependent Variable
• Note: The dataset codes this variable directly as Yes/No (not 0/1)
4
Other Variables (Covariates)
Quantitative Variables
• Specific Gravity
• pH
• Osmolarity
• Conductivity
• Urea Concentration (millimoles/liter)
• Calcium Concentration (millimoles/liter)
• Cholesterol: serum cholesterol levels
5
Discussion/Review
• Purpose of dataset: Determine which of the covariates are related to the outcome. Covariates can also be called
  • Independent Variables
  • Predictors
  • Explanatory Variables
• Outcomes and covariates can be categorical or quantitative
• A given study can have more than one outcome and many covariates, with any mixture of variable types
6
Calcium Concentration (millimoles/liter) by Calcium Oxalate Crystal Presence
Crystal Presence   N    Mean   Std Dev   Min    Q1     Median   Q3     Max
No                 42   2.69   1.90      0.17   1.22   2.16     3.93    8.48
Yes                31   5.92   3.59      0.27   3.10   6.19     7.82   14.34
7
Discussion
• In this sample, those with calcium oxalate crystals present tend to have higher calcium concentrations (mean 5.92 versus 2.69 millimoles/liter; median 6.19 versus 2.16)
• Later we will learn to conduct hypothesis tests in such situations
• For now, we use these data to illustrate concepts of probability
8
Comments
• To facilitate our discussion of probability and classification tests, we will categorize the quantitative variable Calcium Concentration into four groups:
  1 = 0-1.99
  2 = 2-4.99
  3 = 5-7.99
  4 = 8 or More
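The grouping rule above is easy to automate. Below is a minimal Python sketch; the function name `calcium_group` is hypothetical, and it assumes that a value falling between the printed boundaries (for example 1.995) belongs to the lower group, so the cut-points are applied as "less than 2", "less than 5" and "less than 8".

```python
def calcium_group(conc: float) -> int:
    """Return the group code (1-4) for a calcium concentration in millimoles/liter.

    Assumption: boundaries are applied as < 2, < 5 and < 8, so a value
    such as 1.995 is placed in group 1 (0-1.99).
    """
    if conc < 0:
        raise ValueError("concentration cannot be negative")
    if conc < 2:
        return 1  # 0-1.99
    if conc < 5:
        return 2  # 2-4.99
    if conc < 8:
        return 3  # 5-7.99
    return 4      # 8 or More


# A few values quoted in the summary table (minimums, medians, maximums)
for value in (0.17, 2.16, 6.19, 8.48, 14.34):
    print(value, "->", calcium_group(value))
```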
9
BASIC PROBABILITY
Part 1 (Unconditional Probability using Logic)
10
Back to the Urine Dataset
• Suppose one individual is selected from our sample, and consider the following questions:
  • What is the probability that the individual has calcium oxalate crystals present?
  • What is the probability that the individual has a calcium concentration of 5 or more?
  • What is the probability that the individual has calcium oxalate crystals present AND has a calcium concentration of 5 or more?
  • What is the probability that the individual has calcium oxalate crystals present OR has a calcium concentration of 5 or more?
11
Comments
• All four of these probability questions relate to the ENTIRE SAMPLE
• We begin by answering the questions logically from the table we created using software
12
Let’s Practice!
Basic Probability of an Event
• What is the probability that the individual has calcium oxalate
crystals present? We will denote this event by A.
• P(A) = PREVALENCE of calcium oxalate crystals in our sample
Table of group by r
Frequency, group (Calcium Concentration Group) by r (Calcium Oxalate Crystal Presence)

Calcium Concentration Group   No   Yes   Total
0-1.99                        19     4      23
2-4.99                        17     9      26
5-7.99                         5    11      16
8 or More                      1     7       8
Total                         42    31      73
13
Let’s Practice!
Basic Probability of an Event
• What is the probability that the individual has a calcium
concentration of 5 or more? We will denote this event by B.
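Both answers so far are a count divided by the sample size. A minimal Python sketch, assuming the frequency table is typed in by hand as a dictionary (the names `TABLE`, `HIGH` and `N` are illustrative, not from the course software), computes P(A) and P(B) as exact fractions:

```python
from fractions import Fraction

# (No, Yes) counts per calcium concentration group, from the frequency table
TABLE = {
    "0-1.99": (19, 4),
    "2-4.99": (17, 9),
    "5-7.99": (5, 11),
    "8 or More": (1, 7),
}
HIGH = ("5-7.99", "8 or More")  # groups that make up event B (5 or more)
N = sum(no + yes for no, yes in TABLE.values())  # total sample size, 73

# A: calcium oxalate crystals present -> total of the Yes column
p_a = Fraction(sum(yes for _, yes in TABLE.values()), N)
# B: calcium concentration of 5 or more -> row totals of the two high groups
p_b = Fraction(sum(sum(TABLE[g]) for g in HIGH), N)

print(f"P(A) = {p_a} = {float(p_a):.3f}")  # P(A) = 31/73 = 0.425
print(f"P(B) = {p_b} = {float(p_b):.3f}")  # P(B) = 24/73 = 0.329
```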
14
Let’s Practice!
Basic Probability of an Event: Intersections
• What is the probability the individual has calcium oxalate crystals
present AND has a calcium concentration of 5 or more?
15
Let’s Practice!
Basic Probability of an Event: Unions
• What is the probability the individual has calcium oxalate crystals
present OR has a calcium concentration of 5 or more?
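Answering the AND and OR questions by logic means choosing which cells of the table to add up. The Python sketch below, using the same hand-entered counts (the names are illustrative), makes that cell selection explicit:

```python
from fractions import Fraction

# (No, Yes) counts per calcium concentration group, from the frequency table
TABLE = {
    "0-1.99": (19, 4),
    "2-4.99": (17, 9),
    "5-7.99": (5, 11),
    "8 or More": (1, 7),
}
HIGH = ("5-7.99", "8 or More")  # event B: calcium concentration of 5 or more
N = sum(no + yes for no, yes in TABLE.values())  # 73

# A and B: Yes cells of the two high rows (11 + 7)
both = sum(TABLE[g][1] for g in HIGH)
# A or B: every Yes individual, plus the No individuals in the high rows (5 + 1)
either = sum(yes for _, yes in TABLE.values()) + sum(TABLE[g][0] for g in HIGH)

p_and = Fraction(both, N)
p_or = Fraction(either, N)
print(f"P(A and B) = {p_and} = {float(p_and):.3f}")  # 18/73 = 0.247
print(f"P(A or B)  = {p_or} = {float(p_or):.3f}")    # 37/73 = 0.507
```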
16
USING PROBABILITY RULES
Part 1
17
Probability Rules
• Rules are created and used for many reasons
• The rules and properties stated previously are important and useful in probability and sometimes in statistics
• They are not always needed
  • If you can determine the answer through logic alone, you may not need a rule!
  • If you are provided only pieces of the puzzle, sometimes a rule is faster than logic!
18
Continuing
• We now illustrate a few formulas using the questions we have already answered using logic
19
Let’s Practice Again!
Complement Rule
• What is the probability that the individual DOES NOT have calcium
oxalate crystals present?
• We could use logic and count the No’s instead of the Yes’s; however, knowing P(Yes) = P(A), the complement rule gives P(not A) = 1 - P(A).
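The complement rule and the "count the No’s" approach must agree. A short Python check, assuming only the column totals of the frequency table (31 Yes, 42 No):

```python
from fractions import Fraction

# Crystal presence column totals from the frequency table
n_yes, n_no = 31, 42
n = n_yes + n_no  # 73

p_a = Fraction(n_yes, n)                 # P(A) = P(Yes)
p_not_a = 1 - p_a                        # complement rule
p_not_a_by_counting = Fraction(n_no, n)  # logic: count the No's instead

print(p_not_a, p_not_a == p_not_a_by_counting)  # 42/73 True
```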
20
Let’s Practice Again!
Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals
present OR has a calcium concentration of 5 or more?
21
Let’s Practice Again!
Addition Rule (Unions)
• What is the probability the individual has calcium oxalate crystals
present OR has a calcium concentration of 5 or more?
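With the general addition rule, P(A or B) = P(A) + P(B) - P(A and B), the overlap is subtracted once so the 18 people in both events are not counted twice. A Python sketch using the probabilities found earlier by counting:

```python
from fractions import Fraction

N = 73
p_a = Fraction(31, N)        # crystals present
p_b = Fraction(24, N)        # calcium concentration of 5 or more
p_a_and_b = Fraction(18, N)  # both (11 + 7 Yes individuals in the high groups)

# General addition rule: remove the double-counted overlap
p_a_or_b = p_a + p_b - p_a_and_b
print(f"P(A or B) = {p_a_or_b} = {float(p_a_or_b):.3f}")  # 37/73 = 0.507

# Adding P(A) and P(B) alone counts the overlap twice
print(f"P(A) + P(B) = {p_a + p_b}")  # 55/73
```

The result matches the 37/73 obtained by counting cells directly.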
22
INDEPENDENCE
Part 1
23
Independent Events
• Two events are independent if knowing that one event occurs does not change the probability of the other
• This is not the same as “disjoint” events, which are separate in that they cannot occur together
• These are two entirely different concepts
• Independence is a statement about the equality of the probability of one event whether or not the other event occurs (or is occurring, or has occurred)
24
Let’s Practice!
Investigating Independence Part 1
We know the following from our sample
25
Let’s Practice!
Investigating Independence Part 1
• From our sample we have:
• This is clearly not equal to 0.247!!
• In our sample the events are dependent (we can test this hypothesis about the population later)
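One standard way to check independence in a two-way table is to compare P(A and B) with the product P(A)P(B), which must be equal if A and B are independent. The Python sketch below assumes this is the comparison on these slides, with the observed P(A and B) = 18/73 ≈ 0.247:

```python
from fractions import Fraction

N = 73
p_a = Fraction(31, N)        # crystals present
p_b = Fraction(24, N)        # calcium concentration of 5 or more
p_a_and_b = Fraction(18, N)  # observed: both events

# Under independence we would expect P(A and B) = P(A) * P(B)
expected = p_a * p_b
print(f"P(A)P(B)   = {float(expected):.3f}")   # 0.140
print(f"P(A and B) = {float(p_a_and_b):.3f}")  # 0.247
print("independent in this sample:", expected == p_a_and_b)  # False
```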
26
BASIC PROBABILITY
Part 2: Conditional Probability (Logic & Formula)
27
Conditional Probability
• So far, we have divided by the TOTAL
• Sometimes, however, we have additional CONDITIONS that cause us to alter the denominator (bottom) of our probability calculation
• Suppose, when choosing one person from the Urine data, we ask:
  • Given the individual has Calcium Oxalate Crystals present, what is the probability the individual’s calcium concentration is 5 or above?
• “Conditional” refers to the fact that we have these additional conditions, restrictions, or other information
28
Let’s Practice!
CONDITIONAL Probability of an Event
• Given the individual has Calcium Oxalate Crystals present, what is
the probability the individual’s calcium concentration is 5 or above?
29
Let’s Practice!
CONDITIONAL Probability FORMULA
• Given the individual has Calcium Oxalate Crystals present, what is
the probability the individual’s calcium concentration is 5 or above?
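The logic answer restricts attention to the Yes column, while the formula P(B|A) = P(A and B)/P(A) divides two probabilities whose common denominator of 73 cancels. A Python sketch showing that both routes give the same value:

```python
from fractions import Fraction

N = 73
n_a = 31        # Yes column total: crystals present
n_a_and_b = 18  # crystals present AND concentration 5 or more (11 + 7)

# Logic: only the 31 individuals with crystals are eligible
by_logic = Fraction(n_a_and_b, n_a)

# Formula: P(B|A) = P(A and B) / P(A)
by_formula = Fraction(n_a_and_b, N) / Fraction(n_a, N)

print(by_logic, by_formula, f"{float(by_logic):.3f}")  # 18/31 18/31 0.581
```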
30
Let’s Practice!
CONDITIONAL Probability of an Event
• Given the individual DOES NOT HAVE Calcium Oxalate Crystals
present, what is the probability the individual’s calcium
concentration is 5 or above?
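Conditioning on "no crystals" restricts the denominator to the No column instead. A minimal Python sketch, assuming the No column counts are entered by hand:

```python
from fractions import Fraction

# No column of the frequency table, by calcium concentration group
no_counts = {"0-1.99": 19, "2-4.99": 17, "5-7.99": 5, "8 or More": 1}

n_not_a = sum(no_counts.values())                              # 42 without crystals
n_not_a_and_b = no_counts["5-7.99"] + no_counts["8 or More"]   # 6 of them at 5 or more

p_b_given_not_a = Fraction(n_not_a_and_b, n_not_a)
print(p_b_given_not_a, f"{float(p_b_given_not_a):.3f}")  # 1/7 0.143
```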
31
MORE PRACTICE
Conditional Probability
32
Let’s Verify!
CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above,
what is the probability the individual has calcium oxalate crystals?
• We have a small amount of rounding error this time
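By counting, the exact answer is 18/24 = 0.75. One plausible source of the small rounding error is evaluating P(A|B) = P(A and B)/P(B) with probabilities that were already rounded to three decimals; the Python sketch below assumes that is what happened and prints both results:

```python
from fractions import Fraction

N = 73
p_a_and_b = Fraction(18, N)  # crystals present AND concentration 5 or more
p_b = Fraction(24, N)        # concentration 5 or more

exact = p_a_and_b / p_b  # 18/24 = 3/4
# Same formula with inputs rounded to three decimals first: 0.247 / 0.329
rounded = round(float(p_a_and_b), 3) / round(float(p_b), 3)

print(exact, float(exact))  # 3/4 0.75
print(f"{rounded:.4f}")     # 0.7508
```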
33
INDEPENDENCE
Part 2
34
Let’s Practice!
Investigating Independence Part 2
We know the following from our sample
35
Comments
Investigating Independence Part 2
• These probabilities are clearly unequal in our sample; our eventual question is whether this is also true in our population
• In this sample, these events are dependent
• From our analysis so far, it seems likely that they are also dependent in our population (we can test this later)
• Knowing whether or not the person has calcium oxalate crystals present CHANGES the probability of having a calcium concentration of 5 or above!!
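Independence would require P(B|A), P(B|not A) and P(B) to all be equal. A Python sketch comparing them, assuming each level of A is summarized by hand as (number with B, row total):

```python
from fractions import Fraction

# (number with concentration 5 or more, total) for each level of A
crystals_yes = (18, 31)  # crystals present: 11 + 7
crystals_no = (6, 42)    # crystals absent: 5 + 1

p_b_given_a = Fraction(*crystals_yes)
p_b_given_not_a = Fraction(*crystals_no)
p_b = Fraction(crystals_yes[0] + crystals_no[0], crystals_yes[1] + crystals_no[1])

# If A and B were independent, all three of these would be equal
print(f"P(B|A)     = {float(p_b_given_a):.3f}")      # 0.581
print(f"P(B|not A) = {float(p_b_given_not_a):.3f}")  # 0.143
print(f"P(B)       = {float(p_b):.3f}")              # 0.329
print("equal in this sample:", p_b_given_a == p_b_given_not_a)  # False
```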
36
GENERAL MULTIPLICATION RULE
37
General Multiplication Rule
• This formula comes from rearranging the definition of conditional probability
• To achieve the second formulation on the right, consider the formula for P(A|B) instead and note that the numerator is unchanged
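The formulas on these slides did not survive extraction. Assuming they show the standard general multiplication rule, the rearrangement of the conditional probability definition, checked with the Urine counts (A = crystals present, B = calcium concentration of 5 or more), is:

```latex
P(B \mid A) = \frac{P(A \cap B)}{P(A)}
\quad\Longrightarrow\quad
P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)

\frac{31}{73}\cdot\frac{18}{31} \;=\; \frac{18}{73} \;=\; \frac{24}{73}\cdot\frac{18}{24}
```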
38
General Multiplication Rule
39
REPEATED SAMPLING
40
Repeated Sampling
• Often we consider problems in which we draw multiple individuals from a set of individuals
  • Drawing parts from a box where some are defective
  • Choosing multiple people from a certain population
• The formulas we have investigated can be used to calculate probabilities in these situations
41
Let’s Practice!
• If we select two subjects at random from our sample, what is the probability that both have a calcium concentration of 8 or more?
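Assuming the two subjects are different people (sampling without replacement), the general multiplication rule gives P(first is 8 or more) times P(second is 8 or more | first was). The Python sketch below checks this against counting unordered pairs with `math.comb`, and shows the with-replacement answer only for contrast:

```python
from fractions import Fraction
from math import comb

n_total = 73  # individuals in the sample
n_high = 8    # calcium concentration of 8 or more (1 No + 7 Yes)

# Without replacement: P(1st high) * P(2nd high | 1st high)
without_repl = Fraction(n_high, n_total) * Fraction(n_high - 1, n_total - 1)

# Same answer by counting unordered pairs
by_counting = Fraction(comb(n_high, 2), comb(n_total, 2))

# With replacement the two draws are independent, so the probabilities multiply
with_repl = Fraction(n_high, n_total) ** 2

print(without_repl, by_counting, f"{float(without_repl):.4f}")  # 7/657 7/657 0.0107
print(f"{float(with_repl):.4f}")  # 0.0120
```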
42
WANT TO LEARN MORE? READ THE FOLLOWING OPTIONAL MATERIAL
The remaining slides are optional. They illustrate some more difficult probability rules along with additional examples of probability related to the health sciences.
43
Optional Content: Read About
• Relative Risk
• Total Probability Rule
• Bayes’ Rule
• Screening Tests
  • Sensitivity/Specificity
  • PV+/PV-
  • False Positive and False Negative Rates
• ROC Curves
44
Relative Risk
• Relative risk is
  • the risk of an “event” relative to an “exposure”
  • the ratio of the probability of the event occurring among the “exposed” versus the “non-exposed”
  • If A and B are independent, the relative risk is 1
• In our rule, RR = P(B|A) / P(B|not A), where B is the EVENT and A is the EXPOSURE
45
Let’s Practice!
• Find the Relative Risk of High Calcium Concentration Given Calcium Oxalate Crystal Presence
  • Note: this is the reverse of what we probably want in this case; consider that for more practice!
  • INTERPRET RR: Having a calcium concentration of 5 or more is around 4 times as likely among those with calcium oxalate crystals as among those without (RR = 126/31 ≈ 4.06).
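The relative risk is the ratio of the two conditional probabilities computed earlier. A Python sketch with exact fractions, where B (calcium concentration of 5 or more) is the event and A (crystals present) is the exposure:

```python
from fractions import Fraction

risk_exposed = Fraction(18, 31)   # P(B | A): 18 of 31 with crystals
risk_unexposed = Fraction(6, 42)  # P(B | not A): 6 of 42 without crystals

rr = risk_exposed / risk_unexposed
print(rr, f"{float(rr):.2f}")  # 126/31 4.06
```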
46
Total Probability Rule
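The slide’s formula did not survive extraction. Assuming it is the standard two-event form, P(B) = P(B|A)P(A) + P(B|not A)P(not A), the Python sketch below rebuilds P(B) for the Urine data from the conditional probabilities:

```python
from fractions import Fraction

p_a = Fraction(31, 73)             # P(A): crystals present
p_b_given_a = Fraction(18, 31)     # P(B | A)
p_b_given_not_a = Fraction(6, 42)  # P(B | not A)

# Split B into the part inside A and the part outside A, then add
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
print(p_b)  # 24/73, the same as counting 24 of 73 with concentration 5 or more
```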
47
Bayes’ Rule
• We want to find P(A|B), so we will need to “rearrange” the formula, swapping A’s and B’s
48
Bayes’ Rule
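Bayes’ rule combines the multiplication rule in the numerator with the total probability rule in the denominator. A minimal Python sketch; the helper name `bayes` is hypothetical:

```python
from fractions import Fraction


def bayes(p_a: Fraction, p_b_given_a: Fraction, p_b_given_not_a: Fraction) -> Fraction:
    """Return P(A|B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|not A)P(not A)]."""
    numerator = p_b_given_a * p_a
    return numerator / (numerator + p_b_given_not_a * (1 - p_a))


p_a_given_b = bayes(Fraction(31, 73), Fraction(18, 31), Fraction(6, 42))
print(p_a_given_b)  # 3/4, matching 18 of the 24 with concentration 5 or more
```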
49
Let’s Verify!
CONDITIONAL Probability of an Event
• Given the individual has a calcium concentration of 5 or above,
what is the probability the individual has calcium oxalate crystals?
• We have a small amount of rounding error this time
50
SCREENING TESTS
and ROC Curves
51
Screening Tests
52
Sensitivity & Specificity
“Epi” Style
                 Has Condition     Does not have Condition   Total
Test Positive    A (TP)            B (FP)                    Total Positive Test (A+B)
Test Negative    C (FN)            D (TN)                    Total Negative Test (C+D)
Total            Number with       Number without
                 Condition (A+C)   Condition (B+D)
53
Sensitivity & Specificity
Frequency, group (Calcium Concentration Group) by r (Calcium Oxalate Crystal Presence)

Calcium Concentration Group   Yes (Has Condition)   No (Does not have Condition)   Total
0-1.99                          4                   19                             23
2-4.99                          9                   17                             26
5-7.99                         11                    5                             16
8 or More                       7                    1                              8
Total                          31                   42                             73

Cut-point: 2 or more = POSITIVE, 0-1.99 = NEGATIVE
                       Has Condition   Does not have Condition
NEGATIVE (0-1.99)       4              19
POSITIVE (2 or more)   27              23
Total                  31              42
54
Sensitivity & Specificity
Cut-point: 5 or more = POSITIVE, 0-4.99 = NEGATIVE
                       Has Condition   Does not have Condition
NEGATIVE (0-4.99)      13              36
POSITIVE (5 or more)   18               6
Total                  31              42
55
Sensitivity & Specificity
Cut-point: 8 or more = POSITIVE, 0-7.99 = NEGATIVE
                       Has Condition   Does not have Condition
NEGATIVE (0-7.99)      24              41
POSITIVE (8 or more)    7               1
Total                  31              42
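In the “Epi” layout, sensitivity = A/(A+C) = TP/(TP+FN) and specificity = D/(B+D) = TN/(TN+FP). The Python sketch below collapses the four calcium groups into a 2x2 table for each cut-point; it assumes the cut-point is one of the group boundaries (2, 5 or 8), and the names `GROUPS` and `sens_spec` are hypothetical:

```python
from fractions import Fraction
from typing import Tuple

# (lower bound of group, Yes = has condition, No = does not have condition)
GROUPS = [(0, 4, 19), (2, 9, 17), (5, 11, 5), (8, 7, 1)]


def sens_spec(cut: float) -> Tuple[Fraction, Fraction]:
    """Sensitivity and specificity when 'cut or more' is a positive test.

    `cut` should be one of the group boundaries (2, 5 or 8).
    """
    tp = sum(yes for lo, yes, _ in GROUPS if lo >= cut)  # A: has condition, test +
    fp = sum(no for lo, _, no in GROUPS if lo >= cut)    # B: no condition, test +
    fn = sum(yes for lo, yes, _ in GROUPS if lo < cut)   # C: has condition, test -
    tn = sum(no for lo, _, no in GROUPS if lo < cut)     # D: no condition, test -
    return Fraction(tp, tp + fn), Fraction(tn, tn + fp)  # A/(A+C), D/(B+D)


for cut in (2, 5, 8):
    sens, spec = sens_spec(cut)
    print(f"{cut} or more: sensitivity {float(sens):.2f}, specificity {float(spec):.2f}")
```

The printed values reproduce the rounded cut-point table used on the following slides.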
56
Bayes’ Rule
Here we define:
A = Disease
B = Test Positive
                      Has Condition   Does not have Condition
Negative (0-7.99)     24              41
Positive (≥8)          7               1
Total                 31              42
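With A = Disease and B = Test Positive, Bayes’ rule gives the positive predictive value PV+ = P(A|B). The Python sketch below computes PV+ through Bayes’ rule (prevalence, sensitivity and specificity) and confirms it by direct counting, along with PV- = P(not A | not B). It assumes the false positive and false negative rates mean 1 - specificity and 1 - sensitivity, matching the ROC axis label “False Positive Rate (1-Specificity)”; some texts define these rates differently.

```python
from fractions import Fraction

# 2x2 counts for the "8 or more is positive" cut-point
tp, fp = 7, 1    # Positive (>= 8): has condition, does not have condition
fn, tn = 24, 41  # Negative (0-7.99)

n = tp + fp + fn + tn
prevalence = Fraction(tp + fn, n)    # P(A) = 31/73
sensitivity = Fraction(tp, tp + fn)  # P(B | A)
specificity = Fraction(tn, tn + fp)  # P(not B | not A)

# Bayes' rule: PV+ = P(A | B)
pv_pos = sensitivity * prevalence / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)
assert pv_pos == Fraction(tp, tp + fp)  # direct counting gives the same value
pv_neg = Fraction(tn, tn + fn)          # P(not A | not B)

print(pv_pos, f"{float(pv_pos):.3f}")  # 7/8 0.875
print(pv_neg, f"{float(pv_neg):.3f}")  # 41/65 0.631
print("false positive rate (1 - specificity):", 1 - specificity)  # 1/42
print("false negative rate (1 - sensitivity):", 1 - sensitivity)  # 24/31
```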
57
Choosing Different Cut-Off
High Sensitivity but Low Specificity (cut-point 2 or more)
Cut-point    Sensitivity   Specificity
2 or more    0.87          0.45
5 or more    0.58          0.86
8 or more    0.23          0.98
58
Choosing Different Cut-Off
Specificity Increased, but You Reduce Sensitivity (cut-point 5 or more: sensitivity 0.58, specificity 0.86)
59
Choosing Different Cut-Off
Very High Specificity but Very Low Sensitivity, i.e. a High False Negative Rate (cut-point 8 or more: sensitivity 0.23, specificity 0.98)
60
What happens when
• We assign all individuals a positive test result?
  • Sensitivity = P(Test+|Disease) = 1
  • Specificity = P(Test-|No Disease) = 0
  • 1 - Specificity = 1
• We assign all individuals a negative test result?
  • Sensitivity = P(Test+|Disease) = 0
  • Specificity = P(Test-|No Disease) = 1
  • 1 - Specificity = 0
61
Receiver Operating Characteristic curve (ROC curve)
[Figure: ROC Curve for Calcium Oxalate. True Positive Rate (Sensitivity) plotted against False Positive Rate (1-Specificity), with points labeled by the cut-points 2, 5 and 8 from the sensitivity/specificity table]
ROC Curves
• Area under the curve = probability that, for a randomly selected pair of normal and abnormal subjects, the test will correctly identify the normal subject given the “measurement”
• Area = 0.89 for the example on the left
62
63
Trapezoidal Rule (FYI)
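The trapezoidal rule approximates the area under an ROC curve by joining consecutive points with straight lines and adding up the trapezoids underneath. The Python sketch below is an illustration under stated assumptions: it uses the exact counts behind the rounded calcium cut-point table, adds the two extreme rules from the “What happens when” slide (everyone negative at (0, 0), everyone positive at (1, 1)), and has only three interior points, so the curve is coarse. `trapezoid_auc` is a hypothetical helper name.

```python
from fractions import Fraction

# (false positive rate, true positive rate) = (1 - specificity, sensitivity),
# sorted by false positive rate
points = [
    (Fraction(0), Fraction(0)),            # everyone negative
    (Fraction(1, 42), Fraction(7, 31)),    # 8 or more
    (Fraction(6, 42), Fraction(18, 31)),   # 5 or more
    (Fraction(23, 42), Fraction(27, 31)),  # 2 or more
    (Fraction(1), Fraction(1)),            # everyone positive
]


def trapezoid_auc(pts):
    """Area under the piecewise-linear curve through points sorted by x."""
    area = Fraction(0)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width times average height
    return area


auc = trapezoid_auc(points)
print(auc, f"{float(auc):.3f}")  # 1999/2604 0.768
```

For this coarse three-cut-point curve the area comes out to about 0.77; more cut-points would trace the curve more finely.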