Back to Basics, 2017
POPULATION HEALTH
Session 1:
Critical Appraisal,
Epidemiology Methods,
Biostatistics
Ian McDowell
School of Public Health
[email protected]
(Based on slides by Dr. N. Birkett)
March 2017
THE PLAN
• 4-hour session; aim to spend about 3 hours on lecture
– Review MCQs in the remaining time
• A 10-minute break about half-way through
• Please interrupt for questions, etc. if anything isn't clear.
– The goal is to help you, not to cover a fixed curriculum.
LMCC Objective 78.2 (abbreviated)
78.2: ASSESSING AND MEASURING HEALTH STATUS AT
THE POPULATION LEVEL
• Describe the health status of a defined population.
• Measure and record the factors that affect the health status of
a population with respect to the principles of causation
– Critically evaluate possible sources of data …
– Know how to access and collect health information to
describe the health of a population …
– Analyze population health data using appropriate measures
– Interpret and present the analysis of health status indicator
– Apply the principles of epidemiology by accurately
discussing the implications of the measures …
http://mcc.ca/wp-content/uploads/Qualifying-examination-objectives.pdf
(see page 208)
Structure
• Epidemiology & biostatistics methods are the basic science that underpins medical & health services research
• The MCC does not test basic sciences, but questions cover their application in practice; the emphasis is on critical appraisal of information
• Your daily clinical application is via Evidence-Based Medicine, so this offers one way to structure this presentation.
Content Map
• Study designs
• Sampling
• Measurements
• EBM: causal criteria
• Critical appraisal, PICO, UpToDate
• Biostatistics
• Data analyses
• Errors & validity
EBM judgment criteria
ARE THE STUDY RESULTS VALID?
Based on
• Choice of study design
• Sampling (representativeness, chance errors)
• Quality of study execution
• Validity of measures (biases)
• Strength of associations found
• Control for confounding factors
EBM Domains
• Etiology, or identifying harm
– Highlights causal criteria, strength of
association, observational study designs
• Therapy
– Focus on experimental designs & execution
• Diagnosis
– Highlights validity of measures
• Prognosis
– Sampling, adjustment, likelihood estimates
1. Descriptive Studies
The ED seems to be seeing more
patients every year. Is this a true
trend, or just a chance fluctuation?
Descriptive & Analytic studies
• Observational research;
• Count things (prevalence, incidence);
• Perhaps test associations (i.e., analytic
study)
– E.g. different prevalence among men & women
= association between sex & the disease.
Basic terms
• Prevalence:
– The probability that a person has the outcome of
interest today. Counts existing cases of disease.
Useful for measuring burden of illness: how big is
the problem? How many hospital beds will we
require?
• Incidence:
– The probability (chance) that someone without the
outcome will develop it over a fixed period of time.
Relates to new cases of disease. Useful for
studying causes of illness.
Prevalence
• Back in July 2010, 140 graduates from the U. of Ottawa medical school started working as R1's.
• Of this group, 100 had insomnia on June 30.
• Therefore, the prevalence of insomnia was:
  100 / 140 = 0.71 = 71%
Incidence Proportion (risk)
• In July 2010, 140 graduates from U. Ottawa medical school start working as R1's.
• Over the next year, 30 develop a stomach ulcer.
• Therefore, the incidence proportion (risk) of an ulcer in the first year post-graduation is:
  30 / 140 = 0.21 = 21%
Incidence Rate (1)
• Incidence rate is the 'speed' with which people get ill. Rates involve time.
• Everyone dies (eventually). Many consider it desirable to delay dying: the death rate is lower.
• Compute with a person-time denominator:
  PT = # people × duration of follow-up
  Incidence Rate = # new cases / person-time of follow-up
Incidence rate (2)
• 140 U. of Ottawa medical students were followed during their residency
– 50 did 2 years of residency
– 90 did 4 years of residency
– Person-time = 50 × 2 + 90 × 4 = 460 PYs
• During follow-up, 30 developed stress ulcers.
• The incidence rate of stress ulcers is:
  IR = 30 / 460 = 0.065 cases/PY = 65 cases per 1,000 PYs
Prevalence & incidence
• As long as conditions are 'stable' and the disease is fairly rare, we have this relationship:
  P ≈ I × d
That is,
  Prevalence ≈ Incidence rate × average disease duration
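The three measures above can be checked in a few lines of Python; the counts are the ones used in the slides:

```python
# Prevalence: existing cases / total population at a point in time
insomnia_cases, cohort_size = 100, 140
prevalence = insomnia_cases / cohort_size          # ≈ 0.71

# Incidence proportion (risk): new cases / people at risk over a period
new_ulcers = 30
incidence_proportion = new_ulcers / cohort_size    # ≈ 0.21

# Incidence rate: new cases / person-time of follow-up
person_years = 50 * 2 + 90 * 4                     # 460 PYs
incidence_rate = new_ulcers / person_years         # ≈ 0.065 cases/PY

# For a stable, fairly rare disease: prevalence ≈ incidence rate × mean duration
def approx_prevalence(incidence_rate, mean_duration_years):
    return incidence_rate * mean_duration_years
```

The `approx_prevalence` helper just restates the P ≈ I × d rule; the two-year mean duration in any call to it would be an assumed value, not one from the slides.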
Sampling - Glossary
• Target population: population we would like to
describe or generalise results to
– E.g. all patients with lung cancer in Ontario, not just those you studied at the Civic hospital in Ottawa.
• Study population: group from which we draw the
sample (Civic hospital)
– The study population may not be equivalent to the target
population, so a random sample that is representative of
the study population may not be representative of the
target population
• Sampling frame: list of elements in the study
population (all lung cancer patients at the Civic)
• Probability samples: probability of being included in
the sample is known for each unit
– Important to adjust estimates for different parts of the
population.
[Diagram] Target population (lung cancer patients in Ottawa) → Study population (lung cancer patients in the Civic hospital) → Sample drawn. Inferences run from the sample back up to the target population.
Glossary - continued
• Simple random sampling (SRS): each person has the same (& non-zero) probability of being selected
• Systematic sample: every nth individual
• Convenience sample: e.g. patients attending the out-patient clinic this morning
• Stratified sample: reduce sampling error by sampling within sub-groups on your list (e.g. by age-group)
• Cluster sample: for efficiency, instead of randomly sampling from all over the city, first choose certain study populations only (e.g. schools), then sample more intensively from them. Sampling unit = the school.
• Multistage sample: e.g. randomly select cities, then hospitals in those cities, then patients in the selected hospitals.
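These designs can be illustrated with Python's random module; the sampling frame of 1,000 patients and the two age strata below are hypothetical:

```python
import random

random.seed(42)  # reproducible illustration
population = list(range(1, 1001))  # hypothetical sampling frame of 1,000 patients

# Simple random sample: every unit has the same inclusion probability
srs = random.sample(population, 50)

# Systematic sample: every nth individual after a random start
n = len(population) // 50          # sampling interval = 20
start = random.randrange(n)
systematic = population[start::n]

# Stratified sample: draw separately within each stratum (e.g. age group)
strata = {"under_65": population[:700], "65_plus": population[700:]}
stratified = {name: random.sample(units, 25) for name, units in strata.items()}
```

Each approach returns 50 units here, but only the SRS gives every list a known, equal chance of selection.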
Sampling error
• Random samples are the most likely to give unbiased estimates of the study population parameters. But:
• Random samples may differ from the study
population due to chance, so results based on
them may be ‘off-target’ (i.e. random error:
variation in a measurement due to chance)
• Random sampling error or standard error =
square root (variance/sample size)
• Standard errors are used in the calculation of
95% confidence intervals.
Advantages of a Simple Random
Sample
• Many statistical tests assume that you are using SRS
• Random samples with good response rates
may be less susceptible to selection bias
• Random sampling may (hopefully!) produce
a representative sample.
Disadvantages of a Simple
Random Sample
• Requires complete list of population
• Random samples may have low response rates
resulting in selection bias
– E.g., people may not understand why they were chosen:
“You were chosen by chance” is not very appealing!
• May still result in a non-representative sample,
particularly with small sample sizes or low
response rates.
Biostatistics (1):
Descriptive Statistics
These describe data results:
average values & spread of values
MCQ
Your colleague shows you a study result
quoting a mean value of “4.6 +/- 2.1”. You
discuss the meaning of this.
– It shows the standard deviation
– It shows the variance around the mean
– It shows the standard error of the mean
– It shows the confidence level
– GOK
‘Central Tendency’ & Dispersion
• Mean:
– average value. Measures the ‘centre’ of the data. Will be roughly
in the middle.
• Median:
– The middle value: 50% above and 50% below. Used when data
are skewed.
• Variance:
– A measure of how spread out the data are.
– Defined by subtracting the mean from each observation, squaring,
adding them all up and dividing by the number of observations.
Variance & Standard Deviation
• Standard deviation:
– Square root of the variance.
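These summary statistics can be computed with Python's statistics module. The data values below are hypothetical scores; note that pvariance divides by n, matching the definition above:

```python
import statistics

# hypothetical measurements (e.g. symptom scores)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # centre of the data
median = statistics.median(data)       # middle value; preferred for skewed data
variance = statistics.pvariance(data)  # mean squared deviation from the mean
sd = statistics.pstdev(data)           # square root of the variance
```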
Multiple Studies
• EBM is not just based on results of one
study of some individuals;
• It is based (ideally) on results accumulated
over many studies
• There may be some variation in these
results
• So we need to summarize several studies …
Standard Error
• The standard deviation looks at the variation of data in
individuals
• But we often repeat studies. Each produces a mean value,
and these mean values may vary somewhat.
– What is the distribution of these means?
– Will be ‘normal’, ‘Gaussian’ or ‘Bell curve’
– Mean of the means will be same as population mean
– But the variance of the means will be smaller than
population variance
• This is the Standard Error (of the mean):
  SE = sd / √n, where n is the sample size
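A simulation makes this concrete: draw many samples, record each sample's mean, and compare the spread of those means with sd/√n. A sketch (the population mean and SD are arbitrary choices, and the population is assumed normal):

```python
import random
import statistics

random.seed(1)
pop_mean, pop_sd, n = 50, 10, 25

# Draw many samples of size n and record each sample's mean
sample_means = [
    statistics.mean(random.gauss(pop_mean, pop_sd) for _ in range(n))
    for _ in range(2000)
]

observed_se = statistics.pstdev(sample_means)  # spread of the means
theoretical_se = pop_sd / n ** 0.5             # sd / sqrt(n) = 2.0
```

With 2,000 repetitions the observed spread of the means should land close to the theoretical 2.0, while each individual observation still varies with SD 10.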
Estimation
• Usually we study a sample of people to
estimate ‘parameters’ for a broader
population.
– The ‘population’ could refer to all Canadians,
or everyone of a certain group, or to all patients
with a particular disease, etc.
• Sample used in estimation ideally should be
randomly selected.
• The accuracy of the resulting estimates is
described by ‘inferential statistics’.
Confidence Intervals
– A range of numbers which tell us where
the best estimate of the correct answer (or
parameter) lies.
• For a 95% confidence interval, we are 95%
sure that the true value lies inside the
interval.
– Usually computed as: mean ± twice the
standard error
Example of Confidence Interval
• If the sample mean is 80 with a standard deviation of 20, and the sample size is 25, the standard error is 20/√25 = 4:
– We can be 95% confident that the true mean lies within the range:
  80 ± (2 × 4) = (72, 88).
Example of Confidence Interval
• If the sample size were 100, then
– 95% confidence interval is:
80 ± (2*2) = (76, 84).
– Increasing sample size makes our estimate of
the parameter more precise.
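Both examples follow from the same rule. A sketch of the slides' "mean ± 2 standard errors" approximation:

```python
def ci95(mean, sd, n):
    """Approximate 95% CI: mean +/- 2 standard errors (as on the slides)."""
    se = sd / n ** 0.5
    return (mean - 2 * se, mean + 2 * se)
```

For example, `ci95(80, 20, 25)` gives (72, 88) and `ci95(80, 20, 100)` gives (76, 84): quadrupling the sample size halves the width of the interval.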
Analytic Study Designs
MCQ
A patient (who regrettably did not attend B2B) asks
you to explain why a randomised trial is considered
so superior to other designs. Which of the following
is the best answer?
– The randomization ensures a representative sample
– RCT approaches a true experimental design
– It achieves better control of confounders
– The prospective design allows for more complete follow-up
– The results permit both relative and absolute analyses.
Back to EBM
• The Etiology, Harm & Therapy domains of
EBM all involve studies designed to find
causal relationships.
• This requires analytic studies – examining
relationships between presumed causal
factors and health states.
• Study designs can be observational or
(ideally) experimental.
– The designs vary in terms of how well they can address the causal criteria.
Causal Criteria
• Temporal sequence: strong criterion, but when did the disease begin?
• Strength of association: depends on how other factors are controlled in the analysis
• Biological gradient (dose-response): OK, but there may be a threshold
• Specificity of association: OK criterion for infectious disease, but not for obesity, smoking, etc.
• Consistency across studies: good, unless the relationship applies only to a minority of people
• Biological rationale: good if we have a theory
• Cessation of exposure: great, if the pathology is reversible
Cohort studies (1)
1. Select non-diseased subjects based on their
exposure status
• Main method used:
• Select a group of people with the exposure of interest
• Select a group of people without the exposure
• Can also simply select a group of people without
the disease and study a range of exposures.
2. Follow the group to determine what happens
to them.
Cohort studies (2)
3. Compare the incidence of the disease in
exposed and unexposed people
• If exposure increases risk, incidence will be
higher in exposed subjects than unexposed
subjects
• Compute a relative risk (risk ratio).
4. Framingham Study is standard example.
[Diagram] Study begins, then follow-up over time: the exposed group splits into disease / no disease, and the unexposed group likewise splits into disease / no disease. Outcomes are compared at the end of follow-up.
Cohort studies (4)

              Disease
              YES     NO     Total
Exp   YES      42     80      122
      NO       43    302      345
Total          85    382      467

Risk in exposed = 42/122 = 0.344
Risk in non-exposed = 43/345 = 0.125
If exposure increases risk, you would expect 42/122 to be larger than 43/345. How much larger can be assessed by the ratio of one to the other:
Risk Ratio (RR) = risk in exposed / risk in unexposed = 0.344 / 0.125 = 2.76
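The risk-ratio calculation generalizes to any cohort 2x2 table. A sketch, using the counts above:

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio from cohort counts: (cases/total) in exposed
    divided by (cases/total) in unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed
```

`risk_ratio(42, 122, 43, 345)` reproduces the 2.76 above; a value of 1.0 would mean no association.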
Cohort studies (5)
• To avoid long follow-up, can use a
historical cohort study design.
• Recruit subjects sometime in the past
• Usually use administrative records to
record exposures
• Follow-up to the present & record
outcomes
• Can continue to follow into the future.
Cohort studies (6)
• Example: cancer in Gulf War Veterans
• Exposure took place in 1991-2
• Study is conducted in 2016
• Identify soldiers deployed to Persian Gulf in
1991
• Identify soldiers not deployed to Persian Gulf
in 1991
• Compare the incidence of cancer in each group from 1991 to the present
Case-control studies (1)
• Select subjects based on their final outcome.
– Select a group of people with the outcome or disease
(cases)
– Select a group of people without the outcome
(controls)
– Ask them about past exposures (or get from records)
– Compare the frequency of exposure in the two groups
• If exposure increases risk, the odds of exposure in the cases
should be higher than the odds in the controls
– Compute an Odds Ratio
– Under many conditions, OR ≈ RR
[Diagram] The study begins by selecting subjects based on outcome: disease (cases) and no disease (controls). Records are then reviewed to classify each group as exposed or unexposed.
Case-control studies (3)

              Disease?
              YES     NO
Exp?  YES      42     18
      NO       43     67
Total          85     85     170

Odds of exposure in cases = 42/43
Odds of exposure in controls = 18/67
If exposure increases risk, you will find more exposed cases than exposed controls; the odds of exposure for cases would be higher (42/43 > 18/67).
This can be assessed by the ratio of one to the other:
Odds Ratio (OR) = exposure odds in cases / exposure odds in controls
                = (42/43) / (18/67) = (42 × 67) / (43 × 18) = 3.64
The OR is often called the 'cross-product ratio'.
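The cross-product calculation, as a sketch in Python using the counts above:

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 case-control table:
        a = exposed cases,   b = exposed controls,
        c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)
```

`odds_ratio(42, 18, 43, 67)` reproduces the 3.64 above; 1.0 again means no association.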
Randomized Controlled Trials
Basically a cohort study where the researcher decides
which exposure (e.g. treatment) the subjects get.
– Recruit a group of people meeting pre-specified
eligibility criteria.
– Randomly assign some subjects (usually 50% of them)
to get the control treatment and the rest to get the
experimental treatment.
– Follow-up the subjects to determine the risk of the
outcome in both groups.
– Compute a relative risk or otherwise compare the
groups.
RCTs (2)
Some key design features
– Intention to treat vs. per protocol analysis
– Allocation concealment
• the person randomizing should not know the next
treatment allocation
– Blinding (masking)
• of the Patient, Treatment team, Outcome assessor,
Statistician
– Monitoring committee
• Early termination rules
RCTs (3)
Some technical & ethical challenges:
• Equipoise
– Must be no clear advantage of the treatment, or it’s
unethical to withhold it from controls
– But, if you’re not confident of superiority, why do a trial?
• Often highly selective study samples
– Generalizable?
• Contamination
– Control group members get the new treatment
• Co-intervention
– Some people get treatments other than those under study.
RCT: analysis of outcomes
• Relative Risk:
  RR = Incidence(treatment) / Incidence(control)
• Absolute risk reduction:
  ARR = Incidence(control) − Incidence(treatment) = attributable risk
• Relative risk reduction:
  RRR = ARR / Incidence(control) = 1 − RR
RCTs – Options for Analysis

            Asthma attack   No attack   Total   Incidence
Treatment        15            35         50      0.30
Control          25            25         50      0.50

Relative Risk = 0.3 / 0.5 = 0.6
Absolute Risk Reduction = 0.5 − 0.3 = 0.2
Relative Risk Reduction = 0.2 / 0.5 = 0.4 = 40%
Number Needed to Treat = 1/ARR = 1 / 0.2 = 5
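All four measures follow from the two incidence proportions. A sketch:

```python
def trial_measures(risk_treatment, risk_control):
    """RR, ARR, RRR and NNT from the two incidence proportions."""
    rr = risk_treatment / risk_control
    arr = risk_control - risk_treatment  # absolute risk reduction
    rrr = arr / risk_control             # equals 1 - rr
    nnt = 1 / arr                        # number needed to treat
    return rr, arr, rrr, nnt
```

`trial_measures(15/50, 25/50)` reproduces the asthma example: RR 0.6, ARR 0.2, RRR 40%, NNT 5.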
78.2: CRITICAL APPRAISAL
Hierarchy of evidence (highest to lowest quality, approximately):
• Meta-analyses & systematic reviews
• Experimental (randomized trial)
• Prospective cohort
• Historical cohort
• Case-control
• Cross-sectional
• Ecological (for individual-level exposures)
• Case report/series
• Expert opinion
Confounding
Interpreting associations and
distinguishing causal influences
(Note: Standardization is covered in the
Assessing & Measuring Health session)
MCQ
An anxious patient brings you an article that studied an
association between alcohol consumption and cancer of the
mouth. The authors stated that the causal link was
confounded by smoking. Which of the following represents
the best explanation of what this may mean?
– It was a mistaken result due to a flawed analysis.
– Key criteria for a causal link were not met.
– Smokers gave unreliable information on their drinking.
– Adjusting for smoking removes the association.
– Smoking and drinking mutually interact in causing cancer.
Confounding: an example
• Does drinking alcohol cause oral cancer?
– A case-control study found an OR of 3.4
(95% CI: 2.1 - 4.8)
• BUT, the effect of alcohol may be ‘mixed up’ with
the effect of smoking (‘confusion’ en français).
• A confounder is an extraneous factor which is
associated with both exposure and outcome, and is
not an intermediate step in the causal pathway.
– Smoking causes mouth cancer;
– Heavy drinkers tend to be heavy smokers;
– Smoking is not part of causal pathway for alcohol.
The Confounding Triangle
[Diagram] Alcohol —? causal association? → Oral cancer, with Smoking linked to both alcohol and oral cancer.
Confounding
So, does alcohol drinking cause oral cancer?
• Run analyses for smokers separately from
non-smokers:
– Among smokers, we find:
• OR = 1.3 (95% CI: 0.92-1.83)
– Among non-smokers, we find:
• OR = 1.1 (95% CI: 0.8-1.7).
– Neither is significant: the crude association was likely confounded by smoking!
• Logistic regression commonly used to adjust
for multiple confounders.
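Stratified adjustment can be sketched with the Mantel-Haenszel pooled odds ratio; logistic regression is the other common approach. The counts below are invented for illustration and are not from the study discussed:

```python
def mantel_haenszel_or(strata):
    """Pooled OR across strata; each stratum is (a, b, c, d):
       a = exposed cases, b = exposed controls,
       c = unexposed cases, d = unexposed controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical data: drinking & oral cancer, stratified by smoking.
smokers     = (80, 40, 20, 10)   # stratum OR = (80*10)/(40*20) = 1.0
non_smokers = (10, 20, 40, 80)   # stratum OR = (10*80)/(20*40) = 1.0

# Crude (collapsed) table ignores smoking:
a, b, c, d = 90, 60, 60, 90
crude_or = (a * d) / (b * c)                              # 2.25, inflated
adjusted_or = mantel_haenszel_or([smokers, non_smokers])  # 1.0
```

The crude OR of 2.25 vanishes once smoking is held constant: the pattern described on the slide above.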
TIME FOR A
BREAK!
Biostatistics (2):
Inferential Statistics
& significance testing
How likely is it that the result in this
study accurately reflects what is
going on in the broader population?
MCQ
A colleague is reading an article and asks you to
clarify the meaning of ‘inferential statistics’.
1. Statistics that show which conclusion is most likely to
be correct
2. Mathematics that estimate the likelihood of a chance
finding
3. An analysis that demonstrates a significant correlation
4. Results that meet criteria for causation
5. Analyses that involve more than two variables.
MCQ
A patient shows you a study evaluating the medication you
recently prescribed for her. It shows a p-value of 0.05. She
asks you to explain this statistic.
1. The p value summarizes the magnitude of the benefit of the treatment.
2. It demonstrates a significant benefit to the therapy.
3. The study was small; a larger one would have provided a better p-value.
4. It shows the probability that there is no benefit of the treatment.
5. It shows that one can never be certain that a given treatment will work.
BIOSTATISTICS
Inferential Statistics
• Describing things is fine but limited.
• Want to compare different groups to see if they
differ more than you might expect by chance
alone:
– New drug treatments compared to old ones
– Exposure to pollutants and risk of cancer.
• Inferential statistics makes this possible
– requires a good study design to avoid bias.
Experimental logic: “I cured the patient”
• Start with a theory: ‘Magnetic personality cures people’.
• We cannot prove a theory, but we can disprove predictions it makes,
which casts doubt on the theory.
• So, set up a Null Hypothesis (hoping secretly to disprove this):
“The patients are no better after seeing Dr Gauss.”
• Generate some data.
• Check to see if the results are consistent with the null hypothesis.
– If the result is ‘unlikely’, then reject the null hypothesis.
• Statistics just puts a mathematical overlay on this approach.
Hypothesis Testing (1)
Used to compare two or more groups.
• General process of hypothesis testing :
1. We first assume that the two groups have the
same outcome results.
= null hypothesis (H0)
2. Generate some data
Hypothesis Testing (2)
3. From the data, compute some number (a 'statistic') that summarizes any difference between the groups
4. Under the null hypothesis (H0), this value should be zero.
5. Compare the value you get to '0'.
• If the difference is 'too large', we can conclude that our assumption (the null hypothesis) is unlikely to be true
• So, reject the null hypothesis.
Hypothesis Testing (3)
5. We quantify the extent of our discomfort with
the null hypothesis through the significance
level or p-value.
–
‘p’ = probability that the difference we found
could occur if the null hypothesis really is true.
– Reject H0 if the p-value is ‘too small’
• What is ‘too small’?
– arbitrary
– tradition sets it at < 0.05 (a 5% chance).
Hypothesis Testing (4)
• Defining the p-value
– The probability that we reject the null hypothesis (conclude
it is wrong) when it is really right. (A false positive)
• Calculation of p-value
– Assuming that the null hypothesis is true,
– What is the probability that our statistic would be at least as
big as what we actually got?
• It is not the probability that the groups are different
• We can never prove that the null hypothesis is true
• We either reject or accept the null hypothesis
Example of significance test (1)
• Is there an association between sex and smoking?
– 35 of 100 men smoke but only 20 of 100 women smoke
• Usually present data in a 2x2 table:

          Smoke   Don't smoke   Total
Men        35         65         100
Women      20         80         100
Total      55        145         200

• Compare the observed numbers to what we would have expected under the null hypothesis.
Example of significance test (2)
• Null Hypothesis
– There is no effect of sex on the probability a person is a
smoker.
• Calculate a chi-square value (the statistic):
  χ² = 5.64
Example of significance test (3)
– If there is no effect of sex on smoking (the null
hypothesis), a chi-square value as large as 5.64
would occur only 1.8% of the time.
 p = 0.018
– Gives moderately strong evidence to reject the null
hypothesis
– Would conclude that smoking prevalence differs by
sex.
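The chi-square statistic and its p-value can be reproduced from the 2x2 table; the 1-df chi-square survival function is computed here via math.erfc:

```python
import math

observed = [[35, 65],   # men: smoke, don't smoke
            [20, 80]]   # women

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected count under H0 (no association) = row total * column total / grand total
chi2 = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(2) for j in range(2)
)

# p-value for a chi-square statistic with 1 degree of freedom
p = math.erfc(math.sqrt(chi2 / 2))
```

This gives χ² ≈ 5.64 and p ≈ 0.018, matching the slides (no continuity correction applied).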
Example of significance test (4)
• Instead of computing the p-value, could
compare your statistic to the ‘critical value’
– The value of the Chi-square which gives
p = 0.05 is 3.84
– Since 5.64 > 3.84, we conclude that p < 0.05
• Doesn’t tell us what the probability actually
is.
– Just that the observed statistic is rarer than our
cut-off value
Examples of significance tests (5)
• Common methods used are:
– T-test
– Z-test
– Chi-square test
– ANOVA
• Approach can be extended through the use of regression
models to relate several independent variables to one
dependent variable:
– Linear regression
– Logistic regression
– Cox models
Back to Hypothesis Testing … (5)
• p-values are key for interpreting hypothesis tests.
• But they are being down-played
– Modern approach is to present 95% confidence
intervals of the treatment effect rather than a p-value
– Gives estimate of the range of potential benefits.
• Now, we need to get to statistical power.
• So, a bit more stuff and some more terms (sorry).
Hypothesis Testing (6)
• Hypothesis tests can get things right or wrong
• Two types of errors can occur:
– Type 1 error (aka. Alpha)
– Type 2 error (aka. Beta)
• p-value
– Essentially the alpha value
• Power
– Related to type 2 error (Beta)
Hypothesis Testing (7)
• p-value:
• The chance you will say there is a difference
between groups (e.g. the new drug is better)
when there really is NO difference
= risk of an alpha error
Hypothesis testing (8)

                                Actual Situation
Results of Stats Analysis    No effect           Effect
No effect                    (No error)          Type 2 error (β)
Effect                       Type 1 error (α)    (No error)
Hypothesis Testing (9)
• Statistical Power:
– It’s easy to show that a drug reduces BP by 40 mmHg
– Hard to show that it reduces BP by 1 mmHg
– Study more likely to ‘miss’ the small effect than the large
effect.
– Statistical Power is:
• The chance the study will show a difference between groups
when there really is a difference of a given amount.
• Basically, this is 1-β
– Power depends on how big a difference you consider to be
important
How to improve your power?
• Increase sample size
• Improve precision of the measurement tools
used (reduces standard deviation)
• Use better statistical methods
• Use better designs
• Reduce bias
• Set a bigger difference you wish to find.
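Power calculations formalize these trade-offs. A sketch using the normal approximation for a two-sided, two-sample comparison of means (the effect size and sample sizes in the usage note are illustrative):

```python
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test to detect
    a true mean difference `delta` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se = sd * (2 / n_per_group) ** 0.5             # SE of the difference in means
    return NormalDist().cdf(delta / se - z_alpha)
```

For a difference of half a standard deviation, about 64 subjects per group gives roughly 80% power; increasing the sample size, or targeting a bigger difference, raises it, as the bullets above suggest.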
Study Measurements
(See EBM Diagnosis and Prognosis
studies)
Measurement Core Concepts (1)
• Random Variation (chance):
– Every time we measure something, errors will
occur.
• Any sample will include people with values
different from the real value, just by chance.
• These are random factors which affect the precision
(SD) of our data but not necessarily the validity
when studying a large group of people.
– Statistics and bigger sample sizes can help here.
Measurement Core Concepts (2)
• Bias:
– A systematic factor which causes two groups to
differ.
• A study uses a two-section scale to measure height
• Scale was incorrectly assembled (with a gap
between the upper and lower sections).
• Over-estimates height for everyone (a bias).
– Bigger numbers and statistics don’t help much;
you need good measurements & a good design
instead.
Reliability
• = reproducibility. Does it produce the same result
repeatedly? (A reliable car starts every time)
• Related to chance error
• Random errors average out in the long run
• But in patient care you hope to do a test only once
– Therefore, you need a reliable test
Validity
• Whether a test measures what it purports to
measure
– Is a disease present (or absent)?
– How often is the test result correct?
– What interpretation can you place on the test
result?
• Normally use criterion validity
– Compare test result to a gold standard
• SIM web page on validity
Reliability and Validity
Target shooting as a metaphor
[Figure: four targets cross-classifying low/high reliability with low/high validity — shots are tightly clustered only when reliability is high, and centred on the bull's-eye only when validity is also high.]
Test Properties - Validity (1)

           Diseased               Not diseased          Total
Test +ve   90 (true positives)     5 (false positives)    95
Test -ve   10 (false negatives)   95 (true negatives)    105
Total      100                    100                    200
Test Properties (2)

           Diseased   Not diseased   Total
Test +ve      90            5          95
Test -ve      10           95         105
Total        100          100         200

Sensitivity = 0.90    Specificity = 0.95
2x2 Table for Testing a Test

                 Gold standard
                 Disease present   Disease absent
Test Positive        a (TP)            b (FP)
Test Negative        c (FN)            d (TN)

Sensitivity = a/(a+c)    Specificity = d/(b+d)
Test Properties (3)
• Sensitivity = Pr(test positive in a person with disease)
• Specificity = Pr(test negative in a person without disease)
• Range: 0 to 1
– > 0.9: Excellent
– 0.8-0.9: Not bad
– 0.7-0.8: So-so
– < 0.7: Poor
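From a validation 2x2 table, the two column-wise properties can be computed directly. A sketch using the counts from the table above:

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Column-wise test properties from a 2x2 validation table."""
    sensitivity = tp / (tp + fn)   # Pr(test + | disease present)
    specificity = tn / (tn + fp)   # Pr(test - | disease absent)
    return sensitivity, specificity
```

`sensitivity_specificity(90, 5, 10, 95)` returns the (0.90, 0.95) quoted on the slides.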
Test Properties (4)
• Sensitivity and Specificity
– Values depend on cutoff point between normal/abnormal
– Generally, high sensitivity is associated with lower specificity and
vice-versa.
– Not affected by prevalence, if ‘case-mix’ is constant
• Do you want a test to have high sensitivity or high
specificity?
– Depends on cost of ‘false positive’ and ‘false negative’ errors
– PKU – one false negative is a disaster
– Ottawa Ankle Rules: insisted on sensitivity of 1.00
– Incarceration: high specificity
Overall test performance: the ROC
• Test scores for sick and well populations almost always overlap.
• Change the cut-off point: as sensitivity goes up, specificity will go down.
• Graph the sensitivity (TP rate) against 1 − specificity (FP rate) for each cut-point.
• = Receiver Operating Characteristic curve.
• Area under the curve = measure of overall test quality: 0.9 = excellent, 0.8 = good, 0.5 = useless.
Ruling in & out
• “SpPIn”
– Positive result on a specific test rules Dx in
– High specificity means that test identifies only
this particular disease (it’s very choosy),
so + score rules in.
• “SnNOut”
– Negative result on a sensitive test
rules it out
– High sensitivity means it would
most likely find disease if present, so you can
rely on a negative result to exclude the Dx.
Test Properties (5)
• Sen & Spec not directly useful to clinician: You see
the test result, but don’t know if it’s a true or false
result.
• Patients don’t ask:
– “If I’ve got the disease, how likely is that the test will
be positive?”
• They ask:
– “My test is positive. Does that mean I have the
disease?”
→ Predictive values.
Predictive Values
• Based on rows, not columns:

  Test +   TP   FP
  Test -   FN   TN

– PPV interprets the positive test (% of positives that are TP)
– NPV interprets the negative test (% of negatives that are TN)
• Shows the probability that the patient has (or does not have) the disease, based on the test result
– Immediately useful to clinician & patient.
Test Properties (9)

           Diseased   Not diseased   Total
Test +ve   90 (TP)      5 (FP)         95
Test -ve   10 (FN)     95 (TN)        105
Total      100         100            200

PPV = 90/95 = 0.95    NPV = 95/105 = 0.90
2x2 Table for Testing a Test

            Gold standard
            Disease present   Disease absent
Test +          a (TP)            b (FP)
Test -          c (FN)            d (TN)

PPV = a/(a+b)    NPV = d/(c+d)
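The row-wise predictive values, as a sketch using the same table:

```python
def predictive_values(tp, fp, fn, tn):
    """Row-wise predictive values from a 2x2 validation table."""
    ppv = tp / (tp + fp)   # Pr(disease present | test +)
    npv = tn / (tn + fn)   # Pr(disease absent | test -)
    return ppv, npv
```

`predictive_values(90, 5, 10, 95)` gives PPV ≈ 0.95 and NPV ≈ 0.90, as on the slides.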
Test Properties (10)

           Diseased   Not diseased   Total
Test +ve      90            5          95
Test -ve      10           95         105
Total        100          100         200

PPV = 0.95
What happens if the disease is less common?
Test Properties (11)

           Diseased   Not diseased   Total
Test +ve       9            5          14
Test -ve       1           95          96
Total         10          100         110

PPV = 9/14 = 0.64
What happens if the disease is less common? Increasing numbers of false positives.
Prevalence and Predictive Values
• When disease gets rarer, more people end up in
‘no disease’ group than in ‘disease’ group
• Therefore, for a given sensitivity & specificity:
– # true positives goes down
– # false positives goes up
• PPV goes down
• Conversely, NPV goes up.
Prevalence and Predictive Values
A Dilemma
• Predictive values of a test are dependent on the
pre-test prevalence of the disease
• Prevalence is lower in non-tertiary care
settings.
• Most tests are developed and evaluated in
tertiary care settings.
• PPV will be lower in non-tertiary care settings.
Prevalence and Predictive Values
• So, how do you determine how useful a test
will be in a different patient setting?
• The process is often called 'calibrating' a test
– Relies on the stability of sensitivity &
specificity across populations.
– You need to know (or guess) the prevalence in
the new setting.
– Allows you to estimate what the PPV and NPV
would be in the new setting.
Methods for Calibrating a Test
Four methods:
1. Apply test + gold standard to a consecutive series
of patients from the new population
• Rarely feasible (especially during an exam)
2. Hypothetical table [simple & feasible]
• Assume the new population has (e.g.) 10,000 people
• Fill in the cells based on the prevalence, sensitivity and
specificity [next slide]
3. Likelihood Ratios & a Nomogram
• Only useful if you have access to the nomogram
4. Bayes’ Theorem (calculates Likelihood Ratios)
Calibration by hypothetical table
• Pretend you can do a new study in your
patient population
• Assume a practice size
• 10,000 makes the numbers nice
• Figure out how many patients with disease
there would be (prevalence)
• Figure out what test results you would expect to see
• Compute PPV
Calibration by hypothetical table

Fill cells in the following order:

                 Disease    Disease
“Truth”          Present    Absent     Total         PV
Test +ve         4th        7th        8th           10th (PPV)
Test -ve         5th        6th        9th           11th (NPV)
Total            2nd        3rd        1st: 10,000

1st: assume a practice of 10,000; 2nd & 3rd: split the total
by pre-test prevalence; 4th: apply sensitivity to the diseased
column; 6th: apply specificity to the non-diseased column.
Test Properties (11)
Tertiary care: Prev = 0.50

            Diseased   Not diseased
Test +ve    450          25           475
Test -ve     50         475           525
            500         500         1,000

Sens = 0.90    Spec = 0.95    PPV = 450/475 = 0.95
Test Properties (12)
Primary care: Prev = 0.01

            Diseased   Not diseased
Test +ve     90         495           585
Test -ve     10       9,405         9,415
            100       9,900        10,000

Sens = 0.90    Spec = 0.95    PPV = 90/585 = 0.1538
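The hypothetical-table method can be sketched as a short function (name and default practice size are illustrative): build the table of expected counts from the prevalence, sensitivity and specificity, then read off the predictive values.

```python
def hypothetical_table(prevalence, sens, spec, n=10_000):
    """Fill a hypothetical 2x2 table for a practice of n people and
    return the PPV and NPV expected in that population."""
    diseased = prevalence * n
    not_diseased = n - diseased
    tp = sens * diseased          # test +ve, disease present
    fn = diseased - tp            # test -ve, disease present
    tn = spec * not_diseased      # test -ve, disease absent
    fp = not_diseased - tn        # test +ve, disease absent
    return tp / (tp + fp), tn / (tn + fn)
```

With sens = 0.90 and spec = 0.95, a prevalence of 1% gives the primary-care PPV of 0.1538 shown above.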
Calibration using Likelihood Ratios
• The Likelihood Ratio is the probability of a
given test result in a patient with the disease,
divided by the probability of the same finding
in a patient without the disease.
• Consider the following table (from a research
study)
– How do the ‘odds’ of having the disease change
once you get a positive test result?
Likelihood Ratios

            Diseased   Not diseased
Test +ve     90           5            95
Test -ve     10          95           105
            100         100           200

Pre-test odds = 100/100 = 1.00
Post-test odds (test +ve) = 90/5 = 18.0

Odds (after +ve test) are 18 times higher than the odds
before you had the test. This is the LIKELIHOOD RATIO.
https://www.youtube.com/watch?v=nFZs6eMvZFY
Likelihood Ratios

• Likelihood ratios are related to sens & spec:
    LR(+) = sensitivity / (1 – specificity)
• Sometimes given as the definition of the LR(+)
• LR(+) is fixed across populations.
  – Bigger is better.
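Both likelihood ratios follow from sensitivity and specificity; a minimal sketch (function name is illustrative):

```python
def likelihood_ratios(sens, spec):
    """LR(+) and LR(-) from sensitivity and specificity."""
    lr_pos = sens / (1 - spec)    # P(+ve | disease) / P(+ve | no disease)
    lr_neg = (1 - sens) / spec    # P(-ve | disease) / P(-ve | no disease)
    return lr_pos, lr_neg
```

For sens = 0.90 and spec = 0.95 this gives LR(+) = 18, matching the table on the previous slide.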
Examples of LRs

LRs change pretest probability (pp) as follows:

  LR+   Increases pp by     LR-   Reduces pp by
   2    15% (of the pp)     0.5   15%
   5    30%                 0.2   30%
  10    45%                 0.1   45%

CAGE questions for alcoholism:

  CAGE         LR+
  1 positive   NS
  2 positive   4.5
  3 positive   13.3
  4 positive   101
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1495095/
Claire Lee’s Video!
• Claire produced a video on LRs.
– Guys - learn how to tell if an egg is rotten
before you open it!
• https://www.youtube.com/watch?v=ohohrv6peYk
Calibration with Nomogram
• Graphical approach which avoids arithmetic
• Scaled to work directly with probabilities
– no need to convert to odds
• Draw line from pretest probability
(= prevalence) through likelihood ratio
– extend to estimate posttest probabilities
• Only useful if someone gives you the
nomogram!
Example of Nomogram
(pretest probability 1%, LR+ 18, LR– 0.105)

[Nomogram figure: a line drawn from a pretest probability of 1%
through LR+ = 18 gives a posttest probability of about 15%;
through LR– = 0.105, about 0.1%. An LR of 1 means no change.]
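The arithmetic the nomogram replaces can be sketched directly (function name is illustrative): convert the pretest probability to odds, multiply by the LR, and convert back.

```python
def posttest_probability(pretest_prob, lr):
    """Apply a likelihood ratio to a pretest probability:
    probability -> odds -> multiply by LR -> back to probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)
```

For the nomogram example (pretest 1%, LR+ 18) this gives about 15%; with LR– 0.105 it gives about 0.1%.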
MCQ
Several authorities recommend routine screening for depression in
primary care settings. You find a brief screening test that has a
positive likelihood ratio (LR+) of 2. If the pre-test probability is
5%, what is the main conclusion you should draw from that
figure?
1. A patient with a positive score will be twice as likely to have
depression.
2. The interpretation will depend on the negative likelihood ratio
(LR-).
3. Patients with depression will be twice as likely to score
positively.
4. The test result will not be very informative.
5. A negative result will help you to rule out the disease.
TIME FOR A
BREAK!
Measures of Benefit (1)
• Consider a new potentially life-saving drug.
How many people do we need to treat to
prevent one death?
• = ‘Number Needed to Treat’ (NNT).
– Treat 5 people & study 5 controls for one year:
• Incidence rate for the control group is 2 deaths per 5
person-years.
• Incidence rate for the experimental group is 1 death
per 5 person-years.
Number needed to treat (2)

• Treat 5 people for one year:
  – Control therapy: 2 deaths
  – Exp therapy: 1 death
  – PREVENTED = 1 death
  – So, NNT was 5 to prevent 1 death.
• Calculation:
  – What is the risk difference?
    (aka Absolute Risk Reduction)
    2/5 – 1/5 = 1/5

  NNT = 1 / RD
Number needed to treat (3)

For preventing rare diseases you will need to
treat many people to prevent one outcome,
even if the reduction in risk is high:

  IR (Old Rx)     = 10/1,000
  IR (New Rx)     = 1/1,000
  Relative risk   = 0.1  (a 90% relative risk reduction)
  RD              = 9/1,000
  NNT             = 1,000/9 = 111
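The NNT calculation above is one division; a minimal sketch (function name is illustrative):

```python
def number_needed_to_treat(ir_control, ir_treated):
    """NNT = 1 / risk difference (absolute risk reduction)."""
    rd = ir_control - ir_treated  # risk difference
    return 1 / rd
```

With incidence rates of 10/1,000 (old Rx) and 1/1,000 (new Rx) this gives about 111, as on the slide; with 2/5 vs 1/5 it gives the NNT of 5 from the previous slide.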
THE END
43) The classical “epidemiological triad” of
disease causation consists of factors
which fall into which of the following
categories:
a) host, reservoir, environment
b) host, vector, environment
c) reservoir, agent, vector
d) host, agent, environment
e) host, age, environment
For Mathematicians:
• Some folks may wish to understand the
references to Bayes.
• Calibration by Bayes theorem:
• Remember:
– Post-test odds(+) = pretest odds * LR(+)
– And, the LR(+) is ‘fixed’ across populations
• Need a short aside about odds
Converting between odds & probabilities

• if prevalence = 0.20, then
  pre-test odds = 0.20 / (1 – 0.20) = 0.25 (1 to 4)
• if post-test odds = 0.25, then
  PPV = 0.25 / (1 + 0.25) = 0.20
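The two conversions above are each a one-line formula; a minimal sketch (function names are illustrative):

```python
def prob_to_odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1 + odds)
```

A prevalence of 0.20 converts to odds of 0.25 (1 to 4), and odds of 0.25 convert back to a probability of 0.20.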
Calibration by Bayes’s Theorem
• For diagnostic tests:
– Prevalence is your best guess about the probability
that the patient has the disease, before you do the
test
• Also known as Pretest Probability of Disease
(a+c) / N in 2x2 table
• Is closely related to Pre-test odds of disease
when disease is rare:
(a+c) / (b+d) in 2x2 table
Test Properties (13)

            Diseased   Not diseased
Test +ve     a          b             a+b
Test -ve     c          d             c+d
             a+c        b+d           a+b+c+d = N

Prevalence proportion = (a+c) / N
Prevalence odds       = (a+c) / (b+d)
Calibration by Bayes’s Theorem
• To ‘calibrate’ your test for a new population:
– Get the LR(+) value (via sens & spec) from the
reference source
– Estimate the pre-test odds for your population
– Compute the post-test odds
– Convert to post-test probability to get PPV
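The four calibration steps above can be composed into a single sketch (function name is an assumption for illustration):

```python
def calibrated_ppv(prevalence, sens, spec):
    """Estimate the PPV in a new population from its prevalence,
    assuming sensitivity and specificity carry over unchanged."""
    lr_pos = sens / (1 - spec)                    # LR(+), fixed across populations
    pretest_odds = prevalence / (1 - prevalence)  # pre-test odds in new setting
    posttest_odds = pretest_odds * lr_pos         # Bayes step
    return posttest_odds / (1 + posttest_odds)    # back to a probability
```

For sens = 0.90, spec = 0.95 and a prevalence of 1%, this gives a PPV of 0.1538, matching the hypothetical-table method.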
Example:
(sens 90%, spec 95%, new prevalence 1%)

• LR(+) = 0.90 / (1 – 0.95) = 18
• Pre-test odds = 0.01 / 0.99 ≈ 0.0101
• Post-test odds = 0.0101 × 18 ≈ 0.182
• PPV = 0.182 / 1.182 ≈ 0.1538 = 15.38%

• Compare to the ‘hypothetical table’ method
(PPV = 15.38%)