Evaluation of Diagnostic Tests
&
ROC Curve Analysis
Özgür Tosun, PhD
Why does a physician need biostatistics?
TODAY’S EXAMPLE
Understanding the “Statistics”
A 50-year-old woman with no symptoms participates in routine mammography screening.
She tests positive, is alarmed, and wants to know from you whether she has breast cancer for certain, or what the chances are.
Apart from the screening result, you know nothing else about this woman.
How many women who test positive actually have breast cancer?

Additional Info
The probability that a woman has breast cancer is 1% ("prevalence").
If a woman has breast cancer, the probability that she tests positive is 90% ("sensitivity").
If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% ("false positive rate").

Your answer???
a) nine in 10 (90%)
b) eight in 10 (80%)
c) one in 10 (10%)
d) one in 100 (1%)
ATTENTION !!

The fact that 90% of women with breast
cancer get a positive result from a
mammogram (sensitivity) doesn't mean
that 90% of women with positive results
have breast cancer.
REALITY

                 Cancer    Healthy    Total
TEST Positive       9         89        98
     Negative       1        901       902
Total              10        990      1000

Prevalence = 10/1000 = 1%
Sensitivity = 9/10 = 90%
False Positive Rate = 89/990 = 9%
Answer
Total positive test results among 1,000 women = 98.
Only 9 of them actually have cancer.
How many women who test positive actually have breast cancer?
◦ 9/98 ≈ one in 10 (10%)

The high false positive rate, combined with the disease's prevalence of 1%, means that roughly nine out of 10 women with a worrying mammogram don't actually have breast cancer.
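
To make the arithmetic easy to check, here is a minimal Python sketch (an illustration, not part of the lecture) that reproduces the answer with Bayes' theorem from the three numbers given above:

    prevalence = 0.01            # P(cancer)
    sensitivity = 0.90           # P(test positive | cancer)
    false_positive_rate = 0.09   # P(test positive | no cancer)

    # Bayes' theorem: P(cancer | positive test)
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))
    ppv = sensitivity * prevalence / p_positive

    print(f"P(positive test)          = {p_positive:.4f}")  # 0.0981, ~98 in 1,000
    print(f"P(cancer | positive test) = {ppv:.3f}")         # 0.092, about 1 in 10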
What Do Doctors Do with the Question?
In one trial, almost half of a group of 160 gynecologists responded that the woman's chance of having cancer was nine in 10 (90%).
Only 21% said that the figure was one in 10 (10%), which is the correct answer.
That's a worse result than if the doctors had answered at random (25%).

What Happens When the Doctor Does Not Explain the Right Probabilities to the Patient?
Few specialists understand the actual risk faced by a woman with a worrying, positive mammogram result.
We can only imagine how much anxiety those innumerate doctors cause in women.
This may even lead to unnecessary cancer treatment for healthy women.
Research suggests that months after a mammogram false alarm, up to a quarter of women are still affected by the process on a daily basis.

EVALUATION OF DIAGNOSTIC TESTS
The "Gold Standard": What is a gold standard?
Biopsy results, pathological evaluation, radiological procedures, prolonged follow-up, autopsies
Almost always more costly, more invasive, and less feasible
Sometimes there is no objective standard for the disease (e.g., angina pectoris: the gold standard is careful history taking)

Diagnostic Characteristics
It is not hypothesis testing,
BUT:
◦ How well does the test identify patients with a disease?
◦ How well does the test identify patients without a disease?

Evaluation of the Diagnostic Test
Give a group of people (with and without the disease) both tests (the candidate test and the "gold standard" test), then cross-classify the results and report the diagnostic characteristics of the candidate test.
                    Truth or Gold Standard
                        +           -
Candidate Test   +    a (TP)      b (FP)
                 -    c (FN)      d (TN)

A perfect test would have b and c equal to 0.
Diagnostic Characteristics
Sensitivity: The probability that a diseased individual will be identified as "diseased" by the test
= P(T+ | D+) = a/(a+c)
Specificity: The probability that an individual without the disease will be identified as "healthy" by the test
= P(T- | D-) = d/(b+d)
Diagnostic Characteristics
False positive rate: Given a subject without the disease, the probability that he will have a positive test result
◦ P(T+ | D-) = b/(b+d) = 1 - Specificity
False negative rate: Given a subject with the disease, the probability that he will have a negative test result
◦ P(T- | D+) = c/(a+c) = 1 - Sensitivity
Predictive Values of Diagnostic Tests
More informative from the patient's or physician's perspective
Special applications of Bayes' theorem
Predictive Values of Diagnostic Tests
Positive Predictive Value: The probability that an individual with a positive test result has the disease
= P(D+ | T+) = a/(a+b)
Predictive Values of Diagnostic Tests
Negative Predictive Value: The probability that an individual with a negative test result does not have the disease
= P(D- | T-) = d/(c+d)
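
As an illustration (not from the lecture), a short Python sketch can compute every characteristic defined above from the four cells of the 2x2 table, a = TP, b = FP, c = FN, d = TN; the example call uses the mammography counts from the opening example:

    def diagnostic_characteristics(a, b, c, d):
        """All diagnostic characteristics from the 2x2 table cells."""
        return {
            "sensitivity":         a / (a + c),   # P(T+ | D+)
            "specificity":         d / (b + d),   # P(T- | D-)
            "false positive rate": b / (b + d),   # 1 - specificity
            "false negative rate": c / (a + c),   # 1 - sensitivity
            "PPV":                 a / (a + b),   # P(D+ | T+)
            "NPV":                 d / (c + d),   # P(D- | T-)
            "accuracy":            (a + d) / (a + b + c + d),
        }

    # Mammography example: a = 9, b = 89, c = 1, d = 901
    for name, value in diagnostic_characteristics(9, 89, 1, 901).items():
        print(f"{name}: {value:.3f}")   # e.g., sensitivity 0.900, PPV 0.092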
A LAST SIMPLE EXAMPLE TO SUM IT UP
Suppose we have a test statistic for predicting the presence or absence of disease.

                      True Disease Status
                        Pos        Neg
Test        Pos         TP         FP
Criterion   Neg         FN         TN
Total                   P          N         P + N

TP = True Positive     FP = False Positive
FN = False Negative    TN = True Negative
Accuracy = Probability that the test yields a correct result
= (TP + TN) / (P + N)
Sensitivity = Probability that a true case will test positive
= TP / P
Also referred to as True Positive Rate (TPR) or True Positive Fraction (TPF).
Specificity = Probability that a true negative will test negative
= TN / N
Also referred to as True Negative Rate (TNR) or True Negative Fraction (TNF).
False Negative Rate = Probability that a true positive will test negative
= FN / P = 1 - Sensitivity
Also referred to as False Negative Fraction (FNF).
False Positive Rate = Probability that a true negative will test positive
= FP / N = 1 - Specificity
Also referred to as False Positive Fraction (FPF).
Positive Predictive Value (PPV) = Probability that a patient with a positive test will truly have the disease
= TP / (TP + FP)
Negative Predictive Value (NPV) = Probability that a patient with a negative test will truly be disease free
= TN / (TN + FN)
                      True Disease Status
                        Pos        Neg       Total
Test        Pos         27         173        200
Criterion   Neg         73         727        800
Total                  100         900       1000

Se  = 27/100 = 0.27
Sp  = 727/900 = 0.81
FPR = 1 - Sp = 0.19
FNR = 1 - Se = 0.73
Acc = (27 + 727)/1000 = 0.75
PPV = 27/200 = 0.14
NPV = 727/800 = 0.91
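
The same figures can be reproduced with a few lines of Python (an illustrative check, not part of the slides):

    TP, FP, FN, TN = 27, 173, 73, 727
    P, N = TP + FN, FP + TN                     # 100 diseased, 900 healthy

    print(f"Se  = {TP / P:.2f}")                # 0.27
    print(f"Sp  = {TN / N:.2f}")                # 0.81
    print(f"FPR = {FP / N:.2f}")                # 0.19
    print(f"FNR = {FN / P:.2f}")                # 0.73
    print(f"Acc = {(TP + TN) / (P + N):.2f}")   # 0.75
    print(f"PPV = {TP / (TP + FP):.2f}")        # 0.14
    print(f"NPV = {TN / (TN + FN):.2f}")        # 0.91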
ROC CURVE
Introduction to ROC curves
ROC = Receiver Operating Characteristic
The ROC curve was first developed by electrical and radar engineers during World War II, for the analysis of radar signals used to detect enemy objects on battlefields.
It was soon introduced to psychology to account for the perceptual detection of stimuli.
Following the attack on Pearl Harbor in 1941, the United States Army began new research to improve the prediction of correctly detected Japanese aircraft from their radar signals.
ROC
Receiver Operating Characteristics
• ROC analysis was developed for the signal receivers in radars
• The basic aim was to distinguish enemy signals from normal signals
• It is a graphical analysis method

Development of Receiver Operating Characteristic (ROC) Curves
If you decrease the threshold (cut-off), sensitivity will increase: you will be able to catch every (enemy) plane signal. However, more of the noise in the data will also pass the threshold, so false alarms will increase as well.
The ROC curve in this example includes alternative threshold (cut-off) values; beware that sensitivity and specificity change simultaneously as we change the threshold. Remember, some signals are from enemy planes while others are normal.
ROC Analysis
"ROC analysis has since been used in medicine, radiology, biometrics, and other areas for many decades."
In medicine, ROC analysis has been used extensively in the evaluation of diagnostic tests.
ROC curves are also used extensively in epidemiology, medical research, and evidence-based medicine.
In radiology, ROC analysis is a common technique for evaluating new imaging techniques.
It can be used to compare tests and procedures.
ROC Curves: Use and Interpretation
The ROC methodology easily generalizes to test statistics that are continuous (such as lung function or a blood gas).
The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as our threshold varies.
The shape of the curve also gives us some visual clues about the overall strength of association between the underlying test statistic and disease status.
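
The following Python sketch illustrates this trade-off with made-up scores (two assumed normal distributions, not data from the lecture): as the cut-off slides, sensitivity and specificity move in opposite directions, and each cut-off contributes one point on the ROC curve.

    import numpy as np

    rng = np.random.default_rng(0)
    healthy  = rng.normal(100, 20, 500)   # hypothetical test values, mg/dl
    diseased = rng.normal(140, 20, 500)

    for cutoff in (80, 100, 120, 140, 160):
        se = np.mean(diseased >= cutoff)  # true positive rate
        sp = np.mean(healthy < cutoff)    # true negative rate
        print(f"cut-off {cutoff}: Se = {se:.2f}, Sp = {sp:.2f}")
    # Lower cut-offs raise sensitivity and lower specificity; each cut-off
    # gives one (1 - Sp, Se) point on the ROC curve.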
Example
[Figure: two distributions of the test result, one for people without the disease and one for people with the disease. A threshold splits the axis: patients below it are called "negative", patients above it are called "positive".]
Some definitions...
[Figures: with the same threshold, diseased patients who test positive are True Positives, healthy patients who test positive are False Positives, healthy patients who test negative are True Negatives, and diseased patients who test negative are False Negatives.]
Moving the Threshold
[Figures: moving the threshold to the right calls fewer patients "positive" (fewer false positives, but more false negatives); moving it to the left calls more patients "positive" (more true positives, but more false positives).]
ALTERNATIVE TEST vs. GOLD STANDARD
[Figures: frequency histograms of the test parameter (mg/dl) for the diseased and healthy groups, as classified by the gold standard. Choosing a cut-off splits the axis into a negative and a positive outcome, and the regions of the two overlapping histograms correspond to TN (True Negative), FN (False Negative), FP (False Positive), and TP (True Positive).]
Sensitivity and Specificity
Sensitivity: the ability of a test to correctly diagnose the truly diseased.
Sensitivity = TP / (TP + FN)
Specificity: the ability of a test to correctly diagnose the truly healthy.
Specificity = TN / (TN + FP)
“Receiver Operating Characteristic” Curve
It is the graphical representation of all sensitivity and specificity combinations for every possible threshold (cut-off) value. The aim is to differentiate the diseased from the healthy subjects.
[Figure: histograms of the measured value for 25 diseased and 25 healthy subjects. Each cut-off yields one sensitivity/specificity pair, e.g., sensitivity 24/25 = 0.96 or 25/25 = 1.00, with specificity ranging over 0/25 = 0.00, 1/25 = 0.04, 3/25 = 0.12, 5/25 = 0.20, and 8/25 = 0.32; plotting all such pairs traces out the ROC curve.]
“Receiver Operating Characteristic” Curve
[Figure: the frequency distributions of the measured value and the resulting ROC curve, with sensitivity on one axis and specificity on the other.]
The Area Under the Curve (AUC) shows the diagnostic performance of a test.
AUC is between 0.5 and 1.0.
“Receiver Operating Characteristic” Curve
We can use ROC curves to compare the diagnostic performances of two or more alternative tests.
[Figure: the frequency distributions and ROC curves of Test 1 and Test 2 drawn on the same sensitivity/specificity axes.]
ROC curve
[Figure: an ROC curve plotting the True Positive Rate (sensitivity) against the False Positive Rate (1 - specificity), both running from 0% to 100%.]
ROC curve comparison
[Figures: a good test, whose ROC curve bows toward the upper-left corner, and a poor test, whose curve stays close to the diagonal; both plot True Positive Rate against False Positive Rate.]
ROC curve extremes
[Figures: the best test, whose curve reaches the upper-left corner because the two distributions don't overlap at all, and the worst test, whose curve lies on the diagonal because the distributions overlap completely (tossing a coin).]
Area under ROC curve (AUC)
Overall measure of test performance
Comparisons between two tests based on differences between (estimated) AUCs
For continuous data, AUC is equivalent to the Mann-Whitney U-statistic (a nonparametric test of difference in location between two populations)
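
A brief Python sketch of this equivalence (simulated scores from assumed distributions): the AUC is computed directly as the fraction of diseased-healthy pairs in which the diseased subject scores higher, counting ties as one half, which is exactly the rescaled Mann-Whitney U-statistic.

    import numpy as np

    rng = np.random.default_rng(1)
    healthy  = rng.normal(100, 20, 200)
    diseased = rng.normal(130, 20, 200)

    # Pairwise comparisons: the Mann-Whitney formulation of AUC
    wins = (diseased[:, None] > healthy[None, :]).sum()
    ties = (diseased[:, None] == healthy[None, :]).sum()
    auc = (wins + 0.5 * ties) / (diseased.size * healthy.size)
    print(f"AUC = {auc:.3f}")   # roughly 0.85 for these assumed distributions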
AUC for ROC curves
[Figures: four ROC curves with AUC = 100%, AUC = 90%, AUC = 65%, and AUC = 50%, each plotting True Positive Rate against False Positive Rate.]
Interpretation of AUC
AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual is more indicative of disease than that from a randomly chosen healthy individual
However, it has no direct clinically relevant meaning