Chapter 6-4. Sampling With Verification Bias
*** This chapter is under construction ***
Verification bias is a problem in cohort studies where the screened positive cases are more
completely verified for true disease status than are the screened negative cases. That is, the
reference standard variable is collected a greater proportion of the time when the diagnostic test
is positive than when it is negative. Other terms for verification bias are work-up bias, referral
bias, selection bias, and ascertainment bias (Pepe, 2003, pp. 168-169).
When ordinary formulas (naïve estimates) for these test characteristics are applied to such
sample data, the bias produces a sensitivity estimate that is too high and a specificity estimate
that is too low. You can quote Pepe (2003, p. 169) as a citation for this:
“When screen positives are more likely to be verified for disease than screen negatives,
the bias in naïve estimates is always to increase sensitivity and to decrease specificity
from their true values.”
Pepe (2003, p. 168) gives the following example.
The table on the left shows data for a cohort where all screens (Y) are verified with the
reference standard, or true disease state (D). On the right, all screen positives are verified
but only 10% of screen negatives are verified.
Fully observed
        D = 1   D = 0   Total
Y = 1      40      95     135
Y = 0      10     855     865
Total      50     950    1000

Selected data
        D = 1   D = 0   Total
Y = 1      40      95     135
Y = 0       1      85      86
Total      41     180     221
Fully observed:
Sensitivity = True Positive Fraction (TPF) = 40/50 = 80%
False Positive Fraction (FPF) = 95/950 = 10%
Specificity = 1 − FPF = 855/950 = 90%

(Biased) naïve estimates based on the selected data:
Sensitivity = TPF = 40/41 = 97.6%
FPF = 95/180 = 52.8%
Specificity = 1 − FPF = 85/180 = 47.2%
_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 6-4 (revision 16 May 2010)
p. 1
In this example, the naïve sample estimates are biased: sensitivity is too high and specificity
is too low, consistent with the known direction of this bias. If the estimates were not biased,
the sample (the selected data table), which is assumed representative of the population (the
fully observed table), would provide estimates that accurately reflect the population values.
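The naïve calculations above can be reproduced directly from the 2×2 counts. This is a minimal sketch; the function name naive_estimates is ours, not from any package:

```python
def naive_estimates(tp, fp, fn, tn):
    """Naive sensitivity and specificity from raw 2x2 counts."""
    sensitivity = tp / (tp + fn)   # TPF = P[Y=1 | D=1]
    specificity = tn / (tn + fp)   # 1 - FPF = P[Y=0 | D=0]
    return sensitivity, specificity

# Fully observed cohort: estimates match the population values.
sens_full, spec_full = naive_estimates(tp=40, fp=95, fn=10, tn=855)

# Selected (verification-biased) data: sensitivity is inflated and
# specificity deflated, as the quoted result from Pepe predicts.
sens_sel, spec_sel = naive_estimates(tp=40, fp=95, fn=1, tn=85)
```

Applied to the selected data, these formulas return 40/41 = 97.6% sensitivity and 85/180 = 47.2% specificity, reproducing the biased estimates above.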
Correcting for Bias with Bayes' Theorem
To correct for the bias with Bayes' theorem, obtaining a TPF and FPF that are adjusted for
verification bias, we can use (Pepe, 2003, p. 171):
TPF = P[Y=1 | D=1]
    = P[D=1 | Y=1] P[Y=1] / P[D=1]
    = P[D=1 | Y=1] P[Y=1] / ( P[D=1 | Y=1] P[Y=1] + P[D=1 | Y=0] P[Y=0] )

FPF = P[Y=1 | D=0]
    = P[D=0 | Y=1] P[Y=1] / P[D=0]
    = P[D=0 | Y=1] P[Y=1] / ( P[D=0 | Y=1] P[Y=1] + P[D=0 | Y=0] P[Y=0] )
These are called the Begg and Greenes bias-adjusted estimates (Begg and Greenes, 1983). Pepe
(2003, p. 171) denotes these estimates as TPF_BG and FPF_BG (with hats, in her notation, to
mark them as sample estimates).
Calculating these from our data tables,
Fully observed
        D = 1   D = 0   Total
Y = 1      40      95     135
Y = 0      10     855     865
Total      50     950    1000

Selected data
        D = 1   D = 0   Total
Y = 1      40      95     135
Y = 0       1      85      86
Total      41     180     221
Using the fully observed table to calculate P[Y = 1] and P[Y = 0], and the selected data table to
calculate the remaining terms,
TPF_BG = P[D=1 | Y=1] P[Y=1] / ( P[D=1 | Y=1] P[Y=1] + P[D=1 | Y=0] P[Y=0] )
       = (40/135)(135/1000) / [ (40/135)(135/1000) + (1/86)(865/1000) ]
       = (0.2963 × 0.135) / (0.2963 × 0.135 + 0.0116 × 0.865)
       = 0.7995 ≈ 0.80
FPF_BG = P[D=0 | Y=1] P[Y=1] / ( P[D=0 | Y=1] P[Y=1] + P[D=0 | Y=0] P[Y=0] )
       = (95/135)(135/1000) / [ (95/135)(135/1000) + (85/86)(865/1000) ]
       = (0.7037 × 0.135) / (0.7037 × 0.135 + 0.9884 × 0.865)
       = 0.1000 ≈ 0.10
These estimates are identical to those from the fully observed table, and thus unbiased.
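The Begg and Greenes calculation above can be sketched in a few lines of Python. The function name and argument layout here are our own invention for illustration; only the arithmetic follows Pepe (2003, p. 171):

```python
def begg_greenes(y1d1, y1d0, y0d1, y0d0, n_y1, n_y0):
    """Bias-corrected TPF and FPF.

    y1d1 .. y0d0: verified counts by (test Y, disease D);
    n_y1, n_y0: cohort totals by test result (from the full cohort).
    """
    p_y1 = n_y1 / (n_y1 + n_y0)          # P[Y=1], from the full cohort
    p_y0 = 1 - p_y1
    p_d1_y1 = y1d1 / (y1d1 + y1d0)       # P[D=1 | Y=1], from verified data
    p_d1_y0 = y0d1 / (y0d1 + y0d0)       # P[D=1 | Y=0], from verified data
    tpf = (p_d1_y1 * p_y1) / (p_d1_y1 * p_y1 + p_d1_y0 * p_y0)
    fpf = ((1 - p_d1_y1) * p_y1) / ((1 - p_d1_y1) * p_y1
                                    + (1 - p_d1_y0) * p_y0)
    return tpf, fpf

tpf_bg, fpf_bg = begg_greenes(40, 95, 1, 85, n_y1=135, n_y0=865)
```

With the example counts, this returns TPF_BG = 0.7991 and FPF_BG = 0.1000, matching the hand calculation up to the rounding used there.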
Inverse Probability Weighting/Imputation
We can also get an unbiased estimate by recreating the fully observed table from the selected
data, based on the probability of verification of the screened result.
Defining a variable V, an indicator variable for disease verification status (Pepe, 2003,
p. 169),

V = 1 if D is ascertained, 0 if D is not ascertained,

we then multiply every cell in the selected data table by 1 / P̂[V=1 | Y], which is the inverse
of the estimated selection probability. The result is called the inverse probability weighted
table or the imputed data table (Pepe, 2003, p. 171).
In the example,

P̂[V=1 | Y=1] = 1.0 , since all screened positives were verified
P̂[V=1 | Y=0] = 0.1 , since 10% of screened negatives were verified.
Inverse weighting the cells of the selected data table,

Selected data                            Imputed data
        D = 1   D = 0   Total                    D = 1   D = 0   Total
Y = 1      40      95     135   × 1/1.0  →  Y = 1   40      95     135
Y = 0       1      85      86   × 1/0.1  →  Y = 0   10     850     860
Total      41     180     221               Total   50     945     995
and then calculating the test characteristics using ordinary formulas,

TPF_IPW = 40/50 = 0.80 = 80%
FPF_IPW = 95/945 = 0.1005 ≈ 10%
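The weighting step can likewise be sketched directly; the variable names here are illustrative only:

```python
# Verification probabilities estimated from the design:
p_v_y1 = 1.0   # P[V=1 | Y=1]: all screen positives verified
p_v_y0 = 0.1   # P[V=1 | Y=0]: 10% of screen negatives verified

# Selected (verified) counts, as [D=1, D=0] for each test result.
y1_counts = [40, 95]
y0_counts = [1, 85]

# Weight each cell by the inverse of its row's verification probability.
y1_w = [c / p_v_y1 for c in y1_counts]   # stays [40, 95]
y0_w = [c / p_v_y0 for c in y0_counts]   # becomes [10, 850]

# Ordinary formulas applied to the imputed (weighted) table.
tpf_ipw = y1_w[0] / (y1_w[0] + y0_w[0])   # 40/50 = 0.80
fpf_ipw = y1_w[1] / (y1_w[1] + y0_w[1])   # 95/945 = 0.1005
```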
We see that the following expressions hold,

TPF_BG = TPF_IPW
FPF_BG = FPF_IPW

That is, the Begg and Greenes and inverse probability weighted estimates of TPF and FPF are the
same, so one can use either approach to calculate the bias-corrected classification probabilities
(Pepe, 2003, p. 172).
Pepe (2003, p.172) states,
“The Begg and Greenes estimators are the maximum likelihood estimates when
observations are independent (Zhou, 1993).”
Pepe (2003, p. 172) provides the variance formulas derived by Begg and Greenes (1983), which
are:
var( log[ TPF_BG / (1 − TPF_BG) ] )
    = (1/N) { 1/[τ(1−τ)] + (1−PPV)/[PPV × P1V × τ] + NPV/[(1−NPV) × P0V × (1−τ)] }

and

var( log[ FPF_BG / (1 − FPF_BG) ] )
    = (1/N) { 1/[τ(1−τ)] + PPV/[(1−PPV) × P1V × τ] + (1−NPV)/[NPV × P0V × (1−τ)] }

where τ = P[Y=1]
      N = study cohort sample size (the "fully observed table" N)
      P1V = proportion of subjects for whom Y = 1 that are verified for disease status
      P0V = proportion of subjects for whom Y = 0 that are verified for disease status
Returning to the example,

Imputed data
        D = 1   D = 0   Total
Y = 1      40      95     135
Y = 0      10     850     860
Total      50     945     995

(N = 1000 is the actual study cohort size)

we have

N = 1000
τ = P[Y=1] = 135/1000 = 0.1350
PPV = 40/135 = 0.2963
NPV = 850/860 = 0.9884
P1V = 1.0 , since all screened positives were verified
P0V = 0.1 , since 10% of screened negatives were verified
Substituting these values into the variance formulas,

var( log[ TPF_BG / (1 − TPF_BG) ] )
    = (1/1000) { 1/[0.1350(1 − 0.1350)] + (1 − 0.2963)/[0.2963 × 1 × 0.1350]
                 + 0.9884/[(1 − 0.9884) × 0.1 × (1 − 0.1350)] }
    = 1.011
se( log[ TPF_BG / (1 − TPF_BG) ] ) = sqrt( var( log[ TPF_BG / (1 − TPF_BG) ] ) )
    = sqrt(1.011) = 1.0056
var( log[ FPF_BG / (1 − FPF_BG) ] )
    = (1/1000) { 1/[0.1350(1 − 0.1350)] + 0.2963/[(1 − 0.2963) × 1 × 0.1350]
                 + (1 − 0.9884)/[0.9884 × 0.1 × (1 − 0.1350)] }
    = 0.0117
se( log[ FPF_BG / (1 − FPF_BG) ] ) = sqrt( var( log[ FPF_BG / (1 − FPF_BG) ] ) )
    = sqrt(0.0117) = 0.1082
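These variance and standard error computations can be checked numerically. A sketch, with a hypothetical helper name bg_logit_variances; the argument names mirror the symbols defined above:

```python
import math

def bg_logit_variances(n, tau, ppv, npv, p1v, p0v):
    """Begg-Greenes variances of the log odds of TPF and FPF (Pepe 2003, p. 172)."""
    var_logit_tpf = (1 / n) * (1 / (tau * (1 - tau))
                               + (1 - ppv) / (ppv * p1v * tau)
                               + npv / ((1 - npv) * p0v * (1 - tau)))
    var_logit_fpf = (1 / n) * (1 / (tau * (1 - tau))
                               + ppv / ((1 - ppv) * p1v * tau)
                               + (1 - npv) / (npv * p0v * (1 - tau)))
    return var_logit_tpf, var_logit_fpf

# Exact fractions avoid the rounding used in the hand calculation.
v_tpf, v_fpf = bg_logit_variances(n=1000, tau=135/1000, ppv=40/135,
                                  npv=850/860, p1v=1.0, p0v=0.1)
se_tpf, se_fpf = math.sqrt(v_tpf), math.sqrt(v_fpf)
```

With exact fractions these land near the hand-rounded 1.011 and 0.0117 (and se near 1.0056 and 0.1082); the small differences are rounding in the worked example.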
The asymptotic confidence interval around log[ TPF_BG / (1 − TPF_BG) ] is given by

log[ TPF_BG / (1 − TPF_BG) ] ± 1.96 × se( log[ TPF_BG / (1 − TPF_BG) ] )
    = log(0.8/0.2) ± 1.96 × 1.0056
    = (−0.59, 3.36)
Similarly, the asymptotic confidence interval around log[ FPF_BG / (1 − FPF_BG) ] is given by

log[ FPF_BG / (1 − FPF_BG) ] ± 1.96 × se( log[ FPF_BG / (1 − FPF_BG) ] )
    = log(0.1/0.9) ± 1.96 × 0.1082
    = (−2.41, −1.99)
To convert these to confidence intervals around TPF and FPF, we use the inverse of the logit
transformation (Begg and Greenes, 1983),

θ = exp( log[θ/(1 − θ)] ) / ( 1 + exp( log[θ/(1 − θ)] ) )

so each confidence limit a on the log odds scale maps to exp(a)/(1 + exp(a)).
The 95% CI for TPF, or sensitivity, where sensitivity was estimated as 0.80, is

( exp(−0.59)/[1 + exp(−0.59)] , exp(3.36)/[1 + exp(3.36)] ) = (0.36, 0.97)
The 95% CI for FPF, where FPF was estimated as 0.10, is

( exp(−2.41)/[1 + exp(−2.41)] , exp(−1.99)/[1 + exp(−1.99)] ) = (0.08, 0.12)
Using specificity = 1-FPF, and switching the CI limits from (a,b) to (b,a) to account for taking
the additive inverse, the 95% CI for specificity, where specificity was estimated as 0.90, is
(1-0.12 , 1-0.08) = (0.88, 0.92)
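The interval construction and back-transformation can be sketched as follows; logit_ci is a hypothetical helper, not a library function:

```python
import math

def logit_ci(est, se_logit, z=1.96):
    """95% CI for a proportion: build it on the log odds scale, then invert."""
    logit = math.log(est / (1 - est))
    lo, hi = logit - z * se_logit, logit + z * se_logit
    inv_logit = lambda x: math.exp(x) / (1 + math.exp(x))
    return inv_logit(lo), inv_logit(hi)

ci_sens = logit_ci(0.80, 1.0056)   # wide: roughly (0.36, 0.97)
ci_fpf = logit_ci(0.10, 0.1082)   # narrow: roughly (0.08, 0.12)
```

The specificity CI then follows by complementing and reversing the FPF limits, as in the text.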
The asymptotic interval for sensitivity, in particular, is very wide and uninformative. Pepe
(2003, p. 174) compared the asymptotic CI, or large sample theory CI, computed above for TPF,
(0.36, 0.97), to a bootstrapped CI using the percentile method, or (2.5th, 97.5th) percentiles,
which was (0.67, 0.89), a much narrower and different CI.
Pepe (2003, p. 174) used this example, along with a sensitivity analysis of this interval, to
question whether large sample theory applies to inference at realistic sample sizes. She
suggested that 1) a CI derived by resampling/simulation, that is, a bootstrapped CI, is better
suited to the sample sizes encountered in practice; or 2) one might at least compare results
from asymptotic theory with those from resampling/simulation.
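Pepe's bootstrap suggestion can be illustrated with a simple percentile bootstrap over the 221 verified records. This is a rough sketch, not a replica of the Stata bootstrap shown later: it holds P[Y=1] fixed at the full-cohort value and ignores the subtleties of resampling under the two-phase design:

```python
import random

def bg_sensitivity(y1d1, y1d0, y0d1, y0d0, p_y1=0.135):
    """Begg-Greenes corrected sensitivity; P[Y=1] fixed from the full cohort."""
    ppv = y1d1 / (y1d1 + y1d0)        # P[D=1 | Y=1]
    p_d1_y0 = y0d1 / (y0d1 + y0d0)    # P[D=1 | Y=0]
    num = ppv * p_y1
    return num / (num + p_d1_y0 * (1 - p_y1))

# The 221 verified records as (test, disease) pairs.
records = [(1, 1)] * 40 + [(1, 0)] * 95 + [(0, 1)] * 1 + [(0, 0)] * 85

random.seed(999)   # fixed seed for reproducibility
boot = []
for _ in range(1000):
    resample = random.choices(records, k=len(records))
    c = {pair: resample.count(pair)
         for pair in [(1, 1), (1, 0), (0, 1), (0, 0)]}
    # Skip the (very rare) degenerate resamples that empty a test group.
    if c[(1, 1)] + c[(1, 0)] == 0 or c[(0, 1)] + c[(0, 0)] == 0:
        continue
    boot.append(bg_sensitivity(c[(1, 1)], c[(1, 0)], c[(0, 1)], c[(0, 0)]))

boot.sort()
lo = boot[int(0.025 * len(boot))]       # 2.5th percentile
hi = boot[int(0.975 * len(boot)) - 1]   # 97.5th percentile
```

The percentile interval (lo, hi) typically has a much higher lower limit than the asymptotic 0.36, illustrating Pepe's point.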
Software
The Begg and Greenes unbiased estimates for sensitivity (TPF) and specificity (1 − FPF), along
with the confidence intervals, can be computed using the Stata ado-file begggreenes.ado [Author:
Greg Stoddard].
Note: Using begggreenes.ado
In the command window, execute the command
sysdir
This tells you the directories Stata searches to find commands, or ado files.
It will look like:
STATA: C:\Program Files\Stata10\
UPDATES: C:\Program Files\Stata10\ado\updates\
BASE: C:\Program Files\Stata10\ado\base\
SITE: C:\Program Files\Stata10\ado\site\
PLUS: c:\ado\plus\
PERSONAL: c:\ado\personal\
OLDPLACE: c:\ado\
I suggest you copy the files begggreenes.ado and begggreenes.hlp from the course manual ado
files subdirectory to the c:\ado\personal\ directory. Alternatively, you can simply make sure
these two files are in your working directory (the directory shown in the bottom left corner of
the Stata screen). Having done that, begggreenes becomes an executable command in your
installation of Stata. If the directory c:\ado\personal\ does not exist, you should create it
using Windows Explorer (My Documents icon), and then copy the two files into it. The directory
is normally created by Stata the first time you update Stata.
To get help for begggreenes, use help begggreenes in the command window.
To execute, use the command begggreenes followed by the two required variable names and
three options.
The syntax is found in the help file.
help begggreenes
Syntax for begggreenes
----------------------------------------------------------------------
[by byvar:] begggreenes yvar dvar [if] [in] , cohortsize( ) pv1( ) pv0( )

where yvar is name of dichotomous test variable
      dvar is name of dichotomous disease variable (gold standard)
      cohortsize(n), where n = size of study cohort
      pv1(x), where x = number between 0 and 1, is the proportion of
          the yvar=1 subjects in the cohort that have nonmissing
          dvar (have verification of disease)
      pv0(y), where y = number between 0 and 1, is the proportion of
          the yvar=0 subjects in the cohort that have nonmissing
          dvar (have verification of disease)

Note: the two variables and 3 options are required.

Description
-----------
begggreenes computes the Begg and Greenes (1983) unbiased estimators for
sensitivity and specificity, along with both asymptotic and bootstrapped CIs.

Reference
---------
Begg CB, Greenes RA. Assessment of diagnostic tests when disease
verification is subject to selection bias. Biometrics 1983;39:207-215.

Example
-------
begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)
To obtain the statistics discussed in the example, first bring the data into Stata using
clear
input yvar dvar count
1 1 40
1 0 95
0 1 1
0 0 85
end
expand count
drop count
Then, to compute the statistics, use
begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)
Sample Data

                disease (gold)
test            +        -     Total
--------------------------------------
  +            40       95       135
  -             1       85        86
--------------------------------------
Total          41      180       221
Imputed Inverse Probability Weighting Population Data

                disease (gold)
test            +        -     Total
--------------------------------------
  +            40       95       135
  -            10      850       860
--------------------------------------
Total          50      945       995
Sensitivity (Begg & Greenes) = 0.7991    95% CI (0.3571 , 0.9661)
Specificity (Begg & Greenes) = 0.9000    95% CI (0.8791 , 0.9176)

Sensitivity (Inverse Probability Weighted) = 0.8000
Specificity (Inverse Probability Weighted) = 0.8995

Cohort N = 1000
Proportion cohort with positive test disease verified = 1.0000
Proportion cohort with negative test disease verified = 0.1000
These results agree with the results in the text above, as well as with those shown in Pepe
(2003), where she presented this example.
To get bootstrapped confidence intervals, as suggested by Pepe (2003), use the following
command. It reports four bootstrap CI methods. The most popularly reported approach is the
bias-corrected CI, although the bias-corrected and accelerated CI is supposed to be superior.
bootstrap r(unbiased_sensitivity_BG) r(unbiased_specificity_BG), ///
reps(1000) size(221) seed(999) bca: ///
begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)
estat bootstrap, all
Bootstrap results                               Number of obs =        221
                                                Replications  =       1000

command: begggreenes yvar dvar, cohortsize(1000) pv1(1) pv0(.1)
  _bs_1: r(unbiased_sensitivity)
  _bs_2: r(unbiased_specificity)

------------------------------------------------------------------------------
             |  Observed   Bootstrap
             |     Coef.        Bias   Std. Err.  [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 | .79907085   .0366465   .15247002   .5002351   1.097907    (N)
             |                                    .5238683          1    (P)
             |                                    .4854757          1   (BC)
             |                                    .3863797          1  (BCa)
       _bs_2 | .89999388   5.30e-06    .0074108    .885469   .9145188    (N)
             |                                    .8849489   .9142065    (P)
             |                                    .8843828   .9136888   (BC)
             |                                    .8843828   .9136316  (BCa)
------------------------------------------------------------------------------
(N)    normal confidence interval
(P)    percentile confidence interval
(BC)   bias-corrected confidence interval
(BCa)  bias-corrected and accelerated confidence interval
Article Suggestion
Here is a suggestion for reporting this approach in the Statistical Methods section of your
article.

Given that 100% of the patients who tested positive on the screening test had their disease
verified using the gold standard test, while only 10% of the patients who tested negative
on the screening test had their disease verified, ordinary estimates of sensitivity and
specificity are subject to verification bias (Pepe, 2003). Therefore, we report Begg and
Greenes estimates of sensitivity and specificity, in which the estimates are corrected for
verification bias using a Bayes' Theorem approach (Begg and Greenes, 1983; Pepe, 2003).
Pepe has shown the asymptotic confidence intervals to be unreliable for sample sizes used
in research studies and instead recommends bootstrapped confidence intervals (Pepe,
2003). Thus, we report bootstrapped confidence intervals using the "bias-corrected"
method (Carpenter and Bithell, 2000), where "bias" in this sense refers not to
verification bias, but to bringing the confidence intervals closer to their expected
value.
References
Begg CB, Greenes RA. (1983). Assessment of diagnostic tests when disease verification is
subject to selection bias. Biometrics 39:207-215.
Carpenter J, Bithell J. (2000). Bootstrap confidence intervals: when, which, what? A practical
guide for medical statisticians. Statist. Med. 19:1141-1164.
Pepe MS. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction.
New York, Oxford University Press, pp. 168-174.
Zhou XH. (1993). Maximum likelihood estimators of sensitivity and specificity corrected for
verification bias. Communications in Statistics - Theory and Methods 22:3177-3198.