Chapter 6-4. Sampling With Verification Bias

*** This chapter is under construction ***

Verification bias is a problem in cohort studies where the screened positive cases are more completely verified for true disease status than are the screened negative cases. That is, the reference standard variable is collected a greater proportion of the time when the diagnostic test is positive than when it is negative. Other terms for verification bias are work-up bias, referral bias, selection bias, and ascertainment bias (Pepe, 2003, pp. 168-169).

When ordinary formulas (naïve estimates) for the test characteristics are applied to such sample data, the bias produces a sensitivity estimate that is too high and a specificity estimate that is too low. You can quote Pepe (2003, p. 169) as a citation for this: "When screen positives are more likely to be verified for disease than screen negatives, the bias in naive estimates is always to increase sensitivity and to decrease specificity from their true values."

Pepe (2003, p. 168) gives the following example. The first table shows data for a cohort where all screens (Y) are verified with the reference standard, or true disease state (D). In the second, all screen positives are verified but only 10% of screen negatives are verified.

    Fully observed      D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0              10      855      865
                           50      950     1000

    Selected data       D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0               1       85       86
                           41      180      221

Fully observed:
    Sensitivity = True Positive Fraction (TPF) = 40/50 = 80%
    False Positive Fraction (FPF) = 95/950 = 10%
    Specificity = 1 - FPF = 855/950 = 90%

(Biased) naïve estimates based on selected data:
    Sensitivity = TPF = 40/41 = 97.6%
    FPF = 95/180 = 52.8%
    Specificity = 1 - FPF = 85/180 = 47.2%

_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript]. University of Utah School of Medicine, 2010.

Chapter 6-4 (revision 16 May 2010)
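The fully observed and naïve calculations above can be reproduced with a short Python sketch (illustrative only; the dictionary layout and variable names are this sketch's own, not part of the course's Stata code):

```python
# Reproduce the fully observed and naive (selected data) estimates from the
# example tables.  Keys are (Y, D) = (screen result, disease status).
full = {(1, 1): 40, (1, 0): 95, (0, 1): 10, (0, 0): 855}  # all 1000 verified
sel  = {(1, 1): 40, (1, 0): 95, (0, 1): 1,  (0, 0): 85}   # 10% of screen negatives

def tpf_fpf(t):
    """Return (sensitivity, false positive fraction) from a 2x2 table."""
    tpf = t[(1, 1)] / (t[(1, 1)] + t[(0, 1)])   # TP / (TP + FN)
    fpf = t[(1, 0)] / (t[(1, 0)] + t[(0, 0)])   # FP / (FP + TN)
    return tpf, fpf

tpf_full, fpf_full = tpf_fpf(full)   # 0.80 and 0.10
tpf_sel,  fpf_sel  = tpf_fpf(sel)    # 0.976 and 0.528, biased as predicted
```

The direction matches Pepe's rule: the naïve sensitivity is inflated and the naïve specificity (1 - FPF) is deflated.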
In this example, the naïve sample estimates are biased, with sensitivity being too high and specificity being too low, which is consistent with the known direction of this bias. If the estimates were not biased, the sample (selected data table), which is assumed representative of the population (fully observed table), would provide estimates that accurately reflect the population values.

Correcting for Bias with Bayes' Theorem

To correct for verification bias with Bayes' theorem, obtaining a TPF and FPF that are adjusted for verification bias, we can use (Pepe, 2003, p. 171):

    TPF = P[Y=1 | D=1] = P[D=1 | Y=1] P[Y=1] / P[D=1]
        = P[D=1 | Y=1] P[Y=1] / ( P[D=1 | Y=1] P[Y=1] + P[D=1 | Y=0] P[Y=0] )

    FPF = P[Y=1 | D=0] = P[D=0 | Y=1] P[Y=1] / P[D=0]
        = P[D=0 | Y=1] P[Y=1] / ( P[D=0 | Y=1] P[Y=1] + P[D=0 | Y=0] P[Y=0] )

These are called the Begg and Greenes bias-adjusted estimates (Begg and Greenes, 1983). Pepe (2003, p. 171) denotes these estimates TPF_BG and FPF_BG (with hats to indicate sample estimates).

Calculating these from our data tables,

    Fully observed      D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0              10      855      865
                           50      950     1000

    Selected data       D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0               1       85       86
                           41      180      221

and using the fully observed table to calculate P[Y=1] and P[Y=0], and the selected data table to calculate the remaining terms,

    TPF_BG = P[D=1 | Y=1] P[Y=1] / ( P[D=1 | Y=1] P[Y=1] + P[D=1 | Y=0] P[Y=0] )
           = (40/135)(135/1000) / [ (40/135)(135/1000) + (1/86)(865/1000) ]
           = (0.2963)(0.135) / [ (0.2963)(0.135) + (0.0116)(0.865) ]
           = 0.7995 ≈ 0.80

    FPF_BG = P[D=0 | Y=1] P[Y=1] / ( P[D=0 | Y=1] P[Y=1] + P[D=0 | Y=0] P[Y=0] )
           = (95/135)(135/1000) / [ (95/135)(135/1000) + (85/86)(865/1000) ]
           = (0.7037)(0.135) / [ (0.7037)(0.135) + (0.9884)(0.865) ]
           = 0.1000 = 0.10

These estimates are identical to those from the fully observed table, and thus unbiased.

Inverse Probability Weighting/Imputation

We can also get an unbiased estimate by recreating the fully observed table from the selected data, based on the probability of verification of the screened result.
Defining a variable V, which is an indicator variable for disease verification status (Pepe, 2003, p. 169),

    V = 1 if D is ascertained
    V = 0 if D is not ascertained

we then multiply every cell in the selected data table by 1 / P̂[V=1 | Y], which is the inverse of the estimated selection probability. The result is called the inverse probability weighted table or the imputed data table (Pepe, 2003, p. 171). In the example,

    P̂[V=1 | Y=1] = 1.0 , since all screen positives were verified
    P̂[V=1 | Y=0] = 0.1 , since 10% of screen negatives were verified.

Inverse weighting the cells of the selected data table (the Y = 1 row by 1/1.0, the Y = 0 row by 1/0.1),

    Selected data       D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0               1       85       86
                           41      180      221

    Imputed data        D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0              10      850      860
                           50      945      995

and then calculating the test characteristics using ordinary formulas,

    TPF_IPW = 40/50 = 0.80 = 80%
    FPF_IPW = 95/945 = 0.1005 ≈ 10%

We see that the following expressions hold,

    TPF_BG = TPF_IPW
    FPF_BG = FPF_IPW

That is, the Begg and Greenes and inverse probability weighted estimates of TPF and FPF are the same, so one can use either approach to calculate the bias-corrected classification probabilities (Pepe, 2003, p. 172). Pepe (2003, p. 172) states, "The Begg and Greenes estimators are the maximum likelihood estimates when observations are independent (Zhou, 1993)."

Pepe (2003, p. 172) provides the variance formulas derived by Begg and Greenes (1983), which are:

    var{ log[ TPF_BG / (1 - TPF_BG) ] }
        = (1/N) { 1/[τ(1-τ)] + (1-PPV)/(PPV × P1V × τ) + NPV/[(1-NPV) × P0V × (1-τ)] }

    var{ log[ FPF_BG / (1 - FPF_BG) ] }
        = (1/N) { 1/[τ(1-τ)] + PPV/[(1-PPV) × P1V × τ] + (1-NPV)/(NPV × P0V × (1-τ)) }

where

    τ   = P[Y=1]
    N   = study cohort sample size (the "fully observed table" N)
    P1V = proportion of subjects for whom Y = 1 that are verified for disease status
    P0V = proportion of subjects for whom Y = 0 that are verified for disease status
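Before working through the variance example, the claimed equality of the two point estimators can be checked numerically. A minimal Python sketch (names are this sketch's own, not the course's Stata code): note that the exact equality holds when the verification probabilities are estimated from the data (P̂[V=1|Y=0] = 86/865 here); with the rounded 0.1 used in the text, IPW gives 0.8000 against 0.7991 for Begg and Greenes, a small discrepancy also visible in the begggreenes output later in the chapter.

```python
# Begg and Greenes (Bayes) vs inverse probability weighted estimates on the
# example data.  Keys are (Y, D) counts among verified subjects.
sel = {(1, 1): 40, (1, 0): 95, (0, 1): 1, (0, 0): 85}
p_y1, p_y0 = 135 / 1000, 865 / 1000     # P[Y=1], P[Y=0] from the full cohort

# Begg and Greenes: Bayes' theorem, with P[D|Y] estimated from verified data
p_d1_y1 = sel[(1, 1)] / 135             # 40/135, screen positives all verified
p_d1_y0 = sel[(0, 1)] / 86              # 1/86, among the 86 verified negatives
tpf_bg = p_d1_y1 * p_y1 / (p_d1_y1 * p_y1 + p_d1_y0 * p_y0)
fpf_bg = ((1 - p_d1_y1) * p_y1
          / ((1 - p_d1_y1) * p_y1 + (1 - p_d1_y0) * p_y0))

# IPW: divide each cell by its *estimated* verification probability
pv1_hat, pv0_hat = 135 / 135, 86 / 865
imp = {(y, d): n / (pv1_hat if y == 1 else pv0_hat) for (y, d), n in sel.items()}
tpf_ipw = imp[(1, 1)] / (imp[(1, 1)] + imp[(0, 1)])
fpf_ipw = imp[(1, 0)] / (imp[(1, 0)] + imp[(0, 0)])

# The two estimators agree (up to floating point) with estimated weights
assert abs(tpf_bg - tpf_ipw) < 1e-12 and abs(fpf_bg - fpf_ipw) < 1e-12
```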
Returning to the example,

    Imputed data        D = 1    D = 0    Total
        Y = 1              40       95      135
        Y = 0              10      850      860
                           50      945      995

with N = 1000 the actual study cohort size, we have

    N   = 1000
    τ   = P[Y=1] = 135/1000 = 0.1350
    PPV = 40/135 = 0.2963
    NPV = 850/860 = 0.9884
    P1V = 1.0 , all screen positives were verified
    P0V = 0.1 , 10% of screen negatives were verified

Substituting these values into the variance formulas,

    var{ log[ TPF_BG / (1-TPF_BG) ] }
        = (1/N) { 1/[τ(1-τ)] + (1-PPV)/(PPV × P1V × τ) + NPV/[(1-NPV) × P0V × (1-τ)] }
        = (1/1000) { 1/[0.1350(1-0.1350)] + (1-0.2963)/(0.2963 × 1 × 0.1350)
                     + 0.9884/[(1-0.9884) × 0.1 × (1-0.1350)] }
        = 1.011

    se{ log[ TPF_BG / (1-TPF_BG) ] } = sqrt(1.011) = 1.0056

    var{ log[ FPF_BG / (1-FPF_BG) ] }
        = (1/N) { 1/[τ(1-τ)] + PPV/[(1-PPV) × P1V × τ] + (1-NPV)/(NPV × P0V × (1-τ)) }
        = (1/1000) { 1/[0.1350(1-0.1350)] + 0.2963/[(1-0.2963) × 1 × 0.1350]
                     + (1-0.9884)/(0.9884 × 0.1 × (1-0.1350)) }
        = 0.0118

    se{ log[ FPF_BG / (1-FPF_BG) ] } = sqrt(0.0118) = 0.1087

The asymptotic confidence interval around log[ TPF_BG / (1-TPF_BG) ] is given by

    log[ TPF_BG / (1-TPF_BG) ] ± 1.96 × se{ log[ TPF_BG / (1-TPF_BG) ] }
        = log(0.8/0.2) ± 1.96 × 1.0056
        = (-0.59 , 3.36)

Similarly, the asymptotic confidence interval around log[ FPF_BG / (1-FPF_BG) ] is given by

    log[ FPF_BG / (1-FPF_BG) ] ± 1.96 × se{ log[ FPF_BG / (1-FPF_BG) ] }
        = log(0.1/0.9) ± 1.96 × 0.1087
        = (-2.41 , -1.98)

To convert these to confidence intervals around TPF and FPF, we use the inverse logit transformation (Begg and Greenes, 1983),

    p = exp(λ) / [1 + exp(λ)]

where λ is a confidence limit on the log odds scale. The 95% CI for TPF, or sensitivity, where sensitivity was estimated as 0.80, is

    ( exp(-0.59)/[1+exp(-0.59)] , exp(3.36)/[1+exp(3.36)] ) = (0.36 , 0.97)

The 95% CI for FPF, where FPF was estimated as 0.10, is

    ( exp(-2.41)/[1+exp(-2.41)] , exp(-1.98)/[1+exp(-1.98)] ) = (0.08 , 0.12)

Using specificity = 1 - FPF, and switching the CI limits from (a, b) to (b, a) to account for taking the complement, the 95% CI for specificity, where specificity was estimated as 0.90, is

    (1-0.12 , 1-0.08) = (0.88 , 0.92)

The interval for sensitivity, in particular, is very wide and uninformative.
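The variance and confidence interval arithmetic for sensitivity can be cross-checked with a short Python sketch (illustrative only; NPV is taken as 850/860 = 0.9884 from the imputed table, and all symbols follow the definitions above):

```python
# Begg and Greenes asymptotic 95% CI for sensitivity (TPF) on the log odds
# scale, then back-transformed by the inverse logit.
import math

N   = 1000           # full cohort size
tau = 135 / 1000     # P[Y=1]
ppv = 40 / 135       # P[D=1|Y=1] from verified data
npv = 850 / 860      # P[D=0|Y=0] from the imputed table (0.9884)
p1v, p0v = 1.0, 0.1  # verification probabilities by screen result
tpf = 0.80           # bias-corrected sensitivity estimate

var_logit = (1 / N) * (1 / (tau * (1 - tau))
                       + (1 - ppv) / (ppv * p1v * tau)
                       + npv / ((1 - npv) * p0v * (1 - tau)))
se = math.sqrt(var_logit)             # close to the 1.0056 in the text
                                      # (small rounding differences)
logit = math.log(tpf / (1 - tpf))
lo, hi = logit - 1.96 * se, logit + 1.96 * se

def inv_logit(x):
    return math.exp(x) / (1 + math.exp(x))

ci = (inv_logit(lo), inv_logit(hi))   # about (0.36, 0.97)
```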
Pepe (2003, p. 174) compared the asymptotic CI, or large sample theory CI, computed above for TPF, (0.36 , 0.97), to a bootstrapped CI using the percentile method, or (2.5th , 97.5th) percentiles, which was (0.67, 0.89), a much narrower and different CI. Pepe (2003, p. 174) used this example, along with a sensitivity analysis of this interval, to conclude that the applicability of large sample theory to inference in realistic sample sizes is called into question. She suggested that 1) a CI derived by resampling/simulation, that is, a bootstrapped CI, would be better suited to the realistic sample sizes encountered in practice; or 2) in practice, one might at least compare results from asymptotic theory with those from resampling/simulation.

Software

The Begg and Greenes unbiased estimates for sensitivity (TPF) and specificity (1-FPF), along with the confidence intervals, can be computed using the Stata ado-file begggreenes.ado [Author: Greg Stoddard].

Note: Using begggreenes.ado

In the command window, execute the command

    sysdir

This tells you the directories Stata searches to find commands, or ado-files. It will look like:

    STATA:     C:\Program Files\Stata10\
    UPDATES:   C:\Program Files\Stata10\ado\updates\
    BASE:      C:\Program Files\Stata10\ado\base\
    SITE:      C:\Program Files\Stata10\ado\site\
    PLUS:      c:\ado\plus\
    PERSONAL:  c:\ado\personal\
    OLDPLACE:  c:\ado\

I suggest you copy the files begggreenes.ado and begggreenes.hlp from the course manual ado files subdirectory to the c:\ado\personal\ directory. Alternatively, you can simply make sure these two files are in your working directory (the directory shown in the bottom left corner of the Stata screen). Having done that, begggreenes becomes an executable command in your installation of Stata.

If the directory c:\ado\personal\ does not exist, then you should create it using Windows Explorer (My Documents icon), and then copy the two files into this directory.
The directory is normally created by Stata the first time you update Stata.

To get help for begggreenes, use

    help begggreenes

in the command window. To execute, use the command begggreenes followed by the two required variable names and three options. The syntax is found in the help file.

    Syntax for begggreenes
    ----------------------------------------------------------------------
    [by byvar:] begggreenes yvar dvar [if] [in] , cohortsize( ) pv1( ) pv0( )

    where
      yvar is name of dichotomous test variable
      dvar is name of dichotomous disease variable (gold standard)
      cohortsize(n), where n = size of study cohort
      pv1(x), where x = number between 0 and 1, is the proportion of the
          yvar=1 subjects in the cohort that have nonmissing dvar (have
          verification of disease)
      pv0(y), where y = number between 0 and 1, is the proportion of the
          yvar=0 subjects in the cohort that have nonmissing dvar (have
          verification of disease)

    Note: the two variables and 3 options are required.

    Description
    -----------
    begggreenes computes the Begg and Greenes (1983) unbiased estimators
    for sensitivity and specificity, along with both asymptotic and
    bootstrapped CIs.

    Reference
    ---------
    Begg CB, Greenes RA. Assessment of diagnostic tests when disease
    verification is subject to selection bias. Biometrics 1983;39:207-215.

    Example
    -------
    begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)
To obtain the statistics discussed in the example, first bring the data into Stata using

    clear
    input yvar dvar count
    1 1 40
    1 0 95
    0 1 1
    0 0 85
    end
    expand count
    drop count

Then, to compute the statistics, use

    begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)

    Sample Data
                     disease (gold)
                        +      -
               ------------------------
    test    +  |       40     95 |  135
            -  |        1     85 |   86
               ------------------------
                       41    180 |  221

    Imputed Inverse Probability Weighting Population Data
                     disease (gold)
                        +      -
               ------------------------
    test    +  |       40     95 |  135
            -  |       10    850 |  860
               ------------------------
                       50    945 |  995

    Sensitivity (Begg & Greenes) = 0.7991   95% CI (0.3571 , 0.9661)
    Specificity (Begg & Greenes) = 0.9000   95% CI (0.8791 , 0.9176)

    Sensitivity (Inverse Probability Weighted) = 0.8000
    Specificity (Inverse Probability Weighted) = 0.8995

    Cohort N = 1000
    Proportion cohort with positive test disease verified = 1.0000
    Proportion cohort with negative test disease verified = 0.1000

These results agree with the results in the text above, as well as with those shown in Pepe (2003), where she presented this example.

To get bootstrapped confidence intervals, as suggested by Pepe (2003), use the following command. It will use four bootstrapping methods. The most commonly reported approach is the bias-corrected CI, although the bias-corrected and accelerated CI is supposed to be superior.

    bootstrap r(unbiased_sensitivity_BG) r(unbiased_specificity_BG), ///
        reps(1000) size(221) seed(999) bca: ///
        begggreenes yvar dvar , cohortsize(1000) pv1(1) pv0(.1)
    estat bootstrap, all

    Bootstrap results                     Number of obs    =       221
                                          Replications     =      1000

    command: begggreenes yvar dvar, cohortsize(1000) pv1(1) pv0(.1)
      _bs_1: r(unbiased_sensitivity)
      _bs_2: r(unbiased_specificity)

    ------------------------------------------------------------------------------
                 |  Observed              Bootstrap
                 |     Coef.       Bias   Std. Err.  [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           _bs_1 | .79907085   .0366465   .15247002   .5002351   1.097907   (N)
                 |                                    .5238683          1   (P)
                 |                                    .4854757          1   (BC)
                 |                                    .3863797          1   (BCa)
           _bs_2 | .89999388   5.30e-06    .0074108    .885469   .9145188   (N)
                 |                                    .8849489   .9142065   (P)
                 |                                    .8843828   .9136888   (BC)
                 |                                    .8843828   .9136316   (BCa)
    ------------------------------------------------------------------------------
    (N)   normal confidence interval
    (P)   percentile confidence interval
    (BC)  bias-corrected confidence interval
    (BCa) bias-corrected and accelerated confidence interval

Article Suggestion

Here is a suggestion for reporting this approach in the Statistical Methods section of your article.

    Given that 100% of the patients who tested positive on the screening test
    had their disease verified using the gold standard test, while only 10% of
    the patients who tested negative on the screening test had their disease
    verified, ordinary estimates of sensitivity and specificity are subject to
    verification bias (Pepe, 2003). Therefore, we report Begg and Greenes
    estimates of sensitivity and specificity, where the estimates are corrected
    for verification bias using a Bayes' theorem approach (Begg and Greenes,
    1983; Pepe, 2003). Pepe has shown the asymptotic confidence intervals to be
    unreliable for sample sizes used in research studies and instead recommends
    bootstrapped confidence intervals (Pepe, 2003). Thus, we report
    bootstrapped confidence intervals using the "bias-corrected" method
    (Carpenter and Bithell, 2000), where "bias" used in this sense is not
    referring to verification bias, but rather to making the confidence limits
    closer to their expected values.

References

Begg CB, Greenes RA. (1983). Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 39:207-215.

Carpenter J, Bithell J. (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statist.
Med. 19:1141-1164.

Pepe MS. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. New York: Oxford University Press, pp. 168-173.

Zhou XH. (1993). Maximum likelihood estimators of sensitivity and specificity corrected for verification bias. Communications in Statistics - Theory and Methods 22:3177-3198.
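As a closing illustration of the resampling approach Pepe recommends, here is a minimal percentile bootstrap sketch in Python for the Begg and Greenes sensitivity. It resamples the 221 verified records with replacement, holding P[Y=1] fixed at the cohort value; the resampling scheme, names, and seed are this sketch's own assumptions, not the internals of Stata's begggreenes.

```python
# Percentile bootstrap of the Begg and Greenes sensitivity (TPF) estimate.
import random

records = ([(1, 1)] * 40 + [(1, 0)] * 95 +    # verified (Y, D) pairs,
           [(0, 1)] * 1  + [(0, 0)] * 85)     # 221 records in all
tau = 135 / 1000                              # P[Y=1] in the full cohort

def tpf_bg(sample):
    """Begg and Greenes sensitivity from a resampled set of verified pairs."""
    y1 = [d for y, d in sample if y == 1]
    y0 = [d for y, d in sample if y == 0]
    p_d1_y1 = sum(y1) / len(y1)               # P[D=1|Y=1]
    p_d1_y0 = sum(y0) / len(y0)               # P[D=1|Y=0]
    num = p_d1_y1 * tau
    return num / (num + p_d1_y0 * (1 - tau))

random.seed(999)
boots = sorted(tpf_bg(random.choices(records, k=len(records)))
               for _ in range(1000))
lo, hi = boots[25], boots[974]                # 2.5th and 97.5th percentiles
```

Because only one diseased subject sits among the 86 verified screen negatives, many replicates omit it entirely and return a sensitivity of exactly 1.0, which is why the percentile-type intervals in the Stata bootstrap output have upper limits of 1.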