Sensitivity, Specificity, and Useful Measures of Diagnostic Utility
Frank E Harrell Jr
Department of Biostatistics, Vanderbilt University School of Medicine
Statistics and Methodology Core Training Seminar, Vanderbilt Kennedy Center
3 May 2012

Outline: Problems with Traditional Indexes; Decision Making and Forward Risk; Diagnostic Risk Modeling; Assessing Diagnostic Yield; Summary; References

Sensitivity, Specificity, and Bayes' Rule

sensitivity = Prob[T+ | D+]
specificity = Prob[T− | D−]
Prob[D+ | T+] = (sens × prev) / (sens × prev + (1 − spec) × (1 − prev))

Problems with Traditional Indexes of Diagnostic Utility

- Diagnosis forced to be binary
- Test forced to be binary
- Sensitivity and specificity are in backwards time order
- Confuse decision making for groups vs. individuals
- Inadequate utilization of pre-test information
- Dichotomization of continuous variables in general

The Dichotomizers

The speed limit is 60. I am going faster than the speed limit. Will I be caught?

- A response from a dichotomizer: Are you going faster than 70?
- A response from a better dichotomizer: If you are among other cars, are you going faster than 73? If you are exposed, are you going faster than 67?
- Better: How fast are you going, and are you exposed?
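The Bayes'-rule identity above can be checked numerically. This is a minimal sketch; the function name and the example numbers are illustrative, not from the talk:

```python
def post_test_prob(sens: float, spec: float, prev: float) -> float:
    """Prob[D+ | T+] from sensitivity, specificity, and prevalence,
    via the Bayes' rule formula on this slide."""
    return (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))

# Illustrative numbers: a test with sensitivity 0.90 and specificity 0.90
# applied at 5% prevalence yields a post-test probability of only ~0.32,
# i.e. most "positives" are false positives when prevalence is low.
print(round(post_test_prob(0.90, 0.90, 0.05), 3))
```

The example makes the slide's point concrete: sensitivity and specificity alone say little about what a positive result means for an individual patient without the pre-test probability.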
Motorist, continued

Analogy to most medical diagnosis research, in which a +/− diagnosis is a false dichotomy of an underlying disease severity: The speed limit is moderately high. I am going fairly fast. Will I be caught?

The "sensitive" motorist (who is also a lover of P-values): Of all the motorists receiving speeding tickets, what proportion of them were going semi-fast?

BI-RADS Scores

0 Incomplete: Your mammogram or ultrasound didn't give the radiologist enough information to make a clear diagnosis; follow-up imaging is necessary
1 Negative: There is nothing to comment on; routine screening recommended
2 Benign: A definite benign finding; routine screening recommended
3 Probably Benign: Findings that have a high probability of being benign (> 98%); six-month short-interval follow-up
4 Suspicious Abnormality: Not characteristic of breast cancer, but reasonable probability of being malignant (3 to 94%); biopsy should be considered
5 Highly Suspicious of Malignancy: Lesion that has a high probability of being malignant (≥ 95%); take appropriate action
6 Known Biopsy-Proven Malignancy: Lesions known to be malignant that are being imaged prior to definitive treatment; assure that treatment is completed

Breast Imaging Reporting and Data System, American College of Radiology; http://breastcancer.about.com/od/diagnosis/a/birads.htm; American College of Radiology, BI-RADS US (PDF document), Copyright 2004.

How to Reduce False Positives and Negatives?
- Do away with "positive" and "negative"
- Provide risk estimates
- Defer the decision to the decision maker
- Risks have self-contained error rates:
  - risk of 0.2 → Prob[error] = 0.2 if we don't treat
  - risk of 0.8 → Prob[error] = 0.2 if we treat

Against Diagnosis

"The act of diagnosis requires that patients be placed in a binary category of either having or not having a certain disease. Accordingly, the diseases of particular concern for industrialized countries—such as type 2 diabetes, obesity, or depression—require that a somewhat arbitrary cut-point be chosen on a continuous scale of measurement (for example, a fasting glucose level > 6.9 mmol/L [> 125 mg/dL] for type 2 diabetes). These cut-points do not adequately reflect disease biology, may inappropriately treat patients on either side of the cut-point as 2 homogeneous risk groups, fail to incorporate other risk factors, and are invariant to patient preference." Vickers et al.
[2008]

Problems with Sensitivity and Specificity

- Backwards time order
- Irrelevant to both physician and patient
- Improper discontinuous scoring rules
- Are not test characteristics
  - are characteristics of the test and the patients
  - not constant; vary with patient characteristics
  - sensitivity ↑ with any covariate related to disease severity if diagnosis is dichotomized
- Require adjustment for workup bias
  - diagnostic risk models do not; they only suffer from under-representation
- Good for proof of concept of a diagnostic method in a case-control study; not useful for assessing utility

Hlatky et al. [1984]; Moons et al. [1997]; Moons and Harrell [2003]; Gneiting and Raftery [2007]

Sensitivity of Exercise ECG for Diagnosing CAD (Hlatky et al. [1984])

Age (years)     < 40     0.56
                40-49    0.65
                50-59    0.74
                ≥ 60     0.84
Sex             male     0.72
                female   0.57
# Diseased CAs  1        0.48
                2        0.68
                3        0.85

Types of Bias

- Asymmetric error in an estimator of the appropriate quantity
- Zero error in estimating the wrong quantity

Damage Caused by Improper Discontinuous Scoring Rules

Example: predicting Prob[disease]; N = 400, 0.57 of subjects have disease; classify as diseased if prob.
> 0.5.

Model      C Index   χ²     Proportion Correct
age        .592      10.5   .622
sex        .589      12.4   .588
age+sex    .639      22.8   .600
constant   .500       0.0   .573

Adjusted odds ratios: age (IQR 58y:42y) 1.6 (0.95 CL 1.2–2.0); sex (f:m) 0.5 (0.95 CL 0.3–0.7). Test of the sex effect adjusted for age (χ² = 22.8 − 10.5): P = 0.0005. Note that adding the highly significant sex variable lowers the proportion classified correctly (.600 for age+sex vs. .622 for age alone).

Sensitivity & specificity are also improper rules.

Problems with ROC Curves and Cutoffs

". . . statistics such as the AUC are not especially relevant to someone who must make a decision about a particular xc . . . ROC curves lack or obscure several quantities that are necessary for evaluating the operational effectiveness of diagnostic tests. . . . ROC curves were first used to check how radio receivers (like radar receivers) operated over a range of frequencies. . . . This is not how most ROC curves are used now, particularly in medicine. The receiver of a diagnostic measurement . . . wants to make a decision based on some xc, and is not especially interested in how well he would have done had he used some different cutoff." Briggs and Zaretzki [2008]

In the discussion of this paper, David Hand states: "when integrating to yield the overall AUC measure, it is necessary to decide what weight to give each value in the integration. The AUC implicitly does this using a weighting derived empirically from the data. This is nonsensical. The relative importance of misclassifying a case as a non-case, compared to the reverse, cannot come from the data itself.
It must come externally, from considerations of the severity one attaches to the different kinds of misclassifications."

Optimum Decision Making for an Individual

- Minimize expected loss/cost/disutility
- Uses a utility function (e.g., the inverse of the cost of missing a diagnosis, or the cost of over-treatment if disease is absent) together with the probability of disease
- d = decision, o = outcome; utility for outcome o is U(o)
- Expected utility of decision d: U(d) = ∫ p(o | d) U(o) do
- d_opt = the d maximizing U(d)

http://en.wikipedia.org/wiki/Optimal_decision

Diagnostic Risk Modeling, Assuming (Atypical) Binary Disease Status

Y : 1 = diseased, 0 = normal
X : vector of subject characteristics (e.g., demographics, risk factors, symptoms)
T : vector of test (biomarker, . . . ) outputs
α : intercept
β : vector of coefficients of X
γ : vector of coefficients of T

pre(X) = Prob[Y = 1 | X] = 1 / (1 + exp[−(α* + β*X)])
post(X, T) = Prob[Y = 1 | X, T] = 1 / (1 + exp[−(α + βX + γT)])

Note: the proportional odds model extends this to ordinal disease severity Y.

Significant Coronary Artery Disease: Pryor et al. [1983]

Bacterial vs. Viral Meningitis: Spanos et al.
[1989]

Model for Ordinal Diagnostic Classes: Brazer et al. [1991]

Assessing Diagnostic Yield

Absolute Yield (Pencina et al. [2008]): the absolute incremental information in a new set of markers. Consider the change in predicted risk when the new variables are added:

    average increase in risk of disease when disease is present
  + average decrease in risk of disease when disease is absent

Formal test of added information: likelihood ratio χ² test of partial association of the new markers, adjusted for the old markers.

Assessing Relative Diagnostic Yield

- Variation in the relative log odds of disease, Tγ̂, holding X constant
- Summarize with Gini's mean difference or the inter-quartile range, then anti-log
- E.g.: the typical modification of pre-test odds of disease is by a factor of 3.4
- Gini's mean difference = mean absolute difference between any pair of values

Relationship between Odds Ratio and Absolute Change in Risk

[Figure: increase in risk with T+ plotted against risk of disease for a subject with T−; the numbers above the curves (1.25 to 10) are odds ratios.]

Assessing Absolute Diagnostic Yield: Cohort Study
- Patient i = 1, 2, 3, . . . , n
- In-sample sufficient statistics: pre(X1), . . . , pre(Xn); post(X1, T1), . . . , post(Xn, Tn)
- Summarize with quantile regression to estimate the 10th and 90th percentiles of post as a function of pre (Hlatky et al. [2009])

Assessing Absolute Yield, continued

Out-of-sample assessment: compute pre(X) and post(X, T) for any X and T of interest.

Summary measures:
- quantile regression curves of post as a function of pre (Koenker and Bassett [1978])
- overall mean |post − pre|
- quantiles of post − pre
- du50: distribution of post when pre = 0.5 (diagnostic utility at maximum pre-test uncertainty)
  - choose X so that pre = 0.5
  - examine the distribution of post at this pre
  - summarize with quantiles and Gini's mean difference on the probability
scale.

Special case where the test is binary (atypical): compute post for T+ and for T−.

Assessing Diagnostic Yield: Case-Control & Other Oversampling Designs

- The intercept α is meaningless
- Choose X and solve for α so that pre = 0.5
- Proceed as above to estimate du50

Example: Diagnosis of Coronary Artery Disease (CAD); Test = Total Cholesterol

[Figure: relative effect (log odds) of total cholesterol on significant CAD and on 3-vessel or left-main CAD, shown for ages 40 and 70. Data from the Duke Cardiovascular Disease Databank, n = 2258.]

Utility of Cholesterol for Diagnosing Significant CAD

[Figure: post-test probability (age + sex + cholesterol) vs. pre-test probability (age + sex). Curves are the 0.1 and 0.9 quantiles from quantile regression using restricted cubic splines.]

Summary

- Diagnostic utility needs to be estimated using measures of relevance to individual decision makers
- Improper scoring rules lead to suboptimal decisions
- Traditional risk modeling is a powerful tool in this setting
- Cohort studies are ideal, but useful measures can be obtained even with oversampling
- Avoid categorization of any continuous or ordinal
variables.

This work used only free software, including LaTeX.

References

S. R. Brazer, F. S. Pancotto, T. T. Long III, F. E. Harrell, K. L. Lee, M. P. Tyor, and D. B. Pryor. Using ordinal logistic regression to estimate the likelihood of colorectal neoplasia. J Clin Epi, 44:1263–1270, 1991.

W. M. Briggs and R. Zaretzki. The skill plot: A graphical technique for evaluating continuous diagnostic tests (with discussion). Biometrics, 64:250–261, 2008.

T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc, 102:359–378, 2007.

M. A. Hlatky, D. B. Pryor, F. E. Harrell, R. M. Califf, D. B. Mark, and R. A. Rosati. Factors affecting the sensitivity and specificity of exercise electrocardiography. Multivariable analysis. Am J Med, 77:64–71, 1984.

M. A. Hlatky, P. Greenland, D. K. Arnett, C. M. Ballantyne, M. H. Criqui, M. S. Elkind, A. S. Go, F. E. Harrell, Y. Hong, B. V. Howard, V. J. Howard, P. Y. Hsue, C. M. Kramer, J. P. McConnell, S. L. Normand, C. J. O'Donnell, S. C. Smith, and P. W. Wilson. Criteria for evaluation of novel markers of cardiovascular risk: a scientific statement from the American Heart Association. Circulation, 119(17):2408–2416, 2009. American Heart Association Expert Panel on Subclinical Atherosclerotic Diseases and Emerging Risk Factors and the Stroke Council; PMID 19364974.

R. Koenker and G. Bassett. Regression quantiles. Econometrica, 46:33–50, 1978.

K. G. M. Moons and F. E. Harrell. Sensitivity and specificity should be de-emphasized in diagnostic accuracy studies. Academic Radiology, 10:670–672, 2003. Editorial.

K. G. M. Moons, G.-A. van Es, J. W. Deckers, J. D. F. Habbema, and D. E. Grobbee.
Limitations of sensitivity, specificity, likelihood ratio, and Bayes' theorem in assessing diagnostic probabilities: A clinical example. Epidemiology, 8(1):12–17, 1997.

M. J. Pencina, R. B. D'Agostino Sr, R. B. D'Agostino Jr, and R. S. Vasan. Evaluating the added predictive ability of a new marker: From area under the ROC curve to reclassification and beyond. Stat Med, 27:157–172, 2008.

D. B. Pryor, F. E. Harrell, K. L. Lee, R. M. Califf, and R. A. Rosati. Estimating the likelihood of significant coronary artery disease. Am J Med, 75:771–780, 1983.

A. Spanos, F. E. Harrell, and D. T. Durack. Differential diagnosis of acute meningitis: An analysis of the predictive value of initial observations. JAMA, 262:2700–2707, 1989.

A. J. Vickers, E. Basch, and M. W. Kattan. Against diagnosis. Ann Int Med, 149:200–203, 2008.

Abstract: Reducing Bias and Increasing Diagnostic Utility Through Diagnostic Risk Models
Frank E Harrell Jr, Department of Biostatistics, Vanderbilt University

Medical diagnostic research, as usually practiced, is prone to bias and, even more importantly, to yielding information that is not useful to patients or physicians; it sometimes overstates the value of diagnostics. Important sources of these problems are conditioning on the wrong statistical information, reversing the flow of time, and categorization of inherently continuous test outputs and disease severity. It will be shown that sensitivity and specificity are not properties of tests in the usual sense of the word, and that they were never natural choices for describing test performance.
This implies that ROC curves are unhelpful. So is categorical thinking. The many advantages of diagnostic risk modeling will be discussed, and this talk will show how pre- and post-test diagnostic models give rise to clinically useful displays of pre-test vs. post-test probabilities that themselves quantify diagnostic utility in a way that is useful to patients, physicians, and diagnostic device makers. And unlike sensitivity and specificity, post-test probabilities are immune to certain biases, including workup bias.
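As a closing sketch, the pre-/post-test modeling quantities described above can be computed directly. This is a minimal illustration rather than the talk's actual software: every coefficient value, covariate coding, and function name below is a hypothetical stand-in for maximum-likelihood estimates from real pre-test and post-test logistic model fits.

```python
import math
from itertools import combinations

def inv_logit(lp):
    """Inverse logit: convert a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-lp))

def pre_prob(x, alpha_star, beta_star):
    """pre(X) = Prob[Y=1 | X] = 1 / (1 + exp(-(alpha* + beta*.X)))"""
    return inv_logit(alpha_star + sum(b * v for b, v in zip(beta_star, x)))

def post_prob(x, t, alpha, beta, gamma):
    """post(X, T) = Prob[Y=1 | X, T] = 1 / (1 + exp(-(alpha + beta.X + gamma.T)))"""
    lp = alpha + sum(b * v for b, v in zip(beta, x))
    lp += sum(g * v for g, v in zip(gamma, t))
    return inv_logit(lp)

def gini_mean_difference(values):
    """Mean absolute difference over all pairs of values.  Applied to the
    per-patient T.gamma-hat values and anti-logged, it summarizes the
    typical factor by which the test modifies pre-test odds of disease
    (the talk's relative-diagnostic-yield measure)."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Entirely hypothetical coefficients (age in years, sex coded 1 = male,
# total cholesterol in mg%), standing in for fitted model estimates:
alpha_s, beta_s = -4.0, [0.06, 0.9]           # pre-test model
alpha, beta, gamma = -4.4, [0.05, 0.8], [0.004]  # post-test model

x, t = [55, 1], [240]
pre = pre_prob(x, alpha_s, beta_s)            # pre-test probability
post = post_prob(x, t, alpha, beta, gamma)    # post-test probability
# Plotting many such (pre, post) pairs, with 0.1 and 0.9 quantile
# regression curves, produces the diagnostic-yield display the talk
# advocates in place of sensitivity, specificity, and ROC curves.
```

In practice the coefficients would come from fitting logistic (or, for ordinal disease severity, proportional odds) regression to a cohort, and the (pre, post) scatter would be summarized with quantile regression as described above.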