Diagnostic Testing
Ethan Cowan, MD, MS
Department of Emergency Medicine, Jacobi Medical Center
Department of Epidemiology and Population Health, Albert Einstein College of Medicine

The Provider Dilemma
A 26-year-old pregnant female presents after twisting her ankle. She has no abdominal or urinary complaints. The nurse sends a UA and uricult dipslide before you see the patient. What should you do with the results of these tests?

The Provider Dilemma
Should a provider give antibiotics if either one or both of these tests come back positive?

Why Order a Diagnostic Test?
- The diagnosis is uncertain
- An incorrect diagnosis would lead to clinically significant morbidity or mortality
- The test result would change management
- The test is cost effective

Clinician Thought Process
The clinician derives the patient's prior probability of disease from the history and physical, the literature, and experience. This "index of suspicion" may be expressed as 0%-100% or simply as "low, medium, high."

Threshold Approach to Diagnostic Testing
(Diagram: probability of disease from 0% to 100%, with a testing zone bounded by the thresholds P(-) and P(+))
- P < P(-): diagnostic testing and therapy not indicated
- P(-) < P < P(+): diagnostic testing needed prior to therapy
- P > P(+): only intervention needed
Pauker & Kassirer, 1980; Gallagher, 1998

Threshold Approach to Diagnostic Testing
The width of the testing zone depends on:
- Test properties
- Risk of excess morbidity/mortality attributable to the test
- Risk/benefit ratio of the available therapies for the diagnosis
Pauker & Kassirer, 1980; Gallagher, 1998

Test Characteristics
- Reliability: inter-observer, intra-observer, correlation, Bland & Altman plot, simple agreement, kappa statistics
- Validity: sensitivity, specificity, NPV, PPV, ROC curves

Reliability
The extent to which results obtained with a test are reproducible.
(Figure: "not reliable" versus "reliable" measurements)

- Intra-rater reliability: the extent to which a measure produces the same result at different times for the same subjects
- Inter-rater reliability: the extent to which a measure produces the same result on each subject regardless of who makes the observation

Correlation (r)
- For continuous data
- r = 1: perfect correlation; r = 0: none
(Scatter plot: observer 1 (O1) against observer 2 (O2) with the line O1 = O2)
Bland & Altman, 1986

Correlation (r)
- Measures the strength of the relation, not agreement
- Problem: even near-perfect correlation may hide significant differences between observations
(Scatter plot: O1 against O2 with r = 0.8, compared with the line O1 = O2)
Bland & Altman, 1986

Bland & Altman Plot
- For continuous data
- Plot of the observation differences (O1 - O2) against their means ([O1 + O2]/2)
- Data that are evenly distributed around 0 and fall within 2 standard deviations exhibit good agreement
Bland & Altman, 1986

Simple Agreement
                     Rater 1
                  +        -        total
Rater 2    +      a        b        a+b
           -      c        d        c+d
total             a+c      b+d      N
- The extent to which two or more raters agree on the classifications of all subjects
- The percentage of concordance in the 2 x 2 table: (a + d)/N
- Not ideal: subjects may fall on the diagonal by chance

Kappa
- The proportion of the best possible improvement in agreement beyond chance obtained by the observers
- K = (pa - p0)/(1 - p0) (a worked sketch follows below)
- pa = (a + d)/N (observed proportion of subjects along the main diagonal)
- p0 = [(a + b)(a + c) + (c + d)(b + d)]/N^2 (proportion of agreement expected by chance)

Interpreting Kappa Values
- K = 1: perfect
- K > 0.80: excellent
- 0.60 < K < 0.80: good
- 0.40 < K < 0.60: fair
- 0 < K < 0.40: poor
- K = 0: chance agreement (pa = p0)
- K < 0: less than chance

Weighted Kappa
(C x C table of counts n_ij for Rater 2 category i against Rater 1 category j, with marginal totals and grand total N)
- Used when there are more than two observers or categories
- Perfect agreement on the main diagonal is weighted more than partial agreement off of it
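To make the kappa arithmetic concrete, here is a minimal Python sketch. It is not part of the original talk, and the example counts a, b, c, d are hypothetical; they follow the 2 x 2 rater layout from the Simple Agreement slide.

```python
# Minimal sketch (not from the original talk): simple agreement and Cohen's kappa
# for two raters classifying N subjects as + or -.
# Layout from the slides: a = both raters +, b = Rater 2 + / Rater 1 -,
# c = Rater 2 - / Rater 1 +, d = both raters -.

def simple_agreement(a: int, b: int, c: int, d: int) -> float:
    """Proportion of subjects on the main diagonal: (a + d) / N."""
    n = a + b + c + d
    return (a + d) / n

def cohens_kappa(a: int, b: int, c: int, d: int) -> float:
    """Kappa = (pa - p0) / (1 - p0), where p0 is the agreement expected by chance."""
    n = a + b + c + d
    pa = (a + d) / n
    p0 = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (pa - p0) / (1 - p0)

if __name__ == "__main__":
    a, b, c, d = 40, 10, 5, 45   # hypothetical counts for illustration only
    print(f"simple agreement = {simple_agreement(a, b, c, d):.2f}")   # 0.85
    print(f"kappa            = {cohens_kappa(a, b, c, d):.2f}")       # about 0.68
```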
Validity
The degree to which a test correctly diagnoses people as having or not having a condition.
- Internal validity
- External validity
(Figure: "valid but not reliable" versus "reliable and valid")

Internal Validity: Performance Characteristics
- Sensitivity
- Specificity
- NPV
- PPV
- ROC curves

2 x 2 Table
                      Disease Status
                      cases      noncases    total
Test Result   +       TP         FP          positives
              -       FN         TN          negatives
total                 cases      noncases    N
TP = true positives, FP = false positives, TN = true negatives, FN = false negatives

Gold Standard
- The definitive test used to identify cases
- Example: traditional agar culture
- The dipstick and dipslide are measured against the gold standard

Sensitivity (SN)
- Probability of correctly identifying a true case
- SN = TP/(TP + FN) = TP/cases
- High SN: a negative test result rules out the diagnosis (SnNout)
Sackett & Straus, 1998

Specificity (SP)
- Probability of correctly identifying a true noncase
- SP = TN/(TN + FP) = TN/noncases
- High SP: a positive test result rules in the diagnosis (SpPin)
Sackett & Straus, 1998

Problems with Sensitivity and Specificity
- They remain constant over patient populations
- But SN and SP convey how likely a test result is to be positive or negative given that the patient does or does not have the disease
- This is a paradoxical inversion of clinical logic: prior knowledge of disease status would obviate the need for the diagnostic test
Gallagher, 1998

Positive Predictive Value (PPV)
- Probability that a patient labeled (+) is a true case
- PPV = TP/(TP + FP) = TP/total positives
- High SP corresponds to very high PPV (SpPin)
Sackett & Straus, 1998

Negative Predictive Value (NPV)
- Probability that a patient labeled (-) is a true noncase
- NPV = TN/(TN + FN) = TN/total negatives
- High SN corresponds to very high NPV (SnNout)
Sackett & Straus, 1998

Predictive Value Problems
- Vulnerable to shifts in disease prevalence (P)
- Do not remain constant over patient populations
- As prevalence rises, PPV rises and NPV falls; as prevalence falls, PPV falls and NPV rises
Gallagher, 1998

Flipping a Coin to Dx AMI for People with Chest Pain
ED, AMI prevalence 6%
              AMI    No AMI    total
Heads (+)     3      47        50
Tails (-)     3      47        50
total         6      94        100
SN = 3/6 = 50%; SP = 47/94 = 50%; PPV = 3/50 = 6%; NPV = 47/50 = 94%
Worster, 2002

Flipping a Coin to Dx AMI for People with Chest Pain
CCU, AMI prevalence 90%
              AMI    No AMI    total
Heads (+)     45     5         50
Tails (-)     45     5         50
total         90     10        100
SN = 45/90 = 50%; SP = 5/10 = 50%; PPV = 45/50 = 90%; NPV = 5/50 = 10% (these numbers are rechecked in the sketch below)
Worster, 2002

Receiver Operating Characteristic (ROC) Curve
(Plot: sensitivity (TPR), 0.0 to 1.0, against 1 - specificity (FPR), 0.0 to 1.0)
- Allows consideration of test performance across a range of threshold values
- Well suited for diagnostic tests based on a continuous variable

Receiver Operating Characteristic (ROC) Curve
- Avoids the "single cutoff trap"
(Figure: WBC count distribution in sepsis, labeled "No Effect" and "Effect", illustrating the single-cutoff trap)
Gallagher, 1998

Area Under the Curve (θ)
- A measure of test accuracy
- θ = 0.5 to 0.7: no to low discriminatory power
- θ = 0.7 to 0.9: moderate discriminatory power
- θ > 0.9: high discriminatory power
Grzybowski, 1997

Problem with ROC Curves
- Same problems as SN and SP ("reverse logic")
- Mainly used to describe diagnostic test performance

Appendicitis Example
- Study design: prospective cohort
- Gold standard: pathology report from appendectomy, or CT finding for negatives
- Diagnostic test: total WBC count
(Flow diagram: Physical Exam + -> OR -> + Appy; CT Scan - -> No Appy)
Cardall, 2004
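Before the appendicitis results, here is a minimal Python sketch (not part of the original talk) that computes SN, SP, PPV, and NPV from 2 x 2 counts and applies them to the coin-flip AMI example (Worster, 2002). It shows SN and SP unchanged at 50% in both settings while PPV and NPV track prevalence; the helper name `performance` is an illustrative choice, and the same arithmetic applies to the WBC table that follows.

```python
# Minimal sketch (not from the original talk): SN, SP, PPV, NPV from a 2 x 2 table,
# applied to the coin-flip AMI example to show that SN and SP stay fixed while
# PPV and NPV shift with disease prevalence.

def performance(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from 2 x 2 counts."""
    return {
        "SN": tp / (tp + fn),    # TP / cases
        "SP": tn / (tn + fp),    # TN / noncases
        "PPV": tp / (tp + fp),   # TP / total positives
        "NPV": tn / (tn + fn),   # TN / total negatives
    }

if __name__ == "__main__":
    ed = performance(tp=3, fp=47, fn=3, tn=47)     # ED, AMI prevalence 6%
    ccu = performance(tp=45, fp=5, fn=45, tn=5)    # CCU, AMI prevalence 90%
    for label, stats in [("ED (6%)", ed), ("CCU (90%)", ccu)]:
        print(label, {k: f"{v:.0%}" for k, v in stats.items()})
    # ED (6%):   SN 50%, SP 50%, PPV 6%,  NPV 94%
    # CCU (90%): SN 50%, SP 50%, PPV 90%, NPV 10%
```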
Appendicitis Example
WBC          Appy    Not Appy    Total
> 10,000     66      89          155
< 10,000     21      98          119
Total        87      187         274
SN 76% (65%-84%); SP 52% (45%-60%); PPV 42% (35%-51%); NPV 82% (74%-89%) (recomputed in the sketch at the end of this section)
Cardall, 2004

Appendicitis Example
- Patient WBC: 13,000
- Management: get CT with PO and IV contrast
(Flow diagram: Physical Exam + -> OR -> + Appy; CT Scan - -> No Appy)
Cardall, 2004

Abdominal CT Follow-Up
- CT result: acute appendicitis
- Patient taken to the OR for appendectomy
- But was the WBC necessary? The answer is given in the talk on likelihood ratios.
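For completeness, a short sketch (again not part of the original talk) recomputes the reported test characteristics from the WBC 2 x 2 counts above (Cardall, 2004), treating WBC > 10,000 as a positive test:

```python
# Minimal sketch (not from the original talk): test characteristics of
# WBC > 10,000 for appendicitis, from the 2 x 2 counts reported by Cardall, 2004.
tp, fp, fn, tn = 66, 89, 21, 98

sn = tp / (tp + fn)    # 66 / 87
sp = tn / (tn + fp)    # 98 / 187
ppv = tp / (tp + fp)   # 66 / 155
npv = tn / (tn + fn)   # 98 / 119

print(f"SN {sn:.1%}, SP {sp:.1%}, PPV {ppv:.1%}, NPV {npv:.1%}")
# SN 75.9%, SP 52.4%, PPV 42.6%, NPV 82.4%
```

These values match the slide's reported figures (76%, 52%, 42%, 82%) to within rounding.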