Evaluation of Diagnostic Tests & ROC Curve Analysis
Özgür Tosun, PhD

Why does a physician need biostatistics?

TODAY'S EXAMPLE: Understanding the "Statistics"
A 50-year-old woman with no symptoms participates in routine mammography screening. She tests positive, is alarmed, and wants to know from you whether she has breast cancer for certain, or what the chances are. Apart from the screening result, you know nothing else about this woman. How many women who test positive actually have breast cancer?

Additional information:
- The probability that a woman has breast cancer is 1% ("prevalence").
- If a woman has breast cancer, the probability that she tests positive is 90% ("sensitivity").
- If a woman does not have breast cancer, the probability that she nevertheless tests positive is 9% ("false positive rate").

Your answer?
a) nine in 10 (90%)
b) eight in 10 (80%)
c) one in 10 (10%)
d) one in 100 (1%)

ATTENTION!! The fact that 90% of women with breast cancer get a positive result from a mammogram (sensitivity) does not mean that 90% of women with positive results have breast cancer.

REALITY (per 1,000 women screened; prevalence 1%, sensitivity 90%, false positive rate 9%):

                 Cancer   Healthy   Total
Test positive         9        89      98
Test negative         1       901     902
Total                10       990    1000

Answer: the total number of positive test results among 1,000 women is 98, but only 9 of them actually have cancer. How many women who test positive actually have breast cancer? 9/98 ≈ one in 10 (10%). The high false positive rate, combined with the disease's prevalence of 1%, means that roughly nine out of 10 women with a worrying mammogram do not actually have breast cancer.

What do doctors do with this question? In one trial, almost half of a group of 160 gynecologists responded that the woman's chance of having cancer was nine in 10 (90%). Only 21% said that the figure was one in 10 (10%), which is the correct answer. That is a worse result than if the doctors had been answering at random (25%).

What happens when the doctor does not explain the right probabilities to the patient?
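The arithmetic behind the 2×2 table above can be checked with a short Bayes-rule sketch (a minimal illustration; the numbers are the slide's: prevalence 1%, sensitivity 90%, false positive rate 9%):

```python
# Bayes' theorem for the positive predictive value (PPV):
# P(D+ | T+) = sens * prev / (sens * prev + fpr * (1 - prev))
prevalence = 0.01    # P(D+)
sensitivity = 0.90   # P(T+ | D+)
fpr = 0.09           # P(T+ | D-)

ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + fpr * (1 - prevalence)
)
print(round(ppv, 3))  # 0.092 — roughly one in 10
```

This is the same 9/98 computed from the table, just written as conditional probabilities instead of counts.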
Few specialists understand the risk that a woman with a positive mammogram result is actually facing. We can only imagine how much anxiety such innumerate doctors cause in women. This may even lead to unnecessary cancer treatment of healthy women. Research suggests that months after a mammogram false alarm, up to a quarter of women are still affected by the experience on a daily basis.

EVALUATION OF DIAGNOSTIC TESTS

The "Gold Standard"
What is a gold standard? Biopsy results, pathological evaluation, radiological procedures, prolonged follow-up, autopsies. The gold standard is almost always more costly, more invasive, and less feasible. For some conditions there is no objective standard of disease (e.g. angina pectoris, where the gold standard is careful history taking).

Diagnostic Characteristics
This is not hypothesis testing, but rather:
- How well does the test identify patients with a disease?
- How well does the test identify patients without a disease?

Evaluation of the Diagnostic Test
Give a group of people (with and without the disease) both tests (the candidate test and the "gold standard" test), then cross-classify the results and report the diagnostic characteristics of the candidate test.
                      Truth (Gold Standard)
                        +          -
Candidate Test   +    a (TP)     b (FP)
                 -    c (FN)     d (TN)

A perfect test would have b and c equal to 0.

Diagnostic Characteristics
Sensitivity: the probability that a diseased individual will be identified as "diseased" by the test = P(T+ | D+) = a/(a+c)
Specificity: the probability that an individual without the disease will be identified as "healthy" by the test = P(T- | D-) = d/(b+d)
False positive rate: given a subject without the disease, the probability that he will have a positive test result = P(T+ | D-) = b/(b+d) = 1 - Specificity
False negative rate: given a subject with the disease, the probability that he will have a negative test result = P(T- | D+) = c/(a+c) = 1 - Sensitivity

Predictive Values of Diagnostic Tests
Predictive values are more informative from the patient's or physician's perspective; they are special applications of Bayes' theorem.
Positive predictive value: the probability that an individual with a positive test result has the disease = P(D+ | T+) = a/(a+b)
Negative predictive value: the probability that an individual with a negative test result does not have the disease = P(D- | T-) = d/(c+d)

A LAST SIMPLE EXAMPLE TO SUM IT UP
Suppose we have a test statistic for predicting the presence or absence of disease. Cross-classifying the test result against the true disease status gives four cells: TP (true positive), FP (false positive), FN (false negative), and TN (true negative).
                   True Disease Status
                     Pos      Neg      Total
Test         Pos     TP       FP
Criterion    Neg     FN       TN
Total                P        N        P+N

Accuracy = probability that the test yields a correct result = (TP+TN) / (P+N)
Sensitivity = probability that a true case will test positive = TP / P. Also referred to as the true positive rate (TPR) or true positive fraction (TPF).
Specificity = probability that a true negative will test negative = TN / N. Also referred to as the true negative rate (TNR) or true negative fraction (TNF).
False negative rate = probability that a true positive will test negative = FN / P = 1 - Sensitivity. Also referred to as the false negative fraction (FNF).
False positive rate = probability that a true negative will test positive = FP / N = 1 - Specificity. Also referred to as the false positive fraction (FPF).
Positive predictive value (PPV) = probability that a positive test will truly have disease = TP / (TP+FP)
Negative predictive value (NPV) = probability that a negative test will truly be disease free = TN / (TN+FN)

Worked example:

                   True Disease Status
                     Pos      Neg      Total
Test         Pos      27      173        200
Criterion    Neg      73      727        800
Total                100      900       1000

Se  = 27/100 = 0.27           Acc = (27+727)/1000 = 0.75
Sp  = 727/900 ≈ 0.81          PPV = 27/200 ≈ 0.14
FPR = 1 - Sp ≈ 0.19           NPV = 727/800 ≈ 0.91
FNR = 1 - Se = 0.73

ROC CURVE
Introduction to ROC Curves
ROC = Receiver Operating Characteristic. The ROC curve was first developed by electrical and radar engineers during World War II for detecting enemy objects in battlefields, and was soon introduced to psychology to account for perceptual detection of stimuli. Following the attack on Pearl Harbor in 1941, the United States army began new research to improve the prediction of correctly detected Japanese aircraft from their radar signals.

ROC analysis was developed for the signal receivers in radars. The basic aim was to distinguish enemy signals from normal signals. It is a graphical analysis method.

Development of ROC Curves
If you decrease the threshold (cut-off), sensitivity will increase: you will be able to catch every (enemy) plane signal. However, noise in the data will also increase, so you will not be able to make progress. The ROC curve in this example includes alternative threshold (cut-off) values; note that sensitivity and specificity change simultaneously as the threshold changes. Remember, some signals come from enemy planes while others are normal.
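The worked example above can be reproduced with a small helper (a sketch; the function name `diagnostic_metrics` is ours, not from the slides):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute the standard diagnostic characteristics from a 2x2 table."""
    p, n = tp + fn, fp + tn            # column totals: true Pos, true Neg
    return {
        "sensitivity": tp / p,         # TP / P
        "specificity": tn / n,         # TN / N
        "accuracy": (tp + tn) / (p + n),
        "ppv": tp / (tp + fp),         # positive predictive value
        "npv": tn / (tn + fn),         # negative predictive value
    }

# The slide's example: TP=27, FP=173, FN=73, TN=727
m = diagnostic_metrics(tp=27, fp=173, fn=73, tn=727)
print({k: round(v, 2) for k, v in m.items()})
# {'sensitivity': 0.27, 'specificity': 0.81, 'accuracy': 0.75, 'ppv': 0.14, 'npv': 0.91}
```

Note that the test is quite specific (0.81) but very insensitive (0.27), which is why its NPV is much better than its PPV.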
ROC Analysis
ROC analysis has since been used in medicine, radiology, biometrics, and other areas for many decades. In medicine it has been used extensively in the evaluation of diagnostic tests, and ROC curves are also used widely in epidemiology, medical research, and evidence-based medicine. In radiology, ROC analysis is a common technique for evaluating new imaging techniques. It can also be used to compare tests and procedures.

ROC Curves: Use and Interpretation
The ROC methodology generalizes easily to test statistics that are continuous (such as lung function or a blood gas). The ROC curve allows us to see, in a simple visual display, how sensitivity and specificity vary as the threshold varies. The shape of the curve also gives some visual clues about the overall strength of the association between the underlying test statistic and disease status.

Example
[Figure: two overlapping distributions of the test result, one for people without the disease and one for people with the disease; a vertical threshold splits them. Patients below the threshold are called "negative", those above it "positive".]

Some definitions...
[Figure series: for a given threshold, the part of the diseased distribution above the threshold gives the true positives; the part of the healthy distribution above the threshold gives the false positives; the part of the healthy distribution below the threshold gives the true negatives; and the part of the diseased distribution below the threshold gives the false negatives. Moving the threshold to the right trades false positives for false negatives; moving it to the left does the reverse.]

[Figure: frequency histograms of a test parameter (0-300 mg/dl) for diseased and healthy groups, under the gold standard and under the alternative test; a cut-off splits each histogram into FN/TP (diseased) and TN/FP (healthy) regions.]

Sensitivity and Specificity
Sensitivity: the ability of a test to correctly diagnose the real patients. Sensitivity = TP / (TP + FN)
Specificity: the ability of a test to correctly diagnose the real healthy people. Specificity = TN / (TN + FP)

"Receiver Operating Characteristic" Curve
The ROC curve is the graphical representation of all sensitivity and specificity combinations for every possible threshold (cut-off) value. The aim is to differentiate the diseased and healthy subjects.
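The threshold sweep behind an ROC curve can be sketched in a few lines (an illustration with made-up scores, not the slide's data; `roc_points` and the trapezoidal AUC are standard constructions):

```python
def roc_points(diseased, healthy):
    """Sweep every observed value as a cut-off ('positive' = score >= cut-off)
    and return (FPR, TPR) points from (0, 0) to (1, 1)."""
    cuts = sorted(set(diseased) | set(healthy), reverse=True)
    pts = [(0.0, 0.0)]
    for c in cuts:
        tpr = sum(x >= c for x in diseased) / len(diseased)  # sensitivity
        fpr = sum(x >= c for x in healthy) / len(healthy)    # 1 - specificity
        pts.append((fpr, tpr))
    return pts

def auc_trapezoid(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical test values (e.g. mg/dl) for the two groups
diseased = [180, 210, 240, 260, 300]
healthy = [100, 130, 150, 170, 220]
pts = roc_points(diseased, healthy)
print(round(auc_trapezoid(pts), 2))  # 0.92
```

Each cut-off contributes one point; lowering the cut-off moves along the curve toward (1, 1), raising sensitivity at the cost of specificity, exactly as described above.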
[Figure: as the cut-off slides along the measured-value axis, each position yields one (sensitivity, specificity) pair; in the slide's example of 25 diseased and 25 healthy subjects, successive cut-offs give sensitivities such as 25/25 = 1.00 and 24/25 = 0.96, and specificities such as 0/25 = 0.00, 1/25 = 0.04, 3/25 = 0.12, 5/25 = 0.20, and 8/25 = 0.32.]

"Receiver Operating Characteristic" Curve
The area under the curve (AUC) shows the diagnostic performance of a test; for an informative test the AUC lies between 0.5 and 1.0. We can also use ROC curves to compare the diagnostic performances of two or more alternative tests.

[Figure: the ROC curve plots the true positive rate (sensitivity), 0-100%, against the false positive rate (1 - specificity), 0-100%.]

ROC curve comparison: a good test has a curve that bows toward the upper-left corner; a poor test has a curve that stays close to the diagonal.

ROC curve extremes: for the best possible test, the two distributions do not overlap at all and the curve passes through the upper-left corner. For the worst test, the distributions overlap completely, equivalent to tossing a coin, and the curve lies on the diagonal.

Area Under the ROC Curve (AUC)
The AUC is an overall measure of test performance. Comparisons between two tests can be based on differences between their (estimated) AUCs. For continuous data, the AUC is equivalent to the Mann-Whitney U-statistic (the nonparametric test of a difference in location between two populations).

[Figure: example ROC curves with AUC = 100%, 90%, 65%, and 50%.]

Interpretation of the AUC
The AUC can be interpreted as the probability that the test result from a randomly chosen diseased individual
is more indicative of disease than the result from a randomly chosen healthy individual. Beyond this, the AUC has no direct clinically relevant meaning.
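That probability interpretation can be checked directly: the fraction of (diseased, healthy) pairs in which the diseased score is higher, counting ties as one half, is exactly the Mann-Whitney formulation of the AUC. A minimal sketch with made-up scores:

```python
def auc_probability(diseased, healthy):
    """AUC as P(random diseased score > random healthy score),
    with ties counted as 1/2 (the Mann-Whitney formulation)."""
    wins = sum((d > h) + 0.5 * (d == h)
               for d in diseased for h in healthy)
    return wins / (len(diseased) * len(healthy))

# Hypothetical scores for the two groups
print(auc_probability([180, 210, 240, 260, 300],
                      [100, 130, 150, 170, 220]))  # 0.92
```

An AUC of 0.5 corresponds to a coin toss (half the pairs ranked correctly), and an AUC of 1.0 to perfectly separated distributions, matching the ROC curve extremes described above.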