Vol. 6, No. 10, October 2010

"Can You Handle the Truth?"

Making Sense of Biostatistics: Type I and Type II Errors

By Ronald E. Dechert

The concept of error is commonplace in everyday life. In statistics, there are several kinds of errors: using the wrong statistical measures, calculating the statistics incorrectly, and interpreting the results incorrectly. These are all mistakes that can be avoided. However, two kinds of statistical errors cannot be avoided unless we sample the entire population: Type I and Type II errors.

We commit a Type I error by rejecting the null hypothesis when it is, in fact, true. Conversely, we commit a Type II error by accepting the null hypothesis (and rejecting the alternative) when the alternative is, in fact, true. The troublesome thing about Type I and Type II errors is that, for a given sample size (and power), reducing the probability of one type of error increases the probability of the other. As a result, we have to decide which type of error we care about more.

Typically, diagnostic tests are powered to yield positive results with a Type I error of less than 5% and a Type II error of less than 20%, the latter to minimize the chance of declaring patients free of disease when they are not. However, these percentages can vary depending on the consequences. For example, if you have just been bitten by a rabid squirrel, you probably want treatment. On the other hand, if you have just been diagnosed with depression, you may want to wait before starting treatment with psychoactive drugs.

Type I errors are also called "alpha errors" or "false positives" because the primary (null) hypothesis is rejected when, in fact, it is true. Type II errors are also called "beta errors" or "false negatives" because the primary hypothesis is accepted (and the alternative hypothesis rejected) when it is, in fact, false.
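The tradeoff described above can be sketched numerically. The following Python snippet is a hypothetical illustration (the two populations and all parameter values are invented for this sketch, not taken from the article): for normally distributed test values, raising the decision threshold lowers the Type I error rate while raising the Type II error rate.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal distribution with the given mean and SD."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def error_rates(threshold, null_mean, null_sd, alt_mean, alt_sd):
    """Alpha and beta for a test that calls 'positive' at or above threshold.

    Assumes the alternative (diseased) population lies above the null
    (healthy) population.
    """
    alpha = 1.0 - normal_cdf(threshold, null_mean, null_sd)  # false positive rate
    beta = normal_cdf(threshold, alt_mean, alt_sd)           # false negative rate
    return alpha, beta

# Hypothetical populations: healthy ~ N(100, 15), diseased ~ N(160, 20).
low = error_rates(120, 100, 15, 160, 20)
high = error_rates(140, 100, 15, 160, 20)
# Moving the threshold from 120 to 140 shrinks alpha but inflates beta.
```

For a fixed pair of population distributions, no threshold drives both error rates to zero; the choice only shifts probability between them, which is why the article says we must decide which error we care about more.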
It may seem odd that incorrectly rejecting the primary hypothesis is a false positive and not a false negative, but in most clinical trials, the primary hypothesis is the "null hypothesis," which, for statistical reasons, takes the form, "The result we are hoping for is not true." We are thus dealing with a double negative.[1]

A hypothetical example will illustrate these concepts. Suppose Dr. Susan Mack has developed a new blood test for a protein (p490) that she believes can be used to identify patients who are predisposed to dementia. Dr. Mack has observed that patients with dementia have a higher level of p490 in their blood than normal patients. She came to this conclusion by comparing the p490 levels in 100 subjects with dementia and 100 normal subjects without a diagnosis of dementia. The mean p490 level in the group with dementia was 350, with a standard deviation of 50. In contrast, the p490 levels in the normal population were significantly lower, with a mean of 200 and a standard deviation of 35.

Dr. Mack wants to select a p490 threshold that will be used to diagnose a patient as having dementia. If she selects a threshold that is too low, the test will generate too many Type I errors (false positives) and diagnose patients as having dementia when, in fact, they do not. If she selects a threshold that is too high, the test will generate too many Type II errors (false negatives) and diagnose patients as not having dementia when, in fact, they do. Since a diagnosis of dementia carries significant emotional and physical consequences, Dr. Mack wants the probability of falsely diagnosing dementia (Type I error) to be as low as possible, and certainly less than 5%.

To determine the threshold for detecting dementia with the p490 test, Dr. Mack examines the descriptive statistics for both populations. If Dr.
Mack selects a threshold of 280, what would be the probability of committing a Type I error? To solve for this, Dr. Mack subtracts the threshold value from the mean of the normal population and divides by the standard deviation to obtain a Z score:

Z = (200 - 280) / 35 = -2.29

By looking up this value in a one-sided probability table, Dr. Mack determines that the probability of committing a Type I error when selecting a threshold of 280 is about 0.011. In other words, if Dr. Mack observes a p490 level in a patient that is greater than or equal to 280, she would conclude that her patient has dementia with a certainty of 98.9%, rejecting the null hypothesis (her patient does not have dementia) with a 1.1% probability of being wrong.

Dr. Mack also wants to know the probability of committing a Type II error (false negative). She calculates this probability the same way, but instead of using the mean and standard deviation of the normal population, she uses the mean and standard deviation of the patients who have dementia:

Z = (350 - 280) / 50 = 1.40

Using a probability table again, Dr. Mack determines that the probability of a Z score greater than or equal to 1.40 is 0.081. In other words, if Dr. Mack concludes that a patient with a p490 level less than 280 does not have dementia, she has a 91.9% probability of being right and an 8.1% probability of being wrong.

Based on these probabilities of Type I and Type II errors, Dr. Mack concludes that a threshold p490 value of 280 gives the best results for her patients. However, another physician may choose a value that shifts the probabilities.

The concept of Type I and Type II errors is also applicable to clinical trials of drugs, devices and other treatments. These trials are typically powered to yield a Type I error probability of 5% and a Type II error probability of 10% because study sponsors prefer that a given trial accept a treatment that is not effective rather than reject a treatment that is effective.
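Dr. Mack's two table lookups can be reproduced with a few lines of Python. This is a sketch using only the standard library's error function in place of a printed one-sided normal table; the population parameters (200/35 for normal, 350/50 for dementia) and the 280 threshold come from the example above.

```python
from math import erf, sqrt

def upper_tail(z):
    """One-sided tail probability P(Z >= z) for a standard normal variable."""
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

threshold = 280

# Type I error: a normal patient (mean 200, SD 35) scores at or above 280.
z_alpha = (threshold - 200) / 35   # ~2.29
alpha = upper_tail(z_alpha)        # ~0.011

# Type II error: a dementia patient (mean 350, SD 50) scores below 280.
z_beta = (350 - threshold) / 50    # 1.40
beta = upper_tail(z_beta)          # ~0.081
```

The computed values round to 0.011 and 0.081, matching the table lookups in the example, and the Type I probability indeed falls under Dr. Mack's 5% ceiling.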
This preference is not as biased as it sounds because treatments are tested in multiple trials, so a false positive is likely to be caught in a later trial; a false negative in any single trial, however, can mean patients never see a treatment that is, in fact, effective.

Reference

1. J. Rick Turner, "Making Sense of Biostatistics: The Research Hypothesis and the Null Hypothesis," Journal of Clinical Research Best Practices, October 2008.

Author

Ronald E. Dechert, DPH, is Associate Director of the Mott Respiratory Care at the University of Michigan Medical Center. Contact him at 734.936.5237 or [email protected].

Subscribe free at www.firstclinical.com
© 2010 First Clinical Research and the Author(s)