Vol. 6, No. 10, October 2010
“Can You Handle the Truth?”
Making Sense of Biostatistics: Type I and Type II Errors
By Ronald E Dechert
The concept of error is commonplace in everyday life. In statistics, there are several kinds
of errors. There is the error of using the wrong statistical measures. There is the error of
calculating the statistics incorrectly. There is the error of interpreting the results incorrectly.
These errors are all mistakes that can be avoided, but there are two kinds of statistical
errors that cannot be avoided with sample sizes less than 100%: Type I and Type II errors.
We commit a Type I error by rejecting the null hypothesis when it is, in fact, true.
Conversely, we commit a Type II error by failing to reject the null hypothesis when the
alternative is, in fact, true. The troublesome thing about Type I and Type II errors is that,
for a given sample size and effect size, reducing the probability of one type of error
increases the probability of the other. As a result, we have to decide which type of error we
care about more.
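This trade-off can be seen directly with two overlapping normal distributions. The short Python sketch below is not from the article; the N(0, 1) and N(2, 1) populations are purely illustrative. It uses only the standard library to show that raising the rejection cutoff lowers the Type I error while raising the Type II error:

```python
from statistics import NormalDist

# Hypothetical one-sided test: under the null hypothesis a measurement
# follows N(0, 1); under the alternative it follows N(2, 1).
# We reject the null whenever the measurement exceeds a cutoff c.
h0 = NormalDist(mu=0, sigma=1)
h1 = NormalDist(mu=2, sigma=1)

for c in (1.0, 1.5, 2.0):
    alpha = 1 - h0.cdf(c)  # Type I: reject the null although it is true
    beta = h1.cdf(c)       # Type II: fail to reject although the alternative is true
    print(f"cutoff {c:.1f}: alpha = {alpha:.3f}, beta = {beta:.3f}")
```

As the cutoff moves from 1.0 to 2.0, alpha falls (0.159 to 0.023) while beta climbs (0.159 to 0.500), which is exactly the dilemma described above: at a fixed sample size, shrinking one error inflates the other.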
Typically, diagnostic tests are designed with a Type I error of less than 5% and a Type II
error of less than 20%, the latter limit keeping down the chance of declaring patients
free of disease when they are not. However, these percentages can vary depending on the
consequences. For example, if you have just been bitten by a rabid squirrel, you probably
want treatment. On the other hand, if you have just been diagnosed with depression, you
may want to wait before starting treatment with psychoactive drugs.
Type I errors are also called “alpha errors” or “false positives” because the primary (null)
hypothesis is rejected when, in fact, it is true. Type II errors are also called “beta errors” or
“false negatives” because the primary hypothesis is accepted (and alternative hypothesis
rejected) when it is, in fact, false. It may seem odd that incorrectly rejecting the primary
hypothesis is a false positive and not a false negative, but in most clinical trials, the primary
hypothesis is the “null hypothesis,” which, for statistical reasons, takes the form, “The result
we are hoping for is not true.” We are thus dealing with a double negative.1
A hypothetical example will illustrate these concepts. Suppose Dr. Susan Mack has
developed a new blood test for a protein (p490) that she believes can be used to identify
patients who are predisposed to dementia. Dr. Mack has observed that patients with
dementia have a higher level of p490 in their blood when compared to normal patients. Dr.
Mack came to this conclusion by comparing the p490 levels in 100 subjects with dementia
and 100 normal subjects without a diagnosis of dementia. She observed that the mean
p490 level in the group with dementia was 350, with a standard deviation of 50. In contrast,
the p490 levels in the normal population were significantly lower, with a mean level of 200
and a standard deviation of 35.
Dr. Mack wants to select a p490 threshold that will be used to diagnose a patient as having
dementia. If Dr. Mack selects a p490 threshold that is too low, the test will generate too
many Type I errors (false positives) and diagnose patients as having dementia when, in
fact, they do not. If Dr. Mack selects a p490 threshold that is too high, the p490 test will
generate Type II errors (false negatives) and diagnose patients as not having dementia
when, in fact, they do. Since the diagnosis of dementia has significant emotional and
physical consequences, Dr. Mack wants the probability of falsely diagnosing dementia (Type
I error) to be as low as possible and certainly less than 5%. To determine the threshold to
be used for detecting dementia with the p490 test, Dr. Mack examines the descriptive
statistics for both populations.
Subscribe free at www.firstclinical.com
© 2010 First Clinical Research and the Authors
If Dr. Mack selects a threshold of 280, what would be the probability of committing a Type I
error? To solve for this, Dr. Mack subtracts the threshold value from the mean of the normal
population and divides by the standard deviation to obtain a Z score:
Z = (200 - 280)/35 = -2.29
By looking up this value in a probability table (one-sided), Dr. Mack determines that the
probability of committing a Type I error when selecting a threshold of 280 is 0.011. In
other words, if Dr. Mack observes a level of p490 in a patient that is greater than or equal
to 280, she would conclude that her patient has dementia with a certainty of 98.9%,
rejecting the null hypothesis (her patient does not have dementia) with a 1.1% probability
of being wrong.
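The same figure can be reproduced without a printed table. A minimal Python sketch, using only the standard library and the population parameters given in the article, computes the upper-tail probability for the normal population directly:

```python
from statistics import NormalDist

# Normal (non-dementia) population from the article: mean 200, SD 35.
# The test calls a patient positive when p490 >= 280.
normal = NormalDist(mu=200, sigma=35)
threshold = 280

type_i = 1 - normal.cdf(threshold)     # P(p490 >= 280 | no dementia)
print(f"Type I error = {type_i:.3f}")  # about 0.011
```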
Dr. Mack also wants to know the probability of committing a Type II error (false negative).
She calculates this probability the same way, but instead of using the mean and standard
deviation of the normal population, she uses the mean and standard deviation for the
patients who have dementia:
Z= (350-280)/50 = 1.40
Using a probability table again, Dr. Mack determines that the probability of a Z score
greater than 1.40 is 0.081. In other words, if Dr. Mack concludes that her patient (with a
p490 level less than 280) does not have dementia, she has a 91.9% probability of being
right and an 8.1% probability of being wrong.
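The Type II probability follows the same pattern, this time using the dementia population's parameters (again a standard-library sketch, mirroring the article's numbers):

```python
from statistics import NormalDist

# Dementia population from the article: mean 350, SD 50.  A Type II
# error occurs when a dementia patient's p490 falls below the 280
# threshold and the test wrongly calls the patient negative.
dementia = NormalDist(mu=350, sigma=50)
threshold = 280

type_ii = dementia.cdf(threshold)        # P(p490 < 280 | dementia)
print(f"Type II error = {type_ii:.3f}")  # about 0.081
```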
Based on these probabilities of Type I and Type II errors, Dr. Mack concludes that a
threshold p490 value of 280 gives the best results for her patients. However, another
physician may choose a value that shifts the probabilities.
The concept of Type I and II errors is also applicable to clinical trials for drugs, devices and
other treatments. These clinical trials are typically powered to yield a Type I error
probability of 5% and a Type II error probability of 10% because study sponsors and
regulators would rather have a given trial reject a treatment that is effective than accept a
treatment that is not. This preference is not as biased as it sounds because treatments are
tested in multiple trials. Still, a false negative in any single trial can mean patients never
see a treatment that is, in fact, effective.
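Powering a trial to those error rates is what fixes its sample size. As an illustration only (the 5%/10% error rates come from the discussion above, but the common standard deviation of 40 and the clinically meaningful difference of 20 are hypothetical), the usual normal-approximation formula for a two-arm comparison of means can be sketched as:

```python
from statistics import NormalDist

alpha, beta = 0.05, 0.10  # Type I error 5%, Type II error 10% (90% power)
sd, delta = 40.0, 20.0    # hypothetical common SD and detectable difference

z = NormalDist()                    # standard normal
z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
z_beta = z.inv_cdf(1 - beta)        # about 1.28

# Per-arm sample size: n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
n_per_arm = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
print(f"about {n_per_arm:.0f} subjects per arm")  # about 84
```

Tightening either error rate pushes the required sample size up, which is the practical cost of the choices discussed above.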
Reference
1. “Making Sense of Biostatistics: The Research Hypothesis and the Null Hypothesis,” J.
Rick Turner, Journal of Clinical Research Best Practices, October 2008
Author
Ronald E. Dechert, DPH, is Associate Director of the Mott Respiratory Care at the University
of Michigan Medical Center. Contact him at 734.936.5237 or [email protected].