Post Mortem on a Real Data Set:
1. An Example of an Unbalanced 1-Way ANOVA
2. Use of Bonferroni's Method for Multiple Comparisons

This data set is part of a study by a medical researcher to assess a new methodology for detecting cancerous cells in the tube from the cervix that leads to the uterus. The medical researcher selected specimens from tissues that had been removed because they were cancerous. These cells are of two grades of cancer: (1) low grade adenocarcinoma in situ (level 1 in the variable celltype in the data set) and (2) high grade adenocarcinoma (level 2). The goal is to compare them with normal cells. The normal cells (level 0 in the analysis below) were obtained from samples of hysterectomies that had been performed for reasons unrelated to any cancers.

This may seem OK on the face of it, but in fact the cells in the female reproductive tract undergo age related changes, so one cannot be sure that any differences found are due to cancer; they may simply be age related changes. The goal here is to examine any age differences that may exist between the three groups.

We should note that this is an example of a bad use of hypothesis tests: we wish to find evidence that the ages do not differ "significantly", meaning that we want to find evidence in favor of the null hypothesis. One of the groups (the normals) has a small sample size of 13, so there is not much power for detecting departures from the null hypothesis, i.e. there is a high probability of type II error.

First Analysis: One Way ANOVA:
Are there differences on average between the normals and the other groups? To assess this question, we performed a 1-way ANOVA.
Here is the basic output from Minitab:

    Worksheet size: 5000 cells

    One-way Analysis of Variance

    Analysis of Variance for age
    Source     DF      SS      MS      F      P
    celltype    2     753     377   2.91  0.061
    Error      67    8675     129
    Total      69    9428

    Level    N    Mean   StDev
    0       13   46.08   15.28
    1       38   37.84    8.99
    2       19   42.58   12.66

    Pooled StDev = 11.38

    [Character plot: Individual 95% CIs For Mean Based on Pooled StDev; axis ticks at 35.0, 40.0, 45.0, 50.0]

    Tukey's pairwise comparisons
    Family error rate = 0.0500
    Individual error rate = 0.0193
    Critical value = 3.39

    Intervals for (column level mean) - (row level mean)

                     0                   1
    1    ( -0.53, 17.00)
    2    ( -6.32, 13.32)    (-12.40,  2.93)

The P-value for the analysis of variance is 0.061. By the “usual” 0.05 level of significance, we can say there are no significant differences in the mean levels of the groups, but it is hardly comforting. The normal probability plot of the residuals does not suggest any problem with the normality assumption. However, boxplots of age by celltype definitely suggest that the assumption of equal variances is not satisfied. I selected “Basic Statistics” and then “Display Descriptive Statistics” for age by celltype. The results are shown below:

    Descriptive Statistics

    Variable  celltype    N    Mean  Median  TrMean  StDev  SE Mean
    age       0          13   46.08   46.00   46.18  15.28     4.24
              1          38   37.84   36.00   37.12   8.99     1.46
              2          19   42.58   39.00   41.47  12.66     2.90

    Variable  celltype  Minimum  Maximum     Q1     Q3
    age       0           23.00    68.00  30.50  58.00
              1           25.00    62.00  31.75  40.50
              2           26.00    78.00  34.00  50.00

It seems that the normal group has the largest sample standard deviation (at 15.28) and is by any measure of central tendency (mean, median, or trimmed mean) the oldest. It is also the smallest group at 13 (the others have 38 and 19). We concluded that there are potentially problems with the ANOVA assumptions.
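The ANOVA quantities above can be recomputed from the per-group N, Mean and StDev alone. The following is a minimal Python sketch and was not part of the original Minitab session; the function name anova_from_summary is made up for illustration. Because the inputs are the rounded summary values, the sums of squares come out slightly different from Minitab's (about 754 and 8677 rather than 753 and 8675), while F and the pooled StDev agree to the printed precision.

```python
import math

# (N, mean, StDev) of age for each celltype, copied from the Minitab output.
groups = {0: (13, 46.08, 15.28), 1: (38, 37.84, 8.99), 2: (19, 42.58, 12.66)}


def anova_from_summary(groups):
    """One-way ANOVA quantities from per-group (n, mean, sd) summaries."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_within = ss_within / df_within
    sds = [s for _, _, s in groups.values()]
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_stat": (ss_between / df_between) / ms_within,
        "pooled_sd": math.sqrt(ms_within),
        # Rough check on the equal-variance assumption.
        "sd_ratio": max(sds) / min(sds),
    }


result = anova_from_summary(groups)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The sd_ratio entry is only an informal indicator: the largest sample standard deviation is about 1.7 times the smallest, and it belongs to the smallest group.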
Note also that Tukey's method of multiple comparisons is definitely dubious in this case as the sample sizes within groups are not nearly equal.

Comment: How should the study have been done?
Ideally, for each of the cancer patients we would have found a normal patient who (nearly) matched in important characteristics such as age, race, smoking, SES (socioeconomic status), etc. Then we could do paired sample t-tests to detect differences in the variables of interest (which haven’t been described here), and have some assurance that the differences we found were due to the cancer and not to so-called confounding factors.

Bonferroni's Method of Multiple Comparisons:
Now we perform a similar analysis to the one above but using Bonferroni's method instead of ANOVA + Tukey's method. Here is a quick summary of Bonferroni's method, which applies to any multiple comparisons problem:
1. For simultaneous 1 - α confidence intervals for k parameters, construct individual 1 - α/k confidence intervals for each parameter separately.
2. For testing k sets of null hypotheses with a Family Wise Error Rate (FWER) of α, perform the individual hypothesis tests at level α/k.
The bottom line is to divide the error probability by the number of confidence intervals or tests.

One issue with Bonferroni's method is that it is not as powerful as a specially designed method, meaning that it has higher type II error probabilities for tests and wider confidence intervals. For instance, in an ANOVA setting, the ANOVA test is more likely to detect a difference (reject the null hypothesis of no difference), provided of course that the ANOVA assumptions are met. The beauty of Bonferroni's method is that it applies to any setting. Recall that the two sample t-test is reasonably robust to departures from normality (so is ANOVA) and doesn't require the assumption of equal variances (which is a bit of a problem for ANOVA).
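The arithmetic behind Bonferroni's method is short enough to write out. The Python sketch below is illustrative only; the function names bonferroni_levels and simultaneous_lower_bound are hypothetical helpers, not part of any statistics package. It assumes three groups compared pairwise and a family error rate of 0.05.

```python
from math import comb


def bonferroni_levels(alpha, k):
    """Return (per-test significance level, per-interval confidence level)."""
    per_test = alpha / k
    return per_test, 1 - per_test


def simultaneous_lower_bound(individual_conf, k):
    """Bonferroni lower bound on the joint confidence of k intervals."""
    return 1 - k * (1 - individual_conf)


k = comb(3, 2)  # number of pairwise comparisons among 3 groups
per_test, per_interval = bonferroni_levels(0.05, k)
print(k, round(per_test, 4), round(per_interval, 4))  # 3 0.0167 0.9833

# Three intervals built at 98% each are jointly valid with confidence >= 94%.
print(round(simultaneous_lower_bound(0.98, k), 2))  # 0.94
```

The second function runs the same bound in the other direction: given the individual confidence level actually used, it reports the simultaneous level that Bonferroni's inequality guarantees.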
So, we reanalyzed the above data by performing 3 pairwise two sample t-tests, each at the 0.05/3 = 0.0167 level of significance since there are 3 pairwise comparisons (1 vs. 2, 1 vs. 0, and 2 vs. 0). To perform the tests at the 0.0167 level, we simply reject a null hypothesis if its P-value is below 0.0167. I also tried to get 1 - 0.0167 = 98.33% confidence intervals (so that the simultaneous level of confidence is 95%), but Minitab appears to have rounded off to just 98%, so I have only a 94% simultaneous level of confidence (1 - 3 × 0.02 = 0.94). The results are shown below.

    Two Sample T-Test and Confidence Interval

    Two sample T for a2
    ct2    N    Mean   StDev   SE Mean
    1     38   37.84    8.99       1.5
    2     19    42.6    12.7       2.9

    98% CI for mu (1) - mu (2): ( -13.0, 3.6)
    T-Test mu (1) = mu (2) (vs not =): T = -1.46  P = 0.16  DF = 27

    Two Sample T-Test and Confidence Interval

    Two sample T for a3
    ct3    N    Mean   StDev   SE Mean
    0     13    46.1    15.3       4.2
    1     38   37.84    8.99       1.5

    98% CI for mu (0) - mu (1): ( -3.9, 20.4)
    T-Test mu (0) = mu (1) (vs not =): T = 1.84  P = 0.087  DF = 14

    Two Sample T-Test and Confidence Interval

    Two sample T for a4
    ct4    N    Mean   StDev   SE Mean
    0     13    46.1    15.3       4.2
    2     19    42.6    12.7       2.9

    98% CI for mu (0) - mu (2): ( -9.8, 16.8)
    T-Test mu (0) = mu (2) (vs not =): T = 0.68  P = 0.50  DF = 22

The smallest P-value was 0.087 (for testing 0 (normal) vs. 1 (adenocarcinoma in situ)). As this is not less than 0.0167, we cannot reject the null hypothesis of no difference.
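The T and DF values in the three t-test outputs can be checked from the summary statistics using the unpooled (unequal variance) two sample t statistic and the Welch-Satterthwaite degrees of freedom. The sketch below is a Python illustration under stated assumptions: the helper name welch is made up, the P-values are copied from the Minitab output rather than recomputed, and rounding the degrees of freedom down is an assumption that happens to reproduce the DF values printed above.

```python
import math

# (N, mean, StDev) of age per celltype, from the descriptive statistics.
summary = {0: (13, 46.08, 15.28), 1: (38, 37.84, 8.99), 2: (19, 42.58, 12.66)}
# P-values copied from the Minitab t-test output (not recomputed here).
minitab_p = {(1, 2): 0.16, (0, 1): 0.087, (0, 2): 0.50}


def welch(a, b):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    (n1, m1, s1), (n2, m2, s2) = a, b
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


alpha_per_test = 0.05 / len(minitab_p)  # Bonferroni: 0.05 / 3
results = {}
for (i, j), p in minitab_p.items():
    t, df = welch(summary[i], summary[j])
    reject = p < alpha_per_test
    results[(i, j)] = (t, math.floor(df), reject)
    print(f"{i} vs {j}: T = {t:.2f}, DF = {math.floor(df)}, "
          f"P = {p}, reject at {alpha_per_test:.4f}: {reject}")
```

None of the three comparisons is rejected at the Bonferroni-adjusted level, which matches the conclusion drawn from the Minitab output.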