Post Mortem on a Real Data Set:
1. An Example of an Unbalanced 1-Way ANOVA
2. Use of Bonferroni's Method for Multiple Comparisons
This data set is part of a study by a medical researcher to assess a new methodology for detecting cancerous cells in the tube from the cervix that leads to the uterus. The medical researcher selected specimens from tissues that had been removed because they were cancerous. These cells are of two grades of cancer: (1) low grade adenocarcinoma in situ (level 1 in the variable celltype in the data set) and (2) high grade adenocarcinoma (level 2). The goal is to compare them with normal cells. The normal cells (level 0 in the analysis below) were obtained from samples of hysterectomies that had been performed for reasons unrelated to any cancers. This may seem OK on the face of it, but in fact the cells in the female reproductive tract undergo age-related changes, so one cannot be sure that any differences found are due to cancer rather than simply to age-related changes.
The goal here is to examine any age differences that may exist between the three groups.
We should note that this is an example of a bad use of hypothesis tests: we wish to find evidence that the ages do not differ "significantly", meaning that we want to find evidence in favor of the null hypothesis. One of the groups (the normals) has a small sample size of 13, so there is not much power for detecting departures from the null hypothesis, i.e. there is a high probability of type II error.
First Analysis: One Way ANOVA: Are there differences on average between the
normals and other groups? To assess this question, we performed a 1-way ANOVA.
Here is the basic output from Minitab:
Worksheet size: 5000 cells

One-way Analysis of Variance

Analysis of Variance for age
Source     DF        SS        MS        F        P
celltype    2       753       377     2.91    0.061
Error      67      8675       129
Total      69      9428
                                  Individual 95% CIs For Mean
                                  Based on Pooled StDev
Level      N      Mean     StDev  --+---------+---------+---------+---
0         13     46.08     15.28              (-----------*------------)
1         38     37.84      8.99  (-------*------)
2         19     42.58     12.66         (---------*----------)
                                  --+---------+---------+---------+---
Pooled StDev =    11.38           35.0      40.0      45.0      50.0

Tukey's pairwise comparisons

    Family error rate = 0.0500
Individual error rate = 0.0193

Critical value = 3.39

Intervals for (column level mean) - (row level mean)

                0          1
    1       -0.53
            17.00

    2       -6.32     -12.40
            13.32       2.93
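As a check on the ANOVA table above, the sums of squares, the F statistic and the pooled standard deviation can be recomputed from the group sizes, means and standard deviations alone. This is a minimal stdlib-only Python sketch; the dictionary layout and the function name are our own. Because the printed means and SDs are rounded, the sums of squares come out a few units off (about 754 and 8677 rather than 753 and 8675), while F and the pooled SD agree with the output.

```python
from math import sqrt

# Group summaries from the Minitab output, keyed by celltype level: (N, mean, StDev).
groups = {
    0: (13, 46.08, 15.28),  # normal
    1: (38, 37.84, 8.99),   # low grade adenocarcinoma in situ
    2: (19, 42.58, 12.66),  # high grade adenocarcinoma
}

def one_way_anova_from_summaries(groups):
    """Return (SS_between, SS_within, df_between, df_within, F, pooled_sd)."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total
    # Between-group SS: size-weighted squared deviations of group means from the grand mean.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
    # Within-group (error) SS: sum over groups of (n - 1) * s^2.
    ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    return ss_between, ss_within, df_between, df_within, f_stat, sqrt(ms_within)

ss_b, ss_w, df_b, df_w, f_stat, pooled_sd = one_way_anova_from_summaries(groups)
print(f"SS celltype = {ss_b:.0f} on {df_b} DF")
print(f"SS error    = {ss_w:.0f} on {df_w} DF")
print(f"F = {f_stat:.2f}, pooled StDev = {pooled_sd:.2f}")
```

The P-value needs the upper tail of an F distribution with (2, 67) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f_stat, df_b, df_w)` computes it.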
The P-value for the analysis of variance is 0.061. By the “usual” 0.05 level of significance, we would say there are no significant differences in the mean ages of the groups, but a P-value this close to 0.05 is hardly comforting. The normal probability plot of the residuals is shown below, and does not suggest any problem with the normality assumption.
However, boxplots of age by celltype definitely suggest that the assumption of equal variances is not satisfied.
I selected “Basic Statistics” > “Display Descriptive Statistics” for age by celltype.
The results are shown below:
Descriptive Statistics

Variable  celltype       N      Mean    Median    TrMean     StDev
age       0             13     46.08     46.00     46.18     15.28
          1             38     37.84     36.00     37.12      8.99
          2             19     42.58     39.00     41.47     12.66

Variable  celltype    SE Mean   Minimum   Maximum        Q1        Q3
age       0              4.24     23.00     68.00     30.50     58.00
          1              1.46     25.00     62.00     31.75     40.50
          2              2.90     26.00     78.00     34.00     50.00
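The SE Mean column is not new information: it is the StDev divided by the square root of N. The stdlib Python sketch below (variable names are our own) checks that against the printed values and also puts a single number on how different the spreads are, namely the ratio of the largest to the smallest sample variance.

```python
from math import sqrt

# (N, StDev, printed SE Mean) for celltype levels 0, 1, 2, copied from the table above.
rows = {0: (13, 15.28, 4.24), 1: (38, 8.99, 1.46), 2: (19, 12.66, 2.90)}

# SE Mean = sample standard deviation / sqrt(N).
se_mean = {level: sd / sqrt(n) for level, (n, sd, _) in rows.items()}
for level, se in se_mean.items():
    print(f"celltype {level}: SE Mean = {se:.2f} (printed {rows[level][2]:.2f})")

# Ratio of the largest to the smallest sample variance across the three groups.
sds = [sd for _, sd, _ in rows.values()]
variance_ratio = (max(sds) / min(sds)) ** 2
print(f"largest / smallest variance = {variance_ratio:.2f}")
```

The ratio comes out near 2.9, and the largest variance belongs to the normal group, which is also the smallest group.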
It seems that the normal group has the largest sample standard deviation (15.28) and is, by any measure of central tendency (mean, median, or trimmed mean), the oldest. It is also the smallest group, at 13 (the others have 38 and 19). We conclude that the ANOVA assumptions may well be violated; in particular, when the smallest group has the largest variance, the ANOVA F-test tends to reject more often than its nominal level. Note also that Tukey's method of multiple comparisons is definitely dubious in this case, as the sample sizes within groups are not nearly equal.
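The Tukey intervals in the Minitab output can be reproduced, up to rounding, from the printed critical value 3.39 and pooled StDev 11.38 using the Tukey-Kramer form of the interval, (mean_i - mean_j) ± (q/√2)·s_p·√(1/n_i + 1/n_j). That Minitab used this form is our inference from the numbers, not something the output states. The sketch (stdlib Python, our own names) makes the limitation visible: unequal n enter through √(1/n_i + 1/n_j), but every interval still uses one pooled standard deviation.

```python
from math import sqrt

Q_CRIT = 3.39      # "Critical value" printed by Minitab (studentized range)
POOLED_SD = 11.38  # "Pooled StDev" from the ANOVA output
groups = {0: (13, 46.08), 1: (38, 37.84), 2: (19, 42.58)}  # level: (N, mean)

def tukey_kramer_interval(i, j):
    """Simultaneous interval for mean_i - mean_j in Tukey-Kramer form."""
    n_i, m_i = groups[i]
    n_j, m_j = groups[j]
    # Half-width: (q / sqrt 2) * pooled SD * sqrt(1/n_i + 1/n_j).
    half_width = Q_CRIT / sqrt(2) * POOLED_SD * sqrt(1 / n_i + 1 / n_j)
    diff = m_i - m_j
    return diff - half_width, diff + half_width

# Minitab reports (column level mean) - (row level mean).
for col, row in [(0, 1), (0, 2), (1, 2)]:
    lo, hi = tukey_kramer_interval(col, row)
    print(f"mu({col}) - mu({row}): ({lo:.2f}, {hi:.2f})")
```

The small discrepancies in the second decimal place come from using the rounded means.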
Comment: How should the study have been done? Ideally, for each of the cancer patients we would have found a normal patient who (nearly) matched in important characteristics such as age, race, smoking, SES (socioeconomic status), etc. Then we could do paired sample t-tests to detect differences in the variables of interest (which haven’t been described here), and have some assurance that any differences we found were due to the cancer and not to so-called confounding factors.
Bonferroni's Method of Multiple Comparisons: Now we perform a similar analysis to
the one above but using Bonferroni's method instead of ANOVA + Tukey's method.
Here is a quick summary of Bonferroni's method, which applies to any multiple
comparisons problem:
1. For simultaneous 1- confidence intervals for k parameters, construct individual 1/k confidence intervals for each parameter separately.
2. For testing k sets of null hypothesis with a Family Wise Error Rate (FWER) of ,
perform individual hypothesis tests at level /k.
The bottom line is to divide the error probability by the number of confidence intervals or tests. One issue with Bonferroni's method is that it is not as powerful as a specially designed method, meaning that it has higher type II error probabilities for tests and wider confidence intervals. For instance, in an ANOVA setting, the ANOVA test is more likely to detect a difference (reject the null hypothesis of no difference), provided of course that the ANOVA assumptions are met. The beauty of Bonferroni's method is that it applies to any setting.
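Both rules rest on the Bonferroni (union) inequality: the probability that at least one of k events occurs is at most the sum of their probabilities, so k tests each run at level α/k have a family-wise error rate of at most k · α/k = α, whatever the dependence among the tests. A minimal Python sketch of the rule, with helper names of our own choosing, applied to the three P-values from the t-test output later in this section:

```python
def bonferroni_level(alpha, k):
    """Per-test significance level and per-interval confidence for k comparisons."""
    per_test_alpha = alpha / k
    return per_test_alpha, 1 - per_test_alpha

def bonferroni_reject(p_values, alpha=0.05):
    """Reject the i-th null hypothesis when its P-value is below alpha / k."""
    per_test_alpha, _ = bonferroni_level(alpha, len(p_values))
    return [p < per_test_alpha for p in p_values]

# P-values of the pairwise t-tests (1 vs 2, 0 vs 1, 0 vs 2) reported in this handout.
p_values = [0.16, 0.087, 0.50]
print(bonferroni_level(0.05, len(p_values)))  # per-test alpha and individual confidence
print(bonferroni_reject(p_values))            # one True/False decision per test
```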
Recall that the two sample t-test is reasonably robust to departures from normality (so is ANOVA), and its unpooled version doesn't require the assumption of equal variances (which is a bit of a problem for ANOVA). So, we reanalyzed the above data by performing 3 pairwise two sample t-tests, with each t-test at the 0.05/3 = 0.0167 level of significance since there are 3 pairwise comparisons (1 vs. 2, 1 vs. 0, and 2 vs. 0). To perform the tests at the 0.0167 level, we simply reject a null hypothesis if its P-value is below 0.0167. I also tried to get 1 − 0.0167 = 98.33% confidence intervals (so that the simultaneous level of confidence is 95%), but Minitab appears to have rounded this off to just 98%, so I can only guarantee a 94% simultaneous level of confidence. The results are shown below.
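The confidence-level arithmetic in this paragraph is the same inequality read in terms of coverage: if each of k intervals misses with probability 1 − L, all k cover simultaneously with probability at least 1 − k(1 − L). A small sketch (names are our own):

```python
K = 3  # pairwise comparisons: 1 vs 2, 1 vs 0, 2 vs 0
FAMILY_ALPHA = 0.05

def simultaneous_lower_bound(individual_level, k):
    """Bonferroni bound: P(all k intervals cover) >= 1 - k * (1 - individual_level)."""
    return 1 - k * (1 - individual_level)

# Individual level that guarantees 95% simultaneous coverage for K intervals.
needed_level = 1 - FAMILY_ALPHA / K

print(f"needed individual level: {needed_level:.4f}")
print(f"guaranteed with 98% intervals: {simultaneous_lower_bound(0.98, K):.2f}")
print(f"guaranteed with {needed_level:.2%} intervals: "
      f"{simultaneous_lower_bound(needed_level, K):.2f}")
```

Because this is only a lower bound, 94% is a guarantee rather than the exact simultaneous coverage.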
Two Sample T-Test and Confidence Interval

Two sample T for a2
ct2     N      Mean     StDev   SE Mean
1      38     37.84      8.99       1.5
2      19      42.6      12.7       2.9

98% CI for mu (1) - mu (2): ( -13.0, 3.6)
T-Test mu (1) = mu (2) (vs not =): T = -1.46  P = 0.16  DF = 27
Two Sample T-Test and Confidence Interval

Two sample T for a3
ct3     N      Mean     StDev   SE Mean
0      13      46.1      15.3       4.2
1      38     37.84      8.99       1.5

98% CI for mu (0) - mu (1): ( -3.9, 20.4)
T-Test mu (0) = mu (1) (vs not =): T = 1.84  P = 0.087  DF = 14
Two Sample T-Test and Confidence Interval

Two sample T for a4
ct4     N      Mean     StDev   SE Mean
0      13      46.1      15.3       4.2
2      19      42.6      12.7       2.9

98% CI for mu (0) - mu (2): ( -9.8, 16.8)
T-Test mu (0) = mu (2) (vs not =): T = 0.68  P = 0.50  DF = 22
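The DF values in these outputs are not the pooled n₁ + n₂ − 2 (which would be 55, 49 and 30), so Minitab ran the unpooled two-sample t-test. The stdlib Python sketch below (helper names are our own) recomputes the unpooled T and the Welch-Satterthwaite degrees of freedom from the descriptive statistics, and also back-calculates the t multiplier implied by each printed interval, half-width divided by standard error.

```python
from math import floor, sqrt

# (N, mean, StDev) per celltype, from the descriptive statistics.
summ = {0: (13, 46.08, 15.28), 1: (38, 37.84, 8.99), 2: (19, 42.58, 12.66)}

def welch(i, j):
    """Unpooled two-sample T, its standard error, and Welch-Satterthwaite df."""
    n_i, m_i, s_i = summ[i]
    n_j, m_j, s_j = summ[j]
    v_i, v_j = s_i ** 2 / n_i, s_j ** 2 / n_j  # squared standard error of each mean
    se = sqrt(v_i + v_j)
    df = (v_i + v_j) ** 2 / (v_i ** 2 / (n_i - 1) + v_j ** 2 / (n_j - 1))
    return (m_i - m_j) / se, se, df

# Intervals printed as "98% CI" in the output above, keyed by comparison.
printed_ci = {(1, 2): (-13.0, 3.6), (0, 1): (-3.9, 20.4), (0, 2): (-9.8, 16.8)}

for (i, j), (lo, hi) in printed_ci.items():
    t, se, df = welch(i, j)
    multiplier = (hi - lo) / 2 / se  # t critical value implied by the printed interval
    print(f"{i} vs {j}: T = {t:.2f}, DF = {df:.2f} (rounded down: {floor(df)}), "
          f"implied multiplier = {multiplier:.2f}")
```

The T values match the output, and the reported DF match the Welch-Satterthwaite values rounded down (about 27.37, 14.94 and 22.60). The implied multipliers (about 2.55, 2.71 and 2.59) are noticeably larger than the two-sided 98% t critical values for those DF in standard t tables (2.473, 2.624 and 2.508). This suggests, though the sketch cannot confirm it, that the intervals were actually computed at the requested 98.33% level and only the "98%" label was rounded.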
The smallest P-value was 0.087 (for testing 0 (normal) vs. 1 (low grade adenocarcinoma in situ)). As this is not less than 0.0167, we cannot reject the null hypothesis of no difference for any of the three pairs.