Chapter 2-99. Homework Problem Solutions
Chapter 2-1. Describing variables, levels of measurement, and choice of descriptive
statistics
Problem 1) Read in the data file
From the Stata menu bar, click on File on the menu bar, find the directory datasets
& do-files, which is a subdirectory of the course manual, and open the file:
births_with_missing.dta.
Your directory path will differ, but something like the following will be displayed in the
Stata Results window,
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
births_with_missing.dta", clear
Problem 2) Listing data
List the data for bweight.
a) At the first “—more—” prompt, with the cursor in the Command window, hit
the enter key a couple of times (notice this scrolls one line at a time).
b) With the cursor in the Command window, hit the space bar a couple of times
(notice this scrolls a page at a time).
c) Click on the “—more—” prompt with the mouse (notice this scrolls a page at
a time, as well)
d) We have seen enough. Hit the stop icon on the menu bar (the red dot with a
white X in the middle of it). This terminates (breaks) the output.
list bweight
     +---------+
     | bweight |
     |---------|
  1. |    2974 |
  2. |    3270 |
  3. |    2620 |
  4. |    3751 |
  5. |    3200 |
     |---------|
  6. |    3673 |
     ...
______________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual. Salt Lake City, UT: University of Utah School of Medicine. Chapter 2-99. (Accessed January 8, 2012, at http://www.ccts.utah.edu/biostats/?pageId=5385).
Problem 3) Frequency table
Create a frequency table for the variable lowbw.
tabulate lowbw
* <or>
tab lowbw
low birth |
weight |
Freq.
Percent
Cum.
------------+----------------------------------0 |
420
87.87
87.87
1 |
58
12.13
100.00
------------+----------------------------------Total |
478
100.00
Problem 4) Histogram
Create a histogram for the variable gestwks, asking for the percent on the y-axis,
rather than proportions (density).
histogram gestwks , percent

[Figure: histogram of gestation period (25-45 weeks), with Percent (0-25) on the y-axis]
Problem 5) Kernel Density Plot
Create a kernel density plot for the variable gestwks, overlaying the graphs for male
and female newborns.
The following will work in the do-file editor. If you did this in the Command window,
you would exclude the command continuation symbol “///” and make the command just
one long command on the same line.
twoway (kdensity gestwks if sexalph == "female", lcolor(pink)) ///
    (kdensity gestwks if sexalph == "male" , lcolor(blue))

[Figure: overlaid kernel density plots of gestwks for female and male newborns; x-axis 25-45, y-axis "kdensity gestwks" 0-.3]
Problem 6) Box plot
Create a boxplot for the variable bweight, showing male and female newborns on
the same graph.
graph box bweight, over(sexalph)

[Figure: box plots of birth weight in grams (y-axis 1,000-5,000) for female and male newborns]
Problem 7) Visualizing Distribution From Descriptive Statistics
A variable has the following descriptive statistics:
Mean = 45
Median = 50
SD = 3
Is this distribution symmetrical, or is it skewed? If skewed, is it left or right
skewed?
The distribution is left skewed. If it were symmetrical, the mean and median would be
very close to the same number. You can see that the mean is 5 less than the median,
which is (mean – median)/SD = (45 – 50)/3 = -5/3 = -1.67 SDs apart. Recall that a normal
distribution, which is symmetrical, has approximately six SDs from the minimum to the
maximum (the middle 99.7% of the distribution). So -1.67 SDs, which is nearly -2 SDs, is
about 1/3 of the distribution apart, which would be a very noticeable skewness if the
distribution were graphed. Since the mean is to the left of the median, it is said to be “left”
skewed—the long tail is in the direction of the skewness.
Problem 8) Visualizing Distribution From Descriptive Statistics
A variable has the following descriptive statistics:
Mean = 50
Median = 49
SD = 10
Is this distribution symmetrical, or is it skewed? If skewed, is it left or right
skewed?
Although it would technically be correct to say it was right skewed, since the mean is
greater than the median, the distribution is symmetrical for all practical purposes. Even
though the mean and median differ by 1 point, the difference is only (mean - median)/SD
= 1/10 SD apart, which would hardly be noticeable if the histogram was displayed.
Munro (2001, p.42) provides Pearson’s skewness coefficient as a way to assess skewness,
which is what was used in the preceding paragraph:
Pearson’s skewness coefficient is
skewness = (mean – median)/SD
Notice that the “sign” of the Pearson skewness coefficient is in agreement with the
concept of “left” and “right” skewness, being negative or positive skewness on the
number line (to the left or right on the number line).
Munro (2001, p.43) gives Hildebrand’s rule-of-thumb to think about skewness,
“Hildebrand (1986) states that skewness values above 0.2 or below -0.2 indicate
severe skewness.”
The 1/10 SD, or 0.1 SD, is within the -0.2 to 0.2 range, so the skewness is not severe
enough to be of practical concern.
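If you want Stata to do this arithmetic for you, one minimal sketch (assuming a numeric variable, here bweight from the births dataset, is already in memory) pulls the mean, median, and SD from the returned results of summarize with the detail option:

* Pearson's skewness coefficient computed from the returned results of
* summarize, detail (r(mean), r(p50), and r(sd) are standard returned results)
quietly summarize bweight , detail
display "Pearson skewness = " (r(mean) - r(p50))/r(sd)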
There is also an official statistic called skewness, which is given by the summarize
command with the detail option.
NOTE: A discussion on the assessment of skewness was not even provided in Chapter
2-1. It turns out that it is a relatively unimportant concept. Statistical tests, such as the
t test to be covered later in this course, assume a normal, or symmetrical, distribution.
However, the test is very robust to violations of this assumption, giving correct results
anyway.
Problem 9) Descriptive Statistics
Obtain the descriptive statistics, including the median (50th percentile) for the
variable bweight.
summarize bweight , detail
* <or>
sum bweight , detail
                      birth weight in grams
-------------------------------------------------------------
      Percentiles      Smallest
 1%          924            628
 5%         1801            693
10%         2399            708       Obs                 478
25%         2878            864       Sum of Wgt.         478

50%       3192.5                      Mean           3137.253
                        Largest       Std. Dev.       637.777
75%         3551           4436
90%         3804           4512       Variance       406759.5
95%         4041           4516       Skewness      -1.039337
99%         4423           4553       Kurtosis       5.094602
Problem 10) Descriptive Statistics by Group
Obtain the short list of descriptive statistics (N, mean, SD, min, max) for
variable bweight, for both males and females.
bysort sexalph: sum bweight
--------------------------------------------------------------------------
-> sexalph = 

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     bweight |        41    2958.512    627.7393       1431       4226

--------------------------------------------------------------------------
-> sexalph = female

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     bweight |       212    3069.236     622.204        628       4300

--------------------------------------------------------------------------
-> sexalph = male

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     bweight |       225    3233.911    641.5076        693       4553
Notice the first table of descriptive statistics is for those infants with missing data for the
sexalph variable.
This could be done with a nicer format using the table or tabstat commands, as shown at the end
of Chapter 2-1, but the sum command with a bysort is much easier to memorize.
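For example, a tabstat version might look like the following (a sketch; tabstat with the statistics() and by() options is standard Stata, though the exact layout you want may differ):

* short list of descriptive statistics for bweight, one panel per sex
tabstat bweight , by(sexalph) statistics(n mean sd min max)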
Problem 11) Best Choice of Descriptive Statistics to Describe a Variable’s Distribution
The variable race/ethnicity is coded as:
1) Caucasian (White)
2) African-American (Black)
3) Asian
4) Native American
5) Pacific Islander
What is the level of measurement (measure scale) of this variable? What is the
best way to describe it in a “Patient Characteristics” table of a manuscript?
The scores simply represent labels or classifications, which have no natural rank ordering. Thus,
the level of measure is “nominal” or an “unordered categorical scale.”
All that can be done for unordered categories is to report the count, or frequency, for each
category, along with the percent of the sample within each category. For this variable, simply
show the count and percent. It is also popular to just show the percent, since the count can be
derived from the percent and sample size for each group if the reader so chooses. Thus, the
entire distribution is put in the table, which, with only five categories, the reader should be able to
hold in his or her head and visualize correctly.
For the categories with very small percentages, another approach is to combine those categories
into an “other” category, which simplifies the presentation.
Problem 12) Best Choice of Descriptive Statistics to Describe a Variable’s Distribution
The variable systolic blood pressure is coded as actual values of the measurement.
What is the variable’s level of measurement? What is the best way to describe it
in a “Patient Characteristics” table of a manuscript?
The scores look like the integer number system, with equal intervals between the values. The
starting value is atmospheric pressure. If the blood pressure increases by 10%, there is 10% more
force being exerted, so ratios can be computed with this variable. Thus, the variable is a ratio
scale. For statistical analysis, it is generally sufficient to just think of it as an “interval scale”,
since the approach to the statistical analysis will almost always be the same as with an interval
scale.
To describe it, use the mean and standard deviation. If the sample of patients is such that the
variable is extremely skewed, use the median and interquartile range, instead.
Problem 13) Best Choice of Descriptive Statistics to Describe a Variable’s Distribution
The variable sex is scored as
1) male
2) female
What is the best way to describe it in a “Patient Characteristics” table of a
manuscript?
The variable has two categories, so it is a dichotomous, or binary, scale. It can also be referred to
as “unordered categorical.”
To describe it, simply use the count and percent, or just the percent. This actually only needs to
be done for one category, either male or female, since the reader can compute the other percent in
his or her head without too much trouble.
Problem 14) Best Choice of Descriptive Statistics to Describe a Variable’s Distribution
The variable New York Heart Association class (NYHA class)(Miller-Davis et al,
2006) is a simple scale that classifies a patient according to how cardiac
symptoms impinge on day-to-day activities. It is scored as
Class I) No limitations of physical activity (ordinary physical activity
does not cause symptoms)
Class II) Slight limitation of physical activity (ordinary physical activity
does cause symptoms)
Class III) Moderate limitation of activity (comfortable at rest but less
than ordinary activities cause symptoms)
Class IV) Unable to perform any physical activity without discomfort
(may be symptomatic even at rest); therefore severe limitation
What is the variable’s level of measurement? What is the best way to describe it
in a “Patient Characteristics” table of a manuscript?
The variable is an “ordinal level of measurement” or “ordered categorical scale.” Since there are
only four categories, it should be reported as counts with percents, or just percents. This ignores
rank ordering, but the reader would be able to hold the distribution in his or her head just fine. If
the percents were shown in side-by-side columns, the reader could even see if the percents were
lumping up at the low end for one group versus lumping up at the high end for the other group.
If this were not obvious, also (or instead) reporting the median and interquartile range would be
helpful.
Problem 15) Open up the file births_with_missing.dta in Stata. Compute the frequency
tables or descriptive statistics, separately for mothers with and without hypertension, and
fill in the following table with the appropriate row labels in column one and the best
choice of descriptive statistics in columns two and three.
Table 1. Patient Characteristics

                        Maternal          Maternal
                        Hypertension      Hypertension
                        Present           Absent
                        [N =    ]         [N =    ]
Maternal age, yrs
Sex of Newborn
The descriptive statistics could be generated using,
. tab hyp
  hypertens |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        411       85.98       85.98
          1 |         67       14.02      100.00
------------+-----------------------------------
      Total |        478      100.00
. bysort hyp: tab sexalph
-> hyp = 0

  sex coded |
  as string |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |        190       49.10       49.10
       male |        197       50.90      100.00
------------+-----------------------------------
      Total |        387      100.00

-> hyp = 1

  sex coded |
  as string |      Freq.     Percent        Cum.
------------+-----------------------------------
     female |         26       40.63       40.63
       male |         38       59.38      100.00
------------+-----------------------------------
      Total |         64      100.00
. bysort hyp: sum matage
-> hyp = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      matage |       397    34.08816    3.861201         23         43

-> hyp = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      matage |        66    33.57576    4.278967         24         43
The missing values make reporting the counts problematic, so just showing percents would be
the easiest approach. Here, we assume that the missing values follow the same distribution as the
nonmissing values.
Here is one format. Other formats are also fine—it is just a matter of personal reporting style.
Table 1. Patient Characteristics

                        Maternal          Maternal
                        Hypertension      Hypertension
                        Present           Absent
                        [N = 67]          [N = 411]
Maternal age, yrs
  Mean (SD)             34 (4)            34 (4)
Sex of Newborn, %
  Male                  59                51
Chapter 2-2. Logic of significance tests
Chapter 2-3. Choice of significance test
It was shown in Chapter 2-1 that the decision of which descriptive statistic to use was
based on the level of measurement of the data. The most informative measure of average
and dispersion (such as mean and standard deviation) was selected after determining the
level of measurement of the variable.
The choice of a test statistic, also called a significance test, is made in a similar way.
You choose the test that makes the best use of the information in the variable; that is, it
depends on the level of measurement of the variable, and whether the groups being
compared are independent (different study subjects) or related (same person measured at
least twice).
Problem 1) Practice Selecting a Significance Test Using Chapter 2-3.
Selecting a significance test before these tests have been introduced in later chapters is in
some sense jumping ahead. Still, it is useful at this point in the course to see that the
decision is actually quite simple, which removes a lot of mystery about the subject of
statistics. You do not even have to know what the tests are to be able to do this. This
problem is an exercise to illustrate how easy it is.
In this problem, the study is comparing an active treatment (intervention group) to an
untreated control group (control group). These groups are different subjects (different
people, animals, or specimens). The outcome is an interval scale, and for this analysis, no
control for other variables is desired. What is the best significance test to use?
Answer: Looking at the table in Chapter 2-3, on page 3, we find the “continuous” row,
since “continuous” is another name for “interval scale.” Then we find the “two
independent groups” column. The “best” test, or at least an excellent one that is widely
accepted as the best choice, is shown, which is the independent groups t-test. This can
also be found in Chapter 2-3, on page 7. First find “Two Unrelated Samples”, then find
“Interval Scale”, then find “Tests for Location (average)”. There we find the independent
groups t-test listed first. The test listed first is the most popular.
Problem 2) Practice Selecting a Significance Test Using Chapter 2-3.
In this problem, the study is comparing a baseline, or pre-intervention measurement, to a
post-intervention measurement on the same study subjects. There is no control group in
the experiment. The outcome is an ordinal scale variable, and for this analysis, no control
for other variables is desired. What is the best significance test to use?
Answer: Looking at the table in Chapter 2-3, on page 3, we find the “ordered categorical”
row, since “ordered categorical” is another name for “ordinal scale.” Then we find the
“two correlated groups” column. The “best” test, or at least an excellent one that is
widely accepted as the best choice, is shown, which is the Wilcoxon signed rank test. This
can also be found in Chapter 2-3, on page 6. First find “Two Related Samples”, then find
“Ordinal Scale”. There we find the Wilcoxon signed rank test listed first. The test listed
first is the most popular.
Chapter 2-4. Comparison of two independent groups
Problem 1) Crosstabulation analysis
Open the file births_with_missing.dta inside Stata. In preparing to test for an association
between hypertension and preterm in the subgroup of females (sex equal 2), first check
the minimum expected cell frequency assumption. Do this using:
tabulate preterm hyp if sex==2, expect
* <or abbreviate to>
tab preterm hyp if sex==2, expect
Comparing the results to the minimum expected cell frequency rule, should a chi-square
test be used to test the association, or should a Fisher’s exact test be used?
+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

           |       hypertens
  pre-term |         0          1 |     Total
-----------+----------------------+----------
         0 |       160         18 |       178
           |     155.4       22.6 |     178.0
-----------+----------------------+----------
         1 |        19          8 |        27
           |      23.6        3.4 |      27.0
-----------+----------------------+----------
     Total |       179         26 |       205
           |     179.0       26.0 |     205.0
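As a reminder of where these expected frequencies come from, each one is (row total × column total)/grand total. For example, the 22.6 in the upper-right cell can be checked directly in Stata:

* expected frequency = (row total * column total) / grand total
display 178*26/205    // = 22.5756..., shown rounded as 22.6 in the table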
The minimum expected cell frequency rule is found on page 19 of Chapter 2-4.
Specifically, it states the following:
Daniel (1995, pp.524-526) in his statistics textbook, cites a rule attributable to
Cochran (1954):
2 × 2 table: the chi-square test should not be used if n < 20. If 20 < n < 40, the
chi-square test should not be used if any expected frequency is less
than 5. When n ≥ 40, three of the expected cell frequencies should be
at least 5 and one expected frequency can be as small as 1.
In our problem, we have a 2 × 2 table with a sample size of n = 205, which is greater than 40.
We see that we have three cells with expected cell frequencies greater than 5, and one cell
with an expected cell frequency less than 5 but greater than 1. Thus we satisfied the
minimum expected cell frequency rule, so a chi-square test can be used.
Problem 2) Crosstabulation analysis
Compute the appropriate test statistic for the crosstabulation table in Problem 1. Ask for
row or column percents, depending on which we would want to report in our manuscript.
We would like to report column percents, which provide the percent of mothers with and
without hypertension who have the outcome of a low birthweight newborn. We want the
chi-square test, which is justified by meeting the minimum expected cell frequency rule,
since it provides a more powerful test than a Fisher’s exact test. To get this, we use,
tab preterm hyp if sex==2, col chi2
+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |       hypertens
  pre-term |         0          1 |     Total
-----------+----------------------+----------
         0 |       160         18 |       178
           |     89.39      69.23 |     86.83
-----------+----------------------+----------
         1 |        19          8 |        27
           |     10.61      30.77 |     13.17
-----------+----------------------+----------
     Total |       179         26 |       205
           |    100.00     100.00 |    100.00

          Pearson chi2(1) =   8.0640   Pr = 0.005
We could report this result as, “Mothers with hypertension during pregnancy delivered
preterm 31% of the time, while mothers without hypertension had only 11%
preterm deliveries (p = 0.005).”
Problem 3) Crosstabulation analysis
Use the following “immediate” (the data follow immediately after the command name)
version of the tabulate command,
tabi 5 4 \ 3 10 , expect
should a chi-square test be used, or should a Fisher’s exact test be used?
+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

           |        col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |         5          4 |         9
           |       3.3        5.7 |       9.0
-----------+----------------------+----------
         2 |         3         10 |        13
           |       4.7        8.3 |      13.0
-----------+----------------------+----------
     Total |         8         14 |        22
           |       8.0       14.0 |      22.0
Applying “If 20 < n < 40, the chi-square test should not be used if any expected
frequency is less than 5,” we see that we have two cells with an expected frequency less
than 5. We are not allowed any, for this sample size, so a Fisher’s exact test must be
used.
Problem 4) Comparison of a nominal outcome
In Sulkowski (2000) Table 1, the following distribution of race is provided for the two
study groups.
                  Protease       Dual
                  Inhibitor      Nucleoside
                  Regimen        Analog Regimen      P
Race              (n = 211)      (n = 87)            value
Black             151 (72)       71 (82)             0.02
White              57 (27)       13 (15)
Other               3 ( 1)        3 ( 3)
The problem is to verify the percents and the p value. Use the tabi command to add the
three rows of data as part of the command, with each row separated by the carriage return,
or new line, symbol “\”. (See an example for two rows of data in the previous problem
above.) First check the expected frequencies, and then use a chi-square test or Fisher’s
exact test (more correctly called a Fisher-Freeman-Halton test when the table is larger
than 2 × 2), as appropriate.
Using
tabi 151 71\57 13\3 3, expect
we get
+--------------------+
| Key                |
|--------------------|
|     frequency      |
| expected frequency |
+--------------------+

           |        col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |       151         71 |       222
           |     157.2       64.8 |     222.0
-----------+----------------------+----------
         2 |        57         13 |        70
           |      49.6       20.4 |      70.0
-----------+----------------------+----------
         3 |         3          3 |         6
           |       4.2        1.8 |       6.0
-----------+----------------------+----------
     Total |       211         87 |       298
           |     211.0       87.0 |     298.0
We discover 2 cells, or 2/6 = 33%, have expected frequencies < 5.
Applying the minimum expected cell frequency rule-of-thumb (Daniel, 1995, pp.524-526),
quoted in Chapter 2-4,

larger than 2 × 2 table (r × c table):
the chi-square test can be used if no more than 20% of the cells have
expected frequencies < 5 and no cell has an expected frequency < 1.

we see that we did not meet the minimum expected cell frequency criteria, since we have
33% of cells with an expected frequency < 5, which is larger than 20%.
Thus, we next ask for the Fisher-Freeman-Halton test. Also, we notice that we need
column percents to check the percents in Sulkowski’s table, so we specify the “col”
option.
tabi 151 71\57 13\3 3, col exact
We get,
           |        col
       row |         1          2 |     Total
-----------+----------------------+----------
         1 |       151         71 |       222
           |     71.56      81.61 |     74.50
-----------+----------------------+----------
         2 |        57         13 |        70
           |     27.01      14.94 |     23.49
-----------+----------------------+----------
         3 |         3          3 |         6
           |      1.42       3.45 |      2.01
-----------+----------------------+----------
     Total |       211         87 |       298
           |    100.00     100.00 |    100.00

           Fisher's exact =        0.038
The column percents agree with what Sulkowski reported, which was

                  Protease       Dual
                  Inhibitor      Nucleoside
                  Regimen        Analog Regimen      P
Race              (n = 211)      (n = 87)            value
Black             151 (72)       71 (82)             0.02
White              57 (27)       13 (15)
Other               3 ( 1)        3 ( 3)
The difference in p values is a mystery, but not a critical issue since the conclusion of a
significant difference does not change.
Problem 5) Comparison of an ordinal outcome
Body mass index (BMI) is computed using the equation:

body mass index (BMI) = weight/height² in kg/m²

BMI is frequently recoded into four BMI categories recommended by the National Heart,
Lung, and Blood Institute (1998)(Onyike et al., 2003):

underweight    (BMI < 18.5)
normal weight  (BMI 18.5–24.9)
overweight     (BMI 25.0–29.9)
obese          (BMI ≥ 30.0)

(How to compute BMI and recode it into these four categories is explained in Chapter 1-11,
if you ever need to do this in your own research.) This recoding converts the data
from an interval scale into an ordinal scale, since the categories have order but not equal
intervals. To compare two groups on BMI as an ordinal scale, a Wilcoxon-Mann-Whitney
test is appropriate.
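Although Chapter 1-11 has the authoritative code, a minimal sketch of the recoding (hypothetical variable names: weight in kilograms and height in meters; the upper cut bound of 99 is an arbitrary ceiling) could use egen’s cut() function:

* compute BMI and cut it at the NHLBI category boundaries
* (variable names weight and height are hypothetical)
generate bmi = weight/(height^2)
egen bmicat = cut(bmi), at(0,18.5,25,30,99) icodes
label define bmicatlbl 0 "underweight" 1 "normal weight" 2 "overweight" 3 "obese"
label values bmicat bmicatlbl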
Suppose the data are:
                     Active       Placebo
                     Drug         Drug
BMI, count (%)       (n = 100)    (n = 100)
Underweight           4 ( 4)       0 ( 0)
Normal weight        30 (30)      20 (20)
Overweight           50 (50)      45 (45)
Obese                16 (16)      35 (35)
The ranksum command, which is the Wilcoxon-Mann-Whitney test, does not have an
“immediate” form, so we have to convert these data into “individual level” data, where
each row of the dataset represents an individual subject. To do this, copy the following
into the Stata do-file editor and run it.
* --- wrong way to do it ---
clear
input active bmicat count
1 1 4
1 2 30
1 3 50
1 4 16
0 1 0
0 2 20
0 3 45
0 4 35
end
expand count
tab bmicat active
The result is,
           |        active
    bmicat |         0          1 |     Total
-----------+----------------------+----------
         1 |         1          4 |         5
         2 |        20         30 |        50
         3 |        45         50 |        95
         4 |        35         16 |        51
-----------+----------------------+----------
     Total |       101        100 |       201
We discover that we have a subject in the placebo group (1=Active, 0=Placebo) that does
not belong there. To avoid this situation, we must drop any cell count equal to 0 before
we expand the data. Here is the correct way to do it:
* --- correct way to do it ---
clear
input active bmicat count
1 1 4
1 2 30
1 3 50
1 4 16
0 1 0
0 2 20
0 3 45
0 4 35
end
drop if count==0 // always a good idea to add this line
expand count
tab bmicat active
The result is,
           |        active
    bmicat |         0          1 |     Total
-----------+----------------------+----------
         1 |         0          4 |         4
         2 |        20         30 |        50
         3 |        45         50 |        95
         4 |        35         16 |        51
-----------+----------------------+----------
     Total |       100        100 |       200
Now that we have the dataset correctly created, to run the Wilcoxon-Mann-Whitney test
we use,
ranksum bmicat , by(active)
Two-sample Wilcoxon rank-sum (Mann-Whitney) test

      active |      obs    rank sum    expected
-------------+---------------------------------
           0 |      100       11305       10050
           1 |      100        8795       10050
-------------+---------------------------------
    combined |      200       20100       20100

unadjusted variance     167500.00
adjustment for ties     -23343.59
                        ----------
adjusted variance       144156.41

Ho: bmicat(active==0) = bmicat(active==1)
             z =   3.305
    Prob > |z| =   0.0009    <- report this two-tailed p value
From just visualizing the data, we notice that the placebo group tends to have greater BMI
values. To report this result, we could say something like:
BMI was significantly higher in the placebo group compared to the active drug
group (p<0.001)[Table 1].
Problem 6) Comparison of an interval outcome
Cut and paste the following into the do-file editor and execute it to set up the dataset.
These data represent two groups (1=Patients with Coronary Heart Disease (CHD), 0=
Patients without CHD) on an outcome of systolic blood pressure (SBP).
clear
input chd sbp
1 225
1 190
1 162
1 178
1 158
0 154
0 124
0 128
0 165
0 162
end
graph box sbp ,over(chd)
[Figure: box plots of sbp (y-axis 120-220) for chd = 0 and chd = 1]
Just by looking at the boxplot, would you guess a two-sample t-test would have a smaller
p value (more significant) than a Wilcoxon-Mann-Whitney test?
Looking at the graph, we see the non-CHD patients are skewed to the left, so the
mean is pulled in the direction of being smaller than the median. For the CHD
patients, the skewness is to the right, so the mean is pulled in the direction of
being larger than the median. A comparison of means (t-test) will be more
powerful than a comparison of medians (Wilcoxon-Mann-Whitney test) since the
means are more separated than the medians. One might wonder, though, if the
extra variability created by the skewness will offset this advantage, creating a
larger denominator in the t-test statistic, so maybe the Wilcoxon-Mann-Whitney
test will win out.
The four tests can be run using the following,
ttest sbp , by(chd)
ttest sbp , by(chd) unequal
ranksum sbp , by(chd)
permtest2 sbp, by(chd)
. ttest sbp , by(chd)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5       146.6    8.623224    19.28212    122.6581    170.5419
       1 |       5       182.6    12.04824    26.94068    149.1487    216.0513
---------+--------------------------------------------------------------------
combined |      10       164.6    9.207726    29.11739    143.7707    185.4293
---------+--------------------------------------------------------------------
    diff |                 -36    14.81621               -70.16624   -1.833765
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                                     degrees of freedom =        8

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0206         Pr(|T| > |t|) = 0.0412          Pr(T > t) = 0.9794
. ttest sbp , by(chd) unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5       146.6    8.623224    19.28212    122.6581    170.5419
       1 |       5       182.6    12.04824    26.94068    149.1487    216.0513
---------+--------------------------------------------------------------------
combined |      10       164.6    9.207726    29.11739    143.7707    185.4293
---------+--------------------------------------------------------------------
    diff |                 -36    14.81621               -70.79485   -1.205146
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                     Satterthwaite's degrees of freedom =  7.24624

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0221         Pr(|T| > |t|) = 0.0443          Pr(T > t) = 0.9779
. ranksum sbp , by(chd)

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

         chd |      obs    rank sum    expected
-------------+---------------------------------
           0 |        5        18.5        27.5
           1 |        5        36.5        27.5
-------------+---------------------------------
    combined |       10          55          55

unadjusted variance        22.92
adjustment for ties        -0.14
                       ----------
adjusted variance          22.78

Ho: sbp(chd==0) = sbp(chd==1)
             z =  -1.886
    Prob > |z| =   0.0593
. permtest2 sbp, by(chd)

Fisher-Pitman permutation test for two independent samples

         chd |      obs        mean     std.dev.
-------------+-----------------------------------
           0 |        5       146.6    19.282116
           1 |        5       182.6    26.940676
-------------+-----------------------------------
    combined |       10       164.6    29.117387

mode of operation:  exact (complete permutation)

Test of hypothesis Ho: sbp(chd==0) >= sbp(chd==1) :  p=.02380952 (one-tailed)
Test of hypothesis Ho: sbp(chd==0) <= sbp(chd==1) :  p=.98412698 (one-tailed)
Test of hypothesis Ho: sbp(chd==0) == sbp(chd==1) :  p=.04761905 (two-tailed)
We find that the two-sample t-test which assumes equal variances has the smallest p value.
If you wonder whether the skewness and differences in variances (or standard deviation
differences) are severe enough to invalidate the two-sample t-test which assumes equal
variances, it is comforting to see the significance confirmed by the Fisher-Pitman
permutation test, which has neither the normality nor the homogeneity of variance assumption.
This result illustrates the robustness of the t-test to these two assumptions, giving a
reasonable p value even though the assumptions might be called into question.
Chapter 2-5. Basics of power analysis
Problem 1) sample size determination for a comparison of two means
You have pilot data taken from Chapter 2-4, problem 6 above.
clear
input chd sbp
1 225
1 190
1 162
1 178
1 158
0 154
0 124
0 128
0 165
0 162
end
Designing a new study to test this difference in mean SBP, between patients with
CHD and without CHD, given these standard deviations, what sample size do you
need to have 80% power, using a two-sided alpha 0.05 level comparison?
After creating this dataset in Stata, to obtain the means and standard deviations,
you can use,
ttest sbp , by(chd)
* <or>
bysort chd: sum sbp
. ttest sbp , by(chd)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5       146.6    8.623224    19.28212    122.6581    170.5419
       1 |       5       182.6    12.04824    26.94068    149.1487    216.0513
---------+--------------------------------------------------------------------
combined |      10       164.6    9.207726    29.11739    143.7707    185.4293
---------+--------------------------------------------------------------------
    diff |                 -36    14.81621               -70.16624   -1.833765
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                                     degrees of freedom =        8

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0206         Pr(|T| > |t|) = 0.0412          Pr(T > t) = 0.9794
. * <or>
. bysort chd: sum sbp

--------------------------------------------------------------------------
-> chd = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sbp |         5       146.6    19.28212        124        165

--------------------------------------------------------------------------
-> chd = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sbp |         5       182.6    26.94068        158        225
To obtain the required sample size, use,
sampsi 146.6 182.6 , sd1(19.3) sd2(26.9) power(.8)
Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =    146.6
            m2 =    182.6
           sd1 =     19.3
           sd2 =     26.9
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =        7
            n2 =        7
We see that we require n=7 subjects per group.
Problem 2) z score (effect size) approach to power analysis
Cuellar and Ratcliffe (2009) include the following sample size determination
paragraph in their article,
“The study was powered to include 40 participants, 20 randomized to each
group. Randomization would include an equal number of men and women
in each group. Twenty participants per group were needed to detect the
required moderate effect size of 0.9 assuming 80% power, alpha of 0.05,
and using a Student’s t-test.”
Come up with the “sampsi” command that verifies the sample size computation
described in their paragraph. (hint: review the “What to do if you don’t know
anything” section of Chapter 2-5).
The authors were stating that their effect size was a 0.9 SD difference in means.
Given that z-scores have a mean of 0 and a SD of 1, you use SD = 1 for both
groups and a mean of 0 for one of the groups. You then assume the distribution is
shifted by a mean difference of 0.9SD =0.9(1) = 0.9 for the other group, so use a
mean of 0.9 for the other group. The sampsi command is thus,
sampsi 0 .9 , sd1(1) sd2(1) power(.8)
Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =        0
            m2 =       .9
           sd1 =        1
           sd2 =        1
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =       20
            n2 =       20
Problem 3) Verifying the z score approach to sample size determination and power
analysis is legitimate
In Chapter 2-5, the z score approach was presented, but without a demonstration that it
actually gives the same answer as when the data are expressed in their original
measurement scale. To verify the approach is legitimate, we will first a) verify that a
z-score transformed variable has a mean of 0 and standard deviation of 1. Second, b) we
will verify a t-test on data transformed into z-scores gives the same p value as when the
original scale is used. Third, c) we will verify the sample size and power calculations are
identical for a z-score transformed variable and the variable in its original scale.
Note: Verifying with an example is not a formal mathematical proof; but if it is true, it
should work for any example we try. We can think of it as a demonstration, or
verification, rather than a proof. That is good enough for our purpose.
We will use the same dataset used above in Ch 2-5 problem 1, which is duplicated here.
Cut-and-paste this dataset into the do-file editor, highlight it with the mouse, and then hit
the last icon on the right to execute it.
clear
input chd sbp
1 225
1 190
1 162
1 178
1 158
0 154
0 124
0 128
0 165
0 162
end
sum sbp
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sbp |        10       164.6    29.11739        124        225
a) verify that a z-score transformed variable has a mean of 0 and standard deviation of
1.
Using this mean and standard deviation (SD), generate a variable that contains the z
scores for systolic blood pressure (sbp), using the formula
z = (X – mean) / SD
[Hint: to compute z=(a-b)/c, use: generate z = (a-b)/c ]
Then, use the command, summarize, or abbreviated to sum, to obtain the means and SDs
for sbp and your new variable z. If you do this correctly, the mean of z will be 0, and the
SD of z will be 1.
Using,
generate z=(sbp-164.6)/29.11739
sum sbp z
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sbp |        10       164.6    29.11739        124        225
           z |        10    1.02e-09    .9999999  -1.394356   2.074362
Or, using the “extensions to generate command”, egen, along with the function std to get
the standardized scores, or z-scores, where the egen command does exactly the same
computation,
drop z // use this only if have already generated z
egen z = std(sbp)
sum sbp z
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sbp |        10       164.6    29.11739        124        225
           z |        10    1.02e-08           1  -1.394356   2.074362
(Note: the in-line comment “//” only works in the do-file editor—it will give an error
message if used in the Command window.)
We see that the variable z has a mean of 0, except for rounding error, and a SD of 1,
which is a known property of the z-score.
b) verify a t-test on data transformed into z-scores gives the same p value as when the
original scale is used
Using an independent sample t-test, compare the coronary heart disease, chd, group to the
healthy group on the outcome systolic blood pressure, sbp. Then, repeat the t-test for the
z-score transformed systolic blood pressure. Notice that the p values are identical for
both t-tests.
Using,
ttest sbp , by(chd)
ttest z , by(chd)
. ttest sbp , by(chd)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5       146.6    8.623224    19.28212    122.6581    170.5419
       1 |       5       182.6    12.04824    26.94068    149.1487    216.0513
---------+--------------------------------------------------------------------
combined |      10       164.6    9.207726    29.11739    143.7707    185.4293
---------+--------------------------------------------------------------------
    diff |                 -36    14.81621               -70.16624   -1.833765
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                                     degrees of freedom =        8

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0206         Pr(|T| > |t|) = 0.0412          Pr(T > t) = 0.9794
. ttest z , by(chd)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5   -.6181873    .2961538      .66222   -1.440442    .2040674
       1 |       5    .6181874    .4137815    .9252436   -.5306543    1.767029
---------+--------------------------------------------------------------------
combined |      10    1.02e-08    .3162278           1   -.7153569    .7153569
---------+--------------------------------------------------------------------
    diff |           -1.236375     .508844              -2.409771   -.0629783
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                                     degrees of freedom =        8

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0206         Pr(|T| > |t|) = 0.0412          Pr(T > t) = 0.9794
We see that the p values are identical for the two t-tests. This verifies that the power and
sample size determination will not be affected by a z-score transformation, since both can
be thought of as a function of the p value.
c) verify the sample size and power calculations are identical for a z-score transformed
variable and the variable in its original scale
Using the means and SDs from the first t-test, compute the required sample size for
power of 0.80. Do the same for the second t-test. Then, using a sample size of n=7 per
group, compute the power for these same means and SDs. You should get the same result
for both measurement scales.
Using,
sampsi 146.6 182.6 ,sd1(19.28212) sd2(26.94068) power(.80)
sampsi -.6181873 .6181874 , sd1(.66222) sd2(.9252436) power(.80)
*
sampsi 146.6 182.6 ,sd1(19.28212) sd2(26.94068) n1(7) n2(7)
sampsi -.6181873 .6181874 , sd1(.66222) sd2(.9252436) n1(7) n2(7)
. sampsi 146.6 182.6 ,sd1(19.28212) sd2(26.94068) power(.80)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =    146.6
            m2 =    182.6
           sd1 =  19.2821
           sd2 =  26.9407
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =        7
            n2 =        7
. sampsi -.6181873 .6181874 , sd1(.66222) sd2(.9252436) power(.80)

Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =  -.618187
            m2 =   .618187
           sd1 =    .66222
           sd2 =   .925244
         n2/n1 =      1.00

Estimated required sample sizes:

            n1 =        7
            n2 =        7
We see we got the same result for the sample size determination.
. sampsi 146.6 182.6 ,sd1(19.28212) sd2(26.94068) n1(7) n2(7)

Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =    146.6
            m2 =    182.6
           sd1 =  19.2821
           sd2 =  26.9407
sample size n1 =        7
            n2 =        7
         n2/n1 =     1.00

Estimated power:

         power =   0.8199
. sampsi -.6181873 .6181874 , sd1(.66222) sd2(.9252436) n1(7) n2(7)

Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =  -.618187
            m2 =   .618187
           sd1 =    .66222
           sd2 =   .925244
sample size n1 =         7
            n2 =         7
         n2/n1 =      1.00

Estimated power:

         power =   0.8199
We see we got the same result for the power analysis.
Clarification of the z-score approach
This is not quite how we did it in the chapter, however. In the chapter, we used a mean of
0 and SD of 1 for both groups, and then we used some fraction or multiple of the SD=1
for the effect size. In that approach, we assume that both groups have the same SD and
only differ in their means. Suppose that the data we used above come from pilot data, or
previous published data, so we can use these means and SDs in our sample size
determination for a larger study.
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |       5       146.6    8.623224    19.28212    122.6581    170.5419
       1 |       5       182.6    12.04824    26.94068    149.1487    216.0513
---------+--------------------------------------------------------------------
combined |      10       164.6    9.207726    29.11739    143.7707    185.4293
---------+--------------------------------------------------------------------
    diff |                 -36    14.81621               -70.16624   -1.833765
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -2.4298
Ho: diff = 0                                     degrees of freedom =        8

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0206         Pr(|T| > |t|) = 0.0412          Pr(T > t) = 0.9794
We see for our control group, we have n=5, mean=146.6, SD=19.28. For our treatment
group, we have n=5, mean=182.6, and SD=26.94. For an estimate of a common, or same,
SD, what should we use? Conservatively, we could use the larger of the two, or common
SD = 26.94. We do not use the combined SD from the t-test output, which is SD=29.12,
as the t-test does not use that value and it is unnecessarily large. Even the larger SD of
the two groups, SD=26.94, is unnecessarily large. The power of the t-test will depend on
the SD that the t-test will use, which is a weighted average of the two SDs. The formula
for this, for the two-sample t-test with equal variances, called the pooled standard
deviation, is (Rosner, 2006, p.305),

s = sqrt( ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2) )
Using this formula, we calculate the pooled SD using,
display sqrt(((5-1)*19.28212^2+(5-1)*26.94068^2)/(5+5-2))
23.426485
The mean difference from the t-test output was -36, or 36 if you subtract in the opposite
direction. Just using 36, then, is okay, since you get an identical result for the two-sided
comparison, and a two-sided comparison is almost universally what you use. This
difference expressed in standard deviation units is
display 36/23.426485
1.5367222
Consistent with how it was done in Chapter 2-5, we use the fact that z-scores have means
of 0 and SDs of 1. So, we specify one mean as 0, the other mean as the difference in SD
units, which is 1.5367222, and use SD=1 for both groups. We then compute the required
sample size for power = 0.80 using,
sampsi 0 1.5367222 , sd1(1) sd2(1) power(.80)
Estimated sample size for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
         power =   0.8000
            m1 =        0
            m2 =  1.53672
           sd1 =        1
           sd2 =        1
         n2/n1 =     1.00

Estimated required sample sizes:

            n1 =        7
            n2 =        7
We see this is identical to the n=7 in each group calculated using the original scale of the
variable above. We then verify the power calculation, using,
sampsi 0 1.5367222 , sd1(1) sd2(1) n1(7) n2(7)
Estimated power for two-sample comparison of means

Test Ho: m1 = m2, where m1 is the mean in population 1
                    and m2 is the mean in population 2
Assumptions:

         alpha =   0.0500  (two-sided)
            m1 =        0
            m2 =  1.53672
           sd1 =        1
           sd2 =        1
sample size n1 =        7
            n2 =        7
         n2/n1 =     1.00

Estimated power:

         power =   0.8199
We see that the power = 0.8199 is identical to the power computed above using the
original scale of the variable.
Chapter 2-6. More on levels of measurement
Chapter 2-7. Comparison of two paired groups
Chapter 2-8. Multiplicity and the Comparison of 3+ Groups
Problem 1) working with the formulas for the Bonferroni procedure, Holm procedure,
and Hochberg procedure for multiplicity adjustment
These are three popular procedures with formulas that are simple enough that you can do
them by hand. The procedures are described in Ch 2-8, pp. 9-10.
For Bonferroni adjusted p values, you simply multiply each p value by the number of
comparisons made. If this results in a p value greater than 1, an anomaly, you set the
adjusted p value to 1.
For Holm adjusted p values, you first sort the p values from smallest to largest. Then you
multiply the smallest p value by the number of comparisons, the next smallest by the
number of comparisons minus 1, and so on. Do this same thing for Hochberg adjusted p
values. If a Holm adjusted p value becomes larger than the next adjusted p value, an
anomaly since this conflicts with the rank ordering of the unadjusted p values, you carry
the previous adjusted p value forward. For Hochberg, the anomaly adjustment is to carry
the subsequent adjusted p value backward. If the Holm adjusted p value exceeds 1, you
set it to 1.
Fill in the following table, doing the computations and adjustments in your head.
Sorted        Bonferroni   Holm          Holm          Hochberg      Hochberg
Unadjusted    Adjusted     Adjusted      Adjusted      Adjusted      Adjusted
P value       P value      P value       P value       P value       P value
                           (before       (after        (before       (after
                           anomaly       anomaly       anomaly       anomaly
                           correction)   correction)   correction)   correction)
0.020         0.060        0.060         0.060         0.060         0.040
0.025         0.075        0.050         0.060         0.050         0.040
0.040         0.120        0.040         0.060         0.040         0.040
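Before turning to mcpi, the Bonferroni and Holm columns can be checked with display commands, a sketch of the hand arithmetic described above for the sorted p values 0.020, 0.025, 0.040 with k = 3 comparisons:

* Bonferroni: multiply every p value by k = 3 (capped at 1)
display min(3*.020, 1)    // 0.060
display min(3*.025, 1)    // 0.075
display min(3*.040, 1)    // 0.120
* Holm, before anomaly correction: multipliers k, k-1, ..., 1
display 3*.020            // 0.060
display 2*.025            // 0.050  (smaller than 0.060, so carried forward to 0.060)
display 1*.040            // 0.040  (likewise carried forward to 0.060)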
Use the mcpi command, after installing it if necessary as described in Ch 2-8, to check
your answers.
The mcpi command will have the form,
mcpi .020 .025 .040
SORTED ORDER: before anomaly corrected

Unadj  ----------------------- Adjusted -----------------------------
P Val    TCH    Homml  Finnr  Hochb  Ho-Si  Holm   Sidak  Bonfr
0.0200   0.034  0.040  0.059  0.060  0.059  0.060  0.059  0.060
0.0250   0.043  0.040  0.037  0.050  0.049  0.050  0.073  0.075
0.0400   0.068  0.040  0.040  0.040  0.040  0.040  0.115  0.120

SORTED ORDER: anomaly corrected
(1) If Finner or Holm or Bonfer P > 1 (undefined) then set to 1
(2) If Finner or Hol-Sid or Holm P < preceding smaller P
    (illogical) then set to preceding P
(3) Working from largest to smallest, if Hochberg preceding
    smaller P > P then set preceding smaller P to P

Unadj  ----------------------- Adjusted -----------------------------
P Val    TCH    Homml  Finnr  Hochb  Ho-Si  Holm   Sidak  Bonfr
0.0200   0.034  0.040  0.059  0.040  0.059  0.060  0.059  0.060
0.0250   0.043  0.040  0.059  0.040  0.059  0.060  0.073  0.075
0.0400   0.068  0.040  0.059  0.040  0.059  0.060  0.115  0.120

ORIGINAL ORDER: anomaly corrected

Unadj  ----------------------- Adjusted -----------------------------
P Val    TCH    Homml  Finnr  Hochb  Ho-Si  Holm   Sidak  Bonfr
0.0200   0.034  0.040  0.059  0.040  0.059  0.060  0.059  0.060
0.0250   0.043  0.040  0.059  0.040  0.059  0.060  0.073  0.075
0.0400   0.068  0.040  0.059  0.040  0.059  0.060  0.115  0.120
------------------------------------------------------------------
*Adjusted for 3 multiple comparisons
KEY: TCH   = Tukey-Ciminera-Heyse procedure
             (use TCH only with highly correlated comparisons)
     Homml = Hommel procedure
     Finnr = Finner procedure
     Hochb = Hochberg procedure
     Ho-Si = Holm-Sidak procedure
     Holm  = Holm procedure
     Sidak = Sidak procedure
     Bonfr = Bonferroni procedure
This exercise illustrates the conservativeness of the Bonferroni procedure, which lost
significance for all three p values. It also illustrates why Hochberg is more popular, and
more powerful, than the Holm procedure—the Holm procedure lost all of the
significance, as well, while Hochberg kept all three p values significant.
Problem 2) a published example of using the Bonferroni procedure
Kumara et al. (2009) compared five protein assays against a preop baseline. In their
Statistical Methods, they state, “In regards to the protein assays, 5 comparisons (all vs.
preop baseline) were carried out for each parameter, thus, a Bonferroni adjustment was
made….”
These authors had several parameters, with five comparisons made for each parameter.
The adjustment for the five comparisons was made separately for each parameter, which
is the popular way to do it. One might consider if an adjustment needs to be made for all
of the parameters simultaneously, so k parameters × 5 comparisons, which would be a
large number. Alternatively, you could adjust for all p values reported in the paper. A
line has to be drawn somewhere, or all significance would be lost in the paper.
Statisticians have drawn the line at a “family” of comparisons, hence the term “family-wise
error rate,” or FWER. Each parameter represents a family of five related
comparisons, so an adjustment is made for those five comparisons to control the FWER.
This adjustment is done separately for each parameter, which makes sense, since each
parameter is a separate question to study.
In the Kumara (2009, last paragraph), we find,
“The mean preop plasma level for the 105 patients was 164 ± 146 pg/mL.
Significant elevations, as per the Bonferoni criteria, were noted on POD 5 (355 ±
275 pg/mL, P = 0.002) and for the POD 7 to 13 time period (371 ± 428 pg/mL, P
= 0.001) versus the preop results (Fig. 1). Although VEGF levels were elevated
for the POD 14 to 20 (289 ± 297 pg/mL; vs. preop, P = 0.036) and POD 21 to 27
(244 ± 297 pg/mL; vs. preop, P = 0.048), as per the Bonferoni correction, these
differences were not significant. By the second month after surgery the mean
VEGF level was near baseline….”
This is an example of where two significant p values were lost after adjusting for multiple
comparisons. The authors reported the results this way, showing the unadjusted p values
while mentioning that the adjustment declared them nonsignificant, apparently because they
thought the effects were real and they wanted to lead the reader in that direction. Even in
their Figure 1 they denote the results as significant. This is a good illustration of
investigators being frustrated by “I had significance but lost it due to the stupid multiple
comparison adjustment.” It could easily be argued that the authors took the right
approach, since not informing the reader might produce a Type II error (false negative
conclusion). There is no universal consensus on this point.
The exercise is to see if applying some other multiple comparison adjustment would have
saved the significance, since we know the Bonferroni procedure is too conservative.
For the fifth p value, the last quoted sentence, it was clearly greater than 0.05, so just use
0.50. It makes no difference what value >0.05 we choose, since multiple comparison
adjustments cannot create significance that was not there before adjustment. Using 0.50,
then, along with the four other p values in the quoted paragraph, use the mcpi command
to see if significance would have been saved by one of the other procedures.
mcpi .002 .001 .036 .048 0.500
ORIGINAL ORDER: anomaly corrected

Unadj  ----------------------- Adjusted -----------------------------
P Val    TCH    Homml  Finnr  Hochb  Ho-Si  Holm   Sidak  Bonfr
0.0020   0.004  0.008  0.005  0.008  0.008  0.008  0.010  0.010
0.0010   0.002  0.005  0.005  0.005  0.005  0.005  0.005  0.005
0.0360   0.079  0.072  0.059  0.096  0.104  0.108  0.167  0.180
0.0480   0.104  0.096  0.060  0.096  0.104  0.108  0.218  0.240
0.5000   0.788  0.500  0.500  0.500  0.500  0.500  0.969  1.000
------------------------------------------------------------------
*Adjusted for 5 multiple comparisons
We discover that none of the FWER procedures saved the day in the strict sense of
adjusted p values < 0.05. However, we observe that Finner’s procedure came close, with
0.059 and 0.060, in place of 0.180 and 0.240 that Bonferroni gave. This would have
allowed the reader to argue a “marginally significant” result, which is usually reported as
a “trend toward significance.”
This example also illustrates how different multiple comparison procedures can be the
winner from situation to situation. Which procedure will be the winner depends on the
pattern of the p values. Even statisticians are at a loss to predict a priori which
procedure will be the winner, making a “pre-specified” analysis a nerve-racking practice
to follow.
Problem 3) Analyzing Data with Multiple Treatments
An animal experiment was performed to test the effectiveness of a new drug. The
researcher was convinced the drug was effective, but he was not sure which carrier
solution would enhance drug delivery. (The drug is dissolved into the carrier solution so
it can be delivered intravenously.) Three candidate carriers were considered, carriers A, B,
and C. The experiment then involved four groups:
treat: 1 = inert carrier only (control group)
2 = active drug in carrier A
3 = active drug in carrier B
4 = active drug in carrier C
The researcher wanted to conclude the drug was effective if any of the active drug groups
were significantly greater on the response variable than the control group.
That is, the decision rule was:
Conclude effectiveness if (treatment 1 > control) or ( treatment 2 > control) or
( treatment 3 > control).
The decision rule, or “win strategy,” fits the multiple comparison situation where you
want to control the family-wise error rate (FWER), which is the typical situation that
researchers learn about in statistics courses. That is, you want to make multiple
comparisons to arrive at a single conclusion, while keeping the overall Type I error at
alpha ≤ 0.05.
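As a quick orientation before we build the dataset, the simplest FWER adjustment,
Bonferroni, just multiplies each p value by the number of comparisons (capping the result
at 1). A minimal sketch in Stata, using the smallest of the three p values obtained in the
solution further below:

display min(3*0.0024, 1)   // Bonferroni-adjusted p value for 3 comparisons

This reproduces the 0.007 that appears in the Bonfr column of the mcpi output later in
this problem; the other procedures reported by mcpi are less conservative refinements of
the same idea.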
To create the dataset, copy the following into the Stata do-file editor and execute it.
clear
set seed 999
set obs 24
gen treat = 1 in 1/6
replace treat = 2 in 7/12
replace treat = 3 in 13/18
replace treat = 4 in 19/24
gen response = invnorm(uniform())*4+1*treat  // response ~ Normal(mean = treat, SD = 4)
bysort treat: sum response
-> treat = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    response |         6    .2875054    3.057285  -5.066414   2.869501

-> treat = 2

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    response |         6    1.772157     2.48272  -2.487267   4.081223

-> treat = 3

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    response |         6    4.119424    7.105336  -3.477194   15.84198

-> treat = 4

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
    response |         6    6.501425    2.198203   2.682323   9.369428
For the sake of illustration, let’s now perform a one-way analysis of variance, which many
statistics instructors still erroneously teach is a necessary first step.
oneway response treat, tabulate
                        Analysis of Variance
    Source              SS         df      MS            F     Prob > F
------------------------------------------------------------------------
Between groups      133.575257      3   44.5250857      2.51     0.0876
 Within groups      354.143942     20   17.7071971
------------------------------------------------------------------------
    Total           487.719199     23   21.2051826
The oneway ANOVA is not significant (p = 0.088), so the statistics instructors who
advocate it as a necessary first step would say you cannot go any further. You would
then conclude that the drug was not effective.
As was pointed out in Chapter 2-8, in the section called “Common Misconception of
Thinking Analysis of Variance (ANOVA) Must Precede Pairwise Comparisons,” there is
no reason to do this. Many statisticians are aware that the ANOVA test is ultraconservative
as a gatekeeper, and so they stay away from it when testing treatment effects.
A more correct and more powerful approach is to bypass the ANOVA test, going straight
to the three specific individual comparisons of interest: each of the three active drug
groups (2, 3, and 4) versus the control group (1). A multiple
comparison procedure is then applied to these three comparisons to control the family-wise
error rate.
The homework problem is to perform an independent sample t-test between groups 1 and
2, 1 and 3, and 1 and 4, and then adjust the three obtained two-sided p values using the
mcpi command. You will need to include an “if” statement in the ttest command, as was
shown in Chapter 2-8 (see pp. 44-45). Apply Hommel’s procedure, which is one of the
methods used by the mcpi command. From the obtained results, should you conclude the
drug is effective?
ttest response if treat==1 | treat==2, by(treat)
ttest response if treat==1 | treat==3, by(treat)
ttest response if treat==1 | treat==4, by(treat)
mcpi .3775 .2528 .0024
. ttest response if treat==1 | treat==2, by(treat)

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1888         Pr(|T| > |t|) = 0.3775          Pr(T > t) = 0.8112

. ttest response if treat==1 | treat==3, by(treat)

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1264         Pr(|T| > |t|) = 0.2528          Pr(T > t) = 0.8736

. ttest response if treat==1 | treat==4, by(treat)

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0012         Pr(|T| > |t|) = 0.0024          Pr(T > t) = 0.9988
. mcpi .3775 .2528 .0024

ORIGINAL ORDER: anomaly corrected
 Unadj   ----------------------- Adjusted ------------------------
 P Val     TCH   Homml   Finnr   Hochb   Ho-Si    Holm   Sidak   Bonfr
0.3775   0.560   0.377   0.377   0.377   0.442   0.506   0.759   1.000
0.2528   0.396   0.377   0.354   0.377   0.442   0.506   0.583   0.758
0.0024   0.004   0.007   0.007   0.007   0.007   0.007   0.007   0.007
-----------------------------------------------------------------
*Adjusted for 3 multiple comparisons
The p value for the hypothesis test of effectiveness is the smallest adjusted p value from
the three comparisons. In this dataset, you would conclude: The drug was demonstrated
to be effective (p = 0.007).
Chapter 2-9. Correlation
Chapter 2-10. Linear regression
Problem 1) Regression equation
The regression output, among other things, shows the equation of the regression line.
From simple algebra, we know the equation of a straight line is:
y  a  bx
where y is the outcome, or dependent variable, x is the predictor, or independent variable,
a is the y-intercept, and b is the slope of the line. Fitting this equation is called “simple
linear regression.” Extending this equation to three predictor variables, the regression
equation is:
y  a  b1 x1  b2 x2  b3 x3
Fitting this equation is called “multivariable linear regression” to signify that more than
one predictor variable is included in the equation.
Using the FEV dataset, fev.dta, predicting FEV by height,
regress fev height
------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .1319756    .002955    44.66   0.000     .1261732     .137778
       _cons |  -5.432679   .1814599   -29.94   0.000    -5.788995   -5.076363
------------------------------------------------------------------------------
We can find the intercept and slope from the coefficient column of the regression table
and write the linear equation as,
fev = -5.432679 + 0.1319756(height)
Listing these variables for the first subject,
list fev height in 1
     +----------------+
     |   fev   height |
     |----------------|
  1. | 1.708       57 |
     +----------------+
To apply the prediction equation to predict FEV for this first subject, we can use the
display command, where “*” denotes multiplication,
display -5.432679 + 0.1319756*57
2.0899302
We can get Stata to provide the predicted values from applying the regression equation,
using the predict command. In the following example, we use “pred_fev” as the variable
we choose to store the predicted values in. The predict command applies the equation
from the last fitted model.
predict pred_fev
list fev height pred_fev in 1
     +---------------------------+
     |   fev   height   pred_fev |
     |---------------------------|
  1. | 1.708       57   2.089929 |
     +---------------------------+
The predicted value that Stata came up with is slightly different from what we got with
the display command because it is using more decimal places of accuracy.
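If you would rather avoid retyping rounded coefficients altogether, here is a small sketch
using Stata’s stored estimation results, which are available after any estimation command
(_b[] and _se[] are standard Stata syntax):

display _b[_cons] + _b[height]*57   // same arithmetic, full-precision coefficients

This should agree with the predict result, 2.089929, up to storage precision.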
Now that we see how it works, the homework problem is to fit a multivariable model for
FEV with height and age as the predictor variables. Use the display command to predict
FEV for the first subject. Then, use the predict command to check your answer.
In other words, take all of the Stata commands we used above,
regress fev height
list fev height in 1
display -5.432679 + 0.1319756*57
capture drop pred_fev
predict pred_fev
list fev height pred_fev in 1
and modify them for the two predictors, height and age, used together.
The homework solution is the following commands:
regress fev height age
list fev height age in 1
display -4.610466 + 0.1097118*57 + 0.0542807*9
capture drop pred_fev
predict pred_fev
list fev height age pred_fev in 1
. regress fev height age

      Source |       SS       df       MS              Number of obs =     654
-------------+------------------------------           F(  2,   651) = 1067.96
       Model |  376.244941     2  188.122471           Prob > F      =  0.0000
    Residual |  114.674892   651  .176151908           R-squared     =  0.7664
-------------+------------------------------           Adj R-squared =  0.7657
       Total |  490.919833   653  .751791475           Root MSE      =   .4197

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .1097118   .0047162    23.26   0.000      .100451    .1189726
         age |   .0542807   .0091061     5.96   0.000     .0363998    .0721616
       _cons |  -4.610466   .2242706   -20.56   0.000    -5.050847   -4.170085
------------------------------------------------------------------------------

. list fev height age in 1

     +----------------------+
     |   fev   height   age |
     |----------------------|
  1. | 1.708       57     9 |
     +----------------------+
. display -4.610466 + 0.1097118*57 + 0.0542807*9
2.1316329
. capture drop pred_fev
. predict pred_fev
(option xb assumed; fitted values)
. list fev height age pred_fev in 1

     +---------------------------------+
     |   fev   height   age   pred_fev |
     |---------------------------------|
  1. | 1.708       57     9   2.131634 |
     +---------------------------------+
Problem 2) Reporting the results
Using the regression model output from Problem 1, which is
      Source |       SS       df       MS              Number of obs =     654
-------------+------------------------------           F(  2,   651) = 1067.96
       Model |  376.244941     2  188.122471           Prob > F      =  0.0000
    Residual |  114.674892   651  .176151908           R-squared     =  0.7664
-------------+------------------------------           Adj R-squared =  0.7657
       Total |  490.919833   653  .751791475           Root MSE      =   .4197

------------------------------------------------------------------------------
         fev |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      height |   .1097118   .0047162    23.26   0.000      .100451    .1189726
         age |   .0542807   .0091061     5.96   0.000     .0363998    .0721616
       _cons |  -4.610466   .2242706   -20.56   0.000    -5.050847   -4.170085
------------------------------------------------------------------------------
write a sentence that could be used to interpret the association of height with FEV.
Solution: There are many ways to write this, so there are many correct answers to this
problem. Here are two possibilities:
a) brief statement: Height was associated with FEV (p<0.001), controlling for age.
b) detailed statement: Height was significantly associated with FEV, after controlling for
age [adjusted mean increase in FEV per one inch of height, 0.11; 95% CI, 0.10 to 0.12;
p<0.001].
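As a convenience when writing such statements, the rounded coefficient and confidence
limits can be pulled from Stata’s stored results rather than read off the table by eye. A
minimal sketch, assuming the regress command has just been run (invttail(), e(df_r),
_b[], and _se[] are standard Stata):

display %5.2f _b[height]                                        // 0.11
display %5.2f _b[height] - invttail(e(df_r),.025)*_se[height]   // 0.10
display %5.2f _b[height] + invttail(e(df_r),.025)*_se[height]   // 0.12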
Problem 3) Are the signs (positive or negative) of correlations transitive?
When we speak of correlation, referring to the regression coefficient or referring to the
correlation coefficient is analogous, since the correlation coefficient is simply the
regression coefficient computed after transforming the data to standardized scores, or
z-scores. Both approaches test for a correlation (an association). When the regression
coefficient or the correlation coefficient is positive, we say the variables are “positively
correlated”: as you increase on one variable, you also increase on the other. When they
are negative, we say the variables are “negatively correlated”: as you increase on one
variable, you decrease on the other. It does not matter which variable is the dependent or
independent variable when you test for a correlation or when you interpret whether it is
negatively or positively correlated.
In mathematics, the transitive property of equality is:
If A=B and B=C, then A=C.
Similarly, the transitive property of inequality is:
If A≥B and B≥C, then A≥C.
Suppose you analyze your data, computing correlation coefficients between variables A,
B, and C, and you get the following result:
A vs B: Pearson r = +0.50
B vs C: Pearson r = +0.50
A vs C: Pearson r = -0.40
Your co-investigator looks at these results and says to you, “This cannot be right. If B
increases with increasing A, and C increases with increasing B, then it is impossible for C
to decrease with increasing A. Think about it. If I put a beaker of water on a flame, the
temperature of the water increases when the temperature of the beaker increases (A
increase → B increase). If I put a stone in the water, the temperature of the stone goes up
when the temperature of the water goes up (B increase → C increase). Now you are
trying to tell me that when the temperature of the beaker increases, the temperature of the
stone decreases? (A increase → C decrease).” What your co-investigator just did was
assume that the sign of the correlation coefficient exhibits the transitive property.
Is the co-investigator correct that you must have made a mistake?
(hint: You will not find this in the course manual chapters on regression or correlation. It
is just an exercise in reasoning. It is included here as a homework problem because this
type of “reasoning” is quite common when researchers look for associations.)
Solution. The answer is that the co-investigator is making a logical mistake. The
transitive property that we see in mathematics for equality and inequality does not hold in
other settings, in general. The sign of the correlation coefficient does not have the
transitive property. (There is a caveat: the three correlations are not entirely free of one
another, because the 3 × 3 correlation matrix must remain mathematically valid. If A vs B
and B vs C are both very strongly positive, then A vs C is forced to be positive as well;
with moderate correlations like those above, however, a negative A vs C is entirely
possible.)
Here is a simple real-life example of the transitive property not holding: Bill loves Jane.
Jane loves Bob. Does this imply Bill loves Bob? No, actually Bill hates Bob because he
is jealous of Jane’s love for Bob.
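If you want to see such a pattern arise in actual data, here is a minimal simulation sketch
(not part of the original problem). Stata’s drawnorm command draws from a multivariate
normal distribution with a user-specified correlation matrix, and pwcorr displays the
pairwise sample correlations:

clear
set seed 999
* a valid (positive definite) correlation matrix with a non-transitive sign pattern:
* corr(A,B) = +0.5, corr(B,C) = +0.5, corr(A,C) = -0.4
matrix C = (1, .5, -.4 \ .5, 1, .5 \ -.4, .5, 1)
drawnorm A B C, n(500) corr(C)
pwcorr A B C

The sample correlations will be close to +0.5, +0.5, and -0.4, confirming that two
positive correlations do not force the third to be positive.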
Chapter 2-11. Logistic regression and dummy variables
Chapter 2-12. Survival analysis: Kaplan-Meier graphs, Log-rank Test, and
Cox regression
Chapter 2-13. Confidence intervals versus p values and trends toward
significance
Chapter 2-14. Pearson correlation coefficient with clustered data
Chapter 2-15. Equivalence and noninferiority tests
Problem 1) Difference in two proportions noninferiority test
Reboli et al. (N Engl J Med, 2007) conducted a randomized, double-blind, noninferiority
trial to test their hypothesis that anidulafungin, a new echinocandin, is noninferior to
fluconazole for the treatment of invasive candidiasis. In their article, they stated that their
statistical method was,
“The primary analysis in this noninferiority trial was a two-step comparison of the
rate of global success between the two study groups at the end of intravenous
therapy. A two-sided 95% confidence interval was calculated for the true
difference in efficacy (the success rate with anidulafungin minus that with
fluconazole). In the first step, noninferiority was considered to be shown if the
lower limit of the two-sided 95% confidence interval was greater than -20
percentage points. In the second step, if the lower limit was greater than 0, then
anidulafungin was considered to be superior in the strict sense to fluconazole.”
To set up the data reported by Reboli et al., cut-and-paste the following into the Stata
do-file editor, highlight it, and double click on the last icon on the do-file menu bar to
execute it.
clear all
set obs 245
gen anidulafungin=1 in 1/127
replace anidulafungin=0 in 128/245
recode anidulafungin 0=1 1=0 ,gen(fluconazole)
label variable fluconazole // turn off variable label
gen globalresponse=1 in 1/96
replace globalresponse=0 in 97/127
replace globalresponse=1 in 128/198
replace globalresponse=0 in 199/245
label define anidulafunginlab 1 "1. anidulafungin" ///
0 "0. fluconazole"
label values anidulafungin anidulafunginlab
label define fluconazolelab 1 "1. fluconazole" ///
0 "0. anidulafungin"
label values fluconazole fluconazolelab
label define globalresponselab 1 "1. Success" 0 "0. Failure"
label values globalresponse globalresponselab
tab globalresponse anidulafungin, col
tab globalresponse fluconazole, col
Chapter 2-99 (revision 8 Jan 2012)
p. 41
Part a) Using the noninferiority margin of -20%, where anidulafungin could have an
absolute percent success up to 20 percentage points (20%) less than fluconazole, test the
noninferiority hypothesis using the appropriate confidence interval. Hint: Use the prtest
command. Depending on which direction you want to compute the difference, use either
the anidulafungin group variable or the fluconazole group variable.
Part b) If justified following the noninferiority analysis, test for superiority of
anidulafungin to fluconazole.
Solution. In Stata, an easy way to get a confidence interval for the difference in two
proportions is the prtest command. Normally in a data analysis, you use an indicator
variable for the new therapy, which is anidulafungin, making the standard therapy,
fluconazole, the referent group. So, most likely, first you would try,
prtest globalresponse , by(anidulafungin)
Two-sample test of proportions             0. fluconazo: Number of obs =      118
                                           1. anidulafu: Number of obs =      127
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0. fluconazo |   .6016949   .0450666                      .5133659    .6900239
1. anidulafu |   .7559055   .0381163                      .6811989    .8306121
-------------+----------------------------------------------------------------
        diff |  -.1542106   .0590242                     -.2698959   -.0385253
             |  under Ho:   .0595634    -2.59   0.010
------------------------------------------------------------------------------
        diff = prop(0. fluconazo) - prop(1. anidulafu)            z =  -2.5890
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0048         Pr(|Z| < |z|) = 0.0096          Pr(Z > z) = 0.9952
If we report this difference, -0.15, or -15%, it would be confusing to the reader, because
we want to make a statement about anidulafungin relative to fluconazole, and this
difference makes anidulafungin look like it had 15% less success than fluconazole.
Actually, anidulafungin has 15% more success. The problem is that the prtest command
lists the groups, or creates table rows, in numerical order and then subtracts the second
row from the first. We want the subtraction in the opposite order. (The ttest command in
Stata does the same thing for comparing means.)
We can still use this output by 1) changing the sign on the difference to make it 0.154,
rather than -0.154, and then 2) changing the signs of the two confidence limits and reversing
their order, so (-0.270, -0.039) becomes (0.039, 0.270). Finally, converting to
percentages makes this result 15.4% and (3.9%, 27.0%).
Alternatively, we can use the indicator variable that is coded in the opposite order so that
the subtraction is in the desired direction,
prtest globalresponse , by(fluconazole)
Two-sample test of proportions             0. anidulafu: Number of obs =      127
                                           1. fluconazo: Number of obs =      118
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
0. anidulafu |   .7559055   .0381163                      .6811989    .8306121
1. fluconazo |   .6016949   .0450666                      .5133659    .6900239
-------------+----------------------------------------------------------------
        diff |   .1542106   .0590242                      .0385253    .2698959
             |  under Ho:   .0595634     2.59   0.010
------------------------------------------------------------------------------
        diff = prop(0. anidulafu) - prop(1. fluconazo)            z =   2.5890
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.9952         Pr(|Z| < |z|) = 0.0096          Pr(Z > z) = 0.0048
Converting to percents, the result is 15.4% and (3.9% , 27.0%).
We see that noninferiority of anidulafungin relative to fluconazole was demonstrated,
since the lower bound of the two-sided 95% confidence interval does not cross -20%. We
are now justified to test for superiority, either by comparing the lower bound of this same
confidence interval to zero, or by an ordinary significance test of the difference. Since
we are testing for superiority only after first establishing noninferiority, no adjustment for
multiplicity (multiple comparisons) is required. Superiority of anidulafungin to
fluconazole is demonstrated since the lower bound, 3.9%, is greater than 0%. Likewise,
we can use the two-sided p value, which is statistically significant (p=0.0096, which we
report as p=0.01).
In their article, Reboli et al. did just this, reporting the result in their Results section:
“For the primary end point of global response at the end of intravenous therapy, a
successful outcome was achieved in 96 of 127 patients in the anidulafungin group
(75.6%), as compared with 71 of 118 patients in the fluconazole group (60.2%)
(difference, 15.4 percentage points; 95% confidence interval [CI], 3.9 to 27.0); therefore,
anidulafungin met the prespecified criteria for noninferiority to fluconazole. Since the
confidence interval for the difference excluded 0, there was a significantly greater
response rate in the anidulafungin group (P=0.01).”
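As an aside, the same confidence interval can be reproduced directly from the published
counts, without building the dataset, using Stata’s immediate command prtesti (a sketch;
the count option tells prtesti that the second and fourth arguments are success counts
rather than proportions):

* 96 successes out of 127 (anidulafungin) vs. 71 out of 118 (fluconazole)
prtesti 127 96 118 71, count

This yields the same difference of 0.154 with 95% CI (0.039, 0.270) as the prtest
command above.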
Chapter 2-16. Validity and reliability
Chapter 2-17. Methods comparison studies
References
Cuellar NG, Ratcliffe SJ. (2009). Does valerian improve sleepiness and symptom severity in
people with restless legs syndrome? Alternative Therapies 15(2):22-28.
Daniel WW. (1995). Biostatistics: A Foundation for Analysis in the Health Sciences. 6th ed.
New York, John Wiley & Sons.
Hildebrand DK. (1986). Statistical Thinking for Behavioral Scientists. Boston, Duxbury Press.
Kumara HMCS, Feingold D, Kalady M, et al. (2009). Colorectal resection is associated with
persistent proangiogenic plasma protein changes: postoperative plasma stimulates in
vitro endothelial cell growth, migration, and invasion. Ann Surg 249(6):973-977.
Miller-Davis C, Marden S, Leidy NK. (2006). The New York Heart Association Classes and
functional status: what are we really measuring? Heart Lung 35(4):217-24.
Munro BH. (2001). Statistical Methods for Health Care Research. 4th ed. Philadelphia,
Lippincott.
Onyike CU, Crum RM, Lee HB, Lyketsos CG, Eaton WW. (2003). Is obesity associated with
major depression? Results from the third national health and nutrition examination
survey. Am J Epidemiol 158(12):1139-1153.
Reboli AC, Rotstein C, Pappas PG, et al. (2007). Anidulafungin versus fluconazole for invasive
candidiasis. N Engl J Med 356(24):2472-2482.
Rosner B. (2006). Fundamentals of Biostatistics, 6th ed. Belmont CA, Duxbury Press.
Sulkowski MS, Thomas DL, Chaisson RE, Moore RD. (2000). Hepatotoxicity associated with
antiretroviral therapy in adults infected with human immunodeficiency virus and the role
of hepatitis C or B virus infection. JAMA 283(1):74-80. {cited in: Ch 2-1, 3-5, 3-9}