Chapter 5-16. Correlated Data: Analysis of Covariance (ANCOVA) Versus
Change Scores
In this chapter we will compare the analysis of covariance (ANCOVA) approach to the change
score approach for analyzing repeated measures data. By ANCOVA, we are referring to the specific
analysis situation where the baseline outcome variable is controlled for while testing for
effects on one or more later outcome variable measurements.
Isoproterenol Dataset
We will again use the 11.2.Isoproterenol.dta dataset provided with the Dupont (2002, p.338)
textbook, described as,
“Lang et al. (1995) studied the effect of isoproterenol, a β-adrenergic agonist, on forearm
blood flow in a group of 22 normotensive men. Nine of the study subjects were black and
13 were white. Each subject’s blood flow was measured at baseline and then at
escalating doses of isoproterenol.”
Reading the data in,
File
Open
Find the directory where you copied the course CD
Change to the subdirectory datasets & do-files
Single click on 11.2.Isoproterenol.dta
Open
use "C:\Documents and Settings\u0032770.SRVR\Desktop\
Biostats & Epi With Stata\datasets & do-files\
11.2.Isoproterenol.dta", clear
*
which must be all on one line, or use:
cd "C:\Documents and Settings\u0032770.SRVR\Desktop\"
cd "Biostats & Epi With Stata\datasets & do-files"
use 11.2.Isoproterenol.dta, clear
_________________
Source: Stoddard GJ. Biostatistics and Epidemiology Using Stata: A Course Manual [unpublished manuscript] University of Utah
School of Medicine, 2010.
Chapter 5-16 (revision 16 May 2010)
Listing it,
list , nolabel
     +---------------------------------------------------------------------+
     | id   race   fbf0   fbf10   fbf20   fbf60   fbf150   fbf300   fbf400 |
     |---------------------------------------------------------------------|
  1. |  1      1      1     1.4     6.4    19.1       25     24.6       28 |
  2. |  2      1    2.1     2.8     8.3    15.7     21.9     21.7     30.1 |
  3. |  3      1    1.1     2.2     5.7     8.2      9.3     12.5     21.6 |
  4. |  4      1   2.44     2.9     4.6    13.2     17.3     17.6     19.4 |
  5. |  5      1    2.9     3.5     5.7    11.5     14.9     19.7     19.3 |
     |---------------------------------------------------------------------|
  6. |  6      1    4.1     3.7     5.8    19.8     17.7     20.8     30.3 |
  7. |  7      1   1.24     1.2     3.3     5.3      5.4     10.1     10.6 |
  8. |  8      1    3.1       .       .   15.45        .        .     31.3 |
  9. |  9      1    5.8     8.8    13.2    33.3     38.5     39.8     43.3 |
 10. | 10      1    3.9     6.6     9.5    20.2     21.5     30.1     29.6 |
     |---------------------------------------------------------------------|
 11. | 11      1   1.91     1.7     6.3     9.9     12.6     12.7     15.4 |
 12. | 12      1      2     2.3       4     8.4      8.3     12.8     16.7 |
 13. | 13      1    3.7     3.9     4.7    10.5     14.6       20     21.7 |
 14. | 14      2   2.46     2.7    2.54    3.95     4.16      5.1     4.16 |
 15. | 15      2      2     1.8    4.22    5.76     7.08    10.92     7.08 |
     |---------------------------------------------------------------------|
 16. | 16      2   2.26       3    2.99    4.07     3.74     4.58     3.74 |
 17. | 17      2    1.8     2.9    3.41    4.84     7.05     7.48     7.05 |
 18. | 18      2   3.13       4    5.33    7.31     8.81    11.09     8.81 |
 19. | 19      2   1.36     2.7    3.05       4      4.1     6.95      4.1 |
 20. | 20      2   2.82     2.6    2.63   10.03      9.6    12.65      9.6 |
     |---------------------------------------------------------------------|
 21. | 21      2    1.7     1.6    1.73    2.96     4.17     6.04     4.17 |
 22. | 22      2    2.1     1.9       3     4.8      7.4     16.7     21.2 |
     +---------------------------------------------------------------------+
We see that the data are in wide format, with variables
id      patient ID (1 to 22)
race    race (1=white, 2=black)
fbf0    forearm blood flow (ml/min/dl) at isoproterenol dose 0 mg/min
fbf10   forearm blood flow (ml/min/dl) at isoproterenol dose 10 mg/min
…
fbf400  forearm blood flow (ml/min/dl) at isoproterenol dose 400 mg/min
In this dataset, each successive occasion represents an increasing dose, so the repeated
measurements can be thought of as an effect across dose, rather than as an effect across time.
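For readers who want to see the dose structure explicitly, the wide layout can be converted to long format and back. This is only a sketch, assuming the dataset is in the current working directory; the new variable name dose is chosen here for illustration and is not part of the original dataset.

```stata
* Assumes 11.2.Isoproterenol.dta is in the current working directory
use 11.2.Isoproterenol.dta, clear

* fbf is the common stub; the numeric suffix (0, 10, ..., 400)
* becomes the values of the new variable dose (a name chosen here)
reshape long fbf, i(id) j(dose)

* one row per subject per dose; show the first subject's seven rows
list id race dose fbf in 1/7, sepby(id)

* return to the wide layout used in the rest of the chapter
reshape wide fbf, i(id) j(dose)
```

The rest of this chapter works with the wide layout, so the final reshape wide restores the variables fbf0 through fbf400.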
Paired Sample t Test (Change Analysis With Two Repeated Measurements)
The paired sample t test is a very popular approach for analyzing two correlated measurements.
Let’s compare the no dose forearm blood flow, fbf0, to the initial dose (10 mg/min), ignoring
race for now.
Statistics
Summaries, tables & tests
Classical tests of hypotheses
Mean comparison test, paired data
First variable: fbf10
Second variable: fbf0
OK
ttest fbf10 == fbf0
<or>
ttest fbf10 = fbf0
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   fbf10 |      21    3.057143    .3864103    1.770755    2.251105    3.863181
    fbf0 |      21    2.467619    .2532897    1.160719    1.939266    2.995972
---------+--------------------------------------------------------------------
    diff |      21    .5895238    .1967903    .9018064    .1790265    1.000021
------------------------------------------------------------------------------
Ho: mean(fbf10 - fbf0) = mean(diff) = 0

 Ha: mean(diff) < 0          Ha: mean(diff) != 0          Ha: mean(diff) > 0
     t =   2.9957                  t =   2.9957               t =   2.9957
 P < t =   0.9964            P > |t| =   0.0071           P > t =   0.0036
The paired t test is identically the one sample t test on the absolute change scores (fbf10 – fbf0).
To verify this,
capture drop diff10
gen diff10 = fbf10-fbf0
list fbf0 fbf10 diff10 in 1/5
     +-------------------------+
     | fbf0   fbf10     diff10 |
     |-------------------------|
  1. |    1     1.4         .4 |
  2. |  2.1     2.8         .7 |
  3. |  1.1     2.2        1.1 |
  4. | 2.44     2.9        .46 |
  5. |  2.9     3.5   .5999999 |
     +-------------------------+
Statistics
Summaries, tables & tests
Classical tests of hypotheses
One sample mean comparison test
Variable name: diff10
Hypothesized mean: 0
OK
ttest diff10 = 0
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  diff10 |      21    .5895238    .1967903    .9018064    .1790265    1.000021
------------------------------------------------------------------------------
Degrees of freedom: 20
Ho: mean(diff10) = 0

    Ha: mean < 0                Ha: mean != 0                 Ha: mean > 0
     t =   2.9957                  t =   2.9957               t =   2.9957
 P < t =   0.9964            P > |t| =   0.0071           P > t =   0.0036
Compared to paired t test from above,
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   fbf10 |      21    3.057143    .3864103    1.770755    2.251105    3.863181
    fbf0 |      21    2.467619    .2532897    1.160719    1.939266    2.995972
---------+--------------------------------------------------------------------
    diff |      21    .5895238    .1967903    .9018064    .1790265    1.000021
------------------------------------------------------------------------------
Ho: mean(fbf10 - fbf0) = mean(diff) = 0

 Ha: mean(diff) < 0          Ha: mean(diff) != 0          Ha: mean(diff) > 0
     t =   2.9957                  t =   2.9957               t =   2.9957
 P < t =   0.9964            P > |t| =   0.0071           P > t =   0.0036
And we can easily see that the two forms of the t test are identically the same, which is a test that
the mean difference, or change, is equal to 0.
The paired test takes the correlation structure of the data into account by being a test on
difference scores. The standard error of the difference, by definition, includes the correlation
coefficient of the two variables. The variance used in this formula is (van Belle, 2002, p.61):
$$\mathrm{Var}(\text{difference}) = \mathrm{Var}(Y_{i1} - Y_{i2}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$$
where ρ is the correlation between the two variables, σ1 and σ2 are the standard
deviations, and σ1² and σ2² are the variances.
Let’s verify this. First obtain the correlation coefficient between the baseline (dose 0) and
dose 10 measurements:
corr fbf0 fbf10
(obs=21)

             |     fbf0    fbf10
-------------+------------------
        fbf0 |   1.0000
       fbf10 |   0.8927   1.0000
Using the standard deviations displayed in the paired t test output,
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   fbf10 |      21    3.057143    .3864103    1.770755    2.251105    3.863181
    fbf0 |      21    2.467619    .2532897    1.160719    1.939266    2.995972
---------+--------------------------------------------------------------------
    diff |      21    .5895238    .1967903    .9018064    .1790265    1.000021
------------------------------------------------------------------------------
and converting them to variances for use in the formula for the change variance:
display 1.770755^2+1.160719^2-2*0.8927*1.770755*1.160719
.81322181
Take the square root of the variance to get the standard deviation:
display sqrt(.81322181)
.90178812
which is identically the standard deviation of the difference shown in either t test output, accurate
to 4 decimal places. (Accuracy was limited to 4 decimal places since that is all we used for the
correlation coefficient in the calculation.)
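As a cross-check that avoids rounding the correlation coefficient by hand, the same calculation can be done with Stata's stored results. This is a minimal sketch, assuming the dataset is in the current working directory; the scalar names rho, s0, and s10 are chosen here for illustration.

```stata
* Assumes 11.2.Isoproterenol.dta is in the current working directory
use 11.2.Isoproterenol.dta, clear

* keep only complete pairs so every statistic uses the same 21 subjects
keep if !missing(fbf0, fbf10)

* Pearson correlation at full precision (stored in r(rho))
quietly correlate fbf0 fbf10
scalar rho = r(rho)

* standard deviations at full precision (stored in r(sd))
quietly summarize fbf0
scalar s0 = r(sd)
quietly summarize fbf10
scalar s10 = r(sd)

* SD of the difference from the variance formula
display sqrt(s10^2 + s0^2 - 2*rho*s10*s0)
```

Because nothing is rounded along the way, the displayed value should agree with the Std. Dev. reported on the diff row of the paired t test to more than 4 decimal places.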
Designing a Repeated Measures Experiment
There are two popularly used approaches to designing a pre-test vs post-test experiment.
Cross-Over Design
The first approach is the cross-over design, where the same subjects receive both treatments, one
half randomized to receive treatment A first and the other half to receive treatment B first, with a
wash-out period in between.

        treatment A (active)                         treatment B (placebo)
   baseline   post-intervention      wash-out     baseline   post-intervention
     X0A            X1A                             X0B            X1B

A popular way to analyze these data is:
First, compute two change scores (differences):
diffA = X1A - X0A
diffB = X1B - X0B
Second, compare diffA to diffB using a paired t test.
Parallel Groups Design
The second approach is the parallel groups design, where different subjects are randomized to
receive either treatment A or B.

                          pre (or baseline)    intervention    post-intervention
treatment A (active)            X0A                                   X1A
treatment B (placebo)           X0B                                   X1B

A popular way to analyze these data is:
First, compute two change scores (differences):
diffA = X1A - X0A
diffB = X1B - X0B
Second, compare diffA to diffB using an independent groups t test.
Which study design leads to a more powerful statistical comparison? That depends on which
version of the t test, paired or independent groups, is more powerful.
When Is the Paired t Test More Powerful than Independent Groups t Test?
The formula for the independent groups t test is (Rosner, 1995, p.259)
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$
which, when $n_1 = n_2 = n$, reduces to
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2}{n}}}$$
The formula for the paired t test is
$$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2 + s_2^2 - 2rs_1s_2}{n}}}$$
where r is the Pearson correlation coefficient. The comparison involves three situations:
r = 0 : the two test statistics are identical (only the degrees of freedom differ, 2n - 2
versus n - 1)
r > 0 : the paired t test is more powerful (larger value), since r then shrinks s_d in the
denominator of the paired t test statistic [This is a quantitative expression of the
rule-of-thumb that paired data increase precision.]
r < 0 : the independent groups t test is more powerful (larger value), since r then inflates s_d
in the denominator of the paired t test statistic
Note (Is it legitimate to switch to the independent groups t test when your repeated measures data
has an r < 0?):
No. You are stuck analyzing your data with the paired t test, rather than an independent
groups t test, since you have correlated data. That is, the independent groups t test
assumes ρ = 0, and has no ability to account for any other correlation structure in the data.
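To make the role of r concrete, the two equal-n formulas can be evaluated with the summary statistics from the paired t test output earlier in this chapter. This is only an arithmetic sketch: the scalar names are chosen here for illustration, and the independent groups form is computed purely to show the size of the denominator, not as a legitimate analysis of these correlated data.

```stata
* summary statistics copied from the paired t test output (21 complete pairs)
scalar npairs = 21
scalar dbar   = 3.057143 - 2.467619     // mean(fbf10) - mean(fbf0)
scalar s1     = 1.770755                // SD of fbf10
scalar s2     = 1.160719                // SD of fbf0
scalar rho    = 0.8927                  // Pearson correlation, 4 decimals

* equal-n independent groups form: the correlation term is absent
display "independent groups form: t = " %6.3f dbar/sqrt((s1^2 + s2^2)/npairs)

* paired form: the term 2*rho*s1*s2 is subtracted in the denominator
display "paired form:             t = " %6.3f dbar/sqrt((s1^2 + s2^2 - 2*rho*s1*s2)/npairs)
```

With a correlation this strongly positive, the paired form has the much smaller denominator, and its value should be close to the t = 2.9957 reported by ttest (small differences come from rounding the correlation to 4 decimal places).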
Conclusion
Replacing x1 with diffA and x2 with diffB in the above discussion, and recognizing that r > 0 is
the usual situation for a cross-over trial, the cross-over trial with its paired t test is the more
powerful approach. [Still, cross-over designs are frequently criticized for other reasons. For
example, one cannot be sure there is not a carry-over effect that remains even with a long washout period.]
Illustration
To illustrate the effect of the correlation structure on the paired and independent groups t tests,
we will draw some random samples of size n=40 with one variable having mean 3 and SD 1.8,
and a second variable having mean 2.5 and SD 1.2, similar to the estimates above. We will do
this for various correlations between the two variables. Then, we will compare the two variables
using both a paired and independent groups t test.
To draw random samples that are correlated, you pass the drawnorm command a vector of
means, a vector of standard deviations, and a correlation matrix, which we do in the following
Stata code.
capture log close
log using junk, replace
set seed 999
clear
forvalues r=-.75(.25).75 {
    matrix m = (3 , 2.5)              // mean1 = 3, mean2 = 2.5
    matrix sd = (1.8 , 1.2)           // sd1 = 1.8, sd2 = 1.2
    matrix c = (1 , `r' \ `r' , 1)    // correlation matrix
    drawnorm x1 x2 , n(40) means(m) sds(sd) corr(c)
    sum
    corr x1 x2                        // Pearson correlation
    ttest x1=x2                       // paired t test
    ttest x1=x2,unpaired              // independent groups t test
    drop x1 x2
}
log close
In this single-iteration Monte Carlo simulation, the results are

correlation    means           paired t test    independent t test
                               p value          p value
  -0.78        3.36 vs 2.34    0.032            0.005
  -0.28        2.75 vs 2.68    0.818            0.796
  -0.22        2.87 vs 2.26    0.086            0.057
  -0.18        3.15 vs 2.22    0.011            0.005
   0.25        2.50 vs 2.43    0.807            0.827
   0.64        2.72 vs 2.43    0.313            0.510
   0.77        3.01 vs 2.70    0.042            0.291
As expected, the independent groups t test consistently has a smaller p value (more powerful)
when the data are negatively correlated and the paired t test consistently has a smaller p value
(more powerful) when the data are positively correlated.
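A single draw per correlation is noisy, so the pattern is easier to trust when each scenario is repeated many times and the proportion of significant results is compared. The following is a sketch of that idea using Stata's simulate command; the program name ttsim, the three correlations, and the choice of 1,000 replications are assumptions made here for illustration.

```stata
capture program drop ttsim
program define ttsim, rclass
    args rho                          // population correlation
    drop _all
    matrix m  = (3 , 2.5)             // same means as above
    matrix sd = (1.8 , 1.2)           // same SDs as above
    matrix c  = (1 , `rho' \ `rho' , 1)
    drawnorm x1 x2 , n(40) means(m) sds(sd) corr(c)
    ttest x1 == x2                    // paired t test
    return scalar p_paired = r(p)     // two-sided p value
    ttest x1 == x2 , unpaired         // independent groups t test
    return scalar p_indep = r(p)
end

set seed 999
foreach rho in -0.5 0 0.5 {
    clear
    simulate p_paired=r(p_paired) p_indep=r(p_indep), reps(1000) nodots: ttsim `rho'
    generate byte sig_paired = p_paired < 0.05
    generate byte sig_indep  = p_indep  < 0.05
    display "population correlation = `rho'"
    summarize sig_paired sig_indep    // means are the rejection proportions
}
```

The mean of each indicator is the proportion of samples in which that test rejected at the 0.05 level, which estimates how often each form of the test declares significance under each correlation.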
Analysis of a Parallel Groups Design Study
For the rest of the discussion, where we compare the ANCOVA approach to the change
approach, we will consider only the parallel groups design (or an independent groups
observational study).
                          pre (or baseline)    intervention    post-intervention
treatment A (active)            X0A                                   X1A
treatment B (placebo)           X0B                                   X1B
Frison and Pocock (1992) discuss three methods for analyzing parallel groups design randomized
controlled trials comparing two treatments with baseline and post-treatment measurements on the
same subjects.
Post-treatment means (POST): use an independent groups t test to compare the two groups on
their post-treatment measurement (or mean summary measure of post-treatment measurements)
    i.e., compare X1A to X1B,
    and assume X0A = X0B as a result of randomization (and so the baseline can be ignored)

Mean changes (CHANGE): use an independent groups t test to compare the two groups on
their baseline to post-treatment change (or difference between the mean of baseline
measurements and the mean of post-treatment measurements)
    i.e., first compute two change scores:
    diffA = X1A - X0A
    diffB = X1B - X0B
    and then compare diffA to diffB
    (has an intuitive appeal, since a paired t test uses change scores)

Analysis of Covariance (ANCOVA): compare the two groups using a regression model of the
post-treatment measurement (or mean summary measure of post-treatment measurements) using
the baseline measurement (or mean summary measure of baseline measurements) as a covariate
    i.e., fit the linear regression
    X1 = β0 + β1(group) + β2(X0)
According to van Belle’s argument (see box), the CHANGE approach will be superior to the
POST approach only if the pre to post correlation is at least 0.5.
CHANGE approach more powerful than POST approach only if the correlation is at least
0.5
van Belle (2002, p.61) explains:
“There is an old rule of thumb in the design of experiments that says that pairing data is
always a good thing in that it increases precision. This is not so. …Do not pair unless
the correlation between the pairs is greater than 0.5. … To see this, consider the variance
of the differences of paired data:
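$$\mathrm{Var}(Y_{i1} - Y_{i2}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$$
where $\sigma_i^2$ is the variance of $Y_i$ and ρ is the correlation. For simplicity assume that the
two variances are equal so that
$$\mathrm{Var}(Y_{i1} - Y_{i2}) = 2\sigma^2(1 - \rho).$$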
If ρ = 0, that is, the variables are uncorrelated, then the variance of the difference is just
equal to the sum of the variances. This variance is reduced only when ρ > 0; that is, there
is a positive correlation between the two variables. Even in this case, the correlation ρ
must be greater than 0.5 in order to have the variances of the difference be smaller than
the variance of Yi2.”
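The 0.5 threshold in the quotation follows in one step from the equal-variance formula. As a short worked derivation, with σ² denoting the common variance as in the quote:

```latex
\begin{align*}
\mathrm{Var}(Y_{i1} - Y_{i2}) < \mathrm{Var}(Y_{i2})
  &\iff 2\sigma^2(1 - \rho) < \sigma^2 \\
  &\iff 1 - \rho < \tfrac{1}{2} \\
  &\iff \rho > \tfrac{1}{2}.
\end{align*}
```

At ρ = 0.5 the two variances are exactly equal, so a change score is neither more nor less precise than the single post-treatment measurement.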
Although the POST analysis is justified for a randomized controlled trial, since the
randomization assures comparable baselines, it is not a completely convincing analysis. This is
because baseline differences can still exist even though randomization took place, especially with
small sample sizes.
We can compare the power of the three approaches using power calculations, allowing various
correlations between the baseline and post-treatment measurements. A power calculation
gives the value that a Monte Carlo simulation would converge to, since by definition the power of
a test is the proportion of times the test statistic is significant in a large number of samples.
(If we took a Monte Carlo simulation approach, we would draw a large number of samples and
compute the proportion of them in which the test statistic is significant.)
Using Stata’s sampsi command, the method options are post, change, or ancova as defined
above, and the r01 is the correlation between the baseline and post-treatment measurements.
For the post method,
Statistics
Summaries, tables & tests
Classical tests of hypotheses
Sample size and power determination
Main tab: Input: Two sample comparison of means:
Mean one: 3
Standard deviation one: 1.8
Mean two: 2.5
Standard deviation two: 1.2
Options tabs: Output: Compute power
Sample-based calculations: Sample size one: 150
Sample size two: 150
Correlations (between -1 and 1): -0.75
Choose method: Post
OK
sampsi 3 2.5, sd1(1.8) sd2(1.2) n1(150) n2(150) r01(-0.75) method(post)
Estimated power for two samples with repeated measures

Assumptions:
                                      alpha =   0.0500  (two-sided)
                                         m1 =        3
                                         m2 =      2.5
                                        sd1 =      1.8
                                        sd2 =      1.2
                             sample size n1 =      150
                                         n2 =      150
                                      n2/n1 =     1.00
           number of follow-up measurements =        1
            number of baseline measurements =        0

Method: POST
         relative efficiency =    1.000
            adjustment to sd =    1.000
                adjusted sd1 =    1.800
                adjusted sd2 =    1.200

Estimated power:
                       power =    0.808
Doing this for a number of correlations and the three methods:
forvalues r=-.75(.25).75 {
sampsi 3 2.5 ,sd1(1.8) sd2(1.2) n1(150) n2(150) method(post) ///
pre(1) post(1) r01(`r')
sampsi 3 2.5 ,sd1(1.8) sd2(1.2) n1(150) n2(150) method(change) ///
pre(1) post(1) r01(`r')
sampsi 3 2.5 ,sd1(1.8) sd2(1.2) n1(150) n2(150) method(ancova) ///
pre(1) post(1) r01(`r')
}
The resulting values for power are:
correlation     POST     CHANGE     ANCOVA
  -0.75         0.81      0.33       0.99
  -0.50         0.81      0.37       0.91
  -0.25         0.81      0.43       0.83
   0            0.81      0.52       0.81
   0.25         0.81      0.64       0.83
   0.50         0.81      0.81       0.91
   0.75         0.81      0.98       0.99
The POST analysis is not affected by the correlation between the baseline and post-treatment,
since it only uses the post-treatment data.
The CHANGE analysis only achieves equal power to the POST analysis when the correlation
reaches 0.50, as van Belle stated. The CHANGE approach is the more commonly used approach,
but it really shouldn’t be.
The ANCOVA analysis is at least as powerful as both POST and CHANGE, regardless of the size of
the correlation (it equals POST only when the correlation is 0).
We have just verified what statisticians have claimed for some time, that the ANCOVA approach is
the most powerful approach of the three (Fleiss, 1986, Chapter 7; Frison and Pocock, 1992).
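The pattern in the table can be reproduced approximately by hand. With one baseline and one follow-up measurement, the variance of the treatment comparison is multiplied by 2(1 - r) for CHANGE and by (1 - r²) for ANCOVA, relative to POST. The sketch below plugs those factors into a normal-approximation power formula. That sampsi computes power in exactly this way is an assumption made here, not something taken from its documentation, and the scalar names are chosen for illustration.

```stata
* Normal-approximation power, two-sided alpha = 0.05, n = 150 per group
scalar pw_delta = 3 - 2.5                          // difference in means
scalar pw_se    = sqrt(1.8^2/150 + 1.2^2/150)      // SE of the POST comparison
scalar pw_z     = invnormal(0.975)                 // critical value

display "    r    POST  CHANGE  ANCOVA"
forvalues r = -.75(.25).75 {
    * POST ignores the baseline, so r does not enter
    local post   = normal(pw_delta/pw_se - pw_z)
    * CHANGE: variance multiplied by 2(1 - r)
    local change = normal(pw_delta/(pw_se*sqrt(2*(1 - (`r')))) - pw_z)
    * ANCOVA: variance multiplied by (1 - r^2); parentheses guard negative r
    local ancova = normal(pw_delta/(pw_se*sqrt(1 - (`r')^2)) - pw_z)
    display %6.2f (`r') "  " %6.3f (`post') "  " %6.3f (`change') "  " %6.3f (`ancova')
}
```

Since (1 - r²) = (1 - r)(1 + r) is never larger than 2(1 - r) or than 1, this formula also shows why ANCOVA can never have less power than CHANGE or POST under these assumptions.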
Example
Returning to the isoproterenol dataset,
use 11.2.Isoproterenol.dta, clear
recode race 2=1 1=0 , gen(black)
list id black fbf0 fbf10 , nolabel
     +---------------------------+
     | id   black   fbf0   fbf10 |
     |---------------------------|
  1. |  1       0      1     1.4 |
  2. |  2       0    2.1     2.8 |
  3. |  3       0    1.1     2.2 |
  4. |  4       0   2.44     2.9 |
  5. |  5       0    2.9     3.5 |
     |---------------------------|
  6. |  6       0    4.1     3.7 |
  7. |  7       0   1.24     1.2 |
  8. |  8       0    3.1       . |
  9. |  9       0    5.8     8.8 |
 10. | 10       0    3.9     6.6 |
     |---------------------------|
 11. | 11       0   1.91     1.7 |
 12. | 12       0      2     2.3 |
 13. | 13       0    3.7     3.9 |
 14. | 14       1   2.46     2.7 |
 15. | 15       1      2     1.8 |
     |---------------------------|
 16. | 16       1   2.26       3 |
 17. | 17       1    1.8     2.9 |
 18. | 18       1   3.13       4 |
 19. | 19       1   1.36     2.7 |
 20. | 20       1   2.82     2.6 |
     |---------------------------|
 21. | 21       1    1.7     1.6 |
 22. | 22       1    2.1     1.9 |
     +---------------------------+
To make the sample size equal for the three approaches, which will give a fairer comparison
of the approaches, we drop subject #8, which has a missing value for fbf10.
drop if fbf10==.
POST Approach
Comparing the forearm blood flow after the 10 mg dose (post-treatment outcome), using an
independent groups t test,
ttest fbf10 , by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12    3.416667    .6439501    2.230709    1.999342    4.833991
       1 |       9    2.577778    .2459549    .7378648    2.010605    3.144951
---------+--------------------------------------------------------------------
combined |      21    3.057143    .3864103    1.770755    2.251105    3.863181
---------+--------------------------------------------------------------------
    diff |            .8388889    .7776535               -.7887586    2.466536
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   1.0787
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8529         Pr(|T| > |t|) = 0.2942          Pr(T > t) = 0.1471
This approach is generally accompanied with a test of the assumption that the baseline (pretreatment) forearm blood flow is equivalent in whites and blacks,
ttest fbf0 , by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12      2.6825      .41896     1.45132    1.760375    3.604625
       1 |       9    2.181111    .1857002    .5571006    1.752886    2.609337
---------+--------------------------------------------------------------------
combined |      21    2.467619    .2532897    1.160719    1.939266    2.995972
---------+--------------------------------------------------------------------
    diff |            .5013889    .5123727               -.5710194    1.573797
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.9786
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8300         Pr(|T| > |t|) = 0.3401          Pr(T > t) = 0.1700
In applying this approach, the researcher would say that the baselines are not significantly
different, implying the baseline data can be ignored. [This is a questionable assumption, however,
since any baseline differences are still carried forward to affect the later measurement.]
CHANGE Approach
First the change from baseline is computed, with a few observations listed to check the
computation,
gen diff10 = fbf10-fbf0
list fbf0 fbf10 diff10 in 1/5
     +-------------------------+
     | fbf0   fbf10     diff10 |
     |-------------------------|
  1. |    1     1.4         .4 |
  2. |  2.1     2.8         .7 |
  3. |  1.1     2.2        1.1 |
  4. | 2.44     2.9        .46 |
  5. |  2.9     3.5   .5999999 |
     +-------------------------+
Notice that the baseline value is subtracted from the post value, so that a positive number
represents an increase and a negative number represents a decrease, which provides a more
intuitive presentation than subtracting in the reverse order.
Then, the blacks are compared to whites using an independent groups t test on the change scores,
ttest diff10 , by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12    .7341667    .3088259    1.069804    .0544455    1.413888
       1 |       9    .3966667    .2071634    .6214902    -.081053    .8743863
---------+--------------------------------------------------------------------
combined |      21    .5895238    .1967903    .9018064    .1790265    1.000021
---------+--------------------------------------------------------------------
    diff |               .3375    .4005753               -.5009138    1.175914
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.8425
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7950         Pr(|T| > |t|) = 0.4100          Pr(T > t) = 0.2050
We see that the CHANGE approach has a larger p value than the POST approach (p = 0.410 and
p = 0.294, respectively). This is the opposite of what the power analysis presented earlier
predicted, since the correlation is > 0.5.
corr fbf0 fbf10
             |     fbf0    fbf10
-------------+------------------
        fbf0 |   1.0000
       fbf10 |   0.8927   1.0000
bysort black: corr fbf0 fbf10
-> black = 0

             |     fbf0    fbf10
-------------+------------------
        fbf0 |   1.0000
       fbf10 |   0.9171   1.0000

-> black = 1

             |     fbf0    fbf10
-------------+------------------
        fbf0 |   1.0000
       fbf10 |   0.5699   1.0000
What happened?
ttest fbf0 , by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12      2.6825      .41896     1.45132    1.760375    3.604625
       1 |       9    2.181111    .1857002    .5571006    1.752886    2.609337
---------+--------------------------------------------------------------------
combined |      21    2.467619    .2532897    1.160719    1.939266    2.995972
---------+--------------------------------------------------------------------
    diff |            .5013889    .5123727               -.5710194    1.573797
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.9786
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8300         Pr(|T| > |t|) = 0.3401          Pr(T > t) = 0.1700
ttest fbf10 ,by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12    3.416667    .6439501    2.230709    1.999342    4.833991
       1 |       9    2.577778    .2459549    .7378648    2.010605    3.144951
---------+--------------------------------------------------------------------
combined |      21    3.057143    .3864103    1.770755    2.251105    3.863181
---------+--------------------------------------------------------------------
    diff |            .8388889    .7776535               -.7887586    2.466536
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   1.0787
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.8529         Pr(|T| > |t|) = 0.2942          Pr(T > t) = 0.1471
ttest diff10 , by(black)
Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      12    .7341667    .3088259    1.069804    .0544455    1.413888
       1 |       9    .3966667    .2071634    .6214902    -.081053    .8743863
---------+--------------------------------------------------------------------
combined |      21    .5895238    .1967903    .9018064    .1790265    1.000021
---------+--------------------------------------------------------------------
    diff |               .3375    .4005753               -.5009138    1.175914
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   0.8425
Ho: diff = 0                                     degrees of freedom =       19

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.7950         Pr(|T| > |t|) = 0.4100          Pr(T > t) = 0.2050
We see that the POST analysis added the change from baseline (.3375) to the difference that
already existed at baseline (.5014) to get the post-treatment difference (0.8389). This carry
forward effect, regardless of lack of statistical significance at baseline, outweighed the advantage
of the precision gained by a pairwise change approach (notice diff10 has a smaller standard error
than fbf10, hence greater precision).
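The decomposition described in this paragraph can be checked with simple arithmetic, using the three between-group differences reported on the diff rows of the t test outputs. This is only a sketch of that check, with the values typed in by hand.

```stata
* values copied from the diff rows of the three ttest outputs
* baseline difference (fbf0) plus difference in change (diff10)
display "baseline difference + difference in change = " .5013889 + .3375
* post-treatment difference (fbf10), for comparison
display "post-treatment difference                  = " .8388889
```

The two displayed values agree, which is simply the identity fbf10 = fbf0 + diff10 applied to the group means.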
We can get a better feel for differences in baselines using a boxplot
graph box fbf0, medtype(line) by(black)
[Figure: box plots of Baseline Forearm Blood Flow (y axis, 1 to 6), one panel for black = 0 and
one panel for black = 1. Graphs by black]
So, out of a POST and a CHANGE analysis for this dataset, which do you feel more comfortable
with?
ANCOVA Approach
With this approach, we compare the post treatment forearm blood flow between whites and
blacks, while controlling for baseline forearm blood flow.
regress fbf10 black fbf0
      Source |       SS       df       MS              Number of obs =      21
-------------+------------------------------           F(  2,    18) =   35.78
       Model |  50.1062283     2  25.0531142           Prob > F      =  0.0000
    Residual |  12.6052018    18  .700288991           R-squared     =  0.7990
-------------+------------------------------           Adj R-squared =  0.7767
       Total |  62.7114302    20  3.13557151           Root MSE      =  .83683

------------------------------------------------------------------------------
       fbf10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.1639327   .3781931    -0.43   0.670    -.9584869    .6306216
        fbf0 |   1.346173   .1652242     8.15   0.000     .9990499    1.693296
       _cons |  -.1944425   .5047732    -0.39   0.705    -1.254932    .8660467
------------------------------------------------------------------------------
This is even less significant than the other approaches (p = 0.670), contrary to the power
analysis presented earlier, which suggested it should be more significant. Perhaps, however, this
analysis is simply more correct, and it is more sensitive to a lack of effect than the other
approaches. (If so, the same thing could be said about the change approach being less significant
than the post approach, since the change approach uses more information and so should be more
sensitive.)
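One way to see how ANCOVA relates to the change approach is to regress the change score on group while also adjusting for baseline. Algebraically, subtracting fbf0 from the outcome only lowers the fbf0 coefficient by 1, so the coefficient for black is the same as in the ANCOVA model. The sketch below checks this, assuming the dataset is in the current working directory; the scalar name b_ancova is chosen here for illustration.

```stata
* Assumes 11.2.Isoproterenol.dta is in the current working directory
use 11.2.Isoproterenol.dta, clear
recode race 2=1 1=0 , gen(black)
drop if missing(fbf10)                 // same 21 subjects as above
generate diff10 = fbf10 - fbf0

* ANCOVA: post-treatment value adjusted for baseline
regress fbf10 black fbf0
scalar b_ancova = _b[black]

* change score adjusted for baseline
regress diff10 black fbf0

* the two group coefficients should differ only by floating-point error
display b_ancova - _b[black]
```

The unadjusted CHANGE analysis differs from ANCOVA only in that it forces the baseline coefficient to equal 1 instead of estimating it from the data.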
We can get a feel for these data using a parallel coordinate plot (Cox, 2004). First, the parplot
command (ado file) must be added to Stata, if it hasn’t already been.
findit parplot
Then, creating the graph,
#delimit ;
parplot fbf0 fbf10
, transform(raw)
xlabel(1 "0" 2 "10")
ylabel(0(1)6, angle(horizontal))
ytitle("forearm blood flow (ml/min/dl) ")
xtitle("isoproterenol dose (mg/min)")
by(black)
;
#delimit cr
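[Figure: parallel coordinate plots of forearm blood flow (ml/min/dl; y axis, 0 to 6) at
isoproterenol doses 0 and 10 mg/min, in two panels labeled white and black. Graphs by black]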
We observe that over the range of baseline values in the black group, both groups have very
similar looking graphs.
Just for the “sake of medical knowledge”, when we graph the entire dose range, we do see a
difference between the groups.
#delimit ;
parplot fbf0-fbf400
, transform(raw)
xlabel(1 "0" 2 "10" 3 "20" 4 "60" 5 "150" 6 "300" 7 "400")
ylabel(0(5)45, angle(horizontal))
ytitle("forearm blood flow (ml/min/dl) ")
xtitle("isoproterenol dose (mg/min)")
by(black)
;
#delimit cr
[Figure: parallel coordinate plots of forearm blood flow (ml/min/dl; y axis, 0 to 45) at
isoproterenol doses 0, 10, 20, 60, 150, 300, and 400 mg/min, in two panels labeled white and
black. Graphs by black]
How ANCOVA Works
The regression coefficient for black is the “adjusted mean difference” in forearm blood flow
between whites and blacks. That is, it is the predicted mean difference when blacks and
whites have the same baseline blood flow. This is represented graphically as the vertical
distance between the two groups' regression lines.
regress fbf10 black fbf0
predict fbf10pred
sort fbf0
#delimit ;
twoway (scatter fbf10 fbf0 if black==0, ///
msymbol(square) mlcolor(blue) mfcolor(blue))
(scatter fbf10 fbf0 if black==1, ///
msymbol(triangle) mlcolor(green) mfcolor(green))
(line fbf10pred fbf0 if black==0, lcolor(blue))
(line fbf10pred fbf0 if black==1, lcolor(green))
, xline(2,lcolor(red)lwidth(medium))
xline(3,lcolor(red)lwidth(medium))
xtitle(forearm blood flow with dose 0 mg/min)
ytitle(forearm blood flow with dose 10 mg/min)
legend(off)
text(2.5 3.3 "black" , color(green))
text(5.5 3.3 "white" , color(blue))
;
#delimit cr
------------------------------------------------------------------------------
       fbf10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |  -.1639327   .3781931    -0.43   0.670    -.9584869    .6306216
        fbf0 |   1.346173   .1652242     8.15   0.000     .9990499    1.693296
       _cons |  -.1944425   .5047732    -0.39   0.705    -1.254932    .8660467
------------------------------------------------------------------------------

[Figure: scatterplot of forearm blood flow with dose 10 mg/min (y-axis) against forearm blood
flow with dose 0 mg/min (x-axis), with the fitted regression lines labeled "white" and "black"
and vertical reference lines at x = 2 and x = 3.]
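The claim that the coefficient for black is the vertical distance between the two fitted lines can be checked directly from the stored coefficients. The following is a minimal sketch, assuming the isoproterenol dataset is still in memory. The baseline values 2 and 3 are simply the positions of the red reference lines in the graph, and the scalar names d2 and d3 are made up for this illustration.

```stata
* refit the ANCOVA model (assumes 11.2.Isoproterenol.dta is in memory)
regress fbf10 black fbf0

* predicted black-minus-white difference at baseline flow = 2
scalar d2 = (_b[_cons] + _b[black] + _b[fbf0]*2) - (_b[_cons] + _b[fbf0]*2)
* the same difference at baseline flow = 3
scalar d3 = (_b[_cons] + _b[black] + _b[fbf0]*3) - (_b[_cons] + _b[fbf0]*3)

display "difference at fbf0 = 2: " d2
display "difference at fbf0 = 3: " d3

* both equal the coefficient for black; lincom adds its CI and p-value
lincom black
```

Because this model has no black-by-baseline interaction, the two fitted lines are parallel, so both displayed differences should equal the coefficient for black in the regression output (-.1639327).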
The parallel coordinate plot shown earlier suggested that the effect is not linear, but perhaps
quadratic. We can improve the ANCOVA by including a quadratic term in the model.
gen fbf0quad = fbf0*fbf0
regress fbf10 black fbf0 fbf0quad
predict fbf10pred2
sort fbf0
#delimit ;
twoway (scatter fbf10 fbf0 if black==0, ///
msymbol(square) mlcolor(blue) mfcolor(blue))
(scatter fbf10 fbf0 if black==1, ///
msymbol(triangle) mlcolor(green) mfcolor(green))
(line fbf10pred2 fbf0 if black==0, lcolor(blue))
(line fbf10pred2 fbf0 if black==1, lcolor(green))
, xline(2,lcolor(red)lwidth(medium))
xline(3,lcolor(red)lwidth(medium))
xtitle(forearm blood flow with dose 0 mg/min)
ytitle(forearm blood flow with dose 10 mg/min)
legend(off)
text(2.5 3.3 "black" , color(green))
text(5.5 3.3 "white" , color(blue))
;
#delimit cr
------------------------------------------------------------------------------
       fbf10 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       black |   .0569853   .3516385     0.16   0.873    -.6849071    .7988778
        fbf0 |  -.0372271   .6108978    -0.06   0.952    -1.326109    1.251655
    fbf0quad |   .2234133   .0957203     2.33   0.032     .0214612    .4253655
       _cons |   1.477522   .8470312     1.74   0.099    -.3095573    3.264602
------------------------------------------------------------------------------

[Figure: scatterplot of forearm blood flow with dose 10 mg/min (y-axis) against forearm blood
flow with dose 0 mg/min (x-axis), with the fitted quadratic curves labeled "white" and "black"
and vertical reference lines at x = 2 and x = 3.]
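This model shows an even smaller effect: the adjusted difference for black is now 0.057
(p = 0.873), compared with -0.164 (p = 0.670) in the linear model.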
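The same quadratic model can be fit without creating fbf0quad by hand. This is a sketch only, assuming Stata 11 or later (where factor-variable notation is available) and assuming the isoproterenol dataset is in memory. It is an alternative way of writing the model, not the approach used in this chapter.

```stata
* c.fbf0##c.fbf0 expands to fbf0 plus its square
regress fbf10 black c.fbf0##c.fbf0

* test of the squared term (the same test as the t test on fbf0quad)
test c.fbf0#c.fbf0

* adjusted black-white difference under the quadratic model
lincom black
```

The coefficients should match those from the fbf0quad model, since only the notation differs.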
Example 2
This time, all the usual assumptions for the test statistics will be met: normally distributed data,
as we will draw random samples from the normal distribution, and equal variances. Also, we will
use equivalent baselines. The pre to post correlation will be r=0.7. Group 0 will change by 1 and
group 1 will change by 2, so we should see a group effect of 1, except for sampling variability.
set seed 999
clear
* group 0 changes by 1
matrix m = (10 , 11) // mean1 = 10, mean2 = 11
matrix sd = (3 , 3 ) // sd1 = 3, sd2 = 3
matrix c = (1 , .7 \ .7 , 1) // correlation
drawnorm x1 x2 , n(100) means(m) sds(sd) corr(c)
gen group=0
sum
corr
save temp, replace
clear
* group 1 changes by 2
matrix m = (10 , 12) // mean1 = 10, mean2 = 12
matrix sd = (3 , 3 ) // sd1 = 3, sd2 = 3
matrix c = (1 , .7 \ .7 , 1) // correlation
drawnorm x1 x2 , n(100) means(m) sds(sd) corr(c)
gen group=1
sum
corr
append using temp
capture drop diff
gen diff=x2-x1
sum diff
bysort group: sum x1 x2 diff
ttest x2 ,by(group) // POST
ttest diff ,by(group) // CHANGE
regress x2 group x1 // ANCOVA
-> group = 0

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |       100    10.26782    2.807734   5.122697   19.63148
          x2 |       100    11.09425    2.726155   5.794079   20.09343
        diff |       100    .8264255    2.127704  -4.929298   5.154228

-> group = 1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          x1 |       100    9.674524    2.814999    1.26943   16.71844
          x2 |       100    11.45815    2.841669   2.004145   18.97966
        diff |       100    1.783629    2.250502  -5.201857   7.366302
. ttest x2 ,by(group) // POST

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     100    11.09425    .2726155    2.726155    10.55332    11.63517
       1 |     100    11.45815    .2841669    2.841669     10.8943      12.022
---------+--------------------------------------------------------------------
combined |     200     11.2762    .1968224    2.783489    10.88807    11.66432
---------+--------------------------------------------------------------------
    diff |           -.3639069    .3937893               -1.140466    .4126525
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -0.9241
Ho: diff = 0                                     degrees of freedom =      198

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.1783         Pr(|T| > |t|) = 0.3566          Pr(T > t) = 0.8217
. ttest diff ,by(group) // CHANGE

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     100    .8264255    .2127704    2.127704    .4042428    1.248608
       1 |     100    1.783629    .2250502    2.250502    1.337081    2.230177
---------+--------------------------------------------------------------------
combined |     200    1.305027    .1581463    2.236527    .9931695    1.616885
---------+--------------------------------------------------------------------
    diff |           -.9572034    .3097077               -1.567952   -.3464545
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =  -3.0907
Ho: diff = 0                                     degrees of freedom =      198

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0011         Pr(|T| > |t|) = 0.0023          Pr(T > t) = 0.9989

. regress x2 group x1 // ANCOVA

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  2,   197) =   92.20
       Model |  745.439301     2   372.71965           Prob > F      =  0.0000
    Residual |  796.375252   197  4.04251397           R-squared     =  0.4835
-------------+------------------------------           Adj R-squared =  0.4782
       Total |  1541.81455   199  7.74781182           Root MSE      =  2.0106

------------------------------------------------------------------------------
          x2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |   .7715594   .2859363     2.70   0.008     .2076705    1.335448
          x1 |   .6870974   .0508248    13.52   0.000     .5868669    .7873278
       _cons |   4.039253   .5592517     7.22   0.000     2.936365    5.142142
------------------------------------------------------------------------------
This time we get the expected results: CHANGE (p = 0.002) is more significant (more powerful) than
POST (p = 0.357), and ANCOVA (p = 0.008) and CHANGE are approximately equal (this occurs when r is
large, as shown on page 12).
Next, we will use the same example, but perform a Monte Carlo simulation (10,000 samples) to
see which approach has the correct expected effect of 1.0, where the expected effect is the long
run average effect.
* -- Monte Carlo simulation
set seed 999
clear
gen posteffect = .
gen changeeffect = .
gen ancovaeffect = .
save tempeffects, replace
forvalues i=1/10000 {
    quietly clear
    quietly matrix m = (10 , 11) // mean1 = 10, mean2 = 11
    quietly matrix sd = (3 , 3 ) // sd1 = 3, sd2 = 3
    quietly matrix c = (1 , .7 \ .7 , 1) // correlation
    quietly drawnorm x1 x2 , n(100) means(m) sds(sd) corr(c)
    quietly gen group=0
    quietly save temp, replace
    quietly clear
    quietly matrix m = (10 , 12) // mean1 = 10, mean2 = 12
    quietly matrix sd = (3 , 3 ) // sd1 = 3, sd2 = 3
    quietly matrix c = (1 , .7 \ .7 , 1) // correlation
    quietly drawnorm x1 x2 , n(100) means(m) sds(sd) corr(c)
    quietly gen group=1
    quietly append using temp
    quietly gen diff=x2-x1
    quietly ttest x2 ,by(group) // POST
    quietly gen posteffect = r(mu_2)-r(mu_1) in 1/1
    quietly ttest diff ,by(group) // CHANGE
    quietly gen changeeffect = r(mu_2)-r(mu_1) in 1/1
    quietly regress x2 group x1 // ANCOVA
    quietly gen ancovaeffect = _b[group]
    quietly keep posteffect changeeffect ancovaeffect
    quietly keep in 1/1
    quietly append using tempeffects
    quietly save tempeffects, replace
}
use tempeffects, clear
sum
Here are the results. Recall this simulation was based on random samples from populations with
an equivalent baseline. Computing the mean of the stored effects (mean differences) from the
three approaches based on 10,000 random samples,
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
  posteffect |     10000    1.001576    .4264453  -.4825303   2.843583
changeeffect |     10000    1.005397    .3302449  -.2763749   2.271128
ancovaeffect |     10000    1.004405    .3058542  -.0475029   2.201868
we see they all have an expected effect of 1, which is the correct answer. The precision, as seen
in the standard deviation of the estimated effects (which is the standard error of the effect),
improves going from POST to CHANGE to ANCOVA, which is consistent with the relative
power of these approaches. Also, the fact that POST recovers the correct effect is what justifies
that approach, although it is rarely used anymore.
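The ordering of the three standard deviations agrees with the usual large-sample formulas for the standard error of a two-group mean difference when both groups share a common standard deviation and pre to post correlation. The sketch below plugs in the design values from the simulation (SD 3, r = 0.7, 100 per group). The scalar names are made up for this illustration, and the ANCOVA formula ignores the small extra variability from estimating the baseline slope.

```stata
* design values assumed from the simulation set-up
scalar sigma = 3      // SD at baseline and at follow-up
scalar rho   = 0.7    // pre to post correlation
scalar npg   = 100    // subjects per group

* approximate standard error of the estimated group effect
scalar se_post   = sqrt(2*sigma^2/npg)                // POST
scalar se_change = sqrt(2*2*sigma^2*(1 - rho)/npg)    // CHANGE
scalar se_ancova = sqrt(2*sigma^2*(1 - rho^2)/npg)    // ANCOVA

display "POST   " %6.4f se_post
display "CHANGE " %6.4f se_change
display "ANCOVA " %6.4f se_ancova
```

These work out to roughly 0.42, 0.33, and 0.30, close to the simulated standard deviations of 0.426, 0.330, and 0.306.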
Vickers and Altman (2001) summarize the discussion (first paragraph, first page, 2nd
column):
“In effect an analysis of covariance adjusts each patient’s follow up score for his or her
baseline score, but has the advantage of being unaffected by baseline differences. If, by
chance, baseline scores are worse in the treatment group, the treatment effect will be
underestimated by a follow up score analysis and overestimated by looking at change
scores (because of regression to the mean). By contrast, analysis of covariance gives the
same answer whether or not there is baseline imbalance.”
And in the last paragraph, first page, 2nd column, they state:
“An additional advantage of analysis of covariance is that it generally has greater
statistical power to detect a treatment effect than the other methods.”
This is a good paper to read; it is short and accessible.
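The point about chance baseline imbalance can be illustrated by simulation. The following is a sketch under the same assumptions as Example 2 (SD 3, r = 0.7, 100 per group, true effect 1). The program name onetrial, the returned scalar names, and the choice of 2,000 replications are made up for this illustration. Each replication stores the chance baseline difference between groups along with the three effect estimates, and the final regressions show how each estimate moves with the baseline imbalance.

```stata
* one simulated two-arm trial: 100 per group, true effect 1, r = 0.7
capture program drop onetrial
program define onetrial, rclass
    drop _all
    matrix m  = (10, 11)
    matrix sd = (3, 3)
    matrix c  = (1, .7 \ .7, 1)
    drawnorm x1 x2, n(200) means(m) sds(sd) corr(c)
    generate group = _n > 100
    replace x2 = x2 + 1 if group == 1
    generate diff = x2 - x1
    regress x1 group
    return scalar basediff = _b[group]   // chance baseline imbalance
    regress x2 group
    return scalar post = _b[group]       // POST
    regress diff group
    return scalar change = _b[group]     // CHANGE
    regress x2 group x1
    return scalar ancova = _b[group]     // ANCOVA
end

set seed 999
simulate basediff=r(basediff) post=r(post) change=r(change) ///
    ancova=r(ancova), reps(2000) nodots: onetrial

* slope of each estimated effect on the chance baseline imbalance
regress post basediff
regress change basediff
regress ancova basediff
```

Under these assumptions the slopes should come out near 0.7 for POST, near -0.3 for CHANGE, and near 0 for ANCOVA. So when the treatment group happens to start lower, POST tends to underestimate the effect, CHANGE tends to overestimate it, and ANCOVA is essentially unaffected, which is the pattern Vickers and Altman describe.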
Popularity of ANCOVA Approach
Because of its increased power and because it is not subject to regression towards the mean bias,
ANCOVA is a popular analysis approach, although many researchers continue to use the change
score approach due to unfamiliarity with the ANCOVA approach. ANCOVA is commonly used
in papers published in the New England Journal of Medicine. Here are some examples of
ANCOVA being stated in the Statistical Methods section of some N Engl J Med papers, just
weeks apart:
The REST Investigators (2007;356(4):360-370),
“Analysis of covariance was used to compare quality-of-life scores (on the basis of results
on the SF-36 and EuroQol questionnaires) between groups, adjusting for baseline values.”
Cahen et al. (2007;356(7):676-684),
“To adjust for baseline scores, analysis of covariance was performed.”
Nagot et al. (2007;356(8):790-799) use an ANCOVA approach in a modified Poisson
regression, an ordinal regression, and a linear regression in the same paper,
“The effect of the presence of HIV-1 RNA or HSV-2 DNA at any time during the
treatment phase (detection) was estimated as a risk ratio, using Poisson regression with
robust standard errors,29 adjusted for the presence of the outcome measure during the
baseline phase. The frequency of genital HIV-1 RNA and HSV-2 DNA (the proportion
of visits during which the virus was detected, per woman) was estimated by a
proportional-odds-ordered logistic-regression model adjusted for baseline frequency.
Linear regression was used to compare the mean levels of genital or plasma HIV-1 RNA
(in log10 copies per milliliter) between the study groups, adjusted for mean baseline
values.”
In a methods paper discussing statistical guidelines for the European Respiratory Journal, Chinn
(2001, p.397) makes the point that a change analysis should be justified if used in place of an
ANCOVA approach, since ANCOVA is known to be better,
“Analysis of change in the outcome from baseline to final value should be justified if
used rather than the preferred analysis of final value adjusted for baseline.”
Protocol Suggestion
There are still many researchers and statisticians who prefer the change approach over the
ANCOVA approach, because they are unaware that ANCOVA is better. To avoid the
possibility that a grant reviewer will be uncomfortable with the ANCOVA approach, you could
add the following text after describing your regression model in the Statistical Methods section:
The baseline outcome measurement will be included as a covariate, in an analysis of
covariance (ANCOVA) fashion. The ANCOVA approach is known to be a more
powerful test than a group comparison of baseline to post-intervention change. Another
advantage is that ANCOVA is not distorted by regression towards the mean bias, whereas
a change analysis is subject to that bias (Frison and Pocock, 1992; Vickers and Altman,
2001).
Repeated Baseline Measurements
Frequently investigators will take repeated measurements of the baseline measurement to make
sure the baseline is stable. Statistical power is also increased when repeated measures of the
baseline are used. Senn (2007, p.103) provides a good discussion of approaches to including
these in your analysis,
“A special case arises where we have repeated measures of a proper baseline. The mean
of these is likely to be more strongly correlated with the outcome than any single measure
and will also show a lower degree of confounding with treatment than the most
confounded of the various measurements on which it is based. There will usually be little
value in fitting these measurement separately and the mean may be used instead. Frison
and Pocock have investigated this approach in detail and found it to be valuable (Frison
and Pocock, 1992). An exception might be when the interval between baseline
measurements is fairly long compared to the interval to outcome. It is possible that one
could then correct not only for the observed baseline level of patients but also for a trend
in baselines. For the case with two baselines, fitting the mean and the difference of the
baselines would be formally equivalent to fitting them both separately. Another way of
looking at this particular point is to say that if within-patient errors follow some sort of
autoregressive process, then more recent baselines will be more highly predictive than
earlier ones. It might be inefficient to fit a mean of them all and wiser to fit them
separately. It will usually be inappropriate to fit the last baseline only, however.”
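Senn's remark that, with two baselines, fitting the mean and the difference is formally equivalent to fitting both baselines separately can be checked numerically. The following is a sketch on simulated data. The variable names (b1, b2, y, bmean, bdiff), the correlation matrix, and the true effect of 0.5 are all made up for this illustration.

```stata
clear
set seed 12345
* two baselines (b1, b2) and an outcome (y), all standard normal
matrix C = (1, .7, .6 \ .7, 1, .6 \ .6, .6, 1)
drawnorm b1 b2 y, n(200) corr(C)
generate group = _n > 100
replace y = y + 0.5 if group == 1      // true group effect of 0.5

generate bmean = (b1 + b2)/2           // mean of the two baselines
generate bdiff = b2 - b1               // trend between the baselines

* ANCOVA adjusting for the mean baseline only
regress y group bmean

* adjusting for both baselines separately
regress y group b1 b2
scalar bsep = _b[group]

* adjusting for the mean and the difference of the baselines
regress y group bmean bdiff
display "group effect, separate baselines:  " bsep
display "group effect, mean and difference: " _b[group]
```

The last two models span the same covariate space, so the two displayed group effects should agree to rounding error, while the model with bmean alone will generally give a slightly different estimate.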
Regression Towards The Mean Bias
An excellent website discussing this topic is:
www-users.york.ac.uk/~mb55/talks/regmean.htm
References
Chinn S. (2001). Statistics for the European Respiratory Journal. Eur Respir J 18:393-401.
Cox NJ. (2004). Speaking Stata: graphing agreement and disagreement. The Stata Journal
4(3):329-349.
Dupont WD. (2002). Statistical Modeling for Biomedical Researchers: a Simple
Introduction to the Analysis of Complex Data. Cambridge, Cambridge University
Press.
Fleiss JL. (1986). The Design and Analysis of Clinical Experiments. New York, John
Wiley & Sons.
Frison L, Pocock S. (1992). Repeated measures in clinical trials: analysis using mean summary
statistics and its implications for design. Statistics in Medicine 11:1685-1704.
Lang CC, Stein CM, Brown RM, et al. (1995). Attenuation of isoproterenol-mediated
vasodilation in blacks. N Engl J Med 333:155-60.
Senn S. (2007). Statistical Issues in Drug Development, 2nd ed. Hoboken NJ, John Wiley &
Sons.
van Belle G. (2002). Statistical Rules of Thumb. New York, John Wiley & Sons.
Vickers AJ, Altman DG. (2001). Statistics notes: analysing controlled trials with baseline and
follow up measurements. BMJ 323:1123-1124.