From t-test to … multilevel analyses
Stein Atle Lie
Outline
• Paired t-test (mean and standard deviation)
• Two-group t-test (means and standard deviations)
• Linear regression
• GLM (general linear models)
• GEE (generalized estimating equations)
• GLMM (general linear mixed model)
…
• SPSS, Stata, R, MLwiN, gllamm (Stata)
Multilevel models
• “Same thing – many names”:
  – Generalized estimating equations
  – Random effects models
  – Random intercept and random slope models
  – Mixed effects models
  – Variance component models
  – Frailty models (in survival analysis)
  – Latent variables
Cortisol data – missing data
ID per1day1 per1day2 per1day3 per2day1 per2day2 per2day3
2
6.83
6.44
7.09
8.34
5.45
9.59
4
7.94
10.61
9.27
6
1.65
3.62
9.13
8
6.62
15.02
9
1.49
6.94
22.5
37.76
20.04
25.08
29
1.97
18.34
9.15
60
21.21
6.91
5.04
6.63
4.54
168
7.81
10.6
22.26
19.91
8.81
10.55
188
3.75
6.01
6.43
219
25.02
16.49
5.64
10.18
11.48
7.87
227
10.3
18.06
14.87
19.37
7.73
235
3.92
0.94
8.11
237
16.4
13.12
16.43
1.55
5.19
239
25.68
15.35
10.79
8.99
17.18
246
6.03
8.08
10.65
247
11.51
27.07
25
13.93
16.12
10.81
257
19.8
11.47
10.88
10.42
9.25
3.98
273
9.42
9.42
274
8.41
12.48
10.72
3.63
10.37
8.62
277
9.92
11.56
16.88
6.12
5.83
9.47
282
15.56
32.96
26.6
8.48
7.37
283
15.66
8.86
12.37
4.53
0.42
9.39
298
0.32
1.21
4.54
5.39
8.14
307
16.22
13.45
15.76
0.57
2.69
2.31
319
8.67
11.77
9.48
22.35
13.19
6.05
322
38.06
22.37
15.89
20.41
3.53
19.98
338
8.4
8.62
3.09
8.15
8.3
10.88
348
29.89
33.27
23.99
11.72
4.59
11.16
355
2.11
4.22
15.28
13.94
7.95
14.18
364
5.89
5.81
5.8
5.06
7.05
7.72
376
10.27
14.57
12.11
1.28
5.88
377
11.49
17.26
12.23
5.13
8.99
431
13.28
9.06
17.22
10.19
8.56
7.6
432
13.65
17.72
8.68
10.89
10.9
11.05
534
4.24
10.45
11.92
535
5.54
11.71
22.45
20.09
11.24
14.08
536
19.03
8.75
13.03
1.4
10.03
3.26
537
6.59
17.89
5.19
5.63
3.14
5.79
538
3.25
5.54
0.63
8.34
8.21
1001
5.2
11.8
4.4
3.9
3.9
1002
7.8
10.2
10.2
6.4
8.3
2.6
1003
10.5
10.5
18.3
8.3
19
0.7
1004
12.2
10.1
13.2
8.9
4.6
5.4
1005
2.7
4.1
4.9
2.6
1006
25.5
10.3
8.3
4
6.3
1007
6.1
8.3
5.5
9.7
8.4
11.2
1008
4.6
4.8
3.8
5.8
5.3
4.3
1009
5
9.8
9.5
5.6
5.2
12.4
1010
4.8
15.6
12.8
8.2
9
13.9
1011
7.9
7.7
14.1
7.8
12.1
1012
13.8
16.6
7.6
1013
1.4
3.3
9.7
16.2
1014
9.5
8.4
9.3
15.1
12.6
15.3
1015
12.2
16.1
11
8.1
1016
8.5
8.4
6.8
19.9
Objective
• Carry the general thinking from simple statistical methods over to more sophisticated data structures and statistical analyses
• Focus on interpreting the results in relation to those found with basic statistical methods
Multilevel data
Types of data:
• Repeated measures for the same individual
• The same measure is repeated several times on the same individual
• Several observers have measured the same individual
• Several different measures for the same individual
• Related observations (siblings, families, …)
• A categorical variable with “many” levels (multicenter data, hospitals, clinics, …)
• Panel data
Null hypotheses
• In ordinary statistics (in both paired and two-sample t-tests) we define a null hypothesis:
  H0: μ1 = μ2
• We assume that the mean of group (or measure) 1 equals the mean of group (or measure) 2.
• Alternatively:
  H0: Δ = μ1 - μ2 = 0
p-value
• Definition:
  “If our null hypothesis is true, what is the probability of observing the data* that we did (or data more extreme)?”
* And hence the mean, t-statistic, etc.
p-value
• We assume that our null hypothesis is true (μ0 = 0 or μ1 - μ2 = 0)
• We observe our data
  – Mean value etc.
• Under the assumption of normally distributed data
p-value
• The p-value is the probability of observing our data (or something more extreme) under the given assumptions
[Figure: distribution of X under H0, centred at μ0, with the observed value X marked]
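As a rough illustration of this definition, the sketch below computes a two-sided p-value for a statistic assumed to be standard normal under H0. This is a large-sample approximation; a t-test would use the t-distribution with the appropriate degrees of freedom instead. The function name `two_sided_p` is hypothetical, and only Python's standard library is used.

```python
from statistics import NormalDist


def two_sided_p(observed: float, mu0: float, se: float) -> float:
    """Two-sided p-value assuming (observed - mu0) / se ~ N(0, 1) under H0.

    This is the probability of a result at least as far from mu0 as the
    observed one, in either direction.
    """
    z = (observed - mu0) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical example: an observed mean 1.96 standard errors above mu0 = 0
print(round(two_sided_p(1.96, 0.0, 1.0), 3))  # 0.05
```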
Paired t-test
• The straightforward way to analyze two repeated measures is a paired t-test.
• The measurement at time 1 or location 1 (e.g. Data1) is compared directly with the measurement at time 2 or location 2 (e.g. Data2).
• Is the difference between Data1 and Data2 (Diff = Data1 - Data2) different from 0?
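The computation behind the paired t-test is short. This sketch (a minimal illustration, not the PASW/SPSS implementation) forms Diff = Data1 - Data2 for every complete pair and returns the mean difference, its standard error, the t statistic and the degrees of freedom; the p-value would then come from a t-distribution with n - 1 degrees of freedom, which Python's standard library does not provide. The function name and the data values are hypothetical.

```python
import math
import statistics


def paired_t(data1, data2):
    """Paired t-test statistic for Diff = Data1 - Data2.

    Pairs where either value is None are dropped, because the paired
    t-test only uses complete pairs. Returns (mean difference, standard
    error, t statistic, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(data1, data2) if a is not None and b is not None]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return mean_d, se, mean_d / se, n - 1


# Hypothetical measurements for 10 individuals (not the data from the slides)
data1 = [8.1, 6.4, 9.3, 7.7, 5.9, 10.2, 7.0, 8.8, 6.6, 9.5]
data2 = [9.0, 7.9, 9.1, 9.6, 7.2, 11.5, 8.3, 9.9, 8.0, 10.4]
mean_d, se, t, df = paired_t(data1, data2)
print(f"mean diff = {mean_d:.3f}, SE = {se:.3f}, t = {t:.2f}, df = {df}")
```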
Paired t-test (n=10)
PASW:
T-TEST PAIRS=Data1 WITH Data2 (PAIRED).
Paired t-test
• The paired t-test can only be performed on complete (balanced) pairs.
• What happens if we delete two observations from Data2?
  – Only 8 complete pairs remain
Paired t-test (n=8)
PASW:
T-TEST PAIRS=Data1 WITH Data2 (PAIRED).
Two-group t-test
• If we now treat the data from time 1 and time 2 (or location 1 and location 2) as independent (even though they are not), we can use a two-group t-test on the full dataset of 2×10 observations
Two-group t-test (n=20 [10+10])
PASW:
T-TEST GROUPS=Grp(1 2)
/VARIABLES=Data.
Two-group t-test
• Observe that the means for Grp1 and Grp2 equal the means for Data1 and Data2
• and that the mean difference is also the same
• The difference between the paired t-test and the two-group t-test lies in
  – the variance and the number of observations,
  – and therefore in the standard deviation and standard error,
  – and hence in the p-value and confidence intervals
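The claim that only the variance (and hence the standard error) differs can be checked numerically. The sketch below, on a small hypothetical dataset, computes the mean difference once and two standard errors: one from the paired differences and one from the pooled-variance (equal variances) two-group formula. The function names are illustrative.

```python
import math
import statistics


def paired_se(x1, x2):
    """Standard error of the mean of the paired differences x1 - x2."""
    diffs = [a - b for a, b in zip(x1, x2)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


def pooled_se(x1, x2):
    """Standard error of mean(x1) - mean(x2) in a two-group t-test with equal variances."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * statistics.variance(x1)
                  + (n2 - 1) * statistics.variance(x2)) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Hypothetical paired measurements (time 1 and time 2 for the same 5 individuals)
time1 = [1.0, 2.0, 3.0, 4.0, 5.0]
time2 = [2.0, 3.0, 4.0, 5.0, 7.0]

mean_diff = statistics.mean(time1) - statistics.mean(time2)
print(f"mean difference      = {mean_diff:.2f}")                 # -1.20
print(f"SE, paired t-test    = {paired_se(time1, time2):.3f}")   # 0.200
print(f"SE, two-group t-test = {pooled_se(time1, time2):.3f}")   # 1.114
```

Because the two measurements on each individual are strongly positively correlated here, the paired standard error is much smaller; the two-group test ignores that correlation.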
Two-group t-test
• The two-group t-test is performed on all available data.
• What happens if we delete two observations from Grp2?
  – Only 8 complete pairs remain, but 18 observations remain!
Two-group t-test (n=18 [10+8])
PASW:
T-TEST GROUPS=Grp(1 2)
/VARIABLES=Data.
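Two-group t-test (σ1 = σ2)
[Figure: two normal distributions with equal standard deviations σ1 = σ2 and means μ1 and μ2, a distance Δ apart]
[Figure: histograms (percent of total) of two simulated samples X1 and X2, generated with R's rnorm(50, …), with equal standard deviations σ1 = σ2]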
ANOVA (analysis of variance, σ1 = σ2 = σ3)
[Figure: histograms (percent of total) of three simulated samples X1, X2 and X3, generated with R's rnorm(50, …), with equal standard deviations σ1 = σ2 = σ3]
Linear regression
• If we now perform an ordinary linear regression with the data as outcome (dependent variable) and the group variable (Grp = 1 and 2) as independent variable,
  – the coefficient for group is identical to the mean difference,
  – and the standard error, t-statistic and p-value are identical to those found in a two-group t-test
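This equivalence can be verified with a few lines of code. The sketch below fits y = a + b·x by least squares with x = Grp coded 1 and 2, on small hypothetical data, and compares the slope and its standard error with the mean difference and the pooled-variance standard error of the two-group t-test. All names are illustrative.

```python
import math
import statistics


def simple_ols(x, y):
    """Least squares fit of y = a + b*x; returns a, b and the standard error of b."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    mse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, math.sqrt(mse / sxx)


def pooled_t_parts(g1, g2):
    """Mean difference mean(g2) - mean(g1) and its pooled-variance standard error."""
    n1, n2 = len(g1), len(g2)
    sp2 = ((n1 - 1) * statistics.variance(g1)
           + (n2 - 1) * statistics.variance(g2)) / (n1 + n2 - 2)
    return statistics.mean(g2) - statistics.mean(g1), math.sqrt(sp2 * (1 / n1 + 1 / n2))


# Hypothetical outcome values for two groups
grp1 = [1.0, 2.0, 3.0, 4.0, 5.0]
grp2 = [2.0, 3.0, 4.0, 5.0, 7.0]
y = grp1 + grp2
grp = [1] * len(grp1) + [2] * len(grp2)  # group variable coded 1 and 2

a, b, se_b = simple_ols(grp, y)
diff, se_diff = pooled_t_parts(grp1, grp2)
print(f"regression: coef = {b:.3f}, SE = {se_b:.3f}")        # coef = 1.200, SE = 1.114
print(f"t-test:     diff = {diff:.3f}, SE = {se_diff:.3f}")  # diff = 1.200, SE = 1.114
```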
Linear regression (n=20)
Stata:
. regress data grp

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =    1.38
       Model |  21.0124998     1  21.0124998           Prob > F      =  0.2554
    Residual |   274.01701    18  15.2231672           R-squared     =  0.0712
-------------+------------------------------           Adj R-squared =  0.0196
       Total |   295.02951    19  15.5278689           Root MSE      =  3.9017

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         grp |       2.05   1.744888     1.17   0.255    -1.615873    5.715873
       _cons |       5.33    2.75891     1.93   0.069    -.4662545    11.12625
------------------------------------------------------------------------------
Linear regression
• Now replace the independent variable for group (Grp = 1 and 2) with a dummy variable (dummy = 0 for Grp = 1 and dummy = 1 for Grp = 2):
  – the coefficient for the dummy equals the coefficient for grp (the mean difference)
  – and the constant term equals the mean for Grp1 (its standard error is not!)
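A small sketch of the recoding, again on hypothetical data: the same least squares fit is run with x = Grp (1, 2) and with x = dummy (0, 1). The slope is unchanged, while the constant moves from mean(Grp1) minus the difference to mean(Grp1), and its standard error changes with it. The helper `fit_line` is illustrative.

```python
import math
import statistics


def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns a, b and the standard error of a."""
    n = len(x)
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    mse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_a = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
    return a, b, se_a


# Hypothetical outcome values for two groups
grp1 = [1.0, 2.0, 3.0, 4.0, 5.0]
grp2 = [2.0, 3.0, 4.0, 5.0, 7.0]
y = grp1 + grp2
grp = [1] * len(grp1) + [2] * len(grp2)
dummy = [g - 1 for g in grp]  # 0 for Grp1, 1 for Grp2

for name, x in (("grp", grp), ("dummy", dummy)):
    a, b, se_a = fit_line(x, y)
    print(f"{name:>5}: _cons = {a:.3f} (SE {se_a:.3f}), coef = {b:.3f}")
```

With these numbers the constant is 1.800 (SE 1.761) with grp and 3.000 (SE 0.787) with dummy, while the coefficient is 1.200 in both fits; 3.000 is the mean of Grp1.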
Linear regression (n=20)
Stata:
. regress data dummy

      Source |       SS       df       MS              Number of obs =      20
-------------+------------------------------           F(  1,    18) =    1.38
       Model |  21.0124998     1  21.0124998           Prob > F      =  0.2554
    Residual |   274.01701    18  15.2231672           R-squared     =  0.0712
-------------+------------------------------           Adj R-squared =  0.0196
       Total |   295.02951    19  15.5278689           Root MSE      =  3.9017

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.744888     1.17   0.255    -1.615873    5.715873
       _cons |       7.38   1.233822     5.98   0.000     4.787836    9.972164
------------------------------------------------------------------------------
Linear models in Stata
• In ordinary linear models in Stata (regress and glm), one may add an option for clustered data to obtain standard errors adjusted for intragroup correlation
• This is ideal when you want to adjust for clustered data but are not interested in the correlation within or between groups
• And you still get the population-averaged effects!
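To make “standard errors adjusted for intragroup correlation” concrete, here is a minimal sketch of the cluster-robust (sandwich) variance for a regression on a constant and one dummy. The small-sample factor G/(G-1)·(N-1)/(N-K) is our assumption about the adjustment `regress ..., cluster()` applies; the data and the function name are hypothetical.

```python
import math
from collections import defaultdict


def ols_cluster_robust(y, d, cluster):
    """OLS of y on a constant and one regressor d, with cluster-robust standard errors.

    The small-sample factor G/(G-1) * (N-1)/(N-K) is assumed to match the
    adjustment made by Stata's `regress ..., cluster()`.
    Returns ([b0, b1], [se0, se1]).
    """
    n, k = len(y), 2
    # X'X for the columns [1, d] and its inverse (the "bread" of the sandwich)
    s_d = sum(d)
    s_dd = sum(di * di for di in d)
    det = n * s_dd - s_d * s_d
    bread = [[s_dd / det, -s_d / det], [-s_d / det, n / det]]
    xty = [sum(y), sum(di * yi for di, yi in zip(d, y))]
    beta = [bread[0][0] * xty[0] + bread[0][1] * xty[1],
            bread[1][0] * xty[0] + bread[1][1] * xty[1]]
    # Sum the score vectors x_i * e_i within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, di, g in zip(y, d, cluster):
        e = yi - (beta[0] + beta[1] * di)
        scores[g][0] += e
        scores[g][1] += di * e
    n_clusters = len(scores)
    factor = n_clusters / (n_clusters - 1) * (n - 1) / (n - k)
    se = []
    for j in range(2):
        # Diagonal element j of bread * meat * bread, written as a sum over clusters
        var_j = sum((bread[j][0] * s[0] + bread[j][1] * s[1]) ** 2 for s in scores.values())
        se.append(math.sqrt(factor * var_j))
    return beta, se


# Hypothetical paired data: each id has one observation in each group
y1 = [1.0, 2.0, 3.0, 4.0, 5.0]  # group 1 (dummy = 0)
y2 = [2.0, 3.0, 4.0, 5.0, 7.0]  # group 2 (dummy = 1)
ids = [1, 2, 3, 4, 5]
beta, se = ols_cluster_robust(y1 + y2, [0] * 5 + [1] * 5, ids + ids)
print(f"dummy: coef = {beta[1]:.3f}, robust SE = {se[1]:.3f}")  # coef = 1.200, robust SE = 0.212
```

For n complete pairs, this robust standard error equals the paired t-test standard error multiplied by sqrt((2n-1)/(2n-2)), which is why the clustered regression and the paired t-test agree closely. Unlike the paired t-test, the function also accepts clusters with a missing observation.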
Linear regression (n=20)
Stata:
. regress data dummy, cluster(id)

Linear regression                                      Number of obs =      20
                                                       F(  1,     9) =    2.64
                                                       Prob > F      =  0.1388
                                                       R-squared     =  0.0712
                                                       Root MSE      =  3.9017

                                   (Std. Err. adjusted for 10 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.262145     1.62   0.139    -.8051699     4.90517
       _cons |       7.38   1.224847     6.03   0.000     4.609204     10.1508
------------------------------------------------------------------------------
Linear models in Stata
• Thus, we now have an alternative to the paired t-test. The mean difference is identical to that obtained from the paired t-test, and the standard errors (and p-values) are adjusted for intragroup correlation
• As an alternative, we may use the program gllamm (Generalized Linear Latent And Mixed Models) in Stata
  – http://www.gllamm.org/
gllamm (n=20)
gllamm (Stata):
* gllamm is a user-written command; install it once with: ssc install gllamm
. gllamm data dummy, i(id)

number of level 1 units = 20
number of level 2 units = 10

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |       2.05   1.167852     1.76   0.079    -.2389486    4.338949
       _cons |   7.379808   1.172819     6.29   0.000     5.081124    9.678492
------------------------------------------------------------------------------

Variance at level 1: 6.8193955 (3.0174853)
Variances and covariances of random effects
level 2 (id) var(1): 6.8114516 (4.5613185)
Linear models in Stata
• If we now delete two of the observations in Grp2,
  – we obtain coefficients (“mean differences”) calculated from all available (n=18) data,
  – and standard errors corrected for intragroup correlation, using the commands regress, glm or gllamm
Linear regression (n=18)
Stata:
. regress data dummy, cluster(id)

Linear regression                                      Number of obs =      18
                                                       F(  1,     9) =    1.63
                                                       Prob > F      =  0.2332
                                                       R-squared     =  0.0587
                                                       Root MSE      =  4.1303

                                   (Std. Err. adjusted for 10 clusters in id)
------------------------------------------------------------------------------
             |               Robust
        data |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |     1.9575   1.531486     1.28   0.233    -1.506963    5.421963
       _cons |       7.38   1.228869     6.01   0.000     4.600105     10.1599
------------------------------------------------------------------------------
gllamm (n=18)
gllamm (Stata):
. gllamm data dummy, i(id)

number of level 1 units = 18
number of level 2 units = 10

log likelihood = -48.538837

------------------------------------------------------------------------------
        data |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       dummy |   2.458305   1.253552     1.96   0.050     .0013882    4.915223
       _cons |   7.357426   1.232548     5.97   0.000     4.941677    9.773176
------------------------------------------------------------------------------

Variance at level 1: 6.4041537 (3.3485133)
level 2 (id) var(1): 8.7561818 (5.1671805)
Intra class correlation (ICC)
Variance at level 1: 6.4041537 (3.3485133)
level 2 (id) var(1): 8.7561818 (5.1671805)
• The total variance is hence
  – 6.4041537 + 8.7561818 = 15.1603355
  – (and the total standard deviation is hence √15.1603355 = 3.8936)
• The proportion of the variance attributed to level 2 is therefore
  – ICC = 8.7561818/15.1603355 = 0.578
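The ICC arithmetic is easy to script. This sketch reproduces it from the two variance estimates printed by gllamm; the function name `icc` is an illustrative assumption.

```python
import math


def icc(var_between: float, var_within: float) -> float:
    """Intraclass correlation: the share of the total variance that lies at level 2."""
    return var_between / (var_between + var_within)


# Variance estimates printed by gllamm for the n=18 example
var_level1 = 6.4041537  # level 1 (within individuals)
var_level2 = 8.7561818  # level 2 (between individuals, id)

total = var_level1 + var_level2
print(f"total variance = {total:.4f}")                        # 15.1603
print(f"total SD       = {math.sqrt(total):.4f}")             # 3.8936
print(f"ICC            = {icc(var_level2, var_level1):.3f}")  # 0.578
```

The same function gives 8.6764463 / (8.6764463 + 31.202774) ≈ 0.218 for the cortisol variance component model fitted with gllamm.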
Linear regression
• Ordinary linear regression
  – assumes that the data are normal and i.i.d. (independent and identically distributed)
Linear regression
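[Figure: regression line y = β0 + β1·x with intercept β0 and slope β1, fitted to points (x1, y1), …, (xi, yi), …, (xn, yn), with one residual marked; example variables on the slide: Height, Weight, Kortisol, Time, Months]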
Linear regression
• Assumptions:
1) y1, y2, …, yn are independent and normally distributed
2) The expectation of Yi is:
   E(Yi) = β0 + β1·xi
   (linear relation between X and Y)
3) The variance of Yi is:
   var(Yi) = σ²
   (equal variance for ALL values of X)
Linear regression
• Assumptions for the residuals (ei): yi = a + b·xi + ei
1) e1, e2, …, en are independent and normally distributed
2) The expectation of ei is:
   E(ei) = 0
3) The variance of ei is:
   var(ei) = σ²
Regression: ŷi = a + b·xi
What are the ”best” a and b? The least squares method.
[Figure: observed points (xi, yi) and fitted points (xi, ŷi) on the line, with residuals e and squared residuals (yi - ŷi)², and the means x̄ and ȳ marked]
Regression
• Least squares method:
  – We want the sum of squares
$$SSE = \sum_i (y_i - \hat{y}_i)^2 = \sum_i \bigl(y_i - (a + b x_i)\bigr)^2$$
    (the distances from all points to the line [the residuals], squared) to be as small as possible; we want to find the minimum
Regression
• The least squares method:
  – The solution is:
$$b = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{SS_{XY}}{SS_{XX}}, \qquad a = \bar{y} - b\bar{x}$$
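The closed-form solution translates directly into code. This sketch computes a and b from SS_XY and SS_XX and checks, on hypothetical points, that nudging either coefficient increases the sum of squared residuals, as the least squares criterion requires. The function names are illustrative.

```python
def least_squares(x, y):
    """Return (a, b) minimising sum((y_i - (a + b*x_i))**2)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - xbar) ** 2 for xi in x)
    b = ss_xy / ss_xx
    a = ybar - b * xbar
    return a, b


def sse(a, b, x, y):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical points
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares(x, y)
best = sse(a, b, x, y)
for da, db in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)):
    assert sse(a + da, b + db, x, y) > best  # any nudge makes the fit worse
print(f"a = {a:.3f}, b = {b:.3f}, SSE = {best:.4f}")
```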
Regression
• The maximum likelihood method:
• Assumptions:
1) y1, y2, …, yn are random (independent), normally distributed observations, i.i.d.
2) The expectation of Yi is: E(Yi) = a + b·xi
3) The variance of Yi is: var(Yi) = σ²
• The density of each observation is
$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\,\frac{(y_i - \hat{y}_i)^2}{\sigma^2}\right)$$
• The likelihood function is maximized with respect to a and b:
$$l(a, b \mid y) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{1}{2}\,\frac{\bigl(y_i - (a + b x_i)\bigr)^2}{\sigma^2}\right)$$
• This is the same as finding the minimum of
$$\sum_i \bigl(y_i - (a + b x_i)\bigr)^2$$
• For simple linear regression, the least squares method and the maximum likelihood method give the same estimates of a and b!
Regression
The maximum likelihood method
”The probability that the line fits the observed points”
[Figure: observed points (xi, yi) scattered around the fitted line, with residuals e, the means x̄ and ȳ, and the normal density f(y) of the residuals]
Ordinary linear regression
• The formula for an ordinary regression can be expressed as:
  yi = β0 + β1·xi + ei
  ei ~ N(0, σe²)
Interpretation of coefficients
[Figure: weight in kg (Y) against height in cm (X) for women and men, with the fitted line Y = -97.6 + 0.96*X]
Y = a + b*X
That is: a = -97.6 and b = 0.96, so the predicted weight increases by 0.96 kg for each additional cm of height
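Interpretation of coefficients
[Figure: weight (Y) against height (X) for women and men, with two parallel fitted lines 1.86 kg apart]
Y = -85.0 + 0.91*X1 - 1.86*X2
With X1 = height and X2 = the sex indicator, the predicted weight increases by 0.91 kg per cm of height, and at any given height the two sexes differ by 1.86 kg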
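Random intercept model
Regression lines: yij = β0 + β1·xij + vij
[Figure: parallel regression lines with common slope β1 and intercepts β0 + uj, through points (x11, y11), …, (xij, yij), …, (xnp, ynp); σu shows the spread of the lines and σe the spread of the points around each line]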
Random intercept model
• For a random intercept model, we can express the regression line(s) and the variance components as
  yij = β0 + β1·xij + vij
  vij = uj + eij
  eij ~ N(0, σe²)   (individual)
  uj ~ N(0, σu²)   (group)
Random intercept model
• Alternatively, for the simple variance component model, we may express the formulas in terms of random intercepts:
  yij = β0j + β1·xij + eij
  β0j = β0 + uj
  eij ~ N(0, σe²)   (individual)
  uj ~ N(0, σu²)   (group)
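To see what the variance components mean, the sketch below simulates data from the simple variance component model (a random intercept with no covariate, i.e. β1 = 0) and recovers σu², σe² and ICC = σu²/(σu² + σe²) with the classical one-way ANOVA (method-of-moments) estimator for equally sized groups. This estimator is a swapped-in simplification for illustration; lmer and gllamm estimate the same quantities by (restricted) maximum likelihood. All names and parameter values are hypothetical.

```python
import random


def simulate_groups(n_groups, n_per_group, b0, sigma_u, sigma_e, seed=2024):
    """Simulate y_ij = b0 + u_j + e_ij with u_j ~ N(0, sigma_u^2), e_ij ~ N(0, sigma_e^2)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u_j = rng.gauss(0.0, sigma_u)  # one random intercept shift per group
        groups.append([b0 + u_j + rng.gauss(0.0, sigma_e) for _ in range(n_per_group)])
    return groups


def anova_variance_components(groups):
    """One-way ANOVA estimates of (sigma_u^2, sigma_e^2, ICC) for equally sized groups."""
    k, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)  # between-group mean square
    msw = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (k * (n - 1))  # within
    var_u = max((msb - msw) / n, 0.0)  # E[MSB] = sigma_e^2 + n*sigma_u^2; truncate at zero
    return var_u, msw, var_u / (var_u + msw)


groups = simulate_groups(n_groups=2000, n_per_group=5, b0=10.0, sigma_u=2.0, sigma_e=3.0)
var_u, var_e, icc = anova_variance_components(groups)
print(f"sigma_u^2 = {var_u:.2f}, sigma_e^2 = {var_e:.2f}, ICC = {icc:.3f} (true ICC = {4 / 13:.3f})")
```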
Random slope model
• For a random slope model (with a common intercept), we can express the regression line(s) and the variance components as
  yij = β0 + β1j·xij + eij
  β1j = β1 + wj
  eij ~ N(0, σe²)   (individual)
  wj ~ N(0, σw²)   (group)
Random slope and intercept model
• For a random slope and random intercept model, we can express the regression line(s) and the variance components as
  yij = β0j + β1j·xij + eij
  β1j = β1 + wj
  β0j = β0 + uj
  eij ~ N(0, σe²)   (individual)
  uj ~ N(0, σu²)   (group)
  wj ~ N(0, σw²)   (group)
  (uj and wj may be correlated)
Cortisol data
• Cortisol level in saliva measured each morning on 3 days, in two periods*
• 55 individuals
• 278 observations (52 of the 55 × 6 = 330 possible measurements missing)
* The real data were measured 5 times per day, on 3 days, in 3 periods; from the article:
Harris A, Marquis P, Eriksen HR, Grant I, Corbett R, Lie SA, Ursin H. Diurnal rhythm in British Antarctic personnel. Rural Remote Health. 2010 Apr-Jun;10(2):1351.
Cortisol data – long data format
ID
2
2
2
2
2
2
4
4
4
4
4
4
6
6
6
6
6
6
8
8
8
8
8
8
9
9
9
9
9
9
29
29
29
29
Period
1
1
1
2
2
2
1
1
1
2
2
2
1
1
1
2
2
2
1
1
1
2
2
2
1
1
1
2
2
2
1
1
1
2
Day
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
2
3
1
Kortisol
6.83
6.44
7.09
8.34
5.45
9.59
7.94
10.61
9.27
1.65
3.62
9.13
6.62
15.02
1.49
6.94
22.50
37.76
20.04
25.08
1.97
Period1
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
Period2
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
1
1
0
0
0
1
Day1
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
Day2
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
Day3
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
0
1
0
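The reshaping from the wide layout (one row per ID) to the long layout (one row per measurement) can be sketched as follows. The function name and the dictionary layout are assumptions; the values for IDs 2 and 219 are complete rows from the cortisol table, while ID 999 is a hypothetical individual with two missing measurements. Missing values are dropped here, whereas the slide keeps such rows with an empty Kortisol cell; either way they contribute nothing to the fitted models.

```python
# Column order of the wide table: per1day1, per1day2, per1day3, per2day1, per2day2, per2day3
def wide_to_long(wide):
    """Turn {ID: [6 values or None]} into one dict per non-missing measurement."""
    rows = []
    for id_, values in wide.items():
        for idx, value in enumerate(values):
            if value is None:
                continue  # missing measurement: no row in the long file
            period, day = idx // 3 + 1, idx % 3 + 1
            rows.append({
                "ID": id_, "Period": period, "Day": day, "Kortisol": value,
                # dummy (indicator) variables used in the models
                "Period1": int(period == 1), "Period2": int(period == 2),
                "Day1": int(day == 1), "Day2": int(day == 2), "Day3": int(day == 3),
            })
    return rows


wide = {
    2: [6.83, 6.44, 7.09, 8.34, 5.45, 9.59],
    219: [25.02, 16.49, 5.64, 10.18, 11.48, 7.87],
    999: [4.0, None, 6.0, 5.5, None, 7.5],  # hypothetical ID with 2 missing values
}
for row in wide_to_long(wide):
    print(row)
```

With all 55 individuals, the 330 possible measurements minus the 52 missing ones leave the 278 rows used in the models.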
Cortisol data
[Figure: Kortisol level (0 to 30) against Days (1 to 3), one panel for Period1 and one for Period2]
Linear model
Stata:
. glm kortisol period2 day2 day3, cluster(id)
(. regress kortisol period2 day2 day3, cluster(id))

Generalized linear models                          No. of obs      =       278
Optimization     : ML                              Residual df     =       274

                                 (Std. Err. adjusted for 55 clusters in id)
------------------------------------------------------------------------------
             |               Robust
    kortisol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     period2 |  -2.536544   .9788702    -2.59   0.010    -4.455094   -.6179938
        day2 |   .1313347   .7238506     0.18   0.856    -1.287386    1.550056
        day3 |   .6528685   .7052775     0.93   0.355      -.72945    2.035187
       _cons |   11.31802   .9542124    11.86   0.000     9.447799    13.18824
------------------------------------------------------------------------------
Linear mixed model (variance component)
Stata:
. gllamm kortisol period2 day2 day3, i(id)

number of level 1 units = 278
number of level 2 units = 55

------------------------------------------------------------------------------
    kortisol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     period2 |  -2.600979   .6875339    -3.78   0.000     -3.94852   -1.253437
        day2 |     .05486   .8154391     0.07   0.946    -1.543371    1.653091
        day3 |   .5183787   .8242555     0.63   0.529    -1.097132     2.13389
       _cons |   11.29695   .7666444    14.74   0.000     9.794358    12.79955
------------------------------------------------------------------------------

Variance at level 1: 31.202774 (2.9334224)
Variances and covariances of random effects
level 2 (id) var(1): 8.6764463 (2.8796675)

ICC=0.218
Linear mixed model (variance component)
R:
library(lme4)
lmer(Kortisol ~ 1 + Day2 + Day3 + Period2 + (1|ID), data = kortisol)

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept)  8.8683  2.9780
 Residual             31.6173  5.6229
Number of obs: 278, groups: ID, 55

Fixed effects:
            Estimate Std. Error t value
(Intercept) 11.29105    0.77282  14.610
Day2         0.05431    0.82076   0.066
Day3         0.51766    0.82946   0.624
Period2     -2.60115    0.69204  -3.759

ICC=0.219
Cortisol data
[Figure: Kortisol level (0 to 30) against Days (1 to 3), one panel for Period1 and one for Period2]
Linear mixed model (variance component)
PASW:
MIXED Kortisol BY ID WITH Period2 Day2 Day3
/FIXED=Period2 Day2 Day3 | SSTYPE(3)
/METHOD=REML
/PRINT=SOLUTION
/RANDOM=ID | COVTYPE(VC).
ICC=0.219
Linear mixed model (random intercept model)
R:
lmer(Kortisol ~ 1 + Day + Period2 + (1|ID), data = kortisol)

Random effects:
 Groups   Name        Variance Std.Dev.
 ID       (Intercept)  8.8879  2.9813
 Residual             31.4891  5.6115
Number of obs: 278, groups: ID, 55

Fixed effects:
            Estimate Std. Error t value
(Intercept)  11.2281     0.7394  15.186
Day           0.2546     0.4137   0.616
Period2      -2.6007     0.6907  -3.765

ICC=0.220
Linear mixed model (random intercept model)
[Figure: Kortisol level (0 to 30) against Days (1 to 3), one panel for Period1 and one for Period2]
Linear mixed model (random slope model)
R:
lmer(Kortisol ~ 1 + Day + Period2 + (Day - 1|ID), data = kortisol)

Random effects:
 Groups   Name Variance   Std.Dev.
 ID       Day  6.2228e-08 0.00024945  !
 Residual      4.0499e+01 6.36390166
Number of obs: 278, groups: ID, 55

Fixed effects:
            Estimate Std. Error t value
(Intercept)  11.2575     0.6948  16.202
Day           0.3227     0.4660   0.692
Period2      -2.5361     0.7644  -3.318

! The random-slope variance is practically zero.
Linear mixed model (random slope model)
[Figure: Kortisol level (0 to 30) against Days (1 to 3), one panel for Period1 and one for Period2]
Linear mixed model (random slope & intercept)
R:
lmer(Kortisol ~ 1 + Day + Period2 + (1 + Day|ID), data = kortisol)

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 ID       (Intercept) 10.88014 3.29851
          Day          0.10535 0.32457  -1.000
 Residual             31.38000 5.60179
Number of obs: 278, groups: ID, 55

Fixed effects:
            Estimate Std. Error t value
(Intercept)  11.2138     0.7629  14.698
Day           0.2656     0.4149   0.640
Period2      -2.5940     0.6891  -3.764

ICC=0.257 (from the intercept variance only, i.e. at Day = 0; with a random slope the ICC varies with Day)
Linear mixed model (random slope model)
[Figure: Kortisol level (0 to 30) against Days (1 to 3), one panel for Period1 and one for Period2]
Summary
• Parameter estimates for categorical variables (preferably dummy variables) from linear models can be interpreted as mean differences, as in an ordinary t-test
• The same holds in models for repeated or clustered observations!
Software: personal opinion
• PASW/SPSS
  – Very easy to fit simple models (menu/syntax)
  – Arrange data
• Stata
  – Steeper learning curve to start
  – Easy to extend the simpler models to more sophisticated models (e.g. for other distributions!)
  – gllamm
• R
  – Steep learning curve
  – Nice graphics
• MLwiN
  – Based on mouse clicking (impossible syntax)
  – Informative screen using formulas
  – Estimation methods:
    IGLS – Iterative Generalised Least Squares
    RIGLS – Residual/Restricted Iterative Generalised Least Squares
    MCMC – Markov Chain Monte Carlo
    Bootstrap – «Baron von Munchausen»
• SAS
  – Steep learning curve
  – “Similar” to SPSS
Extended models - Stata
Extended models - SPSS
Extended models – Stata (gllamm)
Family (F):
gaussian
poisson
gamma
binomial
and link (g):
identity
log
reciprocal
logit
probit
cll (complementary log-log)
ll (log-log)
ologit (o stands for ordinal)
oprobit
ocll
mlogit
sprobit (scaled probit)
soprobit
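For orientation, the sketch below writes out textbook definitions of some of the links listed above, g(μ), together with their inverses μ = g⁻¹(η). It is not gllamm code: the dictionaries are hypothetical helpers, and the log-log, ordinal, multinomial and scaled links are left out.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used by the probit link

# g(mu): maps the mean of the outcome to the linear predictor eta
LINKS = {
    "identity": lambda mu: mu,
    "log": math.log,
    "reciprocal": lambda mu: 1.0 / mu,
    "logit": lambda mu: math.log(mu / (1.0 - mu)),
    "probit": _STD_NORMAL.inv_cdf,
    "cll": lambda mu: math.log(-math.log(1.0 - mu)),  # complementary log-log
}

# g^{-1}(eta): maps the linear predictor back to the mean
INVERSE_LINKS = {
    "identity": lambda eta: eta,
    "log": math.exp,
    "reciprocal": lambda eta: 1.0 / eta,
    "logit": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    "probit": _STD_NORMAL.cdf,
    "cll": lambda eta: 1.0 - math.exp(-math.exp(eta)),
}

# Round trip for a probability-scale mean of 0.3
for name in LINKS:
    eta = LINKS[name](0.3)
    print(f"{name:>10}: eta = {eta:+.4f}, back-transformed mu = {INVERSE_LINKS[name](eta):.4f}")
```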
Extended models
• gllamm also allows for probability weighting (e.g. to adjust for dropout)
• The “svyset” (survey set) extension also allows for probability weighting and robust variance estimates (linear models, logistic models, …)
Random slope and intercept model
• For a random slope and random intercept model, we can express the general regression line(s) and the variance (components) as
  g(E[yij | uj, wj]) = β0j + β1j·xij
  β1j = β1 + wj
  β0j = β0 + uj
  yij ~ F   (individual, given uj and wj)
  uj ~ N(0, σu²)   (group)
  wj ~ N(0, σw²)   (group)