Introduction to Linear Mixed Models
This course will begin with a brief review of linear models in regression
analysis and ANOVA. The concept of random effects will be introduced
and illustrated with several examples, including nested data classifications,
experimental designs, and observational studies. The importance of properly
using mixed models will be illustrated by comparing results of tests of
hypotheses and standard errors with and without recognizing random effects.
PROC MIXED in SAS will be used to demonstrate basic mixed model
analyses.
Exercise sessions will give students the opportunity to explore data analysis
on basic data sets and become familiar with PROC MIXED. Emphasis is on
getting a proper analysis and interpretation of results.
1. Review of Regression Analysis
Example: Cost of operation of livestock auction market.
Numbers of head of cattle, calves, hogs and sheep were recorded at nineteen
livestock auction markets, along with cost of operation of the markets. The
objective was to relate cost to numbers of head of the livestock categories.
A linear regression model is

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε   (1.1)

where

y = cost, x1 = cattle, x2 = calves, x3 = hogs, and x4 = sheep,

and ε is the random error. The errors are assumed to be normally and
independently distributed with mean 0 and variance σ², abbreviated
NID(0, σ²). The expected value of y is

E(y) = β0 + β1x1 + β2x2 + β3x3 + β4x4   (1.2)

and the variance is

V(y) = V(ε) = σ².   (1.3)

The value of βi is the expected increase in y due to a one-unit increase in the
value of the variable xi.
The method of Ordinary Least Squares (OLS) was used to estimate
parameters of the regression model, giving estimates and standard errors:
                         intercept   cattle   calves   hogs   sheep
parameter βi                β0         β1       β2      β3      β4
parameter estimate β̂i      2.29       3.22     1.61    0.81    0.80
standard error se(β̂i)      3.39       0.42     0.85    0.47    0.19
The prediction equation is denoted

ŷ = β̂0 + β̂1x1 + β̂2x2 + β̂3x3 + β̂4x4.   (1.4)

For the auction market data, the prediction equation is

ŷ = 2.29 + 3.22x1 + 1.61x2 + 0.81x3 + 0.80x4.   (1.5)

Thus, the estimated increase in operation cost, y, due to a unit increase in
cattle is β̂1 = $3.22. This is the cost per head of cattle of operating an
auction market. The standard error of the estimate is se(β̂1) = 0.42.
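The course demonstrates these computations with SAS, but the estimate β̂ = (X'X)⁻¹X'Y is easy to sketch directly. Below is a minimal Python/NumPy illustration; since the full auction market data are not reproduced in the text, the numbers are synthetic with the same structure (19 markets, four head-count variables).

```python
import numpy as np

# Synthetic stand-in for the auction market data (19 markets, 4 variables);
# the real data are not listed in full in the text.
rng = np.random.default_rng(0)
n, k = 19, 4
head_counts = rng.uniform(1, 15, size=(n, k))    # cattle, calves, hogs, sheep
X = np.column_stack([np.ones(n), head_counts])   # prepend an intercept column
beta_true = np.array([2.3, 3.2, 1.6, 0.8, 0.8])
y = X @ beta_true + rng.normal(0, 2, size=n)     # cost with NID(0, sigma^2) errors

# Ordinary least squares: solve the normal equations X'X beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat                             # prediction equation y_hat = X beta_hat
```

A quick check that the normal equations were solved correctly: OLS residuals are orthogonal to the columns of X, so X'(y − ŷ) = 0 up to rounding.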
An analysis of variance (ANOVA) for a regression with k independent
variables and n observations is summarized in the table
Source of     Degrees of      Sum of Squares        Mean Square
Variation     Freedom (DF)    (SS)                  (MS)
Regression    k               SSR = Σj (ŷj − ȳ)²    MSR = SSR/DFR
Error         n−k−1           SSE = Σj (yj − ŷj)²   MSE = SSE/DFE
Total         n−1             SST = Σj (yj − ȳ)²
The total sum of squares, SST, is a measure of total variation in the data.
This dispersion comes from two sources. One is due to having different
values of cattle, calves, hogs, and sheep. It is measured by the regression
sum of squares, SSR. The other source of variation is everything that causes
variation in cost that is not due to variation in cattle, calves, hogs and sheep.
Although there are no two markets in the data set that have the same values
of cattle, calves, hogs and sheep, there are surely other factors that influence
cost, such as management practice and local environmental effects that are
unaccounted for, at least in this data set. These undetermined sources of
variation combined are called error, and are measured by the error sum of
squares SSE. Total variation is the sum of regression and error variation, as
indicated by the mathematical equation SST=SSR+SSE.
The ANOVA table for the auction market data is
Source of     Degrees of      Sum of Squares   Mean Square
Variation     Freedom (DF)    (SS)             (MS)
Regression    4               SSR = 7936.7     MSR = 1984.2
Error         14 = 19−4−1     SSE = 531.0      MSE = 37.9
Total         18 = 19−1       SST = 8467.8
Total variation is SST=8467.8, which is the sum of SSR=7936.7 and
SSE=531.0. Therefore, 94% of the total variation in cost is due to variation
in numbers of cattle, calves, hogs and sheep.
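The decomposition SST = SSR + SSE, and the 94% figure (which is SSR/SST, the coefficient of determination), can be checked numerically. This is a sketch on synthetic data, since the full auction data set is not listed in the text.

```python
import numpy as np

# Sketch of the ANOVA decomposition SST = SSR + SSE for any OLS fit
# with an intercept (synthetic data; illustrative only).
rng = np.random.default_rng(1)
n, k = 19, 4
X = np.column_stack([np.ones(n), rng.uniform(1, 15, size=(n, k))])
y = X @ np.array([2.3, 3.2, 1.6, 0.8, 0.8]) + rng.normal(0, 2, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta_hat
sst = np.sum((y - y.mean()) ** 2)       # total variation
sse = np.sum((y - y_hat) ** 2)          # error variation
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression variation
r_squared = ssr / sst                   # the "94%" figure for the auction data
```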
The MSE gives an estimate of the error variance, σ², denoted σ̂² = 37.9.
The null hypothesis that cost is unrelated to any of the independent variables
is written H0: β1 = β2 = β3 = β4 = 0. You can test this hypothesis using the test
statistic F = MSR/MSE, which has an F distribution with k numerator and
n−k−1 denominator degrees of freedom. For the auction market example, the
value of F is 1984.2/37.9 = 52.3, with significance probability p < 0.0001.
You can test the null hypothesis H0: βi = 0 using the statistic t = β̂i / se(β̂i),
which has a t distribution with n−k−1 degrees of freedom. For example, to
test the importance of hogs in the model, the test statistic is
t = 0.81/0.47 = 1.73, with significance probability p = 0.1054.
The statistic for testing the importance of cattle is t = 3.22/0.42 = 7.62, with
significance probability p < 0.0001. A 95% confidence interval for the true
cost is 3.22 ± (2.15)(0.42). So you may be 95% confident that the true
operating cost per head of cattle, β1, is in the interval 3.22 ± 0.90.
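The t test and confidence interval above can be reproduced from the reported summary numbers alone (estimate 3.22, standard error 0.42, n − k − 1 = 14 df). A short Python sketch using scipy.stats:

```python
from scipy import stats

# Cattle coefficient: reported estimate and standard error from the text
estimate, se, df = 3.22, 0.42, 14

t_stat = estimate / se                     # about 7.67 (the text's 7.62 presumably
                                           # comes from unrounded estimates)
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided significance probability
t_crit = stats.t.ppf(0.975, df)            # about 2.145, the "2.15" multiplier
half_width = t_crit * se                   # about 0.90, the CI half-width
```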
The ANOVA table can be expanded to include sources of variation for each
variable. The sum of squares for a variable measures the reduction in error
sum of squares due to adding that variable to a model that contains other
variables. The value of the sum of squares for a variable therefore
depends on which other variables are in the model. Most computer
programs compute either partial or sequential sums of squares. Here are the
"other variables" for partial and sequential sums of squares for the auction
market example.
Source of      DF    Other variables in model    Other variables in model
variation            (partial)                   (sequential)
x1 = Cattle     1    x2, x3, x4                  none
x2 = Calves     1    x1, x3, x4                  x1
x3 = Hogs       1    x1, x2, x4                  x1, x2
x4 = Sheep      1    x1, x2, x3                  x1, x2, x3
The partial sum of squares for a variable is a measure of the variation in
cost due to that variable apart from (in addition to) the variation due to all
other variables. The sequential sum of squares for a variable is a measure of
the variation in cost due to that variable apart from the variation due to the
variables that precede it in the ordered list of variables. The sequential sums
of squares add up to SSR.
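The distinction can be made concrete in code: a sequential SS compares nested models in order, while a partial SS compares the full model with the model omitting one variable. A sketch on synthetic data (the computation, not the auction numbers, is the point):

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit of y on the columns of X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - X @ beta) ** 2)

# Synthetic data: column 0 is the intercept, columns 1..4 play x1..x4.
rng = np.random.default_rng(2)
n = 19
X = np.column_stack([np.ones(n), rng.uniform(1, 15, size=(n, 4))])
y = X @ np.array([2.3, 3.2, 1.6, 0.8, 0.8]) + rng.normal(0, 2, size=n)

seq_ss, par_ss = [], []
for i in range(1, 5):
    # sequential: drop in SSE when variable i joins its predecessors
    seq_ss.append(sse(X[:, :i], y) - sse(X[:, :i + 1], y))
    # partial: drop in SSE when variable i joins all the other variables
    others = [j for j in range(5) if j != i]
    par_ss.append(sse(X[:, others], y) - sse(X, y))
```

As the text notes, the sequential sums of squares telescope, so they add up to SSR; and for the last variable in the ordering, the sequential and partial sums of squares coincide.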
Here are the partial and sequential analyses for the auction market data:

Source of    DF   Seq. Mean   Seq. F       Seq.       Par. Mean   Par. F       Par.
variation         Squares     Statistics   p-values   Squares     Statistics   p-values
Cattle        1     6582.1     173.53      <.0001       220.7      58.02       <.0001
Calves        1      186.7       4.92      0.0436       136.1       3.59       0.0791
Hogs          1      489.9      12.91      0.0029       113.7       3.00       0.1054
Sheep         1      678.1      17.88      0.0008       678.1      17.88       0.0008
Sums of squares are the same as mean squares because each has one DF.
The F statistics are equal to the mean squares divided by the MSE.
The values of sequential and partial sums of squares can differ markedly,
with correspondingly different inferential conclusions. The sequential test
for hogs is highly significant with p=.0029, whereas the partial test is not
significant with p=.1054. This is because the sequential and
partial tests are assessing the significance of hogs in different models.
The regression model can be written in matrix notation as

Y = Xβ + ε

where

    ⎡ y1 ⎤         ⎡ 1  x11 ... x1k ⎤
    ⎢ y2 ⎥         ⎢ 1  x21 ... x2k ⎥
Y = ⎢ .  ⎥ and X = ⎢ .   .  ...  .  ⎥
    ⎢ .  ⎥         ⎢ .   .  ...  .  ⎥
    ⎣ yn ⎦         ⎣ 1  xn1 ... xnk ⎦

are the n×1 vector of observed data and the n×(k+1) matrix containing the
values of the independent variables, respectively,

    ⎡ β0 ⎤
    ⎢ β1 ⎥
β = ⎢ .  ⎥
    ⎢ .  ⎥
    ⎣ βk ⎦

is the vector of regression coefficients, and

    ⎡ ε1 ⎤
    ⎢ ε2 ⎥
ε = ⎢ .  ⎥
    ⎢ .  ⎥
    ⎣ εn ⎦

is the vector of errors.
For the auction market data,

    ⎡ 27.698 ⎤       ⎡ 1   3.437   5.791   3.268   10.649 ⎤       ⎡ β0 ⎤
    ⎢ 57.634 ⎥       ⎢ 1  12.801   4.558   5.751   14.375 ⎥       ⎢ β1 ⎥
Y = ⎢   .    ⎥,  X = ⎢ .    .       .       .        .    ⎥,  β = ⎢ β2 ⎥,
    ⎢   .    ⎥       ⎢ .    .       .       .        .    ⎥       ⎢ β3 ⎥
    ⎣ 46.890 ⎦       ⎣ 1   8.697   3.005   1.378    3.338 ⎦       ⎣ β4 ⎦

and ε = (ε1, ..., ε19)' is the vector of errors.
The parameter estimates and other regression computations can be
represented conveniently in matrix notation:

β̂ = (X'X)⁻¹X'Y,  SST = Y'(I − n⁻¹J)Y,  SSE = Y'(I − X(X'X)⁻¹X')Y,

and

SSR = SST − SSE = Y'(X(X'X)⁻¹X' − n⁻¹J)Y.

The identity matrix I is an n×n matrix with 1 in each diagonal position and 0
in all non-diagonal positions, and the matrix J is an n×n matrix with 1 in all
positions. These "sums of squares" are all of the form Y'AY, called a
quadratic form, for some n×n matrix A. The number of degrees of freedom
associated with each of the sums of squares is the rank of the matrix of the
quadratic form:

DF(SST) = rank(I − n⁻¹J) = n−1,
DF(SSE) = rank(I − X(X'X)⁻¹X') = n−k−1,
DF(SSR) = rank(X(X'X)⁻¹X' − n⁻¹J) = k.
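The rank identities can be verified numerically. A quick NumPy check for n = 19 and k = 4 (any full-rank X with an intercept column behaves the same way):

```python
import numpy as np

# Degrees of freedom as ranks of the quadratic-form matrices
rng = np.random.default_rng(3)
n, k = 19, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
I = np.eye(n)
J = np.ones((n, n))
H = X @ np.linalg.inv(X.T @ X) @ X.T    # the "hat" matrix

df_sst = np.linalg.matrix_rank(I - J / n)      # n - 1
df_sse = np.linalg.matrix_rank(I - H)          # n - k - 1
df_ssr = np.linalg.matrix_rank(H - J / n)      # k
```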
The estimate β̂ = (X'X)⁻¹X'Y comes from solving the "normal equations"

X'X β̂ = X'Y.

The explicit form of the normal equations is

⎡ n        Σj x1j      ...  Σj xkj     ⎤ ⎡ β̂0 ⎤   ⎡ Σj yj     ⎤
⎢ Σj x1j   Σj x1j²     ...  Σj x1j xkj ⎥ ⎢ β̂1 ⎥   ⎢ Σj x1j yj ⎥
⎢   .         .        ...     .       ⎥ ⎢ .  ⎥ = ⎢    .      ⎥
⎢   .         .        ...     .       ⎥ ⎢ .  ⎥   ⎢    .      ⎥
⎣ Σj xkj   Σj x1j xkj  ...  Σj xkj²    ⎦ ⎣ β̂k ⎦   ⎣ Σj xkj yj ⎦
The covariance matrix of the random vector β̂ is V(β̂) = σ²(X'X)⁻¹.
Inference about the parameter vector is usually in terms of linear forms

a'β̂ = a0β̂0 + a1β̂1 + ... + akβ̂k.

The variance of a'β̂ is V(a'β̂) = a'(X'X)⁻¹a σ². More generally, the
covariance matrix of a set of m linear forms L'β̂ is V(L'β̂) = σ² L'(X'X)⁻¹L,
where L is a (k+1)×m matrix. The expressions provide the tools for inference
regarding linear forms L'β (linear combinations) of the parameter
vector β. Specifically:

Confidence interval for a'β:      a'β̂ ± t(α/2, n−k−1) √(a'(X'X)⁻¹a σ̂²),
Test statistic for H0: a'β = 0:   t = a'β̂ / √(a'(X'X)⁻¹a σ̂²),  DF = n−k−1,
Test statistic for H0: L'β = 0:   F = β̂'L(L'(X'X)⁻¹L)⁻¹L'β̂ / (m σ̂²),  DF = m, n−k−1.
The prediction equation for a given set of values x1, x2, x3, x4 of the
independent variables is ŷ = (1, x1, x2, x3, x4)β̂. This is an example of a linear
form. The vector of predicted values corresponding to the set of
independent variables in the data set is Ŷ = Xβ̂ = X(X'X)⁻¹X'Y, which
motivates the name "hat matrix" in reference to X(X'X)⁻¹X'.

The linear combination ŷ = x'β̂ could be used to estimate the linear
combination x'β = E(y), where x' = (1, x1, x2, x3, x4). The error of estimation is
ŷ − x'β, so the mean squared error of estimation is E(ŷ − x'β)² = x'(X'X)⁻¹x σ².

Also, the linear combination ŷ = x'β̂ could be used to predict a "future value"
of y = x'β + ε. The error of prediction is ŷ − y = x'β̂ − x'β − ε, so the mean
squared error of prediction is E(ŷ − y)² = (1 + x'(X'X)⁻¹x)σ².

These give confidence limits for x'β and prediction limits for y = x'β + ε:

Confidence interval for x'β:   ŷ ± t(α/2, n−k−1) √(x'(X'X)⁻¹x σ̂²),
Prediction interval for y:     ŷ ± t(α/2, n−k−1) √((1 + x'(X'X)⁻¹x)σ̂²).

All these inferential procedures are in the context of the complete model
y = β0 + β1x1 + ... + βkxk + ε. Results of the inference would change if the
model were changed; i.e., if variables were added or removed.
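A sketch of the two interval formulas on synthetic data; the prediction interval is always wider than the confidence interval because of the extra "1 +" term for the future error ε:

```python
import numpy as np
from scipy import stats

# Confidence vs. prediction limits at a new point x0 (synthetic data)
rng = np.random.default_rng(4)
n, k = 19, 4
X = np.column_stack([np.ones(n), rng.uniform(1, 15, size=(n, k))])
y = X @ np.array([2.3, 3.2, 1.6, 0.8, 0.8]) + rng.normal(0, 2, size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - k - 1)   # MSE
x0 = np.array([1.0, 5.0, 3.0, 7.0, 2.0])                     # (1, x1, x2, x3, x4)
leverage = x0 @ np.linalg.inv(X.T @ X) @ x0                  # x0'(X'X)^{-1} x0
t_crit = stats.t.ppf(0.975, n - k - 1)

ci_half = t_crit * np.sqrt(leverage * sigma2_hat)            # for E(y) = x0'beta
pi_half = t_crit * np.sqrt((1 + leverage) * sigma2_hat)      # for a future y
```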
2. Review of Analysis of Variance
Example: Effects of dietary supplements on liver methionine in chickens.
Eight dietary supplements (diets) were randomly assigned to chickens in
pens. Six pens received each diet, giving 48 pens in all. At the end of nine
days, methionine (livermt) was measured in the livers of the chickens.
Let yij denote the measured liver methionine in the jth pen in diet group i. The
data from diet group i, yi1, ..., yi6, are considered to be a random sample of
six values from a population with mean μi and variance σ².
Means and standard deviations for the data are in the table:

Diet (i)       1     2     3      4     5      6     7      8
Mean         6.25  7.80  8.95  10.71  8.54  12.03  6.98  10.35
ni              6     6     6      6     6      6     6      6
Std. dev.    1.54  0.78  1.25   1.95  0.96   1.56  0.60   2.97
A statistical model for the data is

yij = μi + εij,  for i = 1,...,8; j = 1,...,6,   (2.1)

where the εij are NID(0, σ²). In short, there are eight samples, each of size 6,
from eight populations, giving 48 observations in all. The populations have
means μ1, μ2, ..., μ8, and are assumed to have homogeneous variances
σ1² = σ², ..., σ8² = σ².
In a designed experiment such as this, the dietary supplements are called
treatments. In other situations, data groups may occur naturally without
“treatments” being applied. This is the case in sample surveys and
observational studies wherein data may be obtained, for example, on
different ethnic groups or gender groups. In general, the term factor is used
to refer to categories that result from a classification variable. Factor level
refers to the individual categories. For the dietary supplements, the levels
are the eight different diets. For a factor such as gender, the levels are
“male” and “female.”
The liver methionine data is an example of a "one-way classification" of
data because the data are classified according to the levels of a single factor.
Equation (2.1) is called the factor means model because the expected values

E(yij) = μi   (2.2)

are represented directly as population means. It is sometimes convenient to
use the effects representation, which expresses the means as

E(yij) = μ + τi   (2.3)

where μ is a "reference" value, and τi = μi − μ is the difference between the
mean and the reference value. The quantities τi are called the factor effects.
Then the ANOVA model becomes

yij = μ + τi + εij,  for i = 1,...,8; j = 1,...,6,   (2.4)

and is called a factor effects model. The choice of μ is not unique, although
it is common to let it be the mean of the factor means; i.e. μ = (μ1 + ... + μa)/a,
where a is the number of levels of the factor. Then the factor effects are the
differences between the individual means and the overall mean. In other
situations, the reference might be one of the individual means, for example,
μ = μa. Then the effects are differences between the factor level means and
the reference mean.
An analysis of variance for a one-way classification of data partitions the
total variation into sources due to differences between the factor levels, and
to differences within the factor levels. The standard ANOVA table is
Source of        Degrees of      Sum of Squares             Mean Square
Variation        Freedom (DF)    (SS)                       (MS)
Between groups   a−1             SSB = Σi ni (ȳi. − ȳ..)²   MSB = SSB/DFB
Within groups    n.−a            SSW = Σij (yij − ȳi.)²     MSW = SSW/DFW
Total            n.−1            SST = Σij (yij − ȳ..)²

In the notation of the table, ni is the number of observations in factor level i,
the factor and overall sample means are ȳi. = yi./ni and ȳ.. = y../n., where
yi. = Σj yij is the total for factor level i, y.. = Σij yij is the overall total, and
n. = Σi ni is the total number of observations. The sum of squares and mean
square "within" factor levels is often called "error," in keeping with the
regression terminology; thus SSW=SSE and MSW=MSE.
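The one-way sums of squares are easy to compute directly from the definitions. A sketch for a balanced layout like the diet study (a = 8 groups, ni = 6), with made-up data:

```python
import numpy as np

# One-way ANOVA from the definitions, balanced case (synthetic data)
rng = np.random.default_rng(6)
a, n_i = 8, 6
y = rng.normal(10, 2, size=(a, n_i))     # row i holds the ni values for group i

group_means = y.mean(axis=1)             # ybar_i.
grand_mean = y.mean()                    # ybar_..
ssb = n_i * np.sum((group_means - grand_mean) ** 2)   # between groups
ssw = np.sum((y - group_means[:, None]) ** 2)         # within groups ("error")
sst = np.sum((y - grand_mean) ** 2)                   # total

msb = ssb / (a - 1)
msw = ssw / (a * n_i - a)
f_stat = msb / msw                       # tests H0: all group means equal
```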
The ANOVA for the liver methionine data is
Source of       Degrees of      Sum of Squares   Mean Square
Variation       Freedom (DF)    (SS)             (MS)
Between diets   7               SSB = 163.57     MSB = 23.37
Within diets    40 = 48−8       SSW = 104.61     MSW = 2.62
Total           47 = 48−1       SST = 268.18
The ANOVA table provides an estimate of the "error" variance σ², denoted
σ̂² = MSW = 2.62. Also, the ANOVA table contains computations for a
statistic to test the null hypothesis of equality of factor level means,
H0: μ1 = μ2 = ... = μ8. The test statistic is F = MSB/MSW, which has an F
distribution with a−1 numerator and n.−a denominator degrees of freedom.
For the liver methionine data, the value of F is 23.37/2.62 = 8.94, with
significance probability p < 0.0001. Thus, there is highly significant
statistical evidence of difference between the diet population means.
Specific inference about the diet means can be made using linear
combinations of the means. An estimate of the difference μ7 − μ8 is
6.98 − 10.35 = −3.36, with standard error 0.93. Test the significance of the
difference using the t-statistic t = −3.36/0.93 = −3.60 (p = 0.0009).

An estimate of the difference .5(μ5 + μ7) − .5(μ6 + μ8) is
.5(8.54 + 6.98) − .5(12.03 + 10.35) = −3.42, with standard error 0.66. Test
the significance of the difference using the t-statistic
t = −3.42/0.66 = −5.19 (p < 0.0001).
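Both standard errors follow the general rule for a linear combination of independent group means, se = √(MSE Σ ci²/ni). Reproducing them from the reported diet means and MSW = 2.62:

```python
import numpy as np

# Diet means and within-diets mean square from the text
msw, n_i = 2.62, 6
means = np.array([6.25, 7.80, 8.95, 10.71, 8.54, 12.03, 6.98, 10.35])

c1 = np.array([0, 0, 0, 0, 0, 0, 1, -1])          # mu7 - mu8
c2 = np.array([0, 0, 0, 0, .5, -.5, .5, -.5])     # .5(mu5 + mu7) - .5(mu6 + mu8)

est1 = c1 @ means                                 # -3.37 from the rounded means
se1 = np.sqrt(msw * np.sum(c1 ** 2) / n_i)        # about 0.93
est2 = c2 @ means                                 # -3.43 from the rounded means
se2 = np.sqrt(msw * np.sum(c2 ** 2) / n_i)        # about 0.66
```

The small discrepancies against the text's −3.36 and −3.42 arise because the text's estimates come from unrounded means.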
The ANOVA model (2.4) can be represented as a regression model using
"dummy" variables. For each observation, let di be a variable such that di = 1
if the observation is from treatment i and di = 0 if the observation is not from
treatment i. Then the ANOVA model can be written

y = μ + τ1d1 + τ2d2 + ... + τ7d7 + τ8d8 + ε.   (2.5)

The interpretation of the "regression parameters" depends on how they are
defined, e.g. whether μ is defined as the "overall" mean, or a specific factor
mean. This is an important issue to understand when using certain computer
programs, such as the SAS procedures GLM or MIXED.
The ANOVA model (2.5) can be written in matrix notation as

Y = Xβ + ε

where Y = (y1,1, ..., y1,6, y2,1, ..., y8,6)' is the 48×1 vector of observed data and
X is the 48×9 "design" matrix whose first column is all 1s (for μ) and whose
remaining columns are the dummy variables d1, ..., d8:

    ⎡ 1  1  0  ...  0 ⎤
    ⎢ .  .  .  ...  . ⎥
    ⎢ 1  1  0  ...  0 ⎥
X = ⎢ 1  0  1  ...  0 ⎥
    ⎢ .  .  .  ...  . ⎥
    ⎢ .  .  .  ...  . ⎥
    ⎣ 1  0  0  ...  1 ⎦

The normal equations are
           ⎡ 48  6  6  6  6  6  6  6  6 ⎤ ⎡ μ̂  ⎤          ⎡ y.. ⎤
           ⎢  6  6  0  0  0  0  0  0  0 ⎥ ⎢ τ̂1 ⎥          ⎢ y1. ⎥
           ⎢  6  0  6  0  0  0  0  0  0 ⎥ ⎢ τ̂2 ⎥          ⎢ y2. ⎥
           ⎢  6  0  0  6  0  0  0  0  0 ⎥ ⎢ τ̂3 ⎥          ⎢ y3. ⎥
X'X β̂ =   ⎢  6  0  0  0  6  0  0  0  0 ⎥ ⎢ τ̂4 ⎥ = X'Y =  ⎢ y4. ⎥.
           ⎢  6  0  0  0  0  6  0  0  0 ⎥ ⎢ τ̂5 ⎥          ⎢ y5. ⎥
           ⎢  6  0  0  0  0  0  6  0  0 ⎥ ⎢ τ̂6 ⎥          ⎢ y6. ⎥
           ⎢  6  0  0  0  0  0  0  6  0 ⎥ ⎢ τ̂7 ⎥          ⎢ y7. ⎥
           ⎣  6  0  0  0  0  0  0  0  6 ⎦ ⎣ τ̂8 ⎦          ⎣ y8. ⎦

The matrix X'X is only of rank 8, so there is not a unique solution to the
equations. Each of the infinitely many solutions satisfies the equations

μ̂ + τ̂1 = ȳ1., μ̂ + τ̂2 = ȳ2., ..., μ̂ + τ̂8 = ȳ8.

One such solution is

μ̂ = ȳ.., τ̂1 = ȳ1. − ȳ.., ..., τ̂8 = ȳ8. − ȳ..,

and another is

μ̂ = ȳ8., τ̂1 = ȳ1. − ȳ8., ..., τ̂7 = ȳ7. − ȳ8., τ̂8 = ȳ8. − ȳ8. = 0.
This example illustrates a factor (diets) whose effects (the τi in equation 2.4)
are fixed. This means that inference is made about only the factor levels that
are represented in the data set; that is, the eight diets. The next example
illustrates a factor whose levels are random. Before going to that example,
however, we mention random effects in the present example, of a somewhat
different nature.
In fact, there were three birds in each pen. Thus, the analyses already shown
were based on pen averages. Analyses could have been based on individual
bird data, which we denote yijk, the measured methionine in the liver of the
kth chicken in pen j of diet group i. The following ANOVA table shows the
additional source of variation among birds within pens.
Source of           Degrees of      Sum of Squares       Mean Square
Variation           Freedom (DF)    (SS)                 (MS)
Between diets       7               SSD      = 490.71    MSD      = 70.11
Pens within diets   40 = 48−8       SSP(D)   = 313.83    MSP(D)   = 7.84
Birds within pens   96 = 144−48     SSB(P,D) = 254.80    MSB(P,D) = 2.65
Total               143 = 144−1     SST      = 1059.34
The ANOVA model becomes

yijk = μ + τi + pij + εijk,  for i = 1,...,8; j = 1,...,6; k = 1,2,3,   (2.6)

where pij is the random effect of pen j in diet group i, and εijk is the random
effect of bird k in pen j in diet group i. Since the model (2.6) contains both
fixed effects and random effects, it is called a mixed model.

The random pen effects pij are assumed to be distributed NID(0, σp²) and the
bird effects εijk are assumed to be distributed NID(0, σb²). Expected mean
squares are useful to help understand what the mean squares measure.
Source of Variation   DF   Mean Square (MS)    Expected Mean Squares (EMS)
Between diets         7    MSD      = 70.11    σb² + 3σp² + 18 Σi (μi − μ̄.)²/7
Pens within diets     40   MSP(D)   = 7.84     σb² + 3σp²
Birds within pens     96   MSB(P,D) = 2.65     σb²
The quantity φ² = Σi (μi − μ̄.)²/7 is a measure of differences between the diet
means. When H0: μ1 = μ2 = ... = μ8 is true, φ² = 0, and thus both MSD and
MSP(D) would be estimates of σb² + 3σp². Therefore, the ratio

F = MSD/MSP(D)   (2.7)

provides a test statistic for H0: μ1 = μ2 = ... = μ8. The larger the value of F,
the more evidence against H0. Notice that F = 70.11/7.84 = 8.94 is the value
of F that was obtained from the F statistic in the previous ANOVA before
bird variation was introduced. In fact, the sums of squares for diets and pens
in the latter table are simply 3 times the values in the previous table. One
must be careful to utilize the appropriate denominator (called the error term)
for test statistics. If we had used the "residual" error from the ANOVA
table, the test statistic would have been F = 70.11/2.66 = 26.40, which is much
larger than the valid F = 8.94. Likewise, the correct error term must be used in
calculations of standard errors of estimates. This table shows the effect of
using the incorrect residual variance as an error term in computing standard
errors of estimates between diet means instead of the correct mean square
between pens within diets.
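The numerical effect of choosing the wrong denominator is easy to see from the mean squares themselves:

```python
# Mean squares from the bird-level ANOVA in the text
msd, msp_d, msb_pd = 70.11, 7.84, 2.66

f_correct = msd / msp_d   # about 8.94: diets tested against pen-to-pen variation
f_wrong = msd / msb_pd    # about 26.4: badly inflated by ignoring the pen level
```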
Difference to be estimated    Estimate   Correct          Incorrect
                                         standard error   standard error
μ7 − μ8                        −3.36      0.93             0.54
.5(μ5 + μ7) − .5(μ6 + μ8)      −3.42      0.66             0.38
The mixed model for this example can be written in matrix notation as

Y = Xβ + ZU + e

where X is the "design" matrix for the fixed effects, β is the vector of fixed-
effect parameters, Z is the design matrix for the random effects, and U is the
vector of random effects. Here Y is the 144×1 vector of bird measurements
yijk, X is the 144×9 matrix of dummy variables for μ and the diet effects
(three identical rows for the three birds in each pen), β = (μ, τ1, ..., τ8)', Z is
the 144×48 matrix of dummy variables for the pens (the column for pen j of
diet i has 1s in the three rows for the birds in that pen and 0s elsewhere), and
U = (p11, p12, ..., p86)' is the 48×1 vector of random pen effects.
For this example, U is distributed MVN(0, σp²I) and e is MVN(0, σe²I). The
covariance matrix of the observation vector is V(Y) = σp²ZZ' + σe²I. The
explicit form of V(Y) is block diagonal, with one 3×3 block for each of the
48 pens:

       ⎡ V11  0    ...  0   ⎤
       ⎢ 0    V12  ...  0   ⎥
V(Y) = ⎢ .    .    ...  .   ⎥
       ⎢ .    .    ...  .   ⎥
       ⎣ 0    0    ...  V86 ⎦

where

        ⎡ yij1 ⎤   ⎡ σp² + σe²   σp²         σp²       ⎤
Vij = V ⎢ yij2 ⎥ = ⎢ σp²         σp² + σe²   σp²       ⎥  for all i and j.
        ⎣ yij3 ⎦   ⎣ σp²         σp²         σp² + σe² ⎦
In words, observations from birds in the same pen have covariance σp² and
observations in different pens are uncorrelated. That is, Cov(yijk, yijk') = σp² if
k ≠ k', and Cov(yijk, yi'j'k') = 0 if i ≠ i' or j ≠ j'. The quantity ρ = σp²/(σp² + σe²)
is called the intra-class correlation, and is, in fact, the correlation between any
two measures within the same pen.
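A small sketch of the block covariance structure and the intra-class correlation ρ = σp²/(σp² + σe²); the variance-component values here are illustrative, not estimates from the text:

```python
import numpy as np

# Illustrative (made-up) variance components for pens and birds
sigma2_p, sigma2_e = 1.7, 2.6
rho = sigma2_p / (sigma2_p + sigma2_e)     # intra-class correlation

# The 3x3 within-pen covariance block V_ij implied by the model
V_ij = np.full((3, 3), sigma2_p) + sigma2_e * np.eye(3)
corr = V_ij / (sigma2_p + sigma2_e)        # off-diagonal entries equal rho
```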
In more general models, U is MVN(0,G) and e is MVN(0,R), which results
in V(Y)=ZGZ’+R.
3. Analysis of variance for a factor with random effects
Example: Usual intake distribution of calories by pregnant women.
Sixty-five women submitted food intake diaries during the latter 34 weeks of
their pregnancy. The intake records were processed by a computer program
that calculated the number of calories, and other nutritional information.
The number of records per patient ranged from one to eight, and resulted in
324 observations in the data set. This is a one-way classification of data,
with the caloric intake values classified according to patient.
A statistical model for the data is

yij = μ + ai + εij,  for i = 1,...,65; j = 1,2,...,ni,   (3.1)

where the ai are NID(0, σa²) and the εij are NID(0, σ²). Model (3.1) differs
from model (2.4) in that (3.1) has random effects for patients instead of the
fixed effects for the diets. The objective of the kcal study is to estimate the
distribution of the effects, rather than make inference about effects of
individual patients. There are only three parameters in the model: the mean
intake for the population, μ; the variance of the distribution of between-
patient effects, σa²; and the variance of the within-patient "errors," σ².
The table contains an analysis of variance for the kcal data:

Source of          Degrees of      Sum of         Mean Square   Expected MS
Variation          Freedom (DF)    Squares (SS)   (MS)
Between patients   64              SSB = 39.14    MSB = 0.611   σ² + 4.96σa²
Within patients    259             SSW = 32.19    MSW = 0.124   σ²
Total              323             SST = 71.39
The number k = 4.96 in the "Between patients" expected mean square is a
number between 1 and 8, because there were between 1 and 8 observations
per patient. The expected mean squares show that MSB = 0.611 is an
estimate of σ² + 4.96σa² and MSW = 0.124 is an estimate of σ². Therefore,

(MSB − MSW)/4.96 = (0.611 − 0.124)/4.96 = 0.098

is an estimate of σa², the variance of the random effects of patients. The
estimate is denoted σ̂a² = 0.098. An estimate of the mean caloric intake for
the population of women is μ̂ = 2.130. The standard error of the estimate of
μ is .045.
The variances σa² and σ² measure different aspects of variation in the caloric
intake of women. The variance σ² measures the variation of intake within an
individual woman; in other words, the variance of the population consisting
of all daily intake values for an individual woman. (This variance is
assumed to be the same for all women, which probably is not realistic, but
there is not enough data to estimate a separate variance for each woman.)
The estimate of the standard deviation is σ̂ = 0.352. Therefore, using the
empirical rule that approximately 95% of the values in a population are
within ±2 standard deviations of the mean, we would say that approximately
95% of the intake values for an individual woman are within 2(0.352) = 0.704
of the mean for that woman.

The variance σa² measures the variation between the true mean intakes of the
population of women from which the women in the study can be considered
a random sample. The estimate of the standard deviation is σ̂a = 0.313.
Using the estimate of the population mean μ̂ = 2.130 in conjunction with the
standard deviation estimate, we conclude that the true mean intakes of the
population of women are approximately distributed with mean 2.130 and
standard deviation 0.313.
Now let y stand for a kcal observation that is to be made at a randomly
chosen time on a woman randomly chosen from the population. The set of
all conceivable such observations constitutes another population. Its
probability distribution is a mixture of the distributions corresponding to all
women. The mean of that population is μ and the variance is
V(y) = σa² + σ², the sum of the two variance components for between-woman
variation and within-woman variation. The estimate of this variance is
V̂(y) = σ̂y² = σ̂a² + σ̂² = 0.098 + 0.124 = 0.222. The standard deviation is
σ̂y = 0.471. Thus the population is distributed with approximate mean
2.130 and standard deviation 0.471.
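The variance-component arithmetic of this section can be collected in a few lines of plain Python, using the mean squares reported above:

```python
# Method-of-moments estimates of the variance components for the kcal data
msb, msw, k = 0.611, 0.124, 4.96

sigma2_a_hat = (msb - msw) / k          # between-patient variance, about 0.098
sigma2_hat = msw                        # within-patient variance, 0.124
total_var = sigma2_a_hat + sigma2_hat   # variance of a single future observation
sd_y = total_var ** 0.5                 # about 0.471
```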
This is an example of a one-way classification of data from an observational
study, as opposed to the designed experiment of supplemental diets. The
designed experiment had fixed effects of diets and random effects of pens,
making it a mixed model data study. The observational study has random
effects of patients. However, it also has an aspect of fixed effects. If the
data are classified according to trimester of the pregnancy, then "trimester"
could be considered a fixed effect. Let yijk be the kth kcal measurement in
trimester j for patient i. A mixed model for the kcal data would be

yijk = μ + ai + τj + εijk,  for i = 1,...,65; j = 1,2,3; k = 1,...,nij,   (3.2)

where τj is the fixed effect of trimester j. The means for the three trimesters
are μj = μ + τj, j = 1,2,3. A test for the null hypothesis H0: μ1 = μ2 = μ3 is
F = 6.69 with p = 0.0015. Estimates of the means and differences between the
means, with standard errors, are in the table.
Parameter to be    Estimate   Standard   p-values for
estimated                     error      differences
μ1                  2.075      0.054
μ2                  2.209      0.050
μ3                  2.054      0.056
μ1 − μ2            −0.134      0.049      0.0069
μ1 − μ3             0.021      0.056      0.7029
μ2 − μ3             0.155      0.048      0.0014
The basic conclusion is that kcal consumption increases by about 6% from
trimester 1 to trimester 2, and then decreases in trimester 3 to about the same
level as in trimester 1.
The mixed model for this example can be written in matrix notation as

Y = Xβ + ZU + e

where X is the "design" matrix for the fixed effects, β is the vector of fixed-
effect parameters, Z is the design matrix for the random effects, and U is the
vector of random effects. Here Y is the 324×1 vector of kcal observations
yijk, X is the 324×4 matrix of dummy variables for μ and the trimester
effects, β = (μ, τ1, τ2, τ3)', Z is the 324×65 matrix of dummy variables for the
patients (the column for patient i has 1s in the rows for that patient's
observations and 0s elsewhere), and U is the 65×1 vector of random patient
effects ai.

For this example, U is distributed MVN(0, σa²I) and e is MVN(0, σ²I). The
variance of the observation vector is V(Y) = σa²ZZ' + σ²I. In words,
observations from the same patient have covariance σa² and observations
from different patients are uncorrelated.
In more general models, U is MVN(0,G) and e is MVN(0,R), which results
in V(Y)=ZGZ’+R.
4. Mixed model analysis for data with crossed and nested effects
Example: Scores on a standardized exam were obtained from 890
students who were taught by 47 teachers in 14 different schools.
Demographic variables were also recorded, including socio-economic
status (ses). Data were analyzed to assess effects of school and ses on
test scores. School and ses are considered fixed and teacher is considered
random. Teachers are nested within schools, while schools and ses are
crossed.
It is helpful to organize the sources of variation in the style of an
ANOVA.

Source of Variation   DF
Schools               13
Teachers(Schools)     33
Ses                    1
Schools*Ses           13
Let yijkl denote the score obtained by student l in ses group k taught by
teacher j in school i. A statistical model is

yijkl = μik + dij + eijkl   (4.1)

where μik is the population mean score for students of ses k in school
i, dij is the random effect of teacher j in school i, and eijkl is the random
effect of student l in ses group k taught by teacher j in school i. The
population of teacher effects is assumed NID(0, σt²) and the student
effects are assumed NID(0, σ²).

The means can be written in terms of school and ses effects

μik = μ + αi + βk + (αβ)ik   (4.2)

This leads to the mixed model

yijkl = μ + αi + βk + (αβ)ik + dij + eijkl   (4.3)

Mixed model methodology was used to estimate parameters of the model
and make inference about differences between ses groups and schools.
The variance component estimates are σ̂t² = 7.90 and σ̂² = 105.35. These
estimates show a large amount of variation between students but
relatively little due to differences between teachers. Tests of the fixed effects
of school and ses are given by the F statistics
Effect        F       p
School        1.80    0.0745
Ses           10.61   0.0012
School*Ses    0.97    0.4818
The “non-significant” school*ses interaction (p = 0.4818) indicates no
evidence that the ses effect depends on school. That is, the difference
between the two ses groups is approximately the same in all schools. There
is highly significant evidence of a difference in mean scores between ses
groups (p = 0.0012), and marginal evidence of differences between schools
(p = 0.0745).
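With σ̂t² = 7.90 and σ̂² = 105.35 from above, the fraction of score variance attributable to teachers (the intraclass correlation, which is also the correlation between two students who share a teacher) is a one-line computation; the variable names below are ours:

```python
var_teacher = 7.90    # sigma_t^2: between-teacher variance
var_student = 105.35  # sigma^2: residual (between-student) variance

# Intraclass correlation: fraction of total variance due to teachers
icc = var_teacher / (var_teacher + var_student)
print(round(icc, 3))  # about 0.07
```

An ICC this small quantifies the statement in the text: teachers account for only about 7% of the total variability in scores.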
School   Ses=0   Ses=1   Diff   Pdiff
10       28.1    33.6    5.5    .0538
11       37.3    41.3    4.0    .3263
16       32.0    36.5    4.5    .0573
17       34.7    35.7    1.0    .7136
…        …       …       …      …
47       36.7    39.5    2.8    .3637
48       33.3    35.4    2.1    .5416
50       29.5    34.6    5.1    .0573
Mean     32.7    36.5    3.8    .0012
5. Multiple Levels of Sources of Variation
Example: Multi-Center Clinical Trial
Side effects of two drugs were investigated in a multi-center clinical trial.
Patients at fifty-three clinics were randomized to the drugs. Following
administration of the drugs, patients returned to the clinics at five
tri-weekly visits. At each visit, several clinical signs were recorded, including
sitting heart rate. Numbers of patients on each drug generally ranged
from one to ten, although there were no patients on one or the other drug at a
small number of clinics. Also, there were more than ten patients on each
drug at two clinics. Clinics (which are designated “inv”) are considered
random because it was desired to make inferences applicable to a broader
population of clinics. Also, patients are considered random to represent
samples from the populations of patients at each clinic. In
addition, there is residual variation at each visit for each patient.
Let yijkl be the measure of sitting heart rate (si_hr) at time l on patient k on
drug i at clinic j. When developing a statistical model for the data, it is
helpful to imagine the sources of sampling variation as if drugs had not
been assigned. These include random effects of clinic (bj), patient (cijk),
and residual (eijkl) at the measurement times. We assume these are distributed
NID(0, σ²center), NID(0, σ²patient), and NID(0, σ²error), respectively. Assume the
population mean is E(yijkl) = μ. Then the observation may be represented

yijkl = μ + bj + cijk + eijkl.

The variance in an observation due to sampling error is

V(yijkl) = σ²center + σ²patient + σ²error.
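A small simulation makes this decomposition concrete: drawing clinic-, patient-, and visit-level effects independently and summing them yields a total variance close to σ²center + σ²patient + σ²error. The sketch below uses only the Python standard library, and every number in it is invented for illustration:

```python
import random

random.seed(1)  # fixed seed so the check is reproducible

# Invented standard deviations for the three sampling levels
SD_CENTER, SD_PATIENT, SD_ERROR = 2.0, 5.0, 8.0
MU = 75.0  # invented population mean heart rate

ys = []
for clinic in range(200):
    b = random.gauss(0, SD_CENTER)            # clinic effect b_j
    for patient in range(10):
        c = random.gauss(0, SD_PATIENT)       # patient effect c_ijk
        for visit in range(5):
            e = random.gauss(0, SD_ERROR)     # visit-level residual e_ijkl
            ys.append(MU + b + c + e)

mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / (len(ys) - 1)
# Empirical variance should be near
# sigma_center^2 + sigma_patient^2 + sigma_error^2 = 4 + 25 + 64 = 93
print(round(var, 1))
```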
Now consider the effects of administering the drugs. First, consider the
effects on the mean. Let μil = μ + αi + γl + (αγ)il denote the population
mean at visit l for patients administered drug i, where αi, γl, and (αγ)il are
the fixed effects due to drug, visit, and drug*visit interaction. Next, there
is a possible random interaction effect (αb)ij between clinic and drug.
Assume (αb)ij is distributed NID(0, σ²center*drug). Then an observation is
represented
yijkl = μ + αi + bj + (αb)ij + cijk + γl + (αγ)il + eijkl.
The mean and variance are

E(yijkl) = μ + αi + γl + (αγ)il

and

V(yijkl) = σ²center + σ²center*drug + σ²patient + σ²error.
Estimates of the variance components are

σ̂²center = 0.46
σ̂²center*drug = 4.71
σ̂²patient = 30.80
σ̂²error = 62.65.
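From these estimates, the total variance of a single heart-rate measurement and the correlation between two visits on the same patient follow by arithmetic (a sketch; the variable names are ours):

```python
var_center = 0.46
var_center_drug = 4.71
var_patient = 30.80
var_error = 62.65

# Total variance of one observation
total = var_center + var_center_drug + var_patient + var_error  # 98.62

# Two visits on the same patient share the clinic, clinic*drug, and
# patient effects; only the visit-level error is independent.
cov_within_patient = var_center + var_center_drug + var_patient  # 35.97
corr = cov_within_patient / total
print(round(total, 2), round(corr, 3))  # 98.62 0.365
```

So repeated measurements on the same patient are substantially correlated (about 0.36), which is exactly why the mixed-model standard errors differ from those of an analysis that ignores the random effects.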
Tests of fixed effects are

Effect       Num DF   Den DF   F Value   Pr > F
drug         1        43       4.51      0.0395
visit        4        1118     0.36      0.8387
drug*visit   4        1118     1.46      0.2129
Non-significant effects of visit and the drug*visit interaction indicate that
comparison of the overall drug means is justified. Estimates of the means
and the difference between means, with 95% confidence intervals, are in the
table.
Effect        Estimate   Standard Error   Lower CL   Upper CL
Drug 1 Mean   77.19      0.73             75.73      78.66
Drug 2 Mean   75.08      0.70             73.67      76.49
Difference    2.11       0.99             0.11       4.12
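The interval for the difference can be reproduced from the estimate and standard error. We assume a t critical value with 43 denominator degrees of freedom (t ≈ 2.017, matching the denominator DF of the drug test above); the small discrepancy in the upper limit comes from rounding of the reported estimate and standard error:

```python
estimate, se = 2.11, 0.99   # reported difference and its standard error
t_crit = 2.017              # assumed: 97.5th percentile of t with 43 df

lower = estimate - t_crit * se
upper = estimate + t_crit * se
print(round(lower, 2), round(upper, 2))  # about 0.11 and 4.11 (table: 4.12)
```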