Department of Mathematics and Statistics / Statistics

Multilevel modelling: general ideas and uses
Kari Nissinen
Finnish Institute for Educational Research
Hierarchical data
The data in question are organized in a hierarchical / multilevel manner
▪ Units at the lower level (1-5) are arranged into higher-level units (A, B)
Hierarchical data
Students within classes within schools
Employees within workplaces
Partners in couples
Residents within neighbourhoods
Nestlings within broods within populations…
Repeated measures within individuals
Hierarchical data
The key issue is clustering
Lower-level units within an upper-level unit tend to be more homogeneous than two arbitrary lower-level units
E.g. students within a class: positive intra-cluster correlation (ICC)
Repeated measures: autocorrelation (usually positive)
Hierarchical data
▪ Clustering => lower-level units are not independent
▪ In cross-sectional studies this is a problem
Two correlated observations provide less information than two independent observations (partial ’overlap’)
Effective sample size is smaller than the nominal sample size => statistical inference appears more powerful than it really is
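A small worked case may make the ’overlap’ concrete. It is an illustration added here, assuming two observations Y_1 and Y_2 with common variance σ² and correlation ρ; the symbols ρ and n_eff are introduced for this sketch only. The variance of their mean, and the number of independent observations that would give the same variance, are:

```latex
\operatorname{Var}\!\left(\frac{Y_1 + Y_2}{2}\right)
  = \frac{1}{4}\left(\sigma^2 + \sigma^2 + 2\rho\sigma^2\right)
  = \frac{\sigma^2 (1 + \rho)}{2},
\qquad
n_{\mathrm{eff}} = \frac{2}{1 + \rho}.
```

With ρ = 0 the two observations count as two; with ρ = 0.5 they carry the information of about 1.33 independent observations.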
Clustering in cross-sectional studies
▪ Basic statistical methods do not recognize the dependence of observations
• Standard errors (variances) underestimated => confidence intervals too short, statistical tests too liberal
▪ Special methodology needed for correct inference
Design-based approaches (variance estimation in cluster sampling framework)
Model-based approaches: multilevel models
Clustering in cross-sectional studies
▪ Measure of ’inference error’ due to clustering: design effect (DEFF)
= ratio of the correct variance to the underestimated variance (no clustering assumed)
A function of the ratio of nominal sample size to effective sample size and/or homogeneity within clusters (ICC)
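The relation between DEFF, ICC and effective sample size can be sketched numerically. The snippet below is an illustration and not part of the original slides. It assumes the common approximation DEFF = 1 + (m - 1) × ICC for the mean of a variable under equal cluster sizes m. The function names and the example figures (100 classes of 20 students, ICC = 0.2) are hypothetical.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for equal-sized clusters.

    Assumption (not from the slides): DEFF = 1 + (m - 1) * ICC,
    where m is the common cluster size.
    """
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n_total: float, cluster_size: float, icc: float) -> float:
    """Nominal sample size divided by the design effect."""
    return n_total / design_effect(cluster_size, icc)


# Hypothetical example: 100 classes of 20 students each, ICC = 0.2.
deff = design_effect(cluster_size=20, icc=0.2)
n_eff = effective_sample_size(n_total=2000, cluster_size=20, icc=0.2)

# Standard errors computed as if observations were independent are too
# small by roughly a factor of sqrt(DEFF).
se_inflation = math.sqrt(deff)

print(f"DEFF = {deff:.2f}, effective n = {n_eff:.0f}, SE factor = {se_inflation:.2f}")
```

With ICC = 0 the function returns 1, which matches the statement that zero ICC calls for no special methodology.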
Hierarchical data
▪ Hierarchy is a property of the population, which can carry over into the sample data
Cluster sampling: hierarchy is explicitly present in data collection => data possess exactly the same hierarchy (and possible clustering)
Simple random sampling (etc.): clustering may or may not appear in the data
• If present, it is hidden and may be difficult to identify
• Effect may be negligible
Hierarchical data
▪ Hierarchy does not always lead to clustering: units within a cluster can be uncorrelated
The other side of the coin is heterogeneity between upper-level units: if there is no heterogeneity between them, there is no homogeneity among the lower-level units within them
Zero ICC => no need for special methodology
Clustering can affect some target variables but not others
Longitudinal data
▪ Clustering = measurements on an individual are not independent
When analyzing change this is a benefit
• Each unit serves as its own ’control unit’ (’block design’) => ’true’ change
• Autocorrelation ’carries’ this link from time point to time point
• Appropriate methods utilize this correlation => powerful statistical inference
Mixed models
▪ An approach for handling hierarchical / clustered / correlated data
▪ Typically regression or ANOVA models containing effects of explanatory variables, which can be (i) fixed, (ii) random or (iii) both
Linear mixed models: error distribution normal
Generalized linear mixed models: error distribution binomial, Poisson, gamma, etc.
Mixed models
▪ Variance component models
▪ Random coefficient regression models
▪ Multilevel models
▪ Hierarchical (generalized) linear models
All these are special cases of mixed models
Similar estimation procedures (maximum likelihood & its variants), etc.
Fixed vs random effects
▪ 1-way ANOVA fixed effects model
Y(ij) = μ + α(i) + e(ij)
μ = fixed intercept, grand mean
α(i) = fixed effect of group i
e(ij) = random error (’random effect’) of unit ij
• random, because it is drawn from a population
• it has a probability distribution (often N(0,σ²))
Fixed vs random effects
▪ Fixed effects determine the means of observations
E(Y(ij)) = μ + α(i), since E(e(ij)) = 0
▪ Random effects determine the variances (& covariances/correlations) of observations
Var(Y(ij)) = Var(e(ij)) = σ²
Fixed vs random effects
▪ 1-way ANOVA random effects model
Y(ij) = μ + u(i) + e(ij)
μ = fixed intercept, grand mean
u(i) = random effect of group i
• random when the group is drawn from a population
of groups
• has a probability distribution N(0,σ(u)²)
e(ij) = random error (’random effect’) of unit ij
Fixed vs random effects
▪ Now the mean of observations is just
E(Y(ij)) = μ
▪ Variance (u(i) and e(ij) independent)
Var(Y(ij)) = Var(u(i) + e(ij))
= σ(u)² + σ²
Sum of two variance components => variance component model
Random effects and clustering
▪ Random group => units ij and ik (j ≠ k) within group i are correlated:
Cov(Y(ij), Y(ik)) = Cov(u(i) + e(ij), u(i) + e(ik))
= Cov(u(i), u(i)) = σ(u)²
▪ Positive intra-cluster correlation
ICC = Cov(Y(ij), Y(ik)) / Var(Y(ij))
= σ(u)² / (σ(u)² + σ²)
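The ICC formula can be checked by simulation. The sketch below is an illustration and not taken from the slides. It draws data from the random effects model Y(ij) = μ + u(i) + e(ij) and recovers the two variance components with the classical ANOVA (method-of-moments) estimator for balanced data. Mixed-model software would normally use maximum likelihood or its variants instead. All function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_groups(n_groups, group_size, mu, sigma_u, sigma_e, seed=0):
    """Draw balanced data from Y(ij) = mu + u(i) + e(ij)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sigma_u)  # random group effect, shared within the group
        data.append([mu + u + rng.gauss(0.0, sigma_e) for _ in range(group_size)])
    return data


def anova_icc(data):
    """ANOVA estimates of (var_u, var_e, ICC); assumes equal group sizes."""
    m = len(data[0])
    group_means = [statistics.fmean(group) for group in data]
    # Within-group mean square: average of the within-group sample variances.
    msw = statistics.fmean([statistics.variance(group) for group in data])
    # Between-group mean square: m times the sample variance of the group means.
    msb = m * statistics.variance(group_means)
    var_u = max((msb - msw) / m, 0.0)  # truncate negative estimates at zero
    var_e = msw
    return var_u, var_e, var_u / (var_u + var_e)


# Hypothetical setting: sigma_u = 1 and sigma_e = 2, so the true ICC is 1 / (1 + 4) = 0.2.
data = simulate_groups(n_groups=500, group_size=20, mu=50.0,
                       sigma_u=1.0, sigma_e=2.0, seed=1)
var_u, var_e, icc = anova_icc(data)
print(f"var_u = {var_u:.2f}, var_e = {var_e:.2f}, ICC = {icc:.3f}")
```

The estimates vary from seed to seed, but with this many groups they should land close to the true values.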
Mixed model
▪ Contains both fixed and random effects, for example
Y(ij) = μ + βX(ij) + u(i) + e(ij)
i = school, j = student
μ = fixed intercept
β = fixed regression coefficient
u(i) = random school effect (’school intercept’)
e(ij) = random error of student j in school i
Mixed model
Y(ij) = μ + βX(ij) + u(i) + e(ij)
The mean of Y is modelled as a function of
explanatory variable X through the fixed
parameters μ and β
The variance of Y and within-cluster
covariance (ICC) are modelled through the
random effects u (’level 2’) and e (’level 1’)
This is the general idea; it can be extended in many ways
[Figure: regression lines in variance component model, high ICC]
[Figure: regression lines in variance component model, low ICC]
An extension: random coefficient regression
Y(ij) = μ + βX(ij) + u(i) + v(i)X(ij) + e(ij)
v(i) = random school slope
Regression coefficient of X varies between schools: β + v(i)
A ’side effect’: the variance of Y varies along with X
• one possible way to model unequal variances (as a function of X)
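The dependence of the variance on X can be written out explicitly. The derivation below is an added illustration with its own assumptions: e(ij) is independent of u(i) and v(i), Var(v(i)) = σ(v)² and Cov(u(i), v(i)) = σ(uv). These last two symbols do not appear in the slides.

```latex
\operatorname{Var}(Y_{ij} \mid X_{ij})
  = \operatorname{Var}\bigl(u_i + v_i X_{ij} + e_{ij}\bigr)
  = \sigma_u^2 + 2\,\sigma_{uv}\,X_{ij} + \sigma_v^2\,X_{ij}^2 + \sigma^2
```

The variance is thus a quadratic function of X, and it reduces to σ(u)² + σ² of the variance component model when there is no random slope.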
[Figure: random coefficient regression]
Regression for repeated measures
Y(it) = μ(t) + βX(it) + e(it)
t = time, μ(t) = intercept at time t
i = individual
The errors e(it) of individual i are correlated: different (auto)correlation structures (e.g. AR(1)) can be fitted, as well as different variance structures (unequal variances)
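To show what an AR(1) structure implies, the sketch below builds the error correlation matrix of one individual. It is an illustration and not part of the original slides. It assumes equally spaced time points and Corr(e(it), e(is)) = ρ^|t - s|. The function names and the parameter ρ are introduced here for the example.

```python
def ar1_correlation_matrix(n_times, rho):
    """Correlation matrix with entries rho ** |t - s| (equally spaced time points)."""
    return [[rho ** abs(t - s) for s in range(n_times)] for t in range(n_times)]


def ar1_covariance_matrix(n_times, rho, sigma2):
    """Covariance matrix under AR(1) with a common error variance sigma2."""
    return [[sigma2 * r for r in row] for row in ar1_correlation_matrix(n_times, rho)]


# Hypothetical example: four measurement occasions, rho = 0.5.
R = ar1_correlation_matrix(4, 0.5)
for row in R:
    print(row)
```

The correlation decays geometrically with the time lag, so measurements close in time are more alike than distant ones. Unequal variances could be represented by scaling rows and columns with occasion-specific standard deviations instead of a single sigma2.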