Lecture5-12-09 - University of Washington

Methods for Multilevel Analysis
XH Andrew Zhou, PhD
Professor, Department of Biostatistics
University of Washington
Examples of Multilevel (Hierarchical) Data
• Individual-family-neighborhood
• Students-classroom-school district
• Patient-provider-facility (the Ambulatory Care Quality Improvement Project, ACQUIP)
• Other types: multiple outcomes nested within individual
ACQUIP Alcohol Trial
• A group-randomized trial
• Intervention: Feedback given to the providers at each visit on the patient’s general perceived health status as well as the condition-specific perceived health status for 6 common conditions: chronic obstructive pulmonary disease (COPD), coronary artery disease (CAD), hypertension, depression, diabetes, and alcohol problems.
• Outcome at 1-yr follow-up: (1) Self-reports of advice about alcohol from their provider; binary outcome.
Hierarchical Nature of Data
• Patients – Providers – Facility
• Patient’s characteristics, e.g., advice at baseline, comorbidity
• Provider’s characteristics, e.g., panel size
• Facility’s characteristics, e.g., urban vs. rural
Research Questions
• Whether the intervention was significantly associated with patient self-reports of advice about alcohol from their providers one year after the intervention.
• Independent effects of patient-level, provider-level, and facility-level factors.
• Quantification of provider-to-provider variability and facility-to-facility variability, and the degree to which it can be explained by patient-level, provider-level, and facility-level factors.
Research Questions, Cont


Do facilities differ in expected
outcomes after controlling for
individual-level, provider-level, and
facility-level factors?
Do providers differ in expected
outcomes after controlling for
individual-level, provider-level, and
facility-level factors?
Multilevel (Hierarchical) Models
A hierarchical model analysis will treat
the sites and the providers as random
effects and will parse out the amount
of total variation in the outcome that is
attributable to each level of hierarchy.
An example using two-level linear
model on schools

A study of the relationship between
a single student-level predictor
variable (say, socioeconomic status
(SES)) and one student-level
outcome variable (mathematics
achievement) in J schools randomly
drawn from the entire population of
schools.
The SES-Achievement relationship
in one school


• Figure 2.1 provides a scatterplot of this relationship.
• Our regression model would be
$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2),$$
where $Y_i$ is the math achievement score for student $i$, $x_i$ is the socioeconomic status for student $i$, $\beta_0$ is the intercept, and $\beta_1$ is the slope.
Centering covariates
• $\beta_0$ is defined as the expected achievement of a student whose SES is zero.
• It may be helpful to scale the independent variable, X, so that the intercept will be meaningful.
• We center SES by subtracting the mean SES from each score.
• Figure 2.2 shows the regression model with centering.
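The effect of centering can be checked numerically. The sketch below fits the one-school regression by ordinary least squares before and after centering SES; the data values and the helper name `ols_fit` are made up for illustration and do not come from the lecture.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical SES and math achievement scores for one school (made-up values).
ses = [-1.2, -0.5, 0.1, 0.4, 0.9, 1.5]
math_score = [8.0, 10.5, 12.0, 12.5, 15.0, 16.0]

raw_intercept, raw_slope = ols_fit(ses, math_score)

# Center SES by subtracting its mean from each score.
ses_centered = [s - mean(ses) for s in ses]
centered_intercept, centered_slope = ols_fit(ses_centered, math_score)

print("raw:     ", raw_intercept, raw_slope)
print("centered:", centered_intercept, centered_slope)
```

The slope is unchanged by centering, while the intercept becomes the expected achievement of a student with average SES, which here equals the mean achievement score.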
The SES-Achievement relationship
in two schools

• Figure 2.3 shows separate regression models for two schools.
$$Y_{i1} = \beta_{01} + \beta_{11} x_{i1} + \varepsilon_{i1}, \quad \varepsilon_{i1} \sim N(0, \sigma^2)$$
$Y_{i1}$ is the achievement for student $i$ in school 1, and $x_{i1}$ is the socioeconomic status for student $i$ in school 1. $\beta_{01}$ is the intercept and $\beta_{11}$ is the slope in school 1.
$$Y_{i2} = \beta_{02} + \beta_{12} x_{i2} + \varepsilon_{i2}, \quad \varepsilon_{i2} \sim N(0, \sigma^2)$$
$Y_{i2}$ is the achievement for student $i$ in school 2.





• The two lines indicate that School 1 and School 2 differ in two ways:
(1) School 1 has a higher mean than School 2 ($\beta_{01} > \beta_{02}$).
(2) SES is less predictive of achievement in School 1 than in School 2 ($\beta_{11} < \beta_{12}$).
• If students had been randomly assigned to the two schools, we could say that School 1 is both more “effective” and more “equitable”.
• Of course, students are not assigned at random, so such interpretations of school effects are unwarranted without taking into account other differences in student composition.
The SES-Achievement relationship in J
schools (2-level Variance Component)
$$Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2)$$
$Y_{ij}$ is the achievement for student $i$ in school $j$, and $x_{ij}$ is the socioeconomic status for student $i$ in school $j$. $\beta_{0j}$ is the intercept and $\beta_{1j}$ is the slope in school $j$.
For each school, effectiveness and equity are described by the pair of values $(\beta_{0j}, \beta_{1j})$.

• It is often sensible and convenient to assume that the intercept and slope have a bivariate normal distribution across the population of schools:
$$E(\beta_{0j}) = \gamma_0, \quad \mathrm{Var}(\beta_{0j}) = \tau_{00}, \quad E(\beta_{1j}) = \gamma_1, \quad \mathrm{Var}(\beta_{1j}) = \tau_{11}, \quad \mathrm{Cov}(\beta_{0j}, \beta_{1j}) = \tau_{01}$$
Interpretation
• $\gamma_0$: the average school mean for the population of schools
• $\tau_{00}$: the population variance among the school means
• $\gamma_1$: the average SES-achievement slope for the population of schools
• $\tau_{11}$: the population variance among the slopes
• $\tau_{01}$: the population covariance between slopes and intercepts



• Figure 2.4 provides a scatterplot of the relationship between $\beta_{0j}$ and $\beta_{1j}$ for a hypothetical sample of 200 schools.
• There is more dispersion among means than among slopes ($\tau_{00} > \tau_{11}$).
• The two effects tend to be negatively correlated ($\tau_{01} < 0$): schools with high average achievement, $\beta_{0j}$, tend to have a weak SES-achievement relationship, $\beta_{1j}$.
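For reference, the bivariate normal assumption on the school intercepts and slopes can be written in matrix form, and the sign of the covariance carries over to the correlation. This is a restatement of the assumptions above in standard notation, not an additional result from the lecture.

```latex
\begin{pmatrix} \beta_{0j} \\ \beta_{1j} \end{pmatrix}
\sim N\!\left(
\begin{pmatrix} \gamma_0 \\ \gamma_1 \end{pmatrix},
\begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}
\right),
\qquad
\rho(\beta_{0j}, \beta_{1j}) = \frac{\tau_{01}}{\sqrt{\tau_{00}\,\tau_{11}}}.
```

Because the denominator is positive, the intercepts and slopes are negatively correlated exactly when $\tau_{01} < 0$.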
Modeling the second level
• Having examined graphically how schools vary in terms of their intercepts and slopes, we wish to develop a model to predict $\beta_{0j}$ and $\beta_{1j}$ using school characteristics.
• Let $W_j$ be an indicator that takes on a value of one for Catholic schools and a value of zero for public schools.
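Two-level Linear Model, Cont
$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \quad u_{0j} \sim N(0, \tau_{00}),$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}, \quad u_{1j} \sim N(0, \tau_{11}),$$
$$\mathrm{Cov}(u_{0j}, u_{1j}) = \tau_{01}$$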
Interpretation
• $\gamma_{00}$: the mean achievement for public schools
• $\gamma_{01}$: the mean achievement difference between Catholic and public schools
• $\gamma_{10}$: the average SES-achievement slope in public schools
• $\gamma_{11}$: the mean difference in SES-achievement slope between Catholic and public schools
• $u_{0j}$: the unique effect of school $j$ on mean achievement, holding $W_j$ constant
• $u_{1j}$: the unique effect of school $j$ on the SES-achievement slope, holding $W_j$ constant
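To make the level-2 equations concrete, the following sketch generates school-specific intercepts and slopes from them. All numeric values, the dictionary keys, and the function names are hypothetical choices for illustration; only the structure of the two equations comes from the slides.

```python
import math
import random

def draw_random_effects(tau00, tau01, tau11, rng):
    """Draw (u0j, u1j) from a mean-zero bivariate normal with covariance
    [[tau00, tau01], [tau01, tau11]], using a 2x2 Cholesky factor."""
    l11 = math.sqrt(tau00)
    l21 = tau01 / l11
    l22 = math.sqrt(tau11 - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2

def school_coefficients(w, u0, u1, gamma):
    """Level-2 equations: intercept and slope for a school with indicator w."""
    beta0 = gamma["g00"] + gamma["g01"] * w + u0
    beta1 = gamma["g10"] + gamma["g11"] * w + u1
    return beta0, beta1

# Hypothetical parameter values, chosen only for illustration.
gamma = {"g00": 12.0, "g01": 1.5, "g10": 3.0, "g11": -1.0}
tau00, tau01, tau11 = 4.0, -0.6, 0.5

rng = random.Random(0)
schools = []
for j in range(200):
    w = j % 2  # alternate public (0) and Catholic (1) schools
    u0, u1 = draw_random_effects(tau00, tau01, tau11, rng)
    schools.append((w, *school_coefficients(w, u0, u1, gamma)))

print(schools[:3])
```

With both random effects set to zero, a public school has intercept g00 and slope g10, and a Catholic school has intercept g00 + g01 and slope g10 + g11, which matches the interpretation of the fixed effects listed above.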
Estimation methods


• It is not possible to estimate the parameters of these regression models directly because the outcomes ($\beta_{0j}$, $\beta_{1j}$) are not observed.
• However, the data contain the information needed for this estimation.
Estimation methods, cont

• Combining the models in two stages, we obtain
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}) + \gamma_{11} W_j (X_{ij} - \bar{X}) + \delta_{ij},$$
$$\delta_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \varepsilon_{ij}.$$
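The substitution behind the combined model can be written out step by step. The lines below plug the level-2 equations for the intercept and slope into the level-1 model with centered SES; the symbols are those defined on the preceding slides.

```latex
\begin{aligned}
Y_{ij} &= \beta_{0j} + \beta_{1j}(X_{ij} - \bar{X}) + \varepsilon_{ij} \\
       &= (\gamma_{00} + \gamma_{01} W_j + u_{0j})
        + (\gamma_{10} + \gamma_{11} W_j + u_{1j})(X_{ij} - \bar{X}) + \varepsilon_{ij} \\
       &= \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X})
        + \gamma_{11} W_j (X_{ij} - \bar{X})
        + \underbrace{u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \varepsilon_{ij}}_{\delta_{ij}} .
\end{aligned}
```

The first four terms form the fixed part of the model, and the bracketed term is the composite random error.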
Estimation methods, Cont



The overall linear regression model is not
the typical linear model assumed in
standard ordinary least squares (OLS).
Efficient estimation and accurate
hypothesis testing based on OLS require
that the random errors are independent,
normally distributed, and have constant
variance.
In contrast, random errors in our overall
model are dependent within each school
and also have non-constant variances.
Estimation methods, cont.

• The variance of the random errors has the following complicated form:
$$\mathrm{Var}(\delta_{ij}) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}) + \tau_{11}(X_{ij} - \bar{X})^2 + \sigma^2.$$
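The variance of the composite error follows directly from its definition, $\delta_{ij} = u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \varepsilon_{ij}$. The short derivation below assumes, as is standard for this model, that $\varepsilon_{ij}$ is independent of the school-level random effects.

```latex
\begin{aligned}
\operatorname{Var}(\delta_{ij})
&= \operatorname{Var}(u_{0j})
 + 2 (X_{ij} - \bar{X}) \operatorname{Cov}(u_{0j}, u_{1j})
 + (X_{ij} - \bar{X})^2 \operatorname{Var}(u_{1j})
 + \operatorname{Var}(\varepsilon_{ij}) \\
&= \tau_{00} + 2 \tau_{01} (X_{ij} - \bar{X}) + \tau_{11} (X_{ij} - \bar{X})^2 + \sigma^2 .
\end{aligned}
```

The cross term vanishes only when $\tau_{01} = 0$ or when a student's SES equals the mean. In every case the variance depends on $X_{ij}$, which is why it is not constant across students.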
Estimation methods, cont




• Though standard regression analysis is not appropriate, such models can be estimated by an iterative maximum likelihood procedure.
• Figure 2.5 provides a graphical representation of the model specified above.
• Here we see two hypothetical plots of the association between $\beta_{0j}$ and $\beta_{1j}$, one for public and a second for Catholic schools.
• The plots show that Catholic schools have both higher mean achievement and weaker SES effects than do the public schools.
Estimation methods, Cont
• Three types of parameters to be estimated:
1. Fixed effects ($\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11}$)
2. Random level-1 coefficients ($\beta_{0j}, \beta_{1j}$)
3. Variance-covariance components ($\sigma^2, \tau_{00}, \tau_{11}, \tau_{01}$)
Three common estimation methods



• Maximum likelihood (ML): a general estimation procedure that produces estimates of the population parameters that maximize the probability of observing the data given the model.
• Iterative generalized least squares (IGLS) and restricted iterative generalized least squares (RIGLS).
• Bayesian methods.
ML method

• Two different likelihood functions:
1. Full Maximum Likelihood (FML): both the regression coefficients and the variance components are included in the likelihood function.
2. Restricted Maximum Likelihood (RML): only the variance components are included in the likelihood function, and the regression coefficients are estimated in a second estimation step.
Comparison of these two methods



• FML is more efficient and can provide estimates for both the variance components and the fixed-effect parameters. However, FML may produce biased estimates of the variance components.
• RML provides less biased estimates of the variance components and is equivalent to the ANOVA estimates, which are optimal, if the groups are balanced.
• FML continues to be used because (1) its computation is generally easier, and (2) it is easier to compare two models that differ in the fixed parameters using likelihood-based tests. With RML, only differences in the random part can be compared with likelihood-based tests.
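The bias of FML variance estimates is easiest to see in the simplest possible case, a single normal sample whose only fixed effect is the mean. There the full ML estimate of the variance divides the sum of squares by n, while the restricted ML estimate divides by n - 1. The sketch below illustrates only this special case, with made-up data; it is not an implementation of FML or RML for multilevel models.

```python
from statistics import mean

def variance_estimates(y):
    """Return (ML, REML) variance estimates for an i.i.d. normal sample.

    ML divides the residual sum of squares by n; REML divides by n - 1,
    accounting for the one degree of freedom used to estimate the mean.
    """
    n = len(y)
    y_bar = mean(y)
    ss = sum((yi - y_bar) ** 2 for yi in y)
    return ss / n, ss / (n - 1)

# Made-up data for illustration.
y = [4.0, 7.0, 6.0, 3.0, 5.0]
ml_var, reml_var = variance_estimates(y)
print("ML:", ml_var, "REML:", reml_var)
```

The ML estimate is always the smaller of the two, and the gap shrinks as n grows, which is consistent with the bias mattering most in small samples.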
IGLS and RIGLS

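• The combined model is
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10}(X_{ij} - \bar{X}) + \gamma_{11} W_j (X_{ij} - \bar{X}) + \delta_{ij},$$
$$\mathrm{Var}(\delta_{ij}) = \tau_{00} + 2\tau_{01}(X_{ij} - \bar{X}) + \tau_{11}(X_{ij} - \bar{X})^2 + \sigma^2,$$
$$\mathrm{Cov}(\delta_{ij}, \delta_{kj}) = \tau_{00} + \tau_{01}\left[(X_{ij} - \bar{X}) + (X_{kj} - \bar{X})\right] + \tau_{11}(X_{ij} - \bar{X})(X_{kj} - \bar{X})$$
for $i \neq k$. Or we can rewrite the model as
$$Y \sim N(X\gamma, \Sigma), \quad Y = (Y_{11}, \ldots, Y_{m_J J})', \quad \gamma = (\gamma_{00}, \gamma_{01}, \gamma_{10}, \gamma_{11})'.$$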
IGLS and RIGLS, Cont


• If $\sigma^2$, $\tau_{00}$, $\tau_{11}$, and $\tau_{01}$ were known, then the covariance matrix, $\Sigma$, could be constructed immediately, and the estimation could be performed with generalized least squares.
• However, without knowledge of the covariance matrix, the estimation method is instead an iterative process known as iterative generalized least squares (IGLS).
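As a rough illustration of how the covariance matrix could be constructed from known variance components, the sketch below builds the block for a single school. Each entry is the covariance of the composite errors $u_{0j} + u_{1j}(X_{ij} - \bar{X}) + \varepsilon_{ij}$ for two students in the same school, with the level-1 variance added on the diagonal. The function name and all numeric values are hypothetical.

```python
def school_covariance(x_centered, tau00, tau01, tau11, sigma2):
    """Covariance block of the composite errors for one school.

    x_centered holds the centered SES values of the school's students.
    Off-diagonal entries share the school-level random effects only;
    diagonal entries also include the level-1 variance sigma2.
    """
    n = len(x_centered)
    block = [[0.0] * n for _ in range(n)]
    for i, ci in enumerate(x_centered):
        for k, ck in enumerate(x_centered):
            cov = tau00 + tau01 * (ci + ck) + tau11 * ci * ck
            if i == k:
                cov += sigma2
            block[i][k] = cov
    return block

# Made-up centered SES values and variance components.
x_centered = [-1.0, 0.0, 2.0]
block = school_covariance(x_centered, tau00=4.0, tau01=-0.5, tau11=0.25, sigma2=9.0)
for row in block:
    print(row)
```

Students in different schools are uncorrelated under the model, so the full covariance matrix is block diagonal with one such block per school.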
IGLS and RIGLS, Cont


• The first step is to start with reasonable estimates of the fixed parameters. Typically these are the estimates from Ordinary Least Squares (OLS), which assumes $\tau_{00} = \tau_{11} = \tau_{01} = 0$.
• From these estimates, the raw residuals are formed:
$$\tilde{y}_{ij} = y_{ij} - \hat{\beta}_0 - \hat{\beta}_1 x_{ij}$$
IGLS and RIGLS, Cont
Let $\tilde{Y}$ be the vector of raw residuals. It can be shown that
$$E[\tilde{Y}\tilde{Y}^T] = \Sigma.$$
The estimation of the variance components involves an application of Generalized Least Squares (GLS).
GLS is a regression technique that is used when the error terms from OLS estimation display non-random patterns, such as correlation.
IGLS and RIGLS, Cont


• With the estimates of the variance and covariance components from GLS, the iterative procedure returns to the fixed part of the model and calculates new estimates of the fixed effects.
• The procedure alternates between the fixed and random effects in this way until convergence, that is, until the parameter estimates do not change from iteration to iteration.
IGLS and RIGLS, Cont




IGLS estimation may produce biased estimates of
the random parameters because it does not take
into account the sampling variation of the
estimates of the fixed effects.
This may be most severe in small samples.
However, unbiased estimates can be produced
using Restricted Iterative Generalized Least
Squares (RIGLS).
The main difference between IGLS and RIGLS is
that IGLS uses maximum likelihood and RIGLS
uses restricted maximum likelihood.
Bayesian method



Bayesian methods combine any prior information
about the parameters with the information
contained in the data to produce a posterior
distribution.
MCMC methods are commonly used
computational methods for generating a random
sample from a posterior distribution.
MCMC methods are also iterative and include
Gibbs sampling and Metropolis-Hastings sampling.
MCMC methods tend to produce more accurate
interval estimates for small samples.
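A minimal sketch of Metropolis-Hastings sampling is given below for a deliberately simple problem: the posterior of a normal mean with known unit variance and a normal prior. The data, the prior standard deviation, the step size, and the function names are all assumptions made for illustration; a multilevel model would have many more parameters.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0):
    """Unnormalized log posterior: N(0, prior_sd^2) prior, N(mu, 1) likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data)
    return log_prior + log_lik

def metropolis_hastings(data, n_iter=20000, step=0.5, seed=1):
    """Random-walk Metropolis-Hastings draws for the mean mu."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        draws.append(mu)
    return draws

# Made-up observations.
data = [1.8, 2.4, 2.1, 1.6, 2.6, 2.0, 1.9, 2.3]
draws = metropolis_hastings(data)
kept = draws[2000:]  # discard an initial burn-in period
posterior_mean_estimate = sum(kept) / len(kept)
print(posterior_mean_estimate)
```

For this conjugate example the exact posterior mean is sum(data) / (n + 1 / prior_sd^2), so the sample average of the retained draws can be checked against it.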
Three-level binary response
models for the Alcohol Drinking


• Let $Y_{ijk}$ be the binary response variable indicating whether subject $i$, cared for by provider $j$ in hospital $k$, received drinking advice.
• $X_{ijk}$ is the intervention status for subject $i$ of provider $j$ in hospital $k$.
Three-level logistic regression
$$\operatorname{logit}(\Pr(Y_{ijk} = 1)) = \beta_{0jk} + \beta_1 X_{ijk} + e_{ijk}$$
$$\beta_{0jk} = \beta_0 + v_k + u_{jk}$$
$$v_k \sim N(0, \sigma_v^2), \quad u_{jk} \sim N(0, \sigma_u^2), \quad e_{ijk} \sim N(0, \sigma_e^2)$$



• The parameter $\sigma_e^2$ provides a natural test of whether the assumption of binomial variation is valid.
• If $\sigma_e^2$ is significantly different from one, the data are said to exhibit extra-binomial variation.
• If $\sigma_e^2$ is less than one, the data are said to be under-dispersed, and if it is greater than one, the data are said to be over-dispersed.
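To show how the hospital and provider random effects enter the model, the sketch below simulates binary outcomes from a three-level logistic structure. It makes several simplifying assumptions that are not from the lecture: the intervention is assigned by alternating hospitals, the patient-level term is left out so that the Bernoulli draw supplies the patient-level variation, and all parameter values and names are hypothetical.

```python
import math
import random

def inverse_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_three_level(n_hospitals, n_providers, n_patients,
                         beta0, beta1, sigma_v, sigma_u, seed=0):
    """Simulate (hospital, provider, patient, x, y) rows.

    v_k is a hospital effect, u_jk a provider-within-hospital effect,
    and x is the intervention indicator (assigned per hospital here).
    """
    rng = random.Random(seed)
    rows = []
    for k in range(n_hospitals):
        v_k = rng.gauss(0.0, sigma_v)
        x = k % 2  # assumption: alternate control and intervention hospitals
        for j in range(n_providers):
            u_jk = rng.gauss(0.0, sigma_u)
            p = inverse_logit(beta0 + v_k + u_jk + beta1 * x)
            for i in range(n_patients):
                y = 1 if rng.random() < p else 0
                rows.append((k, j, i, x, y))
    return rows

# Hypothetical parameter values.
rows = simulate_three_level(n_hospitals=4, n_providers=3, n_patients=5,
                            beta0=-1.0, beta1=0.3, sigma_v=0.2, sigma_u=0.4)
print(len(rows), rows[:3])
```

Patients of the same provider share both v_k and u_jk, and patients in the same hospital share v_k, which is what induces the clustering that the multilevel model is meant to capture.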
Two estimation methods
Two estimation methods for multilevel logistic regression models:
• A quasi-likelihood approach
• Bayesian approach with MCMC
methods. I will briefly describe
these two approaches below.
Two Quasi-likelihood methods




For the quasi-likelihood approach, the first step in
the estimation is to approximate the non-linear
logistic regression equation using a Taylor series
expansion. A Taylor series approximates a nonlinear
function by an infinite series of terms.
If only the first term in the series is used, then the
estimation is known as a first order approximation.
If the second term in the series is also used, then it is
referred to as a second order approximation.
If the Taylor series is expanded about the fixed
parameters only, then the estimation is known as
Marginal Quasi-likelihood (MQL).
Two Quasi-likelihood methods,Cont


If the Taylor series is expanded about
the fixed and the random parameters,
then the estimation is known as
Penalized Quasi-Likelihood (PQL).
Once the quasi-likelihood has been
formed, the estimation procedures,
IGLS and RIGLS, can be applied to
estimate the parameter values.
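The difference between first and second order approximations can be illustrated on the inverse-logit function itself. The sketch below expands it about a point eta0 and compares both approximations with the exact value; the expansion point and evaluation point are arbitrary illustrative choices, and this is not the full MQL or PQL algorithm.

```python
import math

def inverse_logit(eta):
    """Exact inverse-logit (logistic) function."""
    return 1.0 / (1.0 + math.exp(-eta))

def taylor_inverse_logit(eta, eta0, order):
    """Taylor approximation of inverse_logit about eta0 (order 1 or 2)."""
    p0 = inverse_logit(eta0)
    d1 = p0 * (1.0 - p0)          # first derivative at eta0
    d2 = d1 * (1.0 - 2.0 * p0)    # second derivative at eta0
    h = eta - eta0
    approx = p0 + d1 * h
    if order == 2:
        approx += 0.5 * d2 * h ** 2
    return approx

eta0, eta = 0.5, 0.8
exact = inverse_logit(eta)
first_err = abs(taylor_inverse_logit(eta, eta0, 1) - exact)
second_err = abs(taylor_inverse_logit(eta, eta0, 2) - exact)
print(exact, first_err, second_err)
```

Close to the expansion point both approximations are good, and the second order version is more accurate, at the cost of an extra term.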
Bayesian method

The MCMC method used for the
logistic regression equations in this
paper will be Metropolis-Hastings
sampling.
ACQUIP Alcohol Trial
• Binary outcome at 1-yr follow-up: (1) Self-reports of advice about alcohol patients receive from their provider.
• Patient-level covariates
• Provider-level covariates
The Alcohol Example, Cont



Random assignment at the firm level should
ensure that, on average, the two groups are
balanced on the baseline covariates. However,
imbalance may still occur, and confounding may
still present a problem.
Patient-level potential confounders: hypertension,
liver disease, being a smoker in the past year, and
the AUDIT score.
Provider-level potential confounders: the number
of patients per provider (Panel Size) and provider
training.
Alcohol example
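$$\begin{aligned}
\operatorname{logit}(\Pr(Y_{ijk} = 1)) ={}& \beta_{0jk} + \beta_1 X_{ijk} + \beta_2\,\mathrm{AdviceAtBaseline}_{ijk} \\
&+ \beta_3\,\mathrm{Hypertension}_{ijk} + \beta_4\,\mathrm{LiverDisease}_{ijk} + \beta_5\,\mathrm{PastYearSmoker}_{ijk} \\
&+ \beta_6\,\mathrm{BaselineAUDIT}_{ijk} + \beta_7\,\mathrm{PanelSize}_{ijk} + \beta_8\,\mathrm{Fellow}_{ijk} + \beta_9\,\mathrm{NP}_{ijk} \\
&+ \beta_{10}\,\mathrm{PA}_{ijk} + \beta_{11}\,\mathrm{Resident}_{ijk} + \beta_{12}\,\mathrm{RN}_{ijk} + e_{ijk}
\end{aligned}$$
$$\beta_{0jk} = \beta_0 + v_k + u_{jk}$$
$$v_k \sim N(0, \sigma_v^2), \quad u_{jk} \sim N(0, \sigma_u^2), \quad e_{ijk} \sim N(0, \sigma_e^2)$$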
Three-level logistic regression, Cont




Here, the variables Hypertension, LiverDisease,
and PastYearSmoker are dichotomous variables
that are equal to one if the patient reported the
condition and zero if the patient did not report the
condition.
The variable BaselineAUDIT is the patient’s AUDIT
score at baseline and is a continuous variable
that ranges from 0 to 40.
The variable PanelSize indicates the range of the
provider’s panel size.
The variables Fellow, NP, PA, Resident, and RN
are dichotomous variables representing the
categorical variable of provider type. The referent
provider type is staff physician
Results

Table 3.5.1 shows the MQL
estimates under the combinations
of first order and second order
approximation and the binomial and
extra-binomial assumptions and
Table 3.5.2 shows the PQL
estimates under the combinations
of first order and second order
approximation and the binomial and
extra-binomial assumptions.
Results for fixed effects


The estimates for the fixed effects are
quite stable between estimation
procedures. The estimate of the
intervention effect is approximately 1.35,
indicating that a patient in the
intervention group is more likely to report
advice than a patient in the control group.
This result is not significant if a two-tailed
test is used. However, this result is
significant if a one-tailed test is used.
The p-values for the one-tailed test range
from 0.02 to 0.05 depending on which
estimate is considered.
Results for fixed effects, Cont


A patient self-report of advice at baseline
as well as the patient’s baseline AUDIT
score are the only additional variables
significantly associated with a patient
self-report of advice on the one-year
follow-up survey.
None of the provider-level variables are
associated with a patient self-report of
advice on the one-year follow-up survey.
Results for variances and covariances



Estimates of the variance components are slightly
more variable across estimation procedures in
this model.
The estimate of the site level variance component
has increased from approximately zero to be in
the range of 0.01 to 0.04. However, these
estimates tend to include zero in the confidence
interval, indicating as before, that there may be
little or no residual clustering at the site level.
The provider level variance components estimates
are between 0.01 and 0.16, thus showing the
greatest variation under the different estimations.
Results for variances and covariances,
cont

The majority of the residual
variation is at the patient level. The
estimates for the patient level
variance component remain close to
one and support the assumption of
binomial variance at the patient
level.