Module 2:
Bayesian Hierarchical Models
Instructor: Elizabeth Johnson
Course Developed: Francesca Dominici and Michael Griswold
The Johns Hopkins University
Bloomberg School of Public Health
2006 Hopkins Epi-Biostat Summer Institute
Key Points from yesterday

- “Multi-level” Models:
  - Have covariates from many levels and their interactions
  - Acknowledge correlation among observations from within a level (cluster)
- Random effect MLMs condition on unobserved “latent variables” to describe correlations
- Random Effects models fit naturally into a Bayesian paradigm
- Bayesian methods combine prior beliefs with the likelihood of the observed data to obtain posterior inferences
Bayesian Hierarchical Models

Module 2:
- Example 1: School Test Scores
  - The simplest two-stage model
  - WinBUGS
- Example 2: Aww Rats
  - A normal hierarchical model for repeated measures
  - WinBUGS
Example 1:
School Test Scores
Testing in Schools

- Goldstein et al. (1993)
- Goal: differentiate between ‘good’ and ‘bad’ schools
- Outcome: Standardized Test Scores
- Sample: 1978 students from 38 schools
- MLM: students (obs) within schools (cluster)

Possible Analyses:
1. Calculate each school’s observed average score
2. Calculate an overall average for all schools
3. Borrow strength across schools to improve individual school estimates
Testing in Schools

- Why borrow information across schools?
  - Median # of students per school: 48, Range: 1-198
  - Suppose small school (N=3) has: 90, 90, 10 (avg=63)
  - Suppose large school (N=100) has avg=65
  - Suppose school with N=1 has: 69 (avg=69)
  - Which school is ‘better’?
  - Difficult to say: small N → highly variable estimates
  - For larger schools we have good estimates; for smaller schools we may be able to borrow information from other schools to obtain more accurate estimates
- How? Bayes
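To make "small N gives highly variable estimates" concrete, the sketch below computes the sample mean and its standard error (sample sd divided by the square root of N) for the N=3 and N=1 schools quoted on this slide. The helper name `mean_and_se` is made up for illustration, and the standard error is the usual textbook summary rather than anything computed in the course materials.

```python
import math
import statistics

def mean_and_se(scores):
    """Return the sample mean and its standard error, sd / sqrt(N)."""
    n = len(scores)
    mean = statistics.mean(scores)
    if n < 2:
        # With a single student the spread cannot be estimated at all.
        return mean, math.inf
    return mean, statistics.stdev(scores) / math.sqrt(n)

# Scores quoted on the slide for the small (N=3) and single-student (N=1) schools.
small_mean, small_se = mean_and_se([90, 90, 10])
single_mean, single_se = mean_and_se([69])

print("N=3 school:", round(small_mean, 1), "+/-", round(small_se, 1))
print("N=1 school:", single_mean, "+/-", single_se)
```

The N=3 average carries a standard error of roughly 27 points, so a two-point difference from the large school's average of 65 says very little on its own.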
Testing in Schools: “Direct Estimates”
[Figure: Mean Scores & C.I.s for Individual Schools; score (0 to 100) against school (0 to 40), annotated with $b^*_j$]
Model: $E(Y_{ij}) = \mu_j = \mu + b^*_j$
Fixed and Random Effects

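- Standard Normal regression models: $\varepsilon_{ij} \sim N(0, \sigma^2)$

Fixed Effects:
1. $Y_{ij} = \mu + \varepsilon_{ij}$, with $\hat{\mu}_j = \bar{X}$ (overall avg)
2. $Y_{ij} = \mu_j + \varepsilon_{ij} = \mu + b^*_j + \varepsilon_{ij}$, with $\hat{\mu}_j = \bar{X}_j$ (school avg) $= \bar{X} + b^*_j = \bar{X} + (\bar{X}_j - \bar{X})$

A random effects model (Random Effects):
3. $Y_{ij} \mid b_j = \mu + b_j + \varepsilon_{ij}$, with: $b_j \sim N(0, \tau^2)$
   - The distribution of $b_j$ represents prior beliefs about similarities between schools!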
Fixed and Random Effects

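- Standard Normal regression models: $\varepsilon_{ij} \sim N(0, \sigma^2)$

Fixed Effects:
1. $Y_{ij} = \mu + \varepsilon_{ij}$, with $\hat{\mu}_j = \bar{X}$ (overall avg)
2. $Y_{ij} = \mu_j + \varepsilon_{ij} = \mu + b^*_j + \varepsilon_{ij}$, with $\hat{\mu}_j = \bar{X}_j$ (school avg) $= \bar{X} + b^*_j = \bar{X} + (\bar{X}_j - \bar{X})$

A random effects model (Random Effects):
3. $Y_{ij} \mid b_j = \mu + b_j + \varepsilon_{ij}$, with: $b_j \sim N(0, \tau^2)$
   - $\hat{\mu}_j = \bar{X} + b_j^{blup} = \bar{X} + \lambda_j b^*_j = \bar{X} + \lambda_j (\bar{X}_j - \bar{X})$, where $\lambda_j$ denotes the shrinkage factor

- Estimate is part-way between the model and the data
- Amount depends on variability ($\sigma$) and underlying truth ($\tau$)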
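As a rough illustration of how the "part-way" estimate moves with school size, the sketch below uses the standard shrinkage weight for a random-intercept model with known variances, tau^2 / (tau^2 + sigma^2 / n). The slide does not spell this weight out, so treat the formula as the usual textbook form; the function names and all numeric values are hypothetical.

```python
def shrinkage_factor(tau2, sigma2, n):
    """Weight on the school's own deviation: tau^2 / (tau^2 + sigma^2 / n)."""
    return tau2 / (tau2 + sigma2 / n)

def shrunk_estimate(school_avg, overall_avg, tau2, sigma2, n):
    """Overall average plus the shrunken school deviation."""
    lam = shrinkage_factor(tau2, sigma2, n)
    return overall_avg + lam * (school_avg - overall_avg)

# Hypothetical values, chosen only for illustration.
OVERALL_AVG = 65.0   # overall average score
TAU2 = 25.0          # between-school variance
SIGMA2 = 400.0       # within-school (student-level) variance

# Two schools with the same observed average but very different sizes.
small = shrunk_estimate(63.0, OVERALL_AVG, TAU2, SIGMA2, n=3)
large = shrunk_estimate(63.0, OVERALL_AVG, TAU2, SIGMA2, n=100)

print("N=3 school shrunk estimate:  ", round(small, 2))
print("N=100 school shrunk estimate:", round(large, 2))
```

With these made-up variances the small school is pulled most of the way to the overall average, while the large school stays close to its own observed average.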
Testing in Schools: Shrinkage Plot
[Figure: score (0 to 100) against school (0 to 40), comparing Direct Sample Ests ($b^*_j$) with Bayes Shrunk Ests ($b_j$)]
Testing in Schools: Winbugs
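- Data: i = 1..1978 (students), s = 1..38 (schools)
- Model:
  - $Y_{is} \sim \text{Normal}(\alpha_s, \sigma^2_y)$
  - $\alpha_s \sim \text{Normal}(\alpha_c, \sigma^2_\alpha)$ (priors on school avgs)
- Note: WinBUGS uses precision instead of variance to specify a normal distribution!
- WinBUGS:
  - $Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with: $\sigma^2_y = 1 / \tau_y$
  - $\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with: $\sigma^2_\alpha = 1 / \tau_\alpha$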
Testing in Schools: Winbugs
WinBUGS Model:
- $Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with: $\sigma^2_y = 1 / \tau_y$
- $\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with: $\sigma^2_\alpha = 1 / \tau_\alpha$
- $\tau_y \sim \Gamma(0.001, 0.001)$ (prior on precision)
- Hyperpriors
  - Prior on mean of school means: $\alpha_c \sim \text{Normal}(0, 1/1000000)$
  - Prior on precision (inv. variance) of school means: $\tau_\alpha \sim \Gamma(0.001, 0.001)$
- Using “Vague” / “Noninformative” Priors
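Because WinBUGS takes a precision where many other tools take a variance or standard deviation, it can help to convert the numbers before judging how vague a prior really is. The sketch below does that conversion for the Normal(0, 1/1000000) prior on this slide; the helper names are made up for illustration.

```python
import math

def precision_to_variance(tau):
    """Variance implied by a precision: sigma^2 = 1 / tau."""
    return 1.0 / tau

def precision_to_sd(tau):
    """Standard deviation implied by a precision: sigma = 1 / sqrt(tau)."""
    return 1.0 / math.sqrt(tau)

def variance_to_precision(sigma2):
    """Precision to pass to a WinBUGS normal for a given variance."""
    return 1.0 / sigma2

# The hyperprior on the mean of school means uses precision 1/1000000.
prior_variance = precision_to_variance(1.0 / 1000000)
prior_sd = precision_to_sd(1.0 / 1000000)

print("prior variance:", prior_variance)
print("prior sd:", prior_sd)
```

A precision of 1/1000000 corresponds to a prior standard deviation of about 1000, which is very wide relative to test scores and is what makes the prior "vague".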
Testing in Schools: Winbugs

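- Full WinBUGS Model:
  - $Y_{is} \sim \text{Normal}(\alpha_s, \tau_y)$ with: $\sigma^2_y = 1 / \tau_y$
  - $\alpha_s \sim \text{Normal}(\alpha_c, \tau_\alpha)$ with: $\sigma^2_\alpha = 1 / \tau_\alpha$
  - $\tau_y \sim \Gamma(0.001, 0.001)$
  - $\alpha_c \sim \text{Normal}(0, 1/1000000)$
  - $\tau_\alpha \sim \Gamma(0.001, 0.001)$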
Testing in Schools: Winbugs
WinBUGS Code:
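model
{
    # Likelihood: each student's score is centred on their school's mean
    for( i in 1 : N ) {
        Y[i] ~ dnorm(mu[i], y.tau)
        mu[i] <- alpha[school[i]]
    }
    # School means share a common normal prior
    for( s in 1 : M ) {
        alpha[s] ~ dnorm(alpha.c, alpha.tau)
    }
    # Vague priors; dnorm takes a precision, not a variance
    y.tau ~ dgamma(0.001, 0.001)
    sigma <- 1 / sqrt(y.tau)          # within-school standard deviation
    alpha.c ~ dnorm(0.0, 1.0E-6)
    alpha.tau ~ dgamma(0.001, 0.001)
}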

Testing in Schools: Winbugs
Let's fit this one together!
- All the “model”, “data” and “inits” files are now posted on the course webpage for you to use for practice!

Example 2: Aww, Rats…
A normal hierarchical model for
repeated measures
Improving individual-level estimates

- Gelfand et al. (1990)
- 30 young rats, weights measured weekly for five weeks
- Dependent variable ($Y_{ij}$) is weight for rat “i” at week “j”
- Data: [data table not preserved]
- Multilevel: weights (observations) within rats (clusters)
Individual & population growth

- Rat “i” has its own expected growth line: $E(Y_{ij}) = b_{i0} + b_{i1} X_j$
- There is also an overall, average population growth line: $E(Y_{ij}) = \beta_0 + \beta_1 X_j$
[Figure: Individual Growth Lines with the Pop line (average growth); Weight against Study Day (centered)]
Improving individual-level estimates

Possible Analyses
1. Each rat (cluster) has its own line: intercept = $b_{i0}$, slope = $b_{i1}$
2. All rats follow the same line: $b_{i0} = \beta_0$, $b_{i1} = \beta_1$
3. A compromise between these two: each rat has its own line, BUT the lines come from an assumed distribution (“Random Effects”)
   - $E(Y_{ij} \mid b_{i0}, b_{i1}) = b_{i0} + b_{i1} X_j$
   - $b_{i0} \sim N(\beta_0, \tau_0^2)$
   - $b_{i1} \sim N(\beta_1, \tau_1^2)$
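One way to read the third option is generatively: first draw each rat's intercept and slope from the population distributions, then read expected weights off that rat's line. The sketch below simulates this under hypothetical hyperparameters; none of the numbers come from the rat data, and the function names are made up for illustration.

```python
import random

def simulate_rat_lines(n_rats, beta0, beta1, tau0, tau1, seed=0):
    """Draw (intercept, slope) pairs: b_i0 ~ N(beta0, tau0^2), b_i1 ~ N(beta1, tau1^2)."""
    rng = random.Random(seed)
    # random.gauss takes a standard deviation, so tau0 and tau1 are sds here.
    return [(rng.gauss(beta0, tau0), rng.gauss(beta1, tau1)) for _ in range(n_rats)]

def expected_weight(line, x):
    """Expected weight on a rat's own line at centred study day x."""
    intercept, slope = line
    return intercept + slope * x

# Hypothetical hyperparameters, for illustration only.
lines = simulate_rat_lines(n_rats=30, beta0=240.0, beta1=6.0, tau0=15.0, tau1=0.5)

for intercept, slope in lines[:3]:
    print("intercept", round(intercept, 1), "slope", round(slope, 2))
```

Every simulated rat has its own line, yet all lines scatter around the same population line, which is what lets information from one rat inform estimates for another.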
A compromise:
- Each rat has its own line, but information is borrowed across rats to tell us about individual rat growth
[Figure: Bayes-Shrunk Individual Growth Lines with the Pop line (average growth); Weight against Study Day (centered)]
Rats: Winbugs (see help: Examples Vol I)

WinBUGS Model:
Rats: Winbugs (see help: Examples Vol I)

WinBUGS Code:
Rats: Winbugs (see help: Examples Vol I)

WinBUGS Results: 10000 updates
[Figure: posterior density plots for beta.c, alpha0 and sigma, each based on a sample of 10000]
Interpretation of the results:
- Primary parameter of interest is beta.c
- Our estimate is 6.185 (95% Interval: 5.975 – 6.394)
- We estimate that a “typical” rat’s weight will increase by 6.2 gm/day
- Among rats with similar “growth influences”, the average weight will increase by 6.2 gm/day
- 95% Interval for the expected growth for a rat is 5.975 – 6.394 gm/day
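The estimate and interval quoted here are summaries of the monitored draws: the posterior mean, and the 2.5% and 97.5% quantiles. A minimal sketch of that summary step follows. The draws are synthetic stand-ins generated for illustration, not the course output, and `posterior_summary` is a made-up helper name.

```python
import random
import statistics

def posterior_summary(draws):
    """Posterior mean and central 95% interval from a list of MCMC draws."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points, one every 2.5%
    return statistics.mean(draws), cuts[0], cuts[-1]

# Synthetic draws standing in for a monitored node; NOT the WinBUGS output.
rng = random.Random(1)
draws = [rng.gauss(6.0, 0.1) for _ in range(10000)]

mean, lower, upper = posterior_summary(draws)
print("mean:", round(mean, 3))
print("95% interval:", round(lower, 3), "to", round(upper, 3))
```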
WinBUGS Diagnostics:
- MC error tells you to what extent simulation error contributes to the uncertainty in the estimation of the mean.
- This can be reduced by generating additional samples.
- Always examine the trace of the samples.
- To do this select the history button on the Sample Monitor Tool.
- Look for:
  - Trends
  - Correlations
[Figure: example history (trace) plot of a monitored mean, values from about 110 to 150, against iteration (1 to 1000)]
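To see why additional samples reduce MC error, the sketch below estimates it with batch means, one common estimator: split the chain into batches, average each batch, and take the standard deviation of those averages divided by the square root of the number of batches. The chain is synthetic and independent, the choice of 50 batches is arbitrary, and nothing here reproduces WinBUGS's own calculation.

```python
import math
import random
import statistics

def batch_means_mc_error(draws, n_batches=50):
    """Estimate the Monte Carlo standard error of the mean by batch means."""
    batch_size = len(draws) // n_batches
    if batch_size < 1:
        raise ValueError("need at least one draw per batch")
    means = [
        statistics.mean(draws[b * batch_size:(b + 1) * batch_size])
        for b in range(n_batches)
    ]
    return statistics.stdev(means) / math.sqrt(n_batches)

# Synthetic chain of independent N(0, 1) draws, for illustration only.
rng = random.Random(2)
chain = [rng.gauss(0.0, 1.0) for _ in range(40000)]

err_short = batch_means_mc_error(chain[:10000])  # first 10000 draws
err_long = batch_means_mc_error(chain)           # all 40000 draws
print("MC error, 10000 draws:", round(err_short, 4))
print("MC error, 40000 draws:", round(err_long, 4))
```

For independent draws the MC error scales like sd / sqrt(n), so four times as many samples should roughly halve it; correlated chains need more samples for the same reduction.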
Rats: Winbugs (see help: Examples Vol I)

WinBUGS Diagnostics: history
[Figure: history (trace) plots of alpha0, beta.c and sigma against iteration (1001 to 10000)]
WinBUGS Diagnostics:
- Examine sample autocorrelation directly by selecting the ‘auto cor’ button.
- If autocorrelation exists, generate additional samples and thin more.
[Figure: example autocorrelation plot of a monitored mean, autocorrelation (-1.0 to 1.0) against lag (0 to 40)]
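The effect of thinning can be checked numerically. The sketch below computes the lag-1 sample autocorrelation of a deliberately sticky synthetic chain (an AR(1) process with coefficient 0.9, chosen only for illustration) before and after keeping every 10th draw. The function names are made up and are not WinBUGS functions.

```python
import random
import statistics

def autocorrelation(chain, lag):
    """Sample autocorrelation of a chain at the given lag (lag >= 1)."""
    centre = statistics.mean(chain)
    dev = [x - centre for x in chain]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))
    return numer / denom

def thin(chain, every):
    """Keep every `every`-th draw of the chain."""
    return chain[::every]

# Synthetic AR(1) chain, x_t = 0.9 * x_{t-1} + noise, standing in for MCMC output.
rng = random.Random(3)
chain = [0.0]
for _ in range(20000):
    chain.append(0.9 * chain[-1] + rng.gauss(0.0, 1.0))

acf_raw = autocorrelation(chain, lag=1)
acf_thinned = autocorrelation(thin(chain, 10), lag=1)
print("lag-1 autocorrelation, raw chain:    ", round(acf_raw, 2))
print("lag-1 autocorrelation, every 10th draw:", round(acf_thinned, 2))
```

Thinning lowers the autocorrelation between retained draws but also discards draws, which is why the slide pairs it with generating additional samples.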
Rats: Winbugs (see help: Examples Vol I)

WinBUGS Diagnostics: autocorrelation
[Figure: autocorrelation plots for alpha0, beta.c and sigma, autocorrelation (-1.0 to 1.0) against lag (0 to 40)]
WinBUGS provides machinery for Bayesian paradigm “shrinkage estimates” in MLMs
[Figure: two panels of Weight against Study Day (centered), each with the Pop line (average growth): Individual Growth Lines, and the Bayes-Shrunk Growth Lines obtained from them via Bayes]
School Test Scores Revisited
Testing in Schools revisited

- Suppose we wanted to include covariate information in the school test scores example
- Student-level covariates
  - Gender
  - London Reading Test (LRT) score
  - Verbal reasoning (VR) test category (1, 2 or 3, where 1 represents the highest level of understanding)
- School-level covariates
  - Gender intake (all girls, all boys or mixed)
  - Religious denomination (Church of England, Roman Catholic, State school or other)
Testing in Schools revisited

- Model: [model equations not preserved]
- Wow! Can YOU fit this model?
- Yes you can!
- See WinBUGS>help>Examples Vol II for data, code, results, etc.
- More Importantly: Do you understand this model?
Additional Comments:
- Y is actually a standardized score (difference from expected norm in standard deviations)
- What are the fixed effects in the model?
  - The β are the fixed effects (measured both at the school and student level)
  - Assume these are independent normal
Additional Comments:

- What are the random effects in the model?
  - The α are the random effects (at the school level)
  - Assume these are multivariate normal
  - These may represent a) inherent school differences (random intercept), b) inherent school differences in terms of LRT and c) inherent school differences in terms of VR test
  - Fixed effects interpretations are conditional on schools where these random effects are similar.
- In this example we also put a model on the overall variance: we assume that the inverse of the between-pupil variance will increase linearly with LRT score
Some results:

node       mean       sd         MC error   2.50%      median     97.50%
beta[1]    2.62E-04   9.87E-05   2.73E-06   6.95E-05   2.63E-04   4.58E-04
beta[2]    0.4163     0.06504    0.00332    0.2875     0.4182     0.537
beta[3]    0.1715     0.04775    0.001163   0.07816    0.1714     0.2663
beta[4]    0.1192     0.134      0.006156   -0.1459    0.1206     0.3731
beta[5]    0.06045    0.1044     0.004469   -0.15      0.06354    0.2612
beta[6]    -0.2839    0.1818     0.005977   -0.6371    -0.2868    0.07477
beta[7]    0.1497     0.1062     0.00392    -0.05925   0.1487     0.3657
beta[8]    -0.1574    0.1763     0.006249   -0.4984    -0.1595    0.1949
gamma[1]   -0.6726    0.1003     0.006384   -0.8611    -0.674     -0.4734
gamma[2]   0.03135    0.01022    1.31E-04   0.01128    0.03127    0.05167
gamma[3]   0.9511     0.09027    0.004472   0.7763     0.9532     1.119
max.var    0.6228     0.06987    7.49E-04   0.4967     0.6186     0.7709
min.var    0.5138     0.05349    6.45E-04   0.4181     0.5113     0.6276
phi        -0.00266   0.002843   3.28E-05   -0.00831   -0.00265   0.002981
theta      0.5792     0.03313    3.67E-04   0.5154     0.5795     0.6435
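The sd and MC error columns can be read together. If MC error is taken to be roughly sd / sqrt(ESS), where ESS is an effective number of independent draws, then (sd / MC error)^2 gives a rough sense of how much information each chain carries. That relation is a standard rule of thumb and not something stated in the slides; the sketch below applies it to two rows of the table.

```python
def effective_sample_size(sd, mc_error):
    """Approximate number of independent draws implied by MC error = sd / sqrt(ESS)."""
    return (sd / mc_error) ** 2

# sd and MC error values copied from the results table.
ess_beta2 = effective_sample_size(0.06504, 0.00332)     # beta[2]
ess_gamma2 = effective_sample_size(0.01022, 1.31e-4)    # gamma[2]

print("beta[2]  approx. effective draws:", round(ess_beta2))
print("gamma[2] approx. effective draws:", round(ess_gamma2))
```

On this reading, beta[2] behaves like a few hundred independent draws while gamma[2] behaves like several thousand, which suggests the beta[2] chain is much more autocorrelated.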
Some results:



- Gamma[1] to Gamma[3] represent the means of the random effects distributions
- Gamma[1] is the mean of the random intercept distribution; hard to interpret in this case
- Gamma[2] is the mean of the random effect of LRT
  - Among children from schools with similar latent effects, a one unit increase in LRT yields a 0.03 standard deviation increase in the child’s test score.
Some results:



- Gamma[3] is the mean of the random effect for the VR test.
- Among children from schools with similar latent effects, children with the highest VR scores have test scores that are on average 0.95 standard deviations greater than children with the lowest VR scores (95% CI: 0.78 – 1.12)
- Among children from schools with similar latent effects, children with the “moderate” VR scores have test scores that are on average 0.42 standard deviations greater than children with the lowest VR scores (95% CI: 0.29 – 0.54).
Some results:
- Among children from similar schools, girls have average test scores that are 0.17 standard deviations greater than boys (95% CI: 0.08 – 0.27)
- Among similar schools, all girls schools have average test scores that are 0.12 standard deviations greater than mixed schools (95% CI: -0.15 – 0.37)

Bayesian Concepts

- Frequentist: Parameters are “the truth”
- Bayesian: Parameters have a distribution
- “Borrow Strength” from other observations
- “Shrink Estimates” towards overall averages
- Compromise between model & data
- Incorporate prior/other information in estimates
- Account for other sources of uncertainty
- Posterior ∝ Likelihood * Prior
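The proportionality in the last bullet can be applied literally: evaluate likelihood times prior over a grid of candidate parameter values and normalise. The sketch below does this for a single normal mean with a known observation sd, then compares the result with the closed-form precision-weighted average. The data, sd, prior and grid are all hypothetical, and the function names are made up for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd^2) distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def grid_posterior_mean(data, sigma, prior_mean, prior_sd, grid):
    """Posterior mean of a normal mean using posterior ∝ likelihood * prior on a grid."""
    weights = []
    for theta in grid:
        likelihood = math.prod(normal_pdf(y, theta, sigma) for y in data)
        weights.append(likelihood * normal_pdf(theta, prior_mean, prior_sd))
    total = sum(weights)  # normalising constant, up to the grid spacing
    return sum(t * w for t, w in zip(grid, weights)) / total

# Hypothetical inputs, for illustration only.
data = [90.0, 90.0, 10.0]            # one small school's scores
sigma = 40.0                          # assumed known student-level sd
prior_mean, prior_sd = 65.0, 10.0     # assumed prior on the school mean
grid = [i * 0.1 for i in range(0, 1301)]  # candidate means from 0.0 to 130.0

grid_mean = grid_posterior_mean(data, sigma, prior_mean, prior_sd, grid)

# Closed form for the normal-normal case: a precision-weighted average.
prior_precision = 1.0 / prior_sd ** 2
data_precision = len(data) / sigma ** 2
exact_mean = (prior_precision * prior_mean
              + data_precision * (sum(data) / len(data))) / (prior_precision + data_precision)

print("grid posterior mean: ", round(grid_mean, 3))
print("exact posterior mean:", round(exact_mean, 3))
```

The posterior mean lands between the prior mean and the sample average, closer to whichever has the higher precision, which is the same compromise between model and data described above.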