MAST30027: Modern Applied Statistics
Week 12 Lab Sheet
1. Suppose that X = (X1, X2)ᵀ ∼ N(µ, Σ), with µ = (µ1, µ2)ᵀ and

       Σ = ( σ1²  σ12 )
           ( σ12  σ2² )

(a) Show that the conditional distribution of X1 | X2 = x2 is normal with mean µ1 + (x2 − µ2)σ12/σ2² and variance σ1² − σ12²/σ2².
Solution: Let

       Σ⁻¹ = ( a  b )
             ( b  c )

Then the conditional density of X1 given X2 = x2 is

       f(x1 | x2) = f(x1, x2) / f(x2)
                  ∝ f(x1, x2)
                  ∝ exp{−½[(x1 − µ1)²a + 2(x1 − µ1)(x2 − µ2)b + (x2 − µ2)²c]}
                  ∝ exp{−½[x1²a − 2x1(µ1 a − (x2 − µ2)b)]}
                  ∝ exp{−½a[x1 − (µ1 − (x2 − µ2)b/a)]²}

Thus X1 | X2 = x2 ∼ N(µ1 − (x2 − µ2)b/a, 1/a). Inverting Σ gives a = σ2²/(σ1²σ2² − σ12²) and b = −σ12/(σ1²σ2² − σ12²), so b/a = −σ12/σ2² and 1/a = σ1² − σ12²/σ2², which gives the stated mean and variance.
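As a quick numerical check of these identities (not part of the original solution), we can invert a concrete Σ in R and compare:

> # check -b/a = s12/s22 and 1/a = s11 - s12^2/s22 for a concrete Sigma
> s11 <- 4; s12 <- 1; s22 <- 4
> Prec <- solve(matrix(c(s11, s12, s12, s22), 2, 2)) # precision matrix
> a <- Prec[1,1]; b <- Prec[1,2]
> c(-b/a, s12/s22)        # both equal 0.25
> c(1/a, s11 - s12^2/s22) # both equal 3.75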
(b) Write an R function that uses the Gibbs sampler to generate a sample of size n = 1000 from the N(µ, Σ) distribution with µ = (0, 0)ᵀ and

       Σ = ( 4  1 )
           ( 1  4 )

Plot traces of X1 and X2.
Solution: To test the simulator we do a normal probability plot for each marginal, and both
look good. The traces show pretty good mixing.
> set.seed(200)
> # params
> mu1 <- 0
> mu2 <- 0
> s11 <- 4
> s12 <- 1
> s22 <- 4
> # initial values
> x1 <- 6
> x2 <- -6
> # sample size
> nreps <- 1000
> Gsamples <- matrix(nrow=nreps, ncol=2)
> Gsamples[1,] <- c(x1, x2)
> # main loop: full conditionals from part (a)
> for (i in 2:nreps) {
+   x1 <- rnorm(1, mu1 + (x2 - mu2)*s12/s22, sqrt(s11 - s12^2/s22))
+   x2 <- rnorm(1, mu2 + (x1 - mu1)*s12/s11, sqrt(s22 - s12^2/s11))
+   Gsamples[i,] <- c(x1, x2)
+ }
> # output
> par(mfrow=c(2,2), mar=c(2,4,1,1))
> qqnorm(Gsamples[,1], main="x1")
> qqnorm(Gsamples[,2], main="x2")
> plot(Gsamples[,1], type="l", xlab="iteration", ylab="x1")
> plot(Gsamples[,2], type="l", xlab="iteration", ylab="x2")
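A quick way to quantify the mixing (an optional check, not part of the original solution, assuming the coda package is installed) is the effective sample size of each chain:

> # effective sample size; values close to nreps indicate good mixing
> library(coda)
> effectiveSize(as.mcmc(Gsamples))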
[Figure: normal Q-Q plots of x1 and x2 (top row) and trace plots of x1 and x2 against iteration (bottom row).]
(c) Use your simulator to estimate P(X1 ≥ 0, X2 ≥ 0). To get a feel for the convergence rate, calculate the estimate using samples {1, . . . , k}, for k = 1, . . . , n, and then plot the estimates against k.
Solution: The plot appears after part (d).
> par(mfrow=c(1,1))
> success <- apply(Gsamples, 1, function(x) (x[1] > 0)&(x[2] > 0))
> mean(success)
[1] 0.296
> plot(1:nreps, cumsum(success)/(1:nreps), type="l", xlab="k", ylab="prob", ylim=c(0,1))
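The Gibbs estimate can also be compared against the exact answer (a check that is not part of the original sheet): for a mean-zero bivariate normal the orthant probability has the closed form P(X1 ≥ 0, X2 ≥ 0) = 1/4 + arcsin(ρ)/(2π), where ρ = σ12/(σ1σ2).

> # exact orthant probability, using s11, s12, s22 from above
> rho <- s12/sqrt(s11*s22)   # 0.25
> 1/4 + asin(rho)/(2*pi)     # approximately 0.290, close to 0.296

With σ12 = 2.8, as in part (d) below, the same formula gives ρ = 0.7 and a probability of about 0.373.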
(d) Now change Σ to

       ( 4    2.8 )
       ( 2.8  4   )

and generate another sample of size 1000. What do the traces/estimates look like now?
Solution: We put s12 <- 2.8 and then re-run the code above, getting a different Gsamples. We plot the cumulative estimates on top of the previous graph using lines. The cumulative estimates are more volatile in the second case, reflecting the stronger autocorrelation in the Markov chain, caused by the stronger correlation between X1 and X2.
> success <- apply(Gsamples, 1, function(x) (x[1] > 0)&(x[2] > 0))
> mean(success)
[1] 0.38
> lines(1:nreps, cumsum(success)/(1:nreps), col="red")
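The stronger autocorrelation can also be seen directly with the sample autocorrelation function (an optional check, not in the original solution):

> # ACF of the x1 chain: with s12 = 2.8 it decays much more slowly
> # than with s12 = 1, so each draw adds less independent information
> acf(Gsamples[,1], main="x1 chain")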
[Figure: cumulative estimates of P(X1 ≥ 0, X2 ≥ 0) against k, for σ12 = 1 (black) and σ12 = 2.8 (red).]
2. Read the “Dyes” example from the WinBUGS Examples Vol. 1 (in the Help menu).
The posterior predictive distribution is the posterior distribution of a fitted value. Let T be some function, applied to either the observed values or the fitted values; the posterior predicted p-value is then the probability that T applied to the fitted values is larger than T applied to the observations. The posterior predicted p-value is averaged over the posterior distribution of the parameters, so it is a single number rather than a distribution. Values between 0.05 and 0.95 are generally considered reasonable, while a value smaller than 0.01 or larger than 0.99 indicates a major failure of the model (Gelman et al. §6.3).
Here is a modification of the Dyes code from the examples; six lines have been added to the model.
Note that the ranked function picks out a particular value from an ordered sample (the largest in
this case; see the User Manual in the Help menu).
model
{
    for( i in 1 : batches ) {
        mu[i] ~ dnorm(theta, tau.btw)
        for( j in 1 : samples ) {
            y[i, j] ~ dnorm(mu[i], tau.with)
            yfit[i, j] ~ dnorm(mu[i], tau.with)
            resid[i, j] <- abs(y[i, j] - mu[i])
            fresid[i, j] <- abs(yfit[i, j] - mu[i])
        }
        largest[i] <- ranked(resid[i, ], samples)
        flargest[i] <- ranked(fresid[i, ], samples)
        # indicator that T(fitted) is at least T(observed)
        pppv[i] <- step(flargest[i] - largest[i])
    }
    theta ~ dnorm(0.0, 1.0E-10)
    # prior for within-variation
    sigma2.with <- 1 / tau.with
    tau.with ~ dgamma(0.001, 0.001)
    # prior for between-variation
    # ICC = sigma2.btw / (sigma2.btw + sigma2.with)
    ICC ~ dunif(0, 1)
    sigma2.btw <- sigma2.with * ICC/(1 - ICC)
    tau.btw <- 1/sigma2.btw
}
# data
list(batches = 6, samples = 5,
y = structure(
.Data = c(1545, 1440, 1440, 1520, 1580,
1540, 1555, 1490, 1560, 1495,
1595, 1550, 1605, 1510, 1560,
1445, 1440, 1595, 1465, 1545,
1595, 1630, 1515, 1635, 1625,
1520, 1455, 1450, 1480, 1445),
.Dim = c(6, 5)))
# initial values
list(theta=1500, tau.with=1, ICC=0.5,
mu=c(1,1,1,1,1,1),
yfit=structure(
.Data = c(1545, 1440, 1440, 1520, 1580,
1540, 1555, 1490, 1560, 1495,
1595, 1550, 1605, 1510, 1560,
1445, 1440, 1595, 1465, 1545,
1595, 1630, 1515, 1635, 1625,
1520, 1455, 1450, 1480, 1445),
.Dim = c(6, 5)))
What are the nodes yfit, resid, fresid and pppv for?
Fit the model and monitor pppv. Does the output suggest any problems with model fit?
Solution: Here yfit are the fitted values, resid are the distances between the observations and their batch means, and fresid are the corresponding distances for the fitted values.
In this example we are calculating posterior predicted p-values for six test statistics, one for each batch. For batch i the test statistic is maxj |yij − µi|. pppv[i] is a Bernoulli r.v., equal to 1 when the test statistic is larger for the fitted values than for the observed values. The posterior mean of pppv[i] gives us an estimate of the posterior predicted p-value.
Running the model we get posterior means for the pppv[i] all comfortably between 0.05 and 0.95,
so (from the point of view afforded by these test statistics) there is no cause to worry that the
model does not fit.
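To make the mechanism concrete, here is a minimal R sketch, outside WinBUGS, of how the posterior mean of a Bernoulli indicator estimates a posterior predicted p-value. The draws mu.draws and tau.draws are hypothetical placeholders standing in for monitored posterior samples for one batch:

> # Hypothetical illustration only: mu.draws and tau.draws stand in
> # for stored posterior samples of mu[1] and tau.with
> set.seed(1)
> y <- c(1545, 1440, 1440, 1520, 1580)   # batch 1 observations
> ndraws <- 5000
> mu.draws <- rnorm(ndraws, 1500, 20)    # placeholder posterior draws
> tau.draws <- rgamma(ndraws, 10, 10000) # placeholder posterior draws
> pppv <- numeric(ndraws)
> for (m in 1:ndraws) {
+   yfit <- rnorm(length(y), mu.draws[m], 1/sqrt(tau.draws[m]))
+   # T = max_j |y_ij - mu_i|, applied to fitted and observed values
+   pppv[m] <- (max(abs(yfit - mu.draws[m])) >= max(abs(y - mu.draws[m])))
+ }
> mean(pppv)  # estimate of the posterior predicted p-value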
3. The “Rats” example in WinBUGS Examples Vol. 1 and the “Birats” example in Vol. 2 both model
data on the growth of baby rats.
Estimate the DIC for these two models. In both cases run at least 1000 burn-in iterations before you set the DIC monitor, then base the DIC on 10,000 additional iterations. You can use the initial values given in the examples.
What is the estimated effective number of (free) parameters in each model? What could explain
the difference?
Is one model strongly preferred to the other after the penalty for model complexity is taken into account?
Solution: To estimate the DIC we use the DIC... button in the Inference menu. For the first model we have an estimated DIC of 1020.5 and effective number of parameters pD = 54.5. For the second model the estimated DIC is 1019.2 and pD = 49.9.
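For reference, WinBUGS computes DIC = D̄ + pD, where D̄ is the posterior mean of the deviance and pD = D̄ − D(θ̄) is the effective number of parameters, with D(θ̄) the deviance evaluated at the posterior means of the parameters (Spiegelhalter et al., 2002).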
For both models we have yij ∼ N(µij, 1/τ) where µij = αi + βi xj for some αi and βi. Without any prior information about the parameters, both models have 61 free parameters: αi and βi for each of the i = 1, . . . , 30 rats, plus the precision τ.
In the first model αi and βi are independent, while the second model allows for dependence between
αi and βi , for each i. The priors for the first model are clearly not completely uninformative, as the
effective number of parameters has come down from 61 to 54.5. The second model is a generalisation
of the first, which we naturally think of as meaning it is more flexible. Thus it is perhaps surprising
that the effective number of parameters has gone down. The reason that pD is lower for the second
model is that the priors are not completely uninformative, so the extra structure that allows for
dependence between αi and βi ends up introducing a (small) prior dependence between them.
The difference in DIC is small, so there is no strong preference for one model over the other.