Download Document

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

History of statistics wikipedia , lookup

Foundations of statistics wikipedia , lookup

Degrees of freedom (statistics) wikipedia , lookup

Bootstrapping (statistics) wikipedia , lookup

Taylor's law wikipedia , lookup

Gibbs sampling wikipedia , lookup

Statistical inference wikipedia , lookup

Student's t-test wikipedia , lookup

Regression toward the mean wikipedia , lookup

Transcript
1
WinBUGS Examples
“I confess that I have been a blind mole, but it is better to learn
wisdom late than never to learn it at all.”
Sherlock Holmes
The Man with the Twisted Lip
James B. Elsner
Department of Geography, Florida State University
Portions of this presentation are taken from Getting Started with WinBUGS, Gillian Raab, June 1999.
2
We practice the WinBUGS language format using a simple example of
estimating a mean and variance of 20 observations using a diffuse prior for the
mean and variance (precision). In other words, we do not want to bring in any
prior information.
In WinBUGS we specify normal distributions in terms of their mean and precision. The
precision that is usually represented by tau is 1/sigma^2.
WinBUGS requires all priors to be proper. That is, they integrate to a finite number. For
the mean of a distribution, the usual choice is a normal distribution. If we don’t want to
bring in any prior information, then we allow the standard deviation to cover a very wide
range of values. The default in WinBUGS is a value of 1.0E-6 for the precision. Thus we
are allowing a 95% credible interval between –2,000,000 and +2,000,000. That is pretty
vague, but you can always make it even more vague.
Priors for the precision are less obvious. One way to specify the prior on the precision is
to use a uniform distribution over a fixed range, which gives a Pareto distribution for the
precision. Another is to use a member of the Gamma family of distributions for the
precision, which is a conjugate prior for the precision in a normal distribution.
Conjugate priors are choices of prior distributions that are naturally connected with a
particular likelihood (Poisson, Gamma, etc.). The conjugate prior has the same form as
the likelihood, except that the role of the “data” and the “estimate” are switched, and the
conjugate prior and the likelihood have different normalizing constants.
3
It is recommended that you specify a vague prior for tau as Gamma(0.001,0.001). This
is the default when you write your model with a doodle. The first parameter of the
distribution is labeled a and the second is labeled q. The mean of the distribution is
given by aq and the variance by aq^2.
We will use doodle.
Open WinBUGS > Doodle > New…
We have 20 values of the random variable x, so we need to set up a plate in our doodle.
4
Doodle Quick Help
5
Node
Plate
Edge
Select Doodle > Write Code
model {
mean ~ dnorm(0,0.00001) # Prior on normal mean
tau ~dgamma(0.01, 0.01) # Prior on normal precision
sd <-sqrt(1/tau)
# Variance = 1/precision
for (i in 1:N) { x[i] ~ dnorm(mean,tau)}
}
Notice that because we use tau to define the variability, we calculate the standard
deviation from tau, with the sd being a logical node.
In the Code window add the data.
list(N=20,
x=c(98,160,136,128,130,114,123,134,128,107,123,125,129,132,154,115,126,132,136,130))
6
X[i]~dnorm(mean,tau)
Select Model > Specification
check model
load data
gen inits
shape parameter of gamma tau
too small – cannot sample.
To overcome this, we can specify an initial value for tau—any reasonable value will do,
even if it is wrong, the sampler will bring them back into the correct range after a few
samples.
7
Recheck the model, load the data, and compile. Then select the list(tau=1) and click
load inits. Then click gen inits to generate the other initial values (in this case, mean).
Select Inference > Sample to bring up the Sample Monitor Tool. Type mean and set, then
sd and select.
Select Model > Update to bring up the Update Tool.
Choose 1000 updates, then go
back to the Sample Monitor
Tool and select stats.
8
The summary statistics from one WinBUGS run shows a posterior mean for the mean
node of 128.0. The posterior sd for the mean is 3.41. The 95% credible interval for the
mean is (121.2, 134.9).
This is compared with a frequentist result using the t distribution. To see this using R,
type:
>mean(x)+1.96*sd(x)/sqrt(length(x))
> [1] 134.1
>mean(x)-1.96*sd(x)/sqrt(length(x))
> [1] 121.9
The mean is 128.0 with a 95% confidence interval (121.9, 134.1).
The posterior mean for the sd node is 14.49, this compares with a frequentist result of
13.9. Note the posterior credible limits (10.58, 20.46). Using frequentist approach, we
would use the c-squared distribution.
9
The MC error tells you to what extent simulation error contributes to the uncertainty in
the estimation of the mean. This can be reduced by generating additional samples. The
simulation errors on the 95% credible interval values will be considerably larger because
of the small numbers in the tails of the distribution.
You should examine the trace of all your samples. To do this select the history button on
the Sample Monitor Tool.
mean
If the samples are largely independent,
150.0
it will look very jagged indicating
140.0
130.0
little autocorrelation between samples.
120.0
110.0
You can address the question of sample
autocorrelation directly by selecting the
auto cor button. Autocorrelation is not
a problem in this example.
If autocorrelation exists, you need to
generate additional samples.
1
250
500
750
1000
iteration
mean
1.0
0.5
0.0
-0.5
-1.0
0
20
40
lag
The results from the WinBUGS Bayesian approach are very similar to the results from a
frequentist approach. Why?
This is a complicated analysis for a simple problem, but it forms the building blocks for
more complex problems.
We continue practicing with WinBUGS. Here is an ordinary least squares regression
example.
The following data need to be fit by linear regression. They concern the fuel
consumption (reponse variable, y) and weight (explanatory variable, x) of ten cars.
Y[]
3.4
3.8
9.1
2.2
2.6
2.9
2.0
2.7
1.9
3.4
X[]
5.5
5.9
6.5
3.3
3.6
4.6
2.9
3.6
3.1
4.9
8
6
Y
10
4
2
0
2
3
4
5
6
X
The plot shows the data along with the OLS regression line. We can see that there is
what appears to be a single influential point that may be an outlier.
11
The regression fit using a frequentist approach is described by the follow summary,
as given in R.
12
Alternatively, we can set up a Bayesian regression model.
Start with Doodle.
Add a plate.
index: i
from: 1 up to: N
Add nodes. On the plate.
name: Y[i] type: stochastic density: dnorm
mean: mu[i] precision tau
name: mu[i] type: logical density: identity
value: a+b*X[i]
name: X[i] type: constant
Off the plate.
name: a type: stochastic density: dnorm
mean: 0.0 precision 1.0E-6
name: b type: stochastic density: dnorm
mean: 0.0 precision 1.0E-6
13
Add nodes. Off the plate.
name: tau type: stochastic density: dgamma
shape: 1.0E-3 scale: 1.0E-3
name: sd type: logical link: identity
value: 1/sqrt(tau)
Connect the nodes using edges to produce the following linear regression doodle.
Select Doodle > Write Code, then add data
and initial value for tau.
14
Use the Sample Monitor Tool to set the node values for a and b. The select Model >
Update … to generate samples.
Check for autocorrelation among the samples. What do you find?
How do the results compare with the frequentist approach?
b sample: 1000
1.5
1.0
0.5
0.0
-2.0
0.0
2.0
What about the outlier? It is certainly an influential point in the regression.
A frequentist approach might eliminate it. Classical regression results with that point
removed are:
8
Y
15
4
0
3
6
X
16
In ordinary regression we assume normally distributed errors. If we think that there
might be outliers, then it is better to assume an error distribution with longer tails
than the normal. The t distribution is better, for example.
We can do this easily in WinBUGS by replacing the normal distribution assumption
on Y[] by a t distribution with some small number of degrees of freedom. The
smaller the degrees of freedom, the longer the tails. This makes the regression
robust to outliers.
Run the sampler and generate 5000
samples. It is always a good idea to
discard the first few hundred or so
samples so that the final solution has not
“memory” of it’s initial values (burn-in).
In the Sample Monitor Tool dialog set
beg to 1001 and end to 5000. Print out
the statistics for the slope and intercept
nodes.
17
Plot the trace and autcorrelation to check for correlation in the samples.
b
b
1.0
0.5
0.0
-0.5
-1.0
1.5
1.0
0.5
0.0
5850
5900
5950
Some weak
autocorrelation
in the samples.
iteration
0
20
40
lag
Examine the posterior density for the slope parameter. Compare it to the original
model.
Select coda to print out the
successive sample values. You
b sample: 4000
can cut/paste or export to another
6.0
program.
4.0
2.0
0.0
-0.5
0.0
0.5
1.0
18 With the t distribution, we get a very different result. The regression is now what we call
resistant regression, meaning that it is not unduly influenced by outliers. It works because
the likelihood now recognizes the possibility of finding an odd point in the tails of the
distribution.
You can experiment with different values of the degrees of freedom to see how the
posterior distribution on the slope parameter changes.
How do the results from the Bayesian model compare with those from the leave-it-out
frequentist model?
Discussion questions:
How do you choose a value for the degrees of freedom in the t distribution?
Would we want to assume this is also a parameter with a distribution?
Model choice is part of the prior, so should it be chosen before we see the data?
Next: Hierarchical models.