Decision making as a model
6. A little bit of Bayesian statistics
Exercise 5.1
[Figure: fitted ROC curve; both axes run from 0 to 1]
Slope: .657, intercept: .831, Az = .756, sds = 1.52·sdn
Exercise 5.2
[Figure: fitted ROC curve; both axes run from 0 to 1]
Slope: .643, intercept: 1.267, Az = .857
          A       B       C       D       E
Prev.    .75     .60     .25     .10     .06
d'      1.28    1.50    1.77    1.90    1.85
c       -.44    -.19    .36     .70     .93
β        .57     .76    1.89    3.75    5.57
βuneq    .37     .49    1.22    2.41    3.58
LRopt    .15     .31    1.37    4.11    7.16
N.B.: a full course on Bayesian statistics is impossible in one lecture.
You might study:
- Bolstad, W.M. (2007). Introduction to Bayesian statistics (2nd ed.). Hoboken, NJ: Wiley.
- Gill, J. (2002). Bayesian methods: A social and behavioral sciences approach. Boca Raton, FL: Chapman & Hall.
Or consult Herbert Hoijtink or Irene Klugkist (M&T).
Classical statistics vs Bayesian statistics

Classical: probability is the limit of a long-run relative frequency.
Bayesian: probability is confidence/strength of belief, based on all available prior and actual evidence.

Classical: fixed unknown parameters (like θ).
Bayesian: θ is stochastic.

Classical: inference based on the likelihood p(data|θ).
Bayesian: inference based on the likelihood p(data|θ) and the prior p(θ).

Classical: confidence interval, p(…. ≤ x̅ ≤ …. | μ = x̅).
Bayesian: credible interval, p(…. ≤ μ ≤ …. | x̅).
The two intervals are not the same thing (≠).
Bayesian statistics
Given data D, what can be said about the probability of possible values of some unknown quantity θ?

p(θi|D) = p(D|θi)·p(θi) / Σj p(D|θj)·p(θj)

N.B.: θ is supposed to be a random variable!

For continuous distributions we move from p(Y|X) to continuous functions: pdf(y|X) is the pdf of y for some X, and L(x|Y) is the likelihood function of x for some Y:

pdf(θ|D) = L(θ|D)·pdf(θ) / ∫−∞∞ L(θ|D)·pdf(θ) dθ

(the denominator is a normalization constant)
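To make the discrete form of Bayes' rule concrete, here is a minimal Python sketch; the three-point grid of θ values, the prior, and the single "head" observation are illustrative choices of mine, not from the slides:

```python
# Minimal sketch of the discrete Bayes rule above; the grid of theta
# values, the prior, and the single "head" datum are all illustrative.
import numpy as np

theta = np.array([0.2, 0.5, 0.8])      # candidate values of theta
prior = np.array([0.25, 0.50, 0.25])   # p(theta_j), sums to 1
likelihood = theta                     # p(D|theta_j) for D = one "head"

# posterior_i = p(D|theta_i)*p(theta_i) / sum_j p(D|theta_j)*p(theta_j)
posterior = likelihood * prior / np.sum(likelihood * prior)
print(posterior)                       # [0.1 0.5 0.4]
```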
A fair coin?
Three possibilities: three priors.

Beta(a,b): K·x^(a−1)·(1−x)^(b−1), with

K = Γ(a+b) / (Γ(a)·Γ(b)), 0 ≤ x ≤ 1

Γ(a) = ∫0∞ e^(−y)·y^(a−1) dy (a > 0); Γ(a) = (a−1)! (a integer)

[Figure: three prior densities over the probability of "head" (0 to 1): beta(20,20), beta(1,1), beta(.5,.5)]
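The three priors can be evaluated with SciPy's beta distribution, which implements exactly the K·x^(a−1)·(1−x)^(b−1) density defined above; a minimal sketch (the grid and printout are illustrative):

```python
# Sketch evaluating the three priors with SciPy's beta density.
import numpy as np
from scipy.stats import beta

H = np.linspace(0.01, 0.99, 99)            # probability of "head"
for a, b in [(20, 20), (1, 1), (0.5, 0.5)]:
    density = beta.pdf(H, a, b)
    print(f"beta({a},{b}): max density {density.max():.2f} "
          f"at H = {H[np.argmax(density)]:.2f}")
```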
We will throw the coin several times.

pdf(θ|D) = L(θ|D)·pdf(θ) / ∫−∞∞ L(θ|D)·pdf(θ) dθ

so pdf(θ|D) ∝ L(θ|D)·pdf(θ), and for the probability of heads H given R heads in N tosses:

pdf(H|R in N) ∝ L(H|R in N)·pdf(H), with 3 different priors.

Binomial likelihood: L(H|R in N) = C(N,R)·H^R·(1−H)^(N−R), where C(N,R) is the binomial coefficient.

NB: this is a function of H, not of R!
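As an illustration, the likelihood can be evaluated over a grid of H values with SciPy's binomial pmf; the R = 6, N = 10 numbers below are only an example (they happen to match the 10-toss plot later on):

```python
# Sketch of the likelihood as a function of H: the data (R = 6 heads in
# N = 10 tosses, chosen for illustration) are fixed and H varies.
import numpy as np
from scipy.stats import binom

N, R = 10, 6
H = np.linspace(0.0, 1.0, 101)     # candidate values of P("head")
L = binom.pmf(R, N, H)             # C(N,R) * H^R * (1-H)^(N-R)
print(H[np.argmax(L)])             # peaks at H = R/N = 0.6
```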
[Figure: the binomial likelihood after one head (L = H), plotted over H from 0 to 1]

Prior (beta) × likelihood (binomial) = posterior (beta).

After one head (N = R = 1), the likelihood C(N,R)·H^R·(1−H)^(N−R) reduces to L = H, so for the three priors:

K1·H^19·(1−H)^19 · H = K1′·H^20·(1−H)^19           (beta(20,20) prior)
K2·H^0·(1−H)^0 · H = K2′·H^1·(1−H)^0               (beta(1,1) prior)
K3·H^(−.5)·(1−H)^(−.5) · H = K3′·H^(.5)·(1−H)^(−.5)  (beta(.5,.5) prior)
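Because the beta prior and the binomial likelihood are conjugate, the whole update reduces to adding counts to the beta parameters. A minimal sketch checking the one-head case above (the helper name `update` is mine):

```python
# Sketch of the conjugate update shown above: Beta(a, b) prior plus
# R heads in N tosses gives a Beta(a + R, b + N - R) posterior.
def update(a, b, R, N):
    return a + R, b + (N - R)

for a, b in [(20, 20), (1, 1), (0.5, 0.5)]:
    print(f"beta({a},{b}) + 1 head -> beta{update(a, b, R=1, N=1)}")
# beta(20,20) -> beta(21, 20); beta(1,1) -> beta(2, 1);
# beta(0.5,0.5) -> beta(1.5, 0.5), matching the exponents above
```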
[Figure sequence: the likelihood function and the three posteriors (beta(20,20), beta(1,1), beta(.5,.5)) over the probability of "head" (0 to 1), updated toss by toss:
toss 1: head; toss 2: tail; toss 3: tail; toss 4: head; toss 5: head;
then after 10 tosses (6 heads), 20 tosses (9 heads), 40 tosses (18 heads), 80 tosses (38 heads), 160 tosses (69 heads: is that fair?), 320 tosses (120 heads), and 640 tosses (249 heads: obviously not!)]
Narrow prior distributions have a strong influence on the posterior distribution.
With lots of data the prior distribution does not make much difference anymore (although narrow priors retain their influence longer).
With fewer data the prior distribution does make a lot of difference, so you have to have good reasons for your prior distribution.
This was an atypical case: the blue-line prior was quite reasonable for coin-like objects, but this "coin" was a computer simulation (with H = .40).
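A rough sketch of how such a simulation could be reproduced, assuming a "coin" with true P(head) = .40 and the three priors above; the seed, sample sizes, and printed summaries are illustrative, and conjugacy replaces any numerical integration:

```python
# Sketch of the simulation: a "coin" with true P(head) = .40, posteriors
# tracked analytically via the conjugate beta-binomial update.
import numpy as np

rng = np.random.default_rng(0)
H_TRUE = 0.40
priors = {"beta(20,20)": (20, 20),
          "beta(1,1)": (1, 1),
          "beta(.5,.5)": (0.5, 0.5)}

for n in [10, 40, 160, 640]:
    heads = rng.binomial(n, H_TRUE)          # number of heads in n tosses
    for name, (a, b) in priors.items():
        a_post, b_post = a + heads, b + n - heads
        mean = a_post / (a_post + b_post)    # posterior mean of the Beta
        print(f"n = {n:3d}, {name}: posterior mean {mean:.3f}")
```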
Essential: suitable likelihoods and priors.
Priors must be well founded or reasonable, especially when informative (small sd).
The likelihood function must be a good model of the data (or of the data-producing process).
Priors and likelihood functions are preferably conjugate (of the same family, so that their product is tractable).
Simple case: infer μ from one observation D (from a normal distribution with known standard deviation σD):

pdf(μ|D) ∝ L(μ|D)·pdf(μ)   (normal prior with mean mp and sd σp)

(or, in other notation: g(μ|D) ∝ f(D|μ)·g(μ))

The product of the two normal densities is

(1/(σD·√2π))·e^(−(D−μ)²/(2σD²)) · (1/(σp·√2π))·e^(−(μ−mp)²/(2σp²))

A lengthy derivation shows that this yields a normal distribution with

mean: (mp/σp² + D/σD²) / (1/σp² + 1/σD²)
variance: 1 / (1/σp² + 1/σD²)

Adding the fractions and multiplying numerator and denominator by σp²·σD² gives

mean: (σD²·mp + σp²·D) / (σD² + σp²)
variance: (σp²·σD²) / (σD² + σp²)
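A small sketch of this single-observation update with illustrative numbers, checking that the precision-weighted form and the simplified form above agree (the function name is mine):

```python
# Sketch of the single-observation update; both algebraic forms agree.
def posterior_1obs(m_p, s_p, D, s_D):
    mean = (m_p / s_p**2 + D / s_D**2) / (1 / s_p**2 + 1 / s_D**2)
    var = 1 / (1 / s_p**2 + 1 / s_D**2)
    return mean, var

m, v = posterior_1obs(m_p=100, s_p=15, D=130, s_D=10)
m_alt = (10**2 * 100 + 15**2 * 130) / (10**2 + 15**2)   # simplified form
assert abs(m - m_alt) < 1e-9
print(m, v)   # mean lies between m_p and D; variance below both variances
```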
If there are n independent observations D1, D2, D3, …, Dn with mean mD:

mean: (mp/σp² + n·mD/σD²) / (1/σp² + n/σD²)
variance: 1 / (1/σp² + n/σD²)

Adding the fractions and multiplying numerator and denominator by σp²·σD² gives

mean: (σD²·mp + σp²·n·mD) / (σD² + n·σp²)
variance: (σp²·σD²) / (σD² + n·σp²)

- Weighing by the inverse of the variances
- A small-variance prior weighs heavily
- A large n swamps the prior
- More data: the posterior variance decreases
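A sketch of the n-observation formula with illustrative numbers, showing the posterior mean moving toward mD and the variance shrinking as n grows, as the bullets above state:

```python
# Sketch of the n-observation case: prior gets swamped as n grows.
def posterior_nobs(m_p, s_p, m_D, s_D, n):
    mean = (s_D**2 * m_p + s_p**2 * n * m_D) / (s_D**2 + n * s_p**2)
    var = (s_p**2 * s_D**2) / (s_D**2 + n * s_p**2)
    return mean, var

for n in [1, 10, 100, 1000]:
    mean, var = posterior_nobs(m_p=100, s_p=15, m_D=130, s_D=10, n=n)
    print(f"n = {n:4d}: mean {mean:6.2f}, variance {var:7.4f}")
```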
Example of Bayesian statistics at UU (the approach of Hoijtink c.s.): Bayesian "AN(C)OVA".
Instead of asking "is there a significant difference between groups, and if so, which?", the question becomes "how much support do the data give to specific informative models?"
What follows is only a global description! For more information see:
Klugkist, I., Laudy, O., & Hoijtink, H. (2005). Inequality constrained analysis of variance: A Bayesian approach. Psychological Methods, 10, 477-493.
Model selection according to the Bayes factor. Remember:

p(A|B)/p(¬A|B) = p(B|A)/p(B|¬A) · p(A)/p(¬A)

posterior odds = likelihood ratio (Bayes factor) · prior odds

BF = posterior odds / prior odds

The Bayes factor in general is the extent to which the data support one model better (or worse) than another model:

BF12 = p(D|M1) / p(D|M2)
Example 1: four groups, three models (constraints on the values of the μ's):

M1: (μ1, μ2) > (μ3, μ4)
M2: μ1 < μ2 < μ3, μ4
M3: μ1 < μ2, μ3 ≈ μ4

against the encompassing model without constraints:

M0: μ1, μ2, μ3, μ4

For explanatory purposes, a simpler example 2 with two groups:

M1: μ1 > μ2
M2: μ1 ≈ μ2
M0: μ1, μ2

[Figure: the (μ1, μ2) parameter space for each of the two-group models]
Specify a diffuse prior for M0.
Compute the posterior (≈ likelihood) for the encompassing model (M0), given the data.
For every model, estimate the proportion of the encompassing prior (1/c) and of the posterior (1/d) satisfying the constraints implied by that model. This works with simulated sampling, as sketched below.

For M0: 1/c0 = 1/d0 = 1.
For M1: 1/c1 = .5 (viewed from above, half of the (μ1, μ2) plane satisfies μ1 > μ2) and, for example, 1/d1 = .99.
For M2: 1/c2 = .02 and 1/d2 = .003.

[Figure: the (μ1, μ2) plane viewed from above, with the regions for M1 and M2 marked]
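A sketch of how 1/c1 and 1/d1 could be estimated by simulated sampling for M1: μ1 > μ2; the diffuse prior and the stand-in posterior below are purely illustrative assumptions of mine, not the actual implementation by Hoijtink et al.:

```python
# Sketch of the simulated-sampling estimate of 1/c1 and 1/d1 for
# M1: mu1 > mu2. Prior and stand-in posterior are illustrative only.
import numpy as np

rng = np.random.default_rng(1)
n_draws = 100_000

prior = rng.normal(0.0, 100.0, size=(n_draws, 2))   # diffuse prior draws
c1_inv = np.mean(prior[:, 0] > prior[:, 1])         # ~ .5 by symmetry

post = rng.normal([5.0, 1.0], 1.0, size=(n_draws, 2))  # stand-in posterior
d1_inv = np.mean(post[:, 0] > post[:, 1])              # close to 1

print(f"1/c1 = {c1_inv:.3f}, 1/d1 = {d1_inv:.3f}, BF10 = {d1_inv/c1_inv:.2f}")
```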
Select a model by Bayes factors. In general:

p(M|D) = p(D|M)·p(M) / p(D)   ⇒   p(D|M) = p(M|D)·p(D) / p(M)

so that

BF10 = p(D|M1)/p(D|M0) = [p(M1|D)·p(D)/p(M1)] / [p(M0|D)·p(D)/p(M0)]

Within the encompassing model p(M0|D) = p(M0) = 1, while the constrained model's proportions give p(M1|D) = 1/d1 and p(M1) = 1/c1, so:

BF10 = (1/d1) / (1/c1)

and in general:

BFm0 = (1/dm) / (1/cm)
The Bayes factor takes the complexity (the size of the parameter space) of a model into account through the denominator 1/c.
Assuming the prior probabilities of all models (including the encompassing model) are equal, the posterior model probability (PMP) can be computed from the Bayes factors:
PMPi = p(Mi|D) = p(D|Mi)·p(Mi) / Σj p(D|Mj)·p(Mj)

With equal model priors:

PMPi = p(D|Mi) / Σj p(D|Mj) = [p(D|Mi)/p(D|M0)] / Σj [p(D|Mj)/p(D|M0)] = BFi0 / Σj BFj0
Our example:

BF10 = .99/.5 = 1.98
BF20 = .003/.02 = .15

PMP1 = 1.98/(1.98 + .15 + 1) = .63
PMP2 = .15/(1.98 + .15 + 1) = .05
PMP0 = 1/(1.98 + .15 + 1) = .32
M1 is clearly superior.
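A quick check of this arithmetic:

```python
# Quick check of the PMP arithmetic (the BF of M0 against itself is 1).
bf = {"M1": 0.99 / 0.5, "M2": 0.003 / 0.02, "M0": 1.0}
total = sum(bf.values())                      # 1.98 + .15 + 1 = 3.13
for model, value in bf.items():
    print(model, round(value / total, 2))     # M1 .63, M2 .05, M0 .32
```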
Example 1 (four groups) would require four-dimensional drawings, but it can be computed with the software by Hoijtink and colleagues (2 to 8 groups and 0 to 2 covariates!).