Part 10: Latent variables and missing data

Missing data
The Bayesian approach to missing data problems in statistics is straightforward
• since missing data may be treated the same as any other unknown parameter
  ‣ with a full conditional for Gibbs sampling (GS)
  ‣ or via Metropolis-Hastings (MH)

Data augmentation
Dealing with missing data is so easy that sometimes it makes sense to imagine that there is missing data to make inference more computationally tractable
• such imaginary missing quantities are sometimes called latent variables
This technique is called data augmentation, and is especially useful when latent variables
• allow full conditionals for the other unknown parameters to be derived
• and have full conditionals themselves

Example: Genetic linkage
In a genetic linkage study, each of 197 animals is allocated to one of four categories
$Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$
with probabilities
$(1/2 + \theta/4,\ (1-\theta)/4,\ (1-\theta)/4,\ \theta/4)$
where $\theta$ is unknown

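As a quick check of the model above, the following minimal sketch evaluates the four cell probabilities and the multinomial log-likelihood (up to an additive constant) for a trial value of theta. The function names `cell_probs` and `log_likelihood` are illustrative and do not come from the slides.

```python
import math

# Observed counts from the slide: Y = (y1, y2, y3, y4)
Y = (125, 18, 20, 34)

def cell_probs(theta):
    """Multinomial cell probabilities for a given theta in (0, 1)."""
    return (0.5 + theta / 4, (1 - theta) / 4, (1 - theta) / 4, theta / 4)

def log_likelihood(theta, y=Y):
    """Multinomial log-likelihood, dropping the constant multinomial coefficient."""
    return sum(count * math.log(p) for count, p in zip(y, cell_probs(theta)))

# The counts add up to the 197 animals, and the probabilities add up to 1.
print(sum(Y), sum(cell_probs(0.3)))
# Compare the support the data give to two candidate values of theta.
print(log_likelihood(0.1), log_likelihood(0.6))
```
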
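Example: Intractable posterior
Suppose we place a Beta(a, b) prior on $\theta$
Then
$$\pi(\theta \mid y) \propto \underbrace{\left(\frac{1}{2} + \frac{\theta}{4}\right)^{y_1} (1-\theta)^{y_2+y_3}\, \theta^{y_4}}_{\text{multinomial likelihood}} \times\ \theta^{a-1} (1-\theta)^{b-1}$$
$$\propto (2+\theta)^{y_1}\, (1-\theta)^{y_2+y_3+b-1}\, \theta^{y_4+a-1}$$
How can we sample from this?
• One option: MH
• Another option: Data augmentation
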
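The MH option can be sketched as a random-walk Metropolis sampler on the unnormalised posterior. This is only one possible choice, under stated assumptions: the slides do not give the prior hyperparameters, so a = b = 1 is assumed here, and the Gaussian proposal, its step size, and the names `log_post` and `metropolis` are illustrative.

```python
import math
import random

Y = (125, 18, 20, 34)
A, B = 1.0, 1.0  # ASSUMED prior hyperparameters; not given in the slides

def log_post(theta, y=Y, a=A, b=B):
    """Log of (2+theta)^y1 (1-theta)^(y2+y3+b-1) theta^(y4+a-1), up to a constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    y1, y2, y3, y4 = y
    return (y1 * math.log(2 + theta)
            + (y2 + y3 + b - 1) * math.log(1 - theta)
            + (y4 + a - 1) * math.log(theta))

def metropolis(n_iter=20000, step=0.1, theta0=0.5, seed=0):
    """Random-walk Metropolis with a symmetric Gaussian proposal (an assumed choice)."""
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    draws = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        # Symmetric proposal, so the acceptance probability is min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        draws.append(theta)
    return draws

draws = metropolis()
kept = draws[2000:]  # discard an (arbitrary) burn-in period
print(sum(kept) / len(kept))  # Monte Carlo estimate of the posterior mean of theta
```

The step size would normally be tuned to get a reasonable acceptance rate; 0.1 is just a starting guess.
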
Example: Data augmentation
Consider data augmentation with the data set
$X = (x_1, x_2, x_3, x_4, x_5)$
with $x_1 + x_2 = y_1$, $x_3 = y_2$, $x_4 = y_3$, $x_5 = y_4$
In other words, divide the first cell, with multinomial probability $(1/2 + \theta/4)$, into two cells with probabilities $1/2$ and $\theta/4$

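To see why splitting the cell leaves the observed-data model unchanged, assume (the ordering is implied but not stated) that $x_1$ counts the cell with probability $1/2$ and $x_2$ the cell with probability $\theta/4$. Summing the augmented term over every way of splitting $y_1$ then recovers the original first-cell term by the binomial theorem:

```latex
\sum_{x_2 = 0}^{y_1} \binom{y_1}{x_2}
  \left(\frac{1}{2}\right)^{y_1 - x_2} \left(\frac{\theta}{4}\right)^{x_2}
  = \left(\frac{1}{2} + \frac{\theta}{4}\right)^{y_1}
```

So the augmented model has the original likelihood as its marginal, and inference about $\theta$ is unaffected.
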
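Example: Data augmented posterior
Then let $X = (Y, Z)$ with "missing" or latent data $Z$: $z = x_2$, the count in the cell with probability $\theta/4$, so that $x_1 = y_1 - z$
$$\pi(\theta, Z \mid Y) \propto \binom{y_1}{z} \left(\frac{1}{2}\right)^{y_1 - z} \left(\frac{\theta}{4}\right)^{z} (1-\theta)^{y_2+y_3+b-1}\, \theta^{y_4+a-1}$$
so that
$$\theta \mid Z, Y \sim \mathrm{Beta}(z + y_4 + a,\ y_2 + y_3 + b)$$
$$Z \mid \theta, Y \sim \mathrm{Bin}\left(y_1,\ \frac{\theta}{2+\theta}\right)$$
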
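The two full conditionals give a Gibbs sampler that alternates between drawing the latent count and drawing theta. The sketch below assumes a = b = 1, since the slides do not state the hyperparameters; the function name `gibbs` and the starting value are illustrative. The binomial draw is written as a sum of Bernoulli trials so that only the standard library is needed.

```python
import random

Y = (125, 18, 20, 34)
A, B = 1.0, 1.0  # ASSUMED prior hyperparameters; not given in the slides

def gibbs(n_iter=5000, theta0=0.5, y=Y, a=A, b=B, seed=0):
    """Gibbs sampler alternating Z | theta, Y and theta | Z, Y."""
    rng = random.Random(seed)
    y1, y2, y3, y4 = y
    theta = theta0
    zs, thetas = [], []
    for _ in range(n_iter):
        # Z | theta, Y ~ Bin(y1, theta / (2 + theta)), drawn as y1 Bernoulli trials
        p = theta / (2 + theta)
        z = sum(rng.random() < p for _ in range(y1))
        # theta | Z, Y ~ Beta(z + y4 + a, y2 + y3 + b)
        theta = rng.betavariate(z + y4 + a, y2 + y3 + b)
        zs.append(z)
        thetas.append(theta)
    return zs, thetas

zs, thetas = gibbs()
print(sum(zs) / len(zs), sum(thetas) / len(thetas))  # sample means of z and theta
```

Unlike MH, every draw is accepted and there is no proposal to tune, which is the practical appeal of data augmentation in this example.
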
Example: latent variable sampling
[Figure: histogram of the sampled values of z; x-axis: z (ticks from 15 to 45), y-axis: Frequency]

Example: latent variable sampling
[Figure: histogram of the sampled values of theta; x-axis: theta (ticks from 0.4 to 0.8), y-axis: Frequency]