Part 10: Latent variables and missing data

Missing data

The Bayesian approach to missing data problems in statistics is straightforward
• since missing data may be treated the same as any other unknown parameter
  ‣ with a full conditional for Gibbs sampling (GS)
  ‣ or via Metropolis-Hastings (MH)

Data augmentation

Dealing with missing data is so easy that sometimes it makes sense to imagine that there is missing data to make inference more computationally tractable
• such imaginary missing quantities are sometimes called latent variables

This technique is called data augmentation, and is especially useful when latent variables
• allow full conditionals for the other unknown parameters to be derived
• and have full conditionals themselves

Example: Genetic linkage

In a genetic linkage study, 197 animals are each allocated to one of four categories

$$Y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$

with probabilities

$$\left(\frac{1}{2} + \frac{\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right)$$

where $\theta$ is unknown

Example: Intractable posterior

Suppose we place a $\text{Beta}(a, b)$ prior on $\theta$. Then

$$\pi(\theta \mid y) \propto \underbrace{\left(\frac{1}{2} + \frac{\theta}{4}\right)^{y_1} (1-\theta)^{y_2+y_3}\, \theta^{y_4}}_{\text{multinomial likelihood}} \times\ \theta^{a-1} (1-\theta)^{b-1}$$

$$\propto (2+\theta)^{y_1} (1-\theta)^{y_2+y_3+b-1}\, \theta^{y_4+a-1}$$

How can we sample from this?
• One option: MH
• Another option: Data augmentation

Example: Data augmentation

Consider data augmentation with the data set $X = (x_1, x_2, x_3, x_4, x_5)$ with

$$x_1 + x_2 = y_1, \quad x_3 = y_2, \quad x_4 = y_3, \quad x_5 = y_4$$

In other words, divide the first cell, with multinomial probability $(1/2 + \theta/4)$, into two cells with probabilities $1/2$ and $\theta/4$

Example: Data augmented posterior

Then let $X = (Y, Z)$ with "missing" or latent data $Z$: $z = x_1 \Rightarrow x_2 = y_1 - z$, where $z$ is the count in the subcell with probability $\theta/4$, so that

$$\pi(\theta, Z \mid Y) \propto \binom{y_1}{z} \left(\frac{1}{2}\right)^{y_1 - z} \left(\frac{\theta}{4}\right)^{z} (1-\theta)^{y_2+y_3+b-1}\, \theta^{y_4+a-1}$$

Both full conditionals are then standard distributions:

$$\theta \mid Z, Y \sim \text{Beta}(z + y_4 + a,\ y_2 + y_3 + b)$$

$$Z \mid \theta, Y \sim \text{Bin}\!\left(y_1,\ \frac{\theta}{2+\theta}\right)$$

Example: latent variable sampling

[Figure: histogram of the sampled values of z; x-axis: z (roughly 15 to 45), y-axis: Frequency]

Example: latent variable sampling

[Figure: histogram of the sampled values of theta; x-axis: theta (roughly 0.4 to 0.8), y-axis: Frequency]
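The two full conditionals of the genetic linkage example can be turned into a Gibbs sampler by drawing Z and then θ in alternation. The following is a minimal sketch, not the code that produced the slides' histograms. The prior values a = b = 1 (a uniform prior), the starting value, the chain length, the burn-in, and the seed are all assumptions chosen for illustration, and the function name `gibbs_linkage` is hypothetical.

```python
import numpy as np


def gibbs_linkage(y, a=1.0, b=1.0, n_iter=5000, burn_in=500, seed=0):
    """Gibbs sampler for the augmented genetic linkage model.

    y is (y1, y2, y3, y4); a and b are the Beta prior parameters.
    The defaults (a = b = 1, n_iter, burn_in, seed) are illustrative
    assumptions, not values taken from the slides.
    """
    y1, y2, y3, y4 = y
    rng = np.random.default_rng(seed)

    theta = 0.5  # arbitrary starting value inside (0, 1)
    zs = np.empty(n_iter, dtype=int)
    thetas = np.empty(n_iter)

    for t in range(n_iter):
        # Z | theta, Y ~ Bin(y1, theta / (2 + theta))
        z = rng.binomial(y1, theta / (2.0 + theta))
        # theta | Z, Y ~ Beta(z + y4 + a, y2 + y3 + b)
        theta = rng.beta(z + y4 + a, y2 + y3 + b)
        zs[t] = z
        thetas[t] = theta

    # Discard the burn-in draws before summarising
    return zs[burn_in:], thetas[burn_in:]


zs, thetas = gibbs_linkage((125, 18, 20, 34))
print("posterior mean of theta:", thetas.mean())
print("posterior mean of z:", zs.mean())
```

Histograms of `zs` and `thetas` give the kind of marginal posterior summaries shown for z and theta. Each update is an exact draw from a standard distribution, so no proposal tuning or accept/reject step is needed, which is the practical advantage of the augmentation over applying MH directly to the intractable posterior.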