Special Topic: Bayesian Finite Population
Survey Sampling
Sudipto Banerjee
Division of Biostatistics
School of Public Health
University of Minnesota
April 22, 2010
Overview

Scientific survey sampling (Neyman, 1934) represents one of
statistics' greatest contributions to science.
It provides the remarkable ability to obtain useful inferences
about large populations from modest samples, with measurable
uncertainty.
It forms the backbone of data collection in science and
government.
The traditional (and widely favoured) approach is
randomization-based.
Advocating Bayes for survey sampling is like "swimming
upstream": modelling assumptions of any kind are anathema
here, let alone the priors and further subjectivity that Bayes
brings along!
Fundamental Notions

For a population with N units, let Y = (Y1, ..., YN) denote the
population variable.
Yi is the value of unit i in the population.
Let I = (I1, ..., IN) be the set of inclusion indicator variables:
Ii = 1 if unit i is selected in the sample; Ii = 0 otherwise.
Let I′ denote the complement of I, obtained by "switching" the
1's to 0's and the 0's to 1's in I.
Thus, I′ represents the exclusion indicator variables, indexing
the units not selected in the sample.
Simple random sampling

Simple Random Sampling With Replacement (SRSWR): we draw
a unit from the population, put it back, and draw again.
Simple Random Sampling Without Replacement (SRSWOR): we
draw a unit from the population, keep it aside, and draw again
(see the sketch after this list).
SRSWR leads to a binomial distribution; the chances of units
being drawn remain the same between draws.
SRSWOR leads to a hypergeometric distribution; the chances of
units being drawn change from draw to draw.
Key questions:
What are the random variables?
What are the parameters?
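The contrast between the two designs is easy to see computationally. Below is a minimal sketch in Python; the population values and the sizes N and n are illustrative, not taken from the slides.

```python
# Minimal sketch: SRSWR vs. SRSWOR. All values here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N, n = 576, 30                      # hypothetical population and sample sizes
Y = rng.poisson(3.5, size=N) + 1.0  # hypothetical population values

idx_wr = rng.choice(N, size=n, replace=True)    # SRSWR: a unit may repeat
idx_wor = rng.choice(N, size=n, replace=False)  # SRSWOR: each unit at most once

print("SRSWR sample mean: ", Y[idx_wr].mean())
print("SRSWOR sample mean:", Y[idx_wor].mean())
```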
Estimating the population mean

Let YI be the collection of units selected in the sample; we often
write YI as y = (y1, ..., yn).
IMPORTANT: the lower-case yi's simply index the sample; thus yi
does not necessarily represent Yi, but rather the i-th member of
the sample.
Let Ȳ = (1/N) ∑_{i=1}^N Yi be the population mean.
Let ȲI = ȳ = (1/n) ∑_{i=1}^n yi be the sample mean, i.e. the
mean of YI.
Let YI′ be the collection of units excluded from the sample, and
let ȲI′ be the mean of YI′.
Then Ȳ = f ȳ + (1 − f) ȲI′, where f = n/N is the sampling
fraction.
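The decomposition is an exact identity, not an approximation. A quick numerical check in Python, with made-up values:

```python
# Verify Ȳ = f·ȳ + (1 − f)·ȲI′ on a made-up population.
import numpy as np

rng = np.random.default_rng(1)
N, n = 100, 20
Y = rng.normal(10.0, 2.0, size=N)                  # hypothetical population values
mask = np.zeros(N, dtype=bool)
mask[rng.choice(N, size=n, replace=False)] = True  # inclusion indicators I

f = n / N                                          # sampling fraction
ybar = Y[mask].mean()                              # ȳ: mean of the sampled units
ybar_excl = Y[~mask].mean()                        # ȲI′: mean of the excluded units
assert np.isclose(Y.mean(), f * ybar + (1 - f) * ybar_excl)
```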
Estimating the population mean

Key questions answered:
What are the random variables? Answer: the Ii's.
What are the parameters? Answer: the Yi's.
For SRSWOR: E[Ii] = P(Ii = 1) = C(N−1, n−1)/C(N, n) = n/N for
each i, where C(·, ·) denotes the binomial coefficient.
Unbiasedness of the sample mean ȳ (verified numerically below):
write ȳ = (1/n) ∑_{i=1}^N Ii Yi and see:
E[ȳ] = (1/n) ∑_{i=1}^N E[Ii] Yi = (1/n) ∑_{i=1}^N (n/N) Yi = Ȳ.
For SRSWR: ȳ = (1/n) ∑_{i=1}^N Wi Yi, where Wi is the number
of times unit i appears in the sample. Then Wi ∼ Bin(n, 1/N), so
E[Wi] = n/N, hence
E[ȳ] = (1/n) ∑_{i=1}^N E[Wi] Yi = (1/n) ∑_{i=1}^N (n/N) Yi = Ȳ.
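A Monte Carlo check of the unbiasedness claim, with an arbitrary fixed population: averaging ȳ over many independent SRSWOR draws (only the inclusion indicators are random) should recover Ȳ.

```python
# Monte Carlo check that the SRSWOR sample mean is unbiased for Ȳ.
# The population below is arbitrary but held fixed; only I is random.
import numpy as np

rng = np.random.default_rng(2)
N, n, reps = 50, 10, 50_000
Y = rng.gamma(2.0, 3.0, size=N)             # fixed population values

ybars = np.array([Y[rng.choice(N, size=n, replace=False)].mean()
                  for _ in range(reps)])
print("E[ȳ] by Monte Carlo:", ybars.mean())
print("Ȳ (population mean):", Y.mean())     # the two should nearly agree
```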
Bayesian modelling

We need a prior for each population unit Yi.
Suppose the Yi are iid N(µ, σ²).
Let us first assume σ² is known and take P(µ) ∝ 1.
What is "seen"? D = (YI, I).
Note that P(D | µ, σ²) = P(I) P(YI | µ, σ²) ∝ P(YI | µ, σ²), so
the likelihood is a product of n independent N(µ, σ²) densities.
We need the posterior distribution
P(Ȳ | D, σ²) = ∫ P(Ȳ | D, µ, σ²) P(µ | D, σ²) dµ.
Bayesian modelling

Let us take a closer look at P(Ȳ | D, σ²).
Observe that Ȳ = f ȳ + (1 − f) ȲI′.
Given D, ȳ is fixed; so the posterior distribution of ȲI′ will
determine the posterior distribution of Ȳ.
So, what is the posterior distribution P(ȲI′ | D, σ²)?
P(ȲI′ | D, σ²) = ∫ P(ȲI′ | µ, σ²) P(µ | D, σ²) dµ,
where we use P(ȲI′ | D, µ, σ²) = P(ȲI′ | µ, σ²). Note that
P(ȲI′ | µ, σ²) = N(µ, σ²/(N(1 − f))).
Can we simplify this posterior without integrating?
Bayesian modelling

What is the posterior distribution P(µ | D, σ²)?
Note that P(µ | D, σ²) ∝ P(D | µ, σ²)
=⇒ P(µ | D, σ²) = N(ȳ, σ²/n).
Conditional on D and σ², we can write:
Ȳ = f ȳ + (1 − f) ȲI′;  ȲI′ = µ + u1, with u1 ∼ N(0, σ²/(N(1 − f)))
⇒ Ȳ = f ȳ + (1 − f) µ + (1 − f) u1.
Also, conditional on D and σ², µ = ȳ + u2, with u2 ∼ N(0, σ²/n).
Bayesian modelling

Then:
Ȳ = ȳ + (1 − f) u2 + (1 − f) u1.
The posterior P(Ȳ | D, σ²) is therefore
N(ȳ, (1 − f) σ²/n),
since (1 − f)² [σ²/n + σ²/(N(1 − f))] = (1 − f) σ²/n.
Now let σ² be unknown, with prior P(µ, σ²) ∝ 1/σ².
Then we reproduce the classical result:
(Ȳ − ȳ) / √((s²/n)(1 − f)) ∼ tn−1,
where s² = (1/(n−1)) ∑_{i=1}^n (yi − ȳ)² is the sample variance.
Bayesian computation

Note that the marginal posterior distribution is
P(σ² | D) = IG((n−1)/2, (n−1)s²/2),
where s² = (1/(n−1)) ∑_{i=1}^n (yi − ȳ)² is the sample variance.
We can perform exact posterior sampling as follows. For
l = 1, ..., L do the following:
Draw σ²(l) ∼ IG((n−1)/2, (n−1)s²/2).
Draw µ(l) ∼ N(ȳ, σ²(l)/n).
Draw ȲI′(l) ∼ N(µ(l), σ²(l)/(N(1 − f))).
Set Ȳ(l) = f ȳ + (1 − f) ȲI′(l).
The resulting collection {Ȳ(l)}_{l=1}^L is a sample from the
posterior distribution of the population mean.
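A minimal Python sketch of this exact sampling scheme, with a simulated sample y standing in for real data. One detail worth noting: numpy has no inverse-gamma sampler, so σ² ∼ IG(a, b) is drawn as the reciprocal of a Gamma(a, scale = 1/b) variate.

```python
# Exact posterior sampling for Ȳ under P(µ, σ²) ∝ 1/σ².
# The data y and population size N are illustrative.
import numpy as np

rng = np.random.default_rng(3)
N = 576                                  # hypothetical population size
y = rng.normal(3.5, 1.2, size=30)        # hypothetical observed sample
n, L = y.size, 10_000
f = n / N                                # sampling fraction
ybar, s2 = y.mean(), y.var(ddof=1)       # ȳ and s²

# Draw σ²(l) ~ IG((n−1)/2, (n−1)s²/2) via the reciprocal of a Gamma draw
sigma2 = 1.0 / rng.gamma((n - 1) / 2.0, 2.0 / ((n - 1) * s2), size=L)
# Draw µ(l) ~ N(ȳ, σ²(l)/n)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
# Draw ȲI′(l) ~ N(µ(l), σ²(l)/(N(1 − f)))
ybar_excl = rng.normal(mu, np.sqrt(sigma2 / (N * (1 - f))))
# Set Ȳ(l) = f·ȳ + (1 − f)·ȲI′(l)
Ybar = f * ybar + (1 - f) * ybar_excl

print("Posterior mean of Ȳ:", Ybar.mean())
print("95% credible interval:", np.quantile(Ybar, [0.025, 0.975]))
```

Sampling in this order (σ², then µ, then ȲI′) is composition sampling: each draw conditions only on the previous ones, so the resulting draws are exact and independent; no MCMC is needed.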
Homework

A simple random sample of 30 households was drawn from a township containing 576 households. The
numbers of persons per household in the sample were as follows:
5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3, 5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 2, 4
Use a non-informative Bayesian analysis to estimate the total number of people in the area, and compute
the posterior probability that the population total lies within 10% of the sample estimate.

From a list of 468 small 2-year colleges in the North-Eastern United States, a simple random sample of
100 colleges was drawn. Data for the number of students (y) and the number of teachers (x) for these
colleges were summarized as follows:
The total number of students in the sample was 44,987.
The total number of teachers in the sample was 3,079.
Also given are the sample sums of squares ∑_{i=1}^n yi² = 29,881,219 and ∑_{i=1}^n xi² = 111,090.
Assuming a non-informative Bayesian setting in which the populations of students and teachers are
independent, find the posterior mean and 95% credible interval for the student-teacher ratio in the
population (i.e. all 468 colleges combined).
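As a starting point for the first problem, here is one possible setup (a sketch, not a model answer): run the exact sampler above on the household data and scale the posterior draws of Ȳ by N = 576 to obtain posterior draws of the total.

```python
# Sketch for homework problem 1: posterior draws of the population total.
import numpy as np

rng = np.random.default_rng(4)
y = np.array([5, 6, 3, 3, 2, 3, 3, 3, 4, 4, 3, 2, 7, 4, 3,
              5, 4, 4, 3, 3, 4, 3, 3, 1, 2, 4, 3, 4, 2, 4], dtype=float)
N, n, L = 576, y.size, 10_000
f, ybar, s2 = n / N, y.mean(), y.var(ddof=1)

sigma2 = 1.0 / rng.gamma((n - 1) / 2.0, 2.0 / ((n - 1) * s2), size=L)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))
Ybar = f * ybar + (1 - f) * rng.normal(mu, np.sqrt(sigma2 / (N * (1 - f))))

T = N * Ybar                 # posterior draws of the population total
T_hat = N * ybar             # sample-based point estimate of the total
print("P(total within 10% of the estimate):",
      np.mean(np.abs(T - T_hat) <= 0.10 * T_hat))
```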
Missing Data Inference

A general theory for Bayesian missing data analysis.
Let y = (y1, ..., yN) be the units in the population. We assume a
likelihood for this population, say p(y | θ), where θ are the
parameters in the model.
Let I = (I1, ..., IN) be the set of "inclusion indicators".
Let us denote by "obs" the index set {i : Ii = 1} and by "mis"
the index set {i : Ii = 0}.
So, yobs is the set of observed data points and ymis is the set of
unobserved data. We will often write y = (yobs, ymis).
Building the models

We will break the joint probability model into two parts:
The model for the "complete data", p(y | θ), covering both the
observed and the missing data.
The model for the inclusion vector I, say p(I | y, φ).
We define the complete-data likelihood as:
p(y, I | θ, φ) = p(y | θ) p(I | y, φ).
Parameters: scientific interest usually revolves around the
estimation of θ (sometimes called the "super-population"
parameters).
The parameters φ index the missingness model (they are often
called the "design" parameters).
Building the models

The complete-data likelihood is not the data likelihood, as it
involves the missing data as well.
The actual information available is (yobs, I). So the data
likelihood is:
p(yobs, I | θ, φ) = ∫ p(y, I | θ, φ) dymis.
The joint posterior distribution is
p(θ, φ | yobs, I) ∝ p(θ, φ) p(yobs, I | θ, φ)
∝ p(θ, φ) ∫ p(y | θ) p(I | y, φ) dymis.
Building the models

These integrals are evaluated by drawing samples (ymis, θ, φ)
from p(ymis, θ, φ | yobs, I).
Typically, we first compute samples (θ(l), φ(l)) from
p(θ, φ | yobs, I).
Then we draw ymis(l) ∼ p(ymis | yobs, I, θ(l), φ(l)).
Computations often simplify under assumptions such as:
p(I | y, φ) = p(I | yobs, φ)
p(I | y, φ) = p(I | φ) = p(I)
p(φ | θ) = p(φ)
p(ymis | θ(l), yobs) = p(ymis | θ(l)).
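To make the two-stage scheme concrete, here is a minimal sketch assuming a normal complete-data model and an ignorable design, p(I | y, φ) = p(I), so that both stages reduce to draws already derived earlier; all names and values are illustrative.

```python
# Two-stage composition sampling under an ignorable design:
# (1) draw θ = (µ, σ²) from p(θ | yobs); (2) impute ymis from p(ymis | θ).
import numpy as np

rng = np.random.default_rng(5)
N = 200
y_full = rng.normal(0.0, 1.0, size=N)   # hypothetical complete data
I = rng.random(N) < 0.6                 # inclusion indicators, p(I | y, φ) = p(I)
y_obs = y_full[I]
n, L = y_obs.size, 5_000
ybar, s2 = y_obs.mean(), y_obs.var(ddof=1)

# Stage 1: θ(l) from p(µ, σ² | yobs) under the prior P(µ, σ²) ∝ 1/σ²
sigma2 = 1.0 / rng.gamma((n - 1) / 2.0, 2.0 / ((n - 1) * s2), size=L)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# Stage 2: ymis(l) from p(ymis | θ(l)); the design drops out by ignorability
n_mis = N - n
y_mis = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(L, n_mis))

# Each completed data set gives one posterior draw of the population mean
Ybar = (y_obs.sum() + y_mis.sum(axis=1)) / N
print("Posterior mean of Ȳ:", Ybar.mean())
```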