Statistical modelling and latent variables (2)
Mixing latent variables and parameters in statistical inference
Trond Reitan (Division of statistics and insurance mathematics, Department of Mathematics, University of Oslo)
State spaces
We typically have a parametric model for the latent variables, which represent the true state of a system. The distribution of the observations may depend on parameters as well as on the latent variables. Observations can often be seen as noisy versions of the actual state of the system.
Examples of states could be:
1. The physical state of a rocket (position, orientation, velocity, fuel state).
2. Real water temperature (as opposed to measured temperature).
3. Occupancy in an area.
4. Carrying capacity in an area.
[Figure: graphical model with parameters ($\theta$), latent variables ($L$) and data ($D$). Green arrows mark one-way parametric dependency, for which no probability distribution is specified in frequentist statistics.]
Observations, latent variables and parameters – inference
Sometimes we are interested in the parameters, sometimes in the state of the latent variables, and sometimes in both. It is impossible to do inference on the latent variables without also dealing with the parameters, and vice versa.

Often, the parameters that affect the latent variables differ from those that affect the observations.
[Figure: graphical models with separate parameter nodes $\theta$ pointing to the latent variables $L$ and to the data $D$.]
Observations, latent variables and parameters – ML estimation
A latent variable model specifies the distribution of the latent variables given the parameters, and the distribution of the observations given both the parameters and the latent variables. Together, these give the joint distribution of data *and* latent variables:
$f(D, L \mid \theta) = f(L \mid \theta)\, f(D \mid L, \theta)$
But in an ML analysis, we want the likelihood, $f(D \mid \theta)$!
Theory (law of total probability again):
$\Pr(D \mid \theta) = \sum_L \Pr(D, L \mid \theta) = \sum_L \Pr(D \mid L, \theta)\, \Pr(L \mid \theta)$
or
$f(D \mid \theta) = \int f(D, L \mid \theta)\, dL = \int f(D \mid L, \theta)\, f(L \mid \theta)\, dL$
The integral can often not be obtained analytically.
In occupancy models, the sum is easy (there are only two possible states).
Kalman filter: for latent variables forming a linear, normal Markov chain, with normal observations depending linearly on them, this can be done analytically.
Alternatives when analytical methods fail: numerical integration, particle filters, or Bayesian statistics using MCMC.
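As a minimal sketch of this marginalization for a discrete latent state (the function name and all numbers below are illustrative, not from the slides), summing out a two-state latent variable is a one-liner in Python:

```python
# Law of total probability for a two-state latent variable L (as in occupancy):
# f(D | theta) = sum over L of f(D | L, theta) * f(L | theta).
def marginal_likelihood(pr_D_given_L, pr_L):
    """Sum out a discrete latent variable with states 0 and 1."""
    return sum(pr_D_given_L[L] * pr_L[L] for L in (0, 1))

# Illustrative numbers: Pr(L=1) = 0.6, Pr(D | L=1) = 0.2, Pr(D | L=0) = 0.
print(marginal_likelihood({0: 0.0, 1: 0.2}, {0: 0.4, 1: 0.6}))  # 0.12
```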
Occupancy as a state-space model – the model in words
Assume a set of areas, $i \in \{1, \ldots, A\}$. Each area has a set of $n_i$ transects. Each transect has an independent detection probability, $p$, given the occupancy. Occupancy is a latent variable, $\psi_i$, for each area $i$. Assume independence between the occupancy states of different areas. The probability of occupancy is labelled $\psi$, so the parameters are $\theta = (p, \psi)$, with $\Pr(\psi_i = 1 \mid \theta) = \psi$.
Start with the distribution of the observations given the latent variable:
$\Pr(x_{i,j} = 1 \mid \psi_i = 1, \theta) = p$, $\Pr(x_{i,j} = 0 \mid \psi_i = 1, \theta) = 1 - p$,
$\Pr(x_{i,j} = 1 \mid \psi_i = 0, \theta) = 0$, $\Pr(x_{i,j} = 0 \mid \psi_i = 0, \theta) = 1$.
So, for 5 transects with outcome 00101, we get
$\Pr(00101 \mid \psi_i = 1, \theta) = (1-p)(1-p)\,p\,(1-p)\,p = p^2 (1-p)^3$,
$\Pr(00101 \mid \psi_i = 0, \theta) = 1 \cdot 1 \cdot 0 \cdot 1 \cdot 0 = 0$.
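A quick numerical check of the 00101 example (a sketch with an illustrative detection rate p = 0.3; the variable names are mine):

```python
p = 0.3                      # illustrative detection probability
outcome = [0, 0, 1, 0, 1]    # the 00101 transect sequence

# Pr(outcome | psi_i = 1, theta): multiply p for each detection, 1 - p otherwise.
pr_occupied = 1.0
for x in outcome:
    pr_occupied *= p if x == 1 else 1 - p

# Pr(outcome | psi_i = 0, theta): zero as soon as any detection occurs.
pr_empty = 0.0 if any(outcome) else 1.0

print(pr_occupied, p**2 * (1 - p)**3)  # both 0.03087
print(pr_empty)                        # 0.0
```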
Occupancy as a state-space model – graphic model
[Figure: one latent variable per area (area occupancy), $\psi_1, \psi_2, \psi_3, \ldots, \psi_A$; the data $x_{1,1}, x_{1,2}, x_{1,3}, \ldots, x_{1,n_1}$ are detections in single transects; the parameter nodes $p$ and $\psi$ point into the graph.]
Parameters ($\theta$): $\psi$ = occupancy rate, $p$ = detection rate given occupancy.
$\Pr(\psi_i = 1 \mid \theta) = \psi$, $\Pr(\psi_i = 0 \mid \theta) = 1 - \psi$. The area occupancies are independent.
$\Pr(x_{i,j} = 1 \mid \psi_i = 1, \theta) = p$, $\Pr(x_{i,j} = 0 \mid \psi_i = 1, \theta) = 1 - p$,
$\Pr(x_{i,j} = 1 \mid \psi_i = 0, \theta) = 0$, $\Pr(x_{i,j} = 0 \mid \psi_i = 0, \theta) = 1$.
The detections are independent *conditioned* on the occupancy. It is important to keep such things in mind when modelling!
PS: What we’ve done so far is enough to start analyzing using WinBUGS.
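The graph above translates directly into a simulation. Here is a sketch that generates data from the model; the values of psi, p, the number of areas A and the common transect count n are illustrative choices of mine, not the slides':

```python
import numpy as np

rng = np.random.default_rng(1)
psi, p = 0.6, 0.3        # occupancy rate and detection rate (illustrative)
A, n = 50, 5             # 50 areas with 5 transects each (illustrative)

occ = rng.binomial(1, psi, size=A)        # latent states psi_i ~ Bernoulli(psi)
x = rng.binomial(1, p * occ[:, None],     # detections x_ij, conditionally
                 size=(A, n))             # independent given the occupancy
k = x.sum(axis=1)                         # k_i: number of detections per area
```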
Occupancy as a state-space model – probability distribution for a set of transects
The probability for a set of transects to give $k_i > 0$ detections in a given order is
$\Pr(k_i \mid \psi_i = 1, \theta) = p^{k_i} (1-p)^{n_i - k_i}$, $\quad \Pr(k_i \mid \psi_i = 0, \theta) = 0$,
while with no detections,
$\Pr(k_i = 0 \mid \psi_i = 1, \theta) = (1-p)^{n_i}$, $\quad \Pr(k_i = 0 \mid \psi_i = 0, \theta) = 1$.
We can represent this more compactly if we introduce the indicator function: $I(A) = 1$ if $A$ is true, $I(A) = 0$ if $A$ is false. Then
$\Pr(k_i \mid \psi_i = 0, \theta) = I(k_i = 0)$.
With no given order of the $k_i$ detections, we pick up the binomial coefficient:
$\Pr(k_i \mid \psi_i = 1, \theta) = \binom{n_i}{k_i} p^{k_i} (1-p)^{n_i - k_i}$, $\quad \Pr(k_i \mid \psi_i = 0, \theta) = I(k_i = 0)$.
(The binomial coefficient is not relevant for inference; for a given dataset, the constant is just “sitting” there.)
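As a sketch, the two conditional distributions can be written with scipy's binomial pmf (the function name is mine; the numbers are illustrative):

```python
from scipy.stats import binom

def pr_k_given_state(k, n, occupied, p):
    if occupied:              # Pr(k_i | psi_i = 1, theta): Binomial(n_i, p)
        return binom.pmf(k, n, p)
    return float(k == 0)      # Pr(k_i | psi_i = 0, theta) = I(k_i = 0)

print(pr_k_given_state(2, 5, True, 0.3))   # C(5,2) * 0.3^2 * 0.7^3 = 0.3087
print(pr_k_given_state(0, 5, False, 0.3))  # 1.0
```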
Occupancy as a state-space model – area-specific marginal detection probability (likelihood)
For a given area with an unknown occupancy state, the detection probability is then (law of total probability):
$\Pr(k_i \mid \theta) = \Pr(k_i \mid \psi_i = 1, \theta) \Pr(\psi_i = 1 \mid \theta) + \Pr(k_i \mid \psi_i = 0, \theta) \Pr(\psi_i = 0 \mid \theta) = \binom{n_i}{k_i} p^{k_i} (1-p)^{n_i - k_i}\, \psi + I(k_i = 0)(1 - \psi)$
[Figure: the binomial pmf ($p = 0.6$) next to the occupancy model’s pmf ($p = 0.6$, $\psi = 0.6$).]
Occupancy is a zero-inflated binomial model
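A sketch of this zero-inflated binomial pmf (the function name and parameter values are illustrative choices of mine):

```python
from scipy.stats import binom

# Pr(k_i | theta): a Binomial(n_i, p) pmf with weight psi, plus a point
# mass at zero with weight 1 - psi.
def pr_k_marginal(k, n, p, psi):
    return psi * binom.pmf(k, n, p) + (1 - psi) * (k == 0)

# With p = 0.3, psi = 0.6 and n_i = 5, note the inflated mass at k = 0.
for k in range(6):
    print(k, round(pr_k_marginal(k, 5, 0.3, 0.6), 4))
```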
Occupancy as a state-space model – full likelihood
Each area is independent, so the full likelihood is:
$\Pr(\{k_i\} \mid \theta) = \prod_{i=1}^{A} \left[ \binom{n_i}{k_i} p^{k_i} (1-p)^{n_i - k_i}\, \psi + I(k_i = 0)(1 - \psi) \right]$
We can now do inference on the parameters, $\theta = (p, \psi)$, using ML estimation (or using Bayesian statistics).
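A sketch of the ML step: minimize the negative log-likelihood over $\theta = (p, \psi)$ with scipy. The counts k below are made-up illustrative data and the function names are mine:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import binom

def nll(theta, k, n):
    """Negative log-likelihood of the zero-inflated binomial occupancy model."""
    p, psi = theta
    lik = psi * binom.pmf(k, n, p) + (1 - psi) * (k == 0)
    return -np.sum(np.log(lik))

k = np.array([0, 0, 2, 0, 3, 1, 0, 0, 2, 0])      # illustrative counts, n_i = 5
fit = minimize(nll, x0=[0.5, 0.5], args=(k, 5),
               bounds=[(1e-6, 1 - 1e-6)] * 2, method="L-BFGS-B")
p_hat, psi_hat = fit.x                            # ML estimates of p and psi
```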
Occupancy as a state-space model – occupancy inference
Inference on $\psi_i$, given the parameters $\theta$ (Bayes’ theorem):
$\Pr(\psi_i = 1 \mid k_i, \theta) = \dfrac{\Pr(k_i \mid \psi_i = 1, \theta) \Pr(\psi_i = 1 \mid \theta)}{\Pr(k_i \mid \theta)} = 100\%$ for $k_i > 0$,
$\Pr(\psi_i = 1 \mid k_i = 0, \theta) = \dfrac{(1-p)^{n_i}\, \psi}{(1-p)^{n_i}\, \psi + (1 - \psi)}$.
PS: We pretend that $\theta$ is known here. However, $\theta$ is estimated from the data and is not certain at all. We are using the data twice: once to estimate $\theta$ and once to do inference on the latent variables. This is avoided in a Bayesian setting.
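As a sketch, the posterior occupancy probability for an area with no detections (the function name is mine; p = 0.3, psi = 0.6 and n_i = 5 are illustrative values):

```python
# Pr(psi_i = 1 | k_i = 0, theta) from the formula above.
def pr_occupied_given_zero(n, p, psi):
    return psi * (1 - p) ** n / (psi * (1 - p) ** n + (1 - psi))

print(pr_occupied_given_zero(5, 0.3, 0.6))  # about 0.20
```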