Bayesian Inference
Ekaterina Lomakina
TNU seminar: Bayesian inference
1 March 2013
Outline
• Probability distributions
• Maximum likelihood estimation
• Maximum a posteriori estimation
• Conjugate priors
• Conceptualizing models as collection of priors
• Noninformative priors
• Empirical Bayes
Probability distribution
• Density estimation – modelling the distribution p(x) of a random variable x given a finite set of observations x1, …, xN.
Nonparametric approach
• Histogram
• Kernel density estimation
• Nearest neighbor approach
Parametric approach
• Gaussian distribution
• Beta distribution
• …
The Exponential Family
Gaussian distribution
Binomial distribution
Beta distribution
etc…
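All of these distributions share a common functional form. As a reference point (added here, following standard textbook notation; the slide's own formula was not transcribed), the exponential family can be written as

p(x \mid \eta) = h(x)\, g(\eta)\, \exp\{\eta^{\top} u(x)\}

where \eta are the natural parameters and u(x) is the sufficient statistic.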
Gaussian distribution
• Central limit theorem (CLT) states that, given certain
conditions, the mean of a sufficiently large number of
independent random variables, each with a well-defined mean
and well-defined variance, will be approximately normally
distributed
[Figure: bean machine (Galton board) by Sir Francis Galton, illustrating the CLT]
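For reference (added here; not part of the original slide), the resulting Gaussian density is

N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}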
Maximum likelihood estimation
• The frequentist approach to estimating the parameters of a distribution from a set of observations is to maximize the likelihood.
– the data are assumed to be i.i.d., so the likelihood factorizes over observations
– taking the logarithm is a monotonic transformation, so maximizing the log-likelihood is equivalent
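A sketch of the corresponding formulas (reconstructed here; the slide's equations were not transcribed):

p(X \mid \theta) = \prod_{n=1}^{N} p(x_n \mid \theta), \qquad \theta_{ML} = \arg\max_{\theta} \sum_{n=1}^{N} \ln p(x_n \mid \theta)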
MLE for Gaussian distribution
– the ML estimate of the mean is the simple average of the observations
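Reconstructed formula (standard result, added here):

\mu_{ML} = \frac{1}{N} \sum_{n=1}^{N} x_n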
Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution from a set of observations is to maximize the posterior distribution.
• This makes it possible to account for prior information.
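A sketch of the corresponding estimator (added here), using Bayes' rule and dropping the normalizing constant p(X), which does not depend on θ:

\theta_{MAP} = \arg\max_{\theta} p(\theta \mid X) = \arg\max_{\theta} p(X \mid \theta)\, p(\theta)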
MAP for Gaussian distribution
The posterior distribution over the mean is obtained by combining the Gaussian likelihood with a Gaussian prior:
– the resulting estimate is a weighted average of the prior mean and the sample mean
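A sketch of the weighted-average result, assuming a known variance σ² and a Gaussian prior N(μ | μ₀, σ₀²) (notation added here):

\mu_{MAP} = \frac{\sigma^2 \mu_0 + N \sigma_0^2\, \bar{x}}{\sigma^2 + N \sigma_0^2}, \qquad \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n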
Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a
prior p(η) that is conjugate to the likelihood function, so that the
posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in a standard form (see the sketch after the list of pairs below).
• Important conjugate pairs include:
Binomial – Beta
Multinomial – Dirichlet
Gaussian – Gaussian (for mean)
Gaussian – Gamma (for precision)
Exponential – Gamma
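A sketch of the conjugate-prior form referred to above, using the exponential-family notation p(x | η) = h(x) g(η) exp{ηᵀ u(x)} (symbols follow common textbook notation and are an addition here):

p(\eta \mid \chi, \nu) = f(\chi, \nu)\, g(\eta)^{\nu} \exp\{\nu\, \eta^{\top} \chi\}

Here ν can be read as an effective number of pseudo-observations and χ as their average sufficient statistic.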
MLE for Binomial distribution
• Binomial distribution models the probability of m “heads” out
of N tosses.
• The distribution's only parameter, μ, encodes the probability of a single event (“head”).
• The maximum likelihood estimate is given by the formula sketched below.
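Reconstructed formulas (standard results, added here):

Bin(m \mid N, \mu) = \binom{N}{m} \mu^{m} (1-\mu)^{N-m}, \qquad \mu_{ML} = \frac{m}{N}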
MAP for Binomial distribution
• The conjugate prior for this distribution is the Beta distribution.
• The posterior is then given by the expression sketched below, where l = N – m is simply the number of “tails”.
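A sketch of the missing equations, with Beta hyperparameters a and b (notation added here):

Beta(\mu \mid a, b) \propto \mu^{a-1} (1-\mu)^{b-1}, \qquad p(\mu \mid m, l, a, b) \propto \mu^{m+a-1} (1-\mu)^{l+b-1}

so the posterior is again a Beta distribution, with parameters (m + a, l + b).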
Models as collection of priors – 1
• Take a simple regression model
• Add a prior on weights
• And get Bayesian linear regression!
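A minimal sketch of this construction, assuming a linear-Gaussian model (symbols added here, not from the slide):

y_n = w^{\top} \phi(x_n) + \epsilon_n, \quad \epsilon_n \sim N(0, \beta^{-1}), \qquad p(w) = N(w \mid 0, \alpha^{-1} I)

With this Gaussian prior on the weights, the posterior over w and the predictive distribution are again Gaussian.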
Models as collection of priors – 2
• Take again a simple regression model, where yn is some function of xn
• Add a prior on the function
• And get Gaussian processes!
[Figure: graphical models with observations yn and noise precision β; in the Gaussian-process version the function values are coupled through a covariance (kernel) matrix K]
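A sketch of the Gaussian-process prior (standard formulation, added here):

y_n = f(x_n) + \epsilon_n, \quad \epsilon_n \sim N(0, \beta^{-1}), \qquad f \sim \mathcal{GP}(0, k(x, x'))

so any finite vector of function values f = (f(x_1), …, f(x_N)) is jointly Gaussian, N(f \mid 0, K) with K_{nm} = k(x_n, x_m).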
Models as collection of priors – 3
• Take a model where xn is discrete and unknown, with parameters θ
• Add a prior on the states xn, assuming they are temporally smooth
• And get a Hidden Markov Model!
[Figure: chain-structured graphical model with hidden states x1, x2, …, xn-1, xn, xn+1 and corresponding observations t1, t2, …, tn-1, tn, tn+1]
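A sketch of the joint distribution implied by this construction (standard HMM factorization, added here):

p(x_{1:N}, t_{1:N}) = p(x_1) \prod_{n=2}^{N} p(x_n \mid x_{n-1}) \prod_{n=1}^{N} p(t_n \mid x_n)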
Noninformative priors
• Sometimes we have no strong prior belief but still want to
apply Bayesian inference. Then we need noninformative
priors.
• If our parameter λ is a discrete variable with K states then we
can simply set each prior probability to 1/K.
• However, for continuous variables it is not so clear.
• One example is a noninformative prior over the mean μ of a Gaussian distribution: a Gaussian prior whose variance is taken to infinity (see the sketch below).
• We can see that the effect of the prior on the posterior over μ vanishes in this case.
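A sketch of the limiting argument, assuming the Gaussian-mean setting from the MAP slide (notation added here):

p(\mu) = N(\mu \mid \mu_0, \sigma_0^2) \quad \text{with} \quad \sigma_0^2 \to \infty

In this limit the posterior mean \frac{\sigma^2 \mu_0 + N \sigma_0^2 \bar{x}}{\sigma^2 + N \sigma_0^2} \to \bar{x} and the posterior precision \frac{1}{\sigma_0^2} + \frac{N}{\sigma^2} \to \frac{N}{\sigma^2}, so the MAP estimate coincides with the maximum likelihood estimate.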
Empirical Bayes
• But what if we still want to use prior information, yet want to learn it from the data instead of fixing it in advance?
• Imagine the following hierarchical model
[Figure: plate-notation graphical model with a hyperparameter λ, parameters θs for each group s = 1, …, S, and observations xn, n = 1, …, N, within each group]
• We cannot use full Bayesian inference, but we can approximate it by finding the best λ* to maximize p(X|λ)
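A sketch of the quantity being maximized (the marginal likelihood, with θ integrated out; added here):

p(X \mid \lambda) = \int p(X \mid \theta)\, p(\theta \mid \lambda)\, d\theta, \qquad \lambda^{*} = \arg\max_{\lambda} p(X \mid \lambda)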
Empirical Bayes
• We can estimate the result by the following iterative procedure (the EM algorithm):
• Initialize λ*
• E-step: compute p(θ|X, λ*) for the current fixed λ*
• M-step: update λ* by maximizing the expected complete-data log likelihood (see the sketch below)
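A sketch of the M-step update, assuming the standard EM objective for maximizing p(X|λ) (notation added here):

\lambda^{*} \leftarrow \arg\max_{\lambda} \int p(\theta \mid X, \lambda_{\text{old}})\, \ln p(X, \theta \mid \lambda)\, d\theta

Iterating the two steps is guaranteed not to decrease the marginal likelihood p(X|λ).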
• This explains the other name for Empirical Bayes – maximum marginal likelihood.
• This is not a fully Bayesian treatment, but it offers a useful compromise between the Bayesian and frequentist approaches.
Thank you for your attention!