5. Maximum Likelihood – II
Prof. Yuille. Stat 231. Fall 2004.

Topics
• Exponential distributions, sufficient statistics, and MLE.
• Maximum entropy principle.
• Model selection.

Exponential Distributions
• Gaussians are members of the class of exponential distributions, which take the form
  P(x | \lambda) = \frac{1}{Z(\lambda)} \exp\Big( \sum_{i=1}^{M} \lambda_i \phi_i(x) \Big), \quad Z(\lambda) = \sum_x \exp\Big( \sum_{i=1}^{M} \lambda_i \phi_i(x) \Big).
• Parameters: \lambda = (\lambda_1, \dots, \lambda_M). Statistics: \phi(x) = (\phi_1(x), \dots, \phi_M(x)).

Sufficient Statistics
• The \phi_i(x) are the sufficient statistics of the distribution.
• Knowledge of the observed statistics \psi_i = \frac{1}{N} \sum_{a=1}^{N} \phi_i(x_a) is all we need to know about the data \{x_a\}; the rest is irrelevant for estimating \lambda.
• Almost all standard distributions can be expressed as exponentials: Gaussian, Poisson, etc.

Sufficient Statistics of a Gaussian
• One-dimensional Gaussian P(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\big( -\frac{(x - \mu)^2}{2\sigma^2} \big) and samples x_1, \dots, x_N.
• The sufficient statistics are \sum_a x_a and \sum_a x_a^2.
• These are sufficient to learn the parameters of the distribution from the data.

MLE for a Gaussian
• To estimate the parameters (\mu, \sigma^2), maximize \prod_a P(x_a | \mu, \sigma^2), or equivalently maximize \sum_a \log P(x_a | \mu, \sigma^2).
• The sufficient statistics are chosen so that the log-likelihood depends on the data only through them:
  \sum_a \log P(x_a | \mu, \sigma^2) = -\frac{1}{2\sigma^2} \Big( \sum_a x_a^2 - 2\mu \sum_a x_a + N\mu^2 \Big) - \frac{N}{2} \log(2\pi\sigma^2).
• Setting the derivatives to zero gives \hat{\mu} = \frac{1}{N} \sum_a x_a and \hat{\sigma}^2 = \frac{1}{N} \sum_a x_a^2 - \hat{\mu}^2 (a numerical sketch follows these slides).

Sufficient Statistics for a Gaussian
• In exponential form the distribution is P(x | \lambda) \propto \exp( \lambda_1 x + \lambda_2 x^2 ), with \lambda_2 < 0.
• This is the same as a Gaussian with mean \mu = -\lambda_1 / (2\lambda_2) and variance \sigma^2 = -1 / (2\lambda_2).

Exponential Models and MLE
• MLE corresponds to maximizing \sum_a \log P(x_a | \lambda) = N \big( \sum_i \lambda_i \psi_i - \log Z(\lambda) \big).
• Equivalent to minimizing F(\lambda) = \log Z(\lambda) - \sum_i \lambda_i \psi_i, where \psi_i = \frac{1}{N} \sum_a \phi_i(x_a) are the observed statistics.

Exponential Models and MLE (continued)
• This minimization is a convex optimization problem and hence has a unique solution, but finding the solution may be difficult.
• Algorithms such as Generalized Iterative Scaling are guaranteed to converge (a gradient-descent sketch follows these slides).

Maximum Entropy Principle
• An alternative way to think of exponential distributions and MLE.
• Start with the statistics, and then estimate the form and the parameters of the probability distribution, using the maximum entropy principle.

Entropy
• The entropy of a distribution is H[P] = -\sum_x P(x) \log P(x).
• Defined by Shannon as a measure of the information obtained by observing a sample from P(x).

Maximum Entropy Principle
• Maximum entropy principle: select the distribution P(x) which maximizes the entropy subject to constraints.
• With Lagrange multipliers \lambda_i the objective is
  L[P; \lambda] = -\sum_x P(x) \log P(x) + \sum_i \lambda_i \Big( \sum_x P(x) \phi_i(x) - \psi_i \Big) + \lambda_0 \Big( \sum_x P(x) - 1 \Big).
• The observed values of the statistics are \psi_i = \frac{1}{N} \sum_a \phi_i(x_a).

Maximum Entropy
• Extremizing with respect to P(x) gives the (exponential) form of the distribution: P(x) = \frac{1}{Z(\lambda)} \exp\big( \sum_i \lambda_i \phi_i(x) \big).
• Extremizing with respect to the Lagrange parameters ensures that the constraints are satisfied: \sum_x P(x) \phi_i(x) = \psi_i.

Maximum Entropy (continued)
• This gives the same result as MLE for exponential distributions.
• Maximum entropy + constraints = exponential distribution + MLE parameters.
• The max-ent distribution which has the observed sufficient statistics is the exponential distribution with those statistics.
• Example: a Gaussian can be obtained by performing max-ent on the statistics x and x^2.

Minimax Principle
• Construct a distribution incrementally by increasing the number of statistics M.
• The entropy of the max-ent distribution with M statistics is given by H_M = \log Z(\lambda^*) - \sum_{i=1}^{M} \lambda_i^* \psi_i, evaluated at the optimal parameters \lambda^*.
• Minimax principle: select the statistics to minimize the entropy of the maximum entropy distribution. This relates to model selection.

Model Selection
• Suppose we do not know which model generates the data.
• Two models M_1 and M_2 with parameters \theta_1 and \theta_2; priors P(\theta_1 | M_1), P(\theta_2 | M_2) and P(M_1), P(M_2).
• Model selection enables us to estimate which model is most likely to have generated the data.

Model Selection (continued)
• Calculate P(M_1 | D) \propto P(M_1) \int P(D | \theta_1, M_1) P(\theta_1 | M_1) \, d\theta_1 and compare it with P(M_2 | D) \propto P(M_2) \int P(D | \theta_2, M_2) P(\theta_2 | M_2) \, d\theta_2.
• Observe that we must sum (integrate) over all possible values of the model parameters.

Model Selection & Minimax
• The entropy of the max-ent distribution, H_M = \log Z(\lambda^*) - \sum_i \lambda_i^* \psi_i, equals minus the average log-probability of the data: H_M = -\frac{1}{N} \sum_a \log P(x_a | \lambda^*).
• So the minimax principle is a form of model selection, but it estimates the parameters instead of summing them out.

Model Selection (continued)
• Important issue: suppose model M_2 has more parameters than M_1. Then M_2 is more flexible and can fit a larger number of data sets.
• But summing over the parameters, \int P(D | \theta_2, M_2) P(\theta_2 | M_2) \, d\theta_2, penalizes this flexibility.
• This gives "Occam's razor", favoring the simpler model (a numerical sketch follows these slides).
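A minimal numerical sketch (in Python) of the Gaussian MLE described above: the raw samples are touched only once, to compute the two sufficient statistics, after which the estimates are functions of \sum_a x_a and \sum_a x_a^2 alone. The true parameters, sample size, and seed are illustrative choices, not values from the lecture.

import numpy as np

# Illustrative data: samples from a Gaussian with known parameters.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=10_000)

# The two sufficient statistics of the one-dimensional Gaussian.
N = len(x)
s1 = x.sum()          # \sum_a x_a
s2 = (x ** 2).sum()   # \sum_a x_a^2

# The MLE depends on the data only through (N, s1, s2).
mu_hat = s1 / N                  # \hat{\mu} = (1/N) \sum_a x_a
var_hat = s2 / N - mu_hat ** 2   # \hat{\sigma}^2 = (1/N) \sum_a x_a^2 - \hat{\mu}^2

print(mu_hat, var_hat)  # close to the true values 2.0 and 1.5^2 = 2.25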
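The convex minimization of F(\lambda) = \log Z(\lambda) - \sum_i \lambda_i \psi_i can also be illustrated directly. The sketch below uses plain gradient descent on a small finite alphabet rather than Generalized Iterative Scaling; the alphabet, the statistics \phi_1(x) = x and \phi_2(x) = x^2, the target values \psi, the step size, and the iteration count are all illustrative assumptions. At the minimum the gradient E_P[\phi] - \psi vanishes, so the constraints of the maximum entropy problem are satisfied.

import numpy as np

# Finite alphabet and statistics phi_1(x) = x, phi_2(x) = x^2 (illustrative).
xs = np.arange(-5.0, 6.0)         # alphabet {-5, ..., 5}
phi = np.stack([xs, xs ** 2])     # shape (2, 11)

# Observed statistics psi (assumed values for this sketch).
psi = np.array([0.5, 4.0])

# Minimize the convex objective F(lam) = log Z(lam) - lam . psi.
lam = np.zeros(2)
for _ in range(5000):
    e = phi.T @ lam
    w = np.exp(e - e.max())       # unnormalized weights, stabilized
    p = w / w.sum()               # P(x | lam) = exp(lam . phi(x)) / Z(lam)
    grad = phi @ p - psi          # dF/dlam = E_P[phi] - psi
    lam -= 0.01 * grad            # small gradient step

print(phi @ p)  # ~ [0.5, 4.0]: expected statistics match the observed psi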
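Finally, a small sketch of the Occam's-razor effect of summing over the parameters, using a coin-flip example that is not from the lecture but exhibits the same mechanism. Model M_1 is a fair coin with no free parameters; model M_2 has a bias \theta with a uniform prior, so its evidence integrates \theta out: \int_0^1 \theta^h (1 - \theta)^t \, d\theta = h!\,t!/(N+1)!. The counts are illustrative.

import math

# Illustrative data: N flips with h heads and t tails.
h, t = 28, 32
N = h + t

# Model M_1: fair coin, no free parameters.
log_ev_m1 = N * math.log(0.5)

# Model M_2: bias theta, uniform prior; sum over the parameter:
#   P(D | M_2) = Gamma(h+1) Gamma(t+1) / Gamma(N+2)
log_ev_m2 = math.lgamma(h + 1) + math.lgamma(t + 1) - math.lgamma(N + 2)

# For comparison: M_2 fitted by maximum likelihood (theta = h / N).
log_ml_m2 = h * math.log(h / N) + t * math.log(t / N)

print(log_ev_m1, log_ev_m2, log_ml_m2)
# The fitted M_2 beats M_1 (log_ml_m2 > log_ev_m1), but after summing
# over theta the simpler model wins (log_ev_m1 > log_ev_m2).

The flexible model spreads its prior mass over many possible data sets, so for near-balanced counts the fair coin explains the data better once \theta is summed out: this is the flexibility penalty described in the slides above.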
Model Selection (continued)
• More advanced modeling requires performing model selection where the models are complex.
• This is beyond the scope of this course.