Principle of Maximum Entropy
The principle of maximum entropy states that, subject to precisely stated prior
data (such as a proposition that expresses testable information), the probability
distribution that best represents the current state of knowledge is the one with
the largest entropy.
Another way of stating this: Take precisely stated prior data or testable
information about a probability distribution function. Consider the set of all trial
probability distributions that would encode the prior data. Of those, the one with
maximal information entropy is the proper distribution, according to this
principle.
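For a discrete distribution p = (p_1, ..., p_n), the quantity being maximized is the Shannon information entropy,

H(p) = -\sum_{i=1}^{n} p_i \log p_i ,

which is largest when the probability mass is spread as evenly as the constraints allow.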
History
The principle was first expounded by E. T. Jaynes in two papers in 1957 where
he emphasized a natural correspondence between statistical mechanics and
information theory. In particular, Jaynes offered a new and very general
rationale for why the Gibbsian method of statistical mechanics works. He argued
that the entropy of statistical mechanics and the information entropy of
information theory are essentially the same concept. Consequently, statistical
mechanics should be seen as just a particular application of a general tool of
logical inference and information theory.
Overview
In most practical cases, the stated prior data or testable information is given by a
set of conserved quantities (average values of some moment functions),
associated with the probability distribution in question. This is the way the
maximum entropy principle is most often used in statistical thermodynamics.
Another possibility is to prescribe some symmetries of the probability
distribution. The equivalence between conserved quantities and corresponding
symmetry groups implies a similar equivalence for these two ways of specifying
the testable information in the maximum entropy method.
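For reference, when the testable information takes the form of moment constraints E[f_k(x)] = F_k (together with normalization), the entropy-maximizing distribution has the exponential (Gibbs) form

p(x_i) = \frac{1}{Z(\lambda)} \exp\!\left(\sum_k \lambda_k f_k(x_i)\right), \qquad Z(\lambda) = \sum_i \exp\!\left(\sum_k \lambda_k f_k(x_i)\right),

where the Lagrange multipliers \lambda_k are chosen so that the constraints hold. With a single energy constraint this reduces to the familiar Boltzmann-Gibbs distribution of statistical thermodynamics.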
The maximum entropy principle is also needed to guarantee the uniqueness and
consistency of probability assignments obtained by different methods, statistical
mechanics and logical inference in particular.
The maximum entropy principle makes explicit our freedom in using different
forms of prior data. As a special case, a uniform prior probability density
(Laplace's principle of indifference, sometimes called the principle of
insufficient reason) may be adopted. Thus, the maximum entropy principle is
not merely an alternative way to view the usual methods of inference of
classical statistics, but represents a significant conceptual generalization of
those methods. It means that thermodynamic systems need not be shown to be
ergodic to justify treatment as a statistical ensemble.
In ordinary language, the principle of maximum entropy can be said to express a
claim of epistemic modesty, or of maximum ignorance. The selected
distribution is the one that makes the least claim to being informed beyond the
stated prior data, that is to say the one that admits the most ignorance beyond
the stated prior data.
Testable information
The principle of maximum entropy is useful explicitly only when applied to
testable information. Testable information is a statement about a probability
distribution whose truth or falsity is well-defined. For example, the statements
the expectation of the variable x is 2.87
and
p2 + p3 > 0.6
(where p2 and p3 are probabilities of events) are statements of testable
information.
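As a concrete illustration (the candidate distribution below is hypothetical), each such statement is unambiguously true or false once a distribution is fully specified, which is what makes it testable:

```python
import numpy as np

# Hypothetical candidate distribution over outcomes x = 1..6
x = np.arange(1, 7)
p = np.array([0.05, 0.30, 0.35, 0.10, 0.12, 0.08])
assert np.isclose(p.sum(), 1.0)

# Each testable statement is simply true or false for this distribution.
print("E[x] == 2.87 ?", np.isclose(p @ x, 2.87))
print("p2 + p3 > 0.6 ?", p[1] + p[2] > 0.6)  # p[1], p[2] are p2, p3 (0-based indexing)
```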
Given testable information, the maximum entropy procedure consists of seeking
the probability distribution which maximizes information entropy, subject to the
constraints of the information. This constrained optimization problem is
typically solved using the method of Lagrange multipliers.
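A minimal sketch of this procedure in Python (the constraint value, a prescribed mean of 4.5 for a six-sided die, is an illustrative assumption): with a single mean constraint, the Lagrange-multiplier solution has the exponential form p_i proportional to exp(lam * x_i), and the multiplier can be found numerically, here by bisection.

```python
import numpy as np

def maxent_given_mean(values, target_mean, tol=1e-12):
    """Maximum-entropy distribution over `values` with a prescribed mean.

    The constrained optimum has the form p_i proportional to exp(lam * x_i);
    we solve for the Lagrange multiplier lam by bisection, since the mean of
    this exponential family increases monotonically with lam.
    """
    values = np.asarray(values, dtype=float)

    def mean_and_dist(lam):
        w = np.exp(lam * values - np.max(lam * values))  # numerically stable weights
        p = w / w.sum()
        return p @ values, p

    lo, hi = -50.0, 50.0  # assumed bracket for the multiplier
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_and_dist(mid)[0] < target_mean:
            lo = mid
        else:
            hi = mid
    return mean_and_dist(0.5 * (lo + hi))[1]

# Illustrative example: a six-sided die constrained to have mean 4.5
faces = np.arange(1, 7)
p = maxent_given_mean(faces, 4.5)
print(np.round(p, 4), "mean =", round(float(p @ faces), 4))
```

For several simultaneous constraints the same exponential form holds with one multiplier per constraint, and the multipliers are usually found with a general-purpose optimizer rather than by bisection.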
Entropy maximization with no testable information respects the universal
"constraint" that the sum of the probabilities is one. Under this constraint
alone, the maximum entropy discrete probability distribution over n possible
outcomes is the uniform distribution, which assigns probability 1/n to each
outcome.
Applications
The principle of maximum entropy is commonly applied in two ways to
inferential problems:
Prior probabilities
The principle of maximum entropy is often used to obtain prior probability
distributions for Bayesian inference. Jaynes was a strong advocate of this
approach, claiming that the maximum entropy distribution represented the least
informative distribution. A large amount of literature is now dedicated to the
elicitation of maximum entropy priors and to their links with channel coding.
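Two standard examples of such priors (stated here for illustration; they are not discussed in the text above): among continuous distributions supported on [0, \infty) with a fixed mean \mu, the maximum entropy choice is the exponential density p(x) = \mu^{-1} e^{-x/\mu}, and among distributions on the real line with fixed mean and variance it is the Gaussian.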
Maximum entropy models
Alternatively, the principle is often invoked for model specification: in this case
the observed data itself is assumed to be the testable information. Such models
are widely used in natural language processing. An example of such a model is
logistic regression, which corresponds to the maximum entropy classifier for
independent observations.
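A minimal sketch of such a model (synthetic data and hyperparameters chosen only for illustration): binary logistic regression trained by gradient ascent on the log-likelihood. The gradient is the difference between empirical and model feature expectations, which mirrors the moment-matching constraints of the maximum entropy formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (illustrative only): two Gaussian blobs in 2-D
X = np.vstack([rng.normal(-1.0, 1.0, size=(100, 2)),
               rng.normal(+1.0, 1.0, size=(100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Binary logistic regression, i.e. a maximum entropy classifier for
# independent observations, trained by gradient ascent on the log-likelihood.
w = np.zeros(X.shape[1])
b = 0.0
learning_rate = 0.1

for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # model probabilities P(y = 1 | x)
    w += learning_rate * X.T @ (y - p) / len(y)  # empirical minus model feature expectations
    b += learning_rate * np.mean(y - p)

p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
print("weights:", np.round(w, 3), "bias:", round(float(b), 3))
print("training accuracy:", float(np.mean((p > 0.5) == y)))
```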