Spring 2006
The following materials are from http://www.mlpedia.org/index.php?title=Markov_chain
A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property.
In such a process, the past is irrelevant for predicting the future, given knowledge of the present.
A Markov chain is a sequence X_1, X_2, X_3, ... of random variables. The range of these variables,
i.e., the set of their possible values, is called the state space, the value of X_n being the state of the
process at time n. If the conditional probability distribution of X_{n+1} given the past states is a
function of X_n alone, then

    P(X_{n+1} = x \mid X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n),

where x is some state of the process.
A simple way to visualise a specific type of Markov chain is through a finite state machine. If
you are at state y at time n, then the probability that you will move on to state x at time n+1 does
not depend on n, and only depends on the current state y that you are in. Hence at any time n, a
finite Markov chain can be characterized by a matrix of probabilities whose x, y element is given
by P( X n1  x | X n  y) and is independent of the time index n.
Andrey Markov produced the first results (1906) for these processes. A generalization to
countably infinite state spaces was given by Kolmogorov (1936). Markov chains are related to
Brownian motion and the ergodic hypothesis, two topics in physics which were important in the
early years of the twentieth century.
Properties of Markov chains
A Markov chain is characterized by the conditional distribution P(X_{n+1} \mid X_n), which is called
the transition probability of the process. This is sometimes called the "one-step" transition
probability. The marginal distribution P(X_n) is the distribution over states at time n. The initial
distribution is P(X_0). There may exist one or more state distributions π such that

    \pi(x) = \int P(X_{n+1} = x \mid X_n = y)\, \pi(y)\, dy,

where y is just a convenient name for the variable of integration. Such a distribution π is called a
stationary distribution or steady-state distribution. Whether there is a stationary distribution, and
whether it is unique if it does exist, are determined by certain properties of the process.
Irreducible means that every state is accessible from every other state. A process is periodic if
there exists at least one state to which the process will continually return with a fixed time period
(greater than one). Aperiodic means that there is no such state. Positive recurrent means that the
expected return time is finite for every state. If the Markov chain is positive recurrent, there
exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique
stationary distribution, and furthermore the process constructed by taking the stationary
distribution as the initial distribution is ergodic.
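To illustrate periodicity concretely, here is a small invented example (using the matrix representation introduced in the next section): a two-state chain that deterministically flips between its states. The uniform distribution is stationary, but the chain has period 2, so its powers oscillate rather than converge.

```python
import numpy as np

# Deterministic flip between two states: each state is revisited
# exactly every 2 steps, so the chain is periodic (period 2).
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])

pi = np.array([0.5, 0.5])
print(pi @ P)                          # [0.5 0.5]: uniform pi is stationary

print(np.linalg.matrix_power(P, 10))   # identity matrix (even steps)
print(np.linalg.matrix_power(P, 11))   # flip matrix (odd steps): no limit
```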
Markov chains in discrete state spaces
If the state space is finite, the transition probability distribution can be represented as a matrix,
called the transition matrix, with the (i, j)'th element equal to P_{ij} = P(X_{n+1} = j \mid X_n = i).
For a discrete state space, if P is the one-step transition matrix, then P^k is the transition matrix for
the k-step transition. The stationary distribution π is a vector which satisfies the equation

    \pi^T P = \pi^T,

where \pi^T is the transpose of π. In other words, π is a left eigenvector of P with eigenvalue 1,
normalized so that its entries are nonnegative and sum to 1.
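This gives one practical way to compute π: find a left eigenvector of P for eigenvalue 1. A minimal sketch, assuming NumPy and reusing the hypothetical 3-state matrix from the simulation example above:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])

# pi^T P = pi^T says pi is a left eigenvector of P with eigenvalue 1,
# equivalently a right eigenvector of P.T.
vals, vecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(vals - 1.0))   # locate the eigenvalue 1
pi = np.real(vecs[:, i])
pi = pi / pi.sum()                  # rescale into a probability vector
print(pi)
print(pi @ P)                       # equals pi, up to rounding
```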
Neither the existence nor the uniqueness of a stationary distribution is guaranteed for a general
transition matrix P; in particular, when the state space is countably infinite, a stationary
distribution may fail to exist at all. However, if the state space is finite and the transition matrix P
is irreducible and aperiodic, then there exists a unique stationary distribution π. In addition, P^k
then converges elementwise to a rank-one matrix in which each row is the (transpose of the)
stationary distribution \pi^T, that is,

    \lim_{k \to \infty} P^k = \mathbf{1} \pi^T,

where \mathbf{1} is the column vector with all entries equal to 1. This is stated by the Perron-Frobenius
theorem.
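The convergence is easy to check numerically. A small sketch with the same hypothetical matrix as above:

```python
import numpy as np

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.4, 0.4]])

# For an irreducible, aperiodic finite P, P^k approaches a rank-one
# matrix whose every row is the stationary distribution pi^T.
Pk = np.linalg.matrix_power(P, 50)
print(Pk)   # all three rows are (nearly) identical
```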
This means that if we simulate or observe a random walk with transition matrix P, then the
long-run probability of finding the walker in a given state is independent of where the chain was
started, and is dictated by the stationary distribution. The random walk "forgets" the past. In
short, Markov chains are the "next thing" after memoryless processes (i.e., sequences of
independent, identically distributed random variables).
A transition matrix which is positive (that is, every element of the matrix is positive) is
irreducible and aperiodic. A matrix is a stochastic matrix if and only if it is the matrix of
transition probabilities of some Markov chain.
The special case in which the transition probability is independent even of the current state, so
that every row of the transition matrix is identical, is known as a Bernoulli scheme. A Bernoulli
scheme with only two possible states is known as a Bernoulli process.
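A tiny sketch of the distinction, with made-up probabilities: in a Bernoulli scheme, sampling the next state never consults the current one.

```python
import numpy as np

# Next-state distribution (0.6, 0.4) regardless of the current state:
# a Bernoulli scheme, and with two states a Bernoulli process.
p = np.array([0.6, 0.4])
P = np.tile(p, (2, 1))   # identical rows
print(P)                 # [[0.6 0.4]
                         #  [0.6 0.4]]
```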
Scientific applications
Markovian systems appear extensively in physics, particularly statistical mechanics, whenever
probabilities are used to represent unknown or unmodelled details of the system, if it can be
assumed that the dynamics are time-invariant, and that no relevant history need be considered
which is not already included in the state description.
Markov chains can also be used to model various processes in queueing theory and statistics.
Claude Shannon's famous 1948 paper "A Mathematical Theory of Communication", which in a
single step created the field of information theory, opens by introducing the concept of entropy
through Markov modeling of the English language. Such idealised models can capture many of
the statistical regularities of systems. Even without describing the full structure of the system
perfectly, such signal models can make possible very effective data compression through entropy
coding techniques such as arithmetic coding. They also allow effective state estimation and
pattern recognition. The world's mobile telephone systems depend on the Viterbi algorithm for
error-correction, while hidden Markov models (where the Markov transition probabilities are
initially unknown and must also be estimated from the data) are extensively used in speech
recognition and also in bioinformatics, for instance for coding region/gene prediction.
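To give a flavour of the computation involved, here is a compact Viterbi sketch for a hypothetical two-state hidden Markov model; all transition, emission, and start probabilities below are invented for the example.

```python
import numpy as np

A = np.array([[0.9, 0.1],     # hidden-state transition matrix (assumed)
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],     # B[s, o] = P(observe o | hidden state s)
              [0.3, 0.7]])
start = np.array([0.5, 0.5])  # initial hidden-state distribution

def viterbi(obs):
    """Most likely hidden-state path for a sequence of observations."""
    logp = np.log(start) + np.log(B[:, obs[0]])
    back = []
    for o in obs[1:]:
        # scores[i, j]: best log-probability of reaching j from i
        scores = logp[:, None] + np.log(A) + np.log(B[:, o])[None, :]
        back.append(scores.argmax(axis=0))
        logp = scores.max(axis=0)
    path = [int(logp.argmax())]
    for bp in reversed(back):       # trace the best predecessors backwards
        path.append(int(bp[path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))     # -> [0, 0, 1, 1, 1]
```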
The PageRank of a webpage as used by Google is defined by a Markov chain. It is the
probability of being at page i in the stationary distribution of the following Markov chain on all
(known) webpages. If N is the number of known webpages, and a page i has k_i outgoing links,
then the transition probability is (1-q)/k_i + q/N to each page that i links to, and q/N to each page
that it does not link to. The parameter q is taken to be about 0.15.
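A small sketch of this chain on an invented four-page link graph; power iteration on π^T P recovers the PageRank vector. The link structure, page count, and iteration count are all assumptions made for the example.

```python
import numpy as np

q, N = 0.15, 4
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 1, 2]}  # made-up web graph

# Build the transition matrix exactly as described above:
# q/N to every page, plus (1-q)/k_i to each of page i's k_i link targets.
P = np.full((N, N), q / N)
for i, out in links.items():
    for j in out:
        P[i, j] += (1 - q) / len(out)

# PageRank is the stationary distribution; iterate pi^T P to find it.
pi = np.full(N, 1.0 / N)
for _ in range(100):
    pi = pi @ P
print(pi)   # the PageRank scores, summing to 1
```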
Markov chain methods have also become very important for generating sequences of random
numbers that accurately reflect very complicated desired probability distributions, via a process
called Markov chain Monte Carlo (MCMC). In recent years this has revolutionised the
practicability of Bayesian inference methods.
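A minimal sketch of one popular MCMC method, random-walk Metropolis, targeting a density known only up to a normalizing constant (here an unnormalized standard normal); the proposal width and sample count are arbitrary choices.

```python
import numpy as np

def target(x):
    """Unnormalized standard normal density."""
    return np.exp(-0.5 * x * x)

rng = np.random.default_rng(0)
x, samples = 0.0, []
for _ in range(10_000):
    proposal = x + rng.normal(scale=1.0)       # symmetric random-walk step
    # Accept with probability min(1, target ratio); otherwise stay put.
    if rng.random() < target(proposal) / target(x):
        x = proposal
    samples.append(x)

print(np.mean(samples), np.std(samples))       # roughly 0 and 1
```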
Markov chains also have many applications in biological modelling, particularly population
processes, which model dynamics that are (at least) analogous to those of biological populations.
A recent application of Markov chains is in geostatistics: Markov chains are used in two- and
three-dimensional stochastic simulations of discrete variables conditional on observed data. Such
an application is called "Markov chain geostatistics"; it plays a role analogous to kriging in
classical geostatistics. The Markov chain geostatistics method is still in development.
Markov chains can be used to model many games of chance. The children's games Chutes and
Ladders and Candy Land, for example, are represented exactly by Markov chains. At each turn,
the player starts in a given state (on a given square) and from there has fixed odds of moving to
certain other states (squares).
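As a worked miniature in the same spirit (a made-up six-square board with a three-value spinner, far simpler than the real games), the expected number of turns to finish can be read off the fundamental matrix of the transient states:

```python
import numpy as np

# Squares 0..5; spin 1-3 with equal probability; square 5 is the
# absorbing finish, and moves that would overshoot it stay put.
n = 6
P = np.zeros((n, n))
for s in range(n - 1):
    for spin in (1, 2, 3):
        t = s + spin
        P[s, t if t < n else s] += 1 / 3
P[n - 1, n - 1] = 1.0                  # finished: absorbing state

# Expected turns to absorption: solve (I - Q) x = 1 for the
# transient block Q (the fundamental-matrix computation).
Q = P[:-1, :-1]
expected = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
print(expected)   # expected[0] is the expected length of a full game
```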