Hidden Markov Chains
Magnos Martinello
Universidade Federal do Espírito Santo - UFES
Departamento de Informática - DI
Laboratório de Pesquisas em Redes Multimidia - LPRM
February 2007
History

Andrey (Andrei) Andreyevich Markov (Russian:
Андрей Андреевич Марков) (June 14, 1856 N.S. –
July 20, 1922) was a Russian mathematician. He is
best known for his work on theory of stochastic
processes. His research later became known as
Markov chains.

His son, another Andrey Andreevich Markov (1903-1979), was also a notable mathematician.
Magnos Martinello – UFES
Markov Chain

In mathematics, a Markov chain, named after Andrey
Markov, is a discrete-time or continuous-time
stochastic process with the Markov property.

A Markov chain is a series of states of a system that
has the Markov property.

A series with the Markov property is a sequence of
states for which the conditional probability
distribution of a future state can be deduced
using only the current state.
Formal Definition

A Markov chain is a sequence of random variables X1,
X2, X3, ... with the Markov property, namely that,
given the present state, the future and past states
are independent. Formally,

Pr(X_{n+1} = x | X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = Pr(X_{n+1} = x | X_n = x_n)
The possible values of Xi form a countable set S
called the state space of the chain. Markov chains
are often described by a directed graph, where the
edges are labeled by the probabilities of going from
one state to the other states.

A finite state machine is an example of a Markov
chain.
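As a minimal sketch of this definition (the two states and their probabilities below are illustrative, not from the slides), a finite Markov chain can be stored as its transition distributions and simulated step by step:

```python
import random

# Illustrative two-state chain; each row maps a state to its
# next-state distribution (each row sums to 1).
P = {
    "A": {"A": 0.7, "B": 0.3},
    "B": {"A": 0.4, "B": 0.6},
}

def step(state, rng):
    """Sample the next state from the distribution P[state]."""
    r = rng.random()
    cum = 0.0
    for nxt, p in P[state].items():
        cum += p
        if r < cum:
            return nxt
    return nxt  # guard against floating-point rounding

def simulate(start, n, seed=0):
    """Return a trajectory of n steps starting from `start`."""
    rng = random.Random(seed)
    chain = [start]
    for _ in range(n):
        chain.append(step(chain[-1], rng))
    return chain

print(simulate("A", 5))
```

By the Markov property, `step` only ever looks at the current state, never at the earlier history.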
Properties

Reducibility : a Markov chain is said to be irreducible if its state space
is a communicating class; this means that, in an irreducible Markov
chain, it is possible to get to any state from any state

Periodicity : A state i has period k if any return to state i must occur
in some multiple of k time steps and k is the largest number with this
property. If k = 1, then the state is said to be aperiodic

Recurrence : A state i is said to be transient if, given that we start in
state i, there is a non-zero probability that we will never return
to i. If a state i is not transient (it has finite hitting time with
probability 1), then it is said to be recurrent or persistent.
A state i is called absorbing if it is impossible to leave this state.

Ergodicity : A state i is said to be ergodic if it is aperiodic and
positive recurrent
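The reachability idea behind irreducibility can be sketched directly: a chain is irreducible iff every state can reach every state through transitions of positive probability. A minimal check (the example chains below are illustrative), treating the chain as a directed graph of nonzero transitions:

```python
from collections import deque

def reachable(graph, start):
    """All states reachable from `start` via nonzero-probability edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        for t in graph[s]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

def is_irreducible(graph):
    """True iff every state can reach every other state."""
    states = set(graph)
    return all(reachable(graph, s) == states for s in states)

# Illustrative chains (edges = transitions with positive probability).
print(is_irreducible({"a": ["b"], "b": ["a"]}))       # one communicating class
print(is_irreducible({"a": ["a", "b"], "b": ["b"]}))  # "b" is absorbing
```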
Scientific Applications

Markovian systems appear extensively in physics,
particularly statistical mechanics

Markov chains can also be used to model various
processes in queueing theory and statistics. Claude
Shannon's famous 1948 paper A mathematical theory
of communication, which at a single step created the
field of information theory, opens by introducing the
concept of entropy (effective data compression
through entropy coding techniques) through Markov
modeling. They also allow effective state estimation
and pattern recognition.
Scientific Applications

The PageRank of a webpage as used by Google is defined by a
Markov chain: it is the probability of being at page i under the
stationary distribution of a Markov chain defined on all
(known) webpages.

Markov models have also been used to analyze web navigation
behavior of users. A user's web link transition on a particular
website can be modeled using first or second order Markov
models

Markov chain methods have also become very important for
generating sequences of random numbers to accurately reflect
very complicated desired probability distributions - a process
called Markov chain Monte Carlo or MCMC for short. In recent
years this has revolutionised the practicability of Bayesian
inference methods.

Markov parody generator (Emacs, M-x dissociated-press)
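The PageRank idea above can be sketched as power iteration on a tiny hypothetical link graph (the three pages and link structure are invented for illustration; 0.85 is the commonly used damping factor):

```python
# Hypothetical link graph: page -> pages it links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
pages = sorted(links)
d = 0.85  # damping factor

# Start uniform, then repeatedly redistribute rank along the links.
rank = {p: 1.0 / len(pages) for p in pages}
for _ in range(100):
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += d * rank[p] / len(outs)
    rank = new

print({p: round(r, 3) for p, r in rank.items()})
```

This tiny sketch ignores dangling pages (pages with no outlinks), which a real implementation must handle.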
Weather Prediction Model

The probabilities of weather conditions, given the
weather on the preceding day, can be represented by
a transition matrix:

           sunny  rainy
  sunny  [  0.9    0.1 ]
  rainy  [  0.5    0.5 ]

Pij is the probability that, if a given day is of type i,
it will be followed by a day of type j.

Note that the rows of P sum to 1: this is because P is
a stochastic matrix.
Predicting the Weather

The weather on day 0 is known to be sunny. This is
represented by a vector in which the "sunny" entry is
100%, and the "rainy" entry is 0%:

x(0) = [1  0]

The weather on day 1 can be predicted by:

x(1) = x(0) P = [0.9  0.1]

Thus, there is a 90% chance that day 1 will also be
sunny.
The weather on day 2 can be predicted in the same
way:

x(2) = x(1) P = x(0) P^2 = [0.86  0.14]

General rules for day n are:

x(n) = x(n-1) P
x(n) = x(0) P^n
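The day-by-day computation can be sketched in a few lines (the matrix entries follow from the example: 0.9 for sunny→sunny, 0.5 for rainy→sunny):

```python
# Transition matrix, rows/columns ordered [sunny, rainy].
P = [[0.9, 0.1],
     [0.5, 0.5]]

def next_day(x, P):
    """One step of x(n) = x(n-1) P for a row vector x."""
    return [sum(x[i] * P[i][j] for i in range(len(x)))
            for j in range(len(P[0]))]

x = [1.0, 0.0]  # day 0: known to be sunny
for day in (1, 2):
    x = next_day(x, P)
    print(f"day {day}: sunny={x[0]:.2f}, rainy={x[1]:.2f}")
```

This reproduces the 90% figure for day 1 and gives [0.86, 0.14] for day 2.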
Steady State

In this example, predictions for the weather on more
distant days are increasingly inaccurate and tend
towards a steady state vector.

The steady state vector is defined as:

q = lim_{n -> infinity} x(n)

Since q is independent of the initial conditions, it
must be unchanged when transformed by P.
Steady State
Writing q = [q1  q2], the condition q P = q gives

0.9 q1 + 0.5 q2 = q1

so

− 0.1 q1 + 0.5 q2 = 0
Conclusion

Since q is a probability vector we know that

q1 + q2 = 1.

Solving this pair of simultaneous equations gives the steady
state distribution:

q = [q1  q2] = [5/6  1/6] ≈ [0.833  0.167]

In conclusion, in the long term, about 83% of days are sunny.
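The steady state can be checked numerically by power iteration with the example's transition matrix; any starting distribution converges to the same q:

```python
# Transition matrix of the example, ordered [sunny, rainy].
P = [[0.9, 0.1],
     [0.5, 0.5]]

x = [0.5, 0.5]  # an arbitrary initial probability vector
for _ in range(200):
    # One step of x(n) = x(n-1) P.
    x = [sum(x[i] * P[i][j] for i in range(2)) for j in range(2)]

print(f"q = [{x[0]:.3f}, {x[1]:.3f}]")  # approaches [5/6, 1/6]
```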

For the most prominent example of the use of Markov chains, see
Google. A description of the PageRank algorithm, which is
basically a Markov chain over the graph of the Web, can be
found in the seminal paper, "The PageRank Citation Ranking:
Bringing Order to the Web" by Larry Page, Sergey Brin, R.
Motwani, and T. Winograd.
Definition

A hidden Markov model (HMM) is a statistical model
in which the system being modeled is assumed to be a
Markov process with unknown parameters, and the
challenge is to determine the hidden parameters
from the observable parameters. The extracted
model parameters can then be used to perform
further analysis, for example for pattern recognition
applications. An HMM can be considered the
simplest dynamic Bayesian network.
Hidden Markov Chain
State transitions in a hidden Markov model (example)
x — hidden states
y — observable outputs
a — transition probabilities
b — output probabilities
Intuition/Application

In a regular Markov model, the state is directly
visible to the observer, and therefore the state
transition probabilities are the only parameters. In a
hidden Markov model, the state is not directly
visible, but variables influenced by the state are
visible.

Hidden Markov models are especially known for their
application in temporal pattern recognition such as
speech, handwriting, gesture recognition, musical
score following and bioinformatics.
HMMs and their Usage

HMMs are very common in Computational Linguistics:

Speech recognition (observed: acoustic signal; hidden: words)

Handwriting recognition (observed: image; hidden: words)

Machine translation (observed: foreign words; hidden: words
in target language)
Architecture of a Hidden Markov Model

The diagram below shows the general architecture of an HMM.
Each oval shape represents a random variable that can adopt a
number of values. The random variable x(t) is the value of the
hidden variable at time t. The random variable y(t) is the value
of the observed variable at time t. The arrows in the diagram
denote conditional dependencies.

From the diagram, it is clear that the value of the hidden
variable x(t) (at time t) only depends on the value of the hidden
variable x(t − 1) (at time t − 1). This is called the Markov
property. Similarly, the value of the observed variable y(t) only
depends on the value of the hidden variable x(t) (both at time
t).
The Capricious Modern Lady

Assume you have a friend who lives far away and to
whom you talk daily over the telephone.

Your friend is only interested in three activities:
walking in the park, shopping, and cleaning her
apartment.

The choice of what to do is determined exclusively by
the weather on a given day.

Based on what she tells you she did each day, you try
to guess what the weather must have been like.
The Capricious Modern Lady

You believe that the weather operates as a discrete Markov
chain. There are two states, "Rainy" and "Sunny", but you cannot
observe them directly, that is, they are hidden from you.

On each day, there is a certain chance that your friend will
perform one of the following activities, depending on the
weather: "walk", "shop", or "clean". Since your friend tells you
about her activities, those are the observations.

The entire system is that of a hidden Markov model (HMM).

You know the general weather trends in the area, and what your
friend likes to do on average. In other words, the parameters of
the HMM are known.
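Those known parameters can be written down explicitly. The numbers below are illustrative assumptions (the slides give no values), but they show the three parameter sets every HMM needs:

```python
# Hidden states, observable activities, and the known parameters.
states = ("Rainy", "Sunny")
observations = ("walk", "shop", "clean")

start_p = {"Rainy": 0.6, "Sunny": 0.4}               # initial weather
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},    # weather transitions
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},  # activities
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

# Sanity check: every distribution must sum to 1.
assert abs(sum(start_p.values()) - 1.0) < 1e-12
for s in states:
    assert abs(sum(trans_p[s].values()) - 1.0) < 1e-12
    assert abs(sum(emit_p[s].values()) - 1.0) < 1e-12
```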
Probability of an observed sequence

The probability of observing a sequence Y =
y(0), y(1), ..., y(L-1) of length L is given by:

P(Y) = Σ_X P(Y | X) P(X)

where the sum runs over all possible hidden node
sequences X = x(0), x(1), ..., x(L-1). A brute-force
calculation of P(Y) is intractable for realistic
problems, as the number of possible hidden node
sequences is typically extremely high. The calculation
can, however, be sped up enormously using an
algorithm called the forward-backward procedure.
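A minimal sketch of the forward part of that procedure, which computes P(Y) in O(L·|S|²) time instead of summing over all |S|^L hidden sequences (the two-state model and its probabilities below are hypothetical):

```python
def forward(observations, states, start_p, trans_p, emit_p):
    """Forward procedure: alpha[t][s] = P(y(0..t), x(t) = s)."""
    alpha = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    for y in observations[1:]:
        prev = alpha[-1]
        alpha.append({
            s: emit_p[s][y] * sum(prev[r] * trans_p[r][s] for r in states)
            for s in states
        })
    return sum(alpha[-1].values())  # marginalize the last hidden state

# Hypothetical two-state model (same structure as the friend example).
states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

p = forward(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
print(f"P(Y) = {p:.5f}")
```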
Using Hidden Markov Models




There are three canonical problems associated with HMMs:
Given the parameters of the model, compute the probability of
a particular output sequence. This problem is solved by the
forward-backward algorithm.
Given the parameters of the model, find the most likely
sequence of hidden states that could have generated a given
output sequence. This problem is solved by the Viterbi
algorithm.
Given an output sequence or a set of such sequences, find the
most likely set of state transition and output probabilities. In
other words, train the parameters of the HMM given a dataset
of sequences. This problem is solved by the Baum-Welch
algorithm.
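A compact sketch of the Viterbi algorithm (most likely hidden-state sequence for a given output sequence); the model parameters below are hypothetical, with the same structure as the friend example:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best hidden-state path, its probability)."""
    # delta[s] = probability of the best path ending in state s;
    # paths[s] = that best path.
    delta = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    paths = {s: [s] for s in states}
    for y in observations[1:]:
        new_delta, new_paths = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: delta[r] * trans_p[r][s])
            new_delta[s] = delta[best_prev] * trans_p[best_prev][s] * emit_p[s][y]
            new_paths[s] = paths[best_prev] + [s]
        delta, paths = new_delta, new_paths
    best = max(states, key=lambda s: delta[s])
    return paths[best], delta[best]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

path, prob = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
print(path, prob)
```

Unlike the forward procedure, which sums over predecessor states, Viterbi maximizes over them, so it recovers a single best explanation rather than a total probability.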