MATH 56A SPRING 2008
STOCHASTIC PROCESSES
1.3. Invariant probability distribution.
Definition 1.4. A probability distribution is a function
π : S → [0, 1]
from the set of states S to the closed unit interval [0, 1] so that
Σ_{i∈S} π(i) = 1.
When the set of states is S = {1, 2, · · · , s}, the condition is:
Σ_{i=1}^{s} π(i) = 1.
Definition 1.5. A probability distribution π is called invariant if
πP = π.
I.e., π is a left eigenvector for P with eigenvalue 1.
1.3.1. Probability distribution of Xn. Each Xn has a probability distribution. I used the following example to illustrate this.
[Diagram: two states 1 and 2, with transition probability 1/3 from 1 to 2 and 1/4 from 2 to 1.]
The numbers 1/3 and 1/4 are transition probabilities. They say nothing about X0. But we need to start in a random state X0. This is because we need to understand how the transition from one random state to another works so that we can go from Xn to Xn+1.
X0 will be equal to either 1 or 2 with probability:
α1 = P(X0 = 1),
α2 = P(X0 = 2).
These two numbers are between 0 and 1 (inclusive) and add up to 1:
α1 + α2 = 1.
So, α = (α1 , α2 ) is a probability distribution. α is the probability
distribution of X0 and is called the initial (probability) distribution.
Once the distribution of X0 is given, the probability distribution of
every Xn is determined by the transition matrix:
Theorem 1.6. The probability distribution of Xn is the vector αP^n.
FINITE MARKOV CHAINS
So, in the example,
P(X2 = 2) = Σ_{i=1}^{2} Σ_{j=1}^{2} P(X0 = i) P(X1 = j | X0 = i) P(X2 = 2 | X1 = j)
= Σ_{i,j} α_i p(i, j) p(j, 2) = (αP^2)_2,
where the three factors are α_i, p(i, j), and p(j, 2).
This is the sum of the probabilities of all possible ways that you can
end up at state 2 at time 2.
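This double sum is just the matrix product αP^2 computed entry by entry. A minimal numeric sketch in Python (the initial distribution α = (1/2, 1/2) is an arbitrary choice for illustration; P is the transition matrix of the two-state example, with off-diagonal entries 1/3 and 1/4):

```python
from fractions import Fraction as F

# Transition matrix of the two-state example: p(1,2) = 1/3, p(2,1) = 1/4.
P = [[F(2, 3), F(1, 3)],
     [F(1, 4), F(3, 4)]]
alpha = [F(1, 2), F(1, 2)]   # assumed initial distribution (arbitrary choice)

def vec_mat(v, M):
    """Multiply a row vector by a matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

# Distribution of X2 is alpha P^2 (Theorem 1.6).
dist_X2 = vec_mat(vec_mat(alpha, P), P)

# Brute-force double sum over all paths i -> j -> 2.
brute = sum(alpha[i] * P[i][j] * P[j][1] for i in range(2) for j in range(2))

print(dist_X2[1], brute)   # the two agree exactly
assert dist_X2[1] == brute
```

Exact rational arithmetic makes the agreement an identity rather than a floating-point coincidence.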
To prove this in general, I used the following probability formula:
Lemma 1.7. Suppose that our sample space is a disjoint union
Ω = ⊔_i B_i
of events B_i. Then
(1.1)    P(A) = Σ_i P(B_i) P(A | B_i).
I drew this picture to illustrate this basic concept that you should
already know.
Proof of Theorem 1.6. By induction on n. If n = 0 then P^n = P^0 = I is the identity matrix. So,
αP^n = αP^0 = αI = α.
This is the distribution of Xn = X0 by definition. So, the theorem
holds for n = 0.
Suppose the theorem holds for n. Then, by Equation (1.1) with B_1 = {Xn = 1} and B_2 = {Xn = 2},
P(X_{n+1} = 1) = P(Xn = 1) P(X_{n+1} = 1 | Xn = 1) + P(Xn = 2) P(X_{n+1} = 1 | Xn = 2)
= (αP^n)_1 p(1, 1) + (αP^n)_2 p(2, 1) = [(αP^n)P]_1 = (αP^{n+1})_1.
And similarly, P(X_{n+1} = 2) = (αP^{n+1})_2. So, the theorem holds for
n + 1. So, it holds for all n ≥ 0. □
Corollary 1.8. If the initial distribution α = π is invariant, then Xn
has probability distribution π for all n.
Proof. The distribution of Xn is
αP^n = πP^n = πP P · · · P = π
since each time you multiply by P on the right you get πP = π back. □
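A quick sketch of the corollary in code, using the two-state example and its invariant distribution π = (3/7, 4/7) computed later in these notes: starting from α = π, the distribution of Xn never changes.

```python
from fractions import Fraction as F

P = [[F(2, 3), F(1, 3)],
     [F(1, 4), F(3, 4)]]
pi = [F(3, 7), F(4, 7)]    # invariant distribution of this chain

dist = pi[:]               # start with initial distribution alpha = pi
for n in range(10):
    # distribution of X_{n+1} is (distribution of X_n) times P
    dist = [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]
    assert dist == pi      # stays exactly pi at every step
```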
1.3.2. Perron-Frobenius Theorem. I stated this very important theorem without proof. However, the proof is outlined in Exercise 1.20 in
the book.
Theorem 1.9 (Perron-Frobenius). Suppose that A is a square matrix all of whose entries are positive real numbers. Then, A has a left eigenvector π, all of whose coordinates are positive real numbers. I.e.,
πA = λπ.
Furthermore,
(a) π is unique up to a scalar multiple. (If α is another left eigenvector of A with positive real entries then α = Cπ for some scalar C.)
(b) λ1 = λ is a positive real number.
(c) The eigenvalue λ1 is larger in absolute value than any other eigenvalue of A: |λ2|, |λ3|, · · · < λ1. (So, λ1 is called the maximal eigenvalue of A.)
(d)
lim_{n→∞} (1/λ1^n) A^n = the matrix each of whose rows is π,
assuming that π is a probability distribution, i.e., Σ π_i = 1.
I didn’t prove this. However, I tried to explain the last statement. When we raise P to the power n, it tends to look like multiplication by λ1^n. So, we should divide by λ1^n. If we know that the rows of the matrix
(1/λ1^n) P^n
are all the same, what is that common row?
 
If every row of (1/λ1^n) P^n equals the same row vector α, then multiplying on the left by π gives
π (1/λ1^n) P^n = (π1, π2, · · ·) · (the matrix with every row equal to α) = (Σ π_i) α.
But π (1/λ1) P = π. So,
π (1/λ1^n) P^n = π = (Σ π_i) α
and
α = (1/Σ π_i) π.
This theorem applies to Markov chains but with some conditions.
First, I stated without proof the fact:
Theorem 1.10. The maximal eigenvalue of P is 1. More precisely,
all eigenvalues of P have |λ| ≤ 1.
Proof. Suppose that P has an eigenvalue λ with absolute value greater than 1. Then, there is an eigenvector x so that xP = λx. Then xP^n = λ^n x diverges as n goes to infinity. But this is not possible since the entries of the matrix P^n are all between 0 and 1, P^n being the probability transition matrix from X0 to Xn by Theorem 1.6. □
I’ll explain this proof later. What I did explain in class is that 1 is always an eigenvalue of P. This follows from the fact that the rows of P add up to 1:
Σ_j p(i, j) = 1.
This implies that the column vector with all entries 1 is a right eigenvector of P with eigenvalue 1:
P (1, 1, · · · , 1)^T = (1, 1, · · · , 1)^T.
For example, if
P = [ 2/3 1/3 ; 1/4 3/4 ]
then
P (1, 1)^T = (2/3 + 1/3, 1/4 + 3/4)^T = (1, 1)^T.
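This check can be done mechanically. A minimal sketch with exact rational arithmetic for the example matrix above:

```python
from fractions import Fraction as F

P = [[F(2, 3), F(1, 3)],
     [F(1, 4), F(3, 4)]]

# Each row of a transition matrix sums to 1 ...
assert all(sum(row) == 1 for row in P)

# ... so the all-ones column vector is a right eigenvector with eigenvalue 1.
ones = [F(1), F(1)]
P_ones = [sum(P[i][j] * ones[j] for j in range(2)) for i in range(2)]
assert P_ones == ones
```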
The invariant distribution π is a left eigenvector with eigenvalue 1. The
unique invariant distribution is:
π = (3/7, 4/7).
This means:
πP = (3/7, 4/7) [ 2/3 1/3 ; 1/4 3/4 ] = (3/7, 4/7) = π.
You can find π using linear algebra or a computer. You can also use
intuition. In the Markov chain:
[Diagram: two states 1 and 2, with transition probability 1/3 from 1 to 2 and 1/4 from 2 to 1.]
we can use the Law of large numbers which says that, if there are
a large number of people moving randomly, then the proportion who
move will be approximately equal to the probability. So, if there are a
large number of people in states 1 and 2 then one third of those at 1
will move to 2 and one fourth of those in 2 will move to 1. If you want
the distribution to be stable, the numbers should have a 3:4 ratio. If
there are 3 guys at point 1 and 1/3 of them move, then one guy moves
to 2. If there are 4 guys at 2 and 1/4 of them move then one guy moves
from 2 to 1 and the distribution is the same. To make it a probability
distribution, the vector (3, 4) needs to be divided by 7.
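The 3:4 counting argument is easy to verify with exact arithmetic. A minimal sketch (the "divide by 7" step is the final normalization):

```python
from fractions import Fraction as F

P = [[F(2, 3), F(1, 3)],   # 1/3 chance of moving 1 -> 2
     [F(1, 4), F(3, 4)]]   # 1/4 chance of moving 2 -> 1

# Stability: (people at 1) * 1/3 = (people at 2) * 1/4, so counts are 3 : 4.
counts = [F(3), F(4)]
pi = [c / sum(counts) for c in counts]   # normalize: divide by 7
print(pi)   # [Fraction(3, 7), Fraction(4, 7)]

# Check invariance: pi P = pi, exactly.
piP = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
assert piP == pi
```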
Theorem 1.11. If the Markov chain is aperiodic and irreducible then
it satisfies the conclusions of the Perron-Frobenius theorem.
Proof. These conditions imply that A = P^n has all positive entries for some finite n. Then, the Perron-Frobenius eigenvector for A is the invariant distribution for P. □
The Perron-Frobenius theorem tells us that the distribution of Xn will reach an equilibrium (the invariant distribution) for large n (assuming aperiodicity). The next question is: How long does it take?
1.4. Transient classes. I asked the question:
How long does it take to escape from a transient class? I started
with a really simple example:
[Diagram: an arrow labeled p from state 1 to state 2.]
This is a Markov chain with one transient class {1} and one absorbing
class {2}. The question is: How long can you stay in the transient
class? I was glad to see that students know basic probability:
P(Xn = 1 | X0 = 1) = (1 − p)^n.
What happens when n goes to infinity?
lim_{n→∞} (1 − p)^n = 0 if p > 0.
Proof. And you guys helped me with this proof:
L = lim_{n→∞} (1 − p)^n
ln L = lim_{n→∞} n ln(1 − p) = −∞
since n → ∞ and ln(1 − p) < 0 (because 1 − p < 1). So, L = 0. □
So, the probability of remaining indefinitely in state 1 is zero. In
other words, you will eventually escape the transient class with probability one (at least in this example). For future reference I recorded
this conclusion as follows.
Theorem 1.12. If the probability of success is p > 0 and if you try
infinitely many times, then you will eventually succeed with probability
one.
But how long does it take? Let
T := smallest n so that Xn = 2.
Then
P(T = n | X0 = 1) = p(1 − p)^{n−1}.
For example, if T = 3 then we have:
X0 = 1 → X1 = 1 → X2 = 1 → X3 = 2,
where the first two steps each have probability 1 − p and the last step has probability p. So,
P(T = 3 | X0 = 1) = p(1 − p)^2.
The numbers p(1 − p)^{n−1}, for n = 1, 2, 3, · · ·, add up to 1 and give what is called the geometric distribution on the positive integers.
From this formula we can calculate the conditional expected value of T:
E(T | X0 = 1) = Σ_{n=0}^{∞} n p(1 − p)^{n−1}
and, yes, the n = 0 term is zero. This is easy to calculate using a little calculus. First we start with the geometric series
g(x) := Σ_{n=0}^{∞} x^n = 1 + x + x^2 + x^3 + · · · = 1/(1 − x).
Then differentiate:
g′(x) = Σ_{n=0}^{∞} n x^{n−1} = 1/(1 − x)^2.
Applying this formula to the expected value problem with x = 1 − p, we get:
E(T | X0 = 1) = p Σ_n n(1 − p)^{n−1} = p · 1/(1 − (1 − p))^2 = p · 1/p^2 = 1/p.
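One can sanity-check this series numerically: the partial sums of p Σ n(1 − p)^{n−1} approach 1/p. A sketch with the test value p = 1/5:

```python
p = 1 / 5          # arbitrary test value for the escape probability

# Partial sum of E(T) = sum over n >= 1 of n * p * (1 - p)^(n - 1).
# The tail beyond n = 1000 is astronomically small since 0.8^1000 ~ 1e-97.
expected_T = sum(n * p * (1 - p) ** (n - 1) for n in range(1, 1000))
print(expected_T)  # very close to 1/p = 5

assert abs(expected_T - 1 / p) < 1e-9
```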
This was actually intuitively obvious from the beginning. For example:
[Diagram: an arrow labeled 1/5 from state 1 to state 2.]
When p = 1/5 you expect it to happen in 5 trials. So, E(T) = 5 = 1/p.
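The same answer comes out of simulating the chain directly (a sketch; the sample size and random seed are arbitrary choices):

```python
import random

random.seed(0)                   # arbitrary seed, for reproducibility
p = 1 / 5                        # escape probability per step
trials = 100_000

def escape_time(p):
    """Steps until the chain first moves from state 1 to state 2."""
    n = 1
    while random.random() >= p:  # stay in state 1 with probability 1 - p
        n += 1
    return n

mean_T = sum(escape_time(p) for _ in range(trials)) / trials
print(mean_T)                    # close to 1/p = 5
```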