Machine Learning: Probabilistic

13.0  Stochastic and Dynamic Models of Learning
13.1  Hidden Markov Models (HMMs)
13.2  Dynamic Bayesian Networks and Learning
13.3  Stochastic Extensions to Reinforcement Learning
13.4  Epilogue and References
13.5  Exercises

George F. Luger
ARTIFICIAL INTELLIGENCE 6th edition
Structures and Strategies for Complex Problem Solving
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
DEFINITION
HIDDEN MARKOV MODEL
A graphical model is called a hidden Markov model (HMM) if it is a
Markov model whose states are not directly observable but are hidden
by a further stochastic system interpreting their output. More formally,
given a set of states S = {s1, s2, ..., sn} and a set of state transition
probabilities A = {a11, a12, ..., a1n, a21, a22, ..., ann}, there is a set of
observation likelihoods, O = {pi(ot)}, each expressing the probability of
an observation ot (at time t) being generated by a state st.
[Figure 13.1 diagram: two states, S1 and S2, with self-transitions a11 and a22 and cross-transitions a12 = 1 - a11 and a21 = 1 - a22. State S1 emits with p(H) = b1, p(T) = 1 - b1; state S2 emits with p(H) = b2, p(T) = 1 - b2.]

Figure 13.1 A state transition diagram of a hidden Markov model of two
states designed for the coin-flipping problem. The aij are
determined by the elements of the 2 x 2 transition matrix, A.
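The two-coin model of Figure 13.1 can be simulated directly: the hidden state (which coin is flipped) moves according to the aij, and each observed flip is drawn from the current coin's bias. A minimal sketch, where the concrete values of a11, a22, b1, and b2 are illustrative assumptions, not taken from the text:

```python
import random

# Two-state coin HMM: the hidden state is which coin is being flipped;
# only the H/T outcomes are observable.
a11, a22 = 0.7, 0.6              # self-transitions; a12 = 1 - a11, a21 = 1 - a22
bias = {"S1": 0.9, "S2": 0.3}    # p(H) = b1 for coin S1, p(H) = b2 for coin S2

def sample_flips(length, seed=0):
    rng = random.Random(seed)
    state = rng.choice(["S1", "S2"])
    flips = []
    for _ in range(length):
        # emit an observation from the current (hidden) coin
        flips.append("H" if rng.random() < bias[state] else "T")
        # then transition according to the aij
        stay = a11 if state == "S1" else a22
        if rng.random() >= stay:
            state = "S2" if state == "S1" else "S1"
    return flips

flips = sample_flips(10)
```

An observer sees only the H/T string; inferring which coin produced each flip is exactly the decoding problem the Viterbi algorithm solves later in the chapter.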
[Figure 13.2 diagram: three states, S1, S2, and S3, fully connected by transitions aij (a11, a12, a13, a21, a22, a23, a31, a32, a33). Each state Si emits with p(H) = bi and p(T) = 1 - bi.]

Figure 13.2 The state transition diagram for a three-state hidden Markov
model of coin flipping. Each coin/state, Si, has its bias, bi.
Figure 13.3
a. The hidden, S, and observable, O, states of the AR-HMM where
p(Ot | St, Ot-1).
b. The values of the hidden St of the example AR-HMM: safe, unsafe,
and faulty.
Figure 13.4 A selection of the real-time data across multiple time
periods, with one time slice expanded, for the AR-HMM.
Figure 13.5 The time-series data of Figure 13.4 processed by a fast
Fourier transform into the frequency domain. This was
the data submitted to the AR-HMM for each time period.
Figure 13.6 An autoregressive factorial HMM, where the observable
state Ot at time t is dependent on multiple subprocesses,
Sit, and on Ot-1.
[Figure 13.7 diagram: a probabilistic finite state machine running from Start (#) to End (#). The words and their prior probabilities are neat (.00013), need (.00056), new (.001), and knee (.000024); each word is spelled out by phone arcs (n, iy, t, d, uw) whose transition probabilities include .48/.52 for neat, .89/.11 for need, and .36/.64 for new.]

Figure 13.7 A PFSM representing a set of phonetically related English
words. The probability of each word occurring is below that
word. Adapted from Jurafsky and Martin (2008).
Start = 1.0

Observed input:          #      n                        iy                       #
neat (.00013)  2 paths   1.0    1.0 x .00013 = .00013    .00013 x 1.0 = .00013    .00013 x .52 = .000067
need (.00056)  2 paths   1.0    1.0 x .00056 = .00056    .00056 x 1.0 = .00056    .00056 x .11 = .000062
new  (.001)    2 paths   1.0    1.0 x .001 = .001        .001 x .36 = .00036      .00036 x 1.0 = .00036
knee (.000024) 1 path    1.0    1.0 x .000024 = .000024  .000024 x 1.0 = .000024  .000024 x 1.0 = .000024

Total best at end: .00036

Figure 13.8 A trace of the Viterbi algorithm on several of the paths of Figure 13.7. Rows
report the maximum value for Viterbi on each word for each input value (top row).
Adapted from Jurafsky and Martin (2008).
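The arithmetic of the Figure 13.8 trace is just repeated multiplication: each word's final Viterbi value is the product of the start value, the word's prior probability, and the phone-transition probabilities along its path. Reproducing the four rows:

```python
# Path products from the Viterbi trace over the input # n iy #:
# start value 1.0, times the word prior, times each phone transition.
scores = {
    "neat": 1.0 * 0.00013 * 1.0 * 0.52,
    "need": 1.0 * 0.00056 * 1.0 * 0.11,
    "new":  1.0 * 0.001 * 0.36 * 1.0,
    "knee": 1.0 * 0.000024 * 1.0 * 1.0,
}
best = max(scores, key=scores.get)   # the word the decoder would report
```

The winner is "new" with a score of .00036, matching the "Total best" row of the trace.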
function Viterbi(Observations of length T, Probabilistic FSM)
begin
   number := number of states in the FSM;
   create probability matrix viterbi[R = number + 2, C = T + 2];
   viterbi[0, 0] := 1.0;
   for each time step (observation) t from 0 to T do
      for each state si from i = 0 to number do
         for each transition from si to sj in the Probabilistic FSM do
         begin
            new-count := viterbi[si, t] x path[si, sj] x p(sj | si);
            if ((viterbi[sj, t + 1] = 0) or (new-count > viterbi[sj, t + 1]))
            then begin
               viterbi[sj, t + 1] := new-count;
               append back-pointer [sj, t + 1] to back-pointer list
            end
         end;
   return viterbi[R, C] and the back-pointer list
end.
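The pseudocode can be made concrete. The sketch below follows the same dynamic-programming recurrence, using dictionaries in place of the (number + 2) x (T + 2) matrix and recovering the best path from the back-pointers; the two-state H/T model used to exercise it is an illustrative assumption, not an example from the text:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability of the best path, most likely state sequence)."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            # keep only the predecessor that maximizes the path probability
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # best final state, then follow the back-pointers to the front
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path

states = ("s1", "s2")
start = {"s1": 0.5, "s2": 0.5}
trans = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.2, "s2": 0.8}}
emit = {"s1": {"H": 0.8, "T": 0.2}, "s2": {"H": 0.3, "T": 0.7}}

prob, path = viterbi(["H", "H", "T", "T"], states, start, trans, emit)
```

Because only the best predecessor of each state is kept at each step, the cost is linear in the observation length, unlike enumerating every path.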
Figure 13.9 A DBN example of two time slices. The set Q of random
variables is hidden, the set O observed; t indicates time.
Figure 13.10 A Bayesian belief net for the burglar alarm, earthquake,
burglary example.
Figure 13.11 A Markov random field reflecting the potential functions of the
random variables in the BBN of Figure 13.10, together with the
two observations about the system.
Figure 13.12 A learnable node L is added to the Markov random field of Figure
13.11. The Markov random field iterates across three time periods.
For simplicity, the EM iteration is only indicated at time 1.
DEFINITION
A MARKOV DECISION PROCESS, or MDP
A Markov Decision Process is a tuple <S, A, P, R> where:
S is a set of states, and
A is a set of actions.
pa(st, st+1) = p(st+1 | st, at = a) is the probability that if the agent executes
action a ∈ A from state st at time t, it results in state st+1 at time t+1. Since
the probability, pa ∈ P, is defined over the entire state space for each action, it is
often represented with a transition matrix.
R(s) is the reward received by the agent when in state s.
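The tuple <S, A, P, R> maps directly onto simple data structures: each (state, action) pair indexes a row of the transition matrix, i.e., a probability distribution over successor states, and the expected reward of an action is a weighted sum over that row. A minimal sketch; the two-state model and all its numbers are invented for illustration:

```python
# An MDP <S, A, P, R> encoded as Python dicts.
S = ["s0", "s1"]
A = ["a"]
# P[(s, a)] is the row of the transition matrix for action a in state s:
# a distribution over successor states, p(s' | s, a).
P = {("s0", "a"): {"s0": 0.2, "s1": 0.8},
     ("s1", "a"): {"s0": 0.5, "s1": 0.5}}
R = {"s0": 0.0, "s1": 1.0}   # R(s): reward received when in state s

def expected_next_reward(s, a):
    """Expectation of R(s') under the transition distribution p(s' | s, a)."""
    return sum(p * R[s2] for s2, p in P[(s, a)].items())
```

Each row of P must sum to 1, which is exactly the stochastic-matrix property of the transition matrix mentioned in the definition.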
DEFINITION
A PARTIALLY OBSERVABLE MARKOV DECISION PROCESS, or POMDP
A Partially Observable Markov Decision Process is a tuple <S, A, O, P, R> where:
S is a set of states, and
A is a set of actions.
O is the set of observations denoting what the agent can see about its world.
Since the agent cannot directly observe its current state, the observations are
probabilistically related to the underlying actual state of the world.
pa(st, o, st+1) = p(st+1, ot = o | st, at = a) is the probability that when the agent
executes action a from state st at time t, it results in observation o and
underlying state st+1 at time t+1.
R(st, a, st+1) is the reward received by the agent when it executes action a in
state st and transitions to state st+1.
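Because a POMDP agent cannot observe its state directly, it maintains a belief state b(s), a distribution over S, and revises it after each action/observation pair using the transition and observation probabilities: b'(s') is proportional to p(o | s') times the total probability of reaching s'. A sketch of that update; the two-state model, the "go" action, and the "beep"/"quiet" observations are illustrative assumptions:

```python
def update_belief(b, a, o, trans, obs):
    """b'(s') is proportional to p(o | s') * sum_s p(s' | s, a) * b(s)."""
    new_b = {}
    for s2 in b:
        new_b[s2] = obs[s2][o] * sum(trans[s][a][s2] * b[s] for s in b)
    z = sum(new_b.values())          # normalizing constant, p(o | b, a)
    return {s: v / z for s, v in new_b.items()}

# Illustrative model: trans[s][a] is p(s' | s, a); obs[s'] is p(o | s').
trans = {"s0": {"go": {"s0": 0.3, "s1": 0.7}},
         "s1": {"go": {"s0": 0.4, "s1": 0.6}}}
obs = {"s0": {"beep": 0.9, "quiet": 0.1},
       "s1": {"beep": 0.2, "quiet": 0.8}}

belief = update_belief({"s0": 0.5, "s1": 0.5}, "go", "beep", trans, obs)
```

The belief state is a sufficient statistic for the agent's history, so a POMDP can be treated as an MDP whose (continuous) states are belief distributions.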
st     st+1    at         pa(st, st+1)    Ra(st, st+1)
high   high    search     a               R_search
high   low     search     1 - a           R_search
low    high    search     1 - b           -3
low    low     search     b               R_search
high   high    wait       1               R_wait
high   low     wait       0               R_wait
low    high    wait       0               R_wait
low    low     wait       1               R_wait
low    high    recharge   1               0
low    low     recharge   0               0

Table 13.1. Transition probabilities and expected rewards for the finite MDP of
the recycling robot example. The table contains all possible combinations of the
current state, st, the next state, st+1, and the actions and rewards possible
from the current state, at.
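Table 13.1 is a complete model, so the optimal value of each battery state can be computed by value iteration, repeatedly applying V(s) = max over a of the sum of pa(st, st+1)(Ra(st, st+1) + gamma V(st+1)). A sketch over the table; the numeric values chosen for a (alpha), b (beta), R_search, R_wait, and the discount gamma are illustrative assumptions:

```python
alpha, beta = 0.9, 0.8           # the a and b of Table 13.1
R_search, R_wait = 5.0, 1.0      # assumed reward values
gamma = 0.9                      # discount factor

# (state, action) -> [(next_state, probability, reward)], one entry
# per nonzero-probability row of Table 13.1.
model = {
    ("high", "search"):   [("high", alpha, R_search), ("low", 1 - alpha, R_search)],
    ("low",  "search"):   [("high", 1 - beta, -3.0), ("low", beta, R_search)],
    ("high", "wait"):     [("high", 1.0, R_wait)],
    ("low",  "wait"):     [("low", 1.0, R_wait)],
    ("low",  "recharge"): [("high", 1.0, 0.0)],
}

V = {"high": 0.0, "low": 0.0}
for _ in range(200):
    # Bellman optimality backup: best action value for each state
    V = {s: max(sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for (s0, a), outcomes in model.items() if s0 == s)
         for s in V}
```

With these numbers the high-battery state is worth more than the low one, as expected: from high the robot can search at full reward without risking the -3 rescue penalty.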
Figure 13.13. The transition graph for the recycling robot. The state nodes are
the large circles and the action nodes are the small black circles.