Machine Learning: Probabilistic

13.0  Stochastic and Dynamic Models of Learning
13.1  Hidden Markov Models (HMMs)
13.2  Dynamic Bayesian Networks and Learning
13.3  Stochastic Extensions to Reinforcement Learning
13.4  Epilogue and References
13.5  Exercises

George F. Luger
ARTIFICIAL INTELLIGENCE 6th edition
Structures and Strategies for Complex Problem Solving
Luger: Artificial Intelligence, 6th edition. © Pearson Education Limited, 2009
DEFINITION
HIDDEN MARKOV MODEL
A graphical model is called a hidden Markov model (HMM) if it is a
Markov model whose states are not directly observable but are hidden
by a further stochastic system interpreting their output. More formally,
given a set of states S = {s1, s2, ..., sn} and a set of state transition
probabilities A = {a11, a12, ..., a1n, a21, a22, ..., ann}, there is a set of
observation likelihoods, O = {pi(ot)}, each expressing the probability of
an observation ot (at time t) being generated by a state st.
[Figure 13.1 diagram: two states, S1 and S2, with self-transitions a11 and a22 and cross-transitions a12 = 1 - a11 and a21 = 1 - a22. State S1 emits with p(H) = b1, p(T) = 1 - b1; state S2 emits with p(H) = b2, p(T) = 1 - b2.]

Figure 13.1 A state transition diagram of a hidden Markov model of two
states designed for the coin-flipping problem. The aij are
determined by the elements of the 2 x 2 transition matrix, A.
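The two-coin model of Figure 13.1 can be simulated directly: the hidden state (which coin is flipped) moves according to the aij, and each observed flip is drawn from the current coin's bias. A minimal sketch, where the concrete values of a11, a22, b1, and b2 are illustrative assumptions, not taken from the text:

```python
import random

# Two-state coin HMM: the hidden state is which coin is being flipped;
# only the H/T outcomes are observable.
a11, a22 = 0.7, 0.6              # self-transitions; a12 = 1 - a11, a21 = 1 - a22
bias = {"S1": 0.9, "S2": 0.3}    # p(H) = b1 for coin S1, p(H) = b2 for coin S2

def sample_flips(length, seed=0):
    rng = random.Random(seed)
    state = rng.choice(["S1", "S2"])
    flips = []
    for _ in range(length):
        # emit an observation from the current (hidden) coin
        flips.append("H" if rng.random() < bias[state] else "T")
        # then transition according to the aij
        stay = a11 if state == "S1" else a22
        if rng.random() >= stay:
            state = "S2" if state == "S1" else "S1"
    return flips

flips = sample_flips(10)
```

An observer sees only the H/T string; inferring which coin produced each flip is exactly the decoding problem the Viterbi algorithm solves later in the chapter.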
[Figure 13.2 diagram: three states, S1, S2, and S3, fully connected by transitions aij (a11, a12, a13, a21, a22, a23, a31, a32, a33). Each state Si emits with p(H) = bi and p(T) = 1 - bi.]

Figure 13.2 The state transition diagram for a three-state hidden Markov
model of coin flipping. Each coin/state, Si, has its bias, bi.
Figure 13.3
a. The hidden, S, and observable, O, states of the AR-HMM where
p(Ot | St, Ot-1).
b. The values of the hidden St of the example AR-HMM: safe, unsafe,
and faulty.
Figure 13.4 A selection of the real-time data across multiple time
periods, with one time slice expanded, for the AR-HMM.
Figure 13.5 The time-series data of Figure 13.4 processed by a fast
Fourier transform into the frequency domain. This was
the data submitted to the AR-HMM for each time period.
Figure 13.6 An autoregressive factorial HMM, where the observable
state Ot at time t is dependent on multiple subprocesses,
Sit, and on Ot-1.
[Figure 13.7 diagram: a probabilistic finite state machine running from Start (#) to End (#). The words and their prior probabilities are neat (.00013), need (.00056), new (.001), and knee (.000024); each word is spelled out by phone arcs (n, iy, t, d, uw) whose transition probabilities include .48/.52 for neat, .89/.11 for need, and .36/.64 for new.]

Figure 13.7 A PFSM representing a set of phonetically related English
words. The probability of each word occurring is below that
word. Adapted from Jurafsky and Martin (2008).
Start = 1.0

Observed input:          #      n                        iy                       #
neat (.00013)  2 paths   1.0    1.0 x .00013 = .00013    .00013 x 1.0 = .00013    .00013 x .52 = .000067
need (.00056)  2 paths   1.0    1.0 x .00056 = .00056    .00056 x 1.0 = .00056    .00056 x .11 = .000062
new  (.001)    2 paths   1.0    1.0 x .001 = .001        .001 x .36 = .00036      .00036 x 1.0 = .00036
knee (.000024) 1 path    1.0    1.0 x .000024 = .000024  .000024 x 1.0 = .000024  .000024 x 1.0 = .000024

Total best at end: .00036

Figure 13.8 A trace of the Viterbi algorithm on several of the paths of Figure 13.7. Rows
report the maximum value for Viterbi on each word for each input value (top row).
Adapted from Jurafsky and Martin (2008).
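The arithmetic of the Figure 13.8 trace is just repeated multiplication: each word's final Viterbi value is the product of the start value, the word's prior probability, and the phone-transition probabilities along its path. Reproducing the four rows:

```python
# Path products from the Viterbi trace over the input # n iy #:
# start value 1.0, times the word prior, times each phone transition.
scores = {
    "neat": 1.0 * 0.00013 * 1.0 * 0.52,
    "need": 1.0 * 0.00056 * 1.0 * 0.11,
    "new":  1.0 * 0.001 * 0.36 * 1.0,
    "knee": 1.0 * 0.000024 * 1.0 * 1.0,
}
best = max(scores, key=scores.get)   # the word the decoder would report
```

The winner is "new" with a score of .00036, matching the "Total best" row of the trace.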
function Viterbi(Observations of length T, Probabilistic FSM)
begin
   number := number of states in the FSM;
   create probability matrix viterbi[R = number + 2, C = T + 2];
   viterbi[0, 0] := 1.0;
   for each time step (observation) t from 0 to T do
      for each state si from i = 0 to number do
         for each transition from si to sj in the Probabilistic FSM do
         begin
            new-count := viterbi[si, t] x path[si, sj] x p(sj | si);
            if ((viterbi[sj, t + 1] = 0) or (new-count > viterbi[sj, t + 1]))
            then begin
               viterbi[sj, t + 1] := new-count;
               append back-pointer [sj, t + 1] to back-pointer list
            end
         end;
   return viterbi[R, C] and the back-pointer list
end.
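The pseudocode can be made concrete. The sketch below follows the same dynamic-programming recurrence, using dictionaries in place of the (number + 2) x (T + 2) matrix and recovering the best path from the back-pointers; the two-state H/T model used to exercise it is an illustrative assumption, not an example from the text:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability of the best path, most likely state sequence)."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        V.append({})
        back.append({})
        for s in states:
            # keep only the predecessor that maximizes the path probability
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s][observations[t]]
            back[t][s] = prev
    # best final state, then follow the back-pointers to the front
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return V[-1][last], path

states = ("s1", "s2")
start = {"s1": 0.5, "s2": 0.5}
trans = {"s1": {"s1": 0.9, "s2": 0.1}, "s2": {"s1": 0.2, "s2": 0.8}}
emit = {"s1": {"H": 0.8, "T": 0.2}, "s2": {"H": 0.3, "T": 0.7}}

prob, path = viterbi(["H", "H", "T", "T"], states, start, trans, emit)
```

Because only the best predecessor of each state is kept at each step, the cost is linear in the observation length, unlike enumerating every path.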
Figure 13.9 A DBN example of two time slices. The set Q of random
variables is hidden, the set O observed; t indicates time.
Figure 13.10 A Bayesian belief net for the burglar alarm, earthquake,
burglary example.
Figure 13.11 A Markov random field reflecting the potential functions of the
random variables in the BBN of Figure 13.10, together with the
two observations about the system.
Figure 13.12 A learnable node L is added to the Markov random field of Figure
13.11. The Markov random field iterates across three time periods.
For simplicity, the EM iteration is only indicated at time 1.
DEFINITION
A MARKOV DECISION PROCESS, or MDP
A Markov Decision Process is a tuple <S, A, P, R> where:
S is a set of states, and
A is a set of actions.
pa(st, st+1) = p(st+1 | st, at = a) is the probability that if the agent executes
action a ∈ A from state st at time t, it results in state st+1 at time t+1. Since
the probability, pa ∈ P, is defined over the entire state space for each action, it is
often represented with a transition matrix.
R(s) is the reward received by the agent when in state s.
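The tuple <S, A, P, R> maps directly onto simple data structures: each (state, action) pair indexes a row of the transition matrix, i.e., a probability distribution over successor states, and the expected reward of an action is a weighted sum over that row. A minimal sketch; the two-state model and all its numbers are invented for illustration:

```python
# An MDP <S, A, P, R> encoded as Python dicts.
S = ["s0", "s1"]
A = ["a"]
# P[(s, a)] is the row of the transition matrix for action a in state s:
# a distribution over successor states, p(s' | s, a).
P = {("s0", "a"): {"s0": 0.2, "s1": 0.8},
     ("s1", "a"): {"s0": 0.5, "s1": 0.5}}
R = {"s0": 0.0, "s1": 1.0}   # R(s): reward received when in state s

def expected_next_reward(s, a):
    """Expectation of R(s') under the transition distribution p(s' | s, a)."""
    return sum(p * R[s2] for s2, p in P[(s, a)].items())
```

Each row of P must sum to 1, which is exactly the stochastic-matrix property of the transition matrix mentioned in the definition.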
DEFINITION
A PARTIALLY OBSERVABLE MARKOV DECISION PROCESS, or POMDP
A Partially Observable Markov Decision Process is a tuple <S, A, O, P, R> where:
S is a set of states, and
A is a set of actions.
O is the set of observations denoting what the agent can see about its world.
Since the agent cannot directly observe its current state, the observations are
probabilistically related to the underlying actual state of the world.
pa(st, o, st+1) = p(st+1, ot = o | st, at = a) is the probability that when the agent
executes action a from state st at time t, it results in observation o and
underlying state st+1 at time t+1.
R(st, a, st+1) is the reward received by the agent when it executes action a in
state st and transitions to state st+1.
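Because a POMDP agent cannot observe its state directly, it maintains a belief state b(s), a distribution over S, and revises it after each action/observation pair using the transition and observation probabilities: b'(s') is proportional to p(o | s') times the total probability of reaching s'. A sketch of that update; the two-state model, the "go" action, and the "beep"/"quiet" observations are illustrative assumptions:

```python
def update_belief(b, a, o, trans, obs):
    """b'(s') is proportional to p(o | s') * sum_s p(s' | s, a) * b(s)."""
    new_b = {}
    for s2 in b:
        new_b[s2] = obs[s2][o] * sum(trans[s][a][s2] * b[s] for s in b)
    z = sum(new_b.values())          # normalizing constant, p(o | b, a)
    return {s: v / z for s, v in new_b.items()}

# Illustrative model: trans[s][a] is p(s' | s, a); obs[s'] is p(o | s').
trans = {"s0": {"go": {"s0": 0.3, "s1": 0.7}},
         "s1": {"go": {"s0": 0.4, "s1": 0.6}}}
obs = {"s0": {"beep": 0.9, "quiet": 0.1},
       "s1": {"beep": 0.2, "quiet": 0.8}}

belief = update_belief({"s0": 0.5, "s1": 0.5}, "go", "beep", trans, obs)
```

The belief state is a sufficient statistic for the agent's history, so a POMDP can be treated as an MDP whose (continuous) states are belief distributions.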
st     st+1    at         pa(st, st+1)    Ra(st, st+1)
high   high    search     a               R_search
high   low     search     1 - a           R_search
low    high    search     1 - b           -3
low    low     search     b               R_search
high   high    wait       1               R_wait
high   low     wait       0               R_wait
low    high    wait       0               R_wait
low    low     wait       1               R_wait
low    high    recharge   1               0
low    low     recharge   0               0

Table 13.1. Transition probabilities and expected rewards for the finite MDP of
the recycling robot example. The table contains all possible combinations of the
current state, st, the next state, st+1, and the actions and rewards possible
from the current state, at.
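Table 13.1 is a complete model, so the optimal value of each battery state can be computed by value iteration, repeatedly applying V(s) = max over a of the sum of pa(st, st+1)(Ra(st, st+1) + gamma V(st+1)). A sketch over the table; the numeric values chosen for a (alpha), b (beta), R_search, R_wait, and the discount gamma are illustrative assumptions:

```python
alpha, beta = 0.9, 0.8           # the a and b of Table 13.1
R_search, R_wait = 5.0, 1.0      # assumed reward values
gamma = 0.9                      # discount factor

# (state, action) -> [(next_state, probability, reward)], one entry
# per nonzero-probability row of Table 13.1.
model = {
    ("high", "search"):   [("high", alpha, R_search), ("low", 1 - alpha, R_search)],
    ("low",  "search"):   [("high", 1 - beta, -3.0), ("low", beta, R_search)],
    ("high", "wait"):     [("high", 1.0, R_wait)],
    ("low",  "wait"):     [("low", 1.0, R_wait)],
    ("low",  "recharge"): [("high", 1.0, 0.0)],
}

V = {"high": 0.0, "low": 0.0}
for _ in range(200):
    # Bellman optimality backup: best action value for each state
    V = {s: max(sum(p * (r + gamma * V[s2]) for s2, p, r in outcomes)
                for (s0, a), outcomes in model.items() if s0 == s)
         for s in V}
```

With these numbers the high-battery state is worth more than the low one, as expected: from high the robot can search at full reward without risking the -3 rescue penalty.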
Figure 13.13. The transition graph for the recycling robot. The state nodes are
the large circles and the action nodes are the small black circles.