Recitation session: Bayesian networks, HMMs, Kalman filters, DBNs (Ch. 15, AIMA 2nd Ed.)

1 Hidden Markov models and Dynamic Bayesian networks

Assume we want to build a program that infers the state of confusion in a CS440 lecture. We have a video camera that observes the classroom. It is a rather fancy camera: it can detect individual students' faces. More than that, it can, with some limited accuracy, determine from one image of a student's face whether the student is sleeping or not.

We know, from studies done over many years, that the probability of a student sleeping in a lecture, when it is confusing, is 70%. If the lecture is not confusing, the student may still be tired from a night of partying, or may simply be bored, and hence is sleeping with probability 20%. We also know that, given whether the lecture is confusing or not, the fact that one student is sleeping does not affect the other students around her.

Let us assume that our smart camera reports, for each and every student in the class, whether that student is sleeping or not, with an accuracy of 80%. Let us also assume that it does so once every minute. We also know that if the class was confused at some time t, there is a 90% chance that it will be confused a minute later. Similarly, if it is not confused at time t, it will not be confused at time t+1 with the same probability of 90%. Initially, there is a 10% chance that the class is confused.

1. Formulate the problem by modeling it as a dynamic Bayesian network. Define variables, structure, and parameters.

2. Can this problem be modeled as an HMM? If so, convert it into an HMM. Compute all necessary parameters.

3. Let there be only 2 students in the class. The algorithm implemented in the camera for detection of sleeping has been improved so that it does not make any errors, i.e., the camera can accurately tell whether a student is sleeping or not.
The camera records over two minutes and returns the following information about the two students: time 1: (asleep, awake); time 2: (asleep, asleep).

(a) Predict the state of confusion at minute t=2 given the information about the two students at minute t=1.
(b) Now assume we have the full information (both at t=1 and t=2). What is the probability of the state of confusion at t=2?
(c) Looking back, what should have been the probability of confusion at t=1 if we have this complete information?
(d) What is the most likely explanation of the class's state of confusion at t=1 and t=2, with all of the camera record in hand?

1.1 Solution

1. We first need to name all the variables in the problem. Let $C_t$ denote the random variable that describes the state of confusion of the class at time $t$. We assume $C_t \in \{\text{confused}, \text{not confused}\}$. Next, let $S_{i,t}$ be the random variable corresponding to what the camera detects as the state of the $i$-th student in the class at time $t$, i.e., $S_{i,t} \in \{\text{asleep}, \text{awake}\}$. Of course, the camera cannot detect the true "sleeping" state of the student. Let us call this true state $T_{i,t}$, with $T_{i,t} \in \{\text{asleep}, \text{awake}\}$. (Note: we could also say that $T_{i,t} \in \{\text{truly asleep}, \text{truly awake}\}$, but it should be clear from the context what the states' real meaning is.)

[Figure 1: Two slices of the CS440 dynamic Bayesian network. Nodes: $C_{t-1}, C_t$; $T_{1,t-1}, T_{2,t-1}, T_{1,t}, T_{2,t}$; $S_{1,t-1}, S_{2,t-1}, S_{1,t}, S_{2,t}$.]

Based on the description of the problem we arrive at the network shown in Figure 1. Actually, we show only two slices, at times $t-1$ and $t$. Nodes corresponding to each student are independent of each other, given the state of the class confusion, as defined in the problem. Furthermore, $S$-variables only depend on the corresponding $T$-variables. Finally, $C$-nodes are linked across time. For convenience, we show the evidence nodes as shaded.

After defining the variables and the network structure, we are left with the task of defining network parameters. From the problem definition, it is easy to see that:

\[ P(C_1 = \text{confused}) = 0.1, \]
\[ P(C_t = \text{confused} \mid C_{t-1} = \text{confused}) = 0.9, \quad P(C_t = \text{not confused} \mid C_{t-1} = \text{not confused}) = 0.9, \]
\[ P(T_{i,t} = \text{asleep} \mid C_t = \text{confused}) = 0.7, \quad P(T_{i,t} = \text{asleep} \mid C_t = \text{not confused}) = 0.2, \]
\[ P(S_{i,t} = \text{asleep} \mid T_{i,t} = \text{asleep}) = 0.8, \quad P(S_{i,t} = \text{awake} \mid T_{i,t} = \text{awake}) = 0.8. \]

2. One way to model this problem using a hidden Markov model is to group all hidden variables at every time $t$, that is, $C_t$ and all $T_{i,t}$, into one compound hidden variable at that time. Let us call this compound hidden variable $h_t$. Then

\[ h_t = (C_t, T_{1,t}, \ldots, T_{N,t}), \]

where $N$ is the total number of students in the class. The compound variable will have $2^{N+1}$ different states. For instance, when there are only two students in the class, $N = 2$, the number of states of the compound hidden variable is $2^3 = 8$. At the same time, we need to group all the evidence variables $S_{i,t}$ into a compound evidence variable

\[ E_t = (S_{1,t}, \ldots, S_{N,t}). \]

The compound evidence variable has $2^N$ states. The new model is now an HMM, depicted in Figure 2.

[Figure 2: Two slices of the CS440 dynamic Bayesian network, now represented as an HMM with compound states. Nodes: $h_{t-1}, h_t$; $E_{t-1}, E_t$.]

We also need to define the parameters of this new model. Let us first consider the transition probability between compound hidden variables $h_{t-1}$ and $h_t$. Using the definition of the compound variables and by observing the dependencies in the original network we can write

\[ P(h_t \mid h_{t-1}) = P(C_t \mid C_{t-1}) \prod_{i=1}^{N} P(T_{i,t} \mid C_t). \]

The transition table will have $2^{N+1} \times 2^{N+1}$ entries. For instance, the entry corresponding to the hidden states $h_t = (\text{confused}, \text{asleep}, \text{awake})$ and $h_{t-1} = (\text{confused}, \text{asleep}, \text{asleep})$ can be computed as

\[ P(h_t = (\text{confused}, \text{asleep}, \text{awake}) \mid h_{t-1} = (\text{confused}, \text{asleep}, \text{asleep})) = P(T_{1,t} = \text{asleep} \mid C_t = \text{confused}) \, P(T_{2,t} = \text{awake} \mid C_t = \text{confused}) \, P(C_t = \text{confused} \mid C_{t-1} = \text{confused}) = 0.7 \cdot 0.3 \cdot 0.9 = 0.189. \]

Other entries can be computed in a similar manner.

The compound evidence probabilities can also be easily computed. Again, using the definition of the compound variables and the dependencies observed from the network we can write

\[ P(E_t \mid h_t) = \prod_{i=1}^{N} P(S_{i,t} \mid T_{i,t}). \]

This table will have $2^N \times 2^{N+1}$ entries. For instance, the entry corresponding to evidence $E_t = (\text{awake}, \text{awake})$ and compound hidden state $h_t = (\text{confused}, \text{asleep}, \text{awake})$ will be computed as

\[ P(E_t = (\text{awake}, \text{awake}) \mid h_t = (\text{confused}, \text{asleep}, \text{awake})) = P(S_{1,t} = \text{awake} \mid T_{1,t} = \text{asleep}) \, P(S_{2,t} = \text{awake} \mid T_{2,t} = \text{awake}) = 0.2 \cdot 0.8 = 0.16. \]

Finally, the last parameter needed to be computed is the initial state distribution, $P(h_1)$.
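We compute it as

\[ P(h_1) = P(C_1) \prod_{i=1}^{N} P(T_{i,1} \mid C_1). \]

The initial probability of the state $h_1 = (\text{confused}, \text{asleep}, \text{awake})$ is, for example,

\[ P(h_1 = (\text{confused}, \text{asleep}, \text{awake})) = P(T_{1,1} = \text{asleep} \mid C_1 = \text{confused}) \, P(T_{2,1} = \text{awake} \mid C_1 = \text{confused}) \, P(C_1 = \text{confused}) = 0.7 \cdot 0.3 \cdot 0.1 = 0.021. \]

[Figure 3: CS440 dynamic Bayesian network for two students and a perfect camera. Nodes: $C_1, C_2$; observed $S_{1,1} = \text{asleep}$, $S_{2,1} = \text{awake}$, $S_{1,2} = \text{asleep}$, $S_{2,2} = \text{asleep}$.]

3. With two students in the class and a perfect camera sleeping detector, the network becomes slightly different from the one we discussed previously. The main difference is that we can drop all $T$-nodes (or $S$-nodes, for that matter): for perfect camera detectors, $S_{i,t} = T_{i,t}$. This is depicted in Figure 3. We see that this is a simplified version of the network in (1). Transition state probabilities are the same as in (1). Evidence state probabilities can be computed using the method from (2). Note that what used to be called the $h$-variable is now the $C$-variable, e.g., $h_t = C_t$. Hence, this model is an HMM and we can use appropriate HMM algorithms to answer the remaining questions.

(a) To compute the predicted probabilities of the state of confusion at time $t=2$ given the evidence at time $t=1$ we could "blindly" use the HMM prediction algorithm. However, it is equally easy to "derive" this algorithm and compute the requested probability in the process. First, we note that we need to compute $P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake})$. From the network, this term can be written as

\[ P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}) = \sum_{C_1} P(C_2 \mid C_1) \, P(C_1 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}). \]

Here we know all the terms except $P(C_1 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake})$. This is the probability of the filtered state estimate at time $t=1$. We can compute this probability as

\[ P(C_1 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}) = \alpha \, P(S_{1,1} = \text{asleep} \mid C_1) \, P(S_{2,1} = \text{awake} \mid C_1) \, P(C_1), \]

where $\alpha$ is a normalizing constant. So, to compute the predicted state distribution at time $t=2$ we first need to compute the filtered state probability at time $t=1$, and then compute the predicted probability using the result of the first step. In particular, listing the value for confused first and not confused second, we get the following numbers:

\[ P(S_{1,1} = \text{asleep} \mid C_1) \, P(S_{2,1} = \text{awake} \mid C_1) \, P(C_1) = (0.7, 0.2) \odot (0.3, 0.8) \odot (0.1, 0.9) = (0.021, 0.144). \]

Note that the above product is not the typical vector product; rather, it is a component-by-component product.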
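The filtering and one-step prediction steps derived in (a) can be sketched in Python as follows. This is only an illustrative sketch under the perfect-camera assumption of part 3; the names `PRIOR`, `TRANSITION`, `P_ASLEEP`, `filter_step`, and `predict_step` are choices made here for the example and do not come from the handout.

```python
# States are indexed 0 = confused, 1 = not confused (an indexing chosen for this sketch).
PRIOR = [0.1, 0.9]                 # P(C_1)
TRANSITION = [[0.9, 0.1],          # P(C_t | C_{t-1} = confused)
              [0.1, 0.9]]          # P(C_t | C_{t-1} = not confused)
P_ASLEEP = [0.7, 0.2]              # P(student asleep | C)


def likelihood(observations):
    """P(observations | C) per state; students are independent given C."""
    result = [1.0, 1.0]
    for obs in observations:
        for state in range(2):
            p = P_ASLEEP[state]
            result[state] *= p if obs == "asleep" else 1.0 - p
    return result


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def filter_step(predicted, observations):
    """Component-by-component product of likelihood and prior, then normalization."""
    like = likelihood(observations)
    return normalize([l * p for l, p in zip(like, predicted)])


def predict_step(filtered):
    """One-step prediction: sum over the previous state."""
    return [sum(filtered[i] * TRANSITION[i][j] for i in range(2))
            for j in range(2)]


filtered_1 = filter_step(PRIOR, ["asleep", "awake"])
predicted_2 = predict_step(filtered_1)
print(filtered_1)    # filtered estimate at t=1, confused first
print(predicted_2)   # predicted distribution at t=2, confused first
```

Keeping the normalization inside `filter_step` mirrors the order of the derivation: multiply component by component first, normalize second, and only then propagate through the transition model.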
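After normalization, we get the probability of the filtered estimate at $t=1$ as (value for confused first, not confused second)

\[ P(C_1 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}) \approx (0.1273, 0.8727). \]

To compute the predicted state probabilities we use the result from the previous line together with the derivation of the predicted probability from above to get

\[ P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}) \approx (0.9 \cdot 0.1273 + 0.1 \cdot 0.8727, \; 0.1 \cdot 0.1273 + 0.9 \cdot 0.8727) \approx (0.2018, 0.7982). \]

Note that the filtered probability suggests it is more likely, given the evidence at time $t=1$, that the class is not confused (probability 0.873) than that it is confused (0.127). This probability is lower than the prior probability of the class being not confused, 0.9. This is the result of including conflicting evidence (one student asleep, another awake). On the other hand, we predict that the class is more likely to be confused at time $t=2$ (0.202) than at time $t=1$ (0.127).

(b) Once we receive evidence at time $t=2$ we can use it to filter the estimate of the class's state of confusion. Namely, we want to compute

\[ P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}, S_{1,2} = \text{asleep}, S_{2,2} = \text{asleep}). \]

We can compute this probability in a way analogous to the one we used to compute the initial filtered estimate at time $t=1$:

\[ P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}, S_{1,2} = \text{asleep}, S_{2,2} = \text{asleep}) = \alpha \, P(S_{1,2} = \text{asleep} \mid C_2) \, P(S_{2,2} = \text{asleep} \mid C_2) \, P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}), \]

where $\alpha$ is a normalizing constant. The only difference is that we use the predicted estimate from the previous time step, $P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake})$, in place of the prior. Substituting the numbers from our problem we arrive at

\[ (0.7, 0.2) \odot (0.7, 0.2) \odot (0.2018, 0.7982) \approx (0.0989, 0.0319), \]

where $\odot$ denotes the component-by-component product. After normalization we get the filtered probability at $t=2$:

\[ P(C_2 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}, S_{1,2} = \text{asleep}, S_{2,2} = \text{asleep}) \approx (0.756, 0.244). \]

As intuitively expected, the evidence that both students are now asleep points to a more likely explanation that, at time $t=2$, the class is confused. The process we followed in this problem so far is called the forward probability propagation process.

(c) Can the evidence at time $t=2$ tell us, in retrospect, what the state of confusion was at time $t=1$? The answer is yes. We need to compute the smoothed estimate

\[ P(C_1 \mid S_{1,1} = \text{asleep}, S_{2,1} = \text{awake}, S_{1,2} = \text{asleep}, S_{2,2} = \text{asleep}). \]

This probability can be computed using a combination of forward and backward probabilities.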
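One generic way to combine forward and backward probabilities is sketched below in Python for the two-state, perfect-camera model of part 3. It is a minimal illustration, not the handout's own procedure; the function name `smooth` and the list-based representation (index 0 = confused, 1 = not confused) are assumptions made for this example.

```python
# Index 0 = confused, 1 = not confused (an indexing chosen for this sketch).
PRIOR = [0.1, 0.9]
TRANSITION = [[0.9, 0.1], [0.1, 0.9]]   # TRANSITION[i][j] = P(C_t = j | C_{t-1} = i)
P_ASLEEP = [0.7, 0.2]                   # P(student asleep | C)


def likelihood(observations):
    """P(observations | C) per state; students are independent given C."""
    result = [1.0, 1.0]
    for obs in observations:
        for state in range(2):
            p = P_ASLEEP[state]
            result[state] *= p if obs == "asleep" else 1.0 - p
    return result


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def smooth(evidence):
    """Return P(C_k | all evidence) for every time step k."""
    n = len(evidence)
    # Forward pass: alpha[k][j] is proportional to P(C_k = j, evidence up to k).
    alpha = []
    for k, obs in enumerate(evidence):
        if k == 0:
            belief = PRIOR
        else:
            belief = [sum(alpha[-1][i] * TRANSITION[i][j] for i in range(2))
                      for j in range(2)]
        like = likelihood(obs)
        alpha.append([like[j] * belief[j] for j in range(2)])
    # Backward pass: beta[k][i] = P(evidence after k | C_k = i).
    beta = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        like = likelihood(evidence[k + 1])
        beta[k] = [sum(TRANSITION[i][j] * like[j] * beta[k + 1][j]
                       for j in range(2)) for i in range(2)]
    # Smoothed estimate: normalized product of forward and backward terms.
    return [normalize([a * b for a, b in zip(alpha[k], beta[k])])
            for k in range(n)]


smoothed = smooth([("asleep", "awake"), ("asleep", "asleep")])
print(smoothed[0])   # smoothed estimate at t=1, confused first
print(smoothed[1])   # at the last step this equals the filtered estimate
```

At the final time step the backward term is all ones, so the smoothed and filtered estimates coincide there.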
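Again, instead of "blindly" using the forward and backward formulas, we can easily derive the needed probability as follows. For brevity, write $e_1 = (S_{1,1} = \text{asleep}, S_{2,1} = \text{awake})$ and $e_2 = (S_{1,2} = \text{asleep}, S_{2,2} = \text{asleep})$. Then

\[ P(C_1 \mid e_1, e_2) = \alpha \, P(C_1 \mid e_1) \, P(e_2 \mid C_1), \]

where $\alpha$ is a normalizing constant. Let us first compute the second term (value for confused first, not confused second):

\[ P(e_2 \mid C_1) = \sum_{C_2} P(S_{1,2} = \text{asleep} \mid C_2) \, P(S_{2,2} = \text{asleep} \mid C_2) \, P(C_2 \mid C_1) = (0.49 \cdot 0.9 + 0.04 \cdot 0.1, \; 0.49 \cdot 0.1 + 0.04 \cdot 0.9) = (0.445, 0.085). \]

The smoothed estimate is then

\[ P(C_1 \mid e_1, e_2) = \alpha \, (0.1273, 0.8727) \odot (0.445, 0.085) \approx \alpha \, (0.0566, 0.0742) \approx (0.433, 0.567), \]

where $\odot$ denotes the component-by-component product. Compared to the probability of the filtered state, $P(C_1 \mid e_1) \approx (0.127, 0.873)$, the probability of the smoothed state points to a much more uncertain state of confusion. This can be seen as a consequence of the overwhelming evidence that the class is confused at time $t=2$.

(d) To find the most likely explanation, in terms of class confusion, of the evidence at two time steps we need to compute the best sequence of confusion states $(C_1^*, C_2^*)$:

\[ (C_1^*, C_2^*) = \arg\max_{C_1, C_2} P(C_1, C_2 \mid e_1, e_2). \]

Of course, in this simple situation we could easily substitute all possible combinations (four of them) of the values of $C_1$ and $C_2$ in the above equation and select the one with the highest probability. Because this is not computationally feasible when sequences are longer, we need to use the so-called Viterbi algorithm. The Viterbi algorithm is, essentially, the forward algorithm with the summation replaced by a max operation and some additional bookkeeping. We can very easily derive it for our simple problem, using the factorization defined by our network:

\[ \max_{C_1, C_2} P(C_1, C_2 \mid e_1, e_2) \propto \max_{C_2} P(e_2 \mid C_2) \max_{C_1} P(C_2 \mid C_1) \, P(e_1 \mid C_1) \, P(C_1). \]

Let us first compute a quantity we denote with $\delta_1(C_1)$,

\[ \delta_1(C_1) = P(e_1 \mid C_1) \, P(C_1) = P(S_{1,1} = \text{asleep} \mid C_1) \, P(S_{2,1} = \text{awake} \mid C_1) \, P(C_1). \]

This is, essentially, the (unnormalized) filtered estimate (forward probability) we computed in 3a. Using this quantity, we can compute

\[ \delta_2(C_2) = \max_{C_1} P(C_2 \mid C_1) \, \delta_1(C_1) \]

and keep track of which state of $C_1$ led to the maximum value for each state of $C_2$, i.e.,

\[ \psi_2(C_2) = \arg\max_{C_1} P(C_2 \mid C_1) \, \delta_1(C_1). \]

Once we have computed $\delta_2(C_2)$, we next compute

\[ C_2^* = \arg\max_{C_2} P(e_2 \mid C_2) \, \delta_2(C_2). \]

Then, we trace back and find, from $\psi_2$, which $C_1$ led to this $C_2^*$ and select it as $C_1^* = \psi_2(C_2^*)$.

Let us first compute $\delta_1$ for our problem:

\[ \delta_1(\text{confused}) = 0.7 \cdot 0.3 \cdot 0.1 = 0.021, \quad \delta_1(\text{not confused}) = 0.2 \cdot 0.8 \cdot 0.9 = 0.144. \]

Next,

\[ \delta_2(\text{confused}) = \max(0.9 \cdot 0.021, \; 0.1 \cdot 0.144) = 0.0189, \quad \delta_2(\text{not confused}) = \max(0.1 \cdot 0.021, \; 0.9 \cdot 0.144) = 0.1296, \]

and

\[ \psi_2(\text{confused}) = \text{confused}, \quad \psi_2(\text{not confused}) = \text{not confused}. \]

We next need to enter evidence at $t=2$ and compute the most likely state at that time,

\[ C_2^* = \arg\max \{ 0.49 \cdot 0.0189, \; 0.04 \cdot 0.1296 \} = \arg\max \{ 0.00926, \; 0.00518 \} = \text{confused}. \]

Hence, the most likely explanation at time $t=2$ is that the class is confused, $C_2^* = \text{confused}$. Tracing back using $\psi_2$ we find that the state that led to $C_2 = \text{confused}$ is $C_1^* = \psi_2(\text{confused}) = \text{confused}$.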
Thus, our most likely explanation of the evidence received at two times is that the class was confused at both times,

\[ (C_1, C_2) = (\text{confused}, \text{confused}). \]

Note that this explanation differs from the one we would obtain by looking at the estimates of the smoothed states we calculated in (3b) and (3c). If we picked the states that maximize those probabilities we would get

\[ C_2 = \text{confused}, \quad C_1 = \text{not confused}. \]

From the above discussion we arrive at the following important conclusion: the most likely sequence of states in an HMM, given evidence, is not always the sequence of individually most likely smoothed states.
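The conclusion above can be checked numerically with a short Python sketch. It is illustrative only: the function names `viterbi` and `brute_force_marginals` are chosen here, the perfect-camera model of part 3 is assumed, and the Viterbi step folds each observation into the running maximum as soon as it arrives, a common formulation that yields the same sequence as entering the last evidence at the end.

```python
from itertools import product

STATES = ["confused", "not confused"]
PRIOR = {"confused": 0.1, "not confused": 0.9}
STAY = 0.9                                   # P(C_t = s | C_{t-1} = s)
P_ASLEEP = {"confused": 0.7, "not confused": 0.2}


def transition(prev, cur):
    return STAY if prev == cur else 1.0 - STAY


def likelihood(state, observations):
    """P(observations | state); students are independent given the state."""
    p = 1.0
    for obs in observations:
        p *= P_ASLEEP[state] if obs == "asleep" else 1.0 - P_ASLEEP[state]
    return p


def viterbi(evidence):
    """Most likely state sequence given the evidence."""
    delta = {s: PRIOR[s] * likelihood(s, evidence[0]) for s in STATES}
    backpointers = []
    for obs in evidence[1:]:
        new_delta, pointer = {}, {}
        for cur in STATES:
            best_prev = max(STATES, key=lambda prev: delta[prev] * transition(prev, cur))
            pointer[cur] = best_prev
            new_delta[cur] = (delta[best_prev] * transition(best_prev, cur)
                              * likelihood(cur, obs))
        delta = new_delta
        backpointers.append(pointer)
    path = [max(STATES, key=lambda s: delta[s])]
    for pointer in reversed(backpointers):   # trace back from the last state
        path.append(pointer[path[-1]])
    return list(reversed(path))


def brute_force_marginals(evidence):
    """Smoothed marginals by enumerating all sequences (short sequences only)."""
    n = len(evidence)
    joint = {}
    for seq in product(STATES, repeat=n):
        p = PRIOR[seq[0]] * likelihood(seq[0], evidence[0])
        for k in range(1, n):
            p *= transition(seq[k - 1], seq[k]) * likelihood(seq[k], evidence[k])
        joint[seq] = p
    total = sum(joint.values())
    return [{s: sum(p for seq, p in joint.items() if seq[k] == s) / total
             for s in STATES} for k in range(n)]


evidence = [("asleep", "awake"), ("asleep", "asleep")]
best_path = viterbi(evidence)
marginals = brute_force_marginals(evidence)
individually_best = [max(m, key=m.get) for m in marginals]
print(best_path)           # ['confused', 'confused']
print(individually_best)   # ['not confused', 'confused']
```

The brute-force enumeration is used for the marginals only because there are just four sequences here; for longer sequences a forward-backward pass would be the usual choice.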