Recitation session
Bayesian networks, HMM, Kalman Filters, DBNs
Ch. 15 AIMA 2nd Ed.
1 Hidden Markov models and Dynamic Bayesian networks
Assume we want to build a program that infers the state of confusion in a CS440 lecture. We have a video camera that
observes the classroom. It is a rather fancy camera: it can detect individual students' faces. More than that, it can, with
some limited accuracy, determine from one image of a student’s face whether the student is sleeping or not.
We know, from studies done over many years, that the probability of a student sleeping in a lecture, when it is
confusing, is 70%. If the lecture is not confusing, the student may still be tired from a night of partying, or may
simply be bored and, hence, sleeping, with probability 20%.
We also know that, whether or not the lecture is confusing, the fact that one student is sleeping does not affect
whether the students around her are sleeping.
Let us assume that our smart camera reports, for each and every student in the class, whether that student is sleeping
or not, with an accuracy of 80%.
Let us also assume that it does so every minute. We also know that if the class was confused at some time $t$, there
is a 90% chance that it will still be confused a minute later, at time $t+1$. Similarly, if it is not confused at time $t$, it will not be confused
at time $t+1$ with the same probability of 90%. Initially, there is a 10% chance that the class is confused.
1. Formulate the problem by modeling it as a dynamic Bayesian network. Define variables, structure, and parameters.
2. Can this problem be modeled as an HMM? If so, convert it into an HMM. Compute all necessary parameters.
3. Let there be only 2 students in the class. The algorithm implemented in the camera for detection of sleeping
has been improved so that it does not make any errors, i.e., the camera can accurately tell whether a student is
sleeping or not. The camera records over two minutes and returns the following information about two students:
time 1: (asleep, awake), time 2: (asleep, asleep).
(a) Predict the state of confusion at minute t=2 given the information about the two students at minute t=1.
(b) Now assume we have the full information (both at t=1 and t=2). What is the probability of the state of
confusion at t=2?
(c) Looking back, what should the probability of confusion at t=1 have been if we had this complete information?
(d) What is the most likely explanation of the class's state of confusion at t=1 and t=2, with the full camera
record in hand?
1.1 Solution
1. We first need to name all the variables in the problem. Let $C_t$ denote the random variable that describes the state
of confusion of the class at time $t$. We assume $C_t \in \{\text{confused}, \text{not confused}\}$. Next, let $S_{i,t}$ be the random
variable corresponding to what the camera detects as the state of the $i$-th student in the class at time $t$, i.e.,
$S_{i,t} \in \{\text{asleep}, \text{awake}\}$. Of course, the camera cannot detect the true "sleeping" state of the student. Let us call
this true state $T_{i,t}$, $T_{i,t} \in \{\text{asleep}, \text{awake}\}$. (Note: we could also say that $T_{i,t} \in \{\text{truly asleep}, \text{truly awake}\}$, but
it should be clear from the context what the states' real meaning is.)
Figure 1: Two slices of the CS440 dynamic Bayesian network, with class-confusion nodes $C_{t-1}$ and $C_t$, true
student-state nodes $T_{1,t-1}, T_{2,t-1}, T_{1,t}, T_{2,t}$, and camera-observation nodes $S_{1,t-1}, S_{2,t-1}, S_{1,t}, S_{2,t}$.
Based on the description of the problem we arrive at the network shown in Figure 1. Actually, we show only
two slices, at times $t-1$ and $t$.
Nodes corresponding to each student are independent of each other, given the state of the class confusion, as
defined in the problem. Furthermore, $S$-variables only depend on the corresponding $T$-variables. Finally, $C$-nodes
are linked across time. For convenience, we show the evidence nodes as shaded.
After defining the variables and the network structure, we are left with the task of defining the network parameters.
From the problem definition, it is easy to see that:
$$P(C_1 = \text{confused}) = 0.1,$$
$$P(C_t = \text{confused} \mid C_{t-1} = \text{confused}) = 0.9, \qquad P(C_t = \text{not confused} \mid C_{t-1} = \text{not confused}) = 0.9,$$
$$P(T_{i,t} = \text{asleep} \mid C_t = \text{confused}) = 0.7, \qquad P(T_{i,t} = \text{asleep} \mid C_t = \text{not confused}) = 0.2,$$
$$P(S_{i,t} = \text{asleep} \mid T_{i,t} = \text{asleep}) = 0.8, \qquad P(S_{i,t} = \text{awake} \mid T_{i,t} = \text{awake}) = 0.8.$$
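To make these parameters concrete, here is a minimal Python sketch that records the CPTs above as dictionaries.
The names (P_C1, P_C_trans, P_T_given_C, P_S_given_T) are ours, chosen for this sketch; they are not part of the
original handout.

    # CPTs of the CS440 DBN, indexed by the conditioning value.
    P_C1 = {"confused": 0.1, "not confused": 0.9}            # prior P(C_1)

    P_C_trans = {                                            # P(C_t | C_{t-1})
        "confused":     {"confused": 0.9, "not confused": 0.1},
        "not confused": {"confused": 0.1, "not confused": 0.9},
    }

    P_T_given_C = {                                          # P(T_{i,t} | C_t)
        "confused":     {"asleep": 0.7, "awake": 0.3},
        "not confused": {"asleep": 0.2, "awake": 0.8},
    }

    P_S_given_T = {                                          # P(S_{i,t} | T_{i,t}), 80% accurate camera
        "asleep": {"asleep": 0.8, "awake": 0.2},
        "awake":  {"asleep": 0.2, "awake": 0.8},
    }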
2. One way to model this problem using a hidden Markov model is to group all hidden variables at every time $t$,
i.e., $C_t$ and all $T_{i,t}$, into one compound hidden variable at that time $t$. Let us call this compound hidden variable $h_t$.
Then
$$h_t = (C_t, T_{1,t}, \ldots, T_{N,t}),$$
where $N$ is the total number of students in the class. The compound variable will have $2 \cdot 2^N$ different states.
For instance, when there are only two students in the class, $N = 2$, the number of states of the compound hidden
variable is $2 \cdot 2^2 = 8$.
At the same time, we need to group all the evidence variables $S_{1,t}, \ldots, S_{N,t}$ into a compound evidence variable
$$E_t = (S_{1,t}, \ldots, S_{N,t}).$$
The compound evidence variable has $2^N$ states.
The new model is now an HMM depicted in Figure 2.
Figure 2: Two slices of the CS440 dynamic Bayesian network now represented as an HMM with compound hidden
states $h_{t-1}, h_t$ and compound evidence variables $E_{t-1}, E_t$.

We also need to define the parameters of this new model. Let us first consider the transition probability between the
compound hidden variables $h_{t-1}$ and $h_t$. Using the definition of the compound variables and by observing the
dependencies in the original network we can write
$$P(h_t \mid h_{t-1}) = P(C_t \mid C_{t-1}) \prod_{i=1}^{N} P(T_{i,t} \mid C_t).$$
The transition table will have $2^{N+1} \times 2^{N+1}$ entries. For instance, the entry corresponding to the hidden states
$h_t = (\text{confused}, \text{asleep}, \text{awake})$ and $h_{t-1} = (\text{confused}, \text{asleep}, \text{asleep})$ can be computed as
$$P(h_t = (\text{confused}, \text{asleep}, \text{awake}) \mid h_{t-1} = (\text{confused}, \text{asleep}, \text{asleep}))$$
$$= P(T_{1,t} = \text{asleep} \mid C_t = \text{confused})\, P(T_{2,t} = \text{awake} \mid C_t = \text{confused})\, P(C_t = \text{confused} \mid C_{t-1} = \text{confused})$$
$$= 0.7 \times 0.3 \times 0.9 = 0.189.$$
Other entries can be computed in a similar manner.
The compound evidence probabilities can also be easily computed. Again, using the definition of the compound
variables and the dependencies observed from the network we can write
$$P(E_t \mid h_t) = \prod_{i=1}^{N} P(S_{i,t} \mid T_{i,t}).$$
This table will have $2^{N+1} \times 2^N$ entries. For instance, the entry corresponding to evidence $E_t = (\text{awake}, \text{awake})$ and compound hidden state $h_t = (\text{confused}, \text{asleep}, \text{awake})$ will be computed as
$$P(E_t = (\text{awake}, \text{awake}) \mid h_t = (\text{confused}, \text{asleep}, \text{awake}))
= P(S_{1,t} = \text{awake} \mid T_{1,t} = \text{asleep})\, P(S_{2,t} = \text{awake} \mid T_{2,t} = \text{awake})
= 0.2 \times 0.8 = 0.16.$$
Finally, the last parameter that needs to be computed is the initial state distribution, $P(h_1)$. We compute it as
$$P(h_1) = P(C_1) \prod_{i=1}^{N} P(T_{i,1} \mid C_1).$$
The initial probability of the state $(\text{confused}, \text{asleep}, \text{awake})$ is, for example,
$$P(h_1 = (\text{confused}, \text{asleep}, \text{awake}))
= P(T_{1,1} = \text{asleep} \mid C_1 = \text{confused})\, P(T_{2,1} = \text{awake} \mid C_1 = \text{confused})\, P(C_1 = \text{confused})
= 0.7 \times 0.3 \times 0.1 = 0.021.$$
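Because the construction of the compound tables is mechanical, a short sketch may help. The following Python
fragment is a minimal illustration under our own naming (pC1, pCC, pTC, pST, trans, emit, init are not from the
handout); it builds the compound transition, emission, and initial probabilities for the two-student case and
reproduces the worked entries above.

    from itertools import product

    # Compound-state HMM for N = 2 students: hidden state h = (C, T1, T2), evidence e = (S1, S2).
    pC1 = {"confused": 0.1, "not confused": 0.9}                       # P(C_1)
    pCC = {"confused": {"confused": 0.9, "not confused": 0.1},
           "not confused": {"confused": 0.1, "not confused": 0.9}}     # P(C_t | C_{t-1})
    pTC = {"confused": {"asleep": 0.7, "awake": 0.3},
           "not confused": {"asleep": 0.2, "awake": 0.8}}              # P(T_{i,t} | C_t)
    pST = {"asleep": {"asleep": 0.8, "awake": 0.2},
           "awake":  {"asleep": 0.2, "awake": 0.8}}                    # P(S_{i,t} | T_{i,t})

    hidden   = list(product(pC1, pST, pST))   # 2 * 2^2 = 8 compound hidden states
    evidence = list(product(pST, pST))        # 2^2 = 4 compound evidence values

    def trans(h_prev, h):
        """P(h_t | h_{t-1}) = P(C_t | C_{t-1}) * P(T1 | C_t) * P(T2 | C_t)."""
        return pCC[h_prev[0]][h[0]] * pTC[h[0]][h[1]] * pTC[h[0]][h[2]]

    def emit(h, e):
        """P(E_t | h_t) = P(S1 | T1) * P(S2 | T2)."""
        return pST[h[1]][e[0]] * pST[h[2]][e[1]]

    def init(h):
        """P(h_1) = P(C_1) * P(T1 | C_1) * P(T2 | C_1)."""
        return pC1[h[0]] * pTC[h[0]][h[1]] * pTC[h[0]][h[2]]

    # Reproduce the worked entries:
    print(trans(("confused", "asleep", "asleep"), ("confused", "asleep", "awake")))  # 0.9*0.7*0.3 = 0.189
    print(emit(("confused", "asleep", "awake"), ("awake", "awake")))                 # 0.2*0.8 = 0.16
    print(init(("confused", "asleep", "awake")))                                     # 0.1*0.7*0.3 = 0.021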
Figure 3: CS440 dynamic Bayesian network for two students and a perfect camera, with hidden nodes $C_1, C_2$ and
observed nodes $S_{1,1}, S_{2,1}$ (evidence asleep, awake at $t=1$) and $S_{1,2}, S_{2,2}$ (evidence asleep, asleep at $t=2$).
3. With two students in the class and a perfect camera sleeping detector the network becomes slightly different
from the one we discussed previously. The main difference is that we can drop all $T$-nodes (or $S$-nodes, for that
matter), because for perfect camera detectors $S_{i,t} = T_{i,t}$. This is depicted in Figure 3.
We see that this is a simplified version of the network in (1). Transition state probabilities are the same as in (1).
Evidence state probabilities can be computed using the method from (2). Note that what used to be called the
$T$-variable is now the $S$-variable, e.g., $P(S_{i,t} = \text{asleep} \mid C_t = \text{confused}) = P(T_{i,t} = \text{asleep} \mid C_t = \text{confused}) = 0.7$. Hence, this model is an HMM and
we can use appropriate HMM algorithms to answer the remaining questions.
(a) To compute the predicted probability of the state of confusion at time $t=2$ given the evidence at time $t=1$,
we could "blindly" use the HMM prediction algorithm. However, it is equally easy to "derive" this
algorithm and compute the requested probability in the process. Let $e_1 = (S_{1,1}=\text{asleep}, S_{2,1}=\text{awake})$
and $e_2 = (S_{1,2}=\text{asleep}, S_{2,2}=\text{asleep})$ denote the evidence received at the two time steps.
First, we note that we need to compute $P(C_2 \mid e_1)$. From the network, this term can be written as
$$P(C_2 \mid e_1) = \sum_{C_1} P(C_2 \mid C_1)\, P(C_1 \mid e_1).$$
Here we know all the terms except $P(C_1 \mid e_1)$. This is the probability of the filtered state estimate
at time $t=1$. We can compute this probability as
$$P(C_1 \mid e_1) \propto P(S_{1,1}=\text{asleep} \mid C_1)\, P(S_{2,1}=\text{awake} \mid C_1)\, P(C_1).$$
So, to compute the predicted state distribution at time $t=2$ we first need to compute the filtered state
probability at time $t=1$, and then compute the predicted probability using the result of the first step.
In particular, writing each distribution as a vector whose first component corresponds to confused and whose
second corresponds to not confused, we get the following numbers:
$$P(C_1 \mid e_1) \propto \begin{pmatrix} 0.7 \\ 0.2 \end{pmatrix} \times \begin{pmatrix} 0.3 \\ 0.8 \end{pmatrix} \times \begin{pmatrix} 0.1 \\ 0.9 \end{pmatrix} = \begin{pmatrix} 0.021 \\ 0.144 \end{pmatrix}.$$
Note that the above product is not the typical vector product; rather, it is a component-by-component
product.
After normalization, we get the probability of the filtered estimate at $t=1$ as
$$P(C_1 \mid e_1) = \frac{1}{0.021 + 0.144} \begin{pmatrix} 0.021 \\ 0.144 \end{pmatrix} \approx \begin{pmatrix} 0.127 \\ 0.873 \end{pmatrix}.$$
To compute the predicted state probabilities we use the result from the previous line together with the
derivation of the predicted probability from above to get
$$P(C_2 \mid e_1) = \begin{pmatrix} 0.9 \times 0.127 + 0.1 \times 0.873 \\ 0.1 \times 0.127 + 0.9 \times 0.873 \end{pmatrix} \approx \begin{pmatrix} 0.202 \\ 0.798 \end{pmatrix}.$$
Note that the probability suggests it is more likely, given the evidence at time $t=1$, that the class is not
confused (probability 0.873) than that it is confused (0.127). This probability is lower than the
prior probability of the class being not confused, 0.9. This is the result of including conflicting evidence
(one student asleep, another awake). On the other hand, we predict that the class is more likely to be
confused at time $t=2$ (probability 0.202) than at time $t=1$ (probability 0.127).
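A minimal Python sketch of this filter-then-predict step follows (the names normalize, filtered_1, and
predicted_2 are ours, not from the handout; with a perfect camera the observation model is $P(S_{i,t} \mid C_t)$
directly):

    pC1 = {"confused": 0.1, "not confused": 0.9}
    pCC = {"confused": {"confused": 0.9, "not confused": 0.1},
           "not confused": {"confused": 0.1, "not confused": 0.9}}
    pSC = {"confused": {"asleep": 0.7, "awake": 0.3},
           "not confused": {"asleep": 0.2, "awake": 0.8}}

    def normalize(d):
        z = sum(d.values())
        return {k: v / z for k, v in d.items()}

    # Filtered estimate at t = 1 with evidence (asleep, awake):
    filtered_1 = normalize({c: pSC[c]["asleep"] * pSC[c]["awake"] * pC1[c] for c in pC1})
    print(filtered_1)   # approx {'confused': 0.127, 'not confused': 0.873}

    # One-step prediction for t = 2:
    predicted_2 = {c2: sum(pCC[c1][c2] * filtered_1[c1] for c1 in pC1) for c2 in pC1}
    print(predicted_2)  # approx {'confused': 0.202, 'not confused': 0.798}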
(b) Once we receive the evidence at time $t=2$ we can use it to filter the estimate of the class's state of confusion.
Namely, we want to compute
$$P(C_2 \mid e_1, e_2) = P(C_2 \mid S_{1,1}=\text{asleep}, S_{2,1}=\text{awake}, S_{1,2}=\text{asleep}, S_{2,2}=\text{asleep}).$$
We can compute this probability in a way analogous to the one we used to compute the initial filtered estimate
at time $t=1$:
$$P(C_2 \mid e_1, e_2) \propto P(S_{1,2}=\text{asleep} \mid C_2)\, P(S_{2,2}=\text{asleep} \mid C_2)\, P(C_2 \mid e_1).$$
The only difference is that we use the predicted estimate from the previous time step, $P(C_2 \mid e_1)$.
Substituting the numbers from our problem we arrive at
$$P(C_2 \mid e_1, e_2) \propto \begin{pmatrix} 0.7 \\ 0.2 \end{pmatrix} \times \begin{pmatrix} 0.7 \\ 0.2 \end{pmatrix} \times \begin{pmatrix} 0.202 \\ 0.798 \end{pmatrix} = \begin{pmatrix} 0.099 \\ 0.032 \end{pmatrix}.$$
After normalization we get the filtered probability at $t=2$,
$$P(C_2 \mid e_1, e_2) \approx \begin{pmatrix} 0.756 \\ 0.244 \end{pmatrix}.$$
As intuitively expected, the evidence that both students are now asleep points to the more likely explanation
that, at time $t=2$, the class is confused.
The process we have followed in this problem so far is called forward probability propagation.
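A short continuation of the previous sketch illustrates this measurement update (predicted_2 below is the rounded
result of the prediction step; the names are again ours): reweight the predicted distribution by the likelihood of
the new evidence (asleep, asleep) and renormalize.

    pSC = {"confused": {"asleep": 0.7, "awake": 0.3},
           "not confused": {"asleep": 0.2, "awake": 0.8}}
    predicted_2 = {"confused": 0.202, "not confused": 0.798}

    unnorm = {c: pSC[c]["asleep"] * pSC[c]["asleep"] * predicted_2[c] for c in predicted_2}
    z = sum(unnorm.values())
    filtered_2 = {c: v / z for c, v in unnorm.items()}
    print(filtered_2)   # approx {'confused': 0.756, 'not confused': 0.244}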
(c) Can the evidence at time $t=2$ tell us, in retrospect, what the state of confusion was at time $t=1$? The
answer is yes. We need to compute the smoothed estimate
$$P(C_1 \mid e_1, e_2).$$
This probability can be computed using a combination of forward and backward probabilities.
Again, instead of "blindly" using the forward and backward formulas, we can easily derive the needed
probability as follows:
$$P(C_1 \mid e_1, e_2) \propto P(C_1 \mid e_1)\, P(e_2 \mid C_1).$$
Let us first compute the second term,
$$P(e_2 \mid C_1) = \sum_{C_2} P(S_{1,2}=\text{asleep} \mid C_2)\, P(S_{2,2}=\text{asleep} \mid C_2)\, P(C_2 \mid C_1)
= \begin{pmatrix} 0.49 \times 0.9 + 0.04 \times 0.1 \\ 0.49 \times 0.1 + 0.04 \times 0.9 \end{pmatrix}
= \begin{pmatrix} 0.445 \\ 0.085 \end{pmatrix}.$$
The smoothed estimate is then
$$P(C_1 \mid e_1, e_2) \propto \begin{pmatrix} 0.127 \\ 0.873 \end{pmatrix} \times \begin{pmatrix} 0.445 \\ 0.085 \end{pmatrix}
= \begin{pmatrix} 0.057 \\ 0.074 \end{pmatrix},
\quad \text{which normalizes to approximately} \begin{pmatrix} 0.43 \\ 0.57 \end{pmatrix}.$$
Compared to the probability of the filtered state, $P(C_1 \mid e_1) \approx (0.127, 0.873)$, the probability of
the smoothed state points to a much more uncertain state of confusion. This can be seen as a consequence
of the overwhelming evidence that the class is confused at time $t=2$.
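A minimal sketch of this smoothing step, under the same assumed names as before, combines the forward (filtered)
estimate at $t=1$ with the backward term $P(e_2 \mid C_1)$:

    pCC = {"confused": {"confused": 0.9, "not confused": 0.1},
           "not confused": {"confused": 0.1, "not confused": 0.9}}
    pSC = {"confused": {"asleep": 0.7, "awake": 0.3},
           "not confused": {"asleep": 0.2, "awake": 0.8}}
    filtered_1 = {"confused": 0.127, "not confused": 0.873}   # forward message from (3a)

    # Backward term: P(e_2 | C_1) = sum_{C_2} P(asleep | C_2)^2 * P(C_2 | C_1)
    backward_1 = {c1: sum(pSC[c2]["asleep"] ** 2 * pCC[c1][c2] for c2 in pCC) for c1 in pCC}
    print(backward_1)   # {'confused': 0.445, 'not confused': 0.085}

    unnorm = {c: filtered_1[c] * backward_1[c] for c in filtered_1}
    z = sum(unnorm.values())
    smoothed_1 = {c: v / z for c, v in unnorm.items()}
    print(smoothed_1)   # approx {'confused': 0.43, 'not confused': 0.57}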
(d) To find the most likely explanation, in terms of class confusion, of the evidence at the two time steps we need
to compute the best sequence of confusion states $(C_1^*, C_2^*)$:
$$(C_1^*, C_2^*) = \arg\max_{C_1, C_2} P(C_1, C_2 \mid e_1, e_2).$$
Of course, in this simple situation we could easily substitute all possible combinations (four of them) of
the values of $C_1$ and $C_2$ in the above equation and select the one with the highest probability. Because this
is not computationally feasible when sequences are longer, we need to use the so-called Viterbi algorithm.
The Viterbi algorithm is, essentially, the forward algorithm with the summation replaced by a max operation and
some additional bookkeeping. We can very easily derive it for our simple problem, using the factorization
defined by our network:
$$\max_{C_1, C_2} P(C_1, C_2, e_1, e_2) = \max_{C_2} P(e_2 \mid C_2) \max_{C_1} P(C_2 \mid C_1)\, P(e_1 \mid C_1)\, P(C_1).$$
Let us first compute a quantity we denote by $m_1(C_1)$,
$$m_1(C_1) = P(e_1 \mid C_1)\, P(C_1).$$
This is, essentially, the (unnormalized) filtered estimate (forward probability) we computed in (3a). Using
this quantity, we can compute
$$m_2(C_2) = P(e_2 \mid C_2) \max_{C_1} P(C_2 \mid C_1)\, m_1(C_1),$$
and keep track of which state of $C_1$ led to the maximum value for each state of $C_2$, i.e.,
$$b(C_2) = \arg\max_{C_1} P(C_2 \mid C_1)\, m_1(C_1).$$
Once we have computed $m_2(C_2)$, we next compute
$$C_2^* = \arg\max_{C_2} m_2(C_2).$$
Then, we trace back and find, from $b(C_2^*)$, which state of $C_1$ led to this $C_2^*$ and select it as $C_1^*$.
Let us first compute $m_1$ for our problem:
$$m_1(\text{confused}) = P(S_{1,1}=\text{asleep} \mid \text{confused})\, P(S_{2,1}=\text{awake} \mid \text{confused})\, P(\text{confused}) = 0.7 \times 0.3 \times 0.1 = 0.021,$$
$$m_1(\text{not confused}) = 0.2 \times 0.8 \times 0.9 = 0.144.$$
The inner maximization then gives
$$\max_{C_1} P(C_2=\text{confused} \mid C_1)\, m_1(C_1) = \max(0.9 \times 0.021,\ 0.1 \times 0.144) = 0.0189,$$
$$\max_{C_1} P(C_2=\text{not confused} \mid C_1)\, m_1(C_1) = \max(0.1 \times 0.021,\ 0.9 \times 0.144) = 0.1296,$$
and
$$b(\text{confused}) = \text{confused}, \qquad b(\text{not confused}) = \text{not confused}.$$
We next need to enter the evidence at $t=2$ and compute the most likely state at that time:
$$m_2(\text{confused}) = 0.49 \times 0.0189 \approx 0.0093, \qquad m_2(\text{not confused}) = 0.04 \times 0.1296 \approx 0.0052,$$
$$C_2^* = \arg\max_{C_2} m_2(C_2) = \text{confused}.$$
Hence, the most likely explanation at time $t=2$ is that the class is confused, $C_2^* = \text{confused}$.
Tracing back using $b$ we find that the state that led to $C_2^* = \text{confused}$ is $b(\text{confused}) = \text{confused}$. Thus, our most likely explanation of the evidence received at the two times is that the class
was confused at both times,
$$(C_1^*, C_2^*) = (\text{confused}, \text{confused}).$$
Note that this explanation differs from the one we would have obtained by looking at the estimates of the
smoothed states we calculated in (3b) and (3c). If we picked the states that individually maximize those probabilities
we would get
$$\hat{C}_2 = \text{confused}, \qquad \hat{C}_1 = \text{not confused}.$$
From the above discussion we arrive at the following important conclusion: the most likely sequence of
states in an HMM, given evidence, is not always the sequence of individually most likely smoothed states.
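Finally, a minimal sketch of the Viterbi computation for this two-step problem, using the quantities $m_1$, $m_2$,
and $b$ derived above (the Python names lik, m1, m2, back, c1_star, c2_star are ours):

    pC1 = {"confused": 0.1, "not confused": 0.9}
    pCC = {"confused": {"confused": 0.9, "not confused": 0.1},
           "not confused": {"confused": 0.1, "not confused": 0.9}}
    pSC = {"confused": {"asleep": 0.7, "awake": 0.3},
           "not confused": {"asleep": 0.2, "awake": 0.8}}
    e1, e2 = ("asleep", "awake"), ("asleep", "asleep")

    def lik(c, e):
        """Likelihood of the two (perfectly observed) students given C = c."""
        return pSC[c][e[0]] * pSC[c][e[1]]

    # m1(C1) = P(e1 | C1) P(C1)
    m1 = {c1: lik(c1, e1) * pC1[c1] for c1 in pC1}      # {'confused': 0.021, 'not confused': 0.144}

    # m2(C2) = P(e2 | C2) max_{C1} P(C2 | C1) m1(C1), with backpointers b(C2)
    m2, back = {}, {}
    for c2 in pC1:
        best_c1 = max(pC1, key=lambda c1: pCC[c1][c2] * m1[c1])
        back[c2] = best_c1
        m2[c2] = lik(c2, e2) * pCC[best_c1][c2] * m1[best_c1]

    c2_star = max(m2, key=m2.get)   # 'confused' (approx 0.0093 vs 0.0052)
    c1_star = back[c2_star]         # 'confused'
    print((c1_star, c2_star))       # ('confused', 'confused')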