CaNew’2000, the 2nd International Workshop on Causal Networks. Held in conjunction with ECAI 2000, the 14th European Conference on Artificial Intelligence, Berlin, Germany, August 2000, pages 1-5

Modelling Dynamic Causal Interactions with Bayesian Networks: Temporal Noisy Gates

Severino F. Galán (1) and Francisco J. Díez (1)

(1) Department of Artificial Intelligence, UNED, Paseo Senda del Rey 9, 28040 Madrid, Spain, email: {seve, fjdiez}@dia.uned.es

Abstract. The usual way of applying Bayesian networks to the modelling of temporal processes consists in discretizing time and creating an instance of each random variable for each point in time. This method leads to large and complex networks. We present a new approach, called Net of Irreversible Events in Discrete Time (NIEDT), for temporal reasoning in domains involving irreversible events. Under this approach, time is discretized, nodes are associated with events, and each value of a node represents the occurrence of an event at a particular instant; this leads to simpler networks. We also define several types of Temporal Noisy Gates, which facilitate the acquisition and representation of uncertain temporal knowledge.

Keywords: probabilistic temporal reasoning, causality, temporal Bayesian networks

1 INTRODUCTION

1.1 Bayesian networks

Bayesian networks (BNs) [16, 4] are a probability-based method for representing and reasoning with uncertain knowledge. Each node is associated with a random variable taking on either discrete or continuous values. In our work they are all discrete. Links define probabilistic dependence relations between variables. Formally, a BN is an acyclic directed graph along with a probability distribution for its variables, which satisfies the Markov condition: given the values of its parents, any variable V is independent of its non-descendants. The joint probability over the random variables in the net can be expressed as

$$P(x_1, \ldots, x_n) = \prod_i P(x_i \mid pa(x_i)) \tag{1}$$

where pa(xi) stands for the set of parents of variable Xi.

In the general case, it is necessary to assign each node a set of conditional probabilities that grows exponentially with the number of parents. This complicates the acquisition of the parameters, their storage and the propagation of evidence. For these reasons, causal interaction models called canonical models [16] were developed in order to simplify both BN construction and probability computation. The most famous example is the noisy OR-gate, which requires just one parameter per parent.

1.1.1 The noisy OR-gate

In the noisy OR model, each cause Xi (a binary random variable) acts independently of the other causes to produce the effect Y (also a binary random variable). For each Xi, an inhibitory mechanism could prevent this action from taking place, i.e. each present cause may fail to produce the effect with a certain probability. A noisy OR-gate can be decomposed as shown in Figure 1. Each auxiliary variable Zi represents the fact that Y has been produced by Xi. Therefore, Y=+y when Zi=+zi for at least one i.

[Figure 1. Noisy OR-gate for n causes. Nodes: X1, …, Xn; Z1, …, Zn; Y.]

The parameters that define the model are:

$$c_i \equiv P(+z_i \mid +x_i) \tag{2}$$

Put another way, 1−ci = P(¬zi|+xi) is the probability that inhibitor Ii prevents Xi from causing Y. (In a more detailed model, Ii might be represented as a second parent of Zi.) If Xi is absent, it cannot produce Y; therefore,

$$P(+z_i \mid \neg x_i) = 0 \tag{3}$$

For a certain configuration x of the Xi’s:

$$P(+y \mid \mathbf{x}) = 1 - \prod_{i \in T_X} (1 - c_i) \tag{4}$$

where TX is the subset of causes of Y that are present.

1.1.2 The noisy MAX-gate

The noisy MAX-gate [11, 8, 10] is a generalization of the noisy OR-gate to graded variables. A graded variable E can be either absent or present with gE degrees of intensity. Usually E=0 means “E is absent” and succeeding numbers indicate higher intensity.
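As a small illustration of the noisy OR-gate of Section 1.1.1, Equation (4) can be sketched in a few lines of Python. The function name `noisy_or` and the parameter values are assumptions made for this example; they do not come from the paper.

```python
from math import prod


def noisy_or(c, present):
    """Equation (4): P(+y | x) = 1 - product over present causes of (1 - c_i).

    c       -- list of link parameters c_i = P(+z_i | +x_i)
    present -- list of booleans, True where cause X_i is present
    """
    return 1.0 - prod(1.0 - c_i for c_i, x_i in zip(c, present) if x_i)


# Hypothetical parameters for two causes (not taken from the paper).
c = [0.8, 0.5]
p_both = noisy_or(c, [True, True])    # 1 - 0.2 * 0.5
p_first = noisy_or(c, [True, False])  # reduces to c_1
p_none = noisy_or(c, [False, False])  # empty product, so P(+y | x) = 0
```

With no cause present the product is empty, so the effect has probability zero, in agreement with Equation (3).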
This type of causal interaction (the noisy MAX-gate) can be constructed by introducing n auxiliary variables Zi with the same domain as Y (see Figure 1). The parameters of the model are the conditional probabilities:

$$c^{X_i=x_i}_{Z_i=z_i} \equiv P(Z_i = z_i \mid X_i = x_i) \tag{5}$$

The value taken on by Y is the maximum of the zi’s. Therefore, the conditional probability table (CPT) for Y is given by:

$$P(y \mid \mathbf{x}) = \sum_{\mathbf{z} \,:\, \max(\mathbf{z}) = y} \; \prod_i c^{X_i=x_i}_{Z_i=z_i} \tag{6}$$

Figure 2 illustrates Equation (6) for a family with two causes, A and B, and one effect, C. Columns are indexed by the parameters for ZA (given A=a) and rows by the parameters for ZB (given B=b); each cell shows the resulting value of C.

P(C | A=a, B=b)    c[A=a; ZA=0]   c[A=a; ZA=1]   c[A=a; ZA=2]   …   c[A=a; ZA=gC]
c[B=b; ZB=0]       C=0            C=1            C=2            …   C=gC
c[B=b; ZB=1]       C=1            C=1            C=2            …   C=gC
c[B=b; ZB=2]       C=2            C=2            C=2            …   C=gC
…                  …              …              …              …   …
c[B=b; ZB=gC]      C=gC           C=gC           C=gC           …   C=gC

Figure 2. Noisy MAX-gate for two causes and one effect.

1.1.3 The noisy AND-gate

In the noisy AND model, each parent Xi (a binary random variable) is interpreted as a condition for the effect Y (also a binary random variable). There exists an inhibitory mechanism for each condition, so that when Xi is present, Y may be false even if the other conditions are satisfied. This can be modelled by introducing an auxiliary variable Zi for each arc Xi→Y. Zi=+zi represents that condition Xi is present and not inhibited. The parameters of the model are:

$$h_i \equiv P(+z_i \mid +x_i) \tag{7}$$

The conditional probability of Y can be computed as follows:

$$P(+y \mid \mathbf{x}) = \begin{cases} \prod_i h_i & \text{for } \mathbf{x} = (+x_1, \ldots, +x_n) \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

This conjunctive model corresponds to a noisy AND without substitutors [10].

1.1.4 The noisy MIN-gate

The noisy MIN-gate [8, 10] is a generalization of the noisy AND to graded variables. Like the noisy MAX, the noisy MIN can be modelled through the introduction of n auxiliary variables Zi. The parameters associated with the relation between Xi and Zi are:

$$h^{X_i=x_i}_{Z_i=z_i} \equiv P(Z_i = z_i \mid X_i = x_i) \tag{9}$$

The value taken on by Y is the minimum of the zi’s.
Therefore, the CPT for Y is

$$P(y \mid \mathbf{x}) = \sum_{\mathbf{z} \,:\, \min(\mathbf{z}) = y} \; \prod_i h^{X_i=x_i}_{Z_i=z_i} \tag{10}$$

1.1.5 Leaky noisy gates

In real-world applications, it is often infeasible to enumerate all the possible causes of an effect. In such a case, the non-explicit causes can be grouped together in a node X*, implicit in the OR/MAX-gate: $c^*_y$ is the probability that Y=y when the causes explicit in the model are known to be absent. If Y is a binary random variable, it suffices to have one parameter $c^*_{+y}$. In conjunctive interaction, the non-explicit conditions can be grouped into a node X*, giving rise to the leaky noisy AND/MIN-gates.

1.2 BNs and time

The usual way to apply BNs to dynamic domains consists in discretizing time and creating an instance of each random variable for each point in time. Under the formalism of dynamic Bayesian networks (DBNs), a static causal model is built first. Then, a copy of this model is generated for each instant in a certain temporal range of interest. Finally, links between nodes in adjacent static networks are established. Research applying DBNs, as defined above, includes a model for making judgements concerning persistence of propositions by Dean and Kanazawa [7], a model for sensor validation by Nicholson and Brady [15], and a method for reasoning with DBNs by Kjærulff [13]. Other methods that introduce an explicit representation of time in BNs appear in [3, 12, 2, 17].

Due to its similarity to a NIEDT, we make special mention of the formalism presented in [2]. Arroyo-Figueroa and Sucar propose a model called temporal nodes Bayesian networks (TNBNs). The TNBN is an extension of a standard BN, in which each temporal node represents an event or a state change of a variable. There is at most one change of state for each variable in the temporal range of interest. The value taken on by the variable represents the interval in which the change has occurred.
Time is discretized in a finite number of intervals, allowing a different number and duration of intervals for each node (multiple granularity). Intervals for each arc represent relative times between the parent events and the corresponding child state change. When an initial event is detected, its time of occurrence anchors the network in time. The TNBN model, however, lacks a formalization of canonical models for temporal processes.

2 DESCRIPTION OF THE NEW APPROACH

In a NIEDT, each variable represents an event that can take place at an instant within a certain temporal range of interest. Such a range is discretized by adopting the appropriate temporal unit for each case (seconds, minutes, etc.). Therefore, the temporal granularity depends on the particular problem. A temporal random variable V in the net can take on a set of values v[i] with i∈{a,…,b,never}, where a and b are instants defining the limits of the temporal range of interest for V. For example, if V represents “being taken to hospital”, V=v[a] expresses that the patient has been taken to hospital at instant a. If the patient is not taken to hospital then V=v[never].

We assume that each event can happen at most once, that is to say, processes are irreversible. In this way, we guarantee the exclusivity of the values associated with each variable in the net, since it is impossible for the same event to happen at two different instants.

The links in the net represent temporal causal mechanisms between neighbouring nodes. Therefore, each CPT represents the most probable delays between parent events and the corresponding child event. For the case of general dynamic interaction in a family of nodes, giving the CPT involves taking into account every possible configuration of instants for a node and its parents, and estimating a probability for that distribution of temporal events.
In a family of n parents X1,…,Xn and one child Y, the CPT is given by

$$P(Y[t_Y] \mid X_1[t_1], \ldots, X_n[t_n]) \tag{11}$$

where $t_Y \in \{0, \ldots, n_Y, \text{never}\}$ and $t_i \in \{0, \ldots, n_i, \text{never}\}$.

In general BNs, the joint probability is given by the product of all the CPTs in the network. Any marginal or conditional probability can be derived from the joint probability. For example, if B has happened at t1 and C at t2, the a posteriori probability for A is

$$P(a[t] \mid b[t_1], c[t_2]) = \frac{P(a[t], b[t_1], c[t_2])}{P(b[t_1], c[t_2])} \tag{12}$$

This expression can be used for diagnosis or prediction.

In many domains, the dynamic causal relations have the property of time invariance:

$$P(Y[t_Y + \Delta t] \mid X_1[t_1 + \Delta t], \ldots, X_n[t_n + \Delta t]) = P(Y[t_Y] \mid X_1[t_1], \ldots, X_n[t_n]) \tag{13}$$

where $t_Y, t_Y + \Delta t \in \{0, \ldots, n_Y\}$ and $t_i, t_i + \Delta t \in \{0, \ldots, n_i\}$. If all the CPTs are time-invariant, the network will be time-invariant as well.

2.1 Temporal nodes without parents

Let A be an event node that may occur at one of instants 0, 1, 2, ... We define pA[t] as the probability of A being true at t given that it was false at 0, 1, ..., and t−1, i.e. the probability that A happens at time t if it has not happened before t. These values can be obtained from a database or estimated by a human expert. We wish to calculate the probability of A being true at each point in time. As an illustrative example, A could represent “death caused by an epidemic disease”, and pA[t] would be the percentage of the population dying weekly as a consequence of the disease. (Time could be discretized in weeks.) The probability for temporal node A is:

$$P(a[t]) = p_A[t] \cdot \Bigl(1 - \sum_{t'=0}^{t-1} P(a[t'])\Bigr) \tag{14}$$

This expression could be used in our approach to evaluate the a priori probabilities. In many domains pA[t] is a constant and does not depend on t. If pA[t]=k, Equation (14) turns into:

$$P(a[t]) = (1 - k)^t \cdot k \tag{15}$$

(The proof is omitted because of the lack of space.) Therefore, P(a[t]) is a probability distribution with exponential decay. Since

$$(1 - k)^t \approx 1 - kt + \tfrac{1}{2} k^2 t^2 + \cdots \tag{16}$$

when kt << 1 for the temporal range of interest, (1−k)^t ≈ 1; therefore, P(a[t]) ≈ k. We then have time invariance for node A.

2.2 Node with one parent

Let us consider the net in Figure 3. The temporal ranges of interest for events A and B are {0,…,tA} and {0,…,tB}, respectively.

[Figure 3. Temporal network with one parent and one child: A, with range {0,…,tA}, is the parent of B, with range {0,…,tB}.]

P(b[j]|a[i]) is the probability that B happens at j when A has happened at i. The CPT for link A→B can be as general as possible, permitting any delay between A and B, with probabilities varying over time for each particular delay (see Table 1). When j<i, P(b[j]|a[i])=0 because the effect cannot precede the cause. When i=never, P(b[j]|a[i])=0 as well for every instant j ≠ never.

Table 1. A general CPT for tA=2 and tB=3.

B\A        a[0]    a[1]    a[2]    a[never]
b[0]       0.5     0       0       0
b[1]       0.1     0.3     0       0
b[2]       0.1     0.05    0.2     0
b[3]       0.1     0.02    0.2     0
b[never]   0.2     0.63    0.6     1

If we had a time-invariant causal relation for arc A→B then

$$P(b[j + \Delta t] \mid a[i + \Delta t]) = P(b[j] \mid a[i]) \tag{17}$$

where $j, j + \Delta t \in \{0, \ldots, t_B\}$ and $i, i + \Delta t \in \{0, \ldots, t_A\}$. In this case, we only need to specify a probability for each delay. For example, once we know that A has taken place, the probability of B at that instant is 0.5, and 0.1 one instant later:

$$P(b[j] \mid a[i]) = \begin{cases} 0.5 & \text{if } j = i \\ 0.1 & \text{if } j = i + 1 \\ 0 & \text{otherwise} \end{cases} \tag{18}$$

The conditional probabilities for arc A→B appear in Table 2; the remaining probability mass, 0.4, is assigned to b[never].

Table 2. Time-invariant CPT for tA=2 and tB=3.

B\A        a[0]    a[1]    a[2]    a[never]
b[0]       0.5     0       0       0
b[1]       0.1     0.5     0       0
b[2]       0       0.1     0.5     0
b[3]       0       0       0.1     0
b[never]   0.4     0.4     0.4     1

The case presented shows that there are two possible delays (0 or 1) between the occurrences of the parent event and the child event, with associated probabilities 0.5 and 0.1, respectively, which are time-invariant.
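The two constructions of Sections 2.1 and 2.2 can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the function names `prior` and `time_invariant_cpt`, the value k=0.1, and the choice to assign to "never" any probability mass not placed on an instant within the range are ours, not the paper's.

```python
def prior(p_a, horizon):
    """Equation (14): P(a[t]) = p_A[t] * (1 - sum_{t' < t} P(a[t'])).

    p_a     -- function t -> p_A[t]
    horizon -- last instant of the temporal range
    """
    probs = []
    for t in range(horizon + 1):
        probs.append(p_a(t) * (1.0 - sum(probs)))
    return probs


def time_invariant_cpt(delay_probs, t_a, t_b):
    """Build P(b[j] | a[i]) from a delay distribution, as in Equation (18).

    delay_probs -- dict mapping delay (j - i) to its probability
    Each column lists j = 0..t_b followed by the entry for b[never].
    Assumption: mass not placed on an instant in range goes to b[never].
    """
    cpt = {}
    for i in range(t_a + 1):
        column = [delay_probs.get(j - i, 0.0) for j in range(t_b + 1)]
        column.append(1.0 - sum(column))
        cpt[i] = column
    cpt["never"] = [0.0] * (t_b + 1) + [1.0]
    return cpt


k = 0.1  # hypothetical constant hazard
geometric = prior(lambda t: k, 4)
table2 = time_invariant_cpt({0: 0.5, 1: 0.1}, t_a=2, t_b=3)
```

With a constant pA[t]=k, the recursion of Equation (14) yields the values (1−k)^t·k of Equation (15), and the delay distribution {0: 0.5, 1: 0.1} yields the columns of Table 2.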
The delay probabilities can be estimated by a human expert or obtained from a database by taking into account the delay between A and B.

2.3 Canonical models and time

If we consider a family of two parent nodes A and B, and one child node C, we must provide the parameters

$$P(c[k] \mid a[i], b[j]) \tag{19}$$

where i∈{0,…,tA,never}, j∈{0,…,tB,never}, k∈{0,…,tC,never}.

In real-world applications, it is difficult to find a human expert or a database that allows us to create such a table. For this reason, a formalization of traditional canonical models for temporal domains turns out to be necessary.

2.3.1 Temporal noisy OR-gate

We are dealing with domains that can be modelled by associating binary random variables with events. In the static case, the noisy OR-gate appropriately reproduces the kind of interaction in which the presence of one cause is sufficient to produce the effect and each causal mechanism is independent of the other causes. For temporal processes, additional questions should be taken into account, as shown below.

Let us consider a net with n causes X1,…,Xn and one effect Y. The temporal ranges for these nodes are, respectively, $\{0, \ldots, t_{X_1}\}$, …, $\{0, \ldots, t_{X_n}\}$, and $\{0, \ldots, t_Y\}$. Each parameter ci that appeared in the static case (see Section 1.1.1) now separates into parameters

$$c^{X_i=x_i[j]}_{Z_i=z_i[k]}, \quad j \in \{0, \ldots, t_{X_i}, \text{never}\}, \; k \in \{0, \ldots, t_Y, \text{never}\} \tag{20}$$

allowing different delays between cause and effect. The type of relation between Xi and Zi was described in Section 2.2.

We are interested in calculating the probability of Y at any instant given evidence about its causes, as indicated in Figure 4, where n=2, X1=x1[i1] and X2=x2[i2]. Columns are indexed by the parameters for Z1 and rows by the parameters for Z2; each cell shows the resulting value of Y.

P(y[i] | x1[i1], x2[i2])   c[x1[i1]; z1[0]]   c[x1[i1]; z1[1]]   …   c[x1[i1]; z1[nY]]   c[x1[i1]; z1[never]]
c[x2[i2]; z2[0]]           y[0]               y[0]               …   y[0]                y[0]
c[x2[i2]; z2[1]]           y[0]               y[1]               …   y[1]                y[1]
…                          …                  …                  …   …                   …
c[x2[i2]; z2[nY]]          y[0]               y[1]               …   y[nY]               y[nY]
c[x2[i2]; z2[never]]       y[0]               y[1]               …   y[nY]               y[never]

Figure 4. Temporal noisy OR-gate.

The general reasoning followed in Figure 4 establishes that if X1 causes Y to be true at t1, and X2 does the same for t2, we consider that Y becomes true at min(t1, t2), since every event can happen only once. From Figure 4, a temporal noisy OR-gate is equivalent to a noisy MIN-gate. The equivalence is reached by associating increasing intensity degrees with increasing temporal indices. Therefore, a temporal noisy OR-gate can be modelled through a noisy MIN-gate by sorting temporal values from past to future.

2.3.2 Temporal noisy AND-gate

The temporal noisy AND-gate represents the case in which the effect is present as soon as all its conditions permit it to be present, as illustrated in Figure 5 for a family with two conditions.

P(y[i] | x1[i1], x2[i2])   h[x1[i1]; z1[0]]   h[x1[i1]; z1[1]]   …   h[x1[i1]; z1[nY]]   h[x1[i1]; z1[never]]
h[x2[i2]; z2[0]]           y[0]               y[1]               …   y[nY]               y[never]
h[x2[i2]; z2[1]]           y[1]               y[1]               …   y[nY]               y[never]
…                          …                  …                  …   …                   …
h[x2[i2]; z2[nY]]          y[nY]              y[nY]              …   y[nY]               y[never]
h[x2[i2]; z2[never]]       y[never]           y[never]           …   y[never]            y[never]

Figure 5. Temporal noisy AND-gate.

Under this type of interaction, if X1 permits Y to be true at t1, and X2 at t2, we consider that Y becomes true at max(t1, t2). A temporal noisy AND-gate is equivalent to a noisy MAX-gate. The equivalence is reached by associating increasing intensity degrees with increasing temporal indices. Therefore, a temporal noisy AND-gate can be modelled through a noisy MAX-gate by sorting the temporal values from past to future.

Note that by associating increasing intensity degrees with decreasing temporal indices, i.e. sorting the temporal values from future to past, a temporal noisy OR-gate could be modelled through a noisy MAX-gate and a temporal noisy AND-gate could be modelled through a noisy MIN-gate.
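The min/max reading of Figures 4 and 5 can be sketched as follows. This is a minimal illustration, assuming that "never" is encoded as `math.inf` so that it sorts after every instant (past to future); the function name `temporal_gate` and the example distributions, which reuse the columns a[0] and a[1] of Table 2, are choices made for this example.

```python
from itertools import product
from math import inf


def temporal_gate(z_dists, combine):
    """Distribution of Y when Y = combine(z_1, ..., z_n).

    z_dists -- one dict per parent, mapping an instant (inf = never)
               to P(z_i[instant] | x_i[j_i]) for the observed x_i[j_i]
    combine -- min for the temporal noisy OR, max for the temporal noisy AND
    """
    result = {}
    for combo in product(*(d.items() for d in z_dists)):
        probability = 1.0
        for _, p in combo:
            probability *= p
        y = combine(instant for instant, _ in combo)
        result[y] = result.get(y, 0.0) + probability
    return result


# Hypothetical evidence: X1 happened at 0 and X2 at 1, both links with the
# delay probabilities of Table 2 (0.5 at once, 0.1 one instant later).
z1 = {0: 0.5, 1: 0.1, inf: 0.4}
z2 = {1: 0.5, 2: 0.1, inf: 0.4}

or_dist = temporal_gate([z1, z2], min)   # Y happens at the earliest z_i
and_dist = temporal_gate([z1, z2], max)  # Y happens at the latest z_i
```

Enumerating all configurations of the Zi's is exponential in the number of parents; it is used here only because it mirrors Equations (6) and (10) directly.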
In applications of canonical models, disjunctive interactions (OR-gates) appear much more often than conjunctive ones. That is because we are mainly interested in modelling the evolution of failures, anomalies or malfunctions in a system, either in the past (diagnosis) or in the future (prediction). In domains of this kind, disjunctive interaction is directly related to our intuitive notion of causality. (For an anomaly to appear, only one of its causes is needed.) In other domains, by contrast, we are interested in the evolution of a system from a state of malfunction to one of normality. In this case, event nodes represent processes of recovery that interact conjunctively. For example, after a car accident resulting in multiple injuries to a person, we could model the process of recovery as shown in Figure 6.

[Figure 6. Net for modelling the process of complete recovery from an accident. Nodes X1, …, Xn are parents of Y through an AND gate.]

Each variable Xi represents the event “the patient starts to be treated for injury i”. Variable Y represents “complete recovery”. All these variables interact through a temporal noisy AND-gate because each Xi is a condition for Y. Of course, if the Xi’s are not independent, the model must contain links among them or common ancestors of these nodes. The temporal range of interest is {0,…,m}, where t=0 is the time the accident occurs and t=m is an arbitrary time point. We introduce for each condition Xi one auxiliary variable Zi representing “recovery from injury i”. The parameters needed to complete the model are the conditional probabilities:

$$h^{X_i=x_i[j]}_{Z_i=z_i[k]} \equiv P(z_i[k] \mid x_i[j]) \quad \forall j, k \in \{0, \ldots, m\}, \; \forall i \in \{1, \ldots, n\} \tag{21}$$

and the prior probabilities:

$$P(x_i[j]) \quad \forall j \in \{0, \ldots, m\}, \; \forall i \in \{1, \ldots, n\} \tag{22}$$

The conditional probabilities give us an idea of the most probable durations of successful treatments, and the prior probabilities indicate the times at which treatments usually start to be applied after the accident.
(Some treatments can be applied just after the accident occurs, others can only be applied in a hospital, etc.) The event “complete recovery” will be true as soon as recovery from the last injury has taken place. This kind of reasoning is illustrated in Figure 5.

2.3.3 Temporal leaky noisy gates

Under the hypotheses introduced by Díez and Druzdzel [10] for leaky models, the non-explicit causes in the model of a node Y can be grouped together in a node called X*. If the temporal range for Y is {0,…,tY}, we only need to give the parameters

$$c^*_{y[i]} \quad \forall i \in \{0, \ldots, t_Y\} \tag{23}$$

and consider X* as another cause of Y. Therefore, a temporal leaky noisy OR-gate can be modelled by means of a leaky noisy MIN-gate. To that end, temporal indices must be ordered from past to future. A temporal leaky noisy AND-gate can be modelled through a leaky noisy MAX-gate by grouping the non-explicit conditions into a node X*.

3 CONCLUSIONS

The process of computing posterior probabilities in BNs is NP-hard [5]. This complexity becomes particularly problematic in large models such as those that arise when modelling temporal processes by DBNs. We have presented a new method, called Net of Irreversible Events in Discrete Time (NIEDT), for handling temporal information through BNs. In comparison with the formalism of DBNs, this method leads to structurally less complex nets for domains involving irreversible temporal events. This improvement in complexity is a consequence of restricting to one the number of possible occurrences over time for each event. This restriction does not exist in DBNs. Therefore, we obtain a more compact and simple representation at the cost of a reduction in temporal expressiveness.

We have adapted different canonical models of causal interaction in static domains to problems characterized by the presence of irreversible temporal events. As a result, we have defined the temporal noisy gates.
ACKNOWLEDGEMENTS

This research was supported by the Spanish CICYT, under grants TIC97-0604 and TIC97-1135-C04-04.

REFERENCES

[1] G. Arroyo-Figueroa. Razonamiento Probabilístico con Nodos Temporales y su Aplicación al Diagnóstico y Predicción de Eventos. Doctoral thesis, Instituto Tecnológico y de Estudios Superiores de Monterrey, Mexico, 1999. In Spanish.
[2] G. Arroyo-Figueroa and L. E. Sucar. A temporal Bayesian network for diagnosis and prediction. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 13-20, Stockholm, Sweden, 1999. Morgan Kaufmann, San Francisco, CA.
[3] C. Berzuini. Representing time in causal probabilistic networks. In M. Henrion, R. D. Shachter, L. N. Kanal, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 5, pages 15-28. Elsevier Science Publishers B. V. (North-Holland), 1990.
[4] E. Castillo, J. M. Gutiérrez, and A. S. Hadi. Expert Systems and Probabilistic Network Models. Springer Verlag, New York, 1997.
[5] G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artificial Intelligence, 42:393-405, 1990.
[6] P. Dagum and A. Galper. Forecasting sleep apnea with dynamic network models. In Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, pages 64-71, Washington D.C., 1993. Morgan Kaufmann, San Francisco, CA.
[7] T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5:142-150, 1989.
[8] F. J. Díez. Parameter adjustment in Bayes networks. The generalized noisy OR-gate. In Proceedings of the 9th Conference on Uncertainty in Artificial Intelligence, pages 99-105, Washington D.C., 1993. Morgan Kaufmann, San Francisco, CA.
[9] F. J. Díez. Sistema Experto Bayesiano para Ecocardiografía. Doctoral thesis, U.N.E.D., Madrid, 1994. In Spanish.
[10] F. J. Díez and M. Druzdzel. Canonical probabilistic models for knowledge engineering. Technical Report, Decision Systems Laboratory, University of Pittsburgh, 2000. In preparation.
[11] M. Henrion. Some practical issues in constructing belief networks. In L. N. Kanal, T. S. Levitt, and J. F. Lemmer, editors, Uncertainty in Artificial Intelligence 3, pages 161-173. Elsevier Science Publishers, Amsterdam, 1989.
[12] K. Kanazawa. Reasoning about Time and Probability. PhD thesis, Department of Computer Science, Brown University, 1992.
[13] U. Kjærulff. A computational scheme for reasoning in dynamic probabilistic networks. In Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pages 121-129, Stanford University, 1992. Morgan Kaufmann, San Francisco, CA.
[14] L. Ngo, P. Haddawy, and J. Helwig. A theoretical framework for context-sensitive temporal probability model construction with application to plan projection. In Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, pages 419-426, Montreal, Canada, 1995. Morgan Kaufmann, San Francisco, CA.
[15] A. E. Nicholson and J. M. Brady. Sensor validation using dynamic belief networks. In Proceedings of the 8th Conference on Uncertainty in Artificial Intelligence, pages 207-214, Stanford University, 1992. Morgan Kaufmann, San Francisco, CA.
[16] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, CA, 1988. Revised second printing, 1991.
[17] E. Santos Jr. and J. D. Young. Probabilistic temporal networks: A unified framework for reasoning with time and uncertainty. International Journal of Approximate Reasoning, 20:263-291, 1999.