Modeling Context-Dependent Faults for Diagnosis 1

Wolfgang Mayer ∗, Markus Stumptner ∗∗

∗ University of South Australia, Adelaide, SA 5095, Australia (e-mail: [email protected])
∗∗ University of South Australia, Adelaide, SA 5095, Australia (e-mail: [email protected])

Abstract: Most model-based diagnosis frameworks rely on incremental probing and the assumption that faults occur independently to infer the most likely explanation for a symptom. For systems where additional sensors are unavailable or a repair action must be issued at once, these assumptions are often inadequate and dependent faults must be considered explicitly. We introduce explicit models of context-dependent component fault behavior and show that our compositional models are well-suited for the one-shot fault diagnosis of pseudo-static systems. We develop extensions to the well-known Conflict-Directed A* algorithm to infer the most likely system state given a fixed set of observations and show that our approach complements earlier dependency models.

1. INTRODUCTION

Powerful model-based reasoning techniques to infer possible faults in a system based on manifestations of observable symptoms have been developed. Given the inherent complexity of this problem and the fact that most practical problems do not require explanations to be produced, most modeling techniques and inference algorithms apply the principle of parsimony to sacrifice completeness for efficiency. For example, the assumptions that (i) the model reflects all possible behaviors and component interactions relevant to a diagnosis task, (ii) all observed symptoms can be attributed to a small set of component failures, and (iii) these failures occur independently, are commonly found in the literature.
In scenarios where only a few “most likely” explanations are required, (ii) and (iii) together with probabilistic models of single components failing are often used to guide the search for candidate explanations or to guide incremental diagnosis and measurement (de Kleer and Williams, 1987).

However, these assumptions are not always adequate. While recent algorithmic improvements have led to systems that can efficiently test and diagnose faults with large cardinality (for example, MFMC faults are discussed in de Kleer (2008)), many algorithms employ strong independence assumptions and may not discriminate well between different fault candidates if dependent faults are present. Our work aims to complement those systems with fault inter-dependencies.

Specifically, we aim to compute likely explanations of symptoms by considering component faults that are caused by cascading effects of failures or misconfigurations elsewhere in the system. We use the term “fault” in the sense that each “faulty” component must be repaired, replaced, or reconfigured in order to restore the intended system function. That does not imply that the component is or was necessarily faulted, just that it is in a mode that cannot support the desired system function. For example, a broken fuse caused by overloading the system is not faulty but must nevertheless be replaced to restore system function.

1 This work was supported by the Australian Research Council under grant DP0881854.

We extend consistency-based diagnosis of static systems with context-specific component behavior to capture component behaviors in the presence of cascading failures. Our contributions are as follows:

• We present a diagnosis framework that incorporates causality between components by means of capturing context-specific component behavior triggered by external influences. As a result, our work unifies both static diagnosis and diagnosis of cascading failures under a common formalism.
• Our notion of causality dispenses with the single-origin hypothesis widely applied in earlier work on dependent failures, for example Weber and Wotawa (2008) or Tatar (1995), by allowing failure contexts to depend on multiple interacting components. We also drop the assumption that failures are caused by component faults by allowing that failures may also be activated by unusual combinations of expected, correct behavior.
• Our fault models are compositional, based on properties of individual components. No explicit global model of failure propagation or causality of faults that spans different components is required. Rather than relying on explicit specifications of how failure modes of one component may affect other components, we infer dependent failures from the components’ input and output values propagated in the model.
• We extend well-known heuristic search methods to compute most likely explanations using consistency-based diagnosis techniques enriched with information about possible mode changes.

This paper is organized as follows: In Section 2, we use an example to informally introduce different aspects of causality in failure diagnosis. A formal characterization of context-dependent behavior is given in Section 3. Our extensions to heuristic search procedures to compute diagnoses are presented in Section 4. We discuss the properties of our framework and selected related work in Section 5.

[Figure: circuit schematic with voltage source V (10V), fuse F, resistive element R, switches S1 and S2, motors M1 and M2, currents i0, i1, i2, and voltage u]
Fig. 1. Electronic Circuit Schematic Diagram

conn0 ⇔ ok(F) ∧ ok(R) ∧ (conn1 ∨ conn2)
conn1 ⇔ closed(S1) ∧ (ok(M1) ∨ shorted(M1))
conn2 ⇔ closed(S2) ∧ (ok(M2) ∨ shorted(M2))
i0 = i1 + i2 ∧ r(R)·i0 + u = 10
¬connk ⇒ ik = 0 (k ∈ {0, 1, 2})
connk ⇒ u = r(Mk)·ik (k ∈ {1, 2})
ok(Mk) ∧ ik > 0 ⇒ on(Mk) ∧ r(Mk) = 6 (k ∈ {1, 2})
shorted(Mk) ⇒ r(Mk) = 0 (k ∈ {1, 2})
ok(R) ⇒ r(R) = 5
Fig. 2. Formal Model of the Circuit in Figure 1

mode      condition        P(F=ok|·)   P(F=broken|·)
ok        0A = i0          1.00        0.00
ok        0A < i0 ≤ 1A     0.98        0.02
ok        1A < i0          0.01        0.99
broken    true             0.00        1.00
Fig. 3. Conditional Mode Transitions for F

2. EXPLOITING CAUSALITY FOR DIAGNOSIS

While successful for many systems, the assumption that failures occur independently may not hold for tightly-coupled systems, where failures in one component can easily propagate to damage other parts of the system. For example, the water pouring from a broken floodgate in a dam may wreak havoc further downstream, a shorted component in an electric circuit may damage other connected components, or damage to a jet engine rotor may spread to other engine stages (Akerlund et al., 2006). Therefore, repair choices based on fault identification and isolation using the independence assumption may fail to restore the full system purpose, and possible remaining faults may adversely affect the replaced parts.

While incremental diagnosis and active probing may help to uncover hidden faults and refine diagnoses, limited observability and time or cost constraints may prohibit isolating the true origin of faults. In particular, autonomous or embedded systems operating under tight constraints are a prime example where a “good-enough” diagnosis must be submitted to a higher-level planning module based on the limited sensor information. Hence, it is desirable to incorporate causal relationships between component behaviors into the diagnosis process to extend the scope of explanations to components that are likely to be affected.

Consider the electronic circuit depicted in Figure 1. Two motors, M1 and M2, are connected via switches S1 and S2 to a common circuit that contains a resistive element, R, a fuse, F, and a voltage source, V. A (much simplified) model of the circuit is given in Figure 2: if assumed in the ok mode, F and R conduct electricity based on simplified models of physics; otherwise, they are perfect insulators.
Similarly, motors can be operating normally (mode ok ), be short-circuited (shorted ), or broken. A motor can be observed as on iff electrical current is available and it is operating normally. Electrical current flows only if the circuit forms a closed loop (conn0 ). Switches are either open or closed. This model is essentially a static representation of the possible system states, where the component mode assumptions (and observations) directly determine the system state. Some cascading failures can be represented by additional axioms (for example, shorted(M1 ) ⇒ broken(F )), but in general it is necessary to consider a sequence of mode transitions to adequately capture the intermediate state(s) induced by the system behavior. For example, assume that M1 can transit from the ok mode into shorted only if i0 > 0. Since the latter implies broken(F ) ∧ i0 = 0, the necessary transient state where ok(F ) ∧ shorted(M1 ) ∧ i0 > 0 cannot be represented in our single-state model and, therefore, not be explained. Adhering to the compositional modeling paradigm, we augment the static system description with possible mode transitions defined locally for each component. For each component, we represent possible transitions between the different behavioral modes based on conditional probabilities enriched with additional constraints that guard the activation of a transition. Given the current behavioral mode and the component properties implied by the system model, the conditional probability states how probable a mode change is between two system states. For example, Figure 3 gives the context-specific behavior of the fuse F in Figure 1. Transition models for the other components are defined similarly. In this model, an explanation of an observed symptom is a trajectory of (possibly concurrent) mode changes between a sequence of states, rather than a simple static assignment of modes to components. 
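To make the static model of Figure 2 concrete, the following sketch evaluates the circuit equations for a given mode assignment. It is an illustration only: the function name `circuit_state` and the dictionary encoding of modes and switch positions are assumptions introduced here, while the numeric values (10V source, r(R) = 5, r(Mk) = 6 for an ok motor and 0 for a shorted one) are taken from Figure 2.

```python
def circuit_state(modes, closed):
    """Evaluate the simplified circuit of Figure 2 (hypothetical encoding).

    modes:  dict with keys 'F', 'R', 'M1', 'M2' mapping to mode names
    closed: dict with keys 'S1', 'S2' mapping to True (closed) or False
    Returns the pair (i0, u).
    """
    V, R_OHM, MOTOR_OHM = 10.0, 5.0, 6.0

    # conn_k <=> closed(S_k) and (ok(M_k) or shorted(M_k))
    branch = []
    for k in (1, 2):
        mode = modes[f"M{k}"]
        if closed[f"S{k}"] and mode in ("ok", "shorted"):
            branch.append(MOTOR_OHM if mode == "ok" else 0.0)

    # conn_0 <=> ok(F) and ok(R) and (conn_1 or conn_2); otherwise i0 = 0
    if modes["F"] != "ok" or modes["R"] != "ok" or not branch:
        return 0.0, V

    if 0.0 in branch:
        u = 0.0  # a shorted motor forces u = r(M_k) * i_k = 0
    else:
        parallel = 1.0 / sum(1.0 / r for r in branch)
        u = V * parallel / (R_OHM + parallel)

    i0 = (V - u) / R_OHM  # from r(R) * i0 + u = 10
    return i0, u


all_ok = {"F": "ok", "R": "ok", "M1": "ok", "M2": "ok"}
both = circuit_state(all_ok, {"S1": True, "S2": True})
single = circuit_state(all_ok, {"S1": True, "S2": False})
```

Under these assumptions, closing both switches with all components ok yields i0 = 1.25A, which falls into the 1A < i0 row of Figure 3, whereas a single connected motor yields roughly 0.91A and stays below the threshold.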
In our example, the most likely explanation for “both motors being ¬on while both switches are closed” is that F is broken with P(F = broken) ≈ 0.99. This is because if both motors are connected and working properly, the flow of electric current through F exceeds the 1A threshold, and hence the much higher probability for failure applies. Here, the context-specific behavior is encapsulated in F, and causal effects are derived from the system model; no explicit representation of causal dependencies is necessary. In contrast to classical consistency-based diagnosis, where F and R are both considered possible explanations with low probability, here F exceeds R (and the remaining double-failure hypotheses) in likelihood by a factor of 20. The discrepancy between the case where F fails spontaneously and where F fails causally illustrates that a diagnostic framework based solely on prior probabilities may fail to correctly discriminate between explanations.

3. MODELING CONDITIONAL BEHAVIOR

In this and the following sections we develop a formal model of our diagnosis framework. We build on the consistency-based diagnosis framework as defined by, e.g., Reiter (1987). We extend the classical static system description with conditional mode transitions for each component, and a characterization of the initial states from which all diagnoses originate:

Definition 1. (Diagnosis System). A Diagnosis System is a tuple ⟨SD, COMP, MT, OBS, I⟩, where SD contains the structural composition of the system and the behavioral models of the components comprising the system. The set COMP contains the atoms representing the components in SD, and MT defines the conditional mode transitions for each component. OBS contains a set of literals representing the observations, and I a characterization of the initial state(s) of the system.
Each component C ∈ COMP is associated with a set of behavioral modes, Modes(C), and a set of atoms representing properties and ports of the component, Locals(C). A partial (complete) assignment of component modes to component variables in SD is called a partial (complete) mode assignment. Similar to OBS, I represents the set of mode assignments that characterize those system states where all relevant components function as intended. This could be the single set where all components are assigned the ok mode, or multiple assignments could be admitted to allow for unknown or “don’t care” system variables, such as switch positions.

In the following, we implicitly assume that a diagnosis system ⟨SD, COMP, MT, OBS, I⟩ is given if understood from the context. We also apply the definition of consistency of a mode assignment with respect to a set X:

Definition 2. (Consistent Mode Assignment). A mode assignment A is consistent with X iff SD ∪ A ∪ X ⊭ ⊥.

To model the dynamic aspects of context-specific behavior, we assume that the mode transition behavior of each individual component can be expressed as a finite transition system, with nodes representing the behavioral modes and transitions representing possible mode changes.

Definition 3. (Conditional Mode Transition System). The Conditional Mode Transition System T(C) for a component C ∈ COMP is a transition system ⟨M, T⟩, where M = Modes(C) denotes the set of vertices and T ⊆ M × M × L(Locals(C)) × ℝ is a set of labeled transitions between component modes. Each transition ⟨s, t, g, p⟩, s, t ∈ M, is labeled with a guard condition g over the component’s properties, Locals(C), and a probability estimate p ∈ [0, 1] that defines how likely the mode change is to occur given that the transition is indeed applicable. We assume that ∀m ∈ M : ⟨m, m, g, p⟩ with g satisfiable in SD and p > 0 always exists. These transitions capture the case where a component does not change mode.
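Definition 3 can be mirrored by a small data structure. The sketch below is illustrative only: the class `Transition`, the helper `applicable`, and the encoding of guards as Python predicates over a dictionary of local values are assumptions made here, not part of the framework. The probabilities are those listed for the fuse F in Figure 3; rows with probability 0 are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Transition:
    """One labeled transition <s, t, g, p> (hypothetical encoding)."""
    source: str
    target: str
    guard: Callable[[Dict[str, float]], bool]  # g over Locals(C)
    prob: float                                # p in [0, 1]


# Conditional mode transition system for the fuse F, following Figure 3.
FUSE: List[Transition] = [
    Transition("ok", "ok", lambda v: v["i0"] == 0, 1.00),
    Transition("ok", "ok", lambda v: 0 < v["i0"] <= 1, 0.98),
    Transition("ok", "broken", lambda v: 0 < v["i0"] <= 1, 0.02),
    Transition("ok", "ok", lambda v: v["i0"] > 1, 0.01),
    Transition("ok", "broken", lambda v: v["i0"] > 1, 0.99),
    Transition("broken", "broken", lambda v: True, 1.00),
]


def applicable(transitions, mode, local_values):
    """Return the transitions leaving `mode` whose guard holds."""
    return [t for t in transitions
            if t.source == mode and t.guard(local_values)]


overloaded = applicable(FUSE, "ok", {"i0": 1.25})
```

With i0 = 1.25, `overloaded` contains exactly the two transitions guarded by 1A < i0, so the fuse stays ok with probability 0.01 and breaks with probability 0.99. Every mode also has a self-loop, which matches the requirement that a no-change transition always exists.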
For example, component F representing a fuse may change from the ok mode to broken under different conditions. Figure 3 summarizes the two transitions: if the current i0 through F is below 1A, F changes to the broken mode with probability 0.02 and remains in ok mode otherwise. If i0 exceeds 1A, the transition to broken is much more likely (0.99). The conditional probabilities and guard conditions given in Figure 3 define a conditional mode transition system for F.

We allow non-deterministic mode transition systems, but for every pair of mode transitions that connect the same modes the guard conditions must be mutually unsatisfiable. This ensures that there is at most one applicable transition between each pair of modes for a component. This limitation could be lifted but would enlarge the search space considerably. Instead, we advocate modeling unconditional, spontaneous transitions as guarded transitions, where the guard condition is obtained from the complement of the conditional transitions.

The set MT contains a conditional mode transition system for each component in COMP. Similar to mode assignments, mode transitions for individual components can be extended to the full component set:

Definition 4. (Joint Mode Transition). Let ti = ⟨Xi, Yi, Gi, Pi⟩ ∈ T(Ci), Ci ∈ COMP. A joint mode transition is a set T ⊆ {t1, ..., tn}, n ∈ [1, |COMP|], such that all Gi are jointly satisfiable given SD. T is source consistent w.r.t. a mode assignment A iff ∀i : (Ci = Xi) ∈ A and SD ∪ A ∪ ⋃i {Gi} is consistent; T is target consistent with A iff ∀i : (Ci = Yi) ∈ A and A is consistent. T is complete if n = |COMP| and partial otherwise.

For the example, a partial joint mode transition J that assumes concurrent mode transitions from the ok mode to broken for both M1 and F failing spontaneously is represented as

J = { ⟨ok_F, broken_F, 0 < i0 ≤ 1, 0.02⟩,
      ⟨ok_M1, broken_M1, 0 < i0 ≤ 1.3, 0.05⟩ }.

Here we assume that the thresholds above which the components fail in response to an abnormal flow of current are set at 1A and 1.3A for F and M1, respectively. The joint transition is partial, since transitions for some components are absent from J.

Joint mode transitions constrain the possible joint evolution of the individual components comprising the system:

Definition 5. (Mode Trajectory). A mode trajectory T = ⟨A1, J1, A2, ..., Ak, Jk, Ak+1⟩, k ≥ 1, is an alternating sequence of mode assignments Ai and joint mode transitions Jj such that, for all i, Ji is source consistent with Ai and is target consistent with Ai+1. T is consistent iff ∀i ∈ {1, ..., k + 1} : Ai is consistent. T is complete if J1, ..., Jk are complete and partial otherwise.

A mode trajectory characterizes a particular evolution of sets of concurrent mode transitions, constrained by the transitions permitted by MT. While the transition systems representing the dynamic behavior of components are modeled locally, the behaviors of different components are linked through the common context established by the system model SD. Components can enable or prohibit context-specific behavior in others by changing the shared system state.

For example, a two-step mode trajectory T that originates in a mode assignment where all but the two switch components are known to be in the ok mode, first assumes that M1 transits into shorted mode, followed by F entering broken, such that T terminates with the observation that both motors are ¬on, is given as follows:

T = ⟨ {F = ok, R = ok, M1 = ok, M2 = ok},
      ⟨ok_M1, shorted_M1, 0 < i0 ≤ 1.3, 0.05⟩,
      {F = ok, R = ok, M1 = shorted, M2 = ok},
      ⟨ok_F, broken_F, 1 < i0, 0.99⟩,
      {F = broken, R = ok, M1 = shorted, M2 = ok} ⟩

We can now restate the diagnosis problem as that of finding a complete, consistent mode trajectory originating in a state in I and leading to a state consistent with OBS:

Definition 6. (Context-Consistent Mode Trajectory). A mode trajectory T = ⟨A1, ..., Ak⟩, k ≥ 2, is context consistent for OBS and I iff T is consistent, A1 is consistent with I, and Ak is consistent with OBS.

Here, context consistency is a necessary requirement for a mode trajectory to contain a relevant explanation:

Observation 1. A mode trajectory T is an explanation for a diagnosis system only if it is context-consistent.

First, consistency of T is a necessary requirement to ensure that the mode changes implied by two consecutive joint mode transitions are actually feasible given the system description. Second, if T were inconsistent with OBS, some observed symptom would not be resolved and the mode changes would not qualify as a full explanation. Third, if T were inconsistent with I, the explanation would be known not to originate in a past “normal” system behavior and the trajectory would not reflect the actual system behavior.

Observation 2. Every complete context-consistent mode trajectory T contains at least one (possibly trivial) sequence of mode transitions that evidences a possible system evolution.

By construction, T links a state conforming to I to one consistent with OBS. Since T is complete, both the initial mode assignment A1 and the final mode assignment Ak of T are consistent with the constraints imposed by the system description, and a concrete mode trajectory from A1 to Ak exists.

Based on these observations, we define the solution of a Diagnosis System as follows:

Definition 7. (Diagnosis). A Mode Trajectory T is a Diagnosis for a Diagnosis System D iff T is complete and context consistent with D.

Note that transition probabilities are used only to guide the computation of preferred explanations (see Section 4.2) and are not intrinsic to the characterization of diagnoses.
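The mode-level part of Definitions 4 and 5 can be checked mechanically. The following sketch is an illustration under strong simplifications: mode assignments are encoded as dictionaries, a joint mode transition as a mapping from component to a (source mode, target mode) pair, and the guard conditions and consistency with SD are deliberately not checked, since that would require a constraint solver. All function names are hypothetical.

```python
def source_consistent(joint, assignment):
    """(C_i = X_i) must be in A for every transition in the joint transition."""
    return all(assignment.get(c) == src for c, (src, _) in joint.items())


def target_consistent(joint, assignment):
    """(C_i = Y_i) must be in A for every transition in the joint transition."""
    return all(assignment.get(c) == tgt for c, (_, tgt) in joint.items())


def is_mode_trajectory(assignments, joints):
    """assignments = [A1, ..., Ak+1], joints = [J1, ..., Jk], k >= 1.

    Checks only the alternation of Definition 5 on the level of modes;
    guards and SD consistency are omitted in this sketch.
    """
    if len(joints) < 1 or len(assignments) != len(joints) + 1:
        return False
    return all(
        source_consistent(j, assignments[i])
        and target_consistent(j, assignments[i + 1])
        for i, j in enumerate(joints)
    )


# The two-step example: M1 becomes shorted, then F breaks.
A1 = {"F": "ok", "R": "ok", "M1": "ok", "M2": "ok"}
A2 = {"F": "ok", "R": "ok", "M1": "shorted", "M2": "ok"}
A3 = {"F": "broken", "R": "ok", "M1": "shorted", "M2": "ok"}
J1 = {"M1": ("ok", "shorted")}
J2 = {"F": ("ok", "broken")}

valid = is_mode_trajectory([A1, A2, A3], [J1, J2])
```

Here `valid` is true, while swapping A2 and A3 fails because J2 would then start from an assignment in which F is already broken.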
Modeling system evolution in this way requires us to impose particular assumptions on the system to guarantee sound explanations; many of our assumptions are shared by other works, for example Tatar (1995):

• Faults are assumed to be discrete and persistent. While similar trajectory-based solutions have been proposed to handle intermittent (Baral et al., 2000) and transient faults (Narasimhan and Biswas, 2006), this is left for future extension.
• The system is pseudo-static; that is, the system evolves much faster than the diagnosis and measurement process, and stabilizes after a finite number of steps, prior to the diagnosis process.
• The diagnosis problem can be solved without reference to an explicit model of time (other than the order implied by mode trajectories).
• We employ the Closed World Assumption in that modes and transitions not captured in the system model are assumed not to exist. For each component, rich behavioral models are available to capture (some of) the normal as well as the abnormal behavior. Our work is thus situated in the middle ground between abductive and consistency-based diagnosis.
• The joint model behavior is affected only by variables in I, OBS, and A. No other hidden influences may exist. Since consistency-based diagnosis exploits inconsistencies between only those variables, this is not usually a severe restriction.
• In contrast to Tatar (1995), we do not require that the mode transition systems are deterministic.

4. COMPUTING EXPLANATIONS

Mode trajectories essentially capture the full dynamic behavior of the system model, and may contain unrelated mode changes, thus not qualifying as a parsimonious explanation. We address this problem by focusing the search on those context-consistent mode trajectories that are “most relevant” to a diagnosis problem. We apply both logical and probabilistic measures to arrive at suitable solutions.
4.1 Using Conflicts for Pruning

We observe that a context-consistent trajectory can be computed incrementally, starting from a trivial (partial) trajectory T = ⟨I, J, OBS⟩. By representing unassigned mode transitions and partial mode assignments in T as sets of variables, and mode transitions along with their guard expressions as constraints, a constraint satisfaction problem is obtained that can be solved efficiently using a modification of standard search procedures as described below. If, as a result, a complete and context-consistent trajectory is obtained, its mode transitions are a diagnosis (Observation 2). Otherwise, Observation 1 implies that once a candidate trajectory T has become context-inconsistent, it cannot be completed into a diagnosis and must be altered. Here, conflicts state which combinations of transitions imply a discrepancy (either between mutually unsatisfiable preconditions, or via assigned mode variables). Hence, at least one of these transitions must be replaced by an alternative mode transition, or T must be extended with an additional joint mode transition to resolve the discrepancy.

To ensure termination of the algorithm, an attempt to complete a candidate trajectory into a complete one is abandoned if it contains the same mode assignment more than once. This does not sacrifice completeness:

Observation 3. If two (partial) mode assignments A1 and A2 appear in a context-consistent trajectory, where A1 and A2 assign the same modes to the same set of components (and equally leave the remaining components unassigned), the trajectory is redundant.

If the duplicate joint transition is complete, the same trajectory without the section between the duplicate assignments is also a solution. Otherwise, since the number of components and mode transitions is finite, a different completion strategy can be found that ensures one of the two transitions is complete. Then, either the previous case applies or the redundancy no longer appears.
Conflict-Directed A* (CDA*) is an efficient algorithm for solving constraint-optimization problems that combines conflicts to prune infeasible regions in the search space and heuristics to focus on promising values first (Williams and Ragno, 2003). Like A*, it relies on admissible estimates for guidance. CDA* is designed to solve finite-domain CSPs where an optimal satisfying assignment is sought based on weights attached to variables. CDA* proceeds by assigning CSP variables in best-first order, guided by its heuristics, until either a complete consistent assignment or a conflict has been found. In the former case, the algorithm outputs the assignment as the solution and proceeds to enumerate other solutions. Otherwise, a conflict is obtained to guide the variable selection heuristics to satisfiable regions in the search space.

To adapt the algorithm to the dynamic scope of our problem, it must be extended to (i) dynamically expand or shrink the CSP problem whenever a partial trajectory is expanded by one level, to (ii) generalize conflicts such that they can be reused across different sub-problems, and to (iii) avoid visiting equivalent states repeatedly by pruning based on visited mode assignments as well as on mode trajectories. Also, for efficiency, we optimize by delaying assignments to constraint variables for “no-change” transitions and apply simple pre-filtering based on partial assignments and known transition effects and prerequisite conditions to reduce variable domains during search.

Function ExpandNode is assumed to be called from the top-level A* loop, where the node associated with the best cost estimate is retrieved, expanded, and newly added nodes are queued (excluding those that have already been visited). Williams and Ragno (2003) provide a comprehensive discussion of different variants of the basic algorithm.
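The interplay of best-first assignment and conflict-based pruning can be illustrated with a deliberately simplified sketch. It is neither the CDA* algorithm of Williams and Ragno (2003) nor our extension of it: it assigns finite-domain variables in a fixed order, ranks partial assignments by an admissible estimate (the negative log-probability assigned so far plus the best possible value for every unassigned variable), and discards any queued assignment that contains a recorded conflict. The function `best_first`, the callback `find_conflict`, and all probabilities in the toy problem are assumptions made for illustration.

```python
import heapq
import itertools
import math


def best_first(domains, find_conflict, limit=1):
    """Return up to `limit` consistent full assignments, most probable first.

    domains:       {variable: {value: probability > 0}}
    find_conflict: takes a full assignment (dict) and returns None if it is
                   consistent, otherwise a dict of variable/value pairs that
                   cannot hold together (a conflict).
    """
    order = list(domains)
    # Cheapest possible cost (-log of the largest probability) per variable.
    cheapest = [-math.log(max(domains[v].values())) for v in order]
    # tail[i]: optimistic cost of the still unassigned variables order[i:].
    tail = [sum(cheapest[i:]) for i in range(len(order) + 1)]
    conflicts, solutions = [], []
    tie = itertools.count()  # tie-breaker so that dicts are never compared
    heap = [(tail[0], next(tie), 0.0, {})]
    while heap and len(solutions) < limit:
        _, _, cost, partial = heapq.heappop(heap)
        if any(c.items() <= partial.items() for c in conflicts):
            continue  # pruned: contains a known conflict
        if len(partial) == len(order):
            conflict = find_conflict(partial)
            if conflict is None:
                solutions.append((math.exp(-cost), partial))
            else:
                conflicts.append(conflict)
            continue
        var = order[len(partial)]
        for value, p in domains[var].items():
            child = {**partial, var: value}
            child_cost = cost - math.log(p)
            estimate = child_cost + tail[len(child)]
            heapq.heappush(heap, (estimate, next(tie), child_cost, child))
    return solutions


# Toy problem with hypothetical probabilities: some symptom rules out
# F, R and M1 all being ok; M2 is irrelevant to the conflict.
DOMAINS = {
    "F": {"ok": 0.98, "broken": 0.02},
    "R": {"ok": 0.95, "broken": 0.05},
    "M1": {"ok": 0.92, "shorted": 0.03, "broken": 0.05},
    "M2": {"ok": 0.95, "broken": 0.05},
}


def toy_conflict(assignment):
    core = {"F": "ok", "R": "ok", "M1": "ok"}
    return core if core.items() <= assignment.items() else None


best = best_first(DOMAINS, toy_conflict, limit=1)
```

In this toy problem the all-ok assignment is examined first and yields the conflict over F, R and M1; the queued sibling that differs only in M2 is then discarded without a further consistency check, and the first solution returned assigns broken to M1 and ok to everything else. The sketch treats variables as independent, which is exactly the assumption that Section 4.2 shows to be violated for conditional transitions.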
Our modifications to the algorithm are as follows: we associate mode trajectories with unique scope elements to be able to keep track of the set of relevant variables and conflicts for each trajectory. (i) Whenever a CSP variable representing a mode transition is assigned (if ResolvesAllConflicts?(node) is satisfied), each constraint variable that corresponds to a transition in a mode trajectory is marked with a reference to the assigned transition’s guard condition. If a node in the subtree is expanded, all constraints along the path to the root must be activated in the constraint solver prior to checking consistency. This ensures that constraints introduced by transition assignments are checked only in the sub-CSP for that mode trajectory. (ii) Each CSP variable is associated with a scope element that keeps track of the set of CSP variables relevant to a trajectory. This is used in Leaf? to test whether all relevant variables have been assigned. Scope elements are organized in a hierarchical tree structure for efficiency. By sharing overlapping variable scopes, conflicts can be exploited across mode trajectories.

ResolvesAllConflicts? checks whether the path from the CDA* root to node resolves all known conflicts in the node’s scope. Otherwise, in ExpandConflict, we check if a conflict involves a variable that belongs to the last mode assignment in a trajectory. If yes, we expand the trajectory by one level by creating a copy of the current variable scope and by adding CSP variables that represent the transitions in the newly created joint transition to it. We also add constraints that ensure the newly introduced joint transition cannot result in a duplicate transition (see Observation 3).

function CDA*(DiagProb) : ModeTrajectory
    visited ← ∅
    queue ← {EmptyTrajectory(DiagProb)}
    while ¬Empty(queue) do
        t ← RemoveBestCandidate(queue)
        if IsConsistent(t) ∧ IsComplete(t) then
            return Trajectory(t)
        end if
        visited ← visited ∪ {t}
        new ← ExpandNode(t)
        queue ← (queue ∪ new) \ visited
    end while
    return ⊥
end function

function ExpandNode(node) : Set(Node)
    if ResolvesAllConflicts?(node) then
        if Leaf?(node, VarScope(node)) then
            return ExpandBestChild(node) ∪ ExpandBestAncestorsSiblings(node)
        else
            return ExpandBestChild(node)
        end if
    else
        return ExpandConflict(node)
    end if
end function

function ExpandConflict(node) : Set(Node)
    nextScope ← ResolveConflictBestChild(node) ∪ ExpandSibling(node)
    if InLastLevel?(node) then
        scope ← CloneScope(node)
        extScope ← ExtendOneLevel(scope)
        AddExclusionConstraints(extScope)
        nextScope ← nextScope ∪ {extScope}
    end if
    return nextScope
end function

4.2 Using Probabilities to Guide Diagnosis

In addition to using conflicts for pruning, we are interested in exploiting the conditional probabilities attached to mode transitions to guide the search towards trajectories that occur with high probability. This idea is justified by the fact that in many practical scenarios, a few “likely” explanations that fit well within the observed scenario are more valuable than the complete set of candidate diagnoses, yet it is undesirable to impose a rigid limit on diagnosis size. Furthermore, context-dependent behaviors are particularly amenable to such treatment, since the attached probabilities often differ by several orders of magnitude. For example, the probability that F in Figure 1 breaks spontaneously is 2%, while F is almost certain to break if i0 > 1A. Therefore, only one of the possible candidates is likely to be considered in the search if those probabilities are leveraged as guidance heuristics.

Unfortunately, the assumption that variables can be treated independently is violated in our domain, since conditional transitions can become dependent if they share common variables in SD, resulting in inadmissible estimates if combined under the assumption that all constituents occur independently. Moreover, priors for the distribution of mode assignments and other variables in SD are not available. For example, if we estimate the probability that components M1, M2 in Figure 1 break as P(Mj = broken | Mj′ = ok, ij > 1.5), j = 1, 2, then the common SD variable i0 linking both components together renders them dependent if the values of i1, i2 are unassigned. Here, Mj′ is Mj in the previous state. Similarly, the computation of the marginal probabilities requires knowledge of prior probabilities such as P(i1 > 1.5) that are generally unavailable for complex SD models.

Our solution to this problem is to over-approximate these probabilities by ignoring the probability distributions of mode and state variables entirely and using the maximum possible value 1 instead. While the resulting estimates are often crude, the variation in conditional probabilities, combined with pruning due to inconsistent mode assignments, and the memory-efficient CDA* search strategy all contribute to keeping the search focused.

Figure 4 shows the top five results when applying our algorithm to the example in Section 2. It can be seen that a single implied fault is returned as the top-ranked diagnosis. From the mode trajectory and SD it can be inferred that if both switches are closed, a dependent failure occurs in F. The two second-best explanations implicate F (again, this component failed dependently), together with an independent failure of M1 or M2. The next two diagnoses attribute the failure to a shorted motor. In total, there are 33 different explanations.

Probability   Explanation
0.99          broken(F) [dependent]
0.05          broken(F) [dependent], broken(M1)
0.05          broken(F) [dependent], broken(M2)
0.03          shorted(M1)
0.03          shorted(M2)
Fig. 4. Top five diagnoses for Figure 1

Our preliminary results indicate that the time required to compute dependent diagnoses is suitable for on-line use. All results presented in this paper required less than one second of CPU time, and none of our example runs required more than 20 seconds to compute. Early results obtained from the (trivial) example in this paper also indicate that the computation is largely dominated by the consistency checking and conflict extraction procedures. (We use a constraint propagation framework to represent our models.)

5. RELATED WORK AND DISCUSSION

Pan (1984) combined heuristic diagnostics with an explicit causal model to study dependent faults. A formal basis for causal representations was first provided in Console et al. (1989), where a qualitative causal network was used to define dependencies. A MAY link in the network represents the existence of an abstracted description that omits explicit specification of a potentially disambiguating factor. Dynamic aspects are abstracted and the networks are assumed to be singly connected. Tatar (1995) models the system as a deterministic finite state machine that executes a sequence of states and is required to achieve a stable state after a limited number of iterations, with diagnosis executed in an ATMS-based engine. More recently, Weber and Wotawa (2008) used so-called Cascading Failure Graphs to identify causal links between failures. As with MAY links, no strength or probability is associated with transitions, and diagnostic rankings are conducted in terms of subset minimality under a number of boundary assumptions, such as the existence of a single root cause and the acyclicity of the graph.

Models and heuristics that reflect possible interactions between components have already been exploited to adapt modeling assumptions and to focus the overall diagnostic process (Böttcher, 1995). While we employ the CWA to ensure effective discrimination, heuristics-driven adjustment of the “active” hypotheses is a possible complementary technique.
Qualitative models that capture physical properties of electrical and electronic circuits have also been proposed to broaden the spectrum of faults that can be isolated using traditional diagnostic inference mechanisms (de Kleer, 2007). Since our algorithm employs a generic notion of dependency, it could be generalized to accommodate different dependency models. We are currently investigating extensions to our framework to incorporate some of the dependency models presented by de Kleer (2007).
The topic has also been tackled from a “Reasoning about Actions” viewpoint, mostly with a strong emphasis on repair and the assumption that repeated probing and evaluation is possible, starting with the model for repair actions with time dependencies in Friedrich et al. (1991). Sun and Weld (1993) used a diagnosis engine based on Raiman’s incomplete alibi principle to generate diagnostic candidates and then created plans for executing repair actions including further probes. Thiébaux et al. (1996) adapted a stochastic planner to a domain where the system state cannot be identified unambiguously due to lack or failure of sensors, and where actuators are likewise unreliable, so that not even a precise repair goal can be specified in advance. The cardinality of examined faults is successively incremented if repairs for single (and later double, etc.) faults do not lead to the expected restoration of functionality. The search space had to be restricted via domain-specific heuristics. Baral et al. (2000) provide a theoretical definition of diagnosis in terms of satisfying a repair goal over a space of action descriptions defined in the situation calculus. Action outcomes can be nondeterministic (unreliable), although no probability estimates are involved.
Finally, a major body of work examines diagnosis and repair as part of model-based reactive control systems where the system model is encoded as an Optimal Constraint Satisfaction Problem (OCSP), e.g., Williams and Ragno (2003), where Conflict-Directed A* is used for mode estimation and reconfiguration, or Mikaelian et al. (2005), where system behavior is specified in terms of probabilistic hierarchical constraint automata (PHCAs) that are solved using a decomposition algorithm. A similar representation is used in the diagnosis of hybrid systems with Temporal Causal Graphs (TCGs), which explicitly capture mode changes and time dependencies (Narasimhan and Biswas, 2006). TCGs also capture qualitative fault signatures that are compared against the continuous process parameters.
6. CONCLUSION
Our framework extends previous work in multiple aspects. Considering the modeling aspect, our work does not require an explicit global model of fault and failure dependencies; instead, dependencies are derived from component models. As a side effect, we also avoid the problem of unexpected results due to mismatches between the causal model and the behavior predicted by SD. Our search procedure is able to deal with spontaneous and dependent faults simultaneously, without requiring that all dependent faults originate in a single component. We apply heuristics to discriminate between logically equally plausible explanations, and we rank explanations such that those that best fit an observed scenario are computed first. Assuming all conditional mode transitions share the same values, our framework computes results that match those of Weber and Wotawa (2008) if appropriate fault models are available.
Since we recast the diagnosis problem essentially as a planning problem (the search for mode trajectories corresponds to the search for operators in the planning domain), the worst-case complexity is elevated to PSPACE. However, computational complexity does not seem to be a significant issue in diagnosing dependent failures, since even in the presence of dependent failures, the preferred paths are typically rather short (1–3 mode transitions in our example), and conflicts between mode transitions are rare. Our unified treatment of both mode assignments and mode transitions also helps to detect inconsistencies and apply constraint propagation early. While our approach shares some attributes with incremental planning graph construction, using the same forward expansion process is often not possible, since the initial state may not be known in full.
A potential limitation of our search strategy is that some constituent explanations narrowly outrank the larger explanation that includes an additional mode assumption. This is an artifact of our use of conditional probabilities < 1.0. To resolve the issue, one can simply continue enumerating the most-preferred explanations until either CDA* switches to a different candidate that no longer includes the previous best one, or the probability estimate attached to the candidate drops below a given threshold. In this way, the most preferred kernel of a dependent diagnosis along with all its dependent faults is enumerated, provided that the conditional probability of the dependent fault is high.
In this paper we have presented a diagnosis framework that enables the diagnosis of dependent faults by specification of component-specific behavior (mode transitions), aiming at situations where complex probing and repair strategies are not applicable. The model shares basic assumptions commonly made by comparable earlier approaches, but lifts some of them. Unlike Weber and Wotawa (2008), Tatar (1995), Mikaelian et al. (2005), and Narasimhan and Biswas (2006), we do not operate on an explicit global dependency graph or model, and make no single-origin assumption. Unlike Baral et al. (2000) we incorporate probabilistic information, and unlike Thiébaux et al.
(1996), we do not use domain-specific heuristics or incremental diagnosis and probing. Among the restrictions we share with earlier work focusing on dependent faults (Tatar, 1995; Weber and Wotawa, 2008) are those concerning the explicit modeling of time and the handling of repair actions. Lifting these is the main topic for future work. We are currently working on incorporating selected complementary dependency models into our implementation and on extending our empirical investigation to larger and more diverse types of systems.
REFERENCES
O. Akerlund et al. ISAAC, a framework for integrated safety analysis of functional, geometrical, and human aspects. In Third European Congress ERTS Embedded Real Time Software, January 2006.
Chitta Baral, Sheila McIlraith, and Tran Cao Son. Formulating diagnostic problem solving using an action language with narratives and sensing. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning, pages 311–322, 2000.
Claudia Böttcher. No faults in structure? How to diagnose hidden interaction. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 1728–1735, Montreal, August 1995.
Luca Console, Daniele Theseider Dupré, and Pietro Torasso. A theory of diagnosis for incomplete causal models. In Proceedings of the 11th International Joint Conference on Artificial Intelligence, pages 1311–1317, Detroit, August 1989. Morgan Kaufmann Publishers, Inc.
Johan de Kleer. Modeling when connections are the problem. In Manuela M. Veloso, editor, Proceedings of the 20th International Joint Conference on Artificial Intelligence, pages 310–317, Hyderabad, India, January 2007.
Johan de Kleer. An improved approach for generating max-fault min-cardinality diagnoses. In Proceedings of the 19th International Workshop on Principles of Diagnosis, Blue Mountains, NSW, Australia, September 2008.
Johan de Kleer and B. C. Williams. Diagnosing multiple faults. Artificial Intelligence, 32(1):97–130, 1987.
Gerhard Friedrich, Georg Gottlob, and Wolfgang Nejdl. Formalizing the repair process. In Proceedings of the Second International Workshop on Principles of Diagnosis, Milano, September 1991.
Tsoline Mikaelian, Brian C. Williams, and Martin Sachenbacher. Model-based monitoring and diagnosis of systems with software-extended behavior. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 327–333, Pittsburgh, 2005.
Sriram Narasimhan and Gautam Biswas. Model-based diagnosis of hybrid systems. IEEE Transactions on Systems, Man and Cybernetics, 37(3):348–361, 2006.
Jeff Pan. Qualitative reasoning with deep-level mechanism models for diagnoses of mechanism failures. In Proceedings of the IEEE Conference on Artificial Intelligence Applications (CAIA), pages 295–301, Denver, 1984.
Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987.
Ying Sun and Daniel S. Weld. A framework for model-based repair. In Proceedings of the National Conference on Artificial Intelligence (AAAI), pages 182–187, 1993.
Mugur Tatar. Diagnosis with cascading defects. In Proceedings of the Sixth International Workshop on Principles of Diagnosis, Goslar, October 1995.
Sylvie Thiébaux, Marie-Odile Cordier, Olivier Jehl, and Jean-Paul Krivine. Supply restoration in power distribution systems: A case study in integrating model-based diagnosis and repair planning. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence, pages 525–532, Portland, OR, 1996.
Jörg Weber and Franz Wotawa. Diagnosing dependent failures in the context of consistency-based diagnosis. In Proceedings of the 19th International Workshop on Principles of Diagnosis, Blue Mountains, Sydney, Australia, September 2008.
Brian C. Williams and Robert J. Ragno. Conflict-directed A* and its role in model-based embedded systems. Discrete Applied Mathematics, 2003.