Modeling Context-Dependent Faults
for Diagnosis 1
Wolfgang Mayer ∗ Markus Stumptner ∗∗
∗ University of South Australia, Adelaide, SA 5095, Australia (e-mail: [email protected])
∗∗ University of South Australia, Adelaide, SA 5095, Australia (e-mail: [email protected])
Abstract: Most model-based diagnosis frameworks rely on incremental probing and the
assumption that faults occur independently to infer the most likely explanation for a symptom.
For systems where additional sensors are unavailable or a repair action must be issued at once,
these assumptions are often inadequate and dependent faults must be considered explicitly.
We introduce explicit models of context-dependent component fault behavior and show that our
compositional models are well-suited for the one-shot fault diagnosis of pseudo-static systems.
We develop extensions to the well-known Conflict-Directed A* algorithm to infer the most-likely
system state given a fixed set of observations and show that our approach complements earlier
dependency models.
1. INTRODUCTION
Powerful model-based reasoning techniques to infer possible faults in a system based on manifestations of observable
symptoms have been developed. Given the inherent complexity of this problem and the fact that most practical
problems do not require explanations to be produced,
most modeling techniques and inference algorithms apply
the principle of parsimony to sacrifice completeness for
efficiency. For example, the assumptions that (i) the model
reflects all possible behaviors and component interactions
relevant to a diagnosis task, (ii) all observed symptoms
can be attributed to a small set of component failures, and
(iii) these failures occur independently are commonly
found in the literature. In scenarios where only a few “most
likely” explanations are required, (ii) and (iii) together
with probabilistic models of single components failing are
often used to guide the search for candidate explanations
or guide incremental diagnosis and measurement (de Kleer
and Williams, 1987). However, these assumptions are not
always adequate. While recent algorithmic improvements
have led to systems that can efficiently test and diagnose
faults with large cardinality (for example MFMC faults are
discussed in de Kleer (2008)), many algorithms employ
strong independence assumptions and may not discriminate well between different fault candidates if dependent
faults are present. Our work aims to complement those
systems with fault inter-dependencies.
1 This work was supported by the Australian Research Council under grant DP0881854.
We aim to compute likely explanations of symptoms by considering component faults that are caused
by cascading effects of failures or mis-configurations elsewhere in the system. We use the term “fault” in the sense
that each “faulty” component must be repaired, replaced,
or reconfigured in order to restore the intended system
function. That does not imply that the component is or
was necessarily faulted, just that it is in a mode that
cannot support the desired system function. For example, a fuse broken by overloading the system is
not faulty but must nevertheless be replaced to restore
system function. We extend consistency-based diagnosis of
static systems with context-specific component behavior to
capture component behaviors in the presence of cascading
failures. Our contributions are as follows:
• We present a diagnosis framework that incorporates
causality between components by means of capturing
context-specific component behavior triggered by external influences. As a result, our work unifies both
static diagnosis and diagnosis of cascading failures
under a common formalism.
• Our notion of causality dispenses with the single-origin hypothesis widely applied in earlier work on
dependent failures, for example Weber and Wotawa
(2008) or Tatar (1995), by allowing failure contexts
to depend on multiple interacting components. We
also drop the assumption that failures are caused by
component faults by allowing that failures may also
be activated by unusual combinations in expected,
correct behavior.
• Our fault models are compositional, based on properties of individual components. No explicit global
model of failure propagation or causality of faults that
spans different components is required. Rather than
relying on explicit specifications of how failure modes
of one component may affect other components, we
infer dependent failures from the components’ input and output values propagated in the model.
• We extend well-known heuristic search methods to
compute most likely explanations using consistency-based diagnosis techniques enriched with information
about possible mode changes.
This paper is organized as follows: In Section 2, we use
an example to informally introduce different aspects of
causality in failure diagnosis. A formal characterization
of context-dependent behavior is given in Section 3. Our
extensions to heuristic search procedures to compute diagnoses are presented in Section 4. We discuss the properties
of our framework and selected related work in Section 5.
[Circuit schematic: voltage source V (10V), fuse F, resistor R, switches S1 and S2, motors M1 and M2; currents i0, i1, i2; voltage u]
Fig. 1. Electronic Circuit Schematic Diagram
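\begin{align*}
\mathit{conn}_0 &\Leftrightarrow \mathit{ok}(F) \land \mathit{ok}(R) \land (\mathit{conn}_1 \lor \mathit{conn}_2) \\
\mathit{conn}_1 &\Leftrightarrow \mathit{closed}(S_1) \land (\mathit{ok}(M_1) \lor \mathit{shorted}(M_1)) \\
\mathit{conn}_2 &\Leftrightarrow \mathit{closed}(S_2) \land (\mathit{ok}(M_2) \lor \mathit{shorted}(M_2)) \\
i_0 = i_1 + i_2 &\land r(R)\, i_0 + u = 10 \\
\lnot \mathit{conn}_k &\Rightarrow i_k = 0 \quad (k \in \{0, 1, 2\}) \\
\mathit{conn}_k &\Rightarrow u = r(M_k)\, i_k \quad (k \in \{1, 2\}) \\
\mathit{ok}(M_k) \land i_k > 0 &\Rightarrow \mathit{on}(M_k) \land r(M_k) = 6 \quad (k \in \{1, 2\}) \\
\mathit{shorted}(M_k) &\Rightarrow r(M_k) = 0 \quad (k \in \{1, 2\}) \\
\mathit{ok}(R) &\Rightarrow r(R) = 5
\end{align*}
Fig. 2. Formal Model of the Circuit in Figure 1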
mode    condition       P(F=ok|·)  P(F=broken|·)
ok      0A = i0         1.00       0.00
ok      0A < i0 ≤ 1A    0.98       0.02
ok      1A < i0         0.01       0.99
broken  true            0.00       1.00
Fig. 3. Conditional Mode Transitions for F
2. EXPLOITING CAUSALITY FOR DIAGNOSIS
While successful for many systems, the assumption that
failures occur independently may not hold for tightly-coupled systems, where failures in one component can
easily propagate to damage other parts of the system. For
example, the water pouring from a broken floodgate in a
dam may wreak havoc further downstream, a shorted component in an electric circuit may damage other connected
components, or damage to a jet engine rotor may spread
to other engine stages (Akerlund et al., 2006). Therefore,
repair choices based on fault identification and isolation
using the independence assumption may fail to restore
the full system purpose, and possible remaining faults
may adversely affect the replaced parts. While incremental
diagnosis and active probing may help to uncover hidden
faults and refine diagnoses, limited observability and time
or cost constraints may prohibit isolating the true origin
of faults. In particular, autonomous or embedded systems
operating under tight constraints are a prime example
where a “good-enough” diagnosis must be submitted to a
higher-level planning module based on the limited sensor
information. We therefore incorporate causal relationships between component behaviors into the diagnosis
process to extend the scope of explanations to components
that are likely to be affected.
Consider the electronic circuit depicted in Figure 1. Two
motors, M1 and M2 , are connected via switches S1 and S2
to a common circuit that contains a resistive element, R,
a fuse, F, and a voltage source, V. A (much simplified)
model of the circuit is given in Figure 2: if assumed in the
ok mode, F and R conduct electricity based on simplified
models of physics; otherwise, they are perfect insulators.
Similarly, motors can be operating normally (mode ok ),
be short-circuited (shorted ), or broken. A motor can be
observed as on iff electrical current is available and it
is operating normally. Electrical current flows only if the
circuit forms a closed loop (conn0 ). Switches are either
open or closed.
This model is essentially a static representation of the possible system states, where the component mode assumptions (and observations) directly determine the system
state. Some cascading failures can be represented by additional axioms (for example, shorted(M1 ) ⇒ broken(F )),
but in general it is necessary to consider a sequence of
mode transitions to adequately capture the intermediate
state(s) induced by the system behavior. For example,
assume that M1 can transit from the ok mode into shorted
only if i0 > 0. Since the latter implies broken(F ) ∧ i0 = 0,
the necessary transient state where ok(F ) ∧ shorted(M1 ) ∧
i0 > 0 cannot be represented in our single-state model
and therefore cannot be explained.
Adhering to the compositional modeling paradigm, we
augment the static system description with possible mode
transitions defined locally for each component. For each
component, we represent possible transitions between the
different behavioral modes based on conditional probabilities enriched with additional constraints that guard the
activation of a transition. Given the current behavioral
mode and the component properties implied by the system
model, the conditional probability states how probable a
mode change is between two system states. For example,
Figure 3 gives the context-specific behavior of the fuse F
in Figure 1. Transition models for the other components
are defined similarly. In this model, an explanation of an
observed symptom is a trajectory of (possibly concurrent)
mode changes between a sequence of states, rather than a
simple static assignment of modes to components.
In our example, the most likely explanation for “both
motors being ¬on while both switches are closed” is that
F is broken with P (F = broken) ≈ 0.99. This is because if
both motors are connected and working properly, the flow
of electric current through F exceeds the 1A threshold,
and hence the much higher probability for failure applies.
Here, the context-specific behavior is encapsulated in F ,
and causal effects are derived from the system model; no
explicit representation of causal dependencies is necessary.
Different from classical consistency-based diagnosis, where
F and R are both considered possible explanations with
low probability, here F exceeds R (and the remaining
double-failure hypotheses) in likelihood by a factor of 20.
The discrepancy between the case where F fails spontaneously and where F fails causally illustrates that a
diagnostic framework based solely on prior probabilities
may fail to correctly discriminate between explanations.
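The threshold argument above can be checked numerically. The following minimal sketch solves the equations of Figure 2 for i0, using the resistance values given there (r(R) = 5, r(Mk) = 6 for a running motor, 0 for a shorted one), and looks up the matching row of Figure 3. The function names `total_current` and `fuse_break_probability` are illustrative only and are not part of the original model.

```python
def total_current(motor_resistances, r_R=5.0, voltage=10.0):
    """Return i0 for the Figure 2 circuit, assuming F and R are in the ok mode.

    motor_resistances lists r(Mk) for every connected branch (switch closed,
    motor ok or shorted). An empty list means no branch is connected.
    """
    if not motor_resistances:
        return 0.0                       # no conn_k holds, hence i0 = 0
    if any(r == 0.0 for r in motor_resistances):
        return voltage / r_R             # a shorted branch forces u = 0
    # Parallel branches: u = r(Mk) * ik for each branch and i0 = sum of ik.
    conductance = sum(1.0 / r for r in motor_resistances)
    r_parallel = 1.0 / conductance
    return voltage / (r_R + r_parallel)  # from r(R) * i0 + u = 10


def fuse_break_probability(i0):
    """P(F = broken | F was ok, i0), read off Figure 3."""
    if i0 == 0.0:
        return 0.00
    if i0 <= 1.0:
        return 0.02
    return 0.99


for label, branches in [("one ok motor", [6.0]),
                        ("two ok motors", [6.0, 6.0]),
                        ("M1 shorted", [0.0, 6.0])]:
    i0 = total_current(branches)
    print(f"{label}: i0 = {i0:.3f}A, P(broken) = {fuse_break_probability(i0)}")
```

With both switches closed and both motors running, the two parallel motor branches have a combined resistance of 3, so i0 = 10/(5+3) = 1.25A, which exceeds the 1A threshold. With a single motor connected, i0 = 10/11 ≈ 0.91A stays below it.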
3. MODELING CONDITIONAL BEHAVIOR
In this and the following sections we develop a formal model of our diagnosis framework. We build on
the consistency-based diagnosis framework as defined by,
e.g., Reiter (1987). We extend the classical static system
description with conditional mode transitions for each
component, and a characterization of the initial states
from which all diagnoses originate:
Definition 1. (Diagnosis System). A Diagnosis System is
a tuple ⟨SD, COMP, MT, OBS, I⟩, where SD contains
the structural composition of the system and the behavioral models of the components comprising the system.
The set COMP contains the atoms representing the components in SD, and MT defines the conditional mode
transitions for each component. OBS contains a set of
literals representing the observations, and I is a characterization of the initial state(s) of the system. Each component
C ∈ COMP is associated with a set of behavioral modes,
Modes(C), and a set of atoms representing properties and
ports of the component, Locals(C). A partial (complete)
assignment of component modes to component variables
in SD is called a partial (complete) mode assignment.
Similar to OBS, I represents the set of mode assignments that characterize those system states where all
relevant components function as intended. This could be
the single assignment where all components are assigned the
ok mode, or I could admit multiple assignments to allow for unknown or “don’t care” system
variables, such as switch positions. In the following, we
will implicitly assume that a diagnosis system
⟨SD, COMP, MT, OBS, I⟩ is given if understood from the context.
We also apply the definition of consistency of a mode
assignment with respect to a set X:
Definition 2. (Consistent Mode Assignment). A mode assignment A is consistent with X iff SD ∪ A ∪ X ⊭ ⊥.
To model the dynamic aspects of context-specific behavior,
we assume that the mode transition behavior of each individual component can be expressed as a finite transition
system, with nodes representing the behavioral modes and
transitions representing possible mode changes.
Definition 3. (Conditional Mode Transition System).
The Conditional Mode Transition System T(C) for a
component C ∈ COMP is a transition system ⟨M, T⟩,
where M = Modes(C) denotes the set of vertices and
T ⊆ M × M × L(Locals(C)) × ℝ is a set of labeled
transitions between component modes.
Each transition ⟨s, t, g, p⟩, s, t ∈ M, is labeled with a guard
condition g over the component’s properties, Locals(C),
and a probability estimate p ∈ [0, 1] that defines how likely
the mode change is to occur given that the transition is
indeed applicable. We assume that for every m ∈ M a transition ⟨m, m, g, p⟩
with g satisfiable in SD and p > 0 always exists. These
transitions capture the case where a component does not
change mode.
For example, component F representing a fuse may change
from the ok mode to broken under different conditions.
Figure 3 summarizes the two transitions: if the current
i0 through F is positive but does not exceed 1A, F changes to the broken mode
with probability 0.02 and remains in ok mode otherwise. If
i0 exceeds 1A, the transition to broken is much more likely
(0.99). The conditional probabilities and guard conditions
given in Figure 3 define a conditional mode transition
system for F .
We allow non-deterministic mode transition systems, but
for every pair of mode transitions that connect the same
modes the guard conditions must be mutually unsatisfiable. This ensures that there is at most one applicable
transition between each pair of modes for a component.
This limitation could be lifted but would enlarge the
search space considerably. Instead, we advocate modeling
unconditional, spontaneous transitions as guarded transitions, where the guard condition is obtained from the
complement of the conditional transitions.
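A conditional mode transition system can be held in a small table of guarded transitions. The sketch below encodes Figure 3 for the fuse and checks the mutual-exclusion requirement on guards. It assumes, purely for illustration, that every guard is an interval over the single local property i0; the names `Interval`, `Transition`, `guards_mutually_exclusive`, and `applicable` are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Interval:
    """Guard over i0 with optionally closed end points (an assumption)."""
    lo: float
    hi: float
    lo_closed: bool = False
    hi_closed: bool = False

    def contains(self, x):
        above = x > self.lo or (self.lo_closed and x == self.lo)
        below = x < self.hi or (self.hi_closed and x == self.hi)
        return above and below

    def overlaps(self, other):
        a, b = self, other
        # Tighter lower bound: the larger one; on a tie, the open one.
        if a.lo > b.lo or (a.lo == b.lo and not a.lo_closed):
            lo, lo_closed = a.lo, a.lo_closed
        else:
            lo, lo_closed = b.lo, b.lo_closed
        # Tighter upper bound: the smaller one; on a tie, the open one.
        if a.hi < b.hi or (a.hi == b.hi and not a.hi_closed):
            hi, hi_closed = a.hi, a.hi_closed
        else:
            hi, hi_closed = b.hi, b.hi_closed
        return lo < hi or (lo == hi and lo_closed and hi_closed)


@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    guard: Interval
    p: float


ZERO = Interval(0.0, 0.0, True, True)   # 0A = i0
LOW = Interval(0.0, 1.0, False, True)   # 0A < i0 <= 1A
HIGH = Interval(1.0, math.inf)          # 1A < i0
ANY = Interval(-math.inf, math.inf)     # guard "true"

# The rows of Figure 3, one transition per (row, target mode) pair.
FUSE = [
    Transition("ok", "ok", ZERO, 1.00), Transition("ok", "broken", ZERO, 0.00),
    Transition("ok", "ok", LOW, 0.98), Transition("ok", "broken", LOW, 0.02),
    Transition("ok", "ok", HIGH, 0.01), Transition("ok", "broken", HIGH, 0.99),
    Transition("broken", "ok", ANY, 0.00), Transition("broken", "broken", ANY, 1.00),
]


def guards_mutually_exclusive(transitions):
    """True if no two transitions between the same modes have overlapping guards."""
    return not any(s.source == t.source and s.target == t.target
                   and s.guard.overlaps(t.guard)
                   for s, t in combinations(transitions, 2))


def applicable(transitions, mode, i0):
    """Transitions leaving `mode` whose guard holds for the given i0."""
    return [t for t in transitions if t.source == mode and t.guard.contains(i0)]
```

For instance, `applicable(FUSE, "ok", 1.25)` selects exactly one transition per target mode, which is the property the mutual-exclusion condition is meant to guarantee.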
The set MT contains a conditional mode transition system
for each component in COMP. Similar to mode assignments, mode transitions for individual components can be
extended to the full component set:
Definition 4. (Joint Mode Transition).
Let ti = ⟨Xi, Yi, Gi, Pi⟩ ∈ T(Ci), Ci ∈ COMP. A
joint mode transition is a set T ⊆ {t1, …, tn}, n ∈
[1, |COMP|], such that all Gi are jointly satisfiable given
SD. T is source consistent w.r.t. a mode assignment A iff
∀i : (Ci = Xi) ∈ A and SD ∪ A ∪ ⋃i {Gi} is consistent;
T is target consistent with A iff ∀i : (Ci = Yi) ∈ A and
A is consistent. T is complete if n = |COMP| and partial
otherwise.
For the example, a partial joint mode transition J that
assumes concurrent mode transitions from the ok mode
to broken for both M1 and F failing spontaneously is
represented as
\[
J = \left\{
\begin{array}{l}
\langle \mathit{ok}_F,\ \mathit{broken}_F,\ 0 < i_0 \le 1,\ 0.02 \rangle, \\
\langle \mathit{ok}_{M_1},\ \mathit{broken}_{M_1},\ 0 < i_0 \le 1.3,\ 0.05 \rangle
\end{array}
\right\}.
\]
Here we assume that the thresholds above which the
components fail in response to an abnormal flow of current
are set at 1A and 1.3A for F and M1, respectively.
The joint transition is partial, since transitions for some
components are absent from J.
Joint Mode Transitions constrain the possible joint evolution of the individual components comprising the system:
Definition 5. (Mode Trajectory). A mode trajectory T =
⟨A1, J1, A2, …, Ak, Jk, Ak+1⟩, k ≥ 1, is an alternating
sequence of mode assignments Ai and joint mode transitions Ji such that, for all i, Ji is source consistent with
Ai and is target consistent with Ai+1. T is consistent
iff ∀i ∈ {1, …, k + 1} : Ai is consistent. T is complete if
J1, …, Jk are complete and partial otherwise.
A mode trajectory characterizes a particular evolution of
sets of concurrent mode transitions, constrained by the
transitions permitted by MT. While the transition systems
representing the dynamic behavior of components are
modeled locally, the behaviors of different components are
linked through the common context established by the
system model SD. Components can enable or prohibit
context-specific behavior in others by changing the shared
system state.
For example, a two-step mode trajectory T that originates
in a mode assignment where all but the two switch
components are known to be in the ok mode, first assumes
that M1 transits into shorted mode, followed by F entering
broken, such that T terminates with the observation that
both motors are ¬on, is given as follows:
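\[
T = \left\langle
\begin{array}{l}
\{F = \mathit{ok},\ R = \mathit{ok},\ M_1 = \mathit{ok},\ M_2 = \mathit{ok}\}, \\
\langle \mathit{ok}_{M_1},\ \mathit{shorted}_{M_1},\ 0 < i_0 \le 1.3,\ 0.05 \rangle, \\
\{F = \mathit{ok},\ R = \mathit{ok},\ M_1 = \mathit{shorted},\ M_2 = \mathit{ok}\}, \\
\langle \mathit{ok}_F,\ \mathit{broken}_F,\ 1 < i_0,\ 0.99 \rangle, \\
\{F = \mathit{broken},\ R = \mathit{ok},\ M_1 = \mathit{shorted},\ M_2 = \mathit{ok}\}
\end{array}
\right\rangle
\]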
We can now restate the diagnosis problem as that of
finding a complete, consistent mode trajectory originating
in a state in I and leading to a state consistent with OBS:
Definition 6. (Context-Consistent Mode Trajectory). A
mode trajectory T = ⟨A1, …, Ak⟩, k ≥ 2, is context
consistent for OBS and I iff T is consistent, A1 is
consistent with I, and Ak is consistent with OBS.
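The mode-level bookkeeping behind Definitions 5 and 6 can be sketched as follows. The sketch only checks that each joint transition matches the modes of its neighbouring assignments; guard satisfaction and consistency with SD would need a constraint solver and are deliberately left out. The predicates `all_ok` and `no_motor_on` are hypothetical stand-ins for I and OBS in the running example (both switches assumed closed), and all function names are illustrative.

```python
# A mode assignment is a dict component -> mode. A joint mode transition is a
# list of (component, source_mode, target_mode, probability) tuples.
# Assumption: guards are checked elsewhere and are therefore omitted here.

def links(before, joint, after):
    """Mode-level source consistency with `before` and target consistency with `after`."""
    return all(before.get(c) == src and after.get(c) == dst
               for c, src, dst, _p in joint)


def is_mode_trajectory(traj):
    """traj alternates assignments and joint transitions: A1, J1, A2, ..., Jk, Ak+1."""
    if len(traj) < 3 or len(traj) % 2 == 0:
        return False
    return all(links(traj[i], traj[i + 1], traj[i + 2])
               for i in range(0, len(traj) - 2, 2))


def is_context_consistent(traj, in_initial, explains_obs):
    """First assignment must satisfy I, last assignment must satisfy OBS."""
    return (is_mode_trajectory(traj)
            and in_initial(traj[0]) and explains_obs(traj[-1]))


def all_ok(a):
    return all(mode == "ok" for mode in a.values())


def no_motor_on(a):
    """Neither motor is on. Following Figure 2, this holds if the loop is open,
    a motor is shorted (so u = 0), or no motor is in the ok mode."""
    loop_open = a["F"] != "ok" or a["R"] != "ok"
    motors = (a["M1"], a["M2"])
    return loop_open or "shorted" in motors or "ok" not in motors


A1 = {"F": "ok", "R": "ok", "M1": "ok", "M2": "ok"}
A2 = {**A1, "M1": "shorted"}
A3 = {**A2, "F": "broken"}
T = [A1, [("M1", "ok", "shorted", 0.05)], A2, [("F", "ok", "broken", 0.99)], A3]

print(is_context_consistent(T, all_ok, no_motor_on))
```

The check does not verify that components without a transition keep their mode; a fuller implementation would add that, together with the guard and SD consistency tests.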
Here, context consistency is a necessary requirement for a
mode trajectory to contain a relevant explanation:
Observation 1. A mode trajectory T is an explanation for
a diagnosis system only if it is context-consistent.
First, consistency of T is a necessary requirement to ensure
that the mode changes implied by two consecutive joint
mode transitions are actually feasible given the system
description. Second, if T were inconsistent with OBS, some
observed symptom would not be resolved and the mode
changes would not qualify as a full explanation. Third, if
inconsistent with I, the explanation would be known to
not originate in a past “normal” system behavior and the
trajectory would not reflect the actual system behavior.
Observation 2. Every complete context-consistent mode
trajectory T contains at least one (possibly trivial) sequence of mode transitions that evidence a possible system
evolution.
By construction, T links a state conforming to I to one
consistent with OBS. Since T is complete, both the initial
mode assignment A1 and the final mode assignment Ak
of T are consistent with the constraints imposed by the
system description, and a concrete mode trajectory from
A1 to Ak exists.
Based on these observations, we define the solution of a
Diagnosis System as follows:
Definition 7. (Diagnosis). A Mode Trajectory T is a Diagnosis for a Diagnosis System D iff T is complete and
context consistent with D.
Note that transition probabilities are used only to guide
the computation of preferred explanations (see Section 4.2)
and are not intrinsic to the characterization of diagnoses.
Modeling system evolution in this way requires us to
impose particular assumptions on the system to guarantee
sound explanations; many of our assumptions are shared
by other works, for example Tatar (1995):
• Faults are assumed to be discrete and persistent. While similar trajectory-based solutions have been proposed to
handle intermittent (Baral et al., 2000) and transient
faults (Narasimhan and Biswas, 2006), this is left for
future extension.
• The system is pseudo-static; that is, the system
evolves much faster than the diagnosis and measurement process, and stabilizes after a finite number of
steps, prior to the diagnosis process.
• The diagnosis problem can be solved without reference to an explicit model of time (other than the order
implied by mode trajectories).
• We employ the Closed World Assumption in that
modes and transitions not captured in the system
model are assumed to not exist. For each component,
rich behavioral models are available to capture (some
of) the normal as well as the abnormal behavior. Our
work is thus situated in the middle ground between
abductive and consistency-based diagnosis.
• The joint model behavior is affected only by variables
in I, OBS, and A. No other hidden influences may
exist. Since consistency-based diagnosis exploits inconsistencies between only those variables, this is not
usually a severe restriction.
• In contrast to Tatar (1995), we do not require that the
mode transition systems are deterministic.
4. COMPUTING EXPLANATIONS
Mode trajectories essentially capture the full dynamic
behavior of the system model, and may contain unrelated
mode changes, thus not qualifying as a parsimonious
explanation. We address this problem by focusing the search on
those context-consistent mode trajectories that are “most
relevant” to a diagnosis problem. We apply both logical
and probabilistic measures to arrive at suitable solutions.
4.1 Using Conflicts for Pruning
We observe that a context-consistent trajectory can be
computed incrementally, starting from a trivial (partial)
trajectory T = ⟨I, J, OBS⟩. By representing unassigned
mode transitions and partial mode assignments in T as
sets of variables, and mode transitions along with their
guard expressions as constraints, a constraint satisfaction
problem is obtained that can be solved efficiently using a
modification of standard search procedures as described
below.
If as a result, a complete and context-consistent trajectory
is obtained, its mode transitions are a diagnosis (Observation 2). Otherwise, Observation 1 implies that once a
candidate trajectory T has become context-inconsistent, it
cannot be completed into a diagnosis and must be altered.
Here, conflicts state which combinations of transitions
imply a discrepancy (either between mutually unsatisfiable
preconditions, or via assigned mode variables). Hence, at
least one of these transitions must be replaced by an alternative mode transition, or T must be extended with an
additional joint mode transition to resolve the discrepancy.
To ensure termination of the algorithm, an attempt to
complete a candidate trajectory into a complete one is
abandoned if it contains the same mode assignment more
than once. This does not sacrifice completeness:
Observation 3. If two (partial) mode assignments A1 and
A2 appear in a context-consistent trajectory, where A1 and
A2 assign the same modes to the same set of components
(and equally leave the remaining components unassigned),
the trajectory is redundant.
If the duplicate joint transition is complete, the same
trajectory without the section between the duplicate assignments is also a solution. Otherwise, since the number
of components and mode transitions is finite, a different
completion strategy can be found that ensures one of the
two transitions is complete. Then, either the previous case
applies or the redundancy no longer appears.
Conflict-Directed A* (CDA*) is an efficient algorithm for
solving constraint-optimization problems that combines
conflicts to prune infeasible regions in the search space and
heuristics to focus on promising values first (Williams and
Ragno, 2003). Like A*, it relies on admissible estimates for
guidance. CDA* is designed to solve finite-domain CSPs
where an optimal satisfying assignment is sought based on
weights attached to variables.
CDA* proceeds by assigning CSP variables in best-first
order, guided by its heuristics, until either a complete
consistent assignment or a conflict has been found. In
the former case, the algorithm outputs the assignment as
the solution and proceeds to enumerate other solutions.
Otherwise, a conflict is obtained to guide the variable
selection heuristics to satisfiable regions in the search
space.
To adapt the algorithm to the dynamic scope of our
problem, it must be extended to (i) dynamically expand
or shrink the CSP whenever a partial trajectory
is expanded by one level, to (ii) generalize conflicts such
that they can be reused across different sub-problems,
and to (iii) avoid visiting equivalent states repeatedly
by pruning based on visited mode assignments as well
as on mode trajectories. Also, for efficiency, we optimize
by delaying assignments to constraint variables for “no-change” transitions and apply simple pre-filtering based
on partial assignments and known transition effects and
prerequisite conditions to reduce variable domains during
search.
Function ExpandNode is assumed to be called from the
top-level A* loop, where the node associated with the
best cost estimate is retrieved, expanded, and newly added
nodes are queued (excluding those that have already been
visited). Williams and Ragno (2003) provide a comprehensive discussion of different variants of the basic algorithm.
Our modifications to the algorithm are as follows: we
associate mode trajectories with unique scope elements to
be able to keep track of the set of relevant variables and
conflicts for each trajectory.
(i) Whenever a CSP variable representing a mode transition is assigned (if ResolvesAllConflicts?(node) is
satisfied), each constraint variable that corresponds to a
transition in a mode trajectory is marked with a reference
to the assigned transition’s guard condition. If a node in
the subtree is expanded, all constraints along the path
to the root must be activated in the constraint solver
prior to checking consistency. This ensures that constraints
introduced by transition assignments are checked only in
the sub-CSP for that mode trajectory.
(ii) Each CSP variable is associated with a scope element
that keeps track of the set of CSP variables relevant to
a trajectory. This is used in Leaf? to test whether all
relevant variables have been assigned. Scope elements are
organized in a hierarchical tree structure for efficiency.
By sharing overlapping variable scopes, conflicts can be
exploited across mode trajectories.
ResolvesAllConflicts? checks whether the path from
the CDA* root to node resolves all known conflicts in the
node’s scope. Otherwise, in ExpandConflict, we check
if a conflict involves a variable that belongs to the last
mode assignment in a trajectory. If yes, we expand the
trajectory by one level by creating a copy of the current
variable scope and by adding CSP variables that represent
the transitions in the newly created joint transition to it.
We also add constraints that ensure the newly introduced
joint transition cannot result in a duplicate transition (see
Observation 3).
function CDA*(DiagProb) : ModeTrajectory
    visited ← ∅
    queue ← {EmptyTrajectory(DiagProb)}
    while ¬Empty(queue) do
        t ← RemoveBestCandidate(queue)
        if IsConsistent(t) ∧ IsComplete(t) then
            return Trajectory(t)
        end if
        visited ← visited ∪ {t}
        new ← ExpandNode(t)
        queue ← (queue ∪ new) \ visited
    end while
    return ⊥
end function
function ExpandNode(node) : Set(Node)
    if ResolvesAllConflicts?(node) then
        if Leaf?(node, VarScope(node)) then
            return ExpandBestChild(node)
                ∪ ExpandBestAncestorsSiblings(node)
        else
            return ExpandBestChild(node)
        end if
    else
        return ExpandConflict(node)
    end if
end function
function ExpandConflict(node) : Set(Node)
    nextScope ← ResolveConflictBestChild(node)
        ∪ ExpandSibling(node)
    if InLastLevel?(node) then
        scope ← CloneScope(node)
        extScope ← ExtendOneLevel(scope)
        AddExclusionConstraints(extScope)
        nextScope ← nextScope ∪ {extScope}
    end if
    return nextScope
end function
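The top-level loop can be illustrated with a plain best-first search over mode assignments. This is a deliberately simplified sketch and not CDA*: it performs no conflict extraction or scope management, it ranks candidates by the product of transition probabilities (an assumption made here for illustration), and the function name, the `successors` callback, and the toy `MOVES` table are hypothetical. It does show the priority queue, the goal test, and the pruning of repeated mode assignments motivated by Observation 3.

```python
import heapq
import itertools


def best_first_trajectory(initial, successors, is_goal):
    """Return (probability, path) for the most probable sequence of mode
    assignments from `initial` to a goal state, or None if none exists.

    successors(state) yields (next_state, probability) pairs with 0 < p <= 1.
    States must be hashable, e.g. tuples of modes.
    """
    counter = itertools.count()           # tie-breaker so states are never compared
    queue = [(-1.0, next(counter), initial, (initial,))]
    visited = set()
    while queue:
        neg_p, _, state, path = heapq.heappop(queue)
        if is_goal(state):
            return -neg_p, list(path)
        if state in visited:
            continue
        visited.add(state)
        for nxt, p in successors(state):
            if nxt not in path:           # never repeat a mode assignment
                heapq.heappush(
                    queue, (neg_p * p, next(counter), nxt, path + (nxt,)))
    return None


# Toy state space: (mode of F, mode of M1), both switches closed.
# Probabilities follow the running example (0.99 for the overloaded fuse,
# 0.05 for a motor fault); the table itself is an assumption.
MOVES = {
    ("ok", "ok"): [(("broken", "ok"), 0.99), (("ok", "shorted"), 0.05)],
    ("ok", "shorted"): [(("broken", "shorted"), 0.99)],
}

result = best_first_trajectory(
    ("ok", "ok"),
    lambda s: MOVES.get(s, []),
    lambda s: s[0] == "broken",           # goal: fuse broken, so no motor is on
)
print(result)
```

Because every transition probability is at most 1, extending a path never increases its score, so the first goal state removed from the queue is a most probable one under this scoring.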
4.2 Using Probabilities to Guide Diagnosis
In addition to using conflicts for pruning, we are interested
in exploiting the conditional probabilities attached to
mode transitions to guide the search towards trajectories
that occur with high probability.
This idea is justified by the fact that in many practical
scenarios, a few “likely” explanations that fit well within
the observed scenario are more valuable than the complete
set of candidate diagnoses, yet it is undesirable to impose
a rigid limit on diagnosis size. Furthermore, context-dependent behaviors are particularly amenable to such
treatment, since the attached probabilities often differ by
several orders of magnitude. For example, the probability
that F in Figure 1 breaks spontaneously is 2%, while F is
almost certain to break if i0 > 1A. Therefore, only one
of the possible candidates is likely to be considered in
the search if those probabilities are leveraged as guidance
heuristics.
Unfortunately, the assumption that variables can be
treated independently is violated in our domain, since
conditional transitions can become dependent if they share
common variables in SD, resulting in inadmissible estimates if combined under the assumption that all constituents occur independently. Moreover, priors for the
distribution of mode assignments and other variables in
SD are not available. For example, if we estimate the
probability that components M1, M2 in Figure 1 break as
\[
P(M_j = \mathit{broken} \mid M_j' = \mathit{ok},\ i_j > 1.5), \qquad j = 1, 2
\]
then the common SD variable i0 linking both components
together renders them dependent if the values of i1, i2 are
unassigned. Here, Mj′ is Mj in the previous state. Similarly,
the computation of the marginal probabilities requires
knowledge of prior probabilities such as P(i1 > 1.5) that
are generally unavailable for complex SD models.
Our solution to this problem is to over-approximate these
probabilities by ignoring the probability distributions of
mode and state variables entirely and using the maximum
possible value 1 instead. While the resulting estimates are often
crude, the variation in conditional probabilities, combined
with pruning due to inconsistent mode assignments, and
the memory-efficient CDA* search strategy all contribute
to keeping the search focused.
Figure 4 shows the top five results when applying our
algorithm to the example in Section 2. It can be seen
that a single implied fault is returned as the top-ranked
diagnosis. From the mode trajectory and SD it can be
inferred that if both switches are closed, a dependent
failure occurs in F. The two second-best explanations
implicate F (again, this component failed dependently),
together with an independent failure of M1 or M2. The
next two diagnoses attribute the failure to a shorted motor.
In total, there are 33 different explanations.
Probability  Explanation
0.99         broken(F) [dependent]
0.05         broken(F) [dependent], broken(M1)
0.05         broken(F) [dependent], broken(M2)
0.03         shorted(M1)
0.03         shorted(M2)
Fig. 4. Top five diagnoses for Figure 1
Our preliminary results indicate that the time required to
compute dependent diagnoses is suitable for on-line use.
All results presented in this paper required less than one
second of CPU time, and none of our example runs
required more than 20 seconds to compute. Early results
obtained from the (trivial) example in this paper also
indicate that the computation is largely dominated by the
consistency checking and conflict extraction procedures.
(We use a constraint propagation framework to represent
our models.)
5. RELATED WORK AND DISCUSSION
Pan (1984) combined heuristic diagnostics with an explicit
causal model to study dependent faults. A formal basis
for causal representations was first provided in Console
et al. (1989), where a qualitative causal network was
used to define dependencies. A MAY link in the network
represents the existence of an abstracted description that
omits explicit specification of a potentially disambiguating
factor. Dynamic aspects are abstracted and the networks
are assumed to be singly connected. Tatar (1995) models
the system as a deterministic finite state machine that
executes a sequence of states and is required to achieve
a stable state after a limited number of iterations, with
diagnosis executed in an ATMS-based engine. More recently, Weber and Wotawa (2008) used so-called Cascading Failure Graphs to identify causal links between failures.
As with MAY links, no strength or probability is associated
with transitions, and diagnostic rankings are conducted in
terms of subset minimality under a number of boundary
assumptions, such as the existence of a single root cause
and the acyclicity of the graph.
Models and heuristics that reflect possible interactions
between components have already been exploited to adapt
modeling assumptions and to focus the overall diagnostic
process (Böttcher, 1995). While we employ the CWA to ensure effective discrimination, heuristics-driven adjustment
of the “active” hypotheses is a possible complementary
technique.
Qualitative models that capture physical properties of
electrical and electronic circuits have also been proposed
to broaden the spectrum of faults that can be isolated using traditional diagnostic inference mechanisms (de Kleer,
2007). Since our algorithm employs a generic notion of
dependency, it could be generalized to accommodate different dependency models. We are currently investigating
extensions to our framework to incorporate some of the
dependency models presented by de Kleer (2007).
The topic has also been tackled from a “Reasoning about
Actions” viewpoint, mostly with a strong emphasis on
repair and the assumption that repeated probing and
evaluation is possible, starting with the model for repair
actions with time dependencies in Friedrich et al. (1991).
Sun and Weld (1993) used a diagnosis engine based on
Raiman’s incomplete alibi principle to generate diagnostic
candidates and then created plans for executing repair
actions including further probes. Thiébaux et al. (1996)
adapted a stochastic planner in a domain where system states cannot be identified unambiguously due to lack
or failure of sensors, and actuators are likewise unreliable,
so that not even a precise repair goal can be specified in
advance. The cardinality of examined faults is successively
incremented if repairs for single (and later binary, etc.)
faults do not lead to the expected restoration of functionality. The search space had to be restricted via domain
specific heuristics. Baral et al. (2000) provide a theoretical
definition of diagnosis in terms of satisfying a repair goal
over a space of action descriptions defined in the situation
calculus. Action outcomes can be nondeterministic (unreliable) although no probability estimates are involved.
Finally, a major body of work examines diagnosis and
repair as part of model-based reactive control systems
where the system model is encoded as an Optimal Constraint Satisfaction Problem (OCSP), e.g., Williams and
Ragno (2003), where Conflict-Directed A* is used for mode
estimation and reconfiguration, or Mikaelian et al. (2005),
where system behavior is specified in terms of probabilistic
hierarchical constraint automata (PHCAs) that are solved using a
decomposition algorithm. A similar representation is used
in the diagnosis of hybrid systems with Temporal Causal
Graphs (TCGs), which explicitly capture mode changes
and time dependencies (Narasimhan and Biswas, 2006).
TCGs also capture qualitative fault signatures that are
compared against the continuous process parameters.
6. CONCLUSION
Our framework extends previous work in multiple aspects. Considering the modeling aspect, our work does
not require an explicit global model of fault and failure dependencies; instead, dependencies are derived from
component models. As a side effect, we also avoid the
problem of unexpected results due to mismatches between
the causal model and the behavior predicted by SD. Our
search procedure is able to deal with spontaneous and
dependent faults simultaneously, while not requiring that
all dependent faults originate in a single component. We
apply heuristics to discriminate between logically equally
plausible explanations. We rank explanations such that
those that best fit an observed scenario are computed first.
Assuming all conditional mode transitions share the same
values, our framework computes results that match those
of Weber and Wotawa (2008) if appropriate fault models
are available.
Since we recast the diagnosis problem essentially as a planning problem (the search for mode trajectories corresponds
to the search for operators in the planning domain), the
worst-case complexity is elevated to PSPACE. However,
computational complexity does not seem to be a significant
issue in diagnosing dependent failures, since even in the
presence of dependent failures, the preferred paths are
typically rather short (length 1–3 mode transitions in our
example), and conflicts between mode transitions are rare.
Our unified treatment of both mode assignments and
mode transitions also helps to detect inconsistencies and
apply constraint propagation early. While our approach
shares some common attributes with incremental planning
graph construction, using the same forward expansion
process often is not possible, since the initial state may
not be known in full.
A potential limitation of our search strategy can be observed in some constituent explanations that narrowly
outrank the larger explanation that includes an additional
mode assumption. This is an artifact of our use of conditional probabilities < 1.0. To resolve the issue, one can
simply continue enumerating the most-preferred explanations until CDA* either switches to a different candidate
that no longer includes the previous best one, or until the
probability estimate attached to the candidate drops below
a given threshold. In this way, the most preferred kernel of
a dependent diagnosis, along with all its dependent faults,
is enumerated, provided that the conditional probability
of the dependent fault is high.
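The stopping rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than our implementation: the function name, the representation of explanations as sets of mode assumptions, and the example mode names and probabilities are all hypothetical, and "the previous best one" is read here as the first (top-ranked) explanation returned by the search.

```python
def enumerate_kernel(ranked, threshold):
    """Collect the top-ranked explanation and the explanations extending it.

    ranked: iterable of (assumptions, probability) pairs in non-increasing
        order of probability, standing in for the output of the search.
    threshold: enumeration stops once an estimate drops below this value.
    """
    results = []
    kernel = None
    for assumptions, probability in ranked:
        assumptions = frozenset(assumptions)
        if probability < threshold:
            break  # estimate dropped below the given threshold
        if kernel is None:
            kernel = assumptions  # the top-ranked explanation
        elif not kernel <= assumptions:
            break  # the search switched to a candidate without the kernel
        results.append((assumptions, probability))
    return results


# Hypothetical ranking: the second candidate extends the first one,
# the third candidate is unrelated to it.
ranked = [
    ({"F=blown"}, 0.30),
    ({"F=blown", "M1=shorted"}, 0.27),
    ({"M2=stuck"}, 0.20),
]
for assumptions, probability in enumerate_kernel(ranked, threshold=0.1):
    print(sorted(assumptions), probability)
```

With a higher threshold, for example 0.28, only the top-ranked explanation is reported, which corresponds to the case where the conditional probability of the dependent fault is not high enough to include it.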
In this paper we have presented a diagnosis framework that
enables the diagnosis of dependent faults by specification
of component-specific behavior (mode transitions), aiming
at situations where complex probing and repair strategies
are not applicable. The model shares basic assumptions
commonly made by comparable earlier approaches, but lifts some of these assumptions. Unlike Weber
and Wotawa (2008), Tatar (1995), Mikaelian et al. (2005), and
Narasimhan and Biswas (2006), we do not operate on an
explicit global dependency graph or model, and make no
single-origin assumption. Unlike Baral et al. (2000), we incorporate probabilistic information, and unlike Thiébaux
et al. (1996), we do not use domain-specific heuristics or
incremental diagnosis and probing.
The restrictions we share with earlier work focusing
on dependent faults (Tatar, 1995; Weber and Wotawa,
2008) concern the explicit modeling of time and the handling
of repair actions. Lifting these is the main topic for future
work. We are currently working on incorporating selected
complementary dependency models into our implementation, and on extending our empirical investigation to larger
and more diverse types of systems.
REFERENCES
O. Akerlund et al. ISAAC, a framework for integrated
safety analysis of functional, geometrical, and human
aspects. In Third European Congress ERTS Embedded
Real Time Software, January 2006.
Chitta Baral, Sheila McIlraith, and Tran Cao Son. Formulating diagnostic problem solving using an action language with narratives and sensing. In Proceedings of the
International Conference on Principles of Knowledge
Representation and Reasoning, pages 311–322, 2000.
Claudia Böttcher. No faults in structure? How to diagnose
hidden interaction. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages
1728–1735, Montreal, August 1995.
Luca Console, Daniele Theseider Dupré, and Pietro
Torasso. A theory of diagnosis for incomplete causal
models. In Proceedings of the 11th International Joint
Conference on Artificial Intelligence, pages 1311–1317,
Detroit, August 1989. Morgan Kaufmann Publishers,
Inc.
Johan de Kleer. Modeling when connections are the
problem. In Manuela M. Veloso, editor, Proceedings
of the 20th International Joint Conference on Artificial
Intelligence, pages 310–317, Hyderabad, India, January
2007.
Johan de Kleer. An improved approach for generating
max-fault min-cardinality diagnoses. In Proceedings
of the 19th International Workshop on Principles of
Diagnosis, Blue Mountains, NSW, Australia, September 2008.
Johan de Kleer and B. C. Williams. Diagnosing multiple
faults. Artificial Intelligence, 32(1):97–130, 1987.
Gerhard Friedrich, Georg Gottlob, and Wolfgang Nejdl.
Formalizing the repair process. In Proceedings of the
Second International Workshop on Principles of Diagnosis, Milano, September 1991.
Tsoline Mikaelian, Brian C. Williams, and Martin Sachenbacher. Model-based monitoring and diagnosis of systems with software-extended behavior. In Proceedings
of the National Conference on Artificial Intelligence
(AAAI), pages 327–333, Pittsburgh, 2005.
Sriram Narasimhan and Gautam Biswas. Model-based
diagnosis of hybrid systems. IEEE Transactions on
Systems, Man and Cybernetics, 37(3):348–361, 2006.
Jeff Pan. Qualitative reasoning with deep-level mechanism
models for diagnoses of mechanism failures. In Proceedings of the IEEE Conference on Artificial Intelligence
Applications (CAIA), pages 295–301, Denver, 1984.
Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence, 32:57–95, 1987.
Ying Sun and Daniel S. Weld. A framework for model-based repair. In Proceedings of the National Conference
on Artificial Intelligence (AAAI), pages 182–187, 1993.
Mugur Tatar. Diagnosis with cascading defects. In
Proceedings of the Sixth International Workshop on
Principles of Diagnosis, Goslar, October 1995.
Sylvie Thiébaux, Marie-Odile Cordier, Olivier Jehl, and
Jean-Paul Krivine. Supply restoration in power distribution systems: A case study in integrating model-based
diagnosis and repair planning. In Proceedings of the
International Conference on Uncertainty in Artificial
Intelligence, pages 525–532, Portland, OR, 1996.
Jörg Weber and Franz Wotawa. Diagnosing dependent
failures in the context of consistency-based diagnosis.
In Proceedings of the 19th International Workshop on
Principles of Diagnosis, Blue Mountains, NSW, Australia, September 2008.
Brian C. Williams and Robert J. Ragno. Conflict-directed
A* and its role in model-based embedded systems.
Discrete Applied Mathematics, 2003.