Building BN-Based System Reliability Model by Dual Genetic Algorithm
YOU Wei-zhen(游威振),ZHONG Xiao-pin(钟小品)*
Shenzhen Key Laboratory of Electromagnetic Control, Shenzhen University, Shenzhen 518060, China
Abstract: A system reliability model based on Bayesian network (BN) is built via an evolutionary strategy called
dual genetic algorithm (DGA). BN is a probabilistic approach to analyze relationships between stochastic events.
In contrast with traditional methods where the BN model for a system is built by professionals, the DGA is
proposed for the automatic analysis of historical data and construction of the BN for the estimation of system
reliability. The whole solution space of BN structures is searched by DGA and a more accurate BN model is
obtained. The efficacy of the proposed method is shown by some literature examples.
Key words: Bayesian network (BN) model; dual genetic algorithm (DGA); system reliability; historical data
CLC number: TP277
Document code: A
Article ID: 1672-5220
Introduction
System reliability is generally defined as the probability that a product does not fail during a defined period
of time under given functional and surrounding conditions[1]. Reliability is among the most frequently considered
aspects for systems, such as mechanical or electrical control systems. The introduction of newly developed
technology and increasingly complex systems brings more and more sources of unreliability that threaten the
environment and the safety of people. This makes estimating system reliability both important and challenging
for most system engineers[2].
The traditional approaches are always based on the premise that the failure mechanism of the system is well
understood[3]. However, for newly designed complex systems, it becomes challenging to understand the detailed
relations among components of the system.
To solve this problem, Bayesian network structure learning theory has been proposed and developed by
researchers as a new approach to reliability estimation[4-6]. One of the earliest studies of BN models for
reliability estimation was done by Barlow[7], who compared Bayesian and non-Bayesian approaches for system
reliability estimation when studying spherical pressure vessels. A graphical-belief environment, introduced by
Almond[8], was developed for risk evaluation of large complex systems. It is important but difficult to build an
accurate BN structure, since even a small number of nodes gives rise to a large space of possible BN structures.
The well-known K2 algorithm aims to find the best BN structure by analyzing a system database[9]. However, the
K2 algorithm is based on the assumption that there is a given ordering between the variables, so an optimal
structure may not be found. Although traditional GA-based structure learning methods have been
proposed[10-11], they do not provide a general way to search the entire BN space, because their genetic
operations are not closed over the set of all possible BN structures.
This paper therefore proposes a new method of BN structure learning called dual genetic algorithm (DGA)[12],
in which a particular BN is represented by a pair of chromosomes: an ordering chromosome and a connectivity
chromosome. With special genetic operators, this method can learn all the topologies of BN nodes as well as the
ordering among them, thereby searching the whole solution space of the BN problem.
The paper is organized as follows: Section 1 provides a brief summary of BN for system reliability. Section 2
gives information about DGA for BN structure learning. Section 3 presents a simulation and a performance
analysis of the proposed method. Section 4 performs the reliability estimation via the DGA-built BN model.
Conclusions are finally provided in Section 5.
1 Bayesian Network for System Reliability
A BN is represented as a directed acyclic graph (DAG), in which the nodes represent the corresponding
variables and the directed links represent the dependence between pairs of nodes[13]. Nodes with no arrows
leading into them are called root nodes, while nodes with arrows leading into them are called child nodes.
Nodes with arrows leading out of them are parent nodes, so a root node may also be a parent node; see the
sample BN in Fig. 1. A BN is a probabilistic approach to analyzing and capturing the relationships between
stochastic events. In reliability modeling, the variables of a BN are viewed as the components of a system,
while the links represent the interactions of the components that lead to “success” or “failure” of the
system. One can therefore treat the BN as a description of the interactions among the components of the
system. For ease of understanding, a binary digit is used to represent the state of a component: 0 means
“not functioning” and 1 means “working properly”. In general, the strength of dependence between two nodes is
represented by a probability value. The joint probability of all variables, for a particular instantiation of
the BN, is calculated as Eq. (1).
$$p(S) = \prod_{i} p(A_i \mid P(A_i)) \qquad (1)$$
where $S = \{A_1, A_2, \ldots, A_n\}$ represents all nodes in the BN and $P(A_i)$ represents the parents of $A_i$.
In reliability estimation, each component is assigned a conditional probability table (CPT). For $A_i$, the CPT
contains $p(A_i \mid P(A_i))$. Each parent of $A_i$ is instantiated as one particular state: Success or Failure.
If $A_i$ has $m$ parents, then there are $2^m$ different instantiations in its CPT.
___________________________
Received date: 2015-06-12
Foundation item: National Natural Science Foundation of China (No.61203184)
*Correspondence should be addressed to ZHONG Xiao-pin, Email: [email protected]
For example, in Fig. 1 the nodes A1, A2, A3 are root nodes because they have no directed edges coming in.
That is, they are independent of each other. A4 and A5 have the parent sets {A1, A2} and {A2, A3}
respectively, so they are both child nodes.
[Figure: BN with nodes A1, A2, A3, A4, A5 and a "System behavior" node]
Fig.1 A sample BN structure
System behavior node represents the state of the system. If we know the corresponding CPTs to this BN
model, then by implementing Bayes’ rule the overall system success probability can be computed as follows:
$$p(A_6 = 1) = \sum_{A_1=0}^{1} \sum_{A_2=0}^{1} \cdots \sum_{A_5=0}^{1} p(A_1)\, p(A_2)\, p(A_4 \mid A_1, A_2)\, p(A_3)\, p(A_5 \mid A_2, A_3)\, p(A_6 = 1 \mid A_4, A_5) \qquad (2)$$
where $A_6$ represents system behavior.
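The marginalization in Eq. (2) can be illustrated with a brute-force enumeration over the states of A1 to A5. The sketch below is only an illustration: the root probabilities and the AND/OR gate CPTs are hypothetical values chosen for the example, not values taken from this paper.

```python
from itertools import product

# Hypothetical prior probabilities p(Ai = 1) for the root nodes (illustrative only).
P_ROOT = {"A1": 0.9, "A2": 0.8, "A3": 0.95}

def p_and(a, b):
    """Assumed CPT: p(child = 1 | parents) for a deterministic AND gate."""
    return 1.0 if (a == 1 and b == 1) else 0.0

def p_or(a, b):
    """Assumed CPT: p(child = 1 | parents) for a deterministic OR gate."""
    return 1.0 if (a == 1 or b == 1) else 0.0

def bern(p_one, value):
    """Probability of a binary value given the probability of state 1."""
    return p_one if value == 1 else 1.0 - p_one

def system_reliability():
    """Sum the joint probability of Eq. (1) over all states of A1..A5, with A6 = 1."""
    total = 0.0
    for a1, a2, a3, a4, a5 in product((0, 1), repeat=5):
        total += (bern(P_ROOT["A1"], a1)
                  * bern(P_ROOT["A2"], a2)
                  * bern(p_and(a1, a2), a4)   # p(A4 | A1, A2)
                  * bern(P_ROOT["A3"], a3)
                  * bern(p_and(a2, a3), a5)   # p(A5 | A2, A3)
                  * p_or(a4, a5))             # p(A6 = 1 | A4, A5)
    return total

reliability = system_reliability()
```

With these assumed gates the system works exactly when A2 works and at least one of A1, A3 works, so the enumeration can be checked by hand. Full enumeration costs 2^n terms and is only practical for small networks.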
2 DGA for BN Construction
In the proposed DGA, the ordering chromosome represents the order of all nodes in the BN structure, but it is
not fixed to a particular permutation. The connectivity matrix is deduced from an upper triangular matrix of
binary digits, which also changes during evolution. This combination ensures that the entire space of BN
structures is searched for the fittest one.
2.1 Encoding BN structure
The ordering chromosome $X_o$ (subscript ‘o’ denotes ‘ordering’) shown below is an array of unrepeated
integers ranging from 1 to $n$, where $x_i \in \{1, 2, \ldots, n\}$. If there are $m$ root nodes in the BN, then
the first $m$ integers in $X_o$ represent the root nodes.
$$X_o = x_1 x_2 \cdots x_n, \qquad X_c = c_{1,2}\, c_{1,3} \cdots c_{1,n}\, c_{2,3} \cdots c_{n-2,n}\, c_{n-1,n},$$
In contrast, the connectivity chromosome $X_c$ (subscript ‘c’ denotes ‘connectivity’) is an array of binary
digits that corresponds to the connectivity matrix $C$:
$$C = \begin{pmatrix} 0 & c_{1,2} & c_{1,3} & \cdots & c_{1,n-1} & c_{1,n} \\ 0 & 0 & c_{2,3} & \cdots & c_{2,n-1} & c_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & c_{n-1,n} \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}, \quad \text{where } c_{i,j} = \begin{cases} 1, & x_i \in P(x_j), \\ 0, & x_i \notin P(x_j). \end{cases}$$
$P(x_j)$ is the parent set of node $x_j$. The main advantage of this structure for the connectivity chromosome is
that, even without prior knowledge of the ordering of the nodes, it can still describe all possible
relationships between the nodes. Fig. 2 is taken as an example.
[Figure: two six-node BNs over A1 to A6; panel (a) BN1, panel (b) BN2]
Fig. 2 Two sample BNs
The $i$th ($i = 1, 2$) BN is represented as a coupled chromosome $(X_{io}, X_{ic})$. The two BNs are described as follows:
$$X_{1o} = 126354, \quad X_{1c} = 110000100010010;$$
$$X_{2o} = 543126, \quad X_{2c} = 110001100010011.$$
The corresponding connectivity matrices are as below:
$$C_1 = \begin{pmatrix} 0&1&1&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{pmatrix}, \quad C_2 = \begin{pmatrix} 0&1&1&0&0&0 \\ 0&0&1&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \end{pmatrix}.$$
It is clear that when $X_o$ is fixed, all relations among nodes in the BN can be represented by $X_c$. In turn,
when $X_c$ is fixed, all possible orderings of the nodes can be represented by $X_o$. Therefore, the DGA encoding
method can search all possible BN structures.
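As an illustration of the encoding, the following sketch decodes a chromosome pair into parent sets by reading the connectivity bits row by row as the upper triangle of C, where position i in the ordering is a parent of position j whenever the bit c(i,j) is 1. The function name `decode` and the dictionary output format are assumptions made for this example.

```python
def decode(order, bits):
    """Map an (ordering, connectivity) chromosome pair to parent sets.

    order: list of node labels, e.g. [1, 2, 6, 3, 5, 4]
    bits:  string of n*(n-1)/2 binary digits, the upper triangle of C read row by row
    """
    n = len(order)
    assert len(bits) == n * (n - 1) // 2, "connectivity chromosome has wrong length"
    parents = {node: [] for node in order}
    k = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            if bits[k] == "1":
                # c(i,j) = 1 means the i-th node in the ordering is a parent of the j-th
                parents[order[j]].append(order[i])
            k += 1
    return parents

bn1 = decode([1, 2, 6, 3, 5, 4], "110000100010010")
bn2 = decode([5, 4, 3, 1, 2, 6], "110001100010011")
```

Because parents always precede their children in the ordering, every decoded structure is acyclic by construction, which is why no repair step is needed after crossover or mutation.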
2.2 Searching method of DGA
When searching the full solution space of BNs, DGA works in a similar way to the traditional GA. It is
composed of two main parts: a fitness function to evaluate each individual, and an evolutionary method to
search for the individual with the highest fitness. The pseudo code of DGA is given in Fig. 3.
Input: Database D, a maximum number of iterations N, population size n for
every generation, crossover rate pc, mutation rate pm.
Output: Each node with its parent set.
(1) i ← 0, initialize population P(0), size(P(0)) > n.
(2) while i < N
    (2.1) evaluate P(i) and select individuals for crossover;
    (2.2) crossover and mutation to produce new population D';
    (2.3) choose best individual and save;
    (2.4) P(i+1) ← D' ∪ P(i), i ← i + 1;
(3) decode highest score individual into a BN;
(4) print out each node and its parents.
Fig. 3 Pseudo code of DGA
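The loop of Fig. 3 can be sketched generically as below. This is not the authors' implementation: binary tournament selection, truncation of the merged population back to the original size, and all function names are assumptions, since the pseudo code does not specify these details. The toy usage evolves plain bit lists rather than BN chromosomes so that the block stays self-contained.

```python
import random

def evolve(init_pop, fitness, crossover, mutate, n_iter, pc, pm, rng):
    """Elitist evolutionary loop in the spirit of Fig. 3 (selection scheme assumed)."""
    pop = list(init_pop)
    size = len(pop)
    best = max(pop, key=fitness)

    def pick():
        # (2.1) assumed selection: binary tournament on fitness
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_iter):
        children = []
        while len(children) < size:
            p1, p2 = pick(), pick()
            # (2.2) crossover with rate pc, then mutation with rate pm
            c1, c2 = crossover(p1, p2, rng) if rng.random() < pc else (p1, p2)
            children += [mutate(c, rng) if rng.random() < pm else c for c in (c1, c2)]
        # (2.3) keep the best individual seen so far
        best = max(children + [best], key=fitness)
        # (2.4) merge offspring with the old population, keep the fittest `size`
        pop = sorted(children + pop, key=fitness, reverse=True)[:size]
    return best

# Toy usage: maximize the number of ones in a 12-bit list.
def one_point(a, b, rng):
    m = rng.randint(1, len(a) - 1)
    return a[:m] + b[m:], b[:m] + a[m:]

def flip(c, rng):
    i = rng.randrange(len(c))
    return c[:i] + [1 - c[i]] + c[i + 1:]

rng = random.Random(0)
pop0 = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = evolve(pop0, sum, one_point, flip, n_iter=30, pc=0.8, pm=0.1, rng=rng)
```

For DGA, an individual would be a pair of chromosomes and the crossover and mutation callbacks would apply separate operators to the ordering and connectivity parts.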
Although similar to the conventional genetic algorithms in evolution process, the proposed method has some
differences especially in crossover and mutation operators.
2.2.1 Fitness function
Inspired by the scoring function in the K2 algorithm, this paper introduces the fitness function as follows:
$$f_{BN} = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!, \qquad (3)$$
where $n$ is the number of nodes in the BN; $q_i$ is the number of instantiations for the parent set of node $X_i$
in $D$; $r_i$ is the number of possible values for $X_i$ in $D$; $N_{ijk}$ is the number of instantiations in $D$
when $X_i$ has the $k$th value and its parent set has the $j$th instantiation; $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$
is the number of instantiations in $D$ when the parent set for $X_i$ has the $j$th instantiation.
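The factorials in Eq. (3) overflow quickly, so a practical implementation would usually work with the logarithm of the fitness. The sketch below assumes that approach, using the identity log(m!) = lgamma(m + 1); the data layout (a list of dictionaries mapping node names to integer states) and the function names are hypothetical. Parent instantiations that never occur in D contribute a factor of 1 and can be skipped.

```python
from collections import Counter
from math import lgamma

def log_family_score(data, child, parents, r=2):
    """Log of the node-i factor of Eq. (3), with r possible values per node."""
    n_ijk = Counter()  # counts per (parent instantiation j, child value k)
    n_ij = Counter()   # counts per parent instantiation j
    for row in data:
        j = tuple(row[p] for p in parents)
        n_ijk[(j, row[child])] += 1
        n_ij[j] += 1
    score = 0.0
    for count in n_ij.values():
        # log (r - 1)!  -  log (N_ij + r - 1)!
        score += lgamma(r) - lgamma(count + r)
    for count in n_ijk.values():
        score += lgamma(count + 1)  # log N_ijk!
    return score

def log_fitness(data, parent_sets, r=2):
    """Log of f_BN: sum of the family scores over all nodes."""
    return sum(log_family_score(data, c, ps, r) for c, ps in parent_sets.items())

# Toy database in which B always copies A.
data = [{"A": 0, "B": 0}] * 4 + [{"A": 1, "B": 1}] * 4
dependent = log_fitness(data, {"A": [], "B": ["A"]})
independent = log_fitness(data, {"A": [], "B": []})
```

On this toy database the structure with the link from A to B receives the higher score, as expected. Since the logarithm is monotonic, ranking individuals by log-fitness gives the same ordering as ranking by Eq. (3).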
2.2.2 Crossover in DGA
Considering that the crossover for the connectivity chromosome is similar to the crossover in a conventional
genetic algorithm, this paper only presents how the crossover works on the ordering chromosome. Without loss of
generality, we take Fig. 2 as an example:
$$X_{1o} = 126354, \quad X_{2o} = 543126.$$
The crossover operator uniformly generates a random integer $m$ from the interval $(1, n)$ as the crossover point.
Assuming that $m = 2$, the crossover leads to
$$X'_{1o} = 12\cdots, \quad X'_{2o} = 54\cdots.$$
Because $X'_{1o}$ already contains ‘1’ and ‘2’, its remaining numbers are taken from $X_{2o}$. When the numbers
‘1’ and ‘2’ are removed from $X_{2o}$, we get ‘5436’. This leads to the following results:
$$X'_{1o} = 125436, \quad X'_{2o} = 541236.$$
Suppose the connectivity chromosomes stay the same as in Fig. 2; the resulting BNs are shown below:
[Figure: the two BNs after crossover; panel (a) corresponds to X'1o, panel (b) to X'2o]
Fig. 4 BNs after crossover
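The ordering crossover described above can be sketched as follows: each child keeps the first m genes of one parent and fills the remaining positions with the missing numbers in the order in which they appear in the other parent. The function name is hypothetical. Note that applying this rule symmetrically gives 541263 for the second child, whereas the text prints 541236; the sketch follows the stated rule and does not attempt to resolve that difference.

```python
def order_crossover(p1, p2, m):
    """Crossover for ordering chromosomes that always yields valid permutations."""
    head1, head2 = p1[:m], p2[:m]
    # Fill the tail with the genes not yet used, in the other parent's order.
    child1 = head1 + [g for g in p2 if g not in head1]
    child2 = head2 + [g for g in p1 if g not in head2]
    return child1, child2

c1, c2 = order_crossover([1, 2, 6, 3, 5, 4], [5, 4, 3, 1, 2, 6], m=2)
```

Unlike plain one-point crossover, this operator never duplicates or drops a node, so the result is closed over the set of permutations.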
2.2.3 Mutation in DGA
In fact, plain mutation is sufficient for the connectivity chromosome, since it is closed in the space of
connectivity chromosomes. However, this is not the case for the ordering chromosome. Assume there is an
ordering chromosome $X_o = 126354$; see Fig. 5(a). The mutation operator uniformly generates two random
integers from the interval $[1, n]$ as mutation points: $r_1 = 1$, $r_2 = 3$. The resulting ordering chromosome
is shown in Fig. 5(b).
[Figure: panel (a) BN before mutation; panel (b) BN after mutation]
Fig. 5 BNs before and after mutation
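The text does not spell out what happens at the two mutation points. A common choice that keeps the chromosome a valid permutation is to swap the genes at positions r1 and r2, and the sketch below assumes that interpretation; the function names are hypothetical.

```python
import random

def swap_mutation(order, r1, r2):
    """Assumed mutation: exchange the genes at 1-based positions r1 and r2."""
    child = list(order)
    child[r1 - 1], child[r2 - 1] = child[r2 - 1], child[r1 - 1]
    return child

def random_swap_mutation(order, rng=random):
    """Draw both mutation points uniformly from [1, n], then swap."""
    n = len(order)
    return swap_mutation(order, rng.randint(1, n), rng.randint(1, n))

mutated = swap_mutation([1, 2, 6, 3, 5, 4], r1=1, r2=3)
```

Under this assumption the example chromosome 126354 with r1 = 1 and r2 = 3 becomes 621354.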
3 Simulation and Analysis
The correctness of the system BN model strongly influences the accuracy of the system reliability
estimate. Incorrect associations in the BN model may lead to inaccuracies in the system-level reliability
estimation. Once a correct BN is constructed, estimating system reliability is feasible and
straightforward. Therefore, this section provides a simulation in which the performance of the proposed DGA is
analyzed through a comparison with the well-known K2 algorithm. We first give a brief introduction to the K2
algorithm for BN construction, and then present the simulation.
3.1 A brief summary of K2 algorithm for BN construction
The K2 algorithm is a greedy heuristic search method that looks for the highest-scoring
parent set for each node in the BN. When the K2 algorithm analyzes a node, it first assumes that the
node has no parents and then incrementally adds the parent whose addition increases the score function. The
K2 procedure stops when the addition of any further parent no longer increases the score[9]. The
following steps summarize the K2 algorithm:
(1). Initialize the parent set $\pi_i$ for node $i$, and calculate the initial score $f(i, \pi_i)$;
(2). Keep adding nodes to the parent set $\pi_i$ that can increase the score $f$;
(3). When step (2) can no longer increase the score $f$, stop adding nodes and analyze the next node $j$;
(4). When all nodes are analyzed, print out each node $k$ and its parent set $\pi_k$.
The score function in the K2 algorithm is as follows:
$$f(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(d_i - 1)!}{(\alpha_{ij} + d_i - 1)!} \prod_{k=1}^{d_i} \alpha_{ijk}!, \qquad (4)$$
where $X_i$ has $d_i$ possible values and $q_i$ instantiations of its parent set in $D$; $\alpha_{ijk}$ is the number of
instantiations in $D$ when $X_i$ has the $k$th value and its parent set has the $j$th instantiation; $\alpha_{ij}$ is
the total number of instantiations in $D$ when the parent set for $X_i$ has the $j$th instantiation.
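The four steps above can be sketched as a short greedy search over a given node ordering. This is a minimal illustration under stated assumptions, not a reference implementation: the score is evaluated in log space, a cap `max_parents` is introduced to bound the search, and the data layout (a list of dictionaries) and function names are hypothetical.

```python
from collections import Counter
from math import lgamma

def log_score(data, child, parents, d=2):
    """Log of Eq. (4) for one node with d possible values."""
    a_ijk, a_ij = Counter(), Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        a_ijk[(j, row[child])] += 1
        a_ij[j] += 1
    score = sum(lgamma(d) - lgamma(c + d) for c in a_ij.values())
    return score + sum(lgamma(c + 1) for c in a_ijk.values())

def k2(data, order, max_parents=2, d=2):
    """Greedy parent selection; only predecessors in `order` are candidates."""
    parent_sets = {}
    for pos, node in enumerate(order):
        parents = []                                   # step (1): start with no parents
        best = log_score(data, node, parents, d)
        candidates = list(order[:pos])
        while candidates and len(parents) < max_parents:
            scored = [(log_score(data, node, parents + [c], d), c) for c in candidates]
            top, cand = max(scored)
            if top <= best:                            # step (3): no improvement, stop
                break
            parents.append(cand)                       # step (2): add the best candidate
            candidates.remove(cand)
            best = top
        parent_sets[node] = parents
    return parent_sets                                 # step (4): node -> parent set

# Toy database: B copies A, C is unrelated to both.
rows = [(0, 0, 0), (0, 0, 1), (0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1)]
data = [{"A": a, "B": b, "C": c} for a, b, c in rows]
learned = k2(data, order=["A", "C", "B"])
```

The dependence on `order` is visible in the code: a node can only receive parents that precede it, which is the limitation that the dual encoding of DGA is designed to remove.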
3.2 Simulation and analysis
The simulation analyzed several BN structures representing systems with various numbers of components,
some of which are displayed in Fig. 6. Based on the mapping between the fault tree model of a system and
the BN model[14], a series of historical observation data was generated by Monte Carlo sampling from the
predetermined CPTs of every node in the BNs of Fig. 6.
As explained in the Introduction, for newly designed complex systems it is difficult to understand the system
structure in detail, so the ordering of the nodes of the BN model may not be clear to engineers.
Therefore, the simulation is designed under two assumptions about the ordering of nodes: (a) known; (b)
unknown. The simulation was repeated ten times under each assumption, and the mean CPU time
(MCT) and the mean accuracy (MA) are shown in Table 1. The accuracy is calculated as $1 - \varepsilon$, where
the error rate $\varepsilon$ is defined as $\varepsilon = (N_m + N_a)/N_w$, in which $N_m$ and $N_a$ are the number of
missed links and the number of wrongly added links in the constructed BN, respectively. The number of links in
the target BN is denoted by $N_w$.
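The accuracy measure can be computed directly from two edge lists, as in the sketch below. The representation of links as (parent, child) tuples and the function name are assumptions; with this representation a reversed link counts once as missed and once as wrongly added.

```python
def structure_accuracy(target_edges, learned_edges):
    """Return (accuracy, error rate) of a learned BN against the target BN."""
    target, learned = set(target_edges), set(learned_edges)
    n_missed = len(target - learned)   # N_m: links in the target but not learned
    n_added = len(learned - target)    # N_a: links learned but not in the target
    error = (n_missed + n_added) / len(target)   # N_w = number of target links
    return 1.0 - error, error

target = [(1, 2), (1, 3), (2, 4), (3, 4)]
learned = [(1, 2), (1, 3), (2, 4), (2, 3)]
accuracy, error = structure_accuracy(target, learned)
```

In this example one link is missed and one is wrongly added out of four target links, so the error rate is 0.5.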
[Figure: six test BNs; panels (a) n=5, (b) n=6, (c) n=7, (d) n=8, (e) n=9, (f) n=10]
Fig. 6 BNs tested in simulation
In Table 1, “Net” denotes “Network”, “NB” denotes “number of nodes in the BN”, and “Na” denotes “number of
associations in the BN”. It can be observed that the running times of both algorithms depend strongly on the
number of nodes in the BN structure. This is because, as the number of nodes increases, the number of possible
associations in the BN grows quickly, resulting in more calculations. Table 1 also shows that the DGA takes
more time to analyze a BN than the K2 algorithm. This is because, when the ordering of nodes is fixed, the K2
algorithm uses greedy search to avoid examining unrelated nodes, while the DGA applies an evolutionary method
in its computation.
However, the accuracy listed in the last columns implies that the DGA outperforms the K2 algorithm,
especially when the BN has a large number of nodes or associations. The most probable reason is that the
observation data do not strongly support some detailed relationships between the nodes. This makes the
greedy search in K2 prematurely discard nodes that should have been chosen, which can be seen as a defect of
the K2 algorithm. In contrast, because of the randomness and directed selection of the evolutionary method in
DGA, it is possible to preserve the necessary nodes in order to reconstruct a better individual.
Table 1 Simulation results for BNs in Fig. 6
Net   NB   Na   MCT /s (K2)   MCT /s (DGA)   MA /% (K2)   MA /% (DGA)
1     5    4    0.51          1.1            100          100
2     6    7    0.86          2.9            100          100
3     7    9    1.44          11.8           90           100
4     8    10   4.80          39.6           90           100
5     9    12   12.78         97.9           87.5         93.3
6     10   14   63.82         231.1          80           90
The simulation under the second assumption was designed to make comparisons of the two methods on the
following aspects: (1) average running time; (2) mean error rate (MER), see Fig. 7 and Table 2.
[Figure: "DGA vs K2 on running time"; x-axis: number of nodes in BN (5 to 10); y-axis: running time (s), 0 to 1500; series: K2, DGA]
Fig. 7 Mean running time of two methods
The comparison in Fig. 7 demonstrates that, when the BN structure is simple, there is no big difference
between the two methods, but DGA clearly outperforms the K2 algorithm as the BN becomes complex. This
results from the advantage of directed selection in DGA, which does not exist in the K2 algorithm.
In Table 2, the comparison of MER indicates that, when dealing with a BN of unknown node ordering, the K2
algorithm performs worse than DGA on the more complex networks. This is because, when the database
for the system is not sufficient to express all causalities in the BN, the greedy search method in K2 may reject
certain suitable nodes. This does not happen in DGA, because it explores the entire solution space through the
dual encoding of the BN.
Table 2 MER of the two methods
Net            1     2     3      4     5      6
NB             5     6     7      8     9      10
Na             4     7     9      10    12     14
MER /% (K2)    0     0     14.6   20    18.9   23.1
MER /% (DGA)   0     0     0      10    11.9   13.2
Fig. 8 shows the evolution process of DGA when there are 8 nodes in the BN structure. The method ran for
only about 20 generations before it found the fittest structure. That means about 2000 structures were evaluated
by DGA, which is only one-tenth of the workload of the K2 algorithm.
For further comparison of the accuracy of the two methods, structure (d) in Fig. 6 is taken as the target
structure for eight trials on the same database, under the assumption that the node order is known. See the
statistical results in Table 3.
[Figure: BN score (y-axis) versus generations (x-axis, 0 to 60)]
Fig. 8 Evolution process of DGA
It can be seen from Table 3 that the proposed DGA searches the structures much more effectively than the K2
algorithm. The reason might be that the proposed method is able to explore the entire solution space for the
fittest BN.
Table 3 Comparison with the target structure
Trial   K2 missed links   K2 wrong links   DGA missed links   DGA wrong links
1       1                 1                0                  1
2       0                 2                0                  0
3       1                 0                0                  0
4       2                 1                1                  0
5       1                 2                0                  2
6       1                 1                1                  0
7       2                 1                0                  1
4 Reliability Estimation via DGA-Built BN
This section provides an automated analysis of the reliability of the system presented in Fig. 9, and the
estimation results are then compared with the actual values provided by Ramirez-Marquez and Jiang[15]. A
database containing 500 instances of system behavior was generated by Monte Carlo sampling from the CPTs of all
nodes (root nodes have their prior probabilities). This database was supplied to the DGA to construct a BN.
[Figure: bridge system with elements labeled 1 to 6]
Fig. 9 Case bridge system
Table 4 Comparison of the estimation results
                                            Nominal reliability
Component                                   Case 1      Case 2
1                                           0.90        0.85
2                                           0.80        0.8
3                                           0.90        0.95
4                                           0.93        0.9
5                                           0.83        0.875
6                                           0.85        0.85
Sys. reliability by DGA                     0.80425     0.80971
Sys. reliability (reported in Ref. [15])    0.813388    0.815113
The results of the DGA-built BN reliability model presented in Table 4 demonstrate that, based on
the provided data, the proposed method is able to reach a very close approximation to the actual system
reliability. That is, the traditional approaches of graphically analyzing the interactions within a system can
be accurately replaced by the approach proposed in this paper.
5 Conclusions
Building BN models for the evaluation of system reliability has recently become a popular and widely studied
subject. The accuracy of the BN model strongly influences the correctness of the reliability estimation
results. This paper proposes a genetic approach to learning BN structures for system reliability estimation.
The special BN-coding scheme in this method ensures that all possible BN structures can be searched so as to
find the most suitable BN structure for the system. The simulation results show that this method works
efficiently, especially on complex systems without prior knowledge of the BN structure, and that it is able to
reach a very close approximation to the actual system reliability. That is, the traditional approaches of
building BN reliability models can be effectively replaced by the approach proposed in this paper.
References
[1] Zacks S. Introduction to Reliability Analysis: Probability Models and Statistical Methods [M]. Berlin:
Springer Science & Business Media, 2012: 3-12.
[2] Zhong X P, Ichchou M. Reliability Assessment of Complex Mechatronic Systems Using a Modified
Nonparametric Belief Propagation Algorithm [J]. Reliability Engineering and System Safety, 2010, 95(11):
1174-1185.
[3] Doguc O, Ramirez-Marquez J E. A Generic Method for Estimating System Reliability Using Bayesian
Networks [J]. Reliability Engineering and System Safety, 2009, 94(2): 542-550.
[4] Weber P, Medina-Oliva G, Simon C, et al. Overview on Bayesian Networks Applications for Dependability,
Risk Analysis and Maintenance Areas [J]. Engineering Applications of Artificial Intelligence, 2012, 25(4):
671-682.
[5] Botev Z I, L'Ecuyer P, Rubino G, et al. Static Network Reliability Estimation via Generalized Splitting [J].
Informs Journal on Computing, 2013, 25(1): 56-71.
[6] Doguc O, Ramirez-Marquez J E. An Automated Method for Estimating Reliability of Grid Systems Using
Bayesian Networks [J]. Reliability Engineering & System Safety, 2012, 104: 96-105.
[7] Barlow R E. Using Influence Diagrams [R]. California University Berkeley Operations Research Center,
USA: 1987.
[8] Almond R G. An Extended Example for Testing Graphical Belief [J]. Statistical Science Research Report,
1992, 6: 1-18.
[9] Lerner B, Malka R. Investigation of the K2 Algorithm in Learning Bayesian Network Classifiers [J]. Applied
Artificial Intelligence, 2011, 25(1): 74-96.
[10] Morales M M, Dominguez R G, Ramirez N C, et al. A Method Based on Genetic Algorithms and Fuzzy
Logic to Induce Bayesian Networks [C]. Proceedings of the Fifth Mexican International Conference in
Computer Science, Mexico, 2004: 176-180.
[11] Larrañaga P, Karshenas H, Bielza C, et al. A Review on Evolutionary Algorithms in Bayesian Network
Learning and Inference Tasks [J]. Information Sciences, 2013, 233: 109-125.
[12] Lee J, Chung W, Kim E. Structure Learning of Bayesian Networks Using Dual Genetic
Algorithm [C]. IEICE Transactions on Information and Systems, Japan, 2008: 32-43.
[13] Hartung S, Nichterlein A. NP-Hardness and Fixed-Parameter Tractability of Realizing Degree Sequences
with Directed Acyclic Graphs [J]. SIAM Journal on Discrete Mathematics, 2015, 29(4): 1931-1960.
[14] Bobbio A, Portinale L, Minichino M, et al. Improving the Analysis of Dependable Systems by Mapping
Fault Trees into Bayesian Networks [J]. Reliability Engineering and System Safety, 2001, 71(3): 249-260.
[15] Ramirez-Marquez J E, Jiang W. Confidence Bounds for the Reliability of Binary Capacitated Two-Terminal
Networks [J]. Reliability Engineering and System Safety, 2006, 91(7): 905-914.