Building BN-Based System Reliability Model by Dual Genetic Algorithm

YOU Wei-zhen(游威振), ZHONG Xiao-pin(钟小品)*
Shenzhen Key Laboratory of Electromagnetic Control, Shenzhen University, Shenzhen 518060, China

Abstract: A system reliability model based on a Bayesian network (BN) is built via an evolutionary strategy called the dual genetic algorithm (DGA). A BN is a probabilistic approach to analyzing relationships between stochastic events. In contrast with traditional methods, in which the BN model of a system is built by professionals, the DGA is proposed to analyze historical data automatically and to construct the BN used for estimating system reliability. The DGA searches the whole solution space of BN structures, so a more accurate BN model is obtained. The efficacy of the proposed method is shown on examples from the literature.

Key words: Bayesian network (BN) model; dual genetic algorithm (DGA); system reliability; historical data

CLC number: TP277 Document code: A Article ID: 1672-5220

Introduction

System reliability is generally defined as the probability that a product does not fail during a defined period of time under given functional and surrounding conditions[1]. Reliability is among the most frequently considered aspects of systems such as mechanical or electric control systems. Newly developed technology and increasingly complex systems introduce more and more sources of unreliability that affect the environment and the safety of people. This makes estimating system reliability both important and challenging for system engineers[2]. Traditional approaches rest on the premise that the failure mechanism of the system is well understood[3]. For newly designed complex systems, however, it is difficult to understand the detailed relations among the components of the system. To solve this problem, Bayesian network structure learning theory has been proposed and developed by researchers as a new route to reliability estimation[4-6].
The earliest study of BN models for reliability estimation was carried out by Barlow[7], who compared Bayesian and non-Bayesian approaches to system reliability estimation when studying spherical pressure vessels. A graphical-belief environment, introduced by Almond[8], was developed for risk evaluation of large complex systems. Building an accurate BN structure is important but difficult, since even a small number of nodes gives rise to a very large space of possible BN structures. The well-known K2 algorithm searches for the best BN structure by analyzing a system database[9]. However, the K2 algorithm assumes that an ordering of the variables is given, so an optimal structure may not be found. Although traditional GA-based structure learning methods have been proposed[10-11], they do not provide a general way to search the entire BN space, because their genetic operators are not closed over the set of all possible BN structures. This paper therefore proposes a new method of BN structure learning called the dual genetic algorithm (DGA)[12], in which a particular BN is represented by a pair of chromosomes: an ordering chromosome and a connectivity chromosome. With special genetic operators, this method can learn all the topologies of the BN nodes as well as the ordering among them, thereby searching the whole solution space of the BN problem.

The paper is organized as follows. Section 1 provides a brief summary of BN for system reliability. Section 2 describes DGA for BN structure learning. Section 3 presents a simulation and a performance analysis of the proposed method. Section 4 performs the reliability estimation via the DGA-built BN model. Conclusions are provided in Section 5.
Received date: 2015-06-12
Foundation item: National Natural Science Foundation of China (No.61203184)
*Correspondence should be addressed to ZHONG Xiao-pin, Email: [email protected]

1 Bayesian Network for System Reliability

The BN is represented as a directed acyclic graph (DAG), in which the nodes represent variables and the directed links represent the dependence between pairs of nodes[13]. A node is called a root node if no arrows lead into it, a child node if at least one arrow leads into it, and a parent node if at least one arrow leads out of it. A root node can therefore also be a parent node; see the sample BN in Fig. 1. The BN is a probabilistic approach to analyzing and capturing the relationships between stochastic events. From this perspective, the variables of a BN are viewed as the components of a system, while the links represent the interactions among components that lead to the "success" or "failure" of the system. One can thus treat the BN as a description of the interactions among the components of the system. For ease of understanding, a binary digit represents the state of a component: 0 means "not functioning" and 1 means "working properly". In general, the strength of the dependence between two linked nodes is represented by a probability value. The joint probability of all variables, for a particular instantiation of the BN, is calculated as Eq. (1).

$$p(S) = \prod_{i=1}^{n} p(A_i \mid P(A_i)) \qquad (1)$$

where $S = \{A_1, A_2, \ldots, A_n\}$ represents all nodes in the BN and $P(A_i)$ represents the parents of $A_i$. In reliability estimation, each component is assigned a conditional probability table (CPT). For $A_i$, the CPT contains $p(A_i \mid P(A_i))$. Each parent of $A_i$ is instantiated as one particular state: Success or Failure.
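As a minimal sketch of Eq. (1), the product over CPT entries can be written out for a small binary network. The three-node structure, the node names, and every probability below are hypothetical values chosen only for illustration; they are not taken from the paper. Summing the product over all instantiations with a given node in state 1 yields that node's marginal probability of working.

```python
from itertools import product

# Hypothetical network: A1 and A2 are root nodes, A3 has the parent set (A1, A2).
parents = {"A1": (), "A2": (), "A3": ("A1", "A2")}

# Each CPT maps a tuple of parent states to p(node = 1 | parents).
# All probabilities are invented for illustration.
cpt = {
    "A1": {(): 0.9},
    "A2": {(): 0.8},
    "A3": {(0, 0): 0.0, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.95},
}


def joint_probability(state):
    """Eq. (1): multiply p(A_i | P(A_i)) over all nodes for one instantiation."""
    prob = 1.0
    for node, pa in parents.items():
        p_working = cpt[node][tuple(state[p] for p in pa)]
        prob *= p_working if state[node] == 1 else 1.0 - p_working
    return prob


def all_states():
    """Enumerate every 0/1 instantiation of the network."""
    names = list(parents)
    for values in product((0, 1), repeat=len(names)):
        yield dict(zip(names, values))


# The joint distribution sums to one; restricting the sum gives a marginal.
total = sum(joint_probability(s) for s in all_states())
p_a3_works = sum(joint_probability(s) for s in all_states() if s["A3"] == 1)
print(total, p_a3_works)
```

Storing only p(node = 1 | parents) is enough for binary components, since the probability of state 0 is its complement.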
If $A_i$ has $m$ parents, then there are $2^m$ different instantiations in its CPT. For example, in Fig. 1 the nodes $A_1$, $A_2$, $A_3$ are root nodes because no directed edges lead into them. That is, they are independent of each other. $A_4$ and $A_5$ have the parent sets $\{A_1, A_2\}$ and $\{A_2, A_3\}$ respectively, so both are child nodes.

Fig. 1 A sample BN structure (nodes $A_1$ to $A_5$ and the system behavior node)

The system behavior node represents the state of the system. If the CPTs of this BN model are known, the overall system success probability can be computed by applying the factorization of Eq. (1) and summing over the component states:

$$p(A_6 = 1) = \sum_{A_1=0}^{1} \sum_{A_2=0}^{1} \cdots \sum_{A_5=0}^{1} p(A_1)\, p(A_2)\, p(A_4 \mid A_1, A_2)\, p(A_3)\, p(A_5 \mid A_2, A_3)\, p(A_6 = 1 \mid A_4, A_5) \qquad (2)$$

where $A_6$ represents system behavior.

2 DGA for BN Construction

In the proposed DGA, the ordering chromosome represents the order of all nodes in the BN structure, and it is not fixed to one permutation. The connectivity chromosome is derived from an upper triangular matrix of binary digits, which also changes during the search. This combination ensures that the entire space of BN structures is searched for the fittest one.

2.1 Encoding BN structure

The ordering chromosome $X_o$ (subscript 'o' denotes 'ordering') shown below is an array of unrepeated integers ranging from 1 to $n$, where $x_i \in \{1, 2, \ldots, n\}$. If there are $m$ root nodes in the BN, then the first $m$ integers in $X_o$ represent the root nodes.

$$X_o = x_1 x_2 \cdots x_n, \qquad X_c = c_{1,2}\, c_{1,3} \cdots c_{1,n}\, c_{2,3} \cdots c_{n-2,n}\, c_{n-1,n}$$

In contrast, the connectivity chromosome $X_c$ (subscript 'c' denotes 'connectivity') is an array of binary digits that corresponds to the connectivity matrix $C$:

$$C = \begin{pmatrix} 0 & c_{1,2} & c_{1,3} & \cdots & c_{1,n-1} & c_{1,n} \\ 0 & 0 & c_{2,3} & \cdots & c_{2,n-1} & c_{2,n} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & c_{n-1,n} \\ 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}, \quad \text{where } c_{i,j} = \begin{cases} 1, & x_i \in P(x_j), \\ 0, & x_i \notin P(x_j). \end{cases}$$

$P(x_j)$ is the parent set of node $x_j$.
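The encoding above can be decoded mechanically. The sketch below is an illustration only: the function name `decode` and the choice of a Python list for $X_o$ and a bit string for $X_c$ are assumptions, not the authors' implementation. The bits are read row by row from the upper triangle of $C$, and a 1 at position $(i, j)$ makes the $i$th node in the ordering a parent of the $j$th. The sample values are the first chromosome pair that the paper gives for Fig. 2.

```python
def decode(order, bits):
    """Map an (ordering, connectivity) chromosome pair to {node: parents}.

    order: list of node labels, a permutation of 1..n (the X_o chromosome).
    bits:  string of n*(n-1)/2 characters '0'/'1' (the X_c chromosome),
           listing the upper triangle of C row by row.
    """
    n = len(order)
    if len(bits) != n * (n - 1) // 2:
        raise ValueError("connectivity chromosome has the wrong length")
    parents = {node: [] for node in order}
    k = 0  # running index into the bit string
    for i in range(n - 1):
        for j in range(i + 1, n):
            if bits[k] == "1":
                # c_{i,j} = 1: the i-th node in the ordering is a parent
                # of the j-th node in the ordering.
                parents[order[j]].append(order[i])
            k += 1
    return parents


# First sample chromosome pair from the paper (BN1 in Fig. 2).
bn1 = decode([1, 2, 6, 3, 5, 4], "110000100010010")
print(bn1)
```

Because every edge points from an earlier position in the ordering to a later one, any pair of chromosomes decodes to an acyclic graph, so no repair step is needed after crossover or mutation.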
The outstanding advantage of this structure for the connectivity chromosome is that, even without prior knowledge of the ordering of the nodes, it can still describe all possible relationships between the nodes. Fig. 2 is taken as an example.

Fig. 2 Two sample BNs: (a) BN1; (b) BN2

The $i$th ($i = 1, 2$) BN is represented as a coupled chromosome $(X_{io}, X_{ic})$. The two BNs are described as follows:

$$X_{1o} = 126354, \quad X_{1c} = 110000100010010;$$
$$X_{2o} = 543126, \quad X_{2c} = 110001100010011.$$

The corresponding connectivity matrices are as below:

$$C_1 = \begin{pmatrix} 0&1&1&0&0&0 \\ 0&0&0&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \\ 0&0&0&0&0&0 \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 0&1&1&0&0&0 \\ 0&0&1&1&0&0 \\ 0&0&0&0&1&0 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&1 \\ 0&0&0&0&0&0 \end{pmatrix}.$$

It is clear that when $X_o$ is fixed, all relations among the nodes in the BN can be represented by $X_c$. In turn, when $X_c$ is fixed, all possible orderings of the nodes can be represented by $X_o$. Therefore, the DGA encoding method can search all possible BN structures.

2.2 Searching method of DGA

When searching the full solution space of BN, DGA works in a similar way to the traditional GA. It has two main ingredients: a fitness function to evaluate each individual, and an evolutionary method to search for the individual with the highest fitness. The pseudo code of DGA is given in Fig. 3.

Input: Database $D$, a maximum $N$ for iterations, population size $n$ for every generation, crossover rate $p_c$, mutation rate $p_m$.
Output: Each node with its parent set.
(1) $i = 0$, initialize population $P(0)$, $size(P(0)) > n$.
(2) while $i < N$
  (2.1) evaluate $P(i)$ and select individuals for crossover;
  (2.2) crossover and mutation to produce new population $D'$;
  (2.3) choose best individual and save;
  (2.4) $P(i+1) = D' \cup P(i)$, $i = i + 1$;
(3) decode highest score individual into a BN;
(4) print out each node and its parents.

Fig. 3 Pseudo code of DGA

Although its evolution process is similar to that of conventional genetic algorithms, the proposed method differs in its crossover and mutation operators.

2.2.1 Fitness function

Inspired by the scoring function in the K2 algorithm, this paper introduces the fitness function as follows:

$$f_{BN} = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}! \qquad (3)$$

where $n$ is the number of nodes in the BN; $q_i$ is the number of instantiations of the parent set of node $X_i$ in $D$; $r_i$ is the number of possible values of $X_i$ in $D$; $N_{ijk}$ is the number of cases in $D$ in which $X_i$ takes its $k$th value and its parent set takes its $j$th instantiation; and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$ is the number of cases in $D$ in which the parent set of $X_i$ takes its $j$th instantiation.

2.2.2 Crossover in DGA

Since the crossover for the connectivity chromosome is similar to the crossover in a conventional genetic algorithm, this paper only presents how the crossover works on the ordering chromosome. Without loss of generality, we take Fig. 2 as an example: $X_{1o} = 126354$, $X_{2o} = 543126$. The crossover operator uniformly generates a random integer $m$ from the interval $(1, n)$ as the crossover point. Assuming that $m = 2$, the crossover keeps the first two genes of each parent, giving the partial offspring $X'_{1o} = 12$ and $X'_{2o} = 54$. Because $X'_{1o}$ already contains '1' and '2', its remaining genes are taken from $X_{2o}$: when the numbers '1' and '2' are removed from $X_{2o}$, we get '5436'. This leads to the following results: $X'_{1o} = 125436$, $X'_{2o} = 541236$. Supposing the connectivity chromosomes stay the same as in Fig. 2, the resulting BNs are shown below:

Fig. 4 BNs after crossover: (a) $X'_{1o}$; (b) $X'_{2o}$

2.2.3 Mutation in DGA

The plain mutation operator is sufficient for the connectivity chromosome, and the space of connectivity chromosomes is closed under it. This is not the case for the ordering chromosome. We assume there is an ordering chromosome $X_o = 126354$, see Fig. 5(a).
The mutation operator uniformly generates two random integers from the interval $[1, n]$ as mutation points: $r_1 = 1$, $r_2 = 3$. The resulting ordering chromosome is shown in Fig. 5(b).

Fig. 5 BNs before and after mutation: (a) BN before mutation; (b) BN after mutation

3 Simulation and Analysis

The correctness of the system BN model strongly influences the accuracy of the system reliability estimate. Incorrect associations in the BN model may lead to inaccuracies in the system-level reliability estimation. Once a correct BN is constructed, estimating system reliability is feasible and straightforward. Therefore, this section presents a simulation in which the performance of the proposed DGA is analyzed through a comparison with the well-known K2 algorithm. We first give a brief introduction to the K2 algorithm for BN construction and then describe the simulation.

3.1 A brief summary of K2 algorithm for BN construction

The K2 algorithm is a greedy heuristic search method that looks for the highest-scoring parent set of each node in the BN. When analyzing a node, it first assumes that the node has no parents and then incrementally adds the parents whose addition increases the score function. The search for that node stops when adding a further parent no longer increases the score[9]. The following steps summarize the K2 algorithm:
(1) Initialize the parent set $\pi_i$ for node $i$, and calculate the initial score $f(i, \pi_i)$;
(2) Keep adding to the parent set $\pi_i$ the nodes that increase the score $f$;
(3) When step (2) can no longer increase the score $f$, stop adding nodes and analyze the next node $j$;
(4) When all nodes are analyzed, print out each node $k$ and its parent set $\pi_k$.
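The four steps above can be sketched as a short greedy loop. This is an illustrative outline under stated assumptions, not the reference K2 implementation: the function name `k2_search`, the rule of adding the single best candidate per step, and the restriction of candidates to predecessors in the given ordering follow the usual description of K2, and the scoring function is passed in as a parameter. The `toy_score` used here is a made-up stand-in that simply rewards agreement with a hypothetical target structure.

```python
def k2_search(order, score):
    """Greedy parent selection for every node, given a node ordering.

    order: list of node labels; only earlier nodes may become parents.
    score: callable score(node, parent_list) -> number, higher is better.
    """
    parents = {}
    for pos, node in enumerate(order):
        current = []                      # step (1): empty parent set
        best = score(node, current)
        candidates = list(order[:pos])
        while candidates:
            # step (2): evaluate each remaining candidate, keep the best one
            new_best, choice = max(
                (score(node, current + [c]), c) for c in candidates
            )
            if new_best <= best:          # step (3): no gain, go to next node
                break
            current.append(choice)
            candidates.remove(choice)
            best = new_best
        parents[node] = current
    return parents                        # step (4): node -> parent set


# Hypothetical target structure, used only to build a toy score.
target = {1: set(), 2: {1}, 3: {1, 2}}


def toy_score(node, parent_list):
    """Made-up score: minus the number of parent mismatches with the target."""
    return -len(set(parent_list) ^ target[node])


result = k2_search([1, 2, 3], toy_score)
reversed_result = k2_search([3, 2, 1], toy_score)
print(result, reversed_result)
```

With the ordering 1, 2, 3 the toy target is recovered, while with the reversed ordering every parent set comes back empty, because the true parents are never offered as candidates. This is the dependence on a given ordering that motivates the dual encoding.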
The score function in the K2 algorithm is as follows:

$$f(i, \pi_i) = \prod_{j=1}^{q_i} \frac{(d_i - 1)!}{(N_{ij} + d_i - 1)!} \prod_{k=1}^{d_i} N_{ijk}! \qquad (4)$$

where $X_i$ has $d_i$ possible values and $q_i$ instantiations of its parent set in $D$, $N_{ijk}$ is the number of cases in $D$ in which $X_i$ takes its $k$th value and its parent set takes its $j$th instantiation, and $N_{ij}$ is the total number of cases in $D$ in which the parent set of $X_i$ takes its $j$th instantiation.

3.2 Simulation and analysis

The simulation analyzed several BN structures representing systems with various numbers of components; some of them are displayed in Fig. 6. Based on the mapping between the fault tree model of a system and its BN model[14], a series of historical observation data is generated by Monte Carlo simulation from the predetermined CPTs of every node in the BNs of Fig. 6. As explained in the Introduction, for newly designed complex systems it is difficult to understand the system structure in detail, so the ordering of the nodes in the BN model may not be clear to engineers. Therefore, the simulation is run under two assumptions about the ordering of nodes: (a) known; (b) unknown. The simulation was repeated ten times under each assumption, and the mean CPU time (MCT) and the mean accuracy (MA) are shown in Table 1. The accuracy is calculated as $1 - \varepsilon$, where the error rate is defined as $\varepsilon = (N_m + N_a) / N_w$, in which $N_m$ and $N_a$ are the number of missed links and the number of wrongly added links in the constructed BN, respectively. The number of links in the target BN is denoted by $N_w$.

Fig. 6 BNs tested in simulation: (a) n=5; (b) n=6; (c) n=7; (d) n=8; (e) n=9; (f) n=10

In Table 1, "Net" denotes "Network", "NB" denotes "number of nodes in the BN", "Na" denotes "number of associations in the BN".
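The score of Eq. (4) and the accuracy measure defined above can both be sketched in a few lines. The function names, the data layout (a list of dictionaries mapping node names to observed states), and the sample data are assumptions made for illustration. The score is computed in logarithmic form with `math.lgamma`, a substitution that is not in the paper: it leaves the ranking of structures unchanged and avoids underflow for scores as small as those reported in Fig. 8.

```python
from collections import Counter
from math import lgamma


def log_k2_score(data, node, parents, arity=2):
    """Logarithm of Eq. (4) for one node.

    data:    list of dicts {node_name: observed state}.
    parents: list of parent node names (may be empty).
    arity:   number of possible values of the node (d_i), 2 for binary.
    """
    counts = Counter()  # (parent instantiation j, value k) -> N_ijk
    totals = Counter()  # parent instantiation j -> N_ij
    for row in data:
        j = tuple(row[p] for p in parents)
        counts[(j, row[node])] += 1
        totals[j] += 1
    score = 0.0
    for n_ij in totals.values():
        # log[(d_i - 1)! / (N_ij + d_i - 1)!], using lgamma(m + 1) = log(m!)
        score += lgamma(arity) - lgamma(n_ij + arity)
    for n_ijk in counts.values():
        score += lgamma(n_ijk + 1)  # log(N_ijk!)
    # Parent instantiations that never occur contribute a factor of 1.
    return score


def link_accuracy(target_edges, learned_edges):
    """Accuracy 1 - (N_m + N_a) / N_w for directed edges given as pairs."""
    target, learned = set(target_edges), set(learned_edges)
    missed = len(target - learned)        # N_m: links absent from the result
    wrongly_added = len(learned - target) # N_a: links not in the target
    return 1.0 - (missed + wrongly_added) / len(target)


# Made-up data in which B always copies A.
data = [{"A": a, "B": a} for a in (0, 1, 0, 1, 1, 0, 1, 0)]
with_parent = log_k2_score(data, "B", ["A"])
without_parent = log_k2_score(data, "B", [])
accuracy = link_accuracy(
    [(1, 2), (1, 3), (2, 3), (3, 4)],
    [(1, 2), (1, 3), (3, 4), (2, 4)],
)
print(with_parent, without_parent, accuracy)
```

On this sample data the score with A as a parent of B is higher than the score with no parent, which is the comparison a greedy or evolutionary search relies on.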
It can be observed that the running times of both algorithms depend strongly on the number of nodes in the BN structure. This is because, as the number of nodes increases, the number of possible associations in the BN grows quickly, resulting in more calculations. Table 1 also shows that DGA takes more time to analyze a BN than the K2 algorithm. This is because, when the ordering of nodes is fixed, the K2 algorithm uses greedy search to avoid examining unrelated nodes, while DGA applies an evolutionary method in its computation. However, the accuracy listed in the last two columns shows that DGA outperforms the K2 algorithm, especially when the BN has a large number of nodes or associations. The most probable reason is that the observation data do not strongly support some of the detailed relationships between the nodes. This makes the greedy search in K2 prematurely discard nodes that should have been chosen, which can be seen as a defect of the K2 algorithm. In contrast, because of the randomness and directed selection of the evolutionary method in DGA, the necessary nodes can be preserved and used to reconstruct a better individual.

Table 1 Simulation results for BNs in Fig. 6

Net  NB  Na  MCT/s (K2)  MCT/s (DGA)  MA/% (K2)  MA/% (DGA)
1    5   4   0.51        1.1          100        100
2    6   7   0.86        2.9          100        100
3    7   9   1.44        11.8         90         100
4    8   10  4.80        39.6         90         100
5    9   12  12.78       97.9         87.5       93.3
6    10  14  63.82       231.1        80         90

The simulation under the second assumption was designed to compare the two methods on the following aspects: (1) average running time; (2) mean error rate (MER), see Fig. 7 and Table 2.

Fig. 7 Mean running time of two methods (DGA vs K2; x-axis: number of nodes in BN, 5 to 10; y-axis: running time (s), 0 to 1500)

The comparison in Fig. 7 demonstrates that, when the BN structure is simple, there is no big difference between the two methods, but DGA clearly outperforms the K2 algorithm as the BN becomes complex.
This phenomenon results from the advantage of directed selection in DGA, which does not exist in the K2 algorithm. In Table 2, the comparison of MER indicates that, when dealing with a BN of unknown node ordering, the K2 algorithm never outperforms DGA and is clearly worse on the larger networks. This is because, when the database for the system is not sufficient to express all causalities in the BN, the greedy search in K2 may reject certain suitable nodes. This does not happen in DGA, which explores the entire solution space through the dual encoding of the BN.

Table 2 MER of the two methods

Net  NB  Na  MER/% (K2)  MER/% (DGA)
1    5   4   0           0
2    6   7   0           0
3    7   9   14.6        0
4    8   10  20          10
5    9   12  18.9        11.9
6    10  14  23.1        13.2

Fig. 8 shows the evolution process of DGA when there are 8 nodes in the BN structure. The method needed only about 20 generations to find the fittest structure. That means about 2000 structures are evaluated by DGA, which is only one-tenth of the workload of the K2 algorithm. For further comparison of the accuracy of the two methods, structure (d) in Fig. 6 is taken as the target structure for eight trials on the same database, under the assumption that the node order is known. See the statistical results in Table 3.

Fig. 8 Evolution process of DGA (x-axis: generations, 0 to 60; y-axis: BN score, in units of $10^{-145}$)

It can be seen from Table 3 that the proposed DGA searches the structures much more effectively than the K2 algorithm. The reason might be that the proposed method is able to explore the entire solution space for the fittest BN.

Table 3 Comparison with the target structure

Trial  K2 missed links  K2 wrong links  DGA missed links  DGA wrong links
1      1                1               0                 1
2      0                2               0                 0
3      1                0               0                 0
4      2                1               1                 0
5      1                2               0                 2
6      1                1               1                 0
7      2                1               0                 1

4 Reliability Estimation via DGA-Built BN

This section provides an automated analysis of the reliability of the system presented in Fig. 9, and the estimation results are then compared with the actual values provided in Ramirez-Marquez and Jiang[15].
A database containing 500 instances of system behavior was generated by Monte Carlo simulation from the CPTs of all nodes (root nodes have their prior probabilities). This database was fed to DGA to construct a BN.

Fig. 9 Case bridge system (components 1 to 6)

Table 4 Comparison of the estimation results

                                          Nominal reliability
Component                                 Case 1     Case 2
1                                         0.90       0.85
2                                         0.80       0.8
3                                         0.90       0.95
4                                         0.93       0.9
5                                         0.83       0.875
6                                         0.85       0.85
Sys. reliability by DGA                   0.80425    0.80971
Sys. reliability (reported in Ref. [15])  0.813388   0.815113

The results of the DGA-built BN reliability model in Table 4 demonstrate that, based on the provided data, the proposed method reaches a very close approximation to the actual system reliability. That is, the traditional approach of analyzing the interactions of a system graphically can be accurately replaced by the approach proposed in this paper.

5 Conclusions

Building BN models for the evaluation of system reliability has recently become a popular and widely studied subject. The accuracy of the BN model strongly influences the correctness of the reliability estimation results. This paper proposes a genetic approach to learning BN structures for system reliability estimation. The special BN-coding scheme in this method ensures that all possible BN structures can be searched so as to find the most suitable BN structure for the system. The simulation results show that this method works efficiently, especially on complex systems without prior knowledge of the BN structure, and that it reaches a very close approximation to the actual system reliability. That is, the traditional approaches to building BN reliability models can be effectively replaced by the approach proposed in this paper.

References

[1] Zacks S. Introduction to Reliability Analysis: Probability Models and Statistical Methods [M]. Berlin: Springer Science & Business Media, 2012: 3-12.
[2] Zhong X P, Ichchou M.
Reliability Assessment of Complex Mechatronic Systems Using a Modified Nonparametric Belief Propagation Algorithm [J]. Reliability Engineering and System Safety, 2010, 95(11): 1174-1185.
[3] Doguc O, Ramirez-Marquez J E. A Generic Method for Estimating System Reliability Using Bayesian Networks [J]. Reliability Engineering and System Safety, 2009, 94(2): 542-550.
[4] Weber P, Medina-Oliva G, Simon C, et al. Overview on Bayesian Networks Applications for Dependability, Risk Analysis and Maintenance Areas [J]. Engineering Applications of Artificial Intelligence, 2012, 25(4): 671-682.
[5] Botev Z I, L'Ecuyer P, Rubino G, et al. Static Network Reliability Estimation via Generalized Splitting [J]. Informs Journal on Computing, 2013, 25(1): 56-71.
[6] Doguc O, Ramirez-Marquez J E. An Automated Method for Estimating Reliability of Grid Systems Using Bayesian Networks [J]. Reliability Engineering & System Safety, 2012, 104: 96-105.
[7] Barlow R E. Using Influence Diagrams [R]. California University Berkeley Operations Research Center, USA: 1987.
[8] Almond R G. An Extended Example for Testing Graphical Belief [J]. Statistical Science Research Report, 1992, 6: 1-18.
[9] Lerner B, Malka R. Investigation of the K2 Algorithm in Learning Bayesian Network Classifiers [J]. Applied Artificial Intelligence, 2011, 25(1): 74-96.
[10] Morales M M, Dominguez R G, Ramirez N C, et al. A Method Based on Genetic Algorithms and Fuzzy Logic to Induce Bayesian Networks [C]. Proceedings of the Fifth Mexican International Conference in Computer Science, Mexico, 2004: 176-180.
[11] Larrañaga P, Karshenas H, Bielza C, et al. A Review on Evolutionary Algorithms in Bayesian Network Learning and Inference Tasks [J]. Information Sciences, 2013, 233: 109-125.
[12] Lee J, Chung W, Kim E. Structure Learning of Bayesian Networks Using Dual Genetic Algorithm [C]. IEICE Transactions on Information and Systems, Japan, 2008: 32-43.
[13] Hartung S, Nichterlein A.
NP-Hardness and Fixed-Parameter Tractability of Realizing Degree Sequences with Directed Acyclic Graphs [J]. SIAM Journal on Discrete Mathematics, 2015, 29(4): 1931-1960.
[14] Bobbio A, Portinale L, Minichino M, et al. Improving the Analysis of Dependable Systems by Mapping Fault Trees into Bayesian Networks [J]. Reliability Engineering and System Safety, 2001, 71(3): 249-260.
[15] Ramirez-Marquez J E, Jiang W. Confidence Bounds for the Reliability of Binary Capacitated Two-Terminal Networks [J]. Reliability Engineering and System Safety, 2006, 91(7): 905-914.