Marginal Distributions in Evolutionary Algorithms

Martin Pelikan, Slovak Technical University, 812 37 Bratislava, Slovakia, email: [email protected], tel/fax: +421-7-5325 308 / +421-7-393 198

Heinz Muehlenbein, GMD Forschungszentrum Informationstechnik, D-53754 Sankt Augustin, Germany, email: [email protected], tel/fax: +49-2241-14 2405 / +49-2241-14 2889

Abstract

This paper describes two gene pool recombination operators. Both operators are based on estimating the distribution of the parents and using this estimate to generate new individuals. The Univariate Marginal Distribution Algorithm (UMDA) uses simple univariate distributions. It is a perfect algorithm for linear problems. The Bivariate Marginal Distribution Algorithm (BMDA) is an extension of UMDA. In BMDA, the most important pair dependencies are taken into account. The dependencies are measured by Pearson's chi-square statistic. The structure of a problem is discovered during optimization. BMDA works well for linear problems as well as for problems with interacting genes.

1 Introduction

Evolutionary algorithms work over populations of strings. The main schema of evolutionary algorithms is simple. The initial population is generated randomly. From the current population, a set of high-quality individuals is selected first. The better the individual, the bigger the chance of its selection. The information contained in the set of selected individuals is then used to create the offspring population. New individuals then replace a part of the old population or the whole old population. The process of selection, processing the information from the selected set, and incorporating new individuals into the old population is repeated until the population satisfies the termination criteria. In a simple genetic algorithm, the information contained in the selected set is processed using Mendelian crossover and mutation on pairs of individuals.
Crossover combines the information contained in two individuals by swapping some genes between them. Mutation is a small perturbation of the genome that preserves the diversity of the population and introduces new information. The theory of GAs is based on the fundamental theorem, which claims that the number of schemata with fitness above the population average increases exponentially. Schemata with a fitness lower than average vanish exponentially. When schemata of a large defining length are needed to obtain the optimum, a simple genetic algorithm does not work well. The other approach to processing the information contained in the selected set is to estimate the distribution of the selected individuals and to generate new individuals according to this distribution. A general schema of these algorithms is called the Estimation of Distribution Algorithm (EDA) [1]. The estimation of the distribution of the selected set is a very complex problem. It has to be done efficiently, and the estimate should be able to cover a large number of problems. A general implementation of EDA was presented in [1]. The weak point of this implementation was the determination of the distribution. The Univariate Marginal Distribution Algorithm (UMDA) [2] uses simple univariate marginal distributions. The theory shows that UMDA works perfectly for linear problems. It also works very well for problems without many significant dependencies. The Bivariate Marginal Distribution Algorithm (BMDA) [3] is an extension of UMDA. It is based on bivariate marginal distributions. Bivariate distributions make it possible to take the most important pair dependencies into account. BMDA therefore performs well for linear as well as quadratic problems. A problem arises for problems with significant dependencies of a higher order, although many such problems can be solved efficiently by BMDA as well.
If the structure of a problem were known, there would be a way to cope with dependencies of a higher order. The Factorized Distribution Algorithm (FDA) [4] works very efficiently for decomposable problems of a known structure. It requires prior knowledge about the problem decomposition. This is not needed by UMDA, because UMDA fixes the distribution to take each gene independently. BMDA learns the structure of a problem during the optimization process, but its model covers well only problems with pair dependencies that do not form cycles. However, problems with a more complex dependency model are approximated with this model. In the following text, only chromosomes represented by binary strings of fixed length will be considered. The algorithms can easily be extended to strings over any finite alphabet. The first position in a string will be referred to as the 0th position.

2 Univariate Marginal Distribution Algorithm

The Univariate Marginal Distribution Algorithm (UMDA) uses the simplest way to estimate the distribution. It belongs to the EDA schema. The estimation of the distribution of the selected set is done with a very simple linear model, the so-called univariate marginal distributions. Let us denote the chromosome length by n. For each position i ∈ {0, ..., n−1} and each possible value x_i ∈ {0, 1} on this position, we define the univariate marginal frequency p_i(x_i) for a set P as the frequency of strings in P that have the value x_i on the ith position. Given the univariate marginal frequencies for the set of selected individuals, the distribution of the parents is estimated as

  p(X) = ∏_{i=0}^{n−1} p_i(x_i)  (1)

New individuals are generated according to this distribution. For the selected set, the univariate marginal frequencies are calculated first. Then, for each new individual, the bit on the ith position is generated using the univariate frequencies p_i(a): it is set to the value a with probability p_i(a). UMDA works perfectly for linear problems, as was shown in [2].
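The estimation and sampling steps just described can be written down in a few lines. The following is a minimal Python sketch, not from the paper; the function names are illustrative.

```python
import random

def umda_model(selected):
    """Estimate the univariate marginal frequencies p_i(1) from a
    selected set of binary strings (the model of Equation 1)."""
    m = len(selected)
    n = len(selected[0])
    return [sum(ind[i] for ind in selected) / m for i in range(n)]

def umda_sample(p):
    """Generate one new individual: bit i is set to 1 with probability p[i]."""
    return [1 if random.random() < pi else 0 for pi in p]

# Example with n = 4 and three selected parents.
selected = [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]]
p = umda_model(selected)      # [1.0, 2/3, 0.0, 1/3]
child = umda_sample(p)
```

Because each bit is sampled independently, positions whose frequency has drifted to 0 or 1 are fixed in all offspring, which is why UMDA needs diversity-preserving parameters on harder problems.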
A problem arises for problems with significant dependencies among genes.

3 Bivariate Marginal Distribution Algorithm

The Bivariate Marginal Distribution Algorithm (BMDA) [3] also belongs to the EDA class of algorithms. It uses a more sophisticated distribution based on bivariate marginal frequencies. The most important dependencies are identified, and the distribution is estimated so as to preserve these identified dependencies. The estimation of the distribution is described in Section 3.1. In Section 3.2, the algorithm for the generation of new individuals according to the estimated distribution is described. Let us denote the chromosome length by n. For any two positions i ≠ j ∈ {0, ..., n−1} and any possible values x_i, x_j ∈ {0, 1} on these positions, we define the bivariate marginal frequency p_{i,j}(x_i, x_j) for a set P as the frequency of strings in P that have the values x_i and x_j on positions i and j, respectively. With the use of univariate and bivariate marginal frequencies, the conditional probability of the value x_i appearing on the ith position, given the value x_j on the jth position, can be calculated as

  p_{i,j}(x_i | x_j) = p_{i,j}(x_i, x_j) / p_j(x_j)  (2)

The important dependencies will be identified using Pearson's chi-square statistic [6] for the independence of two random variables. For two positions i and j, the statistic is given by

  X²_{i,j} = Σ_{x_i, x_j} (N p_{i,j}(x_i, x_j) − N p_i(x_i) p_j(x_j))² / (N p_i(x_i) p_j(x_j))  (3)

where the sum runs over all combinations of x_i and x_j and N is the population size. The two genes corresponding to positions i and j are independent at the 95% level if X²_{i,j} < 3.84.

3.1 Construction of a Dependency Graph

For the estimation of the distribution of the selected set, a dependency graph will be used. The graph is defined by three sets V, E, and R, i.e. G = (V, E, R), where V is the set of vertices, E ⊆ V × V is the set of edges, and R is a set containing one vertex from each connected component of G. In a dependency graph, each node corresponds to a position in a string.
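The statistic of Equation 3 can be computed directly from counts over the selected population. The following is a minimal Python sketch, not from the paper; the function name is illustrative.

```python
def chi_square(pop, i, j):
    """Pearson chi-square statistic (Equation 3) for positions i and j
    of a binary population given as a list of 0/1 lists."""
    N = len(pop)
    ci = [0, 0]               # univariate counts N * p_i(a)
    cj = [0, 0]               # univariate counts N * p_j(b)
    cij = [[0, 0], [0, 0]]    # bivariate counts N * p_{i,j}(a, b)
    for ind in pop:
        ci[ind[i]] += 1
        cj[ind[j]] += 1
        cij[ind[i]][ind[j]] += 1
    stat = 0.0
    for a in (0, 1):
        for b in (0, 1):
            expected = ci[a] * cj[b] / N   # N * p_i(a) * p_j(b)
            if expected > 0:
                stat += (cij[a][b] - expected) ** 2 / expected
    return stat
```

A value above 3.84 flags the pair as dependent at the 95% level, which for a 2x2 table corresponds to one degree of freedom of the chi-square distribution.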
There is a one-to-one correspondence between the vertices and the positions in a string. Thus, we can use the set of vertices V = {0, ..., n−1}, where vertex i corresponds to the ith position. As will be clear from the construction of the graph, the graph does not have to be connected; that means it does not have to be a tree. The dependency graph is always acyclic. It can be seen as a set of trees that are not mutually connected. The generation of new strings does not depend on the number of connected components of the graph. In the algorithm for the construction of a dependency graph, another set, denoted by D, is used. It contains all pairs of positions that are not independent at the 95% level (see Equation 3). The pseudo-code of the algorithm for the construction of a dependency graph follows. The time complexity of this algorithm is O(n³).

Algorithm for Construction of the Dependency Graph
1. set V ← {0, ..., n−1}, set A ← V, set E ← ∅
2. set v ← any vertex from A, add v into R
3. remove v from A
4. if there are no more vertices in set A, finish
5. if in D there are no more dependencies between any v ∈ A and v' ∈ V \ A, go to 2
6. set v to the vertex from A that maximizes X²_{v,v'} over all v ∈ A and v' ∈ V \ A
7. add the edge (v, v') into the set of edges E
8. go to step 3

3.2 Generation of New Individuals

Each individual is generated using the same algorithm. For the generation, the graph G = (V, E, R) computed by the previous algorithm is used. First, the values for all positions from R are generated using their univariate marginal frequencies. Then, the value on any position that is connected in the graph to an already generated position is generated using the conditional probabilities for this position, given the value on the already generated position connected to it. The pseudo-code of the algorithm for the generation of one new individual follows.

Algorithm for Generation of a New Individual Using the Dependency Graph
1. set K ← V
2. generate x_r for all r ∈ R using the univariate frequencies (set x_r to the value a with probability p_r(a)), set K ← K \ R
3. if K is already empty, finish
4. choose k from K such that there exists k' from V \ K connected to k in the graph G
5. generate x_k using the conditional probability given the value of x_{k'}, i.e. set it to the value a with probability p_{k,k'}(a | x_{k'})
6. remove k from the set K
7. go to 3

3.3 Description of BMDA

In BMDA, as in any evolutionary algorithm, the population is first generated randomly. Then, the set of good individuals is selected. Using this set, the dependency graph is constructed. New individuals are generated according to this graph. The created individuals are then incorporated into the old population. The pseudo-code of BMDA follows.

Bivariate Marginal Distribution Algorithm
1. set t ← 0, randomly generate the initial population P(0)
2. select the parents S(t) from P(t), calculate the univariate frequencies p_i and the bivariate frequencies p_{i,j} for the selected set S(t)
3. create a dependency graph G = (V, E, R) using the frequencies p_i and p_{i,j}
4. generate the set of new individuals O(t) using the dependency graph G and the frequencies p_i and p_{i,j}
5. replace some of the individuals in P(t) with the new individuals O(t), set t ← t + 1
6. if the termination criteria are not met, go to 2

The termination criterion due to the lack of diversity is defined as follows: if all univariate frequencies are closer than ε > 0 to 0 or 1, the algorithm is terminated. If this is the case, we say the algorithm ε-converged. In our experiments, we used this termination criterion with ε = 0.05.

4 Experiments

First, the fitness functions used will be described. The fitness definitions follow. For some functions, a permutation denoted by π will be used. The permutation is used to show how the ordering of genes affects the performance of the compared algorithms.
Onemax fitness function [2]

  f_onemax(x) = Σ_{i=0}^{n−1} x_i  (4)

Quadratic fitness function without overlapping [3]

  f_quadratic(x, π) = Σ_{i=0}^{n/2−1} f2(x_{π(2i)}, x_{π(2i+1)})  (5)

where f2 is defined as

  f2(u, v) = 0.9 − 0.9(u + v) + 1.9uv  (6)

Deceptive function of order 3 [3]

  f_3deceptive(x, π) = Σ_{i=0}^{n/3−1} f3(x_{π(3i)} + x_{π(3i+1)} + x_{π(3i+2)})  (7)

where x is a bit string, π is any permutation of order n, and f3 is defined as

  f3(u) = 0.9 if u = 0; 0.8 if u = 1; 0 if u = 2; 1 otherwise  (8)

Trap function of order 5 [5]

  f_trap5(x) = Σ_{i=0}^{n/5−1} f5(x_{5i} + x_{5i+1} + x_{5i+2} + x_{5i+3} + x_{5i+4})  (9)

where f5 is defined as

  f5(u) = 5 − u if u < 5; 5 otherwise  (10)

4.1 The Discovery of Dependencies

In this section, the evolution of dependencies is shown. The dependencies are measured by Pearson's chi-square statistic (see Equation 3). Since there are n(n−1)/2 different pairs of variables, it would be hard to show the evolution of all possible dependencies. Therefore, only the dependencies between the first gene (the 0th position) and the others are shown. A three-dimensional graph is used. The first axis stands for the gene whose dependency with the first gene is shown (i.e., it takes values from {1, 2, ..., n−1}). The second axis stands for the number of epochs, and the third axis stands for the chi-square statistic of the first gene and the gene corresponding to the value on the first axis, in the epoch corresponding to the value on the second axis. Two fitness functions were used. In Figure 1, the evolution of dependencies for the deceptive function of order 3 is shown (see Equation 7). In this fitness function, the first bit is correlated with the next two bits, which is clear from the basic definition of the function as well as from its decomposition [3]. The algorithm clearly discovers the right dependencies, and they gradually get stronger until the algorithm finds the optimum. The parameters of the algorithm were set to values that made it converge in most of the runs.
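For concreteness, the benchmark definitions above can be coded directly. The following is an illustrative Python sketch, not from the paper; it represents the permutation as an index list and the bit string as a 0/1 list.

```python
def fonemax(x):
    # Equation 4: the number of ones in the string.
    return sum(x)

def f3deceptive(x, perm):
    # Equations 7-8: sum of the deceptive subfunction f3 over triples
    # of positions selected by the permutation perm.
    def f3(u):
        return {0: 0.9, 1: 0.8, 2: 0.0}.get(u, 1.0)
    return sum(f3(x[perm[3*i]] + x[perm[3*i + 1]] + x[perm[3*i + 2]])
               for i in range(len(x) // 3))

def ftrap5(x):
    # Equations 9-10: sum of the order-5 trap over consecutive quintuples.
    def f5(u):
        return 5 - u if u < 5 else 5
    return sum(f5(sum(x[5*i:5*i + 5])) for i in range(len(x) // 5))
```

Note how both deceptive functions lead away from the optimum: the fewer ones in a block (short of the full block), the higher the subfunction value, so any algorithm that treats the bits independently is driven toward the wrong local optimum.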
The second function used is the trap function of order 5 (see Equation 9). Results for this function are shown in Figure 2. In this function, the first bit is correlated with the next four bits. The algorithm identifies the right dependencies, and they gradually get stronger.

Figure 1: The evolution of dependencies for f_3deceptive of size n = 30 (see Equation 7).

Figure 2: The evolution of dependencies for f_trap5 of size n = 20 (see Equation 9).

4.2 Comparisons

Due to the lack of space, only results for a few experiments will be presented. Comparisons were done between UMDA, BMDA, and a GA with one-point and uniform crossover. The experiments were done with three fitness functions. For all algorithms, the parameters were set so that convergence was achieved in all of 30 independent runs. The number of fitness function calls performed until convergence was compared. The first function is the linear onemax fitness function f_onemax (see Equation 4). Results for various problem sizes are shown in Figure 3. Similar experiments were done for the quadratic fitness function f_quadratic (see Equation 5). Results are shown in Figure 4. UMDA and the GA with uniform crossover performed much worse, so the corresponding results are not present in the graph. For the deceptive function of order 3 (see Equation 7), two different permutations were used. The permutation π1 was the identity. The permutation π3 was defined as follows

  π3(i) = 3 (i mod n/3) + ⌊3i/n⌋  (11)

5 Conclusions

For linear problems, UMDA works perfectly. BMDA needs a larger population in order to discover the right dependency structure of a problem. UMDA and the GA with uniform crossover work best for this class of problems, since they take the genes as independent. For the quadratic function, BMDA is clearly the best.
Figure 3: Fitness evaluations for f_onemax for various problem sizes (see Equation 4). Compared: UMDA, GA(uniform), GA(onepoint), BMDA.

Figure 4: Fitness evaluations for f_quadratic for various problem sizes (see Equation 5). Compared: BMDA, GA(onepoint).

Table 1: Fitness evaluations for f_3deceptive of size n = 30

                  fitness eval. for π1    fitness eval. for π3
  BMDA            17,550                  17,420
  GA (one-point)  4,977                   230,000
  GA (uniform)    > 650,000               > 650,000

BMDA uses a model that covers quadratic functions. That is why it easily beats all the other algorithms. For a different reordering, the gap between BMDA and a simple GA with one-point crossover enlarges [3]. The deceptive function of order 3 is not covered by the model of BMDA. Nevertheless, BMDA works quite well for this function. Moreover, its performance is independent of the order of the genes, so it wins when the distances between genes that correlate with each other are longer. For problems with dependencies of a higher order, BMDA does not work very well. The model it uses is not sufficient to cover the problem well. The solution to this problem might be the use of FDA. However, FDA requires problem-specific knowledge in the initial stage. This is not required by UMDA, BMDA, or the GA. If this were overcome, FDA would perform very well for all problems that are decomposable.

References

[1] Muehlenbein, H., Paass, G., 1996, From Recombination of Genes to the Estimation of Distributions, In Voigt, H. M., et al. (eds), Lecture Notes in Computer Science 1141: Parallel Problem Solving from Nature - PPSN IV, pp. 178-187.

[2] Muehlenbein, H., 1998, The Equation for Response to Selection and its Use for Prediction, Evolutionary Computation, 5, 303-346.
[3] Pelikan, M., Muehlenbein, H., 1998, The Bivariate Marginal Distribution Algorithm, submitted for publication.

[4] Muehlenbein, H., Rodriguez, A. O., 1998, Schemata, Distributions and Graphical Models in Evolutionary Optimization, submitted for publication.

[5] Kargupta, H., 1995, SEARCH, Polynomial Complexity, and the Fast Messy Genetic Algorithm, dissertation, University of Illinois, IL.

[6] Marascuilo, L. A., McSweeney, M., 1977, Nonparametric and Distribution-Free Methods for the Social Sciences, Brooks/Cole Publishing Company, CA.