Marginal Distributions in Evolutionary Algorithms
Martin Pelikan
Slovak Technical University
812 37 Bratislava
Slovakia
email: [email protected]
tel/fax: +421-7-5325 308 / +421-7-393 198
Heinz Muehlenbein
GMD Forschungszentrum Informationstechnik
D-53754 Sankt Augustin
Germany
email: [email protected]
tel/fax: +49-2241-14 2405/ +49-2241-14 2889
Abstract
In this paper, two gene pool recombination operators are described. Both operators are based on estimating the distribution of the parents and using this estimate to generate new individuals. The Univariate Marginal Distribution Algorithm (UMDA) uses simple univariate distributions and is a perfect algorithm for linear problems. The Bivariate Marginal Distribution Algorithm (BMDA) is an extension of UMDA. In BMDA, the most important pair dependencies are taken into account. The dependencies are measured by Pearson's chi-square statistic. The structure of a problem is discovered during the optimization. BMDA works well for linear problems as well as for problems with interacting genes.
1 Introduction
Evolutionary algorithms work over populations of strings. The main scheme of evolutionary algorithms is simple. The initial population is generated randomly. From the current population, a set of high-quality individuals is selected first. The better the individual, the bigger the chance of its selection. The information contained in the set of selected individuals is then used to create the offspring population. New individuals then replace a part of the old population or the whole old population. The process of selection, processing the information from the selected set, and incorporation of new individuals into the old population is repeated until the termination criteria are satisfied. In a simple genetic algorithm, the information contained in the selected set is processed using Mendelian crossover and mutation on pairs of individuals. Crossover combines the information contained in two individuals by swapping some of the genes between them. Mutation is a small perturbation of the genome that preserves the diversity of the population and introduces new information. The theory of GAs is based on the fundamental theorem, which claims that the number of instances of schemata with fitness above the population average increases exponentially, while schemata with fitness below the average vanish exponentially. When schemata of a large defining length are needed to obtain the optimum, a simple genetic algorithm does not work well.
The other approach to processing the information contained in the selected set is to estimate the distribution of the selected individuals and to generate new individuals according to this distribution. A general scheme for these algorithms is called the Estimation of Distribution Algorithm (EDA) [1]. The estimation of the distribution of the selected set is a very complex problem. It has to be done efficiently, and the estimate should be able to cover a large number of problems. A general implementation of EDA was presented in [1]. The weak point of this implementation was the determination of the distribution.
The Univariate Marginal Distribution Algorithm (UMDA) [2] uses simple univariate marginal distributions. The theory shows that UMDA works perfectly for linear problems. It works very well for problems without many significant dependencies.
The Bivariate Marginal Distribution Algorithm (BMDA) [3] is an extension of UMDA. It is based on bivariate marginal distributions, which allow the most important pair dependencies to be taken into account. BMDA therefore performs well for linear as well as quadratic problems. Problems arise for problems with significant dependencies of a higher order, although many such problems can be solved by BMDA efficiently as well.
If the structure of a problem were known, there would be a way to cope with dependencies of a higher order. The Factorized Distribution Algorithm (FDA) [4] works very efficiently for decomposable problems of a known structure. It requires prior knowledge about the problem decomposition. This is not needed by UMDA, because UMDA fixes the distribution to take each gene independently. BMDA learns the structure of a problem during the optimization process, but its model covers well only problems with pair dependencies that do not form cycles. Problems with a more complex dependency model are, however, approximated with this model.
In the following text, only chromosomes represented by binary strings of fixed length will be considered. The algorithms can easily be extended to strings over any finite alphabet. The first position in a string will be referred to as the 0th position.
2 Univariate Marginal Distribution Algorithm
The Univariate Marginal Distribution Algorithm (UMDA) uses the simplest way to estimate the distribution. It belongs to the EDA scheme. The estimation of the distribution of the selected set is done with a very simple linear model, the so-called univariate marginal distributions.
Let us denote the chromosome length by n. For each position i ∈ {0, ..., n-1} and each possible value x_i ∈ {0, 1} at this position, we define the univariate marginal frequency p_i(x_i) for a set P as the frequency of strings in P that have the value x_i at the ith position. Given the univariate marginal frequencies for the set of selected individuals, the distribution of the parents is estimated as
p(X) = \prod_{i=0}^{n-1} p_i(x_i)    (1)
New individuals are generated according to this distribution. For the selected set, the univariate marginal frequencies are calculated first. Then, for each new individual, the bit at the ith position is generated using the univariate frequencies p_i(a): it is set to a with probability p_i(a).
UMDA works perfectly for linear problems, as was shown in [2]. Problems arise for problems with significant dependencies among genes.
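To make the estimation and sampling steps concrete, the following is a minimal Python sketch (not from the paper; the function names, the NumPy representation of the population, and the random-number handling are illustrative assumptions):

import numpy as np

def univariate_frequencies(selected):
    # Estimate p_i(1) for every position; rows of `selected` are
    # individuals, columns are gene positions (0/1 values).
    return selected.mean(axis=0)

def umda_sample(p, num_offspring, rng):
    # Generate new individuals: bit i is set to 1 with probability p_i(1).
    n = p.shape[0]
    return (rng.random((num_offspring, n)) < p).astype(np.uint8)

# usage with a toy selected set
rng = np.random.default_rng(0)
selected = rng.integers(0, 2, size=(50, 10), dtype=np.uint8)
offspring = umda_sample(univariate_frequencies(selected), 100, rng)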
3 Bivariate Marginal Distribution Algorithm
The Bivariate Marginal Distribution Algorithm (BMDA) [3] also belongs to the EDA class of algorithms. It uses a more sophisticated distribution based on bivariate marginal frequencies. The most important dependencies are identified, and the distribution is estimated so as to preserve these identified dependencies. The estimation of the distribution is described in Section 3.1. In Section 3.2, the algorithm for the generation of new individuals according to the estimated distribution is described.
Let us denote the chromosome length by n. For any two positions i ≠ j, i, j ∈ {0, ..., n-1}, and any possible values x_i, x_j ∈ {0, 1} at these positions, we define the bivariate marginal frequency p_{i,j}(x_i, x_j) for a set P as the frequency of strings in P that have the values x_i and x_j at positions i and j, respectively. With the use of univariate and bivariate marginal frequencies, the conditional probability of the value x_i at the ith position given the value x_j at the jth position can be calculated as

p_{i,j}(x_i | x_j) = \frac{p_{i,j}(x_i, x_j)}{p_j(x_j)}    (2)
The important dependencies will be identified using Pearson's chi-square statistic [6] for independence of two random variables. For two positions i and j, the statistic is given by

X^2_{i,j} = \sum_{x_i, x_j} \frac{(N p_{i,j}(x_i, x_j) - N p_i(x_i) p_j(x_j))^2}{N p_i(x_i) p_j(x_j)}    (3)

where N is the number of strings in the set and the sum runs over all combinations of x_i and x_j. The two genes corresponding to positions i and j are considered independent at the 95% level if X^2_{i,j} < 3.84.
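As an illustration, Equation 3 for a single pair of positions can be computed as follows (a hedged Python sketch; the empty-cell guard is our own addition, not part of the paper):

import numpy as np

def pair_chi_square(selected, i, j):
    # Pearson chi-square statistic X^2_{i,j} of Equation 3; `selected`
    # is a 2-D 0/1 array with one row per individual. Values above 3.84
    # reject independence of positions i and j at the 95% level.
    N = selected.shape[0]
    chi2 = 0.0
    for xi in (0, 1):
        for xj in (0, 1):
            observed = np.sum((selected[:, i] == xi) & (selected[:, j] == xj))
            expected = N * np.mean(selected[:, i] == xi) * np.mean(selected[:, j] == xj)
            if expected > 0:  # guard against empty cells (our addition)
                chi2 += (observed - expected) ** 2 / expected
    return chi2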
3.1 Construction of a Dependency Graph
For the estimation of the distribution of the selected set, a dependency graph will be used. The graph will be defined by three sets V, E, and R, i.e. G = (V, E, R). V is the set of vertices, E ⊆ V × V is the set of edges, and R is a set containing one vertex from each of the connected components of G. In a dependency graph, each node corresponds to a position in a string; there is a one-to-one correspondence between the vertices and the positions in a string. Thus, we can use the set of vertices V = {0, ..., n-1}, where vertex i corresponds to the ith position. As will be clear from the construction of the graph, it does not have to be connected; that means the graph does not have to be a tree. The dependency graph is always acyclic. It can be seen as a set of trees that are not mutually connected. The generation of new strings does not depend on the number of connected components of the graph.
In the algorithm for the construction of a dependency graph, another set, denoted by D, is used. It contains all pairs of positions that are not independent at the 95% level (see Equation 3). The pseudo-code of the algorithm for the construction of a dependency graph follows. The time complexity of this algorithm is O(n^3).
Algorithm for Construction of the Dependency Graph
1. set V ← {0, ..., n-1}
   set A ← V
   set E ← ∅
2. v ← any vertex from A
   add v into R
3. remove v from A
4. if there are no more vertices in set A, finish
5. if in D there are no more dependencies between any v and v' where v ∈ A and v' ∈ V \ A, go to 2
6. set v to the vertex from A that maximizes X^2_{v,v'} over all v ∈ A and v' ∈ V \ A
7. add the edge (v, v') into the set of edges E
8. go to step 3
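A possible Python rendering of this construction is sketched below; it assumes the pairwise statistics are precomputed in a symmetric matrix and uses the 3.84 threshold in place of the set D (both assumptions for illustration):

def build_dependency_graph(chi2, threshold=3.84):
    # Greedy construction of the dependency forest G = (V, E, R).
    # chi2: symmetric n x n matrix of pairwise X^2 statistics.
    # Returns (edges, roots): edges are (new vertex, already added vertex)
    # pairs; roots holds one vertex per connected component (the set R).
    n = len(chi2)
    A = set(range(n))              # vertices not yet in the graph
    used = set()                   # V \ A
    edges, roots = [], []
    while A:                       # steps 2-4: start a new component
        v = min(A)
        roots.append(v)
        A.discard(v)
        used.add(v)
        while True:                # steps 5-8: attach strongest dependencies
            best = None
            for a in A:
                for u in used:
                    if chi2[a][u] >= threshold and \
                       (best is None or chi2[a][u] > chi2[best[0]][best[1]]):
                        best = (a, u)
            if best is None:
                break              # no dependency left: open a new component
            v, u = best
            edges.append((v, u))
            A.discard(v)
            used.add(v)
    return edges, roots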
3.2 Generation of New Individuals
Each individual is generated using the same algorithm. For the generation, the graph G = (V, E, R) computed by the previous algorithm is used. First, the values for all positions from R are generated using their univariate marginal frequencies. Then, the value at any position that is connected to an already generated position in the graph is generated using the conditional probability for this position given the value at the already generated position connected to it. The pseudo-code of the algorithm for the generation of one new individual follows.
Algorithm for Generation of a New Individual Using the Dependency Graph
1. set K ← V
2. generate x_r for all r ∈ R using univariate frequencies (set it to the value a with probability p_r(a))
   set K ← K \ R
3. if K is already empty, finish
4. choose k from K such that there exists a k' from V \ K connected to k in the graph G
5. generate x_k using the conditional probability given the value of x_{k'}, i.e. set it to the value a with probability p_{k,k'}(a | x_{k'})
6. remove k from the set K
7. go to 3
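The sampling procedure might look as follows in Python (a sketch under the assumption that the conditional frequencies have been tabulated beforehand; the data layout is illustrative):

import numpy as np

def generate_individual(n, edges, roots, p, cond, rng):
    # p[i]          : univariate frequency p_i(1)
    # cond[(k, kp)] : pair (q0, q1) with q_x = p_{k,kp}(x_k = 1 | x_kp = x)
    # edges         : (child, parent) pairs as built by the graph algorithm
    x = np.full(n, -1, dtype=int)
    for r in roots:                    # step 2: roots from univariate frequencies
        x[r] = int(rng.random() < p[r])
    children = {}
    for child, parent in edges:
        children.setdefault(parent, []).append(child)
    stack = list(roots)
    while stack:                       # steps 3-7: walk each tree from its root
        kp = stack.pop()
        for k in children.get(kp, ()):
            x[k] = int(rng.random() < cond[(k, kp)][x[kp]])
            stack.append(k)
    return x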
3.3 Description of BMDA
In BMDA, as in any evolutionary algorithm, the population is first generated randomly. Then, a set of good individuals is selected. Using this set, the dependency graph is constructed. New individuals are generated according to this graph. The created individuals are then incorporated into the old population. The pseudo-code of BMDA follows.
Bivariate Marginal Distribution Algorithm
1. set t ← 0
   randomly generate the initial population P(0)
2. select the parents S(t) from P(t)
   calculate the univariate frequencies p_i and bivariate frequencies p_{i,j} for the selected set S(t)
3. create a dependency graph G = (V, E, R) using the frequencies p_i and p_{i,j}
4. generate the set of new individuals O(t) using the dependency graph G and the frequencies p_i and p_{i,j}
5. replace some of the individuals in P(t) with the new individuals O(t)
   set t ← t + 1
6. if the termination criteria are not met, go to 2
The termination criterion due to the lack of diversity is defined as follows: if all univariate frequencies are closer than ε > 0 to 0 or 1, the algorithm is terminated. If this is the case, we say the algorithm has ε-converged. In our experiments, we use this termination criterion with ε = 0.05.
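The following compact loop shows how the pieces fit together (an illustrative sketch reusing the functions from the previous sketches; truncation selection and full replacement are our assumptions, since the paper does not fix the selection and replacement schemes):

import numpy as np

def bmda(fitness, n, pop_size, num_selected, epsilon=0.05, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, n), dtype=np.uint8)
    while True:
        order = np.argsort([fitness(ind) for ind in pop])[::-1]
        selected = pop[order[:num_selected]]           # truncation selection
        p = selected.mean(axis=0)
        if np.all((p < epsilon) | (p > 1 - epsilon)):  # epsilon-converged
            return pop[order[0]]
        chi2 = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                chi2[i, j] = chi2[j, i] = pair_chi_square(selected, i, j)
        edges, roots = build_dependency_graph(chi2)
        cond = {}
        for k, kp in edges:                            # tabulate p_{k,kp}(1 | x)
            cond[(k, kp)] = tuple(
                selected[selected[:, kp] == x, k].mean()
                if np.any(selected[:, kp] == x) else p[k]
                for x in (0, 1))
        pop = np.array([generate_individual(n, edges, roots, p, cond, rng)
                        for _ in range(pop_size)])     # full replacement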
4 Experiments
First, the fitness functions used in the experiments will be described; their definitions follow. For some functions, a permutation denoted by π will be used. The permutation is used in order to show how the ordering of genes affects the performance of the compared algorithms.
Onemax fitness function [2]

f_{onemax}(x) = \sum_{i=0}^{n-1} x_i    (4)
Quadratic fitness function without overlapping [3]

f_{quadratic}(x, π) = \sum_{i=0}^{n/2 - 1} f_2(x_{π(2i)}, x_{π(2i+1)})    (5)

where f_2 is defined as

f_2(u, v) = 0.9 - 0.9(u + v) + 1.9uv    (6)
Deceptive function of order 3 [3]

f_{3deceptive}(x, π) = \sum_{i=0}^{n/3 - 1} f_3(x_{π(3i)} + x_{π(3i+1)} + x_{π(3i+2)})    (7)

where x is a bit string, π is any permutation of order n, and f_3 is defined as

f_3(u) = \begin{cases} 0.9 & \text{if } u = 0 \\ 0.8 & \text{if } u = 1 \\ 0 & \text{if } u = 2 \\ 1 & \text{otherwise} \end{cases}    (8)

Trap function of order 5 [5]

f_{trap5}(x) = \sum_{i=0}^{n/5 - 1} f_5(x_{5i} + x_{5i+1} + x_{5i+2} + x_{5i+3} + x_{5i+4})    (9)

where f_5 is defined as

f_5(u) = \begin{cases} 5 - u & \text{if } u < 5 \\ 5 & \text{otherwise} \end{cases}    (10)
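The four fitness functions transcribe directly into Python (a sketch; the permutation argument is assumed to be an integer index array, and the interaction term 1.9uv in f_2 follows the reconstruction of Equation 6 above):

import numpy as np

def onemax(x):                        # Equation 4
    return int(np.sum(x))

def f2(u, v):                         # Equation 6
    return 0.9 - 0.9 * (u + v) + 1.9 * u * v

def quadratic(x, perm):               # Equation 5
    y = x[perm]
    return sum(f2(y[2*i], y[2*i + 1]) for i in range(len(x) // 2))

def f3(u):                            # Equation 8
    return {0: 0.9, 1: 0.8, 2: 0.0}.get(u, 1.0)

def deceptive3(x, perm):              # Equation 7
    y = x[perm]
    return sum(f3(int(y[3*i] + y[3*i + 1] + y[3*i + 2]))
               for i in range(len(x) // 3))

def f5(u):                            # Equation 10
    return 5 - u if u < 5 else 5

def trap5(x):                         # Equation 9
    return sum(f5(int(np.sum(x[5*i:5*i + 5]))) for i in range(len(x) // 5))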
4.1 The Discovery of Dependencies
In this section, the evolution of dependencies is shown. Dependencies are measured by Pearson's chi-square statistic (see Equation 3). Since there are n(n-1) different pairs of variables, it would be hard to show the evolution of all possible dependencies. Therefore, only the dependencies between the first gene (the 0th position) and the others are shown. A three-dimensional graph is used. The first axis stands for the gene against which the shown dependencies are computed (i.e., it takes values from {1, 2, ..., n-1}). The second axis stands for the number of epochs, and the third axis stands for the chi-square statistic of the first gene and the gene corresponding to the value on the first axis in the epoch corresponding to the value on the second axis.
Two fitness functions were used. In Figure 1, the evolution of dependencies for the deceptive function of order 3 is shown (see Equation 7). In this fitness function, the first bit is correlated with the next two, which is clear from the basic definition of the function as well as from its decomposition [3]. The algorithm clearly discovers the right dependencies, and they gradually get stronger until the algorithm finds the optimum. The parameters of the algorithm were set to values that make it converge in most of the runs. The second function used is the trap function of order 5 (see Equation 9). Results for this function are shown in Figure 2. In this function, the first bit is correlated with the next four. The algorithm identifies the right dependencies, and they gradually get stronger.
Figure 1: The evolution of dependencies for f_{3deceptive} of size n = 30 (see Equation 7). [3-D plot; axes: position, epoch, chi-square statistic.]
Figure 2: The evolution of dependencies for f_{trap5} of size n = 20 (see Equation 9). [3-D plot; axes: position, epoch, chi-square statistic.]
4.2 Comparisons
Due to the lack of space, only results for a few experiments will be presented. Comparisons were done between UMDA, BMDA, and a GA with one-point and uniform crossover. The experiments were done with three fitness functions. For all algorithms, the parameters were set so that convergence was achieved in all of 30 independent runs. The number of fitness function calls performed until convergence was compared. The first function is the linear onemax fitness function f_{onemax} (see Equation 4). Results for various problem sizes are shown in Figure 3. Similar experiments were done for the quadratic fitness function f_{quadratic} (see Equation 5). Results are shown in Figure 4. UMDA and the GA with uniform crossover performed much worse, so the corresponding results are not present in the graph. For the deceptive function of order 3 (see Equation 7), two different permutations were used. The permutation π_1 was the identity. The permutation π_3 was defined as
π_3(i) = \left\lfloor \frac{n (i \bmod 3) + i}{3} \right\rfloor    (11)
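Under this reconstruction of Equation 11, π_3 places the three bits of each block n/3 positions apart, which a quick Python check illustrates (a hypothetical helper, assuming n is divisible by 3):

def pi3(i, n):
    # reconstructed permutation of Equation 11 (integer division as floor)
    return (n * (i % 3) + i) // 3

# for n = 9 the blocks {0,1,2}, {3,4,5}, {6,7,8} map to {0,3,6}, {1,4,7}, {2,5,8}
print([pi3(i, 9) for i in range(9)])  # -> [0, 3, 6, 1, 4, 7, 2, 5, 8]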
5 Conclusions
For linear problems, UMDA works perfectly. BMDA needs a larger population in order to discover the right dependency structure of the problem. UMDA and the GA with uniform crossover work best for this class of problems, since they treat genes as independent. For the quadratic function, BMDA is clearly the best.
Figure 3: Fitness evaluations for f_{onemax} for various problem sizes (see Equation 4). [Curves: UMDA, GA(uniform), GA(onepoint), BMDA; axes: size of the problem, number of fitness evaluations.]
Figure 4: Fitness evaluations for f_{quadratic} for various problem sizes (see Equation 5). [Curves: BMDA, GA(onepoint); axes: size of the problem, number of fitness evaluations.]
Table 1: Fitness evaluations for f_{3deceptive} of size n = 30

                      fitness eval. for π_1    fitness eval. for π_3
  BMDA                        17,550                   17,420
  GA (one-point)               4,977                > 650,000
  GA (uniform)               230,000                > 650,000
BMDA uses a model that covers quadratic functions. That is why it beats all the other algorithms easily. For a different reordering, the gap between BMDA and a simple GA with one-point crossover widens [3]. The deceptive function of order 3 is not covered by the model of BMDA; BMDA nevertheless works quite well for this function. Moreover, its performance is independent of the order of genes, so it wins when the distances between genes that correlate with each other are longer. For problems with dependencies of a higher order, BMDA does not work very well; the model it uses is not sufficient for covering the problem well. The solution to this problem might be the use of FDA. However, FDA requires problem-specific knowledge in the initial stage, which is not required by any of UMDA, BMDA, or the GA. If this were overcome, FDA would perform very well for all problems that are decomposable.
References
[1] Muehlenbein, H., Paaß, G., 1996, From Recombination of Genes to the Estimation of Distributions, in Voigt, H. M., et al. (eds.), Lecture Notes in Computer Science 1141: Parallel Problem Solving from Nature - PPSN IV, pp. 178-187.
[2] Muehlenbein, H., 1998, The Equation for Response to Selection and its Use for Prediction, Evolutionary Computation, 5, 303-346.
[3] Pelikan, M., Muehlenbein, H., 1998, The Bivariate Marginal Distribution Algorithm, submitted for publication.
[4] Muehlenbein, H., Rodriguez, A. O., 1998, Schemata, Distributions and Graphical Models in Evolutionary Optimization, submitted for publication.
[5] Kargupta, H., 1995, SEARCH, Polynomial Complexity, and the Fast Messy Genetic Algorithm, dissertation, University of Illinois, IL.
[6] Marascuilo, L. A., McSweeney, M., 1977, Nonparametric and Distribution-Free Methods for the Social Sciences, Brooks/Cole Publishing Company, CA.