Which Algorithm Should I Choose at any Point
of the Search: An Evolutionary Portfolio Approach
Shiu Yin Yuen
Chi Kin Chow
Xin Zhang
Department of Electronic Engineering
City University of Hong Kong
Hong Kong, China
[email protected]
[email protected]
[email protected]
ABSTRACT
Many good evolutionary algorithms have been proposed in the past. However, given a problem, one is frequently at a loss as to which algorithm to choose. In this paper, we propose a novel algorithm portfolio approach to address this problem. A portfolio of evolutionary algorithms is first formed. Artificial Bee Colony (ABC), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Composite DE (CoDE), Particle Swarm Optimization (PSO2011) and Self-adaptive Differential Evolution (SaDE) are chosen as the component algorithms. Each algorithm runs independently, with no information exchange. At any point in time, the algorithm with the best predicted performance is run for one generation, after which the performance is predicted again. The best algorithm then runs for the next generation, and the process goes on. In this way, algorithms switch automatically as a function of the computational budget. This novel algorithm is named Multiple Evolutionary Algorithm (MultiEA). Experimental results on the full set of 25 CEC2005 benchmark functions show that MultiEA outperforms i) the Multialgorithm Genetically Adaptive Method for Single Objective Optimization (AMALGAM-SO); ii) the Population-based Algorithm Portfolio (PAP); and iii) a multiple algorithm approach which chooses an algorithm randomly (RandEA). The properties of the prediction measures are also studied. The proposed portfolio approach is generic: it can be applied to portfolios composed of non-evolutionary algorithms as well.

Categories and Subject Descriptors
I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search - Heuristic methods; G.1.6 [Numerical Analysis]: Optimization - Global optimization.

General Terms
Algorithms, Experimentation

Keywords
Evolutionary algorithm, portfolio, global optimization

1. INTRODUCTION
Rapid advances in Evolutionary Computation (EC) have been witnessed in the past two decades. There are now many powerful Evolutionary Algorithms (EAs) that are applied in scientific and engineering applications to find good quality solutions for challenging optimization problems. Famous examples include the Genetic Algorithm (GA), Evolution Strategy (ES), Evolutionary Programming (EP), Particle Swarm Optimization (PSO), Differential Evolution (DE), Ant Colony Optimization (ACO), Artificial Immune System (AIS), Cultural Algorithm (CA), Estimation of Distribution Algorithm (EDA), Artificial Bee Colony Algorithm (ABC), Biogeography-based Optimization (BBO), and others [1].

In spite of the proliferation of algorithms that use the evolution metaphor, general users dealing with an optimization scenario have few readily available guidelines on which algorithm to choose. Frequently, one resorts to word of mouth or the fame of an algorithm, or tries algorithms out one by one in an exhaustive manner. The problem is compounded by the fact that individual algorithms need parameter tuning to obtain their best performance, which is computationally expensive or even prohibitive [2]. This state of affairs motivates this paper.

Usually an algorithm has a standard recommended set of parameters defined by its researchers, arrived at after many tests on benchmark functions and practical applications. Thus for each algorithm we use the recommended set of parameters and do not attempt the challenging problem of parameter tuning and control [2]. Instead, we believe that multiple algorithms can be complementary: an algorithm which does not work well on one problem is replaced by an algorithm that works well on it. The problem is then how to select algorithms.

Note also that the choice of algorithm should be a function of the computational budget. For example, one algorithm may converge fast to a shallow local optimum; another may converge more slowly but, given enough time, to a deeper local optimum; still another may converge the slowest but eventually reach the global optimum. Which algorithm should one choose? If fitness evaluations are expensive, then only a small computational budget is allowed and the first one should be chosen. If fitness evaluations are relatively inexpensive, or we have a design problem for which we can tolerate longer runs, then the second algorithm should be chosen. Finally, if fitness evaluations are cheap and we aim at solving a scientific problem for which finding the global optimum is essential, then the third algorithm should be chosen.
In this paper, an algorithm portfolio approach is advocated to tackle the problem. Our conceptual framework is simple: (1) put promising EAs together in a portfolio; (2) conduct an initialization in which each algorithm is run for some number of generations until there is a change in fitness; (3) use a predictive measure to predict the performance of each algorithm at the nearest common future point; (4) select the algorithm with the best predicted performance to run for one generation; and (5) repeat steps (3) and (4) until a given computational budget is reached.
Note that as the algorithm with the best predicted performance may be different at different stages of the search, our approach will switch from one algorithm to another and back automatically and seamlessly.

In a nutshell, we propose to choose, at any point of the search, the algorithm with the best predicted performance to run (for one generation). A typical scenario when running our algorithm is that, after some trials in which each algorithm runs in parallel and interacts indirectly, an algorithm whose predicted performance is better by a considerable margin stands out, and only it is run for quite some time. If it is a very good algorithm that excels at small, medium and large budgets, then only it will run from then on. However, if it is an algorithm that converges fast to a local optimum, as discussed above, then it will run for a while, and gradually its predicted performance will cease to compare well with that of the other algorithms. At that time, a second algorithm, which has a better predicted performance, will take over. Similar changes occur as the search progresses.

A novel online performance prediction metric is proposed to advise which algorithm should be chosen to generate the next generation of solutions. The metric is parameter-less - it does not introduce any new control parameter into the system - thus avoiding the difficult parameter tuning and control problem [2]. We name our algorithm Multiple Evolutionary Algorithm (MultiEA).

Currently in the EC community, CMA-ES [3] is one of the strongest algorithms; it is the clear winner of the BBOB2010 competition [4]. On the other hand, several improved variants of DE have won competitions held in recent years [5]; amongst these variants, Composite DE (CoDE) [6] and Self-adaptive DE (SaDE) [7] are the state of the art. For the influential swarm intelligence methods, the current "standard" PSO algorithm is PSO2011 [8], and ABC [9] is one of the most recently proposed methods, with the advantages of competitive performance and few control parameters. In this paper, ABC, CMA-ES, CoDE, PSO2011 and SaDE are chosen for further investigation, since they are the algorithms we would recommend to outsiders; other EC researchers may of course make other choices. In the companion paper [10], we report a novel way to automatically and parameter-lessly compose a portfolio, i.e., to find its member algorithms.

The rest of this paper is organized as follows: Section 2 discusses the relationships of our approach with existing works. Section 3 reports MultiEA. Section 4 reports the numerical experiments. Section 5 investigates the properties of the prediction measures. Section 6 concludes.

2. RELATIONSHIPS WITH EXISTING WORKS
2.1 Existing Works
Relatively little work has been done on using a portfolio of evolutionary algorithms to improve optimization effectiveness. Early works include [11, 12]. The recent work of Gagliolo and Schmidhuber [13] considered the problem as a multi-armed bandit problem of choosing between different time allocators, whose run time distribution is determined online through censored and uncensored data. Recently, Vrugt et al. [14] reported a self-adaptive multimethod search known as AMALGAM-SO. The idea is to use a set of q EAs {A1,...,Aq} that share a common population of N individuals. Initially, the number of offspring {N1,...,Nq} that each algorithm produces is fixed by the user, where N1+...+Nq = N. After the N offspring are generated, the N parents and the N offspring are sorted by decreasing fitness. Individuals with higher fitness are inserted into the population pool of the next generation one by one. To maintain diversity, an individual is inserted only if its Euclidean distance to every individual already in the pool is larger than a user-specified threshold. The process goes on until N new individuals are generated, which make up the next generation. Each algorithm has its own stopping criterion, and the multimethod search stops when one of the stopping criteria is fulfilled. Then the search is re-run with the population size doubled, and the numbers of offspring are recalculated by probability matching as Ni = N*Psi_i / (Psi_1+...+Psi_q), where Psi_i is the number of generations for which algorithm i is responsible for the fitness improvement. To prevent an algorithm from being driven extinct, the minimum Ni/N ratio is fixed by the user. To pass information from one run to the next, the best found solution is included as one of the individuals in the otherwise randomly generated initial population of the next run, provided that a numerical condition is fulfilled. On simple uni-modal problems, the method obtains performance similar to existing individual algorithms; on complex multi-modal problems, it is superior.
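For concreteness, the probability matching reallocation between runs can be sketched as follows. This is a minimal Python sketch under our own naming; it is not code from [14], and the rounding repair at the end is our assumption about how to keep the counts summing to N.

import numpy as np

def reallocate_offspring(N, psi, min_ratio):
    # Probability matching: N_i = N * psi_i / sum(psi), where psi_i is the
    # number of generations in which algorithm i improved the best fitness.
    psi = np.asarray(psi, dtype=float)
    counts = N * psi / psi.sum()
    # Prevent extinction: enforce the user-specified minimum N_i / N ratio.
    counts = np.maximum(counts, min_ratio * N)
    counts = np.floor(counts / counts.sum() * N).astype(int)
    counts[np.argmax(counts)] += N - counts.sum()  # repair rounding so sum == N
    return counts

# Example: reallocate_offspring(100, [12, 3, 5], 0.05) -> array([60, 15, 25])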
Recently, Peng et al. [15] proposed an algorithm called the Population-based Algorithm Portfolio (PAP). q EAs are run in parallel, each with its own population. There are two important parameters in the algorithm: the migration interval I and the migration size s. A migration across populations occurs every I generations. The migration process is as follows: for each algorithm, the s best individuals are found amongst all the individuals of the remaining (q-1) populations; these s individuals are appended to the algorithm's population, and then the worst s individuals in that population are discarded. It is experimentally found that PAP performs better than running each individual algorithm with the same computational budget. The authors conclude that the performance gain is due to the synergy of the algorithms through the migration process.
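The migration step might be sketched as follows; this is a Python sketch under our own data layout, assuming minimization and one fitness array per population, and none of these names come from [15].

import numpy as np

def migrate(pops, fits, s):
    # One PAP-style migration. pops[k] is an (n_k, D) array of individuals
    # and fits[k] the matching fitness vector of population k (minimization).
    q = len(pops)
    new_pops, new_fits = [], []
    for k in range(q):
        # Pool the individuals of the remaining q-1 populations.
        other_x = np.vstack([pops[j] for j in range(q) if j != k])
        other_f = np.concatenate([fits[j] for j in range(q) if j != k])
        best = np.argsort(other_f)[:s]          # the s best migrants
        x = np.vstack([pops[k], other_x[best]])
        f = np.concatenate([fits[k], other_f[best]])
        keep = np.argsort(f)[:len(pops[k])]     # then drop the worst s
        new_pops.append(x[keep])
        new_fits.append(f[keep])
    return new_pops, new_fits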
A related approach is the classic algorithm selection paradigm [16] (see also the recent survey [17]). The approach aims to extract some distinguishing features from the problem, and uses machine learning to map the features to the most suitable algorithm.
2.2 Limitations of Existing Works
[11] is designed for algorithms that are guaranteed to find the globally optimal solution (Las Vegas algorithms). However, the usual applications of EAs do not require or guarantee finding the global optimum (i.e., they are Monte Carlo algorithms). [12] requires a problem class to have been well defined, with previous history used to advise the algorithm selection. [13] also considers Las Vegas algorithms and a specific problem class; moreover, the bandit metaphor may not be a suitable one for a Monte Carlo type optimization algorithm. These works are not applicable to the scenario in which a multiple algorithm EA is requested to handle a single problem instance with little or no applicable prior knowledge, which is the case considered in this paper.

All existing algorithms invariably introduce more control parameters (e.g. the definition of risk in [11]; the risk aversion constant in the utility function of [12]; the choices and parameters of the time allocators in [13]; the initial number of offspring for each algorithm, the minimum probability, the diversity threshold, the stopping criteria for each algorithm, and the condition on the best found solution in [14]; and the two migration parameters in [15]). A careful tuning of these control parameters is needed. Such tuning is computationally expensive [2], and even when it has been done, there is no guarantee that it will work well on a new, unknown problem. While recent researches [14, 15] have demonstrated that a multiple algorithm approach is promising, they take a self-adaptive approach, i.e., they incorporate the algorithm selection decision within the evolutionary process. It is known that a self-adaptive approach does not completely mitigate the problem of premature convergence, as the whole population may become focused on only a part of the search space, which then biases the future available search strategies. Moreover, offspring generated by different algorithms may mislead each other. Another disadvantage of the self-adaptive approach is that it sheds no physical insight on why an algorithm is good for a problem, as it has no easily understood physical model for credit assignment; the process likewise sheds no new insight on which algorithm is good for which problem. The algorithm selection approach [16, 17] assumes that meaningful features can be extracted that fully characterize the problem class, and that the difficult feature-algorithm mapping can be learnt satisfactorily. Even if these two requirements are met, given an unknown problem, a preliminary uniform sampling has to be done to extract the features, and this sampling may be biased due to the alias errors generated by under-sampling [17].

2.3 Merits of Our Approach
In this paper, we present a novel alternative approach which overcomes the above limitations. It does not require the global optimal solution to be known; it works on a single problem instance with little or no a priori knowledge; it does not introduce any new control parameter; it uses a winner-take-all strategy rather than probability matching; and since the algorithms run independently, it does not suffer from the limitations of a self-adaptive approach, and offspring from different algorithms cannot mislead each other. An easily understood physical model for credit assignment based on fitness prediction is introduced. The approach also gives direct insights on "which algorithm is good for which problem" as a function of the computational budget. Existing multiple algorithm approaches [14, 15] find a good synergy by self-adaptive information exchange between the algorithms through a common population. The present approach aims instead to select the best algorithm at each instant by its predicted future performance, based on past history. It switches between algorithms and back dynamically as new search information is obtained, which enables revised predictions to be made; a synergy is achieved indirectly. Our approach is complementary to existing works, and furnishes an alternative and fresh view of the problem.
3. MULTIEA
This paper focuses on single objective (SO) continuous parameter optimization with dimension D. Without loss of generality, let the optimization be a minimization, and assume one has a problem P for which we have no prior knowledge about which algorithm is the best. Assume that q EAs have been chosen and placed within a portfolio AP = {A1,...,Aq}. The q algorithms are executed independently and there is no information exchange between them.

Let alpha_i be the number of generations that algorithm Ai has run, and let fi(t) be the best (i.e., smallest) fitness found by Ai up to generation t. Whether or not the best-so-far solution participates in the evolution, it is reasonable to keep a copy of it; hence fi(t2) <= fi(t1) iff t2 >= t1. An indicator of the performance of Ai is its convergence curve Ci = {(t, fi(t))}, which records the entire history of the convergence trend.

The relative performance of the algorithms can be compared by comparing their convergence curves; this is also the standard practice in many EC papers. A popular measure used in the EC community is to compare fi(t) while keeping the total number of evaluations constant. It simply means: run each algorithm for a fixed, user-defined number of evaluations and choose the winner. This exhaustive approach is very computationally expensive. It can be considered a generalized form of parameter tuning, only that in this case the parameter is replaced by the algorithm. Such generalized tuning is useful if P is a typical problem of the class of problems to be solved. However, in practical optimization scenarios the class of problems is usually not well defined, and it is impossible to know beforehand what a typical problem is.
It is more reasonable to predict the fitness at some future point common to all algorithms and choose the winner. This requires extrapolating the convergence curve. We have tested simple extrapolation using ideas such as an exponential curve (with parameters found using a genetic algorithm) and polynomial or Taylor series (with parameters found using standard equations). We found that if one wishes to extrapolate a little, these functions perform well; however, if the extrapolation is far into the future, they give bad results. Moreover, these extrapolations do not use the knowledge that the curve is non-increasing. In this paper, we propose a novel prediction measure to tackle this problem. Let us revisit the convergence curve C = {(t, f(t))}; for clarity, we have dropped the subscript i. Define C(l) = {(t-l, f(t-l)), ..., (t, f(t))} as the sub-curve which includes all the data points from the (t-l)th generation to the tth generation, where l is the history length. For each sub-curve C(l), a linear regression is done to find
    (a, b) = argmin_{(a,b)} Sum_{(t', f(t')) in C(l)} [f(t') - (a*t' + b)]^2    (1)
which gives the straight line, with slope a and y-intercept b, that minimizes the mean squared error. The predicted fitness using the data points in C(l) for a future generation tau is

    pf(tau, l) = a*tau + b    (2)
Since l may take on values from 1 to alpha-1, we have alpha-1 predicted values pf(tau, 1) to pf(tau, alpha-1). We treat these predicted values as sample points of an unknown distribution, and fit a bootstrap probability distribution bpd(tau) to them. In this paper, the distribution is computed by a kernel smoothing function estimate; specifically, we use the Matlab statistical toolbox function ksdensity. (Strictly speaking, ksdensity has some parameters that can be varied; in this paper, we have used the standard values provided in Matlab for a "standard" interpolation.) We predict the fitness at tau by sampling from bpd(tau) to get the predicted fitness pf(tau).
The underlying idea is to treat each linear regression model C(l) with equal probability. The distribution model is designed to capture the uncertainty in the prediction; in the special case that the convergence curve is exactly linear, the bootstrap distribution will be a single spike. Since more recent points appear in more regression models, implicitly a heavier weighting is put on more recent points, and vice versa.
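The prediction measure of Eqs. (1)-(2) and the bootstrap sampling can be sketched in a few lines. The following Python sketch uses scipy's gaussian_kde as a stand-in for Matlab's ksdensity (default settings in both cases); the fallback for fewer than two distinct predictions is our own guard, not part of the paper's description.

import numpy as np
from scipy.stats import gaussian_kde

def predict_fitness(f, tau):
    # f[t-1] is the best-so-far fitness after generation t, t = 1..alpha;
    # tau > alpha is the future generation at which we predict.
    alpha = len(f)
    t = np.arange(1, alpha + 1, dtype=float)
    preds = []
    for l in range(1, alpha):                        # history lengths 1..alpha-1
        ts, fs = t[alpha - 1 - l:], f[alpha - 1 - l:]   # sub-curve C(l)
        a, b = np.polyfit(ts, fs, 1)                 # least squares line, Eq. (1)
        preds.append(a * tau + b)                    # pf(tau, l), Eq. (2)
    preds = np.asarray(preds)
    if len(preds) < 2 or np.ptp(preds) == 0.0:       # exactly linear curve: a spike
        return preds.mean()
    return gaussian_kde(preds).resample(1)[0, 0]     # sample pf(tau) from bpd(tau)

# Example: predict_fitness(np.array([10.0, 8.0, 8.0, 5.0]), tau=6.0)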
Now, for each algorithm Ai, the predicted best fitness pfi(t) (t > alpha_i) predicts the fitness of the algorithm if it were run up to generation t.
Assume first that the population size m is the same for all algorithms. Then the nearest future of Ai is alpha_i + 1; this is the nearest future in which algorithm Ai has run one more generation, and the unit of time is the number of generations. Let tmin = max{alpha_1 + 1, ..., alpha_q + 1}. This is the nearest common future of all the algorithms. The algorithm which will generate the next generation is argmin_i pfi(tmin), where pfi(tmin) for each i is sampled from its bootstrap probability distribution bpdi(tmin).

This has a clear physical meaning: the algorithm that will run the next generation is the one whose predicted best fitness is the smallest if all the algorithms were run to the same nearest common future point.
In general, let mi be the population size of algorithm Ai. Then the nearest future of Ai is mi*ti, where ti = alpha_i + 1; this is the nearest future in which algorithm Ai has run one more generation, and the unit of time is the number of fitness evaluations. Let tmin = (mt)max = max(m1t1, ..., mqtq). This is the nearest common future of all the algorithms. The algorithm which will run the next generation is argmin_i pfi(tmin), where pfi(tmin) for each i is sampled from its bootstrap probability distribution bpdi(tmin).
Note that only efficient algorithms will be selected often; an inefficient algorithm will be selected sparsely, since its predicted fitness is large. Also note that the above metric is parameter-less: it does not introduce additional parameters into the algorithm portfolio.

One has to ensure that each algorithm is able to make a non-trivial prediction. Thus each algorithm Ai is initially run until there is a decrease in the best fitness, i.e., until the smallest alpha_i such that fi(alpha_i - 1) != fi(alpha_i). Note that this initial number of generations alpha_i is determined automatically, without introducing any extra control parameter.
Below, we summarize the proposed method:

Algorithm (MultiEA).
Input:
  A portfolio of q EAs AP = {A1,...,Aq}, where Ai has population size mi;
  a single objective minimization problem P with dimension D;
  the maximum number of evaluations N.
Step 1:
  for i = 1 to q
    Run Ai until there is a change of fitness. Let alpha_i be the number of
    generations that Ai has run.
Step 2:
  Compute the nearest common future point tmin = (mt)max = max(m1t1, ..., mqtq),
  where ti = alpha_i + 1.
Step 3:
  for i = 1 to q
    Construct the convergence curve Ci = {(j, fi(j)) | j = 1,..., alpha_i}.
    Construct the sub-curves {Ci(1), ..., Ci(alpha_i - 1)}.
    for each sub-curve Ci(l)
      Least squares line fit to get the line parameters (a, b).
      Predict the fitness at the nearest common future point: pfi(tmin, l).
    Use the alpha_i - 1 sample points pfi(tmin, l) (l = 1,..., alpha_i - 1) to
    construct the bootstrap probability distribution bpdi(tmin).
    Sample bpdi(tmin) to get pfi(tmin).
Step 4:
  Choose the algorithm with index argmin_i(pfi(tmin)), which has the best
  predicted performance.
Step 5:
  Run it for one generation. alpha_i <- alpha_i + 1.
Step 6:
  Record the best-so-far solution Sbest found by the portfolio.
Step 7:
  Stop if the total number of evaluations > N; otherwise go to Step 2.
Output:
  The best solution found, Sbest.
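The main loop can be sketched compactly in Python. The wrapper interface below (objects with a step() method that advances one generation and returns the best-so-far fitness) is our own assumption for illustration, and predict_fitness is the routine sketched earlier in this section; predicting at tmin/mi converts the common future point from evaluations back into each algorithm's own generation index, which is one natural reading of Step 3.

import numpy as np

def multi_ea(algos, pop_sizes, max_evals):
    # algos[i].step() runs EA i for one generation and returns its
    # best-so-far fitness; pop_sizes[i] = m_i evaluations per generation.
    q = len(algos)
    curves = [[] for _ in range(q)]
    evals = 0
    for i in range(q):                      # Step 1: run until fitness changes
        curves[i].append(algos[i].step()); evals += pop_sizes[i]
        while curves[i][-1] == curves[i][0]:
            curves[i].append(algos[i].step()); evals += pop_sizes[i]
    while evals <= max_evals:               # Step 7: budget check
        # Step 2: nearest common future point, in fitness evaluations
        t_min = max(m * (len(c) + 1) for m, c in zip(pop_sizes, curves))
        # Step 3: sample each algorithm's predicted fitness at t_min
        pf = [predict_fitness(np.asarray(c), t_min / m)
              for m, c in zip(pop_sizes, curves)]
        best = int(np.argmin(pf))           # Step 4: winner takes all
        curves[best].append(algos[best].step())  # Step 5: one generation
        evals += pop_sizes[best]
    return min(c[-1] for c in curves)       # Step 6 / Output: best found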
4. NUMERICAL EXPERIMENTS
4.1 Test Function Set
MultiEA is examined using the 25 functions of the CEC 2005 test suite. The dimension D of all test functions is chosen to be 30. The maximum number of function evaluations is set to 10,000*D = 300,000. The performance of an algorithm is measured by statistics over 25 independent runs.

4.2 Test Algorithm Set
To evaluate the performance of MultiEA, we compare it with three multi-algorithms: AMALGAM-SO, PAP and RandEA. The first two algorithms have been described in Section 2.1.

RandEA means a randomly chosen EA: given the number of evaluations N, if we have no knowledge about which algorithm is the best, it is reasonable to randomly select an algorithm and run it for N evaluations. This algorithm is referred to as RandEA. Its performance is defined as the average performance of the q algorithms. It serves as a baseline algorithm for testing whether there is a positive synergy in MultiEA.

As said in Section 1, ABC, CMA-ES, CoDE, PSO2011 and SaDE are chosen as the component algorithms for MultiEA, AMALGAM-SO, PAP and RandEA.

All algorithms are implemented in Matlab. All the experiments are conducted on a PC with a 2.67 GHz 4-core CPU and 3 GB of memory.

4.3 Results
The detailed results (mean and standard deviation) are presented in Tables 1-3 in the Appendix. To illustrate the significance of the results, the p-values of the Mann-Whitney U test (rank-sum test) comparing the best function error values of MultiEA with those of each of the other algorithms are also listed. In the tables, a p-value less than 0.05 (alpha = 0.05) corresponds to a significant comparison result. Values shown in bold mean that the associated algorithm is significantly outperformed by MultiEA, while underlined values mean that MultiEA is significantly outperformed by the other algorithm. If the mean and/or the standard deviation reads "0.00E+00", it simply means that the value is smaller than the smallest floating point precision of Matlab.
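The significance test used throughout the tables can be reproduced with standard statistical libraries. A small Python sketch follows, with synthetic stand-ins for the 25 final error values of two algorithms (the real values come from the runs reported in the Appendix).

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
err_a = rng.lognormal(mean=-2.0, sigma=1.0, size=25)  # e.g. MultiEA errors
err_b = rng.lognormal(mean=-1.0, sigma=1.0, size=25)  # e.g. a competitor

# Two-sided rank-sum test, as used for Tables 1-3 (significance at p < 0.05).
stat, p = mannwhitneyu(err_a, err_b, alternative='two-sided')
print(f"U = {stat:.1f}, p = {p:.4f}, significant: {p < 0.05}")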
MultiEA significantly outperforms AMALGAM-SO, PAP and RandEA on 13, 18 and 21 of the 25 test functions, respectively, while it is significantly outperformed by them on 9, 3 and 4 test functions, respectively. Since MultiEA performs better than AMALGAM-SO and PAP, two state-of-the-art multiple algorithm approaches, this confirms the power and potential of MultiEA. Moreover, MultiEA outperforms RandEA. This suggests that by putting the algorithms in a portfolio using our method, a positive synergy of the algorithms is achieved.
5. PROPERTIES OF PREDICTION MEASURE
This section analyzes the properties of the prediction measure proposed in Section 3 by comparing its performance with five alternative prediction measures.

5.1 Test Prediction Measures
The five alternative prediction measures are as follows.

Measure 1: As said in Section 3, MultiEA builds a probability distribution based on the alpha-1 predicted values pf(tmin, 1) to pf(tmin, alpha-1), samples from the obtained probability distribution, and then chooses the algorithm having the best predicted performance. Instead of building a bootstrap probability distribution, measure 1 calculates the mean of the alpha-1 predicted values for each component algorithm, and then chooses the algorithm having the best mean value.

Measure 2: It calculates the median of the alpha-1 predicted values for each component algorithm, and then chooses the algorithm having the best median value.
Measure 3: It is a weighted linear regression method. The data points from the (t-l)th generation to the tth generation are weighted by 1, 2, ..., (l+1) respectively, normalized by (l+1)(l+2)/2. This measure thus biases towards the near-past data points more than the far-past data points. The weighted linear regression is done to find

    (a, b) = argmin_{(a,b)} Sum_{(t', f(t')) in C(l)} w(t', l) * [f(t') - (a*t' + b)]^2    (3)

where w(t', l) is the weight of (t', f(t')). Except for this change, this prediction measure is the same as MultiEA.
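A weighted least squares line fit of this form can be sketched as follows in Python. Note that numpy's polyfit multiplies its weights into the residuals before squaring, so the square roots of the Measure-3 weights are passed, and the constant normalizer is dropped since it does not change the minimizer.

import numpy as np

def weighted_line_fit(ts, fs):
    # Measure-3 style fit over a sub-curve C(l): the l+1 points, ordered
    # oldest to newest, get weights 1, 2, ..., l+1 as in Eq. (3).
    w = np.arange(1, len(ts) + 1, dtype=float)
    a, b = np.polyfit(ts, fs, 1, w=np.sqrt(w))  # minimizes sum w*(residual)^2
    return a, b

# Example: weighted_line_fit(np.array([1., 2., 3.]), np.array([9., 7., 6.]))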
Measure 4: It first randomly chooses two points, uses the straight line through them to form a prediction, and then builds a probability density distribution using ksdensity. For l data points, there are l(l-1)/2 such combinations; as the number of evaluations increases, this prediction measure therefore becomes quite slow (i.e., more computationally expensive).
Measure 5: Instead of using ksdensity as MultiEA does, this prediction measure uses a histogram approach and samples based on the histogram of all the predicted values pf(tmin, 1) to pf(tmin, alpha-1). Thus it is truly parameter-less and is less computationally expensive.
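Sampling from a histogram of the predicted values might look as follows; this is a Python sketch, and the choice of numpy's default binning is our assumption, since the paper does not state the bin rule.

import numpy as np

def sample_from_histogram(preds, rng=None):
    # Bin the alpha-1 predicted values, pick a bin with probability
    # proportional to its count, then draw uniformly within that bin.
    if rng is None:
        rng = np.random.default_rng()
    counts, edges = np.histogram(preds)
    k = rng.choice(len(counts), p=counts / counts.sum())
    return rng.uniform(edges[k], edges[k + 1])

# Example: sample_from_histogram(np.array([3.1, 2.9, 3.0, 2.5, 2.8]))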
Measures 1 and 2 are two simpler alternatives to forming a probability distribution in MultiEA. Measure 3 is inspired by the heuristic that the most recent data points should reflect the performance trend of a component algorithm better than the old data points. Measure 4 is an alternative to using linear regression and building a probability distribution, and is clearly more computationally expensive than MultiEA. Measure 5 avoids the use of the ksdensity function of Matlab in building the probability distribution.

For easy reference, the algorithms associated with these five measures are denoted by MultiEA1, MultiEA2, ..., and MultiEA5.

5.2 Results
The detailed results (mean, standard deviation and p-value using the U test) are presented in Tables 1-3 in the Appendix. Fig. 1 presents a summary of the statistical hypothesis test results.

              MultiEA1   MultiEA2   MultiEA3   MultiEA4   MultiEA5
MultiEA  +        5          7          4         14          3
         -        0          0          0          1          0

Fig. 1 Summary of the result comparison between MultiEA and the five other prediction measures. "+" counts the functions on which MultiEA significantly outperforms the other algorithm; "-" counts the opposite.

It can be observed that MultiEA outperforms the other algorithms in a neck-and-neck comparison.

Comparing with MultiEA1 and MultiEA2, the results suggest that a probability distribution is a better strategy for modeling the evolution tendency of a component algorithm than a single mean/median point. MultiEA3 may put too much weight on the latest data points. MultiEA4 puts equal weight on all data points, since it randomly chooses two points to make a prediction. MultiEA outperforms both MultiEA3 and MultiEA4, which means that MultiEA attains a better balance in its use of the data points than these two algorithms. MultiEA5 utilizes a histogram to select the algorithm; the histogram roughly estimates the distribution of the predicted values. This is why MultiEA5 obtains only slightly worse results than MultiEA.

Note that MultiEA5 is only marginally worse than MultiEA; thus it is an attractive alternative.

Through the study of the five alternative prediction measures, we can conclude that the proposed method organizes the predicted values well by forming a bootstrap distribution, and thereby identifies the evolution tendency of the component algorithms. This is why MultiEA achieves good performance on the CEC 2005 test suite.
6. CONCLUSIONS
Evolutionary algorithms (EAs) have proliferated as one of the best tools for solving the ubiquitous optimization problems of the real world. Many excellent EAs have been reported in the past. However, an important open problem bewilders researchers and engineers alike: confronted with an optimization problem, which algorithm should I choose?

This paper is an attempt to answer this question. A novel algorithm, known as the Multiple Evolutionary Algorithm (MultiEA), is proposed. A portfolio is first formed by selecting several state-of-the-art EAs. A novel predictive measure is reported which predicts the performance of the individual algorithms if they were extrapolated to the same number of evaluations in the nearest future. The algorithm with the best predicted performance is chosen to run for one generation. New search information is then received and the history is updated. The predicted performance of the algorithms is updated, and the algorithm with the best predicted performance is re-selected; it may or may not be the same algorithm as in the last generation. Experimental results show that the measure is stable and is a reasonably effective predictor. The idea is simple and natural. It is parameter-less: it does not introduce any new control parameter to the algorithm, thus avoiding the challenging parameter tuning and control problem.
Each component algorithm retains and uses its recommended set of parameters, which is the standard practice in the evolutionary community. No parameter tuning and control is required. Instead, our approach expects the individual algorithms to play complementary roles: different algorithms which excel on different problems will stand out when required. Moreover, as different algorithms may be the best for different computational budgets, in the sense of the absolute maximum number of fitness evaluations, the approach fully allows the selection of the best algorithm given a fixed computational budget, as well as automatic algorithm switching as the budget varies.

Compared with the algorithm selection approach, ours does not require the preliminary feature extraction via sampling of the search space of the problem; instead, we assume that the problem is an unknown one. Of course, this does not preclude information collection during the execution of our algorithm (with appropriate post-processing resampling to approximate uniform sampling), which can aid the feature-algorithm mapping of the algorithm selection approach. Hence the two approaches are entirely complementary.
For existing approaches that work on an unknown problem, some recent multiple algorithm portfolio approaches use a common population and a self-adaptive approach to apportion the different algorithms at different stages. Compared with these approaches, ours uses a distinctly different philosophy, namely, we concentrate on selecting the best algorithm given the current computational budget and the predicted performance. We believe that these two approaches are complementary, and that this paper provides a fresh alternative.

Five algorithms, namely Artificial Bee Colony (ABC), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Composite Differential Evolution (CoDE), Self-adaptive Differential Evolution (SaDE) and Particle Swarm Optimization (PSO2011), are selected to compose the portfolio of MultiEA in our experiments. These algorithms are selected because they are state-of-the-art algorithms and have significantly different characteristics, strengths and pitfalls.

Experimental results on the full set of CEC 2005 benchmark functions show that MultiEA outperforms i) the Multialgorithm Genetically Adaptive Method for Single-Objective Optimization (AMALGAM-SO); ii) the Population-based Algorithm Portfolio (PAP); and iii) a simple multiple algorithm approach which chooses an algorithm randomly (RandEA).

The properties of the prediction measures are also studied. It is found that the proposed prediction measure achieves a better performance than five alternative heuristic prediction measures, though one alternative measure, which is truly parameter-less and simpler, is a strong alternative.
The proposed approach is generic: though the scope of this paper is restricted to portfolios of EAs, it is a general framework, and good non-EAs (e.g. the Efficient Global Optimization (EGO) algorithm [18]) are also welcome. Experiments have been conducted on continuous optimization problems in this paper, but there is no reason why the approach cannot be applied to discrete optimization problems. The proposed MultiEA may also be considered a hyper-heuristic [19].

Theoretically, MultiEA, by virtue of being just another EA, is still subject to the no free lunch (NFL) theorems [20]; there are likely to be situations in which MultiEA gets a bad result. However, the premise of the NFL theorems is that all computable problems are equally likely to occur, which is hardly the case in the real world. In fact, the promise of multi-method search is that by using many search metaphors inspired by nature, mathematics, the sciences, the arts, etc., one can build up better search strategies that are more able to deal with real world problems. It may perhaps be likened to the synergy of individuals with different talents in a heterogeneous society.

In this paper, we have used our personal expertise and judgment to select the algorithms and build a portfolio with fixed algorithms. In principle, even if we had selected a very bad algorithm, it would not affect the performance of MultiEA too much, because MultiEA only selects the best algorithm to run at any one time. In the companion paper [10], we report a novel method to find the best combination of algorithms to compose a portfolio.

The initial motivation of this paper was a question from a colleague: there are so many evolutionary algorithms claimed to be powerful and useful; which algorithm should I choose? As this question comes from a user of evolutionary computation for the optimization of practical problems, we hope that our paper has answered him in one way. We encourage more research in this interesting and important direction.

7. ACKNOWLEDGMENTS
The work described in this paper was supported by a grant from CityU (7002746).
8. REFERENCES
[1] Engelbrecht, A.P. 2007. Computational Intelligence: An Introduction, 2nd Ed. Wiley.
[2] Lobo, F.G., Lima, C.F., and Michalewicz, Z., Eds. 2007. Parameter Setting in Evolutionary Algorithms. Springer, Berlin, Germany.
[3] Hansen, N. 2011. The CMA Evolution Strategy: A Tutorial. Technical Report, 28 June 2011. http://www.lri.fr/~hansen/cmatutorial.pdf
[4] Black-Box Optimization Benchmarking (BBOB) 2010. http://coco.gforge.inria.fr/doku.php?id=bbob-2010
[5] Das, S., and Suganthan, P.N. 2011. Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4-31.
[6] Wang, Y., Cai, Z., and Zhang, Q. 2011. Differential evolution with composite trial vector generation strategies and control parameters. IEEE Trans. Evol. Comput. 15(1), 55-66.
[7] Qin, A.K., Huang, V.L., and Suganthan, P.N. 2009. Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans. Evol. Comput. 13(2), 398-417.
[8] Particle Swarm Central. http://www.particleswarm.info/
[9] Karaboga, D., and Basturk, B. 2007. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459-471.
[10] On composing an (evolutionary) algorithm portfolio. GECCO 2013.
[11] Huberman, B.A., Lukose, R.M., and Hogg, T. 1997. An economics approach to hard computational problems. Science 275, 51-54.
[12] Fukunaga, A.S. 2000. Genetic algorithm portfolios. Proc. IEEE CEC, 1304-1311.
[13] Gagliolo, M., and Schmidhuber, J. 2011. Algorithm portfolio selection as a bandit problem with unbounded losses. Ann. Math. Artif. Intell. 61, 49-86.
[14] Vrugt, J.A., Robinson, B.A., and Hyman, J.M. 2009. Self-adaptive multimethod search for global optimization in real-parameter spaces. IEEE Trans. Evol. Comput. 13(2), 243-259.
[15] Peng, F., Tang, K., Chen, G., and Yao, X. 2010. Population-based algorithm portfolios for numerical optimization. IEEE Trans. Evol. Comput. 14(5), 782-800.
[16] Rice, J. 1976. The algorithm selection problem. Adv. Comput. 15, 65-118.
[17] Muñoz, M.A., Kirley, M., and Halgamuge, S.K. 2013. The algorithm selection problem on the continuous optimization domain. In Moewes, C. et al. (Eds.), Computational Intelligence in Intelligent Data Analysis, SCI 445, 75-89.
[18] Jones, D., Schonlau, M., and Welch, W. 1998. Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455-492.
[19] Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P., and Schulenburg, S. 2003. Hyper-heuristics: an emerging direction in modern search technology. In Glover, F., and Kochenberger, G.A. (Eds.), Handbook of Meta-Heuristics. Springer US, 457-474.
[20] Wolpert, D.H., and Macready, W.G. 1997. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67-82.
APPENDIX

Table 1. F1-F8 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F1         F2         F3         F4         F5         F6         F7         F8
AMALGAM-SO  mean   8.38E-15   4.82E-15   9.47E-14   8.53E+01   9.72E-05   9.04E-15   3.94E-15   2.10E+01
            std    1.78E-15   9.21E-16   2.67E-14   1.98E+02   4.85E-04   4.33E-15   7.80E-16   4.41E-02
            p      9.73E-11   1.42E-09   2.98E-02   6.25E-02   1.42E-09   4.97E-01   8.98E-02   1.81E-06
PAP         mean   3.89E-29   9.83E-26   4.06E+04   2.36E+00   1.33E+03   3.26E-01   8.56E-03   2.03E+01
            std    1.62E-28   4.93E-26   1.08E+05   7.35E+00   5.84E+02   1.28E+00   1.21E-02   2.28E-01
            p      4.12E-02   1.42E-09   1.17E-02   1.99E-02   5.00E-02   3.61E-02   8.30E-01   9.86E-03
RandEA      mean   9.48E-17   4.90E+02   1.66E+06   5.43E+04   4.09E+03   1.95E+02   1.25E-02   2.09E+01
            std    1.18E-17   1.55E+02   4.46E+05   8.76E+04   3.41E+02   3.86E+02   5.43E-03   3.84E-02
            p      9.73E-11   1.42E-09   1.42E-09   1.42E-09   1.42E-09   1.42E-09   2.86E-03   2.63E-03
MultiEA     mean   0.00E+00   3.95E-24   8.82E+04   5.18E-02   1.01E+03   3.26E-01   6.90E-03   2.06E+01
            std    0.00E+00   7.79E-25   1.51E+05   1.18E-01   6.16E+02   8.74E-01   6.43E-03   3.80E-01
MultiEA1    mean   0.00E+00   3.10E-10   2.13E+05   8.90E+00   1.14E+03   5.98E-01   7.58E-03   2.07E+01
            std    0.00E+00   1.37E-09   2.35E+05   3.75E+01   4.48E+02   1.44E+00   8.75E-03   3.35E-01
            p      1.00E+00   7.71E-01   3.79E-02   2.77E-01   2.95E-01   9.23E-01   8.72E-01   1.35E-01
MultiEA2    mean   0.00E+00   5.92E-10   2.22E+05   1.26E+00   1.32E+03   8.03E-01   9.55E-03   2.05E+01
            std    0.00E+00   1.91E-09   1.78E+05   3.45E+00   6.81E+02   2.88E+00   1.19E-02   3.48E-01
            p      1.00E+00   3.84E-04   2.32E-03   2.44E-02   1.30E-01   9.69E-01   4.12E-01   8.54E-01
MultiEA3    mean   0.00E+00   3.55E-11   1.22E+05   1.06E+01   1.36E+03   9.71E-01   7.68E-03   2.07E+01
            std    0.00E+00   1.36E-10   2.10E+05   5.24E+01   6.30E+02   2.75E+00   7.79E-03   2.40E-01
            p      1.00E+00   3.42E-01   9.85E-01   2.29E-01   4.16E-02   9.85E-01   5.86E-01   6.53E-02
MultiEA4    mean   6.71E-28   4.06E-09   1.73E+05   1.82E+01   1.53E+03   7.14E+00   1.12E-02   2.07E+01
            std    4.39E-28   1.41E-08   1.86E+05   4.41E+01   5.62E+02   4.77E+00   9.04E-03   2.83E-01
            p      3.32E-09   2.87E-08   2.63E-03   4.61E-05   3.19E-03   6.18E-08   2.86E-03   1.62E-01
MultiEA5    mean   0.00E+00   2.16E-11   1.41E+05   4.28E+00   1.09E+03   8.50E-01   1.15E-02   2.06E+01
            std    0.00E+00   8.05E-11   2.07E+05   1.56E+01   6.55E+02   3.55E+00   1.29E-02   3.25E-01
            p      1.00E+00   2.22E-01   7.71E-01   1.12E-01   7.71E-01   2.60E-01   2.08E-01   4.38E-01
Table 2. F9-F16 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F9         F10        F11        F12        F13        F14        F15        F16
AMALGAM-SO  mean   5.05E+00   7.93E+00   9.91E-01   3.86E+02   1.73E+00   1.13E+01   2.45E+02   2.58E+01
            std    7.18E+00   1.02E+01   1.37E+00   5.60E+02   5.57E-01   1.41E+00   1.00E+02   1.31E+01
            p      9.73E-11   1.17E-08   1.80E-09   1.04E-03   6.18E-08   1.04E-03   9.51E-08   4.13E-09
PAP         mean   0.00E+00   3.70E+01   3.41E+01   1.38E+04   9.46E+00   1.33E+01   3.24E+02   1.46E+02
            std    0.00E+00   9.39E+00   1.56E+00   2.23E+04   9.82E-01   1.67E-01   1.14E+02   5.82E+01
            p      1.00E+00   7.58E-02   1.42E-09   1.53E-02   1.42E-09   2.29E-09   3.96E-09   1.01E-06
RandEA      mean   5.51E+01   1.07E+02   2.45E+01   2.31E+04   3.67E+00   1.32E+01   2.76E+02   1.66E+02
            std    1.06E+01   1.07E+01   1.59E+00   4.55E+03   8.31E-01   1.69E-01   3.34E+01   4.26E+01
            p      9.73E-11   1.42E-09   1.42E-09   1.42E-09   1.42E-09   2.30E-08   1.42E-09   9.51E-08
MultiEA     mean   0.00E+00   4.29E+01   9.06E+00   2.84E+03   9.94E-01   1.25E+01   1.40E+01   7.66E+01
            std    0.00E+00   1.08E+01   3.36E+00   3.33E+03   1.38E-01   5.31E-01   4.10E+01   2.91E+01
MultiEA1    mean   0.00E+00   4.46E+01   1.72E+01   5.13E+03   9.99E-01   1.29E+01   2.23E+01   1.08E+02
            std    0.00E+00   1.90E+01   1.04E+01   3.19E+03   2.36E-01   3.21E-01   6.13E+01   7.91E+01
            p      1.00E+00   7.71E-01   2.83E-02   6.60E-03   3.42E-01   1.79E-02   7.12E-01   1.81E-01
MultiEA2    mean   0.00E+00   5.34E+01   2.52E+01   5.05E+03   1.08E+00   1.30E+01   2.12E+01   9.96E+01
            std    0.00E+00   3.88E+01   7.44E+00   5.08E+03   2.22E-01   3.80E-01   5.25E+01   4.78E+01
            p      1.00E+00   6.07E-01   1.99E-07   1.16E-01   2.60E-01   2.11E-04   9.23E-01   4.34E-03
MultiEA3    mean   0.00E+00   3.98E+01   1.87E+01   3.45E+03   1.06E+00   1.27E+01   2.64E+01   9.33E+01
            std    0.00E+00   1.03E+01   9.86E+00   2.69E+03   1.83E-01   3.66E-01   6.22E+01   4.19E+01
            p      1.00E+00   2.86E-01   2.99E-03   3.13E-01   1.40E-01   2.14E-01   6.98E-01   3.42E-01
MultiEA4    mean   3.03E-06   4.16E+01   1.00E+01   6.87E+03   1.39E+00   1.26E+01   2.57E+01   9.46E+01
            std    1.49E-05   9.88E+00   5.03E+00   6.02E+03   2.67E-01   5.08E-01   4.98E+01   2.59E+01
            p      1.37E-07   7.20E-01   6.84E-01   1.37E-02   3.34E-07   7.12E-01   9.32E-03   3.84E-03
MultiEA5    mean   4.97E-16   4.46E+01   2.19E+01   3.90E+03   1.10E+00   1.26E+01   3.56E+01   8.25E+01
            std    2.49E-15   1.14E+01   9.59E+00   5.37E+03   4.21E-01   5.15E-01   6.46E+01   2.52E+01
            p      3.37E-01   8.92E-01   5.91E-05   3.04E-01   5.48E-01   5.35E-01   3.83E-01   2.86E-01
Table 3. F17-F25 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F17        F18        F19        F20        F21        F22        F23        F24        F25
AMALGAM-SO  mean   4.80E+01   9.04E+02   9.04E+02   9.04E+02   5.00E+02   8.49E+02   5.34E+02   2.00E+02   2.09E+02
            std    3.86E+01   1.29E+00   8.76E-01   1.19E+00   3.50E-04   2.29E+01   1.35E-03   3.53E-04   7.99E-01
            p      1.62E-07   1.42E-09   1.42E-09   1.42E-09   1.88E-10   1.42E-09   3.57E-05   9.73E-11   1.42E-09
PAP         mean   2.23E+02   9.07E+02   9.07E+02   9.08E+02   5.12E+02   9.06E+02   5.34E+02   2.00E+02   1.63E+03
            std    7.43E+01   2.71E+00   2.11E+00   3.83E+00   5.88E+01   2.49E+01   7.10E-03   8.42E-13   3.31E+00
            p      9.62E-05   1.42E-09   1.42E-09   1.42E-09   1.18E-03   1.42E-09   4.41E-04   1.61E-01   1.42E-09
RandEA      mean   2.16E+02   8.47E+02   8.48E+02   8.48E+02   7.09E+02   6.63E+02   7.46E+02   3.25E+02   5.17E+02
            std    5.56E+01   1.81E+01   1.75E+01   1.73E+01   9.63E+01   1.98E+01   6.92E+01   5.50E+01   6.46E+00
            p      3.07E-04   6.22E-03   1.51E-05   3.07E-04   1.88E-10   1.42E-09   1.40E-09   9.73E-11   1.42E-09
MultiEA     mean   1.51E+02   8.52E+02   8.52E+02   8.52E+02   4.93E+02   5.77E+02   5.33E+02   2.00E+02   5.79E+02
            std    5.99E+01   1.51E+00   5.62E-01   2.29E+00   2.48E+01   2.01E+00   3.38E+00   2.90E-14   5.99E+00
MultiEA1    mean   1.26E+02   8.52E+02   8.52E+02   8.52E+02   5.00E+02   5.76E+02   5.34E+02   2.00E+02   5.81E+02
            std    4.87E+01   5.38E-01   3.75E-01   6.21E-01   6.96E-14   1.46E+00   8.32E-03   2.90E-14   5.28E+00
            p      1.68E-01   3.32E-01   3.83E-01   5.09E-01   8.97E-02   2.14E-01   5.12E-04   1.00E+00   3.42E-01
MultiEA2    mean   1.40E+02   8.52E+02   8.52E+02   8.52E+02   5.00E+02   5.76E+02   5.50E+02   2.00E+02   5.81E+02
            std    7.89E+01   6.22E-01   6.01E-01   5.22E-01   1.89E-13   2.43E+00   8.06E+01   2.90E-14   5.69E+00
            p      3.04E-01   9.07E-01   4.26E-01   5.61E-01   8.28E-03   1.87E-01   1.22E-01   1.00E+00   5.48E-01
MultiEA3    mean   1.19E+02   8.52E+02   8.52E+02   8.52E+02   4.93E+02   5.78E+02   5.34E+02   2.00E+02   5.85E+02
            std    5.66E+01   4.82E-01   1.06E+00   3.86E-01   2.48E+01   2.71E+00   6.68E-03   2.90E-14   5.44E+00
            p      1.12E-01   6.69E-01   4.97E-01   4.38E-01   2.74E-01   1.35E-01   1.12E-03   1.00E+00   1.19E-03
MultiEA4    mean   1.02E+02   8.52E+02   8.52E+02   8.52E+02   4.97E+02   5.79E+02   5.34E+02   2.16E+02   5.83E+02
            std    5.89E+01   8.13E-01   5.50E-01   6.95E-01   1.31E+01   3.29E+00   1.61E-03   7.78E+01   5.90E+00
            p      4.34E-03   4.49E-01   4.38E-01   4.15E-01   3.19E-02   3.39E-03   9.61E-01   3.37E-01   7.75E-02
MultiEA5    mean   1.31E+02   8.52E+02   8.52E+02   8.52E+02   4.96E+02   5.77E+02   5.34E+02   2.15E+02   5.80E+02
            std    3.63E+01   6.91E-01   5.63E-01   6.53E-01   1.80E+01   2.13E+00   2.29E-03   7.53E+01   4.36E+00
            p      3.13E-01   9.07E-01   5.09E-01   8.77E-01   3.71E-02   9.69E-01   2.23E-02   3.37E-01   7.42E-01