Which Algorithm Should I Choose at any Point
of the Search: An Evolutionary Portfolio Approach
Shiu Yin Yuen
Chi Kin Chow
Xin Zhang
Department of Electronic Engineering
City University of Hong Kong
Hong Kong, China
[email protected]
[email protected]
[email protected]
ABSTRACT
Many good evolutionary algorithms have been proposed in the past. However, given a problem, one is frequently at a loss as to which algorithm to choose. In this paper, we propose a novel algorithm portfolio approach to address this problem. A portfolio of evolutionary algorithms is first formed. Artificial Bee Colony (ABC), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Composite DE (CoDE), Particle Swarm Optimization (PSO2011) and Self-adaptive Differential Evolution (SaDE) are chosen as the component algorithms. Each algorithm runs independently, with no information exchange. At any point in time, the algorithm with the best predicted performance is run for one generation, after which the performance is predicted again. The best algorithm then runs for the next generation, and the process goes on. In this way, algorithms switch automatically as a function of the computational budget. This novel algorithm is named Multiple Evolutionary Algorithm (MultiEA). Experimental results on the full set of 25 CEC2005 benchmark functions show that MultiEA outperforms i) the Multialgorithm Genetically Adaptive Method for Single Objective Optimization (AMALGAM-SO); ii) the Population-based Algorithm Portfolio (PAP); and iii) a multiple algorithm approach which chooses an algorithm randomly (RandEA). The properties of the prediction measures are also studied. The proposed portfolio approach is generic: it can be applied to portfolios composed of non-evolutionary algorithms as well.

Categories and Subject Descriptors
I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search - Heuristic methods; G.1.6 [Numerical Analysis]: Optimization - Global optimization.

General Terms
Algorithms, Experimentation

Keywords
Evolutionary algorithm, portfolio, global optimization

1. INTRODUCTION
Rapid advances in Evolutionary Computation (EC) have been witnessed in the past two decades. There are now many powerful Evolutionary Algorithms (EAs) that are applied in scientific and engineering applications to find good quality solutions for challenging optimization problems. Famous examples include the Genetic Algorithm (GA), Evolution Strategy (ES), Evolutionary Programming (EP), Particle Swarm Optimization (PSO), Differential Evolution (DE), Ant Colony Optimization (ACO), Artificial Immune System (AIS), Cultural Algorithm (CA), Estimation of Distribution Algorithm (EDA), Artificial Bee Colony Algorithm (ABC), Biogeography-based Optimization (BBO), and others [1].

In spite of the proliferation of algorithms that use the evolution metaphor, general users dealing with an optimization scenario have few readily available guidelines on which algorithm to choose. Frequently, one resorts to word of mouth or the fame of an algorithm, or tries algorithms out one by one in an exhaustive manner. The problem is compounded by the fact that individual algorithms need parameter tuning to obtain their best performance, which is computationally expensive or even prohibitive [2]. This state of affairs motivates this paper.

Usually an algorithm has a standard recommended set of parameters defined by its researchers, arrived at after many tests on benchmark functions and practical applications. Thus for each algorithm we use the recommended set of parameters and do not attempt the challenging problem of parameter tuning and control [2]. Instead, we believe that multiple algorithms can be complementary: an algorithm which does not work well on one problem is replaced by an algorithm that works well on it. The problem is then how to select algorithms.

Note also that the choice of algorithm should be a function of the computational budget. For example, one algorithm may converge fast to a shallow local optimum; another may converge more slowly but, given enough time, to a deeper local optimum; still another may converge the slowest but eventually reach the global optimum. Which algorithm should one choose? If fitness evaluations are expensive, then only a small computational budget is allowed and the first one should be chosen. If fitness evaluations are relatively inexpensive, or we have a design problem for which we can tolerate longer runs, then the second algorithm should be chosen. Finally, if fitness evaluations are cheap and we aim at solving a scientific problem for which finding the global optimum is essential, then the third algorithm should be chosen.
In this paper, an algorithm portfolio approach is advocated to tackle the problem. Our conceptual framework is simple: (1) put promising EAs together in a portfolio; (2) conduct an initialization in which each algorithm is run for some number of generations until there is a change in fitness; (3) use a predictive measure to predict the performance of each algorithm at the nearest common future point; (4) select the algorithm with the best predicted performance to run for one generation; and (5) repeat steps (3) and (4) until a given computational budget is reached.
Note that as the algorithm with the best predicted performance may be different at different stages of the search, our approach will switch from one algorithm to another and back automatically and seamlessly.

In a nutshell, we propose to choose, at any point of the search, the algorithm with the best predicted performance to run (for one generation). A typical scenario when running our algorithm is that, after some trials in which each algorithm runs in parallel and interacts indirectly, an algorithm whose predicted performance is better by a considerable margin stands out, and only it is run for quite some time. If it is a very good algorithm that excels at small, medium and large budgets, then only it will run from then on. However, if it is an algorithm that converges fast to a local optimum, as discussed above, then it will run for a while, and gradually its predicted performance will cease to compare well with that of the other algorithms. At that time, a second algorithm, which has a better predicted performance, will take over. Similar changes occur as the search progresses.

A novel online performance prediction metric is proposed to advise which algorithm should be chosen to generate the next generation of solutions. The metric is parameter-less - it does not introduce any new control parameter into the system - thus avoiding the difficult parameter tuning and control problem [2]. We name our algorithm Multiple Evolutionary Algorithm (MultiEA).

Currently in the EC community, CMA-ES [3] is one of the strongest algorithms; it is the clear winner of the BBOB2010 competition [4]. On the other hand, several improved variants of DE have won competitions held in recent years [5]; amongst these variants, Composite DE (CoDE) [6] and Self-adaptive DE (SaDE) [7] are the state of the art. For the influential swarm intelligence methods, the current "standard" PSO algorithm is PSO2011 [8], and ABC [9] is one of the most recently proposed methods, with the advantages of competitive performance and few control parameters. In this paper, ABC, CMA-ES, CoDE, PSO2011 and SaDE are chosen for further investigation, since they are the algorithms we would recommend to outsiders; other EC researchers may of course make other choices. In the companion paper [10], we report a novel way to automatically and parameter-lessly compose a portfolio, i.e., to find its member algorithms.

The rest of this paper is organized as follows: Section 2 discusses the relationships of our approach with existing works. Section 3 reports MultiEA. Section 4 reports the numerical experiments. Section 5 investigates the properties of the prediction measures. Section 6 concludes.

2. RELATIONSHIPS WITH EXISTING WORKS
2.1 Existing Works
Relatively little work has been done on using a portfolio of evolutionary algorithms to improve optimization effectiveness. Early works include [11, 12]. The recent work of Gagliolo and Schmidhuber [13] considered the problem as a multi-armed bandit problem of choosing between different time allocators, whose run time distribution is determined online through censored and uncensored data. Recently, Vrugt et al. [14] reported a self-adaptive multimethod search known as AMALGAM-SO. The idea is to use a set of q EAs {A1,...,Aq} that share a common population of N individuals. Initially, the number of offspring {N1,...,Nq} that each algorithm produces is fixed by the user, where N1+...+Nq = N. After the N offspring are generated, the N parents and the N offspring are sorted by decreasing fitness. Individuals with higher fitness are inserted into the population pool of the next generation one by one. To maintain diversity, an individual is inserted only if its Euclidean distance to every individual already in the pool is larger than a user-specified threshold. The process goes on until N new individuals are generated, which make up the next generation. Each algorithm has its own stopping criterion, and the multimethod search stops when one of the stopping criteria is fulfilled. Then the search is re-run with the population size doubled, and the numbers of offspring are recalculated by probability matching as Ni = N*Psi_i / (Psi_1+...+Psi_q), where Psi_i is the number of generations for which algorithm i is responsible for the fitness improvement. To prevent an algorithm from being driven extinct, the minimum Ni/N ratio is fixed by the user. To pass information from one run to the next, the best found solution is included as one of the individuals in the otherwise randomly generated initial population of the next run, provided that a numerical condition is fulfilled. On simple uni-modal problems, the method obtains performance similar to existing individual algorithms; on complex multi-modal problems, it is superior.
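For concreteness, the probability matching reallocation between runs can be sketched as follows. This is a minimal Python sketch under our own naming; it is not code from [14], and the rounding repair at the end is our assumption about how to keep the counts summing to N.

import numpy as np

def reallocate_offspring(N, psi, min_ratio):
    # Probability matching: N_i = N * psi_i / sum(psi), where psi_i is the
    # number of generations in which algorithm i improved the best fitness.
    psi = np.asarray(psi, dtype=float)
    counts = N * psi / psi.sum()
    # Prevent extinction: enforce the user-specified minimum N_i / N ratio.
    counts = np.maximum(counts, min_ratio * N)
    counts = np.floor(counts / counts.sum() * N).astype(int)
    counts[np.argmax(counts)] += N - counts.sum()  # repair rounding so sum == N
    return counts

# Example: reallocate_offspring(100, [12, 3, 5], 0.05) -> array([60, 15, 25])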
Recently, Peng et al. [15] proposed an algorithm called the Population-based Algorithm Portfolio (PAP). q EAs are run in parallel, each with its own population. There are two important parameters in the algorithm: the migration interval I and the migration size s. A migration across populations occurs every I generations. The migration process is as follows: for each algorithm, the s best individuals are found amongst all the individuals of the remaining (q-1) populations; these s individuals are appended to the algorithm's population, and then the worst s individuals in that population are discarded. It is experimentally found that PAP performs better than running each individual algorithm with the same computational budget. The authors conclude that the performance gain is due to the synergy of the algorithms through the migration process.
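The migration step might be sketched as follows; this is a Python sketch under our own data layout, assuming minimization and one fitness array per population, and none of these names come from [15].

import numpy as np

def migrate(pops, fits, s):
    # One PAP-style migration. pops[k] is an (n_k, D) array of individuals
    # and fits[k] the matching fitness vector of population k (minimization).
    q = len(pops)
    new_pops, new_fits = [], []
    for k in range(q):
        # Pool the individuals of the remaining q-1 populations.
        other_x = np.vstack([pops[j] for j in range(q) if j != k])
        other_f = np.concatenate([fits[j] for j in range(q) if j != k])
        best = np.argsort(other_f)[:s]          # the s best migrants
        x = np.vstack([pops[k], other_x[best]])
        f = np.concatenate([fits[k], other_f[best]])
        keep = np.argsort(f)[:len(pops[k])]     # then drop the worst s
        new_pops.append(x[keep])
        new_fits.append(f[keep])
    return new_pops, new_fits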
A related approach is the classic algorithm selection paradigm [16] (see also the recent survey [17]). The approach aims to extract some distinguishing features from the problem, and uses machine learning to map the features to the most suitable algorithm.
2.2 Limitations of Existing Works
[11] is designed for algorithms that are guaranteed to find the globally optimal solution (Las Vegas algorithms). However, the usual applications of EAs do not require or guarantee finding the global optimum (i.e., they are Monte Carlo algorithms). [12] requires a problem class to have been well defined, with previous history used to advise the algorithm selection. [13] also considers Las Vegas algorithms and a specific problem class; moreover, the bandit metaphor may not be a suitable one for a Monte Carlo type optimization algorithm. These works are not applicable to the scenario in which a multiple algorithm EA is requested to handle a single problem instance with little or no applicable prior knowledge, which is the case considered in this paper.

All existing algorithms invariably introduce more control parameters (e.g. the definition of risk in [11]; the risk aversion constant in the utility function of [12]; the choices and parameters of the time allocators in [13]; the initial number of offspring for each algorithm, the minimum probability, the diversity threshold, the stopping criteria for each algorithm, and the condition on the best found solution in [14]; and the two migration parameters in [15]). A careful tuning of these control parameters is needed. Such tuning is computationally expensive [2], and even when it has been done, there is no guarantee that it will work well on a new, unknown problem. While recent researches [14, 15] have demonstrated that a multiple algorithm approach is promising, they take a self-adaptive approach, i.e., they incorporate the algorithm selection decision within the evolutionary process. It is known that a self-adaptive approach does not completely mitigate the problem of premature convergence, as the whole population may become focused on only a part of the search space, which then biases the future available search strategies. Moreover, offspring generated by different algorithms may mislead each other. Another disadvantage of the self-adaptive approach is that it sheds no physical insight on why an algorithm is good for a problem, as it has no easily understood physical model for credit assignment; the process likewise sheds no new insight on which algorithm is good for which problem. The algorithm selection approach [16, 17] assumes that meaningful features can be extracted that fully characterize the problem class, and that the difficult feature-algorithm mapping can be learnt satisfactorily. Even if these two requirements are met, given an unknown problem, a preliminary uniform sampling has to be done to extract the features, and this sampling may be biased due to the alias errors generated by under-sampling [17].

2.3 Merits of Our Approach
In this paper, we present a novel alternative approach which overcomes the above limitations. It does not require the global optimal solution to be known; it works on a single problem instance with little or no a priori knowledge; it does not introduce any new control parameter; it uses a winner-take-all strategy rather than probability matching; and since the algorithms run independently, it does not suffer from the limitations of a self-adaptive approach, and offspring from different algorithms cannot mislead each other. An easily understood physical model for credit assignment based on fitness prediction is introduced. The approach also gives direct insights on "which algorithm is good for which problem" as a function of the computational budget. Existing multiple algorithm approaches [14, 15] find a good synergy by self-adaptive information exchange between the algorithms through a common population. The present approach aims instead to select the best algorithm at each instant by its predicted future performance, based on past history. It switches between algorithms and back dynamically as new search information is obtained, which enables revised predictions to be made; a synergy is achieved indirectly. Our approach is complementary to existing works, and furnishes an alternative and fresh view of the problem.
3. MULTIEA
This paper focuses on single objective (SO) continuous parameter optimization with dimension D. Without loss of generality, let the optimization be a minimization, and assume one has a problem P for which we have no prior knowledge about which algorithm is the best. Assume that q EAs have been chosen and placed within a portfolio AP = {A1,...,Aq}. The q algorithms are executed independently and there is no information exchange between them.

Let alpha_i be the number of generations that algorithm Ai has run, and let fi(t) be the best (i.e., smallest) fitness found by Ai up to generation t. Whether or not the best-so-far solution participates in the evolution, it is reasonable to keep a copy of it; hence fi(t2) <= fi(t1) iff t2 >= t1. An indicator of the performance of Ai is its convergence curve Ci = {(t, fi(t))}, which records the entire history of the convergence trend.

The relative performance of the algorithms can be compared by comparing their convergence curves; this is also the standard practice in many EC papers. A popular measure used in the EC community is to compare fi(t) while keeping the total number of evaluations constant. It simply means: run each algorithm for a fixed, user-defined number of evaluations and choose the winner. This exhaustive approach is very computationally expensive. It can be considered a generalized form of parameter tuning, only that in this case the parameter is replaced by the algorithm. Such generalized tuning is useful if P is a typical problem of the class of problems to be solved. However, in practical optimization scenarios the class of problems is usually not well defined, and it is impossible to know beforehand what a typical problem is.
It is more reasonable to predict the fitness at some future point common to all algorithms and choose the winner. This requires extrapolating the convergence curve. We have tested simple extrapolation using ideas such as an exponential curve (with parameters found using a genetic algorithm) and polynomial or Taylor series (with parameters found using standard equations). We found that if one wishes to extrapolate a little, these functions perform well; however, if the extrapolation is far into the future, they give bad results. Moreover, these extrapolations do not use the knowledge that the curve is non-increasing. In this paper, we propose a novel prediction measure to tackle this problem. Let us revisit the convergence curve C = {(t, f(t))}; for clarity, we have dropped the subscript i. Define C(l) = {(t-l, f(t-l)), ..., (t, f(t))} as the sub-curve which includes all the data points from the (t-l)th generation to the tth generation, where l is the history length. For each sub-curve C(l), a linear regression is done to find
    (a, b) = argmin_{(a,b)} Sum_{(t', f(t')) in C(l)} [f(t') - (a*t' + b)]^2    (1)
which gives the straight line, with slope a and y-intercept b, that minimizes the mean squared error. The predicted fitness using the data points in C(l) for a future generation tau is

    pf(tau, l) = a*tau + b    (2)
Since l may take on values from 1 to alpha-1, we have alpha-1 predicted values pf(tau, 1) to pf(tau, alpha-1). We treat these predicted values as sample points of an unknown distribution, and fit a bootstrap probability distribution bpd(tau) to them. In this paper, the distribution is computed by a kernel smoothing function estimate; specifically, we use the Matlab statistical toolbox function ksdensity. (Strictly speaking, ksdensity has some parameters that can be varied; in this paper, we have used the standard values provided in Matlab for a "standard" interpolation.) We predict the fitness at tau by sampling from bpd(tau) to get the predicted fitness pf(tau).
The underlying idea is to treat each linear regression model C(l) with equal probability. The distribution model is designed to capture the uncertainty in the prediction; in the special case that the convergence curve is exactly linear, the bootstrap distribution will be a single spike. Since more recent points appear in more regression models, implicitly a heavier weighting is put on more recent points, and vice versa.
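The prediction measure of Eqs. (1)-(2) and the bootstrap sampling can be sketched in a few lines. The following Python sketch uses scipy's gaussian_kde as a stand-in for Matlab's ksdensity (default settings in both cases); the fallback for fewer than two distinct predictions is our own guard, not part of the paper's description.

import numpy as np
from scipy.stats import gaussian_kde

def predict_fitness(f, tau):
    # f[t-1] is the best-so-far fitness after generation t, t = 1..alpha;
    # tau > alpha is the future generation at which we predict.
    alpha = len(f)
    t = np.arange(1, alpha + 1, dtype=float)
    preds = []
    for l in range(1, alpha):                        # history lengths 1..alpha-1
        ts, fs = t[alpha - 1 - l:], f[alpha - 1 - l:]   # sub-curve C(l)
        a, b = np.polyfit(ts, fs, 1)                 # least squares line, Eq. (1)
        preds.append(a * tau + b)                    # pf(tau, l), Eq. (2)
    preds = np.asarray(preds)
    if len(preds) < 2 or np.ptp(preds) == 0.0:       # exactly linear curve: a spike
        return preds.mean()
    return gaussian_kde(preds).resample(1)[0, 0]     # sample pf(tau) from bpd(tau)

# Example: predict_fitness(np.array([10.0, 8.0, 8.0, 5.0]), tau=6.0)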
Now, for each algorithm Ai, the predicted best fitness pfi(t) (t > alpha_i) predicts the fitness of the algorithm if it were run up to generation t.
Assume first that the population size m is the same for all algorithms. Then the nearest future of Ai is alpha_i + 1; this is the nearest future in which algorithm Ai has run one more generation, and the unit of time is the number of generations. Let tmin = max{alpha_1 + 1, ..., alpha_q + 1}. This is the nearest common future of all the algorithms. The algorithm which will generate the next generation is argmin_i pfi(tmin), where pfi(tmin) for each i is sampled from its bootstrap probability distribution bpdi(tmin).

This has a clear physical meaning: the algorithm that will run the next generation is the one whose predicted best fitness is the smallest if all the algorithms were run to the same nearest common future point.
In general, let mi be the population size of algorithm Ai. Then the nearest future of Ai is mi*ti, where ti = alpha_i + 1; this is the nearest future in which algorithm Ai has run one more generation, and the unit of time is the number of fitness evaluations. Let tmin = (mt)max = max(m1t1, ..., mqtq). This is the nearest common future of all the algorithms. The algorithm which will run the next generation is argmin_i pfi(tmin), where pfi(tmin) for each i is sampled from its bootstrap probability distribution bpdi(tmin).
Note that only efficient algorithms will be selected often; an inefficient algorithm will be selected sparsely, since its predicted fitness is large. Also note that the above metric is parameter-less: it does not introduce additional parameters into the algorithm portfolio.

One has to ensure that each algorithm is able to make a non-trivial prediction. Thus each algorithm Ai is initially run until there is a decrease in the best fitness, i.e., until the smallest alpha_i such that fi(alpha_i - 1) != fi(alpha_i). Note that this initial number of generations alpha_i is determined automatically, without introducing any extra control parameter.
Below, we summarize the proposed method:

Algorithm (MultiEA).
Input:
  A portfolio of q EAs AP = {A1,...,Aq}, where Ai has population size mi;
  a single objective minimization problem P with dimension D;
  the maximum number of evaluations N.
Step 1:
  for i = 1 to q
    Run Ai until there is a change of fitness. Let alpha_i be the number of
    generations that Ai has run.
Step 2:
  Compute the nearest common future point tmin = (mt)max = max(m1t1, ..., mqtq),
  where ti = alpha_i + 1.
Step 3:
  for i = 1 to q
    Construct the convergence curve Ci = {(j, fi(j)) | j = 1,..., alpha_i}.
    Construct the sub-curves {Ci(1), ..., Ci(alpha_i - 1)}.
    for each sub-curve Ci(l)
      Least squares line fit to get the line parameters (a, b).
      Predict the fitness at the nearest common future point: pfi(tmin, l).
    Use the alpha_i - 1 sample points pfi(tmin, l) (l = 1,..., alpha_i - 1) to
    construct the bootstrap probability distribution bpdi(tmin).
    Sample bpdi(tmin) to get pfi(tmin).
Step 4:
  Choose the algorithm with index argmin_i(pfi(tmin)), which has the best
  predicted performance.
Step 5:
  Run it for one generation. alpha_i <- alpha_i + 1.
Step 6:
  Record the best-so-far solution Sbest found by the portfolio.
Step 7:
  Stop if the total number of evaluations > N; otherwise go to Step 2.
Output:
  The best solution found, Sbest.
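The main loop can be sketched compactly in Python. The wrapper interface below (objects with a step() method that advances one generation and returns the best-so-far fitness) is our own assumption for illustration, and predict_fitness is the routine sketched earlier in this section; predicting at tmin/mi converts the common future point from evaluations back into each algorithm's own generation index, which is one natural reading of Step 3.

import numpy as np

def multi_ea(algos, pop_sizes, max_evals):
    # algos[i].step() runs EA i for one generation and returns its
    # best-so-far fitness; pop_sizes[i] = m_i evaluations per generation.
    q = len(algos)
    curves = [[] for _ in range(q)]
    evals = 0
    for i in range(q):                      # Step 1: run until fitness changes
        curves[i].append(algos[i].step()); evals += pop_sizes[i]
        while curves[i][-1] == curves[i][0]:
            curves[i].append(algos[i].step()); evals += pop_sizes[i]
    while evals <= max_evals:               # Step 7: budget check
        # Step 2: nearest common future point, in fitness evaluations
        t_min = max(m * (len(c) + 1) for m, c in zip(pop_sizes, curves))
        # Step 3: sample each algorithm's predicted fitness at t_min
        pf = [predict_fitness(np.asarray(c), t_min / m)
              for m, c in zip(pop_sizes, curves)]
        best = int(np.argmin(pf))           # Step 4: winner takes all
        curves[best].append(algos[best].step())  # Step 5: one generation
        evals += pop_sizes[best]
    return min(c[-1] for c in curves)       # Step 6 / Output: best found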
4. NUMERICAL EXPERIMENTS
4.1 Test Function Set
MultiEA is examined using the 25 functions of the CEC 2005 test suite. The dimension D of all test functions is chosen to be 30. The maximum number of function evaluations is set to 10,000*D = 300,000. The performance of an algorithm is measured by statistics over 25 independent runs.

4.2 Test Algorithm Set
To evaluate the performance of MultiEA, we compare it with three multi-algorithms: AMALGAM-SO, PAP and RandEA. The first two algorithms have been described in Section 2.1.

RandEA means a randomly chosen EA: given the number of evaluations N, if we have no knowledge about which algorithm is the best, it is reasonable to randomly select an algorithm and run it for N evaluations. This algorithm is referred to as RandEA. Its performance is defined as the average performance of the q algorithms. It serves as a baseline algorithm for testing whether there is a positive synergy in MultiEA.

As said in Section 1, ABC, CMA-ES, CoDE, PSO2011 and SaDE are chosen as the component algorithms for MultiEA, AMALGAM-SO, PAP and RandEA.

All algorithms are implemented in Matlab. All the experiments are conducted on a PC with a 2.67 GHz 4-core CPU and 3 GB of memory.

4.3 Results
The detailed results (mean and standard deviation) are presented in Tables 1-3 in the Appendix. To illustrate the significance of the results, the p-values of the Mann-Whitney U test (rank-sum test) comparing the best function error values of MultiEA with those of each of the other algorithms are also listed. In the tables, a p-value less than 0.05 (alpha = 0.05) corresponds to a significant comparison result. Values shown in bold mean that the associated algorithm is significantly outperformed by MultiEA, while underlined values mean that MultiEA is significantly outperformed by the other algorithm. If the mean and/or the standard deviation reads "0.00E+00", it simply means that the value is smaller than the smallest floating point precision of Matlab.
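The significance test used throughout the tables can be reproduced with standard statistical libraries. A small Python sketch follows, with synthetic stand-ins for the 25 final error values of two algorithms (the real values come from the runs reported in the Appendix).

import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
err_a = rng.lognormal(mean=-2.0, sigma=1.0, size=25)  # e.g. MultiEA errors
err_b = rng.lognormal(mean=-1.0, sigma=1.0, size=25)  # e.g. a competitor

# Two-sided rank-sum test, as used for Tables 1-3 (significance at p < 0.05).
stat, p = mannwhitneyu(err_a, err_b, alternative='two-sided')
print(f"U = {stat:.1f}, p = {p:.4f}, significant: {p < 0.05}")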
MultiEA significantly outperforms AMALGAM-SO, PAP and RandEA on 13, 18 and 21 of the 25 test functions, respectively, while it is significantly outperformed by them on 9, 3 and 4 test functions, respectively. Since MultiEA performs better than AMALGAM-SO and PAP, two state-of-the-art multiple algorithm approaches, this confirms the power and potential of MultiEA. Moreover, MultiEA outperforms RandEA. This suggests that by putting the algorithms in a portfolio using our method, a positive synergy of the algorithms is achieved.
5. PROPERTIES OF PREDICTION MEASURE
This section analyzes the properties of the prediction measure proposed in Section 3 by comparing its performance with five alternative prediction measures.

5.1 Test Prediction Measures
The five alternative prediction measures are as follows.

Measure 1: As said in Section 3, MultiEA builds a probability distribution based on the alpha-1 predicted values pf(tmin, 1) to pf(tmin, alpha-1), samples from the obtained probability distribution, and then chooses the algorithm having the best predicted performance. Instead of building a bootstrap probability distribution, measure 1 calculates the mean of the alpha-1 predicted values for each component algorithm, and then chooses the algorithm having the best mean value.

Measure 2: It calculates the median of the alpha-1 predicted values for each component algorithm, and then chooses the algorithm having the best median value.
Measure 3: It is a weighted linear regression method. The data points from the (t-l)th generation to the tth generation are weighted by 1, 2, ..., (l+1) respectively, normalized by (l+1)(l+2)/2. This measure thus biases towards the near-past data points more than the far-past data points. The weighted linear regression is done to find

    (a, b) = argmin_{(a,b)} Sum_{(t', f(t')) in C(l)} w(t', l) * [f(t') - (a*t' + b)]^2    (3)

where w(t', l) is the weight of (t', f(t')). Except for this change, this prediction measure is the same as MultiEA.
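A weighted least squares line fit of this form can be sketched as follows in Python. Note that numpy's polyfit multiplies its weights into the residuals before squaring, so the square roots of the Measure-3 weights are passed, and the constant normalizer is dropped since it does not change the minimizer.

import numpy as np

def weighted_line_fit(ts, fs):
    # Measure-3 style fit over a sub-curve C(l): the l+1 points, ordered
    # oldest to newest, get weights 1, 2, ..., l+1 as in Eq. (3).
    w = np.arange(1, len(ts) + 1, dtype=float)
    a, b = np.polyfit(ts, fs, 1, w=np.sqrt(w))  # minimizes sum w*(residual)^2
    return a, b

# Example: weighted_line_fit(np.array([1., 2., 3.]), np.array([9., 7., 6.]))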
Measure 4: It first randomly chooses two points, uses the straight line through them to form a prediction, and then builds a probability density distribution using ksdensity. For l data points, there are l(l-1)/2 such combinations; as the number of evaluations increases, this prediction measure therefore becomes quite slow (i.e., more computationally expensive).
Measure 5: Instead of using ksdensity as MultiEA does, this prediction measure uses a histogram approach and samples based on the histogram of all the predicted values pf(tmin, 1) to pf(tmin, alpha-1). Thus it is truly parameter-less and is less computationally expensive.
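Sampling from a histogram of the predicted values might look as follows; this is a Python sketch, and the choice of numpy's default binning is our assumption, since the paper does not state the bin rule.

import numpy as np

def sample_from_histogram(preds, rng=None):
    # Bin the alpha-1 predicted values, pick a bin with probability
    # proportional to its count, then draw uniformly within that bin.
    if rng is None:
        rng = np.random.default_rng()
    counts, edges = np.histogram(preds)
    k = rng.choice(len(counts), p=counts / counts.sum())
    return rng.uniform(edges[k], edges[k + 1])

# Example: sample_from_histogram(np.array([3.1, 2.9, 3.0, 2.5, 2.8]))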
Measures 1 and 2 are two simpler alternatives to forming a probability distribution in MultiEA. Measure 3 is inspired by the heuristic that the most recent data points should reflect the performance trend of a component algorithm better than the old data points. Measure 4 is an alternative to using linear regression and building a probability distribution, and is clearly more computationally expensive than MultiEA. Measure 5 avoids the use of the ksdensity function of Matlab in building the probability distribution.

For easy reference, the algorithms associated with these five measures are denoted by MultiEA1, MultiEA2, ..., and MultiEA5.

5.2 Results
The detailed results (mean, standard deviation and p-value using the U test) are presented in Tables 1-3 in the Appendix. Fig. 1 presents a summary of the statistical hypothesis test results.

              MultiEA1   MultiEA2   MultiEA3   MultiEA4   MultiEA5
MultiEA  +        5          7          4         14          3
         -        0          0          0          1          0

Fig. 1 Summary of the result comparison between MultiEA and the five other prediction measures. "+" counts the functions on which MultiEA significantly outperforms the other algorithm; "-" counts the opposite.

It can be observed that MultiEA outperforms the other algorithms in a neck-and-neck comparison.

Comparing with MultiEA1 and MultiEA2, the results suggest that a probability distribution is a better strategy for modeling the evolution tendency of a component algorithm than a single mean/median point. MultiEA3 may put too much weight on the latest data points. MultiEA4 puts equal weight on all data points, since it randomly chooses two points to make a prediction. MultiEA outperforms both MultiEA3 and MultiEA4, which means that MultiEA attains a better balance in its use of the data points than these two algorithms. MultiEA5 utilizes a histogram to select the algorithm; the histogram roughly estimates the distribution of the predicted values. This is why MultiEA5 obtains only slightly worse results than MultiEA.

Note that MultiEA5 is only marginally worse than MultiEA; thus it is an attractive alternative.

Through the study of the five alternative prediction measures, we can conclude that the proposed method organizes the predicted values well by forming a bootstrap distribution, and thereby identifies the evolution tendency of the component algorithms. This is why MultiEA achieves good performance on the CEC 2005 test suite.
6. CONCLUSIONS
Evolutionary algorithms (EAs) have proliferated as one of the best tools for solving the ubiquitous optimization problems of the real world. Many excellent EAs have been reported in the past. However, an important open problem bewilders researchers and engineers alike: confronted with an optimization problem, which algorithm should I choose?

This paper is an attempt to answer this question. A novel algorithm, known as the Multiple Evolutionary Algorithm (MultiEA), is proposed. A portfolio is first formed by selecting several state-of-the-art EAs. A novel predictive measure is reported which predicts the performance of the individual algorithms if they were extrapolated to the same number of evaluations in the nearest future. The algorithm with the best predicted performance is chosen to run for one generation. New search information is then received and the history is updated. The predicted performance of the algorithms is updated, and the algorithm with the best predicted performance is re-selected; it may or may not be the same algorithm as in the last generation. Experimental results show that the measure is stable and is a reasonably effective predictor. The idea is simple and natural. It is parameter-less: it does not introduce any new control parameter to the algorithm, thus avoiding the challenging parameter tuning and control problem.
Each component algorithm retains and uses its recommended set of parameters, which is the standard practice in the evolutionary community. No parameter tuning and control is required. Instead, our approach expects the individual algorithms to play complementary roles: different algorithms which excel on different problems will stand out when required. Moreover, as different algorithms may be the best for different computational budgets, in the sense of the absolute maximum number of fitness evaluations, the approach fully allows the selection of the best algorithm given a fixed computational budget, as well as automatic algorithm switching as the budget varies.

Compared with the algorithm selection approach, ours does not require the preliminary feature extraction via sampling of the search space of the problem; instead, we assume that the problem is an unknown one. Of course, this does not preclude information collection during the execution of our algorithm (with appropriate post-processing resampling to approximate uniform sampling), which can aid the feature-algorithm mapping of the algorithm selection approach. Hence the two approaches are entirely complementary.
For existing approaches that work on an unknown problem, some recent multiple algorithm portfolio approaches use a common population and a self-adaptive approach to apportion the different algorithms at different stages. Compared with these approaches, ours uses a distinctly different philosophy, namely, we concentrate on selecting the best algorithm given the current computational budget and the predicted performance. We believe that these two approaches are complementary, and that this paper provides a fresh alternative.

Five algorithms, namely Artificial Bee Colony (ABC), Covariance Matrix Adaptation Evolution Strategy (CMA-ES), Composite Differential Evolution (CoDE), Self-adaptive Differential Evolution (SaDE) and Particle Swarm Optimization (PSO2011), are selected to compose the portfolio of MultiEA in our experiments. These algorithms are selected because they are state-of-the-art algorithms and have significantly different characteristics, strengths and pitfalls.

Experimental results on the full set of CEC 2005 benchmark functions show that MultiEA outperforms i) the Multialgorithm Genetically Adaptive Method for Single-Objective Optimization (AMALGAM-SO); ii) the Population-based Algorithm Portfolio (PAP); and iii) a simple multiple algorithm approach which chooses an algorithm randomly (RandEA).

The properties of the prediction measures are also studied. It is found that the proposed prediction measure achieves a better performance than five alternative heuristic prediction measures, though one alternative measure, which is truly parameter-less and simpler, is a strong alternative.
The proposed approach is generic: though the scope of this paper is restricted to portfolios of EAs, it is a general framework, and good non-EAs (e.g. the Efficient Global Optimization (EGO) algorithm [18]) are also welcome. Experiments have been conducted on continuous optimization problems in this paper, but there is no reason why the approach cannot be applied to discrete optimization problems. The proposed MultiEA may also be considered a hyper-heuristic [19].

Theoretically, MultiEA, by virtue of being just another EA, is still subject to the no free lunch (NFL) theorems [20]; there are likely to be situations in which MultiEA gets a bad result. However, the premise of the NFL theorems is that all computable problems are equally likely to occur, which is hardly the case in the real world. In fact, the promise of multi-method search is that by using many search metaphors inspired by nature, mathematics, the sciences, the arts, etc., one can build up better search strategies that are more able to deal with real world problems. It may perhaps be likened to the synergy of individuals with different talents in a heterogeneous society.

In this paper, we have used our personal expertise and judgment to select the algorithms and build a portfolio with fixed algorithms. In principle, even if we had selected a very bad algorithm, it would not affect the performance of MultiEA too much, because MultiEA only selects the best algorithm to run at any one time. In the companion paper [10], we report a novel method to find the best combination of algorithms to compose a portfolio.

The initial motivation of this paper was a question from a colleague: there are so many evolutionary algorithms claimed to be powerful and useful; which algorithm should I choose? As this question comes from a user of evolutionary computation for the optimization of practical problems, we hope that our paper has answered him in one way. We encourage more research in this interesting and important direction.

7. ACKNOWLEDGMENTS
The work described in this paper was supported by a grant from CityU (7002746).
8. REFERENCES
[1] Engelbrecht, A.P. 2007. Computational Intelligence: An Introduction, 2nd Ed. Wiley.
[2] Lobo, F.G., Lima, C.F., and Michalewicz, Z., Eds. 2007. Parameter Setting in Evolutionary Algorithms. Springer, Berlin, Germany.
[3] Hansen, N. 2011. The CMA Evolution Strategy: A Tutorial. Technical Report, 28 June 2011. http://www.lri.fr/~hansen/cmatutorial.pdf
[4] Black-Box Optimization Benchmarking (BBOB) 2010. http://coco.gforge.inria.fr/doku.php?id=bbob-2010
[5] Das, S., and Suganthan, P.N. 2011. Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15(1), 4-31.
[6] Wang, Y., Cai, Z., and Zhang, Q. 2011. Differential evolution with composite trial vector generation strategies and control parameters. IEEE Trans. Evol. Comput. 15(1), 55-66.
[7] Qin, A.K., Huang, V.L., and Suganthan, P.N. 2009. Differential evolution algorithm with strategy adaptation for global numerical optimization. IEEE Trans. Evol. Comput. 13(2), 398-417.
[8] Particle Swarm Central. http://www.particleswarm.info/
[9] Karaboga, D., and Basturk, B. 2007. A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm. J. Global Optim. 39(3), 459-471.
[10] On composing an (evolutionary) algorithm portfolio. GECCO 2013.
[11] Huberman, B.A., Lukose, R.M., and Hogg, T. 1997. An economics approach to hard computational problems. Science 275, 51-54.
[12] Fukunaga, A.S. 2000. Genetic algorithm portfolios. Proc. IEEE CEC, 1304-1311.
[13] Gagliolo, M., and Schmidhuber, J. 2011. Algorithm portfolio selection as a bandit problem with unbounded losses. Ann. Math. Artif. Intell. 61, 49-86.
[14] Vrugt, J.A., Robinson, B.A., and Hyman, J.M. 2009. Self-adaptive multimethod search for global optimization in real-parameter spaces. IEEE Trans. Evol. Comput. 13(2), 243-259.
[15] Peng, F., Tang, K., Chen, G., and Yao, X. 2010. Population-based algorithm portfolios for numerical optimization. IEEE Trans. Evol. Comput. 14(5), 782-800.
[16] Rice, J. 1976. The algorithm selection problem. Adv. Comput. 15, 65-118.
[17] Muñoz, M.A., Kirley, M., and Halgamuge, S.K. 2013. The algorithm selection problem on the continuous optimization domain. In Moewes, C. et al. (Eds.), Computational Intelligence in Intelligent Data Analysis, SCI 445, 75-89.
[18] Jones, D., Schonlau, M., and Welch, W. 1998. Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455-492.
[19] Burke, E., Kendall, G., Newall, J., Hart, E., Ross, P., and Schulenburg, S. 2003. Hyper-heuristics: an emerging direction in modern search technology. In Glover, F., and Kochenberger, G.A. (Eds.), Handbook of Meta-Heuristics. Springer US, 457-474.
[20] Wolpert, D.H., and Macready, W.G. 1997. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1(1), 67-82.
APPENDIX

Table 1. F1-F8 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F1         F2         F3         F4         F5         F6         F7         F8
AMALGAM-SO  mean   8.38E-15   4.82E-15   9.47E-14   8.53E+01   9.72E-05   9.04E-15   3.94E-15   2.10E+01
            std    1.78E-15   9.21E-16   2.67E-14   1.98E+02   4.85E-04   4.33E-15   7.80E-16   4.41E-02
            p      9.73E-11   1.42E-09   2.98E-02   6.25E-02   1.42E-09   4.97E-01   8.98E-02   1.81E-06
PAP         mean   3.89E-29   9.83E-26   4.06E+04   2.36E+00   1.33E+03   3.26E-01   8.56E-03   2.03E+01
            std    1.62E-28   4.93E-26   1.08E+05   7.35E+00   5.84E+02   1.28E+00   1.21E-02   2.28E-01
            p      4.12E-02   1.42E-09   1.17E-02   1.99E-02   5.00E-02   3.61E-02   8.30E-01   9.86E-03
RandEA      mean   9.48E-17   4.90E+02   1.66E+06   5.43E+04   4.09E+03   1.95E+02   1.25E-02   2.09E+01
            std    1.18E-17   1.55E+02   4.46E+05   8.76E+04   3.41E+02   3.86E+02   5.43E-03   3.84E-02
            p      9.73E-11   1.42E-09   1.42E-09   1.42E-09   1.42E-09   1.42E-09   2.86E-03   2.63E-03
MultiEA     mean   0.00E+00   3.95E-24   8.82E+04   5.18E-02   1.01E+03   3.26E-01   6.90E-03   2.06E+01
            std    0.00E+00   7.79E-25   1.51E+05   1.18E-01   6.16E+02   8.74E-01   6.43E-03   3.80E-01
MultiEA1    mean   0.00E+00   3.10E-10   2.13E+05   8.90E+00   1.14E+03   5.98E-01   7.58E-03   2.07E+01
            std    0.00E+00   1.37E-09   2.35E+05   3.75E+01   4.48E+02   1.44E+00   8.75E-03   3.35E-01
            p      1.00E+00   7.71E-01   3.79E-02   2.77E-01   2.95E-01   9.23E-01   8.72E-01   1.35E-01
MultiEA2    mean   0.00E+00   5.92E-10   2.22E+05   1.26E+00   1.32E+03   8.03E-01   9.55E-03   2.05E+01
            std    0.00E+00   1.91E-09   1.78E+05   3.45E+00   6.81E+02   2.88E+00   1.19E-02   3.48E-01
            p      1.00E+00   3.84E-04   2.32E-03   2.44E-02   1.30E-01   9.69E-01   4.12E-01   8.54E-01
MultiEA3    mean   0.00E+00   3.55E-11   1.22E+05   1.06E+01   1.36E+03   9.71E-01   7.68E-03   2.07E+01
            std    0.00E+00   1.36E-10   2.10E+05   5.24E+01   6.30E+02   2.75E+00   7.79E-03   2.40E-01
            p      1.00E+00   3.42E-01   9.85E-01   2.29E-01   4.16E-02   9.85E-01   5.86E-01   6.53E-02
MultiEA4    mean   6.71E-28   4.06E-09   1.73E+05   1.82E+01   1.53E+03   7.14E+00   1.12E-02   2.07E+01
            std    4.39E-28   1.41E-08   1.86E+05   4.41E+01   5.62E+02   4.77E+00   9.04E-03   2.83E-01
            p      3.32E-09   2.87E-08   2.63E-03   4.61E-05   3.19E-03   6.18E-08   2.86E-03   1.62E-01
MultiEA5    mean   0.00E+00   2.16E-11   1.41E+05   4.28E+00   1.09E+03   8.50E-01   1.15E-02   2.06E+01
            std    0.00E+00   8.05E-11   2.07E+05   1.56E+01   6.55E+02   3.55E+00   1.29E-02   3.25E-01
            p      1.00E+00   2.22E-01   7.71E-01   1.12E-01   7.71E-01   2.60E-01   2.08E-01   4.38E-01
Table 2. F9-F16 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F9         F10        F11        F12        F13        F14        F15        F16
AMALGAM-SO  mean   5.05E+00   7.93E+00   9.91E-01   3.86E+02   1.73E+00   1.13E+01   2.45E+02   2.58E+01
            std    7.18E+00   1.02E+01   1.37E+00   5.60E+02   5.57E-01   1.41E+00   1.00E+02   1.31E+01
            p      9.73E-11   1.17E-08   1.80E-09   1.04E-03   6.18E-08   1.04E-03   9.51E-08   4.13E-09
PAP         mean   0.00E+00   3.70E+01   3.41E+01   1.38E+04   9.46E+00   1.33E+01   3.24E+02   1.46E+02
            std    0.00E+00   9.39E+00   1.56E+00   2.23E+04   9.82E-01   1.67E-01   1.14E+02   5.82E+01
            p      1.00E+00   7.58E-02   1.42E-09   1.53E-02   1.42E-09   2.29E-09   3.96E-09   1.01E-06
RandEA      mean   5.51E+01   1.07E+02   2.45E+01   2.31E+04   3.67E+00   1.32E+01   2.76E+02   1.66E+02
            std    1.06E+01   1.07E+01   1.59E+00   4.55E+03   8.31E-01   1.69E-01   3.34E+01   4.26E+01
            p      9.73E-11   1.42E-09   1.42E-09   1.42E-09   1.42E-09   2.30E-08   1.42E-09   9.51E-08
MultiEA     mean   0.00E+00   4.29E+01   9.06E+00   2.84E+03   9.94E-01   1.25E+01   1.40E+01   7.66E+01
            std    0.00E+00   1.08E+01   3.36E+00   3.33E+03   1.38E-01   5.31E-01   4.10E+01   2.91E+01
MultiEA1    mean   0.00E+00   4.46E+01   1.72E+01   5.13E+03   9.99E-01   1.29E+01   2.23E+01   1.08E+02
            std    0.00E+00   1.90E+01   1.04E+01   3.19E+03   2.36E-01   3.21E-01   6.13E+01   7.91E+01
            p      1.00E+00   7.71E-01   2.83E-02   6.60E-03   3.42E-01   1.79E-02   7.12E-01   1.81E-01
MultiEA2    mean   0.00E+00   5.34E+01   2.52E+01   5.05E+03   1.08E+00   1.30E+01   2.12E+01   9.96E+01
            std    0.00E+00   3.88E+01   7.44E+00   5.08E+03   2.22E-01   3.80E-01   5.25E+01   4.78E+01
            p      1.00E+00   6.07E-01   1.99E-07   1.16E-01   2.60E-01   2.11E-04   9.23E-01   4.34E-03
MultiEA3    mean   0.00E+00   3.98E+01   1.87E+01   3.45E+03   1.06E+00   1.27E+01   2.64E+01   9.33E+01
            std    0.00E+00   1.03E+01   9.86E+00   2.69E+03   1.83E-01   3.66E-01   6.22E+01   4.19E+01
            p      1.00E+00   2.86E-01   2.99E-03   3.13E-01   1.40E-01   2.14E-01   6.98E-01   3.42E-01
MultiEA4    mean   3.03E-06   4.16E+01   1.00E+01   6.87E+03   1.39E+00   1.26E+01   2.57E+01   9.46E+01
            std    1.49E-05   9.88E+00   5.03E+00   6.02E+03   2.67E-01   5.08E-01   4.98E+01   2.59E+01
            p      1.37E-07   7.20E-01   6.84E-01   1.37E-02   3.34E-07   7.12E-01   9.32E-03   3.84E-03
MultiEA5    mean   4.97E-16   4.46E+01   2.19E+01   3.90E+03   1.10E+00   1.26E+01   3.56E+01   8.25E+01
            std    2.49E-15   1.14E+01   9.59E+00   5.37E+03   4.21E-01   5.15E-01   6.46E+01   2.52E+01
            p      3.37E-01   8.92E-01   5.91E-05   3.04E-01   5.48E-01   5.35E-01   3.83E-01   2.86E-01
Table 3. F17-F25 with D = 30. The mean and standard deviation (std) of the final function error values obtained by AMALGAM-SO, PAP, RandEA, MultiEA and the five other prediction measures in 25 runs. p is the p-value calculated using the U test (alpha = 0.05) against MultiEA.

Algorithm          F17        F18        F19        F20        F21        F22        F23        F24        F25
AMALGAM-SO  mean   4.80E+01   9.04E+02   9.04E+02   9.04E+02   5.00E+02   8.49E+02   5.34E+02   2.00E+02   2.09E+02
            std    3.86E+01   1.29E+00   8.76E-01   1.19E+00   3.50E-04   2.29E+01   1.35E-03   3.53E-04   7.99E-01
            p      1.62E-07   1.42E-09   1.42E-09   1.42E-09   1.88E-10   1.42E-09   3.57E-05   9.73E-11   1.42E-09
PAP         mean   2.23E+02   9.07E+02   9.07E+02   9.08E+02   5.12E+02   9.06E+02   5.34E+02   2.00E+02   1.63E+03
            std    7.43E+01   2.71E+00   2.11E+00   3.83E+00   5.88E+01   2.49E+01   7.10E-03   8.42E-13   3.31E+00
            p      9.62E-05   1.42E-09   1.42E-09   1.42E-09   1.18E-03   1.42E-09   4.41E-04   1.61E-01   1.42E-09
RandEA      mean   2.16E+02   8.47E+02   8.48E+02   8.48E+02   7.09E+02   6.63E+02   7.46E+02   3.25E+02   5.17E+02
            std    5.56E+01   1.81E+01   1.75E+01   1.73E+01   9.63E+01   1.98E+01   6.92E+01   5.50E+01   6.46E+00
            p      3.07E-04   6.22E-03   1.51E-05   3.07E-04   1.88E-10   1.42E-09   1.40E-09   9.73E-11   1.42E-09
MultiEA     mean   1.51E+02   8.52E+02   8.52E+02   8.52E+02   4.93E+02   5.77E+02   5.33E+02   2.00E+02   5.79E+02
            std    5.99E+01   1.51E+00   5.62E-01   2.29E+00   2.48E+01   2.01E+00   3.38E+00   2.90E-14   5.99E+00
MultiEA1    mean   1.26E+02   8.52E+02   8.52E+02   8.52E+02   5.00E+02   5.76E+02   5.34E+02   2.00E+02   5.81E+02
            std    4.87E+01   5.38E-01   3.75E-01   6.21E-01   6.96E-14   1.46E+00   8.32E-03   2.90E-14   5.28E+00
            p      1.68E-01   3.32E-01   3.83E-01   5.09E-01   8.97E-02   2.14E-01   5.12E-04   1.00E+00   3.42E-01
MultiEA2    mean   1.40E+02   8.52E+02   8.52E+02   8.52E+02   5.00E+02   5.76E+02   5.50E+02   2.00E+02   5.81E+02
            std    7.89E+01   6.22E-01   6.01E-01   5.22E-01   1.89E-13   2.43E+00   8.06E+01   2.90E-14   5.69E+00
            p      3.04E-01   9.07E-01   4.26E-01   5.61E-01   8.28E-03   1.87E-01   1.22E-01   1.00E+00   5.48E-01
MultiEA3    mean   1.19E+02   8.52E+02   8.52E+02   8.52E+02   4.93E+02   5.78E+02   5.34E+02   2.00E+02   5.85E+02
            std    5.66E+01   4.82E-01   1.06E+00   3.86E-01   2.48E+01   2.71E+00   6.68E-03   2.90E-14   5.44E+00
            p      1.12E-01   6.69E-01   4.97E-01   4.38E-01   2.74E-01   1.35E-01   1.12E-03   1.00E+00   1.19E-03
MultiEA4    mean   1.02E+02   8.52E+02   8.52E+02   8.52E+02   4.97E+02   5.79E+02   5.34E+02   2.16E+02   5.83E+02
            std    5.89E+01   8.13E-01   5.50E-01   6.95E-01   1.31E+01   3.29E+00   1.61E-03   7.78E+01   5.90E+00
            p      4.34E-03   4.49E-01   4.38E-01   4.15E-01   3.19E-02   3.39E-03   9.61E-01   3.37E-01   7.75E-02
MultiEA5    mean   1.31E+02   8.52E+02   8.52E+02   8.52E+02   4.96E+02   5.77E+02   5.34E+02   2.15E+02   5.80E+02
            std    3.63E+01   6.91E-01   5.63E-01   6.53E-01   1.80E+01   2.13E+00   2.29E-03   7.53E+01   4.36E+00
            p      3.13E-01   9.07E-01   5.09E-01   8.77E-01   3.71E-02   9.69E-01   2.23E-02   3.37E-01   7.42E-01