Self–Adaptation in
Evolutionary Algorithms
Revisited
James McKnight Stoddart McKenzie
Master of Science
School of Informatics
University of Edinburgh
2004
Abstract
The concept of dynamic adaptation of operator probabilities in a genetic algorithm
(GA) has been well studied in the past. The rationale behind this idea is that by modifying operator probabilities as the genetic algorithm runs, we can usefully bias the
probability of selecting a given operator relative to its recent success in producing
fitter children.
This dissertation reports an empirical investigation into the performance of the
same adaptive mechanism applied to population level (global) operator probabilities
and to individual level (local) operator probabilities. In addition, a non–adaptive
genetic algorithm is used for comparative purposes. Several test problems form the
basis of the experiments, including numerical optimisation problems (most of which are
defined by the De Jong test suite), the MaxOnes problem and a number of travelling
salesman problems.
On average, individual level adaptation performed only as well as population level
adaptation. For suitably large problems, both population and individual level adaptation
were found to provide performance improvements over a non–adaptive GA. In addition,
for many problem instances, utility was found in the increased robustness to parameter
settings offered by the adaptive GAs.
Acknowledgements
I would firstly like to thank my supervisor, Dr. Andrew Tuson, for his continued guidance and advice throughout the project.
I would also like to thank Bryant Julstrom, of St. Cloud State University, Minnesota, for the provision of papers and his generally helpful attitude.
Thanks also to my family and friends for providing support and encouragement
throughout the entire MSc.
Declaration
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(James McKnight Stoddart McKenzie)
Table of Contents

1 Introduction
   1.1 Project Overview
      1.1.1 Aims and Approach
   1.2 Genetic Algorithm Overview
      1.2.1 Components of a Genetic Algorithm
      1.2.2 Parameters of a Genetic Algorithm
   1.3 Dissertation Outline

2 Previous Research in Self–Adaptation
   2.1 Overview
   2.2 Framework
      2.2.1 Methods of Self–Adaptation
      2.2.2 Levels of Self–Adaptation
      2.2.3 Adaptation Evidence
      2.2.4 Subject of Adaptation
      2.2.5 Summary
   2.3 Co–evolutionary Examples
      2.3.1 Adaptation of Crossover Type
      2.3.2 Co–evolution at Different Levels
   2.4 Learning Rule Examples
      2.4.1 Population Level Adaptation
      2.4.2 Reinforcement Learning Approach
      2.4.3 Individual Level Adaptation
   2.5 General Points
      2.5.1 Parameter Migration
      2.5.2 Summary

3 System Implementation
   3.1 Overview
   3.2 The Basic Genetic Algorithm
   3.3 The ADOPP System
   3.4 ADOPP Modification for Individual Adaptation
   3.5 Genetic Operators
      3.5.1 Binary Encoding Operators
      3.5.2 Permutation Encoding Operators
   3.6 Summary

4 The Test Problems
   4.1 Overview
   4.2 Binary f6
   4.3 30 City TSP
   4.4 100 City TSP
   4.5 MaxOnes
   4.6 De Jong Functions
   4.7 Summary

5 Formative Experiments
   5.1 Aims
   5.2 Methodology
      5.2.1 Normal GA Parameter Treatments
      5.2.2 Adaptive Parameter Treatments
   5.3 Results
      5.3.1 Binary F6
      5.3.2 30 City TSP
      5.3.3 100 City TSP
      5.3.4 MaxOnes
      5.3.5 De Jong f1
      5.3.6 De Jong f2
      5.3.7 De Jong f3
      5.3.8 De Jong f4
      5.3.9 De Jong f5
   5.4 Revisiting Median/Parent Improvement
   5.5 Tuned Parameter Values
   5.6 Summary

6 Summative Experiments
   6.1 Aims
   6.2 Methodology
      6.2.1 T–test Details
   6.3 Results
      6.3.1 Binary F6 Results
      6.3.2 30 city TSP
      6.3.3 100 city TSP
      6.3.4 MaxOnes
      6.3.5 De Jong f1
      6.3.6 De Jong f2
      6.3.7 De Jong f3
      6.3.8 De Jong f4
      6.3.9 De Jong f5
      6.3.10 Discussion
   6.4 Additional Large TSPs
      6.4.1 Discussion
   6.5 Additional TSPs with Varying Structure
      6.5.1 105 City TSP
      6.5.2 127 City TSP
      6.5.3 225 City TSP
      6.5.4 120 City TSP
      6.5.5 Discussion

7 Conclusion
   7.1 Project Summary
   7.2 Conclusions
      7.2.1 Hypothesis Test
      7.2.2 Normal Versus Adaptive Performance
   7.3 Further Work

Bibliography

A Test Problem Settings
   A.1 Binary Encoded Problems
   A.2 Traveling Salesman Problems

B Location of Data Files and Source Code
   B.1 Source Code
      B.1.1 Development and Execution Environment
      B.1.2 Location
   B.2 Data Location
      B.2.1 Formative Experiment Results
      B.2.2 Summative Experiment Results
List of Figures

3.1 An Operator History
3.2 Edge-swap Mutation
4.1 Map of 30 city TSP
4.2 Map of 100 city TSP
5.1 Performance for Binary F6 - All GA Types
5.2 Performance for 30 city TSP - All GA Types
5.3 Performance for 100 city TSP - All GA Types
5.4 Performance for MaxOnes - All GA Types
5.5 Performance for De Jong f1 - All GA Types
5.6 Performance for De Jong f2 - All GA Types
5.7 Performance for De Jong f3 - All GA Types
5.8 Performance for De Jong f4 - All GA Types
5.9 Performance for De Jong f5 - All GA Types
6.1 Comparative performance for Binary f6
6.2 Operator probability adaptation for Binary f6
6.3 Comparative performance for 30 city TSP
6.4 Operator probability adaptation for 30 city TSP
6.5 Comparative performance for 100 city TSP
6.6 Operator probability adaptation for 100 city TSP
6.7 Comparative performance for MaxOnes
6.8 Operator probability adaptation for MaxOnes
6.9 Comparative performance for De Jong f1
6.10 Operator probability adaptation for De Jong f1
6.11 Comparative performance for De Jong f2
6.12 Operator probability adaptation for De Jong f2
6.13 Comparative performance for De Jong f3
6.14 Operator probability adaptation for De Jong f3
6.15 Comparative performance for De Jong f4
6.16 Operator probability adaptation for De Jong f4
6.17 Comparative performance for De Jong f5
6.18 Operator probability adaptation for De Jong f5
6.19 Maps of 100, 150 and 200 city TSPs
6.20 Comparative performance for 150 city TSP
6.21 Operator probability adaptation for 150 city TSP
6.22 Comparative performance for 200 city TSP
6.23 Operator probability adaptation for 200 city TSP
6.24 Map of 105 city TSP
6.25 Comparative performance for 105 city TSP
6.26 Operator probability adaptation for 105 city TSP
6.27 Map of 127 city TSP
6.28 Comparative performance for 127 city TSP
6.29 Operator probability adaptation for 127 city TSP
6.30 Map of 225 city TSP
6.31 Comparative performance for 225 city TSP
6.32 Operator probability adaptation for 225 city TSP
6.33 Map of 120 city TSP
6.34 Comparative performance for 120 city TSP
6.35 Operator probability adaptation for 120 city TSP
List of Tables

5.1 Tuned Parameter Settings
6.1 p–values for Binary f6
6.2 p–values for 30 city TSP
6.3 p–values for 100 city TSP
6.4 p–values for MaxOnes
6.5 p–values for De Jong f1
6.6 p–values for De Jong f2
6.7 p–values for De Jong f3
6.8 p–values for De Jong f4
6.9 p–values for De Jong f5
6.10 p–values for 150 city TSP
6.11 p–values for 200 city TSP
6.12 p–values for 105 city TSP
6.13 p–values for 127 city TSP
6.14 p–values for 225 city TSP
6.15 p–values for 120 city TSP
A.1 Settings for all binary encoded problems
A.2 Settings for all TSPs
Chapter 1
Introduction
1.1 Project Overview
Evolutionary Algorithms encompass the three main approaches of evolutionary computing: Genetic Algorithms, Evolutionary Strategies and Evolutionary Programming.
Although the specific details of each approach vary, they all draw inspiration from the
principles of natural evolution. Fogel [6] provides an introductory overview of the
Evolutionary Algorithms field.
This project focuses on Genetic Algorithms (GAs), a powerful and general purpose
technique for function optimisation. The GA designer faces a great many decisions,
such as the population model, the selection method, the genetic operators to employ
and the probability of applying each operator. Due to the large number of design
choices and parameter settings involved in implementing and running GAs, they are
very hard to ‘tune’, i.e. to configure in an optimal fashion for the problem at hand.
Even if a reasonably extensive manual exploration of the parameter space is made,
and the best parameters are selected from the findings of this search, there is still no
guarantee that the settings are optimal. It is very likely that the ‘ideal’ parameter settings
are actually dynamic schedules which vary as the GA progresses.
These two factors motivate having parameters adapt as the GA runs, in the hope of
reducing parameter tuning requirements and/or increasing performance: the intuition is
that if the parameters adapt in the desired way, this will yield better optimisation
capabilities.
This project considers an adaptive GA system which operates at the population and
individual levels (separately), in order to ascertain whether there is an advantage to be
had from the finer grained approach of individual level adaptation. In addition, a
non–adaptive GA counterpart will be compared with the adaptive implementations, in
order to ascertain whether performance gains are feasible over the ‘normal’ approach
of having fixed parameters.
The comparison with the normal (non–adaptive) GA will place the work in a wider,
more general context, though the primary focus is the difference, if any, between
population and individual level adaptation.
1.1.1 Aims and Approach
The primary aim of the project is to test the hypothesis:
An adaptive mechanism operating at the individual level will perform as well as, or
better than, the same mechanism operating at the population level.
A secondary aim of the project is to investigate the nature of operator probability
adaptation for different problems, i.e. to track the probabilities as the GA progresses.
Furthermore, rigorous statistical comparisons between the adaptive and non–adaptive
GAs will be made.
In order to realise the proposed aims there are three major components required:
1. The GA system, including normal and adaptive functionality (discussed in Chapter 3)
2. Test problems which the system will optimise (discussed in Chapter 4)
3. A principled means of comparing results of the optimisations (discussed in Chapter 6)
The following section provides a brief introduction to Genetic Algorithm concepts
and terminology.
1.2 Genetic Algorithm Overview
Genetic Algorithms (GAs) were originally proposed by John Holland in [9] and are
based on an abstraction of the principles of natural evolution. Holland’s original intent
was to develop a formal framework for the study of natural evolution, and this
remains an area of active research today. However, GAs also gained popularity as a
method of function optimisation and have been successfully applied to many industrial
applications, including timetabling, job scheduling in multi–processor systems, circuit
layout and aerofoil design. Ross and Corne [16] provide an overview of GA
applications. The work in this project focuses on the use of GAs as an optimisation
technique.
1.2.1 Components of a Genetic Algorithm
A brief introduction to GA terminology and concepts is provided here; for a more
detailed discussion, Mitchell [14] offers a good introduction to the field of genetic
algorithms.
Although there are variations between GA implementations, all generally feature
the following components (whose names are derived from biological terminology):
Solution representation (i.e. an individual)
Population of individuals
Fitness function
Selection method
Crossover operator
Mutation operator
The algorithm itself
Solution Representation
The solution representation, or just representation, has historically always been a string
of bits, due to the precedence set in [9], and for some time this approach persisted as
the representation. Hence in order to attempt to solve a problem, the first step would
be how to represent a solution to the problem as a bit sequence. While, for many
problems, this does indeed make sense (e.g. the knapsack problem), often it is quite
unnatural and there are more intuitive and fitting representations.
Some examples of effective, non-binary, representations include:
A permutation of integers, representing a tour in a travelling salesman problem
A string of integers, representing task/CPU assignments in a job scheduling
problem
A string of real numbers, representing weights in a neural network
Population
The population is the term used for the collection of individuals upon which the GA
operates in order to produce new, and hopefully better, solutions. The size of the population is generally fixed throughout a run. There are two main types of population
model; steady–state and generational. In a steady–state population, single individuals are inserted into the population overwriting a ‘weaker’ individual which has been
selected via some policy. In a generational population, a completely new set of individuals are produced from the previous population, although some individuals may
simply be copied – unaltered – into the new population. Often, both forms of population model feature a form of elitism. In a steady–state GA, this generally takes the
form of a proportion of the best individuals which are protected from being usurped
by new individuals. In a generational population, elitism amounts to roughly the same
thing; either the best (or a proportion of the best) individual(s) are copied directly into
the new population, unmodified.
Fitness Function
The fitness function is in a sense, the essence of the problem – it is what we are trying to solve. It may be a function which is being optimised directly, or some form of
constraint satisfaction expression (which usually feature in, for example, timetabling
problems). The fitness function provides the means by which the quality of a given
individual is assessed. The fitness of an individual is then used to determine the likelihood that it will be selected for reproduction.
Selection Method
The selection method chooses the individuals from which offspring are produced. The
selection should be biased toward individuals of higher fitness, in order that their
genetic material persists in the population and is recombined and improved upon.
There are a number of different selection schemes, including fitness proportionate
selection, tournament selection and rank based selection; each method has its
advantages and disadvantages. A common method is fitness proportionate selection,
whereby each individual is selected with a probability in direct proportion to its
fitness value (relative to the sum of all the fitness values in the population).
This method can cause the GA to become trapped in a local optimum, as highly fit
genetic material proliferates throughout the population before adequate exploration of
the search space has occurred.
Techniques such as rank selection can address this issue by obscuring absolute
differences in fitness value, but this requires ordering of the population, which is not
required in, for example, fitness proportionate selection.
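Fitness proportionate (roulette-wheel) selection can be sketched as follows. This is a generic illustration, assuming non-negative fitness values, and is not the selection code used in the project:

```python
import random

def fitness_proportionate_select(population, fitnesses):
    """Select one individual with probability proportional to its
    fitness, relative to the population's total fitness.

    Assumes all fitnesses are non-negative and at least one is positive.
    """
    pick = random.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off
```

With fitnesses 9 and 1, the first individual is chosen roughly 90% of the time, which illustrates how quickly highly fit material can come to dominate the population.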
Crossover
Crossover, within the GA community, has historically been considered the major
force behind the power of GAs. It is the means by which genetic material from parents
(typically two, but it can be more) is recombined in order to produce a new individual,
or individuals. A common crossover operator is two point crossover, which is used
here to illustrate the concept.
This operator exchanges the alleles of the parents between two randomly selected
points in the string. For example, the two parents 00000000 and 11111111 may produce
the offspring 00111100 and 11000011, where the crossover points fall after the second
and before the second–from–last bit positions.
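The exchange described above can be sketched as follows. This is a minimal illustration of two point crossover on equal-length bit lists; the operator used in the project may differ in detail:

```python
import random

def two_point_crossover(parent1, parent2):
    """Return two offspring made by exchanging the segment between
    two randomly chosen cut points."""
    a, b = sorted(random.sample(range(1, len(parent1)), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

# With cut points after positions 2 and 6, parents 00000000 and
# 11111111 yield 00111100 and 11000011, as in the example above.
```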
Mutation
Mutation is concerned not with the recombination of material from multiple sources,
but with the disruption of material from a single parent. Historically, mutation has been
viewed in the GA community as a means of diversity maintenance within a population,
e.g. reintroducing variation at an otherwise completely converged locus (a position at
which all individuals have the same gene value). For some time this rationale led to
the perception that mutation is a ‘background’ operator, while crossover is the driving
force in the search. This view is no longer as popular, since some mutation–only
techniques, e.g. simulated annealing, have proven comparable with GA performance.
The typical mutation operator for binary chromosomes is bit–flip mutation, where
a bit is randomly inverted. This gives rise to the mutation rate, a value which specifies
the probability of inverting any given bit in the string, applied independently to every
bit. For example, a mutation rate of 0.1 applied to a string of 20 bits will, on average,
invert 2 of the 20 bits, though the actual bit positions flipped will vary.
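Bit-flip mutation with a per-bit rate can be sketched as follows (an illustrative version, not the project's code):

```python
import random

def bit_flip_mutation(individual, rate):
    """Independently invert each bit with probability `rate`.

    At rate 0.1 on a 20-bit string this flips 2 bits on average,
    matching the example above.
    """
    return [1 - bit if random.random() < rate else bit
            for bit in individual]
```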
Operator Probabilities and Operator Parameters
Any given genetic operator has an associated probability, known as the operator
probability, which is the likelihood that the operator will be invoked on any given
iteration of the GA. Usually, in a generational population, crossover and mutation are
applied with probabilities independent of each other; this means that one, both or
neither operator may be applied to a parent chromosome. The opposite approach is
generally taken in a steady–state population: either crossover or mutation is always
applied in the production of a child, and the operators are applied in a mutually
exclusive manner.
In addition to the operator probability, there is often also an operator parameter.
This is a probability which becomes relevant once a given operator has been invoked.
For example, mutation may have an operator probability of 0.4; when invoked, the
mutation rate (the operator parameter) then comes into play in order to effect the
mutation.
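In a steady-state setting, the mutually exclusive application of operators described above might be sketched as follows. The names and probabilities are illustrative (the 0.4 echoes the example in the text), not the project's implementation:

```python
import random

def apply_one_operator(operators, parents):
    """Invoke exactly one genetic operator, chosen according to its
    operator probability (mutually exclusive application).

    `operators` is a list of (probability, function) pairs whose
    probabilities sum to 1; each function maps `parents` to offspring.
    """
    probs = [p for p, _ in operators]
    fns = [f for _, f in operators]
    return random.choices(fns, weights=probs, k=1)[0](parents)

# e.g. crossover with operator probability 0.6 and mutation with 0.4;
# once mutation is invoked, its operator parameter (the mutation rate)
# governs how the chromosome is actually changed.
```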
The Algorithm
The specifics of the algorithm are mainly determined by the population model which
has been chosen. The following algorithm example assumes a steady–state population
with an elitist policy (the basis of the GA implemented in the project).
1. Initialise the population, typically with randomly generated individuals
2. Evaluate each member of the population
3. Select parent individuals, with bias toward fitter chromosomes
4. Apply an operator to produce a new child
5. Select an individual for deletion, with bias toward poorer chromosomes
6. Overwrite this selected individual
7. Return to 3, until some termination condition is met. Common termination conditions are: population convergence (all individuals are the same), a maximum
number of evaluations having been executed, or a solution of a particular quality
having been found.
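The seven steps can be pulled together into a single loop. The sketch below is a minimal self-contained illustration (tournament selection for parents, inverse tournament selection for deletion, bit-flip mutation as the only operator, and a caller-supplied fitness function); it is not the dissertation's actual implementation, which is described in Chapter 3:

```python
import random

def steady_state_ga(fitness, length=20, pop_size=30, max_evals=2000,
                    tournament_k=2, mutation_rate=0.05):
    """Minimal steady-state GA following steps 1-7 in the text."""
    # Step 1: initialise the population with random bit strings.
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    # Step 2: evaluate each member.
    fits = [fitness(ind) for ind in pop]
    evals = pop_size
    while evals < max_evals:  # Step 7: simple evaluation budget
        # Step 3: tournament selection, biased toward fitter individuals.
        parent = max(random.sample(range(pop_size), tournament_k),
                     key=lambda i: fits[i])
        # Step 4: apply an operator (here, bit-flip mutation only).
        child = [1 - b if random.random() < mutation_rate else b
                 for b in pop[parent]]
        # Step 5: select a victim, biased toward poorer individuals.
        victim = min(random.sample(range(pop_size), tournament_k),
                     key=lambda i: fits[i])
        # Step 6: overwrite the victim with the new child.
        pop[victim] = child
        fits[victim] = fitness(child)
        evals += 1
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]
```

Run on MaxOnes (fitness = sum of the bits), the loop steadily accumulates ones; terminating on population convergence or on reaching a target fitness, as mentioned in step 7, would be a straightforward extension.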
An interesting aspect of GAs is that, though composed of simple methods and
concepts, once these components are brought together and executed as a whole, they
interact in a highly epistatic manner. Consequently, considerable effort must be
expended on ‘tuning’ a GA, i.e. finding the best parameter settings. Not only are these
parameters sensitive to the design decisions made, but they will almost certainly be
problem dependent too.
1.2.2 Parameters of a Genetic Algorithm
Once the decisions have been made as to which components will realise the GA (e.g.
population model, representation, genetic operators and selection method), we are then
faced with the task of finding suitable parameters with which to run it.
Even for a modest GA, the parameter space facing the designer is formidable. The
following gives an idea of some parameters which are applicable in most problems:
Population size
Probability of invoking crossover
Probability of invoking mutation
Bit-wise mutation rate
Selection pressure
This list is by no means exhaustive, but already we can see that to investigate these
dimensions exhaustively would require considerable time and effort. As a result, there
have been previous attempts to find robust and general parameter settings. Notable instances are by De Jong [11] and Grefenstette [7]. Although the values derived in these
studies are indeed robust for many problems, they are still a form of ‘compromise’.
Having parameters adapt dynamically presents a possibly more robust solution.
1.3 Dissertation Outline
This section has introduced the basis of the project undertaken: the motivation underpinning the work, the aims and objectives of what the project seeks to achieve, and
the methods by which these aims will be pursued.
The main terminology and concepts of genetic algorithms have also been discussed,
for the benefit of readers unfamiliar with the field.
The remainder of this section gives a brief summary of the subsequent chapters
of the dissertation.
Chapter 2 discusses in some detail instances of previous research in the area of
self-adaptation in GAs and defines a general taxonomy of the various approaches that
have, thus far, been realised.
Chapter 3 details the non-adaptive GA, the population level adaptive GA, which is
a recreation of the adaptive operator probabilities (ADOPP) system [13], and the individual level adaptive GA, which is a modified version of the original ADOPP system.
Chapter 4 describes the test problems that form the basis of the experiments and
discusses why these particular problems were selected.
Chapter 5 describes the experiments carried out, and the results obtained, in exploring
the relevant parameter spaces and identifying the parameter settings from which to
conduct more detailed experimentation.
Chapter 6 discusses the results of the experiments which test the main hypothesis
of the dissertation. The results in this chapter are based mainly on t–test comparisons
between each GA type, for all the original test problems, and some additional larger
and more complex problems. Additionally, a discussion of the differing nature of
operator probability adaptation is presented.
Chapter 7 discusses the main findings of the project, what was successfully achieved
and what was not, and proposes some directions in which the work may be taken in
future.
Chapter 2
Previous Research in Self–Adaptation
2.1 Overview
This section provides a review of some examples of research in self–adaptation that
have been previously undertaken.
The first part of the review will consider some surveys and overview reports which
propose taxonomies within which to classify the nature of the various approaches.
A framework will then be synthesised from this and some example works will be
reviewed relative to this framework.
Defining this framework not only allows the previous research to be characterised
effectively, it also highlights any areas which have received less attention.
The central hypothesis of the project, along with the underlying motivation (discussed
in section 1.1.1), will then be restated within the context of the derived framework.
2.2 Framework
There have been a number of papers written which attempt to define and clarify the
various approaches that fall under the term ‘self–adaptation’. It should be noted that
even the definition of this term itself is not universally agreed upon. However,
although the terminology may be somewhat inconsistent between attempts to formalise
research in this field, more often than not the actual meaning behind a proposed term
is the same. It is worthwhile reviewing some of the overview work and establishing
the precise terminology used in this report.
2.2.1 Methods of Self–Adaptation
Angeline [1] proposes two orthogonal dimensions along which types of adaptation
may be defined. Firstly the actual methods of adaptation are separated into two types:
‘Absolute Update Rules’ and ‘Empirical Update Rules’.
Examples of absolute update rules include Davis’s adaptive operator fitness [4]
and Julstrom’s adaptive operator probabilities [13]. The distinguishing characteristic of this type of rule is that it explicitly defines how the adapted entity is changed,
via a mechanism which operates externally to the GA itself and typically involves the
computation of some statistic based on search performance over recent iterations. Angeline draws a comparison between these absolute update rules and traditional artificial
intelligence ‘heuristics’.
Contrastingly, empirical update rules are not based on any such external mechanism. Rather, the same – in this case evolutionary – forces propelling the search within the problem space are simultaneously applied to search (in practice, some subset of)
the space of possible GA variants. The earliest example of such an approach in GAs
was performed by Bäck [2], and is based on ideas originating from the evolutionary
strategies community.
Both these approaches involve some form of feedback in the adaptive procedure.
There are examples of deterministic schedules (e.g. a pre–defined decrease of mutation
probability as the GA run progresses) which can be classed as ‘adaptive’ techniques,
but these are not discussed in [1], as such methods fall outwith the definitions proposed
in this review.
A more comprehensive survey, by Eiben et al. [5], presents the
all–encompassing concept of ‘parameter setting’. The proposed taxonomy here not
only considers the previously discussed adaptive methods, but also includes manual
configuration of static parameters: ‘parameter tuning’. All other forms of parameter
modification fall within ‘parameter control’, and it is within this area that dynamic
modification of parameters occur.
Within the realm of parameter control three distinct methods are identified: ‘deterministic’, ‘adaptive’ and ‘self–adaptive’.
The primary factor which identifies a deterministic method is that it does not feature
any feedback whatsoever from the progress of the search. Instead, some aspect of the
GA is modified in a fixed way. This does give rise to subtle examples that are somewhat difficult to classify: for instance, Davis [3] used an adaptive mechanism to derive time–varying schedules which were then applied in a fixed manner (i.e. no feedback occurs
during the GA run). Though this is quite clearly an example of a deterministic system,
the derivation of the deterministic schedule was via an adaptive mechanism.
The adaptive and self–adaptive types are basically equivalent to Angeline’s absolute and empirical update rules respectively.
Tuson [22] performed a comparison of the two main techniques. The terminology
used in that work shall be adopted here, giving rise to ‘learning rule methods’ (equivalent to ‘adaptive’ and ‘absolute update rule’ terms) and ‘co–evolution’ (equivalent to
‘self–adaptive’ and ‘empirical update rule’ terms).
Henceforth the terms ‘adaptive’ and ‘self–adaptive’ simply refer to a GA which
modifies parameters via feedback from progress (be it via co–evolution or a learning
rule).
2.2.2 Levels of Self–Adaptation
The second main aspect used to classify adaptation in [1] is that of ‘Adaptation Level’.
This recognises the scope, or granularity, at which adaptation occurs. Three levels are
identified: ‘population’, ‘individual’ and ‘component’ level.
Population level adaptation is concerned with the modification of some aspect of
the GA which applies uniformly to all members of the population. Typical examples of
such attributes are the probability of crossover and modifications to the fitness function.
Individual level adaptation is concerned with the modification of a number of attributes, each of which is associated with a particular chromosome in the population.
Commonly bitwise mutation rate is adapted, which can change independently for each
chromosome in the population.
Finally, component level adaptation is the finest granularity of adaptation possible.
Here, the adaptation causes changes affecting, independently, each gene in an individual. This approach features most extensively in evolutionary strategies research, where
individuals most commonly take the form of a vector of real numbers. Consequently,
each number (gene) may be assigned a parameter specifying the magnitude of mutation
which that component will experience, for example.
Angeline also draws attention to the fact that while the power and control intuitively
offered by component level adaptation may be appealing, the resultant parameter space
which must ultimately be searched (along with the problem space itself) could be sufficiently large such that effective adaptation cannot be realised.
2.2.3 Adaptation Evidence
A slightly more subtle dimension of classification is the evidence used in adaptation.
The term evidence is taken from [18] and identifies the actual information upon which
adaptation is based, for example, operator productivity, whereby some measurement
is made of an individual’s fitness improvement over (usually) its parent. For co–
evolutionary methods the evidence is implicitly obtained via the fitness function.
2.2.4 Subject of Adaptation
Finally we have the actual aspect of the GA which is being adapted. [5] uses the term
‘component’ to address this, but it is felt this is somewhat confusing since the term is
already in use regarding adaptation level. Therefore, the term subject is introduced to
refer to that part of the GA which is undergoing adaptation. This is a very broad dimension, since it can really be any part of a GA at all. Subjects of adaptation include:
operator probabilities, operator parameters, fitness function, population size and representation.
2.2.5 Summary
To summarise the defined terminology, four separate dimensions of classification have
been identified (as was essentially done in [18] and [5]), with the terminology synthesised where appropriate. The dimensions are:
1. The method of adaptation
2. The level within the GA at which the adaptation is taking place
3. The evidence utilised in order to effect the adaptation
4. Finally, the subject of the adaptation; what is actually being modified
Some example works will now be discussed relative to the above dimensions.
2.3 Co–evolutionary Examples
There have been many previous works based on the idea of co–evolution. An appealing
argument is put forward in [18] in support of this approach. Basically, the space representing the possible configurations of a GA is vast and epistatic, and there is little or no prior knowledge regarding its landscape; this makes it an ideal candidate for optimisation via a GA. Of course, the question is whether the GA can beneficially search
its own configuration space, while maintaining a competitive performance (hopefully,
an improved one) in the search of the problem space.
2.3.1 Adaptation of Crossover Type
Spears [19] proposes a system which features two crossover operators (uniform and
two–point) whose selection is controlled by an additional bit tagged onto the end of the
representation. A ‘0’ indicates uniform crossover and a ‘1’ two–point crossover. This
bit is subjected to the evolutionary forces just as the solution representation itself is.
A generational GA was used with fitness proportionate selection and the test problems
reported were two versions of the N–peak problem. Basically, the landscape features
one global optimum and N-1 local optima. The problems featured 1 and 6 peaks,
stressing algorithm performance across problem size, but not structure.
Spears initially compared the adaptive GA with two non–adaptive GAs, each of
which featured one of the crossover types. The adaptive GA’s performance was found
to approximate that of the best performing normal GA (uniform crossover appeared to
be the optimal operator to use). It therefore appeared that the GA was indeed ‘adapting’
toward use of the preferred operator.
To test this idea, another control GA was run which featured both operators, but
selected them randomly. This GA was very close in performance to the adaptive GA.
Spears suggested that the good performance was simply the result of having multiple operators available, rather than anything adaptation was doing.
By using a system which measured the rate of change in the appended bit, Spears
developed the idea of ‘confirmed’ adaptation, in order to determine more rigorously whether a perceived improvement could actually be attributed to adaptation. All control GAs still featured the added bit, but it was not actively used for anything in the non–adaptive GAs; this allowed Spears to measure the rate of change in this appended bit
and to interpret the result as ‘confirmed’ adaptation (or not). It was found that in some
instances, what appeared to be adaptation was not actually ‘confirmed’ adaptation.
Most interestingly, Spears proposed a modification to the fitness function, which
formed a type of hybrid approach between learning rule and co–evolution. He recognised that if two different operators produced children with the same fitness, then the operators (in the form of the appended bit) would be rewarded equally by fitness selection, since each child is as likely to be selected as the other. Clearly, this does not take into account
the parent fitness; it is possible that one operator type provided a much larger increase
over the parent fitness than the other operator. By incorporating this fitness ‘difference’
into the fitness function, Spears added a measure of operator productivity into the GA.
This type of measurement is usually performed explicitly (and externally
to the GA) in learning rule approaches (discussed shortly).
This enhancement was run on both the adaptive and non–adaptive GAs and while
some improvement was found for the adaptive GAs, the non–adaptive GAs benefited
considerably, suggesting that operator productivity may be a means to enhance GAs in
general, and is not solely the concern of adaptive systems.
2.3.2 Co–evolution at Different Levels
An early investigation based on ideas from ES by Bäck [2] considers a co–evolutionary
approach and reports results for both individual and component level performances.
The combination of different levels of adaptation within the same study is rare, for
GA literature at least. The standard approach of encoding subject parameter values
directly into the chromosome is adopted here. 20 bits are used to encode mutation rate,
providing a very fine grained discretisation of the mutation rate, which spans the range
[0.0…1.0].
A single rate can be associated with one individual, involving one 20 bit segment
being appended to the chromosome, realising individual level adaptation. Alternatively, a separate rate is appended for each bit of an individual, realising component level adaptation. The individuals
are binary strings themselves. Interestingly, when mutation is applied, the encoded
parameter first mutates itself, then this result is used to mutate the associated object
variable, be it the whole string, or one gene.
Bäck found that the component level adaptation was more strongly disruptive, i.e.
the bias was toward exploration of the search space. Contrastingly, individual level
adaptation was found to be a more exploitative system, favouring the preservation of
highly fit genetic material. It was found that individual level adaptation performed best
on unimodal landscapes and component level adaptation on multimodal landscapes.
In light of the assumptions underlying this project, this is an encouraging result;
a finer grained adaptive mechanism is successfully exploiting a suitably non–uniform
fitness landscape.
2.4 Learning Rule Examples
2.4.1 Population Level Adaptation
Two well known works which share considerable commonality are by Davis [3] and
Julstrom [13]. Both examples adapt operator probabilities by rewarding credit to oper-
ators for their contribution in producing fitter children – otherwise known as operator
productivity. Davis presents the slightly more complicated of the two systems, which
adapts probabilities for five operators. Julstrom’s system is used to adapt two operators (crossover and mutation); however, there is no reason in principle that this system
Davis’s work uses representations of a string of integers and the optimisation task
is to obtain a string of all 5’s (the allele values are in the range 1…32). The population
model adopted is steady state as it is argued that such a system preserves information
which can be exploited by the adaptive mechanism. Davis observes that a generational
approach would be detrimental to adaptation (for this type of learning rule system)
since the vast majority, if not all, of the population is replaced on each iteration of a
generational GA. The idea is not tested, though the intuition seems valid.
Davis compared the adaptive GA to one which featured all the same operators,
but selected them at random. It was found that the adaptive GA did perform slightly
better than the non–adaptive GA. However, this comparison was based only on visual
inspection of graphs.
Julstrom did not compare his adaptive GAs with a non–adaptive control. Instead,
two problems were selected for which the optima were known. The adaptive GAs were then observed to obtain, or get close to, these optima.
The basis of the adaptive mechanism involves the periodic recalculation of the operator probabilities, utilising operator productivity as the basis of crediting operators
and also passing portions of this credit back to ancestral operators. This addresses the
interplay between operators: although one operator may not be directly responsible for the production of fit offspring, it can be shown that the operator did help ‘set the stage’. Julstrom’s approach is similar to this, though the details of the mechanism seem a little more straightforward when compared with Davis’s.
2.4.2 Reinforcement Learning Approach
Pettinger and Everson [15] propose a hybrid system (which, to some degree, all learning rule methods are) which incorporates a reinforcement learning (RL) agent that controls operator selection. The RL agent is based on Watkins’s Q(λ)–learning and was
tested using a 40 city travelling salesman problem.
The net result of this approach (following training of the RL agent) is a state–action
table that determines, stochastically, the operator to select, dependent upon the current state of the GA. In order for the ‘state’ of the GA to be perceived by the RL agent, the following attributes were discretised: the generation count was split into
four epochs. Average population fitness, normalised by the initial population fitness,
was split into four intervals and a measure of population diversity was similarly binned
into three states.
The operators available to the GA formed the ‘actions’ available, of which there
were three crossover operators and four mutation operators. The operators are further
refined by specifying the ‘class’ of individual that will be selected in order that the
operator can be applied. To enable this the population was split into two classes: ‘Fit’
(within top 10% of population) and ‘Unfit’ (the rest of the population). This additional
dimension therefore created two versions of each mutation operator and four versions
of each crossover operator.
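The discretisation described above yields a small state space of 4 × 4 × 3 = 48 states. A minimal sketch of such a state encoding is given below; the bin boundaries are illustrative assumptions, not those used by Pettinger and Everson.

```python
def ga_state(generation, max_gens, avg_fitness, initial_avg_fitness, diversity):
    """Discretise the GA's condition into one of 4 x 4 x 3 = 48 states.
    Bin boundaries are illustrative; diversity is assumed to lie in [0, 1]."""
    epoch = min(3, int(4 * generation / max_gens))   # four epochs of the run
    norm_fit = avg_fitness / initial_avg_fitness     # fitness normalised by the initial average
    fit_bin = min(3, int(norm_fit / 0.5))            # four fitness intervals (assumed width)
    div_bin = min(2, int(diversity * 3))             # three diversity states
    return epoch, fit_bin, div_bin
```

Each such state indexes a row of the learned state–action table, from which an operator is then drawn stochastically.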
The task of the agent was to learn the most appropriate operator to apply, given the current state of the GA. The basis of reward for any action taken by the agent (‘action’ being the application of a particular operator) is a form of operator productivity.
This approach was shown to be successful compared with a non–adaptive GA that
selected from the same set of operators at random. As Spears showed [19], this is a
good test to conduct, particularly when multiple operators are in play, since it may well
just be the utility of having several operators that is behind any apparent improvement.
However, there seems to be a distinct lack of fair tuning in the comparison. The RL
agent experiences 150 training runs in order to learn the required state–action table,
whilst the normal GA is executed with just one parameter configuration.
Admittedly, the RL agent’s learning phase is not strictly a tuning exercise (in the
traditional sense), but the point is that computational effort is being expended in order
to improve the adaptive GA performance. It seems only fair that a similar amount of
effort be spent exploring the normal GA’s parameter space. Apart from providing a better basis for comparison, this can help increase confidence that there is not some static
parameter setup that actually provides equivalent performance to that of the adaptive
system.
2.4.3 Individual Level Adaptation
Srinivas and Patnaik [20] report a learning rule based adaptive GA which operates at
the level of individuals. This is quite an unusual combination of approaches. Very
often learning rule methods are concerned with population level adaptation and co–
evolutionary methods are primarily concerned with individual level adaptation.
This may be a consequence of the early work of Davis [3], as discussed, which featured a population level learning rule approach. The affinity between individual level
adaptation and co-evolutionary methods is obvious, since by its fundamental setup,
co–evolution is focused on the individual and there is no global or centralised control
in place.
The system put forward in [20] is an elegant one, with essentially no book–keeping overhead, though computational effort is of course required. The authors begin by deriving simple population level expressions for the crossover probability p_c and mutation rate p_m using only the population maximum and average fitnesses. These
expressions are then extended by incorporating the individual’s own fitness, creating a
simple but effective localised parameter set.
The rationale behind the system is one of local optima avoidance, which the authors
argue may be achieved by preserving highly fit individuals, while applying more disruptive force to very unfit individuals. This is effectively the exploration/exploitation
balance made explicit; the search space is exploited by the current best individual, but
by increasing disruption to low fitness individuals the search is kept active and the likelihood of becoming trapped in a local optimum is considerably reduced.
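A sketch of this style of rate calculation is given below. The exact expressions and constants in [20] differ; k1, k2 and the treatment of below–average individuals here are illustrative simplifications.

```python
def adaptive_rates(f_ind, f_max, f_avg, k1=1.0, k2=0.5):
    """Individual level crossover/mutation rates in the spirit of the
    scheme described above: rates fall to zero for the current best
    individual and are high for unfit ones. The constants and the
    below-average case are illustrative simplifications."""
    if f_ind >= f_avg:
        spread = max(f_max - f_avg, 1e-12)  # guard against a converged population
        scale = (f_max - f_ind) / spread
        return k1 * scale, k2 * scale
    return k1, k2                           # maximum disruption for unfit individuals
```

Note that only three fitness values (the individual’s, the maximum and the average) are needed, so the rates can be computed on demand with no stored history.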
The system itself introduces some parameters, which the authors investigate; the GA was found to be robust to changes in their values.
Interestingly, this work was based on a generational GA, whilst many other learning
rule approaches are based on steady state GAs. Steady state GAs are often favoured
in learning rule approaches as due to their incremental nature they tend to preserve
information in a more stable manner than generational populations. Retaining information is not an issue here though, as there is no credit assignment in place, storage
of operator events or similar. The crossover and mutation rates are simply calculated
directly, as needed, from nothing more than three fitness values (and some constants).
Just as [19] advocated the use of a ‘fitness increase’ type metric, a similar concept is
at play here; by calculating nothing more than differences in fitnesses, suitable values
can be derived for crossover and mutation rate.
The results reported were favourable for the adaptive GA compared with one featuring static p_c and p_m. These results were basically just direct comparisons of average performances (over 30 runs). Other metrics were recorded as well, such as how many times the GA got stuck in a local optimum, on which the adaptive GA performed well.
2.5 General Points
Irrespective of the approach taken to realise an adaptive GA, the following basic steps
feature in the majority of works:
1. Determination of the parameters which will be adapted. As discussed previously, there are a large number of parameters in a typical GA. Some research concentrates on modifying just one parameter; other work incorporates multiple
parameters simultaneously. The specific representation of the parameters must
be considered along with how and when they will be modified.
2. Deciding how the static parameters will be handled. There are basically two approaches taken here. One is to choose a value for the parameter (often an experimental ‘standard’, e.g. a crossover probability of 0.6 is typical). The other
tests the parameter over a range of values, though, for a given set of runs, the
parameter is still static. The choice made depends upon what is under consideration e.g. population size may be kept static for all experiments if its influence
is not under scrutiny.
3. Selecting the test suite for the experiment. In general, it is felt that many experiments do not consider a sufficient number of benchmark problems; however, [2] and [22] are extensive in the test problems featured. Given that practically all results suggest a high degree of problem dependency in the observed performance, as comprehensive a suite of test problems as possible should be used.
4. The adaptive GAs are then run on the test suite and usually compared with a static parameter GA as a control, though, as in [13] and [19], sometimes showing effective adaptation is the primary aim. Often, performance comparisons are made simply by comparing average performance, but this is not rigorous [24].
2.5.1 Parameter Migration
Several of the techniques which have been developed in order to adapt parameters within a GA do themselves introduce parameters. This is not necessarily an issue as
it is often the case that these higher level parameters are in fact a lot more robust than
those that are to be adapted, in which case the overhead is acceptable. This said, it is
clearly something that should be kept in mind when attempting to create adaptive GAs.
2.5.2 Summary
A framework for the classification of research in the field of self–adaptation was proposed and some notable instances of previous works reviewed. There are sufficient
instances of positive results to show that it is worthwhile to focus research in this area.
One aspect which has received relatively little attention is the comparison of different
levels of adaptation. It was noted that the vast majority of learning rule methods operate
at the population level, while overwhelmingly, co–evolution is realised at the individual, or component, level. These appear to be the intuitive levels for these methods, but
there is no reason that other combinations should not be tried.
The general intuition behind finer grained approaches is appealing, as Angeline
observes:
“While the number of parameters for a population–level adaptive technique is smallest, and consequently easiest to adapt, they may not provide
enough control over the evolutionary computation to permit the most efficient processing. This is because the best way to manipulate an individual
may be significantly different than how to manipulate the population on
average.”
This is a valid observation, though care must be taken to temper any hopes of performance gain with the computational overheads required to realise such finer grained
systems. Angeline draws attention to this point by highlighting the fact that although
component level adaptation may be at the extreme of offering adaptive control, the
resulting parameter space may well be prohibitive of effective adaptation.
Since relatively few learning rule methods have been proposed at the individual
level (Srinivas and Patnaik [20] being a notable exception), this deficiency will be addressed here. Additionally, by basing the technique on an existing population level
system, a study of the ‘effect’ of adaptation level on the system will be possible.
Combining the intuition supporting finer grained adaptation and addressing the
gap in studies comparing adaptation level leads naturally to the hypothesis of the
project:
An adaptive mechanism operating at the individual level will perform as well or
better than the same mechanism operating at the population level.
In order to test this hypothesis a GA system is required; this is discussed in Chapter 3.
Chapter 3
System Implementation
3.1 Overview
As mentioned earlier, any form of empirical investigation requires a system upon which
experiments are run. This chapter details the GA system developed for these purposes,
which is based upon previous work by Julstrom [13] called adaptive operator probabilities (ADOPP).
ADOPP was selected as a basis for the work as it was felt to be both flexible
in nature and a relatively uncomplicated realisation of a learning rule based adaptive
system. Though originally based on population level adaptation, there was no reason
in principle that the mechanisms involved could not be directed toward finer grained,
individual level, adaptation.
There was also the opportunity to rigorously compare the system with a non–adaptive counterpart, as this was not an aim of the original work.
Three broad types of GA were implemented:
1. Non-adaptive (or normal)
2. Population level adaptive
3. Individual level adaptive
Each GA type shall be discussed in more detail in the following sections.
3.2 The Basic Genetic Algorithm
This GA is based on a simplified version of the one proposed in [13], essentially with
the adaptive operator probability functionality removed. This provides a reasonable
basis for comparison with the adaptive versions since the underlying GA mechanics
remain the same, and all that varies is the presence (or absence) of the adaptive mechanism.
The GA uses a steady–state population that does not permit the insertion of duplicate chromosomes, based on the ‘steady–state without duplicates’ GA of [4]. The
primary advantage of this approach compared to a generational GA is that population
diversity is ensured (convergence to the same individual is impossible). It can also be
argued that barring duplicates results in a more efficient use of the population since
calls to the evaluation function only occur in order to assess new genetic material, as
opposed to possibly having several calls to the fitness function to assess a number of
instances of the same individual. Furthermore, the non–duplicate restriction on the population provides improved sampling of the search space, compared with a population
model which permits duplicates.
Algorithm 1 defines the basic GA operation.
Algorithm 1 Normal GA
1. Initialise population (at random and with no duplicates)
2. Evaluate all individuals in population
3. Select a parent
4. Select the operator to apply
5. IF crossover selected
6. THEN get another parent and apply crossover
7. ELSE apply mutation
8. IF new child is not already in population
9. THEN evaluate the child and insert into population
10. ELSE discard new child
11. Go to step 3 until maximum number of evaluations have occurred
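Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the selection and operator functions are placeholders, and simple worst–replacement stands in for the rank–biased deletion described later in this section.

```python
import random

def run_ga(init_pop, fitness, select, crossover, mutate, p_cr, max_evals):
    """A minimal sketch of Algorithm 1: a steady-state GA that applies
    exactly one operator per iteration and rejects duplicate children.
    Evaluations are only spent on new genetic material."""
    pop = [list(c) for c in init_pop]
    fits = [fitness(c) for c in pop]
    evals = len(pop)
    while evals < max_evals:
        parent = select(pop, fits)
        if random.random() < p_cr:            # crossover chosen with probability P_Cr
            child = crossover(parent, select(pop, fits))
        else:                                 # otherwise mutation
            child = mutate(parent)
        if child not in pop:                  # no-duplicates rule (steps 8-10)
            worst = min(range(len(pop)), key=lambda i: fits[i])
            pop[worst], fits[worst] = child, fitness(child)
            evals += 1
    return pop, fits
```

Note that, faithful to the text, a duplicate child consumes no evaluation; the loop terminates only once the evaluation budget is spent on genuinely new individuals.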
The parent selection method relating to step 3 in Algorithm 1 is linear rank selection, which is based on the definition in [23]. Linear rank selection firstly orders the population by fitness, then assigns each individual a ‘ranked fitness’ (this value should not be confused with the individual’s actual fitness; the rank is used only to select individuals for reproduction or deletion). Following this,
an individual i is selected with probability equal to rank(i) divided by the sum of all the ranks.
In addition to using linear rank selection for reproductive purposes, it is also used to
select individuals that will be overwritten by fitter offspring. Of course, when selecting
for deletion, the bias is toward less fit individuals (with an elite proportion of the fittest in
the population protected from deletion). This deletion process forms part of the ‘insert’
functionality mentioned in step 9 of algorithm 1.
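A sketch of linear rank selection, covering both reproduction and deletion, might look as follows; the elitist protection of the fittest individuals is omitted for brevity.

```python
import random

def rank_select(pop, fits, for_deletion=False):
    """Linear rank selection: individual i is chosen with probability
    rank(i) / sum of all ranks. For reproduction the worst individual
    gets rank 1 and the best rank N; for deletion the order is inverted
    so the bias falls on the least fit."""
    order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=for_deletion)
    ranks = list(range(1, len(pop) + 1))   # weight for each position in `order`
    pick = random.choices(order, weights=ranks, k=1)[0]
    return pop[pick]
```

Because only the ordering of fitness values matters, the selective pressure this produces is independent of the absolute spread of fitnesses, which is the ‘auto–scaling’ property discussed below.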
Linear rank selection has two main advantages over other selection methods. Firstly,
it tends to avoid premature convergence as absolute differences in individuals’ fitness
values are obscured, so even if a super–fit individual were present in the population, it would not rapidly proliferate through the population, as would very likely be the case
in, say, fitness proportionate selection. Secondly, linear ranking performs a form of
‘auto–scaling’ of fitness values, such that no matter the magnitude of variance present
across all the fitness values, the resulting selective pressure is always constant. Certain other selection methods tend to suffer from a decrease in selective pressure as the
fitness values in the population become less varied, which tends to occur as the GA
makes progress.
These advantages do come at a cost as linear ranking requires that the population
is sorted according to fitness value before the ranked fitnesses can be applied. Methods such as roulette wheel selection and tournament selection do not require sorting.
Also, there may well be situations in which it may be important to know the absolute
difference between two fitness values, in which case linear rank selection would be
undesirable.
The operator which will be applied on a given iteration of the GA is determined
stochastically by the operator probabilities. For crossover, this is denoted as P_Cr and for mutation as P_Mu. Crossover and mutation are applied exclusively on any given iteration of the algorithm and one of the operators is always applied, which gives rise to the condition that P_Cr + P_Mu = 100%. These values remain fixed for the duration
of a run. The details of crossover and mutation behaviour are discussed later in this
chapter, suffice it to say at the moment that both operators produce one child only.
3.3 The ADOPP System
The following description is based largely on the detail provided in [13], which gives a good explanation of the system. Where applicable, a little more detail is added here and some insights regarding certain design decisions are offered.
Julstrom proposes a system called adaptive operator probabilities (ADOPP) in
[13], which adapts the operator probabilities as the GA runs. The unity condition
[P_Cr + P_Mu = 100%] still holds throughout the run.
By utilising an operator productivity metric, ADOPP assigns a probability to an
operator which is proportionate to that operator’s contribution over recently generated
individuals. Operator probabilities are updated after the creation of each new individual.
Two structures are utilised in the ADOPP system: an operator history, which records
the operators that led to the creation of the individual, and a queue which records recent
operator contribution and operator invocations.
Operator History
Julstrom originally used a binary tree to implement the operator history; in this work a 2–dimensional ragged array was used, which was slightly simpler to implement.
Figure 3.1 shows an example of an operator history.
Figure 3.1: An Operator History
The most recent entry in the history (circled) holds the immediate operator; the
operator which actually created the individual. At the next ‘layer’ of the history we
have the operators which created the parents of the individual and so on. This history
extends back to a pre–specified level, known as depth, which is one of the parameters
of ADOPP. The example history has a depth of 4.
Whenever crossover or mutation is applied to create a new individual, the appropriate section(s) of the parental history/ies are copied into the new individual’s operator history. The empty square bracket entries in the history, [ ], signify a null entry
and are required since any chromosome produced by mutation has one parent only.
The purpose of the operator history is to enable the appropriate assignment of
credit, not only to the immediate operator which produced the child, but also to operators which played a contributory role in the creation of the child. This is important,
for example, in addressing the intuition that mutation may introduce useful genetic material into a population which is then spliced together with other fit genetic material,
via crossover, to produce a good offspring. Clearly, crediting only crossover in such a
scenario would be unfair and misleading.
Credit is assigned to the operators when an individual is created which is considered to be an improved chromosome. ADOPP considers an individual improved if its
fitness is greater than the median individual's fitness.
When an improved individual is created, a credit value of 1.0 is awarded to the
immediate operator. The operators which created the new individual's parents are
awarded a credit value of 1.0 × decay, where decay is a value in the range 0.0 to 1.0
and controls the generosity of ancestral credit assignment. The grandparent operators
are awarded 1.0 × decay², and so on, back to the final layer of ancestral operators.
Decay is the second ADOPP parameter.
For the example history of Figure 3.1, assuming a decay value of 0.6, crossover
would therefore be awarded a total credit of 1.0 + 0.6 + 2 × 0.6² + 3 × 0.6³ = 2.968.
Meanwhile, mutation would be awarded a total credit of 0.6 + 0.6² + 2 × 0.6³ = 1.392.
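To make the decay rule concrete, the following sketch accumulates decayed credit over the layers of a history. The layer counts used are hypothetical, chosen to match the worked example; the class and method names are illustrative, not those of the actual implementation.

```java
/** Illustrative sketch: decayed ancestral credit assignment over an operator history. */
public class AncestralCredit {

    /**
     * counts[d][0] holds the number of crossover entries at depth d of the history,
     * counts[d][1] the number of mutation entries; depth 0 is the immediate operator.
     * Each layer's contribution is scaled by decay^d.
     */
    public static double[] credit(int[][] counts, double decay) {
        double credCr = 0.0, credMu = 0.0, weight = 1.0;
        for (int d = 0; d < counts.length; d++) {
            credCr += counts[d][0] * weight;
            credMu += counts[d][1] * weight;
            weight *= decay; // each ancestral layer is worth decay times less
        }
        return new double[] { credCr, credMu };
    }

    public static void main(String[] args) {
        // Hypothetical layer counts consistent with the worked example (decay = 0.6)
        int[][] history = { { 1, 0 }, { 1, 1 }, { 2, 1 }, { 3, 2 } };
        double[] c = credit(history, 0.6);
        System.out.printf("crossover = %.3f, mutation = %.3f%n", c[0], c[1]);
        // crossover = 2.968, mutation = 1.392
    }
}
```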
In the event of a new chromosome not being an improvement, then both crossover
and mutation are awarded zero credit, but note that the invocation of the operator is still
recorded in the queue of recent operators. Also, since our GA implementation does not
permit duplicate chromosomes into the population, then if a duplicate is produced, this
too results in zero credit for crossover and mutation.
Operator Queue
The operator queue is used to store recent operator invocations, along with the associated credit values for crossover and mutation. Each time a new individual is created,
the operator used is always recorded. If the new individual is an improvement (better
than the median individual and not already in the population), then the credit values
for crossover and mutation are derived from the new individual’s operator history, and
these values are enqueued along with the operator type. Otherwise, zero credit values
are enqueued along with the operator type.
The length of the queue is defined by an integer value, qlen, which is ADOPP’s
third and final parameter. Initially, ADOPP operates with fixed operator probabilities
(as per the normal GA) until qlen entries have been added to the operator queue. After
this point the operator probabilities are derived from the operator queue, and enqueueing a new entry requires removing the oldest one. The initial, fixed probabilities, in
both the original and this work, are set at P(Cr) = P(Mu) = 50%.
The operator queue has four variables associated with it, namely the number of
crossover and mutation invocations recorded, num(Cr) and num(Mu), respectively,
and the total credit due to crossover and mutation (summed across all queue entries),
cred(Cr) and cred(Mu), respectively. When a new entry is added to the queue, the
relevant invocation count is incremented by one and the credit values being enqueued
are added to cred(Cr) and cred(Mu) accordingly. Note that these values may be zero.
When an entry is removed from the queue the opposite process to adding an entry
is followed; the appropriate operator count is decremented by one and the credit scores
for crossover and mutation are subtracted from cred(Cr) and cred(Mu), respectively.
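This bookkeeping can be sketched as a fixed-length queue with running totals. The structure below is illustrative only; the field and class names are assumptions, not the dissertation's code.

```java
import java.util.ArrayDeque;

/** Illustrative sketch of the fixed-length operator queue with running totals. */
public class RecentOps {
    static final class Entry {
        final boolean crossover; final double credCr, credMu;
        Entry(boolean crossover, double credCr, double credMu) {
            this.crossover = crossover; this.credCr = credCr; this.credMu = credMu;
        }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();
    private final int qlen;
    int numCr = 0, numMu = 0;       // invocation counts per operator
    double credCr = 0.0, credMu = 0.0; // total credit per operator

    RecentOps(int qlen) { this.qlen = qlen; }

    boolean full() { return queue.size() == qlen; }

    /** Record one operator invocation; evict the oldest entry once the queue is full. */
    void add(boolean crossover, double cCr, double cMu) {
        if (full()) {
            Entry old = queue.removeFirst();
            if (old.crossover) numCr--; else numMu--;
            credCr -= old.credCr;
            credMu -= old.credMu;
        }
        queue.addLast(new Entry(crossover, cCr, cMu));
        if (crossover) numCr++; else numMu++;
        credCr += cCr;
        credMu += cMu;
    }
}
```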
The four values are then used to derive the operator probabilities via the following
expression:

P(Cr) = (cred(Cr) / num(Cr)) / ((cred(Cr) / num(Cr)) + (cred(Mu) / num(Mu)))

Therefore, quite simply, the total credit values are scaled according to the frequency
of the appropriate operator, and the resulting values are normalised with each other to
satisfy the earlier condition that P(Cr) + P(Mu) = 100%. To ensure that an operator
always has some participation, the probabilities are bounded between 5% and 95%.
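Under these definitions, the derivation and bounding of P(Cr) might look as follows. This is a sketch only; in particular, the fallback to 50% when no credit has yet been recorded is an assumption, not something specified by the source.

```java
/** Illustrative sketch of ADOPP's probability derivation with 5%-95% bounding. */
public class OperatorProbability {

    /** P(Cr) = (cred(Cr)/num(Cr)) / (cred(Cr)/num(Cr) + cred(Mu)/num(Mu)), clamped. */
    public static double pCrossover(double credCr, int numCr, double credMu, int numMu) {
        double rateCr = numCr > 0 ? credCr / numCr : 0.0; // mean credit per invocation
        double rateMu = numMu > 0 ? credMu / numMu : 0.0;
        // Fall back to 50% if neither operator has earned any credit (an assumption)
        double p = (rateCr + rateMu) > 0.0 ? rateCr / (rateCr + rateMu) : 0.5;
        // Keep both operators participating by bounding between 5% and 95%
        return Math.min(0.95, Math.max(0.05, p));
    }
}
```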
ADOPP Algorithm
Algorithm 2 summarises the above discussion and presents the system in a more complete form.
Algorithm 2 ADOPP GA
1. Initialise population (at random and with no duplicates)
2. Evaluate all individuals in population
3. Select a parent
4. IF operator queue is full
5. THEN select operator using individual dynamic probabilities
6. ELSE select operator using fixed global probabilities
7. IF crossover selected
8. THEN get another parent and apply crossover
9. ELSE apply mutation
10. IF new child is not already in population
11. THEN
11.1. Evaluate the child and copy the parental histories
11.2. IF child is an improvement
11.3. THEN enqueue calculated credit values along with applied operator
11.4. ELSE enqueue zero credit along with applied operator
11.5. Insert child into population
12. ELSE discard new child
13. Go to step 3 until the maximum number of evaluations has been reached
3.4 ADOPP Modification for Individual Adaptation
The system described in section 3.3 adapts operator probabilities at the population
level. The enhancement described in this section allows the same mechanism to adapt
at the level of the individual, in theory enabling finer grained adaptation, and hopefully
providing more effective optimisation.
Primary Extension
The main modification made to ADOPP is that instead of just one operator queue, each
individual has its own queue (as is already the case for the operator history structure).
The initial behaviour of the individual level ADOPP system (iADOPP) is identical to
the original, that is, until all operator queues are full, operators are selected based on
static probabilities (again, both set at 50%). Until all queues are full each new entry
is added to each and every queue. This ensures that all operator probabilities begin
adapting at the same time, and also enables adaptation in the shortest possible time; if
we were to wait for each queue to populate individually it is likely that adaptation may
not occur sufficiently early on in the GA run. For example, if we have a population size
of 100 and a qlen of 100, this results in a worst case scenario of 10,000 iterations before
all queues are full, within which time the GA will more than likely have converged
already, making adaptation rather pointless. In reality it is more likely that far fewer
than 10,000 iterations would be required before the onset of adaptation; however, the
delay would very likely still be long enough that adaptation would not occur at a
suitably early stage in a run.
Note that once the queues are full, new entries are enqueued only onto the currently
selected queue, i.e. the queue belonging to the parent selected in step 3 of algorithm 2.
Additional Copying Overheads
Given that each individual now has its own operator queue, we must copy the parental
queue into the new chromosome’s queue, as well as the operator histories. For mutation
this is straightforward as we are concerned with only one parent, and therefore one
queue. For crossover it was decided to copy the queue of the first parent selected.
This copying illustrates the more demanding overheads, both in terms of bookkeeping and computational effort, of realising adaptation at the individual level. Such practical requirements essentially demand that benefit is gained from
these more costly mechanisms. Otherwise there are simply no grounds for justifying
their inclusion.
Improvement Criteria
In ADOPP an improved chromosome is one whose fitness is greater than the median
individual’s fitness. This clearly makes sense from a population point of view, but
less so from the individual standpoint. Another option is to seek an improvement
over the parent’s fitness, which focuses the adaptation more tightly on the individual
level. Again, for mutation, this is a straightforward exercise, but for crossover we must
decide over which parent we seek an improvement. There are three main choices to
consider:
1. 1st parent selected (or 2nd, the point being we simply always select a certain
parent unconditionally for comparison)
2. improvement over the less fit parent
3. improvement over the fitter parent
Clearly option 3 is the most demanding and option 2 the least. Because both parents
are selected by linear rank, option 1 forms a kind of stochastic superset of options
2 and 3. Some exploratory experimentation found that no particular approach exhibited
any advantage over another; option 1 was therefore chosen as it is the simplest.
Both median and parent improvement versions of iADOPP were implemented and
feature in experiments discussed in Chapter 5.
We can now refine the original three GA types stated earlier to include both iADOPP
implementations, resulting in four GA types:
1. Non-adaptive (or normal)
2. Population level adaptive
3. Individual level adaptive (using median improvement)
4. Individual level adaptive (using parent improvement)
In addition to the main GA components discussed, it is also worthwhile to provide
an overview of the genetic operators utilised. For the binary encoded problems these
operators are very straightforward, while the permutation encoding operators are more
complicated since they essentially encode some domain knowledge.
3.5 Genetic Operators
3.5.1 Binary Encoding Operators
The following operators are used for all test problems apart from the travelling salesman problems. The test problems are discussed in detail in Chapter 4.
Uniform Crossover
This is a common crossover technique [21] for binary encoded strings. Each allele
is selected stochastically from one of the parents to form a single offspring. For
example, the two parents 11111111 and 00000000 may form the following offspring:
11011010. Usually, each parent is chosen with equal probability for each allele, and
this approach is taken here.
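A minimal sketch of uniform crossover as described (the class name and genome representation are illustrative only):

```java
import java.util.Random;

/** Illustrative sketch of uniform crossover on binary strings. */
public class UniformCrossover {
    /** Each gene is taken from either parent with equal probability. */
    public static boolean[] cross(boolean[] p1, boolean[] p2, Random rng) {
        boolean[] child = new boolean[p1.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = rng.nextBoolean() ? p1[i] : p2[i]; // pick a parent per gene
        }
        return child;
    }
}
```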
Bit–flip Mutation
Bit–flip mutation was discussed in Chapter 1 and is worth reiterating here. Mutation is
invoked stochastically at a rate determined by its operator probability, P(Mu). Once
invoked, the mutation rate applies to each bit in the string and defines the likelihood
that any given bit will be inverted. Under this scheme when a bit is mutated it is
guaranteed to change.²

2. This is subtly different from Holland's originally proposed mutation [9]. Holland defined a mutation event as the random assignment of a value from the set {0, 1} to a gene (the work is based on binary strings); therefore, though mutation may be applied to a gene, a change in that gene is not guaranteed as a result. This scheme is also applied stochastically with a small probability.
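A minimal sketch of the per-bit part of this scheme, once the operator has been invoked (illustrative only):

```java
import java.util.Random;

/** Illustrative sketch of bit-flip mutation: each bit inverts with probability rate. */
public class BitFlip {
    public static boolean[] mutate(boolean[] genome, double rate, Random rng) {
        boolean[] child = genome.clone();
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < rate) {
                child[i] = !child[i]; // a mutated bit is guaranteed to change
            }
        }
        return child;
    }
}
```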
3.5.2 Permutation Encoding Operators
These operators are used only for the travelling salesman problems, which use permutations of integers to represent non–cyclical tours; the problems themselves are discussed in Chapter 4.
Very Greedy Crossover
This operator is described in [12]. The basis of very greedy crossover (VGX) is that the
shortest edges are heavily favoured when constructing a new tour from two parents.
The algorithm is as follows: a starting city is selected at random and the four parental
edges connected to that city are determined (call this the edge list). If there is a shared
edge in the edge list which is to an unvisited city, then that edge is appended to the tour.
If there are no shared edges then the shortest non–cyclical parental edge is appended.
If it happens that all parental edges cause cycles, then the shortest edge to a previously
unvisited city is selected. We can see therefore that the respect shown to shared edges
is the only manner in which VGX does not behave greedily [12].
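The following sketch is one interpretation of VGX; the tie-breaking behaviour and the way shared edges are detected are assumptions here, and [12] should be consulted for the definitive algorithm.

```java
import java.util.*;

/** Illustrative interpretation of very greedy crossover (VGX) for symmetric TSPs. */
public class Vgx {
    /** Neighbour sets of each city in a non-cyclical tour (no wrap-around edge). */
    static List<Set<Integer>> neighbours(int[] tour) {
        List<Set<Integer>> nb = new ArrayList<>();
        for (int i = 0; i < tour.length; i++) nb.add(new HashSet<>());
        for (int i = 0; i + 1 < tour.length; i++) {
            nb.get(tour[i]).add(tour[i + 1]);
            nb.get(tour[i + 1]).add(tour[i]);
        }
        return nb;
    }

    static int[] cross(int[] p1, int[] p2, double[][] dist, Random rng) {
        int n = p1.length;
        List<Set<Integer>> nb1 = neighbours(p1), nb2 = neighbours(p2);
        boolean[] visited = new boolean[n];
        int[] child = new int[n];
        int cur = rng.nextInt(n);
        child[0] = cur; visited[cur] = true;
        for (int k = 1; k < n; k++) {
            int next = -1;
            // 1. an edge shared by both parents to an unvisited city is always respected
            for (int c : nb1.get(cur))
                if (!visited[c] && nb2.get(cur).contains(c)) next = c;
            // 2. otherwise the shortest parental edge to an unvisited city
            if (next == -1) {
                Set<Integer> cand = new HashSet<>(nb1.get(cur));
                cand.addAll(nb2.get(cur));
                for (int c : cand)
                    if (!visited[c] && (next == -1 || dist[cur][c] < dist[cur][next])) next = c;
            }
            // 3. otherwise the shortest edge to any previously unvisited city
            if (next == -1)
                for (int c = 0; c < n; c++)
                    if (!visited[c] && (next == -1 || dist[cur][c] < dist[cur][next])) next = c;
            child[k] = next; visited[next] = true; cur = next;
        }
        return child;
    }
}
```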
Edge-swap Mutation
For permutation encoded problems, i.e. TSPs, mutation is defined as the reversal of the
cities between two randomly chosen points in a tour. This has the effect of changing
two edges in a tour. Although several cities may be ‘disrupted’ in terms of where they
appear in the permutation, since going from city A to city B is equivalent to going
from city B to city A in a symmetric TSP (which all TSPs in this work are), it is only
the two altered edges (those at the start and end of the reversed section) that actually
affect the length of the tour. Figure 3.2 illustrates the operation for an 8 city problem.
The shaded section of the parent (top) indicates the portion of the tour that has been
randomly selected for reversal.
Figure 3.2: Edge-swap Mutation
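The reversal operation itself can be sketched as follows (illustrative only; the cut points here are drawn uniformly, which is an assumption):

```java
import java.util.Random;

/** Illustrative sketch of edge-swap mutation: reverse a randomly chosen segment. */
public class EdgeSwap {
    public static int[] mutate(int[] tour, Random rng) {
        int i = rng.nextInt(tour.length), j = rng.nextInt(tour.length);
        int lo = Math.min(i, j), hi = Math.max(i, j);
        int[] child = tour.clone();
        // Only the two boundary edges of the reversed section change the tour length
        while (lo < hi) {
            int tmp = child[lo]; child[lo] = child[hi]; child[hi] = tmp;
            lo++; hi--;
        }
        return child;
    }
}
```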
3.6 Summary
This section has described the implementation effort undertaken in order to produce a
system via which the aims of the project can be pursued.
The primary system design is that proposed in [13], which provides a learning
rule adaptation mechanism which is used to modify operator probabilities during the
execution of a GA at the population level. The adaptive mechanism works with a
steady state GA and a population that does not permit duplicate chromosomes, i.e.
newly created instances of an already existing individual cannot be inserted into the
population.
The underlying steady state GA was utilised as a means to provide a non–adaptive
counterpart by removing those aspects of the system relating to operator probability
adaptation. By including an appropriate normal GA implementation for comparative
purposes, valid conclusions can be drawn regarding the effects and benefits (if any) of
adaptation.
The population level adaptive system was extended to allow finer grained modification of operator probabilities. This enabled an investigation of the differences (if any)
between population and individual level operator probability adaptation to be carried out. This
type of comparison has not previously been conducted for a learning rule adaptive
system, across the adaptation levels considered here.
Chapter 4
The Test Problems
4.1 Overview
Test problem selection is as important a part of any empirical investigation as the
choice of GA implementation and any parameter settings. Care should be taken to
try to select problems that will hopefully prove illuminating for the investigation at
hand. Using well studied problems can sometimes provide a useful means of comparison with previous experimental results. More important, though, is selecting
problems that address the assumptions behind any research. Usually, a broad range
of problems is useful as it can serve to highlight particular strengths (and weaknesses)
of the algorithms under scrutiny.
Here, the two test problems of the original work feature, namely binary f6 [4]
and a 30 city travelling salesman problem (TSP) [10], along with several others. The
complete set of test problems is as follows:
Binary f6
30 city TSP
100 city TSP
MaxOnes
De Jong Test Functions
Each test problem will now be discussed in some more detail and the reasons for
their inclusion given.
The TSP problems use the permutation encoding operators while all other problems
use uniform crossover and bit-flip mutation, as discussed in Chapter 3.
In all the binary coded problems (all except the TSPs), the uniform crossover probability, i.e. the likelihood of selecting the current bit value from the 2nd parent, is 0.5.
The mutation rates are all based on the standard rate of 1/l, where l is the bit string
length of the individuals. For binary f6, the mutation rate chosen is 0.05, the same as
the value reported in [13].
Details of all static parameter settings, chromosome string lengths etc. are given in
Appendix A.
4.2 Binary f6
f6(x1, x2) = 0.5 − (sin²(√(x1² + x2²)) − 0.5) / (1.0 + 0.001(x1² + x2²))²

−100.0 ≤ xi ≤ 100.0

max f6 = f6(0, 0) = 1

[Plot: binary f6 with x2 held at its optimal value of 0 and x1 varied across the entire input range.]
The above visualisation of binary f6 is similar to that featured in [4]. x2 is held at
the optimal value of 0 and x1 is then varied across the entire input range. x2 behaves in a
symmetric manner to this, therefore swapping the roles of the variables would produce
the same result. Binary f6 is a very complex landscape with many concentrated local
maxima and only one global maximum. This problem is one of two attempted in the
original work [13].
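A sketch of the binary f6 objective, in the common maximisation form of Schaffer's f6 with maximum f6(0, 0) = 1 (the class name is illustrative):

```java
/** Illustrative sketch of binary f6 (Schaffer's f6, maximisation form). */
public class BinaryF6 {
    static double f6(double x1, double x2) {
        double r2 = x1 * x1 + x2 * x2;
        double s = Math.sin(Math.sqrt(r2));
        double d = 1.0 + 0.001 * r2;
        return 0.5 - (s * s - 0.5) / (d * d); // global maximum of 1 at the origin
    }
}
```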
Figure 4.1: Map of 30 city TSP
4.3 30 City TSP
This is the second problem featured in [13]. A display of the city layout is shown in
Figure 4.1. While this is not as telling as an impression of the actual fitness landscape
of the problem, it at least gives an indication of the general structure and a means of
comparison with other TSPs. The minimum tour length of this TSP instance is 420.
We can see that the distribution of cities is relatively even. There is a little clustering
present, but nothing extreme.
4.4 100 City TSP
Figure 4.2: Map of 100 city TSP
This problem¹ provides an expansion upon the original set, while keeping the

¹Data for the 100 city TSP was obtained from TSPLIB: http://www.iwr.uni-
problem type familiar. From Figure 4.2 we can see that this problem is also evenly
distributed, with no specific clusters or structure in the layout. The minimum tour
length is 21282.
4.5 MaxOnes
MaxOnes is a simple maximisation problem where the maximum fitness is achieved
by having a string of all ‘1’s. The fitness for any individual is simply the number of
‘1’s present in the string. A string length of 100 bits is used, yielding a mutation rate
of 0.01. The population size was taken as 100. This problem is included to see if adaptation
can exploit a simple problem, or if adaptation is actually detrimental to performance
in this situation.
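A sketch of the MaxOnes fitness function as described (illustrative class name):

```java
/** Illustrative sketch: MaxOnes fitness is the number of '1' (true) bits present. */
public class MaxOnesFitness {
    static int fitness(boolean[] s) {
        int ones = 0;
        for (boolean b : s) if (b) ones++;
        return ones; // maximum fitness is the string length (all ones)
    }
}
```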
4.6 De Jong Functions
The De Jong test suite [11] is well known in the GA field. It presents a varied mixture
of landscape types in isolation, and can provide useful information on the relative
strengths and weaknesses of an optimiser. The suite is composed of five problems,
each discussed in turn.
Note that for the visualisation of functions provided, the fitness landscapes are
shown as maximisation problems. In fact, they are implemented as minimisation problems (the usual approach, and that of [11]). However, the visualisation of the problems
is clearer if presented from the maximisation standpoint. In reality it makes no difference whether the functions are treated as max/minimisation problems, as both forms
are equivalent.
It was found that some functions required more or fewer evaluations than the typical 5000
and the stated values were derived via some extended preliminary runs. The population
size was taken as 100 for all functions (details in Appendix A).
heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA100.tsp.gz
f1 - Sphere Model
f1(x) = Σ_{i=1}^{3} xi²

−5.12 ≤ xi ≤ 5.12

min f1 = f1(0, 0, 0) = 0
F1 is the simplest of the De Jong test problems; it is smooth and unimodal, and
should not present any problems for any capable optimiser. Similarly to MaxOnes, this
problem should test whether adaptation can bring any performance benefit for such a
simple landscape. Certainly, it is not expected that individual level adaptation would
provide any benefit for such a problem, given the lack of any localised sub–features
which may be exploited.
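A sketch of the sphere function (illustrative class name):

```java
/** Illustrative sketch of De Jong f1 (sphere): smooth, unimodal, minimum 0 at the origin. */
public class DeJongF1 {
    static double f1(double[] x) {
        double sum = 0.0;
        for (double xi : x) sum += xi * xi; // simple sum of squares
        return sum;
    }
}
```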
f2 - Rosenbrock’s Function
f2(x1, x2) = 100(x1² − x2)² + (1 − x1)²

−2.048 ≤ xi ≤ 2.048

min f2 = f2(1, 1) = 0
F2 is a complicated surface which features a ridge as the maximal feature of the
landscape. This ridge follows a parabolic trajectory and has a global optimum at just
one point on the ridge. This is a more promising candidate for individual level adaptation due to the variations and non-uniformity present compared with, for example,
f1.
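A sketch of Rosenbrock's function (illustrative class name):

```java
/** Illustrative sketch of De Jong f2 (Rosenbrock): parabolic ridge, minimum 0 at (1, 1). */
public class DeJongF2 {
    static double f2(double x1, double x2) {
        double a = x1 * x1 - x2;
        double b = 1.0 - x1;
        return 100.0 * a * a + b * b;
    }
}
```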
f3 - Step Function
This function is composed of several completely flat, incremental plateaus. In a sense,
it shares some commonality with MaxOnes, since the landscape of MaxOnes is essen-
f3(x) = 25 + Σ_{i=1}^{5} ⌊xi⌋

−5.12 ≤ xi ≤ 5.12

min f3 = f3([−5.12, −5], ..., [−5.12, −5]) = 0
tially composed of several ‘levels’, with only one minimum (the all zero string) and
only one maximum (the all one string) representing the extremes of the landscape. In
between these extremes it is quite possible to have several distinct individuals that have
the same fitness value.
Similarly, it is possible to produce new individuals in the f3 landscape that, although different, have the same fitness value as existing individuals. This raises the
possibility of creating children that do not yield any information upon which adaptation can progress. This landscape may therefore prove difficult for adaptation to exploit
effectively.
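A sketch of the step function; taking the integer part by truncation toward zero is an assumption here, chosen because it is consistent with the stated minimum of 0 for xi in [−5.12, −5]:

```java
/** Illustrative sketch of De Jong f3 (step): flat, incremental plateaus. */
public class DeJongF3 {
    static double f3(double[] x) {
        double sum = 25.0;
        for (double xi : x) sum += (int) xi; // integer part by truncation toward zero
        return sum;
    }
}
```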
f4 - Quartic Function with Noise
f4(x) = Σ_{i=1}^{30} i·xi⁴ + Gauss(0, 1)

−1.28 ≤ xi ≤ 1.28

min f4 = f4(0, ..., 0) = 0
This function features the addition of Gaussian(0,1) noise, and therefore the landscape example presented can be thought of as an indicative ‘sample’ of the surface,
which ultimately changes as the search progresses. The underlying surface is quite
simple, however.
This problem may shed some light on the different GA types’ capability in handling
noise.
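A sketch of the quartic function with noise; adding a single Gaussian(0, 1) sample per evaluation is the interpretation assumed here:

```java
import java.util.Random;

/** Illustrative sketch of De Jong f4 (quartic with Gaussian noise). */
public class DeJongF4 {
    static double f4(double[] x, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (i + 1) * Math.pow(x[i], 4); // i runs from 1 in the usual definition
        }
        return sum + rng.nextGaussian(); // noise is resampled on every evaluation
    }
}
```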
f5 - Shekel's Foxholes

1 / f5(x1, x2) = 1/500 + Σ_{j=1}^{25} 1 / (j + Σ_{i=1}^{2} (xi − aij)⁶)

where (a1j) cycles through (−32, −16, 0, 16, 32) and (a2j) steps through the same values, each repeated five times.

−65.536 ≤ xi ≤ 65.536

min f5 = f5(−32, −32) ≈ 1
F5 is an extreme landscape, featuring several sharp local optima (and one global
optimum). Again, we have large areas of this landscape from which no useful information can be gleaned due to the considerable flat areas. This may well cause problems
for the adaptive GAs, although there is also high non–uniformity present thereby combining desirable and detrimental features.
4.7 Summary
As discussed earlier, test problem selection is an important part of any empirical investigation of GAs.
The intuition underpinning the hypothesis is that individual level adaptation will be
able to exploit localised features of a fitness landscape that population level will be
unable to, due to population level adaptation being too coarse.
The test problems selected feature both unimodal and multimodal landscapes and
varying degrees of non-uniformity.
The diversity of the landscapes featured provides a reasonable basis with which to
test this intuition.
Chapter 5
Formative Experiments
5.1 Aims
Two primary aims are pursued in this formative experimentation:
1. Investigate the sensitivity in performance of the GA to variation in relevant parameters.
2. Identify the best parameter settings for a given GA type, such that these settings
may be used for more in–depth experimentation.
The first aim allows for an informal comparison to be made between GA types
regarding the impact of parameter changes on performance.
The second aim addresses ‘fair tuning’, which is often lacking in other comparative
work. Basically, when comparing two or more systems an approximately equal amount
of effort should be expended on parameter exploration for each system. By doing
so, confidence is gained that any comparisons being made relate to the systems’ best
possible performance.
5.2 Methodology
For every configuration, i.e. combination of test problem and GA type, a number of
different parameter values will be exercised. For the normal GA, the only parameter
under investigation is the crossover probability, P(Cr), and by implication the mutation
probability, P(Mu). For the adaptive GAs the three ADOPP parameters of depth, decay
and qlen will be explored.
Each distinct setting of parameter(s) is known as a treatment, and several are run for
every configuration. All treatment results for a particular problem are shown together
on the same graph, providing a form of visualisation of the GA's robustness; the less
variation that exists between the treatments' performances, the more robust the GA is to
parameter changes.
Sections 5.2.1 and 5.2.2 discuss the values which feature in each treatment for the
normal GA and adaptive GA experiments, respectively.
5.2.1 Normal GA Parameter Treatments
The ADOPP system bounds the operator probabilities within the range of 5% to 95%.
It is therefore fair to consider the normal GA somewhere within the same range.
The values of P(Cr) tested range from 10% to 90%, in increments of 5%, resulting
in a set of 17 treatments, each of which was run 10 times and the average performance
reported. In total this resulted in 170 GA runs for tuning and investigation purposes.
All graphs detailing normal GA performance display the results for all 17 treatments
simultaneously, which helps to show both the sensitivity of performance to the probability settings and also whether or not there is a clearly favoured static setting.
5.2.2 Adaptive Parameter Treatments
In [13] the following values were reported for each parameter:
depth [1, 5, 8]
decay [0.5, 0.8, 0.95]
qlen [10, 100, 500]
A finer grained approach is taken here, incorporating the following ranges:
depth [1, 3, 5, 7, 10, 15¹]
decay [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
qlen [10, 50, 100, 200, 350, 500]
The parameters are varied independently, requiring a ‘locked’ value for each parameter (depth = 5, decay = 0.8 and qlen = 100) which is assumed whenever that particular parameter is not being varied. An exhaustive search of all 6 × 6 × 6 = 216 combinations is
not feasible. These locked values are the same as those used in the original work.
This range set provides 16 combinations, each of which was run 10 times and the
average performance reported. The resulting computational effort is 160 runs for each
problem, comparable with the 170 expended for the normal GA instances.
As for the normal GA results, the performances of all 16 treatments are shown simultaneously on graphs displaying adaptive GA results, which allows for easy comparison
between performances for each treatment.
5.3 Results
The following results provide an informal characterisation of the response of each configuration to parameter variation. Furthermore they enable the identification of suitable
parameter settings from which to conduct more in–depth experiments. It should be
noted however that no conclusive points are, or should be, drawn from these results.
5.3.1 Binary F6
There is not a great deal of difference apparent in the performance for binary f6 across
the four GA types. The ‘spread’ in final solution values is very similar for each GA, although the individual level adaptive GA with parent improvement has the most focused
values (i.e. the least spread).
1. For the 100 city TSP, it was found that a depth of 15 caused a Java out-of-memory error. This was due to the large population size of 500 required for this problem. Therefore this depth value was not included in runs pertaining to the 100 city TSP.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots solution fitness against evaluations.]
Figure 5.1: Performance for Binary F6 - All GA Types
Generally, all GA types behaved in a consistent and fairly robust manner under variations in parameter values. The best solution, of 0.995141, was achieved by the individual level adaptive GA with parent improvement. The settings used to obtain this were
depth = 5, decay = 0.8 and qlen = 500.
5.3.2 30 City TSP
Figure 5.2 illustrates clearly that the ADOPP GAs (operating at both the population and
individual level) are considerably more robust to parameter changes than the normal
GA, for this problem. As can be seen, the normal GA often fails to find the minimum
tour length of 420. Indeed, the only setting which achieved this for all 10 runs was
P(Cr) = 65%.
Of all the adaptive treatments, only one failed to achieve (on average) the minimum
tour length. This occurred for the individual level adaptive GA with parent improvement and the settings responsible were depth = 5, decay = 0.8 and qlen = 50.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots tour length against evaluations.]
Figure 5.2: Performance for 30 city TSP - All GA Types
5.3.3 100 City TSP
For the 100 city TSP we have very similar results to those obtained for the 30 city TSP,
as shown in Figure 5.3, with a marked decrease in parameter variation sensitivity for
the adaptive GAs. The minimum tour length of 21282 was not achieved by any normal
GA treatment. The best tour achieved by the normal GA was 21359.3, with a P(Cr) of
40%.
Whilst there were several instances of the best tour being achieved with certain
adaptive treatments, none obtained the minimum tour on all 10 runs. Even so, these
results strongly suggest that performance is improved by the inclusion of adaptation.
The best average tour length found was 21336.7, which was obtained by the individual level adaptive GA with parent improvement. The parameter settings were depth
= 5, decay = 0.8 and qlen = 100.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots tour length against evaluations.]
Figure 5.3: Performance for 100 city TSP - All GA Types
5.3.4 MaxOnes
Figure 5.4 shows yet another example of the improved robustness of the adaptive approaches. While the maximum fitness of 100 was eventually achieved by every treatment of each GA type, clearly the adaptive GAs’ performances are more consistent
and reliable.
Since nothing can be said regarding final solution quality for this problem, the
speed of convergence will be considered instead.
For the normal GA, a P(Cr) of 55% was found to offer the best average convergence time to the optimum solution, which was found after 1950 evaluations.
It was found that the best average convergence times of the adaptive GAs were
actually slightly slower than that of the (optimally tuned) normal GA. The
adaptive GAs required between 2025 and 2100 evaluations before reaching the
optimum, suggesting that the much improved robustness may come at the cost
of slightly slower convergence to the optimum solution.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots solution fitness against evaluations.]
Figure 5.4: Performance for MaxOnes - All GA Types
5.3.5 De Jong f1
For De Jong f1, no striking differences in performance are evident from Figure 5.5.
This is a very straightforward landscape, so the result is not too surprising.
The best result of 9 × 10⁻⁵ was obtained by the population level adaptive GA; the settings used were depth = 5, decay = 0.8 and qlen = 100.
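De Jong f1 is the sphere function, a sum of squared variables; a sketch under the standard formulation (three variables, each in [−5.12, 5.12], with the minimum of 0 at the origin):

```python
def dejong_f1(x):
    # Sphere function: f1(x) = sum of x_i^2, with each x_i in [-5.12, 5.12].
    # The global minimum is 0 at the origin, so lower values are better.
    return sum(xi * xi for xi in x)
```

The bowl-shaped, unimodal surface is what makes this landscape straightforward for every GA type considered here.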
5.3.6 De Jong f2
Again, there are no obvious differences between the GA performances for De Jong f2,
as shown in Figure 5.6.
However, for the individual level adaptive GA with parent improvement, there is
clearly a treatment that is not performing as well as the others. The settings responsible
are depth = 5, decay = 0.8 and qlen = 10 and the minimum achieved by this setting
is only 0.0354. It is possible that the shorter queue length does not provide sufficient information for effective operator probabilities to be derived, thus impacting
on performance.

Figure 5.5: Performance for De Jong f1 - All GA Types
The best solution overall, 7.063 × 10⁻⁴, was obtained by the normal GA with a P(Cr) of 65%.
5.3.7 De Jong f3
Figure 5.7 shows a more convincing difference between the approaches. As was the
case for the TSP problems and MaxOnes, the adaptive GAs applied to De Jong f3
appear to offer increased robustness to parameter variance, in comparison to the normal
GA.
There were several instances of parameter settings successfully achieving the minimum result of 0 across all the GA types. However, for the normal and population level
adaptive approaches, there were a number of settings which on average did not obtain
the minimum.
The individual level adaptive GAs each featured only one setting which failed to
achieve the minimum on all ten runs. For the median improvement version the settings
Figure 5.6: Performance for De Jong f2 - All GA Types
of depth = 5, decay = 0.8 and qlen = 10 obtained an average fitness of 0.1. The parent
improvement version achieved an average fitness, also of only 0.1, with settings of
depth = 7, decay = 0.8 and qlen = 100.
The best results overall in terms of speed to solution was obtained by individual
level adaptation (median improvement) with settings of depth = 5, decay = 0.4 and
qlen = 100. With these settings the optimum was found after at most 1550 evaluations.
5.3.8 De Jong f4
No strong differences are evident between the GA types for De Jong f4 (Figure 5.8).
Due to the fact that noise is added to the ‘pure’ value returned by the quartic function,
in practical terms it is not possible to obtain the minimum of 0.
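A sketch of the standard f4 formulation makes the point: the Gaussian(0, 1) term means even a perfect candidate rarely evaluates to 0 (the `gauss` parameter is added here for testability and is not part of the original definition):

```python
import random

def dejong_f4(x, gauss=random.gauss):
    # Quartic function with noise: f4(x) = sum of i * x_i^4, plus N(0, 1) noise.
    pure = sum((i + 1) * xi ** 4 for i, xi in enumerate(x))
    return pure + gauss(0.0, 1.0)
```

Since the noise is resampled on every evaluation, the fitness of even the all-zero candidate fluctuates around 0 rather than attaining it.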
The best average solution fitness found was 2.392 and this was obtained by the
individual level adaptive GA with median improvement. The settings were depth = 5,
decay = 0.8 and qlen = 100.
Figure 5.7: Performance for De Jong f3 - All GA Types
5.3.9 De Jong f5
Performance results for De Jong f5 are again all fairly similar (Figure 5.9). If anything, the normal GA results seem to be more consistent than those of the adaptive GAs, and for this problem the normal GA did produce the best average performance: with a P(Cr) of 25%, a solution fitness of 1.019 was obtained.
We can see that one treatment in particular has performed relatively poorly for the
population level adaptive GA. The relevant settings were depth = 3, decay = 0.8 and
qlen = 100 and a solution fitness of only 2.119 was achieved.
5.4 Revisiting Median/Parent Improvement
Regarding the two versions of individual level adaptation considered thus far, it was noted that the median improvement criterion obtained a higher final solution fitness than the parent improvement criterion for only two of the test problems.
These were De Jong f2 and f4. However, performing a t–test on 50 runs of each
Figure 5.8: Performance for De Jong f4 - All GA Types
problem instance, to determine whether there was a significant difference between the median and parent improvement criteria, resulted in p–values of 0.78 for f2 and 0.70 for f4. Details of the t–test procedure used and interpretation of the p–value are discussed in section 6.2.1.
It is clear that the difference is not significant; therefore, individual level adaptation using median improvement is no longer considered in the remaining experiments. Henceforth, the term individual level adaptation can be taken to mean individual level adaptation with the parent improvement criterion.
5.5 Tuned Parameter Values
Table 5.1 summarises the optimal parameter settings found for each problem. The
entry relating to normal GAs specifies P(Cr), while the entries relating to the adaptive
GAs are of the form depth / decay / qlen.
56
Chapter 5. Formative Experiments
Problem        GA Type      Best Settings
Binary f6      Normal       50
               Pop Adapt    5 / 0.8 / 500
               Ind Adapt    3 / 0.8 / 100
30 city TSP    Normal       65
               Pop Adapt    5 / 0.8 / 50
               Ind Adapt    5 / 0.8 / 100
100 city TSP   Normal       40
               Pop Adapt    5 / 0.8 / 100
               Ind Adapt    5 / 0.8 / 100
MaxOnes        Normal       55
               Pop Adapt    5 / 0.6 / 100
               Ind Adapt    7 / 0.8 / 100
DeJong f1      Normal       75
               Pop Adapt    5 / 0.8 / 200
               Ind Adapt    5 / 0.8 / 50
DeJong f2      Normal       65
               Pop Adapt    5 / 0.2 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f3      Normal       65
               Pop Adapt    5 / 0.4 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f4      Normal       55
               Pop Adapt    15 / 0.8 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f5      Normal       25
               Pop Adapt    5 / 0.2 / 100
               Ind Adapt    5 / 0.8 / 10

Table 5.1: Tuned Parameter Settings
Figure 5.9: Performance for De Jong f5 - All GA Types
5.6 Summary
From Table 5.1, we can see that there is no particularly ‘dominant’ parameter setting. The P(Cr) values vary considerably, between 25% and 75%. This is entirely expected, given the diverse nature of the problems under consideration.
Generally the preferred ADOPP parameters tend to be ‘moderate’ in nature, with
the extremes of the parameter ranges tending not to feature. There are some exceptions
though, such as the large depth preferred by f4 and long qlen of binary f6. Also, f2 and
f3 show lower decay values across both adaptive GAs.
There are several instances of increased robustness to parameter changes for adaptive GAs, most notably for TSP–30, TSP–100, MaxOnes and De Jong f3.
It was noted that the results obtained were identical for a depth of 1 and for a decay of 0.0, with the other parameters at the locked values. Julstrom noted the equivalence implied by these parameter values in [13]; both result in only the immediate operator being credited. (Results are identical so long as the same random number generator seeding is used; otherwise results would be very similar.)
Chapter 6
Summative Experiments
6.1 Aims
The primary aim of these summative experiments is to enable the acceptance or rejection of the hypothesis of the project:
Using the same adaptive mechanism, individual level adaptation will perform as well as or better than population level adaptation.
Although we may show that individual level performance is equivalent to population level performance, it is hoped that individual level adaptation will offer a gain in
performance. Since the overheads involved in terms of book–keeping are higher for individual adaptation, some improvement is desired in order to offer grounds for justifying this additional overhead.
A complementary aim of these experiments is to investigate the nature of operator
probability for each test problem.
By considering significant differences in performance alongside the nature of adaptation observed, it is hoped that any relationships present across the orthogonal dimensions of adaptation level and problem type will become apparent.
6.2 Methodology
In order to test the hypothesis we must define a performance gain, which can be manifested in two ways:
1. Speed improvement: The individual level adaptive GA attains a solution of a
given quality in fewer evaluations than the population level adaptive GA.
2. Quality improvement: The individual level adaptive GA attains a converged solution of higher fitness than the population level adaptive GA.
Of course, it may be the case that, for example, a gain in speed is obtained to the
detriment of final solution quality. In this case it cannot be convincingly stated that
one method is truly ‘better’ than the other and therefore such a scenario would not be
considered a performance gain. More accurately it is a performance trade–off.
Therefore, in order to class individual level performance as improved, we must
show one of three scenarios to be true:
1. Individual level adaptation performs better in both speed and quality.
2. Individual level adaptation performs better in terms of speed without being outperformed in quality.
3. Individual level adaptation performs better in terms of quality without being outperformed in speed.
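These three scenarios can be stated compactly as a decision rule; a sketch where +1 means individual level adaptation is significantly better on that measure, −1 significantly worse, and 0 no significant difference (the function name is illustrative):

```python
def is_performance_gain(speed_cmp, quality_cmp):
    # Scenarios 1 and 2: a speed win without being outperformed on quality.
    if speed_cmp > 0 and quality_cmp >= 0:
        return True
    # Scenario 3: a quality win without being outperformed on speed.
    if quality_cmp > 0 and speed_cmp >= 0:
        return True
    # Anything else is either no gain or a speed/quality trade-off.
    return False
```

Note that a win on one measure combined with a loss on the other returns False, matching the trade-off case described above.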
In order to determine whether or not a perceived difference represents a significant
gain or otherwise in performance, t–tests were carried out across the final converged
solution fitnesses, for all test problems. Additionally, where there appeared to be a
potential speed gain during a run, then a t–test was performed at this point. Specific
details of the t–tests are discussed in section 6.2.1.
Using the parameter values in Table 5.1, 50 runs of each tuned GA type were
executed, for all problems. In performing pairwise comparisons between the GA types
the population level GA was taken as the basis of comparison. Therefore, the following
two pairwise comparisons are always made for each test problem:
1. normal GA vs population level adaptive GA
2. population level adaptive GA vs individual level adaptive GA
The first comparison provides a rigorous test of the normal GA against the population level ADOPP system – something that did not feature in the original work.
However, of most interest are the comparisons between population and individual
level adaptive GAs, as this is the central concern of the hypothesis under test.
In addition to scrutinising performance, the nature of operator probability adaptation will be considered for each problem. The hypothesis essentially makes the assumption that individual level adaptation will be able to exploit local features of the
fitness landscape in order to improve performance. It would therefore be expected that
different trajectories in probability will develop between population and individual
level adaptation.
For population level adaptation there is only one ‘set’ of operator probabilities: the global P(Cr) and P(Mu). For individual level adaptation, however, there are as many sets of probabilities as there are individuals in the population. Obviously the probabilities belonging to the best individuals over
the course of the run are the most directly responsible for GA performance, but we are
also interested in the adaptation occurring elsewhere in the population. For this reason
it was decided to track the best, median and worst individuals’ operator probabilities so
that any variations in individuals’ operator probability adaptation would be apparent.
6.2.1 T–test Details
In order to compare results in a principled manner, t–tests are used in order to determine whether any apparent differences are actually significant.
The confidence level assumed is 95%, such that we will tolerate a 5% chance of making a type 1 error: rejecting the hypothesis under test when it is in fact true. In this case the hypothesis under test is the null hypothesis, which will be rejected in favour of the alternative hypothesis (the principal hypothesis of the project) when the p–value obtained is less than or equal to 0.05.
Each t–test¹ results in a p–value, which translates into the probability of making a type 1 error. In this context the p–value is the probability of obtaining the observed data (here, a sample of 50 fitness values taken at the same point in each run) purely by chance. That is, if we assume that the difference between the systems under test has no effect on the produced results (the null hypothesis), the p–value is the likelihood of obtaining these results anyway. Therefore, if a p–value is sufficiently small (less than 0.05), the observed ‘difference’ is very unlikely to be down to simple chance, and is far more likely to be attributable to the differences which exist between the systems.
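The unequal–variance (Welch) form of the two–sample t statistic used here can be sketched from first principles; this reconstructs the statistic and degrees of freedom that a spreadsheet TTEST() computes internally (an illustrative reconstruction, not the project's actual tooling):

```python
import math

def welch_t(a, b):
    # Welch's two-sample t statistic and its (fractional) degrees of freedom,
    # for samples with possibly unequal variances.
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se2 = var_a / na + var_b / nb  # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, df
```

The two–tailed p–value is then obtained from the t distribution with df degrees of freedom.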
For each t–test performed the Bonferroni corrected p-value is also quoted, for completeness, although the hypothesis will be accepted or otherwise based on the uncorrected p–values in this report.
The Bonferroni correction addresses the ‘diluting’ effect of making a significance
claim based on multiple comparisons of the same system. Basically, each comparison
made between two systems increases the likelihood of making a type 1 error. For example, if two different GAs are compared at two points during a run (say half way through and at the end) and the resultant p-values for each comparison are both 0.05, then simultaneously asserting both comparisons increases the actual likelihood of making an error to 0.1, since either the first or the second comparison may have led to an incorrect rejection of the null hypothesis.
Therefore to derive the corrected p–value, the original p–value is multiplied by the
number of pairwise comparisons made [24]. If the result happens to be more than 1.0,
then the corrected p–value is simply set to 1.0.
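The correction amounts to a one–line transformation; a sketch:

```python
def bonferroni(p_values):
    # Multiply each p-value by the number of comparisons made,
    # capping the corrected value at 1.0.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```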
6.3 Results
In each table detailing the resultant p–values, where any value is found to be significant (less than or equal to 0.05), both the value and the GA type that exhibited the advantage are shown in bold.
¹ T–tests were performed using the OpenOffice TTEST() function, in its two-tailed, unequal-variance form.
The graphs in the following sections detail the average of the best-of-run solution
fitnesses, over the sample of 50 runs. The operator probability adaptation graphs are
based on average P(Cr) and P(Mu) values over the same 50 runs.
6.3.1 Binary F6 Results
Figure 6.1: Comparative performance for Binary f6
Figure 6.2: Operator probability adaptation for Binary f6
There were no significant differences for any of the comparisons made, as shown by
table 6.1. All GA types appear to offer competitive performance with average solution
fitnesses of greater than 0.99.
For population level adaptation (Figure 6.2), the operator probability adaptation
obtained closely matches that reported in [13] for the same parameter settings of depth
= 5, decay = 0.8, qlen = 500.
comparison   evaluation   p-value   p-value (corrected)
norm/pop     5000         0.53      1.0
ind/pop      1750         0.30      0.90
ind/pop      5000         0.57      1.0

Table 6.1: p–values for Binary f6
The static operator probability period prior to adaptation commencing is visible
due to the large queue size. Once adaptation commences, mutation is favoured until
around 1500 evaluations, after which crossover begins to dominate. The final crossover
probability adapted (around 70%) is rather higher than the preferred static value of
50%. This difference is not that surprising though, given the robust nature of binary
F6 to parameter changes, as shown in Figure 5.1.
Individual level adaptation seems to be following the same basic trajectory, though mutation does not become dominant in the early part of the run. Rather, both probabilities stay at approximately 50% until around 1750 evaluations, after which crossover again becomes the dominant operator. The final P(Cr) value for individual level adaptation, around 60%, is lower than that of population level.
The correlation between the worst, median and best probabilities is very apparent.
It appears that there may not be suitable local features in the landscape for individual
adaptation to exploit, or individual adaptation is failing to exploit them if they are
present.
6.3.2 30 city TSP
For the 30 city TSP (TSP–30), again, there are no significant differences in converged
solution quality. There is a significant speed gain for individual level adaptation, which
means that the hypothesis is accepted for this problem and individual level adaptation
is outperforming population level adaptation. However, the advantage is present very
early on in the run (1000 evaluations out of a total of 15000), so in practical terms this
is probably not that useful.
The probabilities adapted at the population level are fairly similar in nature to those
Figure 6.3: Comparative performance for 30 city TSP
Figure 6.4: Operator probability adaptation for 30 city TSP
reported in [13]. Again the adapted P(Cr) is close to the favoured static setting of 65%,
though slightly lower.
Once again we have a strong correlation between all the individual level probabilities, which also seem to match closely with the population level probabilities prior to
6000 evaluations, in any case.
After around 6000 evaluations it appears that adaptation is breaking down somewhat. This coincides with the performance graph for the individual level adaptive GA
at the point at which the optimisation becomes a lot less aggressive. Since by this
point we have obtained a competitive tour length, it becomes harder for the GA to improve upon this. We therefore have a higher proportion of offspring being created, in
comparison with earlier in the run, which are not acceptable. Since the invocation of
the creating operator is still logged, along with zero credit, this has the net result that
the operator probabilities are being derived from decreasing operator credit scores and thus become erratic.

comparison   evaluation   p-value   p-value (corrected)
norm/pop     6000         0.42      1.0
norm/pop     15000        0.32      1.0
ind/pop      1000         0.02      0.08
ind/pop      15000        0.32      1.0

Table 6.2: p–values for 30 city TSP
This idea is supported by the fact that the best individual suffers most, as it has the
highest demand in terms of offspring quality. Also, with a longer operator queue the
effect is markedly decreased, since there is a greater ‘store’ of credit in a larger queue,
which acts to dampen the effect.
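The queue behaviour described above can be illustrated with a minimal sketch (an illustrative reconstruction of the idea, not the project's ADOPP implementation; class and method names are invented):

```python
from collections import deque

class OperatorQueue:
    """Fixed-length log of (operator, credit) records; operator probabilities
    are made proportional to the credit accumulated in the queue."""

    def __init__(self, qlen, operators=("Cr", "Mu")):
        self.q = deque(maxlen=qlen)  # oldest records are evicted automatically
        self.operators = operators

    def log(self, op, credit):
        # Every invocation is logged, even with zero credit, so a run of
        # unsuccessful offspring dilutes that operator's share of the queue.
        self.q.append((op, credit))

    def probabilities(self):
        totals = {op: 0.0 for op in self.operators}
        for op, credit in self.q:
            totals[op] += credit
        grand = sum(totals.values())
        if grand == 0.0:
            # No credit anywhere: fall back to uniform probabilities.
            return {op: 1.0 / len(self.operators) for op in self.operators}
        return {op: totals[op] / grand for op in self.operators}
```

With a longer queue, earlier credit persists longer before being evicted, which is the damping effect observed above.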
6.3.3 100 city TSP
Figure 6.5: Comparative performance for 100 city TSP
With TSP–100, there is an improvement in both speed to solution and final solution
quality with population level adaptation compared with the normal GA. This seems to
be an instance of adaptation working sufficiently well that non-adaptive operation is
bettered. There was no significant difference observed between population and individual level adaptive GA performance.
The population level operator probability adaptation for TSP–100 (Figure 6.6) is
Figure 6.6: Operator probability adaptation for 100 city TSP
comparison   evaluation   p-value   p-value (corrected)
norm/pop     7500         < 0.01    < 0.01
norm/pop     50000        < 0.01    < 0.01
ind/pop      50000        0.93      1.0

Table 6.3: p–values for 100 city TSP
very similar to that obtained for TSP–30, settling on a value of around P(Cr) = 60%.
This does not strongly agree with the favoured static setting which was 40%.
We have similar behaviour for the individual level adaptation, although the effects
are not quite so pronounced. It is suspected that the greatly increased search space of
TSP–100 means that there is more room for improvement within a promising area of
the fitness landscape, resulting in more successful offspring (that represent improvements) and maintaining credit levels. Contrastingly, for TSP–30, the optimum is usually reached by around 8000 evaluations and so after this point the best individual’s
stored credit values, cred_Cr and cred_Mu, can only decrease. This results in very
erratic operator probabilities.
6.3.4 MaxOnes
It should be noted that since the maximum of 100 was always found for MaxOnes,
there was zero variance between the compared end–of–run samples (evaluation = 5000).
As a result the p–values are undefined, however, since the variance between samples is
Figure 6.7: Comparative performance for MaxOnes
Figure 6.8: Operator probability adaptation for MaxOnes
zero, this can in reality be interpreted as a certainty that there is no difference between the systems being compared. For this reason p–values of 1.0 are entered, thereby asserting that the null hypothesis is certainly true, which, for all practical purposes, it is.
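This special case can be guarded explicitly before invoking a t–test; a sketch (the helper and its name are illustrative):

```python
def safe_p_value(a, b, t_test):
    # A t-test is undefined when both samples have zero variance; if the
    # samples are also identical, report p = 1.0 (certainly no difference).
    if len(set(a)) == 1 and len(set(b)) == 1 and a[0] == b[0]:
        return 1.0
    return t_test(a, b)
```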
The MaxOnes problem was optimally solved by all the GA types, which is not
particularly surprising since it consists of a simple, unimodal landscape. There was an
advantage in terms of speed to solution with population level adaptation compared with
the normal GA. It appears that population level adaptation can successfully exploit this
simple landscape.
These results can be contrasted with those in [22], which considered a MaxOnes
problem (also of 100 bits) in a steady–state GA using an adaptive technique known
as COBRA (discussed in [22]). There are two main differences between ADOPP and
COBRA.
Firstly, COBRA updates operator probabilities periodically, after a certain number
comparison   evaluation   p-value   p-value (corrected)
norm/pop     1250         < 0.01    0.02
norm/pop     5000         1.0       1.0
ind/pop      1400         0.33      1.0
ind/pop      5000         1.0       1.0

Table 6.4: p–values for MaxOnes
of evaluations have occurred, as opposed to after the creation of each new individual.
Secondly, COBRA uses a set of fixed operator probabilities which are assigned to the operators at each ‘re-ranking’ interval, such that an operator that has made a very positive contribution to progress may be awarded a higher probability from the set.
It was found that COBRA offered an increase in robustness to parameters over a
normal GA, as we have also found from the tuning experiments. However, no actual
performance gains were seen for COBRA compared with a normal GA. It appears that
the finer grained updating mechanism of ADOPP may be suitably responsive for this
problem instance to provide a gain in speed.
The adapted P(Cr) for MaxOnes under population level adaptation closely matches the tuned value of 55%. The correlation between population and individual level adaptation is particularly evident for this problem.
6.3.5 De Jong f1
Figure 6.9: Comparative performance for De Jong f1
comparison   evaluation   p-value   p-value (corrected)
norm/pop     700          0.16      0.80
norm/pop     2500         0.47      1.0
ind/pop      200          0.09      0.45
ind/pop      700          0.012     0.06
ind/pop      2500         0.21      1.0

Table 6.5: p–values for De Jong f1
Figure 6.10: Operator probability adaptation for De Jong f1
De Jong f1 is another unimodal landscape, which should be straightforward to
optimise. There are no significant differences from the population level adaptive GA
(either for normal or individual level adaptive GA) and given the simple nature of the
fitness landscape this is not surprising. However, population level adaptation did offer
an improvement in speed performance compared with individual level adaptation. It
may be the case that individual level adaptation is an unnecessarily complex mechanism for such a simple function. This agrees with the previous intuition that f1 is too
simple for individual level adaptation to be able to offer any advantages.
There seems to be very little movement in the operator probability values for individual level adaptation when compared with population level adaptation (Figure 6.10).
Figure 6.11: Comparative performance for De Jong f2
Figure 6.12: Operator probability adaptation for De Jong f2
6.3.6 De Jong f2
For this problem, all approaches seemed as good as each other. This is quite surprising,
as it is a fairly complicated surface. It may simply be too complex for any meaningful
adaptation to take place, though it does appear that the presence of adaptation does not
have a detrimental effect either. Another possibility is that it is a ‘ceiling effect’; the
small dimensionality (2 variables) of the problem is proving easy to optimise for all
GA types. Interestingly there were no significant performance differences for binary
f6, which also features a dimensionality of only 2 variables.
6.3.7 De Jong f3
The only result of note here is the speed gain observed for the normal GA over the population level adaptive GA. The presence of so many plateaus in the landscape is likely
comparison   evaluation   p-value   p-value (corrected)
norm/pop     220          0.24      1.0
norm/pop     330          0.27      1.0
norm/pop     2500         0.78      1.0
ind/pop      500          0.15      0.75
ind/pop      2500         0.52      1.0

Table 6.6: p–values for De Jong f2
Figure 6.13: Comparative performance for De Jong f3
to be confusing to the adaptation mechanism, due to the lack of meaningful information when operators do not produce a child located on a different plateau, better or otherwise. Again, there is no advantage to be had in terms of solution quality with any of the GA types.
of the GA types.
The adapted operator probabilities match well with the favoured static rates (P(Cr) = 65%).
6.3.8 De Jong f4
Here, population level adaptation has outperformed the normal GA, but again only in terms of speed. f4 is basically a fairly simple surface, complicated in a stochastic manner by the Gaussian(0,1) noise in the fitness function. However, it would seem that the adaptive mechanism is robust enough that it can still exploit the simple underlying surface, as was the case for MaxOnes.
Figure 6.14: Operator probability adaptation for De Jong f3
comparison   evaluation   p-value   p-value(cor)
norm/pop     750          0.04      0.16
norm/pop     5000         0.16      0.64
ind/pop      2000         0.11      0.44
ind/pop      5000         0.16      0.64

Table 6.7: p–values for De Jong f3
The operator probability trajectories (Figure 6.16) match quite strongly between
population and individual level adaptation.
6.3.9 De Jong f5
As was the case for De Jong f2, f5 shows no significant performance differences between the three GA types under consideration. Again, this seems somewhat surprising considering the fairly complex nature of the landscape. As with f2, though, f5 only has a dimensionality of 2, so this may be another instance of a ceiling effect.
The correlation between individual probabilities is less obvious for f5 (Figure 6.18). The median individual probabilities seem to be dominant, while best and worst remain relatively closer to 50%.
comparison   evaluation   p-value   p-value(cor)
norm/pop     1500         0.02      0.10
norm/pop     7500         0.23      1.0
ind/pop      550          0.41      1.0
ind/pop      1500         0.24      1.0
ind/pop      7500         0.71      1.0

Table 6.8: p–values for De Jong f4
comparison   evaluation   p-value   p-value(cor)
norm/pop     1100         0.38      1.0
norm/pop     7500         0.18      0.9
ind/pop      900          0.51      1.0
ind/pop      1250         0.39      1.0
ind/pop      7500         0.27      1.0

Table 6.9: p–values for De Jong f5
Figure 6.15: Comparative performance for De Jong f4 (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; solution fitness against evaluations)
Figure 6.16: Operator probability adaptation for De Jong f4 (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
6.3.10 Discussion
There is no consistent advantage of any one type of GA over another, though this is not really surprising given the mixture of test problems. De Jong f2 and f5 showed no significant differences between any of the three approaches.
Interestingly, these two problems feature only 2 variables each – the lowest number
of variables in all the De Jong problems. This could therefore be a ‘ceiling effect’,
whereby the dimensionality of the problem is sufficiently small that all the GAs can
comfortably optimise the problem.
There are several instances of significant speed advantages in using one method over another, although again, no single method offers an improvement across all problems.
There is only one instance of an adaptive approach offering an improvement in the
Figure 6.17: Comparative performance for De Jong f5 (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; solution fitness against evaluations)
Figure 6.18: Operator probability adaptation for De Jong f5 (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
actual quality of solution (for TSP–100), and this is achieved by population level adaptation versus the normal GA. Since there were no significant differences between population and individual level adaptation for this problem, we may also implicitly conclude that individual level adaptation has improved upon normal GA performance.
The operator probability adaptation obtained for each problem is quite telling. Perhaps most notable is the high correlation seen between the worst, median and best individuals' P(Cr)/P(Mu) values in individual level adaptation. This suggests that individual adaptation is unable to exploit localised differences in fitness landscapes since, if this were occurring, it seems more likely that these adaptation trajectories would noticeably differ.
Also, for most problems, the operator probability graphs were fairly similar between population and individual level adaptation. Generally the values adapted to were less pronounced in individual level adaptation, but the rough ‘shape’ of adaptation appeared to be preserved.
Overall it would appear that with sufficiently large problems, adaptation can offer a
clear advantage. Also, it would appear that individual level adaptation is not exploiting
localised fitness landscape features.
6.4 Additional Large TSPs
Since there is only one instance of adaptation providing a significant improvement in solution quality, it is worthwhile investigating this further. The advantage was gained for the TSP–100 problem, which suggests that with sufficiently large and/or complex problems adaptation does offer an advantage. As a sufficiently large problem was required before an improvement was witnessed for population level adaptation, it is also possible that with still larger problems an improvement in solution quality may be obtained with individual level adaptation over population level adaptation.
In order to test this idea, two further TSP instances were run, using the optimal parameter settings obtained for the TSP–100, namely P(Cr) = 40% for the normal GA and depth = 5, decay = 0.8 and qlen = 100 for the adaptive GAs. The extended instances feature 150 and 200 cities and can be seen, together with the original 100 city instance, in Figure 6.19 (TSP–100 top right, TSP–150 top left and TSP–200 bottom). It can be seen that the problems all share approximately the same even distribution and uniform structure.
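For reference, tour quality on these instances is simply the length of the closed tour over the city coordinates. A minimal sketch, assuming the TSPLIB EUC_2D convention of nearest-integer Euclidean distances (the function name is illustrative, not from the dissertation's codebase):

```python
import math

def tour_length(tour, coords):
    """Length of a closed tour. coords maps city -> (x, y); each leg uses
    the TSPLIB EUC_2D convention of rounding the Euclidean distance to
    the nearest integer."""
    total = 0
    for a, b in zip(tour, tour[1:] + tour[:1]):  # wrap back to the start
        (x1, y1), (x2, y2) = coords[a], coords[b]
        total += int(round(math.hypot(x2 - x1, y2 - y1)))
    return total
```

For example, a tour around a 10x10 square of four cities has length 40 under this convention.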
Tables 6.10 and 6.11 show the p–values obtained for TSP–150 and TSP–200, respectively.
6.4.1 Discussion
These results strongly support the observation that for sufficiently large problems (TSPs
at least) an adaptive GA will outperform a non-adaptive GA. We have significant improvements for this comparison and for both problems in favour of population level
2 Data for the 200 city TSP was obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA200.tsp.gz. Data for the 150 city TSP was obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA150.tsp.gz
Figure 6.19: Maps of 100, 150 and 200 city TSPs
Figure 6.20: Comparative performance for 150 city TSP (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; tour length against evaluations)
adaptation. One significant gain in speed was obtained, but this favoured population
level adaptation, as opposed to individual.
We have very similar probability adaptation results to those seen previously, with considerable similarity between population and individual adaptation, and the familiar correlation between the worst, median and best individuals' operator probabilities.
We have shown with some confidence the success of adaptation for large TSPs. However, individual level adaptation is still equalled or outperformed by population level adaptation. This is not so surprising, since the problem size was the only aspect scaled up; the nature of the problem structure itself did not change, i.e. there is a clear lack of any form of localised feature which might provide leverage for individual adaptation.
Figure 6.21: Operator probability adaptation for 150 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
comparison   evaluation   p-value   p-value(cor)
norm/pop     2550         < 0.01    < 0.01
norm/pop     75000        < 0.01    0.025
ind/pop      12450        0.045     0.18
ind/pop      75000        0.37      1.0

Table 6.10: p–values for 150 city TSP
6.5 Additional TSPs with Varying Structure
Since scaling up the problem size of the TSPs shows no differences between the adaptation levels, it is possible that the actual problem structure may play a more influential role in the applicability of individual level adaptation. The following figures show the layouts of some additional TSPs on which the population and individual level adaptive GAs were run. The problems feature a diverse range of city layouts.
comparison   evaluation   p-value   p-value(cor)
norm/pop     5000         < 0.01    < 0.01
norm/pop     100000       < 0.01    < 0.01
ind/pop      100000       0.63      1.0

Table 6.11: p–values for 200 city TSP
Figure 6.22: Comparative performance for 200 city TSP (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.23: Operator probability adaptation for 200 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Each of the following TSP instances was run using the population level and individual level adaptive GAs, with the ADOPP parameters previously stated, namely depth = 5, decay = 0.8 and qlen = 100. The normal GA was not run on these problems, as it has clearly been shown that the adaptive GAs outperform the normal GA for large TSPs.
For each TSP, a map illustrating the layout of the cities is given, along with the
experimental results.
6.5.1 105 City TSP
Figure 6.24 shows the layout for a 105 city TSP instance. This problem features a more structured layout than previous examples, with several highly linear clusters.
3 Data for the 105 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/lin105.tsp.gz
Figure 6.24: Map of 105 city TSP
Figure 6.25: Comparative performance for 105 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
The overall layout is approximately symmetric.
comparison   evaluation   p-value   p-value(cor)
ind/pop      7980         0.49      0.98
ind/pop      52500        0.85      1.0

Table 6.12: p–values for 105 city TSP
6.5.2 127 City TSP
Figure 6.27 shows an instance of a 127 city TSP. This particular problem features a highly concentrated central cluster, with a few sparse outliers.
4 Data for the 127 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/bier127.tsp.gz
Figure 6.26: Operator probability adaptation for 105 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Figure 6.27: Map of 127 city TSP
6.5.3 225 City TSP
Figure 6.30 illustrates an instance of a 225 city TSP. This is the most pathological example considered, featuring a perfect grid of cities. The problem is in a sense globally quite uniform, but also features concentrated linear sub–components.
6.5.4 120 City TSP
Figure 6.33 is derived from the original 30 city TSP: that instance is repeated four times in order to create a TSP featuring multiple, isolated clusters.
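One plausible way to construct such a tiled instance is to replicate the base coordinates on a grid, offset far enough apart that the copies remain isolated. A sketch under that assumption (the function name and the `gap` parameter are illustrative, not the construction actually used):

```python
def tile_cities(cities, copies=(2, 2), gap=1.5):
    """Replicate a list of (x, y) cities on a copies[0] x copies[1] grid.
    Each copy is offset by gap times the instance's width/height; a gap
    greater than 1 keeps the resulting clusters from touching."""
    xs, ys = zip(*cities)
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    tiled = []
    for i in range(copies[0]):
        for j in range(copies[1]):
            tiled += [(x + i * gap * width, y + j * gap * height)
                      for x, y in cities]
    return tiled
```

Applied to the 30 city instance with a 2x2 grid, this yields a 120 city problem of four well-separated copies of the original cluster.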
5 Data for the 225 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/ts225.tsp.gz
Figure 6.28: Comparative performance for 127 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.29: Operator probability adaptation for 127 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
6.5.5 Discussion
Although these additional TSPs provide considerable variety in terms of problem structure, individual level adaptation has still failed to outperform population level adaptation.
No significant differences were observed for either adaptive technique.
The operator probability adaptation graphs obtained follow the same trends as witnessed for previous problems. There is a reasonable degree of similarity in the trajectories followed for population and individual level adaptation and, again, a high correlation between best, median and worst individuals' P(Cr) and P(Mu) values, at least initially, while adaptation is stable.
Although the city layouts vary considerably, it is of course possible that the resulting fitness landscapes are not all that different, but this seems unlikely.
comparison   evaluation   p-value   p-value(cor)
ind/pop      14986        0.53      1.0
ind/pop      63500        0.81      1.0

Table 6.13: p–values for 127 city TSP
Figure 6.30: Map of 225 city TSP
Figure 6.31: Comparative performance for 225 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
comparison   evaluation   p-value   p-value(cor)
ind/pop      112500       0.15      0.15

Table 6.14: p–values for 225 city TSP
Figure 6.32: Operator probability adaptation for 225 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Figure 6.33: Map of 120 city TSP
Figure 6.34: Comparative performance for 120 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.35: Operator probability adaptation for 120 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
comparison   evaluation   p-value   p-value(cor)
ind/pop      15000        0.71      1.0
ind/pop      60000        0.23      0.46

Table 6.15: p–values for 120 city TSP
Chapter 7
Conclusion
7.1 Project Summary
This report has detailed an investigation into whether optimisation performance in adaptive GAs can be improved by using finer grained adaptation of operator probabilities at the individual, as opposed to the population, level. The primary aim of the project was to test the hypothesis:
An adaptive mechanism operating at the individual level will perform as well as or better than the same mechanism operating at the population level.
A secondary aim was to compare normal GA performance with adaptive performance. In this case, the population level adaptive GA was compared with the normal
GA. The motivations for pursuing self–adaptation in general come from two main
points of view:
1. Hand tuning of GAs is time consuming and difficult; reliable self–adaptation of parameters can therefore improve on this situation
2. It is likely that for many problems the optimal parameter values are not fixed entities, but rather should vary as the GA progresses
The assumption that individual level adaptation will provide better performance than population level adaptation is based on the rationale that, faced with an appropriate (highly non–uniform) landscape, the finer grained capabilities of individual level adaptation will yield improved results, as it is able to respond in a more focused manner to the nuances of the landscape. Feedback is not automatically averaged together, possibly obscuring information in the process, as is the case for population level adaptation.
In order to conduct the investigation, a system was constructed based on [13], which adapted operator probabilities at the population level. This system was modified to realise both a non–adaptive GA and a finer grained (individual level) adaptive GA.
Furthermore, a diverse set of test problems was selected on the basis of stressing each GA type with a large variety of fitness landscapes, with the aim that those landscapes featuring appropriately localised sub–features would be successfully exploited by individual level adaptation.
Formative experimentation was then carried out in order to characterise each GA type's response to parameter variation and to informally assess each GA's robustness to it. From this, parameter settings were selected with which to conduct more in–depth experimentation.
Using the tuned parameter values from initial experimentation, larger sample sizes
of 50 runs were taken and the results compared using t–tests as appropriate. The t–test
results enabled the acceptance or rejection of the stated hypothesis.
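The corrected p–values reported throughout Chapter 6 are consistent with a simple Bonferroni correction, in which each raw p–value is multiplied by the number of comparisons made for that problem and capped at 1.0. A sketch, assuming that convention (the function name is illustrative):

```python
def bonferroni(p_values):
    """Bonferroni correction as apparently used in the result tables:
    multiply each raw p-value by the number of comparisons, capped at 1.0."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]
```

For example, applying this to the four raw p–values of Table 6.7 (0.04, 0.16, 0.11, 0.16) reproduces the corrected column (0.16, 0.64, 0.44, 0.64).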
7.2 Conclusions
7.2.1 Test of Hypothesis
A thorough test of the hypothesis has been successfully conducted. The results comparing population and individual adaptive levels, whilst perhaps not what was hoped for given the assumptions behind the hypothesis, are still informative. Several clear trends were identified in the results.
The hypothesis was accepted for the 30 city TSP, on the grounds of a significant advantage in speed. Although the technical requirements for accepting the hypothesis were met (population level adaptation did not go on to perform better on final solution quality, for example), the result is tentative on two counts:
1. The advantage was observed in the very early stages of the run; therefore, except in the context of a very time–limited situation, it is unlikely to be of real practical benefit.
2. The corrected p–value was not significant for this result, which does not lend confidence to it.
Also, the absence of any convincing trend supporting the result does not lend confidence as to its general significance. In light of the number of TSPs addressed in the study, and the fact that no such results were obtained for any other TSP instances, this result is most likely a type 1 error, or a peculiarity of that particular TSP.
For two test problems, De Jong f1 and TSP–150, the hypothesis was rejected, as population level adaptation exhibited a speed advantage over individual level adaptation in both cases. There are no obvious commonalities between these problems that might provide some rationale for the result: f1 is a low dimensional and unimodal problem, while TSP–150 is comparatively high dimensional and almost certainly multimodal. These results are not thought to be indicative of a specific trend or pattern, especially in light of the fact that TSP–150 was the only TSP to obtain such a result. Further experimentation is required to investigate this.
For all other problems the hypothesis was accepted, but only on the grounds of individual level adaptation performing equally as well as population level adaptation. As mentioned earlier, showing an equivalence in performance between the techniques is not indicative of any real practical benefit; if anything, the opposite is true. Since the overheads, particularly as regards additional book–keeping, are considerably higher for individual level adaptation, we really require some performance gain before the use of the method can be justified.
7.2.1.1 Why Was There No Advantage?
The main conclusion to be drawn is that the adaptive mechanism implemented does not
offer an advantage over population level adaptation by operating at the finer grained
individual level. There are two possible reasons why this is the case:
1. The ceiling effect was encountered for many problems, making adaptation somewhat redundant
2. The mechanism and GA implementation fail to capitalise on local sub–features of appropriate landscapes
The first reason is probable, since for the three smallest numerical optimisation problems (in terms of dimensionality) there were no significant differences observed in performance. This was also found to be the case for the smallest TSP instance, of 30 cities.
Given the lack of significant differences observed for the varied structures of the TSP problems attempted, it seems reasonable to assert that the mechanism itself is failing to respond effectively to the subtleties of the landscapes being searched. This may be because operator probability is an inappropriate subject for adaptation. Alternatively, it is possible that the operator productivity metric in use is an insufficient mechanism with which to extract the relevant information from the progress of search (such that adaptation may occur and provide benefit).
Both these suggestions seem feasible. Crossover can in some sense be regarded
as a ‘global’ operator – it may combine solutions existing in far apart regions of the
search space and result in considerable disruption. Therefore, allowing the probability
of crossover to adapt, even at the individual level, does not significantly alter the net
result of its application.
The operator productivity measure used in ADOPP is relatively coarse compared with the approaches of some other works, such as [19] and [20]. Both utilise fitness values directly in the adaptive mechanism: [19] uses the difference between parent and child in order to reward children that are good and better than their parents; [20] uses the maximum and average fitnesses, along with the fitness of the individual of interest, to control the amount of disruption experienced by that individual (essentially, high disruption for lower fitnesses and low disruption for higher fitnesses). Both works report positive results.
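For reference, the disruption control of [20] takes approximately the following form, where fmax and f̄ are the maximum and mean population fitnesses, f′ is the larger of the two parents' fitnesses, f is the fitness of the individual to be mutated, and k1–k4 are constants:

```latex
p_c =
\begin{cases}
  k_1 \, \dfrac{f_{\max} - f'}{f_{\max} - \bar{f}}, & f' \ge \bar{f},\\[6pt]
  k_3, & f' < \bar{f},
\end{cases}
\qquad
p_m =
\begin{cases}
  k_2 \, \dfrac{f_{\max} - f}{f_{\max} - \bar{f}}, & f \ge \bar{f},\\[6pt]
  k_4, & f < \bar{f}.
\end{cases}
```

Fitter-than-average individuals thus receive lower crossover and mutation probabilities, while below-average individuals are disrupted at the fixed, higher rates.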
Although ADOPP does utilise fitness as a means of determining improvement, the actual magnitudes of the improvements do not feature in the adaptive mechanism in any way. Perhaps such a modification would prove advantageous.
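A sketch of how such a magnitude-weighted credit scheme might look. This is an illustrative variant, not Julstrom's actual ADOPP mechanism: it credits only the operator that produced each child (omitting ADOPP's ancestry depth), and all names other than the decay and qlen parameters are assumptions:

```python
from collections import deque

def operator_probs(history, decay=0.8, qlen=100, floor=0.05):
    """Illustrative variant of a decayed-credit scheme: each history entry
    records (operator, fitness improvement of child over parent). Credit is
    the improvement magnitude decayed by the entry's age, rather than a
    simple count of improvements; newest entries come last in history."""
    recent = deque(history, maxlen=qlen)
    if not recent:
        return {}
    credit = {}
    for age, (op, delta) in enumerate(reversed(list(recent))):
        credit[op] = credit.get(op, 0.0) + max(delta, 0.0) * decay ** age
    ops = list(credit)
    total = sum(credit.values())
    if total == 0:  # no improvements seen yet: fall back to uniform rates
        return {op: 1.0 / len(ops) for op in ops}
    # normalise credits, then impose a floor so no operator is starved
    return {op: floor + (1 - floor * len(ops)) * (c / total)
            for op, c in credit.items()}
```

Because the raw improvement magnitudes enter the credit sums, an operator that produces a few large gains would be favoured over one producing many marginal ones, which is precisely the information the count-based metric discards.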
7.2.2 Normal Versus Adaptive Performance
A thorough comparison of the original population level adaptive GA with a non–adaptive counterpart was successfully realised, addressing the lack of a rigorous comparison in the original work. There were no advantages for population level adaptation compared with a normal GA on the original two problems.
There is only one instance of a significant result in favour of the normal GA over population level adaptation. This was obtained for De Jong f3 (the step function) and was in terms of speed only. It was most likely due to the misleading nature of the landscape, which features several regions in which no useful information can be derived to enable adaptation. This may be indicative of situations in which adaptation is detrimental to performance; more work is required to investigate this possibility.
Overwhelmingly, where differences were significant, population level adaptation outperformed the normal GA. For the majority of these cases there were no differences between population and individual level adaptation, hence we may infer that individual level adaptation is also typically superior to a normal GA.
Gains in speed only for population level adaptation were observed for MaxOnes and De Jong f4, both of which are straightforward functions. Admittedly, f4 is complicated somewhat by the presence of noise, though the underlying function is simple enough. The gain for MaxOnes suggests that simple, unimodal problems can be successfully exploited, though no such benefit was seen for f1 (also straightforward and unimodal), so more experimentation is required to investigate this. The success on f4 also suggests that the mechanism is robust in the presence of noise, though there is not enough evidence to draw firm conclusions; further noisy examples, based perhaps on more complicated functions, would be more illuminating.
A very convincing advantage for population level adaptation was evident for the large TSPs attempted (of 100, 150 and 200 cities). Adaptation brought both speed and quality improvements for each TSP, and the corrected p–values also showed significance in all cases. It can therefore be concluded that for sufficiently large TSPs, adaptation definitely does deliver an advantage.
7.3 Further Work
The work could be continued and extended in many ways. Some avenues that are felt
worth pursuing are as follows:
1. It may prove more beneficial to adapt operator parameters, as opposed to operator probabilities. This would be particularly relevant in the case of mutation rate, as this provides direct control of the disruptive force resulting from mutation. Due to the local scope of mutation – it typically makes small incremental changes to a solution – this seems a more likely candidate for the successful exploitation of localised landscape features.
2. Increase the dimensionality of the numerical optimisation problems. This would answer the question of whether the typical lack of difference on these problems is actually due to some inherent quality of the fitness landscapes themselves, or whether it is due to a ceiling effect. If a ceiling effect was primarily responsible, then higher dimensional problems should begin to yield differences in results, which would prove more instructive as to whether adaptation is suited to the problems or not.
3. Attempt to make the ‘improvement’ metric more explicit, by including the resulting differences in fitness of new offspring relative to their parents since, as discussed, this appears to aid performance. Similarly, some explicit metric of population diversity may prove useful in enhancing the adaptation, as this can produce a much more refined balance of exploration versus exploitation [20], where individuals of poor fitness can be subjected to much more disruptive forces, while those of higher fitness are preserved.
Bibliography
[1] P. J. Angeline. Adaptive and self–adaptive evolutionary computations. In
M. Palaniswami and Y. Attikiouzel, editors, Computational Intelligence: A Dynamic Systems Perspective, pages 152–163. IEEE Press, 1995.
[2] T. Bäck. Self–adaptation in genetic algorithms. In Varela and Bourgine, editors,
Towards a Practice of Autonomous Systems: Proceedings of the First European
Conference on Artificial Life. MIT Press, 1992.
[3] L. Davis. Adapting operator probabilities in genetic algorithms. In J. D. Schaffer,
editor, Proceedings of the Third International Conference on Genetic Algorithms,
pages 61–69, San Mateo, CA, 1989. Morgan Kaufmann.
[4] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York,
NY, 1991.
[5] Á. E. Eiben, R. Hinterding, and Z. Michalewicz. Parameter control in evolutionary algorithms. IEEE Trans. on Evolutionary Computation, 3(2):124–141,
1999.
[6] D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Trans.
on Neural Networks: Special Issue on Evolutionary Computation, 5(1):3–14,
1994.
[7] J. J. Grefenstette. Optimization of control parameters for genetic algorithms.
IEEE Transactions on Systems, Man and Cybernetics (SMC), 16(1):122–128,
1986.
[8] R. Hinterding, Z. Michalewicz, and Á. E. Eiben. Adaptation in evolutionary
computation: A survey. In Proceedings of The IEEE Conference on Evolutionary
Computation, IEEE World Congress on Computational Intelligence, 1997.
[9] J. H. Holland. Adaptation in Natural and Artificial Systems. The University of
Michigan Press, Ann Arbor, 1975.
[10] I. Oliver, D. Smith, and J. R. Holland. A study of permutation crossover operators on the traveling salesman problem. In J. Grefenstette, editor, Genetic Algorithms and their Applications: Proceedings of the Second International Conference, Hillsdale, New Jersey, 1987. Lawrence Erlbaum.
[11] K. A. De Jong. An Analysis of the Behaviour of a Class of Genetic Adaptive
Systems. PhD thesis, University of Michigan, Ann Arbor, 1975.
[12] B. A. Julstrom. Very greedy crossover in a genetic algorithm for the traveling
salesman problem. In Applied Computing 1995: Proceedings of the 1995 ACM
Symposium on Applied Computing, New York, 1995. ACM Press.
[13] B. A. Julstrom. What have you done for me lately? Adapting operator probabilities in a steady–state genetic algorithm. In L. J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 81–87, San Mateo, CA, 1995. Morgan Kaufmann.
[14] M. Mitchell. An Introduction to Genetic Algorithms. MIT Press, 1992.
[15] J. E. Pettinger and R. M. Everson. Controlling genetic algorithms with reinforcement learning. Engineering Optimisation (Submitted), 2003.
[16] P.M. Ross and D. Corne. Applications of genetic algorithms. In AISB Quarterly,
volume 89, pages 32–30, 1995.
[17] J. Smith and T. C. Fogarty. Self adaptation of mutation rates in a steady state
genetic algorithm. In International Conference on Evolutionary Computation,
pages 318–323, 1996.
[18] J.E. Smith and T. C. Fogarty. Operator and parameter adaptation in genetic algorithms. Soft Computing – A Fusion of Foundations, Methodologies and Applications, 1(2):81–87, 1997.
[19] W. M. Spears. Adapting crossover in evolutionary algorithms. In R. G. Reynolds
J. R. McDonnell and D. B. Fogel, editors, Proceedings of the Fourth Annual Conference on Evolutionary Programming, pages 367–384, Cambridge, MA, 1995.
MIT Press.
[20] M. Srinivas and L.M. Patnaik. Adaptive probabilities of crossover and mutation
in genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics,
24(4):656–667, 1994.
[21] G. Syswerda. Uniform crossover in genetic algorithms. In Proceedings of the
Third International Conference on Genetic Algorithms, San Mateo, California,
1989. Morgan Kaufmann.
[22] A. Tuson and P. Ross. Adapting operator settings in genetic algorithms. Evolutionary Computation, 6(2):161–184, 1998.
[23] D. Whitley. The GENITOR algorithm and selection pressure: Why rank–based allocation of reproductive trials is best. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, San Mateo, CA, 1989. Morgan Kaufmann.
[24] M. Wineberg and S. Christensen. Using appropriate statistics. In Tutorials of
GECCO 2003: Genetic and Evolutionary Computation Conference, pages 339–
358, Chicago, IL, USA, 2003.
Appendix A
Test Problem Settings
This appendix details the fixed parameter settings for all the test problems featured in
the report. The ‘Uniform Xover’ entry refers to the probability that a gene is selected
from the second parent, in uniform crossover. The ‘Report Gap’ entry refers to the
frequency with which the GA was sampled during a run. For example, a report gap of
25 means metrics were taken after every 25 evaluations.
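To make the 'Uniform Xover' setting concrete, the sketch below shows uniform crossover with a parameterised gene-selection probability. This is a minimal illustrative example, not code from the project itself; the class and method names are my own.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of uniform crossover: each gene of the child is taken
// from the second parent with probability p (the 'Uniform Xover' setting,
// 0.5 for every problem in Table A.1), and from the first parent otherwise.
public class UniformCrossover {

    static int[] crossover(int[] parent1, int[] parent2, double p, Random rng) {
        int[] child = new int[parent1.length];
        for (int i = 0; i < parent1.length; i++) {
            // Draw a uniform random number per gene and compare against p.
            child[i] = (rng.nextDouble() < p) ? parent2[i] : parent1[i];
        }
        return child;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 0, 0, 0, 0, 0};
        int[] b = {1, 1, 1, 1, 1, 1, 1, 1};
        // With p = 0.5, roughly half the genes come from each parent.
        System.out.println(Arrays.toString(crossover(a, b, 0.5, new Random(1))));
    }
}
```

Note that p = 0 reproduces the first parent exactly and p = 1 the second, so 0.5 gives the maximally disruptive (unbiased) mixing used throughout the experiments.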
A.1 Binary Encoded Problems
Table A.1 details the settings for all binary encoded problems.
A.2 Traveling Salesman Problems
Table A.2 details the settings for all TSPs. Note that ‘Size’ and ‘String Length’ are
not included in this table since they are both equivalent to the number of cities in the
problem. Other parameters are not included since these are not defined for TSPs, e.g.
mutation rate.
Problem     Size   String Length   Mutation Rate   Uniform Xover   Population Size   Selection Pressure   Evaluations   Report Gap
Binary f6      2        44             0.05             0.5              100               1.5                5000           25
MaxOnes      100       100             0.01             0.5              100               2                  5000           25
f1             3        30             0.033            0.5              100               1.5                2500           10
f2             2        24             0.042            0.5              100               1.5                2500           10
f3             5        50             0.02             0.5              100               1.5                5000           25
f4            30       240             0.0042           0.5              100               1.5                7500           25
f5             2        34             0.029            0.5              100               1.5                7500           25

Table A.1: Settings for all binary encoded problems
# Cities   Population Size   Selection Pressure   Evaluations   Report Gap
30         150               2                     15000           50
100        500               2                     50000          125
150        750               2                     75000          150
200        1000              2                    100000          200
105        525               2                     52500          105
127        635               2                     63500          127
225        1000              2                    112500          225
120        600               2                     60000          120

Table A.2: Settings for all TSPs
Appendix B
Location of Data Files and
Source Code
B.1 Source Code
B.1.1 Development and Execution Environment
All work was implemented in Java (version 1.4.2_03), providing portability and relatively quick development time. The system was developed and run under Red Hat 9 Linux on Pentium 4 2 GHz machines, each with 500 MB of RAM.
B.1.2 Location
All Java source and class files are located in:
/home/s0340561/PROJECT/code/
B.2 Data Location
The main directory in which all experimental data is stored is:
/home/s0340561/PROJECT/data/
There are several sub–directories within this directory which contain data specific to particular test problem/GA type configurations.
B.2.1 Formative Experiment Results
Each problem type was optimised with the following four GA types: normal, population level adaptive, individual level adaptive (with median improvement) and individual level adaptive (with parent improvement). There are therefore four sub–directories for each problem, of the form:

<prob type>Norm – results for the normal GA
<prob type>PopAd – results for the population level adaptive GA
<prob type>IndAd_median – results for the individual level adaptive GA (median improvement)
<prob type>IndAd_parent – results for the individual level adaptive GA (parent improvement)

where <prob type> is one of {f6, tsp30, tsp100, max1, djF1, djF2, djF3, djF4, djF5}.
The order given here matches the order of the problem types as reported in chapter 5.
B.2.2 Summative Experiment Results
Summative experiment results are located in sub–directories of the form:
<prob type>Extended

<prob type> is defined as before. In addition, the following sub–directory contains
data for all the additional TSP problems which featured in sections 6.4 and 6.5:
tspOtherExtended