Self–Adaptation in
Evolutionary Algorithms
Revisited
James McKnight Stoddart McKenzie
Master of Science
School of Informatics
University of Edinburgh
2004
Abstract
The concept of dynamic adaptation of operator probabilities in a genetic algorithm
(GA) has been well studied in the past. The rationale behind this idea is that by modifying operator probabilities as the genetic algorithm runs, we can usefully bias the
probability of selecting a given operator relative to its recent success in producing
fitter children.
This dissertation reports an empirical investigation into the performance of the
same adaptive mechanism applied to population level (global) operator probabilities
and to individual level (local) operator probabilities. In addition, a non–adaptive
genetic algorithm is used for comparative purposes. Several test problems form the
basis of the experiments, including numerical optimisation problems (most of which are
defined by the De Jong test suite), the MaxOnes problem and a number of travelling
salesman problems.
On average, individual level adaptation performed only as well as population level
adaptation. For suitably large problems, both population and individual level adaptation
were found to provide performance improvements over a non–adaptive GA. In addition,
for many problem instances, utility was found in the increased robustness to parameter
settings offered by the adaptive GAs.
Acknowledgements
I would firstly like to thank my supervisor, Dr. Andrew Tuson, for his continued guidance and advice throughout the project.
I would also like to thank Bryant Julstrom, of St. Cloud State University, Minnesota, for the provision of papers and his generally helpful attitude.
Thanks also to my family and friends for providing support and encouragement
throughout the entire MSc.
Declaration
I declare that this thesis was composed by myself, that the work contained herein is
my own except where explicitly stated otherwise in the text, and that this work has not
been submitted for any other degree or professional qualification except as specified.
(James McKnight Stoddart McKenzie)
Table of Contents

1 Introduction
   1.1 Project Overview
      1.1.1 Aims and Approach
   1.2 Genetic Algorithm Overview
      1.2.1 Components of a Genetic Algorithm
      1.2.2 Parameters of a Genetic Algorithm
   1.3 Dissertation Outline

2 Previous Research in Self–Adaptation
   2.1 Overview
   2.2 Framework
      2.2.1 Methods of Self–Adaptation
      2.2.2 Levels of Self–Adaptation
      2.2.3 Adaptation Evidence
      2.2.4 Subject of Adaptation
      2.2.5 Summary
   2.3 Co–evolutionary Examples
      2.3.1 Adaptation of Crossover Type
      2.3.2 Co–evolution at Different Levels
   2.4 Learning Rule Examples
      2.4.1 Population Level Adaptation
      2.4.2 Reinforcement Learning Approach
      2.4.3 Individual Level Adaptation
   2.5 General Points
      2.5.1 Parameter Migration
      2.5.2 Summary

3 System Implementation
   3.1 Overview
   3.2 The Basic Genetic Algorithm
   3.3 The ADOPP System
   3.4 ADOPP Modification for Individual Adaptation
   3.5 Genetic Operators
      3.5.1 Binary Encoding Operators
      3.5.2 Permutation Encoding Operators
   3.6 Summary

4 The Test Problems
   4.1 Overview
   4.2 Binary f6
   4.3 30 City TSP
   4.4 100 City TSP
   4.5 MaxOnes
   4.6 De Jong Functions
   4.7 Summary

5 Formative Experiments
   5.1 Aims
   5.2 Methodology
      5.2.1 Normal GA Parameter Treatments
      5.2.2 Adaptive Parameter Treatments
   5.3 Results
      5.3.1 Binary F6
      5.3.2 30 City TSP
      5.3.3 100 City TSP
      5.3.4 MaxOnes
      5.3.5 De Jong f1
      5.3.6 De Jong f2
      5.3.7 De Jong f3
      5.3.8 De Jong f4
      5.3.9 De Jong f5
   5.4 Revisiting Median/Parent Improvement
   5.5 Tuned Parameter Values
   5.6 Summary

6 Summative Experiments
   6.1 Aims
   6.2 Methodology
      6.2.1 T–test Details
   6.3 Results
      6.3.1 Binary F6 Results
      6.3.2 30 city TSP
      6.3.3 100 city TSP
      6.3.4 MaxOnes
      6.3.5 De Jong f1
      6.3.6 De Jong f2
      6.3.7 De Jong f3
      6.3.8 De Jong f4
      6.3.9 De Jong f5
      6.3.10 Discussion
   6.4 Additional Large TSPs
      6.4.1 Discussion
   6.5 Additional TSPs with Varying Structure
      6.5.1 105 City TSP
      6.5.2 127 City TSP
      6.5.3 225 City TSP
      6.5.4 120 City TSP
      6.5.5 Discussion

7 Conclusion
   7.1 Project Summary
   7.2 Conclusions
      7.2.1 Hypothesis Test
      7.2.2 Normal Versus Adaptive Performance
   7.3 Further Work

Bibliography

A Test Problem Settings
   A.1 Binary Encoded Problems
   A.2 Traveling Salesman Problems

B Location of Data Files and Source Code
   B.1 Source Code
      B.1.1 Development and Execution Environment
      B.1.2 Location
   B.2 Data Location
      B.2.1 Formative Experiment Results
      B.2.2 Summative Experiment Results
List of Figures

3.1 An Operator History
3.2 Edge-swap Mutation
4.1 Map of 30 city TSP
4.2 Map of 100 city TSP
5.1 Performance for Binary F6 - All GA Types
5.2 Performance for 30 city TSP - All GA Types
5.3 Performance for 100 city TSP - All GA Types
5.4 Performance for MaxOnes - All GA Types
5.5 Performance for De Jong f1 - All GA Types
5.6 Performance for De Jong f2 - All GA Types
5.7 Performance for De Jong f3 - All GA Types
5.8 Performance for De Jong f4 - All GA Types
5.9 Performance for De Jong f5 - All GA Types
6.1 Comparative performance for Binary f6
6.2 Operator probability adaptation for Binary f6
6.3 Comparative performance for 30 city TSP
6.4 Operator probability adaptation for 30 city TSP
6.5 Comparative performance for 100 city TSP
6.6 Operator probability adaptation for 100 city TSP
6.7 Comparative performance for MaxOnes
6.8 Operator probability adaptation for MaxOnes
6.9 Comparative performance for De Jong f1
6.10 Operator probability adaptation for De Jong f1
6.11 Comparative performance for De Jong f2
6.12 Operator probability adaptation for De Jong f2
6.13 Comparative performance for De Jong f3
6.14 Operator probability adaptation for De Jong f3
6.15 Comparative performance for De Jong f4
6.16 Operator probability adaptation for De Jong f4
6.17 Comparative performance for De Jong f5
6.18 Operator probability adaptation for De Jong f5
6.19 Maps of 100, 150 and 200 city TSPs
6.20 Comparative performance for 150 city TSP
6.21 Operator probability adaptation for 150 city TSP
6.22 Comparative performance for 200 city TSP
6.23 Operator probability adaptation for 200 city TSP
6.24 Map of 105 city TSP
6.25 Comparative performance for 105 city TSP
6.26 Operator probability adaptation for 105 city TSP
6.27 Map of 127 city TSP
6.28 Comparative performance for 127 city TSP
6.29 Operator probability adaptation for 127 city TSP
6.30 Map of 225 city TSP
6.31 Comparative performance for 225 city TSP
6.32 Operator probability adaptation for 225 city TSP
6.33 Map of 120 city TSP
6.34 Comparative performance for 120 city TSP
6.35 Operator probability adaptation for 120 city TSP
List of Tables

5.1 Tuned Parameter Settings
6.1 p–values for Binary f6
6.2 p–values for 30 city TSP
6.3 p–values for 100 city TSP
6.4 p–values for MaxOnes
6.5 p–values for De Jong f1
6.6 p–values for De Jong f2
6.7 p–values for De Jong f3
6.8 p–values for De Jong f4
6.9 p–values for De Jong f5
6.10 p–values for 150 city TSP
6.11 p–values for 200 city TSP
6.12 p–values for 105 city TSP
6.13 p–values for 127 city TSP
6.14 p–values for 225 city TSP
6.15 p–values for 120 city TSP
A.1 Settings for all binary encoded problems
A.2 Settings for all TSPs
Chapter 1
Introduction
1.1 Project Overview
Evolutionary Algorithms encompass the three main approaches of evolutionary computing: Genetic Algorithms, Evolutionary Strategies and Evolutionary Programming.
Although the specific details of each approach vary, they all draw inspiration from the
principles of natural evolution. Fogel [6] provides an introductory overview of the
Evolutionary Algorithms field.
This project focuses on Genetic Algorithms (GAs), a powerful and general purpose
technique for function optimisation. The GA designer faces a great many decisions,
such as the population model, the selection method, the genetic operators to employ
and the probability of applying each operator. Due to the large number of design
choices and parameter settings involved in implementing and running GAs, they are
very hard to ‘tune’, i.e. to configure in an optimal fashion for the problem at hand.
Even if a reasonably extensive manual exploration of the parameter space is made,
and the best parameters are selected from the findings of this search, there is still no
guarantee that the settings are optimal. It is very likely that the ‘ideal’ parameter settings
are actually dynamic schedules which vary as the GA progresses.
These two factors motivate having parameters adapt as the GA runs, in the hope of
reducing parameter tuning requirements and/or increasing performance: the intuition is
that if the parameters adapt in the desired way, this will yield better optimisation
capabilities.
This project considers an adaptive GA system which operates at the population and
individual levels (separately), in order to ascertain whether there is an advantage to be
had from the finer grained approach of individual level adaptation. In addition, a
non–adaptive GA counterpart will be compared with the adaptive implementations, in
order to ascertain whether performance gains are feasible over the ‘normal’ approach
of having fixed parameters.
The comparison with the normal (non–adaptive) GA will place the work in a wider,
more general context, though the primary focus is the difference, if any, between
population and individual level adaptation.
1.1.1 Aims and Approach
The primary aim of the project is to test the hypothesis:
An adaptive mechanism operating at the individual level will perform as well as, or
better than, the same mechanism operating at the population level.
A secondary aim of the project is to investigate the nature of operator probability
adaptation for different problems, i.e. to track the probabilities as the GA progresses.
Furthermore, rigorous statistical comparisons between the adaptive and non–adaptive
GAs will be made.
In order to realise the proposed aims there are three major components required:
1. The GA system, including normal and adaptive functionality (discussed in Chapter 3)
2. Test problems which the system will optimise (discussed in Chapter 4)
3. A principled means of comparing results of the optimisations (discussed in Chapter 6)
The following section provides a brief introduction to Genetic Algorithm concepts
and terminology.
1.2 Genetic Algorithm Overview
Genetic Algorithms (GAs) were originally proposed by John Holland in [9] and are
based on an abstraction of the principles of natural evolution. Holland’s original intent
was to develop a formal framework for the study of natural evolution, and this
remains an area of active research today. However, GAs also gained popularity as a
method of function optimisation and have been successfully applied to many industrial
applications, including timetabling, job scheduling in multi–processor systems, circuit
layout and aerofoil design. Ross and Corne [16] provide an overview of GA
applications. The work in this project focuses on the use of GAs as an optimisation
technique.
1.2.1 Components of a Genetic Algorithm
A brief introduction to GA terminology and concepts is provided here; for a more
detailed discussion, Mitchell [14] offers a good introduction to the field of genetic
algorithms.
Although there are variations between GA implementations, all generally feature
the following components (whose names are derived from biological terminology):
Solution representation (i.e. an individual)
Population of individuals
Fitness function
Selection method
Crossover operator
Mutation operator
The algorithm itself
Solution Representation
The solution representation, or just representation, has historically always been a string
of bits, due to the precedence set in [9], and for some time this approach persisted as
the representation. Hence in order to attempt to solve a problem, the first step would
be how to represent a solution to the problem as a bit sequence. While, for many
problems, this does indeed make sense (e.g. the knapsack problem), often it is quite
unnatural and there are more intuitive and fitting representations.
Some examples of effective, non-binary, representations include:
A permutation of integers, representing a tour in a travelling salesman problem
A string of integers, representing task/CPU assignments in a job scheduling
problem
A string of real numbers, representing weights in a neural network
Population
The population is the term used for the collection of individuals upon which the GA
operates in order to produce new, and hopefully better, solutions. The size of the population is generally fixed throughout a run. There are two main types of population
model; steady–state and generational. In a steady–state population, single individuals are inserted into the population overwriting a ‘weaker’ individual which has been
selected via some policy. In a generational population, a completely new set of individuals are produced from the previous population, although some individuals may
simply be copied – unaltered – into the new population. Often, both forms of population model feature a form of elitism. In a steady–state GA, this generally takes the
form of a proportion of the best individuals which are protected from being usurped
by new individuals. In a generational population, elitism amounts to roughly the same
thing; either the best (or a proportion of the best) individual(s) are copied directly into
the new population, unmodified.
Fitness Function
The fitness function is in a sense, the essence of the problem – it is what we are trying to solve. It may be a function which is being optimised directly, or some form of
constraint satisfaction expression (which usually feature in, for example, timetabling
problems). The fitness function provides the means by which the quality of a given
individual is assessed. The fitness of an individual is then used to determine the likelihood that it will be selected for reproduction.
Selection Method
The selection method chooses the individuals from which offspring are produced. The
selection should be biased toward individuals of higher fitness, in order that their
genetic material persists in the population and is recombined and improved upon.
There are a number of different selection schemes, including fitness proportionate
selection, tournament selection and rank based selection; each method has its
advantages and disadvantages. A common method is fitness proportionate selection,
whereby each individual is selected with a probability in direct proportion to its
fitness value (relative to the sum of all the fitness values in the population).
This method can cause the GA to become trapped in a local optimum, as highly fit
genetic material proliferates throughout the population before adequate exploration of
the search space has occurred.
Techniques such as rank selection can address this issue by obscuring absolute
differences in fitness value, but this requires ordering of the population, which is not
required in, for example, fitness proportionate selection.
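Fitness proportionate (roulette-wheel) selection can be sketched as follows. This is a generic illustration, assuming non-negative fitness values, and is not the selection code used in the project:

```python
import random

def fitness_proportionate_select(population, fitnesses):
    """Select one individual with probability proportional to its
    fitness, relative to the population's total fitness.

    Assumes all fitnesses are non-negative and at least one is positive.
    """
    pick = random.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick <= running:
            return individual
    return population[-1]  # guard against floating-point round-off
```

With fitnesses 9 and 1, the first individual is chosen roughly 90% of the time, which illustrates how quickly highly fit material can come to dominate the population.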
Crossover
Crossover, within the GA community, has historically been considered the major
force behind the power of GAs. It is the means by which genetic material from parents
(typically two, but it can be more) is recombined in order to produce a new individual,
or individuals. A common crossover operator is two point crossover, which is used
here to illustrate the concept.
This operator exchanges the alleles of the parents between two randomly selected
points in the string. For example, the two parents 00000000 and 11111111 may produce
the offspring 00111100 and 11000011, where the crossover points fall after the second
and before the second–from–last bit positions.
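The exchange described above can be sketched as follows. This is a minimal illustration of two point crossover on equal-length bit lists; the operator used in the project may differ in detail:

```python
import random

def two_point_crossover(parent1, parent2):
    """Return two offspring made by exchanging the segment between
    two randomly chosen cut points."""
    a, b = sorted(random.sample(range(1, len(parent1)), 2))
    child1 = parent1[:a] + parent2[a:b] + parent1[b:]
    child2 = parent2[:a] + parent1[a:b] + parent2[b:]
    return child1, child2

# With cut points after positions 2 and 6, parents 00000000 and
# 11111111 yield 00111100 and 11000011, as in the example above.
```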
Mutation
Mutation is concerned not with the recombination of material from multiple sources,
but with the disruption of material from a single parent. Historically, mutation has been
viewed in the GA community as a means of diversity maintenance within a population,
e.g. reintroducing variation at an otherwise completely converged locus (a position at
which all individuals have the same gene value). For some time this rationale led to
the perception that mutation is a ‘background’ operator, while crossover is the driving
force in the search. This view is no longer as popular, since some mutation–only
techniques, e.g. simulated annealing, have proven comparable with GA performance.
The typical mutation operator for binary chromosomes is bit–flip mutation, where
a bit is randomly inverted. This gives rise to the mutation rate, a value which specifies
the probability of inverting any given bit in the string, applied independently to every
bit. For example, a mutation rate of 0.1 applied to a string of 20 bits will, on average,
invert 2 of the 20 bits, though the actual bit positions flipped will vary.
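Bit-flip mutation with a per-bit rate can be sketched as follows (an illustrative version, not the project's code):

```python
import random

def bit_flip_mutation(individual, rate):
    """Independently invert each bit with probability `rate`.

    At rate 0.1 on a 20-bit string this flips 2 bits on average,
    matching the example above.
    """
    return [1 - bit if random.random() < rate else bit
            for bit in individual]
```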
Operator Probabilities and Operator Parameters
Any given genetic operator has an associated probability, known as the operator
probability, which is the likelihood that the operator will be invoked on any given
iteration of the GA. Usually, in a generational population, crossover and mutation are
applied with probabilities independent of each other; this means that one, both or
neither operator may be applied to a parent chromosome. The opposite approach is
generally taken in a steady–state population: either crossover or mutation is always
applied in the production of a child, and the operators are applied in a mutually
exclusive manner.
In addition to the operator probability, there is often also an operator parameter.
This is a probability which becomes relevant once a given operator has been invoked.
For example, mutation may have an operator probability of 0.4; when invoked, the
mutation rate (the operator parameter) then comes into play in order to effect the
mutation.
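In a steady-state setting, the mutually exclusive application of operators described above might be sketched as follows. The names and probabilities are illustrative (the 0.4 echoes the example in the text), not the project's implementation:

```python
import random

def apply_one_operator(operators, parents):
    """Invoke exactly one genetic operator, chosen according to its
    operator probability (mutually exclusive application).

    `operators` is a list of (probability, function) pairs whose
    probabilities sum to 1; each function maps `parents` to offspring.
    """
    probs = [p for p, _ in operators]
    fns = [f for _, f in operators]
    return random.choices(fns, weights=probs, k=1)[0](parents)

# e.g. crossover with operator probability 0.6 and mutation with 0.4;
# once mutation is invoked, its operator parameter (the mutation rate)
# governs how the chromosome is actually changed.
```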
The Algorithm
The specifics of the algorithm are mainly determined by the population model which
has been chosen. The following algorithm example assumes a steady–state population
with an elitist policy (the basis of the GA implemented in the project).
1. Initialise the population, typically with randomly generated individuals
2. Evaluate each member of the population
3. Select parent individuals, with bias toward fitter chromosomes
4. Apply an operator to produce a new child
5. Select an individual for deletion, with bias toward poorer chromosomes
6. Overwrite this selected individual
7. Return to 3, until some termination condition is met. Common termination conditions are: population convergence (all individuals are the same), a maximum
number of evaluations having been executed, or a solution of a particular quality
having been found.
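The seven steps can be pulled together into a single loop. The sketch below is a minimal self-contained illustration (tournament selection for parents, inverse tournament selection for deletion, bit-flip mutation as the only operator, and a caller-supplied fitness function); it is not the dissertation's actual implementation, which is described in Chapter 3:

```python
import random

def steady_state_ga(fitness, length=20, pop_size=30, max_evals=2000,
                    tournament_k=2, mutation_rate=0.05):
    """Minimal steady-state GA following steps 1-7 in the text."""
    # Step 1: initialise the population with random bit strings.
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    # Step 2: evaluate each member.
    fits = [fitness(ind) for ind in pop]
    evals = pop_size
    while evals < max_evals:  # Step 7: simple evaluation budget
        # Step 3: tournament selection, biased toward fitter individuals.
        parent = max(random.sample(range(pop_size), tournament_k),
                     key=lambda i: fits[i])
        # Step 4: apply an operator (here, bit-flip mutation only).
        child = [1 - b if random.random() < mutation_rate else b
                 for b in pop[parent]]
        # Step 5: select a victim, biased toward poorer individuals.
        victim = min(random.sample(range(pop_size), tournament_k),
                     key=lambda i: fits[i])
        # Step 6: overwrite the victim with the new child.
        pop[victim] = child
        fits[victim] = fitness(child)
        evals += 1
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]
```

Run on MaxOnes (fitness = sum of the bits), the loop steadily accumulates ones; terminating on population convergence or on reaching a target fitness, as mentioned in step 7, would be a straightforward extension.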
An interesting aspect of GAs is that, though composed of simple methods and
concepts, once these components are brought together and executed as a whole, they
interact in a highly epistatic manner. Consequently, considerable effort must be
expended on ‘tuning’ a GA, i.e. finding the best parameter settings. Not only are these
parameters sensitive to the design decisions made, but they will almost certainly be
problem dependent too.
1.2.2 Parameters of a Genetic Algorithm
Once the decisions have been made as to which components will realise the GA (e.g.
population model, representation, genetic operators and selection method), we are then
faced with the task of finding suitable parameters with which to run it.
Even for a modest GA, the parameter space facing the designer is formidable. The
following gives an idea of some parameters which are applicable in most problems:
Population size
Probability of invoking crossover
Probability of invoking mutation
Bit-wise mutation rate
Selection pressure
This list is by no means exhaustive, but already we can see that to investigate these
dimensions exhaustively would require considerable time and effort. As a result, there
have been previous attempts to find robust and general parameter settings. Notable instances are by De Jong [11] and Grefenstette [7]. Although the values derived in these
studies are indeed robust for many problems, they are still a form of ‘compromise’.
Having parameters adapt dynamically presents a possibly more robust solution.
1.3 Dissertation Outline
This section has introduced the basis of the project undertaken: the motivation underpinning the work, the aims and objectives of what the project seeks to achieve, and
the methods by which these aims will be pursued.
The main terminology and concepts of genetic algorithms have also been discussed,
for the benefit of readers unfamiliar with the field.
The remainder of this section gives a brief summary of the subsequent chapters
of the dissertation.
Chapter 2 discusses in some detail instances of previous research in the area of
self-adaptation in GAs and defines a general taxonomy of the various approaches that
have, thus far, been realised.
Chapter 3 details the non-adaptive GA, the population level adaptive GA, which is
a recreation of the adaptive operator probabilities (ADOPP) system [13], and the individual level adaptive GA, which is a modified version of the original ADOPP system.
Chapter 4 describes the test problems that form the basis of the experiments and
discusses why these particular problems were selected.
Chapter 5 describes the experiments carried out, and the results obtained, in exploring
the relevant parameter spaces and identifying the parameter settings from which to
conduct more detailed experimentation.
Chapter 6 discusses the results of the experiments which test the main hypothesis
of the dissertation. The results in this chapter are based mainly on t–test comparisons
between each GA type, for all the original test problems, and some additional larger
and more complex problems. Additionally, a discussion of the differing nature of
operator probability adaptation is presented.
Chapter 7 discusses the main findings of the project, what was successfully achieved
and what was not, and proposes some directions in which the work may be taken in
future.
Chapter 2
Previous Research in Self–Adaptation
2.1 Overview
This section provides a review of some examples of research in self–adaptation that
have been previously undertaken.
The first part of the review will consider some surveys and overview reports which
propose taxonomies within which to classify the nature of the various approaches.
A framework will then be synthesised from this and some example works will be
reviewed relative to this framework.
Defining this framework not only allows the previous research to be characterised
effectively, it also highlights any areas which have received less attention.
The central hypothesis of the project, along with the underlying motivation (discussed
in section 1.1.1), will then be restated within the context of the derived framework.
2.2 Framework
There have been a number of papers written which attempt to define and clarify the
various approaches that fall under the term ‘self–adaptation’. It should be noted that
even the definition of this term itself is not universally agreed upon. However,
although the terminology may be somewhat inconsistent between attempts to formalise
research in this field, more often than not the actual meaning behind a proposed term
is the same. It is worthwhile reviewing some of the overview work and establishing
the precise terminology used in this report.
2.2.1 Methods of Self–Adaptation
Angeline [1] proposes two orthogonal dimensions along which types of adaptation
may be defined. Firstly the actual methods of adaptation are separated into two types:
‘Absolute Update Rules’ and ‘Empirical Update Rules’.
Examples of absolute update rules include Davis’s adaptive operator fitness [4]
and Julstrom’s adaptive operator probabilities [13]. The distinguishing characteristic of this type of rule is that it explicitly defines how the adapted entity is changed,
via a mechanism which operates externally to the GA itself and typically involves the
computation of some statistic based on search performance over recent iterations. Angeline draws a comparison between these absolute update rules and traditional artificial
intelligence ‘heuristics’.
Contrastingly, empirical update rules are not based on any such external mechanism. Rather, the same – in this case evolutionary – forces propelling the search within the problem space are simultaneously applied to search (in practice, some subset of)
the space of possible GA variants. The earliest example of such an approach in GAs
was performed by Bäck [2], and is based on ideas originating from the evolutionary
strategies community.
Both these approaches involve some form of feedback in the adaptive procedure.
There are examples of deterministic schedules (e.g. a pre–defined decrease of mutation
probability as the GA run progresses) which can be classed as ‘adaptive’ techniques,
but these are not discussed in [1], as such methods fall outwith the definitions proposed
in this review.
A more comprehensive survey, by Eiben et al. [5], presents the
all–encompassing concept of ‘parameter setting’. The proposed taxonomy here not
only considers the previously discussed adaptive methods, but also includes manual
configuration of static parameters: ‘parameter tuning’. All other forms of parameter
modification fall within ‘parameter control’, and it is within this area that dynamic
modification of parameters occur.
Within the realm of parameter control three distinct methods are identified: ‘deterministic’, ‘adaptive’ and ‘self–adaptive’.
The primary factor which identifies a deterministic method is that it does not feature
any feedback whatsoever from the progress of the search. Instead, some aspect of the
GA is modified in a fixed way. This does give rise to subtle examples that are somewhat difficult to classify: for instance, Davis [3] used an adaptive mechanism to derive time–varying schedules which were then applied in a fixed manner (i.e. no feedback occurs
during the GA run). Though this is quite clearly an example of a deterministic system,
the derivation of the deterministic schedule was via an adaptive mechanism.
The adaptive and self–adaptive types are basically equivalent to Angeline’s absolute and empirical update rules respectively.
Tuson [22] performed a comparison of the two main techniques. The terminology
used in that work shall be adopted here, giving rise to ‘learning rule methods’ (equivalent to ‘adaptive’ and ‘absolute update rule’ terms) and ‘co–evolution’ (equivalent to
‘self–adaptive’ and ‘empirical update rule’ terms).
Henceforth the terms ‘adaptive’ and ‘self–adaptive’ simply refer to a GA which
modifies parameters via feedback from progress (be it via co–evolution or a learning
rule).
2.2.2 Levels of Self–Adaptation
The second main aspect used to classify adaptation in [1] is that of ‘Adaptation Level’.
This recognises the scope, or granularity, at which adaptation occurs. Three levels are
identified: ‘population’, ‘individual’ and ‘component’ level.
Population level adaptation is concerned with the modification of some aspect of
the GA which applies uniformly to all members of the population. Typical examples of
such attributes are the probability of crossover and modifications to the fitness function.
Individual level adaptation is concerned with the modification of a number of attributes, each of which is associated with a particular chromosome in the population.
Commonly bitwise mutation rate is adapted, which can change independently for each
chromosome in the population.
Finally, component level adaptation is the finest granularity of adaptation possible.
Here, the adaptation causes changes affecting, independently, each gene in an individual. This approach features most extensively in evolutionary strategies research, where
individuals most commonly take the form of a vector of real numbers. Consequently,
each number (gene) may be assigned a parameter specifying the magnitude of mutation
which that component will experience, for example.
Angeline also draws attention to the fact that while the power and control intuitively
offered by component level adaptation may be appealing, the resultant parameter space
which must ultimately be searched (along with the problem space itself) could be sufficiently large such that effective adaptation cannot be realised.
2.2.3 Adaptation Evidence
A slightly more subtle dimension of classification is the evidence used in adaptation.
The term evidence is taken from [18] and identifies the actual information upon which
adaptation is based, for example, operator productivity, whereby some measurement
is made of an individual’s fitness improvement over (usually) its parent. For co–
evolutionary methods the evidence is implicitly obtained via the fitness function.
2.2.4 Subject of Adaptation
Finally we have the actual aspect of the GA which is being adapted. [5] uses the term
‘component’ to address this, but it is felt this is somewhat confusing since the term is
already in use regarding adaptation level. Therefore, the term subject is introduced to
refer to that part of the GA which is undergoing adaptation. This is a very broad dimension, since it can really be any part of a GA at all. Subjects of adaptation include:
operator probabilities, operator parameters, fitness function, population size and representation.
2.2.5 Summary
To summarise the defined terminology, four separate dimensions of classification have
been identified (as was essentially done in [18] and [5]), with the terminology synthesised where appropriate. The dimensions are:
1. The method of adaptation
2. The level within the GA at which the adaptation is taking place
3. The evidence utilised in order to effect the adaptation
4. Finally, the subject of the adaptation; what is actually being modified
Some example works will now be discussed relative to the above dimensions.
2.3 Co–evolutionary Examples
There have been many previous works based on the idea of co–evolution. An appealing
argument is put forward in [18] in support of this approach. Basically, the space representing the possible configurations of a GA is vast and epistatic, and there is little or no prior knowledge regarding its landscape; this makes it an ideal candidate for optimisation via a GA. Of course, the question is whether the GA can beneficially search
its own configuration space, while maintaining a competitive performance (hopefully,
an improved one) in the search of the problem space.
2.3.1 Adaptation of Crossover Type
Spears [19] proposes a system which features two crossover operators (uniform and
two–point) whose selection is controlled by an additional bit tagged onto the end of the
representation. A ‘0’ indicates uniform crossover and a ‘1’ two–point crossover. This
bit is subjected to the evolutionary forces just as the solution representation itself is.
A generational GA was used with fitness proportionate selection and the test problems
reported were two versions of the N–peak problem. Basically, the landscape features
one global optimum and N-1 local optima. The problems featured 1 and 6 peaks,
stressing algorithm performance across problem size, but not structure.
Spears initially compared the adaptive GA with two non–adaptive GAs, each of
which featured one of the crossover types. The adaptive GA’s performance was found
to approximate that of the best performing normal GA (uniform crossover appeared to
be the optimal operator to use). It therefore appeared that the GA was indeed ‘adapting’
toward use of the preferred operator.
To test this idea, another control GA was run which featured both operators, but
selected them randomly. This GA was very close in performance to the adaptive GA.
Spears suggested that the good performance was simply the result of having multiple operators available, rather than anything adaptation was doing.
By using a system which measured the rate of change in the appended bit, Spears
developed the idea of ‘confirmed’ adaptation, in order to determine more rigorously whether a perceived improvement could actually be attributed to adaptation. All control GAs still featured the added bit, but it was not actively used for anything in the non–adaptive GAs; this allowed Spears to measure the rate of change in this appended bit
and to interpret the result as ‘confirmed’ adaptation (or not). It was found that in some
instances, what appeared to be adaptation was not actually ‘confirmed’ adaptation.
Most interestingly, Spears proposed a modification to the fitness function, which
formed a type of hybrid approach between learning rule and co–evolution. He recognised that if two different operators produced children with the same fitness, then the operators (in the form of the appended bit) would be rewarded equally by fitness selection, since each child is as likely to be selected as the other. Clearly, this does not take into account
the parent fitness; it is possible that one operator type provided a much larger increase
over the parent fitness than the other operator. By incorporating this fitness ‘difference’
into the fitness function, Spears added a measure of operator productivity into the GA.
This type of measurement is usually performed explicitly (and externally
to the GA) in learning rule approaches (discussed shortly).
This enhancement was run on both the adaptive and non–adaptive GAs and while
some improvement was found for the adaptive GAs, the non–adaptive GAs benefited
considerably, suggesting that operator productivity may be a means to enhance GAs in
general, and is not solely the concern of adaptive systems.
2.3.2 Co–evolution at Different Levels
An early investigation based on ideas from ES by Bäck [2] considers a co–evolutionary
approach and reports results for both individual and component level performances.
The combination of different levels of adaptation within the same study is rare, for
GA literature at least. The standard approach of encoding subject parameter values
directly into the chromosome is adopted here. 20 bits are used to encode mutation rate,
providing a very fine grained discretisation of the mutation rate, which spans the range
[0.0…1.0].
A single rate can be associated with one individual, involving one 20 bit segment
being appended to the chromosome, realising individual level adaptation. Alternatively, a separate rate is appended for each bit of an individual, realising component level adaptation. The individuals
are binary strings themselves. Interestingly, when mutation is applied, the encoded
parameter first mutates itself, then this result is used to mutate the associated object
variable, be it the whole string, or one gene.
Bäck found that the component level adaptation was more strongly disruptive, i.e.
the bias was toward exploration of the search space. Contrastingly, individual level
adaptation was found to be a more exploitative system, favouring the preservation of
highly fit genetic material. It was found that individual level adaptation performed best
on unimodal landscapes and component level adaptation on multimodal landscapes.
In light of the assumptions underlying this project, this is an encouraging result;
a finer grained adaptive mechanism is successfully exploiting a suitably non–uniform
fitness landscape.
2.4 Learning Rule Examples
2.4.1 Population Level Adaptation
Two well known works which share considerable commonality are by Davis [3] and
Julstrom [13]. Both examples adapt operator probabilities by rewarding credit to oper-
ators for their contribution in producing fitter children – otherwise known as operator
productivity. Davis presents the slightly more complicated of the two systems, which
adapts probabilities for five operators. Julstrom’s system is used to adapt two operators (crossover and mutation); however, there is no reason in principle that this system
Davis’s work uses representations of a string of integers and the optimisation task
is to obtain a string of all 5’s (the allele values are in the range 1…32). The population
model adopted is steady state as it is argued that such a system preserves information
which can be exploited by the adaptive mechanism. Davis observes that a generational
approach would be detrimental to adaptation (for this type of learning rule system)
since the vast majority, if not all, of the population is replaced on each iteration of a
generational GA. The idea is not tested, though the intuition seems valid.
Davis compared the adaptive GA to one which featured all the same operators,
but selected them at random. It was found that the adaptive GA did perform slightly
better than the non–adaptive GA. However, this comparison was based only on visual
inspection of graphs.
Julstrom did not compare his adaptive GAs with a non–adaptive control. Instead,
two problems were selected for which the optima were known. The adaptive GAs were then observed to obtain, or get close to, these optima.
The basis of the adaptive mechanism involves the periodic recalculation of the operator probabilities, utilising operator productivity as the basis of crediting operators
and also passing portions of this credit back to ancestral operators. This addresses the
interplay between operators: although one operator may not be directly responsible for the production of fit offspring, it can be shown that the operator did help ‘set the stage’. Julstrom’s approach is similar to this, though the details of the mechanism seem a little more straightforward when compared with Davis’s.
2.4.2 Reinforcement Learning Approach
Pettinger and Everson [15] propose a hybrid system (which, to some degree, all learning rule methods are) which incorporates a reinforcement learning (RL) agent that controls operator selection. The RL agent is based on Watkins’s Q(λ)–learning and was
tested using a 40 city travelling salesman problem.
The net result of this approach (following training of the RL agent) is a state–action
table that determines, stochastically, the operator to select, dependent upon the current state of the GA. In order for the ‘state’ of the GA to be perceived by the RL agent, the following attributes were discretised: the generation count was split into
four epochs. Average population fitness, normalised by the initial population fitness,
was split into four intervals and a measure of population diversity was similarly binned
into three states.
The operators available to the GA formed the ‘actions’ available, of which there
were three crossover operators and four mutation operators. The operators are further
refined by specifying the ‘class’ of individual that will be selected in order that the
operator can be applied. To enable this the population was split into two classes: ‘Fit’
(within top 10% of population) and ‘Unfit’ (the rest of the population). This additional
dimension therefore created two versions of each mutation operator and four versions
of each crossover operator.
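The discretisation described above yields a small state space of 4 × 4 × 3 = 48 states. A minimal sketch of such a state encoding is given below; the bin boundaries are illustrative assumptions, not those used by Pettinger and Everson.

```python
def ga_state(generation, max_gens, avg_fitness, initial_avg_fitness, diversity):
    """Discretise the GA's condition into one of 4 x 4 x 3 = 48 states.
    Bin boundaries are illustrative; diversity is assumed to lie in [0, 1]."""
    epoch = min(3, int(4 * generation / max_gens))   # four epochs of the run
    norm_fit = avg_fitness / initial_avg_fitness     # fitness normalised by the initial average
    fit_bin = min(3, int(norm_fit / 0.5))            # four fitness intervals (assumed width)
    div_bin = min(2, int(diversity * 3))             # three diversity states
    return epoch, fit_bin, div_bin
```

Each such state indexes a row of the learned state–action table, from which an operator is then drawn stochastically.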
The task of the agent was to learn the most appropriate operator to apply, given the current state of the GA. The basis of reward for any action taken by the agent (‘action’ being the application of a particular operator) is a form of operator productivity.
This approach was shown to be successful compared with a non–adaptive GA that
selected from the same set of operators at random. As Spears showed [19], this is a
good test to conduct, particularly when multiple operators are in play, since it may well
just be the utility of having several operators that is behind any apparent improvement.
However, there seems to be a distinct lack of fair tuning in the comparison. The RL
agent experiences 150 training runs in order to learn the required state–action table,
whilst the normal GA is executed with just one parameter configuration.
Admittedly, the RL agent’s learning phase is not strictly a tuning exercise (in the
traditional sense), but the point is that computational effort is being expended in order
to improve the adaptive GA performance. It seems only fair that a similar amount of
effort be spent exploring the normal GA’s parameter space. Apart from providing a better basis for comparison, this can help increase confidence that there is not some static
parameter setup that actually provides equivalent performance to that of the adaptive
system.
2.4.3 Individual Level Adaptation
Srinivas and Patnaik [20] report a learning rule based adaptive GA which operates at
the level of individuals. This is quite an unusual combination of approaches. Very
often learning rule methods are concerned with population level adaptation and co–
evolutionary methods are primarily concerned with individual level adaptation.
This may be a consequence of the early work of Davis [3], as discussed, which featured a population level learning rule approach. The affinity between individual level
adaptation and co-evolutionary methods is obvious, since by its fundamental setup,
co–evolution is focused on the individual and there is no global or centralised control
in place.
The system put forward in [20] is an elegant one, with essentially no book–keeping overhead, though computational effort is of course required. The authors begin by deriving simple population level expressions for the crossover probability p_c and mutation rate p_m using only the population maximum and average fitnesses. These
expressions are then extended by incorporating the individual’s own fitness, creating a
simple but effective localised parameter set.
The rationale behind the system is one of local optima avoidance, which the authors
argue may be achieved by preserving highly fit individuals, while applying more disruptive force to very unfit individuals. This is effectively the exploration/exploitation
balance made explicit; the search space is exploited by the current best individual, but
by increasing disruption to low fitness individuals the search is kept active and the likelihood of becoming trapped in a local optimum is considerably reduced.
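A sketch of this style of rate calculation is given below. The exact expressions and constants in [20] differ; k1, k2 and the treatment of below–average individuals here are illustrative simplifications.

```python
def adaptive_rates(f_ind, f_max, f_avg, k1=1.0, k2=0.5):
    """Individual level crossover/mutation rates in the spirit of the
    scheme described above: rates fall to zero for the current best
    individual and are high for unfit ones. The constants and the
    below-average case are illustrative simplifications."""
    if f_ind >= f_avg:
        spread = max(f_max - f_avg, 1e-12)  # guard against a converged population
        scale = (f_max - f_ind) / spread
        return k1 * scale, k2 * scale
    return k1, k2                           # maximum disruption for unfit individuals
```

Note that only three fitness values (the individual’s, the maximum and the average) are needed, so the rates can be computed on demand with no stored history.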
The system itself introduces some parameters, which the authors investigate; the GA was found to be robust to changes in their values.
Interestingly, this work was based on a generational GA, whilst many other learning
rule approaches are based on steady state GAs. Steady state GAs are often favoured
in learning rule approaches as due to their incremental nature they tend to preserve
information in a more stable manner than generational populations. Retaining information is not an issue here though, as there is no credit assignment in place, storage
of operator events or similar. The crossover and mutation rates are simply calculated
directly, as needed, from nothing more than three fitness values (and some constants).
Just as [19] advocated the use of a ‘fitness increase’ type metric, a similar concept is
at play here; by calculating nothing more than differences in fitnesses, suitable values
can be derived for crossover and mutation rate.
The results reported were favourable for the adaptive GA compared with one featuring static p_c and p_m. These results were basically just direct comparisons of average performances (over 30 runs). Other metrics were recorded as well, such as how many times the GA got stuck in a local optimum, on which the adaptive GA performed well.
2.5 General Points
Irrespective of the approach taken to realise an adaptive GA, the following basic steps
feature in the majority of works:
1. Determination of the parameters which will be adapted. As discussed previously, there are a large number of parameters in a typical GA. Some research concentrates on modifying just one parameter; other work incorporates multiple
parameters simultaneously. The specific representation of the parameters must
be considered along with how and when they will be modified.
2. Deciding how the static parameters will be handled. There are basically two approaches taken here. One is to choose a value for the parameter (often an experimental ‘standard’, e.g. a crossover probability of 0.6 is typical). The other
tests the parameter over a range of values, though, for a given set of runs, the
parameter is still static. The choice made depends upon what is under consideration e.g. population size may be kept static for all experiments if its influence
is not under scrutiny.
3. Selecting the test suite for the experiment. In general, it is felt that many experiments do not consider a sufficient number of benchmark problems; however, [2] and [22] are extensive in the test problems featured. Given that practically all results suggest a high degree of problem dependency in the observed performance, as comprehensive a suite of test problems as possible should be used.
4. The adaptive GAs are then run on the test suite and usually compared with a static parameter GA as a control, though, as in [13] and [19], sometimes showing effective adaptation is the primary aim. Often, performance comparisons are made simply by comparing average performance, but this is not rigorous [24].
2.5.1 Parameter Migration
Several of the techniques which have been developed in order to adapt parameters within a GA do themselves introduce parameters. This is not necessarily an issue as
it is often the case that these higher level parameters are in fact a lot more robust than
those that are to be adapted, in which case the overhead is acceptable. This said, it is
clearly something that should be kept in mind when attempting to create adaptive GAs.
2.5.2 Summary
A framework for the classification of research in the field of self–adaptation was proposed and some notable instances of previous works reviewed. There are sufficient
instances of positive results to show that it is worthwhile to focus research in this area.
One aspect which has received relatively little attention is the comparison of different
levels of adaptation. It was noted that the vast majority of learning rule methods operate
at the population level, while overwhelmingly, co–evolution is realised at the individual, or component, level. These appear to be the intuitive levels for these methods, but
there is no reason that other combinations should not be tried.
The general intuition behind finer grained approaches is appealing, as Angeline
observes:
“While the number of parameters for a population–level adaptive technique is smallest, and consequently easiest to adapt, they may not provide
enough control over the evolutionary computation to permit the most efficient processing. This is because the best way to manipulate an individual
may be significantly different than how to manipulate the population on
average.”
This is a valid observation, though care must be taken to temper any hopes of performance gain with the computational overheads required to realise such finer grained
systems. Angeline draws attention to this point by highlighting the fact that although
component level adaptation may be at the extreme of offering adaptive control, the
resulting parameter space may well be prohibitive of effective adaptation.
Since relatively few learning rule methods have been proposed at the individual
level (Srinivas and Patnaik [20] being a notable exception), this deficiency will be addressed here. Additionally, by basing the technique on an existing population level
system, a study of the ‘effect’ of adaptation level on the system will be possible.
Combining the intuition supporting finer grained adaptation and addressing the
gap in studies comparing adaptation level leads naturally to the hypothesis of the
project:
An adaptive mechanism operating at the individual level will perform as well or
better than the same mechanism operating at the population level.
In order to test this hypothesis a GA system is required; this is discussed in Chapter 3.
Chapter 3
System Implementation
3.1 Overview
As mentioned earlier, any form of empirical investigation requires a system upon which
experiments are run. This chapter details the GA system developed for these purposes,
which is based upon previous work by Julstrom [13] called adaptive operator probabilities (ADOPP).
ADOPP was selected as a basis for the work as it was felt to be both flexible
in nature and a relatively uncomplicated realisation of a learning rule based adaptive
system. Though originally based on population level adaptation, there was no reason
in principle that the mechanisms involved could not be directed toward finer grained,
individual level, adaptation.
There was also the opportunity to rigorously compare the system with a non–adaptive counterpart, as this was not an aim of the original work.
Three broad types of GA were implemented:
1. Non-adaptive (or normal)
2. Population level adaptive
3. Individual level adaptive
Each GA type shall be discussed in more detail in the following sections.
3.2 The Basic Genetic Algorithm
This GA is based on a simplified version of the one proposed in [13], essentially with
the adaptive operator probability functionality removed. This provides a reasonable
basis for comparison with the adaptive versions since the underlying GA mechanics
remain the same, and all that varies is the presence (or absence) of the adaptive mechanism.
The GA uses a steady–state population that does not permit the insertion of duplicate chromosomes, based on the ‘steady–state without duplicates’ GA of [4]. The
primary advantage of this approach compared to a generational GA is that population
diversity is ensured (convergence to the same individual is impossible). It can also be
argued that barring duplicates results in a more efficient use of the population since
calls to the evaluation function only occur in order to assess new genetic material, as
opposed to possibly having several calls to the fitness function to assess a number of
instances of the same individual. Furthermore, the non–duplicate restriction on the population provides improved sampling of the search space, compared with a population
model which permits duplicates.
Algorithm 1 defines the basic GA operation.
Algorithm 1 Normal GA
1. Initialise population (at random and with no duplicates)
2. Evaluate all individuals in population
3. Select a parent
4. Select the operator to apply
5. IF crossover selected
6. THEN get another parent and apply crossover
7. ELSE apply mutation
8. IF new child is not already in population
9. THEN evaluate the child and insert into population
10. ELSE discard new child
11. Go to step 3 until maximum number of evaluations have occurred
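Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the actual implementation: the selection and operator functions are placeholders, and simple worst–replacement stands in for the rank–biased deletion described later in this section.

```python
import random

def run_ga(init_pop, fitness, select, crossover, mutate, p_cr, max_evals):
    """A minimal sketch of Algorithm 1: a steady-state GA that applies
    exactly one operator per iteration and rejects duplicate children.
    Evaluations are only spent on new genetic material."""
    pop = [list(c) for c in init_pop]
    fits = [fitness(c) for c in pop]
    evals = len(pop)
    while evals < max_evals:
        parent = select(pop, fits)
        if random.random() < p_cr:            # crossover chosen with probability P_Cr
            child = crossover(parent, select(pop, fits))
        else:                                 # otherwise mutation
            child = mutate(parent)
        if child not in pop:                  # no-duplicates rule (steps 8-10)
            worst = min(range(len(pop)), key=lambda i: fits[i])
            pop[worst], fits[worst] = child, fitness(child)
            evals += 1
    return pop, fits
```

Note that, faithful to the text, a duplicate child consumes no evaluation; the loop terminates only once the evaluation budget is spent on genuinely new individuals.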
The parent selection method relating to step 3 in Algorithm 1 is linear rank selection, which is based on the definition in [23]. Linear rank selection firstly orders the population by fitness, then assigns each individual a ‘ranked fitness’ (this value should not be confused with the individual’s actual fitness; the rank is used only to select individuals for reproduction or deletion). Following this,
an individual i is selected with probability equal to rank(i) divided by the sum of all the ranks.
In addition to using linear rank selection for reproductive purposes, it is also used to
select individuals that will be overwritten by fitter offspring. Of course, when selecting
for deletion, the bias is toward less fit individuals (with an elite proportion of the fittest in
the population protected from deletion). This deletion process forms part of the ‘insert’
functionality mentioned in step 9 of algorithm 1.
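A sketch of linear rank selection, covering both reproduction and deletion, might look as follows; the elitist protection of the fittest individuals is omitted for brevity.

```python
import random

def rank_select(pop, fits, for_deletion=False):
    """Linear rank selection: individual i is chosen with probability
    rank(i) / sum of all ranks. For reproduction the worst individual
    gets rank 1 and the best rank N; for deletion the order is inverted
    so the bias falls on the least fit."""
    order = sorted(range(len(pop)), key=lambda i: fits[i], reverse=for_deletion)
    ranks = list(range(1, len(pop) + 1))   # weight for each position in `order`
    pick = random.choices(order, weights=ranks, k=1)[0]
    return pop[pick]
```

Because only the ordering of fitness values matters, the selective pressure this produces is independent of the absolute spread of fitnesses, which is the ‘auto–scaling’ property discussed below.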
Linear rank selection has two main advantages over other selection methods. Firstly,
it tends to avoid premature convergence as absolute differences in individuals’ fitness
values are obscured, so even if a super–fit individual were present in the population, it would not rapidly proliferate through the population, as would very likely be the case
in, say, fitness proportionate selection. Secondly, linear ranking performs a form of
‘auto–scaling’ of fitness values, such that no matter the magnitude of variance present
across all the fitness values, the resulting selective pressure is always constant. Certain other selection methods tend to suffer from a decrease in selective pressure as the
fitness values in the population become less varied, which tends to occur as the GA
makes progress.
These advantages do come at a cost as linear ranking requires that the population
is sorted according to fitness value before the ranked fitnesses can be applied. Methods such as roulette wheel selection and tournament selection do not require sorting.
Also, there may well be situations in which it may be important to know the absolute
difference between two fitness values, in which case linear rank selection would be
undesirable.
The operator which will be applied on a given iteration of the GA is determined
stochastically by the operator probabilities. For crossover, this is denoted as P_Cr and for mutation as P_Mu. Crossover and mutation are applied exclusively on any given iteration of the algorithm and one of the operators is always applied, which gives rise to the condition that P_Cr + P_Mu = 100%. These values remain fixed for the duration
of a run. The details of crossover and mutation behaviour are discussed later in this
chapter, suffice it to say at the moment that both operators produce one child only.
3.3 The ADOPP System
The following description is based largely on the detail provided in [13], which gives a good explanation of the system. Where applicable, a little more detail is added here and some insights regarding certain design decisions are offered.
Julstrom proposes a system called adaptive operator probabilities (ADOPP) in
[13], which adapts the operator probabilities as the GA runs. The unity condition
[P_Cr + P_Mu = 100%] still holds throughout the run.
By utilising an operator productivity metric, ADOPP assigns a probability to an
operator which is proportionate to that operator’s contribution over recently generated
individuals. Operator probabilities are updated after the creation of each new individual.
Two structures are utilised in the ADOPP system: an operator history, which records
the operators that led to the creation of the individual, and a queue which records recent
operator contribution and operator invocations.
Operator History
Julstrom originally used a binary tree to implement the operator history; in this work a 2–dimensional ragged array was used, which was slightly simpler to implement.
Figure 3.1 shows an example of an operator history.
Figure 3.1: An Operator History
The most recent entry in the history (circled) holds the immediate operator; the
operator which actually created the individual. At the next ‘layer’ of the history we
have the operators which created the parents of the individual and so on. This history
extends back to a pre–specified level, known as depth, which is one of the parameters
of ADOPP. The example history has a depth of 4.
Whenever crossover or mutation is applied to create a new individual, the appropriate section(s) of the parental history/ies are copied into the new individual’s operator history. The empty square bracket entries in the history, [ ], signify a null entry
and are required since any chromosome produced by mutation has one parent only.
The purpose of the operator history is to enable the appropriate assignment of
credit, not only to the immediate operator which produced the child, but also to operators which played a contributory role in the creation of the child. This is important,
for example, in addressing the intuition that mutation may introduce useful genetic material into a population which is then spliced together with other fit genetic material,
via crossover, to produce a good offspring. Clearly, crediting only crossover in such a
scenario would be unfair and misleading.
Credit is assigned to the operators when an individual is created which is considered to be an improved chromosome. ADOPP considers an individual improved if its
fitness is greater than the median individual's fitness.
When an improved individual is created, a credit value of 1.0 is awarded to the
immediate operator. The operators which created the new individual's parents are
awarded a credit value of 1.0 × decay, where decay is a value in the range 0.0 to 1.0
and controls the generosity of ancestral credit assignment. The grandparent operators
are awarded 1.0 × decay², and so on, back to the final layer of ancestral operators.
Decay is the second ADOPP parameter.
For the example history of Figure 3.1, assuming a decay value of 0.6, crossover
would therefore be awarded a total credit of 1.0 + 0.6 + 2 × 0.6² + 3 × 0.6³ = 2.968.
Meanwhile, mutation would be awarded a total credit of 0.6 + 0.6² + 2 × 0.6³ = 1.392.
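To make the decay rule concrete, the following sketch accumulates decayed credit over the layers of a history. The layer counts used are hypothetical, chosen to match the worked example; the class and method names are illustrative, not those of the actual implementation.

```java
/** Illustrative sketch: decayed ancestral credit assignment over an operator history. */
public class AncestralCredit {

    /**
     * counts[d][0] holds the number of crossover entries at depth d of the history,
     * counts[d][1] the number of mutation entries; depth 0 is the immediate operator.
     * Each layer's contribution is scaled by decay^d.
     */
    public static double[] credit(int[][] counts, double decay) {
        double credCr = 0.0, credMu = 0.0, weight = 1.0;
        for (int d = 0; d < counts.length; d++) {
            credCr += counts[d][0] * weight;
            credMu += counts[d][1] * weight;
            weight *= decay; // each ancestral layer is worth decay times less
        }
        return new double[] { credCr, credMu };
    }

    public static void main(String[] args) {
        // Hypothetical layer counts consistent with the worked example (decay = 0.6)
        int[][] history = { { 1, 0 }, { 1, 1 }, { 2, 1 }, { 3, 2 } };
        double[] c = credit(history, 0.6);
        System.out.printf("crossover = %.3f, mutation = %.3f%n", c[0], c[1]);
        // crossover = 2.968, mutation = 1.392
    }
}
```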
In the event of a new chromosome not being an improvement, then both crossover
and mutation are awarded zero credit, but note that the invocation of the operator is still
recorded in the queue of recent operators. Also, since our GA implementation does not
permit duplicate chromosomes into the population, then if a duplicate is produced, this
too results in zero credit for crossover and mutation.
Operator Queue
The operator queue is used to store recent operator invocations, along with the associated credit values for crossover and mutation. Each time a new individual is created,
the operator used is always recorded. If the new individual is an improvement (better
than the median individual and not already in the population), then the credit values
for crossover and mutation are derived from the new individual’s operator history, and
these values are enqueued along with the operator type. Otherwise, zero credit values
are enqueued along with the operator type.
The length of the queue is defined by an integer value, qlen, which is ADOPP’s
third and final parameter. Initially, ADOPP operates with fixed operator probabilities
(as per the normal GA) until qlen entries have been added to the operator queue. After
this point the operator probabilities are derived from the operator queue, and enqueueing a new entry requires removing the oldest one. The initial, fixed probabilities, in
both the original and this work, are set at P(Cr) = P(Mu) = 50%.
The operator queue has four variables associated with it, namely the number of
crossover and mutation invocations recorded, num(Cr) and num(Mu), respectively,
and the total credit due to crossover and mutation (summed across all queue entries),
cred(Cr) and cred(Mu), respectively. When a new entry is added to the queue, the
relevant invocation count is incremented by one and the credit values being enqueued
are added to cred(Cr) and cred(Mu) accordingly. Note that these values may be zero.
When an entry is removed from the queue the opposite process to adding an entry
is followed; the appropriate operator count is decremented by one and the credit scores
for crossover and mutation are subtracted from cred(Cr) and cred(Mu), respectively.
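This bookkeeping can be sketched as a fixed-length queue with running totals. The structure below is illustrative only; the field and class names are assumptions, not the dissertation's code.

```java
import java.util.ArrayDeque;

/** Illustrative sketch of the fixed-length operator queue with running totals. */
public class RecentOps {
    static final class Entry {
        final boolean crossover; final double credCr, credMu;
        Entry(boolean crossover, double credCr, double credMu) {
            this.crossover = crossover; this.credCr = credCr; this.credMu = credMu;
        }
    }

    private final ArrayDeque<Entry> queue = new ArrayDeque<>();
    private final int qlen;
    int numCr = 0, numMu = 0;       // invocation counts per operator
    double credCr = 0.0, credMu = 0.0; // total credit per operator

    RecentOps(int qlen) { this.qlen = qlen; }

    boolean full() { return queue.size() == qlen; }

    /** Record one operator invocation; evict the oldest entry once the queue is full. */
    void add(boolean crossover, double cCr, double cMu) {
        if (full()) {
            Entry old = queue.removeFirst();
            if (old.crossover) numCr--; else numMu--;
            credCr -= old.credCr;
            credMu -= old.credMu;
        }
        queue.addLast(new Entry(crossover, cCr, cMu));
        if (crossover) numCr++; else numMu++;
        credCr += cCr;
        credMu += cMu;
    }
}
```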
The four values are then used to derive the operator probabilities via the following
expression:

P(Cr) = (cred(Cr) / num(Cr)) / ((cred(Cr) / num(Cr)) + (cred(Mu) / num(Mu)))

Therefore, quite simply, the total credit values are scaled according to the frequency
of the appropriate operator, and the resulting values are normalised with each other to
satisfy the earlier condition that P(Cr) + P(Mu) = 100%. To ensure that an operator
always has some participation, the probabilities are bounded between 5% and 95%.
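Under these definitions, the derivation and bounding of P(Cr) might look as follows. This is a sketch only; in particular, the fallback to 50% when no credit has yet been recorded is an assumption, not something specified by the source.

```java
/** Illustrative sketch of ADOPP's probability derivation with 5%-95% bounding. */
public class OperatorProbability {

    /** P(Cr) = (cred(Cr)/num(Cr)) / (cred(Cr)/num(Cr) + cred(Mu)/num(Mu)), clamped. */
    public static double pCrossover(double credCr, int numCr, double credMu, int numMu) {
        double rateCr = numCr > 0 ? credCr / numCr : 0.0; // mean credit per invocation
        double rateMu = numMu > 0 ? credMu / numMu : 0.0;
        // Fall back to 50% if neither operator has earned any credit (an assumption)
        double p = (rateCr + rateMu) > 0.0 ? rateCr / (rateCr + rateMu) : 0.5;
        // Keep both operators participating by bounding between 5% and 95%
        return Math.min(0.95, Math.max(0.05, p));
    }
}
```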
ADOPP Algorithm
Algorithm 2 summarises the above discussion and presents the system in a more complete form.
Algorithm 2 ADOPP GA
1. Initialise population (at random and with no duplicates)
2. Evaluate all individuals in population
3. Select a parent
4. IF operator queue is full
5. THEN select operator using individual dynamic probabilities
6. ELSE select operator using fixed global probabilities
7. IF crossover selected
8. THEN get another parent and apply crossover
9. ELSE apply mutation
10. IF new child is not already in population
11. THEN
11.1. Evaluate the child and copy the parental histories
11.2. IF child is an improvement
11.3. THEN enqueue calculated credit values along with applied operator
11.4. ELSE enqueue zero credit along with applied operator
11.5. Insert child into population
12. ELSE discard new child
13. Go to step 3 until the maximum number of evaluations has been reached
3.4 ADOPP Modification for Individual Adaptation
The system described in section 3.3 adapts operator probabilities at the population
level. The enhancement described in this section allows the same mechanism to adapt
at the level of the individual, in theory enabling finer grained adaptation, and hopefully
providing more effective optimisation.
Primary Extension
The main modification made to ADOPP is that instead of just one operator queue, each
individual has its own queue (as is already the case for the operator history structure).
The initial behaviour of the individual level ADOPP system (iADOPP) is identical to
the original, that is, until all operator queues are full, operators are selected based on
static probabilities (again, both set at 50%). Until all queues are full each new entry
is added to each and every queue. This ensures that all operator probabilities begin
adapting at the same time, and also enables adaptation in the shortest possible time; if
we were to wait for each queue to populate individually it is likely that adaptation may
not occur sufficiently early on in the GA run. For example, if we have a population size
of 100 and a qlen of 100, this results in a worst case scenario of 10,000 iterations before
all queues are full, within which time the GA will more than likely have converged
already, making adaptation rather pointless. In reality it is more likely that far fewer
than 10,000 iterations would be required before the onset of adaptation; however, the
delay would very likely still be long enough that adaptation would not occur at a
suitably early stage in a run.
Note that once the queues are full, new entries are enqueued only onto the currently
selected queue, i.e. the queue belonging to the parent selected in step 3 of algorithm 2.
Additional Copying Overheads
Given that each individual now has its own operator queue, we must copy the parental
queue into the new chromosome’s queue, as well as the operator histories. For mutation
this is straightforward as we are concerned with only one parent, and therefore one
queue. For crossover it was decided to copy the queue of the first parent selected.
This copying illustrates the more demanding overheads, both in terms of bookkeeping and computational effort, of realising adaptation at the individual level. Such practical requirements essentially demand that benefit is gained from
these more costly mechanisms. Otherwise there are simply no grounds for justifying
their inclusion.
Improvement Criteria
In ADOPP an improved chromosome is one whose fitness is greater than the median
individual’s fitness. This clearly makes sense from a population point of view, but
less so from the individual standpoint. Another option is to seek an improvement
over the parent’s fitness, which focuses the adaptation more tightly on the individual
level. Again, for mutation, this is a straightforward exercise, but for crossover we must
decide over which parent we seek an improvement. There are three main choices to
consider:
1. 1st parent selected (or 2nd, the point being we simply always select a certain
parent unconditionally for comparison)
2. improvement over the less fit parent
3. improvement over the fitter parent
Clearly option 3 is the most demanding and option 2 the least. Because both parents
are selected by linear rank, option 1 forms a kind of stochastic superset of options
2 and 3. Some exploratory experimentation found that no particular approach exhibited
any advantage over another; option 1 was therefore chosen as it is the simplest.
Both median and parent improvement versions of iADOPP were implemented and
feature in experiments discussed in Chapter 5.
We can now refine the original three GA types stated earlier to include both iADOPP
implementations, resulting in four GA types:
1. Non-adaptive (or normal)
2. Population level adaptive
3. Individual level adaptive (using median improvement)
4. Individual level adaptive (using parent improvement)
In addition to the main GA components discussed, it is also worthwhile to provide
an overview of the genetic operators utilised. For the binary encoded problems these
operators are very straightforward, while the permutation encoding operators are more
complicated since they essentially encode some domain knowledge.
3.5 Genetic Operators
3.5.1 Binary Encoding Operators
The following operators are used for all test problems apart from the travelling salesman problems. The test problems are discussed in detail in Chapter 4.
Uniform Crossover
This is a common crossover technique [21] for binary encoded strings. Each allele
is selected stochastically from one of the parents to form a single offspring. For
example, the two parents 11111111 and 00000000 may form the following offspring:
11011010. Usually, each parent is chosen with equal probability for each allele, and
this approach is taken here.
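A minimal sketch of uniform crossover as described (the class name and genome representation are illustrative only):

```java
import java.util.Random;

/** Illustrative sketch of uniform crossover on binary strings. */
public class UniformCrossover {
    /** Each gene is taken from either parent with equal probability. */
    public static boolean[] cross(boolean[] p1, boolean[] p2, Random rng) {
        boolean[] child = new boolean[p1.length];
        for (int i = 0; i < child.length; i++) {
            child[i] = rng.nextBoolean() ? p1[i] : p2[i]; // pick a parent per gene
        }
        return child;
    }
}
```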
Bit–flip Mutation
Bit–flip mutation was discussed in Chapter 1 and is worth reiterating here. Mutation is
invoked stochastically at a rate determined by its operator probability, P(Mu). Once
invoked, the mutation rate applies to each bit in the string and defines the likelihood
that any given bit will be inverted. Under this scheme when a bit is mutated it is
guaranteed to change.²

2. This is subtly different from Holland's originally proposed mutation [9]. Holland defined a mutation event as the random assignment of a value from the set {0, 1} to a gene (the work is based on binary strings); therefore, though mutation may be applied to a gene, a change in that gene is not guaranteed as a result. This scheme is also applied stochastically with a small probability.
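A minimal sketch of the per-bit part of this scheme, once the operator has been invoked (illustrative only):

```java
import java.util.Random;

/** Illustrative sketch of bit-flip mutation: each bit inverts with probability rate. */
public class BitFlip {
    public static boolean[] mutate(boolean[] genome, double rate, Random rng) {
        boolean[] child = genome.clone();
        for (int i = 0; i < child.length; i++) {
            if (rng.nextDouble() < rate) {
                child[i] = !child[i]; // a mutated bit is guaranteed to change
            }
        }
        return child;
    }
}
```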
3.5.2 Permutation Encoding Operators
These operators are used only for the travelling salesman problems, which use permutations of integers to represent non–cyclical tours; the problems themselves are discussed in Chapter 4.
Very Greedy Crossover
This operator is described in [12]. The basis of very greedy crossover (VGX) is that the
shortest edges are heavily favoured when constructing a new tour from two parents.
The algorithm is as follows: a starting city is selected at random and the four parental
edges connected to that city are determined (call this the edge list). If there is a shared
edge in the edge list which is to an unvisited city, then that edge is appended to the tour.
If there are no shared edges then the shortest non–cyclical parental edge is appended.
If it happens that all parental edges cause cycles, then the shortest edge to a previously
unvisited city is selected. We can see therefore that the respect shown to shared edges
is the only manner in which VGX does not behave greedily [12].
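The following sketch is one interpretation of VGX; the tie-breaking behaviour and the way shared edges are detected are assumptions here, and [12] should be consulted for the definitive algorithm.

```java
import java.util.*;

/** Illustrative interpretation of very greedy crossover (VGX) for symmetric TSPs. */
public class Vgx {
    /** Neighbour sets of each city in a non-cyclical tour (no wrap-around edge). */
    static List<Set<Integer>> neighbours(int[] tour) {
        List<Set<Integer>> nb = new ArrayList<>();
        for (int i = 0; i < tour.length; i++) nb.add(new HashSet<>());
        for (int i = 0; i + 1 < tour.length; i++) {
            nb.get(tour[i]).add(tour[i + 1]);
            nb.get(tour[i + 1]).add(tour[i]);
        }
        return nb;
    }

    static int[] cross(int[] p1, int[] p2, double[][] dist, Random rng) {
        int n = p1.length;
        List<Set<Integer>> nb1 = neighbours(p1), nb2 = neighbours(p2);
        boolean[] visited = new boolean[n];
        int[] child = new int[n];
        int cur = rng.nextInt(n);
        child[0] = cur; visited[cur] = true;
        for (int k = 1; k < n; k++) {
            int next = -1;
            // 1. an edge shared by both parents to an unvisited city is always respected
            for (int c : nb1.get(cur))
                if (!visited[c] && nb2.get(cur).contains(c)) next = c;
            // 2. otherwise the shortest parental edge to an unvisited city
            if (next == -1) {
                Set<Integer> cand = new HashSet<>(nb1.get(cur));
                cand.addAll(nb2.get(cur));
                for (int c : cand)
                    if (!visited[c] && (next == -1 || dist[cur][c] < dist[cur][next])) next = c;
            }
            // 3. otherwise the shortest edge to any previously unvisited city
            if (next == -1)
                for (int c = 0; c < n; c++)
                    if (!visited[c] && (next == -1 || dist[cur][c] < dist[cur][next])) next = c;
            child[k] = next; visited[next] = true; cur = next;
        }
        return child;
    }
}
```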
Edge-swap Mutation
For permutation encoded problems, i.e. TSPs, mutation is defined as the reversal of the
cities between two randomly chosen points in a tour. This has the effect of changing
two edges in a tour. Although several cities may be ‘disrupted’ in terms of where they
appear in the permutation, since going from city A to city B is equivalent to going
from city B to city A in a symmetric TSP (which all TSPs in this work are), it is only
the two altered edges (those at the start and end of the reversed section) that actually
affect the length of the tour. Figure 3.2 illustrates the operation for an 8 city problem.
The shaded section of the parent (top) indicates the portion of the tour that has been
randomly selected for reversal.
Figure 3.2: Edge-swap Mutation
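The reversal operation itself can be sketched as follows (illustrative only; the cut points here are drawn uniformly, which is an assumption):

```java
import java.util.Random;

/** Illustrative sketch of edge-swap mutation: reverse a randomly chosen segment. */
public class EdgeSwap {
    public static int[] mutate(int[] tour, Random rng) {
        int i = rng.nextInt(tour.length), j = rng.nextInt(tour.length);
        int lo = Math.min(i, j), hi = Math.max(i, j);
        int[] child = tour.clone();
        // Only the two boundary edges of the reversed section change the tour length
        while (lo < hi) {
            int tmp = child[lo]; child[lo] = child[hi]; child[hi] = tmp;
            lo++; hi--;
        }
        return child;
    }
}
```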
3.6 Summary
This section has described the implementation effort undertaken in order to produce a
system via which the aims of the project can be pursued.
The primary system design is that proposed in [13], which provides a learning
rule adaptation mechanism which is used to modify operator probabilities during the
execution of a GA at the population level. The adaptive mechanism works with a
steady state GA and a population that does not permit duplicate chromosomes, i.e.
newly created instances of an already existing individual cannot be inserted into the
population.
The underlying steady state GA was utilised as a means to provide a non–adaptive
counterpart by removing those aspects of the system relating to operator probability
adaptation. By including an appropriate normal GA implementation for comparative
purposes, valid conclusions can be drawn regarding the effects and benefits (if any) of
adaptation.
The population level adaptive system was extended to allow finer grained modification of operator probabilities. This enabled an investigation of the differences (if any)
between population and individual level operator probability adaptation to be carried out. This
type of comparison has not previously been conducted for a learning rule adaptive
system, across the adaptation levels considered here.
Chapter 4
The Test Problems
4.1 Overview
Test problem selection is as important a part of any empirical investigation as the
choice of GA implementation and any parameter settings. Care should be taken to
try to select problems that will hopefully prove illuminating for the investigation at
hand. Using well studied problems can sometimes provide a useful means of comparison with previous experimental results. More important, though, is selecting
problems that address the assumptions behind any research. Usually, a broad range
of problems is useful as it can serve to highlight particular strengths (and weaknesses)
of the algorithms under scrutiny.
Here, the two test problems of the original work feature, namely binary f6 [4]
and a 30 city travelling salesman problem (TSP) [10], along with several others. The
complete set of test problems is as follows:
Binary f6
30 city TSP
100 city TSP
MaxOnes
De Jong Test Functions
Each test problem will now be discussed in some more detail and the reasons for
their inclusion given.
The TSP problems use the permutation encoding operators while all other problems
use uniform crossover and bit-flip mutation, as discussed in Chapter 3.
In all the binary coded problems (all except the TSPs), the uniform crossover probability, i.e. the likelihood of selecting the current bit value from the 2nd parent, is 0.5.
The mutation rates are all based on the standard rate of 1/l, where l is the bit string
length of the individuals. For binary f6, the mutation rate chosen is 0.05, the same as
the value reported in [13].
Details of all static parameter settings, chromosome string lengths etc. are given in
Appendix A.
4.2 Binary f6
f6(x1, x2) = 0.5 − (sin²(√(x1² + x2²)) − 0.5) / (1.0 + 0.001(x1² + x2²))²

−100.0 ≤ xi ≤ 100.0

max f6 = f6(0, 0) = 1

[Plot: binary f6 with x2 held at its optimal value of 0 and x1 varied across the entire input range.]
The above visualisation of binary f6 is similar to that featured in [4]. x2 is held at
the optimal value of 0 and x1 is then varied across the entire input range. x2 behaves in a
symmetric manner to this, therefore swapping the roles of the variables would produce
the same result. Binary f6 is a very complex landscape with many concentrated local
maxima and only one global maximum. This problem is one of two attempted in the
original work [13].
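A sketch of the binary f6 objective, in the common maximisation form of Schaffer's f6 with maximum f6(0, 0) = 1 (the class name is illustrative):

```java
/** Illustrative sketch of binary f6 (Schaffer's f6, maximisation form). */
public class BinaryF6 {
    static double f6(double x1, double x2) {
        double r2 = x1 * x1 + x2 * x2;
        double s = Math.sin(Math.sqrt(r2));
        double d = 1.0 + 0.001 * r2;
        return 0.5 - (s * s - 0.5) / (d * d); // global maximum of 1 at the origin
    }
}
```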
Figure 4.1: Map of 30 city TSP
4.3 30 City TSP
This is the second problem featured in [13]. A display of the city layout is shown in
Figure 4.1. While this is not as telling as an impression of the actual fitness landscape
of the problem, it at least gives an indication of the general structure and a means of
comparison with other TSPs. The minimum tour length of this TSP instance is 420.
We can see that the distribution of cities is relatively even. There is a little clustering
present, but nothing extreme.
4.4 100 City TSP
Figure 4.2: Map of 100 city TSP
This problem¹ provides an expansion upon the original set, while keeping the

¹Data for the 100 city TSP was obtained from TSPLIB: http://www.iwr.uni-
problem type familiar. From Figure 4.2 we can see that this problem is also evenly
distributed, with no specific clusters or structure in the layout. The minimum tour
length is 21282.
4.5 MaxOnes
MaxOnes is a simple maximisation problem where the maximum fitness is achieved
by having a string of all ‘1’s. The fitness for any individual is simply the number of
‘1’s present in the string. A string length of 100 bits is used, yielding a mutation rate
of 0.01. The population size was taken as 100. This problem is included to see if adaptation
can exploit a simple problem, or if adaptation is actually detrimental to performance
in this situation.
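A sketch of the MaxOnes fitness function as described (illustrative class name):

```java
/** Illustrative sketch: MaxOnes fitness is the number of '1' (true) bits present. */
public class MaxOnesFitness {
    static int fitness(boolean[] s) {
        int ones = 0;
        for (boolean b : s) if (b) ones++;
        return ones; // maximum fitness is the string length (all ones)
    }
}
```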
4.6 De Jong Functions
The De Jong test suite [11] is well known in the GA field. It presents a varied mixture
of landscape types in isolation, and can provide useful information on the relative
strengths and weaknesses of an optimiser. The suite is composed of five problems,
each discussed in turn.
Note that for the visualisation of functions provided, the fitness landscapes are
shown as maximisation problems. In fact, they are implemented as minimisation problems (the usual approach, and that of [11]). However, the visualisation of the problems
is clearer if presented from the maximisation standpoint. In reality it makes no difference whether the functions are treated as max/minimisation problems, as both forms
are equivalent.
It was found that some functions required more or fewer evaluations than the typical 5000
and the stated values were derived via some extended preliminary runs. The population
size was taken as 100 for all functions (details in Appendix A).
heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA100.tsp.gz
f1 - Sphere Model
f1(x) = Σ_{i=1}^{3} xi²

−5.12 ≤ xi ≤ 5.12

min f1 = f1(0, 0, 0) = 0
F1 is the simplest of the De Jong test problems; it is smooth and unimodal, and
should not present any problems for any capable optimiser. Similarly to MaxOnes, this
problem should test whether adaptation can bring any performance benefit for such a
simple landscape. Certainly, it is not expected that individual level adaptation would
provide any benefit for such a problem, given the lack of any localised sub–features
which may be exploited.
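A sketch of the sphere function (illustrative class name):

```java
/** Illustrative sketch of De Jong f1 (sphere): smooth, unimodal, minimum 0 at the origin. */
public class DeJongF1 {
    static double f1(double[] x) {
        double sum = 0.0;
        for (double xi : x) sum += xi * xi; // simple sum of squares
        return sum;
    }
}
```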
f2 - Rosenbrock’s Function
f2(x1, x2) = 100(x1² − x2)² + (1 − x1)²

−2.048 ≤ xi ≤ 2.048

min f2 = f2(1, 1) = 0
F2 is a complicated surface which features a ridge as the maximal feature of the
landscape. This ridge follows a parabolic trajectory and has a global optimum at just
one point on the ridge. This is a more promising candidate for individual level adaptation due to the variations and non-uniformity present compared with, for example,
f1.
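A sketch of Rosenbrock's function (illustrative class name):

```java
/** Illustrative sketch of De Jong f2 (Rosenbrock): parabolic ridge, minimum 0 at (1, 1). */
public class DeJongF2 {
    static double f2(double x1, double x2) {
        double a = x1 * x1 - x2;
        double b = 1.0 - x1;
        return 100.0 * a * a + b * b;
    }
}
```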
f3 - Step Function
This function is composed of several completely flat, incremental plateaus. In a sense,
it shares some commonality with MaxOnes, since the landscape of MaxOnes is essen-
f3(x) = 25 + Σ_{i=1}^{5} ⌊xi⌋

−5.12 ≤ xi ≤ 5.12

min f3 = f3([−5.12, −5], ..., [−5.12, −5]) = 0
tially composed of several ‘levels’, with only one minimum (the all zero string) and
only one maximum (the all one string) representing the extremes of the landscape. In
between these extremes it is quite possible to have several distinct individuals that have
the same fitness value.
Similarly, it is possible to produce new individuals in the f3 landscape that, although different, have the same fitness value as existing individuals. This raises the
possibility of creating children that do not yield any information upon which adaptation can progress. This landscape may therefore prove difficult for adaptation to exploit
effectively.
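A sketch of the step function; taking the integer part by truncation toward zero is an assumption here, chosen because it is consistent with the stated minimum of 0 for xi in [−5.12, −5]:

```java
/** Illustrative sketch of De Jong f3 (step): flat, incremental plateaus. */
public class DeJongF3 {
    static double f3(double[] x) {
        double sum = 25.0;
        for (double xi : x) sum += (int) xi; // integer part by truncation toward zero
        return sum;
    }
}
```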
f4 - Quartic Function with Noise
f4(x) = Σ_{i=1}^{30} i·xi⁴ + Gauss(0, 1)

−1.28 ≤ xi ≤ 1.28

min f4 = f4(0, ..., 0) = 0
This function features the addition of Gaussian(0,1) noise, and therefore the landscape example presented can be thought of as an indicative ‘sample’ of the surface,
which ultimately changes as the search progresses. The underlying surface is quite
simple, however.
This problem may shed some light on the different GA types’ capability in handling
noise.
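A sketch of the quartic function with noise; adding a single Gaussian(0, 1) sample per evaluation is the interpretation assumed here:

```java
import java.util.Random;

/** Illustrative sketch of De Jong f4 (quartic with Gaussian noise). */
public class DeJongF4 {
    static double f4(double[] x, Random rng) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += (i + 1) * Math.pow(x[i], 4); // i runs from 1 in the usual definition
        }
        return sum + rng.nextGaussian(); // noise is resampled on every evaluation
    }
}
```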
f5 - Shekel's Foxholes

1 / f5(x1, x2) = 1/500 + Σ_{j=1}^{25} 1 / (j + Σ_{i=1}^{2} (xi − aij)⁶)

where (a1j) cycles through (−32, −16, 0, 16, 32) and (a2j) steps through the same values, each repeated five times.

−65.536 ≤ xi ≤ 65.536

min f5 = f5(−32, −32) ≈ 1
F5 is an extreme landscape, featuring several sharp local optima (and one global
optimum). Again, we have large areas of this landscape from which no useful information can be gleaned due to the considerable flat areas. This may well cause problems
for the adaptive GAs, although there is also high non–uniformity present thereby combining desirable and detrimental features.
4.7 Summary
As discussed earlier, test problem selection is an important part of any empirical investigation of GAs.
The intuition underpinning the hypothesis is that individual level adaptation will be
able to exploit localised features of a fitness landscape that population level will be
unable to, due to population level adaptation being too coarse.
The test problems selected feature both unimodal and multimodal landscapes and
varying degrees of non-uniformity.
The diversity of the landscapes featured provides a reasonable basis with which to
test this intuition.
Chapter 5
Formative Experiments
5.1 Aims
Two primary aims are pursued in this formative experimentation:
1. Investigate the sensitivity in performance of the GA to variation in relevant parameters.
2. Identify the best parameter settings for a given GA type, such that these settings
may be used for more in–depth experimentation.
The first aim allows for an informal comparison to be made between GA types
regarding the impact of parameter changes on performance.
The second aim addresses ‘fair tuning’, which is often lacking in other comparative
work. Basically, when comparing two or more systems an approximately equal amount
of effort should be expended on parameter exploration for each system. By doing
so, confidence is gained that any comparisons being made relate to the systems’ best
possible performance.
5.2 Methodology
For every configuration, i.e. combination of test problem and GA type, a number of
different parameter values will be exercised. For the normal GA, the only parameter
under investigation is the crossover probability, P(Cr), and by implication the mutation
probability, P(Mu). For the adaptive GAs the three ADOPP parameters of depth, decay
and qlen will be explored.
Each distinct setting of parameter(s) is known as a treatment, and several are run for
every configuration. All treatment results for a particular problem are shown together
on the same graph, providing a form of visualisation of the GA's robustness; the less
variation that exists between the treatments' performances, the more robust the GA is to
parameter changes.
Sections 5.2.1 and 5.2.2 discuss the values which feature in each treatment for the
normal GA and adaptive GA experiments, respectively.
5.2.1 Normal GA Parameter Treatments
The ADOPP system bounds the operator probabilities within the range of 5% to 95%.
It is therefore fair to consider the normal GA somewhere within the same range.
The values of P(Cr) tested range from 10% to 90%, in increments of 5%, resulting
in a set of 17 treatments, each of which was run 10 times and the average performance
reported. In total this resulted in 170 GA runs for tuning and investigation purposes.
All graphs detailing normal GA performance display the results for all 17 treatments
simultaneously, which helps to show both the sensitivity of performance to the probability settings and also whether or not there is a clearly favoured static setting.
5.2.2 Adaptive Parameter Treatments
In [13] the following values were reported for each parameter:
depth [1, 5, 8]
decay [0.5, 0.8, 0.95]
qlen [10, 100, 500]
A finer grained approach is taken here, incorporating the following ranges:
depth [1, 3, 5, 7, 10, 15¹]
decay [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
qlen [10, 50, 100, 200, 350, 500]
The parameters are varied independently, requiring a ‘locked’ value for each parameter (depth = 5, decay = 0.8 and qlen = 100) which is assumed whenever that particular parameter is not being varied. An exhaustive search of all 6 × 6 × 6 = 216 combinations is
not feasible. These locked values are the same as those used in the original work.
This range set provides 16 combinations, each of which was run 10 times and the
average performance reported. The resulting computational effort is 160 runs for each
problem, comparable with the 170 expended for the normal GA instances.
As for the normal GA results, the performances of all 16 treatments are shown simultaneously on graphs displaying adaptive GA results, which allows for easy comparison
between performances for each treatment.
5.3 Results
The following results provide an informal characterisation of the response of each configuration to parameter variation. Furthermore they enable the identification of suitable
parameter settings from which to conduct more in–depth experiments. It should be
noted however that no conclusive points are, or should be, drawn from these results.
5.3.1 Binary F6
There is not a great deal of difference apparent in the performance for binary f6 across
the four GA types. The ‘spread’ in final solution values is very similar for each GA, although the individual level adaptive GA with parent improvement has the most focused
values (i.e. the least spread).
1. For the 100 city TSP, it was found that a depth of 15 caused a Java out-of-memory error. This was due to the large population size of 500 required for this problem. Therefore this depth value was not included in runs pertaining to the 100 city TSP.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots solution fitness against evaluations.]
Figure 5.1: Performance for Binary F6 - All GA Types
Generally, all GA types behaved in a consistent and fairly robust manner under variations in parameter values. The best solution, of 0.995141, was achieved by the individual level adaptive GA with parent improvement. The settings used to obtain this were
depth = 5, decay = 0.8 and qlen = 500.
5.3.2 30 City TSP
Figure 5.2 illustrates clearly that the ADOPP GAs (operating at both the population and
individual level) are considerably more robust to parameter changes than the normal
GA, for this problem. As can be seen, the normal GA often fails to find the minimum
tour length of 420. Indeed, the only setting which achieved this for all 10 runs was
P(Cr) = 65%.
Of all the adaptive treatments, only one failed to achieve (on average) the minimum
tour length. This occurred for the individual level adaptive GA with parent improvement and the settings responsible were depth = 5, decay = 0.8 and qlen = 50.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots tour length against evaluations.]
Figure 5.2: Performance for 30 city TSP - All GA Types
5.3.3 100 City TSP
For the 100 city TSP we have very similar results to those obtained for the 30 city TSP,
as shown in Figure 5.3, with a marked decrease in parameter variation sensitivity for
the adaptive GAs. The minimum tour length of 21282 was not achieved by any normal
GA treatment. The best tour achieved by the normal GA was 21359.3, with a P(Cr) of
40%.
Whilst there were several instances of the best tour being achieved with certain
adaptive treatments, none obtained the minimum tour on all 10 runs. Even so, these
results strongly suggest that performance is improved by the inclusion of adaptation.
The best average tour length found was 21336.7, which was obtained by the individual level adaptive GA with parent improvement. The parameter settings were depth
= 5, decay = 0.8 and qlen = 100.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots tour length against evaluations.]
Figure 5.3: Performance for 100 city TSP - All GA Types
5.3.4 MaxOnes
Figure 5.4 shows yet another example of the improved robustness of the adaptive approaches. While the maximum fitness of 100 was eventually achieved by every treatment of each GA type, clearly the adaptive GAs’ performances are more consistent
and reliable.
Since nothing can be said regarding final solution quality for this problem, the
speed of convergence will be considered instead.
For the normal GA, a P(Cr) of 55% was found to offer the best average convergence time to the optimum solution, which was found after 1950 evaluations.
It was found that the best average convergence times of the adaptive GAs were
actually slightly slower than that of the (optimally tuned) normal GA. The
adaptive GAs required between 2025 and 2100 evaluations before reaching the
optimum, suggesting that the much improved robustness may come at the cost
of slightly slower convergence to the optimum solution.
[Four panels: Normal GA; Population Level Adaptation; Individual Level Adaptation (Median Improvement); Individual Level Adaptation (Parent Improvement). Each plots solution fitness against evaluations.]
Figure 5.4: Performance for MaxOnes - All GA Types
5.3.5 De Jong f1
For De Jong f1, no striking differences in performance are evident from Figure 5.5.
This is a very straightforward landscape, so the result is not too surprising.
The best result of 9 × 10⁻⁵ was obtained by the population level adaptive GA; the settings used were depth = 5, decay = 0.8 and qlen = 100.
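De Jong f1 is the sphere function, a sum of squared variables; a sketch under the standard formulation (three variables, each in [−5.12, 5.12], with the minimum of 0 at the origin):

```python
def dejong_f1(x):
    # Sphere function: f1(x) = sum of x_i^2, with each x_i in [-5.12, 5.12].
    # The global minimum is 0 at the origin, so lower values are better.
    return sum(xi * xi for xi in x)
```

The bowl-shaped, unimodal surface is what makes this landscape straightforward for every GA type considered here.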
5.3.6 De Jong f2
Again, there are no obvious differences between the GA performances for De Jong f2,
as shown in Figure 5.6.
However, for the individual level adaptive GA with parent improvement, there is
clearly a treatment that is not performing as well as the others. The settings responsible
are depth = 5, decay = 0.8 and qlen = 10 and the minimum achieved by this setting
is only 0.0354. It is possible that the shorter queue length does not provide sufficient information for effective operator probabilities to be derived, thus impacting
on performance.

Figure 5.5: Performance for De Jong f1 - All GA Types
The best solution overall, 7.063 × 10⁻⁴, was obtained by the normal GA with a P(Cr) of 65%.
5.3.7 De Jong f3
Figure 5.7 shows a more convincing difference between the approaches. As was the
case for the TSP problems and MaxOnes, the adaptive GAs applied to De Jong f3
appear to offer increased robustness to parameter variance, in comparison to the normal
GA.
There were several instances of parameter settings successfully achieving the minimum result of 0 across all the GA types. However, for the normal and population level
adaptive approaches, there were a number of settings which on average did not obtain
the minimum.
The individual level adaptive GAs each featured only one setting which failed to
achieve the minimum on all ten runs. For the median improvement version the settings
Figure 5.6: Performance for De Jong f2 - All GA Types
of depth = 5, decay = 0.8 and qlen = 10 obtained an average fitness of 0.1. The parent
improvement version achieved an average fitness, also of only 0.1, with settings of
depth = 7, decay = 0.8 and qlen = 100.
The best results overall in terms of speed to solution was obtained by individual
level adaptation (median improvement) with settings of depth = 5, decay = 0.4 and
qlen = 100. With these settings the optimum was found after at most 1550 evaluations.
5.3.8 De Jong f4
No strong differences are evident between the GA types for De Jong f4 (Figure 5.8).
Due to the fact that noise is added to the ‘pure’ value returned by the quartic function,
in practical terms it is not possible to obtain the minimum of 0.
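A sketch of the standard f4 formulation makes the point: the Gaussian(0, 1) term means even a perfect candidate rarely evaluates to 0 (the `gauss` parameter is added here for testability and is not part of the original definition):

```python
import random

def dejong_f4(x, gauss=random.gauss):
    # Quartic function with noise: f4(x) = sum of i * x_i^4, plus N(0, 1) noise.
    pure = sum((i + 1) * xi ** 4 for i, xi in enumerate(x))
    return pure + gauss(0.0, 1.0)
```

Since the noise is resampled on every evaluation, the fitness of even the all-zero candidate fluctuates around 0 rather than attaining it.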
The best average solution fitness found was 2.392 and this was obtained by the
individual level adaptive GA with median improvement. The settings were depth = 5,
decay = 0.8 and qlen = 100.
Figure 5.7: Performance for De Jong f3 - All GA Types
5.3.9 De Jong f5
Performance results for De Jong f5 are again all fairly similar (Figure 5.9). If anything, the normal GA results seem to be more consistent than those of the adaptive GAs, and for this problem the normal GA did produce the best average performance: with a P(Cr) of 25%, a solution fitness of 1.019 was obtained.
We can see that one treatment in particular has performed relatively poorly for the
population level adaptive GA. The relevant settings were depth = 3, decay = 0.8 and
qlen = 100 and a solution fitness of only 2.119 was achieved.
5.4 Revisiting Median/Parent Improvement
Regarding the two versions of individual level adaptation considered thus far, it was noted that the median improvement criterion obtained a higher final solution fitness than the parent improvement criterion for only two of the test problems.
These were De Jong f2 and f4. However, performing a t–test on 50 runs of each
Figure 5.8: Performance for De Jong f4 - All GA Types
problem instance, to determine whether there was a significant difference between the median and parent improvement criteria, resulted in p–values of 0.78 for f2 and 0.70 for f4. Details of the t–test procedure used and interpretation of the p–value are discussed in section 6.2.1.
It is clear that the difference is not significant; therefore, individual level adaptation using median improvement is no longer considered in the remaining experiments. Henceforth, the term individual level adaptation can be taken to mean individual level adaptation with the parent improvement criterion.
5.5 Tuned Parameter Values
Table 5.1 summarises the optimal parameter settings found for each problem. The
entry relating to normal GAs specifies P(Cr), while the entries relating to the adaptive
GAs are of the form depth / decay / qlen.
56
Chapter 5. Formative Experiments
Problem        GA Type      Best Settings
Binary f6      Normal       50
               Pop Adapt    5 / 0.8 / 500
               Ind Adapt    3 / 0.8 / 100
30 city TSP    Normal       65
               Pop Adapt    5 / 0.8 / 50
               Ind Adapt    5 / 0.8 / 100
100 city TSP   Normal       40
               Pop Adapt    5 / 0.8 / 100
               Ind Adapt    5 / 0.8 / 100
MaxOnes        Normal       55
               Pop Adapt    5 / 0.6 / 100
               Ind Adapt    7 / 0.8 / 100
DeJong f1      Normal       75
               Pop Adapt    5 / 0.8 / 200
               Ind Adapt    5 / 0.8 / 50
DeJong f2      Normal       65
               Pop Adapt    5 / 0.2 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f3      Normal       65
               Pop Adapt    5 / 0.4 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f4      Normal       55
               Pop Adapt    15 / 0.8 / 100
               Ind Adapt    5 / 0.2 / 100
DeJong f5      Normal       25
               Pop Adapt    5 / 0.2 / 100
               Ind Adapt    5 / 0.8 / 10

Table 5.1: Tuned Parameter Settings
Figure 5.9: Performance for De Jong f5 - All GA Types
5.6 Summary
From Table 5.1, we can see that there is no particularly ‘dominant’ parameter setting. The P(Cr) values vary considerably, between 25% and 75%. This is entirely expected, given the diverse nature of the problems under consideration.
Generally the preferred ADOPP parameters tend to be ‘moderate’ in nature, with
the extremes of the parameter ranges tending not to feature. There are some exceptions
though, such as the large depth preferred by f4 and long qlen of binary f6. Also, f2 and
f3 show lower decay values across both adaptive GAs.
There are several instances of increased robustness to parameter changes for adaptive GAs, most notably for TSP–30, TSP–100, MaxOnes and De Jong f3.
It was noted that the results obtained were identical for a depth of 1 and for a decay of 0.0, with the other parameters at the locked values. Julstrom noted the equivalence implied by these parameter values in [13]; both result in only the immediate operator being credited. (Results are identical so long as the same random number generator seeding is used; otherwise results would be very similar.)
Chapter 6
Summative Experiments
6.1 Aims
The primary aim of these summative experiments is to enable the acceptance or rejection of the hypothesis of the project:
Using the same adaptive mechanism, individual level adaptation will perform as well as or better than population level adaptation.
Although we may show that individual level performance is equivalent to population level performance, it is hoped that individual level adaptation will offer a gain in
performance. Since the overheads involved in terms of book–keeping are higher for individual adaptation, some improvement is desired in order to offer grounds for justifying this additional overhead.
A complementary aim of these experiments is to investigate the nature of operator
probability for each test problem.
By considering significant differences in performance alongside the nature of adaptation observed, it is hoped that any relationships present across the orthogonal dimensions of adaptation level and problem type will become apparent.
6.2 Methodology
In order to test the hypothesis we must define a performance gain, which can be manifested in two ways:
1. Speed improvement: The individual level adaptive GA attains a solution of a
given quality in fewer evaluations than the population level adaptive GA.
2. Quality improvement: The individual level adaptive GA attains a converged solution of higher fitness than the population level adaptive GA.
Of course, it may be the case that, for example, a gain in speed is obtained to the
detriment of final solution quality. In this case it cannot be convincingly stated that
one method is truly ‘better’ than the other and therefore such a scenario would not be
considered a performance gain. More accurately it is a performance trade–off.
Therefore, in order to class individual level performance as improved, we must
show one of three scenarios to be true:
1. Individual level adaptation performs better in both speed and quality.
2. Individual level adaptation performs better in terms of speed without being outperformed in quality.
3. Individual level adaptation performs better in terms of quality without being outperformed in speed.
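These three scenarios can be stated compactly as a decision rule; a sketch where +1 means individual level adaptation is significantly better on that measure, −1 significantly worse, and 0 no significant difference (the function name is illustrative):

```python
def is_performance_gain(speed_cmp, quality_cmp):
    # Scenarios 1 and 2: a speed win without being outperformed on quality.
    if speed_cmp > 0 and quality_cmp >= 0:
        return True
    # Scenario 3: a quality win without being outperformed on speed.
    if quality_cmp > 0 and speed_cmp >= 0:
        return True
    # Anything else is either no gain or a speed/quality trade-off.
    return False
```

Note that a win on one measure combined with a loss on the other returns False, matching the trade-off case described above.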
In order to determine whether or not a perceived difference represents a significant
gain or otherwise in performance, t–tests were carried out across the final converged
solution fitnesses, for all test problems. Additionally, where there appeared to be a
potential speed gain during a run, then a t–test was performed at this point. Specific
details of the t–tests are discussed in section 6.2.1.
Using the parameter values in Table 5.1, 50 runs of each tuned GA type were
executed, for all problems. In performing pairwise comparisons between the GA types
the population level GA was taken as the basis of comparison. Therefore, the following
two pairwise comparisons are always made for each test problem:
1. normal GA vs population level adaptive GA
2. population level adaptive GA vs individual level adaptive GA
The first comparison provides a rigorous test of the normal GA against the population level ADOPP system – something that did not feature in the original work.
However, of most interest are the comparisons between population and individual
level adaptive GAs, as this is the central concern of the hypothesis under test.
In addition to scrutinising performance, the nature of operator probability adaptation will be considered for each problem. The hypothesis essentially makes the assumption that individual level adaptation will be able to exploit local features of the
fitness landscape in order to improve performance. It would therefore be expected that
different trajectories in probability will develop between population and individual
level adaptation.
For population level adaptation there is only one ‘set’ of operator probabilities: the global P(Cr) and P(Mu). For individual level adaptation, however, there are as many sets of probabilities as there are individuals in the population. Obviously the probabilities belonging to the best individuals over
the course of the run are the most directly responsible for GA performance, but we are
also interested in the adaptation occurring elsewhere in the population. For this reason
it was decided to track the best, median and worst individuals’ operator probabilities so
that any variations in individuals’ operator probability adaptation would be apparent.
6.2.1 T–test Details
In order to compare results in a principled manner, t–tests are used in order to determine whether any apparent differences are actually significant.
The confidence level assumed is 95%, such that we will tolerate a 5% chance of making a type 1 error: rejecting the hypothesis under test when it is in fact true. In this case the hypothesis under test is the null hypothesis, which will be rejected in favour of the alternative hypothesis (the principal hypothesis of the project) when the p–value obtained is less than or equal to 0.05.
Each t–test¹ results in a p–value, which translates into the probability of making a type 1 error. In this context the p–value is the probability of obtaining the observed data (here, a sample of 50 fitness values taken at the same point in each run) purely by chance. That is, if we assume that the difference between the systems under test has no effect on the produced results (the null hypothesis), the p–value is the likelihood of obtaining these results anyway. Therefore, if a p–value is sufficiently small (less than 0.05), the observed ‘difference’ is very unlikely to be down to simple chance, and is far more likely to be attributable to the differences which exist between the systems.
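The unequal–variance (Welch) form of the two–sample t statistic used here can be sketched from first principles; this reconstructs the statistic and degrees of freedom that a spreadsheet TTEST() computes internally (an illustrative reconstruction, not the project's actual tooling):

```python
import math

def welch_t(a, b):
    # Welch's two-sample t statistic and its (fractional) degrees of freedom,
    # for samples with possibly unequal variances.
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    se2 = var_a / na + var_b / nb  # squared standard error of the difference
    t = (mean_a - mean_b) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = se2 ** 2 / ((var_a / na) ** 2 / (na - 1) + (var_b / nb) ** 2 / (nb - 1))
    return t, df
```

The two–tailed p–value is then obtained from the t distribution with df degrees of freedom.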
For each t–test performed the Bonferroni corrected p-value is also quoted, for completeness, although the hypothesis will be accepted or otherwise based on the uncorrected p–values in this report.
The Bonferroni correction addresses the ‘diluting’ effect of making a significance
claim based on multiple comparisons of the same system. Basically, each comparison
made between two systems increases the likelihood of making a type 1 error. For example, if two different GAs are compared at two points during a run (say half way through and at the end) and the resultant p-values for each comparison are both 0.05, then simultaneously asserting both comparisons increases the actual likelihood of making an error to 0.1, since either the first or the second comparison may have led to an incorrect rejection of the null hypothesis.
Therefore to derive the corrected p–value, the original p–value is multiplied by the
number of pairwise comparisons made [24]. If the result happens to be more than 1.0,
then the corrected p–value is simply set to 1.0.
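The correction amounts to a one–line transformation; a sketch:

```python
def bonferroni(p_values):
    # Multiply each p-value by the number of comparisons made,
    # capping the corrected value at 1.0.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```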
6.3 Results
In each table detailing the resultant p–values, where any value is found to be significant (less than or equal to 0.05), both the value and the GA type that exhibited the advantage are shown in bold.
¹ T–tests were performed using the OpenOffice TTEST() function, in its two-tailed, unequal-variance form.
The graphs in the following sections detail the average of the best-of-run solution
fitnesses, over the sample of 50 runs. The operator probability adaptation graphs are
based on average P(Cr) and P(Mu) values over the same 50 runs.
6.3.1 Binary F6 Results
Figure 6.1: Comparative performance for Binary f6
Figure 6.2: Operator probability adaptation for Binary f6
There were no significant differences for any of the comparisons made, as shown by
table 6.1. All GA types appear to offer competitive performance with average solution
fitnesses of greater than 0.99.
For population level adaptation (Figure 6.2), the operator probability adaptation
obtained closely matches that reported in [13] for the same parameter settings of depth
= 5, decay = 0.8, qlen = 500.
comparison   evaluation   p-value   p-value (corrected)
norm/pop     5000         0.53      1.0
ind/pop      1750         0.30      0.90
ind/pop      5000         0.57      1.0

Table 6.1: p–values for Binary f6
The static operator probability period prior to adaptation commencing is visible
due to the large queue size. Once adaptation commences, mutation is favoured until
around 1500 evaluations, after which crossover begins to dominate. The final crossover
probability adapted (around 70%) is rather higher than the preferred static value of
50%. This difference is not that surprising though, given the robust nature of binary
F6 to parameter changes, as shown in Figure 5.1.
Individual level adaptation seems to be following the same basic trajectory, though mutation does not become dominant in the early part of the run. Rather, both probabilities stay at approximately 50% until around 1750 evaluations, after which crossover again becomes the dominant operator. The final P(Cr) value for individual level adaptation, around 60%, is lower than that of population level.
The correlation between the worst, median and best probabilities is very apparent.
It appears that there may not be suitable local features in the landscape for individual
adaptation to exploit, or individual adaptation is failing to exploit them if they are
present.
6.3.2 30 city TSP
For the 30 city TSP (TSP–30), again, there are no significant differences in converged
solution quality. There is a significant speed gain for individual level adaptation, which
means that the hypothesis is accepted for this problem and individual level adaptation
is outperforming population level adaptation. However, the advantage is present very
early on in the run (1000 evaluations out of a total of 15000), so in practical terms this
is probably not that useful.
The probabilities adapted at the population level are fairly similar in nature to those
Figure 6.3: Comparative performance for 30 city TSP
Figure 6.4: Operator probability adaptation for 30 city TSP
reported in [13]. Again the adapted P(Cr) is close to the favoured static setting of 65%,
though slightly lower.
Once again we have a strong correlation between all the individual level probabilities, which also seem to match closely with the population level probabilities prior to
6000 evaluations, in any case.
After around 6000 evaluations it appears that adaptation is breaking down somewhat. This coincides with the performance graph for the individual level adaptive GA
at the point at which the optimisation becomes a lot less aggressive. Since by this
point we have obtained a competitive tour length, it becomes harder for the GA to improve upon this. We therefore have a higher proportion of offspring being created, in
comparison with earlier in the run, which are not acceptable. Since the invocation of
the creating operator is still logged, along with zero credit, this has the net result that
the operator probabilities are being derived from decreasing operator credit scores and thus become erratic.

comparison   evaluation   p-value   p-value (corrected)
norm/pop     6000         0.42      1.0
norm/pop     15000        0.32      1.0
ind/pop      1000         0.02      0.08
ind/pop      15000        0.32      1.0

Table 6.2: p–values for 30 city TSP
This idea is supported by the fact that the best individual suffers most, as it has the
highest demand in terms of offspring quality. Also, with a longer operator queue the
effect is markedly decreased, since there is a greater ‘store’ of credit in a larger queue,
which acts to dampen the effect.
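The queue behaviour described above can be illustrated with a minimal sketch (an illustrative reconstruction of the idea, not the project's ADOPP implementation; class and method names are invented):

```python
from collections import deque

class OperatorQueue:
    """Fixed-length log of (operator, credit) records; operator probabilities
    are made proportional to the credit accumulated in the queue."""

    def __init__(self, qlen, operators=("Cr", "Mu")):
        self.q = deque(maxlen=qlen)  # oldest records are evicted automatically
        self.operators = operators

    def log(self, op, credit):
        # Every invocation is logged, even with zero credit, so a run of
        # unsuccessful offspring dilutes that operator's share of the queue.
        self.q.append((op, credit))

    def probabilities(self):
        totals = {op: 0.0 for op in self.operators}
        for op, credit in self.q:
            totals[op] += credit
        grand = sum(totals.values())
        if grand == 0.0:
            # No credit anywhere: fall back to uniform probabilities.
            return {op: 1.0 / len(self.operators) for op in self.operators}
        return {op: totals[op] / grand for op in self.operators}
```

With a longer queue, earlier credit persists longer before being evicted, which is the damping effect observed above.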
6.3.3 100 city TSP
Figure 6.5: Comparative performance for 100 city TSP
With TSP–100, there is an improvement in both speed to solution and final solution
quality with population level adaptation compared with the normal GA. This seems to
be an instance of adaptation working sufficiently well that non-adaptive operation is
bettered. There was no significant difference observed between population and individual level adaptive GA performance.
The population level operator probability adaptation for TSP–100 (Figure 6.6) is
Figure 6.6: Operator probability adaptation for 100 city TSP
comparison   evaluation   p-value   p-value (corrected)
norm/pop     7500         < 0.01    < 0.01
norm/pop     50000        < 0.01    < 0.01
ind/pop      50000        0.93      1.0

Table 6.3: p–values for 100 city TSP
very similar to that obtained for TSP–30, settling on a value of around P(Cr) = 60%.
This does not strongly agree with the favoured static setting which was 40%.
We have similar behaviour for the individual level adaptation, although the effects
are not quite so pronounced. It is suspected that the greatly increased search space of
TSP–100 means that there is more room for improvement within a promising area of
the fitness landscape, resulting in more successful offspring (that represent improvements) and maintaining credit levels. Contrastingly, for TSP–30, the optimum is usually reached by around 8000 evaluations and so after this point the best individual’s
stored credit values, cred_Cr and cred_Mu, can only decrease. This results in very
erratic operator probabilities.
6.3.4 MaxOnes
It should be noted that since the maximum of 100 was always found for MaxOnes,
there was zero variance between the compared end–of–run samples (evaluation = 5000).
As a result the p–values are undefined, however, since the variance between samples is
Figure 6.7: Comparative performance for MaxOnes
Figure 6.8: Operator probability adaptation for MaxOnes
zero, this can in reality be interpreted as a certainty that there is no difference between the systems being compared. For this reason p–values of 1.0 are entered, thereby asserting that the null hypothesis is certainly true, which, for all practical purposes, it is.
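This special case can be guarded explicitly before invoking a t–test; a sketch (the helper and its name are illustrative):

```python
def safe_p_value(a, b, t_test):
    # A t-test is undefined when both samples have zero variance; if the
    # samples are also identical, report p = 1.0 (certainly no difference).
    if len(set(a)) == 1 and len(set(b)) == 1 and a[0] == b[0]:
        return 1.0
    return t_test(a, b)
```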
The MaxOnes problem was optimally solved by all the GA types, which is not
particularly surprising since it consists of a simple, unimodal landscape. There was an
advantage in terms of speed to solution with population level adaptation compared with
the normal GA. It appears that population level adaptation can successfully exploit this
simple landscape.
These results can be contrasted with those in [22], which considered a MaxOnes
problem (also of 100 bits) in a steady–state GA using an adaptive technique known
as COBRA (discussed in [22]). There are two main differences between ADOPP and
COBRA.
Firstly, COBRA updates operator probabilities periodically, after a certain number
comparison   evaluation   p-value   p-value (corrected)
norm/pop     1250         < 0.01    0.02
norm/pop     5000         1.0       1.0
ind/pop      1400         0.33      1.0
ind/pop      5000         1.0       1.0

Table 6.4: p–values for MaxOnes
of evaluations have occurred, as opposed to after the creation of each new individual.
Secondly, COBRA uses a set of fixed operator probabilities which are assigned to the operators at each ‘re-ranking’ interval, such that an operator that has made a very positive contribution to progress may be awarded a higher probability from the set.
It was found that COBRA offered an increase in robustness to parameters over a
normal GA, as we have also found from the tuning experiments. However, no actual
performance gains were seen for COBRA compared with a normal GA. It appears that
the finer grained updating mechanism of ADOPP may be suitably responsive for this
problem instance to provide a gain in speed.
The adapted P(Cr) for MaxOnes under population level adaptation closely matches the tuned value of 55%. The correlation between population and individual level adaptation is particularly evident for this problem.
6.3.5 De Jong f1
Figure 6.9: Comparative performance for De Jong f1
comparison   evaluation   p-value   p-value (corrected)
norm/pop     700          0.16      0.80
norm/pop     2500         0.47      1.0
ind/pop      200          0.09      0.45
ind/pop      700          0.012     0.06
ind/pop      2500         0.21      1.0

Table 6.5: p–values for De Jong f1
Figure 6.10: Operator probability adaptation for De Jong f1
De Jong f1 is another unimodal landscape, which should be straightforward to
optimise. There are no significant differences from the population level adaptive GA
(either for normal or individual level adaptive GA) and given the simple nature of the
fitness landscape this is not surprising. However, population level adaptation did offer
an improvement in speed performance compared with individual level adaptation. It
may be the case that individual level adaptation is an unnecessarily complex mechanism for such a simple function. This agrees with the previous intuition that f1 is too
simple for individual level adaptation to be able to offer any advantages.
There seems to be very little movement in the operator probability values for individual level adaptation when compared with population level adaptation (Figure 6.10).
Figure 6.11: Comparative performance for De Jong f2
Figure 6.12: Operator probability adaptation for De Jong f2
6.3.6 De Jong f2
For this problem, all approaches seemed as good as each other. This is quite surprising,
as it is a fairly complicated surface. It may simply be too complex for any meaningful
adaptation to take place, though it does appear that the presence of adaptation does not
have a detrimental effect either. Another possibility is that it is a ‘ceiling effect’; the
small dimensionality (2 variables) of the problem is proving easy to optimise for all
GA types. Interestingly there were no significant performance differences for binary
f6, which also features a dimensionality of only 2 variables.
6.3.7 De Jong f3
The only result of note here is the speed gain observed for the normal GA over the population level adaptive GA. The presence of so many plateaus in the landscape is likely
comparison   evaluation   p-value   p-value (corrected)
norm/pop     220          0.24      1.0
norm/pop     330          0.27      1.0
norm/pop     2500         0.78      1.0
ind/pop      500          0.15      0.75
ind/pop      2500         0.52      1.0

Table 6.6: p–values for De Jong f2
Figure 6.13: Comparative performance for De Jong f3
to be confusing to the adaptation mechanism, due to the lack of meaningful information when operators do not produce a child located on a different plateau, better or otherwise. Again, there is no advantage to be had in terms of solution quality with any of the GA types.
of the GA types.
The adapted operator probabilities match well with the favoured static rates (P(Cr) = 65%).
6.3.8 De Jong f4
Here, population level adaptation has outperformed the normal GA, but again only in terms of speed. f4 is basically a fairly simple surface, complicated in a stochastic manner by the Gaussian(0,1) noise in the fitness function. However, it would seem that the adaptive mechanism is robust enough that it can still exploit the simple underlying surface, as was the case for MaxOnes.
Figure 6.14: Operator probability adaptation for De Jong f3
comparison   evaluation   p-value   p-value(cor)
norm/pop     750          0.04      0.16
norm/pop     5000         0.16      0.64
ind/pop      2000         0.11      0.44
ind/pop      5000         0.16      0.64

Table 6.7: p–values for De Jong f3
The operator probability trajectories (Figure 6.16) match quite strongly between
population and individual level adaptation.
6.3.9 De Jong f5
As was the case for De Jong f2, f5 shows no significant performance differences between the three GA types under consideration. Again, this seems somewhat surprising considering the fairly complex nature of the landscape. As with f2, though, f5 only has a dimensionality of 2, so this may be another instance of a ceiling effect.
The correlation between individual probabilities is less obvious for f5 (Figure 6.18). The median individual probabilities seem to be dominant, while best and worst remain relatively closer to 50%.
comparison   evaluation   p-value   p-value(cor)
norm/pop     1500         0.02      0.10
norm/pop     7500         0.23      1.0
ind/pop      550          0.41      1.0
ind/pop      1500         0.24      1.0
ind/pop      7500         0.71      1.0

Table 6.8: p–values for De Jong f4
comparison   evaluation   p-value   p-value(cor)
norm/pop     1100         0.38      1.0
norm/pop     7500         0.18      0.9
ind/pop      900          0.51      1.0
ind/pop      1250         0.39      1.0
ind/pop      7500         0.27      1.0

Table 6.9: p–values for De Jong f5
Figure 6.15: Comparative performance for De Jong f4 (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; solution fitness against evaluations)
Figure 6.16: Operator probability adaptation for De Jong f4 (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
6.3.10 Discussion
There is no consistent advantage of any one type of GA over another, though this is not really surprising given the mixture of test problems. De Jong f2 and f5 showed no significant differences between any of the three approaches.
Interestingly, these two problems feature only 2 variables each – the lowest number
of variables in all the De Jong problems. This could therefore be a ‘ceiling effect’,
whereby the dimensionality of the problem is sufficiently small that all the GAs can
comfortably optimise the problem.
There are several instances of significant speed advantages in using one method over another, although again, no single method offers an improvement across all problems.
There is only one instance of an adaptive approach offering an improvement in the
Figure 6.17: Comparative performance for De Jong f5 (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; solution fitness against evaluations)
Figure 6.18: Operator probability adaptation for De Jong f5 (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
actual quality of solution (for TSP–100), and this is achieved by population level adaptation versus the normal GA. Since there were no significant differences between population and individual level adaptation for this problem, we may also implicitly conclude that individual level adaptation has improved upon normal GA performance.
The operator probability adaptation obtained for each problem is quite telling. Perhaps most notable is the high correlation seen between the worst, median and best individuals' P(Cr)/P(Mu) values in individual level adaptation. This suggests that individual adaptation is unable to exploit localised differences in fitness landscapes since, if this were occurring, it seems more likely that these adaptation trajectories would noticeably differ.
Also, for most problems, the operator probability graphs were fairly similar between population and individual level adaptation. Generally the values adapted to were less pronounced in individual level adaptation, but the rough ‘shape’ of adaptation appeared to be preserved.
Overall it would appear that with sufficiently large problems, adaptation can offer a
clear advantage. Also, it would appear that individual level adaptation is not exploiting
localised fitness landscape features.
6.4 Additional Large TSPs
Since there is only one instance of adaptation providing a significant improvement in solution quality, it is worthwhile investigating this further. The advantage was gained for the TSP–100 problem, which suggests that with sufficiently large and/or complex problems adaptation does offer an advantage. As a sufficiently large problem was required before an improvement was witnessed for population level adaptation, it is also possible that with still larger problems an improvement in solution quality may be obtained with individual level adaptation over population level adaptation.
In order to test this idea, two further TSP instances were run, using the optimal parameter settings obtained for the TSP–100, namely P(Cr) = 40% for the normal GA and depth = 5, decay = 0.8 and qlen = 100 for the adaptive GAs. The extended instances feature 150 and 200 cities and can be seen, together with the original 100 city instance, in Figure 6.19 (TSP–100 top right, TSP–150 top left and TSP–200 bottom). It can be seen that the problems all share approximately the same even distribution and uniform structure.
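For reference, tour quality on these instances is simply the length of the closed tour over the city coordinates. A minimal sketch, assuming the TSPLIB EUC_2D convention of nearest-integer Euclidean distances (the function name is illustrative, not from the dissertation's codebase):

```python
import math

def tour_length(tour, coords):
    """Length of a closed tour. coords maps city -> (x, y); each leg uses
    the TSPLIB EUC_2D convention of rounding the Euclidean distance to
    the nearest integer."""
    total = 0
    for a, b in zip(tour, tour[1:] + tour[:1]):  # wrap back to the start
        (x1, y1), (x2, y2) = coords[a], coords[b]
        total += int(round(math.hypot(x2 - x1, y2 - y1)))
    return total
```

For example, a tour around a 10x10 square of four cities has length 40 under this convention.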
Tables 6.10 and 6.11 show the p–values obtained for TSP–150 and TSP–200, respectively.
6.4.1 Discussion
These results strongly support the observation that for sufficiently large problems (TSPs
at least) an adaptive GA will outperform a non-adaptive GA. We have significant improvements for this comparison and for both problems in favour of population level
2 Data for the 200 city TSP was obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA200.tsp.gz. Data for the 150 city TSP was obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/kroA150.tsp.gz
Figure 6.19: Maps of 100, 150 and 200 city TSPs
Figure 6.20: Comparative performance for 150 city TSP (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; tour length against evaluations)
adaptation. One significant gain in speed was obtained, but this favoured population
level adaptation, as opposed to individual.
We have very similar probability adaptation results to those seen previously, with considerable similarity between population and individual adaptation, and the familiar correlation between the worst, median and best individuals' operator probabilities.
We have shown with some confidence the success of adaptation for large TSPs. However, individual level adaptation is still equalled or outperformed by population level adaptation. This is not so surprising, since the problem size was the only aspect scaled up; the nature of the problem structure itself did not change, i.e. there is a clear lack of any form of localised feature which might provide leverage for individual adaptation.
Figure 6.21: Operator probability adaptation for 150 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
comparison   evaluation   p-value   p-value(cor)
norm/pop     2550         < 0.01    < 0.01
norm/pop     75000        < 0.01    0.025
ind/pop      12450        0.045     0.18
ind/pop      75000        0.37      1.0

Table 6.10: p–values for 150 city TSP
6.5 Additional TSPs with Varying Structure
Since scaling up the problem size of the TSPs shows no differences between the adaptation levels, it is possible that the actual problem structure may play a more influential role in the applicability of individual level adaptation. The following figures show the layouts of some additional TSPs on which the population and individual level adaptive GAs were run. The problems feature a diverse range of city layouts.
comparison   evaluation   p-value   p-value(cor)
norm/pop     5000         < 0.01    < 0.01
norm/pop     100000       < 0.01    < 0.01
ind/pop      100000       0.63      1.0

Table 6.11: p–values for 200 city TSP
Figure 6.22: Comparative performance for 200 city TSP (panels: normal GA vs. population level adaptive GA, and individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.23: Operator probability adaptation for 200 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Each of the following TSP instances was run using the population level and individual level adaptive GAs, with the ADOPP parameters previously stated, namely depth = 5, decay = 0.8 and qlen = 100. The normal GA was not run on these problems, as it has clearly been shown that the adaptive GAs outperform the normal GA for large TSPs.
For each TSP, a map illustrating the layout of the cities is given, along with the
experimental results.
6.5.1 105 City TSP
Figure 6.24 shows the layout for a 105 city TSP instance. This problem features a more structured layout than previous examples, with several highly linear clusters.
3 Data for the 105 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/lin105.tsp.gz
Figure 6.24: Map of 105 city TSP
Figure 6.25: Comparative performance for 105 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
The overall layout is approximately symmetric.
comparison   evaluation   p-value   p-value(cor)
ind/pop      7980         0.49      0.98
ind/pop      52500        0.85      1.0

Table 6.12: p–values for 105 city TSP
6.5.2 127 City TSP
Figure 6.27 shows an instance of a 127 city TSP. This particular problem features a highly concentrated central cluster, with a few sparse outliers.
4 Data for the 127 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/bier127.tsp.gz
Figure 6.26: Operator probability adaptation for 105 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Figure 6.27: Map of 127 city TSP
6.5.3 225 City TSP
Figure 6.30 illustrates an instance of a 225 city TSP. This is the most pathological example considered, featuring a perfect grid of cities. The problem is in a sense globally quite uniform, but also features concentrated linear sub–components.
6.5.4 120 City TSP
Figure 6.33 is derived from the original 30 city TSP: that instance is repeated four times in order to create a TSP featuring multiple, isolated clusters.
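One plausible way to construct such a tiled instance is to replicate the base coordinates on a grid, offset far enough apart that the copies remain isolated. A sketch under that assumption (the function name and the `gap` parameter are illustrative, not the construction actually used):

```python
def tile_cities(cities, copies=(2, 2), gap=1.5):
    """Replicate a list of (x, y) cities on a copies[0] x copies[1] grid.
    Each copy is offset by gap times the instance's width/height; a gap
    greater than 1 keeps the resulting clusters from touching."""
    xs, ys = zip(*cities)
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    tiled = []
    for i in range(copies[0]):
        for j in range(copies[1]):
            tiled += [(x + i * gap * width, y + j * gap * height)
                      for x, y in cities]
    return tiled
```

Applied to the 30 city instance with a 2x2 grid, this yields a 120 city problem of four well-separated copies of the original cluster.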
5 Data for the 225 city TSP obtained from TSPLIB: http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/ts225.tsp.gz
Figure 6.28: Comparative performance for 127 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.29: Operator probability adaptation for 127 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
6.5.5 Discussion
Although these additional TSPs provide considerable variety in terms of problem structure, individual level adaptation has still failed to outperform population level adaptation.
No significant differences were observed for either adaptive technique.
The operator probability adaptation graphs obtained follow the same trends as witnessed for previous problems. There is a reasonable degree of similarity in the trajectories followed for population and individual level adaptation and, again, a high correlation between best, median and worst individuals' P(Cr) and P(Mu) values, at least initially, while adaptation is stable.
Although the city layouts vary considerably, it is of course possible that the resulting fitness landscapes are not all that different, but this seems unlikely.
comparison   evaluation   p-value   p-value(cor)
ind/pop      14986        0.53      1.0
ind/pop      63500        0.81      1.0

Table 6.13: p–values for 127 city TSP
Figure 6.30: Map of 225 city TSP
Figure 6.31: Comparative performance for 225 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
comparison   evaluation   p-value   p-value(cor)
ind/pop      112500       0.15      0.15

Table 6.14: p–values for 225 city TSP
Figure 6.32: Operator probability adaptation for 225 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
Figure 6.33: Map of 120 city TSP
Figure 6.34: Comparative performance for 120 city TSP (individual level vs. population level adaptive GA; tour length against evaluations)
Figure 6.35: Operator probability adaptation for 120 city TSP (panels: population level adaptation, plotting P(Cr) and P(Mu); individual level adaptation, plotting best, median and worst individuals' P(Cr) and P(Mu); probability (%) against evaluations)
comparison   evaluation   p-value   p-value(cor)
ind/pop      15000        0.71      1.0
ind/pop      60000        0.23      0.46

Table 6.15: p–values for 120 city TSP
Chapter 7
Conclusion
7.1 Project Summary
This report has detailed an investigation into whether optimisation performance in adaptive GAs can be improved by using finer grained adaptation of operator probabilities at the individual, as opposed to the population, level. The primary aim of the project was to test the hypothesis:
An adaptive mechanism operating at the individual level will perform as well as or better than the same mechanism operating at the population level.
A secondary aim was to compare normal GA performance with adaptive performance. In this case, the population level adaptive GA was compared with the normal
GA. The motivations for pursuing self–adaptation in general come from two main
points of view:
1. Hand tuning of GAs is time consuming and difficult; reliable self–adaptation of parameters can therefore improve on this situation
2. It is likely that for many problems the optimal parameter values are not fixed entities, but rather should vary as the GA progresses
The assumption that individual level adaptation will provide better performance than population level adaptation is based on the rationale that, faced with an appropriate (highly non–uniform) landscape, the finer grained capabilities of individual level adaptation will yield improved results, as it is able to respond in a more focused manner to the nuances of the landscape. Feedback is not automatically averaged together, possibly obscuring information in the process, as is the case for population level adaptation.
In order to conduct the investigation, a system was constructed based on [13], which adapted operator probabilities at the population level. This system was modified to realise both a non–adaptive GA and a finer grained (individual level) adaptive GA.
Furthermore, a diverse set of test problems was selected on the basis of stressing each GA type with a large variety of fitness landscapes, with the aim that those landscapes featuring appropriately localised sub–features would be successfully exploited by individual level adaptation.
Formative experimentation was then carried out in order to characterise each GA type's response to parameter variation and to informally assess each GA's robustness to it. From this, parameter settings were selected with which to conduct more in–depth experimentation.
Using the tuned parameter values from initial experimentation, larger sample sizes
of 50 runs were taken and the results compared using t–tests as appropriate. The t–test
results enabled the acceptance or rejection of the stated hypothesis.
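The corrected p–values reported throughout Chapter 6 are consistent with a simple Bonferroni correction, in which each raw p–value is multiplied by the number of comparisons made for that problem and capped at 1.0. A sketch, assuming that convention (the function name is illustrative):

```python
def bonferroni(p_values):
    """Bonferroni correction as apparently used in the result tables:
    multiply each raw p-value by the number of comparisons, capped at 1.0."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]
```

For example, applying this to the four raw p–values of Table 6.7 (0.04, 0.16, 0.11, 0.16) reproduces the corrected column (0.16, 0.64, 0.44, 0.64).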
7.2 Conclusions
7.2.1 Test of Hypothesis
A thorough test of the hypothesis has been successfully conducted. The results comparing population and individual adaptive levels, whilst perhaps not what was hoped for given the assumptions behind the hypothesis, are still informative. Several clear trends were identified in the results.
The hypothesis was accepted for the 30 city TSP, on the grounds of a significant advantage in speed. Although the technical requirements for accepting the hypothesis were met (population level adaptation did not go on to perform better on final solution quality, for example), the result is tentative on two counts:
1. The advantage was observed in the very early stages of the run; therefore, except in the context of a very time–limited situation, it is unlikely to be of real practical benefit.
2. The corrected p–value was not significant for this result, which does not lend confidence to it.
Also, the absence of any convincing trend supporting the result does not lend confidence as to its general significance. In light of the number of TSPs addressed in the study, and the fact that no such results were obtained for any other TSP instances, this result is most likely a type 1 error, or a peculiarity of that particular TSP.
For two test problems, De Jong f1 and TSP–150, the hypothesis was rejected, as population level adaptation exhibited a speed advantage over individual level adaptation in both cases. There are no obvious commonalities between these problems that might provide some rationale for the result: f1 is a low dimensional and unimodal problem, while TSP–150 is comparatively high dimensional and almost certainly multimodal. These results are not thought to be indicative of a specific trend or pattern, especially in light of the fact that TSP–150 was the only TSP to obtain such a result. Further experimentation is required to investigate this.
For all other problems the hypothesis was accepted, but only on the grounds of individual level adaptation performing equally as well as population level adaptation. As mentioned earlier, showing an equivalence in performance between the techniques is not indicative of any real practical benefit; if anything, the opposite is true. Since the overheads, particularly as regards additional book–keeping, are considerably higher for individual level adaptation, we really require some performance gain before the use of the method can be justified.
7.2.1.1 Why Was There No Advantage?
The main conclusion to be drawn is that the adaptive mechanism implemented does not
offer an advantage over population level adaptation by operating at the finer grained
individual level. There are two possible reasons why this is the case:
1. The ceiling effect was encountered for many problems, making adaptation somewhat redundant
2. The mechanism and GA implementation fail to capitalise on local sub–features of appropriate landscapes
The first reason is probable, since for the three smallest numerical optimisation problems (in terms of dimensionality) there were no significant differences observed in performance. This was also found to be the case for the smallest TSP instance, of 30 cities.
Given the lack of significant differences observed for the varied structures of the TSP problems attempted, it seems reasonable to assert that the mechanism itself is failing to respond effectively to the subtleties of the landscapes being searched. This may be because operator probability is an inappropriate subject for adaptation. Alternatively, it is possible that the operator productivity metric in use is an insufficient mechanism with which to extract the relevant information from the progress of search (such that adaptation may occur and provide benefit).
Both these suggestions seem feasible. Crossover can in some sense be regarded
as a ‘global’ operator – it may combine solutions existing in far apart regions of the
search space and result in considerable disruption. Therefore, allowing the probability
of crossover to adapt, even at the individual level, does not significantly alter the net
result of its application.
The operator productivity measure used in ADOPP is relatively coarse compared with the approaches of some other works, such as [19] and [20]. Both utilise fitness values directly in the adaptive mechanism: [19] uses the difference between parent and child in order to reward children that are good and better than their parents; [20] uses the maximum and average fitnesses, along with the fitness of the individual of interest, to control the amount of disruption experienced by that individual (essentially, high disruption for lower fitnesses and low disruption for higher fitnesses). Both works report positive results.
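For reference, the disruption control of [20] takes approximately the following form, where fmax and f̄ are the maximum and mean population fitnesses, f′ is the larger of the two parents' fitnesses, f is the fitness of the individual to be mutated, and k1–k4 are constants:

```latex
p_c =
\begin{cases}
  k_1 \, \dfrac{f_{\max} - f'}{f_{\max} - \bar{f}}, & f' \ge \bar{f},\\[6pt]
  k_3, & f' < \bar{f},
\end{cases}
\qquad
p_m =
\begin{cases}
  k_2 \, \dfrac{f_{\max} - f}{f_{\max} - \bar{f}}, & f \ge \bar{f},\\[6pt]
  k_4, & f < \bar{f}.
\end{cases}
```

Fitter-than-average individuals thus receive lower crossover and mutation probabilities, while below-average individuals are disrupted at the fixed, higher rates.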
Although ADOPP does utilise fitness as a means of determining improvement, the actual magnitudes of the improvements do not feature in the adaptive mechanism in any way. Perhaps such a modification would prove advantageous.
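A sketch of how such a magnitude-weighted credit scheme might look. This is an illustrative variant, not Julstrom's actual ADOPP mechanism: it credits only the operator that produced each child (omitting ADOPP's ancestry depth), and all names other than the decay and qlen parameters are assumptions:

```python
from collections import deque

def operator_probs(history, decay=0.8, qlen=100, floor=0.05):
    """Illustrative variant of a decayed-credit scheme: each history entry
    records (operator, fitness improvement of child over parent). Credit is
    the improvement magnitude decayed by the entry's age, rather than a
    simple count of improvements; newest entries come last in history."""
    recent = deque(history, maxlen=qlen)
    if not recent:
        return {}
    credit = {}
    for age, (op, delta) in enumerate(reversed(list(recent))):
        credit[op] = credit.get(op, 0.0) + max(delta, 0.0) * decay ** age
    ops = list(credit)
    total = sum(credit.values())
    if total == 0:  # no improvements seen yet: fall back to uniform rates
        return {op: 1.0 / len(ops) for op in ops}
    # normalise credits, then impose a floor so no operator is starved
    return {op: floor + (1 - floor * len(ops)) * (c / total)
            for op, c in credit.items()}
```

Because the raw improvement magnitudes enter the credit sums, an operator that produces a few large gains would be favoured over one producing many marginal ones, which is precisely the information the count-based metric discards.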
7.2.2 Normal Versus Adaptive Performance
A thorough comparison of the original population level adaptive GA with a non–adaptive counterpart was successfully realised, addressing the lack of a rigorous comparison in the original work. There were no advantages for population level adaptation compared with a normal GA on the original two problems.
There is only one instance of a significant result in favour of the normal GA over population level adaptation. This was obtained for De Jong f3 (the step function) and was in terms of speed only. It was most likely due to the misleading nature of the landscape, which features several regions in which no useful information can be derived to enable adaptation. This may be indicative of situations in which adaptation is detrimental to performance; more work is required to investigate this possibility.
Overwhelmingly, where differences were significant, population level adaptation outperformed the normal GA. For the majority of these cases there were no differences between population and individual level adaptation, hence we may infer that individual level adaptation is also typically superior to a normal GA.
Gains in speed only for population level adaptation were observed for MaxOnes and De Jong f4, both of which are straightforward functions. Admittedly, f4 is complicated somewhat by the presence of noise, though the underlying function is simple enough. The gain for MaxOnes suggests that simple, unimodal problems can be successfully exploited, though no such benefit was seen for f1 (also straightforward and unimodal), so more experimentation is required to investigate this. The success on f4 also suggests that the mechanism is robust in the presence of noise, though there is not enough evidence to draw firm conclusions; further noisy examples, based perhaps on more complicated functions, would be more illuminating.
A very convincing advantage for population level adaptation was evident for the large TSPs attempted (of 100, 150 and 200 cities). Adaptation brought both speed and quality improvements for each TSP, and the corrected p–values also showed significance in all cases. It can therefore be concluded that for sufficiently large TSPs, adaptation definitely does deliver an advantage.
7.3 Further Work
The work could be continued and extended in many ways. Some avenues that are felt
worth pursuing are as follows:
1. It may prove more beneficial to adapt operator parameters, as opposed to operator probabilities. This would be particularly relevant in the case of mutation rate, as this provides direct control of the disruptive force resulting from mutation. Due to the local scope of mutation – it typically makes small incremental changes to a solution – this seems a more likely candidate for the successful exploitation of localised landscape features.
2. Increase the dimensionality of the numerical optimisation problems. This would answer the question of whether the typical lack of difference on these problems is actually due to some inherent quality of the fitness landscapes themselves, or whether it is due to a ceiling effect. If a ceiling effect was primarily responsible, then higher dimensional problems should begin to yield differences in results, which would prove more instructive as to whether adaptation is suited to the problems or not.
3. Attempt to make the ‘improvement’ metric more explicit, by including the resulting differences in fitness of new offspring relative to their parents since, as discussed, this appears to aid performance. Similarly, some explicit metric of population diversity may prove useful in enhancing the adaptation, as this can produce a much more refined balance of exploration versus exploitation [20], where individuals of poor fitness can be subjected to much more disruptive forces, while those of higher fitness are preserved.
Bibliography
[1] P. J. Angeline. Adaptive and self–adaptive evolutionary computations. In
M. Palaniswami and Y. Attikiouzel, editors, Computational Intelligence: A Dynamic Systems Perspective, pages 152–163. IEEE Press, 1995.
[2] T. Bäck. Self–adaptation in genetic algorithms. In Varela and Bourgine, editors,
Towards a Practice of Autonomous Systems: Proceedings of the First European
Conference on Artificial Life. MIT Press, 1992.
[3] L. Davis. Adapting operator probabilities in genetic algorithms. In J. D. Schaffer,
editor, Proceedings of the Third International Conference on Genetic Algorithms,
pages 61–69, San Mateo, CA, 1989. Morgan Kaufmann.
[4] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York,
NY, 1991.
[5] Á. E. Eiben, R. Hinterding, and Z. Michalewicz. Parameter control in evolutionary algorithms. IEEE Trans. on Evolutionary Computation, 3(2):124–141,
1999.
[6] D. B. Fogel. An introduction to simulated evolutionary optimization. IEEE Trans.
on Neural Networks: Special Issue on Evolutionary Computation, 5(1):3–14,
1994.
[7] J. J. Grefenstette. Optimization of control parameters for genetic algorithms.
IEEE Transactions on Systems, Man and Cybernetics (SMC), 16(1):122–128,
1986.
[8] R. Hinterding, Z. Michalewicz, and Á. E. Eiben. Adaptation in evolutionary
computation: A survey. In Proceedings of The IEEE Conference on Evolutionary
Computation, IEEE World Congress on Computational Intelligence, 1997.
[9] J. H. Holland. Adaptation in Natural and Artificial Systems. The University of
Michigan Press, Ann Arbor, 1975.
[10] I. Oliver, D. Smith, and J. R. Holland. A study of permutation crossover operators on the traveling salesman problem. In J. Grefenstette, editor, Genetic Algorithms and their Applications: Proceedings of the Second International Conference, Hillsdale, New Jersey, 1987. Lawrence Erlbaum.
[11] K. A. De Jong. An Analysis of the Behaviour of a Class of Genetic Adaptive
Systems. PhD thesis, University of Michigan, Ann Arbor, 1975.
[12] B. A. Julstrom. Very greedy crossover in a genetic algorithm for the traveling
salesman problem. In Applied Computing 1995: Proceedings of the 1995 ACM
Symposium on Applied Computing, New York, 1995. ACM Press.
[13] B. A. Julstrom. What have you done for me lately? Adapting operator probabilities in a steady–state genetic algorithm. In L. J. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 81–87, San Mateo, CA, 1995. Morgan Kaufmann.
[14] M. Mitchell. An Introduction to Genetic Algorithms. MIT Press, 1992.
[15] J. E. Pettinger and R. M. Everson. Controlling genetic algorithms with reinforcement learning. Engineering Optimisation (Submitted), 2003.
[16] P.M. Ross and D. Corne. Applications of genetic algorithms. In AISB Quarterly,
volume 89, pages 32–30, 1995.
[17] J. Smith and T. C. Fogarty. Self adaptation of mutation rates in a steady state
genetic algorithm. In International Conference on Evolutionary Computation,
pages 318–323, 1996.
[18] J.E. Smith and T. C. Fogarty. Operator and parameter adaptation in genetic algorithms. Soft Computing – A Fusion of Foundations, Methodologies and Applications, 1(2):81–87, 1997.
[19] W. M. Spears. Adapting crossover in evolutionary algorithms. In R. G. Reynolds
J. R. McDonnell and D. B. Fogel, editors, Proceedings of the Fourth Annual Conference on Evolutionary Programming, pages 367–384, Cambridge, MA, 1995.
MIT Press.
[20] M. Srinivas and L.M. Patnaik. Adaptive probabilities of crossover and mutation
in genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics,
24(4):656–667, 1994.
[21] G. Syswerda. Uniform crossover in genetic algorithms. In Proceedings of the
Third International Conference on Genetic Algorithms, San Mateo, California,
1989. Morgan Kaufmann.
[22] A. Tuson and P. Ross. Adapting operator settings in genetic algorithms. Evolutionary Computation, 6(2):161–184, 1998.
[23] D. Whitley. The GENITOR algorithm and selection pressure: Why rank–based allocation of reproductive trials is best. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, San Mateo, CA, 1989. Morgan Kaufmann.
[24] M. Wineberg and S. Christensen. Using appropriate statistics. In Tutorials of
GECCO 2003: Genetic and Evolutionary Computation Conference, pages 339–
358, Chicago, IL, USA, 2003.
Appendix A
Test Problem Settings
This appendix details the fixed parameter settings for all the test problems featured in
the report. The ‘Uniform Xover’ entry refers to the probability that a gene is selected
from the second parent, in uniform crossover. The ‘Report Gap’ entry refers to the
frequency with which the GA was sampled during a run. For example, a report gap of
25 means metrics were taken after every 25 evaluations.
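To make the 'Uniform Xover' setting concrete, the sketch below shows uniform crossover with a parameterised gene-selection probability. This is a minimal illustrative example, not code from the project itself; the class and method names are my own.

```java
import java.util.Arrays;
import java.util.Random;

// Minimal sketch of uniform crossover: each gene of the child is taken
// from the second parent with probability p (the 'Uniform Xover' setting,
// 0.5 for every problem in Table A.1), and from the first parent otherwise.
public class UniformCrossover {

    static int[] crossover(int[] parent1, int[] parent2, double p, Random rng) {
        int[] child = new int[parent1.length];
        for (int i = 0; i < parent1.length; i++) {
            // Draw a uniform random number per gene and compare against p.
            child[i] = (rng.nextDouble() < p) ? parent2[i] : parent1[i];
        }
        return child;
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 0, 0, 0, 0, 0};
        int[] b = {1, 1, 1, 1, 1, 1, 1, 1};
        // With p = 0.5, roughly half the genes come from each parent.
        System.out.println(Arrays.toString(crossover(a, b, 0.5, new Random(1))));
    }
}
```

Note that p = 0 reproduces the first parent exactly and p = 1 the second, so 0.5 gives the maximally disruptive (unbiased) mixing used throughout the experiments.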
A.1 Binary Encoded Problems
Table A.1 details the settings for all binary encoded problems.
A.2 Traveling Salesman Problems
Table A.2 details the settings for all TSPs. Note that ‘Size’ and ‘String Length’ are
not included in this table since they are both equivalent to the number of cities in the
problem. Other parameters are not included since these are not defined for TSPs, e.g.
mutation rate.
Problem     Size   String Length   Mutation Rate   Uniform Xover   Population Size   Selection Pressure   Evaluations   Report Gap
Binary f6      2        44             0.05             0.5              100               1.5                5000           25
MaxOnes      100       100             0.01             0.5              100               2                  5000           25
f1             3        30             0.033            0.5              100               1.5                2500           10
f2             2        24             0.042            0.5              100               1.5                2500           10
f3             5        50             0.02             0.5              100               1.5                5000           25
f4            30       240             0.0042           0.5              100               1.5                7500           25
f5             2        34             0.029            0.5              100               1.5                7500           25

Table A.1: Settings for all binary encoded problems
# Cities   Population Size   Selection Pressure   Evaluations   Report Gap
30         150               2                     15000           50
100        500               2                     50000          125
150        750               2                     75000          150
200        1000              2                    100000          200
105        525               2                     52500          105
127        635               2                     63500          127
225        1000              2                    112500          225
120        600               2                     60000          120

Table A.2: Settings for all TSPs
Appendix B
Location of Data Files and
Source Code
B.1 Source Code
B.1.1 Development and Execution Environment
All work was implemented in Java (version 1.4.2_03), providing portability and relatively quick development time. The system was developed and run under Red Hat 9 Linux on Pentium 4 2 GHz machines, each with 500 MB of RAM.
B.1.2 Location
All Java source and class files are located in:
/home/s0340561/PROJECT/code/
B.2 Data Location
The main directory in which all experimental data is stored is:
/home/s0340561/PROJECT/data/
There are several sub–directories within this directory which contain data specific to particular test problem/GA type configurations.
B.2.1 Formative Experiment Results
Each problem type was optimised with the following four GA types: normal, population level adaptive, individual level adaptive (with median improvement) and individual level adaptive (with parent improvement). There are therefore four sub–directories for each problem, of the form:

<prob type>Norm – results for the normal GA
<prob type>PopAd – results for the population level adaptive GA
<prob type>IndAd_median – results for the individual level adaptive GA (median improvement)
<prob type>IndAd_parent – results for the individual level adaptive GA (parent improvement)

where <prob type> is one of {f6, tsp30, tsp100, max1, djF1, djF2, djF3, djF4, djF5}.
The order given here matches the order of the problem types as reported in chapter 5.
B.2.2 Summative Experiment Results
Summative experiment results are located in sub–directories of the form:
<prob type>Extended

<prob type> is defined as before. In addition, the following sub–directory contains
data for all the additional TSP problems which featured in sections 6.4 and 6.5:
tspOtherExtended