12. GENETIC ALGORITHMS FOR SOLUTION OF
NONLINEAR OPTIMIZATION PROBLEMS
12.1 BACKGROUND
There are many types of water resources problems that are intractable with respect to classical optimization approaches. For example, consider a groundwater optimization problem where it is desired to determine the minimum pumping cost necessary to produce a desired sequence of outflows through time. If:

• the aquifer is large, there will be many computational nodes (using standard aquifer modeling methods, such as finite difference or finite element techniques)

• the problem is time-varying, then the modeling approach must step through time, thereby increasing the problem size even more

• the aquifer is unconfined, the governing hydrodynamic equations will be nonlinear, as will be their finite element or finite difference representation
If such a problem were to be modeled using the approaches discussed in the previous section, the resulting model could literally have millions of simultaneous nonlinear constraints. It would not be possible to solve such a problem, even with the powerful computer hardware and software that have become so readily available. Problems that are intractable because of their dimensionality and/or nonlinearity are quite common in water resources engineering (e.g., groundwater optimization problems such as the one above, optimal operation of multiple-reservoir systems in large river basins, multiple-constituent water quality management, etc.). This has led to the development and application of non-traditional optimization methods that are robust but not based on classical mathematical approaches such as LP or gradient methods. Genetic algorithms (GA) represent one such set of robust methods.
12.2 GENETIC ALGORITHMS AND GENETIC PROGRAMMING
Genetic programming (GP) is a relatively new branch of operations research. The reader is referred to the following on-line materials for a more exhaustive treatment of the subject:

• an Introduction to Genetic Algorithms with Java Applets (http://cs.felk.cvut.cz/~xobitko/ga/)

• the official GA FAQ (http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/genetic/top.html)

• the GA Archives (http://www.aic.nrl.navy.mil/galist/)

• The Hitch-Hiker’s Guide to Evolutionary Computation (http://www.cs.purdue.edu/coast/archive/clife/FAW/www/)

• The Genetic Programming Tutorial Notebook (http://www.mysite.com/jjf/gp/Tutorial/tutorial.html)
GP has been used in water resources engineering only in recent years. It is robust and computationally efficient for many types of problems, especially those that are highly nonlinear. It is inspired by Darwin’s Theory of Evolution and the basic genetic operations of sexual reproduction; as a result, much of the terminology used in GP/GA is derived from these biological origins.
12.2.1 Biological Background
The physical characteristics of an individual are determined by its genetic make-up. The set of
physical characteristics of an organism is called the organism’s “phenotype”. The genetic makeup of an organism is called its “genotype”.
Genetic material is encoded in genes, which are arrayed together to form chromosomes. The
notion of genes combining to form chromosomes, which collectively represent an organism’s
genotype, which in turn sets the organism’s physical characteristics--its phenotype--is illustrated
in a very simplistic fashion in Figure 12.1.
Figure 12.1: A Simple Representation of the Relationship between an Organism’s Genotype and Phenotype
(The figure shows a chromosome, i.e., a collection of genes such as a gene for eye color and a gene for hair color, making up the genotype (genetic make-up), which in turn determines phenotypic characteristics such as blue eyes and brown hair.)
In nature (according to Darwin’s Theory of Evolution), the environment acts upon an individual’s physical characteristics and determines:

• the individual’s suitability for survival

• the individual’s likelihood of reproductive success
In general, those individuals that are most suited for survival in the environment in which they live will produce the greatest number of offspring. Thus:

• they will pass more of their genetic material to subsequent generations than other, less fit individuals

• their offspring will be better suited for survival than the offspring of other, less fit individuals

This process of more fit individuals passing their genetic material to a greater number of offspring than less fit individuals is known as “survival of the fittest”. GP is a branch of operations research whereby these biological processes are used as a set of principles for constructing optimization algorithms; these algorithms generate “populations” of decisions or management policies that become more and more fit with each succeeding generation. The following sections illustrate how this is done.
12.2.2 Relationship of GP to Optimization
The process of biological evolution is one wherein, with each succeeding generation, individuals are produced that are, on average, better suited to the environment in which they live. As such, it is a type of optimization process which, in a sense, creates “better” individuals with each iteration. Presumably, then, given enough iterations (i.e., enough generations of a population living in an environment of interest) and a way of measuring the quality or desirability of an organism, an individual would eventually be produced having physical characteristics that, in total, would be in the neighborhood of an “optimal solution”. The connection between genetics and evolution on the one hand and optimization on the other becomes more obvious when one considers the parallels in their terminology. For example, rough synonyms for terms used in GP and the more familiar traditional terms of operations research are given in Table 12.1.
12.2.3 A Basic Genetic Algorithm
The basic design for genetic algorithms is actually quite simple and easy to code into a computer
program. A GA normally consists of the following steps:
Step 1: Population Generation: A population of n chromosomes (i.e., individuals) is generated
by randomly selecting values for the genes in the chromosomes. (I.e., randomly assign values to
the decision variables for each of a large number of alternatives.)
Table 12.1: Terminology Used in GA and Corresponding Synonyms from Systems Analysis

    Terminology Used in GA                               Synonymous Concepts from Systems Analysis
    gene                                                 the value of a decision variable
    chromosome = collection of genes =                   an array of decision variables
      “organism” = “individual”
    population = a set of chromosomes or individuals     a set of solutions (a collection of n-tuples,
                                                         each of which specifies a different set of
                                                         values of the decision variables)
    fitness                                              objective function value
Step 2: Fitness Evaluation: Evaluate the “fitness” of each chromosome in the population. (I.e.,
calculate the value of the objective function for each alternative.)
Step 3: Test for Completion: Test to see if an end condition has been achieved (e.g., test to see
if a maximum number of generations has been reached, etc.). If so, stop. If not, continue with
the next step.
Step 4: Create a New Population: Apply the processes of selection, crossover, mutation, and
replacement to build a new population.
Step 4a: Selection: Select two parent chromosomes from the present population according
to their fitness: the greater the fitness of an individual, the greater is the chance that the individual will be selected to be a parent and produce offspring. (I.e., select two alternatives
from the current collection of alternatives, and base that selection upon the value of the
objective function of the current alternatives.)
Step 4b: Crossover: With a pre-selected probability, select genes from one parent or the
other to form a new individual (i.e., to form an offspring). (I.e., use some of the decision
variable values from one of the alternatives, and some from the other, to formulate a new
alternative.)
Step 4c: Mutation: With a pre-selected probability, cause a mutation to happen at any given
gene in the new individual (i.e., make a small change in the value of a randomly selected
decision variable). (I.e., make small, random changes in the values of some of the decision
variables of the new alternative.)
Step 4d: Replacement: Repeat Steps 4a through 4c until n new chromosomes have been
constructed. Replace the old population of chromosomes with the new ones. (I.e., repeat the
processes outlined in Steps 4a through 4c until a complete set of new alternatives has been
formulated. Replace the old set of alternatives with this new one.)
Step 5: Repetition: Repeat the process with the new population, starting at Step 2.
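To make these steps concrete, the following is a minimal sketch of the procedure in Python for a value-encoded problem. The objective function, bounds, and parameter values are illustrative assumptions (they loosely follow the example solved later in Section 12.3), and the crossover shown is the simple gene-by-gene variety described in Step 4b rather than the point-based variety illustrated later in Figure 12.5:

    import random

    # Illustrative settings; these particular values are assumptions, not
    # prescribed by the text.
    POP_SIZE = 100          # n, the number of chromosomes in the population
    N_GENES = 2             # number of decision variables per chromosome
    BOUNDS = (-10.0, 10.0)  # lower and upper limits on each decision variable
    P_CROSSOVER = 0.5       # probability of taking a gene from the first parent
    P_MUTATION = 0.05       # probability that any given gene is mutated
    MAX_GENERATIONS = 100   # end condition tested in Step 3

    def fitness(chromosome):
        # Step 2: objective function value of an alternative (here an assumed
        # quadratic with its maximum of 1000 at x = y = 1).
        x, y = chromosome
        return 1000.0 - ((x - 1.0) ** 2 + (y - 1.0) ** 2)

    def random_chromosome():
        # Step 1: randomly assign values to the decision variables.
        return [random.uniform(*BOUNDS) for _ in range(N_GENES)]

    def select_parent(population, scores):
        # Step 4a: fitness-proportionate ("roulette wheel") selection.
        spin = random.uniform(0.0, sum(scores))
        running = 0.0
        for chromosome, score in zip(population, scores):
            running += score
            if running >= spin:
                return chromosome
        return population[-1]

    def crossover(parent_a, parent_b):
        # Step 4b: take each gene from one parent or the other.
        return [a if random.random() < P_CROSSOVER else b
                for a, b in zip(parent_a, parent_b)]

    def mutate(chromosome):
        # Step 4c: small random perturbations of a few randomly chosen genes.
        low, high = BOUNDS
        return [min(high, max(low, g + random.uniform(-0.1, 0.1)))
                if random.random() < P_MUTATION else g
                for g in chromosome]

    population = [random_chromosome() for _ in range(POP_SIZE)]       # Step 1
    for generation in range(MAX_GENERATIONS):                         # Steps 3, 5
        scores = [fitness(c) for c in population]                     # Step 2
        new_population = []
        while len(new_population) < POP_SIZE:                         # Step 4d
            parent_a = select_parent(population, scores)              # Step 4a
            parent_b = select_parent(population, scores)
            new_population.append(mutate(crossover(parent_a, parent_b)))  # 4b, 4c
        population = new_population

    print("best fitness found:", max(fitness(c) for c in population))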
12.2.4 Illustration of Key Steps
The following sections illustrate the above concepts in ways that might be more familiar to water resources engineers.
Encoding of a Chromosome: Information about the genetic make-up of an individual is encoded into a chromosome. There are several different ways of encoding such information, depending on the type of problem of interest. The major encoding methods are binary encoding (where genes take on values of either 0 or 1), permutation encoding (where genes have integer values, each representing a position in a sequence), value encoding (where genes take on real-valued numbers), and tree encoding (where genes are actually “objects”, such as commands in a programming language; tree encoding is used in genetic programming to “evolve” computer programs). These are illustrated in Figure 12.2. The encoding method of greatest utility for water resources problems is value encoding.
Figure 12.2: Alternative Chromosome Encoding Methods
(The figure shows an example of each method: binary encoding, where every chromosome is a string of bits, e.g., 1 0 0 1 0 1 1 0 1 0 0 ...; permutation encoding, where every chromosome is a string of numbers in a sequence, e.g., 6 1 4 9 3 0 5 2 7 8 ...; value encoding, where every chromosome is a sequence of values, e.g., 3.145 6.259 1.476 2.847 ...; and tree encoding, where every chromosome is a tree of objects, such as functions or commands in a programming language.)
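As a rough illustration, the first three encodings can be represented directly as simple sequences; the specific values below are the arbitrary ones shown in Figure 12.2:

    # Illustrative chromosomes for three of the encoding methods
    binary_chromosome = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0]      # binary encoding: bits
    permutation_chromosome = [6, 1, 4, 9, 3, 0, 5, 2, 7, 8]    # permutation encoding: positions in a sequence
    value_chromosome = [3.145, 6.259, 1.476, 2.847]            # value encoding: real-valued decision variables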
Fitness Evaluation: Evaluation of the fitness of an individual simply corresponds to determining
a scalar-valued expression by examining the genes in the individual’s chromosome. This is the
same as calculating the value of the objective function, given values for the decision variables of
a problem, and is illustrated for a water resources problem in Figure 12.3.
Figure 12.3: Evaluation of “Fitness” for a Water Resources Problem
(The “chromosome” is a set of decision variable values, e.g., reservoir operating rule parameter values, pumping rates on wells, and others. “Fitness” is evaluated for an individual by putting the values of the “genes” into a simulation model to see, for example, how the water resources system will behave and what the resulting total costs and benefits will be; e.g., “fitness” = objective function value = benefits - costs.)
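A minimal sketch of this idea in Python is given below. The simulate() routine is a hypothetical stand-in for whatever simulation model would actually be run, and the variable names and cost/benefit relationships are invented for illustration only:

    def simulate(pumping_rates, rule_parameters):
        # Hypothetical stand-in for a groundwater/reservoir simulation model;
        # a real application would call the actual simulator here.
        benefits = 50.0 * sum(rule_parameters)
        costs = sum(2.0 * q ** 1.5 for q in pumping_rates)
        return benefits, costs

    def fitness(chromosome):
        # The chromosome is a flat list of decision variable values: here the
        # first three genes are pumping rates, the rest are rule parameters.
        pumping_rates = chromosome[:3]
        rule_parameters = chromosome[3:]
        benefits, costs = simulate(pumping_rates, rule_parameters)
        return benefits - costs    # "fitness" = objective function value

    print(fitness([2.0, 1.5, 3.0, 0.8, 0.9]))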
Crossover: Crossover is the process whereby, with a pre-specified probability, genes from one
parent or the other are used to form a new individual (the “offspring”). Crossover is a random
process involving the identification of a “crossover point” on the chromosome. It works by first
selecting (at random) one or more points on the chromosome where “crossover” will occur. All
genes in the offspring chromosome up to that point will be taken from one parent, while all genes
after that point will be taken from the other. Typically, multiple crossover points are selected,
with genes being taken first from one parent until a crossover point is encountered, and then
from the other parent until the next crossover point is found (as illustrated in Figure 12.4).
For water resources problems where value encoding is used, crossover consists of simply copying genes from first one parent and then the other, alternating between parents as crossover
points are found. This is illustrated in Figure 12.5.
Figure 12.4: The Process of Crossover to Produce a New Offspring
(Genes are selected at random from Parent A and Parent B and copied to the offspring.)
Figure 12.5: Value Encoding Crossover
(With a single, randomly selected crossover point after the third gene: Parent A = (3.712) (4.681) (0.973) (7.818) (6.579); Parent B = (8.040) (6.721) (5.619) (0.011) (2.038); Offspring = (3.712) (4.681) (0.973) (0.011) (2.038). The first three genes are copied from Parent A and the remaining genes from Parent B.)
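A sketch of single-point, value-encoded crossover in Python, using the parent values shown in Figure 12.5 (the function itself is an illustrative implementation, not taken from any particular GA package):

    import random

    def single_point_crossover(parent_a, parent_b):
        # Pick a crossover point at random, then copy genes from parent A up
        # to that point and from parent B after it.
        point = random.randint(1, len(parent_a) - 1)
        return parent_a[:point] + parent_b[point:]

    parent_a = [3.712, 4.681, 0.973, 7.818, 6.579]
    parent_b = [8.040, 6.721, 5.619, 0.011, 2.038]
    offspring = single_point_crossover(parent_a, parent_b)
    # If the randomly selected crossover point falls after the third gene, the
    # offspring is [3.712, 4.681, 0.973, 0.011, 2.038], matching Figure 12.5.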
Mutation: The process of mutation is used to continually introduce new values of genes into the
population (i.e., to continually modify the values of decision variables in alternatives). This
ensures that the current best solution will not stagnate, but will continue to improve with each
new generation. Mutation is a random, but controlled process. After a new offspring is created,
each gene in the offspring is examined. Most genes are left unchanged, but the value of some
genes will be modified. This will happen at random and only with a pre-specified frequency
(illustrated in Figure 12.6).
Figure 12.6: The Process of Mutation
(An offspring chromosome is shown before and after mutation; most genes are unchanged, but a few randomly selected genes are modified.)
For GA problems involving value encoding, the mutation process must modify the value of mutated genes by only a small amount (see Figure 12.7). Many simple algorithms are available for ensuring that the mutations are not so large (or numerous) as to cause the resulting individual to be “deformed”. Mutations should represent slight perturbations, not catastrophic ones. For example, the rate at which genes mutate is typically selected to be quite low (say, on the order of less than one percent). If a value-encoded gene is to be mutated, the direction of the change might be selected at random (essentially from a coin flip), but the magnitude of the mutation is typically kept low (say, a random percentage of some maximum amount for a given gene). Pre-specification of the values of the parameters that control mutation rates and amounts is part of the art of successful genetic programming.
Figure 12.7: Value Encoding Mutation
(Example chromosome before mutation: 1.29, 5.68, 2.86, 4.11, 5.55, ...; after mutation: 1.29, 5.68, 2.73, 4.22, 5.55, .... The third and fourth genes are mutated by small amounts; the others are unchanged.)
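A sketch of value-encoded mutation in Python; the mutation probability, the coin-flip direction, and the maximum step size are illustrative assumptions:

    import random

    def mutate(chromosome, p_mutation=0.01, max_step=0.15):
        # Each gene is examined; with a small probability it is perturbed by a
        # random amount up to max_step, in a randomly chosen direction.
        mutated = []
        for gene in chromosome:
            if random.random() < p_mutation:
                direction = 1.0 if random.random() < 0.5 else -1.0  # "coin flip"
                gene = gene + direction * random.uniform(0.0, max_step)
            mutated.append(gene)
        return mutated

    # e.g., mutate([1.29, 5.68, 2.86, 4.11, 5.55]) might return something like
    # [1.29, 5.68, 2.73, 4.11, 5.55], leaving most genes unchanged.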
Selection of Parent Chromosomes: Various methods have been proposed to select the chromosomes of one population that will become the parents of the chromosomes of the following population. All are based on some reference to the fitness of individual chromosomes. A common method of parent selection is called “roulette wheel” selection, which picks individuals from the present population at random, weighted by their fitness. Let the probability of individual i being selected to become a parent be prob(Pi), define the fitness of individual i to be fi, and assume there are n individuals in the population. The probability of individual i being selected, then, is:

    prob(Pi) = individual fitness / (sum of fitness of all individuals)
             = fi / Σ(j=1..n) fj                                            ...[12.1]
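A sketch of roulette wheel selection in Python, implementing equation [12.1]; it assumes all fitness values are non-negative (when they are not, a common workaround is to shift them first):

    import random

    def roulette_wheel_select(population, fitnesses):
        # prob(Pi) = fi / (sum of all fj): spin a "wheel" whose slots are
        # proportional to each individual's fitness.
        total = sum(fitnesses)
        spin = random.uniform(0.0, total)
        running = 0.0
        for individual, f in zip(population, fitnesses):
            running += f
            if running >= spin:
                return individual
        return population[-1]   # guard against floating-point round-off

    # To create one offspring, two parents are drawn this way:
    # parent_a = roulette_wheel_select(population, fitnesses)
    # parent_b = roulette_wheel_select(population, fitnesses)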
Selection with Elitism: When creating a new population of n individuals, it is often advantageous to retain a few of the best individuals from the previous population. Doing so is called “elitism”. It works by retaining the best m individuals from the old population and replacing only the remaining n - m with new offspring. This prevents the loss of the best solutions that have been found.
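A brief sketch of how elitism can be layered onto the replacement step; the helper names follow the sketches above and are illustrative:

    def next_generation(population, fitnesses, n_elite, make_offspring):
        # Keep the n_elite fittest individuals, then fill the rest of the new
        # population with offspring produced by selection/crossover/mutation.
        ranked = [ind for _, ind in sorted(zip(fitnesses, population),
                                           key=lambda pair: pair[0],
                                           reverse=True)]
        new_population = ranked[:n_elite]
        while len(new_population) < len(population):
            new_population.append(make_offspring(population, fitnesses))
        return new_population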
12.2.5 Parameters of GA
The most important parameters in a genetic algorithm are:

• The population size, n: The population size controls the quantity of genetic material that is being examined at any one time in search of an optimal solution. The greater the population size, the greater will be the amount of genetic material under consideration and, hence, the greater will be the likelihood that better solutions will be discovered in the next generation. However, as the population size increases, the amount of computer resources and time required to evaluate population fitness also increases.

• The crossover probability: The crossover probability governs how often crossover will take place in combining the genes of both parents to produce an offspring. Crossover that is too frequent might separate genes that are close together on the chromosome of one of the parents but which act in combination in a favorable manner. Crossover that is not frequent enough runs the risk of producing an offspring that is too similar to one or the other of the parents.

• The mutation probability: The mutation probability governs how often genes will be mutated. Mutations prevent the algorithm from converging to a local optimum, but mutations that are too frequent might cause the algorithm to wander and thereby slow the rate of convergence to a global optimum.
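In practice these parameters are simply collected and passed to the GA routine. A minimal sketch of such a parameter set follows; the default values shown are simply those used in the example of Section 12.3 (Table 12.2):

    from dataclasses import dataclass

    @dataclass
    class GAParameters:
        population_size: int = 100          # n
        crossover_probability: float = 0.5
        mutation_probability: float = 0.05
        max_generations: int = 100
        n_elite: int = 5                    # number of elite individuals retained

    params = GAParameters()   # or override any value, e.g. GAParameters(mutation_probability=0.01)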
12.3 EXAMPLE GA PROBLEM
Consider the following simple nonlinear optimization problem:

    Max Z = f(x, y) = 1000 - [(x - 1)^2 + (y - 1)^2]        ...[12.2]

    s.t.:
        -10 ≤ x ≤ 10                                        ...[12.3]
        -10 ≤ y ≤ 10                                        ...[12.4]
This problem was solved with MacsGA (refer to Appendix 2) using the GA parameter values
shown in Table 12.2. Figure 12.8 shows a plot of the maximum, mean, and minimum fitness
(objective function) values of the population versus generation. Note that for a population size
of 100, the initial maximum fitness is very high. Also note that even the mean population fitness
rapidly converges to the optimal solution (i.e., Z = 1000), but that the minimum fitness per generation tends to wander as a result of the random selection and mutation processes built into the
GA code.
Table 12.2: Genetic Algorithm Parameters Used in Solving Example Problem

    Parameter                      Value
    Probability of Crossover       0.5
    Probability of Mutation        0.05
    Population Size                100
    Number of Generations          100
    Number of Elite Individuals    5
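A rough way to reproduce this experiment without MacsGA is to run a value-encoded GA such as the sketch in Section 12.2.3 on objective [12.2]. The snippet below only shows how the problem plugs in as a fitness function with its bounds; it is illustrative, not the MacsGA implementation:

    # Objective and bounds for the example problem [12.2]-[12.4]; this is the
    # fitness function a value-encoded GA would maximize.
    def fitness(chromosome):
        x, y = chromosome            # two value-encoded genes
        return 1000.0 - ((x - 1.0) ** 2 + (y - 1.0) ** 2)

    X_BOUNDS = (-10.0, 10.0)
    Y_BOUNDS = (-10.0, 10.0)

    # Sanity check: the analytical optimum is at x = y = 1 with Z = 1000.
    assert fitness([1.0, 1.0]) == 1000.0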
Figure 12.8: GA Convergence in Solution of a Simple Quadratic Problem
(Panel (a), “Example of GA Convergence”, plots the minimum, mean, and maximum fitness of the population versus generation over 100 generations, with fitness values ranging from roughly 800 to 1050. Panel (b), “Maximum Fitness Curve”, plots the maximum fitness for the first 30 generations on a finer scale, from about 999.4 to 1000.1.)
12.4 PROBLEMS
1. Evaluate the sensitivity of the rate of convergence of the problem presented in Section 12.3 to different GA parameter values (e.g., probability of crossover, mutation probability, etc.). Plot and discuss your results.
2. Solve Problem 2 from Section 11 using a genetic algorithm.