Genetic Algorithms
Data Mining and Distributed Computing Group
Facultad de Informática
DATSI, Universidad Politécnica de Madrid

Overview
• A class of probabilistic optimization algorithms
• Inspired by the biological evolution process
• Uses the concepts of “Natural Selection” and “Genetic Inheritance” (Darwin 1859)
• Originally developed by John Holland (1975)
• Particularly well suited for hard problems where little is known about the underlying search space
• Widely used in business, science and engineering

Natural Evolution
Darwin: The Origin of Species
– Search for optimal forms (adapted to the environment)
– Based on:
  • assortment: recombination of genetic material
  • randomness
  • selection: survival of the fittest

Biological Concepts (Cell)
• A set of many small “factories” working together
• Center: cell nucleus
• The nucleus contains the genetic information

Biological Concepts (Chromosome)
• Chromosomes store genetic information
• Each chromosome is built of DNA
• Chromosomes in humans form pairs (23 pairs)
• The chromosome is divided into parts: genes
• Genes code for properties
• Possible values of a gene: alleles
• Position of a gene in the chromosome: locus

Biological Concepts (Genetics)
• The entire combination of genes: genotype
• A genotype is expressed as a phenotype
• Alleles can be either dominant or recessive
• Dominant alleles are always expressed in the phenotype
• Recessive alleles can survive in the population for many generations without being expressed

Biological Concepts (Reproduction)
• Mitosis: the same genetic information is copied to the new offspring; there is no exchange of information
• Mitosis is the normal way in which multicellular structures, such as organs, grow

Biological Concepts (Reproduction)
• Meiosis: the basis of sexual reproduction
• Meiotic division produces gametes
• In reproduction, two gametes fuse into a zygote, which will become the new individual
• Genetic information from both parents is combined to create the new offspring

Biological Concepts (Reproduction)
• During reproduction “errors” occur
• Due to these “errors” genetic variation exists
• The most important “errors” are:
  – Recombination (cross-over)
  – Mutation

Biological Concepts (Natural Selection)
• The Origin of Species: “Preservation of favourable variations and rejection of unfavourable variations.”
• Survival of the fittest
• Mathematically expressed as fitness: success in life

Genetic Algorithms (GAs)
John Holland (~1973)
– GAs use the principle of natural selection to solve complicated optimization problems.
– An alternative to brute-force search on a complex solution space:
  • Instead of complete exploration
  • Stochastically driven search

Genetic Algorithms (GAs)
Search techniques (© W. Williams, ‘99)
– Calculus-based techniques
  • Direct methods
  • Indirect methods
  • Examples named on the slide: Fibonacci, Newton
– Guided random search techniques
  • Evolutionary algorithms
    – Evolutionary strategies
    – Genetic algorithms
      • Parallel: centralized, distributed
      • Sequential: steady-state, generational
  • Simulated annealing
– Enumerative techniques
  • Dynamic programming

Genetic Algorithms: Definitions
Concepts:
– Individual: a possible solution of the problem.
– Gene: a solution component (a.k.a. an attribute) [eye color]
– Allele: a gene value [eye color = green]
– Population: a group of potential solutions
Example individuals shown on the slide:
  10011101011011
  -19 4 108 46
  0.433 -33.345 0.0013
  [Figure: expression tree built from the nodes +, x, A, B, 3]

Genetic Algorithms: Definitions
Concepts:
– Genotype: genetic codification (sequence of genes)
– Phenotype: the actual individual (with its capabilities)
For GAs, the phenotype represents the fitness value of the solution.
Assumption: there exists a relationship between genetic information and individual fitness.

Fitness Landscapes
Fitness:
Function that evaluates individual capabilities.
Graphical representation: N+1 dimensions
[Figure: fitness landscape with a “fitness” axis. © L.J. Pit, ‘96]

Natural Evolution and Operators
How are new individuals created?
– Crossover (recombination):
  • Mixing existing genetic material, assortment
  • Inheritance of useful characteristics
– Mutation:
  • Rare (very rare from a biological perspective)
  • Randomness
They simulate the evolution process

Natural Evolution and Operators
Crossover (crossover point #9: the first 9 bits come from the first parent, the rest from the second)
  Parent 1:  00010100010001
  Parent 2:  10011101011011
  Offspring: 00010100011011
Variants:
  • 2, 3, n-point crossover
  • Uniform crossover
  • Other...

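The crossover example above can be reproduced in a few lines of Python. This is a minimal sketch, assuming individuals are stored as strings of '0' and '1'; the function names `one_point_crossover` and `uniform_crossover` are illustrative and do not come from the slides.

```python
import random

def one_point_crossover(parent_a, parent_b, point):
    """Return two children that swap tails after the first `point` genes."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def uniform_crossover(parent_a, parent_b, rng):
    """Pick each gene independently from either parent with probability 0.5."""
    return "".join(rng.choice(pair) for pair in zip(parent_a, parent_b))

# Slide example: cut after gene #9.
child, _ = one_point_crossover("00010100010001", "10011101011011", 9)
print(child)  # 00010100011011
```

An n-point variant would apply the same tail swap at several cut positions in turn.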
Natural Evolution and Operators
Mutation (bit #5 is flipped)
  Before: 00010100011011
  After:  00011100011011
Variants:
  • Mutation-by-swap
  • Biased mutation
  • Cataclysmic mutation

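A short sketch of the mutation example, again assuming bit-string individuals. `flip_bit` uses the 1-based position shown on the slide. `swap_mutation` is one common reading of "mutation-by-swap" (exchange two genes); the slides do not define it, so treat that part as an assumption.

```python
import random

def flip_bit(individual, position):
    """Flip the bit at a 1-based position, as numbered on the slide."""
    i = position - 1
    flipped = "1" if individual[i] == "0" else "0"
    return individual[:i] + flipped + individual[i + 1:]

def swap_mutation(individual, rng):
    """Assumed meaning of mutation-by-swap: exchange two distinct random genes."""
    i, j = rng.sample(range(len(individual)), 2)
    genes = list(individual)
    genes[i], genes[j] = genes[j], genes[i]
    return "".join(genes)

print(flip_bit("00010100011011", 5))  # 00011100011011
```

Swapping keeps the number of ones unchanged, which is why it is mostly useful for permutation-style encodings rather than for bit counting problems.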
Natural Evolution and Operators
Selection:
1. Evaluation of individuals (fitness)
2. Parent selection (for each crossover operation).
Selective pressure: the relationship between the capacities (fitness) of an individual and its chances of generating new individuals (participating in reproduction).

Natural Evolution and Operators
Selection (alternatives):
– Tournament selection
– Rank-based: continuous function
– Roulette wheel
Variants:
– Previous population selection (worst individual elimination).

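Two of the selection alternatives listed above can be sketched as follows. The slides give no formulas for them, so the details are assumptions: the tournament draws `k` distinct contestants, and the rank-based scheme uses linear weights (worst = 1, best = N). The function names are illustrative.

```python
import random

def tournament_select(population, fitness, k, rng):
    """Draw k distinct contestants at random and return the fittest of them."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

def rank_select(population, fitness, rng):
    """Linear ranking: the selection weight grows with rank (worst = 1, best = N)."""
    ranked = sorted(population, key=fitness)
    weights = list(range(1, len(ranked) + 1))
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(0)
population = ["1111010101", "0111000101", "0100110000"]
parent = tournament_select(population, lambda s: s.count("1"), k=2, rng=rng)
```

A larger `k` raises the selective pressure: with `k` equal to the population size the best individual always wins.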
Canonical Genetic Algorithm
[Flowchart, reconstructed from the slide labels]
P0 (initial population)
→ end condition?
→ fitness function
→ selection (includes the chances of participation in reproduction)
→ P’i (selected population)
→ crossover/mutation
→ Pi+1 (next population), then back to the end condition

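The loop in the flowchart can be written as a compact generational GA. This is a sketch under stated assumptions, not the authors' implementation: individuals are bit strings of length at least 2, fitness values are non-negative, selection is a roulette wheel, and the parameter names (`p_cross`, `p_mut`, `target`) are hypothetical.

```python
import random

def run_ga(fitness, length, pop_size=6, p_cross=0.6, p_mut=0.1,
           max_generations=100, target=None, seed=0):
    """Generational GA over bit strings; returns the best of the final population."""
    rng = random.Random(seed)
    population = ["".join(rng.choice("01") for _ in range(length))
                  for _ in range(pop_size)]                       # P0
    for _ in range(max_generations):
        scores = [fitness(s) for s in population]                 # fitness function
        if target is not None and max(scores) >= target:          # end condition?
            break
        # Selection: roulette wheel; the tiny offset avoids an all-zero wheel.
        weights = [score + 1e-9 for score in scores]
        selected = rng.choices(population, weights=weights, k=pop_size)
        # Crossover on consecutive couples, each with probability p_cross.
        offspring = []
        for i in range(0, pop_size - 1, 2):
            a, b = selected[i], selected[i + 1]
            if rng.random() < p_cross:
                cut = rng.randint(1, length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring += [a, b]
        if pop_size % 2:                                          # odd one out is copied
            offspring.append(selected[-1])
        # Mutation: flip each bit independently with probability p_mut.
        population = ["".join("10"[int(bit)] if rng.random() < p_mut else bit
                              for bit in s) for s in offspring]   # P(i+1)
    return max(population, key=fitness)

best = run_ga(lambda s: s.count("1"), length=10, pop_size=20, p_mut=0.02, target=10)
```

Because the loop is stochastic, the result depends on `seed`; fixing it only makes a run repeatable.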
GA Variants (Replacement)
Elitism:
– The best individual(s) in Pi stay in population Pi+1.
– Provides a monotonic fitness increment (the best fitness never decreases)
Steady-state:
– Only one individual is created per generation.
– The new one replaces the worst individual.

Replacement Overview
Pi (current population, size = N) → selection, crossover/mutation → Offi (offspring population) → P’i (next population)

                   Canonical    Steady-state    Elitism
Offspring size     N            1               >, = or < N
Current → Next     0            N-1             << N
Replacement        All          Worst           Worst

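The three replacement schemes in the table can be sketched as plain functions over lists. The names and the `n_elite` parameter are hypothetical, and the elitist variant shown (keep the best current individuals, fill the rest with the best offspring) is one reasonable reading of the table rather than the only one.

```python
def canonical_replacement(current, offspring):
    """Canonical GA: the N offspring replace the whole current population."""
    return list(offspring)

def steady_state_replacement(current, child, fitness):
    """Steady-state: a single child replaces the worst current individual."""
    worst = min(range(len(current)), key=lambda i: fitness(current[i]))
    return current[:worst] + [child] + current[worst + 1:]

def elitist_replacement(current, offspring, fitness, n_elite=1):
    """Elitism: the n_elite best current individuals survive; the best offspring fill the rest."""
    elite = sorted(current, key=fitness, reverse=True)[:n_elite]
    rest = sorted(offspring, key=fitness, reverse=True)[:len(current) - n_elite]
    return elite + rest

ones = lambda s: s.count("1")
print(steady_state_replacement(["000", "011", "111"], "110", ones))
# ['110', '011', '111']
```

With `elitist_replacement`, the best fitness of the next population is never lower than that of the current one, which is the monotonic behaviour attributed to elitism.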
The Metaphor
Genetic Algorithm | Nature
Optimization problem | Environment
Feasible solutions | Individuals living in that environment
Solutions quality (fitness function) | Individual’s degree of adaptation to its surrounding environment
A set of feasible solutions | A population of organisms (species)
Stochastic operators | Selection, recombination and mutation in nature’s evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions | Evolution of populations to suit their environment

Aspects to consider: Population Size
What’s the optimal size of the population?
– Too small: premature convergence (weak exploration)
– Too large: waste of resources

Aspects to consider: Problem representation
A key aspect: chromosome representation
– Initially, a binary string
– Nowadays, other possibilities:
  • Bit strings: (0101 ... 1100)
  • Real numbers: (43.2 -33.1 ... 0.0 89.2)
  • Permutations of elements: (E11 E3 E7 ... E1 E15)
  • Lists of rules: (R1 R2 R3 ... R22 R23)
  • Program elements: (genetic programming)
  • ... any data structure ...
– Great influence on the problem resolution

GA elements to define
• Chromosome representation
• Creation of initial population
• Fitness function
• Genetic operators
• Parameters (population size, probabilities of the genetic operators, etc.)
Typical configuration for small problems:
  Population size:          50 – 100
  Children per generation:  = population size
  Crossovers:               > 85%
  Mutations:                < 5%
  Generations:              20 – 20,000

Example (The MAXONE problem)
• Suppose we want to maximize the number of ones in a string of m binary digits
• Is it a trivial problem? It may seem so because we know the answer in advance
• However, we can think of it as maximizing the number of correct answers, each encoded by 1, to m difficult yes/no questions

MAXONE problem
• An individual is encoded (naturally) as a string of m binary digits
• The fitness f of a candidate solution to the MAXONE problem is the number of ones in its genetic code
• We start with a population of n random strings. Suppose that m = 10 and n = 6

MAXONE problem (initialization)
• We toss a fair coin 60 times and get the following initial population:
  s1 = 1111010101    f(s1) = 7
  s2 = 0111000101    f(s2) = 5
  s3 = 1110110101    f(s3) = 7
  s4 = 0100010011    f(s4) = 4
  s5 = 1110111101    f(s5) = 8
  s6 = 0100110000    f(s6) = 3

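The encoding, the fitness function and the coin-toss initialization can be sketched as below. The function names are illustrative; the six strings are copied from the slide so that the listed fitness values can be checked.

```python
import random

def maxone_fitness(individual):
    """MAXONE fitness: the number of ones in the bit string."""
    return individual.count("1")

def random_population(n, m, rng):
    """n individuals of m bits each, one fair coin toss per bit."""
    return ["".join(rng.choice("01") for _ in range(m)) for _ in range(n)]

slide_population = ["1111010101", "0111000101", "1110110101",
                    "0100010011", "1110111101", "0100110000"]
fitnesses = [maxone_fitness(s) for s in slide_population]
print(fitnesses, sum(fitnesses))  # [7, 5, 7, 4, 8, 3] 34

fresh = random_population(n=6, m=10, rng=random.Random(42))  # a different random start
```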
MAXONE problem (selection)
• Next we apply fitness-proportionate selection with the roulette wheel method: individual i is chosen with probability $p(i) = f(i) / \sum_{j} f(j)$
• We repeat the extraction as many times as the number of individuals we need to have the same parent population size (6 in our case)
[Figure: roulette wheel with sectors 1, 2, 3, 4, ..., n; the area of each sector is proportional to the fitness value]

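A roulette wheel spin can be sketched as a walk over cumulative fitness values. This is a minimal sketch assuming non-negative fitness values with a positive sum; the function names are hypothetical. Applied to the initial fitness values 7, 5, 7, 4, 8, 3 (total 34), the fittest individual gets probability 8/34 and the weakest 3/34.

```python
import random
from itertools import accumulate

def selection_probabilities(fitnesses):
    """p(i) = f(i) / sum of all f(j)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_select(population, fitnesses, rng):
    """Spin once: return the individual whose cumulative slice contains the pointer."""
    pointer = rng.uniform(0, sum(fitnesses))
    for individual, upper in zip(population, accumulate(fitnesses)):
        if pointer < upper:
            return individual
    return population[-1]  # guard against the pointer landing exactly on the total

fitnesses = [7, 5, 7, 4, 8, 3]
probabilities = selection_probabilities(fitnesses)

rng = random.Random(1)
names = ["s1", "s2", "s3", "s4", "s5", "s6"]
parents = [roulette_select(names, fitnesses, rng) for _ in range(6)]
```

Repeating the spin six times gives a parent population of the same size as the original one; individuals may appear more than once or not at all.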
MAXONE problem (selection)
• Suppose that, after performing selection, we get the following population:
  s1' = 1111010101   (s1)
  s2' = 1110110101   (s3)
  s3' = 1110111101   (s5)
  s4' = 0111000101   (s2)
  s5' = 0100010011   (s4)
  s6' = 1110111101   (s5)

MAXONE problem (crossover)
• Next we mate strings for crossover. For each couple we decide, according to the crossover probability (for instance 0.6), whether to actually perform crossover or not
• Suppose that we decide to actually perform crossover only for couples (s1', s2') and (s5', s6'). For each couple, we randomly extract a crossover point, for instance 2 for the first and 5 for the second

MAXONE problem (crossover)
Before crossover:
  s1' = 1111010101
  s2' = 1110110101
  s5' = 0100010011
  s6' = 1110111101
After crossover:
  s1'' = 1110110101
  s2'' = 1111010101
  s5'' = 0100011101
  s6'' = 1110110011

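The mating step can be sketched as follows, assuming consecutive individuals form the couples. `mate` makes the random decisions described above, while the last lines replay the slide's outcome with its stated cut points (2 and 5). The function names are hypothetical.

```python
import random

def crossover_pair(a, b, point):
    """Swap the tails of two equal-length strings after `point` bits."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mate(population, p_crossover, rng):
    """Pair consecutive individuals; cross each couple with probability p_crossover."""
    offspring = []
    for i in range(0, len(population) - 1, 2):
        a, b = population[i], population[i + 1]
        if rng.random() < p_crossover:
            point = rng.randint(1, len(a) - 1)  # cut strictly inside the string
            a, b = crossover_pair(a, b, point)
        offspring += [a, b]
    if len(population) % 2:                     # an unpaired individual is copied
        offspring.append(population[-1])
    return offspring

# Replaying the slide: couples (s1', s2') and (s5', s6') with cut points 2 and 5.
s1, s2 = crossover_pair("1111010101", "1110110101", 2)
s5, s6 = crossover_pair("0100010011", "1110111101", 5)
print(s1, s2, s5, s6)  # 1110110101 1111010101 0100011101 1110110011
```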
MAXONE problem (mutation)
• The final step is to apply random mutation: for each bit that we are to copy to the new population we allow a small probability of error (for instance 0.1)
• Before applying mutation:
  s1'' = 1110110101
  s2'' = 1111010101
  s3'' = 1110111101
  s4'' = 0111000101
  s5'' = 0100011101
  s6'' = 1110110011

MAXONE problem (mutation)
After applying mutation:
  s1''' = 1110100101    f(s1''') = 6
  s2''' = 1111110100    f(s2''') = 7
  s3''' = 1110101111    f(s3''') = 8
  s4''' = 0111000101    f(s4''') = 5
  s5''' = 0100011101    f(s5''') = 5
  s6''' = 1110110001    f(s6''') = 6

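The mutation step can be checked against the listed strings. The sketch below defines a per-bit mutation (the name `mutate` and its parameters are assumptions) and then compares the populations before and after mutation, copied from the slides: 6 of the 60 bits differ, which matches the example probability of 0.1.

```python
import random

def mutate(individual, p_mutation, rng):
    """Flip each bit independently with probability p_mutation."""
    return "".join("10"[int(bit)] if rng.random() < p_mutation else bit
                   for bit in individual)

before = ["1110110101", "1111010101", "1110111101",
          "0111000101", "0100011101", "1110110011"]
after = ["1110100101", "1111110100", "1110101111",
         "0111000101", "0100011101", "1110110001"]

# Number of flipped bits per individual (Hamming distance).
flips = [sum(x != y for x, y in zip(b, a)) for b, a in zip(before, after)]
fitness_after = [s.count("1") for s in after]
print(flips)           # [1, 2, 2, 0, 0, 1]
print(sum(flips) / 60) # 0.1
print(fitness_after)   # [6, 7, 8, 5, 5, 6]
```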
MAXONE problem
• In one generation, the total population fitness changed from 34 to 37, thus improved by ~9%.
• At this point, we go through the same process all over again, until a stopping criterion is met

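The figures quoted above follow directly from the fitness values listed for the initial and the mutated populations:

```latex
\[
\sum_{i=1}^{6} f(s_i) = 7 + 5 + 7 + 4 + 8 + 3 = 34,
\qquad
\sum_{i=1}^{6} f(s_i''') = 6 + 7 + 8 + 5 + 5 + 6 = 37,
\]
\[
\frac{37 - 34}{34} \approx 0.088 \approx 9\%.
\]
```

Note that the best individual has fitness 8 both before and after this generation; only the population total improved.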
Local Optimizations (Lamarck)
Fitness evaluation searches the neighbourhood of the individual for an improved solution:
– Lamarckian evolution: the inheritance of acquired characteristics.
– Methods:
  • Hill-climbing
  • Simulated annealing

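A hill-climbing local search over bit strings could look like the sketch below. The neighbourhood (all strings one bit flip away) and the function names are assumptions made for illustration; the slides do not specify them. In the Lamarckian reading, the improved string is written back into the individual, so the acquired improvement is inherited.

```python
def hill_climb(individual, fitness):
    """Move to the best single-bit-flip neighbour until no neighbour improves."""
    current = individual
    while True:
        neighbours = [current[:i] + "10"[int(current[i])] + current[i + 1:]
                      for i in range(len(current))]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(current):
            return current
        current = best

def lamarckian_evaluate(individual, fitness):
    """Return the locally improved individual together with its fitness."""
    improved = hill_climb(individual, fitness)
    return improved, fitness(improved)

print(lamarckian_evaluate("0100110000", lambda s: s.count("1")))
# ('1111111111', 10)
```

MAXONE has no local optima under single-bit flips, so the climb always reaches the all-ones string; on harder landscapes it stops at the nearest local optimum.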
Local Optimizations (Baldwin)
Other alternatives:
– Evolve the fitness function
– But keep the genetic representation
As a result:
– New fitness landscape
[Figure: fitness landscape. © L.J. Pit, ‘96]

GAs: Why do they work?
• Schema Theorem
• Building Blocks Hypothesis

Schema Notation
• {0,1,#} is the symbol alphabet, where # is a special wild card symbol
• A schema is a template consisting of a string composed of these three symbols
• Example: the schema [1#00#] matches the strings: [10000], [10001], [11000] and [11001]

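Schema matching is easy to express in code. The sketch below uses hypothetical helper names; `order` (number of fixed positions) and `defining_length` (distance between the first and last fixed positions) are the standard measures behind the terms "low-order" and "short" schemata.

```python
from itertools import product

def matches(schema, string):
    """True if every non-wildcard symbol of the schema equals the string's bit."""
    return len(schema) == len(string) and all(
        s == "#" or s == b for s, b in zip(schema, string))

def expand(schema):
    """List every bit string that the schema matches."""
    slots = [("0", "1") if s == "#" else (s,) for s in schema]
    return ["".join(bits) for bits in product(*slots)]

def order(schema):
    """Number of fixed (non-wildcard) positions."""
    return sum(s != "#" for s in schema)

def defining_length(schema):
    """Distance between the first and the last fixed position."""
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return fixed[-1] - fixed[0] if fixed else 0

print(expand("1#00#"))  # ['10000', '10001', '11000', '11001']
print(order("1#00#"), defining_length("1#00#"))  # 3 3
```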
Schema
• “A schema is a similarity template describing a subset of strings with similarities at certain positions” John Holland
• Selection gives above-average schemata increased representation in the next generation.
• Crossover tends to destroy schemata with a long defining length, and mutation tends to destroy high-order schemata. Thus short, low-order schemata have a better chance of surviving to the next generation

Schema Theorem and Building Block
• GAs explore the search space by short, low-order schemata which, subsequently, are used for information exchange during crossover
• Objective: building blocks (sets of alleles with a significant contribution to fitness/solution quality).

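For reference, the schema theorem is usually stated as the inequality below. The slides do not give the formula, so the notation is an assumption: m(H, t) is the number of individuals matching schema H in generation t, f(H, t) their average fitness, \bar{f}(t) the average fitness of the population, \delta(H) the defining length, o(H) the order, l the string length, and p_c, p_m the crossover and mutation probabilities, for a GA with fitness-proportionate selection, one-point crossover and bit-flip mutation.

```latex
\[
E\big[m(H, t+1)\big] \;\ge\; m(H, t)\,\frac{f(H, t)}{\bar{f}(t)}
\left[\, 1 - p_c\,\frac{\delta(H)}{l - 1} - o(H)\,p_m \right]
\]
```

The bracket is close to 1 when \delta(H) and o(H) are small, which is why short, low-order schemata with above-average fitness are expected to spread.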
Building Blocks Hypothesis
• “A genetic algorithm seeks near-optimal performance through the juxtaposition of short, low-order, high-performance schemata, called the building blocks”
• Some building blocks (short, low-order schemata) can mislead the GA and cause it to converge to suboptimal points

Differences between GA and Conventional Search Algorithms
• GA works on a coding of the parameter set, not the parameters themselves
• GA searches from a population of points, not a single point
• GA uses only a payoff function, and no domain knowledge
• GA uses probabilistic transition rules, not deterministic ones
• GA can provide a number of potential solutions to a given problem. The final choice is left to the user.

Conclusions
“Genetic Algorithms are good at taking large, potentially huge
search spaces and navigating them, looking for optimal
combinations of things, solutions you might not otherwise find
in a lifetime.”
Salvatore Mangano
Computer Design, May 1995