Lecture 10: Learning - Genetic algorithms

Intro to AI
Genetic Algorithm
Ruth Bergman
Fall 2002
Imitating Nature
Aspects of the evolution of organisms:
• Organisms that are ill-suited to their environment
have little chance to reproduce (natural selection)
• Conversely, the best-fitted organisms have a better chance to
survive and reproduce
Imitating Nature
Reproduction:
• Offspring are similar to their parents
• Random mutations occur, and they can produce better (or worse)
fitted individuals
“On the Origin of Species by Means of Natural Selection”, C.
Darwin (1859)
Encoding:
• An organism is fully represented by its DNA string, that is, a
string over a finite alphabet (4 symbols)
• Each element of this string is called a gene
Genetic Algorithm (GA)
• Developed by John Holland in the early 1970s
• Optimization and machine learning techniques
inspired by the process of natural evolution and
evolutionary genetics
– Solutions are encoded as chromosomes
– Search proceeds by maintaining a population of
solutions
– Reproduction favors “better” chromosomes
– New chromosomes are generated during reproduction
through processes such as mutation and crossover
GA Framework
[Figure: the GA cycle. A population of bit strings drawn from the search space (A = 0 1 0 0 0, B = 1 0 1 1 0, C = 1 1 0 1 0, D = 0 1 0 1 1) passes through the stages labeled selection, crossover, mutation, fitness evaluation, and reproduction.]
GA Procedure
• Start with a population of N individuals
1. Apply the fitness function to all the individuals
2. Select pairs of individuals for reproduction (repetition
allowed)
3. Each pair generates two children (reproduction with crossover)
4. Apply a random mutation to the children. The children become
the next generation
5. Repeat steps 1-4 until some termination criterion applies
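The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the lecture's own code: the names `run_ga` and `balance_fitness`, the default parameter values, the small constant added to the selection weights, and the choice to remember the best individual seen so far are all assumptions made for the example.

```python
import random

def balance_fitness(h):
    """Example fitness from the slides: 8 - |n1 - n0| for an 8-bit string."""
    n1 = sum(h)
    return len(h) - abs(n1 - (len(h) - n1))

def run_ga(fitness, n_bits=8, pop_size=4, p_mut=0.01, max_gens=200,
           target=None, rng=None):
    """Hypothetical helper: returns the best bit list seen during the run."""
    rng = rng or random.Random()
    # Start with a population of N random individuals.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gens):
        # Termination: stop once the (optional) target fitness is reached.
        if target is not None and fitness(best) >= target:
            break
        # Step 1: apply the fitness function to all individuals.
        scores = [fitness(h) for h in pop]
        # Assumption: a tiny offset keeps the weights valid if all scores are 0.
        weights = [s + 1e-9 for s in scores]
        children = []
        while len(children) < pop_size:
            # Step 2: select a pair, with repetition allowed.
            a, b = rng.choices(pop, weights=weights, k=2)
            # Step 3: one-point crossover yields two children.
            c = rng.randint(1, n_bits)
            for child in (a[:c] + b[c:], b[:c] + a[c:]):
                # Step 4: flip each bit with a small independent probability.
                children.append(
                    [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
                )
        pop = children[:pop_size]  # the children become the next generation
        # Assumption: remember the best individual seen so far.
        best = max(pop + [best], key=fitness)
    return best

best = run_ga(balance_fitness, target=8, rng=random.Random(0))
print(best, balance_fitness(best))
```

Passing a seeded `random.Random` makes a run repeatable, which helps when comparing parameter settings.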
Encoding Scheme
• An individual (an organism) is intended to be a
possible solution to the problem you want to solve
• An individual is represented by a binary string. Such
a string is intended to be the complete description of
the individual
• Example:
Suppose you have to find a number between 0 and 255
whose binary representation contains the same number of 1s
and 0s.
An individual is a string of 8 bits, e.g.:
h = 0 1 1 1 1 1 1 0 = 126
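As a small illustration of this encoding, the sketch below converts between an integer in 0..255 and its 8-bit string. The helper names `encode` and `decode` are hypothetical and chosen only for this example.

```python
def encode(value, n_bits=8):
    """Return the n_bits-wide binary form of value as a list of 0/1 ints."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError("value does not fit in n_bits")
    return [int(ch) for ch in format(value, f"0{n_bits}b")]

def decode(bits):
    """Return the integer represented by a list of 0/1 ints."""
    return int("".join(str(b) for b in bits), 2)

h = encode(126)
print(h)          # [0, 1, 1, 1, 1, 1, 1, 0]
print(decode(h))  # 126
```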
Fitness Function
• A fitness function says how good a solution is,
i.e. how well an individual fits the environment
• Example
$f(h) = 8 - |n_1 - n_0|$
where $n_1$ and $n_0$ are the numbers of 1s and 0s in $h$.
Note that the fitness function takes its minimum value (i.e. 0)
when $n_1 = 8$ or $n_0 = 8$, and its maximum value (i.e. 8)
when $n_1 = n_0 = 4$
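The example fitness function is easy to check in code. The following is a minimal sketch, assuming individuals are stored as lists of 0/1 integers (a representation chosen here, not stated in the slides).

```python
def fitness(h):
    """8 - |n1 - n0| for an 8-bit string; in general len(h) - |n1 - n0|."""
    n1 = sum(h)          # number of 1s
    n0 = len(h) - n1     # number of 0s
    return len(h) - abs(n1 - n0)

print(fitness([0, 1, 1, 1, 1, 1, 1, 0]))  # 4
print(fitness([1, 1, 1, 1, 1, 1, 1, 1]))  # 0
print(fitness([0, 0, 1, 1, 0, 1, 1, 0]))  # 8
```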
The Initial Population
0 1 1 1 1 1 1 0
1 1 1 1 1 1 1 0
0 0 1 0 0 1 0 0
0 0 0 0 0 0 0 1
Optimization
• Avoiding local optima
cf. hill-climbing method vs. GA search method
Selection
• Roulette wheel selection
– Compute each individual’s contribution to the global fitness as
$P(h_i) = f(h_i) / \sum_{j=1}^{N} f(h_j)$
– Pairs for reproduction are chosen by randomly drawing
individuals (with replacement) according to the distribution P
     encoding           fitness   P(-)
A    0 1 1 1 1 1 1 0    4         .33
B    1 1 1 1 1 1 1 0    2         .17
C    0 0 1 0 0 1 0 0    4         .33
D    0 0 0 0 0 0 0 1    2         .17
[Figure: Roulette Wheel with slices A 33%, B 17%, C 33%, D 17%]
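Roulette wheel selection can be sketched as a "spin" over cumulative probabilities. The code below is an illustration under assumptions: the function names `selection_probabilities` and `roulette_pick` are hypothetical, and fitness values are assumed to be non-negative with a positive sum.

```python
import random

def selection_probabilities(fitnesses):
    """Each individual's share of the total fitness."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def roulette_pick(probs, rng=random):
    """Return the index of the slice in which a uniform random spin lands."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off

probs = selection_probabilities([4, 2, 4, 2])  # fitness of A, B, C, D
print([round(p, 2) for p in probs])  # [0.33, 0.17, 0.33, 0.17]

# Draw one pair for reproduction; the same index may come up twice.
pair = (roulette_pick(probs), roulette_pick(probs))
print(pair)
```

Python's standard library also offers `random.choices(population, weights=...)`, which performs the same kind of weighted draw with replacement.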
Crossover
– Randomly choose a crossover point “c”, i.e. a number
between 1 and n
– Return two children: one composed of the first c bits of the
first parent and the last n-c bits of the second parent, the
other composed of the first c bits of the second parent and
the last n-c bits of the first parent
Crossover point c = 6 (marked by |):
     parents                   children
A    0 1 1 1 1 1 | 1 0    →    0 1 1 1 1 1 0 0
C    0 0 1 0 0 1 | 0 0    →    0 0 1 0 0 1 1 0

B    1 1 1 1 1 1 | 1 0    →    1 1 1 1 1 1 0 1
D    0 0 0 0 0 0 | 0 1    →    0 0 0 0 0 0 1 0
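One-point crossover as described above can be sketched as follows. The function name `crossover` and the optional `c` argument (which fixes the crossover point so an example can be reproduced) are assumptions of this sketch. With c = 6 it reproduces the bit strings shown above.

```python
import random

def crossover(p1, p2, c=None, rng=random):
    """Swap the tails of two equal-length parents after position c."""
    n = len(p1)
    if c is None:
        c = rng.randint(1, n)  # a number between 1 and n, inclusive
    child1 = p1[:c] + p2[c:]   # first c bits of p1, last n-c bits of p2
    child2 = p2[:c] + p1[c:]   # first c bits of p2, last n-c bits of p1
    return child1, child2

A = [0, 1, 1, 1, 1, 1, 1, 0]
C = [0, 0, 1, 0, 0, 1, 0, 0]
print(crossover(A, C, c=6))
# ([0, 1, 1, 1, 1, 1, 0, 0], [0, 0, 1, 0, 0, 1, 1, 0])
```

Note that c = n leaves both children identical to their parents, which is allowed by the "between 1 and n" rule.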
Mutation
• Mutation on individuals:
some of the children’s bits are changed (each with a small,
independent probability)
Example: one bit flipped by mutation
0 0 1 1 1 1 1 0    →    0 0 1 1 0 1 1 0
Fitness of the resulting individuals:
0 1 1 0 0 1 0 0    $f = 8 - |3 - 5| = 6$
0 0 1 1 0 1 1 0    $f = 8 - |4 - 4| = 8$    (maximum found)
0 0 1 0 0 1 1 0    $f = 8 - |3 - 5| = 6$
1 0 1 1 1 1 0 0    $f = 8 - |5 - 3| = 6$
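Bit-flip mutation can be sketched in one line per individual. This is a minimal illustration; the function name `mutate` and the default rate of 0.01 are assumptions made for the example.

```python
import random

def mutate(h, p_mut=0.01, rng=random):
    """Flip each bit of h independently with probability p_mut."""
    return [bit ^ 1 if rng.random() < p_mut else bit for bit in h]

child = [0, 0, 1, 1, 1, 1, 1, 0]
print(mutate(child, p_mut=0.1, rng=random.Random(0)))
```

The input list is left unchanged and a new list is returned, so the same child can be mutated several times when experimenting with different rates.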
Stopping Criteria
• Convergence:
– A population is said to converge when all the genes have
converged, i.e. when the value of every bit is the same in
at least 95% of the individuals in the population
• Since convergence is not guaranteed, we must
consider other stopping criteria:
– Number of generations
– Almost constant value of the best fitting individual
– Almost constant value of the average fitness of the
population
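The convergence test can be sketched as a per-bit majority check. The function name `has_converged` is hypothetical, and individuals are assumed to be equal-length lists of 0/1 integers.

```python
def has_converged(population, threshold=0.95):
    """True when every bit position has one value in >= threshold of individuals."""
    n = len(population)
    for column in zip(*population):  # one tuple of bits per position
        ones = sum(column)
        majority_share = max(ones, n - ones) / n
        if majority_share < threshold:
            return False
    return True

initial = [
    [0, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 1],
]
print(has_converged(initial))                  # False
print(has_converged([[0, 1, 1, 0]] * 10))      # True
```

In practice this check would be combined with a generation limit, since convergence is not guaranteed.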
Parameter Settings
• Population size
– How many chromosomes are in the population
• Too few chromosomes → only a small part of the search space is explored
• Too many chromosomes → the GA slows down
– Recommendation: 20-30, 50-100
• Probability of crossover
– How often crossover is performed
– Recommendation: 80% - 95%
• Probability of mutation
– How often parts of a chromosome are mutated
– Recommendation: 0.5% - 1%
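These settings can be grouped into one small configuration object. The sketch below is only an illustration: the class name `GAParams`, its field names, and the specific defaults (picked from inside the recommended ranges above) are assumptions, not values prescribed by the lecture.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GAParams:
    """Hypothetical container for the three settings discussed above."""
    pop_size: int = 30          # within the 20-30 recommendation
    p_crossover: float = 0.9    # within 80% - 95%
    p_mutation: float = 0.01    # within 0.5% - 1%

    def __post_init__(self):
        if self.pop_size < 2:
            raise ValueError("pop_size must be at least 2 to form a pair")
        for name in ("p_crossover", "p_mutation"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                raise ValueError(f"{name} must be a probability in [0, 1]")

params = GAParams(pop_size=50)
print(params)
```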
Genetic Programming
• One of the central challenges of CS is to get
a computer to do what needs to be done,
without telling it how to do it
– Automatic programming (or program synthesis)
• GP is a branch of genetic algorithms
• Main difference between GP and GA
– Representation of the solution (computer program)
• GA: a string of numbers
– fixed-length character strings
• GP: computer program (Lisp or Scheme)
– Represent hierarchical computer programs of dynamically
varying sizes and shapes
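To make the difference in representation concrete, the sketch below contrasts a fixed-length GA chromosome with a GP individual stored as a tree. It uses nested Python tuples to mimic a Lisp-style expression. The tuple layout, the operator table `OPS`, and the helpers `evaluate` and `size` are assumptions made for this illustration.

```python
import operator

# GA individual: a fixed-length string of bits.
ga_individual = [0, 1, 1, 1, 1, 1, 1, 0]

# GP individual: a program tree, here (+ (* x x) 1) in Lisp notation.
gp_individual = ("+", ("*", "x", "x"), 1)

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, env):
    """Evaluate a tree of (op, left, right) tuples, variable names, and constants."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]  # variable lookup
    return tree           # numeric constant

def size(tree):
    """Count the nodes of a tree; GP trees may differ in size and shape."""
    if isinstance(tree, tuple):
        return 1 + sum(size(child) for child in tree[1:])
    return 1

print(evaluate(gp_individual, {"x": 3}))  # 10
print(size(gp_individual))                # 5
```

Because trees vary in size, GP crossover and mutation operate on subtrees rather than on bit positions.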