Download Introduction to AI (part two)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Genome (book) wikipedia , lookup

Hardy–Weinberg principle wikipedia , lookup

Polymorphism (biology) wikipedia , lookup

Koinophilia wikipedia , lookup

Frameshift mutation wikipedia , lookup

Mutation wikipedia , lookup

Genetic drift wikipedia , lookup

Point mutation wikipedia , lookup

Epistasis wikipedia , lookup

Gene expression programming wikipedia , lookup

Microevolution wikipedia , lookup

Population genetics wikipedia , lookup

Transcript
Introduction to AI (part two)
Tim Watson
G6.71
[email protected]
Overview
• Three weeks on Genetic Algorithms (Gas)
• Two weeks on Symbolic AI
• One week of revision
• Labs used to run experiments with a
simple GA program
• Coursework and slides will be available on
L:\tw\IntroAI
What is a GA?
• Evolutionary Algorithm
– Create random population of possible
solutions in the form of bitstrings
– See how good each one is (test fitness)
– Produce next generation (fitter are more likely
to get into next generation)
– Crossover, mutate and make next generation
the current one
What can they do?
•
•
•
•
•
•
•
•
•
Schedule Barcelona Olympics
Aircraft Design
Dynamic Routing in Networks
Robot Arm Trajectory Planning
Lab Task Scheduling for US Navy
Aircraft Missile Evasion
Evolving aNN Architecture
Parameter Tuning for Sonar Systems
Conformational Analysis of DNA
Example: Onemax
1. Initial population:
100
000
110
3. Reproduce:
110
100
110
2. Calculate fitness:
100 2
000 1
110 3
4. Crossover & Mutate:
110
101
010
Biological Terminology
• Genes etc.
– Gene
– Locus (plural loci)
– Allele (also called gene value)
– (Pleiotropic gene)
• Genotype (Chromosome)
• Phenotype
• Fitness Landscape
GA Theory
• Schema Theorem
– GA searches schemata in parallel
– 10 represents 10, 1#, #0 and ##
– The theorem is rubbish!
• Building Block Hypothesis
– Good, small sequences are found and
recombined to form good solutions
• No Free Lunch Theorem
GA Parameters
• Population Size
– Static or Dynamic?
• Chromosome Size
– Fixed or Variable?
• Crossover Rate
– One-point, two-point or uniform?
• Mutation Rate
– Fixed or Variable?
• Fitness Function
• Termination Criteria
Prediction Test!
• What happens to the population statistics
in a standard GA with random fitness, no
crossover, no mutation and chromsize
equals 16?
– Best, Worst, Mean , Std. Dev., column counts
• Best=17, Worst=1, Mean=9ish, Std. Dev.
constant-ish, pop converges randomly.
Reproduction in GAs
• Need selective pressure for reproduction
to improve the population fitness
– None leads to random walk (slow)
– Some leads to geometric growth of best (fast)
• Infinite populations select individuals on
relative fitness: fit/mean(fit)
• Finite populations also affected by how
many copies are already present
Types of Selection
• Fitness-Proportionate
– Fitness scaling based on raw fitness where if
fita < fitb then scaled(fita)  scaled(fitb)
– Scaling can be altered dynamically
• Rank-Order
• Tournament
• Elitism
Initialising the Population
•
•
•
•
•
Uniformly at random
Best Guesses
Converged to Best Known
From Real World
Hybrid
Mutation
• Goal of selection: survival of the fittest
• Goal of mutation: explore lost or never seen
alleles
• Random reset versus bit flip
– Reset rate = ½ bit flip rate
• Mutation as a spring
• In infinite time every possible population visited
an infinite number of times
• Alternative to mutation: complement
Crossover
• Goal: to try out different combinations of
good bits of individuals
• Crossover point
– One-point
– Two-point
– N-point
– Uniform
Crossover (2)
• Closer genes are less likely to be split by
crossover
– AB###### probability of split with one-point
crossover = 1/7
– A######B probability of split = 1
• Local maxima can occur (e.g. for onemax)
11011 fit=4
1011000101
00100 fit=1
0010100000
All children have lower fitness than parents
Design Decisions
• If genes are linked then the representaion
of an individual ought to keep them close
together
• Get the balance right:
– Popsize too small  premature convergence
– Popsize too large  too slow to compute
– Mutation rate too low  not enough exploring
– Mutation rate too high  too much noise
Messy GAs
• Developed by David Goldberg et al. late
1980s early 1990s
• Goal of messy GAs: improve function
optimisation performance by explicitly
building up increasingly longer, highly fit
strings from well-tested shorter building
blocks
• Nature started with simple life-forms and
built up complex ones
Representation
• Each bit is tagged with its ‘real’ locus but not all loci have
to specified. For example, in a four-bit problem:
{(1,0), (2,0), (4,1), (4,0)}
{(3,1), (3,0), (3,1), (4,0), (4,1), (3,1)}
• Overspecification: conflicting specification at a locus
– left-to-right first come first served
• Underspecification: locus not specified
– can be thought of as a schema, e.g. 00#1
– replace with random value?
– use hill-climbing to find local optimum and use value from that
Outline of Algorithm
• Two phases: primordial and juxtapositional
• Primordial phase
– Goal: enrich population with small, promising
candidate schemata
– guess order k of smallest relevant schemata
– initial population contains all possible schemata of
order k, e.g. for k=3 and solution bitlength=4 there are
8 different 3-bit patterns and 4 ways of choosing 3 bits
for a popsize of 32
– use selection only to produce next generation
– regularly halve popsize and stop after n generations
Outline of Algorithm (cont.)
• Juxtapositional phase
– popsize remains fixed, selection continues
– no mutation
– two new operators: cut and splice
• Cut takes a string and cuts it at a random point to
produce two new strings:
{(1,1), (2,0), (1,0), (4,1), (1,1)}
{(1,1), (2,0), (1,0)}
{(4,1), (1,1)}
• Splice joins two strings together
• Goldberg argued that the primordial phase should
produce, in enough numbers, all the building blocks
necessary to construct an optimal solution and that the
juxtapositional phase should then construct it.
Problems with Messy GAs
• Based on Schema Theorem, which isn’t correct
• Need to know (or guess) k, when to halve
popsize and how many generations to run
primordial phase
• Initial size of population easily becomes too
large (e.g. for k=8, bitlength=64, popsize approx.
1 trillion)
• Goldberg et al. have announced that messy GAs
are ready for real-world problems and call for
their immediate application to difficult
combinatorial problems of practical importance.
No results have yet been forthcoming.