Genetic Programming
CSCE155
Fall 2004
Leen-Kiat Soh
Department of Computer Science and Engineering
University of Nebraska
Acknowledgments
• The materials in this presentation are based on
– http://www.genetic-programming.org
– http://www.geneticprogramming.com/gpanimatedtutorial.html
Introduction
• One of the central challenges of computer
science is to get a computer to do what needs to
be done, without telling it how to do it
• Genetic programming addresses this challenge
by providing a method for automatically creating
a working computer program from a high-level
statement of the problem
– Automatic programming (a.k.a. program synthesis or
program induction)
Basic Steps
• GP
– A domain-independent method
– Iteratively transforms a population of computer
programs into a new generation of programs
– Two sets of steps:
• Preparatory steps
• Executional steps
Preparatory Steps
• The human user communicates the high-level
statement of the problem to the genetic
programming system by performing certain well-defined
preparatory steps:
– The set of terminals
– The set of primitive functions
– The fitness measure
– Certain parameters for controlling the run
– The termination criterion and method for designating
the result of the run
Preparatory Steps
• The first two preparatory steps specify the
ingredients that are available to create the
computer programs
– A run of GP is a competitive search among a diverse
population of programs composed of the available
functions and terminals
Figure: Terminal Set, Function Set, Fitness Measure, Parameters, and Termination Criterion & Result Designation → GP → Computer Program
Preparatory Steps
Terminal and Function Sets
• The identification of the function set and terminal
set for a particular problem is usually a
straightforward process
– The function set may consist of merely the arithmetic
functions (+, -, *, /) and a conditional branching
operator
– The terminal set may consist of the program’s
external inputs (independent variables) and numerical
constants
– Defines the search space
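As a minimal sketch of how a function set and a terminal set define the search space, a program can be held as a nested tuple and evaluated recursively. The names `FUNCTIONS`, `TERMINALS`, and `evaluate` are hypothetical, the conditional operator's sign-based semantics are an assumption, and the protected division (returning 1.0 on a zero divisor) is a common GP convention assumed here rather than something these slides specify.

```python
import operator

def protected_div(a, b):
    """Division that returns 1.0 on a zero divisor (assumed convention)."""
    return a / b if b != 0 else 1.0

def if_positive(cond, then_val, else_val):
    """Assumed conditional branching operator: choose a branch by the sign of cond."""
    return then_val if cond > 0 else else_val

# Function set: name -> (callable, number of arguments)
FUNCTIONS = {
    "+": (operator.add, 2),
    "-": (operator.sub, 2),
    "*": (operator.mul, 2),
    "/": (protected_div, 2),
    "if": (if_positive, 3),
}

# Terminal set: one external input (independent variable) and numerical constants
TERMINALS = ["x", 1.0, 2.0]

def evaluate(tree, x):
    """Evaluate a program tree such as ("+", "x", 1.0) for a given input x."""
    if tree == "x":
        return x
    if isinstance(tree, tuple):
        func, _arity = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, x) for child in tree[1:]))
    return tree  # a numerical constant

# The program x*x + x + 1 written as a tree
program = ("+", ("*", "x", "x"), ("+", "x", 1.0))
print(evaluate(program, 2.0))
```

Every tree that can be built from these two sets is a point in the search space; changing either set changes which programs GP can ever find.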
Preparatory Steps
Terminal and Function Sets
• Robot mopping floor example
– Function set: moving, turning, swishing the mop, etc.
• Controller example
– Function set: signal processing functions that operate on time-domain signals, including integrators, differentiators, leads, lags,
gains, adders, subtractors, etc.
– Terminal set: reference signal and plant output
• Analog electrical circuit synthesis example
– Function set: building transistors, capacitors, resistors, etc.
– Terminal set: wire, a circuit’s placement and routing, etc.
Preparatory Steps
Fitness Measure
• Specifies what needs to be done
– The primary mechanism for communicating the high-level statement of the problem’s requirements to the
GP system
– E.g., if the goal is to get GP to automatically
synthesize an amplifier, the fitness function is the
mechanism for telling GP to synthesize a circuit that
amplifies an incoming signal
– Defines the search’s desired goal
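A fitness measure can be sketched as an error score over sample inputs. This is an illustrative assumption, not the measure used for circuit synthesis: `error_fitness`, the sample inputs, and the toy "gain of 2" target are all hypothetical stand-ins for a real problem-specific evaluation.

```python
def error_fitness(program, target, inputs):
    """Sum of absolute errors; 0.0 means the program matches the target everywhere sampled."""
    return sum(abs(program(v) - target(v)) for v in inputs)

def target(v):
    """Desired behaviour (illustrative): amplify the input by a gain of 2."""
    return 2.0 * v

def candidate_a(v):
    return 2.0 * v   # amplifies as desired

def candidate_b(v):
    return v         # passes the signal through unchanged

inputs = [-1.0, -0.5, 0.0, 0.5, 1.0]
for name, candidate in [("a", candidate_a), ("b", candidate_b)]:
    print(name, error_fitness(candidate, target, inputs))
```

Here lower is better. GP never sees the goal directly; it only sees these numbers, which is why the fitness measure is what tells it what needs to be done.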
Preparatory Steps
Control Parameters
• Specifies the control parameters for the run
– Population size, probabilities of performing the
genetic operations, the maximum size for programs,
etc.
– Defines the search’s administrative details
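The control parameters can be gathered into one small configuration object. The sketch below is only an illustration: the class name `RunParameters`, its field names, and every default value are assumptions, with the operation probabilities chosen to sum to 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RunParameters:
    """Hypothetical container for the control parameters of one GP run."""
    population_size: int = 500       # illustrative value
    max_generations: int = 50        # illustrative value
    max_program_depth: int = 17      # illustrative cap on program size
    p_crossover: float = 0.89        # probability of choosing crossover
    p_reproduction: float = 0.10     # probability of choosing reproduction
    p_mutation: float = 0.01         # probability of choosing mutation

    def __post_init__(self):
        # The genetic operations are chosen probabilistically, so their rates must sum to 1.
        total = self.p_crossover + self.p_reproduction + self.p_mutation
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"operation probabilities sum to {total}, expected 1.0")

params = RunParameters(population_size=1000)
print(params)
```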
Preparatory Steps
Termination
• Specifies the termination criterion and the
method of designating the result of the run
– Termination criterion: a maximum number of
generations to be run, a problem-specific success
predicate, etc.
• E.g., when the value of fitness for numerous successive best-of-generation individuals appears to have reached a plateau
– The single best-so-far individual is then harvested
and designated as the result of the run
– Defines the search’s administrative details
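The termination criteria listed above can be combined into one check. This is a sketch under stated assumptions: fitness is treated as an error (lower is better), and the names `should_terminate`, `designate_result`, `plateau_window`, and `tolerance` are hypothetical.

```python
def should_terminate(best_history, max_generations, success_error=0.0,
                     plateau_window=None, tolerance=1e-9):
    """Decide whether to stop, given the best-of-generation error of each generation so far."""
    if not best_history:
        return False
    if best_history[-1] <= success_error:        # problem-specific success predicate
        return True
    if len(best_history) >= max_generations:     # maximum number of generations reached
        return True
    if plateau_window and len(best_history) >= plateau_window:
        recent = best_history[-plateau_window:]  # successive best-of-generation values
        if max(recent) - min(recent) <= tolerance:
            return True                          # fitness appears to have plateaued
    return False

def designate_result(evaluated):
    """Harvest the best-so-far individual from (error, program) pairs seen during the run."""
    return min(evaluated, key=lambda pair: pair[0])

print(should_terminate([3.0, 1.5, 1.5, 1.5], max_generations=50, plateau_window=3))
print(designate_result([(2.0, "p1"), (0.5, "p2"), (1.0, "p3")]))
```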
Executional Steps
• GP typically
– Starts with a population of randomly generated
computer programs composed of the available
programmatic ingredients (function and terminal
sets)
– Iteratively transforms a population of programs into a
new generation of the population by applying analogs
of naturally occurring genetic operations
• Operations are applied to individual(s) selected from the
population
• Individual(s) are probabilistically selected to participate in the
genetic operations based on their fitness measure
Executional Steps
• Steps are:
– Randomly create an initial population (generation 0)
of individual computer programs composed of the
available functions and terminals
– Iteratively perform the “genetic evolution” sub-steps
(called a generation) on the population until the
termination criterion is satisfied
– After the termination criterion is satisfied, harvest the
single best program in the population produced during
the run (the best-so-far individual) and designate it as
the result of the run
• If the run is successful, the result may be a solution (or
approximate solution) to the problem
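The three executional steps can be sketched as a generic loop. This is a skeleton under stated assumptions, not a full GP system: `run_gp` and its callback parameters are hypothetical names, fitness is treated as an error to minimize, and the usage example substitutes plain integers for programs purely to show the control flow.

```python
def run_gp(create_population, error_of, breed, max_generations, is_success):
    """Run the outer GP loop and return the best-so-far (program, error) pair."""
    population = create_population()                 # generation 0
    best_program, best_error = None, float("inf")
    for generation in range(max_generations + 1):
        scored = [(error_of(p), p) for p in population]
        gen_error, gen_best = min(scored, key=lambda pair: pair[0])
        if gen_error < best_error:                   # track the best-so-far individual
            best_program, best_error = gen_best, gen_error
        if is_success(best_error) or generation == max_generations:
            break                                    # termination criterion satisfied
        population = breed(scored)                   # one generation of genetic operations
    return best_program, best_error

# Toy usage: integers stand in for programs, and "breeding" just adds 1 to each.
result = run_gp(
    create_population=lambda: [0, 3],
    error_of=lambda n: abs(n - 7),
    breed=lambda scored: [n + 1 for _, n in scored],
    max_generations=10,
    is_success=lambda err: err == 0,
)
print(result)
```

In a real run, `breed` would apply selection, reproduction, mutation, and crossover to program trees rather than this deterministic stand-in.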
Executional Steps
• “Genetic Evolution” steps are:
– Execute each program in the population and
ascertain its fitness using the problem’s fitness
measure
– Select one or two individual program(s) from the
population with a probability based on fitness (with reselection allowed) to participate in the genetic
operations
– Create new individual program(s) using genetic
operations
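Fitness-based selection with re-selection allowed can be sketched with weighted sampling with replacement. The function name `select` and the conversion from error to weight, `1 / (1 + error)`, are assumptions made for illustration; other weighting schemes are possible.

```python
import random

def select(population, errors, k, rng=random):
    """Pick k individuals with probability increasing as error decreases.

    Sampling is with replacement, so the same individual may be selected more than once.
    """
    weights = [1.0 / (1.0 + e) for e in errors]   # assumed error-to-weight conversion
    return rng.choices(population, weights=weights, k=k)

rng = random.Random(0)
population = ["prog_a", "prog_b", "prog_c"]
errors = [0.0, 1.0, 9.0]                          # prog_a is the fittest
parents = select(population, errors, k=2, rng=rng)
print(parents)
```

Because selection is probabilistic, even the individual with the largest error keeps a small chance of being chosen.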
Genetic Operations
• Reproduction Operation
– Simply allow the selected program to survive to the next
generation without any changes
– This reproduction is typically performed quite frequently (say,
10%-15% during each generation of the run)
Genetic Operations
• Mutation Operation
– Only one parental program is needed
– A mutation point is randomly chosen for the selected program,
the subtree rooted at that point is deleted and a new subtree is
grown using the same random growth process that was used to
generate the initial population
– This asexual mutation is typically performed sparingly (say, 1%
during each generation of the run)
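Subtree mutation can be sketched on programs stored as nested tuples. The tree representation, the small function and terminal sets, the 0.3 early-stop probability in `grow`, and all helper names are assumptions for illustration.

```python
import random

FUNCTIONS = {"+": 2, "-": 2, "*": 2}   # name -> number of arguments (assumed set)
TERMINALS = ["x", 1.0]                 # assumed terminal set

def grow(max_depth, rng):
    """Randomly grow a subtree; at depth 0 only a terminal can be chosen."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    return (name,) + tuple(grow(max_depth - 1, rng) for _ in range(FUNCTIONS[name]))

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def replace_at(tree, path, subtree):
    """Return a copy of tree in which the node at path is replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], subtree),) + tree[i + 1:]

def mutate(tree, rng, max_depth=3):
    """Delete the subtree at a random mutation point and grow a new one there."""
    point = rng.choice(list(paths(tree)))
    return replace_at(tree, point, grow(max_depth, rng))

rng = random.Random(1)
parent = ("+", ("*", "x", "x"), 1.0)
print(mutate(parent, rng))
```

Because the new subtree comes from the same growth process as the initial population, the offspring is always a well-formed program.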
Genetic Operations
• Crossover (Sexual Recombination) Operation
– Two parental programs are needed
– A crossover point is randomly chosen in the first parent and a
crossover point is randomly chosen in the second parent. Then
the subtree rooted at the crossover point of the first, or receiving,
parent is deleted and replaced by the subtree from the second,
or contributing, parent
– Crossover is the predominant operation in GP (say, 85% to
90%)
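Subtree crossover can be sketched the same way, again assuming programs are stored as nested tuples; the helper names are hypothetical. This version produces one offspring, and swapping the roles of the two parents would give a second.

```python
import random

def paths(tree, prefix=()):
    """Yield the path (tuple of child indices) to every node of the tree."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from paths(child, prefix + (i,))

def get_at(tree, path):
    """Return the subtree rooted at path."""
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, subtree):
    """Return a copy of tree in which the node at path is replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(receiving, contributing, rng):
    """Replace a random subtree of the receiving parent with one from the contributing parent."""
    point_r = rng.choice(list(paths(receiving)))
    point_c = rng.choice(list(paths(contributing)))
    return replace_at(receiving, point_r, get_at(contributing, point_c))

rng = random.Random(0)
receiving = ("+", "x", 1.0)                    # x + 1
contributing = ("*", ("-", "x", 1.0), "x")     # (x - 1) * x
print(crossover(receiving, contributing, rng))
```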
Genetic Operations
• Architecture-Altering Operations
– Based on gene duplication and gene deletion in nature
– For problems involving computer programs:
• Dynamically add and delete subroutines, arguments, iterations,
loops, recursions, and memory, and also different hierarchical
arrangements of these elements
– Programs with architectures that are well-suited to the problem
at hand will tend to grow and prosper in the competitive
evolutionary process, while inadequate ones wither away.
– These operations are applied sparingly during the run (say, 0.5%
to 1% on each generation)
Genetic Operations
• Architecture-Altering Operations, Cont’d
– Subroutine duplication
• Duplicates a pre-existing subroutine in an individual program, gives
a new name to the copy, and randomly divides the pre-existing calls
to the old subroutine between the two
• Broadens the hierarchy and may lead to divergence later of the two
subroutines, sometimes yielding specialization
– Argument duplication
• Duplicates one argument of a subroutine, randomly divides internal
references to it, and preserves overall program semantics by
adjusting all calls to the subroutine
• Enlarges the dimensionality of the subspace on which the
subroutine operates
Genetic Operations
• Architecture-Altering Operations, Cont’d
– Subroutine creation
• Creates a new subroutine from part of a main result-producing
branch
• Deepens the hierarchy of references in the overall program
– Subroutine deletion
• Deletes a pre-existing subroutine
• Narrows or makes shallower the hierarchy of subroutines
– Argument deletion
• Deletes an argument from a subroutine
• Reduces the amount of information available to the subroutine
– Generalization
Flowchart
Tidbits
• Each individual program in the population is executed so
that each can be measured in terms of how well it
performs the task at hand
– This translates into a single explicit numerical value, called
fitness
– E.g., the amount of error between an individual program’s output
and the desired output, the amount of time, the accuracy, the
number of lines, the payoff that a game-playing program
produces, etc.
• The creation of the initial random population is a blind
random search of the search space of the problem
– Typically, the individual programs in generation 0 all have
exceedingly poor fitness; but some are (usually) more fit than
others and are selected for the next generation
Tidbits
• With probabilistic selection, better individuals are favored
over inferior individuals
– The best individual in the population is not necessarily selected
– The worst individual in the population is not necessarily passed
over
• After each generation, the population of offspring
replaces the now-old generation
• All programs in the initial random population (generation
0) of a run of GP are syntactically valid, executable
programs
– The genetic operations that are performed are also designed to
produce offspring that are syntactically valid, executable
programs
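One way to see why every program in generation 0 is executable is to build programs only from the function and terminal sets and to make every function safe for any argument. The sketch below assumes a nested-tuple representation, two-argument functions only, and a protected division written `%` that returns 1.0 on a zero divisor; all names are hypothetical.

```python
import random

def protected_div(a, b):
    """Division that never fails: returns 1.0 when the divisor is zero (assumed convention)."""
    return a / b if b != 0 else 1.0

FUNCTIONS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "%": protected_div,
}
TERMINALS = ["x", 1.0]

def random_program(max_depth, rng):
    """Randomly build a syntactically valid program tree."""
    if max_depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(sorted(FUNCTIONS))
    return (name, random_program(max_depth - 1, rng), random_program(max_depth - 1, rng))

def run(program, x):
    """Execute a program tree for input x."""
    if program == "x":
        return x
    if isinstance(program, tuple):
        name, left, right = program
        return FUNCTIONS[name](run(left, x), run(right, x))
    return program  # numerical constant

rng = random.Random(42)
generation_0 = [random_program(4, rng) for _ in range(200)]
outputs = [run(p, 0.0) for p in generation_0]   # no program raises, even with x = 0
print(len(outputs))
```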
Example of a GP Run
Symbolic Regression of A Quadratic Polynomial
• Goal: automatically create a computer program whose
output is equal to the values of the quadratic polynomial
x*x + x + 1 in the range from -1 to 1
• Preparatory Steps:
– Terminal Set: independent variable x
– Function Set: flexible, say: +, -, *, % (where % denotes protected division)
– Fitness measure: compare result of an individual program with
the result of x*x + x + 1
• A fitness (error) of zero would indicate a perfect fit
Example of a GP Run
Symbolic Regression of A Quadratic Polynomial
• Executional Steps:
Figure 1 Initial population of four randomly created individuals of generation 0
Example of a GP Run
Symbolic Regression of A Quadratic Polynomial
• Executional Steps:
Figure 2 The fitness of each of the four randomly created individuals
of generation 0 is equal to the area between two curves: (a) 0.67, (b) 1.0, (c)
1.67, and (d) 2.67
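The "area between two curves" fitness can be approximated numerically. The four individuals of Figure 1 are not reproduced in this transcript, so the candidate programs below are illustrative assumptions, and `area_between` is a hypothetical helper using the midpoint rule. For instance, the candidate x + 1 differs from the target by x*x, whose integral over [-1, 1] is 2/3, or about 0.67.

```python
def area_between(f, g, lo=-1.0, hi=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of |f(x) - g(x)| over [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width      # midpoint of the i-th strip
        total += abs(f(x) - g(x))
    return total * width

def target(x):
    return x * x + x + 1

# Hypothetical candidate programs (not taken from the missing figure)
candidates = {
    "x + 1": lambda x: x + 1,
    "x*x + 1": lambda x: x * x + 1,
    "x": lambda x: x,
}

for name, program in candidates.items():
    print(name, round(area_between(program, target), 2))
```

A candidate identical to the target would score an area of zero, the perfect fit described in the preparatory steps.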
Example of a GP Run
Symbolic Regression of A Quadratic Polynomial
• Executional Steps:
Figure 3 Population of generation 1 (after one reproduction, one
mutation, and one two-offspring crossover operation)
Human-Competitive Results
• An automatically created result is “human-competitive”
if it satisfies one or more of the eight criteria below:
– (A) The result was patented as an invention in the past, is an
improvement over a patented invention, or would qualify today
as a patentable new invention
– (B) The result is equal to or better than a result that was accepted as a
new scientific result at the time when it was published in a
peer-reviewed scientific journal
– (C) The result is equal to or better than a result that was placed into a
database or archive of results maintained by an internationally
recognized panel of scientific experts
– (D) The result is publishable in its own right as a new scientific
result—independent of the fact that the result was mechanically
created
Human-Competitive Results
• An automatically created result is “human-competitive” if it
satisfies one or more of the eight criteria below, cont’d:
– (E) The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has
been a succession of increasingly better human-created solutions
– (F) The result is equal to or better than a result that was
considered an achievement in its field at the time it was first
discovered
– (G) The result solves a problem of indisputable difficulty in its field
– (H) The result holds its own or wins a regulated competition
involving human contestants (in the form of either live human
players or human-written computer programs)
36 Instances of GP-Generated
Human-Competitive Results
• 15 instances where GP has created an entity that either
infringes or duplicates the functionality of a previously
patented 20th-century invention
• 6 instances where GP has done the same with respect
to a 21st-century invention
• 2 instances where GP has created a patentable new
invention
• Fields include
– Computational molecular biology, cellular automata, sorting
networks, and the synthesis of the design of both the topology
and component sizing for complex structures, such as analog
electrical circuits, controllers, and antennas
36 Instances of GP-Generated
Human-Competitive Results
No. | Claimed instance | Basis for claim of human-competitiveness | Reference
1 | Creation of a better-than-classical quantum algorithm for the Deutsch-Jozsa “early promise” problem | B, F | Spector, Barnum, and Bernstein 1998
2 | Creation of a better-than-classical quantum algorithm for Grover’s database search problem | B, F | Spector, Barnum, and Bernstein 1999
3 | Creation of a quantum algorithm for the depth-two AND/OR query problem that is better than any previously published result | D | Spector, Barnum, Bernstein, and Swamy 1999; Barnum, Bernstein, and Spector 2000
4 | Creation of a quantum algorithm for the depth-one OR query problem that is better than any previously published result | D | Barnum, Bernstein, and Spector 2000
5 | Creation of a protocol for communicating information through a quantum gate that was previously thought not to permit such communication | D | Spector and Bernstein 2003
6 | Creation of a novel variant of quantum dense coding | D | Spector and Bernstein 2003
7 | Creation of a soccer-playing program that won its first two games in the Robo Cup 1997 competition | H | Luke 1998
36 Instances of GP-Generated
Human-Competitive Results
No. | Claimed instance | Basis for claim of human-competitiveness | Reference
8 | Creation of a soccer-playing program that ranked in the middle of the field of 34 human-written programs in the Robo Cup 1998 competition | H | Andre and Teller 1999
9 | Creation of four different algorithms for the transmembrane segment identification problem for proteins | B, E | Sections 18.8 and 18.10 of Genetic Programming II and sections 16.5 and 17.2 of Genetic Programming III
10 | Creation of a sorting network for seven items using only 16 steps | A, D | Sections 21.4.4, 23.6, and 57.8.1 of Genetic Programming III
11 | Rediscovery of the Campbell ladder topology for lowpass and highpass filters | A, F | Section 25.15.1 of Genetic Programming III and section 5.2 of Genetic Programming IV
12 | Rediscovery of the Zobel “M-derived half section” and “constant K” filter sections | A, F | Section 25.15.2 of Genetic Programming III
13 | Rediscovery of the Cauer (elliptic) topology for filters | A, F | Section 27.3.7 of Genetic Programming III
14 | Automatic decomposition of the problem of synthesizing a crossover filter | A, F | Section 32.3 of Genetic Programming III
36 Instances of GP-Generated
Human-Competitive Results
No. | Claimed instance | Basis for claim of human-competitiveness | Reference
15 | Rediscovery of a recognizable voltage gain stage and a Darlington emitter-follower section of an amplifier and other circuits | A, F | Section 42.3 of Genetic Programming III
16 | Synthesis of 60 and 96 decibel amplifiers | A, F | Section 45.3 of Genetic Programming III
17 | Synthesis of analog computational circuits for squaring, cubing, square root, cube root, logarithm, and Gaussian functions | A, D, G | Section 47.5.3 of Genetic Programming III
18 | Synthesis of a real-time analog circuit for time-optimal control of a robot | G | Section 48.3 of Genetic Programming III
19 | Synthesis of an electronic thermometer | A, G | Section 49.3 of Genetic Programming III
20 | Synthesis of a voltage reference circuit | A, G | Section 50.3 of Genetic Programming III
21 | Creation of a cellular automata rule for the majority classification problem that is better than the Gacs-Kurdyumov-Levin (GKL) rule and all other known rules written by humans | D, E | Andre, Bennett, and Koza 1996 and section 58.4 of Genetic Programming III
22 | Creation of motifs that detect the D–E–A–D box family of proteins and the manganese superoxide dismutase family | C | Section 59.8 of Genetic Programming III
36 Instances of GP-Generated
Human-Competitive Results
No. | Claimed instance | Basis for claim of human-competitiveness | Reference
23 | Synthesis of topology for a PID-D2 (proportional, integrative, derivative, and second derivative) controller | A, F | Section 3.7 of Genetic Programming IV
24 | Synthesis of an analog circuit equivalent to Philbrick circuit | A, F | Section 4.3 of Genetic Programming IV
25 | Synthesis of a NAND circuit | A, F | Section 4.4 of Genetic Programming IV
26 | Simultaneous synthesis of topology, sizing, placement, and routing of analog electrical circuits | A, F, G | Chapter 5 of Genetic Programming IV
27 | Synthesis of topology for a PID (proportional, integrative, and derivative) controller | A, F | Section 9.2 of Genetic Programming IV
28 | Rediscovery of negative feedback | A, E, F, G | Chapter 14 of Genetic Programming IV
29 | Synthesis of a low-voltage balun circuit | A | Section 15.4.1 of Genetic Programming IV
30 | Synthesis of a mixed analog-digital variable capacitor circuit | A | Section 15.4.2 of Genetic Programming IV
31 | Synthesis of a high-current load circuit | A | Section 15.4.3 of Genetic Programming IV
32 | Synthesis of a voltage-current conversion circuit | A | Section 15.4.4 of Genetic Programming IV
36 Instances of GP-Generated
Human-Competitive Results
No. | Claimed instance | Basis for claim of human-competitiveness | Reference
33 | Synthesis of a cubic function generator | A | Section 15.4.5 of Genetic Programming IV
34 | Synthesis of a tunable integrated active filter | A | Section 15.4.6 of Genetic Programming IV
35 | Creation of PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules | A, B, D, E, F, G | Chapter 12 of Genetic Programming IV
36 | Creation of three non-PID controllers that outperform a PID controller using the Ziegler-Nichols or Åström-Hägglund tuning rules | A, B, D, E, F, G | Chapter 13 of Genetic Programming IV
Web and Literature
• The home page of Genetic Programming Inc. at www.geneticprogramming.com.
• For information about the field of genetic programming in general,
visit www.genetic-programming.org
• The home page of John R. Koza at Genetic Programming Inc.
(including online versions of most papers) and the home page of
John R. Koza at Stanford University
• Information about the 1992 book Genetic Programming: On the
Programming of Computers by Means of Natural Selection, the
1994 book Genetic Programming II: Automatic Discovery of
Reusable Programs, the 1999 book Genetic Programming III:
Darwinian Invention and Problem Solving, and the 2003 book
Genetic Programming IV: Routine Human-Competitive Machine
Intelligence.
Web and Literature
• For information on 3,198 papers (many on-line) on genetic
programming (as of June 27, 2003) by over 900 authors, see
William Langdon’s bibliography on genetic programming.
• The Genetic Programming and Evolvable
Machines journal is published by Kluwer Academic Publishers
• Important Conferences:
– Genetic and Evolutionary Computation (GECCO) conference
– NASA/DoD Conference on Evolvable Hardware (EH)
– Euro-Genetic-Programming Conference