Evolutionary Computing with
Neural Networks
Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
• blondie24
• Remarks & Conclusion
Presentation Outline
• Neural Networks
  – Definition of Neural Network
  – Training a Neural Network
• Evolutionary Computing with Neural Networks
• blondie24
• Remarks & Conclusion
What is a Neural Network?
• The fundamental processing element of a neural network is the neuron
• A biological neuron:
  1. receives inputs from other sources
  2. combines them in some way
  3. performs a generally nonlinear operation on the result
  4. outputs the final result
• A human brain has 100 billion neurons
• An ant brain has 250,000 neurons
Computational Structure of a neuron

    y = f( Σ_k w_k x_k )

[Figure: inputs x1 … xN, weighted by w1 … wN, are summed (S) and passed through the activation f to produce the output y]
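A minimal sketch of this computation in Python (the tanh activation is an illustrative choice; the slide does not name a particular f):

    import math

    def neuron_output(inputs, weights, f=math.tanh):
        """Single neuron: weighted sum of the inputs passed through activation f."""
        s = sum(w * x for w, x in zip(weights, inputs))
        return f(s)

    # Example: three inputs combined with three weights
    print(neuron_output([0.5, -1.0, 2.0], [0.1, 0.4, -0.3]))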
Multi-Layer Neural Network
Multi-Layer Perceptron structure
Back-propagation Algorithm
• Minimizes the mean squared error using a gradient descent method:

    E = (1/2)(d - o)²
    W' = W - η · dE/dW      (η is the learning rate)

• Error is backpropagated into the previous layers one layer at a time
• Does not guarantee an optimal solution, as it might converge to a local minimum
• Takes a long time to train and requires a large amount of training data
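A toy sketch of one such update for a single linear neuron under the squared error above (a simplified illustration, not the full multi-layer back-propagation):

    def gradient_step(weights, inputs, target, lr=0.1):
        """One gradient-descent update with E = 1/2 (d - o)^2 for a linear neuron."""
        output = sum(w * x for w, x in zip(weights, inputs))   # o
        error = output - target                                # dE/do = (o - d)
        # dE/dw_k = (o - d) * x_k, so W' = W - lr * dE/dW
        return [w - lr * error * x for w, x in zip(weights, inputs)]

    w = [0.0, 0.0]
    for _ in range(50):
        w = gradient_step(w, [1.0, 2.0], target=3.0)
    print(w)   # the weights move toward producing output 3.0 for input [1, 2]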
Summary of Neural Networks
• An artificial neural network is a powerful tool for non-linear mapping
• Training is slow and requires a large amount of training data
Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion
Evolution of Connection Weights
1. Encode each individual neural network's connection weights into a chromosome
2. Calculate the error function and determine each individual's fitness
3. Reproduce children based on the selection criterion
4. Apply genetic operators (sketch of the loop below)
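A minimal sketch of this loop in Python, assuming a fixed network whose weights form a flat real-valued chromosome and a user-supplied error() function that measures training error for a weight vector:

    import random

    def evolve_weights(error, n_weights, pop_size=20, generations=100, sigma=0.1):
        """Evolve the connection weights of a fixed-architecture network."""
        # 1. Each individual is a chromosome of real-valued connection weights
        pop = [[random.gauss(0, 1) for _ in range(n_weights)] for _ in range(pop_size)]
        for _ in range(generations):
            # 2. Fitness: lower training error means higher fitness
            pop.sort(key=error)
            # 3. Reproduce children from the better half of the population
            parents = pop[:pop_size // 2]
            children = [p[:] for p in parents]
            # 4. Genetic operator: Gaussian mutation of every weight
            for child in children:
                for i in range(len(child)):
                    child[i] += random.gauss(0, sigma)
            pop = parents + children
        return min(pop, key=error)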
Representation of Weights
Binary Representation
– Weights are represented by binary bits
  • e.g. 8 bits can represent connection weights between -127 and +127 (encoding sketch below)
– Limitation on representation precision
  • too few bits → some numbers cannot be approximated
  • too many bits → training might be prolonged
– Crossover operator not intuitive
  • One proposed solution is to divide the weights into functional blocks
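A small illustration of the precision trade-off, assuming a simple signed fixed-point encoding of a weight into an 8-bit gene (one of several possible schemes; the slide does not specify which was used):

    def encode_weight(w, bits=8, w_max=127):
        """Map a real weight in [-w_max, w_max] to a signed integer gene of `bits` bits."""
        levels = 2 ** (bits - 1) - 1
        return max(-levels, min(levels, round(w / w_max * levels)))

    def decode_weight(gene, bits=8, w_max=127):
        levels = 2 ** (bits - 1) - 1
        return gene / levels * w_max

    print(decode_weight(encode_weight(0.35)))   # comes back as 0.0: quantization error from too few bits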
Representation of Weights
Real Number Representation
– To overcome the limitations of binary representation, some proposed using real numbers
  • i.e., one real number per connection weight
– Standard genetic operators such as crossover are not applicable to this representation
  • However, some argue that it is possible to perform evolutionary computation with mutation only
  • Fogel, Fogel and Porto (1990) adopted a single genetic operator: Gaussian random mutation (sketch below)
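A sketch of Gaussian random mutation as the sole genetic operator on a real-valued chromosome (the mutation strength sigma is an illustrative choice):

    import random

    def gaussian_mutate(weights, sigma=0.05):
        """Offspring = parent with zero-mean Gaussian noise added to every connection weight."""
        return [w + random.gauss(0.0, sigma) for w in weights]

    parent = [0.2, -1.3, 0.7]
    child = gaussian_mutate(parent)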
Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion
Evolution of Architectures
1. Encode each individual neural network's architecture into a chromosome
2. Train each neural network with a predetermined learning rule
3. Calculate the error function and determine each individual's fitness
4. Reproduce children based on the selection criterion
5. Apply genetic operators
Representation of Architectures
Direct Encoding Scheme
– All information is represented by binary strings, i.e. each connection and node is specified by some binary bits
– An N × N matrix C = (c_ij) can represent the connectivity of a network with N nodes (sketch below), where

    c_ij = 1, if the connection is ON
    c_ij = 0, if the connection is OFF

– Does not scale well, since a large NN needs a big matrix to represent it
– Crossover operator not "meaningful"
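A small sketch of direct encoding: the chromosome is the connectivity matrix, and a feed-forward pass only uses the connections that are switched ON. The index-order propagation, tanh activation and separate weight matrix are assumptions for illustration (the slide's matrix encodes connectivity only):

    import math

    def forward(conn, weights, node_values, n):
        """Propagate values through nodes 0..n-1 in index order.
        conn[i][j] == 1 means the connection from node i to node j is ON."""
        values = node_values[:]   # nodes with no incoming connections keep their input value
        for j in range(n):
            incoming = [values[i] * weights[i][j] for i in range(n) if conn[i][j]]
            if incoming:
                values[j] = math.tanh(sum(incoming))
        return values

    # 3-node example: nodes 0 and 1 are inputs, node 2 is the output
    conn    = [[0, 0, 1], [0, 0, 1], [0, 0, 0]]
    weights = [[0, 0, 0.5], [0, 0, -1.0], [0, 0, 0]]
    print(forward(conn, weights, [1.0, 2.0, 0.0], 3))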
Representation of Architectures
Indirect Encoding Scheme
– Only the most important parameters or features of an architecture are represented; other details are left to the learning process to decide
  • e.g. specify the number of hidden nodes and let the learning process decide how they are connected (e.g. fully connected)
– More biologically plausible: according to findings in neuroscience, the genetic information encoded in humans cannot specify the whole nervous system directly
Which is Better? EC or heuristics?
• Empirical evidence suggests EC can outperform human experts at deciding neural network architecture
  – Chen and Lu (1998) evolved NNs with different numbers of inputs, hidden layers and hidden neurons, and different transfer functions, learning coefficients and momentums, for the financial application of option pricing
• Whether evolving architectures can work is more uncertain than evolving connection weights, and is decided on a case-by-case basis
• Evolving architectures takes a (very) long time, but this might not be an issue if accuracy is most important (e.g. financial analysis)
Which is Better? EC or heuristics?
EC seems better
• Characteristics of the architecture space:
  – infinite, as the number of nodes and connections is unbounded
  – non-differentiable, as changes in the number of nodes and connections are discrete
  – complex and noisy, as the correlation between architecture and performance is indirect
  – deceptive, as neural networks with similar architectures may have dramatically different abilities
  – multimodal, as neural networks with different architectures can have similar capabilities
Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion
Evolution of Learning Rules
1. Decode each individual into a learning rule
2. Construct a neural network (either pre-determined or random) and train it with the decoded learning rule
   • This refers to adapting the learning function; in this case, the connection weights are updated with an adaptive rule
3. Calculate the error function and determine each individual's fitness
4. Reproduce children based on the selection criterion
5. Apply genetic operators
Representation of Learning Rules
• Early attempts aimed at algorithm parameters such as the learning rate, but the architecture was predefined
• Representing a learning rule directly is impractical due to its dynamic behavior
  – Constraints have to be set, e.g. on the basic form of the learning rules
  – Current efforts assume a learning rule to be a linear function of local variables and their products (see the sketch below)
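A toy sketch of such a rule: the chromosome is a coefficient vector theta, and the weight change is a linear combination of local variables (pre-synaptic activity x, post-synaptic activity y, current weight w) and their products. The particular set of terms is an illustrative assumption:

    def learning_rule(theta, x, y, w):
        """Weight change as a linear function of local variables and their products.
        theta (the evolved chromosome) holds the coefficient of each term."""
        terms = [x, y, w, x * y, x * w, y * w, x * y * w]
        return sum(t * v for t, v in zip(theta, terms))

    def apply_rule(theta, weights, inputs, outputs, lr=0.1):
        """Update every weight w_ij of a single layer with the evolved rule."""
        return [[w + lr * learning_rule(theta, x, y, w)
                 for w, y in zip(row, outputs)]
                for row, x in zip(weights, inputs)]

    # A theta with coefficient 1 on the x*y term recovers a simple Hebbian rule
    hebbian_theta = [0, 0, 0, 1, 0, 0, 0]
    print(apply_rule(hebbian_theta, [[0.0, 0.0]], inputs=[1.0], outputs=[0.5, -0.5]))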
Why this representation can't work
• The learning-rule equation has too many variables, which makes evolution extremely slow and impractical
• It prevents more interesting learning rules from being evolved
• A better representation is needed
• More research is needed in this area
Summary
• Evolution of connection weights
  – GA is used as the learning rule for the NN
  – Most widely researched and recognized as having good potential
• Evolution of architectures
  – GA is used to select general structural parameters, and neural learning is used separately to train the networks
  – Not clear if EC is better
• Evolution of learning rules
  – GA is used to select a learning rule to update weights during training
  – Good potential area of research
Presentation Outline
• Neural Networks
• Evolutionary Algorithms on NN
• Blondie24
  – Overview
  – Results
  – alphabeta searching
  – Evaluation function
• Remarks & Conclusion
blondie24
• Kumar Chellapilla & David Fogel (1999)
• Checkers program called blondie24
• alphabeta search on quiescent positions
• Evolved a NN for evaluating checkers positions (evaluation function)
• Aim: true artificial intelligence / machine learning
blondie24 Results
• Program played against humans on www.zone.net
• 90 games played over 2 weeks
• Depth 6, 8 or 10 alphabeta search
• Dominated players rated <1800
• Results against players rated between 1800 and 1900: about even
blondie24 Results
• Best games (both rated as Master level):
  – Draw against a player rated 2207
  – Win against a player rated 2134
• Final rating 1914.4 (Class A level)
• Later version of blondie24: final rating 2045.85 (Master level)
  – Better than 95% of all checkers players
• True AI?
alphabeta search
• Knuth & Moore (1975)
• Improved minimax search
• Used by almost all game-playing programs
• Quiescent positions: "stable" positions (e.g. no captures, no forced moves, etc.)
• Search depth +2 on non-quiescent positions
alphabeta pseudocode
double alphabeta(int depth, double alpha, double beta)
{
    if (depth <= 0 || game is over)
        return eval();                 // evaluate the leaf position
    generate move list;
    for (each move m)
    {
        make move m;
        // negamax formulation: search the opponent with a negated, swapped window
        double val = -alphabeta(depth - 1, -beta, -alpha);
        unmake move m;
        if (val >= beta)               // cut-off: the opponent will avoid this line
            return val;
        if (val > alpha)               // new best move for the side to play
            alpha = val;
    }
    return alpha;
}
blondie24 Evaluation Function
• Neural Network: 2 hidden layers with 40 and 10 nodes respectively, fully connected (sketch below)
• Input: vector representing the checkers board position; piece differential information
• Output: evaluation of the checkers position, in the range [-1, 1]
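A rough sketch of a network with that shape. The 32-element board vector, tanh activations and random initial weights are assumptions for illustration; the slide does not give these details, and the piece differential input is omitted here:

    import math, random

    def layer(inputs, weights):
        """Fully connected tanh layer; weights[j] is (bias, incoming weights) of node j."""
        return [math.tanh(b + sum(w * x for w, x in zip(ws, inputs))) for b, ws in weights]

    def random_layer(n_in, n_out):
        return [(random.gauss(0, 0.1), [random.gauss(0, 0.1) for _ in range(n_in)])
                for _ in range(n_out)]

    def make_evaluator(n_inputs=32):
        # 2 hidden layers with 40 and 10 nodes, then a single output node
        return [random_layer(n_inputs, 40), random_layer(40, 10), random_layer(10, 1)]

    def evaluate(net, board_vector):
        x = board_vector
        for w in net:
            x = layer(x, w)
        return x[0]   # tanh keeps the evaluation in [-1, 1]

    net = make_evaluator()
    print(evaluate(net, [0.0] * 32))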
Evolution of NN
• Initial population: 15 randomly-weighted NNs
• Each of the 15 NNs produces 1 offspring (using mutation), for a total of 30 NNs
• Each player plays against 5 randomly-selected opponents using depth-4 alphabeta
• Top 15 performers are retained
• 250 generations (time taken: about 1 month; skeleton of the loop below)
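A skeleton of that (15 parents + 15 offspring) loop, assuming hypothetical helpers mutate(net) and play_match(net_a, net_b) that returns a score from net_a's point of view:

    import random

    def evolve(initial_population, generations=250, opponents_per_player=5):
        """Keep the best half of parents + mutated offspring, generation after generation."""
        population = list(initial_population)                     # 15 randomly-weighted NNs
        for _ in range(generations):
            offspring = [mutate(net) for net in population]       # each parent produces 1 offspring
            candidates = population + offspring                   # 30 NNs in total
            scores = {id(net): 0.0 for net in candidates}
            for net in candidates:
                # play 5 randomly-selected opponents (games searched with depth-4 alphabeta)
                for opponent in random.sample(candidates, opponents_per_player):
                    scores[id(net)] += play_match(net, opponent)
            candidates.sort(key=lambda net: scores[id(net)], reverse=True)
            population = candidates[:len(initial_population)]     # top 15 performers retained
        return population[0]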
Presentation Outline
• Neural Networks
• Evolutionary Algorithms on NN
• blondie24
• Remarks & Conclusion
Remarks on blondie24
• Class A / Master rating
  – based on rating at www.zone.net
  – is checkers an easy game for computers?
• Quiescent search
  – Non-quiescent positions occur frequently in checkers
• Minimal input information
  – piece differential information may be crucial
  – an Othello program fares poorly with no domain-specific information
Remarks on blondie24
• blondie24 is an alphabeta search program that maximizes f(piece differential + some value from the NN)
  – How significant is the NN?
  – How much stronger is blondie24 than a program that maximizes the piece differential on quiescent positions using alphabeta?
The No Free Lunch Theorem
• No Free Lunch Theorem (NFL)
  – Wolpert and Macready, "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, pp. 67-82, 1997
  – Concerned with optimization algorithms and their performance over different classes of problems
Statement of the NFL Theorem
• For any two algorithms a1 and a2:

    Σ_f P(d_m | f, m, a1) = Σ_f P(d_m | f, m, a2)

  – P is performance
  – m is a number of time steps
  – d_m is a particular set of m values
  – f is a problem
• In English: "the performance of any two optimizing algorithms is the same over the space of all possible problems."
NFL Theorem Does Not Say
• Does not say that Evolutionary Computing is no better than random search
• Does not say that comparisons between algorithms are useless
• Does not say that there does not exist a subset of problems that is more relevant than the set of all problems
NFL Theorem Says
• The optimizing method should be tailored to the problem domain
• If information on the problem domain is not taken into account, then on average no optimization method should be preferred
• Comparison experiments need to be qualified to a class of problems
• A theory of problem types is important
Conclusion
• blondie24 seems to show that evolutionary approaches can lead to "true AI" (but we have some reservations)
• blondie24 is a success for EC, but the magnitude of the accomplishment may be less than it seems
  – Checkers may be easy for computers
  – Quiescent search may greatly extend search depth
  – The addition of piece differential information may be critical
• The NFL theorem shows that Evolutionary Computing cannot be a "miracle" cure for everything
• An important task in EC: identifying domain knowledge