Evolutionary Computing with Neural Networks

Presentation Outline
• Neural Networks
  – Definition of a neural network
  – Training a neural network
• Evolutionary Computing with Neural Networks
• blondie24
• Remarks & Conclusion

What is a Neural Network?
• The fundamental processing element of a neural network is the neuron.
• A biological neuron:
  1. receives inputs from other sources,
  2. combines them in some way,
  3. performs a generally nonlinear operation on the result, and
  4. outputs the final result.
• A human brain has about 100 billion neurons; an ant brain has about 250,000.

Computational Structure of a Neuron
• A neuron forms a weighted sum of its inputs x1, …, xN with weights w1, …, wN and applies an activation function f:
  y = f( Σk wk·xk )

Multi-Layer Neural Network
• Multi-Layer Perceptron (MLP) structure: neurons arranged in layers, with the outputs of one layer feeding the next.

Back-propagation Algorithm
• Minimizes the mean squared error E = ½(d − o)² using gradient descent:
  W' = W − η·(dE/dW)
• The error is backpropagated into the previous layers one layer at a time.
• Does not guarantee an optimal solution, as it might converge to a local minimum.
• Takes a long time to train and requires a large amount of training data.

Summary of Neural Networks
• An artificial neural network is a powerful tool for non-linear mapping.
• Training is slow and requires a large amount of training data.

Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion

Evolution of Connection Weights
1. Encode each individual neural network's connection weights into a chromosome.
2. Calculate the error function and determine each individual's fitness.
3. Reproduce children based on a selection criterion.
4. Apply genetic operators.

Representation of Weights: Binary
• Weights are represented by binary bits.
• Limited representation precision. For example:
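As a concrete sketch of such a binary encoding, here is a hypothetical 8-bit sign-magnitude scheme; the scaling factor and function names are illustrative, not taken from any particular paper:

```python
# Hypothetical 8-bit sign-magnitude encoding of one connection weight:
# one sign bit plus 7 magnitude bits gives integer values -127..+127,
# which a scale factor maps onto real-valued weights.

SCALE = 127.0  # maps the integer range -127..+127 onto weights in [-1.0, 1.0]

def encode_weight(w: float) -> int:
    """Quantize a weight in [-1, 1] to an 8-bit sign-magnitude integer."""
    magnitude = min(127, round(abs(w) * SCALE))
    sign_bit = 0x80 if w < 0 else 0
    return sign_bit | magnitude

def decode_weight(bits: int) -> float:
    """Recover the (quantized) weight from its 8-bit encoding."""
    magnitude = bits & 0x7F
    sign = -1.0 if bits & 0x80 else 1.0
    return sign * magnitude / SCALE

w = 0.5
bits = encode_weight(w)
print(decode_weight(bits))  # close to 0.5, but limited to 1/127 precision
```

The quantization step is the precision limitation the slide refers to: with real-valued representation, by contrast, the chromosome stores each weight directly and mutation perturbs it in place.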
• 8 bits can represent connection weights between −127 and +127.
  – Too few bits → some weight values cannot be approximated.
  – Too many bits → training might be prolonged.
• The crossover operator is not intuitive for bit strings.
• One solution: divide the weights into functional blocks.

Representation of Weights: Real Numbers
• To overcome the limitations of binary representation, some proposed using real numbers, i.e., one real number per connection weight.
• Standard genetic operators such as crossover are not applicable to this representation.
• However, some argue that evolutionary computation can be performed with mutation alone.
• Fogel, Fogel and Porto (1990) adopted a single genetic operator: Gaussian random mutation.

Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion

Evolution of Architectures
1. Encode each individual neural network's architecture into a chromosome.
2. Train each neural network with a predetermined learning rule.
3. Calculate the error function and determine each individual's fitness.
4. Reproduce children based on a selection criterion.
5. Apply genetic operators.

Representation of Architectures: Direct Encoding
• All information is represented by binary strings, i.e., each connection and node is specified by some binary bits.
• An N×N matrix C = (cij) can represent the connectivity among N nodes, where
  cij = 1 if the connection is ON, and cij = 0 if it is OFF.
• Does not scale well, since a large network needs a big matrix to represent it.
• The crossover operator is not "meaningful".

Representation of Architectures: Indirect Encoding
• Only the most important parameters or features of an architecture are represented; other details are left to the learning process to decide.
• E.g., specify the number of hidden nodes and let the learning process decide how they are connected (e.g.
fully connected).
• More biologically plausible: according to discoveries in neuroscience, it is impossible for the genetic information encoded in humans to specify the whole nervous system directly.

Which is Better, EC or Heuristics?
• Empirical evidence suggests EC can outperform human experts at deciding a neural network architecture.
  – Chen and Lu (1998) evolved NNs with different numbers of inputs, hidden layers and hidden neurons, transfer functions, learning coefficients and momentums for the financial application of option pricing.
• Whether evolving architectures works is more uncertain than evolving connection weights, and must be judged case by case.
• Evolving architectures takes a (very) long time, but this might not be an issue when accuracy matters most (e.g. financial analysis).

Which is Better, EC or Heuristics?
• EC seems better suited to the characteristics of the architecture space:
  – infinite, as the number of nodes and connections is unbounded;
  – non-differentiable, as changes in the number of nodes and connections are discrete;
  – complex and noisy, as the correlation between architecture and performance is indirect;
  – deceptive, as neural networks with similar architectures may have dramatically different abilities;
  – multimodal, as neural networks with different architectures can have similar capabilities.

Presentation Outline
• Neural Networks
• Evolutionary Computing with Neural Networks
  – Evolution of Connection Weights
  – Evolution of Architectures
  – Evolution of Learning Rules
• blondie24
• Remarks & Conclusion

Evolution of Learning Rules
1. Decode each individual into a learning rule.
2. Construct a neural network (either pre-determined or random) and train it with the decoded learning rule.
• Training here refers to adapting the learning function: the connection weights are updated with an adaptive rule.
3. Calculate the error function and determine each individual's fitness.
4. Reproduce children based on a selection criterion.
5. Apply genetic operators.

Representation of Learning Rules
• Early attempts targeted algorithm parameters such as the learning rate, but the architecture was predefined.
• Representing a learning rule directly is impractical due to its dynamic behavior.
  – Constraints have to be set, e.g. a basic form for the learning rules.
  – Current efforts assume a learning rule to be a linear function of local variables and their products.

Why This Representation Can't Work
• The learning-rule equation has too many variables, which makes evolution extremely slow and impractical.
• It prevents more interesting learning rules from being evolved.
• A better representation is needed; more research is required here.

Summary
• Evolution of connection weights
  – A GA is used as the learning rule for the NN.
  – Most widely researched, and recognized as having good potential.
• Evolution of architectures
  – A GA selects general structural parameters; neural learning is used separately to train the networks.
  – Not clear whether EC is better.
• Evolution of learning rules
  – A GA selects a learning rule to update the weights during training.
  – A good potential area of research.

Presentation Outline
• Neural Networks
• Evolutionary Algorithms on NN
• blondie24
  – Overview
  – Results
  – alphabeta searching
  – Evaluation function
• Remarks & Conclusion

blondie24
• Kumar Chellapilla & David Fogel (1999).
• A checkers program called blondie24.
• alphabeta search on quiescent positions.
• Evolved a NN for evaluating checkers positions (the evaluation function).
• Aim: true artificial intelligence / machine learning.

blondie24 Results
• The program played against humans on www.zone.net: 90 games over 2 weeks, using depth 6, 8 or 10 alphabeta search.
• Dominated players rated <1800; results against players rated between 1800 and 1900 were about even.

blondie24 Results
• Best games (both opponents rated at Master
level):
  – a draw against a player rated 2207, and
  – a win against a player rated 2134.
• Final rating: 1914.4 (Class A level).
• A later version of blondie24 reached a final rating of 2045.85 (Master level), better than 95% of all checkers players.
• True AI?

alphabeta Search
• Knuth & Moore (1975): an improved minimax search, used by almost all game-playing programs.
• Quiescent positions: "stable" positions (e.g. no captures, no forced moves, etc.).
• Search depth is extended by 2 on non-quiescent positions.

alphabeta Pseudocode

double alphabeta(int depth, double alpha, double beta) {
    if (depth <= 0 || game is over)
        return eval();
    generate move list;
    for (each move m) {
        make move m;
        double val = -alphabeta(depth - 1, -beta, -alpha);
        unmake move m;
        if (val >= beta)   // cut-off
            return val;
        if (val > alpha)
            alpha = val;
    }
    return alpha;
}

blondie24 Evaluation Function
• Neural network: 2 hidden layers with 40 and 10 nodes respectively, fully connected.
• Input: a vector representing the checkers board position, plus piece-differential information.
• Output: an evaluation of the checkers position in the range [−1, 1].

Evolution of the NN
• Initial population: 15 randomly weighted NNs.
• Each of the 15 NNs produces 1 offspring (by mutation), for a total of 30 NNs.
• Each player plays against 5 randomly selected opponents using depth-4 alphabeta.
• The top 15 performers are retained.
• 250 generations (about 1 month of computation).

Presentation Outline
• Neural Networks
• Evolutionary Algorithms on NN
• blondie24
• Remarks & Conclusion

Remarks on blondie24
• Class A / Master rating
  – based on ratings at www.zone.net
  – is checkers an easy game for computers?
• Quiescent search
  – non-quiescent positions occur frequently in checkers
• Minimal input information
  – piece-differential information may be crucial
  – an Othello program fares poorly with no domain-specific information

Remarks on blondie24
• blondie24 is an alphabeta search program that maximizes f(piece differential + some value from the NN).
  – How significant is the NN's contribution?
  – How much stronger is blondie24 than a program that simply maximizes the piece differential on quiescent positions using alphabeta?

The No Free Lunch Theorem
• Wolpert and Macready, "No Free Lunch Theorems for Optimization", IEEE Transactions on Evolutionary Computation, Vol. 1, No. 1, pp. 67–82, 1997.
• Concerned with optimization algorithms and their performance over different classes of problems.

Statement of the NFL Theorem
• For any two algorithms a1 and a2:
  Σf P(dm | f, m, a1) = Σf P(dm | f, m, a2)
  – P is a performance measure,
  – m is a number of time steps,
  – dm is a particular set of m values,
  – f is a problem.
• In English: "the performance of any two optimizing algorithms is the same over the space of all possible problems."

What the NFL Theorem Does Not Say
• It does not say that evolutionary computing is no better than random search.
• It does not say that comparisons between algorithms are useless.
• It does not say that no subset of problems is more relevant than the set of all problems.

What the NFL Theorem Says
• The optimizing method should be tailored to the problem domain.
• If information about the problem domain is not taken into account, then on average no optimization method should be preferred.
• Comparison experiments need to be qualified to a class of problems.
• A theory of problem types is important.

Conclusion
• blondie24 seems to show that evolutionary approaches can lead to "true AI" (but we have some reservations).
• blondie24 is a success for EC, but the magnitude of the accomplishment may be less than it seems:
  – checkers may be easy for computers;
  – quiescent search may greatly extend the search depth;
  – the addition of piece-differential information may be critical.
• The NFL theorem shows that evolutionary computing cannot be a "miracle" cure for everything.
• An important task in EC: identifying domain knowledge.
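As a closing illustration, the mutate-and-select loop that blondie24 used (15 parents, 15 offspring, keep the best 15) can be sketched in miniature. The fitness function below is a stand-in only: blondie24's fitness came from games against five randomly sampled opponents, and all names and constants here other than the population size are ours.

```python
import random

random.seed(0)

MU = 15           # parents, as in blondie24
GENERATIONS = 50  # blondie24 ran 250 generations of self-play

def fitness(weights):
    # Stand-in for "performance against sampled opponents": score how
    # close the weight vector is to an arbitrary target (0 is best).
    target = [0.3, -0.7, 0.5]
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def mutate(weights, sigma=0.1):
    # Gaussian random mutation, the single genetic operator used here.
    return [w + random.gauss(0.0, sigma) for w in weights]

# 15 random parents; each produces one child; keep the best 15 of the 30.
population = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(MU)]
for _ in range(GENERATIONS):
    offspring = [mutate(p) for p in population]
    population = sorted(population + offspring, key=fitness, reverse=True)[:MU]

best = population[0]
print(fitness(best))  # climbs toward 0 as the weights approach the target
```

Because parents compete with their offspring for survival, the best fitness never decreases from one generation to the next; this elitism is what lets a mutation-only scheme make steady progress over many generations.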