Lecture 10: Hybrid Systems

Hybrid Systems: Synergies of Fuzzy, Neural and Evolutionary Computing

Evolutionary Fuzzy Systems

Although fuzzy logic systems have been successfully applied in many complex industrial processes, they suffer from a deficiency in knowledge acquisition and rely to a great extent on empirical and heuristic knowledge, which in many cases cannot be objectively elicited. One of the most important considerations in designing a fuzzy system is the construction of the membership functions for each fuzzy set, as well as of the rule-base. In most existing applications the fuzzy rules are generated by an expert in the area, especially for control problems with only a few inputs. The correct choice of membership functions is by no means trivial, yet it plays a crucial role in the success of an application. Traditionally, membership functions have been generated interactively, by trial and error, or by human experts. With an increasing number of inputs and linguistic variables, the possible number of rules for the system increases exponentially, which makes it difficult for experts to define a complete set of rules and associated membership functions that give good system performance. An automated way to design fuzzy systems is therefore preferable, and there are many ways to approach problems of this nature. The design of a fuzzy system can be formulated as a search problem in a high-dimensional space, where each point represents a rule-base, a set of membership functions and the corresponding system performance: the performance of the system forms a hyper-surface over the space according to the given performance criteria. Finding the optimal location on this hyper-surface is a search problem, equivalent to finding the optimal fuzzy system design (Shi et al., 1999).
These characteristics make evolutionary algorithms, specifically genetic algorithms, more suitable for searching this hyper-surface than many conventional methods such as hill climbing. Efforts have been made to automate the construction of rule-bases and the definition of membership functions in various ways using genetic algorithms. In most cases either the rule-base is fixed and the parameters of the membership functions are adjusted, or the membership functions are fixed and the genetic algorithm optimises the rule-base. Some researchers have optimised the rule-base, the membership functions, the scaling factors and the controller parameters together, which seems somewhat redundant. A block diagram of the GA-fuzzy system (a controller) is shown in Figure 1. Investigations involving several example applications have demonstrated that EAs are capable of optimising the membership functions as well as the rule-bases of fuzzy logic controllers. In general, the number of fuzzy rules increases exponentially with the number of input variables or linguistic labels, so it is very difficult to determine and select the rules in such a large rule space that are most suitable for controlling the process. Secondly, the membership functions play an important role in determining the prescribed control action and the performance of the system, and in multivariable complex processes their optimisation and selection is also very difficult. There are different arguments on whether the membership functions or the rule-base should be optimised; based on the research carried out in this area, the approaches can be divided into membership function optimisation, rule-base optimisation and optimisation of other parameters. We will now investigate how an EA can be applied to a fuzzy system. Let the initial MFs and rule-base of the fuzzy system in Figure 1 be defined as shown in Figures 2 and 3.
http://www.infm.ulst.ac.uk/~siddique

Figure 1: EA-based optimisation of a FLC. (The EA adjusts the rule-base and the input MFs (fuzzification) and output MFs (defuzzification) of the FLC; the FLC receives the error e between the set point and the plant output, the inference engine produces the control input u for the plant, and the accumulated error Σ|e| is fed back to the EA as the performance measure.)

Figure 2: Initial membership functions of inputs and output. Five fuzzy sets (nb, ns, zo, ps, pb) are defined on each universe of discourse: (a) error on [-25, 25]; (b) change of error on [-25, 25]; (c) sum of error on [-36, 36]; (d) control input on [-3, 3].

Figure 3: Rule-base for a 2-input, 1-output system (consistent with the encoding in Figure 7):

                    Change of error
error       NB    NS    ZO    PS    PB
NB          PB    PB    PB    PS    ZO
NS          PB    PS    PS    ZO    NS
ZO          PS    ZO    ZO    ZO    NS
PS          PS    ZO    NS    NS    NB
PB          ZO    NS    NS    NB    NB

Chromosome Representation

One of the key issues in the evolutionary design of fuzzy systems using GAs is the genotype representation, i.e. the information encoded in the chromosomes. A fuzzy system is fully specified only when the rules and the membership functions associated with each fuzzy set are determined. This can be done in three ways: chromosome representation of the membership functions, chromosome representation of the rule-base, or chromosome representation of both membership functions and rule-base together.

Chromosome Representation of Membership Functions

To translate membership functions into a representation usable as genetic material, the functions are parameterised with one to four coefficients, and each of these coefficients constitutes a gene of the chromosome. In fuzzy system design one can frequently assume triangular membership functions, each of which is specified by just three parameters: the left position, the peak and the right position. An overlap (of not more than 50%) of the fuzzy sets is desired to ensure good performance of the system.
Therefore, the left and peak positions of each fuzzy set coincide with the peak and right positions of the previous fuzzy set, as shown in Figure 4.

Figure 4: Parameterised membership functions (parameters a1...a7 for input 1, b1...b7 for input 2 and c1...c7 for the output).

Seven parameters are needed to define five fuzzy sets for each input or output; that is, the five membership functions, each with three parameters, are (a1, a2, a3), (a2, a3, a4), (a3, a4, a5), (a4, a5, a6) and (a5, a6, a7). There are 21 parameters in total for both inputs and the output. The number of parameters can be reduced by fixing the upper and lower limits of the universe of discourse for each input and output, as shown in Figure 5(a). The chromosome for the membership functions then takes the form shown in Figure 5(b):

{a1, a2, a3, a4, a5 | b1, b2, b3, b4, b5 | c1, c2, c3, c4, c5}

where a1-a5 are the parameters of input 1 (error), b1-b5 those of input 2 (change of error) and c1-c5 those of the output.

Figure 5: Reduced chromosome representation for MFs: (a) fixed upper and lower limits of the membership functions; (b) chromosome representation for the MFs.

Example 1: Consider a fuzzy system described by the following MFs and rule-base, where x1 ≅ error and x2 ≅ change of error. Input x1 is covered by two overlapping fuzzy sets A1 and A2 (on roughly [1, 6]) and input x2 by two overlapping fuzzy sets B1 and B2 (on roughly [4, 9]).

Table: Rule-base for the Sugeno-type FLC

x1 ≅ error \ x2 ≅ Ch_error    B1    B2
A1                            z1    z3
A2                            z2    z4

where z1 = a1 x1 + b1 x2 + 1, z2 = a2 x1 + b2 x2 + 1, z3 = a3 x1 + b3 x2, and z4 = a4 x1 + b4. Explain how a GA can be applied to optimise the parameters of the fuzzy system. Develop a chromosome representation for the MFs.

Chromosome Representation of Rule-base

GAs can be used to optimise the rule-base of a fuzzy system.
The linguistic variables can be represented by integer values, for example 0 for NB, 1 for NS, 2 for ZO, 3 for PS and 4 for PB. Applying this code to the fuzzy rule-base shown in Figure 3 gives the encoded rule-base shown in Figure 6. A chromosome is obtained from the decision table by going row-wise and coding each output fuzzy set as an integer in {0, 1, ..., n}, where n is the largest integer used to label the membership functions defined for the output variable of the fuzzy system. In this case n = 4, as shown in Figure 7.

Figure 6: Encoding of the rule-base (NB→0, NS→1, ZO→2, PS→3, PB→4); each cell of the decision table in Figure 3 is replaced by its integer code.

Figure 7: Chromosome representation of the rule-base:

{4 4 4 3 2 | 4 3 3 2 1 | 3 2 2 2 1 | 3 2 1 1 0 | 2 1 0 0 0}

where the five blocks are the 1st to 5th rows of the decision table.

Example 2: Show a chromosome representation of the rule-base for the problem in Example 1.

Chromosome Representation of both MFs and Rule-base

Chromosome representation of both MFs and rule-base together is straightforward: concatenating the two chromosome strings gives a simple combined representation, illustrated in Figure 8.

Figure 8: Chromosome representation of MFs and rule-base:

{a1 a2 a3 a4 a5 | b1 b2 b3 b4 b5 | c1 c2 c3 c4 c5 | 4 4 4 3 2 | 4 3 3 2 1 | 3 2 2 2 1 | 3 2 1 1 0 | 2 1 0 0 0}

where the first three blocks encode the MFs of input 1, input 2 and the output, and the remaining five blocks are the rows of the rule-base.

There will be two different mutation operators for the two parts of the chromosome string.
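A minimal sketch of such a combined chromosome and its two part-specific mutation operators (the MF parameter values and the mutation settings here are made-up illustrations, not taken from the lecture):

```python
import random

# Combined chromosome: 15 real MF parameters (5 per variable, as in Figure 5)
# followed by 25 integer rule codes (the string of Figure 7).
mf_genes = [-15.0, -8.0, 0.0, 8.0, 15.0,     # input 1 (error), illustrative
            -20.0, -10.0, 0.0, 10.0, 20.0,   # input 2 (change of error)
            -2.0, -1.0, 0.0, 1.0, 2.0]       # output
rule_genes = [4,4,4,3,2, 4,3,3,2,1, 3,2,2,2,1, 3,2,1,1,0, 2,1,0,0,0]
chromosome = mf_genes + rule_genes

def mutate(chrom, n_mf=15, p=0.1, sigma=1.0, n_levels=5):
    """Apply the two mutation operators: real-valued perturbation for the
    MF part, one-level integer shift for the rule part."""
    child = list(chrom)
    for i in range(len(child)):
        if random.random() >= p:
            continue
        if i < n_mf:
            # MF part: replace the gene with a nearby real value
            child[i] += random.gauss(0.0, sigma)
        else:
            # rule part: move one linguistic level up or down, clipped
            # to the valid codes 0..n_levels-1
            step = random.choice([-1, 1])
            child[i] = min(n_levels - 1, max(0, child[i] + step))
    return child

child = mutate(chromosome)
```

The clipping in the rule part is what keeps every mutated gene a valid linguistic code, which binary coding cannot guarantee.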
The genes in the membership-function part of the chromosome are replaced by a real value, whereas the genes in the rule-base part are changed one integer level up or down, to avoid a possible large deterioration in performance. A problem with binary coding arises in the chromosome representation of the rule-base. When mutation is applied to a linguistic code, it should alter it to another valid linguistic code within a small linguistic distance, i.e. one level up or down. This is illustrated in Figure 9: three bits are needed to represent the integer values 0 to 4 for five linguistic labels, but mutating a single bit of the code 001 (= 1) can produce 101 (= 5), which is not a valid linguistic code at all and can therefore cause a large deterioration in performance. Such a large jump caused by mutation is difficult to control in binary coding, so an integer-valued coding is advisable for the chromosome representation of the rule-base.

Figure 9: Problem in rule-base mutation using binary coding: mutating the first bit of r_j = 001 (= 1) yields 101 (= 5), an invalid linguistic code.

Example 3: Show a chromosome representation of the MFs and rule-base for the problem in Example 1.

Objective Function

Finding a good fitness measure is quite important when evolving practical systems using ECs. Unlike traditional gradient-based methods, ECs can be used to evolve systems under any kind of fitness function, including non-differentiable and discontinuous ones. How the fitness function is defined for the system to be evolved is problem dependent. The procedure for evaluating a knowledge base, i.e. the membership functions and rule-base, consists of submitting it to a simulation model or real system, which returns an assessment value according to a given cost function J that is subject to minimisation.
In many cases J is determined as a summation over time of some instantaneous cost rate. For example, a trial knowledge base can be used to control a model of the process, and the errors summed over the response trajectory; this sum of errors is then directly related to the objective fitness of the trial. The fitness of a trial is a measure of the overall worth of a solution, taking into account the factors of an objective criterion, in this case the performance of the fuzzy system implemented with the trial knowledge base. The objective is simply stated as the ability to follow a set point with minimal error, and can thus be expressed as minimisation of the system performance indices in common use: the integral of absolute error (IAE), the integral of square error (ISE) and the integral of time-weighted absolute error (ITAE). Assume a system with multiple inputs and outputs whose overall design effectiveness can be measured by a single output of the overall system, such as the error. All membership functions and the rule-base can then be expressed as a list of m parameters (covering the membership functions and the rules), (p1, p2, ..., pm) = p, where each parameter takes only a finite set of values. The IAE criterion is

J(p) = Σ_{k=1}^{n} |e(k)|     (1)

the ISE criterion is

J(p) = Σ_{k=1}^{n} e(k)²     (2)

and the ITAE criterion is

J(p) = Σ_{k=1}^{n} k·Δt·|e(k)|     (3)

where e(k) is the output error of the system and n is a reasonable number of time steps by which the system can be assumed to have settled quite close to the set point. The objective is to minimise J(p) with respect to p.

Evaluation

The practical problem of implementation is how to evaluate each chromosome in the population.
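This evaluation can be sketched as follows, assuming a hypothetical simulate(chromosome) that runs the plant model under the decoded fuzzy controller and returns the error sequence e(1), ..., e(n):

```python
def iae(errors):                      # integral of absolute error, Eq. (1)
    return sum(abs(e) for e in errors)

def ise(errors):                      # integral of square error, Eq. (2)
    return sum(e * e for e in errors)

def itae(errors, dt=0.01):            # integral of time-weighted absolute error, Eq. (3)
    return sum(k * dt * abs(e) for k, e in enumerate(errors, start=1))

def evaluate(population, simulate, cost=iae):
    """Assign each chromosome a fitness: lower cost J means higher fitness."""
    fitness = []
    for chrom in population:
        errors = simulate(chrom)                    # run the plant with the decoded FLC
        fitness.append(1.0 / (1.0 + cost(errors)))  # minimise J, maximise fitness
    return fitness
```

The 1/(1 + J) transform is one common way (an assumption here, not from the lecture) to turn a cost to be minimised into a fitness to be maximised.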
For each individual of the population, the fuzzy system it encodes is applied to the plant, its performance is evaluated by calculating the sum of absolute errors, and that value is assigned as the individual's fitness. The time taken to evaluate genetic structures, especially in the case of a fuzzy system or controller, imposes restrictions on the population size and on the number of generations the GA can be run to reach a final solution.

References:
1. Shi, Y., Eberhart, R. and Chen, Y. (1999). Implementation of evolutionary fuzzy systems, IEEE Transactions on Fuzzy Systems, vol. 7, pp. 109-119.
2. Ishibuchi, H., Nozaki, K., Yamamoto, N. and Tanaka, H. (1995). Selecting fuzzy if-then rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy Systems, vol. 3, pp. 260-270.
3. Chin, T.C. and Qi, X.M. (1997). Genetic algorithms for learning the rule base of fuzzy logic controller, Fuzzy Sets and Systems, vol. 97, pp. 1-7.
4. Karr, C.L. and Gentry, E.J. (1993). Fuzzy control of pH using genetic algorithms, IEEE Transactions on Fuzzy Systems, vol. 1, no. 1, pp. 46-53.
5. Huang, Y.-P. and Huang, C.-H. (1997). Real-valued genetic algorithms for fuzzy grey prediction system, Fuzzy Sets and Systems, vol. 87, no. 3, pp. 265-276.
6. Homaifar, A. and McCormick, E. (1995). Simultaneous design of membership functions and rule sets for fuzzy controllers using genetic algorithms, IEEE Transactions on Fuzzy Systems, vol. 3, no. 2, pp. 129-139.
7. Qi, X.M. and Chin, T.C. (1997). Genetic algorithms based fuzzy controller for higher order systems, Fuzzy Sets and Systems, vol. 91, pp. 279-284.
8. Cho, H.-J., Cho, K.-B. and Wang, B.-H. (1997). Fuzzy-PID hybrid control: automatic rule generation using genetic algorithms, Fuzzy Sets and Systems, vol. 92, pp. 305-316.
9. Siarry, P. and Guely, F. (1998). A genetic algorithm for optimizing Takagi-Sugeno fuzzy rule bases, Fuzzy Sets and Systems, vol. 99, pp. 37-47.
10. Linkens, D.A. and Nyongesa, H.O.
(1995a). Genetic algorithms for fuzzy control, Part 1: Offline system development and application, IEE Proceedings - Control Theory and Applications, vol. 142, no. 3, pp. 161-176.
11. Linkens, D.A. and Nyongesa, H.O. (1995b). Genetic algorithms for fuzzy control, Part 2: Online system development and application, IEE Proceedings - Control Theory and Applications, vol. 142, no. 3, pp. 177-185.
12. Karr, C.L. (1991). Design of an adaptive fuzzy logic controller using a genetic algorithm, Proceedings of the 4th International Conference on Genetic Algorithms, Morgan Kaufmann, San Mateo, CA, pp. 450-457.
13. Park, Y.J., Cho, H.S. and Cha, D.H. (1995). Genetic algorithm-based optimization of fuzzy logic controller using characteristic parameters, IEEE International Conference on Evolutionary Computation, Perth, Western Australia, Nov 29 - Dec 1, vol. 2, pp. 831-836.
14. Markrehchi, M. (1995). Application of genetic algorithms in fuzzy rules generation, IEEE International Conference on Evolutionary Computation, Perth, Western Australia, Nov 29 - Dec 1, vol. 2, pp. 251-256.

Neuro-Fuzzy Systems

Features of Neural and Fuzzy Systems

Fuzzy systems and neural networks, both model-free systems, have their own advantages and drawbacks. Combining them, in what are popularly known as fuzzy neural networks, seeks to maximise the desirable properties and reduce the disadvantages of both. Subjective phenomena such as reasoning and perception are often regarded as beyond the domain of conventional neural network theory, whereas fuzzy logic is a powerful tool for modelling the uncertainties associated with human cognition, thinking and perception. Paradigms based upon this integration are believed to have considerable potential in control systems, adaptive systems and autonomous systems.
Neural networks:
- No mathematical model required
- No rule-base required
- Different learning algorithms available
- Capable of learning from experiential data
- Capable of working without much a priori information
- Black box: rules cannot be extracted

Fuzzy systems:
- No mathematical model required
- A prior rule-base can be used
- Simple interpretation and implementation
- Rules must be available
- No formal methods for tuning

Neuro-Fuzzy systems

The aim of a neuro-fuzzy system is to find the parameters of a fuzzy system by means of learning methods obtained from neural networks. The most important reason for combining fuzzy systems with neural networks is this learning capability. Such a combination should be able to learn linguistic rules and/or membership functions, or to optimise existing ones. Learning in this case means:
- creating a rule-base,
- adjusting the membership functions, and
- determining other system parameters.

Types of Neuro-Fuzzy systems

In general, two kinds of combination of neural networks and fuzzy systems are distinguished:
- cooperative neuro-fuzzy systems, and
- hybrid neuro-fuzzy systems.

Cooperative neuro-fuzzy systems

Here the combination lies in using a neural network to determine certain parameters of the fuzzy system (as listed above), or vice versa, while the neural network and the fuzzy system otherwise work independently of each other.

Fuzzy-NN cooperative systems

In this form of cooperation, a fuzzy system translates linguistic statements into suitable perceptions in the form of input data for a NN, which then produces the decisions.

Figure 1: Cooperative Fuzzy-Neural System (linguistic statements → fuzzy inference system → perceptions as NN inputs → neural network, trained by a learning algorithm → decisions).

NN-Fuzzy cooperative systems

In this form of cooperation, a neural network determines the membership functions from training data, either by determining suitable parameters or by approximating the membership functions directly, as shown in Figure 2.
Figure 2: Learning fuzzy sets (experiential data → neural network → MFs, which together with the rule-base feed the fuzzy inference system producing decisions/perceptions as output).

Alternatively, a neural network can determine the fuzzy rules from training data. A clustering approach is usually applied, with the neural network learning offline; such a neuro-fuzzy system is shown in Figure 3.

Figure 3: Learning fuzzy rules (experiential data → neural network → rule-base, combined with the MFs in the fuzzy inference system).

A neural network can also determine parameters online, i.e. during the use of the fuzzy system, to adapt the membership functions, and it can likewise learn the weights of the rules online or offline; such a neuro-fuzzy system is shown in Figure 4.

Figure 4: Learning fuzzy rule weights (training data → neural network → fuzzy rules, rule weights, MFs and/or parameters → fuzzy system, with error determination closing the loop).

Hybrid neuro-fuzzy systems

The idea of the hybrid approach is to interpret a fuzzy system in terms of a neural network: the parameters of the fuzzy system are found by means of learning methods obtained from neural networks. A common way to apply a learning algorithm to a fuzzy system is to represent the system in a special neural-network-like architecture; a learning algorithm such as backpropagation can then be used to train it. An adaptive neuro-fuzzy system with two inputs x1 and x2, four rules r1...r4 and one output Y is shown in Figure 5.

Figure 5: Hybrid neuro-fuzzy system (layer 1: fuzzy sets A1, A2, B1, B2; layer 2: rule nodes r1...r4 producing firing strengths w_i; layer 3: normalisation nodes N; layer 4: weighted consequents w̄_i·f_i; layer 5: summation Σ producing Y).

This network is described layer by layer as follows.

Layer 1: Every node i in this layer is an adaptive node with a triangular membership function, where x1 and x2 are the angle error and the change of error. These nodes calculate the membership grades of the inputs:

O_{1,j} = μ_{Aj}(x1) and O_{1,j} = μ_{Bj}(x2), where j = 1, 2     (1)
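Layer 1's membership-grade computation might be sketched with triangular MFs as follows (the set parameters and inputs here are illustrative, not taken from the lecture):

```python
def trimf(x, left, peak, right):
    """Triangular membership grade of x for the set (left, peak, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)   # rising edge
    return (right - x) / (right - peak)     # falling edge

# Layer 1: membership grades of the two inputs for sets A1, A2 and B1, B2
A = [(-2.0, -1.0, 0.5), (-0.5, 1.0, 2.0)]   # A1, A2 on x1 (error)
B = [(-2.0, -1.0, 0.5), (-0.5, 1.0, 2.0)]   # B1, B2 on x2 (change of error)

x1, x2 = 0.0, 0.5
mu_A = [trimf(x1, *p) for p in A]
mu_B = [trimf(x2, *p) for p in B]
```

These grades are exactly the O_{1,j} values that feed the rule nodes of layer 2.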
Layer 2: Every node in this layer is a fixed node representing one of the four rules r1...r4. Each node determines the firing strength of its rule:

w_i = μ_{Aj}(x1) · μ_{Bj}(x2), i = 1, 2, 3, 4; j = 1, 2     (2)

Layer 3: Every node in this layer is a fixed node labelled N. Each node calculates the normalised firing strength:

w̄_i = w_i / (w_1 + w_2 + w_3 + w_4), i = 1, 2, ..., 4     (3)

Layer 4: Every node in this layer is an adaptive node with a linear function

f_i = a_i·x1 + b_i·x2 + c_i, i = 1, 2, ..., 4     (4)

where a_i, b_i and c_i, i = 1, 2, ..., 4, are the parameters of the consequent part of the rule-base. Each node calculates the weighted value of the consequent part of its rule:

w̄_i·f_i = w̄_i·(a_i·x1 + b_i·x2 + c_i), i = 1, 2, ..., 4     (5)

Layer 5: The single node in this layer produces the control output by aggregating all the fired rule values:

Y = Σ_{i=1}^{4} w̄_i·f_i     (6)

Thus an adaptive network has been created that is functionally equivalent to a Sugeno-type fuzzy model. The extension from a Sugeno-type neuro-fuzzy system to a Tsukamoto-type one is straightforward. For a Mamdani-type inference system with max-min composition, a corresponding adaptive system can be constructed if discrete approximations are used to replace the integrals in the centroid (or other) defuzzification scheme.

Example 1: A Mamdani-type fuzzy system is described by the following MFs and rule-base: fuzzy sets L, M and H for the error (around 50-55), sets L, M and H for the change of error (on [-5, +5]), and sets L, LM, M, HM and H for the torque (on [0, 10]).

Rule-base for the Mamdani-type fuzzy system:

                  Change of error
error       H     M     L
H           LM    HM    H
M           LM    M     HM
L           L     LM    HM

Develop a neuro-fuzzy system that is equivalent to this Mamdani-type fuzzy system.

References

1. M.N.H. Siddique and M.O.
Tokhi, "GA-based Neuro-Fuzzy Controller for Flexible-link Manipulators", UK Conference on Computational Intelligence (UK-CI), Edinburgh, UK, September 10-12, 2001.
2. J.-S. Roger Jang, C.-T. Sun and E. Mizutani, "Neuro-fuzzy and Soft Computing", Prentice Hall, 1997.
3. M.N.H. Siddique and M.O. Tokhi, "Neuro-Fuzzy Controller for Flexible-link Manipulators", International Conference on Artificial Intelligence (IC-AI), Las Vegas, Nevada, USA, June 25-28, 2001.
4. D. Nauck, F. Klawonn and R. Kruse, Foundations of Neuro-Fuzzy Systems, John Wiley and Sons, 1997.

Evolutionary Neural Networks

Neuro-Evolutionary Systems

By evolutionary neural network systems is mainly meant the design and training of neural networks by evolutionary algorithms. The most popular and widely used training procedure for neural networks, the backpropagation algorithm, suffers from a number of problems:
- its speed and robustness are sensitive to several of its control parameters, such as the initialisation and the number of hidden layers;
- the best parameters to use seem to vary from problem to problem;
- there is no known approach for specifying an appropriate architecture for a new problem;
- it is slow for large problems;
- it very often gets stuck in a local minimum or on a plateau.

Several researchers have begun to investigate robust methods for overcoming these kinds of problems, and EAs are one such method. Interest in combinations of neural networks and evolutionary search procedures has grown rapidly in recent years. There are several arguments in favour of applying EAs to NN optimisation (of weights and/or topology): an EA performs a global search of the parameter space and can thereby avoid local minima, and it is advantageous where gradient information is difficult or costly to obtain. This implies that EAs can potentially be applied to reinforcement learning problems with sparse feedback and to training NNs with non-differentiable neurons.
The only obvious disadvantage of EAs is their slow time scale. Various schemes for combining evolutionary algorithms and neural networks have been proposed and tested by many researchers in the last decade, but the literature is scattered among a variety of journals, proceedings and technical reports. Mainly two types of combination have been reported in the literature so far:
- supportive combinations, in which the two methods are used sequentially, and
- collaborative combinations, in which they are used simultaneously.

Supportive Combinations

Supportive combinations typically involve the use of one of these methods to prepare data for consumption by the other. The supportive mechanism can work in two directions: neural networks assisting an evolutionary algorithm (NN-EA) and evolutionary algorithms assisting neural networks (EA-NN). In the case of NN-EA, the concept is that there are natural groupings among problems and that certain sets of heuristics make better starting points for some groups than for others. The neural network's job is to learn this grouping and suggest starting points (sets of selection heuristics or parameters) for the evolutionary algorithm. The neural networks are mostly pattern associators, matching the description of an incoming problem with a good parameter set, and are trained using backpropagation. The diagram in Figure 1 shows this supportive combination.

Figure 1: Supportive combination of NN-EA (raw data → neural network → set of selection heuristics or parameters → evolutionary algorithm, starting from an initial population).

In the case of EA-NN, the supportive mechanisms can be divided into three categories according to the stage of the process at which they are used:
- EA to select input features or to transform the feature space used by a NN classifier;
- EA to select the learning rules or parameters that control learning in a NN;
- EA to analyse a NN.

Feature selection: Often the key to getting good results with a pattern classifier lies as much in how the data are presented as in the classifier itself. EAs have been used to prepare data in two ways: transforming the feature space and selecting a subset of relevant features. The first approach, transforming the feature space, has mainly been applied to nearest-neighbour type algorithms: by letting the EA choose the rotation and scaling parameters, the data are aligned so that intraclass differences are diminished and interclass differences are magnified. In the second approach, a subset of input features is chosen, since a restricted feature set can improve the performance of a neural network classifier as well as reduce the computation requirements. The main drawback of this approach is the high computation time required to train a network classifier for each feature subset specified by a chromosome.

Learning the learning rules: Backpropagation implements a gradient descent method, which has the drawbacks of being slow for large problems and of being susceptible to getting stuck in a local minimum or on a plateau; it can also stop at premature convergence or take too long to converge owing to the heuristic selection of control parameters such as the learning rate (η), momentum (α) and acceleration (β) in the update rule

Δw_i(t) = -η·∂E/∂w_i + α·Δw_i(t-1) + β·Δw_i(t-2)     (1)

Backpropagation's speed and robustness are sensitive to several of these control parameters, and the best values seem to vary from problem to problem; mostly they are found by trial and error. Several researchers have used EAs to learn the control parameters of a NN (Harp et al., 1989; Belew et al., 1990).
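The update rule of Eq. (1) might be sketched as follows (the η, α, β values and the sample inputs are illustrative):

```python
def delta_w(grad, prev_dw, prev_prev_dw, eta=0.1, alpha=0.9, beta=0.1):
    """Weight change of Eq. (1): gradient step plus momentum (alpha)
    and acceleration (beta) terms over the two previous changes."""
    return -eta * grad + alpha * prev_dw + beta * prev_prev_dw

# one illustrative step: dE/dw = 0.5, previous two changes 0.02 and 0.01
dw = delta_w(grad=0.5, prev_dw=0.02, prev_prev_dw=0.01)
```

An EA that learns the learning rule simply searches over (η, α, β) instead of fixing them by trial and error.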
Explaining and analysing neural networks: One of the barriers to the acceptance of NNs is the lack of an explanation facility similar to those available in most expert systems. Instead of using EAs to build better NNs, a few researchers have used EAs to help explain or analyse a NN. To explore the 'decision surface' of a NN, an EA can be used to discover input patterns that result in maximum or near-maximum activation values for a given output neuron. The input patterns are represented in the chromosome as a set of real values between 0.0 and 1.0, and the EA is set to discover three different types of vectors: (i) maximum activation vectors, for which the output node is activated; (ii) minimum activation vectors, for which the output node is off; and (iii) decision vectors, for which the output node is at the decision threshold. Multiple runs of the EA with different random seeds can be used to find a set of vectors of each type. Such a supportive combination is shown in Figure 2.

Figure 2: Supportive combination of EA-NN (features and learning parameters → evolutionary algorithm → transformed features, learning rule or network explanation → neural network → classification, recognition or approximation).

Collaborative Combinations

In collaborative combinations, the EA and the NN function together to solve the problem. There are two main groupings among collaborative approaches. First, evolutionary search has been used to find appropriate connection weights in a fixed architecture, as shown in Figure 3. Alternatively, EAs have been used to find network architectures (topologies), which are then trained and evaluated using some learning procedure such as backpropagation, as shown in Figure 4. Supervised learning in a NN has mostly been formulated as a weight-training process, in which an effort is made to find an optimal set of connection weights according to some optimality criterion.
A global search procedure such as an EA can be used effectively in the training process as an evolution of the connection weights towards an optimal set defined by a fitness function. The fitness can be defined from the sum of squared errors (SSE) or the mean squared error (MSE) over a set of training data:

f(N) = Σ_P e²     (2)

f(N) = (1/P) Σ_P e²     (3)

where N denotes the network and P is the number of training patterns.

Figure 3: Learning weights (the EA proposes weight changes Δw; the network is run on the training data against the targets and Σe² is returned as the fitness).

Figure 4: Learning architecture (the EA proposes the connectivity; the network is trained on the training data against the targets and Σe² is returned as the fitness).

Chromosome representation

The most convenient and straightforward chromosome representation of the connection weights and biases is in string form. In such a representation scheme, each connection weight and bias is represented by a binary string of a certain length. An example of such a string representation for a feedforward NN with 5 neurons is shown in Figure 5.

Figure 5: Chromosome represented in string form: the weights w1...w6 and biases b1...b3 of the network are concatenated into the string w1 w2 w3 w4 w5 w6 b1 b2 b3.

The binary encoding of the connection weights need not be the uniform encoding adopted by many researchers; it can also be Gray, exponential or more sophisticated. A limitation of binary representation is the precision of the discretised connection weights. If too few bits are used to represent each weight, training may take an extremely long time or even fail. On the other hand, if too many bits are used, the chromosome strings for a large NN become very long, which prolongs the evolution dramatically and can make it impractical. How to optimise the number of bits per connection weight, the encoded range and the encoding scheme is still an open issue. A dynamic encoding scheme can be adopted to alleviate these problems.
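The precision trade-off of a uniform binary encoding can be illustrated as follows (the bit width and weight range are assumptions for the sketch):

```python
def encode(w, bits=8, w_min=-1.0, w_max=1.0):
    """Discretise a weight into a fixed-length bit string."""
    levels = 2 ** bits - 1
    level = round((w - w_min) / (w_max - w_min) * levels)
    return format(level, f'0{bits}b')

def decode(s, w_min=-1.0, w_max=1.0):
    """Recover the (quantised) weight from its bit string."""
    levels = 2 ** len(s) - 1
    return w_min + int(s, 2) / levels * (w_max - w_min)

# With 8 bits over [-1, 1] the quantisation step is 2/255, so the
# round-trip error is at most about 0.004; fewer bits coarsen this,
# more bits lengthen the chromosome.
w = 0.37
err = abs(decode(encode(w)) - w)
```

Gray or exponential codes change how bit strings map to levels, but the precision/length trade-off is the same.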
To overcome these shortcomings of the binary representation scheme, real numbers were proposed, i.e. one real number per connection weight. The chromosome is then represented by concatenating these numbers into a string, as in Figure 5. The advantages are many-fold, such as shorter string length with increased precision. Various kinds of crossover, including adaptive crossover, are applicable here. The standard mutation operation on binary strings cannot be applied directly in the real representation scheme; an important task in such circumstances is to carefully design a set of genetic operators suited to the real encoding. For example, mutation in a real-number chromosome representation can be defined as

w_i(t) = w_i(t−1) ± random(0,1)

Montana and Davis defined a large number of domain-specific genetic operators incorporating many heuristics about training NNs (Montana and Davis, 1989). Another way of representing the chromosome of a feedforward NN is to regard the NN as a weighted digraph with no closed paths, described by an upper or lower diagonal adjacency matrix with real-valued elements. The nodes must be in a fixed order according to layers. The adjacency matrix is an N × N array whose elements satisfy

n_ij = 0 if 〈i, j〉 ∉ E, for all i ≤ j
n_ij ≠ 0 if 〈i, j〉 ∈ E, for all i ≤ j

where i, j = 1, 2, …, N, 〈i, j〉 is an ordered pair representing an edge (link) between neurons i and j, E is the set of all edges of the graph and N is the total number of neurons in the network. The biases of the network are represented by the diagonal elements of the matrix:

n_ij ≠ 0 for all i = j

Thus an adjacency matrix of a digraph can contain all the information about the connectivity, weights and biases of a network. For example, the adjacency matrix shown in Figure 6 describes a three-layered feedforward neural network with bias.
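A minimal sketch of the mutation operator above, applied gene by gene; the per-gene mutation probability is an assumed parameter, and the ± sign is chosen at random as the formula allows either direction:

```python
import random

def mutate(weights, rate=0.1):
    """Real-coded mutation: with probability `rate`, each gene becomes
    w_i(t) = w_i(t-1) +/- random(0,1), as in the formula in the text."""
    return [w + random.choice((-1.0, 1.0)) * random.random()
            if random.random() < rate else w
            for w in weights]

parent = [0.5, -1.2, 2.0, 0.0]
child = mutate(parent)   # most genes copied, a few perturbed by < 1.0
```

Crossover for this encoding can simply exchange slices of the real-valued lists, since each gene is a complete weight rather than part of a bit pattern.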
[Figure 6: Chromosome represented in matrix form. The network has input layer i (nodes 1-3), hidden layer j (nodes 4, 5) and output layer k (node 6); off-diagonal elements are the weights w_ij and w_jk, and diagonal elements are the biases θ. From node (rows) to node (columns) 1-6:

1: 0 0 0 .1 .3 0
2: 0 0 0 .2 .5 0
3: 0 0 0 .3 .4 0
4: 0 0 0 .4 0 .5
5: 0 0 0 0 .5 .5
6: 0 0 0 0 0 .6 ]

A layered feedforward network is one in which every path from an input node to an output node has the same length; thus an n-layered neural network has path length n. An added advantage of the matrix representation is that it can be used for recurrent networks as well. In this case the matrix will be a full matrix whose elements are the weights and biases, defined as

n_ij ≠ 0 if 〈i, j〉 ∈ E, for all i ≠ j (weights)
n_ij ≠ 0 for all i = j (biases)

GA for neural network architecture

It is well known that a NN's architecture has a significant impact on its information processing abilities. Unfortunately, there is no systematic way to design an optimal architecture for a particular task; architectures are mostly designed by experienced experts through trial and error. The optimal design can be viewed as a search problem in the design space according to some optimality criteria.
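The adjacency-matrix chromosome can be sketched in code as follows, using the six-node layered network and the weight/bias values that appear in Figure 6 (the node numbering and the dictionary extraction are illustrative assumptions):

```python
# Upper-diagonal adjacency matrix for a 6-neuron feedforward net:
# inputs 1-3, hidden 4-5, output 6. Off-diagonal entries are weights,
# diagonal entries are biases (values as read from Figure 6).
N = 6
A = [[0.0] * N for _ in range(N)]
A[0][3], A[0][4] = 0.1, 0.3              # input 1 -> hidden 4, 5
A[1][3], A[1][4] = 0.2, 0.5              # input 2 -> hidden 4, 5
A[2][3], A[2][4] = 0.3, 0.4              # input 3 -> hidden 4, 5
A[3][5], A[4][5] = 0.5, 0.5              # hidden 4, 5 -> output 6
A[3][3], A[4][4], A[5][5] = 0.4, 0.5, 0.6  # biases of nodes 4, 5, 6

# Recover the edge set E (with weights) and the biases from the matrix.
weights = {(i + 1, j + 1): A[i][j]
           for i in range(N) for j in range(N) if i != j and A[i][j] != 0}
biases = {i + 1: A[i][i] for i in range(N) if A[i][i] != 0}
print(sorted(weights))   # edges 〈i, j〉 of the digraph, all with i < j
```

Because the network is layered and acyclic, every stored edge satisfies i < j, which is exactly the upper-diagonal property described in the text.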
There are several characteristics of this surface that make EAs a good candidate for searching it:
- the surface is infinitely large, since the number of possible neurons and connections is unbounded;
- the surface is non-differentiable, since changes in the number of neurons or connections are discrete and can have a discontinuous effect on performance;
- the surface is complex and noisy, since the mapping from a NN's architecture to its performance after training is indirect, strongly epistatic and dependent on initial conditions;
- the surface is deceptive, since NNs with similar architectures may have dramatically different information processing abilities and performances;
- the surface is multimodal, since NNs with quite different architectures can have very similar capabilities.

Perhaps the most intuitively obvious way to combine EAs with neural networks is to evolve the architecture or topology, i.e. how many neurons to use and how to connect them, and then to apply a common training algorithm, e.g. the backpropagation algorithm, to tune the weights.

Chromosome representation: A key issue here is to decide how much information about the architecture should be encoded into the representation. At one end, all information about the architecture can be represented directly by binary strings; this is called a direct encoding scheme. At the other end, only the most important parameters or features of the architecture are represented, such as the number of nodes, the number of connections and the type of activation functions, and the remaining details are left to the learning process to decide; this is called an indirect encoding scheme. In a direct encoding scheme, a network can be represented by an N × N connectivity matrix C = (c_ij)_{N×N} that constrains the connections between the N neurons of the network, where c_ij = 1 indicates the presence of a connection from node i to node j and c_ij = 0 indicates its absence.
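The row-concatenation step of a direct encoding can be sketched as follows, using a hypothetical 3-node network (the example matrix is invented for illustration):

```python
def matrix_to_genotype(C):
    """Direct encoding: concatenate the rows of the N x N connectivity
    matrix into a single bit-string genotype."""
    return "".join(str(c) for row in C for c in row)

def genotype_to_matrix(g, n):
    """Inverse mapping: rebuild the connectivity matrix from the string."""
    return [[int(b) for b in g[i * n:(i + 1) * n]] for i in range(n)]

# Hypothetical 3-node net: node 1 feeds nodes 2 and 3, node 2 feeds 3.
C = [[0, 1, 1],
     [0, 0, 1],
     [0, 0, 0]]
g = matrix_to_genotype(C)
print(g)                          # "011001000"
assert genotype_to_matrix(g, 3) == C
```

Standard bit-level crossover and mutation then operate directly on the genotype string, and each offspring decodes back to a candidate architecture to be trained and evaluated.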
The connection matrix (adjacency matrix) is then converted to a bit-string genotype by concatenating its successive rows, as shown in Figure 7.

[Figure 7: Connectivity matrix and its bit-string genotype for a five-node network (input nodes 1 and 2, hidden nodes 3 and 4, output node 5). Each row holds a node's connection bits from nodes 1-5 plus a bias bit b:
node 1: 0 0 0 0 0 | 0
node 2: 0 0 0 0 0 | 0
node 3: 1 1 0 0 0 | 1
node 4: 1 1 0 0 0 | 1
node 5: 0 0 1 1 0 | 1
Concatenating the successive rows gives the genotype 000000 000000 110001 110001 001101.]

Competing conventions

This type of problem occurs when one structure in the evaluation space can be represented by very different chromosomes in the representation space. Standard crossover between two such chromosomes, which encode the same network under different conventions, will likely not produce useful offspring.

[Figure 8: Competing conventions — two networks that differ only by a permutation of their hidden nodes are encoded by the quite different chromosomes ABCDEF and DCBAFE.]

Note that the only difference between the phenotypes is the swapping of two hidden nodes, and such a permutation of the hidden nodes of a feedforward network does not alter the function it computes, so both networks exhibit the same fitness.

Example 5: An XOR gate is to be realised using the feedforward neural network shown in the figure below. A set of training data is also provided. Show how a genetic algorithm can be applied to train this network.

x1 x2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

[Figure: NN for an XOR gate, with nodes A-F.]

References
1. Montana, D.J. and Davis, L. (1989). Training Feedforward Neural Networks using Genetic Algorithms, Proceedings of the 11th International Joint Conference on Artificial Intelligence, pp. 762-767, San Mateo, CA: Morgan Kaufmann.
2. Whitley, D., Starkweather, T. and Bogart, C. (1990). Genetic Algorithms and Neural Networks: Optimizing Connections and Connectivity, Parallel Computing, Vol. 14, pp. 347-361.
3. Dasgupta, D. and McGregor, D.R. (1992). Designing Neural Networks using the Structured Genetic Algorithm, Proceedings of the International Conference on Artificial Neural Networks (ICANN), I. Aleksander and J.
Taylor (Eds), Elsevier Science Publ., Brighton, UK, pp. 263-268.
4. Yao, X. and Liu, Y. (1997). A New Evolutionary System for Evolving Artificial Neural Networks, IEEE Transactions on Neural Networks, Vol. 8, No. 3, pp. 694-713.
5. Siddique, M.N.H. and Tokhi, M.O. (2001). Training Neural Networks: Backpropagation vs Genetic Algorithms, Proceedings of the IEEE International Joint Conference on Neural Networks, Washington DC, USA, 14-19 July.

Evolutionary Neural Fuzzy Systems