Lecture 10: Hybrid Systems
Hybrid Systems: Synergies of Fuzzy, Neural and Evolutionary Computing
Evolutionary Fuzzy Systems
Although fuzzy logic systems have been successfully applied in many complex industrial
processes, they experience a deficiency in knowledge acquisition and rely to a great extent on
empirical and heuristic knowledge, which, in many cases, cannot be objectively elicited. One
of the most important considerations in designing fuzzy systems is construction of the
membership functions for each fuzzy set as well as the rule-base. In most existing
applications, the fuzzy rules are generated by an expert in the area, especially for the control
problems with only a few inputs. The correct choice of membership functions is by no means
trivial but plays a crucial role in the success of an application. Previously, generation of
membership functions had been a task mainly done either interactively, by trial and error, or
by human experts. With an increasing number of inputs and linguistic variables, the possible
number of rules for the system increases exponentially, which makes it difficult for experts to
define a complete set of rules and associated membership functions for a good system
performance. An automated way to design fuzzy systems might be preferable.
There are many ways to combat problems of this nature. The design of a fuzzy system can be
formulated as a search problem in a high-dimensional space where each point in the space
represents a rule set, membership function and the corresponding system performance, that is,
the performance of the system forms a hyper-surface in the space according to given
performance criteria. Thus, finding the optimal location of this hyper-surface is a search
problem, which is equivalent to developing the optimal fuzzy system design (Shi et al.,
1999).
These characteristics make evolutionary algorithms, specifically genetic algorithms, better
suited to searching the hyper-surface than many conventional methods such as hill-climbing
search. Efforts have been made to automate the construction of rule-bases
and define the membership functions in various ways using genetic algorithms. In most of the
cases, either the rule-base is fixed and the parameters of the membership functions are
adjusted or membership functions are fixed and genetic algorithms optimise the rule-base.
Some researchers have optimised the rule-base, the membership functions, scaling factors
and controller parameters, which seems somewhat redundant. A block diagram of the GA-fuzzy system (a controller) is shown in Figure 1.
Investigations involving several example applications demonstrated that EAs are capable of
optimising the membership functions as well as rule-bases of fuzzy logic controllers. In
general, the number of fuzzy rules increases exponentially with increasing number of input
variables or linguistic labels. Hence it is very difficult to determine and select which rules in
such a large rule space are the most suitable for controlling the process. Secondly, the
membership function plays an important role in determining the control action prescribed and
the performance of the system. In multivariable complex processes, the optimisation and
selection of membership functions will also be very difficult. There are different arguments
on whether the membership functions or rule-bases should be optimised. Based on the
research carried out in this area, these can be divided into the following categories:
membership function optimisation, rule-base optimisation and other parameters optimisation.
We will investigate how an EA can be applied to a fuzzy system. Let the initial MFs and rule
base of the fuzzy system in Figure 1 be defined as shown in Figures 2 and 3.
http://www.infm.ulst.ac.uk/~siddique
1
[Figure: feedback loop. The set point minus the plant output gives the error e, which enters the FLC (input MFs for fuzzification, rule-base, inference, output MFs for defuzzification) to produce the control signal u for the plant; an EA tunes the FLC using the accumulated error Σ|e| as its cost.]
Figure 1: EA-based optimisation of a FLC.
[Figure: four panels of triangular membership functions labelled nb, ns, zo, ps and pb: (a) error over about [-20, 20]; (b) change of error over about [-25, 25]; (c) sum of error over about [-36, 36]; (d) control input over about [-3, 3].]
Figure 2: Initial membership functions of inputs and output.
Figure 3 gives the rule-base as a decision table (rows: error; columns: change of error):

                   Change of error
   error     NB    NS    ZO    PS    PB
    NB       PB    PB    PB    PS    ZO
    NS       PB    PS    PS    ZO    NS
    ZO       PS    ZO    ZO    ZO    NS
    PS       PS    ZO    NS    NS    NB
    PB       ZO    NS    NB    NB    NB

Figure 3: Rule-base for a 2-input, 1-output system.
Chromosome Representation
One of the key issues in evolutionary design of fuzzy systems using GAs is the genotype
representation, i.e. information encoded into chromosomes. A fuzzy system is specified only
when the rules and membership functions associated with each fuzzy set are determined. This
can be done in three ways:
- chromosome representation for the membership functions,
- chromosome representation for the rule-base, and
- chromosome representation for both membership functions and rule-base together.
Chromosome Representation of Membership Functions
To translate membership functions to a representation useful as genetic material, the
functions are parameterised with one to four coefficients and each of these coefficients
constitutes a gene of the chromosome for genetic algorithms.
In fuzzy system design, one can frequently assume triangular membership functions for
which each membership function can be specified by just a few parameters. In the case of a
triangular membership function, it is determined by three parameters: left position, peak and
the right position. An overlapping (not more than 50%) of the fuzzy sets is desired to ensure a
good performance of the system. Therefore, the left and peak position of the next fuzzy set is
the same as the peak and right position of the previous fuzzy set as shown in Figure 4.
a1 a2 a3 a4 a5 a6 a7 | b1 b2 b3 b4 b5 b6 b7 | c1 c2 c3 c4 c5 c6 c7

Figure 4: Parameterised membership functions.
Seven parameters are needed to define five fuzzy sets for each input or output, that is, the five
membership functions with each having 3 parameters are
(a1, a2, a3), (a2, a3, a4), (a3, a4, a5), (a4, a5, a6) and (a5, a6, a7)
There are 21 parameters in total for all inputs and output. A reduction of the number of
parameters can be achieved by fixing the upper and lower limits of the universe of discourse
for each input and output as shown in Figure 5(a). Hence, the chromosome for the membership functions takes the form shown in Figure 5(b).
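The decoding step described above can be sketched in Python. This is a minimal illustration, not part of the lecture: the function and endpoint values are my own, and it assumes the reduced scheme in which only the five interior break points of each variable are encoded while the limits of the universe of discourse stay fixed.

```python
# Sketch (illustrative, not from the lecture): decoding one variable's five
# genes into five overlapping triangular MFs with fixed outer limits.

def decode_mfs(genes, lo, hi):
    """Turn 5 sorted interior break points plus fixed limits into 5 triangles.

    genes : list [a1..a5] inside (lo, hi)
    Returns a list of (left, peak, right) triples; the left and peak of each
    set coincide with the peak and right of the previous one (50% overlap).
    """
    pts = [lo] + list(genes) + [hi]          # a_min, a1..a5, a_max -> 7 points
    return [(pts[i], pts[i + 1], pts[i + 2]) for i in range(5)]

def tri(x, left, peak, right):
    """Membership grade of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Example: the 'error' input on [-20, 20] with evenly spaced genes.
mfs = decode_mfs([-10.0, -5.0, 0.0, 5.0, 10.0], -20.0, 20.0)
print(len(mfs))            # 5 fuzzy sets
print(tri(0.0, *mfs[2]))   # the middle set peaks at 0 -> grade 1.0
```

A GA would then treat the five interior points of each variable as real-valued genes, giving the 15-gene chromosome of Figure 5(b).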
[Figure: each universe of discourse has fixed outer limits (amin, amax), (bmin, bmax) and (cmin, cmax); only the interior points a1..a5, b1..b5 and c1..c5 are encoded.]
(a) Fixed upper and lower limits of the membership functions.

{a1, a2, a3, a4, a5 | b1, b2, b3, b4, b5 | c1, c2, c3, c4, c5}
 parameters of input 1 (error) | parameters of input 2 (change of error) | parameters of output

(b) Chromosome representation for MFs.
Figure 5: Reduced chromosome representation for MFs.
Example 1: Consider a fuzzy system described by the following MFs and rule-base, where x1 ≅ error and x2 ≅ change of error.
[Figure: triangular MFs A1 and A2 for x1 ≅ error (over roughly 1 to 6) and B1 and B2 for x2 ≅ Ch_error (over roughly 4 to 9), each pair crossing at grade 0.5.]
Membership functions for x1 and x2.
Table: Rule-base for the Sugeno-type FLC

                  x2 ≅ Ch_error
   x1 ≅ error     B2     B1
      A1          z1     z3
      A2          z2     z4
where z1 = a1x1 + b1x2 + 1, z2 = a2x1 + b2x2 + 1, z3 = a3x1 + b3x2, and z4 = a4x1 + b4.
Explain how you can apply GA to optimise the parameters of the fuzzy system. Develop a
chromosome representation for the MFs.
Chromosome Representation of Rule-base
GAs can be used to optimise the rule-base of a fuzzy system. The linguistic variables can be
represented by integer values, for example 0 for NB, 1 for NS, 2 for ZO, 3 for PS and 4 for
PB. Applying this code to the fuzzy rule-base shown in Figure 3, the encoded rule-base
shown in Figure 6 is obtained. A chromosome is thus obtained from the decision table by
going row-wise and coding each output fuzzy set as an integer in {0, 1, …, n}, where n is the
maximum integer used to label the membership functions defined for the output variable of
the fuzzy system. In this case, n = 4 as shown in Figure 7.
Encoding: NB→0, NS→1, ZO→2, PS→3, PB→4

                   Change of error
   error     NB    NS    ZO    PS    PB
    NB        4     4     4     3     2
    NS        4     3     3     2     1
    ZO        3     2     2     2     1
    PS        3     2     1     1     0
    PB        2     1     0     0     0

Figure 6: Encoding of the rule-base.
{4 4 4 3 2 | 4 3 3 2 1 | 3 2 2 2 1 | 3 2 1 1 0 | 2 1 0 0 0}
  1st row    2nd row     3rd row     4th row     5th row

Figure 7: Chromosome representation of the rule-base.
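The row-wise encoding described above is easy to implement. The following sketch (my own, for illustration) maps the five linguistic labels to integers and flattens the decision table into a chromosome, reproducing the string of Figure 7.

```python
# Sketch: row-wise integer encoding of the 5x5 decision table,
# using the code NB=0, NS=1, ZO=2, PS=3, PB=4.

LABELS = ["NB", "NS", "ZO", "PS", "PB"]
CODE = {name: i for i, name in enumerate(LABELS)}

def encode_rule_base(table):
    """Flatten a list-of-rows decision table into an integer chromosome."""
    return [CODE[cell] for row in table for cell in row]

def decode_rule_base(chrom, n_cols=5):
    """Rebuild the decision table from a flat chromosome."""
    rows = [chrom[i:i + n_cols] for i in range(0, len(chrom), n_cols)]
    return [[LABELS[g] for g in row] for row in rows]

rule_table = [
    ["PB", "PB", "PB", "PS", "ZO"],   # error = NB
    ["PB", "PS", "PS", "ZO", "NS"],   # error = NS
    ["PS", "ZO", "ZO", "ZO", "NS"],   # error = ZO
    ["PS", "ZO", "NS", "NS", "NB"],   # error = PS
    ["ZO", "NS", "NB", "NB", "NB"],   # error = PB
]
chrom = encode_rule_base(rule_table)
print(chrom[:5])   # first row -> [4, 4, 4, 3, 2]
assert decode_rule_base(chrom) == rule_table
```

Decoding is the exact inverse, so a GA can evolve the integer string and recover a valid decision table at any time.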
Example 2: Show a chromosome representation of the rule-base for the problem in Example 1.
Chromosome Representation of both MFs and Rule-base
Chromosome representation for both MFs and rule-base together is straightforward:
concatenating the chromosome strings for the MFs and the rule-base gives a simple combined
representation. Such a representation is illustrated in Figure 8.
         MFs part                                          Rule-base part
{a1 a2 a3 a4 a5 | b1 b2 b3 b4 b5 | c1 c2 c3 c4 c5}{4 4 4 3 2 | 4 3 3 2 1 | 3 2 2 2 1 | 3 2 1 1 0 | 2 1 0 0 0}
    input 1        input 2          output          1st row    2nd row     3rd row     4th row     5th row

Figure 8: Chromosome representation of MFs and rule-base.
There are two different mutation operators, one for each part of the chromosome string. A
gene in the membership-function part of the chromosome is replaced by a new real value,
whereas a gene in the rule-base part is only shifted up or down one level of its integer value,
to avoid a possible large deterioration in performance.
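The two operators can be sketched as follows. This is an illustrative implementation, not the lecture's own code; the function names and parameter ranges are my choices.

```python
# Sketch: the two mutation operators for the combined chromosome, assuming a
# real-valued MF part and an integer rule-base part.
import random

def mutate_mf_gene(genes, i, lo, hi):
    """Replace one MF gene with a fresh real value from its allowed range."""
    out = list(genes)
    out[i] = random.uniform(lo, hi)
    return out

def mutate_rule_gene(rules, i, n_labels=5):
    """Shift one rule consequent up or down a single level, staying valid."""
    out = list(rules)
    step = random.choice([-1, 1])
    out[i] = min(n_labels - 1, max(0, out[i] + step))  # clip to 0..n_labels-1
    return out

random.seed(0)
rules = [4, 4, 4, 3, 2]
mutated = mutate_rule_gene(rules, 0)   # gene 0 moves to 3 or is clipped at 4
print(all(0 <= g <= 4 for g in mutated))
```

Clipping at the ends of the label range keeps every mutated gene a valid linguistic code, which is exactly the property the integer coding is meant to guarantee.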
A problem with binary coding is encountered in the chromosome representation of the
rule-base. When mutation is applied to a linguistic code of the rule-base, it should alter it to
another valid linguistic code, restricted to a shift of one level up or down. This is illustrated
in Figure 9. It requires 3 bits to represent the integer values 0 to 4 for the five linguistic
variables. Mutating a single bit can change the value to 5, which is not a valid linguistic
code at all and can thus cause a large deterioration in performance. Such a big jump in value
caused by mutation is difficult to control in binary coding. Therefore, an integer-valued
coding is suggested for the chromosome representation of the rule-base.
rj → 0 0 1 (= 1)  --mutation of one bit-->  1 0 1 (= 5, invalid mutated value)

Figure 9: Problem in rule-base mutation using binary coding.
Example 3: Show a chromosome representation of the MFs and rule-base for the problem in Example 1.
Objective Function
Finding a good fitness measure is quite important for evolving practical systems using ECs.
Unlike traditional gradient-based methods, ECs can be used to evolve systems with any kind
of fitness function, including those that are non-differentiable or discontinuous. How to
define the fitness function for a system to be evolved is problem dependent.
The procedure for evaluating the knowledge base, i.e. the membership functions and
rule-base, consists of submitting it to a simulation model or the real system and returning an
assessment value according to a given cost function J subject to minimisation. In many cases
J is determined as a summation over time of some instantaneous cost rate. As an example, a
trial knowledge base can be made to control the model of a process, and the errors summed
over the response trajectory. The sum of errors is then directly related to the objective fitness
of the trial. The fitness of a trial is a measure of the overall worth of a solution, which takes
into account the factors of an objective criterion, in this case the performance of a fuzzy
system implemented with the trial knowledge base. The objective is simply stated as the
ability to follow a set point with minimal error. This objective can thus be expressed in
terms of minimising the system performance indices in common use. These include the
integral of absolute error (IAE), integral of square error (ISE) and integral of time-weighted
absolute error (ITAE).
Assume a system with multiple inputs and outputs whose overall design effectiveness can be
measured by just one output of the overall system, such as the error. All the membership
functions and the rule-base can then be expressed by a list of m parameters (the number of
membership-function and rule parameters), (p1, p2, …, pm) = p, where each parameter takes
only a finite set of values. In the case of IAE, the cost is specified by the function:
J(p) = Σ_{k=1}^{n} |e(k)|                          (1)

In the case of ISE, it is defined as

J(p) = Σ_{k=1}^{n} e(k)²                           (2)

In the case of ITAE, it is defined as

J(p) = Σ_{k=1}^{n} k·∆t·|e(k)|                     (3)
where e(k) is the output error of the system and n is some reasonable number of time steps by
which the system can be assumed to have settled quite close to the set point. Obviously the
objective is to minimise J(p) with respect to p.
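The three indices (1)-(3) are one-liners over a sampled error sequence. The sketch below is illustrative (function names are mine); it assumes a sampling interval ∆t and the time-weighted form of ITAE given above.

```python
# Sketch of the performance indices (1)-(3) for a sampled error sequence
# e(1..n); dt is the sampling interval.

def iae(errors):
    """Integral (sum) of absolute error."""
    return sum(abs(e) for e in errors)

def ise(errors):
    """Integral (sum) of squared error."""
    return sum(e * e for e in errors)

def itae(errors, dt=1.0):
    """Time-weighted absolute error: later errors are penalised more."""
    return sum(k * dt * abs(e) for k, e in enumerate(errors, start=1))

e = [2.0, -1.0, 0.5]
print(iae(e))    # 3.5
print(ise(e))    # 5.25
print(itae(e))   # 1*2 + 2*1 + 3*0.5 = 5.5
```

Any of the three can serve as the cost J(p); ITAE tends to favour solutions that settle quickly, since residual error late in the trajectory is weighted heavily.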
Evaluation
The practical implementation problem is how to evaluate each chromosome in the
population. In this case the fuzzy system is applied to the plant once for each individual of
the population, its performance is evaluated by calculating the sum of absolute errors, and
this value is assigned as the individual's fitness. The time taken to evaluate the genetic
structures, especially in the case of a fuzzy system or controller, imposes restrictions on the
size of the population and also on the number of generations for which the GA can be run to
reach a final solution.
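The evaluation step can be sketched as follows. This is an illustrative skeleton: `decode_controller` and `plant_step` are hypothetical placeholders for the application-specific parts, and the toy example below uses a plain proportional controller and a first-order plant purely to show the loop structure.

```python
# Sketch of chromosome evaluation: decode the chromosome into a controller,
# run it against a plant model, and score it by the sum of absolute errors.

def evaluate(chromosome, set_point, n_steps, decode_controller, plant_step):
    controller = decode_controller(chromosome)
    y, cost = 0.0, 0.0
    for _ in range(n_steps):
        e = set_point - y
        cost += abs(e)              # IAE-style fitness (lower is better)
        u = controller(e)           # control action from the trial controller
        y = plant_step(y, u)        # advance the plant model one step
    return cost

# Toy example: proportional controller and first-order plant (illustrative).
cost = evaluate(
    chromosome=[0.5],
    set_point=1.0,
    n_steps=50,
    decode_controller=lambda c: (lambda e: c[0] * e),
    plant_step=lambda y, u: y + 0.1 * (u - y),
)
print(0.0 < cost < 50.0)   # cost accumulates while the output settles
```

A GA would call `evaluate` once per individual per generation, which is why simulation time dominates the cost of running the GA.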
References:
1. Shi, Y., Eberhart, R. and Chen, Y. (1999). Implementation of evolutionary fuzzy systems,
IEEE Transactions on Fuzzy Systems, vol. 7, pp. 109-119.
2. Ishibuchi, H., Nozaki, K., Yamamoto, N. and Tanaka, H. (1995). Selecting fuzzy if-then
rules for classification problems using genetic algorithms, IEEE Transactions on Fuzzy
Systems, vol. 3, pp. 260-270.
3. Chin, T.C. and Qi, X.M (1997). Genetic algorithms for learning the rule base of fuzzy
logic controller, Fuzzy Sets and Systems, vol. 97, pp.1-7.
4. Karr, C.L. and Gentry, E.J. (1993). Fuzzy Control of pH using Genetic Algorithms, IEEE
Trans. On Fuzzy Systems, vol. 1, No. 1, pp. 46-53.
5. Huang, Y.-P. and Huang, C.-H. (1997). Real-valued genetic algorithms for fuzzy grey
prediction system, Fuzzy Sets and Systems, Vol. 87, No. 3, pp. 265-276.
6. Homaifar, A. and McCormick, Ed. (1995). Simultaneous design of membership functions
and rule sets for fuzzy controllers using genetic algorithms, IEEE Trans. on Fuzzy
Systems, Vol. 3 No. 2, pp. 129-139.
7. Qi, X.M and Chin, T.C. (1997). Genetic algorithms based fuzzy controller for higher
order systems, Fuzzy Sets and Systems, vol. 91, pp. 279-284.
8. Cho, H.-J, Cho, K.-B and Wang, B.-H. (1997). Fuzzy-PID hybrid control: Automatic rule
generation using genetic algorithms, Fuzzy Sets and Systems, Vol. 92, pp. 305-316.
9. Siarry, P. and Guely, F. (1998). A genetic algorithm for optimizing Takagi-Sugeno fuzzy
rule bases, Fuzzy Sets and Systems, Vol. 99, pp. 37-47.
10. Linkens, D.A and Nyongesa, H.O. (1995a). Genetic algorithms for fuzzy control, Part 1:
Offline system development and application, IEE Proceedings of Control Theory and
Application, vol. 142, No. 3, pp.161-176.
11. Linkens, D.A and Nyongesa, H.O. (1995b). Genetic algorithms for fuzzy control, Part 2:
Online system development and application, IEE Proceedings of Control Theory and
Application, vol. 142, No. 3, pp.177-185.
12. Karr, C.L. (1991). Design of an adaptive fuzzy logic controller using a genetic algorithm,
Proceedings of the 4th International Conference on Genetic Algorithms, Morgan
Kaufmann Publishers, San Mateo, CA, pp. 450-457.
13. Park, Y. J., Cho, H.S. and Cha, D.H. (1995). Genetic algorithm-based optimization of
fuzzy logic controller using characteristic parameters, IEEE International Conference on
Evolutionary Computation, Perth, Western Australia, Nov 29 - Dec 1, vol. 2, pp. 831-836.
14. Markrehchi, M. (1995). Application of genetic algorithms in fuzzy rules generation, IEEE
International Conference on Evolutionary Computation, Perth, Western Australia, Nov
29 - Dec 1, vol. 2, pp. 251-256.
Neuro-Fuzzy Systems
Features of Neural and Fuzzy Systems
Fuzzy systems and neural networks, both model-free systems, have their own advantages
and drawbacks. One way of combining them, popularly known as fuzzy neural networks,
seeks to maximise the desirable properties and reduce the disadvantages of both systems.
Subjective phenomena such as reasoning and perception are often regarded as beyond the
domain of conventional neural network theory. It is interesting to note that fuzzy logic is
another powerful tool for modelling the uncertainties associated with human cognition,
thinking and perception. Paradigms based upon this integration are believed to have
considerable potential in control systems, adaptive systems and autonomous systems.
Neural networks
- No mathematical model required
- No rule-base required
- Different learning algorithms available
- Black box
- Rules cannot be extracted
- Capable of learning from experiential data

Fuzzy systems
- No mathematical model required
- Prior rule-base can be used
- Simple interpretation and implementation
- Rules must be available
- No formal methods for tuning
- Capable of working without much a priori information
Neuro-Fuzzy systems
A neuro-fuzzy system finds the parameters of a fuzzy system by means of learning methods
taken from neural networks. The most important reason for combining fuzzy systems with
neural networks is the learning capability of the latter. Such a combination should be able to
learn linguistic rules and/or membership functions, or to optimise existing ones. Learning in
this case means
- creating a rule-base,
- adjusting membership functions from scratch, and
- determining other system parameters.
Types of Neuro-Fuzzy systems
In general, two kinds of combinations between neural networks and fuzzy systems are
distinguished:
- cooperative neuro-fuzzy systems, and
- hybrid neuro-fuzzy systems.
Cooperative neuro-fuzzy systems
The combination lies in the determination of certain parameters of a fuzzy system
(mentioned above) by a neural network, and vice versa, where the neural network and the
fuzzy system work independently of each other.
Fuzzy-NN cooperative systems
In this cooperation, the fuzzy system translates linguistic statements into suitable perceptions
in the form of input data for a neural network.

[Figure: linguistic statements enter a fuzzy inference system, whose outputs serve as perceptions/inputs to a neural network with its learning algorithm; the network produces the decisions.]
Figure 1: Cooperative Fuzzy-Neural System.
NN-Fuzzy cooperative systems
In this cooperation, a neural network determines the membership functions from training
data. This can be done by determining suitable parameters or by approximating the
membership functions with neural networks, as shown in Figure 2.

[Figure: experiential data trains a neural network, which supplies the MFs to a fuzzy inference system with a given rule-base; the fuzzy system produces decisions/perceptions as output.]
Figure 2: Learning fuzzy sets.
Alternatively, a neural network determines the fuzzy rules from training data. A clustering
approach is usually applied and the neural network learns offline; such a neuro-fuzzy system
is shown in Figure 3.
[Figure: experiential data trains a neural network, which supplies the rule-base to a fuzzy inference system with given MFs; the fuzzy system produces decisions/perceptions as output.]
Figure 3: Learning fuzzy rules.
A neural network can also determine parameters online, i.e. during the use of the fuzzy
system, to adapt the membership functions, and it can learn the weights of the rules online or
offline; such a neuro-fuzzy system is shown in Figure 4.
[Figure: training data and the error determined from the fuzzy system's output drive a neural network, which updates the rule weights and/or MF parameters of a fuzzy system initialised with fuzzy rules and initial MFs.]
Figure 4: Learning fuzzy rule weights.
Hybrid neuro-fuzzy systems
The idea of a hybrid approach is to interpret a fuzzy system in terms of a neural network. The
strategy adopted here with a neuro-fuzzy system is to find the parameters of a fuzzy system
by means of learning methods obtained from neural networks. A common way to apply a
learning algorithm to a fuzzy system is to represent it in a special neural-network-like
architecture. Then a learning algorithm, such as backpropagation, can be used to train the
system. An adaptive neuro-fuzzy system with two inputs and one output is shown in Figure 5.
This is described as follows:
[Figure: a five-layer network. Inputs x1 and x2 feed membership nodes A1, A2 and B1, B2; rule nodes r1-r4 compute the firing strengths wi; N nodes normalise them to the w̄i; the next layer forms the products w̄i·fi; a final Σ node produces the output Y.]
Figure 5: Hybrid neuro-fuzzy system.
Layer 1: Every node i in this layer is an adaptive node with a triangular membership
function, where x1 and x2 are the angle error and change of error. These nodes calculate the
membership grades of the inputs:
O1j = µAj(x1)  and  O1j = µBj(x2),  where j = 1, 2        (1)
Layer 2: Every node in this layer is a fixed node representing one of the 4 rules labelled
r1…r4. Each node determines the firing strength of a rule as
wi = µAj(x1) · µBj(x2),  i = 1, 2, 3, 4;  j = 1, 2        (2)
Layer 3: Every node in this layer is a fixed node labelled N. Each node calculates the
normalized firing strength;
w̄i = wi / Σ_{i=1}^{4} wi,  i = 1, 2, …, 4        (3)
Layer 4: Every node in this layer is an adaptive node with a linear function defined by
fi = ai·x1 + bi·x2 + ci,  i = 1, 2, …, 4        (4)
where ai, bi and ci, i=1,2,…,4 are the parameters of the consequent part of the rule base. Each
node calculates the weighted value of the consequent part of each rule as
w̄i·fi = w̄i(ai·x1 + bi·x2 + ci),  i = 1, 2, …, 4        (5)
Layer 5: The single node in this layer produces the control output by aggregating all the fired
rule values;
Y = Σ_i w̄i·fi,  i = 1, 2, …, 4        (6)
Thus an adaptive network has been created that is functionally equivalent to a Sugeno-type
fuzzy model. The extension from a Sugeno-type neuro-fuzzy system to a Tsukamoto-type
one is straightforward. For a Mamdani-type inference system with max-min composition, a
corresponding adaptive system can be constructed if discrete approximations are used to
replace the integrals in the centroid (or other) defuzzification scheme.
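One forward pass through the five layers can be sketched as follows. This is an illustrative implementation: the MF parameters and consequent coefficients are invented, and the four rules are formed by pairing each A set with each B set, consistent with eq. (2).

```python
# Sketch of a forward pass through the 2-input, 4-rule Sugeno network of
# Figure 5. All parameter values below are illustrative.

def tri(x, l, p, r):
    """Triangular membership grade."""
    if x <= l or x >= r:
        return 0.0
    return (x - l) / (p - l) if x <= p else (r - x) / (r - p)

def sugeno_forward(x1, x2, mfs_a, mfs_b, consequents):
    # Layer 1: membership grades of both inputs
    mu_a = [tri(x1, *m) for m in mfs_a]
    mu_b = [tri(x2, *m) for m in mfs_b]
    # Layer 2: firing strengths w_i = mu_A(x1) * mu_B(x2) for each A-B pair
    w = [ma * mb for ma in mu_a for mb in mu_b]
    # Layer 3: normalised firing strengths
    total = sum(w) or 1.0
    wbar = [wi / total for wi in w]
    # Layer 4: rule consequents f_i = a_i x1 + b_i x2 + c_i
    f = [a * x1 + b * x2 + c for (a, b, c) in consequents]
    # Layer 5: weighted aggregation Y = sum of wbar_i * f_i
    return sum(wb * fi for wb, fi in zip(wbar, f))

mfs_a = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]   # A1, A2
mfs_b = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0)]   # B1, B2
consequents = [(1.0, 0.0, 0.0)] * 4            # every rule: f = x1
print(sugeno_forward(0.5, 0.5, mfs_a, mfs_b, consequents))  # 0.5
```

Because every layer is a simple differentiable (or piecewise differentiable) function, gradients can be backpropagated through this structure to adapt the MF and consequent parameters, which is the point of the hybrid architecture.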
Example 1: A Mamdani-type fuzzy system is described by the following MFs and rule-base.
[Figure: triangular MFs L, M and H for error (around 50-55), L, M and H for change of error over about [-5, +5], and L, LM, M, HM and H for torque over [0, 10].]
Membership functions for error, change of error and torque.
Rule-base for the Mamdani-type fuzzy system

                Change of error
   error     H     M     L
     H      LM    HM     H
     M      LM     M    HM
     L       L    LM    HM
Develop a neuro-fuzzy system that is equivalent to the Mamdani-type fuzzy system.
Evolutionary Neural Networks
Neuro-Evolutionary Systems
Evolutionary neural network systems mainly means the design and training of neural
networks by evolutionary algorithms. The most popular and widely used training procedure
for neural networks, the backpropagation algorithm, suffers from a number of problems:
- its speed and robustness are sensitive to several of its control parameters, such as the
  number of hidden layers and the initialisation;
- the best parameters to use seem to vary from problem to problem;
- there is no known approach for specifying an appropriate architecture for a new problem;
- it is slow for large problems;
- it very often gets stuck in local minima or on a plateau.
Several researchers have begun to investigate robust methods for overcoming these kinds of
problems. One such method may be the EA. Interest in combinations of neural networks and
evolutionary search procedures has grown rapidly in recent years. There are several
arguments in favour of applying EAs to NN optimisation (weights and/or topology), as an
EA has the potential to perform a global search of the parameter space and thereby to avoid
local minima. It is also advantageous to apply EAs to problems where gradient information is
difficult or costly to obtain. This implies that EAs can potentially be applied to reinforcement
learning problems with sparse feedback and to training NNs with non-differentiable neurons.
The only obvious disadvantage of EAs is their slow time scale.
Various schemes for combining Evolutionary Algorithm and Neural Networks have been
proposed and tested by many researchers in the last decade, but the literature is scattered
among a variety of journals, proceedings and technical reports. Mainly two types of
combinations have been reported in the literature so far:
- supportive, i.e. they are used sequentially, and
- collaborative, i.e. they are used simultaneously.
Supportive Combinations
Supportive combinations typically involve the use of one of these methods to prepare data
for consumption by the other. The supportive mechanism can work in two ways: neural
networks assisting the evolutionary algorithm (NN-EA) and evolutionary algorithms
assisting neural networks (EA-NN). In the case of NN-EA, the idea is that there are natural
groupings among the problems and that certain sets of heuristics make better starting points
for some groups than for others. The neural network's job is to learn this grouping and
suggest starting points for the evolutionary algorithm. The neural networks are mostly
pattern associators, matching the description of the incoming problem with a good parameter
set, and are trained using backpropagation. The diagram shown in Figure 1 illustrates this
supportive combination of NN and EA.
[Figure: raw data describing the problem enters a neural network, which outputs a set of selection heuristics or parameters used to create the initial population for the evolutionary algorithm.]
Figure 1: Supportive combination of NN-EA.
In the case of EA-NN, the supportive mechanism can be divided into three categories
according to what stage they are used in the process.
- EA to select input features or to transform the feature space used by a NN classifier,
- EA to select the learning rules or parameters that control learning in a NN, and
- EA to analyse a NN.
Feature selection: Often the key to getting good results with a pattern classifier lies as much
in how the data are presented as in the classifier itself. EAs have been used to prepare data in
two ways: transforming the feature space and selecting a subset of relevant features. The first
approach, transforming the feature space, has mainly been applied to nearest-neighbour type
algorithms. For example, by letting the EA choose the rotation and scaling parameters, the
data are aligned in a manner such that intra-class differences are diminished and inter-class
differences are magnified. In the second approach, a subset of the input features is chosen,
since a restricted feature set can improve the performance of a neural network classifier as
well as reduce the computation requirements. The main drawback of this approach is the
high computation time required to train a network classifier for each feature subset specified
by a chromosome.
Learning the learning rules: Backpropagation implements a gradient descent method, which
has the drawbacks of being slow for large problems and susceptible to becoming stuck in a
local minimum or on a plateau; it can also converge prematurely or take too long to converge
owing to the heuristic selection of control parameters such as the learning rate (η),
momentum (α) and acceleration (β) shown below.
∆wi(t) = −η ∂E/∂wi + α·∆wi(t−1) + β·∆wi(t−2)        (1)
Backpropagation's speed and robustness are sensitive to several of its control parameters,
and the best parameters to use seem to vary from problem to problem. Mostly these control
parameters are tuned by trial and error. Several researchers have used EAs to learn the
control parameters of a NN (Harp et al., 1989; Belew et al., 1990).
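The scheme of letting an EA search for backpropagation's control parameters can be sketched as follows. This is purely illustrative: `toy_train` is a hypothetical stand-in for a real training run, and the parameter ranges are my own assumptions.

```python
# Sketch: encode (eta, alpha, beta) as a chromosome and score each setting
# by the error a training run achieves with it.
import random

def evaluate_params(chrom, train):
    eta, alpha, beta = chrom          # learning rate, momentum, acceleration
    return train(eta, alpha, beta)    # lower final error = fitter

def random_params():
    return [random.uniform(0.01, 1.0),   # eta
            random.uniform(0.0, 0.9),    # alpha
            random.uniform(0.0, 0.5)]    # beta

random.seed(1)
# Stand-in 'training' surface: error is smallest near eta = 0.3 (toy only).
toy_train = lambda eta, alpha, beta: (eta - 0.3) ** 2 + 0.01 * alpha + 0.01 * beta
pop = [random_params() for _ in range(20)]
best = min(pop, key=lambda c: evaluate_params(c, toy_train))
print("best eta:", round(best[0], 3))   # typically close to 0.3 on this surface
```

In a real NN-EA combination, each fitness evaluation is itself a (short) training run, which is why this approach is expensive but still attractive when the parameter surface is unknown.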
Explain and analyse neural networks: One of the barriers to acceptance of NNs is the lack of
an explanation facility similar to those available in most expert systems. Instead of using an
EA to build a better NN, a few researchers have used EAs to help explain or analyse a NN.
In order to explore the 'decision surface' of a NN, an EA can be used to discover input
patterns that result in maximum or nearly maximum activation values for a given output
neuron. The input patterns are represented in the chromosome by a set of real values between
0.0 and 1.0. The EA is used to discover three different types of vectors: (i) maximum
activation vectors, meaning the output node is activated; (ii) minimum activation vectors,
meaning the output node is off; and (iii) decision vectors, meaning the output node is at the
decision threshold. Multiple runs of the EA with different random seeds can be used to find a
set of vectors of each type.
[Figure: features and learning parameters pass from the neural network side to the evolutionary algorithm, which returns transformed features, a learning rule, or a network explanation; the neural network performs the classification, recognition or approximation task.]
Such a supportive combination is shown in Figure 2.
Figure 2: Supportive combination of EA-NN
Collaborative Combinations
In collaborative combinations, EA and NN function together to solve problems. Among
collaborative approaches, there are two main groupings
- There have been attempts to use evolutionary search to find appropriate connection
  weights in fixed architectures, shown in Figure 3.
- Alternatively, EAs have been used to find network architectures (topologies), which are
  then trained and evaluated using some learning procedure (e.g. backpropagation), shown
  in Figure 4.
Supervised learning in NNs has mostly been formulated as a weight-training process in
which an effort is made to find an optimal set of connection weights according to some
optimality criterion. A global search procedure like an EA can be used effectively in the
training process as an evolution of the connection weights towards an optimal set defined by
a fitness function. The fitness can be defined as the minimum of the sum squared error (SSE)
or mean squared error (MSE) over a set of training data.
f(N) = Σ_P e²        (2)

f(N) = (1/P) Σ_P e²        (3)

where N denotes the network and P is the number of training patterns.
[Figure 3: an EA receives the fitness Σe² computed from the training data and targets, and proposes weight changes ∆w for a fixed network. Figure 4: an EA receives the fitness and proposes the connectivity (architecture), which is then trained on the data.]
Figure 3: Learning weights.  Figure 4: Learning architecture.
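The fitness computation of eqs. (2)-(3) can be sketched as follows. This is illustrative only: a single sigmoid neuron stands in for the network N, and the OR-like data set is invented for the example.

```python
# Sketch of MSE fitness (eq. 3) for evolving the weights of a tiny network:
# a chromosome is the full weight vector [w1, w2, bias] of one sigmoid neuron.
import math

def predict(weights, x):
    s = weights[0] * x[0] + weights[1] * x[1] + weights[2]
    return 1.0 / (1.0 + math.exp(-s))   # sigmoid activation

def mse_fitness(weights, data):
    """Mean squared error over the training set; lower is fitter."""
    errs = [(target - predict(weights, x)) ** 2 for x, target in data]
    return sum(errs) / len(errs)

# Toy OR-like data set (illustrative)
data = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 1.0)]
good = [6.0, 6.0, -3.0]   # weights that roughly realise OR
bad = [0.0, 0.0, 0.0]     # all-zero weights: output is always 0.5
print(mse_fitness(good, data) < mse_fitness(bad, data))  # True
```

An EA would evolve a population of such weight vectors, selecting on `mse_fitness`, with no gradient information required.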
Chromosome representation
The most convenient and straightforward chromosome representation of the connection
weights and biases is string form. In such a representation scheme, each connection weight
and bias is represented by a number of binary bits of a certain length. An example of such a
string representation for a feedforward NN with 5 neurons is shown in Figure 5.
Figure 5: Chromosome represented in string form (the weights w1…w6 and biases b1…b3 of a five-neuron feed-forward network are concatenated as w1 w2 w3 w4 w5 w6 b1 b2 b3)
The binary encoding of the connection weights need not be uniform, as adopted by many researchers; it can also be Gray, exponential or more sophisticated. A limitation of binary representation is the precision of the discretised connection weights. If too few bits are used to represent the weights, training may take an extremely long time or even fail. On the other hand, if too many bits are used, the chromosome strings for large NNs become very long, which prolongs the evolution dramatically and makes it impractical. How to optimise the number of bits for each connection weight, the range encoded and the encoding scheme used is still an open issue. A dynamic encoding scheme can be adopted to alleviate these problems. To overcome these shortcomings of the binary representation scheme, real numbers were proposed, i.e. one real number per connection weight. The chromosome is then represented by concatenating these numbers into a string as shown in Figure 5. The advantages are many-fold, such as a shorter string length with increased precision. Various kinds of crossover and adaptive crossover are applicable here. The standard mutation operation for binary strings cannot be applied directly in the real representation scheme. In such circumstances, an important task is to carefully design a set of genetic operators suitable for the real encoding scheme. For example, mutation in a real-number chromosome representation can be defined as
w_i(t) = w_i(t − 1) ± random(0, 1)
Montana and Davis defined a large number of domain-specific genetic operators incorporating many heuristics about training NNs (Montana and Davis, 1989).
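A minimal sketch of such a real-coded GA for weight training, using the mutation rule w_i(t) = w_i(t − 1) ± random(0, 1) quoted above. The elitist selection, one-point crossover, population size and mutation rate are all illustrative assumptions, not operators from Montana and Davis.

```python
import random

def mutate(chrom, rate=0.1):
    """Apply w_i(t) = w_i(t-1) +/- random(0,1) to each gene with given rate."""
    return [w + random.choice((-1, 1)) * random.random()
            if random.random() < rate else w
            for w in chrom]

def crossover(a, b):
    """One-point crossover of two real-valued chromosomes."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(fitness, n_weights, pop_size=20, generations=100):
    """Minimise fitness(chromosome); elitist generational loop (a sketch)."""
    pop = [[random.uniform(-1, 1) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)              # lower error = fitter
        elite = pop[: pop_size // 2]       # keep the best half
        pop = elite + [mutate(crossover(random.choice(elite),
                                        random.choice(elite)))
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=fitness)
```

Here `fitness` would be the SSE or MSE of the network decoded from the chromosome, as in Eqs. (2) and (3).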
Another way of representing the chromosome for a feed-forward NN is to regard the network as a weighted digraph with no closed paths, described by an upper- or lower-diagonal adjacency matrix with real-valued elements. The nodes should be in a fixed order according to layers. An adjacency matrix is an N × N array whose elements satisfy
n_ij = 0 if 〈i, j〉 ∉ E, for all i ≤ j
n_ij ≠ 0 if 〈i, j〉 ∈ E, for all i ≤ j
where i, j = 1, 2, …, N, 〈i, j〉 is an ordered pair representing an edge or link between neurons i and j, E is the set of all edges of the graph and N is the total number of neurons in the network. The biases of the network are represented by the diagonal elements of the matrix, expressed as
n_ii ≠ 0 for all i = j
Thus an adjacency matrix of a digraph can contain all the information about the connectivity, weights and biases of a network. For example, the adjacency matrix shown in Figure 6 describes a three-layered feed-forward neural network with bias.
Figure 6: Chromosome represented in matrix form (a three-layered feed-forward network with input layer i, hidden layer j and output layer k; the weights w_ij and w_jk occupy the upper-diagonal entries of the adjacency matrix and the biases θ occupy the diagonal)
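A sketch of this matrix representation for a layered network like that of Figure 6 (three inputs, two hidden nodes, one output): the off-diagonal upper-triangle entries hold the weights and the diagonal holds the biases. The particular weight and bias values below are illustrative assumptions.

```python
N = 6  # total neurons: inputs 1-3 (indices 0-2), hidden 4-5, output 6
A = [[0.0] * N for _ in range(N)]

# weights w_ij from the input layer to the two hidden nodes (illustrative)
for i, (w_to_4, w_to_5) in enumerate([(0.1, 0.3), (0.2, 0.5), (0.3, 0.4)]):
    A[i][3], A[i][4] = w_to_4, w_to_5

A[3][5], A[4][5] = 0.5, 0.6                 # weights w_jk, hidden -> output
A[3][3], A[4][4], A[5][5] = 0.4, 0.5, 0.6   # biases on the diagonal (n_ii != 0)

# the chromosome is the flattened upper triangle, including the diagonal:
# 6 + 5 + 4 + 3 + 2 + 1 = 21 genes for a 6-neuron network
chromosome = [A[i][j] for i in range(N) for j in range(i, N)]
```

Because the network is feed-forward with nodes ordered by layer, the lower triangle is identically zero and need not be encoded.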
A layered feed-forward network is one in which every path from an input node to an output node has the same length; thus an n-layered neural network has path length n. An added advantage of the matrix representation is that it can be used for recurrent networks as well. In this case the matrix is full, with the weights and biases as elements defined by
n_ij ≠ 0 if 〈i, j〉 ∈ E, for all i ≠ j (weights)
n_ii ≠ 0 for all i = j (biases)
GA for neural network architecture
It is well known that a NN's architecture has a significant impact on its information-processing abilities. Unfortunately, there is no systematic way to design an optimal architecture for a particular task; it is mostly designed by experienced experts through trial and error. The optimal design can be viewed as a search problem in the design space according to some optimality criteria. Several characteristics of this search surface make an EA a good candidate for searching it:
- the surface is infinitely large, since the number of possible neurons and connections is unbounded
- the surface is non-differentiable, since changes in the number of neurons or connections are discrete and can have a discontinuous effect on performance
- the surface is complex and noisy, since the mapping from a NN's architecture to its performance after training is indirect, strongly epistatic and dependent on the initial conditions
- the surface is deceptive, since NNs with similar architectures may have dramatically different information-processing abilities and performances
- the surface is multimodal, since NNs with quite different architectures can have very similar capabilities
Perhaps the most intuitively obvious way to combine an EA with neural networks is to evolve the architecture or topology, i.e. how many neurons to use and how to connect them, and then apply a common training algorithm, e.g. backpropagation, to tune the weights.
Chromosome representation: A key issue here is to decide how much information about the architecture should be encoded into the representation. At one end, all information about the architecture can be represented directly by binary strings; this kind of representation is called the direct encoding scheme. At the other end, only the most important parameters or features of the architecture are represented, such as the number of nodes, the number of connections and the type of activation functions, with the other details left to the learning process to decide; this kind of representation is called the indirect encoding scheme.
In the direct encoding scheme, a network can be represented by an N × N connectivity matrix C = (c_ij) that specifies the connections between the N neurons of the network, where c_ij = 1 indicates the presence of a connection from node i to node j and c_ij = 0 indicates its absence. The connectivity (adjacency) matrix is then converted into a bit-string genotype by concatenating its successive rows, as shown in Figure 7.
          to: 1 2 3 4 5 b
From node 1   0 0 0 0 0 0   → 000000
          2   0 0 0 0 0 0   → 000000
          3   1 1 0 0 0 1   → 110001
          4   1 1 0 0 0 1   → 110001
          5   0 0 1 1 0 1   → 001101
Genotype: 000000 000000 110001 110001 001101
Figure 7: Connectivity matrix
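The row-by-row flattening of Figure 7's connectivity matrix into a bit-string genotype can be sketched directly:

```python
# Connectivity matrix of Figure 7: 5 nodes plus a bias column b.
# Nodes 1-2 are inputs, 3-4 are hidden, 5 is the output.
C = [
    [0, 0, 0, 0, 0, 0],   # node 1: input, no connections, no bias
    [0, 0, 0, 0, 0, 0],   # node 2: input, no connections, no bias
    [1, 1, 0, 0, 0, 1],   # node 3: connected with 1 and 2, has a bias
    [1, 1, 0, 0, 0, 1],   # node 4: connected with 1 and 2, has a bias
    [0, 0, 1, 1, 0, 1],   # node 5: connected with 3 and 4, has a bias
]

# Concatenate the successive rows to obtain the genotype string.
genotype = "".join(str(bit) for row in C for bit in row)
# genotype == "000000000000110001110001001101"
```

An EA then searches over such bit strings; each genotype is decoded back into a matrix, the corresponding network is trained, and the training error supplies the fitness.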
Competing conventions
This type of problem occurs when a structure in the evaluation space can be represented by very different chromosomes in the representation space. Standard crossover between two such chromosomes, which represent the same network under different conventions, will likely not produce useful offspring.
Figure 8: Competing conventions (two networks whose hidden nodes A–F are permuted are encoded by the very different chromosomes ABCDEF and DCBAFE)
Note that the only difference between the phenotypes is the swapping of the hidden nodes, and such a permutation of the hidden nodes of a feed-forward network does not alter the function; both networks exhibit the same fitness.
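This claim can be checked numerically: swapping the hidden nodes, together with their incoming and outgoing weights, leaves the network function unchanged. The 2-2-1 network and the weight values below are illustrative assumptions.

```python
import math

def net(x, hidden):
    """2-2-1 sigmoid network; hidden is a list of ((w1, w2), w_out) per node."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    return sig(sum(w_out * sig(w1 * x[0] + w2 * x[1])
                   for (w1, w2), w_out in hidden))

hidden = [((0.3, -0.7), 1.2), ((0.9, 0.4), -0.8)]   # hidden-node order "AB"
swapped = list(reversed(hidden))                     # hidden-node order "BA"

# Permuting the hidden nodes gives an identical phenotype on every input.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert net(x, hidden) == net(x, swapped)
```

The two weight lists are different chromosomes, yet any fitness based on the network output will score them identically; crossover between them, however, can duplicate one hidden node and lose the other.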
Example 5: An XOR gate is to be realised using the feed-forward neural network shown in the figure below. A set of training data is also provided. Show how a genetic algorithm can be applied to train this network.

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0

NN for an XOR gate (nodes A–F)
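One possible answer, sketched as code: encode the six weights and three biases of a 2-2-1 network as a real-valued chromosome, use the SSE over the four training patterns as the fitness, and evolve an elitist population with the ± random(0, 1) mutation rule from earlier in the lecture. All numeric choices (population size, generations, seed) are illustrative assumptions.

```python
import math, random

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def predict(c, x):
    """2-2-1 sigmoid network; c = [w1..w6, b1, b2, b3]."""
    sig = lambda a: 1.0 / (1.0 + math.exp(-a))
    h1 = sig(c[0] * x[0] + c[1] * x[1] + c[6])
    h2 = sig(c[2] * x[0] + c[3] * x[1] + c[7])
    return sig(c[4] * h1 + c[5] * h2 + c[8])

def sse(c):
    """Fitness as in Eq. (2): sum of squared errors over the XOR patterns."""
    return sum((predict(c, x) - t) ** 2 for x, t in XOR)

random.seed(1)
pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(30)]
initial_best = min(map(sse, pop))
for _ in range(300):                      # elitist generational loop
    pop.sort(key=sse)
    parents = pop[:15]                    # keep the best half
    pop = parents + [[w + random.choice((-1, 1)) * random.random()
                      for w in random.choice(parents)]
                     for _ in range(15)]
best = min(pop, key=sse)
assert sse(best) <= initial_best          # elitism: fitness never worsens
```

Elitism guarantees that the best SSE is non-increasing over the generations; given enough generations the best chromosome may approach a weight set that realises XOR.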
References
1. Montana, D.J. and Davis, L. (1989). Training Feedforward Neural Networks using Genetic Algorithms, Proceedings of the 11th International Joint Conference on Artificial Intelligence, pp. 762-767, San Mateo, CA: Morgan Kaufmann.
2. Whitley, D., Starkweather, T. and Bogart, C. (1990). Genetic Algorithms and Neural Networks: Optimizing Connections and Connectivity, Parallel Computing, vol. 14, pp. 347-361.
3. Dasgupta, D. and McGregor, D.R. (1992). Designing Neural Networks using the Structured Genetic Algorithm, Proceedings of the International Conference on Artificial Neural Networks (ICANN), I. Aleksander and J. Taylor (eds), Elsevier Science Publ., Brighton, UK, pp. 263-268.
4. Yao, X. and Liu, Y. (1997). A New Evolutionary System for Evolving Artificial Neural Networks, IEEE Transactions on Neural Networks, vol. 8, no. 3, pp. 694-713.
5. Siddique, M.N.H. and Tokhi, M.O. (2001). Training Neural Networks: Backpropagation vs Genetic Algorithms, IEEE International Joint Conference on Neural Networks, Washington DC, USA, 14-19 July.
Evolutionary Neural Fuzzy Systems
References
Farag, W.A., Quintana, V.H. and Lambert-Torres, G. (1998). A Genetic-based Neuro-Fuzzy Approach for Modeling and Control of Dynamical Systems, IEEE Transactions on Neural Networks, vol. 9, no. 5, pp. 756-767.
Fukuda, T., Shimojima, K. and Shibata, T. (1994). Fuzzy, Neural Networks and Genetic Algorithm based Control Systems, Proceedings of the IEEE International Conference on Industrial Electronics, Control and Instrumentation, pp. 1220-1225.
Loia, V., Sessa, S., Staiano, A. and Tagliaferri, R. (2000). Merging Fuzzy Logic, Neural Networks and Genetic Computation in the Design of a Decision Support System, International Journal of Intelligent Systems, vol. 15, pp. 575-594.
Mester, G. (1995). Neuro-Fuzzy-Genetic Controller Design for Robot Manipulators, Proceedings of the IEEE International Conference on Industrial Electronics, Control and Instrumentation, pp. 87-92.
Chiaberge, M., Bene, G. Di, Pascoli, S. Di, Lazzerini, B., Maggiore, A. and Reyneri, L.M. (1995). Mixing Fuzzy, Neural and Genetic Algorithms in an Integrated Design Environment for Intelligent Controllers, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 2988-2993.
Ichimura, T., Takano, T. and Tazaki, E. (1995). Reasoning and Learning Methods for Fuzzy Rules using Neural Networks with Adaptive Structured Genetic Algorithm, Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, pp. 3269-3274.
Ichimura, T., Takano, T. and Tazaki, E. (1995). Applying Adaptive Structured Genetic Algorithm to Reasoning and Learning Methods for Fuzzy Rules using Neural Networks, Proceedings of the IEEE International Conference on Neural Networks, pp. 3124-3128.