®Copyright of Shun-Feng Su
The Essence of Computational Intelligence
(計算型智慧的基本概念: Basic Concepts of Computational Intelligence)
Offered by 蘇順豐 (Shun-Feng Su)
E-mail: [email protected]
Department of Electrical Engineering,
National Taiwan University of Science and Technology
March, 2009

Preface
- People have always dreamed of having machines that can act like humans.
- Artificial intelligence studies the components that can make such a dream possible.
- Due to the nature of knowledge, traditional artificial intelligence uses symbols to construct the conceptual world.

Preface
- Symbolic artificial intelligence is very difficult to manipulate for real-world problems, especially for implementing common-sense knowledge.
- Recently, computational intelligence (CI) has become commonly used and has demonstrated good performance in various applications.
- CI is so named to distinguish it from traditional symbolic artificial intelligence: its numerical knowledge representation makes it easy to manipulate.

Preface
The following three methodologies are often considered as CI:
- Fuzzy Systems,
- Neural Networks, and
- Genetic Algorithms (also referred to as Evolutionary Computation).

This talk provides the fundamental concepts and ideas behind these often-mentioned techniques.

Basics for CI
CI is known to have the following characteristics [1]:
- Numerical knowledge representation;
- Adaptability;
- Fault tolerance;
- Fast processing speed;
- Error rate optimality.
[1] J. C. Bezdek, “What is computational intelligence?” in Computational Intelligence: Imitating Life, J. M. Zurada, R. J. Marks II, and C. J. Robinson, Eds., New York: IEEE Press, pp. 1-12, 1994.

Basics for CI
Possible advantages of using CI are:
- Efficiency;
- Robustness;
- Good generalization capability;
- Easy to use;
- Easy to incorporate problem domain heuristics;
- Superior performance in various applications.
Generalization capability means having a fair chance to behave as required for any input data.

Basics for CI
Possible problems encountered while using CI are:
- Incomprehensibility of the stored knowledge;
- Lack of theoretical analysis tools, such as for stability and performance guarantees;
- Various subjective parameters are required;
- Lack of benchmarks for performance evaluation.
These may be disadvantages, but sometimes they also provide good means for applications.

Outline
- Introductions
- Fuzzy Systems
  - Uncertainty and Its Representation
  - Fuzzy Operations and Uncertainty Reasoning
  - Fuzzy Logic Control
- Neural Networks
- Genetic Algorithms
- Epilogue

Introduction of Fuzzy Systems
Fuzzy systems have been widely used in various applications.
In fact, the fundamental idea behind fuzzy systems is to include uncertainty in the process. Such an inclusion provides extra information so that the systems can be more accurate.
In other words, "fuzzy" means vague, but a fuzzy system can provide accurate results because of this extra information.

Uncertainties in Intelligent Systems
Uncertainties exist for the following reasons:
- noise always exists in the environment;
- whether facts are true or events occur may not be certain;
- stored knowledge is incomplete or liable to change;
- exceptions are inevitable for any realistic knowledge;
- simplifications are necessary to reduce the complexity of the system;
- partitioning continuous variables for rule-based knowledge results in the fuzzy set concept.


Uncertainties in Intelligent Systems
Traditional systems always use nominal values to reason and to make decisions. But using more information may lead to more accurate decision making.
Thus, to act intelligently, a system cannot ignore those uncertainties in its computation.
To incorporate uncertainties in the decision-making process, the system must be capable of representing uncertainty and must also be equipped with the capability of approximate reasoning.

Fuzzy Sets As A Representation for Uncertainty
Traditional sets are called classical sets or crisp sets.
In a crisp set, membership is crisp and can be described by a simple yes/no answer. That is, an element is either in the set or not in the set. The membership function of a crisp set A is defined as
$$\mu_A(x) = \begin{cases} 1, & \text{when } x \in A, \\ 0, & \text{when } x \notin A. \end{cases}$$

Fuzzy Sets As A Representation for Uncertainty
The range of the membership function, $\mu_A$, of a fuzzy set A is now the interval [0,1] instead of only the binary values {0,1}.
Example: Let a fuzzy set A represent the concept “real numbers that are close to 5”, and let the membership function for A be
$$\mu_A(x) = \frac{1}{1 + 10(x - 5)^2}$$
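
The difference between the two kinds of membership can be sketched in a few lines of Python. This is only an illustration: the crisp interval [4, 6] and the function names are assumptions made for the example, while `close_to_5` follows the membership function given in the slide.

```python
def crisp_membership(x, low=4.0, high=6.0):
    """Crisp set A = [low, high] (an assumed interval): membership is 0 or 1."""
    return 1.0 if low <= x <= high else 0.0


def close_to_5(x):
    """Fuzzy set 'real numbers close to 5': mu_A(x) = 1 / (1 + 10 (x - 5)^2)."""
    return 1.0 / (1.0 + 10.0 * (x - 5.0) ** 2)


# The crisp degree jumps between 0 and 1; the fuzzy degree decays smoothly.
for x in (4.0, 4.5, 5.0, 5.5, 7.0):
    print(x, crisp_membership(x), round(close_to_5(x), 4))
```

The fuzzy degree equals 1 only at x = 5 and falls off gradually, whereas the crisp set treats 4.0 and 5.0 as equally "in".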

Fuzzy Sets As A Representation for Uncertainty
For example,
When x = 62mph, M(x)=0.4667, F(x)=0.5333.
When x = 63mph, M(x)=0.5333, F(x)=0.4667.
When x = 69mph, M(x)=0.0667, F(x)=0.9333.

Uncertainty Representations
Two often-used uncertainty representations are fuzzy sets and probability. As concepts of uncertainty, they capture two different types of uncertainty.
- A fuzzy set captures the idea of vagueness: it indicates the degree of uncertainty about what something is.
  What is rain? What is fast?
- Probability captures the idea of ambiguity: it indicates uncertainty about whether something is there.
  Will it rain? What will the outcome of a die be?

Fuzzy vs. Probability
From the mathematical representation viewpoint, they are comparable but possess different reasoning behaviors.
- Reasoning with probabilities is mathematically sound but is difficult to manipulate due to the lack of modularity.
- Reasoning with fuzzy sets does not provide mathematically sound inference and is subjective, but it is easy to manipulate.
In fact, other types of uncertainty can be found in the literature.

Operations of Fuzzy – Extension Principle
Given a function f : U → V, suppose the input is now a fuzzy set A in U. What will be the output?
The extension principle states that the fuzzy degree for A will be the fuzzy degree for y = f(A).
The concept is to pass the membership degree of x to f(x); i.e., the function itself is crisp and does not introduce any uncertainty. Thus, the membership degree of x carries over unchanged to f(x).

Extension Principle
Two problems arise:
- f(x) is a many-to-one function, i.e., f(x1) = f(x2) but x1 ≠ x2. Then, the membership degree can be μ(x1) or μ(x2). In other words, the resultant membership degree is μ(x1) ∨ μ(x2) (the maximum).
- The input domain consists of multiple variables. Then, f(x1, x2, …, xn) is obtained only when all of x1, x2, …, xn appear. In other words, the membership degree is μ(x1) ∧ μ(x2) ∧ … ∧ μ(xn) (the minimum).

Extension Principle
The extension principle allows the generalization of crisp mathematical concepts to the fuzzy set framework, and extends point-to-point mappings to mappings for fuzzy sets.
It provides a means for any function f that maps an n-tuple (x1, x2, …, xn) in the crisp set U to a point in the crisp set V to be generalized to map n fuzzy subsets in U to a fuzzy subset in V.
Any mathematical relationship between non-fuzzy elements can be extended to deal with fuzzy entities.
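
A minimal sketch of the extension principle for discrete fuzzy sets is given below. It assumes that a fuzzy set is stored as a dictionary from elements to membership degrees; the helper name `extend` and the example sets `A` and `B` are hypothetical. The two issues discussed in the slides appear as `min` (all inputs must appear together) and `max` (several inputs map to the same output).

```python
from itertools import product


def extend(f, *fuzzy_sets):
    """Map discrete fuzzy sets (dict: element -> degree) through a crisp f."""
    out = {}
    for combo in product(*(s.items() for s in fuzzy_sets)):
        xs = [x for x, _ in combo]
        # Multiple variables: all inputs must appear, so take the minimum.
        degree = min(mu for _, mu in combo)
        y = f(*xs)
        # Many-to-one mapping: keep the largest degree that reaches y.
        out[y] = max(out.get(y, 0.0), degree)
    return out


A = {-1: 0.5, 0: 1.0, 1: 0.8, 2: 0.3}
B = {1: 1.0, 2: 0.6}

squared = extend(lambda x: x * x, A)         # -1 and 1 both map to 1
summed = extend(lambda x, y: x + y, A, B)    # two-variable case
print(squared)
print(summed)
```

In `squared`, the output 1 receives the degree max(0.5, 0.8) = 0.8, since both -1 and 1 are mapped to it.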

Classic Logic Reasoning
Logic reasoning is to find other true propositions (facts) from given true propositions (knowledge and/or facts).
The scenario of logic reasoning can be interpreted as follows: there is a knowledge base containing facts or rules. Now, a new piece of information or a description of the current situation is specified. Then, we want to find out what the system can conclude or which action should be taken under the current circumstances.
The traditional reasoning rule is called modus ponens: (A ∧ (A → B)) → B. That is, one piece of knowledge A → B and a fact A together yield the fact B.

Approximate Reasoning for Fuzzy Sets
The most used inference rule is (A1 ∧ (A2 → B)) → B. In classic logic, either A1 = A2 or A1 ≠ A2. Therefore, with the match-and-fire property, either B is concluded or B is not concluded.
But with the use of fuzzy sets, A1 or A2 or both are fuzzy sets. Then what can the reasoning process conclude?
Example: (speed = 95 km/hr) ∧ (speed is too fast → pull back the throttle). Should the throttle be pulled back?
In the most common cases, A2 is a fuzzy set and A1 is a fuzzy singleton (crisp value). Note that B can be a crisp value or a fuzzy set. However, the rule A2 → B itself is hardly ever fuzzy.

Approximate Reasoning for Fuzzy Sets
The most used reasoning format is a form of categorical reasoning called the compositional rule of inference, or generalized modus ponens:
(X is A) and (IF (X is B) THEN (Y is C)) results in (Y is A∘R),
where X and Y are fuzzy variables, and A, B, and C are fuzzy labels (sets). Note that the resultant A∘R for Y is a fuzzy set. Usually, the membership function of A∘R can be computed as
$$\mu_{A \circ R}(v) = \max_u \min\bigl(\mu_A(u),\, t(\mu_B(u), \mu_C(v))\bigr),$$
where t(·,·) is a t-norm.

Approximate Reasoning for Fuzzy Sets
The above result can be viewed as an application of the extension principle. In
$$\mu_{A \circ R}(v) = \max_u \min\bigl(\mu_A(u),\, t(\mu_B(u), \mu_C(v))\bigr),$$
finding whether v is in Y is a selection among the various u (an "or" operation: max) of the existence of x = u in A combined with (an "and" operation: min) the relation between x = u and y = v.

Approximate Reasoning for Fuzzy Sets
Note that from the logic viewpoint, the implication p → q is equivalent to ¬p ∨ q. However, this equivalence only states that the truth value of p → q equals that of ¬p ∨ q. In reasoning, the implication itself is assumed to be true, and the question is whether the current situation (x = u and y = v) matches the rule IF (X is B) THEN (Y is C).
Therefore, the most commonly used relation is the t-norm of $\mu_B(u)$ and $\mu_C(v)$.
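
The compositional rule of inference can be sketched over small discrete universes as follows. The universes, the membership functions, and the helper name `infer` are assumptions chosen to mirror the speed and throttle example; the t-norm is taken to be `min`, which is one common choice rather than the only one.

```python
def infer(mu_a, mu_b, mu_c, us, vs, t_norm=min):
    """mu_{A o R}(v) = max over u of min(mu_A(u), t(mu_B(u), mu_C(v)))."""
    return {
        v: max(min(mu_a(u), t_norm(mu_b(u), mu_c(v))) for u in us)
        for v in vs
    }


us = range(60, 121, 5)   # assumed speed universe in km/hr
vs = [0, 1, 2, 3, 4]     # assumed throttle pull-back levels


def mu_a(u):
    """Fact: the speed is exactly 95 km/hr (a fuzzy singleton)."""
    return 1.0 if u == 95 else 0.0


def mu_b(u):
    """Premise 'speed is too fast': a ramp from 80 to 100 km/hr."""
    return min(1.0, max(0.0, (u - 80) / 20))


def mu_c(v):
    """Consequent 'pull back the throttle': larger pull-back fits better."""
    return v / 4


conclusion = infer(mu_a, mu_b, mu_c, us, vs)
print(conclusion)
```

Because the fact is a singleton at 95 km/hr, the conclusion is the consequent set clipped at the firing strength mu_b(95) = 0.75.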

Approximate Reasoning for Fuzzy Sets
Example: R1: IF (X is A1) and (Y is B1) THEN (Z is C1).
R2: IF (X is A2) and (Y is B2) THEN (Z is C2).
Now, the input is (x0, y0) and the reasoning can be graphically shown as:
[Figure: graphical illustration of the two-rule reasoning]

Fuzzy Logic Control
A Fuzzy Logic Controller (FLC) is a controller described by a collection of fuzzy rules (e.g., IF-THEN rules) involving linguistic variables.
The original idea behind fuzzy control is to incorporate the “expert experience” of humans into the design of controllers.
The use of linguistic variables, fuzzy control rules, and approximate reasoning provides a means to incorporate human expert experience in designing the controller.

Rationale behind Fuzzy Logic Control
In an FLC, the rule structure provides the adaptation among strategies, and the fuzzy mechanism provides the interpolating capability among rules.
With the interpolating capability, the transition between rules is gradual rather than abrupt. This is the so-called softening process.
In recent developments, however, fuzzy control is used because it consists of multiple strategies (rules or controllers) for different situations. It can therefore achieve better control performance than a single complicated controller.

Basic Structure of Fuzzy Logic Control
A typical architecture of an FLC consists of four principal components: a fuzzifier, a fuzzy rule base, an inference engine, and a defuzzifier.

Fuzzy Logic Control
• Knowledge usually is in a rule structure, and rule structures need partitions.
• Fuzzy control uses fuzzy partitions.

Fuzzy Logic Control
To use fuzzy rules, the input values must be transferred into fuzzy labels (with a fuzzy partition).
The consequences of all matched rules must be transformed into actions.
The overall structure is also referred to as a fuzzy system.

Basic Structure of Fuzzy Logic Control
The fuzzifier transforms crisp measured data (e.g., speed = 100 km/hr) into suitable linguistic labels (e.g., speed is too fast).
The fuzzy rule base stores the knowledge, in rule form, about how to control the system (e.g., IF “speed is too low” THEN “increase the throttle setting”).

Basic Structure of Fuzzy Logic Control
The inference engine infers the desired control strategies from the rules by performing approximate reasoning based on the current states.
The defuzzifier yields a non-fuzzy action or decision from the control strategy (a fuzzy set) inferred by the inference engine.

Fuzzy Systems
Mamdani fuzzy rules:
If (X is A) and (Y is B) … then (Z is C)
Note that C is a fuzzy set.
TSK (in modeling) or TS (in control) fuzzy rules:
If (X is A) and (Y is B) … then Z = f(X, Y).
Now, f() is a crisp function.

Fuzzy Systems
The approximate reasoning for the output of a fuzzy rule is obtained from the extension principle as
$$\mu_{A \circ R}(v) = \max_u \min\bigl(\mu_A(u),\, t(\mu_B(u), \mu_C(v))\bigr).$$

Fuzzy Systems
Mamdani fuzzy rules: COA (center of area) defuzzification.
To find the center of the area, numerical integration is needed.
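
The four components of an FLC with Mamdani rules and COA defuzzification might be sketched as below. Everything concrete here is an assumption made for illustration: the triangular membership functions, the two rules, the output range [-10, 10], and the function names. The center of area is approximated by a simple sum over a uniform grid, which is the numerical integration mentioned above.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def mamdani_coa(speed, n=2001):
    # Fuzzifier: degrees to which the crisp speed is "slow" or "fast".
    slow = tri(speed, 40, 60, 80)
    fast = tri(speed, 60, 80, 100)

    # Rule base (hypothetical): slow -> increase throttle, fast -> decrease.
    # Inference: clip each consequent at its firing strength, combine by max.
    def aggregated(z):
        return max(min(slow, tri(z, 0, 5, 10)),      # "increase", peak at +5
                   min(fast, tri(z, -10, -5, 0)))    # "decrease", peak at -5

    # Defuzzifier: center of area over a uniform grid on [-10, 10].
    zs = [-10 + 20 * k / (n - 1) for k in range(n)]
    num = sum(z * aggregated(z) for z in zs)
    den = sum(aggregated(z) for z in zs)
    return num / den if den else 0.0


for s in (60, 65, 70, 80):
    print(s, round(mamdani_coa(s), 3))
```

The output moves gradually from +5 toward -5 as the speed goes from 60 to 80, which illustrates the smooth transition between rules.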

Fuzzy Systems
TS fuzzy rules: the defuzzification is sometimes also called COA, but it needs no numerical integration. The output is obtained as
$$z = \frac{\sum_{i=1}^{m} \alpha_i f_i}{\sum_{i=1}^{m} \alpha_i},$$
where $\alpha_i$ and $f_i$ are the firing strength and the fired result for the i-th rule, and m is the number of rules.
It is simple and easy to calculate. Most importantly, it can be used in any mathematical operation, such as differentiation.
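
The weighted-average output of TS fuzzy rules can be sketched in a few lines. The two rules, their membership functions, and the helper name `ts_output` are hypothetical; the sketch also assumes that at least one rule fires, so the denominator is nonzero.

```python
def ts_output(x, rules):
    """rules: list of (membership_function, consequent_function) pairs."""
    strengths = [mu(x) for mu, _ in rules]   # firing strength of each rule
    results = [f(x) for _, f in rules]       # crisp fired result of each rule
    total = sum(strengths)                   # assumed nonzero
    return sum(a * f for a, f in zip(strengths, results)) / total


rules = [
    (lambda x: max(0.0, 1.0 - x), lambda x: 2.0 * x),       # "x is small"
    (lambda x: max(0.0, min(1.0, x)), lambda x: 1.0 + x),   # "x is large"
]

for x in (0.0, 0.5, 1.0):
    print(x, ts_output(x, rules))
```

No integration is involved: the result is a closed-form ratio of sums, which is why it can be differentiated or otherwise manipulated analytically.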

Fuzzy Systems
Thus, in recent developments, most approaches consider TS (or TSK) fuzzy models.
TS fuzzy models also have another advantage in applications: the output of a TS fuzzy model can be more sensitive to changes in the inputs.
This can eliminate the chattering effects in the final control stage that occur with traditional fuzzy models (Mamdani fuzzy rules).

Fuzzy System
A fuzzy approximator is constructed from a set of fuzzy rules as
$$R^l: \text{IF } x_1 \text{ is } A_1^l \text{ and } \ldots \text{ and } x_n \text{ is } A_n^l \text{ THEN } y_f \text{ is } \theta^l, \quad l = 1, 2, \ldots, M.$$
Generally, $\theta^l$ is a fuzzy singleton.
In the literature, this fuzzy model can be regarded both as a Mamdani fuzzy model (with singleton fuzzy sets) and as a TS fuzzy model (a crisp function).
It is a commonly used fuzzy model in control.
To me, since no numerical integration is needed, it is a TS fuzzy model. Also, no membership functions are used in the consequents.

Fuzzy System
The fuzzy system with center-of-area-like defuzzification and product inference can be obtained as
$$y_f(x) = \frac{\sum_{l=1}^{M} \theta^l \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)}{\sum_{l=1}^{M} \left( \prod_{i=1}^{n} \mu_{A_i^l}(x_i) \right)},$$
where the product over i is the t-norm operation for all premise parts.
It is a universal function approximator and is written as $y_f(x \mid \theta) = \theta^T \omega$.
Note that $\omega$ is a function of the states.
The form $y_f(x \mid \theta) = \theta^T \omega$ is simple and differentiable, and it is what is used in adaptive fuzzy control.
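
A minimal sketch of the fuzzy approximator in the form theta^T omega is shown below. Gaussian membership functions with a shared width are an assumption (the slides do not fix the membership shape), and the names `basis`, `fuzzy_output`, `centers`, and `theta` are hypothetical. The sketch also assumes the input is close enough to some rule center that the firing strengths do not all underflow to zero.

```python
import math


def gaussian(x, center, width):
    """Assumed Gaussian membership function."""
    return math.exp(-((x - center) / width) ** 2)


def basis(x, centers, width=1.0):
    """omega(x): normalized product of premise memberships, one per rule."""
    # centers[l][i] is the center of the membership function A_i^l.
    fire = [
        math.prod(gaussian(xi, c, width) for xi, c in zip(x, rule))
        for rule in centers
    ]
    total = sum(fire)
    return [f / total for f in fire]


def fuzzy_output(x, theta, centers):
    """y_f(x | theta) = theta^T omega(x)."""
    return sum(t * w for t, w in zip(theta, basis(x, centers)))


centers = [(0.0, 0.0), (1.0, 1.0)]   # two rules over two inputs
theta = [2.0, 4.0]                   # singleton consequents

print(basis((0.5, 0.5), centers))
print(fuzzy_output((0.5, 0.5), theta, centers))
```

The output is nonlinear in the input x through omega(x), but linear in the parameters theta, which is the property that linear-in-the-parameter design techniques rely on.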

Fuzzy System
It should be noted that the above system is a nonlinear system, yet its form is virtually linear.
Thus, various approaches have been proposed to handle nonlinear systems with linear system techniques by exploiting the linear property borne by each rule, such as common P stability, the LMI design process, adaptive fuzzy control, etc.

Outline
- Introductions
- Fuzzy Systems
- Neural Networks
  - Machine Learning
  - Neural Network Models
  - Learning Analysis
- Genetic Algorithms
- Epilogue

Why Learning Is Needed
Problem-domain knowledge for a complicated system usually does not exist or is extremely difficult to obtain.
The system may therefore be asked to learn knowledge from experience by itself.
Note that learning is an important capability for an intelligent system, but not a necessary one.
In recent research, most intelligent systems have been equipped with a learning capability.

What is Learning?
There are two important definitions of learning:
H. Simon defined learning as “any change in a system that allows it to perform better the second time on the repetition of the same task or on another task drawn from the same population.”
B. Kosko defined learning as change in all cases: “A system learns if and only if the system parameter vector or matrix has a nonzero time derivative.”

Concept of Machine Learning
The first definition requires that a learning system always behave better as learning continues.
The second definition is mainly for numerical learning.
The fundamental problem of learning is how to change the system so that it behaves as required.
The methods for making such changes are the so-called learning algorithms.

Symbolic Learning vs. Numerical Learning
In a symbolic learning scheme, the representation of knowledge is symbolic, such as predicate calculus and rules. The learning behavior is to build a conceptual relationship between those symbols from learned examples.
In a numerical learning scheme, the knowledge is somehow coded into numerical data. The learning behavior is concerned with changing the values of parameters numerically.

Symbolic Learning
Examples of symbolic learning schemes: Inductive Learning, Case-based Learning, Explanation-based Learning, etc.
Symbolic learning is well suited to interacting with human experts, but it is very sensitive to noise.
The major drawback of this kind of learning is that the knowledge manipulation is very complicated.
Traditional artificial intelligence has focused on symbolic learning. However, due to its difficulty in manipulation and its sensitivity to noise, symbolic learning has not actually provided significant advances in real-world applications.

Numerical Learning
Examples of numerical learning schemes: Neural Networks, Cerebellar Model Arithmetic Computer (CMAC), Fuzzy Modeling, etc.
Numerical learning is computationally efficient and insensitive to noise, but incomprehensible. It is easy to use, but it is difficult to incorporate expert knowledge.
Recently, due to the use of neural networks and fuzzy systems, numerical learning has drawn more attention.
Applications of numerical learning schemes can be found in various disciplines, such as artificial intelligence, computer science, control engineering, decision theory, expert systems, operations research, pattern recognition, and robotics.

Concept of Learning
Depending on what type of information is used in determining how to change the system, learning schemes are usually categorized into three different kinds of learning:
- supervised learning,
- unsupervised learning, and
- reinforcement learning.
Reinforcement learning is sometimes also said to be supervised learning, but with less instructive supervision.

Concept of Learning
In fact, most successful learning approaches are supervised learning, due to the simplicity of the required task.
Unsupervised learning is used for finding common features or for clustering (self-organizing).
Reinforcement learning is fascinating in its ideas, but due to its intricacy in learning (such as delayed reward and decoupling between two learning systems), more study must be conducted.

Introduction of Neural Networks
Artificial neural networks (ANN), or simply neural networks (NN), are systems inspired by modeling the networks of biological neurons in the brain.
NNs are a promising new generation of information processing systems that demonstrate the ability to learn, recall, and generalize from training patterns or data.

Typical Biological Neuron and Its Model

Introduction of Neural Networks
NNs have a large number of highly interconnected processing elements (PEs), or neurons, that usually operate in parallel.
NNs are good at tasks such as pattern matching and pattern classification, function approximation, optimization, vector quantization, and data clustering. However, traditional computers are faster at algorithmic computational tasks and precise arithmetic operations.

Introduction of Neural Networks
Since neural networks do not use a mathematical model of how a system’s output depends on its input (they are so-called model-free estimators), neural network architectures can be applied to a wide variety of problems.
Like brains, neural networks recognize patterns we cannot define. This is the property of recognition without definition.

Introduction of Neural Networks
An NN is a parallel distributed information-processing structure with the following characteristics:
- It is a neurally inspired mathematical model.
- It consists of a large number of highly interconnected processing elements (neurons).
- Its connections (weights) hold the knowledge.

Introduction of Neural Networks
- A neuron can dynamically respond to its stimulus, and the response depends completely on its local information.
- It has the ability to learn, recall, and generalize from training data by assigning or adjusting the connection weights.
- Its collective behavior demonstrates the computational power, and no single neuron carries specific information (distributed representation property).

Basic Models of Neural Networks
Models of ANNs are specified by three basic entities:
1. Neuron models: describe how the neurons process the input and how the output is generated.
2. Connectivity: defines how those neurons are interconnected.
3. Learning algorithms: define how the connecting weights are updated to adjust the networks so that they behave as required.

Basic Models of Neural Networks
The processing in a neuron is separated into two parts: input and output.
Associated with the input of a neuron is an integration function f, which serves to combine information, activation, or evidence from an external source or other neurons into a net input to the neuron.
The most commonly used integration function is linear and is written as
$$f_i = \text{net}_i = \sum_{j=1}^{m} w_{ij} x_j - \theta_i, \quad i = 1, 2, \ldots, n,$$
where $\theta_i$ is the threshold of the i-th neuron.

Basic Models of Neural Networks
The output function of a neuron is usually called the activation function, in that the output of a neuron serves the role of activating the meaning stored in the neuron.
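
The two-part neuron model (integration function followed by activation function) can be sketched as follows. The sigmoid and the hard-limit step are assumed examples of activation functions, and all function names are hypothetical; the net input is the weighted sum of the inputs minus the threshold.

```python
import math


def net_input(weights, inputs, threshold):
    """Linear integration function: net = sum_j w_j * x_j - threshold."""
    return sum(w * x for w, x in zip(weights, inputs)) - threshold


def sigmoid(net):
    """A common squashing activation function (an assumed choice)."""
    return 1.0 / (1.0 + math.exp(-net))


def step(net):
    """Hard-limit activation: the neuron fires once net reaches zero."""
    return 1.0 if net >= 0.0 else 0.0


def neuron(weights, inputs, threshold, activation=sigmoid):
    """Output of one neuron: activation applied to the net input."""
    return activation(net_input(weights, inputs, threshold))


print(neuron([0.5, -1.0], [2.0, 1.0], 0.0))
print(neuron([0.5, -1.0], [2.0, 1.0], 0.5, activation=step))
```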

Learning Rules for Neural Networks

Learning in Neural Networks
As we have mentioned, the basic characteristic of ANNs is that they have the capability of learning. Iterative learning procedures are used for a variety of ANN architectures.
Learning in ANNs can be accomplished in several ways:
- establishment of connections between neurons;
- adjustment of the weight values on the links;
- adjustment of threshold values in neurons.
In fact, these processes can all be considered as the adjustment of weight values on the links.

Learning in Neural Networks
The backpropagation (BP) learning algorithm is usually applied for learning. Such networks are also referred to as backpropagation networks.
The fundamental idea is that when a cost function E(w) is defined, such as
$$E(w) = \frac{1}{2} \sum_{k=1}^{p} \bigl(d^{(k)} - y^{(k)}\bigr)^2,$$
then the updating algorithm is $\Delta w = -\eta \nabla_w E(w)$, or
$$\Delta w_j = -\eta \frac{\partial E}{\partial w_j} = \eta \sum_{k=1}^{p} \bigl(d^{(k)} - y^{(k)}\bigr) \frac{\partial y^{(k)}}{\partial w_j},$$
where $\eta$ is the learning rate.
Since the above process updates the weights after all training patterns are taken into account, this kind of learning is called batch learning.

Learning in Neural Networks
When batch learning is used, the errors of all training patterns are summed together, and the learning effects are then for the sum over all training patterns. Thus, the learning cannot make adjustments for individual patterns, and the resultant learning is usually unacceptable.
The other kind of learning is called on-line learning or per-example learning. In this type of learning, the changes are made individually for each pattern, i.e.,
$$\Delta w_j = \eta \bigl(d^{(k)} - y^{(k)}\bigr) \frac{\partial y^{(k)}}{\partial w_j}.$$
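
The difference between batch and on-line updating can be sketched with a single linear neuron y = w·x, for which the derivative of y with respect to w_j is simply x_j. This is a deliberately simplified stand-in for a full backpropagation network; the data set, the learning rate `eta`, and the function names are assumptions made for the example.

```python
def predict(w, x):
    """Linear neuron: y = sum_j w_j * x_j, so dy/dw_j = x_j."""
    return sum(wj * xj for wj, xj in zip(w, x))


def batch_epoch(w, data, eta):
    """Batch learning: sum (d - y) * x_j over all patterns, then step once."""
    grad = [0.0] * len(w)
    for x, d in data:
        err = d - predict(w, x)
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return [wj + eta * gj for wj, gj in zip(w, grad)]


def online_epoch(w, data, eta):
    """On-line learning: update the weights after every single pattern."""
    for x, d in data:
        err = d - predict(w, x)
        w = [wj + eta * err * xj for wj, xj in zip(w, x)]
    return w


# Toy training set generated by the target d = x1 - x2.
data = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 0.0)]

w_batch = [0.0, 0.0]
w_online = [0.0, 0.0]
for _ in range(200):
    w_batch = batch_epoch(w_batch, data, eta=0.1)
    w_online = online_epoch(w_online, data, eta=0.1)

print(w_batch, w_online)
```

On this small, consistent data set both schemes approach the same weights; they differ in when the weights change, not in the quantity each pattern contributes.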

Learning in Neural Networks
When we want an NN to perform some task, the NN is realized by finding an appropriate set of weights. In other words, the obtained weights capture what we want the NN to be, i.e., the knowledge.
The activation values of the neurons represent the system at some snapshot in time. Thus, they capture the transient state for some specific input set at a certain point in time.
From the information storage viewpoint, the weights of the links encode the so-called long-term memory, and the activation states of the neurons encode the short-term memory in the NN.

Universal Approximator Theorem
A neural network with as few as one hidden layer, using an arbitrary squashing activation function and a linear or polynomial integration function, can approximate virtually any function of interest to any desired degree of accuracy, provided sufficiently many hidden neurons are available.
Any lack of success in applications may arise from inadequate learning, an insufficient number of hidden neurons, or the lack of a deterministic relationship between the inputs and the desired outputs.
The theorem only states the existence of the ideal network; it does not provide any mechanism to find it.

Learning Performance Analysis
Two learning phases must be distinguished in the evaluation of learning performance, especially for offline learning schemes: the training phase and the testing phase.
In the training phase, the system is trained with the given training patterns. Thus, the system is under construction, and the convergence behavior of the training is of concern.
For the training performance, it is simple to consider the learning histories (training errors vs. training iterations).

Learning Performance Analysis
The convergence behavior of learning is usually characterized by two properties: the convergence speed and the converged error (training error).
If the system uses an offline learning scheme, the convergence speed may not be a significant factor.
An issue for the converged error is that the learning may become stuck in local minima if iterative (incremental) learning algorithms are used.

Learning Performance Analysis
Even though the learning algorithm is the major factor in determining the convergence behavior, other factors, such as the system structure and the quality of the training data, may also affect the training performance.
The learning performance of the training phase states how accurately the learned system can approximate the desired outputs for the inputs in the training data set.
The purpose of learning is to obtain a system that, after learning, has a fair chance to behave as required for any input data or, in short, to generalize.

Learning Performance Analysis
Thus, in the testing phase, the generalization capability is of concern; that is, whether the learned system can interpret unlearned patterns well.
In the testing phase, the learned system is tested with another set of patterns, which are not used in the training phase in any way, to define the generalization error.
The performance in this phase is usually referred to as the generalization capability.
®Copyright of Shun-Feng Su
Validation of Generalization
There are several methods for estimating generalization
errors:
 Split-sample validation: Randomly select part of the
data as a test set, which must not be used in any way during
training. (This is the most commonly used method.)
 Cross-validation: Resample the training data set. In
k-fold cross-validation, the data is divided into k subsets
of equal size. The network is then trained k times,
each time leaving out one of the subsets and using
only the omitted subset to compute the error criterion. When
k equals the number of samples, it is called “leave-one-out”
cross-validation.
 Bootstrapping: Instead of repeatedly analyzing subsets of the
data, sub-samples are randomly drawn, with replacement, from
the data. It seems to work better than cross-validation in
many cases.
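The three estimation schemes differ mainly in how the sample indices are partitioned. The sketch below generates the index sets only; the function names, the default test fraction, and the use of the never-drawn samples as the bootstrap test set are assumptions made here for illustration.

```python
import random


def split_sample(n, test_fraction=0.2, seed=0):
    """Split-sample validation: hold out a random part of the data."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = int(n * test_fraction)
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold(n, k, seed=0):
    """k-fold cross-validation: yield one (train, test) pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal subsets
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def bootstrap(n, seed=0):
    """Bootstrapping: draw n indices with replacement.

    Assumption: the samples that were never drawn serve as the test set.
    """
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [j for j in range(n) if j not in drawn]
    return train, test
```

Calling `k_fold(n, n)` gives the leave-one-out case, in which every test set contains exactly one sample.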
Learning Performance Analysis
In general, there are two different types of
generalization: interpolation and extrapolation.
Interpolation can often be done reliably, but
extrapolation is notoriously unreliable.
Note that generalization is not always possible for
various learning systems despite the assertions
in the literature.
Learning Performance Analysis
There are three conditions that are typically necessary
(although not sufficient) for good generalization.
 Deterministic input-output relationships: The inputs
to the network must contain sufficient information pertaining to
the desired outputs. It is impossible to learn a nonexistent function.
 Smooth functions: A small change in the inputs should
produce a small change in the outputs. Very non-smooth
functions (e.g., random noise) cannot be generalized.
 Sufficient training data: The training data should
be a sufficiently large and representative subset of the
population. With sufficient data, extrapolation can largely be avoided.
Overfitting and Underfitting
A system that is not sufficiently complex (i.e.,
with fewer tunable parameters than required) may
fail to fully detect the signal in a complicated data set,
leading to underfitting.
A network that is too complex may fit not only the signal
but also the noise, leading to overfitting.
Note that overfitting may occur even with noise-free
data.
Various approaches to avoiding overfitting have been
proposed in the literature: jittering, weight decay, early
stopping, Bayesian learning, robust learning algorithms, etc.
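Of the approaches listed, early stopping is perhaps the simplest to sketch: training is halted once the error on a held-out validation set stops improving. The function name, the `patience` parameter, and the error values below are hypothetical and serve only to illustrate the idea.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_epoch, best_error = epoch, error
        elif epoch - best_epoch >= patience:
            break  # no recent improvement: stop training here
    return best_epoch


# Hypothetical validation errors recorded after each training epoch.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55, 0.30]
kept = early_stopping_epoch(errors, patience=3)
```

With `patience=3` the run stops at epoch 6 and keeps epoch 3, so the later value 0.30 is never reached; this is the usual trade-off of the method.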
Local Learning Concept
The minimum disturbance principle suggests that
a better way of learning should aim not
only at reducing the output error for the current
training pattern but also at minimizing the disturbance
to what the weights have already learned.
A learning system that follows the minimum
disturbance principle can learn more effectively.
We refer to this as the local learning concept.
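One classic update rule commonly associated with this principle is the normalized LMS rule for a single linear unit. The sketch below is offered under that assumption; the slides do not name a specific rule, and the function name and the `alpha` parameter are illustrative.

```python
def minimal_disturbance_update(weights, x, target, alpha=1.0):
    """One normalized-LMS step for a linear unit y = w . x.

    With alpha = 1 the error on the current pattern is removed exactly,
    using the smallest possible (Euclidean) change of the weight vector.
    """
    y = sum(w * xi for w, xi in zip(weights, x))
    error = target - y
    norm_sq = sum(xi * xi for xi in x)
    if norm_sq == 0.0:
        return list(weights)  # a zero input carries no information
    return [w + alpha * error * xi / norm_sq for w, xi in zip(weights, x)]


w_old = [0.5, -0.2, 0.1]
pattern = [1.0, 2.0, -1.0]
w_new = minimal_disturbance_update(w_old, pattern, target=1.0)
y_new = sum(w * xi for w, xi in zip(w_new, pattern))
```

The weight change is parallel to the input pattern, so the responses to patterns orthogonal to it are left untouched.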
Local Learning Concept
In neural networks, the effect of each update
spreads to all weights in the network due to
the distributed knowledge representation. This
violates the minimum disturbance principle and is
called global learning.
Local learning can be more effective, but it may not
always learn better.
Neural fuzzy systems use spatial relations to
define a learning structure that facilitates the local
learning concept.
Network Structure for Fuzzy Systems
In this kind of approach, fuzzy models are
characterized by a set of parameters, such as
the centers and widths of the membership functions,
the rule relationships, etc.
Since those parameters can be viewed as the
weights in a network, the traditional learning
schemes for neural networks can then be
adopted for this fuzzy modeling problem.
Approaches of this kind are often referred to as
neural fuzzy systems or neural-network-based
fuzzy systems.
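A minimal sketch of this idea follows, assuming Gaussian membership functions, single-input rules with constant consequents, and a weighted-average output. None of these choices comes from the slides; the rule layout `(center, width, consequent)` and all function names are hypothetical. Only the consequents are trained here, by a gradient-descent rule of the kind used for network weights.

```python
import math


def gaussian_membership(x, center, width):
    """Degree to which x belongs to a fuzzy set with the given center and width."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def fuzzy_output(x, rules):
    """Weighted average of rule consequents; rules are (center, width, consequent)."""
    strengths = [gaussian_membership(x, c, s) for c, s, _ in rules]
    total = sum(strengths)
    return sum(f * q for f, (_, _, q) in zip(strengths, rules)) / total


def train_consequents(rules, data, lr=0.5, epochs=200):
    """Treat the consequents as network weights and tune them by gradient descent."""
    rules = [list(r) for r in rules]
    for _ in range(epochs):
        for x, target in data:
            strengths = [gaussian_membership(x, c, s) for c, s, _ in rules]
            total = sum(strengths)
            y = sum(f * r[2] for f, r in zip(strengths, rules)) / total
            error = target - y
            for f, r in zip(strengths, rules):
                r[2] += lr * error * f / total  # gradient step on the consequent
    return [tuple(r) for r in rules]


initial_rules = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]
samples = [(0.0, 0.0), (1.0, 1.0)]
trained_rules = train_consequents(initial_rules, samples)
```

The centers and widths could be tuned in the same way by differentiating the output with respect to them; that step is omitted to keep the sketch short.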
Local Learning Concept
It can be found that neural fuzzy systems often, though
not always, have better learning capability than neural
networks.
Since local learning restricts the learning to the
pre-defined relations in order to reduce the learning burden,
the results of local learning may not be acceptable
if those relations are incorrect or cannot reflect
certain information.
Several systems can be classified as local learning
systems, such as radial basis function networks,
wavelet networks, CMAC, etc.
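The locality of such systems can be seen in a single update step of a radial basis function network. In the sketch below, the centers, the width, and the learning rate are hypothetical values; only the output weights are updated.

```python
import math


def rbf_activations(x, centers, width):
    """Gaussian basis functions: each responds only near its own center."""
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def rbf_update(weights, x, target, centers, width, lr=0.5):
    """One LMS step on the output weights of an RBF network."""
    phi = rbf_activations(x, centers, width)
    y = sum(w * p for w, p in zip(weights, phi))
    error = target - y
    return [w + lr * error * p for w, p in zip(weights, phi)]


centers = [0.0, 1.0, 2.0, 3.0, 4.0]
weights = [0.0] * len(centers)
new_weights = rbf_update(weights, x=0.0, target=1.0, centers=centers, width=0.3)
changes = [abs(n - o) for n, o in zip(new_weights, weights)]
```

Only the weight whose center lies close to the training input changes noticeably; the weights of distant centers are practically undisturbed.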
Outline
 Introductions
 Fuzzy Systems
 Neural Networks
 Genetic Algorithms
 Optimization in Computational Intelligence
 Evolutionary Computation
 Other Non-derivation Optimization
 Epilogue
Optimization in Computational
Intelligence
Optimization processes are required in an intelligent
system because:
 A better selection of applicable knowledge or strategies
can result in better performance;
 In the learning process, an optimal way of defining the
updating rule is required.
In general, an optimization problem requires finding a
setting of the system's variable vector such that a
certain quality criterion, also called a performance function,
is optimized. Sometimes, the variable vector also has
to satisfy some constraints.
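One common way to write such a problem compactly is shown below. The symbols are notation chosen here for illustration and do not appear in the slides: x is the variable vector, f is the performance function, and the g_i are the constraints.

```latex
\min_{x \in \mathbb{R}^{n}} f(x)
\quad \text{subject to} \quad
g_i(x) \le 0, \qquad i = 1, \dots, m
```

A maximization problem fits the same form by minimizing -f(x).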
Optimization in Computational
Intelligence
Traditional optimization approaches
develop a formal model that resembles the
original function and then solve it by means
of traditional mathematical methods.
Evolutionary algorithms have been widely used in
various intelligent systems. In fact, many
applications that combine them with fuzzy systems
and neural networks can be found in the
literature.
Evolutionary Computation
An important property of evolutionary algorithms
is that the search process does not require auxiliary
information about the fitness function, such as derivatives.
In fact, evolutionary computation should be
understood as a general adaptable concept for
problem solving rather than a collection of related
and ready-to-use algorithms.
Outline
 Introductions
 Fuzzy Systems
 Neural Networks
 Genetic Algorithms
 Optimization in Computational Intelligence
 Evolutionary Computation
 Other Non-derivation Optimization
 Epilogue
Evolutionary Computation
The majority of current implementations of evolutionary
algorithms descend from three strongly related but
independently developed approaches:
 Genetic algorithms: use binary strings as the gene
representation to search for an optimal chromosome.
 Evolutionary programming: evolve finite state
machines to predict events on the basis of former
observations.
 Evolution strategies: solve difficult discrete and
continuous parameter optimization problems.
Evolutionary Computation
Evolutionary computation mimics the natural
selection process in order to find the best-fitted
candidate for the solution (i.e., optimization).
Evolutionary algorithms can be viewed as
optimization approaches that use random
search algorithms with some guidance.
Evolutionary Computation
The guidance is provided by a user-specified
fitness function.
In general, an optimization problem requires
finding a setting of the system's variable vector
such that a certain quality criterion, also called a
performance function, is optimized. Sometimes,
the variable vector also has to satisfy some
constraints.
Initialize population P(t)
Evaluate P(t)
Apply reproduction and crossover on P(t) to yield C(t)
Apply mutation on C(t) to yield D(t) and then evaluate D(t)
Select P(t+1) from P(t) and D(t) based on the fitness
Stop criterion satisfied?
If yes: Stop
If no: repeat from the reproduction step with P(t+1)
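The flow above can be sketched as a generic loop in which the problem-specific operators are passed in as functions. This is a sketch under stated assumptions rather than a definitive implementation: the function `evolve`, its parameters, the use of the better half of the population as parents, and the fixed generation budget as the stop criterion are all choices made here.

```python
import random


def evolve(init, fitness, recombine, mutate,
           pop_size=20, generations=50, seed=0):
    """Generic evolutionary loop; all operators are supplied by the user."""
    rng = random.Random(seed)
    population = [init(rng) for _ in range(pop_size)]           # initialize P(t)
    for _ in range(generations):                                 # stop criterion
        ranked = sorted(population, key=fitness, reverse=True)   # evaluate P(t)
        parents = ranked[: pop_size // 2]                        # reproduction
        children = [recombine(rng.choice(parents), rng.choice(parents), rng)
                    for _ in range(pop_size)]                    # crossover: C(t)
        mutants = [mutate(child, rng) for child in children]     # mutation: D(t)
        # Select P(t+1) from P(t) and D(t) based on fitness.
        population = sorted(population + mutants,
                            key=fitness, reverse=True)[:pop_size]
    return max(population, key=fitness)


# Hypothetical usage: maximize -(x - 3)^2 over the real numbers.
best = evolve(
    init=lambda rng: rng.uniform(-10.0, 10.0),
    fitness=lambda x: -(x - 3.0) ** 2,
    recombine=lambda a, b, rng: (a + b) / 2.0,
    mutate=lambda x, rng: x + rng.gauss(0.0, 0.1),
)
```

Because P(t+1) is chosen from both P(t) and D(t), the best candidate found so far is never lost in this variant.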
Evolutionary Computation
Evolutionary computation uses three basic operators to
manipulate the genetic composition (chromosomes) of
a population:
 Reproduction is the process of selecting parents for
generating offspring. The most highly rated
chromosomes in the current generation are the most likely
to be copied into the new generation.
 Crossover provides a mechanism for chromosomes to
mix and match attributes through random processes.
 Mutation changes attributes (genes) in the new
generation to bring in new possibilities. Mutation is a very
important mechanism for escaping local minima in
the optimization search.
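For binary chromosomes, the three operators can be sketched as follows. Roulette-wheel selection, one-point crossover, and bit-flip mutation are common textbook variants chosen here as assumptions; the slides do not prescribe particular forms, and the function names are illustrative.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Reproduction: pick a parent with probability proportional to fitness.

    Assumes non-negative fitness values that are not all zero.
    """
    return rng.choices(population, weights=fitnesses, k=1)[0]


def one_point_crossover(a, b, rng):
    """Crossover: swap the tails of two chromosomes at a random cut point."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]


def bit_flip_mutation(chromosome, rate, rng):
    """Mutation: flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


rng = random.Random(0)
child_a, child_b = one_point_crossover([0] * 6, [1] * 6, rng)
mutated = bit_flip_mutation(child_a, rate=0.1, rng=rng)
```

The mutation rate is usually kept small so that mutation adds diversity without turning the search into a purely random one.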
Evolutionary Computation
The above operations play the role of generating
new chromosomes for evolution.
Hopefully, the best-fitted solution can be
generated.
In addition, randomness plays an essential role in
those operations.
One attractive property of evolutionary algorithms
is that the performance of the best solution never
gets worse, provided that the best candidates are
retained during selection.
Evolutionary Computation
However, because evolutionary algorithms must be
adapted to the problem at hand, their operations
must be designed by the user.
Moreover, if the optimization is constrained, the
initial population and the generation of new
chromosomes must be carefully designed.
Outline
 Introductions
 Fuzzy Systems
 Neural Networks
 Genetic Algorithms
 Optimization in Computational Intelligence
 Evolutionary Computation
 Other Non-derivation Optimization
 Epilogue
Other Non-derivation Optimization
Other often-mentioned approaches are ant-based
algorithms (ACS, ACO, etc.) and Particle Swarm
Optimization (PSO).
The overall ideas are similar in that they all
use fitness values to guide the search, with
some random mechanisms associated with
the search process.
Usually, these approaches can achieve better
search performance than genetic
algorithms.
Other Non-derivation Optimization
This is because genetic algorithms perform a solution-wise
search, whereas swarm search algorithms
perform a component-wise search.
Also, it can be found that genetic algorithms are
more easily trapped in a local minimum if
the initial population has some local
optimum properties.
Swarm algorithms can more easily escape from such
an initial local optimum.
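As an illustration of the component-wise nature of swarm search, the following is a minimal particle swarm optimization sketch. The inertia and acceleration coefficients, the swarm size, and the sphere test function are assumptions chosen for the example and are not taken from the slides.

```python
import random


def pso(objective, dim, bounds, n_particles=20, iterations=100,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_val[i])
    global_best, global_val = personal_best[g][:], personal_val[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):  # every component is moved on its own
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            value = objective(positions[i])  # fitness guides the search
            if value < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], value
                if value < global_val:
                    global_best, global_val = positions[i][:], value
    return global_best, global_val


def sphere(x):
    # Hypothetical test function with its minimum at the origin.
    return sum(v * v for v in x)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

Each coordinate of a particle is pulled separately toward its own best position and the swarm's best position, with fresh random factors per component.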
Epilogue
Computational intelligence is a new vehicle for the
next generation of artificial intelligence.
Nevertheless, computational intelligence alone can
take you nowhere.
Incorporating other techniques may
create new frontiers for our dreams.
Thank you for your attention!
Any Questions ?!
Shun-Feng Su,
Professor of Department of Electrical Engineering,
National Taiwan University of Science and Technology
E-mail: [email protected],