Introduction to Hybrid Systems – Part 1
Hybrid Systems
Hybridization
• Integrated architectures for machine learning
have been shown to provide performance
improvements over single representation
architectures.
• Integration, or hybridization, is achieved using a
spectrum of module or component architectures
ranging from those sharing independently
functioning components to architectures in which
different components are combined in inherently
inseparable ways.
• In this presentation we briefly survey prototypical integrated architectures.
Combinations

The combination of knowledge based systems,
neural networks and evolutionary computation
forms the core of an emerging approach to
building hybrid intelligent systems capable of
reasoning and learning in an uncertain and
imprecise environment.
Current Progress
• In recent years multiple module integrated
machine learning systems have been developed
to overcome the limitations inherent in single
component systems.
• Integrations of neural networks (NN), fuzzy logic
(FL) and global optimization algorithms have
received considerable attention [Abr] but
increasing attention is being paid to integrations
with case based reasoning (CBR) and rule
induction (RI) [Mar, Pren].
Primary Components
• The full spectrum of knowledge representation in
such systems is not confined to the primary
components.
• For example, in CBR systems, although much knowledge resides in the case library, significant problem-solving knowledge may also reside in secondary technologies: in the similarity metric used to retrieve problem-solution pairs from the case library, in the adaptation mechanisms used to improve an approximate solution, and in the case library maintenance mechanisms.
Multi-Component Systems
• Although it is possible to generalize about the relative utilities of these component types based on their primary knowledge representation mechanisms, such generalizations may not remain valid in particular cases, depending on the characteristics of the secondary mechanisms employed.
• Table 1 attempts to gauge the relative utilities of single-component systems based on their primary knowledge representation.
Degree of Integration
• Besides differing in the types of component systems employed, different integrated architectures have emerged in a rather ad hoc way, as noted by Abraham [Abr].
• The least integrated architectures consist of independent components communicating with each other on a side-by-side basis.
• More integration is shown in transformational or hierarchical systems, in which one technique may be used for development and another for delivery, or one component may be used to optimize the performance of another component.
• More fully integrated architectures combine different effects to produce a balanced overall computational model.
Transformational, Hierarchical and Integrated
• Abraham categorizes such systems as transformational, hierarchical and integrated. In a transformational system, one type of component is used to produce another, which becomes the functional system.
• For example, a rule based system may be used
to set the initial conditions for a neural network
solution to a problem.
• Thus, to create a modern intelligent system it
may be necessary to make a choice of
complementary techniques.
Stand Alone Models
• Independent components that do not
interact
• Suitable for problems that have naturally independent components, e.g., decision support and categorization
Transformational
• Expert systems with neural networks
• Knowledge from the ES is used to set the initial conditions and training set of the NN (a sketch of this idea follows)
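As an illustration only, the following Python sketch shows how knowledge from an ES rule base might be turned into labelled training examples for an NN; the toy rules, the +1/-1 encoding and all names are assumptions, not the document's method.

```python
# Illustrative sketch: expert-system rules generate labelled examples that can
# serve as the training set (and hence the initial conditions) of a neural
# network. The toy rules, the encoding and all names are assumptions.

FEATURES = ["wings", "engine", "feathers"]

# IF wings AND engine THEN plane; IF wings AND feathers THEN bird
RULES = [({"wings": 1, "engine": 1}, "plane"),
         ({"wings": 1, "feathers": 1}, "bird")]

def rule_to_example(conditions, label):
    """Encode a rule as an input vector (+1 present, -1 absent) and a target label."""
    x = [conditions.get(f, -1) for f in FEATURES]
    return x, label

training_set = [rule_to_example(cond, label) for cond, label in RULES]
print(training_set)   # ([1, 1, -1], 'plane'), ([1, -1, 1], 'bird')
```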
Hierarchical Hybrid
• An ANN uses a GA to optimize its topology, and its output is fed into an ES, which creates the desired output or explanation
Integrated – Fused Architectures
• Combine different techniques in one
computational model
• Share data structures and knowledge
representations
• Extended range of capabilities – e.g.,
classification with explanation, or,
adaptation with classification
Generalized Fused Framework
Fused Architecture
The architecture consists of four components and the environment. The performance element (PE) is the actual controller. The learning element (LE) updates the knowledge in the PE. The LE has access to the environment, the past states and the performance measure, and it updates the PE. The critic examines the external performance and provides feedback to the LE; it faces the problem of converting an external reinforcement into an internal one. The role of the problem generator is to contribute to the exploration of the problem space in an efficient way.
The framework does not specify the techniques.
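As an illustration only, the following Python sketch shows one way the four components might be wired together; all class and method names (PerformanceElement, LearningElement, Critic, ProblemGenerator) are assumptions, since the framework does not prescribe any particular technique.

```python
# A minimal sketch of the generalized fused framework described above.
# All class and method names are illustrative assumptions.

class PerformanceElement:
    """The actual controller: maps an observed state to an action."""
    def __init__(self):
        self.knowledge = {}          # e.g. rules, weights, cases ...

    def act(self, state):
        return self.knowledge.get(state, "default_action")

class Critic:
    """Converts external reinforcement into an internal performance signal."""
    def evaluate(self, external_reward):
        return external_reward       # identity mapping, for illustration

class LearningElement:
    """Updates the knowledge held by the performance element."""
    def update(self, pe, state, action, internal_feedback):
        if internal_feedback > 0:    # reinforce actions that worked
            pe.knowledge[state] = action

class ProblemGenerator:
    """Proposes new states so the problem space is explored efficiently."""
    def propose(self, known_states):
        return f"unexplored_{len(known_states)}"

# One interaction cycle (the environment's reward is faked for the example).
pe, le, critic, pg = PerformanceElement(), LearningElement(), Critic(), ProblemGenerator()
state = pg.propose(pe.knowledge)
action = pe.act(state)
reward = 1.0                          # external reinforcement from the environment
le.update(pe, state, action, critic.evaluate(reward))
```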
System Types for Hybridization
• Knowledge-based Systems and if-then
rules
• CBR Systems
• Evolutionary Intelligence and Genetic
algorithms
• Artificial Neural Networks and Learning
• Fuzzy Systems
• Particle swarm optimization (PSO) systems
Knowledge in Intelligent Systems
• In rule induction systems knowledge is represented
explicitly by if-then rules that are obtained from example
sets.
• In neural networks knowledge is captured in the synaptic weights of systems of neurons that capture categorizations in data sets.
• In evolutionary systems knowledge is captured in
evolving pools of selected genes and in heuristics for
selection of more adapted chromosomes.
• In case based systems knowledge is primarily stored in
the form of case histories that represent previously
developed problem-solution pairs.
• In PSO systems the knowledge is stored in the particle swarms.
Table 1 (adapted from [Abr, Jac] and [Neg]). A comparison of the utility of case-based reasoning systems (CBR), knowledge-based rule systems (KB), neural networks (NN), genetic algorithms (GA) and fuzzy logic systems (FL), with 1 representing low and 4 representing high utility.

                                        CBR   KB   NN   GA   FL
Knowledge representation                 3     4    1    2    4
Uncertainty                              1     1    4    4    4
Approximation (noisy, incomplete data)   1     1    4    4    4
Adaptability                             4     2    4    4    2
Learnability                             3     1    4    4    2
Interpretability                         3     4    1    2    4
Interpretability
• Synaptic weights in trained neural networks are not easy to interpret, which creates particular difficulties when explicit interpretations of the learned knowledge are required.
• Genetic algorithms model natural genetic adaptation to changing environments and thus are inherently adaptable and learn well.
• They are not easily interpretable, however, because although the knowledge resides partly in the selection mechanism, it is for the most part deeply embedded within a population of adapted genes.
Adaptability
• Case-based systems are adaptable because changing the case library may be sufficient to port a system to a related area. If changes need to be made to the similarity metric or the adaptation mechanism, or if the case structure needs to be changed, much more work may be required.
Learnability
• Fuzzy rule-based systems offer more options through which learnability may be more easily achieved.
• Fuzzy rules may be fine-tuned by adjusting the shapes of the fuzzy sets according to user feedback [Abi] (a sketch of this idea follows).
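A minimal sketch of this kind of tuning, assuming triangular membership functions and a simple +1/-1 feedback signal; the membership function and the tuning rule are illustrative assumptions, not taken from [Abi].

```python
# Illustrative sketch: fine-tuning a triangular fuzzy set from user feedback.
# The membership function and the tuning rule are assumptions for this example.

def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def tune_width(a, b, c, feedback, rate=0.1):
    """Widen the set if feedback says it fires too rarely (+1),
    narrow it if it fires too often (-1)."""
    half_width = (c - a) / 2.0
    half_width *= (1.0 + rate * feedback)
    return b - half_width, b, b + half_width

# Example: a fuzzy set "warm" centred at 22 degrees, widened after +1 feedback.
a, b, c = tune_width(18.0, 22.0, 26.0, feedback=+1)
print(a, b, c, triangular(21.0, a, b, c))
```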
Rules and cases
• Rule-based systems employ an easily comprehensible but rigid representation of expert knowledge; such systems may therefore afford better interpretation mechanisms.
• Similarly, recent research [SØR] shows that explanation techniques for large case bases are most promising, while case-based learning and maintenance can often be very efficient because of the transparency of typical case libraries.
Example: Neural Expert Systems
[Figure: Basic structure of a neural expert system. Components: training data, rule extraction, neural knowledge base, new data, IF-THEN rules, inference engine, explanation facilities, user interface, user.]
Can we combine advantages of ANNs
with other IS systems to create more
powerful and effective systems?
Neural expert systems



Expert systems rely on logical inferences and
decision trees and focus on modelling human
reasoning. Neural networks rely on parallel data
processing and focus on modelling a human brain.
Expert systems treat the brain as a black-box.
Neural networks look at its structure and functions,
particularly at its ability to learn.
Knowledge in a rule-based expert system is
represented by IF-THEN production rules.
Knowledge in neural networks is stored as
synaptic weights between neurons.


In expert systems, knowledge can be divided into
individual rules and the user can see and
understand the piece of knowledge applied by the
system.
In neural networks, one cannot select a single
synaptic weight as a discrete piece of knowledge.
Here knowledge is embedded in the entire
network; it cannot be broken into individual
pieces, and any change of a synaptic weight may
lead to unpredictable results. A neural network is,
in fact, a black-box for its user.
Can we combine advantages of expert systems
and neural networks to create a more powerful
and effective expert system?
A hybrid system that combines a neural network and
a rule-based expert system is called a neural expert
system (or a connectionist expert system).
The heart of a neural expert system is the
inference engine. It controls the information
flow in the system and initiates inference over the
neural knowledge base. A neural inference engine
also ensures approximate reasoning.
Approximate reasoning


In a rule-based expert system, the inference engine
compares the condition part of each rule with data
given in the database. When the IF part of the rule
matches the data in the database, the rule is fired and
its THEN part is executed. Precise matching is required: the inference engine cannot cope with noisy or incomplete data.
Neural expert systems use a trained neural network in
place of the knowledge base. The input data does not
have to precisely match the data that was used in
network training. This ability is called approximate
reasoning.
Rule extraction


Neurons in the network are connected by links,
each of which has a numerical weight attached to it.
The weights in a trained neural network determine
the strength or importance of the associated neuron
inputs.
[Figure: Trained neural network to identify flying objects.]
Is there any way that we could interpret the
values in the weights in a meaningful way?
Algorithm
By attaching a corresponding question to each input
neuron, we can enable the system to prompt the user
for initial values of the input variables:
Neuron: Wings
Question: Does the object have wings?
Neuron: Tail
Question: Does the object have a tail?
Neuron: Beak
Question: Does the object have a beak?
Neuron: Feathers
Question: Does the object have feathers?
Neuron: Engine
Question: Does the object have an engine?
Score +1 for yes, -1 for no and 0 for unknown.
Use a sign function as the activation and interpret an output of 1 as yes and 0 as no (a sketch of this scheme follows).
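As an illustration only, here is a minimal Python sketch of this prompting and scoring scheme. The Bird weights are the ones used in the worked confidence exercise later in this presentation; the function names, the prompting loop and the fixed example answers are assumptions.

```python
# Sketch of the prompting scheme described above: each input neuron gets a
# question, answers are scored +1 (yes), -1 (no) or 0 (unknown), and a
# step activation with threshold 0 interprets the output (1 = yes, 0 = no).

QUESTIONS = {
    "Wings":    "Does the object have wings?",
    "Tail":     "Does the object have a tail?",
    "Beak":     "Does the object have a beak?",
    "Feathers": "Does the object have feathers?",
    "Engine":   "Does the object have an engine?",
}

# Weights of the Bird output neuron, as used in the exercise below.
BIRD_WEIGHTS = {"Wings": -0.8, "Tail": -0.2, "Beak": 2.2,
                "Feathers": 2.8, "Engine": -1.1}

def ask_user():
    """Prompt for each input; map yes/no/unknown to +1/-1/0."""
    scores = {"yes": 1, "no": -1, "unknown": 0}
    return {name: scores[input(q + " (yes/no/unknown): ").strip().lower()]
            for name, q in QUESTIONS.items()}

def output(inputs, weights, threshold=0.0):
    """Weighted sum followed by a step activation: 1 = yes, 0 = no."""
    x = sum(inputs[name] * w for name, w in weights.items())
    return 1 if x > threshold else 0

# Usage example with fixed answers instead of interactive input:
answers = {"Wings": 1, "Tail": 0, "Beak": 1, "Feathers": 1, "Engine": -1}
print("Bird" if output(answers, BIRD_WEIGHTS) else "Not a bird")
```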
Exercise: Neuro-rule inference
If we set each input of the input layer to either +1 (true), −1 (false), or 0 (unknown), we can give a semantic interpretation for the activation of any output neuron.
For example, suppose the object has Wings (+1), a Beak (+1) and Feathers (+1), does not have an Engine (−1), and Tail is unknown (0).
What can we conclude about the object being a bird, a plane or a glider, applying a threshold of 0 and using the sign function as the activation function?
We can conclude that this object may be a Bird:
X_Bird = (+1) × (−0.8) + 0 × (−0.2) + (+1) × 2.2 + (+1) × 2.8 + (−1) × (−1.1) = 5.3
5.3 > 0, so the Bird output neuron gives 1 (yes).
How can we extract rules from this Neural Network?
Rule Extraction from a Neural Network
Algorithm for Extracting Confidence
Heuristic: Known greater than unknown
An inference can be made if the known net
weighted input to a neuron is greater than the
sum of the absolute values of the weights of
the unknown inputs.
\sum_{i=1}^{n} x_i w_i > \sum_{j=1}^{n} |w_j|

where i ∈ known, j ∉ known, and n is the number of neuron inputs.
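A minimal Python sketch of this confidence heuristic, using the Bird weights from the class exercise below; the function name and the input format (a dictionary of known answers) are assumptions.

```python
# Sketch of the heuristic above: an inference can be made once the known
# weighted input exceeds the sum of absolute weights of the unknown inputs.

BIRD_WEIGHTS = {"Wings": -0.8, "Tail": -0.2, "Beak": 2.2,
                "Feathers": 2.8, "Engine": -1.1}

def can_infer(known_inputs, weights):
    """Return (decision_possible, known_sum, unknown_sum)."""
    known_sum = sum(known_inputs[name] * w
                    for name, w in weights.items() if name in known_inputs)
    unknown_sum = sum(abs(w)
                      for name, w in weights.items() if name not in known_inputs)
    return known_sum > unknown_sum, known_sum, unknown_sum

# Feathers alone: 2.8 < 4.3, not enough; Feathers + Beak: 5.0 > 2.1, infer Bird.
print(can_infer({"Feathers": 1}, BIRD_WEIGHTS))
print(can_infer({"Feathers": 1, "Beak": 1}, BIRD_WEIGHTS))
```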
Class Exercise: Confidence in Neural Rules
In the neural rules below suppose that you find an increasing amount
of information about an object:
1. It has feathers.
2. It has feathers and a beak.
3. It has feathers, a beak and wings.
At what point, according to the above algorithm, can the inference be
made that the object is a bird? How much difference does the
knowledge about wings make?
Enter initial value for the input Feathers: +1
KNOWN = 1 × 2.8 = 2.8
UNKNOWN = 0.8 + 0.2 + 2.2 + 1.1 = 4.3
KNOWN < UNKNOWN, so no inference can be made yet.
Enter initial value for the input Beak: +1
KNOWN = 1 × 2.8 + 1 × 2.2 = 5.0
UNKNOWN = 0.8 + 0.2 + 1.1 = 2.1
KNOWN > UNKNOWN
CONCLUDE: Bird is TRUE
A set of rules can be mapped into a multi-layer neural network architecture (a sketch follows this list):
1. The weights between the layers represent rule certainties.
2. After establishing the initial structure of the ANN, a training algorithm may be applied.
3. After training, the weights may be used to refine the initial set of rules.
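A minimal sketch of this mapping, assuming two toy IF-THEN rules whose certainty factors become the weights of a single-layer network; the rule set, network shape and helper names are illustrative assumptions rather than any standard rule-to-network algorithm.

```python
import numpy as np

# Sketch: mapping IF-THEN rules with certainty factors onto network weights,
# then reading refined certainties back out after a training update.

INPUTS  = ["feathers", "engine"]
OUTPUTS = ["bird", "plane"]

# IF feathers THEN bird (certainty 0.9); IF engine THEN plane (certainty 0.8)
RULES = [("feathers", "bird", 0.9), ("engine", "plane", 0.8)]

def rules_to_weights(rules):
    """Initial weight matrix: rule certainties become connection weights."""
    w = np.zeros((len(OUTPUTS), len(INPUTS)))
    for antecedent, consequent, certainty in rules:
        w[OUTPUTS.index(consequent), INPUTS.index(antecedent)] = certainty
    return w

def weights_to_rules(w, threshold=0.3):
    """After training, read refined certainties back out of the weights."""
    return [(INPUTS[j], OUTPUTS[i], round(float(w[i, j]), 2))
            for i in range(w.shape[0]) for j in range(w.shape[1])
            if abs(w[i, j]) > threshold]

w = rules_to_weights(RULES)
w += np.random.uniform(-0.05, 0.05, w.shape)   # stand-in for a training update
print(weights_to_rules(w))
```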
[Figure (repeated from earlier): Basic structure of a neural expert system.]
Evolutionary neural networks


Although neural networks are used for solving a
variety of problems, they still have some
limitations.
One of the most common is associated with neural
network training. The back-propagation learning
algorithm cannot guarantee an optimal solution.
In real-world applications, the back-propagation
algorithm might converge to a set of sub-optimal
weights from which it cannot escape. As a result,
the neural network is often unable to find a desirable solution to the problem at hand.


Another difficulty is related to selecting an
optimal topology for the neural network. The
“right” network architecture for a particular
problem is often chosen by means of heuristics,
and designing a neural network topology is still
more art than engineering.
Genetic algorithms are an effective optimisation
technique that can guide both weight optimisation
and topology selection.
Encoding a set of weights in a chromosome
The first step is to encode the weights of the network into a chromosome, for example by stringing the weights together into a list of real-valued genes.
The second step is to define a fitness function for
evaluating the chromosome’s performance. This
function must estimate the performance of a
given neural network. We can apply here a
simple function defined by the sum of squared
errors.
The training set of examples is presented to the
network, and the sum of squared errors is
calculated. The smaller the sum, the fitter the
chromosome. The genetic algorithm attempts
to find a set of weights that minimises the sum
of squared errors.
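A minimal sketch of such a fitness function, assuming a small fixed 2-3-1 network whose weights are decoded from a flat chromosome; the network shape, the sigmoid activation and all names are illustrative assumptions.

```python
import numpy as np

# Sketch: decode a flat chromosome into weights for a small fixed network and
# score it by the (negated) sum of squared errors on a training set.

N_IN, N_HID, N_OUT = 2, 3, 1

def decode(chromosome):
    """Split a flat array of genes into the two weight matrices."""
    w1 = chromosome[: N_IN * N_HID].reshape(N_HID, N_IN)
    w2 = chromosome[N_IN * N_HID:].reshape(N_OUT, N_HID)
    return w1, w2

def forward(x, w1, w2):
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return sigmoid(w2 @ sigmoid(w1 @ x))

def fitness(chromosome, examples):
    """The smaller the sum of squared errors, the fitter the chromosome."""
    w1, w2 = decode(chromosome)
    sse = sum((forward(x, w1, w2)[0] - y) ** 2 for x, y in examples)
    return -sse          # the GA maximises fitness, so negate the error

# XOR-style toy training set as a usage example.
examples = [(np.array([0., 0.]), 0.), (np.array([0., 1.]), 1.),
            (np.array([1., 0.]), 1.), (np.array([1., 1.]), 0.)]
chromosome = np.random.uniform(-1, 1, N_IN * N_HID + N_HID * N_OUT)
print(fitness(chromosome, examples))
```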


The third step is to choose the genetic operators –
crossover and mutation. A crossover operator
takes two parent chromosomes and creates a
single child with genetic material from both
parents. Each gene in the child’s chromosome is
represented by the corresponding gene of the
randomly selected parent.
A mutation operator selects a gene in a chromosome and adds a small random value between −1 and 1 to each weight in this gene.
Crossover in weight optimisation
Mutation in weight optimisation
Can genetic algorithms help us in selecting
the network architecture?
The architecture of the network (i.e. the number of
neurons and their interconnections) often determines
the success or failure of the application. Usually the
network architecture is decided by trial and error;
there is a great need for a method of automatically
designing the architecture for a particular application.
Genetic algorithms may well be suited for this task.


The basic idea behind evolving a suitable network
architecture is to conduct a genetic search in a
population of possible architectures.
We must first choose a method of encoding a
network’s architecture into a chromosome.
Encoding the network architecture



The connection topology of a neural network can
be represented by a square connectivity matrix.
Each entry in the matrix defines the type of
connection from one neuron (column) to another
(row), where 0 means no connection and 1
denotes connection for which the weight can be
changed through learning.
To transform the connectivity matrix into a
chromosome, we need only to string the rows of
the matrix together.
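A minimal sketch of this encoding, assuming a small four-neuron feedforward network; the example matrix and helper names are illustrative.

```python
import numpy as np

# Sketch: encoding a network topology as a square connectivity matrix and
# flattening its rows into a chromosome (and back).

def matrix_to_chromosome(connectivity):
    """String the rows of the connectivity matrix together."""
    return connectivity.flatten().tolist()

def chromosome_to_matrix(chromosome):
    """Rebuild the square matrix from the flat chromosome."""
    n = int(len(chromosome) ** 0.5)
    return np.array(chromosome, dtype=int).reshape(n, n)

# Entry [row, col] = 1 means a trainable connection from neuron `col` to `row`.
connectivity = np.array([[0, 0, 0, 0],
                         [1, 0, 0, 0],
                         [1, 1, 0, 0],
                         [0, 1, 1, 0]])
chromosome = matrix_to_chromosome(connectivity)
print(chromosome)
print(np.array_equal(chromosome_to_matrix(chromosome), connectivity))
```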
Encoding of the network topology
The cycle of evolving a neural network topology