ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks
PART A
Introduction
An overview of the biological neuron
The synthetic neuron
Structure and operation of an ANN
Learning in ANNs
- Supervised and unsupervised learning
ANN models
Applications
PART B
Developing neural network applications
- Design of the network
- Training issues
A comparison of ANN and ES
Hybrid ANN systems
Case Studies
Introduction
Artificial Neural Networks (ANN) are inspired by the biological brain, which consists of billions of
interconnected neurons working in parallel. The typical ANN is a computer program that enables us to
simulate a similar network of simple processing elements, albeit at a greatly reduced scale. The
significant feature of these networks is that they can adapt themselves to learn problem solutions. They
have proved particularly effective for problems that are hard to solve using conventional computing
methods.
Since their inception in the 1950s and 1960s, ANNs have had a history marked by great initial
enthusiasm followed by a relatively long period of neglect during the 1970s and early 1980s. With a
major breakthrough in neural network training methodology (the generalised delta rule) in the 1980s,
interest in ANNs surged again. During the last 15 years or so, numerous ANN-based systems have been
developed experimentally, and commercial products based on the neural network principle have
appeared in the market. A few other terms used for ANNs are neurocomputing, neural computing,
neural networks, parallel distributed processing and connectionism.
ANNs and expert systems are similar to the extent that both provide a tool for problem solving when
no algorithm is available. While expert systems rely on the solution being expressed as a set of
heuristics by an expert, ANNs learn solely from data.
An overview of the biological neuron
A neuron is a type of brain cell, which communicates with other neurons using neurotransmitters that
carry signals. Electrical impulses produced by a neuron travel out of it along a transmission channel
known as the axon and activate the synaptic junctions. The synaptic junctions act like gates between
the axons and dendrites, which form the network of interconnections between neurons. The impulses
cross the synapse and travel along the dendrites to other neurons.
A neuron adds its inputs and "fires" (produces an output) when the sum of its inputs exceeds a certain
threshold. The strengths of a neuron’s inputs are modified by the synaptic junctions. Because more than
one neuron is connected at the synaptic junctions, a chain reaction occurs where a large number of
neurons may be triggered. The resulting brain activity is responsible for our thought processes and
reactions to external stimuli.
There are an estimated 1000 billion neurons in the human brain, with each neuron connected to up to 10
thousand other neurons. The neurons are structured in layers. Each layer consists of a network of
neurons. The layers are also interconnected.
As a simplified model, the brain may be viewed as a massively parallel system of highly interconnected
but relatively simple processing elements. Learning in our brains occurs through a continuous process
of new interconnections forming between neurons, and adjustments at the synaptic junctions to
facilitate or inhibit the passage of signals.
The synthetic neuron
The synthetic or artificial neuron, which is a simple model of the biological neuron, was first proposed
in 1943 by McCulloch and Pitts. It consists of a summing function with an internal threshold, and
"weighted" inputs as shown below.
For a neuron receiving n inputs, each input x_i (i ranging from 1 to n) is weighted by multiplying it
by a weight w_i. The sum of the products, $\sum_{i=1}^{n} w_i x_i$, gives the net activation of the
neuron. This activation value is passed through a transfer function to produce the neuron's output.
The weight value of the connection or link carrying signals from a neuron i to a neuron j is termed w_ij.

Figure: A link carrying weight w_ij, with the signal flowing from neuron i to neuron j.
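As a minimal sketch of this computation (in Python, with illustrative input, weight and threshold values), a single neuron can be written as:

# A minimal sketch of an artificial neuron: a weighted sum of inputs
# passed through a simple step transfer function (values are illustrative).

def neuron_output(inputs, weights, threshold=0.0):
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Example: three inputs with hand-picked weights and a threshold of 1.0
print(neuron_output([1, 0, 1], [0.5, -0.3, 0.8], threshold=1.0))   # prints 1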
Transfer functions
One of the design issues for ANNs is the type of transfer function used to compute the output of a node
from its net activation. Among the popular transfer functions are:
 Step function
 Signum function
 Sigmoid function
 Hyperbolic tangent function
In the step function, the neuron produces an output only when its net activation reaches a minimum
value – known as the threshold. For a binary neuron i, whose output is a 0 or 1 value, the step function
can be summarised as:
$$\mathrm{output}_i = \begin{cases} 0 & \text{if } \mathrm{activation}_i < T \\ 1 & \text{if } \mathrm{activation}_i \ge T \end{cases}$$
When the threshold T is 0, the step function is called signum.
Another type of signum function is described by:

$$\mathrm{output}_i = \begin{cases} -1 & \text{if } \mathrm{activation}_i < 0 \\ 0 & \text{if } \mathrm{activation}_i = 0 \\ 1 & \text{if } \mathrm{activation}_i > 0 \end{cases}$$
The sigmoid transfer function produces a continuous value in the range 0 to 1. It has the form:
$$\mathrm{output}_i = \frac{1}{1 + e^{-\,\mathit{gain} \cdot \mathrm{activation}_i}}$$
The parameter gain is determined by the system designer. It affects the slope of the transfer function
around zero. The multilayer perceptron uses the sigmoid as its transfer function.
A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the form:
$$\mathrm{output}_i = \frac{e^{\mathrm{activation}_i} - e^{-\mathrm{activation}_i}}{e^{\mathrm{activation}_i} + e^{-\mathrm{activation}_i}}$$
This function has a shape similar to the sigmoid (shaped like an S), with the difference that the value
of outputi ranges between –1 and 1.
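The four transfer functions can be transcribed directly into Python as a sketch (the gain parameter defaults to 1 and the threshold T to 0):

import math

def step(activation, threshold=0.0):
    # Fires only when the activation reaches the threshold T
    return 1 if activation >= threshold else 0

def signum(activation):
    # Three-valued signum: -1, 0 or 1 depending on the sign of the activation
    return (activation > 0) - (activation < 0)

def sigmoid(activation, gain=1.0):
    # Continuous output in the range 0 to 1; gain controls the slope near zero
    return 1.0 / (1.0 + math.exp(-gain * activation))

def tanh(activation):
    # S-shaped like the sigmoid, but the output ranges between -1 and 1
    e_pos, e_neg = math.exp(activation), math.exp(-activation)
    return (e_pos - e_neg) / (e_pos + e_neg)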
Figure: Functional form of the transfer functions (step, signum, sigmoid and hyperbolic tangent), each plotted as output_i against activation_i.
Structure and operation of an ANN
The building block of an ANN is the artificial neuron as described above with its weighted inputs,
summing and transfer function. The most common architecture of an ANN consists of two or more
layers of artificial neurons or nodes, with each node in a layer connected to every node in the following
layer.
Signal usually flows from the input layer, which is directly subjected to an input pattern, across one or
more hidden layers towards the output layer. The most popular ANN architecture, known as the
multilayer perceptron (shown in the figure below), follows this model.
Figure: Structure of a typical ANN
In some models of the ANN, such as the self-organising map (SOM) or Kohonen net, nodes in the
same layer may have interconnections among them.
The input stimulus of an ANN is a set of data values grouped together to form a pattern vector. As a rather
simplistic example, consider an ANN used to forecast interest rate movements. The input pattern vector
may consist of the four components – inflation rate, unemployment rate, economic growth rate, and the
proximity of federal election. The input layer would consist of four nodes, each of which would receive
a value representing one of these attributes. The output layer could have just one node whose output
level would indicate whether or not interest rate is likely to rise - say, high for a rise and low otherwise.
We can expect the ANN to produce an accurate prediction only if the functional relationships between
the relevant variables, namely the four components of the input pattern, and the corresponding output –
the interest rate movement – have been “learned” by the ANN.
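As a sketch of this encoding (the numbers and scaling below are purely illustrative):

# Illustrative input pattern vector for the interest rate example:
# [inflation rate, unemployment rate, economic growth rate,
#  proximity of federal election], here scaled to the range 0..1.
pattern = [0.31, 0.58, 0.24, 0.90]

# A trained network would map this vector to its single output node,
# e.g. a value near 1 for "rates likely to rise" and near 0 otherwise.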
Learning by ANNs is modelled after human learning behaviour, i.e., repeatedly going through the same
materials, making mistakes and adapting until able to carry out a given task successfully.
Learning in artificial neural networks
The connection weights are the single most important factor in the input-output behaviour of a neuron.
The values of these weights determine the outcome of the network. In most neural networks, the
system itself computes the weights among its neurons. The process by which an ANN arrives at the
values of these weights is known as learning or training.
Learning in ANNs takes place through an iterative training process during which node interconnection
weight values are adjusted. Initial weights, usually small random values, are assigned to the
interconnections between the ANN nodes.
Supervised and unsupervised learning
In supervised training, data sets comprising input patterns and corresponding output values are used.
The weight adjustments during each iteration aim to reduce the “error” or difference between the
ANN’s actual output and the expected correct output. For example, a node producing a small output
when it is expected to produce a large positive one has its positive weight values increased and its
negative weight values decreased. This has the effect of increasing the sum of its inputs (its activation
level), resulting in a higher output level.
Pairs of sample input values and corresponding output value(s) are used to train the net repeatedly until
the output becomes satisfactorily accurate as indicated by a low enough error value. The matrix of
weight values in a trained ANN represents a model of the solution, or the “knowledge” required for the
solution.
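As a concrete illustration of this train-until-accurate cycle, here is a sketch of supervised learning for a single binary neuron, using the classic perceptron update rule (a simpler relative of the multilayer methods described later); the AND function serves as a toy training set:

# Supervised training sketch: a single binary neuron learns the AND function.
training_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for epoch in range(20):
    for inputs, target in training_data:
        activation = sum(x * w for x, w in zip(inputs, weights)) + bias
        output = 1 if activation >= 0 else 0
        error = target - output                     # expected minus actual
        # Nudge each weight in the direction that reduces the error
        weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        bias += learning_rate * error

print(weights, bias)   # the trained neuron now reproduces AND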
In unsupervised training, there is no known expected output to be used for guiding the weight
adjustments. Instead, the net adapts itself to align its weight values with training patterns. This results
in groups of nodes responding strongly to specific groups of input patterns having some common
attributes. These nodes are said to form clusters of input patterns, which can be studied to find
interesting patterns in the input data.
Most ANNs employ supervised learning because this type is of more immediate use for problem
solving.
A neural network can be in one of two states: training mode and operation mode. Most ANNs do not
change their weights once training is finished and they are in operation.
Learning in ANNs can also be categorised as:
- Off-line
- On-line
Training ANNs for real-life applications can be time-consuming, but once trained, the resulting
network can be run highly efficiently – providing fast responses. These networks are of the off-line
learning type.
In on-line learning, the ANN continues to learn while in operation. Whenever it encounters an input
which is different from any that it has learned already, it adapts its weights to incorporate this input as
new knowledge.
ANN models
Although ANNs are supposed to model the structure and operation of the biological brain, there are
different types of neural networks depending on the architecture, learning strategy and operation. Three
of the most well known models are outlined below.
The Multilayer Perceptron (MLP)
In a multilayer perceptron, the neurons are arranged into an input layer, an output layer and one or
more hidden layers. It is the most popular ANN architecture and has served as the platform for the
rapid expansion of the application of ANNs. The MLP is also known as the backpropagation (or simply
backprop) network (BPN), because error values from the output layer are used by the layers before it
to calculate weight adjustments during training.
Figure: The general multilayer perceptron architecture, consisting of an input, a hidden and an output
layer of neurons.
The learning rule for the multilayer perceptron is known as "the generalised delta rule" or the
"backpropagation rule". The generalised delta rule repetitively calculates an error value for each input,
which is a function of the squared difference between the expected correct output and the actual output.
The calculated error is backpropagated from one layer to the previous one for adjusting the weights
connecting layers.
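A minimal sketch of one such training step, for a network with a single hidden layer, might look as follows (using NumPy; the function and variable names are illustrative, and bias terms are omitted for brevity):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W_hidden, W_out, learning_rate=0.5):
    # Forward pass through the hidden and output layers
    hidden = sigmoid(W_hidden @ x)
    output = sigmoid(W_out @ hidden)

    # Output-layer error: difference scaled by the sigmoid derivative
    delta_out = (target - output) * output * (1.0 - output)

    # The error backpropagated from the output layer to the hidden layer
    delta_hidden = (W_out.T @ delta_out) * hidden * (1.0 - hidden)

    # Weight adjustments proportional to each layer's input signal
    W_out += learning_rate * np.outer(delta_out, hidden)
    W_hidden += learning_rate * np.outer(delta_hidden, x)
    return output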
The error landscape in a multilayer perceptron
The behaviour of a neural network as it attempts to arrive at a solution can be visualised in terms of the
error function Ep, which is a function of the input and the weights. For a given pattern p, the error Ep
can be plotted against the weights to give the so-called error surface. The error surface is a landscape
of hills and valleys, with points of minimum error corresponding to wells and maximum error found on
peaks.
The generalised delta rule aims to minimise Ep by adjusting weights so that they correspond to points
of lowest error. It does this by the method of gradient descent where the changes are made in the
steepest downward direction. All possible solutions are depressions in the error surface, known as
basins of attraction.
During training using error backpropagation, the modified interconnection weights of the multilayer
perceptron form internal representations that enable it to generate the desired outputs when given the
training inputs. This internal representation is also applicable to inputs that were not used during
training. The neural net is thus able to classify these previously unseen inputs according to the features
they share with the training examples. This feature of ANNs endows them with the capability to
generalise - something humans perform quite often. We haven't seen all the men in the world but we
are most likely to recognise one as a man instantly (unless heavily disguised).
The representation of the problem solution in the MLP may be viewed as the formation of a solution
surface spanning the solution points the network has been trained with (Dhar & Stein). Unknown
solution points found by the net during operation may be visualised as points interpolated from the
learned solution points.
Learning difficulties in multilayer perceptrons
Occasionally, the multilayer perceptron fails to settle into the global minimum of the error surface
and instead finds itself in one of the local minima. This is due to the gradient descent strategy followed.
A number of alternative approaches can be taken to reduce this possibility:
 Lowering the gain term progressively. This term is used to influence the rate at which weight
changes are made in each iteration of the training. Its value is 1 by default, but it may be
gradually reduced to slow the rate of change as training progresses.
 Addition of more nodes for better representation of patterns. Too few nodes (and consequently
weights) can cause failure of the ANN to learn a pattern.
 Introduction of a momentum term, which determines the effect of past weight changes on the
current direction of movement in weight space. Like the gain term, the momentum term
is a small numerical value in the range 0 … 1 (see the sketch after this list).
 Addition of random noise to perturb the ANN out of local minima.
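To make the momentum idea concrete, here is a sketch of a single weight update with momentum (the names and default values are illustrative):

def update_weight(weight, gradient, previous_change,
                  learning_rate=0.1, momentum=0.9):
    # The previous change adds inertia, helping the search roll through
    # shallow local minima instead of stopping in them.
    change = -learning_rate * gradient + momentum * previous_change
    return weight + change, change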
The Kohonen network (the self-organising map)
The type of learning used in multilayer perceptrons is supervised learning which requires the correct
response to be provided during training. Biological systems display this type of learning, but they are
also capable of learning by themselves - without a supervisor showing the correct response. Think of a
child learning from a teacher, but also of a baby learning to recognise its mother without any apparent
guidance.
A neural network with a similar capability is said to be self-organising. The Kohonen net belongs to
this group of ANNs. During training, the network changes its weights to learn appropriate associations,
without any right answers being provided.
The main difference in the layout of self-organising networks is the presence of lateral connections.
These connections are similar to the connectivity found in the cerebral cortex. The Kohonen net
consists of an input layer, which distributes the inputs to each node in a second layer, the so-called
competitive layer. Each of the nodes on this layer acts as an output node.
Figure: The architecture of the Kohonen net.
Each neuron in the competitive layer is connected to other neurons in its neighbourhood and feedback
is restricted to neighbours through these lateral connections. Neurons in the competitive layer have
excitatory (positively weighted) connections to immediate neighbours and inhibitory (negatively
weighted) connections to more distant neurons.
As an input pattern is presented, some of the neurons in the competitive layer are sufficiently activated
to produce outputs, which are fed back to other neurons in their neighbourhoods.
The node with the set of input weights closest to the input pattern component values produces the
largest output. This node is termed the winning node. During training, input weights of the winning
node and its neighbours are adjusted to make them resemble the input pattern even more closely.
At the completion of training, the winning node ends up with its input weight values aligned with the
input pattern and produces the strongest output whenever that particular pattern is presented. The nodes
in the winning node's neighbourhood also have their weights modified to settle down to an average
representation of that pattern class. As a result, the net is able to represent clusters of similar input
patterns - a feature found useful for data mining applications.
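A sketch of one Kohonen training step, for a one-dimensional competitive layer, could look like this (illustrative only; practical SOMs typically use a two-dimensional grid with a neighbourhood radius and learning rate that shrink over time):

import numpy as np

def som_step(pattern, weights, learning_rate=0.2, radius=1):
    # weights[j] holds the input weight vector of competitive node j
    distances = np.linalg.norm(weights - pattern, axis=1)
    winner = int(np.argmin(distances))          # the winning node
    for j in range(len(weights)):
        if abs(j - winner) <= radius:           # the winner and its neighbours
            # Move the node's weights closer to the input pattern
            weights[j] += learning_rate * (pattern - weights[j])
    return winner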
The Hopfield Model
The Hopfield net is the most widely known of all the autoassociative ANNs. In autoassociation, a noisy
or partially incomplete input pattern causes the network to stabilise to a state corresponding to the
original pattern. It is also useful for optimisation tasks.
Figure: Architecture of the Hopfield network
The Hopfield net is a recurrent ANN in which the output produced by each neuron is fed back as input
to all other neurons. Each neuron computes a weighted sum of its inputs and applies a step transfer function.
The Hopfield net has no learning algorithm as such. Patterns (or facts) are simply stored by setting
weights to make the network converge to states corresponding to the patterns to be learned. During
operation, an input pattern is applied to all neurons simultaneously and the network is left to stabilise.
Outputs from the neurons in the stable state form the output of the network. When presented with an
input pattern, the net outputs a stored pattern nearest to the presented pattern.
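A sketch of this store-and-recall behaviour, using bipolar (+1/-1) patterns and Hebbian weight setting, might look as follows (a simplified synchronous version; real Hopfield nets usually update one neuron at a time to guarantee convergence):

import numpy as np

def store(patterns):
    # Hebbian weight setting: each stored pattern reinforces the weights
    n = len(patterns[0])
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)                      # no self-connections
    return W

def recall(W, state, steps=10):
    # Repeatedly apply the weighted sum and step transfer until stable
    state = np.array(state, dtype=float)
    for _ in range(steps):
        state = np.where(W @ state >= 0, 1.0, -1.0)
    return state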
How ANNs are applied
Solutions to many real-life problems are very difficult, if not impossible, to define algorithmically due
mainly to the unstructured nature of the problem. There may be just too many variables and/or the
interactions of relevant variables may not be understood adequately. In many situations, input data may
be partially corrupt or missing, preventing a fixed, logical sequence of solution steps – an algorithm –
from functioning effectively.
Instead of implementing an algorithm, the typical ANN attempts to arrive at an answer by learning to
identify the right answer through an iterative process of self-adaptation or training. This training is a
simulation of human behaviour in learning.
For example, consider a medical application where some measurement data related to symptoms,
treatments and history are given for each person in a sample, together with a value indicating whether a
certain disease was found present or not (after costly tests). Suppose we want to find a simple
relationship between the given data and the presence of the disease, which could lead to an inexpensive
method for screening for the disease, or to an understanding of which factors are important for disease
prevention.
If there are many factors, with complex interactions among them, the usual "linear" statistical
techniques may be inappropriate. In that case, one way of analysing data is by the use of neural
networks, which can, in principle, find simple relationships by means of an adaptive learning procedure
in very general circumstances.
Current applications of ANNs
Beyond applications in data analysis, such as the above, ANNs are being used in an increasing number
of applications where high-speed computation of functions is important. For example, correcting an
industrial robot's positioning to take into account the mass of the object it is holding would normally
require time-consuming numerical calculations, which are impossible to carry out in real time while
the robot is in use. An ANN can learn the necessary corrections based on trial motions, and can
when the robot is in use. An ANN can learn the necessary corrections based on trial motions, and can
perform the corrections in real time. It is not necessary to use all possible masses and motions during
training, since ANNs have the capacity to interpolate smoothly to cases not presented in training.
Due to their ability to recognise complex patterns, ANNs have been widely applied in character,
handwritten text and signature recognition, as well as more complex images such as faces. They have
also been used successfully for speech recognition and synthesis.
One of the most successful applications of ANNs has been as a decision support tool in the area of
finance and banking. Some examples of commercial applications of ANN are:
 Financial market analysis for investment decision making
 Sales support - targeting customers for telemarketing
 Bankruptcy prediction
 Intelligent flexible manufacturing systems
 Stock market prediction
 Resource allocation – scheduling and management of personnel and equipment
The results of a recent survey (Quaddus & Khan) of the application of ANNs in business are summarised
in the table shown below. The survey focussed on publications reporting applications in a number of
different categories. Note that the numbers for 1998 are incomplete. It is interesting to note the apparent
drop in recent years in research publications on ANN commercial applications. This could be an
indication of the ANN methodology becoming more established in the marketplace, with a shift from
research to actual development activities, which do not get as much publicity.
Table 1: Distribution of the Articles by Areas and Year

AREA                     1988   89   90   91    92    93    94    95    96    97   98  Total  % of Total
Accounting/Auditing         1    0    1    1     6     3     3     7     7     5    0     34        4.97
Finance                     0    0    4   11    19    28    27    18     5     9    2    123       17.98
Human resources             0    0    0    1     0     1     1     0     0     0    0      3        0.44
Information systems         4    6    9    7    15    24    21    18    13    18    3    138       20.18
Marketing/Distribution      2    2    2    3     8    10    12    17    29    14    0     99       14.47
Production                  2    6    8   21    31    38    24    50    29    31    1    241       35.23
Others                      0    0    1    7     3     8     7     8     7     5    0     46        6.73
Yearly Total                9   14   25   51    82   112    95   118    90    82    6    684      100.00
% of Total               1.32 2.05 3.65 7.46 11.99 16.37 13.89 17.25 13.16 11.99 0.88 100.00
Currently, neural network systems are available either as software simulations on conventional
computers or, less commonly, as hardware that models the parallelism of neurons.
Although it is considered unlikely that ANN-based systems will replace conventional computing systems,
they are becoming established as an alternative to the algorithmic approach to information processing.
Some advantages of ANNs
Generalisation
Neural networks are capable of generalisation, that is, they can classify an unknown pattern if it shares
distinguishing features with patterns learned during training. This means noisy or incomplete inputs
can still be classified, because of their similarity to clean and complete inputs.
Fault Tolerance
Neural networks are highly fault tolerant. This characteristic is also known as "graceful degradation".
Because of its distributed nature, a neural network keeps on working even when a significant fraction
of its neurons and interconnections fail. Also, relearning after damage can be relatively quick.
Efficiency
The inherent parallelism in the operation of ANNs also makes them fast and efficient for handling large
amounts of data. For example, once trained, the execution time of a multilayer perceptron is simply the
time it takes to calculate the output values of successive layers of neurons for the hidden layer(s) and
the output layer.
REFERENCES
AI Expert (special issue on ANN), June 1990.
BYTE (special issue on ANN), Aug. 1989.
Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
Dhar, V., & Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence,
Prentice Hall, 1997.
URL on ANN - http://ai.about.com/library/weekly/aa031401a.htm
Kirrmann, H., "Neural Computing: The new gold rush in informatics", IEEE Micro, June 1989, pp. 7-9.
Lippman, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987,
pp. 4-21.
Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
Quaddus, M. A., and Khan, M. S., "Evolution of Artificial Neural Networks in Business Applications:
An Empirical Investigation Using a Growth Model", International Journal of Management & Decision
Making, Vol. 3, No. 1, March 2002, pp. 19-34.
Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.
Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York, 1989.
Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, Belmont, California, 1993.
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html
There are also a number of other titles on artificial neural networks in the Murdoch University library.