ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks

PART A
- Introduction
- An overview of the biological neuron
- The synthetic neuron
- Structure and operation of an ANN
- Learning in ANNs - supervised and unsupervised learning
- ANN models
- Applications

PART B
- Developing neural network applications - design of the network, training issues
- A comparison of ANN and ES
- Hybrid ANN systems
- Case studies

Introduction

Artificial Neural Networks (ANNs) are inspired by the biological brain, which consists of billions of interconnected neurons working in parallel. The typical ANN is a computer program that simulates a similar network of simple processing elements, albeit at a greatly reduced scale. The significant feature of these networks is that they can adapt themselves to learn problem solutions. They have proved particularly effective for problems that are hard to solve using conventional computing methods.

Since its inception in the 1950s and '60s, the history of ANNs has been marked by great initial enthusiasm followed by a relatively long period lacking serious attention during the 1970s and early 1980s. With a major breakthrough in neural network training methodology (the generalised delta rule) in the 1980s, an upsurge of interest in ANNs took place. Over the last 15 years or so, numerous ANN-based systems have been developed experimentally, and commercial products based on the neural network principle have appeared in the market. Other terms used for ANNs include neurocomputing, neural computing, neural networks, parallel distributed processing and connectionism.

ANNs and expert systems are similar to the extent that both provide a tool for problem solving when no algorithm is available. But while an ES relies on the solution being expressed as a set of heuristics by an expert, an ANN learns solely from data.

An overview of the biological neuron

A neuron is a type of brain cell which communicates with other neurons using neurotransmitters that carry signals. Electrical impulses produced by a neuron travel out of it along a transmission channel known as the axon and activate the synaptic junctions. The synaptic junctions act like gates between the axons and the dendrites, which form the network of interconnections between neurons. The impulses cross the synapse and travel along the dendrites to other neurons. A neuron adds its inputs and "fires" (produces an output) when the sum of its inputs exceeds a certain threshold. The strengths of a neuron's inputs are modified by the synaptic junctions. Because more than one neuron is connected at the synaptic junctions, a chain reaction occurs in which a large number of neurons may be triggered. The resulting brain activity is responsible for our thought processes and reactions to external stimuli.

There are an estimated 1,000 billion neurons in the human brain, with each neuron connected to up to 10 thousand other neurons. The neurons are structured in layers, each layer consisting of a network of neurons, and the layers themselves are interconnected. As a simplified model, the brain may be viewed as a massively parallel system of highly interconnected but relatively simple processing elements. Learning in our brains occurs through a continuous process of new interconnections forming between neurons, and of adjustments at the synaptic junctions to facilitate or inhibit the passage of signals.
The synthetic neuron

The synthetic or artificial neuron, a simple model of the biological neuron, was first proposed in 1943 by McCulloch and Pitts. It consists of a summing function with an internal threshold, and "weighted" inputs. For a neuron receiving n inputs, each input x_i (i ranging from 1 to n) is weighted by multiplying it by a weight w_i. The sum of the w_i x_i products gives the net activation of the neuron. This activation value is passed through a transfer function to produce the neuron's output. The weight value of the connection or link carrying signals from a neuron i to a neuron j is termed w_ij.

Figure: Signal flow from neuron i to neuron j along a link with weight w_ij.

Transfer functions

One of the design issues for ANNs is the type of transfer function used to compute the output of a node from its net activation. Among the popular transfer functions are:

- Step function
- Signum function
- Sigmoid function
- Hyperbolic tangent function

In the step function, the neuron produces an output only when its net activation reaches a minimum value, known as the threshold T. For a binary neuron i, whose output is a 0 or 1 value, the step function can be summarised as:

$$ output_i = \begin{cases} 0 & \text{if } activation_i < T \\ 1 & \text{if } activation_i \ge T \end{cases} $$

When the threshold T is 0, the step function is called signum. Another type of signum function is described by:

$$ output_i = \begin{cases} 1 & \text{if } activation_i > 0 \\ 0 & \text{if } activation_i = 0 \\ -1 & \text{if } activation_i < 0 \end{cases} $$

The sigmoid transfer function produces a continuous value in the range 0 to 1. It has the form:

$$ output_i = \frac{1}{1 + e^{-gain \cdot activation_i}} $$

The parameter gain is set by the system designer. It determines the slope of the transfer function around zero activation. The multilayer perceptron uses the sigmoid as its transfer function.

A variant of the sigmoid transfer function is the hyperbolic tangent function. It has the form:

$$ output_i = \frac{e^{activation_i} - e^{-activation_i}}{e^{activation_i} + e^{-activation_i}} $$

This function has a shape similar to the sigmoid (an S-shaped curve), with the difference that the value of output_i ranges between -1 and 1.

Figure: Functional forms of the transfer functions (step, signum, sigmoid and hyperbolic tangent).
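To make the summing-and-transfer operation concrete, here is a minimal sketch in Python of a single artificial neuron. The function and parameter names (neuron_output, gain, threshold) are illustrative choices for this sketch, not part of any particular ANN package.

```python
import math

def step(activation, threshold=0.0):
    """Step transfer function: fire (output 1) once activation reaches the threshold T."""
    return 1 if activation >= threshold else 0

def sigmoid(activation, gain=1.0):
    """Sigmoid transfer function: continuous output in the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-gain * activation))

def neuron_output(inputs, weights, transfer=sigmoid):
    """Weighted sum of the inputs (the net activation) followed by a transfer function."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return transfer(activation)

# A neuron with three weighted inputs; net activation is 0.2 + 0.4 - 0.1 = 0.5.
x = [1.0, 0.5, -1.0]
w = [0.2, 0.8, 0.1]
print(neuron_output(x, w))        # sigmoid output, about 0.62
print(neuron_output(x, w, step))  # binary output: 1, since 0.5 >= 0
```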
Structure and operation of an ANN

The building block of an ANN is the artificial neuron described above, with its weighted inputs, summing function and transfer function. The most common architecture of an ANN consists of two or more layers of artificial neurons, or nodes, with each node in a layer connected to every node in the following layer. Signals usually flow from the input layer, which is directly subjected to an input pattern, across one or more hidden layers towards the output layer. The most popular ANN architecture, the multilayer perceptron, follows this model.

Figure: Structure of a typical ANN.

In some models of the ANN, such as the self-organising map (SOM) or Kohonen net, nodes in the same layer may have interconnections among them.

The input stimuli of an ANN are data values grouped together to form a pattern vector. As a rather simplistic example, consider an ANN used to forecast interest rate movements. The input pattern vector may consist of four components - inflation rate, unemployment rate, economic growth rate, and the proximity of a federal election. The input layer would consist of four nodes, each of which would receive a value representing one of these attributes. The output layer could have just one node, whose output level would indicate whether or not the interest rate is likely to rise - say, high for a rise and low otherwise. We can expect the ANN to produce an accurate prediction only if the functional relationships between the relevant variables, namely the four components of the input pattern, and the corresponding output - the interest rate movement - have been "learned" by the ANN. Learning by ANNs is modelled after human learning behaviour: repeatedly going through the same material, making mistakes and adapting until able to carry out a given task successfully.

Learning in artificial neural networks

The connection weights are the single most important factor in the input-output behaviour of a neuron; the values of these weights determine the outcome of the network. In most neural networks, the system itself computes the weights among its neurons. The process by which an ANN arrives at the values of these weights is known as learning or training. Learning in ANNs takes place through an iterative training process during which the node interconnection weight values are adjusted. Initial weights, usually small random values, are assigned to the interconnections between the ANN nodes.

Supervised and unsupervised learning

In supervised training, data sets comprising input patterns and corresponding output values are used. The weight adjustments during each iteration aim to reduce the "error", the difference between the ANN's actual output and the expected correct output. For example, a node producing a small output when it is expected to produce a large positive one has its positive weight values increased and its negative weight values decreased. This has the effect of increasing the sum of its inputs (its activation level), resulting in a higher output level. Pairs of sample input values and corresponding output value(s) are used to train the net repeatedly until the output becomes satisfactorily accurate, as indicated by a low enough error value (a minimal sketch of such a training loop is given at the end of this section). The matrix of weight values in a trained ANN represents a model of the solution, or the "knowledge" required for the solution.

In unsupervised training, there is no known expected output to guide the weight adjustments. Instead, the net adapts itself to align its weight values with the training patterns. This results in groups of nodes responding strongly to specific groups of input patterns that share common attributes. These nodes are said to form clusters of input patterns, which can be studied to find interesting patterns in the input data. Most ANNs employ supervised learning, because this type is of more immediate use for problem solving.

A neural network can be in one of two states: training mode and operation mode. Most ANNs do not change their weights once training is finished and they are in operation.

Learning in ANNs can also be categorised as:
- Off-line
- On-line

Training ANNs for real-life applications can be time consuming, but once trained, the resulting network can be run highly efficiently, providing fast responses. Such networks are of the off-line learning type. In on-line learning, the ANN continues to learn while in operation. Whenever it encounters an input which differs from any it has already learned, it adapts its weights to incorporate this input as new knowledge.
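The following is a minimal sketch of supervised training for a single sigmoid neuron: weights start as small random values and are repeatedly nudged to reduce the error between the actual and expected output. The learning rate, epoch count and toy OR data set are illustrative assumptions made for this sketch.

```python
import math
import random

random.seed(0)  # reproducible initial weights

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def train_neuron(samples, epochs=2000, rate=0.5):
    """Supervised training of one sigmoid neuron by iterative weight adjustment."""
    n = len(samples[0][0])
    weights = [random.uniform(-0.1, 0.1) for _ in range(n)]  # small random initial weights
    for _ in range(epochs):
        for inputs, target in samples:
            out = sigmoid(sum(x * w for x, w in zip(inputs, weights)))
            # Error term: (expected - actual output), scaled by the sigmoid's slope.
            delta = (target - out) * out * (1 - out)
            for i, x in enumerate(inputs):
                weights[i] += rate * delta * x  # adjust weights to reduce the error
    return weights

# Toy data set: the logical OR of two inputs (the third input, fixed at 1.0, acts as a bias).
data = [([0, 0, 1.0], 0), ([0, 1, 1.0], 1), ([1, 0, 1.0], 1), ([1, 1, 1.0], 1)]
w = train_neuron(data)
for inputs, target in data:
    out = sigmoid(sum(x * wi for x, wi in zip(inputs, w)))
    print(inputs[:2], "expected", target, "got", round(out, 2))
```

After training, the outputs should approach the 0/1 targets; the final weight values are the "knowledge" the neuron has acquired about the OR relationship.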
ANN models

Although ANNs are all loosely modelled on the structure and operation of the biological brain, there are different types of neural network, distinguished by their architecture, learning strategy and operation. Three of the best known models are outlined below.

The multilayer perceptron (MLP)

In a multilayer perceptron, the neurons are arranged into an input layer, an output layer and one or more hidden layers. It is the most popular ANN architecture and has served as the platform for the rapid expansion of ANN applications. The MLP is also known as the backpropagation (or simply backprop) network (BPN), because during training, error values from the output layer are propagated back into the preceding layers to calculate the weight adjustments.

Figure: The general multilayer perceptron architecture, consisting of an input layer, a hidden layer and an output layer of neurons.

The learning rule for the multilayer perceptron is known as the generalised delta rule, or the backpropagation rule. The generalised delta rule repetitively calculates an error value for each input, which is a function of the squared difference between the expected correct output and the actual output. The calculated error is backpropagated from one layer to the previous one and used to adjust the weights connecting the layers.

The error landscape in a multilayer perceptron

The behaviour of a neural network as it attempts to arrive at a solution can be visualised in terms of the error function E_p, which is a function of the inputs and the weights. For a given pattern p, the error E_p can be plotted against the weights to give the so-called error surface. The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and points of maximum error found on peaks. The generalised delta rule aims to minimise E_p by adjusting the weights so that they correspond to points of lowest error. It does this by the method of gradient descent, where changes are made in the steepest downward direction. All possible solutions are depressions in the error surface, known as basins of attraction.

During training using error backpropagation, the modified interconnection weights of the multilayer perceptron form internal representations that enable it to generate the desired outputs when given the training inputs. This internal representation is also applicable to inputs that were not used during training. The neural net is thus able to classify previously unseen inputs according to the features they share with the training examples. This endows ANNs with the capability to generalise - something humans perform quite often. We haven't seen all the men in the world, but we are most likely to recognise one as a man instantly (unless he is heavily disguised). The representation of the problem solution in the MLP may be viewed as the formation of a solution surface spanning the solution points the network has been trained with (Dhar & Stein). Unknown solution points found by the net during operation may be visualised as points interpolated from the learned solution points.
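As an illustration of the generalised delta rule, the sketch below trains a small multilayer perceptron on the XOR problem, a classic task that a single-layer network cannot learn. The layer sizes, learning rate and epoch count are illustrative assumptions for this sketch; real implementations add refinements such as the momentum term discussed below.

```python
import math
import random

random.seed(1)  # reproducible initial weights

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

# XOR training pairs: input pattern and expected correct output.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

n_in, n_hid, rate = 2, 4, 0.5
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]  # +1 for bias
w_out = [random.uniform(-1, 1) for _ in range(n_hid + 1)]

def forward(inputs):
    """Compute the hidden-layer and output-layer values for one input pattern."""
    x = inputs + [1.0]  # append the bias input
    hidden = [sigmoid(sum(xi * wi for xi, wi in zip(x, w))) for w in w_hid]
    out = sigmoid(sum(hi * wi for hi, wi in zip(hidden + [1.0], w_out)))
    return x, hidden, out

for epoch in range(5000):
    for inputs, target in samples:
        x, hidden, out = forward(inputs)
        # Output-layer error term: squared-error derivative times the sigmoid's slope.
        d_out = (target - out) * out * (1 - out)
        # Backpropagate: each hidden node's error term uses the output weight it feeds.
        d_hid = [d_out * w_out[j] * hidden[j] * (1 - hidden[j]) for j in range(n_hid)]
        h = hidden + [1.0]
        for j in range(n_hid + 1):
            w_out[j] += rate * d_out * h[j]            # adjust output-layer weights
        for j in range(n_hid):
            for i in range(n_in + 1):
                w_hid[j][i] += rate * d_hid[j] * x[i]  # adjust hidden-layer weights

for inputs, target in samples:
    print(inputs, "expected", target, "got", round(forward(inputs)[2], 2))
```

With enough iterations the outputs should approach the 0/1 targets; if training instead settles into a local minimum, the remedies discussed next apply.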
Learning difficulties in multilayer perceptrons

Occasionally, the multilayer perceptron fails to settle into the global minimum of the error surface and instead finds itself in one of the local minima. This is a consequence of the gradient descent strategy. A number of approaches can be taken to reduce this possibility:

- Lowering the gain term progressively. The gain term influences the rate at which weight changes are made in each iteration of training. Its value is 1 by default, but it may be gradually reduced to slow the rate of change as training progresses.
- Adding more nodes for better representation of patterns. Too few nodes (and consequently too few weights) can cause the ANN to fail to learn a pattern.
- Introducing a momentum term, which determines the effect of past weight changes on the current direction of movement in weight space. Like the gain term, the momentum term is a small numerical value in the range 0 to 1.
- Adding random noise to perturb the ANN out of local minima.

The Kohonen network (the self-organising map)

The type of learning used in multilayer perceptrons is supervised learning, which requires the correct response to be provided during training. Biological systems display this type of learning, but they are also capable of learning by themselves, without a supervisor showing the correct response. Think of a child learning from a teacher, but also of a baby learning to recognise its mother without any apparent guidance. A neural network with a similar capability is said to be self-organising. The Kohonen net belongs to this group of ANNs. During training, the network changes its weights to learn appropriate associations without any right answers being provided.

The main difference in the layout of self-organising networks is the presence of lateral connections, which are similar to the connectivities found in the cerebral cortex. The Kohonen net consists of an input layer, which distributes the inputs to each node in a second layer, the so-called competitive layer. Each of the nodes in this layer acts as an output node.

Figure: The architecture of the Kohonen net.

Each neuron in the competitive layer is connected to other neurons in its neighbourhood, and feedback is restricted to neighbours through these lateral connections. Neurons in the competitive layer have excitatory (positively weighted) connections to immediate neighbours and inhibitory (negatively weighted) connections to more distant neurons. As an input pattern is presented, some of the neurons in the competitive layer are sufficiently activated to produce outputs, which are fed back to other neurons in their neighbourhoods. The node with the set of input weights closest to the input pattern component values produces the largest output. This node is termed the winning node. During training, the input weights of the winning node and its neighbours are adjusted to make them resemble the input pattern even more closely. At the completion of training, the winning node ends up with its input weight values aligned with the input pattern, and it produces the strongest output whenever that particular pattern is presented. The nodes in the winning node's neighbourhood also have their weights modified so that they settle down to an average representation of that pattern class. As a result, the net is able to represent clusters of similar input patterns - a feature found useful for data mining applications.
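The sketch below illustrates, under simplifying assumptions, the competitive training step just described: for each input pattern, the winning node is found and the weights of the winner and its immediate neighbours are pulled towards the pattern. A real Kohonen net would also shrink the learning rate and neighbourhood radius over time; here they are fixed for brevity, and all names are illustrative.

```python
import random

random.seed(2)  # reproducible toy run

def winning_node(pattern, weights):
    """Index of the node whose input weights are closest to the pattern."""
    dist = lambda w: sum((p - wi) ** 2 for p, wi in zip(pattern, w))
    return min(range(len(weights)), key=lambda j: dist(weights[j]))

def train_som(patterns, n_nodes=5, epochs=100, rate=0.3, radius=1):
    """Competitive learning: the winner and its neighbours move towards each input."""
    dim = len(patterns[0])
    weights = [[random.random() for _ in range(dim)] for _ in range(n_nodes)]
    for _ in range(epochs):
        for p in patterns:
            win = winning_node(p, weights)
            # Adjust the winner and its neighbours within the given radius.
            for j in range(max(0, win - radius), min(n_nodes, win + radius + 1)):
                for i in range(dim):
                    weights[j][i] += rate * (p[i] - weights[j][i])
    return weights

# Two loose clusters of two-component patterns, with no output labels supplied.
patterns = [[0.1, 0.2], [0.15, 0.1], [0.9, 0.8], [0.85, 0.95]]
w = train_som(patterns)
for p in patterns:
    print(p, "-> node", winning_node(p, w))  # patterns in a cluster tend to share a node
```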
The Hopfield model

The Hopfield net is the most widely known of all the autoassociative ANNs. In autoassociation, a noisy or partially incomplete input pattern causes the network to stabilise to a state corresponding to the original pattern. The Hopfield net is also useful for optimisation tasks.

Figure: Architecture of the Hopfield network.

The Hopfield net is a recursive ANN in which the output produced by each neuron is fed back as input to all the other neurons. Each neuron performs a weighted sum with a step transfer function. The Hopfield net has no learning algorithm as such. Patterns (or facts) are simply stored by setting the weights so that the network converges to states corresponding to the patterns to be learned. During operation, an input pattern is applied to all neurons simultaneously and the network is left to stabilise. The outputs of the neurons in the stable state form the output of the network: when presented with an input pattern, the net outputs the stored pattern nearest to the presented pattern.
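Below is a minimal sketch of this storage-and-recall behaviour, assuming bipolar (-1/+1) patterns and a simple Hebbian weight-setting rule, which is one common way of choosing Hopfield weights.

```python
def store(patterns):
    """Set the weight matrix directly from the stored patterns (no iterative training)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, state, steps=10):
    """Let the net stabilise: each neuron repeatedly takes the sign of its weighted input sum."""
    s = list(state)
    for _ in range(steps):
        for i in range(len(s)):
            total = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if total >= 0 else -1
    return s

stored = [[1, -1, 1, -1, 1, -1],
          [1, 1, 1, -1, -1, -1]]
w = store(stored)
noisy = [1, -1, 1, -1, 1, 1]  # first stored pattern with its last component flipped
print(recall(w, noisy))       # settles back to [1, -1, 1, -1, 1, -1]
```

Note that the capacity of such a net is limited: storing too many patterns relative to the number of neurons produces spurious stable states.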
How ANNs are applied

Solutions to many real-life problems are very difficult, if not impossible, to define algorithmically, due mainly to the unstructured nature of the problem. There may be too many variables, and/or the interactions of the relevant variables may not be adequately understood. In many situations, input data may be partially corrupt or missing, making it difficult, if not impossible, for a logical sequence of solution steps, as stated in an algorithm, to function effectively. Instead of implementing an algorithm, the typical ANN attempts to arrive at an answer by learning to identify the right answer through an iterative process of self-adaptation, or training. This training is a simulation of human learning behaviour.

For example, consider a medical application where measurement data related to symptoms, treatments and history are given for each person in a sample, together with a value indicating whether a certain disease was found to be present or not (after costly tests). Suppose we want to find a simple relationship between the given data and the presence of the disease, which could lead to an inexpensive screening method for the disease, or to an understanding of which factors are important for disease prevention. If there are many factors, with complex interactions among them, the usual "linear" statistical techniques may be inappropriate. In that case, one way of analysing the data is by the use of neural networks, which can, in principle, find simple relationships by means of an adaptive learning procedure under very general circumstances.

Current applications of ANNs

Beyond applications in data analysis, such as the above, ANNs are being used in an increasing number of applications where high-speed computation of functions is important. For example, correcting an industrial robot's positioning to take into account the mass of an object it is holding would normally require time-consuming numerical calculations, which are impossible to do in real time while the robot is in use. An ANN can learn the necessary corrections based on trial motions, and can perform the corrections in real time. It is not necessary to use all possible masses and motions during training, since ANNs have the capacity to interpolate smoothly to cases not presented in training.

Due to their ability to recognise complex patterns, ANNs have been widely applied to the recognition of characters, handwritten text and signatures, as well as more complex images such as faces. They have also been used successfully for speech recognition and synthesis. One of the most successful applications of ANNs has been as a decision support tool in the area of finance and banking. Some examples of commercial applications of ANNs are:

- Financial market analysis for investment decision making
- Sales support - targeting customers for telemarketing
- Bankruptcy prediction
- Intelligent flexible manufacturing systems
- Stock market prediction
- Resource allocation - scheduling and management of personnel and equipment

The results of a survey (Quaddus & Khan) of the application of ANNs in business are summarised in the table below. The survey focussed on publications reporting applications in a number of different categories. Note that the numbers for 1998 are incomplete. It is interesting to note the apparent drop in recent years in research publications on commercial ANN applications. This could be an indication of the ANN methodology becoming more established in the marketplace, with a shift from research to actual development activities, which do not receive as much publicity.

Table 1: Distribution of the articles by area and year

AREA                     1988    89    90    91     92     93     94     95     96     97    98   Total   % of Total
Accounting/Auditing         1     0     1     1      6      3      3      7      7      5     0      34         4.97
Finance                     0     0     4    11     19     28     27     18      5      9     2     123        17.98
Human resources             0     0     0     1      0      1      1      0      0      0     0       3         0.44
Information systems         4     6     9     7     15     24     21     18     13     18     3     138        20.18
Marketing/Distribution      2     2     2     3      8     10     12     17     29     14     0      99        14.47
Production                  2     6     8    21     31     38     24     50     29     31     1     241        35.23
Others                      0     0     1     7      3      8      7      8      7      5     0      46         6.73
Yearly total                9    14    25    51     82    112     95    118     90     82     6     684       100.00
% of total               1.32  2.05  3.65  7.46  11.99  16.37  13.89  17.25  13.16  11.99  0.88  100.00

Currently, neural network systems are available either as software simulations on conventional computers or, less commonly, as hardware that models the parallelism of neurons. Although it is considered unlikely that ANN-based systems will replace conventional computing systems, they are becoming established as an alternative to the algorithmic approach to information processing.

Some advantages of ANNs

Generalisation
Neural networks are capable of generalisation; that is, they can classify an unknown pattern provided it shares distinguishing features with the patterns learned during training. This means noisy or incomplete inputs can still be classified, because of their similarity with pure and complete inputs.

Fault tolerance
Neural networks are highly fault tolerant. This characteristic is also known as "graceful degradation". Because of its distributed nature, a neural network keeps working even when a significant fraction of its neurons and interconnections fail. Also, relearning after damage can be relatively quick.

Efficiency
The inherent parallelism in the operation of ANNs also makes them fast and efficient at handling large amounts of data. For example, once trained, the execution time of a multilayer perceptron is simply the time it takes to calculate the output values of successive layers of neurons for the hidden layer(s) and the output layer.

REFERENCES

AI Expert (special issue on ANNs), June 1990.
BYTE (special issue on ANNs), Aug. 1989.
Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
Dhar, V., and Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
URL on ANN - http://ai.about.com/library/weekly/aa031401a.htm
Kirrmann, H., "Neural Computing: The new gold rush in informatics", IEEE Micro, June 1989, pp. 7-9.
Lippman, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pp. 4-21.
Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
S., "Evolution of Artificial Neural Networks in Business Applications: An Empirical Investigation Using a Growth Model", International Journal of Management & Decision Making, Vol.3, No.1, March 2002, pp.19-34. Negnevitsky, M. Artificial Intelligence A Guide to Intelligent Systems, Addison-Wesley 2005. Wasserman, P.D., Neural Computing, Theory and Practice, Van Nostrand Reinhold, New York 1989 Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, , Belmont, California, 1993. http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html http://www.cs.stir.ac.uk/~lss/NNIntro/InvSlides.html There are also a number of other titles on artificial neural networks in Murdoch University library 841020185 11 ICT619 S2-05