ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks

PART A
- Introduction
- An overview of the biological neuron
- The synthetic neuron
- Structure and operation of an ANN
- Problem solving by an ANN
- Learning in ANNs
- ANN models
- Applications

PART B
- Developing neural network applications
- Design of the network
- Training issues
- A comparison of ANN and ES
- Hybrid ANN systems
- Case Studies

Introduction
- Artificial Neural Networks (ANNs) are also known as neural networks, neural computing (or neuro-computing) systems, and connectionist models.
- ANNs simulate the biological brain for problem solving. This represents a totally different approach to machine intelligence from the symbolic logic approach.
- The biological brain is a massively parallel system of interconnected processing elements. ANNs simulate a similar network of simple processing elements at a greatly reduced scale.

Introduction (cont'd)
- ANNs adapt themselves using data to learn problem solutions.
- ANNs can be particularly effective for problems that are hard to solve using conventional computing methods.
- First developed in the 1950s, the field slumped in the 1970s, then saw a great upsurge of interest in the mid-1980s.
- Both ANNs and expert systems are non-algorithmic tools for problem solving: an ES relies on the solution being expressed as a set of heuristics by an expert, whereas an ANN learns solely from data.

An overview of the biological neuron
- There are an estimated 100 billion neurons in the human brain, each connected to up to 10,000 others.
- Electrical impulses produced by a neuron travel along its axon.
- The axon connects to the dendrites of other neurons through synaptic junctions.

[Photo of biological neurons: Osaka University]

An overview of the biological neuron (cont'd)
- A neuron collects the excitation of its inputs and "fires" (produces a burst of activity) when the sum of its inputs exceeds a certain threshold.
- The strengths of a neuron's inputs are modified (enhanced or inhibited) by the synaptic junctions.
- Learning in our brains occurs through a continuous process of new interconnections forming between neurons, and adjustments at the synaptic junctions.

The synthetic neuron
- A simple model of the biological neuron, first proposed in 1943 by McCulloch and Pitts.
- It consists of a summing function with an internal threshold, and "weighted" inputs, as shown below.

[Figure: a synthetic neuron with weighted inputs feeding a summing function and internal threshold]

The synthetic neuron (cont'd)
- For a neuron receiving n inputs, each input x_i (i ranging from 1 to n) is weighted by multiplying it with a weight w_i.
- The sum of the products w_i x_i gives the net activation value of the neuron.
- The activation value is subjected to a transfer function to produce the neuron's output.
- The weight of the connection carrying signals from a neuron i to a neuron j is termed w_ij.

Transfer functions
- These compute the output of a node from its net activation. Among the popular transfer functions are:
  - the step function
  - the signum (or sign) function
  - the sigmoid function
  - the hyperbolic tangent function
- In the step function, the neuron produces an output only when its net activation reaches a minimum value, known as the threshold T.
- For a binary neuron i, whose output is a 0 or 1 value, the step function can be summarised as:

  output_i = 0 if activation_i < T
  output_i = 1 if activation_i >= T
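To make the summing-and-threshold behaviour concrete, here is a minimal Python sketch of a binary neuron with a step transfer function. The weights, inputs and threshold are illustrative values, not taken from the lecture.

```python
def step_neuron(inputs, weights, threshold):
    """McCulloch-Pitts style neuron: weighted sum with a step transfer function."""
    activation = sum(x * w for x, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0

# Illustrative values: two inputs with weights chosen so the neuron
# fires only when both inputs are active (a logical AND).
print(step_neuron([1, 1], [0.6, 0.6], threshold=1.0))  # -> 1
print(step_neuron([1, 0], [0.6, 0.6], threshold=1.0))  # -> 0
```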
Transfer functions (cont'd)
- The sign function returns either -1 or +1. To avoid confusion with 'sine' it is often called signum.

  output_i = +1 if activation_i >= 0
  output_i = -1 if activation_i < 0

Transfer functions (cont'd)
- The sigmoid transfer function produces a continuous value in the range 0 to 1:

  output_i = 1 / (1 + e^(-g * activation_i))

- The parameter g, the gain, affects the slope of the function around zero.

Transfer functions (cont'd)
- The hyperbolic tangent is a variant of the sigmoid transfer function:

  output_i = (e^activation_i - e^(-activation_i)) / (e^activation_i + e^(-activation_i))

- It has a shape similar to the sigmoid (like an S), the difference being that the value of output_i ranges between -1 and 1.

Structure and operation of an ANN
- The building block of an ANN is the artificial neuron. It is characterised by weighted inputs, a summing function and a transfer function.
- The most common architecture of an ANN consists of two or more layers of artificial neurons, or nodes, with each node in a layer connected to every node in the following layer.
- Signals usually flow from the input layer, which is directly subjected to an input pattern, across one or more hidden layers, towards the output layer.

[Figure: a layered feedforward network with input, hidden and output layers]

Structure and operation of an ANN (cont'd)
- The most popular ANN architecture, known as the multilayer perceptron (shown in the diagram above), follows this model.
- In some models of the ANN, such as the self-organising map (SOM) or Kohonen net, nodes in the same layer may have interconnections among them.
- In recurrent networks, connections can even go backwards to nodes closer to the input.

Problem solving by an ANN
- The inputs of an ANN are data values grouped together to form a pattern.
- Each data value (component of the pattern vector) is applied to one neuron in the input layer.
- The output value(s) of node(s) in the output layer represent some function of the input pattern.

[Figure: an ANN mapping an input pattern to one of two classes]

Problem solving by an ANN (cont'd)
- In the example above, the ANN maps the input pattern to one of two classes.
- The ANN produces the output for an accurate prediction only if the functional relationships between the relevant variables (the components of the input pattern) and the corresponding output have been "learned" by the ANN.
- Any three-layer ANN can (at least in theory) represent the functional relationship between an input pattern and its class, although it may be difficult in practice for the ANN to learn a given relationship.

Learning in ANNs
- Common human learning behaviour: repeatedly going through the same material, making mistakes and learning, until able to carry out a given task successfully. Learning by most ANNs is modelled after this type of human learning.
- The knowledge learned to solve a given problem is stored in the interconnection weights of an ANN.
- The process by which an ANN arrives at the right values of these weights is known as learning or training.

Learning in ANNs (cont'd)
- Learning in ANNs takes place through an iterative training process during which the node interconnection weight values are adjusted.
- Initial weights, usually small random values, are assigned to the interconnections between the ANN nodes.
- Like knowledge acquisition in an ES, learning in ANNs can be the most time-consuming phase in its development. A forward pass with such initial weights is sketched below.
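The following Python sketch wires these pieces together: a small layered network with sigmoid transfer functions and small random initial weights, run forward on one input pattern. The layer sizes, weight range and gain are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(activation, gain=1.0):
    """Sigmoid transfer function; the gain controls the slope around zero."""
    return 1.0 / (1.0 + np.exp(-gain * activation))

# Illustrative 3-4-2 network: small random initial weights, as assigned
# before training begins.
w_hidden = rng.uniform(-0.1, 0.1, size=(3, 4))  # input layer -> hidden layer
w_output = rng.uniform(-0.1, 0.1, size=(4, 2))  # hidden layer -> output layer

x = np.array([0.5, -0.2, 0.8])   # one input pattern
hidden = sigmoid(x @ w_hidden)   # net activation, then transfer function
output = sigmoid(hidden @ w_output)
print(output)                    # two values in the range 0..1
```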
Learning in ANNs (cont'd)
- ANN learning (or training) can be supervised or unsupervised.
- In supervised training, data sets consisting of pairs, each comprising an input pattern and its expected correct output value, are used.
- The weight adjustments during each iteration aim to reduce the "error": the difference between the ANN's actual output and the expected correct output.
- E.g., a node producing a small negative output when it is expected to produce a large positive one has its positive weight values increased and its negative weight values decreased.

Learning in ANNs (cont'd)
- In supervised training, pairs of sample input values and corresponding output values are used to train the net repeatedly until its output becomes satisfactorily accurate (a sketch of this loop appears after the MLP introduction below).
- In unsupervised training, there is no known expected output used for guiding the weight adjustments. The function to be optimised can be any function of the inputs and outputs, and is usually set by the application; the net adapts itself to align its weight values with the training patterns.
- This results in groups of nodes responding strongly to specific groups of similar input patterns.

The two states of an ANN
- A neural network can be in one of two states: training mode or operation mode.
- Most ANNs learn off-line and do not change their weights once training is finished and they are in operation. In an ANN capable of on-line learning, training and operation continue together.
- ANN training can be time-consuming, but once trained, the resulting network can be made to run very efficiently, providing fast responses.

ANN models
- ANNs are supposed to model the structure and operation of the biological brain, but there are different types of neural networks depending on the architecture, learning strategy and operation.
- Three of the most well known models are:
  1. The multilayer perceptron
  2. The Kohonen network (the self-organising map)
  3. The Hopfield net
- The multilayer perceptron (MLP) is the most popular ANN architecture.

The multilayer perceptron
- Nodes are arranged into an input layer, an output layer and one or more hidden layers.
- The MLP is also known as the backpropagation network, because of the use of error values from the output layer in the layers before it to calculate weight adjustments during training.
- Another name for the MLP is the feedforward network.
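As a concrete illustration of the supervised, error-driven weight adjustment described above, here is a simplified single-neuron version in Python, trained on input/target pairs for the logical AND function (the full multilayer rule appears in the next section). The data, learning rate and epoch count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training pairs: input pattern and expected correct output (logical AND).
patterns = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = np.array([0, 0, 0, 1])

weights = rng.uniform(-0.1, 0.1, size=2)  # small random initial weights
bias = 0.0
rate = 0.1                                # learning rate (illustrative)

for epoch in range(50):                   # repeat until satisfactorily accurate
    for x, target in zip(patterns, targets):
        output = 1 if x @ weights + bias >= 0 else 0
        error = target - output           # the difference drives the adjustment
        weights += rate * error * x       # raise/lower weights to reduce error
        bias += rate * error

# Should print the learned outputs [0, 0, 0, 1].
print([1 if x @ weights + bias >= 0 else 0 for x in patterns])
```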
MLP learning algorithm
- The learning rule for the multilayer perceptron is known as the "generalised delta rule" or the "backpropagation rule".
- The generalised delta rule repeatedly calculates an error value for each input, which is a function of the squared difference between the expected correct output and the actual output.
- The calculated error is backpropagated from one layer to the previous one, and is used to adjust the weights between connecting layers.

MLP learning algorithm (cont'd)
- New weight = old weight + change calculated from the error, where
  error = difference between the desired output and the actual output.
- Training stops when the error becomes acceptable, or after a predetermined number of iterations.
- After training, the modified interconnection weights form a sort of internal representation that enables the ANN to generate desired outputs when given the training inputs, or even new inputs that are similar to the training inputs. This generalisation is a very important property.

The error landscape in a multilayer perceptron
- For a given pattern p, the error E_p can be plotted against the weights to give the so-called error surface.
- The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and maximum error found on peaks.
- The generalised delta rule aims to minimise E_p by adjusting the weights so that they correspond to points of lowest error. It follows the method of gradient descent, where changes are made in the steepest downward direction.
- All possible solutions are depressions in the error surface, known as basins of attraction.

[Figure: the error E_p plotted as a surface over two weights w_i and w_j]

Learning difficulties in multilayer perceptrons: local minima
- The MLP may fail to settle into the global minimum of the error surface and instead find itself in one of the local minima. This is due to the gradient descent strategy followed.
- A number of alternative approaches can be taken to reduce this possibility:
  - Lowering the gain term progressively. The gain influences the rate at which weight changes are made during training; its value by default is 1, but it may be gradually reduced to slow the rate of change as training progresses.

Learning difficulties in multilayer perceptrons (cont'd)
  - Addition of more nodes for better representation of patterns. Too few nodes (and consequently not enough weights) can cause failure of the ANN to learn a pattern.
  - Introduction of a momentum term, which determines the effect of past weight changes on the current direction of movement in weight space. The momentum term is also a small numerical value in the range 0 to 1 (see the sketch below).
  - Addition of random noise to perturb the ANN out of local minima, usually done by adding small random values to the weights. This takes the net to a different point in the error space, hopefully out of a local minimum.
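Here is a compact Python sketch of the generalised delta rule with a momentum term, training a small MLP on the classic XOR problem. The network size, learning rate, momentum value and epoch count are illustrative choices, not prescribed by the lecture.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# XOR: a classic test case that a single-layer net cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

w1 = rng.uniform(-0.5, 0.5, (2, 4))  # input -> hidden weights
w2 = rng.uniform(-0.5, 0.5, (4, 1))  # hidden -> output weights
rate, momentum = 0.5, 0.9            # illustrative learning rate and momentum
dw1 = np.zeros_like(w1)
dw2 = np.zeros_like(w2)

for epoch in range(5000):
    # Forward pass through the layers.
    h = sigmoid(X @ w1)
    y = sigmoid(h @ w2)
    # Error values, backpropagated from the output layer.
    delta_out = (T - y) * y * (1 - y)             # sigmoid derivative term
    delta_hid = (delta_out @ w2.T) * h * (1 - h)
    # Weight change = gradient step + momentum * previous change.
    dw2 = rate * h.T @ delta_out + momentum * dw2
    dw1 = rate * X.T @ delta_hid + momentum * dw1
    w2 += dw2
    w1 += dw1

# Typically approaches [0, 1, 1, 0]; a poor random start can still
# leave the net in a local minimum, as discussed above.
print(np.round(sigmoid(sigmoid(X @ w1) @ w2), 2))
```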
The Kohonen network (the self-organising map)
- Biological systems display both supervised and unsupervised learning behaviour. A neural network with unsupervised learning capability is said to be self-organising.
- During training, the Kohonen net changes its weights to learn appropriate associations, without any right answers being provided.

The Kohonen network (cont'd)
- The Kohonen net consists of an input layer that distributes the inputs to every node in a second layer, known as the competitive layer.
- The competitive (output) layer is usually organised into some 2-D or 3-D surface (the feature map).

Operation of the Kohonen net
- Each neuron in the competitive layer is connected to other neurons in its neighbourhood. Neurons in the competitive layer have excitatory (positively weighted) connections to immediate neighbours and inhibitory (negatively weighted) connections to more distant neurons.
- As an input pattern is presented, some of the neurons in the competitive layer are sufficiently activated to produce outputs, which are fed to other neurons in their neighbourhoods.
- The node with the set of input weights closest to the input pattern component values produces the largest output. This node is termed the best matching (or winning) node.

Operation of the Kohonen net (cont'd)
- During training, the input weights of the best matching node and its neighbours are adjusted to make them resemble the input pattern even more closely.
- At the completion of training, the best matching node ends up with its input weight values aligned with the input pattern, and produces the strongest output whenever that particular pattern is presented.
- The nodes in the winning node's neighbourhood also have their weights modified to settle down to an average representation of that pattern class.
- As a result, the net is able to represent clusters of similar input patterns, a feature found useful for data mining applications, for example.

The Hopfield model
- The Hopfield net is the most widely known of the autoassociative (pattern-completing) ANNs. In autoassociation, a noisy or partially incomplete input pattern causes the network to stabilise to a state corresponding to the original pattern. It is also useful for optimisation tasks.
- The Hopfield net is a recurrent ANN in which the output produced by each neuron is fed back as input to all other neurons. Neurons compute a weighted sum with a step transfer function.

The Hopfield model (cont'd)
- The Hopfield net has no iterative learning algorithm as such. Patterns (or facts) are simply stored by adjusting the weights to lower a term called the network energy.
- During operation, an input pattern is applied to all neurons simultaneously and the network is left to stabilise. The outputs from the neurons in the stable state form the output of the network.
- When presented with an input pattern, the net outputs the stored pattern nearest to the presented pattern.
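This storage-and-recall behaviour can be sketched in a few lines of Python using the standard Hebbian weight prescription for bipolar (+1/-1) patterns. The stored patterns and the corrupted probe are illustrative, and synchronous updates are used for brevity (classical Hopfield convergence results assume asynchronous updates).

```python
import numpy as np

# Two bipolar (+1/-1) patterns to store (illustrative).
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1,  1, 1, -1, -1, -1]])

# Hebbian storage: weights are set directly, with no iterative training.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                       # no self-connections

# Recall: start from a noisy version of pattern 0 and let the net stabilise.
state = np.array([1, -1, 1, -1, 1, 1])       # last element corrupted
for _ in range(10):                          # a few sweeps suffice here
    state = np.where(W @ state >= 0, 1, -1)  # weighted sum + step function

print(state)  # settles to the nearest stored pattern: [1 -1 1 -1 1 -1]
```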
When ANNs should be applied
- Difficulties with some real-life problems:
  - Solutions are difficult, if not impossible, to define algorithmically, due mainly to their unstructured nature.
  - There are too many variables, and/or the interactions of the relevant variables are not well understood.
  - Input data may be partially corrupt or missing, making it difficult for a logical sequence of solution steps to function effectively.

When ANNs should be applied (cont'd)
- The typical ANN attempts to arrive at an answer by learning to identify the right answer through an iterative process of self-adaptation, or training.
- If there are many factors, with complex interactions among them, the usual "linear" statistical techniques may be inappropriate.
- If sufficient data is available, an ANN can find the relevant functional relationship by means of an adaptive learning procedure applied to the data.

Current applications of ANNs
- ANNs are good at recognition and classification tasks.
- Due to their ability to recognise complex patterns, ANNs have been widely applied in character, handwritten text and signature recognition, as well as recognition of more complex images such as faces. They have also been used successfully for speech recognition and synthesis.
- ANNs are being used in an increasing number of applications where high-speed computation of functions is important, e.g., in industrial robotics.

Current applications of ANNs (cont'd)
- One of the more successful applications of ANNs has been as a decision support tool in the area of finance and banking. Some examples of commercial applications of ANNs are:
  - Financial market analysis for investment decision making
  - Sales support: targeting customers for telemarketing
  - Bankruptcy prediction
  - Intelligent flexible manufacturing systems
  - Stock market prediction
  - Resource allocation: scheduling and management of personnel and equipment

ANN applications: broad categories
- According to a survey (Quaddus & Khan, 2002) covering the period 1988 to mid-1998, the main business application areas of ANNs are:
  - Production (36%)
  - Information systems (20%)
  - Finance (18%)
  - Marketing & distribution (14.5%)
  - Accounting/auditing (5%)
  - Others (6.5%)

ANN applications: broad categories (cont'd)

Table 1: Distribution of the articles by area and year

AREA                    1988    89    90    91    92    93    94    95    96    97    98   Total   % of Total
Accounting/Auditing        1     0     1     1     6     3     3     7     7     5     0      34         4.97
Finance                    0     0     4    11    19    28    27    18     5     9     2     123        17.98
Human resources            0     0     0     1     0     1     1     0     0     0     0       3         0.44
Information systems        4     6     9     7    15    24    21    18    13    18     3     138        20.18
Marketing/Distribution     2     2     2     3     8    10    12    17    29    14     0      99        14.47
Production                 2     6     8    21    31    38    24    50    29    31     1     241        35.23
Others                     0     0     1     7     3     8     7     8     7     5     0      46         6.73
Yearly total               9    14    25    51    82   112    95   118    90    82     6     684       100.00
% of total              1.32  2.05  3.65  7.46 11.99 16.37 13.89 17.25 13.16 11.99  0.88  100.00

- The levelling off of publications on ANN applications may be attributed to the ANN moving from the research to the commercial application domain. The emergence of other intelligent system tools may be another factor.

Some advantages of ANNs
- Able to take incomplete or corrupt data and provide approximate results.
- Good at generalisation, that is, recognising patterns similar to those learned during training.
- Inherent parallelism makes them fault-tolerant: the loss of a few interconnections or nodes leaves the system relatively unaffected.
- Parallelism also makes ANNs fast and efficient for handling large amounts of data.
ANN state-of-the-art overview
- Currently, neural network systems are available as:
  - software simulations on conventional computers (the prevalent form)
  - special-purpose hardware that models the parallelism of neurons.
- ANN-based systems are not likely to replace conventional computing systems, but they are an established alternative to the symbolic logic approach to information processing.
- A new computing paradigm in the form of hybrid intelligent systems has emerged, often involving ANNs together with other intelligent system tools.

REFERENCES
AI Expert (special issue on ANNs), June 1990.
BYTE (special issue on ANNs), August 1989.
Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
Dhar, V., and Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
Kirrmann, H., "Neural Computing: The New Gold Rush in Informatics", IEEE Micro, June 1989, pp. 7-9.
Lippmann, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pp. 4-21.
Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.
Quaddus, M.A., and Khan, M.S., "Evolution of Artificial Neural Networks in Business Applications: An Empirical Investigation Using a Growth Model", International Journal of Management and Decision Making, Vol. 3, No. 1, March 2002, pp. 19-34. (See also the ANN application publications EndNote library files on the ICT619 ftp site.)
Wasserman, P.D., Neural Computing: Theory and Practice, Van Nostrand Reinhold, New York, 1989.
Wong, B.K., Bodnovich, T.A., and Selvi, Y., "Neural Networks Applications in Business: A Review and Analysis of the Literature (1988-95)", Decision Support Systems, 19, 1997, pp. 301-320.
Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, Belmont, California, 1993.
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html