ICT619 Intelligent Systems
Topic 4: Artificial Neural Networks

Artificial Neural Networks

PART A
- Introduction
- An overview of the biological neuron
- The synthetic neuron
- Structure and operation of an ANN
- Problem solving by an ANN
- Learning in ANNs
- ANN models
- Applications

PART B
- Developing neural network applications
- Design of the network
- Training issues
- A comparison of ANN and ES
- Hybrid ANN systems
- Case Studies

Developing neural network applications

Neural Network Implementations
Three possible practical implementations of ANNs are:
1. A software simulation program running on a digital computer
2. A hardware emulator connected to a host computer, called a neurocomputer
3. True electronic circuits

Software Simulations of ANN
- Currently the cheapest and simplest implementation method for ANNs, at least for general-purpose use
- Simulates parallel processing on a conventional sequential digital computer
- Replicates the temporal behaviour of the network by updating the activation level and output of each node for successive time steps
- These steps are represented by iterations or loops
- Within each loop, the updates for all nodes in a layer are performed

Software Simulations of ANN (cont’d)
- In multilayer ANNs, processing for a layer is completed and its output is used to calculate the states of the nodes in the following layer
- Typical additional features of ANN simulators:
  1. Configuring the net according to a chosen architecture and node operational characteristic
  2. Implementation of the training phase using a chosen training algorithm
  3. Tools for visualising and analysing the behaviour of nets
- ANN simulators are written in high-level languages such as C, C++ and Java

Advantages and possible problems with software simulators
- The main attraction of ANN simulators is the relatively low cost and wide availability of ready-made commercial packages
- They are also compact, flexible and highly portable
- Writing your own simulator requires programming skills and would be time-consuming (except that you don't have to now!)
- Training of ANNs using software simulators can be slow for larger networks (greater than a few hundred)

Commercially available neural net packages
- Prewritten shells with convenient user interfaces
- Cost from a few hundred to tens of thousands of dollars
- Allow users to specify the ANN design and training parameters
- Usually provide graphic interfaces to enable monitoring of the net’s training and operation
- Likely to provide interfacing with other software systems such as spreadsheets and databases

Neurocomputers
- Dedicated special-purpose digital computers (also known as accelerator boards)
- Optimised to perform operations common in neural network simulation
- Act as a coprocessor to a host computer and are controlled by a program running on the host
- Can be tens to thousands of times faster than simulators
- Systems are available with approx. 1000 million IPS (connection updates per second) for networks with 8,192 neurons, e.g. the ACC Neural Network Processor
- Genobyte's CAM-Brain Machine was developed between 1997 and 2000

True Networks in Hardware
- Closer to biological neural networks than simulations
- Consist of synthetic neurons actually fabricated on silicon chips
- Commercially available hardwired ANNs are limited to a few thousand neurons per chip (figures more than five years old)
- Chips are connected in parallel to achieve larger networks
- Problems: interconnection and interference, fixed-valued weights; work is progressing on modifiable synapses
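
The update loop described under Software Simulations of ANN can be sketched in a few lines of Python. This is a minimal illustration rather than any particular simulator's code: the function name `simulate`, the `(weights, bias)` representation of a node, the tanh transfer function and all numeric values are assumptions made for the example.

```python
import math


def simulate(layers, inputs):
    """Propagate inputs through a feedforward net, one layer per loop pass.

    layers: list of layers; each layer is a list of (weights, bias) pairs,
            one pair per node (a hypothetical representation).
    """
    outputs = list(inputs)
    for layer in layers:                     # one iteration per layer
        next_outputs = []
        for weights, bias in layer:          # update every node in the layer
            activation = sum(w * x for w, x in zip(weights, outputs)) + bias
            next_outputs.append(math.tanh(activation))
        outputs = next_outputs               # completed layer feeds the next
    return outputs


# Hypothetical 2-2-1 network with made-up weights.
hidden = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output = [([1.0, -1.0], 0.0)]
result = simulate([hidden, output], [1.0, 0.0])
print(result)
```

The sequential inner loop stands in for what would happen in parallel in a real network: every node in a layer is updated before the next layer is touched.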

Neural Network Development Methodology
- Aims to add structure and organisation to ANN applications development, reducing cost and increasing accuracy, consistency, user confidence and friendliness
- Development is split into the following phases:
  - The Concept Phase
  - The Design Phase
  - The Implementation Phase
  - The Maintenance Phase

Neural Network Development Methodology - the Concept Phase
Involves:
- Validating the proposed application
- Selecting an appropriate neural paradigm

Application validation
Problem characteristics suitable for neural network application are:
- Data intensive
- Multiple interacting parameters
- Incomplete, erroneous, noisy data
- Solution function unknown or expensive
- Requires flexibility, generalisation, fault-tolerance, speed

ANN Development Methodology - the Concept Phase (cont’d)
- Common examples of applications with the above attributes are pattern recognition (e.g. printed or handwritten characters, consumer behaviour, risk patterns), forecasting (e.g. stock market) and signal (audio, video, ultrasound) processing
- Problems not suitable for ANN-based solutions include those where:
  - A mathematically accurate and precise solution is available
  - A solution involving deduction and step-wise logic is appropriate
  - The application involves explanation or reporting
- One application area that is unsuitable for ANNs is resource management, e.g. inventory, accounts, sales data analysis

Selecting an ANN paradigm
- The decision is based on a comparison of application requirements with the capabilities of different paradigms, e.g. the multilayer perceptron is well known for its pattern recognition capabilities, while the Kohonen net is more suited to applications involving data clustering
- The choice of paradigm is also influenced by the training method that can be employed, e.g. supervised training must have an adequate number of input/correct-output pairs available, and training may take a relatively long time
- Technical and economic feasibility assessments should be carried out to complete the concept phase

The Design Phase
- The design phase specifies initial values and conditions at the node, network and training levels
- Decisions to be made at the node level include:
  - Types of input: binary (0, 1), bipolar (-1, +1), trivalent (-1, 0, +1), discrete, continuous-valued
  - Transfer function: step or threshold, hyperbolic tangent, sigmoid; consider possible use of lookup tables for speeding up calculations
- Decisions to be made at the network architecture level:
  - The number and size of layers and their connectivity (fully interconnected or sparsely interconnected, feedforward or recurrent, other?)

The Design Phase (cont’d)
- The 'size' of a layer is the number of nodes in the layer
- For the input layer, size is determined by the number of data sources (input vector components) and possibly the mathematical transformations done
- The number of nodes in the output layer is determined by the number of classes or decision values to be output
- Finding the optimal size of the hidden layer needs some experimentation
- Too few nodes will produce inadequate mapping, while too many may result in inadequate generalisation

The Design Phase (cont’d)
Connectivity
- Connectivity determines the flow of signals between neurons in the same or different layers
- Some ANN models, such as the multilayer perceptron, have only interlayer connections; there are no intralayer connections
- The Hopfield net is an example of a model with intralayer connections

The Design Phase (cont’d)
Feedback
- There may be no feedback of output values, e.g. the multilayer perceptron, or
- There may be feedback, as in a recurrent network, e.g. the Hopfield net
Other design questions include the setting of parameters for the learning phase, e.g. stopping criterion and learning rate.
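
The node-level transfer functions named in the design phase (step or threshold, sigmoid, hyperbolic tangent), and the idea of a lookup table to speed them up, can be sketched as follows. This is only an illustration under stated assumptions: the helper `make_lookup`, the table range of -8 to 8 and the table size of 1601 entries are choices made for the example, not values from the course material.

```python
import math


def step(x, threshold=0.0):
    """Step (threshold) transfer function with binary output."""
    return 1.0 if x >= threshold else 0.0


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def make_lookup(fn, lo=-8.0, hi=8.0, size=1601):
    """Precompute fn on an evenly spaced grid; return a table-based version.

    The range and table size are assumptions chosen for illustration.
    """
    spacing = (hi - lo) / (size - 1)
    table = [fn(lo + i * spacing) for i in range(size)]

    def lookup(x):
        if x <= lo:                  # clamp inputs outside the table
            return table[0]
        if x >= hi:
            return table[-1]
        return table[round((x - lo) / spacing)]   # nearest grid point

    return lookup


fast_sigmoid = make_lookup(sigmoid)
fast_tanh = make_lookup(math.tanh)
print(sigmoid(0.3), fast_sigmoid(0.3))
print(math.tanh(-1.234), fast_tanh(-1.234))
```

The table trades a small approximation error for avoiding a call to the exponential on every node update; whether that is actually faster depends on the language and hardware, so it is worth measuring before adopting it.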
Another design question is the possible addition of noise to speed up training.

The Implementation Phase
Typical steps:
- Gathering the training set
- Selecting the development environment
- Implementing the neural network
- Testing and debugging the network

Gathering the training set
- Aims to get the right type of data, in an adequate amount and in the right format

Gathering training data (cont’d)
- Type of data to collect
  - Should be representative of the given problem, including routine, unusual and boundary-condition cases
  - A mix of good as well as imperfect data, but not ambiguous or too erroneous
- Amount of data to gather
  - Increasing the amount of data increases training time but may help earlier convergence
  - Quality is more important than quantity
- Collection of data
  - Potential sources: historical records, instrument readings, simulation results
- Preparation of data
  - Involves preprocessing, including scaling, normalisation, binarisation, mapping to a logarithmic scale, etc.

Selecting the development environment
- Hardware and software aspects
- Hardware requirements are based on:
  - speed of operation
  - memory and storage capacity
  - software availability
  - cost
  - compatibility
- The most popular platforms are workstations and high-end PCs (with accelerator board option)

Selecting the development environment (cont’d)
Two options in choosing software:
1. Custom-coded simulators, which require more expertise on the part of the user but provide maximum flexibility
2. Commercial development packages, which are usually easy to use because of a more sophisticated interface

Selecting the development environment (cont’d)
Selection of the hardware and software environment is usually based on the following considerations:
- ANN paradigm to be implemented
- Speed in training and recall
- Transportability
- Vendor support
- Extensibility
- Price

Implementing the neural network
Common steps involved are:
- Selection of an appropriate neural paradigm
- Setting the network size
- Deciding on the learning algorithm
- Creation of screen displays
- Determining the halting criteria
- Collecting data for training and testing
- Data preparation, including preprocessing
- Organising data into training and test sets

Implementation - Training
Training the net consists of:
- Loading the training set
- Initialisation of network weights, usually to small random values
- Starting the training process
- Monitoring the training process until training is completed
- Saving the weight values in a file for use during operation mode

Implementation - Training (cont’d)
Possible problems arising during training:
- Failure to converge to a set of optimal weight values
  - Further weight adjustments fail to reduce the output error; the net is stuck in a local minimum
  - Remedied by resetting the learning parameters and reinitialising the weights
- Overtraining
  - The net fails to generalise, i.e. fails to classify less-than-perfect patterns
  - A mix of good and imperfect patterns for training helps

Implementation - Training (cont’d)
- Training results may be affected by the method of presenting the data set to the network
- Adjustments may be made by varying the layer sizes and fine-tuning the learning parameters
- To ensure optimal results, several variations of a neural network may be trained and each tested for accuracy

Implementation - Testing and Debugging
Testing can be done by:
1. Observing the operational behaviour of the net
2. Analysing actual weights
3. Studying network behaviour under specific conditions

Observing operational behaviour
- The network is treated as a black box and its response to a series of test cases is evaluated
Test data
- Should contain training cases as well as new cases
- Routine, unusual and boundary-condition cases should be tried

Implementation - Testing and Debugging (cont’d)
Testing by weight analysis
- Weights entering and exiting nodes are analysed for relatively small and large values
If significant errors are detected in testing, debugging would involve examining:
- the training cases for representativeness, accuracy and adequacy of number
- learning algorithm parameters, such as the rate at which weights are adjusted
- neural network architecture, node characteristics and connectivity
- the training set-network interface and the user-network interface

The Maintenance Phase
Consists of:
- placing the neural network in an operational environment, with possible integration
- periodic performance evaluation
- maintenance
Although often designed as stand-alone systems, some neural network systems are integrated with other information systems using:
- Loose coupling: preprocessor, postprocessor, distributed component
- Tight coupling, or full integration as an embedded component

The Maintenance Phase (cont’d)
Possible ANN operational environments: [diagram]

System evaluation
- Continual evaluation is necessary to:
  - ensure satisfactory performance in solving dynamic problems
  - check for damaged or retrained networks
- Evaluation can be carried out by reusing the original test procedures with current data

ANN Maintenance
- Involves modification necessitated by:
  - Decreasing accuracy
  - Enhancements
- System modification falls into two categories, involving either data or software
- Data modification steps:
  - Training data is modified or replaced
  - The network is retrained and re-evaluated

ANN Maintenance (cont’d)
Software changes include changes in:
- interfaces
- cooperating programs
- the structure of the network
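
Retraining and re-evaluating a network follows the same steps listed under Implementation - Training: load the training set, initialise the weights to small random values, train until a halting criterion is met, save the weights to a file, then evaluate. The sketch below walks through those steps for a single tanh node trained with the delta rule, which is a deliberate simplification and not the backpropagation used for multilayer perceptrons. The toy data set (bipolar OR), the learning rate, the error threshold and the file name are all hypothetical.

```python
import json
import math
import os
import random
import tempfile


def predict(weights, x):
    """Output of a single tanh node; the last weight acts as the bias."""
    return math.tanh(sum(w * xi for w, xi in zip(weights, x)) + weights[-1])


def train(samples, rate=0.1, target_error=0.05, max_epochs=5000, seed=0):
    """Delta-rule training of one tanh node until a halting criterion is met."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Initialise weights (plus one bias) to small random values.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
    for epoch in range(1, max_epochs + 1):
        total_error = 0.0
        for x, target in samples:
            out = predict(weights, x)
            err = target - out
            total_error += err * err
            grad = err * (1.0 - out * out)    # tanh derivative is 1 - out^2
            for i, xi in enumerate(x):
                weights[i] += rate * grad * xi
            weights[-1] += rate * grad
        # Halting criterion: mean squared error over the epoch is small enough.
        if total_error / len(samples) < target_error:
            break
    return weights, epoch


def accuracy(weights, samples):
    """Fraction of samples whose output has the same sign as the target."""
    hits = sum((predict(weights, x) > 0) == (t > 0) for x, t in samples)
    return hits / len(samples)


# Hypothetical toy data (bipolar OR) standing in for a real training set.
training_set = [([-1, -1], -1), ([-1, 1], 1), ([1, -1], 1), ([1, 1], 1)]
weights, epochs = train(training_set)

# Save the weights for operation mode, then reload and re-evaluate.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "ann_weights.json")
    with open(path, "w") as f:
        json.dump(weights, f)
    with open(path) as f:
        restored = json.load(f)

score = accuracy(restored, training_set)
print(epochs, score)
```

In practice the final evaluation would use current data, including cases the network has not seen, rather than the training set reused here for brevity.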
If the network is changed, part of the design phase and most of the implementation phase may have to be repeated.
Backup copies should be used for maintenance and research.

A comparison of ANN and ES
Similarities between ES and ANN
- Both aim to create intelligent computer systems by mimicking human intelligence, although at different levels
- The design process of neither ES nor ANN is automatic
  - Knowledge extraction in ES is a time- and labour-intensive process
  - ANNs are capable of learning, but selection and preprocessing of data have to be done carefully

A comparison of ANN and ES (cont’d)
Differences between ANN and ES
- They differ in aspects of design, operation and use
- Logic vs. brain
  - ES simulate the human reasoning process based on formal logic
  - ANNs are based on modelling the brain, both in structure and operation
- Sequential vs. parallel
  - The nature of processing in ES is sequential
  - ANNs are inherently parallel

A comparison of ANN and ES (cont’d)
- External and static vs. internal and dynamic
  - Learning is performed external to the ES
  - The ANN itself is responsible for its knowledge acquisition during the training phase
  - Learning is always off-line in ES; knowledge remains static during operation
  - Learning in ANNs, although mostly off-line, can be on-line
- Deductive vs. inductive inferencing
  - Knowledge in an ES is always used in a deductive reasoning process
  - An ANN constructs its knowledge base inductively from examples, and uses it to produce decisions through generalisation

A comparison of ANN and ES (cont’d)
- Knowledge representation: explicit vs. implicit
  - ES store knowledge in explicit form; it is possible to inspect and modify individual rules
  - ANN knowledge is stored implicitly in the interconnection weight values
- Design issues: simple vs. complex
  - The technical side of ES development is relatively simple, without difficult design choices
  - The ANN design process is often one of trial and error

A comparison of ANN and ES (cont’d)
- User interface: white box vs. black box
  - ES have explanation capability
  - The difficulty of interpreting an ANN's knowledge base effectively makes it a black box to the user
- State of maturity and recognition: well-established vs. early
  - ES are already well established as a methodology in commercial applications
  - ANN recognition and development tools are at a relatively early stage

Hybrid systems
- Neuro-symbolic computing utilises the complementary nature of computing in neural networks (numerical) and expert systems (symbolic)
- Neuro-fuzzy systems combine neural networks with fuzzy logic
- ANNs can also be combined with genetic algorithm methodology
Hybrid ES-ANN systems
- The strengths of the ES can be utilised to overcome the weaknesses of an ANN-based system and vice versa, for example:
  - the ANN’s extraction of knowledge from data
  - the ES’s explanation capability

Hybrid ES-ANN systems
Rule extraction by inference justification in an ANN
- MACIE is an ANN-based decision support system described in (Gallant 1993)
- It extracts a single rule that justifies an inference in an ANN
- An inference in an ANN is represented by the output of a single node
- This output is based upon incomplete input values fed from a number of nodes, as shown in the diagram below
[diagram]

Hybrid ES-ANN systems (cont’d)
- A node u_i is defined to be a contributing node to node u_j if w_ij u_i ≥ 0

Hybrid ES-ANN systems (cont’d)
- In this example, the contributing variables are {u2, u3, u5, u6}
- The rule produced in this example is:
  IF u6 = Unknown AND u2 = TRUE AND u3 = FALSE AND u5 = TRUE THEN conclude u7 = TRUE

Hybrid ES-ANN systems (cont’d)
- One approach to hybrid systems divides a problem into tasks suitable for either ES or ANN
- These tasks are then performed by the appropriate methodology
- One example of such a system (Caudill 1991) is an intelligent system for delivering packages:
  - The ES performs the task of producing the best loading strategy for packages into trucks
  - The ANN works out the best route for delivering the packages efficiently
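
The contributing-node test from the rule-extraction slides can be sketched in a few lines. This illustrates the definition only and is not Gallant's MACIE implementation. It assumes the test reads w_ij * u_i >= 0, the reading under which an Unknown input (encoded as 0) counts as contributing, as u6 does in the example. TRUE, FALSE and Unknown are encoded as +1, -1 and 0. The weights, and the inputs u1 and u4, are hypothetical, since the diagram's values are not given in the text.

```python
TRUE, FALSE, UNKNOWN = 1, -1, 0
NAMES = {TRUE: "TRUE", FALSE: "FALSE", UNKNOWN: "Unknown"}

# Hypothetical weights into node u7 and hypothetical current input values.
WEIGHTS = {"u1": -2.0, "u2": 3.0, "u3": -2.0, "u4": 1.0, "u5": 4.0, "u6": -1.0}
INPUTS = {"u1": TRUE, "u2": TRUE, "u3": FALSE, "u4": FALSE, "u5": TRUE,
          "u6": UNKNOWN}


def contributing_nodes(weights, inputs):
    """Names of inputs u_i with w_ij * u_i >= 0 (assumed reading of the test)."""
    return [name for name in sorted(inputs)
            if weights[name] * inputs[name] >= 0]


def justify(target, target_value, weights, inputs):
    """Format an IF ... THEN rule from the contributing nodes."""
    conditions = [f"{name} = {NAMES[inputs[name]]}"
                  for name in contributing_nodes(weights, inputs)]
    return ("IF " + " AND ".join(conditions)
            + f" THEN conclude {target} = {NAMES[target_value]}")


rule = justify("u7", TRUE, WEIGHTS, INPUTS)
print(rule)
```

With these made-up values, u1 and u4 pull against the conclusion and are left out of the rule, while u2, u3, u5 and the Unknown u6 are kept. The conditions come out in name order, so the wording differs slightly from the rule quoted on the slide.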

Hybrid ES-ANN systems (cont’d)
- Hybrid ES-ANN systems with ANNs embedded within expert systems
  - The ANN is used to determine which rule to fire, given the current state of facts
- Another approach to hybrid ES-ANN uses an ANN as a preprocessor
  - One or more ANNs produce classifications
  - Numerical outputs produced by the ANN are interpreted symbolically by an ES as facts
  - The ES applies the facts for deductive reasoning

Case Study
Case: Application of ANNs in bankruptcy prediction (Coleman et al, AI Review, Summer 1991, in Zahedi 1993)
- Predicts banks that were certain to fail within a year
- Predicts certainty given to bank examiners dealing with the bank in question
- The ANN has 11 inputs, each of which is a ratio developed by Peat Marwick
- Developed by NeuralWare’s Application Development Services and Support Group (ADSS)
- Software used: the NeuralWorks Professional neural network development system
- Uses the standard backpropagation (multilayer perceptron) network

Case Study (cont’d)
- The 11 inputs are connected to a single hidden layer, which in turn is connected to a single node in the output layer
- The network outputs a single value denoting whether the bank would or would not fail within that calendar year
- Employed the hyperbolic-tangent transfer function and a proprietary error function created by the ADSS staff
- Trained on a set of 1,000 examples, 900 of which were viable banks and 100 of which were banks that had actually gone bankrupt
- Training consisted of about 50,000 iterations of the training set
- Predicted 50% of banks that are viable, and 99% of banks that actually failed

REFERENCES
AI Expert (special issue on ANN), June 1990.
Bailey, D., & Thompson, D., "How to Develop Neural Network Applications", AI Expert, June 1990, pp. 38-47.
BYTE (special issue on ANN), Aug. 1989.
Caudill, M., "Expert networks", BYTE, October 1991, pp. 109-116.
Caudill, M., "The View from Now", AI Expert, June 1992, pp. 27-31.
Caudill & Butler, Naturally Intelligent Systems, MIT Press, 1989, pp. 227-240.
Dhar, V., & Stein, R., Seven Methods for Transforming Corporate Data into Business Intelligence, Prentice Hall, 1997.
Gallant, S., Neural Network Learning and Expert Systems, MIT Press, 1993.
Kirrmann, H., "Neural Computing: The new gold rush in informatics", IEEE Micro, June 1989, pp. 7-9.
Lippmann, R.P., "An Introduction to Computing with Neural Nets", IEEE ASSP Magazine, April 1987, pp. 4-21.
Lisboa, P. (Ed.), Neural Networks: Current Applications, Chapman & Hall, 1992.
Medsker, L., Hybrid Intelligent Systems, Kluwer Academic Press, Boston, 1995.
Negnevitsky, M., Artificial Intelligence: A Guide to Intelligent Systems, Addison-Wesley, 2005.
Zahedi, F., Intelligent Systems for Business, Wadsworth Publishing, Belmont, California, 1993.