Applications of Artificial Intelligence: Neural Networks
©J.K. Debenham, 2003

Slide 1: Outline: ► The Problem · Introduction · Prediction Problems · What Is a Neural Net? · Using Neural Nets · Summary

Slide 2: The Problem
• To what extent can models of the human central nervous system lead us to architectures that will be suitable for managing intelligence?

Slide 3: Outline: The Problem · ► Introduction · Prediction Problems · What Is a Neural Net? · Using Neural Nets · Summary

Slide 4: Artificial neural networks
• a class of very powerful, general-purpose tools readily applied to prediction, classification, and clustering
• applied across a broad range of industries: predicting financial series, diagnosing medical conditions, identifying clusters of valuable customers, detecting fraudulent credit-card transactions, recognising numbers written on cheques, and predicting the failure rates of engines

Slide 5: Neural networks
• the appeal of neural networks is that they model, on a digital computer, the neural connections in human brains
• when used in well-defined domains, their ability to generalise and learn from data mimics our own ability to learn from experience

Slide 6: NNs: drawback
• the result of training a neural network is a set of internal weights distributed throughout the network
• these weights provide no more insight into why the solution is valid than asking many human experts why a particular decision is the right decision
Slide 7: NNs
• the weights are not readily understandable, although increasingly sophisticated techniques for probing into neural networks help provide some explanation
– neural networks are best approached as black boxes with mysterious internal workings

Slide 8: History of NNs
• 1943: Warren McCulloch, a neurophysiologist, and Walter Pitts, a logician, postulated a simple model to explain how biological neurons work
• 1950s: when digital computers first became available, computer scientists implemented models called perceptrons, based on the work of McCulloch and Pitts
– results were disappointing for general problem-solving

Slide 9: History of NNs
• 1969: Seymour Papert and Marvin Minsky showed these simple networks had theoretical deficiencies, and work practically stopped, until:
• 1982: John Hopfield revived the field; backpropagation, a way of training neural networks that sidesteps the theoretical pitfalls of the earlier approaches, was popularised soon afterwards by Rumelhart, Hinton, and Williams (1986)

Slide 10: History of NNs
• 1980s: statisticians developed the technique called 'logistic regression'
• the entire theory of neural networks can be explained using statistical methods (probability distributions, likelihoods, and so on)
– leading to some ill-founded criticism

Slide 11: History of NNs
• 1980s: neural networks became very popular, if only because they work
– this popularity shows no sign of slowing

Slide 12: Outline: The Problem · Introduction · ► Prediction Problems · What Is a Neural Net? · Using Neural Nets · Summary
Slide 13: NNs example: Real Estate Appraisal
• the Federal Home Loan Mortgage Corporation has developed a product called Loan Prospector that does real estate appraisals automatically for homes throughout the United States
• Loan Prospector is based on neural network technology provided by HNC, Inc.

Slide 14: Appraisers combine the features of a house to come up with a valuation

Slide 15: The appraiser
• does not apply some set formula
• balances experience and knowledge of the sale prices of similar homes
– knowledge about housing prices is not static
• is aware of recent sale prices for homes throughout the region and can recognise trends

Slide 16: 1992: researchers at IBM recognised this as a good problem for neural networks

Slide 17: Prediction problems are well suited to neural networks if:
• inputs are well understood: you have a good idea of which features are important, but not necessarily how to combine them
• output is well understood: you know what you are trying to predict
• experience is available: you have plenty of examples where both the inputs and the output are known

Slide 18: Possible features (i.e. inputs) to appraise values in a single area (shown as a figure in the original slides)

Slide 19: To extend to handle homes in many neighbourhoods, include:
– ZIP code information
– neighbourhood demographics
– neighbourhood quality-of-life indicators, such as ratings of schools and proximity to transportation
Slide 20: To train the neural network we need the sales price of the home and when it sold

Slide 21: Problem
• neural networks work best when all the input and output values are between 0 and 1
• we have to massage all the values, both continuous and categorical, to get new values between 0 and 1
– e.g. marital status, gender, account status, product code, vendor id, and so on are categorical values

Slide 22: Massaging the values
• continuous values: subtract the lower bound of the range from the value and divide the result by the size of the range
• categorical values: assign fractions between 0 and 1

Slide 23: (figure in the original slides)

Slide 24: Training the network
• repeatedly feed the examples in the training set through the neural network
• the network compares its predicted output value to the actual sales price and adjusts all its internal weights to improve the prediction
• aim: to calculate a good set of weights

Slide 25: Building a prediction model
1. Identify input and output features
2. Massage the values to [0, 1]
3. Select a type of network
4. Train the network
5. Test the network; if necessary, repeat the training
6. Apply the model generated by the network to predict outcomes for unknown inputs

Slide 26: Maintaining a prediction model
• either repeatedly retrain the network with new data
– OK if the network only needs to tweak its results
• or start over by adding new examples into the training set (perhaps removing older examples) and training an entirely new network
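The continuous-value recipe above (subtract the lower bound, then divide by the size of the range) is one line of code. A minimal sketch in Python; the function names and the square-metre example are mine, not from the slides:

```python
def scale(value, lo, hi):
    """Min-max scaling: map a value in [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert the scaling, e.g. to turn a network output back into a price."""
    return scaled * (hi - lo) + lo

# e.g. living area known to lie between 50 and 500 square metres
print(scale(275, 50, 500))    # 0.5
print(unscale(0.5, 50, 500))  # 275.0
```

The same pair of functions also handles the output: the network predicts in [0, 1] and `unscale` converts back to the original range.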
Slide 27: Warning
• a neural network is only as good as its training set
• the model is static and must be explicitly maintained with recent examples

Slide 28: Outline: The Problem · Introduction · Prediction Problems · ► What Is a Neural Net? · Using Neural Nets · Summary

Slide 29: What Is a Neural Net?
• to start playing, you don't need to know!
– there are many tools, both freeware and commercial off-the-shelf products
– some let you train networks and use them with no more knowledge than was needed for the real estate appraisal example

Slide 30: Neural net
• a set of basic units modelled on biological neurons
• each unit has many inputs that it combines into a single output value
• these units are connected together, so the outputs from some units are used as inputs to other units

Slide 31: Feed-forward neural networks
• there is a one-way flow through the network, from the inputs to the outputs, with no cycles
• the simplest and most useful type of network

Slide 32: (figure in the original slides)

Slide 33: Three basic questions
• What are units and how do they behave? What is the "activation function"?
• How are the units connected together? What is the "topology" of a network?
• How does the network learn to recognise patterns? What is "backpropagation"?
Slide 34: The activation function has two parts:
• the combination function, which merges all the inputs into a single value
• the transfer function, which transfers the value of the combination function to the output of the unit

Slide 35: Units are designed to model the behaviour of biological neurons

Slide 36: Combination function
• the most common is the weighted sum, where each input is multiplied by its weight and these products are added together
• alternatives include the maximum of the weighted inputs, the minimum, the logical AND or OR of the values, etc.

Slide 37: Transfer functions
• sigmoid: the most common transfer function
• hyperbolic tangent
• linear: of limited value, since a feed-forward neural network consisting only of units with linear transfer functions is really just doing a linear regression

Slide 38: (figure in the original slides: sigmoid, linear, and hyperbolic-tangent transfer functions)

Slide 39: Sigmoid function
• almost linear in the middle, then it gradually saturates (to 0 and 1 for the logistic sigmoid, to -1 and 1 for the hyperbolic tangent)
• corresponds to a gradual movement from a linear model to a non-linear model, so it copes with linear, near-linear, and non-linear problems

Slide 40: Transfer functions
• a network can have units with different transfer functions
• the default transfer function in most off-the-shelf tools is the sigmoid
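The unit just described, a weighted-sum combination function followed by a sigmoid transfer function, can be sketched in a few lines. This is a minimal illustration; the function name, the example inputs, and the separate bias weight are my own choices:

```python
import math

def unit_output(inputs, weights, bias_weight):
    """One network unit: combination function (weighted sum, plus the
    constant 'bias' input fixed at 1) followed by the logistic sigmoid."""
    combined = sum(x * w for x, w in zip(inputs, weights)) + bias_weight
    return 1.0 / (1.0 + math.exp(-combined))  # output always lies in (0, 1)

print(unit_output([0.2, 0.8], [0.5, -0.25], 0.1))  # 0.5
```

Swapping the last line of `unit_output` for `math.tanh(combined)` or plain `combined` gives the hyperbolic-tangent and linear transfer functions from the slides.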
Slide 41: Example feed-forward network topology, with three layers
• the input layer is connected to the inputs, whose values have been massaged to fall between 0 and 1; each input unit is connected to exactly one source
– so in this example the input layer does not actually do any work
– in more complicated networks, input layers play a more significant role

Slide 42: Three layers /contd
• the hidden layer is fully connected to all the units in the input layer
• units in the hidden layer calculate their output by multiplying the value on each input by its corresponding weight, adding these products up, and applying the sigmoid function
• in general, one hidden layer is sufficient

Slide 43: Three layers /contd: how wide should the hidden layer be?
• the wider the layer, the greater the capacity of the network to recognise patterns
• but a wide network can recognise patterns-of-one by memorising each of the training examples
• so we want the network to generalise from the training set, not to memorise it

Slide 44: Three layers /contd
• the hidden layer's inputs include a constant input that is always set to 1
• like the other inputs, it has a weight and is included in the combination function
• it acts as a global offset that helps the network better fit patterns

Slide 45: Three layers /contd
• the output layer is connected to the output of the neural network, and is fully connected to all the units in the hidden layer
• if the neural network is being used to calculate a single value, then there is only one unit in the output layer, and the value that it produces will lie between 0 and 1
Slide 46: (figure in the original slides)

Slide 47: Three layers /contd: more than one unit in the output layer
• e.g. a department store chain wants to predict the likelihood that customers will purchase products from three departments
• set up a neural network with three outputs, one for each department

Slide 48: (figure in the original slides)

Slide 49: How can the department store determine the right promotion to offer a customer?
• take the department corresponding to the unit with the maximum value
• take the departments corresponding to the units with the top three values
• take all departments corresponding to units that exceed some threshold value
• take all departments corresponding to units within some percentage of the unit with the maximum value
• there is no 'correct' answer

Slide 50: How do neural networks learn?
• aim: to set the best weights on the inputs of each of the units
• approach: use the training set to produce weights where the output of the network is as close to the desired output as possible, for as many of the examples in the training set as possible

Slide 51: How do neural networks learn? Backpropagation:
1. given a training example, calculate the output
2. backpropagation then calculates the error, i.e. the difference between the calculated and correct results
3. the error is fed back through the network, and the weights are adjusted to minimise the error
Slide 52: Backpropagation
• the critical part: using the error measure to adjust the weights
• each unit is assigned a specific responsibility for the error
• e.g. in the output layer, one unit is responsible for the whole error; this unit assigns responsibility for part of the error to each of its inputs, which come from units in the hidden layer, and so on

Slide 53: Backpropagation
• given the error, how does a unit adjust its weights? It starts by measuring how sensitive its output is to each of its inputs
• then it adjusts each weight to reduce, but not eliminate, the error: the generalised delta rule
– the whole thing is a complicated mathematical procedure

Slide 54: The generalised delta rule has two important parameters:
– the momentum
– the learning rate

Slide 55: Momentum
• concerns the tendency of the weights inside each unit to change the "direction" they are heading in (i.e. whether they are getting bigger or smaller)
• momentum tries to keep each weight moving in the same direction
• a network with high momentum responds slowly to new examples

Slide 56: Learning rate
• controls how quickly the weights change
• best approach: start big and decrease it slowly as the network is being trained
• initially the weights are random, so large oscillations are useful to get into the vicinity of the best weights

Slide 57: Danger: falling into a local optimum
• analogous to trying to climb to the top of a mountain and finding that you have only climbed to the top of a nearby hill
• controlling the learning rate and momentum helps to find the best solution
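A minimal sketch of the weight update described above, reduced to a single linear unit so that both parameters are visible. This is an illustration under simplifying assumptions (one unit, squared error on one example); the function and variable names are mine, not from the slides:

```python
def delta_rule_step(weights, velocity, inputs, target, learning_rate, momentum):
    """One update of a single linear unit under the generalised delta rule:
    each weight's step = learning_rate * error * input + momentum * previous step."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    for i, x in enumerate(inputs):
        step = learning_rate * error * x + momentum * velocity[i]
        velocity[i] = step          # remembered step: this is the momentum term
        weights[i] += step
    return error

# repeated passes over one training example drive the error towards zero
weights, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(50):
    error = delta_rule_step(weights, velocity, [1.0, 0.5], 1.0, 0.2, 0.5)
print(abs(error))   # residual error is now tiny
```

High momentum keeps each weight moving in its current direction; a large learning rate makes the early steps big, which is why the slides suggest starting big and decreasing it as training proceeds.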
Slide 58: Training using genetic algorithms
• becoming increasingly popular, both for finding the weights and for choosing the topology
• but you'll have to wait!

Slide 59: Outline: The Problem · Introduction · Prediction Problems · What Is a Neural Net? · ► Using Neural Nets · Summary

Slide 60: Using Neural Nets
• Training Set
• Preparing the Data
• Interpreting the Results
• NNs for Time Series
• NNs for Data Mining

Slide 61: Choosing the training set: points to consider
• coverage of the values for all the features
• the number of features
• the number of inputs
• the number of outputs
• available computational power

Slide 62: Training set: coverage of the values for all the features
• the training set should cover the full range of values for all the features that the network might encounter
• it is a good idea to have several examples in the training set for each value of a categorical feature, and for a range of values of ordered discrete and continuous features

Slide 63: Training set: number of features
• training time is directly related to the number of input features
• as the number of features increases, the network becomes more likely to converge to an inferior solution
• e.g. first use statistical correlation or decision trees to determine which features are likely to be more important for predictive purposes
Slide 64: Training set: number of inputs
• the more features there are, the more training examples are needed to get good coverage of patterns
• rule of thumb: a minimum of 10 to 20 examples for each feature; having several hundred is not unreasonable

Slide 65: Training set: number of outputs
• there must be many examples for all possible output values from the network
• the number of training examples for each possible output should be about the same

Slide 66: Training set
• if the network is to be used to detect rare events, then you need to be sure that the training set has a sufficient number of examples of these rare events
• a random sample of the data is not sufficient, since common examples will swamp the rare ones; the training set needs to oversample the rare cases

Slide 67: Training set: computational power
• the standard algorithm for training a neural network requires passing through the training set dozens or hundreds of times before the network converges on its optimal weights

Slide 68: Preparing the data: features with continuous values
• dollar amounts
• averages
• ratios
• physical measurements

Slide 69: Preparing the data: the problem of a skewed distribution, e.g. 'income' (figure in the original slides)

Slide 70: Skewed distribution /contd: one solution is to discretise:
$10,000–$17,999: very low
$18,000–$31,999: low
$32,000–$63,999: middle
$64,000–$99,999: high
$100,000 and above: very high
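The oversampling advice above (rare events must not be swamped by common examples) can be sketched naively by duplicating the rare cases; real tools more often use weighted sampling. The fraud-style data and the function names here are invented for illustration:

```python
import random

def oversample(examples, label_of, rare_label, factor, seed=0):
    """Duplicate examples of the rare class `factor` times so the
    common class does not swamp them in the training set."""
    random.seed(seed)
    rare = [e for e in examples if label_of(e) == rare_label]
    boosted = examples + rare * (factor - 1)
    random.shuffle(boosted)   # spread the rare copies through the set
    return boosted

data = [("txn", 0)] * 95 + [("txn", 1)] * 5       # 5% rare (e.g. fraud)
boosted = oversample(data, lambda e: e[1], 1, 10)  # duplicate rares 10x
print(sum(1 for e in boosted if e[1] == 1) / len(boosted))  # now about a third
```

After training on oversampled data, remember that the network's output probabilities reflect the boosted class balance, not the true one.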
Slide 71: Skewed distribution /contd: another solution is to apply a function such as the logarithm (figure in the original slides)

Slide 72: Preparing the data: features with ordered, discrete values
• like continuous features, these have a maximum and a minimum value, so the scaling formula may be applied: easy

Slide 74: Ordered, discrete values /contd
• if a difference at one end of the scale is less significant than at the other end, try thermometer codes, read as binary fractions:
0 → 1 0 0 0 0 = 0.5000
1 → 1 1 0 0 0 = 0.7500
2 → 1 1 1 0 0 = 0.8750
3 → 1 1 1 1 0 = 0.9375

Slide 75: Preparing the data: features with categorical values
• e.g. gender, marital status, status codes, product codes, ZIP codes

Slide 76: Categorical values /contd: first method
• treat the codes as discrete, ordered values, e.g. map "single", "divorced", "married", "widowed", and "unknown" to 0.00, 0.25, 0.50, 0.75, and 1.00
• BUT then "single" and "unknown" are very far apart, whereas "divorced" and "married" are quite close

Slide 77: Categorical values /contd: second method: break the categories into flags (1-of-N coding):

Gender    Male flag  Female flag  Unknown flag
Male      1.000      0.000        0.000
Female    0.000      1.000        0.000
Unknown   0.000      0.000        1.000

• one input has become three!
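Both codings above can be reproduced in a few lines. A minimal sketch; the function names are mine, and `thermometer` reads the bit pattern as a binary fraction, exactly as in the table:

```python
def thermometer(value, width=5):
    """Thermometer code: set the top value+1 bits of a width-bit code,
    then read the bits as a binary fraction (bit i is worth 2**-(i+1))."""
    bits = [1] * (value + 1) + [0] * (width - value - 1)
    return sum(b * 2 ** -(i + 1) for i, b in enumerate(bits))

def one_of_n(category, categories):
    """1-of-N coding: one flag input per category."""
    return [1.0 if c == category else 0.0 for c in categories]

print(thermometer(0))  # 0.5
print(thermometer(3))  # 0.9375
print(one_of_n("Female", ["Male", "Female", "Unknown"]))  # [0.0, 1.0, 0.0]
```

Note how thermometer codes make adjacent low values (0.5000 vs 0.7500) much further apart than adjacent high values (0.8750 vs 0.9375), which is exactly the asymmetry the slide wants.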
Slide 78: Categorical values /contd: or use 1-of-(N-1) coding: but one input variable has still become two!

Slide 79: Interpreting the results
• for continuous values: just convert from [0, 1] back to the correct range
• for non-continuous values: not so easy; for a binary output, 0.1 → 0 and 0.9 → 1, but what about 0.5?

Slide 80: Interpreting non-continuous values
• method 1: if ≤ 0.5 then 0, otherwise 1
• method 2: if < 0.33 then 0; if > 0.66 then 1; otherwise 'unknown'
• method 3: confidence levels (figure in the original slides)

Slide 81: Interpreting non-continuous values
• method 4: output 1 is the confidence level for '0', output 2 is the confidence level for '1'
– but what do we do with the output [0.1, 0.3]?

Slide 82: Dealing with non-continuous values: one approach
• first: use the training set to train the network
• second: use a 'test set' to calibrate the outputs
• the 'test set' is usually distinct from the 'training set'

Slide 83: Neural networks for time series
• to train on the time-series data, start training at the oldest point
• then move to the second-oldest point; the oldest point goes to the next set of units in the input layer, and so on
• train like a feed-forward, backpropagation network, trying to predict the next value in the series at each step

Slide 84: (figure in the original slides)
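The time-series training scheme above amounts to sliding a fixed-width window along the series, with each window predicting the next value. A minimal sketch; the window width of 3 and the toy series are arbitrary choices of mine:

```python
def windows(series, width):
    """Turn a time series into (inputs, target) training pairs:
    each window of `width` past values predicts the next value."""
    pairs = []
    for i in range(len(series) - width):
        pairs.append((series[i:i + width], series[i + width]))
    return pairs

series = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # already scaled to [0, 1]
for inputs, target in windows(series, 3):
    print(inputs, "->", target)
# [0.1, 0.2, 0.3] -> 0.4, then [0.2, 0.3, 0.4] -> 0.5, and so on
```

Each pair is then fed to an ordinary feed-forward, backpropagation network; multiple series can be handled by widening each window with the extra inputs.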
Slide 85: Neural networks for time series
• not limited to data from just a single time series; the network can take multiple inputs
• remember: do you expect the inputs to functionally determine the outputs?

Slide 86: Neural networks for time series: given (figure in the original slides)

Slide 87: Neural networks for time series: use (figure in the original slides)

Slide 88: Feed-forward, backpropagation networks: how many units in the hidden layer?
• at most twice the number of input units
• start with the same number as the input units and increase if necessary
• if using the network for a classification problem, have one hidden unit for each class

Slide 89: How big should the training set be?
• for s input units, h hidden units, and 1 output unit, there are h × (s + 1) + h + 1 weights to determine
• the size of the training set should be 5 to 10 times the number of weights

Slide 90: Learning rate and momentum parameters?
• start with a high learning rate and decrease it
• momentum?

Slide 91: What is going on inside a neural network?
• at present we can't extract rules, but we can do a sensitivity analysis of the relative importance of the inputs
– this is often good enough

Slide 92: Outline: The Problem · Introduction · Prediction Problems · What Is a Neural Net? · Using Neural Nets · ► Summary
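The sizing rule above is easy to turn into a small calculator: each of the h hidden units has s inputs plus a bias weight, and the single output unit has h inputs plus a bias. A sketch, with the 5x to 10x multiplier taken from the rule of thumb above:

```python
def weight_count(s, h):
    """Weights in a three-layer net with s inputs, h hidden units, 1 output:
    h hidden units with (s + 1) weights each, plus (h + 1) output weights."""
    return h * (s + 1) + h + 1

def training_set_size(s, h):
    """Rule of thumb: the training set should be 5 to 10 times the weights."""
    w = weight_count(s, h)
    return 5 * w, 10 * w

print(weight_count(10, 10))        # 121 weights
print(training_set_size(10, 10))   # (605, 1210) training examples
```

This makes concrete why adding features is expensive: every extra input adds h more weights, and therefore 5h to 10h more training examples.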
Slide 93: Strengths of neural networks
• they can handle a wide range of problems
• they produce good results even in complicated domains
• they handle both categorical and continuous variables
• they are available in many off-the-shelf packages

Slide 94: Weaknesses of neural networks
• they require inputs in the range 0 to 1
• they cannot explain their results
• they may converge prematurely to an inferior solution