Download Artificial Intelligence

Artificial Intelligence
Lecture 9 [Part I]: Selected Topics on Neural Networks
Faculty of Mathematical Sciences, 4th 5th IT
Elmuntasir Abdallah Hag Eltom
http://www.rational-team.com/muntasir

Lecture Objectives
• Introduces the relationship between biological neurons, which make up human brains, and artificial neurons, which are used in artificial neural networks.
• McCulloch and Pitts neurons are explained, and the capabilities and limitations of perceptrons are examined.
• Multilayer neural networks are explored, and the backpropagation algorithm for supervised learning in multilayer networks is explained.
• Recurrent networks, such as Hopfield networks and other bidirectional associative memories, are also explained.
• Unsupervised learning is explained through the use of Kohonen maps and Hebb’s law.

Neural Networks Simplified
• Although the neural networks presented in this chapter are very simplistic, real-world networks can be extremely complex, consisting of hundreds or even thousands of neurons. Networks of this size can often appear like a “black box,” in the sense that it is not clear why they behave in the way they do. In fact, the behavior of complex neural networks is often emergent.

Neurons: Biological Neurons
• The human brain contains over ten billion neurons, each of which is connected, on average, to several thousand other neurons.
• These connections are known as synapses, and the human brain contains about 60 trillion such connections.
• Neurons are in fact very simple processing elements. Each neuron contains a soma, which is the body of the neuron, an axon, and a number of dendrites.
[Figure: simplified diagram of a biological neuron]
• The neuron receives inputs from other neurons along its dendrites, and when this input signal exceeds a certain threshold, the neuron “fires”: a chemical reaction occurs, which causes an electrical pulse, known as an action potential, to be sent down the axon (the output of the neuron), toward synapses that connect the neuron to the dendrites of other neurons.
• Although each neuron individually is extremely simple, this enormously complex network of neurons is able to process information at a great rate and of extraordinary complexity.
• The human brain far exceeds in complexity any device created by man, or indeed any naturally occurring object or structure in the universe, as far as we are aware today.
• The human brain has a property known as plasticity, which means that neurons can change the nature and number of their connections to other neurons in response to events that occur.
• In this way, the brain is able to learn. The brain uses a form of credit assignment to strengthen the connections between neurons that lead to correct solutions to problems and to weaken the connections that lead to incorrect solutions.
• The strength of a connection, or synapse, determines how much influence it will have on the neurons to which it is connected, and so if a connection is weakened, it will play less of a role in subsequent computations.

Neurons: Artificial Neurons
• Artificial neural networks are modeled on the human brain and consist of a number of artificial neurons.
• Neurons in artificial neural networks tend to have fewer connections than biological neurons, and neural networks are all (currently) significantly smaller in terms of number of neurons than the human brain.
• Each neuron (or node) in a neural network receives a number of inputs.
• A function called the activation function is applied to these input values. The result is the activation level of the neuron, which is its output value. There are a number of possible functions that can be used in neurons.
• Some of the most commonly used activation functions: [Figure: commonly used activation functions]
• In the step function (linear threshold function), the inputs to the neuron are each multiplied by a weight and summed, and this sum is compared with a threshold, t. If the sum is greater than the threshold, then the neuron fires and has an activation level of +1. Otherwise, it is inactive and has an activation level of zero. (In some networks, when the sum does not exceed the threshold, the activation level is considered to be -1 instead of 0.)
• Hence, the behavior of the neuron can be expressed as follows:
  X = Σ (i = 1 to n) wi xi
  Y = +1 if X > t
  Y = 0 if X <= t
• X is the weighted sum of the n inputs to the neuron, x1 to xn, where each input xi is multiplied by its corresponding weight wi. For example, let us consider a simple neuron that has just two inputs. Each of these inputs has a weight associated with it, as follows:
  w1 = 0.8
  w2 = 0.4
• The inputs to the neuron are x1 and x2:
  x1 = 0.7
  x2 = 0.9
• So, the weighted sum of these inputs is
  (0.8 x 0.7) + (0.4 x 0.9) = 0.92
• The activation level, Y, is defined for this neuron as
  Y = +1 if X > t
  Y = 0 if X <= t
  Hence, if t is less than 0.92, then this neuron will fire with this particular set of inputs. Otherwise, it will have an activation level of zero.
• A neural network consists of a set of neurons that are connected together.
• The connections between neurons have weights associated with them, and each neuron passes its output on to the inputs of the neurons to which it is connected. This output depends on the application of the activation function to the inputs it receives.
In this way, an input signal to the network is processed by the entire network and an output (or multiple outputs) is produced.
• There is no central processing or control mechanism; the entire network is involved in every piece of computation that takes place.
• The way in which neurons behave over time is particularly interesting.
• When an input is given to a neural network, the output does not appear immediately because it takes some finite period of time for signals to pass from one neuron to another.
• In artificial neural networks this time is usually very short, but in the human brain, neural connections are surprisingly slow. It is only the enormously parallel nature of the brain that enables it to calculate so quickly.
• For neural networks to learn, the weight associated with each connection (equivalent to a synapse in the biological brain) can be changed in response to particular sets of inputs and events.
• Hebbian learning involves increasing the weight of a connection between two neurons if both neurons fire at the same time.

Perceptrons
• The perceptron, which was first proposed by Rosenblatt (1958), is a simple neuron that is used to classify its inputs into one of two categories.
• A perceptron uses a step function that returns +1 if the weighted sum of the inputs, X, is greater than a threshold, t, and -1 if X is less than or equal to t:
  X = Σ (i = 1 to n) wi xi
  Y = +1 if X > t
  Y = -1 if X <= t
• Writing the step function as Step and folding the threshold into the weights, the activation function for a perceptron can be written as:
  Y = Step (Σ (i = 0 to n) wi xi)
• Note that here we have allowed i to run from 0 instead of from 1. This means that we have introduced two new variables: w0 and x0. We define x0 as 1, and w0 as -t.
• A single perceptron can be used to learn a classification task, where it receives an input and classifies it into one of two categories: 1 or 0. We can consider these to represent true and false, in which case the perceptron can learn to represent a Boolean operator, such as AND or OR.
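The step neuron and the perceptron form described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (weighted_sum, step, perceptron_output) are assumptions introduced here and do not come from the lecture. It reuses the two-input example with w1 = 0.8, w2 = 0.4, x1 = 0.7, and x2 = 0.9.

```python
def weighted_sum(weights, inputs):
    """Return X, the sum of each input multiplied by its weight."""
    return sum(w * x for w, x in zip(weights, inputs))


def step(x, threshold=0.0, inactive=0):
    """Linear threshold function: 1 if x exceeds the threshold, else `inactive`."""
    return 1 if x > threshold else inactive


def perceptron_output(weights, inputs, threshold):
    """Perceptron form: fold the threshold in as w0 = -t with x0 = 1.

    The step is then taken against 0, and an inactive neuron reports -1.
    """
    augmented_weights = [-threshold] + list(weights)
    augmented_inputs = [1.0] + list(inputs)
    return step(weighted_sum(augmented_weights, augmented_inputs), 0.0, inactive=-1)


# Worked example from the slides: X is approximately 0.92.
X = weighted_sum([0.8, 0.4], [0.7, 0.9])
print(round(X, 2))
print(step(X, threshold=0.5))   # t < 0.92, so the neuron fires
print(step(X, threshold=1.0))   # t >= 0.92, so the neuron stays inactive
```

Moving the threshold into w0 means the same weighted-sum code handles the threshold, which is why the learning rule can treat it like any other weight.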
Learning Process of a Perceptron
• First, random weights are assigned to the inputs. Typically, these weights will be chosen between -0.5 and +0.5.
• Next, an item of training data is presented to the perceptron, and its output classification is observed. If the output is incorrect, the weights are adjusted so that this input is classified more accurately. In other words, if the perceptron incorrectly classifies a positive piece of training data as negative, then the weights need to be modified to increase the output for that set of inputs.
• To increase the output, a positive value is added to the weight of an input that had a positive input value, and a negative value is added to the weight of an input that had a negative input value.
• The formula for this modification, as proposed by Rosenblatt (Rosenblatt 1960), is as follows:
  wi ← wi + (a x xi x e)
• where e is the error that was produced, and a is the learning rate, where 0 < a < 1; e is defined as 0 if the output is correct, and otherwise it is positive if the output is too low and negative if the output is too high.
• In this way, if the output is too high, a decrease in weight is caused for an input that received a positive value. This rule is known as the perceptron training rule.
• Once this modification to the weights has taken place, the next piece of training data is used in the same way.
• Once all the training data have been applied, the process starts again, until all the weights are correct and all errors are zero.
• Each iteration of this process is known as an epoch.
• Let us examine a simple example: we will see how a perceptron can learn to represent the logical-OR function for two inputs. We will use a threshold of zero (t = 0) and a learning rate of 0.2.
• First, the weight associated with each of the two inputs is initialized to a random value between -1 and +1:
  w1 = -0.2
  w2 = 0.4
• Now, the first epoch is run through.
The training data will consist of the four combinations of 1’s and 0’s possible with two inputs.
• Hence, our first piece of training data is
  x1 = 0
  x2 = 0
  and our expected output is x1 ∨ x2 = 0.
• We apply our formula for Y:
  Y = Step ((-0.2 x 0) + (0.4 x 0)) = Step (0) = 0
• Hence, the output Y is as expected, and the error, e, is therefore 0. So the weights do not change.
• The same goes for the next case, x1 = 0 and x2 = 1: Y = Step ((-0.2 x 0) + (0.4 x 1)) = Step (0.4) = 1, which matches 0 ∨ 1 = 1.
• Now consider the case x1 = 1 and x2 = 0.
• We apply our formula for Y:
  Y = Step ((-0.2 x 1) + (0.4 x 0)) = Step (-0.2) = 0
• This is incorrect because 1 ∨ 0 = 1, so we should expect Y to be 1 for this set of inputs. Hence, the weights are adjusted.
• We will use the perceptron training rule to assign new values to the weights. Weight adjustment formula:
  wi ← wi + (a x xi x e)
• Our learning rate is 0.2, and in this case e is 1, so we will assign the following value to w1:
  w1 = -0.2 + (0.2 x 1 x 1) = -0.2 + 0.2 = 0
• We now use the same formula to assign a new value to w2:
  w2 = 0.4 + (0.2 x 0 x 1) = 0.4
• Because the input x2 was 0, w2 did not contribute to this error, and so it is not adjusted.
• The final piece of training data is now used (x1 = 1 and x2 = 1):
  Y = Step ((0 x 1) + (0.4 x 1)) = Step (0 + 0.4) = Step (0.4) = 1
• This is correct, and so the weights are not adjusted.
• This is the end of the first epoch, and at this point the method runs again and continues to repeat until all four pieces of training data are classified correctly.
[See perceptron example]

Perceptrons
• A perceptron can be trained to model other logical functions such as AND, but there are some functions that cannot be modeled using a perceptron, such as exclusive OR.
• The reason for this is that perceptrons can only learn to model functions that are linearly separable.
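The training procedure walked through above, and the exclusive-OR limitation, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the slide's settings (t = 0, learning rate 0.2, starting weights -0.2 and 0.4) and a step function that returns 1 or 0, and it takes the error e as expected output minus actual output. The function names and the max_epochs cutoff are assumptions added here for illustration.

```python
def step(x, threshold=0.0):
    """Return 1 if x exceeds the threshold, otherwise 0."""
    return 1 if x > threshold else 0


def train_perceptron(samples, weights, rate=0.2, threshold=0.0, max_epochs=100):
    """Apply the rule wi <- wi + (a x xi x e) until an epoch has no errors.

    Returns (weights, epochs_run, converged). max_epochs is a safety cutoff
    (an assumption of this sketch) for data that can never be learned.
    """
    weights = list(weights)
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, expected in samples:
            total = sum(w * x for w, x in zip(weights, inputs))
            e = expected - step(total, threshold)  # 0 when the output is correct
            if e != 0:
                errors += 1
                weights = [w + rate * x * e for w, x in zip(weights, inputs)]
        if errors == 0:
            return weights, epoch, True
    return weights, max_epochs, False


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

# OR is linearly separable, so training settles (here in the third epoch).
print(train_perceptron(OR_DATA, [-0.2, 0.4]))
# XOR is not linearly separable, so the cutoff is reached without converging.
print(train_perceptron(XOR_DATA, [-0.2, 0.4])[2])
```

With the threshold fixed at zero and no adjustable w0, this sketch can only learn functions separable by a line through the origin, so it is not a general perceptron implementation.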
• A linearly separable function is one whose inputs can be plotted on a two-dimensional graph and divided by a single straight line, so that inputs classified into one class are on one side of the line and inputs classified into the other class are on the other side.
[Figure: linearly separable functions]
• The reason that a single perceptron can only model functions that are linearly separable can be seen by examining the following function:
  X = Σ (i = 1 to n) wi xi
  Y = +1 if X > t
  Y = -1 if X <= t
• Using these functions, we are effectively dividing the search space using a line for which X = t. Hence, in a perceptron with two inputs, the line that divides one class from the other is defined as follows:
  w1x1 + w2x2 = t
• The perceptron functions by identifying a set of values for wi that generates a suitable function. In cases where no such linear function exists, the perceptron cannot succeed.

Multilayer Neural Networks
• Most real-world problems are not linearly separable, and so although perceptrons are an interesting model for studying the way in which artificial neurons can work, something more powerful is needed.
• As has already been indicated, neural networks consist of a number of neurons that are connected together, usually arranged in layers.
• A single perceptron can be thought of as a single-layer perceptron. Multilayer perceptrons are capable of modeling more complex functions, including ones that are not linearly separable, such as the exclusive-OR function.
[Figure: a simple three-layer feed-forward network]
• Feed-forward networks stand in contrast with recurrent networks. A typical feed-forward neural network consists of an input layer, one or two hidden layers, and an output layer, and may have anywhere between 10 and 1000 neurons in each layer.

Backpropagation
• Multilayer neural networks learn in much the same way as single perceptrons.
• The main difference is that in a multilayer network, each neuron has weights associated with its inputs, and so there are far more weights to be adjusted when an error is made with a piece of training data.
• Clearly, an important question is how to assign blame (or credit) to the various weights. One method that is commonly used is backpropagation.

Recurrent Networks
• The neural networks we have been studying so far are feed-forward networks.
• A feed-forward network is acyclic, in the sense that there are no cycles in the network, because data passes from the inputs to the outputs, and not vice versa.
• Once a feed-forward network has been trained, its state is fixed and does not alter as new input data is presented to it. In other words, it does not have memory.
• A recurrent network can have connections that go backward from output nodes to input nodes and, in fact, can have arbitrary connections between any nodes. In this way, a recurrent network’s internal state can alter as sets of input data are presented to it, and it can be said to have a memory.
• This is particularly useful in solving problems where the solution depends not just on the current inputs, but on all previous inputs.
• For example, a recurrent network could be used to predict the stock market price of a particular stock, based on all previous values, or it could be used to predict what the weather will be like tomorrow, based on what the weather has been.
• When learning, the recurrent network feeds its inputs through the network, including feeding data back from outputs to inputs, and repeats this process until the values of the outputs do not change. At this point, the network is said to be in a state of equilibrium or stability.
• Recurrent networks are also known as attractor networks because they are attracted to certain output values.
• The stable values of the network, which are also known as fundamental memories, are the output values used as the response to the inputs the network received.
• A recurrent network can be considered to be a memory that is able to learn a set of states: those that act as attractors for it.
• Once such a network has been trained, for any given input it will output the attractor that is closest to that input.
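The attractor behavior described above can be illustrated with a small Hopfield-style network, one of the recurrent networks named in the lecture objectives. The slides do not specify a construction, so everything here is an assumption made for illustration: the function names, the Hebbian sum-of-products weight setting, the +1/-1 state coding, and the two stored patterns.

```python
def sign(x):
    """Map a weighted sum to a +1/-1 state (ties go to +1)."""
    return 1 if x >= 0 else -1


def train_hopfield(patterns):
    """Set weights Hebbian-style: strengthen links between units that agree."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, state, max_steps=20):
    """Feed outputs back as inputs until the state stops changing."""
    state = list(state)
    for _ in range(max_steps):
        new_state = [sign(sum(w * s for w, s in zip(row, state))) for row in weights]
        if new_state == state:  # equilibrium reached
            break
        state = new_state
    return state


# Two fundamental memories (hypothetical patterns chosen for this sketch).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first value flipped
print(recall(weights, noisy) == p1)    # the network settles on the nearest memory
```

Updating all units at once keeps the sketch short. Such synchronous updates can oscillate on some inputs, which is why recall stops after max_steps.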
