Survey
* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project
Activity:
• See the figure of the biological neuron
• Draw a figure of a Perceptron
• Explain the structure of the Perceptron and the meaning of its components

Artificial Neural Networks (ANNs) have biological inspiration. The basic component of brain circuitry is a specialised cell called the neuron, illustrated in Fig. 2-1-1. Neurons are electrically excitable: the cell body can generate electrical signals, called action potentials, which propagate down the axon towards the nerve terminal. The electrical signal propagates only in this direction and is an all-or-none event. Information is coded in the frequency of the signal.

Fig. 2-1-1. A biological neuron

The nerve terminal lies close to the dendrites or cell bodies of other neurons, forming special junctions called synapses. Circuits can therefore be formed by a number of neurons. Any particular neuron has many inputs (some receive nerve terminals from hundreds or thousands of other neurons), each with a different strength. The neuron integrates these strengths and fires action potentials accordingly. The input strengths are not fixed but vary with use; the mechanisms behind this modification are only now beginning to be understood. These changes in input strength are thought to be particularly relevant to learning and memory.

Inspired by these biological fundamentals, a neuron model called the Perceptron was introduced by Rosenblatt [19], illustrated in Fig. 2-1-2.

Fig. 2-1-2. Perceptron

In the Perceptron there is a set of synapses or connecting links, each link connecting one input to the summation block. Associated with each synapse there is a strength or weight, which multiplies the associated input signal. The input signals are integrated in the neuron; usually an adder is employed for computing a weighted summation of the input signals. Note also the existence of a threshold or bias term associated with the neuron.
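The Perceptron computation described above (weighted summation of the inputs plus a bias, followed by a hard threshold) can be sketched as follows. This is a minimal illustration; the function and variable names are not from the text.

```python
# Minimal sketch of the Perceptron computation: an adder forms a weighted
# summation of the inputs plus a bias, and a hard threshold produces the
# all-or-none output. Names here are illustrative.

def perceptron_output(x, w, b):
    """Compute the Perceptron output y for input vector x."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b  # adder: weighted summation
    return 1 if s > 0 else 0                      # hard threshold (all-or-none)

# Example: weights and bias chosen so the unit computes logical AND
w = [1.0, 1.0]
b = -1.5
print(perceptron_output([1, 1], w, b))  # 1
print(perceptron_output([1, 0], w, b))  # 0
```

The bias here is added directly in the adder; as the text notes next, it can equivalently be treated as an extra input fixed at 1.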
This bias can be associated with the adder, or considered as another input with the fixed value 1. The output of the adder is denoted by s, to which a threshold function is applied to obtain the output y.

Activity:
• Note the learning rule of the Perceptron
• Try to perform a weight modification by yourself in an example

The typical application of Rosenblatt's Perceptron was to activate an appropriate response unit for a given input pattern or class of patterns. The inputs, outputs and training patterns were binary valued (0 or 1). Although there are different versions of the Perceptron and corresponding learning rules, the basic rule is to alter the weights only of active lines (the ones corresponding to a 1 input), and only when an error exists between the network output and the desired output. Formally:

1. If the output is 1 and it should be 1, or if the output is 0 and it should be 0, then do nothing.
2. If the output is 0 and it should be 1, then increment the weights on all active lines.
3. If the output is 1 and it should be 0, then decrement the weights on all active lines.

The weight vector w is updated as

w[k+1] = w[k] + Δw,  (1)

where Δw is the change made to the weight vector, with components

Δw_i = α · (t[k] − y[k]) · x_i[k],  (2)

where α is the learning rate, t[k] and y[k] are the desired and actual outputs, respectively, at time k, and x_i[k] is the i-th element of the input vector at time k. Thus, the i-th component of the weight vector at the (k+1)-th iteration is:

w_i[k+1] = w_i[k] + α · x_i[k],  if the output is 0 and it should be 1
w_i[k+1] = w_i[k] − α · x_i[k],  if the output is 1 and it should be 0  (3)
w_i[k+1] = w_i[k],               if the output is correct

Some variations have been made to this simple Perceptron model: some models do not employ a bias; the inputs to the net may be real valued, bipolar (+1, −1), or binary; and the outputs may be bipolar.
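The learning rule of equations (1)–(3) can be sketched as a short training loop. This is an illustrative sketch, not Rosenblatt's original implementation; the bias is handled as a weight on a constant-1 input, as discussed above.

```python
# Sketch of the Perceptron learning rule, equations (1)-(3): weights change
# only on active input lines and only when the output differs from the
# desired output. Function names are illustrative.

def train_perceptron(patterns, alpha=1.0, epochs=10):
    """patterns: list of (x, t) pairs with binary input vector x and target t."""
    n = len(patterns[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, t in patterns:
            s = sum(wi * xi for wi, xi in zip(w, x)) + b
            y = 1 if s > 0 else 0
            err = t - y                          # (t[k] - y[k]) from equation (2)
            if err != 0:
                for i in range(n):
                    w[i] += alpha * err * x[i]   # only active lines (x_i = 1) change
                b += alpha * err                 # bias as a weight on a constant-1 input
    return w, b

# Example: learn logical OR from its truth table
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```

Because OR is linearly separable, the rule converges and the trained weights classify all four patterns correctly.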
Activity:
• See the many possibilities to organise the neurons in a network

Biological neural networks are densely interconnected, and the same holds for artificial neural networks. According to the flow of signals within an ANN, the architectures can be divided into feedforward networks, if the signals flow only from input to output, and recurrent networks, if loops are allowed. Another possible classification depends on the existence of hidden neurons, i.e., neurons that are neither input nor output neurons. If there are hidden neurons, the network is called a multilayer neural network; otherwise it is a single-layer neural network. Finally, if every neuron in one layer is connected to every neuron in the layer immediately above, the network is called fully connected.

Activity:
• See the difference between the Perceptron and the modified Perceptron

Multilayer perceptrons (MLPs) are the most widely known type of ANN. Looking at the Perceptron, it is not difficult to design a network structure out of simple Perceptrons, as long as each one is individually trained to implement a specific function. However, designing the whole structure globally to implement a more complicated function (or sometimes even simpler functions, like the XOR function) is impossible because of the hard nonlinearity of the Perceptron: the derivatives needed by the learning algorithm cannot be computed. This problem can be solved if the hard threshold (sign function) of the Perceptron is replaced by a smoother, differentiable nonlinearity, such as a sigmoid or a hyperbolic tangent function. This modified Perceptron is illustrated in Fig. 2-1-3 and is applied as the processing unit in the Multilayer Perceptron (MLP), a feedforward multilayer ANN presented in Fig. 2-1-4.

Fig. 2-1-3. Modified Perceptron

Activity:
• Draw a figure of MLP
• Explain the components in MLP
• Calculate the output of the MLP for a given input
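The difference between the Perceptron and the modified Perceptron can be seen by comparing the two nonlinearities directly. The sketch below is illustrative: the hard threshold jumps at s = 0, while the sigmoid changes smoothly and so has a derivative everywhere.

```python
import math

# Hard threshold (original Perceptron) vs. sigmoid (modified Perceptron).
# The sigmoid is smooth and differentiable, which is what makes
# gradient-based learning possible.

def hard_threshold(s):
    return 1 if s > 0 else 0

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

# Near s = 0 the threshold jumps abruptly; the sigmoid changes gradually:
for s in (-2.0, 0.0, 2.0):
    print(s, hard_threshold(s), round(sigmoid(s), 3))
```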
Fig. 2-1-4. Multilayer Perceptron

Any given input vector propagated forward through the network causes activation in the output layer. The overall network function, which maps the input vector to the output vector, is determined by the network's weights. Every neuron in the network is a simple processing unit that computes its own output based on its inputs. In the l-th layer, the input of the i-th neuron (Fig. 2-1-3) is

s_i^(l) = Σ_{j∈pred(i)} y_j^(l−1) · w_ji^(l−1) + b_i^(l−1),  (4)

where pred(i) is the set of neurons preceding the i-th neuron, i.e., the set of neurons from which there is a connection to the i-th neuron; w_ji is the weight between the j-th and i-th neurons, and b_i is the bias of the neuron. Usually the bias is considered as a weight coming from a "bias" unit having a constant output value of 1. In Fig. 2-1-4 the bias units are denoted by the letter b and are connected to every processing unit by a certain weight.

The output of the i-th neuron can be calculated from the previously computed s_i^(l) by applying a non-linear function, as mentioned above. Usually the sigmoid function is used, which gives the output of the i-th neuron as

y_i^(l) = 1 / (1 + e^(−s_i^(l))).  (5)

A nice property of this function is that its derivative can be calculated easily:

∂y_i/∂s_i = y_i · (1 − y_i).  (6)

The weights of the neural network can be determined by a learning algorithm based on training patterns, in order to realise a desired mapping with the network. The learning algorithms will be introduced in a later lesson.
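The forward pass of equations (4) and (5) can be sketched layer by layer for a fully connected MLP. The layer sizes and weight values below are arbitrary illustrations, not taken from the figures.

```python
import math

# Minimal sketch of an MLP forward pass following equations (4) and (5):
# each layer computes a weighted sum of the previous layer's outputs plus a
# bias, then applies the sigmoid. Weights and sizes are illustrative.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))            # equation (5)

def sigmoid_derivative(y):
    return y * (1.0 - y)                          # equation (6), from the output y

def forward(x, layers):
    """layers: list of (W, b) per layer, where W[j][i] is the weight w_ji."""
    y = x
    for W, b in layers:
        s = [sum(W[j][i] * y[j] for j in range(len(y))) + b[i]
             for i in range(len(b))]              # equation (4)
        y = [sigmoid(si) for si in s]
    return y

# Example: a 2-2-1 fully connected network with arbitrary weights
layers = [([[0.5, -0.3], [0.2, 0.8]], [0.1, -0.1]),   # hidden layer
          ([[1.0], [-1.0]], [0.0])]                   # output layer
print(forward([1.0, 0.0], layers))
```

Since equation (6) expresses the derivative in terms of the output itself, computing it during the forward pass costs almost nothing, which is what the learning algorithms of the later lesson exploit.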