EIE6207: Theoretical Fundamental and Engineering Approaches for Intelligent Signal and Information Processing
Tutorial: Neural Networks and Backpropagation (Solutions)
06/30/17

Q1

a) Because we need to minimize the error E, which is a function of the weights. We therefore compute the gradient of E with respect to the weights to find the best downhill direction at the current position in weight space.

[Figure: the error curve E(w_ji), with the slope \partial E / \partial w_{ji} drawn at the current weight w_ji = w_ji(t).]

In the above diagram, the gradient (slope) at w_ji(t) is negative. If \eta is positive, then by the BP update formula

    w_ji(t+1) = w_ji(t) - \eta \, \partial E / \partial w_{ji},

we have w_ji(t+1) > w_ji(t), which is what we want, i.e., going downhill.

b) The learning rate should be small so that the change in the weights is not too large between successive iterations. The key point is that we want a small change in the weights at each iteration so that we eventually reach the bottom of a valley. If the learning rate is very large, the change in the weights will be so large that the error E may increase from iteration to iteration. This is similar to jumping from one side of a hill to the other without ever reaching the bottom.

c) No. This is a weakness of BP. BP is based on gradient descent, and no gradient-descent algorithm is guaranteed to find the global minimum (unless E(w) is quadratic in w). But in many real-world applications, it is not necessary to find the global minimum for the network to be useful.

Q2

a) The network computes

    o_1^{(2)} = f( \sum_{j=1}^{2} w_{1j}^{(2)} o_j^{(1)} + b_1^{(2)} )
    o_1^{(1)} = f( \sum_{i=1}^{2} w_{1i}^{(1)} x_i + b_1^{(1)} )
    o_2^{(1)} = f( \sum_{i=1}^{2} w_{2i}^{(1)} x_i + b_2^{(1)} )

[Figure: a 2-2-1 network with inputs x_1 and x_2, hidden-layer weights W^{(1)}, hidden biases b_1^{(1)} and b_2^{(1)}, and output bias b_1^{(2)}.]

As shown in the above figure, the outputs of the hidden nodes are a function of the linear weighted sum of the inputs plus a bias. If the sigmoid function f(·) has a steep slope, its output over a range of inputs x_1 and x_2 will look like the following figure. The left and right figures correspond to the outputs of hidden neurons 1 and 2, respectively.
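To make the "steep slope" point concrete, here is a small sketch (the slope parameter beta is an assumption introduced for illustration, not part of the tutorial) showing that a steep sigmoid behaves almost like a hard threshold:

```python
import math

def sigmoid(a, beta=1.0):
    # Logistic sigmoid with a slope (gain) parameter beta.
    return 1.0 / (1.0 + math.exp(-beta * a))

# A gentle slope gives a graded response; a steep slope is nearly a step.
for a in (-0.5, -0.1, 0.1, 0.5):
    print(f"a={a:+.1f}  beta=1: {sigmoid(a):.3f}  beta=50: {sigmoid(a, 50):.3f}")
```

With beta = 50, inputs even slightly below zero map to nearly 0.0 and inputs slightly above zero map to nearly 1.0, which is why each hidden neuron effectively splits the input plane along a line.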
The first neuron maps data above L1 to 0.0 and below L1 to 1.0. Similarly, the second neuron maps data above L2 to 1.0 and below L2 to 0.0. The resulting mappings are shown in the bottom figure. The output neuron separates the data in the 2-D space defined by the hidden-node outputs (o_1^{(1)} and o_2^{(1)}). As can be seen, the data in this new space are linearly separable and can easily be classified by L3, which is produced by the output neuron.

[Figures: top, the outputs of hidden neurons 1 and 2 over the (x_1, x_2) plane, with their decision lines L1 and L2; bottom, the transformed data in the (o_1^{(1)}, o_2^{(1)}) space, separated by line L3.]
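The construction above can be sketched numerically. The weights below are hypothetical (chosen by hand for an XOR-like data set, not taken from the tutorial's figure), but they illustrate the same idea: each steep-sigmoid hidden neuron implements one line, and the output neuron then separates the hidden-layer representations, which have become linearly separable.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x1, x2, beta=20.0):
    # Hidden neuron 1: ~1 below line L1 (x1 + x2 = 1.5), ~0 above it.
    o1 = sigmoid(-beta * (x1 + x2 - 1.5))
    # Hidden neuron 2: ~1 above line L2 (x1 + x2 = 0.5), ~0 below it.
    o2 = sigmoid(beta * (x1 + x2 - 0.5))
    # Output neuron: line L3 (o1 + o2 = 1.5) in the (o1, o2) space.
    y = sigmoid(beta * (o1 + o2 - 1.5))
    return o1, o2, y

# XOR-like data: points lying between L1 and L2 belong to class 1.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    o1, o2, y = forward(x1, x2)
    print(f"x=({x1},{x2})  hidden=({o1:.2f},{o2:.2f})  output={y:.2f}")
```

In the (o1, o2) space, the class-0 points land near (1, 0) and (0, 1) while the class-1 points land near (1, 1), so a single line (L3) separates them even though no single line in the original (x1, x2) plane could.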