EIE6207: Theoretical Fundamental and Engineering Approaches for Intelligent
Signal and Information Processing
Tutorial: Neural Networks and Backpropagation (Solutions)
Q1
a) Because we need to minimize the error E, which is a function of the weights. We therefore
compute the gradient of E with respect to the weights to find the steepest downhill direction at the
current position in weight space.
\[ \text{slope} = \left. \frac{\partial E}{\partial w_{ji}} \right|_{w_{ji} = w_{ji}(t)} \]

[Figure: the error E(w_{ji}) plotted against w_{ji}, with the slope (gradient) marked at the current
weight value w_{ji}(t).]
In the above diagram, the gradient (slope) at w_{ji}(t) is negative. If the learning rate η is positive,
then w_{ji}(t+1) > w_{ji}(t) according to the BP update formula, which is what we want, i.e., moving downhill.
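For reference, the BP update rule referred to above can be written in its usual gradient-descent form,
with η denoting the learning rate:

\[ w_{ji}(t+1) = w_{ji}(t) - \eta \left. \frac{\partial E}{\partial w_{ji}} \right|_{w_{ji} = w_{ji}(t)} \]

Since the gradient at w_{ji}(t) is negative in the diagram and η > 0, the update adds a positive quantity,
so w_{ji}(t+1) > w_{ji}(t).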
b) The learning rate should be small so that the change in the weights is not too large between
successive iterations. The key point is that we want a small change in the weights at each iteration
so that we eventually reach the bottom of a valley. If the learning rate is very large, the change in
the weights can be so large that the error E increases from iteration to iteration. This is similar to
jumping from one side of a hill to the other side without ever reaching the bottom.
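A minimal numerical sketch of this effect, using a hypothetical one-dimensional error E(w) = w^2
(the starting point and step sizes below are illustrative only):

# Gradient descent on E(w) = w^2, whose gradient is dE/dw = 2w.
def gradient_descent(eta, w0=5.0, steps=10):
    w = w0
    errors = []
    for _ in range(steps):
        grad = 2.0 * w          # gradient at the current weight
        w = w - eta * grad      # BP-style update
        errors.append(w * w)    # E(w) after the update
    return errors

print(gradient_descent(eta=0.1))  # small learning rate: E decreases every iteration
print(gradient_descent(eta=1.1))  # learning rate too large: E grows, jumping across the valley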
c) No. This is a weakness of BP. BP is based on gradient descent, and gradient-descent algorithms are
not guaranteed to find the global minimum (unless E(w) is convex in w, e.g., quadratic). However, in
many real-world applications it is not necessary to find the global minimum for the network to be useful.
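A small illustrative sketch of this point; the error function and starting weights below are hypothetical,
chosen only to show that plain gradient descent converges to whichever minimum lies downhill from its
starting point:

# E(w) = w^4 - 4*w^2 + w has a local minimum near w = 1.35 and a lower
# (global) minimum near w = -1.47.
def descend(w, eta=0.01, steps=2000):
    for _ in range(steps):
        grad = 4 * w**3 - 8 * w + 1   # dE/dw
        w -= eta * grad
    return w

print(descend(2.0))    # gets stuck in the local minimum (about 1.35)
print(descend(-2.0))   # reaches the global minimum (about -1.47)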
Q2
a)
Hidden-layer outputs:

\[ o_1^{(1)} = f\!\left( \sum_{i=1}^{2} w_{1i}^{(1)} x_i + b_1^{(1)} \right), \qquad
   o_2^{(1)} = f\!\left( \sum_{i=1}^{2} w_{2i}^{(1)} x_i + b_2^{(1)} \right) \]

Output-layer output:

\[ o_1^{(2)} = f\!\left( \sum_{j=1}^{2} w_{1j}^{(2)} o_j^{(1)} + b_1^{(2)} \right) \]

[Figure: a 2-2-1 network with inputs x_1 and x_2, first-layer weights W^{(1)} and biases b_1^{(1)},
b_2^{(1)} (bias inputs fixed at 1), hidden outputs o_1^{(1)} and o_2^{(1)}, second-layer weights and
bias b_1^{(2)}, and output o_1^{(2)}.]
As shown in the above figure, the output of each hidden node is a function of the weighted sum of the
inputs plus a bias. If the sigmoid function f(·) has a steep slope, its output over a range of inputs x_1
and x_2 will look like the following figure. The left and right panels correspond to the outputs of hidden
neurons 1 and 2, respectively. The first neuron maps data above L1 to 0.0 and data below L1 to 1.0.
Similarly, the second neuron maps data above L2 to 1.0 and data below L2 to 0.0. The resulting mapping is
shown in the bottom panel. The output neuron separates the data in the 2-D space defined by the hidden-node
outputs o_1^{(1)} and o_2^{(1)}. As can be seen, the data in this new space are linearly separable and can
be easily classified by L3, which is produced by the output neuron.
[Figure: left panel, "Output of Hidden Neuron 1" over the (x_1, x_2) plane with decision line L1; right
panel, "Output of Hidden Neuron 2" with decision line L2; bottom panel, the transformed data in the
(o_1^{(1)}, o_2^{(1)}) space, where line L3 separates the two classes.]
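To make the geometry above concrete, here is a minimal forward-pass sketch. The weight values, the
positions of L1/L2/L3, and the test points are hypothetical (not taken from the tutorial figure); they
simply illustrate how two steep-sigmoid hidden neurons map a non-linearly-separable stripe of data onto
corners of the (o_1^{(1)}, o_2^{(1)}) square, where a single output neuron can separate them:

import numpy as np

def sigmoid(z, k=20.0):
    # A steep sigmoid; a large k makes it behave almost like a step function.
    return 1.0 / (1.0 + np.exp(-k * z))

# Hidden neuron 1 implements L1: x1 + x2 = 2 (output ~1 below L1, ~0 above).
# Hidden neuron 2 implements L2: x1 + x2 = 1 (output ~1 above L2, ~0 below).
W1 = np.array([[-1.0, -1.0],
               [ 1.0,  1.0]])
b1 = np.array([2.0, -1.0])

# Output neuron implements L3 in the hidden space: o1 + o2 = 1.5.
w2 = np.array([1.0, 1.0])
b2 = -1.5

def forward(x):
    o_hidden = sigmoid(W1 @ x + b1)       # o_1^(1), o_2^(1)
    o_out = sigmoid(w2 @ o_hidden + b2)   # o_1^(2)
    return o_hidden, o_out

# Points below L2, between L1 and L2, and above L1, respectively.
for x in ([0.2, 0.2], [0.8, 0.7], [1.5, 1.5]):
    h, y = forward(np.array(x))
    print(x, "-> hidden", np.round(h, 2), "-> output", round(float(y), 2))

Only the point between L1 and L2 drives both hidden outputs to 1, so it is the only one that falls on the
positive side of L3.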