ROLL NO.:
NAME:
CS 537 / CMPE 537 – Neural Networks
Midterm Exam Solution
April 19, 2007
Duration: 80 minutes (10:15 to 11:35)
1. (10 points) Design a two-input neural network with McCulloch-Pitts neurons to
perform the logical operations (a) A AND (NOT B), and (b) A XOR B, where A and B are
Boolean variables. Draw the network and show the weights.
A    B    (a) Output    (b) Output
1    1    0             0
1    0    1             1
0    1    0             1
0    0    0             0
(a) is linearly separable, so a single-layer network can learn the operation. (b) requires a
hidden layer consisting of two neurons to learn the operation. All activation functions are
the threshold function defined as φ(v) = 1 if v > 0 and φ(v) = 0 otherwise.
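As a check, both networks can be simulated with threshold neurons using one workable choice of weights (not the only one); in the XOR network the two hidden units compute A AND (NOT B) and B AND (NOT A), and the output unit ORs them:

```python
# Minimal sketch of the two McCulloch-Pitts networks; thresholds are
# folded in as bias terms inside the activation argument.

def step(v):
    """Threshold activation: phi(v) = 1 if v > 0 else 0."""
    return 1 if v > 0 else 0

def a_and_not_b(a, b):
    # Single neuron: v = A - B is positive only when A = 1 and B = 0.
    return step(1 * a - 1 * b)

def xor(a, b):
    # Hidden layer: h1 = A AND (NOT B), h2 = B AND (NOT A).
    h1 = step(1 * a - 1 * b)
    h2 = step(-1 * a + 1 * b)
    # Output neuron: OR of the hidden units (fires if either is 1).
    return step(h1 + h2 - 0.5)

for a in (1, 0):
    for b in (1, 0):
        print(a, b, a_and_not_b(a, b), xor(a, b))
```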
2. (10 points) Design a Wiener filter with 4 inputs and one output. The cross-correlations
between the inputs are zero, and the auto-correlations of attributes 3 and 4 are twice and
thrice, respectively, that of attributes 1 and 2. The output is twice as strongly correlated
with attribute 2 as with the other attributes. Draw the network and compute the optimal
weights.
The environment can be described by the following correlation matrix and cross-correlation
vector (taking the common auto-correlation of attributes 1 and 2, and the output's
correlation with attributes 1, 3, and 4, to be 1):
Rx = diag(1, 1, 2, 3), rxd = [1 2 1 1]T
There are four filters/weights w = [w1 w2 w3 w4]T.
The optimal weights are given by the following system of equations:
Rx w = rxd, i.e., w = Rx^(-1) rxd
Solving, we get
w1 = 1, w2 = 2, w3 = 0.5, w4 = 1/3 ≈ 0.33
The optimal filter is thus defined as y = w1x1 + w2x2 + w3x3 + w4x4 .
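A quick numerical check of the weights, under the same unit-scale assumption used above:

```python
import numpy as np

# Correlation matrix and cross-correlation vector, assuming the
# auto-correlation of attributes 1 and 2 (and the output's correlation
# with attributes 1, 3, 4) is 1; any common scale gives the same weights.
R_x = np.diag([1.0, 1.0, 2.0, 3.0])
r_xd = np.array([1.0, 2.0, 1.0, 1.0])

# Wiener-Hopf solution: w = R_x^{-1} r_xd
w = np.linalg.solve(R_x, r_xd)
print(w)  # [1.0, 2.0, 0.5, 0.333...]
```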
3. (10 points) (a) Differentiate between the expected error, the time-average error, and
the ensemble-average error. (b) What is the instantaneous error? State its advantages
over the others.
The expected error, denoted by E[e], is given by the sum of the products of each possible
error value (e) and its probability of occurrence (P(e)).
The time-average error is the average error observed over a fixed and finite time period
for a given network. For example, the average error of a network over N training examples
is the time-average error.
The ensemble-average error is the average error of a fixed and finite set of randomly
initialized networks when presented with a training example.
The time-average and ensemble-average errors are approximations of the expected error
for a stochastic environment. If the environment is ergodic, then the time-average and
ensemble-average errors are equal. If the environment is ergodic and stationary, then all
three errors are equal.
The instantaneous error is the error of a given network when presented with a given
training example (i.e. e(i)). The advantage of this error measure is that its computation
does not require complete knowledge of the environment. Furthermore, a network trained
on the basis of the instantaneous error is capable of tracking concept drift in the
environment.
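A toy illustration of the three averaging conventions (all error values hypothetical):

```python
# Hypothetical squared errors, purely illustrative.
# Rows: independently initialized networks; columns: training examples (time).
errors = [
    [0.9, 0.5, 0.2],  # network 1 over 3 examples
    [0.8, 0.4, 0.3],  # network 2
    [1.0, 0.6, 0.2],  # network 3
]

instantaneous = errors[0][2]                    # e(i): one network, one example
time_average = sum(errors[0]) / len(errors[0])  # one network, averaged over time
ensemble_average = sum(row[2] for row in errors) / len(errors)  # many networks, one example

print(instantaneous, time_average, ensemble_average)
```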
4. (15 points) Consider a single-layer single-neuron (output neuron) feedforward
network with linear activation functions. Use the method of steepest descent to derive
the update rule for the network that minimizes the squared error at the output neuron.
It is known that the desired output is defined by the equation d = w0 + w1x1 + w1x1^2 + …
+ wnxn + wnxn^2, where the w's are the parameters of the network and n is the number of
inputs. Is the single-layer network appropriate for this problem? Explain your answer
briefly.
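One workable solution outline (a sketch of the standard steepest-descent derivation for a linear neuron; x is augmented with a bias input):

```latex
% Instantaneous squared error for example i, with linear output y(i) = w^T x(i):
E(i) = \tfrac{1}{2}\, e(i)^2, \qquad e(i) = d(i) - \mathbf{w}^T \mathbf{x}(i)

% Gradient of the error with respect to the weights:
\nabla_{\mathbf{w}} E(i) = e(i)\, \frac{\partial e(i)}{\partial \mathbf{w}} = -\, e(i)\, \mathbf{x}(i)

% Steepest descent steps against the gradient with learning rate \eta:
\mathbf{w}(i+1) = \mathbf{w}(i) - \eta\, \nabla_{\mathbf{w}} E(i) = \mathbf{w}(i) + \eta\, e(i)\, \mathbf{x}(i)
```

For the second part, one way to argue: the desired output is linear in the parameters, since each wi multiplies the combined term xi + xi^2. A single-layer linear network is therefore appropriate provided each input is presented as the preprocessed feature zi = xi + xi^2; with the raw inputs xi alone, the network cannot produce the squared terms.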
5. (20 points) Consider a two-layer feedforward network with two inputs, one hidden
neuron, and one output neuron. All neurons use the logistic activation function. Use
the BP algorithm with momentum to update the weights of the network after each of the
training examples {(1, 0), 1} and {(0, 1), 0}. Assume all weights are initially equal to
1, η = 0.2, and α = 0.9. Show your working.
Weights from input to hidden layer, w(1) = [w0 w1 w2]T
Weights from hidden to output layer, w(2) = [w0 w1]T
Initially all weights are equal to 1.
Example {(1, 0), 1}
x = [1 1 0]T
v(1) = xTw(1) = 2
y1(1) = 1/(1+exp(-2)) = 0.8808
v(2) = y(1)Tw(2) = [1 0.8808]T[1 1] = 1.8808
y(2) = 1/(1+exp(-1.8808)) = 0.8677
Hidden to output layer
δ(2) = (d-y)*y(1-y) = (1-0.8677)*0.8677*(1-0.8677) = 0.0152
∆w(2) = α∆w(2) [previous] + η δ(2)y(1) = 0+0.2*0.0152*[1 0.8808]
= [0.0030 0.0027]
w(2) = [1.0030 1.0027]T
Input to hidden layer
δ(1) = y(1-y)* δ(2)*w1(2) = 0.8808*(1-0.8808)*0.0152*1 = 0.0016
∆w(1) = α∆w(1) [previous] + η δ(1)x = 0+0.2*0.0016*[1 1 0]
= [0.00032 0.00032 0]
w(1) = [1.00032 1.00032 1]T
Example {(0, 1), 0}
x = [1 0 1]T
v(1) = xTw(1) = [1 0 1]T[1.00032 1.00032 1] = 2.00032
y1(1) = 1/(1+exp(-2.00032)) = 0.8808
v(2) = y(1)Tw(2) = [1 0.8808]T[1.003 1.0027] = 1.8862
y(2) = 1/(1+exp(-1.8862)) = 0.8683
Hidden to output layer
δ(2) = (d-y)*y(1-y) = (0-0.8683)*0.8683*(1-0.8683) = -0.0993
∆w(2) = α∆w(2) [previous] + η δ(2)y(1) = 0.9*[0.003 0.0027]+0.2*-0.0993*[1 0.8808]
= [-0.0172 -0.0151]
w(2) = w(2) [old] + ∆w(2) = [1.0030 1.0027] + [-0.0172 -0.0151]
= [0.9858 0.9876]T
Input to hidden layer
δ(1) = y(1-y)* δ(2)*w1(2) = 0.8808*(1-0.8808)*-0.0993*1.0027 = -0.0105
∆w(1) = α∆w(1) [previous] + η δ(1)x = 0.9*[0.00032 0.00032 0]+0.2*-0.0105*[1 0 1]
= [-0.0018 0.00029 -0.0021]
w(1) = w(1) [old] + ∆w(1) = [1.00032 1.00032 1] + [-0.0018 0.00029 -0.0021]
= [0.9985 1.0006 0.9979]T
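The two updates can be reproduced with a short script; the following is a direct transcription of the hand calculation above:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

eta, alpha = 0.2, 0.9
w1 = np.ones(3)  # input->hidden weights [w0, w1, w2], bias first
w2 = np.ones(2)  # hidden->output weights [w0, w1]
dw1, dw2 = np.zeros(3), np.zeros(2)  # previous changes (momentum terms)

for (x1, x2), d in [((1, 0), 1), ((0, 1), 0)]:
    x = np.array([1.0, x1, x2])   # prepend the bias input
    h = sigmoid(w1 @ x)           # hidden-neuron output
    y_in = np.array([1.0, h])     # bias + hidden output feed the output neuron
    y = sigmoid(w2 @ y_in)

    # Local gradients (logistic derivative y(1-y)); delta1 uses the old w2.
    delta2 = (d - y) * y * (1 - y)
    delta1 = h * (1 - h) * delta2 * w2[1]

    # Momentum rule: new change = alpha * previous change + eta * delta * input
    dw2 = alpha * dw2 + eta * delta2 * y_in
    dw1 = alpha * dw1 + eta * delta1 * x
    w2, w1 = w2 + dw2, w1 + dw1
    print(w1, w2)  # matches the hand-computed weights above
```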
6. (15 points) (a) (5 points) Graphically show that the following two sets of points are not
linearly separable:
Set 1: (3, 3.5), (3.5, 4), (0.5, 1.5); Set 2: (1, 8), (1.5, 7), (4, 1.5)
(b) (10 points) Show that when these points are mapped nonlinearly onto a 2-D space
they become linearly separable. Describe the nonlinear mapping and plot the points in the
new space such that they are now linearly separable.
(a)
[Figure: scatter plot of the six points in the input plane; Set 1 points lie on both sides of the segment joining the Set 2 points, so no single straight line separates the two sets.]
(b)
We use two Gaussian transfer functions to map the points from the input space to the
2-dimensional feature space:
y1 = 10*exp(-(1/4)*||x - t1||^2) where t1 = [0.5 1.5]
y2 = 10*exp(-(1/4)*||x - t2||^2) where t2 = [3.5 3.5]
Using the above equations, the points are mapped to:
Set 1: (0.771, 9.394), (0.221, 9.394), (10, 0.388)
Set 2: (0.000243, 0.013), (0.00405, 0.172), (0.468, 3.456)
These points are now linearly separable in the y1-y2 plane; for example, the line
y1 + y2 = 4 separates the two sets.
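The mapped values can be verified numerically:

```python
import numpy as np

def gaussian_feature(x, t):
    """y = 10 * exp(-||x - t||^2 / 4) for a single centre t."""
    x, t = np.asarray(x, float), np.asarray(t, float)
    return 10.0 * np.exp(-np.sum((x - t) ** 2) / 4.0)

t1, t2 = [0.5, 1.5], [3.5, 3.5]
set1 = [(3, 3.5), (3.5, 4), (0.5, 1.5)]
set2 = [(1, 8), (1.5, 7), (4, 1.5)]

for label, points in (("Set 1", set1), ("Set 2", set2)):
    for p in points:
        y1, y2 = gaussian_feature(p, t1), gaussian_feature(p, t2)
        # Every Set 1 point gives y1 + y2 > 4; every Set 2 point gives < 4.
        print(label, p, round(y1, 4), round(y2, 4))
```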
7. (10 points) List at least 8 heuristics that can be used to improve the performance of
the BP algorithm.
1. Using pattern-by-pattern or incremental mode of training for adaptability
2. Maximizing information content by randomly presenting training examples
3. Normalizing the target outputs so that they lie within the bounding values of the
activation function
4. Using an anti-symmetric activation function such as tanh(v)
5. Normalizing the inputs so that each attribute has zero mean and unit variance and
the attributes are uncorrelated (see the sketch after this list)
6. Using a larger learning-rate parameter for neurons closer to the input layer
7. Incorporating momentum in the learning rule
8. Initializing the weights with small uniform random values
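A minimal sketch of heuristic 5, standardizing each attribute (decorrelation, e.g. via PCA, is omitted for brevity; the data values are hypothetical):

```python
import numpy as np

# Hypothetical training inputs: rows are examples, columns are attributes.
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 260.0], [4.0, 220.0]])

# Shift each attribute to zero mean and scale it to unit variance.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # ~[1, 1]
```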
8. (10 points) Describe the classification/decision rule for a (a) single-layer perceptron,
(b) multi-layer perceptron, and (c) Bayes classifier. Assume a general k-class problem
where k > 2.
(a)
A single-layer perceptron uses the threshold activation function.
For a k-class problem, ceiling(log2 k) output neurons are needed.
Decision rule: interpret the binary output vector as a base-2 number; the class is its
decimal value plus 1.
(b)
For a k-class problem, k output neurons are needed.
Decision rule: If neuron j has the maximum output then class is j
(c)
Decision rule: if the likelihood of the data under class j is maximum (assuming equal
priors), then the class is j,
OR
if the posterior probability of class j given the data is maximum, then the class is j.
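A compact sketch of the three rules in code (all outputs hypothetical, chosen only to illustrate the binary-decoding and argmax steps):

```python
import numpy as np

# (a) Single-layer perceptron with ceil(log2 k) threshold outputs:
binary_out = [1, 0]  # hypothetical thresholded outputs
class_a = int("".join(map(str, binary_out)), 2) + 1  # binary -> decimal, plus 1

# (b) MLP with k output neurons: pick the neuron with the largest output.
mlp_out = np.array([0.1, 0.7, 0.2])  # hypothetical outputs for k = 3
class_b = int(np.argmax(mlp_out)) + 1

# (c) Bayes classifier: pick the class with the largest posterior,
# p(class | data) proportional to p(data | class) * p(class).
likelihoods = np.array([0.02, 0.05, 0.01])  # hypothetical p(data | class)
priors = np.array([0.3, 0.3, 0.4])
class_c = int(np.argmax(likelihoods * priors)) + 1

print(class_a, class_b, class_c)
```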