ROLL NO.:                NAME:

CS 537 / CMPE 537 – Neural Networks
Midterm Exam Solution
April 19, 2007
Duration: 80 minutes (10:15 to 11:35)

1. (10 points) Design a two-input neural network with McCulloch-Pitts neurons to perform the logical operations (a) A AND (NOT B), and (b) A XOR B, where A and B are Boolean variables. Draw the network and show the weights.

A           1  1  0  0
B           1  0  1  0
(a) Output  0  1  0  0
(b) Output  0  1  1  0

(a) is linearly separable, so a single-layer network can learn the operation. (b) requires a hidden layer consisting of two neurons to learn the operation. All activation functions are the threshold function defined as φ(v) = 1 if v > 0 and φ(v) = 0 otherwise.

CS 537 / CMPE 537 (Sp 06-07) – Dr. Asim Karim Page 1 of 6

2. (10 points) Design a Wiener filter with 4 inputs and one output. The cross-correlation of the inputs is zero, and the auto-correlations of attributes 3 and 4 are twice and thrice, respectively, those of attributes 1 and 2. The output is twice as strongly correlated with attribute 2 as with the other attributes. Draw the network and compute the optimal weights.

The environment can be described by the following correlation matrix and cross-correlation vector (taking the auto-correlations of attributes 1 and 2, and the weaker output cross-correlations, as unity):

Rx = diag(1, 1, 2, 3)
rxd = [1 2 1 1]T

There are four filters/weights, w = [w1 w2 w3 w4]T. The optimal weights are given by the following system of equations:

Rx w = rxd, or w = Rx^-1 rxd

Solving, we get w1 = 1, w2 = 2, w3 = 0.5, w4 = 0.33. The optimal filter is thus defined as y = w1x1 + w2x2 + w3x3 + w4x4.

3. (10 points) (a) Differentiate between the expected error, the time-average error, and the ensemble-average error. (b) What is the instantaneous error? State its advantages over the others.

The expected error, denoted by E[e], is given by the sum of the products of each possible error value (e) and its probability of occurrence (P(e)).

The time-average error is the average error observed over a fixed and finite time period for a given network. For example, the average error of a network over N training examples is the time-average error.
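The Wiener-Hopf solution in question 2 above can be reproduced numerically; a minimal numpy sketch, assuming Rx = diag(1, 1, 2, 3) and rxd = [1 2 1 1]T (the values consistent with the solved weights):

```python
import numpy as np

# Correlation matrix of the inputs: zero cross-correlations, with the
# auto-correlations of attributes 3 and 4 being 2x and 3x those of
# attributes 1 and 2 (taken as 1).
Rx = np.diag([1.0, 1.0, 2.0, 3.0])

# Cross-correlation with the desired output: attribute 2 is twice as
# strongly correlated as the others (taken as 1).
rxd = np.array([1.0, 2.0, 1.0, 1.0])

# Wiener-Hopf equations: Rx w = rxd  =>  w = Rx^-1 rxd
w = np.linalg.solve(Rx, rxd)
print(np.round(w, 4))  # w1 = 1, w2 = 2, w3 = 0.5, w4 = 0.3333
```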
The ensemble-average error is the average error of a fixed and finite set of randomly initialized networks when presented with a training example.

The time-average and ensemble-average errors are approximations of the expected error for a stochastic environment. If the environment is ergodic, then the time-average and ensemble-average errors are equal. If the environment is ergodic and stationary, then all three errors are equal.

The instantaneous error is the error of a given network when presented with a given training example (i.e., e(i)). The advantage of this error measure is that its computation does not require complete knowledge of the environment. Furthermore, a network trained on the basis of the instantaneous error is capable of tracking concept drift in the environment.

4. (15 points) Consider a single-layer single-neuron (output neuron) feedforward network with linear activation functions. Use the method of steepest descent to derive the update rule for the network that minimizes the squared error at the output neuron. It is known that the desired output is defined by the equation w0 + w1x1 + w1x1^2 + … + wnxn + wnxn^2, where the w's are the parameters of the network and n is the number of inputs. Is the single-layer network appropriate for this problem? Explain briefly your answer.

5. (20 points) Consider a two-layer feedforward network with two inputs, one hidden neuron, and one output neuron. All neurons use the logistic activation function. Use the BP algorithm with momentum to update the weights of the network after each of the training examples {(1, 0), 1} and {(0, 1), 0}. Assume all weights are initially equal to 1, η = 0.2, and α = 0.9. Show your working.

Weights from input to hidden layer, w(1) = [w0 w1 w2]T
Weights from hidden to output layer, w(2) = [w0 w1]T
Initially all weights are equal to 1.
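This setup (logistic activations, η = 0.2, α = 0.9, all weights initially 1) can be simulated directly; a minimal numpy sketch, assuming the standard incremental BP momentum update ∆w = α∆w_prev + ηδy:

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

eta, alpha = 0.2, 0.9
w1 = np.ones(3)    # input -> hidden weights [w0 w1 w2], bias first
w2 = np.ones(2)    # hidden -> output weights [w0 w1], bias first
dw1 = np.zeros(3)  # previous weight changes (momentum terms)
dw2 = np.zeros(2)

for (x1, x2), d in [((1, 0), 1), ((0, 1), 0)]:
    x = np.array([1.0, x1, x2])        # input with bias
    y_h = logistic(x @ w1)             # hidden neuron output
    y1 = np.array([1.0, y_h])          # hidden layer output with bias
    y_o = logistic(y1 @ w2)            # network output

    delta2 = (d - y_o) * y_o * (1.0 - y_o)        # output local gradient
    delta1 = y_h * (1.0 - y_h) * delta2 * w2[1]   # hidden local gradient

    dw2 = alpha * dw2 + eta * delta2 * y1         # momentum updates
    dw1 = alpha * dw1 + eta * delta1 * x
    w2 = w2 + dw2
    w1 = w1 + dw1

print(np.round(w2, 4))  # close to the hand-computed [0.9858 0.9876]
print(np.round(w1, 4))  # close to the hand-computed [0.9985 1.0006 0.9979]
```

The small differences from the hand computation come from rounding the intermediate values to four decimal places on paper.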
Example {(1, 0), 1}

x = [1 1 0]T
v(1) = xTw(1) = 2
y1(1) = 1/(1+exp(-2)) = 0.8808
v(2) = y(1)Tw(2) = [1 0.8808]T[1 1] = 1.8808
y(2) = 1/(1+exp(-1.8808)) = 0.8677

Hidden to output layer:
δ(2) = (d-y)*y*(1-y) = (1-0.8677)*0.8677*(1-0.8677) = 0.0152
∆w(2) = α∆w(2)[previous] + ηδ(2)y(1) = 0 + 0.2*0.0152*[1 0.8808] = [0.0030 0.0027]
w(2) = [1.0030 1.0027]T

Input to hidden layer:
δ(1) = y*(1-y)*δ(2)*w1(2) = 0.8808*(1-0.8808)*0.0152*1 = 0.0016
∆w(1) = α∆w(1)[previous] + ηδ(1)x = 0 + 0.2*0.0016*[1 1 0] = [0.00032 0.00032 0]
w(1) = [1.00032 1.00032 1]T

Example {(0, 1), 0}

x = [1 0 1]T
v(1) = xTw(1) = [1 0 1]T[1.00032 1.00032 1] = 2.00032
y1(1) = 1/(1+exp(-2.00032)) = 0.8808
v(2) = y(1)Tw(2) = [1 0.8808]T[1.0030 1.0027] = 1.8862
y(2) = 1/(1+exp(-1.8862)) = 0.8683

Hidden to output layer:
δ(2) = (d-y)*y*(1-y) = (0-0.8683)*0.8683*(1-0.8683) = -0.0993
∆w(2) = α∆w(2)[previous] + ηδ(2)y(1) = 0.9*[0.0030 0.0027] + 0.2*(-0.0993)*[1 0.8808] = [-0.0172 -0.0151]
w(2) = w(2)[old] + ∆w(2) = [1.0030 1.0027] + [-0.0172 -0.0151] = [0.9858 0.9876]T

Input to hidden layer:
δ(1) = y*(1-y)*δ(2)*w1(2) = 0.8808*(1-0.8808)*(-0.0993)*1.0027 = -0.0105
∆w(1) = α∆w(1)[previous] + ηδ(1)x = 0.9*[0.00032 0.00032 0] + 0.2*(-0.0105)*[1 0 1] = [-0.0018 0.00029 -0.0021]
w(1) = w(1)[old] + ∆w(1) = [1.00032 1.00032 1] + [-0.0018 0.00029 -0.0021] = [0.9985 1.0006 0.9979]T

6. (15 points) (a) (5 points) Graphically show that the following two sets of points are not linearly separable: Set 1: (3, 3.5), (3.5, 4), (0.5, 1.5); Set 2: (1, 8), (1.5, 7), (4, 1.5). (b) (10 points) Show that when these points are mapped nonlinearly onto a 2-D space they become linearly separable. Describe the nonlinear mapping and plot the points in the new space such that they are now linearly separable.

(a) [Plot omitted: plotting the two sets in the x1-x2 plane shows that no straight line separates them.]

(b) We use two Gaussian transfer functions to map the points from the input space to the 2-dimensional feature space.
y1 = 10*exp(-1/4*||x - t1||^2) where t1 = [0.5 1.5]
y2 = 10*exp(-1/4*||x - t2||^2) where t2 = [3.5 3.5]

Using the above equations, the points are mapped to:
Set 1: (0.771, 9.394), (0.221, 9.394), (10, 0.388)
Set 2: (0.000243, 0.013), (0.00405, 0.172), (0.468, 3.456)
These points are now linearly separable in the y1-y2 plane.

7. (10 points) List at least 8 heuristics that can be used to improve the performance of the BP algorithm.

1. Using the pattern-by-pattern (incremental) mode of training for adaptability
2. Maximizing information content by presenting the training examples in random order
3. Normalizing the outputs so that they lie within the bounding values of the activation function
4. Using an anti-symmetric activation function (e.g., tanh(v))
5. Normalizing the inputs so that each attribute has zero mean and unit variance, and the attributes are uncorrelated
6. Using a larger learning-rate parameter for neurons closer to the input
7. Incorporating momentum in the learning rule
8. Initializing the weights with small uniform random values

8. (10 points) Describe the classification/decision rule for a (a) single-layer perceptron, (b) multi-layer perceptron, and (c) Bayes classifier. Assume a general k-class problem where k > 2.

(a) A single-layer perceptron uses the threshold activation function. For a k-class problem, ceiling(log2 k) output neurons are needed. Decision rule: the class is the decimal value of the binary output pattern plus 1.

(b) For a k-class problem, k output neurons are needed. Decision rule: if neuron j has the maximum output, then the class is j.

(c) Decision rule: if the likelihood of the data given class j is maximum, then the class is j; or, if the posterior probability of class j given the data is maximum, then the class is j.
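The Gaussian feature mapping given in the solution to question 6(b) can be evaluated directly to reproduce the mapped points; a minimal numpy sketch:

```python
import numpy as np

def phi(x, t):
    """Gaussian transfer function: 10 * exp(-||x - t||^2 / 4)."""
    x, t = np.asarray(x, dtype=float), np.asarray(t, dtype=float)
    return 10.0 * np.exp(-0.25 * np.sum((x - t) ** 2))

t1, t2 = (0.5, 1.5), (3.5, 3.5)   # Gaussian centers from the solution
set1 = [(3, 3.5), (3.5, 4), (0.5, 1.5)]
set2 = [(1, 8), (1.5, 7), (4, 1.5)]

for p in set1 + set2:
    y = (round(phi(p, t1), 3), round(phi(p, t2), 3))
    print(p, "->", y)   # e.g. (3, 3.5) -> (0.771, 9.394)
```

In the y1-y2 plane, set 1 lands near the axes' extremes (large y1 or large y2) while set 2 stays near the origin, so a single straight line separates them.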