Ludwig-Maximilians-Universitaet Muenchen
Institute for Informatics
Prof. Dr. Volker Tresp, Janina Bleicher
08.05.2017

Machine Learning Summer 2017
Solutions to Exercise Sheet 2

Exercise 2-1 Matrices and Vectors in Python

Let a = (1, 2, 1)^T and b = (2, 2, 1)^T be vectors, and

        1 2
    A = 2 1 ,    B = 2 1 ,    C = 1 3
        1 2          0 1          1 1

be matrices.

a) Calculate the following in Python: a^T b, a b^T, AC, C A^T, A^T a, a^T A.
b) Invert B and check whether B^{-1} B = B B^{-1} = I holds.
c) Generate a 3 × 3 orthonormal matrix, e.g. a shuffled unit matrix, and check the calculation rules for orthonormal matrices.

Exercise 2-2 Applying the perceptron learning rule in Python

Let A and B be two classes, each comprising two patterns:

    A = { p1 = (2, 4)^T, p2 = (1, 0.5)^T },    B = { p3 = (0.5, 1.5)^T, p4 = (0, 0.5)^T }

Classes A and B are labelled with 1 and −1, respectively. Solve the following exercises in Python. Also, visualize the partial results.

a) How many iterations are required by the pattern-based perceptron learning rule in order to separate classes A and B correctly if the weight vector w is initialized as (0, 1, −1)^T and the step size η is set to 0.1?
b) How many iterations are required if η = 0.25? Is the order of the considered patterns relevant? If so, give an example; otherwise, prove it.
c) After how many iterations does the gradient-based learning rule terminate for both values of η? Is the order of the considered patterns relevant in this case?

Exercise 2-3 The ADALINE learning rule

The adaptive linear element (ADALINE) model uses the least-mean-square cost function

    cost = (1/2) Σ_{i=1}^{N} (y_i − ŷ_i)²

for N training-set elements, where y_i is the actual and ŷ_i the computed class label of pattern i. In contrast to the simple perceptron, classification is not realized by the signum function; instead, it is done directly: ŷ = h. (As a reminder: M is the number of input features of the patterns x_i ∈ ℝ^M and the dimensionality of the weight vector w ∈ ℝ^M, where x_0 = 1 is constant and corresponds to the bias or offset.)
a) Deduce the gradient-descent-based learning rule (or: adaptation rule) for the ADALINE process, analogously to the perceptron learning rule.
b) Specify the corresponding sample-based learning rule.
c) What advantages do sample-based learning rules have?
d) Name the most distinctive differences between the ADALINE model and the perceptron model.

Exercise 2-4 Regularisation / Overfitting

a) What is overfitting and how does it occur?
b) How can a model be identified as "overfitted"?
c) How can overfitting be avoided?
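The matrix and vector operations of Exercise 2-1 can be checked with a short NumPy sketch (a, b, A, B, C are the ones defined in the exercise; the shuffled unit matrix in part c is one convenient way to obtain an orthonormal matrix):

```python
import numpy as np

a = np.array([1, 2, 1])
b = np.array([2, 2, 1])
A = np.array([[1, 2], [2, 1], [1, 2]])
B = np.array([[2, 1], [0, 1]])
C = np.array([[1, 3], [1, 1]])

# a) the six products
print(a @ b)           # a^T b: a scalar (here 7)
print(np.outer(a, b))  # a b^T: a 3x3 matrix
print(A @ C)           # 3x2
print(C @ A.T)         # 2x3
print(A.T @ a)         # 2-vector
print(a @ A)           # same numbers as A^T a, read as a row vector

# b) inverse of B
B_inv = np.linalg.inv(B)
print(np.allclose(B_inv @ B, np.eye(2)))  # True
print(np.allclose(B @ B_inv, np.eye(2)))  # True

# c) a shuffled 3x3 unit matrix is orthonormal: Q^T Q = Q Q^T = I and Q^{-1} = Q^T
rng = np.random.default_rng(0)
Q = np.eye(3)[rng.permutation(3)]
print(np.allclose(Q.T @ Q, np.eye(3)))     # True
print(np.allclose(np.linalg.inv(Q), Q.T))  # True
```

Note that a^T b is a scalar while a b^T is a rank-one 3 × 3 matrix; the `@` operator applies the matching contraction in each case.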
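The pattern-based perceptron rule of Exercise 2-2 can be sketched as follows, under the usual conventions (a bias input x_0 = 1 is prepended to each pattern, and a pattern x with label y counts as misclassified when y · w^T x ≤ 0, triggering the update w ← w + η y x). The iteration counts asked for in a) and b), and the visualizations, are left to the exercise:

```python
import numpy as np

# Patterns with bias input x0 = 1 prepended; class A -> +1, class B -> -1.
X = np.array([[1, 2, 4], [1, 1, 0.5], [1, 0.5, 1.5], [1, 0, 0.5]])
y = np.array([1, 1, -1, -1])

def perceptron(X, y, w, eta, max_epochs=1000):
    """Pattern-based perceptron rule: update on every misclassified pattern.

    Returns the final weight vector and the number of the first full pass
    over the data that produced no mistakes (None if it never converges).
    """
    w = np.asarray(w, dtype=float).copy()
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:     # misclassified (or exactly on the boundary)
                w += eta * yi * xi     # perceptron update
                errors += 1
        if errors == 0:
            return w, epoch
    return w, None

w, epochs = perceptron(X, y, np.array([0, 1, -1]), eta=0.1)
print(w, epochs)
```

Whether "iterations" are counted per pattern update or per full pass over the data should be stated in the answer; the sketch counts full passes.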
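For Exercise 2-3, differentiating the cost with respect to w (with the linear output ŷ_i = w^T x_i) gives ∂cost/∂w = −Σ_{i=1}^{N} (y_i − ŷ_i) x_i, so gradient descent yields the batch rule w ← w + η Σ_i (y_i − ŷ_i) x_i, and the sample-based (LMS or delta) rule applies the same correction per pattern. A minimal sketch of the sample-based rule, reusing the patterns of Exercise 2-2 (the step size and epoch count are illustrative choices):

```python
import numpy as np

# Same patterns as in Exercise 2-2, bias input x0 = 1 prepended; labels +1 / -1.
X = np.array([[1, 2, 4], [1, 1, 0.5], [1, 0.5, 1.5], [1, 0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def adaline(X, y, w, eta, epochs):
    """Sample-based ADALINE rule: w <- w + eta * (y_i - w^T x_i) * x_i."""
    w = np.asarray(w, dtype=float).copy()
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w += eta * (yi - w @ xi) * xi  # delta-rule update on the linear output
    return w

w = adaline(X, y, np.zeros(3), eta=0.01, epochs=1000)
print(w)               # approaches the least-squares weight vector
print(np.sign(X @ w))  # class decisions obtained by thresholding afterwards
```

Unlike the perceptron, the update never stops exactly: the error (y_i − w^T x_i) stays nonzero even for correctly classified patterns, which is one of the distinctive differences asked for in d).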
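One common way to make overfitting visible (question b of Exercise 2-4) is to compare training and validation error as model complexity grows. The sketch below fits polynomials of increasing degree to noisy samples of a sine curve; the target function, degrees, and noise level are illustrative assumptions, not part of the exercise:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)

# Small noisy training set and a held-out validation set.
x_train = np.linspace(0, 1, 10)
x_val = np.linspace(0.05, 0.95, 10)
y_train = f(x_train) + 0.2 * rng.standard_normal(x_train.size)
y_val = f(x_val) + 0.2 * rng.standard_normal(x_val.size)

errors = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # least-squares polynomial fit
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    val_err = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    errors[degree] = (train_err, val_err)
    print(degree, train_err, val_err)
```

The degree-9 polynomial can interpolate all ten training points, so its training error collapses while its validation error does not: a falling training error paired with a rising validation error is the typical symptom of overfitting, and holding out a validation set (or using regularisation or more data) is the usual remedy asked for in c).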