Ludwig-Maximilians-Universität München
Institute for Informatics
Prof. Dr. Volker Tresp
Janina Bleicher
08.05.2017
Machine Learning
Summer 2017
Solutions to Exercise Sheet 2
Exercise 2-1
Matrices and Vectors in Python
Let a = (1, 2, 1)ᵀ and b = (2, 2, 1)ᵀ be vectors, and let

        ( 1  2 )        ( 2  1 )            ( 1  3 )
    A = ( 2  1 ),   B = ( 0  1 )   and  C = ( 1  1 )
        ( 1  2 )

be matrices.
a) Calculate in Python the following:
aᵀb, abᵀ, AC, CAᵀ, Aᵀa, aᵀA.
b) Invert B and check whether B⁻¹B = BB⁻¹ = I holds.
c) Generate a 3 × 3 orthonormal matrix, e.g. a shuffled identity matrix, and check the calculation rules for
orthonormal matrices.
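
A minimal NumPy sketch for parts a)–c), assuming the vectors and matrices exactly as defined above (a starting point, not a worked solution):

```python
import numpy as np

# Vectors and matrices as defined above.
a = np.array([1, 2, 1])
b = np.array([2, 2, 1])
A = np.array([[1, 2], [2, 1], [1, 2]])
B = np.array([[2, 1], [0, 1]])
C = np.array([[1, 3], [1, 1]])

# a) The six products.
print(a @ b)           # a^T b  (scalar)
print(np.outer(a, b))  # a b^T  (3x3 matrix)
print(A @ C)           # AC
print(C @ A.T)         # C A^T
print(A.T @ a)         # A^T a
print(a @ A)           # a^T A

# b) Inverse of B and the check B^-1 B = B B^-1 = I.
B_inv = np.linalg.inv(B)
print(np.allclose(B_inv @ B, np.eye(2)), np.allclose(B @ B_inv, np.eye(2)))

# c) A shuffled identity matrix is orthonormal:
#    Q^T Q = Q Q^T = I and Q^-1 = Q^T.
rng = np.random.default_rng(0)
Q = np.eye(3)[rng.permutation(3)]
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(np.linalg.inv(Q), Q.T))
```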
Exercise 2-2
Applying the perceptron learning rule in Python
Let A and B be two classes, both comprising two patterns:

    A = { p1 = (2, 4)ᵀ, p2 = (1, 0.5)ᵀ },   B = { p3 = (0.5, 1.5)ᵀ, p4 = (0, 0.5)ᵀ }

Classes A and B are labelled with 1 and −1, respectively.
Solve the following exercises in Python. Also, visualize the partial results.
a) How many iterations does the pattern-based perceptron learning rule require in order to separate classes
A and B correctly if the weight vector w is initialized as (0, 1, −1) and the step size η is set to 0.1? (A sketch is given after this list.)
b) How many iterations are required if η = 0.25? Is the order of the considered patterns relevant? If so, give
an example; otherwise, prove it.
c) After how many iterations does the gradient-based learning rule terminate for both values of η? Is the
order of the considered patterns relevant in this case?
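
A possible starting point for part a), assuming the patterns are augmented with a constant bias input x₀ = 1 (so the first component of the three-component weight vector is the bias weight) and that one "iteration" counts a full pass over all patterns; both are conventions, not given by the sheet:

```python
import numpy as np

# Patterns augmented with the bias input x0 = 1;
# labels: class A -> +1, class B -> -1.
X = np.array([[1, 2.0, 4.0],   # p1 (class A)
              [1, 1.0, 0.5],   # p2 (class A)
              [1, 0.5, 1.5],   # p3 (class B)
              [1, 0.0, 0.5]])  # p4 (class B)
y = np.array([1, 1, -1, -1])

def perceptron(w0, eta, max_epochs=1000):
    """Pattern-based perceptron rule; returns (w, number of full passes)."""
    w = np.asarray(w0, dtype=float)
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            # Convention: sign(0) = 0 counts as a misclassification.
            if np.sign(w @ x_i) != y_i:
                w = w + eta * y_i * x_i
                mistakes += 1
        if mistakes == 0:  # all patterns classified correctly
            return w, epoch
    return w, None

for eta in (0.1, 0.25):
    w, epochs = perceptron([0, 1, -1], eta)
    print(f"eta = {eta}: converged after {epochs} full passes, w = {w}")
```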
Exercise 2-3
The ADALINE learning rule
The adaptive linear element (ADALINE) model uses the least mean square cost function

    cost = (1/2) ∑_{i=1}^{N} (yᵢ − ŷᵢ)²,

for N training set elements, where yᵢ is the actual and ŷᵢ the computed class label of pattern i. In contrast to
the simple perceptron, classification is not realized by the signum function; instead, the activation is used directly: ŷ = h.
(As a reminder: M is the number of input features of the patterns xᵢ ∈ ℝᴹ and the dimensionality of the weight
vector w ∈ ℝᴹ, where x₀ = 1 is constant and corresponds to the bias or offset.)
a) Deduce the gradient descent-based learning rule (or: adaptation rule) for the ADALINE process (analogously
to the perceptron learning rule); a sketch is given after this list.
b) Specify the corresponding sample-based learning rule.
c) What advantages do sample-based learning rules have?
d) Name the most important differences between the ADALINE model and the perceptron model.
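
A sketch of how parts a) and b) can be approached (an outline under the assumption that the activation is hᵢ = wᵀxᵢ, not a worked solution):

```latex
% Sketch, assuming \hat{y}_i = h_i = w^T x_i (linear activation with x_0 = 1):
% the chain rule applied to the cost gives the gradient
\frac{\partial\,\mathrm{cost}}{\partial w}
    = -\sum_{i=1}^{N} (y_i - \hat{y}_i)\, x_i,
% a) so batch gradient descent with step size \eta updates
w \leftarrow w + \eta \sum_{i=1}^{N} (y_i - \hat{y}_i)\, x_i,
% b) and the sample-based variant updates after each single pattern i:
w \leftarrow w + \eta\, (y_i - \hat{y}_i)\, x_i.
```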
Exercise 2-4
Regularisation / Overfitting
a) What is overfitting and how does it occur?
b) How can a model be identified as “overfitted”?
c) How can overfitting be avoided?
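
As an illustration for parts a) and b), a small sketch (the data, polynomial degrees, and train/validation split are all invented for demonstration): fitting polynomials of increasing degree to a few noisy samples makes the training error shrink while the validation error grows, which is the usual signature of overfitting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy samples of a simple underlying function (illustrative data).
f = lambda x: np.sin(2 * np.pi * x)
x_train = np.linspace(0, 1, 10)
x_val = np.linspace(0.05, 0.95, 10)
y_train = f(x_train) + rng.normal(0, 0.2, x_train.size)
y_val = f(x_val) + rng.normal(0, 0.2, x_val.size)

# Fit polynomials of increasing degree; a growing gap between training
# and validation error indicates overfitting.
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    err_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    err_val = np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)
    print(f"degree {degree}: train MSE {err_train:.3f}, val MSE {err_val:.3f}")
```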