Machine Learning 1:
Statistical Approaches
Summer Semester 2012, Homework 1
Prof. Dr. J. Peters, M.Eng. O. Kroemer
Problem 1.1 Linear Algebra [30 Points]
a) Matrix Inversion [10 Points]
1. Compute the inverse $X^{-1}$ of the symmetric 2 × 2 matrix below. Show your work!
$$X = \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}$$
2. What are the eigenvectors $v_1$ and $v_2$ and eigenvalues $\lambda_1$ and $\lambda_2$ of $X$? (Hint: use the eig function in MATLAB; a verification sketch follows this problem.)
3. Express $X$ in terms of the eigenvectors $v_1$ and $v_2$ and the corresponding eigenvalues $\lambda_1$ and $\lambda_2$:
4. Give a numerical example of $Ab = Ac$ with $b \neq c$. Is the matrix $A$ invertible?
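For reference, a minimal MATLAB/Octave sketch of the computations above; the matrix A used for part 4 is only one illustrative choice, not a prescribed answer:

% Verify the inverse and eigendecomposition of X.
X = [2 2; 2 4];
Xinv = inv(X);              % 2x2 inverse; check that X * Xinv is the identity
[V, D] = eig(X);            % columns of V: eigenvectors, diag(D): eigenvalues
Xrec = V * D * V';          % spectral reconstruction; equals X since X is symmetric
% Part 4: a rank-deficient (hence non-invertible) matrix maps distinct
% vectors to the same image.
A = [1 1; 1 1];
b = [1; 0]; c = [0; 1];
disp(A*b - A*c)             % zero vector although b and c differ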
b) Simultaneous Equations [8 Points]
The figure below shows the relationship between the eight variables a, b, c, . . . , h. The value of each of the variables a to g is given by the sum of the values of its neighbouring variables, e.g., b = a + h + c. The variable h is an exception: it is known that h = 15.
1. Write the problem in matrix form $Ax = b$, where $x = [a\; b\; c\; d\; e\; f\; g\; h]^{\top}$:
2. Solve for $x$ (a solver sketch follows):
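Since the figure is not reproduced in this transcript, the entries of A cannot be filled in here; the following MATLAB/Octave sketch only shows the mechanics of the solve, with placeholder values:

% Solving Ax = b once A and b are known from the figure.
A = eye(8);        % PLACEHOLDER: replace with the 8x8 coefficient matrix
b = zeros(8, 1);   % PLACEHOLDER: replace with the right-hand side (h = 15, etc.)
x = A \ b;         % the backslash operator solves the linear system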
c) Gaussian Alignment [12 Points]
Download from
http://www.ias.informatik.tu-darmstadt.de/uploads/Teaching/MachineLearning1Lecture/GaussData
the set of 3D data points $D = \{x_1, x_2, \ldots, x_{500}\}$. Using your favorite math language (e.g., MATLAB), complete the following tasks. Please attach a copy of your code when handing in your homework.
1. Compute the mean vector $\mu$ and covariance matrix $\Sigma$ for these points.
2. Compute the normalized eigenvectors $v_1$, $v_2$, and $v_3$ of the covariance matrix.
3. Compute a rotation matrix $W$ such that the covariance matrix $\Sigma^*$ of the transformed points $X^* = WX$ is diagonal; i.e., the Gaussian distribution is aligned with the axes.
4. Plot the transformed points $x_i^*$ along the two dimensions with the highest variance. Attach the plot. (A code sketch follows this list.)
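A minimal MATLAB/Octave sketch of these steps; the variable name D and the assumption that the file loads as a 500 × 3 matrix (one point per row) are guesses, since the download format is not specified in this transcript:

% Gaussian alignment via the eigendecomposition of the covariance.
load GaussData              % ASSUMPTION: yields a 500x3 matrix D; adjust to the actual format
mu = mean(D, 1)';           % 3x1 mean vector
Sigma = cov(D);             % 3x3 sample covariance
[V, Lambda] = eig(Sigma);   % columns of V: normalized eigenvectors
W = V';                     % rotating by V' diagonalizes the covariance
Xstar = W * D';             % transformed points X* = W X, size 3x500
% cov(Xstar') should now be (approximately) diagonal.
[~, idx] = sort(diag(Lambda), 'descend');
scatter(Xstar(idx(1), :), Xstar(idx(2), :));   % two highest-variance dims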
Problem 1.2 Basics of Probabilities, Statistics and Information Theory [24 Points]
A robot plays table tennis against a ball gun. The ball gun can either serve to the left or to the right, i.e., $G \in \{l, r\}$. The robot can either play a fore- or backhand, i.e., $R \in \{f, b\}$. We observe the following probability distribution:

p(G = l, R = f ) = 0.1
p(G = l, R = b) = 0.2
p(G = r, R = f ) = 0.4
p(G = r, R = b) = 0.3
This distribution raises a number of questions for the robot's table tennis trainer.
1. Rules of Probability [12 Points]
i. What are the marginal distributions p(G = l) and p(R = b)? Is the ball gun like an unbiased coin?
ii. Determine the conditional probabilities $p(R = f \mid G = l)$ and $p(R = f \mid G = r)$. If the teacher wanted to train the robot's fore- and backhand equally often, should he change the ball gun?
iii. The robot hits the ball. We have the probabilities $p(H = s \mid R = f) = 0.3$ and $p(H = s \mid R = b) = 0.9$, where H = s denotes a successful hit and H = u an unsuccessful stroke. With what probability does the robot hit the ball?
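As a reminder, these questions only require the sum and product rules of probability; for instance, marginalization, conditioning, and the law of total probability take the form
$$p(G = l) = \sum_{r \in \{f, b\}} p(G = l, R = r), \qquad p(R = f \mid G = l) = \frac{p(G = l, R = f)}{p(G = l)},$$
$$p(H = s) = \sum_{r \in \{f, b\}} p(H = s \mid R = r)\, p(R = r).$$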
2. Information Theory [12 Points]
i. How many bits would you need to encode a ball exchange, i.e., GR?
ii. What would the maximum entropy distribution $p^*$ with that many bits look like?
iii. You want to measure the difference between the distributions in the last two questions. How can you do that? (The relevant definitions are given below.)
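For reference: the entropy in bits of a distribution $p$ and the Kullback-Leibler divergence between distributions $p$ and $q$ are
$$H(p) = -\sum_i p_i \log_2 p_i, \qquad \mathrm{KL}(p \,\|\, q) = \sum_i p_i \log_2 \frac{p_i}{q_i}.$$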
Problem 1.3 Optimization [45 Points]
Here are two optimization problems. The first can be solved analytically, while the second can only be solved numerically.
a) Maximum Entropy Second Moment Matching [23 Points]
You are given an optimization problem where
$$\max_p f(p), \qquad f(p) = -\sum_{i=1}^{n} p_i \log p_i,$$
subject to
$$c = \sum_{i=1}^{n} p_i\, i^2, \qquad 1 = \sum_{i=1}^{n} p_i.$$
Here, we denote $p = [p_1, p_2, p_3, \ldots, p_n]$.
Note: this class of problems is highly relevant in many fields.
1. Write out the Lagrangian $L(p, \lambda, \theta)$ with Lagrange multipliers $\lambda$ and $\theta$. (A sketch of this setup follows the list.)
2. Differentiate the Lagrangian with respect to $p_j$.
3. Solve for the optimal $p_j^*$.
4. Write out the dual function $G(\lambda, \theta) = L(p^*, \lambda, \theta)$ after inserting all $p_j^*$.
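One standard way to set up step 1 (sign conventions for the multipliers vary, so this is only a sketch):
$$L(p, \lambda, \theta) = -\sum_{i=1}^{n} p_i \log p_i + \lambda \Big( c - \sum_{i=1}^{n} p_i\, i^2 \Big) + \theta \Big( 1 - \sum_{i=1}^{n} p_i \Big).$$
Setting $\partial L / \partial p_j = -\log p_j - 1 - \lambda j^2 - \theta = 0$ then yields $p_j^*$ in exponential-family form.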
b) Gradient Descent on a Function [22 Points]
You want to optimize the function
$$J(x_1, x_2) = -\sum_{i=1}^{10} \exp\!\big( -(x_1 i - x_2 - 1)^2 \big) - 5x_1 - x_2$$
as it describes your machine learning problem very well. Use MATLAB, SciPy, Octave, or your favorite math language in this exercise.
1. What is the derivative of $J$ with respect to $x = [x_1, x_2]$?
2. Please create a contour plot (e.g., using the MATLAB function contourf). Use $x_1 \in [0.05, 1]$ and $x_2 \in [-1.25, +1.25]$. Attach the plot and indicate the maximum with a pen.
3. Write a gradient descent method that starts at point $x_{\text{initial}}$ with a learning rate of $\alpha$ for $N$ optimization steps. Describe the optimization process. Plot the evolution of the method over the optimization steps. Ideally, overlay it with the contour plot from before. Use the following values and attach the plot. (A code sketch follows this problem.)
i. $x_{\text{initial}} = [0.9, 0.9]$, $N = 1000$, and $\alpha = 0.01$.
ii. $x_{\text{initial}} = [0.9, -0.9]$, $N = 40$, and $\alpha = 0.05$.
4. Describe the difference between the two approaches in (3i) and (3ii).
i. Which reaches a higher value of J in 20 steps?
ii. Which reaches the maximum? What is the maximum and what are the corresponding $x_1$ and $x_2$ values?
iii. What parameter stops one method from reaching the maximum? What happens as a result?
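A minimal MATLAB/Octave sketch for tasks 1-3, assuming the reconstructed form of $J$ above; since the text says gradient descent but asks for the maximum, the sketch ascends the gradient of $J$:

% Gradient ascent on J(x1, x2) with the chain-rule gradient of the
% reconstructed objective (r_i = x1*i - x2 - 1).
J   = @(x1, x2) -sum(exp(-((1:10)*x1 - x2 - 1).^2)) - 5*x1 - x2;
dJ1 = @(x1, x2)  sum(2*(1:10).*((1:10)*x1 - x2 - 1).*exp(-((1:10)*x1 - x2 - 1).^2)) - 5;
dJ2 = @(x1, x2) -sum(2*((1:10)*x1 - x2 - 1).*exp(-((1:10)*x1 - x2 - 1).^2)) - 1;

% Contour plot on the requested ranges.
[X1, X2] = meshgrid(linspace(0.05, 1, 100), linspace(-1.25, 1.25, 100));
contourf(X1, X2, arrayfun(J, X1, X2)); hold on;

% Ascent loop, e.g., with the values from (3i).
x = [0.9, 0.9]; N = 1000; alpha = 0.01;
traj = zeros(N+1, 2); traj(1, :) = x;
for n = 1:N
    x = x + alpha * [dJ1(x(1), x(2)), dJ2(x(1), x(2))];
    traj(n+1, :) = x;
end
plot(traj(:, 1), traj(:, 2), 'r.-');   % overlay the optimization path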
Problem 1.4 Introduction [1 Point]
Watch Jeopardy! at
http://www.youtube.com/watch?v=-QYchgv5dMM
and tell us what answer Watson gave to the clue "Its largest airport is named for a World War II hero; its second largest, for a World War II battle" in the category U.S. Cities.
Earn Bonus Points!
Note: while the regular points count toward your exercise grade, these count toward the exam and are considerably harder.
Bonus 1: The Kalman Filter [20 Points]
Assume you have a Gaussian random variable $x \sim \mathcal{N}(x \mid \mu_x, \Sigma_x)$, where $x \in \mathbb{R}^D$.
a) Time Update
The random variable x is being transformed according to the transition mapping
$$y = Ax + b + w, \qquad (1)$$
where $y \in \mathbb{R}^E$, $A \in \mathbb{R}^{E \times D}$, $b \in \mathbb{R}^E$, and $w \sim \mathcal{N}(w \mid 0, Q)$ is independent Gaussian (system) noise. "Independent" means that $x$ and $w$ are independent random variables.
1. Simplify the joint probability distribution p(x, w).
2. Write down p(y|x).
3. Compute p(y), i.e., the mean µy and the covariance Σy . Derive your result in detail.
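For checking your derivation: an affine transformation of a Gaussian plus independent Gaussian noise is again Gaussian, and the standard time-update result has the form
$$\mu_y = A\mu_x + b, \qquad \Sigma_y = A \Sigma_x A^{\top} + Q.$$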
b) Measurement Update
The random variable y is being transformed according to the measurement mapping
$$z = Cy + v, \qquad (2)$$
where $z \in \mathbb{R}^F$, $C \in \mathbb{R}^{F \times E}$, and $v \sim \mathcal{N}(v \mid 0, R)$ is independent Gaussian (measurement) noise.
1. Write down p(z|y).
2. Compute p(z), i.e., the mean µz and the covariance Σz . Derive your result in detail.
3. Now, a value ẑ is measured. Compute the posterior distribution p(y|ẑ).
• Hint 1 for solution: Apply Bayes' theorem.
• Hint 2 for solution: Start by explicitly computing the joint Gaussian $p(y, z)$. This also requires computing the cross-covariances $\mathrm{cov}[y, z]$ and $\mathrm{cov}[z, y]$. Then, apply the rules for Gaussian conditioning.
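As a reminder, the general Gaussian conditioning identity that Hint 2 refers to reads
$$p(y \mid \hat{z}) = \mathcal{N}\big(y \mid \mu_y + \Sigma_{yz}\Sigma_z^{-1}(\hat{z} - \mu_z),\; \Sigma_y - \Sigma_{yz}\Sigma_z^{-1}\Sigma_{zy}\big),$$
where $\Sigma_{yz} = \mathrm{cov}[y, z]$ and $\Sigma_{zy} = \mathrm{cov}[z, y]$.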
Bonus 2 [10 Points]
Consider two random variables x, y with joint distribution p(x, y). Prove the following two results:
a) $\mathrm{E}_x[x] = \mathrm{E}_y\big[\mathrm{E}_x[x \mid y]\big]$
b) $\mathrm{var}_x[x] = \mathrm{E}_y\big[\mathrm{var}_x[x \mid y]\big] + \mathrm{var}_y\big[\mathrm{E}_x[x \mid y]\big]$
Here, $\mathrm{E}_x[x \mid y]$ denotes the expected value of $x$ under the conditional distribution $p(x \mid y)$, with a similar notation for the conditional variance.
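A starting point for part a), using only the sum and product rules (a sketch, written for continuous x and y; the discrete case is analogous):
$$\mathrm{E}_x[x] = \iint x\, p(x, y)\, \mathrm{d}x\, \mathrm{d}y = \int \Big( \int x\, p(x \mid y)\, \mathrm{d}x \Big) p(y)\, \mathrm{d}y = \mathrm{E}_y\big[\mathrm{E}_x[x \mid y]\big].$$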